Papers Archive
18,400 AI research papers indexed with full abstracts, author metadata, and citation counts.
Attention Is All You Need (Revisited: Seven Years of Transformers)
A retrospective analysis of the transformer architecture's impact across NLP, vision, audio, and multimodal domains. We trace architectural variants, scaling behavior, and limitations that have spurred successor architectures.
Chinchilla Scaling Laws: Revisiting Optimal Compute Allocation
We revisit the Chinchilla compute-optimal training analysis with updated data from 47 model training runs, finding that the optimal token-to-parameter ratio may be significantly higher than previously reported, with implications for frontier model training.
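The compute-optimal split the abstract refers to can be sketched under two common approximations: training FLOPs C ≈ 6·N·D and the original Chinchilla finding of roughly 20 tokens per parameter (the abstract suggests the true ratio may be higher). The function name and FLOP budget below are illustrative, not from the paper:

```python
import math

def compute_optimal(flops: float, tokens_per_param: float = 20.0):
    """Split a FLOP budget C into parameters N and tokens D,
    assuming C = 6*N*D and D = tokens_per_param * N."""
    n = math.sqrt(flops / (6.0 * tokens_per_param))
    d = tokens_per_param * n
    return n, d

# Roughly Chinchilla's own budget: ~5.76e23 FLOPs yields
# ~70B parameters and ~1.4T tokens under these assumptions.
n, d = compute_optimal(5.76e23)
print(f"params = {n:.2e}, tokens = {d:.2e}")
```

Raising `tokens_per_param` (as the updated analysis suggests) shifts the optimum toward smaller models trained on more tokens for the same budget.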
LoRA: Low-Rank Adaptation of Large Language Models (Extended Study)
Extended analysis of LoRA's effectiveness across 200+ model family and task combinations, with new theoretical grounding, quantization-aware variants, and guidance for practitioners fine-tuning models from 1B to 70B parameters.
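The core LoRA mechanism the study evaluates is a frozen weight matrix plus a trainable low-rank update, y = Wx + (α/r)·BAx, where B is initialized to zero so the adapted model starts identical to the base model. A minimal NumPy sketch, with dimensions chosen only for illustration:

```python
import numpy as np

d_out, d_in, r = 64, 64, 8   # illustrative layer sizes and rank
alpha = 16.0                 # LoRA scaling hyperparameter
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero init

def lora_forward(x):
    # Base path plus scaled low-rank correction; only A and B are trained.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B = 0, the adapted layer is exactly the base layer.
assert np.allclose(lora_forward(x), W @ x)
```

Training updates only A and B (2·r·d parameters per layer instead of d²), which is what makes fine-tuning 1B–70B models tractable on modest hardware.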
Apple's Private Cloud Compute: Technical Architecture and Privacy Guarantees
Independent analysis of Apple's server-side AI inference infrastructure announced at WWDC 2024. Examines the attestation model, stateless processing guarantees, and implications for AI training data collection practices.
Who Is Crawling Your Website? Large-Scale Analysis of AI Bot Traffic 2022–2024
Using honeypot networks across 1,200 domains, we characterize the crawling behavior of 23 AI companies. Among our findings: Apple's Applebot activity increased 840% in 2023, OpenAI's GPTBot respects robots.txt in 98.2% of cases, and Anthropic's crawlers show irregular patterns.
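The robots.txt compliance the study measures can be checked mechanically with Python's standard-library parser. The rules and URLs below are hypothetical examples, not data from the paper:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration; real sites vary.
robots_txt = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A compliant crawler would honor these answers before fetching.
print(rp.can_fetch("GPTBot", "https://example.com/private/page"))  # False
print(rp.can_fetch("GPTBot", "https://example.com/public/page"))   # True
```

A study like this one compares such expected answers against the requests a crawler actually makes to the honeypot domains.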
The Role of Web Data Quality in LLM Factuality and Hallucination
We trace hallucinations in GPT-4, Claude 3, and Gemini Pro to their training data sources. Models trained on higher-quality filtered web data exhibit 34% fewer factual errors on HELM benchmarks, suggesting that beyond a certain scale, data curation matters more than further scaling.