Something unusual showed up in our crawler monitoring network in 2023. Across 1,200 domains spanning tech documentation, scientific journals, product review sites, and multilingual news sources, one bot began appearing more and more often: Applebot.
By the time we compiled our annual crawler analysis in early 2025, the numbers were striking. Between Q3 2023 and Q4 2024, Applebot-attributed requests across our monitored network had increased by 840%, or roughly 9.4 times the starting volume. For comparison, GPTBot (OpenAI's crawler, which drew significant press coverage at launch) grew by 210% in the same period, and ClaudeBot grew by 340%.
Background: What Is Applebot?
Applebot has existed since at least 2015, when Apple officially documented it as the crawler behind Siri's web search results and Spotlight suggestions. For years, it behaved like a polite, modest crawler — appearing occasionally on high-traffic sites and largely staying out of the conversation about AI data collection.
That changed in June 2024, when Apple introduced Apple Intelligence at WWDC: a suite of on-device and server-side AI features requiring substantial training data. Apple's foundation model (AFM) technical report, published shortly afterward, described models trained on "carefully curated licensed data, synthetic data, and publicly available data from the internet."
📌 Key insight: Apple says it does not use its users' private personal data or interactions to train its foundation models. This makes high-quality public web data proportionally more important to Apple's AI stack than to competitors who can leverage user interactions.
What Content Is Applebot Targeting?
By analyzing request patterns across domain categories, we identified clear content preferences for Applebot compared to other major crawlers:
| Content Category | Applebot | GPTBot | ClaudeBot |
|---|---|---|---|
| Scientific / Academic Text | +42% vs. avg | +18% vs. avg | +31% vs. avg |
| Product Descriptions | +38% vs. avg | +5% vs. avg | -8% vs. avg |
| Multilingual Content | +51% vs. avg | +12% vs. avg | +28% vs. avg |
| Long-form Articles (>2000w) | +29% vs. avg | +11% vs. avg | +47% vs. avg |
| API / Technical Docs | +8% vs. avg | +52% vs. avg | +19% vs. avg |
| Social Media / UGC | -22% vs. avg | -5% vs. avg | -19% vs. avg |
Three categories stand out: scientific text, product descriptions, and multilingual content. This pattern is consistent with Apple's stated focus areas for Apple Intelligence: answering factual queries (Siri), understanding and describing products (shopping, App Store), and supporting multiple languages across their global user base.
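The baseline behind "vs. avg" is not spelled out above, so the following is only a sketch of one plausible reading: each category's request rate compared with the crawler's unweighted mean rate across categories. The function name `index_vs_average`, the "requests per 1,000 pages" unit, and the sample numbers are all assumptions for illustration, not the method or data behind the table.

```python
def index_vs_average(rates: dict[str, float]) -> dict[str, float]:
    """Percentage deviation of each category's rate from the unweighted mean rate."""
    if not rates:
        return {}
    mean_rate = sum(rates.values()) / len(rates)
    if mean_rate == 0:
        raise ValueError("mean request rate is zero; deviation is undefined")
    return {
        category: round((rate / mean_rate - 1) * 100, 1)
        for category, rate in rates.items()
    }


# Made-up requests per 1,000 pages for a single crawler (not values from the study).
example_rates = {"scientific": 150.0, "technical_docs": 100.0, "ugc": 50.0}
example_index = index_vs_average(example_rates)
# {'scientific': 50.0, 'technical_docs': 0.0, 'ugc': -50.0}
```

Under this definition the deviations for one crawler always average to zero. The table's columns do not, which suggests the real baseline is something else, for example a cross-crawler average or a wider set of categories than the six shown.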
Crawl Behavior Analysis
Crawl Delay and Politeness
Applebot maintains an average crawl delay of 3.6 seconds across the domains we monitored, significantly more conservative than GPTBot (1.1s average) and Bytespider, which we observed ignoring crawl-delay directives entirely. This behavior aligns with Apple's public documentation and with Applebot's 99.1% robots.txt compliance rate, the highest we observed among major AI crawlers.
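For readers who want to estimate a comparable figure from their own access logs, here is a minimal sketch. It assumes the log has already been filtered to one user agent and reduced to `(domain, ISO-8601 timestamp)` pairs. The function name `mean_crawl_gap` and the `session_gap` cutoff (which drops long idle periods between crawl sessions) are hypothetical choices, not the method used for the numbers above.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_crawl_gap(records, session_gap=60.0):
    """Mean seconds between consecutive requests per domain.

    records: iterable of (domain, iso_timestamp) for a single user agent.
    Gaps longer than session_gap seconds are treated as breaks between
    crawl sessions and excluded from the average.
    """
    by_domain = defaultdict(list)
    for domain, timestamp in records:
        by_domain[domain].append(datetime.fromisoformat(timestamp))

    gaps = {}
    for domain, times in by_domain.items():
        times.sort()
        deltas = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        in_session = [d for d in deltas if d <= session_gap]
        if in_session:  # domains with a single request have no gap to report
            gaps[domain] = mean(in_session)
    return gaps
```

The cutoff matters: without it, a crawler that visits in short bursts hours apart would look far slower than it is within each burst.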
Re-crawl Frequency
Applebot shows a strong re-crawl pattern for pages it has previously indexed. Pages first crawled in Q3 2023 were re-crawled an average of 4.8 times over the following 12 months, suggesting active content freshness monitoring rather than one-time bulk collection.
HEAD Requests
Uniquely among the crawlers we monitored, Applebot frequently issues HTTP HEAD requests before full GET requests — checking Last-Modified and ETag headers to determine if content has changed. This suggests a sophisticated caching layer designed to minimize bandwidth and avoid re-processing unchanged content.
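The HEAD-then-GET pattern can be sketched as follows. This is an illustration of the general technique using Python's standard library, not a reconstruction of Applebot's internals. The names `has_changed` and `fetch_if_changed`, the cache layout, and the `example-crawler/0.1` user agent are hypothetical.

```python
import urllib.request


def has_changed(cached, head_headers):
    """Compare stored validators with those from a fresh HEAD response.

    cached: dict with optional "etag" and "last_modified" keys.
    head_headers: any mapping with .get(), such as HTTP response headers.
    """
    # Prefer ETag when both sides have one; fall back to Last-Modified.
    if cached.get("etag") and head_headers.get("ETag"):
        return cached["etag"] != head_headers.get("ETag")
    if cached.get("last_modified") and head_headers.get("Last-Modified"):
        return cached["last_modified"] != head_headers.get("Last-Modified")
    return True  # no usable validators, so assume the page changed


def fetch_if_changed(url, cached, user_agent="example-crawler/0.1"):
    """Issue a HEAD request first; download the body only if validators differ."""
    head = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(head, timeout=10) as response:
        if not has_changed(cached, response.headers):
            return None  # unchanged: skip the full download
    get = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(get, timeout=10) as response:
        return response.read()
```

A common alternative is a single conditional GET with `If-None-Match` or `If-Modified-Since`, which lets the server answer `304 Not Modified` in one round trip. A separate HEAD request costs an extra request when content has changed but keeps the change check independent of the download.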
What Does This Mean for Apple Intelligence?
Apple's technical report describes two categories of models requiring web data: a roughly 3-billion-parameter on-device model, and larger server-side models running in Private Cloud Compute (PCC). The on-device model focuses on writing assistance, summarization, and Siri improvements; the server-side models handle more complex reasoning tasks.
The crawl pattern data suggests Apple is prioritizing a diverse, high-quality corpus over raw volume, which is consistent with their published preference for synthetic data augmentation over pure web-scale training. The multilingual skew (+51% vs. average) points toward substantial investment in non-English language capabilities, likely for upcoming Apple Intelligence localizations beyond the initial English release.
Should Website Owners Be Concerned?
The short answer: probably not more than any other major AI crawler. Applebot's high robots.txt compliance rate and conservative crawl delays make it one of the more "polite" crawlers in the ecosystem. Website owners who wish to opt out of Applebot crawling can add the following to their robots.txt:
User-agent: Applebot
Disallow: /
However, opting out of Applebot may also remove your content from Siri web results and Apple Maps local search results — a tradeoff worth considering for sites that value Apple platform traffic.
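Before deploying a rule like this, it can help to check how a parser interprets it. The sketch below uses Python's standard `urllib.robotparser`; the helper name `allowed` and the example URL are hypothetical. It also tests a second variant: Apple's published Applebot documentation describes a separate `Applebot-Extended` token intended to let publishers opt out of training use without blocking the crawler itself. Consult that documentation for the current details rather than relying on this summary.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blocks Applebot entirely (the rule discussed above).
BLOCK_APPLEBOT = """\
User-agent: Applebot
Disallow: /
"""

# Targets only the Applebot-Extended token; plain Applebot is unaffected.
BLOCK_EXTENDED_ONLY = """\
User-agent: Applebot-Extended
Disallow: /
"""

page = "https://example.com/article"
print(allowed(BLOCK_APPLEBOT, "Applebot", page))       # False
print(allowed(BLOCK_APPLEBOT, "GPTBot", page))         # True
print(allowed(BLOCK_EXTENDED_ONLY, "Applebot", page))  # True
```

This only shows how Python's parser reads the file. Each crawler applies its own matching logic, so treat the result as a sanity check rather than a guarantee of crawler behavior.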