
Fundamentals

Chapter 02 / 09

How search engines work

Crawling, indexing, ranking — and the fourth stage that didn't exist in 2018 but decides who gets cited inside ChatGPT and Google AI Overviews in 2026.

10 min read · Published May 4, 2026

Search engines work by repeating four stages on a continuous loop. Three of them — crawling, indexing, ranking — have been the model since 1998. The fourth — synthesis — didn’t exist when most SEO advice was written, and it now decides who gets cited inside ChatGPT, Google AI Overviews, Gemini, Perplexity, and Claude. Skipping it is why a lot of teams ship technically perfect content that nobody finds.

This article walks through each stage and points out where modern SEO actually moves the needle versus where the textbooks tell you to look.

Most SEO problems don’t live where teams optimise. The bottleneck is usually one stage upstream of where they’re looking.

The four stages

Stage 1. Crawling
What happens: A bot fetches the URL, follows links, returns to the queue
Where SEO leverage lives: Internal links, sitemap, robots.txt, render performance

Stage 2. Indexing
What happens: The engine decides whether the page is worth storing
Where SEO leverage lives: Content quality, duplicate detection, canonical, schema

Stage 3. Ranking
What happens: When a query arrives, candidates are scored and ordered
Where SEO leverage lives: Search intent match, authority signals, freshness, E-E-A-T

Stage 4. Synthesis
What happens: AI engines compose an answer citing multiple indexed sources
Where SEO leverage lives: Passage structure, entity clarity, sameAs, citation-readiness

1. Crawling — how the bot finds your page

Googlebot, Bingbot, OpenAI’s GPTBot and OAI-SearchBot, Anthropic’s ClaudeBot, and Perplexity’s PerplexityBot all do the same thing: fetch a URL, parse the HTML, follow the links inside, and add new URLs to a queue. The queue gets enormous fast — billions of URLs across the open web — so engines prioritise. The mistake is assuming “published” means “crawled.”
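
That fetch-parse-enqueue loop is simple enough to sketch. The toy crawler below uses only Python’s standard library; the seed URL and user-agent are placeholders, and real bots layer robots.txt rules, large-scale deduplication, and per-site budgets on top of this skeleton.

    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import Request, urlopen

    class LinkExtractor(HTMLParser):
        """Collects href values from <a> tags."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed, max_pages=10):
        # The frontier is the queue described above; real engines
        # reorder it instead of fetching strictly first-in, first-out.
        frontier, seen, fetched = deque([seed]), {seed}, 0
        while frontier and fetched < max_pages:
            url = frontier.popleft()
            fetched += 1
            try:
                req = Request(url, headers={"User-Agent": "toy-crawler/0.1"})
                html = urlopen(req, timeout=10).read().decode("utf-8", "replace")
            except OSError:
                continue  # unreachable URLs simply drop out of the queue
            parser = LinkExtractor()
            parser.feed(html)
            for href in parser.links:
                absolute = urljoin(url, href)
                if absolute not in seen:
                    seen.add(absolute)
                    frontier.append(absolute)
        return seen

    # crawl("https://example.com")  # hypothetical seed URL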

Three things determine whether a bot reaches your page in days versus weeks:

  • Internal links from authoritative pages. A new URL with three internal links from ranking pages gets crawled before the same URL with zero internal links. Internal linking is the single most underused crawl-priority lever in 2026: teams publish a new article and never add it to the homepage, the cluster page, or sibling articles.
  • Sitemap freshness. An XML sitemap with accurate lastmod timestamps tells the engine which URLs are new or changed since the last crawl. A sitemap that was generated once at launch and never regenerated is invisible to the engine’s prioritisation logic (a regeneration sketch follows this list).
  • Render performance. Bots have a budget per site (informally called the crawl budget). Pages that take 8 seconds to render burn budget that the engine could have spent on other URLs. Core Web Vitals matter here too — not just for ranking but for how many of your pages get crawled in a given window.
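
Regenerating the sitemap is mechanical, which is why shipping a stale one is avoidable. A minimal sketch of the lastmod regeneration described above, assuming you can iterate over (URL, last-modified date) pairs from your CMS; the URLs are placeholders.

    import xml.etree.ElementTree as ET
    from datetime import date

    def build_sitemap(pages):
        """pages: iterable of (url, last_modified_date) pairs."""
        ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
        urlset = ET.Element("urlset", xmlns=ns)
        for url, last_modified in pages:
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url
            # lastmod is the field the engine's prioritisation logic reads;
            # it has to change whenever the page actually changes.
            ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
        return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

    print(build_sitemap([
        ("https://example.com/guide", date(2026, 5, 4)),    # hypothetical URLs
        ("https://example.com/pricing", date(2026, 4, 12)),
    ]))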

2. Indexing — the decision to store

Crawled is not indexed. After fetching the page the engine asks one question: is this worth keeping? Pages that fail this check get crawled but discarded — they’ll never rank, no matter how good the on-page optimisation looks.

The usual reasons a page fails the indexing decision in 2026:

  • Thin or duplicate content. If the page repeats what already exists in the index — same boilerplate, same product description, same FAQ — the engine has no reason to add another copy. Programmatic SEO done badly fails here.
  • Confused canonical signals. When two URLs serve essentially the same content (e.g., a query-string variant) and don’t agree on which is canonical, the engine often indexes neither. Canonical chains and missing self-referencing canonicals trip this constantly.
  • Missing or invalid schema. Schema doesn’t guarantee indexation, but an Article + FAQPage + BreadcrumbList graph signals to the engine that you’ve thought about what the page is (a sketch of such a graph follows this list). Pages without it look generic and lose tie-breakers.
  • Soft 404 patterns. Pages that load but say “no results” or “this product is unavailable” in their main content get classified as soft 404s by Google and skipped.
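
For the schema point above, here is a minimal sketch of an Article + FAQPage + BreadcrumbList graph emitted as JSON-LD; every name, URL, and question is a placeholder, and the property set is trimmed to show the shape of the graph rather than complete markup.

    import json

    # Hypothetical page; every value below is a placeholder.
    page_url = "https://example.com/how-search-engines-work"

    graph = {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Article",
                "@id": page_url + "#article",
                "headline": "How search engines work",
                "datePublished": "2026-05-04",
                "author": {
                    "@type": "Person",
                    "name": "Jane Doe",  # hypothetical author
                    # sameAs is what lets the engine disambiguate the entity
                    "sameAs": ["https://www.linkedin.com/in/janedoe"],
                },
            },
            {
                "@type": "FAQPage",
                "@id": page_url + "#faq",
                "mainEntity": [{
                    "@type": "Question",
                    "name": "What is the difference between crawling and indexing?",
                    "acceptedAnswer": {
                        "@type": "Answer",
                        "text": "Crawling fetches the page; indexing decides whether to store it.",
                    },
                }],
            },
            {
                "@type": "BreadcrumbList",
                "@id": page_url + "#breadcrumbs",
                "itemListElement": [
                    {"@type": "ListItem", "position": 1, "name": "Fundamentals",
                     "item": "https://example.com/fundamentals"},
                    {"@type": "ListItem", "position": 2, "name": "How search engines work",
                     "item": page_url},
                ],
            },
        ],
    }

    # Paste the output into a <script type="application/ld+json"> tag.
    print(json.dumps(graph, indent=2))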

3. Ranking — the score on every query

When a user types a query, the engine pulls candidate pages from the index, scores them against hundreds of factors, and orders them. The score is not a single number computed once; it’s recalculated for each query because the same page can be a great fit for one query and a poor fit for another.

Most SEO advice fixates on the ranking stage because it’s the visible one. The reality is that ranking factors only matter for pages that already cleared the indexing decision — so optimising on-page elements before the indexation problem is solved is wasted effort.

That said, the factor categories that move ranking in 2026 are well established (a toy scoring sketch follows the list):

  • Search intent match. Does the page answer the actual user need behind the query? Informational queries want guides; transactional queries want products. Mismatched intent loses to a weaker page that nailed the intent.
  • Authority signals. Backlinks, brand mentions, sameAs identity, citation count in adjacent media. The engine’s shorthand for “is this site trustworthy in this category?”
  • Freshness. Different query types demand different freshness. “What year is it” needs daily updates; “how to write a will” doesn’t. Stale pages on time-sensitive queries lose; recently-updated pages on evergreen queries don’t automatically win.
  • E-E-A-T. Experience, Expertise, Authoritativeness, Trust — Google’s framework for who the engine should believe. Encoded through Person + Author schema, sameAs identity, and editorial signals from third parties.
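
To make the per-query recalculation concrete, here is a toy scorer over the four categories above; the weights and signal stand-ins are invented for the sketch and bear no relation to any engine’s real ones.

    from dataclasses import dataclass

    @dataclass
    class Page:
        intent: str            # "informational" or "transactional"
        authority: float       # 0..1, stand-in for link and brand signals
        days_since_update: int

    def score(page, query_intent, freshness_sensitive):
        # Intent mismatch is close to disqualifying, so it gates everything.
        intent_fit = 1.0 if page.intent == query_intent else 0.2
        # Freshness only matters when the query type demands it.
        freshness = 1.0
        if freshness_sensitive:
            freshness = max(0.0, 1.0 - page.days_since_update / 365)
        # Invented weights; real engines blend hundreds of signals.
        return intent_fit * (0.6 * page.authority + 0.4 * freshness)

    guide = Page("informational", authority=0.7, days_since_update=400)
    # The same page scores differently on different queries:
    print(score(guide, "informational", freshness_sensitive=False))  # strong fit
    print(score(guide, "transactional", freshness_sensitive=True))   # weak fit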

4. Synthesis — the stage SEO advice keeps missing

AI engines do not show ten blue links. They compose an answer by pulling passages from multiple indexed sources, weaving them into a single response, and citing each source. The decision of which sources to pull and quote is the synthesis stage — and it scores pages on signals that classic ranking does not weigh as heavily.

What synthesis looks for that ranking under-weights:

  • Self-contained passages. Two to three sentences that answer a sub-question without needing the rest of the article for context. Synthesis tends to pull paragraphs, not pages, so paragraphs that stand alone get pulled more often (a heuristic check is sketched after this list).
  • Entity clarity. The page mentions the entity (your brand, product, person) in a way that the engine can disambiguate. Inconsistent entity descriptions across the site, vague company descriptions, missing sameAs links — all hurt synthesis pickup even if classic ranking is fine.
  • Citation-ready facts. Numbers, dates, attribution to sources. AI engines prefer to quote pages where the facts are clearly stated and clearly sourced. Vague writing (“studies show”) loses to specific writing (“a 2026 Ahrefs study of 4M URLs found”).
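
One rough way to audit self-containment is to flag paragraphs that open with a reference the reader cannot resolve without scrolling up. The opener list and the rule below are illustrative heuristics, not anything an engine publishes.

    # Openers that force a reader (or an AI engine) to look upstream for context.
    DANGLING_OPENERS = {"this", "that", "these", "those", "it", "they", "such"}

    def is_self_contained(paragraph):
        """Crude check: a standalone passage should not open with a dangling reference."""
        words = paragraph.strip().lower().split()
        return bool(words) and words[0] not in DANGLING_OPENERS

    paragraphs = [
        "Crawling is when a bot fetches a URL and reads its content.",
        "This makes it much faster.",  # needs the previous paragraph for context
    ]
    for p in paragraphs:
        print(is_self_contained(p), "-", p)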

Synthesis is why a page can rank position 8 in Google but be the source most quoted in ChatGPT for the same query — and vice versa. The two stages reward different things.

What this means for SEO practice

Map every SEO problem you’re trying to solve to one of the four stages first. The fix lives at the stage where the problem starts, not the stage where the symptom appears.

Symptom: Page not in Google index
Likely stage: Crawling or Indexing
Likely fix: Add internal links + check Search Console for 'Crawled — currently not indexed' or duplicate/canonical errors

Symptom: Indexed but ranks position 30+
Likely stage: Ranking
Likely fix: Search intent mismatch or thin authority for this category — usually content-side, not technical

Symptom: Ranks well in Google but never appears in ChatGPT
Likely stage: Synthesis
Likely fix: Restructure paragraphs to be self-contained; tighten entity description; add sameAs

Symptom: Loses ranking after a Google core update
Likely stage: Ranking
Likely fix: Factor weights shifted — usually means the site was relying on something the algorithm now devalues
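
Because the table is effectively a lookup, it can be encoded as a triage helper; the symptom keys below are shorthand inventions for the sketch.

    # Symptom -> (likely stage, likely fix), mirroring the table above.
    TRIAGE = {
        "not in index": ("Crawling or Indexing",
                         "Add internal links; check Search Console for "
                         "'Crawled — currently not indexed' or canonical errors"),
        "ranks 30+": ("Ranking",
                      "Fix search intent mismatch or thin authority; content-side work"),
        "absent from ChatGPT": ("Synthesis",
                                "Make paragraphs self-contained; tighten entity "
                                "description; add sameAs"),
        "dropped after core update": ("Ranking",
                                      "A devalued signal was carrying the site"),
    }

    def triage(symptom):
        stage, fix = TRIAGE[symptom]
        return "Stage: " + stage + ". Fix: " + fix

    print(triage("absent from ChatGPT"))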

Common questions

Quick answers to what we get asked before every trial signup.

What’s the difference between crawling and indexing?

Crawling is when a bot fetches a URL and reads its content. Indexing is when the engine decides that page is worth storing in its database to rank later. Crawled is not indexed — Search Console reports the gap as 'Crawled — currently not indexed', and it’s the most common reason a page exists but never ranks.