
Technical SEO

Chapter 04 / 09

Crawling and indexing

Two distinct stages, two distinct failure modes. What stops Googlebot from crawling, what stops it from indexing, and how to diagnose either one from Search Console without guessing.

9 min read · Published May 4, 2026

Crawling and indexing are the two foundational stages every page has to clear before it can rank. They’re distinct: crawling is discovery and rendering, indexing is decision and storage. They have different failure modes, different diagnostic surfaces in Search Console, and different fixes. Conflating them is the most common reason “why isn’t my page ranking” investigations go in circles.

Crawled but not indexed is a content problem. Discovered but not crawled is a crawl-priority problem. Not even discovered is an internal-linking problem. Three different failures, three different fixes — and Search Console tells you which one you’re looking at.

The full pipeline — discover → crawl → render → index → rank

  • Discovery. What happens: Googlebot finds the URL via internal link, sitemap, external link, or manual submission. What can fail: no internal link + no sitemap = orphan page.
  • Crawl. What happens: Googlebot fetches the HTML at the URL. What can fail: robots.txt block, 4xx/5xx server error, slow response.
  • Render. What happens: Google runs JavaScript, builds the final DOM, extracts content + signals. What can fail: JS errors, blocked resources, dynamic content not rendering.
  • Index decision. What happens: Google decides whether the rendered page goes into the index. What can fail: low quality, duplication, noindex, canonical pointing elsewhere.
  • Rank. What happens: indexed pages compete in retrieval for a query. What can fail: out of scope for this article; see the Google algorithm cluster.

Each stage has a Search Console signal. Discovery and crawl issues show in the Crawl Stats report and the Pages report (“Discovered — currently not indexed”). Render issues show in the Inspect URL tool when you compare HTML to rendered HTML. Index decisions show in the Pages report (“Crawled — currently not indexed”).

Stage 1 — Discovery

Googlebot finds new URLs through three primary channels:

  • Internal links from already-crawled pages on your domain.
  • XML sitemaps submitted via Search Console.
  • External links from other domains, plus URL submission via Search Console’s URL Inspection tool.

A page that doesn’t appear in any of those is an “orphan page” — Google doesn’t know it exists. The fix is the simplest in this whole article: add an internal link from somewhere reachable, or add the URL to your sitemap, or both.
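
If the sitemap route is the gap, the file itself is tiny. A minimal sketch of a sitemap entry, with a placeholder URL:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> element per page you want Google to discover -->
  <url>
    <loc>https://www.example.com/guides/crawling-and-indexing</loc>
    <lastmod>2026-05-04</lastmod>
  </url>
</urlset>
```

Submit the sitemap URL once in Search Console > Sitemaps; after that, keeping the file up to date is enough.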

Stage 2 — Crawl

Once Googlebot has the URL, it tries to fetch it. The fetch can fail for several reasons:

  • Robots.txt block. Common after launches when staging robots.txt rules are accidentally promoted to production (see the example after this list).
  • 4xx errors. 404s and 410s are correct for deleted pages but a problem when valid pages return them by mistake.
  • 5xx errors. Server-side issues — overload, application crashes, misconfigured CDN. Googlebot backs off and retries; persistent 5xx demotes the URL.
  • Slow response. If the server takes more than 10–15 seconds to respond, Googlebot may abandon the fetch.
  • Crawl budget caps on large sites — Googlebot won’t fetch every URL on every visit.
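
The robots.txt failure deserves a concrete picture, because one leftover line is enough to block the whole site. A hypothetical before/after, not a template to copy:

```
# Staging rules accidentally shipped to production: blocks all crawling
User-agent: *
Disallow: /

# What production usually wants instead: crawl everything except private paths
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
```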

Search Console’s Crawl Stats report shows the volume Googlebot is fetching, the response codes it’s seeing, and the average response time. Anomalies there usually predict ranking trouble before it shows in traffic.

Stage 3 — Render

Modern Google renders pages with a headless Chromium that executes JavaScript before extracting content. Two pages can return identical HTML and very different rendered DOMs depending on what their JS does. Render failures show as missing content in the indexed version even when the URL was crawled successfully.

Use Search Console > URL Inspection > Test live URL > View tested page > Screenshot + HTML. If the rendered HTML is missing content that users see in the browser, Google can’t see that content. Common causes:

  • Render-blocking JavaScript that times out before the bot finishes rendering.
  • Content loaded after user interaction (click-to-reveal, infinite scroll without IntersectionObserver-based prerender).
  • Resources blocked by robots.txt — JS files, CSS files, API endpoints critical to the rendered output.
  • API failures during render — content fetched from a backend that the bot can’t reach.
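
Before reaching for Search Console, a quick spot check tells you whether a piece of content exists in the raw HTML at all or only appears after JavaScript runs. A minimal sketch; the URL and phrase are placeholders:

```python
import urllib.request

# Placeholders: use your own URL and a phrase you can see in the browser.
URL = "https://www.example.com/some-page"
PHRASE = "a sentence visible on the rendered page"

req = urllib.request.Request(URL, headers={"User-Agent": "raw-html-spot-check"})
raw_html = urllib.request.urlopen(req, timeout=30).read().decode("utf-8", "replace")

if PHRASE in raw_html:
    print("Phrase found in raw HTML: the content does not depend on client-side JS.")
else:
    print("Phrase missing from raw HTML: it is injected by JavaScript, so confirm "
          "Google renders it via URL Inspection > Test live URL.")
```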

See the dedicated JavaScript SEO article for the deeper render fix list.

Stage 4 — Index decision

Once rendered, Google decides whether the page is worth keeping in the index. The most common exclusion states in Search Console:

  • Crawled — currently not indexed. What it means: Google fetched and rendered the page and rejected it, for quality, duplication, or thin-content reasons. Typical fix: improve content quality, add unique value, consolidate duplicate URLs, refresh outdated pages.
  • Discovered — currently not indexed. What it means: Google knows the URL exists but didn’t fetch it, a crawl-priority or budget reason. Typical fix: increase internal linking from authoritative pages; reduce low-value URLs in the crawl path; check site speed.
  • Duplicate without user-selected canonical. What it means: Google decided this page is a duplicate of another one and no canonical is set. Typical fix: set an explicit canonical, consolidate duplicates, or improve uniqueness.
  • Page with redirect. What it means: the URL redirects to another URL; the destination is what gets indexed. Typical fix: usually correct; verify the destination is the intended canonical.
  • Soft 404. What it means: the page returns HTTP 200 but Google treats it as “not found”. Typical fix: return a proper 404/410, restore the content, or 301 redirect.
  • Blocked by robots.txt. What it means: robots.txt prevents crawling. Typical fix: adjust robots.txt if the block was unintentional.
  • Excluded by ‘noindex’ tag. What it means: the page has a noindex meta tag or X-Robots-Tag header. Typical fix: remove the noindex if the exclusion was unintentional (see the snippet after this list).
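
Two of those states are pure configuration, so they are worth seeing literally. A noindex can arrive as a meta tag or as an HTTP response header, and a canonical is a single link element; the URL below is a placeholder:

```html
<!-- In the <head>: keep this page out of the index -->
<meta name="robots" content="noindex">

<!-- The same instruction as a response header (useful for PDFs and other non-HTML files):
     X-Robots-Tag: noindex -->

<!-- Canonical: tell Google which URL is the preferred version of this content -->
<link rel="canonical" href="https://www.example.com/preferred-url">
```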

Crawl budget — when it matters

For sites with fewer than ~10,000 URLs, crawl budget rarely matters; Google can crawl your entire site frequently. For larger sites — e-commerce with deep faceted catalogs, marketplaces, programmatic SEO at scale — crawl budget becomes a real constraint.

Symptoms of crawl budget pressure:

  • New URLs taking weeks to be crawled and indexed.
  • Updated content not refreshing in the index for a long time.
  • Large numbers of URLs in “Discovered — currently not indexed”.
  • Crawl Stats showing the bot spending most of its quota on low-value URLs (faceted nav permutations, sort variants, filter combinations).
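
Crawl Stats aggregates this picture, but raw access logs show exactly which URLs are absorbing the budget. A rough sketch, assuming a combined-format log whose path will differ on your stack:

```python
import re
from collections import Counter
from urllib.parse import urlsplit

LOG_PATH = "/var/log/nginx/access.log"  # Placeholder: depends on your server setup

# In the combined log format the request is the quoted "GET /path HTTP/1.1" field.
request_re = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')

hits = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        # User-agent strings can be spoofed; verify with reverse DNS for a strict audit.
        if "Googlebot" not in line:
            continue
        match = request_re.search(line)
        if not match:
            continue
        url = urlsplit(match.group(1))
        # Bucket by path plus a marker for query strings, so faceted and sort
        # parameter variants stand out as a group.
        key = url.path + ("?<params>" if url.query else "")
        hits[key] += 1

for key, count in hits.most_common(20):
    print(f"{count:6d}  {key}")
```

If the top of that list is dominated by parameterized variants of a handful of templates, the mitigations below are where the budget comes back.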

Mitigations:

  • Disallow low-value URL parameters in robots.txt (example after this list). Noindex alone doesn’t reduce crawling, because Googlebot still has to fetch the page to see the tag.
  • Use canonical tags to consolidate duplicates instead of letting all variants get crawled.
  • Prune dead-weight URLs (long-tail product pages with no traffic, archive listings nobody reads).
  • Improve site speed — faster responses = more URLs crawled per session.
  • Use XML sitemaps to signal priority URLs.
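
For the robots.txt mitigation, the rules are short. A hypothetical example for a faceted catalog; the parameter names are placeholders for whatever your own filters and sorts use:

```
# Keep Googlebot out of filter and sort permutations
User-agent: *
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?color=
Disallow: /*&color=
```

URLs blocked this way can still show up as “Blocked by robots.txt” rows in the Pages report; that is expected and usually fine.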

The Search Console diagnostic workflow

When a page isn’t ranking and you suspect crawl/index issues, work this sequence:

  1. URL Inspection. Paste the URL and check the “URL is on Google” status. If it’s not indexed, the inspection tool tells you why.
  2. Pages report > filter to the relevant URL pattern. See which bucket the URL falls into (indexed, crawled-not-indexed, discovered-not-crawled, etc.).
  3. Crawl Stats report. Confirm Googlebot is reaching the site successfully, response codes are sane, and average response time is under a few seconds.
  4. Coverage trends. Sudden drops in indexed-page count are usually a robots.txt regression, a noindex tag rolled out site-wide, or a canonical pointing elsewhere.
  5. URL Inspection > Test live URL. Confirms the rendered HTML matches what you expect and checks whether the bot can render the content.
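
Steps 1 and 5 go faster if the self-inflicted blockers are ruled out first. A small pre-check sketch; the URL is a placeholder, and it only inspects headers and static HTML, not the rendered page:

```python
import re
import urllib.request
from urllib.error import HTTPError

URL = "https://www.example.com/page-that-should-rank"  # Placeholder

req = urllib.request.Request(URL, headers={"User-Agent": "index-precheck"})
try:
    resp = urllib.request.urlopen(req, timeout=30)
except HTTPError as err:  # 4xx/5xx: Googlebot sees the same thing
    print("Status code:", err.code)
    raise SystemExit(1)

html = resp.read().decode("utf-8", "replace")
print("Status code: ", resp.status)
print("Final URL:   ", resp.url)  # Differs from URL if a redirect happened
print("X-Robots-Tag:", resp.headers.get("X-Robots-Tag", "(none)"))

meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', html, re.I)
print("Meta robots: ", meta.group(0) if meta else "(none)")

canon = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]*>', html, re.I)
print("Canonical:   ", canon.group(0) if canon else "(none)")
```

A clean status, no stray noindex, and a canonical pointing at the URL itself narrow the remaining suspects to rendering and the index decision.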

The bottom line

Crawling and indexing are two stages, not one. A page can fail at discovery (no link, no sitemap), at crawl (robots block, 4xx/5xx), at render (JS issues), or at the index decision (quality, duplication, canonical). Each failure shows in a different Search Console surface and demands a different fix. Don’t guess — diagnose. The tooling is there; most teams just don’t use it systematically.

Common questions


Crawling is the discovery stage — Googlebot follows links, fetches HTML, and decides what to render. Indexing is the storage stage — after rendering, Google decides whether the page is worth keeping in the index, and stores it with extracted signals (content, schema, canonical, links). A page can be crawled but not indexed (Google saw it but rejected it) and a page can fail to be crawled at all (no internal link, blocked by robots, server returned an error). Different failures require different fixes.