Indexation: Getting Pages Into Google
Crawling is just the first step. A page can be crawled but not indexed. Understanding what stops indexation — and how to verify your actual index status — is critical for diagnosing why pages aren't ranking.
Crawling vs Indexation: The Critical Difference
Crawling is Googlebot visiting a page and downloading it. Indexation is Google processing that page and storing it in the search database (the index). A page must be crawled before it can be indexed, but crawling does not guarantee indexation.
On many sites, Google crawls hundreds of thousands of pages but indexes far fewer. Google may decide a page is too low-quality, too similar to another page, or otherwise not worth storing in the index. This is the "crawl vs index gap" — pages that are crawled but not indexed.
What Stops a Page From Being Indexed
The noindex Tag
The noindex meta tag explicitly tells Google not to index a page. It looks like: <meta name="robots" content="noindex">. Even if Googlebot crawls the page, it will not add it to the search index. This is useful for duplicate pages, internal search results, admin pages, or draft content that you don't want to rank.
Common mistake: accidentally leaving noindex on important pages. If a canonical page is noindexed, it will not rank. Check your important pages regularly to ensure they don't have noindex tags.
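One way to audit for stray noindex tags is to parse each page's robots meta directive. Here is a minimal sketch using Python's stdlib html.parser; the `is_noindexed` helper and the sample HTML are illustrative, not a real crawler:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content of any <meta name="robots"> tags on a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if a.get("name", "").lower() == "robots":
            self.directives.append(a.get("content", "").lower())

def is_noindexed(html: str) -> bool:
    """True if any robots meta tag on the page contains 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in d for d in parser.directives)

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindexed(page))  # True — this page would be kept out of the index
```

In practice you would also check the X-Robots-Tag HTTP header, which can apply noindex without any markup in the page.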
Robots.txt Blocking
If robots.txt blocks a page from crawling, Google usually won't index it. (Google has stated that a blocked URL can still be indexed based on external links alone, but this is rare and unreliable.) The practical rule: if you block a page via robots.txt, assume it won't be indexed. Note the flip side too: Google cannot see a noindex tag on a page it isn't allowed to crawl, so don't combine robots.txt blocking with noindex and expect the noindex to take effect.
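You can verify whether a given URL is blocked using Python's stdlib urllib.robotparser, which answers the same question Googlebot asks. The rules below are hypothetical, for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())  # parse offline, without fetching anything

# can_fetch() answers: may this user agent crawl this URL?
print(rp.can_fetch("Googlebot", "https://example.com/admin/users"))    # False (blocked)
print(rp.can_fetch("Googlebot", "https://example.com/products/shoes")) # True (allowed)
```

In a real audit you would point `RobotFileParser.set_url()` at your live robots.txt and call `read()` instead of parsing a string.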
Canonical Tags Pointing Elsewhere
A canonical tag tells Google another URL is the "master" version. If Page A has a canonical pointing to Page B, Google will index Page B and not Page A (under most circumstances). This is intentional deduplication. But if you misapply a canonical, you can accidentally prevent a page from being indexed.
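A misapplied canonical is easy to catch programmatically: extract the rel="canonical" link and compare it to the page's own URL. A minimal sketch, again using stdlib html.parser (the helper names and sample URLs are hypothetical):

```python
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Finds the href of a <link rel="canonical"> tag, if present."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel", "").lower() == "canonical":
            self.canonical = a.get("href")

def canonical_points_elsewhere(html: str, page_url: str) -> bool:
    """True if the page declares a different URL as the master version."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical is not None and parser.canonical != page_url

# Page A declares Page B as the master version, so Page A won't be indexed.
page_a = '<head><link rel="canonical" href="https://example.com/page-b"></head>'
print(canonical_points_elsewhere(page_a, "https://example.com/page-a"))  # True
```

Run this over your important URLs: any page whose canonical points elsewhere is, by design, excluded from the index in favour of the target.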
Thin or Duplicate Content
Google may decide a page is too similar to another page, too thin (very little unique content), or too low-value to deserve a place in the index. This is Google's quality filter at work. If a page is thin, beef it up. If it's duplicate, use a canonical tag to consolidate or use 301 redirects.
No Incoming Links
Pages that are not linked to from anywhere on your site or the web are harder for Google to discover and prioritise for indexing. If you have orphaned pages (pages not linked to), link them internally. Every page should be reachable from your site's navigation or structure.
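Orphan detection is a reachability check: crawl your own internal link graph from the homepage and see which known pages were never reached. A sketch with a breadth-first search over a hypothetical link graph:

```python
from collections import deque

def find_orphans(link_graph, all_pages, start="/"):
    """Return pages in all_pages unreachable by internal links from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(all_pages) - seen)

# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/products", "/blog"],
    "/products": ["/products/shoes"],
    "/blog": [],
}
# Full page list, e.g. from your sitemap or CMS export.
pages = ["/", "/products", "/products/shoes", "/blog", "/old-landing-page"]
print(find_orphans(links, pages))  # ['/old-landing-page']
```

In practice the page list comes from your sitemap or CMS and the link graph from a crawl; any page the search returns needs an internal link added.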
How to Check Indexation Status
The site: Search
Use the site: operator in Google Search. For example, site:example.com shows pages Google has indexed for your domain. Compare the number of results to your expected page count. The count is approximate, but if far fewer pages are indexed than exist, you have an indexation problem.
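The operator also accepts paths and subdomains, which makes it handy for spot-checking specific sections (example.com here is a placeholder for your own domain):

```
site:example.com              all indexed pages on the domain
site:example.com/blog         indexed pages under the /blog section
site:staging.example.com      check whether a staging subdomain leaked into the index
```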
Google Search Console Index Coverage Report
GSC's Coverage report is more detailed. It shows:
- Valid (indexed): Pages successfully indexed
- Valid with warnings: Pages indexed despite an issue (for example, indexed though blocked by robots.txt)
- Excluded: Pages not indexed (and why: noindex, robots.txt, canonical elsewhere, low quality, etc.)
- Errors: Pages Googlebot couldn't crawl (4xx, 5xx errors, etc.)
If important pages are in the Excluded section, click to see the reason. That reason tells you exactly what to fix.
Common Indexation Problems
Important Pages Accidentally Noindexed
This happens. A developer might have put noindex on pages during development, then forgotten to remove it before launch. Or a CMS template accidentally applies noindex to a whole category of pages. Check GSC Coverage to find accidentally noindexed pages, then remove the noindex tag.
Parameter URLs Bloating the Index
E-commerce sites often struggle with this. A product page with filters creates URLs like /products/shoes?color=red&size=10. Each unique parameter combination is a different URL. If Google indexes all variants, you waste index space on duplicates. Use canonical tags to consolidate parameter variations to a single canonical URL.
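The consolidation logic is simple: map every parameter variant back to the bare path. A sketch with stdlib urllib.parse (note the assumption that all query parameters are droppable; real sites may need to keep meaningful ones like pagination):

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url: str) -> str:
    """Strip the query string so every filter variant maps to one canonical URL.

    Assumes all parameters are presentational (filters, sorting); keep any
    parameters that actually change the content.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(canonical_url("https://example.com/products/shoes?color=red&size=10"))
# https://example.com/products/shoes
```

The resulting URL is what goes in the rel="canonical" tag on every parameter variant of the page.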
Staging Sites Indexed
Staging servers sometimes get indexed by accident. If staging.example.com is indexed, you have duplicate content competing with your live site. Block staging from indexing with noindex or, better, HTTP authentication. Be careful with robots.txt alone: if staging pages are already indexed, blocking crawling means Google can never see a noindex and the URLs can linger in the index.
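One common approach is to send a noindex directive as an HTTP header on the whole staging host, so every response carries it regardless of markup. A hypothetical nginx config fragment (server name and paths are placeholders):

```nginx
server {
    listen 80;
    server_name staging.example.com;

    # Send noindex on every response: pages can be crawled,
    # but Google will drop them from (or keep them out of) the index.
    add_header X-Robots-Tag "noindex, nofollow" always;

    root /var/www/staging;
}
```

Unlike a robots.txt block, this still lets Googlebot crawl the pages, which is exactly what lets it see the noindex and remove already-indexed URLs.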
Large Crawl-to-Index Gap
If GSC shows many more crawled URLs than indexed URLs, Google is choosing not to index many of your pages. Reasons: thin content, duplicates not properly consolidated, low quality or authority, or pages not linked internally. Fix by improving content quality, consolidating duplicates, and linking important pages from your main structure.
The Index-Rank Relationship
Indexation is necessary but not sufficient for ranking. A page must be indexed to rank, but being indexed doesn't mean it will rank. After indexation comes ranking, which depends on relevance, backlinks, Core Web Vitals, and dozens of other factors. Think of indexation as getting your page into the game. Ranking is actually winning the game.