Indexation: Getting Pages Into Google
Crawling is just the first step. A page can be crawled but not indexed. Understanding what stops indexation — and how to verify your actual index status — is critical for diagnosing why pages aren't ranking.
Crawling vs Indexation: The Critical Difference
Crawling is Googlebot visiting a page and downloading it. Indexation is Google processing that page and storing it in the search database (the index). A page must be crawled before it can be indexed, but crawling does not guarantee indexation.
On many sites, Google crawls hundreds of thousands of pages but indexes far fewer. Google may decide a page is too low-quality, too similar to another page, or otherwise not worth storing in the index. This is the "crawl vs index gap" — pages that are crawled but not indexed.
What Stops a Page From Being Indexed
The noindex Tag
The noindex meta tag explicitly tells Google not to index a page. It looks like: <meta name="robots" content="noindex">. Even if Googlebot crawls the page, it will not add it to the search index. This is useful for duplicate pages, internal search results, admin pages, or draft content that you don't want to rank.
Common mistake: accidentally leaving noindex on important pages. If a canonical page is noindexed, it will not rank. Check your important pages regularly to ensure they don't have noindex tags.
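One way to audit for stray noindex tags is to parse each page's robots meta directive. Here is a minimal sketch using Python's stdlib html.parser; the `is_noindexed` helper and the sample HTML are illustrative, not a real crawler:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content of any <meta name="robots"> tags on a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if a.get("name", "").lower() == "robots":
            self.directives.append(a.get("content", "").lower())

def is_noindexed(html: str) -> bool:
    """True if any robots meta tag on the page contains 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in d for d in parser.directives)

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindexed(page))  # True — this page would be kept out of the index
```

In practice you would also check the X-Robots-Tag HTTP header, which can apply noindex without any markup in the page.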
Robots.txt Blocking
If robots.txt blocks a page from crawling, Google usually won't index it. (Google has stated that a blocked URL can still be indexed based on external links alone, but this is rare and unreliable.) The practical rule: if you block a page via robots.txt, assume it won't be indexed. Note the flip side too: Google cannot see a noindex tag on a page it isn't allowed to crawl, so don't combine robots.txt blocking with noindex and expect the noindex to take effect.
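You can verify whether a given URL is blocked using Python's stdlib urllib.robotparser, which answers the same question Googlebot asks. The rules below are hypothetical, for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())  # parse offline, without fetching anything

# can_fetch() answers: may this user agent crawl this URL?
print(rp.can_fetch("Googlebot", "https://example.com/admin/users"))    # False (blocked)
print(rp.can_fetch("Googlebot", "https://example.com/products/shoes")) # True (allowed)
```

In a real audit you would point `RobotFileParser.set_url()` at your live robots.txt and call `read()` instead of parsing a string.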
Canonical Tags Pointing Elsewhere
A canonical tag tells Google another URL is the "master" version. If Page A has a canonical pointing to Page B, Google will index Page B and not Page A (under most circumstances). This is intentional deduplication. But if you misapply a canonical, you can accidentally prevent a page from being indexed.
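A misapplied canonical is easy to catch programmatically: extract the rel="canonical" link and compare it to the page's own URL. A minimal sketch, again using stdlib html.parser (the helper names and sample URLs are hypothetical):

```python
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Finds the href of a <link rel="canonical"> tag, if present."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel", "").lower() == "canonical":
            self.canonical = a.get("href")

def canonical_points_elsewhere(html: str, page_url: str) -> bool:
    """True if the page declares a different URL as the master version."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical is not None and parser.canonical != page_url

# Page A declares Page B as the master version, so Page A won't be indexed.
page_a = '<head><link rel="canonical" href="https://example.com/page-b"></head>'
print(canonical_points_elsewhere(page_a, "https://example.com/page-a"))  # True
```

Run this over your important URLs: any page whose canonical points elsewhere is, by design, excluded from the index in favour of the target.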
Thin or Duplicate Content
Google may decide a page is too similar to another page, too thin (very little unique content), or too low-value to deserve a place in the index. This is Google's quality filter at work. If a page is thin, beef it up. If it's duplicate, use a canonical tag to consolidate or use 301 redirects.
No Incoming Links
Pages that are not linked to from anywhere on your site or the web are harder for Google to discover and prioritise for indexing. If you have orphaned pages (pages not linked to), link them internally. Every page should be reachable from your site's navigation or structure.
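Orphan detection is a reachability check: crawl your own internal link graph from the homepage and see which known pages were never reached. A sketch with a breadth-first search over a hypothetical link graph:

```python
from collections import deque

def find_orphans(link_graph, all_pages, start="/"):
    """Return pages in all_pages unreachable by internal links from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(all_pages) - seen)

# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/products", "/blog"],
    "/products": ["/products/shoes"],
    "/blog": [],
}
# Full page list, e.g. from your sitemap or CMS export.
pages = ["/", "/products", "/products/shoes", "/blog", "/old-landing-page"]
print(find_orphans(links, pages))  # ['/old-landing-page']
```

In practice the page list comes from your sitemap or CMS and the link graph from a crawl; any page the search returns needs an internal link added.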
How to Check Indexation Status
The site: Search
Use the site: operator in Google Search. For example, site:example.com shows pages Google has indexed for your domain. Compare the number of results to your expected page count. The count is approximate, but if far fewer pages are indexed than exist, you have an indexation problem.
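The operator also accepts paths and subdomains, which makes it handy for spot-checking specific sections (example.com here is a placeholder for your own domain):

```
site:example.com              all indexed pages on the domain
site:example.com/blog         indexed pages under the /blog section
site:staging.example.com      check whether a staging subdomain leaked into the index
```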
Google Search Console Index Coverage Report
GSC's Coverage report is more detailed. It shows:
- Valid (indexed): Pages successfully indexed
- Valid with warnings: Pages indexed despite an issue (for example, indexed though blocked by robots.txt)
- Excluded: Pages not indexed (and why: noindex, robots.txt, canonical elsewhere, low quality, etc.)
- Errors: Pages Googlebot couldn't crawl (4xx, 5xx errors, etc.)
If important pages are in the Excluded section, click to see the reason. That reason tells you exactly what to fix.
Common Indexation Problems
Important Pages Accidentally Noindexed
This happens. A developer might have put noindex on pages during development, then forgotten to remove it before launch. Or a CMS template accidentally applies noindex to a whole category of pages. Check GSC Coverage to find accidentally noindexed pages, then remove the noindex tag.
Parameter URLs Bloating the Index
E-commerce sites often struggle with this. A product page with filters creates URLs like /products/shoes?color=red&size=10. Each unique parameter combination is a different URL. If Google indexes all variants, you waste index space on duplicates. Use canonical tags to consolidate parameter variations to a single canonical URL.
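The consolidation logic is simple: map every parameter variant back to the bare path. A sketch with stdlib urllib.parse (note the assumption that all query parameters are droppable; real sites may need to keep meaningful ones like pagination):

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url: str) -> str:
    """Strip the query string so every filter variant maps to one canonical URL.

    Assumes all parameters are presentational (filters, sorting); keep any
    parameters that actually change the content.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(canonical_url("https://example.com/products/shoes?color=red&size=10"))
# https://example.com/products/shoes
```

The resulting URL is what goes in the rel="canonical" tag on every parameter variant of the page.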
Staging Sites Indexed
Staging servers sometimes get indexed by accident. If staging.example.com is indexed, you have duplicate content competing with your live site. Block staging from indexing with noindex or, better, HTTP authentication. Be careful with robots.txt alone: if staging pages are already indexed, blocking crawling means Google can never see a noindex and the URLs can linger in the index.
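One common approach is to send a noindex directive as an HTTP header on the whole staging host, so every response carries it regardless of markup. A hypothetical nginx config fragment (server name and paths are placeholders):

```nginx
server {
    listen 80;
    server_name staging.example.com;

    # Send noindex on every response: pages can be crawled,
    # but Google will drop them from (or keep them out of) the index.
    add_header X-Robots-Tag "noindex, nofollow" always;

    root /var/www/staging;
}
```

Unlike a robots.txt block, this still lets Googlebot crawl the pages, which is exactly what lets it see the noindex and remove already-indexed URLs.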
Large Crawl-to-Index Gap
If GSC shows many more crawled URLs than indexed URLs, Google is choosing not to index many of your pages. Reasons: thin content, duplicates not properly consolidated, low quality or authority, or pages not linked internally. Fix by improving content quality, consolidating duplicates, and linking important pages from your main structure.
The Index-Rank Relationship
Indexation is necessary but not sufficient for ranking. A page must be indexed to rank, but being indexed doesn't mean it will rank. After indexation comes ranking, which depends on relevance, backlinks, Core Web Vitals, and dozens of other factors. Think of indexation as getting your page into the game. Ranking is actually winning the game.