XML Sitemaps
An XML sitemap is a machine-readable file listing URLs you want indexed. It's a hint to search engines about what pages exist on your site. Think of it as a map you hand to Googlebot saying "here are the important pages — please crawl these."
What Is an XML Sitemap?
An XML sitemap is a file in XML format, typically placed at the root of your domain (usually /sitemap.xml), that lists URLs and optional metadata like:
- URL
- Last modified date (when the page last changed)
- Change frequency (how often it typically changes; Google ignores this value)
- Priority relative to other pages on your site (Google ignores this value too)
The syntax is simple XML. For example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-one</loc>
    <lastmod>2025-03-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
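If you generate the file programmatically, the structure above maps directly onto a few lines of code. The following is a minimal sketch using Python's standard library; the function name build_sitemap and the dictionary keys are illustrative assumptions, not part of any sitemap tooling.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (bytes) for a list of dicts with 'loc' and optional metadata."""
    # Every element lives in the one default namespace, so declaring
    # xmlns as a plain attribute on the root is enough here.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child; the other tags are optional.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


pages = [
    {"loc": "https://example.com/page-one", "lastmod": "2025-03-15",
     "changefreq": "weekly", "priority": 0.8},
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml.decode("utf-8"))
```

Building the tree with ElementTree rather than string concatenation means characters such as & in URLs are escaped automatically.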
When Do You Need a Sitemap?
Not every site needs a sitemap. A small site (under about 500 pages) with strong internal linking probably doesn't need one. But sitemaps are valuable for:
- Large sites (1000+ pages). Internal links alone may not reach all pages efficiently. A sitemap ensures all important URLs are discoverable.
- Sites with weak internal linking. If pages aren't well-linked from your main structure, a sitemap helps Google find them.
- New sites. Fresh domains with few external links benefit from a sitemap to accelerate discovery.
- Sites with frequently updated content. News sites, blogs, or e-commerce sites updating daily benefit from a sitemap with last-modified dates so Google knows what changed recently.
- Sites with pages inaccessible via standard navigation. For example, image galleries or video pages not linked in menus.
Sitemap Structure and Limits
An XML sitemap has size limits: a maximum of 50,000 URLs and a maximum file size of 50 MB uncompressed. For larger sites, use a sitemap index file that references multiple sitemaps. A sitemap index looks like:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-2.xml</loc>
  </sitemap>
</sitemapindex>
For an e-commerce site with 100,000 products, split the URLs across at least two sitemaps (each with 50,000 URLs max) and create one index file referencing all of them.
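The splitting step for that 100,000-product case can be sketched as follows. This is an illustration under stated assumptions: the product URL pattern and the helper names chunk_urls and build_index are hypothetical, and the sketch only enforces the URL-count limit, not the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_locs):
    """Return sitemap index XML (bytes) referencing each child sitemap URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locs:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Hypothetical catalogue of 100,000 product URLs.
product_urls = [f"https://example.com/products/{n}" for n in range(100_000)]
chunks = chunk_urls(product_urls)

# Child sitemaps reuse the sitemap-N.xml naming from the index example.
child_locs = [f"https://example.com/sitemap-{i}.xml"
              for i in range(1, len(chunks) + 1)]
index_xml = build_index(child_locs)
print(len(chunks), "child sitemaps")
```

Each chunk would then be written out as its own urlset file, and only the index URL needs to be submitted.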
What to Include and Exclude
Include
- Canonical URLs only. If a URL has a canonical tag pointing elsewhere, include the canonical URL, not the variant.
- Important pages you want indexed and ranked.
- Pages that change frequently (blogs, news, product updates).
Exclude
- Pages with noindex tags. Don't list them: they're explicitly marked as not for indexing, so listing them sends a conflicting signal.
- Paginated archive pages beyond page 1 (page 2, page 3, etc.). Include only the first page or the main listing. Crawlers reach subsequent pages through the pagination links.
- Parameter URLs or duplicate content. Use canonicals to consolidate; don't list duplicates.
- Login pages, admin pages, or pages with no indexable value.
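The include and exclude rules above can be sketched as a simple filter over crawl data. The Page record and its fields are hypothetical stand-ins for whatever your CMS or crawler exports, and the sketch assumes each canonical target also appears as its own record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    url: str
    canonical: Optional[str] = None  # target of rel="canonical", if any
    noindex: bool = False            # True if the page carries a noindex directive


def sitemap_urls(pages):
    """Return the de-duplicated URLs that pass the include/exclude rules."""
    selected = []
    seen = set()
    for page in pages:
        if page.noindex:
            continue  # never list noindex pages
        if page.canonical and page.canonical != page.url:
            continue  # variant: its canonical target is listed via its own record
        if page.url not in seen:
            seen.add(page.url)
            selected.append(page.url)
    return selected


pages = [
    Page("https://example.com/shoes", canonical="https://example.com/shoes"),
    Page("https://example.com/shoes?sort=price", canonical="https://example.com/shoes"),
    Page("https://example.com/login", noindex=True),
    Page("https://example.com/blog/"),
]
print(sitemap_urls(pages))
```

Here the parameter URL and the login page are dropped, leaving only the canonical product page and the blog listing.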
Image and Video Sitemaps
You can create specialised sitemaps for images and videos. These tell search engines which media files belong to which pages. Google uses image sitemaps to index images for Image Search, and video sitemaps help with video discovery. If your site is image- or video-heavy, consider these.
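As a rough illustration, an image sitemap is an ordinary sitemap with extra image entries nested under each page URL. The sketch below assumes Google's image sitemap extension namespace (http://www.google.com/schemas/sitemap-image/1.1) and its image:image and image:loc tags; the function name and the gallery data are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """Return sitemap XML (bytes) from a mapping of page URL -> list of image URLs."""
    # Declare both namespaces on the root; image tags use the "image:" prefix.
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:image": IMAGE_NS})
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, "image:image")
            ET.SubElement(image, "image:loc").text = image_url
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Hypothetical gallery page with two images.
gallery = {
    "https://example.com/gallery": [
        "https://example.com/img/one.jpg",
        "https://example.com/img/two.jpg",
    ],
}
image_sitemap_xml = build_image_sitemap(gallery)
print(image_sitemap_xml.decode("utf-8"))
```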
Submitting Your Sitemap
To submit a sitemap to Google Search Console:
- Log in to Google Search Console.
- Select your property.
- Navigate to Sitemaps (in the Indexing section).
- In the "Add a new sitemap" field, enter the sitemap URL (e.g., https://example.com/sitemap.xml).
- Click Submit.
You can also reference your sitemap in robots.txt by adding this line:
Sitemap: https://example.com/sitemap.xml
This tells search engines where to find it.
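To check that the robots.txt reference is picked up, you can parse the file the way a crawler would. This is a small sketch using Python's standard urllib.robotparser, assuming Python 3.8 or later (where site_maps() is available); the robots.txt content, including the Disallow rule, is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice you would fetch the live file.
robots_txt = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns a list of sitemap URLs, or None if no Sitemap line exists.
sitemaps = parser.site_maps()
print(sitemaps)
```

A robots.txt file may contain several Sitemap lines, in which case the list has one entry per line.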
Does a Sitemap Guarantee Indexation?
No. A sitemap is a hint, not a command. Submitting a sitemap tells Google "here are URLs I want indexed," but Google still evaluates each URL. If Google determines a page is low-quality, a duplicate, or marked noindex, it won't index it regardless of sitemap inclusion.
However, a quality sitemap increases the likelihood that important pages get crawled and indexed faster. It's not a magic solution, but it's a worthwhile signal for large or complex sites.