Technical SEO for E-commerce
Crawl budget on large catalogues, pagination handling, performance optimisation, and image management.
Crawl Budget at Scale
Crawl budget is finite. Google allocates it based on site age, authority, and server response speed. An established e-commerce site might receive 10,000+ crawl requests per day; a new site might get only a few dozen. Every crawl is an opportunity to index new content or re-index existing content, so budget wasted on low-value URLs means new products take longer to index.
Strategy: remove unnecessary URLs from crawlability. Noindex filter combinations, remove pagination parameter URLs from sitemaps, block internal search results from indexation. Prioritise indexable products and categories in your XML sitemap.
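The sitemap side of this strategy can be sketched as a filter: only indexable, self-canonical URLs make it into the XML file. The page dict keys (`url`, `indexable`, `canonical`) are illustrative, not from any particular platform:

```python
from xml.sax.saxutils import escape

def build_sitemap(pages):
    """Emit <url> entries only for indexable, self-canonical pages.

    `pages` is a list of dicts with hypothetical keys: url,
    indexable (bool), and canonical (the page's canonical URL).
    Noindexed pages, filter combinations, and pagination variants
    canonicalised elsewhere never make it into the sitemap.
    """
    entries = []
    for p in pages:
        # Skip anything noindexed or canonicalised to another URL
        if not p["indexable"] or p["canonical"] != p["url"]:
            continue
        entries.append(f"  <url><loc>{escape(p['url'])}</loc></url>")
    body = "\n".join(entries)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{body}\n</urlset>"
    )
```

Running this in the product-publish pipeline keeps the sitemap aligned with what you actually want crawled, rather than regenerating it from every route.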
Monitor crawl activity in GSC's Crawl Stats report. If Googlebot's crawl requests are trending down month over month while your URL count is stable or growing, you likely have a crawl budget problem. Remove low-value URLs from crawlability and check whether the trend recovers the following month.
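The month-over-month check is easy to automate once you export crawl totals from GSC. A toy sketch (the 20% threshold is an arbitrary assumption, not a Google figure):

```python
def crawl_budget_warning(monthly_crawls, threshold=0.2):
    """Flag a sharp drop in Googlebot crawl requests.

    `monthly_crawls` is a list of monthly crawl-request totals
    exported from the GSC Crawl Stats report, oldest first.
    Returns True when the latest month is down more than
    `threshold` (default 20%, an arbitrary cut-off) vs the prior month.
    """
    if len(monthly_crawls) < 2:
        return False
    prev, curr = monthly_crawls[-2], monthly_crawls[-1]
    return prev > 0 and (prev - curr) / prev > threshold
```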
Pagination Handling
Category pages with many products need pagination. Page 2 of results is a different URL from page 1. The question: should page 2 be indexable or not?
Google no longer recommends rel=prev/next. Current guidance: treat each paginated page as a standalone page with its own content. Page 2 should have: unique H1 ("See more running shoes"), unique meta description, unique introductory content, pagination indicating position ("Page 2 of 15"). Use canonical on each page pointing to itself (not to page 1).
If pages are identical except for the product grid, a canonical pointing to page 1 is an acceptable fallback. But it sacrifices SEO potential: each self-canonical page can accumulate its own long-tail rankings ("red running shoes page 2" might surface for a different set of queries than page 1), and a page-1 canonical gives all of that up.
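The per-page tag recipe above can be expressed as a small template function. Everything here (the function name, the `?page=` parameter, the title pattern) is illustrative rather than a framework API; the self-referencing canonical is the part that follows current Google guidance:

```python
from html import escape

def paginated_tags(category_name, base_url, page, total_pages):
    """Build the distinguishing tags for page N of a paginated category.

    Each page canonicalises to itself, and the title/H1 state the
    position so every page is a standalone, indexable document.
    """
    url = base_url if page == 1 else f"{base_url}?page={page}"
    title = f"{category_name} - Page {page} of {total_pages}"
    return "\n".join([
        f"<title>{escape(title)}</title>",
        f'<link rel="canonical" href="{escape(url)}">',  # self-referencing
        f"<h1>{escape('See more ' + category_name.lower())}</h1>",
    ])
```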
Image Optimisation at Scale
With thousands of product images, optimisation must be automatic. Manual compression for 50,000 images is not viable.
Implementation:
- Use a CDN (Cloudflare, Imgix, etc.) to serve images at optimal sizes for device type
- Implement lazy loading so images only load as they enter viewport
- Use WebP (smaller files, supported by all modern browsers) with a JPEG/PNG fallback for legacy clients
- Compress all source images before uploading (automate this in your product upload pipeline)
- Use srcset attributes to serve different image sizes for different screen widths
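The last three points combine into one piece of markup per image. A minimal sketch that generates it, assuming a hypothetical CDN that resizes and transcodes via `w` and `fmt` query parameters (real CDNs such as Cloudflare Images or Imgix use their own URL schemes):

```python
def picture_tag(image_id, alt, widths=(320, 640, 1280)):
    """Build a <picture> element: WebP first, PNG fallback, lazy-loaded.

    The CDN URL format (https://cdn.example/img/{id}?w=...&fmt=...) is an
    assumption for illustration; substitute your CDN's resize syntax.
    """
    def srcset(fmt):
        return ", ".join(
            f"https://cdn.example/img/{image_id}?w={w}&fmt={fmt} {w}w"
            for w in widths
        )
    return (
        "<picture>\n"
        f'  <source type="image/webp" srcset="{srcset("webp")}">\n'
        f'  <img src="https://cdn.example/img/{image_id}?w={widths[-1]}&fmt=png"\n'
        f'       srcset="{srcset("png")}" alt="{alt}" loading="lazy">\n'
        "</picture>"
    )
```

Because the browser picks the first `<source>` it supports, WebP-capable clients never download the PNG, and `loading="lazy"` defers off-screen images without any JavaScript.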
The ROI is clear: optimised images can cut one to three seconds off page load time. This improves Core Web Vitals, which feed into rankings. A two-second reduction in load time can plausibly drive a 10-15% traffic lift from improved rankings plus improved conversion (users bounce less), though the exact figure varies by site.
Site Speed for Large Catalogues
Large catalogues are inherently slower: more database queries, more images, more JavaScript. Every millisecond matters. Audit your site's Core Web Vitals:
- Largest Contentful Paint (LCP): Time until main content is visible. Target < 2.5 seconds.
- Interaction to Next Paint (INP), which replaced First Input Delay (FID) in 2024: responsiveness to user input. Target < 200ms.
- Cumulative Layout Shift (CLS): Visual stability. Target < 0.1.
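These thresholds can be checked programmatically against the PageSpeed Insights v5 API's field data. The key names below follow the `loadingExperience.metrics` shape of the PSI response as I understand it; verify against a live response before relying on them (note that the CLS percentile is reported as an integer equal to the score times 100):

```python
def cwv_report(psi_response):
    """Compare a URL's field Core Web Vitals to the 'good' thresholds.

    `psi_response` is the parsed JSON from the PageSpeed Insights v5
    API. Metric key names are assumptions from the documented
    loadingExperience format; confirm against the live API.
    """
    m = psi_response["loadingExperience"]["metrics"]
    lcp = m["LARGEST_CONTENTFUL_PAINT_MS"]["percentile"]        # ms
    inp = m["INTERACTION_TO_NEXT_PAINT"]["percentile"]          # ms
    cls = m["CUMULATIVE_LAYOUT_SHIFT_SCORE"]["percentile"] / 100
    return {"LCP": lcp <= 2500, "INP": inp <= 200, "CLS": cls <= 0.1}
```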
If your site fails these, identify bottlenecks: slow database queries, large JavaScript bundles, unoptimised images, slow third-party scripts (analytics, chat, ads). Prioritise by impact. Reducing image size often has the highest ROI.
Schema at Scale
With 10,000 products, schema must be generated dynamically from your database: JSON-LD templates render schema as each product page is built. Category pages need CollectionPage schema, and breadcrumb schema belongs on every page. WebSite schema with a SearchAction (usually placed on the homepage) describes your site search to Google, even though the search results pages themselves should stay noindexed.
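A minimal sketch of such a template, rendering Product JSON-LD from a database row (shown as a dict). The dict keys on `product` are illustrative; the schema.org vocabulary (`Product`, `Offer`, `priceCurrency`, availability URLs) is real:

```python
import json

def product_jsonld(product):
    """Render a Product JSON-LD script tag for a product page.

    `product` stands in for a database row; its field names
    (name, sku, image_urls, ...) are assumptions for illustration.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        "sku": product["sku"],
        "image": product["image_urls"],
        "offers": {
            "@type": "Offer",
            "price": str(product["price"]),
            "priceCurrency": product["currency"],
            "availability": "https://schema.org/InStock"
            if product["in_stock"] else "https://schema.org/OutOfStock",
        },
    }
    return f'<script type="application/ld+json">{json.dumps(data)}</script>'
```

Because every page runs through the same function, a single template fix propagates to the whole catalogue on the next deploy.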
Test a sample of pages monthly in Google's Rich Results Test. If you discover a validation error, fix the template and re-deploy to all pages. Do not let errors accumulate.
Common Technical Issues and Fixes
| Issue | Impact | Fix |
|---|---|---|
| Soft 404s (200 status on product pages with no product found) | Google confused about page type, reduced crawl budget | Return proper HTTP status (404 or 410) for missing products |
| Missing canonical tags | Duplicate content confusion, authority diluted across URL variants | Add canonical tag to every page pointing to itself or primary version |
| Broken internal links post-migration | Lost link equity, crawl budget wasted on 404s | Audit old URLs with backlinks; implement 301 redirects |
| JavaScript-only content (CSR) | Googlebot must render JS to see content, delayed indexation | Use SSR or SSG; ensure critical content is in HTML |
| No robots.txt or overly restrictive robots.txt | Googlebot blocked from crawling, reduced indexation | Create permissive robots.txt; disallow only low-value URLs |
| Slow server response time (TTFB > 600ms) | Page slow to load, poor Core Web Vitals | Optimise database queries, upgrade hosting, implement caching |
| Large or empty XML sitemap | Google wastes crawl budget on unnecessary URLs | Exclude noindexed pages, filters, pagination variants |
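The soft-404 fix in the first row reduces to one routing decision. A framework-agnostic sketch of that logic, assuming a hypothetical `discontinued` flag on the product record:

```python
def product_response_status(product):
    """Map a product lookup result to the HTTP status the page should return.

    `product` is None when the URL matches nothing, or a dict with a
    hypothetical `discontinued` flag when the record exists. Never
    return 200 with a "product not found" template -- that is a soft 404.
    """
    if product is None:
        return 404                      # never existed: Not Found
    if product.get("discontinued"):
        return 410                      # permanently removed: Gone
    return 200
```

Returning 410 for discontinued products tells Google the removal is deliberate, so the URL is dropped from the index faster than with a generic 404.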
Monitoring and Tooling
Use these tools to monitor technical health:
- Google Search Console: crawl stats, indexation, errors, Core Web Vitals
- Google PageSpeed Insights / CrUX: lab and field Core Web Vitals per URL
- Screaming Frog or Sitebulb: crawl audit for technical issues
- Google Analytics 4: engagement and conversion data to correlate user experience with revenue
Establish a monthly review: pull Core Web Vitals, crawl stats, indexation trends. Identify degradation. Investigate root cause. Most technical issues compound — small problems become large if unaddressed.
How This Connects
Technical SEO is the foundation. If Googlebot cannot crawl your site efficiently, if pages take 5 seconds to load, if schema is missing or broken, on-page optimisation and link building cannot compensate. Get the technical foundation right first, then layer in content and authority strategies.