Duplicate Content: What It Is and What to Do
Duplicate content itself isn't penalized. Google simply deduplicates — it picks one version to index and rank. The problem is you might not like which version it picks.
What Duplicate Content Is
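Duplicate content is substantially similar content appearing at multiple URLs, on the same domain or across domains. A few repeated sentences don't count; the concern is pages whose main body content is largely the same. Google publishes no similarity threshold, so treat a figure like 30% as a rough rule of thumb rather than a cutoff.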
Google's algorithm has no "duplicate content penalty" for ordinary, non-deceptive duplication. The cost is indirect: when Google chooses which version to index and rank, ranking signals can be split across the copies, and the version it chooses may not be the one you want searchers to land on.
Why Duplicate Content Happens
URL Parameters
Session IDs, tracking parameters, or sorting options create duplicate content. example.com/product?color=blue, example.com/product?color=red, and example.com/product create three URLs with nearly identical content. Google has to guess which one is primary.
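One way to see how many crawled URLs collapse into the same page is to strip the parameters that don't change the main content and group what remains. The sketch below is a minimal illustration, not a standard tool: the IGNORED_PARAMS list is an assumption you would replace with the parameters your own site actually uses, and the function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these parameters never change the main content of a page.
# Replace with the parameters your own site generates.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign", "sort", "color"}

def normalize_url(url: str) -> str:
    """Drop ignored query parameters and sort the rest for a stable key."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def group_duplicates(urls):
    """Map each normalized URL to the crawled URLs that collapse into it."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return groups
```

Passing the three product URLs from the example above to group_duplicates returns a single group keyed by example.com/product, which is the URL you would normally want treated as primary.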
HTTP vs HTTPS, WWW vs Non-WWW
http://example.com, http://www.example.com, https://example.com, and https://www.example.com are technically four different URLs that serve the same content. Without redirects to a single version, you have four copies of every page.
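To audit this, it helps to list all four protocol and hostname variants of a page so each one can be requested and checked for a redirect. The helper below is a small sketch that only builds the list; host_variants is a hypothetical name, and the actual HTTP requests are left out.

```python
from urllib.parse import urlsplit, urlunsplit

def host_variants(url: str) -> list[str]:
    """Return the http/https and www/non-www forms of a URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Strip a leading "www." so both hostname forms can be rebuilt.
    bare = host[4:] if host.startswith("www.") else host
    return [
        urlunsplit((scheme, prefix + bare, parts.path, parts.query, ""))
        for scheme in ("http", "https")
        for prefix in ("", "www.")
    ]
```

For https://www.example.com/page this returns the http and https forms of both example.com/page and www.example.com/page. In a healthy setup, three of the four answer with a 301 to the remaining one.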
Print Pages
Many sites create print-friendly versions. example.com/article and example.com/article?print=true are essentially the same content at different URLs.
Category and Tag Pages
E-commerce sites often have the same products accessible through multiple category paths. Shoes might be at /men/shoes, /shoes/men, and /footwear/shoes/men.
Content Syndication
Publishing your article on your site and Medium or LinkedIn creates duplicate content across domains. Google has to decide which version is original.
Pagination
Some pagination setups create duplicate content. If your pagination is misconfigured, pages 1-5 might contain mostly the same content with slightly different items.
The Duplicate Content Myth vs Reality
Myth: Google penalizes sites for duplicate content. Reality: Google deduplicates and indexes one version.
Myth: Duplicate content gets you banned. Reality: It doesn't. Google just ignores duplicates.
The real problem: you lose control over which version ranks. If Google picks the wrong version (or no version), your traffic suffers.
How to Fix Duplicate Content
Canonical Tags
A canonical tag tells Google which version you prefer: <link rel="canonical" href="https://example.com/preferred-url">
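Use canonical tags when multiple URLs serve the same content and you can't reduce them to one, for example filtered or sorted category pages, tracking-parameter URLs, and syndicated copies where the platform lets you set one. Google treats the tag as a strong hint, not a directive. Paginated series are a different case: each page lists different items, so give each page its own self-referencing canonical rather than pointing every page at page 1.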
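A quick check during an audit is to read back the canonical a page actually declares. The sketch below uses Python's standard html.parser; the class and function names are hypothetical, and it only inspects the HTML you hand it (it does not fetch pages or look at HTTP headers).

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> tags."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])

def find_canonicals(html: str) -> list[str]:
    """Return every canonical URL declared in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals
```

An empty list means the page declares no preference. More than one entry means the page sends conflicting signals, which is worth fixing before anything else.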
301 Redirects
If you have multiple versions and can consolidate them, use 301 redirects instead of canonicals. Redirect the unwanted versions to the preferred one. A redirect is a stronger signal than a canonical tag: visitors and crawlers only ever reach the preferred URL, so links pointing at the old versions consolidate there.
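The redirect rule itself is simple enough to express as a function: if a request arrives on a non-preferred variant, answer with a 301 to the same path on the preferred one. The sketch below shows only that decision logic. PREFERRED_SCHEME, PREFERRED_HOST, and KNOWN_HOSTS are assumptions for illustration; in practice the rule usually lives in your web server or CDN configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: the site prefers https://www.example.com
# and answers on both hostnames listed below.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
KNOWN_HOSTS = {"example.com", "www.example.com"}

def redirect_for(url: str):
    """Return (301, location) for a non-preferred variant, else None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in KNOWN_HOSTS:
        return None  # not one of our hostnames; leave it alone
    if parts.scheme == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # already the preferred version
    # Keep path and query so each old URL maps to its exact counterpart.
    location = urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, "")
    )
    return 301, location
```

Redirecting each URL to its exact counterpart in one hop, rather than sending everything to the homepage or chaining http to https to www, keeps the mapping between old and new URLs unambiguous.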
Robots.txt
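You can block URLs you don't want crawled. In robots.txt, under a User-agent line, add: Disallow: /*?color= (matches URLs whose query string starts with the color parameter). This saves crawl budget, but it is a weak fix for duplicates: a blocked URL can still be indexed if other pages link to it, and Google can't read a canonical tag on a page it isn't allowed to crawl. Prefer canonicals or redirects, and reserve robots.txt for parameters that generate large numbers of low-value URLs.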
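Before deploying a wildcard rule, it is worth testing which URLs it actually matches. The sketch below is a simplified matcher for a single rule that follows the common wildcard convention (* matches any run of characters, a trailing $ anchors the end). It is an approximation for illustration, not a full robots.txt parser: it ignores User-agent groups and Allow/Disallow precedence, and the function names are hypothetical.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path rule with * and $ into a regex."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and join them with ".*" for each wildcard.
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

def is_blocked(path_and_query: str, rule: str) -> bool:
    """True if the rule matches from the start of the path."""
    return rule_to_regex(rule).match(path_and_query) is not None
```

With the rule /*?color=, the path /product?color=blue is matched, but /product?size=1&color=blue is not, because there the color parameter follows an ampersand rather than the question mark. A second rule such as /*&color= would be needed to cover that form.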
Remove Duplicates
The best solution is eliminating duplicate content entirely. If you have three product pages for the same item, keep one and 301 redirect the other two to it so existing links and bookmarks still work. If you have print and web versions, serve one page and use a print stylesheet.
Set Preferred Domain
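Google retired the preferred domain setting in Search Console in 2019, so you can no longer declare www or non-www there. Pick one version and enforce it yourself: 301 redirect the other hostname to it, and use the preferred hostname consistently in canonical tags, internal links, and your sitemap.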
Near-Duplicate Content
Near-duplicates are pages with mostly the same content but slight variations. You might have individual product pages for blue-widget-small, blue-widget-medium, and blue-widget-large. The description is 95% identical.
Some near-duplicates are necessary (different product variants). Others are avoidable (boilerplate content repeated across pages). The question is: do the pages serve different search intents?
If searchers are looking for "blue widget small," "blue widget medium," and "blue widget large" as separate queries, then these are separate pages targeting different keywords. Keep them distinct, make each unique, and make size the differentiating factor.
If searchers simply look for "blue widget" and choose a size on the page, consolidate the variants into one page with a size selector. Multiple product variant pages usually underperform a single page with a variant selector.
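To put a number on how close two variant pages are, you can compare their text as sets of overlapping word sequences (shingles) and take the Jaccard similarity. This is a minimal sketch of one common approach, not the method any search engine is known to use; the shingle size of 3 and the function names are assumptions.

```python
import re

def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(a), shingles(b)
    if not set_a and not set_b:
        return 1.0  # two empty texts are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)
```

Run it on the main body text only, with navigation and footers stripped, or shared boilerplate will inflate every score. Note that one changed word alters three shingles, so on short descriptions the score drops faster than a word-by-word comparison would suggest.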
Syndication and Republishing
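If you republish your content on medium.com, linkedin.com, or other platforms, set a canonical pointing back to your original wherever the platform allows it. You can't edit the head of a hosted platform directly; Medium, for instance, sets the canonical through its import tool or story settings, and some platforms offer no option at all. The resulting tag looks like <link rel="canonical" href="https://yoursite.com/original-article">.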
This tells Google which version is original and prevents the syndicated version from ranking instead of yours.
Without a canonical, the syndicated copy often outranks your original because the platform's domain has more authority. Where a platform offers no canonical option, publish on your own site first and link from the syndicated copy back to the original.
Self-Plagiarism Across Your Own Sites
If you own multiple sites and republish content across them, you're creating cross-domain duplicate content. Google will pick which site to rank and might not choose yours.
Don't republish content across multiple domains you own. Keep it unique or use canonical tags pointing to the preferred version.
| Duplicate Scenario | Recommended Fix | Why |
|---|---|---|
| Same product at /men/shoes and /shoes/men | 301 redirect one to the other, or use canonical | Consolidates link equity on a single URL |
| HTTP and HTTPS versions both indexing | 301 redirect HTTP to HTTPS; use HTTPS URLs in canonicals, internal links, and sitemaps | Modern standard, ensures one version ranks |
| Article syndicated on Medium without canonical | Add canonical tag pointing to your site | Tells Google your version is original |
| Print and web versions | Use CSS for print styling, one URL | No need for duplicate content |
| Product with color variants (?color=) | Canonical to the parameter-free URL; use robots.txt only to limit crawling | Consolidates signals and reduces crawl waste on minor variants |
When Duplicate Content Is Okay
It's fine to have similar content across pages if each page serves a distinct search intent. A guide to "how to make sourdough" and a guide to "how to feed sourdough" overlap but target different queries, so each can rank on its own merits.
What's not fine: the exact same content at multiple URLs with no clear reason for separate pages.
Finding Your Duplicate Content
Use crawl tools like Screaming Frog or Semrush to find duplicate content on your site. Crawl your entire site and look for pages with high similarity.
Check for: identical H1 tags across pages, identical title tags, identical body content, similar word counts with little variation.
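If your crawler can export each page's title, H1, and body text, the exact-match checks above reduce to grouping pages by a hash of each field. The sketch below assumes a hypothetical export shaped as a dict of URL to field values; the function names and field names are illustrative, not any tool's actual format.

```python
import hashlib
from collections import defaultdict

def fingerprint(text: str) -> str:
    """Hash text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_exact_duplicates(pages: dict, field: str) -> list:
    """Return sorted URL groups whose given field is identical."""
    buckets = defaultdict(list)
    for url, page in pages.items():
        value = page.get(field, "")
        if not value.strip():
            continue  # skip missing fields so empty values aren't grouped
        buckets[fingerprint(value)].append(url)
    return [sorted(urls) for urls in buckets.values() if len(urls) > 1]
```

Call it once per field, for example find_exact_duplicates(pages, "title") and then again with "h1" and "body". Pages that appear in all three results are the strongest candidates for a redirect or canonical.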
Manually audit: parameter pages (sort, filter, session IDs), pagination pages, protocol/www variations, print versions, and any pages you've repurposed.