Duplicate Content: What It Is and What to Do

Duplicate content itself isn't penalized. Google simply deduplicates — it picks one version to index and rank. The problem is you might not like which version it picks.

What Duplicate Content Is

Duplicate content is substantially similar content appearing at multiple URLs, on the same domain or across domains. A few repeated sentences don't count. Google doesn't publish a similarity threshold, so treat figures like "30% or more of the main content body" as a rule of thumb rather than a hard cutoff.

There's no "duplicate content penalty" in Google's algorithm for ordinary duplication like this. The cost is indirect: the version Google chooses to index and rank may not be the one you would have chosen.

Why Duplicate Content Happens

URL Parameters

Session IDs, tracking parameters, or sorting options create duplicate content. example.com/product?color=blue, example.com/product?color=red, and example.com/product create three URLs with nearly identical content. Google has to guess which one is primary.
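To see how many distinct pages sit behind a list of parameterized URLs, you can strip the parameters that don't change the main content and count what remains. The sketch below is one way to do that. The IGNORED_PARAMS set is an assumption made for illustration; which parameters are safe to ignore depends on your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that do not change the main content on this site.
IGNORED_PARAMS = {"color", "sort", "sessionid", "utm_source", "utm_medium", "utm_campaign"}


def strip_ignored_params(url: str) -> str:
    """Return the URL with ignored query parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


urls = [
    "https://example.com/product?color=blue",
    "https://example.com/product?color=red",
    "https://example.com/product",
]
unique = {strip_ignored_params(u) for u in urls}
print(unique)  # {'https://example.com/product'}
```

Three crawled URLs collapse to one, which is the URL you would normally name as the canonical.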

HTTP vs HTTPS, WWW vs Non-WWW

http://example.com, http://www.example.com, https://example.com, and https://www.example.com are four different URLs that usually serve the same content. Without proper redirects, you have four copies of every page.
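A quick way to audit this is to list the four variants of a page and request each one, checking that three of them redirect to the fourth. The helper below only builds the list; the function name is hypothetical and the requests themselves are left out.

```python
from itertools import product


def protocol_host_variants(bare_domain: str, path: str = "/") -> list[str]:
    """Build the http/https and www/non-www variants of one page."""
    return [
        f"{scheme}://{prefix}{bare_domain}{path}"
        for scheme, prefix in product(("http", "https"), ("", "www."))
    ]


for url in protocol_host_variants("example.com", "/pricing"):
    print(url)
```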

Print Pages

Many sites create print-friendly versions. example.com/article and example.com/article?print=true are essentially the same content at different URLs.

Category and Tag Pages

E-commerce sites often have the same products accessible through multiple category paths. Shoes might be at /men/shoes, /shoes/men, and /footwear/shoes/men.

Content Syndication

Publishing your article on your site and Medium or LinkedIn creates duplicate content across domains. Google has to decide which version is original.

Pagination

Some pagination setups create duplicate content. If your pagination is misconfigured, pages 1-5 might contain mostly the same content with slightly different items.

The Duplicate Content Myth vs Reality

Myth: Google penalizes sites for duplicate content. Reality: Google deduplicates and indexes one version.

Myth: Duplicate content gets you banned. Reality: It doesn't. Google filters the duplicates out of its results and shows one version.

The actual problem is that you lose control over which version ranks. If Google picks the wrong version (or no version), your traffic suffers.

The Real Problem

You have 10 URLs with identical product content. Google picks one to rank. You might have wanted a different one. That's the issue — not a penalty, but lost ranking opportunity.

How to Fix Duplicate Content

Canonical Tags

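A canonical tag tells Google which version you prefer. Place it in the head of every duplicate and use an absolute URL: <link rel="canonical" href="https://example.com/preferred-url">. Google treats it as a strong hint rather than a command, so keep your redirects, internal links, and sitemap consistent with it.

Use canonical tags when you have multiple URLs serving the same content but you can't reduce them to one. Examples: filtered category pages, URLs with tracking parameters, syndicated content. Paginated content is different: don't point pages 2 and beyond at page 1, because each page lists different items. Give each paginated page its own self-referencing canonical.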
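When auditing, it helps to confirm what canonical a page actually declares. Here is a minimal sketch using only Python's standard library. The class and function names are made up for this example, and fetching the HTML is left to you.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html: str) -> list[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


page = '<html><head><link rel="canonical" href="https://example.com/preferred-url"></head></html>'
print(find_canonicals(page))  # ['https://example.com/preferred-url']
```

A page that returns more than one canonical, or none, is worth a closer look.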

301 Redirects

If you have multiple versions and can consolidate them, use 301 redirects instead of canonicals. Redirect the unwanted versions to the preferred one. A redirect is a stronger signal than a canonical tag: visitors and crawlers never land on the old URL, and its link signals pass to the target.
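Redirects like this are usually configured in the web server or CDN, but the logic is simple enough to sketch as a small WSGI app. This is an illustration under assumptions, not a production setup: PREFERRED_HOST is a placeholder, and the app redirects every request that isn't already on https and the preferred host.

```python
PREFERRED_HOST = "www.example.com"  # assumption: the version you chose to keep


def redirect_app(environ, start_response):
    """Serve the page on https://PREFERRED_HOST, and 301 everything else to it."""
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST", "").lower()
    path = environ.get("PATH_INFO") or "/"
    query = environ.get("QUERY_STRING", "")

    if scheme == "https" and host == PREFERRED_HOST:
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"page content"]

    # Keep the path and query so each old URL maps to its exact counterpart.
    location = f"https://{PREFERRED_HOST}{path}"
    if query:
        location += "?" + query
    start_response("301 Moved Permanently", [("Location", location), ("Content-Length", "0")])
    return [b""]
```

Redirecting to the matching path, rather than sending everything to the homepage, is what lets each page keep its own signals.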

Robots.txt

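You can block URLs you don't want crawled. In robots.txt, under the relevant User-agent line, add: Disallow: /*?color= (blocks URLs whose query string starts with the color parameter; add Disallow: /*&color= to catch it in later positions). This stops Google from crawling the duplicates, but it is a blunt tool. Google can't read a canonical tag on a page it isn't allowed to crawl, and a blocked URL can still be indexed without its content if other pages link to it. Prefer canonicals or redirects for consolidation, and reserve robots.txt for cutting crawl waste.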
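Wildcard rules are easy to get subtly wrong, so it is worth testing a pattern against sample URLs before deploying it. The sketch below is a simplified model of the wildcard syntax Google documents for robots.txt, where * matches any sequence of characters and a trailing $ anchors the end of the URL. It ignores Allow rules and rule precedence, and the function names are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regular expression."""
    anchored_at_end = pattern.endswith("$")
    if anchored_at_end:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored_at_end else ""))


def is_blocked(path_and_query: str, disallow_pattern: str) -> bool:
    """True if the Disallow pattern matches the start of the path and query."""
    return robots_pattern_to_regex(disallow_pattern).match(path_and_query) is not None


print(is_blocked("/product?color=blue", "/*?color="))         # True
print(is_blocked("/product?size=m&color=blue", "/*?color="))  # False
print(is_blocked("/product?size=m&color=blue", "/*&color="))  # True
print(is_blocked("/product", "/*?color="))                    # False
```

The second call shows the gap in a single /*?color= rule: the parameter slips through when it is not first in the query string.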

Remove Duplicates

The best solution is eliminating duplicate content entirely. If you have three product pages for the same item, keep one and 301 redirect the other two to it. If you have print and web versions, serve one page with CSS for printing.

Set Preferred Domain

Search Console no longer has a preferred-domain setting; Google removed it in 2019. Pick one version (with or without www) and signal it consistently across your entire site: 301 redirect the other versions to it, and use it in your canonical tags, internal links, and sitemap.

Near-Duplicate Content

Near-duplicates are pages with mostly the same content but slight variations. You might have individual product pages for blue-widget-small, blue-widget-medium, and blue-widget-large. The description is 95% identical.
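To put a rough number on how similar two pages are, one common approach is to compare overlapping word sequences (shingles) using Jaccard similarity. The sketch below is only an illustration of that idea. It is not how Google measures duplication, and the shingle size of 3 is an arbitrary choice.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given length."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Shared shingles divided by all shingles; 1.0 means identical sets."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


small = "Blue widget, small. Durable steel body with a two-year warranty."
medium = "Blue widget, medium. Durable steel body with a two-year warranty."
print(round(jaccard_similarity(small, medium), 2))
```

On short texts a single changed word affects several shingles, so scores drop quickly. Run the comparison on full page bodies, not snippets.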

Some near-duplicates are necessary (different product variants). Others are avoidable (boilerplate content repeated across pages). The question is: do the pages serve different search intents?

If searchers are looking for "blue widget small," "blue widget medium," and "blue widget large" as separate queries, then these are separate pages targeting different keywords. Keep them distinct, make each unique, and make size the differentiating factor.

If searchers only look for "blue widget" and choose a size once they arrive, consolidate the variants into one page with a size selector. Multiple product variant pages usually underperform a single page with a variant selector.

Syndication and Republishing

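If you republish your content on medium.com, linkedin.com, or other platforms, have the republished copy declare a canonical pointing back to your original. Its head should contain <link rel="canonical" href="https://yoursite.com/original-article">. Hosted platforms rarely let you edit the head yourself, so use whatever the platform provides. Medium, for example, sets the canonical for you when you import a story from its original URL. Where a platform offers no canonical option, publish on your own site first and link back to the original from the republished copy.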

This tells Google which version is original and prevents the syndicated version from ranking instead of yours.

Without canonicals, syndicated versions often rank better than your original (because the syndication platform has more authority). Use canonicals to solve this.

Self-Plagiarism Across Your Own Sites

If you own multiple sites and republish content across them, you're creating cross-domain duplicate content. Google will pick which site to rank and might not choose yours.

Don't republish content across multiple domains you own. Keep it unique or use canonical tags pointing to the preferred version.

Duplicate Scenario	Recommended Fix	Why
Same product at /men/shoes and /shoes/men	301 redirect one to the other, or use a canonical	Consolidates link equity on one URL
HTTP and HTTPS versions both indexing	301 redirect HTTP to HTTPS, verify the HTTPS property in Search Console	Modern standard, ensures one version ranks
Article syndicated on Medium without canonical	Set the syndicated copy's canonical to your original URL	Tells Google your version is original
Print and web versions	Use CSS for print styling, one URL	Removes the second URL entirely
Product with color variants (?color=)	Canonical to the main product URL; robots.txt only if crawl volume is a problem	Consolidates signals; blocking saves crawl budget but hides the canonical tag

When Duplicate Content Is Okay

It's fine to have similar content across pages if each page serves a distinct search intent. A guide to "how to make sourdough" and a guide to "how to feed sourdough" are similar but target different queries. Both can rank, because each answers a different question.

What's not fine: the exact same content at multiple URLs with no clear reason for separate pages.

Finding Your Duplicate Content

Use crawl tools like Screaming Frog or Semrush to find duplicate content on your site. Crawl your entire site and look for pages with high similarity.

Check for: identical H1 tags across pages, identical title tags, identical body content, similar word counts with little variation.
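If your crawl tool can export one row per page, checks like these are easy to script. The sketch below groups URLs that share the same value in a given field after normalizing case and whitespace. The row layout (url, title, h1) is an assumption made for this example, not any particular tool's export format.

```python
import hashlib
from collections import defaultdict


def group_duplicates(pages: list[dict], field: str) -> list[list[str]]:
    """Return lists of URLs whose normalized value for `field` is identical."""
    groups = defaultdict(list)
    for page in pages:
        # Normalize case and whitespace so trivial differences don't hide matches.
        value = " ".join(page.get(field, "").lower().split())
        if not value:
            continue
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
        groups[digest].append(page["url"])
    return [urls for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl export.
pages = [
    {"url": "/article", "title": "Sourdough Basics", "h1": "Sourdough Basics"},
    {"url": "/article?print=true", "title": "Sourdough  basics", "h1": "Sourdough Basics"},
    {"url": "/feeding", "title": "Feeding a Starter", "h1": "Feeding a Starter"},
]
print(group_duplicates(pages, "title"))  # [['/article', '/article?print=true']]
```

Hashing keeps the comparison cheap when the field is a full page body rather than a short title.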

Manually audit: parameter pages (sort, filter, session IDs), pagination pages, protocol/www variations, print versions, and any pages you've repurposed.

Quick Audit

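In Google Search Console, open the "Pages" report under Indexing (formerly called "Coverage") to see how many pages Google has indexed. If your sitemap lists 500 pages but only 300 are indexed, you might have duplicate content issues. Investigate the missing 200 pages, starting with any listed under reasons such as "Duplicate without user-selected canonical".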
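To find out which pages are missing, compare the URLs in your sitemap with a list of indexed URLs. The sketch below assumes you already have the sitemap XML as a string and the indexed URLs as a plain list (for example, from a report export). Both inputs here are made-up sample data.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set[str]:
    """Extract every <loc> value from a standard XML sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def missing_from_index(xml_text: str, indexed_urls: list[str]) -> list[str]:
    """Sitemap URLs that do not appear in the indexed list, sorted."""
    return sorted(sitemap_urls(xml_text) - set(indexed_urls))


# Sample data for illustration only.
sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/product</loc></url>
  <url><loc>https://example.com/product?color=blue</loc></url>
  <url><loc>https://example.com/article</loc></url>
</urlset>"""
indexed = ["https://example.com/product", "https://example.com/article"]

print(missing_from_index(sitemap, indexed))  # ['https://example.com/product?color=blue']
```

A parameterized URL showing up as missing usually means it shouldn't have been in the sitemap in the first place.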