
What does canonical mean in SEO? | How to use canonical tags in SEO?

Author: Don jiang

The rel="canonical" tag tells search engines which URL is the canonical (preferred) version of a piece of content, so that ranking signals are not split across duplicate URLs.

In Google SEO, it is implemented by adding <link rel="canonical" href="canonical URL"> to the page's <head>.

Data shows that ecommerce sites that deploy the canonical tag correctly see an average 28% increase in the indexing rate of their product listing pages and a 40%-60% reduction in crawl frequency for duplicate URLs.

News sites that consolidate similar articles with canonical tags see an average 19% increase in search clicks on their core content.

However, field research shows that only 31% of websites use this tag fully correctly (common errors include pointing to the wrong URL, inconsistent protocols or domains, and multiple stacked tags).

[Figure: What is the canonical tag]

Why You Need to Use the Canonical Tag

In Google's day-to-day crawling, over 65% of websites show duplicate content issues caused by poorly designed URL structures.

Typical symptoms: the same article can be reached through

  • URLs with dynamic parameters (like ?utm_source=xxx)
  • Different directory forms (like /page/ and /page/index.html)
  • Different subdomains (like www and non-www)
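To make the three kinds of variants concrete, here is a minimal Python sketch that collapses them onto one preferred URL. The normalization policy (force https, prefer the www host, drop utm_* tracking parameters, treat a trailing index.html as its directory, discard fragments) and the helper name normalize_url are illustrative assumptions, not rules search engines apply; a real site would encode its own canonical policy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative policy (an assumption for this sketch): https + www host.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
SITE_HOSTS = {"example.com", "www.example.com"}


def normalize_url(url: str) -> str:
    """Map a duplicate URL variant onto the site's preferred (canonical) form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # www and non-www serve the same site, so fold both onto one host.
    if host in SITE_HOSTS:
        host = PREFERRED_HOST
    path = parts.path or "/"
    # /page/index.html serves the same content as /page/.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Drop tracking parameters (utm_*) but keep parameters that change content.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith("utm_")]
    # The fragment (#...) is never sent to the server, so it is discarded.
    return urlunsplit((PREFERRED_SCHEME, host, path, urlencode(kept), ""))


if __name__ == "__main__":
    for variant in (
        "https://example.com/page/?utm_source=newsletter",
        "http://www.example.com/page/index.html",
        "https://www.example.com/page/#comments",
    ):
        print(normalize_url(variant))  # https://www.example.com/page/ each time
```

The URL that such a function returns is what the page would then declare in its canonical tag.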

Google's John Mueller has repeatedly said in official Q&A sessions that when search engines find multiple URLs showing highly similar or identical content, they must decide which one should receive the ranking signals.

An ecommerce product page may generate dozens of different URLs through color filters and sorting parameters; a news article may be pushed into multiple categories, creating multiple entry links.

The canonical tag lets you tell search engines explicitly: "Although this content can be accessed through multiple URLs, please concentrate ranking signals and attention on the URL I specify."

How Duplicate Content Affects SEO

Duplicate content does not in itself lead to search engine penalties (Google explicitly states that websites are not penalized for simple content duplication), but it does cause split ranking signals.

When the same content can be accessed through multiple URLs, search engines treat these URLs as “different pages” and process them separately.

For example, an original article is displayed through these 4 URLs:

  • https://example.com/article
  • https://example.com/article?source=newsletter
  • https://example.com/article#comments
  • https://www.example.com/article (version with www)

Without canonical tags, search engines may crawl all 4 URLs and calculate ranking signals for each one separately.

But a user's query needs only one answer, so all 4 versions may rank poorly because the ranking signals are split; and even if one version happens to be indexed, the others can remain "not indexed" or "low ranking" for a long time.

On ecommerce websites, a product detail page may generate an average of 8-12 duplicate URLs through parameters (like ?size=XL, ?color=red), and these pages may take up 15%-20% of total crawl volume (crawl capacity that should go to more valuable new pages).

On news sites, an article pushed into multiple categories (like "Latest News", "Industry Trends", "Hot Recommendations") may end up with 3-5 different entry URLs.

A more specific case: before standardizing its URLs, a mid-sized ecommerce website had an indexing rate of only 62% for its product listing pages (out of every 100 pages, only 62 were indexed by Google and could potentially rank).

After it added canonical tags to parameterized listing pages (like ?category=shoes&sort=price) pointing to the parameter-free base URL (like /shoes), the indexing rate rose to 81% after 3 months, and organic search traffic to the corresponding products grew by 17%.

Not “Deleting Duplicates,” But “Specifying the Authoritative Version”

Many webmasters have a misunderstanding about the canonical tag, thinking it’s “used to delete duplicate pages.”

In reality, its core function is to tell search engines which URL, among multiple URLs showing the same content, is the version to prioritize for crawling, indexing, and ranking.

When you add the following code to the <head> section of a page:

<link rel="canonical" href="https://example.com/canonical-URL" />

You are sending search engines a clear signal: "Although this page (for example, the parameterized /article?source=email) also serves this content, I want you to concentrate all ranking signals and opportunities on https://example.com/canonical-URL."
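For sites that render <head> from templates, a small helper can guarantee the tag is always emitted in this form. The sketch below is a hypothetical Python helper (canonical_link_tag does not come from any framework); it rejects URLs without an explicit scheme and host and HTML-escapes the value so characters like & are safe inside the attribute.

```python
from html import escape
from urllib.parse import urlsplit


def canonical_link_tag(canonical_url: str) -> str:
    """Render the <link rel="canonical"> element for a page's <head>."""
    parts = urlsplit(canonical_url)
    # Refuse relative or protocol-relative URLs: require an explicit scheme and host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"canonical URL must be absolute: {canonical_url!r}")
    # Escape &, <, > and quotes so the URL is safe inside the attribute value.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'
```

A template would call it once per page, for example canonical_link_tag("https://example.com/canonical-URL"), and place the result inside <head>.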

According to Google's official documentation and observed crawl data:

  • Crawling level: search engines still crawl every version of the page (including parameterized and directory variants), but use the canonical tag to adjust how important they consider each one. Parameterized URLs may still be crawled, but crawlers will not visit them as often or index them as deeply as the canonical version.
  • Indexing level: if multiple URLs have highly similar content (over 80% duplication), search engines usually index the canonical version; the other versions may not be indexed individually, and even if they are, they will not compete for core rankings.
  • Ranking signals level: when external links point to any duplicate URL, search engines follow the canonical directive and "transfer" or "associate" that link signal with the canonical version (the transfer is not 100% complete, but it is close in most cases).

Here is a practical scenario: an article on a blog was published in both the "Homepage Featured" and "Tech Column" sections, generating two URLs:

  • https://example.com/home/recommend/123 (homepage featured entry)
  • https://example.com/tech/article/123 (tech column entry)

Both have identical content, but the homepage featured URL attracted some external links because it gets more traffic.

Without canonical tags, search engines might treat these two pages as independent content. Even though the homepage featured URL has external links, its category context is not topically focused (homepage featured sections usually mix general content), so its ranking potential may be lower than that of the tech column URL.

If the tech team adds canonical tags to both pages pointing to https://example.com/tech/article/123, which better matches the content's topic, search engines will know clearly that the authoritative version of this content is the tech column URL. They will associate the homepage featured URL's external link signals with it as well, making the page more competitive for technology-related keywords.

What Happens If You Don’t Use the Canonical Tag

Crawl budget is wasted

Search engines allocate each website a limited crawl capacity (called "crawl budget") and prioritize important pages (like the homepage and frequently updated content pages).

If a website has a large number of duplicate URLs (for example, product detail pages whose 10 sorting parameters generate 1,000+ different URLs), crawlers spend part of the budget on these "same content, different URL" pages, and truly important new pages (like newly listed products or updated news) get crawled less often.

For example, an analysis of a clothing ecommerce website's crawler logs found that parameterized duplicate product pages (like ?size=M, ?color=blue) accounted for 22% of total crawl volume, and these pages had a bounce rate as high as 85% (users searching for specific products rarely enter through parameter URLs).

After this website uniformly added canonical tags to product detail pages (pointing to base URLs without parameters), crawler frequency for core product pages increased by 30%, and the time for newly listed products to be indexed decreased from an average of 7 days to 3 days.
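A crawl-share figure like the 22% above can be estimated from raw server logs. This sketch assumes the common Apache/Nginx "combined" log format and identifies Googlebot by its user-agent string alone (a real audit should also verify the crawler, for example via reverse DNS); parameterized_crawl_share is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit

# Assumed Apache/Nginx "combined" log format; only the request path and user agent are used.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def parameterized_crawl_share(lines) -> float:
    """Fraction of Googlebot requests whose URL carries a query string."""
    total = with_params = 0
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip malformed lines and traffic whose user agent is not Googlebot.
        if match is None or "Googlebot" not in match.group("ua"):
            continue
        total += 1
        if urlsplit(match.group("path")).query:
            with_params += 1
    return with_params / total if total else 0.0
```

Because iterating an open file yields one line at a time, the function can be pointed directly at a log file object (the file name would be whatever the server writes).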

Inconsistent indexed versions, unstable rankings

Without canonical tags, search engines choose a "default display version" on their own, and that choice is not fixed.

For example, when users search for a certain keyword, they sometimes see the www version (https://www.example.com/page), sometimes the non-www version (https://example.com/page), and sometimes even a parameterized version (https://example.com/page?from=social).

Case: a local service website's "Contact Us" page existed in two versions at once, https://example.com/contact and https://example.com/contact-us (identical content), with no canonical tag set. Google indexed the two URLs at different times, so users searching for "XX city repair service contact info" sometimes saw the first version ranking higher and sometimes the second.

When users clicked through to the non-primary version (contact-us), conversion rates could drop because of differences in page navigation design (like a missing online booking button).

Later, the website added canonical tags to both versions pointing to https://example.com/contact. After 3 months, the page’s ranking improved, and search click-through rate (CTR) increased by 11%.

External link ranking signals are split

If external websites link to several duplicate URLs (for example, someone reposts the content using a parameterized URL, or category pages generate new links when content is pushed out), those external links end up scattered across different addresses, and search engines may not consolidate the ranking signals on their own.

Data comparison: an article about "postgraduate entrance exam strategies" on an education website was reposted by 5 external sites; 3 linked to the parameter-free version (https://example.com/guide/kaoyan), and 2 linked to the parameterized version (https://example.com/guide/kaoyan?from=partner).

Without canonical tags, search engines would associate these 5 external links separately to different URLs. After the website added canonical tags to all versions (pointing to the non-parameter version), organic search traffic for this page increased by 24% within 6 months.
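The effect in this example can be modeled, very roughly, by counting inbound links per target URL before and after variants are folded onto their canonical. This is a simplified sketch: search engines do not simply add up link counts, and the rule used here to find the canonical (drop the query string) is an assumption that fits this example only.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Assumed canonical rule for this example only: drop query string and fragment."""
    p = urlsplit(url)
    return urlunsplit((p.scheme, p.netloc, p.path, "", ""))


def links_per_url(link_targets, canonicalize=None) -> Counter:
    """Count inbound links per target URL, optionally folding variants onto a canonical."""
    if canonicalize is None:
        return Counter(link_targets)
    return Counter(canonicalize(url) for url in link_targets)


# The five reposts from the example: 3 to the clean URL, 2 to the ?from=partner URL.
targets = (["https://example.com/guide/kaoyan"] * 3
           + ["https://example.com/guide/kaoyan?from=partner"] * 2)

print(links_per_url(targets))               # split: 3 and 2 across two URLs
print(links_per_url(targets, strip_query))  # consolidated: 5 on one URL
```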

Basic Syntax and Writing of the Canonical Tag

About 32% of pages place the canonical tag in the <body> section (instead of the required <head> area), 19% omit the protocol in the href value (like writing only example.com instead of https://example.com), and in another 15% of cases the duplicate URLs of one piece of content point to different "canonical versions" (which confuses search engines).

From a technical standpoint, the canonical tag is just a simple HTML link element, but its placement (it must be in <head>), its syntax (it must strictly follow HTML rules), and its target URL (which must exactly match the actual content and be accessible) are all critical.

Data shows that when canonical tags are deployed in the standard way (placed near the top of <head>, using the full HTTPS protocol, and pointing to a unique, correct canonical URL), search engines correctly recognize and apply the tag more than 95% of the time.

Among pages with errors in the tag, about 60% have their canonical intent ignored by search engines, so the duplicate content issues persist.

For example, when an ecommerce website added a canonical tag to a product detail page variant (like the ?color=red version) but left out the protocol prefix (writing //example.com/product or example.com/product), Google failed to parse the target URL correctly.

Standard Syntax Structure

The complete syntax for the canonical tag is a single line of HTML: <link rel="canonical" href="https://www.example.com/canonical-page-full-URL" />

This line consists of 3 core parts, all of them indispensable (the order of the two attributes inside the tag does not matter):

Tag type: <link>

  • This is the HTML element used to define a relationship between the document and an external resource. The canonical tag is one type of "link relationship", so it must use <link> as its base structure.

Attribute: rel="canonical"

  • rel is a required attribute of the <link> tag and states how the linked URL relates to the current document. Setting it to canonical tells search engines explicitly: "This tag defines the canonical (authoritative) version of the current page's content."

Attribute: href="URL"

  • href is the other required attribute of the <link> tag and specifies the URL of the canonical version. This URL must be complete and accessible, including the protocol (http or https), domain (www or non-www), path, and parameters (if needed).

For example:

  • Correct: href="https://www.example.com/products/shoes"
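  • Wrong 1 (missing protocol): href="//www.example.com/products/shoes" (a protocol-relative URL; browsers fill in the protocol, but search engines may not resolve it as intended)
  • Wrong 2 (missing domain): href="/products/shoes" (a relative path; Google accepts relative paths but recommends absolute URLs, because a relative path is resolved against whatever URL the page was crawled on and is easy to get wrong)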
  • Wrong 3 (typo): href="https://www.exaple.com/products/shoes" (domain misspelled, pointing to a non-existent page)
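The wrong patterns above can be caught automatically before deployment. Here is a minimal sketch using Python's urllib.parse (Python 3.9+), assuming the expected host (here www.example.com) is known in advance; check_canonical_href is a hypothetical helper, and a misspelled domain is only detectable because that expected host is supplied.

```python
from urllib.parse import urlsplit


def check_canonical_href(href: str, expected_host: str) -> list[str]:
    """Return the problems found in a canonical href; an empty list means it passed."""
    problems = []
    parts = urlsplit(href)
    if href.startswith("//"):
        problems.append("protocol-relative URL: add https://")
    elif not parts.scheme:
        problems.append("relative URL: use an absolute URL with scheme and host")
    elif parts.scheme != "https":
        problems.append(f"scheme is {parts.scheme!r}, expected 'https'")
    # A mistyped domain shows up as a host mismatch (only because expected_host is known).
    if parts.netloc and parts.netloc.lower() != expected_host:
        problems.append(f"host {parts.netloc!r} does not match {expected_host!r}")
    return problems


for href in (
    "https://www.example.com/products/shoes",
    "//www.example.com/products/shoes",
    "/products/shoes",
    "https://www.exaple.com/products/shoes",
):
    print(href, check_canonical_href(href, "www.example.com"))
```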

Other details:

  • Include a trailing slash in the URL only if the canonical URL itself uses one; /page and /page/ are technically different URLs, although modern search engines are fairly tolerant as long as the site is consistent.
  • Keep the tag on one line (line breaks inside the tag are valid HTML but may trip up some parsing tools, although search engines can usually handle them).
  • The tag closes with /> (a self-closing tag; the HTML5 standard allows omitting the final /, but keeping it is recommended for compatibility).

Why It Must Be in <head>

When search engine crawlers fetch a page, they parse the <head> area first (especially control directives like meta information, titles, and canonical tags) and then process the actual content in <body>.

If the canonical tag is mistakenly placed inside <body> (like nested in article paragraphs or footer code), search engines simply ignore it.

Other notes:

  • A page should have only one canonical tag (if several conflicting ones appear, Google may ignore all of them, so the canonical intent is lost).
  • The tag must not be placed inside other page elements (for example, inside a <div> or <script>).
  • For dynamically generated pages (like pages rendered by backend languages such as PHP or Python), make sure the template engine inserts the canonical tag into the <head> area when it outputs HTML (usually controlled via template variables).
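The placement and single-tag rules can be checked with the standard library's html.parser. This is a simplified sketch (CanonicalAudit and audit_canonical are hypothetical names): it only tracks explicit <head> and <body> tags and does not reproduce the full HTML parsing rules, under which some elements (for example a stray <div>) implicitly end <head> early.

```python
from html.parser import HTMLParser


class CanonicalAudit(HTMLParser):
    """Collect every <link rel="canonical"> and note whether it sits inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # (href, inside_head) tuples, in document order

    def handle_starttag(self, tag, attrs):
        # <link ... /> also lands here via HTMLParser's default handle_startendtag.
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False
        elif tag == "link":
            attributes = dict(attrs)
            # rel may hold several space-separated tokens, so split before checking.
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_tokens:
                self.found.append((attributes.get("href"), self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def audit_canonical(html: str) -> list[str]:
    """Report missing, duplicated, or misplaced canonical tags."""
    parser = CanonicalAudit()
    parser.feed(html)
    parser.close()
    issues = []
    if not parser.found:
        issues.append("no canonical tag")
    elif len(parser.found) > 1:
        issues.append(f"{len(parser.found)} canonical tags (keep exactly one)")
    for href, inside_head in parser.found:
        if not inside_head:
            issues.append(f"canonical outside <head>: {href}")
    return issues
```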

5 Most Common Mistakes

Mistake 1: Pointing to the Wrong URL (canonical version doesn't match the content)

  • Symptom: the canonical tag points to a URL whose content is not identical (or is entirely different). For example, a product detail page showing red shoes has its canonical pointing to the white shoes page.
  • Consequence: search engines follow the wrong directive and concentrate ranking signals on an unrelated page, so the core content's ranking declines.
  • Fix: check the page's actual content and make sure the URL in href points to the canonical version that displays exactly the same content (for example, consistently using the parameter-free base URL, or the category page that best matches user search intent).

Mistake 2: Missing Protocol Prefix (domain only, or a relative path)

  • Symptom: code written as href="//example.com/page" (protocol-relative path) or href="/page" (relative path).
  • Consequence: search engines may not resolve the complete target URL as intended (especially across protocols or domains), so the canonical intent fails.
  • Fix: always use the full protocol + domain + path, for example href="https://www.example.com/page" (the https protocol is recommended for security).

Mistake 3: Parameter URLs Conflict with the Canonical Version

  • Symptom: the parameter-free version of a product listing page (https://example.com/products) is the canonical version, but a parameterized version (like https://example.com/products?sort=price) points not to it but to another URL with different parameters (like ?sort=date).
  • Consequence: parameter versions point to different URLs, forming canonical chains or loops ("circular canonicals") and splitting ranking signals.
  • Fix: point the canonical of every parameterized URL at the parameter-free base version (or the most commonly used sort/filter version), so that all variants share the same canonical address.
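Given a crawl export that maps each URL to its declared canonical, the conflicts described in Mistake 3 can be detected mechanically. This sketch (Python 3.9+) assumes such a mapping is available as a dict and treats URLs that share a path as variants of the same listing; both assumptions, and the function names, are illustrative.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_conflicts(canonical_of: dict[str, str]) -> dict[str, set[str]]:
    """Paths whose URL variants declare more than one canonical target."""
    targets = defaultdict(set)
    for page, canonical in canonical_of.items():
        # Assumption: URLs sharing a path are variants of the same listing page.
        targets[urlsplit(page).path].add(canonical)
    return {path: t for path, t in targets.items() if len(t) > 1}


def canonical_chains(canonical_of: dict[str, str]) -> list[tuple[str, str, str]]:
    """(page, target, target's own canonical) wherever the target points elsewhere."""
    return [
        (page, target, canonical_of[target])
        for page, target in canonical_of.items()
        if target in canonical_of and canonical_of[target] != target
    ]


# The situation from Mistake 3: ?sort=price points at ?sort=date instead of the base URL.
site = {
    "https://example.com/products": "https://example.com/products",
    "https://example.com/products?sort=price": "https://example.com/products?sort=date",
    "https://example.com/products?sort=date": "https://example.com/products",
}
print(canonical_conflicts(site))
print(canonical_chains(site))
```

Once every variant points straight at https://example.com/products, both functions return empty results.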

Mistake 4: Tag Placed Inside <body>

  • Symptom: while editing the page in the CMS backend, someone pasted the canonical code into the article content area (the <body> section) instead of the site template's <head> area.
  • Consequence: search engine crawlers may ignore the tag, so duplicate pages are not canonicalized correctly.
  • Fix: ask the tech team to check the template files (like WordPress's header.php or Shopify's theme.liquid) and make sure the canonical tag is output inside the HTML <head>.

Mistake 5: Multiple Canonical Tags Stacked

  • Symptom: because of template errors or manual additions, a page contains multiple <link rel="canonical"> tags (for example, one pointing to /page and another to /page/).
  • Consequence: faced with conflicting canonical tags, search engines may ignore some or all of them, leaving the canonical intent unclear.
  • Fix: check the code and remove the extra canonical tags so that each page has only one canonical directive.

Differences Between Canonical and Other Tags (like noindex and 301 Redirects)

The canonical tag specifies the authoritative version of the same content (all URLs are kept, but ranking signals are concentrated); the noindex tag prohibits search engines from indexing the current page (crawling is allowed, but the page is not displayed); and a 301 redirect permanently sends an old URL to a new URL (traffic and ranking signals are transferred completely).

Essential Differences Between Canonicalizing, Blocking, and Redirecting

Canonical tag: used when the same content has multiple URLs. Its purpose is to tell search engines: "These pages are actually the same; pay attention only to the URL I specify (the canonical version) and concentrate ranking signals there."

  • Typical scenarios: ecommerce product detail pages with parameters (like ?color=red and ?color=blue), news articles pushed into multiple categories (like "Latest News" and "Industry Trends"), and separate mobile and desktop URLs with identical content.

noindex tag: used when crawling is allowed but display should be blocked. It tells search engines: "You can crawl this page, but don't put it in the search index."

  • Typical scenarios: internal management pages (like login pages and backend statistics pages), temporary event pages (no ranking needed after the event ends), and low-value content pages (like print versions or simplified/traditional Chinese conversion pages).

301 redirect (permanent redirect): used when content has been moved permanently. Server configuration (like .htaccess files or Nginx rules) automatically sends users and search engines from the old URL to the new URL. The old URL's ranking signals (including rankings, external links, and user trust) gradually transfer to the new URL; eventually the old URL may no longer be accessed directly, though the redirect remains in effect.

  • Typical scenarios: a domain change (like from example.com to newexample.com), URL structure adjustments (like /old-product/ changed to /products/new-product/), and consolidating multiple old pages into one new page.

Tool | Allows crawling | Allows indexing | Changes URL | Core purpose
--- | --- | --- | --- | ---
canonical | ✅ Allows | ❌ Suggests not indexing (but may still index) | ❌ No change | Concentrate ranking signals from multiple identical content versions onto the canonical version
noindex | ✅ Allows | ❌ Prohibits | ❌ No change | Prevent the page from appearing in search results
301 redirect | ❌ Auto redirects | ❌ Old URL not indexed | ✅ Redirects to new URL | Transfer the old URL's ranking signals and traffic to the new address

4 Common Scenarios Compared, and Which Tool to Use

Scenario 1: Same Content Has Multiple URLs (like product pages with parameters)

  • Problem: a product detail page can be accessed via https://example.com/product and https://example.com/product?color=red, with completely identical content.
  • Correct tool: canonical. Add a canonical tag to the URL with parameters (?color=red) pointing to the parameter-free base URL (https://example.com/product), telling search engines that the authoritative version of this content is the parameter-free page.
  • Why not noindex or 301: noindex would keep the parameterized page out of the index (it may still be crawled) without telling search engines which version is the main one, even though users may still arrive through that link; a 301 redirect would force users and crawlers to the base URL, but users may genuinely need the parameters (for example, to compare colors), so a forced redirect is not appropriate.

Scenario 2: Page No Longer Needs to Appear in Search Results (like expired event pages)

  • Problem: a promotional event page (https://example.com/promo) has ended but may still be reached through bookmarks or external links, and it no longer needs to rank.
  • Correct tool: noindex. Add a <meta name="robots" content="noindex"> tag to the event page's <head> (or set it via the CMS), allowing search engines to crawl the page (for example, to check the event record) but not to add it to the index.
  • Why not canonical or 301: canonical cannot stop a page from appearing (it only concentrates ranking signals); a 301 redirect requires a new target URL (but the event page has no corresponding new address), and users may still need the original page to view historical information.

Scenario 3: Website Changes Domain or Adjusts URL Structure (like migrating old product pages)

  • Problem: an old product page (https://old.example.com/item1) has been permanently moved to a new address (https://new.example.com/products/item1), and its external link signals and users' access habits need to be preserved.
  • Correct tool: 301 redirect. Set it up in the server configuration (like Apache's .htaccess file) so that users or crawlers requesting the old URL are automatically redirected to the new URL. The old URL's ranking signals and external links gradually transfer to the new URL.
  • Why not canonical or noindex: canonical cannot redirect traffic (users stay on the old URL); noindex keeps the old URL out of the index, but its external link signals do not transfer, and users following old links cannot reach the new content.

Scenario 4: Separate Mobile and Desktop URLs (like m.example.com and www.example.com)

  • Problem: the same content has separate URLs on mobile (https://m.example.com/page) and desktop (https://www.example.com/page), with completely identical content.
  • Correct tool: prefer canonical (pointing to the desktop URL), or switch to responsive design with unified URLs. If the mobile version is a necessary entry point (for example, users are used to visiting m.example.com), add a canonical tag on the mobile page pointing to the desktop canonical URL, and optionally 301-redirect some old mobile links to the desktop.
  • Why not noindex: noindex would keep either the mobile or the desktop version out of the index, which may leave some searches unserved (for example, mobile searchers not seeing the mobile-adapted content).
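The four scenarios boil down to a short decision order. The sketch below encodes it in Python as a rough rule of thumb; the three yes/no questions and their order are an interpretation of the scenarios above, not an official decision procedure.

```python
from enum import Enum
from typing import Optional


class Tool(Enum):
    CANONICAL = 'rel="canonical"'
    NOINDEX = 'meta robots "noindex"'
    REDIRECT_301 = "301 redirect"


def choose_tool(moved_permanently: bool, keep_out_of_results: bool,
                duplicate_of_other_url: bool) -> Optional[Tool]:
    """Rough rule of thumb distilled from the four scenarios (an interpretation)."""
    if moved_permanently:        # Scenario 3: content now lives at a new address
        return Tool.REDIRECT_301
    if keep_out_of_results:      # Scenario 2: page stays reachable but should not rank
        return Tool.NOINDEX
    if duplicate_of_other_url:   # Scenarios 1 and 4: same content, several URLs
        return Tool.CANONICAL
    return None                  # no duplication problem to solve


print(choose_tool(False, False, True).value)  # parameterized product page
print(choose_tool(False, True, False).value)  # expired promo page
print(choose_tool(True, False, False).value)  # migrated product page
```

Checking "moved permanently" first reflects Scenario 3: once content has a new home, a redirect serves users and crawlers better than either tag.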

How Is Each One Written, and How Does Each Take Effect?

Canonical tag: HTML code, relies on search engine parsing

  • Code: add <link rel="canonical" href="https://example.com/canonical-URL" /> to the <head> section of the page to be canonicalized (as described in the previous section).

How it takes effect: when search engines crawl the page, they read this tag and record that the canonical version of this page is XXX, then prioritize the canonical version when calculating rankings and allocating ranking signals. Other versions of the page may still be crawled, however (unless something else restricts them).

noindex tag: HTML meta tag or HTTP response header, relies on crawlers following the rules

  • Code: usually add <meta name="robots" content="noindex"> to the page's <head> (suitable for most cases), or have the server return the HTTP response header X-Robots-Tag: noindex (suitable for dynamic pages and for non-HTML files such as PDFs).

How it takes effect: search engines detect this directive when crawling the page and do not add the page to the index. The page is still crawled, though, and users can still open it via direct links. Do not also block the page in robots.txt: a crawler that is not allowed to fetch the page never sees the noindex directive.
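For a site served by Python, the X-Robots-Tag variant can be set in application code. This is a minimal WSGI sketch, assuming hypothetical path prefixes (/promo, /admin) that should stay out of the index; it uses only the standard library's wsgiref to build a test environ.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical path prefixes that should stay out of the index.
NOINDEX_PREFIXES = ("/promo", "/admin")


def noindex_app(environ, start_response):
    """Minimal WSGI app that adds X-Robots-Tag: noindex for selected paths."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # Header equivalent of <meta name="robots" content="noindex">; also works for PDFs.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>page</title>"]


if __name__ == "__main__":
    env = {}
    setup_testing_defaults(env)  # fill in a minimal valid WSGI environ
    env["PATH_INFO"] = "/promo/summer-sale"
    noindex_app(env, lambda status, headers: print(status, headers))
```

Plain prefix matching is deliberately crude here: /promo also matches /promotions, so a real site would match paths more precisely.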

301 redirect: server configuration, forces traffic to redirect
Code: implemented through server configuration, for example:

  • Apache server: Add Redirect 301 /old-page https://example.com/new-page in the .htaccess file;
  • Nginx server: Add return 301 https://example.com/new-page; in the configuration file;
  • CMS systems (like WordPress): Set redirect rules through plugins (like Redirection).

How it takes effect: when users or search engines request the old URL, the server returns a 301 status code and redirects to the new URL, and the browser's address bar shows the new address. The old URL's ranking signals transfer to the new URL gradually (usually over several weeks to several months); eventually the old URL may no longer be accessed directly, though the redirect stays in place.
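The same mechanism can also live in application code. This minimal WSGI sketch answers a retired path with a 301 status and a Location header, mirroring the Apache and Nginx lines above. The REDIRECTS mapping and the redirect_app name are hypothetical, and for brevity it ignores query strings.

```python
# Hypothetical mapping, mirroring the Apache/Nginx examples above.
REDIRECTS = {"/old-page": "https://example.com/new-page"}


def redirect_app(environ, start_response):
    """Answer retired paths with a permanent redirect; everything else gets a 404."""
    target = REDIRECTS.get(environ.get("PATH_INFO", ""))
    if target is not None:
        # 301 + Location is what a browser or crawler follows to the new URL.
        start_response("301 Moved Permanently", [("Location", target),
                                                 ("Content-Length", "0")])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```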
