
What canonical means in SEO | How to use canonical tags in SEO

Author: Don jiang

The rel="canonical" tag tells search engines which URL is the canonical version of a piece of content, so that ranking authority is not split across duplicate URLs.

For Google SEO, it is implemented by adding a <link rel="canonical" href="..."> element to the page’s <head> section.

Data shows that, on average, e-commerce websites that correctly implement the canonical tag see the indexing rate of their product listing pages increase by 28% and the number of crawl requests for duplicate URLs decrease by 40%-60%.

News sites that consolidate similar articles using the canonical tag see an average increase of 19% in search clicks for their core content.

However, an actual survey found that only 31% of websites use the tag entirely correctly. Common errors include pointing to the wrong URL, invalid cross-protocol or cross-domain targets, and multiple stacked tags.

What is a canonical tag

Why the Canonical Tag is Necessary

During Google’s routine crawling, more than 65% of websites show duplicate content issues caused by poorly designed URL structures.

Specifically, this manifests as:

  • The same article can be accessed through URLs with parameters (such as ?utm_source=xxx)
  • URLs with directory suffixes (such as /page/ and /page/index.html)
  • Different subdomains (such as www and non-www)

Google’s John Mueller has repeatedly mentioned in official Q&A sessions that when a search engine discovers “multiple URLs displaying highly similar or identical content,” it has to decide which of those URLs should receive the authority.

An e-commerce product page might generate a dozen different URLs due to color filtering or sorting parameters; a press release might be published to multiple sections, creating multiple entry links.

Using the canonical tag clearly tells the search engine: “Although this content can be seen through multiple URLs, please focus the authority and ranking attention on this specific URL I designated.”

How Duplicate Content Affects SEO

Duplicate content itself does not directly lead to search engine penalties (Google explicitly stated it “does not penalize websites simply for content duplication”), but it leads to authority dispersion.

When the same content is accessible through multiple URLs, the search engine treats these URLs as “different pages” and processes them separately.

For instance, an original article is displayed through the following 4 URLs:

  • https://example.com/article
  • https://example.com/article?source=newsletter
  • https://example.com/article/ (the trailing-slash version; a #comments fragment, by contrast, is not sent to the server and is normally not treated as a separate URL)
  • https://www.example.com/article (the www version)

Without a canonical identifier, the search engine might crawl all 4 URLs simultaneously and calculate indexing authority for each.

However, the user’s search need fundamentally requires only one answer. Consequently, the ranking of these 4 versions might all be low (because the authority is dispersed), or only one is occasionally indexed, with the others remaining “unindexed” or “low-ranking” for a long time.
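The consolidation described above can be pictured as a normalization step that maps every variant onto one address. The following is a minimal sketch, not a description of how any search engine works internally; the TRACKING_PARAMS set, the preferred host, and the "no trailing slash" preference are all assumptions chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that never change the page content.
TRACKING_PARAMS = {"source", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str, preferred_host: str = "example.com") -> str:
    """Collapse common URL variants onto one canonical form."""
    parts = urlsplit(url)
    # Keep only query parameters that are not known tracking parameters.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    host = parts.netloc.lower()
    # Treat the www host as an alias of the preferred host.
    if host == "www." + preferred_host:
        host = preferred_host
    # Assumed site policy: no trailing slash, except for the root path.
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped: it is never sent to the server.
    return urlunsplit(("https", host, path, urlencode(kept), ""))


variants = [
    "https://example.com/article",
    "https://example.com/article?source=newsletter",
    "https://example.com/article#comments",
    "https://www.example.com/article",
]
print({canonical_url(v) for v in variants})
```

Each variant maps to the same address, which is the URL every duplicate page would declare in its canonical tag.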

On e-commerce websites, a single product detail page can generate an average of 8-12 duplicate URLs due to parameters (such as ?size=XL, ?color=red). Crawling these pages might consume 15%-20% of the total crawl budget, which should have been spent on more valuable new pages.

News sites might generate 3-5 different entry URLs for a single article because the content is pushed to multiple sections (such as “Latest News,” “Industry Trends,” “Popular Recommendations”).

A more specific case: before canonicalizing its URLs, a medium-sized e-commerce website had a product listing page indexing rate of only 62% (only 62 out of 100 pages were indexed by Google and could participate in ranking).

After a canonical tag was added to parameterized listing pages (such as ?category=shoes&sort=price) pointing to the basic URL without parameters (such as /shoes), the indexing rate rose to 81% within 3 months, and organic search traffic to the corresponding products grew by 17%.

It’s not “deleting duplicates,” but “designating the authoritative version”

Many webmasters misunderstand the canonical tag, believing it is “used to delete duplicate pages.”

In fact, its core function is to “tell the search engine: among multiple URLs displaying the same content, which one is the version you should prioritize for indexing, inclusion, and ranking.”

When you add the following code to the <head> section of a page:

<link rel="canonical" href="https://example.com/canonicalURL" />

You are sending a clear signal to the search engine: “Although this page (e.g., the parameterized /article?source=email) also provides the content, I want you to concentrate its authority and ranking potential on the https://example.com/canonicalURL address.”
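As a small illustration of the tag above, the sketch below builds the element from a URL. The function name canonical_link_tag is hypothetical; the only points it tries to show are that the URL should be absolute and that characters such as & are escaped inside the attribute value.

```python
from html import escape


def canonical_link_tag(url: str) -> str:
    """Return a <link rel="canonical"> element for an absolute URL."""
    if not url.startswith(("https://", "http://")):
        raise ValueError("canonical URL should be absolute: " + url)
    # escape() with quote=True protects &, <, > and quotes in the attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


print(canonical_link_tag("https://example.com/canonicalURL"))
```

The same string would be placed in the <head> of every duplicate version of the page, including the canonical page itself.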

Based on Google’s official documentation and observation of actual crawl data:

  • Crawl Level: Search engines still crawl all versions of the page (including parameterized and directory URLs), but they adjust the “level of importance” given to these pages based on the canonical tag. For instance, parameterized URLs might be crawled, but the crawler will not revisit them as frequently or index them as deeply as the canonical version.
  • Index Level: If the content of multiple URLs is highly similar (duplication rate exceeds 80%), the search engine will typically include the canonical version in the index, while other versions might not be indexed separately, or even if indexed, will not participate in core ranking competition.
  • Authority Level: When external links point to any of the duplicate versions, the search engine will, guided by the canonical tag, “transfer” or “associate” this portion of the external link authority to the canonical version (though not 100% fully transferred, the effect is close in most cases).

A practical scenario: An article on a blog site is published under both the “Homepage Recommendation” and “Technology Column” sections, generating two URLs:

  • https://example.com/home/recommend/123 (Homepage Recommendation entry)
  • https://example.com/tech/article/123 (Technology Column entry)

The content is identical, but the Homepage Recommendation URL, due to higher traffic, attracted some external links.

Without the canonical tag, the search engine might treat these two pages as independent content. Although the Homepage Recommendation URL has external links, its ranking potential might be lower than the Technology Column’s because the section’s focus is less specific (homepage recommendations are generally for mixed content).

If the technical team adds a canonical tag to both pages, pointing to the URL more aligned with the content’s topic, https://example.com/tech/article/123, the search engine will clearly know: “The authoritative version of this content is the Technology Column’s URL,” associating the Homepage Recommendation’s external link authority with it and improving the page’s ranking competitiveness for “technology-related keywords.”

What Happens If the Canonical Tag Is Not Used

Crawl Budget is Wasted

The “daily crawl limit” allocated by search engines to each website is finite (known as the “crawl budget”), prioritizing the crawling of important pages (such as the homepage, high-update-frequency content pages).

If the website has many duplicate URLs (e.g., an e-commerce product detail page with 10 sorting parameters, generating 1000+ different URLs), the crawler will spend part of the budget on these “same content but different URL” pages, leading to a decrease in the crawling frequency of genuinely important new pages (such as newly listed products, updated news).
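The jump from 10 parameters to 1000+ URLs follows from simple combinatorics. The source does not say how the parameters combine, so the sketch below assumes each parameter is either absent or set to one of its values, independently of the others; with 10 on/off parameters that gives 2^10 = 1024 addresses for the same content. The function and parameter names are illustrative.

```python
from itertools import product
from urllib.parse import urlencode


def variant_urls(base: str, options: dict) -> list:
    """List every URL formed by leaving each parameter out or picking one value."""
    names = list(options)
    # For each parameter: either absent (None) or one of its values.
    choices = [[None, *options[name]] for name in names]
    urls = []
    for combo in product(*choices):
        pairs = [(n, v) for n, v in zip(names, combo) if v is not None]
        urls.append(base + ("?" + urlencode(pairs) if pairs else ""))
    return urls


small = variant_urls(
    "https://example.com/product",
    {"size": ["M", "XL"], "color": ["red", "blue"]},
)
print(len(small))  # 3 choices for size * 3 choices for color

ten_flags = {f"sort{i}": ["on"] for i in range(10)}
print(len(variant_urls("https://example.com/product", ten_flags)))  # 2 ** 10
```

Parameter order is fixed here; if a site also accepts the parameters in any order, the number of distinct URLs grows further.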

Data analysis of a clothing e-commerce website’s crawler log shows that parameterized duplicate product pages (such as ?size=M, ?color=blue) accounted for 22% of the total crawl volume, while the bounce rate of these pages was as high as 85% (users are searching for specific products, not entering through parameterized URLs).

When the website uniformly added a canonical tag to product detail pages (pointing to the basic URL without parameters), the crawler’s frequency on core product pages increased by 30%, and the time to index newly listed products was shortened from an average of 7 days to 3 days.

Index Version Confusion, Unstable Ranking

Without a canonical identifier, the search engine might randomly choose one URL as the “default display version,” but this choice is not fixed.

For instance, when a user searches for a specific keyword, they might sometimes see the www version (https://www.example.com/page), sometimes the non-www version (https://example.com/page), or even a parameterized version (https://example.com/page?from=social).

Case study: The “Contact Us” page of a local service website existed in two versions: https://example.com/contact and https://example.com/contact-us (identical content), without a canonical tag. Google indexed these two URLs at different times, causing users searching for “XX city repair service contact information” to sometimes see the first version ranking higher, and sometimes the second.

If the user clicks and lands on a non-primary version (such as contact-us), the conversion rate might decrease due to differences in page navigation design (e.g., missing an online appointment button).

Later, the website added a canonical tag to both versions, pointing to https://example.com/contact. Three months later, the page’s ranking improved, and the search Click-Through Rate (CTR) increased by 11%.

External Link Authority Dispersion

If external websites link to several duplicate versions of a URL (e.g., someone reposts content using a parameterized URL, or a new link is generated when the content is pushed through a section page), those external links are scattered across different addresses, and the search engine may not consolidate their authority on its own.

Data comparison: An education website’s “Postgraduate Entrance Exam Guide” article was reposted by 5 external sites, 3 of which linked to the non-parameterized version (https://example.com/guide/kaoyan), and 2 linked to the parameterized version (https://example.com/guide/kaoyan?from=partner).

Without the canonical tag, the search engine would associate these 5 external links with different URLs. After the website added the canonical tag to all versions (pointing to the non-parameterized version), the page’s organic search traffic increased by 24% within 6 months.
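The 3 + 2 split in this example can be tallied in a few lines. This is a simplified model that assumes every link is credited in full to the declared canonical, which the text above notes is only approximately true in practice; the function name and the canonical_of mapping are illustrative.

```python
from collections import Counter

BASE = "https://example.com/guide/kaoyan"
PARTNER = "https://example.com/guide/kaoyan?from=partner"

# The five external links from the example: 3 to the base URL, 2 to the variant.
backlinks = [BASE, BASE, BASE, PARTNER, PARTNER]


def links_per_url(backlinks: list, canonical_of: dict) -> Counter:
    """Count backlinks per URL after following each page's declared canonical."""
    return Counter(canonical_of.get(url, url) for url in backlinks)


# Without canonical tags, the links stay split across two addresses.
print(links_per_url(backlinks, {}))

# With every version declaring the base URL, all five count toward one address.
print(links_per_url(backlinks, {BASE: BASE, PARTNER: BASE}))
```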

Canonical Tag Basic Syntax and Implementation

About 32% of pages placed the canonical tag in the <body> section (instead of the required <head> area), 19% of the href attribute values lacked the complete protocol (e.g., writing only example.com instead of https://example.com), and 15% of pages pointed to different “canonical versions” among multiple duplicate URLs (causing search engine confusion).

From a technical perspective, the canonical tag is essentially a simple HTML link tag, but the tag position (must be within <head>), syntax format (strictly follow HTML specification), and the target URL (must exactly match the actual content and be accessible) are critical.

Data shows that when the canonical tag is deployed in the standard way (placed near the top of <head>, using the full HTTPS protocol, and pointing to a unique and correct canonical URL), the probability of search engines correctly identifying and applying the tag exceeds 95%.

For pages with incorrect implementations, about 60% of the canonical intent is not adopted by search engines, meaning the duplicate content issue persists.

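For example, an e-commerce website, when adding a canonical tag for a product detail page (e.g., the parameterized ?color=red version), omitted the protocol header (writing it as //example.com/product or example.com/product). A value such as example.com/product has no scheme, so it is read as a relative path and resolves to an unintended address, while //example.com/product inherits the protocol of whichever page carries it. As a result, Google could not reliably resolve the intended target URL.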

Standard Syntax Structure

The complete syntax of the canonical tag is a single line of HTML code: <link rel="canonical" href="https://www.example.com/fullURLofthecanonicalpage" />

This line of code consists of 3 core parts, all of which are required (the order of the two attributes does not matter in HTML):

Tag Type: <link>

  • This is the HTML tag used to define the relationship between the document and external resources. The canonical tag is a type of “link relationship” and must use <link> as its basic structure.

Attribute: rel="canonical"

  • rel specifies the relationship between the linked URL and the current document. When its value is set to canonical, it clearly tells the search engine: “This tag defines the canonical (authoritative) version of the current page’s content.”

Attribute: href="URL"

  • href specifies the exact URL of the canonical version. This URL must be complete and accessible, including the protocol (http or https), domain (www or non-www), path, and parameters (if necessary).

For example:

  • Correct implementation: href="https://www.example.com/products/shoes"
  • Incorrect implementation 1 (missing protocol): href="//www.example.com/products/shoes" (a protocol-relative URL inherits the protocol of the page it appears on, so an HTTP copy of the page would declare an HTTP canonical)
  • Incorrect implementation 2 (missing domain): href="/products/shoes" (a relative path is resolved against the host the page was served from; Google supports relative paths but recommends absolute URLs, because the tag points to the wrong site if the page is ever served from another host, such as a staging domain)
  • Incorrect implementation 3 (spelling error): href="https://www.exaple.com/products/shoes" (domain misspelled, pointing to a non-existent page)
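To see what each href form actually resolves to, the sketch below applies standard URL resolution with Python's urllib.parse.urljoin. It only shows how a generic URL resolver treats these values against an assumed page URL; it makes no claim about the internal behavior of any particular search engine.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(href: str) -> bool:
    """True when href carries both an http(s) scheme and a host."""
    parts = urlsplit(href)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Assumed URL of the page that carries the canonical tag.
page = "https://www.example.com/products/shoes?color=red"

for href in [
    "https://www.example.com/products/shoes",  # absolute: unambiguous
    "//www.example.com/products/shoes",        # inherits the page's protocol
    "/products/shoes",                         # inherits the page's host
    "www.example.com/products/shoes",          # no scheme: read as a relative path
]:
    print(is_absolute(href), urljoin(page, href))
```

The last value is the one most likely to surprise: without a scheme, the "domain" is treated as a path segment and appended under /products/.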

Other Details:

  • The href value should end with / only if the canonical URL itself uses a trailing slash. /page and /page/ are technically different URLs, so pick one form and use it consistently; modern search engines are usually tolerant as long as the canonical is consistent.
  • Keep the tag on a single line where possible (HTML permits line breaks inside a tag and search engines usually handle them, but some parsing tools do not).
  • The tag may be closed with /> (self-closing syntax). The HTML5 standard allows omitting the final /, but it is recommended to keep it for compatibility.

Why it Must Be in <head>

When search engine crawlers fetch a page, they prioritize parsing the content in the <head> area (especially meta information, title, canonical tags, and other “control directives”), and only then process the actual content in the <body>.

If the canonical tag is incorrectly placed inside the <body> (e.g., nested within an article paragraph or footer code), the search engine will ignore the <link rel="canonical"> tag within the <body>.
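A quick way to check placement is to parse the page and record where each canonical link appears. The sketch below uses Python's built-in html.parser; the class and function names are hypothetical, and the parser is deliberately simple (it does not reproduce full HTML5 tree building, such as an implicitly closed <head>).

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect rel="canonical" links and note whether each sits inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (href, inside_head) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            attr = dict(attrs)
            rels = (attr.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.found.append((attr.get("href"), self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonicals(html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.found


good = '<html><head><link rel="canonical" href="https://example.com/a" /></head><body></body></html>'
bad = '<html><head></head><body><link rel="canonical" href="https://example.com/a" /></body></html>'
print(find_canonicals(good))  # inside_head is True
print(find_canonicals(bad))   # inside_head is False
```

A result with inside_head set to False, or with more than one entry, signals a page worth reviewing.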

Other Supplements:

  • A page should have only one canonical tag. If several appear, the signal becomes ambiguous; Google has stated that when a page declares more than one rel="canonical", it will likely ignore all of them.
  • The tag cannot be nested inside other tags (e.g., it cannot be placed within <div> or <script>).
  • For dynamically generated pages (e.g., pages output through back-end languages like PHP, Python, etc.), make sure the template engine inserts the canonical tag into the <head> area when outputting HTML (usually controlled by template variables).
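For the dynamically generated case, a minimal sketch with Python's string.Template is shown below. The template, the SITE_ROOT constant, and the render_page function are hypothetical; the point is that the canonical is computed from the request path with the query string removed, and is emitted inside <head>.

```python
from html import escape
from string import Template

# Hypothetical page template; $canonical is filled in per request.
PAGE = Template(
    "<!DOCTYPE html>\n"
    "<html>\n<head>\n"
    "  <title>$title</title>\n"
    '  <link rel="canonical" href="$canonical" />\n'
    "</head>\n<body>$body</body>\n</html>\n"
)

SITE_ROOT = "https://www.example.com"  # assumed preferred protocol and host


def render_page(request_path: str, title: str, body: str) -> str:
    """Render a page whose canonical ignores any query string on the request."""
    path = request_path.split("?", 1)[0]
    return PAGE.substitute(
        title=escape(title),
        canonical=escape(SITE_ROOT + path, quote=True),
        body=body,
    )


print(render_page("/products/shoes?color=red", "Shoes", "<p>Red shoes</p>"))
```

Building the canonical from a fixed site root rather than from the incoming Host header keeps the tag stable even when the page is reached through an alternate host or protocol.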

5 Most Common Errors

Error 1: Pointing to the Wrong URL (Canonical version does not match actual need)

  • Symptom: The canonical tag points to a URL where the content is not entirely consistent (or is not the same content at all). For example, the canonical of a product detail page (showing red shoes) points to the page for white shoes.
  • Consequence: The search engine will concentrate authority on an irrelevant page according to the incorrect instruction, leading to a decline in the core content’s ranking.
  • Correction: Check the actual content of the current page and ensure the URL in the href points to the canonical version that “displays exactly the same content” (e.g., uniformly use the basic URL without parameters, or the section page that best matches the user’s search intent).

Error 2: Missing Protocol Header (only writing domain or using relative path)

  • Symptom: The code is written as href="//example.com/page" (protocol-relative URL) or href="/page" (relative path).
  • Consequence: The href is resolved against whatever URL the page was served from, so the same tag can point to different targets across protocols or hosts (for example, on an HTTP copy or a staging domain), and the canonical intent can fail.
  • Correction: Always use the full protocol + domain + path, in the format href="https://www.example.com/page" (HTTPS protocol is recommended for security).

Error 3: Parameterized URL conflicts with the Canonical Version

  • Symptom: The non-parameterized version of a product listing page (https://example.com/products) is the canonical version, but the parameterized version (such as https://example.com/products?sort=price) does not correctly point to it, instead pointing to another URL with different parameters (such as ?sort=date).
  • Consequence: Multiple parameterized versions point to different URLs, creating canonical chains or conflicting signals and dispersing authority.
  • Correction: Standardize the canonical tag of all parameterized URLs to point to the basic non-parameterized version (or the most frequently used sorting/filtering version), ensuring all variant versions point to the same canonical address.

Error 4: Tag Placed Inside <body>

  • Symptom: When editing the page through the CMS backend, the canonical code is mistakenly pasted into the article content area (<body> section), instead of the website template’s <head> area.
  • Consequence: Search engine crawlers may ignore the tag, leading to duplicate pages not being correctly canonicalized.
  • Correction: Contact the technical team to check the template files (such as WordPress’s header.php, Shopify’s theme.liquid) to ensure the canonical tag is output within the HTML’s <head> tag.

Error 5: Multiple Canonical Tags Stacking

  • Symptom: Due to a template error or manual addition, multiple <link rel="canonical"> tags appear on a page (e.g., pointing to both /page and /page/).
  • Consequence: Conflicting tags give the search engine no clear signal. Google has stated that when a page declares more than one rel="canonical", it will likely ignore all of them, leaving the page with no effective canonical.
  • Correction: Check the code, delete redundant canonical tags, and ensure only one canonical directive per page.
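Errors of the kind listed above can be caught with a small audit over a crawl export. The sketch below assumes a mapping from each crawled URL to the canonical it declares (the data and function name are illustrative) and flags two patterns: a canonical whose target is itself canonicalized elsewhere, and variants of the same path that disagree about their target.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def audit_canonicals(canonical_of: dict) -> list:
    """Return (issue, url) pairs for chained and conflicting canonicals."""
    issues = []
    # 1. Chains and loops: the declared target points somewhere else again.
    for url, target in canonical_of.items():
        if target != url and canonical_of.get(target, target) != target:
            issues.append(("chain", url))
    # 2. Conflicts: URLs sharing host and path declare different targets.
    targets_by_path = defaultdict(set)
    for url, target in canonical_of.items():
        parts = urlsplit(url)
        targets_by_path[(parts.netloc, parts.path)].add(target)
    for (host, path), targets in targets_by_path.items():
        if len(targets) > 1:
            issues.append(("conflict", f"https://{host}{path}"))
    return issues


# The situation described in Error 3.
broken = {
    "https://example.com/products": "https://example.com/products",
    "https://example.com/products?sort=price": "https://example.com/products?sort=date",
    "https://example.com/products?sort=date": "https://example.com/products",
}
print(audit_canonicals(broken))

# After the correction, every variant points to the base URL.
fixed = {url: "https://example.com/products" for url in broken}
print(audit_canonicals(fixed))
```

Grouping by host and path is a simplifying assumption; sites where parameters select genuinely different content would need a different grouping key.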

Differences Between Canonical and Other Tags (such as noindex, 301 Redirect)

The canonical tag is for “designating the authoritative version of the same content” (keeping all URLs but focusing authority), the noindex tag is for “prohibiting search engines from indexing the current page” (allowing crawling but not displaying), and the 301 redirect is for “permanently redirecting the old URL to the new URL” (traffic and authority are completely transferred).

Essential Differences between Canonicalization, Prohibition, and Redirection

Canonical Tag (Canonicalization Tag): Used for “multiple URL scenarios with the same content,” the purpose is to tell the search engine “these pages have the same content, but you only need to focus on this specified URL (the canonical version) and concentrate the authority here.”

  • Typical Scenarios: E-commerce product detail pages with parameters (such as ?color=red and ?color=blue), news articles pushed to multiple sections (such as “Latest News” and “Industry Trends”), independent mobile and PC URLs with identical content.

Noindex Tag (Index Prohibition Tag): Used for “scenarios where crawling is allowed but displaying is prohibited,” telling the search engine “you can crawl this page, but do not put it in the search result index.”

  • Typical Scenarios: Internal management pages (such as login pages, backend statistics pages), temporary event pages (no need for ranking after the event ends), low-value content pages (such as print versions, simplified/traditional Chinese conversion pages).

301 Redirect (Permanent Redirection): Used for “scenarios where content has permanently migrated,” automatically redirecting users and search engines from the old URL to the new URL via server configuration (such as .htaccess files or Nginx rules). The old URL’s authority (including ranking, external links, user trust) will gradually transfer to the new URL, and eventually, the old URL may no longer be accessed (but the redirection remains effective).

  • Typical Scenarios: Website domain change (e.g., migrating from example.com to newexample.com), adjusting URL structure (e.g., changing /old-product/ to /products/new-product/), merging multiple old pages into one new page.

| Tool | Allow Crawling | Allow Indexing | Change URL | Core Purpose |
|---|---|---|---|---|
| canonical | ✅ Yes | ❌ Non-canonical versions recommended not to index (but might still be indexed) | ❌ No change | Concentrate authority of multiple same-content pages to the canonical version |
| noindex | ✅ Yes | ❌ Prohibited | ❌ No change | Prevent the page from appearing in search results |
| 301 Redirect | ❌ Automatic redirect | ❌ Old URL not indexed | ✅ Redirect to new URL | Transfer authority and traffic from old URL to new address |

4 Common Scenario Comparisons and Their Usage

Scenario 1: Same Content with Multiple URLs (e.g., parameterized product pages)

  • Problem: A product detail page can be accessed via https://example.com/product and https://example.com/product?color=red, with identical content.
  • Correct Tool: canonical. Add the canonical tag to the parameterized URL (?color=red), pointing to the basic URL without parameters (https://example.com/product), telling the search engine “the authoritative version of this content is the non-parameterized page.”
  • Why not noindex/301: noindex would drop the parameterized page from the index (though it might still be crawled) without consolidating its signals on the primary version; a 301 redirect would force users and crawlers to the base URL, but users may genuinely need the parameterized pages (e.g., to compare different colors), so forced redirection is not suitable.

Scenario 2: Page No Longer Needs to Appear in Search Results (e.g., expired event page)

  • Problem: A promotional event page (https://example.com/promo) has ended but might still be accessed by users via bookmarks or external links, and ranking is no longer needed.
  • Correct Tool: noindex. Add the <meta name="robots" content="noindex"> tag to the event page’s <head> (or configure via CMS), allowing the search engine to crawl the page (e.g., to check event history) but prohibiting its inclusion in the index.
  • Why not canonical/301: canonical cannot solve the problem of “not showing the page” (it only focuses authority); 301 redirect requires specifying a new URL (but the event page has no corresponding new address), and users might still need to visit the original page to view historical information.

Scenario 3: Website Domain Change or URL Structure Adjustment (e.g., old product page migration)

  • Problem: The old product page (https://old.example.com/item1) has been permanently moved to a new address (https://new.example.com/products/item1), and the original external link authority and user access habits need to be preserved.
  • Correct Tool: 301 Redirect. Set up server configuration (such as Apache’s .htaccess file) to automatically redirect to the new URL when a user or crawler accesses the old URL. The old URL’s ranking and external link authority will gradually transfer to the new URL.
  • Why not canonical/noindex: canonical cannot achieve traffic redirection (users would still stay at the old URL); noindex would prevent the old URL from being indexed, but external link authority would not be transferred, and users could not access the new content through the old link.

Scenario 4: Independent Mobile and PC URLs (e.g., m.example.com and www.example.com)

  • Problem: The same content is served at separate URLs for mobile (https://m.example.com/page) and PC (https://www.example.com/page).
  • Correct Tool: Prioritize using canonical (pointing to the PC URL), or unify the URL with responsive design. If the mobile URL is a necessary entry point (e.g., users are accustomed to accessing m.example.com), add a canonical tag on the mobile page pointing to the PC canonical URL, and optionally use a 301 redirect to send some old mobile links to the PC URL.
  • Why not noindex: noindex would prevent one version (mobile or PC) from being indexed, potentially leaving some users’ search needs unmet (e.g., mobile users not seeing the adapted content in search results).
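The four scenarios can be condensed into a rough rule of thumb. The function below is only a mnemonic for the comparison above, with hypothetical parameter names; real decisions usually involve more context than three booleans.

```python
def choose_tool(content_moved: bool, duplicate_of_other_url: bool,
                hide_from_search: bool) -> str:
    """Pick canonical, noindex, or a 301 redirect for a given page."""
    if content_moved:
        # The old address should no longer serve content at all.
        return "301 redirect"
    if duplicate_of_other_url:
        # Both URLs stay reachable; authority is focused on one of them.
        return "canonical"
    if hide_from_search:
        # The page stays reachable but should not appear in results.
        return "noindex"
    return "no action needed"


print(choose_tool(False, True, False))   # Scenario 1: parameterized product page
print(choose_tool(False, False, True))   # Scenario 2: expired event page
print(choose_tool(True, False, False))   # Scenario 3: migrated product page
print(choose_tool(False, True, False))   # Scenario 4: separate mobile URL
```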

How to Write the Code? How Does the Effect Logic Differ?

Canonical Tag: HTML Code, Relies on Search Engine Parsing

  • Code Implementation: Add <link rel="canonical" href="https://CanonicalURL" /> to the <head> section of the page that needs canonicalization (as described in the previous section).

Effect Logic: When the search engine crawls the page, it reads the tag and records “the canonical version of this page is XXX,” and subsequently prioritizes the canonical version when calculating ranking and allocating authority. However, other versions of the page may still be crawled (unless there are other restrictions).

Noindex Tag: HTML Meta Tag or HTTP Response Header, Relies on Crawler Compliance

  • Code Implementation: Usually, add <meta name="robots" content="noindex"> to the page’s <head> (suitable for most cases), or return the HTTP response header X-Robots-Tag: noindex via the server (suitable for dynamic pages and non-HTML files).

Effect Logic: The search engine detects this directive when crawling the page and then keeps the page out of the index, or drops it if it was already indexed. The page must remain crawlable for this to work: if robots.txt blocks crawling, the crawler never sees the noindex directive. Users can still access the page via a direct link.
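Both delivery methods for noindex can be checked with one helper. The sketch below is an illustration under simplifying assumptions: it takes response headers as a plain dict and the HTML as a string, treats directives as a comma-separated list, and does not handle user-agent-specific forms such as a crawler name prefixed to the header value. The names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def is_noindex(headers: dict, html: str) -> bool:
    """True if the X-Robots-Tag header or a robots meta tag says noindex."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


print(is_noindex({}, '<head><meta name="robots" content="noindex"></head>'))
print(is_noindex({"X-Robots-Tag": "noindex"}, "<head></head>"))
print(is_noindex({}, "<head></head>"))
```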

301 Redirect: Server Configuration, Forces Traffic Redirection

Code Implementation: Implemented through server technology, for example:

  • Apache Server: Add Redirect 301 /old-page https://example.com/new-page to the .htaccess file;
  • Nginx Server: Add return 301 https://example.com/new-page; to the configuration file;
  • CMS Systems (such as WordPress): Set up redirection rules via a plugin (such as Redirection).

Effect Logic: When a user or search engine accesses the old URL, the server automatically returns a 301 status code and redirects to the new URL, and the browser’s address bar displays the new address. The authority of the old URL will gradually (usually weeks to months) transfer to the new URL, and eventually, the old URL may no longer be directly accessed (but the redirection functionality remains).
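For sites served by an application rather than Apache or Nginx rules, the same behavior can be expressed in code. The following is a minimal WSGI sketch, not a production setup; the REDIRECTS mapping and the app function are hypothetical and reuse the /old-page example from the server rules above.

```python
# Hypothetical mapping of retired paths to their permanent replacements.
REDIRECTS = {"/old-page": "https://example.com/new-page"}


def app(environ, start_response):
    """Minimal WSGI app: 301 for mapped paths, 404 for everything else."""
    target = REDIRECTS.get(environ.get("PATH_INFO", ""))
    if target:
        # The Location header tells browsers and crawlers where to go next.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


# Call the app directly with a fake start_response to inspect the result.
captured = []
app({"PATH_INFO": "/old-page"}, lambda status, headers: captured.append((status, headers)))
print(captured[0])
```

To try it over HTTP, the app can be passed to wsgiref.simple_server.make_server from the standard library.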
