
How to handle duplicate content on a website without deleting pages

Author: Don Jiang

The steps are simple. Follow these operations:

  1. Add canonical tag: Add to duplicate pages
    <link rel="canonical" href="main page URL">
  2. Set 301 redirect: Redirect similar or old URLs to the main page (server or .htaccess configuration)
  3. Limit parameter page indexing: Add to duplicate parameter pages
    <meta name="robots" content="noindex,follow">
  4. Unify internal links: All internal links point only to the main URL

This can generally reduce duplicate indexing by 30%-50%.

Setting Canonical Tags (Preferred Solution)

How to Set It

The basic operation is to place the code in the <head> area at the top of the webpage. Put it right below <title>, within the first 15 lines of code, so crawlers see it at a glance when they download the first 20KB of page data. If it is placed in the <body> main content area, crawlers will ignore it, and hundreds of crawl requests are wasted daily.

Online store URLs often carry a long string of letters and numbers used for accounting and tracking. The server takes an extra 150 milliseconds to respond for each long URL generated. When backend technicians write the canonical code, they should cut off the entire tail after the question mark and put only the clean version of the URL in the href field.

Tracking tails commonly encountered when handling URLs include these types:

  • sessionid=12345 (visitor browsing records)
  • utm_source=google (mark advertising source)
  • sort=price_asc (sort by price low to high)
  • category=shoes (select shoe subcategory)
  • page=2 (jump to the second page)
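The tail-cutting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's tooling: the function names clean_canonical_url and canonical_tag are hypothetical, and the sketch follows the text's rule of dropping everything after the question mark without HTML-escaping the result.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host and path; drop everything after the question mark.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element that goes inside <head>."""
    return f'<link rel="canonical" href="{clean_canonical_url(url)}">'


print(canonical_tag(
    "https://site.com/shoes?sessionid=12345&utm_source=google&sort=price_asc"
))
# prints: <link rel="canonical" href="https://site.com/shoes>" minus nothing else:
# <link rel="canonical" href="https://site.com/shoes">
```

If some parameters change what the page shows on your site, a whitelist of parameters to keep would be a safer variant than cutting the whole tail.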

For product manual PDF files around 2MB, you can’t insert frontend code as you would in a normal webpage. Once indexed, the PDF competes with the webpage version for ranking. You need to modify the Nginx or Apache server configuration to give this non-web file format special handling.

On Apache, the method is to edit the .htaccess file in the site root and add a directive that sends the response header Link: <https://site.com/product-page>; rel="canonical". The server sends this signal to search engines within the first 50 milliseconds of serving the PDF file. Once the crawler receives the complete header with HTTP status code 200, the authority transfers smoothly to the webpage version.
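To make the header format concrete, here is a small Python sketch that builds the Link value for a PDF and reads it back. The helper names are hypothetical, and the parser is deliberately simplified: it assumes URLs contain no commas and that rel is the only parameter.

```python
import re
from typing import Optional


def canonical_link_header(canonical_url: str) -> str:
    """Build the value of the HTTP Link header for a non-HTML file."""
    return f'<{canonical_url}>; rel="canonical"'


def parse_canonical(link_header: str) -> Optional[str]:
    """Extract the canonical target from a Link header value, if present."""
    # Simplification: entries are split on commas, so URLs with commas break.
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="?canonical"?\s*$', part)
        if match:
            return match.group(1)
    return None


header = canonical_link_header("https://site.com/product-page")
print("Link:", header)
print(parse_canonical(header))
```

A check like parse_canonical can be run against the response headers of a PDF after the server rule is deployed, to confirm the header is actually being sent.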

When you republish your own hard-written blog posts on Medium or other major platforms, a large share of your original website’s search traffic gets taken away. Cross-domain canonical tags handle the handover between two completely different URLs. In the publishing backend of the external platform, fill in your original article’s full absolute URL, including https, and your own website can recover nearly 90% of the initial authority.

On site-building platforms like Shopify, the /collections/all directory often generates thousands of duplicate product pages. Edit the theme.liquid theme file and find the section from line 25 to line 40. After you add a snippet that renders {{ canonical_url }}, the system can clarify the ownership of tens of thousands of duplicate pages across the entire site in 0.2 seconds.

Where to fill in URLs in major content management systems:

  • Yoast SEO plugin: When writing an article, scroll to the bottom and open the “Advanced” menu.
  • Rank Math tool: Click the gear icon in the right sidebar and open the “Advanced” tab.
  • Magento 2 system: Go to Stores, then Configuration, then Catalog.
  • Wix builder: Use the “Advanced Markup” area at the bottom of the individual page settings.

Within 48 hours after the code goes live, log into Google Search Console. Enter the URL in the inspection box at the top and press Enter. In the “Page indexing” section, find the line “Google-selected canonical”. Check carefully that the URL Google chose matches the URL you typed, character for character.

The URLs submitted in the sitemap.xml file must be exactly the same as the main URLs you want to set, not even one character off. If the sitemap lists URLs with long tails while the webpages point to URLs without tails, crawlers have to work through conflicting instructions thousands of times daily. Writing a cleaning script to remove irrelevant URLs from the sitemap can save 30% of the daily crawl budget for the entire website.
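A cleaning script of the kind mentioned above might look like the following sketch. It assumes a standard sitemap in the sitemaps.org namespace and, as a simplification, treats the whole query string as an irrelevant tail; clean_sitemap is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def clean_sitemap(xml_text: str) -> str:
    """Strip query strings from <loc> entries and drop resulting duplicates."""
    ET.register_namespace("", SITEMAP_NS)  # keep output free of ns0: prefixes
    root = ET.fromstring(xml_text)
    seen = set()
    for url_el in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url_el.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not loc.text:
            continue  # entries without a <loc> are left untouched here
        parts = urlsplit(loc.text.strip())
        clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        if clean in seen:
            root.remove(url_el)  # same page listed again under another tail
        else:
            seen.add(clean)
            loc.text = clean
    return ET.tostring(root, encoding="unicode")
```

After cleaning, every remaining <loc> should be identical to the href in that page's canonical tag.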

Self-check steps for new pages using Chrome browser:

  • Press the F12 key to open the developer tools panel.
  • Click the Elements tab.
  • Press Ctrl+F to search for rel="canonical".
  • Check carefully that https:// is not missing from the href field.
  • Search through the entire webpage source code to ensure this line appears only once.
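The same self-check can be automated for many pages. The sketch below uses Python's standard html.parser to verify the three points from the list: exactly one canonical tag, an absolute https href, and placement inside <head>. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect every rel="canonical" link and note whether it sits in <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (href, inside_head) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            attr = dict(attrs)
            rels = (attr.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.found.append((attr.get("href") or "", self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def check_canonical(html: str) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    finder = CanonicalFinder()
    finder.feed(html)
    problems = []
    if len(finder.found) != 1:
        problems.append(f"expected exactly 1 canonical tag, found {len(finder.found)}")
    for href, inside_head in finder.found:
        if not href.startswith("https://"):
            problems.append(f"href is not an absolute https URL: {href!r}")
        if not inside_head:
            problems.append(f"canonical tag is outside <head>: {href!r}")
    return problems
```

Feeding it the saved source of a page returns an empty list when the page passes all three checks.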

For long articles split into multiple pages, the old approach was to point every page from the second through the tenth back to the first page. Crawlers have since changed their rules. A URL like /blog/page/2 for the second page must point to itself, with href="https://site.com/blog/page/2". If you point everything to the first page, the 20 articles starting from the second page will be treated as non-existent.

For websites still using the old m.site.com mobile domain, the desktop and mobile versions need to reference each other. The mobile page’s canonical tag must point precisely to the desktop URL. The desktop page should add a rel="alternate" tag with a media query of max-width: 640px that points to the mobile URL, helping crawlers match both sides’ content within 0.1 seconds.

For large multilingual websites with hreflang language tags, setting canonical code requires extreme care. The French version under fr/ must never point its canonical to the English version under en/; each version points to itself. Consistency checks for such matches have a 45% failure rate on large websites, and slightly misdirected references can destroy the entire multilingual index.
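The two-way annotation is easy to get backwards, so it can help to generate both tags from one place. This is a minimal sketch with hypothetical function names; the 640px breakpoint is the value quoted in the text.

```python
def desktop_annotation(mobile_url: str, max_width_px: int = 640) -> str:
    """Tag for the DESKTOP page: announces the mobile alternate."""
    return (
        f'<link rel="alternate" media="only screen and (max-width: {max_width_px}px)" '
        f'href="{mobile_url}">'
    )


def mobile_annotation(desktop_url: str) -> str:
    """Tag for the MOBILE page: canonical points back to the desktop URL."""
    return f'<link rel="canonical" href="{desktop_url}">'


print(desktop_annotation("https://m.site.com/page"))
print(mobile_annotation("https://site.com/page"))
```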

Three Professional Bottom Lines

Pointing a canonical at a page that carries a noindex tag is an extremely common pitfall for beginners. The crawler reads the instruction on page A, goes to page B, and finds code explicitly stating “do not index”. The two instructions conflict, so the search engine cannot tell which page to keep, and the authority that pages A and B accumulated over three years can be lost.

When a website undergoes redesign or URL changes, casually filling in old links easily leads into a chain of 301 permanent redirects. The crawler follows the address and finds it needs three consecutive jumps to reach the complete content. Once a chain exceeds 5 jumps, the crawler terminates the current crawl task, and the website loses nearly 600 valuable crawl requests daily.

Missing one letter “s” at the beginning of a URL causes a traffic disaster, because it assigns a page that has an SSL certificate to an unencrypted old page. Once Google’s algorithm detects the protocol downgrade, it removes that URL’s security badge within 24 hours, and the page’s search impressions drop by more than 60%.
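Redirect chains can be detected offline from a table of old-to-new URLs before they ever reach a crawler. The sketch below works on an in-memory map rather than live requests; the function names are hypothetical, and the default limit of 5 hops is simply the figure used in the text.

```python
def follow_redirects(redirects: dict, start: str, max_hops: int = 5):
    """Follow a URL through a redirect map.

    Returns (final_url, hops). Raises ValueError on a loop or over-long chain.
    """
    seen = {start}
    url, hops = start, 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        if hops > max_hops:
            raise ValueError(f"more than {max_hops} hops starting from {start}")
        seen.add(url)
    return url, hops


def flatten(redirects: dict) -> dict:
    """Point every old URL straight at its final destination (one hop)."""
    return {old: follow_redirects(redirects, old)[0] for old in redirects}
```

Running flatten on the redirect table and deploying the result means every old URL reaches its destination in a single jump, and canonical tags can then be pointed at the final URLs only.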

Typing errors when entering URLs easily create a batch of invalid instructions:

  • A missing trailing slash at the end of a URL makes the system treat it as a completely different address.
  • Mixed uppercase and lowercase letters: Store versus store triggers path recognition errors.
  • Test domain names copied unchanged into production source code.
  • Paths containing two dots (..) break path resolution.
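The typos listed above are mechanical enough to lint for. The following sketch is one possible checker, assuming you know your production host and whether your site's convention uses trailing slashes; lint_canonical and its parameters are hypothetical.

```python
from urllib.parse import urlsplit


def lint_canonical(href: str, production_host: str, trailing_slash: bool) -> list:
    """Return a list of problems found in a canonical href."""
    problems = []
    parts = urlsplit(href)
    if parts.scheme != "https":
        problems.append("scheme is not https")
    if parts.netloc.lower() != production_host.lower():
        problems.append(f"host {parts.netloc!r} is not the production host")
    if parts.path != parts.path.lower():
        problems.append("path mixes uppercase and lowercase letters")
    if ".." in parts.path.split("/"):
        problems.append("path contains a '..' segment")
    if parts.query:
        problems.append("URL carries a query string")
    # The root path is exempt from the trailing-slash convention check.
    if parts.path not in ("", "/") and parts.path.endswith("/") != trailing_slash:
        problems.append("trailing slash does not match the site convention")
    return problems


print(lint_canonical("http://staging.site.com/Store/../x?a=1", "site.com", True))
```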

If two different SEO plugins are accidentally enabled in the website backend, the webpage head will almost certainly contain two conflicting canonical tags. When the crawler reads the first 15KB of HTML and encounters two competing main URLs, the algorithm discards both lines. Thousands of similar pages then start competing for ranking again.

When adjusting the code of category listing pages, it’s common to point every page after the first back to the first page. When the crawler follows the first page’s instructions to look further, it finds that the next 49 pages all point back. The 1000+ old articles listed from the second page onward lose their ticket to the search index.

URLs containing question marks and dynamic session IDs must never be written into the href attribute. Each time a visitor clicks, the backend generates a new random string of numbers, so within a single day the system can create 30,000 useless separate URLs. Setting URLs with garbled parameters as the main page causes server memory load to rise 300% within a week.

When checking website code health, experienced professionals follow a standard screening procedure:

  • Run Screaming Frog software to do a deep scan of all 50,000 webpages on the site.
  • Remove rows with status codes other than 200 OK from the table.
  • Export an Excel error warning list of pages missing canonical tags.
  • Check the red error warnings in the index coverage section of the console.
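The filtering steps can also be scripted against a crawl export. This sketch assumes a CSV with the columns "Address", "Status Code" and "Canonical Link Element 1"; those column names are an assumption about the export layout and should be adjusted to match the actual file.

```python
import csv
import io


def pages_missing_canonical(csv_text: str) -> list:
    """Return addresses of 200 OK pages whose canonical column is empty."""
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row.get("Status Code") or "").strip() != "200":
            continue  # skip redirects, 404s and other non-200 rows
        if not (row.get("Canonical Link Element 1") or "").strip():
            missing.append(row["Address"])
    return missing
```

The returned list corresponds to the warning list of pages missing canonical tags described in the third step.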

For multinational multilingual websites, language tags must be tightly bound to the canonical code. The canonical of a Japanese version page must never be assigned to an English version page. The algorithm takes 0.3 seconds to compare the language differences between two pages and detect the mismatch. A multilingual website architecture that cost hundreds of thousands then faces an 80% devaluation.

Pointing all delisted, out-of-stock product pages to the website homepage is extremely dangerous. Webmasters want to preserve the 10 years of external links accumulated by old pages, but when the crawler compares the homepage full of promotional banners, it finds nothing related to the original shoe-selling page. The algorithm labels this forced pairing a “Soft 404”. After 15 days, all offending pages are thrown out of the index.

Websites that still use separate URLs for desktop and mobile easily reverse the arrows on both sides: the mobile URL does not point to the desktop main site, and the desktop page is missing the screen-size annotation. Getting the two-way matching wrong means the 70% of users on mobile are very likely to land on desktop pages with severely misaligned layouts.

Right-click and open the browser’s view-source page, press Ctrl+F, and search for canonical to check how many times it appears. Only when the counter in the upper right corner shows 1/1 is it safe. If it shows 1/3 or more, go to the backend and disable the conflicting plugins. This simple method is often the most effective in daily troubleshooting.

When articles are batch-copied by content farms using scrapers, the canonical code on the original author’s site can serve as an anti-theft lock. Scrapers copy the HTML source along with it, so the line with the absolute URL also appears on the pirate website. The search engine compares the ownership declarations from both sides within 2 hours and returns 95% of search traffic to the original first-published URL.

The code’s placement and format follow strict rules that leave no room for carelessness:

  • The code must be placed in the <head> area at the very top of the HTML document.
  • Never put this tag in the <body> block that holds the main content.
  • If the URL contains non-ASCII characters such as Chinese, enter the decoded, readable form.
  • For PDF files, the canonical must be sent as an HTTP response header configured at the server level.

Content Differentiation Rewrite

Split Search Intent

If a website has two articles about the same thing, visitors click in, read for a few seconds, then close the page, and the bounce rate stays stuck at 85%. Rewrite one of the articles with a different approach, written specifically for people rushing to work. Replace the first 500 characters of explanation with images and short text that show how to get a hot latte in 3 minutes in the morning.

Knowledgeable readers searching for the same machine want to see whether the pump pressure is stable. Rewrite the second article as a test report with screenshots of 15 bar pump pressure tests. Place a 92-degree temperature control curve chart in the middle of the page and attach 12 videos shot with the 58mm brew head. The two articles now look completely different.

Content for regular readers cannot have too many obscure terms. Lower the article’s reading difficulty: delete all complicated machine terminology, and make short sentences account for more than 75% of the full text.

  • 1.5L water tank brews 5 cups
  • Steam pipe with 45-degree tilt angle
  • Plastic shell withstands 120 degrees Celsius
  • Box contains 24-page manual

Experts spend longer reading; even if a webpage has 2500 words, they still read with great interest. Adding several sets of hardware parameter comparisons can keep the average time on page at a steady 4 minutes 30 seconds.

  • Temperature control system with plus or minus 1 degree fine adjustment
  • Two boilers with total 1500 watts
  • Brass fittings estimated to last 10 years
  • Pressure gauge needle lags 300 milliseconds

Submit the two rewritten articles to the search engine, and the titles visitors see will automatically separate. Searching “newbie coffee machine” brings up articles with $200 price tags and images. Searching “single group machine review” brings up long tech articles with test tables. Two weeks later, the 150 daily clicks that used to compete with each other now have their respective owners.

When writing machine reviews, browse real buyer reviews on e-commerce websites. Find 120 historical comments with less than 4 stars on Amazon and count how many mention the bottom drip tray leaking. Insert a short video of the machine’s operating noise reaching 65 decibels into the third paragraph of the webpage. With real footage that can be seen and heard, the number of people willing to scroll down to the second screen increases by more than half.

Put the 7-day no-questions-asked return policy in bold, below the red shopping cart button. People buying things definitely stare at the length of the after-sales period before placing orders. Adding this promise text silently extends page dwell time by about 40 seconds.

  • Safety certification number from the lab
  • Promise of less than 3% repair rate within three years
  • Customer service definitely responds within 48 hours

Include specific dollar amounts in the webpage subtitle. Mark “budget under $500” to keep out random curious visitors. Visitors with buying intent who click in go on to browse 2.5 pages on the site on average. No one closes the webpage after one look, and the penalty for high similarity fades away.

Expand dry product descriptions into illustrated shopping guides with 10 Q&A text boxes. Among the 200 people visiting daily, 15 will click to read the boxes. The ratio of word count to substance must be just right, so delete all the rambling filler. Post 5 real shots with a tape measure, noting whether the 28cm wide body fits in the kitchen.

Add a few expandable panels in the webpage that can be clicked with the mouse, filled with long factory reports. Only people who really want to research will click open that PDF file occupying 2MB of space. In the backend code, replace the original H1 tag, and connect the two articles with hyperlinks. People who finished the newbie illustrated guide click a link and naturally go to the new product showcase.

Shift Perspective


An electric standing desk 120cm long and 60cm wide can be written about in two completely different ways. The first article takes the tone of a freelancer, using “I” throughout to chat casually. Talk about spending 8 hours typing at the computer daily, the fifth lumbar vertebra getting so sore you had to visit an orthopedist.

Turn load capacity numbers into living room scenes. On the desktop sit two 27-inch monitors, a 5-pound big fat cat, and a cup of 400ml hot Americano. Press the plastic button on the desk corner, the motor pushes the thick wood board upward. The coffee liquid surface in the glass barely moves more than 2mm.

The entire article’s tone should be like chatting over beer at a street stall. Use lots of short sentences starting with “we”, “hey”, “see”. Write about your clumsy self spending 45 minutes tightening the 16 long screws at the bottom, so tired you plopped down on the wooden floor gasping.

Mention the scene of the whole family sleeping soundly at 11pm. Press the height reduction button, hold the decibel meter close to the desk leg, and measure the motor sound at only 45 decibels, about the same as an old electric fan on low. This sound won’t wake the 3-month-old baby in the next bedroom.

Switch voice for the second article and change the subject throughout to the company’s procurement officer. The tone should be like a meticulous warehouse manager with a calculator. Draw a 150 square meter open-plan office on paper and calculate that after fitting 20 desks at once, there’s 80cm of aisle space left.

Remove the fat cat and the coffee cup from the first article and replace them with thick safety inspection certificates. Emphasize that the desktop passed the BIFMA formaldehyde release test. In a closed 20 square meter room without windows for a full 7 days, the instrument measured the odor index in the air firmly at 0.03mg.

People viewing the desk | Personal home use | Company bulk purchase
Weight test | Two 27″ screens plus stuff total 35 lbs | Withstand 150 lb industrial sandbag
Motor lifespan | 4 lifts per day, estimated 3 years | 10,000 continuous lifts, still cool to touch
What if it breaks | Ask customer service for a spare power adapter | Contract includes full dual motor unit replacement

The boss buying the desk doesn’t care how tired you were tightening screws; they’re watching the installation technician’s work speed. Write that for orders of 10 or more desks, the manufacturer immediately sends 3 workers in uniform. They bring 3 powerful electric screwdrivers and spend only 2 hours assembling all the scattered steel tubes, powering on, and testing.

Content for company buyers needs more invoicing details. Add a screenshot of tiered pricing with 15% off for one-time purchases of $5000 or more. State clearly the 15-character limit for the invoice title (the company name printed on the invoice), and include the 3-5 working day financial review time for company bank transfers.

For articles for home users, sentences are very short:

  • Yesterday my back hurt so bad I couldn’t bend over
  • The button feels a bit crisp when pressed
  • Red sauce from takeout drips on the table, wipe it off with a paper towel easily

Articles for buyers are full of hard technical terms:

  • Bulk order includes 12-page environmental review documents
  • Steel frame exterior sprayed with 2mm anti-rust paint
  • Comes with five-year enterprise-level on-site warranty card

In articles for home use, the desktop color is called “cherry natural wood style”, said to pair especially well with the cream-colored linen curtains at home. In content for company viewing, that color needs to be renamed “scratch-resistant laminate”. Write that scratching the wood board vigorously with a metal key creates 15cm white marks, which wipe clean with a damp cloth.

Remember to change how you describe the preset height buttons too. In the personal version, it says long-press the “1” key for 3 seconds, and the desk stops at the height suitable for someone 1.75m tall at 102cm. Press “2” key to drop to 75cm, just right for the $200 second-hand black leather swivel chair.

In the personal webpage, the photo shows a warm yellow desk lamp lighting up the wood grain in close-up. An old novel opened to page 30 rests casually beside it. The company procurement webpage’s image shows 20 empty desks neatly arranged in two columns under fluorescent lighting. Clean plastic cable channels line the floor, not a single extra black power cord visible outside.

Add Exclusive Value

Other websites have 300+ articles that are identical down to the punctuation marks. I spent 899 yuan buying the same coffee machine, grabbed a screwdriver, and removed the bottom shell. Held the macro lens against the heating mainboard, which is only two fingers wide, and took over 20 photos. Selected the clearest enlarged photo and posted it at the very beginning of the article.

Normally people online only see the manufacturer’s retouched pretty photos, rarely see ones with mud. I took a photo of the brew head covered in brown coffee grounds and posted it. The photo has a timestamp watermark showing 3pm that afternoon. When visitors saw the webpage, the finger scrolling the mouse stopped.

Went to the coffee shop owner at the corner downstairs who had been open for 5 years. Took out a recorder, recorded 3 minutes of him talking about the machine’s continuous cup-making ability. Listened to the recording at home and typed, writing the rate at which water temperature drops into a bordered text block.

  • Temperature drops to 85 degrees on the 6th consecutive cup
  • Steam wand gets one-third of holes clogged after extended use
  • Original basket can’t fit 18g dark roast beans
  • Top hot cup plate holds at most 2 cups

The shop owner’s authentic colloquial words are mixed into the article, so duplicate-checking software can’t find duplicate paragraphs. After three months on the counter, black grime had accumulated in all the seams of the plastic shell. Scrubbed the corners with a discarded toothbrush and water for a full 12 minutes, and typed all the messy details into the webpage.

Bought a kitchen digital scale that can measure to 0.1g. For 30 consecutive days, weighed the consumed coffee beans every morning and recorded on paper. Put the 30 recorded numbers into a computer spreadsheet, drew a line chart with ups and downs, and firmly placed it in the center of the webpage.

Added the specific date the original rubber seal ring broke in the middle of the article. The first ring used until day 105, a tiny crack appeared at the edge. Went to the hardware store on the street and spent 2 yuan getting a same-size silicone ring. Wrote 150+ words about the money-saving DIY experience.

The water tank lid always falls off, buyers often complain in the reviews. Found a small square magnet 3cm by 3cm, applied glue and pressed it on the back of the lid. Took 4 step-by-step photos of the lid modification with a phone, arranged them in order below the text.

Early morning when starting up the machine to preheat, used the phone to record a video close to the shell. No background music throughout, keeping the buzz of the water pump. The video is exactly 45 seconds, uploaded to the webpage’s video player for everyone to watch.

  • 42 seconds to preheat the machine
  • 6 seconds after pressing switch for liquid to flow
  • 15 more seconds to heat the steam wand
  • Waste grounds drip tray holds exactly 300ml when full

Buyers want to hear how loud the machine is in the kitchen. After a day, the backend data recorded the video progress bar being dragged back and forth over 200 times. Added a screenshot of the 72‑decibel noise reading from a decibel meter held close in the article.

Also calculated the ongoing cost of parts that need constant purchasing after buying the machine. Need to replace the soft water filter 4 times a year, original filters priced at 80 yuan each. One year just on filter parts costs 320 yuan, the detailed cost calculation takes up over 100 words on the webpage.

Bought a 199‑yuan manual coffee press pot beside as a control group. Opened the same bag of Ethiopian coffee beans, weighed out 15 g on each side. The manual press requires about 15 kg of force, finishing one cup with sweat on the forehead in 2 minutes; the measured numbers from both are written side by side.

The coffee pulled by the machine has a crema layer about 4 mm thick. The manual press produces only a thin 1 mm layer of crema, which disperses in less than 2 minutes. Measured both with a tape measure, took a comparison photo showing the scale marks clearly and uploaded it to the webpage.

Ordered a cheap $9.9 plastic hard‑bristle brush online, plus a $50 small can of cleaning powder. A screenshot of the shopping list with all items heavily pixelated is placed near the bottom of the webpage. People reading the article stare at the long list of expenses and start calculating the pocket change.

Found a maintenance service point table that nobody normally organizes. Called the after-sales hotlines in the top 10 cities nationwide and recorded all 10 street addresses in the article footer. Added each shop’s business hours beside it: opening at 9 am and closing the shutters at 6 pm.

Taught step by step how to remove the stubborn old coffee stains from the brew head. Boiled a pot of 100‑degree water on the gas stove, removed the metal filter screen and soaked it in a water basin for a full 20 minutes. Wrote out the action of using a toothpick to pick out 5 black hard pieces from the screen holes, the whole screen full of everyday hustle and bustle.

  • Soak in boiling water for 20 minutes to soften old grease
  • Scrub back and forth 30 times with hard brush on both sides
  • Air dry on windowsill takes 4 hours

Using Noindex Tags

Best Situations to Use

Take a store selling 500 short-sleeve shirts with color and size options in the sidebar. As visitors select random combinations, the backend generates 250,000 URLs with filter parameters. The search robot’s daily quota is only 100,000, and all of it is consumed by near-identical product listings. Adding a noindex tag to pages with more than 3 stacked options is the standard practice.
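The "more than 3 stacked options" rule can be expressed as a small template helper. In this sketch the set of filter parameter names is hypothetical and must be replaced with the names your store actually uses; robots_meta_for is likewise an illustrative name.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical filter parameter names; replace with your store's own.
FILTER_PARAMS = {"color", "size", "brand", "price", "sort"}


def robots_meta_for(url: str, max_filters: int = 3) -> str:
    """Return a noindex meta tag when too many filters are stacked, else ''."""
    query = urlsplit(url).query
    # parse_qsl skips parameters with blank values by default.
    active = {key for key, _ in parse_qsl(query) if key in FILTER_PARAMS}
    if len(active) > max_filters:
        return '<meta name="robots" content="noindex, follow">'
    return ""


print(robots_meta_for("https://shop.com/shirts?color=red&size=m&brand=x&sort=price_asc"))
print(repr(robots_meta_for("https://shop.com/shirts?color=red&size=m")))
```

Keeping follow in the directive lets crawlers still pass through the filtered page to the product pages it links to.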

The original product page remains the only version eligible for indexing. The technical team’s weekly report shows the error rate for useless pages dropped below the 5% safety line.

In another case, programmers tested 4 new page layouts on a subdomain named test. They forgot to add a directive, and within two weeks 4,500 garbled, half-finished pages ended up in public search results. Websites in the test phase must carry noindex directives. Set this rule at the server level so that it covers 100% of pages.

WordPress-built websites automatically generate archive pages by date and author, so an 800-word diary entry gets copied verbatim to 5 different URLs. Install a small plugin in the backend and set all archive directories without individual articles to noindex. The main category directory stays indexable, and the risk of search traffic decline is lowered by 30%.

The 2022 Double 11 promotion page is full of long-expired 50 yuan coupons. Its click-through rate is less than 0.01%, and leaving it on the main site drags down the overall quality rating by 15 points. Adding noindex tags to expired campaign pages is a cost-effective move: returning customers can still open the page from bookmarks or history, while search engines remove it from the index within 72 hours.

When visitors type into the site search box, the system assembles a dynamic results page. Crawlers that follow search links can create 15,000 broken, useless pages overnight. Add a noindex declaration at the very top of the search template file. To clean up the 12,000 search URLs already indexed by mistake, submit a 6-month removal request in the backend.

Look at specific handling methods for different types of webpages:

Webpage content characteristics | Special characters in URL | Handling method | Estimated cleanup time
Login form page | Contains login | Add code to webpage header | 48 to 72 hours
Backup pages for testing | Contains variant | Server directive | 7 to 14 days
Internal staff directory | Contains staff | Set global in plugin backend | 3 to 5 days
E-book download package | Has pdf suffix | Config file block rule | More than 15 days

Download links for multi-page PDFs or Word format whitepapers cannot use regular webpage tags. In the server configuration file, add response header directives for 3 specific suffix files to complete blocking.

A new forum has 100,000 registered users, most of whom never posted after registering. The 80,000 blank personal pages with only usernames and no posts pull the site’s trust score very low. Set a rule that personal pages of users with fewer than 5 posts automatically get noindex tags. A medium-sized forum used this trick, and the effective indexing rate in its backend records rose from 22% to 68%.

Some webpages have outrageously high content duplication:

  • 5000-word privacy policy terms copied verbatim by 20 subsidiary websites
  • Affiliate sites building lots of redirect fake pages for 5% commission clicks
  • Simplified mobile pages removing 90% of decorative code

The main site retains the indexing rights for that 1 privacy policy, and the terms pages on all 19 subsidiary sites get noindex directives. Adding noindex tags to redirect pages cuts off the crawler’s 2nd round of probing on intermediate pages. One tag declaration moves simplified pages out of the regular webpage pool.

Subsidiaries of multinational companies often provide similar multilingual versions. A directory made for the Canadian English region has 95% text overlap with the US main site. Add noindex tags to 300 non-essential service pages on the Canadian subsidiary. Search engines will then prioritize sending the US main site’s high-authority pages to 15 regional search results.

Financial institutions issued 120 quarterly reports, each with an ad-free print version. The 2,500 print versions remove the top navigation bar and 4 bottom declaration blocks. Batch adding noindex directives to all print version URLs across the site is a basic operation. The main financial report pages remain unaffected, and the proportion of redundant requests in internal logs dropped by 18%.

Expired job postings on recruitment websites keep accumulating, reaching 500,000. Detail pages for positions that have already been filled still receive 300,000 invalid crawl requests daily. Apply a code-level noindex to positions that have been closed for 45 days. Crawlers then allocate their resources to new, time-sensitive positions, and indexing time for new job postings is shortened by 14 hours.

A small independent store selling 200 phone case models provides 6 color display images for each case. The system generates 1,200 separate product pages, one per color, but the text description is identical word for word. Keep the red flagship model as the main page and add noindex code to the 5 remaining color pages of each case. The store owner checked 30-day reports and found search impressions for those 200 phone cases increased by 45%.

Local news websites republish 50 external articles daily, with very long source-tracking codes. These articles generate 150 redundant links with UTM parameters daily, which accumulate on the server. Write noindex rules into the template for URLs with parameters. The 150 redundant links no longer compete for spots in the search pool, and the 50 clean original articles get more display opportunities.
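The 45-day rule for closed positions comes down to a date comparison at render time. Here is a minimal sketch, assuming each posting stores the date it was closed (or nothing while it is still open); the function name and its parameters are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def job_robots_directive(closed_on: Optional[date], today: date,
                         grace_days: int = 45) -> str:
    """Return the robots meta content for a job detail page."""
    if closed_on is not None and today - closed_on >= timedelta(days=grace_days):
        return "noindex, follow"  # closed long enough ago: drop from the index
    return "index, follow"  # open, or only recently closed


print(job_robots_directive(date(2024, 1, 1), today=date(2024, 2, 15)))
print(job_robots_directive(None, today=date(2024, 2, 15)))
```

The returned string goes into the content attribute of the page's robots meta tag.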

How to Add

Open the webpage file, type a few lines between lines 3 and 5. Open the file with .html suffix in Notepad, find the location of the <head> tag. Copy and paste <meta name="robots" content="noindex, follow"> in, upload to the server and it’s done.

This 43-character sentence can block search engines at the door. The code retains the followattribute, so the machine crawler follows the 20 underlined links on the webpage to read other articles. With only 100,000 crawl quotas per day, adding this code protects the entire site’s crawl channel from being blocked.

People who can’t code can rely on website backend plugins. Install the Yoast SEO plugin, which has over 5 million downloads, and open the article in the backend editor. Scroll down to the advanced settings block with a gear icon and change the allow-indexing dropdown from “Yes” to “No”.

The whole process takes just 3 mouse clicks, and the plugin automatically inserts the tag into the page's <head>. Viewing the source of the 50 pages just modified in a browser shows the noindex line sitting quietly at line 8.
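Checking page source by hand gets tedious beyond a handful of pages. The following is a minimal sketch, using only the Python standard library, of how the same check could be automated; the names RobotsMetaFinder and has_noindex are made up for illustration, and the sketch only looks at <meta name="robots"> tags inside <head>.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content of every <meta name="robots"> tag found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr = {name: (value or "") for name, value in attrs}
            if attr.get("name", "").lower() == "robots":
                self.directives.append(attr.get("content", ""))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def has_noindex(html: str) -> bool:
    """Return True when a robots meta tag in <head> contains the noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    for content in finder.directives:
        tokens = {token.strip().lower() for token in content.split(",")}
        if "noindex" in tokens:
            return True
    return False
```

Feeding it the saved source of each modified page gives a quick pass/fail list. A tag that ended up in <body> by mistake is deliberately reported as missing.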

For 5MB PDFs or 10MB PPT files, the meta tag above doesn’t work. Non-web file formats don’t have a <head> area to hold directives. Setting an HTTP response header in the server config file solves this problem.

For websites running on Apache, find the file called .htaccess in the root directory. Add 3 lines of rules that send an X-Robots-Tag header specifically for .pdf, .doc and .ppt files.

Every time a crawler requests one of those 2,500 whitepaper files, the server attaches a noindex directive to the response. The file itself still returns a normal 200 status code so regular visitors can download it, while the crawler reads the header and leaves the file out of the index.
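Once the server rule is in place, it helps to confirm that the header is really being sent. Below is a small sketch in Python, standard library only; header_blocks_indexing and fetch_headers are illustrative names, and the check ignores the less common form where the header value is prefixed with a specific crawler name.

```python
import urllib.request
from typing import Mapping


def header_blocks_indexing(headers: Mapping[str, str]) -> bool:
    """Return True when an X-Robots-Tag response header carries noindex (or none)."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            tokens = {token.strip().lower() for token in value.split(",")}
            # "none" is shorthand for "noindex, nofollow".
            if "noindex" in tokens or "none" in tokens:
                return True
    return False


def fetch_headers(url: str) -> dict:
    """Send a HEAD request and return the response headers as a plain dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())
```

Calling header_blocks_indexing(fetch_headers(url)) on a few of the PDF URLs should return True, while the same call on a normal product page should return False.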

Here is where to make the change on 4 common platforms:

| What the website is built with | Where to find the modification entry | How many lines of code to add | Estimated time to take effect |
| --- | --- | --- | --- |
| Pure HTML static site | <head> of index.html | 1 line | 24 – 48 hours |
| WordPress system | Plugin advanced settings panel | 0 lines (automatic) | 12 – 24 hours |
| Shopify store | theme.liquid template | 3 to 5 lines of logic | Within 48 hours |
| Nginx server | nginx.conf config file | 2 lines of header directives | Immediate effect |

Shopify store owners dealing with 15,000 messy search-parameter URLs need to edit the store’s main template file. Find theme.liquid in the backend code editor and scroll to the <head> section, around line 12. Add about 5 lines of conditional logic built on if template contains 'search'.

The condition acts as an automatic switch: the noindex directive is output only on pages produced when a visitor searches the site. Original product pages and the 50 official blog posts are still indexed normally. At the end of the month, the store owner checks the logs and finds the error count for messy URLs has dropped to single digits.

After finishing, check whether the 1,000 tags you added are actually taking effect. Log into Google Search Console and open the “Pages” report in the left menu. Find the entry called “Excluded by ‘noindex’ tag” and open its chart.

The curve on the chart climbs steadily as you add tags to more pages. Pick 5 of the URLs you just modified and enter them in the URL inspection box at the very top. The inspection tool runs for about 15 seconds and returns a report with 8 detailed parameters.

If the report reads “noindex detected”, the tag is working. One prerequisite: if robots.txt blocks these URLs with a Disallow line, delete that line. A crawler can only see the new tag on a page it is allowed to fetch.

For large category listings with 500 subpages, the handling needs a few more lines of logic:

  • Add the noindex tag to the <head> of every paginated page from page 2 to page 500
  • Write a short regex in the pagination template to pick out the page number
  • Handle page 1’s URL separately to ensure it isn’t accidentally caught by the first two rules
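The three rules above can be sketched as follows. This is an illustration only: it assumes pagination is expressed as a ?page=N query parameter, and page_number and robots_meta are hypothetical helper names rather than part of any template system.

```python
import re

# Assumed URL pattern: the page number appears as ?page=N or &page=N.
PAGE_PARAM = re.compile(r"[?&]page=(\d+)")


def page_number(url: str) -> int:
    """Return the page number in the URL, treating a missing parameter as page 1."""
    match = PAGE_PARAM.search(url)
    return int(match.group(1)) if match else 1


def robots_meta(url: str) -> str:
    """Return the noindex tag for page 2 and beyond, and nothing for page 1."""
    if page_number(url) >= 2:
        return '<meta name="robots" content="noindex, follow">'
    return ""
```

Treating a URL with no page parameter as page 1 is what keeps the first page of each category indexable.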

Internal Link Redirects

Application Scenarios

An e-commerce platform launches a pure cotton short-sleeve T-shirt. The system automatically creates 24 product URLs with long strings of letters, one for each combination of 3 sizes (S, M, L) and 8 colors.

The search engine sends a crawler through all 24 URLs, and the text of the crawled pages is 98% similar.

The crawler spends 75% of its crawl budget on these near-identical T-shirt pages, slowing the indexing of the 800 new clothing items the website adds daily.

Write a rule in the backend that redirects all 24 complex URLs to one clean main URL, /tshirt. Visitors arriving from external sites all land on this single main page.

Regular customers often still click through to last winter’s out-of-stock down jacket. The 2023 model’s page still receives 150 daily clicks from browser bookmarks.

The page shows only the words “Product discontinued”, and 85% of visitors close the browser tab within 3 seconds.

Set a 301 redirect on the old jacket URL, sending visitors to the brand’s latest 2024 winter down jacket list page.

Visitors’ attention is captured by a full screen of products currently on sale, and the bounce rate dropped by 40%. The ranking signals accumulated by the old URL flow to the new list page, pushing it up 3 positions in search results.

When a redesign changes the domain or the backend system, large numbers of previous URLs stop working. The old system habitually used URLs with question marks and numbers, like /article.php?id=567.

The new system switches entirely to static URLs made of English words, so thousands of old links risk being discarded in bulk.

The search engine’s database holds 50,000+ records of old URLs, and the server still returns nearly 8,000 “page not found” 404 errors every day.

Programmers write code that extracts the numeric ID from each old URL and matches it to the corresponding new static URL. The server processes 200,000 redirect requests within a week, smoothly moving the website’s four years of accumulated external link reputation to its new home.
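The ID-matching step might look roughly like the sketch below. The NEW_PATHS table and the slug in it are invented placeholders; in practice the table would be exported from the new system's database.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical lookup table: numeric article ID -> new static path.
NEW_PATHS = {
    "567": "/articles/example-slug",
}


def new_url_for(old_url: str) -> Optional[str]:
    """Map an old /article.php?id=N URL to its new static path, or None if unknown."""
    parts = urlsplit(old_url)
    if parts.path != "/article.php":
        return None
    ids = parse_qs(parts.query).get("id", [])
    if not ids:
        return None
    return NEW_PATHS.get(ids[0])
```

Old URLs that return None have no counterpart in the new system and deserve a manual decision rather than a blanket redirect to the homepage.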

As content websites age, they often accumulate piles of similar old articles. The database holds five old articles about “how to maintain phone battery”, each receiving only 200 to 500 visitors daily.

These five articles compete with each other in search results, and none of them reaches the top five on the first results page.

The administrator combines the text of the five old articles and edits it into one comprehensive 3,000-word battery maintenance handbook.

The server sets 301 redirects on the five old article URLs, directing all traffic to the new handbook page. The day after the new page goes live, the combined historical traffic pushes its daily average visitors to 1,500+.

Ahrefs reports show the 120 high-quality external links that pointed at the old articles now following the redirects and leading the search engine crawler to the new page.

Moving a website to HTTPS changes the port it is served on. All requests arriving on port 80 over the old HTTP protocol must move to port 443 over encrypted HTTPS.

When an unsecured old page is opened, Chrome marks it “Not secure” in the address bar.

In the .htaccess config file in the server root directory, add three lines of rules that force all traffic through the HTTPS encrypted channel.

The search engine took two days to swap the secure URLs into the rankings, and indexing didn’t drop at all. The nearly 30,000 daily requests for old URLs are each moved onto the secure connection within 50 milliseconds.

In the past, mobile pages lived on a separate “m” subdomain (like m.domain.com/page1). Modern pages are smarter: desktop and mobile share one URL and the layout adapts automatically.

If you simply shut down the mobile-only “m” domain, everyone who saved it in their mobile browser’s bookmarks is left with dead links.

Set a domain-wide redirect rule that catches every request to the “m” domain and sends it to the corresponding page on the main site.

The 5,000 daily visitors tapping old bookmarks on their Android phones don’t even notice a flicker before landing on the new page. The site’s organic search traffic from mobile devices transitioned smoothly within two weeks, without any demotion by search engines.
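The mapping behind such a rule is simple enough to sketch and test before touching the server. The example below assumes the main site lives on the bare domain (for instance domain.com rather than www.domain.com); desktop_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def desktop_url(mobile_url: str) -> str:
    """Strip a leading "m." from the host while keeping path, query and fragment."""
    parts = urlsplit(mobile_url)
    host = parts.netloc
    if host.lower().startswith("m."):
        host = host[2:]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

Keeping the path and query intact is the point: each old mobile bookmark should land on its matching page, not on the homepage.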

For multilingual websites spanning multiple countries, even a slight change to language folder names can leave international visitors lost. For example, the original /en-us/about address is changed to the shorter /en/about format.

How to Implement Safely

When moving or merging webpages, overly aggressive changes can easily break the server configuration. The .htaccess file in the server root directory is plain text that opens in regular Notepad, full of directives and punctuation marks.

One stray space or one missing slash can take all 500+ pages on the site down at once. The screen shows only a bare “500 Internal Server Error” message, and visitors can’t load even one image.

Webmasters who don’t understand code are safer using ready-made tools. In the WordPress plugin marketplace, there’s a tool called Redirection with over 2 million downloads.

In the settings panel, paste the old URL with the .html suffix in the left box and the new URL in the right box, then click save. The plugin writes the underlying redirect rule automatically.

Before starting, verify the URL list in hand:

  • Does the old URL have a www prefix
  • Is there a trailing slash / at the end
  • Have all uppercase letters in the URL been changed to lowercase
  • Have the image attachment addresses of the old webpage been transferred
  • Have the question mark parameter tails been cleaned
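Several of these checks can be applied mechanically by bringing every URL in the list to one comparable form. The sketch below is one way to do it, under two assumptions that may not hold for every site: paths are treated as case-insensitive, and the bare domain is preferred over the www form unless keep_www is set. The function name normalize is illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str, keep_www: bool = False) -> str:
    """Lowercase the URL, drop www (optional), the trailing slash and any query string."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not keep_www and host.startswith("www."):
        host = host[4:]
    # An empty path after stripping slashes means the site root.
    path = parts.path.lower().rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), host, path, "", ""))
```

Running the old-URL list through this function and removing duplicates shows how many distinct destinations the redirect rules actually need.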

Redirects between webpages work like a relay race. Point webpage A to webpage B, then some time later add a new rule pointing webpage B to webpage C.

Search engine crawlers follow only a limited number of consecutive redirects. Past that point the crawler abandons the chain and leaves without ever reading webpage C’s content.

Build an Excel sheet listing the 800 URLs and use a VLOOKUP function to query it. Identify every intermediate URL B that merely passes visitors along and take it out of the chain.

Change webpage A’s target directly to webpage C. The Screaming Frog crawl tool scans 10,000 URLs in under 5 minutes, and the rows highlighted in red in its report are all redundant chains with more than 3 redirects.
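For lists too long to untangle comfortably in a spreadsheet, the same flattening can be scripted. The following is a sketch rather than a drop-in tool: it takes the redirect table as a plain dict of source to target, and flatten_redirects is a made-up name.

```python
def flatten_redirects(redirects: dict) -> dict:
    """Point every source straight at its final destination, removing intermediate hops."""
    flat = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Keep walking while the current target is itself redirected elsewhere.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[source] = target
    return flat
```

Note one design choice: the intermediate URL B is kept as a source that points directly to C instead of being deleted, since B may still have bookmarks and external links of its own.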

Redirecting to an unrelated page turns the search engine’s review mechanism against you. Suppose a 2,000-word old article about “2021 old model phone case” is force-pointed to a product page selling “pure leather sofa”.

Googlebot compares the text of the two pages within 0.2 seconds. Finding that the copy on the two sides doesn’t match at all, it treats the redirect as a “Soft 404”.

The 30+ external links accumulated by the old article are not carried over to the sofa page.

The honest fix is to find a similar page selling “2024 new model phone case”. When the text overlap between the two pages exceeds 60%, nearly 90% of the previous credit transfers safely.

After writing the rules, run a few real-world tests by hand:

  • Switch off Wi-Fi and test all old URLs over the phone’s 5G connection
  • Clear all Chrome browser cache records and retest
  • Manually submit old URLs for inspection in Google Search Console
  • Use httpstatus.io to batch test 20 URLs’ real status codes
  • Continuously observe the traffic report line graph for one week

Old websites habitually stuff a <meta http-equiv="refresh" content="5;url=..."> into the HTML header. Visitors watch a 5-second countdown on screen before the page slowly refreshes and changes.

The countdown makes every visitor wait the full 5 seconds before reaching the new page, and Google treats a delayed meta refresh as a temporary redirect rather than a permanent one. A server-side 301 is the better choice.

When batch relocating thousands of product pages, writing a regex rule saves time and effort. In an Nginx environment, one line does it: rewrite ^/shoes/(.*)$ /new-shoes/$1 permanent;.

Put one bracket in the wrong position and the 800 daily visitors expecting to see shoes get sent to the website homepage instead. The number of shopping cart checkouts drops by 70% within 24 hours.
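Because a single misplaced character is so costly, it is worth trying the pattern against sample paths before it goes live. The sketch below replays the same regex in Python; for a pattern this simple Python's re module and Nginx's regex engine should agree, but that equivalence is an assumption and more exotic patterns should be tested on a staging server. SHOES_RULE and rewrite are illustrative names.

```python
import re

# Same pattern as in the Nginx rule: rewrite ^/shoes/(.*)$ /new-shoes/$1 permanent;
SHOES_RULE = re.compile(r"^/shoes/(.*)$")


def rewrite(path: str) -> str:
    """Return the redirected path, or the original path when the rule does not match."""
    return SHOES_RULE.sub(r"/new-shoes/\1", path)
```

A handful of real paths from the access log make good test inputs: every shoe URL should keep its tail, and every other URL should come back unchanged.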

The raw log file the server generates automatically every day is a valuable resource. It records millions of lines of human visitor and crawler requests. Open the .log file, which can exceed 50MB, in a text editor, filter for rows with a 404 status code, and pick out all the orphaned URLs. Spend two hours each week going through these abandoned URLs and assigning each one a 301 destination.
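A 50MB file is awkward to filter by eye, so the 404 extraction can be scripted. This sketch assumes the common Apache/Nginx access log layout, where the request line is quoted and followed by the status code; other log formats would need a different pattern. count_404s is a hypothetical name.

```python
import re
from collections import Counter

# Matches: "GET /some/path HTTP/1.1" 404 ...
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+" (\d{3}) ')


def count_404s(lines) -> Counter:
    """Count how often each requested URL returned a 404 status."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts
```

Passing an open file object works directly, for example count_404s(open("access.log")) with whatever the log file is actually called, and most_common(20) on the result lists the URLs that most urgently need a 301 destination.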
