How to Identify Whether the Links a Backlink Vendor Offers Come from “Link Farms”

Author: Don Jiang

You can judge by three indicators.

First, content quality. If a large share of pages have fewer than 500 characters, more than 30% of the content is duplicated, and updates are irregular, you are almost certainly looking at a low-quality site.

Second, backlink distribution. Natural backlinks are spread across many different domains and IP addresses. If more than 70% of them sit in the same Class C IP block (the same /24 range) or come from an obvious site network, the risk is extremely high.

Third, real value. Use tools to check whether the site holds stable keyword rankings (for example, at least 50 keywords in the top 100) and whether its indexed page count keeps growing.

A site that combines weak content, a concentrated link network, and no rankings should be treated as a link farm. Do not work with it.
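The three-part rule above can be written as a simple check. This is a minimal sketch, not a tool the article describes. The `SiteStats` fields are hypothetical inputs you would fill in by hand from your own tool exports. Reading “a large number of pages” as “more than half of the sampled pages” is an assumption, and the indexing-growth check is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    page_lengths: list[int]   # character counts of sampled pages
    duplicate_ratio: float    # share of content duplicated elsewhere, 0..1
    irregular_updates: bool   # posting schedule judged erratic
    top_block_share: float    # share of backlinks from the largest /24 block or site network, 0..1
    top100_keywords: int      # keywords ranking in Google's top 100


def weak_content(s: SiteStats) -> bool:
    # Assumption: "a large number of pages" means more than half of the sample.
    thin = sum(1 for n in s.page_lengths if n < 500)
    mostly_thin = thin > len(s.page_lengths) / 2
    return mostly_thin and s.duplicate_ratio > 0.30 and s.irregular_updates


def network_concentrated(s: SiteStats) -> bool:
    return s.top_block_share > 0.70


def no_real_rankings(s: SiteStats) -> bool:
    return s.top100_keywords < 50


def likely_link_farm(s: SiteStats) -> bool:
    # All three signals must fire together:
    # weak content + network concentration + no rankings.
    return weak_content(s) and network_concentrated(s) and no_real_rankings(s)
```

Requiring all three signals keeps a single bad metric from condemning an otherwise healthy site. This matches the article’s “weak content + network concentration + no rankings” rule.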

Why High DA/DR Doesn’t Equal High Quality

DA/DR Is Not Google’s Standard

In 2006, a small Seattle company called Moz created a score called Domain Authority (DA). At the time, everyone was still watching the green PageRank bar in the Google Toolbar, which maxed out at 10. The company’s programmers wrote their own crawler, followed links across the web, and assigned every website a score from 0 to 100.

Later, a company called Ahrefs built a similar score called Domain Rating (DR). It runs thousands of servers and crawls roughly 30 billion web pages a day. The score essentially counts how many backlinks a website receives and turns that count into a single number by brute force. Link sellers take screenshots of their DR 80 scores and use them to collect money from unsuspecting newcomers.

Google’s ranking code does not contain a single line that uses any third-party tool’s score.

Look at what Google’s John Mueller has said on Twitter. In December 2019, someone asked him how much DA affects rankings. His answer was blunt: Google doesn’t use DA at all. In 2020, he stated in three separate Reddit threads that Google has never had a ranking factor called “Domain Authority.”

These outside tools’ databases are only a small fraction of Google’s scale. Ahrefs advertises that it stores 3 trillion live backlinks, which sounds impressive. Back in 2013, however, Google already reported knowing of more than 30 trillion pages.

  • An obscure Wikipedia science page has a third-party score of only 12
  • A gambling redirect site launched just six months ago has a DA as high as 76
  • A dental clinic website in a third-tier city, operating for ten years, has a DR stuck at 5
  • A farm blog whose bots post 50 copied articles a day has reached DR 65

That dental clinic with a DR of only 5 has a homepage that ranks steadily at the top for the local search term “tooth extraction.” Its Google Maps listing has more than 300 genuine five-star reviews. Meanwhile, the watered-down dental article posted on the high-score link farm didn’t even make the top 200 results.

Many beginners watch the numbers in Moz’s tools fluctuate every day. One spent $200 on two links from DA 50 news sites. When the tool updated the following month, the score rose by 3 points. Yet the site’s actual analytics dashboard still showed zero visitors from Google search.

Look at how Ahrefs calculates its score: the software mainly counts how many unique referring domains point to you. Someone spent 50 yuan on a mass-posting plugin from Taobao and spammed a URL across tens of thousands of abandoned forums. The plugin ran overnight and Ahrefs picked up 2,000 new linking URLs. The following week, the DR score had doubled.
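Here is a toy illustration of why a score that mostly counts referring domains is easy to inflate. The `toy_link_score` function is hypothetical and is not Ahrefs’s formula. The real DR calculation is more involved, since it also considers the strength of the linking domains, and the toy ignores all of that. It also assumes each spammed forum counts as its own referring domain.

```python
import math


def toy_link_score(unique_referring_domains: int) -> int:
    """Hypothetical 0-100 score driven only by the number of unique referring
    domains. NOT Ahrefs's DR formula; it only shows why a count-based score
    is easy to inflate."""
    if unique_referring_domains < 1:
        return 0
    # Log scale: every tenfold increase in referring domains adds 25 points.
    return min(100, round(25 * math.log10(unique_referring_domains)))


before = toy_link_score(50)         # a small site with 50 referring domains
after = toy_link_score(50 + 2000)   # plus 2,000 spammed forums, one domain each
print(before, after)                # prints 42 83
```

Even on a logarithmic scale, one night of forum spam roughly doubles the number. Nothing about the site’s real audience has changed.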

These so-called high scores are just packaged numbers that commercial software uses to sell monthly subscriptions.

Google engineer Gary Illyes made the same point again at Pubcon in 2023. The algorithm evaluates the quality of individual pages, even pages on a brand-new, low-score domain registered yesterday. No serious search engine would hand its core business over to a few outside vendors selling monthly SEO subscriptions.

Backlink dealers’ price lists are always tiered by score: DA 30 to 40 costs $30, and DA 70 and above costs $500. Look closely at that $500 site, though. It sits on a cheap server in a Dallas data center and is stuffed with tens of thousands of machine-translated fake news articles. It gets fewer than ten real human readers a day.

Look at the historical snapshots of sites that were severely penalized by Google for cheating:

  • A webmaster blog whose real monthly traffic dropped from 50,000 to 30
  • Its Ahrefs DR still stubbornly holds at 72
  • Moz’s crawler still gives it an impressive 68
  • Majestic’s trust score even keeps climbing

When Google penalizes a cheating site, its pages disappear from search results entirely. Outside tools such as Moz and Ahrefs have no access to Google’s internal blacklist. Their crawlers only see that plenty of links still point to the dead site, so they keep giving it high scores.

In more than ten years of watching websites, I’ve seen far too many cases where high scores were useless. Take a small independent site selling handmade catnip. Across the entire web, only 15 real pet owners mentioned it on their personal blogs, and third-party tools rated its DA at a pitiful 8.

Yet it reliably receives 4,000 real visitors a month from Google search who come to buy, and those visitors bring in nearly $20,000 in sales every month. A competitor spent $3,000 on high-score backlinks and pushed its DR up to 60. It didn’t even earn back a few hundred dollars a year in hosting costs.

Ahrefs’s cheapest monthly plan costs at least $99. Its salespeople need a single, easy-to-understand number to convince customers that the software is worth paying for. A 0 to 100 score lets buyers see that 70 is bigger than 30 and immediately feel they’ve figured out the hidden rules of the internet.

Google works along an entirely different path. Its systems use thousands of fine-grained signals to judge whether a page actually has human visitors. Chrome holds more than 65% of the global desktop browser market, and Android phones send hundreds of millions of real user clicks and dwell-time records to Google’s servers every day.

External tools can’t see a single real visitor’s scrolling behavior. Moz can only crawl pages from rented cloud servers and follow the links in their HTML. Its machines can’t tell whether an article about the “best baby stroller” was typed word by word by a mother or churned out by a machine in three seconds.

The information gap is especially severe in smaller, non-English markets:

  • A purely local Polish forum shows a DR of 0
  • 30,000 young Polish mothers post and discuss there every day
  • Google steadily ranks it at the top for Polish parenting and baby-related searches
  • A DA 60 Polish sock-puppet site that an organization paid to build now ranks beyond page 10

Experienced people never judge a site by the big score on the dashboard. They open the Referring Domains list and go through it row by row. Suppose a site claims DR 60, but its top 100 referring domains are all obscure free subdomains that cost nothing to renew. The scam is immediately exposed.
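The row-by-row check can be automated once the referring domains are exported as a plain list, for example the top 100 from the Referring Domains report. This is a minimal sketch. The `FREE_HOSTS` tuple is a hypothetical starting sample, not an official catalog of free hosts, so extend it for your own audits.

```python
# Hypothetical sample of free blog/subdomain hosts; extend it for your own audits.
FREE_HOSTS = ("blogspot.com", "wordpress.com", "tumblr.com", "weebly.com", "wixsite.com")


def is_free_subdomain(domain: str) -> bool:
    d = domain.lower().rstrip(".")
    # A free subdomain looks like "something.blogspot.com", not "blogspot.com" itself.
    return any(d.endswith("." + host) for host in FREE_HOSTS)


def free_subdomain_share(referring_domains: list[str]) -> float:
    """Share of referring domains (e.g. the top 100 from an export) on free hosts."""
    if not referring_domains:
        return 0.0
    return sum(is_free_subdomain(d) for d in referring_domains) / len(referring_domains)


top = ["cats4u.blogspot.com", "x9k.tumblr.com", "nytimes.com", "deals.wordpress.com"]
print(f"{free_subdomain_share(top):.0%}")  # prints 75%
```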

Artificially Inflated High Scores

Every day, about 100,000 expired domains drop and become available on platforms like Namecheap. Domain dealers spend $15 to snap up an old nonprofit’s domain registered in 2012. Ahrefs’s database still retains all the .edu backlinks that the nonprofit domain accumulated over the previous ten years.

The new owner takes control and wipes the nonprofit’s original pages. A technically savvy operator then adds a 301 permanent redirect on the server. That sends the domain’s decade of accumulated link value to a gambling page that has been online for just three days. Within 72 hours, Moz’s data update registers the change, and the new URL’s DA jumps from 0 to 45.
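Before buying, you can check whether an aged, high-DA domain now just forwards visitors somewhere else. This is a minimal sketch using only Python’s standard library. It records each redirect hop that `urllib` follows. The function and class names are illustrative, and the commented call uses a hypothetical domain.

```python
import urllib.request
from urllib.parse import urlsplit


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follows redirects normally but remembers each hop (status, target URL)."""

    def __init__(self):
        super().__init__()
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.hops.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def trace_redirects(url: str, timeout: float = 10.0):
    """Return (list of redirect hops, final URL) for a GET request to `url`."""
    handler = RecordingRedirectHandler()
    # build_opener() replaces its default redirect handler with ours.
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        final_url = resp.geturl()
    return handler.hops, final_url


def lands_on_other_host(url: str, final_url: str) -> bool:
    """True if the request ended up on a different hostname than it started on."""
    return urlsplit(url).hostname != urlsplit(final_url).hostname


# Example (hypothetical domain):
# hops, final = trace_redirects("https://old-nonprofit.example/")
# print(hops, final, lands_on_other_host("https://old-nonprofit.example/", final))
```

A 301 hop to an unrelated host is the pattern described above. The Wayback Machine can then show what the domain used to be.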

Underground private blog networks (PBNs) run like assembly lines. The operator rents 500 cheap independent IP addresses in data centers across Eastern Europe and Southeast Asia and installs a free, basic WordPress site on each one. Plenty of cheap writers on Fiverr offer $5 gigs. They are hired specifically to pad out rough English articles illustrated with free stock photos.

Look closely at hundreds of high-score sites from the same seller, and you’ll find many identical site-building footprints:

  • They all use Cloudflare’s free SSL certificate
  • The homepage always shows five stiff, machine-translated articles
  • The footer has no 2023 copyright notice or privacy policy
  • Pages often take more than 3,000 milliseconds to load

Inflating DR by mass-posting comments with software is an extremely cheap business. GSA Search Engine Ranker, an automated link-building program, runs 24 hours a day. It roams the internet looking for unmoderated forums and comment boards and stuffs in 50,000 spam replies full of English keyword strings every day.

Some people hack into subdirectories of major news sites and sell access to them. Hackers break into a poorly defended server belonging to a local US TV station and quietly install a separate, unauthorized WordPress instance on it. Backlink resellers then advertise everywhere: for $80, you can publish a “native” article on a channel under the high-score TV station’s domain.

Others exploit redirect loopholes at big tech companies. Some use Google Maps’ built-in redirect function to wrap a long, messy target URL inside a Google link. Third-party tools see google.com at the start of the URL and are completely fooled. They credit the target URL with extremely strong backing.

Sites propped up entirely by piles of junk links show absurdly inconsistent data across every indicator:

  • Majestic’s Trust Flow (TF) is only 2, while its Citation Flow (CF) jumps to 60
  • Semrush’s historical traffic curve is a flat line stuck at the bottom
  • 90% of backlinks come from just India, Russia, or Brazil
  • Anchor text with commercial keywords makes up over 70% of the site’s total
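Three of these symptoms can be combined into one quick screen. This sketch works under stated assumptions. The inputs are numbers you copy from Majestic (Trust Flow, Citation Flow) and from a backlink country breakdown. The TF/CF cutoff of 0.1 is an illustrative assumption, while the 90% and 70% thresholds come from the list above. Instead of hard-coding India, Russia, or Brazil, it measures how much of the link profile comes from outside the markets the site claims to serve. The flat traffic curve is left out.

```python
def inflation_flags(trust_flow: int, citation_flow: int,
                    backlink_countries: dict[str, float], target_markets: set[str],
                    commercial_anchor_share: float) -> list[str]:
    """Return a list of warning signs; an empty list means none fired."""
    flags = []
    # TF 2 vs CF 60 means many links, almost none from trusted sources.
    # The 0.1 cutoff is an illustrative assumption, not a Majestic rule.
    if citation_flow > 0 and trust_flow / citation_flow < 0.1:
        flags.append("Trust Flow far below Citation Flow")
    # Country values can be link counts or percentages; only ratios matter.
    total = sum(backlink_countries.values())
    outside = sum(v for c, v in backlink_countries.items() if c not in target_markets)
    if total and outside / total >= 0.9:
        flags.append("90%+ of backlinks from outside the target market")
    if commercial_anchor_share > 0.7:
        flags.append("commercial anchor text above 70%")
    return flags


farm = inflation_flags(2, 60, {"IN": 60, "RU": 25, "BR": 10, "US": 5}, {"US"}, 0.74)
print(farm)  # all three flags fire
```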

Link farms operating under the “guest blog” label charge $30 to $150 per post. The webmaster publishes 50 completely unrelated sponsored articles every day like clockwork. One personal blog was originally about kitten and puppy care. Within three months, it was stuffed with ads for e-cigarettes, adult products, and online drug sales.

The ranking weight a single page can pass on is like a fixed amount of water in a cup. When one page carries 3,000 outbound links, each URL gets only a pitiful 0.03% of it, including the link a buyer paid a high price for. When Google’s crawler visits and finds the page full of outbound exits, it judges the site to be an unmoderated, low-quality link directory.
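The cup-of-water arithmetic works under a simplifying assumption: the page’s passable weight $W$ is split evenly across its $N$ outbound links. This ignores the damping factor and uneven link weighting found in PageRank-style models.

```latex
w_{\text{link}} = \frac{W}{N}, \qquad
\frac{w_{\text{link}}}{W} = \frac{1}{3000} \approx 0.000333 = 0.033\%
```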

Check the domain’s public WHOIS record, and you can immediately see how far ahead its registration has been paid. Legitimate businesses typically register their domains for 5 to 10 years at a time. Scammers running link farms save money: over 99% of their domains are paid only the basic one-year fee. They expect to abandon a domain and switch to a new one whenever it gets penalized.
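Once you have the expiry dates, this becomes simple date arithmetic. This is a minimal sketch. It assumes you have already copied each domain’s expiry date from a WHOIS or RDAP lookup. The remaining term shows only how far ahead the domain is currently paid. Combine it with the creation date for the full picture.

```python
from datetime import date


def years_paid_ahead(expiry: date, today: date) -> float:
    """Years left until the registration expires (WHOIS/RDAP expiry date)."""
    return (expiry - today).days / 365.25


def short_term_share(expiries: list[date], today: date, min_years: float = 1.0) -> float:
    """Share of a seller's domains paid up for less than `min_years` ahead."""
    if not expiries:
        return 0.0
    short = sum(1 for e in expiries if years_paid_ahead(e, today) < min_years)
    return short / len(expiries)


today = date(2024, 9, 1)
portfolio = [date(2025, 6, 1), date(2025, 3, 1), date(2031, 9, 1), date(2025, 1, 15)]
print(short_term_share(portfolio, today))  # prints 0.75
```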

To make sold backlinks last longer, sellers build tiered link structures. Someone with scripting skills writes a Python script that registers 10,000 free Tumblr blogs at once. Bots fill them with randomly generated gibberish, all linking to the high-DR tier-one site to artificially inflate its numbers.

Open the farm’s articles and look closely. Signs of shoddy production are everywhere:

  • Every article’s body length sits right at 300 or 500 characters
  • The author field shows fake office-worker headshots taken from free stock libraries
  • Clicking the links in an article mostly leads to 404 Not Found errors
  • Social share counts are all zero
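To check the first symptom across a whole site rather than one article, count how many articles land right at the round targets. This sketch assumes you have already collected each article’s length, in words or characters, whichever unit the farm’s writers are paid by. The ±10 tolerance is an arbitrary assumption.

```python
def word_count(text: str) -> int:
    """Whitespace-separated word count."""
    return len(text.split())


def near_round_share(lengths: list[int], targets=(300, 500), tolerance: int = 10) -> float:
    """Share of articles whose length sits within `tolerance` of a round target."""
    if not lengths:
        return 0.0
    hits = sum(1 for n in lengths if any(abs(n - t) <= tolerance for t in targets))
    return hits / len(lengths)


print(near_round_share([298, 502, 505, 1240]))  # prints 0.75
```

A genuine blog produces a wide spread of lengths. A share close to 1.0 across dozens of articles points to writers paid to hit a quota.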

Traffic and Ranking Verification

The “Cliff-Like” Traffic Trend Chart

Open Ahrefs, enter the domain the seller gave you, and open the Organic search traffic chart. Ignore the DR 75 at the top and manually extend the time range to more than two years. Google updated its algorithm in September 2023, and many sites’ visitor numbers changed completely that month.

A tech blog that honestly produces content will see its monthly visitors fluctuate slightly between 22,000 and 25,000, rising to just over 30,000 during the Black Friday season. The seller’s farm site looks completely different: within three months, it can soar from 500 visitors to 80,000.

The peak lasts less than two weeks. Then the blue traffic line is cut off abruptly and falls straight toward zero. Last month showed 83,400 visitors, and by the middle of the next month only 112 remain. A red marker on the chart shows that Google has just cleaned up a batch of violating, fraudulent sites.

Machine-assembled, low-quality articles were dropped from Google’s index. Pages that used to rank in the top three fell to page 50 and beyond. Whatever the seller promises, Ahrefs values the traffic at just $1.50, and that figure gives the truth away.

Data patterns to watch for when checking for these crashes:

  • Drops during the large-scale algorithm update of March 2024
  • A single-week traffic drop of more than 85%
  • A sharp fall in the number of indexed pages within a short period
  • Organic traffic below 50 for more than six months
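The drop and flatline checks in the list above are easy to run on a traffic series exported from Ahrefs or Semrush. This sketch assumes `traffic` is a list of weekly or monthly organic visit counts in chronological order. The sample series is illustrative and mirrors the 83,400 to 112 example.

```python
def find_cliffs(traffic: list[int], max_drop: float = 0.85) -> list[int]:
    """Indexes of periods where traffic fell more than `max_drop` from the prior period."""
    cliffs = []
    for i in range(1, len(traffic)):
        prev, cur = traffic[i - 1], traffic[i]
        if prev > 0 and (prev - cur) / prev > max_drop:
            cliffs.append(i)
    return cliffs


def stayed_dead(traffic: list[int], floor: int = 50, periods: int = 6) -> bool:
    """True if the last `periods` values are all below `floor`."""
    tail = traffic[-periods:]
    return len(tail) == periods and all(v < floor for v in tail)


monthly = [500, 9000, 40000, 83400, 112, 40, 35, 30, 20, 15, 12]
print(find_cliffs(monthly), stayed_dead(monthly))  # prints [4] True
```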

Sellers play tricks to hide ugly charts. The screenshots they send are cherry-picked from the three months of peak traffic. They also use Semrush Authority Score screenshots to impress buyers while saying nothing about the organic traffic crash. Always check the historical data yourself before buying links.

Many farm sites are built on expired domains. One university club’s official website was abandoned in 2018, and the seller bought the domain for $60. The early part of its chart is a flat line, with a few dozen visitors at most. In April 2023, the seller started stuffing in more than a hundred scraped articles a day.

That pushed the visitor count to a false peak of 15,000. The algorithm quickly noticed that the new content had nothing to do with the domain’s educational background. The cliff that followed knocked daily visitors back to where they started: fewer than 10. A link bought there for $150 ended up on a blocked, dead site.

A traffic crash often comes with a major reshuffle of the site’s ranking keywords. Before the crash, 4,500 search terms ranked in Google’s top 10. On the day the chart fell off the cliff, only 30 were left. The few that survived are obscure gibberish that nobody searches for.

When Google’s crawler finds your link, it also attaches a negative label to it. A link on a site whose visitors dropped from 60,000 to zero passes along that site’s penalty signals. The Penguin anti-spam algorithm follows backlinks all the way to their source.

Tool panels to check so you can avoid abandoned sites:

  • The Ahrefs Overview page, with both the 1-year and All time ranges
  • Traffic for the same month this year versus last year
  • Red triangle markers on the Semrush trend chart
  • Any recent history of large-scale backlink losses

Open the Wayback Machine and enter the date of the traffic cliff. The month before, the page was a legitimate e-commerce blog selling garden shears. In the snapshot taken the month after the crash, the entire page has turned into Las Vegas casino ads and flashing animated graphics.

Verify Traffic Sources

Open Ahrefs’s main interface, skip the total traffic chart at the top, and scroll down about two screen lengths. A pie chart labeled Traffic by country appears in the lower left. Suppose the link seller has just bragged on WeChat about owning a local Boston real estate network and charges $150 per article.

The site is an English-language page full of Massachusetts apartment rental ads, and the tool shows 22,000 visitors last month. But the country pie chart puts Bangladesh on top with 45% of traffic, followed by India with 32% and Pakistan with 15%.

The site claims to serve local Boston home buyers, yet real US visitors make up a mere 3%. Low-paid workers in Dhaka internet cafes, earning as little as $0.001 per click, frantically refresh English pages every day. The seller spent $20 on a Fiverr gig and easily bought 10,000 fake click records.

Numbers staged for show are easy to expose if you know what to look for. Analytics tools that record dwell time show that the vast majority of these IP addresses stayed on the page for less than 4 seconds. The mouse never scrolled even a millimeter before the page was closed. Before paying, verify that the site’s language, business coverage area, and actual visitor countries all match.

Install Similarweb’s free browser extension and open the URL the seller sent. Click the Geography section in the extension panel. Take an online store that claims to sell snow blower parts in Vancouver, Canada. All 18,000 of its clicks come from cheap Nigerian data center IPs.

Data center IP addresses come from cheap cloud server facilities and look completely different from ordinary home broadband connections. Google’s defenses have long treated IP ranges belonging to data centers as a cheating signal. Spending $250 on a backlink from this store is like hanging it on a page that only bots ever visit.

Occasionally you’ll run into sellers willing to pay for high-end US proxy IPs to fake their traffic. Their Ahrefs country chart really does show 90% US visitors. So look next at the traffic-channel breakdown panel on the right.

On an honest content blog, 60% to 80% of visitors arrive by typing a query into Google’s search box. A US farm site inflated with high-end proxy IPs gets a pitiful 2% from organic search. The remaining 98% is classified as Direct traffic with no source.

Nobody memorizes the URL of an unknown spam blog and types it into the address bar character by character every day. Those tens of thousands of sourceless visits come from cheaters’ automated Python scripts. The scripts send requests to the server on a timer every day to create the illusion of a thriving site.

Specific indicators for verifying visitor country and traffic channel:

  • Focus on the top three visitor countries on the Ahrefs Overview page
  • Check the site’s stated office address against where its visitors’ IPs actually come from
  • Check the ratio of organic search visits to sourceless (Direct) visits
Sellers will send a screenshot of their Google Analytics dashboard in the chat, showing 50,000 active users over the past 30 days. Zoom in, and you’ll find a small, easily missed line of text in the lower right labeled Avg. Session Duration.

It reads 00:00:02. According to this, 50,000 users with US IPs opened a 3,000-word real estate investment analysis. Every one of them looked at it for two seconds and then clicked the X to leave. Tens of thousands of visits with no real reading behavior can’t pass any ranking weight through the hyperlinks in the articles.

For domains with a strong local focus, check long-tail keyword rankings that include place names. Open Semrush, enter the domain, and switch the search region to “US-Boston.” The site claiming to be Boston’s top real estate information source has 0 keywords ranking in the top 50 locally.

Exposing “Ranking Keywords”

Open Ahrefs’s left menu and click into the Organic keywords report. Say the site claims to cover North American home and garden topics, and the link seller is asking $80 per article. First, look at the top 10 English keywords that bring in the most visitors.

An honest gardening blog should have phrases like “how to prune tomato plants” (4,500 monthly searches) or “best soil for indoor ferns” (1,200 monthly searches) at the top. The whole screen should be full of soil and plant names.

The report for this $80 site looks completely different. Keyword #2 is “buy cheap tramadol online,” #5 is “crypto casino no KYC,” and #8 is “write my nursing essay.”

The seller will probably claim the site covers general lifestyle topics. But a lifestyle site with 15,000 monthly visitors can’t possibly get 8,200 of its clicks from adult entertainment and prescription drug slang.

Look at the CPC (Cost Per Click) column on the right. Advertisers bid around $0.50 to $1.20 on legitimate lawnmower and pruning shear terms. The black-hat medical terms in the site’s top 10 cost as much as $15.00 to $45.00 per click.

Keyword types that immediately expose what a site is hiding:

  • Online poker and sports betting terms
  • Controlled drugs sold online without a prescription
  • Illicit crypto platforms used for money laundering
  • Exam-taking services that guarantee passing grades
  • Adult late-night chat services paired with place names

Scroll down to positions 11 through 50 and check each KD (Keyword Difficulty) score one by one. Ahrefs scores each keyword on a scale from 0 to 100.

The report turns up 3,400 strange letter combinations with a KD of 0. Take something like “jhsfdg review 2024,” which is gibberish plus a year. The tool shows a real monthly search volume (SV) of exactly zero.

Cheating software churns out tens of thousands of useless fake pages every day, stuffing random letter strings into sentences about “reviews” or “prices.” No real person ever types those strings into Google’s search box.

The seller sends a Semrush dashboard screenshot with “Total Keywords: 24,500” in bold. After sending it on Skype, they keep the actual keywords behind that number well hidden.

Download the complete CSV export to your own computer and open it in Excel. Filter the “URL” or “Keyword” column, and type “CBD” or “Casino” into the search box.

Of the 24,500 keywords, 18,300 contain the terms you just typed. The remaining 6,000-odd keywords are in Russian and Thai, even though the site’s homepage is entirely in English and its profile says the company is based in Ohio.

Data characteristics to watch for during spreadsheet verification:

  • Keywords in minor languages fill over 30% of the table’s rows
  • More than half of the keywords have a monthly search volume (SV) below 10
  • Keyword difficulty (KD) stays at 0 or 1 year-round
  • Over 90% of traffic comes from a single illicit page
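The spreadsheet checks can be scripted against the exported CSV. This is a minimal sketch with several assumptions. The column headers “Keyword”, “Volume”, and “KD” are guesses, so rename them to match your export. The blacklist is a hypothetical starter list. Counting any non-ASCII keyword as “small-language” is a crude proxy: it catches Russian or Thai, but it misses Latin-script languages.

```python
import csv
import io


def _num(value: str) -> int:
    """Parse an exported number such as '4,500', treating an empty cell as 0."""
    return int(value.replace(",", "").strip() or 0)


def audit_keywords(csv_text: str,
                   blacklist=("casino", "cbd", "tramadol", "poker")) -> dict[str, float]:
    """Return the share of exported keyword rows matching each warning sign."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    n = len(rows)
    if n == 0:
        return {}

    def share(pred) -> float:
        return sum(1 for r in rows if pred(r)) / n

    return {
        "blacklisted": share(lambda r: any(t in r["Keyword"].lower() for t in blacklist)),
        "volume_under_10": share(lambda r: _num(r["Volume"]) < 10),
        "kd_0_or_1": share(lambda r: _num(r["KD"]) <= 1),
        # Crude proxy for Russian/Thai-style keywords: any non-ASCII character.
        "non_ascii": share(lambda r: not r["Keyword"].isascii()),
    }


sample = ("Keyword,Volume,KD\n"
          "buy cheap tramadol online,900,12\n"
          "jhsfdg review 2024,0,0\n"
          "купить казино,5,0\n"
          "how to prune tomato plants,4500,8\n")
print(audit_keywords(sample))
```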

Occasionally you’ll find a page titled “Best Value Laptops for College Students 2023.” It looks exactly like a legitimate tech review. Scroll down and stop at the third paragraph.

A sentence suddenly appears: “Studying is so tiring, why not unwind with some real-money slot machines?” The bolded words carry a hyperlink. Click it, and the page redirects to a Cyprus gambling site.

Google’s crawlers read the text word by word. Its NLP systems pick up the context and issue a warning against this supposed laptop site.

Your SaaS company bought a post on this flagged domain. Finance paid the $120 bill, and the article with your link went live at 9 AM on Tuesday.

When you log into Google Search Console (GSC) on Friday, your site’s impressions have dropped from 1,500 a day to a pitiful 400. The penalty followed the $120 link back to your own website.

To verify the data, check which pages bring in the most visitors; Ahrefs calls this the “Top Pages” report. A health article at /health/benefits-of-water/ gets zero visitors a month.

Now switch to /sponsored/bet365-login-link/. That single page brought in 9,800 visitors in one month. The rest of the site is an empty shell, propped up entirely by three or four highly profitable violating pages.

Indicators to verify in the Top Pages report:

  • Check what the top 5 URLs look like
  • Look for “guest-post” or sponsorship keywords in links
  • Compare high-traffic pages against the site’s own theme positioning
  • Click through the top-ranked pages one by one to see real content

Real content-focused webmasters build trust through hundreds of articles around one theme. An Italian pasta recipe webpage gets 80 visitors, a steak-frying tutorial page gets 150—the visitor distribution across pages is very even.

Content Quality and Relevance Review

Quality Review

Run the URL through a plagiarism checker. If the screen fills with red warnings, the articles are essentially copied. Late last year, search engines removed 4.5 million low-quality pages, all with text duplication rates above 85%. Some automatic article-spinning tools can mechanically swap 300 synonyms in a minute, which makes the text extremely awkward to read.
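Commercial plagiarism checkers don’t publish their exact methods. One common, standard technique is w-shingling with Jaccard similarity, used here as a stand-in. The sketch compares a suspect article with a possible source. The 5-word shingle size is an assumption.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """All runs of k consecutive words, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the texts' k-word shingles (0 = unrelated, 1 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not (sa or sb):
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog near the river bank"
spun = original.replace("quick", "fast")  # a one-word "synonym swap"
print(round(similarity(original, spun), 2))
```

Synonym spinning lowers the score only for shingles that contain a swapped word. A lightly spun copy therefore still shares most of its shingles with the source.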

Paste two paragraphs into a readability checker. If they can’t even pass a US 6th-grade level, something is definitely wrong with the article. You may find an 80-word sentence without a single comma, just one long breath of nonsense. The density of industry keywords has been artificially inflated to 5.2%, while people writing naturally rarely exceed 2%.
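Readability checkers commonly report the Flesch-Kincaid grade level, and keyword density is simple to compute yourself. This sketch makes two assumptions. Syllables are estimated with a crude vowel-group heuristic rather than a pronunciation dictionary. Keyword density is defined here as the share of words taken up by the phrase, and tools define it in slightly different ways. A single 80-word sentence pushes the grade level far above the normal range for web writing.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate: vowel groups, minus a silent final 'e'."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(1, n)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


def keyword_density(text: str, phrase: str) -> float:
    """Share of all words taken up by occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    k = len(target)
    if not words or k == 0:
        return 0.0
    hits = sum(1 for i in range(len(words) - k + 1) if words[i:i + k] == target)
    return hits * k / len(words)


simple = "The cat sat. The dog ran. We ate."
run_on = " ".join(["information"] * 80) + "."
print(round(fk_grade(simple), 2), round(fk_grade(run_on), 2))
```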

You can also spot tricks just by looking at what web pages look like:

  • Every paragraph looks measured with a ruler, sitting right between 490 and 510 characters
  • Bold headings are stiffly stuffed with long search phrases
  • The sidebar lists six or more completely unrelated categories
  • At the very bottom, the copyright year is stuck before 2018

Padded text usually comes with randomly matched images. Save an image from the page and check the camera information (EXIF data) in its file properties. On these mass-produced sites, more than 90% of images have the camera model and capture time stripped. The few that still carry dates mostly trace back to free stock photo libraries from before 2014.

The server crudely compresses every image to a uniform 600×400 pixels. Press F12 to open the browser’s developer tools. A competent developer marks up article text with clean, proper HTML structure. Low-quality pages bloat the markup to more than 2,000 tags and create white space with hundreds of meaningless line breaks.

Machine-assembled content has many obvious flaws in word pairing:

  • Up to 70% of the related synonyms and supporting vocabulary you would expect are missing
  • From start to finish, the article contains no specific year or data point
  • The article’s source links all lead to expired 404 pages
  • The narration switches between “I” and “he” like a split personality

Paste the URL into an AI writing detector. If the green blocks representing “common predicted words” cover more than 80% of the chart, this is clearly assembly-line content. These generators also have a telltale habit: they love opening each paragraph with a string of adverbs.

Count how many external links are stuffed into one article. In just 500 characters, eight underlined, clickable phrases are packed together, pointing to four unrelated overseas server IPs. Third-party tools give such a page a trust score below 10.
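Counting links by hand gets tedious. This sketch extracts the outbound links from a page’s HTML with Python’s built-in parser. It assumes you have already saved the article’s HTML, and the class and function names are illustrative. Mapping the remaining hostnames to server IPs would be a separate step, for example with `socket.gethostbyname`, which needs network access.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html: str, own_domain: str) -> list[str]:
    """Absolute links pointing outside `own_domain` and its subdomains."""
    parser = LinkCollector()
    parser.feed(html)
    own = own_domain.lower()
    result = []
    for href in parser.hrefs:
        host = urlsplit(href).hostname  # None for relative links like "/about"
        if host and host != own and not host.endswith("." + own):
            result.append(href)
    return result


html = ('<p>Intro <a href="https://pills.example/a">1</a> <a href="/about">2</a> '
        '<a href="https://slots.example/b">3</a> <a href="https://blog.mysite.com/c">4</a> '
        '<a href="https://pills.example/d">5</a></p>')
links = external_links(html, "mysite.com")
print(len(links), sorted({urlsplit(u).hostname for u in links}))
```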

Open the site’s sitemap file and check the date stamps. A small blog that normally goes half a month without updates suddenly published 150 new pages in a single hour, between 2 and 3 AM. Even a fast typist manages only about 80 words per minute, or 4,800 words in an hour. That works out to roughly 32 words per page.
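A short script can find the busiest publishing hour in a sitemap. This sketch assumes a standard XML sitemap that uses the sitemaps.org namespace and includes `<lastmod>` entries. Many sitemaps omit those entries, and some plugins stamp every URL with the same time, so treat the result as a lead rather than proof.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def busiest_hour(sitemap_xml: str):
    """Return (hour, page count) for the clock hour with the most <lastmod> stamps."""
    root = ET.fromstring(sitemap_xml)
    per_hour = Counter()
    for node in root.findall("sm:url/sm:lastmod", SITEMAP_NS):
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it.
        stamp = datetime.fromisoformat(node.text.strip().replace("Z", "+00:00"))
        per_hour[stamp.replace(minute=0, second=0, microsecond=0)] += 1
    return per_hour.most_common(1)[0] if per_hour else None


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://s.example/a</loc><lastmod>2024-03-05T02:05:00+00:00</lastmod></url>
<url><loc>https://s.example/b</loc><lastmod>2024-03-05T02:30:00+00:00</lastmod></url>
<url><loc>https://s.example/c</loc><lastmod>2024-03-05T02:55:00Z</lastmod></url>
<url><loc>https://s.example/d</loc><lastmod>2024-02-20</lastmod></url>
</urlset>"""
print(busiest_hour(sitemap))
```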

That was the backend auto-poster hitting the batch-publish button. Check the site’s history in the Wayback Machine. One URL sold Japanese ramen before 2021. In early 2022, everything was wiped and it became a cryptocurrency trading news site. The mask comes right off.

The other side of fake webpages is that there are simply no real people watching:

  • No traffic tracking code can be found anywhere on the page
  • Visitors leave before even 11 seconds have passed
  • Share buttons show click counts permanently stuck at zero
  • The server doesn’t even enable basic page compression

Watch the ad slots at the top of the page for a while. The small ad boxes show cheap money-making links that pay $0.02 per click. Tools value the site’s entire traffic at less than $5 a month.

Find two articles under the same author name and compare them. In one, the author writes as a 50-year-old carpenter teaching readers to carve rosewood furniture. In the next, the same name has become a Wall Street investment analyst with eight years of experience. The identity mismatch is glaring.

Check the servers behind these websites. The 10 backlinks you bought are spread across 10 sites with different names. A lookup, however, shows that all of their IP addresses share the same first three octets, which means the same /24 block. More than 300 identical-looking websites sit on that same network.
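Grouping the sites by the first three octets of their IP addresses makes a shared network visible. The same check applies to the 70% Class C rule at the top of the article. This is a minimal sketch with Python’s standard library, and the example uses reserved documentation addresses. Sites behind a CDN such as Cloudflare resolve to the CDN’s addresses rather than the real server, which hides this footprint.

```python
import ipaddress
import socket
from collections import Counter


def resolve(domains: list[str]) -> list[str]:
    """Look up one IPv4 address per domain (requires network access)."""
    return [socket.gethostbyname(d) for d in domains]


def largest_block(ips: list[str], prefix: int = 24):
    """Return (network, share of IPs) for the most crowded /prefix block."""
    if not ips:
        return None, 0.0
    blocks = Counter(ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ips)
    network, count = blocks.most_common(1)[0]
    return network, count / len(ips)


net, share = largest_block(["203.0.113.5", "203.0.113.9", "203.0.113.200", "198.51.100.7"])
print(net, share)  # prints 203.0.113.0/24 0.75
```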

Click the site’s “About Us” page. The intro is the same template used by 5,000 other small websites worldwide, with only the company name changed. The customer service email is a free mailbox that requires no real-name registration. The phone number always returns a “number not in service” message.

Overall Relevance

Click the site’s top navigation bar and scan the menu. On a site called “Geek Digital,” the dropdown is packed with “Medical Beauty,” “Sports Betting,” and “Formaldehyde Removal Company.” A real tech outlet stretches its categories at most to phone cases or smart home appliances. No genuine site publishes news spanning 8 unrelated industries in a single week.

Scroll down and go through the latest 20 articles on the homepage. One post teaches you how to choose lenses for a Sony mirrorless camera, and the very next is “2023 Dubai Real Estate Investment Visa Guide.” The two were published exactly 2 hours and 15 minutes apart, to the second. Automated posting software is flooding the database.
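Gaps between posts that repeat to the second are a scheduler’s signature. This sketch assumes you have collected each post’s publication timestamp, for example from the page metadata or the RSS feed. What share counts as “too regular” is left to your judgment.

```python
from collections import Counter
from datetime import datetime, timedelta


def identical_gap_share(published: list[datetime]) -> float:
    """Share of gaps between consecutive posts that equal the single most common gap."""
    times = sorted(published)
    gaps = [b - a for a, b in zip(times, times[1:])]
    if not gaps:
        return 0.0
    _, count = Counter(gaps).most_common(1)[0]
    return count / len(gaps)


start = datetime(2023, 5, 1, 9, 0)
bot = [start + i * timedelta(hours=2, minutes=15) for i in range(20)]
human = [start + timedelta(hours=h) for h in (0, 5, 7, 30, 31, 50)]
print(identical_gap_share(bot), identical_gap_share(human))  # prints 1.0 0.2
```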

Look the domain up in a domain-history tool to check its background. This second-hand domain, bought for $50, was the official website of a small-town dental clinic from 2019 to 2021. In April 2022 the pages changed completely. It suddenly became a fashion blog posting "How to Identify High-Quality Designer Bags" every day.

Check where the site's outbound links point. On a normal parenting forum, about 90% of outbound URLs lead to things like diaper reviews or pediatric hospitals. Run the sites hosting the purchased links through an analysis tool and export the outbound link report as an Excel spreadsheet. The industry breakdown is absurd:

Outbound link category | Share of site's outbound links | Estimated risk
Online poker and gambling | 35.5% | May trigger a sitewide blacklist
Weight-loss drugs and adult products | 28.2% | High-risk red zone
Machinery, equipment, and moving/cleaning | 22.0% | Far off the site's topic
The site's original tech category | 14.3% | Genuine content squeezed out

The numbers in the table expose the site's true nature. Behind the facade of an internet news domain, 85.7% of outbound links point to unrelated industries. Search crawlers visit on their regular schedule and find almost nothing but off-topic material. At that point, the algorithm is likely to put the domain on its spam blacklist.
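The 85.7% figure is simply everything outside the site's own category. This minimal sketch computes it from a category-to-share report. It normalizes by the total so that exports which don't sum to exactly 100 still work. The function name and dictionary keys are hypothetical, and the figures are those from the table above.

```python
def off_topic_share(shares: dict[str, float], on_topic: str) -> float:
    """Percentage of outbound links that point outside the site's own topic."""
    total = sum(shares.values())
    return round(100 * (total - shares[on_topic]) / total, 1)


# Figures from the table above (percent of the site's outbound links).
report = {
    "Online poker and gambling": 35.5,
    "Weight-loss drugs and adult products": 28.2,
    "Machinery, equipment, and moving/cleaning": 22.0,
    "Original tech category": 14.3,
}
print(off_topic_share(report, "Original tech category"))  # 85.7
```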

Look for the dense walls of tag text in the sidebar or footer. A genuine webmaster files articles under specific tags: click "Router" and you'll find more than 40 hardware reviews. A link-posting site's tag wall holds 300-plus one-off search terms. Click "Los Angeles Postpartum Care Center" and you get a single short article, without even an image, that exists only to pad the numbers.
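The tag-wall pattern can be quantified as the share of tags that label exactly one article. This minimal sketch assumes you have already scraped each tag's article count. The tag names and counts are hypothetical, and so is any cutoff you might choose.

```python
def singleton_tag_share(tag_counts: dict[str, int]) -> float:
    """Share of tags that label exactly one article."""
    singles = sum(1 for n in tag_counts.values() if n == 1)
    return singles / len(tag_counts)


# Hypothetical tag -> article-count mappings scraped from two tag walls.
real_blog = {"Router": 42, "NAS": 18, "Mesh Wi-Fi": 11, "Firmware": 1}
farm = {
    "Router": 1,
    "Los Angeles Postpartum Care Center": 1,
    "Drain Cleaning Phoenix": 1,
    "Dubai Visa": 2,
}
print(singleton_tag_share(real_blog), singleton_tag_share(farm))  # 0.25 0.75
```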

Watch the small author byline at the top of each article. An account named "Tech Little Expert" posts "How to Write Python Crawler Code" in the morning. In the afternoon, under the same pseudonym, it posts "How to Care for Dogs After Neutering." No single person holds the specialist knowledge of 12 different professions, and it shows: even common medical terms are misspelled into gibberish.

The hodgepodge site's fake personas give themselves away everywhere:

  • A beauty blogger's avatar uses a stock photo of a 50-year-old mechanic
  • Yesterday the author claimed to work at a New York investment bank; today they are a fruit farmer
  • Three WeChat IDs for selling shoes are crammed into the personal bio

Pull up the site's traffic breakdown to see which pages visitors actually land on. On a genuine niche blog, most search visitors end up on its most popular featured articles. A disguised farm site claims 50,000 monthly visitors. The detailed breakdown shows 98% of visits landing on a few obscure posts about "renting a car with no deposit." Meanwhile, the homepage has had fewer than 5 visitors in three months.
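Traffic concentration can be expressed as the share of all visits that land on the few most-visited pages. This minimal sketch assumes a per-URL visit export from whatever traffic tool you use. The URLs and counts are hypothetical, chosen to mirror the 50,000-visit, 98% example.

```python
def top_page_share(visits: dict[str, int], top_n: int = 3) -> float:
    """Fraction of all visits that land on the top_n most-visited pages."""
    ranked = sorted(visits.values(), reverse=True)
    return sum(ranked[:top_n]) / sum(ranked)


# Hypothetical visits per URL from a traffic-estimation export.
farm_visits = {
    "/rent-a-car-without-deposit": 30_000,
    "/rent-a-car-no-deposit-cheap": 12_000,
    "/deposit-free-car-rental-guide": 7_000,
    "/": 4,
    "/about": 996,
}
print(f"{top_page_share(farm_visits):.0%}")  # 98%
```

A healthy niche blog also has a few star pages. The difference is that its top pages match the site's topic and its homepage still gets visits.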

Append /sitemap.xml to the domain and press Enter. A site focused on makeup reviews should list only a few hundred lipstick and eyeshadow pages. Instead, the plain-text listing holds 45,000 links, arranged from A to Z, covering law firm and drain-cleaning service pages for locations everywhere.
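Reading a sitemap by eye stops working at 45,000 entries, but tallying it by section is straightforward. This minimal sketch parses a standard sitemaps.org `<urlset>` file with Python's built-in `xml.etree.ElementTree`. It counts URLs by their first path segment. It assumes a plain URL set, not a sitemap index file that points to further sitemaps. The sample XML and domain are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_sections(xml_text: str) -> Counter:
    """Count the URLs in a <urlset> sitemap by their first path segment."""
    root = ET.fromstring(xml_text)
    sections: Counter = Counter()
    for loc in root.iterfind("sm:url/sm:loc", NS):
        path = urlparse(loc.text.strip()).path.strip("/")
        sections[path.split("/")[0] or "(root)"] += 1
    return sections


# Tiny hypothetical sitemap from a "makeup review" site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://makeup.example/lipstick/ruby-red</loc></url>
  <url><loc>https://makeup.example/eyeshadow/warm-palette</loc></url>
  <url><loc>https://makeup.example/drain-cleaning/austin-tx</loc></url>
  <url><loc>https://makeup.example/law-firm/boston-ma</loc></url>
</urlset>"""

sections = sitemap_sections(SAMPLE)
print(sum(sections.values()), dict(sections))
```

On a real farm site, the off-topic sections such as `law-firm` or `drain-cleaning` would dwarf the on-topic ones.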

Take a sample of about 10,000 Chinese characters from the site and test how concentrated its vocabulary is. On a pet health blog, the 10 most frequent words revolve around "vaccines," "neutering," and "dog food." On this page, the most frequent words are all sales pitches such as "factory direct sales," "free shipping," and "same WeChat number." There is not a single industry-specific term.
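A word-frequency count shows what a page is really about. This minimal sketch assumes English or already-segmented text, because Chinese text must first be split into words by a segmenter such as jieba. It uses a small, hypothetical stopword list, and the sample page text is made up for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real check would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "to", "in", "for", "is"}


def top_terms(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent non-stopword tokens in `text`."""
    words = [
        w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS
    ]
    return Counter(words).most_common(n)


page = ("Factory direct sales! Free shipping. Factory direct sales, "
        "same WeChat number. Free shipping on factory direct sales.")
print(top_terms(page, 5))
```

If none of the top terms belongs to the site's supposed niche, the content is filler around the links.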

Check the background colors and icon styles in the sidebars. On what is clearly a lifestyle-focused interior design site, the right sidebar carries 6 flashing red-and-yellow animated ads for "Excavator Parts Wholesale." Elements from completely different worlds are crammed onto one screen. The owner doesn't care about layout: as long as someone pays, the ad code goes up.

Traces of faked location details are glaring:

  • The contact address in the footer is in Shenzhen, yet every article shares aurora photos from Iceland
  • The local weather widget loads with New York time
  • The customer service landline has a Beijing area code, while the mobile number is registered in Hainan

Scroll to the comment section at the very bottom of an article. Under a piece analyzing this year's new-energy vehicle battery range, real readers would probably be debating whether there are enough charging stations. Instead, the few comments are all in English, saying "excellent article," each followed by a link to buy knockoff watches.

Try replacing the www at the start of the URL with bbs. The main domain looks like a clean travel-diary blog, but the unmoderated forum section is full of bot posts selling fake diplomas and exam-taking services. The main site poses as a legitimate media outlet. Its neighboring sub-channels have long since become a dumping ground where anything goes.

Local Relevance

Focus on the underlined anchor text and read the 30 characters on either side. When a real person mentions a Beijing moving company, the surrounding text is about packing boxes, truck traffic restrictions, or refundable deposits. Here, the first two sentences are about visiting Wuhan University to see the cherry blossoms in March. The second half then lurches into finding movers, and any reader notices the jolt immediately.
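The "read 30 characters on either side" check can be automated. This minimal sketch extracts the window around the anchor text and tests whether any word from an expected topic vocabulary appears in it. The function names, the `MOVING_TERMS` set and the English sample text are all hypothetical. For Chinese pages, the word test would need a segmenter instead of the simple regex.

```python
import re


def link_context(text: str, anchor: str, width: int = 30) -> tuple[str, str]:
    """Return the `width` characters immediately before and after `anchor`."""
    start = text.find(anchor)
    if start == -1:
        raise ValueError("anchor text not found")
    end = start + len(anchor)
    return text[max(0, start - width):start], text[end:end + width]


def mentions_topic(snippet: str, topic_terms: set[str]) -> bool:
    """True if any topic term appears as a word in the snippet."""
    return bool(topic_terms & set(re.findall(r"[a-z]+", snippet.lower())))


# Hypothetical vocabulary a genuine moving-company mention would sit near.
MOVING_TERMS = {"boxes", "packing", "truck", "deposit"}

text = ("We saw the cherry blossoms at Wuhan University in March. "
        "For help, call Beijing moving company today.")
before, after = link_context(text, "Beijing moving company")
print(repr(before), repr(after), mentions_topic(before + after, MOVING_TERMS))
```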

Count the Chinese characters in the colored hyperlink text. On genuine pages, the anchor is often half of a spoken sentence, about 4 to 7 characters long. Paid links tend to use either a bare two-character keyword, such as the Chinese for "women's clothing," or a full 15-character Taobao product search title.
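The 4-to-7-character rule of thumb becomes a one-line check on Chinese anchor text. This minimal sketch ignores whitespace and flags anchors outside that range. The function name and the three sample anchors are hypothetical: a bare two-character keyword, a natural half-sentence meaning "this moving guide," and a 15-character product title. The rule only makes sense for Chinese anchors, so English anchors would need a word count instead.

```python
def anchor_length_flag(anchor: str, low: int = 4, high: int = 7) -> bool:
    """True if the anchor's length (whitespace ignored) is outside [low, high]."""
    n = len("".join(anchor.split()))
    return not (low <= n <= high)


# Hypothetical Chinese anchors: a bare keyword, a half-sentence, a product title.
for anchor in ["女装", "这篇搬家攻略", "夏季新款韩版宽松显瘦连衣裙女装"]:
    print(len(anchor), anchor_length_flag(anchor))
```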

Notice where on the page the out-of-place phrase has been placed. For convenience, posting bots put the link in the very last paragraph more than 90% of the time. In one detailed 1,500-character article, the first 1,400 characters teach you how to make braised pork, and the last 50 abruptly switch to selling used auto parts.
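Where the link sits can be expressed as a relative offset into the article text. This minimal sketch returns the position of the anchor's first occurrence as a fraction of the article's length. The 0.9 cutoff for "last tenth" is a hypothetical rule of thumb, not a published threshold. The sample article is synthetic, built to match the 1,500-character example.

```python
def link_position(article: str, anchor: str) -> float:
    """Relative offset of the anchor's first occurrence: 0.0 = start, 1.0 = end."""
    idx = article.find(anchor)
    if idx == -1:
        raise ValueError("anchor text not found")
    return idx / len(article)


# Hypothetical 1,500-character article: 1,400 characters of recipe text,
# then the inserted link near the end.
article = "pork " * 280 + "used auto parts".ljust(100, ".")
pos = link_position(article, "used auto parts")
print(len(article), round(pos, 3))  # 1500 0.933
if pos > 0.9:  # hypothetical rule of thumb: link sits in the last tenth
    print("Link is buried in the final paragraph")
```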

Look closely at the link and its surroundings, and several subtle tells appear:

  • Within 50 words of the link, there is not a single noun from the same topic
  • The text color of the paragraph containing the link is two shades lighter than the normal body text
  • Hovering over the link shows, in the browser's lower-left status bar, a URL ending in a long string of meaningless tracking parameters
  • The sentence has a pushy, imperative tone urging you to click immediately

Copy the entire sentence containing the URL into a plain-text editor such as Notepad and read it aloud. Bulk link-insertion scripts share a common flaw: they force the link in regardless of context. A sentence such as "We provide quality service" gets split down the middle, and a bolded teeth-whitening price is jammed into the gap. The result shatters the subject-verb-object structure.

Check the other links in the same article. Take a 1,200-character article with 4 outbound links. A typical healthy pattern has 3 pointing to Wikipedia or open government data for credibility and 1 to a commercial site. The fake farm sends all 4 to unrelated small overseas online stores.
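The 3-to-1 pattern can be checked by classifying each outbound link's host. This minimal sketch treats Wikipedia and `.gov` hosts as reference sources. That list is a hypothetical stand-in that you would extend for your own niche, and it misses government domains such as `gov.uk`. The sample URLs are made up.

```python
from urllib.parse import urlparse

# Hypothetical list of "credibility" domains; extend it for your own niche.
REFERENCE_DOMAINS = ("wikipedia.org", "gov")


def is_reference(url: str) -> bool:
    """True if the URL's host is a reference domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in REFERENCE_DOMAINS)


def reference_share(links: list[str]) -> float:
    """Fraction of outbound links that point at reference-style domains."""
    return sum(is_reference(u) for u in links) / len(links)


normal = [
    "https://en.wikipedia.org/wiki/Coffee",
    "https://www.data.gov/food/",
    "https://www.usda.gov/",
    "https://shop.example.com/beans",
]
farm = [f"https://store{n}.example.net/" for n in range(4)]
print(reference_share(normal), reference_share(farm))  # 0.75 0.0
```

Matching on the host suffix with a leading dot avoids counting look-alike domains such as `notwikipedia.org`.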

Language-processing algorithms represent words as vectors and can calculate the numerical distance between them. On an illustrative scale, "coffee" and "Starbucks" might be 0.2 apart, while "coffee" and "excavator" stretch to 9.8. A word more than 8.0 away that suddenly appears in a piece about brewing pour-over coffee can trigger a flag for manual review.
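The distance idea can be illustrated with a toy example. This sketch uses hand-made three-dimensional vectors and Euclidean distance (`math.dist`). The vectors are entirely hypothetical and were chosen only to reproduce the illustrative 0.2 and roughly 9.8 figures and the 8.0 threshold. Real systems use learned embeddings with hundreds of dimensions and often compare them with cosine similarity instead.

```python
import math

# Toy, hand-made vectors (hypothetical); real embeddings are learned and
# have hundreds of dimensions.
VECTORS = {
    "coffee": (1.0, 1.0, 0.0),
    "starbucks": (1.1, 1.2, 0.0),
    "pour-over": (0.9, 1.3, 0.1),
    "excavator": (8.0, -5.0, 3.0),
}


def outliers(topic: str, words: list[str], threshold: float = 8.0) -> list[str]:
    """Words whose Euclidean distance from the topic word exceeds threshold."""
    return [w for w in words if math.dist(VECTORS[topic], VECTORS[w]) > threshold]


print(outliers("coffee", ["starbucks", "pour-over", "excavator"]))  # ['excavator']
```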

Open the page source with a keyboard shortcut (for example, Ctrl+U in Chrome) and check for nofollow attributes. A real individual webmaster who recommends a URL but isn't sure about the site will often add rel="nofollow". Across the whole site, the share of nofollow links tends to hover around 20%. On paid placements, hundreds of backlinks are all bare, with no such attribute.
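Counting nofollow links by hand is tedious, so here is a minimal sketch that does it with Python's built-in `html.parser`. It counts `<a>` tags that have an `href` and checks whether their `rel` value contains `nofollow`. The class name, function name and sample HTML are hypothetical. The roughly 20% figure for a natural site comes from the paragraph above.

```python
from html.parser import HTMLParser


class LinkRelCounter(HTMLParser):
    """Counts <a href> tags and how many carry rel="nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.total = 0
        self.nofollow = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "a" or "href" not in attrs:
            return
        self.total += 1
        # rel can hold several space-separated tokens, e.g. "ugc nofollow".
        if "nofollow" in (attrs.get("rel") or "").lower().split():
            self.nofollow += 1


def nofollow_ratio(html: str) -> float:
    """Share of links in `html` that carry rel="nofollow"."""
    parser = LinkRelCounter()
    parser.feed(html)
    return parser.nofollow / parser.total if parser.total else 0.0


page = ('<p><a href="https://a.example/">A</a> '
        '<a rel="nofollow" href="https://b.example/">B</a> '
        '<a href="https://c.example/">C</a> <a href="https://d.example/">D</a> '
        '<a href="https://e.example/">E</a></p>')
print(nofollow_ratio(page))  # 0.2
```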

Whether the article's images match its links shows how little effort the operator behind the scenes put in:

  • Next to text about imported dog food, there's a photo of a repair-shop floor covered with scrap tires
  • The image's alt text is stuffed with orthopedic medical terms that have nothing to do with the paragraph
  • The link sits directly beneath a blurry screenshot stretched to 800 pixels wide
  • The image and its related text are separated by more than two-thirds of a screen height

Check how concentrated the site's outbound links are and how much their sentence templates overlap. Randomly sample 200 articles the site has posted in the past 3 months. Without exception, the linked paragraphs use exactly the same sentence structure, and the screen is full of "for more details click here." No human typing by hand writes hundreds of sentences with identical punctuation.
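Template overlap can be measured by normalizing each linked sentence and counting exact repeats. This minimal sketch lowercases the text and collapses whitespace but keeps punctuation intact, because identical punctuation is part of the tell. The function name and the 200-sentence sample are hypothetical.

```python
import re
from collections import Counter


def template_share(sentences: list[str]) -> tuple[str, float]:
    """Most common normalized link sentence and the share of samples using it."""
    def normalize(s: str) -> str:
        return re.sub(r"\s+", " ", s.strip().lower())

    counts = Counter(normalize(s) for s in sentences)
    template, n = counts.most_common(1)[0]
    return template, n / len(sentences)


# Hypothetical link sentences sampled from 200 articles.
sample = ["For more details click here."] * 198 + [
    "See the pediatric hospital's guide.",
    "Our vet explains this in depth.",
]
print(template_share(sample))  # ('for more details click here.', 0.99)
```

Exact matching only catches copy-paste templates. Spun variants would need a fuzzier similarity measure.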

Run a traffic estimation tool to check the page's real visitor count. Over a whole month, a single page gets fewer than 3 real visitors. An out-of-place link tucked into a corner of an article, completely disconnected from its context, has an actual click probability below 0.5 in 10,000 (0.005%).
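Multiplying the two figures shows how little such a link is worth. This worked example takes the paragraph's upper bounds at face value: 3 visitors a month and a 0.5-in-10,000 click probability. It computes the expected clicks per month and how long one expected click would take.

```python
visitors_per_month = 3            # upper bound from the traffic estimate
click_probability = 0.5 / 10_000  # "0.5 in ten thousand" = 0.005%

expected_clicks = visitors_per_month * click_probability
months_per_click = 1 / expected_clicks
print(f"{expected_clicks:.5f} clicks/month")  # 0.00015 clicks/month
print(f"about {months_per_click / 12:.0f} years per expected click")  # about 556 years per expected click
```

Even at these generous upper bounds, the link would take centuries to deliver a single visitor.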

Copy the text containing the link into a translation tool and run it back and forth between Chinese and English twice. Native Chinese writing has a natural, spoken rhythm. The forcibly assembled 50-character block, by contrast, comes back from the round trip noticeably smoother than it went in. Side by side, the comparison exposes the original's stiff, foreign grammar and its assembly-line machine-translation flavor.

Small layout flaws around the link stand out:

  • Only the line containing the link has its line spacing suddenly shrunk from 1.5x to 1.2x
  • The customary two-character indent at the start of the paragraph disappears, and the text sits flush against the left margin of the page
  • A stray half-width space appears both before and after the linked text
  • The paragraph has no punctuation at all; lines are broken by pressing the spacebar

Pull the server access logs for the day the page was published. A link that genuinely captures readers' interest is surrounded by text with clear emotional coloring, perhaps 50% surprise or 30% skepticism. Text wrapped around commercial URLs is as dry as an old appliance manual, and sentiment-scoring tools rate it at a flat zero.

Look closely at the punctuation around the link. In an otherwise smooth Chinese sentence, a half-width English comma suddenly appears, and full-width and half-width periods are mixed throughout. When the posting bot cut out and replaced that 30-character block, it had no way to make the punctuation consistent with the rest of the 1,500-character page.
