I’ve written an SEO article but it’s not being indexed. What should I do? | Found but not yet indexed

Author: Don jiang

Strengthen internal links first (the most effective fix): add 2 to 3 anchor-text links from already indexed, high-authority pages to the new article so that crawlers revisit it. Make sure the content is original, add professional charts or data, and aim for 800+ words to support E-E-A-T. Then click “Request indexing” in Google Search Console (GSC) and check whether page load time exceeds 3 seconds.

Increase Internal Links

Drive traffic from “High-authority pages”

Open Google Analytics 4 and export the reports for the past 12 months, then identify the top 10 URLs that together capture over 70% of the site’s organic search clicks. Linking from these high-traffic old pages to an uncrawled new URL is hundreds of times faster than stuffing the link into the footer of an old article that gets only 20 visitors per month.

Use Ahrefs to filter through all site links, selecting old pages that meet 5 hard criteria as the starting point for traffic:

  • Ahrefs UR score ≥30
  • GSC impressions exceeded 10,000 in the past 90 days
  • Visitor dwell time on page exceeds 2 minutes 15 seconds
  • Page carries fewer than 40 outbound links
  • Page fully loads in under 2.5 seconds
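The five thresholds listed above can be applied mechanically once the metrics are exported. The following is a minimal sketch, assuming the numbers have already been collected by hand from Ahrefs, GSC and an analytics tool; the `PageStats` class, its field names and the sample pages are hypothetical and not part of any of those products.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    ur_score: int          # Ahrefs UR
    impressions_90d: int   # GSC impressions over the past 90 days
    dwell_seconds: float   # average time on page
    outbound_links: int
    load_seconds: float    # full page load time


def is_link_source(p: PageStats) -> bool:
    """Return True when a page meets all five criteria from the list above."""
    return (
        p.ur_score >= 30
        and p.impressions_90d > 10_000
        and p.dwell_seconds > 135   # 2 minutes 15 seconds
        and p.outbound_links < 40
        and p.load_seconds < 2.5
    )


# Hypothetical sample data for illustration only.
pages = [
    PageStats("/coffee-machine-reviews/", 42, 18_500, 160.0, 22, 1.8),
    PageStats("/old-footer-page/", 12, 300, 40.0, 75, 3.4),
]
candidates = [p.url for p in pages if is_link_source(p)]
print(candidates)
```

Only pages that pass every check remain in `candidates`; loosening a single threshold is a one-line change.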

After selecting the page, write the new link into the HTML below the <header>, within the first 200 characters of the body. If the link sits more than two screen heights down the page, roughly below 2160 pixels, the probability of it being clicked drops below 2%. Keeping it in the visible area above the fold can capture 12% of real user clicks.

Contextual paragraph links are indexed 3 days faster than bare list links. Leave 15 to 20 related words before and after the link. When readers scan the screen, their eyes linger on blue underlined text for 0.4 seconds. Using long-tail keywords of 5 to 8 words as the hyperlink text increases click-through rate from 0.5% to 3.2%.

Before adding links, rewrite the page content according to several data format criteria:

  • Anchor text word count strictly controlled between 4 to 8 words
  • Text overlap with page H1 title below 80%
  • The entire paragraph containing the link maintains 40 to 60 words
  • Link text and background color contrast ratio adjusted to 4.5:1
  • Add :hover color change feedback in CSS code
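The text-based criteria in this list (anchor length, overlap with the H1, paragraph length) are easy to check before publishing. Below is a rough sketch; the function names are invented for illustration, and “overlap” is assumed to mean the share of anchor words that also appear in the H1, which is only one possible reading of the rule.

```python
def word_overlap(anchor: str, h1: str) -> float:
    """Share of anchor words that also appear in the H1 (case-insensitive)."""
    anchor_words = anchor.lower().split()
    h1_words = set(h1.lower().split())
    if not anchor_words:
        return 0.0
    shared = sum(1 for w in anchor_words if w in h1_words)
    return shared / len(anchor_words)


def check_anchor(anchor: str, h1: str, paragraph: str) -> list[str]:
    """Return a list of rule violations; an empty list means all checks pass."""
    problems = []
    n = len(anchor.split())
    if not 4 <= n <= 8:
        problems.append(f"anchor has {n} words, expected 4-8")
    if word_overlap(anchor, h1) >= 0.8:
        problems.append("anchor overlaps H1 by 80% or more")
    p = len(paragraph.split())
    if not 40 <= p <= 60:
        problems.append(f"paragraph has {p} words, expected 40-60")
    return problems
```

The contrast and :hover items are visual and are not covered by this check.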

Run the old articles prepared for link additions through Screaming Frog. If the page already carries more than 60 outbound links, they must be trimmed urgently. Go to the backend to remove 20 old links in the footer that nobody clicks. The crawl quota allocated to the new URL will immediately increase by 15%. Only add 1 new link per week. Adding more than 3 at once will trigger SpamBrain algorithm warnings.

After modifying the code, publish the page. Spread the whole operation over 2 to 3 weeks. Paste the old article’s URL into the URL inspection box at the top of GSC and click “Request indexing” to prompt the crawler to review the high-traffic HTML document again within 24 hours. Following the new link, it will crawl the new article, and around day 14 the new URL will appear in the index.

Build “Topic Clusters”

When a newly written 500-word article is thrown into a massive database of 10 million URLs, the crawler will most likely skip that unknown address. Restructure the site so that an 8,000-word long article serves as the backbone, with 15 to 20 supplementary short articles linked around it.

The server’s access logs record the crawler’s real crawling preferences. When it meets a group of interconnected URLs, the duration of a single visit extends from the usual 14 seconds to two and a half minutes. As the crawler works down the 8,000-word article, roughly every 400 words the page source contains a link that jumps to a sub-page.

Use “coffee machine reviews” as the main webpage of several thousand words, and treat “water temperature’s effect on extraction rate” and “filter paper flow rate test” as sub-webpages clustered nearby, weaving a web where each points to the other.

Scattered URLs that have been stuck in GSC backend for 45 days with no movement, once woven into an inter-voting network structure, will see their crawl priority score skyrocket 300 times. The system detects that over 10 interconnected HTML documents are stored in the same directory and calculates an extremely high expertise score for the entire site.

The content depth of a large page determines how much traffic it can handle. Open the backend editor and write to 4,000 to 6,000 words, covering 12 different sub-topics. In the super long article, every time a specific knowledge point is written, reserve a position to provide a jump entry for later short-form content.

Modify the hyperlinks in the long page, adjust the code according to several hard parameters:

  • Main page carries 15 to 25 related pages
  • Remove all messy links jumping to unrelated categories
  • Arrange colored jump text in the first 60% of the page
  • Two sub-page links in the code must be at least 50 words apart
  • First paragraph of sub-page body provides blue text linking back to main page

One-way linking cannot form a cycle in which pages reinforce each other. Edit the short articles that have been stuck unindexed and insert a text link back to the 8,000-word long article within the first 3 lines of each one. The crawler follows a link from page A to page B, then follows the link at the top of page B back to starting point A.
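Whether every sub-page really links back to the long article can be verified from a crawl export. This is a small sketch, assuming the internal links have already been collected into a dictionary that maps each page to the set of pages it links to; the URLs and the `missing_backlinks` helper are illustrative only.

```python
def missing_backlinks(hub: str, links: dict[str, set[str]]) -> list[str]:
    """Return sub-pages the hub links to that do not link back to the hub."""
    spokes = links.get(hub, set())
    return sorted(s for s in spokes if hub not in links.get(s, set()))


# Hypothetical link graph: page -> pages it links to.
links = {
    "/coffee-machine-reviews/": {"/water-temperature/", "/filter-paper-flow/"},
    "/water-temperature/": {"/coffee-machine-reviews/"},
    "/filter-paper-flow/": set(),
}
print(missing_backlinks("/coffee-machine-reviews/", links))
```

Any URL the function returns is a one-way link that still needs a return link near the top of the sub-page.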

This builds a small capture net of web pages. Once the crawler enters, it bounces back and forth among dozens of tightly interlinked pages and, along the way, brings every page that returns a 200 status code into the index.

The short sub-articles also need to link to each other. Open Screaming Frog, pull up the full site map, and check whether the “coffee bean storage” article links to “burr grinder cleaning”. When two small articles of only 1,500 words each point to one another through hyperlinks in the body text, the indexing wait can shorten from 4 weeks to 72 hours.

Control the depth of the entire internal link network within the site’s directory tree. Check the URL hierarchy paths in the WordPress backend. Don’t let dozens of interconnected pages hang in mid-air with nobody managing them. Place the long article’s URL under the second-level menu of the homepage navigation bar. After visitors open the site homepage, they only need to click a maximum of 2 times.

When handling the unindexed list, use the following checklist to scan the whole site:

  • Check if more than 3 sub-pages point to inaccessible 404 addresses
  • Article headline word overlap must not exceed 40%
  • Manually delete thin pages with fewer than 600 words
  • Ensure each page has at least 1 keyword with monthly search volume exceeding 1,000
  • Put the 20 URLs participating in the web into the same XML file node

Anchor Text Semantic Guidance

When a bare URL is pasted into the body text, a passing crawler recognizes only a cold string of characters. Search engines parse a text hyperlink in just 50 milliseconds. In that extremely short window, the crawler treats the words on the blue underline as the unindexed page’s mini business card. Search log data confirms that bare links without any anchor text are 68% less likely to be indexed than text links.

The few words written in the hyperlink tag carry a weight contribution of up to 15% in the entire ranking algorithm. Since the BERT natural language model went online in 2019, machines have learned to look up dictionaries and understand jump phrases like ordinary readers. When driving traffic to an article reviewing Manhattan coffee shops, using “Manhattan hand-drip coffee shop ratings” as the jump phrase provides 4 more dimensions of semantic information than simply writing “coffee shop.”

Feeding phrases with clear meaning to crawlers allows unindexed pages to skip up to 3 weeks of sandbox evaluation. Run a full site report in Ahrefs’ health check backend. Many sites have 12% of links written as “click to read.” Every day tens of thousands of crawlers encounter those words, and the machine brain cannot connect meaningless action verbs to any specific commercial search intent.

When replacing meaningless anchor text, follow strict revision standards:

  • Remove all action prefixes like “click,” “view”
  • English character length maintained between 12 to 25 letters
  • Other text length controlled between 5 to 8 words
  • Ensure each paragraph contains at least 1 long-tail keyword with search volume greater than 500
  • Avoid 100% complete overlap with the target page’s main title

Selecting long-tail phrases requires a data tool. Open Semrush and enter the target keywords to filter. Find phrases with a keyword difficulty (KD) between 15 and 30 that carry clear search intent. For an article about keyboard parts that is stuck in unindexed status, go to the backend and change the anchor text “mechanical keyboard” in the old article to “red switch vs brown switch tactile comparison.”

Anchor text that sets clear expectations can bring a 2.5% click-through rate improvement. Each real visitor’s click sends an HTTP request to the server, which answers with a 200 OK status code. Once a page accumulates over 15 real clicks daily, user behavior data flows back to the data center in Oregon, and unindexed pages will be forcibly added to the index within 48 hours.

Anchor text doesn’t exist in isolation; it forms an N-gram text block with the words before and after it. Leave 15 to 20 highly relevant ordinary words before and after the hyperlink. Analyze that passage with Google Cloud’s Natural Language API; the entity score should exceed 0.8. Below that threshold, the link will be judged as awkwardly inserted.

Consider force-feeding a link to a page selling Jordan sneakers into an article about Texas BBQ. The crawler extracts context words such as “beef brisket” and “smoked” and cross-references them with “AJ1 Travis Scott” in the link. The co-occurrence rate in the word frequency algorithm is below 0.01%. Such an awkward link will trigger a warning in the SpamBrain anti-spam engine.

Visual presentation determines whether ordinary readers pause on the phrase. The human eye scans the screen at 240 words per minute. When the gaze hits the classic blue with hexadecimal code #0000FF, eye movement speed suddenly drops to 120 words per minute. The visual pause provides a large operating window for mouse clicks.

Adjust the webpage CSS stylesheet and add ergonomically correct visual anchor points to the traffic-driving text:

  • Screen display font size set between 15px to 17px
  • Add 1 pixel thick underline to text
  • Color contrast with surrounding black body text maintained above 4.5:1
  • Mobile touch hot zone set no smaller than 48×48 pixels
  • Remove all filter code that would darken screen text
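The 4.5:1 figure in this list can be checked numerically. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas for six-digit hex colors; the function names are chosen here for illustration.

```python
def _channel(c: int) -> float:
    """Convert one 8-bit sRGB channel to its linear value (WCAG 2.x formula)."""
    s = c / 255
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(a: str, b: str) -> float:
    """WCAG contrast ratio between two colors, always >= 1."""
    hi, lo = sorted((luminance(a), luminance(b)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


print(round(contrast_ratio("#0000FF", "#FFFFFF"), 2))  # link blue on a white background
```

The same function can compare the link color with the body text color or with the page background, whichever pair the rule is applied to.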

Manual Submission Request

Submission Steps

Paste the full URL into the search box at the top of the page, which is approximately 600 pixels wide. The URL cannot exceed 2,048 characters; extra characters are silently truncated by the system. When you press Enter, the browser sends a GET request with identity verification tokens to the backend.

The server takes 1.5 to 3 seconds to search the Bigtable database. The spinning gray circle on screen is the system finding your link among hundreds of billions of records. If the record is not found, the panel displays gray “URL is not on Google” text.

Click the blue request button beside it. The system wakes up the crawler program responsible for smartphone simulation crawling. The program goes out with a strict 60-second countdown limit. It must establish a TCP connection to the target server and complete the SSL handshake process before timing out.

The crawler arrives quickly and checks several baseline items:

  • Status code cleanly returns 200 OK
  • Page jump count is less than 5
  • Total HTML file size stays within 15MB
  • Robots.txt file has no blocking directives
  • Server Time to First Byte is under 600 milliseconds
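Of the baseline items above, the robots.txt check is the easiest to reproduce locally. Here is a minimal sketch using Python’s standard `urllib.robotparser`; the robots.txt content and URLs are made up, and the standard-library parser applies rules in file order, so its verdict may differ from Googlebot’s on files with overlapping Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Hypothetical robots.txt for illustration.
robots = """\
User-agent: *
Disallow: /drafts/
"""

print(googlebot_allowed(robots, "https://example.com/blog/new-post/"))
print(googlebot_allowed(robots, "https://example.com/drafts/new-post/"))
```

Running this against the live file before clicking the request button rules out the most common self-inflicted block.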

If the page contains an 8MB uncompressed high-definition image that drags load time past 5 seconds, the test immediately reports a red error. If all indicators are green, the system attaches an internal tracking ID in the backend and a green success prompt appears on screen.

Regular accounts can click the request button approximately 50 times per day maximum. If exceeded within one day, a reCAPTCHA verification popup appears. You have to squint at a 3×3 grid of images and select crosswalks or find fire hydrants. Using the API interface, you can send 200 JSON format request bodies per day.
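For the API route, each request body is a small JSON object. The sketch below only builds the bodies and caps them at the 200-per-day figure quoted above; it does not send anything. The endpoint and the `url`/`type` fields follow Google’s Indexing API as publicly documented, while authentication (an OAuth service account) is left out, and the helper name is hypothetical. Note that Google documents this API for job-posting and livestream pages, so treating it as a general submission channel is an assumption.

```python
import json

ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"
DAILY_QUOTA = 200  # per-day figure quoted in the text above


def build_notifications(urls: list[str], quota: int = DAILY_QUOTA) -> list[str]:
    """Return one JSON request body per URL, capped at the daily quota."""
    return [json.dumps({"url": u, "type": "URL_UPDATED"}) for u in urls[:quota]]


# Hypothetical URLs for illustration.
bodies = build_notifications([f"https://example.com/p/{i}" for i in range(250)])
print(len(bodies), bodies[0])
```

URLs beyond the cap are simply dropped here; a real script would queue them for the next day.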

Instructions enter a processing queue, and waiting time entirely depends on the site’s foundation. New domains online less than 3 months must wait 72 hours before a crawler shows up. An established news website updating an internal page gets crawled within 15 minutes.

A regular virtual host configured with 2 cores, 4GB RAM, and 5M bandwidth can roughly handle 100 to 300 crawler visits per day. If the server consecutively returns 10 error codes like 500 or 503, the crawler determines the host is about to give out.

Crawling is then forcibly stopped for 24 hours, and all previously submitted URLs go back to the start of the queue. After the host recovers, the crawler fetches the source code and renders it with a headless browser. All JavaScript must finish running within 5 seconds.

The headless browser applies several hard rules when rendering the page:

  • Total DOM tree nodes combined under 1,500
  • Page nesting depth kept within 32 levels
  • Avoid overly complex CSS selectors

Crawling and Indexing

Machine programs visiting the site to collect information have hard quotas. A regular e-commerce site receives approximately 5,000 visits per day, with download traffic limited to around 200MB. Turning on the server’s ETag cache switch is a good idea. After checking file fingerprints and finding no content modification, the server reports 304 status code, instantly saving 30% of the quota.
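The ETag mechanism described here amounts to comparing a content fingerprint with the value the client sends back in If-None-Match. A simplified sketch follows, assuming a hash-based fingerprint; real servers generate ETags in their own ways, and the function names are illustrative.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a quoted fingerprint from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> tuple[int, bytes]:
    """Return (status, payload): 304 with no body when the fingerprint matches."""
    if if_none_match == make_etag(body):
        return 304, b""
    return 200, body


page = b"<html><body>unchanged article</body></html>"
first = respond(page, None)               # first visit: full download
second = respond(page, make_etag(page))   # repeat visit: nothing to transfer
print(first[0], second[0])
```

The saving comes from the empty payload on the repeat visit: the crawler spends a request but almost no download quota.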

After the page files are fetched and pulled back to the data center, the processing program strips away excess code. With redundant formatting removed, the extracted plain text falls between 10KB and 50KB. This text block is then checked for duplicates against the hundreds of billions of records in the database. If the overlap ratio exceeds 80%, the content is tagged as a duplicate copy and set aside.

Saving those precious visits relies on several small improvements:

  • Add Cache-Control directive to response headers
  • Compress SVG vector graphics code embedded in the page
  • Remove extra tracking parameters from URLs
  • Completely rewrite 404 error pages as 410 status deletions
  • Convert all large images to WebP format

E-commerce sites often list the same product under three different category directories, producing a string of long, complex URLs. Placing a rel="canonical" tag in the page’s <head> concentrates the scores of those duplicate URLs onto a single link. When a 301 redirect is set up for a deprecated old page, its accumulated score fully migrates to the new URL within 2 to 3 weeks.

Page speed has become a hard pass/fail threshold. The system scores pages using real user data reported by Chrome. It focuses on Largest Contentful Paint (LCP): how many seconds the largest element in the viewport, often the main image, takes to fully display. As soon as that time crosses the 2.5-second line, the indexing system marks the content with a low score.

To pass, speed data must stay within these hard limits:

  • Time to First Byte (TTFB) under 800 milliseconds
  • Cumulative Layout Shift (CLS) score kept under 0.1
  • Interaction to Next Paint (INP) under 200 milliseconds
  • Text loaded with font-display: swap to prevent flashes of invisible text
  • Images below the fold that users haven’t seen yet set to lazy load

Rendering the entire webpage is extremely computationally expensive. The V8 engine runtime allocated to a single page is strictly locked within 5 seconds. Some single-page applications stuff thousands of lines of code into one app.js file, ballooning the size to 2MB. If the browser slowly renders beyond 5 seconds, the crawler ultimately captures a completely blank page with no content.

Page nesting depth has strict rules. If the total node count crosses the 1,500 red line, the system raises an alarm. When code is written with too many layers of <div> tags and depth runs to 32 levels and beyond, the crawler gives up halfway through, and the comments and external links at the very bottom of the page never get a chance to be indexed.
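Both figures, node count and nesting depth, can be estimated from the raw HTML with the standard library. This is a rough sketch that counts element nodes only (not text nodes) and measures the static source rather than the DOM after JavaScript has run; the `DomStats` class and `dom_stats` function are illustrative names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class DomStats(HTMLParser):
    def __init__(self):
        super().__init__()
        self.nodes = 0
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.nodes += 1
        if tag in VOID:
            # A void element sits one level below the current open element.
            self.max_depth = max(self.max_depth, self.depth + 1)
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID and self.depth > 0:
            self.depth -= 1


def dom_stats(html: str) -> tuple[int, int]:
    """Return (element count, maximum nesting depth) for an HTML string."""
    parser = DomStats()
    parser.feed(html)
    parser.close()
    return parser.nodes, parser.max_depth


nodes, depth = dom_stats("<html><body><div><p>hi<br></p></div></body></html>")
print(nodes, depth)
```

A page can then be flagged when `nodes` exceeds 1,500 or `depth` exceeds 32, the two limits mentioned above.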

The article’s word count is also a hard threshold. If only 100 to 300 words remain after duplicate content is removed from the body, the system stamps the page “content too thin.” When the text-to-code ratio drops below 25%, the crawler concludes the page is full of irrelevant ad blocks or flashy formatting and throws the link into the unindexed wastebasket.

Machine content quality review holds several criteria:

  • Split paragraphs into 3 to 4 sentences
  • Use H2 and H3 tags to divide headings
  • Add several external links pointing to well-known sites
  • Text with hyperlinks must clearly state where they lead
  • Visitors stay on the page long enough

Check Site Map

Confirm First

Enter yourdomain.com/sitemap.xml in the browser and press Enter. In the full screen of code, each <loc> tag is a pass. A newly written 2,500-word article that is not on this list won’t be found by the crawler. Press Ctrl+F to bring up the search box and paste in the unindexed URL. If the match result is 0/0, that 45-character string is completely absent from the official directory.
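The Ctrl+F check can also be scripted. Below is a minimal sketch that extracts every <loc> value from sitemap XML already downloaded as text; the sample sitemap and URLs are invented, and fetching the file is left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set[str]:
    """Return the set of URLs listed in <loc> tags."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", NS) if loc.text}


# Hypothetical sitemap content for illustration.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/coffee-machine-reviews/</loc></url>
  <url><loc>https://example.com/water-temperature/</loc></url>
</urlset>"""

listed = sitemap_urls(sample)
print("https://example.com/new-article/" in listed)
```

The same function works on a sitemap index file, where the <loc> values are the sub-sitemap addresses rather than article URLs.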

If the URL is missing, first check the caching plugin in the website building backend. Tools like WP Rocket can easily freeze the XML file as a regular webpage. With the cache expiration time set as long as 10 hours, all newly written content gets blocked by stale cached versions.

Common Caching Tools | Default Cache Duration | Recommended Action
Cloudflare           | 4 hours                | Create page rules to bypass
Litespeed            | 8 hours                | Enable real-time purge function
WP Rocket            | 10 hours               | Exclude /sitemap.xml path
Nginx FastCGI        | 12 hours               | Modify conf file to skip

After ruling out the caching issue, look at the sitemap file’s pagination design. When total site articles exceed 1,000, the tool automatically splits the file. The one you just opened in the browser is only the main directory containing 5 sub-file addresses.

Click the link named post-sitemap.xml. The library stores 4,500 articles, split into 5 numbered independent files. Go to the one with the highest number and newest date, then search again.

Check category exclusion switches in the backend system:

  • Check the SEO tool’s “Content Type” tab settings
  • Compare whether the article’s category has “Noindex” checked
  • Look at the “Advanced” panel index directive at the bottom of individual articles

If an article carries a “password protected” tag, the generation program automatically drops its address. Also check whether the post_status field in the database is wrong: the frontend can display the text, but the sitemap generation code treats the article as an unfinished draft.

Custom-coded programs rely on server scheduled tasks for updates. Set to pull data from the database once every 24 hours. An article published at 2 PM must wait until 3 AM the next day when the script runs before it enters the list.

Open the terminal interface and manually enter instructions. Type the code and press Enter, watch 150 lines of logs scroll by. It takes 8 seconds to forcibly stuff the latest 30 URLs into the root directory file.

Check read-write permissions of the server folder:

  • Use FTP software to pull up the root directory file properties panel
  • Confirm the sitemap file permission number is set to 644
  • Check the outer folder permission is relaxed to 755
  • Verify the file owner is the current running environment

If permissions are blocked, the code assembles the new list in memory but cannot save it to the 2MB file on disk. After fixing permissions, check the PHP memory limit in the server configuration. Pulling 15,000 records at once pushes memory consumption to 128MB. If the host’s limit happens to be that same number, the system forcibly kills the process at around record 8,500.

Manually verify the trailing slash symbol at the end of URLs in the list. Articles published from the frontend carry a trailing slash, but the sitemap generates them without the slash. A difference of 1 byte in the symbol makes them completely two unrelated pages in the machine’s eyes.
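A trailing-slash mismatch is hard to spot by eye but trivial to detect by comparing the two URL lists. Here is a small sketch, assuming both lists are already available as sets of strings; the helper name and URLs are hypothetical.

```python
def slash_mismatches(published: set[str], in_sitemap: set[str]) -> list[str]:
    """Published URLs that differ from a sitemap entry only by the trailing slash."""
    normalized = {u.rstrip("/") for u in in_sitemap}
    return sorted(
        u for u in published
        if u not in in_sitemap and u.rstrip("/") in normalized
    )


# Hypothetical data for illustration.
published = {"https://example.com/a/", "https://example.com/b/"}
in_sitemap = {"https://example.com/a", "https://example.com/b/"}
print(slash_mismatches(published, in_sitemap))
```

Each URL returned needs either the sitemap generator or the permalink setting adjusted so both sides agree.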

Multilingual sites that use translation tools split one article into 3 languages. When checking English URLs, note whether the Spanish alternate links are output below them. Missing 1 language code means all crawl volume for that language goes down the drain. Also check the SEO plugin’s exclusion list: if it contains the new article’s ID, for example 8592, that 4-digit entry locks 6 hours of content creation firmly out of the sitemap.

On a custom-built site, open the SQL query that generates the sitemap. If it carries a LIMIT 500 cap, then once site content grows to 680 articles, this rigid rule blocks the latest 180 from the list.

For custom-built sites, modify the underlying code:

  • Go to line 45 of the sitemap file and increase the query limit
  • Add sort by date descending rule statement
  • Add a LEFT JOIN statement to include custom fields

Pull up the server access log to check records. The timestamp shows the crawler pulled the file at 15:42. The file size is only a pitiful 15KB. A text file holding 2,000 URLs should occupy at least 250KB of space.

The serious size shrinkage exposes a broken output process. Check the page source code to find the error location. On line 850, the article title carries an unescaped angle bracket. The parser encounters an illegal character and crashes in 0.2 seconds.

Check whether special characters in URLs are encoded. URLs with “100%” in the title must be converted to %25 with the percent sign. Symbols that haven’t been properly converted will tear a big hole in the originally intact file structure.
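Both failure modes described above, an unescaped angle bracket and an unencoded percent sign, are handled by standard-library helpers. The sketch below is a minimal illustration; `sitemap_loc` is an invented helper, and the base URL and slug are made up.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def sitemap_loc(base: str, slug: str) -> str:
    """Percent-encode the slug, then XML-escape the resulting URL."""
    path = quote(slug, safe="")
    return escape(f"{base}/{path}/")


print(sitemap_loc("https://example.com", "100% arabica"))
print(escape("Grinders <2024> & more"))
```

Encoding happens first and XML escaping second, so an ampersand in a query string would still end up as &amp; inside the <loc> tag.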

Open the browser debug panel and look at the network tab’s response headers. That line CF-Cache-Status: HIT exposes the CDN node meddling. Go to the console and click the purge cache button, wait 30 seconds, then refresh the browser and watch the code containing the new URLs appear.

Check Reading Status

Open GSC backend and find the Sitemap option on the left. Pay attention to the “Last Read Date” column, which represents the actual time Googlebot visited. Reading time usually carries a 24 to 72 hour delay. If an article is 3+ days old but the date stays from last week, it means the crawler never visited.

Focus on the “Status” column next to it. Showing green “Success” means normal reading. Red “Could not fetch” means network connection is broken. Causes are mostly attributed to server firewall blocking or host returning 503 errors.

Click into the detailed error for troubleshooting:

  • Reporting 404 error: The sitemap file URL entered has a typo
  • Reporting 403 error: Server permissions restrict, crawl denied
  • Timeout: Server response too slow, exceeding 30-second limit

A blank page originates from host available memory being too small, causing the program to freeze. When memory falls below 256MB, downtime faults easily trigger. Modify the php.ini setting in the server, increase available memory to 512M to break through the hardware bottleneck.

When encountering “Errors found” prompt, clicking through usually reveals unsupported file format. The sitemap file needs to be pure UTF-8 text encoding. Mixed-in garbled characters will freeze the reading program. A single file exceeding 50MB size limit will also be ruthlessly rejected by the system.

Even if the status shows a green success, clicking into “View Webpage Index Coverage” may reveal very thin data: 10,000 URLs submitted, but only 1,200 actually indexed. The remaining 8,800 are all stuck waiting in the “Discovered – currently not indexed” queue.

Go to backend settings and browse “Crawl Statistics,” reviewing the robot’s work record from the past 90 days. Normally, the robot’s requests for crawling pages should account for over 30%. Below 10% reveals the crawl quota is being largely wasted on invalid files.

Focus on troubleshooting these quota waste phenomena:

  • Crawled too many webpage stylesheets and script codes
  • Got stuck in an infinite loop in parameter URLs with question marks
  • Proportion of normally opened pages below 80%

Historically, entering https://www.google.com/ping?sitemap=sitemap_address in the browser address bar returned a short “Notification received” page, meaning a roughly 100-byte notification had been sent to call the crawler over. Google deprecated this ping endpoint in 2023, so resubmit the sitemap through the Sitemaps report in GSC instead.

Enter the unindexed article’s URL in the top search box. Open the Coverage module and verify whether the referring source points to your sitemap. Showing “Not detected” confirms the system’s parsing queue has been blocked for over 2 weeks.

Clean Up Invalid Pages

The robot’s crawl list is mixed with impurities. The submitted sitemap file contains 15,000 URLs but secretly hides 1,800 deleted 404 blank pages. The crawler follows the table knocking on doors one by one, getting nothing but dust and wasting 80 to 150 milliseconds of load time for each.

When dead links reach the 12% red line, the search engine’s crawler count plummets. Originally visiting 3,500 times daily, within half a month it shrinks to just 200 times per day. Identifying and kicking out inaccessible garbage URLs is the top priority.

301 redirect pages must absolutely not remain in the table. When an old article changes to a new URL, the old path doesn’t deserve a spot in the XML list. The crawler follows the old address, receives the server’s redirect instruction, and goes to the new location.

Going back and forth wastes 200 milliseconds of network requests for nothing. A site that has been running for 5 years accumulates 6,000 redirect pages all stuffed in, doubling the workload. After thorough cleaning, the sitemap can only contain pure web pages returning 200 normal code.

Use data-crawling software to do a full site health check:

  • Feed URLs to Screaming Frog to test connectivity rate
  • Pick out entries reporting 4xx and 5xx errors
  • Strip duplicate sub-pages with canonical tags
  • Trim tracking character parameters from URLs
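Two of the checklist items, dropping non-200 entries and trimming tracking parameters, can be combined in one pass over a crawl export. This is a sketch under the assumption that the crawl result is available as a URL-to-status-code mapping; the list of tracking parameters is an illustrative guess and should be adapted to the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)        # assumed tracking prefixes
TRACKING_KEYS = {"gclid", "fbclid"}  # assumed tracking parameter names


def strip_tracking(url: str) -> str:
    """Remove tracking query parameters while keeping all others."""
    parts = urlsplit(url)
    kept = [
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.startswith(TRACKING_PREFIXES) and k not in TRACKING_KEYS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def clean_sitemap(status_by_url: dict[str, int]) -> list[str]:
    """Keep only URLs that returned 200, with tracking parameters removed."""
    return sorted({strip_tracking(u) for u, code in status_by_url.items() if code == 200})


# Hypothetical crawl result for illustration.
crawl = {
    "https://example.com/a/?utm_source=x": 200,
    "https://example.com/a/": 200,
    "https://example.com/gone/": 404,
    "https://example.com/old/": 301,
}
print(clean_sitemap(crawl))
```

Using a set before sorting collapses URLs that become identical once the tracking parameters are gone.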

The crawler doesn’t care how beautiful the webpage layout is; it only recognizes the three-digit server code. If 500 internal errors frequently appear, the robot determines the site is on the verge of crashing anytime.

GSC backend catches 80 consecutive 503 errors over 5 days. Articles ranking on search page 2 quietly drop beyond position 100. Open the backend database and pull out half-finished drafts with garbled characters from the sitemap generation plugin.

Overflow of dynamic URLs with tail parameters easily destroys indexing. Filter pages with ?sort=price proliferate like cell division, creating tens of thousands of identical web pages. Once the defense breaks, the sitemap instantly fills with 25,000 identical clone links.

The 8,000-visit quota allocated for crawling new articles is entirely consumed by this bottomless pit of duplicate pages. Rewrite the code that generates the XML files so the sitemap includes only unique URLs that have been rewritten into static form.

Start pruning the output list that takes up too much space:

  • Turn off the plugin option that generates sitemaps by tags
  • Exclude author pages and date archive pages from the list
  • Don’t let the sitemap index comment section pagination links
  • Block skinny content under 300 words

Go to the website building backend and change the default SEO plugin settings. A personal blog with 800 different tags artificially creates 800 empty shell category pages with zero content. The sitemap’s value is greatly diluted.

Monitor the visitor statistics in the website backend. Old articles that haven’t received a single click for 120 consecutive days should all be removed from the auto-generated sitemap file.
