
Cloudflare Firewall Blocking Google Crawler | How to Solve the Indexing Failure

Author: Don jiang

Many webmasters find their website suddenly “disappears” from Google search results, and the underlying cause is likely Cloudflare’s firewall mistakenly blocking Googlebot, preventing search engines from properly crawling pages.

Cloudflare’s default security rules are relatively strict, and crawler IPs that send requests at high frequency can easily trigger them. In mild cases indexing is delayed; in severe cases keyword rankings drop sharply.

This article walks through 4 practical steps: confirming the block, adjusting basic firewall settings, configuring precise crawler whitelists, and verifying the fix, so you can resolve Google indexing issues caused by Cloudflare rules.

First confirm if it’s actually being blocked

Many webmasters rush to modify configurations when they discover their site isn’t indexed by Google, but Cloudflare may not be blocking crawlers at all—it could be other SEO issues (such as content quality or robots.txt restrictions).

First verify the block through the following methods to avoid blind operations that cause more complex issues.

Google Search Console Crawl Error Report

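  • Path: Go to the GSC dashboard → “Indexing” on the left → “Pages” (labelled “Coverage” in older versions of the interface) → check the reasons listed for pages that are not indexed.
  • Key indicator: Reasons such as “Blocked due to access forbidden (403)” or “Server error (5xx)” point to a possible firewall block; unexpected redirect errors are also worth checking.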

Compare Cloudflare Firewall Logs

Operation: Log in to Cloudflare → Go to “Security” → “Events” → Filter time range, search for requests where “User-Agent” contains “Googlebot.”

Focus on the status:

  1. Block: Explicit block (needs to be allowed)
  2. Challenge: Triggered CAPTCHA (may affect crawler efficiency)
  3. JS Challenge: Browser check (may cause mobile crawler failures)
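
If you export the filtered events, a short script can tally how Googlebot requests were handled. The following is a minimal sketch, not Cloudflare’s API: it assumes each event is a dictionary with hypothetical keys `action` and `userAgent`, so adjust the names to match your actual export.

```python
from collections import Counter


def summarize_googlebot_events(events):
    """Count firewall actions for events whose User-Agent mentions Googlebot.

    `events` is assumed to be a list of dicts with the hypothetical keys
    "action" and "userAgent"; real exports may use different field names.
    """
    counts = Counter()
    for event in events:
        user_agent = event.get("userAgent", "")
        if "googlebot" in user_agent.lower():
            counts[event.get("action", "unknown")] += 1
    return counts


# Illustrative sample data, not real log output.
sample_events = [
    {"action": "block", "userAgent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"action": "managed_challenge", "userAgent": "Googlebot-Image/1.0"},
    {"action": "block", "userAgent": "curl/8.0"},
    {"action": "block", "userAgent": "Googlebot/2.1"},
]

for action, count in summarize_googlebot_events(sample_events).items():
    print(action, count)
```

Any non-zero count for a blocking or challenge action suggests the firewall is interfering with the crawler.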

Use Google’s Official Testing Tool

  1. Tool address: https://search.google.com/search-console/inspect
  2. Enter the blocked page URL and click “Test Live URL.”
  3. If the result shows that the page cannot be crawled (crawl blocked), confirm the cause with the HTTP response code in the details below (such as 403).

Distinguish Between “CAPTCHA” and “Complete Block”

CAPTCHA Challenge: The crawler receives a challenge page instead of your content. Depending on the challenge type the status code may be 200, 403, or 503, but in every case Google cannot parse the page, so indexing fails.

Complete Block: Directly returns 403/5xx error codes, crawler cannot get any page content.
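
The distinction can be sketched as a small classifier over a response’s status, headers, and body. This is an illustration under assumptions rather than a definitive detector: the `cf-mitigated: challenge` header and the body markers below are assumed signals of a challenge page, so verify them against responses from your own site.

```python
# Assumed markers of a challenge page body; adjust after inspecting real responses.
CHALLENGE_MARKERS = ("Just a moment...", "cf-chl")


def classify_response(status, headers, body):
    """Return 'challenge', 'blocked', 'redirect', or 'ok' for one HTTP response."""
    lowered = {key.lower(): value for key, value in headers.items()}

    # A challenge can hide behind any status code, so check for it first.
    if lowered.get("cf-mitigated", "").lower() == "challenge":
        return "challenge"
    if any(marker in body for marker in CHALLENGE_MARKERS):
        return "challenge"

    if status == 403 or 500 <= status <= 599:
        return "blocked"
    if status in (301, 302, 307, 308):
        return "redirect"
    return "ok"


print(classify_response(200, {}, "<html><title>Just a moment...</title></html>"))
print(classify_response(403, {}, "Access denied"))
```

Checking for the challenge before the status code matters, because a challenge page and a hard block can share the same status.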

Check Cloudflare Firewall Basic Settings

Cloudflare’s default security configuration protects your site but may also “accidentally harm” Google crawlers.

Especially high-frequency crawling behavior can be judged as an attack, causing crawlers to be rate-limited or even blocked.

The following 4 basic settings must be checked first—simple adjustments can significantly reduce false block probability.

Adjust Security Level

  1. Problem: When set to “High” or “I’m Under Attack,” it may block more than 30% of legitimate crawler requests.
  2. Operation: Go to the Cloudflare dashboard → “Security” → “Settings” → Set “Security Level” to “Medium” or “Low.”
  3. Note: After lowering the level, keep an eye on the attack logs and use “Custom Rules” to block real threats precisely.

Turn Off False Block Options in Region Blocking

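  • Risk point: If country or region blocking is enabled (for example through IP Access Rules or custom rules) and it covers North American or European IP ranges, it may accidentally block Google crawlers, since Googlebot mainly crawls from US-based IP addresses. “Zone Lockdown,” which restricts specific URLs to a fixed list of IPs, can have the same effect.
  • Operation: Go to “Security” → “WAF” → “Tools” → check whether region blocking or Zone Lockdown is enabled; temporarily disable it or add an exception for AS15169 (Google’s network).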

Turn Off Under Attack Mode (Red Shield Icon)

  • Impact: This mode makes every visitor pass an interstitial check first (a page that takes about 5 seconds), which Google crawlers may not pass, resulting in a complete block.
  • Operation: On Cloudflare dashboard homepage → Find “Under Attack Mode” toggle → Confirm it’s turned off.

Disable JS Challenge for Search Engines

Common pitfall: When “Browser Integrity Check” or a JS challenge applies to crawler traffic, some crawlers (especially mobile Googlebot) can fail to crawl, because their requests are rejected or the challenge script never completes.

Operation: Go to “Security” → “Settings” → Find “Browser Integrity Check” → Check “Do not apply to search engines.”

Additional: You can also add a separate rule that skips the JS challenge for requests whose User-Agent contains Googlebot.

Required Firewall Rules Whitelist

Simply lowering the security level may expose your site to risks—a more secure approach is to “precisely allow” Google crawlers through firewall rules.

Cloudflare supports setting whitelists based on conditions like User-Agent, IP source, and ASN (Autonomous System Number).

User-Agent Whitelist (Highest Priority)

Rule Function: Directly allow all requests carrying the Googlebot identifier, bypassing firewall detection.

Operation Path:

Cloudflare Dashboard → “Security” → “WAF” → “Rules” → Create new rule

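  • Field: User Agent, operator “contains” → Enter Googlebot (“contains” takes a plain substring; the pattern .*Googlebot.* only works with the “matches regex” operator)
  • Action: Select “Bypass” or “Skip” (the name depends on the rule type)

Note: The substring Googlebot also matches variants such as Googlebot-Image (image crawler) and Googlebot Smartphone (mobile version).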

ASN Allow (Prevent Fake User-Agent)

Necessity: Malicious crawlers may fake Googlebot’s UA, requiring IP source verification.

Operation: Add conditions in firewall rules:

  • Field: AS Num, operator “equals” → Enter 15169 (the ASN of Google’s global network)
  • Action: Set to “Allow” (or “Skip” where “Allow” is not offered)

Verification Tool: Use IPinfo to query any IP’s ASN ownership.
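
The ASN lookup can also be scripted. This sketch assumes that `https://ipinfo.io/<ip>/json` returns an `org` field shaped like `AS15169 Google LLC`; treat the endpoint and its response format as assumptions to confirm against IPinfo’s documentation, and note that rate limits or an access token may apply.

```python
import json
import re
from urllib.request import urlopen

GOOGLE_ASN = 15169


def parse_asn(org):
    """Extract the numeric ASN from a string such as 'AS15169 Google LLC'."""
    match = re.match(r"AS(\d+)\b", org or "")
    return int(match.group(1)) if match else None


def lookup_asn(ip):
    """Query ipinfo.io for one IP and return its ASN, or None (network call)."""
    with urlopen(f"https://ipinfo.io/{ip}/json", timeout=10) as response:
        return parse_asn(json.load(response).get("org"))


# Offline demonstration of the parsing step only.
print(parse_asn("AS15169 Google LLC") == GOOGLE_ASN)
```

To check a live address, call `lookup_asn("66.249.64.5")` and compare the result with `GOOGLE_ASN`.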

Import Google’s Official IP Ranges (Ultimate Protection)

Data Source: Use Google’s officially published crawler IP list: https://developers.google.com/search/apis/ipranges/googlebot.json

Operation:

  1. Download JSON file, extract all IPv4/IPv6 address ranges
  2. In Cloudflare firewall rules, set “IP Source” to match these IP ranges and set to “Allow”

Maintenance Cost: Need to manually update IP database once per quarter (Google dynamically adjusts).
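
Extracting the ranges by hand is error-prone, so the step can be automated. The sketch below assumes the JSON file has a top-level `prefixes` list whose entries carry either an `ipv4Prefix` or an `ipv6Prefix` key; the sample document is illustrative, not a copy of the live file.

```python
import ipaddress
import json
from urllib.request import urlopen

GOOGLEBOT_RANGES_URL = "https://developers.google.com/search/apis/ipranges/googlebot.json"


def extract_prefixes(document):
    """Return every IPv4/IPv6 CIDR range found in a parsed googlebot.json."""
    prefixes = []
    for entry in document.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            # ip_network raises ValueError on malformed input, so bad
            # entries fail loudly instead of reaching the firewall rule.
            prefixes.append(str(ipaddress.ip_network(cidr)))
    return prefixes


def download_prefixes(url=GOOGLEBOT_RANGES_URL):
    """Fetch the published list and extract its ranges (network call)."""
    with urlopen(url, timeout=10) as response:
        return extract_prefixes(json.load(response))


# Illustrative sample in the assumed format.
sample_document = {
    "creationTime": "2024-01-01T00:00:00.000000",
    "prefixes": [
        {"ipv6Prefix": "2001:4860:4801:10::/64"},
        {"ipv4Prefix": "66.249.64.0/27"},
    ],
}
print(extract_prefixes(sample_document))
```

Running `download_prefixes()` on a schedule would remove the need for manual quarterly updates.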

Set Rate Limiting Exception for Googlebot

Scenario: If “Rate Limiting” is enabled on your site, it may mistakenly judge high-frequency crawling as an attack.

Operation:

  1. Go to “Security” → “WAF” → “Rate Limiting Rules” → Edit existing rules
  2. Add condition: IP Source within Googlebot IP ranges → Select “Do not apply this rule”
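
Before trusting the exception, it helps to confirm that an IP seen in your logs really falls inside the Googlebot ranges. A minimal sketch using only the standard library; the ranges in the example are illustrative and should be replaced with the current published list.

```python
import ipaddress


def ip_in_ranges(ip, cidrs):
    """Return True if `ip` belongs to any CIDR range in `cidrs`."""
    address = ipaddress.ip_address(ip)
    for cidr in cidrs:
        network = ipaddress.ip_network(cidr)
        # Only compare addresses and networks of the same IP version.
        if address.version == network.version and address in network:
            return True
    return False


# Illustrative ranges only.
example_ranges = ["66.249.64.0/27", "2001:4860:4801:10::/64"]
print(ip_in_ranges("66.249.64.5", example_ranges))
print(ip_in_ranges("203.0.113.9", example_ranges))
```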

Pitfall Avoidance Tips:

  • Rule priority: Ensure whitelist rules are above blocking rules (Cloudflare executes in top-to-bottom order).
  • Avoid over-allowing: If rules include both User-Agent and ASN conditions, use “AND” logic (instead of “OR”) to prevent being exploited by malicious requests.
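
To make the “AND” advice concrete, the snippet below builds a rule expression that requires both conditions at once. It is a sketch under assumptions: the field names `http.user_agent` and `ip.geoip.asnum` are taken to be valid in Cloudflare’s rule expression editor, and newer accounts may expose the ASN under a different field name, so check the current field reference before pasting the result.

```python
def googlebot_allow_expression(asn=15169, ua_substring="Googlebot"):
    """Build an expression that matches only when BOTH conditions hold."""
    return (
        f'(http.user_agent contains "{ua_substring}" '
        f"and ip.geoip.asnum eq {asn})"
    )


print(googlebot_allow_expression())
```

Replacing `and` with `or` would let any request that merely claims to be Googlebot through, which is exactly the over-allowing the tip warns about.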

Verify Successful Fix

After adjusting firewall rules, don’t just wait for Google to recover on its own. Cache delays or rule conflicts can leave you in a “configuration changed but crawler still blocked” situation.

The following methods quickly verify whether the fix worked, so a misjudgment doesn’t cost you the best window for remediation.

Simulate Googlebot Request in Terminal (Fastest Verification)

Command:

curl -A "Googlebot/2.1" https://your-site-url -I  

Key Indicators:

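Returns HTTP/2 200: Crawling normal

Returns 403 or 5xx: Block not released. Keep in mind that this request comes from your own IP, so a whitelist rule that also requires ASN 15169 will not match it.

Returns 301/302: Check whether a redirect rule is sending the crawler into a loop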
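
For checking several URLs in one go, the same test can be scripted. The sketch below is one possible equivalent of the curl command: it sends a HEAD request with a Googlebot User-Agent and does not follow redirects, so a 301/302 stays visible. The function names are illustrative.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, user_agent="Googlebot/2.1", timeout=10):
    """Send a HEAD request with the given User-Agent and return the status code."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        connection.request("HEAD", path, headers={"User-Agent": user_agent})
        return connection.getresponse().status
    finally:
        connection.close()


def interpret_status(status):
    """Map a status code to the indicators listed above."""
    if status == 200:
        return "crawling normal"
    if status == 403 or 500 <= status <= 599:
        return "block not released"
    if status in (301, 302):
        return "check redirect rules for loops"
    return "unexpected status, inspect manually"


print(interpret_status(200))
print(interpret_status(403))
```

A call such as `interpret_status(head_status("https://your-site-url"))` combines the two steps. Like the curl command, it only imitates the User-Agent and still originates from your own IP.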

View Cloudflare Allow Logs in Real-Time

Operation Path: Cloudflare Dashboard → “Security” → “Events” → Filter conditions:

  1. Action: Allow
  2. User-Agent: contains Googlebot
  3. ASN: 15169

Success Indicator: Multiple Allow records for Googlebot appearing within 5 minutes

Google Official Crawl Testing Tool

Tool: Google URL Inspection Tool: https://search.google.com/search-console/inspect

Operation:

Enter previously blocked URL → Click “Test Live URL” → Check “Crawl Status”

Pass Conditions: The live test displays “URL is available to Google” with no “Blocked by robots.txt” warning

Monitor Mobile Crawler Dedicated UA

Special UA: Googlebot Smartphone (mobile crawler is more likely to trigger JS challenges)

Verification Method:

Search this UA in Cloudflare firewall logs

Or use command:

curl -A "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.606.0 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://your-site-url -I  

Submit Sitemap and Observe Index Coverage

Operation: Re-submit sitemap.xml in Google Search Console

Success Signals:

“Indexed” page count gradually increasing within 24 hours

Related errors in “Excluded” report decreasing

Precautions:

  • If using CDN cache, first clear Cloudflare cache (path: “Caching” → “Configuration” → “Purge Everything”)
  • Recovery delay: Test tool results are real-time, but index recovery takes 1-3 days
  • Rule priority conflicts: Check if other firewall rules override whitelist settings

After adjusting firewall rules, crawler traffic usually recovers within 6 hours. If traffic doesn’t rebound, 90% of problems come from oversights in the verification process. Use curl and real-time logs to pinpoint the remaining block points!