Cloudflare does not just check whether you have a valid cookie. It checks your browser fingerprint, JavaScript execution ability, behavioral patterns, IP reputation, and a range of other signals before deciding whether to grant access. If your request passes all of those checks, Cloudflare issues a cf_clearance cookie that acts as a session pass for subsequent requests to that site.
The challenge for scrapers is that standard HTTP clients like Python's requests library cannot execute JavaScript, so they cannot solve Cloudflare's initial challenges. No challenge solved means no cf_clearance cookie, which means no access.
One approach is to solve the challenge once using a tool that can handle it, extract the cf_clearance cookie, and then use that cookie in your regular scraping requests within the same session.
In this tutorial you will learn how cf_clearance works, how to extract it using CF-Clearance-Scraper, and how to use it in a requests session to bypass Cloudflare.
Understanding cf_clearance and how Cloudflare issues it
When a request reaches a Cloudflare-protected site, Cloudflare runs a series of checks before deciding whether to let it through. These include JavaScript challenge solving, browser fingerprint analysis, IP reputation checks, behavioral signals, and network traffic patterns.
A request that passes all of these checks receives the cf_clearance cookie. This cookie is then required on all subsequent requests to that site within the same session.
Two things make cf_clearance strict to work with:
- It is bound to an IP address. The cookie is tied to the IP that solved the original challenge. If the IP changes mid-session, Cloudflare invalidates the cookie immediately.
- It is bound to a User Agent. The same User Agent string used during the challenge must be sent with every subsequent request. A mismatch triggers a new challenge or a block.
This means you cannot just extract a cookie once and reuse it freely. You need to maintain the exact same IP and User Agent throughout the entire session that cookie covers.
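One way to keep these three values from drifting apart is to store them together and derive every request's headers and proxy settings from that single object. The sketch below uses a hypothetical `Clearance` dataclass (it is not part of CF-Clearance-Scraper or requests); the dict returned by `proxies()` follows the shape that requests accepts for its `proxies` setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clearance:
    """Hypothetical bundle of everything a cf_clearance cookie is bound to."""
    cookie: str      # the cf_clearance value
    user_agent: str  # the exact User Agent used while solving the challenge
    proxy: str       # the proxy (and therefore exit IP) used while solving

    def headers(self) -> dict:
        # Always send the cookie together with the User Agent it was issued for
        return {
            "User-Agent": self.user_agent,
            "Cookie": f"cf_clearance={self.cookie}",
        }

    def proxies(self) -> dict:
        # Route both schemes through the same proxy so the exit IP stays fixed
        return {"http": self.proxy, "https": self.proxy}


clearance = Clearance(
    cookie="example-cookie-value",
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36",
    proxy="http://190.58.248.86:80",
)
print(clearance.headers()["Cookie"])  # cf_clearance=example-cookie-value
```

Because the dataclass is frozen, switching to a different proxy or User Agent means building a new object, which is a natural point to solve the challenge again.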
How to scrape and use cf_clearance cookies
You will use CF-Clearance-Scraper, a command-line tool that runs a headless Chrome instance to solve Cloudflare challenges and extract the resulting cf_clearance cookie. Then you will use that cookie in a requests session to access the protected content.
Step 1: Requirements and installation
CF-Clearance-Scraper requires Python 3.10 or later and Chrome installed on your machine. Clone the repository and install its dependencies:
```bash
git clone https://github.com/Xewdy444/CF-Clearance-Scraper
cd CF-Clearance-Scraper
pip3 install -r requirements.txt
```
Step 2: Understanding the parameters
CF-Clearance-Scraper runs from the command line by executing main.py with the target URL and optional configuration parameters:
| Parameter | Description |
|---|---|
| URL | The Cloudflare-protected target URL (required) |
| -f | Output JSON file to write the scraped cookies |
| -t | Request timeout in seconds |
| -p | Proxy URL to use when solving the challenge |
| -ua | User Agent string for the request |
| --disable-http2 | Disables HTTP/2 protocol |
| --disable-http3 | Disables HTTP/3 protocol |
| -ac | Save all cookies in addition to cf_clearance |
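The flags in the table map one-to-one onto an argument list, which is convenient when you call the tool from Python rather than from a shell. This is a minimal sketch: `build_command` is a hypothetical helper, and it only emits the flags listed in the table, so check the project's README if the tool's options have changed.

```python
def build_command(url, proxy=None, timeout=None, user_agent=None,
                  output_file=None, all_cookies=False,
                  disable_http2=False, disable_http3=False):
    """Hypothetical helper: turn the documented flags into an argv list for main.py."""
    command = ["python", "main.py"]
    if proxy:
        command += ["-p", proxy]
    if timeout is not None:
        command += ["-t", str(timeout)]
    if user_agent:
        command += ["-ua", user_agent]
    if output_file:
        command += ["-f", output_file]
    if all_cookies:
        command.append("-ac")
    if disable_http2:
        command.append("--disable-http2")
    if disable_http3:
        command.append("--disable-http3")
    # The target URL is the positional argument and goes last
    command.append(url)
    return command


print(build_command(
    "https://www.scrapingcourse.com/cloudflare-challenge",
    proxy="http://190.58.248.86:80",
    timeout=60,
))
```

The resulting list can be passed straight to `subprocess.run`, which avoids shell quoting problems with the long User Agent string.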
The tool works best when you provide a User Agent and a proxy. The User Agent you pass here is the one you must use in every subsequent request that uses this cookie.
The basic command structure:
```bash
python main.py -p <PROXY_URL> -t <TIMEOUT> -ua "<USER_AGENT>" -f cookies.json <TARGET_URL>
```
Step 3: Scraping the cf_clearance cookie
Run the command against a Cloudflare-protected page. This example uses a 60-second timeout and writes the cookies to cookies.json:
```bash
python main.py \
  -p http://190.58.248.86:80 \
  -t 60 \
  -ua "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36" \
  -f cookies.json \
  https://www.scrapingcourse.com/cloudflare-challenge
```
```
# Output
[12:40:42] [INFO] Cookie: cf_clearance=KkssR4xQ9xEJwlNtUXQEKkoQl...lgI5
```
The cookie is logged to the terminal and written to cookies.json. To use it in a scraper, you need to capture it programmatically. Here is a Python function that runs the command via subprocess and extracts the cookie value from the output with a regular expression:
```python
import re
import subprocess


def get_cf_clearance(url, proxy, user_agent):
    # Run from the CF-Clearance-Scraper directory, where main.py lives
    command = [
        "python", "main.py",
        "-p", proxy,
        "-t", "60",
        "-ua", user_agent,
        "-f", "cookies.json",
        url,
    ]
    try:
        process = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so log lines are searched too
            text=True,
        )
        # The tool logs "Cookie: cf_clearance=<value>"; capture the value
        match = re.search(r"cf_clearance=([^\s]+)", process.stdout)
        return match.group(1) if match else None
    except Exception as e:
        # e.g. the python executable is not on PATH
        print(f"Error: {e}")
        return None


target_url = "https://www.scrapingcourse.com/cloudflare-challenge"
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
proxy = "http://190.58.248.86:80"

cf_clearance = get_cf_clearance(target_url, proxy, user_agent)
print(cf_clearance)
```
```
# Output
MKybX880PCu.GfWLhonkBnG64WBs4ASAXeZ...Tux0eDI
```
Step 4: Using the cf_clearance cookie in your scraper
Now use the cookie in a requests session. The session must use the exact same User Agent and proxy that were used to obtain the cookie. Any deviation invalidates it:
```python
import re
import subprocess

import requests


def get_cf_clearance(url, proxy, user_agent):
    # Same helper as above: run CF-Clearance-Scraper and parse the cookie from its log
    command = [
        "python", "main.py",
        "-p", proxy,
        "-t", "60",
        "-ua", user_agent,
        "-f", "cookies.json",
        url,
    ]
    try:
        process = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        match = re.search(r"cf_clearance=([^\s]+)", process.stdout)
        return match.group(1) if match else None
    except Exception as e:
        print(f"Error: {e}")
        return None


def scrape_with_clearance(url, cf_clearance, proxy, user_agent):
    session = requests.Session()
    # cookie, User Agent, and proxy must all match what was used to obtain the cookie
    session.cookies.set("cf_clearance", cf_clearance)
    session.headers.update({"User-Agent": user_agent})
    session.proxies.update({"http": proxy, "https": proxy})
    try:
        # A timeout keeps a dead proxy from hanging the scraper indefinitely
        response = session.get(url, timeout=30)
        response.raise_for_status()
        return response.text
    except requests.exceptions.RequestException as e:
        return f"Request failed: {e}"


target_url = "https://www.scrapingcourse.com/cloudflare-challenge"
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
proxy = "http://190.58.248.86:80"

cf_clearance = get_cf_clearance(target_url, proxy, user_agent)
if cf_clearance:
    html = scrape_with_clearance(target_url, cf_clearance, proxy, user_agent)
    print(html)
else:
    print("Failed to retrieve cf_clearance. Exiting.")
```
```html
<!-- Output -->
<h2>You bypassed the Cloudflare challenge! :D</h2>
```
A successful run returns the protected page HTML. Note the caveat in the code comment: the cookie, User Agent, and proxy must all match exactly what was used during the challenge. A single mismatch is enough for Cloudflare to reject the session.
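Because this approach hands you raw HTML, parsing is still your job. As a stdlib-only sketch (swap in BeautifulSoup or lxml if your project already uses them), the hypothetical `HeadingParser` below collects the text of every `<h2>` element, which is where the scrapingcourse.com page above puts its success message.

```python
from html.parser import HTMLParser


class HeadingParser(HTMLParser):
    """Collect the text content of every <h2> element in a document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Text inside an <h2>, including text in nested tags, joins the current heading
        if self._in_h2:
            self.headings[-1] += data


def extract_h2(html):
    parser = HeadingParser()
    parser.feed(html)
    parser.close()
    return [heading.strip() for heading in parser.headings]


print(extract_h2("<h2>You bypassed the Cloudflare challenge! :D</h2>"))
```

In the full script, you would call `extract_h2(html)` on the string returned by `scrape_with_clearance`.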
Sticky sessions for rotating proxies
If you use a rotating proxy service, standard rotation will break your session because the IP changes between requests. Look for a service that supports sticky sessions, which pins you to the same exit IP for a configurable time window.
With a sticky session you set the IP lifetime long enough to cover your full scraping session. If it is a short crawl, 1 to 5 minutes is usually enough. For longer jobs, extend it accordingly.
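To pick a lifetime, a rough estimate is usually enough: multiply the number of pages by the average time per request (including any delay you add between requests) and leave some headroom. The helper below is a hypothetical sketch of that arithmetic; the default 1.5 safety factor is an arbitrary assumption, not a provider recommendation.

```python
import math


def sticky_minutes_needed(pages, seconds_per_request, safety_factor=1.5):
    """Estimate how many whole minutes a sticky IP must live to cover a crawl.

    pages: number of requests in the crawl
    seconds_per_request: average request time plus any deliberate delay
    safety_factor: headroom for retries and slow responses (an assumption)
    """
    total_seconds = pages * seconds_per_request * safety_factor
    return math.ceil(total_seconds / 60)


# 100 pages at ~2 s each: 200 s of work, 300 s with headroom, so 5 minutes
print(sticky_minutes_needed(100, 2))
```

If the sticky window also covers the challenge-solving step, add the `-t` timeout you pass to CF-Clearance-Scraper to the estimate.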
The limitations of the cf_clearance approach
The manual cf_clearance approach works, but it is fragile in practice.
- Low and inconsistent success rate. The ZenRows docs on CF-Clearance-Scraper openly acknowledge that the tool may need multiple runs before successfully extracting the cookie. On some Cloudflare configurations it may not succeed at all. You often need to retry, and there is no reliable signal for how many retries a given target will take.
- Cookies expire mid-session. cf_clearance cookies have a finite lifetime. A long scraping job can run past the cookie's expiry, which breaks the session mid-run and leaves you with incomplete data. You need to detect this, re-solve the challenge, and restart the affected portion of the crawl.
- IP binding is strict. If your proxy rotates between the challenge-solving step and the scraping step, the cookie is immediately invalid. Even a brief IP change is enough to trigger a block. This makes the approach incompatible with most standard rotating proxy setups unless sticky sessions are available and configured correctly.
- Cloudflare updates break it. Because CF-Clearance-Scraper is open source, Cloudflare can study how it works and update its challenge mechanism to defeat it. A tool that worked reliably last month may start failing consistently after a Cloudflare update. There is no automatic recovery.
- Chrome overhead. The tool runs a full headless Chrome instance to solve each challenge. That is significant memory and startup time for what is essentially a cookie retrieval step, before any actual scraping has happened.
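The expiry and retry problems above tend to end up in the same loop: fetch a page, check whether the response is a challenge rather than content, and re-solve if it is. The sketch below assumes the `cf-mitigated: challenge` response header that Cloudflare documents for challenge responses, with a 403/503 status from a Cloudflare server as a fallback heuristic. `fetch` and `solve` are hypothetical callables you would adapt from `scrape_with_clearance` (it would need to return the status and headers, not just the text) and `get_cf_clearance`; every re-solve must still use the same proxy and User Agent.

```python
def is_challenge(status, headers):
    """Heuristic: does this response look like a Cloudflare challenge instead of content?"""
    # Header names are case-insensitive, so normalise them before looking up
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("cf-mitigated", "").lower() == "challenge":
        return True
    # Fallback assumption: a 403/503 from a Cloudflare server is a block or challenge
    return status in (403, 503) and lowered.get("server", "").lower() == "cloudflare"


def crawl(urls, fetch, solve, max_solves=3):
    """Fetch each URL, re-solving the challenge whenever the clearance stops working.

    fetch(url, cookie) -> (status, headers, body)   # hypothetical
    solve()            -> cookie string or None     # hypothetical
    max_solves is a total budget of solve attempts for the whole crawl.
    """
    cookie = None
    solves = 0
    results = {}
    for url in urls:
        while True:
            if cookie is None:
                if solves >= max_solves:
                    raise RuntimeError(f"no usable cf_clearance after {solves} solve attempts")
                solves += 1
                cookie = solve()
                if cookie is None:
                    continue  # the solver failed this time; spend another attempt
            status, headers, body = fetch(url, cookie)
            if is_challenge(status, headers):
                cookie = None  # expired or rejected: solve again before retrying this URL
                continue
            results[url] = body
            break
    return results
```

Capping the total number of solves keeps a target that never yields a cookie from looping forever, which matters given how unpredictable the retry count can be.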
A more reliable alternative: Spidra
The core problem with the cf_clearance approach is that you are doing Cloudflare's challenge-solving in a fragile, manually-maintained way and then trying to carry that solved state across into a different HTTP client. Every handoff point in that chain is a failure mode.
Spidra eliminates the handoff entirely. Every request runs through a real browser with residential proxy rotation, CAPTCHA solving, and fingerprint management built in.
Cloudflare's challenge-solving happens inside the same request context that fetches the page. There is no cookie to extract, transfer, or expire. You just send the URL.
```bash
pip install spidra
```
```python
from spidra import SpidraClient, ScrapeParams, ScrapeUrl
import os

spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])

job = spidra.scrape.run_sync(ScrapeParams(
    urls=[ScrapeUrl(url="https://www.scrapingcourse.com/cloudflare-challenge/")],
    prompt="Extract the main heading and body text",
    use_proxy=True,
    proxy_country="us",
))

print(job.result.content)
# { "heading": "You bypassed the Cloudflare challenge! :D" }
```
No Chrome to launch. No cookie to manage. No sticky session to configure. No retry logic to build. The same request works on the first call.
And unlike the cf_clearance approach, which returns raw HTML you still need to parse, Spidra extracts exactly what you describe and returns clean structured JSON. For the Cloudflare page above, the output is already structured and ready to use without any parsing step.
For scraping the actual content of a protected page:
```python
job = spidra.scrape.run_sync(ScrapeParams(
    urls=[ScrapeUrl(url="https://www.scrapingcourse.com/cloudflare-challenge/")],
    prompt="Extract all product names and prices",
    output="json",
    use_proxy=True,
    proxy_country="us",
))

print(job.result.content)
```
```json
[
  {"name": "Abominable Hoodie", "price": "$69.00"},
  {"name": "Adrienne Trek Jacket", "price": "$57.00"}
]
```
Proxy usage is billed separately against your bandwidth quota, so there is no credit multiplier when anti-bot bypass is needed.
cf_clearance approach vs. Spidra
| | cf_clearance + CF-Clearance-Scraper | Spidra |
|---|---|---|
| Cloudflare bypass | Inconsistent, may need retries | Built in, automatic |
| Cookie management | Manual, must maintain IP and UA | Not needed |
| Session expiry handling | Manual, you detect and re-solve | Not applicable |
| Proxy requirement | Sticky session required | Built in, 50 countries |
| Chrome overhead | Yes, full instance per challenge | Managed infrastructure |
| Structured output | Raw HTML, you parse it | AI extraction, optional schema |
| Maintenance as Cloudflare updates | Manual, tool can break | Handled by Spidra |
| Best for | Understanding how cf_clearance works | Production scraping of protected sites |
Conclusion
The cf_clearance approach is a real technique, and understanding how it works is genuinely useful. The cf_clearance cookie is Cloudflare's session pass, and extracting it manually is one way to get through the protection.
The practical problem is reliability. The success rate is inconsistent, cookies expire, IP binding is strict, and Cloudflare updates can break the entire approach without warning. For a production scraping pipeline that needs to run reliably, the maintenance overhead of keeping the cf_clearance approach working is significant.
Spidra handles Cloudflare bypass automatically inside every request, with no cookie management, no sticky session configuration, and no fragile handoffs between tools. The same code works today and after the next Cloudflare update.
Get started free at spidra.io. No credit card required.
