June 10, 2026 · 9 min read

How to scrape cf_clearance cookies from Cloudflare-protected websites

Joel Olawanle

Cloudflare does not just check whether you have a valid cookie. It checks your browser fingerprint, JavaScript execution ability, behavioral patterns, IP reputation, and a range of other signals before deciding whether to grant access. If your request passes all of those checks, Cloudflare issues a cf_clearance cookie that acts as a session pass for subsequent requests to that site.

The challenge for scrapers is that standard HTTP clients like Python's requests library cannot solve Cloudflare's initial challenges. No challenge solved means no cf_clearance cookie, which means no access.

One approach is to solve the challenge once using a tool that can handle it, extract the cf_clearance cookie, and then use that cookie in your regular scraping requests within the same session.

In this tutorial you will learn how cf_clearance works, how to extract it using CF-Clearance-Scraper, and how to use it in a requests session to bypass Cloudflare.

Understanding cf_clearance and how Cloudflare issues it

When a request reaches a Cloudflare-protected site, Cloudflare runs a series of checks before deciding whether to let it through. These include JavaScript challenge solving, browser fingerprint analysis, IP reputation checks, behavioral signals, and network traffic patterns.

A request that passes all of these checks receives the cf_clearance cookie. This cookie is then required on all subsequent requests to that site within the same session.

Two things make cf_clearance strict to work with:

  • It is bound to an IP address. The cookie is tied to the IP that solved the original challenge. If the IP changes mid-session, Cloudflare invalidates the cookie immediately.
  • It is bound to a User Agent. The same User Agent string used during the challenge must be sent with every subsequent request. A mismatch triggers a new challenge or a block.

This means you cannot just extract a cookie once and reuse it freely. You need to maintain the exact same IP and User Agent throughout the entire session that cookie covers.
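One way to keep these constraints visible in code is to store the cookie together with the User Agent and proxy it was issued for. The following is a minimal sketch; `ClearanceContext` and its `matches` method are hypothetical names introduced for illustration, and the values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClearanceContext:
    """Bundles a cf_clearance cookie with the identity it is bound to."""

    cf_clearance: str
    user_agent: str
    proxy: str  # the proxy determines the exit IP the cookie is tied to

    def matches(self, user_agent: str, proxy: str) -> bool:
        # The cookie is only usable when both the User Agent and the proxy
        # are the ones used to solve the challenge. Note that an identical
        # proxy URL does not guarantee an identical exit IP if the proxy rotates.
        return self.user_agent == user_agent and self.proxy == proxy


ctx = ClearanceContext(
    cf_clearance="<COOKIE_VALUE>",  # placeholder, not a real cookie
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    proxy="http://127.0.0.1:8080",  # placeholder proxy
)

print(ctx.matches("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", "http://127.0.0.1:8080"))
print(ctx.matches("python-requests/2.32", "http://127.0.0.1:8080"))
```

The first check passes and the second fails, which mirrors how a changed User Agent would lead to a new challenge or a block.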

How to scrape and use cf_clearance cookies

You will use CF-Clearance-Scraper, a command-line tool that runs a headless Chrome instance to solve Cloudflare challenges and extract the resulting cf_clearance cookie. Then you will use that cookie in a requests session to access the protected content.

Step 1: Requirements and installation

CF-Clearance-Scraper requires Python 3.10 or later and Chrome installed on your machine. Clone the repository and install its dependencies:

git clone https://github.com/Xewdy444/CF-Clearance-Scraper
cd CF-Clearance-Scraper
pip3 install -r requirements.txt

Step 2: Understanding the parameters

CF-Clearance-Scraper runs from the command line by executing main.py with the target URL and optional configuration parameters:

Parameter | Description
URL | The Cloudflare-protected target URL (required)
-f | Output JSON file to write the scraped cookies
-t | Request timeout in seconds
-p | Proxy URL to use when solving the challenge
-ua | User Agent string for the request
--disable-http2 | Disables HTTP/2 protocol
--disable-http3 | Disables HTTP/3 protocol
-ac | Save all cookies in addition to cf_clearance

The tool works best when you provide a User Agent and a proxy. The User Agent you pass here is the one you must use in every subsequent request that uses this cookie.

The basic command structure:

python main.py -p <PROXY_URL> -t <TIMEOUT> -ua "<USER_AGENT>" -f cookies.json <TARGET_URL>
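If you plan to call the tool from Python, it can help to assemble the argument list in one place. This is a rough sketch that only uses the flags listed in the table above; `build_scraper_command` is a hypothetical helper name, and it assumes the script is run from inside the CF-Clearance-Scraper directory.

```python
def build_scraper_command(url, output_file="cookies.json", timeout=None,
                          proxy=None, user_agent=None, all_cookies=False):
    """Build the argument list for running main.py (hypothetical helper)."""
    command = ["python", "main.py"]
    if proxy:
        command += ["-p", proxy]
    if timeout is not None:
        command += ["-t", str(timeout)]  # subprocess arguments must be strings
    if user_agent:
        command += ["-ua", user_agent]
    command += ["-f", output_file]
    if all_cookies:
        command.append("-ac")
    command.append(url)  # the target URL goes last, as in the command above
    return command


print(build_scraper_command(
    "https://www.scrapingcourse.com/cloudflare-challenge",
    timeout=60,
    proxy="http://127.0.0.1:8080",  # placeholder proxy
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with the User Agent, which contains spaces and parentheses.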

Step 3: Scraping the cf_clearance cookie

Run the command against a Cloudflare-protected page. This example uses a 60 second timeout and writes cookies to cookies.json:

python main.py \
  -p http://190.58.248.86:80 \
  -t 60 \
  -ua "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36" \
  -f cookies.json \
  https://www.scrapingcourse.com/cloudflare-challenge
# Output
[12:40:42] [INFO] Cookie: cf_clearance=KkssR4xQ9xEJwlNtUXQEKkoQl...lgI5

The cookie is logged to the terminal and written to cookies.json. To use this in a scraper, you need to capture it programmatically. Here is a Python function that runs the command via subprocess and extracts the cookie value from the output using regex:

import subprocess
import re

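def get_cf_clearance(url, proxy, user_agent):
    """Run CF-Clearance-Scraper and return the cf_clearance value, or None."""
    # main.py is a relative path, so run this from the CF-Clearance-Scraper directory
    command = [
        "python", "main.py",
        "-p", proxy,
        "-t", "60",
        "-ua", user_agent,
        "-f", "cookies.json",
        url,
    ]

    try:
        # Merge stderr into stdout so the cookie log line is captured
        # regardless of which stream it is written to
        process = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )

        # Capture everything after "cf_clearance=" up to the next whitespace
        match = re.search(r"cf_clearance=([^\s]+)", process.stdout)
        return match.group(1) if match else None

    except Exception as e:
        print(f"Error: {e}")
        return None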

target_url = "https://www.scrapingcourse.com/cloudflare-challenge"
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
proxy = "http://190.58.248.86:80"

cf_clearance = get_cf_clearance(target_url, proxy, user_agent)
print(cf_clearance)
# Output
MKybX880PCu.GfWLhonkBnG64WBs4ASAXeZ...Tux0eDI

Step 4: Using the cf_clearance cookie in your scraper

Now use the cookie in a requests session. The session must use the exact same User Agent and proxy that were used to obtain the cookie. Any deviation invalidates it:

import subprocess
import re
import requests

def get_cf_clearance(url, proxy, user_agent):
    command = [
        "python", "main.py",
        "-p", proxy,
        "-t", "60",
        "-ua", user_agent,
        "-f", "cookies.json",
        url,
    ]

    try:
        process = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        match = re.search(r"cf_clearance=([^\s]+)", process.stdout)
        return match.group(1) if match else None
    except Exception as e:
        print(f"Error: {e}")
        return None

def scrape_with_clearance(url, cf_clearance, proxy, user_agent):
    session = requests.Session()

    # cookie, User Agent, and proxy must all match what was used to obtain the cookie
    session.cookies.set("cf_clearance", cf_clearance)
    session.headers.update({"User-Agent": user_agent})
    session.proxies.update({"http": proxy, "https": proxy})

    try:
        # Set a timeout so a stalled proxy cannot hang the scraper indefinitely
        response = session.get(url, timeout=30)
        response.raise_for_status()
        return response.text
    except requests.exceptions.RequestException as e:
        return f"Request failed: {e}"

target_url = "https://www.scrapingcourse.com/cloudflare-challenge"
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
proxy = "http://190.58.248.86:80"

cf_clearance = get_cf_clearance(target_url, proxy, user_agent)

if cf_clearance:
    html = scrape_with_clearance(target_url, cf_clearance, proxy, user_agent)
    print(html)
else:
    print("Failed to retrieve cf_clearance. Exiting.")
<!-- Output -->
<h2>You bypassed the Cloudflare challenge! :D</h2>

A successful run returns the protected page HTML. Note the caveat in the code comment: the cookie, User Agent, and proxy must all match exactly what was used during the challenge. One mismatch and Cloudflare rejects the session.
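To notice a rejected session in code, you can inspect the response headers. Cloudflare documents a `cf-mitigated: challenge` header on responses that serve a challenge page instead of the requested content. The sketch below relies on that header alone; `is_cloudflare_challenge` is a hypothetical helper name, and the header sets are simulated rather than taken from a real response.

```python
def is_cloudflare_challenge(headers):
    """Return True when response headers indicate a Cloudflare challenge page."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("cf-mitigated", "").lower() == "challenge"


# Simulated header sets; with requests you would pass response.headers.
challenged = {"Server": "cloudflare", "CF-Mitigated": "challenge"}
cleared = {"Server": "cloudflare", "Content-Type": "text/html"}

print(is_cloudflare_challenge(challenged))
print(is_cloudflare_challenge(cleared))
```

When the check returns True for a request that carried the cookie, the cookie is likely no longer accepted and the challenge has to be solved again with the same User Agent and proxy.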

Sticky sessions for rotating proxies

If you use a rotating proxy service, standard rotation will break your session because the IP changes between requests. Look for a service that supports sticky sessions, which pins you to the same exit IP for a configurable time window.

With a sticky session you set the IP lifetime long enough to cover your full scraping session. If it is a short crawl, 1 to 5 minutes is usually enough. For longer jobs, extend it accordingly.
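As a rough way to size that window, you can estimate the crawl duration from the number of pages and the time each request takes, then add headroom. The function name, the parameters, and the 1.5 safety factor below are illustrative assumptions, not provider recommendations.

```python
import math


def sticky_session_minutes(num_pages, seconds_per_request,
                           delay_seconds=0.0, safety_factor=1.5):
    """Estimate the sticky session lifetime, in whole minutes, for a crawl."""
    # Total time = pages x (request time + politeness delay between requests)
    total_seconds = num_pages * (seconds_per_request + delay_seconds)
    # Add headroom for slow responses and retries, then round up to minutes
    return math.ceil(total_seconds * safety_factor / 60)


# 100 pages at 1.0 s per request plus a 0.5 s delay:
# 100 x 1.5 s = 150 s, x 1.5 headroom = 225 s, rounded up to 4 minutes
print(sticky_session_minutes(100, 1.0, delay_seconds=0.5))
```

Remember to include the time the challenge-solving step itself takes, since the cookie is issued to the IP that was active at that moment.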

The limitations of the cf_clearance approach

The manual cf_clearance approach works, but it is fragile in practice.

  • Low and inconsistent success rate. The ZenRows docs on CF-Clearance-Scraper openly acknowledge that the tool may need multiple runs before successfully extracting the cookie. On some Cloudflare configurations it may not succeed at all. You often need to retry, and there is no reliable signal for how many retries a given target will take.
  • Cookies expire mid-session. cf_clearance cookies have a finite lifetime. A long scraping job can run past the cookie's expiry, which breaks the session mid-run and leaves you with incomplete data. You need to detect this, re-solve the challenge, and restart the affected portion of the crawl.
  • IP binding is strict. If your proxy rotates between the challenge-solving step and the scraping step, the cookie is immediately invalid. Even a brief IP change is enough to trigger a block. This makes the approach incompatible with most standard rotating proxy setups unless sticky sessions are available and configured correctly.
  • Cloudflare updates break it. CF-Clearance-Scraper is open source. Cloudflare can study its approach and update their challenge mechanism to defeat it. A tool that worked reliably last month may start failing consistently after a Cloudflare update. There is no automatic recovery.
  • Chrome overhead. The tool runs a full headless Chrome instance to solve each challenge. That is significant memory and startup time for what is essentially a cookie retrieval step, before any actual scraping has happened.
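Because a single run may not produce a cookie, a retry wrapper is a common mitigation for the first limitation. The following is a minimal sketch with exponential backoff; `solve_with_retries` is a hypothetical helper, and `solve` stands for any zero-argument callable that returns the cookie value or None. The demo uses a fake solver so that it runs without Chrome or network access.

```python
import time


def solve_with_retries(solve, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call solve() until it returns a cookie or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        cookie = solve()
        if cookie:
            return cookie
        if attempt < max_attempts:
            # Exponential backoff: base_delay, then 2x, then 4x, and so on
            sleep(base_delay * 2 ** (attempt - 1))
    return None


# Demo with a fake solver that fails twice and then succeeds.
outcomes = iter([None, None, "<COOKIE_VALUE>"])
recorded_delays = []

result = solve_with_retries(
    lambda: next(outcomes),
    max_attempts=3,
    sleep=recorded_delays.append,  # record delays instead of actually waiting
)
print(result, recorded_delays)
```

In a real scraper, `solve` could be a lambda that wraps the cookie-extraction function from Step 3. Each attempt still launches a full Chrome instance, so keep `max_attempts` small.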

A more reliable alternative: Spidra

The core problem with the cf_clearance approach is that you are doing Cloudflare's challenge-solving in a fragile, manually-maintained way and then trying to carry that solved state across into a different HTTP client. Every handoff point in that chain is a failure mode.

Spidra eliminates the handoff entirely. Every request runs through a real browser with residential proxy rotation, CAPTCHA solving, and fingerprint management built in.

Cloudflare's challenge-solving happens inside the same request context that fetches the page. There is no cookie to extract, transfer, or expire. You just send the URL.

pip install spidra

from spidra import SpidraClient, ScrapeParams, ScrapeUrl
import os

spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])

job = spidra.scrape.run_sync(ScrapeParams(
    urls=[ScrapeUrl(url="https://www.scrapingcourse.com/cloudflare-challenge/")],
    prompt="Extract the main heading and body text",
    use_proxy=True,
    proxy_country="us",
))

print(job.result.content)
# { "heading": "You bypassed the Cloudflare challenge! :D" }

No Chrome to launch. No cookie to manage. No sticky session to configure. No retry logic to build. The same request works on the first call.

And unlike the cf_clearance approach, which returns raw HTML you still need to parse, Spidra extracts exactly what you describe and returns clean structured JSON. For the Cloudflare page above, the output is already structured and ready to use without any parsing step.

For scraping the actual content of a protected page:

job = spidra.scrape.run_sync(ScrapeParams(
    urls=[ScrapeUrl(url="https://www.scrapingcourse.com/cloudflare-challenge/")],
    prompt="Extract all product names and prices",
    output="json",
    use_proxy=True,
    proxy_country="us",
))

print(job.result.content)
[
    {"name": "Abominable Hoodie", "price": "$69.00"},
    {"name": "Adrienne Trek Jacket", "price": "$57.00"}
]

Proxy usage is billed against your bandwidth quota separately so there is no credit multiplier when anti-bot bypass is needed.

cf_clearance approach vs. Spidra

Feature | cf_clearance + CF-Clearance-Scraper | Spidra
Cloudflare bypass | Inconsistent, may need retries | Built in, automatic
Cookie management | Manual, must maintain IP and UA | Not needed
Session expiry handling | Manual, you detect and re-solve | Not applicable
Proxy requirement | Sticky session required | Built in, 50 countries
Chrome overhead | Yes, full instance per challenge | Managed infrastructure
Structured output | Raw HTML, you parse it | AI extraction, optional schema
Maintenance as Cloudflare updates | Manual, tool can break | Handled by Spidra
Best for | Understanding how cf_clearance works | Production scraping of protected sites

Conclusion

The cf_clearance approach is a real technique and understanding how it works is genuinely useful. The cf_clearance cookie is Cloudflare's session pass and extracting it manually is one way to get through the protection.

The practical problem is reliability. The success rate is inconsistent, cookies expire, IP binding is strict, and Cloudflare updates can break the entire approach without warning. For a production scraping pipeline that needs to run reliably, the maintenance overhead of keeping the cf_clearance approach working is significant.

Spidra handles Cloudflare bypass automatically inside every request, with no cookie management, no sticky session configuration, and no fragile handoffs between tools. The same code works today and after the next Cloudflare update.

Get started free at spidra.io. No credit card required.

Frequently asked questions

What is cf_clearance?

It is a session cookie that Cloudflare issues to clients that have passed its bot detection checks. The cookie acts as a clearance token for that session, allowing subsequent requests to proceed without repeating the full challenge. It is bound to the IP address and User Agent used during the original challenge.
