Stealth browsers for web scraping

March 19, 2026 · 10 min read

Shittu Olumide

When web scraping heavily protected websites, standard headless browser automation can falter. Even with sophisticated IP rotation and realistic browser headers, advanced anti-bot systems analyze a multitude of signals. These include network-level characteristics like TLS fingerprints, browser-level attributes such as JavaScript properties, and behavioral patterns to discern automated traffic from human interaction.

Stealth browsers are engineered to circumvent these detection mechanisms. This article will demystify what stealth browsers are, how they operate, and provide an in-depth benchmark of eight prominent open-source solutions to assess their efficacy in web scraping scenarios.

Before delving into the specifics of each tool, a concise overview is presented below.

| Tool | Primary Language | Description | Ideal Use Case | Recommended Scale |
| --- | --- | --- | --- | --- |
| Byparr | Python | Self-hosted reverse proxy server with an HTTP API for anti-bot bypass, compatible with FlareSolverr | Prioritizing bypass success on Cloudflare-protected sites where higher latency is acceptable | Moderate concurrency, accepting operational overhead for full browser instances |
| FlareSolverr | Python | Open-source proxy server driving a headless browser, accessible via a `/v1` API | Balancing success rate and speed for Cloudflare-protected targets | Light, short-lived tasks |
| Camoufox | Python | Anti-detect browser (Firefox-based) with a Playwright-style API for deep fingerprint customization | Python-based scraping of challenging anti-bot pages where fine-grained control over fingerprints is paramount | Small to medium workloads with limited concurrency; latency is not a concern |
| Zendriver | Python | Async-first Python CDP automation for real Chrome sessions with profile and cookie persistence | Python CDP-based scrapers needing real Chrome sessions and built-in profile/cookie management | Moderate concurrency until CPU/memory from full Chromium sessions become bottlenecks |
| Pydoll | Python | Async-first CDP-based Chromium automation library focused on anti-bot evasion and realistic interactions | Async Chromium scraping with built-in support for Cloudflare Turnstile and reCAPTCHA v3 | Moderate parallel tabs before resource constraints become critical |
| Puppeteer Real Browser | Node.js | Node.js library launching a full Chrome binary via `chrome-launcher` controlled by Puppeteer | Short-lived Node.js tasks requiring CAPTCHA solving in a real Chrome window, already using Puppeteer | Small runs with low concurrency due to resource demands of full browsers |
| Scrapling | Python | Python scraping library with HTTP, Playwright, and Camoufox-based stealth fetchers for specific targets | Scraping tasks requiring bypass for only a few anti-bot-protected URLs within a larger crawl | Small to medium mixed crawls; stealth mode has scalability limitations |
| SeleniumBase | Python | Python automation framework built on Selenium, featuring an Undetected ChromeDriver mode | Selenium-based projects needing support for Cloudflare Turnstile or similar checkbox CAPTCHAs | Small to medium runs where headed Chrome is feasible and tuning is acceptable |

Understanding Stealth Browsers

A stealth browser is essentially an automation framework enhanced to present itself as a standard user session to anti-bot systems. While leveraging established automation libraries like Playwright, Puppeteer, or Selenium, it actively conceals the tell-tale signs of automated execution.

The core function of a stealth browser is to modify the signals that websites use for detection. This involves masking characteristics such as JavaScript properties that identify the browser environment, and altering how graphical elements like canvas and WebGL are rendered, which can expose automation. Furthermore, stealth browsers adjust network-level details, including the order of HTTP headers and Transport Layer Security (TLS) fingerprints, aligning them with patterns observed from genuine desktop browsers. By achieving this, automated sessions acquire a human-like browser fingerprint.

The Mechanics of Stealth Browsers

Stealth browsers operate by orchestrating changes across multiple layers: network communication, internal browser behavior, and page interaction. They typically integrate with a full browser engine, ensuring that network connections, HTTP/2 framing, and TLS handshake details mirror those of a typical desktop user.

Prior to rendering a web page, a stealth layer modifies the browser's operational parameters. This is achieved through custom browser executables, patched drivers, injected scripts, or a combination thereof. These modifications are designed to obscure automation indicators and mitigate the risk of detection. Many stealth solutions also facilitate the maintenance of consistent browser profiles, enabling the reuse of cookies and local storage across multiple requests. Depending on the sophistication of the stealth implementation, they can also introduce human-like interaction patterns, such as variable timing for actions, simulated scrolling, and cursor movements, to prevent bot-like predictability. Libraries and plugins encapsulate these evasion techniques, providing a unified "stealth" mode that wraps common automation tools.
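As a toy illustration of the timing aspect (the function names here are my own, not from any of the tools above), variable pauses can replace the fixed intervals that make automation predictable:

```python
import random
import time


def human_delay(base: float = 0.8, jitter: float = 0.6) -> float:
    """Return a randomized pause length instead of a fixed, bot-like interval."""
    return base + random.uniform(0.0, jitter)


def paced_actions(actions):
    """Run zero-argument callables with human-like pauses between them."""
    for act in actions:
        act()
        time.sleep(human_delay())
```

Fixed 1.000-second gaps between clicks are a classic behavioral tell; even this crude jitter removes that particular signature, though real stealth implementations model timing distributions far more carefully.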

To illustrate the differences, consider the signals exposed by a standard headless browser versus one employing a stealth plugin when visiting a fingerprinting test page.

First, let's examine a basic Playwright script without any stealth enhancements:

```python
# Install required library: pip3 install playwright
# Then install browser binaries: playwright install chromium

import asyncio
from playwright.async_api import async_playwright

# Define key signals to inspect
SIGNALS = {
    "navigator.webdriver": "navigator.webdriver",
    "plugins count": "navigator.plugins.length",
    "languages": "JSON.stringify(navigator.languages)",
    "userAgent": "navigator.userAgent",
    "outerWidth x outerHeight": "`${window.outerWidth}x${window.outerHeight}`",
    "Notification.permission": "Notification.permission",
}

async def analyze_headless_browser():
    async with async_playwright() as p:
        # Launch a standard headless browser
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://browserleaks.com/javascript", wait_until="networkidle")
        print(f"{'Signal':<30}  {'Value'}")

        # Evaluate JavaScript to retrieve each signal's value
        for label, js_expression in SIGNALS.items():
            value = await page.evaluate(f"() => String({js_expression})")
            print(f"{label:<30}  {value}")

        await browser.close()

if __name__ == "__main__":
    asyncio.run(analyze_headless_browser())
```

The output from this script reveals clear indicators of automated activity:

```
Signal                          Value
navigator.webdriver             true
plugins count                   0
languages                       ["en-US"]
userAgent                       Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/145.0.7632.6 Safari/537.36
outerWidth x outerHeight        1280x720
Notification.permission         denied
```

The presence of `navigator.webdriver: true`, an empty plugin list, and `HeadlessChrome` in the User Agent string are all explicit flags that anti-bot systems can readily detect.

Now, let's re-run the same analysis using Camoufox, a Playwright-integrated stealth browser that utilizes a Firefox backend:

```python
# Install required library: pip install -U camoufox[geoip]
# Fetch the browser binary: camoufox fetch (or appropriate command for your OS)

from camoufox.sync_api import Camoufox

# Re-use the same signals dictionary
SIGNALS = {
    "navigator.webdriver": "navigator.webdriver",
    "plugins count": "navigator.plugins.length",
    "languages": "JSON.stringify(navigator.languages)",
    "userAgent": "navigator.userAgent",
    "outerWidth x outerHeight": "`${window.outerWidth}x${window.outerHeight}`",
    "Notification.permission": "Notification.permission",
}

# Launch Camoufox in headless mode
with Camoufox(headless=True) as browser:
    page = browser.new_page()
    page.goto("https://browserleaks.com/javascript", wait_until="networkidle")
    print(f"{'Signal':<30}  {'Value'}")

    for label, js_expression in SIGNALS.items():
        value = page.evaluate(f"() => String({js_expression})")
        print(f"{label:<30}  {value}")

    page.close()
```

A sample output from the Camoufox-enabled script shows a significant change:

```
Signal                          Value
navigator.webdriver             false
plugins count                   5
languages                       ["en-US","en"]
userAgent                       Mozilla/5.0 (X11; Linux x86_64; rv:135.0) Gecko/20100101 Firefox/135.0
outerWidth x outerHeight        1920x1048
Notification.permission         default
```

Here, `navigator.webdriver` is false, a realistic number of plugins is detected, and the User Agent string no longer contains `HeadlessChrome`, presenting a much more human-like profile.

Benchmarking Open-Source Stealth Browsers for Web Scraping

To empirically evaluate the effectiveness of various open-source stealth browsers, a rigorous benchmark was conducted. Eight prominent tools (Byparr, FlareSolverr, Camoufox, Scrapling, Pydoll, Puppeteer Real Browser, SeleniumBase, and Zendriver) were tested against a challenging anti-bot protection page.

Methodology for Tool Evaluation

The selection of these eight tools was based on extensive research into the current landscape of open-source stealth browser solutions.

The benchmark involved a mix of "cold" runs, initiating each test with a fresh browser profile, and "warm" runs, which reused existing profiles and session cookies. Each tool underwent 100 sequential test iterations. During these iterations, we recorded the success rate of bypassing the anti-bot challenge and the time taken for each bypass. The testing environment consisted of a machine with 16 GB of RAM and a 2.60 GHz processor.

Important Note: While variations in execution speed can occur due to language differences and underlying automation tools, speed was included in this benchmark because both bypass reliability and execution performance are critical in practical web scraping. Ultimately, the most effective tool is one that strikes an optimal balance between these two factors, irrespective of its technological underpinnings.
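The bookkeeping behind such a benchmark is straightforward. As an illustrative sketch (not the exact harness used here), the aggregation of per-iteration results into the two headline metrics might look like this:

```python
from statistics import mean


def summarize(runs):
    """Aggregate (success: bool, elapsed_s: float) pairs into the two
    headline metrics: success rate (%) over all attempts, and mean
    bypass time (s) over the successful attempts only."""
    successes = [t for ok, t in runs if ok]
    rate = 100.0 * len(successes) / len(runs)
    avg = mean(successes) if successes else float("nan")
    return round(rate, 2), round(avg, 2)


# e.g. 100 sequential iterations recorded as (ok, seconds) tuples
sample = [(True, 15.0), (True, 17.0), (False, 60.0), (True, 16.0)]
# summarize(sample) -> (75.0, 16.0)
```

Averaging bypass time over successes only matters: failed attempts typically run until a timeout, and folding those into the mean would penalize tools that fail fast over tools that fail slow.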

The aggregated results are as follows:

| Tool | Primary Language | Success Rate (%) | Bypass Time (s) |
| --- | --- | --- | --- |
| Byparr | Python | 92.16 | 18.28 |
| FlareSolverr | Python | 90.38 | 15.21 |
| Camoufox | Python | 88.58 | 42.49 |
| SeleniumBase | Python | 80.76 | 20.14 |
| Pydoll | Python | 78.76 | 28.87 |
| Zendriver | Python | 62.68 | 18.02 |
| Scrapling | Python | 58.03 | 29.16 |
| Puppeteer Real Browser | Node.js | 57.36 | 12.65 |

*Figure: bar chart comparing success rate and bypass time of the benchmarked open-source stealth browsers.*

The success rates reported here are specific to the conditions of this benchmark and may evolve with future anti-bot updates. Furthermore, the effectiveness of these tools can vary significantly based on the testing environment. A common challenge with open-source stealth solutions is the degradation of their efficacy over time, as maintaining pace with rapid anti-bot security enhancements is a persistent hurdle for maintainers.

Based solely on success rate, Byparr emerged as the top-performing open-source stealth browser in this evaluation.

Evaluating Bypass Reliability Against Execution Speed

However, when considering the interplay between success rate and bypass time, FlareSolverr presents a compelling case for efficiency. It achieves a high success rate of 90.38% while completing bypasses in an average of just 15.21 seconds.

Conversely, Camoufox, with the longest bypass time at 42.49 seconds, demonstrates that extensive fingerprint spoofing does not always guarantee a superior success rate (88.58%). In many practical scenarios, more streamlined and specialized stealth solutions can yield comparable or better results with substantially lower latency.

For many scraping projects, especially those with tight performance requirements, the trade-off between advanced customization and processing speed is a critical decision point. Tools like Puppeteer Real Browser, despite a lower success rate in this benchmark (57.36%), offer the quickest bypass times at 12.65 seconds. This might make them suitable for less resilient targets or scenarios where rapid, albeit less reliable, access is prioritized.

The benchmark highlights that there isn't a single "best" stealth browser; the ideal choice depends heavily on the specific requirements of the scraping task. Factors such as the target website's anti-bot sophistication, the need for speed versus accuracy, and the operational capacity for self-hosting or managing complex infrastructure all play a role.

The Evolving Landscape of Anti-Bot Technologies

Anti-bot systems are in a constant arms race with web scrapers. As stealth browsers become more sophisticated in mimicking human behavior and browser fingerprints, anti-bot vendors develop new detection methods. These include more advanced behavioral analysis, device fingerprinting techniques that go beyond traditional browser attributes (e.g., font rendering, audio context), and more complex challenges that require nuanced human-like interaction to solve.

For instance, while many stealth browsers aim to defeat standard challenges like CAPTCHAs and TLS fingerprinting, newer methods might analyze the subtle timing variations in mouse movements, scroll events, or even the sequence in which browser APIs are accessed. This means that a stealth browser that is highly effective today might require updates to remain functional tomorrow.
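As a toy sketch of what "human-like" cursor movement means in practice (illustrative only; names are my own), a path generator can replace the perfectly straight, instantaneous line an automated mouse teleport produces:

```python
import random


def jittered_path(x0, y0, x1, y1, steps=20):
    """Interpolate a cursor path from (x0, y0) to (x1, y1), adding small
    random offsets to the intermediate points so the trajectory is not a
    perfectly straight line."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Keep the endpoints exact; jitter only the points in between
        jx = random.uniform(-3, 3) if 0 < i < steps else 0.0
        jy = random.uniform(-3, 3) if 0 < i < steps else 0.0
        points.append((x0 + (x1 - x0) * t + jx, y0 + (y1 - y0) * t + jy))
    return points


# Feed the points to e.g. Playwright's page.mouse.move(x, y) with short,
# variable sleeps between steps.
```

Note that behavioral detectors model much more than path shape, such as acceleration curves and inter-event timing, so jitter alone is a starting point rather than a solution.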

Scaling Open-Source Stealth Browsers

While the open-source solutions evaluated offer powerful capabilities for individual scraping tasks or smaller projects, scaling them to enterprise levels presents distinct challenges. These include:

  • Infrastructure Management: Running multiple instances of stealth browsers, especially those that launch full browser processes, requires significant server resources and robust orchestration.
  • Proxy Management: Maintaining a large pool of clean, rotating IP addresses is crucial for high-volume scraping, and managing this can be complex and costly.
  • Fingerprint Drift: Anti-bot systems constantly update their detection signatures. Open-source projects, while often updated by the community, may lag behind these changes, leading to reduced success rates over time.
  • Maintenance Overhead: Keeping up with browser updates, library changes, and evolving anti-bot techniques requires dedicated development and maintenance resources.
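One common mitigation for the infrastructure cost is to hard-cap the number of browser sessions alive at once. A minimal asyncio sketch, with a placeholder `scrape_one` standing in for a real stealth-browser session:

```python
import asyncio

MAX_BROWSERS = 4  # full Chromium sessions are memory-hungry; cap them


async def scrape_one(url: str) -> str:
    """Placeholder for launching a stealth browser session against one URL."""
    await asyncio.sleep(0.01)  # stands in for the real browser work
    return f"done: {url}"


async def scrape_all(urls):
    sem = asyncio.Semaphore(MAX_BROWSERS)

    async def bounded(url):
        async with sem:  # at most MAX_BROWSERS sessions run concurrently
            return await scrape_one(url)

    return await asyncio.gather(*(bounded(u) for u in urls))


# results = asyncio.run(scrape_all(["https://example.com/a", "https://example.com/b"]))
```

A semaphore keeps memory use predictable on a single machine, but it does not address proxy pools or fingerprint drift, which is where managed platforms tend to earn their keep.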

When to Consider a Managed Solution

For mission-critical applications that demand near-perfect uptime, resilience against sophisticated and rapidly evolving anti-bot measures, and simplified scalability, a managed scraping solution often proves to be the most pragmatic and cost-effective approach. These platforms abstract away the complexities of proxy management, browser automation, CAPTCHA solving, and ongoing maintenance. They provide a unified API or interface that allows users to focus on defining their data extraction needs rather than managing the underlying infrastructure.

Embracing a No-Code Future for Web Scraping

The challenges of web scraping, particularly against advanced anti-bot defenses, have historically required significant technical expertise. However, the landscape is evolving, with solutions emerging that aim to democratize data extraction through more intuitive interfaces and powerful underlying technologies.

If you want to bypass complex anti-bot measures without managing intricate infrastructure, check out Spidra. It offers an AI-powered, no-code approach. Spidra transforms web scraping by allowing users to define data extraction needs using plain English prompts. It automatically handles the complexities of residential proxy rotation, JavaScript rendering, and CAPTCHA solving (including Cloudflare Turnstile and reCAPTCHA v2/v3), presenting a simplified API for programmatic access or a visual interface for building and scheduling scrapers. This approach allows users to focus on the data itself, rather than the mechanics of retrieving it from protected websites.
