Are headless browsers detectable by websites?

Yes. Anti-bot systems check for specific signals that headless browsers expose: the navigator.webdriver flag, HeadlessChrome in the user agent, empty plugin and extension lists, software renderer strings in WebGL, and behavioral patterns like machine-precise timing. Stealth plugins address the most obvious signals but sophisticated systems running JavaScript challenges are harder to bypass.

What is the best headless browser for web scraping?

Playwright with Chromium is the recommended choice for new projects in 2026. It has the cleanest API, built-in auto-wait for elements, network interception, and active development. Puppeteer is a solid alternative for Node.js projects already using it. Selenium is worth choosing if you need multi-browser or multi-language support in an existing testing infrastructure.

Can headless browsers handle JavaScript-rendered pages?

Yes. This is the primary reason to use a headless browser over a plain HTTP client. The browser executes the page's JavaScript, waits for asynchronous data requests when you tell it what to wait for, and gives you the fully rendered DOM. Content that appears dynamically after page load is accessible the same way it is to a real user.

What is headless mode in a browser?

Headless mode is a way to run a browser without displaying its graphical interface. Chrome, Chromium, and Firefox all support headless mode. You enable it at launch with a command-line flag (--headless for Chrome, Chromium, and Firefox) or with a configuration option such as Playwright's headless=True. Page processing and JavaScript execution work the same way as in normal mode, although some environment details, such as GPU rendering, can differ.
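For Chrome and Chromium, the flag goes on the command line. The sketch below only assembles that command line and does not start a browser. The executable name chromium is an assumption, because the binary is called google-chrome, chrome, or chromium depending on the platform and install method. --dump-dom prints the serialized DOM to stdout, and --screenshot writes a PNG of the rendered page; both are standard Chrome switches.

```python
import shlex
from typing import Optional


def headless_chrome_command(
    url: str,
    executable: str = "chromium",  # assumption: the binary name varies by platform
    screenshot_path: Optional[str] = None,
) -> list[str]:
    """Assemble a command line that loads `url` in headless Chrome/Chromium."""
    args = [executable, "--headless"]
    if screenshot_path:
        # Save a PNG of the rendered page at a fixed viewport size.
        args += [f"--screenshot={screenshot_path}", "--window-size=1280,720"]
    else:
        # Print the serialized DOM to stdout once the page has loaded.
        args.append("--dump-dom")
    args.append(url)
    return args


if __name__ == "__main__":
    cmd = headless_chrome_command("https://example.com/")
    print(shlex.join(cmd))  # chromium --headless --dump-dom https://example.com/
    # Running it requires a local Chrome/Chromium install, for example:
    # subprocess.run(cmd, capture_output=True, text=True).stdout
```

Playwright hides this detail behind the headless argument to launch(), but the result is the same kind of browser process.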

Is it legal to use headless browsers for web scraping?

Generally yes for publicly accessible content, though the rules vary by jurisdiction. What matters legally is what you scrape and how: whether you respect robots.txt and the site's terms of service, whether you are collecting personal data, and whether your requests place unreasonable load on the server. Before scraping at scale, it is always worth checking the applicable laws and the site's terms.
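You can automate the robots.txt check with Python's standard-library urllib.robotparser. In the minimal sketch below, the rules and the my-scraper user agent are made-up examples. In practice you would load the real file from the site's /robots.txt URL, for example with RobotFileParser.set_url() followed by read().

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt contents; a real one lives at https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /account/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("https://example.com/products/", "https://example.com/checkout/cart"):
        verdict = "allowed" if rules.can_fetch("my-scraper", url) else "disallowed"
        print(url, verdict)
    print("crawl delay (s):", rules.crawl_delay("my-scraper"))
```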

When should I use a managed scraping API instead of a headless browser?

When the overhead of managing browser infrastructure, anti-bot bypass, proxy rotation, and maintenance becomes larger than the value of having direct browser control. For testing your own applications, use a headless browser directly. For scraping third-party sites at scale, especially bot-protected ones, a managed API typically saves more engineering time than it costs.


June 11, 2026 · 13 min read

What is a headless browser? How it works, uses, and tools (2026)

Joel Olawanle


If you have ever tried to scrape a modern website with a simple HTTP request and gotten back an empty shell with a <div id="app"></div> instead of the content you wanted, you have already run into the problem headless browsers solve.

Most of the web today renders its content through JavaScript after the initial page load. A standard HTTP client fetches the HTML response and stops there. A headless browser fetches the response, executes the JavaScript, waits for async calls to complete, and hands you the DOM in its final state — exactly what a real user would see.
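You can often tell which case you are in by inspecting the raw HTML that a plain HTTP client returns. The heuristic below is a sketch, not a standard check. It uses Python's built-in html.parser to count visible text outside script, style, noscript, and template tags. It then flags pages that ship scripts but almost no text. The 200-character threshold is an arbitrary assumption you would tune per site.

```python
from html.parser import HTMLParser


class _VisibleTextCounter(HTMLParser):
    """Counts text characters outside tags whose content is never displayed."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.text_chars = 0
        self.script_tags = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def looks_like_js_shell(html: str, min_text_chars: int = 200) -> bool:
    """True if the HTML has scripts but little visible text (likely client-rendered)."""
    counter = _VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.text_chars < min_text_chars


if __name__ == "__main__":
    shell = ('<html><head><title>Shop</title><script src="/app.js"></script></head>'
             '<body><div id="app"></div></body></html>')
    print(looks_like_js_shell(shell))  # True: this page needs a headless browser
```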

This guide covers what headless browsers are, how they work, how to use them for scraping and testing, which ones to choose, and when it makes more sense to use a managed alternative instead.

What is a headless browser?

A headless browser is a web browser that runs without a graphical user interface. It can do everything a regular browser can — load pages, execute JavaScript, submit forms, click buttons, handle cookies, follow redirects — but it does all of this in the background without rendering anything on screen.

The name comes from the idea of removing the "head" from the browser: the visual layer that displays content to a user. What remains is the engine that processes pages and the APIs that let you interact with them programmatically.

Headless browsers are used primarily by developers for:

Web scraping — extracting data from pages that require JavaScript to load their content
Automated testing — running test suites against web applications without a visible browser window
Screenshot and PDF generation — capturing visual snapshots of pages for auditing, previews, or reports
Performance monitoring — measuring page load times and resource usage programmatically

Headless browser vs. regular browser

The difference is simpler than it sounds. A regular browser like Chrome or Firefox renders the visual layer of a page so a human can read and interact with it. A headless browser processes the same page but skips the rendering step.

	Regular Browser	Headless Browser
Renders visual UI	Yes	No
Executes JavaScript	Yes	Yes
Handles cookies and sessions	Yes	Yes
Can click, scroll, type	Via user interaction	Via code and APIs
Resource usage	Higher (renders visuals)	Lower (skips rendering)
Primary users	End users	Developers and automated systems
Debugging	Visual, intuitive	Log files and screenshots

The key point is that headless browsers are not simpler or less capable than regular browsers in terms of web functionality. They just skip the display step, which makes them faster and cheaper to run at scale.

How headless browsers work

When you load a page in a headless browser, the same pipeline runs as in a regular browser — minus the final render to screen.

1. DNS resolution and HTTP request. The browser resolves the domain, connects to the server, and fetches the HTML response.
2. HTML parsing. The browser builds the initial DOM tree from the HTML.
3. Resource loading. Linked CSS, JavaScript files, images, and other assets are fetched.
4. JavaScript execution. Scripts run. For modern frameworks like React, Vue, and Angular, this is where the actual page content gets inserted into the DOM.
5. Async operations. API calls, data fetches, and dynamic content loading complete.
6. DOM finalization. The page reaches its final state, the same state a real user would see after everything loads.
7. Interaction. Your code can now read the DOM, click elements, fill forms, scroll, and extract data.

The key difference from a plain HTTP request is steps 3 through 6. A simple requests.get() or fetch() call gets you the raw HTML from step 1 and stops. A headless browser runs the full pipeline.

Headless browsers vs. browser automation tools

This distinction trips a lot of developers up. Headless browsers and browser automation tools are different things, though they are almost always used together.

A headless browser is the engine: Chrome, Firefox, or Chromium running without a GUI.

A browser automation tool is the API layer that lets you control the browser from code: tell it where to navigate, what to click, what to type, and what to extract.

Think of it like this: the headless browser is the car engine. The automation tool is the steering wheel and pedals. You need both.

The main automation tools in use today:

Playwright (by Microsoft): works with Chromium, Firefox, and WebKit. Supports Python, JavaScript, Java, and .NET. Considered the most modern and actively maintained option. Recommended for new projects.
Puppeteer (by Google): works primarily with Chrome and Chromium, with Firefox support in recent versions. JavaScript/TypeScript on Node.js. Slightly lower-level than Playwright but very widely used.
Selenium: the oldest and most established option. Works with Chrome, Firefox, Safari, and Edge. Supports more languages than any other tool. More verbose than Playwright, but a strong fit for existing test suites and the many testing frameworks built on top of it.

Playwright is the recommended starting point for new scraping or testing projects in 2026. It has the cleanest API, the best async support, and an actively maintained ecosystem that includes third-party stealth plugins.

How to use a headless browser for web scraping

Here is a minimal Playwright example in Python that scrapes product data from a page:

# pip install playwright
# playwright install chromium
from playwright.sync_api import sync_playwright

def scrape_products(url: str) -> list[dict]:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        page.goto(url, wait_until="networkidle")

        products = page.eval_on_selector_all(
            ".product",
            """items => items.map(item => ({
                name:  item.querySelector('.product-name')?.innerText || '',
                price: item.querySelector('.price')?.innerText || '',
            }))"""
        )

        browser.close()
        return products

data = scrape_products("https://www.scrapingcourse.com/ecommerce/")
print(data)

# Output:
# [
#     {'name': 'Abominable Hoodie', 'price': '$69.00'},
#     {'name': 'Adrienne Trek Jacket', 'price': '$57.00'},
#     ...
# ]

The same example in JavaScript with Puppeteer:

// npm install puppeteer
const puppeteer = require('puppeteer');

async function scrapeProducts(url) {
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();

    await page.goto(url, { waitUntil: 'networkidle2' });

    const products = await page.$$eval('.product', items =>
        items.map(item => ({
            name:  item.querySelector('.product-name')?.innerText || '',
            price: item.querySelector('.price')?.innerText || '',
        }))
    );

    await browser.close();
    return products;
}

scrapeProducts('https://www.scrapingcourse.com/ecommerce/')
    .then(console.log);

Both examples do the same thing: launch a headless browser, navigate to the page, wait for JavaScript to finish loading content, extract the data, and close the browser.

Waiting for content to load

The most common mistake with headless browser scraping is not waiting long enough for content to appear. Modern pages load data asynchronously, and content that looks instant in a real browser may take several API calls to populate.

Playwright offers several wait strategies:

# wait until there are no network connections for at least 500 ms
page.goto(url, wait_until="networkidle")

# wait for a specific element to appear in the DOM
page.wait_for_selector(".product-name")

# wait for a specific API call to complete
with page.expect_response("**/api/products**"):
    page.click(".load-products-button")

# simple time-based wait (last resort)
page.wait_for_timeout(2000)

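wait_for_selector is usually the most reliable option: wait for the element you actually want to scrape rather than for a fixed number of milliseconds. Playwright's own documentation marks networkidle as discouraged and recommends explicit readiness checks instead. Pages that poll or keep connections open can take a long time to go idle, or never do.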
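Condition-based waiting comes down to polling a check until it passes or a deadline expires. The generic helper below illustrates the idea in plain Python. It is not how Playwright implements wait_for_selector internally, and the timeout and interval defaults are assumptions. A helper like this can still be handy for conditions Playwright has no built-in wait for, such as a background job finishing in your own backend during a test.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    condition: Callable[[], Optional[T]],
    timeout: float = 10.0,
    interval: float = 0.1,
) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


if __name__ == "__main__":
    # Fake condition that becomes true on the third check.
    checks = {"count": 0}

    def results_ready() -> Optional[list[str]]:
        checks["count"] += 1
        return ["Abominable Hoodie"] if checks["count"] >= 3 else None

    print(wait_for(results_ready, timeout=2.0, interval=0.01))  # ['Abominable Hoodie']
```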

Interacting with pages before scraping

Some data only appears after user interaction: cookie banners that must be dismissed, "load more" buttons, search forms, and tabs that hide content by default. Playwright can handle all of these:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    page.goto("https://example.com/search")

    # dismiss cookie banner
    page.click("button:has-text('Accept all')")

    # fill the search form
    page.fill("input[name='q']", "wireless headphones")
    page.click("button[type='submit']")

    # wait for results
    page.wait_for_selector(".search-results")

    # scroll down to load more results
    page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    page.wait_for_timeout(1000)

    # extract results
    results = page.eval_on_selector_all(
        ".product-card",
        "items => items.map(i => ({ name: i.querySelector('h2')?.innerText, price: i.querySelector('.price')?.innerText }))"
    )

    browser.close()
    print(results)
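The single scroll and one-second wait in this example fetch one extra batch of results. For pages that keep loading more on every scroll, a common approach is to repeat until the page height stops growing. In the sketch below, the browser calls are passed in as functions so the loop logic can be tested on its own. The comments show how the Playwright page.evaluate() calls would plug in. The max_rounds cap is an assumption meant to stop pages that never finish loading.

```python
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    wait: Callable[[], None],
    max_rounds: int = 20,
) -> int:
    """Scroll until the page height stops changing; return the rounds used."""
    last_height = get_height()
    for round_no in range(1, max_rounds + 1):
        scroll_to_bottom()
        wait()  # give lazily loaded results time to arrive
        new_height = get_height()
        if new_height == last_height:
            return round_no  # nothing new was added, so stop
        last_height = new_height
    return max_rounds


# With the Playwright page from the example above:
# scroll_until_stable(
#     get_height=lambda: page.evaluate("document.body.scrollHeight"),
#     scroll_to_bottom=lambda: page.evaluate("window.scrollTo(0, document.body.scrollHeight)"),
#     wait=lambda: page.wait_for_timeout(1000),
# )

if __name__ == "__main__":
    # Fake page whose height grows for two scrolls, then stops.
    state = {"height": 1000, "scrolls": 0}

    def fake_scroll() -> None:
        state["scrolls"] += 1
        if state["scrolls"] <= 2:
            state["height"] += 1000

    rounds = scroll_until_stable(lambda: state["height"], fake_scroll, wait=lambda: None)
    print(rounds, state["height"])  # 3 3000
```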

How to use a headless browser for testing

For testing, you use the same browser automation API, but the goal is to verify behavior rather than extract data.

from playwright.sync_api import sync_playwright, expect

def test_login_flow():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        page.goto("https://example.com/login")

        # fill the login form (placeholder test credentials)
        page.fill("input[name='email']", "user@example.com")
        page.fill("input[name='password']", "password123")
        page.click("button[type='submit']")

        # verify the redirect happened
        page.wait_for_url("**/dashboard")

        # verify the dashboard loads
        expect(page.locator("h1")).to_have_text("Welcome back")

        browser.close()

test_login_flow()

Playwright's expect assertions retry until the condition is met or a timeout expires (5 seconds by default) instead of failing immediately. That makes tests more reliable on real applications where content loads asynchronously.

Because headless browsers need no display, these tests can run automatically on CI servers as part of your pipeline. Playwright's test tooling can also produce HTML reports, screenshots on failure, and video recordings of test runs for debugging.

Common use cases

Web scraping

Price monitoring — e-commerce pages heavily rely on JavaScript to load product data. A headless browser renders the full page before extraction.
Lead generation — scraping business directories and professional platforms that paginate via JavaScript or require interaction.
News aggregation — collecting articles from publications that load content dynamically.
Market research — tracking competitor product launches, pricing changes, and content updates at scale.
AI training data — collecting large volumes of web content for LLM training datasets.

Automated testing

End-to-end testing — testing complete user flows from login through checkout.
Regression testing — running test suites automatically on every code commit to catch regressions.
Cross-browser testing — verifying behavior across Chromium, Firefox, and WebKit.
Performance testing — measuring page load times and resource consumption in automated runs.
Visual regression testing — comparing screenshots between releases to catch unintended UI changes.

Screenshot and PDF generation

Page previews — generating thumbnails or previews of URLs for link sharing or dashboards.
PDF reports — converting web-based reports to PDF for distribution.
Audit trails — capturing page states at specific points for compliance or debugging.

Comparing headless browser tools

	Playwright	Puppeteer	Selenium
Maintained by	Microsoft	Google	Open source community
Browsers supported	Chromium, Firefox, WebKit	Chrome, Chromium, Firefox (recent versions)	Chrome, Firefox, Safari, Edge
Languages	Python, JS, Java, .NET	JavaScript only	Python, JS, Java, Ruby, C#, and more
API style	Modern, async-first	Async, slightly lower-level	More verbose, older API design
Auto-wait	Yes, built in	Mostly manual (newer Locator API auto-waits)	Manual waits required
Network interception	Yes	Yes	Limited
Recommended for	New projects, scraping, E2E testing	Chrome-specific scraping, existing codebases	Legacy testing infrastructure, multi-language teams
Status	Actively developed	Actively developed	Actively developed

PhantomJS is not included here because it was discontinued in 2018 and should not be used for new projects. If you have existing PhantomJS code, migrate to Playwright or Puppeteer.

Why headless browsers get detected and blocked

This is where most scraping tutorials stop being honest. Headless browsers are detectable. Anti-bot systems like Cloudflare, DataDome, and PerimeterX specifically look for them.

Here is what they check:

navigator.webdriver flag. Browsers controlled by automation tools such as Selenium, Puppeteer, and Playwright report true here by default. It is one of the first signals anti-bot systems check.
User agent string. Headless Chrome's default user agent includes HeadlessChrome. Sites check for this string explicitly.
Missing browser APIs. Real browsers expose plugins and other APIs that headless environments may leave empty or populate differently. An empty navigator.plugins list, which older headless Chrome versions reported, is a strong signal.
JavaScript execution timing. Real users take time between actions. Automated scripts execute with machine-level precision. Behavioral analysis catches this.
WebGL and Canvas fingerprinting. As covered in our WebGL fingerprinting article, headless browsers typically use software renderers rather than real GPU hardware, which produces characteristically different rendering output.
TLS fingerprinting. The cipher suites and TLS extensions a client advertises during the handshake are characteristic of its browser or HTTP library. Headless Chrome generally shares regular Chrome's network stack, so this check mostly catches mismatches, such as a Chrome user agent sent from a client whose handshake does not look like Chrome's.
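JA3 is a widely known way to turn handshake parameters into a fingerprint. It joins five fields from the TLS ClientHello into one string: the version, cipher suites, extensions, elliptic curves, and point formats. GREASE placeholder values are dropped, and the result is hashed with MD5. The sketch below computes the fingerprint from values you supply; capturing a real ClientHello is out of scope. The example field values are illustrative and do not describe a specific browser. Chrome now randomizes its TLS extension order, which makes raw JA3 hashes unstable, so newer schemes such as JA4 sort the values before hashing.

```python
import hashlib
from typing import Iterable

# GREASE values (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa. JA3 skips them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(
    tls_version: int,
    ciphers: Iterable[int],
    extensions: Iterable[int],
    curves: Iterable[int],
    point_formats: Iterable[int],
) -> str:
    """Build the JA3 string: five comma-separated fields of dash-joined decimals."""

    def field(values: Iterable[int]) -> str:
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(tls_version),
        field(ciphers),
        field(extensions),
        field(curves),
        field(point_formats),
    ])


def ja3_hash(tls_version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 hex digest of the JA3 string, the form usually shared and matched."""
    text = ja3_string(tls_version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Illustrative values (771 = TLS 1.2 on the wire); 0x1A1A is a GREASE value.
    fields = (771, [0x1A1A, 4865, 4866, 49195], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
    print(ja3_string(*fields))  # 771,4865-4866-49195,0-23-65281-10-11,29-23-24,0
    print(ja3_hash(*fields))
```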

You can patch some of these. Playwright Stealth and similar plugins address the most obvious ones. But this is an ongoing arms race. Anti-bot vendors study open-source stealth tools and update their detection accordingly.
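Patching one signal rarely helps on its own, because detectors combine several. The toy checker below makes that concrete. Real detection runs as JavaScript inside the page and weighs many more signals, together with behavioral and network data. This Python version, its field names, and its list of software-renderer substrings are illustrative assumptions. SwiftShader and llvmpipe are common software renderers.

```python
from dataclasses import dataclass


@dataclass
class BrowserSignals:
    """A few values a detection script could read inside the page."""

    webdriver: bool      # navigator.webdriver
    user_agent: str      # navigator.userAgent
    plugin_count: int    # navigator.plugins.length
    webgl_renderer: str  # unmasked WebGL renderer string


# Illustrative substrings of software (non-GPU) WebGL renderers.
SOFTWARE_RENDERERS = ("swiftshader", "llvmpipe")


def headless_signals(s: BrowserSignals) -> list[str]:
    """Return the headless indicators present in `s`."""
    hits = []
    if s.webdriver:
        hits.append("navigator.webdriver is true")
    if "headlesschrome" in s.user_agent.lower():
        hits.append("user agent contains HeadlessChrome")
    if s.plugin_count == 0:
        hits.append("navigator.plugins is empty")
    if any(name in s.webgl_renderer.lower() for name in SOFTWARE_RENDERERS):
        hits.append("software WebGL renderer")
    return hits


if __name__ == "__main__":
    # A browser where only the webdriver flag and the user agent were patched.
    patched = BrowserSignals(
        webdriver=False,
        user_agent="Mozilla/5.0 (X11; Linux x86_64) Chrome/126.0.0.0 Safari/537.36",
        plugin_count=0,
        webgl_renderer="ANGLE (Google, Vulkan 1.3.0 (SwiftShader Device), SwiftShader driver)",
    )
    print(headless_signals(patched))
    # ['navigator.plugins is empty', 'software WebGL renderer']
```

The webdriver flag and user agent are patched, yet two indicators remain. This is the arms race described above in miniature.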

Advantages and limitations of headless browsers

Advantages

JavaScript rendering. The core capability. Pages that would return empty HTML to a plain HTTP request return full content to a headless browser.
Page interaction. Clicks, scrolls, form submissions, drag and drop. Anything a real user can do, you can automate.
Accurate representation. You get what the user sees, not a partial or pre-rendered version of the page.
Screenshot and PDF output. Built into Playwright and Puppeteer. Useful for auditing, previews, and visual testing.
Wide language support. Playwright supports Python, JavaScript, Java, and .NET. Selenium supports even more.

Limitations

Memory and CPU overhead. Each browser instance typically uses around 200 to 400 MB of RAM, and more on heavy pages. Running many concurrent instances pushes hardware limits quickly.
Startup latency. Launching a browser process takes time. For high-frequency scraping, this overhead adds up.
Detection. As covered above, headless browsers are identifiable by anti-bot systems. Staying ahead of detection requires ongoing maintenance.
Debugging difficulty. Without a visible interface, tracking what is happening requires log files, screenshots, or running in headful mode temporarily.
Proxy and anti-bot infrastructure is separate. Headless browsers handle rendering. They do not handle proxy rotation, CAPTCHA solving, or anti-bot bypass. You need to build or integrate those separately.

Common challenges and how to handle them

Challenge 1: Getting blocked

Anti-bot systems flag headless browsers based on the signals described above.

What to try: Use stealth plugins, set a realistic user agent, randomize interaction timing, and use residential proxies rather than data center IPs. For stealth, use playwright-stealth in Python. In Node.js, use puppeteer-extra with puppeteer-extra-plugin-stealth, which also works with Playwright through playwright-extra.
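Randomized timing can be as simple as drawing each pause from a distribution instead of sleeping for a constant. The generator below is a sketch. The log-normal shape, the 800 ms median, and the clamping range are assumptions, not values known to get past any particular detector. Timing jitter on its own also does nothing about the other signals listed earlier.

```python
import random
from itertools import islice
from typing import Iterator, Optional


def human_delays(
    median_ms: float = 800.0,
    spread: float = 0.5,
    min_ms: float = 150.0,
    max_ms: float = 4000.0,
    seed: Optional[int] = None,
) -> Iterator[float]:
    """Yield randomized pauses in milliseconds to use between browser actions."""
    rng = random.Random(seed)
    while True:
        # lognormvariate(0, spread) has a median of 1, so most pauses sit near
        # median_ms, with an occasional much longer one.
        delay = median_ms * rng.lognormvariate(0.0, spread)
        yield min(max(delay, min_ms), max_ms)


if __name__ == "__main__":
    delays = human_delays(seed=42)
    print([round(d) for d in islice(delays, 5)])
    # With Playwright, pause between actions like this:
    # page.wait_for_timeout(next(delays))
```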

The honest limitation: Stealth plugins help against basic detection but do not reliably bypass sophisticated systems like Cloudflare's JavaScript challenge.

Challenge 2: Performance at scale

Running 20 concurrent Chromium instances on a single machine pushes RAM limits fast.
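You can check the arithmetic behind that claim using the 200 to 400 MB per-instance range from the limitations section. The range is a rough figure rather than a measurement, so treat the results as order-of-magnitude estimates.

```python
def browser_ram_range_mb(instances: int, low_mb: int = 200, high_mb: int = 400) -> tuple[int, int]:
    """Rough RAM needed for `instances` separate browser processes, in MB."""
    return instances * low_mb, instances * high_mb


def max_instances(available_mb: int, per_instance_mb: int = 400) -> int:
    """How many instances fit in a RAM budget at the worst-case per-instance figure."""
    return available_mb // per_instance_mb


if __name__ == "__main__":
    print(browser_ram_range_mb(20))  # (4000, 8000): roughly 4 to 8 GB
    print(max_instances(8 * 1024))   # 20 with 8 GB set aside for browsers
```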

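What to try: Where full isolation is not required, share one browser and give each task its own context or page instead of its own browser instance. If you use puppeteer-cluster, choose its CONCURRENCY_CONTEXT or CONCURRENCY_PAGE model rather than CONCURRENCY_BROWSER. In Playwright, a BrowserContext on a shared browser is significantly lighter than launching a new browser per URL. Close pages and contexts as soon as you are done with them.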
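Sharing one browser also makes it natural to cap concurrency explicitly. The sketch below shows the pattern with asyncio.Semaphore from the standard library. The worker is a placeholder; in practice it would open a new BrowserContext on a shared browser with Playwright's async API and close it in a finally block, as the comments show. The default limit of 5 is an assumption to tune against your machine's memory.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def run_bounded(
    urls: Iterable[str],
    worker: Callable[[str], Awaitable[T]],
    max_concurrency: int = 5,
) -> list[T]:
    """Run `worker` on every URL, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> T:
        async with semaphore:  # waits here while the limit is reached
            return await worker(url)

    # gather() returns results in the same order as `urls`.
    return list(await asyncio.gather(*(bounded(u) for u in urls)))


# A Playwright worker might look like this (assumes `browser` was launched once):
#
# async def worker(url):
#     context = await browser.new_context()
#     try:
#         page = await context.new_page()
#         await page.goto(url)
#         return await page.title()
#     finally:
#         await context.close()


async def _demo() -> None:
    async def fake_worker(url: str) -> str:
        await asyncio.sleep(0.01)  # stands in for page load and extraction
        return f"done: {url}"

    urls = [f"https://example.com/page/{i}" for i in range(10)]
    results = await run_bounded(urls, fake_worker, max_concurrency=3)
    print(results[0])  # done: https://example.com/page/0


if __name__ == "__main__":
    asyncio.run(_demo())
```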

Challenge 3: Dynamic content not loading

Data appears in the real browser but not in your scraper.

What to try: Replace time-based waits (wait_for_timeout) with condition-based waits (wait_for_selector, wait_for_response). Check whether the data comes from an API call and consider intercepting that request directly instead of parsing the DOM.
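When the data does come from an API call, you can parse the JSON response instead of the DOM. The parsing half is plain Python. The payload shape, {"products": [...]} with name and price fields, is a hypothetical example, so adjust the keys to whatever the real endpoint returns. The commented lines show how the response could be captured with Playwright's page.expect_response() in the sync API.

```python
import json
from typing import Any


def extract_products(payload: Any) -> list[dict[str, str]]:
    """Pull name/price pairs out of a hypothetical products API response."""
    # Accept either {"products": [...]} or a bare list of product objects.
    items = payload.get("products", []) if isinstance(payload, dict) else payload
    return [
        {"name": str(item.get("name", "")), "price": str(item.get("price", ""))}
        for item in items
    ]


# Capturing the response with Playwright (sync API):
# with page.expect_response("**/api/products**") as response_info:
#     page.click(".load-products-button")
# products = extract_products(response_info.value.json())

if __name__ == "__main__":
    body = '{"products": [{"id": 1, "name": "Abominable Hoodie", "price": "$69.00"}]}'
    print(extract_products(json.loads(body)))
    # [{'name': 'Abominable Hoodie', 'price': '$69.00'}]
```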

Challenge 4: Debugging without a visual interface

When something goes wrong, it is hard to see what the browser is actually doing.

What to try: Switch to headless=False temporarily to watch the browser execute your script. Use page.screenshot() at key points in your code to capture what the page looks like. Enable verbose logging; with Playwright, setting the DEBUG=pw:api environment variable logs every API call.
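Screenshots are most useful at the moment something fails. A small context manager can capture one automatically when a block raises and then re-raise the original error. The capture callback is deliberately generic; with Playwright you would pass something like lambda path: page.screenshot(path=path). The file-naming scheme is an arbitrary choice.

```python
import contextlib
import time
from typing import Callable, Iterator


@contextlib.contextmanager
def screenshot_on_error(capture: Callable[[str], None], label: str) -> Iterator[None]:
    """Call capture(path) if the wrapped block raises, then re-raise the error."""
    try:
        yield
    except Exception:
        path = f"debug-{label}-{int(time.time())}.png"
        try:
            capture(path)
        except Exception:
            pass  # a failed screenshot must not hide the original error
        raise  # re-raises the exception from the wrapped block


# With Playwright:
# with screenshot_on_error(lambda path: page.screenshot(path=path), "search"):
#     page.click("button[type='submit']")
#     page.wait_for_selector(".search-results")

if __name__ == "__main__":
    saved: list[str] = []
    try:
        with screenshot_on_error(saved.append, "search"):
            raise TimeoutError("results never appeared")
    except TimeoutError as exc:
        print("failed:", exc, "| screenshot:", saved[0])
```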

When to use a managed scraping API instead

Headless browsers are the right choice when you need:

Low-level browser control for specific automation tasks
Integration into a testing framework
Local development and prototyping
Complete control over the browser environment

A managed scraping API is the right choice when you need:

Reliable scraping on bot-protected sites without ongoing maintenance
Anti-bot bypass, proxy rotation, and CAPTCHA solving handled automatically
Clean structured JSON output without writing CSS selectors or parsers
Scaling beyond what local hardware can support
Reducing the engineering time spent maintaining browser infrastructure

Spidra handles the headless browser layer, anti-bot bypass, residential proxy rotation, and AI-powered structured extraction through a single API. The same request that works on an open page works on a Cloudflare-protected page without any changes.

pip install spidra

from spidra import SpidraClient, ScrapeParams, ScrapeUrl
import os

spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])

job = spidra.scrape.run_sync(ScrapeParams(
    urls=[ScrapeUrl(url="https://www.scrapingcourse.com/ecommerce/")],
    prompt="Extract all product names and prices",
    output="json",
))

print(job.result.content)
# [{'name': 'Abominable Hoodie', 'price': '$69.00'}, ...]

No browser to launch. No selectors to write. No proxy to configure. No stealth plugins to maintain.

For scraping that needs page interaction first, Spidra's browser actions replace the Playwright interaction code:

from spidra import BrowserAction

job = spidra.scrape.run_sync(ScrapeParams(
    urls=[
        ScrapeUrl(
            url="https://example.com/search",
            actions=[
                BrowserAction(type="click", value="Accept cookies"),
                BrowserAction(type="type", selector="input[name='q']", value="wireless headphones"),
                BrowserAction(type="click", value="Search button"),
                BrowserAction(type="wait", duration=1500),
            ]
        )
    ],
    prompt="Extract all product names and prices from the search results",
    output="json",
))

Headless browser vs. managed scraping API: When to use which

Situation	Use
Automated testing of your own web application	Headless browser (Playwright)
Scraping open, static pages locally	Headless browser
Scraping bot-protected sites reliably	Managed scraping API
Need clean structured JSON without writing parsers	Managed scraping API
Need anti-bot bypass, proxy rotation, CAPTCHA solving	Managed scraping API
Scaling beyond local hardware limits	Managed scraping API
CI/CD pipeline for end-to-end tests	Headless browser (Playwright)
Production data pipeline from third-party sites	Managed scraping API

Frequently asked questions

What is the difference between a headless browser and a regular browser?

A regular browser renders the visual interface so a human can see and interact with it. A headless browser processes pages and executes JavaScript the same way but skips rendering the visual layer. Both are fully capable of loading modern web pages, including JavaScript-heavy ones. Headless browsers are faster and use less memory because they skip the display step.

What is the difference between Playwright and Puppeteer?

Both control Chrome/Chromium in headless mode. Playwright additionally supports Firefox and WebKit and works in Python, Java, and .NET as well as JavaScript. It also has built-in auto-wait that removes most manual timing code. Puppeteer runs on Node.js (JavaScript/TypeScript) and focuses on Chrome/Chromium, with Firefox support in recent versions. It has historically required more manual wait management. For new projects, Playwright is the better choice. For existing Puppeteer codebases, switching is worthwhile but not urgent.
