May 28, 2026 · 8 min read

Puppeteer real browser: How to avoid detection when scraping

Joel Olawanle

Puppeteer is one of the most popular browser automation libraries for Node.js. It lets you control a real Chrome browser using JavaScript, making it useful for tasks like automated testing, web scraping, form filling, and interacting with JavaScript-heavy websites.

Out of the box, however, Puppeteer is designed for automation (not stealth). Browsers launched through Puppeteer expose detectable automation signals such as the WebDriver flag, missing browser APIs, and bot-like execution traces. Those signals are acceptable in testing environments, but during web scraping, they become exactly what anti-bot systems look for to identify and block automated traffic.
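To make those signals concrete, here is a minimal sketch of the kind of check a detection script might run. It is an illustration only: the function name `findAutomationSignals` and the three checks are assumptions chosen for this example, and real anti-bot systems combine far more signals than these.

```javascript
// Illustrative check only; real anti-bot systems use many more signals.
// `nav` is a plain object shaped like the browser's `navigator`.
function findAutomationSignals(nav) {
    const signals = [];

    // navigator.webdriver is true in browsers controlled by automation tools.
    if (nav.webdriver === true) {
        signals.push('webdriver flag set');
    }

    // Headless Chrome has historically advertised itself in the User-Agent.
    if (/HeadlessChrome/.test(nav.userAgent || '')) {
        signals.push('headless user agent');
    }

    // An empty plugin list is another trait headless sessions have exposed.
    if (!nav.plugins || nav.plugins.length === 0) {
        signals.push('no plugins');
    }

    return signals;
}

// Hard-coded values resembling a default headless session.
console.log(findAutomationSignals({
    webdriver: true,
    userAgent: 'Mozilla/5.0 HeadlessChrome/120.0.0.0 Safari/537.36',
    plugins: [],
}));
```

In a Puppeteer script, you could collect the same values from the live page with `page.evaluate()` and pass them to this function to see what your session exposes.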

Puppeteer Real Browser is a library built to reduce those detection signals and make Puppeteer behave more like a regular human-operated browser. It patches common fingerprinting leaks, improves browser authenticity, and adds stealth features intended to help scrapers avoid basic bot detection systems.

In this tutorial, you’ll learn how Puppeteer Real Browser works, how it extends standard Puppeteer with stealth capabilities, how to use it to scrape data from a live website, and how well it performs when tested against real anti-bot protection systems.

What is Puppeteer real browser?

Puppeteer Real Browser is a Node.js library that wraps Puppeteer and applies a set of stealth patches to reduce its detectability. It relies on Rebrowser patches under the hood, which hide the WebDriver navigator field, replace missing browser APIs, patch bot-like stack traces, and mimic a real browser runtime environment more closely than standard Puppeteer.

Because it uses Puppeteer's API directly, it is a drop-in replacement. If you already have a working Puppeteer scraper, switching to Puppeteer Real Browser means changing the import and the browser launch call, not rewriting your scraping logic.
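As a rough sketch of what that switch looks like, the helper below opens a page with either library and returns the same `{ browser, page }` pair. The helper name `openPage` and its flag are hypothetical, and the modules are required lazily so only the library you actually use needs to be installed.

```javascript
// Sketch: hide which library opens the browser behind one helper.
// `openPage` is a hypothetical name, not part of either library.
async function openPage(useRealBrowser) {
    if (useRealBrowser) {
        const { connect } = require('puppeteer-real-browser');
        // connect() resolves to both the browser and a ready-made page.
        return connect({ headless: true });
    }

    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    return { browser, page };
}
```

Everything after this call, such as `page.goto()` and `page.$$eval()`, stays the same in both cases.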

The library also includes Ghost Cursor, a cursor emulator that generates realistic mouse movement patterns rather than the robotic straight-line movements Puppeteer produces by default. This is particularly relevant for CAPTCHA interactions that check mouse behavior before deciding whether the visitor is human.
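To show why a curved path looks different from a straight one, here is a small sketch that generates points along a quadratic Bezier curve between two coordinates. This is not Ghost Cursor's implementation; the function name `curvedPath`, the `bend` factor, and the step count are assumptions made for illustration.

```javascript
// Illustration only: this is NOT Ghost Cursor's implementation.
// Returns points along a quadratic Bezier curve from `start` to `end`.
function curvedPath(start, end, steps = 20, bend = 0.25, random = Math.random) {
    const dx = end.x - start.x;
    const dy = end.y - start.y;

    // Control point: the midpoint pushed sideways, perpendicular to the line.
    const side = random() < 0.5 ? -1 : 1;
    const control = {
        x: (start.x + end.x) / 2 - dy * bend * side,
        y: (start.y + end.y) / 2 + dx * bend * side,
    };

    const points = [];
    for (let i = 0; i <= steps; i++) {
        const t = i / steps;
        // Quadratic Bezier weights for start, control, and end.
        const a = (1 - t) ** 2;
        const b = 2 * (1 - t) * t;
        const c = t ** 2;
        points.push({
            x: a * start.x + b * control.x + c * end.x,
            y: a * start.y + b * control.y + c * end.y,
        });
    }
    return points;
}
```

With plain Puppeteer, you could replay such a path by calling `page.mouse.move(point.x, point.y)` for each point. A real cursor emulator also varies speed and adds overshoot, which this sketch leaves out.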

One important note before going further: the project has been discontinued and is no longer actively maintained. That context matters and we will come back to it.

How Puppeteer real browser enhances stealth

Standard Puppeteer fails basic fingerprinting checks on sites that test for automation signals. It reveals WebDriver usage, missing browser extensions, and other identifiers that flag the session as automated before your scraper ever touches the data you need.

Puppeteer Real Browser patches the most obvious of these. In headless mode with a custom User-Agent, it passes fingerprinting tests that standard Puppeteer fails. The Rebrowser patches, combined with Ghost Cursor's realistic mouse movement, give it noticeably better stealth than base Puppeteer on sites running lighter bot detection.

How to scrape with Puppeteer real browser

You will extract product names and prices from an e-commerce test page.

Prerequisites

  • Node.js (latest LTS)
  • puppeteer-real-browser

Install the library with npm:

npm install puppeteer-real-browser

Step 1: Scrape product data

Import the library, launch the browser, navigate to the target page, and extract product data using standard Puppeteer selectors:

// npm install puppeteer-real-browser
const { connect } = require('puppeteer-real-browser');

const scraper = async () => {
    // connect() launches the patched browser and returns it with a ready page
    const { browser, page } = await connect({
        headless: true,
    });

    try {
        await page.goto('https://www.scrapingcourse.com/ecommerce/');
        // Wait for the product cards instead of sleeping for a fixed time
        await page.waitForSelector('.product');

        // Runs in the page context: map each product card to a plain object
        const products = await page.$$eval('.product', (items) => {
            return items.map((item) => ({
                name: item.querySelector('.product-name')?.innerText || '',
                price: item.querySelector('.price')?.innerText || '',
            }));
        });

        console.log(products);
    } finally {
        // Close the browser even if navigation or extraction throws
        await browser.close();
    }
};

scraper();

Output:

[
    { name: 'Abominable Hoodie', price: '$69.00' },
    { name: 'Artemis Running Short', price: '$45.00' },
    // ... rest of results
]

That works. The API feels exactly like standard Puppeteer, which is the point. Now test what actually matters: whether the stealth patches hold up against real protection.
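The scraped prices are strings such as '$69.00'. If you want to sort or filter on them, a small helper can turn them into numbers. The sketch below is an optional add-on, not part of the library: `parsePrice` is a hypothetical name, and it assumes prices use a dot as the decimal separator.

```javascript
// Hypothetical helper: convert a price string such as '$69.00' to a number.
// Assumes a dot decimal separator; returns null when no number is found.
function parsePrice(text) {
    // Drop currency symbols, spaces, and thousands separators.
    const cleaned = String(text).replace(/[^0-9.-]/g, '');
    const value = Number.parseFloat(cleaned);
    return Number.isNaN(value) ? null : value;
}

// Example with rows shaped like the scraper's output.
const sample = [
    { name: 'Abominable Hoodie', price: '$69.00' },
    { name: 'Artemis Running Short', price: '$45.00' },
];
const withNumbers = sample.map((p) => ({ ...p, priceValue: parsePrice(p.price) }));
console.log(withNumbers);
```

Returning null instead of 0 for unparseable text keeps missing prices distinguishable from free items.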

Step 2: Testing against anti-bot protection

Puppeteer Real Browser's stealth works best in non-headless mode. Configure it with turnstile: true to activate the automatic Turnstile CAPTCHA clicker, defaultViewport: null for a full browser window, and --start-maximized to mimic a real user's screen:

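// npm install puppeteer-real-browser
const { connect } = require('puppeteer-real-browser');

const scraper = async () => {
    const { browser, page } = await connect({
        headless: false,            // run with a visible browser window
        turnstile: true,            // enable the automatic Turnstile clicker
        connectOption: {
            defaultViewport: null,  // let the page fill the whole window
        },
        args: ['--start-maximized'],
    });

    try {
        await page.goto('https://www.scrapingcourse.com/antibot-challenge/');
        // Wait 20 seconds to give the challenge time to resolve
        await new Promise((resolve) => setTimeout(resolve, 20000));

        // Print whatever HTML the page holds after the wait
        const content = await page.content();
        console.log(content);
    } finally {
        // Close the browser even if navigation throws
        await browser.close();
    }
};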

scraper();

Running this against the anti-bot challenge page shows the problem. The browser opens in GUI mode, navigates to the page, and gets stuck on the challenge. The Turnstile clicker does not fire. The session sits on the block page for the full 20 seconds and never reaches the content.

Despite passing fingerprinting tests in controlled conditions, Puppeteer Real Browser still leaks enough signals in live conditions to fail the JavaScript challenge that Cloudflare and similar systems run in the background.
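Rather than reading the printed HTML by eye, you can check programmatically whether the session is still stuck. The sketch below is a heuristic, not an official detection method: the marker strings are assumptions based on what Cloudflare challenge pages commonly contain, and the names `looksLikeChallengePage` and `waitUntilCleared` are hypothetical.

```javascript
// Heuristic markers chosen for illustration; adjust them to the pages you see.
const CHALLENGE_MARKERS = [
    'just a moment',              // title commonly shown on Cloudflare interstitials
    'challenges.cloudflare.com',  // host that serves challenge and Turnstile scripts
    'cf-turnstile',               // class name used by the Turnstile widget
];

function looksLikeChallengePage(html) {
    const lower = html.toLowerCase();
    return CHALLENGE_MARKERS.some((marker) => lower.includes(marker));
}

// Poll until the challenge markers disappear or the timeout expires.
// `getHtml` is any function that returns the current page HTML.
async function waitUntilCleared(getHtml, timeoutMs = 20000, intervalMs = 500) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        // Reading the page can fail mid-navigation; treat that as "not yet".
        const html = await Promise.resolve().then(getHtml).catch(() => null);
        if (html !== null && !looksLikeChallengePage(html)) {
            return true;
        }
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    return false;
}
```

In the script above, `await waitUntilCleared(() => page.content())` could replace the fixed 20-second sleep and report `false` when the challenge never clears. A page that legitimately embeds a Turnstile widget would trigger a false positive, so treat the result as a hint.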

The limitations of Puppeteer real browser

  • JavaScript challenges remain a blocker. The test above shows this directly. Fingerprinting test sites check for surface-level signals. Real anti-bot systems run deeper JavaScript-based challenges that check browser behavior, timing, and execution environment in ways that patching flags alone does not satisfy.
  • Open source means diminishing returns over time. Anti-bot vendors study public libraries. Any stealth tool that gets popular will eventually have its specific patches identified and added to the detection checklist. The window between a new patch version and the corresponding detection update keeps shrinking.
  • The project is discontinued. No active maintenance means no updates when anti-bot systems change their detection logic. Any evasion technique that works today has no guarantee of working in three months, and there is nobody updating the library when it stops.
  • No proxy infrastructure. Puppeteer Real Browser has no built-in proxy rotation or geo-targeting. All of that is your responsibility.
  • Resource-heavy at scale. Full browser instances are memory-intensive. Running many concurrent sessions pushes hardware limits quickly and makes parallel scraping expensive.
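Because proxy rotation is left to you, one simple approach is a round-robin picker that hands out the next proxy for each new session. This is a minimal sketch: `createProxyRotator` is a hypothetical helper, and the proxy entries are placeholders to replace with endpoints from your own provider.

```javascript
// Hypothetical helper: cycle through a list of proxies in order.
function createProxyRotator(proxies) {
    if (proxies.length === 0) {
        throw new Error('proxy list is empty');
    }
    let index = 0;
    return () => {
        const proxy = proxies[index];
        index = (index + 1) % proxies.length; // wrap around at the end
        return proxy;
    };
}

// Placeholder endpoints; replace with real ones from your provider.
const nextProxy = createProxyRotator([
    { host: 'proxy-a.example.com', port: 8000 },
    { host: 'proxy-b.example.com', port: 8000 },
]);

console.log(nextProxy().host);
console.log(nextProxy().host);
console.log(nextProxy().host);
```

How you hand the chosen proxy to the browser depends on the library version. Check the puppeteer-real-browser README for its proxy setting before relying on a specific option shape.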

For sites without serious anti-bot protection, Puppeteer Real Browser is a solid upgrade over standard Puppeteer. For anything running modern bot detection, these limitations surface fast.
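If you do run several sessions at once, capping how many browsers are open at the same time keeps memory use predictable. The sketch below shows one common way to do that with a small worker pool. The name `runWithLimit` is hypothetical, and each task is assumed to be a function that returns a promise, for example one that opens a browser, scrapes a URL, and closes it.

```javascript
// Run async tasks with at most `limit` of them in flight at once.
// Results come back in the same order as the input tasks.
async function runWithLimit(tasks, limit) {
    const results = new Array(tasks.length);
    let next = 0;

    // Each worker keeps pulling the next unclaimed task until none remain.
    async function worker() {
        while (next < tasks.length) {
            const index = next++;
            results[index] = await tasks[index]();
        }
    }

    const workerCount = Math.min(limit, tasks.length);
    await Promise.all(Array.from({ length: workerCount }, () => worker()));
    return results;
}
```

A call such as `runWithLimit(urls.map((url) => () => scrapeOne(url)), 3)` would keep at most three browsers open, where `scrapeOne` stands for your own scraping function. If any task rejects, the returned promise rejects too, so wrap each task in its own try/catch when you want partial results.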

Getting past what Puppeteer real browser cannot handle

The test above points to something worth understanding. Puppeteer Real Browser got stuck on the challenge page, not because its patches are poorly written, but because the problem it is trying to solve keeps changing. Anti-bot systems update. Open-source patches catch up. Then the anti-bot systems update again.

Keeping up with that cycle yourself is what burns engineering time. The alternative is to move the anti-bot handling out of your code and into a service that maintains it for you.

Spidra handles the full stack at the API level. Every request runs through a real browser with residential proxy rotation across 50 countries, CAPTCHA solving, and fingerprinting that stays current with anti-bot updates automatically. You do not configure any of that. You send a URL and get back clean data.


Here is the same e-commerce page from the Puppeteer Real Browser tutorial, using Spidra's Node.js SDK:

npm install spidra-js

import { SpidraClient } from 'spidra-js';

const spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY });

const job = await spidra.scrape.run({
    urls: [{ url: 'https://www.scrapingcourse.com/ecommerce/' }],
    prompt: 'Extract all product names and prices',
    output: 'json',
});

console.log(job.result.content);

Output:

[
    { "name": "Abominable Hoodie", "price": "$69.00" },
    { "name": "Artemis Running Short", "price": "$45.00" }
]

Same data. No CSS selectors. No browser to launch. No fingerprints to manage.

Now the same request on the anti-bot challenge page that Puppeteer Real Browser could not get through:

const job = await spidra.scrape.run({
    urls: [{ url: 'https://www.scrapingcourse.com/antibot-challenge/' }],
    prompt: 'Extract the main heading',
    useProxy: true,
    proxyCountry: 'us',
});

console.log(job.result.content);
// { "heading": "You bypassed the Antibot challenge! :D" }

No configuration change between the open page and the protected one. The same request works on both. Because Spidra is a managed service, it stays current with anti-bot changes without you tracking library updates or applying patches.

Proxy usage is billed against your bandwidth quota separately, so there is no credit multiplier when bypass is needed.

Puppeteer Real Browser vs. Spidra

| Feature | Puppeteer Real Browser | Spidra |
|---|---|---|
| JavaScript rendering | Yes, via patched Puppeteer | Yes, real browser built in |
| Cloudflare bypass | Fails on JS challenges | Built in, automatic |
| DataDome / PerimeterX | Not reliable | Built in, automatic |
| Ghost Cursor / human-like mouse | Yes | Handled internally |
| Proxy rotation | Not built in | Built in, 50 countries |
| Actively maintained | No, discontinued | Yes |
| Stays current with anti-bot updates | Manual patches only | Managed by Spidra |
| Structured output | Raw HTML, you parse it | AI extraction, optional JSON schema |
| Language | Node.js | Node.js, Python, Go, PHP, Ruby, and more |
| Best for | Light scraping, basic fingerprint bypass | Protected sites, production pipelines |

Conclusion

Puppeteer Real Browser is a meaningful improvement over standard Puppeteer for sites that check surface-level automation signals. The drop-in replacement API means almost zero migration cost if you already use Puppeteer, and it does what it says on lighter targets.

The limits show up against real anti-bot protection, and the fact that the project is discontinued means those limits will only grow over time as detection systems evolve and the library does not keep up.

If you need reliable scraping on protected sites without maintaining the anti-bot layer yourself, Spidra handles that full stack automatically. The same code that works on open pages works on protected ones without any changes.

Get started free at spidra.io. No credit card required.

Frequently asked questions

Puppeteer Real Browser handles basic fingerprint checks and passes surface-level automation detection tests. It struggles with JavaScript-based challenges like those used by Cloudflare, DataDome, and PerimeterX. The anti-bot test in this tutorial shows it getting stuck on the challenge page even in non-headless GUI mode with all stealth options enabled.
