June 3, 2026 · 9 min read

How to scale web scraping with Playwright BrowserContext

Joel Olawanle

Running a separate Playwright browser instance for every URL you want to scrape is memory-intensive and does not scale. Each Chromium process is heavy. Spin up ten of them on the same machine and you are already pushing memory limits before your scraping logic runs.

Playwright's BrowserContext solves this by letting you run multiple isolated sessions inside a single browser process. Each context has its own cookies, local storage, and session state, but they all share one browser instance rather than launching separate processes.

In this tutorial you will learn what BrowserContext is, its advantages and limitations, how to use it for single and batched scraping, and how to add concurrency with asyncio.

What is Playwright BrowserContext?

A BrowserContext is an isolated browser session within a single Playwright browser instance. Think of it as a set of separate incognito-style profiles running inside the same browser. Each context is fully isolated from the others in terms of cookies, authentication state, and local storage, but they share the underlying browser process and its resources.

Each context can also run multiple pages. Pages inside the same context share that context's session data. Pages across different contexts are completely isolated from each other.

This makes BrowserContext the middle ground between fully isolated browser instances (expensive) and pages sharing a single context (no isolation).
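One way to see the isolation for yourself is to set a cookie in one context and list the cookies in another. The sketch below assumes an already-launched async Playwright `browser`; the helper names, the cookie name and value, and the example.com URL are placeholders for illustration.

```python
async def cookie_names(context):
    """Return the set of cookie names currently stored in a context."""
    return {cookie["name"] for cookie in await context.cookies()}


async def check_isolation(browser):
    """Set a cookie in context A and report what contexts A and B can each see."""
    context_a = await browser.new_context()
    context_b = await browser.new_context()
    try:
        # Placeholder cookie: only context A receives it.
        await context_a.add_cookies([
            {"name": "session", "value": "account-a", "url": "https://example.com"}
        ])
        return await cookie_names(context_a), await cookie_names(context_b)
    finally:
        await context_a.close()
        await context_b.close()
```

Called as `seen_a, seen_b = await check_isolation(browser)` inside an `async with async_playwright() as p:` block, the first set should contain the cookie and the second should come back empty.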

Advantages of BrowserContext

  • Memory efficiency. Multiple contexts share one browser process instead of spinning up separate Chromium instances. The memory cost per additional context is far lower than the cost per additional browser.
  • Session isolation. Each context has its own cookies and storage. You can run authenticated sessions for different accounts simultaneously, or keep sessions completely separate by design.
  • Fast to create. Creating a new context is much faster than launching a new browser. There is no process startup overhead.
  • Multiple pages per context. Each context can run as many pages as you need, all sharing that context's session state.
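To make the session-isolation point concrete, here is a minimal sketch of running several logged-in accounts side by side. It relies on Playwright's `storage_state` support for saving and restoring cookies and local storage; the helper names, account labels, and file paths are hypothetical.

```python
async def save_login_state(context, path):
    """Persist a logged-in context's cookies and local storage to a JSON file."""
    await context.storage_state(path=path)


async def open_account_contexts(browser, state_files):
    """Open one context per account, each restored from its own saved state.

    `state_files` maps an account label to a file written by save_login_state.
    """
    contexts = {}
    for account, path in state_files.items():
        contexts[account] = await browser.new_context(storage_state=path)
    return contexts
```

For example, `await open_account_contexts(browser, {"alice": "alice.json", "bob": "bob.json"})` would return two contexts that never see each other's cookies. Remember to close each context when the work is done.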

Limitations of BrowserContext

  • A browser crash takes everything down. Because all contexts share the same browser process, if that process crashes every context running inside it fails simultaneously. There is no isolation at the process level.
  • Memory still grows with scale. Sharing a browser process is more efficient than separate instances, but opening dozens of contexts still increases memory consumption on the host machine. At high enough numbers you hit the same performance wall as separate instances, just later.
  • Not ideal for fingerprint variation. Playwright lets you override some properties per context, such as user agent, viewport, locale, and time zone, but lower-level signals like the TLS fingerprint and the browser build are shared by every context in the same browser instance. If you need each worker to look like completely separate traffic, you need separate browser instances, not just separate contexts.
  • Higher detection risk on the same domain. Because contexts share the same browser process and fingerprint, scraping the same domain from multiple contexts looks like one machine opening many sessions. Anti-bot systems that analyze traffic at the fingerprint level can connect those sessions.
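Within those limits, `browser.new_context()` does accept per-context overrides such as `user_agent`, `viewport`, `locale`, and `timezone_id`, so surface-level properties can differ between workers. The sketch below cycles through a small pool of profiles; the `PROFILES` values and helper names are illustrative assumptions, not recommendations, and none of this changes lower-level signals such as the TLS fingerprint.

```python
# Hypothetical profile pool; every value here is only an example.
PROFILES = [
    {
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "viewport": {"width": 1366, "height": 768},
        "locale": "en-US",
        "timezone_id": "America/New_York",
    },
    {
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "viewport": {"width": 1440, "height": 900},
        "locale": "en-GB",
        "timezone_id": "Europe/London",
    },
]


def profile_for(worker_index):
    """Pick a profile for a worker, wrapping around when workers outnumber profiles."""
    return PROFILES[worker_index % len(PROFILES)]


async def new_worker_context(browser, worker_index):
    """Create a context whose surface-level properties come from the worker's profile."""
    return await browser.new_context(**profile_for(worker_index))
```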

How to use Playwright BrowserContext

Step 1: Single context with multiple pages

The simplest use of BrowserContext is opening several pages inside one context. Each page navigates independently but shares the same session:

# pip3 install playwright
# playwright install
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()

    # one context, three pages
    context = browser.new_context()

    page1 = context.new_page()
    page1.goto("https://www.scrapingcourse.com/ecommerce/")
    print(page1.title())

    page2 = context.new_page()
    page2.goto("https://www.scrapingcourse.com/ecommerce/page/2/")
    print(page2.title())

    page3 = context.new_page()
    page3.goto("https://www.scrapingcourse.com/ecommerce/page/3/")
    print(page3.title())

    context.close()
    browser.close()
# Output
Ecommerce Test Site to Learn Web Scraping - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 2 - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 3 - ScrapingCourse.com

All three pages ran inside one browser process using one context.
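For longer URL lists, the same idea can be written as a loop that closes each page once it has been read and always closes the context, even if a navigation fails. This is a sketch under the assumption that you pass in the `browser` launched above; `titles_in_one_context` is a hypothetical helper name.

```python
def titles_in_one_context(browser, urls):
    """Visit each URL in a single shared context and return the page titles."""
    context = browser.new_context()
    titles = []
    try:
        for url in urls:
            page = context.new_page()
            try:
                page.goto(url)
                titles.append(page.title())
            finally:
                # Close the page right away so open tabs do not pile up.
                page.close()
    finally:
        # Runs even when a navigation raises, so the context is never leaked.
        context.close()
    return titles
```

Inside the `with sync_playwright() as p:` block, `titles_in_one_context(browser, urls)` returns the titles in the same order as the input URLs.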

Step 2: Multiple contexts for URL batching

For larger URL lists, split them across multiple contexts so each context handles a batch. This keeps session state separate per batch and lets you add concurrency later:

from playwright.sync_api import sync_playwright

def process_batch(browser, urls):
    context = browser.new_context()
    for url in urls:
        page = context.new_page()
        page.goto(url)
        print(page.title())
    context.close()

with sync_playwright() as p:
    browser = p.chromium.launch()

    urls = [
        "https://www.scrapingcourse.com/ecommerce/",
        "https://www.scrapingcourse.com/ecommerce/page/2/",
        "https://www.scrapingcourse.com/ecommerce/page/3/",
        "https://www.scrapingcourse.com/ecommerce/page/4/",
        "https://www.scrapingcourse.com/ecommerce/page/5/",
    ]

    # split URLs roughly in half, one batch per context
    mid = len(urls) // 2
    process_batch(browser, urls[:mid])
    process_batch(browser, urls[mid:])

    browser.close()
# Output (sequential — batches run one after the other)
Ecommerce Test Site to Learn Web Scraping - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 2 - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 3 - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 4 - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 5 - ScrapingCourse.com

The batches still run sequentially here. To get them running at the same time, switch to async mode.
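Splitting at the midpoint only works for two batches. A small helper can generalize the split to any number of contexts; `split_into_batches` is a hypothetical name, and this sketch hands any leftover URLs to the first batches.

```python
def split_into_batches(urls, batch_count):
    """Split urls into at most batch_count contiguous batches of near-equal size."""
    if batch_count < 1:
        raise ValueError("batch_count must be at least 1")
    base, extra = divmod(len(urls), batch_count)
    batches = []
    start = 0
    for index in range(batch_count):
        # The first `extra` batches take one additional URL each.
        size = base + (1 if index < extra else 0)
        if size:
            batches.append(urls[start:start + size])
        start += size
    return batches
```

Each returned batch can then be passed to `process_batch(browser, batch)` in turn.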

Step 3: Concurrent contexts with asyncio

Switch to async_playwright and use asyncio.gather to run both context batches concurrently:

import asyncio
from playwright.async_api import async_playwright

async def process_batch(browser, urls):
    context = await browser.new_context()
    for url in urls:
        page = await context.new_page()
        await page.goto(url)
        print(await page.title())
    await context.close()

async def scraper():
    async with async_playwright() as p:
        browser = await p.chromium.launch()

        urls = [
            "https://www.scrapingcourse.com/ecommerce/",
            "https://www.scrapingcourse.com/ecommerce/page/2/",
            "https://www.scrapingcourse.com/ecommerce/page/3/",
            "https://www.scrapingcourse.com/ecommerce/page/4/",
            "https://www.scrapingcourse.com/ecommerce/page/5/",
        ]

        mid = len(urls) // 2

        # run both batches concurrently
        await asyncio.gather(
            process_batch(browser, urls[:mid]),
            process_batch(browser, urls[mid:]),
        )

        await browser.close()

asyncio.run(scraper())
# Output (unordered — batches run concurrently)
Ecommerce Test Site to Learn Web Scraping - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 3 - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 2 - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 4 - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 5 - ScrapingCourse.com

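The interleaved output order shows that the two batches make progress concurrently rather than waiting for each other. Because asyncio runs on a single thread, the gain comes from overlapping the time each page spends waiting on the network, not from running Python code in parallel.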
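With more than a handful of batches, it usually helps to cap how many contexts are open at once. One common way to do that is an `asyncio.Semaphore`. The sketch below assumes an already-launched async `browser`; `run_batches` and `max_contexts` are illustrative names, and this version of `process_batch` returns titles instead of printing them.

```python
import asyncio


async def process_batch(browser, urls, limit):
    """Scrape one batch in its own context, holding a slot from `limit` while it runs."""
    async with limit:
        context = await browser.new_context()
        try:
            titles = []
            for url in urls:
                page = await context.new_page()
                await page.goto(url)
                titles.append(await page.title())
                await page.close()
            return titles
        finally:
            await context.close()


async def run_batches(browser, batches, max_contexts=4):
    """Run every batch, with at most `max_contexts` contexts open at the same time."""
    limit = asyncio.Semaphore(max_contexts)
    return await asyncio.gather(
        *(process_batch(browser, batch, limit) for batch in batches)
    )
```

`asyncio.gather` returns results in the order the batches were passed in, so `await run_batches(browser, [urls[:mid], urls[mid:]], max_contexts=2)` yields one list of titles per batch even though the work itself is interleaved.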

Step 4: Extract actual data

Printing page titles confirms the structure works. Here is the same async setup doing real product extraction:

import asyncio
from playwright.async_api import async_playwright

async def scrape_products(browser, urls):
    context = await browser.new_context()
    results = {}

    for url in urls:
        page = await context.new_page()
        await page.goto(url, wait_until="networkidle")

        products = await page.eval_on_selector_all(
            ".product",
            """items => items.map(item => ({
                name:  item.querySelector('.product-name')?.innerText || '',
                price: item.querySelector('.price')?.innerText || '',
            }))"""
        )
        results[url] = products
        await page.close()

    await context.close()
    return results

async def scraper():
    async with async_playwright() as p:
        browser = await p.chromium.launch()

        urls = [
            "https://www.scrapingcourse.com/ecommerce/",
            "https://www.scrapingcourse.com/ecommerce/page/2/",
            "https://www.scrapingcourse.com/ecommerce/page/3/",
            "https://www.scrapingcourse.com/ecommerce/page/4/",
        ]

        mid = len(urls) // 2

        batch1, batch2 = await asyncio.gather(
            scrape_products(browser, urls[:mid]),
            scrape_products(browser, urls[mid:]),
        )

        all_results = {**batch1, **batch2}
        for url, products in all_results.items():
            print(f"\n{url}")
            for product in products[:2]:
                print(product)

        await browser.close()

asyncio.run(scraper())
# Output (first two URLs shown)
https://www.scrapingcourse.com/ecommerce/
{'name': 'Abominable Hoodie', 'price': '$69.00'}
{'name': 'Adrienne Trek Jacket', 'price': '$57.00'}

https://www.scrapingcourse.com/ecommerce/page/2/
{'name': 'Beaumont Summit Kit', 'price': '$36.00'}
{'name': 'Breathe-Easy Tank', 'price': '$34.00'}
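In the version above, a single timeout or navigation error aborts the whole batch. A minimal way around that is to capture failures per URL and keep going. In this sketch, `scrape_url`, `scrape_batch`, and `timeout_ms` are hypothetical names, the `.product-name` selector is carried over from the example above, and the broad `except` is a deliberate simplification.

```python
async def scrape_url(context, url, timeout_ms=15_000):
    """Scrape one URL and return a record; failures are captured instead of raised."""
    page = await context.new_page()
    try:
        await page.goto(url, timeout=timeout_ms, wait_until="domcontentloaded")
        names = await page.eval_on_selector_all(
            ".product-name", "nodes => nodes.map(node => node.innerText.trim())"
        )
        return {"url": url, "ok": True, "names": names}
    except Exception as exc:  # broad on purpose: timeouts and navigation errors alike
        return {"url": url, "ok": False, "error": str(exc)}
    finally:
        await page.close()


async def scrape_batch(browser, urls):
    """Scrape a batch in one context, returning one record per URL."""
    context = await browser.new_context()
    try:
        return [await scrape_url(context, url) for url in urls]
    finally:
        await context.close()
```

Records with `"ok": False` can be collected afterwards and retried, rather than rerunning the entire job.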

Where BrowserContext hits its limits

The async multi-context setup works well. Its limits are structural: they come from running everything in one browser process on one machine, not from flaws in the code above.

  • Memory still accumulates. Each additional context, and each additional page inside a context, adds memory. At high enough concurrency you hit the same performance wall as launching separate browser instances, just with a higher ceiling. As a rough guide, a server with 8 GB of RAM can run around 10 to 20 Chromium contexts before performance degrades noticeably, though the real number depends on how heavy the target pages are.
  • It still runs on one machine. Multiple contexts in one browser process, or even multiple browser instances on one server, is still a single point of failure. No redundancy. No horizontal scaling. If the process crashes or the server goes down, your entire scraping job stops.
  • Anti-bot bypass is entirely separate. BrowserContext handles concurrency and session management. It does not handle Cloudflare, DataDome, proxy rotation, or CAPTCHA solving. Every protected site your contexts encounter is still your problem to handle.
  • You are still writing and maintaining selectors. The data extraction layer is still raw Playwright. You write the CSS selectors, handle missing elements, normalize formatting differences between pages, and update your parsing logic when the target site changes.
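As an illustration of that last point, even a simple field like price needs cleanup before it is usable. The sketch below is one possible normalization step for the product dictionaries returned in Step 4. `parse_price` and `normalize_product` are hypothetical helpers, and the parser assumes a dot as the decimal separator.

```python
import re
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Convert a price string such as '$1,299.00' to a Decimal, or None if unparseable."""
    # Keep digits and dots only; currency symbols and thousands commas are dropped.
    cleaned = re.sub(r"[^\d.]", "", text or "")
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def normalize_product(raw):
    """Trim whitespace from the name and parse the price of one scraped product."""
    return {
        "name": (raw.get("name") or "").strip(),
        "price": parse_price(raw.get("price")),
    }
```

Missing elements come back from the page as empty strings, which this step turns into an empty name and a `None` price instead of raising.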

For scraping a moderate number of known URLs in a controlled environment, BrowserContext with asyncio is clean and effective. When the URL list grows, the sites become protected, or you need the job to keep running reliably over time, these constraints accumulate.
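A partial mitigation for the shared-process risk is to relaunch the browser and rerun the job when it fails. The sketch below is one way to do that, with hypothetical names (`run_with_relaunch`, `launch_browser`, `job`). It retries the whole job rather than resuming where it stopped, and it does nothing for a failure of the machine itself.

```python
async def run_with_relaunch(launch_browser, job, max_attempts=3):
    """Run `job(browser)`; on failure, launch a new browser and try again.

    `launch_browser` is a coroutine function that returns a new browser.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        browser = await launch_browser()
        try:
            return await job(browser)
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
            print(f"attempt {attempt} failed: {exc}")
        finally:
            # A crashed browser is already disconnected, so only close live ones.
            if browser.is_connected():
                await browser.close()
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error
```

Inside an `async with async_playwright() as p:` block, this could be called as `await run_with_relaunch(p.chromium.launch, job)`, where `job` is any coroutine function that takes a browser.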

Scaling beyond BrowserContext with Spidra

The natural ceiling with BrowserContext is your hardware. The alternative is to move the browser infrastructure, anti-bot handling, and data extraction off your machine entirely.

Spidra's batch endpoint processes up to 50 URLs in parallel per request on cloud infrastructure. Every URL runs in a real browser with residential proxy rotation across 50 countries, automatic CAPTCHA solving, and AI-powered structured extraction. You do not manage any browser sessions or parse any HTML.

Here is the same multi-page product scraping task using Spidra's Python SDK:

# pip install spidra
import os

from spidra import SpidraClient, BatchScrapeParams

spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])

urls = [
    "https://www.scrapingcourse.com/ecommerce/",
    "https://www.scrapingcourse.com/ecommerce/page/2/",
    "https://www.scrapingcourse.com/ecommerce/page/3/",
    "https://www.scrapingcourse.com/ecommerce/page/4/",
    "https://www.scrapingcourse.com/ecommerce/page/5/",
]

batch = spidra.batch.run_sync(BatchScrapeParams(
    urls=urls,
    prompt="Extract all product names and prices",
    output="json",
))

for item in batch.items:
    if item.status == "completed":
        print(f"\n{item.url}")
        for product in item.result.content[:2]:
            print(product)
    else:
        print(f"Failed: {item.url} — {item.error}")
# Output (first two URLs shown)
https://www.scrapingcourse.com/ecommerce/
{'name': 'Abominable Hoodie', 'price': '$69.00'}
{'name': 'Adrienne Trek Jacket', 'price': '$57.00'}

https://www.scrapingcourse.com/ecommerce/page/2/
{'name': 'Beaumont Summit Kit', 'price': '$36.00'}
{'name': 'Breathe-Easy Tank', 'price': '$34.00'}

All 5 URLs run in parallel in the cloud. No browser to launch. No contexts to manage. No selectors to write or maintain. Each item in the response has its own status, so partial failures are visible and can be handled individually.

For protected pages, add use_proxy=True and anti-bot bypass is automatic:

batch = spidra.batch.run_sync(BatchScrapeParams(
    urls=urls,
    prompt="Extract all product names and prices",
    output="json",
    use_proxy=True,
    proxy_country="us",
))

For jobs where you do not know all the URLs upfront, the crawl endpoint discovers and processes pages automatically:

from spidra import SpidraClient, CrawlParams, PollOptions

job = spidra.crawl.run_sync(
    CrawlParams(
        base_url="https://www.scrapingcourse.com/ecommerce/",
        crawl_instruction="Follow all paginated product pages",
        transform_instruction="Extract all product names and prices from each page",
        max_pages=20,
    ),
    PollOptions(timeout=300),
)

for page in job.result:
    print(page.url, page.data)

BrowserContext vs. Spidra batch

| Feature | Playwright BrowserContext | Spidra Batch |
| --- | --- | --- |
| Concurrency | asyncio within one browser process | Parallel cloud workers |
| Max scale | Limited by machine RAM | Up to 50 URLs per request |
| Browser infrastructure | You manage locally | Fully managed in cloud |
| Anti-bot bypass | Not included | Built in, automatic |
| Proxy rotation | Not included | Built in, 50 countries |
| Session isolation | Per context | Per request |
| Single point of failure | Yes, process crash kills all | No, cloud infrastructure |
| Structured output | Raw HTML, you write parsers | AI extraction, optional schema |
| Best for | Medium-scale scraping, open pages | Protected sites, large-scale pipelines |

Conclusion

Playwright's BrowserContext is the right tool when you need to run multiple isolated sessions without the memory overhead of launching a separate browser per session. Running the batches through asyncio.gather adds concurrency cleanly, and the structure scales reasonably well up to the limits of your server's RAM.

Those limits show up when the URL list grows large, the targets are bot-protected, or you need the pipeline to keep running reliably without manual intervention. BrowserContext does not help with anti-bot bypass, proxy rotation, or data extraction. Those are all still your responsibility.

Spidra's batch endpoint handles all of it in the cloud so you can process more URLs with less code and no infrastructure to maintain.

Get started free at spidra.io. No credit card required.

Frequently asked questions

What is the difference between a browser instance and a BrowserContext?

A browser instance is a full Chromium process. A BrowserContext is an isolated session inside that process with its own cookies and local storage. Multiple contexts share one browser process, making them significantly cheaper to create than additional browser instances while still providing session isolation between them.
