May 26, 2026 · 7 min read

Web scraping with Scrapling: 2026 tutorial

Joel Olawanle

Website redesigns break scrapers. A class name changes, a layout shifts, a new wrapper div appears, and your carefully written selectors start returning nothing.

Scrapling is a Python library that tries to solve this with an adaptive selector feature that tracks elements across structure changes, so you do not have to manually update your code every time a site updates its HTML.

In this tutorial, you will learn how Scrapling works, how to use it to extract product data from a real e-commerce page, and how its stealth mode handles bot-protected sites.

What is Scrapling?


Scrapling is an open-source Python web scraping library built around adaptive selectors. If you save an element you have targeted and the site later changes its structure, Scrapling can use a similarity algorithm to find the same element in its new location, without you updating the selector.
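Scrapling's real fingerprinting and scoring logic is more involved and is not reproduced here. As a rough illustration of the general idea only, the sketch below stores a few properties of an element (tag, class, text, ancestor path) and, when the original selector stops matching, picks the candidate whose properties are most similar. The Element class, the equal weighting of scores, and the 0.5 threshold are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Element:
    """Hypothetical, simplified stand-in for a parsed HTML element."""
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    path: tuple = ()  # ancestor tag names, root first


def ratio(a: str, b: str) -> float:
    """String similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def similarity(saved: Element, candidate: Element) -> float:
    """Average a few per-property scores into one number between 0.0 and 1.0."""
    scores = [
        1.0 if saved.tag == candidate.tag else 0.0,
        ratio(saved.attrs.get("class", ""), candidate.attrs.get("class", "")),
        ratio(saved.text, candidate.text),
        ratio("/".join(saved.path), "/".join(candidate.path)),
    ]
    return sum(scores) / len(scores)


def relocate(saved: Element, candidates: list, threshold: float = 0.5):
    """Return the candidate most similar to the saved fingerprint, or None."""
    best = max(candidates, key=lambda c: similarity(saved, c), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best


# Fingerprint saved while the old selector still matched
saved = Element("h2", {"class": "product-name"}, "Abominable Hoodie",
                ("html", "body", "ul", "li"))

# After a hypothetical redesign: class renamed, extra wrapper div added
candidates = [
    Element("span", {"class": "price"}, "$69.00", ("html", "body", "div", "ul", "li")),
    Element("h2", {"class": "product-title"}, "Abominable Hoodie",
            ("html", "body", "div", "ul", "li")),
]

match = relocate(saved, candidates)
print(match.attrs["class"])  # product-title
```

The renamed heading still wins because its tag, text, and most of its path agree with the saved fingerprint, even though the class no longer matches exactly.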

Beyond that, it handles standard HTML parsing with chainable selectors, supports JavaScript rendering through Playwright, and has a built-in stealth mode for bypassing basic anti-bot checks.

Key features

  • Adaptive parsing. The core differentiator. Scrapling saves a fingerprint of the elements you select, and when a site changes a class name or restructures its DOM, it uses that fingerprint to relocate them so the same extraction logic keeps working.
  • Stealth mode. Built-in bypass for Cloudflare Turnstile. Works on basic challenges by switching on a single option (solve_cloudflare=True).
  • Session management. Persist a single session across multiple requests, useful for login flows or reusing a bypassed session.
  • JavaScript rendering. Playwright integration for pages that require a real browser to load their content.
  • Selector chaining. Select one element and Scrapling can find the other structurally similar elements for you, with no manual loops needed.
  • Built-in regex. Pattern matching directly within element parsing, useful for pulling specific values out of mixed HTML content.
  • Command-line tool. Scrape and write output to files directly from the terminal without writing a script.
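The built-in regex bullet is easiest to picture with the pattern Step 2 uses for prices. The sketch below runs that same pattern with Python's standard re module (not Scrapling's own .re() method) against a hypothetical snippet of WooCommerce-style price markup; the live page's markup may differ.

```python
import re

# Hypothetical WooCommerce-style price markup: the currency symbol sits in
# its own <span>, and the amount follows directly after that span closes.
price_html = (
    '<span class="woocommerce-Price-amount amount"><bdi>'
    '<span class="woocommerce-Price-currencySymbol">$</span>69.00'
    "</bdi></span>"
)

# Same pattern as Step 2: capture the digits, dots, and commas after a </span>
PRICE_PATTERN = re.compile(r"</span>([\d.,]+)")

matches = PRICE_PATTERN.findall(price_html)
print(matches)  # ['69.00']
```

The final </span> in the snippet is not followed by digits, so only the amount after the currency symbol is captured.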

Web scraping with Scrapling

You will extract product names, prices, and image URLs from an e-commerce test page. Here is everything you need to get started.

Prerequisites

  • Python 3.10 or later
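  • Scrapling with all optional extras, plus the browsers its stealth and browser-based fetchers use:
pip3 install "scrapling[all]"
scrapling install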

Step 1: Get the page HTML

Start simple. Confirm Scrapling can fetch the target page and return its HTML before doing any extraction.

# pip3 install "scrapling[all]"
from scrapling.fetchers import Fetcher

# fetch the webpage
page = Fetcher.get("https://www.scrapingcourse.com/ecommerce/")

# print the raw HTML
print(page.html_content)
# Output (abridged)
<!DOCTYPE html>
<html lang="en-US">
<head>
    <title>Ecommerce Test Site to Learn Web Scraping - ScrapingCourse.com</title>
</head>
<body>
    <p class="woocommerce-result-count">Showing 1-16 of 188 results</p>
    <ul class="products columns-4">
        <!-- products -->
    </ul>
</body>
</html>

Scrapling reached the page and returned the full HTML. Now extract the specific fields you need.

Step 2: Extract product data with the adaptive feature

Inspecting the page in DevTools shows:

  • Product name is inside .product-name
  • Price is inside .price
  • Product image is in an <img> tag inside .woocommerce-LoopProduct-link

Enable adaptive mode on the fetcher, then pass auto_save=True into page.css(). Scrapling saves a fingerprint of each matched element so it can find the element again even if the selector stops matching after a future redesign:

# pip3 install "scrapling[all]"
from scrapling.fetchers import Fetcher

# enable adaptive mode; without it, Scrapling ignores auto_save
Fetcher.adaptive = True

page = Fetcher.get("https://www.scrapingcourse.com/ecommerce/")

# select elements with auto_save to enable adaptive tracking
names  = page.css(".product-name", auto_save=True)
prices = page.css(".price", auto_save=True)
images = page.css(".woocommerce-LoopProduct-link img", auto_save=True)

products = []

for name, price, image in zip(names, prices, images):
    # use built-in regex to pull the numeric price from mixed HTML
    price_match = price.html_content.re(r"</span>([\d.,]+)")

    products.append({
        "name":  name.text,
        "price": f"${price_match[0]}",
        "image": image.attrib["src"],
    })

print(products)
# Output
[
    {
        "name": "Abominable Hoodie",
        "price": "$69.00",
        "image": "https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh09-blue_main.jpg"
    },
    {
        "name": "Artemis Running Short",
        "price": "$45.00",
        "image": "https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/wsh04-black_main.jpg"
    },
    # ... rest of results
]

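You have the product data. The auto_save=True calls store a fingerprint of each matched element. If the site later renames .product-name to .product-title, the original selector would simply match nothing; running the same code with adaptive=True in place of auto_save=True tells Scrapling to look for the closest match to the saved fingerprint instead, so you do not have to rewrite the selector.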

Step 3: Scrapling stealth mode for Cloudflare bypass

Scrapling has a StealthySession class for bypassing Cloudflare Turnstile. Import it, create a session with solve_cloudflare=True, and request the protected page:

# pip3 install "scrapling[all]"
from scrapling.fetchers import StealthySession

session = StealthySession(headless=True, solve_cloudflare=True)

with session:
    page = session.fetch("https://www.scrapingcourse.com/cloudflare-challenge/")
    print(page.html_content)
# Output (excerpt)
<h2>You bypassed the Cloudflare challenge! :D</h2>

That works. The session handles the Turnstile challenge and returns the real page content. For basic Cloudflare protection on low-volume scraping, this does the job.

The limitations of Scrapling

Scrapling works well for what it is: a Python library you run yourself. The constraints below are worth understanding before you build a production pipeline on it.

  • Stealth coverage is narrow. The built-in bypass targets Cloudflare Turnstile specifically. It does not reliably handle DataDome, PerimeterX, Akamai, or more advanced Cloudflare configurations. High request rates can still get you blocked even on supported targets. And because Scrapling is open source, anti-bot vendors can study its fingerprint and update their detection to block it over time.
  • No proxy infrastructure. Scrapling has no built-in proxy rotation or geo-targeting. Sourcing proxies, building rotation logic, tracking failure rates per IP, and replacing burned exits are all things you manage yourself. At any meaningful volume, that becomes a significant side project.
  • Browser rendering is resource-heavy. JavaScript scraping requires a full Playwright instance. Each browser process uses 200 to 400 MB of RAM. Running many concurrent sessions means real memory and CPU overhead, which limits how much you can parallelize before hardware becomes the bottleneck.
  • DIY architecture. Scrapling is a library. There is no managed infrastructure. Concurrency limits, retries, failure handling, and scaling are things you design and build yourself.
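To give a sense of the proxy work described above, here is a minimal sketch of round-robin rotation that retires a proxy after repeated failures. ProxyPool is a hypothetical helper, not part of Scrapling, and the proxy URLs are placeholders; you would pass whatever next_proxy() returns to the fetcher you use and call report() with the outcome.

```python
from collections import defaultdict, deque


class ProxyPool:
    """Hypothetical round-robin proxy pool that retires failing exits."""

    def __init__(self, proxies, max_failures=3):
        self._queue = deque(proxies)
        self.max_failures = max_failures
        self.failures = defaultdict(int)  # consecutive failures per proxy

    def next_proxy(self):
        if not self._queue:
            raise RuntimeError("no working proxies left")
        proxy = self._queue[0]
        self._queue.rotate(-1)  # move it to the back for round-robin order
        return proxy

    def report(self, proxy, ok):
        if ok:
            self.failures[proxy] = 0  # a success resets the counter
            return
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.max_failures and proxy in self._queue:
            self._queue.remove(proxy)  # retire the "burned" exit

    def __len__(self):
        return len(self._queue)


pool = ProxyPool(["http://proxy-a:8080", "http://proxy-b:8080"], max_failures=2)
first = pool.next_proxy()   # http://proxy-a:8080
second = pool.next_proxy()  # http://proxy-b:8080
pool.report(first, ok=False)
pool.report(first, ok=False)  # second failure in a row: proxy-a is retired
print(len(pool), pool.next_proxy())  # 1 http://proxy-b:8080
```

A production version would also need geo-targeting, health checks before re-adding proxies, and persistence across runs, which is the "significant side project" the list refers to.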

These are not reasons to avoid Scrapling for the right use cases. For smaller jobs, internal tooling, or scraping sites you control, it is a capable choice. The limitations surface when you need to run reliably at volume against sites that are actively trying to stop you.
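The concurrency, retry, and memory constraints listed above are things you end up writing yourself. As a minimal sketch in plain asyncio (no Scrapling calls), the code below sizes a concurrency cap from a RAM budget using the 400 MB upper estimate per browser, limits in-flight fetches with a semaphore, and retries failures with exponential backoff. The 8 GB budget, retry settings, and flaky_fetch coroutine are illustrative assumptions, and real code would catch narrower exception types.

```python
import asyncio
import random


def max_browsers(available_ram_mb: int, per_browser_mb: int) -> int:
    """How many browser processes fit in a RAM budget."""
    return available_ram_mb // per_browser_mb


async def with_retries(make_call, attempts=3, base_delay=0.5):
    """Await make_call(), retrying with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the error
            delay = base_delay * 2 ** attempt
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


async def crawl(urls, fetch, concurrency, base_delay=0.5):
    """Run fetch(url) for every URL, with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def one(url):
        async with sem:
            return await with_retries(lambda: fetch(url), base_delay=base_delay)

    # return_exceptions=True puts a URL's final error in its result slot
    # instead of raising the first one to the caller
    return await asyncio.gather(*(one(u) for u in urls), return_exceptions=True)


# 400 MB per browser against an assumed 8 GB (8192 MB) budget
limit = max_browsers(8192, 400)
print(limit)  # 20

calls = {}


async def flaky_fetch(url):
    """Hypothetical stand-in for a real page fetch: fails once per URL."""
    calls[url] = calls.get(url, 0) + 1
    if calls[url] == 1:
        raise ConnectionError("transient failure")
    return f"<html>{url}</html>"


results = asyncio.run(crawl(["a", "b", "c"], flaky_fetch, concurrency=2, base_delay=0.1))
print(results)
```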

Going beyond Scrapling's limits

When the target site is protected, when you need to run at scale, or when you want clean structured data instead of raw HTML that still needs parsing, the missing pieces in Scrapling's stack add up quickly. You need proxy infrastructure, a more complete anti-bot bypass, and a way to extract structured fields without writing and maintaining selectors.

Spidra is built to handle that full stack. It runs every request through a real headless browser, rotates residential proxies across 50 countries automatically, solves CAPTCHAs, and instead of returning raw HTML for you to parse, it extracts exactly what you describe in plain English and returns clean, structured JSON.

Here is the same e-commerce page you scraped with Scrapling, using Spidra instead:

pip install spidra
from spidra import SpidraClient, ScrapeParams, ScrapeUrl
import os

spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])

job = spidra.scrape.run_sync(ScrapeParams(
    urls=[ScrapeUrl(url="https://www.scrapingcourse.com/ecommerce/")],
    prompt="Extract all product names, prices, and image URLs",
    output="json",
))

print(job.result.content)
# Output
[
    {
        "name": "Abominable Hoodie",
        "price": "$69.00",
        "image": "https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh09-blue_main.jpg"
    },
    {
        "name": "Artemis Running Short",
        "price": "$45.00",
        "image": "https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/wsh04-black_main.jpg"
    }
]

Same data. No CSS selectors. No regex. No HTML parsing. The site structure can change, and the extraction keeps working because it is based on what the content means, not where it sits in the DOM.

Now point a similar request at the anti-bot protected page:

job = spidra.scrape.run_sync(ScrapeParams(
    urls=[ScrapeUrl(url="https://www.scrapingcourse.com/antibot-challenge/")],
    prompt="Extract the main heading",
    use_proxy=True,
    proxy_country="us",
))

print(job.result.content)
# { "heading": "You bypassed the Antibot challenge! :D" }

Apart from the prompt and switching on the proxy (use_proxy=True with proxy_country="us"), the request has the same shape for the open page and the protected one. Spidra handles Cloudflare and other anti-bot systems automatically, and because it is a managed service rather than an open-source library, it stays ahead of anti-bot updates without you doing anything.

Proxy usage is billed separately against your bandwidth quota, so there is no credit multiplier when a bypass is needed.

Scrapling vs. Spidra

| Feature | Scrapling | Spidra |
|---|---|---|
| Adaptive to site changes | Yes, similarity algorithm | Yes, AI reads content, not DOM position |
| Anti-bot bypass | Cloudflare Turnstile only | Cloudflare and more |
| Proxy rotation | Not built in | Built in, 50 countries |
| JavaScript rendering | Yes, via Playwright | Yes, real browser built in |
| Structured output | You write the parser | AI extraction, optional JSON schema |
| Maintenance as anti-bots evolve | You keep it updated | Handled by Spidra |
| Language support | Python only | Python, Node.js, Go, PHP, Ruby, and more |
| Best for | Smaller jobs, selector resilience | Production pipelines, protected sites, scale |

Conclusion

Scrapling is a well-built library, and the adaptive selector feature solves a real problem. If you are running smaller scraping jobs and selector maintenance has been your main pain point, it is worth using.

When you start hitting protected sites, need to run at volume, or want structured data without writing a parser, the gaps in Scrapling's stack mean building a lot of supporting infrastructure yourself. Spidra handles that full stack automatically, and you can start with the same Python code you already know.

Get started free at spidra.io. No credit card required.

Frequently asked questions

Can Scrapling bypass Cloudflare?

For basic Cloudflare Turnstile challenges, it can work. For more advanced configurations, higher request volumes, or other anti-bot systems, it is not reliable. Scrapling is also open source, which means anti-bot vendors can study it and update their detection to block it over time.
