May 25, 2026 · 10 min read

How to bypass IP bans when web scraping in 2026

Joel Olawanle

IP bans rarely announce themselves at the start of a scraping job. They show up in the middle of a run that was working fine five minutes ago.

Pages that returned 200s start coming back as 403 Forbidden or 429 Too Many Requests. Your success rate drops. Retries pile up. The scraper grinds to a halt before you finish collecting what you needed.

This guide covers why IP bans happen, what you can do about them yourself, and when it makes more sense to let a managed API handle it for you.

Why scrapers get IP banned

Sites run automated filters that watch request volume, traffic patterns, header signatures, and where traffic is coming from. When your requests cross a threshold, the site starts throttling, challenging, or outright blocking the IP.

Early blocks are usually temporary, but keep triggering them and they escalate into longer or permanent bans. Here is what causes them.

Rate limits

Most sites cap how many requests a single IP can send within a given time window. Cross that cap and you start seeing 429 Too Many Requests, often with a Retry-After header telling you how long to wait, either as a number of seconds or as an HTTP date. Ignore it and keep sending requests at the same rate, and the site escalates from throttling to blocking.
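As a rough illustration of reading that header, the sketch below converts a Retry-After value into a wait time in seconds. The function name retry_after_seconds and the 60-second fallback are assumptions made for this example, not part of any library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=60.0):
    """Turn a Retry-After header value into a number of seconds to wait."""
    if not header_value:
        return default  # header missing: fall back to an assumed default
    value = header_value.strip()
    if value.isdigit():
        # delay-in-seconds form, e.g. "120"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2026 07:28:00 GMT"
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable value: fall back to the default
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # a date in the past means there is nothing left to wait for
    return max(0.0, (retry_at - now).total_seconds())
```

With the requests library, the value would come from something like response.headers.get("Retry-After").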

Bot and anomaly detection

Some defenses do not wait for high volume. They fingerprint the browser, check behavioral signals, analyze TLS handshakes, and look for patterns that do not match how a real user would interact with the page. When the traffic looks automated, the site responds with a challenge or a block before you ever hit a rate limit.

IP reputation

Some sites skip all of that and just block based on where the IP comes from. Data center IP ranges are well documented, and many sites deny them outright. Shared proxy pools carry the reputation of everyone who has used them before you. If previous users triggered bans on that IP, you inherit the problem.

Geo restrictions

Some targets deny access based on the country the IP resolves to. This often happens before your request even reaches the site's backend. You see redirects, block pages, or consistent denials on URLs that should work, and the only variable is your location.

How to bypass IP bans when web scraping

There are two paths. You can build the defenses into your own scraper and maintain them as targets change. Or you can use a managed scraping API that handles it automatically. The self-managed path works at small scale. It becomes a maintenance job at larger scale.

Here are the techniques, starting with what you can do yourself.

1. Use rotating proxies

Rotating proxies are the most direct solution when bans are triggered by a single IP sending too many requests. Each request goes through a different exit IP, so no single address accumulates enough traffic to trigger a ban.

The rotation strategy depends on what you are scraping. For independent pages, rotate per request so every fetch uses a fresh IP. For sessions that require cookies, like pagination or login flows, use sticky sessions that keep the same IP for the duration of the session and rotate only when the session ends.

Pool size matters. If you cycle through a small list of proxies too quickly, you hit per-IP limits and start seeing blocks again. Track failure rates per proxy. Any IP consistently returning 403s or block pages should be retired from the pool immediately so retries do not waste time on burned exits.

Here is a basic rotating proxy setup in Python:

import requests
from itertools import cycle

URL = "https://httpbin.io/ip"
TIMEOUT = 15

# load your proxy list, one IP:PORT per line
with open("proxies.txt", "r", encoding="utf-8") as f:
    proxies_list = [line.strip() for line in f if line.strip()]

proxy_pool = cycle(proxies_list)

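# send three requests, each through the next proxy in the cycle
for _ in range(3):
    proxy = next(proxy_pool)
    # both schemes point at the same HTTP proxy; HTTPS traffic is tunneled through it
    proxies = {
        "http": f"http://{proxy}",
        "https": f"http://{proxy}"
    }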

    try:
        r = requests.get(URL, proxies=proxies, timeout=TIMEOUT)
        r.raise_for_status()
        print(r.text)
    except requests.RequestException as e:
        print(f"Proxy failed ({proxy}): {e}")
# Output
{ "origin": "195.158.8.123:3128" }
{ "origin": "156.246.90.81:80" }
{ "origin": "82.115.60.51:80" }

The origin changes on every request. That confirms rotation is working. Note that free proxy lists are unreliable. For real scraping operations, use premium residential proxies.
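The advice about tracking failure rates and retiring burned exits could look roughly like the sketch below. The ProxyPool class, its method names, and the threshold of three consecutive blocks are all hypothetical choices for illustration. What counts as "blocked" (a 403, a block page) is left to the caller.

```python
import random
from collections import defaultdict


class ProxyPool:
    """Tracks consecutive blocks per proxy and retires exits that keep failing."""

    def __init__(self, proxies, max_failures=3):
        self.active = list(proxies)
        self.retired = []
        self.max_failures = max_failures  # assumed threshold, tune per target
        self.failures = defaultdict(int)

    def pick(self):
        """Return a random proxy that has not been retired."""
        if not self.active:
            raise RuntimeError("proxy pool exhausted")
        return random.choice(self.active)

    def report(self, proxy, blocked):
        """Record the outcome of one request made through the given proxy."""
        if not blocked:
            # a success resets the streak so a one-off block does not retire a healthy exit
            self.failures[proxy] = 0
            return
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.max_failures and proxy in self.active:
            self.active.remove(proxy)
            self.retired.append(proxy)
```

In a scraper loop, you would call pick() before each request and report() after it, for example report(proxy, blocked=(r.status_code == 403)).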

2. Slow down requests and handle rate limits

Most IP bans start as throttling. If you keep sending the same volume after hitting a 429, the site escalates from temporary throttling to a longer block. Controlling your request rate is the simplest way to prevent that escalation.

Set a per-domain concurrency limit so no single site receives too many parallel requests at once. When you hit a 429, back off with exponential delays and add jitter so retries do not line up into another burst. If the response includes a Retry-After header, respect it exactly. Cap retries per URL and add a cooldown for URLs that keep failing so you do not drain your proxy pool on a single blocked endpoint.
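One possible way to enforce a per-domain concurrency limit in a threaded scraper is a semaphore per host. This is a minimal sketch: the DomainLimiter name and the default of two parallel requests per domain are assumptions for the example.

```python
import threading
from collections import defaultdict
from urllib.parse import urlsplit


class DomainLimiter:
    """Hands out one bounded semaphore per host to cap parallel requests to it."""

    def __init__(self, max_per_domain=2):
        self._lock = threading.Lock()
        self._semaphores = defaultdict(
            lambda: threading.BoundedSemaphore(max_per_domain)
        )

    def slot(self, url):
        """Return the semaphore guarding the host of the given URL."""
        host = urlsplit(url).netloc.lower()
        # the lock keeps two threads from creating separate semaphores for one host
        with self._lock:
            return self._semaphores[host]
```

A worker thread would wrap each fetch in `with limiter.slot(url):` so it blocks while the domain is already at its limit, and other domains are unaffected.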

Deduplicating URLs before you crawl and caching stable pages also helps. Even a basic in-memory cache cuts traffic significantly during retry-heavy runs.
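A minimal sketch of both ideas follows. The helper names (normalize_url, dedupe, PageCache) are hypothetical, and the normalization is deliberately simple: it lowercases the scheme and host and drops the fragment, and it assumes URLs carry no embedded credentials.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase scheme and host and drop the fragment so trivial variants compare equal."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def dedupe(urls):
    """Return normalized URLs in first-seen order with duplicates removed."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


class PageCache:
    """In-memory cache keyed by normalized URL; fetch is only called on a miss."""

    def __init__(self, fetch):
        self.fetch = fetch  # any callable that takes a URL and returns the page body
        self.pages = {}
        self.hits = 0

    def get(self, url):
        key = normalize_url(url)
        if key in self.pages:
            self.hits += 1
        else:
            self.pages[key] = self.fetch(key)
        return self.pages[key]
```

The cache never expires entries, so it only suits pages that stay stable for the length of a run.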

# basic pacing demo: fixed delay between requests, one cooldown on a throttle or block response
import time
import requests

url = "https://www.cloudflare.com/rate-limit-test/"

max_requests = 20
seconds_between = 1
cooldown_seconds = 65

session = requests.Session()

for i in range(1, max_requests + 1):
    r = session.get(url, timeout=30)
    print(f"req {i}/{max_requests} -> {r.status_code}")

    if r.status_code in (429, 403, 503):
        retry_after = r.headers.get("Retry-After", "")
        wait = int(retry_after) if retry_after.isdigit() else cooldown_seconds
        print(f"{r.status_code} received. Cooling down for {wait}s")
        time.sleep(wait)
        break

    time.sleep(seconds_between)

# check if access is restored after cooldown
r = session.get(url, timeout=30)
print("after cooldown ->", r.status_code)
# Output
req 1/20 -> 200
...
req 13/20 -> 429
429 received. Cooling down for 64s
after cooldown -> 200

The cooldown brings the request rate back under the limit. The final request returns 200.
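The example above stops at the first throttle response. A retry loop with exponential backoff and jitter, as described earlier in this section, might be sketched like this. The function names, the base delay of 1 second, and the 60-second cap are assumptions. The session argument is anything with a requests-style get method, and the sleep parameter exists only to make the function easy to test.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter backoff: a random delay between 0 and min(cap, base * 2**attempt)."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def fetch_with_backoff(session, url, max_retries=5, timeout=30, sleep=time.sleep):
    """GET a URL, retrying on 429 up to max_retries times. Returns the last response."""
    attempt = 0
    while True:
        response = session.get(url, timeout=timeout)
        # stop on anything other than a throttle, or once the retry budget is spent
        if response.status_code != 429 or attempt >= max_retries:
            return response
        retry_after = response.headers.get("Retry-After", "")
        # prefer the server's hint when it is a plain number of seconds
        wait = float(retry_after) if retry_after.isdigit() else backoff_delay(attempt)
        sleep(wait)
        attempt += 1
```

With the requests library this would be called as fetch_with_backoff(requests.Session(), url). Randomizing the whole delay, rather than adding a small offset, spreads retries from parallel workers so they do not arrive together.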

3. Keep sessions and request profiles consistent

Many sites look at more than just the IP. They check whether your requests behave like a continuous browser session. If your headers, cookies, or user agent shift in the middle of a flow, it looks suspicious and can trigger a block even if your request rate is fine.

Persist cookies for each session and reuse them across related requests. Keep headers and user agent stable within the same session. If you need to rotate identities, do it between flows, not in the middle of pagination, login sequences, or list-to-detail scraping patterns. If the site ties session state to both the IP and cookies, sticky sessions keep both consistent.

import requests
from uuid import uuid4

base = "https://httpbin.org"

def run_session(user_agent: str):
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})

    # set a session cookie to simulate a stateful browsing flow
    sid = uuid4().hex[:8]
    session.get(
        f"{base}/cookies/set?sid={sid}",
        allow_redirects=True,
        timeout=30
    ).raise_for_status()

    # confirm cookie persists across requests in the same session
    cookies = session.get(f"{base}/cookies", timeout=30).json()
    headers = session.get(f"{base}/headers", timeout=30).json().get("headers", {})

    print("session id:", cookies.get("cookies", {}).get("sid"))
    print("user agent:", headers.get("User-Agent"))

# session 1 — identity held stable across all requests
run_session("UA-session-1")

# session 2 — new session, new identity
run_session("UA-session-2")
# Output
session id: c51f0894
user agent: UA-session-1
session id: 2cb738a0
user agent: UA-session-2

Each session keeps its identity consistent. A new session gets a fresh identity. Rotate between sessions, not within them.
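To keep the exit IP stable as well, a flow can be pinned to one proxy until it ends. The sketch below is one way to do that with a self-managed list. StickyProxyAssigner, requests_proxies, and the flow identifiers are hypothetical names, and many proxy providers offer their own sticky-session mechanism instead.

```python
import random


class StickyProxyAssigner:
    """Pins one proxy to each flow so the exit IP stays fixed until the flow ends."""

    def __init__(self, proxies):
        self.proxies = list(proxies)
        self.assigned = {}

    def proxy_for(self, flow_id):
        # the first call for a flow picks a proxy; later calls return the same one
        if flow_id not in self.assigned:
            self.assigned[flow_id] = random.choice(self.proxies)
        return self.assigned[flow_id]

    def end_flow(self, flow_id):
        # forget the pin so a later flow is free to land on a different exit
        self.assigned.pop(flow_id, None)


def requests_proxies(proxy):
    """Build the mapping the requests library expects in its proxies argument."""
    return {"http": f"http://{proxy}", "https": f"http://{proxy}"}
```

Inside run_session, this could be applied with session.proxies.update(requests_proxies(assigner.proxy_for(sid))), so cookies, headers, and exit IP all stay constant for the flow.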

4. Match the proxy type to the target

Proxy rotation does not help if the site blocks the type of IPs you are using before it even checks your request rate. Many sites filter known data center ranges early. If you are seeing instant 403s or block pages on the first request, your proxy type is likely the problem, not your request volume.

Switch to residential proxies when the target filters data center ranges. Residential IPs come from real ISP allocations and survive stricter IP filtering on sites that are aggressive about blocking automation.

If content or access rules vary by location, keep your IP geolocation consistent. Scraping the same pages from different countries can return different results or trigger geo-based blocks, and the inconsistency can also look suspicious.

Avoid reusing a small set of subnets across different projects. IP bans often apply to entire ranges. A pool that looks large but draws from only a few subnets degrades quickly once those ranges are flagged.
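A quick way to check how diverse a pool really is: group the proxies by subnet and count. This sketch assumes IPv4 proxies in IP:PORT form and uses /24 as the grouping size, which is an assumption, since sites may ban at a different granularity. The subnet_counts name is made up for this example.

```python
import ipaddress
from collections import Counter


def subnet_counts(proxies, prefix=24):
    """Count how many IPv4 proxies fall in each subnet of the given prefix length."""
    counts = Counter()
    for proxy in proxies:
        host = proxy.rsplit(":", 1)[0]  # strip the port from IP:PORT
        # strict=False lets a host address stand in for its containing network
        network = ipaddress.ip_network(f"{host}/{prefix}", strict=False)
        counts[str(network)] += 1
    return counts
```

If len(subnet_counts(proxies_list)) is much smaller than len(proxies_list), the pool is concentrated in a few ranges and will degrade quickly once one of them is flagged.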

5. Use a managed scraping API

All of the techniques above work. The problem is that they require ongoing maintenance. Proxy pools decay as IPs get banned and subnets get flagged. Anti-bot systems update their detection logic and what worked last month may not work this month. Keeping up with all of that becomes a part-time job.

A managed scraping API handles all of this behind a single endpoint. You send a URL. You get back the page content. The API takes care of proxy rotation, session management, browser fingerprinting, CAPTCHA solving, and anti-bot bypass automatically.

Spidra is built for exactly this. Every request runs through a real headless browser with residential proxy rotation across 50 countries, automatic CAPTCHA solving, and browser fingerprint randomization. It handles Cloudflare and other protection systems without any configuration from you.

Install the Python SDK:

pip install spidra

Here is how to scrape a heavily protected page like Zillow:

from spidra import SpidraClient, ScrapeParams, ScrapeUrl
import os

spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])

job = spidra.scrape.run_sync(ScrapeParams(
    urls=[ScrapeUrl(url="https://www.zillow.com/us/condos/")],
    prompt="Extract all property listings with address and price",
    output="json",
    use_proxy=True,
    proxy_country="us",
))

print(job.result.content)
# Output
[
  {
    "address": "3233 NE 34th St #1517, Fort Lauderdale, FL 33308",
    "price": "$344,900"
  },
  {
    "address": "3040 N Sheffield Ave APT 2, Chicago, IL 60657",
    "price": "$575,000"
  }
]

No proxy configuration. No session management. No selector maintenance. The same request works on protected pages and open ones without any changes.

You can also go further with Spidra's browser actions if the page requires interaction before the data loads:

from spidra import SpidraClient, ScrapeParams, ScrapeUrl, BrowserAction
import os

spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])

job = spidra.scrape.run_sync(ScrapeParams(
    urls=[
        ScrapeUrl(
            url="https://www.zillow.com/us/condos/",
            actions=[
                BrowserAction(type="click", value="Accept cookies"),
                BrowserAction(type="scroll", to="50%"),
            ],
        )
    ],
    prompt="Extract all property listings with address, price, and number of bedrooms",
    output="json",
    use_proxy=True,
    proxy_country="us",
))

print(job.result.content)

Proxy usage is billed against your bandwidth quota, not your credit balance, so there is no credit multiplier when bypass is needed.

Why self-managed proxy setups break down at scale

A self-managed proxy setup can work fine for a single target or a small crawl. The problems show up when you scale.

  • Proxy pools decay constantly. IPs get banned. Subnets get flagged. Success rates drop quietly and you do not always notice until retries have already spiked and your pipeline has fallen behind. Keeping the pool healthy means monitoring failure rates per exit IP, retiring burned addresses, sourcing replacements, and repeating that cycle indefinitely.
  • Rotation logic grows with your crawl complexity. Simple rotation works for stateless page fetches. Once your crawl involves login flows, pagination, or list-to-detail navigation, you need sticky sessions for some requests and rotation for others. Getting that right across multiple targets with different session rules is a lot of per-site tuning.
  • Browser rendering adds overhead. When targets use JavaScript or challenge pages, you need a full rendering layer on top of your proxy stack. Browser instances use significantly more CPU and memory than plain HTTP requests. Throughput drops and debugging slows down because you are inspecting rendered browser output rather than status codes.
  • Anti-bot systems keep changing. The setup that reliably bypasses Cloudflare today may not work after their next detection update. Keeping up means reading changelogs, testing configurations, and pushing fixes to production. That is maintenance that never ends.

For low-volume scraping or one-off research jobs, the self-managed path is reasonable. For pipelines that need to run reliably at scale and keep running as sites change around them, the ongoing maintenance cost is usually higher than the cost of a managed API.

Conclusion

IP bans follow a predictable pattern. A single IP sends too many requests, or requests that look automated, and the site starts blocking. The fixes are also predictable: rotate IPs, slow down, keep sessions consistent, and use the right proxy type for the target.

The harder part is keeping all of that working reliably over time as proxy pools decay and anti-bot systems update. For scraping at scale, a managed API like Spidra handles the full stack automatically so you can focus on the data rather than the infrastructure.

Get started free at spidra.io. No credit card required.

Frequently asked questions

How long does an IP ban last?

It depends on the site and what triggered the ban. Soft bans from hitting rate limits often lift within minutes to hours, especially if you back off completely. Harder bans triggered by bot detection or IP reputation checks can last days or weeks, or be permanent for that IP. The safest approach is to rotate to a clean IP rather than waiting for a ban to expire.
