Changelog

Stay up to date with the latest features, improvements, and bug fixes in Spidra.

Batch Scraping

If you have ever scraped a big list of URLs, you know the drill: write a loop, add some retry logic, try to work out which URLs failed and which succeeded, and by the end of it half your code is just plumbing. None of that logic is unique to your product; you just end up writing it over and over.

We built batch scraping to take all of that off your plate. You hand us the list, we handle everything else.

Send up to 50 URLs in one request and Spidra processes them all in parallel. You get a batchId back immediately, and you poll that to check progress:

curl --request POST \
  --url https://api.spidra.io/api/batch/scrape \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --data '{
    "urls": [
      "https://example.com/product-1",
      "https://example.com/product-2",
      "https://example.com/product-3"
    ],
    "prompt": "Extract product name, price, and availability",
    "output": "json"
  }'
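
Since a single request accepts at most 50 URLs, a longer list has to be split client-side before submission. A minimal sketch in Python (the helper and its batch size mirror the 50-URL limit above; submitting each chunk to the endpoint is left out):

```python
def chunk_urls(urls, batch_size=50):
    """Split a list of URLs into chunks of at most `batch_size`,
    matching the 50-URL-per-request limit of the batch endpoint."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]

# 120 URLs become three batches: 50, 50, and 20.
urls = [f"https://example.com/product-{n}" for n in range(120)]
batches = chunk_urls(urls)
```

Each chunk then goes out as its own POST to /api/batch/scrape, giving you one batchId per chunk to poll.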

When you poll the status, you can see exactly where each URL is and what came back from it:

{
  "batchId": "b_01jrmxyz...",
  "status": "running",
  "totalItems": 3,
  "completedItems": 2,
  "failedItems": 0,
  "items": [
    {
      "url": "https://example.com/product-1",
      "status": "completed",
      "data": { "name": "Widget Pro", "price": "$29.99", "available": true },
      "creditsUsed": 2
    },
    {
      "url": "https://example.com/product-2",
      "status": "completed",
      "data": { "name": "Widget Lite", "price": "$14.99", "available": false },
      "creditsUsed": 2
    },
    {
      "url": "https://example.com/product-3",
      "status": "running"
    }
  ]
}
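
A client typically polls until the batch settles, then separates successes from failures. A small sketch, assuming the response shape shown above (treating cancelled as a terminal state is an assumption):

```python
def summarize_batch(status: dict):
    """Split a batch status payload into (done, succeeded, failed) URL lists.
    Treats 'completed' and 'cancelled' as terminal batch states (assumption)."""
    done = status["status"] in ("completed", "cancelled")
    succeeded = [i["url"] for i in status["items"] if i["status"] == "completed"]
    failed = [i["url"] for i in status["items"] if i["status"] == "failed"]
    return done, succeeded, failed

# The payload above, trimmed to the fields the helper reads:
sample = {
    "status": "running",
    "items": [
        {"url": "https://example.com/product-1", "status": "completed"},
        {"url": "https://example.com/product-2", "status": "completed"},
        {"url": "https://example.com/product-3", "status": "running"},
    ],
}
done, succeeded, failed = summarize_batch(sample)
```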

Each URL lives in its own little world inside the batch. If one fails, the others keep going. The batch only reaches completed once every single item has settled, one way or another.

When some items fail, you do not have to redo the whole thing. There is a retry endpoint that picks up only the failed items and queues them again. Everything that already worked stays exactly as it was. You call it and Spidra figures out what still needs doing.

curl --request POST \
  --url https://api.spidra.io/api/batch/scrape/BATCH_ID/retry \
  --header 'x-api-key: YOUR_API_KEY'

If you want to stop a batch early, you can cancel it. Any URLs that have not started yet get cancelled, and their reserved credits come back to your account automatically. The ones already in progress will finish and you only pay for what actually ran.

curl --request DELETE \
  --url https://api.spidra.io/api/batch/scrape/BATCH_ID \
  --header 'x-api-key: YOUR_API_KEY'

On credits: when you submit a batch, 2 credits per URL are reserved upfront. If a URL fails, you are not charged for it. AI token costs, CAPTCHA solving, and proxy usage are all calculated per item once it finishes, same as a regular scrape.
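
The reservation model is easy to reason about in code. A sketch of the base-credit accounting only (AI token, CAPTCHA, and proxy costs are excluded, since those are computed per item once it finishes):

```python
BASE_CREDITS_PER_URL = 2  # reserved upfront for each URL in the batch

def base_credit_charge(total_urls: int, failed_urls: int) -> dict:
    """Base-credit accounting: failed URLs are not charged, so their
    reserved credits come back. Per-item AI/CAPTCHA/proxy costs are separate."""
    reserved = total_urls * BASE_CREDITS_PER_URL
    charged = (total_urls - failed_urls) * BASE_CREDITS_PER_URL
    return {"reserved": reserved, "charged": charged, "refunded": reserved - charged}

# A 10-URL batch with 3 failures reserves 20 credits but only charges 14.
```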

Everything you can do in a regular scrape works here too. You can pass a schema and every URL in the batch returns data in the same shape. You can use stealth mode with proxy rotation, target a specific country with proxyCountry, pass cookies for pages that need a login session, and take viewport or full-page screenshots. All of it applies uniformly across every URL in the batch.

Read the full API reference in the docs.

Structured Output

Here is a situation a lot of people run into. You set up a scrape, the AI does its job, and you get JSON back... but the field names are not what you expected. Or a field you needed is missing because the AI was not confident enough to include it. Or you run the same scrape twice and the shape is slightly different between runs.

The moment you try to push that data into a database or process it downstream, you start writing all sorts of defensive handling just to deal with the inconsistency.

We built structured output to fix that at the source. You tell Spidra exactly what shape you want, and it returns data in that shape, every time.

You do it by passing a schema alongside your prompt. The schema is standard JSON Schema, so if you have used it before it will feel familiar. Here is a real example:

curl --request POST \
  --url https://api.spidra.io/api/scrape \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --data '{
    "urls": [{ "url": "https://jobs.example.com/senior-engineer" }],
    "prompt": "Extract the job details",
    "schema": {
      "type": "object",
      "required": ["title", "company", "remote", "employment_type"],
      "properties": {
        "title":           { "type": "string" },
        "company":         { "type": "string" },
        "location":        { "type": ["string", "null"] },
        "remote":          { "type": ["boolean", "null"] },
        "salary_min":      { "type": ["number", "null"] },
        "salary_max":      { "type": ["number", "null"] },
        "employment_type": {
          "type": ["string", "null"],
          "enum": ["full_time", "part_time", "contract", null]
        },
        "skills": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    }
  }'

You do not need to set "output": "json" manually. When you pass a schema, Spidra sets that automatically.

The result comes back clean, with the exact fields you defined:

{
  "title": "Senior Software Engineer",
  "company": "Acme Corp",
  "location": "Austin, TX",
  "remote": true,
  "salary_min": 140000,
  "salary_max": 180000,
  "employment_type": "full_time",
  "skills": ["Python", "React", "PostgreSQL", "Docker"]
}

The required rule is the most important thing to understand here. Fields you put in required will always be present in the response, even when the AI cannot find them on the page. In that case it writes null rather than leaving the field out.

Fields you leave out of required may be omitted entirely if there is nothing to extract. So if you need a field to always show up in your output, even as null, put it in required. If you are fine with it just being absent when there is no data, leave it out of required and the AI will skip it rather than guess.
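
One way to internalize the rule: given a response and your schema, you can tell apart fields that came back with data, required fields the AI wrote as null, and optional fields it omitted. A purely illustrative client-side check, not part of the API:

```python
def classify_fields(response: dict, required: list, all_fields: list):
    """Sort schema fields into: present with data, required-but-null, omitted."""
    present = [f for f in all_fields if response.get(f) is not None]
    null_required = [f for f in required if f in response and response[f] is None]
    omitted = [f for f in all_fields if f not in response]
    return present, null_required, omitted

# Required fields come back even when empty (as null); optional ones may vanish.
resp = {"title": "Senior Software Engineer", "company": "Acme Corp",
        "remote": None, "employment_type": None}
present, null_req, omitted = classify_fields(
    resp,
    required=["title", "company", "remote", "employment_type"],
    all_fields=["title", "company", "remote", "employment_type", "location"],
)
```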

Making fields nullable. To let a field be either a value or null, pass the type as an array:

{ "type": ["string", "null"] }

This works for any type: string, number, boolean. It tells the AI it can write null if the page does not have that information.

Spidra validates your schema before running anything. If there is a structural problem, like the root not being type object, or the schema being nested too deeply, you get a 422 back before the job is ever queued (no credits are used).
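
You can approximate the same structural checks client-side to catch a 422 before sending. A rough sketch; the real validator runs server-side, and the depth limit of 5 here is an assumption for illustration:

```python
def check_schema(schema: dict, max_depth: int = 5) -> list:
    """Collect obvious structural problems: non-object root, excessive nesting.
    The max_depth value is illustrative, not Spidra's documented limit."""
    errors = []
    if schema.get("type") != "object":
        errors.append("root schema must have type 'object'")

    def depth(node, d=1):
        if not isinstance(node, dict):
            return d
        children = list(node.get("properties", {}).values())
        if "items" in node:
            children.append(node["items"])
        return max([depth(c, d + 1) for c in children], default=d)

    if depth(schema) > max_depth:
        errors.append(f"schema nested deeper than {max_depth} levels")
    return errors
```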

Structured output also works with batch scraping. Pass a schema in a batch request and every URL in the batch returns data in the same shape.

Read the full docs here.

Re-extract data from a completed crawl job

Once a crawl job is done, all the page content is stored on our end. With the new extract endpoint you can run a fresh AI extraction on those stored pages at any time, just by sending a new instruction.

POST /api/crawl/:jobId/extract
{
  "transformInstruction": "Extract only the product name, price, and availability from each page"
}

Spidra fetches the saved markdown for every successful page in that job, runs your instruction through the AI, and creates a new job with the results. You poll the returned jobId the same way you would any regular crawl job.

This is useful when you realise you need different fields from a crawl you already ran, or when you want to reformat structured data without touching the site again. Only AI tokens are billed for extract jobs. There is no base URL charge, no stealth cost, and no CAPTCHA charge.

Read more in the Extract from Crawl docs.

forEach gets modes, pagination, and per-item extraction

The forEach action has been rebuilt with a lot more flexibility. You can now control exactly how it interacts with each element on the page, follow pagination to collect more items, and run a separate AI extraction per item before combining results.

Three interaction modes

{
  "type": "forEach",
  "observe": "Find all product cards",
  "mode": "inline"
}

click is the default. Spidra clicks each element and captures whatever opens, such as a modal, drawer, or accordion panel.

inline reads each element's content directly without clicking anything. This is the right choice for cards, table rows, list items, and other containers where the content is already visible.

navigate follows the link on each element to the destination page, captures the content there, and then comes back to continue with the next element.

Pagination

forEach can now follow a "next page" button to keep collecting elements across multiple pages before it stops.

{
  "type": "forEach",
  "observe": "Find all job listings",
  "mode": "inline",
  "captureSelector": ".job-card",
  "pagination": {
    "nextSelector": "button[aria-label='Next page']",
    "maxPages": 5
  },
  "maxItems": 50
}
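
The two caps interact: collection stops either when maxPages is exhausted or when maxItems elements have been gathered, whichever comes first. A quick back-of-the-envelope helper (the per-page count is whatever the site happens to render; the interaction described is an inference from the fields above):

```python
def items_collected(per_page: int, max_pages: int, max_items: int) -> int:
    """Upper bound on elements a paginated forEach will process, assuming
    each page yields `per_page` matching elements."""
    return min(per_page * max_pages, max_items)

# With 15 job cards per page, 5 pages allowed, and maxItems 50,
# collection stops partway through the fourth page.
```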

Per-item AI extraction

For large datasets, or in navigate mode where each destination page carries a lot of content, you can run a Gemini extraction on each item individually before the results are combined.

{
  "type": "forEach",
  "observe": "Find all speaker profile links",
  "mode": "navigate",
  "itemPrompt": "Extract speaker name, title, company, and bio as JSON"
}

Other new fields

  • captureSelector lets you point forEach at the exact container you want to capture, as a CSS selector, XPath, or plain English description. If you skip it, Spidra falls back to [role="dialog"] and then the full page.
  • maxItems caps how many elements are processed across all pages. The default is 50 and that is also the hard maximum.
  • waitAfterClick controls how long Spidra waits after clicking before it captures content. Default is 2500ms.
  • observe is now the preferred field for describing which elements to find. The older value and selector fields still work.

Smarter selector routing

All actions now automatically detect whether a selector is CSS/XPath or a natural-language description and route it accordingly. CSS and XPath selectors go straight to the browser engine, which is faster and does not consume an AI call. Natural-language descriptions still go through Stagehand as before. The selector field is the recommended way to specify a target for click, type, check, and uncheck actions.
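
The routing decision can be mimicked with a simple heuristic: anything that looks like XPath or CSS syntax goes to the engine, and everything else is treated as natural language. This is an illustrative guess at the logic, not Spidra's actual implementation:

```python
def route_selector(selector: str) -> str:
    """Heuristically classify a selector as 'xpath', 'css', or 'natural'.
    Illustrative only; real routing logic may differ."""
    s = selector.strip()
    if s.startswith("/") or s.startswith("(") or s.startswith("xpath="):
        return "xpath"
    words = s.split()
    # Three or more bare English words with no selector syntax reads
    # like a plain-language description.
    if len(words) >= 3 and all(w.isalpha() for w in words):
        return "natural"
    return "css"
```

For example, `//div[@id='main']` routes as XPath, `.job-card` as CSS, and "Find all product cards" as natural language.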

Read more in the Browser Actions docs.

New: forEach action

You can now add a forEach action to any URL's actions array. It tells Spidra to find all matching elements on the page, interact with each one, and apply your extraction prompt to the content it reveals.

{
  "urls": [{
    "url": "https://example.com/listings",
    "actions": [
      { "type": "forEach", "value": "Find all listing detail links" }
    ]
  }],
  "prompt": "Extract name, price, and description for each listing",
  "output": "json"
}

This works well for hotel room cards, product grids, FAQ accordions, and any repeating UI pattern where the content you want is hidden behind a click or expand interaction.

Read more in the Browser Actions docs.

AI automation is now always on

Spidra has always used AI to power element finding and page interaction. Previously you had to enable this explicitly with an aiMode flag in your API requests or toggle it in the Playground. That is no longer the case. Every scrape now runs with full AI automation by default.

Natural language selectors work out of the box. You can describe elements in plain English and Spidra will find them, regardless of their CSS class or ID. If you had "aiMode": true in your API requests, you can safely remove it.

Playground improvements

The operation builder in the Playground has a redesigned action dropdown and a per-URL AI/CSS toggle, so you can choose between natural language and CSS selectors for each URL independently.

New Playground UI

Read more in the Playground docs.

Introducing Authenticated Scraping and Crawling

A large portion of valuable web data sits behind login walls — dashboards, internal tools, admin panels, analytics views, private documentation, customer portals, and CRM systems. Until now, Spidra was limited to publicly available pages.

Today, we're introducing authenticated scraping and crawling, which allows Spidra to operate inside a logged-in session, just like a real user's browser.

Spidra uses cookie-based authentication, the same mechanism browsers use to maintain login sessions. Simply log in to the target website using your browser, copy the relevant cookies from DevTools, and pass them to Spidra. From that point on, all requests behave as if they're coming from your authenticated session.

This works in both the Playground and the API. Here's how it looks via the API:

curl --request POST \
  --url https://api.spidra.io/api/scrape \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --data '{
    "urls": [
      { "url": "https://app.example.com/dashboard" }
    ],
    "cookies": "session_id=abc123; auth_token=xyz789"
  }'

Authenticated crawling works the same way — cookies are applied at the start of the crawl and the session is preserved across all discovered pages, so Spidra can navigate links, pagination, and internal sections without re-authentication issues.

The API supports both standard cookie strings and raw DevTools pastes. Spidra automatically detects the format and parses it accordingly. Authentication cookies are never stored — they are used transiently for the duration of a job and discarded immediately afterward.
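
Distinguishing the two formats is straightforward: a standard cookie string is `name=value` pairs joined by semicolons, while a DevTools table paste is tab-separated columns, one cookie per line. An illustrative parser, not the exact one Spidra uses:

```python
def parse_cookies(raw: str) -> dict:
    """Parse either a 'k=v; k2=v2' cookie string or a tab-separated
    DevTools table paste (name<TAB>value<TAB>... per line) into a dict."""
    cookies = {}
    if "\t" in raw:  # looks like a DevTools paste
        for line in raw.strip().splitlines():
            parts = line.split("\t")
            if len(parts) >= 2:
                cookies[parts[0]] = parts[1]
    else:  # standard Cookie header string
        for pair in raw.split(";"):
            if "=" in pair:
                name, _, value = pair.strip().partition("=")
                cookies[name] = value
    return cookies
```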

Read the full blog here.

Introducing Crawling via the Spidra API

The Spidra API now includes a comprehensive set of crawling endpoints. You can submit crawl jobs, track progress, and retrieve results — all from your own code.

Submit a crawl by providing a starting URL, a natural-language instruction describing which pages to discover, and how content should be extracted:

curl --request POST \
  --url https://api.spidra.io/api/crawl \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --data '{
    "baseUrl": "https://example.com/blog",
    "crawlInstruction": "Crawl all blog post pages",
    "transformInstruction": "Extract title, author, date, and content",
    "maxPages": 5
  }'

Spidra handles page discovery, navigation, CAPTCHA solving, and transformation automatically. The crawl runs asynchronously; you get back a job ID and can poll for progress or results:

{
  "status": "completed",
  "progress": {
    "pagesCrawled": 5,
    "maxPages": 10
  },
  "result": [
    {
      "url": "https://example.com/blog/post-1",
      "status": "success",
      "data": {
        "title": "First Post",
        "author": "John",
        "date": "2025-01-01"
      }
    }
  ]
}

The API also includes endpoints for fetching crawled pages independently, viewing job configuration details, and listing crawl history with pagination. The API uses the same crawling engine as the Playground, including automatic CAPTCHA handling, optional stealth mode with proxy rotation, and retry logic for failed pages.

Read the full blog here.

Captcha Solving and Stealth Mode Are Now Live for Crawling

Crawling protected websites just got easier.

Spidra's crawler can now solve captchas automatically and run in stealth mode using residential proxy rotation. This reduces blocks, improves reliability, and makes it possible to crawl sites that were previously difficult to access.

When the crawler encounters a captcha, it detects the type and solves it automatically in the background. No manual steps are required.

Some websites also block or limit repeated requests from the same IP address. When stealth mode is enabled, traffic is routed through rotating residential proxies to reduce detection and avoid rate limits during larger crawls. These updates are especially useful for:

  • Websites with aggressive bot protection
  • Large or deep crawls that previously hit blocking limits
  • Competitor research where anonymity matters

Read the full blog here.

Dark Mode Is Now Available in Spidra

Dark mode is now live in Spidra. You can switch between light and dark mode anytime from the dashboard. The experience stays exactly the same while scraping, crawling, and reviewing results — it's simply easier on the eyes during long sessions.

This update is part of our ongoing beta improvements as we continue refining Spidra based on real usage and feedback.

Read the full blog here.

Introducing AI-Powered Crawling

Until now, Spidra focused on no-code scraping — extracting clean content from specific URLs. Many users kept asking: "Can I crawl an entire website, not just individual pages?" Now you can.

In real workflows, you rarely know every URL you need upfront. Whether you're auditing a website, analyzing competitor content, or building datasets for research, manually listing pages doesn't scale. Crawling solves this by automatically discovering pages and extracting content at scale.

You provide a website, how many pages you want, and a natural-language instruction describing what content matters. Spidra then discovers valid pages based on your instructions, crawls each page using a real browser, extracts only meaningful content, and transforms everything into clean, structured output. All of this happens inside the UI — no code required.

AI-powered crawling includes full-site page discovery, clean content extraction without navigation noise, bulk results in the dashboard, ZIP export with proper filenames, crawl logs with per-page retries, and prompt-based refinement so you can adjust instructions and rerun transformations without starting over.

Read the full blog here.

Spidra Beta Is Now Live

After months of building, refining, and testing behind the scenes, Spidra is officially in beta. We're welcoming our first batch of early users to help shape the future of AI-powered web scraping.

Spidra is an AI-first web scraping platform that lets you extract data from any website using real browser automation and natural language prompts. Instead of writing selectors, scripts, or dealing with brittle crawlers, you tell Spidra what you want and it handles the clicking, scrolling, typing, and waiting for you.

Under the hood, Spidra uses large language models and real browser automation to navigate pages like a human, extract clean data, solve captchas, and stay undetected by anti-bot systems.

Web scraping hasn't evolved much in the last decade. Most tools still rely on selectors, brittle automations, and endless debugging. We wanted to rethink scraping from the ground up — what if you could simply describe what you need and the scraper actually understood the page, located the right data, and delivered it in clean, structured formats?

You can join the beta by signing up at app.spidra.io. It's free to get started.

Read the full blog here.
