In late 2025, we launched Spidra in beta. The goal has always been to make it easy for both developers and non-technical users to extract data from any website using plain English and real browser automation, without writing selectors or maintaining brittle scraping scripts.
Today, Spidra has grown significantly. Here is a full rundown of everything we have shipped recently.
forEach browser action
When Spidra opens a page, it does not just grab the HTML and leave. You can define a sequence of steps that run before extraction happens. You can click a button, type into a search field, scroll down to trigger lazy-loaded content, or dismiss a cookie banner.
We have now added a forEach action, which is currently the most powerful action in Spidra. One of our customers described the feature as “a game changer”.
Rather than scraping a page once and hoping all the data is on screen, forEach finds a set of repeating elements on the page and processes each one individually, combining all the results into a single output. Think of it as running a mini scrape on every item in a list.
A lot of the most valuable data on the web lives in grids and lists: product cards, hotel room options, job listings, search results, FAQ accordions. forEach was built for exactly this.
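Conceptually (this is a mental model, not Spidra's internals), forEach is a loop that runs a mini extraction on every matched element and merges the per-item results into one output:

```python
def for_each(elements, extract_item, max_items=None):
    """Mental model of forEach: scrape each matched element
    individually, then combine the per-item results."""
    if max_items is not None:
        elements = elements[:max_items]  # maxItems caps how many are processed
    return [extract_item(el) for el in elements]

# Toy run over fake "product cards"
cards = [{"title": "Book A", "price": "10.00"},
         {"title": "Book B", "price": "12.50"}]
rows = for_each(cards, lambda el: {"title": el["title"], "price": el["price"]})
```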
The three modes
The mode field controls how Spidra interacts with each element it finds.
Inline mode reads each element's content directly without clicking anything. This is best used when the data is already visible on the page inside each matched element, such as product cards, quote blocks, or table rows.
{
"url": "https://books.toscrape.com/catalogue/category/books/mystery_3/index.html",
"actions": [{
"type": "forEach",
"observe": "Find all book cards",
"mode": "inline",
"captureSelector": "article.product_pod",
"maxItems": 10,
"itemPrompt": "Extract title, price, and star rating. Return as JSON: {title, price, star_rating}"
}]
}

Navigate mode follows the link on each element to its destination page, extracts the content there, then returns to process the next element. Use it when each card links to a detail page that has the full information you need.
{
"url": "https://books.toscrape.com/catalogue/category/books/mystery_3/index.html",
"actions": [{
"type": "forEach",
"observe": "Find all book title links in the product grid",
"mode": "navigate",
"captureSelector": "article.product_page",
"maxItems": 6,
"waitAfterClick": 800,
"itemPrompt": "Extract title, price, star rating, and availability. Return as JSON."
}]
}

Click mode clicks each element and captures whatever opens on the same page: a modal, a drawer, or an accordion panel. Use this for hotel room cards, FAQ rows, or any pattern where the content expands in place rather than navigating away.
{
"url": "https://hotels.example.com/hotel/grand-plaza",
"actions": [{
"type": "forEach",
"observe": "Find all room category cards",
"mode": "click",
"captureSelector": "[role='dialog']",
"waitAfterClick": 1200,
"itemPrompt": "Extract room name, bed type, price per night, and amenities. Return as JSON."
}]
}

Pagination
forEach can follow a next page button and keep collecting items across multiple pages before it stops. You define which button triggers the next page and how many additional pages to process:
{
"type": "forEach",
"observe": "Find all job listings",
"mode": "inline",
"captureSelector": ".job-card",
"maxItems": 50,
"itemPrompt": "Extract job title, company, and location as JSON",
"pagination": {
"nextSelector": "li.next > a",
"maxPages": 3
}
}

maxPages is the number of additional pages beyond the first one. Setting maxPages: 3 means Spidra processes the starting page plus 3 more, covering 4 pages total. maxItems caps the total across all pages combined.
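The off-by-one here is easy to trip over, so a quick sanity check of the page math (illustrative helpers, not Spidra code):

```python
def pages_processed(max_pages: int) -> int:
    # maxPages counts pages *beyond* the first, so the total is one higher
    return 1 + max_pages

def items_collected(per_page: int, max_pages: int, max_items: int) -> int:
    # maxItems caps the combined total across all pages
    return min(per_page * pages_processed(max_pages), max_items)
```

With maxPages: 3 and 20 items per page, a maxItems of 50 stops the run well before the fourth page finishes.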
Per-element actions
The actions field inside forEach lets you run browser actions on each item after landing on it, before capturing its content. This is useful when the destination page needs a scroll to reveal the full description, or an extra click to expand a collapsed section:
{
"type": "forEach",
"observe": "Find all book title links in the product grid",
"mode": "navigate",
"captureSelector": "article.product_page",
"maxItems": 3,
"waitAfterClick": 1000,
"actions": [
{ "type": "scroll", "to": "50%" }
],
"itemPrompt": "Extract title, price, star rating, and the full product description. Return as JSON: {title, price, star_rating, description}"
}

Learn more about this in our documentation.
All of these options are also available in the Spidra UI.
Structured output
AI extraction is flexible, but the output is not always predictable. One run might return product_name, the next returns name, and sometimes a field disappears entirely because the AI was not confident enough to include it.
Once you try to push that data into a database or process it downstream, you end up writing defensive handling just to deal with the inconsistency.
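That defensive handling usually ends up looking something like this (the field aliases are purely illustrative):

```python
def get_field(record: dict, *aliases, default=None):
    # Try every name the model might have used for the same field
    for name in aliases:
        if name in record and record[name] is not None:
            return record[name]
    return default

# Two runs, same field, different names
run_a = {"product_name": "Widget Pro"}
run_b = {"name": "Widget Pro"}
assert get_field(run_a, "product_name", "name") == get_field(run_b, "product_name", "name")
```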
Structured output fixes this at the source. You pass a schema (optionally alongside your prompt), and Spidra enforces that shape on every response.
{
"urls": [{ "url": "https://jobs.example.com/senior-engineer" }],
"prompt": "Extract the job details",
"schema": {
"type": "object",
"required": ["title", "company", "remote", "employment_type"],
"properties": {
"title": { "type": "string" },
"company": { "type": "string" },
"location": { "type": ["string", "null"] },
"remote": { "type": ["boolean", "null"] },
"salary_min": { "type": ["number", "null"] },
"salary_max": { "type": ["number", "null"] },
"employment_type": {
"type": ["string", "null"],
"enum": ["full_time", "part_time", "contract", null]
},
"skills": {
"type": "array",
"items": { "type": "string" }
}
}
}
}

The response comes back with exactly the fields you defined:
{
"title": "Senior Software Engineer",
"company": "Acme Corp",
"location": "Austin, TX",
"remote": true,
"salary_min": 140000,
"salary_max": 180000,
"employment_type": "full_time",
"skills": ["Python", "React", "PostgreSQL", "Docker"]
}

This can also be done via the Spidra UI.
Learn more about this in our docs.
Batch scraping
Scraping a list of URLs used to mean writing a loop, managing concurrency, tracking which requests failed, and building retry logic. Every project ends up with the same boilerplate. Batch scraping takes all of that off your plate.
You can now send up to 50 URLs in a single request, and Spidra processes them in parallel. You get a batchId back immediately and poll it to track progress:
curl --request POST \
--url https://api.spidra.io/api/batch/scrape \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data '{
"urls": [
"https://example.com/product-1",
"https://example.com/product-2",
"https://example.com/product-3"
],
"prompt": "Extract product name, price, and availability",
"output": "json"
}'

The status response shows exactly where each URL is and what came back from it:
{
"batchId": "b_01jrmxyz...",
"status": "running",
"totalItems": 3,
"completedItems": 2,
"failedItems": 0,
"items": [
{
"url": "https://example.com/product-1",
"status": "completed",
"data": { "name": "Widget Pro", "price": "$29.99", "available": true },
"creditsUsed": 2
},
{
"url": "https://example.com/product-2",
"status": "completed",
"data": { "name": "Widget Lite", "price": "$14.99", "available": false },
"creditsUsed": 2
},
{
"url": "https://example.com/product-3",
"status": "running"
}
]
}

Each URL is independent inside the batch. If one fails, the others keep going. When some items fail, you do not resubmit the whole batch. A single retry call picks up only the failed items:
curl --request POST \
--url https://api.spidra.io/api/batch/scrape/BATCH_ID/retry \
--header 'x-api-key: YOUR_API_KEY'

You can also cancel a batch mid-flight. Any URLs that have not started yet get cancelled and their reserved credits come back to your account automatically. Items already in progress finish, and you only pay for what actually ran:
curl --request DELETE \
--url https://api.spidra.io/api/batch/scrape/BATCH_ID \
--header 'x-api-key: YOUR_API_KEY'

Everything that works on a single scrape, like structured output, stealth proxies, authenticated sessions, and screenshots, applies uniformly across every URL in a batch request.
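The submit-then-poll flow is easy to wrap in a few lines of Python. A minimal sketch, with one assumption to flag: it guesses that batch status is fetched with a GET to the same path plus the batchId, which you should confirm against the API reference:

```python
import json
import time
import urllib.request

BASE = "https://api.spidra.io/api/batch/scrape"

def batch_done(status: dict) -> bool:
    # A batch is finished once no item is still running
    return status["completedItems"] + status["failedItems"] >= status["totalItems"]

def poll_batch(batch_id: str, api_key: str, interval: float = 2.0) -> dict:
    # Assumed endpoint shape: GET {BASE}/{batchId} returns the status document
    req = urllib.request.Request(f"{BASE}/{batch_id}",
                                 headers={"x-api-key": api_key})
    while True:
        with urllib.request.urlopen(req) as resp:
            status = json.load(resp)
        if batch_done(status):
            return status
        time.sleep(interval)
```

The status document shown above would report not-done (2 of 3 items finished) and the loop would sleep and ask again.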
Learn more here.
Re-extract from a crawl you already ran
If you ran a crawl last week and now need different fields from those same pages, you do not have to crawl again. The crawl extract endpoint lets you run a fresh AI extraction on pages Spidra has already visited.
POST /api/crawl/:jobId/extract
{
"transformInstruction": "Extract only the product name, price, and availability from each page"
}

Spidra fetches the stored content for every successful page in that job, runs your new instruction through the AI, and returns a new job with the results. You poll the returned jobId the same way you would any regular crawl job.
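Calling the endpoint from Python is mostly plumbing; a minimal sketch, assuming the same x-api-key header the batch examples use:

```python
import json
import urllib.request

def extract_url(job_id: str) -> str:
    # The :jobId path segment from POST /api/crawl/:jobId/extract
    return f"https://api.spidra.io/api/crawl/{job_id}/extract"

def reextract(job_id: str, instruction: str, api_key: str) -> dict:
    # POST a new transformInstruction against pages the crawl already stored
    body = json.dumps({"transformInstruction": instruction}).encode()
    req = urllib.request.Request(extract_url(job_id), data=body, method="POST",
                                 headers={"Content-Type": "application/json",
                                          "x-api-key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # includes the new jobId to poll
```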
Only AI token costs apply. There is no base URL charge, no stealth cost, and no CAPTCHA charge.
What is coming next
We ship features around real workflows, usually within days of identifying a need. If something is missing from your pipeline or if you have a use case we have not covered yet, reach out at spidra.io/contact or start a conversation through the dashboard.
You can get started free at spidra.io. No credit card required.
