June 9, 2026 · 18 min read

Spidra API tutorial: complete guide to web scraping with the Spidra API

Joel Olawanle

Getting data from websites programmatically has always involved more work than it should. You write selectors, they break when the site updates. You try a headless browser, anti-bot protection blocks you. You get the data, but it is raw HTML and you still have to parse it into something useful.

The Spidra API is designed to solve all three of those problems in one place. You send a URL, describe what you want, and get back structured data. The browser rendering, CAPTCHA solving, proxy rotation, and AI extraction all happen on Spidra's side.

This guide walks through the entire API from authentication to crawling. By the end you will know how every endpoint works, what the response structure looks like, and how to build a real scraping pipeline around it.

Before you start

You need a Spidra account and an API key.

Sign up at spidra.io. The free plan includes 300 credits with no credit card required. Once you are in, go to app.spidra.io → Settings → API Keys and create a key.

Keep it somewhere safe. Every request you make to the API includes this key in the header.

How the API works

The Spidra API is a REST API with one base URL:

https://api.spidra.io/api

Every request is authenticated by including your API key in the x-api-key header. There are no bearer tokens, no OAuth flows, just a header on every request.

curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"urls": [{"url": "https://example.com"}]}'

One important thing to understand before you make your first request: Spidra jobs are asynchronous. When you submit a scrape, you do not get the data back immediately. You get a job ID. You then poll a status endpoint every few seconds until the job is complete and the data is ready.

This is by design. Browser rendering, CAPTCHA solving, and AI extraction take a few seconds. The async pattern means you are not holding a connection open the whole time.

The flow for every job type looks like this:

Submit the job. Receive a job ID in the response.
Poll the status endpoint every 2 to 5 seconds.
When status is completed, read your results.

Now let us go through each part of the API.

Authentication

Every request needs the x-api-key header. That is it.

-H "x-api-key: YOUR_API_KEY"

If the key is missing or invalid, the API returns a 401. If your credits are exhausted, you get a 403.

Here is the full set of response codes you will encounter:

Code	What it means
`200`	Request completed successfully
`202`	Job queued successfully. Poll for results.
`400`	Bad request. Missing or invalid parameters.
`401`	API key missing, invalid, or expired
`403`	Credits exhausted or plan limit reached
`404`	Job or resource not found
`422`	Schema validation failed. The response describes the errors.
`429`	Rate limit hit. Back off and retry.
`500`	Something went wrong on Spidra's side

All errors come back in the same format:

{
  "status": "error",
  "message": "Detailed explanation of what went wrong"
}
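
The status codes and the error body above can be folded into one small helper. This is a minimal sketch rather than anything shipped by Spidra: the name explain_error is hypothetical, and treating 429 and 500 as the retryable codes is an assumption made here for illustration.

```python
# Hypothetical helper: the name and the retry policy are assumptions,
# not part of the Spidra API.
RETRYABLE = {429, 500}  # assumption: rate limits and server errors are worth retrying


def explain_error(status_code, body):
    """Return (message, should_retry) for a non-2xx response.

    `body` is the parsed JSON error, expected to look like
    {"status": "error", "message": "..."}.
    """
    message = "Unknown error"
    if isinstance(body, dict):
        message = body.get("message", message)
    return f"HTTP {status_code}: {message}", status_code in RETRYABLE


print(explain_error(401, {"status": "error", "message": "API key missing"}))
```

Returning a flag instead of raising keeps the retry decision in the caller, which is usually where the backoff logic lives.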

Scraping a single page

The scrape endpoint is where most people start. You give it one to three URLs and it returns structured data from each one.

Endpoint: POST /api/scrape

The minimal request

The only required field is urls, which takes an array of URL objects. Each URL object requires a url field and optionally takes an actions array for browser interactions.

curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [{"url": "https://news.ycombinator.com"}]
  }'

Response:

{
  "status": "queued",
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Scrape job has been queued. Poll /api/scrape/550e8400... to get the result."
}

Save that jobId. You need it to check on the job.

Polling for results

Call GET /api/scrape/{jobId} every few seconds until the status changes.

curl https://api.spidra.io/api/scrape/550e8400-e29b-41d4-a716-446655440000 \
  -H "x-api-key: YOUR_API_KEY"

While the job is running, you will see something like this:

{
  "status": "active",
  "progress": {
    "message": "Processing content with AI...",
    "progress": 0.6
  },
  "result": null,
  "error": null
}

The progress field goes from 0 to 1 as the job moves through its stages: loading the browser, executing actions, solving CAPTCHAs, running AI extraction.

When it finishes:

{
  "status": "completed",
  "progress": {
    "message": "Scrape completed successfully",
    "progress": 1
  },
  "result": {
    "content": "...",
    "data": [
      {
        "url": "https://news.ycombinator.com",
        "title": "Hacker News",
        "markdownContent": "...",
        "success": true,
        "screenshotUrl": null
      }
    ],
    "screenshots": [],
    "ai_extraction_failed": false,
    "stats": {
      "durationMs": 4200,
      "captchaSolvedCount": 0,
      "inputTokens": 312,
      "outputTokens": 84,
      "totalTokens": 396
    }
  },
  "error": null
}

The result.content field is the main output. What it contains depends on what you asked for:

If you passed a prompt, content is the AI-extracted result
If you did not pass a prompt, content is the raw page content as Markdown

result.data is an array with one entry per URL. Each entry has the page title, the full Markdown content for that URL, whether it succeeded, and a screenshot URL if you requested one.

result.stats tells you how long the job took, how many CAPTCHAs were solved, and how many tokens the AI extraction used.
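
To make the three parts of the result concrete, here is a small sketch that condenses a completed result into the values most pipelines log. The function name summarize_result and the keys of the dict it returns are hypothetical. The input is the sample response shown above.

```python
def summarize_result(result):
    """Condense a completed job's `result` object into a small dict."""
    pages = result.get("data", [])
    stats = result.get("stats", {})
    return {
        "content": result.get("content"),
        # One entry per URL: split them by the per-page success flag
        "succeeded": [p["url"] for p in pages if p.get("success")],
        "failed": [p["url"] for p in pages if not p.get("success")],
        "duration_s": stats.get("durationMs", 0) / 1000,
        "total_tokens": stats.get("totalTokens", 0),
    }


# The sample response from this section, trimmed to the fields used here
sample = {
    "content": "...",
    "data": [{"url": "https://news.ycombinator.com", "title": "Hacker News", "success": True}],
    "stats": {"durationMs": 4200, "captchaSolvedCount": 0, "totalTokens": 396},
}
print(summarize_result(sample))
```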

A polling loop in Python

import requests
import time

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.spidra.io/api"
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}

def scrape(url):
    # Submit the job
    response = requests.post(
        f"{BASE_URL}/scrape",
        headers=HEADERS,
        json={"urls": [{"url": url}]},
        timeout=30,
    )
    response.raise_for_status()
    job_id = response.json()["jobId"]

    # Poll until the job completes or fails
    while True:
        status_response = requests.get(
            f"{BASE_URL}/scrape/{job_id}",
            headers=HEADERS,
            timeout=30,
        )
        # Surface HTTP errors (401, 404, 429) here instead of failing on a missing key below
        status_response.raise_for_status()
        data = status_response.json()

        if data["status"] == "completed":
            return data["result"]
        elif data["status"] == "failed":
            raise Exception(f"Scrape failed: {data['error']}")

        # Wait before the next poll, within the suggested 2 to 5 second range
        time.sleep(3)

result = scrape("https://news.ycombinator.com")
print(result["content"])

The same in Node.js:

const API_KEY = "YOUR_API_KEY";
const BASE_URL = "https://api.spidra.io/api";
const HEADERS = {
  "x-api-key": API_KEY,
  "Content-Type": "application/json"
};

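async function scrape(url) {
  // Submit the job
  const submitRes = await fetch(`${BASE_URL}/scrape`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ urls: [{ url }] })
  });
  if (!submitRes.ok) throw new Error(`Submit failed with HTTP ${submitRes.status}`);
  const { jobId } = await submitRes.json();

  // Poll until the job completes or fails
  while (true) {
    const statusRes = await fetch(`${BASE_URL}/scrape/${jobId}`, {
      headers: HEADERS
    });
    if (!statusRes.ok) throw new Error(`Poll failed with HTTP ${statusRes.status}`);
    const data = await statusRes.json();

    if (data.status === "completed") return data.result;
    if (data.status === "failed") throw new Error(`Scrape failed: ${JSON.stringify(data.error)}`);

    // Wait 3 seconds before the next poll
    await new Promise(r => setTimeout(r, 3000));
  }
}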

// Top-level await needs an ES module (a .mjs file or "type": "module" in package.json)
const result = await scrape("https://news.ycombinator.com");
console.log(result.content);
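
Both loops above poll until the job reaches a final state, which is fine for a tutorial but risky in a long-running service. Below is a sketch of the same loop with a deadline. The helper name poll_until_done and its parameters are hypothetical. It takes any zero-argument function that returns the parsed status JSON, so it can wrap a requests.get call or, as in the demo here, a list of canned responses.

```python
import time


def poll_until_done(fetch_status, interval=3.0, timeout=120.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job completes, fails, or the timeout passes.

    fetch_status is a zero-argument callable returning the parsed status JSON,
    for example: lambda: requests.get(url, headers=HEADERS, timeout=30).json()
    """
    deadline = clock() + timeout
    while True:
        data = fetch_status()
        if data["status"] == "completed":
            return data["result"]
        if data["status"] == "failed":
            raise RuntimeError(f"Job failed: {data.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"Job still {data['status']!r} after {timeout} seconds")
        sleep(interval)


# Demo with canned responses instead of real HTTP calls
responses = iter([
    {"status": "active", "result": None, "error": None},
    {"status": "completed", "result": {"content": "done"}, "error": None},
])
print(poll_until_done(lambda: next(responses), sleep=lambda seconds: None))
```

Passing sleep and clock in as parameters is a design choice made for this sketch: it lets you test the loop without waiting in real time.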

AI extraction with prompts

The plain scrape above gives you raw Markdown. Most of the time you want something more specific. That is where the prompt field comes in.

Add a prompt and Spidra reads the rendered page and extracts exactly what you described. The AI understands context. It knows a number next to a currency symbol is a price, that a short bold line near the top of a product page is probably the title, and that a block of longer text is likely a description. You describe the output you want and it figures out where to find it.

curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [{"url": "https://news.ycombinator.com"}],
    "prompt": "Extract the top 10 post titles and their point scores",
    "output": "json"
  }'

When the job completes, result.content contains the AI-extracted data as JSON:

[
  {"title": "Show HN: I built a thing", "points": 342},
  {"title": "Ask HN: What are you working on?", "points": 289}
]

The output field controls the format. It defaults to "json" but you can set it to "markdown" if you want the extracted content as formatted text instead of structured data.

One thing to know: if you set output: "json" without a prompt, Spidra still runs a default AI extraction pass. If you want the raw page content with no AI processing at all, omit both output and prompt.

If AI extraction fails for any reason (a near-empty page, a heavily obfuscated site), Spidra falls back to returning the raw page Markdown and sets ai_extraction_failed: true in the response so your code can detect and handle it.

Structured output with JSON schema

Prompts are flexible but they are not predictable. The AI decides what fields to return and what to call them. For production pipelines where downstream systems expect a specific shape, that is a problem.

The schema field solves this. Pass a JSON Schema object and the AI must return data that matches it exactly. Required fields always appear in the output, as null if the page does not have that value. Field names match exactly what you defined. The structure never varies between runs.

curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [{"url": "https://jobs.example.com/senior-engineer"}],
    "prompt": "Extract the job details. Normalize salary to a number in USD.",
    "schema": {
      "type": "object",
      "required": ["title", "company", "remote", "employment_type"],
      "properties": {
        "title":           {"type": "string"},
        "company":         {"type": "string"},
        "remote":          {"type": ["boolean", "null"]},
        "salary_min":      {"type": ["number", "null"]},
        "salary_max":      {"type": ["number", "null"]},
        "employment_type": {
          "type": ["string", "null"],
          "enum": ["full_time", "part_time", "contract", null]
        }
      }
    }
  }'

The response will always have title, company, remote, and employment_type because they are in required. If the page does not mention a salary, salary_min and salary_max come back as null rather than being omitted.

When you provide a schema, output is automatically set to "json". You do not need to set it yourself.

The schema is validated before the job is queued. If it is malformed, the API returns a 422 with descriptive errors. Non-fatal issues like unsupported keywords come back as schema_warnings in the response.

Schema limits to be aware of: maximum nesting depth is 5 levels, maximum schema size is 10 KB.
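
You can check a schema against these limits locally before submitting it. The sketch below rests on two assumptions that the text above does not confirm: that "10 KB" means 10 * 1024 bytes of serialized JSON, and that nesting depth counts the root object as level 1 and each nested properties or items schema as one more level. The function names are hypothetical, and the API's own validation remains the authority.

```python
import json

MAX_DEPTH = 5          # nesting limit stated above
MAX_BYTES = 10 * 1024  # assumption: "10 KB" means 10 * 1024 bytes of serialized JSON


def schema_depth(schema):
    """Count nesting levels: the root is 1, each nested schema adds 1 (assumed counting rule)."""
    if not isinstance(schema, dict):
        return 0
    children = list(schema.get("properties", {}).values())
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    return 1 + max((schema_depth(child) for child in children), default=0)


def check_schema_limits(schema):
    """Return a list of problems; an empty list means the schema is within both limits."""
    problems = []
    size = len(json.dumps(schema).encode("utf-8"))
    if size > MAX_BYTES:
        problems.append(f"schema is {size} bytes, limit is {MAX_BYTES}")
    depth = schema_depth(schema)
    if depth > MAX_DEPTH:
        problems.append(f"schema nests {depth} levels deep, limit is {MAX_DEPTH}")
    return problems


job_schema = {
    "type": "object",
    "required": ["title", "company"],
    "properties": {
        "title": {"type": "string"},
        "company": {"type": "string"},
        "salary_min": {"type": ["number", "null"]},
    },
}
print(check_schema_limits(job_schema))
```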

Browser actions

Some pages do not show you the data you want until you interact with them first. Cookie banners blocking content. A "Load More" button that reveals the next batch of results. A search form you need to fill before anything appears. Tabs that hide content by default.

The actions array on each URL object lets you interact with the page before extraction runs. Actions execute in order, inside a real browser, before Spidra runs your prompt.

Here is an example that dismisses a cookie banner, fills in and submits a search form, waits for results to load, and scrolls most of the way down the page:

{
  "urls": [{
    "url": "https://example.com/search",
    "actions": [
      {"type": "click", "value": "Accept cookies button"},
      {"type": "type", "selector": "input[name='q']", "value": "wireless headphones"},
      {"type": "click", "selector": "button[type='submit']"},
      {"type": "wait", "duration": 1500},
      {"type": "scroll", "to": "80%"}
    ]
  }],
  "prompt": "Extract all product names and prices from the search results",
  "output": "json"
}

Notice that for the first click, the value field is a plain English description of the element. For the second click, the selector field is a CSS selector. Both approaches work and you can mix them in the same actions array.

For any click, check, or uncheck action:

Use selector for a CSS selector such as "#accept-cookies" or ".submit-btn", or for an XPath expression
Use value for a plain English description like "Accept cookies button" and Spidra's AI will find the element for you

Both are equally valid. Use whichever makes more sense for the page you are working with.

Available actions

Action	What it does	Key fields
`click`	Clicks a button, link, tab, or any element	`selector` or `value`
`type`	Types text into an input or search field	`selector`, `value`
`check`	Checks a checkbox	`selector` or `value`
`uncheck`	Unchecks a checkbox	`selector` or `value`
`wait`	Pauses for a number of milliseconds	`duration`
`scroll`	Scrolls to a percentage of the page height	`to` (e.g. `"80%"`)
`forEach`	Finds matching elements and processes each one	`value`, `mode`
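
A typo in an actions array is cheaper to catch before the job is queued. The sketch below transcribes the table above into a lookup and reports actions that lack their key fields. It treats every key field as required, which is stricter than the table itself states, so read it as an assumption. The names REQUIRED_FIELDS and validate_actions are hypothetical.

```python
# Key fields per action type, transcribed from the table above.
# Each tuple is a group of alternatives: at least one name in it must be present.
REQUIRED_FIELDS = {
    "click":   [("selector", "value")],
    "check":   [("selector", "value")],
    "uncheck": [("selector", "value")],
    "type":    [("selector",), ("value",)],
    "wait":    [("duration",)],
    "scroll":  [("to",)],
    "forEach": [("value",), ("mode",)],
}


def validate_actions(actions):
    """Return a list of problems found in an actions array (empty if none)."""
    problems = []
    for index, action in enumerate(actions):
        kind = action.get("type")
        if kind not in REQUIRED_FIELDS:
            problems.append(f"action {index}: unknown type {kind!r}")
            continue
        for alternatives in REQUIRED_FIELDS[kind]:
            if not any(name in action for name in alternatives):
                problems.append(f"action {index} ({kind}): needs {' or '.join(alternatives)}")
    return problems


actions = [
    {"type": "click", "value": "Accept cookies button"},
    {"type": "type", "selector": "input[name='q']", "value": "wireless headphones"},
    {"type": "wait"},  # missing duration on purpose
]
print(validate_actions(actions))
```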

The forEach action

forEach is the most powerful action in the API. It finds a set of repeating elements on the page (product cards, search result links, accordion rows, directory listings) and processes each one individually, then combines all the results into a single output.

It supports three modes:

inline reads the content of each matched element directly. Use this for product cards, table rows, or any content that lives inside the element itself.

navigate follows each element as a link, loads the destination page, and scrapes it. Use this when the data you want is on detail pages that you need to navigate into.

click clicks each element to expand or reveal content, then scrapes what appears. Use this for accordions, modals, or expandable sections.

{
  "urls": [{
    "url": "https://directory.example.com/companies",
    "actions": [
      {"type": "click", "value": "Accept cookies"},
      {
        "type": "forEach",
        "value": "Find all company listing cards",
        "mode": "navigate",
        "maxItems": 20,
        "itemPrompt": "Extract company name, website, and industry",
        "pagination": {
          "nextSelector": "a.next-page",
          "maxPages": 3
        }
      }
    ]
  }],
  "output": "json"
}

This dismisses the cookie banner, finds the company cards on the page (up to the maxItems limit of 20), navigates into each one, extracts the company details, and follows the next-page link for up to 3 pages. All in a single API call.

Proxy and geo-targeting

Some sites block traffic from cloud IP ranges. Others serve different content based on location. The useProxy and proxyCountry fields route your requests through residential proxies to handle both situations.

{
  "urls": [{"url": "https://amazon.de/dp/B123456"}],
  "prompt": "Extract the product price",
  "output": "json",
  "useProxy": true,
  "proxyCountry": "de"
}

Setting useProxy: true routes the request through the residential proxy network. proxyCountry accepts:

A two-letter ISO country code like "us", "de", "gb", "fr"
"eu" to rotate randomly across all 27 EU member states
"global" or omit it entirely for no country preference

Proxy usage is billed from your bandwidth quota, not your credits. There is no credit multiplier for using proxies.
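
If the country comes from user input or a config file, it helps to normalize it before it reaches the request body. The following is a sketch under stated assumptions: the helper name proxy_options is hypothetical, and the check only confirms that a value looks like a two-letter code, not that it is a real ISO country.

```python
def proxy_options(country=None):
    """Build the proxy fields for a scrape request body."""
    options = {"useProxy": True}
    if country is None:
        return options  # omitting proxyCountry means no country preference
    code = country.strip().lower()
    looks_like_iso = len(code) == 2 and code.isascii() and code.isalpha()
    if code not in ("eu", "global") and not looks_like_iso:
        raise ValueError(f"unsupported proxyCountry: {country!r}")
    options["proxyCountry"] = code
    return options


payload = {
    "urls": [{"url": "https://amazon.de/dp/B123456"}],
    "prompt": "Extract the product price",
    "output": "json",
    **proxy_options("DE"),
}
print(payload["useProxy"], payload["proxyCountry"])
```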

Additional options

Extract content only

Strip navigation, headers, footers, and sidebars before extraction. Useful when you only want the main content of a page and want to reduce noise.

{
  "urls": [{"url": "https://blog.example.com/article"}],
  "prompt": "Summarize this article",
  "extractContentOnly": true
}

Screenshots

Capture screenshots of scraped pages for debugging, archival, or visual monitoring.

{
  "urls": [{"url": "https://example.com"}],
  "screenshot": true,
  "fullPageScreenshot": true
}

screenshot: true captures the visible viewport. fullPageScreenshot: true captures the entire scrollable page. The screenshot URLs are returned in result.screenshots and in each item's screenshotUrl field.

Authenticated scraping

Pass session cookies to access pages behind a login. Get the cookies from your browser's DevTools after logging in manually, then include them in your request.

{
  "urls": [{"url": "https://app.example.com/dashboard"}],
  "prompt": "Extract the account summary",
  "cookies": "session_id=abc123; auth_token=xyz789"
}

Standard cookie format (name=value; name2=value2) and Chrome DevTools paste format both work. Cookies are passed ephemerally to the browser worker and never stored.
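
If your cookies live in a dict, for example after loading them from a file, a few lines turn them into the standard name=value; name2=value2 string. This is a sketch: the helper name cookie_string is hypothetical, and rejecting semicolons and line breaks is a precaution added here, not a documented rule.

```python
def cookie_string(cookies):
    """Join a dict of cookie names and values into 'name=value; name2=value2'."""
    for name, value in cookies.items():
        # A semicolon or line break inside a name or value would corrupt the string
        if any(ch in f"{name}{value}" for ch in ";\n\r"):
            raise ValueError(f"cookie {name!r} contains a character that would break the format")
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


payload = {
    "urls": [{"url": "https://app.example.com/dashboard"}],
    "prompt": "Extract the account summary",
    "cookies": cookie_string({"session_id": "abc123", "auth_token": "xyz789"}),
}
print(payload["cookies"])
```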

Batch scraping

When you have a list of URLs to process, the batch endpoint handles up to 50 at a time in parallel. Each URL runs in its own independent worker.

Endpoint: POST /api/batch/scrape

import requests
import time

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.spidra.io/api"
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}

urls = [
    "https://example.com/product/1",
    "https://example.com/product/2",
    "https://example.com/product/3",
]

# Submit the batch
response = requests.post(
    f"{BASE_URL}/batch/scrape",
    headers=HEADERS,
    json={
        "urls": urls,
        "prompt": "Extract the product name, price, and availability",
        "output": "json",
    }
)
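# Stop early on 401, 403, 429 and similar instead of failing on a missing batchId
response.raise_for_status()
batch_id = response.json()["batchId"]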

# Poll until complete
while True:
    status = requests.get(
        f"{BASE_URL}/batch/scrape/{batch_id}",
        headers=HEADERS
    ).json()

    if status["status"] in ("completed", "failed", "partial"):
        break

    time.sleep(3)

# Process results
for item in status["items"]:
    if item["status"] == "completed":
        print(f"{item['url']}: {item['result']['content']}")
    else:
        print(f"Failed: {item['url']} — {item['error']}")

The batch response includes a status for the overall batch and an items array with one entry per URL. Each item has its own status, result, and error so you can see exactly which URLs succeeded and which failed.

Credits are reserved upfront when you submit and reconciled per item when processing completes. If a URL fails, credits for that item are returned.
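
When your list is longer than 50 URLs, split it into several batches and submit each one separately. A minimal sketch of the splitting step, with hypothetical names:

```python
MAX_BATCH_SIZE = 50  # per-request limit stated above


def chunk_urls(urls, size=MAX_BATCH_SIZE):
    """Split a list of URLs into consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


urls = [f"https://example.com/product/{n}" for n in range(1, 121)]
for batch in chunk_urls(urls):
    # Each `batch` can be sent as the "urls" value of one POST /api/batch/scrape request
    print(len(batch))
```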

Batch with structured output

Everything that works in single scrape works in batch. Pass a schema and every item in the batch returns data matching that shape:

requests.post(
    f"{BASE_URL}/batch/scrape",
    headers=HEADERS,
    json={
        "urls": urls,
        "prompt": "Extract the product details",
        "schema": {
            "type": "object",
            "required": ["name", "price"],
            "properties": {
                "name":      {"type": "string"},
                "price":     {"type": ["number", "null"]},
                "currency":  {"type": ["string", "null"]},
                "available": {"type": ["boolean", "null"]}
            }
        }
    }
)

Managing batches

Beyond submitting and polling, the batch API has a few more endpoints worth knowing:

Endpoint	What it does
`GET /api/batch/scrape`	List all your batch jobs with status and credit usage
`DELETE /api/batch/scrape/{batchId}`	Cancel a running or pending batch. Credits for unprocessed items are refunded.
`POST /api/batch/scrape/{batchId}/retry`	Re-queue only the failed items in a completed batch without resubmitting the ones that already succeeded.

The retry endpoint is particularly useful for large batches where a handful of items fail due to transient issues. You do not need to resubmit the full batch, just the failures.
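
Before calling retry, it is worth looking at which items failed and why. The sketch below assumes the batch status response has the shape used in the polling example above (an items array whose entries carry url, status, and error) and that failed items report the status "failed". Both helper names are hypothetical.

```python
def failed_items(batch_status):
    """Return (url, error) pairs for every item whose status is "failed"."""
    return [
        (item["url"], item.get("error"))
        for item in batch_status.get("items", [])
        if item.get("status") == "failed"  # assumption: failures use this status value
    ]


def retry_path(batch_id):
    """Path of the retry endpoint for a batch, relative to the base URL."""
    return f"/batch/scrape/{batch_id}/retry"


# Hand-written sample in the shape the polling example reads
status = {
    "status": "partial",
    "items": [
        {"url": "https://example.com/product/1", "status": "completed", "error": None},
        {"url": "https://example.com/product/2", "status": "failed", "error": "Timed out"},
    ],
}
for url, error in failed_items(status):
    print(f"{url}: {error}")
```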

Crawling

Batch scraping works when you already know the URLs. Crawling is for when you want Spidra to discover pages for you.

You give it a starting URL, describe which pages to follow, and describe what to extract from each one. Spidra loads the base URL, finds links matching your crawl instruction, visits each one up to your maxPages limit, and applies your transform instruction to every page it visits.

Endpoint: POST /api/crawl

response = requests.post(
    f"{BASE_URL}/crawl",
    headers=HEADERS,
    json={
        "baseUrl": "https://docs.example.com",
        "crawlInstruction": "Follow all documentation pages. Skip changelog and login pages.",
        "transformInstruction": "Extract the page title and full body text as clean Markdown. Preserve all headings and code examples.",
        "maxPages": 20,
        "useProxy": False
    }
)
job_id = response.json()["jobId"]

Three fields are required: baseUrl, crawlInstruction, and transformInstruction. Everything else is optional.

maxPages defaults to 5 and goes up to 20. The crawl discovers links from the base URL first, then works through them in order of discovery.
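
A small builder can enforce the required fields and keep maxPages inside the documented range before the request is sent. This is a sketch with hypothetical names. Clamping silently to 20 is a choice made here. You may prefer to raise an error instead.

```python
DEFAULT_MAX_PAGES = 5   # default stated above
MAX_PAGES_LIMIT = 20    # upper limit stated above


def crawl_payload(base_url, crawl_instruction, transform_instruction,
                  max_pages=DEFAULT_MAX_PAGES):
    """Build the JSON body for POST /api/crawl."""
    required = {
        "baseUrl": base_url,
        "crawlInstruction": crawl_instruction,
        "transformInstruction": transform_instruction,
    }
    for name, value in required.items():
        if not value or not value.strip():
            raise ValueError(f"{name} is required")
    # Clamp into the 1..20 range rather than letting the API reject the request
    return {**required, "maxPages": max(1, min(int(max_pages), MAX_PAGES_LIMIT))}


body = crawl_payload(
    "https://docs.example.com",
    "Follow all documentation pages. Skip changelog and login pages.",
    "Extract the page title and full body text as clean Markdown.",
    max_pages=50,
)
print(body["maxPages"])
```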

Poll GET /api/crawl/{jobId} for status. When complete, results are available through several endpoints:

Endpoint	What it returns
`GET /api/crawl/{jobId}`	Overall status and summary
`GET /api/crawl/{jobId}/pages`	All crawled pages with extracted data and signed URLs to the original HTML and Markdown
`GET /api/crawl/{jobId}/download`	ZIP archive of all results
`POST /api/crawl/{jobId}/extract`	Run a new extraction on already-crawled pages without re-crawling
`GET /api/crawl/history`	Paginated list of your past crawl jobs

The extract endpoint is worth highlighting. If you crawl a site and later decide you want to extract different fields, you can run a new extraction on the cached pages without making a single new browser request. That saves time and credits.

A complete crawl example

import requests
import time
import json

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.spidra.io/api"
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}

# Submit the crawl
job = requests.post(
    f"{BASE_URL}/crawl",
    headers=HEADERS,
    json={
        "baseUrl": "https://blog.example.com",
        "crawlInstruction": "Follow all blog post pages. Skip tag pages, author pages, and the homepage.",
        "transformInstruction": "Extract the article title, author, publish date, and full body text.",
        "maxPages": 15
    }
).json()

job_id = job["jobId"]
print(f"Crawl job started: {job_id}")

# Poll until complete
while True:
    status = requests.get(
        f"{BASE_URL}/crawl/{job_id}",
        headers=HEADERS
    ).json()

    print(f"Status: {status['status']}")

    if status["status"] == "completed":
        break
    elif status["status"] == "failed":
        raise Exception("Crawl failed")

    time.sleep(5)

# Fetch all crawled pages
pages = requests.get(
    f"{BASE_URL}/crawl/{job_id}/pages",
    headers=HEADERS
).json()

# Save as JSONL
with open("crawl_results.jsonl", "w") as f:
    for page in pages["pages"]:
        f.write(json.dumps({
            "url": page["url"],
            "data": page["data"]
        }) + "\n")

print(f"Saved {len(pages['pages'])} pages")

Monitoring and logs

The Spidra API keeps a log of every scrape job you run. This is useful for debugging, auditing, and understanding your credit consumption.

# List recent scrape logs
logs = requests.get(
    f"{BASE_URL}/scrape-logs",
    headers=HEADERS
).json()

for log in logs["data"]:
    print(f"{log['started_at']} — {log['status']} — {log['latency_ms']}ms — {log['tokens_used']} tokens")

# Get full details of a specific log, here the first one in the list
if logs["data"]:
    log_uuid = logs["data"][0]["uuid"]
    log_detail = requests.get(
        f"{BASE_URL}/scrape-logs/{log_uuid}",
        headers=HEADERS
    ).json()

Usage statistics

Track your credit consumption over time:

usage = requests.get(
    f"{BASE_URL}/account/usage",
    headers=HEADERS
).json()

print(usage)

This returns time-series data covering requests, tokens, crawls, and credit consumption over a configurable period.

Putting it all together: a real pipeline

Here is a complete example that combines scraping, browser actions with forEach, and structured output into a pipeline that collects job listings from multiple job boards and saves them to a JSONL file:

import requests
import time
import json

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.spidra.io/api"
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}

JOB_SCHEMA = {
    "type": "object",
    "required": ["title", "company", "location"],
    "properties": {
        "title":           {"type": "string"},
        "company":         {"type": "string"},
        "location":        {"type": ["string", "null"]},
        "remote":          {"type": ["boolean", "null"]},
        "salary_min":      {"type": ["number", "null"]},
        "salary_max":      {"type": ["number", "null"]},
        "employment_type": {
            "type": ["string", "null"],
            "enum": ["full_time", "part_time", "contract", None]
        }
    }
}

def collect_job_urls(board_url):
    """Use forEach to open each job listing on a board page and extract its details."""
    response = requests.post(
        f"{BASE_URL}/scrape",
        headers=HEADERS,
        json={
            "urls": [{
                "url": board_url,
                "actions": [
                    {"type": "click", "value": "Accept cookies"},
                    {
                        "type": "forEach",
                        "value": "Find all job listing links",
                        "mode": "navigate",
                        "maxItems": 50,
                        "itemPrompt": "Extract job title, company, location, remote status, salary range, and employment type",
                        "pagination": {
                            "nextSelector": "a.next-page",
                            "maxPages": 3
                        }
                    }
                ]
            }],
            "output": "json",
            "schema": JOB_SCHEMA
        }
    )
    job_id = response.json()["jobId"]

    while True:
        status = requests.get(
            f"{BASE_URL}/scrape/{job_id}",
            headers=HEADERS
        ).json()

        if status["status"] == "completed":
            return status["result"]["content"]
        elif status["status"] == "failed":
            raise Exception(status["error"])

        time.sleep(3)

# Collect from multiple job boards
boards = [
    "https://jobs.example.com/engineering",
    "https://careers.anothersite.com/remote",
]

all_jobs = []
for board in boards:
    print(f"Collecting from {board}...")
    jobs = collect_job_urls(board)
    if isinstance(jobs, list):
        all_jobs.extend(jobs)
    print(f"  Got {len(jobs) if isinstance(jobs, list) else 0} jobs")

# Save results
with open("jobs.jsonl", "w") as f:
    for job in all_jobs:
        f.write(json.dumps(job) + "\n")

print(f"\nTotal: {len(all_jobs)} jobs saved to jobs.jsonl")

Error handling

Wrap your API calls properly and handle the cases that actually come up in production.

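import requests

# Same constants as in the earlier examples
BASE_URL = "https://api.spidra.io/api"
HEADERS = {"x-api-key": "YOUR_API_KEY", "Content-Type": "application/json"}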

def safe_scrape(url, prompt):
    try:
        response = requests.post(
            f"{BASE_URL}/scrape",
            headers=HEADERS,
            json={
                "urls": [{"url": url}],
                "prompt": prompt,
                "output": "json"
            }
        )

        if response.status_code == 401:
            raise Exception("Invalid API key. Check your x-api-key header.")

        if response.status_code == 403:
            raise Exception("Credits exhausted or plan limit reached.")

        if response.status_code == 429:
            raise Exception("Rate limit hit. Wait before retrying.")

        response.raise_for_status()
        return response.json()["jobId"]

    except requests.exceptions.ConnectionError:
        raise Exception("Could not connect to the Spidra API.")
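
The 429 branch above gives up on the first rate-limit response. In practice you usually want to wait and try again. The following sketches exponential backoff with a little jitter. The helper name with_backoff, the delay values, and the attempt count are all assumptions, not documented Spidra behavior. It accepts any zero-argument function that returns an object with a status_code attribute, so the demo can run without network access.

```python
import random
import time
from types import SimpleNamespace


def with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a status other than 429 or attempts run out.

    send is a zero-argument callable returning a response-like object, such as
    lambda: requests.post(url, headers=HEADERS, json=payload, timeout=30)
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Delay doubles each time (1s, 2s, 4s, ...) plus up to 250 ms of jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
    return response  # still 429 after the last attempt; the caller decides what to do


# Demo with stand-in responses: one 429, then a 202
fake = iter([SimpleNamespace(status_code=429), SimpleNamespace(status_code=202)])
response = with_backoff(lambda: next(fake), sleep=lambda seconds: None)
print(response.status_code)
```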

For polling loops, always handle the failed status and check ai_extraction_failed in the result:

if status["status"] == "completed":
    result = status["result"]

    if result.get("ai_extraction_failed"):
        # AI extraction failed, content is raw Markdown fallback
        print("AI extraction failed, using raw content")
        content = result["data"][0]["markdownContent"]
    else:
        content = result["content"]

API reference summary

Method	Endpoint	Purpose
`POST`	`/api/scrape`	Submit a scrape job (1 to 3 URLs)
`GET`	`/api/scrape/{jobId}`	Poll for job status and results
`POST`	`/api/batch/scrape`	Submit a batch job (up to 50 URLs)
`GET`	`/api/batch/scrape/{batchId}`	Poll batch status and per-item results
`GET`	`/api/batch/scrape`	List all your batch jobs
`DELETE`	`/api/batch/scrape/{batchId}`	Cancel a batch and refund unused credits
`POST`	`/api/batch/scrape/{batchId}/retry`	Retry only the failed items in a batch
`POST`	`/api/crawl`	Submit a crawl job
`GET`	`/api/crawl/{jobId}`	Poll crawl status
`GET`	`/api/crawl/{jobId}/pages`	Get all crawled pages with extracted data
`POST`	`/api/crawl/{jobId}/extract`	Re-extract from crawled pages without re-crawling
`GET`	`/api/crawl/{jobId}/download`	Download crawl results as ZIP
`GET`	`/api/crawl/history`	List your past crawl jobs
`GET`	`/api/scrape-logs`	List recent scrape logs
`GET`	`/api/scrape-logs/{id}`	Get full details of a single log
`GET`	`/api/account/usage`	Get usage statistics

What next

You now have a working understanding of every part of the Spidra API. Here are the natural next steps depending on what you are building:

If you want to go deeper on browser actions and forEach, read the Browser Actions Guide in the docs. It covers every option for each action type with real examples.

If you are building something that needs guaranteed output shapes, read the Structured Output Guide for full details on schemas, nullable fields, Zod and Pydantic integration, and schema limits.

If you are using an SDK in a specific language, each one has its own guide: Node.js, Python, Go, PHP, Ruby, Rust, .NET, Elixir, Java, and Swift.

Get your API key at app.spidra.io. The free plan has 300 credits and no card required.
