Data Enrichment

Feed in a list.
Get back a database.

Use the Spidra API to loop through any list of URLs, perform AI extraction, handle proxies, and follow multi-hop links. You own the orchestration. We handle the hard parts.

300 free credits included. No credit card required.

Input — your URL list

booking.com/hotel/br/grand-hyatt

eventbrite.com/o/houston-arts-42417

zillow.com/homedetails/123-main-st

amazon.com/dp/B08N5WRWNW

Spidra enriches

Output — structured fields

name: "Grand Suite Ocean View"

sizeM2: 82

email: "[email protected]"

price: $1,249,000

rating: 4.7

inStock: true

Perfect for

Travel and hospitality

Tour operators, OTAs, and hotel chains that maintain large property catalogs. Extract room specs, amenities, dining, wellness, and location data from Booking.com, IHG, Marriott, or any hotel website.

Sales and marketing teams

Enrich existing contact lists with emails, phone numbers, social links, and company details from organizer pages, business directories, and external websites.

E-commerce and product teams

Populate product databases with specs, prices, descriptions, and images from supplier sites or competitor pages. Keep catalog data fresh without manual data entry or expensive third-party feeds.

Real estate and finance

Aggregate property listings, valuation data, planning permits, and neighborhood stats from multiple sources. Normalize every record to the same schema so your models and dashboards always have complete, consistent data.

How it works

Four steps from a raw list of URLs to a fully populated dataset.

Start with your seed list

Start with any URL list. Read a CSV in your script, hardcode an array, or pull from a database. Pass each URL to the Spidra API and it works from whatever you have.
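As a minimal sketch of this first step, the helper below pulls URLs out of CSV text. It assumes a simple file with a header row, a column named `url` (a hypothetical name), and no quoted commas; a proper CSV library would be safer for messy exports.

```javascript
// Sketch: build a seed list from CSV text.
// Assumes a header row, a column named "url" (hypothetical), and no quoted commas.
function seedListFromCsv(csvText, column = "url") {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const header = lines[0].split(",").map((name) => name.trim());
  const index = header.indexOf(column);
  if (index === -1) throw new Error(`Column "${column}" not found`);
  return lines
    .slice(1)
    .map((line) => (line.split(",")[index] || "").trim())
    .filter((url) => url !== "");
}

// Usage: in Node, read the file first, e.g. fs.readFileSync("hotels.csv", "utf8").
const seeds = seedListFromCsv("name,url\nGrand Hyatt,https://booking.com/hotel/br/grand-hyatt\n");
```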

Define your schema

Describe the fields you want as a text prompt, or pass a JSON Schema. Spidra locks the output shape so every record comes back with the same fields, every time.
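For illustration, a JSON Schema for one room record might look like the sketch below. The field names are borrowed from the sample room output on this page. The request parameter that carries the schema is not shown here, so treat this as the shape of the schema only and check the API reference before sending it.

```javascript
// Sketch of a JSON Schema for one room record.
// Nullable types reflect the "null means not found" convention.
const roomSchema = {
  type: "object",
  properties: {
    name: { type: ["string", "null"] },
    sizeM2: { type: ["number", "null"] },
    view: { type: ["string", "null"] },
    accommodationType: { type: ["string", "null"] },
    balcony: { type: ["boolean", "null"] },
  },
  // Requiring every key is what keeps the record shape fixed.
  required: ["name", "sizeM2", "view", "accommodationType", "balcony"],
  additionalProperties: false,
};
```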

Spidra follows the chain

Most data lives across multiple pages. Spidra clicks into modals, follows links, visits external websites, and resolves redirects. All automatic, no extra code from you.

Get normalized JSON

Every field is extracted, normalized, and returned as clean JSON. Null means not found. The shape never changes. Plug it straight into your database, CRM, or pipeline.
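Because missing values come back as null instead of being dropped, a pipeline can check completeness with a simple scan. A small sketch (the helper name `missingFields` is illustrative, not part of the API):

```javascript
// Sketch: list the fields that came back as null (not found) in one record.
function missingFields(record) {
  return Object.entries(record)
    .filter(([, value]) => value === null)
    .map(([field]) => field);
}

const record = { email: "info@example.org", phone: null, address: null };
const gaps = missingFields(record); // ["phone", "address"]
```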

Multi-hop extraction

Real-world data rarely lives on a single page. Spidra follows every link in the chain until it has everything you asked for.

Hotel content pipeline

Hotel page

https://booking.com/hotel/br/grand-hyatt-rio

Opens page, scrolls to availability table

Room modals (forEach)

Clicks each room category link

Extracts name, size, view, amenities per room

Parallel crawls

8 simultaneous category extractions

Dining, wellness, sport, facilities, services, kids, location, basic

Structured output

Full hotel profile, normalized to schema

{ rooms: [...], dining: {...}, wellness: {...}, location: {...} }
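One way to picture the eight parallel category extractions is as eight request bodies for the same hotel URL, one prompt per category. This is a sketch only: the body fields (`urls`, `prompt`, `output`) are the ones used in the request example on this page, the prompt wording is made up, and Spidra may fan out internally in a different way.

```javascript
// The eight categories named in the hotel pipeline.
const CATEGORIES = ["dining", "wellness", "sport", "facilities", "services", "kids", "location", "basic"];

// Sketch: one request body per category for a single hotel URL.
// The prompt text is illustrative, not a documented format.
function categoryRequests(hotelUrl) {
  return CATEGORIES.map((category) => ({
    category,
    body: {
      urls: [{ url: hotelUrl }],
      prompt: `Extract all ${category} details for this hotel.`,
      output: "json",
    },
  }));
}

// Each body could then be POSTed concurrently, e.g. with Promise.all.
const requests = categoryRequests("https://booking.com/hotel/br/grand-hyatt-rio");
```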

Contact enrichment pipeline

Event page

https://eventbrite.com/e/event-123

Extracts event name, date, organizer name and profile link

Organizer profile

https://eventbrite.com/o/organizer-456

Extracts website URL, Facebook page, follower count, total events

Organizer website

Tries homepage, /contact, /about

Extracts email, phone, address — falls back to Facebook if missing

Structured output

CRM-ready record, all fields filled

{ email: "...", phone: "...", address: "...", followers: 2400 }
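The fallback in the last hop can be thought of as a field-by-field merge: prefer the organizer website, and use the Facebook value only where the website returned null. A minimal sketch with hypothetical helper and field names:

```javascript
// Sketch: fill each contact field from the first source that has a non-null value.
const CONTACT_FIELDS = ["email", "phone", "address"];

function mergeContact(...sources) {
  const merged = {};
  for (const field of CONTACT_FIELDS) {
    const hit = sources.find((source) => source && source[field] != null);
    merged[field] = hit ? hit[field] : null;
  }
  return merged;
}

const fromWebsite = { email: null, phone: "(713) 555-0182", address: null };
const fromFacebook = { email: "hello@example.org", phone: null, address: null };
const contact = mergeContact(fromWebsite, fromFacebook);
// contact.email comes from Facebook, contact.phone from the website, address stays null.
```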

Developer API

Build your enrichment pipeline with a few API calls.

No scraper maintenance. No fragile selectors. Just describe what you need and Spidra handles the browser, the AI, the proxies, and the extraction.

Batch any number of URLs in parallel

forEach opens modals and collapsed sections automatically

Proxy rotation built in for geo-restricted sources

Returns a consistent JSON schema on every run

// Hotel content enrichment with forEach + schema
// API_KEY is assumed to hold your Spidra API key (e.g. loaded from an environment variable).
const res = await fetch("https://api.spidra.io/api/scrape", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": API_KEY,
  },
  body: JSON.stringify({
    urls: [{
      url: "https://www.booking.com/hotel/br/grand-hyatt-rio.de.html",
      actions: [{
        // forEach clicks each matching room link and applies itemPrompt to it
        type: "forEach",
        observe: "Find all clickable room category links in the availability table",
        mode: "click",
        itemPrompt: "Extract room name, size in m2, view, bathroom type, and amenities"
      }]
    }],
    prompt: "Extract all room details. Normalize sizes to m2.",
    output: "json",
    useProxy: true,
    proxyCountry: "de",
  }),
});

// The response carries the ID of the submitted job
const { jobId } = await res.json();
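Since orchestration is left to your script, a common pattern is to submit the seed list with a cap on concurrent requests. The sketch below is generic JavaScript, not a Spidra SDK: `submitJob` stands for whatever function wraps the POST request, and is supplied by the caller. The limit of 5 in the usage comment is an arbitrary choice, not a documented rate limit.

```javascript
// Sketch: run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in the same order as the input items.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function run() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const runners = Array.from({ length: Math.min(limit, items.length) }, run);
  await Promise.all(runners);
  return results;
}

// Usage (submitJob is a hypothetical wrapper around the scrape request):
// const jobIds = await mapWithLimit(urls, 5, (url) => submitJob(url));
```

If any single call rejects, the whole batch rejects; wrap `worker` in a try/catch if you would rather record failures and continue.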

Built for real pipelines

Two examples of what teams build with the Spidra API. Same API, different pipelines.

Enterprise

Hotel content pipeline for a large catalog

A tour operator managing a large hotel catalog needs structured facts across every property: rooms, amenities, dining, wellness, and location. The data is extracted from Booking.com and direct hotel sites, then normalized to an internal content schema.

Source: Booking.com, IHG, Marriott, direct hotel sites

Extraction: forEach for room modals + 8 parallel crawls per hotel

Categories: Rooms, dining, wellness, sport, facilities, services, kids, location

Output: Structured JSON, normalized to internal schema

Scale: Handles 100k+ hotels, quarterly refresh

Sample output — one room

{
  "name": "Grand Suite Ocean View",
  "sizeM2": 82,
  "view": "sea",
  "accommodationType": "suite",
  "bathroom": "both",
  "airConditioning": true,
  "minibar": true,
  "balcony": true,
  "safe": true,
  "coffeeTea": true
}

Sales automation

Contact enrichment for an event organizer database

A sales automation team has a list of Eventbrite organizer URLs with partial data. They need email, phone, address, and social links filled in across thousands of records and exported as a CRM-ready dataset in a single automated run.

Source: Eventbrite organizer pages + external websites + Facebook

Extraction: 4-hop chain: event → organizer → website → social fallback

Fields: Email, phone, address, social links, follower count, event count

Output: CRM-ready JSON exported to CSV

Scale: 4,500 records per run, skips already-enriched rows

Sample output — one organizer

{
  "organizer_name": "Houston Arts Collective",
  "email": "[email protected]",
  "phone": "(713) 555-0182",
  "website": "houstonarts.org",
  "facebook": "fb.com/houstonarts",
  "follower_count": 3200,
  "total_events": 47
}
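Exporting enriched records like this one to CSV takes only a few lines. A sketch, assuming every record has the same keys (which the fixed output shape should guarantee) and writing nulls as empty cells:

```javascript
// Sketch: convert same-shaped records to CSV text.
// Cells containing commas, quotes, or newlines are quoted in the style of RFC 4180.
function toCsv(records) {
  if (records.length === 0) return "";
  const columns = Object.keys(records[0]);
  const escapeCell = (value) => {
    if (value === null || value === undefined) return "";
    const text = String(value);
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const rows = records.map((record) => columns.map((col) => escapeCell(record[col])).join(","));
  return [columns.join(","), ...rows].join("\n");
}

const csv = toCsv([
  { organizer_name: "Houston Arts Collective", phone: "(713) 555-0182", total_events: 47 },
]);
```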

How Spidra compares

See how Spidra stacks up for large-scale data enrichment.

Feature comparison: Spidra vs. manual entry, data vendors, and building your own

Multi-hop extraction (follows links)

Real-time data from source

Custom schema per record type

JavaScript rendering + modal clicks

Proxy rotation built in

Scales to 100k+ records

No infrastructure to maintain

Works on any website

FAQ

Common questions about data enrichment with Spidra.

Yes. In your script, read the records you already have and pass each URL to the Spidra API. Skip rows that are already complete by checking before you call. Spidra fetches and fills only what you send it. This is how the contact enrichment example works: read a CSV, skip rows that already have an email, enrich the rest.
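The skip-before-you-call check described above might look like this. The row shape (`url` and `email` keys) is an assumption about your CSV, not something Spidra requires:

```javascript
// Sketch: keep only rows that still need enrichment (no email yet).
function rowsToEnrich(rows) {
  return rows.filter((row) => !row.email || row.email.trim() === "");
}

const rows = [
  { url: "https://eventbrite.com/o/organizer-456", email: "" },
  { url: "https://eventbrite.com/o/houston-arts-42417", email: "team@example.org" },
];
const pending = rowsToEnrich(rows); // only the organizer-456 row remains
```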

Stop filling in data
by hand.

Use the API to loop through your URLs. Get back a complete, structured dataset. 300 free credits to start.

We build features around real workflows. Usually within days.
