Data Enrichment

Feed in a list.
Get back a database.

Point Spidra at a CSV of URLs or a list of product pages. Define the fields you need. Spidra follows every link, opens every modal, and returns a fully structured dataset with every field filled in.

300 free credits included. No credit card required.

Input — your URL list
booking.com/hotel/br/grand-hyatt
eventbrite.com/o/houston-arts-42417
zillow.com/homedetails/123-main-st
amazon.com/dp/B08N5WRWNW
Spidra enriches
Output — structured fields
name: "Grand Suite Ocean View"
sizeM2: 82
price: $1,249,000
rating: 4.7
inStock: true
130k+ records enriched per pipeline
8 parallel data categories
4-hop multi-source extraction depth
Any URL, schema, or industry

Perfect for

Travel and hospitality

Tour operators, OTAs, and hotel chains that maintain large property catalogs. Extract room specs, amenities, dining, wellness, and location data from Booking.com, IHG, Marriott, or any hotel website — all normalized to your internal schema.

Sales and marketing teams

Enrich existing contact lists with emails, phone numbers, social links, and company details from organizer pages, business directories, and external websites. Take a CSV of 4,000 profiles and get back a CRM-ready dataset in one run.

E-commerce and product teams

Populate product databases with specs, prices, descriptions, and images from supplier sites or competitor pages. Keep catalog data fresh without manual data entry or expensive third-party feeds.

Real estate and finance

Aggregate property listings, valuation data, planning permits, and neighborhood stats from multiple sources. Normalize every record to the same schema so your models and dashboards always have complete, consistent data.

How it works

Four steps from a raw list of URLs to a fully populated dataset.

01

Start with your seed list

Pass a CSV, an array of URLs, or a single starting URL. Spidra works from whatever you already have — hotel pages, organizer profiles, product listings, or property URLs.
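As a minimal sketch of turning a seed CSV into an array of URLs, assuming a simple comma-separated file with a header row and a column named `url`. The helper name `urlsFromCsv` and the column name are illustrative and not part of the Spidra API.

```javascript
// Hypothetical helper: read one column of URLs out of simple CSV text.
// Assumes no quoted fields containing commas; use a CSV library for those.
function urlsFromCsv(csvText, column = "url") {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];

  const header = lines[0].split(",").map((name) => name.trim());
  const index = header.indexOf(column);
  if (index === -1) throw new Error(`Column "${column}" not found`);

  // Keep the first occurrence of each URL so duplicates are not enriched twice.
  const seen = new Set();
  const urls = [];
  for (const line of lines.slice(1)) {
    const value = (line.split(",")[index] || "").trim();
    if (value && !seen.has(value)) {
      seen.add(value);
      urls.push(value);
    }
  }
  return urls;
}

const seedCsv = [
  "name,url",
  "Grand Hyatt,https://booking.com/hotel/br/grand-hyatt",
  "Houston Arts,https://eventbrite.com/o/houston-arts-42417",
  "Grand Hyatt (duplicate),https://booking.com/hotel/br/grand-hyatt",
].join("\n");

const seedUrls = urlsFromCsv(seedCsv);
console.log(seedUrls);
```

Deduplicating up front is a design choice here: it keeps repeated rows in the seed file from being processed more than once.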

02

Define your schema

Describe the fields you want in plain English or pass a JSON Schema. Spidra locks the output shape so every record comes back with the same fields, every time.
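For illustration, a JSON Schema for a room record might look like the sketch below. The field names are taken from the sample room output on this page; the nullable types and the `hasLockedShape` helper are assumptions for the example, and this page does not show how a schema is attached to a request.

```javascript
// Illustrative JSON Schema for one hotel room record.
// Nullable types mirror the "null means not found" convention.
const roomSchema = {
  type: "object",
  properties: {
    name: { type: ["string", "null"] },
    sizeM2: { type: ["number", "null"] },
    view: { type: ["string", "null"] },
    accommodationType: { type: ["string", "null"] },
    balcony: { type: ["boolean", "null"] },
  },
  required: ["name", "sizeM2", "view", "accommodationType", "balcony"],
  additionalProperties: false,
};

// Hypothetical shape check (not a full JSON Schema validator):
// every required key is present and no unknown key appears.
function hasLockedShape(record, schema) {
  const keys = Object.keys(record);
  return (
    schema.required.every((key) => key in record) &&
    keys.every((key) => key in schema.properties)
  );
}

const room = {
  name: "Grand Suite Ocean View",
  sizeM2: 82,
  view: "sea",
  accommodationType: "suite",
  balcony: true,
};
console.log(hasLockedShape(room, roomSchema));
```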

03

Spidra follows the chain

Most data lives across multiple pages. Spidra clicks into modals, follows organizer links, visits external websites, and resolves redirects — all automatically.

04

Get normalized JSON

Every field is extracted, normalized, and returned as clean JSON. Null means not found. The shape never changes. Plug it straight into your database, CRM, or pipeline.
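On the consuming side, a fixed shape with nulls makes completeness checks straightforward. The sketch below assumes the four contact fields from the example output on this page; `toFixedShape` and `missingFields` are illustrative helpers, not Spidra functions.

```javascript
// Fields every contact record should carry, whether found or not.
const CONTACT_FIELDS = ["email", "phone", "address", "followers"];

// Coerce any raw object into the fixed shape: unknown keys are dropped,
// absent or empty values become null.
function toFixedShape(raw, fields) {
  const record = {};
  for (const field of fields) {
    const value = raw[field];
    record[field] = value === undefined || value === "" ? null : value;
  }
  return record;
}

// List the fields that came back as null ("not found").
function missingFields(record) {
  return Object.keys(record).filter((key) => record[key] === null);
}

const contact = toFixedShape(
  { phone: "(713) 555-0182", followers: 2400, extra: "ignored" },
  CONTACT_FIELDS
);
console.log(contact);
console.log(missingFields(contact));
```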

Multi-hop extraction

Real-world data rarely lives on a single page. Spidra follows every link in the chain until it has everything you asked for.

Hotel content pipeline

Hotel page

https://booking.com/hotel/br/grand-hyatt-rio

Opens page, scrolls to availability table

Room modals (forEach)

Clicks each room category link

Extracts name, size, view, amenities per room

Parallel crawls

8 simultaneous category extractions

Dining, wellness, sport, facilities, services, kids, location, basic

Structured output

Full hotel profile, normalized to schema

{ rooms: [...], dining: {...}, wellness: {...}, location: {...} }

Contact enrichment pipeline

Event page

https://eventbrite.com/e/event-123

Extracts event name, date, organizer name and profile link

Organizer profile

https://eventbrite.com/o/organizer-456

Extracts website URL, Facebook page, follower count, total events

Organizer website

Tries homepage, /contact, /about

Extracts email, phone, address — falls back to Facebook if missing

Structured output

CRM-ready record, all fields filled

{ email: "...", phone: "...", address: "...", followers: 2400 }
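The fallback step in this chain can be pictured as a priority merge: for each field, take the first source that has a value. This is a sketch of that idea only; how Spidra resolves fallbacks internally is not described here, and the helper name and sample values (other than the phone number and follower count shown on this page) are made up.

```javascript
// Merge per-source results in priority order: the first non-null value wins.
function mergeWithFallback(sources, fields) {
  const merged = {};
  for (const field of fields) {
    merged[field] = null; // stays null when no source has the field
    for (const source of sources) {
      const value = source[field];
      if (value !== null && value !== undefined) {
        merged[field] = value;
        break;
      }
    }
  }
  return merged;
}

// Hypothetical per-hop results: the organizer website first, Facebook second.
const fromWebsite = { email: null, phone: "(713) 555-0182", address: null };
const fromFacebook = { email: null, phone: "(713) 555-0000", address: "123 Example St", followers: 2400 };

const organizer = mergeWithFallback(
  [fromWebsite, fromFacebook],
  ["email", "phone", "address", "followers"]
);
console.log(organizer);
```

Ordering the sources array is what encodes the preference, so changing the fallback order means changing one line.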

Developer API

Build your enrichment pipeline with a few API calls.

No scraper maintenance. No fragile selectors. Just describe what you need and Spidra handles the browser, the AI, the proxies, and the extraction.

Batch any number of URLs in parallel
forEach opens modals and collapsed sections automatically
Proxy rotation built in for geo-restricted sources
Returns consistent JSON schema every single run
// Hotel content enrichment with forEach + schema
// API_KEY is assumed to hold your Spidra API key.
const res = await fetch("https://api.spidra.io/api/scrape", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": API_KEY,
  },
  body: JSON.stringify({
    urls: [{
      url: "https://www.booking.com/hotel/br/grand-hyatt-rio.de.html",
      // forEach clicks each matching room link and runs itemPrompt on it
      actions: [{
        type: "forEach",
        observe: "Find all clickable room category links in the availability table",
        mode: "click",
        itemPrompt: "Extract room name, size in m2, view, bathroom type, and amenities"
      }]
    }],
    prompt: "Extract all room details. Normalize sizes to m2.",
    output: "json",
    useProxy: true,
    proxyCountry: "de",
  }),
});

// Fail early on HTTP errors instead of parsing an error body as a job
if (!res.ok) throw new Error(`Scrape request failed: ${res.status}`);

// The response includes the ID of the scrape job
const { jobId } = await res.json();
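To batch several pages, the same request body can be generated from a list. The sketch below reuses only the fields shown in the request above and assumes that additional entries in the `urls` array are how multiple pages are submitted, which this page implies but does not spell out. `buildScrapeBody` is a hypothetical helper and the second URL is a placeholder.

```javascript
// Hypothetical helper: build one request body for many pages,
// applying the same forEach action to each URL.
function buildScrapeBody(pageUrls, options) {
  const { observe, itemPrompt, prompt, proxyCountry } = options;
  const body = {
    urls: pageUrls.map((url) => ({
      url,
      actions: [{ type: "forEach", observe, mode: "click", itemPrompt }],
    })),
    prompt,
    output: "json",
  };
  // Only include proxy settings when a country is requested.
  if (proxyCountry) {
    body.useProxy = true;
    body.proxyCountry = proxyCountry;
  }
  return body;
}

const batchBody = buildScrapeBody(
  [
    "https://www.booking.com/hotel/br/grand-hyatt-rio.de.html",
    "https://www.booking.com/hotel/br/example-hotel.de.html", // placeholder URL
  ],
  {
    observe: "Find all clickable room category links in the availability table",
    itemPrompt: "Extract room name, size in m2, view, bathroom type, and amenities",
    prompt: "Extract all room details. Normalize sizes to m2.",
    proxyCountry: "de",
  }
);
console.log(JSON.stringify(batchBody, null, 2));
```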

Built for real pipelines

From a solo founder enriching 4,500 contacts to an enterprise tour operator managing 130,000 hotel records — the same API handles both.

Enterprise

Hotel content database for a major tour operator

A large European tour operator needed structured hotel facts across their entire catalog — rooms, amenities, dining, wellness, and location — extracted from Booking.com and direct hotel websites, normalized to their internal content schema.

Source: Booking.com, IHG, Marriott, direct hotel sites
Extraction: forEach for room modals + 8 parallel crawls per hotel
Categories: Rooms, dining, wellness, sport, facilities, services, kids, location
Output: Structured JSON, normalized to internal schema
Scale: 130,000+ hotels, quarterly refresh
Sample output — one room
{
  "name": "Grand Suite Ocean View",
  "sizeM2": 82,
  "view": "sea",
  "accommodationType": "suite",
  "bathroom": "both",
  "airConditioning": true,
  "minibar": true,
  "balcony": true,
  "safe": true,
  "coffeeTea": true
}
Sales automation

4,500 event organizer profiles enriched for a sales platform

An AI sales automation founder had a list of Eventbrite organizer URLs with partial data. They needed email, phone, address, and social links filled in across all 4,500 records and exported as a CRM-ready dataset in a single automated run.

Source: Eventbrite organizer pages + external websites + Facebook
Extraction: 4-hop chain: event → organizer → website → social fallback
Fields: Email, phone, address, social links, follower count, event count
Output: CRM-ready JSON exported to CSV
Scale: 4,500 records per run, skips already-enriched rows
Sample output — one organizer
{
  "organizer_name": "Houston Arts Collective",
  "email": "[email protected]",
  "phone": "(713) 555-0182",
  "website": "houstonarts.org",
  "facebook": "fb.com/houstonarts",
  "follower_count": 3200,
  "total_events": 47
}
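The JSON-to-CSV export step for a CRM can be sketched as follows. This is a generic converter, not the code used in the case study; `toCsv` is an illustrative name and the second record is invented to show quoting.

```javascript
// Convert fixed-shape records to CSV text. Null becomes an empty cell;
// values containing quotes, commas, or newlines are quoted per CSV convention.
function toCsv(records, columns) {
  const escapeCell = (value) => {
    if (value === null || value === undefined) return "";
    const text = String(value);
    return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const rows = records.map((record) =>
    columns.map((column) => escapeCell(record[column])).join(",")
  );
  return [columns.join(","), ...rows].join("\n");
}

const organizers = [
  { organizer_name: "Houston Arts Collective", email: null, phone: "(713) 555-0182", follower_count: 3200 },
  // Made-up record to show escaping of quotes and commas
  { organizer_name: 'Example "Live" Events, Inc.', email: null, phone: null, follower_count: 0 },
];

const csv = toCsv(organizers, ["organizer_name", "email", "phone", "follower_count"]);
console.log(csv);
```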
Open source

See a complete enrichment pipeline in action

We built an open-source Eventbrite scraper on top of the Spidra API that shows the full enrichment chain end to end. It starts from a search URL, follows 4 levels of links, and outputs a complete JSON dataset with contact info for every organizer. ~200 lines of code. Fully documented.

4 hops deep
~200 lines of code
4,500+ records per run
100% Spidra API

How Spidra compares

See how Spidra stacks up for large-scale data enrichment.

Options compared: Spidra, manual entry, data vendors, build your own.

Features compared:
Multi-hop extraction (follows links)
Real-time data from source
Custom schema per record type
JavaScript rendering + modal clicks
Proxy rotation built in
Scales to 100k+ records
No infrastructure to maintain
Works on any website

FAQ

Common questions about data enrichment with Spidra.

Yes. Pass a CSV or an array of URLs you already have, and Spidra fills in the gaps. If a record already has the fields you need, you can instruct the workflow to skip it and process only the records that are missing data. This is how the 4,500-organizer enrichment run works: it reads an existing CSV and fetches only the records that need updating.
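A minimal sketch of that skip logic, assuming rows have already been parsed into objects. The helper name, field names, and sample values are illustrative; the actual workflow configuration is not shown on this page.

```javascript
// Split rows into those that still need enrichment and those that can be skipped.
// A row is complete when every required field has a non-blank value.
function splitByCompleteness(rows, requiredFields) {
  const isBlank = (value) =>
    value === null || value === undefined || String(value).trim() === "";
  const toProcess = [];
  const toSkip = [];
  for (const row of rows) {
    const incomplete = requiredFields.some((field) => isBlank(row[field]));
    (incomplete ? toProcess : toSkip).push(row);
  }
  return { toProcess, toSkip };
}

// Made-up rows: the first is fully enriched, the second is missing a phone number.
const existingRows = [
  { url: "https://eventbrite.com/o/organizer-456", email: "info@example.org", phone: "(713) 555-0182" },
  { url: "https://eventbrite.com/o/houston-arts-42417", email: "hello@example.org", phone: "" },
];

const { toProcess, toSkip } = splitByCompleteness(existingRows, ["email", "phone"]);
console.log(toProcess.map((row) => row.url));
console.log(toSkip.length);
```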

Stop filling in data by hand.

Feed Spidra your list. Get back a complete, structured dataset. 300 free credits to start.

We build features around real workflows. Usually within days.