eBay has millions of active listings across every product category. Every listing includes prices, seller ratings, item conditions, shipping costs, and product specifications, all of which are publicly visible.
Whether you are building a price monitor, running market research, or tracking inventory across sellers, eBay is one of the most useful publicly available e-commerce datasets.
Getting to that data programmatically is where things get interesting.
We will use this listing as our target throughout:
https://www.ebay.com/itm/125575167955
A portable wireless mouse: $7.95, seller feedback 99.3%, 15,446 ratings, 4 product images, and a full specification table.
Prerequisites
Install the required libraries:
pip install requests beautifulsoup4
Step 1: retrieve the page HTML
Start with a basic request to confirm you can reach the page:
import requests
url = "https://www.ebay.com/itm/125575167955"
response = requests.get(url)
if response.status_code != 200:
    print(f"Failed: {response.status_code}")
else:
    print(response.text[:500])
eBay allows a small number of requests without issue. Once you exceed its rate limit threshold, it returns block pages or empty responses. Adding a realistic User-Agent header and a referrer helps:
import requests
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.google.com/",
}
url = "https://www.ebay.com/itm/125575167955"
response = requests.get(url, headers=headers)
Step 2: scrape the product details
With the HTML in hand, use BeautifulSoup to parse the page and locate specific elements:
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, "html.parser")
Locate and extract the product name
Right-click the product title and select Inspect. The title is the only <h1> on the page, which makes it the simplest selector on eBay:
# Guard against a block page that has no <h1>
title_tag = soup.find("h1")
product_name = title_tag.text.strip() if title_tag else None
print(f"Product Name: {product_name}")
Product Name: Portable Wireless Mouse, 2.4GHz Silent with USB Receiver, Optical USB Mouse
Locate and extract the price
Right-click the price and select Inspect. The price sits inside a <div> with the class x-price-primary:
price_div = soup.find("div", {"class": "x-price-primary"})
price = price_div.text.strip() if price_div else None
print(f"Price: {price}")
Price: US $7.95/ea
Extract product images
eBay's product images are inside a carousel container. Right-click any product image and select Inspect. The carousel is a <div> with the class ux-image-grid, and each image is an <img> tag inside button elements:
image_data = []
carousel = soup.find("div", {"class": "ux-image-grid"})
if carousel:
    for img in carousel.find_all("img"):
        src = img.get("src")
        if src:
            image_data.append(src)
print(image_data)
[
    'https://i.ebayimg.com/images/g/jpoAAOSwrf9jVE4k/s-l140.jpg',
    'https://i.ebayimg.com/images/g/1i8AAOSwSyhjVE5R/s-l140.jpg',
    ...
]
Locate and extract the product description
This is where eBay gets complicated. The product description is not on the main page — it lives inside an <iframe> with the ID desc_ifr. That iframe points to a separate URL containing a different HTML document. You need to make a second HTTP request to that URL before you can parse the description.
description_data = []
iframe = soup.find("iframe", {"id": "desc_ifr"})
if iframe:
    iframe_url = iframe.get("src")
    # second request to the iframe's source URL
    iframe_response = requests.get(iframe_url, headers=headers)
    iframe_soup = BeautifulSoup(iframe_response.text, "html.parser")
    # the specification table inside the iframe
    table = iframe_soup.find("table")
    if table:
        for row in table.find_all("tr"):
            cells = [td.text.strip() for td in row.find_all("td")]
            if cells:
                description_data.append(cells)
    # any bullet points inside the iframe
    bullets = iframe_soup.find(id="feature-bullets")
    if bullets:
        for li in bullets.find_all("li"):
            text = li.text.strip()
            if text:
                description_data.append(text)
    print(description_data)
else:
    print("iframe not found")
[
    ['Connectivity Technology', 'USB'],
    ['Special Feature', 'wireless'],
    ['Number of Buttons', '4'],
    ['Hand Orientation', 'Ambidextrous'],
    '▶[HIGH DURABILITY & STABLE CONNECTION]: computer mouse has 5,000,000 clicks...',
    ...
]
Two requests per product. If either gets blocked, you lose the description silently unless you check. And these class names (x-price-primary, ux-image-grid) are not stable. eBay updates its frontend regularly and does not treat your selectors as a dependency.
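Because a blocked request or a renamed class leaves a field empty rather than raising an error, it can help to check each scraped record before saving it. The following is a minimal sketch rather than part of the scraper above: `missing_fields` is a hypothetical helper, and the keys in the example record are assumptions chosen to mirror the variable names used in this section.

```python
def missing_fields(record: dict, required: list[str]) -> list[str]:
    """Return the required keys whose values are None, empty, or absent."""
    missing = []
    for key in required:
        value = record.get(key)
        # None, "", [] and {} all count as "nothing was scraped".
        if value is None or value == "" or value == [] or value == {}:
            missing.append(key)
    return missing


# Example record: the iframe request was blocked, so the description is empty.
record = {
    "product_name": "Portable Wireless Mouse",
    "price": "US $7.95/ea",
    "image_data": ["https://i.ebayimg.com/images/g/jpoAAOSwrf9jVE4k/s-l140.jpg"],
    "description_data": [],
}

required = ["product_name", "price", "image_data", "description_data"]
problems = missing_fields(record, required)
if problems:
    print(f"Incomplete scrape, missing: {', '.join(problems)}")
```

Logging the missing keys per listing makes a selector change show up as a sudden spike in one field instead of as quietly empty rows.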
Locate and extract seller reviews
Seller feedback comments appear at the bottom of the listing. Each comment is a <li> with the class fdbk-container:
review_data = []
review_items = soup.find_all("li", {"class": "fdbk-container"})
for item in review_items:
    comment = item.find("div", {"class": "fdbk-container__details__comment"})
    if comment:
        review_data.append(comment.text.strip())
print(review_data)
[
    'Awesome seller! Accidentally ordered this twice and was quickly refunded...',
    'This wireless mouse is by far the most ergonomic and quiet clicking mouse...',
    'Silent as advertised. Great quality.',
    ...
]
Complete BeautifulSoup scraper
Here is the complete code:
import requests
from bs4 import BeautifulSoup
import csv
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.google.com/",
}
url = "https://www.ebay.com/itm/125575167955"
response = requests.get(url, headers=headers)
if response.status_code != 200:
    print(f"Failed: {response.status_code}")
else:
    soup = BeautifulSoup(response.text, "html.parser")
    # Product name (guarded so a block page without an <h1> does not crash)
    title_tag = soup.find("h1")
    product_name = title_tag.text.strip() if title_tag else None
    # Price
    price_div = soup.find("div", {"class": "x-price-primary"})
    price = price_div.text.strip() if price_div else None
    # Images
    image_data = []
    carousel = soup.find("div", {"class": "ux-image-grid"})
    if carousel:
        for img in carousel.find_all("img"):
            src = img.get("src")
            if src:
                image_data.append(src)
    # Description: requires a second request to the iframe
    description_data = []
    iframe = soup.find("iframe", {"id": "desc_ifr"})
    if iframe:
        iframe_response = requests.get(iframe.get("src"), headers=headers)
        iframe_soup = BeautifulSoup(iframe_response.text, "html.parser")
        table = iframe_soup.find("table")
        if table:
            for row in table.find_all("tr"):
                cells = [td.text.strip() for td in row.find_all("td")]
                if cells:
                    description_data.append(cells)
        bullets = iframe_soup.find(id="feature-bullets")
        if bullets:
            for li in bullets.find_all("li"):
                text = li.text.strip()
                if text:
                    description_data.append(text)
    # Reviews
    review_data = []
    for item in soup.find_all("li", {"class": "fdbk-container"}):
        comment = item.find("div", {"class": "fdbk-container__details__comment"})
        if comment:
            review_data.append(comment.text.strip())
    # Write a single-row CSV; this stays inside the else branch so it only runs on a successful fetch
    product_data = [{
        "Product Name": product_name,
        "Price": price,
        "Images": image_data,
        "Product Description": description_data,
        "Reviews": review_data,
    }]
    with open("ebay_product.csv", mode="w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=product_data[0].keys())
        writer.writeheader()
        writer.writerows(product_data)
    print("Saved to ebay_product.csv")
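One detail worth knowing about the CSV step: Images, Product Description, and Reviews are Python lists, and csv.DictWriter writes non-string values using str(), so those cells end up holding Python list literals. If another program needs to read the file back, one option is to encode list values as JSON first. A minimal sketch, where `encode_lists` is a hypothetical helper and the sample row is a shortened illustration of product_data[0]:

```python
import csv
import io
import json


def encode_lists(row: dict) -> dict:
    """Return a copy of row with list and dict values serialised as JSON strings."""
    return {
        key: json.dumps(value, ensure_ascii=False) if isinstance(value, (list, dict)) else value
        for key, value in row.items()
    }


# Illustrative row shaped like product_data[0] in the scraper.
row = {
    "Product Name": "Portable Wireless Mouse",
    "Price": "US $7.95/ea",
    "Images": ["https://i.ebayimg.com/images/g/jpoAAOSwrf9jVE4k/s-l140.jpg"],
    "Reviews": ["Silent as advertised. Great quality."],
}

buffer = io.StringIO(newline="")  # stands in for the open() file handle
writer = csv.DictWriter(buffer, fieldnames=row.keys())
writer.writeheader()
writer.writerow(encode_lists(row))

# Reading the cell back and decoding it gives a real list again.
buffer.seek(0)
restored = next(csv.DictReader(buffer))
images = json.loads(restored["Images"])
print(images)
```

If the data is nested anyway, writing one JSON object per line is an alternative to CSV that avoids the encoding step.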
The problem with this approach at scale
The BeautifulSoup scraper works for a single listing on a good request. At scale, two problems appear.
The first is blocking. eBay's rate limiting kicks in after a small number of requests from the same IP. At any real volume you need proxy rotation, which means managing proxy providers, handling rotation logic, and detecting block pages versus real content.
The second is the description iframe. Every product requires two HTTP requests — one for the main page, one for the iframe URL. Both can be blocked independently. The iframe URL also changes per listing, so you cannot cache or predict it. For 1,000 products that is 2,000 requests, each of which needs to get through eBay's rate limiter.
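To make "detecting block pages versus real content" concrete, here is a rough sketch of a retry wrapper with exponential backoff. It is an illustration under stated assumptions, not eBay-specific logic: `looks_blocked` and `fetch_with_retry` are hypothetical names, the 500-character threshold is an arbitrary guess, and the `fetch` callable is injected so the example does not depend on any HTTP library.

```python
import time
from typing import Callable, Optional


def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristic only: treat non-200 responses and empty or tiny bodies as blocks."""
    return status_code != 200 or len(body.strip()) < 500


def fetch_with_retry(
    fetch: Callable[[str], tuple[int, str]],
    url: str,
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url) up to `attempts` times, doubling the delay after each block."""
    for attempt in range(attempts):
        status_code, body = fetch(url)
        if not looks_blocked(status_code, body):
            return body
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None  # the caller decides what to do with a listing that never got through
```

With requests, `fetch` could be a small function that calls `requests.get(url, headers=headers)` and returns `(response.status_code, response.text)`. The same wrapper then covers both the main page and the iframe URL, so a blocked description request is reported as None instead of being parsed as if it were real content.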
Note: Scraping publicly visible eBay listing data is generally considered legal in the United States based on the hiQ Labs v. LinkedIn ruling (Ninth Circuit 2022). eBay's User Agreement restricts automated access contractually, and enforcement is primarily technical. Stick to public pages. We cover this in full in our guide to web scraping legality.
What data eBay exposes publicly
Two page types are the foundation of any eBay data pipeline.
- Item pages live at ebay.com/itm/{item_number}. This gives you title, price, condition, availability, item number, seller name, seller feedback score and count, shipping cost, location, images, and a structured specification table. The ?_skw= query strings you see from search clicks are tracking parameters, so strip them. The clean URL is always https://www.ebay.com/itm/{item_number}.
- Search results pages live at ebay.com/sch/i.html?_nkw={keyword}. Each page returns 40 listings with title, price, condition, shipping cost, sold count when displayed, and thumbnails. eBay's pagination uses _pgn= as the page parameter.
The right pattern for bulk collection: use search results to find item numbers at scale, then batch-scrape item pages for the full data.
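The two URL patterns above can be sketched with the standard library. The helper names `clean_item_url` and `search_url` are hypothetical, and the optional title segment in the regular expression is an assumption made in case a link carries a slug before the item number.

```python
import re
from typing import Optional
from urllib.parse import urlencode, urlsplit

# Matches /itm/<digits>, optionally with one slug segment before the digits.
ITEM_PATH = re.compile(r"/itm/(?:[^/]+/)?(\d+)")


def clean_item_url(url: str) -> Optional[str]:
    """Drop query-string tracking parameters and return the clean item URL."""
    match = ITEM_PATH.search(urlsplit(url).path)
    if not match:
        return None
    return f"https://www.ebay.com/itm/{match.group(1)}"


def search_url(keyword: str, page: int = 1) -> str:
    """Build a search results URL using the _nkw and _pgn parameters."""
    query = urlencode({"_nkw": keyword, "_pgn": page})
    return f"https://www.ebay.com/sch/i.html?{query}"


print(clean_item_url("https://www.ebay.com/itm/125575167955?_skw=wireless+mouse"))
print(search_url("wireless mouse", page=2))
```

Normalising every collected link through `clean_item_url` before deduplicating means the same listing reached through two different search clicks is only scraped once.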
Scraping with the Spidra API
Spidra handles the proxy rotation, CAPTCHA solving, and browser rendering automatically, and uses AI extraction so you never write or maintain selectors. Sign up free at app.spidra.io — 300 credits, no card required.
The schema
ITEM_SCHEMA = {
    "type": "object",
    "required": ["title", "price", "condition", "availability"],
    "properties": {
        "title": {"type": "string"},
        "price": {"type": ["number", "null"]},
        "original_price": {"type": ["number", "null"]},
        "currency": {"type": ["string", "null"]},
        "condition": {"type": ["string", "null"]},
        "availability": {"type": "string"},
        "item_number": {"type": ["string", "null"]},
        "seller_name": {"type": ["string", "null"]},
        "seller_feedback": {"type": ["number", "null"]},
        "seller_feedback_count": {"type": ["integer", "null"]},
        "shipping_cost": {"type": ["string", "null"]},
        "returns_accepted": {"type": ["boolean", "null"]},
        "location": {"type": ["string", "null"]},
        "images": {"type": "array", "items": {"type": "string"}},
        "features": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "label": {"type": "string"},
                    "value": {"type": "string"}
                }
            }
        },
        "description": {"type": ["string", "null"]}
    }
}
The features field uses {label, value} pairs because eBay's specification table pairs attributes with values. A flat array of strings loses that structure.
If you want to generate a schema from real output rather than writing it by hand, paste your JSON into the free JSON Schema Generator and it builds the structure for you.
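As a small illustration of why the {label, value} structure is useful downstream, the pairs can be turned into a lookup table in a few lines. This is a sketch only: `features_to_dict` is a hypothetical helper, and the sample pairs are illustrative values for a mouse listing.

```python
def features_to_dict(features: list[dict]) -> dict[str, str]:
    """Convert [{label, value}, ...] into {label: value}, skipping pairs with no label."""
    lookup = {}
    for pair in features:
        label = pair.get("label")
        if label:
            lookup[label.strip()] = pair.get("value", "")
    return lookup


# Illustrative pairs in the shape the schema describes.
features = [
    {"label": "Brand", "value": "Unbranded"},
    {"label": "Maximum DPI", "value": "1600"},
    {"label": "Connectivity", "value": "Wireless"},
]

specs = features_to_dict(features)
print(specs.get("Maximum DPI"))
```

With a flat array of strings such as "Maximum DPI 1600", the same lookup would need per-attribute string parsing.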
REST API
The Spidra REST API is easy to use:
# Submit
curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [{"url": "https://www.ebay.com/itm/125575167955"}],
    "prompt": "Extract the full product listing details. For returns_accepted check for Returns accepted or No returns text near the shipping section.",
    "output": "json",
    "useProxy": true,
    "proxyCountry": "us",
    "schema": { ... }
  }'
# Poll
curl https://api.spidra.io/api/scrape/{jobId} \
  -H "x-api-key: YOUR_API_KEY"
Python SDK
import os
from spidra import SpidraClient, ScrapeParams, ScrapeUrl
spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])
job = spidra.scrape.run_sync(ScrapeParams(
    urls=[ScrapeUrl(url="https://www.ebay.com/itm/125575167955")],
    prompt="Extract the full product listing details. For returns_accepted check for Returns accepted or No returns text near the shipping section.",
    output="json",
    schema=ITEM_SCHEMA,
    use_proxy=True,
    proxy_country="us",
))
print(job.result.content)
The Python SDK and Node.js SDK both support the same schema and prompt parameters.
Real output
From an actual test against https://www.ebay.com/itm/125575167955 with proxyCountry: "us":
{
    "title": "Portable Wireless Mouse, 2.4GHz Silent with USB Receiver, Optical USB Mouse",
    "price": 7.95,
    "original_price": null,
    "currency": "USD",
    "condition": "New",
    "availability": "3 available",
    "item_number": "125575167955",
    "seller_name": "butrad-0",
    "seller_feedback": 99.3,
    "seller_feedback_count": 15446,
    "shipping_cost": "US $15.54",
    "returns_accepted": null,
    "location": "Flushing, NY, United States",
    "images": [
        "https://i.ebayimg.com/images/g/jpoAAOSwrf9jVE4k/s-l500.webp",
        "https://i.ebayimg.com/images/g/1i8AAOSwSyhjVE5R/s-l500.webp",
        "https://i.ebayimg.com/images/g/SqUAAOSwD45jVE5S/s-l500.webp",
        "https://i.ebayimg.com/images/g/HSYAAOSwKoBjVE5T/s-l500.webp"
    ],
    "features": [
        {"label": "Brand", "value": "Unbranded"},
        {"label": "Type", "value": "Mini Mouse"},
        {"label": "Maximum DPI", "value": "1600"},
        {"label": "Connectivity", "value": "Wireless"},
        {"label": "Number of Buttons", "value": "4"},
        {"label": "Tracking Method", "value": "Optical"},
        {"label": "Country of Origin", "value": "China"}
    ],
    "description": "..."
}
No second request for the iframe. No selector maintenance. One call returns the structured data.
Why proxyCountry matters
Without setting a country, Spidra routes through whichever residential proxy is available. eBay localises pricing based on the connecting IP. Testing without proxyCountry returned prices in SEK with shipping shown as "approx 151.31 SEK". Setting proxyCountry: "us" returns clean USD.
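Since a localised response looks perfectly valid apart from the currency, a cheap guard before storing a record can catch a missing proxyCountry early. A minimal sketch, assuming records follow the item schema's currency field; `check_currency` is a hypothetical helper and the sample records are made up for illustration.

```python
def check_currency(record: dict, expected: str = "USD") -> bool:
    """Return True when the record's currency matches the expected code."""
    currency = record.get("currency")
    return isinstance(currency, str) and currency.upper() == expected


# Illustrative records: one routed through a US proxy, one through a Swedish one.
us_record = {"title": "Portable Wireless Mouse", "price": 7.95, "currency": "USD"}
se_record = {"title": "Portable Wireless Mouse", "price": 83.5, "currency": "SEK"}

for record in (us_record, se_record):
    if not check_currency(record):
        print(f"Skipping {record['title']}: got {record.get('currency')}, expected USD")
```

Rejecting the record is usually safer than converting it, because a converted price would be mixed into a price history as if it had been observed in USD.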
Scraping eBay search results
Search results pages collect item numbers at scale. The array sits inside an object because the Spidra API requires the root schema type to always be "object". If you use the JSON Schema Generator on an array of results, wrap the output before using it.
SEARCH_SCHEMA = {
    "type": "object",
    "required": ["listings"],
    "properties": {
        "listings": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["title", "url"],
                "properties": {
                    "title": {"type": "string"},
                    "url": {"type": "string"},
                    "price": {"type": ["number", "null"]},
                    "condition": {"type": ["string", "null"]},
                    "shipping_cost": {"type": ["string", "null"]},
                    "sold_count": {"type": ["integer", "null"]},
                    "sponsored": {"type": ["boolean", "null"]},
                    "thumbnail": {"type": ["string", "null"]}
                }
            }
        }
    }
}
REST API
curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [{"url": "https://www.ebay.com/sch/i.html?_nkw=wireless+mouse"}],
    "prompt": "Extract all product listings on this page",
    "output": "json",
    "useProxy": true,
    "proxyCountry": "us",
    "schema": { ... }
  }'
Python SDK
job = spidra.scrape.run_sync(ScrapeParams(
    urls=[ScrapeUrl(url="https://www.ebay.com/sch/i.html?_nkw=wireless+mouse")],
    prompt="Extract all product listings on this page",
    output="json",
    schema=SEARCH_SCHEMA,
    use_proxy=True,
    proxy_country="us",
))
listings = job.result.content["listings"]
print(f"Got {len(listings)} listings")
What comes back
From a real test against https://www.ebay.com/sch/i.html?_nkw=wireless+mouse:
{
    "listings": [
        {
            "title": "Wireless Mouse 2.4G Optical Cordless Mice for PC Laptop Computer",
            "url": "https://www.ebay.com/itm/166383500910",
            "price": 8.88,
            "condition": "Brand New",
            "shipping_cost": "Free International Shipping",
            "sold_count": 2575,
            "sponsored": null,
            "thumbnail": "https://i.ebayimg.com/images/g/3BkAAOSwL4Jm9RSR/s-l500.webp"
        },
        {
            "title": "Inland ic210 Wireless Keyboard & Mouse Combo",
            "url": "https://www.ebay.com/itm/267100499182",
            "price": 38.85,
            "condition": "Pre-Owned",
            "shipping_cost": "+$29.60 shipping",
            "sold_count": 52,
            "sponsored": null,
            "thumbnail": "https://i.ebayimg.com/images/g/6V4AAOSwr8lnYuqN/s-l500.webp"
        }
    ]
}
40 listings came back. sold_count is present on high-volume listings only, because eBay does not show it on every card. sponsored comes back null because eBay renders sponsored badges as visual images without readable text.
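Note that shipping_cost comes back as display text, so comparing listings on total cost needs a parsing step. The sketch below handles the formats that appear in this article's sample output and nothing more. `parse_shipping` is a hypothetical helper, it ignores the currency entirely, and treating any text containing "free" as zero is an assumption.

```python
import re
from typing import Optional

# First number in the text, allowing thousands separators and decimals.
AMOUNT = re.compile(r"(\d+(?:,\d{3})*(?:\.\d+)?)")


def parse_shipping(text: Optional[str]) -> Optional[float]:
    """Return the shipping amount as a float, 0.0 for free shipping, None if unknown."""
    if not text:
        return None
    if "free" in text.lower():
        return 0.0
    match = AMOUNT.search(text)
    if not match:
        return None
    return float(match.group(1).replace(",", ""))


def total_cost(listing: dict) -> Optional[float]:
    """Price plus shipping, or None when either part is missing."""
    price = listing.get("price")
    shipping = parse_shipping(listing.get("shipping_cost"))
    if price is None or shipping is None:
        return None
    return round(price + shipping, 2)


listing = {"price": 38.85, "shipping_cost": "+$29.60 shipping"}
print(total_cost(listing))
```

Returning None for unparseable text, rather than 0.0, keeps listings with unknown shipping from looking like the cheapest option.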
The production pattern: search to batch item pages
import os, json
from spidra import SpidraClient, ScrapeParams, ScrapeUrl, BatchScrapeParams
spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])
# Stage 1: collect item URLs from search results
def collect_item_urls(keyword: str, pages: int = 3) -> list[str]:
    urls = []
    seen = set()
    for page in range(1, pages + 1):
        page_url = f"https://www.ebay.com/sch/i.html?_nkw={keyword.replace(' ', '+')}&_pgn={page}"
        job = spidra.scrape.run_sync(ScrapeParams(
            urls=[ScrapeUrl(url=page_url)],
            prompt="Extract all product listings on this page",
            output="json",
            schema=SEARCH_SCHEMA,
            use_proxy=True,
            proxy_country="us",
        ))
        if job.result.ai_extraction_failed:
            print(f"Page {page}: extraction failed, skipping")
            continue
        for listing in job.result.content.get("listings", []):
            url = listing.get("url")
            if url and url not in seen:
                urls.append(url)
                seen.add(url)
        print(f"Page {page}: {len(urls)} unique items collected")
    return urls
# Stage 2: batch-scrape item pages
def scrape_items(item_urls: list[str]) -> list[dict]:
    results = []
    for i in range(0, len(item_urls), 50):
        chunk = item_urls[i:i + 50]
        batch_num = i // 50 + 1
        total = -(-len(item_urls) // 50)  # ceiling division
        batch = spidra.batch.run_sync(BatchScrapeParams(
            urls=chunk,
            prompt="Extract the full product listing details. For returns_accepted check for Returns accepted or No returns text near the shipping section.",
            output="json",
            schema=ITEM_SCHEMA,
            use_proxy=True,
            proxy_country="us",
        ))
        for item in batch.items:
            if item.status == "completed" and item.result:
                results.append(item.result)
            else:
                print(f"  Failed: {item.url}")
        print(f"Batch {batch_num}/{total}: {batch.completed_count}/{batch.total_urls} succeeded")
    return results
# Run
item_urls = collect_item_urls("wireless mouse", pages=2)
items = scrape_items(item_urls)
with open("ebay_items.jsonl", "w") as f:
    for item in items:
        f.write(json.dumps(item) + "\n")
print(f"Saved {len(items)} items to ebay_items.jsonl")
Price monitoring pipeline
eBay is a particularly useful target for price monitoring because the same product typically has dozens of competing listings at different prices and conditions. Running a daily batch surfaces price drops, new cheap listings, and condition changes.
import os, json
from pathlib import Path
from spidra import SpidraClient, BatchScrapeParams
spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])
PRICE_SCHEMA = {
    "type": "object",
    "required": ["item_number", "title", "price", "condition"],
    "properties": {
        "item_number": {"type": "string"},
        "title": {"type": "string"},
        "price": {"type": ["number", "null"]},
        "currency": {"type": ["string", "null"]},
        "condition": {"type": "string"},
        "availability": {"type": ["string", "null"]},
    }
}
# Confirmed working item numbers
WATCHED_ITEMS = [
    "125575167955",  # Portable Wireless Mouse, $7.95
    "125575135033",  # Related listing from same seller
]
def load_previous(path="data/ebay_prices.json") -> dict:
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}
def save_current(data: dict, path="data/ebay_prices.json"):
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_text(json.dumps(data, indent=2))
def check_prices(item_numbers: list[str]) -> dict:
    batch = spidra.batch.run_sync(BatchScrapeParams(
        urls=[f"https://www.ebay.com/itm/{n}" for n in item_numbers],
        prompt="Extract the item number, title, price, condition, and availability",
        output="json",
        schema=PRICE_SCHEMA,
        use_proxy=True,
        proxy_country="us",
    ))
    results = {}
    for item in batch.items:
        if item.status == "completed" and item.result:
            n = item.result.get("item_number")
            if n:
                results[n] = item.result
    return results
def find_changes(previous: dict, current: dict, threshold: float = 5.0) -> list[dict]:
    changes = []
    for n, data in current.items():
        curr = data.get("price")
        prev = previous.get(n, {}).get("price")
        if not curr or not prev or prev == 0:
            continue
        pct = ((curr - prev) / prev) * 100
        if abs(pct) >= threshold:
            changes.append({
                "item_number": n,
                "title": data.get("title", "")[:60],
                "prev_price": prev,
                "curr_price": curr,
                "change_pct": round(pct, 1),
                "direction": "up" if pct > 0 else "down",
            })
    return sorted(changes, key=lambda x: abs(x["change_pct"]), reverse=True)
previous = load_previous()
current = check_prices(WATCHED_ITEMS)
save_current(current)
changes = find_changes(previous, current)
if changes:
    print(f"{len(changes)} price changes:")
    for c in changes:
        sign = "+" if c["direction"] == "up" else ""
        print(f"  {c['title']}: ${c['prev_price']} to ${c['curr_price']} ({sign}{c['change_pct']}%)")
else:
    print("No significant price changes")
For the full market research and data enrichment pipeline, swap in the full ITEM_SCHEMA to capture seller details, images, and specifications on each monitoring run.
