Why do I get SEK prices without proxyCountry?

Amazon geo-localises pricing based on the connecting IP's country. Without specifying a country, the residential proxy used may be routing through Sweden, Germany, or anywhere else. Setting proxyCountry: "us" pins the request to a US residential IP.

Why is rating null on search results?

Amazon renders star ratings in search results as SVG images. There is no readable numeric text in the DOM for the AI to extract. Get ratings from product detail pages, where the rating appears as clear text.

Can I scrape Amazon reviews?

The featured reviews visible on the public product detail page can be extracted. Since May 2026, Amazon's full review pagination redirects unauthenticated users to a login wall. Accessing that content requires an authenticated session, which moves into different legal territory.

What is an ASIN?

ASIN stands for Amazon Standard Identification Number. It is a 10-character alphanumeric code that uniquely identifies every product. It appears in every product URL after /dp/. The canonical product URL is https://www.amazon.com/dp/{ASIN}.

The JSON Schema Generator gave me a root array — how do I fix it?

Wrap it. The generator outputs "type": "array" when you paste in an array. The Spidra API requires "type": "object" at the root. Put the array inside an object: {"type": "object", "required": ["products"], "properties": {"products": { ... your array schema ... }}}. The JSON Schema Generator itself will be updated to handle this automatically.
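The wrapping step can be sketched in a few lines of Python, assuming the generator output is already loaded as a dict. The helper name wrap_array_schema and the default key "products" are illustrative and not part of the Spidra SDK.

```python
import json


def wrap_array_schema(schema: dict, key: str = "products") -> dict:
    """Wrap a root-level array schema in an object schema under `key`.

    Hypothetical helper, not part of any SDK. Schemas whose root is not
    an array are returned unchanged.
    """
    if schema.get("type") != "array":
        return schema
    return {
        "type": "object",
        "required": [key],
        "properties": {key: schema},
    }


# A root array schema, as a generator might emit it for a pasted JSON array.
array_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "required": ["title", "asin"],
        "properties": {
            "title": {"type": "string"},
            "asin": {"type": "string"},
        },
    },
}

wrapped = wrap_array_schema(array_schema)
print(json.dumps(wrapped, indent=2))
```

The extracted list then lives under the "products" key of the result rather than at the root.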

How many ASINs can I process at once?

The batch scraping endpoint handles up to 50 URLs per request. Chunk larger lists into groups of 50 as shown in the pipeline above.

Does Spidra have a free tier?

The free plan at app.spidra.io includes 300 credits with no card required. A product detail page uses around 2-3 credits.

June 26, 2026 · 14 min read

How to scrape Amazon product data with Python and Node.js (2026)

Joel Olawanle

If you have tried to pull product data from Amazon, you know how the first attempt goes. You write a quick requests call, add a User-Agent header, and it works for maybe three pages before you start getting block pages.

You add proxy rotation. You add delays. You get through more pages, but the HTML keeps shifting on you. The price is in .a-price-whole until it is not. The rating is in .AverageCustomerReviews until Amazon runs an A/B test and it moves somewhere else.

This guide covers how to scrape Amazon product pages and search results reliably, without the selector maintenance problem. All code examples use the Spidra REST API, with Python SDK and Node.js SDK alternatives for each section.

What makes Amazon hard to scrape in 2026

The core problem is that Amazon runs AWS WAF with Bot Control, which blocks datacenter IP ranges before requests even reach the application layer.

A plain requests call without residential proxies fails on the first attempt. Add residential proxies and you get through, but then you are managing proxy rotation, detecting when you get a CAPTCHA or a "dog page," and handling retries.

Assuming you solve the network layer, the data extraction is its own problem. Here is what scraping a product price looks like with BeautifulSoup after you get a real HTML response:

from bs4 import BeautifulSoup

# `response` is the requests.Response from a fetch that returned real product HTML
soup = BeautifulSoup(response.text, "html.parser")

# Price is split across two separate elements
whole    = soup.find(class_="a-price-whole")
fraction = soup.find(class_="a-price-fraction")

if whole and fraction:
    price = f"${whole.text.strip('.')}.{fraction.text.strip()}"
else:
    # Out of stock, or Amazon changed the element structure again
    price = None

# Rating text comes back as "4.6 out of 5 stars4.6 out of 5"
# so you split on the first occurrence
rating_div = soup.find(class_="AverageCustomerReviews")
rating = rating_div.text.strip().split(" out of")[0] if rating_div else None

# Images are not in a simple src attribute.
# They are serialised in JSON inside a script tag
# and the format changes between product types.

This works until it does not. Amazon changes class names and restructures product sections when it A/B tests page layouts. Any scraper built on Amazon selectors needs ongoing maintenance as a baseline assumption.

Note: Scraping publicly visible Amazon product data is generally considered legal in the United States based on the hiQ Labs v. LinkedIn ruling (Ninth Circuit 2022), which held that scraping publicly accessible data does not constitute unauthorised access under the Computer Fraud and Abuse Act. Amazon's Terms of Service prohibit automated access, which is a contractual restriction, and Amazon enforces it technically rather than legally in most cases.

The two Amazon page types you need to understand

Amazon product pages come in two forms that serve different purposes in a scraping pipeline.

A product detail page lives at amazon.com/dp/{ASIN}. The ASIN is a 10-character alphanumeric identifier that uniquely identifies every product on Amazon, and it is the key piece of infrastructure for everything else.

This is where you get the full picture: price, original price, discount percentage, star rating, review count, bullet-point features, images, seller, Prime status, Best Seller Rank with full category path, and specifications.

A search results page lives at amazon.com/s?k={keyword}. It returns 20-25 product cards per page. Each card has a title, ASIN, price, and review count.

The practical pattern is to use search results pages to collect ASINs at scale, then batch-scrape the product detail pages for full data. We build exactly this pipeline below.
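As a rough sketch, search URLs of that shape can be built with the standard library so that spaces and special characters in the keyword are escaped correctly. The function name build_search_url is illustrative, and the page query parameter is the one the pipeline later in this guide relies on.

```python
from urllib.parse import urlencode


def build_search_url(keyword: str, page: int = 1) -> str:
    """Build an amazon.com search URL for a keyword and page number.

    urlencode escapes spaces as '+' and percent-encodes characters
    such as '&' that would otherwise break the query string.
    """
    return "https://www.amazon.com/s?" + urlencode({"k": keyword, "page": page})


print(build_search_url("wireless headphones"))
print(build_search_url("black & white headset", page=2))
```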

Prerequisites

Sign up free at app.spidra.io. The free plan gives you 300 credits and no card is required. Get your API key from Settings → API Keys once you are in.

# Python SDK
pip install spidra

# Node.js SDK
npm install spidra

export SPIDRA_API_KEY="YOUR_API_KEY"

Scraping a product detail page

The clean URL format for any Amazon product is https://www.amazon.com/dp/{ASIN}. You will often see product URLs with long ref= query strings from clicking through search results. These work fine, but the /dp/ASIN format is cleaner to store and construct programmatically.
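A minimal sketch of normalising a stored URL to the /dp/ASIN form, using only the standard library. It assumes ASINs consist of uppercase letters and digits, and the long example URL with its ref segment is made up for illustration.

```python
import re
from typing import Optional

# Assumption: an ASIN is 10 uppercase alphanumeric characters after /dp/.
ASIN_RE = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?#]|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN found after /dp/ in the URL, or None if absent."""
    match = ASIN_RE.search(url)
    return match.group(1) if match else None


def canonical_url(url: str) -> Optional[str]:
    """Return the short https://www.amazon.com/dp/{ASIN} form of a product URL."""
    asin = extract_asin(url)
    return f"https://www.amazon.com/dp/{asin}" if asin else None


# Illustrative URL with a slug and a made-up ref segment.
long_url = "https://www.amazon.com/Some-Product-Slug/dp/B0G8G4SQXQ/ref=example?keywords=headset"
print(extract_asin(long_url))
print(canonical_url(long_url))
```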

Building the schema

Passing a JSON Schema to the API tells it exactly what shape to return. Fields in required always appear in the output even if the page does not have a value for them — they come back as null. This matters for production pipelines because it means you can write to a database or pass the result downstream without defensive handling for missing fields.
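The keys are always present, but their values can still be null, so a pipeline may want to flag such rows before writing them. A small sketch of that check follows; the helper name and the sample record are hypothetical, not real API output.

```python
def null_required_fields(record: dict, schema: dict) -> list[str]:
    """Return the names of required fields whose value is None or absent."""
    return [name for name in schema.get("required", []) if record.get(name) is None]


schema = {
    "type": "object",
    "required": ["title", "asin", "price", "availability"],
}

# Illustrative record: the page showed no price, so the field came back as null.
record = {
    "title": "Example product",
    "asin": "B000000000",
    "price": None,
    "availability": "Currently unavailable",
}

missing = null_required_fields(record, schema)
print(missing)
```

A row with a non-empty result can be routed to a retry queue or stored with a flag, depending on how strict the downstream consumer is.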

If you want to generate a schema from an existing JSON sample rather than writing it by hand, the free JSON Schema Generator at spidra.io/tools does this in seconds.

Paste in any JSON output or use the [...] and it infers the full schema. Here is the schema for a product detail page:

{
  "type": "object",
  "required": ["title", "asin", "price", "availability"],
  "properties": {
    "title":               {"type": "string"},
    "brand":               {"type": ["string", "null"]},
    "asin":                {"type": "string"},
    "price":               {"type": ["number", "null"]},
    "original_price":      {"type": ["number", "null"]},
    "currency":            {"type": ["string", "null"]},
    "discount_percentage": {"type": ["integer", "null"]},
    "availability":        {"type": "string"},
    "rating":              {"type": ["number", "null"]},
    "review_count":        {"type": ["integer", "null"]},
    "features":            {"type": "array", "items": {"type": "string"}},
    "images":              {"type": "array", "items": {"type": "string"}},
    "seller":              {"type": ["string", "null"]},
    "ships_from":          {"type": ["string", "null"]},
    "prime":               {"type": ["boolean", "null"]},
    "bsr_rank":            {"type": ["integer", "null"]},
    "bsr_category":        {"type": ["string", "null"]},
    "color":               {"type": ["string", "null"]},
    "model_number":        {"type": ["string", "null"]}
  }
}

REST API

The API follows an async job pattern. You POST a scrape request, receive a jobId, and poll GET /api/scrape/{jobId} until the status is completed.

# Submit
curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [{"url": "https://www.amazon.com/dp/B0G8G4SQXQ"}],
    "prompt": "Extract the full product details",
    "output": "json",
    "useProxy": true,
    "proxyCountry": "us",
    "schema": { ... }
  }'

# Returns: {"jobId": "abc-123", "status": "queued"}

# Poll
curl https://api.spidra.io/api/scrape/abc-123 \
  -H "x-api-key: YOUR_API_KEY"
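The polling half of this pattern could look roughly like the sketch below. It takes any callable that returns the job JSON, so the HTTP call to GET /api/scrape/{jobId} stays outside the loop. The "failed" status, the interval, and the timeout are assumptions, since only "queued" and "completed" appear above.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_job: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 120.0,
) -> dict:
    """Call fetch_job() until it reports a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        # "failed" is an assumed terminal status; adjust to what the API returns.
        if job.get("status") in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        time.sleep(interval)


# Stand-in for the real GET request: "queued" twice, then "completed".
responses = iter([
    {"jobId": "abc-123", "status": "queued"},
    {"jobId": "abc-123", "status": "queued"},
    {"jobId": "abc-123", "status": "completed"},
])
final = poll_until_done(lambda: next(responses), interval=0.01)
print(final["status"])
```

In real use, fetch_job would wrap the authenticated GET request and return the parsed JSON body.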

Python SDK

import os
from spidra import SpidraClient, ScrapeParams, ScrapeUrl

spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])

PRODUCT_SCHEMA = {
    "type": "object",
    "required": ["title", "asin", "price", "availability"],
    "properties": {
        "title":               {"type": "string"},
        "brand":               {"type": ["string", "null"]},
        "asin":                {"type": "string"},
        "price":               {"type": ["number", "null"]},
        "original_price":      {"type": ["number", "null"]},
        "currency":            {"type": ["string", "null"]},
        "discount_percentage": {"type": ["integer", "null"]},
        "availability":        {"type": "string"},
        "rating":              {"type": ["number", "null"]},
        "review_count":        {"type": ["integer", "null"]},
        "features":            {"type": "array", "items": {"type": "string"}},
        "images":              {"type": "array", "items": {"type": "string"}},
        "seller":              {"type": ["string", "null"]},
        "ships_from":          {"type": ["string", "null"]},
        "prime":               {"type": ["boolean", "null"]},
        "bsr_rank":            {"type": ["integer", "null"]},
        "bsr_category":        {"type": ["string", "null"]},
        "color":               {"type": ["string", "null"]},
        "model_number":        {"type": ["string", "null"]}
    }
}

job = spidra.scrape.run_sync(ScrapeParams(
    urls=[ScrapeUrl(url="https://www.amazon.com/dp/B0G8G4SQXQ")],
    prompt="Extract the full product details",
    output="json",
    schema=PRODUCT_SCHEMA,
    use_proxy=True,
    proxy_country="us",
))

print(job.result.content)

Node.js SDK

import { SpidraClient } from 'spidra'

const spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY! })

const job = await spidra.scrape.run({
  urls: [{ url: 'https://www.amazon.com/dp/B0G8G4SQXQ' }],
  prompt: 'Extract the full product details',
  output: 'json',
  schema: PRODUCT_SCHEMA,
  useProxy: true,
  proxyCountry: 'us',
})

console.log(job.result.content)

You can also run the same request in the Spidra playground.

What comes back

Scraping https://www.amazon.com/dp/B0G8G4SQXQ with proxyCountry: "us" returned this:

{
  "title": "Mopchnic Wireless Headset with Noise Cancelling Microphone",
  "brand": "Mopchnic",
  "asin": "B0G8G4SQXQ",
  "price": 29.99,
  "original_price": 46.99,
  "currency": "$",
  "discount_percentage": 36,
  "availability": "In Stock",
  "rating": 4.5,
  "review_count": 5000,
  "features": [
    "Active Noise Cancelling",
    "Built-in Microphone",
    "Bluetooth 5.3",
    "Hi-Fi Stereo Sound",
    "USB-C Charging"
  ],
  "images": [
    "https://m.media-amazon.com/images/I/61+j+lJ6eJL._AC_SL1500_.jpg",
    "https://m.media-amazon.com/images/I/61c+5p-3pNL._AC_SL1500_.jpg",
    "https://m.media-amazon.com/images/I/61Xg8p1k73L._AC_SL1500_.jpg"
  ],
  "seller": "Mopchnic Official Store",
  "ships_from": "Amazon",
  "prime": true,
  "bsr_rank": 15,
  "bsr_category": "Electronics > Headphones > Over-Ear Headphones",
  "color": "Black",
  "model_number": "MCH-ANC-BT53-BLK"
}

Three details are worth noticing. bsr_category returns the full path, not just the leaf category. discount_percentage comes from what is shown on the page, not computed from the two prices. The currency field returns the symbol as rendered: "$" rather than "USD".

If your pipeline needs ISO codes, either add that instruction to the prompt ("return currency as a 3-letter ISO code like USD, GBP, EUR") or normalize in code:

CURRENCY_MAP = {"$": "USD", "£": "GBP", "€": "EUR", "¥": "JPY", "CA$": "CAD"}
product["currency"] = CURRENCY_MAP.get(product.get("currency", ""), product.get("currency"))
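Because discount_percentage is read from the page rather than computed, it can be cross-checked against the two prices. The sketch below does that for the values returned above. The helper name is illustrative, and rounding to the nearest whole percent is an assumption about how the page displays the figure.

```python
from typing import Optional


def computed_discount(price: Optional[float], original_price: Optional[float]) -> Optional[int]:
    """Percentage discount from the two prices, rounded to a whole percent.

    Returns None when either price is missing or the original price is zero.
    """
    if price is None or not original_price:
        return None
    return round((original_price - price) / original_price * 100)


# Values taken from the product output shown earlier in this section.
product = {"price": 29.99, "original_price": 46.99, "discount_percentage": 36}

expected = computed_discount(product["price"], product["original_price"])
matches = expected == product["discount_percentage"]
print(expected, matches)
```

Here (46.99 - 29.99) / 46.99 is about 36.2%, which rounds to the 36 shown on the page.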

Why `proxyCountry` matters

Amazon serves localized pricing based on the connecting IP's country. Without a proxy country preference, the request routes through whichever residential proxy has capacity available, which can be in any country.

In testing, a request without proxyCountry returned a price of 250 SEK for the same product that costs $29.99 in the US. Setting proxyCountry: "us" gives you consistent USD pricing on amazon.com.

If you are scraping a regional marketplace, match the country:

| Marketplace | Domain | proxyCountry |
| --- | --- | --- |
| United States | amazon.com | `"us"` |
| United Kingdom | amazon.co.uk | `"gb"` |
| Germany | amazon.de | `"de"` |
| Canada | amazon.ca | `"ca"` |
| Japan | amazon.co.jp | `"jp"` |
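One way to keep the country in sync with the URL is to derive it from the hostname. The following is a sketch built on the table above. The function name is illustrative, and raising an error for unlisted domains is a design choice rather than anything the API requires.

```python
from urllib.parse import urlparse

# Marketplace domain -> proxyCountry, copied from the table above.
PROXY_COUNTRY_BY_DOMAIN = {
    "amazon.com": "us",
    "amazon.co.uk": "gb",
    "amazon.de": "de",
    "amazon.ca": "ca",
    "amazon.co.jp": "jp",
}


def proxy_country_for(url: str) -> str:
    """Return the proxyCountry matching the marketplace in the URL."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    try:
        return PROXY_COUNTRY_BY_DOMAIN[host]
    except KeyError:
        # Fail loudly instead of silently scraping with a mismatched country.
        raise ValueError(f"no proxy country configured for {host!r}") from None


print(proxy_country_for("https://www.amazon.co.uk/dp/B0G8G4SQXQ"))
```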

Scraping Amazon search results

Search results pages work on the same async pattern. The main difference is the output shape: instead of one product object, you get a list of product cards. Because the API requires the root schema type to be "object", you wrap the list in a named key:

{
  "type": "object",
  "required": ["products"],
  "properties": {
    "products": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["title", "asin"],
        "properties": {
          "title":        {"type": "string"},
          "asin":         {"type": "string"},
          "url":          {"type": ["string", "null"]},
          "price":        {"type": ["number", "null"]},
          "currency":     {"type": ["string", "null"]},
          "rating":       {"type": ["number", "null"]},
          "review_count": {"type": ["integer", "null"]},
          "prime":        {"type": ["boolean", "null"]},
          "sponsored":    {"type": ["boolean", "null"]},
          "thumbnail":    {"type": ["string", "null"]}
        }
      }
    }
  }
}

Amazon lazy-loads thumbnails as the page is scrolled. Without scroll actions, image URLs come back null on most cards. Adding a scroll sequence before extraction triggers the lazy load:

REST API

curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      {
        "url": "https://www.amazon.com/s?k=wireless+headphones",
        "actions": [
          {"type": "scroll", "to": "50%"},
          {"type": "wait", "duration": 1000},
          {"type": "scroll", "to": "100%"},
          {"type": "wait", "duration": 1000}
        ]
      }
    ],
    "prompt": "Extract all product listings. For rating check aria-label attributes containing out of 5 stars. For prime check for Prime badge images or prime in class names. For thumbnail use the m.media-amazon.com image src. Clean product URLs to https://www.amazon.com/dp/ASIN format.",
    "output": "json",
    "useProxy": true,
    "proxyCountry": "us",
    "schema": { ... }
  }'

Python SDK

from spidra import BrowserAction

job = spidra.scrape.run_sync(ScrapeParams(
    urls=[
        ScrapeUrl(
            url="https://www.amazon.com/s?k=wireless+headphones",
            actions=[
                BrowserAction(type="scroll", to="50%"),
                BrowserAction(type="wait", duration=1000),
                BrowserAction(type="scroll", to="100%"),
                BrowserAction(type="wait", duration=1000),
            ]
        )
    ],
    prompt="Extract all product listings. For rating check aria-label attributes containing out of 5 stars. For prime check for Prime badge images or prime in class names. For thumbnail use the m.media-amazon.com image src. Clean product URLs to https://www.amazon.com/dp/ASIN format.",
    output="json",
    schema=SEARCH_SCHEMA,
    use_proxy=True,
    proxy_country="us",
))

products = job.result.content["products"]
print(f"Got {len(products)} products")

Node.js SDK

const job = await spidra.scrape.run({
  urls: [{
    url: 'https://www.amazon.com/s?k=wireless+headphones',
    actions: [
      { type: 'scroll', to: '50%' },
      { type: 'wait', duration: 1000 },
      { type: 'scroll', to: '100%' },
      { type: 'wait', duration: 1000 },
    ],
  }],
  prompt: 'Extract all product listings. For rating check aria-label attributes containing out of 5 stars. For prime check for Prime badge images or prime in class names. For thumbnail use the m.media-amazon.com image src. Clean product URLs to https://www.amazon.com/dp/ASIN format.',
  output: 'json',
  schema: SEARCH_SCHEMA,
  useProxy: true,
  proxyCountry: 'us',
})

const { products } = job.result.content as any
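The scroll sequence used in these requests can also be generated rather than written out by hand. The helper below is hypothetical and emits plain dicts in the same shape as the REST payload. Whether more, smaller scroll steps load more thumbnails is an untested assumption.

```python
def scroll_actions(steps: int = 2, wait_ms: int = 1000) -> list[dict]:
    """Build alternating scroll/wait actions that reach 100% in `steps` scrolls."""
    actions = []
    for i in range(1, steps + 1):
        percent = round(100 * i / steps)
        actions.append({"type": "scroll", "to": f"{percent}%"})
        actions.append({"type": "wait", "duration": wait_ms})
    return actions


# The default reproduces the 50% / 100% sequence used above.
for action in scroll_actions():
    print(action)
```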

What the results look like

From a real test against https://www.amazon.com/headset/s?k=headset:

{
  "products": [
    {
      "title": "HyperX Cloud II Gaming Headset - 7.1 Surround Sound",
      "asin": "B00SAYCXWG",
      "url": "https://www.amazon.com/HyperX-Cloud-Gaming-Headset-KHX-HSCP-RD/dp/B00SAYCXWG",
      "price": 49.99,
      "currency": "USD",
      "rating": null,
      "review_count": 2000,
      "prime": false,
      "sponsored": false,
      "thumbnail": "https://m.media-amazon.com/images/I/71631Jb-dZL._AC_SX679_.jpg"
    },
    {
      "title": "Bose QuietComfort Headphones - Wireless Bluetooth, Active Noise Cancelling",
      "asin": "B0CCZ26B5V",
      "url": "https://www.amazon.com/Bose-QuietComfort-Cancelling-Headphones-Bluetooth/dp/B0CCZ26B5V",
      "price": 249.00,
      "currency": "USD",
      "rating": null,
      "review_count": 7000,
      "prime": false,
      "sponsored": false,
      "thumbnail": "https://m.media-amazon.com/images/I/71kW6VVY1cL._AC_SX679_.jpg"
    }
  ]
}

In the returned data above, rating is null on all products. This is because Amazon renders star ratings in search results as SVG graphics, not as readable text in the DOM.

There is no reliable way to extract a number from a star image at this stage. You can collect ratings from the product detail page, which always has the rating as clear text.

The URLs come back without tracking query strings, and each card carries its ASIN in a separate field, which is exactly what you need for the next stage.
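Since search cards come back without ratings, one option is to fill them in later from product detail results, joined on ASIN. This is a sketch with a hypothetical helper name. The 4.5 rating is the one returned for B0G8G4SQXQ earlier in this guide, and the card dicts are trimmed for brevity.

```python
def merge_ratings(cards: list[dict], details: list[dict]) -> list[dict]:
    """Fill null ratings on search cards from detail-page records with the same ASIN."""
    rating_by_asin = {d["asin"]: d.get("rating") for d in details if d.get("asin")}
    merged = []
    for card in cards:
        row = dict(card)  # copy so the input list is not mutated
        if row.get("rating") is None:
            row["rating"] = rating_by_asin.get(row.get("asin"))
        merged.append(row)
    return merged


cards = [
    {"asin": "B0G8G4SQXQ", "rating": None},
    {"asin": "B00SAYCXWG", "rating": None},  # no detail record yet, stays None
]
details = [{"asin": "B0G8G4SQXQ", "rating": 4.5}]

merged = merge_ratings(cards, details)
print(merged)
```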

Collecting at scale: search to batch PDPs

One search page gives you roughly 20-25 ASINs. Three pages give you 60-75. From there, batch requests scrape the product pages in parallel, up to 50 URLs per request. The full pipeline looks like this:

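import os, json
from urllib.parse import quote_plus
from spidra import SpidraClient, ScrapeParams, ScrapeUrl, BrowserAction, BatchScrapeParams

# SEARCH_SCHEMA and PRODUCT_SCHEMA are the schemas shown earlier in this guide
spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])

def collect_asins(keyword: str, pages: int = 3) -> list[str]:
    asins = []
    seen = set()

    for page in range(1, pages + 1):
        # quote_plus escapes spaces as '+' and also encodes characters like '&'
        url = f"https://www.amazon.com/s?k={quote_plus(keyword)}&page={page}"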

        job = spidra.scrape.run_sync(ScrapeParams(
            urls=[
                ScrapeUrl(
                    url=url,
                    actions=[
                        BrowserAction(type="scroll", to="50%"),
                        BrowserAction(type="wait", duration=1000),
                        BrowserAction(type="scroll", to="100%"),
                        BrowserAction(type="wait", duration=1000),
                    ]
                )
            ],
            prompt="Extract all product listings. Clean product URLs to https://www.amazon.com/dp/ASIN format.",
            output="json",
            schema=SEARCH_SCHEMA,
            use_proxy=True,
            proxy_country="us",
        ))

        if job.result.ai_extraction_failed:
            print(f"Page {page}: extraction failed, skipping")
            continue

        for p in job.result.content.get("products", []):
            asin = p.get("asin")
            if asin and asin not in seen:
                asins.append(asin)
                seen.add(asin)

        print(f"Page {page}: {len(asins)} unique ASINs collected")

    return asins


def scrape_products(asins: list[str]) -> list[dict]:
    results = []
    urls = [f"https://www.amazon.com/dp/{asin}" for asin in asins]

    for i in range(0, len(urls), 50):
        chunk = urls[i:i + 50]
        batch_num = i // 50 + 1
        total = -(-len(urls) // 50)  # ceiling division: total number of batches

        batch = spidra.batch.run_sync(BatchScrapeParams(
            urls=chunk,
            prompt="Extract the full product details",
            output="json",
            schema=PRODUCT_SCHEMA,
            use_proxy=True,
            proxy_country="us",
        ))

        for item in batch.items:
            if item.status == "completed" and item.result:
                results.append(item.result)
            else:
                print(f"  Failed: {item.url}")

        print(f"Batch {batch_num}/{total}: {batch.completed_count}/{batch.total_urls} succeeded")

    return results


asins = collect_asins("wireless headphones", pages=2)
products = scrape_products(asins)

with open("amazon_products.jsonl", "w") as f:
    for product in products:
        f.write(json.dumps(product) + "\n")

print(f"Saved {len(products)} products to amazon_products.jsonl")
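Reading the JSONL file back for later analysis is the mirror image of the write step: one JSON document per line. A minimal sketch using only the standard library follows; the helper name is illustrative, and the second sample record is made up to show a null price.

```python
import json
import os
import tempfile


def read_jsonl(path: str) -> list[dict]:
    """Load one JSON object per non-blank line from a JSONL file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


# Round-trip two sample records through a temporary file.
sample = [
    {"asin": "B0G8G4SQXQ", "price": 29.99},
    {"asin": "B000000000", "price": None},  # illustrative record with no price
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "amazon_products.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for record in sample:
            f.write(json.dumps(record) + "\n")
    loaded = read_jsonl(path)

priced = [r for r in loaded if r.get("price") is not None]
print(f"{len(priced)} of {len(loaded)} records have a price")
```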

The Node.js version of the same pipeline:

import { SpidraClient } from 'spidra'
import { writeFileSync } from 'fs'
import * as os from 'os'

const spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY! })

async function collectAsins(keyword: string, pages = 3): Promise<string[]> {
  const asins: string[] = []
  const seen = new Set<string>()

  for (let page = 1; page <= pages; page++) {
    const url = `https://www.amazon.com/s?k=${keyword.replace(/ /g, '+')}&page=${page}`

    const job = await spidra.scrape.run({
      urls: [{
        url,
        actions: [
          { type: 'scroll', to: '50%' },
          { type: 'wait', duration: 1000 },
          { type: 'scroll', to: '100%' },
          { type: 'wait', duration: 1000 },
        ],
      }],
      prompt: 'Extract all product listings. Clean product URLs to https://www.amazon.com/dp/ASIN format.',
      output: 'json',
      schema: SEARCH_SCHEMA,
      useProxy: true,
      proxyCountry: 'us',
    })

    const products = (job.result.content as any)?.products ?? []
    for (const p of products) {
      if (p.asin && !seen.has(p.asin)) {
        asins.push(p.asin)
        seen.add(p.asin)
      }
    }
    console.log(`Page ${page}: ${asins.length} unique ASINs`)
  }

  return asins
}

async function scrapeProducts(asins: string[]): Promise<unknown[]> {
  const results: unknown[] = []
  const urls = asins.map(a => `https://www.amazon.com/dp/${a}`)

  for (let i = 0; i < urls.length; i += 50) {
    const chunk = urls.slice(i, i + 50)
    const batch = await spidra.batch.run({
      urls: chunk,
      prompt: 'Extract the full product details',
      output: 'json',
      schema: PRODUCT_SCHEMA,
      useProxy: true,
      proxyCountry: 'us',
    })

    for (const item of batch.items) {
      if (item.status === 'completed' && item.result) results.push(item.result)
    }
    console.log(`Batch: ${batch.completedCount}/${batch.totalUrls}`)
  }

  return results
}

const asins = await collectAsins('wireless headphones', 2)
const products = await scrapeProducts(asins)

writeFileSync('amazon_products.jsonl', products.map(p => JSON.stringify(p)).join(os.EOL))
console.log(`Saved ${products.length} products`)

Price monitoring

Once you have a list of ASINs you track regularly, checking them for price changes is a straightforward batch job. Run it daily, compare to the previous snapshot, and surface anything that moved by more than your threshold.

import os, json
from pathlib import Path
from spidra import SpidraClient, BatchScrapeParams

spidra = SpidraClient(api_key=os.environ["SPIDRA_API_KEY"])

PRICE_SCHEMA = {
    "type": "object",
    "required": ["asin", "title", "price", "availability"],
    "properties": {
        "asin":           {"type": "string"},
        "title":          {"type": "string"},
        "price":          {"type": ["number", "null"]},
        "original_price": {"type": ["number", "null"]},
        "currency":       {"type": ["string", "null"]},
        "availability":   {"type": "string"},
    }
}

# Real ASINs tested and confirmed working
WATCHED_ASINS = [
    "B0G8G4SQXQ",  # Mopchnic Wireless Headset
    "B00SAYCXWG",  # HyperX Cloud II
    "B0C3BV19Q3",  # HyperX Cloud III
    "B0BS1RT9S2",  # Sony WH-CH520
    "B0CCZ26B5V",  # Bose QuietComfort
]

def load_previous(path="data/prices.json") -> dict:
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}

def save_current(data: dict, path="data/prices.json"):
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_text(json.dumps(data, indent=2))

def check_prices(asins: list[str]) -> dict:
    batch = spidra.batch.run_sync(BatchScrapeParams(
        urls=[f"https://www.amazon.com/dp/{a}" for a in asins],
        prompt="Extract the product ASIN, title, price, and availability",
        output="json",
        schema=PRICE_SCHEMA,
        use_proxy=True,
        proxy_country="us",
    ))
    results = {}
    for item in batch.items:
        if item.status == "completed" and item.result:
            asin = item.result.get("asin")
            if asin:
                results[asin] = item.result
    return results

def find_changes(previous: dict, current: dict, threshold: float = 3.0) -> list[dict]:
    changes = []
    for asin, data in current.items():
        curr = data.get("price")
        prev = previous.get(asin, {}).get("price")
        if not curr or not prev or prev == 0:
            continue
        pct = ((curr - prev) / prev) * 100
        if abs(pct) >= threshold:
            changes.append({
                "asin":       asin,
                "title":      (data.get("title") or "")[:60],  # title can come back as null
                "prev_price": prev,
                "curr_price": curr,
                "change_pct": round(pct, 1),
                "direction":  "up" if pct > 0 else "down",
            })
    return sorted(changes, key=lambda x: abs(x["change_pct"]), reverse=True)


previous = load_previous()
current = check_prices(WATCHED_ASINS)
save_current(current)

changes = find_changes(previous, current)
if changes:
    print(f"{len(changes)} price changes:")
    for c in changes:
        sign = "+" if c["direction"] == "up" else ""
        print(f"  {c['title']}: ${c['prev_price']} to ${c['curr_price']} ({sign}{c['change_pct']}%)")
else:
    print("No significant price changes")

This covers the market research and data enrichment pattern for e-commerce. You can swap in the full product schema to access more detailed data on each monitoring run without changing the pipeline structure.

Frequently asked questions

Is it legal to scrape Amazon?

Scraping publicly visible product data is generally considered legal in the US based on the hiQ Labs v. LinkedIn ruling (2022). Amazon's Terms of Service prohibit automated access, which is a contractual restriction, and Amazon enforces this primarily through technical means. The line is public pages only. We cover the full legal picture in our web scraping legality guide.
