How many reviews can I scrape per product page?

Amazon surfaces 10-13 featured reviews on the product detail page. These are what both methods in this guide extract. The full review pagination requires a login as of May 2026.

Why is review_image returning the string "null"?

When a reviewer did not upload a photo, the AI returns the string "null" rather than a JSON null. Add a normalisation step after extraction: if r.get("review_image") == "null": r["review_image"] = None.
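If you would rather run that check once over the whole list, it can be wrapped in a small helper. This is a minimal sketch: the function name normalise_review_images is illustrative, and treating "none" or an empty string as missing is an assumption that goes beyond the "null" case described here.

```python
def normalise_review_images(reviews):
    """Replace placeholder strings in review_image with a real None (in place)."""
    # "null" is the case described in this guide; "none" and "" are assumed extras.
    placeholders = {"null", "none", ""}
    for r in reviews:
        value = r.get("review_image")
        if isinstance(value, str) and value.strip().lower() in placeholders:
            r["review_image"] = None
    return reviews
```

Real image URLs and values that are already None are left untouched.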

Why does BeautifulSoup return a 503 or empty content?

Amazon has blocked the request at the network layer. A plain HTTP request without residential proxies rarely gets through Amazon's WAF. You would need proxy rotation and browser rendering to reliably access the page — which is what the Spidra approach handles automatically.

Can I use the Python or Node.js SDK instead of the REST API?

Yes. The Python SDK and Node.js SDK accept the same schema and prompt parameters. run_sync() in Python and run() in Node.js both handle the polling automatically.

How do I scrape reviews for multiple products?

Use the batch endpoint with a list of product URLs in /dp/{ASIN} format. The batch endpoint processes up to 50 in parallel per request.


June 26, 2026 · 10 min read

How to scrape Amazon reviews in 2026

Joel Olawanle

Product reviews are some of the most useful data on Amazon. They feed sentiment analysis, competitive research, feature gap tracking, and quality monitoring across your own listings.

Every product page surfaces them publicly, and this guide walks through how to extract them. We will use the Logitech G502 Hero gaming mouse as our target throughout:

https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/

By the end you will have code that extracts the following from any Amazon product page:

Reviewer name
Review title
Review text
Star rating
Review date
Review image URL

We will cover two approaches. The first uses the Spidra API, which is faster to set up and requires no selector maintenance. The second uses Python with BeautifulSoup to show the manual approach and why it is harder to maintain at scale.

What changed in May 2026

Before diving in, it is worth knowing that Amazon's public review pagination endpoint at /product-reviews/{ASIN}/ now requires a login. If you send a request to that URL without an authenticated session, you get a sign-in page. That path is no longer available for public scraping.

What is still accessible is the featured reviews section on the product detail page itself. Amazon shows 10-13 reviews directly on every product page, visible to any visitor without authentication. That is what both methods in this guide target.

Why Amazon blocks review scrapers

Amazon runs AWS WAF with Bot Control, which makes scraping its pages harder than most sites. Three defences are worth understanding before writing any code.

CAPTCHA challenges appear when Amazon detects automated behaviour — repeated requests from the same IP, missing browser headers, or unusual request timing. They block the scraping process entirely unless solved.
Rate limiting kicks in when too many requests come from a single IP within a short window. Amazon may temporarily block the IP or serve a CAPTCHA as a checkpoint.
IP blocking happens when traffic patterns look suspicious at the network layer. Datacenter IP ranges are blocked outright by default. Residential proxies are more reliable, but managing and rotating them adds significant overhead to any DIY solution.

For a few requests, you can get by with a carefully crafted User-Agent header and some delay between requests. At any real scale, you need proxies, CAPTCHA handling, and browser rendering — or a tool that handles all three for you.
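For the small-volume case, the idea of "a browser-like header plus some delay" can be sketched as below. The 2 to 5 second delay range and the fetch_politely name are assumptions chosen for illustration, not values Amazon publishes, and the fetch function is passed in so you can plug in whichever HTTP client you use.

```python
import random
import time

DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch_politely(urls, fetch, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Call fetch(url, headers) for each URL, pausing a random interval between calls."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Randomised pauses avoid the fixed request rhythm that rate limiters look for.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url, DEFAULT_HEADERS))
    return results
```

With requests, the call would look like fetch_politely(urls, lambda u, h: requests.get(u, headers=h, timeout=30)). This only reduces the chance of a block for a handful of pages; it does nothing about IP reputation or CAPTCHAs.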

Method 1: scraping Amazon reviews with Spidra

The Spidra API handles the proxy rotation, CAPTCHA solving, and browser rendering automatically. You describe the data you want, define the output schema, and get structured JSON back.

Step 1: get your API key

Sign up free at app.spidra.io. The free plan gives you 300 credits with no card required. Your API key is under Settings → API Keys once you are in.

Step 2: define the schema

A JSON Schema tells Spidra exactly what shape to return. Every field in required always appears in the output — as null if the page does not have that value. Because the output is a list of reviews, the array sits inside an object with a named key. The Spidra API requires the root type to always be "object".

If you want to generate this schema automatically from a sample output rather than writing it by hand, paste your example JSON into the free JSON Schema Generator and it builds the structure for you.

{
  "type": "object",
  "required": ["reviews"],
  "properties": {
    "reviews": {
      "type": "array",
      "items": {
        "type": "object",
        "required": [
          "reviewer_name",
          "review_title",
          "review_date",
          "review_text",
          "review_rating",
          "review_image"
        ],
        "properties": {
          "reviewer_name": {"type": "string"},
          "review_title":  {"type": "string"},
          "review_date":   {"type": "string"},
          "review_text":   {"type": "string"},
          "review_rating": {"type": "string"},
          "review_image":  {"type": ["string", "null"]}
        }
      }
    }
  }
}
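Because the same schema is sent with every request, you may prefer to build it from a field list rather than copy the JSON around. The helper below is a sketch in plain Python; build_reviews_schema is a hypothetical name and not part of the Spidra API. It only produces the dictionary shown above.

```python
def build_reviews_schema(string_fields, nullable_fields):
    """Build the object -> reviews array -> review object schema from field names."""
    properties = {name: {"type": "string"} for name in string_fields}
    # Nullable fields accept either a string or JSON null.
    properties.update({name: {"type": ["string", "null"]} for name in nullable_fields})
    return {
        "type": "object",  # the root type must be "object"
        "required": ["reviews"],
        "properties": {
            "reviews": {
                "type": "array",
                "items": {
                    "type": "object",
                    "required": list(string_fields) + list(nullable_fields),
                    "properties": properties,
                },
            }
        },
    }

REVIEW_SCHEMA = build_reviews_schema(
    ["reviewer_name", "review_title", "review_date", "review_text", "review_rating"],
    ["review_image"],
)
```

Adding a field later then means changing one list instead of editing nested JSON in several places.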

Step 3: submit the scrape job

The Spidra API is async. You POST a request, receive a jobId, then poll until the status is completed.

curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      {
        "url": "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"
      }
    ],
    "prompt": "Extract all customer reviews visible on the page. For review_image return the image URL if the reviewer uploaded a photo, otherwise return null.",
    "output": "json",
    "useProxy": true,
    "proxyCountry": "us",
    "schema": {
      "type": "object",
      "required": ["reviews"],
      "properties": {
        "reviews": {
          "type": "array",
          "items": {
            "type": "object",
            "required": [
              "reviewer_name", "review_title", "review_date",
              "review_text", "review_rating", "review_image"
            ],
            "properties": {
              "reviewer_name": {"type": "string"},
              "review_title":  {"type": "string"},
              "review_date":   {"type": "string"},
              "review_text":   {"type": "string"},
              "review_rating": {"type": "string"},
              "review_image":  {"type": ["string", "null"]}
            }
          }
        }
      }
    }
  }'

You will get back:

{"jobId": "abc-123", "status": "queued"}

Step 4: poll for the result

curl https://api.spidra.io/api/scrape/abc-123 \
  -H "x-api-key: YOUR_API_KEY"

Keep polling every few seconds until status is completed. Here is the full flow in Python using requests directly:

import requests, time, json, os

API_KEY = os.environ["SPIDRA_API_KEY"]
BASE    = "https://api.spidra.io/api"
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}

REVIEW_SCHEMA = {
    "type": "object",
    "required": ["reviews"],
    "properties": {
        "reviews": {
            "type": "array",
            "items": {
                "type": "object",
                "required": [
                    "reviewer_name", "review_title", "review_date",
                    "review_text", "review_rating", "review_image"
                ],
                "properties": {
                    "reviewer_name": {"type": "string"},
                    "review_title":  {"type": "string"},
                    "review_date":   {"type": "string"},
                    "review_text":   {"type": "string"},
                    "review_rating": {"type": "string"},
                    "review_image":  {"type": ["string", "null"]}
                }
            }
        }
    }
}

# Submit
resp = requests.post(f"{BASE}/scrape", headers=HEADERS, json={
    "urls": [{"url": "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"}],
    "prompt": "Extract all customer reviews visible on the page. For review_image return the image URL if the reviewer uploaded a photo, otherwise return null.",
    "output": "json",
    "useProxy": True,
    "proxyCountry": "us",
    "schema": REVIEW_SCHEMA,
})
resp.raise_for_status()  # stop on an HTTP error instead of failing later on a missing jobId
job_id = resp.json()["jobId"]
print(f"Job submitted: {job_id}")

# Poll
while True:
    result = requests.get(f"{BASE}/scrape/{job_id}", headers=HEADERS).json()
    print(f"Status: {result['status']}")
    if result["status"] == "completed":
        break
    time.sleep(3)

reviews = result["result"]["content"]["reviews"]
print(f"Extracted {len(reviews)} reviews")
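The while True loop keeps polling forever if the job never reaches completed. A more defensive version adds a timeout and stops on a failure status. This is a sketch under assumptions: the status values "failed" and "error" are guesses and should be checked against the Spidra API reference, and wait_for_job is an illustrative name.

```python
import time

def wait_for_job(get_job, timeout=120.0, interval=3.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_job() until the job completes, fails, or the timeout passes."""
    deadline = clock() + timeout
    while True:
        job = get_job()
        status = job.get("status")
        if status == "completed":
            return job
        # Assumed failure statuses; confirm the real values in the API docs.
        if status in ("failed", "error"):
            raise RuntimeError(f"Scrape job ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout} seconds")
        sleep(interval)
```

In the script above it would replace the loop with result = wait_for_job(lambda: requests.get(f"{BASE}/scrape/{job_id}", headers=HEADERS, timeout=30).json()).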

Step 5: check the output

Running this against the Logitech G502 Hero page returned 13 reviews:

{
  "reviews": [
    {
      "reviewer_name": "Subnet",
      "review_title": "Best mouse hands down!",
      "review_date": "March 14, 2021",
      "review_text": "There is a ton to like about this mouse from how it fits into your hand, the grip, and the competitive edge for gaming you will begin to notice immediately...",
      "review_rating": "5 out of 5 stars",
      "review_image": "https://m.media-amazon.com/images/I/619veDQXQsL._SY500_.jpg"
    },
    {
      "reviewer_name": "Alex Reeves",
      "review_title": "Excellent upgrade from entry-level gaming mice",
      "review_date": "March 12, 2026",
      "review_text": "The Logitech G502 Hero is an excellent wired gaming mouse with great build quality, strong customization options, and a very accurate sensor...",
      "review_rating": "5 out of 5 stars",
      "review_image": null
    },
    {
      "reviewer_name": "David Malchin",
      "review_title": "Better Than Expected",
      "review_date": "June 5, 2026",
      "review_text": "Wanting to try this popular gaming mouse, I thought it looked a bit silly on the images and wasn't sure how comfortable it would be...",
      "review_rating": "4 out of 5 stars",
      "review_image": null
    }
  ]
}

One small thing to handle in post-processing.

review_rating comes back as "5 out of 5 stars". If you need the numeric value for averaging or sorting:

def parse_rating(s: str) -> float | None:
    try:
        return float(s.split(" out of")[0])
    except (ValueError, AttributeError):
        return None

for r in reviews:
    r["rating_score"] = parse_rating(r["review_rating"])
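The same idea applies to review_date if you want to sort or filter by time. The sketch below assumes English month names in the "March 14, 2021" form shown in the output, and it also tolerates a leading phrase such as "Reviewed in the United States on", which is an assumption about how the date may appear elsewhere. parse_review_date is an illustrative name.

```python
import re
from datetime import date, datetime

# Matches "Month D, YYYY" anywhere in the string.
_DATE_RE = re.compile(r"([A-Z][a-z]+ \d{1,2}, \d{4})")

def parse_review_date(s):
    """Return a date from text such as 'March 14, 2021', or None if none is found."""
    match = _DATE_RE.search(s or "")
    if not match:
        return None
    try:
        return datetime.strptime(match.group(1), "%B %d, %Y").date()
    except ValueError:
        # Looked like a date but the month name was not recognised.
        return None
```

To order reviews newest first, something like reviews.sort(key=lambda r: parse_review_date(r["review_date"]) or date.min, reverse=True) works, with undated reviews falling to the end.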

Step 6: export to CSV

import csv

csv_file = "amazon_reviews.csv"

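with open(csv_file, mode="w", newline="", encoding="utf-8") as f:
    # extrasaction="ignore" skips keys outside fieldnames, such as the rating_score
    # added in Step 5; without it DictWriter raises ValueError on the extra key.
    writer = csv.DictWriter(f, fieldnames=[
        "reviewer_name", "review_title", "review_date",
        "review_text", "review_rating", "review_image"
    ], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(reviews)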

print(f"Saved {len(reviews)} reviews to {csv_file}")

If you prefer to work with the Python SDK or Node.js SDK instead of requests directly, both support the same schema and prompt fields. The SDK handles the polling for you.

Method 2: scraping Amazon reviews with BeautifulSoup

This approach uses Python's requests library to fetch the page HTML and BeautifulSoup to parse out the review fields one by one.

It is useful to understand how the page is structured, and the output is the same, but every selector you write is a dependency that breaks when Amazon updates its frontend.

Prerequisites

pip install requests beautifulsoup4

Fetch the page HTML

Start with a basic request to get the full HTML of the product page:

import requests
from bs4 import BeautifulSoup

target_url = "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

response = requests.get(target_url, headers=headers, timeout=30)  # timeout avoids hanging on a stalled connection

if response.status_code != 200:
    print(f"Request failed: {response.status_code}")
else:
    soup = BeautifulSoup(response.text, "html.parser")
    print("Page fetched successfully")

If you get a 503 or a CAPTCHA page instead of a 200, Amazon has already detected the automated request. This is where proxy rotation and browser rendering would be needed to continue, which is what the Spidra approach handles automatically.
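A block does not always show up as a non-200 status, so it can help to inspect the body as well. The check below is a rough sketch: the marker strings are assumptions based on text commonly reported on Amazon's CAPTCHA and error pages, they may change at any time, and looks_blocked is an illustrative name.

```python
# Assumed marker strings; verify them against a block page you actually receive.
BLOCK_MARKERS = (
    "Enter the characters you see below",
    "api-services-support@amazon.com",
)

def looks_blocked(status_code, html):
    """Heuristic: True if the response looks like a block page rather than a product page."""
    if status_code != 200:
        return True
    if not html or not html.strip():
        return True
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in BLOCK_MARKERS)
```

Calling looks_blocked(response.status_code, response.text) before parsing avoids writing an empty CSV from a CAPTCHA page.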

Locate and extract each field

Once you have the HTML, you find each review field by inspecting the page in DevTools and identifying the element class names.

Reviewer names sit inside a <span> tag with the class a-profile-name:

reviewer_names = soup.find_all("span", class_="a-profile-name")
names_list = [name.text.strip() for name in reviewer_names]

Review titles are in an <a> tag with the class review-title. Amazon embeds the star rating text inside the same element, so you need to strip it:

import re  # used to strip the rating prefix for any star value, not only "5.0"
review_titles = soup.find_all("a", class_="review-title")
titles_list = [re.sub(r"^\s*\d(?:\.\d)? out of 5 stars\s*", "", t.text).strip() for t in review_titles]

Review text is in a <span> tag with the class review-text:

review_texts = soup.find_all("span", class_="review-text")
texts_list = [t.get_text(separator="\n").strip() for t in review_texts]

Review dates are in a <span> tag with the class review-date:

review_dates = soup.find_all("span", class_="review-date")
dates_list = [d.text.strip() for d in review_dates]

Ratings are in an <i> tag with the class review-rating. The text includes "out of 5" twice, so split on the first occurrence:

review_ratings = soup.find_all("i", class_="review-rating")
ratings_list = [r.text.strip().split(" out of")[0] for r in review_ratings]

Review images are in an <img> tag with the class review-image-tile. Only reviews where the customer uploaded a photo will have this element:

review_images = soup.find_all("img", class_="review-image-tile")
image_urls = [img["src"] for img in review_images]

Complete code

Here is the final updated scraper code:

import csv
import re

import requests
from bs4 import BeautifulSoup

target_url = "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

# timeout avoids hanging on a stalled connection
response = requests.get(target_url, headers=headers, timeout=30)

if response.status_code != 200:
    print(f"Request failed: {response.status_code}")
else:
    soup = BeautifulSoup(response.text, "html.parser")

    reviewer_names  = [n.text.strip() for n in soup.find_all("span", class_="a-profile-name")]
    # Strip the leading "N.N out of 5 stars" text for any rating, not only 5.0
    review_titles   = [re.sub(r"^\s*\d(?:\.\d)? out of 5 stars\s*", "", t.text).strip() for t in soup.find_all("a", class_="review-title")]
    review_texts    = [t.get_text(separator="\n").strip() for t in soup.find_all("span", class_="review-text")]
    review_dates    = [d.text.strip() for d in soup.find_all("span", class_="review-date")]
    review_ratings  = [r.text.strip().split(" out of")[0] for r in soup.find_all("i", class_="review-rating")]
    review_images   = [img["src"] for img in soup.find_all("img", class_="review-image-tile")]

    csv_file = "amazon_reviews.csv"
    with open(csv_file, mode="w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Reviewer Name", "Review Title", "Review Text", "Review Date", "Rating", "Image URL"])
        for i in range(len(reviewer_names)):
            writer.writerow([
                reviewer_names[i],
                review_titles[i]  if i < len(review_titles)  else "N/A",
                review_texts[i]   if i < len(review_texts)   else "N/A",
                review_dates[i]   if i < len(review_dates)   else "N/A",
                review_ratings[i] if i < len(review_ratings) else "N/A",
                review_images[i]  if i < len(review_images)  else "N/A",
            ])

    print(f"Saved to {csv_file}")

This works on a good day. Amazon notes in its own developer documentation that it changes its page layout regularly.

The class names a-profile-name, review-title, review-text, review-date, and review-rating are not guaranteed to stay stable. When Amazon pushes a frontend update, the selectors return empty lists or misaligned data without any error, and you may not notice until you check the output.

Each selector is a maintenance debt you carry indefinitely.
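Since these failures are silent, a cheap safeguard is to compare the list lengths before writing the CSV. The sketch below is one way to do that; find_selector_problems is a hypothetical helper, and treating the image list as optional reflects the fact that only some reviews include a photo.

```python
def find_selector_problems(columns, optional=("review_images",)):
    """Return warnings for required selectors that matched nothing or disagree in count."""
    problems = []
    required = {name: values for name, values in columns.items() if name not in optional}
    for name, values in required.items():
        if not values:
            problems.append(f"{name}: selector matched nothing")
    lengths = {name: len(values) for name, values in required.items()}
    # Every required field should appear once per review, so the counts should agree.
    if len(set(lengths.values())) > 1:
        problems.append(f"field counts differ: {lengths}")
    return problems
```

You would pass it a dictionary such as {"reviewer_names": reviewer_names, "review_titles": review_titles, ...} and skip the export, or raise, when the returned list is not empty. It catches empty and unequal lists, but not the case where every selector drifts by the same amount.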

Wrapping up

You now have two working approaches for scraping Amazon reviews. The BeautifulSoup method works and is worth understanding because it shows how the page is structured.

The Spidra approach removes the selector maintenance entirely — the prompt and schema stay the same regardless of what Amazon changes in its frontend.

Both methods target the featured reviews on the product detail page, which is the publicly accessible review data after the May 2026 changes to the pagination endpoint.

If you want to pull reviews for many products at once, the same schema works in a batch request against multiple /dp/{ASIN} URLs. The batch scraping guide covers how to run up to 50 URLs in parallel.
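Preparing the input for a batch run is mostly a matter of turning ASINs into /dp/{ASIN} URLs and grouping them in lots of at most 50. The sketch below does only that; the batch request format itself is not shown in this guide, so nothing here calls the API, and asin_batches is an illustrative name.

```python
def asin_batches(asins, batch_size=50, base="https://www.amazon.com"):
    """Yield lists of /dp/{ASIN} URLs, at most batch_size URLs per list."""
    seen = set()
    urls = []
    for asin in asins:
        asin = asin.strip().upper()
        # Skip blanks and duplicates so credits are not spent on repeat pages.
        if asin and asin not in seen:
            seen.add(asin)
            urls.append(f"{base}/dp/{asin}")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]
```

Each yielded list can then be submitted as one batch request with the same schema and prompt.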

For the full picture on scraping Amazon product data including prices, BSR rankings, and search results, see the guide on how to scrape Amazon product data.

Frequently asked questions

Is it legal to scrape Amazon reviews?

Scraping publicly visible review data from Amazon product pages is generally considered legal in the United States based on the hiQ Labs v. LinkedIn ruling (2022). The featured reviews on a product detail page are accessible to any visitor without authentication. Amazon's Terms of Service prohibit automated access contractually, and enforcement is primarily technical.
