Product reviews are some of the most valuable data on Amazon. They feed sentiment analysis, competitive research, feature gap tracking, and quality monitoring across your own listings.
Every product page surfaces them publicly, and this guide walks through how to extract them. We will use the Logitech G502 Hero gaming mouse as our target throughout:
https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/
By the end you will have code that extracts the following from any Amazon product page:
- Reviewer name
- Review title
- Review text
- Star rating
- Review date
- Review image URL
We will cover two approaches. The first uses the Spidra API, which is faster to set up and requires no selector maintenance. The second uses Python with BeautifulSoup to show the manual approach and why it is harder to maintain at scale.
What changed in May 2026
Before diving in, it is worth knowing that Amazon's public review pagination endpoint at /product-reviews/{ASIN}/ now requires a login. If you send a request to that URL without an authenticated session, you get a sign-in page. That path is no longer available for public scraping.
What is still accessible is the featured reviews section on the product detail page itself. Amazon shows 10-13 reviews directly on every product page, visible to any visitor without authentication. That is what both methods in this guide target.
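The /product-reviews/{ASIN}/ path now requires a login. Any review URLs you already have therefore need to be mapped back to the public product detail page. The sketch below shows one way to do that. It assumes ASINs are 10 uppercase letters or digits that appear after /dp/, /gp/product/ or /product-reviews/ in the path. The helper names extract_asin and to_product_page are hypothetical.

```python
import re
from typing import Optional

# Assumption: the ASIN is 10 uppercase letters/digits following /dp/,
# /gp/product/ or /product-reviews/ in the URL path.
ASIN_IN_PATH = re.compile(r"/(?:dp|gp/product|product-reviews)/([A-Z0-9]{10})(?=[/?#]|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN from an Amazon product or review URL, or None if absent."""
    match = ASIN_IN_PATH.search(url)
    return match.group(1) if match else None


def to_product_page(url: str, domain: str = "www.amazon.com") -> Optional[str]:
    """Map any URL that carries an ASIN to its public /dp/{ASIN}/ detail page."""
    asin = extract_asin(url)
    return f"https://{domain}/dp/{asin}/" if asin else None


if __name__ == "__main__":
    print(to_product_page("https://www.amazon.com/product-reviews/B07GBZ4Q68/"))
```

Running it prints https://www.amazon.com/dp/B07GBZ4Q68/, which is a URL both methods below can target.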
Why Amazon blocks review scrapers
Amazon runs AWS WAF with Bot Control, which makes scraping its pages harder than most sites. Three defences are worth understanding before writing any code.
- CAPTCHA challenges appear when Amazon detects automated behaviour — repeated requests from the same IP, missing browser headers, or unusual request timing. They block the scraping process entirely unless solved.
- Rate limiting kicks in when too many requests come from a single IP within a short window. Amazon may temporarily block the IP or serve a CAPTCHA as a checkpoint.
- IP blocking happens when traffic patterns look suspicious at the network layer. Datacenter IP ranges are blocked outright by default. Residential proxies are more reliable, but managing and rotating them adds significant overhead to any DIY solution.
For a few requests, you can get by with a carefully crafted User-Agent header and some delay between requests. At any real scale, you need proxies, CAPTCHA handling, and browser rendering — or a tool that handles all three for you.
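For the small-scale case, a retry helper with exponential backoff and optional round-robin proxy rotation might look like the sketch below. It assumes a callable with the same shape as requests.get. The proxy URLs are hypothetical placeholders you would supply yourself. Treating 429 and 503 as the retryable "slow down" signals is also an assumption. This helper does nothing about CAPTCHAs.

```python
import itertools
import random
import time
from typing import Callable, Iterable, Optional

# Assumption: these status codes mean "throttled/blocked, try again later".
RETRYABLE_STATUSES = {429, 503}


def fetch_with_backoff(
    get: Callable[..., object],
    url: str,
    proxy_pool: Optional[Iterable[str]] = None,
    max_retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    **kwargs,
):
    """Call get(url, proxies=..., **kwargs), retrying throttled responses.

    proxy_pool is an optional list of proxy URLs used round-robin, one per
    attempt. Extra kwargs (headers, timeout, ...) are passed straight to get.
    """
    pool = list(proxy_pool or [])
    proxy_cycle = itertools.cycle(pool) if pool else None
    for attempt in range(max_retries + 1):
        proxy = next(proxy_cycle) if proxy_cycle else None
        proxy_map = {"http": proxy, "https": proxy} if proxy else None
        response = get(url, proxies=proxy_map, **kwargs)
        if response.status_code not in RETRYABLE_STATUSES:
            return response
        if attempt < max_retries:
            # 2s, 4s, 8s, ... plus up to 1s of random jitter between attempts
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return response  # last throttled response once retries are exhausted
```

For example, fetch_with_backoff(requests.get, url, headers={"User-Agent": "..."}, timeout=30) forwards the headers and timeout to requests.get unchanged.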
Method 1: scraping Amazon reviews with Spidra
The Spidra API handles the proxy rotation, CAPTCHA solving, and browser rendering automatically. You describe the data you want, define the output schema, and get structured JSON back.
Step 1: get your API key
Sign up free at app.spidra.io. The free plan gives you 300 credits with no card required. Your API key is under Settings → API Keys once you are in.
Step 2: define the schema
A JSON Schema tells Spidra exactly what shape to return. Every field listed in "required" always appears in the output, as null if the page does not have that value. Because the output is a list of reviews, the array sits inside an object with a named key: the Spidra API requires the root type to be "object".
If you want to generate this schema automatically from a sample output rather than writing it by hand, paste your example JSON into the free JSON Schema Generator and it builds the structure for you.
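This guide repeats the same schema three times: here, in the curl request, and in the Python script. You may prefer to build it once in code. The sketch below uses only the field list shown in the schema. build_review_schema and its parameters are hypothetical names. Printing the result lets you compare it with the hand-written schema.

```python
import json

# Field names taken from the review schema in this guide; all are required.
REVIEW_FIELDS = (
    "reviewer_name", "review_title", "review_date",
    "review_text", "review_rating", "review_image",
)
# Fields that may be absent on the page, so they also accept null.
NULLABLE_FIELDS = frozenset({"review_image"})


def build_review_schema(fields=REVIEW_FIELDS, nullable=NULLABLE_FIELDS) -> dict:
    """Wrap an array of review objects in a root object, as the API requires."""
    properties = {
        name: {"type": ["string", "null"]} if name in nullable else {"type": "string"}
        for name in fields
    }
    return {
        "type": "object",
        "required": ["reviews"],
        "properties": {
            "reviews": {
                "type": "array",
                "items": {
                    "type": "object",
                    "required": list(fields),
                    "properties": properties,
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_review_schema(), indent=2))
```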
{
  "type": "object",
  "required": ["reviews"],
  "properties": {
    "reviews": {
      "type": "array",
      "items": {
        "type": "object",
        "required": [
          "reviewer_name",
          "review_title",
          "review_date",
          "review_text",
          "review_rating",
          "review_image"
        ],
        "properties": {
          "reviewer_name": {"type": "string"},
          "review_title": {"type": "string"},
          "review_date": {"type": "string"},
          "review_text": {"type": "string"},
          "review_rating": {"type": "string"},
          "review_image": {"type": ["string", "null"]}
        }
      }
    }
  }
}
Step 3: submit the scrape job
The Spidra API is async. You POST a request, receive a jobId, then poll until the status is completed.
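Because the job is asynchronous, a bare while True loop will spin forever if a job never reaches "completed". A bounded poller is a safer pattern. This is a minimal sketch. wait_for_job is a hypothetical helper, and fetch_status is any zero-argument callable that returns the job JSON as a dict. Only the "completed" status comes from this guide; treating "failed" as the error state is an assumption.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 3.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_status() until the job is "completed" or the timeout expires."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":  # assumption: adjust to the API's real error states
            raise RuntimeError(f"Scrape job failed: {job}")
        if clock() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout} seconds")
        sleep(interval)
```

With the names from the full Python flow in Step 4, the call would be wait_for_job(lambda: requests.get(f"{BASE}/scrape/{job_id}", headers=HEADERS).json()).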
curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      {
        "url": "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"
      }
    ],
    "prompt": "Extract all customer reviews visible on the page. For review_image return the image URL if the reviewer uploaded a photo, otherwise return null.",
    "output": "json",
    "useProxy": true,
    "proxyCountry": "us",
    "schema": {
      "type": "object",
      "required": ["reviews"],
      "properties": {
        "reviews": {
          "type": "array",
          "items": {
            "type": "object",
            "required": [
              "reviewer_name", "review_title", "review_date",
              "review_text", "review_rating", "review_image"
            ],
            "properties": {
              "reviewer_name": {"type": "string"},
              "review_title": {"type": "string"},
              "review_date": {"type": "string"},
              "review_text": {"type": "string"},
              "review_rating": {"type": "string"},
              "review_image": {"type": ["string", "null"]}
            }
          }
        }
      }
    }
  }'
You will get back:
{"jobId": "abc-123", "status": "queued"}
Step 4: poll for the result
curl https://api.spidra.io/api/scrape/abc-123 \
  -H "x-api-key: YOUR_API_KEY"
Keep polling every few seconds until status is completed. Here is the full flow in Python using requests directly:
import os
import time

import requests

API_KEY = os.environ["SPIDRA_API_KEY"]
BASE = "https://api.spidra.io/api"
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}

REVIEW_SCHEMA = {
    "type": "object",
    "required": ["reviews"],
    "properties": {
        "reviews": {
            "type": "array",
            "items": {
                "type": "object",
                "required": [
                    "reviewer_name", "review_title", "review_date",
                    "review_text", "review_rating", "review_image"
                ],
                "properties": {
                    "reviewer_name": {"type": "string"},
                    "review_title": {"type": "string"},
                    "review_date": {"type": "string"},
                    "review_text": {"type": "string"},
                    "review_rating": {"type": "string"},
                    "review_image": {"type": ["string", "null"]}
                }
            }
        }
    }
}

# Submit the scrape job
resp = requests.post(f"{BASE}/scrape", headers=HEADERS, json={
    "urls": [{"url": "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"}],
    "prompt": "Extract all customer reviews visible on the page. For review_image return the image URL if the reviewer uploaded a photo, otherwise return null.",
    "output": "json",
    "useProxy": True,
    "proxyCountry": "us",
    "schema": REVIEW_SCHEMA,
}, timeout=30)
resp.raise_for_status()  # raise on any HTTP error status instead of failing later on a missing key
job_id = resp.json()["jobId"]
print(f"Job submitted: {job_id}")

# Poll every 3 seconds until the job reports "completed"
while True:
    poll = requests.get(f"{BASE}/scrape/{job_id}", headers=HEADERS, timeout=30)
    poll.raise_for_status()
    result = poll.json()
    print(f"Status: {result['status']}")
    if result["status"] == "completed":
        break
    time.sleep(3)

reviews = result["result"]["content"]["reviews"]
print(f"Extracted {len(reviews)} reviews")
Step 5: check the output
Running this against the Logitech G502 Hero page returned 13 reviews. Three of them are shown below:
{
  "reviews": [
    {
      "reviewer_name": "Subnet",
      "review_title": "Best mouse hands down!",
      "review_date": "March 14, 2021",
      "review_text": "There is a ton to like about this mouse from how it fits into your hand, the grip, and the competitive edge for gaming you will begin to notice immediately...",
      "review_rating": "5 out of 5 stars",
      "review_image": "https://m.media-amazon.com/images/I/619veDQXQsL._SY500_.jpg"
    },
    {
      "reviewer_name": "Alex Reeves",
      "review_title": "Excellent upgrade from entry-level gaming mice",
      "review_date": "March 12, 2026",
      "review_text": "The Logitech G502 Hero is an excellent wired gaming mouse with great build quality, strong customization options, and a very accurate sensor...",
      "review_rating": "5 out of 5 stars",
      "review_image": null
    },
    {
      "reviewer_name": "David Malchin",
      "review_title": "Better Than Expected",
      "review_date": "June 5, 2026",
      "review_text": "Wanting to try this popular gaming mouse, I thought it looked a bit silly on the images and wasn't sure how comfortable it would be...",
      "review_rating": "4 out of 5 stars",
      "review_image": null
    }
  ]
}
There is one small thing to handle in post-processing.
review_rating comes back as "5 out of 5 stars". If you need the numeric value for averaging or sorting:
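def parse_rating(s: str) -> float | None:  # "X | None" syntax needs Python 3.10+
    """Turn "5 out of 5 stars" into 5.0; return None if the text is missing or malformed."""
    try:
        return float(s.split(" out of")[0])
    except (ValueError, AttributeError):
        return None

for r in reviews:
    r["rating_score"] = parse_rating(r["review_rating"])
Step 6: export to CSV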
import csv

csv_file = "amazon_reviews.csv"
with open(csv_file, mode="w", newline="", encoding="utf-8") as f:
    # rating_score is listed because the post-processing step adds it to each review;
    # DictWriter raises ValueError on dict keys that are not in fieldnames.
    writer = csv.DictWriter(f, fieldnames=[
        "reviewer_name", "review_title", "review_date",
        "review_text", "review_rating", "review_image", "rating_score",
    ])
    writer.writeheader()
    writer.writerows(reviews)
print(f"Saved {len(reviews)} reviews to {csv_file}")
If you prefer to work with the Python SDK or Node.js SDK instead of requests directly, both support the same schema and prompt fields. The SDK handles the polling for you.
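Once each review carries a numeric rating_score from the post-processing step, computing an average and a star breakdown takes only a few lines. In this sketch, summarize_ratings is a hypothetical helper. It buckets ratings by their integer part and skips reviews whose rating could not be parsed.

```python
from collections import Counter
from statistics import mean


def summarize_ratings(reviews: list[dict]) -> dict:
    """Average rating and star distribution from each review's rating_score."""
    # Skip reviews whose rating could not be parsed (rating_score is None or absent).
    scores = [r["rating_score"] for r in reviews if r.get("rating_score") is not None]
    if not scores:
        return {"count": 0, "average": None, "distribution": {}}
    distribution = Counter(int(s) for s in scores)
    return {
        "count": len(scores),
        "average": round(mean(scores), 2),
        # Highest star value first, e.g. {5: 9, 4: 3, 1: 1}
        "distribution": dict(sorted(distribution.items(), reverse=True)),
    }
```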
Method 2: scraping Amazon reviews with BeautifulSoup
This approach uses Python's requests library to fetch the page HTML and BeautifulSoup to parse out the review fields one by one.
It is useful to understand how the page is structured, and the output is the same, but every selector you write is a dependency that breaks when Amazon updates its frontend.
Prerequisites
pip install requests beautifulsoup4
Fetch the page HTML
Start with a basic request to get the full HTML of the product page:
import requests
from bs4 import BeautifulSoup

target_url = "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

# A timeout keeps a stalled connection from hanging the script indefinitely
response = requests.get(target_url, headers=headers, timeout=30)
if response.status_code != 200:
    print(f"Request failed: {response.status_code}")
else:
    soup = BeautifulSoup(response.text, "html.parser")
    print("Page fetched successfully")
If you get a 503, or a CAPTCHA page instead of the product page (which may arrive with a 200 status), Amazon has already detected the automated request. This is where proxy rotation and browser rendering would be needed to continue, which is what the Spidra approach handles automatically.
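A 200 status alone does not prove you received the product page. A quick content check helps catch that case. The sketch below is a heuristic. The CAPTCHA marker strings and the id="productTitle" element are assumptions based on commonly observed Amazon pages, not a documented contract. looks_blocked is a hypothetical helper.

```python
import re

# Heuristic markers from Amazon's robot-check page (assumptions; they may change).
CAPTCHA_MARKERS = (
    "/errors/validateCaptcha",
    "Enter the characters you see below",
)
# Assumption: real product pages carry an element with id="productTitle".
PRODUCT_TITLE = re.compile(r"""id=["']productTitle["']""")


def looks_blocked(status_code: int, html: str) -> bool:
    """Return True when a response looks like a block or CAPTCHA page, not a product page."""
    if status_code in (429, 503):
        return True
    if any(marker in html for marker in CAPTCHA_MARKERS):
        return True
    # A 200 without the product title is most likely an interstitial page.
    return PRODUCT_TITLE.search(html) is None
```

In the fetch code above, you would call looks_blocked(response.status_code, response.text) before handing the HTML to BeautifulSoup.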
Locate and extract each field
Once you have the HTML, you find each review field by inspecting the page in DevTools and identifying the element class names.
Reviewer names sit inside a <span> tag with the class a-profile-name:
reviewer_names = soup.find_all("span", class_="a-profile-name")
names_list = [name.text.strip() for name in reviewer_names]
Review titles are in an <a> tag with the class review-title. Amazon embeds the star rating text inside the same element, so you need to strip it:
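import re

review_titles = soup.find_all("a", class_="review-title")
# Strip the leading "N.0 out of 5 stars" prefix for any star value, not just 5.0
titles_list = [
    re.sub(r"^\s*\d(?:\.\d)?\s+out of 5 stars\s*", "", t.get_text()).strip()
    for t in review_titles
]
Review text is in a <span> tag with the class review-text: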
review_texts = soup.find_all("span", class_="review-text")
texts_list = [t.get_text(separator="\n").strip() for t in review_texts]
Review dates are in a <span> tag with the class review-date:
review_dates = soup.find_all("span", class_="review-date")
dates_list = [d.text.strip() for d in review_dates]
Ratings are in an <i> tag with the class review-rating. The text follows the pattern "5.0 out of 5 stars" and can contain "out of 5" more than once, so split on the first " out of" and keep the number before it:
review_ratings = soup.find_all("i", class_="review-rating")
ratings_list = [r.text.strip().split(" out of")[0] for r in review_ratings]
Review images are in an <img> tag with the class review-image-tile. Only reviews where the customer uploaded a photo have this element, so the image list is usually shorter than the others and its positions do not line up with individual reviews:
review_images = soup.find_all("img", class_="review-image-tile")
image_urls = [img["src"] for img in review_images]
Complete code
Here is the final updated scraper code:
import csv
import re

import requests
from bs4 import BeautifulSoup

target_url = "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}
# Matches the "5.0 out of 5 stars" prefix Amazon places inside review titles
RATING_PREFIX = re.compile(r"^\s*\d(?:\.\d)?\s+out of 5 stars\s*")

response = requests.get(target_url, headers=headers, timeout=30)
if response.status_code != 200:
    print(f"Request failed: {response.status_code}")
else:
    soup = BeautifulSoup(response.text, "html.parser")
    reviewer_names = [n.text.strip() for n in soup.find_all("span", class_="a-profile-name")]
    review_titles = [RATING_PREFIX.sub("", t.get_text()).strip() for t in soup.find_all("a", class_="review-title")]
    review_texts = [t.get_text(separator="\n").strip() for t in soup.find_all("span", class_="review-text")]
    review_dates = [d.text.strip() for d in soup.find_all("span", class_="review-date")]
    review_ratings = [r.text.strip().split(" out of")[0] for r in soup.find_all("i", class_="review-rating")]
    # Caution: only reviews with photos have this element, so image i is not
    # necessarily the image for review i.
    review_images = [img["src"] for img in soup.find_all("img", class_="review-image-tile")]

    csv_file = "amazon_reviews.csv"
    with open(csv_file, mode="w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Reviewer Name", "Review Title", "Review Text", "Review Date", "Rating", "Image URL"])
        for i in range(len(reviewer_names)):
            writer.writerow([
                reviewer_names[i],
                review_titles[i] if i < len(review_titles) else "N/A",
                review_texts[i] if i < len(review_texts) else "N/A",
                review_dates[i] if i < len(review_dates) else "N/A",
                review_ratings[i] if i < len(review_ratings) else "N/A",
                review_images[i] if i < len(review_images) else "N/A",
            ])
    print(f"Saved to {csv_file}")
This works on a good day. Amazon notes in its own developer documentation that it changes its page layout regularly.
The class names a-profile-name, review-title, review-text, review-date, and review-rating are not guaranteed to stay stable. When Amazon pushes a frontend update, the selectors return empty lists or misaligned data without any error, and you may not notice until you check the output.
Each selector is a maintenance debt you carry indefinitely.
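You can reduce the misalignment problem, though not the selector maintenance, by parsing each review as a unit instead of building six parallel lists. The sketch below assumes each review is wrapped in an element with data-hook="review". That is an assumption about Amazon's markup, not something verified in this guide. The inner class names are the ones used above, and parse_reviews is a hypothetical helper. With this approach, a missing field becomes None for that one review instead of shifting every later row.

```python
import re

from bs4 import BeautifulSoup

# Matches the "5.0 out of 5 stars" prefix inside review titles.
RATING_PREFIX = re.compile(r"^\s*\d(?:\.\d)?\s+out of 5 stars\s*")


def text_or_none(node, separator=" "):
    """Stripped text of a node, or None when the selector found nothing."""
    return node.get_text(separator=separator).strip() if node else None


def parse_reviews(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    # Assumption: each review is wrapped in an element with data-hook="review".
    for card in soup.select('[data-hook="review"]'):
        title = text_or_none(card.select_one(".review-title"))
        rating = text_or_none(card.select_one("i.review-rating"))
        image = card.select_one("img.review-image-tile")
        reviews.append({
            "reviewer_name": text_or_none(card.select_one("span.a-profile-name")),
            "review_title": RATING_PREFIX.sub("", title) if title else None,
            "review_text": text_or_none(card.select_one("span.review-text"), "\n"),
            "review_date": text_or_none(card.select_one("span.review-date")),
            "review_rating": rating.split(" out of")[0] if rating else None,
            "review_image": image.get("src") if image else None,
        })
    return reviews
```

Because every row is a dict with the same keys, the result can go straight into csv.DictWriter, as in the Spidra export step.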
Wrapping up
You now have two working approaches for scraping Amazon reviews. The BeautifulSoup method works and is worth understanding because it shows how the page is structured.
The Spidra approach removes the selector maintenance entirely — the prompt and schema stay the same regardless of what Amazon changes in its frontend.
Both methods target the featured reviews on the product detail page, which is the publicly accessible review data after the May 2026 changes to the pagination endpoint.
If you want to pull reviews for many products at once, the same schema works in a batch request against multiple /dp/{ASIN} URLs. The batch scraping guide covers how to run up to 50 URLs in parallel.
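Turning a list of ASINs into batches is mostly bookkeeping. The sketch below assumes the batch request accepts the same "urls" list of {"url": ...} objects as the single-page request shown earlier. It uses the 50-URL limit quoted above. url_batches and product_url are hypothetical helpers.

```python
from typing import Iterable, Iterator

MAX_URLS_PER_BATCH = 50  # limit quoted for batch scraping


def product_url(asin: str) -> str:
    """Public product detail page for an ASIN."""
    return f"https://www.amazon.com/dp/{asin}/"


def url_batches(asins: Iterable[str], size: int = MAX_URLS_PER_BATCH) -> Iterator[list[dict]]:
    """Yield lists of {"url": ...} entries, at most `size` per list, skipping duplicate ASINs."""
    batch: list[dict] = []
    seen: set[str] = set()
    for asin in asins:
        asin = asin.strip().upper()
        if not asin or asin in seen:
            continue
        seen.add(asin)
        batch.append({"url": product_url(asin)})
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch
```

Each yielded list can be used as the "urls" value of one request, together with the same prompt and schema.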
For the full picture on scraping Amazon product data including prices, BSR rankings, and search results, see the guide on how to scrape Amazon product data.
