June 26, 2026 · 12 min read

How to scrape Amazon with Python: step-by-step tutorial (2026)

Joel Olawanle

Amazon product data is useful for price monitoring, competitive research, trend tracking, and building product catalogs. The data is publicly visible on every product page. In this tutorial you will build a Python scraper that extracts it.

We will use the Mopchnic Wireless Headset as our target throughout:

https://www.amazon.com/dp/B0G8G4SQXQ

By the end you will have a scraper that extracts the following from a product page:

  • Product title
  • Price
  • Rating and review count
  • Product images
  • Bullet-point features

We will then scale it to scrape search result pages and handle pagination. At the end you will see how to do all of this without writing or maintaining any selectors.

Understanding Amazon's anti-scraping measures

Before writing a line of code, it is worth understanding what you are dealing with. Amazon is one of the most aggressively protected sites on the internet.

  • CAPTCHA challenges appear when Amazon detects automated behaviour — repeated requests from the same IP, missing browser headers, or unusual request timing. They block the scraping process entirely.
  • Rate limiting kicks in when too many requests hit the servers within a short window. Amazon temporarily blocks the IP or surfaces a CAPTCHA as a checkpoint.
  • IP blocking happens at the network layer. Datacenter IP ranges are on a deny list by default. A plain requests call from a cloud server or your home IP will often get a block page rather than a product page, especially under any sustained load.

For a single test request with a good User-Agent you can often get through. At any real scale, you need proxy rotation and browser rendering.
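Because a block can arrive with a 200 status as well as a 503, it helps to check the body too. The helper below is a minimal sketch of that check: `looks_blocked` is a hypothetical name, and the marker strings are assumptions based on commonly reported Amazon CAPTCHA pages, so verify them against the responses you actually receive.

```python
# Marker strings are assumptions, not an official list; adjust them to
# match the block pages you actually see.
BLOCK_MARKERS = (
    "enter the characters you see below",
    "api-services-support@amazon.com",
)

# Status codes that usually mean the request was refused or throttled.
BLOCK_STATUS_CODES = (403, 429, 503)


def looks_blocked(status_code: int, html: str) -> bool:
    """Return True when a response looks like a block or CAPTCHA page."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```

With requests, you would call it as `looks_blocked(response.status_code, response.text)` before handing the HTML to a parser.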

Prerequisites

Make sure you have Python 3.9 or higher. Install the required libraries:

pip install requests beautifulsoup4

We will use requests as the HTTP client and BeautifulSoup to parse the HTML.

Step 1: retrieve the page HTML

Start with a basic request to confirm you can reach the page. Create a file called scraper.py and add the following:

import requests

target_url = "https://www.amazon.com/dp/B0G8G4SQXQ"

response = requests.get(target_url)

if response.status_code != 200:
    print(f"Request failed: {response.status_code}")
else:
    print(response.text[:500])

Run this and you will likely get a 503, a CAPTCHA page, or a "Something went wrong" block page rather than the product. Amazon identifies the missing browser headers and blocks the request before it reaches the application layer.

Adding a User-Agent header, along with Accept-Language and Referer, makes the request look more like a real browser:

import requests

custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.google.com/",
}

target_url = "https://www.amazon.com/dp/B0G8G4SQXQ"

response = requests.get(target_url, headers=custom_headers)

if response.status_code != 200:
    print(f"Request failed: {response.status_code}")
else:
    print("Success")
    print(response.text[:500])

This can get you a 200 for a small number of requests. Once you start scraping at volume, Amazon flags the IP. If you are sending more than a handful of requests, add a proxy:

proxies = {
    "http":  "http://YOUR_PROXY_HOST:PORT",
    "https": "http://YOUR_PROXY_HOST:PORT",
}

response = requests.get(target_url, headers=custom_headers, proxies=proxies)

Free proxy lists are unreliable and short-lived. Residential proxies from a paid provider are the only option that holds up at scale. We will cover the managed approach at the end of this guide.
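If your provider gives you several endpoints rather than a single rotating gateway, you can cycle through them yourself. This is a rough sketch under that assumption: the `PROXY_POOL` entries are placeholders and `proxy_rotation` is a hypothetical helper, not part of requests.

```python
from itertools import cycle

# Placeholder endpoints; replace them with the ones from your provider.
PROXY_POOL = [
    "http://YOUR_PROXY_HOST_1:PORT",
    "http://YOUR_PROXY_HOST_2:PORT",
    "http://YOUR_PROXY_HOST_3:PORT",
]


def proxy_rotation(pool):
    """Yield a requests-style proxies dict for each proxy, looping forever."""
    for proxy in cycle(pool):
        yield {"http": proxy, "https": proxy}


rotation = proxy_rotation(PROXY_POOL)

# Each call to next() hands back the next proxy in the pool, for example:
# response = requests.get(target_url, headers=custom_headers, proxies=next(rotation))
```

Round-robin is the simplest policy. A production setup would usually also drop proxies that keep returning block pages.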

Your complete setup code before any parsing:

import requests

custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.google.com/",
}

target_url = "https://www.amazon.com/dp/B0G8G4SQXQ"

response = requests.get(target_url, headers=custom_headers)

if response.status_code != 200:
    print(f"Request failed: {response.status_code}")
else:
    print("Got the page")

Step 2: scrape the product details

Once you have the HTML, you use BeautifulSoup to find specific elements. The general approach is: open the product page in Chrome, right-click the element you want, select Inspect, find a stable ID or class name, then use soup.find() or soup.select_one() to extract it.

Add BeautifulSoup to your imports and parse the HTML:

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, "html.parser")

Locate and extract the product title

Right-click the product title and select Inspect. The title lives in a <span> tag with the ID productTitle. IDs are the most stable selectors on Amazon because they identify a single element rather than a class shared across many:

title_element = soup.select_one("#productTitle")
title = title_element.text.strip() if title_element else None
print("Title:", title)
Title: Mopchnic Wireless Headset with Noise Cancelling Microphone, On-Ear Bluetooth Headset with USB Dongle, Mute Function & Charging Base

Locate and extract the price

Amazon's price element has changed a few times and now uses multiple container structures depending on whether the product is on sale, has a Prime deal, or is a third-party listing.

The selector #priceblock_ourprice that appears in older tutorials is gone. The reliable approach in 2026 is to try multiple fallbacks:

price = None
for selector in [
    "#corePriceDisplay_desktop_feature_div .a-offscreen",
    "span.priceToPay .a-offscreen",
    ".apexPriceToPay .a-offscreen",
    ".a-price .a-offscreen",
]:
    el = soup.select_one(selector)
    if el:
        price = el.text.strip()
        break

print("Price:", price)
Price: $29.99

The fallback loop is necessary because Amazon uses different price containers depending on the product type, the seller, and which A/B test you land in. A single selector will work until it does not.
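The selectors return the price as a display string such as $29.99. If you want to compare or store prices as numbers, a small parser helps. The sketch below assumes US-style formatting (comma as thousands separator, dot as decimal point); `parse_price` is a hypothetical helper name.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Convert a price string such as '$1,299.99' to a Decimal, or None."""
    if not text:
        return None
    # First run of digits, allowing thousands commas and an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    return Decimal(match.group(0).replace(",", ""))
```

Decimal is used instead of float so that values like 29.99 are stored exactly, which matters when you compute price differences over time.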

Locate and extract the rating

The star rating in 2026 is most reliably in the #acrPopover element, which contains a span with the full "4.5 out of 5 stars" text:

rating_element = soup.select_one("#acrPopover span.a-icon-alt")
rating = rating_element.text.strip() if rating_element else None
print("Rating:", rating)
Rating: 4.5 out of 5 stars
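As with the price, the rating arrives as a sentence rather than a number. A minimal sketch for turning it into a float, assuming the text keeps the "X out of 5 stars" wording shown above (`parse_rating` is a hypothetical helper):

```python
import re


def parse_rating(text):
    """Extract the leading number from '4.5 out of 5 stars' as a float."""
    if not text:
        return None
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of\s+\d+", text)
    return float(match.group(1)) if match else None
```

Returning None for unexpected text, rather than raising, keeps one odd listing from stopping a longer scrape.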

Locate and extract the review count

The review count sits in a <span> with the ID acrCustomerReviewText:

review_count_element = soup.select_one("#acrCustomerReviewText")
review_count = review_count_element.text.strip() if review_count_element else None
print("Reviews:", review_count)
Reviews: 5,000 ratings

If you need a number rather than a formatted string for storage or comparison, parse it:

import re

# Keep only the digits; fall back to 0 when the element is missing or has none
review_count_raw = review_count_element.text if review_count_element else ""
digits = re.sub(r"\D", "", review_count_raw)
review_count_num = int(digits) if digits else 0

Locate and extract the main product image

The main product image has a stable ID of landingImage. Extract the src attribute:

image_element = soup.select_one("#landingImage")
main_image = image_element.get("src") if image_element else None
print("Image:", main_image)
Image: https://m.media-amazon.com/images/I/61+j+lJ6eJL._AC_SL1500_.jpg
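The src attribute gives you one rendition of the image. On many product pages the same element also carries a data-a-dynamic-image attribute holding a JSON map of image URLs to [width, height] pairs. That attribute is an assumption here, not something shown above, so confirm it in DevTools first. If it is present, a sketch like this picks the largest rendition (`largest_image` is a hypothetical helper):

```python
import json


def largest_image(dynamic_image_json):
    """Pick the URL with the largest pixel area from a URL -> [width, height] map."""
    if not dynamic_image_json:
        return None
    try:
        sizes = json.loads(dynamic_image_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(sizes, dict) or not sizes:
        return None
    # Compare by width * height and return the matching URL (the dict key).
    return max(sizes, key=lambda url: sizes[url][0] * sizes[url][1])
```

You would call it as `largest_image(image_element.get("data-a-dynamic-image"))` and fall back to the src value when it returns None.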

Locate and extract the bullet-point features

The "About this item" bullet points live inside #feature-bullets. Each point is a <li> containing a <span> with the text:

feature_bullets = []
bullets = soup.select("#feature-bullets .a-list-item")
for bullet in bullets:
    text = bullet.text.strip()
    if text:
        feature_bullets.append(text)

print("Features:", feature_bullets[:2])
Features: ['ENC Noise Cancelling & Clear Calls: This wireless headset blocks most distracting background noise...', 'Dual-Pairing & Wide Compatibility: Bluetooth 5.3 supports connection to two devices at once...']

Step 3: complete product scraper code

import requests
import re
from bs4 import BeautifulSoup

custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.google.com/",
}

target_url = "https://www.amazon.com/dp/B0G8G4SQXQ"

response = requests.get(target_url, headers=custom_headers)

if response.status_code != 200:
    print(f"Request failed: {response.status_code}")
else:
    soup = BeautifulSoup(response.text, "html.parser")

    # Title
    title_el = soup.select_one("#productTitle")
    title = title_el.text.strip() if title_el else None

    # Price — try multiple selectors, Amazon changes these regularly
    price = None
    for selector in [
        "#corePriceDisplay_desktop_feature_div .a-offscreen",
        "span.priceToPay .a-offscreen",
        ".apexPriceToPay .a-offscreen",
        ".a-price .a-offscreen",
    ]:
        el = soup.select_one(selector)
        if el:
            price = el.text.strip()
            break

    # Rating
    rating_el = soup.select_one("#acrPopover span.a-icon-alt")
    rating = rating_el.text.strip() if rating_el else None

    # Review count (digits only; 0 when the element is missing or has no digits)
    review_count_el = soup.select_one("#acrCustomerReviewText")
    review_count_raw = review_count_el.text if review_count_el else ""
    review_count_digits = re.sub(r"\D", "", review_count_raw)
    review_count = int(review_count_digits) if review_count_digits else 0

    # Main image
    image_el = soup.select_one("#landingImage")
    main_image = image_el.get("src") if image_el else None

    # Feature bullets
    feature_bullets = [
        b.text.strip()
        for b in soup.select("#feature-bullets .a-list-item")
        if b.text.strip()
    ]

    data = {
        "title":        title,
        "price":        price,
        "rating":       rating,
        "review_count": review_count,
        "main_image":   main_image,
        "features":     feature_bullets,
    }

    print(data)

Output:

{
    'title':        'Mopchnic Wireless Headset with Noise Cancelling Microphone...',
    'price':        '$29.99',
    'rating':       '4.5 out of 5 stars',
    'review_count': 5000,
    'main_image':   'https://m.media-amazon.com/images/I/61+j+lJ6eJL._AC_SL1500_.jpg',
    'features':     ['ENC Noise Cancelling & Clear Calls...', 'Dual-Pairing & Wide Compatibility...', ...]
}

Step 4: export to CSV

import csv

csv_file = "product.csv"

with open(csv_file, mode="w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Price", "Rating", "Review Count", "Main Image"])
    writer.writerow([
        data["title"],
        data["price"],
        data["rating"],
        data["review_count"],
        data["main_image"],
    ])

print(f"Saved to {csv_file}")
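The export above writes a single product and leaves out the feature bullets. Once you scrape more than one page, a dictionary-based writer is easier to extend. The following is one possible sketch: `write_products` and `FIELDS` are hypothetical names, and the " | " separator for the bullets is an arbitrary choice.

```python
import csv
import io

FIELDS = ["title", "price", "rating", "review_count", "main_image", "features"]


def write_products(products, fh):
    """Write product dicts to an open text file handle as CSV rows."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for product in products:
        row = dict(product)
        # Flatten the feature list into a single cell.
        row["features"] = " | ".join(product.get("features") or [])
        writer.writerow(row)


# Usage with an in-memory buffer. For a real file, pass
# open("products.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_products(
    [{"title": "Example headset", "price": "$29.99", "features": ["A", "B"]}],
    buffer,
)
print(buffer.getvalue())
```

Missing keys are written as empty cells, so a product whose rating or image could not be extracted still produces a valid row.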

Scraping Amazon search results

A single product page is a good start, but most use cases require collecting data across many products. Amazon search results pages list 20-25 products per page and are the starting point for any bulk extraction pipeline.

The target URL for a keyword search follows a consistent pattern:

https://www.amazon.com/s?k=wireless+headphones
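Rather than typing the query string by hand, you can let the standard library encode the keyword. This is a small sketch: `build_search_url` is a hypothetical helper, and the optional page parameter is an assumption about Amazon's URL format that this guide does not rely on elsewhere.

```python
from urllib.parse import urlencode


def build_search_url(keyword, page=1):
    """Build an Amazon search URL for a keyword and optional page number."""
    params = {"k": keyword}
    if page > 1:
        # The "page" query parameter is an assumption; check it in your browser.
        params["page"] = page
    return "https://www.amazon.com/s?" + urlencode(params)
```

Calling `build_search_url("wireless headphones")` reproduces the URL shown above, with the space encoded as a plus sign.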

Inspect any product listing. The product link sits in an <a> tag inside an <h2> element within a container that has a data-asin attribute. That attribute is the most reliable anchor because it uniquely identifies each product card:

import requests
from bs4 import BeautifulSoup

custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.google.com/",
}

base_url   = "https://www.amazon.com"
target_url = "https://www.amazon.com/s?k=wireless+headphones"

response = requests.get(target_url, headers=custom_headers)

if response.status_code != 200:
    print(f"Request failed: {response.status_code}")
else:
    soup = BeautifulSoup(response.text, "html.parser")

    product_links = []
    for link in soup.select("[data-asin] h2 a"):
        href = link.get("href", "")
        if href and "/dp/" in href:
            full_url = base_url + href if not href.startswith("https") else href
            product_links.append(full_url)

    print(f"Found {len(product_links)} products on this page")
    for url in product_links[:5]:
        print(url)

The [data-asin] h2 a selector finds every product title link by looking for anchor tags inside <h2> elements that live within a container with a data-asin attribute. This is more reliable than class-based selectors on Amazon search pages because data-asin is used structurally rather than for styling.
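Search result links usually carry a product slug and tracking parameters, so the same product can show up under several different URLs. One way to handle that is to reduce every link to its ASIN. The sketch below assumes ASINs are 10 uppercase alphanumeric characters following /dp/; both function names are hypothetical.

```python
import re

# Assumption: an ASIN is 10 uppercase letters or digits directly after "/dp/".
ASIN_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")


def canonical_product_url(href):
    """Reduce any product link to https://www.amazon.com/dp/<ASIN>, or None."""
    match = ASIN_PATTERN.search(href)
    return f"https://www.amazon.com/dp/{match.group(1)}" if match else None


def unique_product_urls(hrefs):
    """Canonicalize links and drop duplicates while keeping first-seen order."""
    seen = {}
    for href in hrefs:
        url = canonical_product_url(href)
        if url:
            seen.setdefault(url, None)  # dicts preserve insertion order
    return list(seen)
```

Canonical URLs also make the CSV output stable between runs, because tracking parameters change from request to request.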

Handling pagination

Amazon breaks search results across multiple pages. The Next button at the bottom of the listing has a stable class of s-pagination-next. When there are no more pages, that element disappears from the DOM.

Wrap the scraper in a loop that follows the Next link until it is gone:

import requests, csv, time
from bs4 import BeautifulSoup

custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.google.com/",
}

base_url   = "https://www.amazon.com"
target_url = "https://www.amazon.com/s?k=wireless+headphones"
all_links  = []

while True:
    response = requests.get(target_url, headers=custom_headers)

    if response.status_code != 200:
        print(f"Request failed: {response.status_code}")
        break

    soup = BeautifulSoup(response.text, "html.parser")

    for link in soup.select("[data-asin] h2 a"):
        href = link.get("href", "")
        if href and "/dp/" in href:
            full_url = base_url + href if not href.startswith("https") else href
            if full_url not in all_links:
                all_links.append(full_url)

    print(f"Page scraped — {len(all_links)} total links so far")

    # Find the next page link
    next_page = soup.select_one("a.s-pagination-next")
    if not next_page:
        print("No more pages")
        break

    next_href = next_page.get("href", "")
    if not next_href:
        break

    target_url = base_url + next_href if not next_href.startswith("https") else next_href

    # Pause between requests to avoid rate limiting
    time.sleep(3)

# Save all links to CSV
with open("product_links.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Product URL"])
    for url in all_links:
        writer.writerow([url])

print(f"Saved {len(all_links)} links to product_links.csv")

This crawls all pages for the keyword, stops when the Next button disappears, and saves the collected links to a CSV.

The challenges you will run into at scale

Getting blocked

The User-Agent header and 3-second delays help, but they are not enough for sustained scraping. Amazon's bot detection goes beyond headers — it looks at TLS fingerprints, browser behaviour signals, and IP reputation.

Datacenter IPs get blocked almost immediately under volume. Residential proxies with rotation are the baseline requirement for reliable Amazon scraping.
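When a block does happen, retrying immediately tends to make things worse. A common pattern is exponential backoff with a little random jitter. The sketch below is one way to write it. The function names, delay values, and the injected `fetch` and `is_blocked` callables are illustrative assumptions rather than anything Amazon documents.

```python
import random
import time


def backoff_delays(max_retries=4, base=2.0, cap=60.0):
    """Return exponentially growing delays in seconds, capped at `cap`."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def fetch_with_backoff(fetch, is_blocked, max_retries=4, sleep=time.sleep):
    """Call fetch() until is_blocked(response) is False or retries run out."""
    response = fetch()
    for delay in backoff_delays(max_retries):
        if not is_blocked(response):
            return response
        # Up to one second of jitter so repeated retries do not line up.
        sleep(delay + random.uniform(0, 1))
        response = fetch()
    return None if is_blocked(response) else response


# Example wiring with requests (not executed here):
# page = fetch_with_backoff(
#     fetch=lambda: requests.get(target_url, headers=custom_headers),
#     is_blocked=lambda r: r.status_code != 200,
# )
```

Backoff only spaces out retries from the same IP. It complements proxy rotation and does not replace it.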

Selectors breaking without warning

Amazon updates its frontend continuously. The most famous casualty is #priceblock_ourprice — it appeared in hundreds of scraping tutorials for years before Amazon removed it. Any code depending on that ID silently returned nothing. The price selector in this guide already uses multiple fallbacks because a single selector is not sustainable. Amazon does not announce these changes and does not treat your selectors as a contract.

The further implication is that every selector in this tutorial may need updating by the time you read it. Check each element in DevTools before assuming the selector will work.
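Since a broken selector fails silently, it is worth checking each scraped record for empty fields and logging the gaps. A minimal sketch, assuming the `data` dictionary from Step 3 (`missing_fields` is a hypothetical helper, and the default list of required keys is a choice you should adapt):

```python
def missing_fields(data, required=("title", "price", "rating", "main_image")):
    """Return the required keys whose values are None, empty, or absent."""
    return [key for key in required if not data.get(key)]


# Example: a record where the price selector stopped matching.
record = {"title": "Example headset", "price": None, "rating": "4.5 out of 5 stars"}
problems = missing_fields(record)
if problems:
    print("Selectors may need updating for:", problems)
```

If the same field starts coming back empty across many products, that is a strong hint the selector broke rather than the listing simply lacking the data.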

The easier approach: Spidra

Instead of managing selectors, proxies, and CAPTCHA handling separately, Spidra handles all three.

You describe what you want from the page and define an output schema. Spidra loads the page in a real browser, routes through residential proxies automatically, and returns structured JSON matching your schema regardless of how Amazon's HTML is structured.

Sign up free at app.spidra.io — 300 credits, no card required.

import requests, time, os

API_KEY = os.environ["SPIDRA_API_KEY"]
BASE    = "https://api.spidra.io/api"
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}

PRODUCT_SCHEMA = {
    "type": "object",
    "required": ["title", "asin", "price", "availability"],
    "properties": {
        "title":               {"type": "string"},
        "brand":               {"type": ["string", "null"]},
        "asin":                {"type": "string"},
        "price":               {"type": ["number", "null"]},
        "original_price":      {"type": ["number", "null"]},
        "currency":            {"type": ["string", "null"]},
        "discount_percentage": {"type": ["integer", "null"]},
        "availability":        {"type": "string"},
        "rating":              {"type": ["number", "null"]},
        "review_count":        {"type": ["integer", "null"]},
        "features":            {"type": "array", "items": {"type": "string"}},
        "images":              {"type": "array", "items": {"type": "string"}},
        "seller":              {"type": ["string", "null"]},
        "prime":               {"type": ["boolean", "null"]},
        "bsr_rank":            {"type": ["integer", "null"]},
        "bsr_category":        {"type": ["string", "null"]},
    }
}

resp = requests.post(f"{BASE}/scrape", headers=HEADERS, json={
    "urls": [{"url": "https://www.amazon.com/dp/B0G8G4SQXQ"}],
    "prompt": "Extract the full product details",
    "output": "json",
    "useProxy": True,
    "proxyCountry": "us",
    "schema": PRODUCT_SCHEMA,
})
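resp.raise_for_status()
job_id = resp.json()["jobId"]

# Poll until the job completes, giving up after about three minutes
for _ in range(60):
    result = requests.get(f"{BASE}/scrape/{job_id}", headers=HEADERS).json()
    if result["status"] == "completed":
        break
    time.sleep(3)
else:
    raise TimeoutError(f"Job {job_id} did not complete in time")

print(result["result"]["content"])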

Output from the actual request:

{
  "title": "Mopchnic Wireless Headset with Noise Cancelling Microphone",
  "brand": "Mopchnic",
  "asin": "B0G8G4SQXQ",
  "price": 29.99,
  "original_price": 46.99,
  "currency": "$",
  "discount_percentage": 36,
  "availability": "In Stock",
  "rating": 4.5,
  "review_count": 5000,
  "features": [
    "ENC Noise Cancelling & Clear Calls",
    "Dual-Pairing & Wide Compatibility",
    "Extended Battery & Easy Charging",
    "Crystal Stereo Sound for Calls & Music",
    "Ultra Comfortable & All-Day Wear"
  ],
  "images": [
    "https://m.media-amazon.com/images/I/61+j+lJ6eJL._AC_SL1500_.jpg",
    "https://m.media-amazon.com/images/I/61c+5p-3pNL._AC_SL1500_.jpg"
  ],
  "seller": "Mopchnic Official Store",
  "prime": true,
  "bsr_rank": 15,
  "bsr_category": "Electronics > Headphones > Over-Ear Headphones"
}

No selector maintenance. No proxy setup. No CAPTCHA handling. The schema definition replaces all of it, and when Amazon changes its HTML the prompt keeps working because it describes the data, not its location in the DOM.

For scraping many products in parallel, the batch scraping endpoint processes up to 50 ASINs in a single request. For collecting ASINs from search results first, see the Amazon product data guide which covers the full search-to-PDP pipeline.
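The request body for the batch endpoint is not shown in this guide, so the sketch below only covers the part that does not depend on it: splitting a longer list of ASINs into groups that respect the 50-item limit. `chunked` is a hypothetical helper.

```python
def chunked(items, size=50):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Example: 120 placeholder ASINs become groups of 50, 50, and 20.
asins = [f"ASIN{n:06d}" for n in range(120)]
groups = chunked(asins)
print([len(group) for group in groups])
```

Each group can then be sent as one batch request, following the payload format in the Spidra documentation.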

Conclusion

You have now seen how to scrape Amazon product pages and search results with Python's Requests and BeautifulSoup. Here is a summary of what this guide covered:

  • Getting the full HTML of an Amazon product page, including handling User-Agents and the blocking problem
  • Extracting product title, price, rating, review count, images, and features field by field
  • Exporting scraped data to a CSV file
  • Collecting product links from search result pages
  • Handling pagination to crawl across multiple pages
  • The maintenance challenges that come with selector-based Amazon scraping
  • Using Spidra as the alternative that removes selector and proxy management entirely

Scraping Amazon at any real scale is genuinely difficult. The BeautifulSoup approach works for small, occasional scrapes, but if you are building a monitoring pipeline or collecting data regularly, managing the infrastructure adds up quickly.

Frequently asked questions

Amazon removed the #priceblock_ourprice element in 2023. It appeared in many tutorials written before that and still appears in older guides. The working price selectors in 2026 are #corePriceDisplay_desktop_feature_div .a-offscreen with fallbacks to span.priceToPay .a-offscreen and .a-price .a-offscreen.
