June 26, 2026 · 9 min read

How to scrape Amazon with Selenium: step-by-step tutorial (2026)

Joel Olawanle

Selenium is a popular choice for scraping Amazon because it runs a real browser. That means JavaScript renders fully, page elements load as they would for a human visitor, and you can interact with dynamic content. For a site as JavaScript-heavy as Amazon, that matters.

This tutorial walks through scraping an Amazon product page with Selenium in Python, including how to handle each data field and what to watch out for when things get more complex.

We will use the Logitech G502 Hero gaming mouse as our target:

https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/

By the end you will have a Selenium scraper that extracts the following from a product page:

  • Product name
  • Price
  • Feature bullet points
  • Star rating
  • Main product image

Step 1: prerequisites

You need Python 3.9 or higher. Install Selenium and the WebDriver Manager, which handles Chrome driver installation automatically:

pip install selenium webdriver-manager

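WebDriver Manager checks your installed Chrome version, downloads the matching ChromeDriver, and caches it, so later runs reuse the cached driver instead of downloading it again. On Selenium 4.6 or later, the bundled Selenium Manager can also resolve the driver by itself when you call webdriver.Chrome() without a service argument; this tutorial sticks with webdriver-manager.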

Step 2: access the Amazon page

Start by opening the product page in a visible browser window to confirm your Selenium setup is working. Once that works, we will switch to headless mode.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

target_url = "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"
driver.get(target_url)

driver.quit()

Running this opens a Chrome window, loads the product page, and closes. If you see the product page, Selenium is working.

A visible browser window is fine for development, but it adds memory overhead and is impractical on servers or for longer scraping runs. Switch to headless mode by adding Chrome options:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")

driver = webdriver.Chrome(
    options=options,
    service=Service(ChromeDriverManager().install())
)

target_url = "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"
driver.get(target_url)

driver.quit()

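The --headless=new flag selects Chrome's current headless mode. In older Chrome releases, plain --headless (without the =new suffix) launched the legacy headless implementation, which has since been deprecated; passing =new makes the choice explicit.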
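Because a headless browser has no window, you cannot eyeball whether the product page actually loaded. A quick check is to look for signs of Amazon's robot-check page in the HTML. The helper below, looks_like_captcha, is hypothetical, and its marker strings are assumptions about what Amazon's CAPTCHA page has typically contained, so confirm them against a blocked response of your own. Pass it driver.page_source after driver.get().

```python
# Hypothetical markers: assumptions about Amazon's robot-check page, not a
# documented contract. Check them against a blocked response you have captured.
CAPTCHA_MARKERS = (
    "/errors/validateCaptcha",
    "Enter the characters you see below",
    "make sure you're not a robot",
)


def looks_like_captcha(page_source):
    """Return True if the HTML looks like Amazon's robot-check page."""
    html = page_source.lower()
    # Case-insensitive substring match against each marker
    return any(marker.lower() in html for marker in CAPTCHA_MARKERS)


# Usage after driver.get(target_url):
#     if looks_like_captcha(driver.page_source):
#         raise RuntimeError("Amazon served a CAPTCHA instead of the product page")
```

Failing loudly here is cheaper than letting every selector in the next step raise NoSuchElementException one by one.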

Step 3: scrape the product details

With the page loaded in the browser, you use Selenium's find_element methods to locate specific elements by their CSS selectors or IDs. Add the By class to your imports:

from selenium.webdriver.common.by import By

Amazon changes its markup regularly, so selectors can stop working without notice. The ones below were verified in 2026; if any of them stop returning data, inspect the page in DevTools and update them.
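One way to soften that breakage is to keep a short, ordered list of candidate selectors per field and use the first one that matches. The helper below, find_first, is a hypothetical sketch. It relies on the fact that Selenium's find_elements returns an empty list instead of raising when nothing matches. The strings "id" and "css selector" are the values of By.ID and By.CSS_SELECTOR, so passing By constants works the same way.

```python
def find_first(driver, candidates):
    """Return the first element matched by any (by, value) pair, or None.

    `candidates` is an ordered list such as
    [("id", "productTitle"), ("css selector", "h1 span")]; the strings are the
    values of selenium's By.ID and By.CSS_SELECTOR constants.
    """
    for by, value in candidates:
        # find_elements returns [] instead of raising NoSuchElementException
        matches = driver.find_elements(by, value)
        if matches:
            return matches[0]
    return None


# Usage (the second selector is a hypothetical fallback):
#     title_el = find_first(driver, [(By.ID, "productTitle"),
#                                    (By.CSS_SELECTOR, "h1 span")])
```

Returning None rather than raising lets you decide per field whether a miss is fatal.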

Locate and extract the product name

Right-click the product title and select Inspect. The title text is in a <span> with the ID productTitle, nested inside an <h1> element. Extract it with By.ID:

name_element = driver.find_element(By.ID, "productTitle")
product_name = name_element.text.strip()
print("Name:", product_name)
Name: Logitech G502 HERO High Performance Wired Gaming Mouse, HERO 25K Sensor, 25,600 DPI, RGB, Adjustable Weights, 11 Programmable Buttons, On-Board Memory, PC / Mac

Locate and extract the price

Amazon's price markup involves multiple nested span elements, and the structure varies depending on whether the product has a sale price, a Prime deal, or a third-party seller. Calling find_element with the a-offscreen class alone can return the wrong element or an empty string, because that class appears in many places on the page.

JavaScript's querySelector via Selenium's execute_script gives more control. Target the price inside its immediate parent container:

price_element = driver.execute_script(
    'return document.querySelector(".a-price.a-text-price span.a-offscreen")'
)
price = driver.execute_script("return arguments[0].textContent", price_element)
print("Price:", price)
Price: $79.99

A note on execute_script: arguments[0] refers to the first value passed after the script string, here price_element. The a-offscreen span is visually hidden, and Selenium's text property returns only rendered, visible text, so reading price_element.text typically gives an empty string. Reading the DOM's textContent property avoids that; price_element.get_property("textContent") is an equivalent way to do it without a second script call.
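If you prefer a single round trip, the lookup and the read can be folded into one script that returns null when nothing matches, which avoids a JavaScript error on pages where the selector finds no element. The sketch below also converts the text into a Decimal. Both function names are hypothetical, and the regex assumes US-style prices such as $79.99 or $1,299.00. Depending on the page layout, .a-text-price can also mark a struck-through list price rather than the price you pay, so confirm in DevTools which element you are reading.

```python
import re
from decimal import Decimal

# One script: find the element and read its textContent, or return null if absent.
PRICE_JS = """
const el = document.querySelector(arguments[0]);
return el ? el.textContent.trim() : null;
"""


def read_price_text(driver, selector=".a-price.a-text-price span.a-offscreen"):
    """Return the raw price text (e.g. '$79.99'), or None if nothing matches."""
    return driver.execute_script(PRICE_JS, selector)


def parse_price(text):
    """Turn a US-style price string like '$1,299.00' into Decimal('1299.00')."""
    if not text:
        return None
    # First run of digits, allowing thousands commas and an optional decimal part
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


# Usage:
#     price = parse_price(read_price_text(driver))
```

Decimal avoids the rounding surprises that float can introduce when you later compare or sum prices.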

Locate and extract the feature bullets

The "About this item" bullet points are <li> tags inside an unordered list within #feature-bullets. Find the list, collect all its <li> children, and read the <span> inside each one:

from selenium.common.exceptions import NoSuchElementException

description_list = driver.find_element(
    By.CSS_SELECTOR,
    "#feature-bullets ul.a-unordered-list.a-vertical.a-spacing-mini"
)
description_items = description_list.find_elements(By.TAG_NAME, "li")

description_data = []
for item in description_items:
    try:
        text = item.find_element(By.TAG_NAME, "span").text.strip()
        if text:
            description_data.append(text)
    except NoSuchElementException:
        # This <li> has no <span>; skip it instead of stopping the loop
        pass

print("Features:", description_data[:2])
Features: [
    'HERO 25K sensor: Next-gen HERO 25K gaming sensor with 100 - 25,600 max DPI sensitivity...',
    'Adjustable weight system: 5 x 3.6g removable weights let you customize the balance...'
]

The try/except around each find_element call prevents the whole loop from stopping if a single <li> does not contain a <span>.
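An alternative that needs no exception handling uses find_elements, which returns an empty list when an <li> has no <span>. This is a sketch with a hypothetical function name; it takes the description_items you already collected.

```python
def bullet_texts(list_items):
    """Collect the stripped text of the first <span> in each <li>, skipping blanks."""
    texts = []
    for item in list_items:
        # "tag name" is the value of selenium's By.TAG_NAME
        spans = item.find_elements("tag name", "span")
        if spans:
            text = spans[0].text.strip()
            if text:
                texts.append(text)
    return texts


# Usage: description_data = bullet_texts(description_items)
```

Both versions skip the same items; this one just makes the "no span" case an ordinary branch instead of an exception.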

Locate and extract the star rating

The rating score sits inside a <span> within the element with the ID acrPopover. Reading that element's text property returns the visible score:

rating_element = driver.find_element(By.ID, "acrPopover")
rating = rating_element.text.strip()
print("Rating:", rating)
Rating: 4.7
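The rating comes back as a string. If you want a number for sorting or filtering, a small parser that pulls the first number out of strings such as 4.7 or 4.7 out of 5 stars is enough. The function name is hypothetical.

```python
import re


def parse_rating(text):
    """Extract the numeric score from '4.7' or '4.7 out of 5 stars'; None if absent."""
    if not text:
        return None
    # First integer or decimal in the string
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


# Usage: rating_value = parse_rating(rating)
```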

Locate and extract the main product image

The main product image is an <img> tag inside a <div> with the ID imgTagWrapperId. Extract the src attribute to get the image URL:

image_div = driver.find_element(By.ID, "imgTagWrapperId")
product_image = image_div.find_element(By.TAG_NAME, "img")
product_image_url = product_image.get_attribute("src")
print("Image:", product_image_url)
Image: https://m.media-amazon.com/images/I/61mpMH5TzkL._AC_SY355_.jpg
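The src value is usually a reduced-size rendition; the _AC_SY355_ part of the URL above appears to be a size token. On many product pages the <img> also carries a data-a-dynamic-image attribute holding JSON that maps each available image URL to its [width, height]. Treat that attribute name and format as an assumption about Amazon's current markup. The hypothetical helper below picks the largest entry and falls back to src when the attribute is missing or unreadable.

```python
import json


def largest_image_url(dynamic_image_json, fallback_src=None):
    """Pick the URL with the largest width*height from a data-a-dynamic-image value.

    Assumes the attribute holds JSON like {"https://...jpg": [width, height], ...}.
    Returns fallback_src if the attribute is empty or unparseable.
    """
    if not dynamic_image_json:
        return fallback_src
    try:
        sizes = json.loads(dynamic_image_json)
    except json.JSONDecodeError:
        return fallback_src
    if not sizes:
        return fallback_src
    # Iterating a dict yields its keys (the URLs); rank them by pixel area
    return max(sizes, key=lambda url: sizes[url][0] * sizes[url][1])


# Usage with the element from above:
#     product_image_url = largest_image_url(
#         product_image.get_attribute("data-a-dynamic-image"),
#         fallback_src=product_image.get_attribute("src"),
#     )
```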

Complete product scraper

Here's the final code:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")

driver = webdriver.Chrome(
    options=options,
    service=Service(ChromeDriverManager().install())
)

target_url = "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"

try:
    driver.get(target_url)

    # Product name
    name_element = driver.find_element(By.ID, "productTitle")
    product_name = name_element.text.strip()

    # Price via JavaScript querySelector (the span is visually hidden, so read textContent)
    price_element = driver.execute_script(
        'return document.querySelector(".a-price.a-text-price span.a-offscreen")'
    )
    price = driver.execute_script("return arguments[0].textContent", price_element)

    # Feature bullets, scoped to the "About this item" section
    description_list = driver.find_element(
        By.CSS_SELECTOR, "#feature-bullets ul.a-unordered-list.a-vertical.a-spacing-mini"
    )
    description_items = description_list.find_elements(By.TAG_NAME, "li")
    description_data = []
    for item in description_items:
        try:
            text = item.find_element(By.TAG_NAME, "span").text.strip()
            if text:
                description_data.append(text)
        except NoSuchElementException:
            pass  # skip <li> items without a <span>

    # Rating
    rating_element = driver.find_element(By.ID, "acrPopover")
    rating = rating_element.text.strip()

    # Main image
    image_div = driver.find_element(By.ID, "imgTagWrapperId")
    product_image = image_div.find_element(By.TAG_NAME, "img")
    product_image_url = product_image.get_attribute("src")

    data = {
        "Name":        product_name,
        "Price":       price,
        "Description": description_data,
        "Rating":      rating,
        "Image":       product_image_url,
    }

    print(data)
finally:
    # Close Chrome even if one of the selectors above fails
    driver.quit()

Output:

{
    'Name':        'Logitech G502 HERO High Performance Wired Gaming Mouse...',
    'Price':       '$79.99',
    'Description': ['HERO 25K sensor: Next-gen HERO 25K gaming sensor...', ...],
    'Rating':      '4.7',
    'Image':       'https://m.media-amazon.com/images/I/61mpMH5TzkL._AC_SY355_.jpg'
}

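Step 4: export to CSV

With the data dict built, write it to a CSV file using Python's built-in csv module:

import csv

# Reuses the data dict from the complete scraper; the browser is already closed
csv_file = "product.csv"

with open(csv_file, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(data.keys())    # header row
    writer.writerow(data.values())  # values row

print(f"Data saved to {csv_file}")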

The Description field is written as the string form of a Python list, all in a single cell. If you need each bullet on its own row, iterate over description_data and write a separate row per item.
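Here is a sketch of the one-row-per-bullet layout. It repeats the other fields on every row and puts a single bullet in a Feature column. The column name and function name are hypothetical, and data is expected to have the same keys as the scraper's dict.

```python
import csv


def write_rows_per_feature(data, path):
    """Write one CSV row per feature bullet, repeating the other fields on each row."""
    scalar_keys = [key for key in data if key != "Description"]
    # A product with no bullets still gets one row, with an empty Feature cell
    features = data.get("Description") or [""]
    with open(path, mode="w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(scalar_keys + ["Feature"])
        for feature in features:
            writer.writerow([data[key] for key in scalar_keys] + [feature])


# Usage: write_rows_per_feature(data, "product_features.csv")
```

This "long" layout is easier to filter in a spreadsheet; the trade-off is that name, price, and the other fields repeat on every row.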

Challenges of scraping Amazon with Selenium

Selenium works, but it comes with limitations that matter at scale.

Getting blocked

Headless Chrome is detectable. When Chrome is driven through WebDriver, it sets the navigator.webdriver property to true by default, and Amazon's bot detection checks for it. It can also look for missing browser APIs, unusual timing signatures, and other fingerprinting signals that headless browsers expose. Setting a realistic User-Agent helps for a small number of requests. For sustained volume it is not enough: Amazon's WAF can still block headless browsers at the IP and fingerprint layer, even with evasion attempts.
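To see what your own session exposes, you can read a couple of these signals from inside the page. The hypothetical helper below returns whether navigator.webdriver is true and whether the User-Agent advertises HeadlessChrome, which headless Chrome has typically included by default. It only reports signals; it does not hide them. If you want to try a custom User-Agent, Chrome accepts it as a command-line switch, for example options.add_argument("--user-agent=...").

```python
# JavaScript run in the page; execute_script converts the returned object to a dict
SIGNALS_JS = """
return {
    webdriver: navigator.webdriver === true,
    userAgent: navigator.userAgent
};
"""


def automation_signals(driver):
    """Report two easy-to-spot automation signals for the current session."""
    raw = driver.execute_script(SIGNALS_JS)
    return {
        "navigator_webdriver": bool(raw["webdriver"]),
        "headless_user_agent": "HeadlessChrome" in raw["userAgent"],
    }


# Usage after driver.get(...):
#     print(automation_signals(driver))
```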

Performance overhead

Selenium runs a full Chrome instance for every browser session. A full browser loads CSS, fonts, scripts, and images for every page even when you only need a few text values. On a machine running multiple concurrent sessions this adds up quickly. Scraping 100 product pages serially with Selenium takes significantly longer than it needs to because of this overhead.

A few ways to reduce it:

  • Block unnecessary resource types (images, fonts, CSS) using the Chrome DevTools Protocol
  • Run multiple browser instances in parallel using a thread pool
  • Set a page-load timeout with driver.set_page_load_timeout() so driver.get() does not wait indefinitely on slow pages
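A sketch of the first and third ideas follows. It uses execute_cdp_cmd, which Chromium-based Selenium drivers provide, to call the DevTools Network.setBlockedURLs command (marked experimental in the protocol), plus set_page_load_timeout so driver.get() raises TimeoutException instead of waiting indefinitely. The function name, URL patterns, and 30-second value are assumptions to tune. CSS is left out of the default patterns; add "*.css" only after checking that your selectors still work without it.

```python
# Hypothetical defaults: wildcard patterns for images and fonts. Tune for your pages.
BLOCKED_PATTERNS = ("*.jpg", "*.jpeg", "*.png", "*.gif", "*.webp", "*.woff", "*.woff2")


def trim_page_weight(driver, patterns=BLOCKED_PATTERNS, page_load_timeout=30):
    """Block heavy resources via CDP and cap how long driver.get() may take."""
    # Enable the Network domain first, as common setBlockedURLs examples do
    driver.execute_cdp_cmd("Network.enable", {})
    driver.execute_cdp_cmd("Network.setBlockedURLs", {"urls": list(patterns)})
    # After this many seconds, driver.get() raises TimeoutException
    driver.set_page_load_timeout(page_load_timeout)


# Usage: call once after creating the driver and before driver.get(target_url).
```

Blocked images do not affect the image step: the <img> element and its src attribute are still in the DOM even when the file itself is never downloaded.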

Even with these optimisations, Selenium is a heavier tool than most Amazon scraping jobs need.

Selectors breaking without warning

Amazon updates its DOM regularly. The ul.a-unordered-list.a-vertical.a-spacing-mini class chain for the feature bullets, the acrPopover ID for ratings, the imgTagWrapperId ID for images: any of these can change in an Amazon frontend deployment. Sometimes the scraper then fails with a NoSuchElementException; worse, a selector can keep matching a different element and quietly return wrong or empty data. At scale, silent failures like that are harder to catch than exceptions.
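A cheap guard against silent failures is to check the finished record before saving it and fail loudly when a field comes back empty. The sketch assumes the data dict from the complete scraper; the function name and the choice of which fields count as required are hypothetical.

```python
def missing_fields(record, required=("Name", "Price", "Description", "Rating", "Image")):
    """Return the required keys whose values are missing, None, or empty."""
    missing = []
    for key in required:
        value = record.get(key)
        # Treat whitespace-only strings as empty
        if isinstance(value, str):
            value = value.strip()
        if not value:
            missing.append(key)
    return missing


# Usage:
#     problems = missing_fields(data)
#     if problems:
#         raise ValueError(f"Selectors returned nothing for: {problems}")
```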

The simpler approach: using the Spidra API

All three challenges above come from the same root problem: your code is tightly coupled to Amazon's current HTML. When Amazon changes its HTML, your selectors break. When your IP gets flagged, you need a new one. When your browser gets fingerprinted, you need a different fingerprint.

The Spidra API removes this coupling entirely. You describe the data you want in a prompt and define the output shape in a schema. Spidra loads the page in a real browser with residential proxy routing and CAPTCHA handling, extracts the data using AI, and returns structured JSON matching your schema.

When Amazon changes its HTML, the prompt keeps working because it describes what the data means, not where it sits in the DOM.

import requests, time, os

API_KEY = os.environ["SPIDRA_API_KEY"]
BASE    = "https://api.spidra.io/api"
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}

PRODUCT_SCHEMA = {
    "type": "object",
    "required": ["title", "price", "rating", "availability"],
    "properties": {
        "title":        {"type": "string"},
        "brand":        {"type": ["string", "null"]},
        "asin":         {"type": "string"},
        "price":        {"type": ["number", "null"]},
        "currency":     {"type": ["string", "null"]},
        "rating":       {"type": ["number", "null"]},
        "review_count": {"type": ["integer", "null"]},
        "availability": {"type": "string"},
        "features":     {"type": "array", "items": {"type": "string"}},
        "images":       {"type": "array", "items": {"type": "string"}},
        "seller":       {"type": ["string", "null"]},
        "prime":        {"type": ["boolean", "null"]},
        "bsr_rank":     {"type": ["integer", "null"]},
        "bsr_category": {"type": ["string", "null"]},
    }
}

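resp = requests.post(f"{BASE}/scrape", headers=HEADERS, timeout=30, json={
    "urls": [{"url": "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"}],
    "prompt": "Extract the full product details",
    "output": "json",
    "useProxy": True,
    "proxyCountry": "us",
    "schema": PRODUCT_SCHEMA,
})
resp.raise_for_status()  # stop early on auth or request errors
job_id = resp.json()["jobId"]

# Poll the job, but give up instead of looping forever
deadline = time.time() + 120  # seconds; raise this for larger jobs
while True:
    poll = requests.get(f"{BASE}/scrape/{job_id}", headers=HEADERS, timeout=30)
    poll.raise_for_status()
    result = poll.json()
    if result["status"] == "completed":
        break
    if time.time() > deadline:
        raise TimeoutError(f"Job {job_id} is still {result['status']!r} after 120 seconds")
    time.sleep(3)

print(result["result"]["content"])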

No ChromeDriver to manage. No browser instance to spin up. No selector to maintain. No proxy to configure separately.

For scraping many products at once, the batch scraping endpoint processes up to 50 ASINs in parallel per request. For the full pipeline from search results to product pages, see the Amazon product data guide.
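Splitting a longer ASIN list into groups of at most 50 is plain Python. The sketch below builds product URLs in the short amazon.com/dp/<ASIN> form and chunks them; it deliberately stops short of calling the batch endpoint, since that endpoint's request format is not covered here. Both function names are hypothetical.

```python
def product_urls(asins):
    """Build short product URLs of the form https://www.amazon.com/dp/<ASIN>."""
    return [f"https://www.amazon.com/dp/{asin}" for asin in asins]


def chunked(items, size=50):
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Usage: batches = chunked(product_urls(asins))
```

For example, 120 ASINs produce three batches of 50, 50, and 20 URLs, so three requests instead of one.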

Get your API key at app.spidra.io. The free plan gives you 300 credits with no card required.

Conclusion

You now know how to scrape Amazon product pages with Selenium in Python. Here is a recap:

  • Setting up Selenium with ChromeDriver in headless mode
  • Extracting product name, price, feature bullets, rating, and image field by field
  • Exporting scraped data to a CSV file
  • The main challenges of Selenium-based Amazon scraping: blocking, performance, and selector maintenance

For occasional scraping where you want full browser control, Selenium gets the job done. For anything that runs on a schedule or scrapes at volume, the infrastructure overhead and blocking problem make a managed API the more practical option.

Frequently asked questions

Why does Amazon block Selenium scrapers?

Browsers driven by Selenium expose the navigator.webdriver property, which Amazon's bot detection checks for. Selenium sessions also show timing signatures and missing browser APIs that differ from a real user's session. Headers and User-Agents help for small numbers of requests but do not prevent blocking at scale.
