Selenium Webscraper (Python)
- Shone Pious
- Oct 15, 2023
What is Selenium?
Selenium is an open-source browser automation tool, used to run automated tests on web applications across different browsers.
One drawback is that Selenium can only test web applications, so mobile and desktop apps are not currently supported. However, it runs on Linux, Mac and Windows.
Supported browsers include Safari, Chrome, Opera and Firefox, and tests can be written in a plethora of languages including PHP, Python, Perl, Ruby and Java.
Downloading Chromedriver
First, we have to download either Chrome or Edge; any Chromium-based browser should work.
Then download the webdriver that matches the version of the browser that you have.
To check what your browser version is, go to chrome://version/ in the address bar.

I will download the chromedriver for win64. Just copy and paste the download URL into the address bar and the download will start.

Unzip the folder.

Importing webdriver
Run the following, with the path of your chromedriver executable.
from selenium import webdriver
chromeDriver = "C:/Users/shone/OneDrive/Documents/chromedriver-win64/chromedriver-win64/chromedriver.exe"
driver = webdriver.Chrome(executable_path=chromeDriver)
driver.get('https://www.google.com')
driver.quit()
If you get a TypeError saying that executable_path is an unexpected keyword argument, that means the executable_path argument is no longer accepted, which is a change in Selenium 4.10.0.

To resolve this issue, we must now configure the browser through the options argument.
Since the path I specified wasn't working, I installed the latest Selenium in the terminal, which comes with Selenium Manager.
The new Manager tool can detect the browser version and the driver executable that you need, and will supply the driver automatically. So you can remove the executable_path argument from the code. Type ➡️
pip install selenium

The solution to this bug was found here > https://stackoverflow.com/questions/76550506/typeerror-webdriver-init-got-an-unexpected-keyword-argument-executable-p
Type the following ➡️
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=options)
driver.get('https://www.google.com')
driver.quit()
Running the script now will open google.com for a second and close the window by itself.

Now that we have the webdriver itself working, we can proceed to the webscraping part which utilises the webdriver to retrieve information that we want.
Webscraping Amazon item price
If we want to find the price of an item on Amazon, we can locate it through the CSS classes on the element in the page's HTML.
Open Amazon and find a product. I have found this Ring doorbell.

Highlight the price and click inspect ➡️

Under the driver.get command, you can type
price = driver.find_element_by_class_name()
But this is the deprecated version, and newer Selenium 4 releases no longer include it.
The new version is ➡️
price = driver.find_element(By.CLASS_NAME, "a-price aok-align-center reinventPricePriceToPayMargin priceToPay")
This is the class attribute I found for the price in the Amazon inspect page.
Next, we need to change the URL passed to get to the URL of the product page, stored here in a variable called link ➡️
driver.get(link)

We might get an error with the 'By' part, because it has not been imported yet. Just import By ➡️
from selenium.webdriver.common.by import By

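Running the script throws a NoSuchElementException. By.CLASS_NAME expects a single class name, so a string of several space-separated classes does not match anything.
I experimented with the inspector to find a different class I could use.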
This class allowed me to get the price of the doorbell camera.
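If you would rather keep matching on several classes at once, one option is a CSS selector, since By.CLASS_NAME only takes a single class name. Below is a minimal sketch of that idea. The helper names classes_to_css_selector and find_price_text are made up for illustration, and the sketch assumes the class names contain no characters that would need escaping in CSS.

```python
def classes_to_css_selector(class_attr: str) -> str:
    """Turn an HTML class attribute such as "a b c" into the CSS selector ".a.b.c"."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    return "".join("." + name for name in names)


def find_price_text(driver, class_attr: str) -> str:
    """Return the text of the first element that carries all the given classes."""
    # Imported here so the helper above also works without Selenium installed.
    from selenium.webdriver.common.by import By

    selector = classes_to_css_selector(class_attr)
    return driver.find_element(By.CSS_SELECTOR, selector).text


if __name__ == "__main__":
    # Prints: .a-price.aok-align-center.priceToPay
    print(classes_to_css_selector("a-price aok-align-center priceToPay"))
```

With a live driver, find_price_text(driver, "a-price priceToPay") would look for an element that has both classes. Whether Amazon still uses these class names is something to check in the inspector.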

Note >>
I have commented out the line that says ➡️
options.page_load_strategy = 'none'
as this closed the window too quickly, before the elements had loaded.
I also added
driver.implicitly_wait(10)
as this makes Selenium wait up to 10 seconds for an element to appear before giving up. I needed this much time as I was asked to verify my human identity. I only got the price after I verified myself.
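Waiting for an element comes down to polling: keep checking until the element shows up or a time limit passes. The sketch below shows that idea in plain Python so it is easy to follow. It is not how Selenium implements its waits, and Selenium ships its own WebDriverWait class for explicit waits. The name wait_until and its parameters are made up for this example.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical usage with a live driver and no implicit wait set.
# find_elements returns an empty list (falsy) until a matching element exists:
#   from selenium.webdriver.common.by import By
#   matches = wait_until(lambda: driver.find_elements(By.CSS_SELECTOR, ".a-price"))
#   print(matches[0].text)
```

A longer timeout gives you more time to get through a verification page by hand, at the cost of a slower failure when the element really is missing.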

Automating the scan
Now say we want to check the price of this item every 5 minutes, to see if it has dropped below a certain value.
We can do this by putting our current code into a function, and simply calling it every 5 minutes.
In this example, I will set the sleep time to 5 seconds to demonstrate it.
Put the code that we wrote for the item price into a function called five_seconds().
Then create a while loop that calls the function and sleeps between calls, with True as the loop condition so that the loop runs indefinitely.

Every five seconds, our script will run and print the value of the item. I let it run 3 times before interrupting it.
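To take this one step further and stop when the price drops below a target, the loop could look something like the sketch below. The names parse_price, watch_price, threshold and max_checks are made up for this example. fetch_price_text stands in for whatever function returns the price text from Selenium, and the sketch assumes that text contains the number in one piece, such as "$59.99".

```python
import re
import time


def parse_price(text):
    """Extract a number from price text such as "$59.99" or "$1,299.00"."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group().replace(",", ""))


def watch_price(fetch_price_text, threshold, interval_seconds=300, max_checks=None):
    """Check the price repeatedly; return it once it is below threshold.

    Returns None if max_checks is reached first. With max_checks=None the
    loop keeps running until the price drops or the script is interrupted.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if checks:
            # Sleep between checks, but not before the first one.
            time.sleep(interval_seconds)
        price = parse_price(fetch_price_text())
        print(f"Current price: {price:.2f}")
        checks += 1
        if price < threshold:
            return price
    return None
```

Passing the fetch function in as an argument means the loop can be tried out with a stand-in such as lambda: "$49.99" before pointing it at a real browser session.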
What's next?
To learn about how to hunt for passwords in GitHub repositories using Selenium, check out this blog.