In this post I will show you how to create a simple Python script that notifies you whenever a new element appears at a given URL. All you need is the URL of the target website and an internet connection.
We will use Beautiful Soup for this project.
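Before starting, install the three third-party packages the script relies on (these are the usual PyPI package names):

```shell
pip install requests beautifulsoup4 schedule
```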
import requests
from bs4 import BeautifulSoup
import schedule
import time

def extract_new_elements(url):
    global previous_elements
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
        current_elements = set(soup.find('a', class_='vehicles-export')['data-vehicles'].split(','))
        new_elements = current_elements - previous_elements
        if new_elements:
            print("New elements found:", new_elements)
        else:
            print("No new elements found.")
        # Remember what we have seen so the same elements are not reported again
        previous_elements = current_elements
    else:
        print("Failed to retrieve data from the URL.")
Importing Libraries: The function depends on the requests library to make HTTP requests and the BeautifulSoup class from the bs4 module to parse HTML content.
HTTP Request: It sends an HTTP GET request to the URL provided as the url parameter.
Response Handling: It checks whether the response status code is 200, which indicates that the request was successful and the webpage is available.
Parsing HTML: If the response is successful, the function uses BeautifulSoup to parse the HTML content of the webpage.
Finding Current Elements: It then finds the <a> tag whose class attribute is 'vehicles-export'. This tag contains data about vehicles: the function extracts the value of its 'data-vehicles' attribute and splits it into a set of elements. These elements represent the vehicles currently listed on the webpage.
Finding New Elements: It computes the set difference between the current_elements set and a set named previous_elements. This subtraction identifies any elements that are in current_elements but not in previous_elements, effectively finding elements added since the last time the function was called.
Printing Results: If new elements are found, it prints a message along with the new elements themselves. If no new elements are found, it prints a message stating so.
Error Handling: If the HTTP request fails (e.g., due to a connection issue or an invalid URL), it prints an error message.
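The set subtraction at the heart of the function is plain Python set difference; here is a quick standalone example (the vehicle ids are made up for illustration):

```python
# Elements seen on the previous check
previous_elements = {"vw-golf", "audi-a4", "bmw-320"}

# Elements found on the current check: one car was sold, one new arrival
current_elements = {"vw-golf", "bmw-320", "tesla-model3"}

# Set difference keeps only items in current_elements that are not in previous_elements
new_elements = current_elements - previous_elements
print(new_elements)  # {'tesla-model3'}
```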
def job():
print("Checking for new elements...")
extract_new_elements(url)
job() is a small wrapper function: whenever job is executed, it prints a status message and then calls extract_new_elements(url). Wrapping the call in a zero-argument function makes it easy to hand to the scheduler, which we will see in a later part of the script.
previous_elements = set()

response = requests.get(url)
if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')
    previous_elements = set(soup.find('a', class_='vehicles-export')['data-vehicles'].split(','))
else:
    print("Failed to retrieve initial data from the URL.")
This code fetches the page once at startup using the requests.get() method. If the HTTP response status code is 200 (indicating success), it parses the HTML content with BeautifulSoup, searches for the <a> tag with the class attribute set to 'vehicles-export', and extracts the value of its 'data-vehicles' attribute. That value is split into elements and stored in a set named previous_elements, which serves as the baseline for all later comparisons.
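One fragile spot worth noting: soup.find() returns None when no matching tag exists, so indexing its result with ['data-vehicles'] would raise a TypeError on an unexpected page. Below is a sketch of a safer extraction; the sample HTML strings are made up for illustration:

```python
from bs4 import BeautifulSoup

def extract_elements(html):
    """Return the set of vehicle ids, or an empty set if the tag is missing."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("a", class_="vehicles-export")
    if tag is None or not tag.get("data-vehicles"):
        return set()
    return set(tag["data-vehicles"].split(","))

# Page containing the expected tag
good_html = '<a class="vehicles-export" data-vehicles="car1,car2">Export</a>'
print(extract_elements(good_html))  # {'car1', 'car2'}

# Page without the tag: no crash, just an empty set
print(extract_elements("<p>maintenance page</p>"))  # set()
```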
schedule.every(15).seconds.do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
This is the triggering snippet of code that starts looking for new elements on the given URL: job runs every 15 seconds. The loop runs until you stop it manually, either by pressing Ctrl+C in the terminal or by closing the terminal window.
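If you would rather not add the schedule dependency, the same polling behaviour can be sketched with nothing but the standard library. Here, check_for_new_elements is a hypothetical stand-in for the job function above, and max_checks exists only so the demo terminates:

```python
import time

def check_for_new_elements():
    print("Checking for new elements...")

def run_polling_loop(interval_seconds, max_checks=None):
    """Call the check function every interval_seconds until interrupted."""
    checks = 0
    try:
        while max_checks is None or checks < max_checks:
            check_for_new_elements()
            checks += 1
            time.sleep(interval_seconds)
    except KeyboardInterrupt:
        # Ctrl+C lands here, letting the script exit cleanly
        print("Stopped by user.")
    return checks

# Run three quick checks for demonstration
print(run_polling_loop(0.01, max_checks=3))  # 3
```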
import requests
from bs4 import BeautifulSoup
import schedule
import time

# Function to extract new elements from the URL
def extract_new_elements(url):
    global previous_elements
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
        current_elements = set(soup.find('a', class_='vehicles-export')['data-vehicles'].split(','))
        new_elements = current_elements - previous_elements
        if new_elements:
            print("New elements found:", new_elements)
            # Save new elements in a variable or perform any other desired action
            # For example:
            # new_elements_variable = list(new_elements)
        else:
            print("No new elements found.")
        # Remember the current state so each element is only reported once
        previous_elements = current_elements
    else:
        print("Failed to retrieve data from the URL.")

# Function to be scheduled to run every 15 seconds
def job():
    print("Checking for new elements...")
    extract_new_elements(url)

# URL to scrape
url = "https://remarketing.jyskefinans.dk/cars/"

# Variable to store previously seen elements
previous_elements = set()

# Initial extraction of elements
response = requests.get(url)
if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')
    previous_elements = set(soup.find('a', class_='vehicles-export')['data-vehicles'].split(','))
else:
    print("Failed to retrieve initial data from the URL.")

# Schedule job to run every 15 seconds
schedule.every(15).seconds.do(job)

# Run the scheduler
while True:
    schedule.run_pending()
    time.sleep(1)
The official documentation of the Beautiful Soup project can be found here:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/