
Web scraping script for a custom website

In this post, I will show you how to write a Selenium script that performs the actions you want, such as repeatedly clicking an element, on a webpage of your choice.

We will need the URL of the target page and a stable internet connection. 

Procedure

Install Selenium (pip install selenium) and import the required modules. The script drives Google Chrome, so Chrome must be installed as well:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

Define the URL of the target website:

url = "https://remarketing.jyskefinans.dk/cars/?maerke=Citro%C3%ABn&model=C3"

Initialize the WebDriver, load the URL, and pause for a few seconds so the page can finish loading:

driver = webdriver.Chrome()
driver.get(url)
time.sleep(2)

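Accept the page's cookie banner before proceeding so that it does not block later clicks. Wrap this step in a try/except block so the script keeps going if the banner does not appear:

try:
    # Find the "accept all" button and click it
    accept_button = driver.find_element(By.CLASS_NAME, "coi-banner__accept")
    accept_button.click()
except Exception:
    # No banner, or it could not be clicked: continue without accepting cookies
    pass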
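The fixed time.sleep(2) above either wastes time or turns out too short on a slow connection. One alternative is Selenium's explicit wait: WebDriverWait polls until a condition holds or a timeout expires. The sketch below assumes Selenium 4. The helper name accept_cookies and the 10-second timeout are illustrative choices and are not part of the original script.

```python
def accept_cookies(driver, class_name="coi-banner__accept", timeout=10):
    """Click the cookie banner's accept button if it becomes clickable within `timeout` seconds.

    Returns True if the button was clicked and False if it never appeared.
    """
    # Imports are local so this helper can be pasted into any script unchanged
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        # Poll until the button is visible and enabled, or give up after `timeout`
        button = WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable((By.CLASS_NAME, class_name))
        )
    except TimeoutException:
        return False
    button.click()
    return True
```

After driver.get(url), a call to accept_cookies(driver) can replace both the sleep and the try/except block. The function returns as soon as the button is ready rather than always waiting the full delay.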

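While coding, inspect the page with your browser's developer tools and find the class name of the container, div, or button you want to target. The following loop clicks that element every five seconds until you stop the script with Ctrl+C:

try:
    while True:
        # Find the div element by its class name
        div_element = driver.find_element(By.CLASS_NAME, "vehicles-export-container")
        # Click the div element
        div_element.click()
        # Wait for 5 seconds before clicking again
        time.sleep(5)
except KeyboardInterrupt:
    # Ctrl+C stops the loop
    pass
finally:
    # Always close the browser, even if an element lookup fails
    driver.quit()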

This loop is the crux of the whole script. Other basic actions, such as double-clicking or dragging, can be achieved by modifying this block.
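Selenium provides ActionChains for actions beyond a simple click. The sketch below wraps a double-click and a drag-and-drop. The helper names double_click and drag_to are hypothetical, and the class names you pass in are placeholders for ones you find by inspecting your own page.

```python
def double_click(driver, class_name):
    """Double-click the first element that has the given class name."""
    # Imports are local so the helper can be pasted into any script unchanged
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By

    element = driver.find_element(By.CLASS_NAME, class_name)
    ActionChains(driver).double_click(element).perform()


def drag_to(driver, source_class, target_class):
    """Drag the element with `source_class` and drop it onto the one with `target_class`."""
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By

    source = driver.find_element(By.CLASS_NAME, source_class)
    target = driver.find_element(By.CLASS_NAME, target_class)
    ActionChains(driver).drag_and_drop(source, target).perform()
```

Inside the while loop, for example, you could replace div_element.click() with double_click(driver, "vehicles-export-container").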

Refer to the Selenium documentation for other ways to locate elements and interact with them:

https://www.selenium.dev/documentation/webdriver/elements/finders/

The final script file would look something like this:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

url = "https://remarketing.jyskefinans.dk/cars/?maerke=Citro%C3%ABn&model=C3"

# Initialize WebDriver
driver = webdriver.Chrome()

# Open the URL
driver.get(url)

# Wait for the cookie popup to appear (assuming it's an overlay)
time.sleep(2)  # Adjust this delay as needed to ensure the popup is fully loaded

try:
    # Find the "accept all" button and click it
    accept_button = driver.find_element(By.CLASS_NAME, "coi-banner__accept")
    accept_button.click()
except Exception:
    # If the accept button is not found or cannot be clicked, continue without accepting cookies
    pass

try:
    while True:
        # Find the div element by its class name
        div_element = driver.find_element(By.CLASS_NAME, "vehicles-export-container")
        # Click the div element
        div_element.click()
        # Wait for 5 seconds before clicking again
        time.sleep(5)
except KeyboardInterrupt:
    # A keyboard interrupt (Ctrl+C) stops the loop
    pass
finally:
    # Always quit the WebDriver, even if an element lookup fails
    driver.quit()

Develop an ATS scanner using Python and gemini-pro-vision

Introduction

This blog is a tutorial on creating a script that works as an ATS (Applicant Tracking System) scanner for resumes. It is implemented with Python and generative AI.

Procedure

It is advised to create a virtual environment for this project since the dependencies might clash with other local dependencies. 

$ python -m venv venv

Then activate the venv. On Windows, run:

$ .\venv\Scripts\activate

On macOS or Linux, run:

$ source venv/bin/activate

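Install the required dependencies. To make things simpler, create a requirements.txt file in the project directory with one package name per line:

streamlit
google-generativeai
python-dotenv
pdf2image

Then install them:

$ pip install -r requirements.txt

pdf2image also needs the Poppler utilities, which are installed separately rather than through pip. For example, use apt install poppler-utils on Debian/Ubuntu or brew install poppler on macOS. On Windows, download the Poppler binaries and add their bin folder to PATH.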

Generate a Gemini API key (for example, in Google AI Studio), then create a .env file in the project directory and paste the key like this:

GOOGLE_API_KEY="paste your API key here"

Create an app.py file in the same project directory and proceed with the following steps.

Importing and loading dependencies/libraries

Add the following libraries:
from dotenv import load_dotenv
load_dotenv()
import base64
import streamlit as st
import os
import io
from PIL import Image
import pdf2image
import google.generativeai as genai
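Configure the client with your Google API key:
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

load_dotenv() loads the variables from the .env file into the environment, and os.getenv() reads the key from there.
Define a function that sends the instruction prompt, the resume image, and the job description to the gemini-pro-vision model and returns its reply:

def get_gemini_response(input_prompt, pdf_content, job_description):
    model = genai.GenerativeModel('gemini-pro-vision')
    # pdf_content is the list returned by input_pdf_setup(); [0] is the first page
    response = model.generate_content([input_prompt, pdf_content[0], job_description])
    return response.text
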
Next, define a function that takes the uploaded PDF, renders its first page as a JPEG image, base64-encodes it, and returns it in a format the gemini-pro-vision model can read.

The vision model works with images, so converting the PDF into an image makes the task easier. Real ATS products each use their own parsing mechanisms. This approach only approximates how such a system might read a resume.

def input_pdf_setup(uploaded_file):
    if uploaded_file is not None:
        ## Convert the PDF to image
        images = pdf2image.convert_from_bytes(uploaded_file.read())
        first_page = images[0]
        # Convert to bytes
        img_byte_arr = io.BytesIO()
        first_page.save(img_byte_arr, format='JPEG')
        img_byte_arr = img_byte_arr.getvalue()
        pdf_parts = [
            {
                "mime_type": "image/jpeg",
                "data": base64.b64encode(img_byte_arr).decode()  # encode to base64
            }
        ]
        return pdf_parts
    else:
        raise FileNotFoundError("No file uploaded")
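input_pdf_setup converts only the first page, and get_gemini_response sends only pdf_content[0]. As a result, the model never sees anything on page two of a resume. The sketch below renders every page instead. The helper names pdf_to_parts and images_to_parts are hypothetical. Like the original function, the sketch assumes that generate_content accepts these mime_type/data dictionaries as image parts.

```python
import base64
import io


def images_to_parts(jpeg_pages):
    """Wrap each page's JPEG bytes as a base64 image part, as input_pdf_setup does for page 1."""
    return [
        {"mime_type": "image/jpeg", "data": base64.b64encode(page).decode()}
        for page in jpeg_pages
    ]


def pdf_to_parts(pdf_bytes):
    """Render every page of a PDF (not just the first) and return one image part per page.

    Requires pdf2image and the Poppler utilities at call time.
    """
    import pdf2image  # only needed when this function actually runs

    pages = []
    for image in pdf2image.convert_from_bytes(pdf_bytes):
        buffer = io.BytesIO()
        image.save(buffer, format="JPEG")
        pages.append(buffer.getvalue())
    return images_to_parts(pages)
```

To use it, call pdf_content = pdf_to_parts(uploaded_file.read()) and send every page with model.generate_content([input_prompt, *pdf_content, job_description]) instead of passing only pdf_content[0]. Each extra page adds input tokens to the request.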
Set up a Streamlit web interface so users can interact with the app

Streamlit is a user-friendly library for building web interfaces that showcase ML/AI models and projects. It saves you the time and hassle of writing a full-fledged website, which could otherwise take days to develop. With Streamlit, the same interface takes just a few lines of code.

st.set_page_config(page_title="ATS Resume Expert")
st.header("ATS Tracking System")
input_text = st.text_area("Job Description: ", key="input")
uploaded_file = st.file_uploader("Upload your resume (PDF)...", type=["pdf"])
if uploaded_file is not None:
    st.write("PDF Uploaded Successfully")
submit1 = st.button("Tell Me About the Resume")
submit3 = st.button("Percentage match")
Set up the input prompts for the model.

This is where all the magic happens. You can play around with the prompts and tweak them in order to get relevant information from the LLM. 

input_prompt1 = """
You are an experienced Technical Human Resource Manager. Your task is to review the provided resume against the job description.
Please share your professional evaluation on whether the candidate's profile aligns with the role.
Highlight the strengths and weaknesses of the applicant in relation to the specified job requirements.
"""

input_prompt3 = """
You are a skilled ATS (Applicant Tracking System) scanner with a deep understanding of data science and ATS functionality.
Your task is to evaluate the resume against the provided job description and give the percentage match between them.
First give the match as a percentage, then list the missing keywords, and finally share your final thoughts.
"""
Add the logic for the action buttons:

if submit1:
    if uploaded_file is not None:
        pdf_content = input_pdf_setup(uploaded_file)
        response = get_gemini_response(input_prompt1, pdf_content, input_text)
        st.subheader("The Response is")
        st.write(response)
    else:
        st.write("Please upload the resume")
elif submit3:
    if uploaded_file is not None:
        pdf_content = input_pdf_setup(uploaded_file)
        response = get_gemini_response(input_prompt3, pdf_content, input_text)
        st.subheader("The Response is")
        st.write(response)
    else:
        st.write("Please upload the resume")
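input_prompt3 asks the model to put the percentage first, but an LLM does not always follow a requested format exactly. To show the score visually, you can use a small parser like the hypothetical extract_match_percentage below. It pulls out the first percentage in the reply and returns None when there is none, so the app can fall back to showing the raw text.

```python
import re


def extract_match_percentage(response_text):
    """Return the first percentage (0-100) in the model's reply as a float, or None."""
    # Matches forms such as "78%", "78 %" or "85.5%"
    match = re.search(r"(\d{1,3}(?:\.\d+)?)\s*%", response_text)
    if match is None:
        return None
    value = float(match.group(1))
    # Reject numbers that cannot be a match percentage
    return value if 0 <= value <= 100 else None
```

For example, in the submit3 branch, set pct = extract_match_percentage(response). When pct is not None, call st.progress(int(pct)) to show the score as a progress bar above the full reply.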

Using a text model instead of the vision model

A text-based model works well when the resume PDF contains a real text layer, for example one generated with LaTeX on Overleaf. In that case, PDF readers can extract the text directly, so there is no need to convert the resume into an image first, and any text-based model can work with it.

In our case, we will use the gemini-pro model.

Modify the following functions. The text version reads the PDF with PyPDF2, so install it (pip install PyPDF2, or add it to requirements.txt) and import it in app.py:

import PyPDF2

def get_gemini_response(input_text, resume_content, job_description):
    model = genai.GenerativeModel('models/gemini-pro')
    response = model.generate_content([input_text, resume_content, job_description])
    return response.text

def extract_text_from_pdf(uploaded_file):
    resume_text = ""
    with uploaded_file:
        pdf_reader = PyPDF2.PdfReader(uploaded_file)
        # Append the text layer of every page; pages with no text contribute ""
        for page in pdf_reader.pages:
            resume_text += page.extract_text() or ""
    return resume_text

def input_pdf_setup(uploaded_file):
    if uploaded_file is not None:
        resume_text = extract_text_from_pdf(uploaded_file)
        paragraphs = resume_text.split('\n\n')  # Split into paragraphs on blank lines
        resume_content = '\n'.join(paragraphs)  # Rejoin the paragraphs into a single string
        return resume_content
    else:
        raise FileNotFoundError("No file uploaded")

With these three functions modified, the script runs on a text-based model, which may be more accurate on resumes that have a real text layer. Unlike the vision version, it also reads every page of the resume, not just the first one. Everything else stays the same, including the API key, which works for both models.
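You can also combine the two variants. Extract the text first, and fall back to the vision model only when the PDF yields almost no text, which is typical of scanned, image-only resumes. The function name pick_model and the 200-character threshold below are hypothetical choices that you would tune.

```python
def pick_model(extracted_text, min_chars=200):
    """Choose a model name based on how much text the resume PDF yielded.

    A scanned (image-only) PDF usually yields little or no text, so it is routed
    to the vision model; otherwise the text model is used.
    """
    if len(extracted_text.strip()) >= min_chars:
        return "gemini-pro"
    return "gemini-pro-vision"
```

extract_text_from_pdf closes the uploaded file through its with block. If the vision path also needs the file, grab the raw bytes with uploaded_file.getvalue() before extracting the text.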


AI showdown: ChatGPT vs. Gemini

Google Gemini and ChatGPT have distinct origins and purposes. Google Gemini is a family of AI models developed by Google DeepMind, Google's AI research lab. It is built for natural language processing and understanding as well as other modalities, and Google is integrating it into various Google services. ChatGPT, on the other hand, is developed by OpenAI, a research organisation dedicated to artificial general intelligence. ChatGPT is a chatbot built on OpenAI's language models, designed to generate human-like text responses and facilitate natural language conversations.

First, it’s worth noting that both Gemini and ChatGPT are based on incredibly vast and powerful large language models (LLMs), far more advanced than anything publicly available in the past.

ChatGPT is just the interface through which users communicate with the language model: GPT-4 for paying subscribers (ChatGPT Plus) or GPT-3.5 for free users.

In Google’s case, the interface is called Gemini (previously Bard), and it’s used to communicate with the language model, which is a separate entity but is also called Gemini (or Gemini Ultra if you’re paying for the Gemini Advanced service).

Something important to take into consideration is that although we call them both chatbots, the intended user experience is slightly different. ChatGPT is designed to enable conversations and help solve problems in a conversational manner – much like chatting with an expert on a subject.

Gemini, on the other hand, seems designed to process information and automate tasks in a way that saves the user time and effort.

One advantage of Gemini is that by default, it considers all of the information at its fingertips – including the internet, Google’s vast knowledge graph, and its training data.

ChatGPT, on the other hand, will often still choose to try and answer a question solely relying on its training data. This can occasionally lead to out-of-date information.

Gemini proves to be slightly more adept than ChatGPT at searching online and integrating the information it finds into its responses.

When ChatGPT does head online to look for information, its responses tend to lose some of their dynamism. It often seems to base its answer on a single web search and a single source rather than conducting a comprehensive analysis of all the information it can access before coming to a conclusion.

ChatGPT (with GPT-4) generates images using the DALL-E model, which was also developed by OpenAI. Gemini, on the other hand, uses Google’s Imagen 2 engine. Both are clearly very powerful and can generate impressive results.

ChatGPT has a user-friendly interface and a straightforward but paid API, making it easy for beginners to start. Its simple text-based input and output format is readily accessible to many users. Gemini, with its advanced capabilities, may need more technical expertise for complex tasks. Developers can also reach Gemini through the Gemini API. However, configuring it for complex multimodal tasks may involve more setup than ChatGPT.

The main difference between ChatGPT and Gemini is one of focus. ChatGPT concentrates on text generation and conversation, excelling in creative writing, translation, and open-ended, informative dialogue. Gemini emphasises multimodality: it was designed from the start to handle text, images, audio, and video.

One notable aspect of ChatGPT is its focus on democratizing access to advanced AI capabilities. Through its user-friendly interface and accessible API, individuals with varying levels of technical expertise can leverage the power of language models for a wide range of applications. Whether it’s assisting with customer support, generating content, or simply engaging in casual conversation, ChatGPT offers a straightforward platform for users to interact with AI.

In contrast, Gemini’s advanced capabilities may require a higher level of technical proficiency to fully leverage. It excels in tasks such as information retrieval and automated processing, but its complexity may present a barrier to entry for some users. For those with the expertise to harness its full potential, Gemini offers strong capabilities for extracting insights and synthesizing information from diverse sources.

Ultimately, the choice between ChatGPT and Gemini depends on the specific needs and objectives of the user. Whether seeking to enhance customer interactions, streamline workflows, or explore the frontiers of AI research, both platforms represent significant milestones in the evolution of natural language understanding and multimodal AI. As AI continues to advance, the convergence of technologies represented by ChatGPT and Gemini promises to reshape how we interact with information and each other in the digital age.

In summary, ChatGPT and Gemini are both advanced language models, but they serve different purposes. ChatGPT focuses on facilitating natural language conversations and text generation tasks, prioritising user-friendly interfaces and conversational interactions. In contrast, Gemini emphasises multimodality, handling text, images, audio, and video seamlessly, with a focus on processing information efficiently. While ChatGPT excels in open-ended dialogue and creative writing, Gemini leverages its ability to integrate various media types and access extensive datasets for comprehensive responses. Ultimately, the choice between them depends on the specific needs and preferences of the user.

A thorough analysis and comparison of both services can be found on WIRED.


RAG Essentials: Fine-tuning and Prompt Engineering

RAG stands for Retrieval-Augmented Generation. It is a technique built on top of LLMs that gives a pre-trained model access to your own knowledge base. When a question comes in, the most relevant pieces of that knowledge base are retrieved and added to the prompt, so the model can answer from them without being retrained.

RAG connects the knowledge base to the LLM through a retrieval step. First, the knowledge base is split into chunks. An embedding model then encodes each chunk into a vector (a numerical representation of its meaning), and these vectors are stored in a vector database. When a user asks a question, the question is embedded in the same way. The chunks whose vectors are most similar to it are retrieved, and their text is passed to the LLM as context. By combining the knowledge base with the LLM’s language abilities, RAG lets users pose complex questions and receive answers grounded in their own data.

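How it works

  1. The knowledge base is split into chunks, and an embedding model converts each chunk into an embedding (a vector).
  2. The embeddings are stored in a vector database.
  3. At query time, the question is embedded too, and the vector database returns the chunks most similar to it.
  4. The retrieved chunks are added to the prompt, and the LLM answers the question using them as context.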
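The four steps above can be sketched without any external services. In this toy version, the "embedding" is just a bag-of-words count vector and the "vector database" is a Python list. Both are stand-ins for a real embedding model and a database such as Pinecone, so the similarity scores reflect shared words, not meaning. The final prompt string is what would be sent to the LLM, and the knowledge-base sentences are made up.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Minimal stand-in for a vector database: stores (chunk, vector) pairs."""

    def __init__(self):
        self.items = []

    def add(self, chunk):
        # Steps 1-2: embed the chunk and store it
        self.items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        # Step 3: embed the query and return the k most similar chunks
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    # Step 4: put the retrieved text into the prompt for the LLM
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


knowledge_base = [
    "Our office is open Monday to Friday, 9am to 5pm.",
    "Refunds are processed within 14 days of the return.",
    "The warehouse is located in Aarhus.",
]
store = InMemoryVectorStore()
for chunk in knowledge_base:
    store.add(chunk)

question = "How long do refunds take?"
top = store.search(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Only the refunds sentence shares a word with the question, so it is the chunk retrieved. A real embedding model would also match paraphrases that share no words.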
RAG apps can be built and customized in many ways, but a popular and relatively easy approach is to use LangChain. For the vector database, Pinecone is one of the best-known options. For embeddings, there are multiple options, such as Hugging Face embedding models or OpenAI’s embedding models. tiktoken, which is often mentioned alongside them, is OpenAI’s tokenizer for counting tokens, not an embedding model. It is also possible to train your own embeddings, but they are usually less expressive than pretrained embedding models.

Fine-tuning


Fine-tuning should not be confused with RAG, although the two are closely related and often used together. Fine-tuning means further training a pre-trained model on your own data, so that its weights, rather than the prompt, carry the behavior or knowledge your use case needs. RAG, by contrast, leaves the model unchanged and supplies knowledge at question time.

One way of fine-tuning an LLM is PEFT, which stands for parameter-efficient fine-tuning. Instead of retraining the full model all over again, PEFT keeps most of the pre-trained weights frozen and trains only a small number of parameters, such as a few selected layers or small added adapter matrices.

Fine-tuning can also complement RAG. Although distinct from RAG, fine-tuning involves retraining the model on data tailored to specific use cases. Parameter-Efficient Fine-Tuning (PEFT) offers a streamlined approach: it trains only a small set of parameters while keeping the rest of the model frozen, rather than retraining the entire model from scratch. This allows for greater flexibility in adapting the LLM to different knowledge bases and user requirements.
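To make the PEFT idea concrete, here is the arithmetic for LoRA (low-rank adaptation), one widely used PEFT method. LoRA keeps a pretrained weight matrix W_0 frozen and learns only a low-rank update BA. The sizes d = k = 4096 and rank r = 8 are illustrative assumptions, not figures from any particular model.

```latex
\begin{aligned}
h &= W_0 x + \Delta W x = W_0 x + B A x,
  \quad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k) \\
N_{\text{full}} &= d\,k = 4096 \cdot 4096 = 16{,}777{,}216 \\
N_{\text{LoRA}} &= r\,(d + k) = 8 \cdot (4096 + 4096) = 65{,}536 \\
N_{\text{LoRA}} / N_{\text{full}} &= 1/256 \approx 0.39\%
\end{aligned}
```

Only A and B receive gradient updates, so this layer trains about 0.39% of the parameters that full fine-tuning would update. This is why PEFT fits on much smaller hardware.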

Prompt Engineering

Prompt engineering is a relatively new term, and it is a very important skill for getting the desired outputs from an LLM. It means designing an input prompt that contains all the required details but no excess information. Such a prompt guides the LLM toward the desired output.
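One way to apply this is to build prompts from a fixed set of parts: the role the model should take, the task, the relevant context, any constraints, and the expected output format. The make_prompt helper below is a hypothetical illustration of that structure, not a standard API, and the example values are made up.

```python
def make_prompt(role, task, context, output_format, constraints=()):
    """Assemble a prompt from the pieces a model usually needs, and nothing else."""
    parts = [f"You are {role}.", f"Task: {task}"]
    if context:
        parts.append(f"Context:\n{context.strip()}")
    if constraints:
        parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    parts.append(f"Output format: {output_format}")
    # Blank lines between sections keep each instruction visually separate
    return "\n\n".join(parts)


prompt = make_prompt(
    role="an experienced technical writer",
    task="Write a 150-word introduction for a blog post about retrieval-augmented generation.",
    context="Audience: developers who know Python but have not used LLM APIs.",
    output_format="one paragraph of plain text, no headings",
    constraints=["avoid marketing language", "define RAG in the first sentence"],
)
print(prompt)
```

Sections that are empty are left out entirely, which keeps the prompt free of excess information.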

Prompt engineering emerges as a key skill in maximizing the effectiveness of RAG models. By crafting carefully designed input prompts, users can guide the LLM to generate desired outputs with precision and efficiency. Whether generating content for blogs, speeches, or refining existing text, a well-engineered prompt ensures optimal performance from the LLM, resulting in more accurate and contextually relevant responses.

This skill is important when you want the LLM to write content for you, e.g. blogs, speeches, or keynotes. It is also important when you want the model to refine or modify your text. A well-engineered prompt gets much better answers from the LLM than a vague one.

In conclusion, RAG represents a groundbreaking advancement in AI-driven natural language processing, offering a powerful framework for integrating external knowledge bases with LLMs. Through a combination of advanced technologies and innovative techniques such as fine-tuning and prompt engineering, RAG empowers users to unlock new possibilities in information retrieval and generation, paving the way for more intelligent and insightful interactions with AI systems.