Develop an ATS scanner using Python and gemini-pro-vision

Introduction

This blog is a tutorial on building a script that serves as an ATS (Applicant Tracking System) scanner for resumes, implemented with Python and generative AI.

Procedure

It is advisable to create a virtual environment for this project, since its dependencies might clash with packages already installed on your system.

$ python -m venv venv

Then activate the venv using the following command (on Windows):

$ .\venv\Scripts\activate
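
On macOS or Linux, activate it with:

$ source venv/bin/activate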

 

Install the required dependencies. To make things simpler, create a requirements.txt file in the project directory and list the following package names in it, one per line:

  1. streamlit
  2. google-generativeai
  3. python-dotenv
  4. pdf2image
  5. PyPDF2

Note that pdf2image depends on Poppler, which is a system utility rather than a pip package; install it separately as shown after the pip command below. PyPDF2 is used later for the text-based model. Then install the Python packages:
$ pip install -r requirements.txt
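
Poppler can be installed with your system's package manager; for example, on a Debian-based Linux, on macOS with Homebrew, or in a conda environment respectively:

$ sudo apt-get install poppler-utils
$ brew install poppler
$ conda install -c conda-forge poppler

On Windows, download the Poppler binaries and add their bin folder to your PATH.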

Generate an API key from Google AI Studio. Create a .env file in the project directory and paste the key into it like this:

GOOGLE_API_KEY="paste your api key here"

Create an app.py file in the same project directory and proceed with the following steps.

Importing and loading dependencies/libraries

Add the following libraries:
from dotenv import load_dotenv
load_dotenv()
import base64
import streamlit as st
import os
import io
from PIL import Image
import pdf2image
import google.generativeai as genai
Configure the Gemini client with your Google API key:
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

This reads your API key from the .env file and makes it available to the google-generativeai library.
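
If you prefer the app to fail fast when the key is missing (for example, because the .env file was not created), you can replace the configure call above with a guarded version. A minimal sketch:

api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    # Stop early with a clear message instead of erroring on the first API call
    raise RuntimeError("GOOGLE_API_KEY not found; check that the .env file exists and was loaded")
genai.configure(api_key=api_key)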
Define a function to get a response from the gemini-pro-vision model.

 

def get_gemini_response(input, pdf_content, prompt):
    # Send the prompt, the resume image, and the job description to the vision model
    model = genai.GenerativeModel('gemini-pro-vision')
    response = model.generate_content([input, pdf_content[0], prompt])
    return response.text
Next, define a function that takes the uploaded PDF, converts its first page into a base64-encoded image, and returns it in a format the gemini-pro-vision model can read and interact with.

 

The vision model works with images, so we convert our PDF into an image to keep things simple. Real ATS systems use different mechanisms, but this is a reasonable approximation of an industry-grade ATS scanner.

def input_pdf_setup(uploaded_file):
    if uploaded_file is not None:
        # Convert the first page of the PDF to an image
        images = pdf2image.convert_from_bytes(uploaded_file.read())
        first_page = images[0]

        # Convert the image to JPEG bytes
        img_byte_arr = io.BytesIO()
        first_page.save(img_byte_arr, format='JPEG')
        img_byte_arr = img_byte_arr.getvalue()

        # Package the image as a base64-encoded part for the Gemini API
        pdf_parts = [
            {
                "mime_type": "image/jpeg",
                "data": base64.b64encode(img_byte_arr).decode()  # encode to base64
            }
        ]
        return pdf_parts
    else:
        raise FileNotFoundError("No file uploaded")
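
You can sanity-check this function outside Streamlit by passing it any object with a read() method, such as an open file handle. A minimal sketch, assuming a hypothetical local file named sample_resume.pdf:

with open("sample_resume.pdf", "rb") as f:  # hypothetical local file for testing
    parts = input_pdf_setup(f)
    print(parts[0]["mime_type"])   # image/jpeg
    print(len(parts[0]["data"]))   # length of the base64-encoded image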
Set up the Streamlit web interface for our app to interact with users

 

Streamlit is a user-friendly library that helps users create a web interface to showcase their ML/AI models and projects. It saves them the time and hassle of writing a full-fledged website that would take days to develop otherwise. Streamlit does it in just a few lines of code.

st.set_page_config(page_title="ATS Resume Expert")
st.header("ATS Tracking System")
input_text = st.text_area("Job Description: ", key="input")
uploaded_file = st.file_uploader("Upload your resume (PDF)...", type=["pdf"])

if uploaded_file is not None:
    st.write("PDF Uploaded Successfully")

submit1 = st.button("Tell Me About the Resume")
submit3 = st.button("Percentage match")
Set up input prompts for the model.

This is where all the magic happens. You can play around with the prompts and tweak them in order to get relevant information from the LLM. 

input_prompt1 = """
You are an experienced Technical Human Resource Manager. Your task is to review the provided resume against the job description.
Please share your professional evaluation on whether the candidate's profile aligns with the role.
Highlight the strengths and weaknesses of the applicant in relation to the specified job requirements.
"""

input_prompt3 = """
You are a skilled ATS (Applicant Tracking System) scanner with a deep understanding of data science and ATS functionality.
Your task is to evaluate the resume against the provided job description. Give me the percentage match between the resume and
the job description. The output should list the match percentage first, then the missing keywords, and finally your overall thoughts.
"""
Add the action buttons
if submit1:
    if uploaded_file is not None:
        pdf_content = input_pdf_setup(uploaded_file)
        response = get_gemini_response(input_prompt1, pdf_content, input_text)
        st.subheader("The Response is")
        st.write(response)
    else:
        st.write("Please upload the resume")
elif submit3:
    if uploaded_file is not None:
        pdf_content = input_pdf_setup(uploaded_file)
        response = get_gemini_response(input_prompt3, pdf_content, input_text)
        st.subheader("The Response is")
        st.write(response)
    else:
        st.write("Please upload the resume")

Use a text model instead of the vision model

A text-based model works well when the resume is written in LaTeX (for example, using Overleaf). In such a scenario, the text is easily extracted by PDF readers and there is no need to convert the document into an image; any text-based model can work with it directly.

In our case, we will use the gemini-pro model.

 

Modify the following functions:

def get_gemini_response(input_text, resume_content, job_description):
    # Use the text-only Gemini model for plain-text resumes
    model = genai.GenerativeModel('models/gemini-pro')
    response = model.generate_content([input_text, resume_content, job_description])
    return response.text
This function uses PyPDF2, so add import PyPDF2 to the imports at the top of app.py (PyPDF2 is included in the requirements above).

def extract_text_from_pdf(uploaded_file):
    resume_text = ""
    with uploaded_file:
        # Read every page of the PDF and concatenate its text
        pdf_reader = PyPDF2.PdfReader(uploaded_file)
        num_pages = len(pdf_reader.pages)
        for page_number in range(num_pages):
            page = pdf_reader.pages[page_number]
            resume_text += page.extract_text()
    return resume_text
def input_pdf_setup(uploaded_file):
    if uploaded_file is not None:
        resume_text = extract_text_from_pdf(uploaded_file)
        paragraphs = resume_text.split('\n\n')  # split on double newlines
        resume_content = '\n'.join(paragraphs)  # join the paragraphs into a single string
        return resume_content
    else:
        raise FileNotFoundError("No file uploaded")

Modifying these three functions ensures that your script runs on the text-based model and gives more accurate results on text-formatted resumes. Everything else stays the same; even the API key is shared by both models.
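
As a quick way to verify the text pipeline outside Streamlit, you can feed it a local PDF directly. A minimal sketch, assuming a hypothetical file sample_resume.pdf and an example job description string:

with open("sample_resume.pdf", "rb") as f:  # hypothetical local file for testing
    resume_content = input_pdf_setup(f)

job_description = "We are looking for a Python developer with experience in NLP."  # example text
print(get_gemini_response(input_prompt3, resume_content, job_description))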

