UAT (User Acceptance Testing) with Python and Excel
Automating user acceptance testing with Python.
While working on projects, there are usually data warehouse table outputs, and these tables need to be tested before the project is completed. A table can contain various types of data: report output, grouped data, customer lists, log data.
Since the environment is a data warehouse, the conventional method (besides manually checking by eye) is SQL queries, which is what I use for my tests. Over the past couple of years, I have started to use Python to automate and speed up this process.
Prerequisites:
Any machine with Python installed (desktop, server, service, etc.). I am currently using Jupyter Notebook on the Anaconda distribution.
Any IDE (Integrated Development Environment) that can execute code would suffice (Spyder, Jupyter Notebook, JupyterLab, IDLE, etc.).
A database connection (I am using Oracle for this example).
MS Office or any program that can open the .xlsx format. This is not mandatory; it is only needed to open the file and examine the test results.
Windows operating system (maybe this is obvious, but I wrote the code on Windows; if you are using Linux or macOS, it might need some tweaking).
1. Creating SQL queries
Regardless of whether you are in the IT department or a business user, this process is mostly the same. Once the queries are written, they can be executed as many times as you want. Every test case you think you need should be written in this step, and I mean every test case (sample tests: data type checks, null checks, trim checks, data checks, record counts, comparing record counts against another table, cross-checks with lookup tables, amount controls, etc.).
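As a sketch of what such a test query can look like, here is a hypothetical null check saved as QUERY_0001.sql (the table and column names are made up for illustration; the :MY_DATE bind variable follows the convention the Python code in this article expects). It is shown as a Python string so you can see exactly what the code will read from the file:

```python
# Hypothetical null-check UAT query (contents of QUERY_0001.sql).
# Table and column names are illustrative. The query returns only
# invalid rows, so any returned row means the test failed.
sql_query = """
SELECT CUSTOMER_ID, LOAD_DATE
FROM DWH.CUSTOMER
WHERE LOAD_DATE = TO_DATE(:MY_DATE, 'DD.MM.YYYY')
  AND CUSTOMER_NAME IS NULL
"""
```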
2. File Locations
Actually, this is up to you. Different projects can use different folders; it doesn't matter as long as you can access those folders with read/write permission. Frankly, I prefer to keep my common queries together, separated by what they do, and project-based queries in their own project folders. If you are using a Python server environment or the equivalent, the same permissions are required.
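As a sketch of the layout assumed in the rest of this article (QUERIES for the .sql files and QUERY_RESULTS for the exported .xlsx files; adjust the names to your own setup), the working directory can be prepared like this:

```python
import os

base = os.getcwd()  # project working directory

# QUERIES holds the .sql test files; QUERY_RESULTS receives the exported .xlsx files
for folder in ("QUERIES", "QUERY_RESULTS"):
    os.makedirs(os.path.join(base, folder), exist_ok=True)
```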
3. Preparing Excel Template File
This is the file that Python reads to execute the SQL queries. Here are my columns and their explanations (User Acceptance Testing Template.xlsx):
Id: Row Id (Just to identify the record, nothing more)
Group: The group of queries (This is important and very useful. While executing queries based on your need, you might want to filter some groups and execute only them.)
SubGroup: The subgroup of the query (a subgroup of Group; it can be used for the same filtering purposes)
QueryName: This is the query name. I prefer naming them in the "QUERY_0001" format, without the extension. Extensions are handled in the Python code because I use the same name for the export file name.
Folder: This is the folder where your SQL query files are kept. It can be any folder as long as you have the necessary permissions. (It will be left empty for this example; queries are read from the predefined QUERIES folder in the present working directory, but you can set this to a different folder for each query.)
ExportFile: With the current template, we don't need this. But if you want to define exported file names from this cell value, it can be done. (It will be left empty for this example.)
Connection: This column shows which database connection is required to execute the query. These are predefined aliases; every alias is mapped to a different database connection. While executing, the Python code searches for this alias in Oracle Connections.xlsx, matches the corresponding database connection, and executes the query using it. There is no need to write, define, or update the database connection string for every file/code.
Active Flag: Shows whether the query is active and should be executed or not.
Frequency: Frankly, this is optional. You can leave it blank, or you can add options like every day, every month, every Monday, etc. The code can filter the records based on these parameters and choose whether to execute them. (It will be left empty for this example.)
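Once the template is filled in, reading and filtering it with pandas is straightforward. A minimal sketch, using an in-memory DataFrame as a stand-in for the real .xlsx file (the group names and aliases here are made up):

```python
import pandas as pd

# In-memory stand-in for User Acceptance Testing Template.xlsx;
# values are illustrative only
uat_file_df = pd.DataFrame({
    "Id": [1, 2],
    "Group": ["GROUP_1", "GROUP_2"],
    "SubGroup": ["SUB_A", "SUB_B"],
    "QueryName": ["QUERY_0001", "QUERY_0002"],
    "Connection": ["DWH_PROD", "DWH_TEST"],
    "Active Flag": [1, 0],
})

# Keep only the active rows, exactly as the main loop does
active_df = uat_file_df[uat_file_df["Active Flag"] == 1]
print(list(active_df["QueryName"]))  # → ['QUERY_0001']
```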
4. Database Connections:
This is the file where the database connections are defined. Since I only use Oracle databases, the Python code is configured accordingly. The Oracle Connections.xlsx file contains 2 columns:
Alias: This is the alias column we search while executing the query. This value should match the Connection column in User Acceptance Testing Template.xlsx.
ConnectionInfo: This is the corresponding value of the found alias. (Database connection string)
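The lookup itself is a simple match on the Alias column. A hedged sketch with an in-memory stand-in for Oracle Connections.xlsx (the aliases and connection strings are made up; `find_connection_info` is an illustrative helper, not part of the article's modules):

```python
import pandas as pd

# Stand-in for Oracle Connections.xlsx; aliases and strings are examples
connections_df = pd.DataFrame({
    "Alias": ["DWH_PROD", "DWH_TEST"],
    "ConnectionInfo": ["user/pass@prod_host:1521/PRODDB",
                       "user/pass@test_host:1521/TESTDB"],
})

def find_connection_info(alias):
    # Returns the connection string for the given alias, or raises if missing
    match = connections_df.loc[connections_df["Alias"] == alias, "ConnectionInfo"]
    if match.empty:
        raise ValueError(f"Alias not found: {alias}")
    return match.iloc[0]

print(find_connection_info("DWH_TEST"))  # → user/pass@test_host:1521/TESTDB
```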
5. Python and Excel Integration
# Importing libraries
import python_to_oracle as pto
import LoopDates
import pandas as pd
import os
import datetime as dt
import calendar
from datetime import datetime
from datetime import date, timedelta

# Basic variables
my_pwd = os.getcwd()  # Get the current working directory

# User Parameter Section
# Which file is used for query mapping?
uat_file = "User Acceptance Testing Template.xlsx"  # It should be in the same folder as this code. Name it whatever you want, as long as it is defined here.

# Export folder for query results
export_folder = os.path.join(my_pwd, "QUERY_RESULTS")  # You should create a QUERY_RESULTS folder

# UAT date range (uat_start_date and uat_end_date are included)
uat_start_date = datetime(2024, 1, 24)  # YYYY, MM, DD format
uat_end_date = datetime(2024, 3, 26)  # YYYY, MM, DD format

# UAT frequency
# If it is "daily", the code runs for every day between uat_start_date and uat_end_date
# If it is "monthly", the code runs for every end of month between uat_start_date and uat_end_date
uat_frequency = "daily"

# Based on uat_frequency, creates the proper dates between uat_start_date and uat_end_date
date_loop_results = LoopDates.date_range(uat_start_date, uat_end_date, uat_frequency)

possible_encodings = ["utf-8", "iso-8859-9", "cp1254"]

# Other functions to define
# Finding the current date time
def find_now():
    return dt.datetime.now()

# Finding the current date time. It is used in the exported file name
def find_file_name_datetime():
    return dt.datetime.now().strftime("%Y%m%d_%H%M%S")

# Mark the start time
start_time = find_now()

# Read the UAT excel
uat_file_df = pd.read_excel(uat_file)

# Loop through the rows using iterrows()
for index, row in uat_file_df.iterrows():
    if row["Active Flag"] == 1:  # Skips the query if it is inactive
        # Creates the query link
        query_link = os.path.join(my_pwd, "QUERIES", row["QueryName"] + ".sql")
        # Opens the file with the proper encoding
        # (read_file_with_encodings is defined separately; see the Notes section)
        file_content, file_encoding = read_file_with_encodings(query_link, possible_encodings)
        # Creates the SQL query
        sql_query = file_content
        # Establishes the Oracle connection with the proper database alias
        oracle_connection = pto.connect_db(row["Connection"])
        # Opens a cursor
        cursor = oracle_connection.cursor()
        # Loops through the predefined dynamic dates
        for uat_date in date_loop_results:
            v_my_date = uat_date.strftime("%d.%m.%Y")  # Used in the query
            v_my_date_f = uat_date.strftime("%Y%m%d")  # Used for the file name
            # This is the important part. Every query contains a bind variable
            # on the relevant date column: TO_DATE(:MY_DATE, 'DD.MM.YYYY')
            # Date formats should match in both Python and Oracle
            result_df = pd.read_sql_query(sql_query, oracle_connection, params={"MY_DATE": v_my_date})
            result_count = len(result_df)
            # If the query returns any row, export it. Remember, these are UAT queries,
            # so we are looking for invalid records. Queries should be written from this perspective.
            if result_count > 0:
                # Export file path (the .xlsx extension is added here)
                export_file_xlsx = os.path.join(export_folder, row["QueryName"] + "_" + v_my_date_f + "_" + find_file_name_datetime() + ".xlsx")
                # Exporting step
                with pd.ExcelWriter(export_file_xlsx, date_format="dd.mm.yyyy", datetime_format="dd.mm.yyyy") as writer:
                    result_df.to_excel(writer, sheet_name="DATA", startrow=0, index=False, header=True)
        cursor.close()
        oracle_connection.close()  # It should be closed because the next row can use a different connection or database

# Finishing up
end_time = find_now()
print("Process completed")
print("Start time", start_time)
print("End time", end_time)
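LoopDates is a separate module (see the Notes section at the end of this article). As an assumption about its behavior, based only on how it is used above, an equivalent date_range could look like this:

```python
from datetime import datetime, timedelta
import calendar

def date_range(start_date, end_date, frequency):
    """Returns dates between start_date and end_date, both included.

    "daily" returns every day; "monthly" returns every end of month.
    A sketch of the assumed LoopDates.date_range behavior, not the original module.
    """
    dates = []
    current = start_date
    while current <= end_date:
        if frequency == "daily":
            dates.append(current)
        elif frequency == "monthly":
            # Last calendar day of the current month
            last_day = calendar.monthrange(current.year, current.month)[1]
            if current.day == last_day:
                dates.append(current)
        current += timedelta(days=1)
    return dates

# 2024-01-31 and 2024-02-29 fall in the range; 2024-03-31 does not
print(len(date_range(datetime(2024, 1, 24), datetime(2024, 3, 26), "monthly")))  # → 2
```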
print("Difference", end_time - start_time)
6. Scheduling
This video explains how to schedule a Python script and execute it on Windows (How to Run a Python Script with the Windows Task Scheduler - Run Scheduled Python Scripts):
7. Additional Information
Just as you filter active records, you can filter any group or subgroup you want without opening the file. You can list the available groups with unique_groups = uat_file_df["Group"].unique(). Simply add another condition where you filter active records:
# Loop through the rows using iterrows()
for index, row in uat_file_df.iterrows():
    if row["Active Flag"] == 1 and row["Group"] == "GROUP_2":  # Skips the query if it is inactive or in a filtered-out group
        # Do something
Notes:
The python_to_oracle.py module should be in the same folder as the executed code, or the location of the file should be defined using sys.path.append:
sys.path.append("folder_path")
For the python_to_oracle.py module, please see:
For LoopDates.py, please see:
For the read_file_with_encodings function, please see:
You can access the files here.


