Calculating Median of Cash Withdrawal with Python


Jun 07, 2024

Previously, we calculated the median of cash withdrawals with SQL, Excel, PL/SQL, Excel VBA, and Google Sheets Apps Script. Finally, we are calculating it with Python.

The solution is the same as before: find the last 3 transactions of each customer, take their median, and compare that median with twice the current transaction amount.
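To make the rule concrete before the pandas version, here is a minimal plain-Python sketch for a single customer. It assumes the amounts are already in date order and mirrors the alert condition used later in this post (median of the last 3 transactions, including the current one, compared with twice the current amount). The function name `alerts_for_customer` is hypothetical.

```python
from statistics import median


def alerts_for_customer(amounts, window_size=3):
    """Hypothetical helper: apply the alert rule to one customer's amounts.

    Assumes `amounts` is already sorted by transaction date. Transactions
    with fewer than `window_size` values available (the first two when
    window_size=3) get no median and therefore never trigger an alert.
    """
    statuses = []
    for i, amount in enumerate(amounts):
        if i + 1 < window_size:
            statuses.append("DO NOT SEND ALERT")
            continue
        # Median of the current transaction and the (window_size - 1) before it
        window_median = median(amounts[i + 1 - window_size : i + 1])
        double = amount * 2
        if window_median > 0 and double > 0 and window_median >= double:
            statuses.append("SEND ALERT")
        else:
            statuses.append("DO NOT SEND ALERT")
    return statuses


print(alerts_for_customer([100, 40, 60, 20]))
```

For the hypothetical amounts 100, 40, 60, 20, only the fourth transaction is flagged: the median of 40, 60 and 20 is 40, and twice 20 is also 40.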

# import libraries
import numpy as np
import pandas as pd

# File name/path of the sample workbook
file_name = "Calculating Median of Cash Withdrawal with Excel.xlsx"

# Create pandas dataframe
df = pd.read_excel(file_name, sheet_name="Sheet1", skiprows=1, header=0, index_col=None, usecols="A:C")
df

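# Number each customer's transactions in date order (1, 2, 3, ...). With a window of 3, the first 2 transactions of each customer get no median.
# Sorting by date first keeps the count chronological even if the sheet is unsorted; the result aligns back to df by index.
df["Transaction Count"] = df.sort_values(by='Transaction Date').groupby('Customer Id').cumcount() + 1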
df
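`cumcount()` numbers rows within each group in their current row order, so the count is only chronological if the rows are in date order. The sketch below uses a hypothetical toy frame with deliberately unsorted dates to compare counting in row order with counting after sorting by date; because `cumcount()` keeps the original index labels, the assignment puts each count back on its own row.

```python
import pandas as pd

# Hypothetical toy data: customer 1's rows are deliberately out of date order
toy = pd.DataFrame({
    "Customer Id": [1, 2, 1, 1, 2],
    "Transaction Date": pd.to_datetime(
        ["2024-01-03", "2024-01-01", "2024-01-01", "2024-01-02", "2024-01-05"]
    ),
})

# Counting in the current row order ignores the dates
toy["Count (row order)"] = toy.groupby("Customer Id").cumcount() + 1

# Counting after sorting by date follows chronology; the result keeps the
# original index labels, so assignment aligns each count to its own row
toy["Count (date order)"] = (
    toy.sort_values(by="Transaction Date").groupby("Customer Id").cumcount() + 1
)
print(toy)
```

In row order, customer 1's transaction from 2024-01-03 is counted as 1; after sorting by date it is correctly counted as 3.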

# Calculating the median of the last 3 transactions

# Sort the DataFrame
df = df.sort_values(by=['Customer Id', 'Transaction Date'])

# Reset the index to ensure proper alignment
df.reset_index(drop=True, inplace=True)

# Calculate the rolling median for each customer using groupby and rolling
window_size = 3  # Current row plus the two rows above it (the last 3 transactions)

# Calculate the median of each customer's last 3 transactions; window_size sets how many transactions are included
rolling_medians = df.groupby('Customer Id')['Transaction Amount'].rolling(window=window_size, min_periods=3).median()
# min_periods: minimum number of observations in the window required to produce a value; otherwise the result is NaN. If min_periods is smaller than window_size, a median is calculated even when a customer has fewer than 3 transactions so far.

# Drop the 'Customer Id' level from the index of the rolling_medians Series so it aligns with df
rolling_medians = rolling_medians.reset_index(level=0, drop=True)

# Assign the calculated rolling medians to the DataFrame
df['Median of N Values'] = rolling_medians

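# Replace NaN values in "Median of N Values" with 0, because "Double of Transaction" is calculated from this column later: NaN would not match the "== 0" check, so those rows would get a non-zero "Double of Transaction".
# Assigning the result back (instead of fillna(..., inplace=True) on a column) also avoids chained-assignment warnings in newer pandas versions.
df['Median of N Values'] = df['Median of N Values'].fillna(0)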

df
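`groupby(...).rolling(...)` returns a Series whose index has two levels (the customer id and the original row label), which is why the code drops level 0 before assigning. The sketch below makes that visible on a hypothetical toy frame that is already sorted by customer and date.

```python
import pandas as pd

# Hypothetical toy data, already sorted by customer and date
toy = pd.DataFrame({
    "Customer Id": [1, 1, 1, 1, 2, 2, 2],
    "Transaction Amount": [100, 40, 60, 20, 10, 30, 50],
})

rolled = (
    toy.groupby("Customer Id")["Transaction Amount"]
    .rolling(window=3, min_periods=3)
    .median()
)

# The result is indexed by (Customer Id, original row label)
print(rolled.index.nlevels)  # 2

# Dropping the "Customer Id" level leaves the original row labels,
# so the assignment lines up row by row with toy
toy["Median of N Values"] = rolled.reset_index(level=0, drop=True)
print(toy)
```

Each customer's first two rows are NaN because `min_periods=3`; customer 1 then gets medians of 60 and 40, and customer 2 gets 30.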

As before, the calculation can be made dynamic by changing the window_size and min_periods parameters.
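One way to expose both parameters is to wrap the rolling-median step in a function. This is a minimal sketch, not the author's code: the helper name `add_rolling_median` and its keyword arguments are hypothetical, and it assumes the input is already sorted by customer and date with a default integer index.

```python
import pandas as pd


def add_rolling_median(df, window_size=3, min_periods=3,
                       group_col="Customer Id",
                       value_col="Transaction Amount",
                       out_col="Median of N Values"):
    """Hypothetical helper: per-customer rolling median with a configurable window.

    Assumes df is sorted by customer and date. Rows that have fewer than
    min_periods transactions in the window get 0, as in the post.
    """
    medians = (
        df.groupby(group_col)[value_col]
        .rolling(window=window_size, min_periods=min_periods)
        .median()
        .reset_index(level=0, drop=True)  # back to the original row labels
    )
    out = df.copy()  # leave the caller's DataFrame unchanged
    out[out_col] = medians.fillna(0)
    return out


toy = pd.DataFrame({"Customer Id": [1, 1, 1], "Transaction Amount": [100, 40, 60]})
print(add_rolling_median(toy)["Median of N Values"].tolist())                 # [0.0, 0.0, 60.0]
print(add_rolling_median(toy, min_periods=1)["Median of N Values"].tolist())  # [100.0, 70.0, 60.0]
```

With `min_periods=1` the first rows get a median of whatever is available (100 alone, then the median of 100 and 40, which is 70), which is the behavior described in the min_periods comment above.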

# Create a new column "Double of Transaction": 0 if the median is 0, otherwise "Transaction Amount" multiplied by 2
df['Double of Transaction'] = df['Transaction Amount'] * 2
df.loc[df['Median of N Values'] == 0, 'Double of Transaction'] = 0

df
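The two-step assignment above (multiply, then overwrite with `.loc`) can also be written as a single vectorized `np.where` call. This is an equivalent alternative rather than the author's original code, shown here on hypothetical toy values.

```python
import numpy as np
import pandas as pd

# Hypothetical values for one customer after the rolling-median step
toy = pd.DataFrame({
    "Transaction Amount": [100, 40, 60, 20],
    "Median of N Values": [0.0, 0.0, 60.0, 40.0],
})

# Same rule as the two-step version: 0 where there is no median, otherwise 2 x amount
toy["Double of Transaction"] = np.where(
    toy["Median of N Values"] == 0, 0, toy["Transaction Amount"] * 2
)
print(toy["Double of Transaction"].tolist())  # [0, 0, 120, 40]
```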

# Create the result column "Alert Status": send an alert when both values are greater than 0 and the median is greater than or equal to double the transaction amount
df['Alert Status'] = np.where(
    (df['Median of N Values'] > 0) & 
    (df['Double of Transaction'] > 0) & 
    (df['Median of N Values'] >= df['Double of Transaction']), 
    'SEND ALERT', 
    'DO NOT SEND ALERT'
)

df
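As a worked check of the alert condition, the sketch below applies the same `np.where` expression to hypothetical values for one customer. For the third transaction the median is 60 but double the amount is 120, so no alert; for the fourth the median of 40, 60 and 20 is 40 and double of 20 is 40, so 40 >= 40 triggers an alert.

```python
import numpy as np
import pandas as pd

# Hypothetical values for one customer after the earlier steps
toy = pd.DataFrame({
    "Transaction Amount": [100, 40, 60, 20],
    "Median of N Values": [0.0, 0.0, 60.0, 40.0],
    "Double of Transaction": [0, 0, 120, 40],
})

# Same condition as in the post
toy["Alert Status"] = np.where(
    (toy["Median of N Values"] > 0)
    & (toy["Double of Transaction"] > 0)
    & (toy["Median of N Values"] >= toy["Double of Transaction"]),
    "SEND ALERT",
    "DO NOT SEND ALERT",
)
print(toy["Alert Status"].tolist())
```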

You can access the sample file and code here.

Computer Diaries by Onur
