Calculating Median of Cash Withdrawal with Python

Previously, we calculated the median of cash withdrawals with SQL, Excel, PL/SQL, Excel VBA, and Google Sheets Apps Script. Finally, we are calculating it with Python.
The solution is the same as before: find the last 3 transactions of each customer (including the current one), take their median, and compare it with double the current transaction amount.
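Before the pandas version, the rule can be sketched in plain Python for a single customer. This is a minimal sketch assuming the customer's amounts are already in date order; `alerts_for_customer` is a hypothetical helper, not part of the original code. It mirrors the conditions used later in the pandas code: no median (and therefore no alert) for the first 2 transactions, otherwise an alert when the median is positive, double the amount is positive, and the median is at least double the amount.

```python
from statistics import median


def alerts_for_customer(amounts, window_size=3):
    """Hypothetical helper: apply the alert rule to one customer's
    chronologically ordered withdrawal amounts."""
    statuses = []
    for i, amount in enumerate(amounts):
        if i + 1 < window_size:
            # Not enough history yet; the pandas version stores a median of 0 here.
            statuses.append("DO NOT SEND ALERT")
            continue
        # Median of the current transaction and the window_size - 1 before it.
        med = median(amounts[i + 1 - window_size : i + 1])
        double = amount * 2
        if med > 0 and double > 0 and med >= double:
            statuses.append("SEND ALERT")
        else:
            statuses.append("DO NOT SEND ALERT")
    return statuses


print(alerts_for_customer([100, 500, 400, 150]))
# ['DO NOT SEND ALERT', 'DO NOT SEND ALERT', 'DO NOT SEND ALERT', 'SEND ALERT']
```

For the last transaction the window is 500, 400, 150, whose median (400) is at least double 150 (300), so only that row is flagged.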
# import libraries
import numpy as np
import pandas as pd
# File name/path of the sample workbook
file_name = "Calculating Median of Cash Withdrawal with Excel.xlsx"
# Read columns A:C of Sheet1 into a pandas DataFrame, skipping the first row; the next row is used as the header
df = pd.read_excel(file_name, sheet_name="Sheet1", skiprows=1, header=0, index_col=None, usecols="A:C")
df
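If the workbook is not at hand, a DataFrame with the same shape can be built in memory so the remaining steps still run. This sketch assumes the three columns read from A:C are exactly `Customer Id`, `Transaction Date` and `Transaction Amount` (the names used in the code); the values are hypothetical and made up for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for Sheet1: column names come from the article's code,
# the values are invented for illustration.
df = pd.DataFrame(
    {
        "Customer Id": [1, 1, 1, 1, 2, 2, 2],
        "Transaction Date": pd.to_datetime(
            ["2023-01-05", "2023-01-10", "2023-01-15", "2023-01-20",
             "2023-02-01", "2023-02-03", "2023-02-07"]
        ),
        "Transaction Amount": [100, 500, 400, 150, 300, 320, 90],
    }
)
print(df)
```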
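# Number each customer's transactions in date order (1, 2, 3, ...). The first 2 transactions of a customer have no full 3-transaction window.
# Counting on a date-sorted copy keeps the numbering chronological even if the sheet is unsorted; index alignment puts each number back on its own row.
df["Transaction Count"] = df.sort_values(by=['Customer Id', 'Transaction Date']).groupby('Customer Id').cumcount() + 1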
df
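`cumcount` numbers the rows of each group in their current row order, so on a sheet that is not sorted by date the count would not be chronological. A small check with hypothetical rows stored out of date order shows the difference between counting directly and counting on a date-sorted copy, whose result is aligned back onto the original rows by index.

```python
import pandas as pd

# Hypothetical rows, deliberately stored out of date order.
df = pd.DataFrame(
    {
        "Customer Id": [1, 2, 1, 1],
        "Transaction Date": pd.to_datetime(
            ["2023-01-15", "2023-02-01", "2023-01-05", "2023-01-10"]
        ),
    }
)

# cumcount numbers rows within each group in their current row order.
in_file_order = df.groupby("Customer Id").cumcount() + 1

# Counting on a date-sorted copy gives a chronological count; assigning it
# back to df aligns on the index, so each row receives its own number.
df["Transaction Count"] = (
    df.sort_values(by=["Customer Id", "Transaction Date"])
    .groupby("Customer Id")
    .cumcount()
    + 1
)

print(in_file_order.tolist())            # [1, 1, 2, 3]
print(df["Transaction Count"].tolist())  # [3, 1, 1, 2]
```

The first row (customer 1, 2023-01-15) is that customer's latest transaction, so the chronological count gives it 3, while counting in file order gives it 1.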
# Calculating the median of the last 3 transactions.
# Sort by customer and date so each customer's transactions are in chronological order
df = df.sort_values(by=['Customer Id', 'Transaction Date'])
# Reset the index to ensure proper alignment
df.reset_index(drop=True, inplace=True)
# Calculate the rolling median for each customer using groupby and rolling
window_size = 3  # The current transaction and the two before it (last 3 transactions)
min_periods = 3  # Minimum number of transactions needed for a median; with fewer, the result is NaN
# Median of each customer's last window_size transactions
rolling_medians = df.groupby('Customer Id')['Transaction Amount'].rolling(window=window_size, min_periods=min_periods).median()
# If min_periods is smaller than window_size, a median is also calculated for a customer's first transactions, using fewer than window_size values.
# Drop the 'Customer Id' level that groupby adds to the index, so the rolling_medians Series aligns with df's row index
rolling_medians = rolling_medians.reset_index(level=0, drop=True)
# Assign the calculated rolling medians to the DataFrame
df['Median of N Values'] = rolling_medians
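# Replace NaN values in "Median of N Values" with 0, because the "Double of Transaction" column is calculated from this column later and NaN values would get in the way.
# Assign the result back instead of calling fillna(inplace=True) on a column, which is chained assignment and does not update df under pandas Copy-on-Write.
df['Median of N Values'] = df['Median of N Values'].fillna(0)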
df
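The `reset_index(level=0, drop=True)` step matters because `groupby(...).rolling(...)` returns a Series indexed by both the group key and the original row label. The sketch below, on hypothetical amounts, shows that two-level index and how dropping the `Customer Id` level lets the medians line up with their own rows.

```python
import pandas as pd

# Hypothetical amounts, already sorted by customer and date.
df = pd.DataFrame(
    {
        "Customer Id": [1, 1, 1, 1, 2, 2, 2],
        "Transaction Amount": [100, 500, 400, 150, 300, 320, 90],
    }
)

rolled = (
    df.groupby("Customer Id")["Transaction Amount"]
    .rolling(window=3, min_periods=3)
    .median()
)
# The index is (Customer Id, original row label).
print(rolled.index.nlevels)  # 2

# Dropping the group level leaves the original row labels, so assignment
# puts each median on the transaction it belongs to.
df["Median of N Values"] = rolled.reset_index(level=0, drop=True)
df["Median of N Values"] = df["Median of N Values"].fillna(0)
print(df["Median of N Values"].tolist())
# [0.0, 0.0, 400.0, 400.0, 0.0, 0.0, 300.0]
```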
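As before, the calculation can be made dynamic by changing the window_size and min_periods values: window_size sets how many recent transactions go into the median, and min_periods sets how many are required before a median is produced.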
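One way to package that is a small function that takes both values as parameters. This is a minimal sketch; `add_rolling_median` is a hypothetical helper name, it assumes the same three column names as the article, and it defaults `min_periods` to `window_size` so that only full windows produce a median.

```python
import pandas as pd


def add_rolling_median(df, window_size=3, min_periods=None):
    """Hypothetical helper: add 'Median of N Values' for any window size.
    min_periods defaults to window_size, i.e. a full window is required."""
    if min_periods is None:
        min_periods = window_size
    out = df.sort_values(by=["Customer Id", "Transaction Date"]).reset_index(drop=True)
    medians = (
        out.groupby("Customer Id")["Transaction Amount"]
        .rolling(window=window_size, min_periods=min_periods)
        .median()
        .reset_index(level=0, drop=True)  # align with out's row index
    )
    out["Median of N Values"] = medians.fillna(0)
    return out


# Hypothetical single-customer sample.
sample = pd.DataFrame(
    {
        "Customer Id": [1, 1, 1, 1],
        "Transaction Date": pd.to_datetime(
            ["2023-01-05", "2023-01-10", "2023-01-15", "2023-01-20"]
        ),
        "Transaction Amount": [100, 500, 400, 150],
    }
)
print(add_rolling_median(sample, window_size=3, min_periods=2)["Median of N Values"].tolist())
# [0.0, 300.0, 400.0, 400.0]
```

With min_periods=2, the second transaction already gets a median (of 100 and 500), while the first still has too few values and stays at 0.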
# Create a new column "Double of Transaction": 0 if the median is 0, otherwise "Transaction Amount" multiplied by 2.
df['Double of Transaction'] = df['Transaction Amount'] * 2
df.loc[df['Median of N Values'] == 0, 'Double of Transaction'] = 0
df
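The two steps above (double every amount, then overwrite with 0 where the median is 0) can also be written as a single vectorized expression with `np.where`. This is an equivalent alternative shown on hypothetical values, not the article's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical values: the first two rows have no median yet.
df = pd.DataFrame(
    {
        "Transaction Amount": [100, 500, 400, 150],
        "Median of N Values": [0.0, 0.0, 400.0, 400.0],
    }
)

# 0 where there is no median, otherwise twice the transaction amount.
df["Double of Transaction"] = np.where(
    df["Median of N Values"] == 0, 0, df["Transaction Amount"] * 2
)
print(df["Double of Transaction"].tolist())  # [0, 0, 800, 300]
```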
# Create the result column "Alert Status": send an alert when the median is positive and at least double the (positive) transaction amount
df['Alert Status'] = np.where(
(df['Median of N Values'] > 0) &
(df['Double of Transaction'] > 0) &
(df['Median of N Values'] >= df['Double of Transaction']),
'SEND ALERT',
'DO NOT SEND ALERT'
)
df
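As an observation about the alert condition (not a change the original makes): if "Double of Transaction" is greater than 0 and the median is at least that large, the median is necessarily greater than 0, so the first check is redundant. The sketch below compares both forms on hypothetical values.

```python
import numpy as np
import pandas as pd

# Hypothetical medians and doubled amounts.
df = pd.DataFrame(
    {
        "Median of N Values": [0.0, 400.0, 400.0, 300.0, 0.0],
        "Double of Transaction": [0, 800, 300, 180, 0],
    }
)

original = np.where(
    (df["Median of N Values"] > 0)
    & (df["Double of Transaction"] > 0)
    & (df["Median of N Values"] >= df["Double of Transaction"]),
    "SEND ALERT",
    "DO NOT SEND ALERT",
)

# Median >= Double > 0 already implies Median > 0.
simplified = np.where(
    (df["Double of Transaction"] > 0)
    & (df["Median of N Values"] >= df["Double of Transaction"]),
    "SEND ALERT",
    "DO NOT SEND ALERT",
)
print((original == simplified).all())  # True
```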
You can access the sample file and code here.
Further Reading References:
Median: https://en.wikipedia.org/wiki/Median
Pandas DataFrame groupby: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
Pandas DataFrame rolling: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html
Pandas GroupBy cumcount (Cumulative Count): https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.cumcount.html
NumPy: https://numpy.org/