Marketing Mix Modeling (MMM) explanation and implementation
Marketing Mix Modeling (MMM) is a powerful technique used to measure the effectiveness of various marketing activities and understand how they contribute to sales or other key performance indicators (KPIs). In this blog post, we’ll explore MMM from a machine learning perspective, providing in-depth explanations, code examples in Python, and insightful visualizations.
What is MMM?
MMM uses statistical models to quantify the impact of different marketing channels on sales. It helps answer questions like:
- What is the return on investment (ROI) of each marketing channel?
- How do different channels interact with each other?
- What is the optimal allocation of marketing budget across channels?
Data and Methodology
For this project, we’ll use a simulated dataset containing daily marketing spend for TV, radio, and social media advertising, along with the corresponding sales figures. We’ll then build a regression model to analyze the relationship between marketing activities and sales.
Note: While this example uses simulated data, the principles and techniques apply to real-world marketing datasets.
1. Import Libraries
import pandas as pd
import numpy as np
import altair as alt
import statsmodels.formula.api as sm
2. Generate and Prepare the Data
# Create a dataframe with the following columns:
# `Date`, `TV`, `Radio`, `Social_Media`, `Sales`
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05",
                            "2023-01-06", "2023-01-07", "2023-01-08", "2023-01-09", "2023-01-10",
                            "2023-01-11", "2023-01-12", "2023-01-13", "2023-01-14", "2023-01-15"]),
    "TV": [1000, 1200, 1100, 1300, 1400, 1500, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100],
    "Radio": [500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200],
    "Social_Media": [250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, 550, 575, 600],
    "Sales": [2500, 2700, 2900, 3100, 3300, 3500, 3700, 3900, 4100, 4300, 4500, 4700, 4900, 5100, 5300]
})
# Calculate the 7-day rolling averages for the `TV`, `Radio`, and `Social_Media` columns
"TV_7d_avg"] = df["TV"].rolling(window=7, center=True).mean()
df["Radio_7d_avg"] = df["Radio"].rolling(window=7, center=True).mean()
df["Social_Media_7d_avg"] = df["Social_Media"].rolling(window=7, center=True).mean()
df[
# Calculate the 7-day rolling averages for the `Sales` column
"Sales_7d_avg"] = df["Sales"].rolling(window=7, center=True).mean() df[
3. Visualize the Data
I haven’t used the Altair plotting library before, so I will try it out in this project.
# Plot the 7-day rolling averages for `TV`, `Radio`, and `Social_Media` advertising spends over time
base = alt.Chart(df).encode(x=alt.X('Date', axis=alt.Axis(title='Date')))

# Plot the 7-day rolling averages for `TV`
tv_plot = base.mark_line(color='blue').encode(
    y=alt.Y('TV_7d_avg', axis=alt.Axis(title='TV 7-day Avg')),
    tooltip=['Date', 'TV_7d_avg']
).properties(title='7-Day Rolling Average of TV Advertising Spend')
# Plot the 7-day rolling averages for `Radio`
radio_plot = base.mark_line(color='red').encode(
    y=alt.Y('Radio_7d_avg', axis=alt.Axis(title='Radio 7-day Avg')),
    tooltip=['Date', 'Radio_7d_avg']
).properties(title='7-Day Rolling Average of Radio Advertising Spend')
# Plot the 7-day rolling averages for `Social_Media`
social_media_plot = base.mark_line(color='green').encode(
    y=alt.Y('Social_Media_7d_avg', axis=alt.Axis(title='Social Media 7-day Avg')),
    tooltip=['Date', 'Social_Media_7d_avg']
).properties(title='7-Day Rolling Average of Social Media Advertising Spend')
# Combine the plots
combined_plot = tv_plot + radio_plot + social_media_plot

combined_plot.display()
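As an optional alternative (not part of the charts above), Altair’s transform_fold can reshape the three rolling-average columns into long form so a single layer gets a color legend; this is just a sketch of that approach:
# Optional alternative: fold the three spend columns into long form so Altair
# draws one line per channel with a color legend.
folded_plot = alt.Chart(df).transform_fold(
    ['TV_7d_avg', 'Radio_7d_avg', 'Social_Media_7d_avg'],
    as_=['Channel', 'Spend_7d_avg']
).mark_line().encode(
    x=alt.X('Date:T', axis=alt.Axis(title='Date')),
    y=alt.Y('Spend_7d_avg:Q', axis=alt.Axis(title='7-day Avg Spend')),
    color='Channel:N'
).properties(title='7-Day Rolling Average Spend by Channel')

folded_plot.display()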
# Plot the 7-day rolling averages for `Sales` over time
sales_plot = base.mark_line().encode(
    y=alt.Y('Sales_7d_avg', axis=alt.Axis(title='Sales 7-day Avg')),
    tooltip=['Date', 'Sales_7d_avg']
).properties(title='7-Day Rolling Average of Sales')

sales_plot.display()
# Create a scatter plot of `TV` spend against `Sales`
scatter_tv = alt.Chart(df).mark_point().encode(
    x=alt.X('TV', axis=alt.Axis(title='TV Spend')),
    y=alt.Y('Sales', axis=alt.Axis(title='Sales')),
    tooltip=['TV', 'Sales']
).properties(title='TV Spend vs. Sales')

scatter_tv.display()
# Create a scatter plot of `Radio` spend against `Sales`
scatter_radio = alt.Chart(df).mark_point().encode(
    x=alt.X('Radio', axis=alt.Axis(title='Radio Spend')),
    y=alt.Y('Sales', axis=alt.Axis(title='Sales')),
    tooltip=['Radio', 'Sales']
).properties(title='Radio Spend vs. Sales')

scatter_radio.display()
# Create a scatter plot of `Social_Media` spend against `Sales`
scatter_social_media = alt.Chart(df).mark_point().encode(
    x=alt.X('Social_Media', axis=alt.Axis(title='Social Media Spend')),
    y=alt.Y('Sales', axis=alt.Axis(title='Sales')),
    tooltip=['Social_Media', 'Sales']
).properties(title='Social Media Spend vs. Sales')

scatter_social_media.display()
4. Build and Evaluate the Model
Fit a regression model with `Sales` as the outcome and `TV`, `Radio`, and `Social_Media` as predictors.
# Fit a regression model
model = sm.ols("Sales ~ TV + Radio + Social_Media", data=df).fit()
# Print the model summary
print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Sales R-squared: 1.000
Model: OLS Adj. R-squared: 1.000
Method: Least Squares F-statistic: 3.026e+29
Date: Fri, 18 Oct 2024 Prob (F-statistic): 6.08e-173
Time: 07:57:22 Log-Likelihood: 372.97
No. Observations: 15 AIC: -739.9
Df Residuals: 12 BIC: -737.8
Df Model: 2
Covariance Type: nonrobust
================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------
Intercept 500.0000 6.19e-12 8.07e+13 0.000 500.000 500.000
TV -1.72e-15 1.33e-14 -0.129 0.900 -3.08e-14 2.74e-14
Radio 3.2000 1.57e-14 2.04e+14 0.000 3.200 3.200
Social_Media 1.6000 7.83e-15 2.04e+14 0.000 1.600 1.600
==============================================================================
Omnibus: 2.048 Durbin-Watson: 0.014
Prob(Omnibus): 0.359 Jarque-Bera (JB): 1.554
Skew: 0.649 Prob(JB): 0.460
Kurtosis: 2.104 Cond. No. 1.00e+18
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 5.05e-29. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
/Users/fabianlanderos/miniforge3/envs/quarto/lib/python3.12/site-packages/scipy/stats/_stats_py.py:1971: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=15
k, _ = kurtosistest(a, axis)
# plot OLS model
df['Sales_pred'] = model.predict(df)
base = alt.Chart(df).encode(x=alt.X('Date', axis=alt.Axis(title='Date')))

sales_plot = base.mark_line(color='black').encode(
    y=alt.Y('Sales', axis=alt.Axis(title='Sales')),
    tooltip=['Date', 'Sales']
).properties(title='Actual Sales')

sales_pred_plot = base.mark_line(color='red').encode(
    y=alt.Y('Sales_pred', axis=alt.Axis(title='Sales')),
    tooltip=['Date', 'Sales_pred']
).properties(title='Predicted Sales')

combined_plot = sales_plot + sales_pred_plot
combined_plot.display()
# Plot the residuals
df['Residuals'] = model.resid

residuals_plot = base.mark_line(color='green').encode(
    y=alt.Y('Residuals', axis=alt.Axis(title='Residuals')),
    tooltip=['Date', 'Residuals']
).properties(title='Model Residuals')

residuals_plot.display()
5. Interpreting the Results
The regression model output provides valuable insights into the effectiveness of each marketing channel. Here’s how to interpret the key components:
- Coefficients: These indicate the estimated impact of each channel on sales. For example, a coefficient of 1.6 for social media suggests that, on average, a one-unit increase in social media spend is associated with a 1.6 unit increase in sales, holding other factors constant.
- R-squared: This measures the proportion of variance in sales explained by the model.
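To make the coefficient interpretation concrete, here is a small hypothetical check: two made-up spend scenarios that differ only by 10 units of social media spend should differ in predicted sales by roughly 1.6 × 10 = 16 units.
# Hypothetical spend scenarios (made-up numbers): only Social_Media changes, by 10
scenario = pd.DataFrame({
    "TV": [1500, 1500],
    "Radio": [800, 800],
    "Social_Media": [400, 410],
})
preds = model.predict(scenario)
print(preds[1] - preds[0])  # roughly 1.6 * 10 = 16 additional sales units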
Let’s dive deeper into the `sm.ols` model and the associated metrics.
Understanding `sm.ols`
`sm.ols` is a function from the `statsmodels` library in Python used for performing Ordinary Least Squares (OLS) regression. OLS is a common statistical method for estimating the relationship between a dependent variable (in our case, `Sales`) and one or more independent variables (our marketing channels: `TV`, `Radio`, and `Social_Media`).
The core idea of OLS is to find the line that best fits the data by minimizing the sum of the squared differences between the observed sales values and the values predicted by the line. This line represents the linear relationship between marketing spend and sales.
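As a rough sketch of that idea (not from the original post), we can build the design matrix by hand and ask NumPy for the least-squares solution; the resulting coefficients should closely match model.params.
# OLS by hand: intercept column plus the three spend columns, solved for the
# coefficients that minimize the sum of squared residuals.
X = np.column_stack([
    np.ones(len(df)),  # intercept
    df["TV"],
    df["Radio"],
    df["Social_Media"],
])
y = df["Sales"].to_numpy()
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
print(betas)  # should be close to model.params (Intercept, TV, Radio, Social_Media)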
Key Metrics from the Model Summary
The `model.summary()` output provides a wealth of information to assess the model’s fit and interpret the results. Here are some of the key metrics (a short code sketch after this list shows how to read each of them programmatically):
- R-squared and Adjusted R-squared:
  - `R-squared` represents the proportion of variance in the dependent variable (`Sales`) that is explained by the independent variables. It ranges from 0 to 1, with higher values indicating a better fit.
  - `Adjusted R-squared` is a modified version of `R-squared` that adjusts for the number of predictors in the model. It penalizes the inclusion of unnecessary variables that don’t significantly improve the model’s explanatory power.
- Coefficients:
  - These are estimates of the effect of each independent variable on the dependent variable, holding other variables constant.
  - Each coefficient has an associated standard error, which measures the uncertainty in the estimate.
  - The `t`-statistic and `P>|t|` values help determine the statistical significance of each coefficient. A low `P>|t|` value (typically below 0.05) suggests that the predictor has a statistically significant impact on sales.
- F-statistic and Prob (F-statistic):
  - The `F`-statistic tests the overall significance of the regression model. It assesses whether at least one of the independent variables is significantly related to the dependent variable.
  - The `Prob (F-statistic)` is the p-value associated with the `F`-statistic. A low p-value indicates that the model is statistically significant.
- AIC and BIC:
  - `AIC` (Akaike Information Criterion) and `BIC` (Bayesian Information Criterion) are measures of model fit that penalize models with more parameters. Lower values of AIC and BIC indicate a better balance between model fit and complexity.
- Omnibus, Durbin-Watson, Jarque-Bera:
  - These are diagnostic tests to check the assumptions of the linear regression model.
  - `Omnibus` and `Jarque-Bera` test the normality of the residuals.
  - `Durbin-Watson` tests for autocorrelation in the residuals.
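All of these metrics can also be read directly from the fitted results object instead of parsing the printed summary; the attributes below are standard statsmodels OLS results attributes.
# Headline metrics straight from the fitted results object
print(model.rsquared, model.rsquared_adj)  # R-squared, Adjusted R-squared
print(model.params)                        # coefficients
print(model.bse)                           # standard errors
print(model.tvalues)                       # t-statistics
print(model.pvalues)                       # P>|t|
print(model.fvalue, model.f_pvalue)        # F-statistic, Prob (F-statistic)
print(model.aic, model.bic)                # AIC, BIC

# Residual diagnostics
from statsmodels.stats.stattools import durbin_watson, jarque_bera
print(durbin_watson(model.resid))          # autocorrelation in residuals
print(jarque_bera(model.resid))            # (JB statistic, p-value, skew, kurtosis)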
Interpreting the Results in the Context of MMM
In the context of MMM, these metrics help us understand the effectiveness of our marketing mix. For instance:
- Significant coefficients for `TV`, `Radio`, or `Social_Media` indicate that these channels have a measurable impact on sales.
- The magnitude of the coefficients helps compare the relative effectiveness of different channels.
- `R-squared` tells us how well our model captures the overall variation in sales.
- Diagnostic tests help ensure the reliability of our results.
By carefully analyzing these metrics, marketers can make informed decisions about budget allocation, campaign optimization, and overall marketing strategy.
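As a closing sketch (a simplified, hypothetical decomposition rather than a full attribution model), the fitted coefficients can be turned into rough per-channel contributions by multiplying each coefficient by that channel’s spend, with the intercept treated as a base term.
# Simplified contribution decomposition: coefficient * spend per channel,
# plus the intercept as a "base" term. This is a first-pass view only.
coefs = model.params
contributions = pd.DataFrame({
    "Base": coefs["Intercept"] * np.ones(len(df)),
    "TV": coefs["TV"] * df["TV"],
    "Radio": coefs["Radio"] * df["Radio"],
    "Social_Media": coefs["Social_Media"] * df["Social_Media"],
})
print(contributions.sum())                      # total modeled contribution per term
print(contributions.sum() / df["Sales"].sum())  # share of total sales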