Dominick’s offers us a rich real-world scanner dataset (focusing on food from stores)
Overview of the data
The Dominick’s dataset is a rich and very useful dataset for validating how diagnostics and analytics can be designed with scanner data. Since Mehrhoff (2018) first proposed it for price index research, it has been widely used within price statistics, from the analysis of multilateral methods by Lamboray (2021) to empirical tests of the decomposition of multilateral methods by Webster and Tarnow-Mordi (2019). We use this dataset because it provides a realistic example of the data that NSOs would need to work with.
The dataset comes pre-categorized in separate product and movement files on the Chicago Booth School website (Dominick’s Data Manual [2013] 2018).1 This makes it simple to work with, as we can assume that each category represents an elementary index prior to its integration into the CPI with other retailer datasets and field data.
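As a rough illustration of how the two file types fit together, the sketch below joins a toy movement table to a toy product table in base R and aggregates to unit values. The column names (UPC, NITEM, STORE, WEEK, MOVE, QTY, PRICE) and the sales formula follow our reading of the Dominick’s Data Manual, and all values are invented, so treat both as assumptions rather than as the actual file layout or the processing used in this project.

```r
# Toy stand-ins for one category's product (UPC) and movement files.
# Column names are assumed from the Dominick's Data Manual; values are invented.
products <- data.frame(
  UPC     = c(111, 222),
  NITEM   = c(9001, 9002),
  DESCRIP = c("SOFTENER A 64OZ", "SOFTENER B 32OZ"),
  stringsAsFactors = FALSE
)

movement <- data.frame(
  STORE = c(2, 2, 5, 5),
  UPC   = c(111, 222, 111, 222),
  WEEK  = c(20, 20, 20, 21),
  MOVE  = c(10, 4, 6, 3),            # units sold
  QTY   = c(1, 1, 1, 2),             # bundle size that PRICE refers to
  PRICE = c(2.50, 1.80, 2.40, 3.00)  # price of one bundle of QTY units
)

# Sales value of each row: PRICE applies to a bundle of QTY units.
movement$SALES <- movement$PRICE * movement$MOVE / movement$QTY

# Attach product attributes to every movement row.
merged <- merge(movement, products, by = "UPC")

# Total sales and units per item and week (summing over stores),
# then a unit value as sales divided by units.
agg <- aggregate(cbind(SALES, MOVE) ~ NITEM + WEEK, data = merged, FUN = sum)
agg$UNIT_VALUE <- agg$SALES / agg$MOVE
print(agg)
```

Aggregating over stores before computing the unit value is one possible design choice; a production process might instead keep stores or regions separate.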
Categories
To simulate a real use case, let’s say that we use Dominick’s data to compile price indices in production. Below, outputs for several categories are shown from a bird’s-eye view of the whole dataset. The aim is to pick representative periods for targeted analysis, that is, analysis restricted to a specific time period, which simulates a normal monthly process.
Fabric Softener
Figure 1 shows the CCDI (GEKS-T) index with a mean splice for the Fabric Softeners category over the window from January 1990 to April 1997.
# Load your functions
source("../../src/dominicks_utils.R")
library(plotly)

ird <- homogenous_product_aggregation(
  category_name = 'fsf',
  data_dir = '../../data/processed',
  time_sample = c(1, 2),
  group_by_parameters = c('NITEM', 'REF_PERIOD'),
  window = list("start" = "1990-01-01", "end" = "1997-04-01")
)

ccdi <- spliced_CCDI(ird)
ccdi_df <- data.frame(period = names(ccdi), score = as.numeric(ccdi))

fig <- plot_ly(
  ccdi_df, x = ~period, y = ~score,
  type = 'scatter', mode = 'lines',
  text = ~paste("Period:", period, "<br>Index:", score),
  hoverinfo = 'text'
) %>%
  layout(
    title = "CCDI with mean splice (Fabric Softeners category)",
    xaxis = list(title = "Time Period", type = 'category'),  # Ensures order is maintained
    yaxis = list(title = "Index", tickformat = ".1")
  )

fig
Figure 1: Fabric Softeners GEKS-T example
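To make the index behind these figures more concrete, here is a minimal sketch of a GEKS-Törnqvist (CCDI) calculation on a single fixed window, written in base R with invented prices and quantities. It assumes every item is sold in every period and it does not implement the mean splice, so it is not a reproduction of the project’s `spliced_CCDI` function, whose internals are not shown here. The helper names `tornqvist` and `geks_t` are hypothetical.

```r
# Toy price and quantity matrices: rows are periods, columns are items.
# All numbers are invented for illustration.
prices <- matrix(c(2.0, 1.0,
                   2.2, 0.9,
                   2.1, 1.1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("t1", "t2", "t3"), c("A", "B")))
quantities <- matrix(c(10, 20,
                        8, 25,
                        9, 18),
                     nrow = 3, byrow = TRUE,
                     dimnames = dimnames(prices))

# Bilateral Törnqvist index from period `from` to period `to`:
# a geometric mean of price relatives weighted by average expenditure shares.
tornqvist <- function(from, to) {
  spend_from <- prices[from, ] * quantities[from, ]
  spend_to   <- prices[to, ] * quantities[to, ]
  w <- 0.5 * (spend_from / sum(spend_from) + spend_to / sum(spend_to))
  exp(sum(w * log(prices[to, ] / prices[from, ])))
}

# GEKS-T (CCDI): geometric mean, over every link period l in the window,
# of the chained comparison P(base, l) * P(l, to).
geks_t <- function(base, to) {
  links <- seq_len(nrow(prices))
  exp(mean(sapply(links, function(l) {
    log(tornqvist(base, l) * tornqvist(l, to))
  })))
}

# Index series with the first period as the base.
index <- sapply(seq_len(nrow(prices)), function(p) geks_t(1, p))
names(index) <- rownames(prices)
print(round(index, 4))
```

Because the index averages over all link periods, it is transitive within the window: the index from the first to the third period equals the product of the indices for the two intermediate steps.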
Beer
Figure 2 shows the CCDI (GEKS-T) index with a mean splice for the Beer category over the window from January 1990 to April 1997.
ird <- homogenous_product_aggregation(
  category_name = 'ber',
  data_dir = '../../data/processed',
  time_sample = c(1, 2),
  group_by_parameters = c('NITEM', 'REF_PERIOD'),
  window = list("start" = "1990-01-01", "end" = "1997-04-01")
)

ccdi <- spliced_CCDI(ird)
ccdi_df <- data.frame(period = names(ccdi), score = as.numeric(ccdi))

fig <- plot_ly(
  ccdi_df, x = ~period, y = ~score,
  type = 'scatter', mode = 'lines',
  text = ~paste("Period:", period, "<br>Index:", score),
  hoverinfo = 'text'
) %>%
  layout(
    title = "CCDI with mean splice (Beer category)",
    xaxis = list(title = "Time Period", type = 'category'),  # Ensures order is maintained
    yaxis = list(title = "Index", tickformat = ".1")
  )

fig
Figure 2: Beer GEKS-T example
Analgetics
Figure 3 shows the CCDI (GEKS-T) index with a mean splice for the Analgetics category over the window from January 1990 to April 1997.
ird <- homogenous_product_aggregation(
  category_name = 'ana',
  data_dir = '../../data/processed',
  time_sample = c(1, 2),
  group_by_parameters = c('NITEM', 'REF_PERIOD'),
  window = list("start" = "1990-01-01", "end" = "1997-04-01")
)

ccdi <- spliced_CCDI(ird)
ccdi_df <- data.frame(period = names(ccdi), score = as.numeric(ccdi))

fig <- plot_ly(
  ccdi_df, x = ~period, y = ~score,
  type = 'scatter', mode = 'lines',
  text = ~paste("Period:", period, "<br>Index:", score),
  hoverinfo = 'text'
) %>%
  layout(
    title = "CCDI with mean splice (Analgetics category)",
    xaxis = list(title = "Time Period", type = 'category'),  # Ensures order is maintained
    yaxis = list(title = "Index", tickformat = ".1")
  )

fig
Figure 3: Analgetics GEKS-T example
Cookies
Figure 4 shows the CCDI (GEKS-T) index with a mean splice for the Cookies category over the window from January 1990 to April 1997.
ird <- homogenous_product_aggregation(
  category_name = 'coo',
  data_dir = '../../data/processed',
  time_sample = c(1, 2),
  group_by_parameters = c('NITEM', 'REF_PERIOD'),
  window = list("start" = "1990-01-01", "end" = "1997-04-01")
)

ccdi <- spliced_CCDI(ird)
ccdi_df <- data.frame(period = names(ccdi), score = as.numeric(ccdi))

fig <- plot_ly(
  ccdi_df, x = ~period, y = ~score,
  type = 'scatter', mode = 'lines',
  text = ~paste("Period:", period, "<br>Index:", score),
  hoverinfo = 'text'
) %>%
  layout(
    title = "CCDI with mean splice (Cookies category)",
    xaxis = list(title = "Time Period", type = 'category'),  # Ensures order is maintained
    yaxis = list(title = "Index", tickformat = ".1")
  )

fig
Figure 4: Cookies GEKS-T example
Bottled Juices
Figure 5 shows the CCDI (GEKS-T) index with a mean splice for the Bottled Juices category over the window from January 1990 to April 1997.
ird <- homogenous_product_aggregation(
  category_name = 'bjc',
  data_dir = '../../data/processed',
  time_sample = c(1, 2),
  group_by_parameters = c('NITEM', 'REF_PERIOD'),
  window = list("start" = "1990-01-01", "end" = "1997-04-01")
)

ccdi <- spliced_CCDI(ird)
ccdi_df <- data.frame(period = names(ccdi), score = as.numeric(ccdi))

fig <- plot_ly(
  ccdi_df, x = ~period, y = ~score,
  type = 'scatter', mode = 'lines',
  text = ~paste("Period:", period, "<br>Index:", score),
  hoverinfo = 'text'
) %>%
  layout(
    title = "CCDI with mean splice (Bottled Juices category)",
    xaxis = list(title = "Time Period", type = 'category'),  # Ensures order is maintained
    yaxis = list(title = "Index", tickformat = ".1")
  )

fig
Figure 5: Bottled Juices GEKS-T example
Dominick’s-specific data flow
To contextualize the data flow, Figure 6 shows how the data flows in this research setting and how it would flow if it were a real production dataset.
Figure 6: Visual flow of data processing for Dominick’s as if it were a real production dataset
References
(2013) 2018. Dominick’s Data Manual. Kilts Center for Marketing.
Lamboray, Claude. 2021. “Index Compilation Techniques for Scanner Data: An Overview.” Group of Experts on Consumer Price Indices.
Mehrhoff, Jens. 2018. “Promoting the Use of a Publically Available Scanner Data Set in Price Index Research and for Capacity Building.” Manuscript, European Commission, https://bit.ly/2ZBUbg9.
Webster, Michael, and Rory C Tarnow-Mordi. 2019. “Decomposing Multilateral Price Indexes into the Contributions of Individual Commodities.” Journal of Official Statistics 35 (2): 461–86.
Footnotes
See catalogue record in the Price Statistics Open Data catalogue for more information about the dataset.