Dominick's supports a constant-quality food example
Overview of the data
Dominick’s is a rich and very useful dataset that can act as a representative scanner data example. Since Mehrhoff (2018) first proposed it for price index research, it has been widely used within price statistics, from the analysis of multilateral methods by Lamboray (2021) to empirical tests of the decomposition of multilateral methods by Webster and Tarnow-Mordi (2019). We use this dataset because it provides a realistic example of the data that national statistical offices (NSOs) need to work with.
The dataset comes pre-categorized, in separate product and movement files, on the Chicago Booth School website (Dominick’s Data Manual [2013] 2018).[1] This makes it simple to work with, as we can assume that each category represents an elementary index prior to its integration into the CPI with other retailer datasets and field data.
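As a minimal sketch of how a category's product and movement files fit together, the toy example below joins the two and derives a unit value per item and week. The column names (`UPC`, `NITEM`, `STORE`, `WEEK`, `MOVE`, `QTY`, `PRICE`, `OK`) and the sales formula `PRICE * MOVE / QTY` follow our reading of the Dominick's data manual; the values are invented for illustration, and this is not necessarily the cleaning pipeline used for the results on this page.

```r
# Toy stand-in for a category's product (UPC) file.
# Column names are assumed from the Dominick's data manual; values are invented.
upc <- data.frame(
  UPC     = c(1001, 1002),
  DESCRIP = c("SOFTENER A 64OZ", "SOFTENER B 32OZ"),
  NITEM   = c(501, 502)
)

# Toy stand-in for the matching movement file (one store, two weeks).
movement <- data.frame(
  UPC   = c(1001, 1001, 1002, 1002),
  STORE = c(2, 2, 2, 2),
  WEEK  = c(150, 151, 150, 151),
  MOVE  = c(10, 12, 4, 6),          # units sold
  QTY   = c(1, 1, 2, 2),            # bundle size, e.g. 2 for a "2 for $5" offer
  PRICE = c(3.00, 2.50, 5.00, 5.00), # price of the whole bundle
  OK    = c(1, 1, 1, 1)             # 1 = record flagged as valid
)

# Keep valid records with a positive price, then compute dollar sales.
movement <- movement[movement$OK == 1 & movement$PRICE > 0, ]
movement$SALES <- movement$PRICE * movement$MOVE / movement$QTY

# Attach product attributes to every movement record.
joined <- merge(movement, upc, by = "UPC")

# Aggregate to one row per item and week; unit value = sales / units sold.
agg <- aggregate(cbind(SALES, MOVE) ~ NITEM + WEEK, data = joined, FUN = sum)
agg$UNIT_VALUE <- agg$SALES / agg$MOVE
print(agg)
```

Grouping here is by `NITEM` and `WEEK` only to keep the sketch short; a production process would map weeks to reference months before aggregating.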
Tip: Data flow
To contextualize the data flow, Figure 1 shows how the data flows in this research setting and how it would flow if it were a real production dataset.
Figure 1: Visual flow of data processing for Dominick’s as if it were a real production dataset
Categories
To simulate a real use case, let's say that we use Dominick’s data for production price indices. Below, outputs for several categories are shown from a bird's-eye view of the whole dataset. This makes it possible to pick representative periods for targeted analysis: analysis focused on one specific time period, and therefore a simulation of a normal monthly process.
Fabric Softener
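# Load your functions
# (read_parquet(), write_parquet() and spliced_CCDI() are assumed to be made
# available by these scripts, e.g. via a parquet package they load)
source("R/dominicks_utils.R")
source("R/price_indices.R")
library(plotly)
library(DT)

# Read the cleaned fabric softener data (grouped by item and reference period)
ird <- read_parquet("data/clean/ird_fsf_timesamp_NULL_groupby_NITEM-REF_PERIOD.parquet")

# Compute the spliced CCDI index and store it as a data frame
ccdi <- spliced_CCDI(ird)
ccdi_df <- data.frame(period = names(ccdi), score = as.numeric(ccdi))
write_parquet(ccdi_df, "output/fsf_ccdi.parquet")

# Plot the index, marking the period chosen for targeted analysis in red
fig <- plot_ly(
  ccdi_df,
  x = ~period, y = ~score,
  type = 'scatter', mode = 'lines',
  text = ~paste("Period:", period, "<br>Index:", score),
  hoverinfo = 'text'
) %>%
  layout(
    title = "CCDI with mean splice (Fabric Softeners category)",
    xaxis = list(title = "Time Period", type = 'category'), # Ensures order is maintained
    yaxis = list(title = "Index", tickformat = ".01"),
    shapes = list(
      list(
        type = "line",
        x0 = "1992-07", x1 = "1992-07",
        y0 = 0, y1 = 1, yref = "paper",
        line = list(color = "red", width = 2)
      )
    )
  )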
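To make the index in the chart more concrete, here is a minimal sketch of a CCDI (GEKS-Törnqvist) index over a single window, computed on invented toy data. It is not the `spliced_CCDI()` function sourced above, whose implementation is not shown here: the helper names `tornqvist()` and `ccdi_window()` are hypothetical, only items present in both periods are compared, and the mean-splice step that links successive windows is left out.

```r
# Toy data: two items over three months (values invented for illustration).
toy <- data.frame(
  period = rep(c("1992-05", "1992-06", "1992-07"), each = 2),
  item   = rep(c("A", "B"), times = 3),
  price  = c(2.0, 4.0, 1.5, 4.2, 2.2, 4.4),
  qty    = c(10, 5, 20, 4, 10, 5)
)

# Bilateral Tornqvist index from period a to period b over matched items.
tornqvist <- function(df, a, b) {
  m <- merge(df[df$period == a, ], df[df$period == b, ],
             by = "item", suffixes = c("_a", "_b"))
  share_a <- m$price_a * m$qty_a / sum(m$price_a * m$qty_a)
  share_b <- m$price_b * m$qty_b / sum(m$price_b * m$qty_b)
  exp(sum(0.5 * (share_a + share_b) * log(m$price_b / m$price_a)))
}

# CCDI over one window: for each period t, take the geometric mean over all
# link periods l of the chained comparison base -> l -> t.
ccdi_window <- function(df) {
  periods <- sort(unique(df$period))
  base <- periods[1]
  idx <- sapply(periods, function(t) {
    links <- sapply(periods, function(l) {
      tornqvist(df, base, l) * tornqvist(df, l, t)
    })
    exp(mean(log(links)))
  })
  names(idx) <- periods
  idx
}

idx <- ccdi_window(toy)
print(idx)
```

In the toy data, July prices are exactly 1.1 times May prices with unchanged quantities, so the index should return to 1.1 in July regardless of the promotion-like dip in June. That property is a quick sanity check on any multilateral implementation.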
Monthly Dashboard for Fabric Softener sold at Dominick’s: July 1992
Dominicks, Fabric Softener, GEKS
References
Dominick’s Data Manual. (2013) 2018. Kilts Center for Marketing.
Lamboray, Claude. 2021. “Index Compilation Techniques for Scanner Data: An Overview.” Group of Experts on Consumer Price Indices.
Mehrhoff, Jens. 2018. “Promoting the Use of a Publically Available Scanner Data Set in Price Index Research and for Capacity Building.” Manuscript, European Commission, https://bit.ly/2ZBUbg9.
Webster, Michael, and Rory C Tarnow-Mordi. 2019. “Decomposing Multilateral Price Indexes into the Contributions of Individual Commodities.” Journal of Official Statistics 35 (2): 461–86.
Footnotes
1. See the catalogue record in the Price Statistics Open Data catalogue for more information about the dataset.