Showcase with Dominick’s dataset

Dominick’s supports a constant-quality food example

Overview of the data

Dominick’s is a rich and useful dataset that can serve as a representative scanner data example. Since Mehrhoff (2018) first proposed it for price index research, it has been widely used in price statistics, from the analysis of multilateral methods by Lamboray (2021) to empirical tests of the decomposition of multilateral indexes by Webster and Tarnow-Mordi (2019). We use this dataset because it provides a realistic example of the data that national statistical offices (NSOs) need to work with.

The dataset is available from the Chicago Booth Kilts Center website (Chicago Booth School of Business 2022), already split by product category into separate product (UPC) and movement files.¹ This makes it simple to work with, because we can treat each category as an elementary index before it would be combined in the CPI with other retailers’ datasets and field-collected data.
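To make the product/movement split concrete, here is a minimal R sketch of how the two files could be combined into item-by-period expenditures and unit values, the kind of input an elementary index needs. It assumes the column names described in the Kilts Center data manual (`UPC`, `NITEM`, `WEEK`, `MOVE`, `QTY`, `PRICE`, `OK`) and that sales equal `PRICE * MOVE / QTY`. The toy rows, the week-to-month lookup and the `build_item_periods()` helper are hypothetical, and the project’s own cleaning code may differ.

```r
# Sketch: combine a product (UPC) file and a movement file into
# item-by-period expenditure and unit values.
# Column names assume the Kilts Center data manual; the rows, the
# week-to-month lookup and build_item_periods() are hypothetical.

upc <- data.frame(
  UPC   = c(101, 102, 201),
  NITEM = c(1, 1, 2)                 # UPCs 101 and 102 share an item code
)

move <- data.frame(
  UPC   = c(101, 102, 201, 101, 201, 201),
  WEEK  = c(1, 1, 1, 5, 5, 5),
  MOVE  = c(10, 5, 4, 8, 6, 0),      # units sold
  QTY   = c(1, 1, 2, 1, 1, 1),       # PRICE is for a bundle of QTY units
  PRICE = c(2.00, 2.20, 5.00, 1.80, 2.40, 2.40),
  OK    = c(1, 1, 1, 1, 1, 0)        # 0 marks a row flagged as unreliable
)

# Hypothetical mapping from week number to monthly reference period
weeks <- data.frame(WEEK = c(1, 5), REF_PERIOD = c("1992-06", "1992-07"))

build_item_periods <- function(upc, move, weeks) {
  m <- move[move$OK == 1 & move$MOVE > 0, ]   # drop flagged and zero-sales rows
  m$SALES <- m$PRICE * m$MOVE / m$QTY         # expenditure for each row
  m <- merge(merge(m, upc, by = "UPC"), weeks, by = "WEEK")
  agg <- aggregate(cbind(SALES, MOVE) ~ NITEM + REF_PERIOD, data = m, FUN = sum)
  agg$UNIT_VALUE <- agg$SALES / agg$MOVE      # expenditure per unit sold
  agg[order(agg$REF_PERIOD, agg$NITEM), ]
}

item_periods <- build_item_periods(upc, move, weeks)
print(item_periods)
```

Grouping by `NITEM` rather than `UPC` pools UPCs that share an item code, which mirrors the `groupby_NITEM-REF_PERIOD` naming of the cleaned file read in the Fabric Softener code; whether the project computes unit values exactly this way is an assumption.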

Data flow

To put the data flow in context, Figure 1 shows how the data move through processing in this research setting, and how they would flow if Dominick’s were a real production dataset.

Figure 1: Visual flow of data processing for Dominick’s as if it were a real production dataset

Categories

To simulate a real use case, suppose that Dominick’s data were used to produce official price indices. Below, outputs for several categories are shown from a bird’s-eye view of the whole dataset. The purpose is to pick representative periods for targeted analysis: focusing on one specific period simulates a normal monthly production process.
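One way to simulate the monthly process is to cut the full history down to the periods a production run would actually have when compiling a target month. A minimal R sketch follows; the hypothetical `production_window()` helper and the 13-month window length are assumptions (13 months is a common choice for multilateral windows), not taken from the project code.

```r
# Sketch: the periods a production run for `target` would have available.
# production_window() and the 13-month default are assumptions.
production_window <- function(periods, target, window_len = 13) {
  periods <- sort(unique(periods))            # "YYYY-MM" strings sort chronologically
  pos <- match(target, periods)
  if (is.na(pos)) stop("target period not found: ", target)
  periods[max(1, pos - window_len + 1):pos]   # shorter at the start of the series
}

all_periods <- format(seq(as.Date("1991-01-01"), as.Date("1993-12-01"), by = "month"), "%Y-%m")
win <- production_window(all_periods, "1992-07")
print(range(win))  # "1991-07" "1992-07"
```

For July 1992 this keeps July 1991 through July 1992, so no later months can leak into the calculation; the returned periods can then filter item-level data with `%in%` on its period column.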

Fabric Softener


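```r
# Load project helper functions (Dominick's utilities and index methods)
source("R/dominicks_utils.R")
source("R/price_indices.R")

library(arrow)   # read_parquet() / write_parquet()
library(plotly)  # interactive chart; also re-exports %>%
library(DT)

# Item-by-reference-period data for the Fabric Softener (fsf) category
ird <- read_parquet("data/clean/ird_fsf_timesamp_NULL_groupby_NITEM-REF_PERIOD.parquet")

# CCDI index extended with a mean splice; returned vector is named by period
ccdi <- spliced_CCDI(ird)
ccdi_df <- data.frame(
  period = names(ccdi),
  score = as.numeric(ccdi)
)

dir.create("output", showWarnings = FALSE)
write_parquet(ccdi_df, "output/fsf_ccdi.parquet")

fig <- plot_ly(ccdi_df,
               x = ~period,
               y = ~score,
               type = "scatter",
               mode = "lines",
               text = ~paste("Period:", period, "<br>Index:", round(score, 4)),
               hoverinfo = "text") %>%
  layout(title = "CCDI with mean splice (Fabric Softeners category)",
         xaxis = list(title = "Time Period", type = "category"), # keep periods in data order
         yaxis = list(title = "Index", tickformat = ".3f"),      # three decimal places
         # Vertical red line marking 1992-07, the period of the example dashboard
         shapes = list(
           list(
             type = "line",
             x0 = "1992-07",
             x1 = "1992-07",
             y0 = 0,
             y1 = 1,
             yref = "paper",
             line = list(color = "red", width = 2)
           )
         ))

fig  # print the chart
```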
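The code above relies on `spliced_CCDI()` from the project’s `R/price_indices.R`, which is not shown. As an illustration only, the following self-contained R sketch computes a CCDI (GEKS with bilateral Törnqvist indices) within each rolling window and links consecutive windows with a mean splice. Assumptions: the column names `period`, `item`, `price` and `expend` are hypothetical; each bilateral Törnqvist uses only items present in both periods; and the splice links onto the previous window’s index rather than onto published values, which is only one of several mean-splice variants. The project’s implementation may differ on any of these points.

```r
# Sketch only: CCDI (GEKS-Törnqvist) within rolling windows, linked by a
# mean splice. Column names period, item, price, expend are hypothetical.

# Log bilateral Törnqvist index from period a to period b,
# using only items sold in both periods (matched sample)
tornqvist_log <- function(d, a, b) {
  m <- merge(d[d$period == a, ], d[d$period == b, ],
             by = "item", suffixes = c(".a", ".b"))
  s_a <- m$expend.a / sum(m$expend.a)
  s_b <- m$expend.b / sum(m$expend.b)
  sum(0.5 * (s_a + s_b) * log(m$price.b / m$price.a))
}

# Log CCDI levels for the periods in `win`, normalised to 0 at win[1]
ccdi_window_log <- function(d, win) {
  n <- length(win)
  L <- matrix(0, n, n)  # L[i, j] = log Törnqvist from win[i] to win[j]
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i != j) L[i, j] <- tornqvist_log(d, win[i], win[j])
  }
  # GEKS: log P(1 -> j) = mean over i of [log P(i -> j) - log P(i -> 1)]
  setNames(colMeans(L) - mean(L[, 1]), win)
}

# Mean splice onto the previous window (one of several variants)
spliced_ccdi_sketch <- function(d, window_len = 13) {
  stopifnot(window_len >= 2)
  periods <- sort(unique(d$period))
  n <- length(periods)
  w <- min(window_len, n)
  out <- numeric(n)
  out[1:w] <- ccdi_window_log(d, periods[1:w])   # first window used as is
  if (n > w) for (t in (w + 1):n) {
    new_w <- ccdi_window_log(d, periods[(t - w + 1):t])
    old_w <- ccdi_window_log(d, periods[(t - w):(t - 1)])
    ov    <- periods[(t - w + 1):(t - 1)]        # periods in both windows
    # Movement t-1 -> t implied by linking through each overlap period,
    # averaged in logs (i.e. a geometric mean)
    links <- (new_w[periods[t]] - new_w[ov]) - (old_w[periods[t - 1]] - old_w[ov])
    out[t] <- out[t - 1] + mean(links)
  }
  setNames(exp(out), periods)
}

# Toy data: three items whose prices all rise 10% per month
toy <- expand.grid(item = c("a", "b", "c"), t = 1:5, stringsAsFactors = FALSE)
toy$period <- sprintf("1992-%02d", toy$t)
toy$price  <- unname(c(a = 1, b = 2, c = 4)[toy$item]) * 1.1^(toy$t - 1)
toy$expend <- unname(c(a = 30, b = 50, c = 20)[toy$item])
idx <- spliced_ccdi_sketch(toy, window_len = 3)
print(round(idx, 4))
```

With prices that all rise 10% a month, this sketch returns 1, 1.1, 1.21, 1.331 and 1.4641, which makes a quick sanity check. On real data such as the Fabric Softener category, results will depend on the window length and on which splice variant is used.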

Beer

Front-end-candies

Cookies

Bottled Juices

Oatmeal

Example dashboards

Title	Subtitle	Categories
1995-03 Dashboard	Monthly Dashboard for Beer sold at Dominick’s: March 1995	Dominicks, Beer, GEKS
1992-07 Dashboard	Monthly Dashboard for Fabric Softener sold at Dominick’s: July 1992	Dominicks, Fabric Softener, GEKS

References

Lamboray, Claude. 2021. “Index Compilation Techniques for Scanner Data: An Overview.” Group of Experts on Consumer Price Indices.

Mehrhoff, Jens. 2018. “Promoting the Use of a Publically Available Scanner Data Set in Price Index Research and for Capacity Building.” Manuscript, European Commission. https://bit.ly/2ZBUbg9.

Chicago Booth School of Business. 2022. “Dominick’s Finer Foods Dataset.” https://www.chicagobooth.edu/research/kilts/research-data/dominicks.

Webster, Michael, and Rory C Tarnow-Mordi. 2019. “Decomposing Multilateral Price Indexes into the Contributions of Individual Commodities.” Journal of Official Statistics 35 (2): 461–86.

Footnotes

See the catalogue record in the Price Statistics Open Data catalogue for more information about the dataset.