Draft

Showcase with Dominick’s dataset

Dominick's offers a rich, real-world scanner dataset (focused on food sold in stores).

Overview of the data

The Dominick’s dataset is a rich and useful resource for validating how diagnostics and analytics can be designed for scanner data. Since Mehrhoff (2018) first proposed its use in price index research, it has been widely used within price statistics, from the analysis of multilateral methods by Lamboray (2021) to empirical tests of the decomposition of multilateral methods by Webster and Tarnow-Mordi (2019). We use this dataset because it is a realistic example of the data that national statistical offices (NSOs) need to work with.

The dataset is available on the Chicago Booth School website, pre-categorized into separate product and movement files (Kilts Center for Marketing [2013] 2018).¹ This makes it simple to work with, as we can assume that each category represents an elementary index prior to its integration into the CPI with other retailer datasets and field data.
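
As a rough illustration of how the product and movement files relate, the sketch below joins toy stand-ins for the two files on the product identifier. The column names (UPC, DESCRIP, NITEM, STORE, WEEK, MOVE, QTY, PRICE) and the sales formula follow the Dominick’s data manual as we understand it and should be checked against the actual files; the values are invented for illustration only.

```r
# Toy stand-in for one category's product (UPC) file.
# Column names are assumed from the data manual; values are made up.
upc <- data.frame(
  UPC     = c(1001, 1002),
  DESCRIP = c("SOFTENER A 64OZ", "SOFTENER B 32OZ"),
  NITEM   = c(501, 502)
)

# Toy stand-in for the matching movement file (store-week sales records).
movement <- data.frame(
  STORE = c(2, 2, 5, 5),
  UPC   = c(1001, 1002, 1001, 1002),
  WEEK  = c(20, 20, 20, 21),
  MOVE  = c(10, 4, 6, 0),           # units sold
  QTY   = c(1, 1, 2, 1),            # bundle size that PRICE refers to
  PRICE = c(2.50, 1.80, 4.00, 1.80)
)

# Weekly sales value, adjusting for bundle pricing (e.g. "2 for 4.00").
movement$SALES <- movement$PRICE * movement$MOVE / movement$QTY

# Keep only records with actual sales, then attach the product attributes.
sold   <- movement[movement$MOVE > 0, ]
joined <- merge(sold, upc, by = "UPC")
```

The joined table carries both the transaction values and the product attributes (such as NITEM) that an aggregation step could group on.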

Categories

To simulate a real use case, let's say that we use Dominick’s data to produce price indices in a production setting. Below, outputs for several categories are shown from a bird's-eye view of the whole dataset. The aim is to pick representative periods for targeted analysis, that is, analysis restricted to a specific time period, which simulates a normal monthly production process.
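
One simple way to shortlist a period for targeted analysis is to look for the largest period-on-period movement in an index series. The sketch below is only one possible criterion, not a prescribed method; the series is a toy named numeric vector with invented values, shaped like the index output that the charts in this section are built from.

```r
# Toy index series: a named numeric vector, with names giving the period.
# Values are made up for illustration.
index <- c("1990-01" = 1.000, "1990-02" = 1.012, "1990-03" = 0.951,
           "1990-04" = 0.958, "1990-05" = 0.970)

# Period-on-period percentage change; the first period has no predecessor.
change <- c(NA, diff(index) / head(index, -1) * 100)
names(change) <- names(index)

# Candidate for targeted analysis: the period with the largest absolute move.
# which.max() skips the leading NA.
target_period <- names(change)[which.max(abs(change))]
```

Other criteria, such as periods with heavy product churn or typical rather than extreme movements, may be just as relevant depending on the diagnostic being tested.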

Fabric Softener

Figure 1 shows the Fabric Softeners category over several years.

# Load the project's helper functions
source("../../src/dominicks_utils.R")

library(plotly)

# Aggregate the Fabric Softeners ('fsf') data over the analysis window
ird <- homogenous_product_aggregation(
  category_name = 'fsf',
  data_dir = '../../data/processed',
  time_sample = c(1, 2),
  group_by_parameters = c('NITEM', 'REF_PERIOD'),
  window = list(
    "start" = "1990-01-01",
    "end"   = "1997-04-01"
  )
)

# Compute the spliced CCDI index and reshape it for plotting
ccdi <- spliced_CCDI(ird)
ccdi_df <- data.frame(
  period = names(ccdi),
  score = as.numeric(ccdi)
)

# Interactive line chart of the index over time
fig <- plot_ly(ccdi_df,
               x = ~period,
               y = ~score,
               type = 'scatter',
               mode = 'lines',
               text = ~paste("Period:", period, "<br>Index:", score),
               hoverinfo = 'text') %>%
  layout(title = "CCDI with mean splice (Fabric Softeners category)",
         xaxis = list(title = "Time Period", type = 'category'), # Ensures order is maintained
         yaxis = list(title = "Index", tickformat = ".1"))

fig
Figure 1: Fabric Softeners GEKS-T (CCDI) example
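
The chart titles refer to a CCDI index, which is another name for GEKS-Törnqvist (GEKS-T): the index between two periods is the geometric mean, over every possible link period, of chained bilateral Törnqvist indices. The sketch below shows that calculation on a toy, fully matched panel with invented prices and quantities. It is not the project's `spliced_CCDI` function, whose internals are not shown here; it omits the splicing step and assumes every product is sold in every period, which real scanner data rarely satisfies. The function names `tornqvist` and `geks_tornqvist` are hypothetical.

```r
# Toy price and quantity matrices: rows are periods, columns are products.
prices <- matrix(c(1.00, 2.00,
                   1.10, 1.90,
                   1.20, 2.10),
                 nrow = 3, byrow = TRUE)
quantities <- matrix(c(10, 5,
                        9, 6,
                        8, 5),
                     nrow = 3, byrow = TRUE)

# Bilateral Tornqvist index from period a to period b:
# a geometric mean of price relatives weighted by average expenditure shares.
tornqvist <- function(p, q, a, b) {
  share_a <- p[a, ] * q[a, ] / sum(p[a, ] * q[a, ])
  share_b <- p[b, ] * q[b, ] / sum(p[b, ] * q[b, ])
  exp(sum(0.5 * (share_a + share_b) * log(p[b, ] / p[a, ])))
}

# GEKS-Tornqvist index for each period relative to period 1:
# geometric mean over all link periods l of T(1, l) * T(l, t).
geks_tornqvist <- function(p, q) {
  n <- nrow(p)
  sapply(seq_len(n), function(t) {
    links <- sapply(seq_len(n), function(l) {
      tornqvist(p, q, 1, l) * tornqvist(p, q, l, t)
    })
    exp(mean(log(links)))
  })
}

index <- geks_tornqvist(prices, quantities)
```

Because every period in the window acts as a link, the resulting series is transitive within the window, which is the main reason multilateral methods are favoured for scanner data.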

Beer

Figure 2 shows the Beer category over several years.

# Aggregate the Beer ('ber') data over the analysis window
ird <- homogenous_product_aggregation(
  category_name = 'ber',
  data_dir = '../../data/processed',
  time_sample = c(1, 2),
  group_by_parameters = c('NITEM', 'REF_PERIOD'),
  window = list(
    "start" = "1990-01-01",
    "end"   = "1997-04-01"
  )
)

# Compute the spliced CCDI index and reshape it for plotting
ccdi <- spliced_CCDI(ird)
ccdi_df <- data.frame(
  period = names(ccdi),
  score = as.numeric(ccdi)
)

fig <- plot_ly(ccdi_df,
               x = ~period,
               y = ~score,
               type = 'scatter',
               mode = 'lines',
               text = ~paste("Period:", period, "<br>Index:", score),
               hoverinfo = 'text') %>%
  layout(title = "CCDI with mean splice (Beer category)",
         xaxis = list(title = "Time Period", type = 'category'), # Ensures order is maintained
         yaxis = list(title = "Index", tickformat = ".1"))

fig
Figure 2: Beer GEKS-T (CCDI) example

Analgesics

Figure 3 shows the Analgesics category over several years.

# Aggregate the Analgesics ('ana') data over the analysis window
ird <- homogenous_product_aggregation(
  category_name = 'ana',
  data_dir = '../../data/processed',
  time_sample = c(1, 2),
  group_by_parameters = c('NITEM', 'REF_PERIOD'),
  window = list(
    "start" = "1990-01-01",
    "end"   = "1997-04-01"
  )
)

# Compute the spliced CCDI index and reshape it for plotting
ccdi <- spliced_CCDI(ird)
ccdi_df <- data.frame(
  period = names(ccdi),
  score = as.numeric(ccdi)
)

fig <- plot_ly(ccdi_df,
               x = ~period,
               y = ~score,
               type = 'scatter',
               mode = 'lines',
               text = ~paste("Period:", period, "<br>Index:", score),
               hoverinfo = 'text') %>%
  layout(title = "CCDI with mean splice (Analgesics category)",
         xaxis = list(title = "Time Period", type = 'category'), # Ensures order is maintained
         yaxis = list(title = "Index", tickformat = ".1"))

fig
Figure 3: Analgesics GEKS-T (CCDI) example

Cookies

Figure 4 shows the Cookies category over several years.

# Aggregate the Cookies ('coo') data over the analysis window
ird <- homogenous_product_aggregation(
  category_name = 'coo',
  data_dir = '../../data/processed',
  time_sample = c(1, 2),
  group_by_parameters = c('NITEM', 'REF_PERIOD'),
  window = list(
    "start" = "1990-01-01",
    "end"   = "1997-04-01"
  )
)

# Compute the spliced CCDI index and reshape it for plotting
ccdi <- spliced_CCDI(ird)
ccdi_df <- data.frame(
  period = names(ccdi),
  score = as.numeric(ccdi)
)

fig <- plot_ly(ccdi_df,
               x = ~period,
               y = ~score,
               type = 'scatter',
               mode = 'lines',
               text = ~paste("Period:", period, "<br>Index:", score),
               hoverinfo = 'text') %>%
  layout(title = "CCDI with mean splice (Cookies category)",
         xaxis = list(title = "Time Period", type = 'category'), # Ensures order is maintained
         yaxis = list(title = "Index", tickformat = ".1"))

fig
Figure 4: Cookies GEKS-T (CCDI) example

Bottled Juices

Figure 5 shows the Bottled Juices category over several years.

# Aggregate the Bottled Juices ('bjc') data over the analysis window
ird <- homogenous_product_aggregation(
  category_name = 'bjc',
  data_dir = '../../data/processed',
  time_sample = c(1, 2),
  group_by_parameters = c('NITEM', 'REF_PERIOD'),
  window = list(
    "start" = "1990-01-01",
    "end"   = "1997-04-01"
  )
)

# Compute the spliced CCDI index and reshape it for plotting
ccdi <- spliced_CCDI(ird)
ccdi_df <- data.frame(
  period = names(ccdi),
  score = as.numeric(ccdi)
)

fig <- plot_ly(ccdi_df,
               x = ~period,
               y = ~score,
               type = 'scatter',
               mode = 'lines',
               text = ~paste("Period:", period, "<br>Index:", score),
               hoverinfo = 'text') %>%
  layout(title = "CCDI with mean splice (Bottled Juices category)",
         xaxis = list(title = "Time Period", type = 'category'), # Ensures order is maintained
         yaxis = list(title = "Index", tickformat = ".1"))

fig
Figure 5: Bottled Juices GEKS-T (CCDI) example

Dominick’s-specific data flow

Figure 6 shows how the data flows in this research setting and how it would flow if Dominick’s were a real production dataset.

Figure 6: Visual flow of data processing for Dominick’s as if it were a real production dataset

References

Kilts Center for Marketing. (2013) 2018. Dominick’s Data Manual.
Lamboray, Claude. 2021. “Index Compilation Techniques for Scanner Data: An Overview.” Group of Experts on Consumer Price Indices.
Mehrhoff, Jens. 2018. “Promoting the Use of a Publically Available Scanner Data Set in Price Index Research and for Capacity Building.” Manuscript, European Commission, https://bit.ly/2ZBUbg9.
Webster, Michael, and Rory C Tarnow-Mordi. 2019. “Decomposing Multilateral Price Indexes into the Contributions of Individual Commodities.” Journal of Official Statistics 35 (2): 461–86.

Footnotes

  1. See the catalogue record in the Price Statistics Open Data catalogue for more information about the dataset.