Putting windows in the black box: Developing a set of diagnostic and analytical tools to support multilateral price indexes in production

Presented at the 19th Ottawa Group meeting, Warsaw, Poland, May 13-15, 2026

Working Paper
Authors

Serge Goussev

Frances Krsinich

Abstract

Using alternative data as a source in lower-level aggregation considerably complicates the task of understanding and explaining price movements within the overall CPI. This is made even harder when the NSO applies multilateral price index methods, as data from multiple time periods is used to measure price change, expanding the number of data points and trends that must be considered. Specifically, alongside the overall price measure, which could be produced by processing the data as a ‘black box’, National Statistical Organizations (NSOs) also need to identify any issues with the input data that may affect the measurement (input diagnostics) and to explain the calculated index (output analytics). This research develops, tests, and demonstrates, in an open and reproducible manner, a set of diagnostic and analytical outputs that could be used alongside multilateral methods. The paper thus contributes to the operational considerations necessary for adopting these methods, ensuring that they can be better understood prior to adoption and that, once adopted, they support the NSO in understanding why price movements occur as they do.

Introduction

National Statistical Organizations (NSOs) are increasingly using alternative sources of data, such as scanner or administrative data, to measure price change. These data sources are high-frequency, volatile and characterized by high product churn, and as such require multilateral indices to be measured properly. Much of the attention so far has been on the multilateral methods themselves and on how to select and operationalize the most appropriate method to measure overall average price change. Increasingly, effort is also being placed on breaking the overall trend down into its sub-components, for example to measure shrinkflation and cheapflation, and on related ways of understanding the dispersion of price change (Kristiansen and Kolstad 2026). These trends are not

There are different multilateral methods, the most common being the RGEKS, GK, hedonic and TPD-based methods. All of them use large amounts of data and involve substantial computation, whether through multilayered bilateral index calculations (the GEKS methods) or large-scale regression models on sparse matrices (the TPD and hedonic methods). Unlike traditional index estimation, where it is relatively straightforward to explain the drivers of price movements at different levels of the price index hierarchy, multilateral estimation tends to feel like a ‘black box’. This opacity is not acceptable in production, both because some of the methods can be sensitive to specific data issues that need to be caught and handled (Fox 2026), and because NSOs need to understand and explain the movements they publish.

This research develops and tests a set of sensible default diagnostic outputs intended to support both analytical and operational needs. Specifically, it ties diagnostic and analytical outputs to the stages of the data processing pipeline, from the early stages that focus on data quality testing (Guðmundsdóttir and Jónasdóttir 2016) through to the decomposition of multilateral indexes (Webster and Tarnow-Mordi 2019), and identifies the requirements of each price index method that need to be monitored and the conceptual aspects to keep in mind. Dashboards that operationalize the applicable diagnostic and analytical outputs are demonstrated using open benchmark datasets to support peer review and simplify adoption. The project thus outlines, in an open and reproducible fashion, an approach that can help NSOs understand and adopt the analytical and diagnostic measures needed to use multilateral methods in production and for various research purposes. Developing such robust measures allows the NSO to validate that everything is operating as it should, and to understand and explain the sources of inflation in support of the dissemination process.

Price index formulas and their requirements

The three classes of multilateral methods that are typically adopted, and that are the focus of this research, are the Time Dummy Hedonic (TDH), the GEKS family, and the Time Product Dummy (TPD). The TDH is estimated using the characteristics of the products in a regression, while the GEKS and TPD methods use the product id. While all three methods appropriately deal with legitimate product churn (i.e. products entering and leaving the market) and missing prices (periods in which the product isn’t sold), inconsistencies in the data over time can lead to breaks in the longitudinal record that affect the indexes, sometimes significantly.
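To make the GEKS family less abstract, here is a minimal pure-Python sketch of a GEKS-Törnqvist (GEKS-T) index over a single estimation window. It assumes a hypothetical input layout, a dictionary mapping each period to `{product_id: (price, quantity)}`; the function names `tornqvist` and `geks_t` are illustrative rather than taken from any particular package, and the sketch omits splicing across windows and the handling of period pairs with no matched products.

```python
import math


def tornqvist(window, s, t):
    """Bilateral Törnqvist price index from period s to period t.

    Uses only products sold in both periods; expenditure shares are
    computed over that matched set.
    """
    matched = window[s].keys() & window[t].keys()
    if not matched:
        raise ValueError(f"no matched products between periods {s} and {t}")
    spend_s = {i: window[s][i][0] * window[s][i][1] for i in matched}
    spend_t = {i: window[t][i][0] * window[t][i][1] for i in matched}
    total_s, total_t = sum(spend_s.values()), sum(spend_t.values())
    log_index = sum(
        0.5 * (spend_s[i] / total_s + spend_t[i] / total_t)
        * math.log(window[t][i][0] / window[s][i][0])
        for i in matched
    )
    return math.exp(log_index)


def geks_t(window):
    """GEKS-Törnqvist index for every period, normalised to the first period.

    P(0, t) is the geometric mean over all periods k in the window of
    P_T(0, k) * P_T(k, t).
    """
    periods = sorted(window)
    base = periods[0]
    index = {}
    for t in periods:
        log_sum = sum(
            math.log(tornqvist(window, base, k)) + math.log(tornqvist(window, k, t))
            for k in periods
        )
        index[t] = math.exp(log_sum / len(periods))
    return index


# Hypothetical window of three periods
window = {
    1: {"a": (1.00, 10), "b": (2.00, 5)},
    2: {"a": (1.10, 9), "b": (2.00, 6)},
    3: {"a": (1.20, 8), "c": (3.00, 4)},  # "b" exits, "c" enters
}
print(geks_t(window))
```

Because every period is compared with every other period in the window, a product id that is recoded mid-window becomes an exit plus an entry in all of the bilateral comparisons that span the change.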

As all three use a window of time (i.e. multiple periods, hence ‘multilateral’), it is important that the characteristics (for TDH) or the product ids (for GEKS and TPD) are consistently defined over the length of the estimation window. In practice, with alternative price data such as scanner, online or administrative data, the coding of characteristics or product ids can change over time. If this isn’t identified and the characteristics or product ids made consistent, it will lead to incorrect movements in the index. It is therefore important to have diagnostics that identify when this could be happening.
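One way to look for such coding changes is sketched below, assuming the window is held as `{period: {product_id: {characteristic: value}}}` (a hypothetical layout). `inconsistent_ids` lists ids whose characteristics change within the window, which matters for TDH, and `possible_relabels` lists an exiting id and an entering id with identical characteristics in consecutive periods, a possible break in the longitudinal record for GEKS and TPD. Both function names are illustrative, and a flag is only a prompt for investigation: a changed characteristic may be a genuine product change such as downsizing.

```python
from collections import defaultdict


def inconsistent_ids(window):
    """Product ids whose coded characteristics differ between periods in the window."""
    versions = defaultdict(set)
    for period in window:
        for pid, chars in window[period].items():
            versions[pid].add(tuple(sorted(chars.items())))
    return sorted(pid for pid, seen in versions.items() if len(seen) > 1)


def possible_relabels(window):
    """(old_id, new_id, period) where an id exits and a new id with identical
    characteristics enters in the next period, a possible recoding break."""
    periods = sorted(window)
    flags = []
    for prev, cur in zip(periods, periods[1:]):
        exits = window[prev].keys() - window[cur].keys()
        entries = window[cur].keys() - window[prev].keys()
        for old in sorted(exits):
            for new in sorted(entries):
                if window[prev][old] == window[cur][new]:
                    flags.append((old, new, cur))
    return flags


# Hypothetical window: A1 changes size, B1 reappears under a new id B2
window = {
    1: {"A1": {"brand": "X", "size_ml": 500}, "B1": {"brand": "Y", "size_ml": 1000}},
    2: {"A1": {"brand": "X", "size_ml": 450}, "B2": {"brand": "Y", "size_ml": 1000}},
}
print(inconsistent_ids(window))   # ['A1']
print(possible_relabels(window))  # [('B1', 'B2', 2)]
```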

Defining diagnostics

Diagnostics can be categorized into three types:

  1. Monitoring and Validation diagnostics: check that the data has been received and looks reasonable, with reference to previous data
  2. Input diagnostics: check that the data as structured for input to the multilateral estimation process looks sensible, and that, at this level of aggregation, any errors that are identifiable by outliers have been followed up.
  3. Analytical and output diagnostics: these are possible once the indexes have been calculated. Analytical diagnostics are about understanding the quality-adjusted price change using decomposition measures, while output diagnostics are about picking up problems, either with the data or with the estimation process itself, that can be identified by comparing the index with previous production runs and by identifying unusual or significant movements in the index.
Figure 1: Overall data flow

These diagnostics are aligned with the value-adding process within the processing pipeline common to NSOs. Specifically, data is increasingly processed the further it moves from initial reception towards elementary price indices, which is where multilateral indices are integrated with other data sources into an Elementary Aggregate (Office for National Statistics 2025). In production, NSOs will generally start by evaluating the overall Elementary Aggregate and the sub-indices that contributed to it, then the analytical and output diagnostics of a specific elementary price index, and then proceed backwards towards the monitoring diagnostics to answer various questions of the data each month. Each stage thus has to connect clearly to the previous one and show when it is appropriate to go back one more step. Figure 1 shows the overall flow of data from initial ingestion to elementary price index (typically just before integration with other data): raw data is first received, then ingested, classified and otherwise processed to create homogeneous products, which are then aggregated via the selected multilateral method.

Monitoring/Validation

The first set of diagnostics are the basic checks that should be performed to ensure that the data received is complete and correct. These were set out as early as Guðmundsdóttir and Jónasdóttir (2016), and the UN e-handbook provides a useful summary in its section on monitoring and validation. This paper builds on these to conceptualize the following key aspects:

  • Checking the data was received and is in the right format with the required variables.
  • If the data is received disaggregated by store, checking that all stores are either represented or their churn understood and verified, and any store-level classification changes understood and verified.
  • Checking descriptive measures within each retailer/store against historical benchmarks. For instance:
    1. Total revenue by retailer/store is within certain range limits
    2. Number of article codes (and, for a specified sample, a check that sample selection is correct)
    3. Unit value prices per article code lie within an acceptable range of variation between consecutive periods
    4. Time series of monthly quantities and prices, to evaluate data consistency and the appearance of outliers
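The first three checks can be sketched in a few lines of Python. The thresholds here (`k = 3` standard deviations around the store’s own history for revenue and article counts, and unit value relatives outside `[0.5, 2.0]`) are arbitrary placeholders rather than recommended values, and the function names are hypothetical; the fourth check, inspecting time series, is usually visual and is not covered.

```python
import statistics


def out_of_range(current, history, k=3.0):
    """True if `current` is more than k standard deviations from the historical mean.

    Needs at least two historical values; with zero historical variation,
    any change at all is flagged.
    """
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    return abs(current - mean) > k * sd


def store_flags(revenue_history, revenue_now, codes_history, codes_now, k=3.0):
    """Checks 1 and 2 for one retailer/store against its own history."""
    flags = []
    if out_of_range(revenue_now, revenue_history, k):
        flags.append("total revenue")
    if out_of_range(codes_now, codes_history, k):
        flags.append("number of article codes")
    return flags


def unusual_unit_values(prev_prices, cur_prices, low=0.5, high=2.0):
    """Check 3: article codes whose unit value relative between consecutive
    periods falls outside [low, high]."""
    common = prev_prices.keys() & cur_prices.keys()
    return sorted(c for c in common if not low <= cur_prices[c] / prev_prices[c] <= high)


print(store_flags([100, 102, 98, 101, 99], 150, [50, 51, 49, 50, 50], 50))  # ['total revenue']
print(unusual_unit_values({"a": 1.0, "b": 2.0, "c": 3.0},
                          {"a": 1.1, "b": 5.0, "d": 1.0}))  # ['b']
```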

Note that if data is received more frequently than the production schedule dictates, this is an opportunity to identify and follow up on any issues early.

Input diagnostics

Input diagnostics investigate the data once it has been processed into the correct format for the index estimation process. For example, the raw data might be at the level of individual transactions, or aggregated at a finer time level (e.g. weeks) than the production index (e.g. months). Data outside the target coverage, e.g. by region or product, may also be filtered out. At this stage, descriptive statistics similar to those of the monitoring/validation stage should be recalculated unless the transformation from raw to processed data is trivial. That is, descriptive measures within each retailer/store should be checked against historical benchmarks (based on the corresponding historical processed data):

  1. Total revenue by retailer/store is within certain range limits
  2. Number of article codes (and for a specified sample, checks that sample selection is correct)
  3. Unit value prices per index-aggregation,1 between consecutive periods, lie within an acceptable range of variation
  4. Time series of monthly quantities and prices per index-aggregation* to evaluate data consistency and the appearance of outliers
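As an illustration of the kind of transformation that makes it worth repeating these checks, the sketch below aggregates hypothetical weekly rows into monthly unit values (total revenue divided by total quantity) at whatever level the `key` function selects, and then computes month-on-month unit value relatives per index aggregation. The row field names (`month`, `group`, `revenue`, `quantity`) and the function names are assumptions. A group-level unit value is a raw average that mixes price and compositional change, so it serves as a benchmark to compare against history, not as a price index.

```python
from collections import defaultdict


def unit_values(rows, key):
    """Unit value (total revenue / total quantity) for each key, e.g.
    (month, product) or (month, index group)."""
    revenue, quantity = defaultdict(float), defaultdict(float)
    for row in rows:
        k = key(row)
        revenue[k] += row["revenue"]
        quantity[k] += row["quantity"]
    return {k: revenue[k] / quantity[k] for k in revenue if quantity[k] > 0}


def month_on_month(uv):
    """Unit value relatives between consecutive months for each group.

    Keys of `uv` are assumed to be (month, group), with months as 'YYYY-MM'
    strings so that they sort chronologically.
    """
    series = defaultdict(dict)
    for (month, group), value in uv.items():
        series[group][month] = value
    relatives = {}
    for group, values in series.items():
        months = sorted(values)
        for prev, cur in zip(months, months[1:]):
            relatives[(cur, group)] = values[cur] / values[prev]
    return relatives


# Hypothetical weekly rows, already filtered to the target coverage
rows = [
    {"month": "2026-01", "product": "a", "group": "g", "revenue": 10.0, "quantity": 5.0},
    {"month": "2026-01", "product": "a", "group": "g", "revenue": 12.0, "quantity": 5.0},
    {"month": "2026-02", "product": "a", "group": "g", "revenue": 15.0, "quantity": 5.0},
]
by_group = unit_values(rows, key=lambda r: (r["month"], r["group"]))
print(month_on_month(by_group))
```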

Outliers that weren’t identified at the monitoring/validation step may emerge at this stage. Diagnostics at this stage also focus, to a large extent, on product churn: are the levels of churn (products/articles leaving and arriving) in line with historical trends and variation? This is the stage at which issues caused by coding changes in the input data, particularly changes in the coding of characteristics if those are used in the estimation, can be picked up. For instance, Stansfield and Krsinich (2022) show how a data coding issue could lead to breaks in product identifiers for consumer electronics (where the product identifier is defined in terms of the values of characteristics), and how such breaks can be identified by analyzing churn for discontinuities.
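The churn question can be turned into a simple check, sketched here under assumptions: the input maps each period to the set of product ids observed in it, and a period’s entry or exit rate is flagged when it lies more than `k` standard deviations from the mean of all earlier periods. `k = 3`, `min_history = 6` and the function names are hypothetical choices, not values from this paper. A wholesale recoding of ids, as in the demo, shows up as a simultaneous spike in both rates.

```python
import statistics


def churn_rates(products_by_period):
    """(entry rate, exit rate) between each pair of consecutive periods.

    Entry rate: share of this period's products that are new.
    Exit rate: share of last period's products that have gone.
    Assumes every period has at least one product.
    """
    periods = sorted(products_by_period)
    rates = {}
    for prev, cur in zip(periods, periods[1:]):
        before, after = products_by_period[prev], products_by_period[cur]
        rates[cur] = (len(after - before) / len(after), len(before - after) / len(before))
    return rates


def flag_churn(rates, k=3.0, min_history=6):
    """Flag (period, 'entry'/'exit') where the rate is more than k standard
    deviations from the mean of all earlier periods."""
    periods = sorted(rates)
    flags = []
    for i, period in enumerate(periods):
        if i < min_history:
            continue  # not enough history to form a benchmark yet
        for j, label in enumerate(("entry", "exit")):
            history = [rates[p][j] for p in periods[:i]]
            mean, sd = statistics.mean(history), statistics.stdev(history)
            # small floor so that a perfectly stable history still allows a comparison
            if abs(rates[period][j] - mean) > k * max(sd, 1e-9):
                flags.append((period, label))
    return flags


# Hypothetical panel: one product replaced each month, then every id recoded in month 9
panel = {t: {f"p{i}" for i in range(t, t + 10)} for t in range(1, 9)}
panel[9] = {f"q{i}" for i in range(10)}
print(flag_churn(churn_rates(panel)))  # [(9, 'entry'), (9, 'exit')]
```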

Analytical/Output diagnostics

The third class of measures and diagnostics are those that can be undertaken once the indexes have been estimated. These use the estimated indexes themselves both to identify any previously unidentified issues with the data or failures of the index estimation itself (output diagnostics), and to understand the price changes (analytical measures).

Output diagnostics

Decomposition of the indexes to the article level can be used to identify any previously unidentified errors in the data. Ranking the contribution of each article/product, and flagging it for investigation if it is over some threshold value, is an effective way to approach this. In most cases we would not expect to pick up data errors at this stage, but it is another valuable opportunity to do so. As multilateral indexes use an estimation window of historical data, the movements in the overlapping window of the current and previous production runs can be compared directly. While it is legitimate for the movements to change slightly with updated data, there shouldn’t be any major differences. Confirming this is an important part of verifying that the estimation has completed correctly.
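The overlap comparison can be sketched as follows. It compares period-on-period movements rather than index levels, on the assumption that each production run may be normalised to a different base period as the window rolls forward. The run layout `{period: index_level}`, the function names and the 0.005 tolerance are all hypothetical; the contribution ranking needs a decomposition and is not covered by this sketch.

```python
def movements(index, periods):
    """Period-on-period movements of an index over the given ordered periods."""
    return {cur: index[cur] / index[prev] for prev, cur in zip(periods, periods[1:])}


def compare_runs(previous, current, tolerance=0.005):
    """Revisions to period-on-period movements in the overlap of two runs,
    keeping only those larger than `tolerance` in absolute value."""
    overlap = sorted(previous.keys() & current.keys())
    old, new = movements(previous, overlap), movements(current, overlap)
    return {p: new[p] - old[p] for p in old if abs(new[p] - old[p]) > tolerance}


# Hypothetical runs: last month's window covers periods 1-4, this month's 2-5,
# and each run is normalised to the first period of its own window
previous = {1: 1.00, 2: 1.02, 3: 1.03, 4: 1.05}
current = {2: 1.00, 3: 1.0098, 4: 1.06, 5: 1.07}
print(compare_runs(previous, current))
```

Here only period 4 is flagged: its movement was revised from roughly 1.9% to roughly 5.0%, which would warrant checking both the new data and the estimation run.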

Analytical measures

Multilateral methods applied to alternative price data, such as scanner or administrative data, are very powerful, but given the size of the data and the complexity of the algorithms, it can be difficult, if not impossible, to understand what’s going on ‘under the hood’. Analytical measures are about understanding and verifying the overall movement, particularly if the movement is unusual. This step is doubly critical if the multilateral method is quality-adjusted (i.e. if a TDH is used), as it then becomes necessary to distinguish the pure price change from the compositional change that reflects a shift in the quality of the group of products.
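A simple way to separate the two, assumed here rather than taken from this paper, is to divide a unit value index (the change in total revenue over total quantity) by the quality-adjusted index for the same periods: a ratio above 1 means average unit values rose faster than quality-adjusted prices, pointing to a shift towards higher-priced products. The quality-adjusted index (e.g. from a TDH) is taken as an input, the function names are hypothetical, and the unit value only makes sense when all products share a quantity unit such as litres or kilograms.

```python
def unit_value(period_data):
    """Average unit value: total revenue / total quantity."""
    revenue = sum(p * q for p, q in period_data.values())
    quantity = sum(q for _, q in period_data.values())
    return revenue / quantity


def composition_effect(data, base, t, quality_adjusted_index):
    """Unit value index divided by a quality-adjusted index for the same periods.

    Values above 1 suggest a shift towards higher-priced products.
    """
    uv_index = unit_value(data[t]) / unit_value(data[base])
    return uv_index / quality_adjusted_index


# Hypothetical data {period: {product: (price per litre, litres sold)}}: prices are
# unchanged, but purchases shift towards the dearer product "b"
data = {
    1: {"a": (1.0, 10), "b": (3.0, 10)},
    2: {"a": (1.0, 5), "b": (3.0, 15)},
}
print(composition_effect(data, 1, 2, quality_adjusted_index=1.0))  # 1.25
```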

This is where decomposition (i.e. ‘point effects’) methods are required, and, depending on the multilateral method, different approaches can be used. These range from brute force (iteratively rerunning the index excluding one product or product group at a time) to analytical (exact formulae derived from the index method itself). Decomposition measures for analytical use are most likely to be useful when performed at the sub-index level, for example by brand of computer or type of produce, rather than at the individual article/product/barcode level.
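The brute-force approach can be sketched generically: `leave_one_group_out` reruns any index function with one group of products removed at a time, and `rank_effects` orders the resulting effects and flags those above a threshold, as suggested for output diagnostics. To keep the sketch self-contained, the `matched_jevons` function passed in is a bilateral stand-in, not a multilateral method; in practice the production method would be rerun over the whole estimation window for each excluded group. The function names, the definition of an effect as a difference in index points, and the threshold are all assumptions.

```python
import math


def matched_jevons(data, base, t):
    """Stand-in index only: geometric mean of price relatives for products
    present in both periods. Replace with the production multilateral method."""
    matched = data[base].keys() & data[t].keys()
    if not matched:
        return float("nan")  # e.g. a group removal left no matched products
    logs = [math.log(data[t][i][0] / data[base][i][0]) for i in matched]
    return math.exp(sum(logs) / len(logs))


def leave_one_group_out(data, groups, index_fn, base, t):
    """Brute-force effect of each group: full index minus the index recomputed
    without that group's products (in index points, base = 1).
    `groups` must map every product id in `data` to a group."""
    full = index_fn(data, base, t)
    effects = {}
    for g in sorted(set(groups.values())):
        reduced = {p: {i: v for i, v in obs.items() if groups[i] != g} for p, obs in data.items()}
        effects[g] = full - index_fn(reduced, base, t)
    return full, effects


def rank_effects(effects, threshold):
    """Groups ordered by absolute effect, flagged if above the threshold."""
    ranked = sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(g, e, abs(e) > threshold) for g, e in ranked]


# Hypothetical data {period: {product: (price, quantity)}} and product -> group map
data = {1: {"a": (1.0, 1), "b": (1.0, 1)}, 2: {"a": (1.0, 1), "b": (4.0, 1)}}
groups = {"a": "brand A", "b": "brand B"}
full, effects = leave_one_group_out(data, groups, matched_jevons, base=1, t=2)
print(full, rank_effects(effects, threshold=1.5))
```

In the demo, brand A has an effect of about -2 because removing it leaves only brand B’s quadrupled price, while brand B has an effect of about +1. These brute-force effects do not add up to the total movement, which is one reason exact analytical formulae are attractive where they exist.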

Knowing when to stop

It is infeasible to define a comprehensive set of diagnostics and outputs for operational use. On the one hand, there should be a core set that helps explain most of the fluctuations in a production flow, so designing an optimal set of dashboards is appropriate; in other words, an 80-20 rule can be applied, building a limited number of outputs that explain most movements. On the other hand, edge cases are likely to emerge that require further investigation, which calls for a clean way to query the data at various granularities for one-off analysis. As such, it is critical to build the main dashboards and diagnostics on top of a well-documented and clear data flow that allows easy analysis.

Empirical examples

The following section demonstrates and explains example diagnostics for a specific category of the Dominick’s Finer Foods dataset ([2013] 2018). Further examples are available on the project site.2 Figure 2 shows the application of GEKS-T to the Fabric Softeners category, from which a specific period of price change is then chosen for further analysis.

Figure 2: Fabric Softeners GEKS-T example

Overview of price change this month

Analytical (output) diagnostics

Input diagnostics

Monitoring / Validation diagnostics

Conclusion

References

(2013) 2018. In Dominick’s Data Manual. Kilts Center for Marketing.
Crescenzi, Federico, Gholamreza Hajargasht, Tiziana Laureti, Luigi Palumbo, and D. S. Prasada Rao. 2026. “Cheapflation and Inflation Inequality: Quantile Time Product Dummy Indices from Italian Web Prices.” Proceedings of the 19th Ottawa Group Conference (Warsaw, Poland), May. https://ottawagroup2026.stat.gov.pl/en/DownloadFile?nrSesji=3&nrWystapienia=2&nrPliku=2.
Fox, Kevin J. 2026. “The Curious Case of an Exploding GEKS Index.” Proceedings of the 19th Ottawa Group Meeting (Warsaw, Poland), May. https://ottawagroup2026.stat.gov.pl/en/DownloadFile?nrSesji=2&nrWystapienia=1&nrPliku=2.
Guðmundsdóttir, Heiðrún Erika, and Lára Guðlau Jónasdóttir. 2016. “Scanner Data: Initial Data Testing.” Meeting of the Group of Experts on Consumer Price Indices (Geneva, Switzerland).
Kristiansen, Espen, and Sebastian Kolstad. 2026. “Assessing Cheapflation Using Norwegian Scanner Data.” Proceedings of the 19th Ottawa Group Meeting (Warsaw, Poland), May. https://ottawagroup2026.stat.gov.pl/en/DownloadFile?nrSesji=3&nrWystapienia=1&nrPliku=3.
Office for National Statistics. 2025. Introducing Alternative Data into Consumer Price Statistics: Aggregation and Weights. ONS Website. https://www.ons.gov.uk/economy/inflationandpriceindices/articles/introducingalternativedataintoconsumerpricestatisticsaggregationandweights/2025-04-29.
Stansfield, Matthew, and Frances Krsinich. 2022. “A MAP for the Future of Price Indexes at Stats NZ.” Proceedings of the 17th Ottawa Group on Price Indices (Rome, Italy). https://stats.unece.org/ottawagroup/download/f631.pdf.
Webster, Michael, and Rory C Tarnow-Mordi. 2019. “Decomposing Multilateral Price Indexes into the Contributions of Individual Commodities.” Journal of Official Statistics 35 (2): 461–86.

Footnotes

  1. Note that, at this stage, aggregate statistics such as the (raw) average are usefully calculated at the level of the index itself and of any relevant sub-indexes. Unit prices at the article level have already been investigated in the monitoring/validation stage.

  2. See the examples, such as the Dominick’s example. The overview page shows the categories analyzed, with a number of “Monthly dashboards” listed to showcase how the category could be analyzed for that month.

Citation

BibTeX citation:
@inproceedings{goussev,
  author = {Goussev, Serge and Krsinich, Frances},
  title = {Putting Windows in the Black Box: {Developing} a Set of
    Diagnostic and Analytical Tools to Support Multilateral Price
    Indexes in Production},
  booktitle = {Presented at the 19th Ottawa Group meeting, Warsaw,
    Poland},
  url = {https://sergegoussev.github.io/multilateral-diagnostics/docs/paper.html},
  langid = {en},
  abstract = {Using alternative data as a source in lower level
    aggregation considerably complicates the task of understanding and
    explaining price movements within the overall CPI. This is made even
    harder when the NSO applies multilateral price index methods, as
    data from multiple time periods is leveraged to measure price
    change, expanding the amount of data points and trends that must be
    considered. Specifically, alongside the overall price measure that
    could process the data as a “black box”, National Statistical
    Organizations (NSOs) need to also identify any issues with the input
    data that may impact the measurement (input diagnostics) and also
    explain the calculated index (output analytics). This research
    develops, tests, and demonstrates in an open and reproducible manner
    a set of diagnostic and analytical outputs that could be used
    alongside multilateral methods. The paper thus contributes to
    operational considerations necessary in the adoption of these methods
    -\/- ensuring that these methods can be better understood prior to
    adoption and their adoption supports the NSO in understanding why
    price movements occur as they do.}
}
For attribution, please cite this work as:
Goussev, Serge, and Frances Krsinich. “Putting Windows in the Black Box: Developing a Set of Diagnostic and Analytical Tools to Support Multilateral Price Indexes in Production.” Presented at the 19th Ottawa Group Meeting, Warsaw, Poland, accepted. https://sergegoussev.github.io/multilateral-diagnostics/docs/paper.html.