MACHINE LEARNING TECHNIQUES FOR THE DETECTION OF UNFAIR PRICING IN SUPERMARKETS ACROSS TRINIDAD AND TOBAGO

Arti K. Ramdhanie*

Faculty of Science and Technology, The University of the West Indies, Trinidad

Email: arti.ramdhanie@sta.uwi.edu (Corresponding author)

Abstract:

The Ministry of Trade and Industry tracks prices in monitored supermarkets across Trinidad and Tobago. Under this initiative, data are collected every month for 118 grocery items (the “standard basket”). Identifying which supermarkets are non-conforming in their pricing schemes depends on the “total basket price” (the total cost of the 118 items). In general, an outlier is any datapoint that varies significantly from all other observations in a dataset; in this paper, it is any supermarket that exceeds this total basket price by 5%. The aim of this research was twofold. The first goal was to employ feature selection methods to reduce the number of items that must be collected. The second goal was to create a logistic regression model that can identify whether a supermarket is non-conforming, given its pricing information. The dataset contained 692 datapoints, of which only eight (8) were classified as outliers, making it highly imbalanced. Resampling by SMOTE (Synthetic Minority Oversampling Technique) was therefore used to synthetically generate data for the training set, which produced a more balanced dataset; the resulting model was then tested and validated on unseen data (the testing set). Seven (7) feature selection methods were also investigated, and their results are discussed and analysed. The metrics indicated that a subset of these features can be collected whilst still identifying the outlier supermarkets.
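As a rough illustration of two ideas mentioned in the abstract, the sketch below labels baskets that exceed a reference price by more than 5% and then generates synthetic minority points by SMOTE-style interpolation. It is not the paper's implementation: the function names, the toy prices, and in particular the choice of reference price are assumptions made for the example, since the abstract does not specify how the reference total is computed. In practice a library implementation of SMOTE (for example the one in imbalanced-learn) would normally be used instead of this hand-written version.

```python
import math
import random


def label_outliers(basket_totals, reference, margin=0.05):
    """Return 1 for totals more than `margin` above `reference`, else 0.

    `reference` is a hypothetical stand-in for the paper's total basket price.
    """
    threshold = reference * (1 + margin)
    return [1 if total > threshold else 0 for total in basket_totals]


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic points from the minority class.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours, which is the
    core interpolation step of SMOTE.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy basket totals (invented values, not taken from the paper's dataset).
totals = [1000.0, 1020.0, 1049.0, 1051.0, 1100.0]
labels = label_outliers(totals, reference=1000.0)  # [0, 0, 0, 1, 1]

# Toy minority-class feature vectors, e.g. prices of two items (invented).
minority_points = [[10.0, 25.0], [11.0, 27.0], [12.0, 26.0], [10.5, 28.0]]
synthetic_points = smote_sketch(minority_points, n_new=6)
```

Consistent with the abstract, oversampling of this kind would be applied to the training set only, so that the testing set still reflects the original class imbalance.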

 

Keywords: Feature Selection, Logistic Regression, Machine Learning, Outlier Detection, SMOTE.

 

https://doi.org/10.47412/GIDS9258

 

 
