Audrey M. Roy Greenfeld

Quietly building the future.

Exploratory Data Analysis: Product Recommendations

2023-06-16T02:36:00.002-07:00

In data science, exploratory data analysis (EDA) is an open-ended first look at a data set. The goal is to better understand the stories that the data tells and to uncover interesting ideas that may turn into new hypotheses to explore.


I'm working on an EDA of the Santander product recommendation dataset. It contains 1.5 years of anonymized customer behavior data. The goal is to predict which new products an existing customer will get.

Here's a sample chart I created with the Python visualization library Matplotlib to better understand how many of each new product were purchased/obtained during the 1.5 years. There are actually 27 products total, and I'm visualizing the first 8.
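For anyone who wants to try something similar, here is a minimal sketch of how a chart like this could be built with pandas and Matplotlib. It is not the code from the Kaggle notebook. The toy data, the column names (`customer_id`, `month`, `product_a` and so on), the helper `count_new_products`, and the output filename are all made up for illustration. The sketch also assumes that a "new product" means a customer's flag for that product flips from 0 to 1 between consecutive months.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy data: one row per customer per month, 1 = customer holds the product.
# These column names are placeholders, not the real dataset's columns.
df = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 2],
    "month": ["2015-01", "2015-02", "2015-03"] * 2,
    "product_a": [0, 1, 1, 0, 0, 1],
    "product_b": [0, 0, 1, 1, 1, 1],
    "product_c": [0, 0, 0, 0, 0, 0],
})
product_cols = ["product_a", "product_b", "product_c"]


def count_new_products(frame, cols):
    """Count 0 -> 1 transitions per product across consecutive months, per customer."""
    ordered = frame.sort_values(["customer_id", "month"])
    # diff() within each customer: +1 means the product was added that month.
    # The first month of each customer is NaN, so it is never counted as new.
    changes = ordered.groupby("customer_id")[cols].diff()
    return (changes == 1).sum()


new_counts = count_new_products(df, product_cols)

# Plot only the first 8 products to keep the chart readable.
fig, ax = plt.subplots(figsize=(8, 4))
new_counts.iloc[:8].plot.bar(ax=ax)
ax.set_xlabel("Product")
ax.set_ylabel("Number of new acquisitions")
ax.set_title("New products obtained (toy data)")
fig.tight_layout()
fig.savefig("new_products.png")  # hypothetical output filename
```

On the real dataset, the same idea would apply after swapping in the actual customer, date, and product columns. The month column would need to sort chronologically for the consecutive-month comparison to be meaningful.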


This is only a small portion. There's much more, and I'm still working on it. I've published what I have so far on Kaggle: Santander - Product Recommendation EDA (WIP)