This project aims to develop a machine learning classification model that accurately identifies tracks labeled under the "Rock" genre and assigns them to their correct genres. The primary goal is to address the common issue of genre mislabeling, where tracks tagged as "Rock" may span various subgenres or even belong to entirely different musical categories. Using a dataset of audio features and categorical music features, this project applies supervised learning techniques to improve genre classification accuracy, helping to refine the categorization process and ensure that "Rock" songs are classified according to their actual sound characteristics.
- Data Cleaning: Removed duplicates and missing values for better data integrity.
- Exploratory Analysis: Examined the distributions of song features like danceability, energy, and tempo, and visualized genre popularity and correlations between features.
- Genre Popularity: Analyzed the most and least popular genres based on song popularity.
- Modeling: Built and evaluated classification models, such as Random Forest, to predict whether a song belongs to the "Rock" genre based on its features.
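The cleaning and labeling steps listed above could look roughly like the sketch below. It uses a tiny hand-made table in place of the real dataset; the column names (`track_name`, `track_genre`, `danceability`) and the binary `is_rock` target are assumptions for illustration, not taken from the project's code.

```python
import pandas as pd

# Hypothetical toy table standing in for the Spotify dataset.
# Column names are assumptions made for this sketch.
songs = pd.DataFrame({
    "track_name": ["A", "A", "B", "C", None],
    "track_genre": ["rock", "rock", "pop", "rock", "jazz"],
    "danceability": [0.5, 0.5, 0.8, None, 0.3],
})

# Drop rows with any missing value, then drop exact duplicate rows.
cleaned = songs.dropna().drop_duplicates().reset_index(drop=True)

# Assumed binary target: 1 if the song is labeled "rock", else 0.
cleaned["is_rock"] = (cleaned["track_genre"] == "rock").astype(int)

print(cleaned)
```

Dropping rows is the simplest way to handle missing values; whether it is appropriate depends on how many rows are affected in the real data.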
The dataset comprises roughly 114k songs from Spotify, each with a genre category.
Dataset:
Column Info:
Summary Statistics for Quantitative Columns:
Histograms of Numeric Features:
Correlation Matrix:
Value Count for Each Genre:
Top 10 Most/Least Popular Genres (By Average Popularity):
50 Most Popular Rock Songs:
Preparing Features:
Features & Labels:
Training and Performance Metrics w/ Default Parameters:
At a glance, each of our models performs well overall; however, overall scores can hide weaknesses on individual labels, so let's take a closer look at how accurately each model predicts specific labels.
Random Forest Classifier:
Gradient Boosting Classifier:
K-Nearest Neighbors:
K-Nearest Neighbors achieves the highest precision for the "Rock" label among the tested models, so we'll work with this model and tune it to further improve its precision on that label.
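A comparison like this one can be sketched as follows, assuming scikit-learn and a binary target where class 1 stands for "Rock". The data here is synthetic (`make_classification`), so the resulting scores say nothing about the real dataset; the sketch only shows how precision for a single label can be computed for each model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: class 1 plays the role of "Rock" (an assumption).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three model families named above, with default parameters.
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

# Precision for the "Rock" class only: of the songs predicted as Rock,
# what fraction actually are Rock?
rock_precision = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    rock_precision[name] = precision_score(
        y_test, preds, pos_label=1, zero_division=0
    )

for name, score in rock_precision.items():
    print(f"{name}: Rock precision = {score:.3f}")
```

Precision is the natural metric when the cost of wrongly tagging a song as "Rock" matters most; recall for the same label would show how many true Rock songs are missed.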
Hyperparameter Tuning:
Setting the n_neighbors parameter to 4, we get the results below.
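As a minimal sketch of this tuning step, assuming scikit-learn's `KNeighborsClassifier` (whose parameter is spelled `n_neighbors`), the model can be refit with 4 neighbors and scored on the "Rock" label. The data is synthetic and class 1 stands in for "Rock". The grid search at the end is one possible way to compare neighbor counts; the source does not say how the value 4 was chosen.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: class 1 plays the role of "Rock" (an assumption).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Refit K-Nearest Neighbors with 4 neighbors instead of the default 5.
knn = KNeighborsClassifier(n_neighbors=4)
knn.fit(X_train, y_train)
precision_at_4 = precision_score(
    y_test, knn.predict(X_test), pos_label=1, zero_division=0
)
print(f"Rock precision with n_neighbors=4: {precision_at_4:.3f}")

# Optional: cross-validated sweep over neighbor counts, scored by precision.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 11))},
    scoring="precision",
    cv=5,
)
search.fit(X_train, y_train)
print("Best n_neighbors on the synthetic data:", search.best_params_["n_neighbors"])
```

With an even number of neighbors, votes in a two-class problem can tie; in practice an even `n_neighbors` tends to make positive predictions slightly more conservative, which can raise precision at the expense of recall.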