BoardGameGeek Game Rank Analysis

Timothy W. Dooley (LASSO)



NB: The main model-creation notebook is in this directory as model_creation.ipynb.

Problem: BoardGameGeek (BGG) is a boardgame database website with tens of thousands of entries. The data for each game includes its complexity, rating, designer, and publisher, among many other fields. Games are ranked according to BGG's own ranking system. BGG also provides a 'Geek' rating alongside the user-provided average rating to evaluate ranking.
Objective: Provide insight to boardgame designers and publishers regarding the prospective ranking of their game through a supervised learning model.


I built a scraper with BeautifulSoup and Selenium to collect 10,000 entries from BGG.
Selenium was necessary to collect market price data from dynamic marketplace entries. The scraped data can be found in this repo as a .csv file for further analysis. In summary:

Columns with the suffix des hold mean values for a designer, appended to each game that designer worked on. Likewise, columns with the suffix pub hold mean values for a publisher.
Genre columns are encoded as dummy variables (0, 1).
quality columns are dummy variables, with 1 meaning the designer/publisher is average or above average.
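The column construction described above can be sketched with pandas. This is a minimal illustration, not the notebook's actual code: the toy data, the `designer`, `genre`, `geek`, and `quality_des` column names, and the choice of the column mean as the "average" threshold are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data; only geek_des matches a column name used in this README.
games = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "designer": ["X", "X", "Y", "Y"],
    "genre": ["strategy", "war", "family", "strategy"],
    "geek": [7.0, 6.0, 5.0, 5.5],
})

# Mean geek rating per designer, broadcast back onto each of that designer's games.
games["geek_des"] = games.groupby("designer")["geek"].transform("mean")

# Assumed threshold: 1 if the designer's mean is at or above the average of the
# geek_des column, else 0.
games["quality_des"] = (games["geek_des"] >= games["geek_des"].mean()).astype(int)

# One 0/1 dummy column per genre.
genre_dummies = pd.get_dummies(games["genre"]).astype(int)
games = pd.concat([games, genre_dummies], axis=1)

print(games)
```

The same pattern would apply to publisher columns by grouping on a publisher column instead.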

The following heatmap demonstrates the features I chose to focus on.


The model focuses on the following features:

sub_df[['avgweight', 'strategy','war','family', 'abstract', 'thematic', 'geek_des', 'geek_pub']]
sub_df['log_geek_des'] = np.log(sub_df['geek_des'])
sub_df['log_geek_pub'] = np.log(sub_df['geek_pub'])

Our target/response variable is the ranking on BGG.

The R² of this model is 0.60.

Mean Absolute Error:

Train MAE: 0.4198582356908472, Test MAE: 0.40880771810103766
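For reference, the two metrics reported here can be computed as follows. This is a small NumPy sketch of the standard definitions, with made-up numbers. The function names are chosen for the example, and it is not necessarily how the notebook computes its scores.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Illustrative values only, not taken from the BGG data.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print("R2:", r2, "MAE:", mae)
```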

The above model scores 0.61 on test data. I was able to eliminate negative predictions by taking the log of the target and using a polynomial fit.
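Fitting on the log of the target removes negative outputs because predictions are mapped back through the exponential, which is always positive. The sketch below illustrates this with a single synthetic feature and a degree-2 polynomial fitted by `np.polyfit`. The data, the degree, and the helper name `predict_rank` are assumptions for illustration, and the notebook's actual model may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one feature (for example avgweight) and a positive target.
x = rng.uniform(1.0, 5.0, size=200)
rank = np.exp(8.0 - 1.2 * x + rng.normal(0.0, 0.3, size=200))

# Fit a degree-2 polynomial to log(rank) instead of rank itself.
coeffs = np.polyfit(x, np.log(rank), deg=2)

def predict_rank(x_new):
    """Evaluate the polynomial in log space, then map back with exp (always > 0)."""
    return np.exp(np.polyval(coeffs, x_new))

# Predictions stay positive even outside the range of the training feature.
preds = predict_rank(np.linspace(0.0, 6.0, 50))
print(preds.min() > 0)
```

One consequence of this design is that errors are measured on the log scale during fitting, so metrics computed on the transformed target are not directly comparable to metrics on the raw ranking.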

Moving Forward

Better pricing information would be helpful. Because many games sold on marketplaces are collector's items, the pricing data contains numerous outliers.
