Data description and format: The data sets for this competition are provided in two different formats. The main one is a collection of JSON records that describe different game states in detail. Each JSON record that can be used for training a prediction model is written in a single row of a text file. It contains information about the condition of each of the competing heroes, the minion cards played, the cards in the hand of the first player (it is assumed that the first player always starts the game) and much more. In particular, JSON records contain the names of the cards in the hand of the first player and the names of the minions played by both players. It is allowed to use external knowledge bases about Hearthstone cards as long as they are publicly available and their source is clearly stated in the submitted competition report. One example of such a source is the HearthPwn portal.
The training data is also available in a simpler tabular format, where each row corresponds to a different game state. The columns of the data tables correspond to the most important fields from the JSON records or to aggregations of information from other fields, e.g. the sum of the maximal HP of the minions played by the first player, the total mana cost of the minions played by the opponent or the number of spell cards in the hand of the first player. Please note that this is just one example of a tabular representation of the available JSON data and it is likely to miss some important information.
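As an illustration of how such aggregated columns can be derived from a nested record, here is a minimal sketch. The record layout and every key name in it (`player_minions`, `opponent_minions`, `player_hand`, `hp_max`, `mana_cost`, `type`) are hypothetical and chosen only for this example; the actual field names must be read from the competition data files.

```python
# Hypothetical record layout: the key names below are illustrative only and
# are NOT taken from the competition files.
record = {
    "player_minions": [
        {"hp_max": 3, "mana_cost": 2},
        {"hp_max": 5, "mana_cost": 4},
    ],
    "opponent_minions": [{"hp_max": 2, "mana_cost": 1}],
    "player_hand": [{"type": "SPELL"}, {"type": "MINION"}, {"type": "SPELL"}],
}


def aggregate(rec):
    """Collapse nested card lists into flat, table-style features."""
    return {
        # Sum of the maximal HP of the minions played by the first player.
        "player_minions_hp_max_sum": sum(m["hp_max"] for m in rec["player_minions"]),
        # Total mana cost of the minions played by the opponent.
        "opponent_minions_mana_sum": sum(m["mana_cost"] for m in rec["opponent_minions"]),
        # Number of spell cards in the hand of the first player.
        "player_hand_spell_count": sum(c["type"] == "SPELL" for c in rec["player_hand"]),
    }


features = aggregate(record)
print(features)
```

Participants who build their own representation would typically extend a function like this with further aggregates that the provided tables may not include.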
The training data was compressed into two files, namely trainingData_tabular.zip and trainingData_JSON.zip, which correspond to the tabular representation and the JSON representation, respectively. They can be downloaded from the Data files folder after a successful enrollment in the competition. In total, the training data contains descriptions of 2,000,000 game states, which are divided equally into four data chunks. The first column of the tabular data and the gamestate_id field in the JSON records store unique identifiers of the game states. The second column and the decision field hold the information about the result of the game from which a given game state was extracted. The decision in the data is ‘1’ if the first player won the game and ‘0’ otherwise. The remaining columns/fields constitute a description of the game states. Their names are given in the data files and are rather self-explanatory.
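Because each JSON record occupies a single row of a text file, a chunk can be streamed one record at a time instead of being loaded as a whole. The sketch below assumes plain UTF-8 text with one JSON object per line and uses two made-up records in place of a real chunk; only the `gamestate_id` and `decision` field names come from the description above, and the helper name `iter_game_states` is an assumption of this example.

```python
import io
import json


def iter_game_states(lines):
    """Yield one dict per non-empty line; each line holds one JSON record."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an opened training chunk; the two records are invented and
# carry only the two fields named in the data description.
sample = io.StringIO(
    '{"gamestate_id": 1, "decision": 1}\n'
    '{"gamestate_id": 2, "decision": 0}\n'
)

records = list(iter_game_states(sample))
ids = [r["gamestate_id"] for r in records]
# .get() is used because the decision field is absent from the test records.
first_player_wins = sum(r.get("decision", 0) for r in records)
print(ids, first_player_wins)
```

For a real chunk, the `io.StringIO` object would be replaced by a file opened for reading after the archive has been extracted.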
The test data is available in the same formats as the training sets; however, it contains no information about the decisions. In the tabular format, the corresponding column stores no values, and the decision field is missing from the JSON files. This is the target for predictions. In total, the test data consists of 750,000 records (see the post in the News section) divided into three chunks of 250,000 game states each. Please note that the training and test data sets contain game states from different playouts.
The format of submissions: The participants of the competition are asked to predict the likelihood of the first player winning, based on their representation of the data, and to send us their solutions using the submission system. Each solution should be sent in a single text file containing exactly 750,000 lines (files with an additional empty last line will also be accepted). Each consecutive line of this file should contain exactly one real number corresponding to the predicted likelihood. The values do not need to be in a particular range; however, higher numerical values should indicate a higher chance of winning.
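A submission file of this shape can be produced with a few lines of code. The following is a minimal sketch, not an official tool: the function name `write_submission` is hypothetical, the use of a dot as the decimal separator is an assumption, and so is the premise that scores are listed in the same order as the test records. The line-count check mirrors the 750,000-line requirement; the demo overrides it with 3 so that the example stays small.

```python
import os
import tempfile


def write_submission(scores, path, expected_lines=750_000):
    """Write one real number per line, refusing to write a wrong line count."""
    scores = [float(s) for s in scores]
    if len(scores) != expected_lines:
        raise ValueError(f"expected {expected_lines} scores, got {len(scores)}")
    with open(path, "w", encoding="utf-8") as fh:
        for s in scores:
            # repr() of a float round-trips exactly and uses '.' as separator.
            fh.write(f"{s!r}\n")


# Tiny demo with 3 scores; values outside [0, 1] are allowed by the rules.
demo_path = os.path.join(tempfile.mkdtemp(), "submission.txt")
write_submission([0.25, 3, -1.5], demo_path, expected_lines=3)

with open(demo_path, encoding="utf-8") as fh:
    lines = fh.read().splitlines()
print(lines)
```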
Evaluation of results: The submitted solutions will be evaluated on-line and the preliminary results will be published on the competition leaderboard. The preliminary score will be computed on a small subset of the test set, fixed for all participants. It will correspond to approximately 5% of the test data. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published on-line. It is important to note that only teams which submit a report describing their approach before the end of the contest will qualify for the final evaluation. The winning teams will be officially announced during a special session devoted to this competition, which will be organized at the FedCSIS'17 conference. The assessment of solutions will be done using the Area Under the ROC Curve (AUC) measure.
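For local validation, AUC can be computed from held-out training records without external libraries. The sketch below uses the standard rank-sum (Mann-Whitney U) identity with averaged ranks for ties; it is an illustration under the assumption of 0/1 labels, not the organizers' evaluation code, and the function name `auc` is chosen for this example. Since AUC depends only on the ordering of the scores, any strictly increasing transformation of them leaves it unchanged.

```python
def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j to the end of the group of equal scores starting at i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)  # labels are assumed to be 0 or 1
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))

# Rescaling the scores does not change the ranking, hence not the AUC.
rescaled = [100 * s - 7 for s in scores]
print(auc(labels, rescaled))
```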