Prepared by Team Agape on 24.03.2023
Members: Peace Oluchi Onyekachi | Owolola Rakayat
Exploring a dataset is rewarding: it shows what happened, how it unfolded, and what can be learned from it. The FIFA Women's World Cup Stats dataset from Kaggle contains detailed information on the players, matches, and results of every Women's World Cup tournament from 1991 to 2019.
By analyzing this dataset, we gained insight into how women's soccer has evolved over the years, how individual players performed, and how their countries contributed to success in this highly competitive sport.
- Thoroughly explore and analyze the dataset (EDA) to identify the top performers, their goal tallies, and their countries.
- Demonstrate the exploratory data analysis workflow and how it supports understanding in many areas of life.
The dataset was obtained from Kaggle: https://www.kaggle.com/datasets/mattop/fifa-womens-world-cup-stats. It contains 136 rows and 21 columns, with many insights into this highly competitive game.
This is the first step in analyzing any dataset. We first worked to understand the dataset, then prepared and cleaned it for use in our analysis.
- Data Preparation:
To understand the dataset, we used functions and attributes such as shape, head, describe, info, and columns to identify the core feature and the features related to it. To make sure the data was ready for use, we cleaned it by removing duplicates, dropping irrelevant rows and columns, and renaming the columns for easy identification.
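The preparation steps described above can be sketched with pandas roughly as follows. This is a minimal illustration, not the team's actual notebook: the small DataFrame stands in for the Kaggle CSV, and every column name and value in it is made up.

```python
import pandas as pd

# Tiny stand-in for the Kaggle CSV; all column names and values are hypothetical.
raw = pd.DataFrame(
    {
        "player": ["A. Example", "B. Sample", "B. Sample", "C. Demo"],
        "squad": ["USA", "Germany", "Germany", "Brazil"],
        "gls": [5, 3, 3, 2],
        "notes": ["", "", "", ""],  # an irrelevant column we will drop
    }
)

# Inspect the data before changing anything.
print(raw.shape)             # (rows, columns)
print(raw.head())            # first few rows
print(raw.columns.tolist())  # column names
raw.info()                   # dtypes and non-null counts
print(raw.describe())        # summary statistics for numeric columns

# Clean: remove duplicate rows, drop the irrelevant column,
# and rename columns so they are easier to identify.
clean = (
    raw.drop_duplicates()
    .drop(columns=["notes"])
    .rename(columns={"squad": "country", "gls": "goals"})
    .reset_index(drop=True)
)
print(clean)
```

With the real data, `raw` would instead come from something like `pd.read_csv(...)` on the downloaded file, and the columns to drop or rename would depend on what the inspection step reveals.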
- Understanding The Features (univariate):
Univariate analysis means analyzing each feature on its own to summarize it. We plotted feature distributions using bar, histogram, area, density, and box plots to examine each series. This showed us the value counts of each series, their unique values, and the shape of their distributions.
Total number of goals series analysis
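As a rough sketch of the univariate step, the numeric summaries behind such plots can be computed with pandas alone. The two series below are illustrative values, not taken from the dataset, and the names `goals` and `country` are assumptions.

```python
import pandas as pd

# Illustrative values only; the real series come from the cleaned dataset.
goals = pd.Series([5, 3, 2, 3, 1, 5, 3], name="goals")
country = pd.Series(
    ["USA", "Germany", "Brazil", "USA", "Japan", "USA", "Germany"],
    name="country",
)

# How often each goal total occurs, ordered by goal total.
goal_counts = goals.value_counts().sort_index()

# Number of distinct countries and the most frequent one.
n_countries = country.nunique()
top_country = country.value_counts().idxmax()

# Count, mean, spread, and quartiles of the goals series.
summary = goals.describe()

print(goal_counts)
print(n_countries, top_country)
print(summary)
```

If matplotlib is installed, a call such as `goals.plot(kind="hist")` or `goals.plot(kind="box")` would draw the corresponding distribution for the same series.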
- Features Relationships:
Comparing the features side by side with scatter plots, heatmaps, and pair plots showed us that two or more variables can be connected far more closely than we anticipated. Visualizing the distributions helped us understand their behaviours and patterns. It also revealed the relationships among the independent variables, among the dependent variables, and between the independent and dependent ones. In some cases this established a perfect positive relationship between features.
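A correlation matrix is the table that a heatmap typically visualizes, and a perfect positive relationship corresponds to a Pearson coefficient of 1. The sketch below uses made-up numbers and hypothetical column names; the write-up does not say which features were perfectly correlated, so the `minutes` column is constructed to be an exact multiple of `matches_started` purely for illustration.

```python
import pandas as pd

# Hypothetical columns and values, chosen so the arithmetic is easy to follow.
stats = pd.DataFrame(
    {
        "matches_started": [3, 4, 5, 6, 7],
        "minutes": [270, 360, 450, 540, 630],  # exactly 90 * matches_started
        "goals": [1, 3, 2, 5, 4],
    }
)

# Pairwise Pearson correlation between all numeric columns.
corr = stats.corr()
print(corr.round(2))

# A coefficient of 1 means one feature is an exact increasing linear function of the other.
print(corr.loc["matches_started", "minutes"])

# A strong but imperfect positive relationship gives a value below 1.
print(corr.loc["matches_started", "goals"])
```

Passing `corr` to a heatmap function, or plotting two columns with `stats.plot(kind="scatter", x=..., y=...)`, would give the visual counterpart of these numbers when a plotting library is available.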
What did this analysis solve? What is the most important takeaway from the whole exploration process? This thorough analysis shed light on several areas:
- The most important thing in this whole process is that the game is all about goals and winning.
- The number of matches a player starts is very important.
- Goals scored from non-penalty situations stand out as special.
- Most top players tend to attract yellow cards.
- Some countries have been producing many top players for years.
The drive that contains all the necessary documents: Agape - Datathon.