Technical details | |
---|---|
Submit file | *.ipynb |
Music. Experts have long tried to understand sound and what differentiates one song from another. What makes one tone different from another? Can we visualize sound and, if so, how? Audio files contain a lot of data, and our goal is to understand what an audio file is, what data it contains, and what we can do with that data. Which features of this kind of data can we visualize?
Your Mission
Your first mission is to find a library that "reads" music. (Computers don't listen to music, they read it :) ) Your second mission is to separate a collection of music files into different music genres. To do so, you will first need to identify features inside the music files that can be used for classification.
What to expect: Build an application in Python that automatically classifies different musical genres from an audio snippet. You should expect to handle large data sets. You should also expect to analyse media files to generate data and identify patterns.
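To make "reading" music concrete, here is a minimal sketch that uses only Python's standard-library `wave` module. It writes a synthetic 440 Hz tone to a temporary file (so no external audio is needed), reads the samples back, and computes two simple features: zero-crossing rate and RMS energy. The helper names (`write_sine_wav`, `read_wav`, `zero_crossing_rate`, `rms_energy`) and the choice of features are assumptions made for illustration, not a prescribed solution. A dedicated audio library will give you far richer features.

```python
import math
import os
import struct
import tempfile
import wave


def write_sine_wav(path, freq_hz=440.0, seconds=1.0, rate=22050):
    """Write a mono 16-bit sine tone so the example needs no external file."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(frames)


def read_wav(path):
    """Return (samples scaled to [-1, 1], sample_rate) for a mono 16-bit WAV."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit WAV files")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [s / 32768.0 for s in ints], rate


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def rms_energy(samples):
    """Root-mean-square amplitude, a rough proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


tmp_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_sine_wav(tmp_path)
samples, rate = read_wav(tmp_path)
zcr = zero_crossing_rate(samples)
rms = rms_energy(samples)
print(rate, len(samples), round(zcr, 4), round(rms, 3))
```

A pure tone crosses zero twice per cycle, so `zcr * rate / 2` should land close to the tone's frequency. Real songs are messier, which is exactly why such numbers can help tell genres apart.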
Your deliverables:
- A presentation with slides on how you classified the music, as well as assumptions, implications, and other important information.
- Code that the DevOps team should be able to push to production.
Implement a multi-variable linear regression model on a large and complex data set. Analyze and evaluate the implications of your model for real-life users. Analyze and evaluate the business risk arising from the impact, assumptions, and decisions in your model.
Learning outcomes: the five stages of your project
In this project, you should expect to cover the five major stages of working with data:
- Data Collecting / Cleaning (see below)
- Data Exploration
- Data Visualization
- Machine Learning
- Communication
You will have to prove yourself in each of these. We are confident that you will succeed! :)
Where to find the data?
- Music Data Set (~1.2 GB) -> If you upload this data set to Docode, Docode will freeze. Work directly from your computer for this project.
This dataset was used in the well-known genre classification paper "Musical genre classification of audio signals" by G. Tzanetakis and P. Cook, IEEE Transactions on Speech and Audio Processing, 2002.
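Before extracting any features, it helps to index the files and their labels without loading the audio itself. The sketch below assumes a hypothetical layout in which each genre has its own folder and the folder name serves as the label. Check the actual download, since its structure and file extensions may differ. The function name `index_dataset`, the extension set, and the genre names in the demo are illustrative assumptions.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Assumed extensions; adjust to whatever the downloaded data set contains.
AUDIO_EXTENSIONS = {".wav", ".au", ".mp3"}


def index_dataset(root):
    """Return (file_path, genre) pairs, using the parent folder name as the label."""
    root = Path(root)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            pairs.append((path, path.parent.name))
    return pairs


# Demo on a throwaway folder tree with empty placeholder files (not real audio).
demo_root = Path(tempfile.mkdtemp())
for genre, count in [("blues", 3), ("rock", 2)]:
    (demo_root / genre).mkdir()
    for i in range(count):
        (demo_root / genre / f"{genre}.{i:05d}.wav").touch()
(demo_root / "README.txt").touch()  # non-audio files are skipped

pairs = index_dataset(demo_root)
genre_counts = Counter(genre for _, genre in pairs)
print(genre_counts)
```

Counting files per label early on is a cheap way to spot missing folders or an unbalanced data set before you spend time on feature extraction.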
Reminder: this will be one of your portfolio projects. You can find a lot of different ideas. Plagiarism is not tolerated, either in a company or here. :-)
How To Use Jupyter In Docode