
My Data Science Portfolio

This repository contains my data science projects. It is still a work in progress, and I will add more Jupyter Notebooks over time.

Udacity Deep Learning Nanodegree Projects:

  1. UD1_My_first_neural_network: In this project, I coded my first feed-forward neural network from scratch and used it to predict daily bike rental ridership; a rough sketch of the idea appears after this list. Click here to see my work.

  2. UD2_Image_classification: In this project, I classified images from the CIFAR-10 dataset. To do so, I preprocessed the images and trained a convolutional neural network. Click here to see my work.

  3. UD3_Simpson_script_generation: In this project, I generated Simpsons TV scripts using a recurrent neural network. Click here to see my work.

  4. UD4_language_translation: In this project, I took a peek into the realm of neural machine translation. I trained a sequence-to-sequence model on a dataset of English and French sentences and used it to translate new sentences from English to French. Click here to see my work.

  5. UD5_face_generation: In this project, I used generative adversarial networks to generate both MNIST images and human faces. I trained everything on a GPU-optimized AWS instance. Click here to see my work.
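
The first project above boils down to a hand-written feed-forward network. As a rough illustration of the idea only (not the notebook's actual code; the shapes, data, learning rate and iteration count are made-up placeholders), a forward and backward pass in NumPy looks like this:

import numpy as np

# Toy one-hidden-layer network trained with plain gradient descent.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))             # 32 samples, 10 features
y = rng.normal(size=(32, 1))              # continuous target (e.g. daily rentals)

W1 = rng.normal(scale=0.1, size=(10, 8))  # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(8, 1))   # hidden -> output weights
lr = 0.01

for _ in range(200):
    hidden = 1 / (1 + np.exp(-(X @ W1)))   # sigmoid hidden activations
    output = hidden @ W2                   # linear output layer
    error = output - y                     # gradient of 0.5 * MSE w.r.t. the output
    grad_W2 = hidden.T @ error / len(X)
    grad_hidden = (error @ W2.T) * hidden * (1 - hidden)
    grad_W1 = X.T @ grad_hidden / len(X)
    W1 -= lr * grad_W1                     # gradient descent step
    W2 -= lr * grad_W2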

Other Projects:

  1. Time Series Forecasting with XGBoost: The aim of this notebook is to consolidate the knowledge I acquired in my first Kaggle competition. I mainly use Pandas for data wrangling, use some scikit-learn functions to prepare the features for learning, and finally perform a basic cross-validation with an Extreme Gradient Boosting regressor; a minimal sketch of that last step follows this list. Click here to see my work.

  2. Stacking and Text Classification (Jan-19): The aim of this notebook is to apply common techniques that are useful when building a machine learning model. I made extensive use of Scikit-Learn pipelines, created checkpoints to save already-processed feature matrices, performed automated feature selection, used dimensionality reduction techniques (NMF and TSVD), optimized parameters with random search, abused cross-validation and implemented stacking! A sketch of the general pattern also follows this list. I used an AWS instance to train the models and CloudWatch alarms to automatically stop the instance. Click here to see my work.
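
Here is a minimal sketch of the cross-validation step in the first notebook above, with made-up calendar features instead of the competition data; using time-aware splits is my assumption of what "basic cross-validation" covers here:

import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from xgboost import XGBRegressor

# Made-up daily series with simple calendar features; the competition data
# and the feature engineering in the notebook are different.
dates = pd.date_range("2018-01-01", periods=365, freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "dayofweek": dates.dayofweek,
    "month": dates.month,
    "y": np.sin(np.arange(365) / 7) + rng.normal(scale=0.1, size=365),
})
X, y = df[["dayofweek", "month"]], df["y"]

# Score an XGBoost regressor on chronologically ordered folds.
scores = cross_val_score(
    XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.1),
    X, y,
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_mean_absolute_error",
)
print(scores.mean())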

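For the second notebook, the overall scikit-learn pattern (pipeline, TruncatedSVD, random search, stacking) looks roughly like the sketch below; the dataset, estimators and parameter ranges are stand-ins, not the ones used in the notebook:

from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

# Stand-in text data: two newsgroup categories instead of the notebook's dataset.
data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])

# Base learners whose out-of-fold predictions feed a logistic-regression meta-model.
stack = StackingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=100)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# TF-IDF features, dimensionality reduction with TruncatedSVD, then the stacked ensemble.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("svd", TruncatedSVD(n_components=100)),
    ("stack", stack),
])

# Randomized search over a few illustrative parameters.
search = RandomizedSearchCV(
    pipeline,
    param_distributions={
        "svd__n_components": [50, 100, 200],
        "stack__forest__max_depth": [None, 10, 20],
    },
    n_iter=4,
    cv=3,
    random_state=0,
)
search.fit(data.data, data.target)
print(search.best_params_, search.best_score_)
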
Utilities:

  1. utility1_connection: With this class you can query your databases in three lines of code. It is based on SQLAlchemy and seamlessly handles SQL Server port forwarding and SSH tunnel creation for you. I always need to convert SQL query results to pandas DataFrames, and I built this class to speed up my work. You get your query results using a with/as statement:
query = "SELECT * FROM table;"

with Connection("ENV_NAME") as conn:
    query_results = conn.get_dataframe(query)
