Course 1: What is Data Science?

What you'll learn:

  • Define data science and its importance in today’s data-driven world.
  • Describe the various paths that can lead to a career in data science.
  • Summarize advice given by seasoned data science professionals to data scientists who are just starting out.
  • Explain why data science is considered the most in-demand job in the 21st century.

Skills you'll gain: Model Selection, Data Analysis, Python Programming, Data Visualization, Predictive Modelling

Table of Contents

W1 : Defining Data Science and What Data Scientists Do


What is Data Science?

Data science is a process for understanding data and using it to uncover insights, find answers, and make recommendations.

  • keywords: trends, insights, structured and unstructured data, explore, manipulation, find answers, recommendation

Fundamentals of a Data Scientist:

  • Data analysis (question or problem => find answers => create value)
  • Data science deals with many types of data: log files, social media, email, sales data, patient information files, sports performance data, sensor data, security cameras ...
  • Be curious
  • Data visualization => to explain trends, tell a story, detect a pattern or a new behaviour ...

Many paths to Data Science

  • Maths: statistics
  • Business analysis & strategies
  • Software engineering
  • /!\ Soft skills/qualities: curiosity, taking a position (=> confidence), storytelling

What do Data Scientists do?

  • Recommendation algorithms
  • Prediction models
  • Find patterns
  • Work with unstructured vs structured data

A day in the life of a Data Scientist

  • /!\ Find solutions by analyzing data

  • Curious, fluent in analysis, good communication skills => storyteller

Old vs new problems ?

  • old => Predict congestion (uber taxi)

  • new => Environment, Climate Changes

    - Find a (new) solution:
            => Identify the problem  \
            => Gather data            \ BUILD A MODEL!
            => Identify              /
            => Tools                /
    

Topics and algorithms

  • list: regression, data viz, neural networks, nearest neighbour, classification ...
  • structured data: arranged / organized data (Excel rows, columns)
  • unstructured data: comes from sensors (video, audio ...) or the web, not in rows / columns
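A minimal sketch of the structured vs unstructured distinction, using only the Python standard library. The sample records and the sentence are made up for illustration; they are not from the course.

```python
import csv
import io
from collections import Counter

# Structured data: every record has the same named columns (hypothetical sample).
structured_text = "player,goals,minutes\nAna,2,90\nBen,0,45\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
total_goals = sum(int(row["goals"]) for row in rows)

# Unstructured data: free text with no predefined rows or columns (hypothetical sample).
unstructured_text = "Great match today, great pace and a great finish"
word_counts = Counter(unstructured_text.lower().replace(",", "").split())

print(total_goals)                 # a column can be read directly -> 2
print(word_counts.most_common(1))  # structure must be extracted first -> [('great', 3)]
```

With structured data a column can be queried by name; with unstructured data some structure (here, word counts) has to be derived before any analysis.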

Cloud for Data Scientists

  • Cloud: gives access to data and makes collaboration easy
    • Data storage, high-performance computing, saves physical space on your own computer, simultaneous work on the same data
  • IBM: IBM Cloud
  • Amazon: Amazon Web Services (AWS)
  • Google: Google Cloud Platform
  • Tools: Hadoop, Stata ...

Lab

Quiz

W2 : Data Science Topics

Foundations of Big Data ?

Big Data and Data Mining

  • Two approaches
  1. Approach (traditional): statistical analysis
  2. Approach: unsupervised + machine-learning algorithms
  • Goal: to explore hitherto unknown trends and insights by subjecting data to analysis
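To make the second approach more concrete, here is a small sketch of unsupervised learning: a one-dimensional k-means that groups numbers without being told any labels. The choice of k-means, the function name `kmeans_1d`, the starting centroids and the purchase amounts are all assumptions made for illustration; the notes do not name a specific algorithm.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Assign each value to its nearest centroid, then move each centroid to its group's mean."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical purchase amounts with two visible groups.
amounts = [9, 10, 11, 12, 98, 100, 102]
centers, groups = kmeans_1d(amounts, centroids=[0.0, 50.0])
print(centers)  # [10.5, 100.0]
print(groups)   # [[9, 10, 11, 12], [98, 100, 102]]
```

The algorithm is never told which amounts are "small" or "large"; the two groups emerge from the data, which is the sense in which the trend was previously unknown.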

What is Hadoop?

Big Data (BD): dynamic, large, and disparate volumes of data created by apps, machines, tools ...

  • The 5 Vs:
    • Velocity: speed at which data arrives (near real time)
    • Volume: scale of the data (2.5 quintillion bytes => 10 million DVDs)
    • Variety: diversity => data comes from different sources (audio, video, images ...)
    • Veracity: quality / origin of the data (reliability, accuracy)
    • Value: turning data into value (solutions, profits ...)

Data scientists extract/derive insights from big data. Tools:

  • Apache Spark
  • Hadoop (created by Doug Cutting) => based on clusters of computers; splits data into pieces so that a large amount of data can be processed in parallel (gains in speed, performance ...)
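The split-then-combine idea behind Hadoop can be sketched in plain Python. This is not the Hadoop API and it runs in a single process; the helper names (`split_into_chunks`, `map_chunk`, `reduce_counts`) and the sample lines are hypothetical. On a real cluster each chunk would be processed on a different machine.

```python
from collections import Counter

def split_into_chunks(items, n_chunks):
    """Split a list into roughly equal pieces, as a cluster would distribute data."""
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def map_chunk(lines):
    """Map step: count words in one chunk, independently of the other chunks."""
    return Counter(word for line in lines for word in line.lower().split())

def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

lines = ["big data is big", "data is everywhere", "big clusters split data"]
chunks = split_into_chunks(lines, n_chunks=2)
word_totals = reduce_counts(map_chunk(chunk) for chunk in chunks)
print(word_totals["big"], word_totals["data"])  # 3 3
```

Because each chunk is counted without looking at the others, the map step can run in parallel; only the small per-chunk summaries need to be brought together at the end.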

How Big Data is Driving Digital Transformation?

	CEO => works w/ the CDO, the CIO and the finance department to adapt to the needs

			/ 					\
		CDO						CIO 
	(Chief Data Officer)  (Chief Information Officer)
  • Data Science Skills & Big Data
    • Cloud
    • Programming Skills
    • Python (pandas for data analysis and viz)
    • R
    • Unix/Linux
    • Maths (algebra, statistics)
    • Jupyter Notebook + AWS (virtual account)
    • Big Data (concept that started at Google; statistical techniques to handle large volumes of data)

Data Mining

7 Steps of Data Mining:

  1. Establish data mining goals
    • Set up goals for the exercise
    • Identify the key questions that need to be answered
    • Weigh the costs and benefits of the exercise
    • Define the expected results and accuracy of the exercise
    • Higher levels of accuracy from data mining cost more, and vice versa
  2. Select data
    • The output of a data mining exercise largely depends on the quality of the data being used
    • Data may already be available
    • If it is not, plan new data collection initiatives, including surveys
    • The type of data, its size, and frequency of collection have a direct bearing on the cost
    • Data comes from different databases
  3. Preprocess data
    • Clean the data
    • Identify the relevant attributes
    • Identify the irrelevant attributes of the data
  4. Transform data
    • Determine the appropriate format in which the data must be stored
    • Reduce the number of attributes needed to explain the phenomena, using data reduction algorithms
    • Store the data in variables
  5. Store data
    • Store the data in a format suited to data mining, allowing immediate read/write
    • New variables can be created to store data temporarily and written back into the database
    • Store data on servers or storage media that keep the data secure
    • Data safety and privacy should be a prime concern when storing data
  6. Mine data
    • After data is appropriately processed, transformed, and stored, it is subjected to data mining
    • Data analysis methods, including parametric and non-parametric methods, and machine-learning algorithms
    • /!\ Data visualization: a good starting point for data mining
    • Multidimensional views of the data
    • Data mining software
    • FIND: HIDDEN trends in the data
  7. Evaluate data mining results
    • Extract the results
    • Do a formal evaluation
    • Test the predictive capabilities of the models
    • Check how effective and efficient the algorithms are at reproducing the data

/!\ in-sample forecast

-> Share the results w/ stakeholders -> improve the quality of the results using their feedback
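The preprocess, transform, mine and evaluate steps can be sketched end to end on a toy dataset. Everything here is an assumption for illustration: the records, the column names (`ad_spend`, `store_id`, `sales`), the choice of a straight-line model, and the helper names. The held-out check is an addition to the notes, shown next to the in-sample forecast for contrast.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def mean_abs_error(xs, ys, slope, intercept):
    """Average absolute gap between predictions and observed values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical raw records: (ad_spend, store_id, sales); None marks a missing value.
raw = [(1, "A", 12), (2, "B", 19), (3, "A", None), (4, "C", 41),
       (5, "B", 52), (6, "A", 58), (7, "C", 71), (8, "B", 83)]

# Preprocess: drop incomplete records.
clean = [r for r in raw if None not in r]
# Transform: keep only the attribute assumed relevant (ad_spend) and drop store_id.
pairs = [(spend, sales) for spend, _store, sales in clean]
# Hold back the last records so the model can also be checked on unseen data.
train, test = pairs[:5], pairs[5:]

# Mine: fit the model on the training part only.
slope, intercept = fit_line(*zip(*train))
# Evaluate: in-sample forecast error vs error on data the model has not seen.
in_sample = mean_abs_error(*zip(*train), slope, intercept)
out_of_sample = mean_abs_error(*zip(*test), slope, intercept)
print(f"in-sample error: {in_sample:.2f}, held-out error: {out_of_sample:.2f}")
```

An in-sample forecast only shows how well the model reproduces data it has already seen, so a low in-sample error alone does not guarantee good predictions on new data.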

Data Scientists at New York University

Deep Learning and Machine Learning


  • ML => algorithms that learn from data
  • DL => models based on neural networks
  • DS => extracting knowledge/insights from large volumes of (disparate) data
    • Maths techniques: statistical analysis, data viz, ML/DL algorithms / models
  • AI => everything that allows a computer to learn, solve problems and make intelligent decisions

Applications of ML:

  • Recommendation systems (decision trees, naive Bayes, Bayesian analysis)
  • Classification
  • Predictive analytics
  • Fraud detection
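As one illustration of classification applied to fraud detection, here is a k-nearest-neighbours sketch. The technique is chosen for brevity and is not prescribed by the notes; the function name `knn_predict`, the two features and every transaction below are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    """Label a point by majority vote among its k closest training examples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical transactions: (amount in hundreds, hour of day / 10) -> label.
transactions = [
    ((0.2, 1.2), "normal"), ((0.5, 1.5), "normal"), ((0.3, 0.9), "normal"),
    ((9.0, 0.3), "fraud"), ((8.5, 0.2), "fraud"), ((9.5, 0.4), "fraud"),
]

print(knn_predict(transactions, (8.8, 0.1)))  # fraud
print(knn_predict(transactions, (0.4, 1.0)))  # normal
```

The model is just the labelled examples themselves: a new transaction is classified by looking at the known cases it most resembles.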

Hands-on Exercise: Data Science Exploration

  • Chap.7 : Book by Murtaza Haider

The Final Deliverable

Analytics = communicating findings (a set of insights from the data) using tables & plots.

  1. define the ingredients and requirements ?
  2. how to search for background information ?
  3. Tools needed to generate the deliverable ?
  • tools : grabber

  • working w/ Data Scientists:

    - Data Scientists work at the same level as the CIO (Chief Information Officer)
    - Passion / DNA / curiosity / sense of humor / storyteller
    - Communication skills / relatable (relationships are easier)
    - Good analysis / data-driven / analytics skills?
    - Company needs: engineering / architecture / design / team expansion
    - Technical skills: statistics, algorithms, ML, big data, data storage ...
    

Lab

Quiz

W3 : Data Science in Business

How Data Science is saving lives

  • healthcare
    • help professionals to give the best treatment to patients
  • Prevent natural catastrophes

How Should Companies Get Started in Data Science?

  • capturing/gathering data (about costs and revenues ...)
  • archive it
  • start doing measurement
  • build a strong team

Applications of Data Science

  • Recommendation systems based on data generated by customers from devices like Fitbits and Apple/Android watches
  • UPS
    • in 2013: used data from customers, drivers and vehicles in a new data science system (route guidance system) to save time, money and fuel
  • Netflix
    • recommendation systems
  • Help a firm build a competitive advantage
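A recommendation system can be sketched as "find the most similar user, then suggest what they liked". This user-based approach with cosine similarity is an assumption for illustration; the notes do not say which technique Netflix or any other company uses, and the names and ratings below are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(ratings, user):
    """Suggest the unrated item best liked by the most similar other user."""
    others = [name for name in ratings if name != user]
    closest = max(others, key=lambda name: cosine(ratings[user], ratings[name]))
    unseen = [i for i, score in enumerate(ratings[user]) if score == 0]
    return max(unseen, key=lambda i: ratings[closest][i]) if unseen else None

# Hypothetical ratings for items 0..3 (0 = not rated yet).
ratings = {
    "ana": [5, 4, 0, 0],
    "ben": [5, 5, 1, 4],
    "cy":  [1, 0, 5, 2],
}

print(recommend(ratings, "ana"))  # 3
```

Here "ana" rates items most like "ben" does, so she is offered item 3, the unrated item that "ben" scored highest.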

How Can Someone Become a Data Scientist?

Recruiting for Data Science

Careers in Data Science

High School Students and Data Science Careers

What Analytical Techniques/Methods Do You Need?

  • identify the type of data for analysis
  • Bias vs compensation? Ex: gender wage (salary) gap between men and women => REGRESSION MODEL!!!
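Why a regression model rather than a plain comparison of averages? A sketch under stated assumptions: the staff records below are invented and constructed so that wage = 30 + 2 × experience − 5 × is_female exactly, and `fit_ols` is a hypothetical helper that solves the normal equations by Gauss-Jordan elimination. The point is that the raw gap between group averages mixes the gender effect with differences in experience, while the regression separates them.

```python
def fit_ols(rows, targets):
    """Ordinary least squares: solve (X^T X) b = X^T y by Gauss-Jordan elimination."""
    n = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, targets))]
           for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

# Hypothetical records: (years of experience, is_female, hourly wage).
staff = [(1, 0, 32), (3, 0, 36), (5, 0, 40), (7, 0, 44),
         (2, 1, 29), (4, 1, 33), (6, 1, 37), (8, 1, 41)]

rows = [[1.0, exp, female] for exp, female, _wage in staff]
wages = [wage for _exp, _female, wage in staff]
intercept, per_year, gender_gap = fit_ols(rows, wages)

men = [w for _e, f, w in staff if f == 0]
women = [w for _e, f, w in staff if f == 1]
raw_gap = sum(women) / len(women) - sum(men) / len(men)

print(raw_gap)               # -3.0: difference of plain averages
print(round(gender_gap, 6))  # -5.0: gap at equal experience
```

In this toy data the women have more experience on average, which hides part of the gap in the raw averages; controlling for experience in the regression reveals it.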

The Narrative

  • OK

The structure of narrative

Two types of deliverable

  • brief
  • detailed

The deliverable contents:

  • Tables (rows, columns)
  • Plots / graphics

The deliverable template:

  • Cover page
  • Table of figures
  • Table of graphics ??
  • Table of contents (ToC)
  • Executive summary / abstract
  • (Introduction)
  • Detailed contents
    • Literature review
    • Methodology section ??
    • Results section ??
    • Discussion section ?? (POWER OF NARRATIVE / STORYTELLING)
  • Conclusion
  • Acknowledgments
  • References and appendices (if needed)

Lab

Quiz

References