What you'll learn:
- Define data science and its importance in today’s data-driven world.
- Describe the various paths that can lead to a career in data science.
- Summarize advice given by seasoned data science professionals to data scientists who are just starting out.
- Explain why data science is considered the most in-demand job in the 21st century.
Skills you'll gain: Model Selection, Data Analysis, Python Programming, Data Visualization, Predictive Modelling
- W1 : Defining Data Science and What Data Scientists Do
- W2 : Data Science Topics
- W3 : Data Science in Business
- References
Data science is the process of understanding data and using it to do different things
- keywords: trends, insights, structured and unstructured data, explore, manipulation, find answers, recommendation
- Data Analysis (question/problem? => Find Answers => Create Value)
- Data Science (deals with different types of data): log files, social media, email, sales data, patient info files, sports performance data, sensor data, security cameras ...
- Be curious
- Data visualization => to explain trends, to tell a story, to detect a pattern, new behaviour ...
- Maths : statistics
- Business Analysis & strategies
- SW engineering
- /!\ Soft skills/qualities: curiosity, taking a position => confidence, storytelling
- recommendation algorithm
- Prediction Model
- Find patterns
- unstructured vs structured data
- /!\ Find solutions by analyzing data
- Curious, fluent in analysis, good communication skills => storyteller
- old => predict congestion (Uber, taxi)
- new => environment, climate change
- Find a (new) solution: (identify the problem + gather data + identify + tools) => BUILD A MODEL!
- list : regression, data viz, neural net, nearest neighbour, classification ...
- structured data: arranged/organized data (Excel rows and columns)
- unstructured data: comes from sensors (video, audio ...) and the web; not organized in rows/columns
- Cloud: gives access to data and makes collaboration easy
- Data storage, high-performance computing, saves physical space on your own computer, simultaneous work on the same data
- IBM : IBM Cloud
- Amazon : Amazon Web Services(AWS)
- Google : Google Cloud Platform
- Tools: Hadoop, Stata ...
- Two approaches
- Approach (traditional) : Statistical analysis
- Approach : unsupervised + machine-learning algorithms.
- Goal: to explore hitherto unknown trends and insights by subjecting data to analysis
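The two approaches above can be contrasted on a toy dataset. The sketch below is only an illustration: the `sales` numbers are made up, and `two_means` is a hypothetical helper (a stripped-down 1-D k-means with two clusters), not a method named in the course. The first approach summarizes the data with descriptive statistics; the second lets an unsupervised algorithm find groups that nobody labelled in advance.

```python
import statistics

# Made-up daily sales figures, for illustration only.
sales = [10, 12, 11, 13, 12, 48, 50, 52, 49, 51]

# Approach 1 (traditional): summarize the data with descriptive statistics.
overall_mean = statistics.mean(sales)
overall_stdev = statistics.stdev(sales)


def two_means(values, iterations=10):
    """Hypothetical helper: split values into two clusters (1-D k-means, k=2)."""
    # Start with the smallest and largest values as the two centers.
    centers = [float(min(values)), float(max(values))]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to the nearest center.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            statistics.mean(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Approach 2 (unsupervised): discover groups without any labels.
centers, clusters = two_means(sales)

print("mean:", overall_mean, "stdev:", round(overall_stdev, 2))
print("cluster centers:", centers)
print("clusters:", clusters)
```

On this toy data the overall mean sits between the two groups and describes neither of them well, while the clustering step surfaces the two regimes. That is the kind of "hitherto unknown" pattern the goal above refers to.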
Big Data (BD): dynamic, large, and disparate volumes of data created by apps, machines, tools ...
- The 5 Vs:
- Velocity: speed of data (quick, real time)
- Volume: scale (2.5 quintillion bytes => 10 million DVDs)
- Variety: diversity => data comes from different sources (audio, video, images ...)
- Veracity: quality/origin of data (reliability, accuracy)
- Value: turning data into value (solutions, profits ...)
Data scientists extract/derive insights from big data. Tools:
- Apache Spark
- Hadoop (created by Doug Cutting) => based on a cluster of computers; splits data into pieces so that a large amount of data can be processed in parallel (gains in speed, performance ...)
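The "split the data into pieces" idea can be sketched in plain Python. This is not the Hadoop or Spark API: `split_into_chunks`, `map_chunk`, and `reduce_counts` are hypothetical names, the log lines are made up, and everything runs sequentially on one machine. On a real cluster each chunk would be processed on a different node.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Split the dataset into pieces, as a cluster would distribute them."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Process one piece independently: count the words in its lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partial_counts):
    """Merge the partial results from every piece into one final answer."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())


# Made-up log lines, for illustration only.
log_lines = ["error disk full", "login ok", "error network down", "login ok"]

chunks = split_into_chunks(log_lines, chunk_size=2)
total = reduce_counts(map_chunk(chunk) for chunk in chunks)

print(len(chunks), "chunks")
print(total.most_common(3))
```

Because each chunk is counted without looking at the others, the map step can run on as many machines as there are chunks. Only the small partial results need to be brought together at the end.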
CEO => working w/ CDO & CIO and the financial department to adapt the needs
/ \
CDO CIO
(Chief Data Officer) (Chief Information Officer)
- Data Science Skills & Big Data
- Cloud
- Programming Skills
- Python (pandas for data analysis and viz)
- R
- Unix/Linux
- Maths (algebra, statistics)
- Jupyter Notebook + AWS (virtual account)
- BIG DATA (concept attributed to Google; statistical techniques to handle large volumes of data)
The 7 Steps of Data Mining:
- Establish data mining goals
- set up goals for the exercise.
- identify the key questions that need to be answered
- costs and benefits of exercise
- Define the expected results and accuracy of the exercise
- High levels of accuracy from data mining would cost more and vice versa.
- Select data
- The output of a data mining exercise largely depends upon the quality of data being used.
- Data might be available
- If data is not available: plan new data collection initiatives, including surveys
- The type of data, its size, and frequency of collection have a direct bearing on the cost
- Data comes from different databases
- Preprocess data
- clean
- identify relevant attributes
- identify the irrelevant attributes of data
- Transform data
- determine the appropriate format in which data must be stored
- reduce the number of attributes needed to explain the phenomena, using data reduction algorithms
- store data in the variables
- Store data
- Store data in a format that gives data mining immediate read/write access
- new variables can be created during mining and written back to the database
- store data on servers or storage media that keeps the data secure
- Data safety and privacy should be a prime concern for storing data.
- Mine data
- After data is appropriately processed, transformed, and stored, it is subject to data mining
- data analysis methods, including parametric and non-parametric methods, and machine-learning algorithms
- /!\ data visualization: a good starting point for data mining
- Multidimensional views of the data
- data mining software
- FIND : HIDDEN trends in the data
- Evaluate data mining results
- Extract the result
- Do a formal evaluation
- Test the predictive capabilities of the model
- Effectiveness and efficiency of the algorithms in reproducing the data
- /!\ in-sample forecast
- Share the results w/ stakeholders -> improve the quality of the results using their feedback
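Several of the steps above (preprocess, transform, mine, evaluate) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the records are made up, `fit_line` and `mean_abs_error` are hypothetical helpers, and a straight-line fit stands in for whatever algorithm a real project would use. The in-sample forecast checks how well the model reproduces the data it was fitted on. The held-out check is an addition not mentioned in the notes, included only for contrast.

```python
# Made-up records: advertising spend vs. sales (illustrative values only).
raw_records = [
    {"id": "a", "spend": 1, "sales": 3.1},
    {"id": "b", "spend": 2, "sales": 4.9},
    {"id": "c", "spend": 3, "sales": 7.2},
    {"id": "x", "spend": None, "sales": 9.9},  # incomplete record
    {"id": "d", "spend": 4, "sales": 8.8},
    {"id": "e", "spend": 5, "sales": 11.1},
    {"id": "f", "spend": 6, "sales": 12.9},
    {"id": "g", "spend": 7, "sales": 15.0},
    {"id": "h", "spend": 8, "sales": 17.0},
]

# Preprocess: clean the data by dropping records with missing values.
clean = [r for r in raw_records if r["spend"] is not None and r["sales"] is not None]

# Transform: keep only the relevant attributes, as (x, y) pairs.
pairs = [(float(r["spend"]), float(r["sales"])) for r in clean]

# Hold back the last two observations so the model never sees them.
train, holdout = pairs[:6], pairs[6:]


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(points, slope, intercept):
    """Average absolute gap between predictions and observed values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


# Mine: fit the model on the training data.
slope, intercept = fit_line(train)

# Evaluate: in-sample forecast (same data) and a held-out check (unseen data).
in_sample_mae = mean_abs_error(train, slope, intercept)
out_of_sample_mae = mean_abs_error(holdout, slope, intercept)

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"in-sample MAE={in_sample_mae:.3f} held-out MAE={out_of_sample_mae:.3f}")
```

A low in-sample error only shows that the model can reproduce data it has already seen, which is why results are usually also checked on held-out data before being shared with stakeholders.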
- ML => algorithms
- DL => models based on neural networks
- DS => extracting knowledge/insights from large volumes of (disparate) data
- Maths techniques: statistical analysis, data viz, ML/DL algorithms/models
- AI => everything that allows a computer to learn, solve problems, and make intelligent decisions
Application of ML :
- Recommendation systems (decision trees, Naive Bayes, Bayesian analysis)
- Classification
- Predictive analysis
- Fraud Detection
- Chap.7 : Book by Murtaza Haider
Analytics = communicate findings (set of insights in the data) using Tables & plots.
- define the ingredients and requirements ?
- how to search for background information ?
- Tools needed to generate the deliverable?
- tools: grabber
- working w/ data scientists:
- Data scientists work at the same level as the CIO (Chief Information Officer)
- Passion / DNA / curiosity / sense of humor / storyteller
- Communication skills / relatable (makes relationships easier)
- Good analysis / driven
- Data / analytics skills?
- Company needs: engineering / architecture / design / team expansion
- Technical skills: statistics, algorithms, ML, big data (data storage ...)
- healthcare
- help professionals to give the best treatment to patients
- Prevent natural catastrophes
- capturing/gathering data (about costs and revenues ...)
- archive it
- start doing measurement
- build a strong team
- recommendation systems based on data generated by customers from devices like Fitbits and Apple/Android watches
- UPS
- in 2013: used data from customers, drivers, and vehicles in a new data science system (route guidance system) to save time, money, and fuel
- Netflix
- recommendation systems
- Help a firm build a competitive advantage
- identify the type of data for analysis
- bias vs compensation? Ex: gender wage (salary) gap between men and women => REGRESSION MODEL!!!
- OK
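Why the wage example calls for a regression model can be shown with a small sketch. Everything here is an assumption made for illustration: the employee records are synthetic (salary in thousands, built as 30 + 2 × experience − 3 for women), and `solve_linear_system` and `fit_ols` are hypothetical helpers written with the standard library only. The numbers say nothing about real wages.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic employees: (years of experience, is_female, salary in thousands).
employees = [
    (4, 0, 38.0), (6, 0, 42.0), (8, 0, 46.0), (10, 0, 50.0),
    (1, 1, 29.0), (3, 1, 33.0), (5, 1, 37.0), (7, 1, 41.0),
]

# Raw comparison: difference between average salaries, ignoring experience.
men = [s for _, f, s in employees if f == 0]
women = [s for _, f, s in employees if f == 1]
raw_gap = sum(women) / len(women) - sum(men) / len(men)

# Regression: salary ~ intercept + experience + is_female.
design = [[1.0, float(exp), float(f)] for exp, f, _ in employees]
salaries = [s for _, _, s in employees]
intercept, per_year, adjusted_gap = fit_ols(design, salaries)

print(f"raw gap: {raw_gap:.1f}")
print(f"gap after controlling for experience: {adjusted_gap:.1f}")
```

In this synthetic data the raw difference in average salary mixes two effects, because the women also have fewer years of experience. The regression separates the part explained by experience from the part associated with gender, which is the distinction the "bias vs compensation" question is after.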
Two types of deliverable
- brief
- detailed
The deliverable Contents:
- Tables (row, columns)
- Plot graphics
The deliverable template:
- cover page
- Table of figures
- Table of graphics ??
- table of contents(ToC)
- executive summary//abstract
- (Introduction)
- detailed contents
- -Literature review
- -Methodology sections ??
- -Results section??
- -Discussion section ?? (POWER OF NARRATIVE//STORYTELLING)
- Conclusion
- acknowledgments
- references, and appendices (if needed).