Applied Data Analytics training program – Kansas City, MO (2018)

The Coleridge Initiative's Applied Data Analytics training program focused on Local Economic Development and Job Creation in Kansas City, Missouri.

The program was generously sponsored and hosted by the Ewing Marion Kauffman Foundation.

Class Overview and Projects

The Coleridge Initiative Applied Data Analytics Training Program is designed to equip public policy professionals with advanced computer science and data science skills. The program provides a hands-on introduction to data analytics topics ranging from basic SQL and Python coding to running and interpreting machine learning models. Applying these skills to real-world issues is central to the program's design; the Kansas City, MO program focused on local economic development and job creation.

During the program, participants worked in teams to scope and execute a project related to economic development and/or job creation. Two template projects were prepared in parallel and are available in the Sample Projects folder.

  • The project mo_dashboard creates an interactive dashboard to track economic development metrics at the county level across Missouri and along the border with Illinois. The visualized metrics include standard employment statistics such as total jobs and average wages, as well as more advanced Quarterly Workforce Indicators (QWI) metrics.
  • The project predict_business_vitality builds a machine learning model to predict which employers will survive over the following years based on firm characteristics, industrial sector, and geography. Potential applications include intervening with struggling firms and identifying weakening industries.
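The county-level metrics behind a dashboard like mo_dashboard come down to grouping wage records by county and quarter. The sketch below is a simplified illustration, not the official QWI methodology or the project's own code: the `WageRecord` schema, the `county_metrics` helper, and the rule that a job is a distinct person-employer pair with positive wages in a quarter are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class WageRecord:
    """Hypothetical schema: one row per person, employer, and quarter."""
    person_id: str
    employer_id: str
    county: str
    quarter: str  # e.g. "2015Q1"
    wages: float


def county_metrics(
    records: Iterable[WageRecord],
) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Return {(county, quarter): (total_jobs, average_wage)}.

    Assumption: a job is a distinct (person, employer) pair with positive
    wages in the quarter; average_wage is total wages divided by jobs.
    """
    jobs: Dict[Tuple[str, str], Set[Tuple[str, str]]] = defaultdict(set)
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for r in records:
        if r.wages <= 0:
            continue  # zero or negative wages are not counted as a job here
        key = (r.county, r.quarter)
        jobs[key].add((r.person_id, r.employer_id))
        totals[key] += r.wages
    return {key: (len(pairs), totals[key] / len(pairs)) for key, pairs in jobs.items()}


if __name__ == "__main__":
    # Hypothetical toy records, not real data.
    toy = [
        WageRecord("p1", "e1", "Jackson", "2015Q1", 9000.0),
        WageRecord("p2", "e1", "Jackson", "2015Q1", 11000.0),
        WageRecord("p3", "e2", "Clay", "2015Q1", 6000.0),
        WageRecord("p4", "e2", "Clay", "2015Q1", 0.0),
    ]
    print(county_metrics(toy))
    # {('Jackson', '2015Q1'): (2, 10000.0), ('Clay', '2015Q1'): (1, 6000.0)}
```

Official QWI indicators rely on more detailed employment definitions, so counts produced this way should be read as a rough proxy for the dashboard metrics rather than a replacement for them.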

Participant teams typically worked on projects similar to these, for example restricting the analysis to a cohort of interest or improving the models with additional features. All teams presented their results on June 28th and 29th, 2018.
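The prediction step in predict_business_vitality can be illustrated with a plain logistic regression that maps firm features to a probability of survival. This is a minimal, standard-library-only sketch: the features (scaled employment, wage growth, a manufacturing flag), the survival labeling, the toy data, and the helper names `train_logistic` and `predict_proba` are all hypothetical, and the actual project may have used other models, libraries, and features.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 5000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by full-batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # p - y is the derivative of the log loss with respect to the logit.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that a firm with features x survives."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Hypothetical toy firms: [employment (hundreds), wage growth, is_manufacturing]
    X = [[1.0, 0.5, 0], [2.0, 0.3, 1], [1.5, 0.4, 0],
         [0.2, -0.3, 1], [0.1, -0.5, 0], [0.3, -0.2, 0]]
    # Assumed labeling: 1 = employer still reports wages later, 0 = exited.
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_logistic(X, y)
    for x in X:
        print(x, round(predict_proba(w, b, x), 3))
```

On this toy data the trained model assigns probabilities above 0.5 to the three surviving firms and below 0.5 to the three that exit. In practice, a model like this would be evaluated on firms from a later cohort rather than on its own training data.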

Data Available for Class

For both class notebooks and team projects, participants of the Applied Data Analytics Training Program have access to several datasets, including confidential micro-data hosted on the secure Administrative Data Research Facility platform. The datasets available for the Training Program in Kansas City, MO, include:

  • Missouri State LEHD Wage and Employers Data
  • Kansas City, MO, Consumer Water Data
  • Kansas City, MO, Business Licenses
  • Census LEHD Origin-Destination Employment Statistics (LODES)
  • Kansas City Star articles from the Business section
  • Illinois Department of Employment Security (IDES) data
  • Illinois Department of Corrections (IDOC) data

Program Schedule

  • Day 1: Computing Environment, Datasets and Projects
    • 10:00-11:30 - Program introduction & motivation
    • 11:30-12:30 - ADRF security and data agreements
    • 12:30-1:30 - LUNCH
    • 1:30-3:00 - Scoping projects
    • 3:00-4:30 - Introduction to Databases
    • Variables Notebook
    • Application: Generating QWI Statistics
  • Day 2: Databases and Visualization
    • 9:00-9:30 - Day 2 introduction
    • 9:30-10:45 - Guided dataset exploration
    • 10:45-11:00 - Break
    • 11:00-12:30 - Project scoping discussions
      • Task: define overall research question and initial data exploration questions
    • 12:30-1:30 - LUNCH
    • 1:30-3:00 - Data Visualization (lecture)
    • 3:00-4:00 - Data Visualization (exercises)
    • Databases Notebook
    • Data Visualization Notebook
  • Day 3: Record Linkage
    • 9:00-9:15 - Welcome to day 3; initial project ideas from teams & instructor feedback
    • 9:15-11:00 - Project work
    • 11:00-11:15 - Break
    • 11:15-12:30 - Record Linkage
    • 12:30-2:00 - LUNCH (off-site; the closest options are at the Country Club Plaza)
    • 2:00-3:00 - Record linkage
    • 3:00-4:00 - Project work
    • Record Linkage Notebook
  • Day 4: Text Analysis and Network Analysis
    • 9:00-9:05 - Welcome to day 4
    • 9:05-10:15 - Introduction to Text Analysis
    • 10:15-10:30 - Break
    • 10:30-11:45 - Text Analysis (exercises)
    • 11:45-12:30 - Project work (brief notes)
    • 12:30-1:30 - LUNCH
    • 1:30-2:45 - Network Analysis (lecture)
    • 2:45-3:00 - Break
    • 3:00-4:00 - Network Analysis (exercises)
    • Text Analysis Notebook
    • Networks Notebook
  • Day 5: Introduction to Machine Learning
  • Day 6: Machine Learning for Prediction
    • 9:00-9:30 - Welcome back and goals for the week
    • 9:30-12:30 - Machine Learning (lecture)
    • 1:30-3:00 - Project work
    • 3:00-4:00 - Team project presentations
  • Day 7: Inference
  • Day 8: Machine Learning in Practice
    • 9:00-10:00 - Group project discussion: issues, questions, concerns
    • 10:00-12:30 - Machine Learning in practice
    • 1:30-2:30 - Considering “fairness” of model output
    • 2:30-4:00 - Project work
  • Day 9: Privacy, Confidentiality, and Ethics
    • 9:00-10:30 - Privacy, confidentiality, and ethics (lecture)
    • 11:00-11:45 - ADRF disclosure review and export request (exercise)
    • 11:45-12:00 - Mechanics of WebEx for presentations
    • 1:00-4:00 - Project work
  • Day 10: Class Recap and Project Work
    • 9:00-11:00 - Project work
    • 11:00-12:00 - Project presentations & feedback
    • 12:00-12:30 - Closing discussion and course recap
  • Final Presentations
    • Thursday, June 28
      • 1:00-1:30 PM Team 07: Does access to jobs via public transit inform job growth?
      • 1:30-2:00 PM Team 04: Calculating KCMO churn rates from State wage records
      • 2:00-2:30 PM Team 01: Survivability and resiliency of businesses based on the wage gap
      • 2:30-3:00 PM Team 03: Impact of the Great Recession on wages for Missouri’s Computer and Electronic Manufacturing industry
      • 3:00-3:30 PM Team 08: Employment trajectories of low-income individuals
    • Friday, June 29
      • 1:00-1:30 PM Team 09: A Kink in the Hose – No Business – No Water
      • 1:30-2:00 PM Team 05: Predicting KCMO firm survivability in low-wage industries using machine learning
      • 2:00-2:30 PM Team 02: Predicting success of firms during recession
      • 2:30-3:00 PM Team 10: Predicting employment success for ex-inmates
      • 3:00-3:30 PM Team 06: Illinois ex-offender earnings in IL and MO
      • 3:30-4:00 PM Team 11: Assessing the Strength of Missouri’s Middle Class