About

Who Am I?

Hi, I'm Richard Lu. I am a data scientist trained in industrial engineering (BS) and the social sciences (PhD), and I am interested in data of all kinds. I specialize in using sophisticated computational techniques to derive business insights from Big Data.

Education

My Training

Haas School of Business,
University of California, Berkeley

Dissertation (In Progress) - Chameleons in Organizations: The Influences and Impacts of Self-Monitoring Behavior

  • Sameer Srivastava, Chair
  • Toby Stuart
  • Ming Leung
  • David Bamman
  • Cameron Anderson

H. Milton Stewart School of Industrial and Systems Engineering,
Georgia Institute of Technology

GPA: 4.00

Skills

My Specialties

Because learning rarely follows a linear trend, I have included here my self-reported proficiencies for my skills and languages. For the languages, I also include a rough estimate of the number of hours I have spent with each.

Skills

Data Visualization

Dimensionality Reduction

Machine Learning

Natural Language Processing

Object-Oriented Programming

Relational Databases

Statistics

Web Scraping

Languages

Python

4800 hours

R

1000 hours

SQL

290 hours

HTML, CSS

140 hours

LaTeX

75 hours

Bash

40 hours

Projects

Selected Data Experience

Imputing Cultural Fit

  • Developed a generalizable methodology for extending cross-sectional surveys to longitudinal data using a random forest model
  • Leveraged natural language processing tools and principal components analysis to extract features from the raw content of over five million emails
  • Overcame challenges in the machine learning pipeline such as small N, class imbalance, and model validation by transforming classification probabilities to a weighted mean measure, bootstrapping unbalanced classes, and designing complementary evaluation metrics, respectively

Worked with Jennifer Chatman (UC Berkeley), Amir Goldberg (Stanford), and Sameer Srivastava (UC Berkeley) to develop a research paper.

Visualizing Responsibility

  • Extended a transfer learning convolutional neural network model based on Google's Inception-v3 computer vision architecture to evaluate the perceived responsibility of a profile picture by training on unique survey data
  • Integrated recent research on model interpretation in the form of class activation mapping to produce heatmaps of elements that most contributed to the responsibility ratings, opening the black box of deep learning models
  • Performed multivariate linear regression analysis to identify the impact of perceived responsibility in a low-wage, technology-mediated labor market

Worked with Ming Leung (UC Riverside), Sibo Lu (UpWork), and Michael Fermanian to write a grant proposal for which we received $8,000 from the Fisher Center for Business Analytics.

Assessing Career Progression

  • Cleaned and extended a personnel dataset of more than three million person-month observations by creating variables such as organizational hierarchy based on direct reports and move atypicality based on all realized job title transitions
  • Analyzed differential effects of move atypicality by gender on career outcomes (pay and performance) using statistical methods such as matching on observables, piecewise exponential hazard rate models, and linear regression

Worked with Ming Leung (UC Riverside) to develop a research paper (backend only here).

Improving Flow Time

  • Worked with a team of seven other individuals to improve the flow time of inventory through a 235,000 square foot distribution center
  • Developed a simulation model and a set of decision support tools, including a layout optimization, to estimate an overall improvement of 325% in inventory flow time

Delivered a technical report to the organization.

Contact

Get in Touch