This course introduces modern predictive and learning analytics techniques. The main emphasis is on the applied aspects of these techniques, with programming in R, a widely used open-source language for statistical computing. Each lecture introduces new methods and then applies them to real data from various applied fields (marketing, operations, finance, economics, and sports analytics). In introducing these predictive analytics tools, the course features discussions on four broadly defined areas of focus: 1) finding the model that best represents the data, 2) selecting the optimal set of predictors, 3) reducing the dimension of the data and dealing with correlated predictors, and 4) improving prediction performance.
Today’s businesses collect and analyze large sets of data in almost every field imaginable, and most of them are interested in hiring data-savvy candidates. These companies need answers to many questions, such as:
- Which characteristics help us determine whether a customer will purchase a product?
- Given that a person likes movie X, what are the chances they will also like movie Y?
- What is the likelihood that a stock will go up given its past performance?
- What are the chances a sports team will win a given game?
- How do we classify customers into groups?
- What is the probability that a customer will default on their mortgage?
The course covers the following topics: linear and non-linear regression analysis (ridge, lasso, k-nearest neighbors, non-linear splines, neural networks), classification methods (logistic regression, linear discriminant analysis, support vector machines), tree-based methods (regression/classification trees, bagging, boosting, random forests), and unsupervised learning methods (principal component analysis, k-means clustering, hierarchical clustering).
Upon successful completion of this course, students will be able to 1) identify the most appropriate methodology for analyzing data for both predictive and explanatory purposes, 2) run the relevant models in R, and 3) interpret the outputs.