The bootcamp’s third week started off with a review of Statistics. We then moved on to handling missing data through imputation. The discussion of missingness acted as a segue into Machine Learning via the K-Nearest Neighbors (KNN) algorithm in R. Our last two lectures moved into the most ubiquitous territory of Machine Learning: Linear Regression. We covered Simple Linear Regression in depth, from both a theoretical and a practical standpoint, before doing the same for Multiple Linear Regression.
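For anyone curious what the regression side of this looks like in practice, here is a minimal sketch in R using the built-in mtcars dataset rather than any of the course data; the variable choices are purely illustrative.

```r
# Simple linear regression: one predictor (car weight) for mpg.
simple_fit <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: add horsepower as a second predictor.
multi_fit <- lm(mpg ~ wt + hp, data = mtcars)

# summary() reports coefficients, standard errors, R-squared, and p-values.
summary(simple_fit)
coef(multi_fit)  # intercept plus one slope per predictor
```

The nice part of R’s formula interface is that moving from simple to multiple regression is just a matter of adding terms on the right-hand side of the `~`.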
The most striking facet of this week’s lessons was the depth at which the material was addressed. I had come across all of these topics before, but never at the level of the lectures presented in Week 3. One entirely new concept was the use of KNN as a method of imputation; previously I had only heard of forward/backward fill and imputing with the mean. Once again, the lengthy assignments served to make the material stick.
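To give a flavor of KNN imputation, here is a small sketch using the `kNN()` function from the VIM package; this is not the course code, and the toy data frame and column names are my own invention.

```r
# KNN imputation with the VIM package: each NA is filled using the
# values of its k nearest neighbors, measured across the other columns.
library(VIM)

df <- data.frame(
  height = c(170, 165, NA, 180, 175),
  weight = c(70, 60, 65, NA, 72)
)

# imp_var = FALSE suppresses the extra TRUE/FALSE indicator columns
# that kNN() would otherwise append to flag which values were imputed.
imputed <- kNN(df, k = 3, imp_var = FALSE)
print(imputed)  # no NAs remain
```

Unlike mean imputation, which fills every gap with the same value, KNN imputation tailors each fill to the rows most similar to the one with the missing entry.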
Every day this past week, an hour was dedicated to presentations. There were several interesting topics across the board from other members of the cohort, which showed off the Data Visualization concepts presented last week. I thought my own presentation on the data from the Perkins Loan Program went pretty well. Here is a link to my slides, the first of which has a link to the project’s GitHub repo.
This week also marked the debut of our first speaker, Susan Sun, from Thomson Reuters. She gave a great presentation on how Data Science is used at her company, along with valuable advice on what we as nascent Data Scientists should do to prepare ourselves to enter the field, from both a technical and a practical point of view.
Next week we will dive into more Machine Learning in R while working on Project 2, which will be to build a Shiny app.