11 Years of Statehood Stability: [Blog&Code]
Description: The Fund For Peace is a nonprofit that applies a data-driven approach to understanding the world’s problems. One of its initiatives is the Fragile States Index, a yearly ranking of the stability of countries around the world. With over a decade’s worth of its data in hand, let’s see what it has to say.
Bundesliga xG 15-16: [Blog&Code]
Description: Expected Goals is the hot metric in soccer analytics. Here I compare actual results with those predicted by Expected Goals in Germany’s top soccer league, the Bundesliga.
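As a rough illustration of the comparison, a team's Expected Goals for a match is commonly computed as the sum of the scoring probabilities of its shots, which can then be set against the goals actually scored. The sketch below assumes that definition; the shot probabilities and the goal count are made up for illustration and are not taken from the Bundesliga data or from the model used in the post.

```python
def expected_goals(shot_probabilities):
    """Sum per-shot scoring probabilities to get a team's expected goals."""
    return sum(shot_probabilities)


# Hypothetical shots for one team in one match (illustrative values only).
shots = [0.5, 0.25, 0.125, 0.125]
actual_goals = 2

xg = expected_goals(shots)
difference = actual_goals - xg  # positive: the team scored more than expected

print(f"xG: {xg:.2f}, actual: {actual_goals}, difference: {difference:+.2f}")
```

A positive difference means the team outperformed its chances; a negative one means it underperformed.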
Description: Identifying hate speech is an important task on the internet. I used scikit-learn and the nltk package to build a hate speech classifier from CrowdFlower’s Twitter data, along with an accompanying app. The final model used a Random Forest classifier and achieved 76% accuracy on unseen data, 26 percentage points above the baseline accuracy of 50%.
Description: Kickstarter is the foremost platform for crowdfunded projects on the internet. I first scraped Kickstarter data and then used R’s caret package to build a model that predicts whether a project will be funded. The final model was an ensemble of a logistic regression classifier built on numerical features and a random forest model built on text features. It achieved close to 83% accuracy on unseen data, compared to a baseline of 60%. I am still working on an associated web app that will let users enter information about a project idea, estimate its probability of being funded, and get recommendations for increasing that probability.
Description: An interactive app made in R that shows the Open Data Science Conference’s (ODSC) meetups around the world. Data is scraped from Meetup.com using rvest and displayed as a map via the leaflet library. The flexdashboard library provides the layout.
Description: A Twitter bot that feeds the text of five philosophical works into a Markov chain to generate its own philosophical musings.
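For readers unfamiliar with the technique, here is a minimal sketch of word-level Markov chain text generation. It is not the bot's actual code: the function names, the `order` parameter, and the one-line placeholder corpus are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting state, emitting up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: this state never had a successor
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Placeholder text standing in for the five source works.
corpus = "the examined life is not easy and the unexamined life is not worth living"
print(generate(build_chain(corpus), length=12, seed=42))
```

A higher `order` tends to produce more coherent but less original output, since each next word is conditioned on a longer run of preceding words.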
Noted Resources: [Site]
Description: A collection of resources across different topics.
An Introduction To Principal Component Analysis: [Repo]
Description: A talk given at the August meeting of the New York Data Science Study Group. In the presentation I went through the algorithm’s mathematical foundations and then moved through three increasingly complex examples of its use. The repository also contains a link to a binder for interactive exploration of the material.
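To give a flavor of the mathematics, the sketch below performs PCA with NumPy by centering the data and taking a singular value decomposition. This is one standard way to compute principal components, not necessarily the formulation used in the talk; the `pca` function and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components using the SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured along each direction.
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    projected = X_centered @ components.T
    return projected, components, explained_variance[:n_components]


# Synthetic 2-D data lying close to the line y = 2x.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])

projected, components, variance = pca(X, n_components=1)
print("First principal direction:", components[0])
```

Because the points lie near the line y = 2x, the first principal direction should be roughly parallel to the vector (1, 2), up to sign.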