My personal website can now be found here
With the school term now finished, here are some links to a few of the research projects I did this semester.
- Predicting Fama French Factors Using Machine Learning Techniques
- PDF, Github repository
- This paper applies a variety of machine learning models to the task of predicting two of the Fama-French factors, SMB and HML. For each year of predictions, the models are trained on the previous 400 months of data. Predictions are made for the period 1988-2018. The performance of trading strategies based on these predictions is assessed. Our results suggest that machine learning models do possess predictive ability for these factors.
- Rating the Critics: An In-Depth Look at New York Times Film Critics and the Academy of Motion Picture Arts and Sciences Voting Membership
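For the Fama-French project listed above, the rolling training scheme (each prediction year trained on the previous 400 months) can be sketched roughly as follows. This is only an illustration of the windowing, not the code from the paper: the function names `months_before` and `rolling_windows` are hypothetical, and it assumes each window ends in December of the year before the prediction year.

```python
from typing import Iterator, List, Tuple

Month = Tuple[int, int]  # (year, month), e.g. (1987, 12)


def months_before(year: int, n_months: int) -> List[Month]:
    """Return the n_months calendar months ending in December of year - 1."""
    months: List[Month] = []
    y, m = year - 1, 12
    for _ in range(n_months):
        months.append((y, m))
        m -= 1
        if m == 0:  # roll back into the previous calendar year
            y, m = y - 1, 12
    months.reverse()  # oldest month first
    return months


def rolling_windows(
    first_year: int, last_year: int, train_months: int = 400
) -> Iterator[Tuple[List[Month], List[Month]]]:
    """Yield (training months, prediction months) for each prediction year."""
    for year in range(first_year, last_year + 1):
        train = months_before(year, train_months)
        test = [(year, m) for m in range(1, 13)]
        yield train, test


# Assumed setup: one window per prediction year from 1988 to 2018.
windows = list(rolling_windows(1988, 2018))
train_1988, test_1988 = windows[0]
print(len(windows), train_1988[0], train_1988[-1], test_1988[0])
```

Under these assumptions, a model would be refit on each `train` window and evaluated only on the matching `test` year, so no prediction uses data from its own year or later.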
Recently completed a class project using the NYT movie review API. HTML report can be found here, with underlying code here. It was a cool opportunity to do some web scraping and practice model building. I will probably move away from WordPress in the near future since it does not play particularly nicely with Jupyter notebooks (and Github Pages looks much nicer!).
Finally got the opportunity to upload some of the draft tools I used this year to Github. I used a combination of a PostgreSQL database full of player stats, and a Jupyter Notebook for data wrangling and visualization.
Two plots which I found quite helpful in my drafting:
1. This shows the 2017 vs 2018 relative performance of the top 10% of players in the league. Very helpful for identifying potential bargains or overvalued players.
2. For a given goalie, this shows performance over the past three years (W, GAA, SV%). This came in very handy during the actual draft when I had to quickly compare multiple goalies.
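As a rough illustration of the first plot's underlying comparison, here is a minimal sketch in plain Python. Everything in it is hypothetical: the player names and point totals are made up, the helper names `top_fraction` and `year_over_year` are mine, and it assumes "top 10%" means ranked by the most recent season and "relative performance" means the change in points. The actual notebook pulls from the PostgreSQL database and may define these differently.

```python
from typing import Dict, List, Tuple

# Hypothetical season point totals for 20 made-up players.
names = [f"Player {i}" for i in range(1, 21)]
pts_2017 = [100, 95, 90, 88, 85, 80, 78, 75, 72, 70,
            68, 65, 63, 60, 58, 55, 52, 50, 48, 45]
pts_2018 = [96, 110, 85, 90, 80, 82, 70, 77, 75, 60,
            66, 70, 61, 55, 60, 50, 57, 49, 40, 47]
points_2017: Dict[str, int] = dict(zip(names, pts_2017))
points_2018: Dict[str, int] = dict(zip(names, pts_2018))


def top_fraction(points: Dict[str, int], fraction: float = 0.10) -> List[str]:
    """Return the top `fraction` of players, ranked by points (highest first)."""
    n = max(1, round(len(points) * fraction))
    ranked = sorted(points, key=points.get, reverse=True)
    return ranked[:n]


def year_over_year(
    prev: Dict[str, int], curr: Dict[str, int], players: List[str]
) -> List[Tuple[str, int]]:
    """Return (player, change in points) for players present in both seasons."""
    return [(p, curr[p] - prev[p]) for p in players if p in prev]


top_players = top_fraction(points_2018)
changes = year_over_year(points_2017, points_2018, top_players)
print(changes)
```

A large positive change might flag a player whose value is inflated by one strong season, while a negative change among top players might point to a bargain.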
In the past I largely used Excel to do this type of analysis. What I realized this year is that for any non-trivial data analysis like this, Python is much, much, much easier to use.
In July I was a guest on Eric Cai’s YouTube talk show The Central Equilibrium. In the episode I discuss different sets of numbers (natural, integer, rational, irrational, real, complex), prove that the square root of 2 is irrational, define injection/surjection/bijection, and finish by proving the rational numbers are countable.
It was a great learning experience for me as it was one of the few times I have ever formally taught math.
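For readers who like to see the countability argument in code, here is a minimal sketch of one standard construction. It is not necessarily the proof used in the episode. It lists every rational number exactly once by walking the positive fractions in order of numerator + denominator, skipping any fraction that is not in lowest terms, and interleaving negatives. The function names are my own.

```python
from fractions import Fraction
from itertools import islice
from math import gcd
from typing import Iterator


def positive_rationals() -> Iterator[Fraction]:
    """Yield each positive rational once, ordered by numerator + denominator."""
    total = 2
    while True:
        for numerator in range(1, total):
            denominator = total - numerator
            # Only keep fractions in lowest terms, so 2/2, 2/4, ... are skipped.
            if gcd(numerator, denominator) == 1:
                yield Fraction(numerator, denominator)
        total += 1


def all_rationals() -> Iterator[Fraction]:
    """Yield 0, then each positive rational followed by its negative."""
    yield Fraction(0)
    for r in positive_rationals():
        yield r
        yield -r


first_ten = list(islice(all_rationals(), 10))
print([str(r) for r in first_ten])
```

Because every rational appears at some finite position in this list, pairing the n-th item with the natural number n gives a bijection between the naturals and the rationals.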
Links to the videos and a PDF of the show notes are below:
I just finished courses 1-4 in the five-course Deep Learning Specialization on Coursera. For those considering enrolling, here are some of my thoughts.
The specialization will give you some intuition about how deep learning works; however, most of this comes in the first two courses. The lecture videos are well done and Andrew Ng is a clear lecturer. The programming assignments were disappointing.
Where I Was Coming From
I will be beginning my MSc Statistics at the University of Toronto this fall, and will be taking a number of machine learning courses. I enrolled in this specialization to get a small preview of what I will see in these courses, as well as to improve my programming skills. This past year I took courses in linear algebra and multivariable calculus so none of the math in the courses was new to me.
Cost & Time
The course works on a subscription basis where you pay $49 USD a month (~$65 CAD). On average each course in the specialization takes fifteen hours to complete.
- The course is very approachable and the lecture videos are easy to follow along with.
- The Heroes of Deep Learning videos (interviews with famous names in the machine learning community) are terrific. Each interview subject has a unique perspective and gives their advice for anyone interested in breaking into the field.
- The course has been quite popular, and as a result the forums have lots of archived advice/tips if you get stuck on an assignment.
- The assignments are completed in Jupyter Notebooks hosted by Coursera. Given this is a fairly popular programming environment, it was great to get some experience with it.
- My biggest issue with this course was with the programming assignments. Obviously Andrew Ng wanted to make the course accessible to a wide audience, so I was not expecting the assignments to be overly difficult. However, the assignments tended to be either incredibly easy (aided by some hints which at times provided nearly all of the code you needed to input), or overly structured (instead of being tasked with a problem to solve, you are tasked with filling in snippets of code in a nearly solved problem). I had previously taken the course An Introduction to Interactive Programming in Python which had a much better assignment format. In that course you were given a problem to solve, a couple of hints, and then you coded up a solution from scratch.
- The course had a very heavy focus on computer vision. It would have been nice if it had devoted more time to other areas where deep learning is being applied.
- My personal learning style was to paste the lecture slides into OneNote and take notes on top of them. However, for several videos the lecture slides were missing (i.e. Ng would lecture using a set of slides but you could not download them). For an online course that costs money, this seemed inexcusable.
If I were to do it again I think I would have just done the first two courses, and instead of courses 3 and 4, devoted my time to building things from scratch or trying a Kaggle competition with what I had learnt. For those looking to get a better understanding of what deep learning is at a high level, I would recommend checking out the first two courses in this specialization.
A follow-up to my earlier data visualization (World Cup 2018 – Who is Toronto Cheering For?).
Blue represents census tracts where the French population outnumbers the Croatian population. Red represents census tracts where the Croatian population outnumbers the French population.