MovieLens Project Part 2

Baseline models are important for 2 key reaons:

  1. Baseline models give us a starting point to which to compare all future models, and
  2. Smart baselines/averages may be needed to fill in missing data for more complicated models

Here, we'll explore a few typical baseline models for recommender systems and …

Read more →

MovieLens Project Part 1

I'm beginning a project on the MovieLens dataset to learn about collaborative filtering algorithms. This is Part 1 of this project, where I do an initial exploratory data analysis to see what the data looks like. The remainder of this post is straight out of a Jupyter Notebook file you …

Read more →

Analyzing Larger-than-Memory Data on your Laptop

If you want to run some analysis on a dataset that's just a little too big to load into memory on your laptop, but you don't want to leave the comfort of using Pandas dataframes in a Jupyter notebook, then Dask may be just your thing. Dask is an amazing …

Read more →

Taking Advantage of Sparsity in the ALS-WR Algorithm

The ALS-WR algorithm works well for recommender systems involving a sparse matrix of users by items to review, which happens when most people only review a small subset of many possible items (businesses, movies, etc.). By tweaking the code from a great tutorial to take advantage of this sparsity, I was able to dramatically reduce the computation time.

Read more →