MovieLens Project Part 2

Baseline models are important for two key reasons:

  1. Baseline models give us a starting point against which to compare all future models, and
  2. Smart baselines/averages may be needed to fill in missing data for more complicated models
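
As a rough illustration of both roles, here is a minimal sketch of one possible baseline: predict each movie's mean rating, and fall back to the global mean for a movie with no ratings. The toy data and the names `fit_baselines` and `predict` are hypothetical, made up for this sketch, and are not taken from the MovieLens dataset or from the full post.

```python
from collections import defaultdict

# Hypothetical toy ratings as (user, movie, rating) tuples; not MovieLens data.
ratings = [
    ("u1", "m1", 4.0),
    ("u1", "m2", 2.0),
    ("u2", "m1", 5.0),
    ("u3", "m2", 3.0),
]


def fit_baselines(ratings):
    """Return the global mean rating and a dict of per-movie mean ratings."""
    total = 0.0
    per_movie = defaultdict(list)
    for _user, movie, rating in ratings:
        total += rating
        per_movie[movie].append(rating)
    global_mean = total / len(ratings)
    movie_means = {m: sum(rs) / len(rs) for m, rs in per_movie.items()}
    return global_mean, movie_means


def predict(movie, global_mean, movie_means):
    """Use the movie's mean rating; fall back to the global mean if the movie is unseen."""
    return movie_means.get(movie, global_mean)


global_mean, movie_means = fit_baselines(ratings)
print(predict("m1", global_mean, movie_means))  # a movie with ratings
print(predict("m3", global_mean, movie_means))  # an unseen movie
```

Any later model can be scored against these predictions, and the fallback value is the kind of fill-in a more complicated model might need for missing entries.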

Here, we'll explore a few typical baseline models for recommender systems and …

Read more →

MovieLens Project Part 1

I'm beginning a project on the MovieLens dataset to learn about collaborative filtering algorithms. This is Part 1 of this project, where I do an initial exploratory data analysis to see what the data looks like. The remainder of this post is straight out of a Jupyter Notebook file you …

Read more →

Analyzing Larger-than-Memory Data on your Laptop

If you want to run some analysis on a dataset that's just a little too big to load into memory on your laptop, but you don't want to leave the comfort of using Pandas dataframes in a Jupyter notebook, then Dask may be just your thing. Dask is an amazing …

Read more →

Taking Advantage of Sparsity in the ALS-WR Algorithm

The ALS-WR algorithm works well for recommender systems built on a sparse user-by-item matrix of reviews, which arises when most people review only a small subset of the many possible items (businesses, movies, etc.). By tweaking the code from a great tutorial to take advantage of this sparsity, I was able to dramatically reduce the computation time.

Read more →

Dealing with Grid Data in Python

In my PhD research, I do a lot of analysis of 2D and 3D grid data output by simulations I run. In my analyses, it's very helpful to restructure these data into a more usable format. A few key lines of Python code do the trick.
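
The excerpt doesn't show the actual code, so the following is only a guess at the kind of restructuring involved: turning a flat list of simulation values into a 2D grid that can be indexed as `grid[iy][ix]`. It assumes the values are stored row by row with the x index varying fastest; the function name `unflatten_2d` and the sample data are hypothetical.

```python
def unflatten_2d(values, nx, ny):
    """Reshape a flat, row-major list into ny rows of nx values each.

    Assumes x varies fastest in the flat list (an assumption of this sketch).
    """
    if len(values) != nx * ny:
        raise ValueError("expected nx * ny values")
    return [values[j * nx:(j + 1) * nx] for j in range(ny)]


# Hypothetical flat output for a 3-by-2 grid.
flat = [0, 1, 2, 3, 4, 5]
grid = unflatten_2d(flat, nx=3, ny=2)
print(grid[1][2])  # value at ix=2, iy=1
```

For large grids, a NumPy array reshaped to `(ny, nx)` would likely be the more practical choice; plain lists are used here only to keep the sketch dependency-free.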

Read more →

Interactive D3 Map of Baby Name Popularity

[Interactive map: name dropdown and year slider, starting at 1910]

Using and Understanding this Map

To use the map above, select a name from the dropdown list (you should be able to type a name if you don't want to scroll), then drag the slider to move in time between the years 1910 and 2014 …

Read more →