Projects

Comparing Collaborative Filtering Methods

I wanted to dive into the fundamentals of collaborative filtering and recommender systems, so I implemented a few common methods and compared them.

Read

Deploying a Cookiecutter Django Site on AWS

The Cookiecutter Django template is an amazing tool for starting a Django project with all the bells and whistles ready to go. Getting a production site up and running can still be a bit of a hassle, though, so to save myself (and hopefully a few others) the trouble in the future, I'm recording all the steps that worked for me here.

Read
Data Science

Running Jupyter Lab Remotely

I'm a huge fan of Jupyter Notebooks, and I was very excited when I found out about Jupyter Lab, which provides a much more comprehensive user experience around Jupyter Notebooks. Here I share how to run Jupyter Lab efficiently on a remote machine. I do most of the analyses for my PhD work on a research cluster, and running Jupyter Lab directly on the cluster means I don't have to copy files between the cluster and my desktop.

Read
Data Science

Taking Advantage of Sparsity in the ALS-WR Algorithm

The ALS-WR algorithm works well for recommender systems built on a sparse user-by-item ratings matrix, which arises when most people review only a small subset of the many possible items (businesses, movies, etc.). By tweaking the code from a great tutorial to take advantage of this sparsity, I was able to dramatically reduce the computation time.
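
As a rough illustration of where the savings come from (this is not the tutorial's code or the exact change described in the post), the sketch below updates each user's factor vector using only the items that user actually rated. The function name `update_user_factors`, the toy ratings matrix, and the regularization value are assumptions made for the example, and a zero is assumed to mean "not rated".

```python
import numpy as np

def update_user_factors(R, M, lam):
    """One ALS-WR half-step: solve for every user's factors with items fixed.

    R   : (n_users, n_items) ratings, 0 meaning "not rated" (assumption)
    M   : (n_factors, n_items) current item factors
    lam : regularization strength
    """
    n_factors = M.shape[0]
    U = np.zeros((n_factors, R.shape[0]))
    for i in range(R.shape[0]):
        rated = np.flatnonzero(R[i])      # indices of items user i rated
        if rated.size == 0:
            continue                      # no ratings: leave factors at zero
        M_i = M[:, rated]                 # keep only the columns that matter
        # Weighted-lambda regularization scales lam by the number of ratings.
        A = M_i @ M_i.T + lam * rated.size * np.eye(n_factors)
        b = M_i @ R[i, rated]
        U[:, i] = np.linalg.solve(A, b)
    return U

# Toy data (hypothetical): 3 users, 4 items, mostly unrated.
R = np.array([[5.0, 0.0, 0.0, 1.0],
              [0.0, 4.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
rng = np.random.default_rng(0)
M = rng.random((2, 4))
U = update_user_factors(R, M, lam=0.1)
```

Each small linear system involves only the rated columns of `M`, so the work per user scales with the number of ratings rather than the total number of items.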

Read
Data Science

Analyzing Larger-than-Memory Data on your Laptop

If you want to run some analysis on a dataset that's just a little too big to load into memory on your laptop, but you don't want to leave the comfort of using Pandas dataframes in a Jupyter notebook, then Dask may be just your thing. Dask is an amazing Python library that lets you do all your Pandas-style dataframe manipulations with just a few simple tweaks so you don't have to worry about Jupyter freezing up.

Read
Data Science

Dealing with Grid Data in Python

In my PhD research, I do a lot of analysis of 2D and 3D grid data output by simulations I run. In my analyses, it's very helpful to restructure these data into a more usable format. A few key lines of Python code do the trick.
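
To give a flavor of what such a restructuring can look like, here is a minimal sketch that assumes the simulation writes one row per grid point with columns x, y, value, and with x varying slowest. That layout, the grid size, and the variable names are assumptions for illustration, not the actual output format used in the post.

```python
import numpy as np

# Hypothetical simulation output: one row per grid point, columns x, y, value,
# with x varying slowest (an assumed ordering).
nx, ny = 3, 4
xs, ys = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
flat = np.column_stack([xs.ravel(), ys.ravel(), (10 * xs + ys).ravel()])

# Restructure the value column so that grid[i, j] is the value at (x=i, y=j).
grid = flat[:, 2].reshape(nx, ny)
```

A 3D grid would work the same way with a third dimension passed to `reshape`, provided the row ordering in the file matches.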

Read
Projects

Interactive D3 Map of Baby Name Popularity

I built a D3 map showing baby name popularity by state with an interactive time slider you can use to see how the popularity changes over time. Come check it out!

Read
Projects

Parameter Sweep Bash Script

In my polymer field theory research, my studies often involve running a batch of simulations in which I vary one or more input parameters over a range of values, then compare the results to see how those parameters affect the system I'm simulating. It can be very tedious to manually create input files for each job, so I wrote a bash script to help me out.
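
The general idea can be sketched as follows. Note that this sketch is in Python rather than bash and is not the script from the post; the template contents, the parameter names `chi` and `N`, and the directory naming scheme are all hypothetical.

```python
import itertools
import tempfile
from pathlib import Path
from string import Template

# Hypothetical input-file template and parameter names.
template = Template("chi = $chi\nN = $N\n")
sweep = {"chi": [0.1, 0.2, 0.3], "N": [50, 100]}

out_dir = Path(tempfile.mkdtemp())   # stand-in for a real job directory
written = []
# Visit every combination of parameter values in the sweep.
for values in itertools.product(*sweep.values()):
    params = dict(zip(sweep.keys(), values))
    # One directory per job, named after its parameter values.
    job_dir = out_dir / "_".join(f"{k}{v}" for k, v in params.items())
    job_dir.mkdir()
    input_file = job_dir / "input.txt"
    input_file.write_text(template.substitute(params))
    written.append(input_file)
```

Each job ends up in its own directory with a filled-in input file, ready to be submitted however the cluster expects.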

Read