Data Science

Running Jupyter Lab Remotely

I'm a huge fan of Jupyter Notebooks, and I was very excited when I found out about Jupyter Lab, which provides a much more comprehensive user experience around Jupyter Notebooks. Here I share how to run Jupyter Lab efficiently on a remote machine. I have a research cluster where I do most of my analyses for my PhD work, and running Jupyter Lab directly on the cluster means I don't have to copy files between the cluster and my desktop.

Taking Advantage of Sparsity in the ALS-WR Algorithm

The ALS-WR algorithm works well for recommender systems built on a sparse users-by-items ratings matrix, which arises when most people review only a small subset of the many possible items (businesses, movies, etc.). By tweaking the code from a great tutorial to take advantage of this sparsity, I was able to dramatically reduce the computation time.
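To illustrate where the savings come from, here is a minimal NumPy sketch of ALS-WR. It is not the tutorial's code or the post's code. It assumes the ratings sit in a dense array where 0 means "unrated", and it uses the ALS-WR weighted regularization λ·n_u, where n_u is the number of items user u rated (and likewise n_i for items). The point is that each per-user or per-item least-squares solve gathers only the rows of the other factor matrix that correspond to observed ratings, rather than working on the full matrix.

```python
import numpy as np


def als_wr(R, k=2, lam=0.1, n_iters=20, seed=0):
    """Factor R ~ X @ Y.T with alternating least squares and weighted-lambda
    regularization. Assumption for this sketch: R is a dense array in which
    0 marks a missing rating."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, k))
    Y = rng.normal(scale=0.1, size=(n_items, k))
    # Precompute the observed indices once; this is where sparsity pays off.
    user_items = [np.flatnonzero(R[u]) for u in range(n_users)]
    item_users = [np.flatnonzero(R[:, i]) for i in range(n_items)]
    eye = np.eye(k)

    for _ in range(n_iters):
        # Fix Y; solve for each user using only the items that user rated.
        for u, idx in enumerate(user_items):
            if idx.size == 0:
                continue
            Y_u = Y[idx]  # shape (n_u, k)
            A = Y_u.T @ Y_u + lam * idx.size * eye
            X[u] = np.linalg.solve(A, Y_u.T @ R[u, idx])
        # Fix X; solve for each item using only the users who rated it.
        for i, idx in enumerate(item_users):
            if idx.size == 0:
                continue
            X_i = X[idx]  # shape (n_i, k)
            A = X_i.T @ X_i + lam * idx.size * eye
            Y[i] = np.linalg.solve(A, X_i.T @ R[idx, i])
    return X, Y


# Tiny made-up example: 3 users x 4 items, zeros are unrated.
R = np.array([[1, 2, 0, 3],
              [2, 0, 2, 6],
              [0, 6, 3, 9]], dtype=float)
X, Y = als_wr(R, k=2, lam=0.01, n_iters=50)
observed = R != 0
rmse = np.sqrt(np.mean((R - X @ Y.T)[observed] ** 2))
print(f"RMSE on observed ratings: {rmse:.3f}")
```

For a matrix that is genuinely large, you would store R in a sparse format such as `scipy.sparse.csr_matrix` instead of a dense array, so the observed indices come straight from the sparse structure.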

Analyzing Larger-than-Memory Data on Your Laptop

If you want to analyze a dataset that's just a little too big to load into memory on your laptop, but you don't want to leave the comfort of Pandas dataframes in a Jupyter notebook, then Dask may be just your thing. Dask is an amazing Python library that lets you carry out most of your Pandas-style dataframe manipulations with just a few small tweaks, so you don't have to worry about Jupyter freezing up.

Dealing with Grid Data in Python

In my PhD research, I analyze a lot of 2D and 3D grid data output by the simulations I run, and it's very helpful to restructure these data into a more usable format. A few key lines of Python code do the trick.
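The exact lines from the post aren't shown here, so this is an assumption about one common restructuring: flattening an N-D grid and its coordinate axes into a long-form pandas DataFrame with one row per grid cell, which makes filtering and `groupby` straightforward. `grid_to_long` is a hypothetical helper name, and the small field below is made-up data.

```python
import numpy as np
import pandas as pd


def grid_to_long(values, *axes, names=None, value_name="value"):
    """Flatten an N-D grid into a long-form DataFrame, one row per cell.

    values must have shape (len(axes[0]), len(axes[1]), ...), where each
    axis array holds the coordinates along that dimension.
    """
    if values.shape != tuple(len(a) for a in axes):
        raise ValueError("grid shape does not match the coordinate axes")
    names = names or [f"x{i}" for i in range(len(axes))]
    # indexing="ij" keeps each coordinate mesh aligned with values' axis order.
    mesh = np.meshgrid(*axes, indexing="ij")
    data = {name: m.ravel() for name, m in zip(names, mesh)}
    data[value_name] = values.ravel()
    return pd.DataFrame(data)


# Made-up 2D field: field[i, j] is the value at (x[i], y[j]).
x = np.array([0.0, 1.0])
y = np.array([10.0, 20.0, 30.0])
field = np.arange(6.0).reshape(2, 3)
df = grid_to_long(field, x, y, names=["x", "y"])

# Once the data are long-form, reductions become one-liners,
# e.g. averaging over y to get a profile along x.
profile = df.groupby("x")["value"].mean()
print(df)
print(profile)
```

The same call works for 3D output by passing a third coordinate axis, since `np.meshgrid` and `ravel` handle any number of dimensions.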