Dealing with Grid Data in Python

python
numpy
Some tricks for dealing with 2D and 3D grid data in python.
Author

Ben Lindsay

Published

December 8, 2016

In my PhD research, I do a lot of analysis of 2D and 3D grid data output by simulations I run. If, for example, I had a toy system that was 3x2x2 grid points, the raw data would be structured sort of like this:

x   y   z   value
0   0   0   0.9
1   0   0   1.1
2   0   0   0.8
0   1   0   1.1
1   1   0   1.0
2   1   0   0.9
0   0   1   0.6
1   0   1   1.2
2   0   1   0.8
0   1   1   0.9
1   1   1   1.2
2   1   1   1.3

In my analyses, it’s very helpful to restructure these data into a format where, in this case, x = [0, 1, 2], y = [0, 1], z = [0, 1], and value is a 3D array such that value[i, j, k] returns the value corresponding to position (x[i], y[j], z[k]).

It’s easy to do that in just a few lines. Say the above raw data is stored in data.dat.

>>> import numpy as np
>>> x, y, z, value = np.loadtxt('data.dat', skiprows=1).T
>>> x, y, z = np.unique(x), np.unique(y), np.unique(z)
>>> nx, ny, nz = len(x), len(y), len(z)
>>> value = value.reshape((nz, ny, nx)).T

Note that if the raw data had x varying the slowest and z varying the fastest, the final line would look like value = value.reshape((nx, ny, nz)).

Finally, if you want to go the other way, where you have your x, y, and z arrays and 3D values array, you can make use of the sklearn.utils.extmath.cartesian function (first introduced on this StackOverflow post. If you want z to be the fastest changing variable, it would look something like this:

>>> from sklearn.utils.extmath import cartesian
>>> import numpy as np
>>> x, y, z = [0, 1, 2], [0, 1], [0, 1]
>>> nx, ny, nz = len(x), len(y), len(z)
>>> value = np.arange(nx*ny*nz).reshape((nx,ny,nz)) # define 3D value array
>>> xyz = cartesian((z, y, x))
>>> value = value.flatten()
>>> np.hstack((xyz, value[:,None]))
array([[ 0,  0,  0,  0],
        [ 0,  0,  1,  1],
        [ 0,  1,  0,  2],
        [ 0,  1,  1,  3],
        [ 1,  0,  0,  4],
        [ 1,  0,  1,  5],
        [ 1,  1,  0,  6],
        [ 1,  1,  1,  7],
        [ 2,  0,  0,  8],
        [ 2,  0,  1,  9],
        [ 2,  1,  0, 10],
        [ 2,  1,  1, 11]])

The value[:,None] thing on the last line adds an extra dimension to the 1D value array so the elements of the tuple passed to np.hstack are both 2D numpy arrays.

>>> value.shape
(12,)
>>> value[:,None].shape
(12, 1)
>>> value[None,:].shape
(1, 12)