Module 6 · Models · Free & interactive

Interpolation & Kriging in Python

Predict values at unsampled locations. Written by Eric Vaz. Read it here, or open the interactive version to write and run the Python yourself, in your browser, on real open data.

Open the interactive version (write & run the code) →

Trend surfaces

For a continuous field sampled at points, the first question is whether there's a broad drift across the region. A trend surface answers it by regressing the value on the coordinates: first order fits a tilted plane, second order adds curvature. It's ordinary least squares; the spatial part is only in the design matrix.

ExampleLoad the elevation points (run me first)

import numpy as np
from pyodide.http import open_url
import pandas as pd
topo = pd.read_csv(open_url("https://cdn.jsdelivr.net/gh/vincentarelbundock/Rdatasets@master/csv/MASS/topo.csv"))[["x","y","z"]]
x, yy, z = topo.x.values, topo.y.values, topo.z.values
print("loaded", topo.shape, "elevation soundings")

Try it yourselfBuild the design matrix

Complete design so order 1 adds the linear terms x and y (order 2 already adds the quadratic terms). Fill the blank with the two linear columns.

def design(x, y, order):
    cols = [np.ones_like(x)]
    if order >= 1: cols += [x, y]
    if order >= 2: cols += [x*x, x*y, y*y]
    return np.column_stack(cols)

print("order 1 columns:", design(x, yy, 1).shape[1], "| order 2 columns:", design(x, yy, 2).shape[1])

In the interactive version this line is blanked out and you write it yourself, with a hint, a solution, and an automatic check.

Try it yourselfFit and score

Fit orders 1 and 2 by least squares and report R². Fill the lstsq arguments so it solves A b ≈ z.

for o in (1, 2):
    A = design(x, yy, o)
    b, *_ = np.linalg.lstsq(A, z, rcond=None)
    r2 = 1 - np.sum((z - A@b)**2) / np.sum((z - z.mean())**2)
    print(f"order {o}: R^2 = {r2:.3f}  ->  explains {r2*100:.0f}%")

In the interactive version this line is blanked out and you write it yourself, with a hint, a solution, and an automatic check.

Fit the second-order surface, draw it as filled contours, and (just as important) map the residuals: what the smooth trend cannot explain. Their spatial structure is exactly what the variogram and kriging will model next.

ExampleThe surface, and what it leaves behind (run me)

A = design(x, yy, 2)
b, *_ = np.linalg.lstsq(A, z, rcond=None)
resid = z - A @ b
import matplotlib.pyplot as plt
gx = np.linspace(x.min(), x.max(), 60); gy = np.linspace(yy.min(), yy.max(), 60)
GX, GY = np.meshgrid(gx, gy)
GZ = (design(GX.ravel(), GY.ravel(), 2) @ b).reshape(GX.shape)
fig, ax = plt.subplots(1, 2, figsize=(9.2, 3.8))
cf = ax[0].contourf(GX, GY, GZ, 12, cmap="viridis")
ax[0].scatter(x, yy, c="white", s=12, edgecolor="k"); ax[0].set(title="2nd-order trend surface"); ax[0].set_aspect("equal")
plt.colorbar(cf, ax=ax[0])
sc = ax[1].scatter(x, yy, c=resid, cmap="RdBu_r", s=70, edgecolor="k", linewidth=.4)
ax[1].set(title="Residuals: structure the trend missed"); ax[1].set_aspect("equal"); plt.colorbar(sc, ax=ax[1])
plt.tight_layout()

The variogram

How quickly does similarity fall off with distance? The empirical semivariogram answers it: for every pair of points, take half the squared difference in value, then average those within distance bins. Three numbers describe the fitted model: the nugget (variance at zero distance), the sill (the plateau), and the range (where points stop being correlated). We compute it on the trend residuals.

ExampleDetrend the field (run me first)

import numpy as np
from pyodide.http import open_url
import pandas as pd
topo = pd.read_csv(open_url("https://cdn.jsdelivr.net/gh/vincentarelbundock/Rdatasets@master/csv/MASS/topo.csv"))[["x","y","z"]]
x, yy, z = topo.x.values, topo.y.values, topo.z.values
def design(x, y, order):
    cols = [np.ones_like(x)]
    if order >= 1: cols += [x, y]
    if order >= 2: cols += [x*x, x*y, y*y]
    return np.column_stack(cols)
A1 = design(x, yy, 1); b1, *_ = np.linalg.lstsq(A1, z, rcond=None)
r = z - A1 @ b1                      # detrended residuals -> the structured part
print("detrended; residual std =", round(float(r.std()), 2))

Try it yourselfEmpirical semivariogram

For every pair of points: half the squared value-difference, averaged in distance bins. pdist needs a 2-D array, so pass the residuals r as a column. Fill the blank.

from scipy.spatial.distance import pdist
P = np.c_[x, yy]
h = pdist(P)                                  # all pairwise distances
g = 0.5 * pdist(r.reshape(-1, 1), "sqeuclidean")
edges = np.linspace(0, h.max()/2, 12); ctr = 0.5*(edges[1:] + edges[:-1])
gamma = np.array([g[(h>=lo)&(h<hi)].mean() if ((h>=lo)&(h<hi)).any() else np.nan
                  for lo, hi in zip(edges[:-1], edges[1:])])
print("semivariance per bin:", np.round(gamma, 1))

In the interactive version this line is blanked out and you write it yourself, with a hint, a solution, and an automatic check.

Try it yourselfFit a spherical model

Fit the nugget, partial sill and range with curve_fit. The first argument is the model function itself. Fill the blank with spherical.

from scipy.optimize import curve_fit
def spherical(h, c0, c1, a):
    h = np.asarray(h, float)
    s = c0 + c1 * (1.5*h/a - 0.5*(h/a)**3); s[h > a] = c0 + c1
    return s
ok = ~np.isnan(gamma)
(c0, c1, a), _ = curve_fit(spherical, ctr[ok], gamma[ok],
    p0=[gamma[ok].min(), np.ptp(gamma[ok]), (h.max()/2)*0.6],
    bounds=([0,0,1e-6], [np.inf]*3), maxfev=10000)
print(f"nugget = {c0:.0f}   sill = {c0+c1:.0f}   range = {a:.2f}")

In the interactive version this line is blanked out and you write it yourself, with a hint, a solution, and an automatic check.

ExamplePlot the fitted variogram (run me)

import matplotlib.pyplot as plt
hh = np.linspace(0, h.max()/2, 100)
fig, ax = plt.subplots(figsize=(6.2, 3.9))
ax.plot(ctr[ok], gamma[ok], "o", color="#14233D", label="empirical")
ax.plot(hh, spherical(hh, c0, c1, a), color="#D6552F", lw=2, label="spherical model")
ax.set(xlabel="distance  h", ylabel="semivariance", title="Variogram of detrended elevation"); ax.legend()

Kriging

Kriging predicts the value at an unsampled location as a weighted average of the observations, with weights chosen so the prediction is unbiased and has minimum variance, and the weights come from the variogram you just fitted. Its signature feature is the second output: a kriging variance, low near data and high in the gaps.

ExampleSet up trend + variogram (run me, self-contained)

import numpy as np
from pyodide.http import open_url
import pandas as pd
from scipy.spatial.distance import pdist, cdist
from scipy.optimize import curve_fit
topo = pd.read_csv(open_url("https://cdn.jsdelivr.net/gh/vincentarelbundock/Rdatasets@master/csv/MASS/topo.csv"))[["x","y","z"]]
x, yy, z = topo.x.values, topo.y.values, topo.z.values
def design(x, y, order):
    cols = [np.ones_like(x)]
    if order >= 1: cols += [x, y]
    if order >= 2: cols += [x*x, x*y, y*y]
    return np.column_stack(cols)
A1 = design(x, yy, 1); b1, *_ = np.linalg.lstsq(A1, z, rcond=None); r = z - A1 @ b1
P = np.c_[x, yy]; h = pdist(P); g = 0.5*pdist(r.reshape(-1,1), "sqeuclidean")
edges = np.linspace(0, h.max()/2, 12); ctr = 0.5*(edges[1:]+edges[:-1])
gamma = np.array([g[(h>=lo)&(h<hi)].mean() if ((h>=lo)&(h<hi)).any() else np.nan for lo,hi in zip(edges[:-1],edges[1:])])
def spherical(hh, c0, c1, a):
    hh = np.asarray(hh, float); s = c0 + c1*(1.5*hh/a - 0.5*(hh/a)**3); s[hh>a] = c0+c1; return s
ok = ~np.isnan(gamma)
(c0, c1, a), _ = curve_fit(spherical, ctr[ok], gamma[ok], p0=[gamma[ok].min(), np.ptp(gamma[ok]), (h.max()/2)*.6], bounds=([0,0,1e-6],[np.inf]*3), maxfev=10000)
print("ready -> nugget=%.0f sill=%.0f range=%.2f" % (c0, c0+c1, a))

Try it yourselfSolve the kriging system

Each prediction's weights solve K w = k: K holds variogram values between data points (plus a row/col of ones for unbiasedness), k holds variogram values to the target. Fill the blank to solve the linear system.

def gamma_fn(hh): return spherical(hh, c0, c1, a)
n = len(P)
K = np.ones((n+1, n+1)); K[:n, :n] = gamma_fn(cdist(P, P)); K[n, n] = 0.0

def krige(pt):
    k = np.append(gamma_fn(cdist([pt], P)[0]), 1.0)   # +1 for the unbiasedness constraint
    w = np.linalg.solve(K, k)
    return w[:n] @ r, w @ k                            # residual prediction, variance

pred, var = krige([3.0, 3.0])
print("residual prediction at (3,3):", round(float(pred), 2), "| variance:", round(float(var), 2))

In the interactive version this line is blanked out and you write it yourself, with a hint, a solution, and an automatic check.

ExampleMap prediction + uncertainty (run me)

import matplotlib.pyplot as plt
gx = np.linspace(x.min(), x.max(), 45); gy = np.linspace(yy.min(), yy.max(), 45)
GX, GY = np.meshgrid(gx, gy); Z = np.zeros_like(GX); V = np.zeros_like(GX)
for i in range(GX.shape[0]):
    for j in range(GX.shape[1]):
        pr, vr = krige([GX[i,j], GY[i,j]])
        Z[i,j] = pr + np.array([1, GX[i,j], GY[i,j]]) @ b1   # add the trend back
        V[i,j] = vr
fig, ax = plt.subplots(1, 2, figsize=(9.6, 3.9))
cf = ax[0].contourf(GX, GY, Z, 14, cmap="viridis"); ax[0].scatter(x, yy, c="white", s=12, edgecolor="k")
ax[0].set(title="Kriged elevation"); ax[0].set_aspect("equal"); plt.colorbar(cf, ax=ax[0])
cv = ax[1].contourf(GX, GY, np.sqrt(np.clip(V, 0, None)), 14, cmap="magma"); ax[1].scatter(x, yy, c="#39d0ff", s=10, edgecolor="k")
ax[1].set(title="Kriging std. error (uncertainty)"); ax[1].set_aspect("equal"); plt.colorbar(cv, ax=ax[1])
plt.tight_layout()
print("Error is lowest at the dots and grows where there's no data.")

Go deeper / cite

Interpolation in land-use and environmental modelling

Eric Vaz — Google Scholar profile — geostatistics and spatial prediction in regional studies
Selected publications on ericvaz.com

Found this useful?It's free and always will be. A coffee keeps new modules coming.

☕ Buy me a coffee

More free modules

Spatial Autocorrelation — Understand how spatial relationships create patterns.Hot Spot Analysis — Identify statistically significant hot and cold spots.All modules & the interactive course →