Module 6 · Models · Free & interactive
Interpolation & Kriging in Python
Predict values at unsampled locations. Written by Eric Vaz. Read it here, or open the interactive version to write and run the Python yourself, in your browser, on real open data.
Open the interactive version (write & run the code) →Trend surfaces
For a continuous field sampled at points, the first question is whether there's a broad drift across the region. A trend surface answers it by regressing the value on the coordinates: first order fits a tilted plane, second order adds curvature. It's ordinary least squares; the spatial part is only in the design matrix.
import numpy as np
from pyodide.http import open_url
import pandas as pd
topo = pd.read_csv(open_url("https://cdn.jsdelivr.net/gh/vincentarelbundock/Rdatasets@master/csv/MASS/topo.csv"))[["x","y","z"]]
x, yy, z = topo.x.values, topo.y.values, topo.z.values
print("loaded", topo.shape, "elevation soundings")design so order 1 adds the linear terms x and y (order 2 already adds the quadratic terms). Fill the blank with the two linear columns.def design(x, y, order):
cols = [np.ones_like(x)]
if order >= 1: cols += [x, y]
if order >= 2: cols += [x*x, x*y, y*y]
return np.column_stack(cols)
print("order 1 columns:", design(x, yy, 1).shape[1], "| order 2 columns:", design(x, yy, 2).shape[1])
In the interactive version this line is blanked out and you write it yourself, with a hint, a solution, and an automatic check.
lstsq arguments so it solves A b ≈ z.for o in (1, 2):
A = design(x, yy, o)
b, *_ = np.linalg.lstsq(A, z, rcond=None)
r2 = 1 - np.sum((z - A@b)**2) / np.sum((z - z.mean())**2)
print(f"order {o}: R^2 = {r2:.3f} -> explains {r2*100:.0f}%")
In the interactive version this line is blanked out and you write it yourself, with a hint, a solution, and an automatic check.
Fit the second-order surface, draw it as filled contours, and (just as important) map the residuals: what the smooth trend cannot explain. Their spatial structure is exactly what the variogram and kriging will model next.
A = design(x, yy, 2)
b, *_ = np.linalg.lstsq(A, z, rcond=None)
resid = z - A @ b
import matplotlib.pyplot as plt
gx = np.linspace(x.min(), x.max(), 60); gy = np.linspace(yy.min(), yy.max(), 60)
GX, GY = np.meshgrid(gx, gy)
GZ = (design(GX.ravel(), GY.ravel(), 2) @ b).reshape(GX.shape)
fig, ax = plt.subplots(1, 2, figsize=(9.2, 3.8))
cf = ax[0].contourf(GX, GY, GZ, 12, cmap="viridis")
ax[0].scatter(x, yy, c="white", s=12, edgecolor="k"); ax[0].set(title="2nd-order trend surface"); ax[0].set_aspect("equal")
plt.colorbar(cf, ax=ax[0])
sc = ax[1].scatter(x, yy, c=resid, cmap="RdBu_r", s=70, edgecolor="k", linewidth=.4)
ax[1].set(title="Residuals: structure the trend missed"); ax[1].set_aspect("equal"); plt.colorbar(sc, ax=ax[1])
plt.tight_layout()The variogram
How quickly does similarity fall off with distance? The empirical semivariogram answers it: for every pair of points, take half the squared difference in value, then average those within distance bins. Three numbers describe the fitted model: the nugget (variance at zero distance), the sill (the plateau), and the range (where points stop being correlated). We compute it on the trend residuals.
import numpy as np
from pyodide.http import open_url
import pandas as pd
topo = pd.read_csv(open_url("https://cdn.jsdelivr.net/gh/vincentarelbundock/Rdatasets@master/csv/MASS/topo.csv"))[["x","y","z"]]
x, yy, z = topo.x.values, topo.y.values, topo.z.values
def design(x, y, order):
cols = [np.ones_like(x)]
if order >= 1: cols += [x, y]
if order >= 2: cols += [x*x, x*y, y*y]
return np.column_stack(cols)
A1 = design(x, yy, 1); b1, *_ = np.linalg.lstsq(A1, z, rcond=None)
r = z - A1 @ b1 # detrended residuals -> the structured part
print("detrended; residual std =", round(float(r.std()), 2))pdist needs a 2-D array, so pass the residuals r as a column. Fill the blank.from scipy.spatial.distance import pdist
P = np.c_[x, yy]
h = pdist(P) # all pairwise distances
g = 0.5 * pdist(r.reshape(-1, 1), "sqeuclidean")
edges = np.linspace(0, h.max()/2, 12); ctr = 0.5*(edges[1:] + edges[:-1])
gamma = np.array([g[(h>=lo)&(h<hi)].mean() if ((h>=lo)&(h<hi)).any() else np.nan
for lo, hi in zip(edges[:-1], edges[1:])])
print("semivariance per bin:", np.round(gamma, 1))
In the interactive version this line is blanked out and you write it yourself, with a hint, a solution, and an automatic check.
curve_fit. The first argument is the model function itself. Fill the blank with spherical.from scipy.optimize import curve_fit
def spherical(h, c0, c1, a):
h = np.asarray(h, float)
s = c0 + c1 * (1.5*h/a - 0.5*(h/a)**3); s[h > a] = c0 + c1
return s
ok = ~np.isnan(gamma)
(c0, c1, a), _ = curve_fit(spherical, ctr[ok], gamma[ok],
p0=[gamma[ok].min(), np.ptp(gamma[ok]), (h.max()/2)*0.6],
bounds=([0,0,1e-6], [np.inf]*3), maxfev=10000)
print(f"nugget = {c0:.0f} sill = {c0+c1:.0f} range = {a:.2f}")
In the interactive version this line is blanked out and you write it yourself, with a hint, a solution, and an automatic check.
import matplotlib.pyplot as plt
hh = np.linspace(0, h.max()/2, 100)
fig, ax = plt.subplots(figsize=(6.2, 3.9))
ax.plot(ctr[ok], gamma[ok], "o", color="#14233D", label="empirical")
ax.plot(hh, spherical(hh, c0, c1, a), color="#D6552F", lw=2, label="spherical model")
ax.set(xlabel="distance h", ylabel="semivariance", title="Variogram of detrended elevation"); ax.legend()Kriging
Kriging predicts the value at an unsampled location as a weighted average of the observations, with weights chosen so the prediction is unbiased and has minimum variance, and the weights come from the variogram you just fitted. Its signature feature is the second output: a kriging variance, low near data and high in the gaps.
import numpy as np
from pyodide.http import open_url
import pandas as pd
from scipy.spatial.distance import pdist, cdist
from scipy.optimize import curve_fit
topo = pd.read_csv(open_url("https://cdn.jsdelivr.net/gh/vincentarelbundock/Rdatasets@master/csv/MASS/topo.csv"))[["x","y","z"]]
x, yy, z = topo.x.values, topo.y.values, topo.z.values
def design(x, y, order):
cols = [np.ones_like(x)]
if order >= 1: cols += [x, y]
if order >= 2: cols += [x*x, x*y, y*y]
return np.column_stack(cols)
A1 = design(x, yy, 1); b1, *_ = np.linalg.lstsq(A1, z, rcond=None); r = z - A1 @ b1
P = np.c_[x, yy]; h = pdist(P); g = 0.5*pdist(r.reshape(-1,1), "sqeuclidean")
edges = np.linspace(0, h.max()/2, 12); ctr = 0.5*(edges[1:]+edges[:-1])
gamma = np.array([g[(h>=lo)&(h<hi)].mean() if ((h>=lo)&(h<hi)).any() else np.nan for lo,hi in zip(edges[:-1],edges[1:])])
def spherical(hh, c0, c1, a):
hh = np.asarray(hh, float); s = c0 + c1*(1.5*hh/a - 0.5*(hh/a)**3); s[hh>a] = c0+c1; return s
ok = ~np.isnan(gamma)
(c0, c1, a), _ = curve_fit(spherical, ctr[ok], gamma[ok], p0=[gamma[ok].min(), np.ptp(gamma[ok]), (h.max()/2)*.6], bounds=([0,0,1e-6],[np.inf]*3), maxfev=10000)
print("ready -> nugget=%.0f sill=%.0f range=%.2f" % (c0, c0+c1, a))K w = k: K holds variogram values between data points (plus a row/col of ones for unbiasedness), k holds variogram values to the target. Fill the blank to solve the linear system.def gamma_fn(hh): return spherical(hh, c0, c1, a)
n = len(P)
K = np.ones((n+1, n+1)); K[:n, :n] = gamma_fn(cdist(P, P)); K[n, n] = 0.0
def krige(pt):
k = np.append(gamma_fn(cdist([pt], P)[0]), 1.0) # +1 for the unbiasedness constraint
w = np.linalg.solve(K, k)
return w[:n] @ r, w @ k # residual prediction, variance
pred, var = krige([3.0, 3.0])
print("residual prediction at (3,3):", round(float(pred), 2), "| variance:", round(float(var), 2))
In the interactive version this line is blanked out and you write it yourself, with a hint, a solution, and an automatic check.
import matplotlib.pyplot as plt
gx = np.linspace(x.min(), x.max(), 45); gy = np.linspace(yy.min(), yy.max(), 45)
GX, GY = np.meshgrid(gx, gy); Z = np.zeros_like(GX); V = np.zeros_like(GX)
for i in range(GX.shape[0]):
for j in range(GX.shape[1]):
pr, vr = krige([GX[i,j], GY[i,j]])
Z[i,j] = pr + np.array([1, GX[i,j], GY[i,j]]) @ b1 # add the trend back
V[i,j] = vr
fig, ax = plt.subplots(1, 2, figsize=(9.6, 3.9))
cf = ax[0].contourf(GX, GY, Z, 14, cmap="viridis"); ax[0].scatter(x, yy, c="white", s=12, edgecolor="k")
ax[0].set(title="Kriged elevation"); ax[0].set_aspect("equal"); plt.colorbar(cf, ax=ax[0])
cv = ax[1].contourf(GX, GY, np.sqrt(np.clip(V, 0, None)), 14, cmap="magma"); ax[1].scatter(x, yy, c="#39d0ff", s=10, edgecolor="k")
ax[1].set(title="Kriging std. error (uncertainty)"); ax[1].set_aspect("equal"); plt.colorbar(cv, ax=ax[1])
plt.tight_layout()
print("Error is lowest at the dots and grows where there's no data.")Interpolation in land-use and environmental modelling
- Eric Vaz — Google Scholar profile — geostatistics and spatial prediction in regional studies
- Selected publications on ericvaz.com