Regridding climate data with xESMF¶

A common element of climate data workflows is regridding, or reprojection, of model data onto more standard grids, or simply onto another dataset’s grid. The powerful ESMF program, written in Fortran, has long been a reference in the matter. The xESMF Python package provides an easy-to-use, high-level API for ESMF’s methods. This notebook shows some examples of common regridding operations.

Regridding with xESMF is usually a two-step process:

1. Create a Regridder object from two datasets, defining the input and the output grids. This computes a weights matrix which can, if needed, be saved to a netCDF file.
2. Regrid a DataArray or Dataset by calling the Regridder with it. As the weights have already been computed, it reuses them for all time slices, which allows much better performance than, for example, interpolation using scipy.interpolate.interpn.
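
To see why precomputed weights make the second step cheap, regridding can be thought of as a matrix product between a fixed weights matrix and the flattened input field. The sketch below uses plain NumPy and is not xESMF’s implementation; the grid sizes, the `weights` matrix and the random data are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out, n_time = 6, 3, 4  # hypothetical grid sizes and number of time steps

# Step 1 (done once): build a weights matrix mapping input cells to output cells.
# Here each output cell is simply the mean of two neighbouring input cells.
weights = np.zeros((n_out, n_in))
for j in range(n_out):
    weights[j, 2 * j : 2 * j + 2] = 0.5

# Step 2 (cheap, repeatable): apply the same weights to every time slice at once.
data_in = rng.random((n_time, n_in))
data_out = data_in @ weights.T  # shape (n_time, n_out)
```

The expensive part of a real regridding is finding the weights; once they exist, every additional time slice only costs one more row in the matrix product.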

# NBVAL_IGNORE_OUTPUT

import copy
import json
import warnings
from tempfile import NamedTemporaryFile

warnings.filterwarnings("ignore", category=DeprecationWarning)

import cf_xarray as cfxr
import geopandas as gpd
import matplotlib.pyplot as plt
import shapely
import xarray as xr
import xesmf as xe
from clisops.core.subset import subset_bbox
from owslib.wfs import WebFeatureService

#  A colormap with grey where the data is missing
cmap = copy.copy(plt.get_cmap("viridis"))  # plt.cm.get_cmap was removed in recent matplotlib
cmap.set_bad("lightgray")

Second example: Conservative regridding and reusing weights¶

xESMF provides the following regridding methods: “bilinear”, “conservative”, “conservative_normed”, “nearest_s2d”, “nearest_d2s” and “patch” (see method descriptions). Conservative methods preserve areal averages, and for these methods we need to provide the coordinates of the grid cells’ corners rather than the coordinates of the cell centers.
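
As an illustration of what “preserving areal averages” means, here is a minimal one-dimensional sketch in NumPy. It is not how ESMF computes its weights; the helper `overlap_weights`, the cell edges and the values are hypothetical. Each destination value is the average of the source values, weighted by how much of the destination cell each source cell covers, which is why cell edges (corners) are needed rather than centers.

```python
import numpy as np


def overlap_weights(src_edges, dst_edges):
    """Fraction of each destination cell covered by each source cell (1D)."""
    lo = np.maximum(src_edges[None, :-1], dst_edges[:-1, None])
    hi = np.minimum(src_edges[None, 1:], dst_edges[1:, None])
    overlap = np.clip(hi - lo, 0.0, None)  # overlap length, zero if disjoint
    return overlap / np.diff(dst_edges)[:, None]


src_edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # 4 source cells
dst_edges = np.array([0.0, 1.5, 4.0])  # 2 destination cells
src_vals = np.array([10.0, 20.0, 30.0, 40.0])

w = overlap_weights(src_edges, dst_edges)
dst_vals = w @ src_vals

# The area-weighted mean over the whole domain is the same on both grids.
src_mean = np.average(src_vals, weights=np.diff(src_edges))
dst_mean = np.average(dst_vals, weights=np.diff(dst_edges))
```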

Untangling corners definitions¶

Before we go further, it’s worth highlighting differences between xESMF’s description of corner coordinates and how the same information is stored in CF-compliant files.

For an N x M lon/lat grid, xESMF expects corner arrays with one more element than the coordinates along each dimension. For example, on a regular grid, the corners of the point at lon[0] are given by lon_b[0] and lon_b[1]. However, in a typical CF-compliant file, grid corner information is stored in arrays of shape (N, 2), typically called lon_bounds and lat_bounds. Thus, the western and eastern corners of the point at lon[0] are given by lon_bounds[0, 0] and lon_bounds[0, 1].

The cf_xarray package differentiates the two concepts by naming the CF-compliant one “bounds” and the xESMF one “vertices”. However, the CF conventions sometimes use “vertices” and “bounds” interchangeably, and in our model dataset, the vertices_longitude variable stores corners according to the “bounds” definition… We will nevertheless stick with cf_xarray’s nomenclature in the following.

The table below summarizes the difference between the two versions:

xESMF definition	bounds	vertices
CF-compliant	Yes	No
Shape (regular grid)	(N, 2)	(N+1, )
Shape (irregular grid)	(Nx, Ny, 4)	(Nx+1, Ny+1)
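
For the regular-grid case, the relation between the two layouts can be sketched in a few lines of NumPy. This is only an illustration and not cf_xarray’s implementation; it assumes contiguous cells (the eastern edge of one cell is the western edge of the next) and uses made-up coordinates.

```python
import numpy as np

lon = np.array([0.5, 1.5, 2.5])  # N cell centers (made-up values)

# CF-style "bounds": one (west, east) pair per cell, shape (N, 2)
lon_bounds = np.stack([lon - 0.5, lon + 0.5], axis=1)

# xESMF-style "vertices": the western edge of every cell,
# followed by the eastern edge of the last one, shape (N + 1,)
lon_b = np.append(lon_bounds[:, 0], lon_bounds[-1, 1])
```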

Computing the corners¶

The corners of regular grids (1D lat/lon) are inferred automatically if not given. This will be the case for our ds_tgt dataset.

For irregular grids, xESMF will check for variables lon_b and lat_b, or try automatic detection with the help of cf_xarray. If they are found, it uses cf_xarray’s method to convert from the CF-compliant “bounds” to the required “vertices” syntax. However, a small bug in xESMF 0.5.2 prevents us from using this feature with our model dataset. We will convert the corner variables ourselves from the CF-compliant format we have to the format xESMF expects.

# Get the bounds variable and convert them to "vertices" format
# order=None means that we do not know whether the bounds are listed clockwise or counterclockwise, so we ask cf_xarray to try both.
lat_corners = cfxr.bounds_to_vertices(ds_in.vertices_latitude, "vertices", order=None)
lon_corners = cfxr.bounds_to_vertices(ds_in.vertices_longitude, "vertices", order=None)
ds_in_crns = ds_in.assign(lon_b=lon_corners, lat_b=lat_corners)

Regridding¶

The regridding process is as simple as above now that ds_in_crns contains the corner coordinates (lon_b, lat_b). Here we also pass a filename, so that the weights are saved to disk and can be reused (see below).

%%time

conservative_regridder = NamedTemporaryFile(delete=False, suffix=".nc")

reg_cons = xe.Regridder(
    ds_in_crns, ds_tgt, "conservative", filename=conservative_regridder.name
)
print(reg_cons)

# Regrid as before
sic_cons = reg_cons(ds_in_crns.siconc)

xESMF Regridder 
Regridding algorithm:       conservative 
Weight filename:            /tmp/tmpfgyaa2es.nc 
Reuse pre-computed weights? False 
Input grid shape:           (291, 360) 
Output grid shape:          (207, 570) 
Periodic in longitude?      False
CPU times: user 3.07 s, sys: 31.7 ms, total: 3.1 s
Wall time: 3.11 s

# Now let's look at the results
fig, axs = plt.subplots(nrows=1, ncols=3, figsize=(14, 4))

sic_bil.isel(time=0).plot(ax=axs[0], cmap=cmap)
axs[0].set_title("Method: Bilinear")

sic_cons.isel(time=0).plot(ax=axs[1], cmap=cmap)
axs[1].set_title("Method: Conservative")

# A divergent colormap with gray on missing values
cmap_div = copy.copy(plt.get_cmap("RdBu"))  # plt.cm.get_cmap was removed in recent matplotlib
cmap_div.set_bad("lightgray")
(sic_bil - sic_cons).isel(time=0).plot(ax=axs[2], cmap=cmap_div, vmin=-40, vmax=40)
diff_NaNs = (sic_bil.isnull() ^ sic_cons.isnull()).isel(time=0)
diff_NaNs.where(diff_NaNs).plot(
    cmap=plt.cm.Greens, ax=axs[2], vmin=0, add_colorbar=False
)
axs[2].set_title(
    "Bilinear minus Conservative\nGreen indicates missing values in one but not the other"
)
fig.tight_layout()

../_images/15fbc5b56b10eda1d12b85285508c862c1259b1294d7c3b2a9f870d92a2d2f5c.png

As we can see, “bilinear” regridding results in a smooth output field, while “conservative” regridding preserves the original data’s coarser resolution. In the last panel, the green cells show where the two methods disagree on missing values. In our case of increasing resolution, there will often be more missing values when using “bilinear”. The next example explains how xESMF can explicitly manage missing values. But first, we look at the reusability of the weights generated by xESMF.

Reusing weights¶

The weights of the previous regridding have been written to disk. We can simply reuse them by specifying that filename and passing reuse_weights=True. You’ll notice how much faster the process is, as we don’t compute the weights again.

%%time

reg_bis = xe.Regridder(
    ds_in_crns,
    ds_tgt,
    "conservative",
    reuse_weights=True,
    filename=conservative_regridder.name,
)
print(reg_bis)

# Regrid as before
sic_bis = reg_bis(ds_in_crns.siconc)

xESMF Regridder 
Regridding algorithm:       conservative 
Weight filename:            /tmp/tmpfgyaa2es.nc 
Reuse pre-computed weights? True 
Input grid shape:           (291, 360) 
Output grid shape:          (207, 570) 
Periodic in longitude?      False
CPU times: user 87.7 ms, sys: 4.98 ms, total: 92.6 ms
Wall time: 91.9 ms

Fourth example: Averaging over polygons¶

Because the conservative regridding method preserves areal averages, we can use xESMF to compute exact averages over polygons. We call it “exact” because it takes into account partial overlaps between the grid cells and the shapes, including potential holes. While it is fast and powerful, this polygon averaging functionality is new in xESMF and still has limitations, such as strict missing-value handling and slow performance with high-resolution polygons.
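
The idea of weighting by partial overlaps can be sketched with shapely alone. This is a toy planar example, not xESMF’s algorithm: the 2 x 2 grid of unit cells, the cell values and the polygon (with one hole) are all hypothetical, and cell areas are treated as flat rather than spherical.

```python
import numpy as np
from shapely.geometry import Polygon, box

# Hypothetical 2 x 2 grid of unit cells, listed row by row, one value per cell.
cells = [box(i, j, i + 1, j + 1) for j in range(2) for i in range(2)]
values = np.array([1.0, 2.0, 3.0, 4.0])

# A region covering the bottom row and half of the top row,
# with a square hole cut out of the first cell.
region = Polygon(
    shell=[(0, 0), (2, 0), (2, 1.5), (0, 1.5)],
    holes=[[(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]],
)

# Weight each cell by the area it shares with the region, hole excluded.
overlap = np.array([cell.intersection(region).area for cell in cells])
region_mean = (overlap * values).sum() / overlap.sum()
```

Cells that are only partly covered contribute proportionally less, and the hole reduces the weight of the first cell, which is the behaviour the word “exact” refers to above.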

The following example grabs some polygon shapes from PAVICS’ Geoserver and averages the NRCan data over them.

Define polygon shapes¶

This example fetches all MRCs (regional county municipalities) of Québec and then selects only 10 large ones.

wfs_url = "https://pavics.ouranos.ca/geoserver/wfs"  # TEST_USE_PROD_DATA
# Connect to GeoServer WFS service.
wfs = WebFeatureService(wfs_url, version="2.0.0")
# Get the json as a binary stream
# Here we select Quebec's MRCs polygons
# We select only a few properties
data = wfs.getfeature(
    typename="public:quebec_mrc_boundaries",
    # bbox=(-93.1, 41.1, -75.0, 49.6),
    outputFormat="json",
    propertyname=["the_geom", "MRS_NM_MRC"],
)

# Load into a GeoDataFrame by reading the json on-the-fly
shapes_all = gpd.GeoDataFrame.from_features(json.load(data))
# Just for simplicity, let's take 10 large MRCs
shapes_all["AREA"] = shapes_all.area
shapes = shapes_all.sort_values("AREA").iloc[-20:-10].set_index("MRS_NM_MRC")

Validate and simplify shapes¶

High-resolution polygons might slow down the creation of the xESMF averager object. Here we ensure polygons are simplified to a resolution 100 times finer than the input data. This should have a minimal impact on the output while still improving performance.

As is the case here, downloaded polygons sometimes have topological problems, which can be tested with shapes.is_valid. Simplifying polygons sometimes helps overcome these issues: here, we simplify with a tolerance of 1/100th of the grid size. Another workaround for self-intersections is to call shapes.buffer(0).
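
Both operations can be tried on toy geometries first. In this sketch, the “bowtie” polygon and the finely sampled circle are made-up stand-ins for a problematic download, and the tolerance of 0.01 is arbitrary.

```python
from shapely.geometry import Point, Polygon

# A self-intersecting "bowtie" ring is a classic example of an invalid polygon.
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])
repaired = bowtie.buffer(0)

# A finely sampled circle stands in for a high-resolution shape.
detailed = Point(0, 0).buffer(1.0, 64)
simplified = detailed.simplify(0.01)

n_before = len(detailed.exterior.coords)
n_after = len(simplified.exterior.coords)
print(bowtie.is_valid, repaired.is_valid, n_before, n_after)
```

Note that the repaired geometry is valid, but it is not guaranteed to cover exactly the same area as the original, so it is worth inspecting the result when the shapes matter.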

# NBVAL_IGNORE_OUTPUT

shapes.is_valid.all()

np.False_

# This is only to show the decrease in size


def count_points(elem):
    def _count(poly):
        return len(poly.exterior.coords) + sum(
            len(hole.coords) for hole in poly.interiors
        )

    if isinstance(elem, shapely.geometry.MultiPolygon):
        return sum(_count(poly) for poly in elem.geoms)
    return _count(elem)


# Count the total number of nodes in the shapes:
print(
    "Total number of nodes in the raw shapes : ",
    shapes.geometry.apply(count_points).sum(),
)

min_grid_size = float(
    min(abs(ds_in.lat.diff("lat")).min(), abs(ds_in.lon.diff("lon")).min())
)
print(
    f"Minimal grid size [°] of input ds: {min_grid_size:0.3f}, we will simplify to a tolerance of {min_grid_size / 100:0.5f}"
)

# Simplify geometries
shapes_simp = shapes.copy()
shapes_simp["geometry"] = shapes.simplify(min_grid_size / 100).buffer(0)

print(
    "Total number of nodes in the simplified shapes : ",
    shapes_simp.geometry.apply(count_points).sum(),
)
if shapes_simp.is_valid.all():  # buffer(0) was already applied above
    print("All shapes are valid")

Total number of nodes in the raw shapes :  166813
Minimal grid size [°] of input ds: 0.083, we will simplify to a tolerance of 0.00083
Total number of nodes in the simplified shapes :  7231
All shapes are valid

Averaging over each polygon¶

Performing the spatial average is as simple as regridding. We first construct a SpatialAverager object from the input grid and polygons, then call it with the data to average. Note that xESMF expects a list of shapes, so we pass the shapes.geometry series (and not the GeoDataFrame itself).

The returned DataArray was averaged along its spatial (lat/lon) dimensions, and the averages over the different shapes are along the new geom dimension, which is in the same order as the initial GeoDataFrame.

The current missing value handling in xESMF’s SpatialAverager is very strict, and we can see here how the three (3) MRCs that overlap with ocean cells of the data (where tasmin is NaN) are flagged as missing (NaN).

# NBVAL_IGNORE_OUTPUT

savg = xe.SpatialAverager(ds_in, shapes_simp.geometry)
tn_avg = savg(ds_in.tasmin)
tn_avg

/home/tjs/mambaforge/envs/pavics-sdi/lib/python3.12/site-packages/xesmf/frontend.py:1220: UserWarning: `polys` contains large (> 1°) segments. This could lead to errors over large regions. For a more accurate average, segmentize (densify) your shapes with  `shapely.segmentize(polys, 1)`
  self._check_polys_length(polys)

<xarray.DataArray (geom: 10)> Size: 40B
array([      nan,       nan, 274.64337, 275.4841 ,       nan, 276.23572,
       274.05746, 276.86087, 275.4918 , 276.8586 ], dtype=float32)
Coordinates:
    lon      (geom) float64 80B -66.01 -63.91 -77.09 ... -73.26 -76.76 -73.98
    lat      (geom) float64 80B 49.21 48.29 46.43 46.91 ... 48.81 48.15 48.04
Dimensions without coordinates: geom
Attributes:
    regrid_method:  conservative

Merging polygon features’ properties into the result¶

In the previous results, the polygons are indexed along the geom dimension, but we’d like to have the region names and properties.
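
One possible way to do this is to label the geom dimension with the region names. The snippet below is a minimal sketch with made-up averages and a short hypothetical list of names standing in for shapes_simp.index; it relies on the geom dimension being in the same order as the GeoDataFrame.

```python
import numpy as np
import xarray as xr

# Stand-ins for shapes_simp.index and for the averaged DataArray (made-up values).
names = ["Pontiac", "Antoine-Labelle", "La Tuque"]
avg = xr.DataArray(np.array([274.6, 276.2, 276.9]), dims="geom")

# Label the geom dimension with the region names, in the same order.
named = avg.assign_coords(geom=names)

# Regions can now be selected by name.
pontiac = float(named.sel(geom="Pontiac"))
```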

Or, on the contrary, we might want to merge the averaged data into the dataframe instead.

# NBVAL_IGNORE_OUTPUT
shapes_data = shapes_simp.copy()
shapes_data["tasmin"] = tn_avg.to_series()
shapes_data

	geometry	AREA	tasmin
MRS_NM_MRC
La Haute-Gaspésie	POLYGON ((-65.1956 49.2296, -65.1872 49.0994, ...	1.414604	NaN
Le Rocher-Percé	POLYGON ((-62.9991 48.7018, -62.9991 47.1572, ...	1.542547	NaN
Pontiac	POLYGON ((-77.9313 47.2692, -77.6473 47.2693, ...	1.653789	274.643372
La Vallée-de-la-Gatineau	POLYGON ((-76.2703 47.6899, -76.27 47.692, -76...	1.672640	275.484100
La Haute-Côte-Nord	POLYGON ((-69.5082 49.9983, -69.5105 49.9979, ...	1.800530	NaN
Antoine-Labelle	POLYGON ((-75.5386 47.7633, -74.8894 47.7626, ...	1.923892	276.235718
Témiscamingue	MULTIPOLYGON (((-77.5792 47.4424, -77.5863 47....	2.282582	274.057465
Le Domaine-du-Roy	POLYGON ((-73.6826 49.9973, -73.6844 49.997, -...	2.363714	276.860870
La Vallée-de-l'Or	MULTIPOLYGON (((-77.5792 47.4424, -77.5729 47....	3.306656	275.491791
La Tuque	POLYGON ((-74.6763 48.9996, -74.6285 48.9679, ...	3.572838	276.858612

# NBVAL_IGNORE_OUTPUT

# Now we can easily plot the results as a choropleth map!
ax = shapes_data.plot(
    "tasmin", legend=True, legend_kwds={"label": "Minimal temperature 1993-05-20 [K]"}
)
ax.set_ylabel("Latitude")
ax.set_xlabel("Longitude");

../_images/eb7bd6ee1761824c9afa20a413e82c0a0fee11592539691f6be6280bcd98f350.png
