Computing indicators on DAP subsets

In a typical programming environment, the standard way to compute an indicator for a given location is to extract the time series at that location, then run the computation on this subset. When interacting with a remote server, things are a bit more complicated. One option is to first call a subsetting process to extract the data at the desired location, then run the climate indicator process on the subsetted file. The other option, showcased here, is to pass a DAP URL that encodes the subsetting operation.

This tutorial shows how to get the indices for the desired location and pass them as a DAP link to a Finch indicator process.

[1]:
import os

import xarray as xr
from birdy import WPSClient

# Link to file storing precipitation
pr = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip3/pr.sresa2.miub_echo_g.run1.atm.da.nc"

# Open connection to Finch WPS server
pavics_url = "https://pavics.ouranos.ca/twitcher/ows/proxy/finch/wps"
url = os.environ.get("WPS_URL", pavics_url)
wps = WPSClient(url)
[2]:
# NBVAL_IGNORE_OUTPUT
# Open remote dataset and extract location indices
ds = xr.open_dataset(pr)
ds
[2]:
<xarray.Dataset>
Dimensions:    (lat: 6, bnds: 2, lon: 7, time: 7200)
Coordinates:
  * lat        (lat) float64 42.68 46.39 50.1 53.81 57.52 61.23
  * lon        (lon) float64 281.2 285.0 288.8 292.5 296.2 300.0 303.8
  * time       (time) object 2046-01-01 12:00:00 ... 2065-12-30 12:00:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (lat, bnds) float64 ...
    lon_bnds   (lon, bnds) float64 ...
    time_bnds  (time, bnds) object ...
    pr         (time, lat, lon) float32 ...
Attributes: (12/17)
    comment:        Spinup: restart files from end of experiment 20C3M (corre...
    title:          MIUB  model output prepared for IPCC Fourth Assessment SR...
    cmor_version:   0.96
    institution:    MIUB (University of Bonn, Bonn, Germany)
    source:         ECHO-G(1999): atmosphere: ECHAM4 (T30L19) with partial se...
    contact:        Stephanie Legutke (legutke@dkrz.de), Seung-Ki Min(skmin@u...
    ...             ...
    calendar:       360_day
    project_id:     IPCC Fourth Assessment
    Conventions:    CF-1.0
    id:             pcmdi.ipcc4.miub_echo_g.sresa2.run1.atm.da
    history:        Mon Aug  1 11:42:37 2011: ncks -4 -L 7 -d lat,42.0,64.0 -...
    NCO:            4.0.9
[3]:
# Use the `map_index_queries` function to convert coordinates to *positional* indexes.
coords = dict(lat=45, lon=290)
index = xr.core.indexing.map_index_queries(ds, coords, method="nearest").dim_indexers

# The `nearest` method cannot be used with slices, so we do another call for the time period.
ti = xr.core.indexing.map_index_queries(
    ds, dict(time=slice("2060-01-01", "2064-12-30"))
).dim_indexers

# Merge the spatial and temporal indices
index.update(ti)
index
[3]:
{'lat': array(1), 'lon': array(2), 'time': slice(5040, 6840, None)}
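
As a sanity check, the same nearest-neighbour lookup can be done with the public pandas index API. Below is a minimal sketch on a synthetic grid reusing the coordinate values printed above; `demo` is a hypothetical stand-in for the remote dataset:

```python
import xarray as xr

# Synthetic grid with the same lat/lon values as the remote dataset.
demo = xr.Dataset(coords={"lat": [42.68, 46.39, 50.1], "lon": [281.2, 285.0, 288.8]})

# `get_indexer` with method="nearest" returns positional indices directly.
ilat = demo.indexes["lat"].get_indexer([45], method="nearest")[0]
ilon = demo.indexes["lon"].get_indexer([290], method="nearest")[0]
print(ilat, ilon)  # 1 2
```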

Subsetting URLs

The subset syntax consists of a ? followed by a comma-separated list of variable names, each followed by a slice [start:step:stop] for each dimension. So for example, to get the very first time step of the precipitation time series over the entire grid, we’d write

<url>?pr[0:1:0][0:1:5][0:1:6]

Note that this uses a 0-based indexing system with an *inclusive* stop, so [0:1:1] is a slice including the first and second elements.
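
The inclusive stop is the main gotcha when translating Python slices, whose stop is exclusive. A sketch of the conversion, using the time slice computed earlier:

```python
# Python slice: stop is exclusive; DAP hyperslab: stop is inclusive.
py = slice(5040, 6840)  # 1800 elements in Python
dap = f"[{py.start}:{py.step or 1}:{py.stop - 1}]"
print(dap)  # [5040:1:6839]
```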

[4]:
# NBVAL_IGNORE_OUTPUT
xr.open_dataset(pr + "?pr[0:1:0][0:1:5][0:1:6]")
[4]:
<xarray.Dataset>
Dimensions:  (time: 1, lat: 6, lon: 7)
Dimensions without coordinates: time, lat, lon
Data variables:
    pr       (time, lat, lon) float32 ...
Attributes: (12/17)
    comment:        Spinup: restart files from end of experiment 20C3M (corre...
    title:          MIUB  model output prepared for IPCC Fourth Assessment SR...
    cmor_version:   0.96
    institution:    MIUB (University of Bonn, Bonn, Germany)
    source:         ECHO-G(1999): atmosphere: ECHAM4 (T30L19) with partial se...
    contact:        Stephanie Legutke (legutke@dkrz.de), Seung-Ki Min(skmin@u...
    ...             ...
    calendar:       360_day
    project_id:     IPCC Fourth Assessment
    Conventions:    CF-1.0
    id:             pcmdi.ipcc4.miub_echo_g.sresa2.run1.atm.da
    history:        Mon Aug  1 11:42:37 2011: ncks -4 -L 7 -d lat,42.0,64.0 -...
    NCO:            4.0.9

Note that the returned dataset has no time, lat or lon coordinates: we only requested the pr variable, not the coordinate variables. To remedy the situation, we add these coordinate variables to the request.
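
The request string itself is just a comma-separated join. A minimal sketch of how it is assembled, with the variable and slice strings taken from the cells above:

```python
# Variable with a hyperslab for each of its dimensions, plus coordinate variables.
var = "pr[0:1:0][0:1:5][0:1:6]"
coords = ["time[0:1:0]", "lat", "lon"]
query = "?" + ",".join([var] + coords)
print(query)  # ?pr[0:1:0][0:1:5][0:1:6],time[0:1:0],lat,lon
```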

[5]:
# NBVAL_IGNORE_OUTPUT
xr.open_dataset(pr + "?pr[0:1:0][0:1:5][0:1:6],time[0:1:0],lat,lon")
[5]:
<xarray.Dataset>
Dimensions:  (lat: 6, lon: 7, time: 1)
Coordinates:
  * lat      (lat) float64 42.68 46.39 50.1 53.81 57.52 61.23
  * lon      (lon) float64 281.2 285.0 288.8 292.5 296.2 300.0 303.8
  * time     (time) object 2046-01-01 12:00:00
Data variables:
    pr       (time, lat, lon) float32 ...
Attributes: (12/17)
    comment:        Spinup: restart files from end of experiment 20C3M (corre...
    title:          MIUB  model output prepared for IPCC Fourth Assessment SR...
    cmor_version:   0.96
    institution:    MIUB (University of Bonn, Bonn, Germany)
    source:         ECHO-G(1999): atmosphere: ECHAM4 (T30L19) with partial se...
    contact:        Stephanie Legutke (legutke@dkrz.de), Seung-Ki Min(skmin@u...
    ...             ...
    calendar:       360_day
    project_id:     IPCC Fourth Assessment
    Conventions:    CF-1.0
    id:             pcmdi.ipcc4.miub_echo_g.sresa2.run1.atm.da
    history:        Mon Aug  1 11:42:37 2011: ncks -4 -L 7 -d lat,42.0,64.0 -...
    NCO:            4.0.9

Now let’s go back to our original index and convert it into a DAP subset URL. Remember that the stop of a Python slice is exclusive while the DAP stop is inclusive, so we subtract one from the slice stop.

[6]:
# NBVAL_IGNORE_OUTPUT
xr.open_dataset(
    pr + "?pr[5040:1:6839][1:1:1][2:1:2],lat[1:1:1],lon[2:1:2],time[5040:1:6839]"
)
[6]:
<xarray.Dataset>
Dimensions:  (lat: 1, lon: 1, time: 1800)
Coordinates:
  * lat      (lat) float64 46.39
  * lon      (lon) float64 288.8
  * time     (time) object 2060-01-01 12:00:00 ... 2064-12-30 12:00:00
Data variables:
    pr       (time, lat, lon) float32 ...
Attributes: (12/17)
    comment:        Spinup: restart files from end of experiment 20C3M (corre...
    title:          MIUB  model output prepared for IPCC Fourth Assessment SR...
    cmor_version:   0.96
    institution:    MIUB (University of Bonn, Bonn, Germany)
    source:         ECHO-G(1999): atmosphere: ECHAM4 (T30L19) with partial se...
    contact:        Stephanie Legutke (legutke@dkrz.de), Seung-Ki Min(skmin@u...
    ...             ...
    calendar:       360_day
    project_id:     IPCC Fourth Assessment
    Conventions:    CF-1.0
    id:             pcmdi.ipcc4.miub_echo_g.sresa2.run1.atm.da
    history:        Mon Aug  1 11:42:37 2011: ncks -4 -L 7 -d lat,42.0,64.0 -...
    NCO:            4.0.9
[7]:
index
[7]:
{'lat': array(1), 'lon': array(2), 'time': slice(5040, 6840, None)}
[8]:
# NBVAL_IGNORE_OUTPUT
import numpy as np


def dap_slice(index):
    """Convert python index dictionary into DAP subset index dictionary."""
    dap = {}
    for key, val in index.items():
        if isinstance(val, slice):
            # DAP stops are inclusive, Python slice stops are exclusive.
            dap[key] = f"[{val.start or 0}:{val.step or 1}:{val.stop - 1}]"
        elif isinstance(val, int) or (isinstance(val, np.ndarray) and val.size == 1):
            dap[key] = f"[{val}:1:{val}]"
        else:
            raise ValueError(
                f"Index {val} can't be converted to a DAP subset index (for {key})."
            )
    return dap


def dap_subset(da, index):
    """Return DAP subset URL."""
    s = dap_slice(index)
    vs = [da] + list(da.coords.values())
    url = "?" + ",".join([x.name + "".join([s[dim] for dim in x.dims]) for x in vs])
    return url


sub = dap_subset(ds.pr, index)
print(sub)
?pr[5040:1:6839][1:1:1][2:1:2],lat[1:1:1],lon[2:1:2],time[5040:1:6839]
[9]:
# NBVAL_IGNORE_OUTPUT
xr.open_dataset(pr + sub)
[9]:
<xarray.Dataset>
Dimensions:  (lat: 1, lon: 1, time: 1800)
Coordinates:
  * lat      (lat) float64 46.39
  * lon      (lon) float64 288.8
  * time     (time) object 2060-01-01 12:00:00 ... 2064-12-30 12:00:00
Data variables:
    pr       (time, lat, lon) float32 ...
Attributes: (12/17)
    comment:        Spinup: restart files from end of experiment 20C3M (corre...
    title:          MIUB  model output prepared for IPCC Fourth Assessment SR...
    cmor_version:   0.96
    institution:    MIUB (University of Bonn, Bonn, Germany)
    source:         ECHO-G(1999): atmosphere: ECHAM4 (T30L19) with partial se...
    contact:        Stephanie Legutke (legutke@dkrz.de), Seung-Ki Min(skmin@u...
    ...             ...
    calendar:       360_day
    project_id:     IPCC Fourth Assessment
    Conventions:    CF-1.0
    id:             pcmdi.ipcc4.miub_echo_g.sresa2.run1.atm.da
    history:        Mon Aug  1 11:42:37 2011: ncks -4 -L 7 -d lat,42.0,64.0 -...
    NCO:            4.0.9

Using the subset URL in a WPS process

Now this subset URL can be used like a regular netCDF link in WPS processes. Here, let’s compute the average precipitation during wet days (sdii) over our subset. As expected, the output is only computed for the five years we requested, at the single grid point closest to the coordinates we chose.

[10]:
# NBVAL_IGNORE_OUTPUT
resp = wps.sdii(pr + sub)
out = resp.get(asobj=True)
out.output.sdii
[10]:
<xarray.DataArray 'sdii' (time: 5, lat: 1, lon: 1)>
[5 values with dtype=float64]
Coordinates:
  * lat      (lat) float64 46.39
  * lon      (lon) float64 288.8
  * time     (time) object 2060-01-01 00:00:00 ... 2064-01-01 00:00:00
Attributes:
    units:          mm d-1
    cell_methods:   time: mean (interval: 30 minutes)
    history:        pr=max(0,pr) applied to raw data;\n[2024-03-19 13:28:16] ...
    standard_name:  lwe_thickness_of_precipitation_amount
    long_name:      Average precipitation during days with daily precipitatio...
    description:    Annual simple daily intensity index (sdii) or annual aver...