Analyzing the ClimEx Large Ensemble

ClimEx is a project investigating the influence of climate change on hydro-meteorological extremes. To understand the effect of natural variability on the occurrence of rare extreme events, ClimEx ran 50 independent simulations, greatly increasing the available sample size.

Analyzing simulation outputs from large ensembles can be tedious due to the large number of individual netCDF files to handle. To simplify data access, we’ve created an aggregated view of daily precipitation and temperature for all 50 ClimEx realizations from 1955 to 2100.

The first step is to open the dataset, whose path can be found in the climex catalog. Although there are currently 36 250 individual netCDF files, the catalog holds only a single link to the aggregated dataset.

import warnings

import numba

warnings.simplefilter("ignore", category=numba.core.errors.NumbaDeprecationWarning)

import logging

import intake
import xarray as xr
import xclim
from clisops.core.subset import subset_gridpoint
from dask.diagnostics import ProgressBar
from dask.distributed import Client, LocalCluster
from IPython.display import HTML, Markdown, clear_output
from xclim import ensembles as xens

# fmt: off
climex = "https://pavics.ouranos.ca/catalog/climex.json"  # TEST_USE_PROD_DATA
cat = intake.open_esm_datastore(climex)
# fmt: on

cat.df.head()
license_type         permissive non-commercial
title                The ClimEx CRCM5 Large Ensemble
institution          Ouranos Consortium on Regional Climatology and...
driving_model_id     CCCma-CanESM2
driving_experiment   historical, rcp85
type                 RCM
processing           raw
project_id           CLIMEX
frequency            day
modeling_realm       atmos
variable_name        ['tasmin', 'tasmax', 'tas', 'pr', 'prsn', 'rot...
variable_long_name   ['Daily Minimum Near-Surface Temperature minim...
path                 https://pavics.ouranos.ca/twitcher/ows/proxy/t...
# Opening the link takes a while, because the server creates an aggregated view of 435,000 individual files.
url = cat.df.path[0]
ds = xr.open_dataset(url, chunks=dict(realization=2, time=30 * 3))
ds
<xarray.Dataset> Size: 4TB
Dimensions:       (rlat: 280, rlon: 280, time: 52924, realization: 50)
Coordinates:
  * rlat          (rlat) float64 2kB -12.61 -12.51 -12.39 ... 17.85 17.96 18.07
  * rlon          (rlon) float64 2kB 2.695 2.805 2.915 ... 33.17 33.28 33.39
  * time          (time) object 423kB 1955-01-01 00:00:00 ... 2099-12-30 00:0...
  * realization   (realization) |S64 3kB b'historical-r1-r10i1p1' ... b'histo...
    lat           (rlat, rlon) float32 314kB dask.array<chunksize=(280, 280), meta=np.ndarray>
    lon           (rlat, rlon) float32 314kB dask.array<chunksize=(280, 280), meta=np.ndarray>
Data variables:
    rotated_pole  |S64 64B ...
    tasmin        (realization, time, rlat, rlon) float32 830GB dask.array<chunksize=(2, 90, 280, 280), meta=np.ndarray>
    tasmax        (realization, time, rlat, rlon) float32 830GB dask.array<chunksize=(2, 90, 280, 280), meta=np.ndarray>
    tas           (realization, time, rlat, rlon) float32 830GB dask.array<chunksize=(2, 90, 280, 280), meta=np.ndarray>
    pr            (realization, time, rlat, rlon) float32 830GB dask.array<chunksize=(2, 90, 280, 280), meta=np.ndarray>
    prsn          (realization, time, rlat, rlon) float32 830GB dask.array<chunksize=(2, 90, 280, 280), meta=np.ndarray>
Attributes: (12/30)
    Conventions:              CF-1.6 
    DODS.dimName:             string1
    DODS.strlen:              0
    EXTRA_DIMENSION.bnds:     2
    NCO:                      "4.5.2"
    abstract:                 The ClimEx CRCM5 Large Ensemble of high-resolut...
    ...                       ...
    project_id:               CLIMEX
    rcm_version_id:           v3331 
    terms_of_use:             http://www.climex-project.org/sites/default/fil...
    title:                    The ClimEx CRCM5 Large Ensemble
    type:                     RCM
    EXTRA_DIMENSION.string1:  1

The CLIMEX dataset currently stores five daily variables, simulated by the CRCM5 regional climate model driven by CanESM2 under the RCP8.5 representative concentration pathway:

  • pr: mean daily precipitation flux

  • prsn: mean daily snowfall flux

  • tas: mean daily temperature

  • tasmin: minimum daily temperature

  • tasmax: maximum daily temperature

These variables are stored along spatial dimensions (rotated latitude and longitude), time (1955-2100) and realization (50 members). The members were created by first running five CanESM2 members from 1850 to 1950, then applying small perturbations in 1950 to spawn 10 members from each original member, yielding a total of 50 global realizations. Each realization was then dynamically downscaled to 12 km resolution over Québec.

ds.realization[:5].data.tolist()
[b'historical-r1-r10i1p1',
 b'historical-r1-r1i1p1',
 b'historical-r1-r2i1p1',
 b'historical-r1-r3i1p1',
 b'historical-r1-r4i1p1']
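
To make the five-parent-by-ten-child structure of these names explicit, we can decode the byte-string labels and group them by their CanESM2 parent member. This is a minimal sketch that assumes the historical-r{parent}-r{child}i1p1 naming pattern seen above holds for all 50 members.

from collections import defaultdict

# Decode the byte-string labels, e.g. b'historical-r1-r10i1p1'
names = [r.decode() for r in ds.realization.values]

# Group members by their CanESM2 parent member (the middle "rN" token)
families = defaultdict(list)
for name in names:
    parent = name.split("-")[1]  # e.g. "r1" in "historical-r1-r10i1p1"
    families[parent].append(name)

for parent, members in sorted(families.items()):
    print(f"{parent}: {len(members)} members")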

Creating maps of ClimEx fields

The data is on a rotated pole grid, and the actual geographic latitudes and longitudes of the grid centroids are stored in the lat and lon coordinate variables. The data can be plotted directly against the native rlat and rlon coordinates.

# NBVAL_IGNORE_OUTPUT
from matplotlib import pyplot as plt

with ProgressBar():
    field = ds.tas.isel(time=0, realization=0).load()
    field.plot()
[########################################] | 100% Completed | 7.05 s

[figure: map of the near-surface temperature field on the native rlat/rlon grid]

However, the axes are defined with respect to the rotated coordinates. To create a map with correct geographic coordinates, we could use the real longitudes and latitudes (plt.pcolormesh(ds.lon, ds.lat, field)), but a better option is to create axes with the same projection used by the model. That is, we set the map projection to RotatedPole, using the coordinate reference system defined in the rotated_pole variable. We can use a similar approach to project the model output on another projection.

# NBVAL_IGNORE_OUTPUT
import cartopy.crs as ccrs

with ProgressBar():
    # The CRS for the rotated pole (data)
    rotp = ccrs.RotatedPole(
        pole_longitude=ds.rotated_pole.grid_north_pole_longitude,
        pole_latitude=ds.rotated_pole.grid_north_pole_latitude,
    )

    # The CRS for your map projection (can be anything)
    ortho = ccrs.Orthographic(central_longitude=-80, central_latitude=45)

    fig = plt.figure(figsize=(8, 4))

    # Plot data on the rotated pole projection directly
    ax = plt.subplot(1, 2, 1, projection=rotp)
    ax.coastlines()
    ax.gridlines()
    m = ax.pcolormesh(ds.rlon, ds.rlat, field)
    plt.colorbar(
        m, orientation="horizontal", label=field.long_name, fraction=0.046, pad=0.04
    )

    # Plot data on another projection, transforming it using the rotp crs.
    ax = plt.subplot(1, 2, 2, projection=ortho)
    ax.coastlines()
    ax.gridlines()
    m = ax.pcolormesh(
        ds.rlon, ds.rlat, field, transform=rotp
    )  # Note the transform parameter
    plt.colorbar(
        m, orientation="horizontal", label=field.long_name, fraction=0.046, pad=0.04
    )

    plt.show()
[figure: the same field plotted on the RotatedPole projection (left) and reprojected to an Orthographic projection (right)]

Computing climate indicators

Next we’ll compute a climate indicator as an example: the mean precipitation intensity for all 50 members for the year 2000, followed by ensemble statistics across realizations. The amount of data we’ll be requesting is, however, too large to go over the network in one request. To work around this, we first split the data into smaller dask.array portions by setting the chunks parameter when opening the dataset.

  • When selecting xarray chunk sizes, it is important to consider both the on-disk chunk sizes of the netCDF data and the type of calculation you are performing. For example, the ClimEx data variables' _ChunkSizes attributes indicate an on-disk chunk size of [365, 50, 50] for the time, rlat and rlon dimensions¹. Using multiples of these values is a reasonable way to ensure that in-memory dask chunks line up with the way the data is stored on disk (a quick way to inspect these attributes is sketched below).

  • The next step is to think about the calculation we wish to perform. Here we want to analyse a single year of data for all realizations over the entire spatial domain, so something like chunks = dict(realization=1, time=365, rlat=50 * 3, rlon=50 * 3) is a reasonable starting point. The general goal is to find a balance between chunk size and the number of chunks: too many small chunks create a lot of scheduling overhead and slow the calculation down, while very large chunks use a lot of system memory, causing other issues or data timeouts from the PAVICS THREDDS server. In this specific case, a large time chunk could also result in reading more data than necessary, since we are only interested in a single year.

  • Enabling chunking also makes further computations on these chunks lazy. The next cell will thus execute almost instantly, because no data is transferred and no computation is executed. Calculations are launched only when the data is actually needed: when plotting graphics, saving to file, or when requested explicitly by calling the compute, persist or load methods. Below we make use of a dask.distributed Client of worker processes to parallelize computations and improve performance.

¹ In this case the `realization` chunk size is not included in the netCDF file attributes, as this is a virtual dimension resulting from the NcML aggregation.
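
Before committing to a chunking scheme, you can inspect the on-disk chunk sizes advertised for each variable. This is a small sketch that reads the _ChunkSizes attribute mentioned above, falling back on xarray's encoding when the attribute is absent.

# Inspect the on-disk chunk sizes advertised for each gridded variable.
for name, var in ds.data_vars.items():
    if "time" not in var.dims:
        continue  # skip scalar variables like rotated_pole
    chunks = var.attrs.get("_ChunkSizes", var.encoding.get("chunksizes"))
    print(f"{name}: dims={var.dims}, on-disk chunks={chunks}")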

ds = xr.open_dataset(
    url, chunks=dict(realization=1, time=365, rlat=50 * 3, rlon=50 * 3)
)
xclim.set_options(check_missing="pct", missing_options={"pct": {"tolerance": 0.05}})
sdii = xclim.atmos.daily_pr_intensity(pr=ds.pr.sel(time="2000"))
sdii
<xarray.DataArray 'sdii' (realization: 50, time: 1, rlat: 280, rlon: 280)> Size: 31MB
dask.array<where, shape=(50, 1, 280, 280), dtype=float64, chunksize=(1, 1, 150, 150), chunktype=numpy.ndarray>
Coordinates:
  * rlat         (rlat) float64 2kB -12.61 -12.51 -12.39 ... 17.85 17.96 18.07
  * rlon         (rlon) float64 2kB 2.695 2.805 2.915 ... 33.17 33.28 33.39
  * realization  (realization) |S64 3kB b'historical-r1-r10i1p1' ... b'histor...
    lat          (rlat, rlon) float32 314kB dask.array<chunksize=(150, 150), meta=np.ndarray>
    lon          (rlat, rlon) float32 314kB dask.array<chunksize=(150, 150), meta=np.ndarray>
  * time         (time) object 8B 2000-01-01 00:00:00
Attributes:
    units:          mm d-1
    cell_methods:   time: mean
    history:        [2024-11-13 12:32:28] Data compressed with BitRound by ke...
    standard_name:  lwe_thickness_of_precipitation_amount
    long_name:      Average precipitation during days with daily precipitatio...
    description:    Annual simple daily intensity index (sdii) or annual aver...
# NBVAL_IGNORE_OUTPUT

# Create a local client with 6 worker processes.
dask_kwargs = dict(
    n_workers=6,
    threads_per_worker=6,
    memory_limit="4GB",
    local_directory="/notebook_dir/writable-workspace/tmp",
    silence_logs=logging.ERROR,
)

with Client(**dask_kwargs) as client:
    clear_output()
    display(
        HTML(
            f'<div class="alert alert-info"> Consult the client <a href="{client.dashboard_link}" target="_blank"><strong><em><u>Dashboard</u></em></strong></a> or the <strong><em>Dask jupyter sidebar</em></strong> to follow progress ... </div>'
        )
    )
    out = xens.ensemble_mean_std_max_min(sdii.to_dataset())
    out = out.load()

clear_output()
fig = plt.figure(figsize=(15, 5))
for ii, vv in enumerate([v for v in out.data_vars if "stdev" not in v]):
    ax = plt.subplot(1, 3, ii + 1, projection=rotp)

    ax.coastlines()
    ax.gridlines()
    m = ax.pcolormesh(
        out.rlon, out.rlat, out[vv].isel(time=0), vmin=0, vmax=30, cmap="YlGnBu"
    )
    c = plt.colorbar(
        m, orientation="horizontal", label=out[vv].long_name, fraction=0.046, pad=0.04
    )
    ax.set_title(f"Ensemble {vv.split('_')[-1]}")
    if ii != 1:
        c.set_label("")
[figure: maps of the ensemble mean, maximum and minimum of the mean daily precipitation intensity for the year 2000]

In the next example, we extract the grid point over Montréal in order to calculate the annual maximum 1-day precipitation for all years and ensemble members. In this case we would likely benefit from rethinking our chunk sizes.

  • Since we are only interested in a single grid cell, there is no real need to read relatively large spatial chunks into memory only to immediately discard them, so we reduce the chunk size in the rlon and rlat dimensions.

  • Similarly, since we are only looking at a single point, we can likely get away with loading a large number of time steps into a single chunk. A quick estimate of the resulting chunk footprint is sketched below.
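
As a sanity check on these choices, we can estimate the in-memory footprint of a single chunk before opening the dataset. This is a back-of-the-envelope sketch assuming 4-byte float32 values, matching the chunk sizes used in the next cell.

import numpy as np

# One chunk of a float32 variable with
# chunks = dict(realization=1, time=365 * 50, rlon=25, rlat=25)
chunk_shape = (1, 365 * 50, 25, 25)  # (realization, time, rlat, rlon)
nbytes = np.prod(chunk_shape) * np.dtype("float32").itemsize
print(f"~{nbytes / 1e6:.0f} MB per chunk")  # roughly 46 MB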

# Subset over the Montreal gridpoint
ds = xr.open_dataset(url, chunks=dict(realization=1, time=365 * 50, rlon=25, rlat=25))
pt = subset_gridpoint(ds, lon=-73.69, lat=45.50)
print("Input dataset for Montreal :")
display(pt)
out = xclim.atmos.max_1day_precipitation_amount(pr=pt.pr, freq="YS")
print("Maximim 1-day precipitation `lazy` output ..")
out
Input dataset for Montreal:
<xarray.Dataset> Size: 53MB
Dimensions:       (realization: 50, time: 52924)
Coordinates:
    rlat          float64 8B 0.365
    rlon          float64 8B 16.12
  * time          (time) object 423kB 1955-01-01 00:00:00 ... 2099-12-30 00:0...
  * realization   (realization) |S64 3kB b'historical-r1-r10i1p1' ... b'histo...
    lat           float32 4B 45.45
    lon           float32 4B -73.7
Data variables:
    rotated_pole  |S64 64B b''
    tasmin        (realization, time) float32 11MB dask.array<chunksize=(1, 18250), meta=np.ndarray>
    tasmax        (realization, time) float32 11MB dask.array<chunksize=(1, 18250), meta=np.ndarray>
    tas           (realization, time) float32 11MB dask.array<chunksize=(1, 18250), meta=np.ndarray>
    pr            (realization, time) float32 11MB dask.array<chunksize=(1, 18250), meta=np.ndarray>
    prsn          (realization, time) float32 11MB dask.array<chunksize=(1, 18250), meta=np.ndarray>
Attributes: (12/30)
    Conventions:              CF-1.6 
    DODS.dimName:             string1
    DODS.strlen:              0
    EXTRA_DIMENSION.bnds:     2
    NCO:                      "4.5.2"
    abstract:                 The ClimEx CRCM5 Large Ensemble of high-resolut...
    ...                       ...
    project_id:               CLIMEX
    rcm_version_id:           v3331 
    terms_of_use:             http://www.climex-project.org/sites/default/fil...
    title:                    The ClimEx CRCM5 Large Ensemble
    type:                     RCM
    EXTRA_DIMENSION.string1:  1
Maximum 1-day precipitation (lazy) output:
<xarray.DataArray 'rx1day' (realization: 50, time: 145)> Size: 29kB
dask.array<where, shape=(50, 145), dtype=float32, chunksize=(1, 50), chunktype=numpy.ndarray>
Coordinates:
    rlat         float64 8B 0.365
    rlon         float64 8B 16.12
  * realization  (realization) |S64 3kB b'historical-r1-r10i1p1' ... b'histor...
    lat          float32 4B 45.45
    lon          float32 4B -73.7
  * time         (time) object 1kB 1955-01-01 00:00:00 ... 2099-01-01 00:00:00
Attributes:
    units:          mm/day
    standard_name:  lwe_thickness_of_precipitation_amount
    cell_methods:   time: mean time: maximum over days
    history:        [2024-11-13 12:32:28] Data compressed with BitRound by ke...
    long_name:      Maximum 1-day total precipitation
    description:    Annual maximum 1-day total precipitation
# NBVAL_IGNORE_OUTPUT

# Compute the annual max 1-day precipitation
dask_kwargs = dict(
    n_workers=6,
    threads_per_worker=6,
    memory_limit="4GB",
    local_directory="/notebook_dir/writable-workspace/tmp",
    silence_logs=logging.ERROR,
)
with Client(**dask_kwargs) as client:
    # dashboard available at client.dashboard_link
    clear_output()
    display(
        HTML(
            f'<div class="alert alert-info"> Consult the client <a href="{client.dashboard_link}" target="_blank"><strong><em><u>Dashboard</u></em></strong></a> or the <strong><em>Dask jupyter sidebar</em></strong> to follow progress ... </div>'
        )
    )
    out = out.load()

clear_output()
out_ens = xens.ensemble_mean_std_max_min(out.to_dataset())
fig = plt.figure(figsize=(15, 7))
ax = plt.subplot(1, 1, 1)
h1 = out.plot.line(ax=ax, x="time", color="b", linestyle=":", alpha=0.075, label=None)
h2 = plt.fill_between(
    x=out_ens.time.values,
    y1=out_ens.rx1day_max,
    y2=out_ens.rx1day_min,
    color="b",
    alpha=0.3,
    label="ensemble min - max",
)
h3 = plt.plot(
    out_ens.time.values, out_ens.rx1day_mean, color="b", label="ensemble mean"
)
handles, labels = ax.get_legend_handles_labels()
leg = ax.legend(handles[::-1], labels[::-1])
tit = plt.title(
    f"ClimEx large ensemble \n Montreal : {out.description} {out_ens.time.dt.year[0].values} to {out_ens.time.dt.year[-1].values}"
)
[figure: time series of the annual maximum 1-day precipitation at Montréal for all 50 members, with the ensemble mean and the min-max envelope]
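
Beyond the mean, standard deviation, maximum and minimum, xclim's ensembles module can also compute percentiles across members. The sketch below reuses the rx1day output computed above; the rx1day_p50 variable name assumes xclim's default behaviour of splitting percentiles into suffixed variables.

# Compute the 10th, 50th and 90th percentiles across the 50 members.
pcts = xens.ensemble_percentiles(out.to_dataset(), values=[10, 50, 90])
pcts.rx1day_p50.plot(label="ensemble median")
plt.legend()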