Working with the PAVICS THREDDS server

PAVICS data access

The PAVICS THREDDS server, which stores NetCDF files, has both public and private directories. Data in public directories can be accessed anonymously, while data in private directories requires authentication. This notebook shows how to access both public and private data on the THREDDS server.

The PAVICS THREDDS server has a testdata/ folder, in which we store test datasets to validate process requests. Within that directory is a secure/ folder whose file access requires authentication.

# Define some useful variables for following steps
import os

PAVICS_HOST = os.getenv("PAVICS_HOST", "pavics.ouranos.ca")
if PAVICS_HOST == "":
    raise ValueError("Invalid PAVICS HOST value.")

THREDDS_URL = f"https://{PAVICS_HOST}/twitcher/ows/proxy/thredds"
print("THREDDS URL:", THREDDS_URL)
THREDDS URL: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds
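
The URLs used in the rest of this notebook all follow the same pattern: the THREDDS base URL, a service name, and the dataset path. A minimal sketch of that composition is shown here. The helper name `thredds_service_url` is hypothetical and not part of any PAVICS library; only the `dodsC` (OPeNDAP) service is taken from this notebook, and any other service name would be an assumption to check against the server's catalog.

```python
def thredds_service_url(base_url: str, service: str, dataset_path: str) -> str:
    """Join a THREDDS base URL, a service name and a dataset path.

    Hypothetical helper for illustration: it only normalizes slashes and
    does not verify that the service or the dataset exists on the server.
    """
    return "/".join(
        [base_url.rstrip("/"), service.strip("/"), dataset_path.lstrip("/")]
    )


# Values copied from this notebook.
THREDDS_URL = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds"
DATASET = "birdhouse/testdata/ta_Amon_MRI-CGCM3_decadal1980_r1i1p1_199101-200012.nc"

# "dodsC" is the OPeNDAP service segment used by the URLs in this notebook.
opendap_url = thredds_service_url(THREDDS_URL, "dodsC", DATASET)
print(opendap_url)
```

Stripping slashes on each part keeps the result stable whether or not the caller includes leading or trailing separators.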

First let’s just open an unsecured link.

# NBVAL_IGNORE_OUTPUT

import xarray as xr

PUBLIC_URL = f"{THREDDS_URL}/dodsC/birdhouse/testdata/ta_Amon_MRI-CGCM3_decadal1980_r1i1p1_199101-200012.nc"
ds = xr.open_dataset(PUBLIC_URL)
ds
<xarray.Dataset> Size: 565MB
Dimensions:    (time: 120, bnds: 2, lat: 160, lon: 320, plev: 23)
Coordinates:
  * time       (time) datetime64[ns] 960B 1991-01-16T12:00:00 ... 2000-12-16T...
  * lat        (lat) float64 1kB -89.14 -88.03 -86.91 ... 86.91 88.03 89.14
  * lon        (lon) float64 3kB 0.0 1.125 2.25 3.375 ... 356.6 357.8 358.9
  * plev       (plev) float64 184B 1e+05 9.25e+04 8.5e+04 ... 200.0 100.0 40.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 2kB ...
    lat_bnds   (lat, bnds) float64 3kB ...
    lon_bnds   (lon, bnds) float64 5kB ...
    ta         (time, plev, lat, lon) float32 565MB ...
Attributes: (12/28)
    institution:                     MRI (Meteorological Research Institute, ...
    institute_id:                    MRI
    experiment_id:                   decadal1980
    source:                          MRI-CGCM3 2011 atmosphere: GSMUV (gsmuv-...
    model_id:                        MRI-CGCM3
    forcing:                         GHG, SA, Oz, LU, Sl, Vl, BC, OC (GHG inc...
    ...                              ...
    title:                           MRI-CGCM3 model output prepared for CMIP...
    parent_experiment:               N/A
    modeling_realm:                  atmos
    realization:                     1
    cmor_version:                    2.7.1
    DODS_EXTRA.Unlimited_Dimension:  time

Now let’s do the same with a secured link.

# NBVAL_IGNORE_OUTPUT

from webob.exc import HTTPError

SECURED_URL = f"{THREDDS_URL}/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc"
try:
    ds = xr.open_dataset(SECURED_URL, decode_cf=False)
# depending on 'xarray' version, different errors are raised when failing authentication according to how they handle it
except OSError as exc:
    # "NetCDF: Access failure" xarray >= 0.20
    # "Authorization failure" xarray < 0.17
    assert "NetCDF: Access failure" in str(exc) or "Authorization failure" in str(exc)
except HTTPError as exc:  # xarray >= 0.17
    # note: raised error is 500 with 'message' Unauthorized instead of directly raising HTTPUnauthorized
    assert "401 Unauthorized" in str(exc)
else:
    raise RuntimeError(
        "Expected unauthorized response, but dataset open operation did not raise!"
    )

print("Unauthorized was raised as expected.")
Unauthorized was raised as expected.
syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <?xml^ version="1.0" encoding="utf-8"?><ExceptionReport version="1.0.0" xmlns="http://www.opengis.net/ows/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/ows/1.1 http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd"> <Exception exceptionCode="NoApplicableCode" locator="AccessForbidden"> <ExceptionText>Access to service is forbidden.</ExceptionText> </Exception></ExceptionReport>
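
The "syntax error" message above appears to come from the DAP client trying to parse an XML error document as if it were a DAP response. The XML itself is an OWS `ExceptionReport`, and it carries the real reason for the failure. The sketch below extracts that reason with the standard library. The XML string is copied from the output above (without the `^` parser marker), and the function name `parse_ows_exception` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace declared by the ExceptionReport shown in the output above.
OWS_NS = {"ows": "http://www.opengis.net/ows/1.1"}

REPORT = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<ExceptionReport version="1.0.0" xmlns="http://www.opengis.net/ows/1.1" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.opengis.net/ows/1.1 '
    'http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd"> '
    '<Exception exceptionCode="NoApplicableCode" locator="AccessForbidden"> '
    "<ExceptionText>Access to service is forbidden.</ExceptionText> "
    "</Exception></ExceptionReport>"
)


def parse_ows_exception(xml_text: str) -> dict:
    """Return the code, locator and text of the first Exception element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    exc = root.find("ows:Exception", OWS_NS)
    if exc is None:
        raise ValueError("No ows:Exception element found in the report.")
    text = exc.findtext("ows:ExceptionText", default="", namespaces=OWS_NS)
    return {
        "code": exc.get("exceptionCode"),
        "locator": exc.get("locator"),
        "text": text.strip(),
    }


report = parse_ows_exception(REPORT)
print(report["locator"], "-", report["text"])
```

Here the `locator` attribute (`AccessForbidden`) and the exception text are what indicate that access was refused, rather than the parser error printed before them.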

To open a secured link, we need to open a session with authentication. Invalid credentials raise an error immediately, because the login procedure fails. Valid credentials for a user without sufficient permissions pass the login step, but the server returns a forbidden response when the data is requested. Either way, the user must be logged in and have the appropriate permissions to satisfy the authorization requirements of the resource.
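
The distinction between failing authentication and failing authorization can be sketched with standard HTTP status codes. This is only an illustration: the function name `describe_access_failure` is hypothetical, and the exact status codes returned by the PAVICS proxy in each situation are an assumption rather than something this notebook verifies.

```python
from http import HTTPStatus


def describe_access_failure(status_code: int) -> str:
    """Map an HTTP status code to the access situation it usually signals."""
    if status_code == HTTPStatus.UNAUTHORIZED:  # 401: identity not established
        return "authentication required or failed"
    if status_code == HTTPStatus.FORBIDDEN:  # 403: identity known, access denied
        return "insufficient permissions for this resource"
    if 200 <= status_code < 300:
        return "access granted"
    return "other error"


for code in (200, 401, 403):
    print(code, "->", describe_access_failure(code))
```

In both failure cases the remedy differs: a 401-style failure calls for valid credentials, while a 403-style failure calls for additional permissions on the resource.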

Let’s see the result when credentials are invalid.

# NBVAL_IGNORE_OUTPUT

import requests
from requests_magpie import MagpieAuth, MagpieAuthenticationError

BAD_USR = "an-invalid-user"
BAD_PWD = "or-bad-password"

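try:
    with requests.Session() as session:
        session.auth = MagpieAuth(f"https://{PAVICS_HOST}/magpie", BAD_USR, BAD_PWD)
        # Pass the session to the PyDAP store; otherwise the request is made anonymously
        store = xr.backends.PydapDataStore.open(SECURED_URL, session=session)
        xr.open_dataset(
            store, decode_cf=False
        )  # Attributes are problematic with this file.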
# specific error depends on what raises (unauthorized, forbidden, login failure) and 'xarray' version
except (OSError, HTTPError, MagpieAuthenticationError) as exc:
    print("Access with invalid credentials was not permitted as expected.")
else:
    raise RuntimeError(
        "Expected authentication failure response, but login operation did not raise!"
    )
Access with invalid credentials was not permitted as expected.
syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <?xml^ version="1.0" encoding="utf-8"?><ExceptionReport version="1.0.0" xmlns="http://www.opengis.net/ows/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/ows/1.1 http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd"> <Exception exceptionCode="NoApplicableCode" locator="AccessForbidden"> <ExceptionText>Access to service is forbidden.</ExceptionText> </Exception></ExceptionReport>

As we can see, the server identified that credentials were provided, but they were incorrect, so the login failed. A similar result would occur if the login succeeded but the user was denied access due to insufficient permissions.

We’ve created an authtest user in advance that has access to the secure contents to facilitate testing.

Let’s use it now to obtain the secured resource.

# NBVAL_IGNORE_OUTPUT

AUTH_USR = os.getenv("TEST_MAGPIE_AUTHTEST_USERNAME", "authtest")
AUTH_PWD = os.getenv("TEST_MAGPIE_AUTHTEST_PASSWORD", "authtest1234")

# Open session
with requests.Session() as session:
    session.auth = MagpieAuth(f"https://{PAVICS_HOST}/magpie", AUTH_USR, AUTH_PWD)
    # Open a PyDAP data store and pass it to xarray
    store = xr.backends.PydapDataStore.open(SECURED_URL, session=session)
    ds = xr.open_dataset(
        store, decode_cf=False
    )  # Attributes are problematic with this file.
ds
/opt/conda/envs/birdy/lib/python3.12/site-packages/pydap/handlers/dap.py:142: UserWarning: PyDAP was unable to determine the DAP protocol defaulting to DAP2. DAP2 is consider legacy and may result in slower responses. 
Consider replacing `http` in your `url` with either `dap2` or `dap4` to specify the DAP protocol (e.g. `dap2://<data_url>` or `dap4://<data_url>`).  For more 
information, go to https://www.opendap.org/faq-page.
  warnings.warn(
<xarray.Dataset> Size: 892kB
Dimensions:             (time: 12, bnds: 2, lat: 96, lon: 192)
Coordinates:
  * time                (time) float64 96B 5.699e+04 5.702e+04 ... 5.733e+04
  * lat                 (lat) float64 768B -88.57 -86.72 -84.86 ... 86.72 88.57
  * lon                 (lon) float64 2kB 0.0 1.875 3.75 ... 354.4 356.2 358.1
Dimensions without coordinates: bnds
Data variables:
    time_bnds           (time, bnds) float64 192B ...
    lat_bnds            (lat, bnds) float64 2kB ...
    lon_bnds            (lon, bnds) float64 3kB ...
    latitude_longitude  |S128 128B ...
    tasmax              (time, lat, lon) float32 885kB ...
Attributes: (12/28)
    institution:            Max Planck Institute for Meteorology
    institute_id:           MPI-M
    experiment_id:          rcp45
    source:                 MPI-ESM-MR 2011; URL: http://svn.zmaw.de/svn/cosm...
    model_id:               MPI-ESM-MR
    forcing:                GHG,Oz,SD,Sl,Vl,LU
    ...                     ...
    title:                  MPI-ESM-MR model output prepared for CMIP5 RCP4.5
    parent_experiment:      historical
    modeling_realm:         atmos
    realization:            1
    cmor_version:           2.6.0
    Unlimited_Dimension:    time

Successfully displaying the dataset above means the user was granted access to this resource.