PAVICS 0.1 documentation

PAVICS catalog search

To find files that meet given constraints, PAVICS offers a process called pavicsearch that searches a catalog for files matching user-defined criteria. The information for each file is scraped from the attributes of the netCDF file.

import collections.abc

from birdy import WPSClient

url = "https://pavics.ouranos.ca/twitcher/ows/proxy/catalog/wps"
wps = WPSClient(url)
help(wps.pavicsearch)
Help on method pavicsearch in module birdy.client.base:

pavicsearch(facets=None, shards='*', offset=0, limit=0, fields='*', format='application/solr+json', query='*', distrib=False, type='Dataset', constraints=None, esgf=False, list_type='opendap_url', output_formats=None) method of birdy.client.base.WPSClient instance
    Search the PAVICS database and return a catalogue of matches.
    
    Parameters
    ----------
    facets : string
        Comma separated list of facets; facets are searchable indexing terms in the database.
    shards : string
        Shards to be queried
    offset : integer
        Where to start in the document count of the database search.
    limit : integer
        Maximum number of documents to return.
    fields : string
        Comme separated list of fields to return.
    format : string
        Output format.
    query : string
        Direct query to the database.
    distrib : boolean
        Distributed query
    type : string
        One of Dataset, File, Aggregate or FileAsAggregate.
    constraints : string
        Format is facet1:value1,facet2:value2,...
    esgf : boolean
        Whether to also search ESGF nodes.
    list_type : string
        Can be opendap_url, fileserver_url, gridftp_url, globus_url, wms_url
    
    Returns
    -------
    search_result : ComplexData:mimetype:`application/json`, :mimetype:`application/gml+xml`
        PAVICS Catalogue Search Result
    list_result : ComplexData:mimetype:`application/json`
        List of urls of the search result.

Potential search constraints are:

  • project

  • experiment

  • model

  • frequency

  • variable

  • variable_long_name

  • units

  • institute

Note that the rip label (realization, initialization, physics), e.g. r5i1p1, is missing from search facets.

The process returns an output dictionary storing the search facets of each file found, as well as a simple list of the links. It is important to specify type="File"; otherwise the process looks for datasets, i.e. file aggregations. At the moment, very few aggregations are available on the PAVICS data server.
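The constraints argument is a single string in the facet1:value1,facet2:value2 format described in the help text. As a minimal sketch, that string could be assembled from a dictionary. The helper build_constraints below is hypothetical and not part of birdy or PAVICS; it only assumes that commas and colons act as separators and therefore should not appear inside facet names or inside values.

```python
def build_constraints(facets):
    """Join a {facet: value} mapping into 'facet1:value1,facet2:value2'.

    Hypothetical helper for illustration; not part of birdy or PAVICS.
    """
    for name, value in facets.items():
        # "," separates constraints and ":" separates a facet from its value,
        # so reject names or values that would make the string ambiguous.
        if "," in name or ":" in name or "," in str(value):
            raise ValueError(f"Cannot encode constraint {name!r}: {value!r}")
    # Dictionaries keep insertion order, so the output order is predictable.
    return ",".join(f"{name}:{value}" for name, value in facets.items())


constraints = build_constraints(
    {
        "variable": "tasmax",
        "project": "CMIP5",
        "experiment": "rcp45",
        "model": "MPI-ESM-MR",
        "institute": "MPI-M",
        "frequency": "mon",
    }
)
print(constraints)
```

The resulting string can be passed as the constraints argument of wps.pavicsearch.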

# NBVAL_IGNORE_OUTPUT

resp = wps.pavicsearch(
    constraints="variable:tasmax,project:CMIP5,experiment:rcp45,model:MPI-ESM-MR,institute:MPI-M,frequency:mon",
    limit=100,
    type="File",
)
[result, files] = resp.get(asobj=True)
files
['https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r2i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-210012.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200701-200712.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r3i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r3i1p1_200601-210012.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200701-200712.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r1i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-210012.nc']
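Since the rip label is not a search facet, one possible workaround is to filter the returned links on the client side. The sketch below assumes the label appears in the file name between underscores, as it does in the CMIP5 file names listed above. The names rip_label and filter_by_rip are hypothetical helpers, and sample_files is a short excerpt of the list above rather than a live query result.

```python
import re

# Matches an ensemble member label such as "r2i1p1" delimited by underscores.
RIP_PATTERN = re.compile(r"_(r\d+i\d+p\d+)_")


def rip_label(url):
    """Return the rip label found in the file name of `url`, or None."""
    filename = url.rsplit("/", 1)[-1]
    match = RIP_PATTERN.search(filename)
    return match.group(1) if match else None


def filter_by_rip(urls, rip):
    """Keep only the URLs whose file name carries the requested rip label."""
    return [url for url in urls if rip_label(url) == rip]


# Excerpt of the links returned by the search above.
base = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse"
sample_files = [
    f"{base}/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r2i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-210012.nc",
    f"{base}/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc",
    f"{base}/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc",
]

selected = filter_by_rip(sample_files, "r2i1p1")
for url in selected:
    print(url)
```

With a live result, the same call would be filter_by_rip(files, "r2i1p1").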
# NBVAL_IGNORE_OUTPUT

searchfile = [
    f
    for f in result["response"]["docs"]
    if f["resourcename"]
    == "birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc"
]
searchfile[0]
{'cf_standard_name': ['air_temperature'],
 'abstract': 'birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'replica': False,
 'wms_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/ncWMS2/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0&DATASET=outputs/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'keywords': ['air_temperature',
  'mon',
  'application/netcdf',
  'tasmax',
  'thredds',
  'CMIP5',
  'rcp45',
  'MPI-ESM-MR',
  'MPI-M'],
 'dataset_id': 'testdata.flyingpigeon.cmip5',
 'datetime_max': '2006-12-16T12:00:00Z',
 'id': '44b680cec0a7d4cc',
 'subject': 'Birdhouse Thredds Catalog',
 'category': 'thredds',
 'opendap_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'title': 'tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'variable_palette': ['default'],
 'variable_min': [0],
 'variable_long_name': ['Daily Maximum Near-Surface Air Temperature'],
 'source': 'https://pavics.ouranos.ca//twitcher/ows/proxy/thredds/catalog.xml',
 'datetime_min': '2006-01-16T12:00:00Z',
 'score': 1.0,
 'variable_max': [1],
 'units': ['K'],
 'resourcename': 'birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'type': 'File',
 'catalog_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/birdhouse/testdata/flyingpigeon/cmip5/catalog.xml?dataset=birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'experiment': 'rcp45',
 'last_modified': '2018-12-21T15:13:38Z',
 'content_type': 'application/netcdf',
 '_version_': 1658705594373111809,
 'variable': ['tasmax'],
 'url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'project': 'CMIP5',
 'institute': 'MPI-M',
 'frequency': 'mon',
 'model': 'MPI-ESM-MR',
 'latest': True,
 'fileserver_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc'}
for k in sorted(searchfile[0].keys()):
    # skip attributes that change between servers for the same file
    if k not in ["id", "last_modified", "_version_", "source"]:
        value = searchfile[0][k]
        valuesorted = (
            sorted(value)
            if (
                isinstance(value, collections.abc.Iterable)
                and not isinstance(value, str)
            )
            else value
        )
        print(f"{k}: {valuesorted}")
abstract: birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
catalog_url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/birdhouse/testdata/flyingpigeon/cmip5/catalog.xml?dataset=birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
category: thredds
cf_standard_name: ['air_temperature']
content_type: application/netcdf
dataset_id: testdata.flyingpigeon.cmip5
datetime_max: 2006-12-16T12:00:00Z
datetime_min: 2006-01-16T12:00:00Z
experiment: rcp45
fileserver_url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
frequency: mon
institute: MPI-M
keywords: ['CMIP5', 'MPI-ESM-MR', 'MPI-M', 'air_temperature', 'application/netcdf', 'mon', 'rcp45', 'tasmax', 'thredds']
latest: True
model: MPI-ESM-MR
opendap_url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
project: CMIP5
replica: False
resourcename: birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
score: 1.0
subject: Birdhouse Thredds Catalog
title: tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
type: File
units: ['K']
url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
variable: ['tasmax']
variable_long_name: ['Daily Maximum Near-Surface Air Temperature']
variable_max: [1]
variable_min: [0]
variable_palette: ['default']
wms_url: https://pavics.ouranos.ca/twitcher/ows/proxy/ncWMS2/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0&DATASET=outputs/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
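Each record carries several access links (opendap_url, fileserver_url, wms_url), matching the names accepted by the list_type parameter. As a rough sketch, a list of links of a given kind could also be rebuilt from the records themselves. The helper urls_from_docs is hypothetical, and sample_docs is a trimmed copy of the record shown above plus an invented record without links, used only to show that missing fields are skipped.

```python
def urls_from_docs(docs, url_field="opendap_url"):
    """Return the value of `url_field` for every record that defines it."""
    return [doc[url_field] for doc in docs if url_field in doc]


proxy = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds"
path = "birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc"

sample_docs = [
    {
        # Trimmed copy of the record printed above.
        "title": "tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc",
        "opendap_url": f"{proxy}/dodsC/{path}",
        "fileserver_url": f"{proxy}/fileServer/{path}",
    },
    {
        # Invented record with no access links, for illustration only.
        "title": "record-without-links",
    },
]

opendap_links = urls_from_docs(sample_docs)
download_links = urls_from_docs(sample_docs, url_field="fileserver_url")
print(opendap_links)
print(download_links)
```

With a live result, the records would come from result["response"]["docs"].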
Copyright © 2018-2026, Ouranos & CRIM