PAVICS catalog search
To find files that meet given constraints, PAVICS offers a process called pavicsearch that searches a catalog for files matching user-defined criteria. The catalog information for each file is scraped from the attributes of the netCDF file itself.
Help on method pavicsearch in module birdy.client.base:
pavicsearch(facets=None, shards='*', offset=0, limit=0, fields='*', format='application/solr+json', query='*', distrib=False, type='Dataset', constraints=None, esgf=False, list_type='opendap_url', output_formats=None) method of birdy.client.base.WPSClient instance
Search the PAVICS database and return a catalogue of matches.
Parameters
----------
facets : string
Comma separated list of facets; facets are searchable indexing terms in the database.
shards : string
Shards to be queried
offset : integer
Where to start in the document count of the database search.
limit : integer
Maximum number of documents to return.
fields : string
Comma separated list of fields to return.
format : string
Output format.
query : string
Direct query to the database.
distrib : boolean
Distributed query
type : string
One of Dataset, File, Aggregate or FileAsAggregate.
constraints : string
Format is facet1:value1,facet2:value2,...
esgf : boolean
Whether to also search ESGF nodes.
list_type : string
Can be opendap_url, fileserver_url, gridftp_url, globus_url, wms_url
Returns
-------
search_result : ComplexData:mimetype:`application/json`, :mimetype:`application/gml+xml`
PAVICS Catalogue Search Result
list_result : ComplexData:mimetype:`application/json`
List of urls of the search result.
Potential search constraints are:
project
experiment
model
frequency
variable
variable_long_name
units
institute
Note that the rip label (realization, initialization, physics), e.g. r5i1p1, is missing from search facets.
The process returns an output dictionary storing the search facets of each file found, as well as a simple list of the links.
Note that it is important to specify type="File"; otherwise the process will look for datasets, i.e. file aggregations. At the moment, very few aggregations are available on the PAVICS data server.
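The constraints argument expects the facet1:value1,facet2:value2 format described in the help above. As a minimal sketch, a small helper can assemble that string from a dictionary. The name build_constraints is hypothetical (it is not part of birdy or PAVICS), and the sketch assumes that facet names and values contain no commas or colons.

```python
def build_constraints(facets):
    """Join a {facet: value} mapping into 'facet1:value1,facet2:value2'.

    Hypothetical helper for illustration; not part of birdy or PAVICS.
    """
    parts = []
    for facet, value in facets.items():
        text = str(value)
        # Commas and colons are the separators of the constraints format,
        # so this sketch refuses names and values that contain them.
        if "," in facet or ":" in facet or "," in text or ":" in text:
            raise ValueError(f"unsupported character in {facet!r}: {text!r}")
        parts.append(f"{facet}:{text}")
    return ",".join(parts)


constraints = build_constraints(
    {
        "variable": "tasmax",
        "project": "CMIP5",
        "experiment": "rcp45",
        "frequency": "mon",
    }
)
print(constraints)
```

The resulting string could then be passed as the constraints argument of the search call.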
# NBVAL_IGNORE_OUTPUT
resp = wps.pavicsearch(
    constraints="variable:tasmax,project:CMIP5,experiment:rcp45,model:MPI-ESM-MR,institute:MPI-M,frequency:mon",
    limit=100,
    type="File",
)
[result, files] = resp.get(asobj=True)
files
['https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r2i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-210012.nc',
'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc',
'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200701-200712.nc',
'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r3i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r3i1p1_200601-210012.nc',
'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc',
'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200701-200712.nc',
'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r1i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-210012.nc']
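Because the rip label is not a search facet, one possible workaround is to filter the returned URLs by file name after the search. The sketch below assumes the file names follow the CMIP5 convention seen in the list above, where the rip label appears as an underscore-delimited token such as _r2i1p1_. The names RIP_PATTERN, rip_label and filter_by_rip are hypothetical and not part of birdy or PAVICS.

```python
import re

# Matches an ensemble label such as r2i1p1 delimited by underscores.
# Assumption: file names follow the CMIP5 naming convention.
RIP_PATTERN = re.compile(r"_(r\d+i\d+p\d+)_")


def rip_label(url):
    """Return the rip label found in the file name of `url`, or None."""
    filename = url.rsplit("/", 1)[-1]
    match = RIP_PATTERN.search(filename)
    return match.group(1) if match else None


def filter_by_rip(urls, rip):
    """Keep only the URLs whose file name carries the requested rip label."""
    return [u for u in urls if rip_label(u) == rip]


# Two of the URLs returned by the search above.
sample = [
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r2i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-210012.nc",
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc",
]
print(filter_by_rip(sample, "r2i1p1"))
```

With the real search output, filter_by_rip(files, "r2i1p1") would keep only the members of that realization.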
# NBVAL_IGNORE_OUTPUT
searchfile = [
    f
    for f in result["response"]["docs"]
    if f["resourcename"]
    == "birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc"
]
searchfile[0]
{'cf_standard_name': ['air_temperature'],
'abstract': 'birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
'replica': False,
'wms_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/ncWMS2/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0&DATASET=outputs/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
'keywords': ['air_temperature',
'mon',
'application/netcdf',
'tasmax',
'thredds',
'CMIP5',
'rcp45',
'MPI-ESM-MR',
'MPI-M'],
'dataset_id': 'testdata.flyingpigeon.cmip5',
'datetime_max': '2006-12-16T12:00:00Z',
'id': '44b680cec0a7d4cc',
'subject': 'Birdhouse Thredds Catalog',
'category': 'thredds',
'opendap_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
'title': 'tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
'variable_palette': ['default'],
'variable_min': [0],
'variable_long_name': ['Daily Maximum Near-Surface Air Temperature'],
'source': 'https://pavics.ouranos.ca//twitcher/ows/proxy/thredds/catalog.xml',
'datetime_min': '2006-01-16T12:00:00Z',
'score': 1.0,
'variable_max': [1],
'units': ['K'],
'resourcename': 'birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
'type': 'File',
'catalog_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/birdhouse/testdata/flyingpigeon/cmip5/catalog.xml?dataset=birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
'experiment': 'rcp45',
'last_modified': '2018-12-21T15:13:38Z',
'content_type': 'application/netcdf',
'_version_': 1658705594373111809,
'variable': ['tasmax'],
'url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
'project': 'CMIP5',
'institute': 'MPI-M',
'frequency': 'mon',
'model': 'MPI-ESM-MR',
'latest': True,
'fileserver_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc'}
import collections.abc

for k in sorted(searchfile[0].keys()):
    # Skip attributes that differ between servers for the same file.
    if k not in ["id", "last_modified", "_version_", "source"]:
        value = searchfile[0][k]
        # Sort list-like values so the printed output is deterministic.
        valuesorted = (
            sorted(value)
            if (
                isinstance(value, collections.abc.Iterable)
                and not isinstance(value, str)
            )
            else value
        )
        print(f"{k}: {valuesorted}")
abstract: birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
catalog_url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/birdhouse/testdata/flyingpigeon/cmip5/catalog.xml?dataset=birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
category: thredds
cf_standard_name: ['air_temperature']
content_type: application/netcdf
dataset_id: testdata.flyingpigeon.cmip5
datetime_max: 2006-12-16T12:00:00Z
datetime_min: 2006-01-16T12:00:00Z
experiment: rcp45
fileserver_url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
frequency: mon
institute: MPI-M
keywords: ['CMIP5', 'MPI-ESM-MR', 'MPI-M', 'air_temperature', 'application/netcdf', 'mon', 'rcp45', 'tasmax', 'thredds']
latest: True
model: MPI-ESM-MR
opendap_url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
project: CMIP5
replica: False
resourcename: birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
score: 1.0
subject: Birdhouse Thredds Catalog
title: tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
type: File
units: ['K']
url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
variable: ['tasmax']
variable_long_name: ['Daily Maximum Near-Surface Air Temperature']
variable_max: [1]
variable_min: [0]
variable_palette: ['default']
wms_url: https://pavics.ouranos.ca/twitcher/ows/proxy/ncWMS2/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0&DATASET=outputs/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
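Each catalog entry carries many attributes, and often only a few are needed. As a rough sketch, the entries can be reduced to a handful of fields with a dictionary comprehension. The helper summarize_docs is hypothetical, and the docs list below is a trimmed copy of the entry printed above, standing in for result["response"]["docs"].

```python
def summarize_docs(docs, fields=("title", "datetime_min", "datetime_max", "opendap_url")):
    """Reduce each catalog entry to the requested fields.

    Hypothetical helper; fields missing from an entry are returned as None.
    """
    return [{name: doc.get(name) for name in fields} for doc in docs]


# Trimmed copy of the entry shown above, used here in place of a live search.
docs = [
    {
        "title": "tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc",
        "datetime_min": "2006-01-16T12:00:00Z",
        "datetime_max": "2006-12-16T12:00:00Z",
        "units": ["K"],
        "type": "File",
    }
]

for row in summarize_docs(docs, fields=("title", "datetime_min", "datetime_max")):
    print(row)
```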