# PAVICS catalog search

PAVICS offers a process called `pavicsearch` that searches a catalog for files matching user-defined criteria. The information stored for each file is scraped from the attributes of the netCDF file itself.

In [1]:
import collections

from birdy import WPSClient

url = "https://pavics.ouranos.ca/twitcher/ows/proxy/catalog/wps"
wps = WPSClient(url)
help(wps.pavicsearch)

Help on method pavicsearch in module birdy.client.base:

pavicsearch(facets=None, shards='*', offset=0, limit=0, fields='*', format='application/solr+json', query='*', distrib=False, type='Dataset', constraints=None, esgf=False, list_type='opendap_url', output_formats=None) method of birdy.client.base.WPSClient instance
 Search the PAVICS database and return a catalogue of matches.
 
 Parameters
 ----------
 facets : string
 Comma separated list of facets; facets are searchable indexing terms in the database.
 shards : string
 Shards to be queried
 offset : integer
 Where to start in the document count of the database search.
 limit : integer
 Maximum number of documents to return.
 fields : string
 Comme separated list of fields to return.
 format : string
 Output format.
 query : string
 Direct query to the database.
 distrib : boolean
 Distributed query
 type : string
 One of Dataset, File, Aggregate or FileAsAggregate.
 constraints : string
 Format is facet1:value1,facet2:value2,..

Potential search constraints are:
- project
- experiment
- model
- frequency
- variable
- variable_long_name
- units
- institute
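
As a minimal sketch, the `facet1:value1,facet2:value2` string expected by the `constraints` parameter can be assembled from keyword arguments. The helper name `build_constraints` is hypothetical and not part of birdy or PAVICS; the facet values are the ones used in this notebook.

```python
def build_constraints(**facets):
    """Join facet/value pairs into the 'facet1:value1,facet2:value2' form.

    Hypothetical helper for illustration; it does not validate facet names.
    """
    return ",".join(f"{name}:{value}" for name, value in facets.items())


constraints = build_constraints(
    variable="tasmax",
    project="CMIP5",
    experiment="rcp45",
    model="MPI-ESM-MR",
    institute="MPI-M",
    frequency="mon",
)
print(constraints)
```

The resulting string can be passed as `constraints=constraints` to `wps.pavicsearch`.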

Note that the *rip* label (realization, initialization, physics), e.g. r5i1p1, is not among the search facets, so it cannot be used as a constraint.
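
One possible workaround is to filter the returned links on the client side. The sketch below assumes that file names follow the CMIP5 convention, in which the rip label appears between underscores; the names `RIP_PATTERN`, `rip_label` and `filter_by_rip` are hypothetical, and the two sample URLs are taken from the search output in this notebook.

```python
import re

# Assumption: the rip label (e.g. r2i1p1) is delimited by underscores in the file name.
RIP_PATTERN = re.compile(r"_(r\d+i\d+p\d+)_")


def rip_label(url):
    """Return the rip label found in the file name of `url`, or None."""
    filename = url.rsplit("/", 1)[-1]
    match = RIP_PATTERN.search(filename)
    return match.group(1) if match else None


def filter_by_rip(urls, rip):
    """Keep only the links whose file name carries the requested rip label."""
    return [u for u in urls if rip_label(u) == rip]


sample_urls = [
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r2i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-210012.nc",
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc",
]
selected = filter_by_rip(sample_urls, "r2i1p1")
```

Files that do not follow this naming convention yield `None` from `rip_label` and are dropped by the filter.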

The process returns two outputs: a dictionary storing the search facets of each file found, and a plain list of links to those files.
Note that it is important to specify `type="File"`; otherwise the process looks for datasets, i.e. file aggregations. At the moment, very few aggregations are available on the PAVICS data server.
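
To illustrate how the two outputs relate, here is a small sketch that works on a hand-written stand-in for the output dictionary. The layout assumed here (documents under `result["response"]["docs"]`, each with `resourcename` and `opendap_url` fields) mirrors what the cells in this notebook show; `mock_result` is a hypothetical, heavily trimmed document set, and `docs_by_resourcename` and `opendap_links` are illustrative names.

```python
# Hypothetical, trimmed-down stand-in for the search output dictionary.
mock_result = {
    "response": {
        "docs": [
            {
                "resourcename": "birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc",
                "opendap_url": "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc",
                "experiment": "rcp45",
            },
        ]
    }
}


def docs_by_resourcename(result):
    """Index the returned documents by their `resourcename` field."""
    return {doc["resourcename"]: doc for doc in result["response"]["docs"]}


def opendap_links(result):
    """Collect the OPeNDAP link of every document that has one."""
    return [
        doc["opendap_url"]
        for doc in result["response"]["docs"]
        if "opendap_url" in doc
    ]


index = docs_by_resourcename(mock_result)
links = opendap_links(mock_result)
```

An index keyed on `resourcename` avoids scanning the whole document list each time a single file is looked up.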


In [2]:
# NBVAL_IGNORE_OUTPUT

resp = wps.pavicsearch(
    constraints="variable:tasmax,project:CMIP5,experiment:rcp45,model:MPI-ESM-MR,institute:MPI-M,frequency:mon",
    limit=100,
    type="File",
)
# Unpack the two outputs: the search result dictionary and the list of links.
result, files = resp.get(asobj=True)
files

['https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r2i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-210012.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200701-200712.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r3i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r3i1p1_200601-210012.nc',
 'https://pavics.ouranos.ca/twitcher/ows/proxy/thr

In [3]:
# NBVAL_IGNORE_OUTPUT

searchfile = [
    f
    for f in result["response"]["docs"]
    if f["resourcename"]
    == "birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc"
]
searchfile[0]

{'cf_standard_name': ['air_temperature'],
 'abstract': 'birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'replica': False,
 'wms_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/ncWMS2/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0&DATASET=outputs/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'keywords': ['air_temperature',
 'mon',
 'application/netcdf',
 'tasmax',
 'thredds',
 'CMIP5',
 'rcp45',
 'MPI-ESM-MR',
 'MPI-M'],
 'dataset_id': 'testdata.flyingpigeon.cmip5',
 'datetime_max': '2006-12-16T12:00:00Z',
 'id': '44b680cec0a7d4cc',
 'subject': 'Birdhouse Thredds Catalog',
 'category': 'thredds',
 'opendap_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'title': 'tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',
 'variable_palette': ['default'],
 'variable_min': [0],
 'variable_

In [4]:
for k in sorted(searchfile[0].keys()):
    # Skip attributes that change between servers for the same file.
    if k not in ["id", "last_modified", "_version_", "source"]:
        value = searchfile[0][k]
        # Sort list-like values so the printed output is deterministic.
        valuesorted = (
            sorted(value)
            if (
                isinstance(value, collections.abc.Iterable)
                and not isinstance(value, str)
            )
            else value
        )
        print(f"{k}: {valuesorted}")

abstract: birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
catalog_url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/birdhouse/testdata/flyingpigeon/cmip5/catalog.xml?dataset=birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
category: thredds
cf_standard_name: ['air_temperature']
content_type: application/netcdf
dataset_id: testdata.flyingpigeon.cmip5
datetime_max: 2006-12-16T12:00:00Z
datetime_min: 2006-01-16T12:00:00Z
experiment: rcp45
fileserver_url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
frequency: mon
institute: MPI-M
keywords: ['CMIP5', 'MPI-ESM-MR', 'MPI-M', 'air_temperature', 'application/netcdf', 'mon', 'rcp45', 'tasmax', 'thredds']
latest: True
model: MPI-ESM-MR
opendap_url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/