{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PAVICS catalog search\n", "\n", "To find files that meet given constraints, PAVICS offers a process called `pavicsearch` that searches a catalog for files matching user-defined criteria. The metadata for each file is scraped from the attributes of the netCDF file itself." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on method pavicsearch in module birdy.client.base:\n", "\n", "pavicsearch(facets=None, shards='*', offset=0, limit=0, fields='*', format='application/solr+json', query='*', distrib=False, type='Dataset', constraints=None, esgf=False, list_type='opendap_url', output_formats=None) method of birdy.client.base.WPSClient instance\n", " Search the PAVICS database and return a catalogue of matches.\n", " \n", " Parameters\n", " ----------\n", " facets : string\n", " Comma separated list of facets; facets are searchable indexing terms in the database.\n", " shards : string\n", " Shards to be queried\n", " offset : integer\n", " Where to start in the document count of the database search.\n", " limit : integer\n", " Maximum number of documents to return.\n", " fields : string\n", " Comme separated list of fields to return.\n", " format : string\n", " Output format.\n", " query : string\n", " Direct query to the database.\n", " distrib : boolean\n", " Distributed query\n", " type : string\n", " One of Dataset, File, Aggregate or FileAsAggregate.\n", " constraints : string\n", " Format is facet1:value1,facet2:value2,...\n", " esgf : boolean\n", " Whether to also search ESGF nodes.\n", " list_type : string\n", " Can be opendap_url, fileserver_url, gridftp_url, globus_url, wms_url\n", " \n", " Returns\n", " -------\n", " search_result : ComplexData:mimetype:`application/json`, :mimetype:`application/gml+xml`\n", " PAVICS Catalogue Search Result\n", " list_result : 
ComplexData:mimetype:`application/json`\n", " List of urls of the search result.\n", "\n" ] } ], "source": [ "import collections.abc\n", "\n", "from birdy import WPSClient\n", "\n", "url = \"https://pavics.ouranos.ca/twitcher/ows/proxy/catalog/wps\"\n", "wps = WPSClient(url)\n", "help(wps.pavicsearch)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Possible search constraints are:\n", "- project\n", "- experiment\n", "- model\n", "- frequency\n", "- variable\n", "- variable_long_name\n", "- units\n", "- institute\n", "\n", "Note that the *rip* label (realization, initialization, physics), e.g. r5i1p1, is not among the search facets.\n", "\n", "The process returns two outputs: a dictionary storing the search facets of each file found, and a plain list of links to those files.\n", "Note that it is important to specify `type=\"File\"`; otherwise the process searches for datasets, i.e. file aggregations. At the moment, very few aggregations are available on the PAVICS data server.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r2i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-210012.nc',\n", " 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc',\n", " 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200701-200712.nc',\n", " 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',\n", " 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',\n", 
'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r3i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r3i1p1_200601-210012.nc',\n", " 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc',\n", " 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/secure/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200701-200712.nc',\n", " 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-MR/rcp45/mon/atmos/r1i1p1/tasmax/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-210012.nc']" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NBVAL_IGNORE_OUTPUT\n", "\n", "resp = wps.pavicsearch(\n", " constraints=\"variable:tasmax,project:CMIP5,experiment:rcp45,model:MPI-ESM-MR,institute:MPI-M,frequency:mon\",\n", " limit=100,\n", " type=\"File\",\n", ")\n", "[result, files] = resp.get(asobj=True)\n", "files" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'cf_standard_name': ['air_temperature'],\n", " 'abstract': 'birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',\n", " 'replica': False,\n", " 'wms_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/ncWMS2/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0&DATASET=outputs/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',\n", " 'keywords': ['air_temperature',\n", " 'mon',\n", " 'application/netcdf',\n", " 'tasmax',\n", " 'thredds',\n", " 'CMIP5',\n", " 'rcp45',\n", " 'MPI-ESM-MR',\n", " 'MPI-M'],\n", " 'dataset_id': 'testdata.flyingpigeon.cmip5',\n", " 'datetime_max': '2006-12-16T12:00:00Z',\n", " 'id': '44b680cec0a7d4cc',\n", " 'subject': 'Birdhouse Thredds Catalog',\n", " 'category': 'thredds',\n", " 'opendap_url': 
'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',\n", " 'title': 'tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',\n", " 'variable_palette': ['default'],\n", " 'variable_min': [0],\n", " 'variable_long_name': ['Daily Maximum Near-Surface Air Temperature'],\n", " 'source': 'https://pavics.ouranos.ca//twitcher/ows/proxy/thredds/catalog.xml',\n", " 'datetime_min': '2006-01-16T12:00:00Z',\n", " 'score': 1.0,\n", " 'variable_max': [1],\n", " 'units': ['K'],\n", " 'resourcename': 'birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',\n", " 'type': 'File',\n", " 'catalog_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/birdhouse/testdata/flyingpigeon/cmip5/catalog.xml?dataset=birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',\n", " 'experiment': 'rcp45',\n", " 'last_modified': '2018-12-21T15:13:38Z',\n", " 'content_type': 'application/netcdf',\n", " '_version_': 1658705594373111809,\n", " 'variable': ['tasmax'],\n", " 'url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc',\n", " 'project': 'CMIP5',\n", " 'institute': 'MPI-M',\n", " 'frequency': 'mon',\n", " 'model': 'MPI-ESM-MR',\n", " 'latest': True,\n", " 'fileserver_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc'}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NBVAL_IGNORE_OUTPUT\n", "\n", "searchfile = [\n", " f\n", " for f in result[\"response\"][\"docs\"]\n", " if f[\"resourcename\"]\n", " == \"birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc\"\n", "]\n", "searchfile[0]" ] }, { "cell_type": "code", 
"execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "abstract: birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc\n", "catalog_url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/birdhouse/testdata/flyingpigeon/cmip5/catalog.xml?dataset=birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc\n", "category: thredds\n", "cf_standard_name: ['air_temperature']\n", "content_type: application/netcdf\n", "dataset_id: testdata.flyingpigeon.cmip5\n", "datetime_max: 2006-12-16T12:00:00Z\n", "datetime_min: 2006-01-16T12:00:00Z\n", "experiment: rcp45\n", "fileserver_url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc\n", "frequency: mon\n", "institute: MPI-M\n", "keywords: ['CMIP5', 'MPI-ESM-MR', 'MPI-M', 'air_temperature', 'application/netcdf', 'mon', 'rcp45', 'tasmax', 'thredds']\n", "latest: True\n", "model: MPI-ESM-MR\n", "opendap_url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc\n", "project: CMIP5\n", "replica: False\n", "resourcename: birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc\n", "score: 1.0\n", "subject: Birdhouse Thredds Catalog\n", "title: tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc\n", "type: File\n", "units: ['K']\n", "url: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc\n", "variable: ['tasmax']\n", "variable_long_name: ['Daily Maximum Near-Surface Air Temperature']\n", "variable_max: [1]\n", "variable_min: [0]\n", "variable_palette: ['default']\n", "wms_url: 
https://pavics.ouranos.ca/twitcher/ows/proxy/ncWMS2/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0&DATASET=outputs/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc\n" ] } ], "source": [ "for k in sorted(searchfile[0].keys()):\n", "    # Skip attributes whose values differ between servers for the same file.\n", "    if k not in [\"id\", \"last_modified\", \"_version_\", \"source\"]:\n", "        value = searchfile[0][k]\n", "        # Sort list-like values (but not strings) so the output order is stable.\n", "        valuesorted = (\n", "            sorted(value)\n", "            if (\n", "                isinstance(value, collections.abc.Iterable)\n", "                and not isinstance(value, str)\n", "            )\n", "            else value\n", "        )\n", "        print(f\"{k}: {valuesorted}\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.4" } }, "nbformat": 4, "nbformat_minor": 4 }