Earth System Grid Federation Data Access
The Earth System Grid Federation (ESGF) has a search API that clients can use to query catalog content matching a set of constraints (see API documentation). It's possible to send requests directly to the API using a simple function (see example), but here we'll use a Python client named pyesgf to interact with the search API and get data from the ESGF THREDDS servers. The following examples show typical data queries.
If a login username and credentials are required, follow these instructions.
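For comparison with the client-based approach, here is a minimal sketch of what a direct request to the search API could look like. It only builds the request URL and does not contact any server. The `build_search_url` helper is hypothetical (it is not part of pyesgf), and the `search` endpoint name together with the `format`, `limit` and `distrib` parameters are assumptions about the ESGF search REST API that should be checked against the API documentation.

```python
from urllib.parse import urlencode

# Assumption: the REST endpoint lives under the same base URL used with pyesgf below.
ESGF_SEARCH = "https://esgf-node.llnl.gov/esg-search/search"


def build_search_url(base=ESGF_SEARCH, facets=None, limit=0, distrib=False, **constraints):
    """Build a search URL from facet constraints (hypothetical helper)."""
    params = dict(constraints)
    if facets:
        # Facets for which counts are requested, as a comma-separated list.
        params["facets"] = ",".join(facets)
    params["limit"] = limit  # 0: only counts are wanted, no individual records
    params["distrib"] = str(distrib).lower()
    params["format"] = "application/solr+json"  # assumed JSON response format
    return f"{base}?{urlencode(params)}"


url = build_search_url(
    project="CMIP6",
    experiment_id="ssp245",
    query="humidity",
    facets=["variable_id", "source_id"],
)
print(url)
```

The resulting URL could then be fetched with any HTTP client. The pyesgf client used in the rest of this page wraps this kind of request and parses the response.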
# NBVAL_IGNORE_OUTPUT
from pyesgf.search import SearchConnection
# Create a connection for search on ESGF nodes. Note that setting `distrib=True`, which
# distributes the query across the federated nodes, can lead to unexpected failures.
conn = SearchConnection("https://esgf-node.llnl.gov/esg-search/", distrib=False)
# Launch a search query.
# Here we're looking for any variable related to humidity within the CMIP6 SSP2-4.5 experiment.
# Facet counts will be stored in a dictionary (`ctx.facet_counts`) with keys defined by the `facets` argument.
ctx = conn.new_context(
    project="CMIP6",
    experiment_id="ssp245",
    query="humidity",
    facets="variable_id,source_id",
)
)
print("Number of results: ", ctx.hit_count)
print("Variables related to humidity: ")
ctx.facet_counts["variable_id"]
Number of results: 10754
Variables related to humidity:
{'tnhusscpbl': 163,
'tnhusscp': 70,
'tnhuspbl': 70,
'tnhusmp': 172,
'tnhusd': 34,
'tnhusc': 224,
'tnhusa': 180,
'tnhus': 76,
'hussLut': 36,
'huss': 2020,
'hus850': 190,
'hus': 2453,
'hursmin': 699,
'hursmax': 682,
'hurs': 2012,
'hur': 1673}
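Since the facet counts come back as a mapping from facet value to number of hits, they can be ranked to see which variables dominate the results. The sketch below assumes dictionary-like behavior and uses a subset of the counts printed above; `top_facets` is a hypothetical helper, not a pyesgf function.

```python
# Subset of the counts shown in the output above.
variable_counts = {
    "tnhus": 76,
    "hussLut": 36,
    "huss": 2020,
    "hus850": 190,
    "hus": 2453,
    "hursmin": 699,
    "hursmax": 682,
    "hurs": 2012,
    "hur": 1673,
}


def top_facets(counts, n=3):
    """Return the n facet values with the most hits, largest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


for name, count in top_facets(variable_counts):
    print(f"{name}: {count}")
```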
# NBVAL_IGNORE_OUTPUT
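# Now let's look for simulations that have the `hurs` variable and pick the first member.
# Note that `constrain` returns a new context and leaves `ctx` unchanged, so its result must be
# assigned for the constraints to apply. It is not assigned here, so the counts below still
# cover the whole humidity search.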
ctx.constrain(variable_id="hurs", ensemble="r1i1p1f1")
ctx.facet_counts["source_id"]
{'UKESM1-0-LL': 126,
'TaiESM1': 20,
'NorESM2-MM': 36,
'NorESM2-LM': 188,
'NESM3': 21,
'MRI-ESM2-0': 663,
'MPI-ESM1-2-LR': 1022,
'MPI-ESM1-2-HR': 51,
'MIROC6': 1989,
'MIROC-ES2L': 1861,
'MIROC-ES2H': 84,
'MCM-UA-1-0': 8,
'KIOST-ESM': 30,
'KACE-1-0-G': 68,
'IPSL-CM6A-LR': 236,
'INM-CM5-0': 26,
'INM-CM4-8': 26,
'IITM-ESM': 17,
'HadGEM3-GC31-LL': 110,
'GISS-E2-2-G': 40,
'GISS-E2-1-H': 84,
'GISS-E2-1-G-CC': 8,
'GISS-E2-1-G': 409,
'GFDL-ESM4': 20,
'GFDL-CM4': 34,
'FIO-ESM-2-0': 16,
'FGOALS-g3': 88,
'FGOALS-f3-L': 12,
'EC-Earth3-Veg-LR': 37,
'EC-Earth3-Veg': 100,
'EC-Earth3-CC': 64,
'EC-Earth3-AerChem': 2,
'EC-Earth3': 817,
'E3SM-1-1': 22,
'CanESM5-CanOE': 24,
'CanESM5-1': 208,
'CanESM5': 1033,
'CNRM-ESM2-1': 108,
'CNRM-CM6-1-HR': 19,
'CNRM-CM6-1': 78,
'CMCC-ESM2': 20,
'CMCC-CM2-SR5': 18,
'CIESM': 9,
'CESM2-WACCM': 168,
'CESM2': 184,
'CAS-ESM2-0': 20,
'CAMS-CSM1-0': 6,
'BCC-CSM2-MR': 20,
'AWI-CM-1-1-MR': 19,
'ACCESS-ESM1-5': 408,
'ACCESS-CM2': 77}
# We can now refine the search and get the datasets matching our search context
results = ctx.constrain(source_id="CanESM5").search()
r = results[0]
r.dataset_id
'CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r14i1p2f1.Amon.hus.gn.v20190429|crd-esgf-drc.ec.gc.ca'
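The dataset identifier packs several facets into one string, so splitting it is a quick way to check what a search actually returned. The sketch below assumes the identifier follows the CMIP6 data reference syntax (ten dot-separated fields, followed by `|` and the data node); the field names and the `parse_dataset_id` helper are illustrative and are not part of pyesgf.

```python
# Assumed order of the dot-separated fields in a CMIP6 dataset identifier.
CMIP6_DRS_FIELDS = (
    "mip_era",
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
    "member_id",
    "table_id",
    "variable_id",
    "grid_label",
    "version",
)


def parse_dataset_id(dataset_id):
    """Split a CMIP6 dataset identifier into named components (hypothetical helper)."""
    drs, _, data_node = dataset_id.partition("|")
    parts = drs.split(".")
    if len(parts) != len(CMIP6_DRS_FIELDS):
        raise ValueError(f"Unexpected number of fields in {dataset_id!r}")
    info = dict(zip(CMIP6_DRS_FIELDS, parts))
    info["data_node"] = data_node
    return info


info = parse_dataset_id(
    "CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r14i1p2f1.Amon.hus.gn.v20190429|crd-esgf-drc.ec.gc.ca"
)
print(info["variable_id"], info["member_id"], info["table_id"])
```

For the identifier above, this shows that the first result holds the monthly `hus` variable for member `r14i1p2f1`.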
# To get file download links, there's an extra step
file_ctx = r.file_context()
file_ctx.facets = "*"
files = file_ctx.search()
[f.download_url for f in files]
['http://crd-esgf-drc.ec.gc.ca/thredds/fileServer/esgD_dataroot/AR6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp245/r14i1p2f1/Amon/hus/gn/v20190429/hus_Amon_CanESM5_ssp245_r14i1p2f1_gn_201501-210012.nc']
# Instead of a download URL, we can also get OPeNDAP links.
urls = [f.opendap_url for f in files]
print(urls)
# It's sometimes possible to request aggregations of multiple netCDF into one OPeNDAP link,
# but this option is often unavailable.
agg_ctx = r.aggregation_context()
agg_ctx.facets = "*"
agg = agg_ctx.search()[0]
print(agg.opendap_url)
['http://crd-esgf-drc.ec.gc.ca/thredds/dodsC/esgD_dataroot/AR6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp245/r14i1p2f1/Amon/hus/gn/v20190429/hus_Amon_CanESM5_ssp245_r14i1p2f1_gn_201501-210012.nc']
None
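Because the aggregation URL can come back as `None`, as it does here, downstream code may want to fall back to the per-file links. A minimal sketch of that choice follows; `usable_opendap_urls` is a hypothetical helper, and the URL is the one printed above.

```python
def usable_opendap_urls(aggregation_url, file_urls):
    """Prefer a single aggregation URL, otherwise keep the non-empty per-file URLs."""
    if aggregation_url:
        return [aggregation_url]
    return [url for url in file_urls if url]


file_urls = [
    "http://crd-esgf-drc.ec.gc.ca/thredds/dodsC/esgD_dataroot/AR6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp245/r14i1p2f1/Amon/hus/gn/v20190429/hus_Amon_CanESM5_ssp245_r14i1p2f1_gn_201501-210012.nc"
]

# The aggregation URL was None in the output above, so the file URL is used.
selected = usable_opendap_urls(None, file_urls)
print(selected)
```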
<xarray.Dataset> Size: 643MB
Dimensions:    (time: 1032, bnds: 2, plev: 19, lat: 64, lon: 128)
Coordinates:
  * time       (time) object 8kB 2015-01-16 12:00:00 ... 2100-12-16 12:00:00
  * plev       (plev) float64 152B 1e+05 9.25e+04 8.5e+04 ... 1e+03 500.0 100.0
  * lat        (lat) float64 512B -87.86 -85.1 -82.31 ... 82.31 85.1 87.86
  * lon        (lon) float64 1kB 0.0 2.812 5.625 8.438 ... 351.6 354.4 357.2
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object 17kB dask.array<chunksize=(1032, 2), meta=np.ndarray>
    lat_bnds   (lat, bnds) float64 1kB dask.array<chunksize=(64, 2), meta=np.ndarray>
    lon_bnds   (lon, bnds) float64 2kB dask.array<chunksize=(128, 2), meta=np.ndarray>
    hus        (time, plev, lat, lon) float32 643MB dask.array<chunksize=(1032, 19, 64, 128), meta=np.ndarray>
Attributes: (12/54)
    CCCma_model_hash:               fc4bb7db954c862d023b546e19aec6c588bc0552
    CCCma_parent_runid:             p2-his14
    CCCma_pycmor_hash:              26c970628162d607fffd14254956ebc6dd3b6f49
    CCCma_runid:                    p2-s4514
    Conventions:                    CF-1.7 CMIP-6.2
    YMDH_branch_time_in_child:      2015:01:01:00
    ...                             ...
    variable_id:                    hus
    variant_label:                  r14i1p2f1
    version:                        v20190429
    license:                        CMIP6 model data produced by The Governm...
    cmor_version:                   3.5.0
    DODS_EXTRA.Unlimited_Dimension: time