Earth System Grid Federation Data Access

The Earth System Grid Federation (ESGF) provides a search API that clients can use to query catalog content matching a set of constraints (see API documentation). Requests can be sent directly to the API using a simple function (see example), but here we’ll use a Python client named pyesgf to interact with the search API and get data from the ESGF THREDDS servers. The following examples show typical data queries.
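
To illustrate what a direct request could look like, here is a minimal sketch using only the standard library. It is not the function the text above links to: the helper names `build_search_url` and `esgf_search` are hypothetical, and the sketch assumes the node exposes the standard `/search` endpoint and returns the Solr-style JSON layout (`response.numFound`, `response.docs`).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Search endpoint of the node used in this notebook (assumed to be reachable).
SEARCH_URL = "https://esgf-node.llnl.gov/esg-search/search"


def build_search_url(base_url=SEARCH_URL, limit=10, distrib=False, **constraints):
    """Build an ESGF search URL from facet constraints. No request is sent."""
    params = {
        "type": "Dataset",
        "format": "application/solr+json",  # ask for JSON rather than the default XML
        "limit": limit,
        "distrib": str(distrib).lower(),
    }
    params.update(constraints)
    return f"{base_url}?{urlencode(params)}"


def esgf_search(**constraints):
    """Send the query and return (number of hits, list of dataset records)."""
    with urlopen(build_search_url(**constraints), timeout=60) as response:
        payload = json.load(response)
    return payload["response"]["numFound"], payload["response"]["docs"]
```

Calling `esgf_search(project="CMIP6", experiment_id="ssp245", variable_id="hurs")` would then send a single query to the node. The pyesgf client used in the rest of this notebook wraps the same API and adds conveniences such as search contexts and file lookups.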

If a username and credentials are required to access the data, follow these instructions.

# NBVAL_IGNORE_OUTPUT

from pyesgf.search import SearchConnection

# Create a connection to search an ESGF node. Setting `distrib=True` extends the search
# to the other federated nodes, but it can lead to unexpected failures.
conn = SearchConnection("https://esgf-node.llnl.gov/esg-search/", distrib=False)

# Launch a search query.
# Here we're looking for any variable related to humidity within the CMIP6 SSP2-4.5 experiment.
# Facet counts will be stored in a dictionary whose keys are defined by the `facets` argument.
ctx = conn.new_context(
    project="CMIP6",
    experiment_id="ssp245",
    query="humidity",
    facets="variable_id,source_id",
)

print("Number of results: ", ctx.hit_count)
print("Variables related to humidity: ")
ctx.facet_counts["variable_id"]
Number of results:  10627
Variables related to humidity: 
{'tnhusscpbl': 163,
 'tnhusscp': 70,
 'tnhuspbl': 70,
 'tnhusmp': 172,
 'tnhusd': 34,
 'tnhusc': 224,
 'tnhusa': 180,
 'tnhus': 76,
 'hussLut': 36,
 'huss': 1988,
 'hus850': 190,
 'hus': 2420,
 'hursmin': 694,
 'hursmax': 677,
 'hurs': 1981,
 'hur': 1652}
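
The facet counts are a plain dictionary, so they can be ranked with standard Python. The sketch below copies the counts shown above into a literal so that it runs without a connection. `top_facets` is a hypothetical helper, not part of pyesgf.

```python
# Facet counts copied from the output above.
variable_counts = {
    "tnhusscpbl": 163, "tnhusscp": 70, "tnhuspbl": 70, "tnhusmp": 172,
    "tnhusd": 34, "tnhusc": 224, "tnhusa": 180, "tnhus": 76,
    "hussLut": 36, "huss": 1988, "hus850": 190, "hus": 2420,
    "hursmin": 694, "hursmax": 677, "hurs": 1981, "hur": 1652,
}


def top_facets(counts, n=3):
    """Return the n facet values with the most results, largest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


print(top_facets(variable_counts))
# [('hus', 2420), ('huss', 1988), ('hurs', 1981)]
```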
# NBVAL_IGNORE_OUTPUT

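# Now let's look for simulations that have the `hurs` variable and pick the first member.
# `constrain` returns a new context rather than modifying `ctx` in place, so keep the result.
# CMIP6 identifies ensemble members with the `variant_label` facet.
ctx = ctx.constrain(variable_id="hurs", variant_label="r1i1p1f1")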
ctx.facet_counts["source_id"]
{'UKESM1-0-LL': 126,
 'TaiESM1': 20,
 'NorESM2-MM': 36,
 'NorESM2-LM': 188,
 'NESM3': 21,
 'MRI-ESM2-0': 663,
 'MPI-ESM1-2-LR': 999,
 'MPI-ESM1-2-HR': 51,
 'MIROC6': 1989,
 'MIROC-ES2L': 1861,
 'MIROC-ES2H': 84,
 'MCM-UA-1-0': 8,
 'KIOST-ESM': 30,
 'KACE-1-0-G': 68,
 'IPSL-CM6A-LR': 236,
 'INM-CM5-0': 26,
 'INM-CM4-8': 26,
 'IITM-ESM': 17,
 'HadGEM3-GC31-LL': 110,
 'GISS-E2-2-G': 40,
 'GISS-E2-1-H': 84,
 'GISS-E2-1-G-CC': 8,
 'GISS-E2-1-G': 325,
 'GFDL-ESM4': 20,
 'GFDL-CM4': 34,
 'FIO-ESM-2-0': 16,
 'FGOALS-g3': 88,
 'FGOALS-f3-L': 12,
 'EC-Earth3-Veg-LR': 37,
 'EC-Earth3-Veg': 100,
 'EC-Earth3-CC': 64,
 'EC-Earth3-AerChem': 2,
 'EC-Earth3': 817,
 'E3SM-1-1': 22,
 'CanESM5-CanOE': 24,
 'CanESM5-1': 208,
 'CanESM5': 1033,
 'CNRM-ESM2-1': 108,
 'CNRM-CM6-1-HR': 19,
 'CNRM-CM6-1': 78,
 'CMCC-ESM2': 20,
 'CMCC-CM2-SR5': 18,
 'CIESM': 9,
 'CESM2-WACCM': 168,
 'CESM2': 184,
 'CAS-ESM2-0': 20,
 'CAMS-CSM1-0': 6,
 'BCC-CSM2-MR': 20,
 'AWI-CM-1-1-MR': 19,
 'ACCESS-ESM1-5': 408,
 'ACCESS-CM2': 57}
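
The member label `r1i1p1f1` follows the CMIP6 convention of four indices: realization (`r`), initialization (`i`), physics (`p`) and forcing (`f`). As a small illustration, the sketch below splits such a label into its parts. `parse_variant_label` and `VariantLabel` are hypothetical names introduced here, not pyesgf functions.

```python
import re
from typing import NamedTuple


class VariantLabel(NamedTuple):
    realization: int
    initialization: int
    physics: int
    forcing: int


_VARIANT_PATTERN = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)")


def parse_variant_label(label):
    """Split a CMIP6 member label such as 'r1i1p1f1' into its four indices."""
    match = _VARIANT_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"Not a CMIP6 variant label: {label!r}")
    return VariantLabel(*(int(group) for group in match.groups()))


print(parse_variant_label("r1i1p1f1"))
# VariantLabel(realization=1, initialization=1, physics=1, forcing=1)
```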
# We can now refine the search and retrieve the datasets matching our search context.
results = ctx.constrain(source_id="CanESM5").search()
r = results[0]
r.dataset_id
'CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r14i1p2f1.Amon.hus.gn.v20190429|crd-esgf-drc.ec.gc.ca'
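
A dataset ID packs the CMIP6 facets into a dot-separated string, followed by `|` and the data node hosting the files. The sketch below unpacks the ID shown above. It assumes the standard ten-element CMIP6 layout, and `parse_dataset_id` is a hypothetical helper rather than a pyesgf function.

```python
# Facet order in a CMIP6 dataset ID (assumed from the CMIP6 data reference syntax).
CMIP6_DRS_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_dataset_id(dataset_id):
    """Split '<facets>|<data node>' into a dictionary of facet values."""
    drs, _, data_node = dataset_id.partition("|")
    values = drs.split(".")
    if len(values) != len(CMIP6_DRS_FACETS):
        raise ValueError(f"Unexpected dataset ID layout: {dataset_id!r}")
    facets = dict(zip(CMIP6_DRS_FACETS, values))
    facets["data_node"] = data_node
    return facets


info = parse_dataset_id(
    "CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r14i1p2f1.Amon.hus.gn.v20190429"
    "|crd-esgf-drc.ec.gc.ca"
)
print(info["variable_id"], info["member_id"], info["data_node"])
# hus r14i1p2f1 crd-esgf-drc.ec.gc.ca
```

Checking `variable_id` and `member_id` this way is a quick way to confirm that a result matches the constraints you intended.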
# To get file download links, there's an extra step
file_ctx = r.file_context()
file_ctx.facets = "*"
files = file_ctx.search()
[f.download_url for f in files]
['http://crd-esgf-drc.ec.gc.ca/thredds/fileServer/esgD_dataroot/AR6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp245/r14i1p2f1/Amon/hus/gn/v20190429/hus_Amon_CanESM5_ssp245_r14i1p2f1_gn_201501-210012.nc']
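
With a download URL in hand, the file can be fetched with the standard library. This is a minimal sketch, assuming the data node allows unauthenticated downloads, which is not always the case. `local_path_for` and `download` are hypothetical helper names.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def local_path_for(url, dest_dir="."):
    """Return the local path a URL would be saved to, keeping the remote file name."""
    return Path(dest_dir) / Path(urlparse(url).path).name


def download(url, dest_dir=".", overwrite=False):
    """Stream a remote file to disk and return its local path."""
    target = local_path_for(url, dest_dir)
    if target.exists() and not overwrite:
        return target  # skip files that were already downloaded
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=60) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks, not all at once
    return target
```

For example, `[download(f.download_url, "data") for f in files]` would save each file under a `data` directory. Monthly files such as the one above can be large, so streaming to disk avoids holding the whole file in memory.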
# Instead of a download URL, we can also get OPeNDAP links.
urls = [f.opendap_url for f in files]
print(urls)

# It's sometimes possible to request an aggregation of multiple netCDF files as a single
# OPeNDAP link, but this option is often unavailable (the URL is then `None`).
agg_ctx = r.aggregation_context()
agg_ctx.facets = "*"
agg = agg_ctx.search()[0]
print(agg.opendap_url)
['http://crd-esgf-drc.ec.gc.ca/thredds/dodsC/esgD_dataroot/AR6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp245/r14i1p2f1/Amon/hus/gn/v20190429/hus_Amon_CanESM5_ssp245_r14i1p2f1_gn_201501-210012.nc']
None
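
Because the aggregation URL can be `None`, as in the output above, it helps to fall back on the per-file links. The sketch below shows one way to do that. `choose_opendap_urls` is a hypothetical helper, not part of pyesgf.

```python
def choose_opendap_urls(aggregation_url, file_urls):
    """Prefer the aggregation link, otherwise return the per-file OPeNDAP links."""
    if aggregation_url:
        return [aggregation_url]
    available = [url for url in file_urls if url]  # drop files without OPeNDAP access
    if not available:
        raise ValueError("No OPeNDAP URL is available for this dataset.")
    return available
```

With the variables defined above, `choose_opendap_urls(agg.opendap_url, urls)` would return the single per-file link, since no aggregation is available for this dataset.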
# Open the OPeNDAP links with xarray
import xarray as xr

ds = xr.open_mfdataset(urls)
ds
<xarray.Dataset>
Dimensions:    (time: 1032, bnds: 2, plev: 19, lat: 64, lon: 128)
Coordinates:
  * time       (time) object 2015-01-16 12:00:00 ... 2100-12-16 12:00:00
  * plev       (plev) float64 1e+05 9.25e+04 8.5e+04 7e+04 ... 1e+03 500.0 100.0
  * lat        (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
  * lon        (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object dask.array<chunksize=(1032, 2), meta=np.ndarray>
    lat_bnds   (lat, bnds) float64 dask.array<chunksize=(64, 2), meta=np.ndarray>
    lon_bnds   (lon, bnds) float64 dask.array<chunksize=(128, 2), meta=np.ndarray>
    hus        (time, plev, lat, lon) float32 dask.array<chunksize=(1032, 19, 64, 128), meta=np.ndarray>
Attributes: (12/54)
    CCCma_model_hash:                fc4bb7db954c862d023b546e19aec6c588bc0552
    CCCma_parent_runid:              p2-his14
    CCCma_pycmor_hash:               26c970628162d607fffd14254956ebc6dd3b6f49
    CCCma_runid:                     p2-s4514
    Conventions:                     CF-1.7 CMIP-6.2
    YMDH_branch_time_in_child:       2015:01:01:00
    ...                              ...
    variable_id:                     hus
    variant_label:                   r14i1p2f1
    version:                         v20190429
    license:                         CMIP6 model data produced by The Governm...
    cmor_version:                    3.5.0
    DODS_EXTRA.Unlimited_Dimension:  time