# Earth System Grid Federation Data Access

The Earth System Grid Federation (ESGF) has a search API that can be used by clients to query catalog content matching constraints (see [API documentation](https://github.com/ESGF/esgf.github.io/wiki/ESGF_Search_REST_API)). It's possible to send requests directly to the API using a simple function (see [example](https://esgf2.github.io/cmip6-cookbook/notebooks/foundations/esgf-opendap.html)), but here we'll use a python client named `pyesgf` to interact with the search API and get data from the ESGF THREDDS servers. The following shows examples of typical queries for data. 

If a login username and credentials are required, follow these [instructions](https://esgf-pyclient.readthedocs.io/en/latest/notebooks/examples/logon.html).

In [1]:
# NBVAL_IGNORE_OUTPUT

from pyesgf.search import SearchConnection

# Create a connection for search on ESGF nodes. Note that setting `distrib=True` can lead to weird failures.
conn = SearchConnection("https://esgf-node.llnl.gov/esg-search/", distrib=False)

# Launch a search query.
# Here we're looking for any variable related to humidity within the CMIP6 SSP2-4.5 experiment.
# Results will be stored in a dictionary with keys defined by the `facets` argument.
ctx = conn.new_context(
    project="CMIP6",
    experiment_id="ssp245",
    query="humidity",
    facets="variable_id,source_id",
)

print("Number of results: ", ctx.hit_count)
print("Variables related to humidity: ")
ctx.facet_counts["variable_id"]

Number of results:  10114
Variables related to humidity: 


{'tnhusscpbl': 157,
 'tnhusscp': 70,
 'tnhuspbl': 70,
 'tnhusmp': 166,
 'tnhusd': 34,
 'tnhusc': 218,
 'tnhusa': 174,
 'tnhus': 76,
 'hussLut': 34,
 'huss': 1918,
 'hus850': 164,
 'hus': 2294,
 'hursmin': 642,
 'hursmax': 627,
 'hurs': 1918,
 'hur': 1552}

In [2]:
# NBVAL_IGNORE_OUTPUT

# Now let's look for simulations that have the `hurs` variable and pick the first member.
ctx.constrain(variable_id="hurs", ensemble="r1i1p1f1")
ctx.facet_counts["source_id"]

{'UKESM1-0-LL': 123,
 'TaiESM1': 20,
 'NorESM2-MM': 36,
 'NorESM2-LM': 186,
 'NESM3': 21,
 'MRI-ESM2-0': 663,
 'MPI-ESM1-2-LR': 799,
 'MPI-ESM1-2-HR': 51,
 'MIROC6': 1989,
 'MIROC-ES2L': 1861,
 'MCM-UA-1-0': 8,
 'KIOST-ESM': 30,
 'KACE-1-0-G': 68,
 'IPSL-CM6A-LR': 198,
 'INM-CM5-0': 26,
 'INM-CM4-8': 26,
 'IITM-ESM': 17,
 'HadGEM3-GC31-LL': 110,
 'GISS-E2-2-G': 40,
 'GISS-E2-1-H': 84,
 'GISS-E2-1-G-CC': 4,
 'GISS-E2-1-G': 325,
 'GFDL-ESM4': 20,
 'GFDL-CM4': 34,
 'FIO-ESM-2-0': 16,
 'FGOALS-g3': 88,
 'FGOALS-f3-L': 12,
 'EC-Earth3-Veg-LR': 37,
 'EC-Earth3-Veg': 100,
 'EC-Earth3-CC': 64,
 'EC-Earth3': 817,
 'E3SM-1-1': 22,
 'CanESM5-CanOE': 24,
 'CanESM5-1': 30,
 'CanESM5': 1033,
 'CNRM-ESM2-1': 108,
 'CNRM-CM6-1-HR': 19,
 'CNRM-CM6-1': 78,
 'CMCC-ESM2': 20,
 'CMCC-CM2-SR5': 18,
 'CIESM': 9,
 'CESM2-WACCM': 168,
 'CESM2': 184,
 'CAS-ESM2-0': 20,
 'CAMS-CSM1-0': 6,
 'BCC-CSM2-MR': 20,
 'AWI-CM-1-1-MR': 19,
 'ACCESS-ESM1-5': 406,
 'ACCESS-CM2': 57}

In [3]:
# We can now refine the search and get datasets corresponding within our search context
results = ctx.constrain(source_id="CanESM5").search()
r = results[0]
r.dataset_id

'CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r14i1p2f1.Amon.hus.gn.v20190429|crd-esgf-drc.ec.gc.ca'

In [4]:
# To get file download links, there's an extra step
file_ctx = r.file_context()
file_ctx.facets = "*"
files = file_ctx.search()
[f.download_url for f in files]

['http://crd-esgf-drc.ec.gc.ca/thredds/fileServer/esgD_dataroot/AR6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp245/r14i1p2f1/Amon/hus/gn/v20190429/hus_Amon_CanESM5_ssp245_r14i1p2f1_gn_201501-210012.nc']

In [5]:
# Instead of a download URL, we can also get OPeNDAP links.
urls = [f.opendap_url for f in files]
print(urls)

# It's sometimes possible to request aggregations of multiple netCDF into one OPeNDAP link,
# but this option is often unavailable.
agg_ctx = r.aggregation_context()
agg_ctx.facets = "*"
agg = agg_ctx.search()[0]
print(agg.opendap_url)

['http://crd-esgf-drc.ec.gc.ca/thredds/dodsC/esgD_dataroot/AR6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp245/r14i1p2f1/Amon/hus/gn/v20190429/hus_Amon_CanESM5_ssp245_r14i1p2f1_gn_201501-210012.nc']
None


In [6]:
# Open the opendap link with xarray
import xarray as xr

ds = xr.open_mfdataset(urls)
ds

Unnamed: 0,Array,Chunk
Bytes,16.12 kiB,16.12 kiB
Shape,"(1032, 2)","(1032, 2)"
Dask graph,1 chunks in 2 graph layers,1 chunks in 2 graph layers
Data type,object numpy.ndarray,object numpy.ndarray
"Array Chunk Bytes 16.12 kiB 16.12 kiB Shape (1032, 2) (1032, 2) Dask graph 1 chunks in 2 graph layers Data type object numpy.ndarray",2  1032,

Unnamed: 0,Array,Chunk
Bytes,16.12 kiB,16.12 kiB
Shape,"(1032, 2)","(1032, 2)"
Dask graph,1 chunks in 2 graph layers,1 chunks in 2 graph layers
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,1.00 kiB,1.00 kiB
Shape,"(64, 2)","(64, 2)"
Dask graph,1 chunks in 2 graph layers,1 chunks in 2 graph layers
Data type,float64 numpy.ndarray,float64 numpy.ndarray
"Array Chunk Bytes 1.00 kiB 1.00 kiB Shape (64, 2) (64, 2) Dask graph 1 chunks in 2 graph layers Data type float64 numpy.ndarray",2  64,

Unnamed: 0,Array,Chunk
Bytes,1.00 kiB,1.00 kiB
Shape,"(64, 2)","(64, 2)"
Dask graph,1 chunks in 2 graph layers,1 chunks in 2 graph layers
Data type,float64 numpy.ndarray,float64 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,2.00 kiB,2.00 kiB
Shape,"(128, 2)","(128, 2)"
Dask graph,1 chunks in 2 graph layers,1 chunks in 2 graph layers
Data type,float64 numpy.ndarray,float64 numpy.ndarray
"Array Chunk Bytes 2.00 kiB 2.00 kiB Shape (128, 2) (128, 2) Dask graph 1 chunks in 2 graph layers Data type float64 numpy.ndarray",2  128,

Unnamed: 0,Array,Chunk
Bytes,2.00 kiB,2.00 kiB
Shape,"(128, 2)","(128, 2)"
Dask graph,1 chunks in 2 graph layers,1 chunks in 2 graph layers
Data type,float64 numpy.ndarray,float64 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,612.75 MiB,612.75 MiB
Shape,"(1032, 19, 64, 128)","(1032, 19, 64, 128)"
Dask graph,1 chunks in 2 graph layers,1 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 612.75 MiB 612.75 MiB Shape (1032, 19, 64, 128) (1032, 19, 64, 128) Dask graph 1 chunks in 2 graph layers Data type float32 numpy.ndarray",1032  1  128  64  19,

Unnamed: 0,Array,Chunk
Bytes,612.75 MiB,612.75 MiB
Shape,"(1032, 19, 64, 128)","(1032, 19, 64, 128)"
Dask graph,1 chunks in 2 graph layers,1 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
