Working with the ECCC GeoAPI to access weather station data

Environment and Climate Change Canada (ECCC) hosts a data server compatible with the GeoAPI standard. This notebook shows how to send requests for daily climate station data and display the results.

Climate stations

The server holds different collections, and requests are made to a particular collection. Here we’ll start with the climate-stations collection, which holds metadata about available stations but no actual meteorological data. Useful queryable fields in this collection include DLY_FIRST_DATE, DLY_LAST_DATE, ENG_PROV_NAME, LATITUDE, LONGITUDE, ELEVATION and STATION_NAME, among many others.
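To see the full list of queryable fields, you can ask the server directly. Below is a minimal sketch, assuming the server exposes the standard OGC API - Features queryables document for this collection:

import json
import urllib.request

# Fetch the queryables document for the climate-stations collection.
qurl = "https://api.weather.gc.ca/collections/climate-stations/queryables?f=json"
with urllib.request.urlopen(qurl) as resp:
    queryables = json.load(resp)

# The queryable field names are the keys of the "properties" mapping.
print(sorted(queryables.get("properties", {})))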

Creating a request to the server for data

Let’s start by showing a map of all available station locations in Nova Scotia. To do so, we first need to compose a URL request: the address of the server, the collection, then a query to filter results.

import os

os.environ["USE_PYGEOS"] = "0"  # force GeoPandas to use Shapely instead of PyGEOS

import io
import urllib.request

import geopandas as gpd
from urlpath import URL

# Compose the request
host = URL("https://api.weather.gc.ca")
climate_stations = host / "collections" / "climate-stations" / "items"
url = climate_stations.with_query({"ENG_PROV_NAME": "NOVA-SCOTIA"})
print(url)

# Send the request to the server
resp = url.get()
resp
https://api.weather.gc.ca/collections/climate-stations/items?ENG_PROV_NAME=NOVA-SCOTIA
<Response [200]>

The response from the server is a Response class instance. What we’re interested in is the content of this response, which in this case is a GeoJSON document.

# NBVAL_IGNORE_OUTPUT

resp.content[:100]
b'{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"STN_ID":6289,"STATION_NAME"'

We’ll open the GeoJSON using geopandas. We have a few options to do this:

  • Load the response’s content with json.loads, then create a GeoDataFrame using the from_features class method;

  • Save the response content to a file on disk, then open it with geopandas.read_file;

  • Wrap the response content in an in-memory buffer (io.BytesIO) and pass it to geopandas.read_file;

  • Call geopandas.read_file(url) to let geopandas handle the data download.

Here we’ll use the third option: the pyogrio engine needs a seekable file object, and an in-memory buffer satisfies that without touching the disk. Note that the first method ignores the feature id, which seems to create problems with the folium visualization below.
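For reference, the second option might look like this sketch (the file name stations.geojson is arbitrary):

from pathlib import Path

# Save the raw GeoJSON payload to disk, then read it back with geopandas.
Path("stations.geojson").write_bytes(resp.content)
stations = gpd.read_file("stations.geojson")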

# NBVAL_IGNORE_OUTPUT

# The first approach would look like this:
# import json
# stations = gpd.GeoDataFrame.from_features(json.loads(resp.content))

with urllib.request.urlopen(url=str(url)) as req:
    # pyogrio needs a seekable file object, so read the payload into memory first.
    stations = gpd.read_file(filename=io.BytesIO(req.read()), engine="pyogrio")
stations.head()

Filter stations

Now let’s say we want to keep only the stations that were in operation for at least 50 years. We’ll create a new column, n_days, and filter on it.

# NBVAL_IGNORE_OUTPUT

import pandas as pd

# Subtracting the two date columns yields a Series of Timedelta objects.
delta = pd.to_datetime(stations["DLY_LAST_DATE"]) - pd.to_datetime(
    stations["DLY_FIRST_DATE"]
)

# Get the number of days in each time delta
stations["n_days"] = delta.dt.days

# Compute condition
over_50 = stations["n_days"] > 50 * 365.25

# Index the data frame using the condition
select = stations[over_50]
select.head()
id STN_ID STATION_NAME PROV_STATE_TERR_CODE ENG_PROV_NAME FRE_PROV_NAME COUNTRY LATITUDE LONGITUDE TIMEZONE ... HLY_LAST_DATE DLY_FIRST_DATE DLY_LAST_DATE MLY_FIRST_DATE MLY_LAST_DATE HAS_MONTHLY_SUMMARY HAS_NORMALS_DATA HAS_HOURLY_DATA geometry n_days
2 8203400 6399 MALAY FALLS NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 445900000 -622900000 AST ... 2000-08-31 23:00:00 1950-02-01 2000-08-31 1950-01-01 2000-08-01 Y N N POINT (-62.48333 44.98333) 18474
18 8205090 6465 SHEARWATER A NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 443800000 -633000000 AST ... 2023-01-20 08:00:00 1944-02-01 2007-12-12 1944-01-01 2007-03-01 Y Y Y POINT (-63.50000 44.63333) 23325
19 8205698 6485 SYDNEY NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 460900000 -601200000 AST ... NaT 1870-01-01 1941-03-31 1870-01-01 1941-12-01 Y N N POINT (-60.20000 46.15000) 26021
23 8206300 6506 WHITEHEAD NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 451300000 -611100000 AST ... NaT 1883-12-01 1960-06-30 1883-01-01 1960-12-01 Y N N POINT (-61.18333 45.21667) 27970
24 8206440 6513 WOLFVILLE NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 450600000 -642200000 AST ... NaT 1870-09-01 1949-09-30 1870-01-01 1949-12-01 Y N N POINT (-64.36667 45.10000) 28883

5 rows × 35 columns
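As a design note, the same filter can be written without the intermediate n_days column by comparing Timedelta values directly; a sketch:

# Compare the Timedeltas directly rather than converting to days first.
over_50 = delta > pd.Timedelta(days=50 * 365.25)
select = stations[over_50]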

Map the data

We can then simply map the locations of stations with at least 50 years of data using the explore method. This will display an interactive base map and overlay the station locations; clicking on a station marker will display that station’s information.

On top of this map, we’ll add controls to draw a rectangle. To use the drawing tool, click on the square in the left-hand side menu, then click and drag to draw a rectangle over the area of interest. Once that’s done, click on the Export button on the right of the map. This will download a file called data.geojson.

from folium.plugins import Draw

# Add control to draw a rectangle, and an export button.
draw_control = Draw(
    draw_options={
        "polyline": False,
        "poly": False,
        "circle": False,
        "polygon": False,
        "marker": False,
        "circlemarker": False,
        "rectangle": True,
    },
    export=True,
)

# The map library Folium chokes on columns including time stamps, so we first select the data to plot.
m = select[["geometry", "n_days"]].explore("n_days")
draw_control.add_to(m)
m

Filter stations using bounding box

Next, we’ll use the bounding box drawn on the map to select a subset of stations. We first open the data.geojson file downloaded to disk, create a shapely object, and use it to filter stations.

# NBVAL_IGNORE_OUTPUT

# Adjust directory if running this locally.
# rect = gpd.read_file("~/Downloads/data.geojson")

# Here we're using an existing file so the notebook runs without user interaction.
rect = gpd.read_file(filename="./data.geojson", engine="pyogrio")

# Filter stations DataFrame using bbox
inbox = select.within(rect.loc[0].geometry)

print("Number of stations within subregion: ", sum(inbox))
sub_select = select[inbox]
sub_select.head()
Number of stations within subregion:  9
id STN_ID STATION_NAME PROV_STATE_TERR_CODE ENG_PROV_NAME FRE_PROV_NAME COUNTRY LATITUDE LONGITUDE TIMEZONE ... HLY_LAST_DATE DLY_FIRST_DATE DLY_LAST_DATE MLY_FIRST_DATE MLY_LAST_DATE HAS_MONTHLY_SUMMARY HAS_NORMALS_DATA HAS_HOURLY_DATA geometry n_days
19 8205698 6485 SYDNEY NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 460900000 -601200000 AST ... NaT 1870-01-01 1941-03-31 1870-01-01 1941-12-01 Y N N POINT (-60.20000 46.15000) 26021
23 8206300 6506 WHITEHEAD NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 451300000 -611100000 AST ... NaT 1883-12-01 1960-06-30 1883-01-01 1960-12-01 Y N N POINT (-61.18333 45.21667) 27970
45 8201410 6336 DEMING NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 451259007 -611040090 AST ... NaT 1956-10-01 2011-12-31 1956-01-01 2006-02-01 Y Y N POINT (-61.17780 45.21639) 20179
134 8205600 6481 STILLWATER NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 451100000 -620000000 AST ... NaT 1915-12-01 1979-10-31 1915-01-01 1979-12-01 Y N N POINT (-62.00000 45.18333) 23345
146 8201000 6329 COLLEGEVILLE NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 452900000 -620100000 AST ... NaT 1916-06-01 2016-09-30 1916-01-01 2006-02-01 Y Y N POINT (-62.01667 45.48333) 36646

5 rows × 35 columns

Request meteorological data

Now we’ll request actual meteorological data from the stations filtered above. For this, we’ll use the Daily Climate Observations collection (climate-daily). Here we’re picking just one station, but we could easily loop over each station.

# NBVAL_IGNORE_OUTPUT

coll = host / "collections" / "climate-daily" / "items"
station_id = "8201410"

# Restricting the number of entries returned to keep things fast.
url = str(coll.with_query({"CLIMATE_IDENTIFIER": station_id, "limit": 365}))
print("Request: ", url)
with urllib.request.urlopen(url=url) as req:
    data = gpd.read_file(filename=io.BytesIO(req.read()), engine="pyogrio")
data.head()
Request:  https://api.weather.gc.ca/collections/climate-daily/items?CLIMATE_IDENTIFIER=8201410&limit=365
id STATION_NAME CLIMATE_IDENTIFIER ID LOCAL_DATE PROVINCE_CODE LOCAL_YEAR LOCAL_MONTH LOCAL_DAY MEAN_TEMPERATURE ... SPEED_MAX_GUST_FLAG COOLING_DEGREE_DAYS COOLING_DEGREE_DAYS_FLAG HEATING_DEGREE_DAYS HEATING_DEGREE_DAYS_FLAG MIN_REL_HUMIDITY MIN_REL_HUMIDITY_FLAG MAX_REL_HUMIDITY MAX_REL_HUMIDITY_FLAG geometry
0 8201410.1970.5.12 DEMING 8201410 8201410.1970.5.12 1970-05-12 NS 1970 5 12 5.0 ... None 0.0 None 13.0 None None None None None POINT (-61.17780 45.21639)
1 8201410.1970.5.13 DEMING 8201410 8201410.1970.5.13 1970-05-13 NS 1970 5 13 5.8 ... None 0.0 None 12.2 None None None None None POINT (-61.17780 45.21639)
2 8201410.1970.5.14 DEMING 8201410 8201410.1970.5.14 1970-05-14 NS 1970 5 14 6.1 ... None 0.0 None 11.9 None None None None None POINT (-61.17780 45.21639)
3 8201410.1970.5.15 DEMING 8201410 8201410.1970.5.15 1970-05-15 NS 1970 5 15 7.8 ... None 0.0 None 10.2 None None None None None POINT (-61.17780 45.21639)
4 8201410.1970.5.16 DEMING 8201410 8201410.1970.5.16 1970-05-16 NS 1970 5 16 3.9 ... None 0.0 None 14.1 None None None None None POINT (-61.17780 45.21639)

5 rows × 36 columns
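Looping over every station selected earlier could look like the sketch below; here we assume the id column of sub_select holds the climate identifiers (as in the tables above), and use pd.concat to stitch the per-station frames together.

# Fetch daily data for each selected station and combine the results.
frames = []
for sid in sub_select["id"]:
    u = str(coll.with_query({"CLIMATE_IDENTIFIER": sid, "limit": 365}))
    with urllib.request.urlopen(url=u) as req:
        frames.append(gpd.read_file(filename=io.BytesIO(req.read()), engine="pyogrio"))

all_data = pd.concat(frames, ignore_index=True)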

We can also send a request for data inside a bounding box at a specific date.

bbox = rect.iloc[0].geometry.bounds
print("Bounding box: ", bbox)
url = str(
    coll.with_query(
        {
            "bbox": str(bbox).strip("()"),
            "LOCAL_DATE": "2000-01-01 00:00:00",
            "limit": 100,
        }
    )
)
with urllib.request.urlopen(url=url) as req:
    snapshot = gpd.read_file(filename=io.BytesIO(req.read()), engine="pyogrio")
Bounding box:  (-62.186675, 44.78125, -59.123882, 47.53125)
import cartopy
from cartopy import crs as ccrs
from matplotlib import pyplot as plt

# Create map projection
proj = ccrs.PlateCarree()

# If using another projection, remember you'll need to reproject the snapshot's coordinates.
# snapshot.to_crs(proj, inplace=True)

# Create figure and axes
fig = plt.figure(figsize=(5, 3))
ax = fig.add_subplot(projection=proj)

# Set the map extent to the bounding box and draw the coastlines
ax.set_extent([bbox[0], bbox[2], bbox[1], bbox[3]])
ax.coastlines()

# Plot markers color-coded according to the temperature recorded.
ax = snapshot.plot(column="MEAN_TEMPERATURE", ax=ax, cmap=plt.cm.viridis, legend=True)

# Add a label to the colorbar
cax = ax.figure.axes[-1]
cax.set_ylabel("Mean temperature [°C]")
Text(448.07777777777784, 0.5, 'Mean temperature [°C]')
../_images/75c2bc04d5dd9b3c524fc9468ce834ad0d60a0d43be86e857ff2bf39298dfe0d.png

Another useful filter is on dates and times. If we only want data in a given period, we simply create a request with the datetime argument, using a / to separate the start and end dates. You may leave the start or end date open-ended by using .. instead of a date-time string.

url = str(
    coll.with_query(
        {
            "CLIMATE_IDENTIFIER": station_id,
            "datetime": "1990-01-01 00:00:00/1991-01-01 00:00:00",
        }
    )
)
print(url)
with urllib.request.urlopen(url=url) as req:
    gdf = gpd.read_file(filename=io.BytesIO(req.read()), engine="pyogrio")
https://api.weather.gc.ca/collections/climate-daily/items?CLIMATE_IDENTIFIER=8201410&datetime=1990-01-01+00%3A00%3A00%2F1991-01-01+00%3A00%3A00
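For instance, leaving the end of the period open would look like this sketch:

# An open-ended range: everything from 1990 onward.
url_open = str(
    coll.with_query(
        {"CLIMATE_IDENTIFIER": station_id, "datetime": "1990-01-01 00:00:00/.."}
    )
)
print(url_open)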
# Convert the datetime string to a datetime object
gdf["LOCAL_DATE"] = pd.to_datetime(gdf["LOCAL_DATE"])

# Create a time series out of the column for mean temperature
ts = gdf.set_index("LOCAL_DATE")["MEAN_TEMPERATURE"]
# Plot the time series
ax = ts.plot()
ax.set_xlabel("Time")
ax.set_ylabel("Mean temperature [°C]")
ax.set_title(gdf.iloc[0]["STATION_NAME"])
plt.show()
../_images/78fa3ea13794b0e1e5d895f1b4193e58f70df6df2ed5453aa814bdbf7cbf6360.png

Adjusted and Homogenized Canadian Climate Data (AHCCD)

The Adjusted and Homogenized Canadian Climate Data (AHCCD) datasets from ECCC are climate station data adjusted to account for discontinuities in the record, such as instrument relocation. The collections related to these datasets are ahccd-stations for station metadata, ahccd-annual, ahccd-monthly and ahccd-seasonal for temporally aggregated time series, and ahccd-trends for trends computed on the data.
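As an aside, the full catalogue of collections can be retrieved from the standard /collections endpoint; a minimal sketch:

import json

# List the identifiers of every collection exposed by the server.
curl = (host / "collections").with_query({"f": "json"})
with urllib.request.urlopen(url=str(curl)) as req:
    collections = json.load(req)
print([c["id"] for c in collections["collections"]])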

Now, unfortunately, the fields for these datasets are different from those of the climate stations… One strategy to find out which keywords are accepted is to make a query with no filter except limit=1. Another is to go to the collection search page (click on the link printed below) and inspect the column names.

# NBVAL_IGNORE_OUTPUT

# The url to query station metadata - this should behave similarly as `climate-stations`
ahccd_stations = host / "collections" / "ahccd-stations" / "items"
url = ahccd_stations.with_query({"limit": 1})
print(ahccd_stations)
with urllib.request.urlopen(url=str(url)) as req:
    gpd.read_file(filename=io.BytesIO(req.read()), engine="pyogrio")
https://api.weather.gc.ca//collections/ahccd-stations/items
id identifier__identifiant station_id__id_station station_name__nom_station measurement_type__type_mesure period__periode trend_value__valeur_tendance elevation__elevation province__province joined__rejoint year_range__annees start_date__date_debut end_date__date_fin geometry
0 2403053 2403053 2403053 PANGNIRTUNG A pressure_sea_level Ann None 22.6 NU 1 None 2014-12-01 2014-12-01 POINT (-65.70000 66.13000)

So if we want to see the stations in Yukon, we have to query with the province__province keyword… Now how do you know which code to use for provinces? One solution is to go again to the collection search page, zoom in on the area of interest, check the box “Only show items by map view”, then inspect the results.

# NBVAL_IGNORE_OUTPUT

url = ahccd_stations.with_query({"province__province": "YT"})
with urllib.request.urlopen(url=str(url)) as req:
    gpd.read_file(filename=io.BytesIO(req.read()), engine="pyogrio")
id identifier__identifiant station_id__id_station station_name__nom_station measurement_type__type_mesure period__periode trend_value__valeur_tendance elevation__elevation province__province joined__rejoint year_range__annees start_date__date_debut end_date__date_fin geometry
0 2100300 2100300 2100300 CARMACKS snow Ann NaN 525.00 YT 0 None 2008-02-01 2008-02-01 POINT (-136.30000 62.10000)
1 2100460 2100460 2100460 DRURY CREEK snow Ann NaN 609.00 YT 0 None 2009-04-01 2009-04-01 POINT (-134.39000 62.20190)
2 2100631 2100631 2100631 HAINES JUNCTION snow Ann NaN 596.00 YT 1 None 2008-09-01 2008-09-01 POINT (-137.50530 60.74950)
3 2101081 2101081 2101081 SWIFT RIVER snow Ann NaN 891.00 YT 0 None 2008-02-01 2008-02-01 POINT (-131.18330 60.00000)
4 2100184 2100184 2100184 BURWASH temp_mean Ann NaN 80.00 YT 1 None 2019-12-01 2019-12-01 POINT (-139.00000 61.30000)
5 2100301 2100301 2100301 CARMACKS temp_mean Ann NaN 54.00 YT 1 None 2019-12-01 2019-12-01 POINT (-136.20000 62.10000)
6 2100LRP 2100LRP 2100LRP DAWSON temp_mean Ann 2.34 37.00 YT 1 1901-2019 2019-12-01 2019-12-01 POINT (-139.10000 64.00000)
7 2100518 2100518 2100518 FARO_(AUT) temp_mean Ann NaN 71.00 YT 1 None 2019-12-01 2019-12-01 POINT (-133.30000 62.20000)
8 2100630 2100630 2100630 HAINES_JUNCTION temp_mean Ann NaN 59.00 YT 0 None 2019-12-01 2019-12-01 POINT (-137.50000 60.70000)
9 2100660 2100660 2100660 IVVAVIK_NAT_PARK temp_mean Ann NaN 24.00 YT 0 None 2019-12-01 2019-12-01 POINT (-140.10000 69.10000)
10 2100682 2100682 2100682 KOMAKUK_BEACH temp_mean Ann NaN 1.00 YT 1 None 2018-06-01 2018-06-01 POINT (-140.20000 69.60000)
11 2100693 2100693 2100693 MACMILLAN_PASS temp_mean Ann NaN 137.00 YT 0 None 2019-12-01 2019-12-01 POINT (-130.00000 63.20000)
12 2100701 2100701 2100701 MAYO temp_mean Ann NaN 50.00 YT 1 None 2019-12-01 2019-12-01 POINT (-135.80000 63.60000)
13 2100805 2100805 2100805 OLD_CROW temp_mean Ann NaN 25.00 YT 1 None 2019-12-01 2019-12-01 POINT (-139.80000 67.50000)
14 2100880 2100880 2100880 PELLY_RANCH temp_mean Ann NaN 44.00 YT 0 None 2017-08-01 2017-08-01 POINT (-137.30000 62.80000)
15 2100941 2100941 2100941 ROSS_RIVER temp_mean Ann NaN 69.00 YT 1 None 2008-08-01 2008-08-01 POINT (-132.40000 61.90000)
16 2100950 2100950 2100950 SHINGLE_POINT temp_mean Ann NaN 4.00 YT 0 None 2019-12-01 2019-12-01 POINT (-137.20000 68.90000)
17 2101102 2101102 2101102 TESLIN temp_mean Ann NaN 70.00 YT 1 None 2019-12-01 2019-12-01 POINT (-132.70000 60.10000)
18 2101135 2101135 2101135 TUCHITUA temp_mean Ann NaN 72.00 YT 0 None 2014-09-01 2014-09-01 POINT (-129.20000 60.90000)
19 2101204 2101204 2101204 WATSON_LAKE temp_mean Ann 1.53 68.00 YT 1 1939-2019 2019-12-01 2019-12-01 POINT (-128.80000 60.10000)
20 2101310 2101310 2101310 WHITEHORSE temp_mean Ann NaN 70.00 YT 1 None 2019-12-01 2019-12-01 POINT (-135.10000 60.70000)
21 2101303 2101303 2101303 WHITEHORSE_A temp_mean Ann 1.80 70.00 YT 1 1943-2019 2019-12-01 2019-12-01 POINT (-135.00000 60.70000)
22 2100100 2100100 2100100 AISHIHIK A pressure_sea_level Ann NaN 966.20 YT 0 None 1966-09-01 1966-09-01 POINT (-137.48000 61.65000)
23 2100160 2100160 2100160 BEAVER CREEK A pressure_sea_level Ann NaN 649.00 YT 1 None 2014-12-01 2014-12-01 POINT (-140.87000 62.41000)
24 2100400 2100400 2100400 DAWSON pressure_sea_level Ann NaN 320.00 YT 0 None 1976-01-01 1976-01-01 POINT (-139.43000 64.05000)
25 2100402 2100402 2100402 DAWSON A pressure_sea_level Ann NaN 370.30 YT 1 None 2014-12-01 2014-12-01 POINT (-139.13000 64.04000)
26 2100517 2100517 2100517 FARO A pressure_sea_level Ann NaN 716.60 YT 1 None 2014-12-01 2014-12-01 POINT (-133.38000 62.21000)
27 2100636 2100636 2100636 HERSCHEL ISLAND pressure_sea_level Ann NaN 1.20 YT 0 None 2014-12-01 2014-12-01 POINT (-138.91000 69.57000)
28 2100685 2100685 2100685 KOMAKUK BEACH A pressure_sea_level Ann NaN 7.30 YT 0 None 1993-06-01 1993-06-01 POINT (-140.18000 69.58000)
29 2100800 2100800 2100800 OLD CROW A pressure_sea_level Ann NaN 251.20 YT 1 None 2014-12-01 2014-12-01 POINT (-139.84000 67.57000)
30 2100935 2100935 2100935 ROCK RIVER pressure_sea_level Ann NaN 731.00 YT 0 None 2014-12-01 2014-12-01 POINT (-136.22000 66.98000)
31 2101000 2101000 2101000 SNAG A pressure_sea_level Ann NaN 586.70 YT 0 None 1966-08-01 1966-08-01 POINT (-140.40000 62.37000)
32 2100182 2100182 2100182 BURWASH A wind_speed Ann NaN 806.20 YT 1 None 2014-12-01 2014-12-01 POINT (-139.05000 61.36670)
33 2100700 2100700 2100700 MAYO A wind_speed Ann 0.00 503.80 YT 1 1953-2014 2014-12-01 2014-12-01 POINT (-135.86670 63.61670)
34 2101100 2101100 2101100 TESLIN A wind_speed Ann NaN 705.00 YT 1 None 2014-12-01 2014-12-01 POINT (-132.73590 60.17410)
35 2101200 2101200 2101200 WATSON LAKE A wind_speed Ann -0.72 687.35 YT 0 1953-2014 2014-11-01 2014-11-01 POINT (-128.82230 60.11650)
36 2101300 2101300 2101300 WHITEHORSE A wind_speed Ann -0.25 706.20 YT 1 1953-2014 2014-12-01 2014-12-01 POINT (-135.06880 60.70950)

Let’s pick the Dawson station (2100LRP), which seems to have a long record. Again, use the trick above to see which fields are accepted.

# NBVAL_IGNORE_OUTPUT

ahccd_mon = host / "collections" / "ahccd-monthly" / "items"
url = ahccd_mon.with_query({"station_id__id_station": "2100LRP"})
with urllib.request.urlopen(url=str(url)) as req:
    mts = gpd.read_file(filename=io.BytesIO(req.read()), engine="pyogrio")
mts.head()
id lat__lat lon__long identifier__identifiant station_id__id_station period_value__valeur_periode period_group__groupe_periode province__province date temp_mean__temp_moyenne ... rain_units__pluie_unites snow__neige snow_units__neige_unites pressure_sea_level__pression_niveau_mer pressure_sea_level_units__pression_niveau_mer_unite pressure_station__pression_station pressure_station_units__pression_station_unites wind_speed__vitesse_vent wind_speed_units__vitesse_vent_unites geometry
0 2100LRP.1983.03 64 -139.1 2100LRP.1983.03 2100LRP Mar Monthly YT 1983-03 -15.7 ... mm None mm None hPa None hPa None kph POINT (-139.10000 64.00000)
1 2100LRP.1921.03 64 -139.1 2100LRP.1921.03 2100LRP Mar Monthly YT 1921-03 -16.8 ... mm None mm None hPa None hPa None kph POINT (-139.10000 64.00000)
2 2100LRP.1992.06 64 -139.1 2100LRP.1992.06 2100LRP Jun Monthly YT 1992-06 14.2 ... mm None mm None hPa None hPa None kph POINT (-139.10000 64.00000)
3 2100LRP.1938.03 64 -139.1 2100LRP.1938.03 2100LRP Mar Monthly YT 1938-03 -13.7 ... mm None mm None hPa None hPa None kph POINT (-139.10000 64.00000)
4 2100LRP.1963.09 64 -139.1 2100LRP.1963.09 2100LRP Sep Monthly YT 1963-09 6.5 ... mm None mm None hPa None hPa None kph POINT (-139.10000 64.00000)

5 rows × 28 columns

Now let’s plot the mean temperature time series. Note that the server does not necessarily return a continuous time series, and when plotting a series with gaps, matplotlib simply draws a straight line between values. To convey the presence of missing values, we’ll use the asfreq("MS") method to fill in gaps in the time series with explicit missing values.

# Set the DataFrame index to a datetime object and sort it
mts.set_index(pd.to_datetime(mts["date"]), inplace=True)
mts.sort_index(inplace=True)

# Convert the temperature to a continuous monthly time series (so missing values are visible in the graphic)
tas = mts["temp_mean__temp_moyenne"].asfreq("MS")

# Mask the missing-value sentinel (large negative values are not real temperatures)
tas = tas.mask(tas < -300)

tas.plot(figsize=(12, 3))
plt.ylabel("Mean monthly temperature [°C]")
Text(0, 0.5, 'Mean monthly temperature [°C]')
../_images/d378846b969b61185f49c95312a57c226842f3e86ac29d686e6c4e3425ff016d.png