Working with the ECCC GeoAPI to access weather station data

Environment and Climate Change Canada (ECCC) hosts a data server compatible with the GeoAPI standard. This notebook shows how to send requests for daily climate station data and display the results.

Climate stations

The server holds different collections, and requests are made to a particular collection. Here we’ll start with the climate-stations collection, which holds metadata about available stations, but no actual meteorological data. Useful queryable fields in this collection include DLY_FIRST_DATE and DLY_LAST_DATE, ENG_PROV_NAME, LATITUDE, LONGITUDE, ELEVATION and STATION_NAME, among many others.
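OGC API servers typically also describe the fields you can filter on at a queryables endpoint under each collection. As a sketch (the exact layout of the response may vary between server versions), you could build and inspect that URL; the fetch itself is left commented out:

```python
import json
import urllib.request

# Hypothetical sketch: OGC API servers usually expose the filterable fields
# of a collection at a /queryables endpoint.
base = "https://api.weather.gc.ca"
queryables_url = f"{base}/collections/climate-stations/queryables"
print(queryables_url)

# Uncomment to fetch and list the queryable property names:
# with urllib.request.urlopen(queryables_url) as resp:
#     props = json.load(resp).get("properties", {})
# print(sorted(props))
```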

Creating a request to the server for data

Let’s start by showing a map of all available station locations in Nova Scotia. To do so, we first need to compose a URL request. The request includes the address of the server, the collection, then a query to filter results.

import os

os.environ["USE_PYGEOS"] = "0"  # force use Shapely with GeoPandas

import urllib

import geopandas as gpd
from urlpath import URL

# Compose the request
host = URL("https://api.weather.gc.ca")
climate_stations = host / "collections" / "climate-stations" / "items"
url = climate_stations.with_query({"ENG_PROV_NAME": "NOVA-SCOTIA"})
print(url)

# Send the request to the server
resp = url.get()
resp
https://api.weather.gc.ca/collections/climate-stations/items?ENG_PROV_NAME=NOVA-SCOTIA
<Response [200]>

The response from the server is an instance of the Response class. What we’re interested in is the content of this response, which in this case is a GeoJSON document.

# NBVAL_IGNORE_OUTPUT

resp.content[:100]
b'{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"STN_ID":6347,"STATION_NAME"'

We’ll open the GeoJSON using geopandas. We have a few options to do this:

  • Load the response’s content using json.loads, then create a GeoDataFrame using the from_features class method;

  • Save the response content to a file on disk, then open using geopandas.read_file;

  • Wrap the response content in an in-memory file object using io.BytesIO;

  • Call geopandas.read_file(url) to let geopandas handle the data download.

Here we’ll use the last option, as it’s the simplest. Note that the first method ignores the feature id, which seems to cause problems with the folium visualization below.
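If you do need the first approach, one workaround for the missing id is to copy it into each feature’s properties before building the GeoDataFrame, since from_features reads only the properties and geometry. A minimal sketch, using a hypothetical one-station payload in place of the server response:

```python
import json

# Hypothetical FeatureCollection mimicking the server response.
payload = b'''{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "id": "8201766",
     "properties": {"STN_ID": 6347, "STATION_NAME": "FARMINGTON"},
     "geometry": {"type": "Point", "coordinates": [-64.66667, 44.63333]}}
  ]
}'''

data = json.loads(payload)

# from_features drops the top-level "id", so copy it into the properties
# first if you need it (e.g. for folium):
for feat in data["features"]:
    feat["properties"]["id"] = feat.get("id")

ids = [f["properties"]["id"] for f in data["features"]]
print(ids)  # ['8201766']
# Then: stations = gpd.GeoDataFrame.from_features(data["features"])
```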

# NBVAL_IGNORE_OUTPUT

# The first approach would look like this:
# import json
# stations = gpd.GeoDataFrame.from_features(json.loads(resp.content))

with urllib.request.urlopen(url=str(url)) as req:
    stations = gpd.read_file(filename=req, engine="pyogrio")
stations.head()
id STN_ID STATION_NAME PROV_STATE_TERR_CODE ENG_PROV_NAME FRE_PROV_NAME COUNTRY LATITUDE LONGITUDE TIMEZONE ... HLY_FIRST_DATE HLY_LAST_DATE DLY_FIRST_DATE DLY_LAST_DATE MLY_FIRST_DATE MLY_LAST_DATE HAS_MONTHLY_SUMMARY HAS_NORMALS_DATA HAS_HOURLY_DATA geometry
0 8201766 6347 FARMINGTON NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 443800000 -644000000 AST ... NaT NaT 1982-07-01 2003-08-31 1982-01-01 2003-08-01 Y Y N POINT (-64.66667 44.63333)
1 8201850 6349 FRASER BROOK IHD NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 452000000 -631000000 AST ... NaT NaT 1965-12-01 1977-12-31 1966-01-01 1977-12-01 Y N N POINT (-63.16667 45.33333)
2 8201950 6352 GLENORA FALLS NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 460700000 -612200000 AST ... NaT NaT 1954-12-01 1961-12-31 1954-01-01 1961-12-01 Y N N POINT (-61.36667 46.11667)
3 8202550 6369 INVERNESS (AUT) NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 461400000 -611800000 AST ... 1977-01-01 1986-12-31 23:00:00 1973-01-01 1975-03-31 1973-01-01 1973-12-01 Y N N POINT (-61.30000 46.23333)
4 8202565 6370 JACKSON NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 453500000 -635000000 AST ... NaT NaT 1982-07-01 2004-11-30 1982-01-01 2004-11-01 Y N N POINT (-63.83333 45.58333)

5 rows × 34 columns

Filter stations

Now let’s say we want to filter the stations that were in operation for at least 50 years. What we’ll do is create a new column n_days and filter on it.

# NBVAL_IGNORE_OUTPUT

import pandas as pd

# Create a datetime.Timedelta object from the subtraction of two dates.
delta = pd.to_datetime(stations["DLY_LAST_DATE"]) - pd.to_datetime(
    stations["DLY_FIRST_DATE"]
)

# Get the number of days in the time delta
stations["n_days"] = delta.dt.days

# Compute condition
over_50 = stations["n_days"] > 50 * 365.25

# Index the data frame using the condition
select = stations[over_50]
select.head()
id STN_ID STATION_NAME PROV_STATE_TERR_CODE ENG_PROV_NAME FRE_PROV_NAME COUNTRY LATITUDE LONGITUDE TIMEZONE ... HLY_LAST_DATE DLY_FIRST_DATE DLY_LAST_DATE MLY_FIRST_DATE MLY_LAST_DATE HAS_MONTHLY_SUMMARY HAS_NORMALS_DATA HAS_HOURLY_DATA geometry n_days
8 8203400 6399 MALAY FALLS NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 445900000 -622900000 AST ... 2000-08-31 23:00:00 1950-02-01 2000-08-31 1950-01-01 2000-08-01 Y N N POINT (-62.48333 44.98333) 18474
24 8205698 6485 SYDNEY NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 460900000 -601200000 AST ... NaT 1870-01-01 1941-03-31 1870-01-01 1941-12-01 Y N N POINT (-60.20000 46.15000) 26021
29 8206300 6506 WHITEHEAD NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 451300000 -611100000 AST ... NaT 1883-12-01 1960-06-30 1883-01-01 1960-12-01 Y N N POINT (-61.18333 45.21667) 27970
30 8206440 6513 WOLFVILLE NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 450600000 -642200000 AST ... NaT 1870-09-01 1949-09-30 1870-01-01 1949-12-01 Y N N POINT (-64.36667 45.10000) 28883
43 8200100 6289 ANNAPOLIS ROYAL NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 444500000 -653100000 AST ... NaT 1914-04-01 2007-06-04 1914-01-01 2006-02-01 Y N N POINT (-65.51667 44.75000) 34032

5 rows × 35 columns

Map the data

We can then simply map the locations of stations with at least 50 years of data using the explore method. This displays an interactive base map and overlays the station locations; clicking on a station marker shows that station’s information.

On top of this map, we’ll add controls to draw a rectangle. To use the drawing tool, click on the square in the left-hand menu, then click and drag to draw a rectangle over the area of interest. Once that’s done, click on the Export button on the right of the map. This will download a file called data.geojson.

from folium.plugins import Draw

# Add control to draw a rectangle, and an export button.
draw_control = Draw(
    draw_options={
        "polyline": False,
        "poly": False,
        "circle": False,
        "polygon": False,
        "marker": False,
        "circlemarker": False,
        "rectangle": True,
    },
    export=True,
)

# The map library Folium chokes on columns including time stamps, so we first select the data to plot.
m = select[["geometry", "n_days"]].explore("n_days")
draw_control.add_to(m)
m

Filter stations using bounding box

Next, we’ll use the bounding box drawn on the map to select a subset of stations. We first open the data.geojson file downloaded to disk, create a shapely object, and use it to filter the stations.

# NBVAL_IGNORE_OUTPUT

# Adjust directory if running this locally.
# rect = gpd.read_file("~/Downloads/data.geojson")

# Here we're using an existing file so the notebook runs without user interaction.
rect = gpd.read_file(filename="./data.geojson", engine="pyogrio")

# Filter stations DataFrame using bbox
inbox = select.within(rect.loc[0].geometry)

print("Number of stations within subregion: ", sum(inbox))
sub_select = select[inbox]
sub_select.head()
Number of stations within subregion:  9
id STN_ID STATION_NAME PROV_STATE_TERR_CODE ENG_PROV_NAME FRE_PROV_NAME COUNTRY LATITUDE LONGITUDE TIMEZONE ... HLY_LAST_DATE DLY_FIRST_DATE DLY_LAST_DATE MLY_FIRST_DATE MLY_LAST_DATE HAS_MONTHLY_SUMMARY HAS_NORMALS_DATA HAS_HOURLY_DATA geometry n_days
24 8205698 6485 SYDNEY NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 460900000 -601200000 AST ... NaT 1870-01-01 1941-03-31 1870-01-01 1941-12-01 Y N N POINT (-60.20000 46.15000) 26021
29 8206300 6506 WHITEHEAD NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 451300000 -611100000 AST ... NaT 1883-12-01 1960-06-30 1883-01-01 1960-12-01 Y N N POINT (-61.18333 45.21667) 27970
52 8201410 6336 DEMING NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 451259007 -611040090 AST ... NaT 1956-10-01 2011-12-31 1956-01-01 2006-02-01 Y Y N POINT (-61.17780 45.21639) 20179
140 8205600 6481 STILLWATER NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 451100000 -620000000 AST ... NaT 1915-12-01 1979-10-31 1915-01-01 1979-12-01 Y N N POINT (-62.00000 45.18333) 23345
150 8201000 6329 COLLEGEVILLE NS NOVA SCOTIA NOUVELLE-ÉCOSSE CAN 452900000 -620100000 AST ... NaT 1916-06-01 2016-09-30 1916-01-01 2006-02-01 Y Y N POINT (-62.01667 45.48333) 36646

5 rows × 35 columns

Request meteorological data

Now we’ll make a request for actual meteorological data from the stations filtered above. For this, we’ll use the Daily Climate Observations collection (climate-daily). Here we’re picking just one station, but we could easily loop over each station.

# NBVAL_IGNORE_OUTPUT

coll = host / "collections" / "climate-daily" / "items"
station_id = "8201410"

# Restricting the number of entries returned to keep things fast.
url = str(coll.with_query({"CLIMATE_IDENTIFIER": station_id, "limit": 365}))
print("Request: ", url)
with urllib.request.urlopen(url=str(url)) as req:
    data = gpd.read_file(filename=req, engine="pyogrio")
data.head()
Request:  https://api.weather.gc.ca/collections/climate-daily/items?CLIMATE_IDENTIFIER=8201410&limit=365
id MAX_REL_HUMIDITY MAX_TEMPERATURE_FLAG SNOW_ON_GROUND_FLAG SNOW_ON_GROUND DIRECTION_MAX_GUST_FLAG LOCAL_DAY MAX_TEMPERATURE HEATING_DEGREE_DAYS_FLAG TOTAL_RAIN_FLAG ... MIN_REL_HUMIDITY_FLAG TOTAL_SNOW_FLAG TOTAL_RAIN MIN_TEMPERATURE_FLAG MIN_REL_HUMIDITY TOTAL_SNOW TOTAL_PRECIPITATION STATION_NAME SPEED_MAX_GUST_FLAG geometry
0 8201410.1996.6.13 None None None 0.0 None 13 12.5 None None ... None None 0.0 None None 0.0 0.0 DEMING None POINT (-61.17780 45.21639)
1 8201410.1993.12.17 None None None 0.0 None 17 0.5 None None ... None None 0.0 None None 0.0 0.0 DEMING None POINT (-61.17780 45.21639)
2 8201410.1967.11.6 None None None 0.0 None 6 8.9 None None ... None None 0.0 None None 0.0 0.0 DEMING None POINT (-61.17780 45.21639)
3 8201410.2003.1.14 None None None 45.0 None 14 -3.5 None None ... None None 0.0 None None 0.0 0.0 DEMING None POINT (-61.17780 45.21639)
4 8201410.2009.1.30 None None None 5.0 None 30 -1.0 None None ... None None 0.0 None None 0.0 0.0 DEMING None POINT (-61.17780 45.21639)

5 rows × 36 columns
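The loop over stations mentioned above could be sketched as follows. This is a hypothetical example: the station IDs are hard-coded here for illustration (in practice you would take them from sub_select), and the actual fetch is left commented out so the sketch runs offline:

```python
# Hypothetical sketch: build one climate-daily request per station.
base = "https://api.weather.gc.ca/collections/climate-daily/items"
station_ids = ["8201410", "8205698", "8206300"]  # e.g. sub_select["id"]

urls = [f"{base}?CLIMATE_IDENTIFIER={sid}&limit=365" for sid in station_ids]
for u in urls:
    print(u)

# Uncomment to fetch and concatenate the results:
# import urllib.request
# import geopandas as gpd
# import pandas as pd
# frames = []
# for u in urls:
#     with urllib.request.urlopen(u) as req:
#         frames.append(gpd.read_file(filename=req, engine="pyogrio"))
# data = pd.concat(frames, ignore_index=True)
```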

We can also send a request for data inside a bounding box at a specific date.

bbox = rect.iloc[0].geometry.bounds
print("Bounding box: ", bbox)
url = str(
    coll.with_query(
        {
            "bbox": str(bbox).strip("()"),
            "LOCAL_DATE": "2000-01-01 00:00:00",
            "limit": 100,
        }
    )
)
with urllib.request.urlopen(url=str(url)) as req:
    snapshot = gpd.read_file(filename=req, engine="pyogrio")
Bounding box:  (-62.186675, 44.78125, -59.123882, 47.53125)
import cartopy
from cartopy import crs as ccrs
from matplotlib import pyplot as plt

# Create map projection
proj = ccrs.PlateCarree()

# If using another projection, remember you'll need to reproject the snapshot's coordinates.
# snapshot.to_crs(proj, inplace=True)

# Create figure and axes
fig = plt.figure(figsize=(5, 3))
ax = fig.add_subplot(projection=proj)

# Set the map extent to the bounding box and draw the coastlines
ax.set_extent([bbox[0], bbox[2], bbox[1], bbox[3]])
ax.coastlines()

# Plot markers color-coded according to the temperature recorded.
ax = snapshot.plot(column="MEAN_TEMPERATURE", ax=ax, cmap=plt.cm.viridis, legend=True)

# Add a label to the colorbar
cax = ax.figure.axes[-1]
cax.set_ylabel("Mean temperature [°C]")
Text(448.07777777777784, 0.5, 'Mean temperature [°C]')
[Figure: map of stations inside the bounding box, color-coded by mean temperature]

Another useful filter is on dates and times. If we only want data within a given period, we simply create a request with the datetime argument, using a / to separate the start and end dates. You may leave the start or end date open-ended by using .. instead of a datetime string.

url = str(
    coll.with_query(
        {
            "CLIMATE_IDENTIFIER": station_id,
            "datetime": "1990-01-01 00:00:00/1991-01-01 00:00:00",
        }
    )
)
print(url)
with urllib.request.urlopen(url=str(url)) as req:
    gdf = gpd.read_file(filename=req, engine="pyogrio")
https://api.weather.gc.ca/collections/climate-daily/items?CLIMATE_IDENTIFIER=8201410&datetime=1990-01-01+00%3A00%3A00%2F1991-01-01+00%3A00%3A00
# Convert the datetime string to a datetime object
gdf["LOCAL_DATE"] = pd.to_datetime(gdf["LOCAL_DATE"])

# Create a time series out of the column for mean temperature
ts = gdf.set_index("LOCAL_DATE")["MEAN_TEMPERATURE"]
# Plot the time series
ax = ts.plot()
ax.set_xlabel("Time")
ax.set_ylabel("Mean temperature [°C]")
ax.set_title(gdf.iloc[0]["STATION_NAME"])
plt.show()
[Figure: daily mean temperature time series at DEMING, 1990]

Adjusted and Homogenized Canadian Climate Data (AHCCD)

The Adjusted and Homogenized Canadian Climate Data (AHCCD) datasets from ECCC are climate station data adjusted to account for discontinuities in the record, such as instrument relocation. The collections related to these datasets are ahccd-stations for station metadata, ahccd-annual, ahccd-monthly and ahccd-seasonal for temporally aggregated time series, and ahccd-trends for trends computed on the data.

Unfortunately, the fields for these datasets differ from those of the climate stations. One strategy to find out which keywords are accepted is to make a query with no filter other than limit=1. Another is to go to the collection search page (click on the link printed below) and inspect the column names.

# NBVAL_IGNORE_OUTPUT

# The url to query station metadata - this should behave similarly as `climate-stations`
ahccd_stations = host / "collections" / "ahccd-stations" / "items"
url = ahccd_stations.with_query({"limit": 1})
print(ahccd_stations)
with urllib.request.urlopen(url=str(url)) as req:
    gpd.read_file(filename=req, engine="pyogrio")
https://api.weather.gc.ca/collections/ahccd-stations/items
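The limit=1 trick works because the property names of a single returned feature are exactly the field names the collection accepts. A sketch of extracting them with the standard library; here a small mock payload (with two field names taken from the queries below) stands in for the server’s response:

```python
import json

# Mock stand-in for one feature returned by the ahccd-stations collection;
# only the two properties used later in this notebook are included.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"province__province": "YT",
                        "station_id__id_station": "2100LRP"}}
    ],
}

# The sorted property names are the accepted query field names.
fields = sorted(sample["features"][0]["properties"])
print(fields)  # ['province__province', 'station_id__id_station']
```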

So if we want to see the stations in Yukon, we’d have to query with the province__province keyword. Now how do you know what code to use for provinces? One solution is to go back to the collection search page, zoom in on the area of interest, check the “Only show items by map view” box, then inspect the results.

# NBVAL_IGNORE_OUTPUT

url = ahccd_stations.with_query({"province__province": "YT"})
with urllib.request.urlopen(url=str(url)) as req:
    gpd.read_file(filename=req, engine="pyogrio")

Let’s pick the Dawson station (2100LRP), which seems to have a long record. Again, use the trick above to see which fields are accepted.

# NBVAL_IGNORE_OUTPUT

ahccd_mon = host / "collections" / "ahccd-monthly" / "items"
url = ahccd_mon.with_query({"station_id__id_station": "2100LRP"})
with urllib.request.urlopen(url=str(url)) as req:
    mts = gpd.read_file(filename=req, engine="pyogrio")
mts.head()
id wind_speed__vitesse_vent temp_min_units__temp_min_unites wind_speed_units__vitesse_vent_unites snow__neige temp_max_units__temp_max_unites province__province total_precip__precip_totale pressure_sea_level_units__pression_niveau_mer_unite temp_mean__temp_moyenne ... period_group__groupe_periode rain__pluie snow_units__neige_unites lon__long pressure_station_units__pression_station_unites pressure_station__pression_station temp_max__temp_max temp_min__temp_min station_id__id_station geometry
0 2100LRP.1983.03 None C kph None C YT None hPa -15.7 ... Monthly None mm -139.13 hPa None -6.8 -24.5 2100LRP POINT (-139.13000 64.07000)
1 2100LRP.1921.03 None C kph None C YT None hPa -16.8 ... Monthly None mm -139.13 hPa None -8.8 -24.7 2100LRP POINT (-139.13000 64.07000)
2 2100LRP.1992.06 None C kph None C YT None hPa 14.2 ... Monthly None mm -139.13 hPa None 22.1 6.4 2100LRP POINT (-139.13000 64.07000)
3 2100LRP.1938.03 None C kph None C YT None hPa -13.7 ... Monthly None mm -139.13 hPa None -4.5 -22.9 2100LRP POINT (-139.13000 64.07000)
4 2100LRP.1963.09 None C kph None C YT None hPa 6.5 ... Monthly None mm -139.13 hPa None 12.1 0.9 2100LRP POINT (-139.13000 64.07000)

5 rows × 28 columns

Now let’s plot the mean temperature time series. Note that the server does not necessarily return a continuous time series, and when plotting a series with gaps, matplotlib simply draws a straight line between values. To convey the presence of missing values, here we’ll use the asfreq("MS") method to fill in gaps in the time series with explicit missing values.

# Set the DataFrame index to a datetime object and sort it
mts.set_index(pd.to_datetime(mts["date"]), inplace=True)
mts.sort_index(inplace=True)

# Convert the temperature to a continuous monthly time series (so missing values are visible in the graphic)
tas = mts["temp_mean__temp_moyenne"].asfreq("MS")

# Mask missing values
tas.mask(tas < -300, inplace=True)

tas.plot(figsize=(12, 3))
plt.ylabel("Mean monthly temperature [°C]")
Text(0, 0.5, 'Mean monthly temperature [°C]')
[Figure: monthly mean temperature time series at the Dawson station (2100LRP)]