pycontrails.datalib.gfs.GFSForecast

class pycontrails.datalib.gfs.GFSForecast(time, variables, pressure_levels=-1, paths=None, grid=0.25, forecast_time=None, cachestore=<object object>, show_progress=False)

Bases: MetDataSource

GFS Forecast data access.

Parameters:
  • time (datalib.TimeInput) – The time range for data retrieval, either a single datetime or a (start, end) datetime range. Input must be a single datetime-like or a tuple of datetime-like (datetime, pandas.Timestamp, numpy.datetime64) specifying the (start, end) of the date range, inclusive. All times are downloaded from the single forecast model run nearest to the start time (see forecast_time). If None, paths must be defined and all time coordinates will be loaded from files.

  • variables (datalib.VariableInput) – Variable name (e.g. "temperature" or ["temperature", "relative_humidity"]). See pressure_level_variables for the list of available variables.

  • pressure_levels (datalib.PressureLevelInput, optional) – Pressure levels for data, in hPa (mbar). Set to [-1] to download surface level parameters. Defaults to [-1].

  • paths (str | list[str] | pathlib.Path | list[pathlib.Path] | None, optional) – Path to files to load manually. Can include glob patterns to load specific files. Defaults to None, which looks for files in the cachestore or GFS AWS bucket.

  • grid (float, optional) – Specify latitude/longitude grid spacing in data. Defaults to 0.25.

  • forecast_time (DatetimeLike, optional) – Specify the forecast run by its run time. If None (default), the forecast time is set to the 6 hour floor of the first timestep.

  • cachestore (cache.CacheStore | None, optional) – Cache data store for staging data files. Defaults to cache.DiskCacheStore. If None, caching is turned off.

  • show_progress (bool, optional) – Show progress when downloading files from the GFS AWS bucket. Defaults to False.

Examples

>>> from datetime import datetime
>>> from pycontrails.datalib.gfs import GFSForecast
>>> # Store data files to local disk (default behavior)
>>> times = (datetime(2022, 3, 22, 0), datetime(2022, 3, 22, 3))
>>> gfs = GFSForecast(times, variables="air_temperature", pressure_levels=[300, 250])
>>> gfs
GFSForecast
    Timesteps: ['2022-03-22 00', '2022-03-22 01', '2022-03-22 02', '2022-03-22 03']
    Variables: ['t']
    Pressure levels: [250, 300]
    Grid: 0.25
    Forecast time: 2022-03-22 00:00:00
>>> gfs = GFSForecast(times, variables="air_temperature", pressure_levels=[300, 250], grid=0.5)
>>> gfs
GFSForecast
    Timesteps: ['2022-03-22 00', '2022-03-22 03']
    Variables: ['t']
    Pressure levels: [250, 300]
    Grid: 0.5
    Forecast time: 2022-03-22 00:00:00
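
To retrieve surface parameters instead of pressure level data, set pressure_levels to [-1]. A minimal sketch; the variable name "surface_air_temperature" is an assumption, so check single_level_variables for the names actually available:

>>> # hypothetical single level variable name; see single_level_variables
>>> gfs_sl = GFSForecast(times, variables="surface_air_temperature", pressure_levels=[-1])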

Methods

__init__(time, variables[, pressure_levels, ...])

cache_dataset(dataset)

Cache data from data source.

create_cachepath(t)

Return cachepath to local data file based on datetime.

download(**xr_kwargs)

Confirm all data files are downloaded and available locally in the cachestore.

download_dataset(times)

Download data from data source for input times.

filename(t)

Construct GRIB filename to retrieve from the GFS bucket.

is_datafile_cached(t, **xr_kwargs)

Check datafile defined by datetime for variables and pressure levels in class.

list_timesteps_cached(**xr_kwargs)

Get a list of data files available locally in the cachestore.

list_timesteps_not_cached(**xr_kwargs)

Get a list of data files not available locally in the cachestore.

open_dataset(disk_paths, **xr_kwargs)

Open multi-file dataset in xarray.

open_metdataset([dataset, xr_kwargs])

Open MetDataset from data source.

set_metadata(ds)

Set met source metadata on ds.attrs.

Attributes

client

S3 client for accessing the GFS bucket.

grid

Lat / Lon grid spacing.

cachestore

Cache store for intermediates while processing the data source. If None, caching is turned off.

show_progress

Show progress bar when downloading files from AWS.

forecast_time

Base time of the previous GFS forecast run, based on the input times.

forecast_path

Construct forecast path in bucket for forecast_time.

hash

Generate a unique hash for this datasource.

is_single_level

Return True if the datasource is single level data.

paths

Path to local source files to load.

pressure_level_variables

GFS pressure level parameters.

pressure_levels

List of pressure levels.

single_level_variables

GFS surface level parameters.

supported_pressure_levels

Get pressure levels available.

supported_variables

Parameters available from data source.

timesteps

List of individual timesteps from the data source, derived from time. Use parse_time() to handle TimeInput.

variable_shortnames

Return a list of variable short names.

variable_standardnames

Return a list of variable standard names.

variables

Variables requested from the data source. Use parse_variables() to handle VariableInput.

cache_dataset(dataset)

Cache data from data source.

Parameters:

dataset (xarray.Dataset) – Dataset loaded from remote API or local files. The dataset must have the same format as the original data source API or files.
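
A sketch of caching a manually opened dataset; the file path below is hypothetical, and the file must match the original GFS format:

>>> import xarray as xr
>>> ds = xr.open_dataset("gfs_local.nc")  # hypothetical file in the source format
>>> gfs.cache_dataset(ds)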

cachestore

Cache store for intermediates while processing the data source. If None, caching is turned off.

client

S3 client for accessing the GFS bucket.

create_cachepath(t)

Return cachepath to local data file based on datetime.

Parameters:

t (datetime) – Datetime of datafile

Returns:

str – Path to cached data file
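
A sketch of inspecting the cache path for one timestep; the exact path returned depends on the configured cachestore:

>>> from datetime import datetime
>>> path = gfs.create_cachepath(datetime(2022, 3, 22, 0))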

download_dataset(times)

Download data from data source for input times.

Parameters:

times (list[datetime]) – List of datetimes to download and store in the cache.
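
This method is usually invoked indirectly. To stage all timesteps locally, the higher-level download() method can be used instead; a sketch (requires network access to the GFS AWS bucket):

>>> gfs.download()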

filename(t)

Construct GRIB filename to retrieve from the GFS bucket.

String template:

gfs.tCCz.pgrb2.GGGG.fFFF

  • CC is the model cycle runtime (e.g. 00, 06, 12, 18)

  • GGGG is the grid spacing

  • FFF is the forecast hour of the product, from 000 to 384

Parameters:

t (datetime) – Timestep to download

Returns:

str – Forecast filename to retrieve from the GFS bucket.
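
For example, with the 0.25 degree grid from the first example above, a 00z forecast time, and a timestep three hours later, the filename should look like the following; the "0p25" grid token is an assumption based on the NOAA naming convention:

>>> from datetime import datetime
>>> gfs.filename(datetime(2022, 3, 22, 3))
'gfs.t00z.pgrb2.0p25.f003'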

property forecast_path

Construct forecast path in bucket for forecast_time.

String template:

GFS_FORECAST_BUCKET/gfs.YYYYMMDD/HH/atmos/{filename}

Returns:

str – Bucket prefix for forecast files.
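
For the forecast time used in the examples above, the returned prefix would combine GFS_FORECAST_BUCKET with "gfs.20220322/00/atmos/"; a sketch:

>>> prefix = gfs.forecast_path  # e.g. ending in 'gfs.20220322/00/atmos/'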

forecast_time

Base time of the previous GFS forecast run, based on the input times.

grid

Lat / Lon grid spacing. One of [0.25, 0.5, 1].

property hash

Generate a unique hash for this datasource.

Returns:

str – Unique hash for met instance (SHA-1)

open_metdataset(dataset=None, xr_kwargs=None, **kwargs)

Open MetDataset from data source.

This method downloads or loads any required data files and returns a MetDataset of the multi-file dataset opened by xarray.

Parameters:
  • dataset (xr.Dataset | None, optional) – Input xr.Dataset loaded manually. The dataset must have the same format as the original data source API or files.

  • xr_kwargs (dict[str, Any] | None, optional) – Dictionary of keyword arguments passed into xarray.open_mfdataset() when opening files. Examples include “chunks”, “engine”, “parallel”, etc. Ignored if dataset is input.

  • **kwargs (Any) – Keyword arguments passed through directly into MetDataset constructor.

Returns:

MetDataset – Meteorology dataset
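
A sketch of end-to-end usage; the first call downloads files through the cachestore and requires network access:

>>> met = gfs.open_metdataset()
>>> temp = met["air_temperature"]  # MetDataArray keyed by standard name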

property pressure_level_variables

GFS pressure level parameters.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

set_metadata(ds)

Set met source metadata on ds.attrs.

This is called within the open_metdataset() method to set metadata on the returned MetDataset instance.

Parameters:

ds (xr.Dataset | MetDataset) – Dataset to set metadata on. Mutated in place.

show_progress

Show progress bar when downloading files from AWS.

property single_level_variables

GFS surface level parameters.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

property supported_pressure_levels

Get pressure levels available.

Returns:

list[int] – List of integer pressure level values
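
A quick membership check, consistent with the 250 hPa level requested in the examples above; the full set of levels depends on the GFS product:

>>> 250 in gfs.supported_pressure_levels
True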