pycontrails.datalib.gfs.GFSForecast

class pycontrails.datalib.gfs.GFSForecast(time, variables, pressure_levels=-1, paths=None, grid=0.25, forecast_time=None, cachestore=<object object>, show_progress=False, cache_download=False)

Bases: MetDataSource

GFS Forecast data access.

Parameters:
  • time (metsource.TimeInput) – The time range for data retrieval, either a single datetime or a (start, end) datetime range. Input must be a single datetime-like or a tuple of datetime-like (datetime, pandas.Timestamp, numpy.datetime64) specifying the (start, end) of the date range, inclusive. All times will be downloaded for a single forecast model run nearest to the start time (see forecast_time). If None, paths must be defined and all time coordinates will be loaded from files.

  • variables (metsource.VariableInput) – Variable name(s) (e.g. “temperature” or [“temperature”, “relative_humidity”]). See pressure_level_variables for the list of available variables.

  • pressure_levels (metsource.PressureLevelInput, optional) – Pressure levels for data, in hPa (mbar). Set to [-1] to download surface level parameters. Defaults to [-1].

  • paths (str | list[str] | pathlib.Path | list[pathlib.Path] | None, optional) – Path to files to load manually. Can include glob patterns to load specific files. Defaults to None, which looks for files in the cachestore or GFS AWS bucket.

  • grid (float, optional) – Specify latitude/longitude grid spacing in data. Defaults to 0.25.

  • forecast_time (DatetimeLike, optional) – Specify forecast run by runtime. If None (default), the forecast time is set to the 6 hour floor of the first timestep.

  • cachestore (cache.CacheStore | None, optional) – Cache data store for staging data files. Defaults to cache.DiskCacheStore. If None, cachestore is turned off.

  • show_progress (bool, optional) – Show progress when downloading files from GFS AWS Bucket. Defaults to False.

  • cache_download (bool, optional) – If True, cache downloaded grib files rather than storing them in a temporary file. By default, False.

Examples

>>> from datetime import datetime
>>> from pycontrails.datalib.gfs import GFSForecast
>>> # Store data files to local disk (default behavior)
>>> times = ("2022-03-22 00:00:00", "2022-03-22 03:00:00")
>>> gfs = GFSForecast(times, variables="air_temperature", pressure_levels=[300, 250])
>>> gfs
GFSForecast
    Timesteps: ['2022-03-22 00', '2022-03-22 01', '2022-03-22 02', '2022-03-22 03']
    Variables: ['t']
    Pressure levels: [250, 300]
    Grid: 0.25
    Forecast time: 2022-03-22 00:00:00
>>> gfs = GFSForecast(times, variables="air_temperature", pressure_levels=[300, 250], grid=0.5)
>>> gfs
GFSForecast
    Timesteps: ['2022-03-22 00', '2022-03-22 03']
    Variables: ['t']
    Pressure levels: [250, 300]
    Grid: 0.5
    Forecast time: 2022-03-22 00:00:00

__init__(time, variables, pressure_levels=-1, paths=None, grid=0.25, forecast_time=None, cachestore=<object object>, show_progress=False, cache_download=False)

Methods

__init__(time, variables[, pressure_levels, ...])

cache_dataset(dataset)

Cache data from data source.

create_cachepath(t)

Return cachepath to local data file based on datetime.

download(**xr_kwargs)

Confirm all data files are downloaded and available locally in the cachestore.

download_dataset(times)

Download data from data source for input times.

filename(t)

Construct grib filename to retrieve from GFS bucket.

is_datafile_cached(t, **xr_kwargs)

Check datafile defined by datetime for variables and pressure levels in class.

list_timesteps_cached(**xr_kwargs)

Get a list of data files available locally in the cachestore.

list_timesteps_not_cached(**xr_kwargs)

Get a list of data files not available locally in the cachestore.

open_dataset(disk_paths, **xr_kwargs)

Open multi-file dataset in xarray.

open_metdataset([dataset, xr_kwargs])

Open MetDataset from data source.

set_metadata(ds)

Set met source metadata on ds.attrs.

Attributes

client

S3 client for accessing the GFS bucket.

grid

Lat / Lon grid spacing.

cachestore

Cache store for intermediates while processing the data source. If None, the cache is turned off.

show_progress

Show progress bar when downloading files from AWS.

forecast_time

Base time of the previous GFS forecast, based on the input times.

cache_download

forecast_path

Construct forecast path in bucket for forecast_time.

hash

Generate a unique hash for this datasource.

is_single_level

Return True if the datasource is single level data.

paths

Path to local source files to load.

pressure_level_variables

GFS pressure level parameters.

pressure_levels

List of pressure levels.

single_level_variables

GFS surface level parameters.

supported_pressure_levels

Get pressure levels available.

supported_variables

Parameters available from data source.

timesteps

List of individual timesteps from the data source, derived from time. Use parse_time() to handle TimeInput.

variable_shortnames

Return a list of variable short names.

variable_standardnames

Return a list of variable standard names.

variables

Variables requested from the data source. Use parse_variables() to handle VariableInput.

cache_dataset(dataset)

Cache data from data source.

Parameters:

dataset (xarray.Dataset) – Dataset loaded from remote API or local files. The dataset must have the same format as the original data source API or files.
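
A usage sketch, not from the source docs: it assumes the gfs instance constructed in the Examples above and an xarray.Dataset ds opened manually in the original GFS file format.

>>> # hypothetical: ds must match the format of the original GFS files,
>>> # with the variables and pressure levels requested on this instance
>>> gfs.cache_dataset(ds)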

cache_download
cachestore

Cache store for intermediates while processing the data source. If None, the cache is turned off.

client

S3 client for accessing the GFS bucket.

create_cachepath(t)

Return cachepath to local data file based on datetime.

Parameters:

t (datetime) – Datetime of datafile

Returns:

str – Path to cached data file
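
A usage sketch, assuming the gfs instance from the Examples above; the returned path depends on the configured cachestore.

>>> from datetime import datetime
>>> # path where the 2022-03-22 03:00 datafile is (or will be) cached
>>> cachepath = gfs.create_cachepath(datetime(2022, 3, 22, 3))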

download(**xr_kwargs)

Confirm all data files are downloaded and available locally in the cachestore.

Parameters:

**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
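
A usage sketch, assuming the gfs instance from the Examples above.

>>> # fetch any timesteps not already present in the cachestore
>>> gfs.download()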

download_dataset(times)

Download data from data source for input times.

Parameters:

times (list[datetime]) – List of datetimes to download and store in the cache
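
A usage sketch, assuming the gfs instance from the Examples above; download() is usually the more convenient entry point since it skips timesteps already cached.

>>> from datetime import datetime
>>> gfs.download_dataset([datetime(2022, 3, 22, 0), datetime(2022, 3, 22, 3)])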

filename(t)

Construct grib filename to retrieve from GFS bucket.

String template:

gfs.tCCz.pgrb2.GGGG.fFFF

  • CC is the model cycle runtime (i.e. 00, 06, 12, 18)

  • GGGG is the grid spacing

  • FFF is the forecast hour of the product, from 000 to 384

Parameters:

t (datetime) – Timestep to download

Returns:

str – Forecast filename to retrieve from GFS bucket.
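
For example, with the 2022-03-22 00z run and 0.25 degree grid from the Examples above, the timestep three hours into the run maps to a name following the template; the exact grid token (e.g. 0p25) is an assumption here.

>>> from datetime import datetime
>>> fname = gfs.filename(datetime(2022, 3, 22, 3))
>>> # expected to look like 'gfs.t00z.pgrb2.0p25.f003' (grid token assumed)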

property forecast_path

Construct forecast path in bucket for forecast_time.

String template:

GFS_FORECAST_BUCKET/gfs.YYYYMMDD/HH/atmos/{filename}

Returns:

str – Bucket prefix for forecast files.
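
A sketch of reading this property for the 2022-03-22 00z run from the Examples above; the exact prefix is an assumption based on the template.

>>> prefix = gfs.forecast_path
>>> # expected to follow the template, e.g. '.../gfs.20220322/00/atmos/...'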

forecast_time

Base time of the previous GFS forecast, based on the input times.

grid

Lat / Lon grid spacing. One of [0.25, 0.5, 1].

property hash

Generate a unique hash for this datasource.

Returns:

str – Unique hash for met instance (sha1)

is_datafile_cached(t, **xr_kwargs)

Check datafile defined by datetime for variables and pressure levels in class.

If using a cloud cache store (e.g. cache.GCPCacheStore), this is where the datafile will be mirrored to a local file for access.

Parameters:
  • t (datetime) – Datetime of datafile

  • **xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:

    • chunks: {“time”: 1}

    • engine: “netcdf4”

    • parallel: False

Returns:

bool – True if data file exists for datetime with all variables and pressure levels, False otherwise
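
A usage sketch, assuming the gfs instance from the Examples above.

>>> from datetime import datetime
>>> # True only once the 00:00 datafile (all variables and levels) is cached
>>> gfs.is_datafile_cached(datetime(2022, 3, 22, 0))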

property is_single_level

Return True if the datasource is single level data.

Added in version 0.50.0.

list_timesteps_cached(**xr_kwargs)

Get a list of data files available locally in the cachestore.

Parameters:

**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().

list_timesteps_not_cached(**xr_kwargs)

Get a list of data files not available locally in the cachestore.

Parameters:

**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
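
A sketch of inspecting cache state before downloading, assuming the gfs instance from the Examples above.

>>> cached = gfs.list_timesteps_cached()
>>> missing = gfs.list_timesteps_not_cached()
>>> gfs.download()  # fetches whatever is still missing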

open_dataset(disk_paths, **xr_kwargs)

Open multi-file dataset in xarray.

Parameters:
  • disk_paths (str | list[str] | pathlib.Path | list[pathlib.Path]) – Path(s) to local files to open.

  • **xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:

    • chunks: {“time”: 1}

    • engine: “netcdf4”

    • parallel: False

    • lock: False

Returns:

xarray.Dataset – Open xarray dataset
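
A sketch of opening already-cached files directly, assuming the gfs instance from the Examples above; open_metdataset() is the higher-level entry point.

>>> gfs.download()
>>> disk_paths = [gfs.create_cachepath(t) for t in gfs.timesteps]
>>> ds = gfs.open_dataset(disk_paths, chunks={"time": 1})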

open_metdataset(dataset=None, xr_kwargs=None, **kwargs)

Open MetDataset from data source.

This method downloads or loads any required datafiles and returns a MetDataset of the multi-file dataset opened by xarray.

Parameters:
  • dataset (xr.Dataset | None, optional) – Input xr.Dataset loaded manually. The dataset must have the same format as the original data source API or files.

  • xr_kwargs (dict[str, Any] | None, optional) – Dictionary of keyword arguments passed into xarray.open_mfdataset() when opening files. Examples include “chunks”, “engine”, “parallel”, etc. Ignored if dataset is provided.

  • **kwargs (Any) – Keyword arguments passed through directly into MetDataset constructor.

Returns:

MetDataset – Meteorology dataset
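
Typical end-to-end usage, assuming the gfs instance from the Examples above; the xr_kwargs shown are optional overrides of the defaults listed under open_dataset().

>>> met = gfs.open_metdataset()
>>> # or tune how xarray opens the underlying files
>>> met = gfs.open_metdataset(xr_kwargs={"parallel": True, "chunks": {"time": 1}})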

paths

Path to local source files to load. Set to the paths of files cached in cachestore if no paths input is provided on init.

property pressure_level_variables

GFS pressure level parameters.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

pressure_levels

List of pressure levels. Set to [-1] for data without level coordinate. Use parse_pressure_levels() to handle PressureLevelInput.

set_metadata(ds)

Set met source metadata on ds.attrs.

This is called within the open_metdataset() method to set metadata on the returned MetDataset instance.

Parameters:

ds (xr.Dataset | MetDataset) – Dataset to set metadata on. Mutated in place.

show_progress

Show progress bar when downloading files from AWS.

property single_level_variables

GFS surface level parameters.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

property supported_pressure_levels

Get pressure levels available.

Returns:

list[int] – List of integer pressure level values

property supported_variables

Parameters available from data source.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

timesteps

List of individual timesteps from the data source, derived from time. Use parse_time() to handle TimeInput.

property variable_shortnames

Return a list of variable short names.

Returns:

list[str] – List of variable short names.

property variable_standardnames

Return a list of variable standard names.

Returns:

list[str] – List of variable standard names.
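
For the air_temperature request in the Examples above, these properties are expected to resolve roughly as follows; the short name 't' matches the repr shown in the Examples, while the standard name round-trip is an assumption.

>>> gfs.variable_shortnames
['t']
>>> gfs.variable_standardnames
['air_temperature']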

variables

Variables requested from the data source. Use parse_variables() to handle VariableInput.