pycontrails.datalib.gfs.GFSForecast¶
- class pycontrails.datalib.gfs.GFSForecast(time, variables, pressure_levels=-1, paths=None, grid=0.25, forecast_time=None, cachestore=<object object>, show_progress=False, cache_download=False)¶
Bases: MetDataSource
GFS Forecast data access.
- Parameters:
time (metsource.TimeInput) – The time range for data retrieval, either a single datetime or a (start, end) datetime range. Input must be a single datetime-like or a tuple of datetime-like (datetime, pandas.Timestamp, numpy.datetime64) specifying the (start, end) of the date range, inclusive. All times will be downloaded for a single forecast model run nearest to the start time (see forecast_time). If None, paths must be defined and all time coordinates will be loaded from files.
variables (metsource.VariableInput) – Variable name (e.g. "temperature" or ["temperature", "relative_humidity"]). See pressure_level_variables for the list of available variables.
pressure_levels (metsource.PressureLevelInput, optional) – Pressure levels for data, in hPa (mbar). Set to [-1] to download surface-level parameters. Defaults to [-1].
paths (str | list[str] | pathlib.Path | list[pathlib.Path] | None, optional) – Path to files to load manually. Can include glob patterns to load specific files. Defaults to None, which looks for files in the cachestore or the GFS AWS bucket.
grid (float, optional) – Latitude/longitude grid spacing of the data. Defaults to 0.25.
forecast_time (DatetimeLike, optional) – Specify the forecast run by its runtime. If None (default), the forecast time is set to the 6 hour floor of the first timestep.
cachestore (cache.CacheStore | None, optional) – Cache data store for staging data files. Defaults to cache.DiskCacheStore. If None, the cachestore is turned off.
show_progress (bool, optional) – Show progress when downloading files from the GFS AWS bucket. Defaults to False.
cache_download (bool, optional) – If True, cache downloaded grib files rather than storing them in a temporary file. By default, False.
Examples
>>> from datetime import datetime
>>> from pycontrails.datalib.gfs import GFSForecast

>>> # Store data files to local disk (default behavior)
>>> times = ("2022-03-22 00:00:00", "2022-03-22 03:00:00")
>>> gfs = GFSForecast(times, variables="air_temperature", pressure_levels=[300, 250])
>>> gfs
GFSForecast
    Timesteps: ['2022-03-22 00', '2022-03-22 01', '2022-03-22 02', '2022-03-22 03']
    Variables: ['t']
    Pressure levels: [250, 300]
    Grid: 0.25
    Forecast time: 2022-03-22 00:00:00

>>> gfs = GFSForecast(times, variables="air_temperature", pressure_levels=[300, 250], grid=0.5)
>>> gfs
GFSForecast
    Timesteps: ['2022-03-22 00', '2022-03-22 03']
    Variables: ['t']
    Pressure levels: [250, 300]
    Grid: 0.5
    Forecast time: 2022-03-22 00:00:00
Methods
- __init__(time, variables[, pressure_levels, ...])
- cache_dataset(dataset) – Cache data from data source.
- create_cachepath(t) – Return cachepath to local data file based on datetime.
- download(**xr_kwargs) – Confirm all data files are downloaded and available locally in the cachestore.
- download_dataset(times) – Download data from data source for input times.
- filename(t) – Construct grib filename to retrieve from GFS bucket.
- is_datafile_cached(t, **xr_kwargs) – Check datafile defined by datetime for variables and pressure levels in class.
- list_timesteps_cached(**xr_kwargs) – Get a list of data files available locally in the cachestore.
- list_timesteps_not_cached(**xr_kwargs) – Get a list of data files not available locally in the cachestore.
- open_dataset(disk_paths, **xr_kwargs) – Open multi-file dataset in xarray.
- open_metdataset([dataset, xr_kwargs]) – Open MetDataset from data source.
- set_metadata(ds) – Set met source metadata on ds.attrs.
Attributes
- client – S3 client for accessing GFS bucket.
- grid – Lat / Lon grid spacing.
- cachestore – Cache store for intermediates while processing data source. If None, cache is turned off.
- show_progress – Show progress bar when downloading files from AWS.
- forecast_time – Base time of the previous GFS forecast based on input times.
- forecast_path – Construct forecast path in bucket for forecast_time.
- hash – Generate a unique hash for this datasource.
- is_single_level – Return True if the datasource is single level data.
- paths – Path to local source files to load.
- pressure_level_variables – GFS pressure level parameters.
- pressure_levels – List of pressure levels.
- single_level_variables – GFS surface level parameters.
- supported_pressure_levels – Get pressure levels available.
- supported_variables – Parameters available from data source.
- timesteps – List of individual timesteps from data source derived from time. Use parse_time() to handle TimeInput.
- variable_shortnames – Return a list of variable short names.
- variable_standardnames – Return a list of variable standard names.
- variables – Variables requested from data source. Use parse_variables() to handle VariableInput.
- cache_dataset(dataset)¶
Cache data from data source.
- Parameters:
dataset (xarray.Dataset) – Dataset loaded from remote API or local files. The dataset must have the same format as the original data source API or files.
- cache_download¶
- cachestore¶
Cache store for intermediates while processing data source. If None, cache is turned off.
- client¶
S3 client for accessing GFS bucket.
- create_cachepath(t)¶
Return cachepath to local data file based on datetime.
- Parameters:
t (datetime) – Datetime of datafile.
- Returns:
str – Path to cached data file.
- download_dataset(times)¶
Download data from data source for input times.
- Parameters:
times (list[datetime]) – List of datetimes to download and store in cache.
- filename(t)¶
Construct grib filename to retrieve from GFS bucket.
String template: gfs.tCCz.pgrb2.GGGG.fFFF
- CC is the model cycle runtime (i.e. 00, 06, 12, 18)
- GGGG is the grid spacing
- FFF is the forecast hour of product, from 000 to 384
- Parameters:
t (datetime) – Timestep to download.
- Returns:
str – Forecast filename to retrieve from GFS bucket.
- property forecast_path¶
Construct forecast path in bucket for forecast_time.
String template: GFS_FORECAST_BUCKET/gfs.YYYYMMDD/HH/atmos/
- Returns:
str – Bucket prefix for forecast files.
- forecast_time¶
Base time of the previous GFS forecast based on input times.
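The 6 hour flooring of forecast_time and the bucket-prefix template can be sketched as below. The function names are hypothetical, and GFS_FORECAST_BUCKET is left as a placeholder standing in for the library's actual bucket constant.

```python
from datetime import datetime

GFS_FORECAST_BUCKET = "GFS_FORECAST_BUCKET"  # placeholder, not the real bucket name

def floor_to_cycle(t: datetime, cycle_hours: int = 6) -> datetime:
    """Floor a timestep to the most recent GFS model cycle (00, 06, 12, 18 UTC)."""
    return t.replace(hour=t.hour - t.hour % cycle_hours, minute=0, second=0, microsecond=0)

def forecast_prefix(forecast_time: datetime) -> str:
    """Bucket prefix per the template GFS_FORECAST_BUCKET/gfs.YYYYMMDD/HH/atmos/."""
    return f"{GFS_FORECAST_BUCKET}/gfs.{forecast_time:%Y%m%d}/{forecast_time.hour:02d}/atmos/"

print(forecast_prefix(floor_to_cycle(datetime(2022, 3, 22, 7))))
# GFS_FORECAST_BUCKET/gfs.20220322/06/atmos/
```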
- grid¶
Lat / Lon grid spacing. One of [0.25, 0.5, 1]
- property hash¶
Generate a unique hash for this datasource.
- Returns:
str – Unique hash for met instance (sha1).
- open_metdataset(dataset=None, xr_kwargs=None, **kwargs)¶
Open MetDataset from data source.
This method should download / load any required datafiles and return a MetDataset of the multi-file dataset opened by xarray.
- Parameters:
dataset (xr.Dataset | None, optional) – Input xr.Dataset loaded manually. The dataset must have the same format as the original data source API or files.
xr_kwargs (dict[str, Any] | None, optional) – Dictionary of keyword arguments passed into xarray.open_mfdataset() when opening files. Examples include "chunks", "engine", "parallel", etc. Ignored if dataset is input.
**kwargs (Any) – Keyword arguments passed through directly into the MetDataset constructor.
- Returns:
MetDataset – Meteorology dataset.
- property pressure_level_variables¶
GFS pressure level parameters.
- Returns:
list[MetVariable] | None – List of MetVariable available in datasource.
- set_metadata(ds)¶
Set met source metadata on ds.attrs.
This is called within the open_metdataset() method to set metadata on the returned MetDataset instance.
- Parameters:
ds (xr.Dataset | MetDataset) – Dataset to set metadata on. Mutated in place.
- show_progress¶
Show progress bar when downloading files from AWS.
- property single_level_variables¶
GFS surface level parameters.
- Returns:
list[MetVariable] | None – List of MetVariable available in datasource.
- property supported_pressure_levels¶
Get pressure levels available.
- Returns:
list[int] – List of integer pressure level values.