pycontrails.datalib.gfs.GFSForecast
- class pycontrails.datalib.gfs.GFSForecast(time, variables, pressure_levels=-1, paths=None, grid=0.25, forecast_time=None, cachestore=<object object>, show_progress=False, cache_download=False)
Bases: MetDataSource
GFS Forecast data access.
- Parameters:
time (metsource.TimeInput) – The time range for data retrieval, either a single datetime or a (start, end) datetime range. Input must be a single datetime-like or a tuple of datetime-like (datetime, pandas.Timestamp, numpy.datetime64) specifying the (start, end) of the date range, inclusive. All times are downloaded from a single forecast model run at or before the start time (see forecast_time). If None, paths must be defined and all time coordinates will be loaded from files.
variables (metsource.VariableInput) – Variable name(s) (e.g. "temperature", ["temperature", "relative_humidity"]). See pressure_level_variables for the list of available variables.
pressure_levels (metsource.PressureLevelInput, optional) – Pressure levels for data, in hPa (mbar). Set to [-1] to download surface level parameters. Defaults to [-1].
paths (str | list[str] | pathlib.Path | list[pathlib.Path] | None, optional) – Path to files to load manually. Can include glob patterns to load specific files. Defaults to None, which looks for files in the cachestore or GFS AWS bucket.
grid (float, optional) – Latitude/longitude grid spacing of the data. Defaults to 0.25.
forecast_time (DatetimeLike, optional) – Forecast run, specified by its runtime. If None (default), the forecast time is set to the 6 hour floor of the first timestep.
cachestore (cache.CacheStore | None, optional) – Cache data store for staging data files. Defaults to cache.DiskCacheStore. If None, cachestore is turned off.
show_progress (bool, optional) – Show progress when downloading files from the GFS AWS bucket. Defaults to False.
cache_download (bool, optional) – If True, cache downloaded grib files rather than storing them in a temporary file. By default, False.
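The default forecast_time rule above (the 6 hour floor of the first timestep) can be illustrated with a short stdlib sketch. default_forecast_time is a hypothetical helper, not part of pycontrails, and it assumes naive datetimes already expressed in UTC.

```python
from datetime import datetime


def default_forecast_time(first_timestep: datetime) -> datetime:
    """Floor a timestep to the previous 00/06/12/18 UTC model cycle (hypothetical helper)."""
    # Drop sub-hour components, then round the hour down to a multiple of 6
    return first_timestep.replace(
        hour=first_timestep.hour - first_timestep.hour % 6,
        minute=0,
        second=0,
        microsecond=0,
    )


print(default_forecast_time(datetime(2022, 3, 22, 3)))  # 2022-03-22 00:00:00
print(default_forecast_time(datetime(2022, 3, 22, 13, 45)))  # 2022-03-22 12:00:00
```

This matches the examples that follow, where a request starting at 2022-03-22 00:00 reports a forecast time of 2022-03-22 00:00:00.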
Examples
>>> from datetime import datetime
>>> from pycontrails.datalib.gfs import GFSForecast

>>> # Store data files to local disk (default behavior)
>>> times = ("2022-03-22 00:00:00", "2022-03-22 03:00:00")
>>> gfs = GFSForecast(times, variables="air_temperature", pressure_levels=[300, 250])
>>> gfs
GFSForecast
    Timesteps: ['2022-03-22 00', '2022-03-22 01', '2022-03-22 02', '2022-03-22 03']
    Variables: ['t']
    Pressure levels: [250, 300]
    Grid: 0.25
    Forecast time: 2022-03-22 00:00:00

>>> gfs = GFSForecast(times, variables="air_temperature", pressure_levels=[300, 250], grid=0.5)
>>> gfs
GFSForecast
    Timesteps: ['2022-03-22 00', '2022-03-22 03']
    Variables: ['t']
    Pressure levels: [250, 300]
    Grid: 0.5
    Forecast time: 2022-03-22 00:00:00
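The two examples differ only in grid, yet the first lists hourly timesteps and the second 3-hourly ones. The sketch below reproduces both lists with the standard library. expand_timesteps is hypothetical; the rule "hourly on the 0.25 degree grid, 3-hourly otherwise" is inferred from the example output, and its application to the 1 degree grid is an assumption. It also assumes start already falls on a step boundary.

```python
from datetime import datetime, timedelta


def expand_timesteps(start: datetime, end: datetime, grid: float) -> list[datetime]:
    """List timesteps from start to end inclusive (hypothetical helper)."""
    # Assumption: 0.25 degree files are hourly, coarser grids are 3-hourly
    step = timedelta(hours=1 if grid == 0.25 else 3)
    out = []
    t = start
    while t <= end:
        out.append(t)
        t += step
    return out


start, end = datetime(2022, 3, 22, 0), datetime(2022, 3, 22, 3)
print([t.strftime("%Y-%m-%d %H") for t in expand_timesteps(start, end, 0.25)])
# ['2022-03-22 00', '2022-03-22 01', '2022-03-22 02', '2022-03-22 03']
print([t.strftime("%Y-%m-%d %H") for t in expand_timesteps(start, end, 0.5)])
# ['2022-03-22 00', '2022-03-22 03']
```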
Notes
- __init__(time, variables, pressure_levels=-1, paths=None, grid=0.25, forecast_time=None, cachestore=<object object>, show_progress=False, cache_download=False)
Methods
__init__(time, variables[, pressure_levels, ...])
cache_dataset(dataset) – Cache data from data source.
create_cachepath(t) – Return cachepath to local data file based on datetime.
download(**xr_kwargs) – Confirm all data files are downloaded and available locally in the cachestore.
download_dataset(times) – Download data from data source for input times.
filename(t) – Construct grib filename to retrieve from GFS bucket.
is_datafile_cached(t, **xr_kwargs) – Check datafile defined by datetime for variables and pressure levels in class.
list_timesteps_cached(**xr_kwargs) – Get a list of data files available locally in the cachestore.
list_timesteps_not_cached(**xr_kwargs) – Get a list of data files not available locally in the cachestore.
open_dataset(disk_paths, **xr_kwargs) – Open multi-file dataset in xarray.
open_metdataset([dataset, xr_kwargs]) – Open MetDataset from data source.
set_metadata(ds) – Set met source metadata on ds.attrs.
Attributes
client – S3 client for accessing GFS bucket.
grid – Lat / Lon grid spacing.
cachestore – Cache store for intermediates while processing data source. If None, cache is turned off.
show_progress – Show progress bar when downloading files from AWS.
forecast_time – Base time of the previous GFS forecast based on input times.
forecast_path – Construct forecast path in bucket for forecast_time.
hash – Generate a unique hash for this datasource.
is_single_level – Return True if the datasource is single level data.
paths – Path to local source files to load.
pressure_level_variables – GFS pressure level parameters.
pressure_levels – List of pressure levels.
single_level_variables – GFS surface level parameters.
supported_pressure_levels – Get pressure levels available.
supported_variables – Parameters available from data source.
timesteps – List of individual timesteps from data source derived from time. Use parse_time() to handle TimeInput.
variable_shortnames – Return a list of variable short names.
variable_standardnames – Return a list of variable standard names.
variables – Variables requested from data source. Use parse_variables() to handle VariableInput.
- cache_dataset(dataset)
Cache data from data source.
- Parameters:
dataset (xarray.Dataset) – Dataset loaded from remote API or local files. The dataset must have the same format as the original data source API or files.
- cache_download
If True, cache downloaded grib files rather than storing them in a temporary file.
- cachestore
Cache store for intermediates while processing data source. If None, cache is turned off.
- client
S3 client for accessing GFS bucket.
- create_cachepath(t)
Return cachepath to local data file based on datetime.
- Parameters:
t (datetime) – Datetime of datafile
- Returns:
str – Path to cached data file
- download(**xr_kwargs)
Confirm all data files are downloaded and available locally in the cachestore.
- Parameters:
**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
- download_dataset(times)
Download data from data source for input times.
- Parameters:
times (list[datetime]) – List of datetimes to download and store in cache.
- filename(t)
Construct grib filename to retrieve from GFS bucket.
String template:
gfs.tCCz.pgrb2.GGGG.fFFF
CC is the model cycle runtime (i.e. 00, 06, 12, 18)
GGGG is the grid spacing (e.g. 0p25 for the 0.25 degree grid)
FFF is the forecast hour of product from 000 - 384
- Parameters:
t (datetime) – Timestep to download
- Returns:
str – Forecast filename to retrieve from GFS bucket.
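The filename template can be exercised with a short sketch. gfs_filename is a hypothetical helper, not the pycontrails method; it assumes FFF is the whole-hour offset of t from the forecast run and that GGGG renders the grid spacing like 0p25, 0p50 or 1p00.

```python
from datetime import datetime


def gfs_filename(t: datetime, forecast_time: datetime, grid: float) -> str:
    """Fill in gfs.tCCz.pgrb2.GGGG.fFFF for one timestep (hypothetical helper)."""
    cycle = f"{forecast_time.hour:02d}"  # CC: model cycle, one of 00, 06, 12, 18
    grid_str = f"{grid:.2f}".replace(".", "p")  # GGGG: 0.25 -> "0p25", 1 -> "1p00"
    hours = int((t - forecast_time).total_seconds() // 3600)  # FFF: forecast hour
    if not 0 <= hours <= 384:
        raise ValueError(f"forecast hour {hours} outside 000 - 384")
    return f"gfs.t{cycle}z.pgrb2.{grid_str}.f{hours:03d}"


print(gfs_filename(datetime(2022, 3, 22, 3), datetime(2022, 3, 22, 0), 0.25))
# gfs.t00z.pgrb2.0p25.f003
```

Raising on out-of-range hours keeps the sketch from silently building names for files the 000 - 384 range says cannot exist.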
References
- property forecast_path
Construct forecast path in bucket for forecast_time.
String template:
GFS_FORECAST_BUCKET/gfs.YYYYMMDD/HH/atmos/{filename}
- Returns:
str
– Bucket prefix for forecast files.
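A minimal sketch of the forecast_path template follows. The bucket name below is an assumption (the public NOAA GFS bucket on AWS), not a value read from pycontrails, and gfs_forecast_path is a hypothetical helper.

```python
from datetime import datetime

# Assumption: public NOAA GFS bucket on AWS; pycontrails defines its own constant
GFS_FORECAST_BUCKET = "noaa-gfs-bdp-pds"


def gfs_forecast_path(forecast_time: datetime) -> str:
    """Build GFS_FORECAST_BUCKET/gfs.YYYYMMDD/HH/atmos for a forecast run (hypothetical helper)."""
    # datetime supports strftime codes directly inside f-string format specs
    return f"{GFS_FORECAST_BUCKET}/gfs.{forecast_time:%Y%m%d}/{forecast_time:%H}/atmos"


print(gfs_forecast_path(datetime(2022, 3, 22, 0)))
# noaa-gfs-bdp-pds/gfs.20220322/00/atmos
```

Joining this prefix and a filename with "/" gives the {filename} form shown in the template.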
- forecast_time
Base time of the previous GFS forecast based on input times.
- grid
Lat / Lon grid spacing. One of [0.25, 0.5, 1].
- property hash
Generate a unique hash for this datasource.
- Returns:
str
– Unique hash for met instance (sha1)
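hash returns a sha1 digest that identifies the request. The sketch below shows one way such a digest could be derived with hashlib; which fields are hashed and how they are joined are assumptions, not the pycontrails implementation, and request_hash is hypothetical.

```python
import hashlib


def request_hash(timesteps, variables, pressure_levels, grid, forecast_time) -> str:
    """Hash request parameters into a 40-character sha1 hex digest (hypothetical helper)."""
    # Assumption: concatenate the parameters into one deterministic string
    key = f"{timesteps}{variables}{pressure_levels}{grid}{forecast_time}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


h = request_hash(["2022-03-22 00"], ["t"], [250, 300], 0.25, "2022-03-22 00")
print(len(h))  # 40
```

Identical requests map to the same digest, while changing any hashed field (for example grid) yields a different one.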
- is_datafile_cached(t, **xr_kwargs)
Check datafile defined by datetime for variables and pressure levels in class.
If using a cloud cache store (e.g. cache.GCPCacheStore), this is where the datafile will be mirrored to a local file for access.
- Parameters:
t (datetime) – Datetime of datafile
**xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:
chunks: {"time": 1}
engine: "netcdf4"
parallel: False
- Returns:
bool
– True if data file exists for datetime with all variables and pressure levels, False otherwise
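The "used if not specified" defaults above behave like dict.setdefault: caller values win, and missing keys are filled in. with_xr_defaults is a hypothetical helper that only builds the kwargs; handing them to xarray.open_mfdataset is left out so the sketch runs without xarray.

```python
from typing import Any


def with_xr_defaults(**xr_kwargs: Any) -> dict[str, Any]:
    """Fill in the documented defaults without overriding caller kwargs (hypothetical helper)."""
    merged = dict(xr_kwargs)
    # A fresh {"time": 1} per call avoids sharing one mutable default between callers
    merged.setdefault("chunks", {"time": 1})
    merged.setdefault("engine", "netcdf4")
    merged.setdefault("parallel", False)
    return merged


print(with_xr_defaults(engine="h5netcdf"))
# {'engine': 'h5netcdf', 'chunks': {'time': 1}, 'parallel': False}
```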
- property is_single_level
Return True if the datasource is single level data.
Added in version 0.50.0.
- list_timesteps_cached(**xr_kwargs)
Get a list of data files available locally in the cachestore.
- Parameters:
**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
- list_timesteps_not_cached(**xr_kwargs)
Get a list of data files not available locally in the cachestore.
- Parameters:
**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
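Together, list_timesteps_cached and list_timesteps_not_cached split the requested timesteps by a per-timestep cache check. The sketch models that split with a plain predicate standing in for is_datafile_cached; split_by_cache is hypothetical.

```python
from datetime import datetime
from typing import Callable


def split_by_cache(
    timesteps: list[datetime], is_cached: Callable[[datetime], bool]
) -> tuple[list[datetime], list[datetime]]:
    """Partition timesteps into (cached, not_cached), calling the predicate once each (hypothetical)."""
    cached, not_cached = [], []
    for t in timesteps:
        (cached if is_cached(t) else not_cached).append(t)
    return cached, not_cached


steps = [datetime(2022, 3, 22, h) for h in range(4)]
on_disk = {datetime(2022, 3, 22, 0), datetime(2022, 3, 22, 2)}
cached, missing = split_by_cache(steps, lambda t: t in on_disk)
print([t.hour for t in missing])  # [1, 3]
```

Only the timesteps in the second list would still need to go through download_dataset.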
- open_dataset(disk_paths, **xr_kwargs)
Open multi-file dataset in xarray.
- Parameters:
disk_paths (str | list[str] | pathlib.Path | list[pathlib.Path]) – List of string paths to local files to open.
**xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:
chunks: {"time": 1}
engine: "netcdf4"
parallel: False
lock: False
- Returns:
xarray.Dataset
– Open xarray dataset
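disk_paths accepts four shapes of input. A sketch of coercing them into one list of strings before opening; normalize_paths is a hypothetical helper, not pycontrails code.

```python
from __future__ import annotations

import pathlib


def normalize_paths(disk_paths: str | pathlib.Path | list[str] | list[pathlib.Path]) -> list[str]:
    """Coerce a single path or a list of paths into a list of strings (hypothetical helper)."""
    # A lone str or Path becomes a one-element list
    if isinstance(disk_paths, (str, pathlib.Path)):
        disk_paths = [disk_paths]
    return [str(p) for p in disk_paths]


print(normalize_paths("gfs.t00z.pgrb2.0p25.f000"))  # ['gfs.t00z.pgrb2.0p25.f000']
```

Checking for str before iterating matters: iterating a bare string would otherwise yield one "path" per character.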
- open_metdataset(dataset=None, xr_kwargs=None, **kwargs)
Open MetDataset from data source.
This method should download / load any required datafiles and return a MetDataset of the multi-file dataset opened by xarray.
- Parameters:
dataset (xr.Dataset | None, optional) – Input xr.Dataset loaded manually. The dataset must have the same format as the original data source API or files.
xr_kwargs (dict[str, Any] | None, optional) – Dictionary of keyword arguments passed into xarray.open_mfdataset() when opening files. Examples include "chunks", "engine", "parallel", etc. Ignored if dataset is input.
**kwargs (Any) – Keyword arguments passed through directly into the MetDataset constructor.
- Returns:
MetDataset
– Meteorology dataset
See also
- paths
Path to local source files to load. Set to the paths of files cached in cachestore if no paths input is provided on init.
- property pressure_level_variables
GFS pressure level parameters.
- Returns:
list[MetVariable] | None
– List of MetVariable available in datasource
- pressure_levels
List of pressure levels. Set to [-1] for data without level coordinate. Use parse_pressure_levels() to handle PressureLevelInput.
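The examples at the top pass pressure_levels=[300, 250] and report [250, 300], and [-1] marks single-level data. A sketch of that normalization follows. normalize_pressure_levels is hypothetical; rejecting -1 mixed with real levels is an assumption added for illustration, not documented behavior.

```python
from __future__ import annotations


def normalize_pressure_levels(levels: int | list[int]) -> list[int]:
    """Coerce a level or list of levels (hPa) into a sorted, unique list (hypothetical helper)."""
    if isinstance(levels, int):
        levels = [levels]
    levels = sorted({int(level) for level in levels})
    # [-1] marks surface / single-level data with no level coordinate.
    # Assumption: mixing -1 with real pressure levels is treated as an error.
    if -1 in levels and len(levels) > 1:
        raise ValueError("-1 (surface) cannot be combined with pressure levels")
    return levels


print(normalize_pressure_levels([300, 250]))  # [250, 300]
print(normalize_pressure_levels(-1))  # [-1]
```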
- set_metadata(ds)
Set met source metadata on ds.attrs.
This is called within the open_metdataset() method to set metadata on the returned MetDataset instance.
- Parameters:
ds (xr.Dataset | MetDataset) – Dataset to set metadata on. Mutated in place.
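set_metadata mutates its argument in place instead of returning a new object. The sketch shows that pattern with a plain dict standing in for ds.attrs. The keys and values are assumptions chosen for illustration, not the documented pycontrails metadata, and set_gfs_metadata is hypothetical.

```python
from typing import Any


def set_gfs_metadata(attrs: dict[str, Any]) -> None:
    """Record source metadata in an attrs mapping, mutating it in place (hypothetical helper)."""
    # Assumed keys and values; pycontrails may use different ones
    attrs.update(provider="NCEP", dataset="GFS", product="forecast")


attrs: dict[str, Any] = {"history": "opened from local grib files"}
set_gfs_metadata(attrs)
print(sorted(attrs))  # ['dataset', 'history', 'product', 'provider']
```

Existing entries are kept, and only the source keys are added or overwritten.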
- show_progress
Show progress bar when downloading files from AWS.
- property single_level_variables
GFS surface level parameters.
- Returns:
list[MetVariable] | None
– List of MetVariable available in datasource
- property supported_pressure_levels
Get pressure levels available.
- Returns:
list[int]
– List of integer pressure level values
- property supported_variables
Parameters available from data source.
- Returns:
list[MetVariable] | None
– List of MetVariable available in datasource
- timesteps
List of individual timesteps from data source derived from time. Use parse_time() to handle TimeInput.
- property variable_shortnames
Return a list of variable short names.
- Returns:
list[str] – List of variable short names.
- property variable_standardnames
Return a list of variable standard names.
- Returns:
list[str] – List of variable standard names.
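In the examples at the top, variables="air_temperature" is reported as Variables: ['t'], so each standard name maps to a GRIB short name. The sketch uses a hypothetical two-entry table: "t" comes from that example output, while "q" for specific humidity is an assumption. to_shortnames is not pycontrails code.

```python
# Hypothetical subset of a standard-name -> short-name table.
# "air_temperature" -> "t" matches the example output; "q" is an assumption.
SHORT_NAMES = {"air_temperature": "t", "specific_humidity": "q"}


def to_shortnames(standard_names: list[str]) -> list[str]:
    """Map requested standard names to short names, failing on unknown names (hypothetical)."""
    unknown = [name for name in standard_names if name not in SHORT_NAMES]
    if unknown:
        raise KeyError(f"unsupported variables: {unknown}")
    return [SHORT_NAMES[name] for name in standard_names]


print(to_shortnames(["air_temperature"]))  # ['t']
```

Collecting every unknown name before raising reports all unsupported variables in one error rather than only the first.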
- variables
Variables requested from data source. Use parse_variables() to handle VariableInput.