pycontrails.datalib.ecmwf.arco_era5
Support for ARCO ERA5.
This module supports:
- Downloading ARCO ERA5 model level data for specific times and pressure level variables.
- Downloading ARCO ERA5 single level data for specific times and single level variables.
- Interpolating model level data to a target lat-lon grid and pressure levels.
- Local caching of the downloaded and interpolated data as netCDF files.
- Opening cached data as a pycontrails.MetDataset object.
This module requires the following additional dependencies:
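Functions
| open_arco_era5_model_level_data(times, variables, pressure_levels) | Open ARCO ERA5 model level data for a specific time and variables. |
| open_arco_era5_single_level(times, variables) | Open ARCO ERA5 single level data for a specific date and variables. |
Classes
| ERA5ARCO(time, variables, pressure_levels=None, cachestore=<object object>) | ARCO ERA5 data accessed remotely through Google Cloud Storage. |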
- class pycontrails.datalib.ecmwf.arco_era5.ERA5ARCO(time, variables, pressure_levels=None, cachestore=<object object>)
Bases: ECMWFAPI
ARCO ERA5 data accessed remotely through Google Cloud Storage.
This is a high-level interface to access and cache ARCO ERA5 for a predefined set of times, variables, and pressure levels.
Added in version 0.50.0.
- Parameters:
time (TimeInput) – Time of the data to open.
variables (VariableInput) – List of variables to open.
pressure_levels (PressureLevelInput, optional) – Target pressure levels, [\(hPa\)]. For pressure level data, this should be a sorted (increasing or decreasing) list of integers. For single level data, this should be -1. By default, the pressure levels are set to the pressure levels at each model level between 20,000 and 50,000 ft assuming a constant surface pressure.
cachestore (CacheStore, optional) – Cache store to use. By default, a new disk cache store is used. If None, no caching is done. In this case, the data returned by open_metdataset() is not loaded into memory.
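As a minimal usage sketch of the parameters above: the request values are illustrative, and it is an assumption that a (start, end) tuple of datetimes is a valid TimeInput and that short names such as "t" and "q" are valid VariableInput. The helper names example_request and open_example are hypothetical. Opening the data needs pycontrails with its optional dependencies and network access, so that step is kept in a function that is not called on import.

```python
from datetime import datetime


def example_request() -> dict:
    """Constructor arguments for a small request (illustrative values)."""
    return {
        # Assumption: a (start, end) tuple is accepted as TimeInput
        "time": (datetime(2019, 1, 1, 0), datetime(2019, 1, 1, 2)),
        # Assumption: short names such as "t" and "q" are valid VariableInput
        "variables": ["t", "q"],
        # Sorted list of integer pressure levels, in hPa
        "pressure_levels": [200, 250, 300],
    }


def open_example():
    """Open the request as a MetDataset (needs pycontrails and network access)."""
    # Imported lazily so this sketch can be loaded without pycontrails installed
    from pycontrails.datalib.ecmwf.arco_era5 import ERA5ARCO

    era5 = ERA5ARCO(**example_request())
    return era5.open_metdataset()
```

Passing cachestore=None in the request would skip caching, in which case the returned data is not loaded into memory, as described above.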
- cache_dataset(dataset)
Cache data from data source.
- Parameters:
dataset (xarray.Dataset) – Dataset loaded from remote API or local files. The dataset must have the same format as the original data source API or files.
- cachestore
Cache store for intermediates while processing data source. If None, cache is turned off.
- create_cachepath(t)
Return cachepath to local data file based on datetime.
- Parameters:
t (datetime) – Datetime of datafile
- Returns:
str – Path to cached data file
- download(**xr_kwargs)
Confirm all data files are downloaded and available locally in the cachestore.
- Parameters:
**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
- download_dataset(times)
Download data from data source for input times.
- Parameters:
times (list[datetime]) – List of datetimes to download and store in cache
- grid
Lat / Lon grid spacing
- property hash
Generate a unique hash for this datasource.
- Returns:
str – Unique hash for met instance (sha1)
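To illustrate what a sha1-based datasource hash can look like, the sketch below derives a digest from the attributes that identify a request. The function datasource_hash and its key layout are hypothetical; the fields the library actually hashes are not stated here.

```python
import hashlib
from datetime import datetime


def datasource_hash(timesteps, variables, pressure_levels, grid) -> str:
    """Return a sha1 hex digest that changes whenever the request changes."""
    # Hypothetical key layout; the library's actual key may differ
    key = "|".join(
        [
            ",".join(t.isoformat() for t in timesteps),
            ",".join(variables),
            ",".join(str(p) for p in pressure_levels),
            str(grid),
        ]
    )
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


# Two requests that differ only in pressure levels get different digests
times = [datetime(2019, 1, 1, 0), datetime(2019, 1, 1, 1)]
hash_a = datasource_hash(times, ["t", "q"], [200, 250, 300], 0.25)
hash_b = datasource_hash(times, ["t", "q"], [200, 250], 0.25)
```

A digest of this kind is useful for naming cache entries, since identical requests map to the same string.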
- is_datafile_cached(t, **xr_kwargs)
Check datafile defined by datetime for variables and pressure levels in class.
If using a cloud cache store (i.e. cache.GCPCacheStore), this is where the datafile will be mirrored to a local file for access.
- Parameters:
t (datetime) – Datetime of datafile
**xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:
chunks: {"time": 1}
engine: "netcdf4"
parallel: False
- Returns:
bool – True if data file exists for datetime with all variables and pressure levels, False otherwise
- property is_single_level
Return True if the datasource is single level data.
Added in version 0.50.0.
- list_timesteps_cached(**xr_kwargs)
Get a list of data files available locally in the cachestore.
- Parameters:
**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
- list_timesteps_not_cached(**xr_kwargs)
Get a list of data files not available locally in the cachestore.
- Parameters:
**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
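The two listing methods can be thought of as partitioning the requested timesteps with a per-timestep cache check. The sketch below shows that idea in plain Python. The function split_by_cache and the stand-in predicate are hypothetical and do not reproduce the library's implementation.

```python
from datetime import datetime, timedelta


def split_by_cache(timesteps, is_cached):
    """Partition timesteps into (cached, not_cached) using a predicate."""
    cached, missing = [], []
    for t in timesteps:
        # is_cached plays the role of a per-timestep cache check
        (cached if is_cached(t) else missing).append(t)
    return cached, missing


# Stand-in for a cache store: pretend only the first hour is on disk
start = datetime(2019, 1, 1)
timesteps = [start + timedelta(hours=i) for i in range(3)]
on_disk = {timesteps[0]}
cached, missing = split_by_cache(timesteps, lambda t: t in on_disk)
```

Only the timesteps in the second list would need to be downloaded.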
- open_dataset(disk_paths, **xr_kwargs)
Open multi-file dataset in xarray.
- Parameters:
disk_paths (str | list[str] | pathlib.Path | list[pathlib.Path]) – list of string paths to local files to open
**xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:
chunks: {"time": 1}
engine: "netcdf4"
parallel: False
lock: False
- Returns:
xarray.Dataset – Open xarray dataset
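The "used if not specified" behavior of the defaults above can be sketched as a dictionary merge in which caller-supplied values win. The helper with_default_xr_kwargs is hypothetical and only mirrors the documented defaults; it is not the library's code.

```python
def with_default_xr_kwargs(**xr_kwargs):
    """Fill in the documented defaults for any keyword the caller left out."""
    defaults = {
        "chunks": {"time": 1},
        "engine": "netcdf4",
        "parallel": False,
        "lock": False,
    }
    # Later entries override earlier ones, so caller values take precedence
    return {**defaults, **xr_kwargs}


# The caller overrides the engine and keeps every other default
merged = with_default_xr_kwargs(engine="h5netcdf")
```

The merged dictionary is what would then be forwarded to xarray.open_mfdataset().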
- open_metdataset(dataset=None, xr_kwargs=None, **kwargs)
Open MetDataset from data source.
This method should download / load any required datafiles and return a MetDataset of the multi-file dataset opened by xarray.
- Parameters:
dataset (xr.Dataset | None, optional) – Input xr.Dataset loaded manually. The dataset must have the same format as the original data source API or files.
xr_kwargs (dict[str, Any] | None, optional) – Dictionary of keyword arguments passed into xarray.open_mfdataset() when opening files. Examples include "chunks", "engine", "parallel", etc. Ignored if dataset is input.
**kwargs (Any) – Keyword arguments passed through directly into MetDataset constructor.
- Returns:
MetDataset – Meteorology dataset
- paths
Path to local source files to load. Set to the paths of files cached in cachestore if no paths input is provided on init.
- property pressure_level_variables
Variables available in the ARCO ERA5 model level data.
- Returns:
list[MetVariable] | None – List of MetVariable available in datasource
- pressure_levels
List of pressure levels. Set to [-1] for data without level coordinate. Use parse_pressure_levels() to handle PressureLevelInput.
- set_metadata(ds)
Set met source metadata on ds.attrs.
This is called within the open_metdataset() method to set metadata on the returned MetDataset instance.
- Parameters:
ds (xr.Dataset | MetDataset) – Dataset to set metadata on. Mutated in place.
- property single_level_variables
Variables available in the ARCO ERA5 single level data.
- Returns:
list[MetVariable] | None – List of MetVariable available in datasource
- property supported_pressure_levels
Pressure levels available from datasource.
- Returns:
list[int] | None – List of integer pressure levels for class. If None, no pressure level information available for class.
- property supported_variables
Parameters available from data source.
- Returns:
list[MetVariable] | None – List of MetVariable available in datasource
- timesteps
List of individual timesteps from data source derived from time. Use parse_time() to handle TimeInput.
- property variable_ecmwfids
Return a list of variable ecmwf_ids.
- Returns:
list[int] – List of int ECMWF param ids.
- property variable_shortnames
Return a list of variable short names.
- Returns:
list[str] – List of variable short names.
- property variable_standardnames
Return a list of variable standard names.
- Returns:
list[str] – List of variable standard names.
- variables
Variables requested from data source. Use parse_variables() to handle VariableInput.
- pycontrails.datalib.ecmwf.arco_era5.open_arco_era5_model_level_data(times, variables, pressure_levels)
Open ARCO ERA5 model level data for a specific time and variables.
Data is not loaded into memory, and the data is not cached.
- Parameters:
times (list[datetime.datetime]) – Time of the data to open.
variables (list[met_var.MetVariable]) – List of variables to open. Unsupported variables are ignored.
pressure_levels (npt.ArrayLike) – Target pressure levels, [\(hPa\)].
- Returns:
xarray.Dataset – Dataset with the requested variables on the target grid and pressure levels. Data is reformatted for MetDataset conventions.
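A call to this function might look like the following sketch. The hourly_times helper is hypothetical, the request values are illustrative, and it is an assumption that met_var.AirTemperature and met_var.SpecificHumidity (from pycontrails.core.met_var) are among the supported variables. The remote call needs pycontrails and network access, so it sits inside a function that is not called on import.

```python
from datetime import datetime, timedelta


def hourly_times(start: datetime, n_hours: int) -> list:
    """Build the list[datetime] expected by the times parameter."""
    return [start + timedelta(hours=i) for i in range(n_hours)]


def open_model_level_example():
    """Open three hours of model level data interpolated to three levels."""
    # Imported lazily so this sketch can be loaded without pycontrails installed
    from pycontrails.core import met_var
    from pycontrails.datalib.ecmwf.arco_era5 import open_arco_era5_model_level_data

    times = hourly_times(datetime(2019, 1, 1), 3)
    # Assumption: both variables are supported; unsupported ones are ignored
    variables = [met_var.AirTemperature, met_var.SpecificHumidity]
    pressure_levels = [200, 250, 300]  # target levels in hPa
    return open_arco_era5_model_level_data(times, variables, pressure_levels)
```

Because the returned xarray.Dataset is not loaded into memory, subsetting it before computing keeps the transfer small.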
- pycontrails.datalib.ecmwf.arco_era5.open_arco_era5_single_level(times, variables)
Open ARCO ERA5 single level data for a specific date and variables.
Data is not loaded into memory, and the data is not cached.
- Parameters:
times (list[datetime.date]) – Time of the data to open.
variables (list[met_var.MetVariable]) – List of variables to open.
- Returns:
xarray.Dataset – Dataset with the requested variables. Data is reformatted for MetDataset conventions.
- Raises:
FileNotFoundError – If the variable is not found at the requested date. This could indicate that the variable is not available in the ARCO ERA5 dataset, or that the time requested is outside the available range.
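Since a missing variable or an out-of-range date raises FileNotFoundError, callers may want to handle it explicitly. The sketch below wraps any opener with the same (times, variables) signature. The wrapper open_if_available and the stand-in opener are hypothetical; in real use the opener would be open_arco_era5_single_level.

```python
from datetime import date


def open_if_available(opener, times, variables):
    """Call opener(times, variables), returning None if the data is missing."""
    try:
        return opener(times, variables)
    except FileNotFoundError as exc:
        # Raised when a variable or date is not present in the remote store
        print(f"ARCO ERA5 single level data unavailable: {exc}")
        return None


def missing_data_opener(times, variables):
    """Stand-in opener that always fails, used to exercise the wrapper."""
    raise FileNotFoundError("no data for requested date")


result = open_if_available(missing_data_opener, [date(2019, 1, 1)], [])
```

Returning None is one possible policy; re-raising with a clearer message is an equally reasonable choice when the data is required.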