pycontrails.GCPCacheStore
- class pycontrails.GCPCacheStore(cache_dir='', project=None, bucket=None, disk_cache=None, read_only=True, allow_clear=False, timeout=300, show_progress=False, chunk_size=16777216)
Bases: CacheStore
Google Cloud Platform (Storage) Cache.
This class downloads files from Google Cloud Storage to a local DiskCacheStore, initialized with cache_dir=".gcp", to avoid re-downloading files. If the source files on GCP change, the local mirror of the GCP DiskCacheStore must be cleared by initializing this class and running clear_disk().
Note that by default, the GCP Cache Store is read only. When put() is called and read_only is set to True, the cache raises a RuntimeError. Set read_only to False to enable writing to the cache store.
- Parameters:
cache_dir (str, optional) – Root object prefix within bucket. Defaults to the PYCONTRAILS_CACHE_DIR environment variable, or the root of the bucket. The full GCP URI (i.e., “gs://<MY_BUCKET>/<PREFIX>”) can be used here.
project (str, optional) – GCP project. Defaults to the current active project set in the google-cloud-sdk environment.
bucket (str, optional) – GCP bucket to use for the cache. Defaults to the PYCONTRAILS_CACHE_BUCKET environment variable.
read_only (bool, optional) – Only enable reading from the cache. Defaults to True.
allow_clear (bool, optional) – Allow this cache to be cleared using clear(). Defaults to False.
disk_cache (DiskCacheStore, optional) – Specify a custom local disk cache store to mirror files. Defaults to DiskCacheStore(cache_dir="{user_cache_dir}/.gcp/{bucket}").
show_progress (bool, optional) – Show a progress bar on cache put(). Defaults to False.
chunk_size (int, optional) – Chunk size for uploads and downloads with progress. Set a smaller size to see more granular progress, and a larger size for better transfer speed. Chunk size must be a multiple of 262144 (e.g., 10 * 262144). The signature default is 16777216 (64 * 262144); a small value such as 8 * 262144 will throttle fast download speeds.
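The multiple-of-262144 rule for chunk_size can be checked before the store is constructed. The sketch below uses a hypothetical helper, round_up_chunk_size, which is not part of pycontrails; it only illustrates the arithmetic by rounding a requested byte count up to the next valid multiple.

```python
# 262144 bytes (256 KiB) is the unit that chunk_size must be a multiple of.
CHUNK_UNIT = 262144


def round_up_chunk_size(requested: int) -> int:
    """Round ``requested`` bytes up to the next multiple of CHUNK_UNIT.

    Hypothetical helper for illustration; not part of pycontrails.
    """
    if requested <= 0:
        raise ValueError("chunk size must be positive")
    # Ceiling division, then scale back up to bytes.
    return -(-requested // CHUNK_UNIT) * CHUNK_UNIT


# The default in the class signature is already a valid multiple.
assert 16777216 % CHUNK_UNIT == 0

# An arbitrary request is rounded up to a value that can be passed as chunk_size.
chunk_size = round_up_chunk_size(3_000_000)
```

The resulting value could then be passed as chunk_size when constructing the store.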
Examples
>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
... )
>>> cache.cache_dir
'cache/'
>>> cache.bucket
'contrails-301217-unit-test'
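The cache_dir and bucket parameters describe several fallbacks: environment variables, the bucket root, and a full gs:// URI. The sketch below shows one way such a resolution could work. The function resolve_location and its precedence rules (explicit argument first, then environment variable; a URI supplies its own bucket) are assumptions made for illustration and are not taken from the pycontrails source.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_location(
    cache_dir: str = "",
    bucket: str | None = None,
    env: Mapping[str, str] | None = None,
) -> tuple[str, str]:
    """Return (bucket, prefix) under the assumed precedence rules.

    Hypothetical helper for illustration; not part of pycontrails.
    """
    env = os.environ if env is None else env
    # Assumption: an explicit argument wins over the environment variable.
    cache_dir = cache_dir or env.get("PYCONTRAILS_CACHE_DIR", "")
    if cache_dir.startswith("gs://"):
        # A full URI carries its own bucket: "gs://<bucket>/<prefix>".
        uri_bucket, _, prefix = cache_dir[len("gs://"):].partition("/")
        return uri_bucket, prefix
    bucket = bucket or env.get("PYCONTRAILS_CACHE_BUCKET")
    if not bucket:
        raise ValueError("no bucket given and PYCONTRAILS_CACHE_BUCKET is unset")
    # An empty prefix stands for the root of the bucket.
    return bucket, cache_dir


# Passing env={} keeps the example independent of the real environment.
location = resolve_location("gs://my-bucket/some/prefix", env={})
```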
- __init__(cache_dir='', project=None, bucket=None, disk_cache=None, read_only=True, allow_clear=False, timeout=300, show_progress=False, chunk_size=16777216)
Methods
__init__([cache_dir, project, bucket, ...])
clear_disk([cache_path]) – Clear the local disk cache mirror of the GCP Cache Store.
exists(cache_path) – Check if a path in cache exists.
get(cache_path) – Get data from the local cache store.
gs_path(cache_path) – Return a full Google Storage (gs://) URI to object.
listdir([path]) – List the contents of a directory in the cache.
path(cache_path) – Return a full filepath in cache.
put(data_path[, cache_path]) – Save data to the GCP cache store.
put_multiple(data_path, cache_path) – Put multiple files into the cache at once.
Attributes
client – Handle to Google Cloud Storage client.
size – Return the disk size (in MBytes) of the local cache.
- allow_clear
- bucket
- cache_dir
- chunk_size
- clear_disk(cache_path='')
Clear the local disk cache mirror of the GCP Cache Store.
- Parameters:
cache_path (str, optional) – Path in mirrored cache store. Passed into _disk_clear.clear(). By default, this method will clear the entire mirrored cache store.
Examples
>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
... )
>>> cache.clear_disk()
- property client
Handle to Google Cloud Storage client.
- Returns:
google.cloud.storage.Client – Handle to Google Cloud Storage client
- exists(cache_path)
Check if a path in cache exists.
- Parameters:
cache_path (str) – Path to directory or file in cache
- Returns:
bool – True if directory or file exists
Examples
>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.exists("file.nc")
False
- get(cache_path)
Get data from the local cache store.
- Parameters:
cache_path (str) – Path in cache store to get data
- Returns:
str – Returns path to downloaded local file
- Raises:
ValueError – Raised if cache_path refers to a directory
Examples
>>> import pathlib
>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
...     read_only=False,
... )

>>> cache.put("README.md", "example/file.md")
'example/file.md'

>>> # returns a full path to local copy of the file
>>> path = cache.get("example/file.md")
>>> pathlib.Path(path).is_file()
True

>>> pathlib.Path(path).read_text()[17:69]
'Python library for modeling aviation climate impacts'
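The class description says files are mirrored locally to avoid re-downloading, and that the mirror must be cleared when the source changes. The sketch below illustrates that read-through pattern with a plain dict standing in for the remote bucket. The function get_via_mirror is hypothetical and is not the pycontrails implementation; it only shows why a stale mirror keeps serving the old content.

```python
from __future__ import annotations

import pathlib
import tempfile


def get_via_mirror(remote: dict[str, bytes], mirror_dir: pathlib.Path, cache_path: str) -> str:
    """Return a local path for cache_path, fetching from remote only on a miss.

    Hypothetical stand-in for illustration; ``remote`` plays the role of the bucket.
    """
    local = mirror_dir / cache_path
    if not local.is_file():
        # Miss: copy the remote object into the local mirror.
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(remote[cache_path])
    return str(local)


with tempfile.TemporaryDirectory() as tmp:
    mirror = pathlib.Path(tmp)
    remote = {"example/file.md": b"v1"}
    path = get_via_mirror(remote, mirror, "example/file.md")

    # Changing the remote object does not refresh the mirrored copy.
    remote["example/file.md"] = b"v2"
    path_again = get_via_mirror(remote, mirror, "example/file.md")
    stale = pathlib.Path(path_again).read_bytes()

    # Removing the mirrored file (the role clear_disk() plays) forces a re-fetch.
    pathlib.Path(path_again).unlink()
    fresh = pathlib.Path(get_via_mirror(remote, mirror, "example/file.md")).read_bytes()
```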
- gs_path(cache_path)
Return a full Google Storage (gs://) URI to object.
- Parameters:
cache_path (str) – string path to object in cache
- Returns:
str – Google Storage URI (gs://) to object in cache
Examples
>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
... )
>>> cache.path("file.nc")
'cache/file.nc'
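As a rough illustration of what a full gs:// URI looks like, the sketch below joins a bucket, a prefix, and an object path. It assumes the layout gs://<bucket>/<cache_dir>/<cache_path>, following the URI form given for the cache_dir parameter. The helper compose_gs_uri is hypothetical and is not the method's actual implementation.

```python
def compose_gs_uri(bucket: str, cache_dir: str, cache_path: str) -> str:
    """Join bucket, prefix and object path into a gs:// URI.

    Hypothetical helper for illustration; not part of pycontrails.
    """
    # Drop empty segments and stray slashes, so an empty cache_dir
    # means the object sits at the root of the bucket.
    parts = [p.strip("/") for p in (cache_dir, cache_path) if p.strip("/")]
    return "gs://" + "/".join([bucket, *parts])


uri = compose_gs_uri("contrails-301217-unit-test", "cache/", "file.nc")
print(uri)  # gs://contrails-301217-unit-test/cache/file.nc
```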
- listdir(path='')
List the contents of a directory in the cache.
- Parameters:
path (str) – Path to the directory to list
- Returns:
list[str] – List of files in the directory
- path(cache_path)
Return a full filepath in cache.
- Parameters:
cache_path (str) – string path or filepath to create in cache. If parent directories do not exist, they will be created.
- Returns:
str – Full path string to subdirectory or object in cache directory
Examples
>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.path("file.nc")
'cache/file.nc'
>>> cache.clear() # cleanup
- project
- put(data_path, cache_path=None)
Save data to the GCP cache store.
If read_only is True, this method will return the path in the local disk cache store.
- Parameters:
data_path (str | pathlib.Path) – Data to save to GCP cache store.
cache_path (str, optional) – Path in cache store to save data. Defaults to the same filename as data_path.
- Returns:
str – Returns the path in the cache to the stored file
- Raises:
RuntimeError – Raised if read_only is True
FileNotFoundError – Raised if data_path is a string and no file is found at that path
Examples
>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
...     read_only=False,
... )

>>> # put a file directly
>>> cache.put("README.md", "test/file.md")
'test/file.md'
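The read-only guard and the default cache_path can be pictured with a small stand-in class. ReadOnlyStoreSketch below is hypothetical and does not touch GCP. It follows the Raises section (a RuntimeError when read_only is True) and the documented default of reusing the filename of data_path; the real method may do more than this.

```python
from __future__ import annotations

import pathlib


class ReadOnlyStoreSketch:
    """Stand-in with a read_only flag; NOT the pycontrails implementation."""

    def __init__(self, read_only: bool = True) -> None:
        self.read_only = read_only
        # Maps cache_path -> data_path, in place of real uploads.
        self.objects: dict[str, str] = {}

    def put(self, data_path: str, cache_path: str | None = None) -> str:
        if self.read_only:
            raise RuntimeError("cache store is read only")
        # Default to the filename of data_path, as the parameter docs describe.
        cache_path = cache_path or pathlib.Path(data_path).name
        self.objects[cache_path] = data_path
        return cache_path


store = ReadOnlyStoreSketch()  # read_only defaults to True
try:
    store.put("README.md")
except RuntimeError as exc:
    error = str(exc)

writable = ReadOnlyStoreSketch(read_only=False)
stored = writable.put("docs/README.md")  # no cache_path: falls back to "README.md"
```

Callers that cannot know how a store was configured may want to catch RuntimeError around put() in the same way.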
- put_multiple(data_path, cache_path)
Put multiple files into the cache at once.
- Parameters:
- Returns:
list[str] – Returns a list of relative paths to the stored files in the cache
- read_only
- show_progress
- property size
Return the disk size (in MBytes) of the local cache.
- Returns:
float – Size of the disk cache store in MB
Examples
>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.size
0.0...
>>> cache.clear() # cleanup
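For intuition about what a disk size in MB means here, the sketch below sums file sizes under a directory. The helper dir_size_mb is hypothetical, and the convention 1 MB = 10**6 bytes is an assumption; the library may count bytes differently.

```python
import pathlib
import tempfile


def dir_size_mb(root: pathlib.Path) -> float:
    """Total size of regular files under root, in MB.

    Hypothetical helper; assumes 1 MB = 10**6 bytes.
    """
    total = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return total / 1e6


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    empty_size = dir_size_mb(root)  # nothing written yet
    (root / "sub").mkdir()
    (root / "sub" / "data.bin").write_bytes(b"\0" * 2_000_000)
    size = dir_size_mb(root)
```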
- timeout