Storage

A storage abstraction layer with support for multiple backends (local filesystem, fsspec, obstore), configuration-based registration, and Arrow table import/export, including CSV format support.

Pipelines

class sqlspec.storage.SyncStoragePipeline[source]

Bases: object

Pipeline coordinating storage registry operations and telemetry.

__init__(*, registry=None)[source]
write_rows(rows, destination, *, format_hint=None, storage_options=None)[source]

Write dictionary rows to storage using cached serializers.

Return type:

StorageTelemetry

write_arrow(table, destination, *, format_hint=None, storage_options=None, compression=None)[source]

Write an Arrow table to storage using zero-copy buffers.

Return type:

StorageTelemetry

read_arrow(source, *, file_format, storage_options=None)[source]

Read an artifact from storage and decode it into an Arrow table.

Return type:

tuple[Table, StorageTelemetry]

stream_read(source, *, chunk_size=None, storage_options=None)[source]

Stream bytes from an artifact.

Return type:

Iterator[bytes]

allocate_staging_artifacts(requests)[source]

Allocate staging metadata for upcoming loads.

Return type:

list[StagedArtifact]

cleanup_staging_artifacts(artifacts, *, ignore_errors=True)[source]

Delete staged artifacts best-effort.

Return type:

None

class sqlspec.storage.AsyncStoragePipeline[source]

Bases: object

Async variant of the storage pipeline leveraging async-capable backends when available.

__init__(*, registry=None)[source]
async stream_read_async(source, *, chunk_size=None, storage_options=None)[source]

Stream bytes from an artifact asynchronously.

Return type:

AsyncIterator[bytes]

Registry

class sqlspec.storage.StorageRegistry[source]

Bases: object

Global storage registry for named backend configurations.

Allows registering named storage backends that can be accessed from anywhere in your application. Backends are automatically selected based on URI scheme unless explicitly overridden.

Examples

backend = registry.get("s3://my-bucket")
backend = registry.get("file:///tmp/data")
backend = registry.get("gs://my-gcs-bucket")

registry.register_alias("my_app_store", "file:///tmp/dev_data")

registry.register_alias("my_app_store", "s3://prod-bucket/data")

store = registry.get("my_app_store")

backend = registry.get("s3://bucket", backend="fsspec")

__init__()[source]
register_alias(alias, uri, *, backend=None, base_path='', **kwargs)[source]

Register a named alias for a storage configuration.

Parameters:
  • alias (str) – Unique alias name (e.g., "my_app_store", "user_uploads")

  • uri (str) – Storage URI (e.g., "s3://bucket", "file:///path", "gs://bucket")

  • backend (str | None) – Force a specific backend ("local", "fsspec", "obstore") instead of auto-detection

  • base_path (str) – Base path to prepend to all operations

  • **kwargs (Any) – Backend-specific configuration options

Return type:

None

get(uri_or_alias, *, backend=None, **kwargs)[source]

Get backend instance using URI-first routing with automatic backend selection.

Parameters:
  • uri_or_alias (str | Path) – URI to resolve directly OR a named alias (e.g., "my_app_store")

  • backend (str | None) – Force a specific backend ("local", "fsspec", "obstore") instead of auto-selection

  • **kwargs (Any) – Additional backend-specific configuration options

Return type:

ObjectStoreProtocol

Returns:

Backend instance with automatic backend selection

Raises:

ImproperConfigurationError – If alias not found or invalid input

is_alias_registered(alias)[source]

Check if a named alias is registered.

Return type:

bool

list_aliases()[source]

List all registered aliases.

Return type:

list[str]

clear_cache(uri_or_alias=None)[source]

Clear resolved backend cache.

Return type:

None

clear()[source]

Clear all aliases and instances.

Return type:

None

clear_instances()[source]

Clear only cached instances, keeping aliases.

Return type:

None

clear_aliases()[source]

Clear only aliases, keeping cached instances.

Return type:

None

Configuration Types

class sqlspec.storage.StorageCapabilities[source]

Bases: TypedDict

Runtime-evaluated driver storage capabilities.

class sqlspec.storage.PartitionStrategyConfig[source]

Bases: TypedDict

Configuration for partition fan-out strategies.

class sqlspec.storage.StorageLoadRequest[source]

Bases: TypedDict

Request describing a staging allocation.

class sqlspec.storage.StagedArtifact[source]

Bases: TypedDict

Metadata describing a staged artifact managed by the pipeline.

class sqlspec.storage.StorageTelemetry[source]

Bases: TypedDict

Telemetry payload for storage bridge operations.

class sqlspec.storage.StorageBridgeJob[source]

Bases: NamedTuple

Handle representing a storage bridge operation.

job_id: str

Alias for field number 0

status: str

Alias for field number 1

telemetry: StorageTelemetry

Alias for field number 2

Backends

class sqlspec.storage.backends.base.ObjectStoreBase[source]

Bases: ABC

Base class for storage backends.

All synchronous methods follow the *_sync naming convention for consistency with their async counterparts.

abstractmethod read_bytes_sync(path, **kwargs)[source]

Read bytes from storage synchronously.

Return type:

bytes

abstractmethod write_bytes_sync(path, data, **kwargs)[source]

Write bytes to storage synchronously.

Return type:

None

abstractmethod stream_read_sync(path, chunk_size=None, **kwargs)[source]

Stream bytes from storage synchronously.

Return type:

Iterator[bytes]

abstractmethod read_text_sync(path, encoding='utf-8', **kwargs)[source]

Read text from storage synchronously.

Return type:

str

abstractmethod write_text_sync(path, data, encoding='utf-8', **kwargs)[source]

Write text to storage synchronously.

Return type:

None

abstractmethod list_objects_sync(prefix='', recursive=True, **kwargs)[source]

List objects in storage synchronously.

Return type:

list[str]

abstractmethod exists_sync(path, **kwargs)[source]

Check if object exists in storage synchronously.

Return type:

bool

abstractmethod delete_sync(path, **kwargs)[source]

Delete object from storage synchronously.

Return type:

None

abstractmethod copy_sync(source, destination, **kwargs)[source]

Copy object within storage synchronously.

Return type:

None

abstractmethod move_sync(source, destination, **kwargs)[source]

Move object within storage synchronously.

Return type:

None

abstractmethod glob_sync(pattern, **kwargs)[source]

Find objects matching pattern synchronously.

Return type:

list[str]

abstractmethod get_metadata_sync(path, **kwargs)[source]

Get object metadata from storage synchronously.

Return type:

dict[str, object]

abstractmethod is_object_sync(path)[source]

Check if path points to an object synchronously.

Return type:

bool

abstractmethod is_path_sync(path)[source]

Check if path points to a directory synchronously.

Return type:

bool

abstractmethod read_arrow_sync(path, **kwargs)[source]

Read Arrow table from storage synchronously.

Return type:

Table

abstractmethod write_arrow_sync(path, table, **kwargs)[source]

Write Arrow table to storage synchronously.

Return type:

None

abstractmethod stream_arrow_sync(pattern, **kwargs)[source]

Stream Arrow record batches from storage synchronously.

Return type:

Iterator[RecordBatch]

abstractmethod async read_bytes_async(path, **kwargs)[source]

Read bytes from storage asynchronously.

Return type:

bytes

abstractmethod async write_bytes_async(path, data, **kwargs)[source]

Write bytes to storage asynchronously.

Return type:

None

abstractmethod async read_text_async(path, encoding='utf-8', **kwargs)[source]

Read text from storage asynchronously.

Return type:

str

abstractmethod async write_text_async(path, data, encoding='utf-8', **kwargs)[source]

Write text to storage asynchronously.

Return type:

None

abstractmethod async stream_read_async(path, chunk_size=None, **kwargs)[source]

Stream bytes from storage asynchronously.

Return type:

AsyncIterator[bytes]

abstractmethod async list_objects_async(prefix='', recursive=True, **kwargs)[source]

List objects in storage asynchronously.

Return type:

list[str]

abstractmethod async exists_async(path, **kwargs)[source]

Check if object exists in storage asynchronously.

Return type:

bool

abstractmethod async delete_async(path, **kwargs)[source]

Delete object from storage asynchronously.

Return type:

None

abstractmethod async copy_async(source, destination, **kwargs)[source]

Copy object within storage asynchronously.

Return type:

None

abstractmethod async move_async(source, destination, **kwargs)[source]

Move object within storage asynchronously.

Return type:

None

abstractmethod async get_metadata_async(path, **kwargs)[source]

Get object metadata from storage asynchronously.

Return type:

dict[str, object]

abstractmethod async read_arrow_async(path, **kwargs)[source]

Read Arrow table from storage asynchronously.

Return type:

Table

abstractmethod async write_arrow_async(path, table, **kwargs)[source]

Write Arrow table to storage asynchronously.

Return type:

None

abstractmethod stream_arrow_async(pattern, **kwargs)[source]

Stream Arrow record batches from storage asynchronously.

Return type:

AsyncIterator[RecordBatch]

class sqlspec.storage.backends.local.LocalStore[source]

Bases: object

Simple local file system storage backend.

Provides file system operations without requiring fsspec or obstore. Supports file:// URIs and regular file paths.

All synchronous methods use the *_sync suffix for consistency with async methods.

__init__(uri='', **kwargs)[source]

Initialize local storage backend.

Parameters:
  • uri (str) – File URI or path (e.g., "file:///path" or "/path")

  • **kwargs (Any) – Additional options, including:

    base_path: Subdirectory relative to the URI path. If relative, it is combined with the URI path; if absolute, it takes precedence (backward compatible).

The URI may be a file:// path (Windows-style paths such as file:///C:/path are supported). When both a URI and a base_path are provided, they are combined:

  • file:///home/user/storage + base_path="subdir" -> /home/user/storage/subdir

  • file:///home/user/storage + base_path="/other" -> /other (absolute takes precedence)

read_bytes_sync(path, **kwargs)[source]

Read bytes from file synchronously.

Return type:

bytes

write_bytes_sync(path, data, **kwargs)[source]

Write bytes to file synchronously.

Return type:

None

read_text_sync(path, encoding='utf-8', **kwargs)[source]

Read text from file synchronously.

Return type:

str

write_text_sync(path, data, encoding='utf-8', **kwargs)[source]

Write text to file synchronously.

Return type:

None

stream_read_sync(path, chunk_size=None, **kwargs)[source]

Stream bytes from file synchronously.

Return type:

Iterator[bytes]

list_objects_sync(prefix='', recursive=True, **kwargs)[source]

List objects in directory synchronously.

Parameters:
  • prefix (str) – Optional prefix that may look like a directory or filename filter.

  • recursive (bool) – Whether to walk subdirectories.

  • **kwargs (Any) – Additional backend-specific options (currently unused).

Return type:

list[str]

When the prefix resembles a directory (contains a slash or ends with '/'), we treat it as a path; otherwise we filter filenames within the base path. Paths outside base_path are returned with their absolute names.

exists_sync(path, **kwargs)[source]

Check if file exists synchronously.

Return type:

bool

delete_sync(path, **kwargs)[source]

Delete file or directory synchronously.

Return type:

None

copy_sync(source, destination, **kwargs)[source]

Copy file or directory synchronously.

Return type:

None

move_sync(source, destination, **kwargs)[source]

Move file or directory synchronously.

Return type:

None

glob_sync(pattern, **kwargs)[source]

Find files matching pattern synchronously.

Supports both relative and absolute patterns by adjusting where the glob search begins.

Return type:

list[str]

get_metadata_sync(path, **kwargs)[source]

Get file metadata synchronously.

Return type:

dict[str, object]

is_object_sync(path)[source]

Check if path points to a file synchronously.

Return type:

bool

is_path_sync(path)[source]

Check if path points to a directory synchronously.

Return type:

bool

read_arrow_sync(path, **kwargs)[source]

Read Arrow table from file synchronously.

Return type:

Table

write_arrow_sync(path, table, **kwargs)[source]

Write Arrow table to file synchronously.

Return type:

None

stream_arrow_sync(pattern, **kwargs)[source]

Stream Arrow record batches from files matching pattern synchronously.

Yields:

Arrow record batches from matching files.

Return type:

Iterator[RecordBatch]

property supports_signing: bool

Whether this backend supports URL signing.

Local file storage does not support URL signing. Local files are accessed directly via file:// URIs.

Returns:

Always False for local storage.

sign_sync(paths, expires_in=3600, for_upload=False)[source]

Generate signed URL(s).

Raises:

NotImplementedError – Local file storage does not require URL signing. Local files are accessed directly via file:// URIs.

Return type:

str | list[str]

async read_bytes_async(path, **kwargs)[source]

Read bytes from file asynchronously.

Return type:

bytes

async write_bytes_async(path, data, **kwargs)[source]

Write bytes to file asynchronously.

Return type:

None

async read_text_async(path, encoding='utf-8', **kwargs)[source]

Read text from file asynchronously.

Return type:

str

async write_text_async(path, data, encoding='utf-8', **kwargs)[source]

Write text to file asynchronously.

Return type:

None

async stream_read_async(path, chunk_size=None, **kwargs)[source]

Stream bytes from file asynchronously.

Uses AsyncThreadedBytesIterator to offload blocking file I/O to a thread pool, ensuring the event loop is not blocked during read operations.

Parameters:
  • path (str | Path) – Path to the file to read.

  • chunk_size (int | None) – Size of chunks to yield (default: 65536 bytes).

  • **kwargs (Any) – Additional arguments (unused).

Return type:

AsyncIterator[bytes]

Returns:

AsyncIterator yielding chunks of bytes.

async list_objects_async(prefix='', recursive=True, **kwargs)[source]

List objects asynchronously.

Return type:

list[str]

async exists_async(path, **kwargs)[source]

Check if file exists asynchronously.

Return type:

bool

async delete_async(path, **kwargs)[source]

Delete file asynchronously.

Return type:

None

async copy_async(source, destination, **kwargs)[source]

Copy file asynchronously.

Return type:

None

async move_async(source, destination, **kwargs)[source]

Move file asynchronously.

Return type:

None

async get_metadata_async(path, **kwargs)[source]

Get file metadata asynchronously.

Return type:

dict[str, object]

async read_arrow_async(path, **kwargs)[source]

Read Arrow table asynchronously.

Uses async_() to offload blocking PyArrow I/O to thread pool.

Return type:

Table

async write_arrow_async(path, table, **kwargs)[source]

Write Arrow table asynchronously.

Uses async_() to offload blocking PyArrow I/O to thread pool.

Return type:

None

stream_arrow_async(pattern, **kwargs)[source]

Stream Arrow record batches asynchronously.

Parameters:
  • pattern (str) – Glob pattern to match files.

  • **kwargs (Any) – Additional arguments passed to stream_arrow_sync().

Return type:

AsyncIterator[RecordBatch]

Returns:

AsyncIterator yielding Arrow record batches.

async sign_async(paths, expires_in=3600, for_upload=False)[source]

Generate signed URL(s) asynchronously.

Return type:

str | list[str]

class sqlspec.storage.backends.fsspec.FSSpecBackend[source]

Bases: object

Storage backend using fsspec.

Implements ObjectStoreProtocol using fsspec for various protocols including HTTP, HTTPS, FTP, and cloud storage services.

All synchronous methods use the *_sync suffix for consistency with async methods.

__init__(uri, **kwargs)[source]

Initialize the fsspec-backed storage backend.

Parameters:
  • uri (str) – Filesystem URI (protocol://path).

  • **kwargs (Any) – Additional fsspec configuration options, including an optional base_path.

For cloud URIs (S3/GS/Azure) and file:// URIs, we derive a default base_path from the URI path when no explicit base_path is provided. When both URI and base_path are provided, they are combined (base_path is appended to URI-derived path).

Examples

  • FSSpecBackend("s3://bucket/prefix") -> base_path = "bucket/prefix"

  • FSSpecBackend("file:///home/user/storage") -> base_path = "/home/user/storage"

  • FSSpecBackend("file:///home/user", base_path="subdir") -> base_path = "/home/user/subdir"

read_bytes_sync(path, **kwargs)[source]

Read bytes from an object synchronously.

Return type:

bytes

write_bytes_sync(path, data, **kwargs)[source]

Write bytes to an object synchronously.

Return type:

None

read_text_sync(path, encoding='utf-8', **kwargs)[source]

Read text from an object synchronously.

Return type:

str

write_text_sync(path, data, encoding='utf-8', **kwargs)[source]

Write text to an object synchronously.

Return type:

None

exists_sync(path, **kwargs)[source]

Check if an object exists synchronously.

Return type:

bool

delete_sync(path, **kwargs)[source]

Delete an object synchronously.

Return type:

None

copy_sync(source, destination, **kwargs)[source]

Copy an object synchronously.

Return type:

None

move_sync(source, destination, **kwargs)[source]

Move an object synchronously.

Return type:

None

read_arrow_sync(path, **kwargs)[source]

Read an Arrow table from storage synchronously.

Return type:

Table

write_arrow_sync(path, table, **kwargs)[source]

Write an Arrow table to storage synchronously.

Return type:

None

list_objects_sync(prefix='', recursive=True, **kwargs)[source]

List objects with optional prefix synchronously.

Return type:

list[str]

glob_sync(pattern, **kwargs)[source]

Find objects matching a glob pattern synchronously.

Return type:

list[str]

is_object_sync(path)[source]

Check if path points to an object synchronously.

Return type:

bool

is_path_sync(path)[source]

Check if path points to a prefix (directory-like) synchronously.

Return type:

bool

get_metadata_sync(path, **kwargs)[source]

Get object metadata synchronously.

Return type:

dict[str, object]

property supports_signing: bool

Whether this backend supports URL signing.

FSSpec backends do not support URL signing. Use ObStoreBackend for S3, GCS, or Azure if you need signed URLs.

Returns:

Always False for fsspec backends.

sign_sync(paths, expires_in=3600, for_upload=False)[source]

Generate signed URL(s).

Raises:

NotImplementedError – fsspec backends do not support URL signing. Use obstore backend for S3, GCS, or Azure if you need signed URLs.

Return type:

str | list[str]

stream_read_sync(path, chunk_size=None, **kwargs)[source]

Stream bytes from storage synchronously.

Return type:

Iterator[bytes]

stream_arrow_sync(pattern, **kwargs)[source]

Stream Arrow record batches from storage synchronously.

Parameters:
  • pattern (str) – The glob pattern to match.

  • **kwargs (Any) – Additional arguments to pass to the glob method.

Yields:

Arrow record batches from matching files.

Return type:

Iterator[RecordBatch]

async read_bytes_async(path, **kwargs)[source]

Read bytes from storage asynchronously.

Return type:

bytes

async write_bytes_async(path, data, **kwargs)[source]

Write bytes to storage asynchronously.

Return type:

None

async stream_read_async(path, chunk_size=None, **kwargs)[source]

Stream bytes from storage asynchronously.

Uses asyncio.to_thread() to read chunks of the file in a thread pool, ensuring the event loop is not blocked while avoiding buffering the entire file into memory.

Parameters:
  • path (str | Path) – Path to the file to read.

  • chunk_size (int | None) – Size of chunks to yield (default: 65536 bytes).

  • **kwargs (Any) – Additional arguments passed to fs.open.

Return type:

AsyncIterator[bytes]

Returns:

AsyncIterator yielding chunks of bytes.

stream_arrow_async(pattern, **kwargs)[source]

Stream Arrow record batches from storage asynchronously.

Parameters:
  • pattern (str) – The glob pattern to match.

  • **kwargs (Any) – Additional arguments to pass to the glob method.

Return type:

AsyncIterator[RecordBatch]

Returns:

AsyncIterator yielding Arrow record batches.

async read_text_async(path, encoding='utf-8', **kwargs)[source]

Read text from storage asynchronously.

Return type:

str

async write_text_async(path, data, encoding='utf-8', **kwargs)[source]

Write text to storage asynchronously.

Return type:

None

async list_objects_async(prefix='', recursive=True, **kwargs)[source]

List objects in storage asynchronously.

Return type:

list[str]

async exists_async(path, **kwargs)[source]

Check if object exists in storage asynchronously.

Return type:

bool

async delete_async(path, **kwargs)[source]

Delete object from storage asynchronously.

Return type:

None

async copy_async(source, destination, **kwargs)[source]

Copy object in storage asynchronously.

Return type:

None

async move_async(source, destination, **kwargs)[source]

Move object in storage asynchronously.

Return type:

None

async get_metadata_async(path, **kwargs)[source]

Get object metadata from storage asynchronously.

Return type:

dict[str, object]

async sign_async(paths, expires_in=3600, for_upload=False)[source]

Generate signed URL(s) asynchronously.

Return type:

str | list[str]

async read_arrow_async(path, **kwargs)[source]

Read Arrow table from storage asynchronously.

Uses async_() with storage limiter to offload blocking PyArrow I/O to thread pool.

Return type:

Table

async write_arrow_async(path, table, **kwargs)[source]

Write Arrow table to storage asynchronously.

Uses async_() with storage limiter to offload blocking PyArrow I/O to thread pool.

Return type:

None

class sqlspec.storage.backends.obstore.ObStoreBackend[source]

Bases: object

Object storage backend using obstore.

Implements ObjectStoreProtocol using obstore’s Rust-based implementation for storage operations. Supports AWS S3, Google Cloud Storage, Azure Blob Storage, local filesystem, and HTTP endpoints.

All synchronous methods use the *_sync suffix for consistency with async methods.

__init__(uri, **kwargs)[source]

Initialize obstore backend.

Parameters:
  • uri (str) – Storage URI. Supported formats:

    - file:///absolute/path – Local filesystem

    - s3://bucket/prefix – AWS S3

    - gs://bucket/prefix – Google Cloud Storage

    - az://container/prefix – Azure Blob Storage

    - memory:// – In-memory storage (for testing)

  • **kwargs (Any) – Additional options:

    - base_path (str): For local files (file://), this is combined with the URI path to form the storage root. For example, uri="file:///data" + base_path="uploads" -> /data/uploads. If base_path is absolute, it overrides the URI path (backward compatible). For cloud storage, base_path is used as an object key prefix.

    - Other obstore configuration options (timeouts, credentials, etc.)

property is_local_store: bool

Return whether the backend uses local storage.

classmethod from_config(config)[source]

Create backend from configuration dictionary.

Return type:

ObStoreBackend

read_bytes_sync(path, **kwargs)[source]

Read bytes using obstore synchronously.

Return type:

bytes

write_bytes_sync(path, data, **kwargs)[source]

Write bytes using obstore synchronously.

Return type:

None

read_text_sync(path, encoding='utf-8', **kwargs)[source]

Read text using obstore synchronously.

Return type:

str

write_text_sync(path, data, encoding='utf-8', **kwargs)[source]

Write text using obstore synchronously.

Return type:

None

list_objects_sync(prefix='', recursive=True, **kwargs)[source]

List objects using obstore synchronously.

Return type:

list[str]

exists_sync(path, **kwargs)[source]

Check if object exists using obstore synchronously.

Return type:

bool

delete_sync(path, **kwargs)[source]

Delete object using obstore synchronously.

Return type:

None

copy_sync(source, destination, **kwargs)[source]

Copy object using obstore synchronously.

Return type:

None

move_sync(source, destination, **kwargs)[source]

Move object using obstore synchronously.

Return type:

None

glob_sync(pattern, **kwargs)[source]

Find objects matching pattern synchronously.

Lists all objects and filters them client-side using the pattern.

Return type:

list[str]

get_metadata_sync(path, **kwargs)[source]

Get object metadata using obstore synchronously.

Return type:

dict[str, object]

is_object_sync(path)[source]

Check if path is an object using obstore synchronously.

Return type:

bool

is_path_sync(path)[source]

Check if path is a prefix/directory using obstore synchronously.

Return type:

bool

read_arrow_sync(path, **kwargs)[source]

Read Arrow table using obstore synchronously.

Return type:

Table

write_arrow_sync(path, table, **kwargs)[source]

Write Arrow table using obstore synchronously.

Return type:

None

stream_read_sync(path, chunk_size=None, **kwargs)[source]

Stream bytes using obstore’s native streaming synchronously.

Uses obstore’s sync streaming iterator which yields chunks without loading the entire file into memory, for both local and remote backends.

Return type:

Iterator[bytes]

stream_arrow_sync(pattern, **kwargs)[source]

Stream Arrow record batches using obstore’s native streaming synchronously.

For each matching file, streams data through a buffered wrapper that PyArrow can read directly without loading the entire file.

Return type:

Iterator[RecordBatch]

property supports_signing: bool

Whether this backend supports URL signing.

Only S3, GCS, and Azure backends support pre-signed URLs. Local file storage does not support URL signing.

Returns:

True if the protocol supports signing, False otherwise.

sign_sync(paths, expires_in=3600, for_upload=False)[source]

Generate signed URL(s) for the object(s).

Parameters:
  • paths (str | list[str]) – Single object path or list of paths to sign.

  • expires_in (int) – URL expiration time in seconds (default: 3600, max: 604800 = 7 days).

  • for_upload (bool) – Whether the URL is for upload (PUT) vs download (GET).

Return type:

str | list[str]

Returns:

Single signed URL string if paths is a string, or list of signed URLs if paths is a list. Preserves input type for convenience.

async read_bytes_async(path, **kwargs)[source]

Read bytes from storage asynchronously.

Return type:

bytes

async write_bytes_async(path, data, **kwargs)[source]

Write bytes to storage asynchronously.

Return type:

None

async stream_read_async(path, chunk_size=None, **kwargs)[source]

Stream bytes from storage asynchronously.

Uses obstore’s native async streaming to yield chunks of bytes without buffering the entire file into memory.

Return type:

AsyncIterator[bytes]

async list_objects_async(prefix='', recursive=True, **kwargs)[source]

List objects in storage asynchronously.

Return type:

list[str]

async read_text_async(path, encoding='utf-8', **kwargs)[source]

Read text from storage asynchronously.

Return type:

str

async write_text_async(path, data, encoding='utf-8', **kwargs)[source]

Write text to storage asynchronously.

Return type:

None

async exists_async(path, **kwargs)[source]

Check if object exists in storage asynchronously.

Return type:

bool

async delete_async(path, **kwargs)[source]

Delete object from storage asynchronously.

Return type:

None

async copy_async(source, destination, **kwargs)[source]

Copy object in storage asynchronously.

Return type:

None

async move_async(source, destination, **kwargs)[source]

Move object in storage asynchronously.

Return type:

None

async get_metadata_async(path, **kwargs)[source]

Get object metadata from storage asynchronously.

Return type:

dict[str, object]

async read_arrow_async(path, **kwargs)[source]

Read Arrow table from storage asynchronously.

Uses async_() with storage limiter to offload blocking PyArrow I/O to thread pool.

Return type:

Table

async write_arrow_async(path, table, **kwargs)[source]

Write Arrow table to storage asynchronously.

Uses async_() with storage limiter to offload blocking PyArrow serialization to thread pool, preventing event loop blocking.

Return type:

None

stream_arrow_async(pattern, **kwargs)[source]

Stream Arrow record batches from storage asynchronously.

Parameters:
  • pattern (str) – Glob pattern to match files.

  • **kwargs (Any) – Additional arguments passed to stream_arrow_sync().

Return type:

AsyncIterator[RecordBatch]

Returns:

AsyncIterator yielding Arrow record batches.

async sign_async(paths, expires_in=3600, for_upload=False)[source]

Generate signed URL(s) asynchronously.

Parameters:
  • paths (str | list[str]) – Single object path or list of paths to sign.

  • expires_in (int) – URL expiration time in seconds (default: 3600, max: 604800 = 7 days).

  • for_upload (bool) – Whether the URL is for upload (PUT) vs download (GET).

Return type:

str | list[str]

Returns:

Single signed URL string if paths is a string, or list of signed URLs if paths is a list. Preserves input type for convenience.


Module Functions

sqlspec.storage.create_storage_bridge_job(status, telemetry)[source]

Create a storage bridge job handle with a unique identifier.

Return type:

StorageBridgeJob

sqlspec.storage.get_storage_bridge_diagnostics()[source]

Return aggregated storage bridge + serializer cache metrics.

Return type:

dict[str, float]

sqlspec.storage.get_storage_bridge_metrics()[source]

Return aggregated storage bridge metrics.

Return type:

dict[str, int]

sqlspec.storage.reset_storage_bridge_metrics()[source]

Reset aggregated storage bridge metrics.

Return type:

None

sqlspec.storage.resolve_storage_path(path, base_path='', protocol='file', strip_file_scheme=True)[source]

Resolve path relative to base_path with protocol-specific handling.

Parameters:
  • path (str | Path) – Path to resolve (may include file:// scheme).

  • base_path (str) – Base path to prepend if path is relative.

  • protocol (str) – Storage protocol (file, s3, gs, etc.).

  • strip_file_scheme (bool) – Whether to strip file:// prefix.

Return type:

str

Returns:

Resolved path string suitable for the storage backend.

Examples

>>> resolve_storage_path("/data/file.txt", protocol="file")
'data/file.txt'
>>> resolve_storage_path(
...     "file.txt", base_path="/base", protocol="file"
... )
'base/file.txt'
>>> resolve_storage_path(
...     "file:///data/file.txt", strip_file_scheme=True
... )
'data/file.txt'
>>> resolve_storage_path(
...     "/data/subdir/file.txt",
...     base_path="/data",
...     protocol="file",
... )
'subdir/file.txt'