DuckDB

Sync DuckDB adapter with full Arrow integration, extension management, and secret configuration. DuckDB excels at analytical workloads and can query Parquet, CSV, and JSON files directly.

Configuration

class sqlspec.adapters.duckdb.DuckDBConfig[source]

Bases: SyncDatabaseConfig[DuckDBPyConnection, DuckDBConnectionPool, DuckDBDriver]

DuckDB configuration with connection pooling.

This configuration supports DuckDB’s features including:

  • Connection pooling

  • Extension management and installation

  • Secret management for API integrations

  • Auto configuration settings

  • Arrow integration

  • Direct file querying capabilities

  • Configurable type handlers for JSON serialization and UUID conversion

DuckDB Connection Pool Configuration: - Default pool size is 1-4 connections (DuckDB uses single connection by default) - Connection recycling is set to 24 hours by default (set to 0 to disable) - Shared memory databases use :memory:shared_db for proper concurrency

Type Handler Configuration via driver_features: - json_serializer: Custom JSON serializer for dict/list parameters.

Defaults to sqlspec.utils.serializers.to_json if not provided. Example: json_serializer=msgspec.json.encode(…).decode(‘utf-8’)

  • enable_uuid_conversion: Enable automatic UUID string conversion (default: True). When True, UUID strings in query results are automatically converted to UUID objects. When False, UUID strings are treated as regular strings.

Example

>>> import msgspec
>>> from sqlspec.adapters.duckdb import DuckDBConfig
>>>
>>> # Custom JSON serializer
>>> def custom_json(obj):
...     return msgspec.json.encode(obj).decode("utf-8")
>>>
>>> config = DuckDBConfig(
...     connection_config={"database": ":memory:"},
...     driver_features={
...         "json_serializer": custom_json,
...         "enable_uuid_conversion": False,
...     },
... )
driver_type

alias of DuckDBDriver

__init__(*, connection_config=None, connection_instance=None, migration_config=None, statement_config=None, driver_features=None, bind_key=None, extension_config=None, observability_config=None, **kwargs)[source]

Initialize DuckDB configuration.

Parameters:
  • connection_config (DuckDBPoolParams | dict[str, typing.Any] | None) – Connection and pool configuration parameters

  • connection_instance (DuckDBConnectionPool | None) – Pre-created pool instance

  • migration_config (dict[str, typing.Any] | None) – Migration configuration

  • statement_config (StatementConfig | None) – Statement configuration override

  • driver_features (DuckDBDriverFeatures | dict[str, typing.Any] | None) – DuckDB-specific driver features including json_serializer and enable_uuid_conversion options

  • bind_key (str | None) – Optional unique identifier for this configuration

  • extension_config (dict[str, dict[str, Any] | LitestarConfig | FastAPIConfig | StarletteConfig | FlaskConfig | ADKConfig | EventsConfig | OpenTelemetryConfig | PrometheusConfig] | None) – Extension-specific configuration (e.g., Litestar plugin settings)

  • observability_config (ObservabilityConfig | None) – Adapter-level observability overrides for lifecycle hooks and observers

  • **kwargs (Any) – Additional keyword arguments passed to the base configuration.

create_connection()[source]

Get a DuckDB connection from the pool.

This method ensures the pool is created and returns a connection from the pool. The connection is checked out from the pool and must be properly managed by the caller.

Returns:

A connection from the pool

Return type:

DuckDBConnection

Note

For automatic connection management, prefer using provide_connection() or provide_session() which handle returning connections to the pool. The caller is responsible for returning the connection to the pool using pool.release(connection) when done.

provide_connection(*args, **kwargs)[source]

Provide a pooled DuckDB connection context manager.

Parameters:
  • *args (Any) – Additional arguments.

  • **kwargs (Any) – Additional keyword arguments.

Return type:

DuckDBConnectionContext

Returns:

A DuckDB connection context manager.

provide_session(*_args, statement_config=None, **_kwargs)[source]

Provide a DuckDB driver session context manager.

Parameters:
  • *_args (Any) – Additional arguments.

  • statement_config (StatementConfig | None) – Optional statement configuration override.

  • **_kwargs (Any) – Additional keyword arguments.

Return type:

DuckDBSessionContext

Returns:

A DuckDB driver session context manager.

get_signature_namespace()[source]

Get the signature namespace for DuckDB types.

This provides all DuckDB-specific types that Litestar needs to recognize to avoid serialization attempts.

Return type:

dict[str, typing.Any]

Returns:

Dictionary mapping type names to types.

get_event_runtime_hints()[source]

Return polling defaults optimized for DuckDB.

Return type:

EventRuntimeHints

class sqlspec.adapters.duckdb.config.DuckDBExtensionConfig[source]

Bases: TypedDict

DuckDB extension configuration for auto-management.

name: str

Name of the extension to install/load.

version: NotRequired[str]

Specific version of the extension.

repository: NotRequired[str]

Repository for the extension (core, community, or custom URL).

force_install: NotRequired[bool]

Force reinstallation of the extension.

class sqlspec.adapters.duckdb.config.DuckDBSecretConfig[source]

Bases: TypedDict

DuckDB secret configuration for AI/API integrations.

secret_type: str

Type of secret (e.g., ‘openai’, ‘aws’, ‘azure’, ‘gcp’).

name: str

Name of the secret.

value: dict[str, Any]

Secret configuration values.

scope: NotRequired[str]

Scope of the secret (LOCAL or PERSISTENT).

Driver

class sqlspec.adapters.duckdb.DuckDBDriver[source]

Bases: SyncDriverAdapterBase

Synchronous DuckDB database driver.

Provides SQL statement execution, transaction management, and result handling for DuckDB databases. Supports multiple parameter styles including QMARK, NUMERIC, and NAMED_DOLLAR formats.

The driver handles script execution, batch operations, and integrates with the sqlspec.core modules for statement processing and caching.

__init__(connection, statement_config=None, driver_features=None)[source]

Initialize driver adapter with connection and configuration.

Parameters:
  • connection (DuckDBPyConnection) – Database connection instance

  • statement_config (StatementConfig | None) – Statement configuration for the driver

  • driver_features (dict[str, typing.Any] | None) – Driver-specific features like extensions, secrets, and connection callbacks

  • observability – Optional runtime handling lifecycle hooks, observers, and spans

dispatch_execute(cursor, statement)[source]

Execute single SQL statement with data handling.

Executes a SQL statement with parameter binding and processes the results. Handles both data-returning queries and data modification operations.

Parameters:
  • cursor (Any) – DuckDB cursor object

  • statement (SQL) – SQL statement to execute

Return type:

ExecutionResult

Returns:

ExecutionResult with execution metadata

dispatch_execute_many(cursor, statement)[source]

Execute SQL with multiple parameter sets using batch processing.

Uses DuckDB’s executemany method for batch operations and calculates row counts for both data modification and query operations.

Parameters:
  • cursor (Any) – DuckDB cursor object

  • statement (SQL) – SQL statement with multiple parameter sets

Return type:

ExecutionResult

Returns:

ExecutionResult with batch execution metadata

dispatch_execute_script(cursor, statement)[source]

Execute SQL script with statement splitting and parameter handling.

Parses multi-statement scripts and executes each statement sequentially with the provided parameters.

Parameters:
  • cursor (Any) – DuckDB cursor object

  • statement (SQL) – SQL statement with script content

Return type:

ExecutionResult

Returns:

ExecutionResult with script execution metadata

begin()[source]

Begin a database transaction.

Return type:

None

commit()[source]

Commit the current transaction.

Return type:

None

rollback()[source]

Rollback the current transaction.

Return type:

None

with_cursor(connection)[source]

Create context manager for DuckDB cursor.

Parameters:

connection (DuckDBPyConnection) – DuckDB connection instance

Return type:

DuckDBCursor

Returns:

DuckDBCursor context manager instance

handle_database_exceptions()[source]

Handle database-specific exceptions and wrap them appropriately.

Return type:

DuckDBExceptionHandler

Returns:

Exception handler with deferred exception pattern for mypyc compatibility.

select_to_arrow(statement, /, *parameters, statement_config=None, return_format='table', native_only=False, batch_size=None, arrow_schema=None, **kwargs)[source]

Execute query and return results as Apache Arrow (DuckDB native path).

DuckDB provides native Arrow support via cursor.arrow(). This is the fastest path due to DuckDB’s columnar architecture.

Parameters:
  • statement – SQL statement, string, or QueryBuilder

  • *parameters – Query parameters or filters

  • statement_config – Optional statement configuration override

  • return_format – “table” for pyarrow.Table (default), “batch” for RecordBatch, “batches” for list of RecordBatch, “reader” for RecordBatchReader

  • native_only – Ignored for DuckDB (always uses native path)

  • batch_size – Batch size hint (for future streaming implementation)

  • arrow_schema – Optional pyarrow.Schema for type casting

  • **kwargs – Additional keyword arguments

Returns:

ArrowResult with native Arrow data

Example

>>> result = driver.select_to_arrow(
...     "SELECT * FROM users WHERE age > ?", 18
... )
>>> df = result.to_pandas()  # Fast zero-copy conversion
select_to_storage(statement, destination, /, *parameters, statement_config=None, partitioner=None, format_hint=None, telemetry=None, **kwargs)[source]

Persist DuckDB query output to a storage backend using Arrow fast paths.

load_from_arrow(table, source, *, partitioner=None, overwrite=False, telemetry=None)[source]

Load Arrow data into DuckDB using temporary table registration.

load_from_storage(table, source, *, file_format, partitioner=None, overwrite=False)[source]

Read an artifact from storage and load it into DuckDB.

Return type:

StorageBridgeJob

property data_dictionary: DuckDBDataDictionary

Get the data dictionary for this driver.

Returns:

Data dictionary instance for metadata queries

collect_rows(cursor, fetched)[source]

Collect DuckDB rows for the direct execution path.

Return type:

tuple[list[typing.Any], list[str], int]

resolve_rowcount(cursor)[source]

Resolve rowcount from DuckDB cursor for the direct execution path.

Return type:

int

Connection Pool

class sqlspec.adapters.duckdb.DuckDBConnectionPool[source]

Bases: object

Thread-local connection manager for DuckDB.

Uses thread-local storage to ensure each thread gets its own DuckDB connection, preventing the thread-safety issues that cause segmentation faults when multiple cursors share the same connection concurrently.

This design trades traditional pooling for thread safety, which is essential for DuckDB since connections and cursors are not thread-safe.

__init__(connection_config, pool_recycle_seconds=86400, health_check_interval=30.0, extensions=None, extension_flags=None, secrets=None, on_connection_create=None)[source]

Initialize the thread-local connection manager.

Parameters:
  • connection_config – DuckDB connection configuration

  • pool_recycle_seconds – Connection recycle time in seconds

  • health_check_interval – Seconds of idle time before running health check

  • extensions – List of extensions to install/load

  • extension_flags – Connection-level SET statements applied after creation

  • secrets – List of secrets to create

  • on_connection_create – Callback executed when connection is created

get_connection()[source]

Get a thread-local connection.

Each thread gets its own dedicated DuckDB connection to prevent thread-safety issues with concurrent cursor operations.

For file-based databases, the connection is closed when the context manager exits to release DuckDB’s file lock, allowing subsequent connections with different configurations.

For in-memory databases, connections are kept alive to preserve data, as in-memory data is lost when the last connection closes.

Yields:

DuckDBConnection – A thread-local connection.

Return type:

Generator[DuckDBPyConnection, None, None]

close()[source]

Close the thread-local connection if it exists.

Return type:

None

size()[source]

Get current pool size (always 1 for thread-local).

Return type:

int

checked_out()[source]

Get number of checked out connections (always 0 for thread-local).

Return type:

int

acquire()[source]

Acquire a thread-local connection.

Each thread gets its own dedicated DuckDB connection to prevent thread-safety issues with concurrent cursor operations.

Returns:

A thread-local connection

Return type:

DuckDBConnection

release(connection)[source]

Release a connection (no-op for thread-local connections).

Parameters:

connection (DuckDBPyConnection) – The connection to release (ignored)

Return type:

None