DuckDB¶
Sync DuckDB adapter with full Arrow integration, extension management, and secret configuration. DuckDB excels at analytical workloads and can query Parquet, CSV, and JSON files directly.
Configuration¶
- class sqlspec.adapters.duckdb.DuckDBConfig[source]¶
Bases:
SyncDatabaseConfig[DuckDBPyConnection,DuckDBConnectionPool,DuckDBDriver]DuckDB configuration with connection pooling.
This configuration supports DuckDB’s features including:
Connection pooling
Extension management and installation
Secret management for API integrations
Auto configuration settings
Arrow integration
Direct file querying capabilities
Configurable type handlers for JSON serialization and UUID conversion
DuckDB Connection Pool Configuration: - Default pool size is 1-4 connections (DuckDB uses single connection by default) - Connection recycling is set to 24 hours by default (set to 0 to disable) - Shared memory databases use :memory:shared_db for proper concurrency
Type Handler Configuration via driver_features: - json_serializer: Custom JSON serializer for dict/list parameters.
Defaults to sqlspec.utils.serializers.to_json if not provided. Example: json_serializer=msgspec.json.encode(…).decode(‘utf-8’)
enable_uuid_conversion: Enable automatic UUID string conversion (default: True). When True, UUID strings in query results are automatically converted to UUID objects. When False, UUID strings are treated as regular strings.
Example
>>> import msgspec >>> from sqlspec.adapters.duckdb import DuckDBConfig >>> >>> # Custom JSON serializer >>> def custom_json(obj): ... return msgspec.json.encode(obj).decode("utf-8") >>> >>> config = DuckDBConfig( ... connection_config={"database": ":memory:"}, ... driver_features={ ... "json_serializer": custom_json, ... "enable_uuid_conversion": False, ... }, ... )
- driver_type¶
alias of
DuckDBDriver
- __init__(*, connection_config=None, connection_instance=None, migration_config=None, statement_config=None, driver_features=None, bind_key=None, extension_config=None, observability_config=None, **kwargs)[source]¶
Initialize DuckDB configuration.
- Parameters:
connection_config¶ (
DuckDBPoolParams|dict[str, typing.Any] |None) – Connection and pool configuration parametersconnection_instance¶ (
DuckDBConnectionPool|None) – Pre-created pool instancemigration_config¶ (
dict[str, typing.Any] |None) – Migration configurationstatement_config¶ (
StatementConfig|None) – Statement configuration overridedriver_features¶ (
DuckDBDriverFeatures|dict[str, typing.Any] |None) – DuckDB-specific driver features including json_serializer and enable_uuid_conversion optionsbind_key¶ (
str|None) – Optional unique identifier for this configurationextension_config¶ (
dict[str,dict[str,Any] |LitestarConfig|FastAPIConfig|StarletteConfig|FlaskConfig|ADKConfig|EventsConfig|OpenTelemetryConfig|PrometheusConfig] |None) – Extension-specific configuration (e.g., Litestar plugin settings)observability_config¶ (
ObservabilityConfig|None) – Adapter-level observability overrides for lifecycle hooks and observers**kwargs¶ (
Any) – Additional keyword arguments passed to the base configuration.
- create_connection()[source]¶
Get a DuckDB connection from the pool.
This method ensures the pool is created and returns a connection from the pool. The connection is checked out from the pool and must be properly managed by the caller.
- Returns:
A connection from the pool
- Return type:
DuckDBConnection
Note
For automatic connection management, prefer using provide_connection() or provide_session() which handle returning connections to the pool. The caller is responsible for returning the connection to the pool using pool.release(connection) when done.
- provide_session(*_args, statement_config=None, **_kwargs)[source]¶
Provide a DuckDB driver session context manager.
- get_signature_namespace()[source]¶
Get the signature namespace for DuckDB types.
This provides all DuckDB-specific types that Litestar needs to recognize to avoid serialization attempts.
- class sqlspec.adapters.duckdb.config.DuckDBExtensionConfig[source]¶
Bases:
TypedDictDuckDB extension configuration for auto-management.
Driver¶
- class sqlspec.adapters.duckdb.DuckDBDriver[source]¶
Bases:
SyncDriverAdapterBaseSynchronous DuckDB database driver.
Provides SQL statement execution, transaction management, and result handling for DuckDB databases. Supports multiple parameter styles including QMARK, NUMERIC, and NAMED_DOLLAR formats.
The driver handles script execution, batch operations, and integrates with the sqlspec.core modules for statement processing and caching.
- __init__(connection, statement_config=None, driver_features=None)[source]¶
Initialize driver adapter with connection and configuration.
- Parameters:
connection¶ (
DuckDBPyConnection) – Database connection instancestatement_config¶ (
StatementConfig|None) – Statement configuration for the driverdriver_features¶ (
dict[str, typing.Any] |None) – Driver-specific features like extensions, secrets, and connection callbacksobservability¶ – Optional runtime handling lifecycle hooks, observers, and spans
- dispatch_execute(cursor, statement)[source]¶
Execute single SQL statement with data handling.
Executes a SQL statement with parameter binding and processes the results. Handles both data-returning queries and data modification operations.
- dispatch_execute_many(cursor, statement)[source]¶
Execute SQL with multiple parameter sets using batch processing.
Uses DuckDB’s executemany method for batch operations and calculates row counts for both data modification and query operations.
- dispatch_execute_script(cursor, statement)[source]¶
Execute SQL script with statement splitting and parameter handling.
Parses multi-statement scripts and executes each statement sequentially with the provided parameters.
- with_cursor(connection)[source]¶
Create context manager for DuckDB cursor.
- Parameters:
connection¶ (
DuckDBPyConnection) – DuckDB connection instance- Return type:
DuckDBCursor- Returns:
DuckDBCursor context manager instance
- handle_database_exceptions()[source]¶
Handle database-specific exceptions and wrap them appropriately.
- Return type:
DuckDBExceptionHandler- Returns:
Exception handler with deferred exception pattern for mypyc compatibility.
- select_to_arrow(statement, /, *parameters, statement_config=None, return_format='table', native_only=False, batch_size=None, arrow_schema=None, **kwargs)[source]¶
Execute query and return results as Apache Arrow (DuckDB native path).
DuckDB provides native Arrow support via cursor.arrow(). This is the fastest path due to DuckDB’s columnar architecture.
- Parameters:
statement¶ – SQL statement, string, or QueryBuilder
*parameters¶ – Query parameters or filters
statement_config¶ – Optional statement configuration override
return_format¶ – “table” for pyarrow.Table (default), “batch” for RecordBatch, “batches” for list of RecordBatch, “reader” for RecordBatchReader
native_only¶ – Ignored for DuckDB (always uses native path)
batch_size¶ – Batch size hint (for future streaming implementation)
arrow_schema¶ – Optional pyarrow.Schema for type casting
**kwargs¶ – Additional keyword arguments
- Returns:
ArrowResult with native Arrow data
Example
>>> result = driver.select_to_arrow( ... "SELECT * FROM users WHERE age > ?", 18 ... ) >>> df = result.to_pandas() # Fast zero-copy conversion
- select_to_storage(statement, destination, /, *parameters, statement_config=None, partitioner=None, format_hint=None, telemetry=None, **kwargs)[source]¶
Persist DuckDB query output to a storage backend using Arrow fast paths.
- load_from_arrow(table, source, *, partitioner=None, overwrite=False, telemetry=None)[source]¶
Load Arrow data into DuckDB using temporary table registration.
- load_from_storage(table, source, *, file_format, partitioner=None, overwrite=False)[source]¶
Read an artifact from storage and load it into DuckDB.
- Return type:
- property data_dictionary: DuckDBDataDictionary¶
Get the data dictionary for this driver.
- Returns:
Data dictionary instance for metadata queries
Connection Pool¶
- class sqlspec.adapters.duckdb.DuckDBConnectionPool[source]¶
Bases:
objectThread-local connection manager for DuckDB.
Uses thread-local storage to ensure each thread gets its own DuckDB connection, preventing the thread-safety issues that cause segmentation faults when multiple cursors share the same connection concurrently.
This design trades traditional pooling for thread safety, which is essential for DuckDB since connections and cursors are not thread-safe.
- __init__(connection_config, pool_recycle_seconds=86400, health_check_interval=30.0, extensions=None, extension_flags=None, secrets=None, on_connection_create=None)[source]¶
Initialize the thread-local connection manager.
- Parameters:
connection_config¶ – DuckDB connection configuration
pool_recycle_seconds¶ – Connection recycle time in seconds
health_check_interval¶ – Seconds of idle time before running health check
extensions¶ – List of extensions to install/load
extension_flags¶ – Connection-level SET statements applied after creation
secrets¶ – List of secrets to create
on_connection_create¶ – Callback executed when connection is created
- get_connection()[source]¶
Get a thread-local connection.
Each thread gets its own dedicated DuckDB connection to prevent thread-safety issues with concurrent cursor operations.
For file-based databases, the connection is closed when the context manager exits to release DuckDB’s file lock, allowing subsequent connections with different configurations.
For in-memory databases, connections are kept alive to preserve data, as in-memory data is lost when the last connection closes.
- checked_out()[source]¶
Get number of checked out connections (always 0 for thread-local).
- Return type: