Bulk Ingest

SQLSpec exposes native bulk-ingest fast paths through a small, adapter-agnostic storage-bridge API. High-volume writes use each driver's database primitive -- COPY, LOAD DATA LOCAL INFILE, direct path load, BulkCopy, Arrow ingest, load jobs, or mutations -- instead of generic row-by-row execution.

The API

Three methods cover the common shapes. They share return type StorageBridgeJob (its telemetry dict reports rows_processed):

  • load_from_arrow(table, source, *, overwrite=False) -- load an Arrow table (or anything coercible to one) using the adapter's native ingest path.

  • load_from_storage(table, source, *, file_format, overwrite=False) -- load a staged artifact (a local path or cloud URI) into a table.

  • load_from_records(table, records, *, columns=None, overwrite=False) -- load in-memory rows. records may be mappings (columns derived from the keys) or positional sequences (columns required). Records are normalized and routed through the adapter's native load_from_arrow path, so every adapter that supports Arrow ingest supports records too.

# dict records -- columns inferred from keys
driver.load_from_records("orders", [{"id": 1, "total": 9.99}, {"id": 2, "total": 4.50}])

# positional records -- columns required
driver.load_from_records("orders", [(3, 1.0), (4, 2.0)], columns=["id", "total"])

Empty input, mismatched mapping keys, or a positional/column width mismatch raise ImproperConfigurationError.

Capability matrix

Adapter

Native ingest path

Transactionality

Gate / opt-in

asyncpg

COPY (copy_records_to_table)

Atomic; exact row counts

Always on

psycopg (sync/async)

COPY streaming write_row

Atomic; exact row counts

Always on

psqlpy

Binary COPY with INSERT fallback

Atomic

Always on

adbc

adbc_ingest (append/replace)

Driver-dependent; FlightSQL falls back to per-row

Always on

duckdb

register + INSERT ... SELECT

Single connection transaction

Always on

sqlite / aiosqlite

executemany inside one BEGIN IMMEDIATE

Atomic when the driver owns the transaction; rolls back on error

Always on

oracledb

direct path load (Thin mode, default); executemany fallback

Per execute_many; array-DML row counts available

enable_direct_path_load=False to force fallback; oracle_batch_errors / oracle_array_dml_row_counts execution args

MySQL family (pymysql, asyncmy, aiomysql, mysql-connector)

executemany (default); LOAD DATA LOCAL INFILE (opt-in)

Server-managed

enable_local_infile_bulk_load + connection local_infile / allow_local_infile

bigquery

Parquet load job (default); Arrow Storage Write API (opt-in)

All-or-nothing load job / PENDING write stream

enable_storage_write_api; load retry/timeout via job-control features

spanner

Transaction.insert_or_update mutations (upsert); Batch Write API (opt-in)

In-transaction (default); independently committed groups (Batch Write)

Always on; enable_batch_write_api for high-throughput groups

mssql-python

cursor.bulkcopy() via load_from_arrow

Driver-managed

Always on

arrow_odbc

bulk_insert_arrow via load_from_arrow

Driver-managed

Always on

Security and opt-in paths

Some fast paths are opt-in because they read local files or change semantics:

  • MySQL ``LOAD DATA LOCAL INFILE`` requires both the adapter feature enable_local_infile_bulk_load and the connection's local-infile setting (local_infile=True for pymysql/aiomysql/asyncmy, allow_local_infile=True for mysql-connector). Enabling the feature without the connection gate raises ImproperConfigurationError at config construction. The MySQL server must also have local_infile enabled. mysql-connector additionally honors allow_local_infile_in_path -- the staged temp file must live under that directory when it is set.

  • Oracle direct path load is the default bulk-ingest transport in Thin mode. Set enable_direct_path_load=False to force executemany. Connections that do not expose the Direct Path Load API, including Thick-mode connections, silently fall back to executemany.

  • BigQuery Storage Write API (enable_storage_write_api) streams Arrow rows for load_from_arrow appends and falls back to the Parquet load job when the Storage client is unavailable; overwrite=True always uses a Parquet WRITE_TRUNCATE load job.

  • Spanner Batch Write API (enable_batch_write_api) routes load_from_arrow through Database.mutation_groups().batch_write() for high-throughput, independently committed insert_or_update groups instead of a single in-transaction flush. The upsert semantics keep each group idempotent on replay.

Examples

MySQL LOAD DATA LOCAL INFILE:

from sqlspec.adapters.pymysql import PyMysqlConfig

config = PyMysqlConfig(
    connection_config={"host": "localhost", "local_infile": True},
    driver_features={"enable_local_infile_bulk_load": True},
)
with config.provide_session() as driver:
    driver.load_from_arrow("orders", arrow_table)

Oracle per-call batch error and array-DML row-count reporting:

statement_config = driver.statement_config.replace(
    execution_args={"oracle_batch_errors": True, "oracle_array_dml_row_counts": True}
)
result = driver.execute_many(
    "INSERT INTO orders (id, total) VALUES (:1, :2)", rows, statement_config=statement_config
)
failures = result.metadata["oracle_batch_errors"]      # list of {offset, code, message}
row_counts = result.metadata["oracle_dml_row_counts"]  # per-statement affected rows

Note

Spanner load_from_arrow uses insert_or_update mutations, so re-running the same rows is an idempotent upsert rather than a primary-key collision.