Phillip LeBlanc

Co-Founder and CTO of Spice AI

Spice v2.0.1 (Jun 17, 2026)

June 17, 2026 · 4 min read

Spice v2.0.1 is now available! 🛠️

Spice v2.0.1 is a patch release focused on reliability and performance. It speeds up Apache Iceberg reads and fixes bugs across AWS S3 and object-store datasets, data acceleration, distributed query, and authenticated access.

What's New in v2.0.1

Faster Iceberg Reads with Parallel File Scanning

The Apache Iceberg reader now scans data files in parallel (#11331), improving read throughput and latency for Iceberg tables that span many files.

AWS S3 & Object-Store Reliability

Three fixes improve S3 and object-store dataset behavior:

Refresh-skip restored (#11339): ETag/Version-based refresh-skip works reliably again, so unchanged S3 objects are no longer re-downloaded on every refresh.
Retry when source files are not yet available (#11342): an object-store dataset whose source files are not present at startup now retries and becomes ready once the data appears, instead of failing permanently.
Path-style addressing for dotted bucket names (#11347): on standard AWS, buckets whose names contain dots now default to path-style addressing, avoiding TLS wildcard certificate errors under virtual-hosted-style HTTPS.
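To illustrate the dotted-bucket case in the last item, here is a minimal Python sketch of the general idea, not Spice's implementation. It assumes the standard `s3.<region>.amazonaws.com` endpoint form and a certificate issued for `*.s3.<region>.amazonaws.com`; the helper names `wildcard_matches` and `s3_url` are hypothetical. A TLS wildcard covers exactly one DNS label, so a virtual-hosted hostname built from a bucket name with dots falls outside the certificate, while a path-style URL keeps the bucket name out of the hostname.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if a single-label wildcard such as '*.example.com' covers hostname.

    Simplified: real TLS validation has further rules (for example, the
    wildcard may only appear in the leftmost label).
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # '*' stands for exactly one label, never several
    return all(p == "*" or p == h for p, h in zip(p_labels, h_labels))


def s3_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Build an HTTPS URL, choosing path-style when the bucket name contains dots."""
    endpoint = f"s3.{region}.amazonaws.com"
    if "." in bucket:
        return f"https://{endpoint}/{bucket}/{key}"  # path-style
    return f"https://{bucket}.{endpoint}/{key}"  # virtual-hosted-style
```

With the certificate pattern `*.s3.us-east-1.amazonaws.com`, the hostname `logs.s3.us-east-1.amazonaws.com` matches, while `my.logs.s3.us-east-1.amazonaws.com` does not, which is why the dotted bucket is addressed by path instead.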

Data Acceleration & Distributed Query Fixes

Two fixes ensure accelerated datasets behave correctly in more configurations:

Acceleration endpoints (#11345): /v1/datasets/{name}/acceleration/refresh (and the related update-refresh-sql, partition-filters, and snapshots endpoints) now work for all accelerated datasets, fixing cases where some incorrectly reported "Table is not accelerated".
Distributed clusters (#11226): the distributed query coordinator now serves accelerated data from executors for all accelerated datasets, instead of falling back to reading from the source for some.

Authenticated Query Fixes

With authentication enabled, queries now consistently run as the requesting user (#11253), so per-user behavior such as results caching is correctly scoped to each user.

Breaking Changes

No breaking changes.

Cookbook Updates

No new cookbook recipes.

The Spice Cookbook includes more than 100 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v2.0.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:2.0.1 image:

docker pull spiceai/spiceai:2.0.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai --version 2.0.1

AWS Marketplace:

Spice is available in the AWS Marketplace.

What's Changed

Changelog

fix(ci): key testoperator & validator artifacts by checked-out commit by @sgrebnov in #11281
chore(deps): fix cargo-deny advisory failures on release/2.0 by @phillipleblanc in #11333
chore(deps): bump iceberg-rust to parallel file scanning fork (release/2.0) by @phillipleblanc in #11331
fix(refresh): restore S3 ETag/Version refresh-skip behind provider wrappers by @phillipleblanc in #11339
fix(runtime): retry object-store dataset load when source files are not yet available by @phillipleblanc in #11342
feat(s3): default to path-style for dotted bucket names on standard AWS by @phillipleblanc in #11347
fix(runtime): resolve accelerated table through metadata-enrichment wrapper by @phillipleblanc in #11345
fix(cluster): distribute accelerated tables wrapped by metadata/index providers by @phillipleblanc in #11226
fix: scope request context across the managed query runtime by @phillipleblanc in #11253

Full Changelog: https://github.com/spiceai/spiceai/compare/v2.0.0...v2.0.1

Spice v2.0-stable (Jun 5, 2026)

June 5, 2026 · 94 min read

After 53 releases since Spice 1.0-stable, Spice.ai OSS has reached the 2.0-stable milestone! 🎉

Spice v2.0.0 is the next major release of Spice, advancing it from a single-node engine into a distributed data and query platform built for enterprise AI agents. These agents need low-latency, governed access to data spread across many production systems. Because they generate their own queries autonomously, that access has to be sandboxed, observable, and able to absorb occasional heavy analytical queries without overwhelming the underlying systems.

The release is headlined by multi-node distributed query, now generally available. It is multi-active, highly available, and object-store-native, built on Apache Ballista, and it distributes both query execution and ingestion across executors with data-local routing and per-executor statistics for distributed join planning. Alongside it, the Spice Cayenne data accelerator is generally available, built on the Vortex compressed columnar format, with a high-throughput CDC write path, MERGE INTO, SQL-defined partitioning, inline writes, a dedicated compaction runtime, and write-path statistics for distributed join sizing. The engine also moves to DataFusion v52 with sort pushdown, a rewritten merge join, and dynamic filters, and the Spice CLI is rewritten in Rust as a single self-contained binary.

v2.0 also expands real-time and write-path capabilities across the platform: native CDC from MongoDB Change Streams and PostgreSQL WAL logical replication, durable Kafka CDC offsets, DML write-back for PostgreSQL, Snowflake, DynamoDB, Arrow, and DuckLake, DDL and MERGE INTO for Iceberg catalogs, mutual TLS across server endpoints and outbound connectors, HashiCorp Vault and Azure Key Vault secret stores, user-defined functions, hybrid search with Elasticsearch and DuckDB HNSW vector indexes, provider-aware LLM prompt caching, and the Responses API across all model providers.

Highlights in v2.0.0 include:

Spice Cayenne (GA) — generally available on the Vortex compressed columnar format, with WAL-staged writes, inline low-latency writes, fast-path CDC deletes, merge-on-read position deletes, composite & SQL-defined partitioning, MERGE INTO, dedicated compaction runtime, and join-sizing statistics maintained on the write path
Multi-Active HA Distributed Query (GA) — multi-node distributed query built on Apache Ballista, with object-store-native clustering, dynamic cluster sizing, distributed ingestion, data-local query routing, per-executor table statistics for distributed join planning, and async queries via /v1/queries
Mutual TLS (mTLS) — public mTLS for HTTP and Flight, TLS cert hot-reload, and mTLS client certificates for FlightSQL and Spice.ai connectors
Enterprise Authentication & Authorization — OIDC bearer-token verification and Cedar-based authorization policy with per-principal row- and column-level filtering
New Secret Stores — HashiCorp Vault and Azure Key Vault
CDC Sources — native MongoDB Change Streams, PostgreSQL WAL logical replication, and durable Kafka CDC offsets — no Debezium or Kafka middleware required
DML & DDL — INSERT/UPDATE/DELETE write-back for PostgreSQL, Snowflake, DynamoDB, and Arrow; CREATE TABLE/DROP TABLE and MERGE INTO for Iceberg catalogs
User-Defined Functions — SQL UDFs in spicepods, remote UDFs over HTTP, and optional geospatial ST_* UDFs
On-Demand Dataset Loading & Unified Query Cancellation — faster startup and end-to-end cancellation across HTTP, Flight, FlightSQL, and MCP
Dynamic HTTP Connector — OAuth2 refresh tokens, pagination, dynamic headers, subquery-driven parameters, and rate-control state persisted across restarts
Storage-Profile Accelerator Tuning & refresh_mode: snapshot — storage-aware acceleration defaults and point-in-time snapshot acceleration
Search & Vectors — Elasticsearch data connector with native hybrid search, DuckDB HNSW vector engine with a statically linked VSS extension, multi-vector MaxSim embeddings, and a rerank() UDTF
AI & LLM — provider-aware prompt caching, Responses API across all providers, MCP Streamable HTTP transport, and a searchable LLM tool registry
New Data Connectors — Elasticsearch (Alpha), GCS (Alpha), Azure Cosmos DB (Alpha), Git (RC), ADBC, DuckLake (Beta), and catalog connectors for PostgreSQL, MySQL, MSSQL, and Snowflake
Rust CLI — single-binary spice CLI with spice query async REPL, shell completions, and --output=json
Dependency upgrades including DataFusion v52.5, DuckDB v1.5.3, Arrow v57.2, iceberg-rust v0.9.1, Turso v0.6.1, and Vortex v0.69

Spice v2.0 includes several breaking changes. Review the breaking changes section before upgrading.

Distribution Changes

AI/ML support, including local LLM/ML model inference and hosted LLM inference, is now included in the default Spice build and image. The separate models build variant has been removed.

With models now included by default, the data-only distribution (without AI/ML support) is only published in nightly builds. Official production-ready data-only distributions are available exclusively through Spice Cloud and the Enterprise release.

A new Network Attached Storage (NAS) distribution with built-in SMB and NFS data connector support is also available in nightly builds and with Spice.ai Enterprise.

Distribution / Variant	Open Source	Spice Cloud	Enterprise
Default	✅	✅	✅
Data	Nightly only	✅	✅
NAS (SMB + NFS)	Nightly only	❌	✅
Metal (macOS)	✅	✅	✅
CUDA (Linux)	Nightly only	✅	✅
Allocator variants	Nightly only	✅	✅
ODBC connector	Local build only	✅	✅

Native Windows builds are no longer provided; use WSL for local development. For more details, see the Distributions documentation.

What's New in v2.0.0

Spice Cayenne Reaches General Availability

The Spice Cayenne data accelerator is generally available in v2.0, with a major focus across the release candidates on write-path throughput, correctness, and distributed operation.

Write path & ingest:

Staged Append Writes: WAL-based staged append writes prevent partial writes and data loss on stream errors — batches commit atomically.
Inline Writes: Small writes are serialized as Arrow IPC and committed directly into the Cayenne metastore, bypassing the staged Vortex write path for low-latency ingest. Inline upserts atomically rewrite existing inline rows, inline data stays query-visible via an in-memory union scan, and rows are checkpointed to Vortex when thresholds are reached. Inline writes now also proceed with pending deletions in flight, and inline flush caps scale with available memory and storage class.
Fast-Path CDC Deletes: DELETE statements whose filters identify primary keys directly — including composite keys expressed as (k1, k2) IN ((...), (...)) — skip the table scan entirely.
Merge-On-Read Position Deletes: Primary-key upsert tables use position deletes with memory-pool accounting, avoiding full-table rewrites on update-heavy workloads.
Resident Upsert Keysets: CDC upsert primary-key keysets stay resident between batches, avoiding per-batch full-table rebuilds.
CDC Sub-Batch Efficiency: Interleaved upsert/delete workloads produce fewer sub-batch splits, with last-write-wins deduplication applied within batches.
Dedicated Compaction Runtime: Background compaction runs on a dedicated thread pool with CDC pipelining and protected snapshots, isolating compaction work from query and ingest paths.
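The last-write-wins deduplication mentioned under CDC Sub-Batch Efficiency can be sketched in a few lines of Python. This is an illustration of the general technique only, not Cayenne's implementation; the `(op, primary_key, row)` change shape and the function name are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Tuple

# A change is (op, primary_key, row); row is None for deletes. Hypothetical shape.
Change = Tuple[str, Hashable, Optional[dict]]


def dedupe_last_write_wins(batch: List[Change]) -> List[Change]:
    """Keep only the final change per primary key, ordered by last occurrence."""
    latest: Dict[Hashable, Change] = {}
    for change in batch:
        key = change[1]
        # Drop any earlier entry so the key moves to the end of the dict,
        # preserving the relative order of the surviving changes.
        latest.pop(key, None)
        latest[key] = change
    return list(latest.values())
```

Collapsing a batch this way means each primary key is applied once, regardless of how many interleaved upserts and deletes it received within the batch.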

Query & planning:

Join Filter Propagation: Filters propagate across equi-join keys, with range fallback for large join filters and IN-list rewrites.
Write-Path Join-Sizing Statistics: Cayenne maintains live row counts and HyperLogLog-based distinct-value estimates on the write path, so distributed JoinSelection can correctly size joins without rescans.
Scan-Result Cache: A new scan-result cache accelerates hot reads, with parallel Vortex partition writes and lock-free deletion caches with bloom-prefiltered probes.

SQL & catalog:

MERGE INTO: Upsert-style MERGE INTO for Cayenne catalog tables, distributed across executors in cluster mode.
PARTITION BY in SQL: Define partitioning directly in CREATE TABLE ... PARTITION BY (...); metadata is persisted in the catalog and survives restarts.
Composite Partitioning: partition_by: [col1, col2] with hierarchical path-like keys.
File-Based Retention Deletes: Time-based retention uses file-level deletes for both position-based and primary-key tables.

Correctness: Synchronized partition commits, correct NULL-sentinel handling for nullable partition expressions, tombstoned inline-checkpointed rows on upsert (preventing duplicate primary keys), and live reads through expired protected snapshots.

Multi-Active HA Distributed Query (GA)

Spice.ai Enterprise feature. See High Availability.

Distributed Query is generally available. Built on Apache Ballista, it distributes query execution across multiple active executor nodes with no single point of failure, reading directly from object storage rather than relying on a central cluster.

Distributed query supports two execution modes:

Synchronous: Queries for accelerated datasets are distributed across executors and results stream back in real time. Best for interactive, latency-sensitive queries.
Asynchronous: Queries submitted via the HTTP /v1/queries API materialize results to object storage for later retrieval. Best for long-running analytical and batch workloads.

Key capabilities:

Dynamic Cluster Sizing: The planner adjusts parallelism to the number of active executors as nodes join or leave.
Distributed Ingestion: Ingestion for partitioned accelerated tables is distributed across executors, with partition-aware write-through splitting scheduler-side Flight DoPut writes to the responsible executors.
Data-Local Query Routing: Cayenne catalog queries route to the executors holding the relevant partitions.
Per-Executor Table Statistics: Executors report table statistics — including NDV-aware estimates — so distributed JoinSelection can size joins correctly, fixing out-of-memory conditions on large semi-joins.
Readiness & Failure Detection: /v1/ready gates on a configurable executor quorum for safe rolling deployments; scheduler readiness additionally waits for executor partition loads; executor heartbeat timeout reduced from 180s to 30s.
Distributed DML & DDL: UPDATE/DELETE forwarding to all executors, executor DDL sync for late joiners, and distributed MERGE INTO.
Cluster Observability: New cluster metrics (including scheduler_active_executors_count), distributed runtime.task_history replication, and a Grafana dashboard.
Ballista S3 Shuffle: Async queries with runtime.params.shuffle_location: s3://... complete reliably with executor-environment-derived S3 clients.

Security: Mutual TLS, Secret Stores, and Hardening

Several capabilities in this section are Spice.ai Enterprise features. See Enterprise Security.

Mutual TLS across the platform:

Public mTLS for HTTP and Flight: client_auth_mode: request (optional, for migration windows) or required (strict) client-certificate verification.
TLS Cert Hot-Reload: The runtime reloads TLS certificates on SIGHUP for zero-downtime rotation.
Outbound mTLS Client Certificates: FlightSQL and Spice.ai data connectors present client certificates to upstream services; the spice sql REPL supports mTLS client auth.

runtime:
  tls:
    enabled: true
    certificate_file: /etc/spice/tls/server.crt
    key_file: /etc/spice/tls/server.key
    client_auth_mode: required
    client_auth_ca_file: /etc/spice/tls/client-ca.crt

Authentication & Authorization (Spice.ai Enterprise):

OIDC Authentication: Validate OIDC bearer tokens (JWTs) issued by enterprise identity providers — Microsoft Entra ID, Okta, Auth0, AWS Cognito, and Google — for secure access to runtime endpoints, standalone or combined with API keys.
Principal-Based Policy Enforcement: Fine-grained, Cedar-based authorization policy configured under runtime.authorization governs allow/deny access across datasets, models, tools, and endpoints. Combined with identity SQL functions (current_principal(), current_principal_email(), current_principal_groups()), policies enforce per-principal row-level filtering and column masking.

New Secret Stores: HashiCorp Vault (KV v1/v2; token, approle, kubernetes, and jwt auth with automatic lease renewal) and Azure Key Vault (service principal, managed identity, workload identity, Azure CLI, or auto-detect; sovereign cloud support).

Hardening:

Read-only API Key Enforcement on the Flight DoGet path and async query endpoints.
Per-Principal Cache Namespacing: SQL, search, and caching-accelerator caches are namespaced per authenticated principal so cached results never cross identity boundaries.
API Key Timing Leak & Remote-UDF SSRF: Closed a timing-based position-disclosure leak in API key comparison and blocked SSRF via remote UDF endpoints.
Snowflake Function Deny-List: A function deny-list is enforced in Snowflake federation pushdown, and Snowflake account identifiers and auth configuration are validated at startup.
MCP allowed_hosts: MCP servers can be restricted to an explicit allowlist of upstream hosts.
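For readers unfamiliar with the timing leak mentioned above, the following Python sketch contrasts an early-exit comparison with a constant-time one. The release notes do not describe how Spice closed the leak, so this shows only the general technique; both function names are hypothetical, and `hmac.compare_digest` is the Python standard-library helper for this purpose.

```python
import hmac


def naive_equal(provided: str, expected: str) -> bool:
    """Early-exit comparison: running time depends on the first mismatching position."""
    if len(provided) != len(expected):
        return False
    for a, b in zip(provided, expected):
        if a != b:
            return False  # returns sooner the earlier the mismatch occurs
    return True


def constant_time_equal(provided: str, expected: str) -> bool:
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

Both functions return the same answers; the difference is that an attacker timing `naive_equal` can learn how many leading characters of a guessed key are correct.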

Change Data Capture (CDC) Sources

See Change Data Capture (CDC) for an overview of CDC in Spice.

MongoDB Change Streams: MongoDB datasets with refresh_mode: changes stream changes natively into any local accelerator — no Debezium or Kafka required.
PostgreSQL Native Replication (WAL): PostgreSQL datasets stream INSERT/UPDATE/DELETE directly from logical replication using pgoutput decoding, with automatic per-replica slot management, an initial REPEATABLE READ bootstrap snapshot, and durable LSN acknowledgement.
Kafka CDC Offset Persistence: Kafka CDC offsets persist in sidecar tables for durable, resumable streams across restarts and failovers.
Pipelined CDC Ingestion: Source reads overlap with batch apply, with envelope coalescing and improved nullability propagation.
Debezium Schema Evolution: Schema changes in Debezium-sourced datasets no longer break dataset initialization on reload.

datasets:
  - from: postgres:my_table
    name: my_table
    params:
      pg_host: localhost
      pg_db: mydb
    acceleration:
      enabled: true
      engine: duckdb
      refresh_mode: changes

DML, DDL, and Write-Back

Spice v2.0 turns more connectors and catalogs into full read/write tables:

PostgreSQL DML: INSERT, UPDATE, and DELETE write-back on PostgreSQL datasets, with foreign-key metadata exposed via the PostgreSQL catalog connector.
Snowflake DML: INSERT, UPDATE, and DELETE write-back on Snowflake datasets.
DynamoDB DML: INSERT, UPDATE, and DELETE for DynamoDB, complementing read and CDC streaming.
Arrow Primary Key Upserts: Native update-or-insert semantics for in-memory Arrow-accelerated tables.
DDL for Iceberg: CREATE TABLE and DROP TABLE via FlightSQL and /v1/sql for Iceberg, with catalog.access: read_write_create.
DuckLake INSERT: DuckLake catalog tables with read_write access support INSERT.

SQL & User-Defined Functions

See the SQL Reference for the full SQL surface area.

User-Defined Functions: Define reusable SQL UDFs as first-class spicepod components, invoke remote functions over HTTP (Spice.ai Enterprise), or define user-defined table functions.
Spatial SQL UDFs: Optional geospatial ST_* UDFs for geometry workloads.
JSON UDTFs: flatten_json, json_tree, and flatten_json_properties table-valued functions for JSON transformation and schema decomposition (with options such as expand_maps). See JSON Functions and Operators.
PostgreSQL Metadata UDFs: Dataset and column descriptions are exposed via PostgreSQL-compatible UDFs (obj_description, col_description), so BI tools and psql surface Spice metadata.
FlightSQL Substrait Plans: CommandStatementSubstraitPlan support for clients submitting Substrait-encoded plans.
SQL REPL Expanded View: Toggle \x for a vertical key-value layout on wide result sets.
Prepared statement, federation, and unparsing fixes across the engine, including keeping correlated subqueries out of JOIN ON conditions for Spice Cloud federation and correct EXISTS/NOT EXISTS subquery handling in the federation analyzer.

Runtime Features

On-Demand Dataset Loading: Datasets can be deferred — registered with a declared schema at startup (columns[].type, columns[].nullable) and fully resolved on first reference, reducing startup time and memory for large spicepods.
Unified Query Cancellation: HTTP, Flight, FlightSQL, MCP, and internal execution paths honor a unified cancellation signal, so disconnects, REPL Ctrl-C, and cancelled HTTP requests cancel the query end-to-end.
Storage-Profile Accelerator Tuning: acceleration.storage_profile (auto, local_ssd, ebs, tmpfs) applies storage-aware defaults across DuckDB, SQLite, Turso, and Cayenne file-mode accelerators; auto detects the backing storage.
refresh_mode: snapshot (Spice.ai Enterprise): Point-in-time snapshot acceleration with SQLite/Turso WAL flushing and Cayenne metastore slice integration, now reporting accurate readiness when no snapshot exists yet.
Structured Component Errors: /v1/datasets?status=true and /v1/models?status=true return structured error objects (category, type, code) and human-readable error_message fields; the CLI shows an ERROR column.
Actionable Config Errors: Parameter typos, missing secret references, and unknown engine names produce specific, actionable errors with suggestions.

Spicepod v2

Spicepods now support version: v2, the default for spice init, while v1 spicepods continue to work with automatic migration of deprecated fields.

Version	Status
`v2`	Default. Used by `spice init`.
`v1`	Supported. Deprecated fields auto-migrate.
`v1beta1`	Removed. No longer accepted.

v1 (deprecated)	v2 (preferred)	Notes
`runtime.results_cache`	`runtime.caching.sql_results`	All fields migrate automatically. `cache_max_size` → `max_size`.
`runtime.memory_limit`	`runtime.query.memory_limit`	Auto-migrated. `query.memory_limit` takes priority if both set.
`runtime.temp_directory`	`runtime.query.temp_directory`	Auto-migrated. `query.temp_directory` takes priority if both set.
`dataset.invalid_type_action`	`dataset.unsupported_type_action`	Auto-migrated. v2 adds a new `string` variant.

New v2 fields include runtime.ready_state, runtime.query.spill_compression, runtime.caching.sql_results.stale_while_revalidate_ttl, runtime.caching.sql_results.encoding, scheduler partition-assignment configuration, and catalog.access: read_write_create.

Data Connectors & Catalogs

New connectors:

Elasticsearch (Alpha, Spice.ai Enterprise): Query Elasticsearch indexes as SQL tables with native hybrid search — vector_search() kNN, text_search() BM25, and rrf() fusion — plus Elasticsearch as a backing vector engine, direct FTS engine configuration, and index lifecycle controls.
GCS (Alpha): Federated queries against Google Cloud Storage, with Iceberg table support.
Azure Cosmos DB (Alpha): Read-only NoSQL / Core SQL API connector with cross-partition scans and schema inference.
Git (RC): HTTPS/SSH auth, Git LFS support, and per-repo connection resilience.
ADBC: Data connector and catalog with full query federation, BigQuery support, and schema/table discovery.
DuckLake (Beta): Lakehouse-style data management with DuckDB as the metadata catalog and object storage for data — ACID transactions, time travel, and schema evolution on Parquet.
Self-Hosted Spice Connector: Connect Spice to another self-hosted Spice runtime as a federated source.

New catalog connectors for PostgreSQL, MySQL, MSSQL, and Snowflake, using native metadata catalogs for schema and table discovery. Unity Catalog compatibility extends to OSS Unity Catalog deployments, and DDL-defined catalogs can expose and query views.

HTTP connector: OAuth2 refresh-token authentication, query-parameter and no-limit pagination, dynamic request headers parameterized from query predicates, subquery-driven request parameters for fan-out queries, response metadata as queryable columns, map-to-array conversion, shared and persistent rate-control state across restarts and replicas, no caching of transient 429/5xx errors, and a correctly populated fetched_at column.

JSON ingestion: Single-object documents, JSONL, BOM-prefixed input, Socrata SODA responses, format auto-detection, and RFC 6901 json_pointer extraction of nested payloads.
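As a reference for the json_pointer extraction above, here is a minimal Python sketch of RFC 6901 pointer resolution. It illustrates the standard itself rather than Spice's parser; the function name is hypothetical, and the array-index check is simplified (it rejects non-digit tokens but does not reject leading zeros).

```python
from typing import Any


def resolve_json_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer against parsed JSON (dicts and lists)."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape in the order the RFC requires: '~1' -> '/', then '~0' -> '~'.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not token.isdigit():
                raise ValueError(f"invalid array index: {token!r}")
            current = current[int(token)]  # array element
        else:
            current = current[token]  # object member
    return current
```

For a response shaped like `{"data": {"items": [...]}}`, the pointer `/data/items` selects the nested array, which is the kind of payload extraction the connector option describes.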

Databricks: Resilience controls, Unity Catalog-aware permission prechecks with structured advisory errors, Classic SQL Warehouse foreign-table compatibility, connect_timeout/client_timeout parameters, a Databricks SQL dialect for federation, and Delta Lake column mapping (Name and Id modes).

Other connector improvements: MongoDB SRV support; MySQL mysql_zero_date_behavior; Snowflake OBJECT, MAP, GEOGRAPHY, GEOMETRY, VECTOR, and TIMESTAMP_LTZ types plus key-pair auth; ClickHouse Date32; S3 s3_url_style for path-style addressing and faster Parquet reads; GraphQL custom auth headers; Oracle and MSSQL sort/limit pushdown; GitHub GraphQL resilience; and improved Kafka reliability.

AI & LLM

Provider-Aware Prompt Caching: LLM calls automatically use provider-side prompt caching (e.g., Anthropic, OpenAI) for system prompts and tool descriptions, reducing latency and cost.
Responses API Across All Providers: The Responses API works with every configured model provider, including streaming response.output_text.delta events and Authorization: Bearer header support.
Multi-Vector Embeddings with MaxSim: List-of-string columns produce one embedding per element with MaxSim/mean/sum scoring for ColBERT-style late-interaction retrieval, plus a _match column identifying the best-matching element.
rerank() UDTF: Reorder results from vector_search, text_search, or rrf using any registered chat model as a reranker, with automatic query propagation and pushdown support.
Searchable LLM Tool Registry: Agents discover tools via semantic search instead of enumerating every tool in the system prompt.
MCP Improvements: Streamable HTTP transport (/v1/mcp) on rmcp v1.5.0, native auth for streamable HTTP tools (mcp_auth_token, mcp_headers), external MCP server tool calls traced in task history, and configurable allowed_hosts.
Per-Model Rate-Limited AI UDF Execution for controlling concurrent AI function invocations.

Search & Vectors

DuckDB Vector Engine: vector_engine: duckdb uses DuckDB's HNSW index for fast approximate nearest-neighbor search without an external vector store. In v2.0.0, the DuckDB VSS extension is statically linked into the bundled DuckDB, so HNSW vector search works out-of-the-box on clean machines with no extension download. HNSW indexes are preserved across data refresh, and cosine_distance pushes down via array_cosine_distance.
Hybrid Search: Combine kNN vector search and BM25 full-text search with reciprocal rank fusion (rrf()), backed by Tantivy, Elasticsearch, or DuckDB.
Full-Text Search Performance: Significantly faster Tantivy ingestion with rollback-on-error, and search metadata is correctly preserved on indexing and in Vortex physical schema calculation.
Embedding Validation: row_id columns are validated during dataset initialization.
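The reciprocal rank fusion used by hybrid search can be sketched as follows. This is the textbook formulation, in which each result scores the sum of 1 / (k + rank) across the ranked lists it appears in; the constant k = 60 is a common convention and an assumption here, not a documented Spice default, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def reciprocal_rank_fusion(rankings: List[List[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over its appearances."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))
```

Given a vector ranking `["a", "b", "c"]` and a text ranking `["b", "d", "a"]`, document `b` ranks first because it sits near the top of both lists, followed by `a`, `d`, and `c`.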

Caching

Improvements across Caching:

Stale-While-Revalidate: runtime.caching.sql_results.stale_while_revalidate_ttl serves stale results while revalidating in the background.
Cache Encoding: Optional compression (e.g., zstd) for SQL results cache entries.
Retention Policies for cached query results, and improved CDC-driven cache invalidation (including view plan invalidation on updates).
Idle Cache Maintenance: Periodic maintenance drains invalidation predicates on idle caches, fixing unbounded memory growth in rarely-read caches.
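The stale-while-revalidate behavior above can be pictured with a toy cache. This Python sketch is not Spice's implementation: the class and method names are hypothetical, the stale window is assumed to begin once the regular TTL expires, and the background refresh is modeled as an explicit `revalidate` call so the flow is easy to follow.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class SwrCache:
    """Toy stale-while-revalidate cache with an injectable clock."""

    def __init__(self, ttl: float, stale_ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.stale_ttl = stale_ttl
        self.clock = clock
        self.entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, stored_at)
        self.pending: List[str] = []  # keys waiting for a background refresh

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        now = self.clock()
        if key in self.entries:
            value, stored_at = self.entries[key]
            age = now - stored_at
            if age <= self.ttl:
                return value  # fresh hit
            if age <= self.ttl + self.stale_ttl:
                if key not in self.pending:
                    self.pending.append(key)  # schedule revalidation
                return value  # serve the stale value without waiting
        value = load()  # miss, or too stale to serve: load synchronously
        self.entries[key] = (value, now)
        return value

    def revalidate(self, key: str, load: Callable[[], Any]) -> None:
        """Stand-in for the background task that refreshes a stale entry."""
        self.entries[key] = (load(), self.clock())
        if key in self.pending:
            self.pending.remove(key)
```

The point of the pattern is that a caller arriving just after expiry still gets an immediate answer, and only the refresh task pays the cost of re-running the query.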

Performance & Query Engine

Apache DataFusion is upgraded to v52.5 over the course of the release cycle, bringing:

Sort Pushdown to Scans: ~30x faster top-K queries on pre-sorted data; Parquet scans reverse row-group order for DESC on ASC-sorted files.
Rewritten Sort-Merge Join: Up to three orders of magnitude faster in pathological cases (e.g., TPC-H Q21: minutes → milliseconds).
Dynamic Filters: MIN/MAX aggregates and hash-join build sides prune files, row groups, and rows during execution.
Faster CASE Expressions, statistics caching, and prefix-aware list-files caching for faster planning.
TableProvider DELETE/UPDATE hooks and the RelationPlanner API for extensible SQL planning.
Strict Overflow Handling: try_cast_to errors on overflow instead of silently producing NULLs.
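To see why sort pushdown helps top-K queries, consider this small Python sketch. It is an analogy for the idea, not DataFusion code, and the helper names are hypothetical: when the input is already sorted, the first K rows are the answer and the scan can stop early, whereas unsorted input must be read in full.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List


def counting(rows: Iterable[int], counter: List[int]) -> Iterator[int]:
    """Yield rows while recording in counter[0] how many were read."""
    for row in rows:
        counter[0] += 1
        yield row


def top_k_unsorted(rows: Iterable[int], k: int) -> List[int]:
    """Smallest k rows of arbitrary input: every row has to be examined."""
    return heapq.nsmallest(k, rows)


def top_k_presorted(sorted_rows: Iterable[int], k: int) -> List[int]:
    """Smallest k rows of ascending input: stop after reading k rows."""
    return list(islice(sorted_rows, k))
```

Wrapping an input in `counting` shows the difference directly: the pre-sorted path reads K rows, while the unsorted path reads all of them.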

Additional engine work: default query memory limit raised from 70% to 90% with GreedyMemoryPool, partial aggregation optimization for FlightSQLExec, improved partitioned query planning, and metastore transaction support to prevent concurrent conflicts.

Rust CLI

The Spice CLI has been rewritten from Go to Rust as a single spice binary built from the same codebase as spiced, with full feature parity across 27+ commands.

spice query: Interactive REPL for async queries with multi-line SQL, progress indication, and cancellation.
spice dataset configure: Non-interactive flag-based configuration (--from, --description, --param KEY=VALUE, --set) alongside interactive prompts.
spice completions: Shell completion script generation.
--output=json: Machine-readable output for scripting; spice login --output adds env, json, and keychain modes.
spice init writes a yaml-language-server schema directive for IDE completions.

Observability

OpenTelemetry: Exporter fixes, authenticated metrics export, configurable metric name prefix (runtime.telemetry.metric_prefix), delta temporality by default, and OTLP resource attributes via runtime.telemetry.properties.
Query Metrics: The query_executions metric gains a datasets dimension for per-dataset query attribution.
Ingestion Metrics: rows_written, bytes_written, and dataset_acceleration_size_bytes for acceleration refresh and Flight DoPut/ADBC ingestion, and EXPLAIN ANALYZE metrics in FlightSQLExec.
Task History: Distributed task history in cluster mode and tracing for external MCP server tool calls.

Notable Bug Fixes

localpod synchronization: localpod child datasets correctly track parent refreshes when the parent uses the in-memory Arrow accelerator.
Spice Cloud federation: Correlated subqueries are kept out of JOIN ON conditions, fixing rejected federated queries.
refresh_mode: snapshot: No longer reports Ready with empty data when no snapshot exists.
Search metadata: Field and schema metadata preserved on search indexing and in Vortex physical schema calculation.
HTTP connector: fetched_at column is correctly populated.
Connector correctness: DynamoDB Streams transient-error retries and typed-NULL DML handling; ScyllaDB physical filter pushdown disabled to fix incorrect results; MSSQL TOP N pushdown; DuckDB DELETE/UPDATE on full and caching refresh modes; Turso checked arithmetic for timestamp conversions; ODBC queries no longer silently return 0 rows on failure; Flight GetFlightInfo/DoGet schema parity.

Dependency Updates

Dependency / Component	Version
DataFusion	v52.5
Ballista	v52
Arrow (arrow-rs)	v57.2
DuckDB	v1.5.3 (with statically linked VSS)
iceberg-rust	v0.9.1
Turso (libsql)	v0.6.1
Vortex	v0.69.0
delta_kernel	v0.18.2
rmcp (MCP)	v1.5.0
mistral.rs	v0.8.x (candle v0.10.1)
ADBC Core	v0.23
Rust toolchain	v1.94.1

Breaking Changes

Models included by default: The separate models build variant has been removed. Local LLM inference is always included in the default build and image.
Windows native builds removed: Use WSL for local development.
Spicepod version defaults to v2: spice init creates version: v2 spicepods. v1 remains supported with auto-migration; v1beta1 is no longer accepted.

Flattened runtime.scheduler configuration: The nested runtime.scheduler.partition_management block is flattened and renamed:

# Before
runtime:
  scheduler:
    partition_management:
      interval: 30s
      max_assignments_per_cycle: 16
      discovery_timeout: 10s

# After
runtime:
  scheduler:
    partition_assignment_interval: 30s
    max_assignments_per_interval: 16
    partition_discovery_timeout: 10s

S3 metadata columns renamed: location, last_modified, size → _location, _last_modified, _size.
Default query memory limit changed: Increased from 70% to 90%.
Metric renames: accelerated_refresh metrics renamed to acceleration_refresh; last_refresh_time gauge renamed to include the milliseconds unit.
DuckDB parameter rename: partitioned_write_flush_threshold → partitioned_write_flush_threshold_rows.
/v1/search API: Always returns an array in matches, even for single results.
/v1/evals API removed.
Perplexity model provider removed.
x.ai model endpoint: x.ai models exclusively use the /v1/responses endpoint.
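For the flattened runtime.scheduler configuration listed above, the rename can be sketched as a small transformation over an already-parsed spicepod. This is a minimal illustration, not part of Spice: the old and new key names come from the Before/After example, while the helper name flatten_scheduler and the assumption that the runtime section is available as a plain dictionary are hypothetical.

```python
# Hypothetical helper (not a Spice API): maps the old nested
# runtime.scheduler.partition_management keys to the flattened v2.0 names.
# The key names are taken from the Before/After example in these notes.
RENAMES = {
    "interval": "partition_assignment_interval",
    "max_assignments_per_cycle": "max_assignments_per_interval",
    "discovery_timeout": "partition_discovery_timeout",
}


def flatten_scheduler(runtime: dict) -> dict:
    """Return a copy of a parsed `runtime` mapping with the scheduler block flattened."""
    result = dict(runtime)
    if "scheduler" not in runtime:
        return result

    scheduler = dict(runtime["scheduler"])
    nested = scheduler.pop("partition_management", None) or {}
    for old_key, new_key in RENAMES.items():
        if old_key in nested:
            # Do not overwrite a flattened key that is already set explicitly.
            scheduler.setdefault(new_key, nested[old_key])

    result["scheduler"] = scheduler
    return result
```

The function leaves the input untouched and keeps any flattened key that is already present, so it is safe to run over a spicepod that has been partially updated by hand.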

Upgrade Guide from v1.x

Most v1 spicepods continue to work on v2.0: v1 remains supported and deprecated fields auto-migrate at load time, so many deployments can upgrade by updating the binary or image alone. The steps below cover the breaking changes that may require manual action. Review each before upgrading a production deployment.

1. Build, image, and platform changes

Models are now included by default. The separate models build variant (and the corresponding -models image tags) has been removed; local LLM inference is always included in the default build and image. If your deployment pinned a models build or -models-tagged image, switch to the default build/image.
Native Windows builds are removed. Use WSL for local Windows development.

2. Adopt Spicepod `v2` (recommended)

spice init now creates version: v2 spicepods. v1 spicepods remain supported with automatic migration, but v1beta1 is no longer accepted. To move to v2, set version: v2 and update the following fields. Each auto-migrates from v1, but updating now clears the deprecation:

v1 (deprecated)	v2 (preferred)
`runtime.results_cache`	`runtime.caching.sql_results` (`cache_max_size` → `max_size`)
`runtime.memory_limit`	`runtime.query.memory_limit`
`runtime.temp_directory`	`runtime.query.temp_directory`
`dataset.invalid_type_action`	`dataset.unsupported_type_action`
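The field moves in the table above can be sketched as a one-off migration over a parsed spicepod. This is an illustration under stated assumptions rather than the runtime's own auto-migration logic: the function name migrate_spicepod_v1_to_v2 is hypothetical, runtime.results_cache is assumed to be a mapping, and dataset.invalid_type_action is assumed to be a key on each entry of a top-level datasets list.

```python
import copy


def migrate_spicepod_v1_to_v2(pod: dict) -> dict:
    """Return a copy of a parsed v1 spicepod with the deprecated fields moved to v2 names.

    Hypothetical helper: the field paths follow the v1 -> v2 table in these notes.
    """
    pod = copy.deepcopy(pod)
    pod["version"] = "v2"

    runtime = pod.get("runtime", {})

    # runtime.results_cache -> runtime.caching.sql_results (cache_max_size -> max_size)
    if "results_cache" in runtime:
        cache = runtime.pop("results_cache")
        if "cache_max_size" in cache:
            cache["max_size"] = cache.pop("cache_max_size")
        runtime.setdefault("caching", {})["sql_results"] = cache

    # runtime.memory_limit / runtime.temp_directory -> runtime.query.*
    for key in ("memory_limit", "temp_directory"):
        if key in runtime:
            runtime.setdefault("query", {})[key] = runtime.pop(key)

    # dataset.invalid_type_action -> dataset.unsupported_type_action
    # (assumes datasets are listed under a top-level `datasets` key)
    for dataset in pod.get("datasets", []):
        if "invalid_type_action" in dataset:
            dataset["unsupported_type_action"] = dataset.pop("invalid_type_action")

    return pod
```

Working on a deep copy keeps the original mapping available for a before/after diff, which is useful when reviewing the change prior to committing the updated spicepod.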

3. Update changed configuration

DuckDB parameter rename: partitioned_write_flush_threshold → partitioned_write_flush_threshold_rows.
Default query memory limit raised from 70% to 90%. If you relied on the previous default to leave headroom for other processes on the host, set it explicitly via runtime.query.memory_limit.

4. Update queries and API clients

S3 metadata columns renamed: location, last_modified, size → _location, _last_modified, _size. Update any queries that reference these columns.
/v1/search always returns an array in matches, even for a single result. Update clients that assumed a scalar value.
/v1/evals API removed. Remove integrations that depend on it.
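For the /v1/search change above, a client that has to work against both older and newer runtimes during a rollout can normalize the matches value before using it. The sketch below is a hypothetical client-side helper, not part of any Spice SDK, and it deliberately takes only the matches value because the rest of the response shape is not described in these notes.

```python
def normalize_matches(matches):
    """Return `matches` as a list.

    Handles both shapes: a single scalar value (as some clients may have
    assumed before v2.0) and the array that v2.0 always returns.
    """
    if matches is None:
        return []
    if isinstance(matches, list):
        return matches
    return [matches]
```

Once every runtime is on v2.0 the helper becomes a no-op for well-formed responses and can be removed.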

5. Update model providers

Perplexity model provider removed. Re-point affected models to another provider.
x.ai models use the /v1/responses endpoint exclusively. Ensure x.ai integrations target the Responses API.

6. Update observability

Metric renames: accelerated_refresh → acceleration_refresh, and the last_refresh_time gauge is renamed to include the milliseconds unit. Update dashboards and alerts that reference these metric names.

After updating, restart the runtime and verify datasets and models report ready via /v1/datasets?status=true and /v1/models?status=true (the CLI shows a Ready/ERROR column).
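The post-upgrade check described above can be scripted. The following is a rough sketch under several assumptions that are not stated in these notes: that the runtime's HTTP endpoint is reachable at a base URL you supply, that the status endpoints return a JSON array of objects, and that each object carries name and status fields with a status of Ready for healthy components. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_status(base_url: str, kind: str) -> list:
    """Fetch /v1/datasets?status=true or /v1/models?status=true as parsed JSON.

    `kind` is "datasets" or "models". The response is assumed to be a JSON array.
    """
    with urlopen(f"{base_url}/v1/{kind}?status=true", timeout=10) as resp:
        return json.load(resp)


def not_ready(components: list) -> list:
    """Return the names of components whose status is not Ready (case-insensitive).

    Assumes each component is an object with `name` and `status` fields.
    """
    return [
        component.get("name", "<unnamed>")
        for component in components
        if str(component.get("status", "")).lower() != "ready"
    ]
```

For example, not_ready(fetch_status(base_url, "datasets")) returning an empty list would indicate that every dataset reported ready, where base_url points at the runtime's HTTP endpoint. Datasets that load large accelerations may need a retry loop around the check.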

Cookbook Updates

New Spice Cookbook recipes added during the v2.0 release cycle:

Async Queries: Submit long-running queries asynchronously and retrieve results later.
DuckLake Catalog: Lakehouse-style data management with ACID transactions and time travel.
Distributed Query: Run Spice in multi-active distributed cluster mode.
mTLS: Mutual TLS for HTTP and Flight endpoints.
Elasticsearch Connector: Query Elasticsearch indexes as SQL tables.
MCP Server: Use Spice as an MCP server over Streamable HTTP.
Snowflake DML: Write-back to Snowflake with INSERT/UPDATE/DELETE.
PostgreSQL, MySQL, and MSSQL Catalogs: Schema and table discovery for external databases.
Full-Text Search: BM25 full-text search over accelerated datasets.

The Spice Cookbook includes more than 100 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v2.0.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:2.0.0 image:

docker pull spiceai/spiceai:2.0.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai --version 2.0.0

AWS Marketplace:

Spice is available in the AWS Marketplace.

What's Changed

Changelog

Add TPC-DS integration tests with S3 source and PostgreSQL acceleration by @phillipleblanc in #9006
fix(tests): fix flaky/slow/failing unit tests by @phillipleblanc in #9009
fix: Update benchmark snapshots for DF51 upgrade by @app/github-actions in #9008
fix: add feature gate to rrf TEST_EMBEDDING_MODEL by @phillipleblanc in #9017
fix: features check by @phillipleblanc in #9014
fix: Enable Cayenne acceleration snapshots by @lukekim in #9020
URL table support by @lukekim in #9018
ScyllaDB key filter by @lukekim in #8997
fix: Schema mismatch when using column projection with HTTP caching by @phillipleblanc in #9021
Add more tests for HTTP caching with columns selection by @sgrebnov in #9025
HTTP cache snapshots: default to time_interval and fix snapshots_creation_policy: on_change by @sgrebnov in #9026
Fix duplicate snapshot creation on startup by @sgrebnov in #9029
Add ScyllaDB and SMB to the README table by @krinart in #9034
Remove waiting for runtime to be ready before creating snapshot by @krinart in #9033
Fix snapshot on_change policy to skip when no writes occurred by @sgrebnov in #9028
Release notes for release release/1.11.0-rc.2 by @krinart in #9016
ci: use arduino/setup-protoc for official protobuf compiler by @phillipleblanc in #9036
ci: install unzip on aarch64 runner for arduino/setup-protoc by @phillipleblanc in #9038
fix: don't fail release if upload to minio fails by @phillipleblanc in #9039
Add missing protoc step to setup-cc action by @krinart in #9041
fix: Update Search integration test snapshots by @app/github-actions in #9013
Fix formula_1 and codebase_community in bird-bench by @Jeadie in #9000
Cayenne S3 Express One Zone improvements by @lukekim in #9015
Add zlib1g-dev to CI by @lukekim in #9052
Improve validation and logging for hash indexes by @lukekim in #9047
Upgrade Vortex with CASE-WHEN by @lukekim in #9051
x.ai models now exclusively use /v1/responses endpoint by @lukekim in #9400
Improvements for snapshot schema comparison by @krinart in #9401
v2.0 breaking changes by @lukekim in #9233
Create PartitionManagementTask for scheduler to update accelerated table partition assignments by @Jeadie in #9378
refactor(Cayenne): route all write orchestration through CayenneDataSink by @sgrebnov in #9402
Refactor benchmark to use QueryExecutor trait by @Jeadie in #9418
feat: Add spidapter build and release workflow by @peasee in #9427
Testoperator: add support for api-key when connecting to external spice instance by @sgrebnov in #9421
Initial implementation of Ducklake catalog & data connectors by @lukekim in #9083
Require aws_lc_rs since jsonwebtoken upgrade by @Jeadie in #9426
feat: Add spidapter tool by @peasee in #9425
Add release notes for 1.11.2 patch release by @sgrebnov in #9430
feat(spidapter): integrate system-adapter-protocol with SCP provisioning by @phillipleblanc in #9434
Add DuckLake TPCH E2E workflow and federated Spicepod configuration by @lukekim in #9431
fix(spidapter): use Flight handshake auth instead of x-api-key header by @phillipleblanc in #9435
[spidapter] Keep only what sparks joy by @Jeadie in #9439
Refactor binary operator balancing by @Jeadie in #9424
feat: Add Iceberg DDL support (CREATE TABLE / DROP TABLE) for default catalog override by @phillipleblanc in #9440
Fix Flight SQL schema consistency: expand view types and verify field names by @sgrebnov in #9438
Update spidapter for new system-adapter-protocol by @sgrebnov in #9442
docs: fix typos and syntax errors in style guide and error handling docs by @cluster2600 in #9445
Add acceleration refresh ingestion metrics (rows_written, bytes_written) by @phillipleblanc in #9461
Refactor(Cayenne): Replace CatalogError and string based errors with Snafu errors by @sgrebnov in #9403
Replace deprecated claude-3-5-haiku-latest with claude-haiku-4-5 by @Jeadie in #9492
Fix #9481: Preserve schema in results cache for empty query results by @phillipleblanc in #9485
Fix partition by serializing by @Jeadie in #9474
query: reconcile execution stream nullability with logical plan schema by @phillipleblanc in #9486
initial spice-cloud-client crate and spice cloud metrics --app <app-name>. by @Jeadie in #9480
feat: Return dataset error message in datasets API by @peasee in #9487
Spicebench by @lukekim in #9447
build(deps): consolidate dependabot dependency updates by @phillipleblanc in #9504
fix(cluster): route non-partitioned accelerated tables in distributed mode by @phillipleblanc in #9508
Enable core scalar UDFs in refresh SQL by @sgrebnov in #9502
Fix metrics in Spidapter again by @Jeadie in #9497
fix(cluster): tolerate Completed->status propagation race in distributed query handle by @phillipleblanc in #9510
feat: Support distributed ingestion in cayenne catalog by @peasee in #9506
Fix Cayenne duplicate primary keys after DELETE + UPSERT CDC sequences by @krinart in #9494
fix(cluster): rewrite table scans inside subqueries for distributed execution by @phillipleblanc in #9518
fix: Set catalog mode to readwritecreate in spidapter by @peasee in #9519
Upgrade AWS SDK crates & set APN user-agent in AWS SDK credential bridge by @lukekim in #8328
feat(runtime): add runtime ready_state on_registration semantics by @lukekim in #9522
fix: Add spidapter post-setup retries by @peasee in #9526
Make partition discovery more robust and make initialization non-blocking by @sgrebnov in #9499
Make lint-rust-fix support targeted packages and features by @Jeadie in #9511
Handle new Cloud SCP API by @Jeadie in #9532
Refactor and simplify streaming benchmarks by @krinart in #9405
fix: ensure spidapter only increments attempts on failures by @peasee in #9534
feat: Support specifying app resources in spidapter by @peasee in #9536
test(runtime): Spice Cayenne DDL integration test by @lukekim in #9535
fix: Handle schema evolution mismatch errors during data refresh by @lukekim in #9527
fix: resolve clippy lint warnings by @phillipleblanc in #9547
pr-builds --tag <TAG> for build_and_release.yml by @Jeadie in #9507
Add --output flag to spice login with env/json/keychain modes by @Jeadie in #9541
Don't use 'PartitionedTableScanRewrite' in async distributed query by @Jeadie in #9548
feat(spidapter): add local backend mode with single executor by @phillipleblanc in #9531
support chat template in HF by @Jeadie in #9543
fix(cayenne): stream PK retention deletes and run OOM regression in CI by @phillipleblanc in #9533
cayenne: Staged append writes to prevent partial writes and data loss on stream error by @sgrebnov in #9491
AcceleratedTable::scan use FederatedTable::scan when ClusterRole::Scheduler by @Jeadie in #9550
Upgrade to delta-kernel-rs v0.18.2 by @lukekim in #9528
Run cayenne tests as part of PR CI by @sgrebnov in #9554
Upgrade to DataFusion v52.2.0 by @lukekim in #9419
Remove Snapshot Compaction + Add snapshot existence check by @krinart in #9523
Update dependencies by @lukekim in #9566
fix: Update benchmark snapshots by @app/github-actions in #9565
fix: Compare Cayenne table configuration on startup by @peasee in #9529
Make Refresh::refresh_sql more robust to alterations over time. by @Jeadie in #9549
fix: Update datafusion-table-providers dependency to latest revision by @lukekim in #9574
Unset AWS_ENDPOINT_URL when empty by @krinart in #9575
fix: allow BytesProcessedExec repartitioning for unordered input by @lukekim in #9540
Sanitize DataFusion errors by @lukekim in #9530
Add conditional logging for partition assignments by @Jeadie in #9577
use 'properly early exit on SIGTERM' by @Jeadie in #9573
Update datafusion to 52.2.0 by @phillipleblanc in #9582
Ensure we query one and only one partition per request by @Jeadie in #9416
feat: Add support for Spicepod version v2 by @lukekim in #9583
[SpiceDQ] Improve error messages; Avoid race condition on allocate_initial_partitions. by @Jeadie in #9579
Update ballista dependencies to latest 52.0.0 revision by @lukekim in #9581
Fix Databricks spark_connect mode always disabled by @phillipleblanc in #9586
Support partitioning in Arrow accelerator by @Jeadie in #9571
Fix spice query CLI response deserialization by @phillipleblanc in #9588
fix: Update benchmark snapshots by @app/github-actions in #9584
fix: Share RuntimeEnv across Cayenne read/write/delete paths for targeted list_files_cache invalidation by @sgrebnov in #9589
feat: Add file:// state_location support for async queries scheduler by @phillipleblanc in #9590
Update endgame links by @krinart in #9598
ci: fix E2E CLI upgrade test to use latest release for spiced download by @phillipleblanc in #9613
fix(DF): Lazily initialize BatchCoalescer in RepartitionExec to avoid schema type mismatch by @sgrebnov in #9623
feat: Implement catalog connectors for various databases by @lukekim in #9509
Refactor and clean up code across multiple crates by @lukekim in #9620
fix: Improve error handling for distributed mode and state_location configuration by @lukekim in #9611
Properly install postgres in install-postgres action by @krinart in #9629
fix: Use Python venv for schema validation in CI by @phillipleblanc in #9637
Update spicepod.schema.json by @app/github-actions in #9640
Update testoperator dispatch to use release/2.0 branch by @phillipleblanc in #9641
fix: Align CUDA asset names in Dockerfile and install tests with build output by @phillipleblanc in #9639
Fix expect test scripts in E2E Installation AI test by @sgrebnov in #9643
testoperator for partitioned arrow accelerator by @Jeadie in #9635
Remove default 1s refresh_check_interval from spidapter for hive datasets by @phillipleblanc in #9645
Fix scheduler panic and cancel race condition by @phillipleblanc in #9644
Align Spice.ai connector parameter names across catalog/data connectors by @lukekim in #9632
docs: update distribution details and add NAS support in release notes by @lukekim in #9650
Enable postgres-accel in CI builds for benchmarks by @sgrebnov in #9649
perf: Cache Turso metastore connection across operations by @penberg in #9646
Add 'scheduler_state_location' to spidapter by @Jeadie in #9655
Implement Cayenne S3 Express multi-zone live test with data validation by @lukekim in #9631
chore(spidapter): bump default memory limit from 8Gi to 32Gi by @phillipleblanc in #9661
perf: Use prepare_cached() in Turso and SQLite metastore backends by @penberg in #9662
Improve CDC cache invalidation by @krinart in #9651
Refactor Cayenne IDs to use UUIDv7 strings by @lukekim in #9667
fix: add liveness check for dead executors in partition routing by @Jeadie in #9657
fix(s3): Fix metadata column schema mismatches in projected queries by @sgrebnov in #9664
s3_metadata_columns tests: include test for location outside table prefix by @sgrebnov in #9676
docs: Update DuckDB, GCS, Git connector and Cayenne documentation by @lukekim in #9671
Add s3_url_style support for S3 connector URL addressing by @phillipleblanc in #9642
Consolidate E2E workflows and require WSL for Windows runtime by @lukekim in #9660
Upgrade to Rust v1.93.1 by @lukekim in #9669
Security fixes and improvements by @lukekim in #9666
feat(flight): add DoPut rows/bytes written metrics for DoPut ETL ingestion tracking by @phillipleblanc in #9663
Skip caching http error response + add response_headers by @krinart in #9670
refactor: Remove v1/evals functionality by @Jeadie in #9420
Make a test harness for Distributed Spice integration tests by @Jeadie in #9615
Enable on_zero_results: use_source for views by @krinart in #9699
fix(spidapter): Lower memory limit, passthrough AWS secrets, override flight URL by @peasee in #9704
Show an error on a shared acceleration file with snapshots enabled by @krinart in #9698
Fixes for anthropic by @Jeadie in #9707
Use max_partitions_per_executor in allocate_initial_partitions by @Jeadie in #9659
[SpiceDQ] Accelerations must have partition key by @Jeadie in #9711
Upgrade to Turso v0.5 by @lukekim in #9628
feat: Rename metadata columns to _location, _last_modified, _size by @phillipleblanc in #9712
fix: bump datafusion-ballista to fix BatchCoalescer schema mismatch panic by @phillipleblanc in #9716
fix: Ensure Cayenne respects target file size by @peasee in #9730
refactor: Make DDL preprocessing generic from Iceberg DDL processing by @peasee in #9731
[SpiceDQ] Distribute query of Cayenne Catalog to executors with data by @Jeadie in #9727
Properly set primary_keys/on_conflict for Cayenne tables by @krinart in #9739
Add executor resource and replica support to cloud app config by @ewgenius in #9734
feat: Support PARTITION BY in Cayenne Catalog table creation by @peasee in #9741
Update datafusion and related packages to version 52.3.0 by @lukekim in #9708
Route FlightSQL statement updates through QueryBuilder by @phillipleblanc in #9754
JSON file format improvements by @lukekim in #9743
[SpiceDQ] Partition Cayenne catalogs writes through to executors by @Jeadie in #9737
Update to DF v52.3.0 versions of datafusion & datafusion-tableproviders by @lukekim in #9756
Make S3 metadata column handling more robust by @sgrebnov in #9762
Fetch API keys from dedicated endpoint instead of apps response by @phillipleblanc in #9767
Update arrow-rs, datafusion-federation, and datafusion-table-providers dependencies by @phillipleblanc in #9769
Chunk metastore batch inserts to respect SQLite parameter limits by @phillipleblanc in #9770
Improve JSON SODA support by @lukekim in #9795
Add ADBC Data Connector by @lukekim in #9723
docs: Release Cayenne as RC by @peasee in #9766
cli[feat]: cloud mode to use region-specific endpoints by @lukekim in #9803
Include updated JSON formats in HTTPS connector by @lukekim in #9800
Flight DoPut: Partition-aware write-through forwarding by @Jeadie in #9759
Pass through authentication to ADBC connector by @lukekim in #9801
Move scheduler_state_location from adapter metadata to env var by @phillipleblanc in #9802
Fix Cayenne DoPut upsert returning stale data after 3+ writes by @phillipleblanc in #9806
Fix JSON column projection producing schema mismatch by @sgrebnov in #9811
Fix http connector by @krinart in #9818
Fix ADBC Connector build and test by @lukekim in #9813
Support update & delete DML for distributed cayenne catalog by @Jeadie in #9805
Set allow_http param when S3 endpoint uses http scheme by @phillipleblanc in #9834
fix: Cayenne Catalog DDL requires a connected executor in distributed mode by @Jeadie in #9838
fix: Add conditional put support for file:// scheduler state location by @Jeadie in #9842
fix: Require the DDL primary key contain the partition key by @Jeadie in #9844
fix: Databricks SQL Warehouse schema retrieval with INLINE disposition and async retry by @lukekim in #9846
Filter pushdown improvements for SqlTable by @lukekim in #9852
feat: add iam_role_source parameter for AWS credential configuration by @lukekim in #9854
Fix ODBC queries silently returning 0 rows on query failure by @lukekim in #9864
feat(adbc): Add ADBC catalog connector with schema/table discovery by @lukekim in #9865
Make Turso SQL unparsing more robust and fix date comparisons by @lukekim in #9871
Fix Flight/FlightSQL filter precedence and mutable query consistency by @lukekim in #9876
Partial Aggregation optimisation for FlightSQLExec by @lukekim in #9882
fix: v1/responses API preserves client instructions when system_prompt is set by @Jeadie in #9884
feat: emit scheduler_active_executors_count and use it in spidapter by @Jeadie in #9885
feat: Add custom auth header support for GraphQL connector by @krinart in #9899
Add --endpoint flag to spice run with scheme-based routing by @lukekim in #9903
When executor connects, send DDL for existing tables by @Jeadie in #9904
fix: Improve ADBC driver shutdown handling and error classification by @lukekim in #9905
fix: require all executors to succeed for distributed DML (DELETE/UPDATE) forwarding by @Jeadie in #9908
fix(cayenne catalog): fix catalog refresh race condition causing duplicate primary keys by @Jeadie in #9909
Remove Perplexity support by @Jeadie in #9910
Fix refresh_sql support for debezium constraints by @krinart in #9912
Implement DML for DynamoDBTableProvider by @lukekim in #9915
chore: Update iceberg-rust fork to v0.9 by @lukekim in #9917
Run physical optimizer on FallbackOnZeroResultsScanExec fallback plan by @sgrebnov in #9927
Improve Databricks error message when dataset has no columns by @sgrebnov in #9928
Delta Lake: fix data skipping for >= timestamp predicates by @sgrebnov in #9932
fix: Ensure distributed Cayenne DML inserts are forwarded to executors by @Jeadie in #9948
Add full query federation support for ADBC data connector by @lukekim in #9953
Make time_format deserialization case-insensitive by @claudespice in #9955
Hash ADBC join-pushdown context to prevent credential leaks in EXPLAIN plans by @lukekim in #9956
fix: Normalize Arrow Dictionary types for DuckDB and SQLite acceleration by @sgrebnov in #9959
ADBC BigQuery: Improve BigQuery dialect date/time and interval SQL generation by @lukekim in #9967
Make BigQueryDialect more robust and add BigQuery TPC-H benchmark support by @lukekim in #9969
fix: Show proper unauthorized error instead of misleading runtime unavailable by @lukekim in #9972
fix: Enforce target_chunk_size as hard maximum in chunking by @lukekim in #9973
Add caching retention by @krinart in #9984
fix: improve Databricks schema error detection and messages by @lukekim in #9987
fix: Set default S3 region for opendal operator and fix cayenne nextest by @phillipleblanc in #9995
fix(PostgreSQL): fix schema discovery for PostgreSQL partitioned tables by @sgrebnov in #9997
fix: Defer cache size check until after encoding for compressed results by @krinart in #10001
fix: Rewrite numeric BETWEEN to CAST(AS REAL) for Turso by @lukekim in #10003
fix: Handle integer time columns in append refresh for all accelerators by @sgrebnov in #10004
fix: preserve s3a:// scheme when building OpenDalStorageFactory with custom endpoint by @phillipleblanc in #10006
Fix ISO8601 time_format with Vortex/Cayenne append refresh by @sgrebnov in #10009
fix: Address data correctness bugs found in audit by @sgrebnov in #10015
fix(federation): fix SQL unparsing for Inexact filter pushdown with alias by @lukekim in #10017
Improve GitHub connector ref handling and resilience by @lukekim in #10023
feat: Add spice completions command for shell completion generation by @lukekim in #10024
fix: Fix data correctness bugs in DynamoDB decimal conversion and GraphQL pagination by @sgrebnov in #10054
Implement RefreshDataset for distributed control stream by @Jeadie in #10055
perf: Improve S3 parquet read performance by @sgrebnov in #10064
fix: Prevent write-through stalls and preserve PartitionTableProvider during catalog refresh by @Jeadie in #10066
feat: spice completions auto-detects shell directory and writes file by @lukekim in #10068
fix: Bug in DynamoDB, GraphQL, and ISO8601 refresh data handling by @sgrebnov in #10063
fix partial aggregation deduplication on string checking by @lukekim in #10078
fix: add MetastoreTransaction support to prevent concurrent transaction conflicts by @phillipleblanc in #10080
fix: Use GreedyMemoryPool, add spidapter query memory limit arg by @phillipleblanc in #10082
feat: Add metrics for EXPLAIN ANALYZE in FlightSQLExec by @lukekim in #10084
Use strict cast in try_cast_to to error on overflow instead of silent NULL by @sgrebnov in #10104
feat: Implement MERGE INTO for Cayenne catalog tables by @peasee in #10105
feat: Add distributed MERGE INTO support for Cayenne catalog tables by @peasee in #10106
Improve JSON format auto-detection for single multi-line objects by @lukekim in #10107
Add mode: file_update acceleration mode by @krinart in #10108
Coerce unsupported Arrow types to Iceberg v2 equivalents in REST catalog API by @peasee in #10109
fix: Update default query memory limit to 90% from 70% by @phillipleblanc in #10112
feat: Add mTLS client auth support to spice sql REPL by @lukekim in #10113
fix(datafusion-federation): report error on overflow instead of silent NULL by @sgrebnov in #10124
fix: Prevent data loss in MERGE when source has duplicate keys by @peasee in #10126
feat: Add ClickHouse Date32 type support by @sgrebnov in #10132
Add Delta Lake column mapping support (Name/Id modes) by @sgrebnov in #10134
fix: Restore Turso numeric BETWEEN rewrite lost in DML revert by @lukekim in #10139
fix: Enable arm64 Linux builds with fp16 and lld workarounds by @lukekim in #10142
fix: remove double trailing slash in Unity Catalog storage locations by @sgrebnov in #10147
fix: Improve GitHub GraphQL client resilience and performance by @lukekim in #10151
Enable reqwest compression and optimize HTTP client settings by @lukekim in #10154
fix: executor startup failures by @Jeadie in #10155
feat: Distributed runtime.task_history support by @Jeadie in #10156
fix: Preserve timestamp timezone in DDL forwarding to executors by @peasee in #10159
feat: Per-model rate-limited concurrent AI UDF execution by @Jeadie in #10160
fix(Turso): Reject subquery/outer-ref filter pushdown in Turso provider by @lukekim in #10174
Fix linux/macos spice upgrade by @phillipleblanc in #10194
Improve CREATE TABLE LIKE error messages, success output, EXPLAIN, and validation by @peasee in #10203
fix: chunk MERGE delete filters and update Vortex for stack-safe IN-lists by @peasee in #10207
Propagate runtime.params.parquet_page_index to Delta Lake connector by @sgrebnov in #10209
Properly mark dataset as Ready on Scheduler by @Jeadie in #10215
fix: handle Utf8View/LargeUtf8 in GitHub connector ref filters by @lukekim in #10217
fix(databricks): Fix schema introspection and timestamp overflow by @lukekim in #10226
fix(databricks): Fix schema introspection failures for non-Unity-Catalog environments by @lukekim in #10227
feat: Add pagination support to HTTP data connector by @lukekim in #10228
feat(databricks): DESCRIBE TABLE fallback and source-native type parsing for Lakehouse Federation by @lukekim in #10229
fix(databricks): harden HTTP retries, compression, and token refresh by @lukekim in #10232
feat[helm chart]: Add support for ServiceAccount annotations and AWS IRSA example by @peasee in #9833
fix: Log warning and fall back gracefully on Cayenne config change by @krinart in #9092
fix: Handle engine mismatch gracefully in snapshot fallback loop by @krinart in #9187
fix: Full Text Search schema mismatch with ADBC connector by @lukekim in #10235
docs: Update v2.0.0-rc.2 release notes with latest changes by @lukekim in #10238
Fix append refresh dedup failure when refresh_sql selects column subset by @sgrebnov in #10225
Revert "Properly mark dataset as Ready on Scheduler (#10215)" by @sgrebnov in #10242
Fix failing merge conflicts for benchmarks by @krinart in #10247
fix(github): fetch commits for dynamic and slash refs by @lukekim in #10233
Upgrade DataFusion to v52.5.0-rc1 by @lukekim in #10249
Merge develop to trunk (2026-04-09) by @claudespice in #10248
fix: Validate embedding row_id columns during dataset init (fixes #8226) by @claudespice in #10208
fix: Update tpch benchmark snapshots for federated/glue[csv].yaml by @app/github-actions in #10244
feat(databricks): add resilience controls, UC awareness, and task history instrumentation by @lukekim in #10246
fix: Make PartitionManager resilient to bare vs fully qualified table references by @sgrebnov in #10257
fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-cayenne[file].yaml by @app/github-actions in #10256
Merge develop to trunk (2026-04-10) by @claudespice in #10251
Improve Snowflake/ADBC dataset registration performance and observability by @lukekim in #10266
Fixes for kafka connector by @krinart in #10263
fix(runtime): gate otel code tags, suppress aws sdk noise, and unblock connector init by @lukekim in #10260
fix(runtime): avoid regionless AWS SDK loads by @lukekim in #10271
Add versioned release install workflow coverage by @lukekim in #10276
fix(runtime): handle HTTP JSON unions and spicepod reloads by @lukekim in #10277
Databricks UC permission prechecks: explicit denial as permanent error, ambiguous cases advisory by @lukekim in #10274
Revert component status changes re-introduced by develop merge (#10248) by @sgrebnov in #10293
Fix broken CI workflows by @ewgenius in #10294
Group dependabot updates by ecosystem by @lukekim in #10296
fix(tests): Replace flaky S3 Vectors snapshot tests with structural validation by @lukekim in #10301
Update test_github_workflows snapshot by @lukekim in #10304
fix(ci): fix Bedrock runner mismatch and snapshot auto-merge failure by @ewgenius in #10306
feat(http): Add map-to-array conversion and query-parameter pagination by @lukekim in #10295
New crate: datafusion-ddl by @Jeadie in #10205
Make Databricks UC permission checks advisory with structured error reporting by @lukekim in #10283
build(deps): bump the github-actions-dependencies group with 4 updates by @app/dependabot in #10298
fix: Clear cached plans on view updates by @peasee in #10312
build(deps): bump the aws-sdk group with 7 updates by @app/dependabot in #10299
Code out of runtime. by @Jeadie in #10178
fix: Respect function registry denies for accelerated table filter pushdown by @peasee in #10311
fix: Don't block heartbeat when all slots acquired by @peasee in #10322
fix: strip only outer parens in get_table_partition_expr_from_ctx by @Jeadie in #10323
Upgrade datafusion-table-providers with MongoDB SRV support by @lukekim in #10317
fix: Avoid pushing down bucketing partition expressions into executors by @peasee in #10324
Upgrade datafusion-table-providers to d1b911a5 and bump adbc to 0.23 by @lukekim in #10329
fix: Update Search integration test snapshots by @app/github-actions in #10308
Handle foreign table + Classic sql warehouse combination gracefully by @krinart in #10318
New crate datafusion-flightsql by @Jeadie in #10201
Set tantivy=warn unless very verbose logging by @Jeadie in #10338
Remove image registry and image name options from spidapter by @ewgenius in #10241
build(deps): bump sysinfo from 0.37.2 to 0.38.4 by @app/dependabot in #10291
build(deps): bump futures from 0.3.31 to 0.3.32 by @app/dependabot in #10289
New crate 'datafusion-dml' by @Jeadie in #10334
Jeadie/26 04 16/spice sql by @Jeadie in #10343
Add Teraswitch/Pittsburgh apt mirrors + retry config for CI runners by @lukekim in #10349
Implement sort pushdown and fix pushdown gaps across providers by @lukekim in #10337
Merge develop to trunk (2026-04-16) by @claudespice in #10345
Update candle and mistral.rs lock-step pins by @lukekim in #10278
docs: fix status badges in README by @lukekim in #10350
Migrate secrets to vars by @krinart in #10354
Add limit pushdown and improve sort pushdown for Oracle and MSSQL by @sgrebnov in #10351
Fix ubuntu mirror configuration by @ewgenius in #10359
fix: Increase throughput test default ready_wait from 30s to 300s (fixes #8207) by @claudespice in #10344
Add auth headers support to OTEL metrics exporter by @lukekim in #10347
fix(github): shrink GraphQL page size on gateway errors; lower comment defaults by @lukekim in #10355
Relax apt mirror substitution failure to warning in CI action by @ewgenius in #10361
feat(http): Add OAuth2 refresh-token auth to HTTP connector by @lukekim in #10348
Upgrade Rust toolchain to 1.94.1 by @lukekim in #10353
Handle order by and sort in PartitionedTableScanRewrite by @Jeadie in #9656
Fix OTEL Exporter by @krinart in #10363
Pin spiceai candle / TEI forks to merged revs; drop local [patch] overrides by @lukekim in #10362
Integrate spiceio and makefile_targets into pr.yml by @lukekim in #10357
ci: skip artifact compression for test binaries/archives by @lukekim in #10381
chore(deps): bump spiceai/candle, spiceai/mistral.rs, aws-lc-rs, tantivy, rand by @lukekim in #10379
Bump datafusion-table-providers (#10375) by @lukekim in #10384
fix: Update Search integration test snapshots by @app/github-actions in #10376
v2.0.0-rc.3 preparation by @ewgenius in #10382
fix(spicepod): JSON schema accepts string or {name: expr} for partition_by by @lukekim in #10352
fix: Use ROUND for Turso decimal BETWEEN comparisons (fixes #9872) by @claudespice in #10360
Revert "v2.0.0-rc.3 preparation" from trunk by @ewgenius in #10386
Add on_schema_resolved dataset ready state by @lukekim in #10368
feat: Add Elasticsearch data connector with hybrid search support by @lukekim in #10258
ci: bump test archive upload compression-level to 1 by @lukekim in #10388
feat(git-connector): promote Git connector to RC status by @lukekim in #10385
feat(postgres): stream WAL directly to Spice accelerators by @lukekim in #10364
Add schema decomposition to the HTTP connector by @lukekim in #10393
fix(cayenne): Skip catalog refresh state reload for existing providers by @sgrebnov in #10396
Make cayenne-flightsql tool by @Jeadie in #10356
build(deps): bump the github-actions-dependencies group with 2 updates by @app/dependabot in #10398
Update openapi.json by @app/github-actions in #10272
Merge develop to trunk — 2026-04-19 by @claudespice in #10407
feat(otel): default OTLP push exporter to delta temporality by @phillipleblanc in #10412
fix: Restore analyzer rule ordering to run federation before type coercion by @sgrebnov in #10415
fix: Map Utf8/LargeUtf8 to STRING in Databricks/Spark SQL dialects by @sgrebnov in #10420
feat(otel): add metric name prefix at runtime.telemetry.metric_prefix by @phillipleblanc in #10418
fix: Map LargeUtf8 to VARCHAR in Athena ODBC dialect by @sgrebnov in #10419
feat(cluster): connector-driven object store registration on executors by @phillipleblanc in #10414
build(deps): bump ubuntu from 22.04 to 24.04 in the docker-dependencies group by @app/dependabot in #10397
fix: Update benchmark snapshots Apr 20 by @app/github-actions in #10417
feat(otel): apply runtime.telemetry.properties as resource attributes on exported metrics by @phillipleblanc in #10416
Publish RC releases to DockerHub; upgrade runners to ubuntu-24.04 by @lukekim in #10428
feat: Add Azure Cosmos DB (NoSQL) data connector (RC) by @lukekim in #10392
feat(datafusion): flatten_json_properties + json_tree UDTFs by @lukekim in #10406
Harden /v1/tools and /v1/nsql against unauthenticated / LLM-driven SQL by @lukekim in #10365
feat(embeddings): multi-vector embeddings with MaxSim + late-interaction by @lukekim in #10408
Update GH runners for CUDA builds by @ewgenius in #10432
fix(delta_lake): register object stores on cluster executors by @phillipleblanc in #10436
DF-native DML by @krinart in #10327
ci: run Build and Test on spiceai-macos; split install jobs by profile by @lukekim in #10434
Improve search UDTFs: text_search, vector_search, rrf by @lukekim in #10387
fix(model2vec): Improve robustness of model loading for sentence-transformers layouts by @sgrebnov in #10444
Merge develop to trunk — 2026-04-21 by @claudespice in #10448
Enable filter pushdown for vector_search UDTF by @sgrebnov in #10447
Support Snowflake OBJECT, MAP, GEOGRAPHY, GEOMETRY, VECTOR, TIMESTAMP_LTZ types by @lukekim in #10451
Fix Databricks tests by @krinart in #10449
fix(cluster): forward register_object_stores through connector wrappers by @phillipleblanc in #10460
Fixes for vector-search by @krinart in #10455
Add expand_maps option and flatten_json UDTF by @lukekim in #10452
fix: Update Search integration test snapshots by @app/github-actions in #10458
Fix physical codec decode ambiguity for empty protobuf messages by @sgrebnov in #10466
chore(logging): demote s3_single_file_cached skip refresh log to debug by @phillipleblanc in #10467
Enable filter pushdown for rrf UDTF by @sgrebnov in #10465
feat(cluster): consolidate distributed state into cluster.json by @phillipleblanc in #10463
feat(cayenne): Add column statistics and data inlining by @lukekim in #10314
docs(copilot): flag missing wrapper delegation when adding default trait methods by @phillipleblanc in #10461
Wire Elasticsearch vector engine write path through acceleration by @lukekim in #10453
Add helm lint CI by @ewgenius in #10468
Fix Azure and GCS acceleration snapshot object store credential handling by @phillipleblanc in #10486
Update spicepod.schema.json by @app/github-actions in #10485
fix(secrets): harden AWS Secrets Manager secret store by @lukekim in #10478
Update datafusion-ballista crate by @sgrebnov in #10488
feat(secrets): add ParameterSpec and more params for AWS secrets manager by @phillipleblanc in #10487
Add rerank UDTF for hybrid search with query auto-propagation by @lukekim in #10469
Fix flatten_json_properties by @krinart in #10475
fix: preserve field and schema metadata in expand_views_schema by @claudespice in #10494
Upgrade rmcp to upstream 1.5.0; switch MCP server to Streamable HTTP by @lukekim in #10491
fix: handle Snowflake TIMESTAMP_LTZ wire format and prevent nanosecond overflow by @claudespice in #10493
Lint parity in Makefile by @krinart in #10492
Add connect_timeout/client_timeout params to Databricks sql_warehouse mode by @lukekim in #10495
fix(tracing): suppress opentelemetry INFO logs at all verbosity levels by @lukekim in #10497
DynamoDB DML by @krinart in #10470
feat(cayenne): native vector search via SIMD similarity UDFs by @lukekim in #10456
fix(cli): suppress banner for all JSON-producing cloud subcommands (fixes #10498) by @claudespice in #10510
fix(deps): bump openssl to 0.10.78 by @phillipleblanc in #10509
fix(s3): quiet AWS SDK credential probe when no region is configured by @phillipleblanc in #10506
fix(cdc): emit ready signal on caught-up Kafka/Debezium streams (#5201) by @phillipleblanc in #10504
runtime-cluster crate + Run partition discovery before forwarding refresh to executors by @krinart in #10490
Update lint-rust target to use --keep-going by @Jeadie in #10508
Add TPC-H SF100 s3[parquet]-duckdb[file] benchmark spicepod by @lukekim in #10524
Remove dev-profile install steps from pr.yml by @Jeadie in #10507
fix: add missing NULL check on Timestamp path in append refresh by @claudespice in #10518
fix: return error on Decimal128/256 overflow instead of silently dropping scale by @claudespice in #10519
fix: delegate update and delete_from in IndexedTableProvider and EmbeddingTable by @claudespice in #10520
feat(devx): make config errors, CLI, and REPL lead users to success by @lukekim in #10489
fix(rerank): defer execution to RerankExec, enable filters and projection pushdown by @sgrebnov in #10514
fix(llms): support Gemma models with missing attention_bias config field by @lukekim in #10523
Fix vector_search silently ignoring named limit/column/include_score args by @sgrebnov in #10527
fix: split unsupported filters locally in scan() for UseSource mode by @ewgenius in #10528
feat(secrets): add Azure Key Vault secret store by @lukekim in #10496
Bump mistralrs by @krinart in #10532
Fix benchmark configurations and CI build issues by @sgrebnov in #10535
Fix catalog query overrides for MySQL and MSSQL benchmarks by @sgrebnov in #10543
For Cayenne, preserve matched columns for MERGE ... ON <cols> by @Jeadie in #10340
build(deps): bump the aws-sdk group across 1 directory with 5 updates by @app/dependabot in #10538
docs: update AI agent instructions (git workflow + Rust 1.94) by @lukekim in #10544
fix: Update tpch benchmark snapshots by @app/github-actions in #10529
fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-duckdb[file].yaml by @app/github-actions in #10525
Extract runtime-datafusion from runtime by @krinart in #10545
Use generic DML extension planner for Cayenne by @Jeadie in #10437
fix: Update Search integration test snapshots by @app/github-actions in #10552
Fix security and correctness audit issues by @lukekim in #10526
fix(MySQL): revert MySQL result column reorder to fix federated query failures by @sgrebnov in #10557
Fix protoc installation by @krinart in #10566
fix: Disable Ballista dynamic filters on HashJoinExec by @peasee in #10548
Support views on DDL catalogs by @Jeadie in #10554
Update datafusion by @Jeadie in #10422
Improve full-text search indexing performance by @sgrebnov in #10464
feat(mysql): add mysql_zero_date_behavior parameter (null|error) by @phillipleblanc in #10573
fix(snowflake): declare private_key in connector PARAMETERS (fixes #10517) by @claudespice in #10559
Honour CARGO_TARGET_DIR in Makefiles by @Jeadie in #10569
Enable cosine_distance pushdown to DuckDB accelerator via array_cosine_distance by @sgrebnov in #10564
fix: Update test snapshots by @app/github-actions in #10570
fix: Update tpch benchmark snapshots by @app/github-actions in #10560
feat(snapshots): make snapshots an optional feature by @phillipleblanc in #10574
Enforce read-only API key restrictions on Flight DoGet and async query paths by @Jeadie in #10551
Improved security posture on Github workflows by @Jeadie in #10556
fix: Update datafusion-table-providers to improve SqlTable filter pushdown by @sgrebnov in #10595
feat(secrets): add HashiCorp Vault secret store by @phillipleblanc in #10561
fix: delegate update() in UpsertDedupTableProvider to inner provider by @claudespice in #10593
Add DuckDB vector engine support by @lukekim in #10562
Sharepoint - add object-store listing connector with expanded auth and write support by @lukekim in #10473
fix: Install protoc from source by @peasee in #10597
Enable DML support for PostgreSQL data connector by @phillipleblanc in #10446
feat(postgres): support inline PEM sslrootcert by @claudespice in #10578
Add foreign key metadata discovery to PostgreSQL Catalog by @sgrebnov in #10849
Add Snowflake DML support by @lukekim in #10747
Add MongoDB Change Streams support by @lukekim in #10813
Add user-defined functions by @lukekim in #10571
Add table user functions and gate HTTP servers by @lukekim in #10675
feat: add on-demand dataset loading by @phillipleblanc in #10629
feat(runtime): declared-schema deferred datasets by @phillipleblanc in #10669
feat(spicepod, runtime): add columns[].type / nullable + lenient type parser by @phillipleblanc in #10661
Replace external smb crate with internal SMB 3.1.1 client by @phillipleblanc in #10516
Add unified query cancellation across all paths by @lukekim in #10390
Add dynamic HTTP request headers by @lukekim in #10604
feat(http): Support dynamic HTTP connector request params from subqueries by @lukekim in #10636
feat(http): pass through HTTP metadata columns with JSON schema decomposition by @lukekim in #10679
Add nolimit HTTP pagination max pages by @lukekim in #10673
Add shared HTTP rate control for connectors by @lukekim in #10648
Use origin label instead of name for HTTP rate control metrics by @lukekim in #10689
fix(http): reject OR across different HTTP filter columns by @lukekim in #10625
Add provider-aware LLM prompt caching by @lukekim in #10645
Add searchable registry mode for LLM tools by @lukekim in #10647
feat: refresh_mode: snapshot + SQLite/Turso WAL flush + Cayenne metastore slice by @phillipleblanc in #10651
feat: per-principal cache namespacing for SQL/search/caching-accelerator by @lukekim in #10702
Add self-hosted Spice connector support by @phillipleblanc in #10546
Add Delta Lake Azure tenant parameter by @phillipleblanc in #10671
Support OAuth2 client credentials in 'spice cloud login' by @ewgenius in #10586
Add configurable allowed_hosts for MCP by @lukekim in #10638
fix: make Helm chart probes configurable by @peasee in #10696
Strip high-cardinality datasets dim from anonymous telemetry by @lukekim in #10711
feat(elasticsearch): direct FTS engine config + index lifecycle and ingestion controls by @lukekim in #10672
Add DuckDB HNSW vector index support for accelerated views by @sgrebnov in #10695
Rewrite DuckDB vector search SQL to activate HNSW_INDEX_SCAN by @sgrebnov in #10674
Fix DuckDB HNSW vector indexes lost after data refresh by @sgrebnov in #10668
Fix DuckDB DELETE/UPDATE on full and caching refresh mode datasets by @phillipleblanc in #10632
Fix DuckLake connector: downcast, module registration, schema discovery, and S3 credentials by @sgrebnov in #10650
Fix federation pushing denied functions inside subqueries to remote engines by @phillipleblanc in #10692
fix(caching): honour refresh_on_startup: always in caching mode by @phillipleblanc in #10594
fix(iceberg): rebuild storage factory when Hadoop catalog scheme is inferred by @sgrebnov in #10601
Pipeline CDC ingestion: overlap source reads with batch apply by @lukekim in #10676
fix: add NULL check to CDC primary key extraction by @lukekim in #10684
Properly handle nullability during CDC processing by @krinart in #10803
Flatten scheduler config and rename partition management → partition assignment by @lukekim in #10450
Improve NSQL UX and harden internal LLM tools by @lukekim in #10715
Support Responses API across model providers by @lukekim in #10724
Update xAI default model and handle Grok model retirements by @Jeadie in #10723
Improve cli table layout by @krinart in #10725
TLS cert hot-reload (mTLS plan M1) by @phillipleblanc in #10727
Fix DuckLake catalog include filter being ignored by @phillipleblanc in #10738
Promote DuckLake Catalog and Data Connector to Beta quality by @sgrebnov in #10743
feat(ducklake): Support INSERT on catalog tables with read_write access by @sgrebnov in #10744
perf(cdc): coalesce envelopes and overlap commits in apply pipeline by @lukekim in #10745
feat: Allow full version tags in spicepod version by @peasee in #10748
Add Arrow primary key upserts by @lukekim in #10749
fix(snapshot): keep refresh_mode snapshot read-only by @phillipleblanc in #10752
feat(tls): public mTLS for HTTP and Flight (channel + identity modes) by @phillipleblanc in #10753
perf(cayenne): lock-free deletion caches with bloom-prefiltered probe by @lukekim in #10756
fix(security): close API key timing-position leak and remote-UDF SSRF by @lukekim in #10757
Fix 'wait_until_dependent_tables_are_ready' for catalogs by @phillipleblanc in #10758
Fixes for views and resolved tables on 'spice refresh' CLI by @phillipleblanc in #10759
Implement FlightSQL CommandStatementSubstraitPlan support by @lukekim in #10761
feat(connectors): mTLS client cert support for flightsql and spiceai connectors by @phillipleblanc in #10764
Allow arbitrary filenames when specifying spicepod path + kind validation by @krinart in #10777
fix: ignore field metadata in schema compatibility check in index_table_scan by @Jeadie in #10778
Display pushed-down limits in EXPLAIN TREE output by @lukekim in #10779
fix: enable streaming append for Kafka with Cayenne accelerator by @lukekim in #10780
fix: bound chunked-index intermediate batch size to prevent OOM by @phillipleblanc in #10783
fix: label all columns in spice cloud metrics table output by @claudespice in #10784
fix: use checked arithmetic for Turso integer-millis timestamp read path by @claudespice in #10786
fix: use checked arithmetic in timestamp-to-nanosecond conversions by @claudespice in #10666
Upgrade to DuckDB v1.5.2 by @sgrebnov in #10788
Improve CDC ingestion performance by @lukekim in #10789
Fix tool_search/tool_invoke spans by @lukekim in #10791
Add Cayenne inline mutations and benchmark coverage by @lukekim in #10792
Ensure we always resolve table names in distributed mode/metadata by @Jeadie in #10793
Remove permanent errors from DynamoDB Streams by @krinart in #10794
Add expanded view mode for wide table display in SQL REPL by @lukekim in #10797
Fix Cayenne CDC schema mismatch error by @sgrebnov in #10800
Executors should create catalog tables on join by @Jeadie in #10807
Add compressed file support for listing connectors by @lukekim in #10809
Improve Cayenne mutation, scan, and inline memtable scaling by @lukekim in #10811
Add range fallback for large join filters by @lukekim in #10816
Improve Cayenne join filter pushdown by @lukekim in #10818
Synchronize Cayenne partition commits across partitions by @phillipleblanc in #10819
fix: Deny nondistributed cayenne catalog by @peasee in #10821
Enable parallel Cayenne Vortex writes by @lukekim in #10822
Expand Arrow type handling in formatting and Elasticsearch by @lukekim in #10825
Add response.output_text.delta to responses API by @krinart in #10828
feat(cayenne): add join filter propagation and no-spill Q21 planning by @lukekim in #10840
Upgrade Turso to v0.6.0 by @sgrebnov in #10843
feat(cli): add spice feedback command to open community Slack by @lukekim in #10856
Upgrade iceberg to v0.9.1 by @sgrebnov in #10859
feat(cluster): per-request executor readiness gate on /v1/ready by @phillipleblanc in #10860
fix: Require dim-side statistics for CayennePropagateFilterAcrossEquiJoinKeys by @sgrebnov in #10863
fix: Debezium schema evolution breaks dataset init on reload by @claudespice in #10144
fix(mssql): Push topK limit to SQL Server for non-nullable sort columns by @Jeadie in #10621
fix(ScyllaDB): disable physical filter pushdown by @sgrebnov in #10772
fix: handle typed NULLs and prevent overflow in DynamoDB DML type conversions by @krinart in #10511
fix: use InsertOp::Overwrite in DynamoDB bootstrap scan_and_overwrite_accelerator by @krinart in #10639
Improve DynamoDB Bootstrap performance by @krinart in #10616
fix: preserve field and schema metadata in Vortex type transformation by @lukekim in #10628
fix: GH connector - explicitly use AWS LC RS crypto provider for jwt by @phillipleblanc in #10619
fix: add snapshot mode guards to delete_from/update and delegate DML in SwappableTableProvider by @phillipleblanc in #10685
Persist HTTP rate-control state in object storage by @lukekim in #10697
Rate limit metrics HTTP endpoint by @lukekim in #10162
feat(geo): add optional spatial SQL UDF support by @lukekim in #10833
feat(cayenne): CDC throughput, compaction, scan caching, and benchmarks by @lukekim in #10852
fix(cayenne): fix Vortex panic on highly compressible data by @sgrebnov in #10855
fix(cayenne): Read live protected snapshots after cleanup grace period by @sgrebnov in #10901
fix: Disable Cayenne HashJoin rewriter optimizer by @sgrebnov in #10882
Fix GetFlightInfo vs DoGet Flight Schema by @krinart in #10864
fix(search): preserve column casing in /v1/search primary key plumbing by @claudespice in #10909
fix(object-store): dedupe s3 url style auto-detection log by @phillipleblanc in #10898
Improve Spice CLI manifest editing and direct command modes by @lukekim in #10815
Persist Kafka CDC offsets in sidecar tables by @lukekim in #10823
feat(task-history): record Ballista stages for distributed queries by @phillipleblanc in #10831
Add '#[deny(clippy::missing_trait_methods)]' to wrapper/delegation trait impls by @Jeadie in #10795
Optimize Cayenne catalog maintenance paths by @lukekim in #10904
Centralize DuckDB settings for accelerator by @ewgenius in #10895
deps(ballista): bump to 47e2b494 to fix S3 shuffle reads under cluster mode by @phillipleblanc in #10910
Authorization header + Bump async-openai + responses_adapter fix by @krinart in #10911
Tune accelerators by storage profile by @lukekim in #10913
feat: add dataset-level on_schema_change config by @lukekim in #10908
Handle NULL sentinel for nullable partition expressions by @Jeadie in #10880
fix: Remove Cayenne Catalog from catalog registration by @peasee in #10914
Add catalog name to foreign key metadata in postgres catalog by @Jeadie in #10917
Cayenne perf: eliminate redundant clones, PK point-lookup fanout fix, IN-list rewrite + microbench coverage by @lukekim in #10916
fix(turso-shared): retry on Turso BEGIN CONCURRENT "Write-write conflict" by @lukekim in #10946
Vendor Vortex DataFusion for Cayenne by @lukekim in #10933
perf(cayenne): background retention + enable CDC pipelining for retention-configured tables by @lukekim in #10936
feat(cayenne): scale metastore pool to 32 + vs_duckdb_scaling benches (1→128 concurrency, sqlite + turso lanes) by @lukekim in #10943
feat(mcp): support auth for streamable HTTP tools by @phillipleblanc in #10927
Explicit error if v1/search requests a table without search index by @Jeadie in #10968
Fix spicepod loading failure when directory name contains dots by @sgrebnov in #10958
Extend append tests with arrow engine configurations by @sgrebnov in #10959
Remove dataset on_schema_change Policy from rc.5 release notes by @sgrebnov in #10964
Skip tpcds_q78 for Cayenne engine at SF100 by @sgrebnov in #10966
fix: Update benchmark snapshots May-20 by @app/github-actions in #10952
Fix #10951: UdtfExec invariant Vec lengths must match children count by @phillipleblanc in #10953
docs(release): update v2.0.0-rc.5 notes with latest trunk PRs by @lukekim in #10949
Remove eval related things for v2.0.0 by @Jeadie in #10945
build(deps): bump ubuntu from 24.04 to 26.04 in the docker-dependencies group by @app/dependabot in #10883
fix: Add publish = false to chbench-driver by @sgrebnov in #10939
[Bug] Timing between reconnect and AllocateInitialPartitions leaves connection without flight_sql_client by @Jeadie in #10805
Fix: refresh_mode: snapshot reports Ready with empty data when no snapshot exists by @sgrebnov in #10979
fix(cluster): gate scheduler readiness on executor partition loads by @phillipleblanc in #10992
fix: handle EXISTS/NOT EXISTS subqueries in federation analyzer by @sgrebnov in #10996
Refactor spice dataset configuration command by @Jeadie in #10999
fix: preserve field and schema metadata in Vortex physical schema calculation by @claudespice in #11013
fix: validate Snowflake account identifiers and auth config by @Jeadie in #11024
Fix Unity Catalog connector deserialization failure with OSS Unity Catalog by @ewgenius in #11026
feat(cayenne): allow inline writes with pending deletions (deletes/upserts) by @sgrebnov in #11031
Expose metadata descriptions via PostgreSQL UDFs by @lukekim in #11032
Remove default runtime features - enable explicitly in spiced by @phillipleblanc in #11037
feat(cayenne): fast-path CDC deletes by extracting PK values from filters by @sgrebnov in #11049
Cayenne optimizer rules: auto relevance test for q21-shape (all-Cayenne CH-Bench) and runtime rule selection by @lukekim in #11050
refactor(cdc): reduce CDC sub-batch splits for interleaved upsert/delete workloads by @sgrebnov in #11051
fix(snowflake): enforce function deny-list in federation pushdown by @claudespice in #11057
fix(mcp): trace external server tool calls in task history by @ewgenius in #11058
perf(cdc): Last-write-wins dedup in group_into_sub_batches to reduce sub-batch splits by @sgrebnov in #11059
PM edits to v2.0.0-rc5 by @lukekim in #11067
fix(snowflake): wire deny-list in extracted connector crate (#10703) by @claudespice in #11071
perf(cayenne): keep CDC upsert PK keysets resident to avoid per-batch full-table rebuilds by @lukekim in #11074
Fix metadata on search indexing by @Jeadie in #11080
feat(cayenne): merge-on-read position deletes for PK upsert tables + memory-pool accounting by @lukekim in #11085
perf(cayenne): scale CDC inline flush caps with memory + storage class by @lukekim in #11087
feat(cluster): report per-executor table statistics so distributed JoinSelection can size joins by @phillipleblanc in #11089
Improve Cayenne CDC write and compaction path tracing by @sgrebnov in #11091
Support tuple-IN composite PK extraction in Cayenne delete fast-path by @sgrebnov in #11093
feat(cluster): NDV-aware executor stats so CDC q18 join swap fires by @phillipleblanc in #11098
feat(cayenne): maintain join-sizing stats on the write path by @phillipleblanc in #11104
fix(cache): run periodic moka maintenance for idle caches by @phillipleblanc in #11106
Upgrade to DuckDB 1.5.3 + statically link the VSS (HNSW) extension by @sgrebnov in #11107
Fix fetched_at for HTTP connector by @Jeadie in #11116
fix(cayenne): tombstone inline-checkpointed rows on upsert to prevent duplicate PKs by @sgrebnov in #11129
feat: dedicated compaction runtime for Cayenne + CDC pipelining, protected snapshots, and test coverage by @lukekim in #11130
Add datasets dimension to the query_executions metric by @phillipleblanc in #11138
Fix #11137: localpod child not tracking parent refreshes with in-memory (arrow) parent accelerator by @phillipleblanc in #11139
Fix Windows build: vendor the VSS extension (drop nested submodule) by @phillipleblanc in #11140
fix(spiceai): keep correlated subqueries out of JOIN ON for Spice Cloud federation by @phillipleblanc in #11143
feat(cayenne): sharded parallel Vortex encode with key/time clustering by @lukekim in #11144
fix(cluster): prevent DoPut write pipeline self-deadlock under ingest backpressure by @phillipleblanc in #11160
fix(cayenne): only warn on genuine protected-snapshot amplification by @lukekim in #11158

Full Changelog: https://github.com/spiceai/spiceai/compare/v1.11.6...v2.0.0

Spice v1.11.3 (Mar 9, 2026)

March 9, 2026 · 3 min read

Phillip LeBlanc

Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.11.3! 🛠️

Spice v1.11.3 is a patch release fixing schema consistency issues in the S3 and FlightSQL data connectors, improving CDC cache invalidation, and enhancing the HTTP data connector's error handling and response metadata.

What's New in v1.11.3

S3 Data Connector Fix

Fixed an issue where queries using metadata columns (location, last_modified, size) on S3 datasets produced "Input field name does not match with the projection expression" errors (#9647). This occurred when projecting metadata columns with filters or scalar functions (e.g., SELECT lower(location) FROM table WHERE location = '...'), and when the projection returned no matching files.

FlightSQL Schema Consistency

Fixed an issue where the Flight SQL JDBC driver returned "Unsupported ArrowType Utf8View" errors when performing ::TEXT type casts (#9253). The FlightSQL endpoint now maps view types (e.g., Utf8View, BinaryView) to their non-view equivalents, ensuring compatibility with JDBC and ODBC clients.
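
As a rough illustration of the mapping described above, the sketch below replaces Arrow view type names with their non-view equivalents before a schema is handed to a client. It works on plain type-name strings rather than a real Arrow library. The `VIEW_TO_PLAIN` table and the `normalize_schema` function are hypothetical names for this example, not Spice's implementation; only the Utf8View and BinaryView pairs come from the notes above.

```python
# Hypothetical sketch: normalize Arrow "view" type names to their
# non-view equivalents. Not Spice's actual implementation.
VIEW_TO_PLAIN = {
    "Utf8View": "Utf8",
    "BinaryView": "Binary",
}


def normalize_schema(fields):
    """Return a copy of (name, type_name) pairs with view types replaced.

    Field names are preserved exactly; only the type names change.
    """
    return [(name, VIEW_TO_PLAIN.get(type_name, type_name))
            for name, type_name in fields]


schema = [("id", "Int64"), ("title", "Utf8View"), ("payload", "BinaryView")]
normalized = normalize_schema(schema)
print(normalized)
```

Types without a view variant pass through unchanged, so a client that does not understand view types never sees one.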

CDC Cache Invalidation

Fixed an issue where the SQL results cache was invalidated on every change stream poll, even when zero records were returned (#9472). This caused near-total cache miss rates for datasets using refresh_mode: changes (e.g., DynamoDB Streams), effectively rendering the cache useless. Cache invalidation now only occurs when a change batch contains actual data changes.
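
The behavior change can be sketched as a guard on the change-stream poll handler: an empty poll leaves cached results alone, and only a batch with at least one record clears them. The `ResultsCache` class and `on_change_poll` function below are hypothetical stand-ins for illustration, not the runtime's real types.

```python
class ResultsCache:
    """Toy SQL results cache keyed by query text (hypothetical)."""

    def __init__(self):
        self._entries = {}
        self.invalidations = 0

    def put(self, query, result):
        self._entries[query] = result

    def get(self, query):
        return self._entries.get(query)

    def invalidate(self):
        self._entries.clear()
        self.invalidations += 1


def on_change_poll(cache, change_batch):
    """Invalidate only when the poll returned at least one change record."""
    if not change_batch:
        return False  # empty poll: keep cached results
    cache.invalidate()
    return True


cache = ResultsCache()
cache.put("SELECT 1", [1])

on_change_poll(cache, [])  # empty poll, cache is kept
still_cached = cache.get("SELECT 1")

on_change_poll(cache, [{"op": "insert", "id": 7}])  # real change, cache cleared
after_change = cache.get("SELECT 1")
```

With a frequently polled stream that is mostly idle, the guard is what keeps the hit rate from collapsing to near zero.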

HTTP Data Connector Improvements

HTTP error responses (e.g., 5xx) are now excluded from the cache, preventing transient server errors from polluting cached results.
Added a response_headers column (Map type) to HTTP responses, providing access to response header metadata in query results.
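
A minimal sketch of the two changes above, under stated assumptions: the cache is a plain dictionary, `fetch` is a hypothetical callable returning `(status, headers, body)`, and any status of 400 or above is treated as an error (the notes only name 5xx as an example). The row layout is illustrative; only the `response_headers` column name comes from the notes.

```python
def fetch_cached(cache, url, fetch):
    """Return a response row for `url`, caching only non-error responses.

    `fetch` is a hypothetical callable returning (status, headers, body).
    """
    if url in cache:
        return cache[url]
    status, headers, body = fetch(url)
    row = {
        "content": body,
        "response_headers": dict(headers),  # exposed as a map-like column
        "status": status,
    }
    # Assumption: treat 4xx and 5xx as errors that must not be cached.
    if status < 400:
        cache[url] = row
    return row


# Fake upstream: fails once with 503, then succeeds.
responses = iter([
    (503, {"retry-after": "1"}, ""),
    (200, {"content-type": "application/json"}, "{}"),
])


def fake_fetch(url):
    return next(responses)


cache = {}
url = "https://example.com/api"
first = fetch_cached(cache, url, fake_fetch)   # 503, not cached
second = fetch_cached(cache, url, fake_fetch)  # 200, cached
third = fetch_cached(cache, url, fake_fetch)   # served from the cache
```

Because the 503 is never stored, the next request reaches the upstream again instead of replaying the transient failure.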

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No new cookbook recipes.

The Spice Cookbook includes 86 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.11.3, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.11.3 image:

docker pull spiceai/spiceai:1.11.3

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai --version 1.11.3

AWS Marketplace:

Spice is available in the AWS Marketplace.

What's Changed

Changelog

fix(s3): Fix metadata column schema mismatches in projected queries by @sgrebnov in #9664
s3_metadata_columns tests: include test for location outside table prefix by @sgrebnov in #9676
Fix Flight SQL schema consistency: expand view types and verify field names by @sgrebnov in #9438
Improve CDC cache invalidation by @krinart in #9651
Skip caching http error response + add response_headers by @krinart in #9670

Full Changelog: https://github.com/spiceai/spiceai/compare/v1.11.2...v1.11.3

Spice v1.10.4 (Jan 5, 2026)

January 5, 2026 · 2 min read

Phillip LeBlanc

Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.10.4! 🛠️

v1.10.4 is a patch release with fixes for Kafka/Debezium batch commits, ABFSS URL support for Azure Data Lake Storage Gen2, and improved column projection handling for location metadata columns.

What's New in v1.10.4

Additional Improvements & Bug Fixes

Reliability: Fixed Kafka and Debezium batch commit handling to properly commit offsets across all partitions. Previously, only the last message's offset was committed, which could cause message loss when batches contained messages from multiple partitions.
Reliability: Added support for abfss:// URL prefix for Azure Data Lake Storage Gen2, in addition to the existing abfs:// prefix. The abfss scheme indicates secure (TLS) connections to ADLS Gen2.
Reliability: Fixed column projection order mismatch when querying datasets with location metadata columns (e.g., SELECT location, day, size FROM dataset). Queries that specified columns in a different order than the schema would fail with "column types must match schema types" errors.
Developer Experience: Added detailed diagnostic logging for union projection pushdown optimization failures in cluster mode. When projection pushdown cannot be applied, debug-level logs now provide additional context to help identify the root cause.
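
To illustrate the Kafka/Debezium batch commit fix in the list above, the sketch below computes the highest offset per partition for a batch and contrasts it with committing only the last message. The batch is modeled as `(partition, offset)` pairs and `offsets_to_commit` is a hypothetical helper, not the connector's code. Real Kafka clients usually commit the next offset to read rather than the last one processed; that detail is left out here.

```python
def offsets_to_commit(batch):
    """Return the highest offset seen for each partition in the batch.

    `batch` is a list of (partition, offset) pairs.
    """
    highest = {}
    for partition, offset in batch:
        if partition not in highest or offset > highest[partition]:
            highest[partition] = offset
    return highest


batch = [(0, 41), (1, 17), (0, 42), (2, 5), (1, 18)]

per_partition = offsets_to_commit(batch)  # one entry per partition
last_only = dict([batch[-1]])             # old behavior: last message only
print(per_partition, last_only)
```

In this batch the last-message approach records progress for partition 1 only, so partitions 0 and 2 would have no committed progress for the messages already processed.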

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No major cookbook updates.

The Spice Cookbook includes 84 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.10.4, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.10.4 image:

docker pull spiceai/spiceai:1.10.4

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Changelog

Update acknowledgements by @app/github-actions in #8695
Proper batch commit for kafka/debezium by @krinart in #8671
Add support for abfss by @krinart in #8706
cluster: UnionProjectionPushdownOptimizer: Add projection pushdown diagnostics for union children by @phillipleblanc in #8734
Fix column projection order mismatch with location metadata columns by @phillipleblanc in #8738

Spice v1.10.3 (Dec 29, 2025)

December 29, 2025 · 2 min read

Phillip LeBlanc

Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.10.3! 🚀

v1.10.3 is a patch release with improved startup reliability, fixes for Azure BlobFS versioned containers, S3 custom endpoint query resolution, and a fix for the OpenAI Responses API.

What's New in v1.10.3

Additional Improvements & Bug Fixes

Reliability: Telemetry exporter initialization now runs asynchronously, preventing blocked startup in environments with network restrictions (e.g., Kubernetes with restrictive network policies).
Reliability: Fixed an issue where queries on Azure Blob containers with versioning enabled would fail with an "Azure does not support suffix range requests" error in distributed query mode.
Reliability: Fixed S3 location-based queries against custom S3 endpoints (e.g., MinIO, LocalStack). Queries with location predicates on datasets using s3_endpoint and s3_region parameters now correctly route to the configured endpoint instead of defaulting to AWS S3.
Reliability: Fixed "project index out of bounds" errors in the query optimizer when union children have mismatched schemas. The optimizer now validates schema compatibility before applying projection pushdown.
Reliability: Fixed an issue where the OpenAI Responses API (/v1/responses) was not working correctly.
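
The union projection fix in the list above amounts to checking schemas before rewriting the plan. The sketch below models each union child as a list of `(name, type_name)` pairs and skips the optimization when the children disagree or an index is out of range. `push_projection_into_union` is a hypothetical function written for this example and is far simpler than the real optimizer rule.

```python
def push_projection_into_union(children, indices):
    """Apply column `indices` to every union child, or return None to skip.

    `children` is a list of schemas; each schema is a list of
    (name, type_name) pairs.
    """
    if not children:
        return None
    first = children[0]
    # Skip the optimization unless all children share the same schema.
    if any(child != first for child in children[1:]):
        return None
    # Guard against indices that fall outside the schema.
    if any(i < 0 or i >= len(first) for i in indices):
        return None
    return [[child[i] for i in indices] for child in children]


a = [("id", "Int64"), ("name", "Utf8"), ("day", "Date32")]
b = [("id", "Int64"), ("name", "Utf8")]  # missing the third column

pushed = push_projection_into_union([a, a], [0, 2])
skipped = push_projection_into_union([a, b], [0, 2])
```

Returning `None` leaves the plan as it was, so mismatched children cost a missed optimization rather than an out-of-bounds index.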

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No major cookbook updates.

The Spice Cookbook includes 84 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.10.3, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.10.3 image:

docker pull spiceai/spiceai:1.10.3

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Changelog

Upgrade to openai-async v0.32 by @lukekim in #8635
Fix issue with location predicate for custom S3 endpoints + regression integration test by @phillipleblanc in #8668
fix: Validate schema match before projection pushdown in UnionProjectionPushdownOptimizer by @phillipleblanc in #8669
Start the anonymous telemetry exporter asynchronously by @phillipleblanc in #8679
fix: Azure does not support suffix range requests by @phillipleblanc in #8685

Spice v1.9.0 (Nov 19, 2025)

November 19, 2025 · 59 min read

Phillip LeBlanc

Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.9.0-stable! 🌶

v1.9.0-stable introduces Spice Cayenne, a new high-performance data accelerator built on the Vortex columnar format that delivers better-than-DuckDB performance without single-file scaling limitations, and a preview of Multi-Node Distributed Query based on Apache Ballista. v1.9.0 also upgrades to DataFusion v50, DuckDB v1.4.2, and Delta-Kernel v0.16 for even higher query performance, expands search capabilities with full-text search on views and multi-column embeddings, and delivers many additional features and improvements.

What's New in v1.9.0

Cayenne Data Accelerator (Beta)

Introducing Cayenne: SQL as an Acceleration Format: A new high-performance Data Accelerator that simplifies multi-file data acceleration by using an embedded database (SQLite) for metadata while storing data in the Vortex columnar format, a Linux Foundation project. Cayenne delivers query and ingestion performance better than DuckDB's file-based acceleration without DuckDB's memory overhead and the scaling challenges of single DuckDB files.

Cayenne uses SQLite to manage acceleration metadata (schemas, snapshots, statistics, file tracking) through simple SQL transactions, while storing data in Vortex's compressed columnar format.

Key Features:

SQLite + Vortex Architecture: All metadata is stored in SQLite tables with standard SQL transactions, while data lives in Vortex's compressed, chunked columnar format designed for zero-copy access and efficient scanning.
Simplified Operations: No complex file hierarchies, no JSON/Avro metadata files, no separate catalog servers—just SQL tables and Vortex data files. The entire metadata schema is intentionally simple for maximum reliability.
Fast Metadata Access: Single SQL query retrieves all metadata needed for query planning—no multiple round trips to storage, no S3 throttling, no reconstruction of metadata state from scattered files.
Efficient Small Changes: Dramatically reduces small file proliferation. Snapshots are just rows in SQLite tables, not new files on disk. Supports millions of snapshots without performance degradation.
High Concurrency: Changes consist of two steps: stage Vortex files (if any), then run a single SQL transaction. Much faster conflict resolution and support for many more concurrent updates than file-based formats.
Advanced Data Lifecycle: Full ACID transactions, delete support, and retention SQL execution on refresh commit.
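
The two-step commit described under High Concurrency (stage data files, then run a single SQL transaction) can be sketched with Python's built-in sqlite3 module. The table layout, column names, and file names below are assumptions made for illustration only and are not Cayenne's actual metadata schema.

```python
import sqlite3
from typing import Iterable, Tuple


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical metadata catalog: snapshots and files are just rows."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS data_files (
            path TEXT PRIMARY KEY,
            snapshot_id INTEGER NOT NULL REFERENCES snapshots(id),
            row_count INTEGER NOT NULL
        );
        """
    )
    return conn


def commit_snapshot(
    conn: sqlite3.Connection, staged_files: Iterable[Tuple[str, int]]
) -> int:
    """Step 2: register already-staged files and a new snapshot atomically."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute("INSERT INTO snapshots DEFAULT VALUES")
        snapshot_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO data_files (path, snapshot_id, row_count) VALUES (?, ?, ?)",
            [(path, snapshot_id, rows) for path, rows in staged_files],
        )
    return snapshot_id
```

Because the snapshot row and its file rows land in one transaction, a failed commit leaves no partial snapshot behind; the staged files are simply never referenced.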

Example Spicepod.yml configuration:

datasets:
  - from: s3:my_table
    name: accelerated_data_30d
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      refresh_mode: append
      retention_sql: DELETE FROM accelerated_data_30d WHERE created_at < NOW() - INTERVAL '30 days'

Note: the Cayenne Data Accelerator is in Beta and has limitations.

For more details, refer to the Cayenne Documentation, the Vortex project, and the DuckLake announcement that partly inspired this design.

Multi-Node Distributed Query (Preview)

Apache Ballista Integration: Spice now supports distributed query execution based on Apache Ballista, enabling distributed queries across multiple executor nodes for improved performance on large datasets. This feature is in preview in v1.9.0.

Architecture:

A distributed Spice cluster consists of:

Scheduler: Responsible for distributed query planning and work queue management for the executor fleet
Executors: One or more nodes responsible for running physical query plans

Getting Started:

Start a scheduler instance using an existing Spicepod. The scheduler is the only spiced instance that needs to be configured:

# Start scheduler (note the flight bind address override if you want it reachable outside localhost)
spiced --cluster-mode scheduler --flight 0.0.0.0:50051

Start one or more executors configured with the scheduler's flight URI:

# Start executor (automatically selects a free port if 50051 is taken)
spiced --cluster-mode executor --scheduler-url spiced://localhost:50051

Query Execution:

Queries run through the scheduler will now show a distributed_plan in EXPLAIN output, demonstrating how the query is distributed across executor nodes:

EXPLAIN SELECT count(id) FROM my_dataset;

Current Limitations:

Accelerated datasets are currently not supported. This feature is designed for querying partitioned data lake formats (Parquet, Delta Lake, Iceberg, etc.)
The feature is in preview and may have stability or performance limitations
Specific acceleration support is planned for future releases

For more details, refer to the Distributed Query Documentation.

DataFusion v50 Upgrade

Spice.ai is built on the Apache DataFusion query engine. The v50 release brings significant performance improvements and enhanced reliability:

Performance Improvements 🚀:

Dynamic Filter Pushdown: Enhanced dynamic filter pushdown for custom ExecutionPlans, ensuring filters propagate correctly through all physical operators for improved query performance.
Partition Pruning: Expanded partition pruning support ensures that unnecessary partitions are skipped when filters are not used, reducing data scanning overhead and improving query execution times.

Apache Spark Compatible Functions: Added support for Spark-compatible functions including array, bit_get/bit_count, bitmap_count, crc32/sha1, date_add/date_sub, if, last_day, like/ilike, luhn_check, mod/pmod, next_day, parse_url, rint, and width_bucket.

Bug Fixes & Reliability: Resolved issues with partition name validation and empty execution plans when vector index lists are empty. Fixed timestamp support for partition expressions, enabling better partitioning for time-series data.

See the Apache DataFusion 50.0.3 Release for more details.

DuckDB v1.4.2 Upgrade and Accelerator Improvements

DuckDB v1.4.2: DuckDB has been upgraded to v1.4.2, which includes several performance optimizations.

Composite ART Index Support: DuckDB in Spice now supports composite (multi-column) Adaptive Radix Tree (ART) indexes for accelerated table scans. When queries filter on multiple columns fully covered by a composite index, the optimizer automatically uses index scans instead of full table scans, delivering significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      indexes:
        '(region, product_id)': enabled

Performance example with composite index on 7.5M rows:

SELECT * FROM sales WHERE region = 'US' AND product_id = 12345;

-- Without index: 0.282s
-- With composite index (region, product_id): 0.037s
-- Performance improvement: 7.6x faster with composite index

DuckDB Intermediate Materialization: Queries with indexes now use intermediate materialization (WITH ... AS MATERIALIZED) to leverage faster index scans. Currently supported for non-federated queries (query_federation: disabled) against a single table with indexes only. When predicates cover more columns than the index, the optimizer rewrites queries to first materialize index-filtered results, then apply remaining predicates. This optimization can deliver significant performance improvements for selective queries.

Example configuration:

datasets:
  - from: file://sales_data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        query_federation: disabled # Required currently for intermediate materialization
      indexes:
        '(region, product_id)': enabled

Performance example:

-- Query with indexed columns (region, product_id) plus additional filter (amount)
SELECT * FROM sales
WHERE region = 'US' AND product_id = 12345 AND amount > 1000;

-- Optimized execution time: 0.031s (with intermediate materialization)
-- Standard execution time: 0.108s (without optimization)
-- Performance improvement: ~3.5x faster

The optimizer automatically rewrites the query to:

WITH _intermediate_materialize AS MATERIALIZED (
  SELECT * FROM sales WHERE region = 'US' AND product_id = 12345
)
SELECT * FROM _intermediate_materialize WHERE amount > 1000;
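
The shape of this rewrite can be reproduced with a short string-level sketch. The real optimizer presumably operates on query plans rather than SQL text, so the function name and the `(column, condition)` predicate representation below are assumptions made purely for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical predicate representation: (column name, SQL condition text).
Predicate = Tuple[str, str]


def rewrite_with_materialization(
    table: str, index_columns: Sequence[str], predicates: Sequence[Predicate]
) -> str:
    """Split predicates into index-covered and residual parts and build the CTE."""
    if not predicates:
        return f"SELECT * FROM {table};"
    indexed = [cond for col, cond in predicates if col in index_columns]
    residual = [cond for col, cond in predicates if col not in index_columns]
    covered = {col for col, _ in predicates if col in index_columns}
    # Rewrite only when every index column is filtered and extra predicates remain.
    if covered != set(index_columns) or not residual:
        where = " AND ".join(cond for _, cond in predicates)
        return f"SELECT * FROM {table} WHERE {where};"
    return (
        "WITH _intermediate_materialize AS MATERIALIZED (\n"
        f"  SELECT * FROM {table} WHERE {' AND '.join(indexed)}\n"
        ")\n"
        f"SELECT * FROM _intermediate_materialize WHERE {' AND '.join(residual)};"
    )


if __name__ == "__main__":
    print(
        rewrite_with_materialization(
            "sales",
            ["region", "product_id"],
            [
                ("region", "region = 'US'"),
                ("product_id", "product_id = 12345"),
                ("amount", "amount > 1000"),
            ],
        )
    )
```

Queries whose predicates do not fully cover the index, or that have no extra predicates, are returned unchanged in this sketch.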

Parquet Buffering for Partitioned Writes: DuckDB partitioned writes in table mode now support Parquet buffering, reducing memory usage and improving write performance for large datasets.

Retention SQL on Refresh Commit: DuckDB accelerations now support running retention SQL on refresh commit, enabling automatic data cleanup and lifecycle management during refresh operations.

UTC Timezone for DuckDB: DuckDB now uses UTC as the default timezone, ensuring consistent behavior for time-based queries across different environments.

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/large_table/
    name: partitioned_data
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      retention_sql: DELETE FROM partitioned_data WHERE event_time < NOW() - INTERVAL '7 days'

For more details, refer to the DuckDB Data Accelerator Documentation.

HTTP Data Connector

Querying endpoints as tables: The HTTP/HTTPS Data Connector now supports querying HTTP endpoints directly as tables in SQL queries with dynamic filters. This feature transforms REST APIs into queryable data sources, making it easy to integrate external service data.
Query an HTTP endpoint that returns structured data (JSON, CSV, etc.) as if it were a database table
Configurable retry logic, timeouts, and POST request support for more complex API interactions

Example Spicepod.yml configuration:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    params:
      file_format: json
      max_retries: 3
      client_timeout: 10s
      allowed_request_paths: /search/people
      request_query_filters: enabled
      request_body_filters: enabled

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael'
LIMIT 10;

If a request_body is supplied, it is sent to the endpoint in a POST request:

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael' and request_body = '{"name": "michael"}'
LIMIT 10;
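
One way to picture how these filter columns could map onto an outgoing request is the sketch below. The exact URL composition and the enforcement of allowed paths are assumptions made for illustration; only the POST-when-body-is-present behavior is stated in these notes, and `build_request` is a hypothetical helper, not part of Spice.

```python
from typing import Dict, Optional, Sequence


def build_request(
    base_url: str,
    request_path: str,
    request_query: Optional[str] = None,
    request_body: Optional[str] = None,
    allowed_paths: Sequence[str] = (),
) -> Dict[str, Optional[str]]:
    """Translate filter values into a request description (hypothetical mapping)."""
    if allowed_paths and request_path not in allowed_paths:
        raise ValueError(f"request path not allowed: {request_path}")
    url = base_url.rstrip("/") + "/" + request_path.lstrip("/")
    if request_query:
        url += "?" + request_query
    # A supplied body switches the request to POST, as described above.
    method = "POST" if request_body is not None else "GET"
    return {"method": method, "url": url, "body": request_body}


if __name__ == "__main__":
    print(
        build_request(
            "https://api.tvmaze.com",
            "/search/people",
            request_query="q=michael",
            allowed_paths=["/search/people"],
        )
    )
```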

HTTP endpoints can be accelerated using refresh_sql:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    params:
      file_format: json
      allowed_request_paths: /search/people
      request_query_filters: enabled
      request_body_filters: enabled
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_sql: |
        SELECT request_path, request_query, content 
        FROM tvmaze
        WHERE request_path = '/search/people'
          AND request_query IN ('q=michael', 'q=luke')

For more details, refer to the HTTP Data Connector Documentation.

DynamoDB Data Connector Improvements

Improved Query Performance: The DynamoDB Data Connector now includes improved filter handling for edge cases, parallel scan support for faster data ingestion, and better error handling for misconfigured queries. These improvements enable more reliable and performant access to DynamoDB data.

Example Spicepod.yml configuration:

datasets:
  - from: dynamodb:my_table
    name: ddb_data
    params:
      scan_segments: 10 # Default `auto` which calculates optimal segments based on number of rows
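
As a rough illustration of why segments speed up ingestion: DynamoDB's Scan API accepts Segment and TotalSegments parameters so that independent workers can each read a disjoint slice of a table. The sketch below simulates that with an in-memory list. `scan_segment` is a stand-in for a real Scan call, and the CRC-based item-to-segment assignment is an assumption for the demo, since DynamoDB decides segment membership internally.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Item = Dict[str, int]


def scan_segment(items: List[Item], segment: int, total_segments: int) -> List[Item]:
    """Stand-in for one Scan call with Segment=segment, TotalSegments=total_segments."""
    return [
        item
        for item in items
        if zlib.crc32(str(item["id"]).encode()) % total_segments == segment
    ]


def parallel_scan(items: List[Item], total_segments: int) -> List[Item]:
    """Scan all segments concurrently and concatenate the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        parts = list(
            pool.map(
                lambda seg: scan_segment(items, seg, total_segments),
                range(total_segments),
            )
        )
    return [row for part in parts for row in part]


if __name__ == "__main__":
    table = [{"id": i} for i in range(100)]
    rows = parallel_scan(table, total_segments=10)
    print(len(rows))
```

Each item belongs to exactly one segment, so the combined result contains every row once regardless of the segment count.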

For more details, refer to the DynamoDB Data Connector Documentation.

S3 Data Connector Improvements

S3 Versioning Support: Spice now supports S3 Versioning for all connectors using object-store (S3, Delta Lake, etc.), ensuring range reads over versioned files are atomically correct. When S3 versioning is enabled, Spice automatically tracks version IDs during file discovery and uses them for all subsequent range reads, preventing inconsistencies from concurrent file modifications.

Current limitations:

Multi-file connections (e.g., partitioned datasets) do not yet support version tracking across all files
Version tracking requires S3 versioning to be enabled on the bucket; when it is, tracking is automatic

S3 Single-File Refresh Skipping: Spice now optimizes S3 single-file dataset refreshes by caching file metadata (ETag, Version ID, size, timestamp) and skipping unnecessary data fetches when the underlying file hasn't changed. This optimization dramatically reduces bandwidth usage and improves refresh performance for scenarios where data doesn't change frequently. The feature is enabled by default for accelerated S3 single-file datasets and includes metrics tracking for skipped refreshes.

Example configuration:

datasets:
  - from: s3://my-bucket/data.parquet
    name: s3_data
    acceleration:
      enabled: true
      engine: duckdb
      refresh_check_interval: 10s

When the file's metadata hasn't changed between refresh checks, Spice will skip the data fetch entirely, logging:

Skipping refresh for dataset 's3_data': file metadata unchanged
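
The skip decision can be sketched as a comparison between the metadata cached at the last successful refresh and the metadata seen at the current check. The class and field names below are hypothetical and only mirror the four attributes listed above (ETag, Version ID, size, timestamp).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileMetadata:
    """Metadata compared between refresh checks (hypothetical field names)."""

    etag: str
    version_id: Optional[str]
    size: int
    last_modified: str


class RefreshTracker:
    """Remembers the metadata of the last successfully refreshed file."""

    def __init__(self) -> None:
        self._cached: Optional[FileMetadata] = None

    def should_refresh(self, current: FileMetadata) -> bool:
        """Refresh on first sight or whenever any tracked attribute differs."""
        return self._cached != current

    def mark_refreshed(self, current: FileMetadata) -> None:
        """Record metadata only after the data fetch has succeeded."""
        self._cached = current


if __name__ == "__main__":
    tracker = RefreshTracker()
    meta = FileMetadata("abc123", "v1", 1024, "2025-11-19T00:00:00Z")
    if tracker.should_refresh(meta):
        tracker.mark_refreshed(meta)  # the data fetch would happen before this call
    print(tracker.should_refresh(meta))
```

Recording the metadata only after a successful fetch means a failed refresh is retried on the next check rather than being skipped.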

For more details, refer to the S3 Data Connector Documentation.

Search & Embeddings Enhancements

Full-Text Search on Views: Full-text search indexes are now supported on views, enabling advanced search scenarios over pre-aggregated or transformed data. This extends the power of Spice's search capabilities beyond base datasets.

Multi-Column Embeddings on Views: Views now support embedding columns, enabling vector search and semantic retrieval on view data. This is useful for search over aggregated or joined datasets.

Vector Engines on Views: Vector search engines are now available for views, enabling similarity search over complex queries and transformations.

Example Spicepod.yml configuration:

views:
  - name: aggregated_reviews
    sql: SELECT review_id, review_text FROM reviews WHERE rating > 4
    embeddings:
      - column: review_text
        model: openai:text-embedding-3-small

For more details, refer to the Search Documentation and Embeddings Documentation.

Dedicated Query Thread Pool (Now Enabled by Default)

Dedicated Query Thread Pool: Query execution and accelerated refreshes now run on their own dedicated thread pool, separate from the HTTP server. This prevents heavy query workloads from slowing down API responses, keeping health checks fast and avoiding unnecessary Kubernetes pod restarts under load.

This feature was opt-in in previous releases and is now enabled by default. To disable it and revert to the previous behavior, add the following spicepod.yaml configuration:

runtime:
  params:
    dedicated_thread_pool: none

For more details, refer to the Runtime Configuration Documentation.

Query Performance Optimizations

Stale-While-Revalidate Cache Control: Query results now support "stale-while-revalidate" cache control, allowing stale cached data to be served immediately while asynchronously refreshing the cache entry in the background. This improves response times for frequently-accessed queries while maintaining data freshness. Requires cache key type to be set to "sql (raw)" for proper operation.

Optimized Prepared Statements: Prepared statement handling has been optimized for better performance with parameterized queries, reducing planning overhead and improving execution time for repeated queries.

Large RecordBatch Chunking: Large Arrow RecordBatch objects are now automatically chunked to control memory usage during query execution, preventing memory exhaustion for queries returning large result sets.

Query Result Caching: Compressed Encoding, Stale-While-Revalidate Cache Control

Zstd Compression Encoding: Query result caching now supports optional Zstandard (zstd) compression encoding to reduce memory usage for cached query results. This is particularly beneficial for large result sets, reducing cache memory footprint while maintaining fast decompression times. Encoding can be configured via the encoding parameter with options none (default) or zstd.

Example configuration:

runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 128MiB
      item_ttl: 1m
      encoding: zstd # Enable zstd compression

HTTP Cache-Control Support: The query result cache now supports the stale-while-revalidate Cache-Control directive, enabling faster response times by serving stale cached results immediately while asynchronously refreshing the cache in the background. This feature is particularly useful for applications that can tolerate slightly stale data in exchange for improved performance.

Example configuration:

runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 128MiB
      item_ttl: 1m
      stale_while_revalidate_ttl: 1m # serve stale items for up to 1 minute after `item_ttl` expires

How it works:

When a cache entry is stale but within the stale-while-revalidate window, Spice will:

Immediately return the stale cached result to the client
Asynchronously re-execute the query in the background to refresh the cache
Serve the refreshed result to subsequent requests
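
The behavior above can be sketched as a small in-process cache. This is a minimal illustration under stated assumptions, not the runtime's implementation: `SwrCache` and its members are hypothetical names, `item_ttl` and `stale_ttl` mirror the `item_ttl` and `stale_while_revalidate_ttl` settings, and the clock is injectable so the timing can be exercised without sleeping.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional, Set, Tuple


class SwrCache:
    """Serve fresh entries, serve stale ones while refreshing, else recompute."""

    def __init__(self, item_ttl: float, stale_ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.item_ttl = item_ttl
        self.stale_ttl = stale_ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()
        self.last_refresh: Optional[threading.Thread] = None

    def get(self, key: str, compute: Callable[[], Any]) -> Any:
        with self._lock:
            entry = self._entries.get(key)
        if entry is not None:
            value, stored_at = entry
            age = self._clock() - stored_at
            if age < self.item_ttl:
                return value  # fresh hit
            if age < self.item_ttl + self.stale_ttl:
                self._refresh_in_background(key, compute)
                return value  # stale hit, returned immediately
        value = compute()  # miss or fully expired: compute synchronously
        self._store(key, value)
        return value

    def _store(self, key: str, value: Any) -> None:
        with self._lock:
            self._entries[key] = (value, self._clock())

    def _refresh_in_background(self, key: str, compute: Callable[[], Any]) -> None:
        with self._lock:
            if key in self._refreshing:
                return  # a refresh for this key is already running
            self._refreshing.add(key)

        def work() -> None:
            try:
                self._store(key, compute())
            finally:
                with self._lock:
                    self._refreshing.discard(key)

        self.last_refresh = threading.Thread(target=work, daemon=True)
        self.last_refresh.start()
```

Tracking in-flight refreshes per key keeps a burst of requests for the same stale entry from triggering more than one background re-execution.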

Configuration:

Use the Cache-Control HTTP header with the stale-while-revalidate directive:

Cache-Control: max-age=300, stale-while-revalidate=60

This configuration caches results for 5 minutes (300 seconds), and allows serving stale results for an additional 60 seconds while refreshing in the background.
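
To make the arithmetic concrete, the sketch below parses the header and classifies a cache entry by age. Both function names are hypothetical, and the parser handles only simple comma-separated directives rather than the full header grammar.

```python
from typing import Dict, Union

Directive = Union[int, str, bool]


def parse_cache_control(header: str) -> Dict[str, Directive]:
    """Parse comma-separated Cache-Control directives into a dictionary."""
    directives: Dict[str, Directive] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        value = value.strip().strip('"')
        if value.isdigit():
            directives[name.strip().lower()] = int(value)
        else:
            directives[name.strip().lower()] = value or True
    return directives


def classify(age_seconds: int, directives: Dict[str, Directive]) -> str:
    """Return 'fresh', 'stale-while-revalidate', or 'expired' for an entry age."""
    max_age = int(directives.get("max-age", 0))
    swr = int(directives.get("stale-while-revalidate", 0))
    if age_seconds < max_age:
        return "fresh"
    if age_seconds < max_age + swr:
        return "stale-while-revalidate"
    return "expired"


if __name__ == "__main__":
    parsed = parse_cache_control("max-age=300, stale-while-revalidate=60")
    for age in (200, 330, 400):
        print(age, classify(age, parsed))
```

With the header above, an entry is served as fresh for the first 300 seconds, served stale while a refresh runs for the next 60 seconds, and treated as expired after that.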

Requirements:

Must use plan or raw SQL cache keys (set cache_key_type to sql or plan in results_caching configuration)
Background revalidation re-executes queries through the normal query path
Timestamp tracking automatically determines cache entry age for staleness checks

Example configuration via HTTP header:

GET /v1/sql
Cache-Control: max-age=600, stale-while-revalidate=120
X-Cache-Key-Type: sql

This feature improves application responsiveness while ensuring data freshness through background updates.

For more details, refer to the Results Caching Documentation.

Security & Reliability Improvements

Enhanced HTTP Client Security: HTTP client usage across the runtime has been hardened with improved TLS validation, certificate pinning for critical endpoints, and better error handling for network failures.

ODBC Connector Improvements: Removed unwrap calls from the ODBC connector, improving error handling and reliability. Fixed secret handling and Kubernetes secret integration.

CLI Permissions Hardening: Tightened file permissions for the CLI and install script, ensuring secure defaults for configuration files and credentials.

Oracle Instant Client Pinning: Oracle Instant Client downloads are now pinned to specific SHAs, ensuring reproducible builds and preventing supply chain attacks.

AWS Authentication Improvements

Improved Credential Retry Logic: AWS SDK credential initialization has been significantly improved with more robust retry logic and better error handling. The system now automatically retries transient credential resolution failures using Fibonacci backoff, allowing Spice to tolerate extended AWS outages (up to ~48 hours) without manual intervention.

Key features:

Automatic retry with backoff: Implements Fibonacci backoff for transient credential failures (network issues, temporary AWS service disruptions)
Better error handling: Distinguishes between retryable errors (connector errors) and non-retryable errors (misconfiguration)
Unauthenticated access support: Properly supports unauthenticated access to public S3 buckets without requiring credentials
Improved error messages: Provides detailed logging with attempt numbers, retry intervals, and error context for better troubleshooting
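
A Fibonacci backoff schedule grows more gently than exponential backoff, which suits long outages. The sketch below is a generic illustration, not the Spice implementation: the base interval, the optional cap, and the `RetryableError` type are assumptions, and the retry helper only distinguishes retryable from non-retryable failures in the way the list above describes.

```python
import time
from itertools import islice
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures such as network errors."""


def fibonacci_backoff(base_seconds: float = 1.0,
                      max_interval: Optional[float] = None) -> Iterator[float]:
    """Yield delays of 1, 1, 2, 3, 5, 8, ... times base_seconds, optionally capped."""
    a, b = 1, 1
    while True:
        delay = a * base_seconds
        yield delay if max_interval is None else min(delay, max_interval)
        a, b = b, a + b


def retry(operation: Callable[[], T], max_attempts: int,
          sleep: Callable[[float], None] = time.sleep,
          base_seconds: float = 1.0) -> T:
    """Retry transient failures with Fibonacci delays; other errors propagate."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = fibonacci_backoff(base_seconds)
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last transient error
            sleep(next(delays))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    print(list(islice(fibonacci_backoff(), 8)))
```

Errors that are not `RetryableError` (for example, a misconfiguration) are raised on the first attempt without any delay.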

The improvements ensure more reliable AWS service integration, particularly in environments with intermittent network connectivity or during AWS service degradations.

Observability & Tracing

DataFusion Log Emission: The Spice runtime now emits DataFusion internal logs, providing deeper visibility into query planning and execution for debugging and performance analysis.

AI Completions Tracing: Fixed tracing so that ai_completions operations are correctly parented under sql_query traces, improving observability for AI-powered queries.

Git Data Connector (Alpha)

Version-Controlled Data Access: The new Git Data Connector (Alpha) enables querying datasets stored in Git repositories. This connector is ideal for use cases involving configuration files, documentation, or any data tracked in version control.

Example Spicepod.yml configuration:

datasets:
  - from: git:https://github.com/myorg/myrepo
    name: git_metrics
    params:
      file_format: csv

For more details, refer to the Git Data Connector Documentation.

Spice Java SDK 0.4.0

The Spice Java SDK has been upgraded with support for a configurable Arrow memory limit: spice-java v0.4.0

SpiceClient client = SpiceClient.builder()
    .withArrowMemoryLimitMB(1024) // 1GB limit
    .build();

For more details, refer to the Java SDK Documentation.

CLI Improvements

Install Specific Versions: The spice install command now supports installing specific versions of the Spice runtime and CLI. This enables easy version management, downgrading, or installation of specific releases for testing or compatibility requirements.

Usage:

# Install a specific version
spice install v1.8.3

# Install a specific version with AI flavor
spice install v1.8.3 ai

# Install latest version (existing behavior)
spice install
spice install ai

Note: Homebrew installations require manual version management via brew install spiceai/spiceai/spice@<version>.

Persistent Query History: The Spice CLI REPL (SQL, search, and chat interfaces) now persists command history to ~/.spice/query_history.txt, making your query history available across sessions. The history file is automatically created if it doesn't exist, with graceful fallback if the home directory cannot be determined.

New REPL Commands:

.clear - Clear the screen using ANSI escape codes for a clean workspace
.clear history - Clear the query history and persist the change, removing all stored commands

Tab Completion: Tab completion now includes suggestions based on your command history, making it faster to re-run or modify previous queries.

Example usage:

sql> SELECT * FROM my_table;
sql> .clear          # Clears the screen
sql> .clear history  # Clears command history
sql> # Use arrow keys or tab to access previous commands

For more details, refer to the CLI Documentation.

Additional Improvements & Bug Fixes

Reliability: Added recovery handling for refresh worker panics to prevent runtime crashes during acceleration refreshes.
Reliability: Improved error messages for missing or invalid spicepod.yaml files, providing actionable feedback for misconfiguration.
Reliability: Fixed DuckDB metadata pointer loading issues for snapshots.
Performance: Ensured ListingTable partitions are pruned correctly when filters are not used.
Reliability: Fixed vector dimension determination for partitioned indexes.
Search: Fixed casing issues in Reciprocal Rank Fusion (RRF) for hybrid search queries.
Search: Fixed search field handling as metadata for chunked search indexes.
Validation: Added timestamp support for partition expressions.
Validation: Fixed regexp_match function for DuckDB datasets.
Validation: Fixed partition name validation for improved reliability.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

New HTTP Data Connector Recipe: New recipe demonstrating how to query REST APIs and HTTP(s) endpoints. See HTTP Connector Recipe for details.

The Spice Cookbook includes 82 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.9.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.9.0 image:

docker pull spiceai/spiceai:1.9.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Dependencies

DataFusion: Upgraded to v50
Apache Arrow: Upgraded to v56
DuckDB: Upgraded to v1.4.2
Delta Kernel: Upgraded to v0.16.0

Changelog

Fix for search field as metadata for chunked search indexes by @Jeadie in #7429
Bump object_store from 0.12.3 to 0.12.4 by @app/dependabot in #7433
Properly respect disabling snapshots by @phillipleblanc in #7431
Revert "Properly respect disabling snapshots" by @sgrebnov in #7439
Revert "Disable snapshots by default" by @sgrebnov in #7438
Add preview warning for write access mode by @sgrebnov in #7440
fix: regexp_match for DuckDB datasets by @kczimm in #7443
Add feature is currently in preview warning for snapshots by @sgrebnov in #7442
[Logger] Also emit Datafusion logs by @mach-kernel in #7441
add missing snapshot by @kczimm in #7446
Fix tracing so that ai_completions are parented under sql_query by @lukekim in #7415
Enable snapshot acceleration by default by @phillipleblanc in #7451
Disable acceleration refresh metrics by @krinart in #7450
Add v1.8 release notes by @phillipleblanc in #7430
fix: partition name validation by @kczimm in #7452
Fix lint error due to ignore without reasons by @krinart in #7454
Add models and CUDA support to spiced install script by @lukekim in #7457
Post-release 1.8 updates by @phillipleblanc in #7455
Remove println in datafusion by @phillipleblanc in #7461
Update end_game.md to notify once release is done by @sgrebnov in #7460
Remove italics from snapshot logging by @phillipleblanc in #7463
Update openapi.json by @app/github-actions in #7466
Fix generate spicepod schema by @phillipleblanc in #7464
Fix generate acknowledements by @phillipleblanc in #7465
Update spicepod.schema.json by @app/github-actions in #7469
fix: Ensure ListingTable partitions are pruned when filters are not used by @peasee in #7471
Create runtime-secrets crate by @phillipleblanc in #7474
Create runtime-parameters crate by @phillipleblanc in #7475
Don't download the snapshot if the acceleration is present by @phillipleblanc in #7477
Bump hyper-util from 0.1.16 to 0.1.17 by @app/dependabot in #7434
Add missing remote CLI feature by @lukekim in #7478
Add 1.8.0 release analytics by @sgrebnov in #7481
CLI multi-line input support for spice sql by @lukekim in #7479
fix: duckdb partitioning cannot reload config by @kczimm in #7482
fix: Make search cache test use a slower uncached search by @peasee in #7473
Add support for S3 dataset params by @phillipleblanc in #7476
Add DuckDB TPC-H memory limit variations by @lukekim in #7484
Add better snapshot validation for incorrectly configured spicepods by @phillipleblanc in #7487
Move blocking/sync I/O to spawn blocking by @lukekim in #7462
Add DuckDB file accelerator 2G and 4G dispatches by @lukekim in #7491
Validate spicepod file exists before running tests by @lukekim in #7492
Make snapshot reading/writing more robust with Iceberg-like metadata.json by @phillipleblanc in #7486
Re-use build-testoperator and ensure it's cached by @lukekim in #7494
Fix duckdb test operator limit casing by @lukekim in #7498
fix: Update benchmark snapshots by @app/github-actions in #7499
Create runtime-request-context crate by @Jeadie in #7459
Add integration tests for Acceleration DB snapshotting by @phillipleblanc in #7489
Two minor fixes for AI udf tests by @krinart in #7503
Add model response timeout for ai udf tests by @krinart in #7504
Show error if FTS is misconfigured for datasets/views by @krinart in #7458
Add test for chunked search index with search field as metadata by @Jeadie in #7513
Add sccache for build test operator by @lukekim in #7515
Enhancement: Add spill_compression to runtime config by @krinart in #7505
Improve GitHub Data Connector by @lukekim in #7510
Add RequestTimeoutException to S3 client by @Jeadie in #7514
Add sha=<> to snapshot logging by @phillipleblanc in #7521
Add Type to GitHub Data Connector issues and fix double aliasing for project author by @lukekim in #7519
DuckDB acceleration: fix memory leak in duckdb_arrow_scan by @sgrebnov in #7524
Fix partition_by accelerations when a projection is applied on empty partition sets by @phillipleblanc in #7526
Nullable fields for index columns by @Jeadie in #7523
Fix missing winver dependency for Windows by @krinart in #7538
Update mongo config for benchmarks by @krinart in #7546
Add acceleration snapshots cookbook to template by @phillipleblanc in #7527
Bump github/codeql-action from 3 to 4 by @app/dependabot in #7535
Bump golang.org/x/sys from 0.36.0 to 0.37.0 by @app/dependabot in #7529
Make spice chat play nice with Unix pipes by @Jeadie in #7525
Configurable DuckDB duckdb_index_scan_percentage & duckdb_index_scan_max_count by @lukekim in #7551
[cherry-pick] Release notes for release 1.8.1 by @krinart in #7556
Fix 1.8.1 release notes by @krinart in #7558
FTS index with nonfilterable metadata; search field not metadata by default. by @Jeadie in #7548
Properly set auth headers in github_release.py by @krinart in #7560
project_schema when using EmptyExec by @kczimm in #7543
Add 1.8.1 release analytics by @kczimm in #7561
Bump golang.org/x/mod from 0.28.0 to 0.29.0 by @app/dependabot in #7530
Hive-style partitioning for DuckDB file mode by @kczimm in #7563
Vortex Data Accelerator (Dev grade) by @lukekim in #7566
Only load eval scorers when eval defined by @Jeadie in #7549
Bump octocrab from 0.45.0 to 0.47.0 by @app/dependabot in #7531
Bump regex from 1.11.3 to 1.12.1 by @app/dependabot in #7532
Fix custom file path for Vortex Data Accelerator by @phillipleblanc in #7570
Add List type support to Vortex Data Accelerator by @lukekim in #7569
Bump parking_lot from 0.12.4 to 0.12.5 by @app/dependabot in #7534
Bump tokio-postgres from 0.7.14 to 0.7.15 by @app/dependabot in #7533
Remove duplicate line from 1.8.1 release notes by @krinart in #7580
Upgrade Go from v1.24.2 to v1.25.3 by @lukekim in #7582
Fix race condition in S3 Vectors index and bucket creation by @kczimm in #7577
Add runtime-async crate with managed Tokio runtime by @phillipleblanc in #7575
Optimize GitHub Actions workflows by @lukekim in #7584
Use 'location' as primary key for document tables by @Jeadie in #7567
Extend query-related metrics by @krinart in #7571
Enabling acceleration refresh metrics by using runtime.metrics config by @krinart in #7583
S3Vector service metrics in client by @Jeadie in #7502
Fix score order for one test case by @Jeadie in #7595
ObjectMeta filter pushdown for ObjectStoreTextTable by @Jeadie in #7572
Return TableProvider from CandidateGeneration::search. by @Jeadie in #7559
EmptyHashJoinExecPhysicalOptimization, and use in VectorScanTableProvider by @Jeadie in #7587
Update official Docker builds to use release binaries by @phillipleblanc in #7597
New Generate Changelog workflow by @krinart in #7562
BytesProcessedExec to allow optimizer to do limit pushdown by @mach-kernel in #7539
GitHub Data Connector add Projects, improve rate-limiting and error handling by @lukekim in #7547
Add copilot-instructions to help improve Copilot reviews by @lukekim in #7606
Add support for DuckDB table-based partitioning by @sgrebnov in #7581
fix: Use nextest for integration models tests by @peasee in #7617
Fix license issue in table-providers by @phillipleblanc in #7620
Remove Build Docker Image from PR checks by @phillipleblanc in #7621
Combine PR Lint + Build checks by @phillipleblanc in #7623
Remove Cache Rust builds step by @phillipleblanc in #7625
Rename duckdb_partition_mode to partition_mode param by @sgrebnov in #7622
DuckDB table partitioning: delete partitions that no longer exist after full refresh by @sgrebnov in #7614
Build integration test binaries in single job by @phillipleblanc in #7624
Make DuckDB table partition data write threshold configurable by @sgrebnov in #7626
Handle table relations in HTTP v1/search by @Jeadie in #7615
Fix E2E Test sporadic failures on the macOS runners by @phillipleblanc in #7627
Emit query_active_count metric by @krinart in #7589
fix: Disable go cache in actions by @peasee in #7631
fix: Don't nullify DuckDB release callbacks for schemas by @peasee in #7628
Fix integration tests by reverting the use of batch inserts w/ prepared statements by @phillipleblanc in #7630
Split integration tests into 3 partitions by @phillipleblanc in #7635
Initial Pepper data accelerator by @lukekim in #7592
Only build the E2E Test CI binaries once by @phillipleblanc in #7633
Update BytesProcessedExec snapshots by @mach-kernel in #7637
Properly set RequestContext for stream execution in Flight by @krinart in #7591
Add task for creating release branch in docs by @kczimm in #7642
Add missing mongodb params by @krinart in #7647
fix: Update benchmark snapshots by @app/github-actions in #7649
Release notes for v1.8.2 by @Jeadie in #7645
docs: Update error handling in copilot instructions by @peasee in #7652
Pepper accelerator INSERT OVERWRITE support by @lukekim in #7643
fix: Update benchmark snapshots by @app/github-actions in #7650
Run Datafusion queries on a separate Tokio runtime by @phillipleblanc in #7586
Add explicit steps for docs DRI in end game by @kczimm in #7658
Add Release 1.8.2 QA Analytics by @krinart in #7661
Pepper full / append refresh support by @lukekim in #7662
Add 'client_timeout' for s3 vector by @Jeadie in #7501
Improvements to Endgame template by @krinart in #7660
Fix OSS docker release trigger when release marked as latest by @phillipleblanc in #7668
Use '#[serde(deny_unknown_fields)]' for base spicepod components by @Jeadie in #7669
DataFusion upgrade template to include Ballista by @mach-kernel in #7679
S3 Vector index spilling by @kczimm in #7613
Refresh request context bindings / fix trunk integration tests by @mach-kernel in #7680
Fix integration tests for refresh append by @lukekim in #7681
Distributed query support by @mach-kernel in #7585
Run acceleration refreshes on separate Tokio runtime by @phillipleblanc in #7671
Support DESCRIBE in clustered mode by @kczimm in #7686
use hyphen instead of period for index name spill separator by @kczimm in #7697
Add streaming option to /nsql endpoint by @kczimm in #7695
Task History min_sql_duration filter support by @lukekim in #7698
Spawn object_store IO tasks on the original Tokio runtime by @phillipleblanc in #7689
Update openapi.json by @app/github-actions in #7700
Adjust DataFusion runtime worker threads by @phillipleblanc in #7704
Gate dedicated SQL engine CPU runtime behind opt-in param dedicated_thread_pool by @phillipleblanc in #7705
Add TPCH S3 refresh spicepod by @phillipleblanc in #7706
DuckDB: include ANALYZE after write to update query optimizer statistics by @sgrebnov in #7714
Display execution time in Spice REPL for no results by @sgrebnov in #7713
Task History capture and store SQL query plans by @lukekim in #7701
Pepper TPC-H SF-1 benchmark by @lukekim in #7717
fix: Update benchmark snapshots by @app/github-actions in #7720
Optimize prepared statements (parameterized queries) by @lukekim in #7703
Pepper accelerator tests (Clickbench, TPC-H SF-5, SF-100) by @lukekim in #7721
Add support for DuckDB connection_pool_size param by @sgrebnov in #7716
Add health probing to testoperator runs by @phillipleblanc in #7709
Simplify AcceleratedTable by @Jeadie in #7724
Add some basic indexing tests for 'FullTextDatabaseIndex' by @Jeadie in #7688
DuckDB: on_refresh_recompute_statistics param + ANALYZE for table-based partitioning by @sgrebnov in #7719
Fix Windows builds by excluding pepper/vortex by @phillipleblanc in #7729
Enable separate CPU runtime thread pool for DataFusion by default by @phillipleblanc in #7732
'runtime-datafusion' crate for runtime related DataFusion components by @Jeadie in #7666
Pepper expanded append refresh support by @lukekim in #7670
Pepper basic partitioning by @lukekim in #7731
Stable pepper benchmark snapshots by @phillipleblanc in #7739
Delta Lake Connector: Support AWS_SESSION_TOKEN parameter by @mach-kernel in #7752
Pepper use SQLite WAL by @lukekim in #7757
v1.8.3 release notes by @mach-kernel in #7745
Increase testoperator health check threshold to 50ms by @phillipleblanc in #7767
Data Accelerator Graceful Shutdown by @lukekim in #7756
Remove Windows CUDA builds by @phillipleblanc in #7768
fix: Update benchmark snapshots by @app/github-actions in #7771
Bump actions/download-artifact from 5 to 6 by @app/dependabot in #7746
Bump serde from 1.0.226 to 1.0.228 by @app/dependabot in #7743
Fix casing for keywords and additional columns by @Jeadie in #7770
Bump actions/upload-artifact from 4 to 5 by @app/dependabot in #7750
Bump criterion from 0.5.1 to 0.7.0 by @app/dependabot in #7740
Bump rustls-native-certs from 0.8.1 to 0.8.2 by @app/dependabot in #7744
Git Data Connector (Alpha) by @lukekim in #7772
Pepper accelerator delete support by @lukekim in #7616
Update Helm chart instructions for Helm in end_game.md by @sgrebnov in #7776
Turso data accelerator by @lukekim in #7472
Apply retention SQL filter to refresh fetch by @phillipleblanc in #7778
Add Parquet buffering option for DuckDB partitioned writes (tables mode) by @sgrebnov in #7735
fix: EmptyExec when list indexes is empty by @kczimm in #7784
1.8.3 post-release housekeeping by @mach-kernel in #7783
feat: Upgrade to Datafusion v50 by @peasee in #7777
fix: Replace vortex datafusion with public crate by @peasee in #7791
Full-text search on views by @Jeadie in #7733
Revert "Apply retention SQL filter to refresh fetch (#7778)" by @phillipleblanc in #7796
fix: Add ingest duration and acceleration size metrics to testoperator by @peasee in #7792
Set local timezone to UTC for DuckDB by @phillipleblanc in #7797
add Timestamp support for partition expressions by @kczimm in #7803
Fix trunk lint by @krinart in #7804
Add missing mongodb params by @krinart in #7807
Embedding columns on view components by @Jeadie in #7795
Add Turso as a Pepper Catalog metastore by @lukekim in #7793
Run retention_sql on refresh commit for DuckDB by @lukekim in #7785
docs: Update datafusion upgrade checklist by @peasee in #7812
Vector engines on views by @Jeadie in #7808
Handle refresh worker panics and add recovery test by @phillipleblanc in #7815
chunk large record batches to control memory usage by @kczimm in #7802
fix: cannot determine vector dimension for partitioned indexes by @kczimm in #7818
Upgrade to Turso v0.3 by @lukekim in #7821
fix: Ensure custom *Exec ExecutionPlans push down dynamic filters by @peasee in #7811
handle casing in RRF by @Jeadie in #7825
Enable 'turso' for pepper acceleration by default by @sgrebnov in #7826
Improved DynamoDB Data Connector by @krinart in #7715
Initial support for llama.cpp as LLM inference backend by @lukekim in #7794
Pepper: Implement retention SQL on refresh commit by @phillipleblanc in #7814
Fix Dockerfiles for arm64 by @lukekim in #7834
[DynamoDB] Handle filter edge-cases by @krinart in #7830
[DynamoDB] Support parallelization for Scan request by @krinart in #7829
Don't feature gate Pepper by @lukekim in #7832
Fix llama.cpp static link by @lukekim in #7835
fix: docker nightly builds by @kczimm in #7837
Use GitHub-hosted macOS runner only for tag releases by @lukekim in #7836
Fix Bug: DuckDB INTERNAL Error: Failed to load metadata pointer by @sgrebnov in #7839
Fix docker arm64 build to use aegis in pure-rust mode by @lukekim in #7840
Revert "Use GitHub-hosted macOS runner only for tag releases" by @lukekim in #7843
Rename Pepper to Cayenne by @lukekim in #7844
Tighten CLI permissions and install script by @lukekim in #7845
Set mvcc for Cayenne Turso metastore by @lukekim in #7850
Optimize Prepared Statements by @lukekim in #7859
Remove unwrap from ODBC connector, fix secrets, and Kubernetes secre… by @lukekim in #7846
Improve and secure HTTP client usage by @lukekim in #7847
Pin Oracle Instant Client download to a SHA by @lukekim in #7851
Improve experience for missing or invalid Spicepod.yaml by @lukekim in #7849
chore: Fix PR linting by @peasee in #7865
Revert FlightIPC issues by @Jeadie in #7870
Bump Jimver/cuda-toolkit from 0.2.28 to 0.2.29 by @app/dependabot in #7878
Optimize macOS and Windows builds by @lukekim in #7863
Improve error message by adding 'cayenne' to the list of valid accelerator engines by @sgrebnov in #7882
fix: Kafka message delivery failed by @kczimm in #7883
fix: allow parameter index without dollar signs by @kczimm in #7887
docs: Update component criteria by @peasee in #7891
Temporary disable supports_limit_pushdown for SchemaCastScanExec by @sgrebnov in #7893
fix: Make integration run with no relevant changes, disable makefile targets on PR by @peasee in #7896
Add Cayenne benchmark and concurrency tests and remove indexes for Turso MVCC by @lukekim in #7879
Remove '.embeddings[].metadata' by @Jeadie in #7897
Revert llama.cpp engine by @lukekim in #7898
Make Cayenne snapshotting more robust by @sgrebnov in #7899
Release notes v1.9.0-rc1 by @Jeadie in #7902
Fix dataset_acceleration_last_refresh_time_ms unit to milliseconds in description by @ewgenius in #7901
Fix lint error in record_explain_plan functionality by @sgrebnov in #7906
Cleanup old snapshots after full refresh by @lukekim in #7908
Cayenne deletion vector caching support by @lukekim in #7903
Split filters into partition filters (for pruning) and data filters by @lukekim in #7889
fix: Update benchmark snapshots by @app/github-actions in #7911
fix: Update benchmark snapshots by @app/github-actions in #7912
fix: Update benchmark snapshots by @app/github-actions in #7913
Update spicepod.schema.json by @app/github-actions in #7916
fix: Update benchmark snapshots by @app/github-actions in #7917
Add Cayenne & Turso accelerators to E2E CI test matrix by @lukekim in #7922
Make preview warnings consistent by @lukekim in #7921
Filter and write optimizations by @lukekim in #7918
fix: Set sccache region explicitly by @peasee in #7928
fix: Enable integration test merge group checks by @peasee in #7927
Update testoperator release branch from 1.8 to 1.9 by @peasee in #7926
Update DuckDB to 1.4.1 with composite ART scans by @mach-kernel in #7884
Don't build Windows on trunk pushes by @lukekim in #7931
fix: Use correct minio secret in build binary push by @peasee in #7934
Update test-framework workers to use dedicated Flight client by @sgrebnov in #7938
Fix financebench, configure s3vectors for appropriate snapshotting by @Jeadie in #7935
Don't try to initialize accelerator if it is disabled by @lukekim in #7932
Add spark UDFs to Spice by @Jeadie in #7936
Fix extra async_trait in cayenne metadata catalog by @phillipleblanc in #7942
deps: Upgrade to Rust 1.90 by @peasee in #7941
Add cayenne accelerator to README.md by @ewgenius in #7905
fix: Update benchmark snapshots by @app/github-actions in #7948
Run integration tests with AWS_EC2_METADATA_DISABLED flag by @sgrebnov in #7952
Only retry credentials on ConnectorError by @kczimm in #7944
fix: Improve join reordering by ensuring JoinSelection is applied by @peasee in #7828
fix: Remove unused deps, consolidate workspace deps by @peasee in #7953
bump async-openai commit by @kczimm in #7929
deps: Use vortex fork by @peasee in #7954
Enable tracing in delta lake integration tests by @sgrebnov in #7951
Update datasets in S3 vectors test case by @Jeadie in #7956
Add spiced metrics scraping to test operator by @lukekim in #7937
Memoize S3 vectors ListIndex API call with configurable TTL by @kczimm in #7910
Cayenne performance optimizations by @lukekim in #7907
Setup HotFix issue template by @ewgenius in #7957
Fix AWS SDK credential cache retry handling by @phillipleblanc in #7943
Infer RRF join_key from TableProvider::constraints and implement SearchQueryProvider::constraints. by @Jeadie in #7959
[Optimizer]: DuckDB intermediate materialization (non-federated) by @mach-kernel in #7964
1.7.3 post-release housekeeping by @ewgenius in #7962
Fix digest_many UDF for ColumnarValue::Array. by @Jeadie in #7960
Fix spiced metrics reporting as part of benchmark tests by @sgrebnov in #7967
Avoid pushing down Spice specific UDFs to accelerators during federation by @Jeadie in #7909
CLI file persisted history with .clear and .clear history commands by @lukekim in #7970
ResultsCache Cache-Control stale-while-revalidate by @lukekim in #7963
Use GetVectors API instead of returnData by @kczimm in #7083
Make DuckDB intermediate materialization logic more robust by @sgrebnov in #7971
[Cayenne] Configurable target Vortex file size by @lukekim in #7972
fix: Update benchmark snapshots by @app/github-actions in #7974
Bump github.com/klauspost/compress from 1.17.11 to 1.18.1 by @app/dependabot in #7872
fix: Update benchmark snapshots by @app/github-actions in #7978
fix: Update benchmark snapshots by @app/github-actions in #7982
Run Integration tests on spiceai-dev-runners by @sgrebnov in #7985
[CLI] Fix cursor issue due to flush by @lukekim in #7981
fix: Support S3 versioning, Vortex dynamic filter pushdown by @peasee in #7984
Make cluster a default feature by @lukekim in #7994
Optimize DuckDB Intermediate Index Materialization for No-Index Case by @sgrebnov in #7998
HTTP connector with dynamic filter support by @lukekim in #7969
Revert federation 'can_execute_plan' by @Jeadie in #7999
Fix stale caching by @lukekim in #7995
Fix count(*) for http connector by @krinart in #8001
[CLI] Install specific version by @lukekim in #8006
Fix stale with revalidate request/response by @lukekim in #8005
Fallback RequestContext for cluster queries by @Jeadie in #8007
Use use_rustls_tls for Spice Cloud /connect by @lukekim in #8008
Use delta-kernel-rs 0.16x + Parquet reader with object meta API changes by @mach-kernel in #8011
fix: Update datafusion & arrow-rs with S3 versioning fix by @lukekim in #8012
Add 1.9.0-rc.2 release notes by @sgrebnov in #7993
Update Datafusion version by @sgrebnov in #8014
[Acceleration] DuckDB tables mode partitioner + CTE rewrite optimizer by @mach-kernel in #8013
Update spicepod.schema.json by @app/github-actions in #8015
Update acknowledgements by @app/github-actions in #8016
Upgrade shutdown signal Ordering by @krinart in #8017
Set max-age: 0 during stale by @lukekim in #8018
Add E2E test release for Helm by @lukekim in #8023
Bump github.com/olekukonko/tablewriter from 0.0.5 to 1.1.1 by @app/dependabot in #7989
Bump schemars from 0.9.0 to 1.0.4 by @app/dependabot in #7877
Update generate_changelog script by @krinart in #8028
Update QA analytics Release 1.9.0-rc.2 by @krinart in #8027
[CLI] Improve auto-complete by @lukekim in #8022
Improve verify helm workflow by @lukekim in #8024
Bump azure_core from 0.28.0 to 0.30.0 by @app/dependabot in #7986
Test operator load test row count validation by @lukekim in #8036
fix: Revert HTTP response offloading by @peasee in #8041
Disable advanced filters pruning for partitioned tables by @sgrebnov in #8037
fix: Ensure Vortex UncompressedSizeInBytes is calculated by @peasee in #8044
Add 1.9.0-rc.3 release notes by @sgrebnov in #8048
fix: Update test snapshots by @app/github-actions in #8046
add benchmark spicepods by @Jeadie in #8047
DynamoDB TPC-H SF1 Benchmarks by @krinart in #8043
Bump github.com/AzureAD/microsoft-authentication-library-for-go from 1.5.0 to 1.6.0 by @app/dependabot in #7988
Bump golang.org/x/sys from 0.37.0 to 0.38.0 by @app/dependabot in #7987
v1.9.0-rc.2 README updates by @lukekim in #8035
Bump suppaftp from 5.4.0 to 6.3.0 by @app/dependabot in #7875
Bump ctor from 0.5.0 to 0.6.0 by @app/dependabot in #7873
WW README Update by @wyattwenzel in #8058
Reenable dynamic federation support by @Jeadie in #8026
fix: Prevent SortExec from ordering below SchemaCastScanExec by @peasee in #8061
Skip logging and return OK() on error during shutdown by @krinart in #8057
Partition pruning with complex expressions by @lukekim in #8040
Update openapi.json by @app/github-actions in #8064
Make DynamoDB snapshots consistent by @krinart in #8069
Add check for error log by @krinart in #8070
Fix tracing of 's3_vector_query_and_get' by @Jeadie in #8065
DuckDB v1.4.2 by @mach-kernel in #8073
Fix failing OpenAI test by @krinart in #8076
Enable 'test_recency_scoring' by @Jeadie in #8068
Test operator: avoid duplicate Flight requests when using --http-clients by @sgrebnov in #8071
Update load tests to use truth percentile values by @sgrebnov in #8079
Update DynamoDB to RC by @krinart in #8060
CachedQueryVector to avoid recomputing embedding vector for spilling/partitioned vector indexes. by @Jeadie in #8059
Fix DuckDB on_commit sink race by @lukekim in #8081
Add partitioned duckdb by @lukekim in #8083
[CLI] Security and sanitization by @lukekim in #8082
fix: Update benchmark snapshots by @app/github-actions in #8084
Fix partition_by expression by @lukekim in #8087
Data Components security fixes and sanitization by @lukekim in #8086
Runtime security and sanitization by @lukekim in #8088
Add spicepod-validator tool and fix spicepods by @lukekim in #8089
Skip data fetches for S3 single file refreshes by @lukekim in #8072
MCP security and sanitization by @lukekim in #8090
Update spicepod.schema.json by @app/github-actions in #8099
Update acknowledgements by @app/github-actions in #8098
Add install-dev target back to Makefile by @Jeadie in #8100
fix 'testoperator run search' by @Jeadie in #8101
Update datafusion-table-providers - fix nullability inferences for MySQL and PostgreSQL, and fix full text search for PostgreSQL by @ewgenius in #8092
Remove duplicate install-with-models by @phillipleblanc in #8107
Improve Cayenne partitioning by @lukekim in #8097
Testoperator dispatch: respect verify_results dispatch configuration by @sgrebnov in #8106
Include 'match' column only if chunk offsets found in search query 'LogicalPlan' by @Jeadie in #8102
Fix validation path by @lukekim in #8109
Fix dispatch paths by @lukekim in #8110
Fix dispatch spicepod paths by @lukekim in #8112
fix: Update benchmark snapshots by @app/github-actions in #8113
fix: Update benchmark snapshots by @app/github-actions in #8114
fix: Update benchmark snapshots by @app/github-actions in #8116
Update test Spicepods by @lukekim in #8131
Add validation to reference schema by @lukekim in #8111
Include root error when failing to find latest timestamp in accelerated table by @sgrebnov in #8132
fix: HTTP Connector validation, query and body by @lukekim in #8115
Update nsql model list by @lukekim in #8141
Update DynamoDB Benchmarks by @krinart in #8135
Fix Dremio E2E test by @sgrebnov in #8139
fix: Update MongoDB benchmark snapshots by @app/github-actions in #8143
fix: Update DynamoDB benchmark snapshots by @app/github-actions in #8142
fix: Update benchmark snapshots by @app/github-actions in #8145
fix: Update iceberg[catalog] benchmark snapshots by @app/github-actions in #8144
Improve HTTP Connector UX by @lukekim in #8146
QueryOverrides for DynamoDB benchmarks by @krinart in #8151
test-framework: add row count validation skipping with TPC-DS defaults by @sgrebnov in #8149
fix: Update benchmark snapshots by @app/github-actions in #8148
fix: Update benchmark snapshots by @app/github-actions in #8154
fix: Update benchmark snapshots by @app/github-actions in #8155
fix: Update test snapshots by @app/github-actions in #8160
Suppress delta_kernel::listed_log_files warnings by @phillipleblanc in #8158
Update table providers to fix warning by @phillipleblanc in #8156
Suppress MCP limit log by @phillipleblanc in #8159
Remove incorrect tool name validation by @Jeadie in #8161
Disable results validation for federated/glue[csv].yaml by @phillipleblanc in #8163
fix: Update benchmark snapshots by @app/github-actions in #8164
Fix dynamodb overrides by @phillipleblanc in #8165
Fix dynamo db overrides again by @phillipleblanc in #8166
few more dynamodb overrides by @phillipleblanc in #8167
Add stub release notes for v1.9.0-rc.4 by @phillipleblanc in #8168
Add v1.9.0-rc.4 release notes by @lukekim in #8169
fix: Cayenne concurrent table creation by @lukekim in #8176
fix: Avoid pruning bucket partitions for != and NOT IN operators by @sgrebnov in #8177

Spice v1.9.0-rc.4 (Nov 18, 2025)

November 18, 2025 · 22 min read

Phillip LeBlanc

Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.9.0-rc.4! 🌶

This release candidate brings DuckDB v1.4.2, Cayenne partitioning improvements, and comprehensive security hardening across the CLI, data connectors, runtime, and MCP. v1.9.0-rc.4 also includes MySQL and PostgreSQL connector improvements with fixed nullability inferences and full-text search support, DynamoDB consistency improvements, HTTP connector validation and UX enhancements, and numerous reliability and performance optimizations. Significant improvements were also made to test and automation infrastructure to ensure high-quality releases.

v1.9.0 introduces Spice Cayenne, a new high-performance data accelerator built on the Vortex columnar format that delivers better-than-DuckDB performance without single-file scaling limitations, and a preview of Multi-Node Distributed Query based on Apache Ballista. v1.9.0 also upgrades to DataFusion v50 for even higher query performance, expands search capabilities with full-text search on views and multi-column embeddings, and delivers many additional features and improvements.

What's New in v1.9.0

Cayenne Data Accelerator (Beta)

Key Features:

SQLite + Vortex Architecture: All metadata is stored in SQLite tables with standard SQL transactions, while data lives in Vortex's compressed, chunked columnar format designed for zero-copy access and efficient scanning.
Simplified Operations: No complex file hierarchies, no JSON/Avro metadata files, no separate catalog servers—just SQL tables and Vortex data files. The entire metadata schema is intentionally simple for maximum reliability.
Fast Metadata Access: Single SQL query retrieves all metadata needed for query planning—no multiple round trips to storage, no S3 throttling, no reconstruction of metadata state from scattered files.
Efficient Small Changes: Dramatically reduces small file proliferation. Snapshots are just rows in SQLite tables, not new files on disk. Supports millions of snapshots without performance degradation.
High Concurrency: Changes consist of two steps: stage Vortex files (if any), then run a single SQL transaction. Much faster conflict resolution and support for many more concurrent updates than file-based formats.
Advanced Data Lifecycle: Full ACID transactions, delete support, and retention SQL execution on refresh commit.
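
The two-step commit described under High Concurrency can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names (`snapshots`, `data_files`), their columns, and the helper functions are hypothetical and are not Cayenne's actual metadata schema; the sketch only shows how a snapshot can be a row in a SQLite table instead of a new metadata file, and how one query can return the file list needed for planning.

```python
import sqlite3


def open_catalog() -> sqlite3.Connection:
    # In-memory catalog for illustration only; hypothetical schema, not Cayenne's.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE data_files (
            snapshot_id INTEGER NOT NULL REFERENCES snapshots(id),
            path TEXT NOT NULL
        );
        """
    )
    return conn


def commit_snapshot(conn: sqlite3.Connection, staged_files: list[str]) -> int:
    # Step 1 (writing the data files) is assumed to have happened already.
    # Step 2: record the snapshot and its files in a single SQL transaction.
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute("INSERT INTO snapshots DEFAULT VALUES")
        snapshot_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO data_files (snapshot_id, path) VALUES (?, ?)",
            [(snapshot_id, path) for path in staged_files],
        )
    return snapshot_id


def files_for_latest_snapshot(conn: sqlite3.Connection) -> list[str]:
    # One query returns the file list; this sketch assumes each snapshot
    # row set lists the complete set of files visible at that snapshot.
    rows = conn.execute(
        """
        SELECT path FROM data_files
        WHERE snapshot_id = (SELECT MAX(id) FROM snapshots)
        ORDER BY path
        """
    ).fetchall()
    return [row[0] for row in rows]
```

Because the snapshot and its file rows are written inside one transaction, a reader sees either the previous snapshot or the new one, never a partially registered set of files.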

Example Spicepod.yml configuration:

datasets:
  - from: s3:my_table
    name: accelerated_data_30d
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      refresh_mode: append
      retention_sql: DELETE FROM accelerated_data_30d WHERE created_at < NOW() - INTERVAL '30 days'

Note: the Cayenne Data Accelerator is in Beta and has limitations.

For more details, refer to the Cayenne Documentation, the Vortex project, and the DuckLake announcement that partly inspired this design.

Multi-Node Distributed Query (Preview)

Architecture:

A distributed Spice cluster consists of:

Scheduler: Responsible for distributed query planning and work queue management for the executor fleet
Executors: One or more nodes responsible for running physical query plans

Getting Started:

Start a scheduler instance using an existing Spicepod. The scheduler is the only spiced instance that needs to be configured:

# Start scheduler (note the flight bind address override if you want it reachable outside localhost)
spiced --cluster-mode scheduler --flight 0.0.0.0:50051

Start one or more executors configured with the scheduler's flight URI:

# Start executor (automatically selects a free port if 50051 is taken)
spiced --cluster-mode executor --scheduler-url spiced://localhost:50051

Query Execution:

Queries run through the scheduler will now show a distributed_plan in EXPLAIN output, demonstrating how the query is distributed across executor nodes:

EXPLAIN SELECT count(id) FROM my_dataset;

Current Limitations:

Accelerated datasets are currently not supported. This feature is designed for querying partitioned data lake formats (Parquet, Delta Lake, Iceberg, etc.)
The feature is in preview and may have stability or performance limitations
Specific acceleration support is planned for future releases

DataFusion v50 Upgrade

Spice.ai is built on the Apache DataFusion query engine. The v50 release brings significant performance improvements and enhanced reliability:

Performance Improvements 🚀:

Dynamic Filter Pushdown: Enhanced dynamic filter pushdown for custom ExecutionPlans, ensuring filters propagate correctly through all physical operators for improved query performance.
Partition Pruning: Expanded partition pruning support ensures that partitions that cannot match a query's filters are skipped, reducing data scanning overhead and improving query execution times.
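
As a rough illustration of the idea behind partition pruning (not DataFusion's implementation), the sketch below keeps only partitions whose value range can overlap a filter range. The `Partition` type, its fields, and the `prune` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    # Hypothetical per-partition statistics for a single integer column.
    path: str
    min_value: int
    max_value: int


def prune(partitions: list[Partition], lower: int, upper: int) -> list[Partition]:
    """Keep partitions whose [min_value, max_value] overlaps [lower, upper]."""
    return [
        p for p in partitions
        if p.max_value >= lower and p.min_value <= upper
    ]
```

For a filter such as `year BETWEEN 2024 AND 2025`, a partition covering only 2022 to 2023 is never opened, which is where the reduction in scanned data comes from.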

See the Apache DataFusion 50.0.3 Release for more details.

DuckDB v1.4.2 Upgrade and Accelerator Improvements

DuckDB v1.4.2: DuckDB has been upgraded to v1.4.2, which includes several performance optimizations.

Example configuration with a composite index:

datasets:
  - from: file://data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      indexes:
        '(region, product_id)': enabled

Performance example with composite index on 7.5M rows:

SELECT * FROM sales WHERE region = 'US' AND product_id = 12345;

-- Without index: 0.282s
-- With composite index (region, product_id): 0.037s
-- Performance improvement: 7.6x faster with composite index

Example configuration for intermediate materialization:

datasets:
  - from: file://sales_data.parquet
    name: sales
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        query_federation: disabled # Required currently for intermediate materialization
      indexes:
        '(region, product_id)': enabled

Performance example:

-- Query with indexed columns (region, product_id) plus additional filter (amount)
SELECT * FROM sales
WHERE region = 'US' AND product_id = 12345 AND amount > 1000;

-- Optimized execution time: 0.031s (with intermediate materialization)
-- Standard execution time: 0.108s (without optimization)
-- Performance improvement: ~3.5x faster

The optimizer automatically rewrites the query to:

WITH _intermediate_materialize AS MATERIALIZED (
  SELECT * FROM sales WHERE region = 'US' AND product_id = 12345
)
SELECT * FROM _intermediate_materialize WHERE amount > 1000;

Parquet Buffering for Partitioned Writes: DuckDB partitioned writes in tables mode now support Parquet buffering, reducing memory usage and improving write performance for large datasets.

Retention SQL on Refresh Commit: DuckDB accelerations now support running retention SQL on refresh commit, enabling automatic data cleanup and lifecycle management during refresh operations.

UTC Timezone for DuckDB: DuckDB now uses UTC as the default timezone, ensuring consistent behavior for time-based queries across different environments.

Example Spicepod.yml configuration:

datasets:
  - from: s3://my_bucket/large_table/
    name: partitioned_data
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      retention_sql: DELETE FROM partitioned_data WHERE event_time < NOW() - INTERVAL '7 days'

HTTP Data Connector

Querying endpoints as tables: The HTTP/HTTPS Data Connector now supports querying HTTP endpoints directly as tables in SQL queries with dynamic filters. This feature transforms REST APIs into queryable data sources, making it easy to integrate external service data.
Query an HTTP endpoint that returns structured data (JSON, CSV, etc.) as if it were a database table
Configurable retry logic, timeouts, and POST request support for more complex API interactions

Example Spicepod.yml configuration:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    params:
      file_format: json
      max_retries: 3
      client_timeout: 10s

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael'
LIMIT 10;

If a request_body is supplied, it is sent to the endpoint as a POST request:

Example SQL query:

SELECT request_path, request_query, content
FROM tvmaze
WHERE request_path = '/search/people' and request_query = 'q=michael' and request_body = '{"name": "michael"}'
LIMIT 10;
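
To make the filter-to-request mapping in the queries above concrete, here is a minimal sketch of how `request_path`, `request_query`, and `request_body` could translate into an HTTP request. The `build_request` function is hypothetical, and the exact URL construction Spice performs is an assumption; only the POST-when-body-is-present behavior is taken from the text above.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def build_request(
    base_url: str,
    request_path: str,
    request_query: str = "",
    request_body: Optional[str] = None,
) -> tuple[str, str, Optional[str]]:
    # Assumption: the path filter is appended to the dataset's base URL and
    # the query filter becomes the URL query string.
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/") + "/" + request_path.lstrip("/")
    url = urlunsplit((parts.scheme, parts.netloc, path, request_query, ""))
    # A supplied body is posted to the endpoint; otherwise a GET is issued.
    method = "POST" if request_body is not None else "GET"
    return method, url, request_body
```

With the `tvmaze` dataset above, `build_request("https://api.tvmaze.com", "/search/people", "q=michael")` yields a GET for `https://api.tvmaze.com/search/people?q=michael`.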

HTTP endpoints can be accelerated using refresh_sql:

datasets:
  - from: https://api.tvmaze.com
    name: tvmaze
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_sql: |
        SELECT request_path, request_query, content 
        FROM tvmaze
        WHERE request_path = '/search/people'
          AND request_query IN ('q=michael', 'q=luke')

DynamoDB Data Connector Improvements

Example Spicepod.yml configuration:

datasets:
  - from: dynamodb:my_table
    name: ddb_data
    params:
      scan_segments: 10 # Defaults to `auto`, which calculates the number of segments based on the number of rows
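
DynamoDB's Scan API supports parallel scans through its `Segment` and `TotalSegments` request parameters. Assuming `scan_segments` maps onto `TotalSegments` (an assumption, not confirmed by these notes), the per-worker requests would look roughly like the output of the hypothetical `segment_requests` helper below. The sketch only builds request parameters and does not call AWS.

```python
def segment_requests(table_name: str, total_segments: int) -> list[dict]:
    """Build one Scan request parameter set per segment of a parallel scan."""
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    return [
        {
            "TableName": table_name,
            "Segment": segment,  # zero-based index of this worker's segment
            "TotalSegments": total_segments,
        }
        for segment in range(total_segments)
    ]
```

Each worker scans a disjoint slice of the table, so ten segments allow up to ten Scan requests to proceed concurrently.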

S3 Versioning Support

Atomic Range Reads for Versioned Files: Spice now supports S3 Versioning for all connectors using object-store (S3, Delta Lake, etc.), ensuring range reads over versioned files are atomically correct. When S3 versioning is enabled, Spice automatically tracks version IDs during file discovery and uses them for all subsequent range reads, preventing inconsistencies from concurrent file modifications.

Current limitations:

Multi-file connections (e.g., partitioned datasets) do not yet support version tracking across all files
Version tracking is automatic when S3 versioning is enabled on the bucket
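
The value of pinning a version ID can be shown with a toy in-memory model. `VersionedObject` below is a hypothetical stand-in for a versioned S3 object, not a real S3 client and not Spice's implementation; it only demonstrates why range reads that carry the version ID captured at discovery stay consistent when the object is overwritten mid-read.

```python
from typing import Optional


class VersionedObject:
    """Toy in-memory stand-in for a single versioned S3 object."""

    def __init__(self) -> None:
        self._versions: dict[str, bytes] = {}
        self._latest: Optional[str] = None

    def put(self, data: bytes) -> str:
        # Every write creates a new version and makes it the latest.
        version_id = f"v{len(self._versions) + 1}"
        self._versions[version_id] = data
        self._latest = version_id
        return version_id

    def head(self) -> str:
        # File discovery: record the version ID that is current right now.
        if self._latest is None:
            raise KeyError("object has no versions")
        return self._latest

    def get_range(self, start: int, end: int, version_id: Optional[str] = None) -> bytes:
        # Without a version ID the read goes to whatever is latest at call time.
        data = self._versions[version_id or self.head()]
        return data[start:end]
```

If a writer replaces the object between two range reads, reads pinned to the discovered version still return bytes from the same file, while unpinned reads would mix old and new content.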

Search & Embeddings Enhancements

Multi-Column Embeddings on Views: Views now support embedding columns, enabling vector search and semantic retrieval on view data. This is useful for search over aggregated or joined datasets.

Vector Engines on Views: Vector search engines are now available for views, enabling similarity search over complex queries and transformations.

Example Spicepod.yml configuration:

views:
  - name: aggregated_reviews
    sql: SELECT review_id, review_text FROM reviews WHERE rating > 4
    embeddings:
      - column: review_text
        model: openai:text-embedding-3-small

Dedicated Query Thread Pool (Now Enabled by Default)

This feature was opt-in in previous releases and is now enabled by default. To disable it and revert to the previous behavior, add the following spicepod.yaml configuration:

runtime:
  params:
    dedicated_thread_pool: none

Query Performance Optimizations

Query Result Cache: Stale-While-Revalidate

How it works:

When a cache entry is stale but within the stale-while-revalidate window, Spice will:

Immediately return the stale cached result to the client
Asynchronously re-execute the query in the background to refresh the cache
Serve future requests from the refreshed data

Configuration:

Use the Cache-Control HTTP header with the stale-while-revalidate directive:

Cache-Control: max-age=300, stale-while-revalidate=60

This configuration caches results for 5 minutes (300 seconds), and allows serving stale results for an additional 60 seconds while refreshing in the background.
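
The freshness decision described above can be sketched as follows. This is a minimal illustration of the stale-while-revalidate semantics, not Spice's cache implementation; `parse_cache_control` and `cache_decision` are hypothetical helpers, and the parser handles only simple `name=seconds` directives.

```python
def parse_cache_control(header: str) -> dict:
    """Parse directives such as 'max-age=300, stale-while-revalidate=60'."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            # Directives without a numeric value (e.g. 'no-cache') map to None.
            directives[name.lower()] = int(value) if value.isdigit() else None
    return directives


def cache_decision(age_seconds: int, max_age: int, stale_while_revalidate: int = 0) -> str:
    if age_seconds < max_age:
        return "fresh"  # serve from cache, no background work
    if age_seconds < max_age + stale_while_revalidate:
        return "stale-revalidate"  # serve stale result, refresh in background
    return "expired"  # too old to serve; re-execute the query first
```

With `max-age=300, stale-while-revalidate=60`, an entry that is 330 seconds old is served immediately while a background refresh runs, and an entry that is 400 seconds old is not served from the cache.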

Requirements:

Must use plan or raw SQL cache keys (set cache_key_type to sql or plan in results_caching configuration)
Background revalidation re-executes queries through the normal query path
Timestamp tracking automatically determines cache entry age for staleness checks

Example configuration via HTTP header:

GET /v1/sql
Cache-Control: max-age=600, stale-while-revalidate=120
X-Cache-Key-Type: sql

This feature improves application responsiveness while ensuring data freshness through background updates.

Security & Reliability Improvements

ODBC Connector Improvements: Removed unwrap calls from the ODBC connector, improving error handling and reliability. Fixed secret handling and Kubernetes secret integration.

CLI Permissions Hardening: Tightened file permissions for the CLI and install script, ensuring secure defaults for configuration files and credentials.

Oracle Instant Client Pinning: Oracle Instant Client downloads are now pinned to specific SHAs, ensuring reproducible builds and reducing the risk of supply chain attacks.

AWS Authentication Improvements

Key features:

Automatic retry with backoff: Implements Fibonacci backoff for transient credential failures (network issues, temporary AWS service disruptions)
Configurable retry limits: Supports up to 300 retry attempts with a maximum retry interval of 600 seconds
Better error handling: Distinguishes between retryable errors (connector errors) and non-retryable errors (misconfiguration)
Unauthenticated access support: Properly supports unauthenticated access to public S3 buckets without requiring credentials
Improved error messages: Provides detailed logging with attempt numbers, retry intervals, and error context for better troubleshooting

The improvements ensure more reliable AWS service integration, particularly in environments with intermittent network connectivity or during AWS service degradations.
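As a rough sketch of the retry schedule described above, the following Python generator yields Fibonacci-spaced delays capped at a maximum interval. The 300-attempt limit and 600-second cap come from the notes; the function name, the starting delay of one second, and the use of seconds as the base unit are assumptions for illustration only.

```python
from typing import Iterator


def fibonacci_backoff(max_attempts: int = 300, max_interval_s: int = 600) -> Iterator[int]:
    """Yield one delay (in seconds) per retry attempt, following the Fibonacci sequence.

    Assumption: the sequence starts at 1, 1 and each delay is capped at max_interval_s.
    """
    current, following = 1, 1
    for _ in range(max_attempts):
        yield min(current, max_interval_s)
        current, following = following, current + following


delays = list(fibonacci_backoff())
print(delays[:10])
```

Under these assumptions the delay grows gently at first and reaches the 600-second cap on the fifteenth attempt, after which every retry waits the maximum interval.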

Observability & Tracing

DataFusion Log Emission: The Spice runtime now emits DataFusion internal logs, providing deeper visibility into query planning and execution for debugging and performance analysis.

AI Completions Tracing: Fixed tracing so that ai_completions operations are correctly parented under sql_query traces, improving observability for AI-powered queries.

Git Data Connector (Alpha)

Example Spicepod.yml configuration:

datasets:
  - from: git:https://github.com/myorg/myrepo
    name: git_metrics
    params:
      file_format: csv

For more details, refer to the Git Data Connector Documentation.

Spice Java SDK 0.4.0

The Spice Java SDK has been upgraded with support for a configurable Arrow memory limit: spice-java v0.4.0

SpiceClient client = SpiceClient.builder()
    .withArrowMemoryLimitMB(1024) // 1GB limit
    .build();

CLI Improvements

Usage:

# Install a specific version
spice install v1.8.3

# Install a specific version with AI flavor
spice install v1.8.3 ai

# Install latest version (existing behavior)
spice install
spice install ai

Note: Homebrew installations require manual version management via brew install spiceai/spiceai/spice@<version>.

New REPL Commands:

.clear - Clear the screen using ANSI escape codes for a clean workspace
.clear history - Clear and persist the query history, removing all stored commands

Tab Completion: Tab completion now includes suggestions based on your command history, making it faster to re-run or modify previous queries.

Example usage:

sql> SELECT * FROM my_table;
sql> .clear          # Clears the screen
sql> .clear history  # Clears command history
sql> # Use arrow keys or tab to access previous commands

Additional Improvements & Bug Fixes

Reliability: Fixed refresh worker panics with recovery handling to prevent runtime crashes during acceleration refreshes.
Reliability: Improved error messages for missing or invalid spicepod.yaml files, providing actionable feedback for misconfiguration.
Reliability: Fixed DuckDB metadata pointer loading issues for snapshots.
Performance: Ensured ListingTable partitions are pruned correctly when filters are not used.
Reliability: Fixed vector dimension determination for partitioned indexes.
Search: Fixed casing issues in Reciprocal Rank Fusion (RRF) for hybrid search queries.
Search: Fixed search field handling as metadata for chunked search indexes.
Validation: Added timestamp support for partition expressions.
Validation: Fixed regexp_match function for DuckDB datasets.
Validation: Fixed partition name validation for improved reliability.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

New HTTP Data Connector Recipe: New recipe demonstrating how to query REST APIs and HTTP(s) endpoints. See HTTP Connector Recipe for details.

The Spice Cookbook includes 82 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.9.0-rc.4, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.9.0-rc.4 image:

docker pull spiceai/spiceai:1.9.0-rc.4

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Dependencies

DataFusion: Upgraded to v50
Apache Arrow: Upgraded to v56
DuckDB: Upgraded to v1.4.2
Delta Kernel: Upgraded to v0.16.0

Changelog (rc.4)

Upgrade shutdown signal Ordering by @krinart in #8017
Set max-age: 0 during stale by @lukekim in #8018
Add E2E test release for Helm by @lukekim in #8023
Update generate_changelog script by @krinart in #8028
[CLI] Improve auto-complete by @lukekim in #8022
Improve verify helm workflow by @lukekim in #8024
fix: Ensure Vortex UncompressedSizeInBytes is calculated by @peasee in #8044
WW README Update by @wyattwenzel in #8058
Reenable dynamic federation support by @Jeadie in #8026
fix: Prevent SortExec from ordering below SchemaCastScanExec by @peasee in #8061
Skip logging and return OK() on error during shutdown by @krinart in #8057
Partition pruning with complex expressions by @lukekim in #8040
Make DynamoDB snapshots consistent by @krinart in #8069
Add check for error log by @krinart in #8070
Fix tracing of 's3_vector_query_and_get' by @Jeadie in #8065
DuckDB v1.4.2 by @mach-kernel in #8073
Fix failing OpenAI test by @krinart in #8076
Enable 'test_recency_scoring' by @Jeadie in #8068
Test operator: avoid duplicate Flight requests when using --http-clients by @sgrebnov in #8071
Update load tests to use truth percentile values by @sgrebnov in #8079
Update DynamoDB to RC by @krinart in #8060
CachedQueryVector to avoid recomputing embedding vector for spilling/partitioned vector indexes. by @Jeadie in #8059
Fix DuckDB on_commit sink race by @lukekim in #8081
Add partitioned duckdb by @lukekim in #8083
[CLI] Security and santization by @lukekim in #8082
Fix partition_by expression by @lukekim in #8087
Data Components security fixes and sanitization by @lukekim in #8086
Runtime security and sanitization by @lukekim in #8088
Add spicepod-validator tool and fix spicepods by @lukekim in #8089
Skip data fetches for S3 single file refreshes by @lukekim in #8072
MCP security and sanitization by @lukekim in #8090
Add install-dev target back to Makefile by @Jeadie in #8100
fix 'testoperator run search' by @Jeadie in #8101
Update datafusion-table-providers - fix nullability inferences for MySQL and PostgreSQL, and fix full text search for PostgreSQL by @ewgenius in #8092
Remove duplicate install-with-models by @phillipleblanc in #8107
Improve Cayenne partitioning by @lukekim in #8097
Testoperator dispatch: respect verify_results dispatch configuration by @sgrebnov in #8106
Include 'match' column only if chunk offsets found in seach query 'LogicalPlan' by @Jeadie in #8102
Fix validation path by @lukekim in #8109
Fix dispatch paths by @lukekim in #8110
Fix dispatch spicepod paths by @lukekim in #8112
Update test Spicepods by @lukekim in #8131
Add validation to reference schema by @lukekim in #8111
Include root error when failing to find latest timestamp in accelerated table by @sgrebnov in #8132
fix: HTTP Connector validation, query and body by @lukekim in #8115
Update nsql model list by @lukekim in #8141
Update DynamoDB Benchmarks by @krinart in #8135
Fix Dremio E2E test by @sgrebnov in #8139
Improve HTTP Connector UX by @lukekim in #8146
QueryOverrides for DynamoDB benchmarks by @krinart in #8151
test-framework: add row count validation skipping with TPC-DS defaults by @sgrebnov in #8149
Suppress delta_kernel::listed_log_files warnings by @phillipleblanc in #8158
Update table providers to fix warning by @phillipleblanc in #8156
Suppress MCP limit log by @phillipleblanc in #8159
Remove incorrect tool name validation by @Jeadie in #8161
Disable results validation for federated/glue[csv].yaml by @phillipleblanc in #8163
Fix dynamodb overrides by @phillipleblanc in #8165
Fix dynamo db overrides again by @phillipleblanc in #8166
few more dynamodb overrides by @phillipleblanc in #8167
Add stub release notes for v1.9.0-rc.4 by @phillipleblanc in #8168

Spice v1.8.0 (Oct 6, 2025)

October 7, 2025 · 20 min read

Phillip LeBlanc

Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.8.0! 🧊

Spice v1.8.0 delivers major advances in data writes and scalable vector search, and introduces managed acceleration snapshots (in preview) for fast cold starts. This release adds write support for Iceberg tables using standard SQL INSERT INTO, partitioned S3 Vector indexes for petabyte-scale vector search, and a preview of the AI SQL function for direct LLM integration in SQL. Additional improvements include better reliability and the v3.0.3 release of the Spice.js Node.js SDK.

What's New in v1.8.0

Iceberg Table Write Support (Preview)

Append Data to Iceberg Tables with SQL INSERT INTO: Spice now supports writing to Iceberg tables and catalogs using standard SQL INSERT INTO statements. This enables data ingestion, transformation, and pipeline use cases—no Spark or external writer required.

Append-only: Initial version targets appends; no overwrite or delete.
Schema validation: Inserted data must match the target table schema.
Secure by default: Writes are only enabled for datasets or catalogs explicitly marked with access: read_write.

Example Spicepod configuration:

catalogs:
  - from: iceberg:https://glue.ap-northeast-3.amazonaws.com/iceberg/v1/catalogs/111111/namespaces
    name: ice
    access: read_write

datasets:
  - from: iceberg:https://iceberg-catalog-host.com/v1/namespaces/my_namespace/tables/my_table
    name: iceberg_table
    access: read_write

Example SQL usage:

-- Insert from another table
INSERT INTO iceberg_table
SELECT * FROM existing_table;

-- Insert with values
INSERT INTO iceberg_table (id, name, amount)
VALUES (1, 'John', 100.0), (2, 'Jane', 200.0);

-- Insert into catalog table
INSERT INTO ice.sales.transactions
VALUES (1001, '2025-01-15', 299.99, 'completed');

Note: Only Iceberg datasets and catalogs with access: read_write support writes. Internal Spice tables and other connectors remain read-only.

Learn more in the Iceberg Data Connector documentation.

Acceleration Snapshots for Fast Cold Starts (Preview)

Bootstrap Managed Accelerations from Object Storage: Spice now supports managed acceleration snapshots in preview, enabling datasets accelerated with file-based engines (DuckDB or SQLite) to bootstrap from a snapshot stored in object storage (such as S3) if the local acceleration file does not exist on startup. This dramatically reduces cold start times and enables ephemeral storage for accelerations with persistent recovery.

Key features:

Rapid readiness: Datasets can become ready in seconds by downloading a pre-built snapshot, skipping lengthy initial acceleration.
Hive-style partitioning: Snapshots are organized by month, day, and dataset for easy retention and management.
Flexible bootstrapping: Configurable fallback and retry behavior if a snapshot is missing or corrupted.

Example Spicepod configuration:

snapshots:
  enabled: true
  location: s3://some_bucket/some_folder/ # Folder for storing snapshots
  bootstrap_on_failure_behavior: warn # Options: warn, retry, fallback
  params:
    s3_auth: iam_role # All S3 dataset params accepted here

datasets:
  - from: s3://some_bucket/some_table/
    name: some_table
    params:
      file_format: parquet
      s3_auth: iam_role
    acceleration:
      enabled: true
      snapshots: enabled # Options: enabled, disabled, bootstrap_only, create_only
      engine: duckdb
      mode: file
      params:
        duckdb_file: /nvme/some_table.db

How it works:

On startup, if the acceleration file does not exist, Spice checks the snapshot location for the latest snapshot and downloads it.
Snapshots are stored as: s3://some_bucket/some_folder/month=2025-09/day=2025-09-30/dataset=some_table/some_table_<timestamp>.db
If no snapshot is found, a new acceleration file is created as usual.
Snapshots are written after each refresh (unless configured otherwise).
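The Hive-style layout shown above can be illustrated with a small Python sketch that builds the partitioned prefix and reads the partition keys back out of a snapshot path. The helper names are hypothetical, and the sketch deliberately stops at the directory prefix because the format of the <timestamp> suffix in the file name is not specified in these notes.

```python
from datetime import datetime, timezone


def snapshot_prefix(location: str, dataset: str, when: datetime) -> str:
    """Build the month=/day=/dataset= prefix under the configured snapshot location."""
    base = location.rstrip("/")
    return f"{base}/month={when:%Y-%m}/day={when:%Y-%m-%d}/dataset={dataset}/"


def parse_partitions(path: str) -> dict:
    """Extract key=value partition segments from a snapshot path."""
    return dict(segment.split("=", 1) for segment in path.split("/") if "=" in segment)


prefix = snapshot_prefix(
    "s3://some_bucket/some_folder/",
    "some_table",
    datetime(2025, 9, 30, tzinfo=timezone.utc),
)
print(prefix)
print(parse_partitions(prefix))
```

Because the partition keys are encoded in the path, retention tooling can select snapshots for a given month, day, or dataset by prefix alone.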

Supported snapshot modes:

enabled: Download and write snapshots.
bootstrap_only: Only download on startup, do not write new snapshots.
create_only: Only write snapshots, do not download on startup.
disabled: No snapshotting.
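The four modes reduce to two independent switches, which the following Python sketch spells out. The type and function names are hypothetical and only restate the list above together with the startup rule (a snapshot is downloaded only when the local acceleration file does not exist).

```python
from typing import NamedTuple


class SnapshotBehavior(NamedTuple):
    download_on_startup: bool
    write_after_refresh: bool


# Mode names are taken from the list above; the mapping is an illustrative restatement.
SNAPSHOT_MODES = {
    "enabled": SnapshotBehavior(download_on_startup=True, write_after_refresh=True),
    "bootstrap_only": SnapshotBehavior(download_on_startup=True, write_after_refresh=False),
    "create_only": SnapshotBehavior(download_on_startup=False, write_after_refresh=True),
    "disabled": SnapshotBehavior(download_on_startup=False, write_after_refresh=False),
}


def should_download(mode: str, local_file_exists: bool) -> bool:
    """A snapshot is fetched only if the mode allows it and no local file is present."""
    return SNAPSHOT_MODES[mode].download_on_startup and not local_file_exists


for mode, behavior in SNAPSHOT_MODES.items():
    print(mode, behavior)
```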

Note: This feature is only supported for file-based accelerations (DuckDB or SQLite) with dedicated files.

Why use acceleration snapshots?

Faster cold starts: Skip waiting for full acceleration on startup.
Ephemeral storage: Use fast local disks (e.g., NVMe) for acceleration, with persistent recovery from object storage.
Disaster recovery: Recover from federated source outages by bootstrapping from the latest snapshot.

Partitioned S3 Vector Indexes

Efficient, Scalable Vector Search with Partitioning: Spice now supports partitioning Amazon S3 Vector indexes and scatter-gather queries using a partition_by expression in the dataset vector engine configuration. Partitioned indexes enable faster ingestion, lower query latency, and scale to billions of vectors.

Example Spicepod configuration:

datasets:
  - name: reviews
    vectors:
      enabled: true
      engine: s3_vectors
      params:
        s3_vectors_bucket: my-bucket
        s3_vectors_index: base-embeddings
      partition_by:
        - 'bucket(50, PULocationID)'
    columns:
      - name: body
        embeddings:
          from: bedrock_titan
      - name: title
        embeddings:
          from: bedrock_titan

See the Amazon S3 Vectors documentation for details.
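To make the idea of bucket partitioning with scatter-gather queries concrete, here is a small, self-contained Python sketch. It is not how Spice or Amazon S3 Vectors implement it: the CRC32-based bucket function, the brute-force distance search, and all function names are assumptions chosen so the example runs on its own.

```python
import heapq
import zlib


def bucket(num_buckets: int, value) -> int:
    """Assign a value to one of num_buckets partitions.

    Assumption: a stable CRC32 hash; the hash Spice uses is not stated in these notes.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def search_partition(rows, query, k):
    """Brute-force top-k by squared L2 distance within one partition.

    rows is a list of (doc_id, vector); returns (distance, doc_id) pairs.
    """
    scored = [
        (sum((a - b) ** 2 for a, b in zip(vector, query)), doc_id)
        for doc_id, vector in rows
    ]
    return heapq.nsmallest(k, scored)


def scatter_gather(partitions, query, k):
    """Scatter the query to every partition, then gather and merge the partial top-k lists."""
    partial = [hit for rows in partitions.values() for hit in search_partition(rows, query, k)]
    return heapq.nsmallest(k, partial)


# Toy data: 20 documents partitioned by a location-like key into 4 buckets.
docs = [(f"d{i}", i % 7, [float(i), float(i % 3)]) for i in range(20)]
partitions = {}
for doc_id, location_id, vector in docs:
    partitions.setdefault(bucket(4, location_id), []).append((doc_id, vector))

print(scatter_gather(partitions, [5.2, 1.0], k=3))
```

Each partition only needs to return its own top k results, since the global top k is always contained in the union of the per-partition results.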

AI SQL function for LLM Integration (Preview)

LLMs Directly In SQL: A new asynchronous ai SQL function enables direct calls to LLMs from SQL queries for text generation, translation, classification, and more. This feature is released in preview and supports both default and model-specific invocation.

Example Spicepod model configuration:

models:
  - name: gpt-4o
    from: openai:gpt-4o
    params:
      openai_api_key: ${secrets:openai_key}

Example SQL usage:

-- basic usage with default model
SELECT ai('hi, this prompt is directly from SQL.');

-- basic usage with specified model
SELECT ai('hi, this prompt is directly from SQL.', 'gpt-4o');

-- Using row data as input to the prompt
SELECT ai(concat_ws(' ', 'Categorize the zone', Zone, 'in a single word. Only return the word.')) AS category
FROM taxi_zones
LIMIT 10;

Learn more in the SQL Reference AI documentation.

Remote Endpoint Support for Spice CLI

Run CLI Commands Remotely: The Spice CLI now supports connecting to remote Spice instances, so you can run spice sql, spice search, and spice chat from your local machine against a remote spiced daemon or Spice Cloud. Previously, these commands had to run on the same machine as the runtime. New flags enable remote execution:

--cloud: Connect to a Spice Cloud instance (requires --api-key).
--endpoint <endpoint>: Connect to a remote Spice instance via HTTP or Arrow Flight SQL (gRPC). Supports http://, https://, grpc://, or grpc+tls:// schemes.

Examples:

# Run SQL queries against a remote Spice instance
spice sql --endpoint http://remote-host:8090

# Use Spice Cloud for chat or search
spice chat --cloud --api-key <your-api-key>
spice search --cloud --api-key <your-api-key>
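When scripting around the CLI, it can help to check an endpoint's scheme before passing it to --endpoint. The following Python helper is a hypothetical pre-flight check, not part of the Spice CLI; only the four scheme names come from the flag description above.

```python
from urllib.parse import urlparse

# Schemes listed for --endpoint in the notes above.
SUPPORTED_SCHEMES = {"http", "https", "grpc", "grpc+tls"}


def check_endpoint(endpoint: str) -> str:
    """Return the endpoint's scheme, or raise ValueError if it is not a supported one."""
    scheme = urlparse(endpoint).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported endpoint scheme in {endpoint!r}")
    return scheme


print(check_endpoint("http://remote-host:8090"))
print(check_endpoint("grpc+tls://remote-host"))
```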

Supported CLI Commands:

spice sql --cloud / spice sql --endpoint <endpoint>
spice search --cloud / spice search --endpoint <endpoint>
spice chat --cloud / spice chat --endpoint <endpoint>

Additional Flags:

--headers: Pass custom HTTP headers to the remote endpoint.
--tls-root-certificate-file: Specify a root certificate for TLS verification.
--user-agent: Set a custom user agent for requests.

For more details, see the Spice CLI Command Reference.

Spice.js v3.0.3 SDK

Spice.js v3.0.3 Released: The official Spice.ai Node.js/JavaScript SDK has been updated to v3.0.3, bringing cross-platform support, new APIs, and improved reliability for both Node.js and browser environments.

Modern Query Methods: Use sql(), sqlJson(), and nsql() for flexible querying, streaming, and natural language to SQL.
Browser Support: The SDK now works in browsers and web applications, automatically selecting the optimal transport (gRPC or HTTP).
Health Checks & Dataset Refresh: Easily monitor Spice runtime health and trigger dataset refreshes on demand.
Automatic HTTP Fallback: If gRPC/Flight is unavailable, the SDK falls back to HTTP automatically.
Migration Guidance: v3 requires Node.js 20+, uses camelCase parameters, and introduces a new package structure.

Example usage:

import { SpiceClient } from '@spiceai/spice'

const client = new SpiceClient(apiKey)
const table = await client.sql('SELECT * FROM my_table LIMIT 10')
console.table(table.toArray())

See Spice.js SDK documentation for full details, migration tips, and advanced usage.

Additional Improvements

Reliability: Improved logging, error handling, and network readiness checks across connectors (Iceberg, Databricks, etc.).
Vector search durability and scale: Refined logging, stricter default limits, safeguards against index-only scans and duplicate results, and always-accessible metadata for robust queryability at scale.
Cache behavior: Tightened cache logic for modification queries.
Full-Text Search: FTS metadata columns now usable in projections; max search results increased to 1000.
RRF Hybrid Search: Reciprocal Rank Fusion (RRF) UDTF enhancements for advanced hybrid search scenarios.
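For readers unfamiliar with Reciprocal Rank Fusion, the core idea can be sketched in Python as follows. This shows the general technique only, not the Spice UDTF: the function name is hypothetical, and k = 60 is a commonly used smoothing constant rather than a value taken from these notes.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_results = ["a", "b", "c"]   # e.g. ranked output of a vector search
text_results = ["b", "d", "a"]     # e.g. ranked output of a full-text search
print(reciprocal_rank_fusion([vector_results, text_results]))
```

Documents that rank well in both lists rise to the top, which is why RRF suits hybrid search that combines vector and full-text results without needing comparable raw scores.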

Contributors

Breaking Changes

This release introduces two breaking changes to search observability and tooling.

Firstly, the document_similarity tool has been renamed to search. The trace name for these tool calls changes accordingly:

## Old: v1.7.1
>> spice trace tool_use::document_similarity
>> curl -XPOST http://localhost:8090/v1/tools/document_similarity \
  -d '{
    "datasets": ["my_tbl"],
    "text": "Welcome to another Spice release"
  }'

## New: v1.8.0
>> spice trace tool_use::search
>> curl -XPOST http://localhost:8090/v1/tools/search \
  -d '{
    "datasets": ["my_tbl"],
    "text": "Welcome to another Spice release"
  }'

Secondly, the vector_search task in runtime.task_history has been renamed to search.

Cookbook Updates

Added new AI SQL function recipe for invoking LLMs within SQL queries.
Updated Iceberg Catalog Connector recipe for Iceberg Writes.
Updated Spice.js JavaScript (Node.js) SDK recipe for v3.0.3 with examples and a v2 to v3 migration guide.

The Spice Cookbook now includes 80 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.8.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.8.0 image:

docker pull spiceai/spiceai:1.8.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Dependencies

iceberg-rust: Upgraded to v0.7.0-rc.1
mimalloc: Upgraded from 0.1.47 to 0.1.48
azure_core: Upgraded from 0.27.0 to 0.28.0
Jimver/cuda-toolkit: Upgraded from 0.2.27 to 0.2.28

Changelog

Add #[cfg(feature = "postgres")] to acceleration refresh tests by @Jeadie in #7241
fix: Update benchmark snapshots by @github-actions[bot] in #7267
fix: Update benchmark snapshots by @github-actions[bot] in #7268
fix: Update benchmark snapshots by @github-actions[bot] in #7269
Update the tpch benchmark snapshots for: federated/databricks[sql_warehouse].yaml by @github-actions[bot] in #7270
EmbeddingInput cache keys to include model name by @mach-kernel in #7275
ensure FTS metadata columns can be used in projection by @Jeadie in #7282
Use 8-core runners for Windows CUDA builds by @sgrebnov in #7284
Make search test more robust by @krinart in #7283
Post-release housekeeping by @sgrebnov in #7272
fix: Use median cached response duration for test search cache by @peasee in #7286
Bump dirs from 5.0.1 to 6.0.0 by @dependabot[bot] in #7244
Bump indexmap from 2.11.0 to 2.11.4 by @dependabot[bot] in #7248
Fix JOIN level filters not having columns in schema by @Jeadie in #7287
use SessionContext::new_empty in RRF by @kczimm in #7291
Use rust:1.89-slim-bookworm for build, more places to bump rust version by @sgrebnov in #7293
Update openapi.json by @github-actions[bot] in #7290
Enable chunking in SearchIndex by @Jeadie in #7143
Add index name and remove duplicate records string to S3 Vectors log by @lukekim in #7260
Use file-based fts index by @Jeadie in #7024
Remove 'PostApplyCandidateGeneration' by @Jeadie in #7288
RRF: Rank and recency boosting by @mach-kernel in #7294
Update ROADMAP.md by removing v1.7 milestone by @sgrebnov in #7297
RRF: Preserve base ranking when results differ -> FULL OUTER JOIN does not produce time column by @mach-kernel in #7300
chore: remove unused Dataset methods by @kczimm in #7295
fix removing embedding column by @Jeadie in #7302
fix: Add feature flag for using object store in spicepod by @peasee in #7303
Upgrade to iceberg-rust v0.7.0-rc1 by @sgrebnov in #7296
Enable DML Update SQL operations for datasets configured as access: read_write by @sgrebnov in #7304
Create and parse partitioned S3 vector index names by @kczimm in #7198
RRF: Fix decay for disjoint result sets by @mach-kernel in #7305
RRF: Project top scores, do not yield duplicate results by @mach-kernel in #7306
RRF: Case sensitive column/ident handling by @mach-kernel in #7309
For vector_search, use a default limit of 1000 if no limit specified by @lukekim in #7311
Don’t cache modification queries (DDL, DML, COPY) by @sgrebnov in #7316
Fix Anthropic model regex and add validation tests by @ewgenius in #7319
Enhancement: Implement before/after/lag metrics for acceleration refresh by @krinart in #7310
Refactor chat model health check to lower tokens usage for reasoning models by @ewgenius in #7317
Add support for writing into Iceberg tables by @sgrebnov in #7315
Fix lint warnings by @lukekim in #7327
Use logical plan in SearchQueryProvider by @Jeadie in #7314
FTS max search results 100 -> 1000 by @Jeadie in #7331
Improve Databricks SQL Warehouse Error Handling by @sgrebnov in #7332
Use spicepod embedding model name for model_name() by @Jeadie in #7333
Handle async queries for Databricks SQL Warehouse API by @phillipleblanc in #7335
Enable DML (INSERT INTO) operations for catalogs configured as access:read_write by @sgrebnov in #7330
Bump regex from 1.11.2 to 1.11.3 by @dependabot[bot] in #7336
Update qa_analytics.csv with 1.7.0 release data by @sgrebnov in #7337
RRF: Fix ident resolution for struct fields, autohashed join key for varying types by @mach-kernel in #7339
v1.7.1 release notes by @kczimm in #7348
Bump Jimver/cuda-toolkit from 0.2.27 to 0.2.28 by @dependabot[bot] in #7343
Add support for writing into Glue (Iceberg) tables and catalogs by @sgrebnov in #7355
Bump mimalloc from 0.1.47 to 0.1.48 by @dependabot[bot] in #7342
Add ai async UDF by @lukekim in #7328
Use self-hosted and spiceai-macos runners for workflows where possible by @lukekim in #7371
Several updates for improved search testing by @Jeadie in #7358
Update supported versions in SECURITY.md by @Jeadie in #7377
1.7.1 release analytics by @mach-kernel in #7380
Add acceleration_file_path helper and refactor spice_sys to use Snafu errors by @phillipleblanc in #7376
fix: Update benchmark snapshots by @github-actions[bot] in #7353
Robust search test by @Jeadie in #7381
[bug] Fix ai UDF bug of mismatched column length by @lukekim in #7383
Add OpenOption to spice_sys acceleration tables by @phillipleblanc in #7379
Add new snapshots Spicepod configuration by @phillipleblanc in #7384
Update naming of tool_use::document_similarity and vector_search spans by @Jeadie in #7273
fix: Update benchmark snapshots by @github-actions[bot] in #7354
Make ai UDF a models only feature by @lukekim in #7387
Add new runtime_acceleration crate; create SnapshotManager; implement SnapshotManager::download_latest_snapshot by @phillipleblanc in #7386
Refactor 'VectorScanTableProvider' to use just 'VectorIndex::list_table_provider' by @Jeadie in #7318
Fix embed logs by @Jeadie in #7382
Enable spicepod dependencies in testoperator by @Jeadie in #7334
ai UDF security and performance optimizations by @lukekim in #7392
Wire up the snapshot download on dataset startup by @phillipleblanc in #7389
Implement initial snapshot creation logic in SnapshotManager by @phillipleblanc in #7391
Make tool_use::table_schema output model-friendly by @krinart in #7393
Fix minor lint warnings by @lukekim in #7395
Enable metadata columns in document-based object store datasets by @Jeadie in #7397
Core dependencies of financebench by @Jeadie in #7400
Add S3vector variant to financebench by @Jeadie in #7399
Set PostgreSQL unsupported_spice_action=string by default by @lukekim in #7398
Use non-blocking connection check for verify_ns_lookup_and_tcp_connect by @phillipleblanc in #7401
Bump moka from 0.12.10 to 0.12.11 by @dependabot[bot] in #7340
Bump tokio-postgres from 0.7.13 to 0.7.14 by @dependabot[bot] in #7344
Bump azure_core from 0.27.0 to 0.28.0 by @dependabot[bot] in #7338
Forbid INSERT OVERWRITE DML operations by @sgrebnov in #7402
Make database connection pool sizes consistent by @lukekim in #7403
Disable vector index only scans by @Jeadie in #7405
Make CLI --endpoint and --cloud args & table output consistent by @lukekim in #7396
Write new snapshots at the end of an accelerated refresh by @phillipleblanc in #7410
Read and write partitioned S3 indexes by @kczimm in #7313
Fix partial data writes in Iceberg data connector by @sgrebnov in #7411
Remove nix by @phillipleblanc in #7414
Use DataFusion JoinSetTracer for async context propagation by @lukekim in #7416
Implement cache invalidation for DML (INSERT INTO) operations by @sgrebnov in #7394
Make cleanup disk GH action; use in integration tests by @Jeadie in #7418
Move S3Vector to 'search' crate by @Jeadie in #7373
Use LogicalPlan builder API for LogicalPlans by @Jeadie in #7408
Use hive-style partitioned paths for DB snapshots by @phillipleblanc in #7422
Limit results from SearchIndex::query_table_provider by @Jeadie in #7421
Delay initial readiness if snapshots are enabled with an append-mode refresh by @phillipleblanc in #7425
Disable snapshots by default by @phillipleblanc in #7426
Rewrite ChunkedNonIndexVectorGeneration to use LogicalPlanBuilder (instead of string formatting) by @Jeadie in #7413
Fix for search field as metadata for chunked search indexes by @Jeadie in #7429
Add feature is currently in preview warning for read_write access mode by @sgrebnov in #7440
Add feature is currently in preview warning for snapshots by @sgrebnov in #7442
Fix tracing so that ai_completions are parented under sql_query by @lukekim in #7415
Disable acceleration refresh metrics by @krinart in #7450
Enable snapshot acceleration by default by @phillipleblanc in #7451
fix: partition name validation by @kczimm in #7452

Spice v1.3.2 (June 2, 2025)

June 2, 2025 · 2 min read

Phillip LeBlanc

Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.3.2! ❄️

Spice v1.3.2 is a patch release with fixes to the DuckDB data accelerator and Snowflake data connector.

Changes:

DuckDB Data Accelerator: Supports ORDER BY rand() for randomized result ordering and ORDER BY NULL for SQL compatibility.
Snowflake Data Connector: Adds TIMESTAMP_NTZ(0) type for timestamps with seconds precision.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No new cookbook recipes.

The Spice Cookbook now includes 67 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.3.2, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.3.2 image:

docker pull spiceai/spiceai:1.3.2

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

No major dependency changes.

Changelog

Handle Snowflake Timestamp NTZ with seconds precision (#6084) by @kczimm in #6084
Fix DuckDB acceleration ORDER BY rand() and ORDER BY NULL (#6071) by @phillipleblanc in #6071

Full Changelog: https://github.com/spiceai/spiceai/compare/v1.3.1...v1.3.2

Spice v1.3.0 (May 19, 2025)

May 20, 2025 · 9 min read

Phillip LeBlanc

Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.3.0! 🏎️

Spice v1.3.0 accelerates data and AI applications with significantly improved query performance, reliability, and expanded Databricks integration. New support for the Databricks SQL Statement Execution API enables direct SQL queries on Databricks SQL Warehouses, complementing Mosaic AI model serving and embeddings (introduced in v1.2.2) and existing Databricks catalog and dataset integrations. This release upgrades to DataFusion v46, optimizes results caching performance, and strengthens security with least-privilege sandboxing improvements.

What's New in v1.3.0

Databricks SQL Statement Execution API Support: Added support for the Databricks SQL Statement Execution API, enabling direct SQL queries against Databricks SQL Warehouses for optimized performance in analytics and reporting workflows.

Example spicepod.yml configuration:

datasets:
  - from: databricks:spiceai.datasets.my_awesome_table
    name: my_awesome_table
    params:
      mode: sql_warehouse
      databricks_endpoint: ${env:DATABRICKS_ENDPOINT}
      databricks_sql_warehouse_id: ${env:DATABRICKS_SQL_WAREHOUSE_ID}
      databricks_token: ${env:DATABRICKS_TOKEN}

For details, see the Databricks Data Connector documentation.

Improved Results Cache Performance & Hashing Algorithm: Spice now supports an alternative results cache hashing algorithm, ahash, in addition to siphash, which remains the default. Configure it via:
```
runtime:
  results_cache:
    hashing_algorithm: ahash # or siphash
```
The hashing algorithm determines how cache keys are hashed before being stored, affecting both lookup speed and resistance to denial-of-service (DoS) attacks.

Using ahash improves performance for large queries or query plans. Combined with results cache optimizations, it reduces 99th percentile request latency and increases total requests/second for queries with large result sets (100k+ cached rows). The following charts show performance tested against the TPCH Query #17 on a scale factor 5 dataset (30+ million rows, 5GB):

Charts: Latency, Req/sec

Note: ahash was not available in v1.2.2, so it is excluded from comparisons.

To learn more, refer to the Results Cache Hashing Algorithm documentation.

SQL Query Performance: Optimized the critical SQL query path, reducing overhead and improving response times for simple queries by 10-20%.
DuckDB Acceleration: Fixed a bug in the DuckDB acceleration engine causing query failures under high concurrency when querying datasets accelerated into multiple DuckDB files.
Container Security: The container image now runs as a non-root user with enhanced sandboxing and includes only essential dependencies for a slimmer, more secure image.

DataFusion v46 Highlights

Spice.ai is built on the DataFusion query engine. The v46 release brings:

Faster Performance 🚀: DataFusion 46 introduces significant performance enhancements, including a 2x faster median() function for large datasets without grouping, 10–100% speed improvements in FIRST_VALUE and LAST_VALUE window functions by avoiding sorting, and a 40x faster uuid() function. Additional optimizations, such as a 50% faster repeat() string function, accelerated chr() and to_hex() functions, improved grouping algorithms, and Parquet row group pruning with NOT LIKE filters, further boost overall query efficiency.
New range() Table Function: A new table-valued function range(start, stop, step) has been added to make it easy to generate integer sequences — similar to PostgreSQL’s generate_series() or Spark’s range(). Example: SELECT * FROM range(1, 10, 2);
UNION [ALL | DISTINCT] BY NAME Support: DataFusion now supports UNION BY NAME and UNION ALL BY NAME, which align columns by name instead of position. This matches functionality found in systems like Spark and DuckDB and simplifies combining heterogeneously ordered result sets.

Example:
```
SELECT col1, col2 FROM t1
UNION ALL BY NAME
SELECT col2, col1 FROM t2;
```
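
To make the by-name alignment concrete, here is a small Python sketch of what UNION ALL BY NAME does to rows. The function `union_all_by_name` is a hypothetical helper written for illustration, not a DataFusion API, and it assumes both inputs have exactly the same set of column names.

```python
from typing import List, Sequence, Tuple

Row = Tuple[object, ...]


def union_all_by_name(
    left_cols: Sequence[str],
    left_rows: Sequence[Row],
    right_cols: Sequence[str],
    right_rows: Sequence[Row],
) -> Tuple[List[str], List[Row]]:
    """Reorder the right-hand rows to the left-hand column order, then append."""
    if set(left_cols) != set(right_cols):
        # Assumption of this sketch: both sides expose the same column names.
        raise ValueError("inputs must have identical column names")
    # For each output column, find its position in the right-hand input.
    positions = [list(right_cols).index(name) for name in left_cols]
    aligned = [tuple(row[i] for i in positions) for row in right_rows]
    # ALL semantics: duplicates are kept.
    return list(left_cols), list(left_rows) + aligned


cols, rows = union_all_by_name(
    ["col1", "col2"], [(1, "a")],
    ["col2", "col1"], [("b", 2)],
)
print(cols, rows)
```

A positional UNION ALL would have placed "b" under col1. Matching by name puts 2 there instead.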

See the DataFusion 46.0.0 release notes for details.

Spice.ai adopts the second-latest DataFusion release (latest minus one) for quality assurance and stability. The upgrade to DataFusion v47 is planned for Spice v1.4.0 in June.

Contributors

Breaking Changes

The container image now always runs as a non-root user (UID/GID 65534) with minimal dependencies, resulting in a smaller, more secure image. Standard Linux tools, including bash, are no longer included.

Kubernetes Deployments:

Use of the v1.3.0+ Helm chart is required, which includes a securityContext ensuring the sandbox user has required file access.
For deployments using a Helm chart earlier than v1.3.0, add the following securityContext to the pod specification:

securityContext:
  runAsUser: 65534
  runAsGroup: 65534
  fsGroup: 65534
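
For workloads that are managed outside the Helm chart, the same settings can be generated as a JSON merge patch. This is a sketch under assumptions: the helper `security_context_patch` is hypothetical, and the `spec.template.spec` path assumes a Deployment or StatefulSet rather than a bare Pod.

```python
import json
from typing import Dict

# UID/GID of the non-root sandbox user, as stated in the breaking change.
SANDBOX_ID = 65534


def security_context_patch(uid: int = SANDBOX_ID) -> Dict[str, object]:
    """Build a merge patch that sets the pod-level securityContext."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "securityContext": {
                        "runAsUser": uid,
                        "runAsGroup": uid,
                        # fsGroup makes mounted volumes accessible to this group.
                        "fsGroup": uid,
                    }
                }
            }
        }
    }


patch_json = json.dumps(security_context_patch(), indent=2)
print(patch_json)
```

The printed JSON could then be passed to `kubectl patch` with `--type merge` against the workload in question. The resource name and namespace depend on your deployment.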

See the Docker Sandbox Guide for details on how to update custom Docker images to restore the previous behavior.

Cookbook Updates

Added Accelerated Views: Pre-calculate and materialize data derived from one or more underlying datasets.

The Spice Cookbook now includes 67 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.3.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.3.0 image:

docker pull spiceai/spiceai:1.3.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

DataFusion: Upgraded to v46
Apache Arrow: Upgraded to v54.3.0
delta_kernel: Upgraded to v0.10.0

Changelog

update to 1.2.2 by @Jeadie in #5806
Move sandboxing logic to Dockerfile by @phillipleblanc in #5808
Add note to run installation health workflow after release is marked as official by @Sevenannn in #5797
ROADMAP updates May 13, 2025 by @lukekim in #5809
Update qa_analytics.csv by @kczimm in #5810
post-release housekeeping by @Jeadie in #5811
Fix flaky DataBricks M2M integration tests by @phillipleblanc in #5818
Add DataFusion request context extension to http routes by @ewgenius in #5807
Use Utf8 for partition columns by @phillipleblanc in #5820
Use full path for location metadata column by @phillipleblanc in #5819
Remove the DataFusion reference from the flight service and use the reference from the request context instead by @ewgenius in #5821
Upgrade delta_kernel to 0.10 by @phillipleblanc in #5823
fix: Update benchmark snapshots by @app/github-actions in #5827
Update qa_analytics.csv by @kczimm in #5824
fix: Update benchmark snapshots by @app/github-actions in #5826
fix: Update benchmark snapshots by @app/github-actions in #5825
Fix dispatch spicepod reference for file[parquet]-duckdb[file]-indexes and file[parquet]-duckdb[memory]-indexes by @phillipleblanc in #5837
Fix spice run --http-endpoint in CLI by @Jeadie in #5812
Prevent excessively copying RawCacheKey by @peasee in #5838
Make DuckDB database attachments logic more robust by @sgrebnov in #5839
Simplify Databricks U2M auth flow, by moving user auth to the request context by @ewgenius in #5842
Update to new MCP crate by @Jeadie in #5758
Disable the query tracker when task history is disabled by @peasee in #5852
Set fsGroup on PodSpec to force volumes to be mounted with permission to docker image by @phillipleblanc in #5854
Clarify Helm release steps by @phillipleblanc in #5855
Avoid cloning cached results by @peasee in #5853
Upgrade to DataFusion 46 by @phillipleblanc in #5543
Update openapi.json by @app/github-actions in #5856
Adapt to Arrow 54 changes in Dict IDs preserving (Arrow IPC) by @sgrebnov in #5866
fix: Update benchmark snapshots by @app/github-actions in #5867
Fix s3[parquet]-duckdb[file-many] benchmark Spicepod configuration by @sgrebnov in #5868
fix: Update benchmark snapshots by @app/github-actions in #5869
feat: Refactor caching, support hashing algorithms by @peasee in #5859
Overried health checks for Databricks models in U2M auth mode by @ewgenius in #5858
Update trunk to 1.4.0-unstable by @phillipleblanc in #5878
fix: Pass parameters to testoperator explain plan by @peasee in #5883
Disallow schema updates for existing accelerated tables by @phillipleblanc in #5887
Deferrable registration for Databricks U2M datasets by @ewgenius in #5860

See the full list of changes at: v1.2.2...v1.3.0
