Skip to main content

10 posts tagged with "sql"

SQL queries and database management

View All Tags

Spice v1.8.1 (Oct 13, 2025)

Β· 5 min read
Viktor Yershov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.8.1! πŸš€

Spice v1.8.1 is a patch release that adds Acceleration Snapshots Indexes, and includes a number of bug fixes and performance improvements.

What's New in v1.8.1​

Acceleration Snapshot Indexes​

  • Management of Acceleration Snapshots has been improved by adopting an Iceberg-inspired metadata.json, which now encodes pointer IDs, schema serialization, and robust checksum and size, which is validate before loading the snapshot.

  • Acceleration Snapshot Metrics: The following metrics are now available for Acceleration Snapshots:

  • dataset_acceleration_snapshot_bootstrap_duration_ms: The time it took the runtime to download the snapshot - only emitted when it initially downloads the snapshot.

  • dataset_acceleration_snapshot_bootstrap_bytes: The number of bytes downloaded to bootstrap the acceleration from the snapshot.

  • dataset_acceleration_snapshot_bootstrap_checksum: The checksum of the snapshot used to bootstrap the acceleration.

  • dataset_acceleration_snapshot_failure_count: Number of failures encountered when writing a new snapshot at the end of the refresh cycle. A snapshot failure does not prevent the refresh from completing.

  • dataset_acceleration_snapshot_write_timestamp: Unix timestamp in seconds when the last snapshot was completed.

  • dataset_acceleration_snapshot_write_duration_ms: The time it took to write the snapshot to object storage.

  • dataset_acceleration_snapshot_write_bytes: The number of bytes written on the last snapshot write.

  • dataset_acceleration_snapshot_write_checksum: The SHA256 checksum of the last snapshot write.

To learn more, see the Acceleration Snapshots Documentation and the Metrics Documentation.

Improved Regular Expression for DuckDB acceleration​

Regular expression support has been expanded when using DuckDB acceleration for functions like regexp-like and regexp_match.

For more details, refer to the SQL Reference for the list of available regular expression functions.

Additional Improvements & Bugfixes​

  • Reliability: Resolved an issue with partitioning on empty partition sets.
  • Validation: Added better validation for incorrectly configured Spicepods.
  • Reliability: Fixed partition_by accelerations when a projection is applied on empty partition sets.
  • Performance: Ensured ListingTable partitions are pruned when filters are not used.
  • Performance: Don't download acceleration snapshots if the acceleration is already present.
  • Performance: Refactored some blocking I/O and synchronization in the async codebase by moving operations to tokio::task::spawn_blocking, replacing blocking locks with async-friendly variants.
  • Bugfix: Nullable fields are now supported for S3 Vectors index columns.

Contributors​

Breaking Changes​

No breaking changes.

Cookbook Updates​

  • New Accelerated Snapshots Recipe - The recipe shows how to bootstrap DuckDB accelerations from object storage to skip cold starts.

The Spice Cookbook includes 81 recipes to help you get started with Spice quickly and easily.


Upgrading​

To upgrade to v1.8.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.8.1 image:

docker pull spiceai/spiceai:1.8.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

πŸŽ‰ Spice is now available in the AWS Marketplace!

What's Changed​

Changelog​

Spice v1.8.0 (Oct 6, 2025)

Β· 20 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.8.0! 🧊

Spice v1.8.0 delivers major advances in data writes, scalable vector search, and now in previewβ€”managed acceleration snapshots for fast cold starts. This release introduces write support for Iceberg tables using standard SQL INSERT INTO, partitioned S3 Vector indexes for petabyte-scale vector search, and preview of the AI SQL function for direct LLM integration in SQL. Additional improvements include improved reliability, and the v3.0.3 release of the Spice.js Node.js SDK.

What's New in v1.8.0​

Iceberg Table Write Support (Preview)​

Append Data to Iceberg Tables with SQL INSERT INTO: Spice now supports writing to Iceberg tables and catalogs using standard SQL INSERT INTO statements. This enables data ingestion, transformation, and pipeline use casesβ€”no Spark or external writer required.

  • Append-only: Initial version targets appends; no overwrite or delete.
  • Schema validation: Inserted data must match the target table schema.
  • Secure by default: Writes are only enabled for datasets or catalogs explicitly marked with access: read_write.

Example Spicepod configuration:

catalogs:
- from: iceberg:https://glue.ap-northeast-3.amazonaws.com/iceberg/v1/catalogs/111111/namespaces
name: ice
access: read_write

datasets:
- from: iceberg:https://iceberg-catalog-host.com/v1/namespaces/my_namespace/tables/my_table
name: iceberg_table
access: read_write

Example SQL usage:

-- Insert from another table
INSERT INTO iceberg_table
SELECT * FROM existing_table;

-- Insert with values
INSERT INTO iceberg_table (id, name, amount)
VALUES (1, 'John', 100.0), (2, 'Jane', 200.0);

-- Insert into catalog table
INSERT INTO ice.sales.transactions
VALUES (1001, '2025-01-15', 299.99, 'completed');

Note: Only Iceberg datasets and catalogs with access: read_write support writes. Internal Spice tables and other connectors remain read-only.

Learn more in the Iceberg Data Connector documentation.

Acceleration Snapshots for Fast Cold Starts (Preview)​

Bootstrap Managed Accelerations from Object Storage: Spice now supports managed acceleration snapshots in preview, enabling datasets accelerated with file-based engines (DuckDB or SQLite) to bootstrap from a snapshot stored in object storage (such as S3) if the local acceleration file does not exist on startup. This dramatically reduces cold start times and enables ephemeral storage for accelerations with persistent recovery.

Key features:

  • Rapid readiness: Datasets can become ready in seconds by downloading a pre-built snapshot, skipping lengthy initial acceleration.
  • Hive-style partitioning: Snapshots are organized by month, day, and dataset for easy retention and management.
  • Flexible bootstrapping: Configurable fallback and retry behavior if a snapshot is missing or corrupted.

Example Spicepod configuration:

snapshots:
enabled: true
location: s3://some_bucket/some_folder/ # Folder for storing snapshots
bootstrap_on_failure_behavior: warn # Options: warn, retry, fallback
params:
s3_auth: iam_role # All S3 dataset params accepted here

datasets:
- from: s3://some_bucket/some_table/
name: some_table
params:
file_format: parquet
s3_auth: iam_role
acceleration:
enabled: true
snapshots: enabled # Options: enabled, disabled, bootstrap_only, create_only
engine: duckdb
mode: file
params:
duckdb_file: /nvme/some_table.db

How it works:

  • On startup, if the acceleration file does not exist, Spice checks the snapshot location for the latest snapshot and downloads it.
  • Snapshots are stored as: s3://some_bucket/some_folder/month=2025-09/day=2025-09-30/dataset=some_table/some_table_<timestamp>.db
  • If no snapshot is found, a new acceleration file is created as usual.
  • Snapshots are written after each refresh (unless configured otherwise).

Supported snapshot modes:

  • enabled: Download and write snapshots.
  • bootstrap_only: Only download on startup, do not write new snapshots.
  • create_only: Only write snapshots, do not download on startup.
  • disabled: No snapshotting.

Note: This feature is only supported for file-based accelerations (DuckDB or SQLite) with dedicated files.

Why use acceleration snapshots?

  • Faster cold starts: Skip waiting for full acceleration on startup.
  • Ephemeral storage: Use fast local disks (e.g., NVMe) for acceleration, with persistent recovery from object storage.
  • Disaster recovery: Recover from federated source outages by bootstrapping from the latest snapshot.

Partitioned S3 Vector Indexes​

Efficient, Scalable Vector Search with Partitioning: Spice now supports partitioning Amazon S3 Vector indexes and scatter-gather queries using a partition_by expression in the dataset vector engine configuration. Partitioned indexes enable faster ingestion, lower query latency, and scale to billions of vectors.

Example Spicepod configuration:

datasets:
- name: reviews
vectors:
enabled: true
engine: s3_vectors
params:
s3_vectors_bucket: my-bucket
s3_vectors_index: base-embeddings
partition_by:
- 'bucket(50, PULocationID)'
columns:
- name: body
embeddings:
from: bedrock_titan
- name: title
embeddings:
from: bedrock_titan

See the Amazon S3 Vectors documentation for details.

AI SQL function for LLM Integration (Preview)​

LLMs Directly In SQL: A new asynchronous ai SQL function enables direct calls to LLMs from SQL queries for text generation, translation, classification, and more. This feature is released in preview and supports both default and model-specific invocation.

Example Spicepod model configuration:

models:
- name: gpt-4o
from: openai:gpt-4o
params:
openai_api_key: ${secrets:openai_key}

Example SQL usage:

-- basic usage with default model
SELECT ai('hi, this prompt is directly from SQL.');
-- basic usage with specified model
SELECT ai('hi, this prompt is directly from SQL.', 'gpt-4o');
-- Using row data as input to the prompt
SELECT ai(concat_ws(' ', 'Categorize the zone', Zone, 'in a single word. Only return the word.')) AS category
FROM taxi_zones
LIMIT 10;

Learn more in the SQL Reference AI documentation.

Remote Endpoint Support for Spice CLI​

Run CLI Commands Remotely: The Spice CLI now supports connecting to remote Spice instances, enabling you to run spice sql, spice search, and spice chat commands from your local machine against a remote spiced daemon or to Spice Cloud. Previously, these commands required running on the same machine as the runtime. Now, new flags allow remote execution:

  • --cloud: Connect to a Spice Cloud instance (requires --api-key).
  • --endpoint <endpoint>: Connect to a remote Spice instance via HTTP or Arrow Flight SQL (gRPC). Supports http://, https://, grpc://, or grpc+tls:// schemes.

Examples:

# Run SQL queries against a remote Spice instance
spice sql --endpoint http://remote-host:8090

# Use Spice Cloud for chat or search
spice chat --cloud --api-key <your-api-key>
spice search --cloud --api-key <your-api-key>

Supported CLI Commands:

  • spice sql --cloud / spice sql --endpoint <endpoint>
  • spice search --cloud / spice search --endpoint <endpoint>
  • spice chat --cloud / spice chat --endpoint <endpoint>

Additional Flags:

  • --headers: Pass custom HTTP headers to the remote endpoint.
  • --tls-root-certificate-file: Specify a root certificate for TLS verification.
  • --user-agent: Set a custom user agent for requests.

For more details, see the Spice CLI Command Reference.

Spice.js v3.0.3 SDK​

Spice.js v3.0.3 Released: The official Spice.ai Node.js/JavaScript SDK has been updated to v3.0.3, bringing cross-platform support, new APIs, and improved reliability for both Node.js and browser environments.

  • Modern Query Methods: Use sql(), sqlJson(), and nsql() for flexible querying, streaming, and natural language to SQL.
  • Browser Support: SDK now works in browsers and web applications, automatically selecting the optimal transport (gRPC or HTTP).
  • Health Checks & Dataset Refresh: Easily monitor Spice runtime health and trigger dataset refreshes on demand.
  • Automatic HTTP Fallback: If gRPC/Flight is unavailable, the SDK falls back to HTTP automatically.
  • Migration Guidance: v3 requires Node.js 20+, uses camelCase parameters, and introduces a new package structure.

Example usage:

import { SpiceClient } from '@spiceai/spice'

const client = new SpiceClient(apiKey)
const table = await client.sql('SELECT * FROM my_table LIMIT 10')
console.table(table.toArray())

See Spice.js SDK documentation for full details, migration tips, and advanced usage.

Additional Improvements​

  • Reliability: Improved logging, error handling, and network readiness checks across connectors (Iceberg, Databricks, etc.).
  • Vector search durability and scale: Refined logging, stricter default limits, safeguards against index-only scans and duplicate results, and always-accessible metadata for robust queryability at scale.
  • Cache behavior: Tightened cache logic for modification queries.
  • Full-Text Search: FTS metadata columns now usable in projections; max search results increased to 1000.
  • RRF Hybrid Search: Reciprocal Rank Fusion (RRF) UDTF enhancements for advanced hybrid search scenarios.

Contributors​

Breaking Changes​

This release introduces two breaking changes associated with the search observability and tooling.

Firstly, the document_similarity tool has been renamed to search. This has the equivalent change to tracing of these tool calls:

## Old: v1.7.1
>> spice trace tool_use::document_similarity
>> curl -XPOST http://localhost:8090/v1/tools/document_similarity \
-d '{
"datasets": ["my_tbl"],
"text": "Welcome to another Spice release"
}'

## New: v1.8.0
>> spice trace tool_use::search
>> curl -XPOST http://localhost:8090/v1/tools/search \
-d '{
"datasets": ["my_tbl"],
"text": "Welcome to another Spice release"
}'

Secondly, the vector_search task in runtime.task_history has been renamed to search.

Cookbook Updates​

The Spice Cookbook now includes 80 recipes to help you get started with Spice quickly and easily.


Upgrading​

To upgrade to v1.8.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.8.0 image:

docker pull spiceai/spiceai:1.8.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

πŸŽ‰ Spice is now available in the AWS Marketplace!

What's Changed​

Dependencies​

  • iceberg-rust: Upgraded to v0.7.0-rc.1
  • mimalloc: Upgraded from 0.1.47 to 0.1.48
  • azure_core: Upgraded from 0.27.0 to 0.28.0
  • Jimver/cuda-toolkit: Upgraded from 0.2.27 to 0.2.28

Changelog​

Spice v1.5.0 (July 21, 2025)

Β· 14 min read
Evgenii Khramkov
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.5.0! πŸ”

Spice v1.5.0 brings major upgrades to search and retrieval. It introduces native support for Amazon S3 Vectors, enabling petabyte scale vector search directly from S3 vector buckets, alongside SQL-integrated vector and tantivy-powered full-text search, partitioning for DuckDB acceleration, and automated refreshes for search indexes and views. It includes the AWS Bedrock Embeddings Model Provider, the Oracle Database connector, and the now-stable Spice.ai Cloud Data Connector, and the upgrade to DuckDB v1.3.2.

What's New in v1.5.0​

Amazon S3 Vectors Support: Spice.ai now integrates with Amazon S3 Vectors, launched in public preview on July 15, 2025, enabling vector-native object storage with built-in indexing and querying. This integration supports semantic search, recommendation systems, and retrieval-augmented generation (RAG) at petabyte scale with S3’s durability and elasticity. Spice.ai manages the vector lifecycleβ€”ingesting data, creating embeddings with models like Amazon Titan or Cohere via AWS Bedrock, or others available on HuggingFace, and storing it in S3 Vector buckets.

Spice integration with Amazon S3 Vectors

Example Spicepod.yml configuration for S3 Vectors:

datasets:
- from: s3://my_data_bucket/data/
name: my_vectors
params:
file_format: parquet
acceleration:
enabled: true
vectors:
engine: s3_vectors
params:
s3_vectors_aws_region: us-east-2
s3_vectors_bucket: my-s3-vectors-bucket
columns:
- name: content
embeddings:
- from: bedrock_titan
row_id:
- id

Example SQL query using S3 Vectors:

SELECT *
FROM vector_search(my_vectors, 'Cricket bats', 10)
WHERE price < 100
ORDER BY score

For more details, refer to the S3 Vectors Documentation.

SQL-integrated Search: Vector and BM25-scored full-text search capabilities are now natively available in SQL queries, extending the power of the POST v1/search endpoint to all SQL workflows.

Example Vector-Similarity-Search (VSS) using the vector_search UDTF on the table reviews for the search term "Cricket bats":

SELECT review_id, review_text, review_date, score
FROM vector_search(reviews, "Cricket bats")
WHERE country_code="AUS"
LIMIT 3

Example Full-Text-Search (FTS) using the text_search UDTF on the table reviews for the search term "Cricket bats":

SELECT review_id, review_text, review_date, score
FROM text_search(reviews, "Cricket bats")
LIMIT 3

DuckDB v1.3.2 Upgrade: Upgraded DuckDB engine from v1.1.3 to v1.3.2. Key improvements include support for adding primary keys to existing tables, resolution of over-eager unique constraint checking for smoother inserts, and 13% reduced runtime on TPC-H SF100 queries through extensive optimizer refinements. The v1.2.x release of DuckDB was skipped due to a regression in indexes.

Partitioned Acceleration: DuckDB file-based accelerations now support partition_by expressions, enabling queries to scale to large datasets through automatic data partitioning and query predicate pruning. New UDFs, bucket and truncate, simplify partition logic.

New UDFs useful for partition_by expressions:

  • bucket(num_buckets, col): Partitions a column into a specified number of buckets based on a hash of the column value.
  • truncate(width, col): Truncates a column to a specified width, aligning values to the nearest lower multiple (e.g., truncate(10, 101) = 100).

Example Spicepod.yml configuration:

datasets:
- from: s3://my_bucket/some_large_table/
name: my_table
params:
file_format: parquet
acceleration:
enabled: true
engine: duckdb
mode: file
partition_by: bucket(100, account_id) # Partition account_id into 100 buckets

Full-Text-Search (FTS) Index Refresh: Accelerated datasets with search indexes maintain up-to-date results with configurable refresh intervals.

Example refreshing search indexes on body every 10 seconds:

datasets:
- from: github:github.com/spiceai/docs/pulls
name: spiceai.doc.pulls
params:
github_token: ${secrets:GITHUB_TOKEN}
acceleration:
enabled: true
refresh_mode: full
refresh_check_interval: 10s
columns:
- name: body
full_text_search:
enabled: true
row_id:
- id

Scheduled View Refresh: Accelerated Views now support cron-based refresh schedules using refresh_cron, automating updates for accelerated data.

Example Spicepod.yml configuration:

views:
- name: my_view
sql: SELECT 1
acceleration:
enabled: true
refresh_cron: '0 * * * *' # Every hour

For more details, refer to Scheduled Refreshes.

Multi-column Vector Search: For datasets configured with embeddings on more than one column, POST v1/search and similarity_search perform parallel vector search on each column, aggregating results using reciprocal rank fusion.

Example Spicepod.yml for multi-column search:

datasets:
- from: github:github.com/apache/datafusion/issues
name: datafusion.issues
params:
github_token: ${secrets:GITHUB_TOKEN}
columns:
- name: title
embeddings:
- from: hf_minilm
- name: body
embeddings:
- from: openai_embeddings

AWS Bedrock Embeddings Model Provider: Added support for AWS Bedrock embedding models, including Amazon Titan Text Embeddings and Cohere Text Embeddings.

Example Spicepod.yml:

embeddings:
- from: bedrock:cohere.embed-english-v3
name: cohere-embeddings
params:
aws_region: us-east-1
input_type: search_document
truncate: END
- from: bedrock:amazon.titan-embed-text-v2:0
name: titan-embeddings
params:
aws_region: us-east-1
dimensions: '256'

For more details, refer to the AWS Bedrock Embedding Models Documentation.

Oracle Data Connector: Use from: oracle: to access and accelerate data stored in Oracle databases, deployed on-premises or in the cloud.

Example Spicepod.yml:

datasets:
- from: oracle:"SH"."PRODUCTS"
name: products
params:
oracle_host: 127.0.0.1
oracle_username: scott
oracle_password: tiger

See the Oracle Data Connector documentation.

GitHub Data Connector: The GitHub data connector supports query and acceleration of members, the users of an organization.

Example Spicepod.yml configuration:

datasets:
- from: github:github.com/spiceai/members # General format: github.com/[org-name]/members
name: spiceai.members
params:
# With GitHub Apps (recommended)
github_client_id: ${secrets:GITHUB_SPICEHQ_CLIENT_ID}
github_private_key: ${secrets:GITHUB_SPICEHQ_PRIVATE_KEY}
github_installation_id: ${secrets:GITHUB_SPICEHQ_INSTALLATION_ID}
# With GitHub Tokens
# github_token: ${secrets:GITHUB_TOKEN}

See the GitHub Data Connector Documentation

Spice.ai Cloud Data Connector: Graduated to Stable.

spice-rs SDK Release: The Spice Rust SDK has updated to v3.0.0. This release includes optimizations for the Spice client API, adds robust query retries, and custom metadata configurations for spice queries.

Contributors​

Breaking Changes​

  • Search HTTP API Response: POST v1/search response payload has changed. See the new API documentation for details.
  • Model Provider Parameter Prefixes: Model Provider parameters use provider-specific prefixes instead of openai_ prefixes (e.g., hf_temperature for HuggingFace, anthropic_max_completion_tokens for Anthropic, perplexity_tool_choice for Perplexity). The openai_ prefix remains supported for backward compatibility but is deprecated and will be removed in a future release.

Cookbook Updates​

The Spice Cookbook now includes 72 recipes to help you get started with Spice quickly and easily.

Upgrading​

To upgrade to v1.5.0, download and install the specific binary from github.com/spiceai/spiceai/releases/tag/v1.5.0 or pull the v1.5.0 Docker image (spiceai/spiceai:1.5.0).

What's Changed​

Dependencies​

Changelog​

  • fix: openai model endpoint (#6394) by @Sevenannn in #6394
  • Enable configuring otel endpoint from spice run (#6360) by @Advayp in #6360
  • Enable Oracle connector in default build configuration (#6395) by @sgrebnov in #6395
  • fix llm integraion test (#6398) by @Sevenannn in #6398
  • Promote spice cloud connector to stable quality (#6221) by @Sevenannn in #6221
  • v1.5.0-rc.1 release notes (#6397) by @lukekim in #6397
  • Fix model nsql integration tests (#6365) by @Sevenannn in #6365
  • Fix incorrect UDTF name and SQL query (#6404) by @lukekim in #6404
  • Update v1.5.0-rc.1.md (#6407) by @sgrebnov in #6407
  • Improve error messages (#6405) by @lukekim in #6405
  • build(deps): bump Jimver/cuda-toolkit from 0.2.25 to 0.2.26 (#6388) by @app/dependabot in #6388
  • Upgrade dependabot dependencies (#6411) by @phillipleblanc in #6411
  • Fix projection pushdown issues for document based file connector (#6362) by @Advayp in #6362
  • Add a PartitionedDuckDB Accelerator (#6338) by @kczimm in #6338
  • Use vector_search() UDTF in HTTP APIs (#6417) by @Jeadie in #6417
  • add supported types (#6409) by @kczimm in #6409
  • Enable session time zone override for MySQL (#6426) by @sgrebnov in #6426
  • Acceleration-like indexing for full text search indexes. (#6382) by @Jeadie in #6382
  • Provide error message when partition by expression changes (#6415) by @kczimm in #6415
  • Add support for Oracle Autonomous Database connections (Oracle Cloud) (#6421) by @sgrebnov in #6421
  • prune partitions for exact and in list with and without UDFs (#6423) by @kczimm in #6423
  • Fixes and reenable FTS tests (#6431) by @Jeadie in #6431
  • Upgrade DuckDB to 1.3.2 (#6434) by @phillipleblanc in #6434
  • Fix issue in limit clause for the Github Data connector (#6443) by @Advayp in #6443
  • Upgrade iceberg-rust to 0.5.1 (#6446) by @phillipleblanc in #6446
  • v1.5.0-rc.2 release notes (#6440) by @lukekim in #6440
  • Oracle: add automated TPC-H SF1 benchmark tests (#6449) by @sgrebnov in #6449
  • fix: Update benchmark snapshots (#6455) by @app/github-actions in #6455
  • Preserve ArrowError in arrow_tools::record_batch (#6454) by @mach-kernel in #6454
  • fix: Update benchmark snapshots (#6465) by @app/github-actions in #6465
  • Add option to preinstall Oracle ODPI-C library in Docker image (#6466) by @sgrebnov in #6466
  • Include Oracle connector (federated mode) in automated benchmarks (#6467) by @sgrebnov in #6467
  • Update crates/llms/src/bedrock/embed/mod.rs by @lukekim in #6468
  • v1.5.0-rc.3 release notes (#6474) by @lukekim in #6474
  • Add integration tests for S3 Vectors filters pushdown (#6469) by @sgrebnov in #6469
  • check for indexedtableprovider when finding tables to search on (#6478) by @Jeadie in #6478
  • Parse fully qualified table names in UDTFs (#6461) by @Jeadie in #6461
  • Add integration test for S3 Vectors to cover data update (overwrite) (#6480) by @sgrebnov in #6480
  • Add 'Run all tests' option for models tests and enable Bedrock tests (#6481) by @sgrebnov in #6481
  • Add support for a members table type for the GitHub Data Connector (#6464) by @Advayp in #6464
  • S3 vector data cannot be null (#6483) by @Jeadie in #6483
  • Don't infer FixedSizeList size during indexing vectors. (#6487) by @Jeadie in #6487
  • Add support for retention_sql acceleration param (#6488) by @sgrebnov in #6488
  • Make dataset refresh progress tracing less verbose (#6489) by @sgrebnov in #6489
  • Use RwLock on tantivy index in FullTextDatabaseIndex for update concurrency (#6490) by @Jeadie in #6490
  • Add tests for dataset retention logic and refactor retention code (#6495) by @sgrebnov in #6495
  • Upgade dependabot dependencies (#6497) by @phillipleblanc in #6497
  • Add periodic tracing of data loading progress during dataset refresh (#6499) by @sgrebnov in #6499
  • Promote Oracle Data Connector to Alpha (#6503) by @sgrebnov in #6503
  • Use AWS SDK to provide credentials for Iceberg connectors (#6498) by @phillipleblanc in #6498
  • Add integration tests for partitioning (#6463) by @kczimm in #6463
  • Use top-level table in full-text search JOIN ON (#6491) by @Jeadie in #6491
  • Use accelerated table in vector_search JOIN operations when appropriate (#6516) by @Jeadie in #6516
  • Fix 'additional_column' for quoted columns (fix for qualified columns broke it) (#6512) by @Jeadie in #6512
  • Also use AWS SDK for inferring credentials for S3/Delta/Databricks Delta data connectors (#6504) by @phillipleblanc in #6504
  • Add per-dataset availability monitor configuration (#6482) by @phillipleblanc in #6482
  • Suppress the warning from the AWS SDK if it can't load credentials (#6533) by @phillipleblanc in #6533
  • Change default value of check_availability from default to auto (#6534) by @lukekim in #6534
  • README.md improvements for v1.5.0 (#6539) by @lukekim in #6539
  • Temporary disable s3_vectors_basic (#6537) by @sgrebnov in #6537
  • Ensure binder errors show before query and other (#6374) by @suhuruli in #6374
  • Update spiceai/duckdb-rs -> DuckDB 1.3.2 + index fix (#6496) by @mach-kernel in #6496
  • Update table-providers to latest version with DuckDB fixes (#6535) by @phillipleblanc in #6535
  • S3: default to public access if no auth is provided (#6532) by @sgrebnov in #6532

Spice v1.3.1 (May 26, 2025)

Β· 3 min read
Luke Kim
Founder and CEO of Spice AI

Announcing the release of Spice v1.3.1! πŸ›‘οΈ

Spice v1.3.1 includes improvements to Databricks SQL Warehouse support and parameterized query handling, along with several bugfixes.

What's New in v1.3.1​

  • Databricks SQL Warehouse Added support for the STRUCT type, enabled join pushdown for queries within the same SQL Warehouse and added projection to logical plans to force federation with correct SQL dialect.

  • SQL Improvements: Fixed an issue where ILike was incorrectly optimized to string equality in DataFusion/Arrow and aliased the random() function to rand() for better compatibility.

  • Parameterized Queries: Fixed parameter schema ordering for queries with more than 10 parameters and resolved placeholder inference issues in CASE expressions.

Contributors​

Breaking Changes​

No breaking changes.

Cookbook Updates​

No new cookbook recipes.

The Spice Cookbook now includes 67 recipes to help you get started with Spice quickly and easily.

Upgrading​

To upgrade to v1.3.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.3.1 image:

docker pull spiceai/spiceai:1.3.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed​

Dependencies​

No major dependency changes.

Changelog​

Full Changelog: github.com/spiceai/spiceai/compare/v1.3.0...v1.3.1

Spice v1.3.0 (May 19, 2025)

Β· 9 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.3.0! 🏎️

Spice v1.3.0 accelerates data and AI applications with significantly improved query performance, reliability, and expanded Databricks integration. New support for the Databricks SQL Statement Execution API enables direct SQL queries on Databricks SQL Warehouses, complementing Mosaic AI model serving and embeddings (introduced in v1.2.2) and existing Databricks catalog and dataset integrations. This release upgrades to DataFusion v46, optimizes results caching performance, and strengthens security with least-privilege sandboxed improvements.

What's New in v1.3.0​

  • Databricks SQL Statement Execution API Support: Added support for the Databricks SQL Statement Execution API, enabling direct SQL queries against Databricks SQL Warehouses for optimized performance in analytics and reporting workflows.

    Example spicepod.yml configuration:

    datasets:
    - from: databricks:spiceai.datasets.my_awesome_table
    name: my_awesome_table
    params:
    mode: sql_warehouse
    databricks_endpoint: ${env:DATABRICKS_ENDPOINT}
    databricks_sql_warehouse_id: ${env:DATABRICKS_SQL_WAREHOUSE_ID}
    databricks_token: ${env:DATABRICKS_TOKEN}

    For details, see the Databricks Data Connector documentation.

  • Improved Results Cache Performance & Hashing Algorithm: Spice now supports an alternative results cache hashing algorithm, ahash, in addition to siphash, being the default. Configure it via:

    runtime:
    results_cache:
    hashing_algorithm: ahash # or siphash

    The hashing algorithm determines how cache keys are hashed before being stored, impacting both lookup speed and protection against potential DOS attacks.

    Using ahash improves performance for large queries or query plans. Combined with results cache optimizations, it reduces 99th percentile request latency and increases total requests/second for queries with large result sets (100k+ cached rows). The following charts show performance tested against the TPCH Query #17 on a scale factor 5 dataset (30+ million rows, 5GB):

    LatencyReq/sec
    Improvements for the 99th percentile query latency, compared against 1.2.2 with cache key type and hashing algorithm.Improvements for the requests/second, compared against 1.2.2 with cache key type and hashing algorithm.

    Note: ahash was not available in v1.2.2, so it is excluded from comparisons.

    To learn more, refer to the Results Cache Hashing Algorithm documentation.

  • SQL Query Performance: Optimized the critical SQL query path, reducing overhead and improving response times for simple queries by 10-20%.

  • DuckDB Acceleration: Fixed a bug in the DuckDB acceleration engine causing query failures under high concurrency when querying datasets accelerated into multiple DuckDB files.

  • Container Security: The container image now runs as a non-root user with enhanced sandboxing and includes only essential dependencies for a slimmer, more secure image.

DataFusion v46 Highlights​

Spice.ai is built on the DataFusion query engine. The v46 release brings:

  • Faster Performance πŸš€: DataFusion 46 introduces significant performance enhancements, including a 2x faster median() function for large datasets without grouping, 10–100% speed improvements in FIRST_VALUE and LAST_VALUE window functions by avoiding sorting, and a 40x faster uuid() function. Additional optimizations, such as a 50% faster repeat() string function, accelerated chr() and to_hex() functions, improved grouping algorithms, and Parquet row group pruning with NOT LIKE filters, further boost overall query efficiency.

  • New range() Table Function: A new table-valued function range(start, stop, step) has been added to make it easy to generate integer sequences β€” similar to PostgreSQL’s generate_series() or Spark’s range(). Example: SELECT * FROM range(1, 10, 2);

  • UNION [ALL | DISTINCT] BY NAME Support: DataFusion now supports UNION BY NAME and UNION ALL BY NAME, which align columns by name instead of position. This matches functionality found in systems like Spark and DuckDB and simplifies combining heterogeneously ordered result sets.

    Example:

    SELECT col1, col2 FROM t1
    UNION ALL BY NAME
    SELECT col2, col1 FROM t2;

See the DataFusion 46.0.0 release notes for details.

Spice.ai adopts the latest minus one DataFusion release for quality assurance and stability. The upgrade to DataFusion v47 is planned for Spice v1.4.0 in June.

Contributors​

Breaking Changes​

The container image now always runs as a non-root user (UID/GID 65534) with minimal dependencies, resulting in a smaller, more secure image. Standard Linux tools, including bash, are no longer included.

Kubernetes Deployments:

  • Use of the v1.3.0+ Helm chart is required, which includes a securityContext ensuring the sandbox user has required file access.

  • For deployments using a lower version than the v1.3.0 Helm chart, add the following securityContext to the pod specification:

securityContext:
runAsUser: 65534
runAsGroup: 65534
fsGroup: 65534

See the Docker Sandbox Guide for details on how to update custom Docker images to restore the previous behavior.

Cookbook Updates​

  • Added Accelerated Views: Pre-calculate and materialize data derived from one or more underlying datasets.

The Spice Cookbook now includes 67 recipes to help you get started with Spice quickly and easily.

Upgrading​

To upgrade to v1.3.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.3.0 image:

docker pull spiceai/spiceai:1.3.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed​

Dependencies​

Changelog​

See the full list of changes at: v1.2.2...v1.3.0