Spice.ai Features
Spice provides a set of features for building data-driven applications and AI agents. This page gives an overview of each feature area.
Data Query and Federation
Query Federation connects multiple data sources (databases, data warehouses, and data lakes) through a single SQL interface. Write one query that joins data across PostgreSQL, Snowflake, S3, and other sources. Spice pushes query operations down to the source databases when possible to reduce data transfer.
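A federated query might look like the following sketch. The dataset names `orders` (PostgreSQL-backed) and `customers` (Snowflake-backed) are illustrative assumptions, not datasets from this page:

```sql
-- Join two datasets backed by different sources through one SQL interface.
-- Spice pushes filters and projections down to each source where possible.
SELECT c.name, SUM(o.amount) AS total_spend
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
WHERE o.created_at >= '2024-01-01'
GROUP BY c.name
ORDER BY total_spend DESC;
```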
Data Acceleration and Caching
Data Acceleration materializes remote datasets locally in memory or on disk using engines like Arrow, DuckDB, SQLite, or PostgreSQL. Accelerated datasets stay current through scheduled refreshes, append mode, or Change Data Capture (CDC). Caching stores query and search results in memory with configurable TTLs and eviction policies to avoid redundant computation.
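As a sketch, an accelerated dataset is declared in the Spicepod; the connection name and source table below are assumptions, and exact parameter names should be checked against the Spicepod reference:

```yaml
# Materialize a remote PostgreSQL table locally with DuckDB.
datasets:
  - from: postgres:public.orders
    name: orders
    acceleration:
      enabled: true
      engine: duckdb              # or arrow, sqlite, postgres
      mode: file                  # persist on disk; memory keeps it in RAM
      refresh_check_interval: 10m # scheduled refresh cadence
```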
AI and Language Models
The Large Language Models feature provides an OpenAI-compatible API gateway for hosted models (OpenAI, Anthropic, xAI) and locally served models (Llama, Phi) with CUDA and Metal acceleration. Models can call tools to query datasets, run SQL, and retrieve schemas. The Embeddings feature generates vector representations of text for semantic search and RAG workflows.
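A hosted model behind the OpenAI-compatible gateway might be declared like this; the model id and secret name are assumptions for illustration:

```yaml
# Register a hosted model served through the OpenAI-compatible endpoint.
models:
  - from: openai:gpt-4o
    name: assistant
    params:
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }
```

Once loaded, the model is reachable through the standard OpenAI-style chat completions API under the name `assistant`.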
Search
Search supports three methods: vector search (semantic similarity using embeddings), full-text search (keyword matching with BM25 scoring), and hybrid search (combining both with Reciprocal Rank Fusion). All search methods are accessible through SQL UDTFs like vector_search() and text_search().
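The UDTF names come from the text above; the argument shapes (a dataset and a query string) are assumptions, so check the search reference for exact signatures:

```sql
-- Semantic similarity over an assumed `products` dataset with embeddings.
SELECT * FROM vector_search(products, 'wireless noise-cancelling headphones') LIMIT 5;

-- Keyword matching with BM25 scoring over the same dataset.
SELECT * FROM text_search(products, 'headphones') LIMIT 5;
```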
Functions
Functions extend SQL with custom scalar functions declared in a Spicepod. Inline SQL bodies run in-process and can use any DataFusion built-in; remote HTTP/HTTPS endpoints batch row inputs over JSON, delegating logic to ML models, internal services, or custom code. Every function is automatically callable from SQL and, by default, surfaced as an LLM tool.
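An inline SQL-tier function might be declared roughly as follows. This is illustrative only: the key names and parameter syntax are assumptions, not taken from the Spicepod reference:

```yaml
# Hypothetical inline scalar function with an in-process SQL body.
functions:
  - name: to_celsius
    description: Convert a Fahrenheit temperature to Celsius
    params:
      temp_f: DOUBLE
    returns: DOUBLE
    sql: (temp_f - 32) * 5.0 / 9.0
    as_tool: true   # also surface the function as an LLM tool
```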
Tool Registry
Tool Registry keeps per-turn token cost bounded as the runtime's tool catalog grows. It replaces individual tool definitions with searchable tool_search and tool_invoke meta-tools backed by hybrid full-text, keyword, schema, and vector search. It applies uniformly to built-in tools, MCP tools, and Functions declared with as_tool: true, typically yielding a ~10× reduction in tool-definition tokens for tool-heavy Spicepods.
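From the model's perspective, the two meta-tools compose into a search-then-invoke loop. Only the names tool_search and tool_invoke come from the text above; the argument shapes and the tool returned in step 1 are assumptions:

```yaml
# Step 1: search the catalog instead of reading every tool definition.
- call: tool_search
  arguments:
    query: "refresh an accelerated dataset"
# (returns matching tool definitions with their schemas)

# Step 2: invoke the chosen tool by name with its own arguments.
- call: tool_invoke
  arguments:
    name: refresh_dataset   # hypothetical tool surfaced by step 1
    arguments:
      dataset: orders
```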
Monitoring and Observability
Observability exposes Prometheus-compatible metrics, OpenTelemetry metric export, and distributed tracing with Zipkin. Integrations are available for Datadog, Grafana, and other monitoring platforms.
🗃️ Query Federation
2 items
🗃️ Data Acceleration
7 items
📄️ Caching
Learn how to use Spice in-memory caching.
📄️ Distributed Query
Learn how to run Spice in distributed mode for larger scale queries, including the async queries API.
🗃️ Change Data Capture
1 item
📄️ Data Ingestion
Learn how to ingest data in Spice.
🗃️ Large Language Models
6 items
📄️ Machine Learning Models
Spice supports loading and serving ONNX models for inference, from sources including local filesystems, Hugging Face, and the Spice.ai Cloud platform.
📄️ Embedding Datasets
Learn how to define datasets with embedding columns, or augment existing datasets with them.
🗃️ Search
4 items
📄️ Functions
Define custom scalar and table SQL functions inline (SQL tier) or by calling remote HTTP services (Remote tier), automatically exposed as SQL functions and LLM tools.
📄️ Semantic Model
Learn how to define and use semantic data models with Spice.
🗃️ Observability
1 item
📄️ Tool Registry
Reduce per-turn token cost and improve LLM tool selection accuracy by replacing individual tool definitions with searchable tool_search and tool_invoke meta-tools backed by hybrid full-text, keyword, schema, and vector search.
📄️ Web Search
Learn how Spice can perform web search.
