
A New Class of Applications That Learn and Adapt

· 5 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

A new class of applications that learn and adapt is becoming possible through machine learning (ML). These applications learn from data and make decisions to achieve the application's goals. In the post Making apps that learn and adapt, Luke described how developers integrate this ability to learn and adapt as a core part of the application's logic. You can think of the component that does this as a "decision engine." This post will explore a brief history of decision engines and use-cases for this application class.

History of decision engines

The idea of building intelligent decision-making applications is not new. Developers first created these applications around the 1970s¹, and they are some of the earliest examples of using artificial intelligence to solve real-world problems.

The first applications used a class of decision engines called "expert systems." A distinguishing trait of expert systems is that they encode human expertise in rules for decision-making. Domain experts created combinations of rules that powered decision-making capabilities.

Some uses of expert systems include:

However, the resources required to build expert systems make employing them infeasible for many applications². They often need a significant investment of time and resources to capture and encode expertise into complex rule sets. These systems also do not automatically learn from experience, relying on experts to write more rules to improve decision-making.

With the advent of modern deep-learning techniques and the ability to access significantly more data, it is now possible for the computer, not only the developer, to learn and encode the rules to power a decision engine and improve them over time. The vision for Spice.ai is to make it easy for developers to build this new class of applications. So what are some use-cases for these applications?

Use cases of decision-making applications

Reduce energy costs by optimizing air conditioning

Today: The air conditioning system for an office building runs on a fixed schedule and is set to a fixed temperature during business hours, adjusting only with in-room sensor data, if at all. This behavior potentially overcools the building toward closing time, as the outside temperature drops and the building starts to empty.

With Spice.ai: The application combines time-series data from multiple sources, including time of day, day of the week, building/room occupancy, outside temperature, energy consumption, and pricing. The A/C controller application learns how to adjust the air conditioning system as rooms naturally cool toward the end of the day and occupancy decreases, with the decision engine rewarded for maintaining the desired temperature while minimizing energy consumption and cost.

Food delivery order dispatching

Today: Customers order food delivery with a mobile app. When the order is ready to be picked up from the restaurant, it is dispatched to a delivery driver by a simple heuristic that chooses the nearest available driver. As the app grows in popularity and the number of restaurants, drivers, and customers increases, the heuristic needs constant tuning, or human operators to supplement it, to handle the demand.

With Spice.ai: The application learns which driver to dispatch to minimize delivery time and maximize customer star ratings. It considers several factors from the data, including patterns in both the restaurants' and the drivers' order histories. As the number of users, drivers, and customers increases over time, the app adapts to keep up with the changing patterns and demands of the business.

Routing stock or crypto trades to the best exchange

Today: When trading stocks through a broker like Fidelity or TD Ameritrade, your broker will likely route your order to an exchange like the NYSE. And in the emerging world of crypto, you can place your trade or swap directly on a decentralized exchange (DEX) like Uniswap or PancakeSwap. In both cases, orders are likely routed either by a traditional rules-based expert system or even manually.

With Spice.ai: A smart order routing application learns from data such as pending transactions, time of day, day of the week, transaction size, and the recent history of transactions. It finds patterns to determine the optimal route or exchange for executing the transaction and getting you the best trade.

Summary

A new class of applications that can learn and adapt is made possible by integrating AI-powered decision engines. Spice.ai is a decision engine that makes it easy for developers to build these applications.

If you'd like to partner with us in creating this new generation of intelligent decision-making applications, we invite you to join us on Discord, reach out on Twitter or email us.

Phillip

Footnotes

  1. Russell, Stuart; Norvig, Peter (1995). Artificial Intelligence: A Modern Approach. Simon & Schuster. pp. 22–23. ISBN 978-0-13-103805-9.

  2. Kendal, S. L.; Creen, M. (2007). An Introduction to Knowledge Engineering. London: Springer. ISBN 978-1-84628-475-5.

Spice.ai v0.5.1-alpha

· 3 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice.ai v0.5.1-alpha! 📈

This minor release builds upon v0.5-alpha, adding the ability to start training from the dashboard plus support for monitoring training runs with TensorBoard.

Highlights in v0.5.1-alpha

Start training from dashboard

A "Start Training" button has been added to the pod page on the dashboard so that you can easily start training runs from that context.

Training runs can now be started by:

  • Modifications to the Spicepod YAML file.
  • The spice train <pod name> command.
  • The "Start Training" dashboard button.
  • POST API calls to /api/v0.1/pods/{pod name}/train (see the example below).
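
For example, a training run can be started from any HTTP client. Here is a minimal Python sketch; the pod name trader and the local runtime address are placeholders for illustration:

```python
import requests

# Hypothetical pod name for illustration; replace with your own pod.
POD_NAME = "trader"

# Adjust the host/port to wherever your Spice.ai runtime is listening.
BASE_URL = "http://localhost:8000"

# Start a training run via the v0.1 API.
response = requests.post(f"{BASE_URL}/api/v0.1/pods/{POD_NAME}/train")
response.raise_for_status()
print(f"Training started for pod '{POD_NAME}': HTTP {response.status_code}")
```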

TensorBoard monitoring

TensorBoard monitoring is now supported when using the DQL (default) or the new SACD learning algorithm announced in v0.5-alpha.

When enabled, TensorBoard logs will automatically be collected, and an "Open TensorBoard" button will be shown on the pod page in the dashboard.

Logging can be enabled at the pod level with the training_loggers pod param or per training run with the CLI --training-loggers argument.

Support for VPG will be added in v0.6-alpha. The design allows for additional loggers to be added in the future. Let us know what you'd like to see!

New in this release

  • Adds a start training button on the dashboard pod page.
  • Adds TensorBoard logging and monitoring when using DQL and SACD learning algorithms.

Dependency updates

  • Updates to Tailwind 3.0.6
  • Updates to Glide Data Grid 3.2.1

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!

Understanding Q-learning: How a Reward Is All You Need

· 11 min read
Corentin Risselin
Software Engineer at Spice AI

There are two general ways to train an AI to match a given expectation: we can either give it the expected outputs (commonly named labels) for different inputs, which we call supervised learning, or we can provide a reward for each output as a score, which is reinforcement learning (RL).

Supervised learning works by tweaking all the parameters (weights in neural networks) to fit the desired outputs, expecting that given enough input/label pairs the AI will find common rules that generalize for any input.

Reinforcement learning's reward is often provided by a simple function that can score any output: we don't know what specific output would be best, but we can recognize how good the result is. In this latter statement, there are two underlying concepts we will address in this post:

  • Can we only tell if the output is good in a binary way, or do we have to quantify the output to train our AI?
  • Do we have to give a reward for every AI's output? Can we give a reward only at specific times?

Those questions are already mostly answered, and many algorithms deal with these topics. Our journey here will be to understand how we tackle those questions and end up with a beautiful formula that is at the core of modern approaches to RL:
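
Written out in standard Q-learning notation (each symbol is introduced step by step over the rest of the post), it matches the fully expanded form derived at the end (Equation 11):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$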

Equation 1. The Q estimation at the heart of many RL algorithms, also known as the Bellman equation.

Q-learning

The vast majority, if not all, of modern RL algorithms are based on the principles of Q-learning: the idea is to evaluate a 'reward expectation' for each possible action. If we can have a good evaluation, we could maximize the reward by choosing actions with the maximum evaluated rewards. The function giving this expected reward is named Q. For now, we will assume we can have a reward for any action.
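
In standard notation, with $s_t$ the state and $a_t$ the action at step $t$, this first definition simply equates the Q value with the reward:

$$Q(s_t, a_t) = r(s_t, a_t)$$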

Equation 2. Definition of the Q function.

The t indices show that the state and action aren't constant and will vary, usually with time/actions taken. On the other hand, the Q function and the reward function r are unique functions that ideally return the 'expected reward' for any (state, action) pair.

For now, we will assume we can have a reward that gives an objective and perfect evaluation of each state/action.

Figure 1. Example of reward given for different actions at a specific state. Here a simple 2D map with a goal.

Q-Table

We know that actions' outcomes (rewards) will vary depending on the current state we are in; otherwise, the problem would be trivial to solve. If the states that are relevant to our actions can be enumerated, a simple approach is to build a table with all the possible state/action pairs. There are different ways to build such a table depending on how we can interact with our environment. Eventually, we would have a good 'map' to guide us toward the best actions.

Figure 2. Example of Q-table: we can build an exhaustive table for all the possible (state, action) pairs
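
To make the idea concrete, here is a minimal sketch (not Spice.ai's implementation; the grid size and the sample transition are made up) of a Q-table with a greedy lookup:

```python
import numpy as np

n_states = 16   # e.g., cells of a small 2D grid world (placeholder size)
n_actions = 4   # e.g., up, down, left, right

# One row per state, one column per action; each cell holds the expected reward Q(s, a).
q_table = np.zeros((n_states, n_actions))

# Filling a cell after observing a reward for taking `action` in `state`:
state, action, reward = 3, 2, 1.0
q_table[state, action] = reward

# Acting greedily: pick the action with the highest expected reward for the current state.
best_action = int(np.argmax(q_table[state]))
```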

Deep Q-Learning

When the number of environment variables relevant to our actions/rewards becomes too large, the number of possible states grows quickly. It doesn't take many parameters to make the Q-table approach unfeasible. Neural networks are known to work well and efficiently in high dimensionality (with many input variables). They also generalize well, so the idea in Deep Q-Learning is to use a neural network to predict the different Q values for each action given a state.

Figure 3. A neural network can predict Q values from state information

In this case, we do not need to give the network state/action pairs but only the state, as the neural network exhaustively returns all the Q values associated with each action. Outputting all actions' Q values is a common method, as typical cases have a complex environment but a relatively small number of possible actions.
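
As a sketch of such a network (PyTorch is used here only for illustration; the layer sizes and dimensions are arbitrary):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q value per possible action."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one output per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.layers(state)

# Example: a 5-dimensional state and 4 possible actions (placeholder sizes).
net = QNetwork(state_dim=5, n_actions=4)
q_values = net(torch.randn(1, 5))          # shape: (1, 4)
best_action = int(q_values.argmax(dim=1))  # greedy action selection
```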

This method works very well. It is similar to supervised learning, with states as inputs and rewards as labels. We assumed so far that we had a reward for each action, and we chose the next action with the best reward (called a greedy policy). In many cases this is not enough: even if an action yields the best reward at a given state, it may affect the next state so that we don't optimize the reward in the long term. Also, if we can't have a reward for each action, we usually give 0 as the reward. We will then be unable to choose the right action when actions affect later states without yielding different rewards at the current state.

The sparsity of rewards or the long-term calculation of total reward (non-greedy policies) leads us to diverge from supervised learning and learn potential future rewards.

Temporal difference: TD-Learning

TD-learning is a clever way to account for potential future values without knowing them yet. TD is a model-free class of algorithms: it does not simulate future states. The main idea is to consider all the rewards of a sequence of actions to give a better value than just the reward of the next action.

We can, for instance, sum all the future rewards:

Figure 4. Accumulating future rewards to assign values to each state.

Mathematically this can be written as:
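
Here $V(s_t)$ denotes the value assigned to state $s_t$ and $r_{t+k}$ the reward received $k$ steps later:

$$V(s_t) = r_t + r_{t+1} + r_{t+2} + \dots = \sum_{k=0}^{\infty} r_{t+k}$$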

Equation 3.

This is named TD(0): the simplest form of TD method, accumulating all the rewards.

Introducing policies

We could try different trajectories (sequences of actions) and retrospectively get the final reward for each action, but this has two drawbacks: the environment is usually too vast, and the sequence of actions might not even have a definite end. Also, such exhaustive methods might not be very efficient. Instead, we can evaluate the 'value' of the next state overall, like the maximum of all its possible rewards (direct reward), and add this value to the reward of a given action.

If a state can have different branches, we can select the best one, and this would be our policy, the way we choose actions. This simple form of taking the maximum is called the 'greedy' policy.

Figure 5. With a greedy policy, the values associated with a state come from the maximum value of the next state. Here, even though the lower branch gives only half the top reward directly, its overall value is greater.

This can be written down as:
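
One way to write it, with $\mathbb{E}[\cdot]$ denoting the expected value under the chosen policy:

$$V(s_t) = r_t + \mathbb{E}\left[V(s_{t+1})\right]$$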

Equation 4.

The expected value notation is defined as:
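
with $p_a$ the probability that the policy picks action $a$ in state $s_t$:

$$\mathbb{E}\left[V(s_{t+1})\right] = \sum_{a} p_a \, V(s_{t+1} \mid a_t = a)$$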

Equation 5.

For a greedy policy, all the probabilities p are set to 0 except the one associated with the highest return, which is set to 1 (in case of a tie between n actions, we assign each a probability of 1/n to get the same expected value).
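
So, for the greedy policy, the expectation collapses to a maximum over the possible actions:

$$\mathbb{E}\left[V(s_{t+1})\right] = \max_{a} V(s_{t+1} \mid a_t = a)$$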

Equation 6.

Relation with Q function

The expected reward can be replaced by the Q function we used earlier, which can now be denoted as specific to our chosen policy (named π):
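
Using the same structure as Equation 4, but with Q values in place of state values:

$$Q_\pi(s_t, a_t) = r_t + \mathbb{E}_\pi\left[Q_\pi(s_{t+1}, a_{t+1})\right]$$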

Equation 7.

TD-0

We previously discussed the problem of not being able to go through all the states exhaustively and that the evaluation of the Q value from a neural network could help. We want to use the TD method to have a better value estimation that will consider potential future rewards.

The TD(0) method is elegant because we can, in fact, use only the next state's expected value instead of all future ones. The idea is that with successive evaluations, we build a chain of dependencies, as each state's value depends on the next one's.
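
In symbols, each value is bootstrapped from the immediate reward and the value of the state that follows:

$$V(s_t) = r_t + V(s_{t+1})$$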

Equation 8.

Figure 6. Iterative propagation of state values following the TD(0) method.

We can see that the greedy policy would work even with null rewards in the trajectory. We can make our greedy policy explicit, going back to using the Q value instead of the state value V:
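
With a greedy choice over the next state's actions, this reads:

$$Q(s_t, a_t) = r_t + \max_{a} Q(s_{t+1}, a)$$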

Equation 9.

TD-lambda

We need to fix a problem: if a trajectory grows too long or never ends, a state value can potentially grow indefinitely. To counter that, we can add a discount factor (originally named lambda, usually referred to as gamma in Q-learning) for the next state's value:
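
Writing the discount factor as $\gamma$ (with $0 \le \gamma < 1$):

$$Q(s_t, a_t) = r_t + \gamma \max_{a} Q(s_{t+1}, a)$$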

Equation 10.

Notice that we simplify the reward notation for clarity.

To avoid exploding values, this discount has to be between 0 and 1 (strictly below 1). We can think of it as giving more importance to the direct reward than to future ones. As the contribution of later rewards decreases, the chain of actions can grow without the calculated value growing. If the reward has an upper limit, the value will also be bounded.

The sparsity of rewards is also addressed: giving only a positive reward after many non-rewarding steps will create smooth values for the intermediate states. Any reward, positive or negative, will diffuse its value to the neighboring states.

Figure 7. TD(0) value propagation can allow for a smooth value distribution over the states that helps build efficient behavior.

Q-Learning algorithm

Finally, as we train a neural network to estimate the Q function, we need to update its target with successive iterations. We cannot fully trust the estimator (a neural network here) to give the correct value, so we introduce a learning rate to update the target smoothly.
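
Writing the learning rate as $\alpha$, the target from Equation 10 is blended into the current estimate:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$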

Equation 11. Fully explained Bellman equation.

That is it! We now understand all the parts of this formula. Over multiple training steps with different states, the training should find a good average Q function. While training, the estimator uses its own output to train itself (commonly referred to as bootstrapping): it is as if it is chasing itself. Bootstrapping can lead to instability in the training process. There are many additional methods to help against such instability.
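
As a concrete illustration (a tabular toy example, not Spice.ai's engine; the sizes, hyperparameters, and transition below are made up), a single training step applies exactly that update:

```python
import numpy as np

n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.95          # learning rate and discount factor (placeholder values)
q = np.zeros((n_states, n_actions))

def q_update(state, action, reward, next_state):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * np.max(q[next_state])
    q[state, action] += alpha * (target - q[state, action])

# Example transition observed from the environment (made-up numbers).
q_update(state=3, action=2, reward=1.0, next_state=7)
```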

From the rewards we give, sparse or not, binary or fine-grained, we obtain a smooth space of values for all our states/actions, so the AI can follow a greedy policy to the best outcome.

This way of training is not a silver bullet, and there is no guarantee that the AI will find a correlation between the information given as the state and the returned reward.

Conclusion

We can see how our rewards are used to train an AI's policies using Q-learning. By understanding the many iterations required and the bootstrapping issues, we can help our AI by carefully providing relevant state information and rewards:

  • There needs to be a correlation between the state information and the reward: the simpler the relationship, the easier/faster the AI will find it.
  • Sparse and binary rewards make the training problem long and arduous. Giving more information through the reward can tremendously increase the speed/accuracy of the learned Q-estimator.
  • The longer the chain of actions, the more complex the Q-value will be to estimate.

We didn't cover here how the AI's algorithm can explore different actions in an environment. Spice.ai's technology focuses exclusively on off-policy training, where we only have past data and cannot interact with the environment. RL is a vast topic that is currently growing quickly. Robotics is a fantastic field of application; many other areas are yet to be explored with such technology. We hope to push the technology and its fields of application forward with our platform.

If you'd like to partner with us on the mission of making new applications by leveraging RL, we invite you to discuss with us on Discord, reach out on Twitter or email us.

I hope you enjoyed this post and learned new things.

Corentin

Spice.ai v0.5-alpha

· 3 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

We are excited to announce the release of Spice.ai v0.5-alpha! 🥇

Highlights include a new learning algorithm called "Soft Actor-Critic" (SAC), fixes to the behavior of spice upgrade, and a more consistent authoring experience for reward functions.

If you are new to Spice.ai, check out the getting started guide and star spiceai/spiceai on GitHub.

Highlights in v0.5-alpha

Soft Actor-Critic (Discrete) (SAC) Learning Algorithm

The addition of the Soft Actor-Critic (Discrete) (SAC) learning algorithm is a significant improvement to the power of the AI engine. It is not set as the default algorithm yet, so to start using it, pass the --learning-algorithm sacd parameter to spice train. We'd love to get your feedback on how it's working!

Consistent reward authoring experience

With the addition of reward function files, which allow you to edit your reward function in a Python file, the behavior of starting a new training session upon editing the reward function code was lost. With this release, that behavior is restored.

In addition, there is a breaking change to the variables used to access the observation state and interpretations. This change was made to better reflect the purpose of the variables and to make them easier to work with in Python:

| Previous (Type) | New (Type) |
| --- | --- |
| prev_state (SimpleNamespace) | current_state (dict) |
| prev_state.interpretations (list) | current_state_interpretations (list) |
| new_state (SimpleNamespace) | next_state (dict) |
| new_state.interpretations (list) | next_state_interpretations (list) |
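
For illustration only, here is a sketch of how the renamed variables might be used inside a reward calculation; the surrounding file structure should follow the Spice.ai reward function docs, and the temperature observation key below is a made-up example:

```python
# Sketch only: how the renamed variables are used in a reward calculation.
# The "temperature" key is a hypothetical observation field.

def compute_reward(current_state: dict, next_state: dict) -> float:
    # Reward moving the observed temperature closer to a desired setpoint.
    desired = 21.0
    before = abs(current_state["temperature"] - desired)
    after = abs(next_state["temperature"] - desired)
    return before - after  # positive if the new state is closer to the target
```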

Improved spice upgrade behavior

The Spice.ai CLI will no longer recommend "upgrading" to an older version. An issue was also fixed where trying to upgrade the Spice.ai CLI using spice upgrade on Linux would return an error.

New in this release

  • Adds a new learning algorithm called "Soft Actor-Critic" (SAC).
  • Updates the reward function parameters for the YAML code blocks from prev_state and new_state to current_state and next_state to be consistent with the reward function files.
  • Fixes an issue where editing a reward function file would not automatically trigger training.
  • Fixes the normalization of values for the Deep-Q Learning algorithm to handle larger values.
  • Fixes an issue where the Spice.ai CLI would not upgrade on Linux with the spice upgrade command.
  • Fixes an issue where the Spice.ai CLI would recommend an "upgrade" to an older version.

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!

AI needs AI-ready data

· 6 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

A significant challenge when developing an app powered by AI is providing the machine learning (ML) engine with data in a format that it can use to learn. To do that, you need to normalize the numerical data, one-hot encode categorical data, and decide what to do with incomplete data - among other things.

This data handling is often challenging! For example, to learn from Bitcoin price data, it is better to normalize the prices to a range between -1 and 1. Values too close to 0 are also a problem because of the lack of precision in floating-point representations (usually below 1e-5).

As a developer, if you are new to AI and machine learning, a great talk that explains the basics is Machine Learning Zero to Hero. Spice.ai makes the process of getting the data into an AI-ready format easy by doing it for you!

What is AI-ready data?

You write code with if statements and functions, but your machine only understands 1s and 0s. When you write code, you leverage tools, like a compiler, to translate that human-readable code into a machine-readable format.

Similarly, data for AI needs to be translated or "compiled" to be understood by the ML engine. You may have heard of tensors before; they are simply another word for a multi-dimensional array and they are the language of ML engines. All inputs to and all outputs from the engine are in tensors. You could use the following techniques when converting (or "compiling") source data to a tensor.

  1. Normalization/standardization of the numerical input data. Many of the inputs and outputs in machine learning are interpreted as probability distributions. Much of the math that powers machine learning, such as softmax, tanh, sigmoid, etc., is meant to work in the [-1, 1] range.

Figure 1. Normalizing Bitcoin price data.

  2. Conversion of categorical data into numerical data. For categorical data (i.e., colors such as "red," "blue," or "green"), you can achieve this through a technique called "one-hot encoding." In one-hot encoding, each possible value in the category appears as a column. The values in the column are assigned a binary value of 1 or 0 depending on whether the value exists or not. (Both of these first two techniques are illustrated in the code sketch after this list.)

Figure 2. A visualization of one-hot encoding.

  3. Several advanced techniques exist for "compiling" this source data; this process is known in the AI world as "feature engineering." This article goes into more detail on feature engineering techniques if you are interested in learning more.
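
To make the first two techniques concrete, here is a small sketch using NumPy and pandas; the prices and colors are made-up sample values:

```python
import numpy as np
import pandas as pd

# 1. Normalize raw prices into the [-1, 1] range (min-max scaling).
prices = np.array([42_000.0, 47_500.0, 51_200.0, 39_800.0])  # made-up BTC prices
normalized = 2 * (prices - prices.min()) / (prices.max() - prices.min()) - 1

# 2. One-hot encode a categorical column: one binary column per possible value.
colors = pd.DataFrame({"color": ["red", "blue", "green", "red"]})
one_hot = pd.get_dummies(colors, columns=["color"])

print(normalized)
print(one_hot)
```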

There are excellent tools like Pandas, NumPy, SciPy, and others that make the process of data transformation easier. However, most of these tools are Python libraries and frameworks, which means having to learn Python if you don't know it already. Plus, when building intelligent apps (instead of just doing pure data analysis), this all needs to work on real-time data in production.

Building intelligent apps

The tools mentioned above are not designed for building real-time apps. They are often designed for analytics/data science.

In your app, you will need to do this data compilation in real time, and you can't rely on a local script to help process your data. It becomes trickier if the team responsible for the initial training of the machine learning model is not the team responsible for deploying it to production.

How data is loaded and processed for a static dataset is likely very different from how it is loaded and processed in real time while your app is live. The result is often two separate codebases, maintained by different teams, that are both responsible for doing the same thing! Ensuring that those codebases stay consistent and evolve together is another challenge to tackle.

Spice.ai helps developers build apps with real-time ML

Spice.ai handles the "compilation" of data for you.

You specify the data that your ML should learn from in a Spicepod. The Spice.ai runtime handles the logistics of gathering the data and compiling it into an AI-ready format.

It does this by using many techniques described earlier, such as normalization and one-hot encoding. And because we're continuing to evolve Spice.ai, our data compilation will only get better over time.

In addition, the design of the Spice.ai runtime naturally ensures that the data used for both the training and real-time cases is consistent. Spice.ai uses the same data components and runtime logic to produce the data. You can take this a step further and share your Spicepod with someone else, and they would be able to use the same AI-ready data for their applications.

Summary

Spice.ai handles the process of compiling your data into an AI-ready format in a way that is consistent both during the training and real-time stages of the ML engine. A Spicepod defines which data to get and where to get it. Sharing this Spicepod allows someone else to use the same AI-ready data format in their application.

Learn more and contribute

Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!

Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.

If you are interested in partnering, we'd love to talk. Try out Spice.ai, email us "hey," get in touch on Discord, or reach out on Twitter.

We are just getting started! 🚀

Phillip