Spice.ai OSS blog | Spice.ai OSS

Spice.ai v0.6.1-alpha

April 21, 2022 · 2 min read

Founder and CEO of Spice AI

Announcing the release of Spice.ai v0.6.1-alpha! 🌶

Building upon the Apache Arrow support in v0.6-alpha, Spice.ai now includes new Apache Arrow data processor and Apache Arrow Flight data connector components! Together, these create a high-performance bulk-data transport directly into the Spice.ai ML engine. Coupled with big data systems from the Apache Arrow ecosystem like Hive, Drill, Spark, Snowflake, and BigQuery, it's now easier than ever to combine big data with Spice.ai.

And we're also excited to announce the release of Spice.xyz! 🎉

Spice.xyz is data and AI infrastructure for web3. It’s web3 data made easy. Insanely fast and purpose designed for applications and ML.

Spice.xyz delivers data in Apache Arrow format, over high-performance Apache Arrow Flight APIs to your application, notebook, ML pipeline, and of course through these new data components, to the Spice.ai runtime.

Read the announcement post at blog.spice.ai.

Spice.xyz

New in this release

Adds Apache Arrow Data Processor
Adds Apache Arrow Flight Data Connector

Now built with Go 1.18.

Dependency updates

Updates to React 18
Updates to CRA 5
Updates to Glide DataGrid 4
Updates to SWR 1.2
Updates to TypeScript 4.6

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!

Twitter: @spice_ai
Discord: https://discord.gg/kZnTfneP5u
Telegram: Spice AI Discussion
Reddit: https://www.reddit.com/r/spiceai
Email: hey@spice.ai

Spice.ai v0.6-alpha

February 8, 2022 · 3 min read

Phillip LeBlanc

Co-Founder and CTO of Spice AI

Announcing the release of Spice.ai v0.6-alpha! 🏹

Spice.ai now scales to datasets 10-100 larger enabling new classes of uses cases and applications! 🚀 We've completely rebuilt Spice.ai's data processing and transport upon Apache Arrow, a high-performance platform that uses an in-memory columnar format. Spice.ai joins other major projects including Apache Spark, pandas, and InfluxDB in being powered by Apache Arrow. This also paves the way for high-performance data connections to the Spice.ai runtime using Apache Arrow Flight and import/export of data using Apache Parquet. We're incredibly excited about the potential this architecture has for building intelligent applications on top of a high-performance transport between application data sources the Spice.ai AI engine.

Highlights in v0.6-alpha

Massive improvement in data loading performance and dataset scale

From data connectors, to REST API, to AI engine, we've now rebuilt Spice.ai's data processing and transport on the Apache Arrow project. Specifically, using the Apache Arrow for Go implementation. Many thanks to Matt Topol for his contributions to the project and guidance on using it.

This release includes a change to the Spice.ai runtime to AI Engine transport from sending text CSV over gGPC to Apache Arrow Records over IPC (Unix sockets).

This is a breaking change to the Data Processor interface, as it now uses arrow.Record instead of Observation.

Benchmarking v0.6

Performance Graph

Before v0.6, Spice.ai would not scale into the 100s of 1000s of rows.

Format	Row Number	Data Size	Process Time	Load Time	Transport time	Memory Usage
csv	2,000	163.15KiB	3.0005s	0.0000s	0.0100s	423.754MiB
csv	20,000	1.61MiB	2.9765s	0.0000s	0.0938s	479.644MiB
csv	200,000	16.31MiB	0.2778s	0.0000s	NA (error)	0.000MiB
csv	2,000,000	164.97MiB	0.2573s	0.0050s	NA (error)	0.000MiB
json	2,000	301.79KiB	3.0261s	0.0000s	0.0282s	422.135MiB
json	20,000	2.97MiB	2.9020s	0.0000s	0.2541s	459.138MiB
json	200,000	29.85MiB	0.2782s	0.0010s	NA (error)	0.000MiB
json	2,000,000	300.39MiB	0.3353s	0.0080s	NA (error)	0.000MiB

After building on Arrow, Spice.ai now easily scales beyond millions of rows.

Format	Row Number	Data Size	Process Time	Load Time	Transport time	Memory Usage
csv	2,000	163.14KiB	2.8281s	0.0000s	0.0194s	439.580MiB
csv	20,000	1.61MiB	2.7297s	0.0000s	0.0658s	461.836MiB
csv	200,000	16.30MiB	2.8072s	0.0020s	0.4830s	639.763MiB
csv	2,000,000	164.97MiB	2.8707s	0.0400s	4.2680s	1897.738MiB
json	2,000	301.80KiB	2.7275s	0.0000s	0.0367s	436.238MiB
json	20,000	2.97MiB	2.8284s	0.0000s	0.2334s	473.550MiB
json	200,000	29.85MiB	2.8862s	0.0100s	1.7725s	824.089MiB
json	2,000,000	300.39MiB	2.7437s	0.0920s	16.5743s	4044.118MiB

New in this release

Adds Apache Arrow data processing and transport.
Fixes TensorBoard logging and monitoring when using GitHub Codespaces and Docker.
Adds Polling HTTP Data Connector

Resources

Community

Discord: https://discord.gg/kZnTfneP5u
Reddit: https://www.reddit.com/r/spiceai
Twitter: @spice_ai
Email: hey@spice.ai

Adding Soft Actor-Critic

January 12, 2022 · 6 min read

Corentin Risselin

Software Engineer at Spice AI

Last month in the v0.5-alpha version, a new learning algorithm was added to Spice.ai: Soft Actor-Critic. This is a very popular algorithm in the Reinforcement Learning field. Let's see what it is and why this is an interesting addition.

The previous article Understanding Q-learning: How a Reward Is All You Need is not necessary but can be helpful to understand this article.

What is Soft Actor-Critic

Actor-Critic

Deepmind first introduced the actor-critic approach in deep learning in a 2016 paper. We can think of this approach as having 2 tasks:

Choosing actions to take: giving probabilities for each possible action (the policy)
Evaluating values for each action: the estimated reward from those actions (the Q-values)

Those tasks will be made by 2 different neural networks or a single network that branches out in 2 heads. The actor is the part that outputs the policy, while the critic outputs the values.

Actor-Critic Diagram

In most cases, this model was proven to perform very well, better than Deep Q-Learning. The actor is trained to prefer actions associated with the best values from the critic. The critic is trained to correctly estimate rewards (current and future ones) of the actions.

Both will improve over time though we have to keep in mind that the critic is unlikely to evaluate all possible actions in the environment as it will only see actions from states that the actor is likely to take (the policy).

This bias of the system toward its policy is important: the algorithm is meant to train on-policy. The duo actor-critic works together: trying to train it with inputs and outputs from another system (humans or even itself in past iterations of its own training) will not work.

Multiple improvements were made to limit the bias of the actor-critic approach but the necessity to train on-policy remains. This is very limiting as being able to train from any experience can be very valuable for time and data efficiency.

Soft Actor-Critic

Soft Actor-Critic allows an Actor-Critic network to train off-policy. It was introduced in a paper in 2018 and included multiple additions to improve its parent algorithm. The main difference is the introduction of the entropy of the actor outputs during the training phase.

The entropy measures the chaos/order of a system (or uncertainty). If a system always acts the same way, the entropy is minimal. Here the actor's entropy is maximum if all possible actions have the same weight (same probability) and minimum if the actor always chose only a single action with 100% confidence.

During the training phase, the actor is trained to maintain the entropy of its outputs at a specific value.

The introduction of the entropy changes the goal of the training not only to find the bests output but to keep exploring the other actions. The critic part will be trained on all actions, even if they may occur only in rare cases.

There are other essential parts, such as having 2 critics and being able to output continuous values, but the entropy is the crucial difference in this algorithm's training and potential.

Adding choices to Spice.AI learning algorithms

As we saw above, the Actor-Critic algorithm is known to outperform Deep Q-Learning in most cases. If we also want to leverage previous data (off-policy training), Soft Actor-Critic is a natural choice. This approach is heavier despite better theoretical results, making it more suitable for complex tasks. For simpler tasks, Deep Q-Learning will still be an appealing option for its speed of training and its capability to quickly convergence to a good solution.

We can think of Soft Actor-Critic as a complex machine designed to take actions while keeping a variety of possibilities. Sometimes several options seem equally rewarding: a simpler algorithm would take what it evaluates as the best one even though the margin is small and the precision of its evaluation shouldn't be enough. This tendency to quickly convergence to a solution has its benefits and inconveniences.

Implementation in the source code

Adding new algorithms is essential to Spice.ai, so the procedure was designed to be straightforward.

Looking a the source code, the code related to training agents is in the ai/src folder. This part of the code uses the python language as most modern AI libraries are distributed in this language.

In this folder, every agent is in the algorithms folder, and each has its subfolder. There is an agent_interface file that defines the main class that the different agents should inherit from and a factory script responsible for creating instances of an agent from a given algorithm name.

Adding a new agent is simple:

making a new folder in the algorithms
adding a json file describing the algorithm_id, name, and docs_link (see other json as an example) in the folder
adding a new python file with a class that would inherit from the SpiceAIAgent defined in the agent_interface script
adding a line in the factory script to instantiate the new implementation when its name is called.

For the new agent, inheriting from the main SpiceAIAgent class, 5 functions need to be implemented:

add_experience: storing inputs and outputs (used during the training)
act: returning the action to be taken from a given input
save: saving the agent to a given a path
load: restoring the agent from a given path
learn: train iteration (from the accumulated experiences)

Conclusion

Soft Actor-Critic is a fascinating algorithm that performs well in complex environments. We now support Soft Actor Critic in Spice.ai, which is another step forward in constantly improving the performance of the AI engine. Additionally, we'll continue improving existing algorithms and adding newer ones over time. We designed the platform for ease of implementation and experimentation so if you'd like to try building your own agent, you can get the source code on Github and contribute to the platform. Say hi on Discord, reach out on Twitter or email us.

I hope you enjoy this post and something new.

Corentin

What Data Informs AI-driven Decision Making?

January 4, 2022 · 8 min read

Luke Kim

Founder and CEO of Spice AI

AI unlocks a new generation of intelligent applications that learn and adapt from data. These applications use machine learning (ML) to out-perform traditionally developed software. However, the data engineering required to leverage ML is a significant challenge for many product teams. In this post, we'll explore the three classes of data you need to build next-generation applications and how Spice.ai handles runtime data engineering for you.

While ML has many different applications, one way to think about ML in a real-time application that can adapt is as a decision engine. Phillip discussed decision engines and their potential uses in A New Class of Applications That Learn and Adapt. This decision engine learns and informs the application how to operate. Of course, applications can and do make decisions without ML, but a developer normally has to code that logic. And the intelligence of that code is fixed, whereas ML enables a machine to constantly find the appropriate logic and evolve the code as it learns. For ML to do this, it needs three classes of data.

The three classes of data for informed decision making

We don't want any decision, though. We want high-quality, informed decisions. If you consider making higher quality, informed decisions over time, you need three classes of information. These classes are historical information, real-time or present information, and the results of your decisions.

Especially recently, stock or crypto trading is something many of us can relate to. To make high-quality, informed investing decisions, you first need general historical information on the price, security, financials, industry, previous trades, etc. You study this information and learn what might make a good investment or trade.

Second, you need a real-time updated stream of data as it happens to make a decision. If you were stock trading, this information might be the stock price on the day or hour you want to make the trade. You need to apply what you learned from historical data to the current information to decide what trade to place.

Finally, if we're going to make better decisions over time, we need to capture and learn from the results of those decisions. Whether you make a great or poor trade, you want to incorporate that experience into your historical learning.

Three data classes

Using all three data classes together results in higher quality decisions over time. Broad data across these classes are useful, and we could make some nice trades with that. Still, we can make an even higher quality trading decision with personal context. For example, we may want to consider the individual tax consequences or risk level of the trade for our situation. So each of these classes also comes with global or local variants. We combine global information, like what worked well for everyone, and local experience, what worked well for us and our situation, to make the best, overall informed decision.

The waterfall approach to data engineering

Consider how you would capture these three data classes and make them available to both the application and ML in the trading example. This data engineering can be a pretty big challenge.

First, you need a way to gather and consume historical information, like stock prices, and keep that updated over time. You need to handle streaming constantly updated real-time data to make runtime decisions on how to operate. You need to capture and match the decisions you make and feed that back into learning. And finally, you need a way to provide personal or local context, like holding off on sell trades until next year, to stay within a tax threshold, or identifying a pattern you like to trade. If all this wasn't enough, as we learned from Phillip's AI needs AI-ready data post, all three data classes need to be in a format that ML can use.

Traditional app and data integration.

If you can afford a data or ML team, they may do much of this for you. However, this model starts to look quite waterfall-like and is not suited well to applications that want to learn and adapt in real-time. Like a waterfall approach, you would provide requirements to your data team, and they would do the data engineering required to provide you with the first two classes of data, historical and real-time. They may give you ML-ready data or train an ML model for you. However, there is often a large latency to apply that data or model in your application and a long turn-around time if it does not meet your requirements. In addition, to capture the third class of data, you would need to capture and send the results of the decisions your application made as a result of using those models back to the data team to incorporate in future learning. This latency through the data, decision-making, learning, and adaptation process is often infeasible for a real-world app.

And, if you can't afford a data team, you have to figure out how to do all that yourself.

The agile approach

Modern software engineering practices have favored agile methodologies to reduce time to learn and adapt applications to customer and business needs. Spice.ai takes inspiration from agile methods to provide developers with a fast, iterative development cycle.

Spice.ai provides mechanisms for making all three classes of data available to both the application and the decision engine. Developers author Spicepods declaring how data should be captured, consumed, and made ML-ready so that all three classes are consistent and ML available.

The Spice.ai runtime exposes developer-friendly APIs and data connectors for capturing and consuming data and annotating that data with personal context. The runtime generates AI-ready data for you and makes it available directly for ML. These APIs also make it easy to capture application decisions and incorporate the resulting learning.

The Spice.ai approach short circuits the traditional waterfall-like data process by keeping as much data as possible application local instead of round-tripping through an external pipeline or team, especially valuable for real-time data. The application can learn and adapt faster by reducing the latency of decision consequences to learning.

Spice.ai enables personalized learning from personal context and experiences through the interpretations mechanism. Interpretations allow an application to provide additional information or an "interpretation" of a time range as input to learning. The trading example could be as simple as labeling a time range as a good time to buy or providing additional contextual information such as tax considerations, etc. Developers can also use interpretations to record the results of decisions with more context than what might be available in the observation space. You can read more about Interpretations in the Spice.ai docs.

While Spice.ai focuses on ensuring consistent ML-ready data is available, it does not replace traditional data systems or teams. They still have their place, especially for large historical datasets, and Spice.ai can consume data produced by them. Where possible, especially for application and real-time data, Spice.ai keeps runtime data local to create a virtuous cycle of data from the application to the decision engine and back again, enabling faster and more agile learning and adaption.

App with Spice.ai.

Summary

In summary, to build an intelligent application driven from AI recommended decisions, a significant amount of data engineering can be required to learn, make decisions, and incorporate the results. The Spice.ai runtime enables you as a developer to focus on consuming those decisions and tuning how the AI engine should learn rather than the runtime data engineering.

The potential of the next generation of intelligent applications to improve the quality of our lives is very exciting. Using AI to help applications make better decisions, whether that be AI-assisted investing, improving the energy efficiency of our homes and buildings, or supporting us in deciding on the most appropriate medical treatment, is very promising.

Learn more and contribute

Even for advanced developers, building intelligent apps that leverage AI is still way too hard. Our mission is to make this as easy as creating a modern web page. If that vision resonates with you, join us!

If you want to get involved, we'd love to talk. Try out Spice.ai, email us "hey," get in touch on Discord, or reach out on Twitter.

Luke

A New Class of Applications That Learn and Adapt

December 30, 2021 · 5 min read

Phillip LeBlanc

Co-Founder and CTO of Spice AI

A new class of applications that learn and adapt is becoming possible through machine learning (ML). These applications learn from data and make decisions to achieve the application's goals. In the post Making apps that learn and adapt, Luke described how developers integrate this ability to learn and adapt as a core part of the application's logic. You can think of the component that does this as a "decision engine." This post will explore a brief history of decision engines and use-cases for this application class.

History of decision engines

The idea to make intelligent decision-making applications is not new. Developers first created these applications around the 1970s¹, and they are some of the earliest examples of using artificial intelligence to solve real-world problems.

The first applications used a class of decision engines called "expert systems." A distinguishing trait of expert systems is that they encode human expertise in rules for decision-making. Domain experts created combinations of rules that powered decision-making capabilities.

Some uses of expert systems include:

However, the resources required to build expert systems make employing them infeasible for many applications². They often need a significant time and resource investment to capture and encode expertise into complex rule sets. These systems also do not automatically learn from experience, relying on experts to write more rules to improve decision-making.

With the advent of modern deep-learning techniques and the ability to access significantly more data, it is now possible for the computer, not only the developer, to learn and encode the rules to power a decision engine and improve them over time. The vision for Spice.ai is to make it easy for developers to build this new class of applications. So what are some use-cases for these applications?

Use cases of decision-making applications

Reduce energy costs by optimizing air conditioning

Today: The air conditioning system for an office building runs on a fixed schedule and is set to a fixed temperature in business hours, only adjusting using in-room sensor data, if at all. This behavior potentially over cools at business close as the outside temperature lowers and the building starts vacating.

With Spice.ai: Using Spice.ai, the application combines time-series data from multiple data sources, including the time of day and day of the week, building/room occupancy, and outside temperature, energy consumption, and pricing. The A/C controller application learns how to adjust the air conditioning system as the room naturally cools towards the end of the day. As the occupancy decreases, the decision engine is rewarded for maintaining the desired temperature and minimizing energy consumption/cost.

Food delivery order dispatching

Today: Customers order food delivery with a mobile app. When the order is ready to be picked up from the restaurant, the order is dispatched to a delivery driver by a simple heuristic that chooses the nearest available driver. As the app gets more popular with customers and the number of restaurants, drivers, and customers increases, the heuristic needs to be constantly tuned or supplemented with human operators to handle the demand.

With Spice.ai: The application learns which driver to dispatch to minimize delivery time and maximize customer star ratings. It considers several factors from data, including patterns in both the restaurant and driver's order histories. As the number of users, drivers, and customers increases over time, the app adapts to keep up with the changing patterns and demands of the business.

Routing stock or crypto trades to the best exchange

Today: When trading stocks through a broker like Fidelity or TD Ameritrade, your broker will likely route your order to an exchange like the NYSE. And in the emerging world of crypto, you can place your trade or swap directly on a decentralized exchange (DEX) like Uniswap or Pancake Swap. In both cases, the routing of orders is likely to be either a form of traditional expert system based upon rules or even manually routed.

With Spice.ai: A smart order routing application learns from data such as pending transactions, time of day, day of the week, transaction size, and the recent history of transactions. It finds patterns to determine the most optimal route or exchange to execute the transaction and get you the best trade.

Summary

A new class of applications that can learn and adapt are made possible by integrating AI-powered decision engines. Spice.ai is a decision engine that makes it easy for developers to build these applications.

If you'd like to partner with us in creating this new generation of intelligent decision-making applications, we invite you to join us on Discord, reach out on Twitter or email us.

Phillip

Russell, Stuart; Norvig, Peter (1995). Artificial Intelligence: A Modern Approach. Simon & Schuster. pp. 22–23. ISBN 978-0-13-103805-9. ↩
Kendal, S. L., & Creen, M. (2007). An introduction to knowledge engineering. London: Springer. ISBN 978-1-84628-475-5 ↩

New in this release​

Dependency updates​

Resources​

Community​

Highlights in v0.6-alpha​

Massive improvement in data loading performance and dataset scale​

Benchmarking v0.6​

New in this release​

Resources​

Community​

What is Soft Actor-Critic​

Actor-Critic​

Soft Actor-Critic​

Adding choices to Spice.AI learning algorithms​

Implementation in the source code​

Conclusion​

The three classes of data for informed decision making​

The waterfall approach to data engineering​

The agile approach​

Summary​

Learn more and contribute​

History of decision engines​

Use cases of decision-making applications​

Reduce energy costs by optimizing air conditioning​

Food delivery order dispatching​

Routing stock or crypto trades to the best exchange​

Summary​

Footnotes​

New in this release

Dependency updates

Resources

Community

Highlights in v0.6-alpha

Massive improvement in data loading performance and dataset scale

Benchmarking v0.6

New in this release

Resources

Community

What is Soft Actor-Critic

Actor-Critic

Soft Actor-Critic

Adding choices to Spice.AI learning algorithms

Implementation in the source code

Conclusion

The three classes of data for informed decision making

The waterfall approach to data engineering

The agile approach

Summary

Learn more and contribute

History of decision engines

Use cases of decision-making applications

Reduce energy costs by optimizing air conditioning

Food delivery order dispatching

Routing stock or crypto trades to the best exchange

Summary

Footnotes