Luke Kim

Founder and CEO of Spice AI

Spice.ai v0.6.1-alpha

April 21, 2022 · 2 min read

Luke Kim

Founder and CEO of Spice AI

Announcing the release of Spice.ai v0.6.1-alpha! 🌶

Building upon the Apache Arrow support in v0.6-alpha, Spice.ai now includes new Apache Arrow data processor and Apache Arrow Flight data connector components! Together, these create a high-performance bulk-data transport directly into the Spice.ai ML engine. Coupled with big data systems from the Apache Arrow ecosystem like Hive, Drill, Spark, Snowflake, and BigQuery, it's now easier than ever to combine big data with Spice.ai.

And we're also excited to announce the release of Spice.xyz! 🎉

Spice.xyz is data and AI infrastructure for web3. It's web3 data made easy: insanely fast and purpose-designed for applications and ML.

Spice.xyz delivers data in Apache Arrow format over high-performance Apache Arrow Flight APIs to your application, notebook, or ML pipeline and, through these new data components, to the Spice.ai runtime.

Read the announcement post at blog.spice.ai.

New in this release

Adds Apache Arrow Data Processor
Adds Apache Arrow Flight Data Connector

Now built with Go 1.18.

Dependency updates

Updates to React 18
Updates to CRA 5
Updates to Glide DataGrid 4
Updates to SWR 1.2
Updates to TypeScript 4.6

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!

Twitter: @spice_ai
Discord: https://discord.gg/kZnTfneP5u
Telegram: Spice AI Discussion
Reddit: https://www.reddit.com/r/spiceai
Email: [email protected]

What Data Informs AI-driven Decision Making?

January 4, 2022 · 8 min read

Luke Kim

Founder and CEO of Spice AI

AI unlocks a new generation of intelligent applications that learn and adapt from data. These applications use machine learning (ML) to out-perform traditionally developed software. However, the data engineering required to leverage ML is a significant challenge for many product teams. In this post, we'll explore the three classes of data you need to build next-generation applications and how Spice.ai handles runtime data engineering for you.

While ML has many different applications, one way to think about ML in a real-time application that can adapt is as a decision engine. Phillip discussed decision engines and their potential uses in A New Class of Applications That Learn and Adapt. This decision engine learns and informs the application how to operate. Of course, applications can and do make decisions without ML, but a developer normally has to code that logic. And the intelligence of that code is fixed, whereas ML enables a machine to constantly find the appropriate logic and evolve the code as it learns. For ML to do this, it needs three classes of data.

The three classes of data for informed decision making

We don't want just any decision, though. We want high-quality, informed decisions. To make higher-quality, informed decisions over time, you need three classes of information: historical information, real-time or present information, and the results of your decisions.

Especially recently, stock or crypto trading is something many of us can relate to. To make high-quality, informed investing decisions, you first need general historical information on the price, security, financials, industry, previous trades, etc. You study this information and learn what might make a good investment or trade.

Second, you need a real-time updated stream of data as it happens to make a decision. If you were stock trading, this information might be the stock price on the day or hour you want to make the trade. You need to apply what you learned from historical data to the current information to decide what trade to place.

Finally, if we're going to make better decisions over time, we need to capture and learn from the results of those decisions. Whether you make a great or poor trade, you want to incorporate that experience into your historical learning.

Three data classes

Using all three data classes together results in higher quality decisions over time. Broad data across these classes is useful, and we could make some nice trades with that. Still, we can make an even higher quality trading decision with personal context. For example, we may want to consider the individual tax consequences or risk level of the trade for our situation. So each of these classes also comes with global or local variants. We combine global information, like what worked well for everyone, and local experience, what worked well for us and our situation, to make the best overall informed decision.
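As a rough illustration of how the three classes fit together, here is a minimal Python sketch. It is not Spice.ai code: the SimpleDecisionEngine class, its method names, and the buy-below-average rule are all hypothetical and exist only to make the loop concrete.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class SimpleDecisionEngine:
    """Hypothetical toy engine; it only illustrates the three data classes."""

    # Class 1: historical information (past prices).
    historical_prices: list[float] = field(default_factory=list)
    # Class 3: results of past decisions, stored as (action, profit) pairs.
    results: list[tuple[str, float]] = field(default_factory=list)

    def decide(self, current_price: float) -> str:
        """Class 2: apply what history taught us to a real-time price."""
        if not self.historical_prices:
            return "hold"
        average = mean(self.historical_prices)
        return "buy" if current_price < average else "hold"

    def record_result(self, action: str, profit: float, price: float) -> None:
        """Feed the outcome back so later decisions can draw on it."""
        self.results.append((action, profit))
        self.historical_prices.append(price)


engine = SimpleDecisionEngine(historical_prices=[100.0, 110.0, 120.0])
decision = engine.decide(current_price=105.0)  # 105 is below the average of 110
engine.record_result(decision, profit=4.0, price=105.0)
print(decision, engine.results)
```

A real decision engine would learn far richer logic than a single average, but the data flow (history in, real-time input applied, results fed back) has the same shape.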

The waterfall approach to data engineering

Consider how you would capture these three data classes and make them available to both the application and ML in the trading example. This data engineering can be a pretty big challenge.

First, you need a way to gather and consume historical information, like stock prices, and keep that updated over time. You need to handle streaming constantly updated real-time data to make runtime decisions on how to operate. You need to capture and match the decisions you make and feed that back into learning. And finally, you need a way to provide personal or local context, like holding off on sell trades until next year, to stay within a tax threshold, or identifying a pattern you like to trade. If all this wasn't enough, as we learned from Phillip's AI needs AI-ready data post, all three data classes need to be in a format that ML can use.

Traditional app and data integration.

If you can afford a data or ML team, they may do much of this for you. However, this model starts to look quite waterfall-like and is not well suited to applications that want to learn and adapt in real-time. As in a waterfall approach, you would provide requirements to your data team, and they would do the data engineering required to provide you with the first two classes of data, historical and real-time. They may give you ML-ready data or train an ML model for you. However, there is often a large latency to apply that data or model in your application and a long turn-around time if it does not meet your requirements. In addition, to capture the third class of data, you would need to capture the results of the decisions your application made using those models and send them back to the data team to incorporate in future learning. This latency through the data, decision-making, learning, and adaptation process is often infeasible for a real-world app.

And, if you can't afford a data team, you have to figure out how to do all that yourself.

The agile approach

Modern software engineering practices have favored agile methodologies to reduce time to learn and adapt applications to customer and business needs. Spice.ai takes inspiration from agile methods to provide developers with a fast, iterative development cycle.

Spice.ai provides mechanisms for making all three classes of data available to both the application and the decision engine. Developers author Spicepods declaring how data should be captured, consumed, and made ML-ready so that all three classes are consistent and available to ML.

The Spice.ai runtime exposes developer-friendly APIs and data connectors for capturing and consuming data and annotating that data with personal context. The runtime generates AI-ready data for you and makes it available directly for ML. These APIs also make it easy to capture application decisions and incorporate the resulting learning.

The Spice.ai approach short-circuits the traditional waterfall-like data process by keeping as much data as possible local to the application instead of round-tripping it through an external pipeline or team, which is especially valuable for real-time data. By reducing the latency between decision consequences and learning, the application can learn and adapt faster.

Spice.ai enables personalized learning from personal context and experiences through the interpretations mechanism. Interpretations allow an application to provide additional information or an "interpretation" of a time range as input to learning. The trading example could be as simple as labeling a time range as a good time to buy or providing additional contextual information such as tax considerations, etc. Developers can also use interpretations to record the results of decisions with more context than what might be available in the observation space. You can read more about Interpretations in the Spice.ai docs.
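To make the idea of labeling a time range concrete, here is a small Python sketch. The Interpretation class, its field names, and the tag value are assumptions for illustration and are not taken from the Spice.ai API; the Spice.ai docs describe the actual format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Interpretation:
    """Hypothetical shape: a named time range with optional tags."""

    start: int  # Unix time in seconds
    end: int    # Unix time in seconds
    name: str
    tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("interpretation end must not precede start")

    def covers(self, unix_time: int) -> bool:
        """Return True if the given time falls inside the labeled range."""
        return self.start <= unix_time <= self.end


def to_unix(year: int, month: int, day: int) -> int:
    """Midnight UTC on the given date, as Unix seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Label one trading week as a good time to buy, with extra personal context.
good_time_to_buy = Interpretation(
    start=to_unix(2022, 1, 3),
    end=to_unix(2022, 1, 7),
    name="good time to buy",
    tags=["tax_year_2022"],  # hypothetical tag
)
print(good_time_to_buy.covers(to_unix(2022, 1, 5)))
```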

While Spice.ai focuses on ensuring consistent ML-ready data is available, it does not replace traditional data systems or teams. They still have their place, especially for large historical datasets, and Spice.ai can consume data produced by them. Where possible, especially for application and real-time data, Spice.ai keeps runtime data local to create a virtuous cycle of data from the application to the decision engine and back again, enabling faster and more agile learning and adaption.

App with Spice.ai.

Summary

In summary, building an intelligent application driven by AI-recommended decisions can require a significant amount of data engineering to learn, make decisions, and incorporate the results. The Spice.ai runtime enables you as a developer to focus on consuming those decisions and tuning how the AI engine should learn rather than on the runtime data engineering.

The potential of the next generation of intelligent applications to improve the quality of our lives is very exciting. Using AI to help applications make better decisions, whether that be AI-assisted investing, improving the energy efficiency of our homes and buildings, or supporting us in deciding on the most appropriate medical treatment, is very promising.

Learn more and contribute

Even for advanced developers, building intelligent apps that leverage AI is still way too hard. Our mission is to make this as easy as creating a modern web page. If that vision resonates with you, join us!

If you want to get involved, we'd love to talk. Try out Spice.ai, email us "hey," get in touch on Discord, or reach out on Twitter.

Luke

Spicepods: From Zero To Hero

December 2, 2021 · 9 min read

Luke Kim

Founder and CEO of Spice AI

In my previous post, Teaching Apps how to Learn with Spicepods, I introduced Spicepods as packages of configuration that describe an application's data-driven goals and how it should learn from data. To leverage Spice.ai in your application, you can author a Spicepod from scratch or build upon one fetched from the spicerack.org registry. In this post, we'll walk through the creation and authoring of a Spicepod step-by-step from scratch.

As a refresher, a Spicepod consists of:

A required YAML manifest that describes how the pod should learn from data
Optional seed data
Learned model/state
Performance telemetry and metrics

We'll create the Spicepod for the ServerOps Quickstart, an application that learns when to optimally run server maintenance operations based upon the CPU-usage patterns of a server machine.

We'll also use the Spice CLI, which you can install by following the Getting Started guide or Getting Started YouTube video.

Fast iterations

Modern web development workflows often include a file watcher to hot-reload so you can iteratively see the effect of your change with a live preview.

Spice.ai takes inspiration from this and enables a similar Spicepod manifest authoring experience. If you first start the Spice.ai runtime in your application root before creating your Spicepod, it will watch for changes and apply them continuously so that you can develop in a fast, iterative workflow.

You would normally do this by opening two terminal windows side-by-side, one that runs the runtime using the command spice run and one where you enter CLI commands. In addition, developers would open the Spice.ai dashboard located at http://localhost:8000 to preview changes they make.

Figure 1. Spice.ai's modern development workflow

Creating a Spicepod

The easiest way to create a Spicepod is to use the Spice.ai CLI command: spice init <Spicepod name>. We'll make one in the ServerOps Quickstart application called serverops.

Figure 2. Creating a Spicepod.

The CLI saves the Spicepod manifest file in the spicepods directory of your application. You can see it created a new serverops.yaml file, which should be included in your application and be committed to your source repository. Let's take a look at it.

Figure 3. Spicepod manifest.

The initialized manifest file is very simple. It contains a name and three main sections:

dataspaces
actions
training

We'll walk through each of these in detail, and as a Spicepod author, you can always reference the documentation for the Spicepod manifest syntax.

Authoring a Spicepod manifest

You author and edit Spicepod manifest files in your favorite text editor with a combination of Spice.ai CLI helper commands. We eventually plan to have a VS Code extension and dashboard/portal editing abilities to make this even easier.

Adding a dataspace

To build an intelligent, data-driven application, we must first start with data.

A Spice.ai dataspace is a logical grouping of data with definitions of how that data should be loaded and processed, usually from a single source. A combination of its data source and its name identifies it, for example, nasdaq/msft or twitter/tweets. Read more about Dataspaces in the Core Concepts documentation.

Let's add a dataspace to the Spicepod manifest to load CPU metric data from a CSV file. This file is a snapshot of data from InfluxDB, a time-series database we like.

Figure 4. Adding a dataspace.

We can see this dataspace is identified by its source hostmetrics and name cpu. It includes a data section with a file data connector, the path to the file, and a data processor that knows how to process it. In addition, it defines a single measurement usage_idle under the measurements section, which is a measurement of CPU load. In Spice.ai, measurements are the core primitive the AI engine uses to learn and are always numerical data. Spice.ai includes a growing library of community-contributable data connectors and data processors you can include in your Spicepod to access data. You can also contribute your own.
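The manifest screenshot is not reproduced here, so the following Python sketch mirrors the structure described above as a plain dictionary. The real manifest is YAML, and every key name and the file path below are assumptions. Only hostmetrics, cpu, usage_idle, the file connector, and the CSV processor come from the text. The helper shows how the source, name, and measurement combine into the hostmetrics_cpu_usage_idle field that the reward code later in this post reads.

```python
# Hypothetical Python mirror of the dataspace described above.
# The real manifest is YAML; key names and the path are assumptions.
dataspace = {
    "from": "hostmetrics",
    "name": "cpu",
    "data": {
        "connector": {"name": "file", "params": {"path": "data/cpu_metrics.csv"}},
        "processor": {"name": "csv"},
    },
    "measurements": [{"name": "usage_idle"}],
}


def state_field_names(space: dict) -> list[str]:
    """Join source, name, and each measurement with underscores."""
    prefix = f"{space['from']}_{space['name']}"
    return [f"{prefix}_{m['name']}" for m in space["measurements"]]


print(state_field_names(dataspace))
```

For the authoritative key names, refer to the Spicepod manifest documentation.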

Finally, because the data is a snapshot of live data loaded from a file, we must set a Spicepod epoch_time that defines the data's start Unix time.
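If you know the snapshot's start as a calendar timestamp, a short Python helper can convert it to Unix time. This is a generic sketch: the function name and the example timestamp are illustrative, and the actual start time of the quickstart data is not stated in this post.

```python
from datetime import datetime, timezone


def to_epoch_time(iso_timestamp: str) -> int:
    """Convert an ISO 8601 timestamp to Unix seconds (naive input is treated as UTC)."""
    parsed = datetime.fromisoformat(iso_timestamp)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


print(to_epoch_time("2021-01-01T00:00:00"))  # 1609459200
```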

Now we have a dataspace, called hostmetrics/cpu, that loads CSV data from a file and processes the data into a usage_idle measurement. The file connector might be swapped out with the InfluxDB connector in a production application to stream real-time CPU metrics into Spice.ai. And in addition, applications can always send real-time data to the Spice.ai runtime through its API with a simple HTTP POST (and in the future, using Web Sockets and gRPC).
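As a sketch of what sending real-time data over HTTP POST might look like, the Python below builds (but does not send) a request. The /observations path, the CSV column names, and the content type are assumptions modeled on the recommendation URL that appears later in this post; check the API documentation for the actual endpoint and payload format.

```python
import urllib.request

# Base URL taken from the recommendation endpoint shown later in the post.
BASE_URL = "http://localhost:8000/api/v0.1/pods/serverops"


def build_observation_request(rows: list[tuple[int, float]]) -> urllib.request.Request:
    """Build a POST request carrying (unix_time, usage_idle) rows as CSV."""
    lines = ["time,usage_idle"]  # assumed column names
    lines += [f"{unix_time},{usage_idle}" for unix_time, usage_idle in rows]
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/observations",  # assumed path
        data=body,
        method="POST",
        headers={"Content-Type": "text/csv"},
    )


request = build_observation_request([(1609459200, 97.5), (1609459210, 42.0)])
print(request.get_method(), request.full_url)
# With a runtime listening locally, send it with: urllib.request.urlopen(request)
```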

Adding actions

Now that the Spicepod has data, let's define some data-driven actions so the ServerOps application can learn when is the best time to take them. We'll add three actions using the CLI helper command, spice action add.

Figure 5. Adding actions.

And in the manifest:

Figure 6. Actions added to the manifest

Adding rewards

The Spicepod now has data and possible actions, so we can now define how it should learn when to take them. Similar to how humans learn, we can set rewards or punishments for actions taken based on their effect and the data. Let's add scaffold rewards for all actions using the spice rewards add command.

Figure 7. Adding rewards

We now have rewards set for each action. The rewards are uniform (all the same), meaning the Spicepod is rewarded the same for each action. Higher rewards are better, so if we change perform_maintenance to 2, the Spicepod will learn to perform maintenance more often than the other actions. Of course, instead of setting these arbitrarily, we want to learn from data, and we can do that by referencing the state of data at each time-step in the time-series data as the AI engine trains.

Figure 8. Rewards added to the manifest

The rewards themselves are just code. We currently support Python code, either inline or in an external .py code file, and we plan to support several other languages. The reward code can access the time-step state through the prev_state and new_state variables and the dataspace name. For the full documentation, see Rewards.

Let's add this reward code to perform_maintenance, which will reward performing maintenance when there is low CPU usage.

cpu_usage_prev = 100 - prev_state.hostmetrics_cpu_usage_idle
cpu_usage_new = 100 - new_state.hostmetrics_cpu_usage_idle
cpu_usage_delta = cpu_usage_prev - cpu_usage_new
reward = cpu_usage_delta / 100

This code takes the change in CPU usage (100 minus the idle percentage) between the previous time state and the current time state, and sets the reward to that delta normalized to a value between -1 and 1. When CPU usage moves from a higher cpu_usage_prev to a lower cpu_usage_new, it's a better time to run server maintenance, so the reward is positive, e.g., 80% - 50% = 30% = 0.3. However, if CPU usage moves from lower to higher, 50% - 80% = -30% = -0.3, it's a bad time to run maintenance, so we provide a negative reward or "punish" the action.
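To check the arithmetic outside the runtime, the same logic can be wrapped in a plain Python function. This is only a test harness sketch: the maintenance_reward name and the SimpleNamespace stand-ins for prev_state and new_state are assumptions, since the runtime supplies the real state objects.

```python
import math
from types import SimpleNamespace


def maintenance_reward(prev_state, new_state) -> float:
    """Same calculation as the reward code above, wrapped for local testing."""
    cpu_usage_prev = 100 - prev_state.hostmetrics_cpu_usage_idle
    cpu_usage_new = 100 - new_state.hostmetrics_cpu_usage_idle
    cpu_usage_delta = cpu_usage_prev - cpu_usage_new
    return cpu_usage_delta / 100


def state(usage_idle: float) -> SimpleNamespace:
    """Hypothetical stand-in for the state object the runtime provides."""
    return SimpleNamespace(hostmetrics_cpu_usage_idle=usage_idle)


# CPU usage falls from 80% to 50%: a good time for maintenance.
falling = maintenance_reward(state(20), state(50))
# CPU usage rises from 50% to 80%: a bad time for maintenance.
rising = maintenance_reward(state(50), state(20))

assert math.isclose(falling, 0.3)
assert math.isclose(rising, -0.3)
```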

Figure 9. Reward code

Through these rewards and punishments and the CPU metric data, the Spicepod will learn when it is a good time to perform maintenance and be the decision engine for the ServerOps application. You might be thinking you could write code without AI to do this, which is true, but handling the variety of cases, like CPU spikes, or patterns in the data, like cyclical server load, would take a lot of code and development time. Applying AI helps you build faster.

Putting it all together

The manifest now has defined data, actions, and rewards. The Spicepod can get data to learn which actions to take and when based on the rewards provided.

If the Spice.ai runtime is running, the Spicepod automatically trains each time the manifest file is saved. As this happens, reward performance can be monitored in the dashboard.

Once a training run completes, the application can query the Spicepod for a decision recommendation by calling the recommendations API http://localhost:8000/api/v0.1/pods/serverops/recommendation. The API returns a JSON document that provides the recommended action, the confidence of taking that action, and when that recommendation is valid.
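The sketch below shows how an application might act on such a JSON document in Python. The field names (action, confidence, start, end), the sample values, and the confidence threshold are assumptions for illustration; the post describes what the response contains but not its exact schema.

```python
import json

# Hypothetical response body; field names and values are assumptions.
SAMPLE_RESPONSE = """
{"action": "perform_maintenance", "confidence": 0.92,
 "start": 1609459200, "end": 1609459260}
"""


def should_run_maintenance(response_text: str, now: int, min_confidence: float = 0.8) -> bool:
    """Run maintenance only if it is recommended, confident enough, and still valid."""
    recommendation = json.loads(response_text)
    is_valid_now = recommendation["start"] <= now <= recommendation["end"]
    return (
        recommendation["action"] == "perform_maintenance"
        and recommendation["confidence"] >= min_confidence
        and is_valid_now
    )


print(should_run_maintenance(SAMPLE_RESPONSE, now=1609459230))
```

In a real application, the response text would come from an HTTP GET to the recommendation URL above rather than from a hard-coded string.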

In the ServerOps Quickstart, this API is called from the server maintenance PowerShell script to make an intelligent decision on when to run maintenance. The ServerOps Sample, which uses live data, can be continuously trained to learn and adapt even as the live data changes due to load patterns changing.

The full Spicepod manifest from this walkthrough can be added from spicerack.org using the spice add quickstarts/serverops command.

Summary

Leveraging Spice.ai as the decision engine for your server maintenance application helps you build smarter applications faster, and those applications will continue to learn and adapt even as usage patterns change over time.

Learn more and contribute

Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!

Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.

If you are interested in partnering, we'd love to talk. Try out Spice.ai, email us "hey," get in touch on Discord, or reach out on Twitter.

We are just getting started! 🚀

Luke

Spice.ai v0.4.1-alpha

November 22, 2021 · 3 min read

Luke Kim

Founder and CEO of Spice AI

Announcing the release of Spice.ai v0.4.1-alpha! ✅

This point release focuses on fixes and improvements to v0.4-alpha. Highlights include AI engine performance improvements, updates to the dashboard observations data grid, notification of new CLI versions, and several bug fixes.

A special acknowledgment to @Adm28, who added the CLI upgrade detection and prompt, which notifies users of new CLI versions and prompts to upgrade.

CLI upgrade prompt

Highlights in v0.4.1-alpha

AI engine performance improvements

Overall training performance has been improved by up to 13% by removing a lock in the AI engine.

In versions before v0.4.1-alpha, performance was especially impacted when streaming new data during a training run.

Dashboard Observations Datagrid

The dashboard observations datagrid now automatically resizes to the window width, and headers are easier to read, with automatic grouping into dataspaces. Column widths are also resizable.

CLI version detection and upgrade prompt

When it is run, the Spice.ai CLI will now automatically check for new CLI versions at most once a day.

If it detects a new version, it will print a notification to the console on spice version, spice run or spice add commands prompting the user to upgrade using the new spice upgrade command.
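The once-a-day behavior can be sketched in a few lines of Python. This is an illustration of the logic only, not the CLI's actual implementation; the function names and the version-parsing rule are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

CHECK_INTERVAL = timedelta(days=1)


def should_check_for_update(last_checked: Optional[datetime], now: datetime) -> bool:
    """Check if we have never checked, or if at least a day has passed."""
    if last_checked is None:
        return True
    return now - last_checked >= CHECK_INTERVAL


def is_newer(latest: str, current: str) -> bool:
    """Compare versions such as 'v0.4.1-alpha' by their numeric parts."""
    def parse(version: str) -> tuple:
        core = version.lstrip("v").split("-")[0]
        return tuple(int(part) for part in core.split("."))
    return parse(latest) > parse(current)


now = datetime(2021, 11, 22, 12, 0)
if should_check_for_update(now - timedelta(hours=25), now) and is_newer("v0.4.1-alpha", "v0.4-alpha"):
    print("A new CLI version is available. Run 'spice upgrade' to update.")
```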

New in this release

Adds automatic resizing of the observations datagrid.
Adds header group by dataspace to the observations datagrid.
Adds CLI version detection and prompt for upgrade on version, run, and add commands.
Adds support for parsing hex-encoded times and measurements. Use the time_format of hex or prefix with 0x.
Updates AI engine with improved training performance.
Updates Go and NPM dependencies.
Fixes detection of Spicepods in the Spicepods directory, and a resulting error when loading a non-Spicepod file.
Fixes a potential "zip slip" security issue.
Fixes an issue where the AI engine may not shut down gracefully.
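As an illustration of the hex-encoded time support listed above, the Python sketch below treats a value as hexadecimal when the time format is hex or the value carries a 0x prefix, and as decimal Unix seconds otherwise. The parse_time function is hypothetical and is not the runtime's parser.

```python
from typing import Optional


def parse_time(value: str, time_format: Optional[str] = None) -> int:
    """Parse a Unix time given as decimal text or as hex-encoded text."""
    text = value.strip()
    if time_format == "hex" or text.lower().startswith("0x"):
        return int(text, 16)  # base 16 accepts input with or without a 0x prefix
    return int(text)


print(parse_time("0x5fee6600"))                 # 1609459200
print(parse_time("5fee6600", time_format="hex"))  # 1609459200
print(parse_time("1609459200"))                 # 1609459200
```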

Resources

Community

Discord: https://discord.gg/kZnTfneP5u
Reddit: https://www.reddit.com/r/spiceai
Twitter: @spice_ai
Email: [email protected]

Teaching Apps how to Learn with Spicepods

November 15, 2021 · 6 min read

Luke Kim

Founder and CEO of Spice AI

The last post in this series, Making Apps that Learn and Adapt, described the shift from building AI/ML solutions to building apps that learn and adapt. But, how does the app learn? And as a developer, how do I teach it what it should learn?

With Spice.ai, we teach the app how to learn using a Spicepod.

Imagine you own a restaurant. You created a menu, hired staff, constructed the kitchen and dining room, and got off to a great start when it first opened. However, over the years, your customers' tastes changed, you've had to make compromises on ingredients, and there's a hot new place down the street... business is stagnating, and you know that you need to make some changes to stay competitive.

You have a few options. First, you could gather all the data, such as customer surveys, seasonal produce metrics, and staff performance profiles. You may even hire outside consultants. You then take this data to your office, and after spending some time organizing, filtering, and collating it, you've discovered an insight! Your seafood dishes sell poorly and cost the most... you are losing money! You spend several weeks or months perfecting a new menu, which you roll out with much fanfare! And then… business is still poor. What!? How could this be? It was a data-driven approach! You start the process again. While this approach is a worthy option, it has long latency from data to learning to implementation.

Another option is to build real-time learning and adaption directly into the restaurant. Imagine a staff member whose sole job was learning and adapting how the restaurant should operate; let's name them Blue. You write a guide for Blue that defines certain goal metrics, like customer food ratings, staff happiness, and of course, profit. Blue tracks each dish served, from start to finish, from who prepared it to its temperature, its costs, and its final customer taste rating. Blue not only learns from each customer review as each dish is consumed but also how dish preparation affects other goal metrics, like profitability. The restaurant staff consults Blue to determine any adjustments to improve goal metrics as they work. The latency from data to learning to adaption has been reduced from weeks or months to minutes. This option, of course, is not feasible for most restaurants, but software applications can use this approach. Blue and their instructions are analogous to the Spice.ai runtime and manifest.

In the Spice.ai model, developers teach the app how to learn by describing goals and rewarding its actions, much like how a parent might teach a child. As these rewards are applied in training, the app learns what actions maximize its rewards towards the defined goals.

Returning to the restaurant example, you can think of the Spice.ai runtime as Blue, and Spicepod manifests as the guide on how Blue should learn. Individual staff members would consult with Blue for ongoing recommendations on decisions to make and how to act. These goals and rewards are defined in Spicepods or "pods" for short. Spicepods are packages of configuration that describe the application's goals and how it should learn from data. Although it's not a direct analogy, Spicepods and their manifests can be conceptualized similarly to Docker containers and Dockerfiles. Whereas Dockerfiles define the packaging of your app, Spicepods specify the packaging of your app's learning and data.

Anatomy of a Spicepod

A Spicepod consists of:

A required YAML manifest that describes how the pod should learn from data
Optional seed data
Learned model/state
Performance telemetry and metrics

Developers author Spicepods using the spice CLI, for example with spice pod init <name>, or simply by creating a manifest file such as mypod.yaml in the spicepods directory of their application.

Here's an example of the Tweet Recommendation Quickstart Spicepod manifest.

A screenshot of the Spicepod manifest for the Tweet Recommendation Quickstart

You can see the data definitions under dataspaces, the actions the application may take under actions, and their rewards when training.

In the next post, I'll walk through in detail each section of the pod manifest. In the meantime, you can review the documentation for a complete reference of the Spicepod manifest syntax.

Spicepods as packages

On disk, Spicepods are generally layouts of a manifest file, seed data, and trained models, but they can also be exported as zipped packages.

A screenshot of the Spicepod layout for the trader quickstart application

When the runtime exports a Spicepod using the spice export command, it is saved with a .spicepod extension. It can then be shared, archived, or imported into another instance of the Spice.ai runtime.
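Because a package received from someone else is untrusted input, it is worth checking member paths before extracting it. The Python sketch below assumes a .spicepod file is a standard zip archive, which the post implies but does not specify; the file layout inside the archive and the function name are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath


def safe_member_names(archive: zipfile.ZipFile) -> list[str]:
    """Return member names, rejecting absolute paths and '..' traversal."""
    names = []
    for name in archive.namelist():
        path = PurePosixPath(name)
        if path.is_absolute() or ".." in path.parts:
            raise ValueError(f"unsafe path in archive: {name}")
        names.append(name)
    return names


# Build a small in-memory archive standing in for an exported package.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("spicepods/serverops.yaml", "name: serverops\n")

with zipfile.ZipFile(buffer) as archive:
    members = safe_member_names(archive)

print(members)
```

Rejecting paths that escape the target directory is the usual guard against "zip slip" style issues when unpacking archives.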

Soon, we also expect to enable publishing of .spicepods to spicerack.org, from where community-created Spicepods can easily be added to your application using spice add <pod name> (currently, only Spice AI published pods are available on spicerack.org).

Treating Spicepods as packages and enabling their sharing and distribution through spicerack.org will help developers share their "restaurant guides" and build upon each other's work, much like they do with npmjs.org or pypi.org. In this way, developers can together build better and more intelligent applications.

In the next post, we'll dive deeper into authoring a Spicepod manifest to create an intelligent application. Follow @spice_ai on Twitter to get an update when we post.

If you haven't already, read the first post in the series, Making Apps that Learn and Adapt.

Learn more and contribute

Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.

If you are interested in partnering, we'd love to talk. Try out Spice.ai, email us "hey," get in touch on Discord, or reach out on Twitter.

We are just getting started! 🚀

Luke
