
What Data Informs AI-driven Decision Making?

· 7 min read
Luke Kim
Founder and CEO of Spice AI

AI unlocks a new generation of intelligent applications that learn and adapt from data. These applications use machine learning (ML) to outperform traditionally developed software. However, the data engineering required to leverage ML is a significant challenge for many product teams. In this post, we'll explore the three classes of data you need to build next-generation applications and how Spice.ai handles runtime data engineering for you.

While ML has many different applications, one way to think about ML in a real-time application that can adapt is as a decision engine. Phillip discussed decision engines and their potential uses in A New Class of Applications That Learn and Adapt. This decision engine learns and informs the application how to operate. Of course, applications can and do make decisions without ML, but a developer normally has to code that logic. And the intelligence of that code is fixed, whereas ML enables a machine to constantly find the appropriate logic and evolve the code as it learns. For ML to do this, it needs three classes of data.

The three classes of data for informed decision making

We don't want just any decision, though; we want high-quality, informed decisions. To make higher-quality, informed decisions over time, you need three classes of information: historical information, real-time or present information, and the results of your decisions.

Especially recently, stock or crypto trading is something many of us can relate to. To make high-quality, informed investing decisions, you first need general historical information on the price, security, financials, industry, previous trades, etc. You study this information and learn what might make a good investment or trade.

Second, you need a real-time updated stream of data as it happens to make a decision. If you were stock trading, this information might be the stock price on the day or hour you want to make the trade. You need to apply what you learned from historical data to the current information to decide what trade to place.

Finally, if we're going to make better decisions over time, we need to capture and learn from the results of those decisions. Whether you make a great or poor trade, you want to incorporate that experience into your historical learning.

Figure: The three data classes.

Using all three data classes together results in higher-quality decisions over time. Broad data across these classes is useful, and we could make some nice trades with it. Still, we can make even higher-quality trading decisions with personal context. For example, we may want to consider the individual tax consequences or risk level of the trade for our situation. So each of these classes also comes in global and local variants. We combine global information, like what worked well for everyone, with local experience, like what worked well for us and our situation, to make the best overall informed decision.

The waterfall approach to data engineering

Consider how you would capture these three data classes and make them available to both the application and ML in the trading example. This data engineering can be a pretty big challenge.

First, you need a way to gather and consume historical information, like stock prices, and keep that updated over time. You need to handle streaming constantly updated real-time data to make runtime decisions on how to operate. You need to capture and match the decisions you make and feed that back into learning. And finally, you need a way to provide personal or local context, like holding off on sell trades until next year, to stay within a tax threshold, or identifying a pattern you like to trade. If all this wasn't enough, as we learned from Phillip's AI needs AI-ready data post, all three data classes need to be in a format that ML can use.

Figure: Traditional app and data integration.

If you can afford a data or ML team, they may do much of this for you. However, this model starts to look quite waterfall-like and is not well suited to applications that want to learn and adapt in real time. As in a waterfall approach, you would provide requirements to your data team, and they would do the data engineering required to provide you with the first two classes of data, historical and real-time. They may give you ML-ready data or train an ML model for you. However, there is often a large latency before you can apply that data or model in your application, and a long turnaround time if it does not meet your requirements. In addition, to capture the third class of data, you would need to capture the results of the decisions your application made using those models and send them back to the data team to incorporate into future learning. This latency through the data, decision-making, learning, and adaptation process is often infeasible for a real-world app.

And, if you can't afford a data team, you have to figure out how to do all that yourself.

The agile approach

Modern software engineering practices have favored agile methodologies to reduce time to learn and adapt applications to customer and business needs. Spice.ai takes inspiration from agile methods to provide developers with a fast, iterative development cycle.

Spice.ai provides mechanisms for making all three classes of data available to both the application and the decision engine. Developers author Spicepods declaring how data should be captured, consumed, and made ML-ready, so that all three classes are consistent and available to ML.

The Spice.ai runtime exposes developer-friendly APIs and data connectors for capturing and consuming data and annotating that data with personal context. The runtime generates AI-ready data for you and makes it available directly for ML. These APIs also make it easy to capture application decisions and incorporate the resulting learning.

The Spice.ai approach short-circuits the traditional waterfall-like data process by keeping as much data as possible local to the application instead of round-tripping through an external pipeline or team, which is especially valuable for real-time data. By reducing the latency from decision consequences to learning, the application can learn and adapt faster.

Spice.ai enables personalized learning from personal context and experiences through the interpretations mechanism. Interpretations allow an application to provide additional information or an "interpretation" of a time range as input to learning. In the trading example, this could be as simple as labeling a time range as a good time to buy or providing additional contextual information such as tax considerations. Developers can also use interpretations to record the results of decisions with more context than what might be available in the observation space. You can read more about Interpretations in the Spice.ai docs.

While Spice.ai focuses on ensuring consistent, ML-ready data is available, it does not replace traditional data systems or teams. They still have their place, especially for large historical datasets, and Spice.ai can consume data produced by them. Where possible, especially for application and real-time data, Spice.ai keeps runtime data local to create a virtuous cycle of data from the application to the decision engine and back again, enabling faster and more agile learning and adaptation.

Figure: App with Spice.ai.

Summary

In summary, building an intelligent application driven by AI-recommended decisions can require a significant amount of data engineering to learn, make decisions, and incorporate the results. The Spice.ai runtime enables you as a developer to focus on consuming those decisions and tuning how the AI engine should learn, rather than on the runtime data engineering.

The potential of the next generation of intelligent applications to improve the quality of our lives is very exciting. Using AI to help applications make better decisions, whether that be AI-assisted investing, improving the energy efficiency of our homes and buildings, or supporting us in deciding on the most appropriate medical treatment, is very promising.

Learn more and contribute

Even for advanced developers, building intelligent apps that leverage AI is still way too hard. Our mission is to make this as easy as creating a modern web page. If that vision resonates with you, join us!

If you want to get involved, we'd love to talk. Try out Spice.ai, email us "hey," get in touch on Discord, or reach out on Twitter.

Luke

A New Class of Applications That Learn and Adapt

· 5 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

A new class of applications that learn and adapt is becoming possible through machine learning (ML). These applications learn from data and make decisions to achieve the application's goals. In the post Making apps that learn and adapt, Luke described how developers integrate this ability to learn and adapt as a core part of the application's logic. You can think of the component that does this as a "decision engine." This post will explore a brief history of decision engines and use-cases for this application class.

History of decision engines

The idea to make intelligent decision-making applications is not new. Developers first created these applications around the 1970s [1], and they are some of the earliest examples of using artificial intelligence to solve real-world problems.

The first applications used a class of decision engines called "expert systems." A distinguishing trait of expert systems is that they encode human expertise in rules for decision-making. Domain experts created combinations of rules that powered decision-making capabilities.

Expert systems have been applied to tasks such as medical diagnosis, chemical analysis, and hardware configuration.

However, the resources required to build expert systems make employing them infeasible for many applications [2]. They often need a significant time and resource investment to capture and encode expertise into complex rule sets. These systems also do not automatically learn from experience, relying on experts to write more rules to improve decision-making.

With the advent of modern deep-learning techniques and the ability to access significantly more data, it is now possible for the computer, not only the developer, to learn and encode the rules to power a decision engine and improve them over time. The vision for Spice.ai is to make it easy for developers to build this new class of applications. So what are some use-cases for these applications?

Use cases of decision-making applications

Reduce energy costs by optimizing air conditioning

Today: The air conditioning system for an office building runs on a fixed schedule and is set to a fixed temperature during business hours, only adjusting using in-room sensor data, if at all. This behavior potentially overcools the building toward closing time as the outside temperature drops and the building starts to empty.

With Spice.ai: Using Spice.ai, the application combines time-series data from multiple sources, including the time of day, day of the week, building/room occupancy, outside temperature, energy consumption, and pricing. The A/C controller application learns how to adjust the air conditioning system as the room naturally cools towards the end of the day. As occupancy decreases, the decision engine is rewarded for maintaining the desired temperature while minimizing energy consumption and cost.
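As a rough illustration of that reward idea (generic Python, not the exact Spice.ai reward-function format; the field names and weights are hypothetical), a reward might trade off comfort against energy cost:

```python
# Hypothetical sketch: names and weights are illustrative only.
def ac_reward(observed_temp: float, desired_temp: float, energy_cost: float,
              comfort_weight: float = 1.0, cost_weight: float = 0.5) -> float:
    """Reward staying near the desired temperature while penalizing energy spend."""
    comfort_penalty = abs(observed_temp - desired_temp)
    return -(comfort_weight * comfort_penalty + cost_weight * energy_cost)
```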

Food delivery order dispatching

Today: Customers order food delivery with a mobile app. When the order is ready to be picked up from the restaurant, the order is dispatched to a delivery driver by a simple heuristic that chooses the nearest available driver. As the app gets more popular with customers and the number of restaurants, drivers, and customers increases, the heuristic needs to be constantly tuned or supplemented with human operators to handle the demand.

With Spice.ai: The application learns which driver to dispatch to minimize delivery time and maximize customer star ratings. It considers several factors from data, including patterns in both the restaurant and driver's order histories. As the number of users, drivers, and customers increases over time, the app adapts to keep up with the changing patterns and demands of the business.

Routing stock or crypto trades to the best exchange

Today: When trading stocks through a broker like Fidelity or TD Ameritrade, your broker will likely route your order to an exchange like the NYSE. And in the emerging world of crypto, you can place your trade or swap directly on a decentralized exchange (DEX) like Uniswap or PancakeSwap. In both cases, orders are likely routed either by a traditional rules-based expert system or even manually.

With Spice.ai: A smart order routing application learns from data such as pending transactions, time of day, day of the week, transaction size, and the recent history of transactions. It finds patterns to determine the most optimal route or exchange to execute the transaction and get you the best trade.

Summary

A new class of applications that can learn and adapt is made possible by integrating AI-powered decision engines. Spice.ai is a decision engine that makes it easy for developers to build these applications.

If you'd like to partner with us in creating this new generation of intelligent decision-making applications, we invite you to join us on Discord, reach out on Twitter or email us.

Phillip

Footnotes

  1. Russell, Stuart; Norvig, Peter (1995). Artificial Intelligence: A Modern Approach. Simon & Schuster. pp. 22–23. ISBN 978-0-13-103805-9.

  2. Kendal, S. L., & Creen, M. (2007). An Introduction to Knowledge Engineering. London: Springer. ISBN 978-1-84628-475-5.

Spice.ai v0.5.1-alpha

· 2 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

Announcing the release of Spice.ai v0.5.1-alpha! 📈

This minor release builds upon v0.5-alpha, adding the ability to start training from the dashboard and support for monitoring training runs with TensorBoard.

Highlights in v0.5.1-alpha

Start training from dashboard

A "Start Training" button has been added to the pod page on the dashboard so that you can easily start training runs from that context.

Training runs can now be started by:

  • Modifications to the Spicepod YAML file.
  • The spice train <pod name> command.
  • The "Start Training" dashboard button.
  • POST API calls to /api/v0.1/pods/{pod name}/train (see the example below).
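For example, a training run could be started over the HTTP API with a simple POST. This is a minimal sketch: the host, port, and pod name below are assumptions to adjust for your setup.

```python
import requests

# Hypothetical example: adjust the host, port, and pod name for your runtime.
pod_name = "trader"
response = requests.post(f"http://localhost:8000/api/v0.1/pods/{pod_name}/train")
response.raise_for_status()
print("Training started:", response.status_code)
```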

TensorBoard monitoring

TensorBoard monitoring is now supported when using the DQL (default) or the new SACD learning algorithm announced in v0.5-alpha.

When enabled, TensorBoard logs will automatically be collected, and an "Open TensorBoard" button will be shown on the pod page in the dashboard.

Logging can be enabled at the pod level with the training_loggers pod param or per training run with the CLI --training-loggers argument.

Support for VPG will be added in v0.6-alpha. The design allows for additional loggers to be added in the future. Let us know what you'd like to see!

New in this release

  • Adds a start training button on the dashboard pod page.
  • Adds TensorBoard logging and monitoring when using DQL and SACD learning algorithms.

Dependency updates

  • Updates to Tailwind 3.0.6
  • Updates to Glide Data Grid 3.2.1

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!

Understanding Q-learning: How a Reward Is All You Need

· 10 min read
Corentin Risselin
Software Engineer at Spice AI

There are two general ways to train an AI to match a given expectation: we can either give it the expected outputs (commonly named labels) for different inputs, which we call supervised learning, or we can provide a reward for each output as a score, which is reinforcement learning (RL).

Supervised learning works by tweaking all the parameters (weights in neural networks) to fit the desired outputs, expecting that given enough input/label pairs the AI will find common rules that generalize for any input.

Reinforcement learning's reward is often provided by a simple function that can score any output: we don't know what specific output would be best, but we can recognize how good the result is. In this latter statement, there are two underlying concepts we will address in this post:

  • Can we only tell if the output is good in a binary way, or do we have to quantify the output to train our AI?
  • Do we have to give a reward for every AI's output? Can we give a reward only at specific times?

Those questions are already mostly answered, and many algorithms deal with these topics. Our journey here will be to understand how we tackle those questions and end up with a beautiful formula that is at the core of modern approaches to RL:

Equation 1. Q estimation at the heart of many RL algorithms, also known as the Bellman equation.

Q-learning

The vast majority, if not all, of modern RL algorithms are based on the principles of Q-learning: the idea is to evaluate a 'reward expectation' for each possible action. If we have a good evaluation, we can maximize the reward by choosing the actions with the maximum evaluated rewards. The function giving this expected reward is named Q. For now, we will assume we can have a reward for any action.

Equation 2. Definition of the Q function.
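Based on the description above, the definition is likely of the form:

```latex
Q(s_t, a_t) = \mathbb{E}\left[ r(s_t, a_t) \right]
```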

The t indices show that the state and action aren't constant and will vary, usually with time/action taken. On the other hand, the Q function and the reward function r are unique functions that ideally return the 'expected reward' for any (state, action) pair.

For now, we will assume we can have a reward that gives an objective and perfect evaluation of each state/action.

Figure 1. Example of reward given for different actions at a specific state. Here a simple 2D map with a goal.

Q-Table

We know that actions' outcomes (rewards) will vary depending on the current state we are in; otherwise, the problem would be trivial to solve. If the states that are relevant to our actions can be enumerated, a simple way would be to build a table with all the possible state/action pairs. There are different ways to build such a table depending on how we can interact with our environment. Eventually, we would have a good 'map' to guide us toward the best actions.

Figure 2. Example of Q-table: we can build an exhaustive table for all the possible (state, action) pairs
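As a minimal sketch of the Q-table idea (plain Python/NumPy, not Spice.ai's implementation; the environment sizes are hypothetical), we can store one expected reward per (state, action) pair and act greedily on it:

```python
import numpy as np

# Hypothetical sizes: a small environment with 5 states and 3 actions.
n_states, n_actions = 5, 3
q_table = np.zeros((n_states, n_actions))  # one expected reward per (state, action)

def greedy_action(state: int) -> int:
    """Pick the action with the highest expected reward for this state."""
    return int(np.argmax(q_table[state]))

# Example: record an observed reward and query the best action for state 2.
q_table[2, 1] = 1.0
print(greedy_action(2))  # -> 1
```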

Deep Q-Learning

When the number of variables of the environment relevant to our actions/rewards becomes too large, the number of possible states grows quickly. It doesn't take a lot of possible parameters to make the Q-table approach unfeasible. Neural networks are known to work very nicely and efficiently in high dimensionality (with many input variables). They also generalize well, so the idea in Deep Q-Learning is to use a neural network to predict the different Q values for each action given a state.

Figure 3. A neural network can predict Q values from state information

In this case, we do not need to give state/action pairs but only the state, as the neural network returns all the Q values associated with each action. Outputting all actions' Q values is a common method, as typical cases have a complex environment but a relatively small number of possible actions.
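A minimal sketch of that idea in PyTorch (an assumption; Spice.ai's actual engine may be structured differently): a small network maps a state vector to one Q value per action, and the greedy action is the argmax of the output.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q value per possible action."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),  # one output per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Hypothetical dimensions: a 4-dimensional state and 3 possible actions.
q_net = QNetwork(state_dim=4, num_actions=3)
state = torch.randn(1, 4)           # a single example state
q_values = q_net(state)             # shape (1, 3): Q value for each action
greedy_action = q_values.argmax(dim=1)
```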

This method works very well. It is similar to supervised learning with states as inputs and rewards as labels. So far we assumed that we had a reward for each action, and we chose the next action with the best reward (called a greedy policy). In many cases this is not enough: even if an action would yield the best reward at a given state, it may affect the next state so that we wouldn't optimize the reward in the long term. Also, if we can't have a reward for each action, we usually give 0 as a reward. We will then not be able to choose the right actions if they affect later states despite not yielding different rewards at the current state.

The sparsity of rewards or the long-term calculation of total reward (non-greedy policies) leads us to diverge from supervised learning and learn potential future rewards.

Temporal difference: TD-Learning

TD-learning is a clever way to account for potential future values without knowing them yet. TD is a model-free class of algorithms: it does not simulate future states. The main idea is to consider all the rewards of a sequence of actions to give a better value than just the reward of the next action.

We can, for instance, sum all the future rewards:

Figure 4. Accumulating future rewards to assign values to each state.

Mathematically this can be written as:

Equation 3.
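Given the description above, this is likely the plain sum of future rewards:

```latex
V(s_t) = r_t + r_{t+1} + r_{t+2} + \dots = \sum_{k \geq 0} r_{t+k}
```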

This is named TD(0): the simplest form of TD method, accumulating all the rewards.

Introducing policies

We could try different trajectories (sequences of actions) and retrospectively get the final reward for each action, but this has two drawbacks: the environment is usually too vast, and the sequence of actions might not even have a definite end. Also, such exhaustive methods might not be very efficient. Instead, we can evaluate the 'value' of the next state overall, like the maximum of all its possible rewards (direct reward), and add this value to the reward of a given action.

If a state can have different branches, we can select the best one, and this would be our policy, the way we choose actions. This simple form of taking the maximum is called the 'greedy' policy.

Figure 5. With a greedy policy, the values associated with a state come from the maximum value of the next state. Here, despite the lower branch directly giving only half the top reward, its overall value is greater.

This can be written down as:

Equation 4.
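From the surrounding text, a likely form is the direct reward plus the expected value of the next state under the chosen policy:

```latex
V(s_t) = r_t + \mathbb{E}_{\pi}\left[ V(s_{t+1}) \right]
```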

The expected value notation is defined as:

Equation 5.

For a greedy policy, all the probabilities p would be set to 0 except the one associated with the highest return, which is set to 1 (in case of a tie between n actions, we would assign a probability of 1/n to each to get the same expected value).

Equation 6.

Relation with Q function

The expected reward can be replaced by the Q function we used earlier, which can now be denoted as specific to our chosen policy (named π):

Equation 7.
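Substituting the Q function, this likely reads:

```latex
Q^{\pi}(s_t, a_t) = r_t + \mathbb{E}_{\pi}\left[ Q^{\pi}(s_{t+1}, a_{t+1}) \right]
```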

TD-0

We previously discussed the problem of not being able to go through all the states exhaustively and that the evaluation of the Q value from a neural network could help. We want to use the TD method to have a better value estimation that will consider potential future rewards.

The TD(0) method is elegant as we can, in fact, use only the next state's expected value instead of all future ones. The idea is that with successive evaluations, we build a chain of dependencies as each state's value depends on the next one.

Equation 8.

Figure 6. Iterative propagation of state values following TD(0) method.

We can see that the greedy policy would work even with null rewards in the trajectory. We can make our greedy policy explicit, going back to using the Q value instead of the state value V:

Equation 9.
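With the greedy policy made explicit in terms of Q values, this is likely:

```latex
Q(s_t, a_t) = r_t + \max_{a} Q(s_{t+1}, a)
```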

TD-lambda

We need to fix a problem: if a trajectory grows too long or never ends, a state value can potentially grow indefinitely. To counter that, we can add a discount factor (originally named lambda, usually referred to as gamma in Q-learning) on the next state's value:

Equation 10.
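Adding the discount factor γ to the previous form most likely gives:

```latex
Q(s_t, a_t) = r_t + \gamma \max_{a} Q(s_{t+1}, a)
```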

Notice that we simplify the reward notation for clarity.

To avoid exploding values, this discount has to be between 0 and 1 (strictly below 1). We can think about it as giving more importance to the direct reward than to future ones. As the contribution of later rewards decreases, the chain of actions can grow without the calculated value growing. If the reward has an upper limit, the value will also be bounded.

The sparsity of rewards is also solved: giving only a positive reward after many non-rewarding steps will create smooth values for the intermediate states. Any reward, positive or negative, will diffuse its value to the neighboring states.

Figure 7. The TD(0) value propagation can allow for a smooth value distribution over the states that will help build efficient behavior.

Q-Learning algorithm

Finally, as we train a neural network to estimate the Q function, we need to update its target over successive iterations. We cannot fully trust the estimator (a neural network here) to give the correct value, so we introduce a learning rate to update the target smoothly.

Equation 11. Fully explained Bellman equation.
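Putting the pieces together, this is most likely the standard Q-learning update, with learning rate α and discount factor γ:

```latex
Q(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t) + \alpha \left( r_t + \gamma \max_{a} Q(s_{t+1}, a) \right)
```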

That is it! We now understand all the parts of this formula. Over multiple training steps with different states, the training should find a good average Q function. While training, the estimator uses its own output to train itself (commonly referred to as bootstrapping): it is as if it were chasing itself. Bootstrapping can lead to instability in the training process. There are many additional methods to help against such instability.
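As a compact illustration of that update (plain NumPy, not Spice.ai's engine; the environment sizes and example transition are hypothetical):

```python
import numpy as np

n_states, n_actions = 10, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9   # learning rate and discount factor

def q_update(state: int, action: int, reward: float, next_state: int) -> None:
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] = (1 - alpha) * Q[state, action] + alpha * target

# Example transition: action 1 in state 0 gives reward 1.0 and lands in state 4.
q_update(state=0, action=1, reward=1.0, next_state=4)
```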

From giving rewards, sparse or not, binary or fine-grained, we have a smooth space of values for all our states/actions so the AI can follow a greedy policy to the best outcome.

This way of training is not a silver bullet and there is no guarantee that the AI will find a correlation from the information given as state to the returned reward.

Conclusion

We can see how our rewards are used to train an AI's policies using Q-learning. By understanding the many iterations required and the bootstrapping issues, we can help our AI by carefully giving relevant state information and rewards:

  • There needs to be a correlation between the state information and the reward: the simpler the relationship, the easier/faster the AI will find it.
  • Sparse and binary rewards make the training problem long and arduous. Giving more information through the reward can tremendously increase the speed/accuracy of the learned Q-estimator.
  • The longer the chain of actions, the more complex the Q-value will be to estimate.

Here, we didn't cover how the AI's algorithm can explore different actions in an environment. Spice.ai's technology focuses exclusively on off-policy training, where we only have past data and cannot interact with the environment. RL is a vast and quickly growing topic. Robotics is a fantastic field of application; many other areas are yet to be explored with such a technology. We hope to push forward the technology and its field of application with our platform.

If you'd like to partner with us on the mission of making new applications by leveraging RL, we invite you to discuss with us on Discord, reach out on Twitter or email us.

I hope you enjoyed this post and learned something new.

Corentin

Spice.ai v0.5-alpha

· 3 min read
Phillip LeBlanc
Co-Founder and CTO of Spice AI

We are excited to announce the release of Spice.ai v0.5-alpha! 🥇

Highlights include a new learning algorithm called "Soft Actor-Critic" (SAC), fixes to the behavior of spice upgrade, and a more consistent authoring experience for reward functions.

If you are new to Spice.ai, check out the getting started guide and star spiceai/spiceai on GitHub.

Highlights in v0.5-alpha

Soft Actor-Critic (Discrete) (SAC) Learning Algorithm

The addition of the Soft Actor-Critic (Discrete) (SAC) learning algorithm is a significant improvement to the power of the AI engine. It is not set as the default algorithm yet, so to start using it, pass the --learning-algorithm sacd parameter to spice train. We'd love to get your feedback on how it's working!

Consistent reward authoring experience

With the addition of reward function files, which allow you to edit your reward function in a Python file, the behavior of starting a new training session when the reward function code is edited was lost. With this release, that behavior is restored.

In addition, there is a breaking change to the variables used to access the observation state and interpretations. This change was made to better reflect the purpose of the variables and to make them easier to work with in Python, as sketched below.

Previous (Type) → New (Type)

  • prev_state (SimpleNamespace) → current_state (dict)
  • prev_state.interpretations (list) → current_state_interpretations (list)
  • new_state (SimpleNamespace) → next_state (dict)
  • new_state.interpretations (list) → next_state_interpretations (list)
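As a rough sketch of how the renamed variables might be used inside a reward function (hypothetical only: the state keys and the exact function signature Spice.ai expects are assumptions):

```python
# Hypothetical sketch: "price" is an illustrative state key, not a documented field,
# and the signature below may differ from the actual reward-file format.
def reward(current_state: dict, next_state: dict,
           current_state_interpretations: list, next_state_interpretations: list) -> float:
    """Reward an increase in price between the current and next observations."""
    return next_state.get("price", 0.0) - current_state.get("price", 0.0)
```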

Improved spice upgrade behavior

The Spice.ai CLI will no longer recommend "upgrading" to an older version. An issue was also fixed where trying to upgrade the Spice.ai CLI using spice upgrade on Linux would return an error.

New in this release

  • Adds a new learning algorithm called "Soft Actor-Critic" (SAC).
  • Updates the reward function parameters for the YAML code blocks from prev_state and new_state to current_state and next_state to be consistent with the reward function files.
  • Fixes an issue where editing a reward function file would not automatically trigger training.
  • Fixes the normalization of values for the Deep-Q Learning algorithm to handle larger values.
  • Fixes an issue where the Spice.ai CLI would not upgrade on Linux with the spice upgrade command.
  • Fixes an issue where the Spice.ai CLI would recommend an "upgrade" to an older version.

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!