Cost modelling for AI projects

I have an ecommerce website and I want more people to click the buttons that give me money (buy now, view ad, etc.), but I don't have the time to A/B test button colours myself. To find the right shade more efficiently, I want to use a bandit algorithm that learns to pick the best colour for a button by making choices that each have an associated reward. Before investing the team's time and money in the project, I want to understand whether the expected ROI of the model is greater than the projected cost of creating it. I am not looking for an exact prediction, but I do want to develop an intuitive sense for the project so that I can predict and manage the risks appropriately as we go along.

To do that analysis, I want to rely on a mix of three things. Firstly, I want to learn what kinds of things will drive costs in the project. If I can understand the cost pressures then I am more likely to control for the most relevant kinds of risks. Secondly, I want to understand how our existing capabilities will make life easier or harder for the team so that I can choose what strategies to emphasise or avoid:

            Cheap        Expensive
Strength    Emphasise    Develop
Weakness    Outsource    Avoid

Finally, I want to project these ideas forward as much as possible so that I can understand the costs without necessarily incurring them. So long as I am explicit about my assumptions and measurements, I can keep track of when and where the landscape has shifted or my expectations have been subverted over the course of the project. These ideas should help me to maximise my team's chances of delivering on time and under budget.

This code defines a 'bandit' algorithm that works with colours. A 'reward' function is like a carrot or a stick.

This reward function 'flips' a coin when given a colour and uses that to either return a carrot (1) or a stick (0).

The bandit 'explores' an environment by choosing a colour 1000 times (epochs) and then recording whether it received a carrot or a stick.

Using Decoded.AI I can begin to analyse some of the business risks of the project. It's all built directly from the code and the features that we want to use don't require an API or any interface implementations. This is the structure of an initial idea around a bandit algorithm that took less than ten minutes to draft out:

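import random
import typing


def main():
    """Runs the program."""
    # Bandit, Colours, Reward and ClickableButton come from the project code (not shown here).
    agent: Bandit[Colours] = Bandit()

    def reward(_observation: typing.Optional[Colours]) -> Reward:
        """Rewards the colour of a button **randomly**."""
        # Placeholder reward: a fair coin flip that ignores the colour.
        if random.random() < 0.5:
            return 0  # stick
        return 1  # carrot

    env = ClickableButton(fn=reward)

    # Explore the environment for 1000 epochs.
    agent.explore(env, 0.5, 1000)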
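
The `Bandit` and `ClickableButton` classes are not shown in this post, so the snippet above does not run on its own. For readers who want something to execute, here is a minimal self-contained sketch of the same idea. Everything in it is an assumption for illustration: the class name `EpsilonGreedyBandit`, the function `coin_flip_reward`, the three colour names, the choice of an epsilon-greedy strategy, and the reading of `0.5` as the exploration rate. It is not the implementation used in the project.

```python
import random
from typing import Callable, Dict, List, Optional

Colour = str
Reward = int  # 1 = carrot, 0 = stick


def coin_flip_reward(_colour: Optional[Colour]) -> Reward:
    """Dummy reward: ignores the colour and flips a fair coin."""
    return 0 if random.random() < 0.5 else 1


class EpsilonGreedyBandit:
    """Hypothetical stand-in for the Bandit class, using epsilon-greedy selection."""

    def __init__(self, arms: List[Colour]) -> None:
        self.pulls: Dict[Colour, int] = {arm: 0 for arm in arms}
        self.values: Dict[Colour, float] = {arm: 0.0 for arm in arms}

    def choose(self, epsilon: float) -> Colour:
        # With probability epsilon pick a random colour, otherwise the best so far.
        if random.random() < epsilon:
            return random.choice(list(self.pulls))
        return max(self.values, key=self.values.__getitem__)

    def update(self, arm: Colour, reward: Reward) -> None:
        # Incremental running mean of the rewards seen for this colour.
        self.pulls[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.pulls[arm]

    def explore(self, reward_fn: Callable[[Optional[Colour]], Reward],
                epsilon: float, epochs: int) -> None:
        for _ in range(epochs):
            arm = self.choose(epsilon)
            self.update(arm, reward_fn(arm))


random.seed(0)  # fixed seed so repeated smoke tests are comparable
agent = EpsilonGreedyBandit(["red", "green", "blue"])
agent.explore(coin_flip_reward, epsilon=0.5, epochs=1000)
for colour in agent.pulls:
    print(colour, agent.pulls[colour], round(agent.values[colour], 3))
```

Because the reward is a coin flip, every colour's estimated value should hover around 0.5; the point of the sketch is the shape of the loop, not the result.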


I've put the reward function as close to the top as possible to make our assumptions as obvious as possible. In this case, it's a placeholder for a genuine reward function so that I can get a sense of where to go next before investing in designing the environment and writing the code. Even with only a thousand epochs and a dummy reward function, I can model some assumptions and project the likely cost structure of the model from information profiled during a 0.5 second smoke test that was absurdly inexpensive:

[Screenshot]

Let's make some assumptions. I ran the example on a fairly cheap CPU, so I can expect at least a 5x speed-up when moving to a faster cloud instance before even thinking about writing code for the GPU. That'll cost more, so we'll adjust the vCPU price and say that we want to scale up the complexity by 100x. It also isn't free for our team to get the model working effectively. Engineering time is expensive, and every time we have to stop our work to wait for feedback (e.g. for the model to train and our metrics to come in) we incur a cost. Based on experience, I've set that to about $70/hr for an engineer working on the code, plus $10/hr for every hour spent waiting on a model to train. I think that it would take at least 15 significant attempts to get the model beyond our minimum tolerable risk, so that's accounted for as well in the 'attempts' metric. None of these are exact metrics, but they help build an intuition for the project: time is valuable and the number of attempts we get is finite.

[Screenshot]
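
The assumptions above can also be written down as a small back-of-the-envelope model. This is a sketch, not how Decoded.AI computes its metrics. The smoke-test time, scale factor, speed-up, hourly rates and attempt count come from the text; the vCPU price, the coding hours per attempt and the number of training runs per attempt are hypothetical values chosen only so that the example runs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CostAssumptions:
    smoke_test_seconds: float = 0.5        # measured smoke test (from the text)
    complexity_scale: float = 100.0        # planned scale-up (from the text)
    speedup: float = 5.0                   # faster cloud instance (from the text)
    engineer_rate: float = 70.0            # $/hr working on the code (from the text)
    waiting_rate: float = 10.0             # $/hr waiting on training (from the text)
    attempts: int = 15                     # significant attempts (from the text)
    vcpu_price_per_hour: float = 0.10      # hypothetical
    coding_hours_per_attempt: float = 2.0  # hypothetical
    training_runs_per_attempt: int = 20    # hypothetical


def training_hours(a: CostAssumptions) -> float:
    """Hours spent waiting on training during one attempt."""
    seconds_per_run = a.smoke_test_seconds * a.complexity_scale / a.speedup
    return seconds_per_run * a.training_runs_per_attempt / 3600.0


def cost_per_attempt(a: CostAssumptions) -> Dict[str, float]:
    """Split the cost of one attempt into engineering, waiting and compute."""
    wait = training_hours(a)
    return {
        "engineering": a.engineer_rate * a.coding_hours_per_attempt,
        "waiting": a.waiting_rate * wait,
        "compute": a.vcpu_price_per_hour * wait,
    }


assumptions = CostAssumptions()
costs = cost_per_attempt(assumptions)
project_total = sum(costs.values()) * assumptions.attempts

for name, value in costs.items():
    print(f"{name:>12}: ${value:,.2f} per attempt")
print(f"{'project':>12}: ${project_total:,.2f} over {assumptions.attempts} attempts")
```

Changing any single field and re-running is a cheap way to see which assumption the total is most sensitive to.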

I can see right away that my incidentals (e.g. time waiting for feedback, time writing code etc.) are the biggest cost risk for this project. The computation itself is fairly quick and I have no memory pressures, so I can afford fast iteration without focusing on optimisation. What are the limits of that?

[Screenshot]

Well, it turns out that even if I rely on a pipeline whose complexity increases exponentially I would not have a problem paying the monthly cloud bill. The biggest risk is clearly that I fail to deliver within the approximate working time captured in the 'cost per attempt' metric. So, for my strategy on this project I want to do a few things:

  • Maximise flow states by prioritising developer experience and time-on-task even over compute optimisations
  • Maximise learnings by prioritising information rich instrumentation even over compute resource usage and slower feedback loops

We also get a sense of when we should abandon the project. If we expect the button picker to bring in approximately $1,000/month of extra revenue and we want to break even within three months, then we can stay the course for at least three months at this burn. That kind of analysis gives us a degree of cost certainty that makes projects easier to manage and fosters more creativity, because we know how much risk we can afford to take.
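
To make the abandonment threshold concrete, here is a rough worked calculation using only figures already given above ($1,000/month of extra revenue, a three-month break-even window, $70/hr of engineering time and 15 attempts). As a simplifying assumption it ignores compute and waiting costs, which the earlier analysis suggests are small next to engineering time. The function names are hypothetical.

```python
def break_even_budget(monthly_revenue: float, months: int) -> float:
    """Total spend that the expected extra revenue repays within the window."""
    return monthly_revenue * months


def affordable_hours(budget: float, hourly_rate: float) -> float:
    """Engineering hours that fit inside the budget at a given hourly rate."""
    return budget / hourly_rate


budget = break_even_budget(monthly_revenue=1000.0, months=3)
hours = affordable_hours(budget, hourly_rate=70.0)
hours_per_attempt = hours / 15  # spread evenly over the 15 expected attempts

print(f"budget: ${budget:,.0f}")
print(f"engineering hours available: {hours:.1f}")
print(f"hours per attempt: {hours_per_attempt:.1f}")
```

If the team finds itself spending noticeably more hours per attempt than this, that is an early signal to revisit either the revenue expectation or the project itself.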

Closing thoughts

Our goal for the early experimentation stage of an AI project is mostly to project forward the kinds of strategies available to us, given the likely cost/capability structure of our early ideas. When we don't have good answers to these questions, it can be challenging to select and stand by a strategy that the team feels is making the best of our chances. With Decoded.AI, we can start to get that kind of information within the time it takes to spike a smoke test (10 minutes), push some data and navigate to the browser. By investigating how those pressures might induce us to behave, we can choose the strategy that best fits our ideas and maximises our chances of success. Because smoke tests are cheap, we can keep drawing out rich information with Decoded.AI and maintain our strategic approach without much effort, improving the chances that we deliver our button picker on time and under budget.