How tall will my children be, and other AI problems
De-mystifying AI and explaining what it means for consumer businesses
Let’s imagine a maths problem, and how we might solve it. Our challenge is to work out the eventual adult height of the children of various pairs of parents. Now there will be lots of extraneous factors impacting that, but for now our hypothesis is that only three things matter - the gender of the child, the height of the mother and the height of the father.
So where do we start? Clearly there are lots of ways of combining those three data points. For simplicity, we’ll do the exercise separately for male and female children, so for each of those we are now down to just two data points - the heights of mum and dad. But what is the right way to combine those? Is the answer the average of the two heights? Is it 0.7 times the father’s height plus 0.3 times the mother’s height? Maybe it isn’t so simple - perhaps the best model involves the square or the cube of one of their heights or even (heaven forbid) a logarithm.
There are just too many options for us to randomly guess and hope for the best - we need a method that gets us to the right answer.
Here’s what we’ll do:
We’ll take a big sample of data where we already know the answer - so we have mum and dad’s heights and the height of the child as an adult.
We’ll write a big equation that combines together every version of the two heights we can imagine, each multiplied by a ‘weighting’ factor - so if dad’s height is D and mum’s height is M, then the child’s height C will be written as:
\(C = X_{1}D + X_{2}M + X_{3}D^{2} + X_{4}M^{2} + \ldots\)

In other words, a long list of every possible expression of D and M (all the squares, square roots, logarithms etc.), each with a weighting - those weightings being the various X's in the equation.
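For the technically curious, that "long list of expressions" can be sketched in a few lines of Python. The particular expressions chosen here are just for illustration - any combination of D and M could go in the list:

```python
import math

def features(d, m):
    """Every version of D and M we can imagine, each of which
    will get its own weighting X."""
    return [d, m, d**2, m**2, math.sqrt(d), math.sqrt(m),
            math.log(d), math.log(m)]

def predict(weights, d, m):
    """The model is then just the weighted sum of those expressions."""
    return sum(x * f for x, f in zip(weights, features(d, m)))
```

With all weightings at zero except the first, the model simply returns dad's height - the training process's job is to find weightings that do better than that.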
So far we’ve written something very complicated-looking but completely useless, since we don’t know what any of those X weightings are. Some may turn out to be zero - in other words, that particular version of D or M isn’t helpful - whilst others will not be.
So now we use an ‘algorithm’ or step by step approach to try to work out the right weightings to use. That goes a bit like this:
Start by guessing - that’s right, just populate the equation with guesses for each of the weightings.
Then take that sample of data where we already know the answer. Use our guesswork equation to calculate the predicted height of the child, and compare the output to the actual answer that we already know.
Across the whole sample, add up how ‘wrong’ we were - in other words, the sum of the errors when we use our guesswork equation on every family in the sample, where the error is the difference between the predicted height and the actual one.
That error will be big (we did, after all, just make up the weightings). But here is the clever bit - with a bit of maths, we can work out in which direction each of the elements of X is wrong - in other words, whether each weighting was too high or too low.
Having done that, we just tweak each weighting - if it was too high, we take it down a bit, if it was too low we take it up. Then we repeat the whole exercise, and we should get a total error which is a bit less terrible the second time.
And that process loops over and over again until we just can’t get the error any lower. At that point, we have the very best model of a child’s height based on their parents’ heights that we can get with our sample data. Assuming the results aren’t too bad (the errors have become quite small), we have a model that we might apply to new examples as they come along, reasonably confident that the results will be fairly good.
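The whole loop described above can be sketched in Python. Everything here is made up for illustration - the synthetic families, the hidden 0.55/0.45 blend they’re generated from, the size of each tweak - but the structure is exactly the guess-measure-tweak-repeat process just described:

```python
import random

random.seed(42)

# A synthetic sample where we 'already know the answer': pretend the
# child's height really is a hidden blend of dad's (D) and mum's (M)
# heights, plus a little noise. The 0.55/0.45 split is invented.
families = []
for _ in range(200):
    d = random.uniform(165, 195)                   # dad's height (cm)
    m = random.uniform(150, 180)                   # mum's height (cm)
    c = 0.55 * d + 0.45 * m + random.gauss(0, 2)   # the known answer
    families.append((d, m, c))

# Step 1: start by guessing the weightings.
x1, x2 = 0.0, 0.0
rate = 0.00001   # how big a tweak we make on each pass

# Steps 2 onwards: predict, add up how wrong we were, work out which
# direction each weighting is off in, tweak it, and loop.
for _ in range(5000):
    grad1 = grad2 = 0.0
    for d, m, c in families:
        error = (x1 * d + x2 * m) - c   # predicted minus actual
        grad1 += error * d              # is x1 too high or too low?
        grad2 += error * m              # is x2 too high or too low?
    grad1 /= len(families)
    grad2 /= len(families)
    x1 -= rate * grad1                  # nudge each weighting
    x2 -= rate * grad2

print(f"x1 = {x1:.2f}, x2 = {x2:.2f}")  # lands near the hidden 0.55 and 0.45
```

The "bit of maths" glossed over in the text is those `grad` lines: they are the partial derivatives that tell us which way each weighting is wrong.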
What we’ve just done is build an AI model - specifically in this case using ‘Machine Learning’. We’ve ‘trained’ our model on a sample of data and used that to get the best equation to predict the output when a new example comes along.
Now needless to say, that’s a fairly simplified view of how it works, and even in this example you’ll have noticed that I had to gloss over how you know which weights are too high and which are too low, in order to avoid talking about partial derivatives. So yes, the maths and theory get quite complex, and executing a model like this in practice is also difficult in a whole variety of ways - best left to people who look like they don’t get out into the sun very often.
But the concept is straightforward - we don’t know how to predict something, so we’ll take a huge number of data points where we know the answer, use that data to work backwards to the best ‘fitting’ equation we can, and then use that equation to predict future outcomes.
Those future outcomes might be projected sales of a new product, or the right allocation of different SKUs to different stores, or the optimal way of routing delivery trucks - in other words, a huge number of practical challenges facing retail businesses every day. That’s why AI is such a hot topic across so many consumer industries. Whether the tool being used is dressed up as Machine Learning, Neural Networking, AI or any of a dozen other buzzwords, the essential process being used is the same.
In the real world, of course, there will be hundreds or thousands of input variables, not just two, and we’ll use thousands or millions of historical examples for our ‘training’.
In future posts we’ll explore some of the practical examples of these techniques a bit more, and try to identify new areas where the application of these ‘big data’ techniques might help our businesses.
The purpose of starting with this simple explanation, however, is that armed with a conceptual understanding of what is going on we can also see some of the weaknesses in the approach:
A technique like this will give you the best answer it can, but that isn’t necessarily a very good one (who’s to say that a child’s height is really a function of Mum and Dad at all?). So it is always worth asking some tough questions about how good a fit with the data the model really is.
This whole approach also assumes that the past (the sample data) is a good way of predicting the future - but is it? If the world around you changes and evolves, these models can become redundant really quickly.
There are a range of ways that the creation of the model can go wrong - the process is as much art as it is science - including choosing the wrong variables to include, failing to test the model on new data (so you build one which is brilliant at explaining the sample data but has no predictive value) and many more.
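That last failure mode - a model that is brilliant on the sample but has no predictive value - is why practitioners hold back part of the data and never let the model learn from it. A minimal Python sketch of the idea, with synthetic height data and made-up weightings, an invented 80/20 split purely for illustration:

```python
import random

random.seed(0)

# Synthetic sample again: (dad, mum, child) heights in cm, generated
# from an invented 0.55/0.45 blend plus noise.
data = [(d, m, 0.55 * d + 0.45 * m + random.gauss(0, 2))
        for d, m in ((random.uniform(165, 195), random.uniform(150, 180))
                     for _ in range(100))]

# Hold back 20% of the sample - the model never sees it during training.
random.shuffle(data)
split = int(0.8 * len(data))
train, holdout = data[:split], data[split:]

def avg_error(weights, sample):
    """Average gap (cm) between predicted and actual child height."""
    x1, x2 = weights
    return sum(abs(x1 * d + x2 * m - c) for d, m, c in sample) / len(sample)

# Suppose training on `train` produced these weightings (made up here).
weights = (0.54, 0.46)

print("training error:", round(avg_error(weights, train), 1))
print("hold-out error:", round(avg_error(weights, holdout), 1))
```

If the hold-out error is much worse than the training error, the model has memorised the sample rather than learned anything general from it - which is exactly the trap of testing a model only on the data it was built from.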
In other words, there is no magic answer in these AI black boxes. It is hugely important that we as business leaders understand and exploit the value that these new technologies can deliver for us, but also that we understand their limitations.
Students using ChatGPT to write their essays are discovering this alongside us in real time. ChatGPT is another example of this kind of technique but applied to language, so it is an engine which is really good at working out which word should come next in a sentence. As a result, it will write you a really well written and fluent essay on your chosen topic, but has no idea whether what it is writing is actually true or not.
We, just like the students, will be better served by these new technologies when we understand both their strengths and their weaknesses, and deploy them accordingly.
P.S.
If you like this kind of stuff, there is lots more of it in my book, The Average is Always Wrong, which was written precisely to de-mystify data and analytics for business leaders - have a look here:
P.P.S.
Moving Tribes is travelling for the next few weeks, so posts will be occasional and shorter until normal service is resumed in June. That said, I will look forward to, and respond to, all of your feedback as normal, so please do get involved, comment and share this post - I’m always grateful.