Business projects and sales programmes often run to double the time and double the cost: how would Bayes have accounted for, and planned around, this?
I now turn to important, sometimes critical, time measures used in business decision-making, strategic planning and valuation: ‘sales cycle’ time, customer lifetime, and various ‘time-to-market’ quantities such as the time to a proof of concept or the time to develop a first version of a product.
Bayesian analysis enables us to make good, common-sense estimates in this area, where frequency statistics fails. It allows us to combine sparse past observations of positive cases, all of our recent observations where no good result has yet happened, and our subjective knowledge, all treated together in an objective way, using all of this information and data and nothing but this. That is, it is a maximum entropy treatment of the problem, in which we use only the data we have and nothing more, as accurately as is possible.
We model the time taken to success, t, measured in ‘quarters’ of a year, as exponentially distributed, so that the probability that success takes longer than t is e^{−λt} for any t > 0; λ will be the mean rate of success for the next case in point. We have available some similar prior data, gathered over a period of t quarters, in which we had n clients and r ≤ n successful sales (footnote 0).
Let

$$T \;=\; r\hat{t} + (n-r)\,t$$

be the total number of quarters (i.e. three-month periods of time) over which we have observed a lack of success in selling something, e.g. a product or service, and where

$$\hat{t} \;=\; \frac{1}{r}\sum_{j=1}^{r} t_j$$

is the mean of the observed times to success, t_j being the time for the jth successful case.
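As a sketch of the intermediate step (assuming the usual treatment of the (n − r) not-yet-successful cases as observations censored at time t): each success contributes a factor λe^{−λt_j} and each case with no sale yet contributes e^{−λt}, so the evidence from the test depends on the data only through r and T,

$$L(\lambda \mid D) \;=\; \prod_{j=1}^{r} \lambda e^{-\lambda t_j}\, \prod_{k=1}^{n-r} e^{-\lambda t} \;=\; \lambda^{r}\, e^{-\lambda\left(r\hat{t}+(n-r)t\right)} \;=\; \lambda^{r}\, e^{-\lambda T}.$$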
Let the inverse, θ = λ^{−1}, be the mean time to success: the quantity we want to estimate predictively, and track or monitor carefully, ideally in real time, from as early as possible in our business development efforts. It might be, for example, the mean sales-cycle time, i.e. the time from first contact with a new client to the time of the first sale, or possibly the time between sales, new marketing campaigns, product releases or versions, and so on. We shall create an acceptance test, at some level of credibility or rational degree of belief P that my team of executives is comfortable with or interested in, for this θ to be above a selected test value θ0.
I wish to obtain an expression for the probability that the predicted time to success, in quarters, is above (or below) θ0, in terms of θ0 and T, n and r, i.e. given all the available evidence.
By our hypothesis (model), the probability that the lifetime exceeds θ0, for a given λ, is e^{−λθ0}.
The prior probability expressing our subjective belief about the mean time taken, t_s, is taken to be exponential around this value, p_s(λ) = t_s e^{−λt_s}, which is the maximally-equivocal (footnote 1), most objective assumption.
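As a quick check on what this prior encodes (a small worked step, not essential to the argument): it is properly normalised, and its mean rate corresponds to a mean time of exactly t_s,

$$\int_0^{\infty} t_s e^{-\lambda t_s}\, d\lambda \;=\; 1, \qquad \langle \lambda \rangle \;=\; \int_0^{\infty} \lambda\, t_s e^{-\lambda t_s}\, d\lambda \;=\; \frac{1}{t_s}.$$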
The probability ‘density’ for a given value of λ, given the evidence in the test data (which enters only through r and T) and our best expert opinion, is

$$p(\lambda \mid D)\,d\lambda \;\propto\; \lambda^{r} e^{-\lambda T} \times t_s e^{-\lambda t_s}\, d\lambda \;\propto\; \lambda^{r} e^{-\lambda (T + t_s)}\, d\lambda.$$
Multiplying the probability that the time is greater than θ0, namely e^{−λθ0}, by this density for each value of λ, and integrating over all positive values of λ, I find that the probability that the next sale, or next case of customer lifetime, or time to sale, takes longer than our selected test value θ0 is

$$p(D,\theta_0) \;=\; \left(\frac{T}{T + \theta_0}\right)^{r+1},$$

where p(D,θ0) is the posterior probability as a function of our data D and the (acceptance) case in point θ0. After some straightforward algebra the result reduces to this simple expression, from which one can obtain the numerical value, with T having been shifted by the inclusion of the subjective expert time, T → T + t_s; t_s is our subjective, common-sense, maximum-entropy prior belief as to the mean length of time, in quarters, for this quantity.
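For completeness, the ‘straightforward algebra’ is just the standard Gamma integral, ∫_0^∞ λ^r e^{−aλ} dλ = r!/a^{r+1}; writing T for the shifted total T + t_s, the normalisation divides out and

$$p(D,\theta_0) \;=\; \frac{\int_0^{\infty} e^{-\lambda\theta_0}\, \lambda^{r} e^{-\lambda T}\, d\lambda}{\int_0^{\infty} \lambda^{r} e^{-\lambda T}\, d\lambda} \;=\; \frac{r!\,/\,(T+\theta_0)^{r+1}}{r!\,/\,T^{\,r+1}} \;=\; \left(\frac{T}{T+\theta_0}\right)^{r+1}.$$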
Suppose we require a probability of (1 − P) × 100% that our mean sales-cycle time for the next customer, or time-to-market for a product or service, is less than some time θ0; equivalently, the probability of exceeding θ0 must be below P. I thus test whether p(D,θ0) < P. If this inequality is true, then (having chosen P accordingly) our team will accept and work with this case, because it is sufficiently unlikely for us that the time to sale, or sales cycle, is longer than θ0. Alternatively, I can determine what θ0 is for a given limiting value of P, say 20%. For example: take some data where n = 8 and r = 6, the expert belief is that the mean sales time is t_s = 4.25 quarters, i.e. just over a year, the specific successes occurred at, say, t_j = (3, 4, 4, 4, 4.5, 6) quarters, corresponding to our r = 6, and we run the new test for t = 2 quarters. We want to be 80% sure that our next sales endeavour (or similar) will not last more than some θ0 that we want to determine. Putting in the values, I find that T = 33.75; continuing, to determine θ0, I find that with odds of 4:1 on, the time/lifetime/time-to-X is no greater than 8.7 quarters.
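To make this concrete, here is a short Python sketch of the whole calculation (the function names total_time, prob_exceeds and theta0_for are my own, purely illustrative); it reproduces the numbers above:

```python
def total_time(success_times, n, t, t_s=0.0):
    """Shifted total waiting time T = r*t_hat + (n - r)*t + t_s, in quarters."""
    r = len(success_times)
    return sum(success_times) + (n - r) * t + t_s

def prob_exceeds(theta0, T, r):
    """Posterior probability p(D, theta0) that the next time exceeds theta0."""
    return (T / (T + theta0)) ** (r + 1)

def theta0_for(P, T, r):
    """Invert p(D, theta0) = P for theta0: the time we are (1 - P)-sure of staying under."""
    return T * (P ** (-1.0 / (r + 1)) - 1.0)

# First worked example: n = 8 clients, r = 6 successes, test run for t = 2 quarters,
# expert prior mean time t_s = 4.25 quarters.
successes = [3, 4, 4, 4, 4.5, 6]
T = total_time(successes, n=8, t=2, t_s=4.25)
print(T)                          # 33.75
print(theta0_for(0.2, T, r=6))    # ~8.7 quarters, i.e. odds of 4:1 on
print(prob_exceeds(8.7, T, r=6))  # ~0.20, as required
```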
Suppose that we had more data: say the same average time to success, t̂ = 4.25 quarters, but now with r = 15 actual successes out of n = 20 trials, again observed over t = 2 quarters. We decide to rely on the data and simply set t_s = t̂. Now T = 78. Keeping the same acceptance probability, or odds requirement, of 80%, or 4:1 on, we find θ0 ≤ 8.25 quarters. If we were considering customer lifetime, rather than sales-cycle time or similar measures such as time to proof of concept, we benefit when the lifetime of the customer is more than a given value of time θ0, and so we may instead require the probability of exceeding θ0 to be high, e.g. p(D,θ0) > 80%, and so on.
If we omit the quantity t_s, we find that the threshold is θ0 = 7.8 quarters, only a small tightening, since the weight of one subjective ‘data point’ is much smaller than the effect of so many, O(n), ‘real’ data points.
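The second example takes only a few lines with the same illustrative helper (theta0_for as sketched above), shown here self-contained:

```python
def theta0_for(P, T, r):
    # Invert p(D, theta0) = (T / (T + theta0)) ** (r + 1) = P for theta0.
    return T * (P ** (-1.0 / (r + 1)) - 1.0)

r, P = 15, 0.2                            # 15 successes; accept at odds of 4:1 on
T_with_prior = 15 * 4.25 + 5 * 2 + 4.25   # T = 78, subjective t_s included
T_data_only  = 15 * 4.25 + 5 * 2          # T = 73.75, t_s omitted

print(theta0_for(P, T_with_prior, r))     # ~8.25 quarters
print(theta0_for(P, T_data_only, r))      # ~7.8 quarters
```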
Now I wish to consider the case where we run a test for a time t with n opportunities. After a time t we obtain a first success (footnote 2), so that r = 1 and we note that t̂ = t; I then also set t_s = t. T reduces to T = (n + 1)t, and if we look at the case θ0 = t, our probability reduces to an expression that is a function of n alone:

$$p(D, t) \;=\; \left(\frac{(n+1)\,t}{(n+1)\,t + t}\right)^{2} \;=\; \left(\frac{n+1}{n+2}\right)^{2}.$$

Since ∞ > n ≥ 1,

$$\frac{4}{9} \;\le\; p(D, t) \;<\; 1,$$

i.e. if we are only testing one case and we stop the test after time t with that one success, r = 1 = n, we obtain our minimal probability, 4/9, that the next time is greater than t, all of which agrees with common sense. It is interesting that the only case in which we can achieve a greater than 50:50 probability that the next time is less than t = t_s = t̂ is when we tested only n = 1 case to success. This is of course probing the niches of sparse data, but in business one often wishes to move ahead with a single ‘proof of concept’, and it is interesting to be able to quantify the risks in this way.
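A tiny numerical sketch shows how quickly this probability climbs with n:

```python
# p(D, t) = ((n + 1) / (n + 2)) ** 2 after a single first success at time t,
# having tested n opportunities for that long.
for n in range(1, 6):
    print(n, round(((n + 1) / (n + 2)) ** 2, 3))
# prints 0.444, 0.562, 0.64, 0.694, 0.735 for n = 1..5
```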
Consider now the (extreme) case where we have no data at all, only our subjective belief (footnote 3), quantified as t_s. Let us take

$$\theta_0 \;=\; m\, t_s,$$

with m an integer; then our probability p(∅, θ0) of taking longer than this reduces to

$$p(\varnothing, \theta_0) \;=\; \frac{t_s}{t_s + \theta_0} \;=\; \frac{1}{m+1}.$$

This means that at m = 1 the probability of being greater (or less) than θ0 is a half, which is common sense. If we want odds of, say, 4:1 on, i.e. a probability of only 20% of being above θ0 quarters, then we require m = 4, and the relationship between the odds (to 1) and m is simple.
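To spell out that simple relationship in one line: the odds on the time staying below θ0 are

$$\frac{1 - p(\varnothing,\theta_0)}{p(\varnothing,\theta_0)} \;=\; \frac{1 - \tfrac{1}{m+1}}{\tfrac{1}{m+1}} \;=\; m,$$

so requiring odds of m : 1 on is exactly the same as setting θ0 = m t_s.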
Again, this all accords with common sense, but it shows us how to deal with a near or complete absence of data, as well as how the situation changes as more data arrive. The moral is that for fairly sparse data, when we seek a relatively high degree of belief about the sales or the time needed the next time we attempt something, the Reverend Bayes is not too forgiving, although he is more forthcoming with useful, concise information than an equivalent frequency-statistics analysis would be. As we accumulate more and more data, we can see its value very directly, since we have quantified how our risks are reduced as it comes in.
The results seem to fit our experiences with delays and over-budget projects. We must take risks with our salespeople and our planning times, but with this analysis, we are able to quantify and understand these calculated risks and rewards and plan accordingly.
One can extend this model to a two-parameter model that reduces to it, but which allows for a shape (hyper)parameter. This gives flexibility around prior knowledge, such as the general observation that an immediate success, failure or, generally, an immediate ‘event’ is not common, and around the position of the mean relative to the mode; it also allows for learning/unlearning, since the resulting process need not be memoryless (see another blog here!).
- or customer lifetimes, or types of time-to-market, or general completions/successes, etc.↩︎
- highest entropy, the uncertainty measure being $S = -\sum_s p_s \log p_s$.↩︎
- e.g. a sale in a new segment/geography/product/service↩︎
- if we neither have any data nor a subjective belief, the model finally breaks down, but that is all you can ask of the model, and a good Bayesian would not want the model to ‘work’ under such circumstances!↩︎