
When to Use Bayesian Probability and When to Use Frequentist Statistics

When to use the Bayesian approach

In the following situations, I might want to use Bayes’ approach:

• I have quantifiable prior beliefs. These may come from experienced internal colleagues, external ‘experts’, or other subjective sources.

• The data may be ‘sparse’ or limited (now or for the foreseeable future), and certainly not ‘big’; the data often will, but may not, dominate our prior, subjective beliefs.

• There is medium or high uncertainty involved.

• I wish to make consistent, sound decisions in the face of and acknowledging my uncertainty.

• I wish to do this in such a way that I can be honest with my stakeholders, shareholders, team, wider staff, investors, board, and so on.

• The model or data-generation methods will involve one or multiple parameters (such as profit, share price, average customer lifetime, transaction value, sales, cost, COGS, and so on).

• I cannot [wait to] trial in an idealised experiment. In dynamic environments, this is one of the key problems with frequentist approaches: we never have the same situation and data twice. The Bayesian approach naturally revises and updates (a small sketch of such updating appears at the end of this section).

• I want to know what it is best to do, or understand what the options are and which ones are better or worse for me and my team in the here and now, for this occasion and situation. In life, it’s rare to be able to wait for ‘the long run’, but using recent prior data is often useful.

• I want to use all the new data available to me, and be able to eliminate noise as best I can.

• I don’t want to choose an arbitrary approach; I want to use logic. I want the logic of the inferences to be ‘leakproof’, so that only the assumptions can be inappropriate. Throughout this book, we’ll see some simple and more complicated examples of using logical probability.

Finally, Bayesian methods keep type. As Jaynes (1976) explained, the outputs are of the same character as the inputs: if the data used are imaginary or pseudo-random, the resulting probability distributions will be imaginary or pseudo-random; if the data are real frequencies, the output probabilities will be real frequencies; if the prior data are taken from what it is reasonable to believe, then the output probabilities will also represent what it is reasonable to believe; and so on.
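To make the ‘revise and update’ point above concrete, here is a minimal sketch of conjugate Beta-Binomial updating in Python; the prior parameters and the data batches are invented purely for illustration:

```python
from scipy.stats import beta

# Illustrative prior belief about a conversion rate: Beta(2, 8), mean 0.2.
a, b = 2, 8
batches = [(3, 10), (5, 20), (11, 40)]   # invented (successes, trials) batches

for successes, trials in batches:
    a += successes               # conjugate update: add observed successes...
    b += trials - successes      # ...and observed failures
    lo, hi = beta.ppf([0.05, 0.95], a, b)
    print(f"posterior mean {a / (a + b):.3f}, "
          f"90% credible interval ({lo:.3f}, {hi:.3f})")
```

Each batch shifts the posterior; no idealised repeated experiment is required, and the prior matters less and less as the data accumulate.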

We first compare approaches to statistics and probability.

Comparing the Frequentist and the Bayesian approaches to probability

In idem flumen bis descendimus, et non descendimus (‘Into the same river we step down twice, and yet do not’) – Heraclitus, via Seneca, L. A., Epistulae Morales LVIII.23

‘Frequentist’ describes the approach to statistics that is still the most commonly used. Here we define frequency and compare the frequentist with the Bayesian approach. The frequency definition of probability is the orthodoxy: the probability is defined as the number of successes, say m, in a large number, n, of identical trials, i.e. the probability is taken to be the frequency m/n. There are laws of (large) numbers that lead us to believe that, for high enough n, we shall have a good description of the propensity of an event happening.
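As a quick numerical sketch of this definition (an illustration of the law of large numbers, with an invented propensity), we can simulate identical trials in Python and watch the frequency m/n settle:

```python
import numpy as np

rng = np.random.default_rng(1)
p_true = 0.3                                  # invented propensity for the demo
trials = rng.random(100_000) < p_true         # 100,000 'identical trials'
m_over_n = trials.cumsum() / np.arange(1, trials.size + 1)
for n in (10, 100, 10_000, 100_000):
    print(n, m_over_n[n - 1])                 # m/n wanders, then hugs 0.3
```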

However, a problem with frequentist statistics is highlighted by analogy in the saying above, attributed to Heraclitus, and by the apocryphal Buddhist monks. The river changes; we never step into the same river twice, though we go down to the ‘same river.’

In the table below comparing approaches, we see that the dynamics of what is being modelled, i.e. ‘reality’, are best approached with a model that changes in real time with the latest information, rather than one that is descriptive and merely notes the unusualness of sample or batch information. One approach is subjectivist and relativist, while the other remains objectivist. We have seen how subjectivist theories like quantum mechanics and general relativity superseded what went before; these are two very finely tested theories. Is this the moment subjectivist approaches in probability logic will arrive?

Summary comparison: Bayesian vs Frequentist Approaches

| Bayesian | Frequentist |
| --- | --- |
| Inferential, prescriptive | Descriptive |
| The here and now, and the next… | Long-run behaviour, hoping things persist as-is |
| Useful, intuitive result | Possibly large number of conflicting results |
| Elegant, simple mathematics | Arbitrary convention & complexity |
| Weight of evidence, credible intervals | Significance, ‘p-values’, confidence intervals |
| Probability as rational degree of belief | Probability as frequency of occurrence |
| Leakproof, logical probability theory | Ad hoc devices, possibly irrelevant information |
| Equivocation, best model choice | Model then test samples |
| Unique outcome of one experiment | Accept or reject batch vs population |
| Emphasises revision as data comes in | Notes the sample data |
| Data fixed, parameters unknown | Data is just one of many possible realisations |
| Unknowns can be constants | Unknowns are random variables |
| Doesn’t apply in all situations | Ditto, but works most of the time with minimal assumptions |
| Uses all of the data, optimally | Often does not use all of the data, or not fully |
| Doesn’t require us to understand degrees of freedom or sufficient statistics | We must understand and compute the degrees of freedom |
| Common-sense results; inappropriate inference tracks back transparently to the assumptions | Sometimes non-common-sense results, failure without obvious recourse, or poor inference |
| Focus on the scientific, mathematical, or business merits | Focus on overcoming technical difficulties of the methods |

Table: Comparison of Bayesian (left column) and Frequentist (right column) methods

I have deliberately left out the somewhat contentious issue (to some) of ‘Prior’ distribution selection, but cover this issue in my book:

The Art of Decision, out soon with Big Bold Moves Publishing.

Impossible Decisions…

adeo nihil est cuique se vilius (‘so true is it that nothing is cheaper to a man than himself’)
Seneca, L. A., Epistulae Morales XLII.7

Christmas last, my family was gathered around a table, opening crackers.

It seems that each year the crackers and the accompanying box get heavier. This year, along with the customary brightly-coloured paper hat, joke, and philosophical thought, there was a small book of cards, which a relative clutched.

These were labelled ‘Impossible Decisions!’ It was perhaps an idea from someone, somewhere, to help those who can no longer chat convivially at the table among kin and cotton. Soon, challenges such as the following were being read out with gusto:

Would you rather you could only speak in rhyme or could only communicate through drawings?

It was overwhelming to see 100 such conundra collected together like an anthology of short poems.

I’ve been thinking about decisions for a long time: some of that thinking, as a younger man, was a sort of preparation for what was ahead; some came later in life, as I was confronted with various apparently important decisions, including those appearing as a consequence of a force majeure. Forces majeures may remove the optionality, though, and make the decision for one…

Circa 1275 AD, a real decision had to be made by Bondone, a humble tiller of the soil.

Who was Bondone? He was Giotto’s father. According to Vasari (The Lives of the Painters, Sculptors and Architects, 1550), a gentleman artist called Cimabue, passing through their village, noticed the drawings the ten-year-old Giotto had made of animals, such as the few sheep his father had given him to tend, and of the nature nearby. Cimabue was so impressed that he wanted to take the boy to his studio in Florence. Giotto was happy to go, but said that his father would have to give his blessing.

There is the decision for Bondone: keep my son to help till the soil and look after the sheep in the village, perhaps taking over everything before I age and weaken; or let him go off, away from me and our humble and rooted family, to the Big City with an unknown gentleman, to learn the skills of various fine arts. Perhaps because Giotto had a number of siblings, the little one was sent packing, if indeed there was much to pack.

Years later, having fraternised with people like Dante, whose portrait he painted, Giotto became known as a great artist; his portrayal of a frightened Christ child being presented by His mother Mary to Simeon, in the private Scrovegni Chapel in Padua, is said to be an extraordinary thing. The risks did not materialise, and the positives, we assume, outweighed the family’s loss of the talented son.

We must of course note the very different subjective viewpoints of ‘general posterity’ and of Bondone, his son, and their family, facing that decision in their village 14 miles from Florence in the late thirteenth century.

Seneca urges his reader (particularly Lucilius, to whom he was writing, but in the end all of us) to think not only about the values, the positives, of a choice, but also about the negatives, and he lists some of them:

danger, anxiety, lost honour, lost personal freedom, and lost time, among others.

Many of life’s decisions do not have a simple and immediate answer, but we can choose to try to make them in a better way, and there is a selection of methods to choose from.


I put it to you that there are better and worse ways to make them, and that choosing to be consistent may well be better in the end.

Is the effort required to go beyond ‘gut instinct’ worth its gains? When is this so?

There are some very large decisions that perhaps really ought to be made more rigorously.

Following the wisdom of ancients like Seneca, we can all learn to assess the real rather than the notional position, to ‘own ourselves’, to avoid over-valuing or (perhaps more often) under-valuing ourselves, and to find our own way forward.

We do not give up…

New Probability Tables

Student-b

Dear Sir/Madam,

In this letter, I derive and tabulate the maximum entropy values for the probabilities of each side of a biased n-sided die, for n = 3, 4, 6, 8, 10, 12, 15, and 20. These probabilities for each of the n options (sides) are those which carry the least input information beyond what we know, which is nothing more than the bias, or average score, on the n-sided die. I generalise the “Brandeis dice” problem from E. T. Jaynes’ 1963 lectures, from the 6-sided case to an n-sided die. To calculate these probabilities, I obtain the solution of an (n+1)th-order polynomial equation, derived using a power series identity, for the value of the Lagrange multiplier, λ. The resulting maximally-equivocated prior probabilities, at the 5th, 17th, 34th, 50th (fair), 66th, 83rd, and 95th percentiles of the range from 1 up to n, will aid decision-making where the options are the conditions we cannot influence, but across which we may have a non-linear payoff.

We use the standard variational principle to maximise the entropy of the system, subject to the constraints:

$$\sum_{i=1}^{n} p_i\, f_k(x_i) = F_k$$

$$\sum_{i=1}^{n} p_i = 1$$

where the index k is not summed over in the first equation, and where the $p_i$ are the probabilities of the n options, e.g. the sides of an n-sided die. The $F_k$ are the numbers given in the problem statement (constraints or biases), and the $f_k(x_i)$ are known functions of the options $x_i$. The second equation is just the probability axiom requiring the probabilities to sum to one. This set of constraints is solved by using Lagrange multipliers $\lambda_k$. The formal solution is

$$p_i = \frac{1}{Z}\exp\left[-\lambda_1 f_1(x_i) - \cdots - \lambda_m f_m(x_i)\right]$$

where $Z(\lambda_1, \ldots, \lambda_m) = \sum_{i=1}^{n} \exp\left[-\lambda_1 f_1(x_i) - \cdots - \lambda_m f_m(x_i)\right]$ is the partition function, and the $\lambda_k$ are the set of multipliers, of which, for a solution to the problem, there need to be fewer than n; in our current problem, as we shall see, there is only one. The constraints are satisfied if:

$$F_k = -\frac{\partial}{\partial \lambda_k} \log_e Z$$

for k ranging from 1 to m.

Our measure of entropy is given by $S = -\sum_{i=1}^{n} p_i \log_e p_i$, and in terms of our constraints, i.e. the data, this function is:

$$S(F_1, \ldots, F_m) = \log_e Z + \sum_{k=1}^{m} \lambda_k F_k$$

The solution for the maximum of S is:

$$\lambda_k = \frac{\partial S}{\partial F_k}$$

for k in the same range, 1 up to m. For our set of n-sided dice, m = 1, and so I can simplify $F_k$ to F. The $f(x_i)$ are simply the values i on the n sides of our die, i.e. $f(x_i) = i$.
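(A small numerical check of the relation $F = -\partial \log_e Z / \partial \lambda$ for this choice $f(x_i) = i$, with an arbitrary test value of λ; the variable names are merely illustrative.)

```python
import numpy as np

n, lam, h = 6, 0.3, 1e-6                 # arbitrary test values
i = np.arange(1, n + 1)                  # f(x_i) = i, the die faces
Z = lambda l: np.exp(-l * i).sum()       # partition function
p = np.exp(-lam * i) / Z(lam)            # maximum entropy probabilities
print((i * p).sum())                     # F as the mean of f under p
print(-(np.log(Z(lam + h)) - np.log(Z(lam - h))) / (2 * h))  # -d(log Z)/d(lam)
# The two printed numbers agree to ~1e-9.
```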

For the problem at hand of the biased die, I introduce the quantity q, which I define as the tested, trusted average score on the given n-sided die in hand. That is, I set F = q here, our bias constraint number, which can range from the lowest die value, 1, through to the highest value, n. If

$$q = q_0 := \tfrac{1}{2}(n+1),$$

i.e. 3.5 on a 6-sided die, then the die is fair; otherwise, it has a bias and therefore an additional constraint. I assume this is all I know and believe about the die, other than the number of sides, n.

We see that $\lambda_k$ becomes just $\lambda$, and the equation for $F_k$ reduces to

$$F = -\frac{\partial}{\partial \lambda} \log_e Z$$

and the equation for S reduces to

$$S(F) = \log_e Z + \lambda F$$

and its solution is

$$\lambda = \frac{\partial S}{\partial F}$$

After a little algebra (writing $x = e^{-\lambda}$, so that $p_i \propto e^{-\lambda i} = x^i$ and Z becomes the geometric sum $\sum_{i=1}^{n} x^i$), I found that the partition function Z is given by

$$Z = \frac{x(x^{n} - 1)}{x - 1}$$

and after some further algebra, I found that, in order to determine the value of x corresponding to the maximum entropy (least input information) set of probabilities, we must find the positive, real root, other than unity, of the following equation:

$$(n-q)\,x^{n+1} + \bigl(q-(n+1)\bigr)\,x^{n} + q\,x + (1-q) = 0$$

By inspection, this equation is always satisfied by the real solution x = 1, which corresponds to the fair or unbiased die, with all probabilities equal to 1/n for n sides. We need the other positive real root, and we obtain this by simple numerical calculation. From the solution $x = x_q$ for the given value of the bias q, the set of maximum entropy probabilities $p_i = x_q^{\,i}/Z$ for each side of the relevant n-sided die is easily generated.
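For the interested reader, here is a minimal Python sketch of that numerical calculation (the function name and tolerances are merely illustrative): build the polynomial above, take the positive real root away from x = 1, and form the probabilities $p_i \propto x^i$.

```python
import numpy as np

def maxent_die_probs(n, q):
    """Maximum entropy probabilities for an n-sided die with mean score q."""
    if abs(q - (n + 1) / 2) < 1e-12:     # fair die: no bias constraint binds
        return np.full(n, 1.0 / n)
    # (n-q)x^(n+1) + (q-(n+1))x^n + qx + (1-q) = 0, highest power first;
    # the powers x^(n-1) ... x^2 have zero coefficients.
    coeffs = np.zeros(n + 2)
    coeffs[0], coeffs[1] = n - q, q - (n + 1)
    coeffs[-2], coeffs[-1] = q, 1 - q
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-7].real
    x = next(r for r in real if r > 0 and abs(r - 1) > 1e-6)
    p = x ** np.arange(1, n + 1)         # p_i proportional to x^i
    return p / p.sum()

p = maxent_die_probs(3, 1.1)             # n = 3 at the 5th-percentile bias
print(np.round(p, 4))                    # ≈ [0.9078 0.0843 0.0078]
print(round(-(p * np.log(p)).sum(), 4))  # entropy ≈ 0.3343, as tabulated
```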

The following tables may be of use in decision-making in business and other contexts, especially where the agent (the organisation or individual making the decision) has a non-linear desirability or utility function over the outcomes (i.e. the values of the discrete set of possible options), does not have perfect intuition, and does not wish to put any more information into the decision than is within the agent’s state of knowledge.

I present tables for n = 3, 4, 6, 8, 10, 12, 15, and 20 here, each at 7 bias values of q, corresponding to 5%, 17%, 34%, 50%, 66%, 83%, and 95% of the range from 1 to n. There is a transformation group symmetry in this problem: if i represents the side with i spots up, then when we reflect $i \to n+1-i$ and transform $x \to 1/x$, we obtain the same probability; e.g. the probability of a 1 on a six-sided die at the 5th-percentile bias is the same as that of a 6 at the 95th-percentile bias. This is why, in the tables, we can observe the corresponding symmetry in the values of the probabilities and in the entropy, which is greatest of all when there is no bias and thus no constraint. Readers may wish arbitrarily to adjust any of the probabilities in the tables in the appendix (keeping the sum at one and the average score at the same q) and recalculate the entropy $S = -\sum_{i=1}^{n} p_i \log_e p_i$, which will be lower than the maximum entropy value in the table, as in the sketch below.
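For instance (an illustrative check in Python, using the n = 3, q = 1.1 column of the first table; the adjusted numbers are arbitrary but keep the total at one and the average score at 1.1):

```python
import numpy as np

row     = np.array([0.9078, 0.0843, 0.0078])  # tabulated maximum entropy column
tweaked = np.array([0.92,   0.06,   0.02])    # arbitrary, same sum and same mean
for p in (row, tweaked):
    print(-(p * np.log(p)).sum())             # ~0.3343, then ~0.3238: lower
```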

APPENDIX Student-b Maximum Entropy Probability Tables

n=3
q05 q17 q34 q0 q66 q83 q95
q-vals 1.1 1.34 1.68 2 2.32 2.66 2.9
Score Probabilities
1 0.9078 0.7232 0.5064 0.3333 0.1864 0.0632 0.0078
2 0.0843 0.2137 0.3072 0.3333 0.3072 0.2137 0.0843
3 0.0078 0.0632 0.1864 0.3333 0.5064 0.7232 0.9078
Entropy 0.3343 0.7386 1.0203 1.0986 1.0203 0.7386 0.3343
n=4
q05 q17 q34 q0 q66 q83 q95
q-vals 1.15 1.51 2.02 2.5 2.98 3.49 3.85
Score Probabilities
1 0.8689 0.6425 0.4136 0.25 0.1241 0.0324 0.002
2 0.1141 0.2374 0.2769 0.25 0.1854 0.0877 0.015
3 0.015 0.0877 0.1854 0.25 0.2769 0.2374 0.1141
4 0.002 0.0324 0.1241 0.25 0.4136 0.6425 0.8689
Entropy 0.445 0.9502 1.2921 1.3863 1.2921 0.9502 0.445
n=6
q05 q17 q34 q0 q66 q83 q95
q-vals 1.250 1.85 2.70 3.5 4.3 5.15 5.75
Score Probabilities
1 0.7998 0.5260 0.3043 0.1667 0.0721 0.0135 0.0003
2 0.1602 0.2527 0.2282 0.1667 0.0961 0.028 0.0013
3 0.0321 0.1214 0.1711 0.1667 0.1282 0.0583 0.0064
4 0.0064 0.0583 0.1282 0.1667 0.1711 0.1214 0.0321
5 0.0013 0.028 0.0961 0.1667 0.2282 0.2527 0.1602
6 0.0003 0.0135 0.0721 0.1667 0.3043 0.5260 0.7998
Entropy 0.6254 1.2655 1.6794 1.7918 1.6794 1.2655 0.6254


n=8
q05 q17 q34 q0 q66 q83 q95
q-vals 1.35 2.19 3.38 4.5 5.62 6.81 7.65
Score Probabilities
1 0.7407 0.4454 0.2412 0.125 0.05 0.0076 0.0001
2 0.1921 0.2489 0.1927 0.125 0.0626 0.0136 0.0002
3 0.0498 0.1391 0.1539 0.125 0.0784 0.0243 0.0009
4 0.0129 0.0777 0.1229 0.125 0.0982 0.0434 0.0034
5 0.0034 0.0434 0.0982 0.125 0.1229 0.0777 0.0129
6 0.0009 0.0243 0.0784 0.125 0.1539 0.1391 0.0498
7 0.0002 0.0136 0.0626 0.125 0.1927 0.2489 0.1921
8 0.0001 0.0076 0.05 0.125 0.2412 0.4454 0.7407
Entropy 0.7726 1.5012 1.9569 2.0794 1.9569 1.5012 0.7726
n=10
q05 q17 q34 q0 q66 q83 q95
q-vals 1.45 2.53 4.06 5.5 6.94 8.47 9.55
Score Probabilities
1 0.6896 0.3862 0.2 0.1 0.0381 0.005 0.0000
2 0.214 0.2382 0.1663 0.1 0.0458 0.0081 0.0001
3 0.0664 0.147 0.1383 0.1 0.055 0.0131 0.0002
4 0.0206 0.0907 0.115 0.1 0.0662 0.0213 0.0006
5 0.0064 0.0559 0.0957 0.1 0.0796 0.0345 0.002
6 0.002 0.0345 0.0796 0.1 0.0957 0.0559 0.0064
7 0.0006 0.0213 0.0662 0.1 0.115 0.0907 0.0206
8 0.0002 0.0131 0.055 0.1 0.1383 0.147 0.0664
9 0.0001 0.0081 0.0458 0.1 0.1663 0.2382 0.214
10 0.0000 0.005 0.0381 0.1 0.2 0.3862 0.6896
Entropy 0.8981 1.6905 2.1735 2.3026 2.1735 1.6905 0.8981


n=12
q05 q17 q34 q0 q66 q83 q95
q-vals 1.55 2.87 4.74 6.5 8.26 10.13 11.45
Score Probabilities
1 0.6451 0.3408 0.1708 0.0833 0.0306 0.0036 0.0000
2 0.2289 0.2255 0.1461 0.0833 0.0358 0.0055 0.0000
3 0.0812 0.1492 0.125 0.0833 0.0419 0.0083 0.0001
4 0.0288 0.0987 0.1069 0.0833 0.049 0.0125 0.0002
5 0.0102 0.0653 0.0915 0.0833 0.0572 0.0189 0.0005
6 0.0036 0.0432 0.0782 0.0833 0.0669 0.0286 0.0013
7 0.0013 0.0286 0.0669 0.0833 0.0782 0.0432 0.0036
8 0.0005 0.0189 0.0572 0.0833 0.0915 0.0653 0.0102
9 0.0002 0.0125 0.049 0.0833 0.1069 0.0987 0.0288
10 0.0001 0.0083 0.0419 0.0833 0.125 0.1492 0.0812
11 0.0000 0.0055 0.0358 0.0833 0.1461 0.2255 0.2289
12 0.0000 0.0036 0.0306 0.0833 0.1708 0.3408 0.6451
Entropy 1.0078 1.8491 2.3511 2.4849 2.3511 1.8491 1.0078
n=15
q05 q17 q34 q0 q66 q83 q95
q-vals 1.7 3.38 5.76 8 10.24 12.62 14.3
Score Probabilities
1 0.5882 0.2898 0.1402 0.0667 0.0236 0.0025 0.0000
2 0.2422 0.2063 0.1235 0.0667 0.0269 0.0035 0.0000
3 0.0997 0.1469 0.1087 0.0667 0.0305 0.0049 0.0000
4 0.0411 0.1046 0.0957 0.0667 0.0346 0.0069 0.0000
5 0.0169 0.0745 0.0843 0.0667 0.0393 0.0097 0.0001
6 0.007 0.053 0.0743 0.0667 0.0447 0.0136 0.0002
7 0.0029 0.0378 0.0654 0.0667 0.0507 0.0191 0.0005
8 0.0012 0.0269 0.0576 0.0667 0.0576 0.0269 0.0012
9 0.0005 0.0191 0.0507 0.0667 0.0654 0.0378 0.0029
10 0.0002 0.0136 0.0447 0.0667 0.0743 0.053 0.007
11 0.0001 0.0097 0.0393 0.0667 0.0843 0.0745 0.0169
12 0.0000 0.0069 0.0346 0.0667 0.0957 0.1046 0.0411
13 0.0000 0.0049 0.0305 0.0667 0.1087 0.1469 0.0997
14 0.0000 0.0035 0.0269 0.0667 0.1235 0.2063 0.2422
15 0.0000 0.0025 0.0236 0.0667 0.1402 0.2898 0.5882
Entropy 1.1517 2.0471 2.5698 2.7081 2.5698 2.0471 1.1517


n=20
q05 q17 q34 q0 q66 q83 q95
q-vals 1.95 4.23 7.46 10.5 13.54 16.77 19.05
Score Probabilities
1 0.5128 0.2318 0.108 0.05 0.0171 0.0016 0.0000
2 0.2498 0.1784 0.098 0.05 0.0188 0.0021 0.0000
3 0.1217 0.1372 0.0889 0.05 0.0207 0.0027 0.0000
4 0.0593 0.1056 0.0807 0.05 0.0229 0.0035 0.0000
5 0.0289 0.0812 0.0732 0.05 0.0252 0.0045 0.0000
6 0.0141 0.0625 0.0665 0.05 0.0278 0.0059 0.0000
7 0.0069 0.0481 0.0603 0.05 0.0306 0.0077 0.0000
8 0.0033 0.037 0.0547 0.05 0.0337 0.01 0.0001
9 0.0016 0.0285 0.0497 0.05 0.0371 0.013 0.0002
10 0.0008 0.0219 0.0451 0.05 0.0409 0.0169 0.0004
11 0.0004 0.0169 0.0409 0.05 0.0451 0.0219 0.0008
12 0.0002 0.013 0.0371 0.05 0.0497 0.0285 0.0016
13 0.0001 0.01 0.0337 0.05 0.0547 0.037 0.0033
14 0.0000 0.0077 0.0306 0.05 0.0603 0.0481 0.0069
15 0.0000 0.0059 0.0278 0.05 0.0665 0.0625 0.0141
16 0.0000 0.0045 0.0252 0.05 0.0732 0.0812 0.0289
17 0.0000 0.0035 0.0229 0.05 0.0807 0.1056 0.0593
18 0.0000 0.0027 0.0207 0.05 0.0889 0.1372 0.1217
19 0.0000 0.0021 0.0188 0.05 0.098 0.1784 0.2498
20 0.0000 0.0016 0.0171 0.05 0.108 0.2318 0.5128
Entropy 1.351 2.3085 2.8526 2.9957 2.8526 2.3085 1.351