Motivation for probability as degree of belief

We begin with some quotes from a few writers, financiers and probability mathematicians.

• Tagore: ‘If you shut the door to all errors, the truth will be shut out (too).’

• Chekhov: ‘Knowledge is of no value, unless you put it into practice.’

• Munger (4): ‘You don’t have to be brilliant, only a little bit wiser than the other guys on average, for a long time.’

• Markowitz (1987) (5): ‘The rational investor is a Bayesian.’


• Box and Tiao (1973): ‘It follows that inferences that are unacceptable must come from inappropriate assumptions, not from our inferential logic.’

• Jaynes (1976): ‘. . . the record is clear; orthodox arguments against Bayes’ theorem, and in favour of confidence intervals, have never considered such mundane things as demonstrable facts concerning performance.’

• Feller (1968): ‘In the long run, Bayesians win.’

Well, the mathematicians and financiers are explicit about Bayesian probability, while the writers are so only indirectly; Tagore and Chekhov were astute. If we admit no alternatives, our approach and its probabilities will often be seriously affected, a point Tagore appears to allude to. And how often do we lament not using, or not remembering, all that we know; Chekhov seems aware of this.

We seem, in these times, to be prepared to exclude possibilities, ‘worlds’, options. In doing so, if we look at Bayes’ famous theorem for a minute, we see our inevitable error.

Munger’s and Feller’s quotations go hand in hand and lead us to the approach discussed here: logical or inductive, plausible inference. Our approach acknowledges and is guided by the above motivating words. Let us try to spot places in life where we might think of some of them.

How do you or your team make decisions?

If you don’t have a ‘plan’ for this, then my Art of Decision book should give you something to work with. Many people choose to make their decisions intuitively and rapidly (2). This may well suit many of us. But can you justify it? Is there nothing that a bit more reflection and logic can bring, on average and over the long run, that might help improve the way we bash forwards in life, business, diplomacy and statecraft?

What is good and easy, and what is not good and hard?

If there was any ‘give’ at all from the reader in answering the first question above, then this second one is a nice follow-up. If you can assess what has been ‘not good and hard’ in terms of decisions, then read on: there may just be something here for you.

What is probability as extended logic? (3)
Probability is our rational degree of belief . . . in the truth of a proposition.

Logic is the mode of reasoning. It is logic that is extended from only applying to binary certainty and impossibility to. . .

. . . all states of degree of belief (uncertainty) in between these two endpoints. 

—-

(2) As described in the book ‘Blink’ by Malcolm Gladwell.

(3) This phrase was used, for example, by Jaynes (2003).

—-

What is decision theory?

The description of the process of how to use the information available to determine the most desirable action available to the ‘Agent’ (the person or organisation who acts).

These definitions are general and seem to allow wide application. As a bonus, the ideas that underpin decision-making, i.e. our topic, also relate to artificial intelligence and machine learning, and so will be of interest to those trying to give themselves a good base for understanding those rapidly developing areas.

—-

(4) d. 2023 at age 99. With Warren Buffett, he built up a fund of almost $1 trillion over several decades. 
(5) Markowitz, H. M. (1987), Mean-Variance Analysis, Oxford: Blackwell.

—-

We turn to the great Italian scholar de Finetti, who said in 1946: ‘Probability does not exist’.

What did he mean by this? Here we shall look at Bayesian subjective (6) probability as extended logic, compare it with orthodox, frequentist, ad hoc statistics, and weigh the pros and cons of probability, utility and Bayes’ logic, asking why it is not used more often. In card games such as contract bridge, the players hold partial and differing states of knowledge, and there is the separate notion of ‘missing a trick’, that is, making a mistake given what we knew. Subjective probability and Bayes-backed decision logic are about being rational and avoiding costly missed ‘tricks’, especially over time or in the long run, by virtue of ‘playing’ consistently optimally.

The quotation above is from de Finetti (7), who was known for his intellect and beautiful writing. He meant that your probability of an event is subjective rather than objective: probability does not exist, in the same way that the ‘ether’ scientists once believed in does not exist, its likely non-existence having been demonstrated by the Michelson-Morley experiments in the late nineteenth century.

Probability is relative, not objective. It is a function of your state of knowledge, the possible options you are aware of, and the observed data that you may have and that you trust. When these have been used up, we equivocate between the remaining alternatives: we choose our probability distribution so as to use all the information we have, not throwing any away, and so as not to add any ‘information’ that we do not have (8). As you find out more or get more data, you update your probabilities. This up-to-date probability distribution is one of your key tools for prediction and decision-making. Many people don’t write it down; such information may be tacit, with some sort of ‘mental model’ in operation. Even if you try to work with probability, you may well not be using the above logic, i.e. probability theory; you may be making decisions by some other process. . .
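As a minimal sketch of what this updating looks like (my illustration in Python, not an example from the book), suppose we entertain just three hypothetical coin biases and start by equivocating between them; each observed toss then revises the distribution of belief by Bayes’ theorem, and the result is the up-to-date distribution we would use for prediction and decisions.

```python
# Minimal sketch of Bayesian updating over a discrete set of hypotheses.
# Hypothetical example: three candidate coin biases, equal prior belief.

biases = [0.3, 0.5, 0.7]        # the 'heads' probabilities we entertain
belief = [1/3, 1/3, 1/3]        # prior: equivocate, since we know nothing more

def update(belief, heads):
    """Revise the degree of belief in each bias after observing one toss."""
    likelihood = [b if heads else (1 - b) for b in biases]
    unnormalised = [l * p for l, p in zip(likelihood, belief)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Use the data as it arrives: heads, heads, tails, heads.
for toss in [True, True, False, True]:
    belief = update(belief, toss)

print(belief)   # the up-to-date distribution, ready for prediction or decision
```

After these four illustrative tosses the belief has shifted towards the larger biases; further data would shift it again, in whichever direction the evidence points.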

In a 2022 lecture, the Nobel Laureate Professor ‘t Hooft (9) expounded a theory in which everything that happens is determined in advance (10).

Why do I bring this up? Let us go back to de Finetti. Since we can never know all the ‘initial conditions’ in their minute detail, our world is subjective, based on our state of knowledge, and this leads to other theories, including that of probability logic, which is my topic here. ’t Hooft’s theory is all very well (11).

As human beings, we find this situation really tricky. There may be false intuition. There may be ‘groupthink’. Alternatives may be absent from the calculations (we come back to this later).

—-

(6) Subjective because I am looking at the decision from my point of view, with my state of knowledge.
(7) See the appendix on ‘History & key figures. . .’.
(8) This is how the ‘Maximum Entropy Principle’ works; there is an explicit mathematical example of it in the first section of the Mathematical Miscellany later.
(9) He won the prize in the late 1990s for his work with a colleague on making a theory of subatomic particle forces make sense.
(10) This was called the ‘N1 theory’.
(11) And, to digress, we may wonder how good or bad it is for humanity to live life under such a hypothesis.

—-

The famous ether experiment mentioned above is an example of the great majority of top physicists, in fairly modern times, believing for a long time in something that later turned out literally not to exist, like the Emperor’s New Clothes.

In the ‘polemic’ section of his paper about different kinds of estimation intervals (1976), the late, eminent physicist E. T. Jaynes wrote ‘. . . orthodox arguments against Laplace’s use of Bayes’ theorem and in favour of “confidence intervals” have never considered such mundane things as demonstrable facts concerning performance.’

Jaynes went on to say that ‘on such grounds (i.e. that we may not give probability statements in terms of anything but random variables (12)), we may not use (Bayesian) derivations, which in each case lead us more easily to a result that is either the same as the best orthodox frequentist result, or demonstrably superior to it’.

Jaynes went on: ‘We are told by frequentists that we must say ‘the percentage of times that the confidence interval covers the true value of the parameter’, not ‘the probability that the true value of the parameter lies in the credibility interval’. And: ‘The foundation stone of the orthodox school of thought is the dogmatic insistence that the word probability must be interpreted as frequency in some random experiment.’ Often that ‘experiment’ involves made-up, randomised data in some imaginary model that is only descriptive, rather than usefully prescriptive (13). Often, we cannot actually repeat the experiment directly, or even do it once. Many organisations will want a prescription for their situation in the here and now, rather than a description of what may happen with a given frequency in some ad hoc and imaginary model that uses any amount of made-up data.

Liberally quoting again, Jaynes continues: ‘The only valid criterion for choosing is: which approach leads us to the more reasonable and useful results?

‘In almost every case, the Bayesian result is easier to get at and more elegant. The main reason for this is that both the ad hoc step of choosing a statistic and the ensuing mathematical problem of finding its sampling distribution are eliminated.

‘In virtually every real problem of real life the direct probabilities are not determined by any real random experiment; they are calculated from a theoretical model whose choice involves ‘subjective’ judgement. . . and then ‘objective’ calibration and maximum entropy equivocation between outcomes we don’t know (14). Here, ‘maximum entropy’ simply means not putting in any more information once we’ve used up all the information we believe we actually have.
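To make that last sentence concrete, here is a small sketch (my illustration, not Jaynes’ own text) of the classic dice version of the idea: suppose the only information we claim about a six-sided die, beyond normalisation, is that its mean is 4.5. Maximum entropy then selects, from all distributions with that mean, the one that adds nothing further; the solution has the exponential form p_i proportional to exp(-λi), and λ can be found numerically.

```python
import math

faces = [1, 2, 3, 4, 5, 6]
target_mean = 4.5   # the only information we claim to have, besides normalisation

def mean_for(lam):
    """Mean of the maximum-entropy distribution p_i proportional to exp(-lam * i)."""
    weights = [math.exp(-lam * i) for i in faces]
    total = sum(weights)
    probs = [w / total for w in weights]
    return sum(i * p for i, p in zip(faces, probs)), probs

# Bisection on lambda: a larger lambda favours the smaller faces (lower mean).
low, high = -5.0, 5.0
for _ in range(60):
    mid = (low + high) / 2
    m, probs = mean_for(mid)
    if m > target_mean:
        low = mid        # mean still too high, so move towards larger lambda
    else:
        high = mid

print([round(p, 4) for p in probs])                        # max-entropy probabilities
print(round(sum(i * p for i, p in zip(faces, probs)), 4))  # approximately 4.5
```

The resulting probabilities rise smoothly towards the higher faces, by exactly enough to honour the stated mean and no more.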

‘Our job is not to follow blindly a rule which would prove correct 95% of the time in the long run; there are an infinite number of radically different rules, all with this property.

—-

(12) In his book, de Finetti avoids the term ‘variable’, as it suggests a number which ‘varies’; he considers this a strange concept, tied to the frequentist idea of many idealised identical trials in which the parameter we want to describe is fixed and the data are not, a viewpoint that probability logic reverses. He uses the phrase ‘random quantity’ instead.
(13) What should we believe? What should we therefore do?
(14) See Objective Bayesianism by Williamson (2010).

—-

Things (mostly) never stay put for the long run. Our job is to draw the conclusions that are most likely to be right in the specific case at hand; indeed, the problems in which it is most important that we get this theory right are just the ones where we know from the start that the experiment can never be repeated.’

‘In the great majority of real applications, long run performance is of no concern to us, because it will never be realised.’

And finally, Jaynes said that ‘the information we receive is often not a direct proposition, but is an indirect claim that a proposition is true, from some “noisy” source that is itself not wholly reliable’. The great Hungarian logician and problem-solver Pólya deals with such situations in his 1954 works around plausible inference, and we cover the basics of this in this book.

Most people are happy to use logic when dealing with certainty and impossibility. This is the standard architecture for sextillions of electronic devices, for example.

Where there is uncertainty between these extremes of logic, let us use the theory of probability as extended logic.
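As a tiny sketch of that extension (my illustration, in Python): if we read 1 as certainty and 0 as impossibility, the familiar rules of probability reproduce Boolean AND, OR and NOT at those endpoints, and they continue to give coherent answers for every degree of belief in between. For simplicity, the two propositions below are assumed independent.

```python
# Probability as extended logic: at the endpoints 0 and 1 the rules reduce to
# Boolean logic; in between, they handle degrees of belief (uncertainty).
# (The two propositions are assumed independent, purely for illustration.)

def p_not(a):       # negation:  p(not A) = 1 - p(A)
    return 1 - a

def p_and(a, b):    # product rule, independent case:  p(A and B) = p(A) * p(B)
    return a * b

def p_or(a, b):     # sum rule:  p(A or B) = p(A) + p(B) - p(A and B)
    return a + b - a * b

# Endpoints: certainty (1) and impossibility (0) behave exactly like True and False.
assert p_and(1, 0) == 0 and p_or(1, 0) == 1 and p_not(1) == 0

# In between: degrees of belief.
print(p_or(0.7, 0.2))   # approximately 0.76 -- belief that at least one holds
```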


The above is a draft adaptation of an early chapter section of the 2024 book The Art of Decision by Dr J D Hayward

When to Use Bayesian Probability and When to Use Frequentist Statistics

When to use the Bayesian approach

In the following situations, I might want to use Bayes’ approach:

• I have quantifiable beliefs beforehand. These may come from experienced internal colleagues, external ‘experts’, or other subjective sources.

• The data may be ‘sparse’ or limited (at present or for the foreseeable future), and certainly not ‘big’; the data often will, but may not, dominate our prior, subjective beliefs.

• There is medium or high uncertainty involved.

• I wish to make consistent, sound decisions in the face of and acknowledging my uncertainty.

• I wish to do this in such a way that I can be honest with my stakeholders, shareholders, team, wider staff, investors, board, and so on.

• The model or data-generation methods will involve one or multiple parameters (such as profit, share price, average customer lifetime, transaction value, sales, cost, COGS, and so on).

• I cannot [wait to] trial in an idealised experiment. In dynamic environments, this is one of the key problems with frequentist approaches: we never have the same situation and data twice. The Bayesian approach naturally revises and updates.

• I want to know what it is best to do, or to understand what the options are and which ones are better or worse for me and my team in the here and now, for this occasion and situation. In life, it’s rare to be able to wait for ‘the long run’, but using recent prior data can often be useful.

• I want to use all the new data available to me, and be able to eliminate noise as best I can.

• I don’t want to choose an arbitrary approach; I want to use logic. I want the logic of the inferences to be ‘leakproof’, so that only the assumptions can be inappropriate. Throughout this book, we’ll see some simple and some more complicated examples of using logical probability.

Finally, Bayesian methods keep type. As Jaynes (1976) explained, if the data used are imaginary or pseudo-random, the probability distributions will be imaginary or pseudo-random; if the data are real, e.g. real frequencies, then the probability outputs will relate to real frequencies; if the prior information is taken from what it is reasonable to believe, then the output probabilities will also represent what it is reasonable to believe; and so on. In summary: the outputs will be of the same character as the inputs.

We first compare approaches to statistics and probability.

• Comparing the Frequentist and the Bayesian approaches to probability

In idem flumen bis descendimus, et non descendimus (‘we step down into the same river twice, and yet it is not the same’) – Heraclitus, as reported by Seneca, L. A., Epistulae Morales LVIII.23

‘Frequentist’ describes the statistics that are still the most commonly used. Here we define frequency and compare the frequentist with the Bayesian approach. The frequency definition of probability is the orthodoxy: the probability of an event is taken to be its frequency, the number of successes, say m, divided by a large number n of identical trials, i.e. m/n. There are laws of (large) numbers that lead us to believe that, for high enough n, we shall have a good description of the propensity of the event to happen.
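For a small numerical contrast (my sketch, not from the book; it assumes the SciPy library is available), take m = 3 successes in n = 10 trials. The frequentist point estimate is simply m/n, while a Bayesian who starts from a uniform prior over the success probability obtains a whole Beta(m+1, n-m+1) posterior, and from it a mean and a credible interval. With sparse data the two answers differ noticeably; with plenty of data they converge.

```python
from scipy.stats import beta

m, n = 3, 10                       # e.g. 3 'successes' observed in 10 trials

freq_estimate = m / n              # frequentist: probability as observed frequency

# Bayesian: uniform Beta(1, 1) prior, so the posterior is Beta(m + 1, n - m + 1).
posterior = beta(m + 1, n - m + 1)
post_mean = posterior.mean()                          # equals (m + 1) / (n + 2)
lo, hi = posterior.ppf(0.025), posterior.ppf(0.975)   # central 95% credible interval

print(f"frequency estimate : {freq_estimate:.3f}")
print(f"posterior          : mean {post_mean:.3f}, "
      f"95% credible interval ({lo:.3f}, {hi:.3f})")
```

Here the frequency gives 0.300 while the posterior mean is about 0.333, with a fairly wide credible interval: an honest reflection of how little ten trials can tell us.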

However, a problem with frequency statistics is highlighted by analogy in the saying above, which Seneca attributes to Heraclitus, and by the apocryphal Buddhist monks. The river changes; we never step into the same river twice, though we go down to the ‘same’ river.

In the table below comparing approaches, we see that the dynamics of what is being modelled, i.e. ‘reality’, are best approached with a model that changes in real time with the latest information, rather than with a description noting the unusualness of sample or batch information. One approach is subjectivist and relativist, while the other remains objectivist. We have seen how subjectivist theories like quantum mechanics and general relativity, two very finely tested theories, have superseded what went before. Is this the moment subjectivist approaches in probability logic will arrive?

Summary comparison: Bayesian vs Frequentist Approaches

Bayesian | Frequentist
Inferential, prescriptive | Descriptive
The here and now, and the next… | Long-run behaviour, hoping things persist as-is
Useful, intuitive result | Possibly a large number of conflicting results
Elegant, simple mathematics | Arbitrary convention & complexity
Weight of evidence, credibility intervals | Significance, ‘p-values’, confidence intervals
Probability as rational degree of belief | Probability as frequency of occurrence
Leakproof, logical probability theory | Ad hoc devices, possibly irrelevant information
Equivocation, best model choice | Model, then test samples
Unique outcome of one experiment | Accept or reject batch vs population
Emphasises revision as data comes in | Notes the sample data
Data fixed, parameters unknown | Data is just one of many possible realisations
Unknowns can be constants | Unknowns are random variables
Doesn’t apply in all situations | Ditto, but works most of the time with minimal assumptions
Use all of the data, optimally | Often does not use all the data, or not fully
Doesn’t require us to understand degrees of freedom or sufficient statistics | We must understand and compute the degrees of freedom
Common-sense results; inappropriate inference tracks back transparently to the assumptions | Sometimes non-common-sense results, or failure without obvious recourse, or poor inference
Focus on the scientific, mathematical or business merits | Focus on overcoming technical difficulties of the methods

Table: Comparison of Bayesian (left column) and Frequentist (right column) methods

I have deliberately left out the somewhat contentious issue (to some) of ‘Prior’ distribution selection, but cover this issue in my book:

The Art of Decision, out soon with Big Bold Moves Publishing.

Impossible Decisions…

adeo nihil est cuique se vilius (‘so true is it that nothing is held cheaper by each of us than himself’)
Seneca, L. A., Epistulae Morales XLII.7

Christmas last, my family was gathered around a table, opening crackers.

It seems that each year the crackers and the accompanying box get heavier. This year, along with the customary brightly-coloured paper hat, joke, and philosophical thought, a relative clutched a small book of cards.

These were labelled ‘Impossible Decisions!’ It was perhaps an idea from someone, somewhere to help those who can no longer chat convivially at the table among kin and cotton. Soon, challenges were being read out with gusto such as:

Would you rather you could only speak in rhyme or could only communicate through drawings?

It was overwhelming to see 100 such conundra collected together like an anthology of short poems.

I’ve been thinking about decisions for a long time. Some of that was, as a younger man, a sort of preparation for what was ahead; some came later in life, as I was confronted with various apparently important decisions, including those arising as a consequence of a force majeure. A force majeure may, though, remove the optionality and make the decision for one. . .

Circa 1275 AD, a real decision was to be made by Bondone, the humble tiller of the soil.

Who was Bondone? He was Giotto’s father. According to Vasari (The Lives of the Painters, Sculptors and Architects, 1550), a gentleman artist called Cimabue, passing through their village, noticed the drawings the ten-year-old Giotto had made of animals, such as the few sheep his father had given him to tend, and of the nature nearby, and was so impressed that he wanted to take the boy to his studio in Florence. Giotto was happy to go, but said that his father would have to give his blessing.

There is the decision for Bondone: keep my son to help till the soil and look after the sheep in the village, perhaps taking over everything before I age and weaken; or let him go off, away from me and our humble and rooted family, to the Big City with an unknown gentleman, to learn the skills of various fine arts. Perhaps because he had a number of siblings, the little one was sent packing, if indeed there was much to pack.

Years later, having fraternised with people like Dante, whose portrait he painted, Giotto became known as a great artist; his portrayal of a frightened Christ child being presented by His mother Mary to Simeon, in the private Scrovegni Chapel in Padua, is said to be an extraordinary thing. The risks did not materialise, and the positives, we assume, outweighed the family’s loss in letting the talented son go.

We must of course note the very different subjective viewpoints of ‘general posterity’ and of those making that decision: Bondone, his son and their family, in that village 14 miles from Florence, in the late thirteenth century.

Seneca urges his reader, particularly Lucilius, to whom he was writing, but in the end all of us, to think not only about the values, the positives of a choice, but also about the negatives, and he lists some of them:

danger, anxiety, lost honour, lost personal freedom and lost time, among others.

Many of life’s decisions do not have a simple and immediate answer, but we can choose to try to make them in a better way, and there is a selection of methods to choose from.


I put it to you that there are better and worse ways to do so, and that choosing to be consistent may well be better in the end.

Is the effort required in going beyond ‘gut instinct’ of more value than its gains? When is this so?

There are some very large decisions that perhaps really ought to be made more rigorously.

Following the wisdom of ancients like Seneca, we can all learn to assess the real, not the notional, position, ‘own ourselves’, avoid over-valuing or, perhaps more often, under-valuing ourselves, and find our own way forward.

We do not give up. . .