Motivation for probability as degree of belief

We begin with some quotes from a few writers, financiers and probability mathematicians.

• Tagore: ‘If you shut the door to all errors, the truth will be shut out (too).’

• Chekhov: ‘Knowledge is of no value, unless you put it into practice.’

• Munger (4): ‘You don’t have to be brilliant, only a little bit wiser than the other guys on average, for a long time.’

• Markowitz (1987) (5): ‘The rational investor is a Bayesian.’

• Box and Tiao (1973): ‘It follows that inferences that are unacceptable must come from inappropriate assumption/s not from our inferential logic.’

• Jaynes (1976): ‘..the record is clear; orthodox arguments against Bayes Theorem, and in favour of confidence intervals, have never considered such mundane things as demonstrable facts concerning performance.’

• Feller (1968): “In the long run, Bayesians win.”

The mathematicians and financiers are clear about Bayesian probability, while the writers are indirectly so: Tagore and Chekhov were astute. If we do not allow for alternatives, then our approach and its probabilities will often be seriously affected, a fact to which Tagore appears to allude. How often do we lament not using or remembering all we know? Chekhov seems aware of this.

We seem to be prepared in these times to exclude possibilities, ‘worlds’, options. In so doing, when we look at Bayes’ famous theorem for a minute, we see our inevitable error.

Munger and Feller’s quotations go hand in hand and lead us to the approach discussed here of logical or inductive, plausible inference. Our approach here acknowledges or is guided by the above motivating words. Let us try to spot places in life where we might think of some of these words.

How do you or your team make decisions?

If you don’t have a ‘plan’ for this, then my Art of Decision book should give you something to work with. Many people choose to make their decisions intuitively and rapidly (2). This may well suit many of us. But can you justify it? Is there nothing that a bit more reflection and logic can bring, on average and over the long run, that might help improve the way we bash forwards in life, business, diplomacy and statecraft?

What is good and easy, and what is not good and hard?

If there was any ‘give’ at all from the reader in answering the first question above, then this second one is a nice follow-up. If you can assess what has been ‘not good and hard’ in terms of decisions, then read on: there may just be something here for you.

What is probability as extended logic? (3)
Probability is our rational degree of belief . . . in the truth of a proposition.

Logic is the mode of reasoning. It is logic that is extended from only applying to binary certainty and impossibility to all states of degree of belief (uncertainty) in between these two endpoints.
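
As a hedged illustration of what ‘extended logic’ means in practice, the two standard rules of probability theory can be written as below. The notation is an assumption made for this sketch: A and B are propositions, I is the background information, and a bar denotes ‘not’.

```latex
% Sum rule: belief in A and belief in not-A must total 1
P(A \mid I) + P(\bar{A} \mid I) = 1

% Product rule: belief that A and B are both true
P(AB \mid I) = P(A \mid BI)\, P(B \mid I)
```

When every probability involved is 0 or 1, these rules give the same answers as ordinary true/false logic; values in between carry the degrees of belief.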

—-

2 as described in the book ‘Blink’ by Malcolm Gladwell.

3 This phrase was used, for example, by Jaynes (2003).

—-

What is decision theory?

The description of the process of how to use the information available to determine the most desirable action available to the ‘Agent’ (the person or organisation who acts).

These definitions are general and seem to allow wide application. As a bonus, the ideas that underpin decision-making, i.e. our topic, also relate to artificial intelligence and machine learning, and thus will be of interest to those trying to give themselves a good base for understanding those rapidly developing areas.
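
To make the definition of decision theory a little more concrete, here is a minimal sketch in Python of one common way to pick the ‘most desirable action’: weight the utility of each outcome by the Agent’s probability for it and choose the action with the highest total. The states, actions, probabilities and utilities are all made up for illustration, and the function names are hypothetical.

```python
# Illustrative sketch only: every number and name here is an assumption.

def expected_utility(probabilities, utilities):
    """Probability-weighted average utility of a single action."""
    return sum(probabilities[state] * utilities[state] for state in probabilities)


def best_action(probabilities, utility_table):
    """Return the action whose expected utility is highest."""
    return max(
        utility_table,
        key=lambda action: expected_utility(probabilities, utility_table[action]),
    )


# The Agent's current degrees of belief about the state of the world.
beliefs = {"rain": 0.3, "dry": 0.7}

# How good or bad each action turns out to be in each state.
utility_table = {
    "take umbrella": {"rain": 5, "dry": -1},
    "leave umbrella": {"rain": -10, "dry": 0},
}

choice = best_action(beliefs, utility_table)
print(choice)
```

With these made-up numbers the Agent takes the umbrella, even though rain is the less probable state, because being caught in the rain is costly. Changing either the beliefs or the utilities can change the decision, which is why both need to be written down.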

—-

(4) d. 2023 at age 99. With Warren Buffett, he built up a fund of almost $1 trillion over several decades.
(5) Mean-Variance Analysis, Oxford: Blackwell

—-

We turn to the great Italian scholar de Finetti, who said in 1946: ‘Probability does not exist’.

What did he mean by this? Here, we shall look at Bayesian subjective (6) probability as extended logic. We compare it with orthodox, frequentist, ad hoc statistics. We look at the pros and cons of probability, utility and Bayes’ logic and ask why it is not used more often. In card games such as contract bridge, the players have partial, differing states of knowledge, and there is the distinct concept of ‘missing a trick’, which is to say, making a mistake given what we knew. Subjective probability and Bayes-backed decision logic are about being rational and avoiding missing costly ‘tricks’, especially in the long run, by virtue of ‘playing’ consistently optimally.

Above is a quote of de Finetti (7), who was known for his intellect and beautiful writing. He meant that your probability of an event is subjective rather than objective: probability does not exist in the same way that the ‘ether’ does not exist, although scientists believed in it until the Michelson-Morley experiment of 1887 and its successors demonstrated its likely non-existence.

Probability is relative, not objective. It is a function of your state of knowledge, the possible options you are aware of, and the observed data that you have and trust. When these have been used up, we equivocate between the alternatives: we choose our probability distribution so as to use all the information we have, throwing none away, and so as not to add any ‘information’ that we do not have (8). As you find out more or get more data, you can update your probabilities. This up-to-date probability distribution is one of your key tools for prediction and making decisions. Many people don’t write it down. Such information may be tacit, and there will be some sort of ‘mental model’ in operation. Even if you try to work with probability, you may well not be using the above logic, i.e. probability theory. You may be making decisions by some other process. . .
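
The updating described above can be sketched in a few lines of Python. This is a toy example under stated assumptions, not a method taken from the book: three made-up hypotheses about a coin, a short made-up run of tosses, and hypothetical function names. It compares an ‘open’ prior that equivocates between all three alternatives with a ‘closed’ prior that rules one of them out in advance.

```python
# Illustrative sketch only: the hypotheses, priors and data are made up.

def bayes_update(prior, likelihood):
    """Return the posterior: prior times likelihood, renormalised to sum to 1."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: value / total for h, value in unnormalised.items()}


def update_on_tosses(prior, chance_of_heads, tosses):
    """Update beliefs one toss at a time ('H' = heads, 'T' = tails)."""
    belief = dict(prior)
    for toss in tosses:
        likelihood = {
            h: (p if toss == "H" else 1 - p) for h, p in chance_of_heads.items()
        }
        belief = bayes_update(belief, likelihood)
    return belief


# Each hypothesis states the coin's chance of landing heads.
chance_of_heads = {"fair": 0.5, "heads-biased": 0.8, "tails-biased": 0.2}

# Open mind: equal weight on every alternative we are aware of.
open_prior = {h: 1 / 3 for h in chance_of_heads}
# Closed mind: one alternative is excluded before any data arrive.
closed_prior = {"fair": 0.5, "heads-biased": 0.0, "tails-biased": 0.5}

open_posterior = update_on_tosses(open_prior, chance_of_heads, "HHTHH")
closed_posterior = update_on_tosses(closed_prior, chance_of_heads, "HHTHH")

for name, posterior in [("open", open_posterior), ("closed", closed_posterior)]:
    print(name, {h: round(p, 3) for h, p in posterior.items()})
```

After four heads in five tosses, the open prior ends up with roughly 0.72 on the heads-biased coin. The closed prior can never get there: an alternative given zero probability at the start stays at zero whatever the data say, so nearly all the weight lands on ‘fair’ instead.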

In a 2022 lecture, the Nobel Laureate Professor ’t Hooft (9) expounded a theory in which everything that happens is determined in advance. (10)

Why do I bring this up? Let us go back to de Finetti. Since we can never know all the ‘initial conditions’ in their minute detail, our world is subjective, based on our state of knowledge, and this leads to other theories, including that of probability logic, which is my topic here. ’t Hooft’s theory is all very well. (11)

As human beings, we find this situation really tricky. There may be false intuition. There may be ‘groupthink’. Alternatives may be absent from the calculations (we come back to this later).

—-

6 Subjective because I am looking at the decision from my point of view, with my state of knowledge.
7 See appendix on ‘History & key figures. . .
8 This is how the ‘Maximum Entropy Principle’ works, and there is an explicit example of how this works mathematically in the first section of the Mathematical Miscellany later.
9 He won the prize in the late 1990s for his work with a colleague on making a theory of subatomic particle forces make sense.
10 This was called the ‘N1 theory’.
11 And to digress, we may wonder how (bad or good) it is for humanity to live life under such a hypothesis.

—-

The famous ether experiment mentioned above is an example of the great majority of top scientists (physicists), in fairly modern times, believing in something, for a long time, that turned out later literally to be non-existent, like the Emperor’s New Clothes.

In the ‘polemic’ section of his paper about different kinds of estimation intervals (1976), the late, eminent physicist E T Jaynes wrote: ‘. . . orthodox arguments against Laplace’s use of Bayes’ theorem and in favour of “confidence intervals” have never considered such mundane things as demonstrable facts concerning performance.’

Jaynes went on to say that ‘on such grounds (i.e. that we may not give probability statements in terms of anything but random variables (12)), we may not use (Bayesian) derivations, which in each case lead us more easily to a result that is either the same as the best orthodox frequentist result, or demonstrably superior to it’.

Jaynes went on: ‘We are told by frequentists that we must say “the percentage of times that the confidence interval covers the true value of the parameter”, not “the probability that the true value of the parameter lies in the credibility interval”.’ And: ‘The foundation stone of the orthodox school of thought is the dogmatic insistence that the word probability must be interpreted as frequency in some random experiment.’ Often that ‘experiment’ involves made-up, randomised data in some imaginary model that is only descriptive, rather than usefully prescriptive (13). Often, we can’t actually repeat the experiment directly or even do it once. Many organisations will want a prescription for their situation in the here-and-now, rather than a description of what may happen with a given frequency in some ad hoc and imaginary model that uses any amount of made-up data.

Liberally quoting again, Jaynes continues: ‘The only valid criterion for choosing is: which approach leads us to the more reasonable and useful results?’

‘In almost every case, the Bayesian result is easier to get at and more elegant. The main reason for this is that both the ad hoc step of choosing a statistic and the ensuing mathematical problem of finding its sampling distribution are eliminated.’

‘In virtually every real problem of real life the direct probabilities are not determined by any real random experiment; they are calculated from a theoretical model whose choice involves ‘subjective’ judgement. . . and then ‘objective’ calibration and maximum entropy equivocation between outcomes we don’t know (14).’ Here, ‘maximum entropy’ simply means not putting in any more information once we’ve used up all the information we believe we actually have.

‘Our job is not to follow blindly a rule which would prove correct 95% of the time in the long run; there are an infinite number of radically different rules, all with this property.

—-

12 In his book, de Finetti avoids the term ‘variable’ as it suggests a number which ‘varies’, which he considers a strange concept related to the frequentist idea of multiple or many idealised identical trials where the parameter we want to describe is fixed, and the data is not fixed, which viewpoint probability logic reverses. He uses the phrase: random quantity instead.
13 What should we believe? What should we therefore do?
14 See Objective Bayesianism by Williamson (2010)

—-

Things (mostly) never stay put for the long run. Our job is to draw the conclusions that are most likely to be right in the specific case at hand; indeed, the problems in which it is most important that we get this theory right are just the ones where we know from the start that the experiment can never be repeated.’

‘In the great majority of real applications, long run performance is of no concern to us, because it will never be realised.’

And finally, Jaynes said that ‘the information we receive is often not a direct proposition, but is an indirect claim that a proposition is true, from some “noisy” source that is itself not wholly reliable’. The great Hungarian logician and problem-solver Pólya deals with such situations in his 1954 works around plausible inference, and we cover the basics of this in this book.
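
A small Python sketch shows how a report from a ‘noisy’ source can be handled with Bayes’ theorem. It is a simplified illustration under assumed numbers, not Jaynes’s or Pólya’s own treatment: the function name is hypothetical, and the source’s reliability is summarised by just two made-up rates.

```python
# Illustrative sketch only: the prior and the reliability rates are assumptions.

def belief_after_report(prior, hit_rate, false_alarm_rate):
    """Probability that a proposition is true after a source claims it is true.

    prior            -- our degree of belief before hearing the claim
    hit_rate         -- chance the source says 'true' when it really is true
    false_alarm_rate -- chance the source says 'true' when it is actually false
    """
    chance_of_claim = prior * hit_rate + (1 - prior) * false_alarm_rate
    return prior * hit_rate / chance_of_claim


# A fairly reliable source moves us a good deal, but not to certainty.
reliable = belief_after_report(prior=0.2, hit_rate=0.9, false_alarm_rate=0.3)

# A source that says 'true' equally often either way tells us nothing.
useless = belief_after_report(prior=0.2, hit_rate=0.5, false_alarm_rate=0.5)

print(round(reliable, 3), round(useless, 3))
```

With these numbers the claim from the fairly reliable source raises our belief from 0.2 to about 0.43, while the claim from the uninformative source leaves it at 0.2. The claim is treated as evidence to be weighed, not as a fact to be accepted or rejected outright.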

Most people are happy to use logic when dealing with certainty and impossibility. This is the standard architecture for sextillions of electronic devices, for example.

Where there is uncertainty between these extremes of logic, let us use the theory of probability as extended logic.

The above is a draft adaptation of an early chapter section of the 2024 book The Art of Decision by Dr J D Hayward
