Bayes’ rule and Jeffrey’s updating from the principle of minimum change

What follows is, to the best of my understanding, largely folklore. Variants of the argument appear implicitly across the literature on Bayesian updating, relative entropy, and consistent inference, and the conclusion is often taken for granted once one is familiar with Jeffrey updating. However, I have not been able to find a place where the reasoning is laid out explicitly and in a self-contained way, starting from minimal assumptions and making clear what is, and is not, being imposed. For that reason, I am recording this note here, both for my own reference and in the hope that it may be useful to others. A PDF version containing the same material, with slightly improved formatting, is available here.

Let $X$ and $Y$ be finite-valued quantities (random variables) about which we wish to reason. Our prior information is summarized by a joint distribution

$$P(x,y)=p(x)\,\varphi(y|x),$$

where $p(x)$ represents our prior state of knowledge about $X$, and $\varphi(y|x)$ encodes the likelihood. No interpretation beyond this bookkeeping role is assumed.

From $P$ we may compute the implied marginal distribution $P_Y(y)=\sum_x p(x)\,\varphi(y|x)$ and, by the product rule (I wouldn’t call this “Bayes’ theorem” yet), the inverse conditional distribution

$$\hat\varphi(x|y)=P(x|y)=\frac{p(x)\,\varphi(y|x)}{P_Y(y)}.$$
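The bookkeeping so far can be sketched numerically. The following is a minimal illustration in plain Python, not part of the original argument; the numbers in `p` and `phi` are hypothetical toy values, and the function names are mine.

```python
# Hypothetical toy example: X takes 2 values, Y takes 3 values.
p = [0.6, 0.4]                 # prior p(x)
phi = [[0.7, 0.2, 0.1],        # phi[x][y] = phi(y|x); each row sums to 1
       [0.1, 0.3, 0.6]]

def joint(p, phi):
    """Joint distribution P(x,y) = p(x) * phi(y|x), indexed as P[x][y]."""
    return [[p[x] * phi[x][y] for y in range(len(phi[x]))]
            for x in range(len(p))]

def marginal_y(P):
    """Marginal P_Y(y) = sum_x P(x,y)."""
    return [sum(P[x][y] for x in range(len(P))) for y in range(len(P[0]))]

def inverse_conditional(P):
    """phi_hat[x][y] = P(x,y) / P_Y(y); NaN where P_Y(y) = 0 (undefined)."""
    PY = marginal_y(P)
    return [[P[x][y] / PY[y] if PY[y] > 0 else float("nan")
             for y in range(len(PY))]
            for x in range(len(P))]

P = joint(p, phi)
P_Y = marginal_y(P)
phi_hat = inverse_conditional(P)
```

Each column of `phi_hat` is a probability distribution over $x$, as it should be for a conditional $P(x|y)$.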

Now suppose that new information becomes available which does not specify a particular value of $Y$, but instead constrains our revised beliefs about $Y$ to take the form of a probability distribution $\tau(y)$. The problem is then to determine what joint distribution $R(x,y)$ should represent our new state of knowledge, given that it must be consistent with $\tau$ and must not contain any information not logically implied by the prior $P$ together with this new constraint.

Principle of minimum change

Following the general principle that probabilities should be updated only to the extent required by new information, we select $R$ to minimize the relative entropy

$$D(R\|P)=\sum_{x,y}R(x,y)\log\frac{R(x,y)}{P(x,y)},$$

subject to the constraint $R_Y=\tau$, where $R_Y(y)=\sum_x R(x,y)$ is the marginal of $R$ on $Y$. This criterion ensures that no unwarranted assumptions are introduced.
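For concreteness, the relative entropy can be computed as follows. This is a minimal sketch in plain Python, assuming distributions are given as flat lists of equal length, natural logarithms, and the usual conventions $0\log 0=0$ and $D=+\infty$ when $R$ puts mass where $P$ has none; the function name is mine.

```python
import math

def relative_entropy(R, P):
    """D(R||P) = sum_i R_i * log(R_i / P_i), in nats.

    Terms with R_i = 0 contribute nothing; if R_i > 0 while P_i = 0,
    the divergence is +infinity.
    """
    total = 0.0
    for r, q in zip(R, P):
        if r == 0:
            continue
        if q == 0:
            return math.inf
        total += r * math.log(r / q)
    return total
```

A joint table can be passed by flattening it, for example `[v for row in R for v in row]`.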

Solution

Any admissible $R$ may be written as

$$R(x,y)=\tau(y)\,R(x|y).$$

Substitution into the relative entropy yields the identity

$$\begin{aligned}
D(R\|P) &= \sum_{x,y}R(x,y)\log\frac{R(x,y)}{P(x,y)} \\
&= \sum_{x,y}\tau(y)\,R(x|y)\left[\log\frac{\tau(y)}{P_Y(y)}+\log\frac{R(x|y)}{\hat\varphi(x|y)}\right] \\
&= \sum_{x,y}\tau(y)\,R(x|y)\log\frac{\tau(y)}{P_Y(y)}+\sum_{x,y}\tau(y)\,R(x|y)\log\frac{R(x|y)}{\hat\varphi(x|y)} \\
&= D(\tau\|P_Y)+\sum_{x,y}\tau(y)\,R(x|y)\log\frac{R(x|y)}{\hat\varphi(x|y)} \\
&= D(\tau\|P_Y)+\sum_y\tau(y)\,D\big(R(\cdot|y)\,\big\|\,\hat\varphi(\cdot|y)\big).
\end{aligned}$$

The first term depends only on the revised marginal $\tau$ and the prior marginal $P_Y$, not on the conditionals $R(x|y)$, and is therefore fixed by the constraint. The second term is nonnegative and vanishes if and only if

$$R(x|y)=\hat\varphi(x|y)\quad\text{for all }x,y.$$

Hence the unique distribution consistent with the stated constraints and the principle of minimum change is $R(x,y)=\tau(y)\,\hat\varphi(x|y)$, wherever $\tau(y)>0$. Note that if $\tau(y)>0$ but $P_Y(y)=0$ for some $y$, the new evidence falsifies our prior. This is signaled by the fact that, in this case, $D(\tau\|P_Y)=+\infty$.
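The result can be checked numerically. The sketch below, in plain Python with hypothetical toy numbers, builds the update $R(x,y)=\tau(y)\,\hat\varphi(x|y)$, confirms that its relative entropy to the prior equals $D(\tau\|P_Y)$, and compares it with one other admissible joint that has the same marginal $\tau$. The names `kl`, `jeffrey_update` and `R_other` are mine, and a single comparison illustrates the claim rather than proving it.

```python
import math

def kl(R, P):
    """D(R||P) in nats for tables indexed [x][y], with 0 log 0 = 0."""
    total = 0.0
    for row_r, row_p in zip(R, P):
        for r, q in zip(row_r, row_p):
            if r > 0:
                if q == 0:
                    return math.inf
                total += r * math.log(r / q)
    return total

def jeffrey_update(P, tau):
    """R(x,y) = tau(y) * P(x,y) / P_Y(y).

    Raises if tau(y) > 0 where P_Y(y) = 0 (the prior is falsified).
    """
    nx, ny = len(P), len(P[0])
    PY = [sum(P[x][y] for x in range(nx)) for y in range(ny)]
    R = [[0.0] * ny for _ in range(nx)]
    for y in range(ny):
        if tau[y] == 0:
            continue
        if PY[y] == 0:
            raise ValueError("tau(y) > 0 but P_Y(y) = 0")
        for x in range(nx):
            R[x][y] = tau[y] * P[x][y] / PY[y]
    return R

# Hypothetical prior joint P(x,y) and revised marginal tau(y).
P = [[0.42, 0.12, 0.06],
     [0.04, 0.12, 0.24]]
tau = [0.2, 0.5, 0.3]

R_star = jeffrey_update(P, tau)
P_Y = [sum(col) for col in zip(*P)]
d_star = kl(R_star, P)
d_marg = kl([tau], [P_Y])      # D(tau||P_Y), using a one-row table

# Another admissible joint: same Y-marginal tau, uniform conditionals R(x|y).
R_other = [[0.5 * tau[y] for y in range(3)] for _ in range(2)]
d_other = kl(R_other, P)
```

Here `d_star` agrees with `d_marg` up to rounding, while `d_other` is strictly larger, in line with the decomposition above.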

Interpretation

The new information alters only our beliefs about $Y$; therefore, rational consistency requires that our conditional beliefs about $X$ given $Y$ remain exactly those implied by the prior. Any other choice would amount to smuggling in additional information not contained in the premises.
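As a worked consequence, marginalizing the minimizer over $y$ gives the revised beliefs about $X$. In the sketch below, $y_0$ is a hypothetical observed value with $P_Y(y_0)>0$ and $\delta$ is the Kronecker delta; neither symbol appears in the derivation above.

$$\begin{aligned}
R_X(x) &= \sum_y R(x,y)=\sum_y \tau(y)\,\hat\varphi(x|y) &&\text{(Jeffrey's updating)},\\
\tau(y)=\delta_{y,y_0}\ \Longrightarrow\ R_X(x) &= \hat\varphi(x|y_0)=\frac{p(x)\,\varphi(y_0|x)}{P_Y(y_0)} &&\text{(Bayes' rule)}.
\end{aligned}$$

In this reading, Bayes' rule is the special case in which the new information pins $Y$ to a single value.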

Excerpt from an Opinion Paper on the Future of Quantum Science and Technology

A few months ago I was invited by Prof. Miroljub Dugic (Institute of Physics, University of Kragujevac, Serbia) to take part in a poll on the future of quantum science and technology. Below I report the questions together with my answers. The complete paper can be downloaded from this link.

Do you find the quantum measurement problem worth striving?

I really have no idea if or how the quantum measurement problem may be solved. Quantum theory itself has nothing to say about a putative solution, so, if I must pick one, I would say that any genuine resolution of the measurement problem, if it exists at all, would have to come from new physics for which we currently have neither hints nor any operational need, in a sense. For the moment, the problem seems to tell us more about the limits of our preferred narratives than about the limits of formalism itself.

Which research directions do you find prominent in theory and applications?

I believe we should more decisively push toward an observer-dependent perspective, not only in quantum theory but in science as a whole, including cosmology. This idea is not new; it is as old as science itself. From time to time it resurfaces, and when it does it sheds new light on physics, yet it is quickly overshadowed again by the powerful illusion that we call “objective reality”. In this spirit, I expect research directions that put center stage what an observer can infer and learn, and how such inferences are constrained or enabled by physical theory, to become increasingly prominent. Such an approach may guide both theoretical developments and quantum technologies.

What would you expect of the future post-quantum theory?

If a future post-quantum theory ever emerges, to deserve that name, it would have to depart from quantum theory at least as radically as quantum theory departed from classical physics. It is worth recalling that the inadequacy of classical theory was made evident by relatively simple, low-energy experiments at the end of the nineteenth and the beginning of the twentieth century. By contrast, and this is an important methodological point, we currently have no experimentally demonstrated situations in which quantum theory is inadequate. The open issues we do have are conceptual or theoretical rather than empirical. The regimes where quantum theory might conceivably fail lie at extreme scales not yet accessible to experiment. Without concrete empirical hints, imagining a genuinely new framework becomes exceedingly difficult. In this sense, talk of a post-quantum theory today feels uncomfortably close to “armchair philosophizing”, carried out without the observational footholds that historically guided genuine theoretical revolutions.