Our sensors give us incomplete, noisy, and indirect information about the world. For example, estimating the location of a sound source is difficult because, in natural contexts, the sound of interest is corrupted by other sound sources, reflections, etc. Thus it is not possible to know the position of the source with certainty. The ‘Bayesian coding hypothesis’ (Knill & Pouget, 2004) postulates that the brain represents not the most likely position, but the entire probability distribution of the position. It then uses those distributions to do Bayesian inference, for example when combining different sources of information (say, auditory and visual). This would allow the brain to optimally infer the most likely position. There is indeed some evidence for optimal inference in psychophysical experiments, although there is also some conflicting evidence (Rahnev & Denison, 2018).
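To make the optimal-inference claim concrete, here is a minimal numerical sketch of Bayesian cue combination, assuming Gaussian likelihoods, a flat prior, and made-up numbers; it is only an illustration of the hypothesis, not a model taken from the literature.

```python
import numpy as np

# Minimal sketch of Bayesian cue combination (illustrative numbers only):
# the "represented" quantity is a full distribution over source position.
x = np.linspace(-30, 30, 601)   # candidate source positions (degrees)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Two noisy cues about the same position; audition is assumed less reliable here.
like_auditory = gaussian(x, mu=5.0, sigma=8.0)
like_visual = gaussian(x, mu=0.0, sigma=2.0)

# With a flat prior, combining cues amounts to multiplying likelihoods.
posterior = like_auditory * like_visual
posterior /= posterior.sum() * dx

# The posterior mean is the reliability-weighted average of the two cues.
print("posterior mean:", (x * posterior).sum() * dx)
print("weighted-average prediction:",
      (5.0 / 8.0**2 + 0.0 / 2.0**2) / (1 / 8.0**2 + 1 / 2.0**2))
```

The inverse-variance weights are the standard result for combining Gaussian cues; the point here is only that the computation operates on distributions rather than on single estimates.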
The idea has some appeal. The problem is that, by framing perception as a statistical inference problem, it focuses on the most trivial type of uncertainty: statistical uncertainty. This is illustrated by the following quote: “The fundamental concept behind the Bayesian approach to perceptual computations is that the information provided by a set of sensory data about the world is represented by a conditional probability density function over the set of unknown variables”. Implicit in this representation is a particular model, for which the variables are defined. Typically, one model describes a particular experimental situation. For example, the model would describe the distribution of auditory cues associated with the position of the sound source. Another situation would be described by a different model: a scene with two sound sources, for example, would require a model with two position variables; or if the listening environment is a room whose size might vary, then we would need a model with the dimensions of the room as variables. In all these cases, where we have identified and fixed the parametric sources of variation, the Bayesian approach works fine, because we are indeed facing a problem of statistical inference. But this framework does not fit any real-life situation. In real life, perceptual scenes have variable structure, which corresponds to the model in statistical inference (there is one source, or two sources, we are in a room, the second source comes from the window, etc.). The perceptual problem is therefore not just to infer the parameters of the model (the dimensions of the room, etc.), but also the model itself, its structure. Thus, it is not possible in general to represent an auditory scene by a probability distribution over a set of parameters, because the very notion of a parameter already assumes that the structure of the scene is known and fixed.
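To make the distinction concrete, here is a small hypothetical sketch (the structures and numbers are mine, not taken from any particular paper): a distribution over “the position of the source” only makes sense once a particular scene structure has been assumed, because different structures have different parameter spaces.

```python
# Hypothetical illustration: parameters only exist relative to a chosen scene structure.
scene_one_source = {"structure": "one source",
                    "params": {"azimuth": 12.0}}
scene_two_sources = {"structure": "two sources",
                     "params": {"azimuth_1": -20.0, "azimuth_2": 35.0}}
scene_in_a_room = {"structure": "one source in a room",
                   "params": {"azimuth": 12.0, "room_size": (4.0, 5.0, 2.5)}}

# A probability density over "azimuth" presupposes the first structure; it cannot
# even be written down for the other two, whose parameter spaces are different.
for scene in (scene_one_source, scene_two_sources, scene_in_a_room):
    print(scene["structure"], "-> parameters:", sorted(scene["params"]))
```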
Inferring the parameters of a known statistical model is relatively easy. What is really difficult, and is still challenging for machine learning algorithms today, is to identify the structure of a perceptual scene: what constitutes an object (object formation), and how objects are related to each other (scene analysis). These fundamental perceptual processes do not exist in the Bayesian brain. This touches on two very different types of uncertainty: statistical uncertainty, variations that can be interpreted and expected within the framework of a model; and epistemic uncertainty, where the model itself is unknown (a difference famously explained by Donald Rumsfeld).
Thus, the “Bayesian brain” idea addresses an interesting problem (statistical inference), but it trivializes the problem of perception, by missing the fact that the real challenge is epistemic uncertainty (building a perceptual model), not statistical uncertainty (tuning the parameters): the world is not noisy, it is complex.
Dear Romain,
Thanks for the great blog posts. I have a clarification question: I'm not sure I understand the distinction you make between statistical uncertainty and epistemic uncertainty. Is there a fundamental difference between the two, or is it supposed to be a practical distinction? A statistical model can have complex internal structure and complex parameters (including infinite-dimensional parameters in non-parametric or semi-parametric approaches), so I don't follow why, in principle, it couldn't capture complex aspects of the structure of the world. One might call into question whether it is feasible to conduct Bayesian inference on full probability distributions with the kind of complicated models that operating in the real world seems to require, but I'm not sure whether this is what you are getting at here. Note that I do agree that existing proofs of concept for the 'Bayesian brain' hypothesis consider very restricted experimental scenarios and that I have doubts as to their scalability.
Thanks for your attention.
Yes, you could mathematically describe distributions over structures that are not vector spaces, including labelled graphs, etc. But the Bayesian brain hypothesis is not just that there are mental representations of those distributions, but also explicit neural representations in the form of parametric distributions, based on things such as tuning curves; in other words, distributions over a vector space, with variables whose meaning is fixed in advance (if it is not, then the question becomes: where is the meaning of the variables represented?). If you wanted to explicitly represent all possible structures of the world, then you would need a symbol system (because of recursion), which goes quite some way beyond the standard connectionist view that the "Bayesian brain hypothesis" seems to support.
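As a hypothetical illustration of what "a symbol system (because of recursion)" means here (my own sketch, not a proposal from the literature), a scene description has to be a recursive data structure rather than a fixed parameter vector:

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Source:
    azimuth: float                     # one sound source with a position parameter

@dataclass
class Room:
    size: Tuple[float, float, float]   # room dimensions
    contents: "Scene"                  # a room contains a sub-scene: recursion

@dataclass
class Mixture:
    parts: List["Scene"]               # any number of concurrent sub-scenes

Scene = Union[Source, Room, Mixture]   # a scene is any of these, nested arbitrarily

# Example: one nearby source, plus a second source heard through the window of a room.
scene = Mixture([Source(azimuth=10.0),
                 Room(size=(4.0, 5.0, 2.5), contents=Source(azimuth=-30.0))])
print(scene)
```

Which parameters exist (how many azimuths, whether there is a room size) depends on the structure itself, so a distribution over a fixed vector space cannot carry this kind of description.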
For the broader point about statistical vs epistemic uncertainty, you can find examples in science, for example in mechanics. You want to know how bodies move. You decide to throw some huge balls from a tower. You come up with Newtonian mechanics. In your model, all balls have exactly the same trajectory, independent of size. There is some uncertainty due to fluctuations in the air. You can model this uncertainty by throwing the ball lots of times and measuring a distribution. This is statistical uncertainty. Now you go out of your tower and throw a feather. You make an incorrect prediction, but one that is systematically incorrect, because your model did not take friction into account. You make errors because your model is wrong, in a way that could be fixed.
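A toy numerical sketch of the two kinds of error (my own illustration, with made-up numbers): the ball's errors scatter around the frictionless prediction, while the feather's error is a systematic bias that no amount of averaging removes.

```python
import numpy as np

rng = np.random.default_rng(0)
g, height = 9.81, 50.0                     # free fall from a 50 m tower
predicted_time = np.sqrt(2 * height / g)   # frictionless Newtonian prediction

# Statistical uncertainty: repeated drops of a heavy ball scatter around the
# prediction because of air fluctuations; the mean error is close to zero.
ball_times = predicted_time + rng.normal(0.0, 0.05, size=1000)
print("ball:    mean error = %+.3f s" % (ball_times.mean() - predicted_time))

# Epistemic uncertainty: the feather falls much more slowly because the model
# ignores air friction (a factor of 4 is assumed here purely for illustration);
# the error is systematic and cannot be removed without changing the model.
feather_times = 4.0 * predicted_time + rng.normal(0.0, 0.5, size=1000)
print("feather: mean error = %+.3f s" % (feather_times.mean() - predicted_time))
```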
On a deeper level, the use of Newtonian mechanics relies on many assumptions, which we call modeling, and which are not totally straightforward. For example, we take a bunch of molecules and decide to call it "an object", and on that object we define observables such as the mass, the center of inertia, the position, the speed and the acceleration; all of these rely on sophisticated geometric ideas. This constitutes the framework that will produce the data on which you could possibly measure distributions (although you will note that Newton did not use Bayes' theorem to infer his laws of mechanics). There is a choice of relevant observables that is prior to any statistical model. It is in fact the part we call "modeling".