
Thursday, March 16, 2023

Bayes Rule: Visual Refresher

Bayes rule is a familiar, almost natural outcome for anyone versed in probability theory. In words, it tells us how to update the probability of a random variable given that some event has occurred, when we have some prior knowledge or belief about the probability of that random variable from earlier events. The algebra to get to Bayes rule is simple, but I have always found it best to take a more spatial perspective on what Bayes rule is really stating.

I'll first begin with a 2D square sample space, $\it{S}$. This space is discrete and we can represent each outcome as a tiny square, $\it{s}$. In this case, we will have a total of 16 tiny squares in $\it{S}$. This means there is a 1/16 chance that any given square is randomly selected:

$$\begin{array}{|c|c|c|c|}\hline  \it{s_1} & \it{s_2}  & \it{s_3}  & \it{s_4}   \\ \hline \it{s_5}  & \it{s_6}  & \it{s_7} & \it{s_8}  \\ \hline \it{s_9}  & \it{s_{10}}  & \it{s_{11}}  &  \it{s_{12}} \\ \hline \it{s_{13}}  & \it{s_{14}} & \it{s_{15}}  & \it{s_{16}} \\ \hline \end{array}$$

$$\mathrm{P}(\it{s}_i) = \mathrm{1/16}, \quad \it{s}_i \in \it{S}$$
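
To make this concrete, here is a minimal Python sketch (the post itself contains no code, so this is purely illustrative) that enumerates the 16 cells and assigns each the uniform probability of 1/16:

```python
# Enumerate the 4x4 sample space: 16 equally likely cells s1 .. s16.
S = [f"s{i}" for i in range(1, 17)]

# Uniform probability: each cell is equally likely to be picked.
p = {s: 1 / len(S) for s in S}

print(p["s1"])  # 0.0625 == 1/16
```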

Now say we have the scenario where we are only interested in two subspaces of $\it{S}$: $\it{S}_A$ and $\it{S}_B$. More specifically, we want to know the probability of a square randomly occurring in each of these subspaces given it occurs in $\it{S}$, and what the probability is of a square occurring in the intersection, or, stated differently, the probability of a square occurring in both $\it{S}_A$ and $\it{S}_B$.

With this we have the following: $\mathrm{P}(\it{s}_A)$, $\mathrm{P}(\it{s}_B)$, and $\mathrm{P}(\it{s}_A \cap \it{s}_B)$. The updated image of this would look like:

The probability $\mathrm{P}(\it{s}_A)$ is shown in red, $\mathrm{P}(\it{s}_B)$ in blue, and the overlap is $\mathrm{P}(\it{s}_A \cap \it{s}_B)$. Keep in mind that $\mathrm{P}(\it{s}_A \cap \it{s}_B) = \mathrm{P}(\it{s}_B \cap \it{s}_A) = \mathrm{1/8}$.
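
Since the colored figure is not reproduced here, the exact cell memberships of $\it{S}_A$ and $\it{S}_B$ below are an assumption; they are chosen so the overlap is 2 of 16 cells, i.e., 1/8, consistent with the text. A short sketch of counting cells to get these probabilities:

```python
# Assumed subspaces (the original figure is not reproduced); the overlap
# is 2 cells out of 16, so P(s_A ∩ s_B) = 1/8 as stated in the text.
S = {f"s{i}" for i in range(1, 17)}
S_A = {"s1", "s2", "s5", "s6"}          # assumed "red" cells
S_B = {"s5", "s6", "s7", "s8"}          # assumed "blue" cells

def prob(subset, space):
    """Probability of landing in `subset` under a uniform pick from `space`."""
    return len(subset & space) / len(space)

print(prob(S_A, S))          # P(s_A)       = 4/16 = 0.25
print(prob(S_B, S))          # P(s_B)       = 4/16 = 0.25
print(prob(S_A & S_B, S))    # P(s_A ∩ s_B) = 2/16 = 0.125
```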

The question we usually want to ask is not what the joint probability is, i.e., the probability of a square in both $\it{S}_A$ and $\it{S}_B$, but rather what the probability of a square in $\it{S}_A$ is given that a square in $\it{S}_B$ has been picked/occurred, or vice versa. So what does this mean? We want to compare the relative probability of the joint space to that of the given space where the event has occurred:

\begin{equation} \mathrm{P}(\it{s}_A | \it{s}_B) = \frac{\mathrm{P}(\it{s}_A  \cap \it{s}_B)}{\mathrm{P}(\it{s}_B)}\label{eq:bayes1}   \end{equation} 

and

\begin{equation} \mathrm{P}(\it{s}_B| \it{s}_A) = \frac{\mathrm{P}(\it{s}_A  \cap \it{s}_B)}{\mathrm{P}(\it{s}_A)} \label{eq:bayes2}\end{equation}
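
With the same assumed cell memberships as in the earlier snippet, both conditionals can be computed directly as ratios of cell counts:

```python
# Conditional probabilities as ratios of cell counts, using the same
# assumed subspaces as before (the figure itself is not reproduced).
S_A = {"s1", "s2", "s5", "s6"}
S_B = {"s5", "s6", "s7", "s8"}

p_joint = len(S_A & S_B) / 16          # P(s_A ∩ s_B) = 2/16
p_A = len(S_A) / 16                    # P(s_A)       = 4/16
p_B = len(S_B) / 16                    # P(s_B)       = 4/16

print(p_joint / p_B)   # P(s_A | s_B) = 0.5: half of S_B's cells lie in S_A
print(p_joint / p_A)   # P(s_B | s_A) = 0.5: half of S_A's cells lie in S_B
```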

Notice how these two equations are not the same, but the probability in the joint space is: $\mathrm{P}(\it{S}_A \cap \it{S}_B) = \mathrm{P}(\it{S}_B \cap \it{S}_A)$. This had to be the case just by looking at the illustration with the colored cells above.

The key is that we can now determine the conditional probabilities, that is, the probability of a cell in one subspace given a cell in the other subspace has been picked or occurred, by rearranging eq. \ref{eq:bayes1} and eq. \ref{eq:bayes2} for the joint probability and then substituting terms to get:

\begin{equation*} \mathrm{P}(\it{s}_A | \it{s}_B) \mathrm{P}(\it{s}_B) = \mathrm{P}(\it{s}_B | \it{s}_A) \mathrm{P}(\it{s}_A)\end{equation*}

which is rearranged to get the typical Bayes formula:

\begin{equation}\mathrm{P}\left(\it{s}_A | \it{s}_B\right)  = \frac{\mathrm{P}\left(\it{s}_B | \it{s}_A\right) \mathrm{P}\left(\it{s}_A\right)}{\mathrm{P}\left(\it{s}_B\right)} \label{eq:bayesformula}\end{equation}
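
As a quick numerical sanity check (again with the assumed subspaces, not the post's actual figure), the Bayes formula reproduces the conditional computed directly from counts:

```python
# Confirm eq. (3): P(s_A | s_B) = P(s_B | s_A) P(s_A) / P(s_B),
# using the same assumed subspaces as in the earlier snippets.
S_A = {"s1", "s2", "s5", "s6"}
S_B = {"s5", "s6", "s7", "s8"}

p_A = len(S_A) / 16
p_B = len(S_B) / 16
p_B_given_A = len(S_A & S_B) / len(S_A)   # direct count within S_A

p_A_given_B_bayes = p_B_given_A * p_A / p_B
p_A_given_B_direct = len(S_A & S_B) / len(S_B)

assert p_A_given_B_bayes == p_A_given_B_direct  # both equal 0.5
print(p_A_given_B_bayes)
```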

At first, eq. \ref{eq:bayesformula} might seem unremarkable; after all, it is just an outcome of analyzing probabilities of subspaces. The real impact is how one can use this equation to update knowledge. Let us break down the terms in eq. \ref{eq:bayesformula}.

The first term in the numerator is called the likelihood. It indicates how probable an event in $\it{S}_B$ is given that an event in $\it{S}_A$ occurs. It can also represent the probability of the observed data given the model and its parameters. The second term in the numerator, the prior, encodes previous knowledge about the observations or parameters (i.e., a prior over parameters). Finally, the denominator can be interpreted as the probability of observing a cell in $\it{S}_B$, or you can think of it as the probability of the data averaged over all possible values of the model parameters.

An important aspect of eq. $\ref{eq:bayesformula}$ is that the denominator normalizes the posterior: in the case of probability density functions, the posterior integrates to one. This just means that over the whole space of outcomes, something must have happened.

In the example given, the probabilities are just uniform discrete values, so we obtain a posterior probability that is just a number representing our updated knowledge about the probability of a cell in $\it{S}_{A}$ given the cell is in $\it{S}_{B}$. This is a particularly simple and maybe intuitive outcome. What is typically more useful is the case where we have a probability density function that represents our prior knowledge about an event/outcome and we want to determine the posterior distribution. We then choose a likelihood that encodes information about what has been observed and make inferences by sampling the constructed posterior distribution, as in the sketch below.
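
The post does not specify a particular model for this continuous case, so the following is a hedged illustration using a standard Beta-Binomial conjugate pair: a Beta prior encodes belief about a coin's bias, a Binomial likelihood encodes the observed flips, and conjugacy makes the posterior another Beta distribution we can sample.

```python
# Illustrative Beta-Binomial update (my choice of example, not the post's).
import numpy as np

a, b = 2.0, 2.0            # Beta(2, 2) prior: mild belief the coin is fair
heads, flips = 7, 10       # observed data

# Conjugacy gives the posterior in closed form: Beta(a + heads, b + tails).
a_post, b_post = a + heads, b + (flips - heads)

# Make inferences by sampling the posterior, as described in the text.
rng = np.random.default_rng(0)
samples = rng.beta(a_post, b_post, size=10_000)
print(samples.mean())      # posterior mean estimate, near 9/14 ≈ 0.643
```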


