# Derivation of the normalizing constant for the posterior

I watched Introduction to Bayesian Inference by Chris Bishop and decided to derive the Bayesian inference equation from scratch to make sure I was following the lecturer.

The posterior can be written as

$p{(\theta \mid \hat{x}, \mathrm{X} )} = \frac{p{(\theta, \hat{x}, \mathrm{X})}}{p{(\hat{x}, \mathrm{X})}}$

That is the definition of conditional probability (Grimmett and Stirzaker, 2001, p. 9).

Rewrite the numerator using the chain rule (see Exercise 1.4.2 in Grimmett & Stirzaker, or the chain rule of probability on Wikipedia)

$p{(\theta, \hat{x}, \mathrm{X})} = p(\hat{x} \mid \mathrm{X}, \theta) p(\mathrm{X} \mid \theta) p(\theta)$
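This factorization is easy to verify numerically. Below is a minimal sketch in Python/NumPy with a made-up discrete joint distribution over $\theta$, $\hat{x}$, $\mathrm{X}$ (the shapes and random values are arbitrary, purely for illustration): it rebuilds the joint from the three chain-rule factors and checks that the result matches.

```python
import numpy as np

# Hypothetical discrete joint p(theta, x_hat, X); values are arbitrary.
rng = np.random.default_rng(0)
joint = rng.random((3, 4, 5))   # axes: [theta, x_hat, X]
joint /= joint.sum()            # normalize to a proper joint distribution

p_theta = joint.sum(axis=(1, 2))                                  # p(theta)
p_X_given_theta = joint.sum(axis=1) / p_theta[:, None]            # p(X | theta)
p_xhat_given_X_theta = joint / joint.sum(axis=1, keepdims=True)   # p(x_hat | X, theta)

# Rebuild the joint from the chain-rule factors and compare.
rebuilt = (p_xhat_given_X_theta
           * p_X_given_theta[:, None, :]
           * p_theta[:, None, None])
assert np.allclose(joint, rebuilt)
```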

and apply Bayes’s formula, $p(\mathrm{X} \mid \theta) = \frac{p(\theta \mid \mathrm{X})\, p(\mathrm{X})}{p(\theta)}$, to the middle factor:

$\frac{p{(\theta, \hat{x}, \mathrm{X})}}{p{(\hat{x}, \mathrm{X})}} = \frac{p(\hat{x} \mid \mathrm{X}, \theta) p(\theta \mid \mathrm{X}) p(\mathrm{X})p(\theta)}{p(\theta)p(\hat{x},\mathrm{X})}$

then substitute

$p(\hat{x},\mathrm{X}) = p(\hat{x} \mid \mathrm{X}) p(\mathrm{X})$

After cancelling common factors, and noting that $p(\hat{x} \mid \mathrm{X}, \theta) = p(\hat{x} \mid \theta)$ (since $\hat{x}$ and $\mathrm{X}$ are conditionally independent given $\theta$), we get the posterior

$p{(\theta \mid \hat{x}, \mathrm{X} )} = \frac{1}{p(\hat{x} \mid \mathrm{X})} \cdot p(\hat{x} \mid \theta) p(\theta \mid \mathrm{X})$
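As a sanity check of the final identity, here is a small numerical sketch in Python/NumPy. It constructs a made-up discrete model in which $\hat{x}$ and $\mathrm{X}$ are conditionally independent given $\theta$ (the assumption used above), computes $p(\theta \mid \hat{x}, \mathrm{X})$ directly from the joint, and confirms it equals $\frac{1}{p(\hat{x} \mid \mathrm{X})} \, p(\hat{x} \mid \theta)\, p(\theta \mid \mathrm{X})$. All distributions here are arbitrary random tables, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(a, axis):
    return a / a.sum(axis=axis, keepdims=True)

# Hypothetical discrete model: x_hat and X are conditionally independent given theta.
p_theta = normalize(rng.random(3), axis=0)                   # p(theta)
p_X_given_theta = normalize(rng.random((3, 5)), axis=1)      # p(X | theta)
p_xhat_given_theta = normalize(rng.random((3, 4)), axis=1)   # p(x_hat | theta)

# Joint p(theta, x_hat, X) built from the conditional-independence structure.
joint = (p_theta[:, None, None]
         * p_xhat_given_theta[:, :, None]
         * p_X_given_theta[:, None, :])

# Left-hand side: p(theta | x_hat, X) directly from the joint.
lhs = joint / joint.sum(axis=0, keepdims=True)

# Right-hand side: p(x_hat | theta) p(theta | X) / p(x_hat | X).
p_theta_given_X = normalize(joint.sum(axis=1), axis=0)   # p(theta | X)
p_xhat_given_X = normalize(joint.sum(axis=0), axis=0)    # p(x_hat | X), the normalizing constant
rhs = (p_xhat_given_theta[:, :, None]
       * p_theta_given_X[:, None, :]
       / p_xhat_given_X[None, :, :])

assert np.allclose(lhs, rhs)
```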

This turned out to be useful for a friend, so I posted it here.