Derivation of the normalizing constant for the posterior

I watched the Introduction To Bayesian Inference video lectures by Chris Bishop and decided to derive the Bayesian inference equation from scratch to make sure I could follow the lecturer.

The posterior can be written as

p(\theta \mid \hat{x}, \mathrm{X}) = \frac{p(\theta, \hat{x}, \mathrm{X})}{p(\hat{x}, \mathrm{X})}

That is the definition of conditional probability (Grimmett and Stirzaker, 2001, p. 9).

Rewrite the numerator as (see Exercise (1.4.2) in Grimmett & Stirzaker, or the chain rule of probability on Wikipedia)

p(\theta, \hat{x}, \mathrm{X}) = p(\hat{x} \mid \mathrm{X}, \theta) p(\mathrm{X} \mid \theta) p(\theta)
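Spelled out, this is just the definition of conditional probability applied twice:

p(\theta, \hat{x}, \mathrm{X}) = p(\hat{x} \mid \mathrm{X}, \theta) p(\mathrm{X}, \theta) = p(\hat{x} \mid \mathrm{X}, \theta) p(\mathrm{X} \mid \theta) p(\theta)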

Now apply Bayes’s formula to p(\mathrm{X} \mid \theta).
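The identity used here is

p(\mathrm{X} \mid \theta) = \frac{p(\theta \mid \mathrm{X}) p(\mathrm{X})}{p(\theta)}

which gives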

\frac{p(\theta, \hat{x}, \mathrm{X})}{p(\hat{x}, \mathrm{X})} = \frac{p(\hat{x} \mid \mathrm{X}, \theta) p(\theta \mid \mathrm{X}) p(\mathrm{X}) p(\theta)}{p(\theta) p(\hat{x}, \mathrm{X})}

Then substitute

p(\hat{x},\mathrm{X}) = p(\hat{x} \mid \mathrm{X}) p(\mathrm{X})
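so that the right-hand side becomes

\frac{p(\hat{x} \mid \mathrm{X}, \theta) p(\theta \mid \mathrm{X}) p(\mathrm{X}) p(\theta)}{p(\theta) p(\hat{x} \mid \mathrm{X}) p(\mathrm{X})}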

After cancelling the common factors p(\theta) and p(\mathrm{X}), and noting that \hat{x} is conditionally independent of \mathrm{X} given \theta (so p(\hat{x} \mid \mathrm{X}, \theta) = p(\hat{x} \mid \theta)), we get the posterior

p(\theta \mid \hat{x}, \mathrm{X}) = \frac{1}{p(\hat{x} \mid \mathrm{X})} \cdot p(\hat{x} \mid \theta) p(\theta \mid \mathrm{X})
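As a quick sanity check (my own toy example, not from the lecture), the identity can also be verified numerically on a small discrete model in which \hat{x} and \mathrm{X} are conditionally independent given \theta:

import numpy as np

rng = np.random.default_rng(0)

# Toy discrete model: theta takes 3 values, X and x_hat take 2 values each.
# Conditional independence is built in:
# p(theta, X, x_hat) = p(theta) p(X | theta) p(x_hat | theta)
p_theta = rng.dirichlet(np.ones(3))                # prior p(theta)
p_X_given_theta = rng.dirichlet(np.ones(2), 3)     # row t is p(X | theta = t)
p_xhat_given_theta = rng.dirichlet(np.ones(2), 3)  # row t is p(x_hat | theta = t)

# Full joint p(theta, X, x_hat) as an array of shape (3, 2, 2)
joint = (p_theta[:, None, None]
         * p_X_given_theta[:, :, None]
         * p_xhat_given_theta[:, None, :])

X, xhat = 1, 0  # an arbitrary observed pair

# Left-hand side: posterior p(theta | x_hat, X) by conditioning directly
lhs = joint[:, X, xhat] / joint[:, X, xhat].sum()

# Right-hand side: p(x_hat | theta) p(theta | X) / p(x_hat | X)
p_theta_given_X = joint[:, X, :].sum(axis=1)
p_theta_given_X = p_theta_given_X / p_theta_given_X.sum()
p_xhat_given_X = joint[:, X, xhat].sum() / joint[:, X, :].sum()
rhs = p_xhat_given_theta[:, xhat] * p_theta_given_X / p_xhat_given_X

print(np.allclose(lhs, rhs))  # True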

This turned out to be useful for a friend, so I posted it here.