The evidence is the tricky bit, and it is what makes Bayesian inference difficult. This term describes the probability of observing the data under the model, averaged over all parameter settings. Note that this term is independent of the parameters θ of the model.

We can calculate this by integrating over all possible parameter values.

p(data) = \int_{\theta} p(data, \theta) \, d\theta

This integral is what makes Bayes' rule so tricky. For non-trivial models there is no closed-form way to calculate it, and every additional parameter adds another dimension to integrate over. For real-world models it seems like we're in trouble.
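To make the integral concrete, here is a minimal sketch (the function names and the coin-flip example are my own, not from the text) that computes the evidence for a one-parameter model by brute-force numerical integration:

```python
# Sketch: the evidence p(data) for a coin-flip model with a uniform prior
# on the bias theta, computed by numerically integrating the joint
# p(data, theta) = p(data | theta) p(theta) over theta in [0, 1].

def likelihood(theta, heads, tails):
    """Binomial likelihood (up to a constant) of the data given bias theta."""
    return theta**heads * (1 - theta)**tails

def evidence(heads, tails, steps=100_000):
    """Approximate p(data) = \u222b p(data|theta) p(theta) dtheta with a
    midpoint Riemann sum; the uniform prior contributes p(theta) = 1."""
    d_theta = 1.0 / steps
    return sum(likelihood((i + 0.5) * d_theta, heads, tails) * d_theta
               for i in range(steps))

# For 6 heads and 4 tails the exact value is the Beta function B(7, 5) = 1/2310.
print(evidence(6, 4))
```

Even here the cost scales with the resolution of the grid, and with d parameters a grid of the same resolution needs steps^d evaluations, which is exactly why this approach collapses for real-world models.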

Using sampling to avoid this calculation

To get an exact answer to this you need to use a conjugate prior: a family of distributions related to the likelihood in a mathematically convenient way, so that the posterior stays in the same family and can be written down in closed form. This is very limiting, and not suitable for the real world.
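As an illustration of how convenient conjugacy is when it applies (a sketch with my own function names; the Beta-binomial pair is a standard example, not one from the text): a Beta prior with a binomial likelihood yields a Beta posterior, so updating is just arithmetic on the parameters.

```python
# Conjugate-prior shortcut: Beta(a, b) prior + binomial likelihood
# gives a Beta(a + heads, b + tails) posterior. No integral needed.

def beta_binomial_posterior(a, b, heads, tails):
    """Posterior Beta parameters after observing `heads` successes
    and `tails` failures, starting from a Beta(a, b) prior."""
    return a + heads, b + tails

a_post, b_post = beta_binomial_posterior(1, 1, 6, 4)  # Beta(1, 1) = uniform prior
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)  # Beta(7, 5), mean 7/12
```

The catch is that conjugate pairs exist only for a short list of likelihoods, which is what makes the approach so limiting in practice.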

What we do these days is use computers to sidestep this problem by sampling. The trick depends on noticing that the evidence term doesn't depend on the parameters of the model. That means the shape of the posterior depends only on the numerator of Bayes' rule, so we can simplify what we are going to calculate like this:

p(\theta | data) = \frac{p(data | \theta) \, p(\theta)}{p(data)}

p(\theta | data) \propto p(data | \theta) \, p(\theta)
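A minimal sketch of how samplers exploit this (the Metropolis algorithm is a standard choice here, and the coin-flip example and names are my own assumptions): the accept/reject step only uses the *ratio* of posterior densities at two points, so the evidence p(data) cancels and we never compute it.

```python
import random

def unnormalised_posterior(theta, heads, tails):
    """likelihood * prior, i.e. the numerator of Bayes' rule;
    the prior is uniform on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return 0.0
    return theta**heads * (1 - theta)**tails

def metropolis(heads, tails, n_samples=20_000, step=0.1, seed=0):
    """Draw samples from the posterior using only the unnormalised density."""
    rng = random.Random(seed)
    theta = 0.5
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0, step)
        # p(data) would appear in both numerator and denominator
        # of this ratio, so it cancels out.
        ratio = (unnormalised_posterior(proposal, heads, tails) /
                 unnormalised_posterior(theta, heads, tails))
        if rng.random() < ratio:
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis(6, 4)
print(sum(samples) / len(samples))  # should be near the posterior mean 7/12
```

The samples approximate the posterior over θ, so any quantity we want (means, intervals, predictions) can be estimated from them, all without ever evaluating the evidence integral.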