How likely are we to observe the data if the parameters θ of our model are correct?

When you have your model, it is usually straightforward to calculate this likelihood. For a simple example, if we model coin tosses as a fair coin being tossed, and we see two heads in a row, $p(\text{data} \mid \theta)$ is given by:

$$
\begin{aligned}
p(\text{data} \mid \theta) &= p(H \mid \theta) \times p(H \mid \theta) \\
&= \frac{1}{2} \times \frac{1}{2} \\
&= \frac{1}{4}
\end{aligned}
$$
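The calculation above generalises to any sequence of independent tosses: multiply the per-toss probabilities together. A minimal sketch in Python (the function name and data encoding are illustrative, not from the source):

```python
def likelihood(data, theta):
    """p(data | theta) for independent coin tosses.

    data  -- a string of tosses, e.g. "HH" for two heads
    theta -- the model's probability of heads
    """
    p = 1.0
    for toss in data:
        # Each toss contributes theta (heads) or 1 - theta (tails).
        p *= theta if toss == "H" else (1 - theta)
    return p

print(likelihood("HH", 0.5))  # 0.25, matching the worked example
```

With a fair coin (theta = 0.5) every two-toss sequence has likelihood 1/4; the likelihood only distinguishes sequences once theta departs from 0.5.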

How to pick a good model?

Picking a good model that you can parameterise is a key part of this process. Ben Lambert's book gives you this framework for picking a good model:

  1. Write down the real-life behaviour that the model should be capable of explaining.
  2. Write down the assumptions that you believe are reasonable to achieve step 1.
  3. Look for probability models that are based on these assumptions. If necessary, combine multiple models to capture all the assumptions.
  4. After fitting the model to the data, test its ability to explain the behaviour identified in step 1. If unsuccessful, go back to step 2 and assess which of your assumptions are invalidated. Then choose a new, more general probability model that encompasses these new assumptions.
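The fit-then-test loop in step 4 can be sketched in code. This is a hypothetical illustration for the coin-toss model, not a procedure from the book: fit the heads probability by maximum likelihood, then crudely check whether the fitted model reproduces the observed heads rate.

```python
def fit_theta(data):
    """Maximum-likelihood estimate of p(heads) for independent tosses."""
    return data.count("H") / len(data)

def explains_behaviour(data, theta, tolerance=0.1):
    """Crude model check: does the model's predicted heads rate
    match the rate observed in the data, within a tolerance?"""
    observed_rate = data.count("H") / len(data)
    return abs(observed_rate - theta) <= tolerance

data = "HHTHHHTH"
theta = fit_theta(data)          # 0.75 for this sequence
print(theta, explains_behaviour(data, theta))
```

If the check fails for a model fixed in advance (say a fair coin, theta = 0.5, against heavily heads-biased data), that is the signal to revisit the assumptions in step 2 and choose a more general model, e.g. one where theta is itself unknown.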