Hyper-parameters
This short guide illustrates how to estimate hyper-parameters from posterior samples using bilby, working through a simplified problem.
Setting up the problem
We are given three data sets labelled by a, b, and c. Each data set consists of \(N\) observations of a variable \(y\) taken at values of an independent variable \(x\). Notationally, we can write these three data sets as \((y_i^a, x_i^a), (y_i^b, x_i^b), (y_i^c, x_i^c)\), where \(i \in \{1, \ldots, N\}\) indexes the observations within each data set.
Plotting the data, we see that all three look like they could be modelled by a linear function with some slope and some intercept:
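The guide does not include the code used to generate the figures, but here is a minimal sketch of how data with this structure could be simulated and plotted; the true values, noise level, and seed below are illustrative assumptions, not the values behind the figures in this guide.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

N = 10                                  # observations per data set
true_mu_c0, true_sigma_c0 = 3.0, 0.5    # population the intercepts are drawn from
noise_sigma = 1.0                       # Gaussian measurement noise on y

data = {}
for label in ["a", "b", "c"]:
    c0 = rng.normal(true_mu_c0, true_sigma_c0)  # each set gets its own intercept
    c1 = rng.uniform(-1, 1)                     # and its own gradient
    x = rng.uniform(0, 10, N)
    y = c0 + c1 * x + rng.normal(0, noise_sigma, N)
    data[label] = (x, y)
    plt.scatter(x, y, label=label)

plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```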
Given any individual data set, you could write down a linear model \(y(x) = c_0 + c_1 x\) and infer the intercept and gradient. For example, given the a-data set, you could calculate \(P(c_0|y_i^a, x_i^a)\) by fitting the model to the data. Here is a figure demonstrating the posteriors on the intercept for the three data sets given above:
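Continuing from the simulated data above, a sketch of these individual fits using bilby's built-in GaussianLikelihood might look as follows; the prior ranges and sampler settings are illustrative assumptions.

```python
import bilby

def model(x, c0, c1):
    # The linear model y(x) = c0 + c1 * x
    return c0 + c1 * x

results = {}
for label, (x, y) in data.items():
    # Gaussian likelihood with known noise level (from the simulation sketch)
    likelihood = bilby.core.likelihood.GaussianLikelihood(
        x, y, model, sigma=noise_sigma)
    priors = dict(
        c0=bilby.core.prior.Uniform(0, 10, "c0"),
        c1=bilby.core.prior.Uniform(-5, 5, "c1"),
    )
    results[label] = bilby.run_sampler(
        likelihood=likelihood, priors=priors, sampler="dynesty",
        nlive=500, outdir="outdir", label=label)
```

Each `results[label].posterior` is then a pandas DataFrame whose `c0` column holds the intercept samples used below.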
While the data look noisy, the overlap in the \(c_0\) posteriors might (especially given context about how the data were produced and the physical setting) make you believe all three data sets share a common intercept. How would you go about estimating it? You could simply take the mean of the means of the three posteriors, and that would give you a reasonable point estimate. However, we'll now discuss how to do this with hyper-parameters.
Understanding the population using hyper-parameters
We first need to define our hyper-parameterized model. In this case, it is as simple as

\[ c_0 \sim \mathcal{N}(\mu, \sigma). \]

That is, we model the population \(c_0\) (from which each of the data sets was drawn) as coming from a normal distribution with some mean \(\mu\) and some standard deviation \(\sigma\).
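As a sketch, this population model can be written as a simple density function (the name `hyper_prior` is our own choice, not part of the bilby API):

```python
import numpy as np

def hyper_prior(c0, mu, sigma):
    """Density of an intercept c0 under the population model N(mu, sigma)."""
    return np.exp(-((c0 - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
```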
The likelihood of the three data sets given the hyper-parameters \(\Lambda = (\mu, \sigma)\) is found by marginalising each data set's likelihood over its own intercept:

\[ \mathcal{L}(\{d_k\} | \Lambda) = \prod_{k \in \{a, b, c\}} \int \mathcal{L}(d_k | c_0)\, \pi(c_0 | \Lambda)\, \mathrm{d}c_0 . \]

By Bayes' theorem, the single-set likelihood can be written in terms of the posterior from the individual fit, \(\mathcal{L}(d_k | c_0) = \mathcal{Z}_k\, p(c_0 | d_k) / \pi(c_0)\), where \(\pi(c_0)\) is the prior used in that fit and \(\mathcal{Z}_k\) its evidence. Each integral is then an expectation value over the posterior, which can be approximated using the \(n_k\) posterior samples from the individual fits:

\[ \mathcal{L}(\{d_k\} | \Lambda) \approx \prod_k \frac{\mathcal{Z}_k}{n_k} \sum_{i=1}^{n_k} \frac{\pi(c_0^{k,i} | \Lambda)}{\pi(c_0^{k,i})} . \]

Since the evidences \(\mathcal{Z}_k\) do not depend on \(\Lambda\), they only set the overall normalisation of the hyper-parameter posterior.
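Putting the pieces together, here is a sketch of this recycled-samples likelihood as a custom bilby likelihood class, reusing the `hyper_prior` function and `results` dictionary from the sketches above. It drops the constant evidences and assumes the individual fits used the \(\mathrm{Uniform}(0, 10)\) prior on \(c_0\) from the earlier sketch; note that bilby also ships a ready-made `HyperparameterLikelihood` in `bilby.hyper` for this purpose.

```python
import numpy as np
import bilby

class InterceptHyperLikelihood(bilby.Likelihood):
    """Monte Carlo approximation of the hyper-parameter likelihood,
    dropping the constant evidences Z_k (they do not depend on mu, sigma)."""

    def __init__(self, c0_samples, prior_width):
        super().__init__(parameters={"mu": None, "sigma": None})
        self.c0_samples = c0_samples    # one array of posterior samples per data set
        self.prior_width = prior_width  # width of the uniform sampling prior on c0

    def log_likelihood(self):
        mu = self.parameters["mu"]
        sigma = self.parameters["sigma"]
        log_l = 0.0
        for c0 in self.c0_samples:
            # pi(c0 | mu, sigma) / pi(c0), with pi(c0) = 1 / prior_width
            weights = hyper_prior(c0, mu, sigma) * self.prior_width
            log_l += np.log(np.mean(weights))
        return log_l

samples = [results[label].posterior["c0"].values for label in "abc"]
hyper_likelihood = InterceptHyperLikelihood(samples, prior_width=10)

hyper_priors = dict(
    mu=bilby.core.prior.Uniform(0, 10, "mu"),
    sigma=bilby.core.prior.Uniform(0.01, 5, "sigma"),  # keep sigma away from zero
)
hyper_result = bilby.run_sampler(
    likelihood=hyper_likelihood, priors=hyper_priors, sampler="dynesty",
    nlive=500, outdir="outdir", label="hyper")
hyper_result.plot_corner()
```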
For the samples in the figure above, the posterior on these hyper-parameters is given by: