Diagnostics Reference

We provide access to several diagnostics. The in-house “log_l” diagnostic is an ad hoc comparison of your model’s log-likelihood (LLH) to the actual LLH.

All other diagnostics are currently provided by the sbi package.

Diagnostics

class SBCDiagnostic(simulator, inference_handler, plot_dir)[source]

Posterior calibration diagnostics via Simulation-Based Calibration.

Wraps the sbi SBC, expected-coverage, and TARP diagnostics behind a common interface. All three methods share a single pool of prior predictive samples generated by create_prior_samples(), which must be called before any plot method.

The general workflow is:

  1. Construct the object — this calls build_posterior() on the provided handler.

  2. Call create_prior_samples() to draw θ ~ prior and simulate the corresponding observables x ~ p(x | θ).

  3. Call any combination of rank_plot(), expected_coverage(), and tarp(). Each saves a PDF to plot_dir.

Parameters:
  • simulator (Simulator) – Simulator object used to generate prior predictive samples. Must implement SimulatorProtocol.

  • inference_handler (InferenceHandler) – Trained InferenceHandler whose posterior is used for all diagnostic evaluations.

  • plot_dir (Path) – Directory where output PDFs are written. Created automatically if it does not exist.

create_prior_samples(num_prior_samples)[source]

Draw samples from the prior and generate corresponding prior predictives.

Samples θ ~ prior using the inference handler’s prior, then runs each θ through the simulator to produce the prior predictive observables x ~ p(x | θ). Both tensors are stored on the instance and reused by all subsequent diagnostic methods.

Parameters:

num_prior_samples (int) – Number of (θ, x) pairs to generate.

Return type:

None
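The sampling step can be sketched in pure Python. Everything below is a hypothetical stand-in for illustration only: sample_prior plays the role of the inference handler’s prior and simulate the role of the Simulator; neither is part of the library.

```python
import random

random.seed(0)

def sample_prior():
    """theta ~ Uniform(0, 1) -- a stand-in for the handler's prior."""
    return random.uniform(0.0, 1.0)

def simulate(theta):
    """x ~ Normal(theta, 0.1) -- a stand-in for the Simulator."""
    return random.gauss(theta, 0.1)

def create_prior_samples(num_prior_samples):
    """Draw (theta, x) pairs, mirroring what create_prior_samples stores."""
    thetas = [sample_prior() for _ in range(num_prior_samples)]
    xs = [simulate(theta) for theta in thetas]
    return thetas, xs

thetas, xs = create_prior_samples(200)
```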

rank_plot(num_posterior_samples=1000, num_rank_bins=20)[source]

Produce an SBC rank-uniformity plot and save it to plot_dir.

For each prior sample θ*, draws num_posterior_samples samples from p(θ | x*) and computes the rank of θ* within those samples. Under a well-calibrated posterior the ranks are uniformly distributed; systematic deviations indicate over- or under-dispersion.

Saves rank_plot.pdf to plot_dir.

Parameters:
  • num_posterior_samples (int) – Number of posterior samples drawn per prior sample when computing ranks.

  • num_rank_bins (int) – Number of histogram bins in the rank plot.

Raises:

ValueError – If create_prior_samples() has not been called.

Return type:

None
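The rank computation can be illustrated with a toy conjugate-Gaussian model whose exact posterior is known in closed form; under the exact posterior the ranks are uniform by construction. The model and all names here are illustrative assumptions, not the library’s implementation.

```python
import random

random.seed(0)
sigma = 0.5                      # observation noise of the toy simulator
num_posterior_samples = 1000

def rank_of(theta_star, posterior_samples):
    """SBC rank: number of posterior samples below theta_star (0..N)."""
    return sum(1 for s in posterior_samples if s < theta_star)

ranks = []
for _ in range(200):
    theta_star = random.gauss(0.0, 1.0)      # theta* ~ prior N(0, 1)
    x = random.gauss(theta_star, sigma)      # x* ~ p(x | theta*)
    # Exact conjugate posterior N(mu, sd^2) stands in for the trained one.
    mu = x / (sigma**2 + 1.0)
    sd = (sigma**2 / (sigma**2 + 1.0)) ** 0.5
    post = [random.gauss(mu, sd) for _ in range(num_posterior_samples)]
    ranks.append(rank_of(theta_star, post))
```

Histogramming these ranks into num_rank_bins bins yields the rank-uniformity plot; a U-shaped histogram signals under-dispersion and an inverted-U shape over-dispersion.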

expected_coverage(num_posterior_samples=1000, num_rank_bins=20)[source]

Produce an expected-coverage (CDF) plot and save it to plot_dir.

Uses the negative log-probability of the posterior as a test statistic. The empirical coverage should match the nominal level for a well-calibrated posterior: the CDF curve should lie on the diagonal. Curves above the diagonal indicate over-coverage (conservative posterior); curves below indicate under-coverage (overconfident posterior).

Saves expected_coverage.pdf to plot_dir.

Parameters:
  • num_posterior_samples (int) – Number of posterior samples drawn per prior sample when computing the test statistic.

  • num_rank_bins (int) – Number of bins used when constructing the empirical CDF.

Raises:

ValueError – If create_prior_samples() has not been called, or if posterior is None.

Return type:

None
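The coverage calculation can be sketched with the same toy conjugate-Gaussian setup (an illustrative assumption, not the library’s code). For a Gaussian posterior the negative log-probability is monotone in the distance from the mean, so the credibility level of each θ* can be computed by comparing distances:

```python
import random

random.seed(1)
sigma = 0.5
credibilities = []
for _ in range(300):
    theta_star = random.gauss(0.0, 1.0)      # theta* ~ prior N(0, 1)
    x = random.gauss(theta_star, sigma)      # x* ~ p(x | theta*)
    mu = x / (sigma**2 + 1.0)                # exact conjugate posterior
    sd = (sigma**2 / (sigma**2 + 1.0)) ** 0.5
    post = [random.gauss(mu, sd) for _ in range(500)]
    # Higher -log p means farther from the mean, so the credibility of
    # theta* is the posterior mass that lies closer to mu than it does.
    cred = sum(1 for s in post if abs(s - mu) < abs(theta_star - mu)) / len(post)
    credibilities.append(cred)

def empirical_coverage(alpha):
    """Fraction of simulations whose credibility is at most alpha."""
    return sum(1 for c in credibilities if c <= alpha) / len(credibilities)
```

Plotting empirical_coverage(α) against α produces the CDF curve; for this exact posterior it tracks the diagonal.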

tarp(num_posterior_samples=1000)[source]

Produce a TARP diagnostic plot and save it to plot_dir.

TARP (Test of Accuracy with Random Points) is a global calibration test that avoids the marginalisation assumptions of rank-based methods. It computes the empirical coverage probability (ECP) as a function of the credibility level α and reports two summary statistics:

  • ATC (Area To Curve) — should be close to 0 for a well-calibrated posterior; positive values indicate over-coverage, negative values indicate under-coverage.

  • KS p-value — from a Kolmogorov–Smirnov test against the diagonal; a large p-value (> 0.05) is consistent with calibration.

Both statistics are logged at INFO level. Saves tarp.pdf to plot_dir.

Parameters:

num_posterior_samples (int) – Number of posterior samples drawn per prior sample when estimating the ECP.

Raises:

ValueError – If create_prior_samples() has not been called.

Return type:

None
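The ECP computation behind TARP can be sketched in one dimension with the same toy conjugate-Gaussian setup (an illustrative assumption, not the library’s code): for each simulation, draw a random reference point, find the ball around it whose boundary passes through θ*, and record the posterior mass inside that ball. Under a calibrated posterior these credibility values are uniform, so ECP(α) ≈ α.

```python
import random

random.seed(2)
sigma, num_post = 0.5, 300
credibilities = []
for _ in range(300):
    theta_star = random.gauss(0.0, 1.0)      # theta* ~ prior N(0, 1)
    x = random.gauss(theta_star, sigma)      # x* ~ p(x | theta*)
    mu = x / (sigma**2 + 1.0)                # exact conjugate posterior
    sd = (sigma**2 / (sigma**2 + 1.0)) ** 0.5
    post = [random.gauss(mu, sd) for _ in range(num_post)]
    ref = random.uniform(-4.0, 4.0)          # random reference point
    radius = abs(ref - theta_star)
    # Credibility of theta*: posterior mass inside the ball around the
    # reference point whose boundary passes through theta*.
    credibilities.append(sum(1 for s in post if abs(s - ref) < radius) / num_post)

def ecp(alpha):
    """Empirical coverage probability at credibility level alpha."""
    return sum(1 for c in credibilities if c <= alpha) / len(credibilities)
```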

compare_logl(simulator, inference_handler, n_samples, n_bins=100, likelihood_range=None, save_path=None)[source]

Compare the trained model’s log-likelihood (LLH) against the actual LLH from the simulator.

Parameters:
  • simulator (Simulator) – The simulator

  • inference_handler (InferenceHandler) – The inference handler

  • n_samples (int) – Number of samples to draw

  • save_path (Path | None) – Where to save the output; defaults to None.

  • n_bins (int) – Number of histogram bins used in the comparison.

  • likelihood_range (tuple[float, float] | None) – Optional (min, max) range of likelihood values to compare; defaults to None.
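The idea behind this comparison can be sketched with two hypothetical densities: an “actual” likelihood and a slightly misspecified learned one. Both densities, and the mean-gap summary at the end, are illustrative assumptions; the n_bins and likelihood_range parameters suggest the real implementation compares binned LLH distributions instead.

```python
import math
import random

random.seed(3)

def logl_actual(x):
    """Actual LLH: standard normal density (toy stand-in)."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def logl_model(x):
    """Learned model's LLH: slightly wider normal (toy stand-in)."""
    s = 1.1
    return -0.5 * (x / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)

# Evaluate both LLHs on the same drawn observables.
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
actual_ll = [logl_actual(x) for x in xs]
model_ll = [logl_model(x) for x in xs]

# If the learned model matches, the per-sample LLH gap is close to zero.
mean_gap = sum(a - m for a, m in zip(actual_ll, model_ll)) / len(xs)
```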