Diagnostics Reference¶
We provide access to several diagnostics. The in-house log_l diagnostic is an ad hoc comparison of your model’s log-likelihood (LLH) to the actual LLH. All other diagnostics are currently provided by the sbi package.
Diagnostics¶
- class SBCDiagnostic(simulator, inference_handler, plot_dir)[source]¶
Posterior calibration diagnostics via Simulation-Based Calibration.
Wraps the sbi SBC, expected-coverage, and TARP diagnostics behind a common interface. All three methods share a single pool of prior predictive samples generated by create_prior_samples(), which must be called before any plot method.
The general workflow is:
- Construct the object — this calls build_posterior() on the provided handler.
- Call create_prior_samples() to draw θ ~ prior and simulate the corresponding observables x ~ p(x | θ).
- Call any combination of rank_plot(), expected_coverage(), and tarp(). Each saves a PDF to plot_dir.
- Parameters:
  - simulator (Simulator) – Simulator object used to generate prior predictive samples. Must implement SimulatorProtocol.
  - inference_handler (InferenceHandler) – Trained InferenceHandler whose posterior is used for all diagnostic evaluations.
  - plot_dir (Path) – Directory where output PDFs are written. Created automatically if it does not exist.
- create_prior_samples(num_prior_samples)[source]¶
Draw samples from the prior and generate corresponding prior predictives.
Samples θ ~ prior using the inference handler’s prior, then runs each θ through the simulator to produce the prior predictive observables x ~ p(x | θ). Both tensors are stored on the instance and reused by all subsequent diagnostic methods.
- Parameters:
  - num_prior_samples (int) – Number of (θ, x) pairs to generate.
- Return type:
  None
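In miniature, the sampling step looks like the following sketch, where a toy Gaussian prior and simulator stand in for the handler’s prior and the Simulator (both are illustrative assumptions, not the package’s internals):

```python
import numpy as np

rng = np.random.default_rng(0)
num_prior_samples = 500

# Toy stand-ins: theta ~ N(0, 1), then x | theta ~ N(theta, 0.5^2).
theta = rng.normal(0.0, 1.0, size=num_prior_samples)   # theta ~ prior
x = rng.normal(theta, 0.5)                             # x ~ p(x | theta)

# Both arrays are kept together and reused by every diagnostic, mirroring
# how create_prior_samples() stores its tensors on the instance.
print(theta.shape, x.shape)
```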
- rank_plot(num_posterior_samples=1000, num_rank_bins=20)[source]¶
Produce an SBC rank-uniformity plot and save it to plot_dir.
For each prior sample θ*, draws num_posterior_samples samples from p(θ | x*) and computes the rank of θ* within those samples. Under a well-calibrated posterior the ranks are uniformly distributed; systematic deviations indicate over- or under-dispersion.
Saves rank_plot.pdf to plot_dir.
- Parameters:
  - num_posterior_samples (int) – Number of posterior samples drawn per prior sample when computing ranks.
  - num_rank_bins (int) – Number of histogram bins in the rank plot.
- Raises:
  ValueError – If create_prior_samples() has not been called.
- Return type:
  None
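The rank computation can be illustrated on a toy conjugate-Gaussian model where the exact posterior is known in closed form, so the ranks are uniform by construction (the model and all names below are illustrative assumptions, not the method’s implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 0.25                       # toy simulator noise variance
num_prior, num_post = 2000, 1000

theta = rng.normal(0.0, 1.0, size=num_prior)      # theta* ~ prior
x = rng.normal(theta, np.sqrt(sigma2))            # x* ~ p(x | theta*)

# Exact posterior for this conjugate toy model: N(x/(1+s2), s2/(1+s2)).
post_mean = x / (1.0 + sigma2)
post_std = np.sqrt(sigma2 / (1.0 + sigma2))
post_samples = rng.normal(post_mean, post_std, size=(num_post, num_prior))

# Rank of each theta* within its own set of posterior samples.
ranks = (post_samples < theta).sum(axis=0)

# Under calibration the ranks are ~uniform on {0, ..., num_post}; the
# histogram of ranks is what rank_plot() visualises.
hist, _ = np.histogram(ranks, bins=20, range=(0, num_post))
print(hist.mean())   # average count per bin: num_prior / 20 = 100
```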
- expected_coverage(num_posterior_samples=1000, num_rank_bins=20)[source]¶
Produce an expected-coverage (CDF) plot and save it to plot_dir.
Uses the negative log-probability of the posterior as a test statistic. The empirical coverage should match the nominal level for a well-calibrated posterior: the CDF curve should lie on the diagonal. Curves above the diagonal indicate over-coverage (conservative posterior); curves below indicate under-coverage (overconfident posterior).
Saves expected_coverage.pdf to plot_dir.
- Parameters:
  - num_posterior_samples (int) – Number of posterior samples drawn per prior sample when computing the test statistic.
  - num_rank_bins (int) – Number of bins used when constructing the empirical CDF.
- Raises:
  ValueError – If create_prior_samples() has not been called, or if posterior is None.
- Return type:
  None
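The underlying coverage computation can be sketched on a toy conjugate-Gaussian model with a known posterior (all names and the model are illustrative assumptions). The squared standardised residual is used as a monotone transform of the negative posterior log-probability, so it defines the same highest-density regions:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 0.25
num_prior, num_post = 2000, 1000

theta = rng.normal(0.0, 1.0, size=num_prior)
x = rng.normal(theta, np.sqrt(sigma2))
mu = x / (1.0 + sigma2)                     # exact posterior mean
sd = np.sqrt(sigma2 / (1.0 + sigma2))       # exact posterior std
samples = rng.normal(mu, sd, size=(num_post, num_prior))

# Test statistic: monotone in -log q(theta | x) for a Gaussian posterior.
stat_samples = ((samples - mu) / sd) ** 2
stat_theta = ((theta - mu) / sd) ** 2

# theta* lies inside the alpha-credible (highest-density) region iff its
# statistic falls below the alpha-quantile over the posterior samples.
alphas = np.linspace(0.05, 0.95, 19)
coverage = np.array([
    (stat_theta <= np.quantile(stat_samples, a, axis=0)).mean() for a in alphas
])
# For a calibrated posterior the curve tracks the diagonal.
print(np.abs(coverage - alphas).max())
```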
- tarp(num_posterior_samples=1000)[source]¶
Produce a TARP diagnostic plot and save it to plot_dir.
TARP (Test of Accuracy with Random Points) is a global calibration test that avoids the marginalisation assumptions of rank-based methods. It computes the empirical coverage probability (ECP) as a function of the credibility level α and reports two summary statistics:
- ATC (Area To Curve) — should be close to 0 for a well-calibrated posterior; positive values indicate over-coverage, negative values indicate under-coverage.
- KS p-value — from a Kolmogorov–Smirnov test against the diagonal; a large p-value (> 0.05) is consistent with calibration.
Both statistics are logged at INFO level. Saves tarp.pdf to plot_dir.
- Parameters:
  - num_posterior_samples (int) – Number of posterior samples drawn per prior sample when estimating the ECP.
- Raises:
  ValueError – If create_prior_samples() has not been called.
- Return type:
  None
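The ECP construction can be sketched on a toy conjugate-Gaussian model (illustrative assumptions throughout: the model, the prior used for reference points, and the ATC-style summary below are stand-ins, not the package’s exact implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, num_prior, num_post = 0.25, 2000, 1000

# Toy model with a closed-form posterior.
theta = rng.normal(0.0, 1.0, size=num_prior)
x = rng.normal(theta, np.sqrt(sigma2))
mu = x / (1.0 + sigma2)
sd = np.sqrt(sigma2 / (1.0 + sigma2))
samples = rng.normal(mu, sd, size=(num_post, num_prior))

# One random reference point per case; record the fraction of posterior
# samples that lie closer to the reference than theta* does.
ref = rng.normal(0.0, 1.0, size=num_prior)
f = (np.abs(samples - ref) < np.abs(theta - ref)).mean(axis=0)

# ECP as a function of the credibility level alpha; on the diagonal when
# the posterior is calibrated.
alphas = np.linspace(0.0, 1.0, 101)
ecp = np.array([(f <= a).mean() for a in alphas])

# Signed area between the ECP curve and the diagonal (an ATC-style summary);
# close to 0 for a calibrated posterior.
atc = (ecp - alphas).mean()
print(round(atc, 4))
```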
- compare_logl(simulator, inference_handler, n_samples, n_bins=100, likelihood_range=None, save_path=None)[source]¶
Compares the log-likelihood (LLH) of the actual model with that of the simulator.
- Parameters:
  - simulator (Simulator) – The simulator.
  - inference_handler (InferenceHandler) – The inference handler.
  - n_samples (int) – Number of samples to draw.
  - n_bins (int) – Number of histogram bins.
  - likelihood_range (tuple[float, float] | None) – Range of likelihood values to include; defaults to None.
  - save_path (Path | None) – Where to save the plot; defaults to None.
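The idea behind the comparison can be sketched with a toy example, histogramming the LLH of the same samples under an “actual” model and a slightly mis-specified surrogate (the Gaussians, the shared histogram range, and all names below are illustrative assumptions, not this function’s implementation):

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_bins = 5000, 100

# Toy stand-ins: actual model N(0, 1); surrogate simulator N(0, 1.1^2).
x = rng.normal(0.0, 1.0, size=n_samples)

def gauss_logl(x, mu, sigma):
    """Per-sample Gaussian log-likelihood."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

logl_actual = gauss_logl(x, 0.0, 1.0)
logl_sim = gauss_logl(x, 0.0, 1.1)

# Histogram both LLH distributions on a shared range so they are comparable
# (in compare_logl this range can be fixed via likelihood_range).
lo = min(logl_actual.min(), logl_sim.min())
hi = max(logl_actual.max(), logl_sim.max())
h_actual, edges = np.histogram(logl_actual, bins=n_bins, range=(lo, hi))
h_sim, _ = np.histogram(logl_sim, bins=n_bins, range=(lo, hi))
print(h_actual.sum(), h_sim.sum())
```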