Data Loading

class ParaketDataset(data_folder, parameter_names, nuisance_params=None)

File-level PyTorch dataset over a folder of .feather simulation files.

Each __getitem__ call loads one feather file and returns a (theta, x) pair. Call to_tensor_dataset() before training to pre-load everything into RAM as a flat TensorDataset.

Parameters:
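  • data_folder (Path) – Folder containing the .feather simulation files.

  • parameter_names (list[str]) – Ordered parameter names used for nuisance filtering.

  • nuisance_params (list[str] | None) – fnmatch patterns for parameters to exclude from theta. If None, all parameters are kept.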
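The file-level access pattern described above can be sketched as follows. This is a minimal illustration, not ParaketDataset's implementation: the class name FolderDatasetSketch, the injected loader callable, the fake_loader stand-in, and the sorted file order are all assumptions made for the example.

```python
import tempfile
from pathlib import Path
from typing import Callable

import numpy as np

Pair = tuple[np.ndarray, np.ndarray]


class FolderDatasetSketch:
    """Hypothetical map-style dataset: one item per .feather file."""

    def __init__(self, data_folder: Path, loader: Callable[[Path], Pair]):
        # Sorting gives a stable index -> file mapping (an assumption here).
        self.files = sorted(Path(data_folder).glob("*.feather"))
        self.loader = loader

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, index: int) -> Pair:
        # Every access calls the loader again, i.e. one file read per item.
        return self.loader(self.files[index])


def fake_loader(path: Path) -> Pair:
    # Stand-in for a real feather reader: 4 samples, 3 parameters, 5 bins.
    theta = np.zeros((4, 3), dtype=np.float32)
    x = np.ones((4, 5), dtype=np.float32)
    return theta, x


with tempfile.TemporaryDirectory() as tmp:
    # Two simulation files plus one unrelated file that should be ignored.
    for name in ("a.feather", "b.feather", "notes.txt"):
        (Path(tmp) / name).touch()
    dataset = FolderDatasetSketch(Path(tmp), fake_loader)
    n_files = len(dataset)
    theta, x = dataset[0]
```

Because each item access triggers a file read, iterating such a dataset every epoch repeats the same disk I/O, which is the cost that pre-loading is meant to avoid.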

to_tensor_dataset(device='cpu')

Pre-load all feather files into a flat TensorDataset.

Concatenates all (theta, x) pairs along the sample dimension. This avoids repeated disk reads per epoch during training.

Parameters:

device (str) – Target device for the output tensors.

Return type:

TensorDataset

Returns:

A TensorDataset of (theta_tensor, x_tensor) with shapes (n_total_samples, n_params) and (n_total_samples, n_bins).
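The concatenation along the sample dimension can be illustrated with NumPy arrays standing in for tensors. This is a sketch under stated assumptions: the per-file sample counts (10, 20, 5), the 2 parameters, and the 4 bins are made up, and the real method returns a TensorDataset rather than bare arrays.

```python
import numpy as np

# Three hypothetical files with different sample counts,
# each holding 2 parameters and 4 bins.
rng = np.random.default_rng(0)
pairs = [
    (rng.random((n, 2), dtype=np.float32), rng.random((n, 4), dtype=np.float32))
    for n in (10, 20, 5)
]

# Stack all per-file arrays along axis 0, the sample dimension.
theta_all = np.concatenate([theta for theta, _ in pairs], axis=0)
x_all = np.concatenate([x for _, x in pairs], axis=0)
```

With PyTorch installed, the two arrays could then be wrapped as TensorDataset(torch.from_numpy(theta_all), torch.from_numpy(x_all)).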

Utilities

from_feather(file_name, parameter_names, nuisance_pars=None)

Load a (theta, x) pair from a feather file.

Parameters:
  • file_name (Path) – Path to the .feather file.

  • parameter_names (list[str]) – Ordered parameter names used for nuisance filtering.

  • nuisance_pars (list[str] | None) – fnmatch patterns for parameters to exclude from theta. If None, all parameters are returned.

Return type:

tuple[ndarray[tuple[Any, ...], dtype[float32]], ndarray[tuple[Any, ...], dtype[float32]]]

Returns:

Tuple of (theta, x) as float32 numpy arrays.

Raises:

FileNotFoundError – If file_name does not exist.

to_feather(file_name, theta_values, x_values)

Write a (theta, x) pair to a feather file.

Parameters:
  • file_name (Path) – Destination path. Must end in .feather.

  • theta_values (ndarray[tuple[Any, ...], dtype[float32]]) – Parameter array of shape (n_samples, n_params).

  • x_values (ndarray[tuple[Any, ...], dtype[float32]]) – Observable array of shape (n_samples, n_bins).

Raises:

ValueError – If file_name does not have a .feather suffix.

Return type:

None
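A caller-side check that mirrors the documented constraints might look like the sketch below. The helper name prepare_for_to_feather is hypothetical. Only the suffix check corresponds to the documented ValueError; the float32 cast and the shape check are assumptions derived from the parameter descriptions and may not match what to_feather does internally.

```python
from pathlib import Path

import numpy as np


def prepare_for_to_feather(file_name, theta_values, x_values):
    """Hypothetical pre-flight check mirroring the documented constraints."""
    file_name = Path(file_name)
    if file_name.suffix != ".feather":
        raise ValueError(f"expected a .feather suffix, got {file_name.name!r}")
    # Cast to the documented float32 dtype (assumed to be a caller duty).
    theta = np.asarray(theta_values, dtype=np.float32)
    x = np.asarray(x_values, dtype=np.float32)
    # Both arrays are documented as 2-D with a shared n_samples axis.
    if theta.ndim != 2 or x.ndim != 2 or theta.shape[0] != x.shape[0]:
        raise ValueError(f"incompatible shapes {theta.shape} and {x.shape}")
    return file_name, theta, x


path, theta, x = prepare_for_to_feather(
    "sims/run_001.feather", np.zeros((8, 3)), np.zeros((8, 16))
)

try:
    prepare_for_to_feather("sims/run_001.csv", np.zeros((8, 3)), np.zeros((8, 16)))
    suffix_rejected = False
except ValueError:
    suffix_rejected = True
```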

filter_nuisance(parameter_names, nuisance_pars, theta)

Remove nuisance parameters from a theta array by name pattern.

Parameters:
  • parameter_names (list[str]) – Ordered parameter names. The length must match theta.shape[1].

  • nuisance_pars (list[str]) – fnmatch patterns for parameters to exclude (e.g. ["syst_*"]).

  • theta (ndarray[tuple[Any, ...], dtype[float32]]) – Parameter array of shape (n_samples, n_params).

Return type:

ndarray[tuple[Any, ...], dtype[float32]]

Returns:

Filtered array with nuisance columns removed.

Raises:

ValueError – If len(parameter_names) != theta.shape[1].
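The documented behaviour can be reproduced with the standard-library fnmatch module. This is a hypothetical re-implementation for illustration, not the library's code. The function name filter_nuisance_sketch and the example parameter names are made up, and fnmatchcase is used here only to keep matching case-sensitive on every platform.

```python
from fnmatch import fnmatchcase

import numpy as np


def filter_nuisance_sketch(parameter_names, nuisance_pars, theta):
    """Hypothetical re-implementation of the documented behaviour."""
    if len(parameter_names) != theta.shape[1]:
        raise ValueError(
            f"{len(parameter_names)} names for {theta.shape[1]} theta columns"
        )
    # Keep a column unless its name matches any nuisance pattern.
    keep = [
        i
        for i, name in enumerate(parameter_names)
        if not any(fnmatchcase(name, pattern) for pattern in nuisance_pars)
    ]
    return theta[:, keep]


names = ["mass", "width", "syst_jes", "syst_lumi"]
theta = np.arange(8, dtype=np.float32).reshape(2, 4)
filtered = filter_nuisance_sketch(names, ["syst_*"], theta)

try:
    filter_nuisance_sketch(names[:3], ["syst_*"], theta)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

In this example the pattern "syst_*" removes the last two columns, so filtered keeps only the mass and width columns.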