Data Loading
- class ParaketDataset(data_folder, parameter_names, nuisance_params=None)
File-level PyTorch dataset over a folder of .feather simulation files.
Each __getitem__ call loads one feather file and returns a (theta, x) pair. Call to_tensor_dataset() before training to pre-load everything into RAM as a flat TensorDataset.
- Parameters:
data_folder (Path)
parameter_names (list[str])
nuisance_params (list[str] | None)
- to_tensor_dataset(device='cpu')
Pre-load all feather files into a flat TensorDataset.
Concatenates all (theta, x) pairs along the sample dimension. This avoids repeated disk reads in every training epoch.
- Parameters:
device (str) – Target device for the output tensors.
- Return type:
TensorDataset
- Returns:
A TensorDataset of (theta_tensor, x_tensor) with shapes (n_total_samples, n_params) and (n_total_samples, n_bins).
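The concatenation along the sample dimension can be illustrated with a minimal sketch. It uses NumPy in place of PyTorch to stay dependency-light, and the helper name concat_pairs and the toy arrays are assumptions made for illustration; they are not part of the documented API.

```python
import numpy as np


def concat_pairs(pairs):
    """Stack per-file (theta, x) arrays along the sample axis (axis 0).

    Hypothetical helper: it mirrors the documented behaviour, not the
    library's actual implementation.
    """
    thetas, xs = zip(*pairs)
    theta_all = np.concatenate(thetas, axis=0)
    x_all = np.concatenate(xs, axis=0)
    return theta_all, x_all


# Two made-up "files" with 3 and 2 samples, 2 parameters and 4 bins each.
file_a = (np.zeros((3, 2), dtype=np.float32), np.ones((3, 4), dtype=np.float32))
file_b = (np.zeros((2, 2), dtype=np.float32), np.ones((2, 4), dtype=np.float32))

theta_all, x_all = concat_pairs([file_a, file_b])
print(theta_all.shape, x_all.shape)  # (5, 2) (5, 4)
```

The sample counts add up (3 + 2 = 5) while the parameter and bin dimensions are unchanged, which matches the (n_total_samples, n_params) and (n_total_samples, n_bins) shapes described above.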
Utilities
- from_feather(file_name, parameter_names, nuisance_pars=None)
Load a (theta, x) pair from a feather file.
- Parameters:
file_name (Path) – Path to the .feather file.
parameter_names (list[str]) – Ordered parameter names used for nuisance filtering.
nuisance_pars (list[str] | None) – fnmatch patterns for parameters to exclude from theta. None returns all parameters.
- Return type:
tuple[ndarray[tuple[Any, ...], dtype[float32]], ndarray[tuple[Any, ...], dtype[float32]]]
- Returns:
Tuple of (theta, x) as float32 numpy arrays.
- Raises:
FileNotFoundError – If file_name does not exist.
- to_feather(file_name, theta_values, x_values)
Write a (theta, x) pair to a feather file.
- Parameters:
file_name (Path) – Destination path. Must end in .feather.
theta_values (ndarray[tuple[Any, ...], dtype[float32]]) – Parameter array of shape (n_samples, n_params).
x_values (ndarray[tuple[Any, ...], dtype[float32]]) – Observable array of shape (n_samples, n_bins).
- Raises:
ValueError – If file_name does not have a .feather suffix.
- Return type:
None
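The suffix requirement on file_name could be enforced along the following lines. This is only a sketch under stated assumptions: check_feather_suffix is a hypothetical helper name, and the library's own validation and error message may differ.

```python
from pathlib import Path


def check_feather_suffix(file_name):
    """Return file_name as a Path, or raise ValueError if it does not end in .feather.

    Hypothetical helper illustrating the documented ValueError condition.
    """
    path = Path(file_name)
    if path.suffix != ".feather":
        raise ValueError(f"expected a .feather file, got {path.name!r}")
    return path


ok_path = check_feather_suffix("sim_000.feather")

try:
    check_feather_suffix("sim_000.csv")
    rejected = False
except ValueError:
    rejected = True
```

Checking the suffix before writing means a mistyped destination fails early, before any data is serialised.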
- filter_nuisance(parameter_names, nuisance_pars, theta)
Remove nuisance parameters from a theta array by name pattern.
- Parameters:
parameter_names (list[str]) – Ordered parameter names; the length must match theta.shape[1].
nuisance_pars (list[str]) – fnmatch patterns for parameters to exclude (e.g. ["syst_*"]).
theta (ndarray[tuple[Any, ...], dtype[float32]]) – Parameter array of shape (n_samples, n_params).
- Return type:
ndarray[tuple[Any, ...], dtype[float32]]
- Returns:
Filtered array with nuisance columns removed.
- Raises:
ValueError – If len(parameter_names) != theta.shape[1].
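A minimal sketch of pattern-based column filtering, assuming the behaviour described above: columns whose names match any fnmatch pattern are dropped, and a length mismatch raises ValueError. The function name filter_nuisance_sketch and the example parameter names are hypothetical, and the library's actual implementation may differ.

```python
from fnmatch import fnmatchcase

import numpy as np


def filter_nuisance_sketch(parameter_names, nuisance_pars, theta):
    """Drop the columns of theta whose parameter name matches a nuisance pattern."""
    if len(parameter_names) != theta.shape[1]:
        raise ValueError(
            f"got {len(parameter_names)} names for {theta.shape[1]} theta columns"
        )
    # fnmatchcase is case-sensitive on every platform (an assumption of this sketch).
    keep = [
        i
        for i, name in enumerate(parameter_names)
        if not any(fnmatchcase(name, pattern) for pattern in nuisance_pars)
    ]
    return theta[:, keep]


names = ["mu", "sigma", "syst_jes", "syst_lumi"]  # made-up parameter names
theta = np.arange(8, dtype=np.float32).reshape(2, 4)

filtered = filter_nuisance_sketch(names, ["syst_*"], theta)
print(filtered.shape)  # (2, 2)
```

Here the pattern "syst_*" removes the last two columns, so only the "mu" and "sigma" columns remain, in their original order.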