API Reference

Core Model

class bayesian_feature_selection.HorseshoeGLM(family: Literal['gaussian', 'binomial', 'poisson'] = 'gaussian', scale_global: float = 1.0)[source]

Bayesian GLM with horseshoe prior for feature selection.

The horseshoe prior shrinks irrelevant features aggressively toward zero while leaving relevant features largely unshrunk.

Supports:

  • Linear regression (family='gaussian')

  • Logistic regression (family='binomial')

  • Poisson regression (family='poisson')

fit(X: np.ndarray, y: np.ndarray, config: InferenceConfig | None = None) → HorseshoeGLM[source]

Fit the model using MCMC or SVI.

Parameters:
  • X (np.ndarray) – Design matrix

  • y (np.ndarray) – Response variable

  • config (InferenceConfig, optional) – Inference configuration

Return type:

self
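
A minimal usage sketch: the synthetic data below is illustrative, and the library calls are shown commented out since they simply follow the signatures documented above.

```python
import numpy as np

# Synthetic sparse-regression data (illustrative, not part of the library):
# only 3 of 50 features carry signal, the setting horseshoe priors target.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(scale=0.5, size=n)

# Fitting would then look like (commented out; assumes the documented API):
# from bayesian_feature_selection import HorseshoeGLM, InferenceConfig
# model = HorseshoeGLM(family='gaussian', scale_global=1.0)
# model.fit(X, y, config=InferenceConfig(method='mcmc', num_chains=4))
```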

get_feature_importance(threshold: float = 0.5, method: Literal['beta', 'lambda', 'both'] = 'beta') → pd.DataFrame[source]

Extract feature importance based on posterior inclusion probabilities.

Parameters:
  • threshold (float) – Threshold for feature selection

  • method (str) – Feature selection method:

    • “beta”: based on the coefficient posterior (default; good for effect size)

    • “lambda”: based on the local shrinkage parameter (good for pure-noise filtering)

    • “both”: return both metrics

Returns:

Feature importance metrics

Return type:

pd.DataFrame

Notes

Method comparison:

  • “beta”: Selects features with consistent non-zero effects. Captures both direction and magnitude. Best for prediction and interpretation.

  • “lambda”: Identifies features with weak shrinkage (less noise-like). Pure noise gets lambda ≈ 0, relevant features get lambda >> 0. Better for filtering pure noise without requiring strong effects.
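
The idea behind the “beta” criterion can be illustrated numerically. The sign-consistency score below is one common way to turn coefficient posteriors into inclusion probabilities; it is an illustration of the concept, not necessarily the library's exact statistic, and the stricter 0.9 cutoff is an assumption suited to this particular score.

```python
import numpy as np

# Fake posterior draws: feature 0 is clearly non-zero, feature 1 is noise.
rng = np.random.default_rng(1)
samples = np.stack([
    rng.normal(loc=2.0, scale=0.3, size=4000),   # relevant coefficient
    rng.normal(loc=0.0, scale=0.3, size=4000),   # irrelevant coefficient
], axis=1)                                       # shape (draws, features)

# max(P(beta_j > 0), P(beta_j < 0)): how consistently the posterior
# keeps one sign. Near 1.0 means a reliably non-zero effect; near 0.5
# means the posterior straddles zero.
p_pos = (samples > 0).mean(axis=0)
inclusion = np.maximum(p_pos, 1.0 - p_pos)
selected = inclusion > 0.9
```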

model(X: jnp.ndarray, y: jnp.ndarray | None = None)[source]

Horseshoe prior GLM model.

Parameters:
  • X (jnp.ndarray) – Design matrix (n_samples, n_features)

  • y (jnp.ndarray, optional) – Response variable

predict(X_new: np.ndarray, return_samples: bool = False) → np.ndarray[source]

Make predictions on new data.

Parameters:
  • X_new (np.ndarray) – New design matrix

  • return_samples (bool) – If True, return posterior predictive samples

Returns:

Predictions (mean or samples)

Return type:

np.ndarray
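
The two return modes relate as follows. The draws below are synthetic stand-ins for posterior predictive samples; the aggregation mirrors what a predictive mean and interval would look like.

```python
import numpy as np

# Stand-in for predict(X_new, return_samples=True): shape (draws, n_new).
rng = np.random.default_rng(2)
pred_samples = rng.normal(loc=3.0, scale=0.4, size=(2000, 5))

# return_samples=False corresponds to collapsing draws to their mean,
# while keeping the samples allows e.g. a 95% predictive interval.
point_pred = pred_samples.mean(axis=0)
lo, hi = np.percentile(pred_samples, [2.5, 97.5], axis=0)
```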

Data Loading

class bayesian_feature_selection.DataLoader(config: DataConfig)[source]

Handle data loading and preprocessing for Bayesian feature selection.

Supports:

  • CSV file loading

  • Train/test splitting

  • Feature standardization

  • Feature selection

load_and_split(data_path: Path | None = None, target_col: str | None = None) → Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, List[str]][source]

Load data and split into train/test sets.

Parameters:
  • data_path (Path, optional) – Path to CSV file (overrides config)

  • target_col (str, optional) – Target column name (overrides config)

Returns:

  • X_train (np.ndarray) – Training features

  • X_test (np.ndarray) – Test features

  • y_train (np.ndarray) – Training targets

  • y_test (np.ndarray) – Test targets

  • feature_names (List[str]) – Feature names
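
For intuition, a rough self-contained equivalent of the load/split behavior on a tiny in-memory CSV. The column names, split logic, and seed handling are illustrative, not the loader's exact implementation.

```python
import io
import numpy as np
import pandas as pd

# Tiny in-memory CSV standing in for a file on disk.
csv = io.StringIO("x1,x2,target\n1,2,0\n3,4,1\n5,6,0\n7,8,1\n")
df = pd.read_csv(csv)

target_col = "target"
feature_names = [c for c in df.columns if c != target_col]
X = df[feature_names].to_numpy(dtype=float)
y = df[target_col].to_numpy()

# Shuffle-and-slice split, mirroring DataConfig.random_seed / test_size.
rng = np.random.default_rng(42)
idx = rng.permutation(len(X))
n_test = int(len(X) * 0.25)
test_idx, train_idx = idx[:n_test], idx[n_test:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```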

load_data(data_path: Path | None = None, target_col: str | None = None) → Tuple[np.ndarray, np.ndarray, List[str]][source]

Load data from CSV file.

Parameters:
  • data_path (Path, optional) – Path to CSV file (overrides config)

  • target_col (str, optional) – Target column name (overrides config)

Returns:

  • X (np.ndarray) – Feature matrix

  • y (np.ndarray) – Target vector

  • feature_names (List[str]) – Feature names

save_predictions(predictions: np.ndarray, output_path: Path, include_features: bool = False, X: np.ndarray | None = None) → None[source]

Save predictions to CSV file.

Parameters:
  • predictions (np.ndarray) – Model predictions

  • output_path (Path) – Output file path

  • include_features (bool) – Include feature values in output

  • X (np.ndarray, optional) – Feature matrix (required if include_features=True)

bayesian_feature_selection.load_data_from_config(config: DataConfig, data_path: Path | None = None, target_col: str | None = None) → Tuple[np.ndarray, np.ndarray, List[str]][source]

Convenience function to load data from config.

Parameters:
  • config (DataConfig) – Data configuration

  • data_path (Path, optional) – Path to CSV file (overrides config)

  • target_col (str, optional) – Target column name (overrides config)

Returns:

  • X (np.ndarray) – Feature matrix

  • y (np.ndarray) – Target vector

  • feature_names (List[str]) – Feature names

Configuration

class bayesian_feature_selection.DataConfig(data_path: str | None = None, target_col: str | None = None, feature_cols: List[str] | None = None, test_size: float = 0.0, standardize: bool = False, random_seed: int = 42)[source]

Data configuration.

data_path: str | None = None
feature_cols: List[str] | None = None
random_seed: int = 42
standardize: bool = False
target_col: str | None = None
test_size: float = 0.0

class bayesian_feature_selection.InferenceConfig(method: Literal['mcmc', 'svi'] = 'mcmc', num_warmup: int = 1000, num_samples: int = 2000, num_chains: int = 4, num_steps: int = 10000, learning_rate: float = 0.001, use_gpu: bool = True, progress_bar: bool = True)[source]

Configuration for inference.

learning_rate: float = 0.001
method: Literal['mcmc', 'svi'] = 'mcmc'
num_chains: int = 4
num_samples: int = 2000
num_steps: int = 10000
num_warmup: int = 1000
progress_bar: bool = True
use_gpu: bool = True

class bayesian_feature_selection.ModelConfig(family: Literal['gaussian', 'binomial', 'poisson'] = 'gaussian', scale_global: float = 1.0)[source]

Model configuration.

family: Literal['gaussian', 'binomial', 'poisson'] = 'gaussian'
scale_global: float = 1.0

class bayesian_feature_selection.SelectionConfig(method: Literal['beta', 'lambda', 'both'] = 'beta', threshold: float = 0.5)[source]

Feature selection configuration.

method: Literal['beta', 'lambda', 'both'] = 'beta'
threshold: float = 0.5

class bayesian_feature_selection.OutputConfig(save_plots: bool = True, save_diagnostics: bool = True, save_samples: bool = False)[source]

Output configuration.

save_diagnostics: bool = True
save_plots: bool = True
save_samples: bool = False

class bayesian_feature_selection.ExperimentConfig(data: DataConfig = <factory>, model: ModelConfig = <factory>, inference: InferenceConfig = <factory>, selection: SelectionConfig = <factory>, output: OutputConfig = <factory>)[source]

Complete experiment configuration.

data: DataConfig
model: ModelConfig
inference: InferenceConfig
selection: SelectionConfig
output: OutputConfig

classmethod from_yaml(yaml_path: Path) → ExperimentConfig[source]

Load configuration from YAML file.

to_yaml(yaml_path: Path) → None[source]

Save configuration to YAML file.

update_from_dict(updates: Dict[str, Any]) → ExperimentConfig[source]

Update configuration from a dictionary (e.g., CLI overrides).
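
A plausible YAML file for from_yaml, with keys mirroring the dataclass fields above. The exact nesting is an assumption based on ExperimentConfig's structure; the path and column values are placeholders.

```yaml
# experiment.yaml — keys mirror the dataclass fields documented above
data:
  data_path: data/train.csv
  target_col: outcome
  test_size: 0.2
  standardize: true
  random_seed: 42
model:
  family: binomial
  scale_global: 1.0
inference:
  method: mcmc
  num_warmup: 1000
  num_samples: 2000
  num_chains: 4
selection:
  method: beta
  threshold: 0.5
output:
  save_plots: true
  save_diagnostics: true
  save_samples: false
```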

Visualization

bayesian_feature_selection.visualization.plot_feature_importance(importance_df: pd.DataFrame, output_dir: Path, feature_names: List[str] | None = None, top_n: int = 20, method: str = 'beta')[source]

Plot feature importance with credible intervals.

Parameters:
  • importance_df (pd.DataFrame) – Feature importance dataframe

  • output_dir (Path) – Output directory for plots

  • feature_names (List[str], optional) – Feature names

  • top_n (int) – Number of top features to plot

  • method (str) – Feature selection method used (‘beta’, ‘lambda’, or ‘both’). Determines which inclusion probability column to display.
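
A sketch of the kind of table this function consumes and how a top_n cut works. The column names here are assumptions for illustration, not the library's documented contract.

```python
import pandas as pd

# Hypothetical importance table of the shape get_feature_importance
# might return (column names are assumed, not guaranteed).
importance_df = pd.DataFrame({
    "feature": ["x1", "x2", "x3"],
    "beta_mean": [1.8, -0.1, 0.9],
    "inclusion_prob": [0.98, 0.51, 0.91],
})

# Keeping only the top_n strongest features, as the plot does.
top_n = 2
top = importance_df.sort_values("inclusion_prob", ascending=False).head(top_n)
```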

bayesian_feature_selection.visualization.plot_diagnostics(mcmc, output_dir: Path)[source]

Plot MCMC diagnostics using ArviZ.

Parameters:
  • mcmc (MCMC) – Fitted MCMC object

  • output_dir (Path) – Output directory