API Reference¶
Core Model¶
- class bayesian_feature_selection.HorseshoeGLM(family: Literal['gaussian', 'binomial', 'poisson'] = 'gaussian', scale_global: float = 1.0)[source]¶
Bayesian GLM with horseshoe prior for feature selection.
The horseshoe prior provides strong shrinkage for irrelevant features while leaving relevant features relatively unshrunk.
Supports:
- Linear regression (family=’gaussian’)
- Logistic regression (family=’binomial’)
- Poisson regression (family=’poisson’)
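Conceptually, the prior hierarchy can be sketched in plain NumPy (an illustration of the horseshoe construction, not the library's internal model code; `scale_global` corresponds to the constructor argument above):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_draws = 50, 4000
scale_global = 1.0

# Horseshoe hierarchy: beta_j ~ N(0, tau^2 * lambda_j^2), with
# half-Cauchy priors on the global scale tau and local scales lambda_j.
tau = np.abs(scale_global * rng.standard_cauchy(n_draws))   # global shrinkage
lam = np.abs(rng.standard_cauchy((n_draws, n_features)))    # local shrinkage
beta = rng.normal(0.0, tau[:, None] * lam)                  # prior draws of coefficients

# The sharp peak at zero pulls most coefficients toward zero, while the
# heavy Cauchy tails let a few lambda_j escape shrinkage entirely.
print(beta.shape)
```

The heavy-tailed draws are why the summary statistics of `|beta|` are so skewed: the median sits near zero while the mean is dominated by the few unshrunk coefficients.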
- fit(X: np.ndarray, y: np.ndarray, config: InferenceConfig | None = None) → HorseshoeGLM[source]¶
Fit the model using MCMC or SVI.
- Parameters:
X (np.ndarray) – Design matrix
y (np.ndarray) – Response variable
config (InferenceConfig, optional) – Inference configuration
- Return type:
self
- get_feature_importance(threshold: float = 0.5, method: Literal['beta', 'lambda', 'both'] = 'beta') → pd.DataFrame[source]¶
Extract feature importance based on posterior inclusion probabilities.
- Parameters:
threshold (float) – Threshold for feature selection
method (str) – Feature selection method:
- “beta”: Based on coefficient posterior (default, good for effect size)
- “lambda”: Based on local shrinkage parameter (good for pure noise filtering)
- “both”: Return both metrics
- Returns:
Feature importance metrics
- Return type:
pd.DataFrame
Notes
Method comparison:
“beta”: Selects features with consistent non-zero effects. Captures both direction and magnitude. Best for prediction and interpretation.
“lambda”: Identifies features with weak shrinkage (less noise-like). Pure noise gets lambda ≈ 0, relevant features get lambda >> 0. Better for filtering pure noise without requiring strong effects.
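The “beta” rule can be illustrated on synthetic posterior draws. Here the inclusion probability is taken as the posterior mass outside a small “practically zero” band, one common convention; the library's exact definition and dataframe columns may differ:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 2000

# Synthetic posterior draws: one relevant feature, and one pure-noise
# feature that the horseshoe has shrunk tightly around zero.
beta_relevant = rng.normal(2.0, 0.3, n_draws)
beta_noise = rng.normal(0.0, 0.02, n_draws)
samples = np.column_stack([beta_relevant, beta_noise])

def inclusion_probability(beta_samples, eps=0.1):
    """Fraction of posterior mass outside the band [-eps, eps]."""
    return np.mean(np.abs(beta_samples) > eps, axis=0)

probs = inclusion_probability(samples)
selected = probs > 0.5   # same role as the `threshold` parameter above
print(probs.round(3), selected)
```

With these draws the relevant feature's mass sits far from zero and is selected, while the noise feature's mass is concentrated inside the band and is dropped.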
- model(X: jnp.ndarray, y: jnp.ndarray | None = None)[source]¶
Horseshoe prior GLM model.
- Parameters:
X (jnp.ndarray) – Design matrix (n_samples, n_features)
y (jnp.ndarray, optional) – Response variable
- predict(X_new: np.ndarray, return_samples: bool = False) → np.ndarray[source]¶
Make predictions on new data.
- Parameters:
X_new (np.ndarray) – New design matrix
return_samples (bool) – If True, return posterior predictive samples
- Returns:
Predictions (mean or samples)
- Return type:
np.ndarray
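For the Gaussian family, the shape of what predict returns can be mimicked directly from coefficient draws (a sketch only; the actual method also applies the link functions for the ’binomial’ and ’poisson’ families, and the variable names here are stand-ins):

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws, n_features, n_new = 1000, 3, 5

# Stand-ins for posterior draws of the intercept and coefficients.
alpha = rng.normal(0.5, 0.05, n_draws)
beta = rng.normal([1.0, -2.0, 0.0], 0.1, (n_draws, n_features))
X_new = rng.normal(size=(n_new, n_features))

# One linear-predictor sample per posterior draw and new row.
samples = alpha[:, None] + beta @ X_new.T   # shape (n_draws, n_new)

# Averaging over draws gives the default (mean) prediction;
# return_samples=True would instead return the full `samples` array.
mean_pred = samples.mean(axis=0)
print(mean_pred.shape)
```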
Data Loading¶
- class bayesian_feature_selection.DataLoader(config: DataConfig)[source]¶
Handle data loading and preprocessing for Bayesian feature selection.
Supports:
- CSV file loading
- Train/test splitting
- Feature standardization
- Feature selection
- load_and_split(data_path: Path | None = None, target_col: str | None = None) → Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, List[str]][source]¶
Load data and split into train/test sets.
- Parameters:
data_path (Path, optional) – Path to CSV file (overrides config)
target_col (str, optional) – Target column name (overrides config)
- Returns:
X_train (np.ndarray) – Training features
X_test (np.ndarray) – Test features
y_train (np.ndarray) – Training targets
y_test (np.ndarray) – Test targets
feature_names (List[str]) – Feature names
- load_data(data_path: Path | None = None, target_col: str | None = None) → Tuple[np.ndarray, np.ndarray, List[str]][source]¶
Load data from CSV file.
- Parameters:
data_path (Path, optional) – Path to CSV file (overrides config)
target_col (str, optional) – Target column name (overrides config)
- Returns:
X (np.ndarray) – Feature matrix
y (np.ndarray) – Target vector
feature_names (List[str]) – Feature names
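The load/split/standardize pipeline described above can be sketched with pandas and NumPy (an illustrative equivalent, not the DataLoader's source; the CSV contents and column names are made up):

```python
import io
import numpy as np
import pandas as pd

# A tiny in-memory CSV standing in for `data_path`.
csv = io.StringIO("x1,x2,y\n" + "\n".join(f"{i},{i * 2},{i % 2}" for i in range(10)))
df = pd.read_csv(csv)

# Everything except `target_col` becomes a feature.
target_col = "y"
feature_names = [c for c in df.columns if c != target_col]
X = df[feature_names].to_numpy(dtype=float)
y = df[target_col].to_numpy()

# Shuffled train/test split (test_size and random_seed as in DataConfig).
rng = np.random.default_rng(42)
idx = rng.permutation(len(X))
n_test = int(len(X) * 0.2)
test_idx, train_idx = idx[:n_test], idx[n_test:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Standardization (DataConfig.standardize): fit on train, apply to both,
# so no test-set statistics leak into the training features.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
```

Fitting the scaler on the training split only is the standard choice; applying the same `mu` and `sigma` to the test split keeps the two sets comparable.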
- save_predictions(predictions: np.ndarray, output_path: Path, include_features: bool = False, X: np.ndarray | None = None) → None[source]¶
Save predictions to CSV file.
- Parameters:
predictions (np.ndarray) – Model predictions
output_path (Path) – Output file path
include_features (bool) – Include feature values in output
X (np.ndarray, optional) – Feature matrix (required if include_features=True)
- bayesian_feature_selection.load_data_from_config(config: DataConfig, data_path: Path | None = None, target_col: str | None = None) → Tuple[np.ndarray, np.ndarray, List[str]][source]¶
Convenience function to load data from config.
- Parameters:
config (DataConfig) – Data configuration
data_path (Path, optional) – Path to CSV file (overrides config)
target_col (str, optional) – Target column name (overrides config)
- Returns:
X (np.ndarray) – Feature matrix
y (np.ndarray) – Target vector
feature_names (List[str]) – Feature names
Configuration¶
- class bayesian_feature_selection.DataConfig(data_path: str | None = None, target_col: str | None = None, feature_cols: List[str] | None = None, test_size: float = 0.0, standardize: bool = False, random_seed: int = 42)[source]¶
Data configuration.
- data_path: str | None = None¶
- feature_cols: List[str] | None = None¶
- random_seed: int = 42¶
- standardize: bool = False¶
- target_col: str | None = None¶
- test_size: float = 0.0¶
- class bayesian_feature_selection.InferenceConfig(method: Literal['mcmc', 'svi'] = 'mcmc', num_warmup: int = 1000, num_samples: int = 2000, num_chains: int = 4, num_steps: int = 10000, learning_rate: float = 0.001, use_gpu: bool = True, progress_bar: bool = True)[source]¶
Configuration for inference.
- learning_rate: float = 0.001¶
- method: Literal['mcmc', 'svi'] = 'mcmc'¶
- num_chains: int = 4¶
- num_samples: int = 2000¶
- num_steps: int = 10000¶
- num_warmup: int = 1000¶
- progress_bar: bool = True¶
- use_gpu: bool = True¶
- class bayesian_feature_selection.ModelConfig(family: Literal['gaussian', 'binomial', 'poisson'] = 'gaussian', scale_global: float = 1.0)[source]¶
Model configuration.
- family: Literal['gaussian', 'binomial', 'poisson'] = 'gaussian'¶
- scale_global: float = 1.0¶
- class bayesian_feature_selection.SelectionConfig(method: Literal['beta', 'lambda', 'both'] = 'beta', threshold: float = 0.5)[source]¶
Feature selection configuration.
- method: Literal['beta', 'lambda', 'both'] = 'beta'¶
- threshold: float = 0.5¶
- class bayesian_feature_selection.OutputConfig(save_plots: bool = True, save_diagnostics: bool = True, save_samples: bool = False)[source]¶
Output configuration.
- save_diagnostics: bool = True¶
- save_plots: bool = True¶
- save_samples: bool = False¶
- class bayesian_feature_selection.ExperimentConfig(data: DataConfig = <factory>, model: ModelConfig = <factory>, inference: InferenceConfig = <factory>, selection: SelectionConfig = <factory>, output: OutputConfig = <factory>)[source]¶
Complete experiment configuration.
- data: DataConfig¶
- classmethod from_yaml(yaml_path: Path) → ExperimentConfig[source]¶
Load configuration from YAML file.
- inference: InferenceConfig¶
- model: ModelConfig¶
- output: OutputConfig¶
- selection: SelectionConfig¶
- update_from_dict(updates: Dict[str, Any]) → ExperimentConfig[source]¶
Update configuration from dictionary (e.g., CLI overrides).
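A YAML file for from_yaml would mirror the dataclass fields documented above, one top-level key per ExperimentConfig field (the field names come from this page; the file name and the values shown are illustrative):

```yaml
# experiment.yaml — one top-level section per ExperimentConfig field
data:
  data_path: data/train.csv
  target_col: outcome
  test_size: 0.2
  standardize: true
  random_seed: 42
model:
  family: gaussian
  scale_global: 1.0
inference:
  method: mcmc
  num_warmup: 1000
  num_samples: 2000
  num_chains: 4
selection:
  method: beta
  threshold: 0.5
output:
  save_plots: true
  save_diagnostics: true
  save_samples: false
```

Such a file would be loaded with `ExperimentConfig.from_yaml(Path("experiment.yaml"))`; update_from_dict then layers CLI-style overrides on top (the exact shape of the override dictionary is not specified on this page).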
Visualization¶
- bayesian_feature_selection.visualization.plot_feature_importance(importance_df: <MagicMock name = 'mock.DataFrame' id='140578108388240'>, output_dir: Path, feature_names: List[str] | None = None, top_n: int = 20, method: str = 'beta')[source]¶
Plot feature importance with credible intervals.
- Parameters:
importance_df (pd.DataFrame) – Feature importance dataframe
output_dir (Path) – Output directory for plots
feature_names (List[str], optional) – Feature names
top_n (int) – Number of top features to plot
method (str) – Feature selection method used (‘beta’, ‘lambda’, or ‘both’). Determines which inclusion probability column to display.
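The top_n selection such a plot performs can be sketched with pandas (the column names in this hypothetical `importance_df` are assumptions; the library's actual dataframe schema may differ):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n_features = 30

# Hypothetical importance dataframe; column names are illustrative only.
importance_df = pd.DataFrame({
    "feature": [f"x{i}" for i in range(n_features)],
    "inclusion_prob": rng.uniform(size=n_features),
    "ci_lower": rng.normal(-0.5, 0.1, n_features),
    "ci_upper": rng.normal(0.5, 0.1, n_features),
})

# Keep only the `top_n` features with the highest inclusion probability,
# sorted descending — the subset the bar chart would display.
top_n = 20
top = importance_df.nlargest(top_n, "inclusion_prob")
print(len(top))
```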