Methodology
This page describes the statistical methodology behind the Bayesian feature selection approach implemented in this package.
The Horseshoe Prior

The horseshoe prior (Carvalho et al., 2010) is a continuous shrinkage prior designed for sparse estimation in high-dimensional settings. For a coefficient vector \(\beta\), the horseshoe prior is defined as:

\[
\beta_j \mid \lambda_j, \tau \sim \mathcal{N}(0,\, \tau^2 \lambda_j^2), \qquad
\lambda_j \sim \mathrm{C}^{+}(0, 1), \qquad
\tau \sim \mathrm{C}^{+}(0, \tau_0),
\]

where:

- \(\tau\) is the global shrinkage parameter controlling the overall level of sparsity,
- \(\lambda_j\) is the local shrinkage parameter for each feature \(j\), allowing individual features to escape shrinkage, and
- \(\tau_0\) is the scale of the global prior (`scale_global` in the code).
The key property of the horseshoe is that its density has an infinitely tall spike at zero (encouraging shrinkage of noise features) and heavy tails (allowing signal features to remain large).
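The spike-and-tails behavior can be illustrated by sampling directly from the prior. The sketch below is a standalone simulation (the function names are illustrative, not the package API); half-Cauchy draws use the inverse-CDF transform \(\lambda = \tan(\pi u / 2)\) for \(u \sim \mathrm{U}(0, 1)\):

```python
import math
import random

def sample_half_cauchy(scale=1.0, rng=random):
    # |C(0, scale)| via inverse CDF: scale * tan(pi * u / 2), u ~ U(0, 1)
    return scale * math.tan(math.pi * rng.random() / 2)

def sample_horseshoe_beta(tau0=1.0, rng=random):
    # beta_j | lambda_j, tau ~ N(0, tau^2 * lambda_j^2)
    tau = sample_half_cauchy(tau0, rng)
    lam = sample_half_cauchy(1.0, rng)
    return rng.gauss(0.0, tau * lam)

rng = random.Random(0)
draws = [sample_horseshoe_beta(tau0=0.1, rng=rng) for _ in range(10_000)]
near_zero = sum(abs(b) < 0.05 for b in draws) / len(draws)
very_large = sum(abs(b) > 10 for b in draws) / len(draws)
print(f"near zero: {near_zero:.2f}, |beta| > 10: {very_large:.4f}")
```

With a small global scale, most draws cluster tightly around zero (the spike), yet the heavy tails still produce occasional very large values.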
Regularized Horseshoe Prior

This package implements the regularized horseshoe (Piironen & Vehtari, 2017), which adds a slab component to prevent excessively large coefficients:

\[
\beta_j \mid \tilde{\lambda}_j, \tau \sim \mathcal{N}(0,\, \tau^2 \tilde{\lambda}_j^2), \qquad
\tilde{\lambda}_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2}.
\]

The slab variance \(c^2\) controls the maximum magnitude of the coefficients. When \(\lambda_j\) is small, \(\tilde{\lambda}_j \approx \lambda_j\) (strong shrinkage). When \(\lambda_j\) is large, \(\tilde{\lambda}_j \approx c / \tau\) (bounded by the slab). Placing an Inverse-Gamma(1, 1) prior on \(c^2\) allows the slab width to be learned from the data.
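The two limiting regimes can be checked numerically. The helper below is a hypothetical standalone function (not part of the package):

```python
import math

def lambda_tilde(lam, tau, c):
    # Regularized local scale: lambda_tilde^2 = c^2 lam^2 / (c^2 + tau^2 lam^2)
    return math.sqrt((c**2 * lam**2) / (c**2 + tau**2 * lam**2))

tau, c = 0.1, 2.0
small = lambda_tilde(0.01, tau, c)  # shrinkage regime: ~ lambda_j
large = lambda_tilde(1e6, tau, c)   # slab regime: bounded by c / tau
print(small, large, c / tau)
```

For a tiny \(\lambda_j\) the regularized scale is essentially unchanged, while an arbitrarily large \(\lambda_j\) is capped near \(c / \tau = 20\).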
GLM Families

The horseshoe prior is combined with a generalized linear model (GLM). Three families are supported:

Gaussian (linear regression):

\[
y_i \sim \mathcal{N}(\eta_i, \sigma^2),
\]

where \(\eta_i = \alpha + X_i \beta\) is the linear predictor.

Binomial (logistic regression):

\[
y_i \sim \mathrm{Bernoulli}\!\left(\mathrm{logit}^{-1}(\eta_i)\right)
\]

Poisson (count regression):

\[
y_i \sim \mathrm{Poisson}\!\left(\exp(\eta_i)\right)
\]
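The shared linear predictor and the three inverse link functions can be sketched as follows (an illustrative helper, not the package API):

```python
import math

def linear_predictor(alpha, beta, x):
    # eta_i = alpha + X_i . beta
    return alpha + sum(b * xj for b, xj in zip(beta, x))

alpha, beta = 0.5, [1.0, -2.0]
x = [0.3, 0.1]
eta = linear_predictor(alpha, beta, x)  # 0.5 + 0.3 - 0.2 = 0.6
mu_gaussian = eta                        # Gaussian: identity link
mu_binomial = 1 / (1 + math.exp(-eta))   # Binomial: inverse-logit
mu_poisson = math.exp(eta)               # Poisson: exponential (log link)
print(eta, mu_binomial, mu_poisson)
```

Only the link function changes between families; the horseshoe prior on \(\beta\) is identical in all three.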
Feature Selection Criteria
After fitting the model, features are selected based on posterior inclusion probabilities. Three methods are available:
- Beta-based selection (`method="beta"`): Computes the fraction of posterior samples where \(|\beta_j| > 0.01\). Features with inclusion probability above the threshold are selected. This method captures both the direction and magnitude of effects.
- Lambda-based selection (`method="lambda"`): Uses the local shrinkage parameters \(\lambda_j\). Features whose \(\lambda_j\) values consistently exceed the median across all features are considered relevant. This method is better at filtering pure noise without requiring strong coefficient effects.
- Combined selection (`method="both"`): Averages the beta-based and lambda-based inclusion probabilities. A feature must show evidence from both criteria to be selected.
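The three criteria can be sketched on a toy posterior. The function names and samples below are illustrative, not the package's actual implementation:

```python
from statistics import median

def beta_inclusion(beta_samples, eps=0.01):
    # Fraction of posterior draws with |beta_j| > eps, per feature.
    n, p = len(beta_samples), len(beta_samples[0])
    return [sum(abs(s[j]) > eps for s in beta_samples) / n for j in range(p)]

def lambda_inclusion(lambda_samples):
    # Fraction of draws where lambda_j exceeds that draw's across-feature median.
    n, p = len(lambda_samples), len(lambda_samples[0])
    return [sum(s[j] > median(s) for s in lambda_samples) / n for j in range(p)]

def select(beta_samples, lambda_samples, method="both", threshold=0.5):
    pb = beta_inclusion(beta_samples)
    pl = lambda_inclusion(lambda_samples)
    probs = {"beta": pb, "lambda": pl,
             "both": [(b + l) / 2 for b, l in zip(pb, pl)]}[method]
    return [j for j, pr in enumerate(probs) if pr > threshold]

# Toy posterior: feature 0 is signal, features 1-2 are noise.
betas = [[1.2, 0.001, -0.004], [0.9, -0.002, 0.006], [1.1, 0.003, 0.0]]
lams = [[5.0, 0.2, 0.3], [4.0, 0.1, 0.4], [6.0, 0.3, 0.2]]
print(select(betas, lams, method="both"))  # -> [0]
```

Here only feature 0 survives: its coefficients exceed 0.01 in every draw and its \(\lambda\) values dominate the per-draw median, so both criteria agree.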
Choosing scale_global

The `scale_global` parameter (\(\tau_0\)) controls the prior expected level of sparsity. A recommended rule of thumb (Piironen & Vehtari, 2017) is:

\[
\tau_0 = \frac{p_0}{p - p_0} \cdot \frac{1}{\sqrt{n}},
\]

where:

- \(p_0\) is the expected number of relevant features,
- \(p\) is the total number of features, and
- \(n\) is the number of observations.
Guidelines:

- Sparse problems (\(p_0 \ll p\)): use smaller values (0.1–0.5).
- Moderate sparsity: use values around 0.5–1.0.
- Dense problems (\(p_0 \approx p\)): use larger values (1.0–2.0).
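The rule of thumb is straightforward to compute. `recommended_scale_global` is a hypothetical helper (with the unit-noise-scale form of the Piironen & Vehtari recommendation), not part of the package:

```python
import math

def recommended_scale_global(p0, p, n):
    # tau0 = p0 / (p - p0) / sqrt(n)  (Piironen & Vehtari, 2017)
    return p0 / (p - p0) / math.sqrt(n)

# Expecting ~5 relevant features out of 100, with 200 observations:
tau0 = recommended_scale_global(p0=5, p=100, n=200)
print(round(tau0, 4))
```

Note how strongly sparsity drives the result: for this sparse setting the rule suggests a \(\tau_0\) far below 0.1, so the guideline ranges above are upper bounds rather than defaults.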
References
Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). “The horseshoe estimator for sparse signals.” Biometrika, 97(2), 465–480.
Piironen, J., & Vehtari, A. (2017). “Sparsity information and regularization in the horseshoe and other shrinkage priors.” Electronic Journal of Statistics, 11(2), 5018–5051.