========
Tutorial
========

This tutorial walks through a complete end-to-end workflow for Bayesian feature
selection using the horseshoe prior.

Step 1: Install the Package
----------------------------

Install from PyPI:

.. code-block:: console

    $ pip install bayesian_feature_selection

Or install from source for development:

.. code-block:: console

    $ git clone https://github.com/MacroMagic/bayesian_feature_selection.git
    $ cd bayesian_feature_selection
    $ pip install -e ".[dev]"

Step 2: Prepare Data
--------------------

Create a synthetic dataset with known sparse structure for demonstration:

.. code-block:: python

    import numpy as np

    np.random.seed(42)
    n_samples = 200
    n_features = 50

    # Generate random features
    X = np.random.randn(n_samples, n_features)

    # Only 5 features are truly relevant
    true_beta = np.zeros(n_features)
    true_beta[0] = 3.0
    true_beta[1] = -2.0
    true_beta[5] = 1.5
    true_beta[10] = -1.0
    true_beta[20] = 0.5

    # Generate response
    y = X @ true_beta + np.random.randn(n_samples) * 0.5

    feature_names = [f"gene_{i}" for i in range(n_features)]

Step 3: Configure the Experiment
---------------------------------

**Programmatic configuration:**

.. code-block:: python

    from bayesian_feature_selection import (
        HorseshoeGLM,
        InferenceConfig,
        ModelConfig,
        SelectionConfig,
        ExperimentConfig,
        DataConfig,
        OutputConfig,
    )

    model_config = ModelConfig(family="gaussian", scale_global=0.5)
    inference_config = InferenceConfig(
        method="mcmc",
        num_warmup=500,
        num_samples=1000,
        num_chains=2,
        use_gpu=False,
    )
    selection_config = SelectionConfig(method="beta", threshold=0.5)

**YAML configuration:**

Save the following as ``experiment.yaml``:

.. code-block:: yaml

    data:
      data_path: "data/synthetic.csv"
      target_col: "y"
      standardize: true
      test_size: 0.2

    model:
      family: "gaussian"
      scale_global: 0.5

    inference:
      method: "mcmc"
      num_warmup: 500
      num_samples: 1000
      num_chains: 2
      use_gpu: false

    selection:
      method: "beta"
      threshold: 0.5

    output:
      save_plots: true
      save_diagnostics: true

Then load it:

.. code-block:: python

    from pathlib import Path
    from bayesian_feature_selection import ExperimentConfig

    config = ExperimentConfig.from_yaml(Path("experiment.yaml"))

Step 4: Fit the Model
---------------------

**Using MCMC:**

.. code-block:: python

    model = HorseshoeGLM(
        family=model_config.family,
        scale_global=model_config.scale_global,
    )
    model.fit(X, y, config=inference_config)

**Using SVI (faster, approximate):**

.. code-block:: python

    svi_config = InferenceConfig(
        method="svi",
        num_steps=5000,
        learning_rate=0.005,
        use_gpu=False,
    )
    model_svi = HorseshoeGLM(family="gaussian", scale_global=0.5)
    model_svi.fit(X, y, config=svi_config)

Step 5: Analyze Results
-----------------------

Extract feature importance and examine the selected features:

.. code-block:: python

    importance = model.get_feature_importance(
        threshold=selection_config.threshold,
        method=selection_config.method,
    )

    # Add feature names
    importance["feature_name"] = [feature_names[i] for i in importance["feature_idx"]]

    # Show selected features
    selected = importance[importance["selected"]]
    print("Selected features:")
    print(selected[["feature_name", "beta_mean", "beta_inclusion_prob"]])

    # Show all features sorted by importance
    print("\nAll features by inclusion probability:")
    print(importance[["feature_name", "beta_mean", "beta_inclusion_prob"]].head(10))

Features with ``beta_inclusion_prob`` above the threshold are marked as
selected. The ``ci_excludes_zero`` column indicates whether the 95% credible
interval excludes zero, providing additional evidence of relevance.

Step 6: Make Predictions
------------------------

.. code-block:: python

    # Generate new data
    X_new = np.random.randn(10, n_features)

    # Point predictions
    y_pred = model.predict(X_new)
    print("Predictions:", y_pred)

    # Posterior predictive samples for uncertainty quantification
    y_samples = model.predict(X_new, return_samples=True)
    y_mean = y_samples.mean(axis=0)
    y_std = y_samples.std(axis=0)
    print("Prediction uncertainty (std):", y_std)

Step 7: Save and Visualize Results
-----------------------------------

.. code-block:: python

    from pathlib import Path
    from bayesian_feature_selection.visualization import (
        plot_feature_importance,
        plot_diagnostics,
    )

    output_dir = Path("results")
    output_dir.mkdir(exist_ok=True)

    # Save feature importance to CSV
    importance.to_csv(output_dir / "feature_importance.csv", index=False)

    # Plot feature importance (saves PNG files)
    plot_feature_importance(importance, output_dir, feature_names=feature_names)

    # Plot MCMC diagnostics (trace plots, posterior, R-hat)
    if model.mcmc is not None:
        plot_diagnostics(model.mcmc, output_dir)

    # Save the experiment configuration for reproducibility
    experiment_config = ExperimentConfig(
        model=model_config,
        inference=inference_config,
        selection=selection_config,
    )
    experiment_config.to_yaml(output_dir / "config.yaml")