Hyperparameter Tuning

mlcli includes powerful hyperparameter tuning capabilities to help you find the best model configuration. Choose from grid search, random search, or Bayesian optimization.

Quick Start

Run hyperparameter tuning with a single command:

Terminal
mlcli tune data/train.csv \
  --model random_forest \
  --strategy random \
  --n-iter 50 \
  --cv 5
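
This samples 50 random hyperparameter configurations for a random forest and scores each with 5-fold cross-validation, reporting the best combination found.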

Tuning Strategies

Grid Search

Exhaustively search through a specified parameter grid.

Pros: thorough, reproducible, simple
Cons: slow for large spaces, curse of dimensionality

Random Search

Sample random combinations from the parameter space.

Pros: faster than grid search, good for large spaces, often finds good solutions
Cons: may miss the optimum, less thorough

Bayesian Optimization

Use probabilistic models to guide the search.

Pros: sample efficient, handles complex spaces, best for expensive evaluations
Cons: more complex, overhead for simple models
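
As a rule of thumb: grid search suits a handful of discrete values, random search suits large spaces on a fixed budget, and Bayesian optimization pays off when each training run is expensive.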

Grid Search

Grid search exhaustively evaluates every combination of the specified hyperparameter values, which makes it best for small parameter spaces. For example, the command below evaluates 3 × 3 × 2 = 18 combinations; with 5-fold cross-validation, that is 90 model fits.

Terminal
mlcli tune data/train.csv \
  --model xgboost \
  --strategy grid \
  --param n_estimators=100,200,300 \
  --param max_depth=3,5,7 \
  --param learning_rate=0.01,0.1 \
  --cv 5 \
  --scoring accuracy

Random Search

Random search samples a fixed number of random combinations from the parameter space, which makes it more efficient for larger spaces. The command below evaluates exactly 100 sampled configurations (500 model fits with 5-fold cross-validation), regardless of how many combinations the space contains.

Terminal
mlcli tune data/train.csv \
  --model lightgbm \
  --strategy random \
  --param n_estimators=int:50:500 \
  --param max_depth=int:3:15 \
  --param learning_rate=float:0.001:0.3:log \
  --param subsample=float:0.5:1.0 \
  --n-iter 100 \
  --cv 5

Bayesian Optimization

Bayesian optimization fits a probabilistic surrogate model to the scores observed so far and uses it to choose the next configuration to try. Because each trial is informed by the previous ones, it typically needs fewer iterations than random search, making it the best choice for expensive model evaluations.

Terminal
mlcli tune data/train.csv \
  --model random_forest \
  --strategy bayesian \
  --param n_estimators=int:50:500 \
  --param max_depth=int:3:20 \
  --param min_samples_split=int:2:20 \
  --param min_samples_leaf=int:1:10 \
  --n-iter 50 \
  --cv 5

Parameter Specification

Define parameter search spaces using different formats:

Categorical Values

Terminal
--param criterion=gini,entropy,log_loss

Integer Range

Terminal
--param n_estimators=int:100:1000

Float Range

Terminal
--param learning_rate=float:0.001:0.1

Log Scale

Terminal
--param learning_rate=float:0.0001:1.0:log
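
The :log suffix samples uniformly on a log scale, which suits parameters that span several orders of magnitude, such as learning rates or regularization strengths. The formats can be mixed freely in a single command; an illustrative sketch (the parameter names and ranges below are arbitrary examples, not recommendations):

Terminal
mlcli tune data/train.csv \
  --model random_forest \
  --strategy random \
  --param criterion=gini,entropy \
  --param n_estimators=int:100:1000 \
  --param max_features=float:0.1:1.0 \
  --param ccp_alpha=float:0.0001:0.1:log \
  --n-iter 50 \
  --cv 5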

YAML Configuration

Define complex tuning configurations in YAML:

YAML
# tuning_config.yaml
model: xgboost
strategy: bayesian
n_iter: 100
cv: 5
scoring: f1_weighted

parameters:
  n_estimators:
    type: int
    low: 50
    high: 500
  max_depth:
    type: int
    low: 3
    high: 15
  learning_rate:
    type: float
    low: 0.001
    high: 0.3
    log: true
  subsample:
    type: float
    low: 0.5
    high: 1.0
  colsample_bytree:
    type: float
    low: 0.5
    high: 1.0
  reg_alpha:
    type: float
    low: 0.0001
    high: 10.0
    log: true
  reg_lambda:
    type: float
    low: 0.0001
    high: 10.0
    log: true

early_stopping:
  patience: 10
  min_delta: 0.001

output:
  best_params: best_params.json
  trials: trials.csv
  plots: tuning_plots/
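
Here the early_stopping block presumably halts the search once the best score has improved by less than min_delta for patience consecutive trials, and the output block controls where results are written.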

Run with the config file:

Terminal
mlcli tune data/train.csv --config tuning_config.yaml

Cross-Validation Options

Terminal
# Standard K-Fold
mlcli tune data.csv --model random_forest --cv 5

# Stratified K-Fold (preserves class balance; use for classification)
mlcli tune data.csv --model random_forest --cv stratified:5

# Time Series Split (ordered folds; avoids leaking future observations)
mlcli tune data.csv --model random_forest --cv timeseries:5

# Repeated K-Fold (5 folds, repeated 3 times)
mlcli tune data.csv --model random_forest --cv repeated:5:3

Scoring Metrics

Available scoring metrics for optimization, selected with the --scoring flag (see the example after the lists):

Classification

  • accuracy - Classification accuracy
  • f1 - F1 score (binary)
  • f1_weighted - Weighted F1 score
  • roc_auc - ROC AUC score
  • precision - Precision score
  • recall - Recall score

Regression

  • neg_mse - Negative mean squared error
  • neg_rmse - Negative root mean squared error
  • neg_mae - Negative mean absolute error
  • r2 - R² score
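
For example, tuning a regression model to maximize R² (the parameter ranges are illustrative):

Terminal
mlcli tune data/train.csv \
  --model random_forest \
  --strategy random \
  --param n_estimators=int:50:500 \
  --param max_depth=int:3:20 \
  --n-iter 50 \
  --cv 5 \
  --scoring r2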

Output

After tuning, mlcli outputs the best parameters and can save detailed results:

Terminal
mlcli tune data.csv \
  --model xgboost \
  --strategy bayesian \
  --n-iter 50 \
  --output-params best_params.json \
  --output-trials trials.csv \
  --output-plots plots/
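
The exact schema of these files depends on the mlcli version. Since the saved parameters feed straight back into mlcli train --params, a flat mapping of parameter names to tuned values is a reasonable sketch of best_params.json (every value below is illustrative, not real output):

JSON
{
  "n_estimators": 340,
  "max_depth": 6,
  "learning_rate": 0.042,
  "subsample": 0.85
}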

Using Best Parameters

Train a model with the tuned parameters:

Terminal
# Use the best parameters from tuning
mlcli train data.csv \
  --model xgboost \
  --params best_params.json \
  --output models/tuned_xgboost.pkl