Hyperparameter Tuning
mlcli includes powerful hyperparameter tuning capabilities to help you find the best model configuration. Choose from grid search, random search, or Bayesian optimization.
Quick Start
Run hyperparameter tuning with a single command:
mlcli tune data/train.csv \
--model random_forest \
--strategy random \
--n-iter 50 \
--cv 5
Tuning Strategies
Grid Search: exhaustively searches a specified parameter grid.
Random Search: samples random combinations from the parameter space.
Bayesian Optimization: uses probabilistic models to guide the search.
Grid Search
Grid search exhaustively evaluates every combination of the specified hyperparameter values, so it is best suited to small parameter spaces. The example below evaluates 3 × 3 × 2 = 18 combinations, which amounts to 90 model fits with 5-fold cross-validation.
mlcli tune data/train.csv \
--model xgboost \
--strategy grid \
--param n_estimators=100,200,300 \
--param max_depth=3,5,7 \
--param learning_rate=0.01,0.1 \
--cv 5 \
--scoring accuracy
Random Search
Random search samples random combinations from the parameter space and evaluates only as many candidates as you request with --n-iter, which makes it more efficient than grid search for larger spaces.
mlcli tune data/train.csv \
--model lightgbm \
--strategy random \
--param n_estimators=int:50:500 \
--param max_depth=int:3:15 \
--param learning_rate=float:0.001:0.3:log \
--param subsample=float:0.5:1.0 \
--n-iter 100 \
--cv 5
Bayesian Optimization
Bayesian optimization builds a probabilistic model of the objective and uses it to choose which parameter combination to evaluate next. It is best suited to expensive model evaluations, where each trial needs to count.
mlcli tune data/train.csv \
--model random_forest \
--strategy bayesian \
--param n_estimators=int:50:500 \
--param max_depth=int:3:20 \
--param min_samples_split=int:2:20 \
--param min_samples_leaf=int:1:10 \
--n-iter 50 \
--cv 5
Parameter Specification
Define parameter search spaces using different formats:
Categorical Values
--param criterion=gini,entropy,log_loss
Integer Range
--param n_estimators=int:100:1000
Float Range
--param learning_rate=float:0.001:0.1
Log Scale
--param learning_rate=float:0.0001:1.0:log
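These formats can be combined in a single command, with one --param flag per hyperparameter. The following sketch mixes a categorical list with integer ranges; it reuses only the flags and model names shown above and assumes the random strategy accepts categorical lists alongside ranges:
mlcli tune data/train.csv \
--model random_forest \
--strategy random \
--param criterion=gini,entropy \
--param n_estimators=int:100:1000 \
--param max_depth=int:3:20 \
--n-iter 50 \
--cv 5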
YAML Configuration
Define complex tuning configurations in YAML:
# tuning_config.yaml
model: xgboost
strategy: bayesian
n_iter: 100
cv: 5
scoring: f1_weighted
parameters:
  n_estimators:
    type: int
    low: 50
    high: 500
  max_depth:
    type: int
    low: 3
    high: 15
  learning_rate:
    type: float
    low: 0.001
    high: 0.3
    log: true
  subsample:
    type: float
    low: 0.5
    high: 1.0
  colsample_bytree:
    type: float
    low: 0.5
    high: 1.0
  reg_alpha:
    type: float
    low: 0.0001
    high: 10.0
    log: true
  reg_lambda:
    type: float
    low: 0.0001
    high: 10.0
    log: true
early_stopping:
  patience: 10
  min_delta: 0.001
output:
  best_params: best_params.json
  trials: trials.csv
  plots: tuning_plots/
Run with the config file:
mlcli tune data/train.csv --config tuning_config.yaml
Cross-Validation Options
# Standard K-Fold
mlcli tune data.csv --model rf --cv 5
# Stratified K-Fold (for classification)
mlcli tune data.csv --model rf --cv stratified:5
# Time Series Split
mlcli tune data.csv --model rf --cv timeseries:5
# Repeated K-Fold
mlcli tune data.csv --model rf --cv repeated:5:3
Scoring Metrics
Available scoring metrics for optimization:
Classification
accuracy - Classification accuracy
f1 - F1 score (binary)
f1_weighted - Weighted F1 score
roc_auc - ROC AUC score
precision - Precision score
recall - Recall score
Regression
neg_mse - Negative mean squared error
neg_rmse - Negative root mean squared error
neg_mae - Negative mean absolute error
r2 - R² score
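Pass the chosen metric with --scoring, as in the grid search example above. For a regression target the command might look like the following sketch (it assumes the random_forest model also handles regression targets; the data path and ranges are placeholders):
mlcli tune data/train.csv \
--model random_forest \
--strategy random \
--param n_estimators=int:50:500 \
--param max_depth=int:3:20 \
--n-iter 50 \
--cv 5 \
--scoring neg_rmse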
Output
After tuning, mlcli outputs the best parameters and can save detailed results:
mlcli tune data.csv \
--model xgboost \
--strategy bayesian \
--n-iter 50 \
--output-params best_params.json \
--output-trials trials.csv \
--output-plots plots/
Using Best Parameters
Train a model with the tuned parameters:
# Use the best parameters from tuning
mlcli train data.csv \
--model xgboost \
--params best_params.json \
--output models/tuned_xgboost.pkl
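best_params.json is the file written by --output-params during tuning. Its exact contents depend on the model and search space; the values shown below are purely hypothetical:
# Inspect the saved parameters (the JSON shown in the comment is a hypothetical example)
cat best_params.json
# {"n_estimators": 300, "max_depth": 7, "learning_rate": 0.05, "subsample": 0.8}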