Configuration

mlcli supports flexible configuration through JSON files, command-line arguments, and environment variables. This guide covers all available options.

Configuration File

Create a JSON configuration file to define your training parameters:

JSON
{
  "model_type": "random_forest",
  "dataset_path": "data/train.csv",
  "target_column": "label",
  "test_size": 0.2,
  "random_state": 42,
  "hyperparameters": {
    "n_estimators": 100,
    "max_depth": 10,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "class_weight": "balanced"
  },
  "output_dir": "models/",
  "experiment_name": "my_experiment",
  "tags": [
    "production",
    "v1"
  ]
}
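
To use the file, point mlcli at it when starting a run. The flag name below is an assumption; check mlcli train --help for the exact spelling in your version:

Terminal
# Assumes a --config flag; verify with `mlcli train --help`
mlcli train --config config.json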

Configuration Options

Required Options

| Option | Type | Description |
| --- | --- | --- |
| model_type | string | Type of model to train (e.g., random_forest, xgboost, svm) |
| dataset_path | string | Path to the training dataset (CSV format) |
| target_column | string | Name of the target column in the dataset |
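
In principle, a config that sets only these three options is valid; everything else falls back to the defaults in the next table:

JSON
{
  "model_type": "random_forest",
  "dataset_path": "data/train.csv",
  "target_column": "label"
}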

Optional Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| test_size | float | 0.2 | Proportion of data held out for validation |
| random_state | int | 42 | Random seed for reproducibility |
| output_dir | string | models/ | Directory to save trained models |
| experiment_name | string | null | Name for the experiment run |
| hyperparameters | object | {} | Model-specific hyperparameters (see per-model examples below) |

Command-Line Arguments

You can also pass options directly via the command line:

Terminal
mlcli train \
  --data data/train.csv \
  --model random_forest \
  --target label \
  --test-size 0.2 \
  --n-estimators 100 \
  --max-depth 10 \
  --output models/

CLI arguments take precedence over config file values, allowing you to override specific options.
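
For example, to reuse a config file but try a larger forest, you can override a single hyperparameter at the prompt (again assuming a --config flag):

Terminal
# config.json sets n_estimators to 100; the CLI value wins
mlcli train --config config.json --n-estimators 200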

Environment Variables

Some options can be set via environment variables:

Terminal
# Set default output directory
export MLCLI_OUTPUT_DIR=models/

# Enable MLflow tracking
export MLCLI_MLFLOW_TRACKING_URI=http://localhost:5000

# Set random seed
export MLCLI_RANDOM_STATE=42
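
Because these are ordinary environment variables, you can also scope one to a single invocation instead of exporting it for the whole shell session:

Terminal
# Applies only to this run
MLCLI_RANDOM_STATE=7 mlcli train \
  --data data/train.csv \
  --model random_forest \
  --target label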

Hyperparameters by Model

Random Forest

JSON
{
  "n_estimators": 100,
  "max_depth": 10,
  "min_samples_split": 2,
  "min_samples_leaf": 1,
  "max_features": "sqrt",
  "class_weight": "balanced"
}

XGBoost

JSON
{
  "n_estimators": 100,
  "max_depth": 6,
  "learning_rate": 0.1,
  "subsample": 0.8,
  "colsample_bytree": 0.8,
  "objective": "binary:logistic"
}

Deep Neural Network

JSON
{
  "hidden_layers": [
    128,
    64,
    32
  ],
  "dropout_rate": 0.3,
  "learning_rate": 0.001,
  "epochs": 100,
  "batch_size": 32,
  "optimizer": "adam"
}
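
Whichever model you choose, its block nests under the top-level hyperparameters key. Putting it together for XGBoost (using the xgboost identifier from the required-options table):

JSON
{
  "model_type": "xgboost",
  "dataset_path": "data/train.csv",
  "target_column": "label",
  "hyperparameters": {
    "n_estimators": 100,
    "max_depth": 6,
    "learning_rate": 0.1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "objective": "binary:logistic"
  }
}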