# Configuration

mlcli supports flexible configuration through JSON files, command-line arguments, and environment variables. This guide covers all available options.
## Configuration File

Create a JSON configuration file to define your training parameters:
```json
{
  "model_type": "random_forest",
  "dataset_path": "data/train.csv",
  "target_column": "label",
  "test_size": 0.2,
  "random_state": 42,
  "hyperparameters": {
    "n_estimators": 100,
    "max_depth": 10,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "class_weight": "balanced"
  },
  "output_dir": "models/",
  "experiment_name": "my_experiment",
  "tags": ["production", "v1"]
}
```

## Configuration Options
### Required Options
| Option | Type | Description |
|---|---|---|
| `model_type` | string | Type of model to train (e.g., `random_forest`, `xgboost`, `svm`) |
| `dataset_path` | string | Path to the training dataset (CSV format) |
| `target_column` | string | Name of the target column in the dataset |
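A config loader would typically reject a file that omits any of these keys. A minimal sketch of that check (the `load_config` helper and its error message are illustrative, not mlcli's actual implementation):

```python
import json

# The three required options from the table above.
REQUIRED_KEYS = ("model_type", "dataset_path", "target_column")

def load_config(text: str) -> dict:
    """Parse a JSON config string and verify the required options exist."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing required option(s): {', '.join(missing)}")
    return config
```

A config missing `dataset_path`, for example, would be rejected before any training starts.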
### Optional Options
| Option | Type | Default | Description |
|---|---|---|---|
| `test_size` | float | `0.2` | Proportion of data for validation |
| `random_state` | int | `42` | Random seed for reproducibility |
| `output_dir` | string | `models/` | Directory to save trained models |
| `experiment_name` | string | `null` | Name for the experiment run |
| `hyperparameters` | object | `{}` | Model-specific hyperparameters |
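One common way a loader applies these fallbacks is to merge the user's config over a table of defaults. A sketch of that pattern (the `DEFAULTS` dict mirrors the table above; the merge helper is illustrative, not mlcli's internals):

```python
# Defaults taken from the optional-options table above.
DEFAULTS = {
    "test_size": 0.2,
    "random_state": 42,
    "output_dir": "models/",
    "experiment_name": None,
    "hyperparameters": {},
}

def apply_defaults(config: dict) -> dict:
    """Return a new dict where any unset optional key falls back to DEFAULTS."""
    return {**DEFAULTS, **config}
```

Keys the user sets win; everything else comes from the defaults.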
## Command-Line Arguments
You can also pass options directly via the command line:
```bash
mlcli train \
  --data data/train.csv \
  --model random_forest \
  --target label \
  --test-size 0.2 \
  --n-estimators 100 \
  --max-depth 10 \
  --output models/
```

CLI arguments take precedence over config file values, allowing you to override specific options.
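That precedence rule amounts to overlaying only the flags that were actually passed on top of the config file's values. A sketch of one way to implement it, using `None` as a sentinel for "flag not given" (an assumption for illustration, not mlcli's actual resolution logic):

```python
def resolve_options(config: dict, cli_args: dict) -> dict:
    """Merge config-file values with CLI arguments.

    A CLI value of None means the flag was not passed, so the config-file
    value (if any) is kept; any explicitly passed flag overrides the config.
    """
    overrides = {key: value for key, value in cli_args.items() if value is not None}
    return {**config, **overrides}
```

So a config with `"test_size": 0.2` combined with `--test-size 0.3` resolves to `0.3`, while untouched options keep their config-file values.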
## Environment Variables
Some options can be set via environment variables:
```bash
# Set default output directory
export MLCLI_OUTPUT_DIR=models/

# Enable MLflow tracking
export MLCLI_MLFLOW_TRACKING_URI=http://localhost:5000

# Set random seed
export MLCLI_RANDOM_STATE=42
```

## Hyperparameters by Model
### Random Forest
```json
{
  "n_estimators": 100,
  "max_depth": 10,
  "min_samples_split": 2,
  "min_samples_leaf": 1,
  "max_features": "sqrt",
  "class_weight": "balanced"
}
```

### XGBoost
```json
{
  "n_estimators": 100,
  "max_depth": 6,
  "learning_rate": 0.1,
  "subsample": 0.8,
  "colsample_bytree": 0.8,
  "objective": "binary:logistic"
}
```

### Deep Neural Network
```json
{
  "hidden_layers": [128, 64, 32],
  "dropout_rate": 0.3,
  "learning_rate": 0.001,
  "epochs": 100,
  "batch_size": 32,
  "optimizer": "adam"
}
```
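As a sanity check on network size, the trainable-parameter count implied by a `hidden_layers` spec can be computed directly. A sketch, assuming fully connected layers with biases (the input width of 20 features and single output unit are illustrative, not fixed by mlcli):

```python
def dense_param_count(n_inputs: int, hidden_layers: list, n_outputs: int) -> int:
    """Count weights plus biases for a stack of fully connected layers."""
    total = 0
    prev = n_inputs
    for width in hidden_layers + [n_outputs]:
        total += prev * width + width  # weight matrix plus bias vector
        prev = width
    return total

# With 20 input features and one output unit, the [128, 64, 32] stack above
# has (20*128 + 128) + (128*64 + 64) + (64*32 + 32) + (32*1 + 1) = 13,057
# trainable parameters.
```

Dropout adds no parameters of its own, so `dropout_rate` does not change this count.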