Auto MPG
Running in Google Colab
You can run this experiment in Google Colab.
Dataset
The Auto MPG Dataset is a regression dataset [1] with 7 features:
- Cylinders
- Displacement
- Horsepower
- Weight
- Acceleration
- Model Year
- Origin
The dependent variable MPG is monotonically decreasing with respect to the features Weight, Displacement, and Horsepower. The monotonicity_indicator values corresponding to these features are therefore set to -1, since their relationship with the dependent variable is a monotonically decreasing one.
This experiment is part of a comparison with the methods and datasets from COMET [2].
References:
1. Ross Quinlan. Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann, 1993.
2. Aishwarya Sivaraman, Golnoosh Farnadi, Todd Millstein, and Guy Van den Broeck. Counterexample-Guided Learning of Monotonic Neural Networks. Advances in Neural Information Processing Systems, 33:11936–11948, 2020. GitHub repo: https://github.com/AishwaryaSivaraman/COMET
monotonicity_indicator = {
    "Cylinders": 0,
    "Displacement": -1,
    "Horsepower": -1,
    "Weight": -1,
    "Acceleration": 0,
    "Model_Year": 0,
    "Origin": 0,
}
These are a few examples of the dataset:
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Cylinders | 1.482807 | 1.482807 | 1.482807 | 1.482807 | 1.482807 |
| Displacement | 1.073028 | 1.482902 | 1.044432 | 1.025368 | 2.235927 |
| Horsepower | 0.650564 | 1.548993 | 1.163952 | 0.907258 | 2.396084 |
| Weight | 0.606625 | 0.828131 | 0.523413 | 0.542165 | 1.587581 |
| Acceleration | -1.275546 | -1.452517 | -1.275546 | -1.806460 | -1.983431 |
| Model_Year | -1.631803 | -1.631803 | -1.631803 | -1.631803 | -1.631803 |
| Origin | -0.701669 | -0.701669 | -0.701669 | -0.701669 | -0.701669 |
| ground_truth | 18.000000 | 15.000000 | 16.000000 | 17.000000 | 15.000000 |
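The feature values above are standardized. The following is a minimal sketch of how the raw data could be loaded and standardized; the UCI file location, the column handling, and the use of scikit-learn's StandardScaler are assumptions here, and the experiment's actual preprocessing pipeline may differ:
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumed location of the raw UCI Auto MPG file (whitespace-separated, "?" marks missing values,
# the trailing car-name field starts after a tab and is dropped via comment="\t")
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
columns = [
    "MPG", "Cylinders", "Displacement", "Horsepower",
    "Weight", "Acceleration", "Model_Year", "Origin",
]

df = pd.read_csv(
    url, names=columns, na_values="?", comment="\t", sep=" ", skipinitialspace=True
).dropna()

# Standardize the 7 features; the target (MPG) stays on its original scale
x = StandardScaler().fit_transform(df.drop(columns=["MPG"]))
y = df["MPG"].to_numpy()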
Hyperparameter search
The choice of the batch size and the maximum number of epochs depends on the dataset size. For this dataset, we use the following values:
batch_size = 16
max_epochs = 50
We use the Type-2 architecture built using the MonoDense layer with the following hyperparameter ranges:
def hp_params_f(hp):
    return dict(
        units=hp.Int("units", min_value=16, max_value=24, step=1),
        n_layers=hp.Int("n_layers", min_value=2, max_value=2),
        activation=hp.Choice("activation", values=["elu"]),
        learning_rate=hp.Float(
            "learning_rate", min_value=1e-2, max_value=0.3, sampling="log"
        ),
        weight_decay=hp.Float(
            "weight_decay", min_value=1e-2, max_value=0.3, sampling="log"
        ),
        dropout=hp.Float("dropout", min_value=0.0, max_value=0.5, sampling="linear"),
        decay_rate=hp.Float(
            "decay_rate", min_value=0.8, max_value=1.0, sampling="reverse_log"
        ),
    )
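As an illustration of how these ranges could be consumed, the following is a minimal sketch of a model-building function the tuner could call. The MonoDense import path, the per-feature monotonicity_indicator argument, the dropout placement, the AdamW optimizer, and the decay_steps value are assumptions here; the repository's actual Type-2 builder may differ:
import tensorflow as tf
from airt.keras.layers import MonoDense  # assumed import path for the MonoDense layer

def build_model(hp, hp_params=hp_params_f):
    params = hp_params(hp)
    inputs = tf.keras.Input(shape=(7,))

    # First hidden layer: per-feature monotonicity via the indicator defined above
    x = MonoDense(
        units=params["units"],
        activation=params["activation"],
        monotonicity_indicator=list(monotonicity_indicator.values()),
    )(inputs)
    x = tf.keras.layers.Dropout(params["dropout"])(x)

    # Remaining hidden layers operate on already-monotone activations
    for _ in range(params["n_layers"] - 1):
        x = MonoDense(units=params["units"], activation=params["activation"])(x)
        x = tf.keras.layers.Dropout(params["dropout"])(x)

    # Linear output for regression (final_activation=None, see the fixed parameters below)
    outputs = MonoDense(units=1)(x)
    model = tf.keras.Model(inputs, outputs)

    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=params["learning_rate"],
        decay_steps=100,  # assumed value, not taken from the experiment
        decay_rate=params["decay_rate"],
    )
    model.compile(
        optimizer=tf.keras.optimizers.AdamW(  # requires TF/Keras >= 2.11
            learning_rate=lr_schedule, weight_decay=params["weight_decay"]
        ),
        loss="mse",       # the fixed loss defined later in this section
        metrics=["mse"],  # the fixed metrics defined later in this section
    )
    return model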
The following fixed parameters are used to build the Type-2 architecture for this dataset:
- final_activation is used to build the final layer for a regression problem (set to None) or for a classification problem (set to "sigmoid"),
- loss is used for training a regression ("mse") or classification ("binary_crossentropy") problem, and
- metrics denotes the metrics used to compare with previously published results: "accuracy" for classification and "mse" or "rmse" for regression.
The parameters objective and direction are used by the tuner, where objective=f"val_{metrics}" and direction is either "min" or "max".
The parameter max_trials denotes the number of trials performed by the tuner, and patience is the number of epochs a trial is allowed to perform worse than its best epoch before it is stopped. The parameter executions_per_trial denotes the number of runs used to calculate the results of a trial; it should be set to a value greater than 1 for small datasets that have high variance in results.
final_activation = None
loss = "mse"
metrics = "mse"
objective = "val_mse"
direction = "min"
max_trials = 200
patience = 5
executions_per_trial = 3
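The following is a minimal sketch of how these settings could be wired together with Keras Tuner, assuming the build_model sketch above and a train/validation split of the standardized data; the tuner class, the split, and the directory names are illustrative rather than the experiment's exact setup:
import keras_tuner
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Train/validation split of the standardized features and MPG targets (x, y from the sketch above)
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=42)

tuner = keras_tuner.RandomSearch(
    hypermodel=build_model,
    objective=keras_tuner.Objective(objective, direction=direction),
    max_trials=max_trials,
    executions_per_trial=executions_per_trial,
    directory="tuner_results",  # assumed output directory
    project_name="auto_mpg",    # assumed project name
)

# Stop a trial early when the monitored metric does not improve for `patience` epochs
early_stopping = tf.keras.callbacks.EarlyStopping(monitor=objective, patience=patience)

tuner.search(
    x_train,
    y_train,
    validation_data=(x_val, y_val),
    batch_size=batch_size,
    epochs=max_epochs,
    callbacks=[early_stopping],
)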
The following table describes the best models and their hyperparameters found by the tuner:
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| units | 21 | 17 | 19 | 21 | 22 |
| n_layers | 2 | 2 | 2 | 2 | 2 |
| activation | elu | elu | elu | elu | elu |
| learning_rate | 0.073407 | 0.105021 | 0.080618 | 0.042817 | 0.107845 |
| weight_decay | 0.058583 | 0.064151 | 0.023706 | 0.045050 | 0.032343 |
| dropout | 0.157718 | 0.189830 | 0.149354 | 0.324661 | 0.237459 |
| decay_rate | 0.887923 | 0.828540 | 0.800000 | 0.988544 | 0.886158 |
| val_mse_mean | 8.371161 | 8.404634 | 8.420449 | 8.421339 | 8.430901 |
| val_mse_std | 0.084437 | 0.149566 | 0.110670 | 0.063357 | 0.115722 |
| val_mse_min | 8.251875 | 8.255271 | 8.294801 | 8.352478 | 8.297507 |
| val_mse_max | 8.476566 | 8.614701 | 8.576631 | 8.520736 | 8.565886 |
| params | 848 | 567 | 627 | 848 | 885 |
The optimal model
These are the best hyperparameters found by previous runs of the tuner:
def final_hp_params_f(hp):
    return dict(
        units=hp.Fixed("units", value=21),
        n_layers=hp.Fixed("n_layers", 2),
        activation=hp.Fixed("activation", value="elu"),
        learning_rate=hp.Fixed("learning_rate", value=0.073407),
        weight_decay=hp.Fixed("weight_decay", value=0.058583),
        dropout=hp.Fixed("dropout", value=0.157718),
        decay_rate=hp.Fixed("decay_rate", value=0.887923),
    )
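To train and evaluate a model with these fixed hyperparameters, they can be plugged directly into the build function, e.g. as in the following sketch (under the same assumptions as the code above, reusing the data split from the tuner sketch):
import keras_tuner
import tensorflow as tf

# Build the optimal model from the fixed hyperparameters
hp = keras_tuner.HyperParameters()
model = build_model(hp, hp_params=final_hp_params_f)

model.fit(
    x_train,
    y_train,
    validation_data=(x_val, y_val),
    batch_size=batch_size,
    epochs=max_epochs,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor=objective, patience=patience)],
)
print(model.evaluate(x_val, y_val, return_dict=True))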
The final evaluation of the optimal model:
| | 0 |
|---|---|
| units | 21 |
| n_layers | 2 |
| activation | elu |
| learning_rate | 0.073407 |
| weight_decay | 0.058583 |
| dropout | 0.157718 |
| decay_rate | 0.887923 |
| val_mse_mean | 8.371155 |
| val_mse_std | 0.084440 |
| val_mse_min | 8.251865 |
| val_mse_max | 8.476567 |
| params | 848 |