
Auto MPG

Running in Google Colab

You can run this experiment in Google Colab by clicking the button below:

Open in Colab

Dataset

The Auto MPG Dataset is a regression dataset [1] with 7 features:

  • Cylinders

  • Displacement

  • Horsepower

  • Weight

  • Acceleration

  • Model Year

  • Origin

The dependent variable MPG is monotonically decreasing with respect to the features Weight, Displacement, and Horsepower. The monotonicity_indicator values corresponding to these features are therefore set to -1.

This experiment is part of a comparison with the methods and datasets used in COMET [2].

References:

  1. Ross Quinlan. Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann, 1993.

    https://archive.ics.uci.edu/ml/datasets/auto+mpg

  2. Aishwarya Sivaraman, Golnoosh Farnadi, Todd Millstein, and Guy Van den Broeck. Counterexample-guided learning of monotonic neural networks. Advances in Neural Information Processing Systems, 33:11936–11948, 2020.

    GitHub repo: https://github.com/AishwaryaSivaraman/COMET

# -1: MPG is monotonically decreasing in this feature; 0: no monotonicity constraint
monotonicity_indicator = {
    "Cylinders": 0,
    "Displacement": -1,
    "Horsepower": -1,
    "Weight": -1,
    "Acceleration": 0,
    "Model_Year": 0,
    "Origin": 0,
}

These are a few examples from the dataset; each column is one sample, the feature values are shown after scaling, and ground_truth is the raw MPG:

                      0          1          2          3          4
Cylinders      1.482807   1.482807   1.482807   1.482807   1.482807
Displacement   1.073028   1.482902   1.044432   1.025368   2.235927
Horsepower     0.650564   1.548993   1.163952   0.907258   2.396084
Weight         0.606625   0.828131   0.523413   0.542165   1.587581
Acceleration  -1.275546  -1.452517  -1.275546  -1.806460  -1.983431
Model_Year    -1.631803  -1.631803  -1.631803  -1.631803  -1.631803
Origin        -0.701669  -0.701669  -0.701669  -0.701669  -0.701669
ground_truth  18.000000  15.000000  16.000000  17.000000  15.000000
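
For orientation, here is a minimal sketch of how such scaled examples could be produced with pandas and scikit-learn from the UCI file in [1]. This is an assumption for illustration only; the experiment's own preprocessing and train/validation split may differ.

# A minimal sketch (assumption: pandas + scikit-learn), not the experiment's
# exact preprocessing, of producing scaled examples like the ones above.
import pandas as pd
from sklearn.preprocessing import StandardScaler

url = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "auto-mpg/auto-mpg.data"
)
column_names = [
    "MPG", "Cylinders", "Displacement", "Horsepower",
    "Weight", "Acceleration", "Model_Year", "Origin",
]

# The trailing car-name string follows a tab, so comment="\t" drops it
raw = pd.read_csv(
    url, names=column_names, na_values="?",
    comment="\t", sep=" ", skipinitialspace=True,
).dropna()

X = raw.drop(columns="MPG")
y = raw["MPG"]  # the ground_truth column shown above

# Standardizing the features yields values on the scale shown above
X_scaled = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)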

The choice of the batch size and the maximum number of epochs depends on the dataset size. For this dataset, we use the following values:

batch_size = 16
max_epochs = 50

We use the Type-2 architecture built with the MonoDense layer and the following hyperparameter ranges:

def hp_params_f(hp):
    return dict(
        units=hp.Int("units", min_value=16, max_value=24, step=1),
        n_layers=hp.Int("n_layers", min_value=2, max_value=2),
        activation=hp.Choice("activation", values=["elu"]),
        learning_rate=hp.Float(
            "learning_rate", min_value=1e-2, max_value=0.3, sampling="log"
        ),
        weight_decay=hp.Float(
            "weight_decay", min_value=1e-2, max_value=0.3, sampling="log"
        ),
        dropout=hp.Float("dropout", min_value=0.0, max_value=0.5, sampling="linear"),
        decay_rate=hp.Float(
            "decay_rate", min_value=0.8, max_value=1.0, sampling="reverse_log"
        ),
    )
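
In keras-tuner, a function such as hp_params_f is typically consumed inside a model-building function. The sketch below is a hedged illustration, not the library's actual code: build_type2_model is a hypothetical stand-in for the Type-2 MonoDense builder, and the optimizer setup (AdamW with an exponentially decaying learning rate and decay_steps=1000) is an assumption suggested by the learning_rate, weight_decay, and decay_rate hyperparameters above.

# Hedged sketch of a keras-tuner model builder consuming hp_params_f.
# build_type2_model is a hypothetical stand-in for the library's Type-2
# MonoDense builder; it is not the library's actual API.
import keras_tuner as kt
import tensorflow as tf


def build_model(hp, hp_params=hp_params_f):
    params = hp_params(hp)  # sample one hyperparameter configuration

    # Hypothetical: build the Type-2 monotonic network from the sampled
    # hyperparameters and the monotonicity_indicator defined above.
    model = build_type2_model(
        monotonicity_indicator=monotonicity_indicator,
        units=params["units"],
        n_layers=params["n_layers"],
        activation=params["activation"],
        dropout=params["dropout"],
    )

    # Assumption: exponentially decaying learning rate with decoupled weight
    # decay; decay_steps is not a value taken from the experiment.
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=params["learning_rate"],
        decay_steps=1_000,
        decay_rate=params["decay_rate"],
    )
    optimizer = tf.keras.optimizers.AdamW(
        learning_rate=lr_schedule, weight_decay=params["weight_decay"]
    )
    # "mse" matches the loss and metrics values set for this dataset below
    model.compile(optimizer=optimizer, loss="mse", metrics=["mse"])
    return model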

The following fixed parameters are used to build the Type-2 architecture for this dataset:

  • final_activation is used to build the final layer: None for a regression problem or "sigmoid" for a classification problem,

  • loss is the training loss: "mse" for regression or "binary_crossentropy" for classification, and

  • metrics denotes the metric used for comparison with previously published results: "accuracy" for classification and "mse" or "rmse" for regression.

The parameters objective and direction are used by the tuner, with objective=f"val_{metrics}" and direction set to either "min" or "max".

The parameter max_trials denotes the number of trials performed by the tuner, and patience is the number of epochs a trial is allowed to perform worse than its best epoch before it is stopped early. The parameter executions_per_trial denotes the number of training runs averaged to score a trial; it should be set to a value greater than 1 for small datasets with high variance in results.

final_activation = None
loss = "mse"
metrics = "mse"
objective = "val_mse"
direction = "min"
max_trials = 200
patience = 5
executions_per_trial = 3
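
A hedged sketch of how these values might be wired into a keras-tuner search, reusing the build_model sketch above. The choice of RandomSearch, the directory and project names, and the X_train/y_train/X_val/y_val variables are assumptions rather than the experiment's exact setup.

# Hedged sketch: wiring the fixed parameters into a keras-tuner search.
import keras_tuner as kt
import tensorflow as tf

tuner = kt.RandomSearch(
    build_model,                                              # sketched above
    objective=kt.Objective(objective, direction=direction),   # "val_mse", "min"
    max_trials=max_trials,                                    # 200 trials
    executions_per_trial=executions_per_trial,                # average over 3 runs
    directory="tuner_results",                                # assumption
    project_name="auto_mpg",                                  # assumption
)

# Stop a trial after `patience` epochs without improvement in val_mse
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor=objective, mode=direction, patience=patience
)

tuner.search(
    X_train, y_train,                    # assumed training split
    validation_data=(X_val, y_val),      # assumed validation split
    batch_size=batch_size,
    epochs=max_epochs,
    callbacks=[early_stopping],
)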

The following table shows the five best models found by the tuner and their hyperparameters:

                      0         1         2         3         4
units                21        17        19        21        22
n_layers              2         2         2         2         2
activation          elu       elu       elu       elu       elu
learning_rate  0.073407  0.105021  0.080618  0.042817  0.107845
weight_decay   0.058583  0.064151  0.023706  0.045050  0.032343
dropout        0.157718  0.189830  0.149354  0.324661  0.237459
decay_rate     0.887923  0.828540  0.800000  0.988544  0.886158
val_mse_mean   8.371161  8.404634  8.420449  8.421339  8.430901
val_mse_std    0.084437  0.149566  0.110670  0.063357  0.115722
val_mse_min    8.251875  8.255271  8.294801  8.352478  8.297507
val_mse_max    8.476566  8.614701  8.576631  8.520736  8.565886
params              848       567       627       848       885

The optimal model

These are the best hyperparameters found by previous runs of the tuner:

def final_hp_params_f(hp):
    return dict(
        units=hp.Fixed("units", value=21),
        n_layers=hp.Fixed("n_layers", value=2),
        activation=hp.Fixed("activation", value="elu"),
        learning_rate=hp.Fixed("learning_rate", value=0.073407),
        weight_decay=hp.Fixed("weight_decay", value=0.058583),
        dropout=hp.Fixed("dropout", value=0.157718),
        decay_rate=hp.Fixed("decay_rate", value=0.887923),
    )
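
To rebuild the optimal model outside the search, the fixed hyperparameters can be passed through the same hypothetical build_model sketched above. The data variables and callbacks below are again assumptions, not the experiment's exact code.

# Hedged sketch: building and training the optimal configuration directly.
import keras_tuner as kt
import tensorflow as tf

hp = kt.HyperParameters()
final_model = build_model(hp, hp_params=final_hp_params_f)

final_model.fit(
    X_train, y_train,                    # assumed training split
    validation_data=(X_val, y_val),      # assumed validation split
    batch_size=batch_size,
    epochs=max_epochs,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor=objective, mode=direction, patience=patience
        )
    ],
)
final_model.evaluate(X_val, y_val)  # reports the validation mse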

The final evaluation of the optimal model:

                      0
units                21
n_layers              2
activation          elu
learning_rate  0.073407
weight_decay   0.058583
dropout        0.157718
decay_rate     0.887923
val_mse_mean   8.371155
val_mse_std    0.084440
val_mse_min    8.251865
val_mse_max    8.476567
params              848