Auto MPG
Running in Google Colab
You can run this experiment in Google Colab.
Dataset
The Auto MPG Dataset is a regression dataset [1] with 7 features:
- Cylinders
- Displacement
- Horsepower
- Weight
- Acceleration
- Model Year
- Origin
The dependent variable MPG is monotonically decreasing with respect to the features Weight, Displacement, and Horsepower. The monotonicity_indicator values corresponding to these features are therefore set to -1, since their relationship with the dependent variable is a monotonically decreasing one.
This experiment is part of a comparison with the methods and datasets from COMET [2].
References:
1. Ross Quinlan. Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann, 1993.
2. Aishwarya Sivaraman, Golnoosh Farnadi, Todd Millstein, and Guy Van den Broeck. Counterexample-Guided Learning of Monotonic Neural Networks. Advances in Neural Information Processing Systems, 33:11936–11948, 2020. GitHub repo: https://github.com/AishwaryaSivaraman/COMET
monotonicity_indicator = {
    "Cylinders": 0,
    "Displacement": -1,
    "Horsepower": -1,
    "Weight": -1,
    "Acceleration": 0,
    "Model_Year": 0,
    "Origin": 0,
}
These are a few examples of the dataset:
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Cylinders | 1.482807 | 1.482807 | 1.482807 | 1.482807 | 1.482807 |
| Displacement | 1.073028 | 1.482902 | 1.044432 | 1.025368 | 2.235927 |
| Horsepower | 0.650564 | 1.548993 | 1.163952 | 0.907258 | 2.396084 |
| Weight | 0.606625 | 0.828131 | 0.523413 | 0.542165 | 1.587581 |
| Acceleration | -1.275546 | -1.452517 | -1.275546 | -1.806460 | -1.983431 |
| Model_Year | -1.631803 | -1.631803 | -1.631803 | -1.631803 | -1.631803 |
| Origin | -0.701669 | -0.701669 | -0.701669 | -0.701669 | -0.701669 |
| ground_truth | 18.000000 | 15.000000 | 16.000000 | 17.000000 | 15.000000 |
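The feature values above are standardized. The following is a minimal sketch of how the raw data could be loaded and standardized; the UCI file location, the column handling, and the use of scikit-learn's StandardScaler are assumptions here, and the experiment's actual preprocessing pipeline may differ:
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumed location of the raw UCI Auto MPG file (whitespace-separated, "?" marks missing values,
# the trailing car-name field starts after a tab and is dropped via comment="\t")
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
columns = [
    "MPG", "Cylinders", "Displacement", "Horsepower",
    "Weight", "Acceleration", "Model_Year", "Origin",
]

df = pd.read_csv(
    url, names=columns, na_values="?", comment="\t", sep=" ", skipinitialspace=True
).dropna()

# Standardize the 7 features; the target (MPG) stays on its original scale
x = StandardScaler().fit_transform(df.drop(columns=["MPG"]))
y = df["MPG"].to_numpy()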
Hyperparameter search
The choice of the batch size and the maximum number of epochs depends on the dataset size. For this dataset, we use the following values:
batch_size = 16
max_epochs = 50
We use the Type-2 architecture built using the MonoDense layer with the following hyperparameter ranges:
def hp_params_f(hp):
    return dict(
        units=hp.Int("units", min_value=16, max_value=24, step=1),
        n_layers=hp.Int("n_layers", min_value=2, max_value=2),
        activation=hp.Choice("activation", values=["elu"]),
        learning_rate=hp.Float(
            "learning_rate", min_value=1e-2, max_value=0.3, sampling="log"
        ),
        weight_decay=hp.Float(
            "weight_decay", min_value=1e-2, max_value=0.3, sampling="log"
        ),
        dropout=hp.Float("dropout", min_value=0.0, max_value=0.5, sampling="linear"),
        decay_rate=hp.Float(
            "decay_rate", min_value=0.8, max_value=1.0, sampling="reverse_log"
        ),
    )
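As an illustration of how these ranges could be consumed, the following is a minimal sketch of a model-building function the tuner could call. The MonoDense import path, the per-feature monotonicity_indicator argument, the dropout placement, the AdamW optimizer, and the decay_steps value are assumptions here; the repository's actual Type-2 builder may differ:
import tensorflow as tf
from airt.keras.layers import MonoDense  # assumed import path for the MonoDense layer

def build_model(hp, hp_params=hp_params_f):
    params = hp_params(hp)
    inputs = tf.keras.Input(shape=(7,))

    # First hidden layer: per-feature monotonicity via the indicator defined above
    x = MonoDense(
        units=params["units"],
        activation=params["activation"],
        monotonicity_indicator=list(monotonicity_indicator.values()),
    )(inputs)
    x = tf.keras.layers.Dropout(params["dropout"])(x)

    # Remaining hidden layers operate on already-monotone activations
    for _ in range(params["n_layers"] - 1):
        x = MonoDense(units=params["units"], activation=params["activation"])(x)
        x = tf.keras.layers.Dropout(params["dropout"])(x)

    # Linear output for regression (final_activation=None, see the fixed parameters below)
    outputs = MonoDense(units=1)(x)
    model = tf.keras.Model(inputs, outputs)

    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=params["learning_rate"],
        decay_steps=100,  # assumed value, not taken from the experiment
        decay_rate=params["decay_rate"],
    )
    model.compile(
        optimizer=tf.keras.optimizers.AdamW(  # requires TF/Keras >= 2.11
            learning_rate=lr_schedule, weight_decay=params["weight_decay"]
        ),
        loss="mse",       # the fixed loss defined later in this section
        metrics=["mse"],  # the fixed metrics defined later in this section
    )
    return model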
The following fixed parameters are used to build the Type-2 architecture for this dataset:
- final_activation is used to build the final layer for a regression problem (set to None) or for a classification problem (set to "sigmoid"),
- loss is used for training a regression ("mse") or classification ("binary_crossentropy") problem, and
- metrics denotes the metrics used to compare with previously published results: "accuracy" for classification and "mse" or "rmse" for regression.
The parameters objective and direction are used by the tuner, where objective=f"val_{metrics}" and direction is either "min" or "max".
The parameter max_trials denotes the number of trials performed by the tuner, and patience is the number of epochs a trial is allowed to perform worse than its best epoch before it is stopped. The parameter executions_per_trial denotes the number of runs used to calculate the results of a trial; it should be set to a value greater than 1 for small datasets that have high variance in results.
final_activation = None
loss = "mse"
metrics = "mse"
objective = "val_mse"
direction = "min"
max_trials = 200
patience = 5
executions_per_trial = 3
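The following is a minimal sketch of how these settings could be wired together with Keras Tuner, assuming the build_model sketch above and a train/validation split of the standardized data; the tuner class, the split, and the directory names are illustrative rather than the experiment's exact setup:
import keras_tuner
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Train/validation split of the standardized features and MPG targets (x, y from the sketch above)
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=42)

tuner = keras_tuner.RandomSearch(
    hypermodel=build_model,
    objective=keras_tuner.Objective(objective, direction=direction),
    max_trials=max_trials,
    executions_per_trial=executions_per_trial,
    directory="tuner_results",  # assumed output directory
    project_name="auto_mpg",    # assumed project name
)

# Stop a trial early when the monitored metric does not improve for `patience` epochs
early_stopping = tf.keras.callbacks.EarlyStopping(monitor=objective, patience=patience)

tuner.search(
    x_train,
    y_train,
    validation_data=(x_val, y_val),
    batch_size=batch_size,
    epochs=max_epochs,
    callbacks=[early_stopping],
)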
The following table describes the best models and their hyperparameters found by the tuner:
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| units | 21 | 17 | 19 | 21 | 22 |
| n_layers | 2 | 2 | 2 | 2 | 2 |
| activation | elu | elu | elu | elu | elu |
| learning_rate | 0.073407 | 0.105021 | 0.080618 | 0.042817 | 0.107845 |
| weight_decay | 0.058583 | 0.064151 | 0.023706 | 0.045050 | 0.032343 |
| dropout | 0.157718 | 0.189830 | 0.149354 | 0.324661 | 0.237459 |
| decay_rate | 0.887923 | 0.828540 | 0.800000 | 0.988544 | 0.886158 |
| val_mse_mean | 8.371161 | 8.404634 | 8.420449 | 8.421339 | 8.430901 |
| val_mse_std | 0.084437 | 0.149566 | 0.110670 | 0.063357 | 0.115722 |
| val_mse_min | 8.251875 | 8.255271 | 8.294801 | 8.352478 | 8.297507 |
| val_mse_max | 8.476566 | 8.614701 | 8.576631 | 8.520736 | 8.565886 |
| params | 848 | 567 | 627 | 848 | 885 |
The optimal model
These are the best hyperparameters found by previous runs of the tuner:
def final_hp_params_f(hp):
    return dict(
        units=hp.Fixed("units", value=21),
        n_layers=hp.Fixed("n_layers", 2),
        activation=hp.Fixed("activation", value="elu"),
        learning_rate=hp.Fixed("learning_rate", value=0.073407),
        weight_decay=hp.Fixed("weight_decay", value=0.058583),
        dropout=hp.Fixed("dropout", value=0.157718),
        decay_rate=hp.Fixed("decay_rate", value=0.887923),
    )
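To train and evaluate a model with these fixed hyperparameters, they can be plugged directly into the build function, e.g. as in the following sketch (under the same assumptions as the code above, reusing the data split from the tuner sketch):
import keras_tuner
import tensorflow as tf

# Build the optimal model from the fixed hyperparameters
hp = keras_tuner.HyperParameters()
model = build_model(hp, hp_params=final_hp_params_f)

model.fit(
    x_train,
    y_train,
    validation_data=(x_val, y_val),
    batch_size=batch_size,
    epochs=max_epochs,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor=objective, patience=patience)],
)
print(model.evaluate(x_val, y_val, return_dict=True))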
The final evaluation of the optimal model:
| | 0 |
|---|---|
| units | 21 |
| n_layers | 2 |
| activation | elu |
| learning_rate | 0.073407 |
| weight_decay | 0.058583 |
| dropout | 0.157718 |
| decay_rate | 0.887923 |
| val_mse_mean | 8.371155 |
| val_mse_std | 0.084440 |
| val_mse_min | 8.251865 |
| val_mse_max | 8.476567 |
| params | 848 |