
Heart disease¤

Running in Google Colab¤

You can run this experiment in Google Colab by clicking the button below:

Open in Colab

Dataset¤

Heart Disease [1] is a classification dataset used for predicting the presence of heart disease with 13 features:

  • age

  • sex

  • cp

  • trestbps

  • chol

  • fbs

  • restecg

  • thalach

  • exang

  • oldpeak

  • slope

  • ca

  • thal

The dependent variable is monotonically increasing with respect to the features trestbps and chol (cholesterol). The monotonicity_indicator values corresponding to these features are set to 1.

References:

  1. John H. Gennari, Pat Langley, and Douglas H. Fisher. Models of incremental concept formation. Artif. Intell., 40(1-3):11–61, 1989.

    https://archive.ics.uci.edu/ml/datasets/heart+disease

  2. Aishwarya Sivaraman, Golnoosh Farnadi, Todd Millstein, and Guy Van den Broeck. Counterexample-guided learning of monotonic neural networks. Advances in Neural Information Processing Systems, 33:11936–11948, 2020.

monotonicity_indicator = {
    "age": 0,
    "sex": 0,
    "cp": 0,
    "trestbps": 1,
    "chol": 1,
    "fbs": 0,
    "restecg": 0,
    "thalach": 0,
    "exang": 0,
    "oldpeak": 0,
    "slope": 0,
    "ca": 0,
    "thal": 0,
}
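
The dict can be turned into a per-feature indicator vector in the dataset's column order and passed to the monotonic dense layers. Below is a minimal sketch of how this might look; the import path airt.keras.layers.MonoDense and the layer signature are assumptions about the library's API, and this is not the exact Type-2 builder used in the experiment.

# Minimal sketch (not the experiment's actual Type-2 builder).
# Assumption: MonoDense accepts a per-feature monotonicity_indicator argument.
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Activation
from airt.keras.layers import MonoDense  # assumed import path
feature_order = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]
indicator = [monotonicity_indicator[f] for f in feature_order]
inputs = Input(shape=(len(feature_order),))
# Only the first layer needs the indicator; later layers keep monotonicity by construction.
x = MonoDense(22, activation="elu", monotonicity_indicator=indicator)(inputs)
x = MonoDense(22, activation="elu")(x)
x = MonoDense(1)(x)
outputs = Activation("sigmoid")(x)  # classification head
model = Model(inputs, outputs)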

These are a few examples from the dataset (each column is one sample):

  0 1 2 3 4
age 0.972778 1.415074 1.415074 -1.902148 -1.459852
sex 0.649445 0.649445 0.649445 0.649445 -1.533413
cp -2.020077 0.884034 0.884034 -0.084003 -1.052040
trestbps 0.721008 1.543527 -0.649858 -0.101512 -0.101512
chol -0.251855 0.740555 -0.326754 0.066465 -0.794872
fbs 2.426901 -0.410346 -0.410346 -0.410346 -0.410346
restecg 1.070838 1.070838 1.070838 -0.953715 1.070838
thalach -0.025055 -1.831151 -0.928103 1.566030 0.920995
exang -0.721010 1.381212 1.381212 -0.721010 -0.721010
oldpeak 0.986440 0.330395 1.232457 1.970508 0.248389
slope 2.334348 0.687374 0.687374 2.334348 -0.959601
ca -0.770198 2.425024 1.359950 -0.770198 -0.770198
thal -2.070238 -0.514345 1.041548 -0.514345 -0.514345
ground_truth 0.000000 1.000000 0.000000 0.000000 0.000000
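
The feature values above appear standardized. The exact preprocessing pipeline is not shown on this page, so the following is only one plausible way to load the processed Cleveland data and obtain samples resembling the table above; the URL and column names correspond to the UCI file, while the split parameters are arbitrary choices for illustration.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Column names of the processed Cleveland file (the raw file has no header row).
columns = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]
url = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "heart-disease/processed.cleveland.data"
)
df = pd.read_csv(url, names=columns, na_values="?").dropna()
# Binarize the target (any presence of heart disease) and standardize the features.
y = (df.pop("target") > 0).astype(float)
X = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)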

The choice of the batch size and the maximum number of epochs depends on the dataset size. For this dataset, we use the following values:

batch_size = 16
max_epochs = 50

We use the Type-2 architecture built using the MonoDense layer with the following hyperparameter ranges:

def hp_params_f(hp):
    return dict(
        units=hp.Int("units", min_value=16, max_value=32, step=1),
        n_layers=hp.Int("n_layers", min_value=2, max_value=2),
        activation=hp.Choice("activation", values=["elu"]),
        learning_rate=hp.Float(
            "learning_rate", min_value=1e-4, max_value=1e-2, sampling="log"
        ),
        weight_decay=hp.Float(
            "weight_decay", min_value=3e-2, max_value=0.3, sampling="log"
        ),
        dropout=hp.Float("dropout", min_value=0.0, max_value=0.5, sampling="linear"),
        decay_rate=hp.Float(
            "decay_rate", min_value=0.8, max_value=1.0, sampling="reverse_log"
        ),
    )

The following fixed parameters are used to build the Type-2 architecture for this dataset:

  • final_activation is used to build the final layer: it is set to None for a regression problem or to "sigmoid" for a classification problem,

  • loss is the training loss: "mse" for a regression problem or "binary_crossentropy" for a classification problem, and

  • metrics denotes the metric used to compare with previously published results: "accuracy" for classification and "mse" or "rmse" for regression.

The parameters objective and direction are used by the tuner: objective=f"val_{metrics}" and direction is either "min" or "max".

The parameter max_trials denotes the number of trials performed by the tuner, while patience is the number of epochs a trial is allowed to perform worse than its best epoch before it is stopped early. The parameter executions_per_trial denotes the number of training runs averaged to score a trial; it should be set to a value greater than 1 for small datasets with high variance in results.

final_activation = "sigmoid"
loss = "binary_crossentropy"
metrics = "accuracy"
objective = "val_accuracy"
direction = "max"
max_trials = 200
executions_per_trial = 3
patience = 5
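
Putting these settings together, the sketch below shows how they might drive a KerasTuner random search. It is illustrative only: a plain Dense network stands in for the MonoDense Type-2 architecture, the AdamW optimizer with an exponential learning-rate decay is an assumption (the experiment's optimizer setup is not shown here), and X_train, y_train, X_val, y_val are assumed to be the prepared training and validation splits.

import tensorflow as tf
import keras_tuner as kt
def build_model(hp):
    # Stand-in build function: plain Dense layers are used here instead of the
    # MonoDense Type-2 architecture, only to show how the tuner consumes the
    # hyperparameters returned by hp_params_f.
    p = hp_params_f(hp)
    inputs = tf.keras.Input(shape=(13,))
    x = inputs
    for _ in range(p["n_layers"]):
        x = tf.keras.layers.Dense(p["units"], activation=p["activation"])(x)
        x = tf.keras.layers.Dropout(p["dropout"])(x)
    outputs = tf.keras.layers.Dense(1, activation=final_activation)(x)
    model = tf.keras.Model(inputs, outputs)
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        p["learning_rate"], decay_steps=100, decay_rate=p["decay_rate"]  # decay_steps chosen arbitrarily
    )
    model.compile(
        optimizer=tf.keras.optimizers.AdamW(  # requires TensorFlow >= 2.11
            learning_rate=lr_schedule, weight_decay=p["weight_decay"]
        ),
        loss=loss,
        metrics=[metrics],
    )
    return model
tuner = kt.RandomSearch(
    hypermodel=build_model,
    objective=kt.Objective(objective, direction=direction),
    max_trials=max_trials,
    executions_per_trial=executions_per_trial,
    overwrite=True,
    directory="tuner_results",  # hypothetical output directory
    project_name="heart_disease",
)
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor=objective, mode=direction, patience=patience
)
tuner.search(
    X_train,
    y_train,
    validation_data=(X_val, y_val),
    batch_size=batch_size,
    epochs=max_epochs,
    callbacks=[early_stopping],
)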

The following table describes the best models and their hyperparameters found by the tuner:

  0 1 2 3 4
units 22 18 23 23 21
n_layers 2 2 2 2 2
activation elu elu elu elu elu
learning_rate 0.001000 0.001000 0.001328 0.001000 0.001000
weight_decay 0.113929 0.122019 0.111481 0.139452 0.140732
dropout 0.397874 0.460844 0.405396 0.424631 0.418484
decay_rate 0.894921 0.921600 0.901050 0.897339 0.889619
val_accuracy_mean 0.885246 0.885246 0.881967 0.881967 0.878689
val_accuracy_std 0.000000 0.000000 0.007331 0.007331 0.008979
val_accuracy_min 0.885246 0.885246 0.868852 0.868852 0.868852
val_accuracy_max 0.885246 0.885246 0.885246 0.885246 0.885246
params 1605 1077 1672 1672 1538

The optimal model¤

These are the best hyperparameters found by previous runs of the tuner:

def final_hp_params_f(hp):
    return dict(
        units=hp.Fixed("units", value=22),
        n_layers=hp.Fixed("n_layers", 2),
        activation=hp.Fixed("activation", value="elu"),
        learning_rate=hp.Fixed("learning_rate", value=0.001),
        weight_decay=hp.Fixed("weight_decay", value=0.113929),
        dropout=hp.Fixed("dropout", value=0.397874),
        decay_rate=hp.Fixed("decay_rate", value=0.894921),
    )
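
Since hp.Fixed simply registers and returns the given value, the fixed configuration can also be materialized outside the tuner by passing a fresh HyperParameters object to the function above:

import keras_tuner as kt
hp = kt.HyperParameters()
final_params = final_hp_params_f(hp)
print(final_params)
# {'units': 22, 'n_layers': 2, 'activation': 'elu', 'learning_rate': 0.001,
#  'weight_decay': 0.113929, 'dropout': 0.397874, 'decay_rate': 0.894921}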

The final evaluation of the optimal model:

  0
units 22
n_layers 2
activation elu
learning_rate 0.001000
weight_decay 0.113929
dropout 0.397874
decay_rate 0.894921
val_accuracy_mean 0.885246
val_accuracy_std 0.000000
val_accuracy_min 0.885246
val_accuracy_max 0.885246
params 1605