Heart disease
Running in Google Colab
You can run this experiment in Google Colab.
Dataset
Heart Disease [1] is a classification dataset used for predicting the presence of heart disease with 13 features:

- age
- sex
- cp
- trestbps
- chol
- fbs
- restecg
- thalach
- exang
- oldpeak
- slope
- ca
- thal
The dependent variable is monotonically increasing with respect to the features trestbps and cholesterol (chol). The monotonicity_indicator entries corresponding to these features are set to 1.
References:

1. John H. Gennari, Pat Langley, and Douglas H. Fisher. Models of incremental concept formation. Artificial Intelligence, 40(1-3):11–61, 1989.
2. Aishwarya Sivaraman, Golnoosh Farnadi, Todd Millstein, and Guy Van den Broeck. Counterexample-guided learning of monotonic neural networks. Advances in Neural Information Processing Systems, 33:11936–11948, 2020.
monotonicity_indicator = {
    "age": 0,
    "sex": 0,
    "cp": 0,
    "trestbps": 1,
    "chol": 1,
    "fbs": 0,
    "restecg": 0,
    "thalach": 0,
    "exang": 0,
    "oldpeak": 0,
    "slope": 0,
    "ca": 0,
    "thal": 0,
}
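For illustration, the dictionary can be flattened into a per-feature vector that follows the dataset's column order; the feature list and helper function below are a hypothetical sketch rather than part of the library API, and the resulting vector is what a MonoDense-based model consumes as its monotonicity indicator.

import numpy as np

# Column order as it appears in the dataset above (an assumption for this sketch).
FEATURES = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]

def indicator_vector(indicator: dict) -> np.ndarray:
    """Return the per-feature monotonicity indicator in dataset column order."""
    return np.array([indicator[name] for name in FEATURES], dtype=int)

print(indicator_vector(monotonicity_indicator))
# [0 0 0 1 1 0 0 0 0 0 0 0 0]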
These are a few examples of the dataset:
|  | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| age | 0.972778 | 1.415074 | 1.415074 | -1.902148 | -1.459852 |
| sex | 0.649445 | 0.649445 | 0.649445 | 0.649445 | -1.533413 |
| cp | -2.020077 | 0.884034 | 0.884034 | -0.084003 | -1.052040 |
| trestbps | 0.721008 | 1.543527 | -0.649858 | -0.101512 | -0.101512 |
| chol | -0.251855 | 0.740555 | -0.326754 | 0.066465 | -0.794872 |
| fbs | 2.426901 | -0.410346 | -0.410346 | -0.410346 | -0.410346 |
| restecg | 1.070838 | 1.070838 | 1.070838 | -0.953715 | 1.070838 |
| thalach | -0.025055 | -1.831151 | -0.928103 | 1.566030 | 0.920995 |
| exang | -0.721010 | 1.381212 | 1.381212 | -0.721010 | -0.721010 |
| oldpeak | 0.986440 | 0.330395 | 1.232457 | 1.970508 | 0.248389 |
| slope | 2.334348 | 0.687374 | 0.687374 | 2.334348 | -0.959601 |
| ca | -0.770198 | 2.425024 | 1.359950 | -0.770198 | -0.770198 |
| thal | -2.070238 | -0.514345 | 1.041548 | -0.514345 | -0.514345 |
| ground_truth | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 |
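The feature values above appear to be standardized (zero mean, unit variance), while ground_truth is the binary class label. A minimal preprocessing sketch along these lines, assuming the raw data is available as a pandas DataFrame df with a ground_truth column (the variable names are illustrative):

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split off the binary target; `df` is an assumed DataFrame holding the raw data.
X = df.drop(columns=["ground_truth"])
y = df["ground_truth"]

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize features using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_val = scaler.transform(X_val)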
Hyperparameter search
The choice of the batch size and the maximum number of epochs depends on the dataset size. For this dataset, we use the following values:
batch_size = 16
max_epochs = 50
We use the Type-2 architecture built using the MonoDense layer with the following hyperparameter ranges:
def hp_params_f(hp):
    return dict(
        units=hp.Int("units", min_value=16, max_value=32, step=1),
        n_layers=hp.Int("n_layers", min_value=2, max_value=2),
        activation=hp.Choice("activation", values=["elu"]),
        learning_rate=hp.Float(
            "learning_rate", min_value=1e-4, max_value=1e-2, sampling="log"
        ),
        weight_decay=hp.Float(
            "weight_decay", min_value=3e-2, max_value=0.3, sampling="log"
        ),
        dropout=hp.Float("dropout", min_value=0.0, max_value=0.5, sampling="linear"),
        decay_rate=hp.Float(
            "decay_rate", min_value=0.8, max_value=1.0, sampling="reverse_log"
        ),
    )
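A hedged sketch of how such a hyperparameter function is typically consumed by a Keras Tuner model builder. The MonoDense import path, the exact layer wiring, the decay_steps value, and the AdamW optimizer are assumptions made for illustration; the sketch only shows how units, n_layers, activation, dropout, learning_rate, weight_decay and decay_rate map onto a model and its optimizer.

import tensorflow as tf
from airt.keras.layers import MonoDense  # import path assumed from the library documentation

def build_model(hp):
    params = hp_params_f(hp)

    inputs = tf.keras.Input(shape=(len(monotonicity_indicator),))

    # First hidden layer: monotonically increasing w.r.t. trestbps and chol only.
    x = MonoDense(
        units=params["units"],
        activation=params["activation"],
        monotonicity_indicator=list(monotonicity_indicator.values()),
    )(inputs)
    x = tf.keras.layers.Dropout(params["dropout"])(x)

    # Remaining hidden layers stay monotonic in all of their inputs to preserve the constraint.
    for _ in range(params["n_layers"] - 1):
        x = MonoDense(
            units=params["units"],
            activation=params["activation"],
            monotonicity_indicator=1,
        )(x)
        x = tf.keras.layers.Dropout(params["dropout"])(x)

    # Monotonic output unit followed by the fixed final activation (see the fixed parameters below).
    x = MonoDense(units=1, monotonicity_indicator=1)(x)
    outputs = tf.keras.layers.Activation("sigmoid")(x)

    # Exponential learning-rate decay driven by decay_rate; decay_steps is an illustrative value.
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=params["learning_rate"],
        decay_steps=100,
        decay_rate=params["decay_rate"],
    )
    # AdamW (TensorFlow >= 2.11) is one way to apply weight_decay; the original optimizer may differ.
    optimizer = tf.keras.optimizers.AdamW(
        learning_rate=lr_schedule, weight_decay=params["weight_decay"]
    )

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
    return model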
The following fixed parameters are used to build the Type-2 architecture for this dataset:

- final_activation is used to build the final layer: None for a regression problem or "sigmoid" for a classification problem,
- loss is used for training: "mse" for a regression problem or "binary_crossentropy" for a classification problem, and
- metrics denotes the metric used to compare with previously published results: "accuracy" for classification and "mse" or "rmse" for regression.
The parameters objective and direction are used by the tuner such that objective=f"val_{metrics}" and direction is either "min" or "max".
The parameter max_trials denotes the number of trials performed by the tuner, and patience is the number of epochs a trial is allowed to perform worse than its best epoch before it is stopped early. The parameter executions_per_trial denotes the number of runs used to compute the result of a trial; it should be set to a value greater than 1 for small datasets that have high variance in results.
final_activation = "sigmoid"
loss = "binary_crossentropy"
metrics = "accuracy"
objective = "val_accuracy"
direction = "max"
max_trials = 200
executions_per_trial = 3
patience = 5
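A sketch of how these values would plug into a Keras Tuner search, assuming the build_model sketch above and the preprocessed X_train/X_val arrays; the choice of RandomSearch, as well as the directory and project name, are illustrative rather than the exact setup used here.

import keras_tuner as kt
import tensorflow as tf

# `patience` is enforced through early stopping on the tuning objective.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor=objective, patience=patience, restore_best_weights=True
)

tuner = kt.RandomSearch(
    hypermodel=build_model,
    objective=kt.Objective(objective, direction=direction),
    max_trials=max_trials,
    executions_per_trial=executions_per_trial,
    overwrite=True,
    directory="tuner",  # illustrative output location
    project_name="heart_disease",
)

tuner.search(
    X_train,
    y_train,
    validation_data=(X_val, y_val),
    batch_size=batch_size,
    epochs=max_epochs,
    callbacks=[early_stopping],
)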
The following table describes the best models and their hyperparameters found by the tuner:
|  | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| units | 22 | 18 | 23 | 23 | 21 |
| n_layers | 2 | 2 | 2 | 2 | 2 |
| activation | elu | elu | elu | elu | elu |
| learning_rate | 0.001000 | 0.001000 | 0.001328 | 0.001000 | 0.001000 |
| weight_decay | 0.113929 | 0.122019 | 0.111481 | 0.139452 | 0.140732 |
| dropout | 0.397874 | 0.460844 | 0.405396 | 0.424631 | 0.418484 |
| decay_rate | 0.894921 | 0.921600 | 0.901050 | 0.897339 | 0.889619 |
| val_accuracy_mean | 0.885246 | 0.885246 | 0.881967 | 0.881967 | 0.878689 |
| val_accuracy_std | 0.000000 | 0.000000 | 0.007331 | 0.007331 | 0.008979 |
| val_accuracy_min | 0.885246 | 0.885246 | 0.868852 | 0.868852 | 0.868852 |
| val_accuracy_max | 0.885246 | 0.885246 | 0.885246 | 0.885246 | 0.885246 |
| params | 1605 | 1077 | 1672 | 1672 | 1538 |
The optimal model
These are the best hyperparameters found by previous runs of the tuner:
def final_hp_params_f(hp):
    return dict(
        units=hp.Fixed("units", value=22),
        n_layers=hp.Fixed("n_layers", 2),
        activation=hp.Fixed("activation", value="elu"),
        learning_rate=hp.Fixed("learning_rate", value=0.001),
        weight_decay=hp.Fixed("weight_decay", value=0.113929),
        dropout=hp.Fixed("dropout", value=0.397874),
        decay_rate=hp.Fixed("decay_rate", value=0.894921),
    )
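Because every entry is hp.Fixed, the configuration collapses to a single set of values and can be read back without running a search; a minimal sketch, assuming keras_tuner is installed:

import keras_tuner as kt

# hp.Fixed registers each hyperparameter and simply returns its constant value.
hp = kt.HyperParameters()
final_params = final_hp_params_f(hp)
print(final_params)
# {'units': 22, 'n_layers': 2, 'activation': 'elu', 'learning_rate': 0.001,
#  'weight_decay': 0.113929, 'dropout': 0.397874, 'decay_rate': 0.894921}

These values would then be passed to the same model-building routine used during the search and retrained with the batch_size and max_epochs given above to produce the final evaluation below.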
The final evaluation of the optimal model:
|  | 0 |
|---|---|
| units | 22 |
| n_layers | 2 |
| activation | elu |
| learning_rate | 0.001000 |
| weight_decay | 0.113929 |
| dropout | 0.397874 |
| decay_rate | 0.894921 |
| val_accuracy_mean | 0.885246 |
| val_accuracy_std | 0.000000 |
| val_accuracy_min | 0.885246 |
| val_accuracy_max | 0.885246 |
| params | 1605 |