Loan¤
Running in Google Colab¤
You can run this experiment in Google Colab by clicking the button below:
Dataset¤
The Lending Club loan dataset contains complete loan data for all loans issued through 2007-2015 by several banks. Each data point is a 28-dimensional feature vector that includes the current loan status, the latest payment information, and other features. The task is to predict loan defaulters given the feature vector. The probability of loan default should be non-decreasing w.r.t. the number of public record bankruptcies and the debt-to-income ratio, and non-increasing w.r.t. credit score, length of employment, and annual income. The entries of `monotonicity_indicator` corresponding to these features are therefore set to 1 and -1, respectively.
References:
- https://www.kaggle.com/wendykan/lending-club-loan-data (Note: the dataset currently appears to have been withdrawn from Kaggle)
```python
monotonicity_indicator = {
    f"feature_{i}": mi for i, mi in enumerate([-1, 1, -1, -1, 1] + [0] * 23)
}
```
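As a side note, a minimal sketch of how such an indicator could be passed to a `MonoDense` layer is shown below; the feature ordering and the tiny two-layer model are illustrative assumptions, not the architecture used in this experiment:

```python
import tensorflow as tf
from airt.keras.layers import MonoDense

# Illustrative only: order the indicator values by feature index and pass
# them to the first monotonic layer of a toy model.
indicator_list = [monotonicity_indicator[f"feature_{i}"] for i in range(28)]

inputs = tf.keras.Input(shape=(28,))
x = MonoDense(8, activation="elu", monotonicity_indicator=indicator_list)(inputs)
outputs = MonoDense(1)(x)
toy_model = tf.keras.Model(inputs, outputs)
```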
Here are a few examples from the dataset:
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| feature_0 | 0.833333 | 1.000000 | 0.666667 | 0.333333 | 0.666667 |
| feature_1 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_2 | 0.400000 | 1.000000 | 0.800000 | 0.500000 | 0.700000 |
| feature_3 | 0.005263 | 0.003474 | 0.005263 | 0.007158 | 0.006842 |
| feature_4 | 0.005185 | 0.023804 | 0.029700 | 0.024434 | 0.021962 |
| feature_5 | 0.185751 | 0.134860 | 0.236641 | 0.745547 | 0.440204 |
| feature_6 | 0.240654 | 0.036215 | 0.271807 | 0.778037 | 0.260125 |
| feature_7 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| feature_8 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_9 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 |
| feature_10 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_11 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_12 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_13 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| feature_14 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_15 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 1.000000 |
| feature_16 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| feature_17 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_18 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_19 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_20 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_21 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_22 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_23 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_24 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_25 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_26 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| feature_27 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| ground_truth | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
Hyperparameter search¤
The choice of the batch size and the maximum number of epochs depends on the dataset size. For this dataset, we use the following values:
```python
batch_size = 256
max_epochs = 20
```
We use the Type-2 architecture built using the `MonoDense` layer with the following hyperparameter ranges:
```python
def hp_params_f(hp):
    return dict(
        units=hp.Int("units", min_value=4, max_value=32, step=1),
        n_layers=hp.Int("n_layers", min_value=1, max_value=2),
        activation=hp.Choice("activation", values=["elu"]),
        learning_rate=hp.Float(
            "learning_rate", min_value=1e-4, max_value=1e-2, sampling="log"
        ),
        weight_decay=hp.Float(
            "weight_decay", min_value=3e-2, max_value=0.3, sampling="log"
        ),
        dropout=hp.Float("dropout", min_value=0.0, max_value=0.5, sampling="linear"),
        decay_rate=hp.Float(
            "decay_rate", min_value=0.8, max_value=1.0, sampling="reverse_log"
        ),
    )
```
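To illustrate how these ranges might be consumed, here is a hedged sketch of a model-building function for the tuner. It uses a plain stack of `MonoDense` layers as a simplified stand-in for the actual Type-2 architecture, and the output head, learning-rate schedule, and optimizer wiring are assumptions:

```python
import tensorflow as tf
from airt.keras.layers import MonoDense


def build_model(hp):
    # Simplified stand-in for the Type-2 architecture used in the experiment.
    p = hp_params_f(hp)
    indicator = [monotonicity_indicator[f"feature_{i}"] for i in range(28)]

    inputs = tf.keras.Input(shape=(28,))
    x = MonoDense(
        p["units"], activation=p["activation"], monotonicity_indicator=indicator
    )(inputs)
    for _ in range(p["n_layers"] - 1):
        x = MonoDense(p["units"], activation=p["activation"])(x)
    x = tf.keras.layers.Dropout(p["dropout"])(x)
    x = MonoDense(1)(x)  # monotone output head
    outputs = tf.keras.layers.Activation("sigmoid")(x)  # assumed final activation

    # Assumed optimizer setup: Adam with an exponentially decaying learning rate
    # (decay_steps is arbitrary here). The weight_decay hyperparameter is omitted
    # in this sketch; the experiment presumably applies it via an AdamW-style
    # optimizer.
    lr = tf.keras.optimizers.schedules.ExponentialDecay(
        p["learning_rate"], decay_steps=10_000, decay_rate=p["decay_rate"]
    )
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```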
The following fixed parameters are used to build the Type-2 architecture for this dataset:
- `final_activation` is used to build the final layer: it is set to `None` for regression problems or to `"sigmoid"` for classification problems,
- `loss` is used for training: `"mse"` for regression or `"binary_crossentropy"` for classification, and
- `metrics` denotes the metrics used for comparison with previously published results: `"accuracy"` for classification and `"mse"` or `"rmse"` for regression.
The parameters `objective` and `direction` are used by the tuner such that `objective=f"val_{metrics}"` and `direction` is either `"min"` or `"max"`.
The parameter `max_trials` denotes the number of trials performed by the tuner, and `patience` is the number of epochs a trial is allowed to perform worse than its best epoch before it is stopped. The parameter `executions_per_trial` denotes the number of runs executed before calculating the results of a trial; it should be set to a value greater than 1 for small datasets that have high variance in results.
```python
final_activation = None
loss = "binary_crossentropy"
metrics = "accuracy"
objective = "val_accuracy"
direction = "max"
max_trials = 50
executions_per_trial = 1
patience = 5
```
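Putting the pieces together, a hedged sketch of the tuner wiring with keras-tuner follows; `build_model` refers to the sketch above, and the train/validation arrays as well as the choice of `RandomSearch` are assumptions rather than the exact setup used in the experiment:

```python
import keras_tuner as kt
import tensorflow as tf

tuner = kt.RandomSearch(
    hypermodel=build_model,  # assumed model-building function, see the sketch above
    objective=kt.Objective(objective, direction=direction),
    max_trials=max_trials,
    executions_per_trial=executions_per_trial,
    overwrite=True,
    directory="tuner",
    project_name="loan",
)

# Stop a trial after `patience` epochs without improvement on the objective.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor=objective, patience=patience, restore_best_weights=True
)

# X_train, y_train, X_val, y_val are assumed to hold the preprocessed loan data.
tuner.search(
    X_train,
    y_train,
    validation_data=(X_val, y_val),
    epochs=max_epochs,
    batch_size=batch_size,
    callbacks=[early_stopping],
)
```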
The following table describes the best models and their hyperparameters found by the tuner:
The optimal model¤
These are the best hyperparameters found by previous runs of the tuner:
```python
def final_hp_params_f(hp):
    return dict(
        units=hp.Fixed("units", value=8),
        n_layers=hp.Fixed("n_layers", 2),
        activation=hp.Fixed("activation", value="elu"),
        learning_rate=hp.Fixed("learning_rate", value=0.008),
        weight_decay=hp.Fixed("weight_decay", value=0.0),
        dropout=hp.Fixed("dropout", value=0.0),
        decay_rate=hp.Fixed("decay_rate", value=1.0),
    )
```
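Because every hyperparameter is declared as `Fixed`, the same function can be evaluated directly on a fresh `HyperParameters` object to recover the final parameter values, e.g.:

```python
import keras_tuner as kt

hp = kt.HyperParameters()
final_params = final_hp_params_f(hp)
# final_params == {"units": 8, "n_layers": 2, "activation": "elu",
#                  "learning_rate": 0.008, "weight_decay": 0.0,
#                  "dropout": 0.0, "decay_rate": 1.0}
```

These fixed values can then be fed through the same model-building function to train and evaluate the final model several times, which is how the mean and standard deviation reported below would typically be obtained.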
The final evaluation of the optimal model:
| | 0 |
|---|---|
| units | 8 |
| n_layers | 2 |
| activation | elu |
| learning_rate | 0.008000 |
| weight_decay | 0.000000 |
| dropout | 0.000000 |
| decay_rate | 1.000000 |
| val_accuracy_mean | 0.652917 |
| val_accuracy_std | 0.000085 |
| val_accuracy_min | 0.652851 |
| val_accuracy_max | 0.653065 |
| params | 577 |