
Loan

Running in Google Colab

You can run this experiment in Google Colab by clicking the button below:

Open in Colab

Dataset

The Lending Club loan dataset contains complete loan data for all loans issued from 2007 through 2015 by several banks. Each data point is a 28-dimensional feature vector including the current loan status, latest payment information, and other additional features. The task is to predict loan defaulters given the feature vector. The probability of loan default should be non-decreasing w.r.t. the number of public record bankruptcies and the Debt-to-Income ratio, and non-increasing w.r.t. credit score, length of employment, and annual income. Thus the monotonicity_indicator corresponding to these features is set to 1 for the non-decreasing features and -1 for the non-increasing ones, while all remaining features are left unconstrained (0).

References:

  1. https://www.kaggle.com/wendykan/lending-club-loan-data (Note: Currently, the dataset seems to be withdrawn from kaggle)

# 1 = non-decreasing, -1 = non-increasing, 0 = no monotonicity constraint
monotonicity_indicator = {
    f"feature_{i}": mi for i, mi in enumerate([-1, 1, -1, -1, 1] + [0] * 23)
}
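
For illustration only, the dictionary can be flattened into a per-feature vector ordered by feature index; the helper below is a hypothetical convenience, not part of the original experiment code:

import numpy as np

def monotonicity_vector(indicator, n_features=28):
    # Hypothetical helper: collect the indicators in feature order
    # (1 = non-decreasing, -1 = non-increasing, 0 = unconstrained).
    return np.array([indicator[f"feature_{i}"] for i in range(n_features)])

print(monotonicity_vector(monotonicity_indicator))
# first five entries are -1, 1, -1, -1, 1; the remaining 23 are 0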

These are a few examples from the dataset (each column is one sample):

  0 1 2 3 4
feature_0 0.833333 1.000000 0.666667 0.333333 0.666667
feature_1 0.000000 0.000000 0.000000 0.000000 0.000000
feature_2 0.400000 1.000000 0.800000 0.500000 0.700000
feature_3 0.005263 0.003474 0.005263 0.007158 0.006842
feature_4 0.005185 0.023804 0.029700 0.024434 0.021962
feature_5 0.185751 0.134860 0.236641 0.745547 0.440204
feature_6 0.240654 0.036215 0.271807 0.778037 0.260125
feature_7 0.000000 0.000000 0.000000 1.000000 0.000000
feature_8 0.000000 0.000000 0.000000 0.000000 0.000000
feature_9 0.000000 0.000000 1.000000 0.000000 1.000000
feature_10 0.000000 0.000000 0.000000 0.000000 0.000000
feature_11 0.000000 0.000000 0.000000 0.000000 0.000000
feature_12 0.000000 1.000000 0.000000 0.000000 0.000000
feature_13 1.000000 0.000000 0.000000 1.000000 0.000000
feature_14 0.000000 0.000000 0.000000 0.000000 0.000000
feature_15 1.000000 1.000000 1.000000 0.000000 1.000000
feature_16 0.000000 0.000000 0.000000 1.000000 0.000000
feature_17 0.000000 0.000000 0.000000 0.000000 0.000000
feature_18 0.000000 0.000000 0.000000 0.000000 0.000000
feature_19 0.000000 0.000000 0.000000 0.000000 0.000000
feature_20 0.000000 0.000000 0.000000 0.000000 0.000000
feature_21 0.000000 0.000000 0.000000 0.000000 0.000000
feature_22 0.000000 0.000000 0.000000 0.000000 0.000000
feature_23 0.000000 0.000000 0.000000 0.000000 0.000000
feature_24 0.000000 0.000000 0.000000 0.000000 0.000000
feature_25 0.000000 0.000000 0.000000 0.000000 0.000000
feature_26 0.000000 0.000000 0.000000 0.000000 0.000000
feature_27 0.000000 0.000000 0.000000 0.000000 0.000000
ground_truth 0.000000 0.000000 0.000000 0.000000 0.000000

The choice of the batch size and the maximum number of epochs depends on the dataset size. For this dataset, we use the following values:

batch_size = 256
max_epochs = 20

We use the Type-2 architecture built using the MonoDense layer with the following hyperparameter ranges:

def hp_params_f(hp):
    return dict(
        units=hp.Int("units", min_value=4, max_value=32, step=1),
        n_layers=hp.Int("n_layers", min_value=1, max_value=2),
        activation=hp.Choice("activation", values=["elu"]),
        learning_rate=hp.Float(
            "learning_rate", min_value=1e-4, max_value=1e-2, sampling="log"
        ),
        weight_decay=hp.Float(
            "weight_decay", min_value=3e-2, max_value=0.3, sampling="log"
        ),
        dropout=hp.Float("dropout", min_value=0.0, max_value=0.5, sampling="linear"),
        decay_rate=hp.Float(
            "decay_rate", min_value=0.8, max_value=1.0, sampling="reverse_log"
        ),
    )
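
The function above only declares the search space; Keras Tuner calls it with a HyperParameters object, and the returned dictionary then parameterizes the model. As a minimal, purely illustrative check (not part of the original notebook), the declared ranges can be inspected with a fresh HyperParameters instance:

import keras_tuner as kt

hp = kt.HyperParameters()
default_params = hp_params_f(hp)  # registers the ranges and returns their default values
print(default_params)
# e.g. {'units': 4, 'n_layers': 1, 'activation': 'elu', 'learning_rate': 0.0001, ...}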

The following fixed parameters are used to build the Type-2 architecture for this dataset:

  • final_activation is used to build the final layer for a regression problem (set to None) or a classification problem ("sigmoid"),

  • loss is used for training a regression ("mse") or a classification ("binary_crossentropy") problem, and

  • metrics denotes the metric used to compare with previously published results: "accuracy" for classification and "mse" or "rmse" for regression.

The parameters objective and direction are used by the tuner such that objective=f"val_{metrics}" and direction is either "min" or "max".

The parameter max_trials denotes the number of trials performed by the tuner, while patience is the number of epochs a trial may perform worse than its best epoch before it is stopped early. The parameter executions_per_trial denotes the number of training runs used to compute the result of a trial; it should be set to a value greater than 1 for small datasets with high variance in results. A sketch of how these settings could be wired into a tuner search is given after the parameter values below.

final_activation = None
loss = "binary_crossentropy"
metrics = "accuracy"
objective = "val_accuracy"
direction = "max"
max_trials = 50
executions_per_trial = 1
patience = 5
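
Putting these pieces together, the sketch below shows one way the parameters above could drive a Keras Tuner random search. It is illustrative only: plain Dense layers stand in for the MonoDense-based Type-2 architecture, the weight_decay hyperparameter is left unwired to stay on core Keras APIs, and build_model, decay_steps, and the X_train/y_train/X_val/y_val splits are assumed names rather than code from the original experiment:

import tensorflow as tf
import keras_tuner as kt

def build_model(hp):
    # Sample a configuration from the ranges declared in hp_params_f.
    params = hp_params_f(hp)

    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(28,)))
    for _ in range(params["n_layers"]):
        # Plain Dense layers stand in for the MonoDense-based Type-2 blocks.
        model.add(
            tf.keras.layers.Dense(params["units"], activation=params["activation"])
        )
        if params["dropout"] > 0:
            model.add(tf.keras.layers.Dropout(params["dropout"]))
    model.add(tf.keras.layers.Dense(1, activation=final_activation))

    # Exponential learning-rate decay driven by the decay_rate hyperparameter;
    # decay_steps is an assumed value, not taken from the original experiment.
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=params["learning_rate"],
        decay_steps=1_000,
        decay_rate=params["decay_rate"],
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
        loss=loss,
        metrics=[metrics],
    )
    return model

tuner = kt.RandomSearch(
    build_model,
    objective=kt.Objective(objective, direction=direction),
    max_trials=max_trials,
    executions_per_trial=executions_per_trial,
    overwrite=True,
)

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor=objective, patience=patience, restore_best_weights=True
)

# X_train, y_train, X_val, y_val are assumed to be the preprocessed splits.
tuner.search(
    X_train,
    y_train,
    validation_data=(X_val, y_val),
    batch_size=batch_size,
    epochs=max_epochs,
    callbacks=[early_stopping],
)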

The following table describes the best models and their hyperparameters found by the tuner:

The optimal model

These are the best hyperparameters found by previous runs of the tuner:

def final_hp_params_f(hp):
    return dict(
        units=hp.Fixed("units", value=8),
        n_layers=hp.Fixed("n_layers", 2),
        activation=hp.Fixed("activation", value="elu"),
        learning_rate=hp.Fixed("learning_rate", value=0.008),
        weight_decay=hp.Fixed("weight_decay", value=0.0),
        dropout=hp.Fixed("dropout", value=0.0),
        decay_rate=hp.Fixed("decay_rate", value=1.0),
    )
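
As a usage note, the fixed values above can be fed to the same model-building sketch through a pre-filled HyperParameters object (Keras Tuner returns the already-registered value when a name is declared again). The retraining below is an assumed illustration, not the exact procedure that produced the evaluation table that follows:

import keras_tuner as kt

hp = kt.HyperParameters()
params = final_hp_params_f(hp)  # registers the fixed values on hp and returns them
print(params)

# Reuse the build_model sketch from above; the fixed values take precedence.
final_model = build_model(hp)
final_model.fit(
    X_train,                     # assumed training split, as before
    y_train,
    validation_data=(X_val, y_val),
    batch_size=batch_size,
    epochs=max_epochs,
)
print(final_model.evaluate(X_val, y_val, return_dict=True))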

The final evaluation of the optimal model:

  0
units 8
n_layers 2
activation elu
learning_rate 0.008000
weight_decay 0.000000
dropout 0.000000
decay_rate 1.000000
val_accuracy_mean 0.652917
val_accuracy_std 0.000085
val_accuracy_min 0.652851
val_accuracy_max 0.653065
params 577