Friday, June 3, 2016

Binary Classifier Error Testing

Using the Keras package, I wrote some code to make a binary classifier neural network.  Given the inputs of "age" and "LTI" (the Loan To Income ratio), the network should output a '1' if the individual is predicted to default on the loan, and a '0' if they are not.
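Here is a minimal Keras sketch of the kind of network this describes; the column names ('age', 'LTI', 'default10yr'), layer sizes, optimizer, epoch count, and train/test split below are illustrative rather than the exact settings in my script:

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense

# load the dataset and pull out the two inputs and the target
data = pd.read_csv('creditset.csv')
X = data[['age', 'LTI']].values        # inputs: age and loan-to-income ratio
Y = data['default10yr'].values         # target: 1 = defaulted, 0 = did not

# hold out 1200 points for testing (matching the test-set size mentioned below)
X_train, X_test = X[:-1200], X[-1200:]
Y_train, Y_test = Y[:-1200], Y[-1200:]

# two hidden layers of 64 nodes; swap 'tanh' for 'relu' to compare activations
model = Sequential()
model.add(Dense(64, input_dim=2, activation='tanh'))
model.add(Dense(64, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))   # output layer squashes to [0, 1]

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=50, batch_size=32, verbose=0)

# round the network's output to get hard 0/1 predictions
y1 = np.round(model.predict(X_test)).flatten()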

I tested several different activation functions, and here is what I found:

Wrong Approach

First, I defined the number of errors to be:

nb_errors = abs(sum(y1 - Y_test)) ,

where y1 is the output of the neural network, and Y_test is the actual results from the dataset.

This is the wrong approach because it lets false positives make my results look better than they really are.  If the network outputs a '1' where there should be a '0', that error effectively cancels out one of the false negatives, where the network outputs a '0' but there should be a '1'.  This resulted in some interesting behavior.
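Before getting to those results, here is a toy illustration (made-up numbers, not from my actual runs) of how that cancellation happens:

import numpy as np

Y_test = np.array([1, 0, 0, 1])   # true labels
y1     = np.array([0, 1, 0, 1])   # one false negative and one false positive

nb_errors = abs(sum(y1 - Y_test))
print(nb_errors)                  # prints 0, even though two of the four answers are wrong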



With only one hidden layer, both networks performed much worse than with two, even when the number of nodes was the same.  The number of errors saturates when there are few nodes with the 'relu' activation function, because the network is outputting an answer of all zeros.  In these instances it hasn't really learned anything yet, but that answer still gets ~85% of the outputs right.

More interesting is the performance of 'tanh'.  It actually does better with fewer nodes, so long as there are two hidden layers.  This is because 'tanh' seems to give more false positives than 'relu' for this dataset, and under this flawed error count the false positives cancel out false negatives and make the total look smaller.
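That claim is easy to check directly.  Here is a minimal sketch, reusing the model, X_test, and Y_test from the first snippet, that counts the false positives and false negatives separately:

import numpy as np

# hard 0/1 predictions from the trained network
y1 = np.round(model.predict(X_test)).flatten()

false_positives = int(np.sum((y1 == 1) & (Y_test == 0)))   # predicted default, but did not
false_negatives = int(np.sum((y1 == 0) & (Y_test == 1)))   # predicted no default, but did

print('false positives:', false_positives)
print('false negatives:', false_negatives)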

Right Approach

The number of errors should be calculated by:

nb_errors = sum(abs(y1 - Y_test))

This way, the false positives and false negatives are treated the same, so you can get the total number of errors just by adding them up.  Here are the results from this approach:



The 'relu' and 'tanh' activation functions performed similarly for high numbers of nodes, but 'tanh' performed better overall, especially when there were fewer nodes.  More hidden layers add computation time, but they make a big difference in accuracy.  Of the 1200 points tested, the network gets ~50 wrong answers with two layers of 64 nodes, which is not bad.
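For reference, here is how that error count is computed in a minimal sketch, again reusing the model, X_test, and Y_test from the first snippet:

import numpy as np

# round the network's output to 0/1 before comparing against the true labels
y1 = np.round(model.predict(X_test)).flatten()
nb_errors = int(sum(abs(y1 - Y_test)))

print('%d errors out of %d test points' % (nb_errors, len(Y_test)))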

I also tested the 'sigmoid' activation function, and it performed poorly.  It was not able to break away from the all-zeros answer until the very end of the last simulation, and even then its error count stayed high.  Thus, 'sigmoid' does not appear to be a good activation function to use for the hidden layers.

Conclusion

'tanh' performs better than 'relu' for this binary classification problem, but it also produces false positives more frequently.  That trade-off is fine for the lending industry, which may be hesitant to give out risky loans, but if false positives are worse than false negatives, then the 'relu' activation function may be the better choice, so long as enough nodes are used to get a good answer.
---------------------------
Download the data set: creditset.csv
Download my Python code
