Four Years Remaining

When the Best is not the Best

Posted by Konstantin 04.01.2016

Collecting large amounts of data and then using it to "teach" computers to automatically recognize patterns is pretty much standard practice nowadays. It seems that, given enough data and the right methods, computers can get quite precise at detecting or predicting nearly anything, whether it is face recognition, fraud detection or movie recommendations.

Whenever a new classification system is created, it is taken for granted that the system should be as precise as possible. Of course, classifiers that never make mistakes are rare, but where possible we should strive to have them make as few mistakes as possible, right? Here is a fun example where things are not so obvious.

Consider a bank which, as is normal for a bank, makes money by giving loans to its customers. Of course, there is always a risk that a customer will default (i.e. not repay the loan). To account for that, the bank has a risk scoring system which, for a given loan application, assesses the probability that the corresponding customer will default. This probability is then used to compute the interest rate offered to the customer. To simplify a bit, the interest rate on a loan might be computed as the sum of the customer's predicted default probability and a fixed profit margin. For example, if a customer is expected to default with probability 10% and the bank wants a 5% profit on its loans on average, the loan might be issued at slightly above 15% interest. This would cover both the expected losses due to non-repayment and the profit margin.
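The pricing rule above can be sketched in a few lines. The loan model here is an assumption, not something the post specifies: each loan is one unit of principal for one period, and a defaulting borrower repays nothing. `additive_rate` is the simplified sum from the text; the hypothetical `break_even_rate` instead solves (1 - p)(1 + r) = 1 + m for r, which shows why the rate that actually covers both losses and margin comes out above the plain sum.

```python
def additive_rate(default_prob: float, margin: float) -> float:
    """Simplified rule from the text: interest = predicted default risk + margin."""
    return default_prob + margin


def break_even_rate(default_prob: float, margin: float) -> float:
    """Rate r such that the expected repayment (1 - p) * (1 + r) equals 1 + margin.

    Assumes a one-period loan of one unit where a defaulter repays nothing.
    """
    if not 0.0 <= default_prob < 1.0:
        raise ValueError("default probability must be in [0, 1)")
    return (1.0 + margin) / (1.0 - default_prob) - 1.0


p, m = 0.10, 0.05
print(f"additive:   {additive_rate(p, m):.2%}")    # 15.00%
print(f"break-even: {break_even_rate(p, m):.2%}")  # 16.67%
```

The additive rule is easier to explain to customers and regulators, which is presumably why the post uses it as its simplification.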

Now, suppose the bank managed to develop a perfect scoring algorithm. That is, each application gets a rating of either 0% or 100% risk. Suppose as well that within a month the bank processes 1000 applications, half of which are predicted to be perfectly good and half perfectly bad. This means that 500 loans get issued at a 5% interest rate, while the other 500 applications are rejected.

Think what would happen if the system did not do such a great job and confused 50 of the bad applications with good ones. In this case 450 applications would be classified as "100%" risk, while 550 would be assigned a risk score of "9.1%" (50 out of 550, since we still require the system to provide valid risk probability estimates). The bank would then issue a total of 550 loans at 15%. Of course, 50 of those would not get repaid, yet this loss would be covered by the increased interest paid by the honest borrowers: counting each loan as one unit, 500 loans at 15% earn 75, of which 50 cover the defaults, leaving 25, the same as 500 loans at 5%. The financial returns are thus exactly the same as with the perfect classifier. However, the bank now has more clients. More contracts were signed, and more contract fees were received.
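The 15% figure can be reproduced with a short sketch. It assumes, as the post leaves implicit, that every loan is one unit of principal and that a defaulter repays nothing; `portfolio_profit` and `matching_rate` are hypothetical helpers, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def portfolio_profit(n_repaid: int, n_defaulted: int, rate: Fraction) -> Fraction:
    """Net profit on one-unit loans: repayers return principal plus interest,
    defaulters are assumed to return nothing."""
    return n_repaid * (1 + rate) - (n_repaid + n_defaulted)


def matching_rate(target_profit: Fraction, n_repaid: int, n_defaulted: int) -> Fraction:
    """Interest rate at which the repayers' interest covers the defaulters'
    lost principal and still yields target_profit."""
    return (target_profit + n_defaulted) / n_repaid


perfect_profit = portfolio_profit(500, 0, Fraction(5, 100))  # 500 loans at 5%, no defaults
rate = matching_rate(perfect_profit, 500, 50)                # 550 loans, 50 of them bad
print(perfect_profit, rate)                                  # 25 3/20
print(portfolio_profit(500, 50, rate))                       # 25
```

The matching rate is exactly 3/20, i.e. 15%, so both portfolios earn the same 25 units.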

True, the clients might be a bit less happy about getting a higher interest rate, but assuming they were ready to pay it anyway, the bank does not care. In fact, the bank would be more than happy to segment its customers by offering higher interest rates to low-risk customers anyway. It cannot do so openly, though: established practices usually constrain banks to use "reasonable" scorecards and offer better interest rates to low-risk customers.

Hence, at least in this particular example, a "worse" classifier is in fact better for business. Perfect precision is not really the ultimate goal. Instead, the system is much more useful when it provides a relevant and "smooth" distribution of predicted risk scores, while making sure the scores themselves are reasonably accurate estimates of the probability of default.
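One common way to check that scores are decent probability estimates is a calibration table: group applications by predicted risk and compare the average prediction with the observed default rate in each group. The sketch below is an illustration, not the author's method; `calibration_table` is a hypothetical helper using equal-width bins, and it assumes outcomes are known for every application (for instance, on labelled historical data), which in practice is not the case for rejected applications.

```python
def calibration_table(scores, defaults, n_bins=10):
    """Compare mean predicted default probability with the observed default
    rate inside equal-width score bins. Returns (mean_pred, observed, count)."""
    bins = [[] for _ in range(n_bins)]
    for score, defaulted in zip(scores, defaults):
        idx = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 goes to the last bin
        bins[idx].append((score, defaulted))
    table = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        mean_pred = sum(s for s, _ in members) / len(members)
        observed = sum(d for _, d in members) / len(members)
        table.append((mean_pred, observed, len(members)))
    return table


# The imperfect classifier from the post: 550 applications scored 1/11
# (50 of them bad) and 450 bad applications scored 1.0.
scores = [1 / 11] * 550 + [1.0] * 450
defaults = [1] * 50 + [0] * 500 + [1] * 450
for mean_pred, observed, count in calibration_table(scores, defaults):
    print(f"predicted {mean_pred:.3f}  observed {observed:.3f}  n={count}")
# predicted 0.091  observed 0.091  n=550
# predicted 1.000  observed 1.000  n=450
```

Both groups are calibrated even though the first one mixes good and bad applications, which is exactly the kind of "imperfect but honest" scorecard the example relies on.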

Posted by Konstantin @ 9:52 pm

Tags: Data analysis, Economics, Machine learning, Paradox, Probability theory, Project, Statistics
7 Comments
1. AndoS on 05.01.2016 at 00:16
  
  I'm not sure I agree with the example.
  If the market reality is that the good customers are willing to pay the 9.1% rate, it would be much more beneficial for the bank to use the perfect classifier and charge the good customers 9.1%, all the while incurring no losses from the bad customers.
  
  Furthermore, in a competitive setting, a bank with the perfect classifier can trivially outcompete the bank with the poor classifier, since it can offer a significantly lower price all the while getting higher margins.
  1. Konstantin on 05.01.2016 at 00:41
    
    The issue here is that banks quite often (as far as I understand it) cannot segment customers and price their loans arbitrarily using supply-demand logic - otherwise they might tend to offer higher interest rates to some of the more reliable customers. Instead, if a customer is known to have a good credit rating, he is expected to receive lower interest rates. No discussion.
    
    In addition, the risk scoring and interest rate computation may be modularized - one team oversees the first step (some companies simply buy this service from an external credit agency), and another works on the second one (I guess some companies use a standardized computation here). Moreover, the interest computation logic must sometimes be transparent and "reasonable" at least to some extent for particular legal reasons.
    
    Note that if you have a perfect classifier, you can of course always make it worse by adding rules of the form "flip a coin and predict 10% instead of 0% if the coin comes up heads so many times". However, in the circumstances described above you may have trouble adding such logic.
    
    Instead, you can essentially hide the "coin flipping" (along with the resulting customer segmentation spread) in the imperfections of the risk scoring method.
    
    To put it differently, if a bank has to choose between two logistic regression models (these being the most typical and intuitively "understandable" backend for financial "scorecards"), where the first one has better accuracy/precision/ROC, while the second has a "more useful" spread of predicted probabilities, potentially resulting in more clients, it is not immediately obvious that the first model should be preferred.
    
    Of course, the example may still fail completely under certain assumptions (such as perfect market competition that you mention). However, those assumptions do not always hold in reality either.
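To make the comparison concrete, the two scorecards from the post can be compared by ROC AUC and by the number of loans they approve. This is a sketch under assumptions: `roc_auc` and `loans_issued` are hypothetical helpers, AUC is computed as the probability that a randomly chosen defaulter is scored riskier than a randomly chosen non-defaulter (ties count one half), and outcomes are assumed known for all 1000 applications.

```python
def roc_auc(scores, defaults):
    """AUC as P(score of a random defaulter > score of a random non-defaulter),
    counting ties as 1/2 (the Mann-Whitney formulation)."""
    bad = [s for s, d in zip(scores, defaults) if d]
    good = [s for s, d in zip(scores, defaults) if not d]
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))


def loans_issued(scores):
    """Applications scored below 100% risk receive a loan."""
    return sum(1 for s in scores if s < 1.0)


# 500 good and 500 bad applications, as in the post.
defaults = [0] * 500 + [1] * 500
perfect = [0.0] * 500 + [1.0] * 500
# Imperfect model: all good plus 50 bad applications scored 1/11, the rest 1.0.
imperfect = [1 / 11] * 500 + [1 / 11] * 50 + [1.0] * 450

for name, scores in [("perfect", perfect), ("imperfect", imperfect)]:
    print(name, roc_auc(scores, defaults), loans_issued(scores))
# perfect 1.0 500
# imperfect 0.95 550
```

By AUC the first scorecard wins, yet the second approves 50 more loans at the same total profit.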
2. Konstantin on 05.01.2016 at 01:19
  
  Note that perhaps a more direct example would be an insurance agency.
  
  An insurance agency with a perfect risk predictor, in order to be able to insure all its clients and still offer reasonable prices to everyone, would essentially have to artificially "undo" the work done by the classifier and smooth out the hypothetically perfect predictions (they would have to demand nonzero premiums from people who are known to have zero risk, in order to "sponsor" reasonable rates for riskier clients).
  
  Hence a "perfect" classifier would not be the best here. What is instead needed is something like a good probability estimator with a particular output profile (the latter dictated by the market somehow).
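A minimal sketch of such smoothing, assuming (this is not from the post) a simple linear shrinkage of each predicted risk toward the portfolio average: each premium becomes (1 - alpha) * p + alpha * mean(p), where the hypothetical parameter `alpha` controls how much the low-risk clients "sponsor" the riskier ones. Because the shrinkage is toward the mean, the total premium income still equals the total expected claims (ignoring any profit margin).

```python
def smooth_premiums(risks, alpha):
    """Shrink each predicted risk toward the portfolio mean.

    alpha = 0 keeps the original risk-based premiums, alpha = 1 charges
    everyone the average. The sum of premiums is unchanged for any alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    mean_risk = sum(risks) / len(risks)
    return [(1.0 - alpha) * r + alpha * mean_risk for r in risks]


# Hypothetical perfect predictions: 8 clients with zero risk, 2 certain to claim.
risks = [0.0] * 8 + [1.0] * 2
premiums = smooth_premiums(risks, alpha=0.5)
print(premiums)
print(sum(risks), sum(premiums))
```

With alpha = 0.5 the zero-risk clients pay about 0.1 each and the high-risk ones about 0.6 instead of 1.0, while the total stays at 2.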
3. AndoS on 05.01.2016 at 11:05
  
  Assuming this sort of smoothing is needed (here and in the banking example), it is unclear why it would be better to have the smoother built into the predictor instead of having it as a separate layer. It seems that having perfect predictions allows for a better smoother and/or better business decisions for optimizing the company's bottom line, e.g.:
  * Even if you aren't allowed to turn away bad customers, you might be able to offer them lousier rates, driving them to competition, while offering better rates to good customers. If this is not allowed by law, you still have:
  * Upselling to known good customers and downselling to bad ones (which you can't do with an imperfect predictor)
  * More accurate budget planning (really important for smaller companies)
  etc.
  1. Konstantin on 05.01.2016 at 15:00
    
    It can be better to have the smoother built into the predictor for two reasons:
    
    - Transparency regulations. There may be situations where you literally cannot upsell/downsell explicitly due to external legal or internal regulations and must tie your interest/insurance premium computation "in a reasonable manner" to a risk score.
    
    - Practical considerations. It may be hard or even impossible to find an optimal classifier. Even if you find it, it may be much more complex internally (e.g. an ensemble or a nonlinear model instead of a simple linear one). When you add the smoothing logic on top, things end up even worse. At the same time, a simple, user-friendly linear model may immediately provide you with the same target probability distribution.
    
    All in all the moral of the post is that there may be situations where you need a certain output distribution, and this distribution is not just two peaks at 0 and 100%. Of course, you can always find alternative contexts, such as "budget planning" or "publishing research papers", where better predictors may still be fruitfully exploited, but here those are not assumed to be the prime goals.
    
    In this example, if you have to choose between a model that is more precise according to most of your favourite indicators (perhaps still not perfect), and a model that has a "nicer" output distribution, you might not end up picking the first one.
    
    Of course, ideally, you might want to somehow combine the two models, tuning the target distribution to your liking. However, this does not seem to be standard practice in machine learning at all. That's one obvious direction for some possibly mathematically interesting yet not too complicated research, I believe.
4. Sergey on 30.04.2019 at 06:54
  
  In real life there are, say, 30 banks willing to give you a loan at 6% APR. One of the banks is not going to experiment and offer you 9%, because it's much more important for them to give you a loan in the first place. Not only does it bring continuous income and sometimes closing fees, but it also opens the door for future business, as tomorrow you may want something else. Furthermore, offering a ridiculous rate can be a PR risk if you take to posting about it on social media.
  
  In the U.S. your model would not pass the scrutiny of regulators if it offers higher rates to low-risk customers. So while this example is amusing, it is far removed from the actual reality of automated underwriting. In real life, you may have to make your model slightly weaker in order for it to stay compliant; it does happen. So in contrast to what you are asserting, a better model is, in fact, better, and will bring you a lot more money in practice. Which is why a bank would use such a model in the first place.
  1. Konstantin on 04.05.2019 at 23:12
    
    You reversed the numbers in your example, that's why it seems nonsensical to you. Here's what the correct example should look like:
    
    Suppose that no banks in the region in general offer loans cheaper than 6% to the majority of customers.
    
    Suppose that John is one of this majority and can expect a 6% APR loan. Knowing the market, John is happy to take a 6% loan.
    
    Finally, suppose that bank X has somehow more data about John and a super-precise risk model, according to which John is super-trustworthy and even a 2% APR loan to him is still expected to be profitable.
    
    Does it make sense for bank X to give John a 2% loan when the bank knows it could just as well proudly issue a 5% loan and still make John happy with the "best rate"?
    
    Which model, in this case, would be better for the bank? The mathematically precise one, offering John a 2% loan, or a ROC-wise worse one, which would "use the extra income" from John to finance a couple of riskier customers (who would then bring extra income in closing fees or add-on services)?
    
    The scrutiny of regulators cannot really prevent you from including ad-hoc criteria (such as "nicer-looking score spread") when you are training your scoring model.