Collecting large amounts of data and then using it to "teach" computers to automatically recognize patterns is pretty much standard practice nowadays. It seems that, given enough data and the right methods, computers can get quite precise at detecting or predicting nearly anything, whether it is face recognition, fraud detection or movie recommendations.

Whenever a new classification system is created, it is taken for granted that the system should be as precise as possible. Of course, classifiers that never make mistakes are rare, but if it is possible, we should strive to have them make as few mistakes as possible, right? Here is a fun example where things are not as obvious.

Consider a bank, which, as is normal for a bank, makes money by giving loans to its customers. Of course, there is always a risk that a customer will default (i.e. not repay the loan). To account for that, the bank has a risk scoring system which, for a given loan application, assesses the probability that the corresponding customer may default. This probability is later used to compute the interest rate offered to the customer. To simplify a bit, the interest rate on a loan might be computed as the sum of the customer's predicted default probability and a fixed profit margin. For example, if a customer is expected to default with probability 10% and the bank wants 5% profit on its loans on average, the loan might be issued at slightly above 15% interest. This would cover both the expected losses due to non-repayments and the profit margin.
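The "slightly above 15%" figure can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: a single-period loan of one unit of money, and a defaulting customer who repays nothing at all (neither principal nor interest). The function names are made up for this example.

```python
def naive_rate(default_prob, margin):
    """Simplified rule from the text: predicted risk plus a fixed margin."""
    return default_prob + margin


def break_even_rate(default_prob, margin):
    """Rate r at which the expected repayment (1 - p) * (1 + r) equals 1 + margin.

    Assumption: a defaulting customer repays nothing at all.
    """
    return (margin + default_prob) / (1.0 - default_prob)


print(f"{naive_rate(0.10, 0.05):.4f}")       # 0.1500
print(f"{break_even_rate(0.10, 0.05):.4f}")  # 0.1667
```

Under these assumptions the plain sum gives 15%, while the rate that actually yields a 5% expected return is a little higher, which is one way to read "slightly above".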

Now, suppose the bank managed to develop a perfect scoring algorithm. That is, each application gets a rating of either having 0% or 100% risk. Suppose as well that within a month the bank processes 1000 applications, half of which are predicted to be perfectly good, and half - perfectly bad. This means that 500 loans get issued with a 5% interest rate, while 500 do not get issued at all.

Think what would happen if the system did not do such a great job and confused 50 of the bad applications with the good ones. In this case 450 applications would be classified as "100%" risk, while 550 would be assigned a risk score of "9.1%" (we still require the system to provide valid risk probability estimates). The bank would then issue a total of 550 loans at 15%. Of course, 50 of those would not get repaid, yet this loss would be covered by the increased interest paid by the honest borrowers. The financial returns are thus exactly the same as with the perfect classifier. However, the bank now has *more clients*. More applications were signed, and more contract fees were received.
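The claim that the returns are the same can be verified directly. This is a minimal sketch, assuming every loan is for one unit of money, lasts one period, and a defaulted loan loses the whole principal and pays no interest. The helper `bank_profit` is a hypothetical name, and contract fees are deliberately left out.

```python
def bank_profit(n_repaid, n_defaulted, rate):
    """Net profit on unit-sized loans: interest earned minus principal lost."""
    return n_repaid * rate - n_defaulted * 1.0


# Perfect classifier: 500 good loans at 5%, nothing lost.
perfect = bank_profit(n_repaid=500, n_defaulted=0, rate=0.05)

# Imperfect classifier: 550 loans at 15%, of which 50 default.
imperfect = bank_profit(n_repaid=500, n_defaulted=50, rate=0.15)

# Calibrated risk score for the pooled group of 550 applications.
pooled_risk = 50 / 550

print(round(perfect, 6), round(imperfect, 6))  # 25.0 25.0
print(f"{pooled_risk:.3f}")                    # 0.091
```

Both scenarios earn 25 units from interest, so any difference between them comes from what the sketch ignores, such as the fees on the 50 extra contracts.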

True, the clients might be a bit less happy about getting a higher interest rate, but assuming they were ready to pay it anyway, the bank does not care. In fact, the bank would be more than happy to segment its customers by offering higher interest rates to low-risk customers anyway. It cannot do it openly, though. The established practices usually constrain banks to make use of "reasonable" scorecards and offer better interest rates to low-risk customers.

Hence, at least in this particular example, a "worse" classifier is in fact better for business. Perfect precision is not really the ultimately desired feature. Instead, the system is much more useful when it provides a relevant and "smooth" distribution of predicted risk scores, making sure the scores themselves are decently precise estimates for the probability of a default.

I'm not sure I agree with the example.

If the market reality is that the good customers are willing to pay the 9.1% rate, it would be much more beneficial for the bank to use the perfect classifier, and charge 9.1% from the good customers, all the while incurring no losses from the bad customers.

Furthermore, in a competitive setting, a bank with the perfect classifier can trivially outcompete the bank with the poor classifier, since it can offer a significantly lower price all the while getting higher margins.

The issue here is that banks quite often (as far as I understand it) cannot segment customers and price their loans arbitrarily using supply-demand logic - otherwise they might tend to offer higher interest rates to some of the more reliable customers. Instead, if a customer is known to have a good credit rating, he is expected to receive lower interest rates. No discussion.

In addition, the risk scoring and interest rate computation may be modularized - one team oversees the first step (some companies simply buy this service from an external credit agency), and another works on the second one (I guess some companies use a standardized computation here). Moreover, the interest computation logic must sometimes be transparent and "reasonable" at least to some extent for particular legal reasons.

Note that if you have a perfect classifier, you can of course always make it worse by adding rules of the form "flip a coin a few times and predict 10% instead of 0% if it comes up heads every time". However, in the circumstances described above you may have problems adding such logic.

Instead, you can essentially hide the "coin flipping" (along with the resulting customer segmentation spread) in the imperfections of the risk scoring method.
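To make the "coin flipping" concrete, here is a minimal sketch of what an explicit version of it might look like. It is only an illustration, not something a real scorecard would contain: the function `degrade_perfect_scores` and its parameters are hypothetical, and it assumes the perfect classifier's output is available as a list of booleans.

```python
import random


def degrade_perfect_scores(is_bad, n_confused, seed=0):
    """Hide n_confused randomly chosen bad applications among the good ones.

    is_bad: booleans from the hypothetical perfect classifier.
    Returns risk scores: 1.0 for bad applications that are still flagged,
    and the pooled default rate for everyone else.
    """
    rng = random.Random(seed)
    bad_idx = [i for i, bad in enumerate(is_bad) if bad]
    hidden = set(rng.sample(bad_idx, n_confused))  # the "coin flips"
    n_pool = (len(is_bad) - len(bad_idx)) + n_confused
    # Pooled score equals the true default rate in the pool, so it stays calibrated.
    pooled_risk = n_confused / n_pool if n_pool else 0.0
    return [1.0 if bad and i not in hidden else pooled_risk
            for i, bad in enumerate(is_bad)]


is_bad = [False] * 500 + [True] * 500
scores = degrade_perfect_scores(is_bad, n_confused=50)
print(sum(s < 1.0 for s in scores))  # 550
print(f"{min(scores):.3f}")          # 0.091
```

An imperfect scoring model produces the same kind of pooled, still-calibrated score without any such rule being written down anywhere.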

To put it differently, if a bank has to choose between two logistic regression models (those being the most typical and intuitively "understandable" backend to financial "scorecards"), where the first one has better accuracy/precision/ROC, while the second has a "more useful" spread of predicted probabilities, resulting in potentially more clients, it is not immediately obvious whether the first model should be preferred.

Of course, the example may still fail completely under certain assumptions (such as perfect market competition that you mention). However, those assumptions do not always hold in reality either.

Note that perhaps a more direct example would be an insurance agency.

An insurance agency with a perfect risk predictor, in order to be able to insure all its clients and still offer reasonable prices to everyone, would essentially have to artificially "undo" the work done by the classifier and smooth out the hypothetically perfect predictions (they would have to demand nonzero premiums from people who are known to have zero risk, in order to "sponsor" reasonable rates for riskier clients).

Hence a "perfect" classifier would not be the best here. What is instead needed is something like a good probability estimator with a particular output profile (the latter dictated by the market somehow).
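The "undoing" in the insurance case amounts to pooling. Below is a minimal sketch, assuming every claim costs the same fixed payout and every client is charged one flat premium. The portfolio numbers and the function `flat_premium` are invented for illustration.

```python
def flat_premium(risks, payout, margin=0.0):
    """Single premium charged to every client that covers expected claims.

    risks: per-client claim probabilities (0.0 or 1.0 for a perfect predictor).
    payout: amount paid per claim (assumed identical for all clients).
    """
    expected_claims = sum(risks) * payout
    return expected_claims * (1.0 + margin) / len(risks)


# Hypothetical portfolio: 95 clients who will never claim, 5 who certainly will.
risks = [0.0] * 95 + [1.0] * 5
print(flat_premium(risks, payout=1000.0))  # 50.0
```

With perfect predictions the individually "fair" premiums would be 0 for the first group and the full 1000 for the second, so the flat premium of 50 is exactly the kind of smoothing described above.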

Assuming this sort of smoothing is needed (here and in the banking example), it is unclear why it would be better to have the smoother built into the predictor, instead of having it as a separate layer. It seems that having perfect predictions allows for a more optimal smoother and/or better business decisions for optimizing the company's bottom line, e.g.:

* Even if you aren't allowed to turn away bad customers, you might be able to offer them lousier rates, driving them to competition, while offering better rates to good customers. If this is not allowed by law, you still have:

* Upselling to known good customers and downselling to bad ones (which you can't do with an imperfect predictor)

* More accurate budget planning (really important for smaller companies)

etc

It can be better to have the smoother built into the predictor for two reasons:

- Transparency regulations. There may be situations where you literally cannot upsell/downsell explicitly due to external legal or internal regulations and must tie your interest/insurance premium computation "in a reasonable manner" to a risk score.

- Practical considerations. It may be hard or even impossible to find an optimal classifier. Even if you find it, it may be much more complex internally (e.g. an ensemble or a nonlinear model instead of a simple linear one). When you add the smoothing logic on top, things end up even worse. At the same time a simple user-friendly linear model may immediately provide you with the same target probability distribution.

All in all, the moral of the post is that there may be situations where you need a certain output distribution, and this distribution is not just two peaks at 0 and 100%. Of course, you can always find alternative contexts, such as "budget planning" or "publishing research papers", where better predictors may still be fruitfully exploited, but here those are not assumed to be the prime goals.

In this example, if you have to choose between a model that is more precise according to most of your favourite indicators (perhaps still not perfect), and a model that has a "nicer" output distribution, you might not end up picking the first one.

Of course, ideally, you might want to somehow combine the two models, tuning the target distribution to your liking. However, this does not seem to be standard practice in machine learning at all. That's one obvious direction for some possibly mathematically interesting yet not too complicated research, I believe.