These are the reasons why the most sophisticated Machine Learning techniques may not perform well.

Muhammad Maruf Sazed
4 min read · Aug 29, 2022


We know that machine learning can be awesome. The field has attracted a great deal of attention in recent years, and as a result, innovations are appearing rapidly. In practical settings, however, machine learning is not really an exact science, and there are some issues we should be mindful of. In this article, I will briefly cover some of these issues, which are discussed at length in the paper¹ by David Hand. They are as follows.

1. Marginal improvements

2. Design sample selection

3. Problem uncertainty

4. Interpreting empirical comparisons

Marginal improvements

Sometimes we start with a simple classifier (e.g., LDA) and then try a much more complex one (e.g., SVM) on the same dataset, hoping that the complex classifier will produce better accuracy. But if the simple classifier already produces an error rate close to the Bayes error rate for that dataset, then the complex classifier cannot produce a result that is significantly better, as the figure below illustrates. In that case, all the effort spent implementing and fine-tuning the complex classifier may be wasted, because the improvement over the simple classifier might not be worthwhile.

Suppose our simple classifier achieves 75% accuracy on the given dataset, i.e., a 25% error rate. If the minimum possible error rate (the Bayes error rate) for this dataset is 15%, then no matter how sophisticated our other classifier is, it can improve on the simple classifier by at most 10 percentage points (from 75% to 85% accuracy).
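To make this concrete, here is a minimal sketch, assuming scikit-learn and synthetic Gaussian data (the dataset, sample sizes, and model settings are my own illustrative choices, not from Hand's paper). The overlap between the two classes fixes a non-zero Bayes error rate, which caps what any classifier can achieve.

```python
# A minimal sketch, assuming scikit-learn and synthetic Gaussian data
# (these choices are mine, not from Hand's paper). Two overlapping classes
# fix a non-zero Bayes error rate, which caps what any classifier can do.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two 2-D Gaussian classes whose means are 1.5 apart in each dimension.
n = 5000
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(n, 2)),
    rng.normal(loc=1.5, scale=1.0, size=(n, 2)),
])
y = np.array([0] * n + [1] * n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

simple = LinearDiscriminantAnalysis().fit(X_train, y_train)   # simple classifier
flexible = SVC(kernel="rbf").fit(X_train, y_train)            # complex classifier

print("LDA accuracy:", round(simple.score(X_test, y_test), 3))
print("SVM accuracy:", round(flexible.score(X_test, y_test), 3))
```

Because LDA is essentially the Bayes rule for equal-covariance Gaussian classes, the SVM's extra flexibility has almost no room to improve on it in this setup.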

Design sample selection

In machine learning, we train our classifier on the training set and apply it to the test set, assuming that the two sets come from similar distributions. That assumption can easily be violated: if the training set is not a representative sample of the test set, or if there is population drift, the classifier will not be able to perform well on the unseen test set. Both issues are illustrated in the figures below. If our problem is affected by either of them, a sophisticated classifier may not produce superior performance over a simple one.

There is a large gap between the distribution of the training set and that of the test set; the training set is not a representative sample of the test set. No matter how sophisticated a method we use, its performance on the test set will suffer.
Population drift refers to the phenomenon where the distribution of the population changes over time. In this case, we train our classifier at time index 1. However, as the distribution of the test set changes over time, the performance of the classifier declines.
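As a rough illustration of the second figure, the following sketch trains a logistic regression at "time index 1" and then scores it on test sets whose class means have drifted. The drift amounts and the choice of logistic regression are my own assumptions for the sake of the example, not taken from the paper.

```python
# A minimal sketch of population drift, assuming scikit-learn and synthetic
# data (the drift amounts and the logistic regression model are my own
# illustrative choices, not from the paper).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_data(n, shift):
    """Two Gaussian classes whose means move by `shift` as time passes."""
    X = np.vstack([
        rng.normal(loc=0.0 + shift, scale=1.0, size=(n, 2)),
        rng.normal(loc=2.0 + shift, scale=1.0, size=(n, 2)),
    ])
    y = np.array([0] * n + [1] * n)
    return X, y

# Train once on data from "time index 1" (no drift yet).
X_train, y_train = make_data(2000, shift=0.0)
clf = LogisticRegression().fit(X_train, y_train)

# Score on test sets drawn at later time points with increasing drift.
for shift in [0.0, 0.5, 1.0, 1.5]:
    X_test, y_test = make_data(2000, shift=shift)
    print(f"drift={shift:.1f}  accuracy={clf.score(X_test, y_test):.3f}")
```

Accuracy falls as the drift grows, even though nothing about the classifier itself has changed.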

Problem uncertainty

Sometimes our problems are vague: the classes themselves are not well defined. In that case, one might be tempted to change the class labels (and hence the definitions) in a way that produces better accuracy. This is definitely not a good practice! In other cases, the labels in the dataset may simply not be accurate. In these situations, there is no guarantee that a sophisticated method will do better than a simple one.
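The sketch below illustrates the inaccurate-labels case. It flips a fraction of the labels at random and compares a simple and a more flexible classifier; the dataset, the noise levels, and the logistic regression versus gradient boosting pairing are assumptions made just for this example, not from the paper.

```python
# A minimal sketch of inaccurate labels, assuming scikit-learn (the dataset,
# noise levels, and the logistic regression / gradient boosting pairing are
# illustrative assumptions, not from the paper).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=10, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2)

rng = np.random.default_rng(2)

def flip(labels, rate):
    """Flip a `rate` fraction of binary labels at random."""
    labels = labels.copy()
    mask = rng.random(len(labels)) < rate
    labels[mask] = 1 - labels[mask]
    return labels

for noise in [0.0, 0.1, 0.3]:
    y_tr, y_te = flip(y_train, noise), flip(y_test, noise)
    simple = LogisticRegression(max_iter=1000).fit(X_train, y_tr)
    flexible = GradientBoostingClassifier(random_state=2).fit(X_train, y_tr)
    print(f"label noise={noise:.1f}  "
          f"logistic regression={simple.score(X_test, y_te):.3f}  "
          f"gradient boosting={flexible.score(X_test, y_te):.3f}")
```

With enough label noise, the ceiling on measurable accuracy drops for every method, so the apparent advantage of the sophisticated classifier shrinks.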

Interpreting empirical comparisons

Imagine that you are scrolling through your LinkedIn feed or a data science blog and stumble upon a post showing that a cool new algorithm produced far better results than an existing, simpler method. We start talking about it, try to learn the new method as quickly as possible, and think about ways to use it in our own projects, until we come across the next big thing and repeat the cycle. Hand argued that we should keep a simple point in mind: no one can be an expert on every method, and the person who invented a new method is most likely its leading expert. So when that person compares the new method with an older one, they probably have far more expertise in the new method (since they developed it) than in the existing alternatives. As a result, the superior performance of the shiny new method may well be due to the expertise of the practitioner rather than the superiority of the classifier itself.
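One way to see how much the practitioner matters is to compare the same method with and without tuning effort. The sketch below, my own illustration rather than anything from the paper, runs an SVM with default settings and then with a small grid search; the scikit-learn breast cancer dataset and the parameter grid are arbitrary choices.

```python
# A minimal sketch, assuming scikit-learn's breast cancer dataset and an
# arbitrary parameter grid (illustrative choices, not from the paper):
# the same SVM looks different depending on how much tuning it receives.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=3)

# "Non-expert" use: default hyperparameters, no tuning effort.
untuned = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)

# "Expert" use: the same model plus a small grid search over its parameters.
tuned = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10, 100],
                "svc__gamma": ["scale", 0.01, 0.001]},
    cv=5,
).fit(X_train, y_train)

print("untuned SVM accuracy:", round(untuned.score(X_test, y_test), 3))
print("tuned SVM accuracy:  ", round(tuned.score(X_test, y_test), 3))
```

Any gap between the two runs comes entirely from tuning effort, which is exactly the kind of difference that can be mistaken for the superiority of a new method.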

Conclusion

We put a great deal of effort into understanding different machine learning algorithms, and it is equally important to know when they might not work as well as we would like. Certain classifiers, such as deep learning models, do a much better job on tasks like image classification and natural language processing (NLP). It would be unwise to think that the issues Hand highlighted in his paper will always undermine the performance of a sophisticated classifier; it depends on the context and the problem we are trying to solve. Still, it is important to be mindful of these issues.

[1]: Hand, D. J. (2006). Classifier technology and the illusion of progress. Statistical Science, 21(1), 1–14.
