Machine learning describes the process through which computers can learn without continued human input. In the era of big data, machine learning is particularly promising because it allows for identification of patterns in large data sets. Machine learning has applications in fields as diverse as medicine, e-commerce, and banking. This essay will discuss the application of machine learning, particularly explanation-based learning, to the financial tech industry, focusing on fraud detection.
A Brief History of Machine Learning
The concept of machine learning first arose in 1950 with Alan Turing’s paper Computing Machinery and Intelligence, 1 in which Turing proposed to answer the question, “Can machines think?” To answer this question, Turing crafted what became known as the “Turing Test” with three participants: one human judge, one human player, and one computer. The judge, placed separately from the human and the computer, aims to determine which of the two is a human and which is a computer. A computer “passes” the Turing Test when the judge cannot consistently distinguish the computer player from the human player. 2 Turing predicted that humans could program computers that would pass the test by 2000. 3
Over the next four decades, scholars and programmers refined the concept of machine learning and developed new tests. In 1959, IBM programmer Arthur Samuel created a checkers program in which the computer improved progressively the more it played. 4 Programmers focused on developing machines that performed pattern recognition over the next two decades. These efforts culminated in the introduction of Explanation-Based Learning (“EBL”), 5 in which a machine uses a set of programmer-supplied “training data” to identify patterns, synthesize rules, and apply the rules to new sets of data. 6
From the 1990s to today, work has transitioned to developing machines that can draw conclusions from large amounts of data. 7 Machine learning has been extended to include “deep learning,” which involves use of increased processing power to analyze visual and auditory data in real time. 8 Large technology companies have developed their own proprietary machine learning code that acts as the backbone for certain features of their products. 9 Future development focuses on continued improvement in natural language processing (which allows for human voice interaction with devices) 10 and on applying machine learning to new industries.
Pattern Recognition and Explanation-Based Learning: An E-Commerce Example
Although there are several methodologies for machine learning, this essay focuses on explanation-based learning. Explanation-based learning (“EBL”) involves teaching a machine to detect patterns in a set of programmer-supplied “training data,” using those patterns to create a rule, and then applying the rule to larger sets of data to make predictions. A simplified but powerful example from the e-commerce industry will help illustrate the process. 11
Retail companies face the challenge of catering to individual customers in a growing global economy. Machine learning can help retailers by providing extremely personalized predictions about how an individual’s shopping habits may change given a change in personal circumstances. The simple system illustrated here will involve a machine learning system predicting whether a customer is pregnant.
The first requirement for a machine learning system is “training data.” Training data consists of different data points (called “features”), which come together to form an individual “record,” and an output value (the “target”). 12 Training data is necessary because the machine cannot make predictions without examples of how the different features affect the output. In our example, the features will be the customer’s age and whether or not she purchases two products commonly associated with pregnancy. Together, these features form a “record”: in this case, the purchasing history of one customer. Our training data contains ten such records. The target is whether the customer was actually pregnant. 13 TABLE 1 shows this data.
From this set of ten records and their corresponding outputs, the computer can form a “decision tree,” a process for evaluating the probability of the output occurring given the value of each feature. FIGURE 1 shows a decision tree for this problem, with row (A) showing the probabilities of pregnancy given the training data in TABLE 1.
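To make the idea of leaf probabilities concrete, the following is a minimal sketch of how a program might group training records by their feature values and compute the share of positive targets in each group. The records below are invented for illustration and are not the data in TABLE 1; the age bands, the field order, and the function name leaf_probabilities are likewise assumptions.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration (NOT the data in TABLE 1).
# Each record: (age_band, bought_product_1, bought_product_2, was_pregnant)
RECORDS = [
    ("under_30", True, True, True),
    ("under_30", True, True, True),
    ("under_30", True, False, False),
    ("under_30", False, False, False),
    ("30_plus", True, True, True),
    ("30_plus", True, True, False),
    ("30_plus", False, True, False),
    ("30_plus", False, False, False),
]


def leaf_probabilities(records):
    """Group records by feature values and return the observed
    fraction of positive targets for each group (one tree leaf each)."""
    counts = defaultdict(lambda: [0, 0])  # leaf -> [positives, total]
    for *features, target in records:
        leaf = tuple(features)
        counts[leaf][0] += int(target)
        counts[leaf][1] += 1
    return {leaf: pos / total for leaf, (pos, total) in counts.items()}


probs = leaf_probabilities(RECORDS)
for leaf, p in sorted(probs.items()):
    print(leaf, round(p, 2))
```

A combination of feature values that never appears in the records receives no estimate at all, which is one reason a small training set leaves parts of the tree empty.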
With this small amount of data, the decision tree is not particularly useful to the retailer. The ten records do not capture the purchasing trends of the entire customer base; two branches of the tree remain empty. Imagine instead that a larger set of training data (say, with 10,000 records) produced the probabilities in row (B) in FIGURE 1. This data would be useful to a retailer, especially where the data produced high or low probabilities.
Evaluating the Model
Now that the model has been developed from an adequate training set, the retailer can utilize the model to make predictions about new customers. A retailer could, say, send coupons for baby products to a potentially pregnant customer who fits in one of the high probability categories. If the customer is indeed pregnant, the coupons might encourage her to shop at the retailer.
However, no decision tree is perfect because of practical limitations in data collection. In this case, the decision tree uses a limited set of data to estimate the probability that a customer with a given shopping history is pregnant. The retailer needs to evaluate whether its model is actually effective at predicting whether a customer is pregnant.
The shaded areas are where the retailer uses the model and determines that a customer is pregnant; the white areas are where the retailer determines that the customer is not pregnant. The retailer should aim to minimize the number of women in the red shaded area (a “false positive,” where the retailer determined someone was pregnant, but she was not) and maximize the number of women in the green shaded area (a “true positive,” where the retailer correctly identified someone as pregnant).
The retailer can evaluate the model by calculating “precision” and “recall.” 14 In our example, precision is the percentage of customers predicted to be pregnant who actually are. Higher precision indicates fewer false positives. Recall is the percentage of all pregnant customers who are identified by the model. Higher recall indicates fewer false negatives. The two measures trade off against each other. As a retailer raises the probability threshold for predicting that someone is pregnant, it will reduce false positives (and thus increase precision), but it will also increase false negatives (thus reducing recall). If a retailer decides that the model must have 85% certainty that a customer is pregnant instead of 75%, it will exclude customers whose product purchases suggest between a 75% and 85% probability of pregnancy. Some customers in this range may well be pregnant, and the increased probability threshold will produce false negatives for them.
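The trade-off between precision and recall can be seen in a short calculation. The sketch below computes both measures at the 75% and 85% thresholds mentioned above. The scored customers are invented for illustration, and the function name precision_recall is an assumption rather than part of any particular retailer's system.

```python
def precision_recall(scored, threshold):
    """Return (precision, recall) for a list of
    (predicted_probability, actually_pregnant) pairs.

    A customer is predicted pregnant when her probability
    is at or above the threshold."""
    tp = sum(1 for p, actual in scored if p >= threshold and actual)
    fp = sum(1 for p, actual in scored if p >= threshold and not actual)
    fn = sum(1 for p, actual in scored if p < threshold and actual)
    precision = tp / (tp + fp) if (tp + fp) else None  # undefined with no positive predictions
    recall = tp / (tp + fn) if (tp + fn) else None  # undefined with no pregnant customers
    return precision, recall


# Hypothetical model outputs, invented for illustration.
SCORED = [
    (0.95, True),
    (0.90, True),
    (0.80, True),
    (0.78, False),
    (0.60, True),
    (0.40, False),
    (0.20, False),
    (0.10, False),
]

low = precision_recall(SCORED, 0.75)
high = precision_recall(SCORED, 0.85)
print("threshold 0.75:", low)
print("threshold 0.85:", high)
```

With this made-up data, raising the threshold removes the one false positive but also drops a customer who was in fact pregnant, so precision rises while recall falls.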
A retailer faces obvious obstacles in determining numbers of actually pregnant customers, but this could be accomplished through surveys of customers. By further refining the model through evaluation of the most predictive features, the trade-off between precision and recall can be reduced, creating a higher quality model and giving the retailer the maximum benefits of a machine learning system.
EBL and Fraud Detection
EBL is commonly used in the financial technology space to detect credit card fraud. Financial institutions often license fraud-detection software from third parties. This software, in its most simplified form, utilizes hundreds or thousands of features to form a decision tree, producing probabilities used to predict whether a transaction is fraudulent. 15
Using a system similar to that in the pregnancy example, fraud detection companies identify features that, when analyzed together, are highly predictive of fraud. In this simplified example, a fraud detection company could build a system using three different features to detect basic instances of fraud on a single card: the country of use for a charge, the charge amount, and the number of countries used in a given time period. 16
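As a rough illustration of how these three features might be derived from raw card activity, the sketch below builds one record per charge. The transaction data, the field names, and the 24-hour window are all assumptions made for the example; the source does not specify how a fraud detection company defines the time period.

```python
from datetime import datetime, timedelta

# Hypothetical charges on a single card, invented for illustration.
TRANSACTIONS = [
    {"time": datetime(2024, 1, 1, 9, 0), "country": "US", "amount": 42.50},
    {"time": datetime(2024, 1, 1, 13, 0), "country": "US", "amount": 18.00},
    {"time": datetime(2024, 1, 2, 8, 0), "country": "FR", "amount": 950.00},
    {"time": datetime(2024, 1, 5, 8, 0), "country": "US", "amount": 12.00},
]


def build_record(transactions, index, window=timedelta(hours=24)):
    """Turn one charge into a three-feature record.

    The window length is an assumed parameter: it counts distinct
    countries seen from (charge time - window) up to the charge itself."""
    current = transactions[index]
    start = current["time"] - window
    countries = {
        t["country"]
        for t in transactions
        if start <= t["time"] <= current["time"]
    }
    return {
        "country": current["country"],
        "amount": current["amount"],
        "countries_in_window": len(countries),
    }


records = [build_record(TRANSACTIONS, i) for i in range(len(TRANSACTIONS))]
for record in records:
    print(record)
```

Each resulting record would still need a target (whether the charge was in fact fraudulent) before it could serve as training data.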
Fraud detection can be quite difficult. From a cursory examination of the training data, there does not seem to be a consistent pattern. Although some transactions may be quite obviously fraudulent (such as record 2, where a charge was made in a country not present in other records), other patterns are not so evident (such as a correlation between number of countries and fraudulent charges). Machine learning becomes particularly useful in the fraud detection industry because it enables companies to quickly analyze complex sets of data. For example, the developer of this model may determine that the third feature is not particularly predictive of fraud. One benefit of EBL in this space is that companies can identify relevant features and exclude irrelevant ones. 17
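One common way to judge whether a feature is predictive is to measure its “information gain”: how much knowing the feature's value reduces uncertainty about the target. The source does not say which measure fraud detection companies actually use, so the sketch below is only one possible approach, and its records and field names are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(records, feature, target="fraud"):
    """Entropy of the target minus the weighted entropy that
    remains after splitting the records on one feature."""
    labels = [r[target] for r in records]
    groups = {}
    for r in records:
        groups.setdefault(r[feature], []).append(r[target])
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical labeled records, invented for illustration.
RECORDS = [
    {"foreign": True, "many_countries": True, "fraud": True},
    {"foreign": True, "many_countries": False, "fraud": True},
    {"foreign": False, "many_countries": True, "fraud": False},
    {"foreign": False, "many_countries": False, "fraud": False},
]

gain_foreign = information_gain(RECORDS, "foreign")
gain_many = information_gain(RECORDS, "many_countries")
print("foreign:", gain_foreign)
print("many_countries:", gain_many)
```

In this contrived data the first feature separates fraudulent from legitimate charges perfectly, while the second tells the model nothing, so a developer could drop the second without losing predictive power.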
EBL continues to grow in other financial spheres as well. Banks use EBL to analyze customer traits (including past defaults, job status, and marital status) to approve or reject loans. 18 Other financial institutions use EBL to power “robo-advisors” that advise customers on allocating investments and financial instruments. 19 In the future, EBL could power new security systems for banking (such as facial recognition) or even finance-specific customer service systems. 20
Since Alan Turing first hypothesized a thinking machine in 1950, machine learning has developed into a powerful tool. In explanation-based learning, one of the many different types of machine learning, a human provides a set of training data, which includes several features and records, from which a machine extrapolates patterns and creates rules. We encounter these systems every day: in e-commerce and fraud detection, machine learning forms a critical backbone. Future development of EBL will focus on applying the technology to new industries in the era of big data.