Empowering banking security: machine learning for fraud detection

Miquido Author

14 Nov 2023

24 min read

In this article:

Evolution of fraud detection

How machine learning will transform fraud detection

Supervised machine learning

Unsupervised machine learning

Benefits of using ML for fraud detection

Machine learning techniques used in fraud detection

1. Logistic regression

2. Decision tree

3. Random Forest

4. Neural networks

5. Support vector machine

6. K-nearest neighbor

Navigating challenges and strategic considerations

Inadequate infrastructure

Data quality and security

Lack of talent

Case studies of fraud detection in banking using machine learning

Fraud detection

Anti-money laundering

Credit underwriting

Key considerations when using ML for fraud detection

1. Limit the number of input variables

2. Ensure regulatory compliance

3. Set a reasonable threshold

Anticipating the future

With every opportunity comes a threat. The shift towards digitization in the banking industry improved customer experience and expanded client bases to previously unbanked populations. The downside was that online transactions and digital payment solutions opened new avenues for fraudsters to exploit.

Findings from a KPMG fraud survey indicate that cyber-attacks are increasing in frequency and severity, resulting in billions of dollars in losses.

Value of fraud loss in the United States in 2022

The above graph illustrates the value of fraud loss by payment method in the United States in 2022. Bank transfers and payments were the highest, with a $1.59 billion loss.

These losses have forced banking institutions to adopt new solutions to detect, mitigate, and prevent financial fraud. One such method is artificial intelligence (AI), specifically machine learning.

In this article, we’ll discuss everything you need to know about machine learning for fraud detection, including benefits and real-life applications.

Evolution of fraud detection

Traditional fraud detection follows a rule-based approach. As the name suggests, it operates under a set of rules or conditions that determine whether a transaction is genuine or fraudulent. Common conditions include location (is the purchase outside the user’s usual area?) and frequency (are the number and type of purchases typical for the user?).

A transaction only goes through when it meets the conditions. For example, suppose a customer in Ohio suddenly has a POS charge in New Zealand. The location is outside the user’s usual area, so the system flags the transaction as fraudulent.
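To make the rule-based approach concrete, here is a minimal Python sketch. The `Transaction` class, the home-country set, and the amount limit are hypothetical and exist only for illustration; production rule engines contain many more rules. Note that the New Zealand charge is flagged regardless of anything else the bank knows about the customer.

```python
from dataclasses import dataclass

# Hypothetical rule settings for a customer based in Ohio, USA (illustrative only).
HOME_COUNTRIES = {"US"}
MAX_AMOUNT = 2_000.00


@dataclass
class Transaction:
    amount: float   # transaction value in USD
    country: str    # ISO country code where the charge originated


def rule_based_check(tx: Transaction) -> str:
    """Return 'flag' if any fixed rule fires, otherwise 'approve'."""
    if tx.country not in HOME_COUNTRIES:  # location rule
        return "flag"
    if tx.amount > MAX_AMOUNT:  # amount rule
        return "flag"
    return "approve"


print(rule_based_check(Transaction(amount=45.00, country="NZ")))  # flag
print(rule_based_check(Transaction(amount=45.00, country="US")))  # approve
```

Every new fraud pattern means writing and maintaining another `if` statement, which is exactly the inflexibility rule-based systems suffer from.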

There are several drawbacks to this type of fraud detection system.

It produces a high number of false positives, meaning it blocks payments from genuine customers.
It is inflexible. The rule-based approach uses fixed outcomes, making it difficult to adapt to trends in digital banking. You must change the rules manually to catch new forms of fraud.
It doesn’t scale. As data volumes grow, so does the effort required to maintain the rules. Any changes to the system are made manually, which is expensive and time-consuming.

Rule-based fraud detection works. However, its disadvantages make it unsuitable for modern digital environments. It can’t recognize patterns and relies on human intervention.

Furthermore, hackers don’t keep a 9-to-5 schedule and can deploy sophisticated methods like location spoofing and customer behavior impersonation to fool fraud detection systems. You therefore need an equally sophisticated system that works 24/7.

Enter machine learning.

Machine learning is a branch of artificial intelligence (AI) in which algorithms are trained on data to uncover patterns and relationships, gain insight, and make predictions. In fraud detection, those algorithms learn to tell genuine transactions from fraudulent ones.

You’re already familiar with machine learning, even if you don’t realize it. For instance, whenever you engage with an Instagram post, you feed the algorithm information about the type of content you like. It then scours the app for similar content to add to your feed.

How machine learning will transform fraud detection

Fraud detection in banking using machine learning is already changing the industry, with quicker, more flexible, and more accurate identification of and response to fraud.

The AI system analyzes patterns in customer data and automatically adjusts its decision logic as it learns from historical and emerging threats.

Remember that New Zealand POS charge we mentioned earlier? A machine learning fraud detection system could take into account that the same bank card was recently used to buy a flight to New Zealand. The new charge is therefore most likely legitimate.
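A model doesn’t reason about flights on its own; it learns from features engineered from transaction history. The sketch below shows one such contextual feature, assuming hypothetical `Purchase` records with an `"airline"` merchant category and a 30-day look-back window. A trained model could then learn that a foreign charge is far less risky when this feature is `True`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Purchase:
    merchant_category: str              # e.g. "airline" (hypothetical label)
    destination_country: Optional[str]  # set for travel purchases only
    timestamp: datetime


def recent_travel_to(history: list[Purchase], country: str, now: datetime,
                     window_days: int = 30) -> bool:
    """Hypothetical contextual feature: did this card buy a flight to
    `country` within the last `window_days` days?"""
    cutoff = now - timedelta(days=window_days)
    return any(
        p.merchant_category == "airline"
        and p.destination_country == country
        and cutoff <= p.timestamp <= now
        for p in history
    )


now = datetime(2023, 11, 14, 12, 0)
history = [
    Purchase("grocery", None, now - timedelta(days=2)),
    Purchase("airline", "NZ", now - timedelta(days=5)),
]
print(recent_travel_to(history, "NZ", now))  # True
print(recent_travel_to(history, "JP", now))  # False
```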

There are two main approaches to training fraud detection algorithms: supervised machine learning and unsupervised machine learning.

Supervised machine learning

The supervised learning model feeds algorithms large amounts of data tagged as either fraud or non-fraud. The algorithm studies these examples and learns which patterns and relationships distinguish legitimate transactions from fraudulent ones.

This approach is time-consuming because it requires manual tagging of data. Moreover, your data sets must be correctly labeled and well organized: every incorrectly tagged transaction reduces the accuracy of the algorithm.

Additionally, the model only learns from patterns included in the training set. So, fraud involving newly launched mobile banking features that weren’t part of the historical data might not be flagged, leaving a loophole for fraudsters to exploit.

Unsupervised machine learning

The unsupervised learning model uses minimal human input. The algorithm learns patterns and relationships from large amounts of untagged data, grouping data sets based on similarities and differences.

The objective is to spot unusual activity, including patterns that never appeared in the training data. Unsupervised learning thus picks up where supervised learning drops off and can detect new types of fraud.
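As a deliberately simple stand-in for unsupervised models such as clustering or isolation forests, the sketch below scores each transaction by how far its amount sits from the customer’s own average (a z-score). No fraud labels are involved. The threshold of 2.5 and the sample amounts are assumptions for illustration.

```python
import statistics


def anomaly_scores(amounts: list[float]) -> list[float]:
    """Absolute z-score of each amount relative to the customer's own history.

    No fraud labels are used: what counts as 'normal' is learned from the data itself.
    """
    mean = statistics.fmean(amounts)
    stdev = statistics.pstdev(amounts)
    if stdev == 0:
        return [0.0 for _ in amounts]
    return [abs(a - mean) / stdev for a in amounts]


def flag_anomalies(amounts: list[float], threshold: float = 2.5) -> list[int]:
    """Indices of transactions whose score exceeds the (hypothetical) threshold."""
    return [i for i, s in enumerate(anomaly_scores(amounts)) if s > threshold]


amounts = [42.0, 38.5, 51.0, 45.0, 40.0, 39.0, 47.5, 44.0, 900.0]
print(flag_anomalies(amounts))  # [8]
```

A single extreme value also inflates the standard deviation it is measured against, so real systems often prefer more robust statistics or dedicated anomaly-detection models.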

You don’t have to choose between supervised and unsupervised machine learning. You can use them independently or combine them, for example in semi-supervised learning, which trains on a small labeled data set together with a large pool of unlabeled data.

Benefits of using ML for fraud detection

We’ve hinted at the benefits of fraud detection using machine learning in banking, but let’s discuss them further.

Speed

Machine learning computations happen rapidly and give fraud decisions in real time. Rule-based systems also decide in real time, but they rely on written rules to flag fraud.

What happens in new scenarios with no predefined rules? The system produces false positives or false negatives.

Machine learning detects new patterns automatically, analyzing regular customer activity and calculating appropriate outcomes within milliseconds.

Accuracy

Rule-based detection systems can block genuine transactions or allow fraudulent ones because they miss nuances in customer behavior.

Machine learning systems consider variables beyond the written rules, such as patterns of known fraudulent behavior. These variables help contextualize each transaction, lowering the rate of false positives.

Flexibility

Machine learning is flexible and reactive. Its self-learning ability lets the system adjust to new scenarios and detect new threats. Rule-based systems are rigid and have no learning capabilities, so they can only respond to fraudulent activity according to predefined rules.

Efficiency

Machine learning algorithms can analyze thousands of transactions per second. Instead of your team spending labor and overhead costs investigating low- to moderate-risk cases, machine learning can process repetitive or clear-cut fraud. This frees fraud specialists to focus on complex patterns that need human insight.

Scalability

Increased data volume puts pressure on rule-based systems. New rules add to the system’s complexity, making it difficult to maintain. Any error or contradiction can render the entire model ineffective.

Machine learning systems are the opposite. They not only absorb large volumes of new data but can also improve as they learn from it.

Machine learning techniques used in fraud detection

Before we examine the different algorithms used in AI fraud detection, let’s outline how the system works.

The first step is data input. The model’s accuracy depends on the volume and quality of the data: generally, the more high-quality data you add, the more accurate the model becomes.

Next, the model analyzes the data and extracts key features that describe normal behaviors versus fraudulent ones. These features include customer identity (email or phone number), location (IP or shipping address), payment methods (cardholder name and originating country), and more.
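The feature-extraction step can be sketched as a function that turns a raw transaction into numbers a model can use. The field names, the disposable-email domain list, and the choice of features below are hypothetical examples of the identity, location, and payment-method signals just described, not a prescribed schema.

```python
from dataclasses import dataclass

# Illustrative list only; a real system would use a maintained data source.
DISPOSABLE_EMAIL_DOMAINS = {"mailinator.com", "10minutemail.com"}


@dataclass
class RawTransaction:
    email: str             # customer identity
    ip_country: str        # location from IP geolocation
    shipping_country: str  # location from shipping address
    card_country: str      # payment method: card's issuing country
    amount: float


def extract_features(tx: RawTransaction, customer_avg_amount: float) -> dict[str, float]:
    """Convert a raw transaction into numeric features a model can consume."""
    domain = tx.email.rsplit("@", 1)[-1].lower()
    return {
        "disposable_email": float(domain in DISPOSABLE_EMAIL_DOMAINS),
        "ip_matches_shipping": float(tx.ip_country == tx.shipping_country),
        "card_matches_ip": float(tx.card_country == tx.ip_country),
        # Ratio to the customer's usual spend; 0.0 if there is no history yet.
        "amount_vs_avg": tx.amount / customer_avg_amount if customer_avg_amount else 0.0,
    }


tx = RawTransaction("jane@example.com", "US", "US", "NZ", 300.0)
print(extract_features(tx, customer_avg_amount=100.0))
# {'disposable_email': 0.0, 'ip_matches_shipping': 1.0, 'card_matches_ip': 0.0, 'amount_vs_avg': 3.0}
```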

The third step is training the algorithm to distinguish between genuine and fraudulent transactions. The model receives a training data set and learns to predict the probability of fraud in various cases. Once the algorithm performs well on data it did not see during training, you are ready to launch it.

Now, let’s look at the various algorithms you can use.

1. Logistic regression

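Logistic regression is a supervised learning algorithm. It combines the input features into a weighted score, converts that score into a probability of fraud between 0 and 1, and uses the probability to assign each transaction to one of two classes: fraud or non-fraud.

With the common cut-off of 0.5, transactions with a positive score (a probability above 0.5) fall on the fraud side of the decision boundary and are most likely fraudulent, while those with a negative score are most likely legitimate.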
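The sketch below trains logistic regression from scratch with batch gradient descent so that the weighted score and the probability are visible. The two features (amount in thousands of dollars and a foreign-transaction flag) and the eight labeled examples are invented for illustration; in practice you would use a library implementation and far more data.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability between 0 and 1 (numerically stable)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Weighted score w.x + b, squashed into a fraud probability."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.5, epochs=5000):
    """Fit weights with batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(w, b, xi) - yi  # log-loss gradient w.r.t. the score
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical labeled data: [amount in $1,000s, is_foreign (0/1)]; label 1 = fraud.
X = [[0.05, 0], [0.12, 0], [0.30, 0], [0.08, 1],
     [2.50, 1], [3.10, 1], [1.80, 0], [2.90, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic_regression(X, y)
print(round(predict_proba(w, b, [3.0, 1]), 3))  # above 0.5: fraud side
print(round(predict_proba(w, b, [0.1, 0]), 3))  # below 0.5: legitimate side
```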

2. Decision tree

A decision tree is also a supervised learning algorithm, but instead of computing a single weighted score like logistic regression, it uses a hierarchical structure that tests the data level by level to determine whether a transaction is genuine or fraudulent.

Below is an illustration of a decision tree for credit card fraud detection.

Machine learning for fraud detection: decision tree

The first condition the tree checks is the transaction amount. If the value of the transaction exceeds a set threshold, the algorithm considers it fraudulent. If not, the tree checks another condition: transaction time. If the timing is unusual (here, 3 a.m.), the transaction is likely to be fraudulent. If not, the tree checks another condition, and so on until it reaches a final decision.
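The hierarchy can be sketched as a small tree of nodes that a transaction walks from the root down to a leaf. The tree below is hand-written to mirror the example (amount first, then night-time activity); the 5,000 threshold and the `is_night` feature are hypothetical. In a real model, thresholds are learned from labeled data, and `best_threshold` shows the CART-style idea of choosing the split that leaves each side as pure as possible, measured by Gini impurity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A tree node: an internal test on one feature, or a leaf holding a label."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None    # followed when value <= threshold
    right: Optional["Node"] = None   # followed when value > threshold
    label: Optional[str] = None      # set only on leaves


def predict(node: Node, tx: dict) -> str:
    """Walk from the root to a leaf, one condition per level."""
    while node.label is None:
        node = node.right if tx[node.feature] > node.threshold else node.left
    return node.label


# Hypothetical tree mirroring the example: amount first, then night-time activity.
tree = Node(
    feature="amount", threshold=5_000,
    right=Node(label="fraud"),
    left=Node(
        feature="is_night", threshold=0.5,  # 1 if between midnight and 5 a.m.
        right=Node(label="fraud"),
        left=Node(label="legit"),           # deeper levels omitted
    ),
)

print(predict(tree, {"amount": 120, "is_night": 1}))  # fraud
print(predict(tree, {"amount": 120, "is_night": 0}))  # legit


def gini(labels: list[int]) -> float:
    """Gini impurity of a set of 0/1 labels (0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(values: list[float], labels: list[int]) -> float:
    """CART-style search: the split on one feature with the lowest weighted Gini impurity."""
    best_t, best_score = values[0], float("inf")
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t


print(best_threshold([120, 80, 6000, 7500, 200], [0, 0, 1, 1, 0]))  # 200
```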

3. Random Forest

Random forest is an ensemble of many decision trees. Each tree is trained on a random sample of the data and a random subset of the features, so different trees check different conditions: identity, location, etc.

Machine learning for fraud detection: Random forest

After checking its conditions, every tree offers a decision. The combined vote (typically a majority vote, or an average of the trees’ fraud probabilities) determines whether the transaction is genuine or fraudulent.
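The voting step can be sketched as follows. To keep the example short, the three "trees" are hypothetical hand-written functions, each looking at a different signal; a real random forest learns its trees from random samples of labeled data. `forest_predict` returns the majority decision along with the share of trees that voted "fraud", which can serve as a rough risk score.

```python
from collections import Counter
from typing import Callable

Tree = Callable[[dict], str]


# Three hypothetical hand-written "trees", each looking at a different signal.
def identity_tree(tx: dict) -> str:
    return "fraud" if tx["email_age_days"] < 2 and tx["new_device"] else "legit"


def location_tree(tx: dict) -> str:
    return "fraud" if tx["ip_country"] != tx["card_country"] else "legit"


def amount_tree(tx: dict) -> str:
    return "fraud" if tx["amount"] > 3 * tx["avg_amount"] else "legit"


def forest_predict(trees: list[Tree], tx: dict) -> tuple[str, float]:
    """Majority vote across trees, plus the share of trees voting 'fraud'."""
    votes = Counter(tree(tx) for tree in trees)
    fraud_share = votes["fraud"] / len(trees)
    return ("fraud" if fraud_share > 0.5 else "legit"), fraud_share


tx = {"email_age_days": 400, "new_device": False, "ip_country": "RU",
      "card_country": "US", "amount": 950.0, "avg_amount": 120.0}
print(forest_predict([identity_tree, location_tree, amount_tree], tx))
# ('fraud', 0.6666666666666666): location and amount trees vote fraud
```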

4. Neural networks

Neural networks are complex algorithms that can be trained with supervised or unsupervised learning. Inspired by the human brain, they process data in multiple layers to extract high-level features. Neural networks are the foundation of deep learning, which can recognize patterns in pictures, text, audio, and other data.

Here’s a simplified version of a neural network.

Neural Network: Machine learning for fraud detection

A basic neural network has three types of layers: input, hidden, and output. The input layer receives the data, the hidden layer transforms the data from the input layer to uncover hidden patterns, and the output layer classifies the data.

Deep neural networks have several hidden layers. They are well suited to modeling non-linear relationships and can help detect previously unseen fraud scenarios.
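The layer structure can be sketched as a forward pass through one hidden layer. The weights below are made-up, untrained values chosen only to show how data flows from input to output; training them (typically with backpropagation) is not shown, and the three input features are unspecified placeholders.

```python
import math


def relu(v: float) -> float:
    return max(0.0, v)


def dense(inputs: list[float], weights: list[list[float]], biases: list[float]) -> list[float]:
    """One fully connected layer: each neuron computes a weighted sum plus bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def forward(x: list[float]) -> float:
    """Input layer -> one hidden layer (ReLU) -> output layer (sigmoid fraud probability)."""
    # Hypothetical, untrained weights: 3 input features, 2 hidden neurons, 1 output.
    W1 = [[0.5, -0.2, 0.1],
          [-0.3, 0.8, 0.4]]
    b1 = [0.0, 0.1]
    W2 = [[1.2, -0.7]]
    b2 = [-0.1]

    hidden = [relu(h) for h in dense(x, W1, b1)]  # hidden layer
    score = dense(hidden, W2, b2)[0]              # output layer (single neuron)
    return 1.0 / (1.0 + math.exp(-score))         # squash to a probability


print(round(forward([1.0, 0.0, 2.0]), 4))  # 0.5793
```

A deep network stacks more `dense` plus activation steps between input and output; the flow of data is the same.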

5. Support vector machine

Support vector machines (SVM) are supervised learning algorithms that predict, classify, and detect outliers.

Support vector machine algorithm: Machine learning for fraud detection

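This linear SVM illustration shows two classes of data points separated by a straight line called a hyperplane (with more than two features, the boundary is a flat surface rather than a line). The hyperplane is the decision boundary that classifies data as fraud vs. non-fraud, and the SVM places it so that the margin to the nearest points of each class is as wide as possible.

Data points far from the hyperplane are easy to classify. The support vectors are the points closest to the hyperplane, which makes them the hardest to categorize. They define the position of the hyperplane, so removing them would change it.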
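A linear SVM can be sketched in plain Python by minimizing the regularized hinge loss with subgradient descent, one common way to train it. The two-dimensional data, the labels (+1 for fraud, -1 for non-fraud), and the hyperparameters `lam`, `lr`, and `epochs` are assumptions for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 (fraud) or -1 (non-fraud).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision(w, b, x):
    """Signed score: positive = fraud side of the hyperplane, negative = non-fraud side."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical 2-D training data (two scaled risk features per transaction).
X = [[1, 1], [1, 2], [2, 1], [5, 5], [6, 5], [5, 6]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
margins = [yi * decision(w, b, xi) for xi, yi in zip(X, y)]
print([round(m, 2) for m in margins])
```

Points whose margin ends up close to 1 are the support vectors. The optimal hyperplane depends only on them, so points with larger margins could be removed without changing it.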

6. K-nearest neighbor

K-nearest neighbor (KNN) is a supervised learning algorithm. It operates on the assumption that similar items exist close to each other.

Below is a simple illustration.

K-nearest neighbor algorithm: Machine learning for fraud detection

A new data point needs to be assigned to either category A or B. The algorithm measures the distance between the new point and the labeled data points, commonly with a formula called the Euclidean distance, and finds the k closest ones (its nearest neighbors). The new point joins the category that is most common among those neighbors. So, if most of a transaction’s nearest neighbors are tagged ‘fraud,’ it is classified as fraudulent.
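The classification step can be sketched in a few lines using Python’s `math.dist` for the Euclidean distance. The two scaled features and the six labeled points are hypothetical. Because distance mixes all features together, features should generally be scaled to comparable ranges before running k-NN.

```python
import math
from collections import Counter

# Hypothetical labeled transactions: ([scaled amount, scaled distance from home], label).
train = [
    ([0.10, 0.20], "legit"), ([0.20, 0.10], "legit"), ([0.15, 0.25], "legit"),
    ([0.90, 0.80], "fraud"), ([0.85, 0.95], "fraud"), ([0.80, 0.90], "fraud"),
]


def knn_classify(train, query, k=3):
    """Label the query with the majority label among its k nearest (Euclidean) neighbors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify(train, [0.82, 0.88]))  # fraud
print(knn_classify(train, [0.12, 0.18]))  # legit
```

An odd `k` avoids ties when there are only two categories.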

Navigating challenges and strategic considerations

As with any technology, integrating machine learning for fraud detection comes with growing pains. Here are some common challenges you may face.

Inadequate infrastructure

Many banking systems can’t analyze large quantities of complex data. Furthermore, much of the data is siloed in separate storage systems.

Unfortunately, there’s no quick fix to this problem. You have to invest in the appropriate hardware and software.

You may need to partner with an experienced fintech app development agency and set up an infrastructure that can automatically select appropriate algorithms for specific data sets, import raw data and prepare it for machine learning, visualize the data, test the algorithm, and more.

Data quality and security

Data quality is a significant issue for financial institutions looking to implement machine learning for fraud detection. Machine learning models don’t distinguish between good and bad data. So, if the training data contains irrelevant or incomplete records, your model’s predictions will be less accurate.

Data ingestion solutions like Amazon Kinesis can help collect, clean, and transform raw data, making it suitable for machine learning models. Once the data is cleaned and organized, separate sensitive from non-sensitive data. Encrypt confidential information, store it in secure facilities, and limit access to it.
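Before data reaches a model, direct identifiers can also be pseudonymized so analysts and models never see raw values. The sketch below uses HMAC-SHA256 from Python’s standard library and masks all but the last four digits of a card number. The key shown is a placeholder; this complements, rather than replaces, encryption and access controls.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same input and key always give the same token, so a model can still
    link a customer's transactions without seeing the raw value.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_card_number(pan: str) -> str:
    """Keep only the last four digits of a card number."""
    digits = [c for c in pan if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


# Hypothetical key: in practice, load it from a secrets manager, never from source code.
KEY = b"example-key-do-not-use"
print(pseudonymize("jane@example.com", KEY)[:16])
print(mask_card_number("4111 1111 1111 1111"))  # ************1111
```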

Lack of talent

Despite what people fear, machine learning isn’t stealing jobs. It’s quite the opposite. We still need fraud analysts to manage complex cases that require human insight and experience. Also, machine learning is still new to many financial institutions, and there aren’t enough experts in the field.

This is good news for job seekers but not for institutions that can’t capitalize on the full potential of machine learning. You can overcome this speed bump by partnering with businesses that have the skills to implement machine learning.

Case studies of fraud detection in banking using machine learning

Now, let’s look at real-life examples of fraud detection in banking using machine learning.

Fraud detection

Danske Bank is a Danish multinational financial corporation. It is the largest bank in Denmark and a leading retail bank in Northern Europe. Under its rule-based detection system, the bank struggled to mitigate fraud. It had a 40% fraud detection rate and a 99.5% false positive rate.

Working with Teradata, a data software company, Danske integrated deep learning software to help identify potential fraudulent activity. The result was a 60% reduction in false positives and a 50% increase in true positives.

Anti-money laundering

OakNorth is a commercial lending bank in the UK, providing business and personal financial services to scaling companies. The bank had a fragmented screening process, with one provider for anti-money laundering checks and another for customers. Moreover, the screenings for politically exposed persons (PEP) generated many false positives.

Working with ComplyAdvantage, a fraud and AML detection company, the bank integrated a screening and ongoing monitoring solution to streamline compliance and consolidate data. This facilitated rapid data transfer between the bank’s lending and saving operations.

Credit underwriting

Hawaii USA Credit Union is the largest credit union in Hawaii and one of Forbes Magazine’s best credit unions. It wanted to be competitive against Fintech companies and grow its personal loan portfolio without increasing risk.

Working with Zest AI, the credit union automated its decision-making processes using an AI-driven personal loan model. The model used 278 variables to provide deeper insights than the VantageScore credit scoring system. The result was a 21% increase in the approval rate and a 0% default/loan application fraud rate.

Key considerations when using ML for fraud detection

While fraud detection in banking using machine learning is efficient, it is also daunting. These systems demand lots of accurate data, or the models don’t work as well as they should.

So, here are some tips to optimize the machine learning process.

1. Limit the number of input variables

Throughout this article, we’ve said more is more. That remains true about data volume. However, less is more with the number of fraud detection variables.

Typical features to consider when investigating fraud include:

IP address
Email address
Shipping address
Average order/transaction value

The benefit of fewer features is shorter algorithm training times. You also avoid problems with overlapping or irrelevant features.
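One simple way to prune overlapping variables is to drop any feature that is strongly correlated with one you’ve already kept. The sketch below assumes a hypothetical 0.9 correlation cut-off and toy feature columns. It only catches linear redundancy; judging whether a feature is irrelevant to fraud requires checking it against labeled outcomes.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def drop_redundant(features: dict[str, list[float]], max_corr: float = 0.9) -> list[str]:
    """Keep features in order, skipping any that correlate too strongly with one already kept."""
    kept: list[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < max_corr for k in kept):
            kept.append(name)
    return kept


features = {
    "amount": [10, 20, 30, 40, 50],
    "amount_with_fee": [11, 21, 31, 41, 51],  # perfectly correlated with "amount"
    "hour": [3, 14, 9, 22, 1],
}
print(drop_redundant(features))  # ['amount', 'hour']
```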

2. Ensure regulatory compliance

Preventing fraud is one part of data security. The other is data privacy. Many countries have laws about how institutions can collect, use, and store customer data. There’s China’s Personal Information Protection Law (PIPL), the California Consumer Privacy Act (CCPA), and the European Union’s General Data Protection Regulation (GDPR), to name a few.

These laws have implications for the data used in machine learning. A common principle across most data privacy regulations is notice and consent: in general, you must inform customers and obtain their permission (or have another lawful basis) before using their data for purposes beyond what they requested, including training machine learning algorithms.

The simplest way to ensure adherence to privacy standards is by using technical partners with regulatory-compliant features. For instance, you should partner with a banking app development company that understands how to maintain data privacy and security.

3. Set a reasonable threshold

Fraud detection systems compare each transaction (for example, its value or the model’s fraud score) against a threshold that triggers an accept or reject response. You want a threshold that balances security and user experience. If the threshold is too strict, you risk blocking legitimate transactions. If it’s too lax, you’ll increase the rate of successful fraud.

Assess your risk appetite to find the right balance. Risk levels differ for each financial institution and product. For example, a micro-lending product can afford a high, more lenient threshold for low-value loans, whereas a commercial bank can’t be as generous with mortgage loans.
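Choosing a threshold can be framed as minimizing expected cost on historical, labeled model scores. In the sketch below, the scores, labels, candidate thresholds, and per-error costs are all hypothetical, and a transaction is blocked when its fraud score is at or above the threshold.

```python
def errors_at(scores, labels, threshold):
    """Count false positives and false negatives when scores >= threshold are blocked."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def pick_threshold(scores, labels, candidates, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total error cost."""
    def total_cost(t):
        fp, fn = errors_at(scores, labels, t)
        return fp * cost_fp + fn * cost_fn
    return min(candidates, key=total_cost)


# Hypothetical model scores on past transactions (1 = confirmed fraud).
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
candidates = [0.3, 0.5, 0.75]

# Small loans: a blocked good customer (20) costs more than missed fraud (5).
print(pick_threshold(scores, labels, candidates, cost_fp=20, cost_fn=5))    # 0.75
# Mortgages: missed fraud (500) dwarfs the cost of a false alarm (20).
print(pick_threshold(scores, labels, candidates, cost_fp=20, cost_fn=500))  # 0.3
```

When missed fraud is cheap relative to turning away a good customer, the lenient threshold wins; when missed fraud is expensive, the strict one does.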

Anticipating the future

The future is now, yet only 17% of organizations use machine learning in anti-fraud programs. Don’t be left behind.

Here are some breakthroughs you can expect in your bank’s security through machine learning.

Device profiling: identify the different devices that connect to your banking network, analyzing the features and behaviors of any given device.
Automated anomaly detection and response: identify fraudulent behavior from known devices and isolate affected systems.
Zero-day detection: identify previously unknown vulnerabilities and malware to protect organizations from cyberattacks.
Data masking: automatically detect and anonymize confidential data.
Scaled insights: identify trends in fraud across multiple devices and locations.
Innovative policy: use machine learning insights to drive relevant security policies.

Whether you’re a wealth management institution or a credit union, AI and machine learning hold enormous opportunities for fraud detection.

However, it’s critical to remember that hackers also use these technologies to circumvent protective measures. Regularly update your machine learning models to stay ahead of these attacks, and strengthen your AI-based security with good old human intelligence.
