The average bank loses more than 5% of its revenue to fraud. But that is not even the biggest loss: the loss of customer confidence matters far more than those figures. Some losses even go undetected for years, and the affected banks end up losing customers to the competition.
Fraud can come from customers, merchants and employees alike. Methods include counterfeit credit cards, stolen credit cards, stolen account logins, bank website cloning, merchant website cloning, and more. The list is long, and as technology advances, new fraudulent methods keep being created.
Gone are the days when bank fraudsters were discovered and pursued only days after the crime had been committed. Yet some banks still rely 100% on whistle-blowing and other traditional systems of internal audit and revenue assurance. For a bank with hundreds of thousands of customers, these traditional systems are not only cumbersome and impractical, they also miss two key points:
- Fraud needs to be detected in real time, and
- Banks already have more than enough data about every customer to be able to predict fraud.
As introduced in the previous article, “Introducing the application of data science per industry”, the data is already available, and its volume keeps doubling every two years. Banks are already sitting on a huge oil reserve waiting to be explored, and this is where data science fits in perfectly: the perfect mining engineer for the perfect mining job. Data science is about modelling this massive, rich and freely available data in order to train the machine to see patterns and detect any anomaly in the system. Data scientists are trained to run models on both structured and unstructured data to establish what is normal for each customer and what is abnormal. With such a model running, the audit team can move from a traditional, cyclical approach to fraud to a continuous, risk-based modelling approach.
What Data Science will bring to the table
All actions performed by an account are aggregated into one big customer base and a machine learning database. Starting from the account creation data, all ATM transactions, all online banking transactions and all calls made to the call center are gathered, cleaned and modelled.
A real-time model is run, which checks each current transaction against the information collected in the aggregated customer base above to see whether it is expected or unusual. For example, is the amount being withdrawn too large for this account? In a matter of seconds, billions of transactions are verified. Each transaction is given a risk score, with high scores sending alerts to the audit team.
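As a minimal sketch of the risk-scoring idea, the Python snippet below compares a withdrawal with the account's own history. The function name `risk_score`, the z-score rule and the alert threshold of 3.0 are illustrative assumptions, not a description of any bank's actual system.

```python
from statistics import mean, stdev

def risk_score(history, amount):
    """Return how many standard deviations `amount` lies above the
    account's average past amount (0.0 if it is at or below average)."""
    if len(history) < 2:
        return 0.0  # assumption: too little history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts identical: anything larger is maximally unusual.
        return 0.0 if amount <= mu else float("inf")
    return max(0.0, (amount - mu) / sigma)

ALERT_THRESHOLD = 3.0  # hypothetical cut-off for alerting the audit team

past_withdrawals = [40, 60, 50, 55, 45]  # made-up history for one account
for amount in (52, 900):
    score = risk_score(past_withdrawals, amount)
    status = "ALERT" if score >= ALERT_THRESHOLD else "ok"
    print(f"withdrawal {amount}: score {score:.1f} -> {status}")
```

A production model would look at many more signals than the amount (location, device, time of day), but the principle is the same: score each transaction against what is normal for that account.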
A sample Data Scientist’s Workflow for fraud detection in a bank
A data scientist will typically follow the steps below to come up with a pattern detection model against fraud in a bank.
1. Collecting the data
The data scientist will create a database which aggregates data from multiple sources. It starts with collecting all data and metadata about every transaction an account or customer has made with the bank, such as the creation of the account, ATM transactions, overdrafts, online interactions, call centre interactions, etc.
Next, external data about each customer, such as their social media activity, is also collected.
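The aggregation step could look roughly like the sketch below, which groups events from several source systems under the customer they belong to. The source names, the field names (`customer_id`, `type`, `amount`, `topic`) and the records themselves are made up for illustration.

```python
from collections import defaultdict

# Made-up records from three hypothetical source systems.
atm_events = [
    {"customer_id": "C1", "type": "atm_withdrawal", "amount": 50},
    {"customer_id": "C2", "type": "atm_withdrawal", "amount": 200},
]
online_events = [
    {"customer_id": "C1", "type": "online_transfer", "amount": 120},
]
call_centre_events = [
    {"customer_id": "C2", "type": "call", "topic": "card blocked"},
]

def aggregate_by_customer(*sources):
    """Group events from every source under the customer they belong to."""
    customer_base = defaultdict(list)
    for source in sources:
        for event in source:
            customer_base[event["customer_id"]].append(event)
    return dict(customer_base)

customer_base = aggregate_by_customer(atm_events, online_events, call_centre_events)
for customer_id, events in sorted(customer_base.items()):
    print(customer_id, [event["type"] for event in events])
```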
2. Cleaning the data
About 90% of the time will be spent here: extracting key features and words from the social media and website data, removing empty cells, harmonising dates, and finally matching and merging all these data into one big customer database with hundreds of variables describing each customer.
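Two of the cleaning tasks mentioned here, removing empty cells and harmonising dates, can be sketched as follows. The list of date formats and the choice to drop any row with an empty cell are assumptions made for the example; a real project would decide these from the data actually found.

```python
from datetime import datetime

# Assumed date formats; a real project would list the formats actually found.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

def harmonise_date(text):
    """Return the date as ISO 'YYYY-MM-DD', or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean(rows):
    """Drop rows with empty cells or unreadable dates; normalise the rest."""
    cleaned = []
    for row in rows:
        if any(value in (None, "") for value in row.values()):
            continue  # assumption: incomplete rows are discarded
        date = harmonise_date(row["date"])
        if date is None:
            continue
        cleaned.append({**row, "date": date})
    return cleaned

raw_rows = [  # made-up records
    {"customer_id": "C1", "date": "2023-01-15", "amount": 50},
    {"customer_id": "C1", "date": "16/01/2023", "amount": 75},
    {"customer_id": "C2", "date": "17.01.2023", "amount": ""},
    {"customer_id": "C2", "date": "not a date", "amount": 20},
]
for row in clean(raw_rows):
    print(row)
```

Dropping incomplete rows is the simplest policy; filling in missing values instead is a common alternative when data is scarce.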
3. Exploratory Data Analysis:
Descriptive statistics and visualization will be applied to discover hidden patterns and to understand, for example, where fraud occurs most easily and how fraud appears in the data as an outlier from the normal distribution of a customer's activity.
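To illustrate how an outlier shows up in descriptive statistics, the sketch below summarises a handful of made-up transaction amounts and flags values outside the interquartile-range fences. The IQR rule with `k=1.5` is a common convention chosen here for simplicity; it is an assumption of this example, not a method prescribed by the article.

```python
from statistics import mean, median, quantiles

amounts = [20, 35, 40, 45, 50, 55, 60, 65, 80, 2500]  # made-up amounts

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

print("mean:", mean(amounts))
print("median:", median(amounts))
print("outliers:", iqr_outliers(amounts))
```

The large gap between the mean and the median is itself a hint that a few extreme values are distorting the picture.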
4. Modelling and predicting fraud
A classification model will typically be used for fraud detection. First, a training set is used to train the model on the key features that can be used to predict fraud. A test set is then used to test the predictions made by the newly trained model, and the results are compared with the actual fraud figures to validate the accuracy of the model.
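The train-and-test procedure can be sketched with a deliberately simple classifier. The example below uses a nearest-centroid model written in plain Python; the two features (amount relative to the account average, number of transactions in the last hour), the labelled rows and the every-fourth-row split are all made up for illustration, and a real project would more likely use an established machine learning library.

```python
from math import dist

# Made-up rows: ((amount / account average, transactions in last hour), label)
# label 1 = fraud, 0 = legitimate
labelled = [
    ((1.0, 1), 0), ((0.8, 2), 0), ((6.0, 5), 1), ((1.2, 1), 0),
    ((0.9, 1), 0), ((8.0, 7), 1), ((1.1, 2), 0), ((5.5, 6), 1),
    ((1.3, 1), 0), ((0.7, 1), 0), ((7.0, 4), 1), ((1.0, 2), 0),
]

# Hold out every fourth row as the test set; train on the rest.
train = [row for i, row in enumerate(labelled) if i % 4 != 3]
test = [row for i, row in enumerate(labelled) if i % 4 == 3]

def train_centroids(rows):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in rows:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(centroids, features):
    """Assign the class whose centroid is closest to the features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = train_centroids(train)
correct = sum(1 for features, label in test
              if predict(centroids, features) == label)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

The key point is that the model never sees the test rows during training, so the accuracy measured on them is an honest estimate of how it behaves on new transactions.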
5. Conclusions and Recommendations
The output will be a classification model whose accuracy is determined by validating its output against real-world data. This model is then integrated into the bank’s software and systems, and it becomes the basis for detecting fraud.
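When validating against real-world data, it helps to look beyond a single accuracy figure: because fraud is rare, a model that never flags anything can still score a high accuracy. The sketch below computes accuracy alongside precision and recall from made-up actual and predicted labels (1 = fraud, 0 = legitimate); the function name `confusion_counts` is hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

actual    = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]  # made-up ground truth
predicted = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # made-up model output

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of alerts that were real fraud
recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of real fraud that was caught

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Precision tells the audit team how many alerts are worth their time, while recall tells them how much fraud still slips through.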
Practical advice to bank decision makers:
- Most banks are sitting on a huge goldmine: data about their customers. But they are often unaware of its value and of the insights it could bring.
- A data-focused culture and organisational structure needs to be created.
- Talented data scientists should be recruited, or trained from within the company, with the right mix of programming, statistics and banking domain knowledge.
- The models that have been validated for fraud detection should be integrated into the company’s software or systems.