Can data analytics stop credit card fraud before it happens?

Card-payment fraud hurts margins, raises chargebacks, and harms customer experience. A Latin America study shows that segmenting merchants by size and using unsupervised prevention rules plus supervised alerts can prevent much fraud in payment gateways.

Published in Business & Management

Can data analytics stop credit card fraud before it happens?
Like

Share this post

Choose a social network to share with, or copy the URL to share elsewhere

This is a representation of how your post may appear on social media. The actual post will vary between social networks

Explore the Research

SpringerLink
SpringerLink SpringerLink

Data analytics to prevent retail credit card fraud: empirical evidence from Latin America - Financial Innovation

Reducing the risk of fraud in credit card transactions is crucial for the competitiveness of companies, especially in Latin American countries. This study aims to establish measures for preventing and detecting fraud in the use of credit cards in shops through analytical methods (data mining, machine learning and artificial intelligence). To achieve this objective, the study employs a predictive methodology using descriptive and exploratory statistics and frequency, frequency & monetary (RFM) classification techniques, differentiating between SMEs and large businesses via cluster analysis and supervised models. A dataset of 221,292 card records from a Latin American merchant payment gateway for the year 2022 is used. For fraud alerts, the classification model has been selected for small and medium–sized merchants, and the multilayer perceptron (MLP) neural network has been selected for large merchants. Random forest or Gini decision tree models have been selected as backup models for retraining. For the detection of punctual fraud patterns, the K-means and partitioning around medoids (PAM) models have been selected, depending on the type of trade. The results revealed that the application of the identified models would have prevented between 48 and 85% of fraud transactions, depending on the trade size. Despite the promising results, continuous updating is recommended, as fraudsters frequently implement new fraud techniques.

Card-payment fraud is not merely an operational nuisance: it compresses margins, increases chargeback costs, degrades customer experience, and ultimately affects competitiveness. A recent study proposes a pragmatic approach—combining merchant-size segmentation, unsupervised models for preventive rules, and supervised models for alerts—and quantifies how much fraud could be avoided in a real payment-gateway setting in Latin America.

Analytics (and machine learning) in the service of commerce

The study analyzes 221,292 anonymized transactions from a payment gateway serving Latin American merchants (year 2022), described by 163 variables spanning categorical, temporal, and numerical fields. The core intuition is straightforward: fraud patterns differ substantially across merchants, and therefore a single model should not be expected to perform equally well for a small business and a large retailer.

Methodological design (without sacrificing rigor)

The paper decomposes the problem into two distinct operational objectives:

  1. Prevention (pre-authorization or near-real-time gating): identify actionable fraud patterns to enable automatic rejection or manual validation using unsupervised clustering.

  2. Alerting (post-approval monitoring): classify recent transactions as fraud / non-fraud and produce a risk signal using supervised learning.

To ensure operational relevance, the authors first segment merchants by size using an RFM-style (Recency–Frequency–Monetary) scheme and then fit models per stratum.

Central finding: merchant size changes the “best” model

The most meaningful result is not that “one model wins,” but that different models dominate in different segments:

  • Small merchants: k-means performs best for prevention, yielding rules that could block ~85% of fraud while avoiding substantial disruption to legitimate transactions.

  • Mid-sized merchants: PAM with Gower distance is strongest, with an estimated preventive effectiveness around 63%.

  • Large merchants: PAM (Gower) again leads, though with a lower stop rate, around 48%.

Overall, the study reports that applying the identified approaches could have prevented roughly 48% to 85% of fraud, depending on merchant size.

What about fraud “alerts” (when you cannot auto-decline)?

For the detection/alerting component, the authors compare several supervised classifiers (MLP, Random Forest, logistic regression, decision trees, Naïve Bayes, SVM) using standard performance metrics (accuracy, sensitivity/recall, specificity, and related measures):

  • Small merchants: MLP provides the strongest balance, with 90% accuracy and 91% sensitivity (ability to capture fraudulent cases).

  • Mid-sized merchants: Random Forest reaches 89% accuracy, though the paper emphasizes that MLP may be preferable when maximizing sensitivity (capturing fraud) is the overriding objective while maintaining solid overall performance.

  • Large merchants: Random Forest (86%) and MLP (85%) are close in accuracy; MLP stands out in sensitivity (87%), which is advantageous when the cost of false negatives is high.

In addition, the authors recommend maintaining “backup” models for retraining and operational continuity (e.g., Random Forest or a Gini-based decision tree), recognizing that fraud tactics evolve and adversaries adapt.

How reliable are these results?

The paper provides quantitative support for segmentation by merchant size:

  • A single “global” model yields AUC 0.789 and F1 0.46.

  • Size-segmented models improve substantially—for example, AUC 0.901 (small), 0.864 (mid-sized), and 0.852 (large)—with corresponding gains in F1.

The segmentation rationale is further supported by evidence of statistical heterogeneity and a population stability indicator (PSI) consistent with meaningful differences across strata.

From the paper to practice: optimizing costs, not only accuracy

A particularly operationally relevant element is that decision thresholds are not selected heuristically. The authors define an expected-cost framework contrasting the cost of missed fraud versus manual review, using a 10:1 ratio (e.g., 100 USD vs. 10 USD), and then select the threshold that minimizes expected cost by segment.

Illustrative optima reported include:

  • Small merchants: 25% threshold and cluster limit <300

  • Mid-sized merchants: 10% and <300

  • Large merchants: 10% without a limit

Implications: a realistic roadmap (and key cautions)

The paper does not present a “silver bullet.” It explicitly highlights the need for periodic retraining (suggesting approximately every six months) because fraud patterns shift over time. In terms of governance and implementation, it recommends:

  • strengthening technological infrastructure (especially among SMEs),

  • investing in analytical and ML capabilities,

  • promoting collaboration among merchants, banks, and regulators (with anonymized data),

  • protecting privacy and maintaining user trust.

Source: Rugeles Diaz, L. T., Echarte Fernández, M. Á., Jorge-Vázquez, J., & Náñez Alonso, S. L. (2025). Data analytics to prevent retail credit card fraud: empirical evidence from Latin America. Financial Innovation, 11:137. https://doi.org/10.1186/s40854-025-00879-5

Please sign in or register for FREE

If you are a registered user on Research Communities by Springer Nature, please sign in

Follow the Topic

Financial Technology and Innovation
Humanities and Social Sciences > Business and Management > Corporate Finance > Financial Technology and Innovation