Positive-Unlabeled Learning Improves Detection of Corrupt Procurement Contracts in Mexico
A team of researchers has introduced a positive‑unlabeled (PU) learning framework to identify likely corrupt and fraudulent contracts within Mexico’s federally funded procurement system. The study, posted to arXiv in December 2025, addresses the persistent difficulty of obtaining reliable negative examples for supervised models, a gap that has limited previous analytical efforts.
Background
Public procurement fraud remains a significant obstacle for governments worldwide, with most prior investigations relying on domain‑specific risk indicators derived from individual contract attributes or limited network analyses. Conventional supervised machine learning has struggled in this arena because confirmed non‑corrupt contracts are rarely documented, leading to biased training sets.
Methodology
The researchers combined publicly available procurement records from Mexico with company sanction databases to construct a dataset containing confirmed positive cases and a large pool of unlabeled contracts. They applied PU learning algorithms that combine traditional red‑flag indicators, such as single‑source awards and unusually short bidding periods, with network‑derived metrics, including contract centrality and supplier eigenvector centrality, to estimate each contract's likelihood of corruption.
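The summary does not say which PU algorithm the authors used. As a rough illustration of the general idea, the sketch below applies the Elkan-Noto correction, one well-known PU technique, which is an assumption here and not necessarily the paper's method. It presumes a classifier has already been trained to separate labeled (sanction-linked) contracts from unlabeled ones, and that positives are labeled at random. The scores and contract names are toy values invented for the example.

```python
# Minimal sketch of the Elkan-Noto PU correction (an assumed technique,
# not confirmed as the authors' method). All numbers are toy values.

def estimate_label_frequency(holdout_positive_scores):
    """Estimate c = P(labeled | truly positive) as the mean classifier
    score on held-out contracts that are known positives."""
    return sum(holdout_positive_scores) / len(holdout_positive_scores)


def corrected_probability(score, label_frequency):
    """Turn P(labeled | x) into an estimate of P(positive | x), capped at 1."""
    return min(1.0, score / label_frequency)


# Scores from a hypothetical "labeled vs. unlabeled" classifier.
holdout_positive_scores = [0.30, 0.50, 0.40]
c = estimate_label_frequency(holdout_positive_scores)

# Hypothetical unlabeled contracts and their raw classifier scores.
unlabeled_scores = {"contract_A": 0.10, "contract_B": 0.36, "contract_C": 0.44}
risk = {name: corrected_probability(s, c) for name, s in unlabeled_scores.items()}

for name, value in sorted(risk.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: estimated corruption risk {value:.2f}")
```

Because known positives are only a fraction of all truly corrupt contracts, the raw scores understate risk. Dividing by the estimated label frequency rescales them, while the ranking of contracts stays the same.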
Results
The best‑performing PU model captured, on average, 32 percent more known positive contracts than baseline approaches and performed 2.3 times better than random guessing. It substantially outperformed models that relied solely on conventional red‑flag features.
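The summary does not name the exact evaluation metric. One common way to express figures of this kind, assumed here purely for illustration, is recall among the top-ranked contracts compared with what random flagging would be expected to capture. The scores, labels, and baseline recall below are toy values, not results from the study.

```python
# Sketch of recall-at-top-fraction, lift over random guessing, and relative
# gain over a baseline. The metric choice is an assumption; data are toy values.

def recall_at_fraction(scores, is_known_positive, fraction):
    """Share of known positives found among the top `fraction` of contracts."""
    n_flagged = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    captured = sum(is_known_positive[i] for i in ranked[:n_flagged])
    return captured / sum(is_known_positive)


def lift_over_random(recall, fraction):
    """Random flagging of `fraction` of contracts is expected to capture
    that same fraction of the positives, so lift = recall / fraction."""
    return recall / fraction


def relative_gain(model_recall, baseline_recall):
    """Proportional improvement of the model's recall over a baseline's."""
    return (model_recall - baseline_recall) / baseline_recall


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
is_known_positive = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
fraction = 0.3

recall = recall_at_fraction(scores, is_known_positive, fraction)  # 2 of 3 positives in the top 3
lift = lift_over_random(recall, fraction)
gain = relative_gain(recall, 0.5)  # 0.5 is a hypothetical baseline recall

print(f"recall={recall:.2f}, lift={lift:.2f}x, gain over baseline={gain:.0%}")
```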
Feature Importance
Analysis using Shapley Additive Explanations (SHAP) highlighted that network‑based attributes, particularly those of contracts situated in the core of the procurement network or of suppliers with high eigenvector centrality, were the most influential predictors. Traditional red‑flag variables contributed additional predictive power, especially for contracts awarded through competitive tender processes.
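SHAP attributes a prediction to individual features using Shapley values. The sketch below computes those values exactly by brute force for a toy three-feature scoring function, measured against a single reference contract. It does not use the SHAP library, and the feature names, weights, and inputs are hypothetical and are not taken from the paper's model.

```python
# Brute-force Shapley values for a toy risk score. Illustrative only:
# features, weights, and inputs are hypothetical.
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley value of each feature for predict(x), relative to
    predict(baseline). Features outside a coalition take baseline values."""
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_risk_score(features):
    """Hypothetical linear score over (in_core, supplier_eigenvector, single_source)."""
    in_core, supplier_eigenvector, single_source = features
    return 0.5 * in_core + 0.3 * supplier_eigenvector + 0.1 * single_source


contract = [1.0, 0.8, 1.0]    # contract being explained
reference = [0.0, 0.2, 0.0]   # reference contract used as the baseline
contributions = shapley_values(toy_risk_score, contract, reference)

for name, value in zip(["in_core", "supplier_eigenvector", "single_source"],
                       contributions):
    print(f"{name}: {value:+.2f}")
```

The contributions sum to the difference between the contract's score and the reference score, which is the property that makes Shapley-based rankings of feature influence easy to read. Enumerating every coalition is only practical for a handful of features; SHAP relies on faster approximations and model-specific algorithms.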
Implications and Adaptability
The proposed approach offers law‑enforcement agencies in Mexico a data‑driven tool to prioritize investigations of high‑risk contracts. Moreover, the methodology is designed to be transferable to other national contexts where comparable procurement and sanction data are accessible.
Future Directions
The authors suggest expanding the model to incorporate temporal dynamics of contract networks and exploring semi‑supervised techniques that could further mitigate the scarcity of labeled negative examples. Ongoing validation with real‑world enforcement outcomes will be essential to refine predictive accuracy.
This report is based on the abstract of a research paper posted to arXiv (Academic Preprint / Open Access). The full text is available via arXiv.