LogicLens: A Unified Framework for Text-Centric Forgery Detection

LogicLens Introduces Unified Visual-Textual Co-Reasoning Framework for Text-Centric Forgery Detection

A team of researchers has unveiled LogicLens, a unified framework that integrates detection, grounding, and explanation of text‑centric forgeries into a single visual‑textual co‑reasoning task. The work was posted to the arXiv preprint server in December 2025 and targets the growing security challenges posed by advanced AI‑generated content. By reformulating multiple sub‑tasks as a joint objective, the authors aim to improve holistic performance and reasoning capabilities. The study also introduces a new dataset, RealText, to support training and evaluation. According to the authors, the approach seeks to strengthen information authenticity across digital media.

Unified Task Formulation

LogicLens reconceptualizes forgery analysis by treating detection, grounding, and explanatory reasoning as interrelated components of a single task. This contrasts with prior pipelines that handle each step independently, potentially missing cross‑modal cues that could enhance overall accuracy.

Cross‑Cues‑Aware Chain of Thought Mechanism

The core of the framework is the Cross‑Cues‑aware Chain of Thought (CCT) mechanism, which iteratively cross‑validates visual signals against textual logic. The authors describe CCT as enabling deep reasoning that aligns image regions with corresponding textual inconsistencies, thereby improving the model’s ability to pinpoint fabricated content.

Weighted Multi‑Task Reward and GRPO Optimization

To harmonize the objectives of detection, grounding, and explanation, the researchers propose a weighted multi‑task reward function optimized via Group Relative Policy Optimization (GRPO), a reinforcement learning method that scores each sampled response relative to the other responses in its group. This design seeks to balance performance across all sub‑tasks while preventing any single objective from dominating training dynamics.
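The idea of a weighted multi‑task reward can be illustrated with a minimal sketch. It assumes that GRPO here refers to the group‑relative formulation, in which rewards are normalized within a group of responses sampled for the same input. The weights, the per‑task scores, and the function names below are hypothetical; the source does not state the actual reward terms or their values.

```python
from statistics import mean, pstdev

# Hypothetical weights; the source does not give the actual values.
WEIGHTS = {"detection": 0.4, "grounding": 0.3, "explanation": 0.3}


def weighted_reward(scores, weights=WEIGHTS):
    """Combine per-task scores (each assumed to lie in [0, 1]) into one scalar."""
    return sum(weights[task] * scores[task] for task in weights)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same input.

    Uses the population standard deviation; this choice is an assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical responses sampled for the same image.
group = [
    {"detection": 1.0, "grounding": 0.8, "explanation": 0.6},
    {"detection": 1.0, "grounding": 0.2, "explanation": 0.5},
    {"detection": 0.0, "grounding": 0.0, "explanation": 0.3},
]
rewards = [weighted_reward(scores) for scores in group]
advantages = group_relative_advantages(rewards)
```

In this sketch, a response that detects the forgery but localizes it poorly still earns less than one that does both well, which is one way a weighted sum can keep a single objective from dominating.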

PR² Pipeline for High‑Quality Annotations

The paper introduces the PR² (Perceiver, Reasoner, Reviewer) pipeline, a hierarchical multi‑agent system that generates cognitively aligned annotations. According to the authors, this pipeline produces fine‑grained labels, including pixel‑level segmentation and textual explanations, which are essential for training the LogicLens model.

RealText Dataset Overview

RealText, the dataset accompanying the study, comprises 5,397 images annotated with authenticity labels, segmentation masks, and explanatory text. The authors emphasize the dataset’s diversity, covering a range of real‑world scenarios where text appears within complex visual contexts.
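As a rough illustration of how the three annotation types could sit together in one record, the sketch below defines a hypothetical container. The class name, field names, and mask encoding are assumptions made for this example and do not reflect the released RealText format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RealTextRecord:
    """Hypothetical record layout; not the dataset's actual schema."""

    image_path: str
    is_forged: bool                      # authenticity label
    mask: List[List[int]] = field(default_factory=list)  # 1 = tampered pixel
    explanation: str = ""                # free-text rationale

    def forged_pixel_ratio(self) -> float:
        """Fraction of pixels marked as tampered in the mask."""
        total = sum(len(row) for row in self.mask)
        if total == 0:
            return 0.0
        return sum(sum(row) for row in self.mask) / total


record = RealTextRecord(
    image_path="example.png",  # placeholder path
    is_forged=True,
    mask=[[0, 1], [1, 1]],
    explanation="The total does not match the sum of the listed items.",
)
ratio = record.forged_pixel_ratio()
```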

Experimental Results

The authors report that LogicLens outperforms existing methods on multiple benchmarks. In a zero‑shot evaluation on the T‑IC13 benchmark, the model surpasses a specialized forgery detection framework by 41.4% and GPT‑4o by 23.4% in macro‑average F1 score. On the dense‑text T‑SROIE dataset, LogicLens is reported to lead other multimodal large language model methods in macro‑average F1, CSS, and mF1. The authors attribute these gains to the integrated reasoning and the enriched annotation pipeline.
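For readers unfamiliar with the headline metric, macro‑average F1 is the unweighted mean of the per‑class F1 scores, so the "real" and "forged" classes count equally regardless of how many samples each has. The sketch below computes it on made‑up labels; the data is illustrative only and is not taken from the paper.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score for a single class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes that appear."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Illustrative labels only.
y_true = ["real", "real", "forged", "forged", "forged"]
y_pred = ["real", "forged", "forged", "forged", "real"]
score = macro_f1(y_true, y_pred)
```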

Future Directions and Availability

The researchers indicate that the LogicLens code, model weights, and the RealText dataset will be released publicly to facilitate further research. They anticipate that the unified approach could be extended to other multimodal forgery detection scenarios beyond text‑centric cases.

This report is based on the abstract of a research paper posted to arXiv (Academic Preprint / Open Access). The full text is available via arXiv.
