LogicLens Unveiled as Unified Framework for Detecting Text-Centric Forgeries
Researchers have unveiled LogicLens, a unified visual-textual co-reasoning framework designed to detect sophisticated text-centric forgeries that exploit recent advances in AI-generated content. Rather than handling them separately, the system treats detection, grounding (localizing the manipulated text region), and explanation as a single joint task, with the aim of improving both accuracy and reliability.
Background
The proliferation of AI-generated content has increased the prevalence of forgeries that manipulate text embedded in images, posing risks to information authenticity. Existing analysis tools often rely on coarse visual cues and treat detection, grounding, and explanation as separate tasks, limiting their effectiveness against nuanced attacks.
Framework Overview
LogicLens addresses these gaps through a Cross-Cues-aware Chain of Thought (CCT) mechanism, which iteratively cross-validates visual cues against textual logic. A weighted multi-task reward function guides optimization with GRPO (Group Relative Policy Optimization), a reinforcement-learning method, and is intended to balance performance across the detection, grounding, and explanation sub-tasks.
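The abstract does not give the reward formula. As a minimal sketch, assuming each sub-task yields a score in [0, 1] and the three scores are combined linearly, the weighted reward and the group-relative advantage that GRPO computes over several responses sampled for the same input could look like this. The class and function names, the weights, and the example scoring rules in the comments are hypothetical; only the group normalization follows the general GRPO recipe, and the clipped policy-gradient objective and KL penalty of full GRPO are omitted.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SubRewards:
    """Hypothetical per-sample scores for the three sub-tasks, each in [0, 1]."""
    detection: float    # e.g. 1.0 if the real/forged label is correct, else 0.0
    grounding: float    # e.g. IoU between predicted and ground-truth tampered region
    explanation: float  # e.g. a similarity score against a reference explanation


# Hypothetical weights: the abstract does not report LogicLens's actual values.
WEIGHTS = {"detection": 0.4, "grounding": 0.4, "explanation": 0.2}


def weighted_reward(r: SubRewards, w: dict[str, float] = WEIGHTS) -> float:
    """Collapse the three sub-task scores into one scalar reward."""
    return (w["detection"] * r.detection
            + w["grounding"] * r.grounding
            + w["explanation"] * r.explanation)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each sampled response's reward
    by the mean and standard deviation of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four responses sampled for the same input image.
group = [SubRewards(1.0, 0.8, 0.6), SubRewards(0.0, 0.1, 0.3),
         SubRewards(1.0, 0.5, 0.7), SubRewards(0.0, 0.0, 0.2)]
advantages = group_relative_advantages([weighted_reward(s) for s in group])
```

Because advantages are relative within a group, a response is rewarded only for beating its siblings, so no separate value network is needed; the weights then decide how much a gain in grounding counts against a gain in explanation quality.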
Annotation Pipeline
To support training, the authors introduced the PR² (Perceiver, Reasoner, Reviewer) pipeline, a hierarchical multi‑agent system that generates high‑quality, cognitively aligned annotations. The pipeline iteratively refines outputs, producing detailed explanations, pixel‑level segmentations, and authenticity labels.
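A rough sketch of a Perceiver, Reasoner, Reviewer loop, assuming the three agents are plain callables and that the Reviewer either approves an annotation or returns feedback for another round. The `Annotation` and `Review` types, the `pr2_annotate` function, and the `max_rounds` cap are hypothetical; the abstract does not describe the agents' interfaces or the stopping rule.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any


@dataclass
class Annotation:
    label: str              # authenticity label, e.g. "real" or "forged"
    explanation: str        # natural-language rationale
    mask: list[list[int]]   # binary pixel mask, 1 = tampered pixel


@dataclass
class Review:
    approved: bool
    feedback: str = ""      # what the Reviewer wants changed, if not approved


def pr2_annotate(
    image: Any,
    perceive: Callable[[Any], list[str]],
    reason: Callable[[Any, list[str], str], Annotation],
    review: Callable[[Any, Annotation], Review],
    max_rounds: int = 3,
) -> tuple[Annotation, bool]:
    """Hypothetical loop: the Perceiver extracts cues once, the Reasoner drafts
    an annotation, and the Reviewer accepts it or sends feedback that the
    Reasoner uses in the next round. Returns the last draft and whether it
    was approved."""
    cues = perceive(image)
    feedback = ""
    for _ in range(max(1, max_rounds)):  # always run at least one round
        annotation = reason(image, cues, feedback)
        verdict = review(image, annotation)
        if verdict.approved:
            return annotation, True
        feedback = verdict.feedback
    return annotation, False
```

Returning the approval flag lets a caller discard or flag samples that never pass review; whether PR² does this is not stated in the abstract.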
Dataset Release
Using PR², the team assembled RealText, a dataset of 5,397 images containing fine‑grained annotations, including textual explanations, segmentation masks, and ground‑truth authenticity tags. The dataset is intended to facilitate research on visual‑textual forgery detection.
Performance Evaluation
Extensive experiments show that LogicLens outperforms existing methods. In a zero-shot evaluation on the T-IC13 benchmark, it exceeds a specialized framework by 41.4% and GPT-4o by 23.4% in macro-average F1 score. On the dense-text T-SROIE dataset, LogicLens achieves the highest mF1, CSS, and macro-average F1 scores among the multimodal large language model (MLLM) approaches compared.
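For reference, macro-average F1 computes an F1 score separately for each class and takes their unweighted mean, so a rare class counts as much as a common one. A minimal sketch for a binary real/forged setting follows; the label strings are assumptions. The abstract does not define mF1 or CSS, so those metrics are not sketched, and it does not say whether the 41.4% and 23.4% margins are absolute points or relative gains.

```python
from collections.abc import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 cls: Hashable) -> float:
    """One-vs-rest F1 score for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no true positives: F1 is taken as 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
             classes: Sequence[Hashable] = ("real", "forged")) -> float:
    """Unweighted mean of per-class F1, so each class counts equally."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


y_true = ["real", "real", "forged", "forged"]
y_pred = ["real", "forged", "forged", "forged"]
score = macro_f1(y_true, y_pred)  # (2/3 + 4/5) / 2 = 11/15, about 0.733
```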
Implications and Availability
The authors plan to release the RealText dataset, the LogicLens model, and associated code under an open‑access license, encouraging broader community validation and further development of robust forgery detection tools.
This report is based on the abstract of the research paper posted on arXiv (Academic Preprint / Open Access); the full text is available via arXiv.