Hidden Prompts Can Manipulate AI Peer Review: Detection Method Proposed

Global: Study Shows Hidden Prompts Can Manipulate AI Peer Review, Proposes Detection Method

A recent preprint posted on arXiv details a novel vulnerability in the growing use of large language models (LLMs) for scientific peer review. The authors demonstrate that concealed prompts embedded within PDF files can steer LLM‑based reviewers toward overly favorable assessments, and they outline a countermeasure that leverages similar hidden triggers to expose AI‑generated reviews. The work, submitted in December 2025, aims to alert editors and researchers to the fragility of current evaluation workflows.

Background

LLMs such as ChatGPT have become common tools for drafting manuscripts, summarizing literature, and even assisting in the review process. While these applications can accelerate research and reduce manual effort, scholars have warned that reliance on automated systems may introduce fabricated findings, biased conclusions, or misinterpretations that could propagate through the scientific record.

Attack Vector

The preprint describes a technique whereby authors embed invisible trigger phrases or code snippets in a PDF. When an LLM processes the document, the hidden prompt can “jailbreak” the model, causing it to generate a review that emphasizes positive aspects and downplays critical feedback. The authors provide experimental evidence that the approach works across several publicly available LLMs used in review platforms.

Proposed Defense

To mitigate the threat, the researchers propose an “inject‑and‑detect” strategy. Editors would deliberately insert their own invisible prompts into submissions; if a subsequent review echoes or reacts to these triggers, the response can be flagged as likely generated by an LLM rather than a human reviewer. The paper outlines the design of such triggers, expected model behavior, and safeguards to prevent misuse of the detection system itself.

Implications for the Scientific Community

If adopted, the detection method could help journals preserve the integrity of peer review by providing a low‑cost tool to differentiate human and AI contributions. However, the authors caution that widespread deployment may also encourage adversaries to craft more sophisticated prompt‑hiding techniques, creating an ongoing arms race between attackers and defenders.

Ethical Considerations and Future Work

The authors emphasize that any deployment must respect author privacy and comply with existing ethical guidelines for AI use. They call for broader community engagement to refine the approach, evaluate its effectiveness across diverse disciplines, and develop standards for transparent AI assistance in scholarly publishing.

This report is based on information from arXiv, licensed under Academic Preprint / Open Access. Based on the abstract of the research paper. Full text available via ArXiv.

Study Shows Hidden Prompts Can Manipulate AI Peer Review, Proposes Detection Method

Background

Attack Vector

Proposed Defense

Implications for the Scientific Community

Ethical Considerations and Future Work

Data and Protocol

Privacy Protocol