Large Language Models Fail to Recognize Ambiguous Human Names in Privacy Tasks

Global: Study Finds Large Language Models Miss Ambiguous Human Names in Privacy Tasks

Researchers report that large language models (LLMs) employed in privacy pipelines frequently fail to recognize ambiguous human names, resulting in a notable decline in the detection of personally identifiable information (PII). The findings stem from an analysis of how LLMs handle short text snippets that contain names with multiple possible entity interpretations.

Background

LLMs have become integral to automated privacy solutions, where they are tasked with identifying and redacting sensitive data such as human names. The underlying assumption is that these models can reliably distinguish names from other text elements, a premise now called into question by the new study.

Benchmark Creation

To evaluate this assumption, the authors introduced AmBench, a benchmark comprising more than 12,000 real but ambiguous human names. Each name appears in dozens of concise snippets designed to be compatible with several entity types, exposing linguistic cues that can mislead the models.

Evaluation Results

Testing twelve state‑of‑the‑art LLMs against AmBench revealed a recall drop of 20 % to 40 % compared with more recognizable names. This performance gap indicates that current LLM‑based privacy tools may leave a substantial portion of ambiguous names unprotected.

Effect of Prompt Injections

The study also examined the impact of benign prompt injections—instruction‑like user texts that can cause models to conflate data with commands. In the presence of such prompts, the LLM‑powered enterprise tool Clio, developed by Anthropic AI to extract privacy‑preserving insights from conversations with Claude, ignored ambiguous names up to four times more often.

Implications for Fairness

These results raise concerns about uneven privacy protection driven by linguistic properties of names. The authors argue that the disparity could translate into fairness issues, as certain demographic groups may be disproportionately affected by name ambiguity.

Recommendations

The paper calls for a systematic investigation into the failure modes of LLM‑based privacy solutions and the development of countermeasures to mitigate blind spots. Future work is urged to address both technical robustness and equitable privacy enforcement.

This report is based on information from arXiv, licensed under Academic Preprint / Open Access. Based on the abstract of the research paper. Full text available via ArXiv.

Study Finds Large Language Models Miss Ambiguous Human Names in Privacy Tasks