Study Reveals Vision Language Models Leak More Personal Data for High‑Visibility Individuals
Global: Evaluation of Vision Language Model PII Leakage Across Online Visibility
Researchers have introduced a new benchmark, PII-VisBench, to assess how Vision Language Models (VLMs) handle personally identifiable information (PII) when queried about individuals with varying levels of online presence. The study, posted on arXiv, evaluates 18 open‑source VLMs ranging from 0.3 billion to 32 billion parameters and measures both the models’ refusal to answer and the incidence of unintended PII disclosure.
Benchmark Design
PII-VisBench comprises 4,000 unique probing queries distributed across 200 distinct subjects. Each subject is classified into one of four visibility categories—high, medium, low, or zero—based on the quantity and nature of information publicly available about them online.
Evaluation Methodology
The authors assess models using two primary metrics: the Refusal Rate, which captures the percentage of queries the model declines to answer, and the Conditional PII Disclosure Rate, which records the fraction of non‑refusal responses that contain PII. All experiments are conducted under consistent prompting conditions to isolate the effect of subject visibility.
Key Findings
Across the evaluated models, refusal rates rise and conditional disclosure rates fall as subject visibility declines. Specifically, the Refusal Rate drops from 9.10 % for high‑visibility subjects to 5.34 % for low‑visibility subjects, indicating that models are more likely to provide PII when more data exists online.
Model Variability
Significant heterogeneity emerges among model families, with some architectures exhibiting markedly higher disclosure rates than others. Additionally, the type of PII—such as names, locations, or contact details—affects the likelihood of leakage, suggesting uneven privacy protection across data categories.
Prompt Engineering Vulnerabilities
The study also demonstrates that paraphrasing and jailbreak‑style prompts can bypass model safeguards, leading to increased PII exposure. These findings underscore the susceptibility of VLMs to adversarial prompting techniques.
Recommendations for Future Work
Authors advocate for visibility‑aware safety evaluations and targeted training interventions that consider a subject’s online footprint. Incorporating such considerations may improve model robustness against privacy‑related attacks.
This report is based on information from arXiv, licensed under Academic Preprint / Open Access. Based on the abstract of the research paper. Full text available via ArXiv.
Ende der Übertragung