02.02.2026 • 05:25 • Research & Innovation

Study Calls for Model-Centric Privacy Evaluation of Synthetic Tabular Data

A new research paper posted on arXiv in January 2026 examines how synthetic tabular data generated by machine‑learning models may still expose personal information when the underlying model is accessible for querying. The authors argue that privacy assessments must move beyond evaluating a single released dataset and instead consider the capabilities of the generative model itself, aligning the analysis with the European Union’s General Data Protection Regulation (GDPR). Their goal is to provide a framework that regulators, developers, and data custodians can use to gauge identifiability risks more accurately.

Limitations of Dataset‑Centric Anonymity

Current privacy evaluations often treat synthetic data as an isolated product, measuring anonymity only at the level of the released dataset. This approach overlooks scenarios in which the trained model is deployed as a service or made available for interaction, a situation increasingly common in commercial and research settings.

Adopting a Model‑Centric Perspective

The paper proposes a shift toward model‑centric privacy analysis, emphasizing that the risk profile depends on the attacker’s access to the generative model and the types of queries it can answer. By grounding assessments in state‑of‑the‑art privacy attacks—such as membership inference and reconstruction attacks—the authors demonstrate how model access can amplify identifiability threats.
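To make the threat concrete, the sketch below implements a simple Monte Carlo membership-inference test against a queryable generator. The sampling interface, the toy generator, and the distance threshold are illustrative assumptions for this article, not the paper's actual experimental setup: the attacker repeatedly queries the model for synthetic records and checks how much probability mass falls near a target record.

```python
import numpy as np

def mc_membership_score(sample_fn, record, eps=0.05, n_queries=5000):
    """Monte Carlo membership-inference score for a queryable
    generative model. `sample_fn` is a hypothetical query interface
    assumed for illustration. Returns the fraction of generated
    samples falling within an eps-ball around `record`; a score far
    above that of reference records suggests the record was in the
    training data."""
    samples = sample_fn(n_queries)
    dists = np.linalg.norm(samples - record, axis=1)
    return float(np.mean(dists <= eps))

rng = np.random.default_rng(0)
member = np.array([0.7, 0.2, 0.9])      # record the toy model memorised
non_member = np.array([0.1, 0.8, 0.3])  # comparable outside record

def leaky_sampler(n):
    # Toy generator: half its samples hover around the memorised
    # training record, half are generic noise over the domain.
    memorised = member + 0.02 * rng.normal(size=(n // 2, 3))
    generic = rng.uniform(size=(n - n // 2, 3))
    return np.vstack([memorised, generic])

print(mc_membership_score(leaky_sampler, member))      # ~0.45: flagged
print(mc_membership_score(leaky_sampler, non_member))  # ~0.0
```

A dataset-centric audit of one released table would not observe this signal, since it only emerges from repeated queries to the model, which is precisely the gap the authors highlight.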

GDPR Interpretation Under Model Access

Interpreting GDPR definitions of personal data and anonymization through the lens of model accessibility, the authors identify specific risk categories that must be mitigated. They map these risks to concrete threat models, illustrating how the regulation’s intent to protect individuals can be compromised when synthetic data systems expose model parameters or respond to unrestricted queries.

Comparing Privacy Mechanisms

The study contrasts two prevalent mechanisms used alongside synthetic data: Differential Privacy (DP) and Similarity‑Based Privacy Metrics (SBPMs). While DP is shown to provide quantifiable guarantees that can limit identifiability across a range of attacks, the authors contend that SBPMs lack rigorous safeguards and may give a false sense of security.
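The contrast can be illustrated with two toy mechanisms. The following sketch is written under our own simplifying assumptions rather than the paper's evaluation code: it pairs a Laplace mechanism, whose epsilon-DP guarantee is quantifiable and attacker-independent, with a distance-to-closest-record (DCR) check, a typical SBPM that a near-memorising generator can still pass.

```python
import numpy as np

rng = np.random.default_rng(1)

def laplace_count(true_count, epsilon, sensitivity=1.0):
    # Laplace mechanism: an epsilon-DP answer to a counting query.
    # The guarantee holds against any attacker, whatever auxiliary
    # knowledge or query strategy they bring.
    return true_count + rng.laplace(scale=sensitivity / epsilon)

def dcr(synthetic, real):
    # Similarity-based metric: median distance from each synthetic
    # row to its closest real row. Clearing a DCR threshold is often
    # read as "private", but it carries no formal guarantee.
    d = np.linalg.norm(synthetic[:, None, :] - real[None, :, :], axis=2)
    return float(np.median(d.min(axis=1)))

real = rng.uniform(size=(200, 3))
# A near-memorising generator: it copies 50 real rows and perturbs
# them slightly. The perturbation can push the DCR above a naive
# threshold even though the copies still point to real individuals.
synthetic = real[:50] + 0.1 * rng.normal(size=(50, 3))

print(f"DCR of copied-and-perturbed rows: {dcr(synthetic, real):.3f}")
print(f"epsilon=1.0 DP count release:     {laplace_count(42, 1.0):.1f}")
```

The numbers make the asymmetry visible: the DCR score can look acceptable for data that is little more than a perturbed copy, while the DP release degrades gracefully and predictably as epsilon shrinks.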

Implications for Stakeholders

Concluding that synthetic data techniques alone do not ensure sufficient anonymization, the authors recommend integrating robust DP safeguards when models are exposed. They also call for regulatory guidance that reflects model‑centric threat assessments, enabling more responsible deployment of synthetic data solutions in both research and industry.

This report is based on information from arXiv, an open-access academic preprint repository. It draws on the abstract of the research paper; the full text is available via arXiv.
