SafeRedir Enables Inference‑Time Unlearning for Image Generation Models
Background and Motivation
Researchers have introduced SafeRedir, an inference‑time framework designed to remove unsafe concepts from image generation models without altering the underlying networks. The approach targets safety and compliance risks associated with the inadvertent reproduction of not‑safe‑for‑work (NSFW) imagery and copyrighted artistic styles.
Limitations of Existing Solutions
Current mitigation strategies, including post‑hoc filtering and model‑level unlearning, often require costly retraining, can degrade the quality of benign outputs, and may fail when prompts are paraphrased or adversarially crafted. SafeRedir seeks to overcome these shortcomings by operating solely at inference.
SafeRedir Architecture
The system comprises two core components: a latent‑aware multimodal safety classifier that detects unsafe generation trajectories, and a token‑level delta generator that applies precise semantic redirections. Auxiliary predictors support token masking and adaptive scaling to localize interventions.
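The interplay of these components can be sketched as a single guarded inference step. This is a hypothetical wiring, not the authors' implementation: the callables `classify`, `gen_deltas`, `predict_mask`, and `predict_scale` are illustrative stand‑ins for the safety classifier, delta generator, and auxiliary predictors described above.

```python
import numpy as np

def saferedir_step(token_embs, latents,
                   classify, gen_deltas, predict_mask, predict_scale):
    """One guarded inference step (hypothetical wiring of the components).

    classify(latents, embs) -> bool : latent-aware safety classifier
    gen_deltas(embs)   -> (seq, dim): token-level delta generator
    predict_mask(embs) -> (seq,)    : auxiliary token-masking predictor
    predict_scale(embs)-> (seq,)    : auxiliary adaptive-scaling predictor
    """
    if not classify(latents, token_embs):
        # Trajectory judged safe: pass embeddings through untouched.
        return token_embs
    mask = predict_mask(token_embs)    # localize which tokens to edit
    scale = predict_scale(token_embs)  # per-token intervention strength
    deltas = gen_deltas(token_embs)    # semantic redirection vectors
    # Only masked tokens are shifted; benign tokens are left unchanged.
    return token_embs + mask[:, None] * scale[:, None] * deltas
```

Because the function only reads and returns embeddings, it can wrap any text encoder output without touching the diffusion backbone, which is what makes the design plug‑and‑play.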
Operational Mechanism
During inference, SafeRedir intercepts token embeddings and introduces calculated deltas that steer the prompt toward semantically safe regions in the embedding space. This token‑level intervention occurs without modifying the diffusion model itself.
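One simple way to realize such a delta, shown here purely as an illustration of the idea rather than the paper's method, is to interpolate a flagged token embedding toward the embedding of a benign anchor concept. The `safe_anchor` vector and the `strength` parameter are assumptions of this sketch.

```python
import numpy as np

def redirect_toward_safe(token_emb, safe_anchor, strength=0.8):
    """Steer one flagged token embedding toward a safe region (illustrative).

    `safe_anchor` is assumed to be the embedding of a benign replacement
    concept; the delta moves the token partway toward it, so the rest of
    the prompt's semantics are preserved.
    """
    delta = safe_anchor - token_emb          # direction toward the safe region
    return token_emb + strength * delta      # partial interpolation
```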
Performance Evaluation
Empirical tests across several unlearning tasks show that SafeRedir effectively erases harmful concepts while preserving semantic fidelity and perceptual quality. The framework also demonstrates heightened resistance to adversarial attacks compared with prior methods.
Broader Applicability
Experiments indicate that SafeRedir generalizes across multiple diffusion backbones and integrates with previously unlearned models, confirming its plug‑and‑play compatibility and potential for wide adoption.
Open Access and Resources
The authors have made the code and dataset publicly available on GitHub, facilitating further research and implementation.
This report is based on the abstract of the research paper, available as an open‑access preprint via arXiv.