SafeRedir Enables Inference‑Time Unlearning for Image Generation Models
Background and Motivation
Researchers have introduced SafeRedir, an inference‑time framework designed to remove unsafe concepts from image generation models without altering the underlying networks. The approach targets safety and compliance risks associated with the inadvertent reproduction of not‑safe‑for‑work (NSFW) imagery and copyrighted artistic styles.
Limitations of Existing Solutions
Current mitigation strategies, including post‑hoc filtering and model‑level unlearning, often require costly retraining, can degrade the quality of benign outputs, and may fail when prompts are paraphrased or adversarially crafted. SafeRedir seeks to overcome these shortcomings by operating solely at inference.
SafeRedir Architecture
The system comprises two core components: a latent‑aware multimodal safety classifier that detects unsafe generation trajectories, and a token‑level delta generator that applies precise semantic redirections. Auxiliary predictors support token masking and adaptive scaling to localize interventions.
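The interplay of these components can be sketched as a single guarded inference step. This is a hypothetical wiring, not the authors' implementation: the callables `classify`, `gen_deltas`, `predict_mask`, and `predict_scale` are illustrative stand‑ins for the safety classifier, delta generator, and auxiliary predictors described above.

```python
import numpy as np

def saferedir_step(token_embs, latents,
                   classify, gen_deltas, predict_mask, predict_scale):
    """One guarded inference step (hypothetical wiring of the components).

    classify(latents, embs) -> bool : latent-aware safety classifier
    gen_deltas(embs)   -> (seq, dim): token-level delta generator
    predict_mask(embs) -> (seq,)    : auxiliary token-masking predictor
    predict_scale(embs)-> (seq,)    : auxiliary adaptive-scaling predictor
    """
    if not classify(latents, token_embs):
        # Trajectory judged safe: pass embeddings through untouched.
        return token_embs
    mask = predict_mask(token_embs)    # localize which tokens to edit
    scale = predict_scale(token_embs)  # per-token intervention strength
    deltas = gen_deltas(token_embs)    # semantic redirection vectors
    # Only masked tokens are shifted; benign tokens are left unchanged.
    return token_embs + mask[:, None] * scale[:, None] * deltas
```

Because the function only reads and returns embeddings, it can wrap any text encoder output without touching the diffusion backbone, which is what makes the design plug‑and‑play.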
Operational Mechanism
During inference, SafeRedir intercepts token embeddings and introduces calculated deltas that steer the prompt toward semantically safe regions in the embedding space. This token‑level intervention occurs without modifying the diffusion model itself.
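One simple way to realize such a delta, shown here purely as an illustration of the idea rather than the paper's method, is to interpolate a flagged token embedding toward the embedding of a benign anchor concept. The `safe_anchor` vector and the `strength` parameter are assumptions of this sketch.

```python
import numpy as np

def redirect_toward_safe(token_emb, safe_anchor, strength=0.8):
    """Steer one flagged token embedding toward a safe region (illustrative).

    `safe_anchor` is assumed to be the embedding of a benign replacement
    concept; the delta moves the token partway toward it, so the rest of
    the prompt's semantics are preserved.
    """
    delta = safe_anchor - token_emb          # direction toward the safe region
    return token_emb + strength * delta      # partial interpolation
```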
Performance Evaluation
Empirical tests across several unlearning tasks show that SafeRedir effectively erases harmful concepts while preserving semantic fidelity and perceptual quality. The framework also demonstrates heightened resistance to adversarial attacks compared with prior methods.
Broader Applicability
Experiments indicate that SafeRedir generalizes across multiple diffusion backbones and integrates with previously unlearned models, confirming its plug‑and‑play compatibility and potential for wide adoption.
Open Access and Resources
The authors have made the code and dataset publicly available on GitHub, facilitating further research and implementation.
This report is based on the abstract of the research paper, available as an open‑access preprint via arXiv.