Korean Toxic Language Dataset Introduced to Enhance Deobfuscation and Detoxification
A team of researchers released a new Korean-language dataset, KOTOX, on arXiv in October 2025 to address the growing challenge of toxic content that is deliberately disguised. The dataset provides paired neutral, toxic, and obfuscated sentences, enabling models to learn both deobfuscation and detoxification. According to the abstract, the work aims to improve the robustness of detection systems without compromising performance on standard, non‑obfuscated text.
Background
Online platforms have increasingly struggled with toxic language, prompting extensive research into detection and mitigation techniques. Most existing studies focus on straightforward, non‑obfuscated text, which leaves a gap when users intentionally mask harmful expressions.
Linguistic Challenges in Korean
Korean’s agglutinative structure allows toxic expressions to be concealed by attaching particles or altering morphemes, creating patterns that are difficult for conventional models to recognize. This linguistic property has been largely unexplored in prior work, motivating the need for a dedicated dataset.
Dataset Construction
The authors categorized Korean obfuscation patterns into linguistically grounded classes and derived transformation rules from real‑world examples. Using these rules, they generated paired sentences: the original toxic statement, its obfuscated version, and a neutral rewrite. The resulting corpus supports simultaneous training for deobfuscation and detoxification tasks.
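The pairing procedure described above can be sketched in Python. The rules below are hypothetical, illustrative stand-ins (the abstract does not enumerate the actual KOTOX rule set); the jamo-decomposition rule uses standard Unicode normalization, since splitting Hangul syllables into their constituent jamo is one well-known way such text can be disguised.

```python
import unicodedata

# Hypothetical obfuscation rules for illustration only -- NOT the
# authors' actual linguistically grounded categories.

def jamo_split(text: str) -> str:
    """Decompose Hangul syllables into conjoining jamo via Unicode NFD.
    '한' (one code point) becomes three jamo, which can defeat naive
    string matching while remaining readable to humans."""
    return unicodedata.normalize("NFD", text)

def lookalike_substitution(text: str) -> str:
    """Swap visually similar Latin characters (toy rule)."""
    return text.replace("o", "0").replace("l", "1")

def make_pair(toxic: str, neutral: str, rules) -> dict:
    """Build one paired training example: neutral rewrite, original
    toxic sentence, and its rule-obfuscated variant."""
    obfuscated = toxic
    for rule in rules:
        obfuscated = rule(obfuscated)
    return {"neutral": neutral, "toxic": toxic, "obfuscated": obfuscated}

# Example with placeholder sentences:
pair = make_pair("나쁜 문장", "중립적인 문장", [jamo_split])
```

A detoxification model can then be trained on (obfuscated → neutral) pairs, while a deobfuscation model learns (obfuscated → toxic), which is what makes the three-way pairing useful for both tasks at once.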
Model Performance
Experimental results reported in the paper indicate that models trained on KOTOX handle obfuscated inputs more effectively than baseline systems, while maintaining comparable accuracy on clean text. The authors attribute this improvement to the explicit exposure to varied obfuscation patterns during training.
Implications and Future Directions
The dataset is positioned to facilitate research on robust toxic‑content moderation for Korean, potentially informing the development of safer large‑language models. The authors suggest that the methodology could be adapted to other agglutinative languages facing similar challenges.
Access and Resources
All code and data associated with KOTOX are publicly available on GitHub at https://github.com/leeyejin1231/KOTOX, enabling researchers and practitioners to replicate and extend the findings.
This report is based on the abstract of the research paper, an open-access preprint whose full text is available via arXiv.