A multidisciplinary analysis of transparent AI-driven toxicity detection tools for civic engagement platforms

Toxic speech on online civic engagement platforms (CEPs) disproportionately affects marginalized groups and threatens the diversity of citizen voices. However, the deployment of AI-driven toxic speech detection (TSD) tools for CEPs faces complex challenges from legal, psychological, and technical perspectives that remain insufficiently explored. We present a first-of-its-kind interdisciplinary review of these challenges, focusing on the explainability of TSD systems and their compliance with European legal standards, and offer a roadmap for ethical deployment. Our review reveals three main findings. First, although transparency in AI decision-making is necessary from both legal and psychological perspectives, assessing the explainability of AI-driven TSD tools and their compliance with European legal regulations remains a significant challenge. Second, current explainability approaches, ranging from toxic span identification to advanced explainable AI methods, lack standardized metrics, making it difficult to assess their reliability and appropriateness for CEPs. Third, despite the importance of TSD, frameworks and best practices for CEPs are still lacking in the existing literature. This paper aims to fill this gap by providing a holistic perspective on the challenges and solutions for TSD deployment. It lays the foundation for collaborative efforts to develop and standardize metrics, evaluation protocols, and best practices that can ensure AI decisions in CEPs are transparent, accountable, and aligned with users’ needs.

Zangl, M., Loi, I., Zachos, P., Bedek, M., Dimogerontakis, E., Nikolaou, C.-E., Albert, D., & Moustakas, K. (2025). A multidisciplinary analysis of transparent AI-driven toxicity detection tools for civic engagement platforms. AI & Society, 1–18. https://doi.org/10.1007/s00146-025-02424-5