Attention-Based Explainability for Cross-Lingual Neural Information Retrieval in Low-Resource Languages: A LaBSE Framework with a Bilingual Medical Context Case Study
DOI:
https://doi.org/10.47852/bonviewAIA62027530
Keywords:
attention mechanisms, cross-lingual information retrieval, ICD-10 code retrieval, low-resource languages, neural IR
Abstract
The growing demand for equitable access to medical knowledge in multilingual contexts highlights a critical gap: traditional information retrieval (IR) systems perform poorly in low-resource languages, limiting access to medical expertise. Neural IR models, while effective, often lack interpretability, a serious concern in clinical applications such as ICD-10 code retrieval. Existing cross-lingual IR systems for low-resource languages are further constrained by limited task-specific tuning and the absence of real-time user feedback. This paper proposes an attention-based explainable framework for cross-lingual neural IR, designed for bilingual ICD-10 code retrieval in Indonesian and English. The framework combines Language-agnostic BERT Sentence Embedding (LaBSE), which is well suited to low-resource settings, with an attention-based explainability mechanism that highlights token-level contributions, making the model's decisions more transparent and interpretable. The system is deployed as an interactive web-based application, demonstrating practical usability in clinical contexts. On the Indonesian MIRACL dataset, the framework achieves a mean reciprocal rank (MRR) of 0.782 and a Recall@5 of 91.4% without fine-tuning. After domain-specific adaptation on the Master ICD-10 dataset, performance improves to an MRR of 0.9352 and a Recall@5 of 95.73%, surpassing strong baselines. These results demonstrate the viability of trustworthy, interpretable, and multilingual IR systems for healthcare in low-resource settings.
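For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how MRR and Recall@k are commonly computed. It is not taken from the paper: the function names, the toy rankings, and the choice of ICD-10 category codes are illustrative assumptions, and the authors' exact evaluation protocol may differ. Recall@k is implemented here in its usual form (the share of a query's relevant items found in the top k, averaged over queries), which reduces to a hit rate when each query has exactly one correct code.

```python
from typing import Hashable, List, Sequence, Set, Tuple

# One evaluation run: (ranked candidates, set of relevant items) per query.
Run = Tuple[Sequence[Hashable], Set[Hashable]]


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: List[Run]) -> float:
    """Average the reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def recall_at_k(runs: List[Run], k: int = 5) -> float:
    """Average, over queries, the fraction of relevant items found in the top k."""
    total = 0.0
    for ranked, relevant in runs:
        found = len(set(ranked[:k]) & relevant)
        total += found / len(relevant)
    return total / len(runs)


# Toy data (assumed for illustration only, not results from the paper).
runs: List[Run] = [
    (["E11", "E10", "I10"], {"E11"}),  # correct code ranked 1st
    (["J45", "I10", "E11"], {"I10"}),  # correct code ranked 2nd
    (["E11", "J45", "E10"], {"I10"}),  # correct code not retrieved
]

mrr = mean_reciprocal_rank(runs)
recall_2 = recall_at_k(runs, k=2)
print(f"MRR = {mrr:.3f}, Recall@2 = {recall_2:.3f}")
```

In this toy run the three reciprocal ranks are 1, 1/2, and 0, so the MRR is 0.5, and two of the three queries place the correct code in the top two results.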
Received: 31 August 2025 | Revised: 3 February 2026 | Accepted: 24 April 2026
Conflicts of Interest
The authors declare that they have no conflicts of interest to this work.
Data Availability Statement
The data that support the findings of this study are openly available on GitHub at https://github.com/fendis0709/icd-10 and on Hugging Face at https://huggingface.co/datasets/miracl/miracl.
Author Contribution Statement
Nasir Hamzah: Conceptualization, Methodology, Software, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization. Zico Pratama Putra: Conceptualization, Methodology, Validation, Writing – review & editing, Supervision.
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.