Attention-Based Explainability for Cross-Lingual Neural Information Retrieval in Low-Resource Languages: A LaBSE Framework with a Bilingual Medical Context Case Study

Nasir Hamzah; Zico Pratama Putra

doi:10.47852/bonviewAIA62027530

Authors

Nasir Hamzah Faculty of Information Technology, Universitas Nusa Mandiri, Indonesia https://orcid.org/0009-0008-8796-4115
Zico Pratama Putra Faculty of Information Technology, Universitas Nusa Mandiri, Indonesia https://orcid.org/0000-0003-3465-921X

DOI:

https://doi.org/10.47852/bonviewAIA62027530

Keywords:

attention mechanisms, cross-lingual information retrieval, ICD-10 code retrieval, low-resource languages, neural IR

Abstract

The growing demand for equitable access to medical knowledge in multilingual contexts highlights a critical gap: traditional information retrieval (IR) systems perform poorly in low-resource languages, limiting access to medical expertise. Neural IR models, while effective, often lack interpretability, a serious concern in clinical applications such as ICD-10 code retrieval. Existing cross-lingual IR systems for low-resource languages are similarly constrained by limited task-specific tuning and the absence of real-time user feedback. This paper proposes an attention-based explainable framework for cross-lingual neural IR, specifically designed for bilingual ICD-10 code retrieval in Indonesian and English. The framework leverages Language-agnostic BERT Sentence Embedding (LaBSE), which is well suited to low-resource settings, combined with an attention-based explainability mechanism that highlights token-level contributions, making the model's decisions more transparent and interpretable. The system is deployed as an interactive web-based application, demonstrating practical usability in clinical contexts. On the Indonesian MIRACL dataset, the framework achieves a mean reciprocal rank (MRR) of 0.782 and a Recall@5 of 91.4% without fine-tuning. After domain-specific adaptation on the Master ICD-10 dataset, performance improves to an MRR of 0.9352 and a Recall@5 of 95.73%, surpassing strong baselines. These results demonstrate the viability of trustworthy, interpretable, and multilingual IR systems for healthcare in low-resource settings.
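
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how MRR and Recall@k are commonly computed. It is not the authors' evaluation code: the function names, the toy rankings, and the ICD-10-style labels are all hypothetical, and Recall@k is assumed here to be the fraction of a query's relevant items found in the top k, averaged over queries (with one relevant code per query this reduces to a hit rate).

```python
from typing import Hashable, Sequence, Set


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs, judgments) -> float:
    """Average the reciprocal rank over all queries."""
    scores = [reciprocal_rank(r, j) for r, j in zip(runs, judgments)]
    return sum(scores) / len(scores)


def recall_at_k(runs, judgments, k: int = 5) -> float:
    """Average, over queries, the share of relevant items found in the top k."""
    scores = [len(set(r[:k]) & j) / len(j) for r, j in zip(runs, judgments)]
    return sum(scores) / len(scores)


# Hypothetical toy data: one ranked list and one relevant set per query.
runs = [
    ["A00", "A01", "B20"],  # relevant item at rank 1
    ["J45", "J44", "I10"],  # relevant item at rank 3
    ["E11", "E10", "K29"],  # relevant item not retrieved
]
judgments = [{"A00"}, {"I10"}, {"Z99"}]

mrr = mean_reciprocal_rank(runs, judgments)
recall_2 = recall_at_k(runs, judgments, k=2)
recall_5 = recall_at_k(runs, judgments, k=5)
print(f"MRR={mrr:.4f} Recall@2={recall_2:.4f} Recall@5={recall_5:.4f}")
```

On this toy input the reciprocal ranks are 1, 1/3, and 0, so the MRR is 4/9; the numbers have no relation to the results reported in the abstract.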

Received: 31 August 2025 | Revised: 3 February 2026 | Accepted: 24 April 2026

Conflicts of Interest

The authors declare that they have no conflicts of interest to this work.

Data Availability Statement

The data that support the findings of this study are openly available in GitHub at https://github.com/fendis0709/icd-10, and in Hugging Face at https://huggingface.co/datasets/miracl/miracl.

Author Contribution Statement

Nasir Hamzah: Conceptualization, Methodology, Software, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization. Zico Pratama Putra: Conceptualization, Methodology, Validation, Writing – review & editing, Supervision.
