Fine-Tuning IndoBERTa for Indonesian Digital News Sentiment Classification
DOI:
https://doi.org/10.47852/bonviewAIA62027506
Keywords:
IndoBERTa, sentiment analysis, domain adaptation, Indonesian NLP, transformer fine-tuning
Abstract
The rapid growth of Indonesian digital news content necessitates automated sentiment analysis systems capable of handling formal journalistic discourse, which differs substantially from the social media text used to train most existing sentiment classifiers. This study investigates domain adaptation for Indonesian news sentiment analysis by fine-tuning IndoBERTa, a RoBERTa-based model for Indonesian language processing. Using a structured data mining workflow inspired by the Cross-Industry Standard Process for Data Mining, we collected and manually annotated 1300 news articles from two major Indonesian news portals (Kompas and Detik) into positive, negative, and neutral categories. Zero-shot evaluation using a social-media-trained sentiment model yielded 14% accuracy, with over 89% of samples predicted as neutral, indicating a strong domain-induced classification bias. After fine-tuning, IndoBERTa achieved 98% accuracy, a Cohen’s kappa score of 0.98, and balanced F1-scores across all sentiment classes. While keyword-guided data collection and class balancing likely inflate absolute performance, the results clearly demonstrate the effectiveness of domain-specific adaptation for Indonesian news sentiment classification. The fine-tuned model is deployed in a real-time analysis pipeline and publicly released to support further research in Indonesian Natural Language Processing (NLP).
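The abstract reports both accuracy and Cohen's kappa, so a brief sketch of how the two relate may help. The function names and the confusion-matrix counts below are hypothetical illustrations, not the study's code or data; rows are assumed to hold true labels and columns predicted labels, in the order positive, neutral, negative.

```python
from typing import Sequence


def accuracy(confusion: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def cohens_kappa(confusion: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa: agreement corrected for chance agreement.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    (accuracy) and p_e is the agreement expected from the row and column
    marginals alone.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_o = accuracy(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical counts (NOT the paper's data): a balanced three-class test set
# where nearly every sample is classified correctly.
well_adapted = [
    [98, 1, 1],
    [1, 98, 1],
    [1, 1, 98],
]

# Hypothetical counts (NOT the paper's data): every sample predicted as the
# middle class, mimicking a model that collapses to "neutral".
collapsed_to_neutral = [
    [0, 10, 0],
    [0, 10, 0],
    [0, 10, 0],
]

if __name__ == "__main__":
    for name, cm in [("well adapted", well_adapted),
                     ("collapsed to neutral", collapsed_to_neutral)]:
        print(f"{name}: accuracy={accuracy(cm):.3f}, kappa={cohens_kappa(cm):.3f}")
```

In the collapsed case, observed agreement equals chance agreement, so kappa is zero even though accuracy is not. This is why kappa is a useful complement to accuracy when a model over-predicts a single class.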
Received: 31 August 2025 | Revised: 9 February 2026 | Accepted: 7 May 2026
Conflicts of Interest
The authors declare that they have no conflicts of interest related to this work.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Author Contribution Statement
Desi Masdin Dama: Conceptualization, Methodology, Software, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Visualization. Tati Mardiana: Conceptualization, Methodology, Validation, Writing – review & editing, Supervision. Riki Supriyadi: Conceptualization, Methodology, Validation, Supervision. Zico Pratama Putra: Conceptualization, Methodology, Validation, Writing – review & editing, Supervision. Dicky Octaviano: Validation, Investigation, Writing – review & editing. Achmad Bayhaqy: Software, Resources, Data curation.
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.