Fine-Tuning IndoBERTa for Indonesian Digital News Sentiment Classification

Desi Masdin Dama; Tati Mardiana; Riki Supriyadi; Zico Pratama Putra; Dicky Octaviano; Achmad Bayhaqy

doi:10.47852/bonviewAIA62027506

Authors

Desi Masdin Dama, Department of Data Science, Universitas Nusa Mandiri, Indonesia
Tati Mardiana, Department of Data Science, Universitas Nusa Mandiri, Indonesia, https://orcid.org/0000-0001-7618-1766
Riki Supriyadi, Department of Data Science, Universitas Nusa Mandiri, Indonesia, https://orcid.org/0000-0002-2628-9688
Zico Pratama Putra, Faculty of Information Technology, Universitas Nusa Mandiri, Indonesia, https://orcid.org/0000-0003-3465-921X
Dicky Octaviano, Faculty of Management, Universitas Bina Sarana Informatika, Indonesia, https://orcid.org/0009-0007-5999-7692
Achmad Bayhaqy, Department of Data Science, Universitas Nusa Mandiri, Indonesia

DOI:

https://doi.org/10.47852/bonviewAIA62027506

Keywords:

IndoBERTa, sentiment analysis, domain adaptation, Indonesian NLP, transformer fine-tuning

Abstract

The rapid growth of Indonesian digital news content necessitates automated sentiment analysis systems capable of handling formal journalistic discourse, which differs substantially from the social media text used to train most existing sentiment classifiers. This study investigates domain adaptation for Indonesian news sentiment analysis by fine-tuning IndoBERTa, a RoBERTa-based model for Indonesian language processing. Using a structured data mining workflow inspired by the Cross-Industry Standard Process for Data Mining, we collected and manually annotated 1300 news articles from two major Indonesian news portals (Kompas and Detik) into positive, negative, and neutral categories. Zero-shot evaluation using a social-media-trained sentiment model yielded 14% accuracy, with over 89% of samples predicted as neutral, indicating a strong domain-induced classification bias. After fine-tuning, IndoBERTa achieved 98% accuracy, a Cohen’s kappa score of 0.98, and balanced F1-scores across all sentiment classes. While keyword-guided data collection and class balancing likely inflate absolute performance, the results clearly demonstrate the effectiveness of domain-specific adaptation for Indonesian news sentiment classification. The fine-tuned model is deployed in a real-time analysis pipeline and publicly released to support further research in Indonesian Natural Language Processing (NLP).
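The abstract reports accuracy, Cohen's kappa, and per-class F1-scores. The following is a minimal sketch of how these three metrics are computed from gold and predicted labels. It is not the authors' evaluation code: the function names and the six-sample toy data are assumptions made purely for illustration, chosen so that most predictions collapse to "neutral", loosely mirroring the kind of bias described for the zero-shot model.

```python
from collections import Counter

# The three sentiment classes named in the abstract.
LABELS = ["positive", "negative", "neutral"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between gold and predicted labels, corrected for chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Expected agreement if labels were assigned independently
    # with the same marginal frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in LABELS) / (n * n)
    if p_e == 1.0:  # degenerate case: only one label used on both sides
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def f1_per_class(y_true, y_pred):
    """One-vs-rest F1 for each class; 0.0 when a class never occurs."""
    scores = {}
    for c in LABELS:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy data (NOT from the study): a balanced gold set and a
# classifier that predicts "neutral" for five of the six samples.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["neutral", "neutral", "neutral", "negative", "neutral", "neutral"]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
f1 = f1_per_class(y_true, y_pred)
print(acc, kappa, f1)
```

On this toy set the accuracy is 0.5 while kappa is only 0.25 and the positive-class F1 is 0.0, which shows why kappa and per-class F1 are reported alongside accuracy: a classifier biased toward one class can look better on accuracy alone than it actually is.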

Received: 31 August 2025 | Revised: 9 February 2026 | Accepted: 7 May 2026

Conflicts of Interest

The authors declare that they have no conflicts of interest to this work.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Author Contribution Statement

Desi Masdin Dama: Conceptualization, Methodology, Software, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Visualization. Tati Mardiana: Conceptualization, Methodology, Validation, Writing – review & editing, Supervision. Riki Supriyadi: Conceptualization, Methodology, Validation, Supervision. Zico Pratama Putra: Conceptualization, Methodology, Validation, Writing – review & editing, Supervision. Dicky Octaviano: Validation, Investigation, Writing – review & editing. Achmad Bayhaqy: Software, Resources, Data curation.
