Fine-Tuning IndoBERTa for Indonesian Digital News Sentiment Classification

Authors

  • Desi Masdin Dama, Department of Data Science, Universitas Nusa Mandiri, Indonesia
  • Tati Mardiana, Department of Data Science, Universitas Nusa Mandiri, Indonesia, https://orcid.org/0000-0001-7618-1766
  • Riki Supriyadi, Department of Data Science, Universitas Nusa Mandiri, Indonesia, https://orcid.org/0000-0002-2628-9688
  • Zico Pratama Putra, Faculty of Information Technology, Universitas Nusa Mandiri, Indonesia, https://orcid.org/0000-0003-3465-921X
  • Dicky Octaviano, Faculty of Management, Universitas Bina Sarana Informatika, Indonesia, https://orcid.org/0009-0007-5999-7692
  • Achmad Bayhaqy, Department of Data Science, Universitas Nusa Mandiri, Indonesia

DOI:

https://doi.org/10.47852/bonviewAIA62027506

Keywords:

IndoBERTa, sentiment analysis, domain adaptation, Indonesian NLP, transformer fine-tuning

Abstract

The rapid growth of Indonesian digital news content necessitates automated sentiment analysis systems capable of handling formal journalistic discourse, which differs substantially from the social media text used to train most existing sentiment classifiers. This study investigates domain adaptation for Indonesian news sentiment analysis by fine-tuning IndoBERTa, a RoBERTa-based model for Indonesian language processing. Using a structured data mining workflow inspired by the Cross-Industry Standard Process for Data Mining (CRISP-DM), we collected 1,300 news articles from two major Indonesian news portals (Kompas and Detik) and manually annotated them as positive, negative, or neutral. Zero-shot evaluation using a social-media-trained sentiment model yielded 14% accuracy, with over 89% of samples predicted as neutral, indicating a strong domain-induced classification bias. After fine-tuning, IndoBERTa achieved 98% accuracy, a Cohen’s kappa score of 0.98, and balanced F1-scores across all sentiment classes. While keyword-guided data collection and class balancing likely inflate absolute performance, the results clearly demonstrate the effectiveness of domain-specific adaptation for Indonesian news sentiment classification. The fine-tuned model is deployed in a real-time analysis pipeline and publicly released to support further research in Indonesian Natural Language Processing (NLP).
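
The evaluation measures named in the abstract (accuracy, Cohen's kappa, and the share of samples predicted as each class) can be illustrated with a minimal Python sketch. The helper names and the nine toy labels below are hypothetical and chosen only for illustration; they are not the study's data or code, and the numbers they produce are unrelated to the reported results.

```python
from collections import Counter

# The three sentiment classes named in the abstract.
LABELS = ("positive", "negative", "neutral")


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    p_e = sum(true_counts[c] * pred_counts[c] for c in LABELS) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both label lists contain a single identical class.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def predicted_share(y_pred):
    """Share of predictions assigned to each class, to expose a one-class bias."""
    counts = Counter(y_pred)
    return {label: counts[label] / len(y_pred) for label in LABELS}


# Hypothetical toy data: a balanced gold set of nine items.
y_true = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 3
# A classifier that collapses every input to "neutral".
y_biased = ["neutral"] * 9
# A classifier that makes a single mistake.
y_adapted = ["positive", "positive", "neutral"] + ["negative"] * 3 + ["neutral"] * 3

for name, preds in (("biased", y_biased), ("adapted", y_adapted)):
    print(name, accuracy(y_true, preds), cohen_kappa(y_true, preds), predicted_share(preds))
```

In this toy setup the all-neutral classifier reaches a kappa of zero even though a third of its predictions are correct, which is why a chance-corrected measure is informative alongside raw accuracy when predictions collapse toward one class.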

Received: 31 August 2025 | Revised: 9 February 2026 | Accepted: 7 May 2026

Conflicts of Interest

The authors declare that they have no conflicts of interest related to this work.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Author Contribution Statement

Desi Masdin Dama: Conceptualization, Methodology, Software, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Visualization. Tati Mardiana: Conceptualization, Methodology, Validation, Writing – review & editing, Supervision. Riki Supriyadi: Conceptualization, Methodology, Validation, Supervision. Zico Pratama Putra: Conceptualization, Methodology, Validation, Writing – review & editing, Supervision. Dicky Octaviano: Validation, Investigation, Writing – review & editing. Achmad Bayhaqy: Software, Resources, Data curation.


Published

2026-06-03

Section

Research Article

How to Cite

Dama, D. M., Mardiana, T., Supriyadi, R., Putra, Z. P., Octaviano, D., & Bayhaqy, A. (2026). Fine-Tuning IndoBERTa for Indonesian Digital News Sentiment Classification. Artificial Intelligence and Applications. https://doi.org/10.47852/bonviewAIA62027506