Public Discussion of DeepSeek Large Language Model on Twitter: A Mixed-Methods Sentiment and Topic Modeling
DOI:
https://doi.org/10.47852/bonviewAIA62027447
Keywords:
DeepSeek, Twitter, large language model, natural language processing, sentiment analysis
Abstract
Artificial intelligence (AI) has moved from research laboratories into everyday tools used by millions worldwide. In recent years, advances in natural language AI systems have sparked extensive public exploration and discussion. This study investigates overall public sentiment and key discussion topics related to the DeepSeek large language model (LLM) on Twitter (now rebranded as X) and examines sentiment differences across discussion topics during various DeepSeek-related events. After data collection, Python was used to perform preliminary cleaning and screening of English-language tweets. The Valence Aware Dictionary and Sentiment Reasoner (VADER) sentiment analysis tool was applied to classify tweet sentiment. Based on the VADER labels, the dataset was stratified to obtain a high-quality sample of 5000 tweets while preserving the original sentiment distribution. To further explore discussion themes, latent Dirichlet allocation combined with coherence score evaluation was employed for topic modeling. Topic-level sentiment analysis was then conducted across different DeepSeek-related events to assess public attitudes toward each discussion topic. Results indicate that overall public sentiment toward DeepSeek LLM is predominantly positive. Topic modeling identified 10 optimal discussion topics, covering areas such as technical performance, economic impact, political and cultural context, and international competition. The findings also reveal significant differences in sentiment distribution across topics, demonstrating the practical value of combining sentiment analysis and topic modeling for business intelligence and AI product optimization.
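The stratification step described in the abstract (drawing a fixed-size sample while preserving the sentiment distribution given by the VADER labels) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the row layout, and the synthetic scores are hypothetical, and the cut-offs of 0.05 and -0.05 on the compound score are the thresholds conventionally recommended for VADER, assumed here because the abstract does not state which ones were used.

```python
import random
from collections import Counter, defaultdict


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a label (conventional cut-offs, assumed)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def stratified_sample(rows, n, seed=0):
    """Draw n rows so that each label keeps roughly its original share."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)

    total = len(rows)
    # Largest-remainder allocation so the per-label quotas sum to exactly n.
    exact = {lab: n * len(group) / total for lab, group in by_label.items()}
    quota = {lab: int(share) for lab, share in exact.items()}
    leftover = n - sum(quota.values())
    by_remainder = sorted(exact, key=lambda lab: exact[lab] - quota[lab], reverse=True)
    for lab in by_remainder[:leftover]:
        quota[lab] += 1

    sample = []
    for lab, group in by_label.items():
        sample.extend(rng.sample(group, quota[lab]))
    return sample


# Synthetic compound scores standing in for real VADER output (hypothetical data).
scores = [0.6] * 600 + [0.0] * 300 + [-0.5] * 100
tweets = [{"text": f"tweet {i}", "label": label_from_compound(s)} for i, s in enumerate(scores)]
subset = stratified_sample(tweets, 50)
print(Counter(row["label"] for row in subset))
```

In practice the compound scores would come from a sentiment analyzer applied to the cleaned tweets rather than from a hard-coded list, and the same proportional allocation would be applied with n = 5000.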
Received: 29 August 2025 | Revised: 26 December 2025 | Accepted: 11 March 2026
Conflicts of Interest
The authors declare that they have no conflicts of interest to this work.
Data Availability Statement
The data that support the findings of this study are openly available in Kaggle at https://www.kaggle.com/datasets/bwandowando/tweets-and-reaction-on-deepseek-models and https://www.kaggle.com/datasets/datatattle/covid-19-nlp-text-classification.
Author Contribution Statement
Wei Chien Ng: Conceptualization, Methodology, Resources, Writing – original draft, Supervision, Project administration, Funding acquisition. Shiwen Chen: Methodology, Software, Formal analysis, Investigation, Data curation, Writing – original draft. Quan Kai Ang: Writing – review & editing, Visualization. Yu Qing Soong: Validation, Writing – review & editing.
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.