Domain Knowledge-Driven Relation Extraction Methods


  • Boxuan Chen School of Computer Science and Technology, China University of Mining and Technology, China
  • Guan Yuan School of Computer Science and Technology and Engineering Research Center of Mine Digitalization, China University of Mining and Technology, China



domain knowledge graph, pre-trained model, relation classification


Relation classification is an important task in natural language processing: given a sentence and the positions of the entities in it, the goal is to determine the relationship between those entities. Most existing methods are neural networks built on pre-trained models such as BERT, and in recent years they have achieved excellent results on relation classification in general domains. However, these methods often struggle in specialized domains because of the corpora used for BERT pre-training. Most pre-trained models are trained on text from general sources such as Wikipedia, which span a wide range of domains. As a result, their coverage of any specific domain is limited and lacks the necessary expertise, which leads to subpar performance of relation classification models in that domain. Supplying a large domain-specific corpus to the pre-trained model could address this issue, but it increases computational requirements and can still leave specialized vocabulary insufficiently trained. Inspired by the K-BERT pre-training model, this paper proposes a method that incorporates triplet knowledge from domain knowledge graphs into sentence sequences. The triplets are transformed into sentence trees and then fed into the BERT pre-trained model using absolute and relative indices, which allows domain knowledge to be incorporated without significantly increasing computational complexity. The paper also introduces a partial input method that enables the model to understand input sentences from multiple dimensions and hierarchical levels. In experiments on a medical-domain relation classification dataset that includes type labels, our model achieves an accuracy of 93.06%, markedly higher than that of any other baseline approach.
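
The sentence-tree construction described above can be illustrated with a minimal sketch in the style of K-BERT. It is not the authors' implementation: the function name `build_sentence_tree`, the dictionary layout of the knowledge graph, the example triplet, and the reading of "absolute index" as the position in the flattened sequence and "relative index" as the position along the tree are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical KG layout: head entity -> list of (relation, tail) pairs.
KnowledgeGraph = Dict[str, List[Tuple[str, str]]]


def build_sentence_tree(
    tokens: List[str], kg: KnowledgeGraph
) -> Tuple[List[str], List[int], List[int], List[List[bool]]]:
    """Flatten a sentence plus its KG triplets into one sequence.

    Returns the flattened tokens, absolute indices (position in the
    flattened sequence), relative indices (position along the tree),
    and a visibility matrix that limits which tokens may attend to
    each other.
    """
    flat: List[str] = []
    relative: List[int] = []
    # For each flattened token: (trunk position, branch id or None).
    origin: List[Tuple[int, Optional[int]]] = []

    for pos, tok in enumerate(tokens):
        flat.append(tok)
        relative.append(pos)
        origin.append((pos, None))
        # Attach each triplet as a branch hanging off its head entity.
        for branch_id, (relation, tail) in enumerate(kg.get(tok, [])):
            for offset, branch_tok in enumerate((relation, tail), start=1):
                flat.append(branch_tok)
                relative.append(pos + offset)
                origin.append((pos, branch_id))

    absolute = list(range(len(flat)))

    n = len(flat)
    visible = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            (trunk_i, branch_i), (trunk_j, branch_j) = origin[i], origin[j]
            if branch_i is None and branch_j is None:
                # Tokens of the original sentence all see each other.
                visible[i][j] = True
            elif trunk_i == trunk_j and (
                branch_i is None or branch_j is None or branch_i == branch_j
            ):
                # A branch sees only itself and the entity it hangs from.
                visible[i][j] = True
    return flat, absolute, relative, visible


# Illustrative input; the triplet is made up for the example.
tokens = ["aspirin", "treats", "headache"]
kg = {"aspirin": [("is_a", "NSAID")]}
flat, absolute, relative, visible = build_sentence_tree(tokens, kg)
```

In this sketch the injected tokens lengthen the sequence, but the relative indices keep the original word order intact ("treats" still directly follows "aspirin"), and the visibility matrix stops injected knowledge from altering the representation of unrelated words.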


Received: 24 January 2024 | Revised: 19 March 2024 | Accepted: 3 April 2024


