Domain Knowledge-Driven Relation Extraction Methods
DOI:
https://doi.org/10.47852/bonviewJDSIS42022524
Keywords:
domain knowledge graph, pre-trained model, relation classification
Abstract
Relation classification is an important task in natural language processing: given a sentence and the positions of the entities in it, the goal is to determine the relationship between those entities. Most existing methods are neural networks built on pre-trained models such as BERT, and in recent years they have achieved excellent results on relation classification in general domains. However, these methods often struggle in specialized domains because of the limitations of the corpora used for BERT pre-training. Most pre-trained models are trained on text from general sources such as Wikipedia, which cover a wide range of domains. As a result, the content for any specific domain is limited and lacks the necessary expertise, leading to subpar performance of relation classification models in that domain. Supplying a large domain-specific corpus to the pre-trained model could address this issue, but it has drawbacks such as increased computational requirements and insufficient training of specialized vocabulary. This paper proposes a method, inspired by the K-BERT pre-training model, that incorporates triplet knowledge from domain knowledge graphs into sentence sequences. The triplets are transformed into sentence trees and then fed into the BERT pre-trained model using absolute and relative indices. Our model achieves an accuracy of 93.06%, which is markedly higher than that of every baseline approach. The approach incorporates domain knowledge without significantly increasing computational complexity. Additionally, the paper introduces a partial input method that enables the model to interpret input sentences from multiple dimensions and hierarchical levels. Experimental results on a medical domain relation classification dataset, which includes type labels, demonstrate significant advantages over other relation classification models in terms of accuracy.
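The sentence-tree step described in the abstract can be illustrated with a minimal sketch. It assumes that the "absolute and relative indices" correspond to the hard-position and soft-position indices of K-BERT, and that a K-BERT-style visibility matrix keeps injected triplets from leaking into unrelated tokens; the abstract does not confirm either detail. The function name `build_sentence_tree`, the dictionary format of the knowledge graph, and the toy triplet are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical KG format: entity -> list of (relation, object) pairs.
Triples = Dict[str, List[Tuple[str, str]]]


def build_sentence_tree(tokens: List[str], kg: Triples):
    """Inject KG triplets after matching entities and index the result.

    Returns (flat, absolute, relative, visible):
      flat     - trunk tokens with triplet branches inserted after entities
      absolute - position of each token in the flattened sequence
      relative - position along the tree path (a branch continues
                 counting from the entity it hangs from)
      visible  - visible[i][j] is True if token i may attend to token j
    """
    flat: List[str] = []
    relative: List[int] = []
    branch: List[Optional[int]] = []  # None marks a trunk token
    anchor: List[Optional[int]] = []  # flat index of the branch's entity
    next_branch = 0

    for pos, tok in enumerate(tokens):
        entity_idx = len(flat)
        flat.append(tok)
        relative.append(pos)
        branch.append(None)
        anchor.append(None)
        for rel, obj in kg.get(tok, []):
            for offset, piece in enumerate((rel, obj), start=1):
                flat.append(piece)
                relative.append(pos + offset)
                branch.append(next_branch)
                anchor.append(entity_idx)
            next_branch += 1

    n = len(flat)
    absolute = list(range(n))
    visible = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            both_trunk = branch[i] is None and branch[j] is None
            same_branch = branch[i] is not None and branch[i] == branch[j]
            entity_link = anchor[i] == j or anchor[j] == i
            visible[i][j] = both_trunk or same_branch or entity_link
    return flat, absolute, relative, visible


# Toy example (hypothetical triplet, for illustration only).
flat, absolute, relative, visible = build_sentence_tree(
    ["aspirin", "treats", "headache"],
    {"aspirin": [("is_a", "NSAID")]},
)
```

In this toy example the injected tokens `is_a` and `NSAID` receive the same relative indices as `treats` and `headache`, so the original sentence keeps its word order from the model's point of view, while the absolute indices record where each token actually sits in the flattened input.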
Received: 24 January 2024 | Revised: 19 March 2024 | Accepted: 3 April 2024
Conflicts of Interest
The authors declare that they have no conflicts of interest to this work.
Data Availability Statement
The data that support the findings of this study are openly available on GitHub at https://github.com/Soso6666/Classification-data.
Author Contribution Statement
Boxuan Chen: Methodology, Software, Validation, Resources, Data curation, Writing - original draft, Visualization. Guan Yuan: Conceptualization, Methodology, Formal analysis, Investigation, Resources, Data curation, Writing - review & editing, Supervision, Project administration.
License
Copyright (c) 2024 Authors
This work is licensed under a Creative Commons Attribution 4.0 International License.