Journal of Data and Information Science  2019, Vol. 4 Issue (4): 42-55    DOI: 10.2478/jdis-2019-0020
Research Paper     
Masked Sentence Model Based on BERT for Move Recognition in Medical Scientific Abstracts
Gaihong Yu1,2, Zhixiong Zhang1,2 (corresponding author), Huan Liu1,2, Liangping Ding1,2
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2University of Chinese Academy of Sciences, Beijing 100049, China
3Wuhan Library, Chinese Academy of Sciences, Wuhan 430071, China

Abstract  

Purpose: Move recognition in scientific abstracts is an NLP task that classifies the sentences of an abstract into different types of language units. To improve move-recognition performance in scientific abstracts, we propose a novel model that outperforms the BERT-based method.

Design/methodology/approach: Prevalent BERT-based models for sentence classification often classify sentences in isolation, without considering their context. In this paper, inspired by BERT's masked language model (MLM), we propose a novel model, called the masked sentence model, that integrates both the content and the contextual information of each sentence for move recognition. Experiments are conducted on the benchmark dataset PubMed 20k RCT in three steps, and we then compare our model with the HSLN-RNN, BERT-based, and SciBERT models on the same dataset.

Findings: Compared with the BERT-based and SciBERT models, the F1 score of our model is higher by 4.96 and 4.34 percentage points, respectively, which shows the feasibility and effectiveness of the novel model; our result is currently the closest to the state-of-the-art result of HSLN-RNN.

Research limitations: The sequential features of move labels are not considered, which might be one reason why HSLN-RNN performs better. Our model is also restricted to biomedical English literature, because it is fine-tuned on a dataset from PubMed, a typical biomedical database.

Practical implications: The proposed model is a better and simpler way to identify move structures in scientific abstracts, and it is worth applying to other text classification tasks that need to capture the contextual features of sentences.

Originality/value: The study proposes a masked sentence model based on BERT that considers the contextual features of the sentences in an abstract in a new way. The performance of this classification model is significantly improved by rebuilding the input layer without changing the structure of the neural network.



Keywords: Move recognition; BERT; Masked sentence model; Scientific abstracts
Received: 27 September 2019      Published: 19 December 2019
Corresponding Authors: Zhixiong Zhang     E-mail: zhangzhx@mail.las.ac.cn
Cite this article:

Gaihong Yu, Zhixiong Zhang, Huan Liu, Liangping Ding. Masked Sentence Model Based on BERT for Move Recognition in Medical Scientific Abstracts. Journal of Data and Information Science, 2019, 4(4): 42-55.

URL:

http://manu47.magtech.com.cn/Jwk3_jdis/10.2478/jdis-2019-0020     OR     http://manu47.magtech.com.cn/Jwk3_jdis/Y2019/V4/I4/42

Figure 1. An example of an abstract.
Figure 2. Sentence representations.
Figure 3. The architecture of the masked sentence model based on BERT.
Label | The content of the sentence
Methods | We selected the major journals (11 journals) collecting papers (more than 7,000) over the last five years from the top members of the research community, and read and analyzed the papers (more than 200) covering the topics.
Table 1 Data format of sentence content.
Label | The context of the sentence
Methods | This survey aims at reviewing the literature related to Clinical Information Systems (CIS), Hospital Information Systems (HIS), Electronic Health Record (EHR) systems, and how collected data can be analyzed by Artificial Intelligence (AI) techniques. aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa. Then, we completed the analysis using search engines to also include papers from major conferences over the same five years. We defined a taxonomy of major features and research areas of CIS, HIS, EHR systems. We also defined a taxonomy for the use of Artificial Intelligence (AI) techniques on healthcare data. In the light of these taxonomies, we report on the most relevant papers from the literature. We highlighted some major research directions and issues which seem to be promising and to need further investigations over a medium- or long-term period.
Table 2 Data format of the sentence’s context.
Label | The content & context of the sentence
Methods | We selected the major journals (11 journals) collecting papers (more than 7,000) over the last five years from the top members of the research community, and read and analyzed the papers (more than 200) covering the topics.
Methods | This survey aims at reviewing the literature related to Clinical Information Systems (CIS), Hospital Information Systems (HIS), Electronic Health Record (EHR) systems, and how collected data can be analyzed by Artificial Intelligence (AI) techniques. aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa. Then, we completed the analysis using search engines to also include papers from major conferences over the same five years. We defined a taxonomy of major features and research areas of CIS, HIS, EHR systems. We also defined a taxonomy for the use of Artificial Intelligence (AI) techniques on healthcare data. In the light of these taxonomies, we report on the most relevant papers from the literature. We highlighted some major research directions and issues which seem to be promising and to need further investigations over a medium- or long-term period.
Table 3 Data format for integrating sentence content and context.
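The three input formats illustrated in Tables 1-3 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes an abstract is given as a list of sentences, that a masked sentence is replaced token-for-token by the placeholder "aaa" shown in the tables, and that content and masked context are simply concatenated for the integrated format (how the two parts are actually combined in the model's input layer is our assumption).

```python
def mask_sentence(sentence, placeholder="aaa"):
    """Replace every token of the target sentence with a placeholder token."""
    return " ".join(placeholder for _ in sentence.split())

def content_input(sentences, i):
    """Exp1 format: the content of the target sentence only (Table 1)."""
    return sentences[i]

def context_input(sentences, i):
    """Exp2 format: the whole abstract with the target sentence masked (Table 2)."""
    return " ".join(s if j != i else mask_sentence(s)
                    for j, s in enumerate(sentences))

def integrated_input(sentences, i):
    """Exp3 format: sentence content followed by its masked context (Table 3)."""
    return content_input(sentences, i) + " " + context_input(sentences, i)

abstract = ["Background info.", "We used method Y.", "Accuracy improved."]
print(context_input(abstract, 1))
# -> Background info. aaa aaa aaa aaa Accuracy improved.
```

Because the mask preserves one placeholder per token, the masked abstract keeps the position and length of the target sentence while hiding its content, which is what lets the model see context without trivially reading the answer.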
Label P R F1 Support
Background 64.37 75.85 69.64 3,077
Objectives 73.55 56.97 64.20 2,333
Methods 92.42 94.97 93.68 9,884
Results 92.08 91.09 91.58 9,713
Conclusions 84.95 81.38 83.13 4,571
Avg / Total 86.75 86.61 86.53 29,578
Table 4 The results of Exp1: based on the content of sentences.
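The Avg / Total row in the table above is the support-weighted mean of the per-label scores, which can be verified directly (values transcribed from Table 4):

```python
# Per-label (precision, recall, F1, support), transcribed from Table 4.
rows = {
    "Background":  (64.37, 75.85, 69.64, 3077),
    "Objectives":  (73.55, 56.97, 64.20, 2333),
    "Methods":     (92.42, 94.97, 93.68, 9884),
    "Results":     (92.08, 91.09, 91.58, 9713),
    "Conclusions": (84.95, 81.38, 83.13, 4571),
}
total = sum(support for *_, support in rows.values())  # 29,578 sentences
# Support-weighted averages of P, R, and F1.
weighted = [round(sum(v[k] * v[3] for v in rows.values()) / total, 2)
            for k in range(3)]
print(total, weighted)  # 29578 [86.75, 86.61, 86.53]
```

The same weighting reproduces the Avg / Total rows of Tables 5 and 6.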
Label P R F1 Support
Background 72.27 79.72 75.82 3,077
Objectives 70.51 60.27 64.99 2,333
Methods 90.70 89.80 90.25 9,884
Results 87.71 89.20 88.45 9,713
Conclusions 90.19 89.30 89.74 4,571
Avg / Total 86.13 86.15 86.09 29,578
Table 5 The results of Exp2: based on the context of sentences.
Label P R F1 Support
Background 75.26 81.18 78.11 3,077
Objectives 78.08 61.98 69.10 2,333
Methods 92.98 97.48 95.17 9,884
Results 96.02 93.74 94.87 9,713
Conclusions 94.70 94.51 94.60 4,571
Avg / Total 91.22 91.30 91.15 29,578
Table 6 The results of Exp3: based on MSM integrated information.
Label | F1 (Exp1) | F1 (Exp2) | F1 (Exp3) | +F1 (Exp3-Exp1) | +F1 (Exp3-Exp2)
Background 69.64 75.82 78.11 8.47 2.29
Objectives 64.20 64.99 69.10 4.90 4.11
Methods 93.68 90.25 95.17 1.49 4.92
Results 91.58 88.45 94.87 3.29 6.42
Conclusions 83.13 89.74 94.60 11.47 4.86
Avg / Total 86.53 86.09 91.15 4.62 5.06
Table 7 Comparison of the results of the experiments.
Models F1 (PubMed 20k RCT)
Our Model MaskedSentenceModel_BERT 91.15
Others HSLN-RNN (Jin and Szolovits, 2018) (SOTA) 92.6
BERT-Base (Beltagy et al., 2019) 86.19
SciBERT (SciVocab) (Beltagy et al., 2019) 86.80
SciBERT (BaseVocab) (Beltagy et al., 2019) 86.81
Table 8 PubMed 20k RCT results.
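The margins quoted in the abstract follow directly from this table; a quick check of the F1 gaps (in percentage points) between our model and each baseline:

```python
ours = 91.15  # MaskedSentenceModel_BERT, Table 8
baselines = {
    "HSLN-RNN (SOTA)":     92.60,
    "BERT-Base":           86.19,
    "SciBERT (SciVocab)":  86.80,
    "SciBERT (BaseVocab)": 86.81,
}
# Positive gap = our model is ahead; negative = baseline is ahead.
gaps = {name: round(ours - f1, 2) for name, f1 in baselines.items()}
print(gaps)
# BERT-Base: +4.96, SciBERT (BaseVocab): +4.34, HSLN-RNN (SOTA): -1.45
```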
[1]   Amini, I., Martinez, D., & Molla, D. (2012). Overview of the ALTA 2012 shared task. In Proceedings of the Australasian Language Technology Association Workshop 2012: ALTA 2012 (pp. 124-129). Dunedin, New Zealand.
[2]   Badie K., Asadi N., & Tayefeh Mahmoudi M. (2018). Zone identification based on features with high semantic richness and combining results of separate classifiers. Journal of Information and Telecommunication, 2(4), 411-427.
doi: 10.1080/24751839.2018.1460083
[3]   Basili, R., & Pennacchiotti, M. (2010). Distributional lexical semantics: Toward uniform representation paradigms for advanced acquisition and processing tasks. Natural Language Engineering, 1(1), 1-12.
doi: 10.1017/S1351324900000036
[4]   Beltagy I., Lo K., & Cohan, A. (2019). SciBERT: Pretrained contextualized embeddings for scientific text. arXiv:1903.10676v3.
[5]   Dasigi P., Burns G.A.P.C., Hovy E., & Waard A. (2017). Experiment segmentation in scientific discourse as clause-level structured prediction using recurrent neural networks. arXiv:1702.05398.
[6]   Devlin J., Chang M.W., Lee K., & Toutanova K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.
[7]   Ding, L.P., Zhang, Z.X., & Liu, H. (2019). Research on factors affecting the SVM model performance on move recognition. Data Analysis and Knowledge Discovery.
[8]   Firth, J.R. (1957). A synopsis of linguistic theory, 1930-1955. In Studies in Linguistic Analysis. London: Longmans, 168-205.
[9]   Fisas B., Ronzano F., & Saggion H. (2016). A multi-layered annotated corpus of scientific papers. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).
[10]   Dernoncourt, F., & Lee, J.Y. (2017). PubMed 200k RCT: A dataset for sequential sentence classification in medical abstracts. In Proceedings of the 8th International Joint Conference on Natural Language Processing.
[11]   Gerlach, M., Peixoto, T.P., & Altmann, E.G. (2018). A network approach to topic models. Science Advances, 4(7), eaaq1360.
doi: 10.1126/sciadv.aaq1360
[12]   Hirohata, K., Okazaki, N., Ananiadou, S., & Ishizuka, M. (2008). Identifying sections in scientific abstracts using conditional random fields. In Proceedings of the Third International Joint Conference on Natural Language Processing.
[13]   Ma M.B., Huang L., Xiang B., & Zhou B.W. (2015). Dependency-based convolutional neural networks for sentence embedding. arXiv:1507.01839.
[14]   Peters M.E., Neumann M., Iyyer M., et al. (2018). Deep contextualized word representations.In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. doi: 10.18653/v1/N18-1202 arXiv:1802.05365.
[15]   Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.
[16]   Lai S.W., Xu L., Liu K., & Zhao J. (2015). Recurrent convolutional neural networks for text classification. In AAAI’15 Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 2267-2273.
[17]   Swales J.M.(2004).Research genres:Explorations and applications. Cambridge: Cambridge University Press.
[18]   Taylor, W.L. (1953). “Cloze procedure”: A new tool for measuring readability. Journalism & Mass Communication Quarterly, 30(4), 415-433. doi: 10.1177/107769905303000401
[19]   Teufel, S. (1999). Argumentative zoning: Information extraction from scientific text. PhD thesis, University of Edinburgh.
[20]   Vaswani A., Shazeer N., Parmar N., et al. (2017). Attention is all you need. arXiv:1706.03762v5.
[21]   Yamamoto, Y., & Takagi, T. (2005). A sentence classification system for multi-document summarization in the biomedical domain. In Proceedings of International Workshop on Biomedical Data Engineering, pages 90-95.
[22]   Kim, Y. (2014). Convolutional neural networks for sentence classification. arXiv:1408.5882.