Journal of Data and Information Science, 2016, Vol. 1, Issue (2): 45-59. doi: 10.20309/jdis.201613

• Research Paper •

Mining Related Articles for Automatic Journal Cataloging

Yuqing Mao1,2 & Zhiyong Lu2   

  1. School of Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China;
  2. National Center for Biotechnology Information, National Library of Medicine, MD 20894, USA
  • Received:2015-12-14 Revised:2016-01-05 Online:2016-06-15 Published:2016-06-15
  • Contact: Yuqing Mao
  • Supported by:
    We would like to thank Dr. John Wilbur for his helpful discussion on this project. This research is supported by the NIH Intramural Research Program, National Library of Medicine.

Abstract: Purpose: This paper investigates the effectiveness of clustering biomedical journals by mining the content similarity of their articles.
Design/methodology/approach: We analyze 3,265 journals in PubMed based on article content similarity and Web usage, respectively. We then compare these two approaches and a citation-based approach.
Findings: Our results suggest that article content similarity is useful for clustering biomedical journals, and that the content-similarity-based journal clustering method is more robust and less subject to human factors than the usage-based and citation-based approaches.
Research limitations: Our paper currently focuses on clustering journals in the biomedical domain because a large volume of freely available resources, such as PubMed and MeSH, exists in this field. Further investigation is needed to adapt this approach to journals in other domains.
Practical implications: Our results show that it is feasible to catalog biomedical journals by mining the article content similarity. This work is also significant in serving practical needs in research portfolio analysis.
Originality/value: To the best of our knowledge, we are among the first to report on clustering journals in the biomedical field through mining the article content similarity. This method can be integrated with existing approaches to create a new paradigm for future studies of journal clustering.
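
Illustrative sketch: the general idea of clustering journals from article content similarity can be pictured in a few lines of Python. This is only a minimal sketch under stated assumptions, not the procedure used in the paper. The inputs (`article_journal`, `related_pairs`), the normalization by the square root of the journal sizes, the `threshold` value, and the single-link grouping are all hypothetical choices made for illustration, and the journal names are placeholders.

```python
from collections import defaultdict


def journal_similarity(article_journal, related_pairs):
    """Aggregate article-level relatedness into journal-level scores.

    article_journal: dict mapping article id -> journal name (assumed input).
    related_pairs: iterable of (article_a, article_b) judged similar in content.
    Returns {(journal_x, journal_y): score} with journal_x < journal_y, where
    score = cross-journal related pairs / sqrt(size_x * size_y).
    This normalization is an assumption, not taken from the paper.
    """
    sizes = defaultdict(int)
    for journal in article_journal.values():
        sizes[journal] += 1

    links = defaultdict(int)
    for a, b in related_pairs:
        ja, jb = article_journal[a], article_journal[b]
        if ja != jb:  # only cross-journal links say something about journal pairs
            links[tuple(sorted((ja, jb)))] += 1

    return {
        pair: count / (sizes[pair[0]] * sizes[pair[1]]) ** 0.5
        for pair, count in links.items()
    }


def cluster_journals(journals, similarity, threshold):
    """Single-link grouping: journals joined by any score >= threshold share a cluster."""
    parent = {j: j for j in journals}

    def find(j):
        # Follow parent pointers to the cluster representative, compressing the path.
        while parent[j] != j:
            parent[j] = parent[parent[j]]
            j = parent[j]
        return j

    for (x, y), score in similarity.items():
        if score >= threshold:
            parent[find(x)] = find(y)

    groups = defaultdict(set)
    for j in journals:
        groups[find(j)].add(j)
    return sorted(sorted(group) for group in groups.values())


# Toy data with placeholder journal names (hypothetical, for illustration only).
article_journal = {
    "a1": "Journal A", "a2": "Journal A",
    "a3": "Journal B", "a4": "Journal B",
    "a5": "Journal C",
}
related_pairs = [("a1", "a3"), ("a2", "a3"), ("a2", "a4"), ("a4", "a5")]

sim = journal_similarity(article_journal, related_pairs)
clusters = cluster_journals(set(article_journal.values()), sim, threshold=1.0)
print(clusters)
```

In this toy example, Journal A and Journal B share three related-article pairs and fall into one cluster, while Journal C, linked by a single pair, stays on its own. A usage-based or citation-based variant could reuse the same aggregation step with a different source of article pairs.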

Key words: PubMed, Journals, Cluster, Catalog, Text mining, Research evaluation