Journal of Data and Information Science, 2021, Vol. 6, Issue 1: 154-177. doi: 10.2478/jdis-2021-0006

Research Papers

Using Network Embedding to Obtain a Richer and More Stable Network Layout for a Large Scale Bibliometric Network

Ting Chen (1,2,3), Guopeng Li (3), Qiping Deng (4), Xiaomei Wang (3, corresponding author)

1. National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2. Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100049, China
3. Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
4. Library, University of Electronic Science and Technology of China, Chengdu 611731, China
• Received: 2020-07-01; Revised: 2020-09-22; Accepted: 2020-10-20; Online: 2021-02-20; Published: 2020-11-30
• Contact: Xiaomei Wang, E-mail: wangxm@casisd.cn

Abstract:

Purpose: The goal of this study is to explore whether deep-learning-based network embedding models can provide a better visualization solution for large citation networks.
Design/methodology/approach: We compared a visualization approach borrowed from the deep learning community with a well-known bibliometric network visualization method for large-scale data. A total of 47,294 highly cited papers were visualized using three network embedding models, each combined with the t-SNE dimensionality reduction technique. In addition, three base maps were created from the same dataset for evaluation purposes. All base maps used the classic OpenOrd method with different edge-cutting strategies and parameters.
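The abstract does not describe the embedding models' internals. The sketch below illustrates only the Node2Vec family. It implements Node2Vec's biased second-order random-walk sampling and assumes an undirected, unweighted citation graph. The function names (`build_graph`, `node2vec_walk`, `generate_walks`) and default parameter values are hypothetical; this is not the authors' code. In full Node2Vec, the walks are then fed to a skip-gram (word2vec-style) model to learn one vector per paper. In the pipeline described here, those vectors would then be projected to 2D with t-SNE. Neither of those steps is shown.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build undirected, unweighted adjacency sets from (u, v) pairs (assumption)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def node2vec_walk(adj, start, length, p, q, rng):
    """One second-order walk: p is the return parameter, q the in-out parameter."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])  # sorted so a fixed seed gives reproducible walks
        if not neighbors:
            break
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move outward, distance 2 from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, length, p=1.0, q=1.0, seed=42):
    """Start num_walks walks from every node, in a shuffled order per pass."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        order = nodes[:]
        rng.shuffle(order)
        for node in order:
            walks.append(node2vec_walk(adj, node, length, p, q, rng))
    return walks


if __name__ == "__main__":
    toy = build_graph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])
    for walk in generate_walks(toy, num_walks=1, length=6, q=0.5):
        print(walk)
```

Setting `q` below 1 biases walks outward, giving DFS-like exploration. A low `p` makes the walk more likely to return to the previous node, which keeps it local and BFS-like.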
Findings: The network embedding maps produced with t-SNE preserve a global structure very similar to that of the classic force-directed map built with all edges, while the maps differ in local structure. Among them, the Node2Vec model gives the best overall visualization performance: its local structure is significantly improved and its layout is highly stable.
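The abstract reports differences in local structure and layout stability but does not define the measures. One generic way to quantify both is to compare each node's k nearest neighbours in two 2D layouts of the same nodes, then average the Jaccard overlap. The two layouts could come from two methods or from two runs of one method. This measure is assumed for illustration and is not necessarily the authors' evaluation. The helper names are hypothetical. The brute-force neighbour search costs O(n² log n), so a map of 47,294 papers would need a spatial index instead.

```python
import math


def knn_sets(coords, k):
    """For each point, the set of indices of its k nearest other points (Euclidean)."""
    result = []
    for i, (xi, yi) in enumerate(coords):
        # Sort by (distance, index) so ties are broken deterministically.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        result.append({j for _, j in dists[:k]})
    return result


def knn_overlap(layout_a, layout_b, k=10):
    """Mean Jaccard overlap of k-NN sets; 1.0 means identical local neighbourhoods."""
    if len(layout_a) != len(layout_b):
        raise ValueError("layouts must place the same nodes in the same order")
    na, nb = knn_sets(layout_a, k), knn_sets(layout_b, k)
    scores = [len(a & b) / len(a | b) for a, b in zip(na, nb)]
    return sum(scores) / len(scores)
```

The score depends only on relative distances. In principle it is therefore unchanged when a layout is translated, rotated or uniformly scaled, which suits comparing force-directed or t-SNE maps whose absolute coordinates are arbitrary.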
Research limitations: Training network embedding models to obtain high-dimensional latent vectors is computationally expensive and time-consuming. Only one dimensionality reduction technique (t-SNE) was tested.
Practical implications: This paper demonstrates that network embedding models can accurately reconstruct a large bibliometric network in vector space. Beyond network visualization, many classical vector-based machine learning algorithms could in future be applied to these network representations to solve bibliometric analysis tasks.
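The sketch below is a small example of the "classical vector-based" processing the abstract anticipates: it ranks papers by the cosine similarity of their embedding vectors. The paper IDs and three-dimensional vectors are made up for illustration; real embeddings are much higher-dimensional. `most_similar` is a hypothetical helper, not part of any named library.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def most_similar(embeddings, query, top_n=3):
    """Rank the other papers by cosine similarity to the query paper's vector."""
    q = embeddings[query]
    scored = [(cosine(q, vec), pid) for pid, vec in embeddings.items() if pid != query]
    # Highest similarity first; ties broken by paper ID for reproducibility.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [pid for _, pid in scored[:top_n]]


if __name__ == "__main__":
    toy = {
        "p1": [1.0, 0.0, 0.0],
        "p2": [0.9, 0.1, 0.0],
        "p3": [0.0, 1.0, 0.0],
        "p4": [-1.0, 0.0, 0.0],
    }
    print(most_similar(toy, "p1", top_n=2))  # → ['p2', 'p3']
```

The same vectors could be passed to other standard vector methods, such as clustering or classification, without any graph-specific code.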
Originality/value: This paper provides the first systematic comparison of classical science-mapping visualization with network-embedding-based visualization on a large-scale dataset. We show that a deep-learning-based network embedding model combined with t-SNE can provide a richer and more stable science map. We also designed a practical evaluation method for investigating and comparing maps.
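The abstract does not spell out the evaluation method. One standard generic check for global shape, assumed here and not taken from the paper, is Procrustes analysis, for example comparing an embedding map with the full-edge OpenOrd base map. It works in four steps:

1. Centre both layouts.
2. Scale both layouts to unit size.
3. Find the rotation or reflection that best aligns one layout with the other.
4. Report the leftover squared error.

t-SNE and force-directed layouts have arbitrary orientation, so this alignment is needed before coordinates can be compared. The sketch uses the closed-form 2D solution. `procrustes_disparity` is a hypothetical name; its value should correspond to the `disparity` returned by SciPy's `scipy.spatial.procrustes`.

```python
import math


def _normalize(points):
    """Centre a 2D point set on the origin and scale it to unit Frobenius norm."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centred))
    if norm == 0:
        raise ValueError("degenerate layout: all points coincide")
    return [(x / norm, y / norm) for x, y in centred]


def procrustes_disparity(layout_a, layout_b):
    """Squared error left after optimally translating, scaling, rotating and
    (if it helps) reflecting one layout onto the other; 0.0 means same shape."""
    if len(layout_a) != len(layout_b):
        raise ValueError("layouts must place the same nodes in the same order")
    a, b = _normalize(layout_a), _normalize(layout_b)
    # Entries of the 2x2 cross-covariance matrix M = A^T B.
    m11 = m12 = m21 = m22 = 0.0
    for (ax, ay), (bx, by) in zip(a, b):
        m11 += ax * bx
        m12 += ax * by
        m21 += ay * bx
        m22 += ay * by
    # For a 2x2 matrix, (sigma1 + sigma2)^2 = ||M||_F^2 + 2|det M|,
    # i.e. the best fit over both rotations and reflections.
    frob_sq = m11 ** 2 + m12 ** 2 + m21 ** 2 + m22 ** 2
    det = m11 * m22 - m12 * m21
    trace_sq = frob_sq + 2.0 * abs(det)
    return max(0.0, 1.0 - trace_sq)  # clamp tiny negative rounding error
```

Note that a low disparity says the overall arrangement of the maps agrees, while saying little about fine local neighbourhoods. A neighbourhood-overlap measure would therefore complement it.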

Key words: Scientometrics, Visualization, Essential science indicators, Bibliometric networks, Network embedding, Science mapping