Journal of Data and Information Science ›› 2020, Vol. 5 ›› Issue (3): 97-115. doi: 10.2478/jdis-2020-0020

• Research Papers

A Novel Method for Resolving and Completing Authors’ Country Affiliation Data in Bibliographic Records

Ba Xuan Nguyen1,2, Jesse David Dinneen1,3, Markus Luczak-Roesch1,4

  1. School of Information Management, Victoria University of Wellington, Wellington, New Zealand
  2. Posts and Telecommunications Institute of Technology, Ho Chi Minh City, Vietnam
  3. Berlin School of Library and Information Science, Humboldt-Universität zu Berlin, Berlin, Germany
  4. Te Pūnaha Matatini - The New Zealand Centre of Research Excellence for Complex Systems and Networks, New Zealand
  • Received: 2020-02-01 Revised: 2020-05-18 Accepted: 2020-05-18 Online: 2020-07-20 Published: 2020-09-04
  • Contact: Ba Xuan Nguyen


Purpose: Our work seeks to overcome data quality issues related to incomplete author affiliation data in bibliographic records in order to support accurate and reliable measurement of international research collaboration (IRC).

Design/methodology/approach: We propose, implement, and evaluate a method that leverages the Web-based knowledge graph Wikidata to resolve publication affiliation data to particular countries. The method is tested with general and domain-specific data sets.
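The general idea of resolving an affiliation string to a country with help from a knowledge graph can be sketched as follows. This is only an illustrative Python sketch, not the authors' implementation (which, per the limitations noted in this abstract, uses R). The function names, the tiny `KNOWN_COUNTRIES` table, and the two-step order (explicit country name first, knowledge-graph lookup second) are assumptions made for illustration. The SPARQL text relies on Wikidata's "country" property (P17) and would be sent to the Wikidata Query Service; here the lookup is injected as a function so the sketch runs without network access.

```python
from typing import Callable, Optional

# Tiny illustrative gazetteer (assumption); a real one would cover all
# countries and their common name variants.
KNOWN_COUNTRIES = {
    "new zealand": "New Zealand",
    "vietnam": "Vietnam",
    "germany": "Germany",
}


def build_country_query(org_label: str) -> str:
    """Build a SPARQL query asking Wikidata for the country (P17) of the
    item whose English label equals org_label."""
    escaped = org_label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?countryLabel WHERE { "
        f'?org rdfs:label "{escaped}"@en ; wdt:P17 ?country . '
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } '
        "} LIMIT 1"
    )


def resolve_country(
    affiliation: str,
    lookup: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Resolve an affiliation string to a country name, or None.

    lookup is a hypothetical callable that takes an organisation name and
    returns a country name, e.g. by running build_country_query against
    the Wikidata Query Service.
    """
    parts = [p.strip() for p in affiliation.split(",") if p.strip()]

    # Step 1: the country is often stated explicitly, usually last.
    for part in reversed(parts):
        hit = KNOWN_COUNTRIES.get(part.lower())
        if hit:
            return hit

    # Step 2: otherwise ask the knowledge graph about each segment.
    for part in parts:
        country = lookup(part)
        if country:
            return country

    return None
```

In use, `lookup` could be a stub backed by a dictionary for testing, or a function that submits the generated query to a SPARQL endpoint. Affiliations for which neither step succeeds return `None`, so they can be counted as unresolved rather than guessed.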

Findings: Our evaluation covers the magnitude of improvement, accuracy, and consistency. Results suggest the method is beneficial, reliable, and consistent, and thus a viable and improved approach to measuring IRC.

Research limitations: Though our evaluation suggests the method works with both general and domain-specific bibliographic data sets, it may perform differently with data sets not tested here. Further limitations stem from the use of the R programming language and R libraries for country identification, as well as from imbalanced data coverage and quality in Wikidata, which may also change over time.

Practical implications: The new method helps to increase the accuracy of IRC studies and provides a basis for further development into a general tool that enriches bibliographic data using the Wikidata knowledge graph.

Originality/value: This is the first attempt to enrich bibliographic data using a peer-produced, Web-based knowledge graph like Wikidata.

Key words: International research collaboration measurement, Bibliographic data, Country identification, Knowledge graphs, Wikidata, Open data