Journal of Data and Information Science, 2023, Vol. 8, Issue (1): 29-46. doi: 10.2478/jdis-2023-0003

• Research Paper •

Practical operation and theoretical basis of difference-in-difference regression in science of science: The comparative trial on the scientific performance of Nobel laureates versus their coauthors

Yurui Huang, Chaolin Tian, Yifang Ma

  1. Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen 518055, Guangdong, China
  • Received: 2022-12-30 Accepted: 2023-01-04 Online: 2023-02-20 Published: 2023-02-22
  • Contact: Yifang Ma


Purpose: In recent decades, with the availability of large-scale scientific corpus datasets, the difference-in-difference (DID) method has been increasingly used in science of science and bibliometrics studies. DID yields an unbiased estimate only if several assumptions hold, most importantly the common trend assumption. In this paper, we give a systematic demonstration of DID in the science of science and discuss potential ways to improve the accuracy of the method.
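
For orientation, the textbook two-group, two-period form of the DID regression and of the common trend assumption can be written as below. The notation (Treat, Post, the potential outcome Y(0)) is a standard convention assumed here for illustration and is not necessarily the notation used in the paper.

```latex
% DID regression: \beta_3 is the DID estimate of the treatment effect
Y_{it} = \beta_0 + \beta_1\,\mathrm{Treat}_i + \beta_2\,\mathrm{Post}_t
       + \beta_3\,(\mathrm{Treat}_i \times \mathrm{Post}_t) + \varepsilon_{it}

% Common trend assumption: absent treatment, both groups would have
% changed by the same amount on average
\mathbb{E}\!\left[Y_{i,\mathrm{post}}(0) - Y_{i,\mathrm{pre}}(0) \mid \mathrm{Treat}_i = 1\right]
= \mathbb{E}\!\left[Y_{i,\mathrm{post}}(0) - Y_{i,\mathrm{pre}}(0) \mid \mathrm{Treat}_i = 0\right]
```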

Design/methodology/approach: First, we review the statistical assumptions, model specification, and application procedures of the DID method. Second, to make the assumptions required for DID regression more plausible and to improve the accuracy of estimation, we introduce matching techniques as a pre-selection step for the DID design: control individuals are matched to treated ones so that the two groups are equivalent on observed variables before the intervention. Lastly, we perform a case study that estimates the effect of prizewinning on the scientific performance of Nobel laureates by comparing the yearly citation impact after the prizewinning year between Nobel laureates and their coauthors on the prizewinning work.
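
As a minimal sketch of the estimation step described above, the Python snippet below computes a two-group, two-period DID estimate in two equivalent ways: from the four group means and as the interaction coefficient of an ordinary least squares fit. The function names and the toy numbers are hypothetical and invented for illustration; they are not the paper's data, code, or results.

```python
import numpy as np

def did_from_means(y, treated, post):
    """DID as (change in treated group) minus (change in control group)."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    post = np.asarray(post, dtype=bool)
    change_treated = y[treated & post].mean() - y[treated & ~post].mean()
    change_control = y[~treated & post].mean() - y[~treated & ~post].mean()
    return change_treated - change_control

def did_from_ols(y, treated, post):
    """DID as the coefficient on treated*post in y ~ 1 + treated + post + treated*post."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(treated, dtype=float)
    p = np.asarray(post, dtype=float)
    X = np.column_stack([np.ones_like(y), d, p, d * p])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[3]  # interaction term

# Made-up toy outcomes (e.g. yearly citations), NOT data from the paper.
treated = [0, 0, 0, 0, 1, 1, 1, 1]   # 1 = treated group, 0 = control group
post    = [0, 0, 1, 1, 0, 0, 1, 1]   # 1 = after the intervention
y       = [10.0, 10.0, 13.0, 13.0, 12.0, 12.0, 16.5, 16.5]

effect_means = did_from_means(y, treated, post)
effect_ols = did_from_ols(y, treated, post)
print(effect_means, effect_ols)
```

In this saturated specification the two estimates coincide up to floating-point error. In practice a regression package would normally be used instead, so that standard errors, covariates, and fixed effects can be included.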

Findings: We describe the procedures for conducting a DID estimation and demonstrate that matching methods can improve the results. In the case study, we find no significant increase in citations for Nobel laureates compared with their prizewinning-work coauthors.
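
One matching method named in the keywords, coarsened exact matching (CEM), can be sketched as follows: covariates are coarsened into bins, units are grouped by their bin signature, and only strata containing both treated and control units are kept. The covariate names, cut points, and records below are hypothetical, chosen only to illustrate the idea under these assumptions, and do not reflect the paper's actual matching variables.

```python
from collections import defaultdict

def coarsen(value, edges):
    """Return the index of the bin that `value` falls into, given sorted cut points."""
    return sum(value >= e for e in edges)

def cem_match(units, bin_edges):
    """Group units by coarsened covariates; keep strata with both treated and control units."""
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for u in units:
        # Bin signature over all covariates, in a fixed (sorted) order
        key = tuple(coarsen(u[c], bin_edges[c]) for c in sorted(bin_edges))
        group = "treated" if u["treated"] else "control"
        strata[key][group].append(u["id"])
    return {k: v for k, v in strata.items() if v["treated"] and v["control"]}

# Made-up toy records (pre-intervention covariates), NOT data from the paper.
units = [
    {"id": "T1", "treated": True,  "career_age": 12, "pre_citations": 340},
    {"id": "T2", "treated": True,  "career_age": 30, "pre_citations": 90},
    {"id": "C1", "treated": False, "career_age": 14, "pre_citations": 410},
    {"id": "C2", "treated": False, "career_age": 8,  "pre_citations": 20},
    {"id": "C3", "treated": False, "career_age": 11, "pre_citations": 305},
]
# Hypothetical cut points for each covariate
bin_edges = {"career_age": [10, 20], "pre_citations": [100, 500]}

matched = cem_match(units, bin_edges)
print(matched)
```

Unmatched units (here T2 and C2) are dropped before the DID regression, which trades sample size for better pre-intervention comparability between the two groups.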

Research limitations: This study omits the rigorous mathematical derivation of DID and focuses on its practical aspects.

Practical implications: This work provides a worked example and potential guidelines for using the DID method in science of science and bibliometrics studies.

Originality/value: This study offers insights into the use of econometric tools in the science of science.

Key words: Science of Science, Bibliometrics, Difference-in-Difference, CEM, PSM, Nobel Prize