Journal of Data and Information Science, 2019, 4(4): 84-95
doi: 10.2478/jdis-2019-0022
Are Contributions from Chinese Physicists Undercited?
Jinzhong Guo1,2, Xiaoling Liu1, Liying Yang2, Jinshan Wu3,

Abstract:

Purpose: In this work, we examine whether there are scientific fields in which contributions from Chinese scholars have been undercited or overcited.

Design/methodology/approach: We do so by comparing the number of received citations with the IOF of publications in each scientific field from each country. The IOF is calculated by applying the modified closed system input-output analysis (MCSIOA) to the citation network. MCSIOA is a PageRank-like algorithm, which here means that citations from more influential subfields weigh more towards the IOF.

Findings: About 40% of subfields in physics in China are undercited, meaning that their net influence ranks are higher (better) than their direct ranks, while about 75% of subfields in the USA and Germany are undercited.

Research limitations: Only APS data is analyzed in this work. The expected citation influence is assumed to be represented by the IOF, and this can be wrong.

Practical implications: MCSIOA provides a measure of net influence. According to that measure, Chinese physicists’ publications are overall more likely to be overcited than undercited.

Originality/value: The issue of undercitation and overcitation is analyzed in this work using MCSIOA.

Key words: Input-Output Analysis; Scientific impact; Citation networks
1 Introduction

Evaluating scientific contribution from each country to each discipline has been a trending topic in Scientometrics. In practice, for science policymakers, it is also important to have proper recognition of their own country’s academic position, especially for the fast-developing countries like China. Often in this kind of study, researchers count the number of publications or received citations of the countries and in the fields of interest and rank them accordingly (King, 2004; Meho & Yang, 2007; Moed & Halevi, 2015).

However, a simple counting of publications and citations might underestimate or overestimate the contribution of a country. How the papers are cited, say by important milestone papers or by insignificant homework-like papers, should make a large difference. In fact, that is exactly the idea behind the PageRank algorithm (Brin & Page, 1998) and the Leontief Input-Output Analysis (LIOA) of economics (Leontief, 1941; Miller & Blair, 2009): a citation, or more generally a link, from a more influential node, which can be a paper, a web page, an economic sector, or a scientific field, should weigh more than one from an insignificant node. In LIOA, economists are interested in the same question: given an input-output table between economic sectors, which sector is more influential than the others? For precisely this purpose, Leontief proposed LIOA, which regards the economic sectors and the input-output relations among them as an open system, while taking final demanders and labor input as the external sector (Leontief, 1941). In Shen et al. (2016), we extended LIOA into a modified closed system input-output analysis (MCSIOA) to make the analysis applicable to an input-output table of any flow between any nodes, beyond economic sectors, and even to closed systems.

LIOA starts from the direct input-output coefficient matrix $B$, where $B_{ij}$ is the number of units of product $i$ needed to produce one unit of product $j$. The idea of LIOA is that for each unit of product $j$ required by the final demanders, denoted as $Y_j$, the economic system needs to produce first $Y_j$ itself, then the raw materials needed to produce $Y_j$, thus $BY_j$, then the raw materials needed to produce $BY_j$, thus $BBY_j = B^{2}Y_j$, and so on. Overall, one arrives at the famous Leontief inverse matrix $L$: $X = Y_j + BY_j + B^{2}Y_j + \dots = (1 - B)^{-1}Y_j \equiv LY_j$. Depending on the structure of $L$, the demand $Y_j$ for some sectors might lead to a large $X$ even when the final demanders need only a small amount of it. In terms of scientific influence, this is like saying that if a field A heavily cites a field B and the field B also heavily cites a field C, then the field A should be considered strongly influenced by the field C.
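As a small numerical illustration of the series above, the following sketch sums the powers of a made-up 2×2 coefficient matrix and compares the result with the closed-form inverse. The matrix values, the final demand vector, and the helper names are illustrative assumptions, not data or code from this study.

```python
# Hypothetical 2-sector coefficient matrix B (values are made up for illustration).
B = [[0.2, 0.3],
     [0.4, 0.1]]
Y = [1.0, 0.0]  # one unit of final demand for sector 0, none for sector 1


def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def leontief_series(B, Y, n_terms=200):
    """Approximate X = Y + BY + B^2 Y + ... by summing the first n_terms terms."""
    total = list(Y)
    term = list(Y)
    for _ in range(n_terms - 1):
        term = mat_vec(B, term)  # next order of indirect demand: B^k Y
        total = [t + s for t, s in zip(total, term)]
    return total


def leontief_inverse_2x2(B, Y):
    """Closed form (1 - B)^{-1} Y, written out for a 2x2 matrix only."""
    a, b = 1 - B[0][0], -B[0][1]
    c, d = -B[1][0], 1 - B[1][1]
    det = a * d - b * c
    return [(d * Y[0] - b * Y[1]) / det, (-c * Y[0] + a * Y[1]) / det]


X_series = leontief_series(B, Y)
X_closed = leontief_inverse_2x2(B, Y)
print(X_series, X_closed)
```

The series converges here because the largest eigenvalue of this particular B is below 1; both routes give approximately 1.5 and 0.667, so one unit of final demand for sector 0 requires a total output larger than one unit once indirect inputs are included.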

MCSIOA (Shen et al., 2016) follows the same spirit but works on closed systems, where there is no natural external sector like the final demanders in LIOA. Therefore, instead of a matrix inverse, which requires an external sector, MCSIOA uses the largest eigenvalue of the matrix and the corresponding eigenvector (called the largest eigen pair for simplicity), which is applicable to closed systems and also takes into account higher-order effects of the matrix. To see why the largest eigen pair includes higher-order effects, one can consider how the power method calculates it: starting from a random initial vector $X_0$, the iterative multiplication $X_n = BX_{n-1}$ (renormalized at each step) eventually leads to the largest eigen pair. We also refer the readers to Shen et al. (2016) for further details, which are also briefly explained in the Data and method section.
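The power method mentioned above can be sketched in a few lines. This is a minimal illustration for a small non-negative matrix, not the implementation used in this study; the matrix is made up, and the uniform starting vector (used instead of a random one so that the result is reproducible) and the sum-to-one normalization are assumptions of the sketch.

```python
def power_method(B, n_iter=500):
    """Estimate the largest eigen pair of a non-negative square matrix B."""
    n = len(B)
    x = [1.0 / n] * n  # positive starting vector whose entries sum to 1
    lam = 0.0
    for _ in range(n_iter):
        # One step of X_n = B X_{n-1}
        y = [sum(B[i][k] * x[k] for k in range(n)) for i in range(n)]
        lam = sum(y)  # because sum(x) == 1, this estimates the eigenvalue
        if lam == 0.0:
            break
        x = [v / lam for v in y]  # renormalize so the entries sum to 1 again
    return lam, x


# Made-up matrix with eigenvalues 4 and -1; the eigenvector for 4 is (2, 3).
B = [[1.0, 2.0],
     [3.0, 2.0]]
eigenvalue, eigenvector = power_method(B)
print(eigenvalue, eigenvector)
```

Each multiplication by B adds one more order of indirect connections, which is why the limit reflects higher-order effects and not only the direct entries of B.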

In this work, we will apply MCSIOA to an input-output table between the subfields of physics from major countries. We will regard the overall influence of each sector calculated by MCSIOA as the net influence and use the number of citations as the direct influence of each sector, and then compare the net and the direct influence. When a country is ranked higher according to the net influence than the one according to the direct influence, then we say publications from that country have been undercited. We say publications from a country are overcited if the net influence rank is lower than the rank according to the direct citation counts.

We are fully aware that the above approach is just one possible way to measure the net influence. Simple counting of citations is a limited influence measure, but it is easy to understand and easy to implement. The use of MCSIOA, on the other hand, requires further justification. However, from the above explanation of its spirit, from its success in economic applications, and from the previous scientometric study, we think it provides a reasonable measure of the net influence by taking not only the direct but also the indirect connections into consideration.

The rest of this manuscript is organized as follows. The data and method are described in Section 2. In Section 3, we present the main results. Conclusions and discussion are given in Section 4.

2 Data and method

The data we use in this work is provided to us by the American Physical Society (APS) and it includes all papers published in APS journals between 1977 and 2013. There are a total of 404,496 papers and 6,039,964 citations. All papers have been classified according to the Physics and Astronomy Classification Scheme (PACS) codes, which is a classification system of subfields of physics. We chose the first and third-level PACS code for our analysis. There are F = 1,281 subfields in total.

In total, 165 countries and regions are identified from the addresses of 337,768 authors. For non-USA addresses, the last part of the address string is usually a country name, so we match this last part to a list of countries. For USA addresses, there is often no country name in the address string; we then match the state names to a list of states in the USA. In very rare cases, we find a match of the last part of the address string in both lists, and in those cases we check each of them manually. We use full counting when assigning papers to countries. As illustrated in Figure 1, for each citation between a citing paper A and a cited paper B, we identify the corresponding fields $f^{A}$, $f^{B}$ and countries $c^{A}$, $c^{B}$, each of which can be more than one, and then add this citation to the citation count from $f^{A}\times c^{A}$ to $f^{B}\times c^{B}$,


Figure 1.

(a) Paper B, in field 75.10 (also 75.30, 75.40, and 75.50) and from Japan, is cited by paper A, in field 75.10 (also 75.30) and from the USA, Germany, and Japan. (b) Citations (from A to B and from A to C) are converted into a citation network among the countries × subfields of physics.

$x_{f^{A}\times c^{A}}^{f^{B}\times c^{B}}=x_{f^{A}\times c^{A}}^{f^{B}\times c^{B}}+1$
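The update rule above can be sketched as follows, using papers A and B from the caption of Figure 1. The sketch assumes that, under full counting, every (field, country) combination of the citing paper is linked to every combination of the cited paper; the dictionary keys `fields` and `countries` and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import product


def add_citation(x, citing, cited):
    """Add one citation from paper `citing` to paper `cited` under full counting.

    x[(citing_sector, cited_sector)] is incremented by 1 for every pair of
    (field, country) sectors of the two papers.
    """
    citing_sectors = list(product(citing["fields"], citing["countries"]))
    cited_sectors = list(product(cited["fields"], cited["countries"]))
    for src in citing_sectors:
        for dst in cited_sectors:
            x[(src, dst)] += 1


# Papers A and B as described in the caption of Figure 1.
paper_a = {"fields": ["75.10", "75.30"], "countries": ["USA", "Germany", "Japan"]}
paper_b = {"fields": ["75.10", "75.30", "75.40", "75.50"], "countries": ["Japan"]}

x = defaultdict(int)  # sparse citation counts between sectors
add_citation(x, paper_a, paper_b)
print(len(x))
```

With 2 × 3 citing sectors and 4 × 1 cited sectors, this single citation increments 24 entries of the sector-level table.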

We then keep the countries/regions with more than 1,000 publications in our record and group the remaining countries into a single category called “others”. In the end, there are C = 45 countries/regions in our record.
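A minimal sketch of this grouping step is given below. The function name, the input layout (one list of countries per paper), and the toy threshold are assumptions made for illustration only.

```python
from collections import Counter


def group_small_countries(paper_countries, min_papers=1000, other_label="others"):
    """Map each country to itself if it has more than `min_papers` papers,
    and to `other_label` otherwise.

    `paper_countries` holds one collection of countries per paper; a paper is
    counted once for each of its countries (full counting).
    """
    counts = Counter(c for countries in paper_countries for c in set(countries))
    return {c: (c if n > min_papers else other_label) for c, n in counts.items()}


# Toy example with a threshold of 2 papers instead of 1,000.
papers = [["USA"], ["USA", "China"], ["USA", "China"], ["China", "Iceland"]]
mapping = group_small_countries(papers, min_papers=2)
print(mapping)
```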

Once we have the input-output matrix $x=(x_{j}^{i})_{(C·F)×(C·F)}$, where each element $x_{j}^{i}$ represents the number of citations from j to i, we define the direct input-output coefficient matrix F as

$F_{j}^{i}=\frac{x_{j}^{i}}{\sum_{k}x_{i}^{k}}$

and perform the MCSIOA; that is, the net influence of sector j is (Shen et al., 2016)

$S_{IO}^{j}=1-λ_{MAX}^{-j}$

where $λ_{MAX}^{-j}$ is the largest eigenvalue of F(-j), which is the matrix obtained after removing the jth row and column of F. $S_{IO}^{j}$ is called the IOF in Shen et al. (2016). We then rank all sectors according to, respectively, their total number of received citations $X^j=\sum_{k}x_{k}^{j}$ and their IOF $S_{IO}^{j}$, and compare the two ranks to determine whether sector j is overcited or undercited.

The idea behind the definition of $S_{IO}^{j}$ can be seen from the following two facts. Firstly, the largest eigenvalue of F takes into account both direct and indirect connections in F. Secondly, the largest eigenvalue of the original F matrix is 1, and it can be regarded as the production efficiency of F, meaning that all the input is used when it is supplied according to the right combination, which is given by the corresponding eigenvector. Therefore, the largest eigenvalue of F(-j) also captures both direct and indirect connections, and it represents the fraction of production efficiency that remains after the sector j is removed, due to which some unmatched supplies will be wasted. Hence, if the sector j is well connected to the rest of the system and the rest of the system relies deeply on sector j, then the remaining production efficiency of the rest of the system will be low. Our previous study (Shen et al., 2016) has shown that influential sectors indeed lead to large $S_{IO}^{j}$.
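The definitions above can be put together in a short sketch: normalize the citation table into F, remove one sector at a time, and take one minus the largest eigenvalue of what remains. This is an illustration under stated assumptions, not the code used in this study: the 3-sector table is made up, `x[i][j]` stores the citations from j to i, every sector is assumed to cite at least once, and the eigenvalue is obtained with a simple power iteration.

```python
def largest_eigenvalue(A, n_iter=1000):
    """Largest eigenvalue of a non-negative square matrix by power iteration."""
    n = len(A)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
        lam = sum(w)  # valid estimate because the entries of v sum to 1
        if lam == 0.0:
            return 0.0
        v = [wi / lam for wi in w]
    return lam


def coefficient_matrix(x):
    """F[i][j] = x[i][j] / sum_k x[k][i], with x[i][j] = citations from j to i."""
    n = len(x)
    given = [sum(x[k][i] for k in range(n)) for i in range(n)]  # citations made by i
    return [[x[i][j] / given[i] for j in range(n)] for i in range(n)]


def iof(x):
    """Return S_IO^j = 1 - (largest eigenvalue of F without row and column j)."""
    F = coefficient_matrix(x)
    n = len(F)
    scores = []
    for j in range(n):
        keep = [i for i in range(n) if i != j]
        F_minus_j = [[F[a][b] for b in keep] for a in keep]
        scores.append(1.0 - largest_eigenvalue(F_minus_j))
    return scores


# Made-up citation table for three hypothetical sectors (rows: cited, columns: citing).
x = [[10, 8, 6],
     [2, 5, 1],
     [1, 1, 4]]
full_eigenvalue = largest_eigenvalue(coefficient_matrix(x))
scores = iof(x)
print(full_eigenvalue, scores)
```

In this toy table the largest eigenvalue of the full F is 1, as stated above, and sector 0, on which the other two sectors rely most, obtains the largest score.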

The difference between the net influence and the direct influence means that indirect citations do not follow the same pattern as direct citations. For example, if most of the citations are from the same sector (country × field), then the dissipation power of the sector is lower, and the net influence will also be lower. The net influence is also lower when most of the citations come from low-impact papers. On the other hand, if most citations come from other sectors and from high-impact papers, the net influence will be higher.

3 Results
3.1 The direct input-output flow among countries/regions

We first illustrate the input-output flow among countries/regions on a world map (Csomos, 2018). The nodes indicate countries/regions and the links represent the citations of scientific papers within APS. As shown in Figure 2, each link is color-coded. The red (green) part corresponds to the number of received citations (citing references). The thickness of the line corresponds to the number of citations: the thicker the line, the larger the number. For each line, the node at which the line starts in red (green) has received more (fewer) citations than it has given. For example, the edge between the USA and Europe is red near the USA and green near Europe, which means that the USA received more citations from Europe than the other way around. This line is also quite thick, indicating that there are many citations on this line. Furthermore, the ratio between the lengths of the red and green parts is set to the ratio between the citations received by the USA and the citations received by Europe. In this way, we encode a lot of information on this world map. We can see that the edge between Europe and Japan is red near Japan, while the edge between Europe and China is red near Europe. On each node c, we also plot the number of received citations and the calculated IOF, $S_{IO}^{c}$.


Figure 2.

The direct citations are shown on a world map. For each node c, we show the number of received citations and the calculated IOF, $S_{IO}^{c}$. On each edge between i and j on the world map, we code with the thickness of the line the values of both $x_{j}^{i}$ and $x_{i}^{j}$: $e_{j}^{i}$ is the line near i and $e_{i}^{j}$ is the line near j. Each edge is also color-coded: the line starting from i is red when $x_{j}^{i}$ > $x_{i}^{j}$, that is, when i receives more citations from j than it gives to j, and green otherwise.

We can see from the world map in Figure 2 that the USA is a source of knowledge for all other countries and regions, since the lines starting from the USA are all red. On the contrary, China is more like a sink, or a consumer of knowledge, since almost all the lines starting from China are green. Europe as a whole can be regarded as a proxy, where edges are mixed with red and green colors. Japan can also be seen as a proxy (Zhang et al., 2013).

In principle, we can draw such a world map for each subfield, but by looking at each subfield separately, we would miss the citations among subfields. Therefore, we next apply MCSIOA to the input-output table of countries × subfields to take those cross-field citations into account and to provide a measure of the net influence of each subfield in each country. We can then compare the direct and the net influence to answer the question raised in the introduction: which field in which country is undercited or overcited?

Based on this input-output matrix $x=(x_{j}^{i})_{(C·F)×(C·F)}$ and the MCSIOA analysis (Shen et al., 2016), we calculate the IOF of each of the 972 subfields of physics in each of the 45 countries and regions. We then rank all the countries × subfields together, once by IOF and once by direct citation count, and compare these two ranks.

3.2 The net influence (IOF) ranks of countries × subfields

From Figure 3, we see that many USA subfields are above the diagonal line. Thus, they are undercited; in other words, according to their net influence, there should be more citations to these USA subfields. Meanwhile, many Chinese subfields are below the diagonal line. Thus, they are overcited. To see how many subfields in each country are above or below the diagonal line, we count the percentage of undercited fields, calculate the relative and absolute ranking differences, and show the results below.


Figure 3.

The net influence rank of countries × subfields. Each country is represented by its flag. The ones with higher net influence rank than the direct rank are above the diagonal line, thus undercited, while they are under the diagonal line when their net influence ranks are lower, thus overcited.

The relative and absolute ranking differences, which are sometimes also called ranking mobility (D'Agostino & Dardanoni, 2009), are respectively defined as

$M_c=\sum_{f}\left(R_{c,f}^{(d)}-R_{c,f}^{(n)}\right)$ (4)

$|M|_c=\sum_{f}|R_{c,f}^{(d)}-R_{c,f}^{(n)}|$ (5)

where $R_{c,f}^{(d)}$ and $R_{c,f}^{(n)}$ are respectively the direct citation count rank and the net influence rank of the subfield f of the country c. For an overall undercited country, $M_c$ will be larger than zero. $|M|_c$ shows how large the difference between the direct rank and the net rank is.
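Equations (4) and (5) can be sketched as follows. The scores, country labels, and function names are made up for illustration; rank 1 is assigned to the largest value, and ties (absent in this toy example) would be broken by insertion order, which is an assumption of the sketch.

```python
from collections import defaultdict


def rank_descending(values):
    """Return {key: rank}, where rank 1 goes to the largest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {key: position for position, key in enumerate(ordered, start=1)}


def ranking_mobility(direct, net):
    """Return {country: (M_c, |M|_c)} from per-sector direct and net scores.

    Keys of `direct` and `net` are (country, subfield) tuples, and all sectors
    of all countries are ranked together.
    """
    r_d = rank_descending(direct)
    r_n = rank_descending(net)
    m = defaultdict(int)
    m_abs = defaultdict(int)
    for sector in direct:
        country = sector[0]
        diff = r_d[sector] - r_n[sector]  # > 0: net rank is better, i.e. undercited
        m[country] += diff
        m_abs[country] += abs(diff)
    return {c: (m[c], m_abs[c]) for c in m}


# Made-up scores for two hypothetical countries, A and B, and two subfields.
direct = {("A", "f1"): 100, ("A", "f2"): 40, ("B", "f1"): 80, ("B", "f2"): 60}
net = {("A", "f1"): 0.9, ("A", "f2"): 0.7, ("B", "f1"): 0.5, ("B", "f2"): 0.3}
result = ranking_mobility(direct, net)
print(result)
```

Here country A has $M_c$ = 2 (overall undercited) and country B has $M_c$ = -2 (overall overcited), while both have $|M|_c$ = 2.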

From Figure 4(a), we see that physicists from the USA have contributed to most subfields (972 out of 1,281) and that the percentage of undercited subfields is also high, 74%=$\frac{724}{972}$. Similar situations are found for Germany (844 out of 1,281, and 75%), France (806 out of 1,281, and 75%=$\frac{589}{806}$), and Britain (757 out of 1,281, and 75%=$\frac{547}{757}$). For China, the coverage is 696 out of 1,281, and the percentage of undercited fields is 40%. Both the coverage and the undercited percentage are much lower than those of the countries mentioned above. Furthermore, in Figure 4(b), we look into the relative and absolute ranking differences of each country. $M_c$ provides more detailed information than the undercited percentage. We find again that the USA, Germany, and France have high $M_c$, while China, Iran, and Korea are the countries with the lowest $M_c$, indicating that overall publications from those countries are overcited.


Figure 4.

(a) The number of undercited fields vs. the number of contributed fields for each country, from which the percentage of undercited fields can be read. (b) The relative and absolute ranking differences between the net and the direct ranks of each country c, which carry more information than the percentage of undercited fields alone.

For readers who are interested in which fields are undercited or overcited for each country, we provide a list of the top 10 undercited or overcited subfields of four countries, the USA, Germany, Japan, and China, in Table 1. We call on domain experts to examine this table more closely, and even the net and direct ranks of all major countries in each subfield. We will be happy to provide the corresponding data.

Table 1

Top 10 undercited or overcited subfields of USA and China.

4 Conclusion and discussion

In this work, using the IOF calculated from the general input-output analysis (Shen et al., 2016) as a measure of net influence and the direct citation counts as a measure of direct influence, we discuss the question of whether publications from the physicists of a country, especially China, have been undercited. We find that 75% of German subfields and 74% of USA subfields are undercited, while only 40% of Chinese subfields are undercited. We also provide a list of such highly undercited or overcited subfields for each country.

The data we analyzed in this work covers only physics and only the APS journals. The method is applicable and should be applied to other disciplines, or even to all disciplines together. After all, properly evaluating and recognizing the scientific contributions of our own countries can be meaningful not only to science policymakers and educators but also to individual researchers and even citizens.

The definition of overciting or under-citing in this work is based on the comparison between direct citation ranks and the IOF rank. We admit that this is not the only way to define overciting or under-citing.

Author contributions

Jinshan Wu (jinshanw@bnu.edu.cn), Liying Yang (yangly@mail.las.ac.cn) and Jinzhong Guo (guojinzhong123@163.com) designed this study. Guo and Xiaoling Liu (liuxiaoling.xmu@163.com) performed the analysis. All participated in writing up the manuscript.

The authors have declared that no competing interests exist.

References

[1]
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7), 107-117. doi:10.1016/S0169-7552(98)00110-X.
[2]
Csomos, G. (2018). A spatial scientometric analysis of the publication output of cities worldwide. Journal of Informetrics, 12(2), 547-566. doi:10.1016/j.joi.2018.05.003.
[3]
D'Agostino, M., & Dardanoni, V. (2009). The measurement of rank mobility. Journal of Economic Theory, 144(4), 1783-1803.
[4]
King, D.A. (2004). The scientific impact of nations. Nature, 430(6997), 311-316.
[5]
Leontief, W. (1941). The structure of the American economy, 1919-1929. Harvard University Press, Cambridge (new, enlarged edition, Oxford University Press, New York, 1951).
[6]
Meho, L.I., & Yang, K. (2007). Impact of data sources on citation counts and rankings of LIS faculty: Web of Science versus Scopus and Google Scholar. Journal of the American Society for Information Science and Technology, 58(13), 2105-2125.
[7]
Miller, R.E., & Blair, P.D. (2009). Input-output analysis: Foundations and extensions. Cambridge University Press.
[8]
Moed, H.F., & Halevi, G. (2015). Multidimensional assessment of scholarly research impact. Journal of the Association for Information Science and Technology, 66(10), 1988-2002. doi:10.1002/asi.23314.
[9]
Shen, Z., Yang, L., Pei, J., Li, M., Wu, C., Bao, J., Wei, T., Di, Z., Rousseau, R., & Wu, J. (2016). Interrelations among scientific fields and their relative influences revealed by an input-output analysis. Journal of Informetrics, 10(1), 82-97. doi:10.1016/j.joi.2015.11.002.