Journal of Data and Information Science, 2020, Vol. 5, Issue 4: 19-34. doi: 10.2478/jdis-2020-0034
Research Paper
Xiaoli Chen^{1,2,†}, Tao Han^{1,2}
Received: 2020-02-08
Revised: 2020-05-20
Accepted: 2020-06-11
Online: 2020-09-20
Published: 2020-11-20
Contact: Xiaoli Chen
Email: chenxl@mail.las.ac.cn
Table 1
Topic-entity distribution of the 8th citation generation.
Topic  Top 10 Topic Entities 

1  numerical analysis, simulation, Monte Carlo, artificial intelligence, dynamic programming, probability, principal component analysis, experiment, Markov chain, controllers 
2  algorithm, simulation, Markov chain, Monte Carlo method, Monte Carlo, artificial intelligence, principal component analysis, experiment, program optimization, artificial neural network 
3  artificial neural network, algorithm, fingerprint, genetic programming, biological neural networks, CPU cache, backpropagation, neural network simulation, gradient, discontinuous Galerkin method 
4  artificial neural network, Boltzmann machine, restricted Boltzmann machine, generative model, backpropagation, pixel, speech recognition, deep learning, MNIST database, mixture model 
5  fault tolerance, data mining, artificial neural network, brute force search, algorithm, asymptotically optimal algorithm, backpropagation 
Table 2
Disappearing topics over each citation generation.
Gen  Top 10 Disappearing Topic Entities 

1-2  generative model, Boltzmann machine, restricted Boltzmann machine, algorithm, inference, pixel, latent variable, gradient, Markov chain, approximation algorithm 
3-18  artificial neural network, algorithm, generative model, backpropagation, nonlinear system, deep learning, gradient, speech recognition, hidden Markov model, pixel 
19  artificial neural network, generative model, machine learning, algorithm, restricted Boltzmann machine, convolutional neural network, image resolution, value ethics, Boltzmann machine, gradient 
20  artificial neural network, hidden Markov model, Markov model, nonlinear system, backpropagation, unsupervised learning, speech recognition, time series, cluster analysis, cognition disorders 
21  artificial neural network, nonlinear system, generative model, factor analysis, MNIST database, anatomical layer, deep learning, mixture model, unit, gradient 
22  pixel, restricted Boltzmann machine, gradient, artificial neural network, speech recognition, Boltzmann machine, unsupervised learning, statistical model, deep learning, network architecture 
Table 3
Inherited topics over each citation generation.
Gen  Top 10 Inherited Topic Entities 

1-7  artificial neural network, algorithm, deep learning, backpropagation, speech recognition, hidden Markov model, neural network simulation, machine learning, test set, nonlinear system 
8-9  algorithm, simulation, Markov chain, Monte Carlo method, Monte Carlo, artificial intelligence, principal component analysis, experiment, program optimization, artificial neural network 
10-14  artificial neural network, backpropagation, generative model, Boltzmann machine, restricted Boltzmann machine, computer data storage, deep learning, speech recognition, feedforward neural network, nonlinear system 
15-16  simulation, Monte Carlo method, Monte Carlo, algorithm, numerical analysis, Markov chain, dynamic programming, solutions, coefficient, experiment 
17  artificial neural network, gradient, matching polynomial, nonlinear system, spline interpolation, hidden Markov model, generative model, approximation algorithm, Bayesian network, factor analysis 
18  simulation, Monte Carlo method, Monte Carlo, computation, computation action, silicon, gradient, distortion, Markov chain, algorithm 
19  artificial neural network, generative model, machine learning, algorithm, restricted Boltzmann machine, convolutional neural network, image resolution, value ethics, Boltzmann machine, gradient 
20  artificial neural network, hidden Markov model, Markov model, nonlinear system, backpropagation, unsupervised learning, speech recognition, time series, cluster analysis, cognition disorders 
21  artificial neural network, nonlinear system, generative model, factor analysis, MNIST database, anatomical layer, deep learning, mixture model, unit, gradient 
22  artificial intelligence, mitral valve prolapse syndrome, greater than, power dividers and directional couplers, supervised learning, performance, meal occasion for eating, plasminogen activator, nominal impedance, platelet glycoprotein 4 human 
Table 4
Innovative topics over each citation generation.
Gen  Top 10 Innovative Topic Entities 

1  artificial intelligence, computation, machine learning, biological neural networks, experiment, neural tube defects, convolutional neural network, synthetic data, simulation, neural networks 
2  machine learning, experiment, supervised learning, simulation, program optimization, sparse matrix, neural networks, neural network simulation, computation, unsupervised learning 
3  greater than, solutions, classification, estimation theory, Eisenstein’s criterion, pattern recognition, cluster analysis, neural tube defects, feature selection, sensor 
4  robot, Monte Carlo, Markov model, Eisenstein’s criterion, rule guideline, neural network simulation, coefficient, numerical analysis, dynamic programming, high and low level 
5  numerical analysis, artificial intelligence, heuristic, experiment, solutions, Eisenstein’s criterion, computation, requirement, sensor, coefficient 
6-21  artificial intelligence, Monte Carlo method, biological neural networks, neural network simulation, Bayesian network, Markov chain 
22  principal component analysis, food, principal component, obesity, platelet glycoprotein 4 human, red meat, whole grains, eaf2 gene, diabetes mellitus, exercise 