Rediscovering Don Swanson: The Past, Present and Future of Literature-based Discovery
Neil R. Smalheiser
Journal of Data and Information Science    2017, 2 (4): 43-64.   doi:10.1515/jdis-2017-0019
Accepted: 03 November 2009

Purpose: The late Don R. Swanson was well appreciated during his lifetime as Dean of the Graduate Library School at the University of Chicago, as winner of the American Society for Information Science Award of Merit for 2000, and as author of many seminal articles. In this informal essay, I will give my personal perspective on Don’s contributions to science, and outline some current and future directions in literature-based discovery that are rooted in concepts that he developed.

Design/methodology/approach: Personal recollections and literature review.

Findings: The Swanson A-B-C model of literature-based discovery has been successfully used by laboratory investigators analyzing their findings and hypotheses. It continues to be a fertile area of research in a wide range of application areas including text mining, drug repurposing, studies of scientific innovation, knowledge discovery in databases, and bioinformatics. Recently, additional modes of discovery that do not follow the A-B-C model have also been proposed and explored (e.g. so-called storytelling, gaps, analogies, link prediction, negative consensus, outliers, and revival of neglected or discarded research questions).

Research limitations: This paper reflects the opinions of the author and is neither a comprehensive nor a technically based review of literature-based discovery.

Practical implications: The general scientific public is still not aware of the availability of tools for literature-based discovery. Our Arrowsmith project site maintains a suite of discovery tools that are free and open to the public, as do BITOLA, which is maintained by Dmitar Hristovski, and Epiphanet, which is maintained by Trevor Cohen. Bringing user-friendly tools to the public should be a high priority, since even more than advancing basic research in informatics, it is vital that we ensure that scientists actually use discovery tools and that these are actually able to help them make experimental discoveries in the lab and in the clinic.

Originality/value: This paper discusses problems and issues which were inherent in Don’s thoughts during his life, including those which have not yet been fully taken up and studied systematically.

Big Metadata, Smart Metadata, and Metadata Capital: Toward Greater Synergy Between Data Science and Metadata
Jane Greenberg
Journal of Data and Information Science    2017, 2 (3): 19-36.   doi:10.1515/jdis-2017-0012
Accepted: 25 August 2017

Purpose: The purpose of the paper is to provide a framework for addressing the disconnect between metadata and data science. Data science cannot progress without metadata research. This paper takes steps toward advancing the synergy between metadata and data science, and identifies pathways for developing a more cohesive metadata research agenda in data science.

Design/methodology/approach: This paper identifies factors that challenge metadata research in the digital ecosystem, defines metadata and data science, and presents the concepts big metadata, smart metadata, and metadata capital as part of a metadata lingua franca connecting to data science.

Findings: The “utilitarian nature” and “historical and traditional views” of metadata are identified as two intersecting factors that have inhibited metadata research. Big metadata, smart metadata, and metadata capital are presented as part of a metadata lingua franca to help frame research in the data science research space.

Research limitations: There are additional, intersecting factors to consider that likely inhibit metadata research, and other significant metadata concepts to explore.

Practical implications: The immediate contribution of this work is that it may elicit response, critique, revision, or, more significantly, motivate research. The work presented can encourage more researchers to consider the significance of metadata as a research-worthy topic within data science and the larger digital ecosystem.

Originality/value: Although metadata research has not kept pace with other data science topics, there is little attention directed to this problem. This is surprising, given that metadata is essential for data science endeavors. This examination synthesizes original and prior scholarship to provide new grounding for metadata research in data science.

Science Mapping: A Systematic Review of the Literature
Chen Chaomei
Journal of Data and Information Science    2017, 2 (2): 1-40.   doi:10.1515/jdis-2017-0006
Accepted: 25 February 2017

Purpose: We present a systematic review of the literature concerning major aspects of science mapping to serve two primary purposes: First, to demonstrate the use of a science mapping approach to perform the review so that researchers may apply the procedure to the review of a scientific domain of their own interest, and second, to identify major areas of research activities concerning science mapping, intellectual milestones in the development of key specialties, evolutionary stages of major specialties involved, and the dynamics of transitions from one specialty to another.

Design/methodology/approach: We first introduce a theoretical framework of the evolution of a scientific specialty. Then we demonstrate a generic search strategy that can be used to construct a representative dataset of bibliographic records of a domain of research. Next, progressively synthesized co-citation networks are constructed and visualized to aid visual analytic studies of the domain’s structural and dynamic patterns and trends. Finally, trajectories of citations made by particular types of authors and articles are presented to illustrate the predictive potential of the analytic approach.

Findings: The evolution of science mapping research involves the development of a number of interrelated specialties. Four major specialties are discussed in detail in terms of four evolutionary stages: conceptualization, tool construction, application, and codification. Underlying connections between major specialties are also explored. The predictive analysis demonstrates citation trajectories of potentially transformative contributions.

Research limitations: The systematic review is primarily guided by citation patterns in the dataset retrieved from the literature. The scope of the data is limited by the source of the retrieval, i.e. the Web of Science, and the composite query used. An iterative query refinement is possible if one would like to improve the data quality, although the current approach serves our purpose adequately. More in-depth analyses of each specialty would be more revealing by incorporating additional methods such as citation context analysis and studies of other aspects of scholarly publications.

Practical implications: The underlying analytic process of science mapping serves many practical needs, notably bibliometric mapping, knowledge domain visualization, and visualization of scientific literature. In order to master such a complex process of science mapping, researchers often need to develop a diverse set of skills and knowledge that may span multiple disciplines. The approach demonstrated in this article provides a generic method for conducting a systematic review.

Originality/value: Incorporating the evolutionary stages of a specialty into the visual analytic study of a research domain is innovative. It provides a systematic methodology for researchers to achieve a good understanding of how scientific fields evolve, to recognize potentially insightful patterns from visually encoded signs, and to synthesize various information so as to capture the state of the art of the domain.

Patent Citations Analysis and Its Value in Research Evaluation: A Review and a New Approach to Map Technology-relevant Research
van Raan Anthony F.J.
Journal of Data and Information Science    2017, 2 (1): 13-50.   doi:10.1515/jdis-2017-0002
Accepted: 03 December 2016

Purpose: First, to review the state of the art in patent citation analysis, particularly characteristics of patent citations to scientific literature (scientific non-patent references, SNPRs). Second, to present a novel mapping approach to identify technology-relevant research based on the papers cited by and referring to the SNPRs.

Design/methodology/approach: In the review part we discuss the context of SNPRs such as the time lags between scientific achievements and inventions. Patent-to-patent citation is also addressed, particularly because this type of patent citation analysis is a major element in the assessment of the economic value of patents. We also review the research on the role of universities and researchers in technological development, with important issues such as universities as sources of technological knowledge and inventor-author relations. We conclude the review part of this paper with an overview of recent research on mapping and network analysis of the science and technology interface and of technological progress in interaction with science. In the second part we apply new techniques for the direct visualization of the cited and citing relations of SNPRs, the mapping of the landscape around SNPRs by bibliographic coupling and co-citation analysis, and the mapping of the conceptual environment of SNPRs by keyword co-occurrence analysis.

Findings: We discuss several properties of SNPRs. Only a small minority of publications covered by the Web of Science or Scopus are cited by patents, about 3%-4%. However, for publications based on university-industry collaboration the share of SNPRs is considerably higher, around 15%. The proposed mapping methodology based on a “second order SNPR approach” enables a better assessment of the technological relevance of research.

Research limitations: The main limitation is that a more advanced merging of patent and publication data, in particular unification of author and inventor names, is still a necessity.

Practical implications: The proposed mapping methodology enables the creation of a database of technology-relevant papers (TRPs). In a bibliometric assessment the publications of research groups, research programs or institutes can be matched with the TRPs and thus the extent to which the work of groups, programs or institutes is relevant for technological development can be measured.

Originality/value: The review part examines a wide range of findings in the research of patent citation analysis. The mapping approach to identify a broad range of technology-relevant papers is novel and offers new opportunities in research evaluation practices.

Under-reporting of Adverse Events in the Biomedical Literature
Ronald N. Kostoff
Journal of Data and Information Science    2016, 1 (4): 10-32.   doi:10.20309/jdis.201623
Accepted: 27 September 2016

Purpose: To address the under-reporting of research results, with emphasis on the under-reporting or distorted reporting of adverse events in the biomedical research literature.
Design/methodology/approach: A four-step approach is used: (1) to identify the characteristics of literature that make it adequate to support policy; (2) to show how each of these characteristics becomes degraded to make inadequate literature; (3) to identify incentives to prevent inadequate literature; and (4) to show policy implications of inadequate literature.
Findings: This review has provided reasons for, and examples of, adverse health effects of myriad substances (1) being under-reported in the premier biomedical literature, or (2) entering this literature in distorted form. Since there is no way to gauge the extent of this under-reporting or distorted reporting, the quality and credibility of the ‘premier’ biomedical literature is unknown. Therefore, any meta-analyses or scientometric analyses of this literature will have unknown quality and credibility. The most sophisticated scientometric analysis cannot compensate for a highly flawed database.
Research limitations: The main limitation is in identifying examples of under-reporting. There are many incentives for under-reporting and few disincentives.
Practical implications: Almost all research publications addressing causes of disease, treatments for disease, diagnoses of disease, scientometrics of disease and health issues, and other aspects of healthcare build upon previously published healthcare-related research. Many researchers will not have laboratories or other capabilities to replicate or validate the published research, and depend almost completely on the integrity of this literature. If the literature is distorted, then future research can be misguided, and health policy recommendations can be ineffective or worse.
Originality/value: This review has examined a much wider range of technical and nontechnical causes for under-reporting of adverse events in the biomedical literature than previous studies.
