Journal of Data and Information Science ›› 2021, Vol. 6 ›› Issue (3): 123-145. doi: 10.2478/jdis-2021-0020

• Regular Papers •

RDFAdaptor: Efficient ETL Plugins for RDF Data Process

Jiao Li1*, Guojian Xian1,2*, Ruixue Zhao1,2, Yongwen Huang1, Yuantao Kou1,2, Tingting Luo1, Tan Sun2,3

  1. Agricultural Information Institute of CAAS, Beijing 100081, China
  2. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
  3. Chinese Academy of Agricultural Sciences, Beijing 100081, China
  • Received: 2020-12-18 Revised: 2021-02-26 Accepted: 2021-03-09 Online: 2021-08-20 Published: 2021-07-22
  • Contact: Tan Sun E-mail: suntan@caas.cn

Abstract:

Purpose: The interdisciplinary nature and rapid development of the Semantic Web have led to the mass publication of RDF data in a large number of widely accepted serialization formats, creating the need for RDF data processing tailored to specific purposes. This paper assesses the chief challenges of RDF data endpoints and introduces RDFAdaptor, a set of plugins for RDF data processing that covers the whole life cycle with high efficiency.

Design/methodology/approach: RDFAdaptor is built on the prominent ETL tool Pentaho Data Integration, which provides a user-friendly, intuitive interface and connects to a wide range of data sources and formats. It reuses the Java framework RDF4J as middleware to access data repositories, SPARQL endpoints, and all leading RDF database solutions with SPARQL 1.1 support. With configuration templates for multi-scenario applications, the plugins can be put to use with little effort, and they can extend data processing tasks in other services or tools to supply missing functions.

Findings: The proposed comprehensive RDF ETL solution, RDFAdaptor, provides an easy-to-use and intuitive interface, supports data integration and federation over multi-source heterogeneous repositories or endpoints, and manages linked data in hybrid storage mode.

Research limitations: The plugin set supports several application scenarios of RDF data processing, but error detection and checking, as well as interaction with other graph repositories, remain to be improved.

Practical implications: The plugin set provides a user interface and configuration templates that make it usable in various applications, including RDF data generation, multi-format data conversion, remote RDF data migration, and RDF graph update during semantic query processing.

Originality/value: This is the first attempt to develop components, rather than full systems, that can extract, consolidate, and store RDF data on the basis of a data warehousing environment with a mature ecosystem.

Key words: RDF ETL solution, RDF data processing, Linked data, Portable plugins