Welcome to TripleGeo: An open-source tool for extracting geospatial features into RDF triples
TripleGeo is a utility developed by the Institute for the Management of Information Systems at Athena Research Center under the EU/FP7 project GeoKnow: Making the Web an Exploratory for Geospatial Knowledge. This general-purpose, open-source tool can be used for integrating features from geospatial databases into RDF triples.
TripleGeo is based on the open-source utility geometry2rdf. However, this earlier tool (2010) has been substantially modified and enhanced to extract non-geographical attributes and to interact with diverse geographical and triple formats. TripleGeo is written in Java and is still under development; more enhancements will be included in future releases. All currently supported features have been tested and work smoothly on both MS Windows and Linux platforms.
The Java source code for TripleGeo is freely available from here.
Converting geospatial features into triples
From a user’s perspective, the utility works from the command line in a transparent fashion according to some preconfigured settings. Execution is parameterized with a configuration file that declares user preferences for the conversion. TripleGeo provides the following functionality:
- It can take as input ESRI shapefiles, as well as spatial tables hosted in major DBMSs.
- It currently handles most common spatial data types, including points, (multi)linestrings and (multi)polygons.
- It can perform on-the-fly transformation of a given dataset into another spatial reference system.
- It can export geometries in several serialized formats, including WKT as prescribed by the recent GeoSPARQL standard.
When initiated, this process iterates through all features in the original dataset and emits a series of triples per record. Every geometric feature is turned into properly formatted triple(s), according to the specified vocabulary. Additional descriptive attributes can be extracted, including identifiers, names, or feature types. For the time being, such attributes are exported as literals, without taking into account any underlying ontology.
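The per-record conversion described above can be illustrated with a minimal, self-contained sketch. It is not TripleGeo's actual code: the class name, the example.org namespace, the URI pattern for geometries, and the use of rdfs:label for names are all assumptions made for illustration. Only the GeoSPARQL terms (hasGeometry, asWKT, wktLiteral) come from the standard itself.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: not TripleGeo's actual classes or URI scheme.
class FeatureToTriplesSketch {

    // GeoSPARQL namespace and the RDFS label property.
    static final String GEO = "http://www.opengis.net/ont/geosparql#";
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

    // Hypothetical namespace for the generated resources.
    static final String NS = "http://example.org/resource/";

    // Emits three N-Triples lines for one feature: a link to its geometry,
    // the geometry as a WKT literal, and the name as a plain literal.
    // A WKT literal without a CRS URI is read as CRS84 (longitude, latitude).
    static List<String> toTriples(String id, String name, String wkt) {
        String feature = "<" + NS + id + ">";
        String geometry = "<" + NS + id + "/geometry>";
        List<String> triples = new ArrayList<String>();
        triples.add(feature + " <" + GEO + "hasGeometry> " + geometry + " .");
        triples.add(geometry + " <" + GEO + "asWKT> \"" + wkt
                + "\"^^<" + GEO + "wktLiteral> .");
        triples.add(feature + " <" + RDFS_LABEL + "> \"" + escape(name) + "\" .");
        return triples;
    }

    // Escapes backslashes and double quotes only; a full N-Triples writer
    // would also handle newlines and other control characters.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Two made-up records standing in for rows of a shapefile or table.
        String[][] records = {
            {"1", "Acropolis", "POINT(23.7257 37.9715)"},
            {"2", "Ring road", "LINESTRING(23.70 37.90, 23.80 38.00)"}
        };
        for (String[] r : records) {
            for (String triple : toTriples(r[0], r[1], r[2])) {
                System.out.println(triple);
            }
        }
    }
}
```

Each record yields a fixed, small group of triples, which is why the total number of triples grows linearly with the number of features in the dataset.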
Architecture
TripleGeo has been implemented with several Java classes in a modular fashion as illustrated in the following flow diagram:
- Connectors to source data are required in order to access geometric features. In the case of a DBMS, this is possible thanks to suitable JDBC drivers. With respect to shapefiles, the integrated GeoTools library provides all required functionality.
- A configuration file lists the properties that control each stage of the conversion: how the input source will be accessed, which data is involved, what geometric representation should be used, whether geometries must be transformed into another reference system, and the output format.
- A parser iterates through each input record and converts geometries into a suitable representation according to user specifications. It also consumes non-spatial attribute values (e.g., types, names) of the features involved and emits properly formatted literals.
- A Jena model is used to retain in memory all state information consisting of the collection of generated triples.
- Optionally, reprojection of geometries into another spatial reference system is also available.
- Export of generated triples into files is performed by the Jena API. The output is written into a single file, in one of several triple formats as chosen by the user.
Input
The current version of the TripleGeo utility can access geometries from:
- ESRI shapefiles, a widely used file-based format for storing geospatial features.
- Several geospatially-enabled DBMSs, including:
- Oracle Spatial
- PostGIS (spatial module for PostgreSQL)
- MySQL
- IBM DB2 with Spatial Extender.
Geospatial data must reside in a single table (in case of a database) or one shapefile. Currently, there is no support for combining information from several sources (e.g., by joining two or more tables).
Output
In terms of output serializations, triples can be obtained in one of the following formats:
- RDF/XML (default)
- RDF/XML-ABBREV
- N-TRIPLES
- N3
- TURTLE (also abbreviated as TTL).
Concerning geospatial representations, triples can be exported according to:
- the GeoSPARQL standard for several geometric types (including points, linestrings, and polygons)
- the WGS84 RDF Geoposition vocabulary for point features
- the Virtuoso RDF vocabulary for point features.
Results are written into a local file, so that they can be readily imported into a triple store.
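To make the difference between the two point vocabularies listed above concrete, the following sketch prints the same point in each form. It is an illustration under stated assumptions, not TripleGeo's output routine: the class name and the example.org subject are made up, typing the coordinates as xsd:double is an assumption, and the Virtuoso form (a geo:geometry property with a virtrdf:Geometry typed literal) follows the commonly documented convention and should be checked against the Virtuoso version in use.

```java
// Illustrative sketch only: shows two point encodings as N-Triples strings.
class PointVocabularySketch {

    static final String WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#";
    static final String XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double";
    static final String VIRT_GEOMETRY =
            "http://www.openlinksw.com/schemas/virtrdf#Geometry";

    // WGS84 Geoposition: latitude and longitude as two separate literals.
    // The xsd:double datatype is an assumption of this sketch.
    static String[] wgs84(String subject, double lon, double lat) {
        return new String[] {
            subject + " <" + WGS84 + "lat> \"" + lat + "\"^^<" + XSD_DOUBLE + "> .",
            subject + " <" + WGS84 + "long> \"" + lon + "\"^^<" + XSD_DOUBLE + "> ."
        };
    }

    // Virtuoso: one typed literal holding the point in "POINT(lon lat)" order.
    static String virtuoso(String subject, double lon, double lat) {
        return subject + " <" + WGS84 + "geometry> \"POINT(" + lon + " " + lat
                + ")\"^^<" + VIRT_GEOMETRY + "> .";
    }

    public static void main(String[] args) {
        String subject = "<http://example.org/resource/1>"; // hypothetical URI
        for (String triple : wgs84(subject, 23.7257, 37.9715)) {
            System.out.println(triple);
        }
        System.out.println(virtuoso(subject, 23.7257, 37.9715));
    }
}
```

Both forms can only describe points, which is consistent with the list above restricting them to point features; linestrings and polygons require the GeoSPARQL representation.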
Configuration settings
Before attempting any conversion using TripleGeo, a configuration file must be prepared. This file lists crucial properties that define how input data will be accessed, where they will be exported and into which format, as well as optional features (e.g., reprojection into another spatial reference system).
These settings include properties concerning:
- Input and output parameters, including paths for necessary files and the output triple format.
- Target RDF vocabulary for geometric representation.
- Database credentials and features (when accessing a DBMS) OR shapefile features (from the file system)
- Namespace parameters and prefixes for the resources that will be generated as well as for the utilized ontology.
- Spatial Reference Systems, when transformations should take place for geometries.
- Optional parameters (e.g., default language for string literals).
You may consult these sample configurations that cover several indicative cases in terms of data access and supported geometric types.
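As a rough picture of what such a file might look like and how a Java program could read it, here is a minimal sketch based on java.util.Properties. Every property name in it (inputFile, outputFile, outputFormat, sourceRS, targetRS, defaultLang) is hypothetical and chosen only to mirror the categories listed above; the real names are those found in the sample configuration files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustrative sketch only: all property names below are hypothetical.
class ConfigSketch {

    // A made-up configuration in key = value form, one property per line.
    static final String SAMPLE =
          "# Input and output\n"
        + "inputFile = ./data/points.shp\n"
        + "outputFile = ./output/points.ttl\n"
        + "outputFormat = TURTLE\n"
        + "# Spatial reference systems (optional)\n"
        + "sourceRS = EPSG:2100\n"
        + "targetRS = EPSG:4326\n"
        + "# Other optional parameters\n"
        + "defaultLang = en\n";

    // Parses configuration text into a Properties object.
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read configuration", e);
        }
        return p;
    }

    // Reprojection is needed only when both reference systems are
    // declared and they differ.
    static boolean needsReprojection(Properties p) {
        String source = p.getProperty("sourceRS");
        String target = p.getProperty("targetRS");
        return source != null && target != null
                && !source.trim().equalsIgnoreCase(target.trim());
    }

    public static void main(String[] args) {
        Properties p = load(SAMPLE);
        // Fall back to RDF/XML when no format is declared.
        System.out.println("Format: " + p.getProperty("outputFormat", "RDF/XML"));
        System.out.println("Reproject: " + needsReprojection(p));
    }
}
```

Treating the reference systems as optional keys matches the description above, where reprojection is an optional feature rather than a required step.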
Execution
In order to use TripleGeo for extracting triples from a spatial dataset, the user should follow these steps:
- Open a terminal window and navigate to the directory where TripleGeo has been extracted. Normally, this folder includes a lib/ subdirectory with the required libraries, as well as a configuration file (e.g., named options.conf).
- Verify that Java JRE (or SDK) version 1.7 or later is installed. The currently installed version of Java can be checked from the command line using:
java -version
- Next, check all properties in the required configuration file, as explained in the Configuration settings section above. This file must be located in the same folder as the executable TripleGeo.jar package. If triples are to be extracted from a DBMS, make sure that the correct credentials are given in the configuration file.
- If triples are to be extracted from ESRI shapefiles, give the following command (the classpath is written for MS Windows; on Linux, replace ; with : and enclose the classpath in quotes):
java -cp lib/*;TripleGeo.jar eu.geoknow.athenarc.triplegeo.ShpToRdf options.conf
- Alternatively, if triples will be extracted from a geospatially-enabled DBMS (e.g., Oracle Spatial), give the following command:
java -cp lib/*;TripleGeo.jar eu.geoknow.athenarc.triplegeo.wkt.RdbToRdf options.conf
- While conversion is running, it periodically issues notifications about its progress. Note that for large datasets (i.e., hundreds of thousands of records), conversion may take several minutes.
As soon as processing is finished and all triples are written into a file, the user is notified about the total number of extracted triples and the overall execution time.
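When the converter is launched from another Java program rather than typed by hand, the platform-specific classpath separator can be taken from File.pathSeparator. The sketch below is only an illustration: the class and method names are made up, while the main class name, jar name, and configuration file name are the ones used in the commands above.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: builds the launch command for the current platform.
class LaunchCommandSketch {

    // Assembles the argument list; the separator is ";" on MS Windows
    // and ":" on Linux.
    static List<String> command(String mainClass, String configFile,
                                String separator) {
        String classpath = "lib/*" + separator + "TripleGeo.jar";
        return Arrays.asList("java", "-cp", classpath, mainClass, configFile);
    }

    public static void main(String[] args) {
        List<String> cmd = command("eu.geoknow.athenarc.triplegeo.ShpToRdf",
                "options.conf", File.pathSeparator);
        System.out.println(cmd);
        // To actually run the conversion from the TripleGeo directory:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list to ProcessBuilder bypasses the shell, so the lib/* wildcard and the separator reach the java launcher unchanged and need no quoting.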
Resources for testing
- The Java source code for TripleGeo is freely available from here.
- Precompiled Java executable binaries for the TripleGeo utility can be freely downloaded as a zip file. Note that in order to execute TripleGeo directly from these binaries, Java JRE (or SDK) 1.7 or later must be installed and properly configured on your local machine.
- Sample geographic datasets for testing are available in ESRI shapefile format.
- Sample configuration files for several cases are also available here. You can edit any of these files in order to prepare suitable configuration settings for accessing a geospatial repository (from shapefile or DBMS) before executing TripleGeo on its contents.
License
The contents of this project are licensed under the GPL v3 License (UPDATE).