Spatial dataset discovery is essential to any spatial data infrastructure (SDI) and is the first step an end user takes to acquire spatial data. Spatial metadata can simplify and speed up the whole process, but only if the stored metadata records are consistent with the datasets they describe. To keep them consistent, the data provider is expected to monitor changes in each dataset continuously and commit updates, which is a time-consuming task. Updates to some metadata elements, such as the extent or the coordinate reference system (CRS), can be automated, but others, such as the keyword set, cannot. Keywords are of major importance for discovery: a metadata record can be found reliably only if its keyword set is complete.
Most implementations of metadata discovery services are based on the OGC Catalogue Service for the Web (CS-W) standard. CS-W provides a basic operation for metadata record retrieval, GetRecords, which accepts complex OGC Filter expressions with variables, wildcards, and logical and comparison operators. In practice, however, metadata records are usually discovered only by matching the entered phrase against each record's keyword set. When the phrase is too generic there are too many results, and when it is too specific there are none. Synonyms, thematically related words and multilingualism cause further problems. With plain keyword matching, the only remedy is to update the keyword set of each metadata record manually, which is labour-intensive. As a result, spatial information is in practice accessible only to specialists.
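To make the keyword-matching step concrete, the following minimal sketch builds an OGC Filter 1.1 expression that matches a record when any of several phrases occurs in its keywords. It is an illustration only, not the query used by the implementation described here: the queried property name `dc:subject` and the function name `keyword_filter` are assumptions.

```python
import xml.etree.ElementTree as ET

OGC_NS = "http://www.opengis.net/ogc"
ET.register_namespace("ogc", OGC_NS)


def keyword_filter(terms, property_name="dc:subject"):
    """Return an OGC Filter (as an XML string) matching any of the terms.

    property_name is an assumption; the queryable used for keywords
    depends on the catalogue's metadata profile.
    """
    if not terms:
        raise ValueError("at least one term is required")
    flt = ET.Element(f"{{{OGC_NS}}}Filter")
    # ogc:Or needs at least two operands, so a single term is not wrapped.
    parent = flt if len(terms) == 1 else ET.SubElement(flt, f"{{{OGC_NS}}}Or")
    for term in terms:
        like = ET.SubElement(
            parent,
            f"{{{OGC_NS}}}PropertyIsLike",
            {"wildCard": "%", "singleChar": "_", "escapeChar": "\\"},
        )
        ET.SubElement(like, f"{{{OGC_NS}}}PropertyName").text = property_name
        # Surround the term with wildcards for a substring match.
        ET.SubElement(like, f"{{{OGC_NS}}}Literal").text = f"%{term}%"
    return ET.tostring(flt, encoding="unicode")


if __name__ == "__main__":
    print(keyword_filter(["river", "stream"]))
```

Such a filter would be embedded in the constraint of a GetRecords request. Supplying the list of terms by hand is exactly the step that does not scale when synonyms and several languages are involved.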
This project proposes a solution to this problem based on Semantic Web technologies, in particular the RDF data model and the SKOS standard. A dictionary (thesaurus) of concepts thematically related to the spatial datasets can address both synonymy and multilingualism, so metadata records can be discovered regardless of the language used. The edge distance between two concepts in the thesaurus tree can be used to determine the relevance of the entered phrase to a metadata record. Concepts from thesauri can also be linked to other resources and become part of Linked Data. A properly built thesaurus can be reused and linked to many metadata records in many repositories.
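The edge-distance idea can be sketched as a breadth-first search over the broader/narrower links of a thesaurus. The sample concepts, the function names and the scoring formula 1 / (1 + distance) are assumptions made for illustration; the abstract does not state how the implementation converts distance into a relevance score.

```python
from collections import deque

# Hypothetical thesaurus fragment: concept -> its broader concept.
BROADER = {
    "river": "hydrography",
    "lake": "hydrography",
    "hydrography": "environment",
    "road": "transport",
}


def edge_distance(broader, start, goal):
    """Number of edges on the shortest path between two concepts, or None."""
    adjacency = {}
    for narrower, wider in broader.items():
        # Treat each broader/narrower link as an undirected edge.
        adjacency.setdefault(narrower, set()).add(wider)
        adjacency.setdefault(wider, set()).add(narrower)
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the concepts are in disconnected parts of the thesaurus


def relevance(distance):
    """Assumed scoring: 1.0 for the same concept, decreasing with distance."""
    return 0.0 if distance is None else 1.0 / (1.0 + distance)
```

With this fragment, "river" and "lake" are two edges apart (through "hydrography"), while "river" and "road" are unconnected and score zero.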
The current practical results of the research include a working implementation, which consists of:
- A thesaurus creation and publication application. Thesauri are authored in spreadsheets (Microsoft Excel and OpenDocument formats) and then loaded into the application, which converts them to RDF/XML based on the SKOS standard. During the transformation all labels are stemmed and indexed with Apache Lucene. The converted documents are loaded into an RDF repository (currently OpenRDF Sesame), where the thesauri can be queried with SPARQL.
- A semantic REST and WPS web service, connected to the RDF repository and to a CS-W metadata repository (currently deegree CSW). The service acts as a proxy between the web client and the metadata repository: it receives requests from the client, enriches them using the thesauri, and builds and sends a valid CS-W query. The results from the repository are ranked by edge-distance relevance and returned to the client.
- A web client, a multilingual front end for the user. It accepts the search phrase and presents the discovery results. It also provides instant semantic suggestions, which use the thesauri to propose phrases while the user is typing.
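The enrichment and suggestion steps listed above can be illustrated with a small in-memory stand-in for the thesaurus. This is a sketch under stated assumptions, not the service's code: the concept identifiers, the labels and the functions `expand` and `suggest` are hypothetical, and simple case folding stands in for the Lucene stemming and the SPARQL lookups of the real system.

```python
# Hypothetical thesaurus: concept id -> {language code: preferred label}.
CONCEPTS = {
    "c:river": {"en": "river", "pl": "rzeka"},
    "c:road": {"en": "road", "pl": "droga"},
    "c:railway": {"en": "railway", "pl": "kolej"},
}


def normalise(text):
    """Crude stand-in for stemming: trim and case-fold the text."""
    return text.strip().casefold()


def expand(phrase):
    """Return all labels, in every language, of concepts matching the phrase."""
    wanted = normalise(phrase)
    labels = set()
    for by_lang in CONCEPTS.values():
        if wanted in {normalise(label) for label in by_lang.values()}:
            labels.update(by_lang.values())
    return sorted(labels)


def suggest(prefix, lang, limit=5):
    """Return labels in the given language that start with the typed prefix."""
    start = normalise(prefix)
    hits = [
        by_lang[lang]
        for by_lang in CONCEPTS.values()
        if lang in by_lang and normalise(by_lang[lang]).startswith(start)
    ]
    return sorted(hits)[:limit]
```

A proxy built along these lines would pass the labels returned by `expand` to the catalogue query, so that a phrase typed in Polish also finds records whose keywords are in English, while `suggest` would back the client's type-ahead field.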
An implementation of the Semantic Catalogue Service is available at the Wroclaw Board of Surveying, Cartography and Municipal Cadastre.