Automated Metadata Interpretation to Assist in the Use of Unfamiliar GIS Data Sources

Brandon Plewe and Steven Johnson
Department of Geography
Brigham Young University
 
The Open GIS Consortium (OGC) has been actively developing technical specifications that allow Geographic Information System (GIS) software to incorporate data from heterogeneous sources, both internal and external to an organization. Recently, several vendors have introduced software based on the OGC standards that begins to fulfill this vision. As with any technological advance, interoperability can greatly enhance our work, but it can also be misused, whether intentionally or not.

One abuse that interoperability makes more likely is the use of data sets for applications to which they are not well suited. This is largely because GIS users will more often be using and combining data sets with which they are not intimately familiar. Examples of such problems include creating large-scale maps from low-accuracy data, combining data sets digitized from maps of very different ages, and misinterpreting attributes and classification schemes.

Metadata has long been touted as the solution to this problem: if users read the metadata, they will become familiar with the data set and will be able to make good judgments about its proper use. However, metadata records based on standards such as the FGDC Content Standard for Digital Geospatial Metadata tend to be extremely complex and difficult for anyone but their creator to read and understand. On the other hand, that same complex, rigid structure is well suited to automatic parsing by computers.

This research project is building a prototype system for the automatic interpretation and use of metadata in a standard GIS environment (ESRI ArcView). As themes are added to a GIS project, their associated metadata records are also retrieved, and pertinent elements are parsed from them. This information includes the age of the data set and its sources, the projection and coordinate system, general horizontal and vertical accuracy, quality of source information, subject matter, spatial footprint, and explanations of attributes and associated classification systems. The initial prototype focuses on issues of scale and accuracy (e.g. comparing the intended scales of two themes being displayed together). The metadata records are expected to be text files that follow the FGDC-standard SGML DTD, and are thus relatively simple to parse into fields. Some pieces of information can be gathered directly from the value of a single field (e.g. Horizontal Positional Accuracy Value), while others may require more extensive analysis of free text (e.g. Horizontal Positional Accuracy Report).
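As a rough illustration of the field extraction described above, the following Python sketch pulls single values out of an SGML-tagged record and applies a naive heuristic to a free-text accuracy report. The tag names (`title`, `horizpar`, `horizpav`, `srcscale`) are assumed to follow the FGDC short-name convention; the sample record, the helper names `get_field` and `accuracy_from_report`, and the regular expressions are all hypothetical. This is a sketch under those assumptions, not the prototype's actual code.

```python
import re
from typing import Optional

# Hypothetical, abbreviated record fragment. The short tag names are assumed
# to follow the FGDC CSDGM convention. SGML allows end tags to be omitted,
# so the parser below does not depend on them.
SAMPLE_RECORD = """
<metadata>
<title>Sample Vegetation Layer
<horizpar>Tested 40 feet horizontal accuracy at 95% confidence level.
<horizpav>12.2
<srcscale>24000
</metadata>
"""

FEET_TO_METERS = 0.3048


def get_field(record: str, tag: str) -> Optional[str]:
    """Return the text of the first <tag> element, or None if absent or empty.

    The value runs up to the next tag, so this works whether or not the
    closing </tag> is present.
    """
    match = re.search(rf"<{re.escape(tag)}>\s*([^<]*)", record, re.IGNORECASE)
    if match is None:
        return None
    value = match.group(1).strip()
    return value or None


def accuracy_from_report(report: str) -> Optional[float]:
    """Extract a horizontal accuracy in meters from a free-text report.

    A deliberately naive heuristic: take the first "<number> <unit>" phrase
    whose unit is feet or meters, converting feet to meters.
    """
    match = re.search(
        r"(\d+(?:\.\d+)?)\s*(feet|foot|ft|meters|meter|m)\b", report, re.IGNORECASE
    )
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2).lower() in ("feet", "foot", "ft"):
        value *= FEET_TO_METERS
    return value


if __name__ == "__main__":
    # Single-field value versus a value inferred from free text.
    print(get_field(SAMPLE_RECORD, "horizpav"))
    print(accuracy_from_report(get_field(SAMPLE_RECORD, "horizpar") or ""))
```

The single-field case is a simple lookup, while the free-text case needs a heuristic that real accuracy reports would quickly defeat; that gap is why the text distinguishes the two kinds of element.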

The parsed information is then used throughout the GIS session whenever it is needed. For example, when a theme is added to a view window, the characteristics of the new data set (e.g. time period, accuracy, coordinate system) are compared with those of the existing themes and checked for compatibility. Other operations where such checks apply include changing the view scale (zooming in and out), performing queries, and creating maps. These hooks are added to the standard operation interface so that the metadata system is transparent: the dialog boxes look like those of the standard operations, but with added buttons or information.
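The compatibility check performed when a theme is added, and a similar check when the user zooms in, might look something like the Python sketch below. The `ThemeMetadata` fields, the thresholds `MAX_SCALE_RATIO` and `MAX_YEAR_GAP`, and the `max_enlargement` factor are hypothetical choices made for illustration, and coordinate systems are compared as plain strings for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds; a real system would likely expose these as preferences.
MAX_SCALE_RATIO = 4.0  # flag themes whose source scales differ by more than 4x
MAX_YEAR_GAP = 20      # flag time periods more than 20 years apart


@dataclass
class ThemeMetadata:
    """A few parsed metadata elements for one theme (hypothetical structure)."""
    name: str
    scale_denominator: int  # e.g. 24000 for a 1:24,000 source
    begin_year: int         # time period of content
    end_year: int
    coordinate_system: str  # compared as a plain string here, for brevity


def check_compatibility(new: ThemeMetadata, existing: List[ThemeMetadata]) -> List[str]:
    """Return human-readable conflicts between a new theme and those already in a view."""
    conflicts: List[str] = []
    for old in existing:
        larger = max(new.scale_denominator, old.scale_denominator)
        smaller = min(new.scale_denominator, old.scale_denominator)
        if larger / smaller > MAX_SCALE_RATIO:
            conflicts.append(
                f"Scale: {new.name} is 1:{new.scale_denominator:,}, "
                f"{old.name} is 1:{old.scale_denominator:,}"
            )
        # Years separating the two time periods; 0 if they overlap.
        gap = max(new.begin_year - old.end_year, old.begin_year - new.end_year, 0)
        if gap > MAX_YEAR_GAP:
            conflicts.append(f"Time period: {new.name} and {old.name} are {gap} years apart")
        if new.coordinate_system != old.coordinate_system:
            conflicts.append(
                f"Coordinate system: {new.name} uses {new.coordinate_system}, "
                f"{old.name} uses {old.coordinate_system}"
            )
    return conflicts


def check_view_scale(theme: ThemeMetadata, view_denominator: float,
                     max_enlargement: float = 2.0) -> Optional[str]:
    """Warn when the view is zoomed in well beyond the theme's source scale."""
    if theme.scale_denominator / view_denominator > max_enlargement:
        return (f"{theme.name} was compiled at 1:{theme.scale_denominator:,}; "
                f"the view is at 1:{view_denominator:,.0f}")
    return None


if __name__ == "__main__":
    roads = ThemeMetadata("Roads", 24000, 1995, 1998, "UTM zone 12N")
    soils = ThemeMetadata("Soils", 250000, 1962, 1965, "UTM zone 12N")
    # Adding soils to a view containing roads reports scale and time-period conflicts.
    for conflict in check_compatibility(soils, [roads]):
        print(conflict)
    print(check_view_scale(soils, 24000))
```

A fuller implementation would normalize projection and datum parameters rather than comparing strings, and would use the accuracy values themselves where the metadata provides them rather than relying on source scale alone.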

When checks are performed against the metadata, a user interface assists users in their subsequent actions. This assistance can take three forms: help boxes with descriptive text, warnings or advisory messages, and locks that prevent users from performing incorrect operations. For example, when a user adds a theme to a view and there is a mismatch in the metadata, he or she may be presented with a table of the conflicting metadata fields and asked whether to add the theme anyway. Alternatively, the user may see a message box warning of the possible repercussions of the mismatch, or be prevented entirely from adding the theme (with an appropriate explanation). Users can generally choose, through global preferences, the level of assistance they want, and they always have the opportunity to override the checks if they wish. At any point, the user may directly view the metadata elements pertinent to the task at hand.
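One way to combine the three levels of assistance with a global preference and a user override is sketched below in Python. The `AssistanceLevel` names, the `assist` function, and its return convention are hypothetical. In this simplification the help and warning levels let the operation proceed once the message is shown, only the lock level blocks it, and the mechanics of the dialog that collects the user's choice are left out.

```python
from enum import Enum
from typing import List, Tuple


class AssistanceLevel(Enum):
    """Global preference for how much the system intervenes (hypothetical names)."""
    HELP = 1  # show descriptive help text, then proceed
    WARN = 2  # show an advisory message box, then proceed
    LOCK = 3  # block the operation, with an explanation, unless overridden


def assist(conflicts: List[str], level: AssistanceLevel,
           user_override: bool = False) -> Tuple[bool, str]:
    """Decide whether an operation may proceed and what message to show.

    Returns (allowed, message). With no conflicts, nothing is shown.
    """
    if not conflicts:
        return True, ""
    details = "\n".join(conflicts)
    if level is AssistanceLevel.HELP:
        return True, "About these data:\n" + details
    if level is AssistanceLevel.WARN:
        return True, "Warning: possible misuse of data:\n" + details
    # LOCK: only an explicit user override lets the operation go ahead.
    if user_override:
        return True, "Check overridden by user:\n" + details
    return False, "Operation blocked:\n" + details
```

Keeping the decision separate from the dialog code means the same function could serve adding themes, zooming, querying, and map creation alike.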

In addition to preventing potential abuses or warning users about them, automated metadata interpretation can also help automate standard GIS processes. For example, when a theme is added to a view in ArcView, the metadata could be used to give the theme a meaningful name, set the initial scale and measurement units of the view window, or even select an initial symbology appropriate to the theme's subject matter (e.g. green for a vegetation layer).
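The automation ideas above can be sketched in a few lines of Python, assuming the title, theme keywords, source scale, and planar distance units have already been parsed from the record. The keyword-to-color table, the RGB values, and the helper names `theme_name`, `initial_color`, and `initial_view_settings` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]

# Hypothetical subject-to-color table, keyed on lowercase theme keywords
# (e.g. CSDGM theme keyword values, assumed already parsed).
SUBJECT_COLORS: Dict[str, RGB] = {
    "vegetation": (34, 139, 34),    # green, as suggested in the text
    "hydrography": (30, 144, 255),  # blue
    "soils": (160, 82, 45),         # brown
}
DEFAULT_COLOR: RGB = (128, 128, 128)  # gray when no keyword matches


def theme_name(title: Optional[str], file_name: str) -> str:
    """Prefer the metadata title over the bare file name."""
    if title and title.strip():
        return title.strip()
    return file_name


def initial_color(keywords: List[str]) -> RGB:
    """Return the color for the first keyword found in the table."""
    for keyword in keywords:
        color = SUBJECT_COLORS.get(keyword.strip().lower())
        if color is not None:
            return color
    return DEFAULT_COLOR


def initial_view_settings(scale_denominator: int, planar_units: str) -> Dict[str, object]:
    """Start the view at the data set's source scale, in its own distance units."""
    return {"scale": scale_denominator, "map_units": planar_units.strip().lower()}
```

Falling back to the file name and a neutral gray keeps the behavior no worse than the default when a metadata element is missing.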

Although the system is still under development, the intended result is that metadata will serve as a partial solution to a serious problem in GIS. Advanced GIS users will be able to use unfamiliar data sources more intelligently without having to read and understand entire metadata records. Novice users, who may not understand many of the technical concepts discussed in the metadata, will be able to use data sources more correctly, often without even realizing it.