Metadata Education Project

Metadata education suggestions and materials for:

Data Creation / Automation and Data Update



Learning Outcomes

Motivation

Skills

Knowledge


Preparatory topics:


Complementary topics:


Vocabulary

Vocabulary definitions

General:

Advanced


Material for this topic


Why document data as it is created/automated?

The lack of existing digital data has often been a major bottleneck in the application of GIS technology, since data creation could account for 70% or more of project costs. Fortunately, more and more data is becoming available in digital form, including basic thematic and framework data that is useful in a wide variety of applications.

Not all existing data, however, is readily available, nor will it always suit the requirements of the application (see the topic on Data sources: determining fitness-for-use). Many organizations, government offices, businesses, and students still need to create their own data in order to carry out the required analyses with a GIS.

Metadata is key to using existing data sources. The quality of output from a GIS is only as good as the quality of the data input into the system. The more you know about your data, the more confidence you will have in your results.

But if you are creating the data yourself, instead of using existing data created by others, is metadata all that important?

Yes! Metadata is an investment in your data. If data creation and/or manipulation account for approximately 70% of the cost of implementing GIS within an organization, why would you not document this investment with metadata? Having this documentation saves time and reduces confusion in the future: if there is staff turnover, if changes are made to the data, if procedures need to be replicated, if codes are forgotten, if questions are raised about the data's quality, and so on.

What if the data is to be created for a one-time GIS project? Is metadata still worth the investment? Most likely. Questions can still be raised about the results of the project (remember: garbage in = garbage out). Other projects might also benefit from the data created for your project, if only they had some documentation of the data's quality, or even knew of the data's existence through a metadata catalog.

Another argument for creating metadata as you create data is that the well-defined data quality elements that are a part of the metadata content standard can provide a guideline for creating data using quality assurance procedures, including checking the data for errors and providing an estimate of its accuracy.
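One common way to estimate positional accuracy is to compare a sample of digitized coordinates against independent check points of higher accuracy and report the root-mean-square error (RMSE). The sketch below is only an illustration under stated assumptions: it assumes planar coordinates in metres, and the function name `horizontal_rmse` and the sample values are invented for this example rather than taken from any standard.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in metres, planar coordinates assumed


def horizontal_rmse(measured: List[Point], reference: List[Point]) -> float:
    """Root-mean-square horizontal error between matched point pairs."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need equal, non-empty lists of matched points")
    squared_errors = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented sample: three digitized points and their surveyed check points.
measured = [(100.0, 200.0), (150.0, 250.0), (300.0, 400.0)]
reference = [(103.0, 204.0), (150.0, 250.0), (300.0, 405.0)]

print(round(horizontal_rmse(measured, reference), 2))  # 4.08
```

A value like this, together with a note on how the check points were chosen, is the kind of accuracy statement that can go into the data quality part of the metadata.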

Here are two examples of datasets that were created without metadata or quality control/quality assurance measures. When the data was put to use, embarrassing errors were discovered in it!


What should you document as you create/automate data?

At the very least, it is important to document the sources and procedures used to create or automate the data. After the data creation/automation process is completed, it is also helpful to document other aspects of the data's quality, such as its estimated positional and thematic (attribute) accuracy, its completeness, and its logical consistency (see the topic on Data quality for more details).

This documentation will have two direct benefits:

Most data is automated from existing sources, such as aerial photographs, satellite images, hardcopy maps or existing databases with a spatial description (such as a legal description or a street address). Describing these sources is the first step in documentation. Where did they come from? When were they created? What was their source scale or resolution?
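As a minimal sketch of what such a source description might hold, the record below captures the three questions above. The class and field names (`SourceRecord`, `originator`, `scale_denominator`, and so on) are illustrative assumptions, not element names from any metadata content standard, and the example values are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceRecord:
    """Hypothetical description of one source used to automate a dataset."""
    title: str
    originator: str                           # Where did it come from?
    created: str                              # When was it created?
    media: str                                # e.g. "hardcopy map"
    scale_denominator: Optional[int] = None   # 24000 means 1:24,000
    resolution_m: Optional[float] = None      # ground resolution of imagery

    def problems(self) -> List[str]:
        """List the basic questions this record leaves unanswered."""
        issues = []
        if not self.originator.strip():
            issues.append("originator is empty")
        if not self.created.strip():
            issues.append("creation date is empty")
        if self.scale_denominator is None and self.resolution_m is None:
            issues.append("neither source scale nor resolution is recorded")
        return issues


topo = SourceRecord(
    title="Example topographic quadrangle",
    originator="Example mapping agency",
    created="1987",
    media="hardcopy map",
    scale_denominator=24000,
)
photo = SourceRecord(
    title="Example aerial photograph",
    originator="",
    created="1996-04-12",
    media="aerial photograph",
)

print(topo.problems())   # []
print(photo.problems())
```

Running a check like `problems()` before digitizing begins is one way to catch an undocumented source while the person who knows its origin is still available.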

If you are creating data "from scratch", as is often the case when data is collected by GPS receivers, it is important to provide information about the instruments and methods used to collect the data. This information is documented in the form of process steps. For instance, what sampling method was used to select the features to be collected within the study area? What brand and grade of GPS receiver was used? How many satellites were used to fix positions? What was done about interference (buildings, trees)? How were the locations corrected, if differential correction was required?
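The GPS questions above could be recorded as an ordered list of process steps. The following is a rough sketch only: `ProcessStep`, `format_lineage`, the parameter keys, and all dates and values are hypothetical and chosen for illustration, not prescribed by the metadata content standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessStep:
    """Hypothetical record of one step in creating a dataset."""
    description: str
    date: str
    parameters: Dict[str, str] = field(default_factory=dict)


def format_lineage(steps: List[ProcessStep]) -> str:
    """Render the steps, in order, as a plain-text lineage report."""
    lines = []
    for number, step in enumerate(steps, start=1):
        lines.append(f"Step {number} ({step.date}): {step.description}")
        for key in sorted(step.parameters):
            lines.append(f"    {key}: {step.parameters[key]}")
    return "\n".join(lines)


# Invented example of a GPS field collection.
steps = [
    ProcessStep(
        "Selected features to collect within the study area",
        "2001-05-14",
        {"sampling method": "systematic, every 50 m along transects"},
    ),
    ProcessStep(
        "Collected positions with a GPS receiver",
        "2001-05-15",
        {
            "receiver grade": "mapping grade",
            "minimum satellites": "4",
            "interference": "points under tree canopy were re-occupied",
        },
    ),
    ProcessStep(
        "Applied differential correction",
        "2001-05-16",
        {"method": "post-processed against a base station"},
    ),
]

print(format_lineage(steps))
```

Keeping the steps as structured records rather than free text makes it easier to repeat the procedure later or to compare it with a later update.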

Describing process steps is important in all types of data automation and data manipulation:

For specific information about entering or reading metadata on Sources and Process Steps, see the Data Quality section of the metadata content standard.


Example exercises to demonstrate the importance of metadata

This is an excerpt from a class exercise created for the University of North Alabama's Advanced Digital Techniques in Geography course (fall 1998) by Lisa Keys-Mathews.

Your team is asked to complete the following tasks:


Advanced material

A Spatio-temporal model for the manipulation of lineage metadata
Spery, L., Claramunt, C. and T. Libourel. 2001. Geoinformatica 5(1):51-70
This paper describes the importance of tracking updates to geospatial data, such as a frequently changing cadastral database. A database model was developed for tracking lineage information and for making "historical queries" on the data as it changes.

Abstract:
Nowadays one of the most successful applications of GIS is the management of a land-use cadastre. A lot of corporate GIS databases are in development, they support the legal management and distribution of cadastral maps. However, the propagation of geographical updates toward cadastral databases is still a methodological and technical problem to address in the context of large applications with many different users. This paper proposes a model based on lineage metadata that supports the management of geographical changes in the context of a corporate cadastre application. Geographical and cadastral changes are identified from an analysis of the French cadastre which acts as a case study for the development of our model. The lineage metadata model is based on the application of a direct acyclic graph that permits the management of the evolution of geographical objects and the generation of historical queries. The proposed model is specified and validated with the O2 object-oriented database management system.
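To give a feel for the idea of lineage stored as an acyclic graph, here is a toy sketch. It is not the authors' model or their O2 implementation: the class `LineageGraph`, its methods, and the parcel identifiers are all invented for illustration, and the only "historical query" shown is finding every earlier object a parcel descends from.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LineageGraph:
    """Toy lineage graph: each object remembers the objects it replaced."""

    def __init__(self) -> None:
        self._parents: Dict[str, List[str]] = defaultdict(list)

    def ancestors(self, obj: str) -> Set[str]:
        """Return every earlier object that `obj` was derived from."""
        seen: Set[str] = set()
        stack = list(self._parents.get(obj, []))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._parents.get(current, []))
        return seen

    def record_change(self, sources: List[str], results: List[str]) -> None:
        """Record that `results` replaced `sources` (split, merge, or edit)."""
        for result in results:
            # Keep the graph acyclic: an object cannot descend from itself.
            for source in sources:
                if result == source or result in self.ancestors(source):
                    raise ValueError(f"{result} would become its own ancestor")
            self._parents[result].extend(sources)


graph = LineageGraph()
graph.record_change(["parcel-12"], ["parcel-12a", "parcel-12b"])  # a split
graph.record_change(["parcel-12b", "parcel-13"], ["parcel-20"])   # a merge

print(sorted(graph.ancestors("parcel-20")))
# ['parcel-12', 'parcel-12b', 'parcel-13']
```

A fuller model would also attach dates and the reason for each change to the edges, so that the state of the cadastre at a given time could be reconstructed.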


Related Material

NCGIA 1990 Core Curriculum: Unit 7: Data Input

