Abstract:
Error in a spatial database can arise from the source materials or methods, from the processing methods,
and from the use of the database. Source errors can arise from the original measurements, the interpretation of those
measurements, and the creation of the digital databases. Processing errors can arise from changes in software,
migrations to new software, conversions to new formats, projecting or transforming data, the internal storage
limitations of computer hardware, and even simple operations such as rounding. Use error is affected by
both source and processing errors, yet inappropriate use of digital databases is, and will continue to be, an
issue as software is made available to inexperienced users. Determining "fitness-of-use" requires knowledge
of the application the database will be used for, as well as full disclosure of what the potential problems are
with the database. Although the use of metadata was officially adopted in the United States through Executive
Order 12906 (United States Government, 1994), very little accuracy information is actually provided.
Metadata providers are encouraged to supply verbal descriptions of the digital databases they create; these
descriptions assist the end user in determining fitness-of-use but do not indicate exactly where the accuracy is better
or worse. Many global statistics are provided as an interpretation of the database (Unwin, 1996) and are also
too general to be of much use in determining fitness-of-use. Very few databases provide a full disclosure of
what the potential errors are in the final database, where those errors exist, and to what extent they may exist.
"Error is inescapable, it should be recognized as a fundamental dimension of data" (Chrisman, 1991, pg.
165) and is an element in every database. Comments such as "truth in labeling" (Prisley, 1994, pg. 33),
fitness-of-use, lack of information, and a "full description of quality" (Aspinall and Pearson, 1995, pg. 71)
abound. Many calls have been made for the producer of the database to provide clear and concise
information on errors in the database, but few recommendations are made on how to provide this information.
One recommendation is to provide a summary of error by object (Aspinall and Pearson, 1995; Brassell et al.,
1995), and another recommendation is to provide an error matrix (Aspinall and Pearson, 1995; Congalton and
Green, 1998; Goodchild, 1994; Veregin, 1995; Veregin and Hargitai, 1995). While each of these
recommendations has merits, neither is sufficient alone to describe all of the potential error in some
databases. The question is how to provide a full disclosure of the errors in a digital database. Which indexes
should be used? How can these indexes be used to determine error propagation? How would they affect the
decision-making process? How should confidence levels be visualized before and after the analysis? This
thesis reviews the two selected databases and their source and processing errors,
determines the extent of the error, addresses fitness-of-use, evaluates how the error affects a change analysis,
and determines how to visualize or communicate the level of confidence in the final analysis. The example
used here demonstrates a method for tracking error and identifying its propagation in a land-use change
analysis.
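
As a rough illustration of the error matrix recommendation cited above, the following minimal sketch tabulates mapped classes against reference classes and derives overall, user's, and producer's accuracy. The class names, the sample lists, and the function names are hypothetical and chosen only for illustration; this is not the method developed in the thesis.

```python
from collections import Counter


def error_matrix(reference, mapped, classes):
    """Cross-tabulate mapped classes (rows) against reference classes (columns)."""
    counts = Counter(zip(mapped, reference))
    return [[counts[(m, r)] for r in classes] for m in classes]


def accuracy_summary(matrix):
    """Return overall accuracy plus per-class user's and producer's accuracy."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(k)]
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    overall = sum(diagonal) / total
    # User's accuracy: correct / everything mapped as that class (row total).
    users = [diagonal[i] / row_totals[i] if row_totals[i] else None for i in range(k)]
    # Producer's accuracy: correct / everything truly in that class (column total).
    producers = [diagonal[j] / col_totals[j] if col_totals[j] else None for j in range(k)]
    return overall, users, producers


# Hypothetical sample: eight check sites with a reference class and a mapped class.
classes = ["forest", "urban", "water"]
reference = ["forest", "forest", "forest", "urban", "urban", "water", "water", "water"]
mapped = ["forest", "forest", "urban", "urban", "urban", "water", "water", "forest"]

matrix = error_matrix(reference, mapped, classes)
overall, users, producers = accuracy_summary(matrix)

for name, row in zip(classes, matrix):
    print(name, row)
print("overall accuracy:", overall)
```

A matrix of this kind summarizes error by class for the whole database, which is why it says nothing about where within the database the errors occur.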