White Paper on Data Integration
 
One of the five thrusts of the Canadian Geospatial Data Infrastructure (CGDI) is the Data Framework. It aims to provide CGDI users with a foundation of reliable GeoSpatial data on which to build value-added products, develop applications, and collect more specific, detailed data.

The CGDI is addressing this issue from two different and complementary perspectives: 1) the data and 2) Data Integration Services. This paper describes the context of Data Integration, introduces the services, and proposes a technological path for the implementation of a complete set of Data Integration Services.

The Context of Data Integration

In today’s GeoSpatial projects that involve data from a multitude of sources, the work required to bring all the data together into one application, and to integrate it so that it can be used for in-depth analysis, accounts for up to 80% of the overall project effort. We envision that by the year 2012 this work will account for less than 1%.

With proper planning and focus through initiatives like the CGDI, this reduction will occur sooner. The GeoSpatial Software industry has recognized the need for Data Integration Tools and some solutions are already in place. 

Fancy this

Imagine that you have a project to create a map of all the tourist attractions in your neighborhood for the family and friends who are coming to your daughter’s wedding. Imagine that all you have to do is open the mapping program in your favorite office suite, direct your Web browser to CEONet, and enter your postal code and subject of interest to discover that fifteen data sets contain the information you require. Imagine that, after reviewing their complete descriptions, you decide to use eight of them: six need a format translation, two a change of datum, one a scale transformation, and all of them need to be cleaned of duplication and aligned perfectly. Keeping in mind that cost is the main issue, you select two service outlets for those transformations: one for the formats and another for all the other transformations. After providing GeoCommerce with the proper identification and financial authorization, you turn your browser to selecting the new decor for the room she will be leaving, to fill the hour or so these transformations will take. Between carpet and curtains, you receive an E-Mail with your eight files attached. An hour later you have a complete hyper-map with topography, roads, utilities, airport, accommodations and attractions, to which you have added your personal touch by recommending sites of exceptional value. After composing a little note, you E-Mail it to the invitees.

Guess what

Is this scenario science fiction? Not really. CEONet already has in place the mechanisms for data discovery and access, and has laid the groundwork for the service infrastructure that will permit these services to be bought and rendered online. Many commercial products already include software that performs the required transformations. The CGDI Data Integration Services will bring these pieces together and fill the gap.

Data Integration Services

To gain strength, momentum and recognition, the CGDI Data Integration Services are built on existing international and industry standards. The main sources of the related standards are ISO TC-211 http://www.statkart.no/isotc211/ and the OpenGIS Consortium http://www.opengis.org/.

GeoSpatial Data Integration Services are a set of functions encapsulated behind standard interfaces that together provide all the capabilities required to bring together, as equals, individual data sets and features originating from different sources.

The following list of services is not the only possible one. There is more than one way to break down the very large concept of "Integrate". This breakdown is as valid as any other, and it has the advantage of focusing on solutions to specific problems. With time, additional research and contributions, the list will evolve from a concept into a series of working software components.

Align Geometry 

The align geometry service allows for the vertical and horizontal fit of two or more data sets. The development activities for this service are centered around the Data Alignment Layer http://www.cits.nrcan.gc.ca/~cdal/main-e.html.
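
As an illustration of a horizontal fit, the sketch below estimates a single average shift from matched control points and applies it to a data set. It is a minimal, translation-only example; the function names and the control-point approach are assumptions made for illustration and do not describe how the Data Alignment Layer actually works.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mean_offset(source_ctrl: List[Point], target_ctrl: List[Point]) -> Point:
    """Average (dx, dy) that moves source control points onto their targets."""
    if not source_ctrl or len(source_ctrl) != len(target_ctrl):
        raise ValueError("control point lists must be non-empty and equal in length")
    n = len(source_ctrl)
    dx = sum(t[0] - s[0] for s, t in zip(source_ctrl, target_ctrl)) / n
    dy = sum(t[1] - s[1] for s, t in zip(source_ctrl, target_ctrl)) / n
    return dx, dy


def align(points: List[Point], offset: Point) -> List[Point]:
    """Shift every point by the given (dx, dy) offset."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in points]


# Hypothetical control points: the same two landmarks in each data set.
offset = mean_offset([(0.0, 0.0), (10.0, 0.0)], [(2.0, 1.0), (12.0, 1.0)])
aligned = align([(5.0, 5.0)], offset)
```

A real alignment would usually also need rotation, scaling and local adjustment; a plain shift is only the simplest case.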

Match Semantic

The Match Semantic service allows two or more data sets that were described using different semantics to be merged so that they conform to a common semantic. This includes, without being limited to, activities such as feature code translation and attribute amalgamation or fragmentation.
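
Feature code translation and attribute amalgamation can be sketched in a few lines. The code table, the attribute names and the function name below are hypothetical and stand in for whatever vocabularies the source data sets actually use.

```python
from typing import Any, Dict

# Hypothetical mapping from one provider's feature codes to a common vocabulary.
CODE_MAP = {"RD": "road", "HWY": "road", "RIV": "watercourse"}


def to_common_schema(feature: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a provider-specific feature into the common semantic."""
    code = feature["code"]
    if code not in CODE_MAP:
        raise KeyError(f"no common feature type for code {code!r}")
    # Attribute amalgamation: two source attributes become one common attribute.
    parts = (feature.get("name"), feature.get("suffix"))
    name = " ".join(part for part in parts if part)
    return {"type": CODE_MAP[code], "name": name}


street = to_common_schema({"code": "HWY", "name": "King", "suffix": "West"})
```

Fragmentation would be the reverse operation: splitting one source attribute into several common attributes.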

Align Scale

The Align Scale service allows two or more data sets that originate at different scales and/or resolutions to be matched. Generalization processes fall under this service.
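
One common generalization process is line simplification. The sketch below uses the well-known Douglas-Peucker algorithm to drop vertices that deviate from the overall shape by less than a tolerance. The choice of algorithm is an assumption made for illustration; the paper does not name a specific generalization method.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (a[1] - p[1]) - dy * (a[0] - p[0])) / length


def simplify(line: List[Point], tolerance: float) -> List[Point]:
    """Douglas-Peucker simplification; the two end points are always kept."""
    if len(line) < 3:
        return list(line)
    # Find the interior vertex farthest from the chord joining the end points.
    index, farthest = 0, 0.0
    for i in range(1, len(line) - 1):
        d = _distance_to_line(line[i], line[0], line[-1])
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        return [line[0], line[-1]]
    # Keep that vertex and simplify both halves recursively.
    left = simplify(line[: index + 1], tolerance)
    right = simplify(line[index:], tolerance)
    return left[:-1] + right
```

A larger tolerance corresponds to a smaller target scale: more vertices are removed and only the dominant shape survives.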

Conflate

The conflate service allows two or more data sets to be merged into a resulting data set that contains the required elements of the inputs. It is essentially the process of creating C such that A + B = C, where C holds more than either A or B alone. This service uses many of the others.
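
A very reduced form of conflation, for point features only, might look like the following. The feature layout (a "type" and an "xy" position), the distance-based duplicate rule and the function name are all assumptions made for illustration.

```python
import math
from typing import Any, Dict, List

Feature = Dict[str, Any]


def conflate(a: List[Feature], b: List[Feature], tolerance: float = 1.0) -> List[Feature]:
    """Merge b into a, skipping features that duplicate one already kept.

    Two features count as duplicates when they share a type and lie
    within `tolerance` map units of each other.
    """
    merged = list(a)
    for candidate in b:
        duplicate = any(
            candidate["type"] == kept["type"]
            and math.dist(candidate["xy"], kept["xy"]) <= tolerance
            for kept in merged
        )
        if not duplicate:
            merged.append(candidate)
    return merged


set_a = [{"type": "hotel", "xy": (0.0, 0.0)}, {"type": "museum", "xy": (10.0, 10.0)}]
set_b = [{"type": "hotel", "xy": (0.3, 0.4)}, {"type": "airport", "xy": (50.0, 50.0)}]
set_c = conflate(set_a, set_b)  # the hotel appears once; the airport is added
```

In practice the inputs would first pass through the alignment and semantic matching services, which is why conflation depends on many of the others.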

Translate Format

The translate format service takes a data set in format X and converts it to format Y. It also includes one-way translation, in which an application uses this service to read a data set in format X directly.
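
As a small example of converting format X to format Y, the sketch below turns a comma-separated list of points into a GeoJSON feature collection. Both formats, the "lon" and "lat" column names and the function name are assumptions chosen for illustration; the paper does not name specific formats.

```python
import csv
import io
import json


def csv_to_geojson(csv_text: str) -> str:
    """Convert CSV rows with 'lon' and 'lat' columns into GeoJSON points."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon = float(row.pop("lon"))
        lat = float(row.pop("lat"))
        features.append(
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                # Every remaining column is carried over as an attribute.
                "properties": dict(row),
            }
        )
    return json.dumps({"type": "FeatureCollection", "features": features})


sample = "name,lon,lat\nCity Hall,-71.89,45.40\n"
translated = json.loads(csv_to_geojson(sample))
```

The one-way case corresponds to an application calling such a function at read time instead of storing the converted file.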

Evaluate Quality

The evaluate quality service provides an assessment of the quality of a data set. This assessment is used to judge the suitability of the data set for particular applications. The service is also used by other services to obtain the quality parameters they require.
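
Quality has many dimensions. The sketch below measures just one of them, attribute completeness, and compares it to a threshold to judge suitability. The measure, the default threshold and the function names are hypothetical and serve only as an illustration.

```python
from typing import Any, Dict, List


def completeness(features: List[Dict[str, Any]], required: List[str]) -> float:
    """Fraction of required attribute values that are present and non-empty."""
    if not features or not required:
        raise ValueError("need at least one feature and one required attribute")
    total = len(features) * len(required)
    filled = sum(
        1
        for feature in features
        for key in required
        if feature.get(key) not in (None, "")
    )
    return filled / total


def is_suitable(features: List[Dict[str, Any]], required: List[str], minimum: float = 0.9) -> bool:
    """Judge fitness for use against a minimum completeness (assumed threshold)."""
    return completeness(features, required) >= minimum


roads = [{"name": "King West", "type": "road"}, {"name": "", "type": "road"}]
score = completeness(roads, ["name", "type"])  # 3 of 4 values are filled
```

Other services could call the same function to obtain a quality parameter, for example to decide which of two duplicate features to keep during conflation.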

Associate

The associate service establishes a link between features of two or more data sets. The link (or association) may be of different kinds. It can be used to link a feature that is represented more than once at different scales, or to record that a particular boundary follows a natural feature that is maintained in a distinct data set.
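
The first kind of link, between representations of the same feature at different scales, can be approximated by pairing each feature with its nearest counterpart. The nearest-neighbour rule, the feature layout and the function name below are assumptions made for illustration.

```python
import math
from typing import Any, Dict, List, Tuple

Feature = Dict[str, Any]


def associate(set_a: List[Feature], set_b: List[Feature], max_distance: float) -> List[Tuple[str, str]]:
    """Link each feature in set_a to its nearest feature in set_b.

    A link is recorded as an (id_a, id_b) pair, and only when the two
    features lie within max_distance of each other.
    """
    links = []
    for fa in set_a:
        nearest = min(
            set_b,
            key=lambda fb: math.dist(fa["xy"], fb["xy"]),
            default=None,
        )
        if nearest is not None and math.dist(fa["xy"], nearest["xy"]) <= max_distance:
            links.append((fa["id"], nearest["id"]))
    return links


detailed = [{"id": "a1", "xy": (0.0, 0.0)}, {"id": "a2", "xy": (100.0, 100.0)}]
overview = [{"id": "b1", "xy": (3.0, 4.0)}, {"id": "b2", "xy": (500.0, 500.0)}]
links = associate(detailed, overview, max_distance=10.0)
```

The links are stored separately from both data sets, so each one can continue to be maintained by its own custodian.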

Technological Path

The following chart depicts the possible (probable) timetable for the commercial implementation of these services.
 
Beginning of Work  Vision     Consultative Process  R & D      Implementation  Extensive Use
Translate Format                                               1997            1998
Align Geometry     1997-1998  1998                  1998-2000  1999-2000       2001
Conflate                                            1996-      1999            2002
Align Scale                                         1995-      2000            2002
Evaluate Quality   2002       2003                  2004-      2006            2010
Match Semantic     2002       2003                  2004-      2007            2011
Associate          1999       2000                  2000-      2002            2006

Another way of describing how and when these services may come to be part of COTS GeoSpatial software is to describe the upcoming years in terms of eras.

The first, already underway, is "Discovery, Access and Format Translation". The Access component of the CGDI, although not complete, is already in place. It provides the capability to search for and discover requested data sets. Other functions will include purchasing and delivery, as well as the basic infrastructure needed to implement other online services such as those described here.

By the year 2002, we expect commercial software oriented towards implementing Data Integration Services to take a predominant place, marking the beginning of the "Basic Semantic Integration" era. By then, vertical and horizontal integration using the Data Alignment Layer will be commonplace, and services for basic conflation, scale transformation and association will be emerging.

By 2005, the entire framework will be in place; the CGDI will be not an effort but a reality. The community will have reached a complete understanding of its foundation and implemented its principles. Software will be more robust and functionality will be added. By 2008, it will be commonplace to see substantial interoperability within application communities, and by 2012 Data Integration Services will be just as easy to use as the Web is today, only faster.
 

Sylvain Latour
Senior Project Officer
Center for Topographic Information
2144 King West, Sherbrooke, Qc, Canada
Tel: (819) 564-5600 Ext. 269
Fax: (819) 564-5698
E-Mail: slatour@nrcan.gc.ca