CHAPTER 7

Using Search Protocols


Search protocols are another element in the process of searching for resources within the CGDI. Since you will be using search protocols when you make your database searchable, it is useful to understand their function and structure. This chapter:

7.1 What are Search Protocols?

A search protocol is a standard way of asking questions (or queries), getting answers, and exchanging information between two computers over the Internet. A search protocol is akin to a technical language. It specifies the transport mechanism for the information flow between the two computers.

For example, the information may travel over the hypertext transfer protocol (HTTP) on top of the transmission control protocol/Internet protocol (TCP/IP) using sockets. These are not the only transport mechanisms available, but they are the most commonly used.

A search protocol also specifies the information that is passed back and forth between the computers. The type of information passed includes:

When the search server translates between the search protocol and the database, there are two elements involved:

  1. Since the metadata is in a database, the search server must know what to look for in the database. Each search server understands its own metadata format or content standard. When the search server is set up, the server can create a translation mapping between the server's metadata fields and the database fields.

  2. The search server must be able to interpret the message contained in the query. If the search protocol is the language, there are many messaging schemes that can be formulated in that language. The server understands a specific messaging scheme (or profile) that is tightly coupled with the metadata content standard used to represent the metadata in the search server.
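The translation mapping described in the first element can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the metadata field names, the `products` table, its column names, and the sample rows are hypothetical and do not come from any CGDI or FGDC specification.

```python
import sqlite3

# Hypothetical mapping from the search server's metadata fields
# to the columns of the supplier's own database table.
FIELD_TO_COLUMN = {
    "title": "prod_name",
    "originator": "producer",
    "publication_date": "pub_date",
}


def translate_query(field, value):
    """Turn one metadata-field restriction into a parameterized SQL query."""
    column = FIELD_TO_COLUMN[field]  # raises KeyError if the field is unmapped
    return f"SELECT prod_name FROM products WHERE {column} = ?", (value,)


def demo():
    """Build a throwaway database (made-up rows) and run one translated query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (prod_name TEXT, producer TEXT, pub_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [("Road Network", "Agency A", "2001"), ("Lakes", "Agency B", "2002")],
    )
    sql, params = translate_query("originator", "Agency A")
    return [row[0] for row in conn.execute(sql, params)]
```

The column name is taken only from the fixed mapping and the value is passed as a query parameter, so text supplied in a query never becomes part of the SQL statement itself.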

7.1.1 Search Protocol Architecture

Figure 8 illustrates the function of the search protocol in the communication flow between the user and the supplier's product database:



Figure 8 Search Protocol Architecture

7.1.2 Stateful Versus Stateless Searching

Search protocols can be either stateful or stateless.

A stateful protocol means that the discovery mechanism opens a connection with your search server and keeps it open for the entire duration of the search session. For instance, a user who selects ten data products can examine the first result as soon as it is returned, without waiting for all ten results.

A stateless protocol means that the discovery mechanism opens a connection with your search server, sends a bit of information, gets back a bit of information, and then closes the connection. The search session consists of a series of such open-send-receive-close interactions between the discovery mechanism and your search server. Each interaction is independent of the others. Your search server handles each one independently; there is no "history" of preceding interactions, which is why the protocol is called "stateless". In this case, a user who requests ten data products cannot examine the results of the data product search until all ten results are returned.
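The difference can be sketched with a toy model. The `ToySearchServer` class and both search functions are hypothetical names invented for this illustration; they imitate the behaviour described above and do not implement any real search protocol.

```python
class ToySearchServer:
    """Hypothetical in-memory stand-in for a supplier's search server."""

    def __init__(self, catalogue):
        self.catalogue = catalogue   # product id -> metadata record
        self.open_connections = 0    # connections open right now
        self.connections_opened = 0  # connections opened in total

    def open(self):
        self.open_connections += 1
        self.connections_opened += 1

    def close(self):
        self.open_connections -= 1

    def lookup(self, product_id):
        return self.catalogue[product_id]


def stateful_search(server, product_ids):
    """Keep one connection open and hand back each result as it arrives."""
    server.open()
    try:
        for product_id in product_ids:
            yield server.lookup(product_id)
    finally:
        server.close()


def stateless_search(server, product_ids):
    """One open-send-receive-close exchange; results are visible only at the end."""
    server.open()
    try:
        return [server.lookup(product_id) for product_id in product_ids]
    finally:
        server.close()
```

With `session = stateful_search(server, range(10))`, calling `next(session)` yields the first record while the connection is still open, whereas `stateless_search(server, range(10))` returns nothing until the whole list is ready.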

See A1.9, Geodata Discovery Service, for how this CGDI specification applies the stateful and stateless protocols to retrieve metadata about geospatial data.

7.2 CGDI Search Protocols

The GeoConnections Discovery Portal mainly supports the following geospatial search protocol:

The following geospatial search protocols were previously supported by the GeoConnections Discovery Portal but will no longer be supported for new suppliers:

The American National Standards Institute/National Information Standards Organization's ANSI/NISO Z39.50 search protocol is a computer-to-computer communications protocol designed to support searching and retrieval of information, full-text documents, bibliographic data, images and multimedia in a distributed network environment.

A protocol specification standardizes the query syntax, search field identities and default format of returned records, and provides mechanisms for access control and server self-description. Based on client/server architecture and operating over the Internet, the Z39.50 protocol supports a growing number of applications. Like the dynamic network environment in which it is used, the standard is evolving to meet the changing needs of information creators, suppliers, and users.

To its credit, Z39.50 is very comprehensive. At the same time, it can be quite complex for a data supplier to install Z39.50 server software and configure it to search a dataset. This runs counter to the goal of GeoConnections to keep the cost of supplier participation to a minimum. To overcome this problem, an FGDC metadata toolkit has been developed that packages the necessary Z39.50 GEO software in a manner that makes it easy for suppliers to install and configure.

The Z39.50 search protocol is a message-based protocol that utilizes request/response pairs for each of the services it supports. Its essential services are:

  1. Init, which establishes a session between the client and the database server;

  2. Search, which conveys the search criteria to the target database, and responds with statistics on the matches, such as the total number of matches. The response to a search request does not include the actual records from the database that match; and

  3. Present, which follows a search response, and is used to request the actual matching records, or a subset of the records. The mechanism is very powerful because the result set is managed at the target server and the complete result set does not have to be returned over the network.
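The three services can be pictured with a toy model of a target server that keeps result sets on its own side. This is a sketch under stated assumptions, not a Z39.50 implementation: the `ToyTarget` class, its method signatures, the service names in `SUPPORTED`, and the use of a Python function as the search criteria are all hypothetical simplifications, and no real protocol messages are encoded.

```python
class ToyTarget:
    """Hypothetical target server that manages result sets itself."""

    SUPPORTED = {"search", "present"}  # services this toy target offers

    def __init__(self, records):
        self.records = records
        self.result_sets = {}  # result-set name -> list of matching records

    def init(self, client_services):
        """Init: both sides settle on the services they have in common."""
        return self.SUPPORTED & set(client_services)

    def search(self, set_name, predicate):
        """Search: keep the matches on the server, return only the hit count."""
        self.result_sets[set_name] = [r for r in self.records if predicate(r)]
        return len(self.result_sets[set_name])

    def present(self, set_name, start, count):
        """Present: return `count` records from 1-based position `start`."""
        hits = self.result_sets[set_name]
        return hits[start - 1 : start - 1 + count]
```

Because the result set stays on the target, a client that receives a count of several thousand matches can ask for the first ten records only, rather than pulling every match over the network.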

There are additional services which provide access control, resource management and self-describing facilities for the target databases, etc.; however, these are less often supported by client and server software than the three basic services described above.

The Init request/response is the exchange that synchronizes the two computers so that they can communicate. It allows both computers to introduce themselves and indicate which services (i.e. functions) of Z39.50 they support.

The Search request contains the parameters of the information retrieval request. It consists of one or more Attribute/Relation/Value restrictions (e.g. height > 5).
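As a minimal sketch of how such restrictions might be evaluated against one record, the function below treats each restriction as an (attribute, relation, value) triple. The `RELATIONS` table and the `matches` function are hypothetical, and combining several restrictions with a logical AND is an assumption made for this example.

```python
import operator

# Relation symbols mapped to Python comparison functions.
RELATIONS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def matches(record, restrictions):
    """True when the record satisfies every (attribute, relation, value) triple."""
    return all(
        attribute in record and RELATIONS[relation](record[attribute], value)
        for attribute, relation, value in restrictions
    )
```

For the restriction in the text, `matches({"height": 7}, [("height", ">", 5)])` is true, while a record with a height of 3, or with no height attribute at all, does not match.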

The Search response contains either the resulting set of records or just the count of matching records. If only the record count is received, a Present request can be used to retrieve sets of those matching records.

For more information about the Z39.50 search protocol, please refer to:

7.2.1 The GEO Profile of Z39.50

Some search protocols have several profiles. A profile identifies a set of base standards, together with the options and parameters needed to accomplish identified functions. Its purposes include (a) interoperability and (b) a methodology for referencing the various uses of the base standards that is meaningful to both users and suppliers.

The Z39.50 information retrieval model is independent of its domain. Domain specializations are provided by an additional mechanism, referred to as "application profiles". Specific profiles exist for the messaging scheme of geospatial-type queries and results. These profiles are tightly coupled with specific metadata content standards for geospatial metadata. The FGDC has developed a Z39.50 application profile for geospatial metadata, called GEO, which provides a specification on how to implement the Content Standard for Digital Geospatial Metadata (CSDGM) metadata elements within a Z39.50 service.

Using this profile achieves interoperability with the FGDC Clearinghouse, amongst others. Furthermore, the Earth observation community, through the agencies of the Committee on Earth Observation Satellites (CEOS), is working to ensure that GEO and the Catalogue Interoperability Protocol (CIP), both of which are based on Z39.50, are interoperable. The CIP defines a single interface to Earth observation catalogues. The GEO profile standardizes (on top of Z39.50) the data model for search and retrieval, the query language operators (including spatial operators), etc.

The Z39.50 GEO profile states that a Z39.50 GEO profile server must:

Furthermore, the Z39.50 GEO profile offers the following:

You can view the full Z39.50 GEO profile at:

http://www.blueangeltech.com/standards/GeoProfile/geo22.htm

The many profiles of the Z39.50 standard are listed at:

http://lcWeb.loc.gov/z3950/agency/profiles/profiles.html

Figure 9 illustrates the Z39.50 GEO architecture as implemented in the GeoConnections Discovery Portal.



Figure 9 Z39.50 GEO Architecture

 
