The Role of Intelligent Systems in the National Information Infrastructure

This section discusses three types of infrastructure services: (1) Data and knowledge management services that allow information consumers to quickly locate relevant facts and software resources from a huge morass of heterogeneous, distributed data; (2) Integration and translation services that convert information from one format to another subject to semantic constraints; (3) Knowledge discovery services that scan rapidly evolving databases in order to produce summaries, discover new correlations, and check consistency.

Before discussing the details of these services, we note that the vast majority of interactions between entities on the network will not be between people and people or between people and programs but between programs and programs. People will, of course, also operate in the network. Many of the functions discussed here (including search, information brokering, network guidance, resource market research and marketing) can and will on occasion be performed by people--as they currently are in the physical economy. However, because of the NII’s potential size, complexity, and rate of change, intelligent software systems will initiate a large fraction of network activity. If these programs are to be useful, they must be both intelligent and knowledgeable. For example, people can use a freeway signpost that reads "I-95 New England" to get to Boston because they know that I-95 is the name of a freeway, the sign means I-95 goes to New England from here, and Boston is in New England. Network-resident intelligent agents will need similar kinds of general knowledge to infer that a seismic-activity database might hold the answer to a query about earthquakes.
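The earthquake example can be made concrete with a minimal sketch. It assumes, purely for illustration, that general knowledge is stored as "broader topic" links and that each network resource advertises the topics it covers. The names BROADER, RESOURCES, generalizations, and candidate_resources are hypothetical and do not come from any existing NII system.

```python
# Hypothetical general-knowledge facts: each topic maps to the broader
# topics it falls under (an assumption made for this illustration).
BROADER = {
    "aftershock": {"earthquake"},
    "earthquake": {"seismic activity"},
    "seismic activity": {"geophysical phenomenon"},
}

# Hypothetical resource descriptions: the topics each database advertises.
RESOURCES = {
    "seismic-activity-db": {"seismic activity"},
    "weather-db": {"weather"},
}


def generalizations(topic):
    """Return the topic plus every broader topic reachable from it."""
    seen = set()
    stack = [topic]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(BROADER.get(current, ()))
    return seen


def candidate_resources(query_topic):
    """Return names of resources whose advertised topics cover the query."""
    topics = generalizations(query_topic)
    return sorted(name for name, covered in RESOURCES.items() if covered & topics)


print(candidate_resources("earthquake"))  # prints ['seismic-activity-db']
```

As with the freeway sign, the agent reaches the answer only by chaining facts that the query does not state: an earthquake is a kind of seismic activity, and the database covers seismic activity.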

2.2.1 Data and Knowledge Management Services

To realize the NII’s potential, two closely related problems must be solved. Information consumers need effective ways to locate relevant information and software resources in a huge, distributed sea of heterogeneous data. Conversely, publishers must disseminate new information and services to interested people and software agents. Two challenges--heterogeneity and scalability--make location and dissemination services difficult to provide.

Heterogeneous Data

As we discussed in the Introduction, the information distributed on the NII will be stored in a wide variety of forms, ranging from video images, audio, and byte-coded or scanned text in various languages to database relations and mathematical equations. Indexing this information will be difficult because there are so many ways to categorize each item. For example, a photograph of Bill Clinton standing in front of the White House with Al Gore is indeed a picture of Bill Clinton. However, it can also be categorized as a picture of the White House, a picture of Al Gore, a picture of a president, and a picture of the residence of a head of state. Similarly, a speech by Bill Clinton could be indexed by any portion of its content, any aspect of the style in which it was delivered, or any aspect of the circumstances of its delivery.

Because it would be grossly inefficient to index photos and audio clips under all possible terms, NII databases will need to use another method to provide flexible access. The polynomial-time inference and classification schemes of knowledge representation (Subsection 3.1) provide the desired functions, but multiple taxonomies must be supported, and classification schemes must allow evolution over time. The sheer quantity of available data will require that many of the indices be created autonomously, which, in turn, will require information retrieval and natural language parsing techniques (Subsection 3.8) as well as algorithms from computer vision (Subsection 3.9). Because many queries will be underspecified and return too many matches, the information infrastructure must support quality determination by evaluating completeness, consistency, and relevance. Plausible and probabilistic reasoning algorithms (Subsection 3.4) have already demonstrated their utility for representing medical information, and their application to educational and help systems is growing.
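One way to picture the alternative to exhaustive indexing is the sketch below. It assumes each item is indexed only under what it directly depicts, and that more general categories are inferred at query time from is-a links. The taxonomy contents, the item identifiers, and the function names are hypothetical. It illustrates the general idea only, not the specific classification schemes of Subsection 3.1.

```python
# Hypothetical is-a links drawn from two separate taxonomies
# (one about people and offices, one about buildings).
IS_A = {
    "Bill Clinton": {"president"},
    "Al Gore": {"vice president"},
    "president": {"head of state"},
    "White House": {"residence of a head of state"},
    "residence of a head of state": {"building"},
}

# Each item is indexed only under the entities it directly depicts.
INDEX = {
    "photo-001": {"Bill Clinton", "Al Gore", "White House"},
    "photo-002": {"Al Gore"},
}


def categories_of(term):
    """Return the term and every category it is classified under."""
    found = {term}
    frontier = [term]
    while frontier:
        for parent in IS_A.get(frontier.pop(), ()):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def search(category):
    """Return items with at least one indexed term that falls under category."""
    return sorted(
        item
        for item, terms in INDEX.items()
        if any(category in categories_of(term) for term in terms)
    )


print(search("head of state"))  # prints ['photo-001']
print(search("building"))       # prints ['photo-001']
print(search("Al Gore"))        # prints ['photo-001', 'photo-002']
```

Because the categories are derived rather than stored, adding a new link to a taxonomy changes what every existing item can be retrieved under without re-indexing the items themselves.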