The Role of Intelligent Systems in the National Information Infrastructure, page 27

3.9.1 Relevance to the National Information Infrastructure

Image understanding and synthesis are relevant to the development of intelligent user interfaces, digital libraries, and 3D fax capabilities. Automatic design of informational graphics (for example, charts, maps, and scientific visualizations) is a necessary complement to natural language generation; both modalities are needed to facilitate computer applications that can explain themselves. Virtual environments seem sterile and unbelievable unless the human- and computer-controlled agents that inhabit them move in plausible ways. Automatic motion synthesis and motion tracking from video are two technologies that address this problem. Interpretation of manual gestures and facial expressions is an important aspect of human communication that might be incorporated effectively into computer-human interfaces if the necessary computer-vision problems can be solved. On a more mundane level, robust handwriting recognition would create many opportunities for developing new, more natural user interfaces.

Infrastructure services and development tools will also benefit from advances in computer graphics and computer vision. As long as the majority of human knowledge remains stored in paper documents, document analysis and recognition will be needed to convert scanned text and illustrations into symbolic form, thereby facilitating data and knowledge management services. More speculatively, general image understanding would revolutionize knowledge discovery and acquisition; even limited success (for example, the ability to reliably find specific people in photographs or videos) would support extremely useful services. Finally, the difficulty of designing and modeling large-scale virtual environments can be mitigated by applying intelligent modeling techniques that can automatically build scene models from replicas of natural and manmade objects, which might even be acquired initially through computer-vision techniques.

3.9.2 State of the Art

In general, current computer-vision techniques are capable of impressive feats under controlled conditions, but they often prove brittle under real-world conditions. The state of the art in four typical tasks illustrates this point.

Facial recognition: A variety of algorithmic approaches can recognize standard mug shots (front-facing, head-only photographs taken under controlled lighting) with high accuracy; however, identifying a face in an image taken in a more realistic setting cannot be done reliably.

Object recognition and reconstruction: Under ideal lighting and viewing conditions, simple known objects (for example, a coffee mug, a rubber duck) can be recognized, and simple unknown objects can be reconstructed, but these techniques often fail completely under less favorable conditions.

Hand tracking and gesture recognition: Under ideal conditions, the movement and configuration of a human hand can be tracked with high accuracy; however, no current system can interpret sign language in a practical setting. Automatic recognition of facial expressions is becoming a reality.

Document analysis and recognition: A one-column document that is cleanly typed in an appropriate font can be interpreted with high fidelity; however, a poor-quality multicolumn document with irregular layout and mixed fonts can baffle the best of the current systems.

Three-dimensional computer graphics problems can be neatly divided into two categories: modeling and rendering. Modeling is the problem of acquiring, representing, and manipulating a symbolic description of the objects in a static or moving scene. Rendering is the problem of converting a scene model into the appropriate two-dimensional image. In general, image rendering is well understood; current techniques are capable of accounting for great subtlety in lighting and shading phenomena to generate amazingly realistic imagery. Modeling, however, is not as advanced. Making an articulated figure move in a visually plausible way is difficult, as is designing a solid model of a manmade artifact. Typically, the best tools available for these tasks are direct-manipulation user interfaces, which are tedious and hard to use. Only recently have researchers begun to apply, to some degree, AI tools and ideas with a view toward automating the hard modeling tasks.

Informational graphics, which are mostly two-dimensional but can be three-dimensional, form an essentially distinct category of graphics, but one that is of great importance for information analysis and presentation. Research in this area has also focused mostly on improved user interfaces for the manual creation of images such as charts and maps; relatively little research has been conducted to date on the automatic design of such graphics.
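The division between modeling and rendering described above for three-dimensional graphics can be illustrated with a minimal sketch. It is not drawn from any system discussed in this report; the names `SceneModel`, `make_cube`, and `render` are hypothetical, and the renderer is assumed to be a bare pinhole perspective projection with no lighting or shading.

```python
from dataclasses import dataclass


@dataclass
class SceneModel:
    """Symbolic scene description: 3D vertices plus the edges joining them."""
    vertices: list[tuple[float, float, float]]
    edges: list[tuple[int, int]]


def make_cube(size: float, depth: float) -> SceneModel:
    """Modeling step: a cube centered on the viewing axis at distance `depth`.

    Assumes depth > size / 2 so that every vertex lies in front of the camera.
    """
    h = size / 2
    vertices = [(x, y, depth + z) for x in (-h, h) for y in (-h, h) for z in (-h, h)]
    # Two vertices share an edge when they differ in exactly one coordinate.
    edges = [
        (i, j)
        for i in range(8)
        for j in range(i + 1, 8)
        if sum(a != b for a, b in zip(vertices[i], vertices[j])) == 1
    ]
    return SceneModel(vertices, edges)


def render(model: SceneModel, focal_length: float = 1.0) -> list[tuple[float, float]]:
    """Rendering step: project each 3D vertex onto a 2D image plane."""
    return [(focal_length * x / z, focal_length * y / z) for x, y, z in model.vertices]


cube = make_cube(size=2.0, depth=4.0)
image_points = render(cube)
for i, j in cube.edges:
    print(image_points[i], "->", image_points[j])
```

Even in this toy case the rendering step is a fixed formula, whereas the modeling step is where the description of the object has to come from, which is the part the text identifies as hard to automate.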