Expressive: Users should be able to form arbitrary questions and requests easily, without being limited by restrictive menus or forced to learn artificial query languages. Intelligent interfaces should accept requests in whichever modality (e.g. speech, text, gestures) the user chooses.
Goal oriented: Users should be able to state what they want accomplished. The intelligent interface should determine how and when to achieve the goal, then perform the actions without supervision.
Cooperative: Instead of the passive-aggressive error messages that are currently given in response to incorrect or incomplete specifications, intelligent agents should collaborate with the user to build an acceptable request.
Customized: Personal assistant agents should adapt to different users, both by receiving direct requests from the user and by learning from experience.
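The cooperative criterion in particular can be made concrete with a minimal sketch (all names and slots below are hypothetical, not from any actual system): rather than rejecting an incomplete request with an error message, the agent identifies which pieces of the request are missing so it can ask the user for them.

```python
# Minimal sketch of a cooperative request handler (hypothetical example).
# An incomplete request yields a list of missing pieces to prompt for,
# instead of an outright failure.

REQUIRED_SLOTS = {
    "print": ["document", "printer"],
    "schedule": ["event", "time"],
}

def handle_request(action, **slots):
    """Return ('ok', slots) if the request is complete,
    or ('need', missing) listing the slots the agent should ask about."""
    required = REQUIRED_SLOTS.get(action, [])
    missing = [s for s in required if s not in slots]
    if missing:
        # Cooperative behavior: collaborate to complete the request.
        return ("need", missing)
    return ("ok", slots)
```

Under this scheme, `handle_request("print", document="report.txt")` would not fail; it would return the missing `printer` slot, which the interface can then request from the user in whatever modality is convenient.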
These criteria--and their consequences--are explored further in Subsections 2.1.1 through 2.1.3. In some cases, however, the best interface will be one that gives the impression of directly manipulable, three-dimensional space; the contributions of AI to these virtual-reality interfaces are described in Subsection 2.1.4.
2.1.1 Integration and Expressivity
Two decades ago, window-based graphical interfaces and the direct-manipulation metaphor revolutionized human-computer interaction. However, few fundamental changes have occurred since then, and computers remain intimidating to the vast majority of the population. If the NII is to be both broadly accessible and flexible, people will need to interact with it in a natural manner, much like they do with one another. For example, users will want to access NII resources using a combination of speech and text (typed or handwritten) in their own natural language, and with hand and facial gestures. Furthermore, an interface should be able to present information in the manner most conducive to interpretation, be it text, graphics, animation, audio, or some coordinated combination of several modalities. Whereas today’s application interfaces offer, at most, a help command or menu option, NII interfaces will increase acceptance by offering customized, intelligent help and training, especially for the nonexpert user. Development of such a flexible interface paradigm raises several challenges in the areas of machine perception and automatic explanation.
2.1.1.1 Machine Perception
Because people converse using speech, written language, gesture, and facial expression, communication among humans seems effortless. If user interactions with the NII are to be equally natural, computers will require more advanced perceptual capabilities. As a result of research in the AI community, such capabilities are becoming technically feasible: given a controlled environment, existing computer vision algorithms (Subsection 3.9) can recognize eye and lip movements as well as hand gestures. Speech systems (Subsection 3.8) are currently capable of robust speaker-independent recognition for small vocabularies, and practical speaker-dependent recognition for vocabularies of ten thousand words or more; real-time natural language processing systems (Subsection 3.8) have been used in numerous database-query applications. However, technical problems remain: many of these technologies are still brittle, breaking down outside controlled conditions, and so cannot yet be considered mature enough for widespread use.
2.1.1.2 Automatic Explanation
Computers acquire, process, and generate data far more readily than they can present or explain it. If the NII were only to provide more ways for data to be produced and transferred, it would have limited success. To complement existing conventional abilities to store and move raw data, we need intelligent agents that are both linguistically and graphically articulate. If a query returns huge amounts of data, the intelligent agent should be able to compute a salient summary and present it using whichever modality best suits interpretation; it should support and be able to choose from among a wide range of options, including chart graphics, natural-language text, volumetric visualizations, animation, music, or speech. Furthermore, current interface capabilities that provide formatted data must be supplemented by automatic explanation systems that consider the background, abilities, and interest of the requester.
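The modality choice described above can be sketched as a simple rule-based policy. The data characteristics and thresholds below are illustrative assumptions, not a description of any deployed system; a real agent would also weigh the requester's background and preferences.

```python
# Rule-based sketch of choosing a presentation modality for query results.
# The categories and the 5-record threshold are illustrative assumptions.

def choose_modality(n_records, numeric, time_varying):
    """Pick a presentation modality from coarse data characteristics."""
    if n_records <= 5:
        return "natural-language text"    # small results read well as prose
    if numeric and time_varying:
        return "animation"                # show how values evolve over time
    if numeric:
        return "chart graphics"           # aggregate numbers plot well
    return "natural-language summary"     # large non-numeric results get summarized
```

Even this toy policy captures the key point: the agent, not the user, decides how a huge result set is rendered, and a richer version would consult a user model before choosing.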