I talk. Computer listens. Computer does things. Finally somebody who cares what I have to say! 2001: A Space Odyssey a few years early. The latest in human-computer interaction, featuring the Macintosh. This story got me over 400 hits on the hit counter.

Imagine what it was like for Steve Jobs in 1979 when he took his now-famous trip to Xerox’s PARC research facility and discovered the Graphical User Interface (GUI). Here was the new paradigm in human-computer interaction that would power the computer industry. But it wasn’t until 1984 that Jobs was able to stand on a stage in Cupertino, introduce the Macintosh, and bring the GUI to the masses.

By the early 1990s, the GUI had revolutionized everything. Sure, it would be a couple more years before Microsoft’s Windows 95 juggernaut rolled around, but the battle was over. The GUI had won. Henceforth, the command-line interface would be used only by programmers and people stuck with MS-DOS.

Meanwhile, Apple’s engineers worked on an exciting technology code-named Casper.

Flash forward to late 1998 and the imminent release of MacOS 8.5. Few people realize it, but we stand on the threshold of a paradigm shift equal to or exceeding that of the GUI. That new paradigm, based on the Casper technology of the early 1990s, is the Vocal User Interface (VUI), and though its first generation will lack the artificial intelligence we’d get from Star Trek’s Enterprise, it will amaze people nonetheless.

Stated clearly, what we’re talking about is speech recognition. Many of the components of the technology are already available, but I believe that MacOS 8.5 will improve them to the point that, when coupled with the PowerPC G3 microprocessor, the VUI may become the new measuring stick in computer ease of use.

Currently, Apple’s speech recognition is part of its freely available PlainTalk technology. [UPDATE 04/19/04: Now built into Mac OS X 10.2 and up. See http://www.apple.com/macosx/features/speech/ for details.] On my own system, a G3 running MacOS 8.1, the speech recognition accurately picks up about 85 to 90 percent of what I say, which isn’t a bad start for a technology that is about to get even better. We American English speakers have a lot to be happy about.

(An interesting aside is that Apple seems to be moving its speech technology into a multilingual arena. In fact, the latest product as of this writing is a dictation system that turns spoken Mandarin into written Chinese. There’s also a Mexican Spanish speech recognition system available.)

Speech recognition is all well and good, but that’s only one part of the VUI equation. The true power of the Mac speech recognition system lies in its ability to execute AppleScripts. MacOS 8.5’s almost fully AppleScript-able Finder, in conjunction with a PowerPC-native AppleScript system, will let Apple’s VUI perform almost any Finder-based task at remarkable speed. I might note that Microsoft has no AppleScript equivalent built into its operating systems.

It’s no secret that Apple has made a renewed commitment to AppleScript after letting it flounder for a few years. This could not have come at a better time in terms of the VUI. As more programs become AppleScript-able, the power of the VUI correspondingly increases. Already, programs like StuffIt Deluxe, Eudora Pro, and Adobe PageMaker support at least a minimum level of AppleScript. And I think there’s little question that developers will want to make their software fully AppleScript-able in the days to come.

Let me give an example of how all this works together. I can write an AppleScript which, say, launches Netscape Navigator and opens my web site. I then name the script file with the phrase I will speak to run it, for example “Go to my web site.” I place that script in the “Speakable Items” folder (a blessed folder in the System Folder), and say to Zeke, my Power Mac, “Zeke, go to my web site.” Zeke executes the appropriate AppleScript, and my hands never touch the mouse or keyboard.
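For readers who want to see the naming trick spelled out, here is a toy model of it in Python. It is only a sketch of the idea that the file name is the spoken command, not Apple’s actual matching logic, and it runs nothing: the function name `find_speakable_item` and the way it normalizes text are my own assumptions for illustration.

```python
import tempfile
from pathlib import Path

COMPUTER_NAME = "Zeke"  # the name the listener responds to


def normalize(text):
    """Lower-case the text and replace punctuation with single spaces."""
    kept = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(kept.split())


def find_speakable_item(utterance, folder, computer_name=COMPUTER_NAME):
    """Return the file in `folder` whose name matches the spoken command.

    Hypothetical helper: returns None when the utterance is not addressed
    to `computer_name` or when no file name matches the command.
    """
    spoken = normalize(utterance)
    prefix = normalize(computer_name) + " "
    if not spoken.startswith(prefix):
        return None  # the computer was not addressed by name
    command = spoken[len(prefix):]
    for item in sorted(Path(folder).iterdir()):
        # The file name (minus any extension) is the phrase to be spoken.
        if normalize(item.stem) == command:
            return item
    return None


if __name__ == "__main__":
    # Stand-in for the Speakable Items folder, holding one script file.
    with tempfile.TemporaryDirectory() as folder:
        (Path(folder) / "Go to my web site").write_text("-- script body\n")
        match = find_speakable_item("Zeke, go to my web site", folder)
        print(match.name if match else "no match")
```

The point of the sketch is that adding a new voice command takes no configuration beyond dropping a well-named file into the folder.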

My log-off script is even better. I say, “Zeke, log off Navigator,” and Zeke quits Navigator, closes the Internet connection, opens my RAM disk, selects all the Navigator cache files, moves them to the trash, empties the trash, and closes the RAM disk. Again, remarkable, since I’ve done nothing other than utter four words.
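The cache-clearing half of that routine is simple enough to sketch. The version below is Python rather than AppleScript, and it is an illustration only: the function name `clear_cache` and the two folder arguments are assumptions of mine, standing in for the cache folder on the RAM disk and the trash. Quitting Navigator and closing the connection are left out because they depend on each application’s own scripting support.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cache(cache_dir, trash_dir):
    """Move everything in cache_dir to trash_dir, then empty trash_dir.

    Hypothetical helper; assumes the two folders are separate and that
    neither one sits inside the other. Returns the names that were moved.
    """
    cache, trash = Path(cache_dir), Path(trash_dir)
    trash.mkdir(parents=True, exist_ok=True)

    moved = []
    for item in sorted(cache.iterdir()):          # select all the cache files
        shutil.move(str(item), str(trash / item.name))  # move them to the trash
        moved.append(item.name)

    for item in list(trash.iterdir()):            # empty the trash
        if item.is_dir():
            shutil.rmtree(item)
        else:
            item.unlink()
    return moved


if __name__ == "__main__":
    # Throwaway folders so the demo never touches real files.
    with tempfile.TemporaryDirectory() as root:
        cache = Path(root) / "cache"
        cache.mkdir()
        (cache / "page.html").write_text("cached page")
        print(clear_cache(cache, Path(root) / "trash"))
```

Like emptying the real trash, the last loop removes everything in the trash folder, not just the files that were moved in this run.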

Now, with a technology this cool, some may be concerned about the possibility of the Wintel world usurping the VUI the same way it did the GUI. That’s a legitimate concern, of course, but at least as of right now Apple has a couple of advantages.

First, there’s the aforementioned AppleScript. Microsoft doesn’t provide an operating system-level scripting language, so while scripting is possible on an application-by-application basis under Windows 98, that’s a lot of extra work for programmers, and it is likely to produce some wild inconsistencies in the VUI. (Much like, ahem, the inconsistencies of Microsoft’s GUI.)

Second, Apple benefits from the supercharged nature of the PowerPC chip. With more than double the processor power of Intel’s Pentium II series at its disposal, Apple is extraordinarily well positioned to run processor-intensive tasks like speech recognition. In fact, this huge advantage will only widen as the G4 chips and AltiVec technology become available over the course of the next year. This performance lead is likely to give Apple the chance to move forward on the second generation of the VUI, which should include, among other things, artificial intelligence.

This artificial intelligence is something of the holy grail of computerdom, and right now Apple’s only starting the journey. The level of Apple’s artificial intelligence stands at a fairly good help system and a speech recognition technology that can tell you knock-knock jokes. So we’ve got some distance to travel yet.

Nevertheless, that takes nothing away from what amounts to a fundamental shift in human-computer interaction. See, I expect that in a few months Steve Jobs will get up in front of a crowd of Mac partisans and do a computer demo like he did in 1984. Only this time, he will say something like, “iMac, open Microsoft Internet Explorer and go to the Apple web site,” and the computer will comply. The crowd, most of whom will have had no idea what was coming, will sit in stunned silence while the iMac follows Jobs’ directions. Then they will give Jobs the biggest standing ovation of his life, because they’ll realize that what they’ve seen is like watching Neil Armstrong step onto the moon or the Wright brothers fly at Kitty Hawk. They will have witnessed the first step toward a different level of human-computer interaction: a new paradigm. He will have given them a glimpse into the future.

[UPDATE 04/19/04: Steve Jobs and Co. did demo a secure voice-based log-in system at a Macworld a few years back, but they’ve yet to push a VUI to the forefront of the Macintosh world. I remain hopeful.]

Further Speech Recognition & AppleScript Resources
http://www.apple.com/macosx/features/speech/
http://www.apple.com/applescript/
http://www.scriptweb.com/