Speech technology is poised for wide acceptance in the distribution center.
The dialogue accompanying the photo to the right isn't exactly scintillating conversation. But to the operators of high-velocity, thin-margin distribution operations, it's poetry.
Conversing with computers was once the purview of science fiction. As the technology improved, it slowly became a common tool for call center operations. Only in the last few years has it become refined enough for use in the often noisy, high-speed atmosphere of the distribution center, where operators are employing voice to inch even closer to the holy grail of 100 percent shipping accuracy.
According to a May 2002 report by KOM International, "Voice Technology in the Distribution Center," about 150 companies have invested in speech technology for distribution center operations. This $20- to $30-million market, currently served by just two viable vendors, is growing at a healthy clip of more than 50 percent a year.
Vendors are confident that after a long period of tire-kicking, distributors are finally ready to adopt voice in a serious way. Now it's a matter of clearing the last few hurdles and evangelizing the masses.
Unfortunately, early attempts to deliver speech recognition capabilities to the PC, as well as earlier iterations of industrial voice technology, left a bad impression on potential users. Developers such as Vocollect and Voxware, which acquired Verbex Voice Systems in 1999, have not only had to develop voice technologies specifically suited for the mobile warehouse environment, they've had to educate potential customers as well.
Both companies use speaker-dependent technology, in which the system is trained to recognize the unique characteristics of a particular user's speech patterns for a set vocabulary, improving first-time read rates. They also process speech recognition and the generation of spoken voice commands on a client device worn on the worker's belt that works in concert with a headset. That speeds data collection and minimizes traffic and dependence on the wireless network.
Telephony systems are often speaker-independent, able to discern words from voices they are encountering for the first time. This consumes more computing power and generally has to be performed by a remote host, which would slow interaction with a worker on a warehouse floor equipped with a wearable speech device.
Only since the late 1990s have all the ingredients of a successful speech technology system for the distribution center come together: small, powerful, yet affordable wearable processors; well-refined speech recognition engines; and technology that can detect speech in the noisy warehouse environment, picking up the voice among noise from freezers, conveyor belts, and even boom boxes. Systems are proprietary and include both hardware and software.
• Speaker-Independent: Recognizes the spoken words of many people speaking a specific language.
• Text-to-Speech: Converts text commands into computer-generated synthetic voice commands.
• Digitized Speech: A human voice that is digitized into a sound file.
"There was more pragmatism for the next three years," explained Marc Wulfraat, partner at KOM International and author of the aforementioned report. "Then a few more big retailers bought in and in 1999, 2000, 2001, there was a flurry of activity, with growth rates of 50 percent to 400 percent for some voice players." More aggressive marketing and alliance activity on the part of voice vendors helped fuel the increase, added Chris Barnes, director of business development for CMAC, an Atlanta VAR that has installed voice systems.
In the grocery industry, margins are thin, velocity is high, and accuracy is essential. So grocers have been among voice pioneers. Grocery implementers generally see an 80 percent decline in mispicks and 15 to 20 percent productivity improvements over paper-based systems, said Vocollect's Mr. Sweeney.
K-VA-T, a grocery chain which distributes to its own 84 Food City stores, saw productivity increase 15 percent with the deployment of speech technology from Vocollect, while accuracy "skyrocketed" over the previous system of pick lists and case strips, according to Paul Widener, distribution systems manager. "In the grocery industry, every fraction of a cent helps. Adding those kinds of numbers to the bottom line is exceptional." The grocer measures its accuracy in terms of the number of electronic credits made to stores as a result of mis-shipments; that number declined 75 percent after voice technology was installed. K-VA-T has installed voice in four warehouses with one more planned.
When Corporate Express, a business-to-business office and computer supplier, installed its first voice technology system to replace a manual, less-than-case-quantity picking system, productivity doubled, exceeding the results of a pick-to-light system that was also being tested. Accuracy now approaches six sigma levels, and the firm spends less time auditing shipments. In one warehouse, sales managers were brought in to audit orders for six hours and no errors were found. "They were amazed," recalled Tim Beauchamp, senior vice president, distribution operations. Fewer customer disappointments are a less measurable, but key, benefit, he noted. Three Corporate Express warehouses now use voice, with 22 more planned, and use of voice for truck loading is under consideration. (For more information, see "Corporate Express Gets Up to Speed" in our April 2002 issue.)
One benefit anticipated by 7-Eleven, now deploying a Voxware solution in the operations of its logistics partner, Cardinal Dedicated Logistics, is the ability to easily accommodate multiple languages among pickers. Some voice users report lower turnover among pickers following deployment and faster training of new employees.
So far, food operators—grocery, food service, and food manufacturers—have been early adopters, as well as several retail verticals such as general merchandise, convenience and apparel manufacturing, automobile manufacturing, and firms with large package sortation operations, according to KOM.
Larger operations predominate, but smaller companies have also deployed voice, said Ken Finkel, vice president sales and business development for Voxware. Vendors report interest from the entire range of operations, from those with entirely manual distribution systems to those that leverage ERP and warehouse management systems.
Some have existing RF networks in place, while others need to deploy these as part of the rollout—or choose the less popular batch option. In fact, it's often at a decision point, such as when it's time to upgrade the RF system, that users consider trying voice, said Voxware's Mr. Finkel.
Full-case picking has dominated first deployments because it provides the fastest ROI. But some companies are beginning to apply voice to receiving, putaway, cycle counting, quality assurance, replenishment, and cross-docking.
New users of a speaker-dependent voice system must train the system to recognize them. According to Voxware, during the personalization process the worker is cued with a sequence of words to speak into the headset. As the worker recites the sequence, the system records and compresses a sample of his or her speaking patterns. This compressed sample is called a voice file and is unique to each user. The dialog voice file contains a broad sampling of the words and phrases used by the worker while performing logistics tasks, such as picking or replenishment. A separate file with a specific sign-on word is used to identify users for security purposes.
But voice vendors caution that there is no blanket definition for what environments are ideal. Determining a fit requires careful examination of the distribution operation, current processes, and goals.
But where it does fit, voice can be superior to other methods because it frees hands and eyes and eliminates trips to the pick assignment desk. A user with a scanner, for example, must remove the device from the holster, read the screen, scan the appropriate bar codes, enter data, and then re-holster the scanner to actually pick up the case or item. A voice user, on the other hand, can receive picking instructions by voice while en route to a picking slot, speak the slot number, hear the pick instruction and immediately pick up the merchandise. Those precious seconds add up to some big productivity gains when multiplied by millions of picks a year.
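To see how seconds per pick compound, here is a back-of-the-envelope sketch in Python. The inputs are purely illustrative assumptions, not figures reported by KOM, Vocollect, Voxware, or any of the users quoted here; the function name is likewise hypothetical.

```python
def annual_hours_saved(seconds_saved_per_pick: float, picks_per_year: int) -> float:
    """Convert a per-pick time saving into labor hours per year."""
    return seconds_saved_per_pick * picks_per_year / 3600.0


# Hypothetical inputs, chosen only to illustrate the arithmetic.
SECONDS_SAVED_PER_PICK = 3.0   # assumed: no holstering, screen reading, or keying
PICKS_PER_YEAR = 6_000_000     # assumed: annual volume of a full-case operation

hours = annual_hours_saved(SECONDS_SAVED_PER_PICK, PICKS_PER_YEAR)
print(f"{hours:,.0f} labor hours saved per year")  # 5,000 labor hours saved per year
```

Under these assumed numbers, three seconds a pick works out to 5,000 labor hours a year; an operation would substitute its own measured times and volumes.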
"A number of our customers have tried handhelds," said Vocollect's Mr. Sweeney. "The accuracy was great, but the productivity was not there." Corporate Express tried pick-to-light and scanning before settling on voice.
For operations where cases are heavy or freezers cloud screens and make gloves necessary, hands-free is particularly important. Compared to voice, pick-to-light and carousel systems are more difficult to reconfigure for changes in merchandise and operations. On the other hand, some environments, such as a highly productive split-case pick-to-tote-to-belt operation, may actually be slowed down by a voice interaction, suggested the KOM report.
Vendors hesitate to promise any blanket results. "The value proposition is different in different cases" of voice implementation, noted Voxware's Mr. Finkel, particularly since the starting point varies so widely.
Because potential users vary so widely, voice system developers offer applications to round out the picture: optimizing picking for those lacking WMS systems, for example, or offering real-time data updates for those whose WMSes lack that capability.
They've also struck alliances with WMS vendors. "Most WMS companies recognize that voice will be a part of the mix," said Voxware's Mr. Finkel. WMS developer Manhattan Associates is among those, but few of its customers have expressed interest in voice so far, said spokesperson Ellen Donovan. Catalyst International has one installation in beta and expects more voice projects this year, said Dan Trew, vice president of product strategy. "We've tried to build an interface in such a fashion that the vendor is not a big issue," he explained. "The code is in the product and we've done some alpha testing."
But while some WMS vendors have followed through on those alliances with integration and real-world deployments, KOM's Marc Wulfraat cautioned that others are little more than press-release relationships, with developers waiting for the first bona fide customer to fund the actual coding. "Retailers are going back and asking to real-time enable the WMS, but they have to pay for that," he said.
Older host applications often are not set up for real-time data and, therefore, he said, cannot fully exploit the benefits voice can deliver in real-time environments, such as dynamically replenishing picking slots.
Distribution operations "have never been able traditionally to attain real-time inventory," Mr. Wulfraat continued. "Computer systems couldn't keep up with it all. Now for the first time there is the potential to be able to feed information to computers every time the inventory is touched." RF has been the closest alternative, but voice is even faster, he explained.
Another choice is whether to use voice on its own or to combine it with other picking techniques. Corporate Express, for example, found accuracy declined when the picker was asked to recite more than two or three digits. So pickers scan locations and use voice to perform the actual pick. "We reviewed the process and experimented, and when it was just voice we didn't see as much accuracy as we would like," said Mr. Beauchamp. Others use scanning to capture expiration dates or serial numbers.
Corporate Express and K-VA-T both found implementation straightforward. The level of integration challenge "depends on how tightly you want the integration," said K-VA-T's Mr. Widener. "We went to where the largest payback was."
"We wanted lower capital costs, a lot less complex IT from an infrastructure" and faster implementation, said Corporate Express' Mr. Beauchamp. "I've installed all sorts of warehouse technology, and this is the most bulletproof system I've dealt with."
Here is a condensed example of actual prompts and valid responses that might be used in a picking dialog. (Courtesy of Voxware):
VI = voice input (user speaks to the system)
2. CALIBRATE SPEECH RECOGNIZER
3. SELECT PICKLIST
4. PICKING SEQUENCE
5. TAKE A LUNCH BREAK
6. WEIGH ITEMS
7. LOG OFF FROM THE SYSTEM
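The picking sequence step of such a dialog can be sketched as a simple prompt-and-response loop. The Python below is a minimal illustration under stated assumptions, not Voxware's software: the prompt wording, the `PickTask` fields, the check-digit confirmation, and the `run_pick_dialog` function are all hypothetical, and scripted strings stand in for recognized speech.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PickTask:
    slot: str          # location the worker is sent to
    check_digits: str  # assumed: short code posted at the slot, spoken to confirm
    quantity: int      # number of cases to pick


def run_pick_dialog(
    tasks: Iterable[PickTask],
    listen: Callable[[str], str],
) -> List[str]:
    """Walk a worker through a pick list and return a log of confirmed picks.

    `listen` plays a prompt and returns the worker's recognized reply.
    """
    log = []
    for task in tasks:
        # Direct the worker to the slot, then require the spoken check digits.
        reply = listen(f"Go to slot {task.slot}")
        while reply != task.check_digits:
            reply = listen(f"Wrong slot. Go to slot {task.slot}")
        # Give the quantity and wait for the worker to confirm the pick.
        reply = listen(f"Pick {task.quantity}")
        while reply != "ready":
            reply = listen(f"Pick {task.quantity}. Say ready when done")
        log.append(f"{task.slot}:{task.quantity}")
    return log


# Scripted replies stand in for a speech recognizer in this sketch.
replies = iter(["41", "47", "ready", "12", "ready"])
tasks = [PickTask("A-12", "47", 3), PickTask("B-07", "12", 1)]
print(run_pick_dialog(tasks, lambda prompt: next(replies)))
# ['A-12:3', 'B-07:1']
```

In the scripted run, the first reply ("41") does not match the slot's check digits, so the worker is re-prompted before the pick is confirmed. Keeping the spoken confirmation to two digits is a design choice of this sketch.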
It's easy for potential users to underestimate the costs of mispicks, both direct and the domino costs that they cause down the demand chain, noted K-VA-T's Mr. Widener, and therefore fail to see the ROI that voice can quickly produce.
"We're trying to get people to understand that it's not a matter of buying a talking scanner. It's a project, a process, a plan. It's not just throwing a lot of hardware," added Voxware's Mr. Finkel. Sometimes processes need to be rethought to take best advantage of the productivity improvements voice can bring.
Finally, there's the classic obstacle: resistance to change. It can be threatening to pickers to learn a new technology, particularly if their income is tied to their productivity. But users report that initial resistance quickly gives way to an embrace of the technology. Management can be similarly reluctant to risk a new technology.
While pickers at Corporate Express resisted voice a bit, it was the distribution supervisors and managers who balked more. "The biggest learning challenge was the cultural change in how they manage outbound activities," said Mr. Beauchamp. "There was a little bit of process change. We spent more time educating and training them in a new way to do business."
But with only two vendors in the game and adoption poised to spike, some are speculating that larger entities will step in, either buying out the current vendors or launching their own initiatives. Scanner companies, with their deeper pockets and well-entrenched sales channels, are one potential acquirer, suggested CMAC's Mr. Barnes. KOM's Mr. Wulfraat has been contacted by several European firms looking to make inroads in the U.S. market.
Voice technology's long history has finally led it to the point where application in noisy, busy distribution operations is a viable option. While no one envisions a totally voice-driven warehouse, voice vendors are confident that the technology will take its rightful place in the arsenal of technologies that help make distribution as accurate and efficient as possible.