Over the past five years there’s been a shift to the use of voice-directed warehouse applications running on standard RF terminals rather than voice-only hardware devices. The use of so-called multi-modal terminals opens the opportunity to combine voice with other modes of communication and data capture to enhance efficiency, accuracy and ease of use.
This also creates a more nuanced process optimization challenge. Rather than using voice direction and voice recognition at every step in a process, you now need to consider when scanning might add efficiency or accuracy to a voice-directed process, or where users could benefit from receiving information via a screen instead of voice.
Voice Goes Multi-Modal
Six years ago all voice-directed picking and other warehouse applications were delivered on voice-only hardware terminals that combined voice-directed work with speech recognition. In those days, voice- and RF-based applications were seen as mutually exclusive, with voice positioned as a replacement for bar code scanning. However, voice and scanning are complementary technologies. In fact, voice and scanning have been combined as far back as the 1990s, well before the introduction of multi-modal RF terminals supporting voice.
For example, for the past decade a major apparel retailer has been running a Jennifer voice-directed put-to-store application on voice-only terminals paired with external scanners. In this application the external scanner is used to capture SKU numbers on cartons of product prior to putting. Today this put-to-store application can run on a standard RF terminal and use the terminal’s built-in scanner, saving the expense of purchasing an external scanner.
The rapid adoption of multi-modal terminals for voice makes it easier and more cost-effective to combine voice, scanning and other technologies in new ways. The difficulty now is figuring out when it is best to use voice, when to scan, and when to use a terminal screen.
Voice plus Scan
In the apparel application above, the user scans the bar code on a carton to identify the product and initiate a put-to-store task. The voice system then tells the user how many items to take from the box and put into each tote for different stores. In a voice-only world, the user could speak the item or SKU number, but scanning is faster than speaking in this case. And since scanning in this example is used to initiate the put-to-store task, there is no time penalty for handling a scanner at that point in the process.
Another example of a pre-pick process that is well-suited to scanning is cart set-up in a batch picking situation (i.e., where multiple customer orders are picked to separate totes or cartons on a cart). Using voice direction and bar code scanning, the user can scan the bar code on each tote or carton as they place it on the picking cart, in effect telling the WMS which carton ID is associated with which customer order in the picking assignment. The voice-plus-scan approach to cart set-up is much faster than either a voice-only or scan-only process, making this a great example of how voice and scanning together are better than either one individually.
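To make the cart set-up step concrete, here is a minimal sketch of the pairing logic, assuming each scan is matched to the next open order in the assignment. The class name, order and tote IDs, and prompt wording are all hypothetical and do not come from any particular WMS or voice product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CartSetup:
    """Pairs scanned tote or carton IDs with the orders in one batch assignment."""
    order_ids: list[str]  # customer orders in the picking assignment
    tote_for_order: dict[str, str] = field(default_factory=dict)

    def next_order(self) -> Optional[str]:
        """Return the next order still waiting for a tote, or None when all are paired."""
        for order_id in self.order_ids:
            if order_id not in self.tote_for_order:
                return order_id
        return None

    def scan_tote(self, tote_id: str) -> str:
        """Record a scanned tote and return the text the voice system would speak."""
        if tote_id in self.tote_for_order.values():
            return f"Tote {tote_id} already on cart. Scan another tote."
        order_id = self.next_order()
        if order_id is None:
            return "Cart is full. Say ready to start picking."
        self.tote_for_order[order_id] = tote_id
        position = self.order_ids.index(order_id) + 1
        return f"Tote {tote_id} in position {position}."


# Hypothetical two-order assignment with one duplicate scan and one extra tote.
setup = CartSetup(order_ids=["ORD-1001", "ORD-1002"])
prompts = [setup.scan_tote(t) for t in ["T-55", "T-55", "T-77", "T-90"]]
```

In this sketch the scan supplies the long tote ID while voice confirms the result, so the user never has to read an ID aloud or look at a screen.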
What these two voice-plus-scan examples have in common is that scanning is used to initiate the pick or put task. Since the users are not yet grabbing items, there is no productivity or accuracy penalty to handling the RF device. And as noted in the cart set-up example, scanning is faster than voice data entry in that process.
To Scan or Not To Scan
What about scanning shelf labels or items within a pick, put, or other task? In almost every case, scanning within the pick (or other) transaction will be slower than speaking a check digit. This is true even if the user has a finger-mounted ring scanner. With voice, the user will speak a shelf-mounted check string while approaching the slot and reaching for the product without stopping. With scanning, the user must stop, aim and scan before reaching for the items, even when using a ring scanner. The time penalty for using a ring scanner is small, but in high volume pick operations every extra second per transaction adds up to significant man-hours over days, weeks and months. And scanning a shelf label is no more accurate than speaking a check digit.
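A quick back-of-the-envelope calculation shows how a small per-pick delay accumulates. Every figure below (one extra second per pick, 1,200 picks per shift, 20 selectors, 250 shifts per year) is an illustrative assumption, not measured data from any operation.

```python
def extra_hours_per_year(extra_seconds_per_pick: float,
                         picks_per_shift: int,
                         selectors: int,
                         shifts_per_year: int) -> float:
    """Labor hours added per year by a fixed delay on every pick."""
    total_seconds = (extra_seconds_per_pick * picks_per_shift
                     * selectors * shifts_per_year)
    return total_seconds / 3600  # seconds to hours


# Assumed values for illustration only.
hours = extra_hours_per_year(
    extra_seconds_per_pick=1.0,
    picks_per_shift=1200,
    selectors=20,
    shifts_per_year=250,
)
print(f"Extra labor: {hours:,.0f} hours per year")
```

Under these assumptions a single extra second per pick adds up to roughly 1,667 labor hours a year. Substituting your own pick volumes and headcount gives a rough sense of whether the penalty matters in your operation.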
On the other hand, for customers who need to capture serial data at the point of pick, bar code scanning may be a better alternative than voice. For example, it is faster to scan a serial number than to speak 8-12 digits. Likewise, there is a growing number of situations in which users may need to capture additional variable data at the point of pick, such as lot numbers, date codes and case weights. Depending on the frequency and type of data capture required, ring scanners can be an ideal complement to voice. (To see an example of how voice and scanning are combined for both cart set-up and serial data capture, see the RSR Group voice picking video available at www.lucasware.com/successes.)
Whether voice or scanning is better in a given DC or process depends on a number of factors, including the availability of bar codes, the ergonomics of scanning (the ease of scanning with a ring scanner or with the built-in RF device scanner), the specific data capture needs, and other details of the process.
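The considerations in this section can be boiled down to a rough rule of thumb. The sketch below is one hypothetical way to express it, not a rule taken from any product: the field names and the six-character threshold are assumptions, and a real decision would also weigh scanner ergonomics and other process details.

```python
from dataclasses import dataclass


@dataclass
class CaptureStep:
    name: str
    has_bar_code: bool          # is there a bar code to scan at this step?
    hands_on_product: bool      # is the user reaching for or holding items?
    characters_to_capture: int  # length of the value to confirm or record


def suggest_mode(step: CaptureStep, long_value_threshold: int = 6) -> str:
    """Suggest "voice" or "scan" for one data capture step (rule of thumb only)."""
    if not step.has_bar_code:
        return "voice"  # nothing to scan
    if step.characters_to_capture >= long_value_threshold:
        return "scan"   # long serial or lot numbers are quicker to scan than to speak
    if step.hands_on_product:
        return "voice"  # a short check digit can be spoken without stopping
    return "scan"       # task initiation: hands are free, so scanning costs nothing


steps = [
    CaptureStep("cart set-up tote ID", True, False, 10),
    CaptureStep("shelf check digit", True, True, 2),
    CaptureStep("serial number at pick", True, True, 10),
    CaptureStep("unlabeled slot", False, True, 2),
]
suggestions = {s.name: suggest_mode(s) for s in steps}
```

Encoding the rule this way forces the assumptions into the open: changing the threshold or adding an ergonomics factor shifts which steps land on each side.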
Screen with Voice
In comparison to voice-plus-scan, fewer voice-directed systems today use the RF terminal screen after the user signs on. This is somewhat understandable, as one of the big advantages of voice-directed applications is that users never need to look down at a computer screen while working. They can listen to their next instruction while moving with their heads up, which makes for a faster, more efficient and safer process. But there are some instances where presenting information on screen is preferable to voice. One good example of this relates to pallet-building in grocery, foodservice and other case-picking applications.
Anyone who has spent time in a grocery DC knows that pallet-building is a bit of an art form, and that there is some variation in how different selectors will stack their pallets. In paper-based systems, experienced selectors are very adept at looking at their pick lists and quickly identifying which items they should pick first to create a stable, non-crushable base on which they can easily stack the rest of the items in an order.
When moving to voice, you can take that decision-making out of the hands of selectors and trust the WMS or voice system to determine which items to pick as a base, in which order. But as much as we in the software industry like to believe our analysts and developers are smarter than selectors, experienced selectors may do a better job planning their own pallets. As a result, some selectors may experience a slight drop in productivity when moving to voice, as they may sometimes need to unstack and restack their pallets to create their preferred base.
And here’s where a terminal screen comes in. Rather than pre-determining the order in which to pick base items, the WMS can identify the potential base items and the voice system can display the items as a list on the terminal screen. By giving selectors this information in list form, they can make informed decisions about which items to pick first and pre-plan their pallet, avoiding pallet restacking during the pick process.
The voice system could deliver this list verbally, but a screen is a better delivery method for this quantity of information: a list of items with item number, description, cube, weight, location and quantity to pick. On the other hand, if managers don’t want to give all selectors the ability to decide which items to pick first, they can enable or disable base-item display for different selectors within the voice system.
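As a rough illustration of the base-item display described above, the sketch below builds the screen rows from an assignment and honors a per-selector on/off setting. The `base_candidate` flag, the units, and the heaviest-first ordering are assumptions made for the example; an actual WMS or voice system may identify and present base items differently.

```python
from dataclasses import dataclass


@dataclass
class PickLine:
    item_number: str
    description: str
    cube: float           # cubic feet per case (unit is an assumption)
    weight: float         # pounds per case (unit is an assumption)
    location: str
    quantity: int
    base_candidate: bool  # assumed to be flagged by the WMS


def base_item_screen(lines: list[PickLine], show_base_items: bool) -> list[str]:
    """Return the rows to display, or an empty list if display is disabled for this selector."""
    if not show_base_items:
        return []
    candidates = [line for line in lines if line.base_candidate]
    # Heaviest first is only a display order; the selector still chooses what to pick.
    candidates.sort(key=lambda line: line.weight, reverse=True)
    return [
        f"{c.item_number}  {c.description}  {c.cube} cu ft  "
        f"{c.weight} lb  {c.location}  qty {c.quantity}"
        for c in candidates
    ]


# Hypothetical assignment with two possible base items and one fragile item.
assignment = [
    PickLine("10442", "Canned tomatoes 6/10", 1.1, 42.0, "A-12-3", 4, True),
    PickLine("20817", "Potato chips 12 ct", 2.4, 6.5, "C-04-1", 2, False),
    PickLine("10990", "Bottled water 24 pk", 1.3, 28.0, "B-07-2", 6, True),
]
rows_enabled = base_item_screen(assignment, show_base_items=True)
rows_disabled = base_item_screen(assignment, show_base_items=False)
```

Keeping the enable flag as a simple per-selector setting mirrors the managerial choice described above: experienced selectors see the list and plan their own base, while others follow the system's pick sequence.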
In addition to presenting lists of information, the terminal screen can also be used in other situations. For example, if selectors are confused about pack factors when picking an item (what does one “each” or box look like?), the voice system can display an image of the item or package to pick. This “show me” capability is a relatively new addition to the arsenal that could have wide use, especially outside of picking.
All Together Now
The next frontier in voice-plus applications is the integration of voice and RFID. There are some significant technical and cost obstacles that need to be overcome before RFID will be adopted widely in the DC, but the technology offers potentially large productivity and accuracy advantages for task verification and data collection.
The long-term ideal would be a voice-directed application in which a user is directed to a location by voice and then triggers (by voice) an RFID reader to read a tag on a product or location. Compared to bar code scanning, RFID is a true hands-free technology. Compared to voice, RFID reduces the potential for human error (for example, reading a location check string but picking from an adjacent slot).
In truth, it will be some time before voice-plus-RFID applications enter the mainstream of DCs. In the meantime, we can expect to see new and innovative combinations of voice with scanning, screens and keypads that will drive levels of accuracy and productivity above and beyond what is possible with earlier voice-only or scan-only systems.
Chris Sweeney is senior vice president and Dan Keller is senior solutions manager at Lucas Systems Inc., developers of voice directed applications for distribution centers.