
Vision and Voice Picking

Everything you need to know about vision and voice picking in a warehouse

Learn why it’s the better combination to increase DC productivity, how it beats out RF scanners, Pick to Light, and voice only, and find real-life case studies of warehouses driving value with it today.


Vision and Voice Picking: What is It?

For any warehouse, no matter the industry or product, picking is an essential operation. It's estimated that over half of warehouse workers are involved in picking, packing, or sorting, meaning that for many, the day begins and ends focused on this task alone.

The central question facing distribution centers is how to improve the way picking is performed: making it easier for workers to perform repetitive tasks across an entire shift without losing count or making errors, all while getting products out the door faster.

When it comes to improving picking, the market is full of technology options that promise better results. Of all of them, Vision and Voice picking is a strategic pairing that offers a powerful advantage.

Put simply, vision and voice picking combines visual and audible instructions to guide a worker in a highly intuitive way through every pick. The precise way these instructions come together outpaces other technologies in improving accuracy and reducing errors, driving compelling operational impact.

As the name suggests, there are two parts to vision and voice picking: 

  • Vision picking: visual instructions seen through Augmented Reality smart glasses showing intuitive, easy-to-follow overlays on real world locations for each step. Example: Green overlay signals a correct location, vs red for incorrect.
  • Voice picking: spoken instructions heard audibly from headphones, directing the worker to locations and items one by one. Example: Spoken cue: “Pick from Bin A4.”

Usually these are separate technologies. But what many warehouses are realizing today is that when combined, they’re even more powerful.

Watch the case study of how one warehouse launched vision and voice picking to beat conveyor congestion – revamping operations at a critical time.

Watch the video now.

Why Combine Vision and Voice Picking?

What makes this combination more effective when applied to picking?

When workers can see and hear instructions simultaneously, all of a sudden they’re empowered with a vastly more effective form of instruction to unleash their best work. 

Why?  Vision and voice picking taps into more of the worker’s senses as a multimodal experience.

By activating several senses at once, the audio-visual combination makes it clear what to do, with no room for doubt. The worker not only hears a cue but also sees graphical overlays directly on their environment as visual reinforcement at the same time. For a picker responsible for sourcing hundreds of items from near-identical shelving and packages that all look the same from the outside, this degree of clarity is a game-changer. Cutting through the complexity to zero in on the item needed, they can retrieve it faster and correctly the first time.

Voice-directed cues have common challenges, such as noisy environments that force the worker to slow down and wait for repeated commands. The presence of visuals removes that delay: even if a cue is missed, the worker has the information needed to proceed with the task and keep their rhythm. And even when voice commands are heard clearly, adding visuals takes the process from reactive to proactive, with workers mentally clued into their next destination and already heading towards it, sometimes before a voice command has been fully completed.

The winning combination provides workers with:

  • Intuitive, natural cues easily processed by the human brain both audibly and visually
  • Multiple sources of information for faster responses, instant reactions to speed tasks
  • Fewer barriers of repetition, hesitation, or delay that slow down work
  • Increased clarity and confidence in each micro-decision, adding up to fewer costly errors
  • A guard against going “mind-numb” across repetitive tasks, keeping workers alert and productive

The fact is, when a worker has both voice and visuals at their disposal, the combined effect removes barriers and unlocks their full potential.

How it Works

How do vision and voice picking technologies combine into a single solution that workers can use?

It begins with a software platform that connects the technologies with the warehouse’s underlying systems so they work together as one:

  1. The software is integrated with the Warehouse Management System (WMS) or other system of record to access order fulfillment information.
  2. Through simple workflow building software tools, the series of steps workers will follow are mapped out as a process.
  3. Workers wear a pair of Augmented Reality (AR) smart glasses with built-in speakers, so order information is displayed as visual overlays and heard as spoken commands.
  4. For each pick, computer vision technology scans for barcodes matching each order, while AI correlates and filters relevant information to optimize the pick route.
  5. The worker sees persistent color-coded, symbol-driven cues that give universally recognizable guidance, overlaid through AR on real-world pick locations, while hearing voice commands at the same time.
  6. Following either the visual or voice command—whichever is most useful at that moment—workers progress through the instructions to complete a pick route.
  7. Errors are flagged by the system for immediate correction, prompting the worker to make the fix and avoid costly order returns.
  8. Coordination with robotics, existing automation, and other systems takes place within the software – optimizing how all moving parts come together to complete the operation.
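The scan, cue, and correct loop in steps 4 through 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `PickTask`, `Cue`, `cue_for_scan`, and `run_route` are hypothetical names invented here, not part of LogistiVIEW or any WMS, and a real deployment would take its tasks from the system of record and its scans from the smart glasses.

```python
from dataclasses import dataclass


@dataclass
class PickTask:
    """One line of a pick route (hypothetical structure, not a real WMS schema)."""
    location: str   # bin label, e.g. "A4"
    barcode: str    # barcode expected at that bin
    quantity: int


@dataclass
class Cue:
    """What the worker sees and hears in response to one scan."""
    overlay_color: str  # "green" = correct location, "red" = incorrect
    spoken: str


def cue_for_scan(task: PickTask, scanned_barcode: str) -> Cue:
    """Compare a scanned barcode with the current task and build the paired cue."""
    if scanned_barcode == task.barcode:
        return Cue("green", f"Pick {task.quantity} from Bin {task.location}.")
    return Cue("red", f"Wrong location. Go to Bin {task.location}.")


def run_route(tasks, scans):
    """Walk a pick route; a task is complete only once the right barcode is scanned."""
    cues = []
    scan_iter = iter(scans)  # shared iterator: each scan is consumed exactly once
    for task in tasks:
        for scanned in scan_iter:
            cue = cue_for_scan(task, scanned)
            cues.append(cue)
            if cue.overlay_color == "green":
                break  # pick confirmed (or error corrected); move to the next task
    return cues


route = [PickTask("A4", "111", 2), PickTask("B1", "222", 1)]
cues = run_route(route, ["999", "111", "222"])  # first scan is at the wrong bin
```

Here the first scan produces a red overlay with a spoken correction, and the worker moves on only after a matching scan, which mirrors the immediate error flagging described in step 7.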

Because the system is completely portable, with nothing bolted down, pick locations can be moved or adapted at any time as business needs change.

Learn more about Vision Picking installations being deployed in warehouses today.


What are warehouses accomplishing with vision and voice picking?

For a task as central as DC picking, the operation-wide impact of vision and voice is felt at all levels. New benchmarks become possible in accuracy and efficiency, alongside savings in time, errors, and expense. Extending from workers on the job, to operations management, to top-line results, it’s truly a business-wide effect.

For workers this means:

  • More accurate, efficient results on every task, every shift—every time
  • Powerful toolset to perform at their best, unlocking full potential
  • Constant support to rely on to combat mistakes, fatigue, and barriers on the job
  • Increased engagement, job satisfaction with enabling tools for better, more efficient work

For operations managers this means:

  • Empowering a more capable workforce, equipped to work in a more natural, intuitive way
  • Training reduced to minutes for rapid onboarding of new and temporary staff, and of staff from all language backgrounds
  • Confidence in order fulfillment at the utmost quality with built-in guards preventing errors
  • Flexibility to adapt configurations responding to spikes and cycles with changes on the fly

For the company this means:

  • Operations-wide order fulfillment improvements to reduce costly returns, drive customer value
  • Productivity investment in people and process to optimize and strengthen every task performed
  • Adaptability to stay ahead of changing needs to support ongoing business growth
  • Extending the capacity of DCs, systems, and workers – going further with the right toolset

Scorecard: Vision and Voice Picking vs Pick to Light (PTL)

How do Vision Picking and Voice compare to more traditional warehouse technologies like Pick/Put to Light (PTL)?

Here are the facts:

According to the Material Handling Institute (MHI), PTL “Uses light devices mounted at item locations on flow rack, shelving, workstations or other storage media to guide operators…Illuminated LEDs direct pickers to the right product location, and display the required unit quantity. After picking all pieces the operator confirms the completed activity, often by pressing a button on the device.”

Translation?  PTL demands hardwired lights, and mounted racking.  That means a bolted down, fixed footprint of expensive assets.  Instead, Vision Picking and Voice provides all the advantages of PTL – without the cost and permanent footprint.

At 80% less investment than traditional lights, Vision and Voice Picking provides PTL’s capabilities at a fraction of the cost through a technology-driven approach that eliminates the fixed, expensive hardware.

The difference is:

  • Light-based instructions delivered through Augmented Reality – simulated through smart glasses, not hardwired electrical installations.
  • Visual overlays signal which bins to pick from – and can be shown on any shelf in the warehouse, not limited to one dedicated display.
  • Computer vision scans the barcodes in the vicinity; the results are cross-checked with WMS data and filtered by AI to power the instructions.
  • Intuitive, instantly-recognizable visual cues make it simple to know which item to pick – as simple as green vs red, even showing a picture of the product packaging to speed identification.
  • The worker sees their specific instructions right through the smart glasses (complemented by voice) – meaning multiple distinct PTL sequences can be running in the same area with each worker following different instructions.
  • Requiring no infrastructure bolted anywhere to the floor, the whole system can be moved at a moment’s notice – with no added cost.
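To make the per-worker point concrete, here is a minimal Python sketch in which two workers face the same shelf yet each receives different overlays. All names (`overlays_for_worker`, `shelf`, `assignments`) are hypothetical, and coloring every unassigned bin red is an assumption made for illustration; the actual product may simply leave those bins unmarked.

```python
def overlays_for_worker(visible_bins, assigned_bins):
    """Map each visible bin to an overlay color for ONE worker.

    Assumption for illustration: assigned bins are green, all others red.
    """
    return {
        bin_id: ("green" if bin_id in assigned_bins else "red")
        for bin_id in visible_bins
    }


# Both workers look at the same physical shelf...
shelf = ["A1", "A2", "A3", "A4"]

# ...but each has a different set of bins to pick from (hypothetical data).
assignments = {
    "worker_1": {"A1", "A4"},
    "worker_2": {"A2"},
}

# Each worker's glasses render only that worker's own view.
views = {
    worker: overlays_for_worker(shelf, bins)
    for worker, bins in assignments.items()
}
```

Because the overlay is computed per worker and not wired to the shelf, bin A4 shows as green for `worker_1` and red for `worker_2` at the same moment, which a single hardwired light could not do.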

The picking process is seamless, and allows a degree of flexibility that’s simply not possible with traditional PTL.

Watch the webinar “Reinventing Pick/Put-to-Light with Augmented Reality: How omnichannel warehouses are achieving Flexible Automation and ROI in less than 90 days.”  

Learn about the award-winning LogistiVIEW Vision Pick & Put Wall voted 2018 Reader’s Choice by Material Handling Product News.

Vision and Voice Scorecard vs PTL


Scorecard: Vision and Voice Picking vs Standalone Voice

What are the advantages of Vision Picking and Voice compared to voice only?

Many warehouses already use voice-directed solutions today, so why add vision into the equation?

Ultimately it’s about giving your workers information in the form that best drives action. On the typical warehouse floor, noise, congestion, and constant movement are nonstop. When you’re responsible for completing hundreds of repetitive tasks a day amidst this, it’s critical to have seamless cues to progress easily from one step to the next. Having sustained access to not just one type of input, but several, is the key to removing the friction that stands in the way of the best picking possible.

Vision can step in where voice falters in scenarios when:

  • Noisy environments drown out commands so the system or the worker cannot hear them, requiring repetition
  • Waiting for a system to finish verbalizing a long instruction slows experienced workers down
  • Language barriers or accents lead to repeated commands, delaying action
  • Distraction or fatigue at the end of a shift causes the worker to miss the cue, requiring a repeated command


In any of these scenarios, having visual cues at the ready enables the worker to progress straight through the task, avoiding repetition or delay. Armed with a visual cue, they can instinctively confirm or jog their memory of what they just heard, or fill in the gap left by a missed prompt. Instead of interrupting a flow that could have continued smoothly, visual cues fill in the blanks so workers can proceed fluidly. These split-second hesitations compound across a shift and across a workforce, directly affecting business performance.

The reverse is also true: in the moments where a voice command is easier for workers to catch, they can take action immediately based on what they just heard. The point is that voice and visuals complement one another, each filling the other’s gaps to deliver the information the worker needs and remove the friction that often stands in the way of optimal picking.

When it comes to how they stack up, it’s not an either-or scenario where visuals or voice are inherently better. Instead, they’re better together. The intersection of both hits a sweet spot that enables better action.


Scorecard: Vision and Voice Picking vs RF Scanners

What does Vision and Voice Picking provide that RF scanners cannot?

If there’s a “workhorse” of picking today, it’s the ubiquitous Radio Frequency (RF) scanner.  What advantages do vision and voice picking bring to the table vs this tried and tested method?

It boils down to this: hands free, eyes on the product. Compared to clutching an RF gun, vision and voice picking frees up the worker’s hands while picking. Scanning takes place automatically through head-worn smart glasses with merely a glance, meaning products can be handled more easily and freely as they’re being picked. But it’s not just hands that stay on the product – it’s also the worker’s eyes. Whether handheld, wrist-mounted, or ring-style, RF devices require the worker to look away from the product every time they need to read a cue. Each diverted look is a micro-hesitation in which workers need to regain focus on where they left off, slowing task completion. When cues are instead available both visually in front of them and audibly, there’s no need to interrupt the focus on the inventory. The result is continuous focus and smooth transitions from one pick to the next, without the gaps in which losing track and losing focus slow tasks down.

Vision and voice picking outperforms RF scanners in terms of:

  • Keeping hands free to handle product, eyes remaining on inventory at all times
  • Eliminating the need to look away, disturbing focus and creating cognitive interruption
  • Avoiding workers losing track of where they left off, introducing potential for error

There is good news, however, for those unwilling to let the trusty RF gun go: it is entirely possible to integrate RF with vision and voice picking. Scanning can be performed via either the RF gun or the smart glasses, so the two processes can easily be merged into one.

While traditionalists may cling tight to their RF guns, the race to advantage in today’s warehouse is increasingly about eyes on the prize – which is precisely what vision and voice delivers.

Case Study: E-Commerce Retailer Launches Vision & Voice Picking in 90 Days

Read how omni-channel retailer Peter Millar faced conveyor congestion due to unforeseen business growth – and needed a solution fast. Learn why they chose Vision and Voice picking over PTL and standalone voice, and what they achieved in a rapid implementation.

Read Peter Millar Case Study Now

Webinar: Vision and Voice Picking Reinventing PTL

Learn more about how DCs are using vision-voice combinations to drive new efficiencies – some launching in less than 90 days. Watch “Reinventing Pick/Put-to-Light with Augmented Reality: How omnichannel warehouses are achieving Flexible Automation and ROI in less than 90 days.”

In this webinar hosted by the Warehousing Education & Research Council and LogistiVIEW, you’ll learn from real examples of how vision and voice picking are applied to reinvent operations – changing what’s possible, and turning around peak seasons as a result.


Software Integration of Vision and Voice Picking

Many assume that a modern, up-to-date system is a prerequisite for implementing vision and voice picking. But for many DCs today, that’s just not the reality. Legacy systems relying on older infrastructure such as Telnet or green-screen back-ends are still the norm in many warehouses.

The dreaded question then becomes: do we need to change our system?  And the answer is no. 

Changing a system involves risks of instability, long software integration headaches, and escalating costs.  And while many legacy systems were not built to integrate with modern technologies, the right integration approach is the key to making them work.


It is possible to get all the benefits of Vision and Voice Picking with the exact same systems you have today. Legacy databases, Telnet systems, WMS, WCS, and other green-screen “dinosaurs” can be made to work with Vision and Voice Picking – without custom coding or back-end changes. Unique software integration approaches are taking legacy warehouse systems far beyond what they were originally envisioned to do, adding modern capabilities to drive productivity to new heights.

How is this done? 

Check out this presentation to learn more: “De-Risking Wearable Deployment: How 3 Companies Integrated LogistiVIEW without Modifications to their Legacy Systems.” It explains how an apparel retailer, medical device manufacturer, and dental supplies distributor each rapidly changed how their picking process took place – with the systems they had.

Interested in bringing vision and voice picking to your warehouse?

Just fill out the form below, and schedule a time that works for you.