ASR

RECOGNIZER

DEEP NEURAL NETWORKS

 

The ASR engine is built entirely in-house by VoiceGain, using Deep Neural Networks as the core enabling technology.

 

The ASR offered by VoiceGain is not merely a repackaging of existing academic or open-source projects. Rather, we have used the latest DNN research to build a custom speech recognition pipeline, which gives us full control over every aspect of it.

​

Out-of-the-box accuracy (using standard models) for transcription of lecture- or talk-type audio ranges from 90% to 95% for real-time use cases and from 92% to 97% for offline transcription.
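The page does not say how these accuracy figures are measured. A common convention, assumed here, is word accuracy computed as 1 minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The function names below are illustrative, not part of any VoiceGain tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition of accuracy: 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

Under this definition, one wrong word in a ten-word reference gives 90% accuracy. Because insertions count as errors, the value can drop below zero for very poor output.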

 

Out-of-the-box accuracy for IVR-type recognition is about the same as that of other commercial ASR engines. However, see the next point about customization.

​

USER CUSTOMIZABLE

 

Our customers can modify both the Language Model and the Acoustic Model used by the ASR.

​

Language Model customization requires only a simple upload of domain-specific corpus and vocabulary files.

​

Acoustic Model customization requires accurately transcribed audio files. They do not have to be time-aligned, so the effort to prepare them is relatively low.

​

With custom Acoustic and Language Models, some of our customers have achieved accuracy above 98% for real-time transcription.

​

Customizing the ASR for IVR recognition has allowed us to reach recognition accuracy 2% or more above the results of major commercial IVR ASRs on specific domains.

DEPLOYMENT

IN CLOUD

 

Every VoiceGain customer account has access to in-cloud functionality. You can start live transcription within minutes of creating an account.

​

IVR use is also possible in the cloud. You can connect to one of our SIP endpoints, or provision a phone number via our web portal.

​

Services in the cloud are available in two pricing tiers.

  1. Premium SLA ASR, best suited for critical real-time recognition.

  2. Economy SLA ASR, suited for offline or less critical use cases; it benefits from lower pricing.

​​

​

ON PREM

 

OnPrem deployment provides the same functionality as the Cloud plus the following extras:

  • MRCP interface for the IVR ASR

  • Acoustic Model Customization

  • IVR Tuning Tool

 

Additional benefits of OnPrem are: (a) full control of data and data security, (b) lower per-minute price.

​

OnPrem setup requires a Kubernetes cluster with Nvidia CUDA GPUs, plus document and object storage, all provided by the customer.

​

VoiceGain apps are auto-deployed into the Kubernetes cluster and managed by the customer from the cloud web portal.

INTERFACE

RESTFUL WEB API

 

VoiceGain Web API for ASR supports:

  • large vocabulary transcription - real-time, semi-real-time, and offline,

  • recognition using context-free grammars (e.g. GRXML)
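For readers unfamiliar with GRXML, the sketch below builds a minimal grammar in the W3C SRGS XML form using only the Python standard library. The function name, rule id, and phrases are illustrative assumptions. How a grammar is attached to a VoiceGain recognition request is not described on this page, so that step is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Sequence

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_choice_grammar(rule_id: str, phrases: Sequence[str], lang: str = "en-US") -> str:
    """Return an SRGS XML grammar whose root rule accepts exactly one of `phrases`."""
    # Serialize SRGS elements in the default namespace (no prefix).
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {
            "version": "1.0",
            "root": rule_id,
            "mode": "voice",
            f"{{{XML_NS}}}lang": lang,  # serialized as xml:lang
        },
    )
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for phrase in phrases:
        item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
        item.text = phrase
    return ET.tostring(grammar, encoding="unicode")
```

Calling `build_choice_grammar("answer", ["yes", "no"])` yields a grammar with a single public rule containing a `one-of` list of two items, which is the typical shape for a simple IVR yes/no prompt.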

​

Speech audio input via: (a) included in the request, (b) retrieved from a provided URL, (c) sent via HTTP/2, (d) streamed from a provided Java utility.

​

Recognition results available via: (a) polling, (b) webhook callback, (c) HTTP/2, (d) websockets, (e) NATS queue.
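Of the delivery options above, polling is the simplest to reason about. The sketch below shows a generic polling loop with a timeout. It deliberately does not call any VoiceGain endpoint: `fetch_status` is a hypothetical caller-supplied function that performs the actual HTTP request, and the `"done"` field is an assumed marker for a finished result, not a documented response field.

```python
import time
from typing import Callable, Optional


def poll_for_result(
    fetch_status: Callable[[], dict],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call `fetch_status` until it reports completion or the timeout expires.

    Returns the final status dict, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("done"):  # "done" is an assumed field name
            return status
        # Stop if another wait would take us past the deadline.
        if time.monotonic() + interval_s > deadline:
            return None
        sleep(interval_s)
```

Polling trades latency for simplicity. For real-time use, a push mechanism such as the webhook or websocket options avoids the delay of up to one interval between a result becoming ready and the client noticing it.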

​

Auxiliary API methods for, e.g.: (a) language model construction, (b) websocket management.

​

WEB PORTAL

 

A Cloud Web Portal is provided to facilitate the following tasks:

  • status dashboard

  • results / report browsing

  • ASR configuration

  • Language Model construction

  • transcription output queue management

  • management of OnPrem deployments

  • IVR tuning & regression tools

  • IVR prompt management

  • user management

  • API documentation and support

  • billing

​

The local, OnPrem Web Portal additionally enables training of custom acoustic DNN models.

IVR

MRCP IVR

 

The OnPrem install provides a drop-in replacement for most ASR engines that support the MRCP interface, for example Nuance 10 Recognizer.

​

It has been tested for compatibility with Dialogic and Aspect VXML engines.

​

Supports SRGS/GRXML grammars; many built-in dynamic grammars are available.

​

A basic TTS engine is included, as is an advanced audio prompt server with automatic concatenation.

​

Comes with powerful web-based tuning and regression tools.

​

ASR is priced per recognizer time, not per port, so there is no need to pay when it is not in use.

​

SIP IVR

 

For customers who are not tied to the MRCP interface or to VXML portals, we offer a simpler, modern IVR.

​

In this mode, the IVR connects directly to a PBX or an SBC via a SIP/RTP endpoint.

​

Call flow can be controlled via webhooks, for example from a Node.js application.
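To illustrate the idea of webhook-driven call flow, here is a minimal sketch of the decision logic such a webhook application might contain. Everything about the message format is an assumption made for illustration: the event fields (`type`, `result`), the response fields (`action`, `prompt`, `grammar`, `target`), and the function names are hypothetical and are not taken from the VoiceGain API.

```python
import json


def next_step(event: dict) -> dict:
    """Decide the next IVR action from a (hypothetical) webhook event."""
    kind = event.get("type")
    if kind == "call_started":
        # Greet the caller and ask an initial question.
        return {"action": "ask", "prompt": "Say sales or support.", "grammar": "department"}
    if kind == "recognition":
        choice = event.get("result")
        if choice in ("sales", "support"):
            return {"action": "transfer", "target": choice}
        # No match: ask the same question again.
        return {"action": "ask", "prompt": "Sorry, sales or support?", "grammar": "department"}
    # Unknown event types end the call.
    return {"action": "hangup"}


def handle_webhook(body: bytes) -> bytes:
    """Parse a JSON request body and return the JSON response body."""
    return json.dumps(next_step(json.loads(body))).encode("utf-8")
```

Keeping the decision logic in a pure function such as `next_step` makes the call flow easy to unit test without placing real calls. `handle_webhook` can then be wired into any HTTP framework.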

​

Supports standard GRXML as well as our easy and compact JJSGF grammar format.

​

The same tools as for the MRCP IVR also apply to this IVR flavor.

​

Also priced per recognizer time, not per port, so there is no need to pay when it is not in use.


ROADMAP

FEATURES COMING SOON

 

Nvidia EGX compatibility

​

Custom Acoustic Model training in the Cloud

​

NextGen Analytics engine - ASR engine with integrated real-time speech analytics capabilities.

​

Keyword Wake-Up mode - do not waste ASR time/resource while waiting for a command.

​

Advanced DNN-based TTS - more natural sounding voices.

​

Spanish Language Support - ASR and TTS

​

IVR Tasks - define your own or use those from the VoiceGain Task Library

​

Advanced USPS address recognition - high accuracy through integration into LM.

​

On-Device ASR reference implementation - does not require a server or the Cloud for recognition

APPLICATIONS
