Product | VoiceGain

ASR RECOGNIZER

DEEP NEURAL NETWORKS

VoiceGain built its ASR engine entirely in-house, with Deep Neural Networks (DNNs) as the core enabling technology.

The ASR offered by VoiceGain is not merely a repackaging of existing academic or open-source projects. Rather, we have used the latest DNN research to build a custom speech recognition pipeline, which gives us full control over all of its aspects.

Out-of-the-box accuracy (using standard models) for transcription of lecture- or talk-type audio ranges from 90% to 95% for real-time use cases and from 92% to 97% for offline transcription.
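Transcription accuracy is commonly reported as 1 minus the word error rate (WER), where WER counts word substitutions, deletions, and insertions against a reference transcript, divided by the number of reference words. This page does not say how VoiceGain measures accuracy, so the sketch below simply assumes that common convention and computes WER with a word-level Levenshtein distance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def accuracy(reference: str, hypothesis: str) -> float:
    """Assumed convention: accuracy = 1 - WER (can drop below 0 with many insertions)."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

For instance, one substituted word in a ten-word reference gives a WER of 0.1, i.e. 90% accuracy under this convention.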

Out-of-the-box accuracy for IVR-type recognition is about the same as that of other commercial ASR engines. However, see the next section on customization.

USER CUSTOMIZABLE

Our customers can customize both the Language Model and the Acoustic Model of the ASR.

Language Model customization requires only a simple upload of domain-specific corpus and vocabulary files.
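The page does not specify the layout of the corpus or vocabulary files. As a purely hypothetical illustration, a vocabulary list could be derived from a domain corpus by keeping the words that occur at least `min_count` times; the tokenization rule, the `min_count` threshold, and the one-word-per-line output are assumptions, not VoiceGain's documented format.

```python
import re
from collections import Counter


def build_vocabulary(corpus_lines, min_count=2):
    """Return words seen at least min_count times, most frequent first (ties alphabetical)."""
    counts = Counter()
    for line in corpus_lines:
        # Lowercase, then keep runs of letters/digits joined by internal apostrophes or hyphens.
        counts.update(re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", line.lower()))
    kept = [(word, n) for word, n in counts.items() if n >= min_count]
    kept.sort(key=lambda wn: (-wn[1], wn[0]))
    return [word for word, _ in kept]


def vocabulary_file_text(words):
    """Hypothetical vocabulary file layout: one word per line."""
    return "\n".join(words) + "\n"
```

A frequency threshold is only one way to trim noise from a corpus; in practice a domain glossary could be supplied directly instead.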

Acoustic Model customization requires accurately transcribed audio files. The transcripts do not have to be time-aligned, so the effort to prepare them is relatively low.
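Because no time alignment is needed, each training example can be as simple as an audio file paired with its whole transcript. The sketch below assumes a hypothetical layout in which `call01.wav` sits next to `call01.txt`, and builds a JSON Lines style manifest; the file naming convention and the `audio`/`text` fields are illustrative assumptions, not a documented VoiceGain upload format.

```python
import json
from pathlib import Path


def build_manifest(data_dir, audio_ext=".wav"):
    """Pair each audio file with a same-named .txt transcript (no timestamps needed).

    Returns (manifest_lines, missing), where missing lists audio files
    that have no transcript next to them.
    """
    manifest, missing = [], []
    for audio in sorted(Path(data_dir).glob(f"*{audio_ext}")):
        transcript = audio.with_suffix(".txt")
        if not transcript.exists():
            missing.append(audio.name)
            continue
        # Collapse whitespace; the whole file is one utterance-level transcript.
        text = " ".join(transcript.read_text(encoding="utf-8").split())
        # Hypothetical manifest fields: just the file name and its full text.
        manifest.append(json.dumps({"audio": audio.name, "text": text}))
    return manifest, missing
```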

With custom Acoustic and Language Models, some of our customers have achieved accuracy above 98% for real-time transcription.

Customizing the ASR for IVR recognition has allowed us to reach recognition rates 2% or more above those of major commercial IVR ASRs on specific domains.

DEPLOYMENT

IN CLOUD

Every VoiceGain customer account has access to in-cloud functionality. You can start live transcription within minutes of creating an account.

IVR use is also possible in the cloud. You can connect to one of our SIP endpoints, or provision a phone number via our web portal.

Services in the cloud are available in two pricing tiers.

Premium SLA ASR: best suited for critical real-time recognition.
Economy SLA ASR: suited for offline or less critical use cases, at lower pricing.

ON PREM

OnPrem deployment provides the same functionality as the Cloud plus the following extras:

MRCP interface for the IVR ASR
Acoustic Model Customization
IVR Tuning Tool

Additional benefits of OnPrem are: (a) full control of data and data security, (b) a lower per-minute price.

OnPrem setup requires a Kubernetes cluster with Nvidia CUDA GPUs, plus document and object storage, all provided by the customer.

VoiceGain apps are auto-deployed into the Kubernetes cluster and managed by the customer from the cloud web portal.

INTERFACE

RESTFUL WEB API

The VoiceGain Web API for ASR supports:

large-vocabulary transcription: real-time, semi-real-time, and offline
recognition using context-free grammars (e.g., GRXML)

Speech audio input via: (a) inline data in the request, (b) retrieval from a provided URL, (c) HTTP/2, (d) streaming from the provided Java utility.
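As a hedged illustration of options (a) and (b), inline audio is typically base64-encoded into a JSON request body, while URL input only carries a link. The field names below (`audio`, `inline`, `url`) are hypothetical placeholders, not VoiceGain's documented request schema; the real format is in the API documentation.

```python
import base64
import json


def build_request_body(audio_bytes=None, audio_url=None):
    """Build a JSON body carrying audio inline (base64) or by URL.

    The field names are hypothetical; exactly one audio source must be given.
    """
    if (audio_bytes is None) == (audio_url is None):
        raise ValueError("provide exactly one of audio_bytes or audio_url")
    if audio_bytes is not None:
        # Option (a): the audio travels inside the request itself.
        audio = {"inline": base64.b64encode(audio_bytes).decode("ascii")}
    else:
        # Option (b): the service would fetch the audio from this URL.
        audio = {"url": audio_url}
    return json.dumps({"audio": audio})
```

Inline data suits short utterances; base64 inflates the payload by about a third, which is one reason URL or streamed input is preferable for long recordings.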

Recognition results are available via: (a) polling, (b) webhook callback, (c) HTTP/2, (d) WebSockets, (e) NATS queue.
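Polling, option (a), is the simplest delivery mode to picture: the client repeatedly asks for the job status until it reaches a terminal state. The sketch is transport-agnostic: `fetch_status` is a hypothetical callable standing in for an HTTP GET on a result URL, and the `DONE`/`ERROR` status strings are assumptions, not VoiceGain's documented values.

```python
import time


def poll_until_done(fetch_status, interval=1.0, timeout=60.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal state or the timeout expires.

    fetch_status is a hypothetical stand-in for an HTTP GET on a result URL;
    it must return a dict with at least a "status" key.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "DONE":          # assumed terminal success state
            return result
        if status == "ERROR":         # assumed terminal failure state
            raise RuntimeError(f"recognition failed: {result}")
        if clock() >= deadline:
            raise TimeoutError("recognition result not ready before timeout")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; webhooks, WebSockets, or a NATS queue avoid the repeated requests entirely by pushing results to the client.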

Auxiliary API methods, e.g., for: (a) language model construction, (b) WebSocket management.

WEB PORTAL

A Cloud Web Portal is provided to facilitate the following tasks:

status dashboard
results / report browsing
ASR configuration
Language Model construction
transcription output queue management
management of OnPrem deployments
IVR tuning & regression tools
IVR prompt management
user management
API documentation and support
billing

The local, OnPrem Web Portal additionally enables training of custom acoustic DNN models.

IVR

MRCP IVR

The OnPrem install provides a drop-in replacement for most ASR engines that support the MRCP interface, for example Nuance 10 Recognizer.

It has been tested for compatibility with Dialogic and Aspect VXML engines.

Supports SRGS/GRXML grammars; many built-in dynamic grammars are available.
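GRXML is the XML form of the W3C Speech Recognition Grammar Specification (SRGS). The sketch below uses Python's standard `xml.etree.ElementTree` to read a small yes/no grammar and list the phrases its root rule accepts. It only handles a flat `<one-of>` of plain `<item>` elements, so it illustrates the grammar format; it is not a general SRGS parser and not how the VoiceGain recognizer processes grammars.

```python
import xml.etree.ElementTree as ET

SRGS_NS = {"srgs": "http://www.w3.org/2001/06/grammar"}

# A minimal SRGS grammar in XML form: the root rule accepts one of four words.
YES_NO_GRXML = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="yesno">
  <rule id="yesno" scope="public">
    <one-of>
      <item>yes</item>
      <item>yeah</item>
      <item>no</item>
      <item>nope</item>
    </one-of>
  </rule>
</grammar>"""


def root_rule_phrases(grxml: str) -> list:
    """Return the normalized alternatives of the root rule's <one-of> (flat grammars only)."""
    grammar = ET.fromstring(grxml)
    root_id = grammar.get("root")
    rule = grammar.find(f"srgs:rule[@id='{root_id}']", SRGS_NS)
    if rule is None:
        raise ValueError(f"root rule {root_id!r} not found")
    items = rule.findall("srgs:one-of/srgs:item", SRGS_NS)
    return [" ".join("".join(item.itertext()).lower().split()) for item in items]


def accepts(grxml: str, utterance: str) -> bool:
    """True if the normalized utterance is one of the root rule's alternatives."""
    return " ".join(utterance.lower().split()) in root_rule_phrases(grxml)
```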

A basic TTS engine is included, along with an advanced audio prompt server with automatic concatenation.

Comes with powerful web-based tuning and regression tools.

ASR is priced per recognizer time, not per port, so there is no need to pay when it is not in use.

SIP IVR

For customers who are not tied to the MRCP interface or to VXML portals, we offer a simpler, modern IVR.

In this mode the IVR connects directly to a PBX or an SBC via a SIP/RTP endpoint.

The call flow can be controlled via webhooks, for example from a Node.js application.
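In a webhook-driven IVR, the IVR posts each event (for example, a recognition result) to the customer's HTTP endpoint, and the response says what to do next. The page mentions Node.js; to keep the examples in one language, this minimal sketch uses Python's standard `http.server`. The event and response fields (`event`, `text`, `prompt`, `grammar`, `hangup`) and the event names are hypothetical, not VoiceGain's documented webhook schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def next_step(event):
    """Decide the next IVR action from a (hypothetical) webhook event."""
    kind = event.get("event")
    if kind == "call_started":
        return {"prompt": "Would you like to speak to an agent? Say yes or no.",
                "grammar": "yesno"}
    if kind == "recognized":
        answer = event.get("text", "").strip().lower()
        if answer in ("yes", "yeah"):
            return {"prompt": "Transferring you to an agent.", "hangup": True}
        if answer in ("no", "nope"):
            return {"prompt": "Goodbye.", "hangup": True}
    # No match, no input, or an unknown event: reprompt with the same grammar.
    return {"prompt": "Sorry, please say yes or no.", "grammar": "yesno"}


class CallFlowHandler(BaseHTTPRequestHandler):
    """Receives a JSON event per POST and replies with the next action as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(next_step(event)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Serving it locally could look like `HTTPServer(("", 8080), CallFlowHandler).serve_forever()`. Keeping the decision logic in a pure `next_step` function separates the call flow from the transport, so the same logic could sit behind any web framework.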

Supports standard GRXML as well as our easy and compact JJSGF grammar format.

The same tools as for the MRCP IVR also apply to this IVR flavor.

It is also priced per recognizer time, not per port, so there is no need to pay when it is not in use.

ROADMAP

FEATURES COMING SOON

Nvidia EGX compatibility

Custom Acoustic Model training in the Cloud

NextGen Analytics engine - ASR engine with integrated real-time speech analytics capabilities.

Keyword Wake-Up mode - do not waste ASR time/resources while waiting for a command.

Advanced DNN-based TTS - more natural sounding voices.

Spanish Language Support - ASR and TTS

IVR Tasks - define your own or use those from the VoiceGain Task Library

Advanced USPS address recognition - high accuracy through integration into LM.

On-Device ASR reference implementation - does not require a server or the Cloud for recognition
