Speech Recognition API

Machine learning transcription of speech to text from audio and video files

State-of-the-art audio processing using deep learning

HPE Haven OnDemand uses deep learning with artificial neural networks to deliver state-of-the-art speech-to-text audio processing. Speech to text is the process of transcribing spoken words into text. It is used to analyze, search, and process audio content in contexts such as command-and-control systems, dictation software, audio and video search, and subtitling.

Designed for rapid integration into any app

curl -X POST http://api.havenondemand.com/1/api/async/recognizespeech/v1 --form "file=@hpnext.mp4"
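The curl call above can be mirrored from application code. The following is a minimal Python sketch that uses only the standard library. It builds the same multipart POST request but does not send it. The helper name build_recognize_request and the extra_fields parameter are illustrative assumptions, not part of the Haven OnDemand API. Any additional form fields the service may require, and the shape of the response, are not shown on this page.

```python
import uuid
import urllib.request

# Endpoint taken from the curl example on this page.
ENDPOINT = "http://api.havenondemand.com/1/api/async/recognizespeech/v1"


def build_recognize_request(filename, data, extra_fields=None):
    """Build (but do not send) a multipart/form-data POST for a media file.

    `extra_fields` is a hypothetical hook for any other form fields the
    service might expect; none are documented on this page.
    """
    boundary = uuid.uuid4().hex
    parts = []

    # Plain text form fields, if any were supplied.
    for name, value in (extra_fields or {}).items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )

    # The media file itself, sent under the form field name "file".
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + data
        + b"\r\n"
    )

    # Closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    request = urllib.request.Request(ENDPOINT, data=b"".join(parts), method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return request


# Usage (not executed here, since it needs the file and network access):
#   with open("hpnext.mp4", "rb") as media:
#       request = build_recognize_request("hpnext.mp4", media.read())
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the sketch easy to inspect and test without contacting the service.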


Supports transcript alignment 

Assign time codes to all words in the transcript

Transcript alignment assigns time codes to every word in an audio transcript, even if the transcript contains noise or missing sections. It is used in systems that generate subtitles automatically from manual transcripts, or that let users jump to the position in the audio where a given word is spoken. The same time-coded output can also be used to check script adherence, for example to determine whether a call center operator is sticking to a pre-agreed script.
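To illustrate how time-coded words support these two uses, here is a minimal Python sketch. It assumes the aligned transcript has already been parsed into a list of words with start times in seconds. The AlignedWord structure and both function names are hypothetical and do not reflect the API's actual response format. The adherence score uses Python's difflib as a stand-in technique, not the method Haven OnDemand uses.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple


class AlignedWord(NamedTuple):
    """One transcript word with its time code (hypothetical structure)."""
    text: str
    start: float  # start time in seconds


def seek_positions(aligned: List[AlignedWord], word: str) -> List[float]:
    """Return every start time at which `word` is spoken (case-insensitive)."""
    target = word.lower()
    return [w.start for w in aligned if w.text.lower() == target]


def script_adherence(aligned: List[AlignedWord], script: str) -> float:
    """Fraction of script words that difflib matches, in order, in the spoken words."""
    spoken = [w.text.lower() for w in aligned]
    expected = script.lower().split()
    if not expected:
        return 1.0
    matcher = SequenceMatcher(None, expected, spoken, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(expected)


aligned = [
    AlignedWord("thank", 0.4),
    AlignedWord("you", 0.7),
    AlignedWord("for", 0.9),
    AlignedWord("calling", 1.1),
]

print(seek_positions(aligned, "for"))                            # [0.9]
print(script_adherence(aligned, "thank you for calling"))        # 1.0
print(script_adherence(aligned, "thank you for calling today"))  # 0.8
```

A player could pass the values from seek_positions to its seek control, and a quality-monitoring tool could flag calls whose adherence score falls below a chosen threshold.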



Understands foreign languages

Support for the most common international spoken languages

The Haven OnDemand Speech Recognition API can transcribe speech in more than 20 spoken languages and variants, including Arabic, Chinese (Mandarin), Dutch, English, Farsi (Persian), French, German, Italian, Japanese, Portuguese, Russian, and Spanish.




About HPE Haven OnDemand

HPE Haven OnDemand provides more than 70 REST APIs for rapid integration into enterprise, mobile, desktop, IoT, augmented reality, virtual reality, and web apps. Reimagine your world and accelerate development with applied machine learning from Haven OnDemand.
