The new Cloud Speech API adds support for more languages and word-level timestamps, and raises the maximum length of uploadable files.
The Google Cloud Speech API, which since its launch in 2016 has allowed developers to use Google services to transcribe spoken words to text, received a major update today.
The most interesting new feature of the API is support for 30 new languages, on top of the 89 the service already understood, including multiple regional variants of English, Spanish and Arabic. Among the new additions are Bengali, Latvian and Swahili. According to Google, the new languages built into the Cloud Speech API are spoken by about 1 billion people.
In addition, Google has introduced some major new features. Among them is support for word-level timestamps, reports TechCrunch. The idea is to label each word with its timestamp so that developers can, for example, easily let their users hear what a given word sounds like.
This is especially interesting for human-supervised transcription and translation services that use this API to speed up their workflows. “Having the ability to map audio to text with timestamps significantly reduces the time spent reviewing transcripts,” explains Happy Scribe co-founder André Bastie, whose company uses Cloud Speech for its interview transcription service.
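As a rough illustration of how word-level timestamps speed up transcript review, consider a result where each recognized word carries start and end offsets. The data shape and field names below are assumptions for this sketch, not the official client library's response format:

```python
# Illustrative sketch only: the per-word result shape below is an assumption
# modeled loosely on speech-to-text output, not Google's official client API.

def find_word_times(words, target):
    """Return (start, end) offsets in seconds for every occurrence of `target`."""
    return [
        (w["start"], w["end"])
        for w in words
        if w["word"].lower() == target.lower()
    ]

# Hypothetical transcription result with per-word timestamps.
words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.4, "end": 0.9},
    {"word": "hello", "start": 1.5, "end": 1.9},
]

print(find_word_times(words, "hello"))  # offsets for both occurrences
```

With a mapping like this, a reviewer can jump straight to the audio segment where a questionable word was spoken instead of scrubbing through the whole recording.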
Google has also increased the maximum length of files that developers can upload to the service, from 80 minutes in the previous version to up to 3 hours. Developers can also request a quota extension to upload even longer files. As before, developers get 60 minutes of free audio processing through the Speech API and pay $0.006 for every additional 15 seconds.
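The pricing above lends itself to a quick back-of-the-envelope estimate. The sketch below assumes a free tier of 60 minutes and billing in 15-second increments rounded up; the exact rounding and free-tier mechanics are assumptions for illustration:

```python
import math

# Assumptions for this sketch: 60 free minutes, then $0.006 per
# 15-second increment, with partial increments rounded up.
FREE_SECONDS = 60 * 60
INCREMENT_SECONDS = 15
PRICE_PER_INCREMENT = 0.006  # USD

def estimated_cost(total_seconds):
    """Estimate audio-processing cost in USD for `total_seconds` of audio."""
    billable = max(0, total_seconds - FREE_SECONDS)
    increments = math.ceil(billable / INCREMENT_SECONDS)
    return increments * PRICE_PER_INCREMENT

print(estimated_cost(3600))       # entirely within the free tier
print(estimated_cost(3600 + 60))  # one extra minute = 4 increments
```

So a developer processing one hour and one minute of audio would, under these assumptions, pay for four 15-second increments beyond the free hour.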