Move your credentials.json here and then export the credentials:

export GOOGLE_APPLICATION_CREDENTIALS="credentials.json"

Data Preparation

I’m a Suits fan, so I’ll use this video for the demonstration. I’ll download this video using the pytube3 module.

Before diving into transcribing the audio, let’s talk about the configuration required.

language_code - The language used in your video/audio. You can check all the supported languages in the Speech-to-Text documentation.
sample_rate_hertz - Sample rate of the video/audio, which we extracted using the pydub module.
encoding - The encoding of the audio. The Speech-to-Text API only supports specific audio encodings; you can find all the supported encodings in the Speech-to-Text documentation.
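The three configuration values can be gathered with a few lines of Python. This is only a minimal sketch, not the article's code: it reads the sample rate with the standard-library wave module instead of pydub so that it runs without extra installs, and the helper names read_sample_rate and build_recognition_config, the "en-US" default, and the LINEAR16 encoding are assumptions chosen for illustration.

```python
import wave


def read_sample_rate(wav_path):
    """Return the sample rate (in hertz) of a WAV file.

    Assumption: the audio was already extracted to WAV. The stdlib
    wave module stands in for pydub here.
    """
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getframerate()


def build_recognition_config(sample_rate_hertz, language_code="en-US",
                             encoding="LINEAR16"):
    """Collect the three settings discussed in the text in a plain dict.

    The keys mirror the configuration field names; the default values
    are illustrative assumptions, not values taken from the article.
    """
    return {
        "encoding": encoding,
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language_code,
    }
```

With the google-cloud-speech package installed, the same three values map onto the fields of speech.RecognitionConfig, where the encoding would be given as speech.RecognitionConfig.AudioEncoding.LINEAR16 rather than a string.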
Train your own ML model - This requires a lot of data, manual labor to annotate the data (irony), time, and frankly, a lot of money. This ain’t worth it for small-scale applications or organizations.
Use pre-trained APIs - There are now multiple pre-trained APIs that can do this job efficiently. As added benefits, they require less time to set up, are easy to learn, and are cost-efficient.

We will use one such API to generate subtitles: Google Cloud’s Speech-to-Text API.

You need to have Git, Python 3.7 and ffmpeg installed on your system.
You need to have a Google Cloud project with billing enabled. Follow Creating and managing projects to set this up.
You also need a service account with the right to use the Speech-to-Text API. Follow Creating and managing service accounts to set this up. Download the service account credentials as credentials.json.

Enable the Speech-to-Text API in your Google Cloud project. From the navigation bar, go to APIs & Services > Library > Cloud Speech-to-Text API and click Enable.

Now, run the commands below from your terminal:

cd Generate-SRT-File-using-Google-Cloud-s-Speech-to-Text-API
pip install -r requirements.txt
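Before going further, it can help to confirm that the prerequisites are actually in place. The following is a minimal sketch under a few assumptions: check_prerequisites is a hypothetical helper that is not part of the repository, "Python 3.7" is read as "3.7 or newer", and the credentials are expected to be referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable.

```python
import os
import shutil
import sys


def check_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    problems = []

    # Assumption: any interpreter from 3.7 upwards is acceptable.
    if sys.version_info < (3, 7):
        problems.append("Python 3.7 or newer is required")

    # Git and ffmpeg must be discoverable on PATH.
    for tool in ("git", "ffmpeg"):
        if shutil.which(tool) is None:
            problems.append(tool + " was not found on PATH")

    # The service account key is located through this environment variable.
    cred_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not cred_path:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not os.path.isfile(cred_path):
        problems.append("credentials file not found: " + cred_path)

    return problems
```

Calling check_prerequisites() with no arguments inspects the current environment; passing a dictionary makes it possible to test the credential check in isolation.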