Audio/Video to Text Extractor with Google Cloud Speech-to-Text

This Python script transcribes speech from audio or video files with Google Cloud Speech-to-Text and saves the text to a file. It handles both short and long recordings, and automatically uploads long audio files to Google Cloud Storage (GCS) before transcription.
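The short/long split described above could be decided along these lines. This is only a sketch, not the script's actual logic: the function names are hypothetical, and the 60-second threshold assumes the roughly one-minute limit that Google documents for synchronous recognition, which should be checked against the current quotas.

```python
import wave

# Assumption: synchronous recognition accepts roughly one minute of audio.
# Verify this against the current Speech-to-Text quotas before relying on it.
SYNC_LIMIT_SECONDS = 60.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def choose_mode(duration_seconds: float) -> str:
    """Pick a (hypothetical) mode label: 'sync' for short audio, else 'long_running'."""
    return "sync" if duration_seconds <= SYNC_LIMIT_SECONDS else "long_running"


def gcs_uri(bucket: str, blob_name: str) -> str:
    """Build the gs:// URI that a long-running request would reference."""
    return f"gs://{bucket}/{blob_name}"
```

In this sketch, a file that comes back as `long_running` would be uploaded to the bucket first and then referenced through its `gs://` URI.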

Features

Extracts speech from both audio and video files (MP4, MKV, MOV, AVI).
Converts audio to the required format (16 kHz, 16-bit mono PCM WAV).
Automatically uploads long audio files to Google Cloud Storage (GCS) for transcription.
Transcribes speech with Google Cloud Speech-to-Text.
Provides a command-line interface (CLI) for processing audio and video files.
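The conversion to 16 kHz, 16-bit mono PCM WAV can be illustrated with a small FFmpeg wrapper. This is a minimal sketch under assumptions: the function names are hypothetical and the script's real invocation may differ, but the FFmpeg options shown are standard ones.

```python
import subprocess


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an FFmpeg command that writes 16 kHz, 16-bit mono PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the destination if it exists
        "-i", src,            # input audio or video file
        "-vn",                # drop any video stream
        "-ac", "1",           # one audio channel (mono)
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # 16-bit signed little-endian PCM
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run FFmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes the options easy to inspect without FFmpeg installed.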

Requirements

Python 3.12+
FFmpeg (for audio extraction and conversion)
Google Cloud Speech-to-Text API and Google Cloud Storage
Google Cloud Credentials (Service Account JSON)

Google Cloud Setup

Enable the Google Cloud Speech-to-Text API in the Google Cloud Console.
Create a Google Cloud Storage bucket to upload audio files for transcription.
Set up a service account with appropriate permissions and download the service account JSON key file.

Set the environment variable for Google Cloud credentials:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/credentials.json"
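A quick pre-flight check can catch a missing or mistyped credentials path before any API call is made. The helper below is a hypothetical sketch and not part of the script; it only inspects the environment variable and does not validate the key itself.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def credentials_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the service account key path, or raise if it is unusable."""
    env = os.environ if env is None else env
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise RuntimeError(f"Credentials file not found: {path}")
    return path
```

Calling `credentials_path()` with no arguments reads the real environment; passing a dictionary makes the check easy to exercise in tests.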

Installation

First, ensure FFmpeg is installed:

On Linux (Debian/Ubuntu):
```
sudo apt-get install ffmpeg
```
On macOS:
```
brew install ffmpeg
```
On Windows, download a build from the FFmpeg website and add it to your PATH.

Install the project dependencies with Poetry:

poetry install

Usage

You can run the script to extract speech from an audio or video file and save the transcribed text to a file.

Command-Line Interface (CLI) Example:

poetry run audi input_file/video_file.mpg -o output_text.txt --bucket your_gcs_bucket

input_file/video_file.mpg: The path to the audio or video file you want to transcribe.
-o output_text.txt: The output file where the transcribed text will be saved.
--bucket your_gcs_bucket: The Google Cloud Storage bucket where the audio file will be uploaded for transcription.
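The arguments above could be parsed along these lines. This is a sketch with `argparse`, not the script's actual parser: the long option `--output`, the default output name, and treating `--bucket` as optional are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the CLI example (option details are assumed)."""
    parser = argparse.ArgumentParser(
        prog="audi",
        description="Transcribe an audio or video file to text.",
    )
    parser.add_argument("input_file", help="path to the audio or video file")
    parser.add_argument(
        "-o", "--output",           # "--output" is an assumed long form
        default="output_text.txt",  # assumed default
        help="file where the transcribed text is saved",
    )
    parser.add_argument(
        "--bucket",
        default=None,               # assumed optional
        help="GCS bucket used for uploading the audio file",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input_file, args.output, args.bucket)
```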

License

This project is licensed under the MIT License.
