Gemini AI Audio Transcription Python Module


Table of Contents

Overview
Functions

1. load_environment()
2. configure_google_api()
3. transcribe_audio(audio_file_path)

Usage
Dependencies
Logging

Overview

The gemini_audio_text.py module transcribes audio files using Google's Gemini API (the code shown in this document requests the gemini-1.5-flash model). It loads environment variables, configures the Google API, and uploads the audio file for transcription.

Functions

1. `load_environment()`

Description: Loads environment variables from a .env file.

```python
from dotenv import load_dotenv
from loguru import logger

def load_environment():
    load_dotenv()
    logger.info("Environment variables loaded successfully.")
```
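To make the effect of loading a .env file concrete, here is a minimal sketch of the idea using only the standard library. It is not python-dotenv's implementation: `load_env_file` is a hypothetical, simplified stand-in that handles only plain `KEY=VALUE` lines, and `DEMO_GEMINI_API_KEY` is a placeholder name chosen so the demo does not touch a real key. Like `load_dotenv()` with its default settings, it leaves variables that are already set untouched.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path, override=False):
    """Hypothetical, simplified stand-in for python-dotenv's load_dotenv()."""
    loaded = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without a KEY=VALUE shape.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes
        # Keep variables that are already set unless override is requested.
        if override or key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded


# Demonstration with a throwaway file and a placeholder variable name.
os.environ.pop("DEMO_GEMINI_API_KEY", None)
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# demo\nDEMO_GEMINI_API_KEY='placeholder-key'\n", encoding="utf-8")
    demo_loaded = load_env_file(env_path)
```

After this runs, `os.getenv("DEMO_GEMINI_API_KEY")` returns the value from the file, which is the same mechanism `configure_google_api()` relies on for `GEMINI_API_KEY`.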

2. `configure_google_api()`

Description: Configures the Google Gemini API for audio transcription.

Raises: ValueError if the GEMINI_API_KEY environment variable is not set.

```python
import os

import google.generativeai as genai
from loguru import logger

def configure_google_api():
    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        error_message = "Google API key not found. Please set the GEMINI_API_KEY environment variable."
        logger.error(error_message)
        raise ValueError(error_message)

    genai.configure(api_key=api_key)
    logger.info("Google Gemini API configured successfully.")
```
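The fail-fast check on the API key can be tried in isolation, without calling the Gemini API. The sketch below is an illustration only: `require_env` is a hypothetical helper, not part of the module, and `DEMO_API_KEY` is a placeholder variable name. It mirrors the `if not api_key` test, so an empty string counts as missing too.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise ValueError.

    Hypothetical helper that mirrors the check in configure_google_api().
    """
    value = os.getenv(name)
    if not value:  # covers both an unset variable and an empty string
        raise ValueError(f"{name} is not set. Add it to your .env file or shell environment.")
    return value


# Simulate a missing key first, then a present one.
os.environ.pop("DEMO_API_KEY", None)
try:
    require_env("DEMO_API_KEY")
    missing_key_detected = False
except ValueError:
    missing_key_detected = True

os.environ["DEMO_API_KEY"] = "placeholder-key"
present_value = require_env("DEMO_API_KEY")
```

Raising early with a clear message keeps a missing key from surfacing later as a less obvious authentication error.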

3. `transcribe_audio(audio_file_path)`

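Description: Transcribes audio using Google's Gemini API (the code requests the gemini-1.5-flash model).

Args:
- audio_file_path (str): The path to the audio file to be transcribed.

Returns:
- str: The transcribed text from the audio. Returns None if transcription fails.

Raises:
- FileNotFoundError is raised internally if the audio file is not found. As the function is written, the outer except Exception block catches it, logs it, and returns None, so callers should check for None rather than rely on the exception.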

```python
# Assumes os, genai (google.generativeai), and logger (loguru) are imported at module level.
def transcribe_audio(audio_file_path):
    try:
        load_environment()
        configure_google_api()

        logger.info(f"Attempting to transcribe audio file: {audio_file_path}")

        # Fail early if the path does not exist on disk.
        if not os.path.exists(audio_file_path):
            error_message = f"FileNotFoundError: The audio file at {audio_file_path} does not exist."
            logger.error(error_message)
            raise FileNotFoundError(error_message)

        model = genai.GenerativeModel(model_name="gemini-1.5-flash")

        # Upload the audio so it can be referenced in the prompt.
        try:
            audio_file = genai.upload_file(audio_file_path)
            logger.info(f"Audio file uploaded successfully: {audio_file=}")
        except FileNotFoundError:
            error_message = f"FileNotFoundError: The audio file at {audio_file_path} does not exist."
            logger.error(error_message)
            raise FileNotFoundError(error_message)
        except Exception as e:
            logger.error(f"Error uploading audio file: {e}")
            return None

        # Ask the model for a transcript of the uploaded file.
        try:
            response = model.generate_content([
                "Transcribe the following audio:",
                audio_file
            ])

            if response and hasattr(response, 'text'):
                transcript = response.text
                logger.info(f"Transcription successful:\n{transcript}")
                return transcript
            else:
                logger.warning("Transcription failed: Invalid or empty response from API.")
                return None

        except Exception as e:
            logger.error(f"Error during transcription: {e}")
            return None

    # Catch-all: this also catches the FileNotFoundError and ValueError raised above.
    except Exception as e:
        logger.error(f"An unexpected error occurred: {e}")
        return None
```
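One consequence of the outer try/except in `transcribe_audio` is that the `FileNotFoundError` raised inside it does not reach the caller. The stand-alone sketch below reproduces that control flow with two hypothetical functions, `check_file` and `check_file_strict`, which are not part of the module. The second shows one possible way to let the exception propagate, should that be the intended behavior.

```python
import os


def check_file(path):
    """Hypothetical stand-in with the same try/except shape as transcribe_audio()."""
    try:
        if not os.path.exists(path):
            raise FileNotFoundError(f"The audio file at {path} does not exist.")
        return "ok"
    except Exception:
        # FileNotFoundError is a subclass of Exception, so it is caught here.
        return None


def check_file_strict(path):
    """Variant that lets FileNotFoundError reach the caller."""
    try:
        if not os.path.exists(path):
            raise FileNotFoundError(f"The audio file at {path} does not exist.")
        return "ok"
    except FileNotFoundError:
        raise  # re-raise before the catch-all can swallow it
    except Exception:
        return None


swallowed = check_file("no/such/file.wav")

try:
    check_file_strict("no/such/file.wav")
    propagated = False
except FileNotFoundError:
    propagated = True
```

Here `swallowed` ends up as None while `propagated` is True, which is why callers of the function as written should test the return value for None.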

Usage

Ensure you have a .env file that defines the following environment variable:
- GEMINI_API_KEY: Your Google API key.

Call the transcribe_audio function with the path to your audio file. It returns the transcript as a string, or None if transcription fails:
```
transcript = transcribe_audio("path/to/your/audio/file.wav")
```
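Because failures are reported as None rather than as exceptions, calling code typically needs to check each result. The following is a minimal sketch of one way to do that for several files. `transcribe_many` and `fake_transcribe` are hypothetical names introduced for illustration; the transcription function is passed in as a parameter so the example runs without an API key or network access.

```python
def transcribe_many(paths, transcribe):
    """Run `transcribe` on each path and separate successes from failures.

    `transcribe` is expected to behave like transcribe_audio(): return a
    string on success and None on failure. Hypothetical helper.
    """
    transcripts, failed = {}, []
    for path in paths:
        text = transcribe(path)
        if text is None:
            failed.append(path)
        else:
            transcripts[path] = text
    return transcripts, failed


# Stub used only so the example runs offline; it is not the real function.
def fake_transcribe(path):
    return None if path.endswith("missing.wav") else f"transcript of {path}"


transcripts, failed = transcribe_many(["a.wav", "missing.wav"], fake_transcribe)
```

In real use you would pass `transcribe_audio` in place of the stub and decide how to handle the paths collected in `failed`, for example by retrying or reporting them.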

Dependencies

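- os (standard library)
- sys (standard library)
- google.generativeai (PyPI package: google-generativeai)
- dotenv (PyPI package: python-dotenv)
- loguru (PyPI package: loguru)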
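If you want to confirm that the third-party dependencies are installed before importing the module, a small standard-library check along these lines may help. `is_available` is a hypothetical helper, and the list of names simply repeats the import names given above.

```python
import importlib.util


def is_available(module_name):
    """Return True if `module_name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For dotted names, find_spec imports the parent package first, so a
        # missing parent (e.g. `google`) raises instead of returning None.
        return False


required = ["google.generativeai", "dotenv", "loguru"]
missing = [name for name in required if not is_available(name)]
```

An empty `missing` list suggests the imports at the top of the module should succeed; any names it contains point to packages that still need installing.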

Logging

The module uses the loguru library to log colorized, formatted messages to the console.