ai_audio_translate
No security coverage
AI Audio Translator translates audio and video media files into selected languages using AI-powered speech recognition, translation, and voice synthesis. Upload an audio file or video, select a target language, and get a fully translated version.
AI Disclosure: This module was created with the help of AI agents.
Translated videos are not perfect; for example, the translated audio might not sync properly with the video.
Features
- Audio Translation Pipeline: Transcribes audio → translates text → generates speech in target language
- Video Translation with Timing: Preserves the original video and replaces its audio track with the translated audio, aiming to maintain lip-sync timing through segment-by-segment processing
- Flexible AI Provider Support: Works with any AI module provider (currently tested only with Gemini)
- One-Click Translation: "Translate" button appears directly on media entity operations
- Format Support: Audio (MP3, WAV, OGG, FLAC, M4A) and Video (MP4, WebM, MOV, AVI)
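The three-stage audio pipeline listed above can be illustrated with a short Python sketch. This is not the module's implementation (the module runs inside Drupal); the stage signatures and the stub functions below are hypothetical and exist only to show how transcription, translation, and speech synthesis chain together.

```python
from typing import Callable

# Hypothetical stage signatures; a real setup would call an AI provider here.
Transcribe = Callable[[bytes], str]
Translate = Callable[[str, str], str]
Synthesize = Callable[[str, str], bytes]


def translate_audio(audio: bytes, target_language: str,
                    transcribe: Transcribe, translate: Translate,
                    synthesize: Synthesize) -> bytes:
    """Run the three stages in order and return the synthesized audio."""
    text = transcribe(audio)                         # speech-to-text
    translated = translate(text, target_language)    # text translation
    return synthesize(translated, target_language)   # text-to-speech


# Stub stages so the sketch runs without any AI provider.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, language: str) -> str:
    return f"[{language}] {text}"


def fake_synthesize(text: str, language: str) -> bytes:
    return text.encode("utf-8")


result = translate_audio(b"hello", "Spanish",
                         fake_transcribe, fake_translate, fake_synthesize)
print(result)
```

Passing the stages in as callables mirrors the idea that each operation can be served by a different provider.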
Post-Installation
- Configure AI Providers: Go to /admin/config/ai/providers and set up at least one provider supporting Speech-to-Text, Chat (for translation), and Text-to-Speech
- Create Language Vocabulary: Create a taxonomy vocabulary (e.g., "Languages") with terms like "Spanish", "Hindi", "Sanskrit"
- Module Settings: Visit /admin/config/ai/audio-translator and:
  - Select your language vocabulary
  - Optionally customize the translation prompt
  - Optionally override AI providers for specific operations
  - Configure FFmpeg path if not auto-detected (video translation only)
- Translate Media: Go to any audio/video media entity, click "Translate" in operations, select language, and submit
Configuration Tips:
- For video translation, ensure FFmpeg is installed on your server (see Status Report)
- Test with short audio files first to verify AI provider configuration
- Monitor the queue at /admin/config/ai/audio-translator
- Check watchdog logs for translation errors: /admin/reports/dblog
Additional Requirements
Required Drupal Modules:
- AI (^1.0): Provides AI provider management and operation types
- Media (core): For media entity management
- Taxonomy (core): For language vocabulary
Required AI Provider Capabilities:
- Speech-to-Text (transcription)
- Chat or Text Generation (translation)
- Text-to-Speech (voice synthesis)
System Requirements:
- PHP 8.1 or higher
- Drupal 10.4+ or 11.x
- FFmpeg (required for video translation only; audio works without it)
  - Ubuntu/Debian: sudo apt-get install ffmpeg
  - DDEV: Add webimage_extra_packages: [ffmpeg] to .ddev/config.yaml
  - Verify: ffmpeg -version
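To check FFmpeg availability from a script before attempting video translation, a sketch like the following may help. It is an illustration only: the function name `find_ffmpeg` is hypothetical, and how the module itself auto-detects FFmpeg is not described here. The sketch resolves a configured path (or `ffmpeg` on the `PATH`) and confirms the binary responds to `-version`.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(configured_path: Optional[str] = None) -> Optional[str]:
    """Return a usable FFmpeg path, or None if none is found."""
    # Prefer an explicitly configured path, else look up "ffmpeg" on PATH.
    candidate = configured_path or "ffmpeg"
    resolved = shutil.which(candidate)
    if resolved is None:
        return None
    try:
        # Same check as running `ffmpeg -version` by hand.
        subprocess.run([resolved, "-version"], check=True,
                       capture_output=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return None
    return resolved


path = find_ffmpeg()
print(path if path else "FFmpeg not found; video translation would be unavailable")
```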
Technical Details
Video Translation Pipeline:
- Extract audio track from video using FFmpeg
- Detect speech segments via silence analysis
- Transcribe each segment with AI speech-to-text
- Translate segments with duration constraints to maintain timing
- Generate TTS audio for each translated segment
- Time-stretch segments to match original duration (prevents desync)
- Rebuild audio timeline with proper timing
- Mux translated audio back into original video
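Two of the steps above, segment detection and time-stretching, can be sketched in Python. This is a sketch under assumptions, not the module's code: it assumes silence is found with FFmpeg's `silencedetect` filter (whose log lines contain `silence_start:` and `silence_end:`) and that stretching uses FFmpeg's `atempo` filter. The function names and the sample log are hypothetical.

```python
import re
from typing import List, Tuple

SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def speech_segments(log: str, total_duration: float) -> List[Tuple[float, float]]:
    """Invert silencedetect's silent intervals into (start, end) speech spans."""
    segments = []
    cursor = 0.0        # where the current speech span began
    in_silence = False
    for line in log.splitlines():
        start = SILENCE_START.search(line)
        end = SILENCE_END.search(line)
        if start:
            # silencedetect can report slightly negative starts; clamp to 0.
            silence_at = max(0.0, float(start.group(1)))
            if silence_at > cursor:
                segments.append((cursor, silence_at))
            in_silence = True
        elif end:
            cursor = float(end.group(1))
            in_silence = False
    if not in_silence and total_duration > cursor:
        segments.append((cursor, total_duration))
    return segments


def atempo_chain(tts_duration: float, target_duration: float) -> str:
    """Build an atempo filter string that fits TTS audio into the target length."""
    if tts_duration <= 0 or target_duration <= 0:
        raise ValueError("durations must be positive")
    factor = tts_duration / target_duration  # >1 speeds up, <1 slows down
    parts = []
    # Keep each stage within 0.5..2.0 (a conservative range) by chaining.
    while factor > 2.0:
        parts.append(2.0)
        factor /= 2.0
    while factor < 0.5:
        parts.append(0.5)
        factor /= 0.5
    parts.append(factor)
    return ",".join(f"atempo={p:.4f}" for p in parts)


sample_log = """\
[silencedetect @ 0x1] silence_start: 2.5
[silencedetect @ 0x1] silence_end: 3.5 | silence_duration: 1.0
[silencedetect @ 0x1] silence_start: 7.0
[silencedetect @ 0x1] silence_end: 8.0 | silence_duration: 1.0
"""
print(speech_segments(sample_log, total_duration=10.0))
print(atempo_chain(tts_duration=6.0, target_duration=4.0))
```

In this sketch a 6-second TTS clip that must fit a 4-second slot is sped up by a factor of 1.5. Large factors degrade speech quality, which is one reason translating with duration constraints first is useful.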
File Size Limits:
- Audio: 25 MB maximum
- Video: 500 MB maximum
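A pre-upload check combining the supported formats and these limits could look like the sketch below. It is illustrative only: the names `media_kind` and `check_upload` are hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the source does not say how the module counts megabytes.

```python
import os
from typing import Optional, Tuple

AUDIO_EXTENSIONS = {"mp3", "wav", "ogg", "flac", "m4a"}
VIDEO_EXTENSIONS = {"mp4", "webm", "mov", "avi"}
MB = 1024 * 1024  # assumption: binary megabytes
MAX_BYTES = {"audio": 25 * MB, "video": 500 * MB}


def media_kind(filename: str) -> Optional[str]:
    """Classify a file as 'audio' or 'video' by extension, else None."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None


def check_upload(filename: str, size_bytes: int) -> Tuple[bool, str]:
    """Return (ok, message) for a candidate upload."""
    kind = media_kind(filename)
    if kind is None:
        return False, "unsupported format"
    if size_bytes > MAX_BYTES[kind]:
        return False, f"{kind} file exceeds {MAX_BYTES[kind] // MB} MB"
    return True, f"{kind} file accepted"


print(check_upload("talk.MP3", 10 * MB))
print(check_upload("clip.mp4", 600 * MB))
```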