Voice Notes and Audio Transcription

Overview

The Voice Notes feature lets users record a voice message or upload an existing audio file. The audio is transcribed, and the transcription is sent as the message text with the original audio file attached. This offers a convenient alternative to typing, especially on mobile devices or when multitasking, and recipients can still play back the original recording.

Architecture

VoiceNoteRecorder: A React component that handles recording, uploading, and transcribing voice notes
Audio Upload API: Stores the audio file in Supabase storage
Audio Transcription API: Uses OpenAI's Whisper model to transcribe speech to text
MessageFileItem: Displays audio files with playback controls in messages
ChatInput: Integrates the voice note recorder and handles sending messages with audio attachments
Audio File Handler: Processes uploaded audio files for transcription

Implementation Details

VoiceNoteRecorder Component

The VoiceNoteRecorder component uses the Web Audio API to record audio from the user's microphone. It provides:

Start, stop, and cancel recording controls
Visualization of audio input
Recording duration tracking
Upload and transcription functionality
Audio preview before sending
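
As an illustration of the duration tracking item above, the following is a minimal sketch rather than the component's actual code. The names formatDuration, hasReachedLimit, and MAX_RECORDING_MS are hypothetical, and the 60-second cap is an assumption based on the "~1 minute" limit mentioned under Future Enhancements.

```typescript
// Hypothetical cap, taken from the "~1 minute" figure in Future Enhancements.
const MAX_RECORDING_MS = 60000;

/** Formats elapsed milliseconds as m:ss for display next to the controls. */
function formatDuration(elapsedMs: number): string {
  const totalSeconds = Math.floor(Math.max(0, elapsedMs) / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds < 10 ? "0" : ""}${seconds}`;
}

/** Returns true once the recording has run for the full allowed time. */
function hasReachedLimit(startedAtMs: number, nowMs: number): boolean {
  return nowMs - startedAtMs >= MAX_RECORDING_MS;
}
```

A component could call formatDuration on each timer tick and stop the recording once hasReachedLimit returns true.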

Audio File Transcription

In addition to recording voice notes directly, the system can transcribe existing audio files:

Users can upload audio files through the regular file attachment interface
The system detects audio file types and offers to transcribe them
If the user accepts, the audio goes through the same transcription pipeline as voice notes
Both the transcription and the audio file are attached to the message
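
The audio detection step could look roughly like the sketch below. It is not taken from the project: FileLike, AUDIO_EXTENSIONS, and isAudioFile are hypothetical names, and the extension list is an assumption limited to the formats this document mentions. The extension fallback is there because some browsers report recorded WebM audio with a video/webm type, or with no type at all.

```typescript
// Minimal shape shared by browser File objects; hypothetical helper type.
interface FileLike {
  name: string;
  type: string; // MIME type, possibly empty
}

// Assumed extension list, limited to the formats named in this document.
const AUDIO_EXTENSIONS = ["webm", "mp3", "wav"];

/** Returns true when the MIME type or the file extension indicates audio. */
function isAudioFile(file: FileLike): boolean {
  if (file.type.toLowerCase().startsWith("audio/")) {
    return true;
  }
  const dot = file.name.lastIndexOf(".");
  if (dot < 0) {
    return false;
  }
  const extension = file.name.slice(dot + 1).toLowerCase();
  return AUDIO_EXTENSIONS.includes(extension);
}
```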

Audio Upload API

Accepts audio files (WebM, MP3, WAV, etc.)
Stores them in a dedicated voice_notes bucket in Supabase storage
Returns the URL and path for accessing the stored file
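
The upload step can be sketched as follows, under stated assumptions. StorageBucket is a hypothetical structural interface loosely modelled on the upload and getPublicUrl methods of a Supabase storage bucket client; the path layout, the MIME-to-extension table, and every function name are illustrative and not the project's real code. In a real handler the bucket would presumably come from the Supabase client for the voice_notes bucket.

```typescript
// Hypothetical structural subset of a storage bucket client.
interface StorageBucket {
  upload(
    path: string,
    body: Uint8Array,
    options: { contentType: string }
  ): Promise<{ error: { message: string } | null }>;
  getPublicUrl(path: string): { data: { publicUrl: string } };
}

interface UploadResult {
  url: string;
  path: string;
}

// Assumed mapping from MIME type to file extension.
const EXTENSION_BY_MIME = new Map<string, string>([
  ["audio/webm", "webm"],
  ["audio/mpeg", "mp3"],
  ["audio/wav", "wav"],
]);

/** Builds a per-user object path such as "user-1/1700000000000.webm". */
function buildVoiceNotePath(userId: string, timestampMs: number, mimeType: string): string {
  const extension = EXTENSION_BY_MIME.get(mimeType);
  if (extension === undefined) {
    throw new Error(`Unsupported audio type: ${mimeType}`);
  }
  return `${userId}/${timestampMs}.${extension}`;
}

/** Stores the audio and returns the URL and path of the stored file. */
async function uploadVoiceNote(
  bucket: StorageBucket,
  userId: string,
  audio: Uint8Array,
  mimeType: string,
  nowMs: number
): Promise<UploadResult> {
  const path = buildVoiceNotePath(userId, nowMs, mimeType);
  const { error } = await bucket.upload(path, audio, { contentType: mimeType });
  if (error !== null) {
    throw new Error(`Upload failed: ${error.message}`);
  }
  return { url: bucket.getPublicUrl(path).data.publicUrl, path };
}

// In-memory stand-in for the bucket, useful for trying the function out.
function createInMemoryBucket(baseUrl: string) {
  const files = new Map<string, Uint8Array>();
  const bucket: StorageBucket = {
    async upload(path, body) {
      files.set(path, body);
      return { error: null };
    },
    getPublicUrl(path) {
      return { data: { publicUrl: `${baseUrl}/${path}` } };
    },
  };
  return { bucket, files };
}
```

Passing the bucket in as a parameter keeps the path and error handling testable without a network connection.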

Audio Transcription API

Takes the URL of an uploaded audio file
Uses OpenAI's Whisper model to transcribe the speech
Returns the transcribed text
Preserves the audio file for playback
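
The contract described above (audio URL in, transcribed text out, audio left in place) might be sketched like this. The actual call to the Whisper model is deliberately hidden behind a hypothetical Transcriber function, since this document does not show how the project invokes it; handleTranscription and TranscriptionResponse are likewise illustrative names.

```typescript
// Hypothetical abstraction over the speech-to-text call. A real
// implementation would fetch the audio and send it to the Whisper model.
type Transcriber = (audioUrl: string) => Promise<string>;

interface TranscriptionResponse {
  text: string;
  audioUrl: string; // returned unchanged so the audio stays available for playback
}

/** Validates the request body, runs the transcriber, and returns the text. */
async function handleTranscription(
  body: { audioUrl?: unknown },
  transcribe: Transcriber
): Promise<TranscriptionResponse> {
  const audioUrl = body.audioUrl;
  if (typeof audioUrl !== "string" || audioUrl.length === 0) {
    throw new Error("audioUrl is required");
  }
  const text = (await transcribe(audioUrl)).trim();
  return { text, audioUrl };
}
```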

Chat Input Integration

The voice note functionality is integrated into the ChatInput component, which:

Toggles the voice note recorder interface
Handles the transcribed text and audio file URL
Sends both the transcription as message content and the audio file as an attachment
Detects and processes uploaded audio files for transcription
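
To make the "transcription as content, audio as attachment" step concrete, here is a small sketch. The OutgoingMessage and AudioAttachment shapes and the buildVoiceMessage helper are assumptions for illustration; the message format ChatInput really sends is not shown in this document.

```typescript
// Hypothetical message shapes; the project's real types may differ.
interface AudioAttachment {
  type: "audio";
  url: string;
  path: string;
}

interface OutgoingMessage {
  content: string;
  attachments: AudioAttachment[];
}

/** Combines the transcription and the uploaded audio into one message. */
function buildVoiceMessage(
  transcription: string,
  upload: { url: string; path: string }
): OutgoingMessage {
  return {
    content: transcription.trim(),
    attachments: [{ type: "audio", url: upload.url, path: upload.path }],
  };
}
```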

User Flow

Recording Voice Notes

User clicks the microphone icon in the chat input
The voice note recorder interface appears, and the user can start recording
User speaks their message, with visualization feedback showing voice input
User stops recording when finished
The audio is automatically uploaded and transcribed
The transcription is displayed for review
User can preview the recording before sending
When sent, the transcribed text appears as the message content with the audio file attached
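
The steps above can be read as a small state machine. The reducer below is one possible sketch of it, with hypothetical state and event names; the real component may organize its state differently.

```typescript
type RecorderState =
  | { status: "idle" }
  | { status: "recording" }
  | { status: "processing" } // uploading and transcribing
  | { status: "review"; text: string; audioUrl: string };

type RecorderEvent =
  | { type: "START" }
  | { type: "STOP" }
  | { type: "CANCEL" }
  | { type: "TRANSCRIBED"; text: string; audioUrl: string }
  | { type: "SEND" };

/** Returns the next state; events that do not apply leave the state unchanged. */
function recorderReducer(state: RecorderState, event: RecorderEvent): RecorderState {
  // Cancelling always returns to the idle state.
  if (event.type === "CANCEL") {
    return { status: "idle" };
  }
  switch (state.status) {
    case "idle":
      return event.type === "START" ? { status: "recording" } : state;
    case "recording":
      return event.type === "STOP" ? { status: "processing" } : state;
    case "processing":
      return event.type === "TRANSCRIBED"
        ? { status: "review", text: event.text, audioUrl: event.audioUrl }
        : state;
    case "review":
      return event.type === "SEND" ? { status: "idle" } : state;
  }
}
```

In a React component this function could be passed to useReducer, so that the interface only ever renders one of the four states.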

Uploading Audio Files

User clicks the attachment icon in the chat input
User selects an audio file from their device
System detects the audio file and asks if the user wants to transcribe it
If confirmed, the audio is uploaded and transcribed
When sent, the transcribed text appears as the message content with the audio file attached
Recipients can play back the original audio recording
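
The upload flow above could be orchestrated along these lines. This is a sketch with injected, hypothetical dependencies (confirmTranscription, upload, transcribe) rather than the project's implementation. It also assumes that a file is still uploaded and attached without a transcription when the user declines, which this document does not state.

```typescript
interface SelectedFile {
  name: string;
  type: string; // MIME type
}

// Hypothetical dependencies, injected so the flow can run without a UI or network.
interface FileFlowDeps {
  confirmTranscription(fileName: string): Promise<boolean>;
  upload(file: SelectedFile): Promise<string>; // resolves to the stored file URL
  transcribe(audioUrl: string): Promise<string>;
}

interface ProcessedFile {
  audioUrl: string;
  transcription: string | null; // null when no transcription was produced
}

/** Detects audio, asks the user, then uploads and optionally transcribes. */
async function processSelectedFile(
  file: SelectedFile,
  deps: FileFlowDeps
): Promise<ProcessedFile> {
  const isAudio = file.type.startsWith("audio/");
  // Only audio files trigger the confirmation prompt.
  const wantsTranscript = isAudio && (await deps.confirmTranscription(file.name));
  const audioUrl = await deps.upload(file);
  const transcription = wantsTranscript ? await deps.transcribe(audioUrl) : null;
  return { audioUrl, transcription };
}
```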

Future Enhancements

Support for longer recordings (currently limited to ~1 minute)
Enhanced audio processing (noise reduction, normalization)
Support for additional audio formats
Waveform visualization of the recording
Save/download options for voice recordings
Voice commands for controlling the AI
Multi-language transcription support
Batch processing of multiple audio files
