Voice Notes and Audio Transcription

Overview

The Voice Notes feature lets users record a voice message or upload an existing audio file. The audio is transcribed, and the transcription is sent as the message text with the original audio file attached. This offers a convenient alternative to typing, especially on mobile devices or when multitasking. Because the audio file is preserved, recipients can still play back the original recording.

Architecture

  • VoiceNoteRecorder: A React component that handles recording, uploading, and transcribing voice notes
  • Audio Upload API: Stores the audio file in Supabase storage
  • Audio Transcription API: Uses OpenAI's Whisper model to transcribe speech to text
  • MessageFileItem: Displays audio files with playback controls in messages
  • ChatInput: Integrates the voice note recorder and handles sending messages with audio attachments
  • Audio File Handler: Processes uploaded audio files for transcription

Implementation Details

VoiceNoteRecorder Component

The VoiceNoteRecorder component uses the Web Audio API to record audio from the user's microphone. It provides:

  • Start, stop, and cancel recording controls
  • Visualization of audio input
  • Recording duration tracking
  • Upload and transcription functionality
  • Audio preview before sending
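
As a rough illustration of the visualization and duration-tracking items above, the sketch below shows two pure helpers. It assumes the component can read time-domain samples in the range [-1, 1] (for example from an AnalyserNode via getFloatTimeDomainData). The names rmsLevel and formatDuration are hypothetical and are not taken from the actual component.

```typescript
// Hypothetical helpers: names and signatures are illustrative only.

/** Root-mean-square level of a block of samples in [-1, 1]; 0 means silence. */
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumOfSquares = samples.reduce((sum, sample) => sum + sample * sample, 0);
  return Math.sqrt(sumOfSquares / samples.length);
}

/** Formats elapsed seconds as m:ss for the recording timer. */
function formatDuration(elapsedSeconds: number): string {
  const whole = Math.max(0, Math.floor(elapsedSeconds));
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}
```

A level meter could map rmsLevel to a bar height on each animation frame, while formatDuration(65) yields "1:05" for the timer label.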

Audio File Transcription

Besides voice notes recorded in the app, the system can transcribe existing audio files:

  • Users can upload audio files through the regular file attachment interface
  • The system detects audio file types and offers to transcribe them
  • If selected, the audio is processed through the same transcription pipeline as voice notes
  • Both the transcription and the audio file are attached to the message
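
The detection step could look something like the following sketch. It is an assumption about how detection might work, not the project's actual logic: it checks the MIME type first and falls back to the file extension, since some browsers may report audio-only WebM files with a video/webm type. FileLike, FALLBACK_AUDIO_EXTENSIONS, and isAudioFile are hypothetical names.

```typescript
// Minimal shape of the browser File object that this sketch needs.
interface FileLike {
  name: string;
  type: string; // MIME type reported by the browser; may be empty
}

// Extensions taken from the formats this document names; extend as needed.
const FALLBACK_AUDIO_EXTENSIONS: readonly string[] = ["webm", "mp3", "wav"];

/** Returns true when the file looks like audio by MIME type or by extension. */
function isAudioFile(file: FileLike): boolean {
  if (file.type.toLowerCase().startsWith("audio/")) return true;

  // No usable MIME type: fall back to the extension after the last dot.
  const dot = file.name.lastIndexOf(".");
  if (dot < 0) return false;
  const extension = file.name.slice(dot + 1).toLowerCase();
  return FALLBACK_AUDIO_EXTENSIONS.includes(extension);
}
```

The extension fallback trades precision for coverage: a video file with a .webm extension would also be offered for transcription, which may or may not be acceptable.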

Audio Upload API

  • Accepts audio files (WebM, MP3, WAV, etc.)
  • Stores them in a dedicated voice_notes bucket in Supabase storage
  • Returns the URL and path for accessing the stored file
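
A minimal sketch of the upload step follows, under several assumptions: the storage client is passed in and only needs the upload and getPublicUrl calls (the StorageClient interface below mirrors the shape of a Supabase-style client but is declared locally), files are keyed by a hypothetical user ID plus a timestamp, and the bucket is public so a public URL can be returned. None of these details are confirmed by this document beyond the voice_notes bucket name.

```typescript
// Structural subset of a Supabase-style storage client (assumed shape).
interface StorageBucket {
  upload(
    path: string,
    body: Uint8Array,
    options?: { contentType?: string; upsert?: boolean },
  ): Promise<{ data: { path: string } | null; error: { message: string } | null }>;
  getPublicUrl(path: string): { data: { publicUrl: string } };
}
interface StorageClient {
  from(bucket: string): StorageBucket;
}

const VOICE_NOTES_BUCKET = "voice_notes";

// Maps a MIME type to a file extension; codec parameters are ignored.
function extensionFor(contentType: string): string {
  const [base = ""] = contentType.split(";");
  const known: Record<string, string> = {
    "audio/webm": "webm",
    "audio/mpeg": "mp3",
    "audio/wav": "wav",
  };
  return known[base.trim().toLowerCase()] ?? "bin";
}

// Hypothetical key layout: <userId>/<timestamp in ms>.<extension>
function buildVoiceNotePath(userId: string, contentType: string, now: Date): string {
  return `${userId}/${now.getTime()}.${extensionFor(contentType)}`;
}

/** Uploads the audio bytes and returns the URL and storage path. */
async function uploadVoiceNote(
  storage: StorageClient,
  userId: string,
  audio: Uint8Array,
  contentType: string,
  now: Date = new Date(),
): Promise<{ url: string; path: string }> {
  const bucket = storage.from(VOICE_NOTES_BUCKET);
  const path = buildVoiceNotePath(userId, contentType, now);
  const { data, error } = await bucket.upload(path, audio, { contentType, upsert: false });
  if (error || !data) {
    throw new Error(`Upload failed: ${error?.message ?? "no data returned"}`);
  }
  // Assumes a public bucket; a private bucket would need a signed URL instead.
  return { url: bucket.getPublicUrl(data.path).data.publicUrl, path: data.path };
}
```

Passing the storage client as a parameter keeps the function easy to exercise with an in-memory fake.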

Audio Transcription API

  • Takes the URL of an uploaded audio file
  • Uses OpenAI's Whisper model to transcribe the speech
  • Returns the transcribed text
  • Preserves the audio file for playback
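
The transcription step might be sketched as below. This is an illustration under assumptions rather than the project's route handler: it downloads the audio from the given URL, then posts it as multipart form data to OpenAI's audio transcriptions endpoint with the whisper-1 model and reads the text field of the JSON response. The fetch implementation is injected, and transcribeFromUrl, fileNameFromUrl, and FetchLike are hypothetical names.

```typescript
const OPENAI_TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions";

// Narrow view of fetch so a fake can be substituted in tests.
type FetchLike = (
  input: string,
  init?: { method?: string; headers?: Record<string, string>; body?: FormData },
) => Promise<{ ok: boolean; status: number; blob(): Promise<Blob>; json(): Promise<unknown> }>;

// Carries the file name over from the URL so its extension matches the audio format.
function fileNameFromUrl(audioUrl: string): string {
  const lastSegment = new URL(audioUrl).pathname.split("/").pop();
  return lastSegment ? decodeURIComponent(lastSegment) : "audio";
}

/** Downloads the stored audio and returns the transcribed text. */
async function transcribeFromUrl(
  audioUrl: string,
  apiKey: string,
  fetchImpl: FetchLike,
): Promise<string> {
  const audioResponse = await fetchImpl(audioUrl);
  if (!audioResponse.ok) {
    throw new Error(`Could not fetch audio (status ${audioResponse.status})`);
  }
  const audio = await audioResponse.blob();

  const form = new FormData();
  form.append("file", audio, fileNameFromUrl(audioUrl));
  form.append("model", "whisper-1");

  const response = await fetchImpl(OPENAI_TRANSCRIPTIONS_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Transcription request failed (status ${response.status})`);
  }
  const payload = (await response.json()) as { text?: unknown };
  if (typeof payload.text !== "string") {
    throw new Error("Transcription response did not contain text");
  }
  return payload.text;
}
```

In a server route, the global fetch could be passed as fetchImpl. The stored file is only read here, never modified or deleted, which matches the requirement that the audio stays available for playback.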

Chat Input Integration

The voice note functionality is integrated into the ChatInput component, which:

  • Toggles the voice note recorder interface
  • Handles the transcribed text and audio file URL
  • Sends the transcription as the message content and the audio file as an attachment
  • Detects and processes uploaded audio files for transcription
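
The resulting message can be pictured with the sketch below. The AudioAttachment and OutgoingMessage shapes and the buildVoiceNoteMessage helper are assumptions made for illustration; the real ChatInput message format is not shown in this document.

```typescript
// Hypothetical message shapes; field names are illustrative only.
interface AudioAttachment {
  url: string;      // URL returned by the upload step
  path: string;     // storage path returned by the upload step
  mimeType: string; // e.g. "audio/webm"
}

interface OutgoingMessage {
  content: string;
  attachments: AudioAttachment[];
}

/** Pairs the transcription (as message text) with the audio file (as attachment). */
function buildVoiceNoteMessage(
  transcription: string,
  audio: AudioAttachment,
): OutgoingMessage {
  return { content: transcription.trim(), attachments: [audio] };
}
```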

User Flow

Recording Voice Notes

  1. User clicks the microphone icon in the chat input
  2. The voice note recorder interface appears, and the user can start recording
  3. User speaks their message, with visualization feedback showing voice input
  4. User stops recording when finished
  5. The audio is automatically uploaded and transcribed
  6. The transcription is displayed for review
  7. User can preview the recording before sending
  8. When sent, the transcribed text appears as the message content with the audio file attached

Uploading Audio Files

  1. User clicks the attachment icon in the chat input
  2. User selects an audio file from their device
  3. System detects the audio file and asks if the user wants to transcribe it
  4. If confirmed, the audio is uploaded and transcribed
  5. When sent, the transcribed text appears as the message content with the audio file attached
  6. Recipients can play back the original audio recording
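
The upload flow can be sketched as a single function with its collaborators injected. This is an assumption-laden illustration: the dependency names are hypothetical, and the behavior when the user declines transcription (the file is attached with no transcribed text) is a guess, since this document does not specify it.

```typescript
interface PickedFile {
  name: string;
  type: string; // MIME type
}

interface Draft {
  content: string;
  attachments: { name: string; url: string }[];
}

// Hypothetical collaborators supplied by the UI and API layers.
interface UploadFlowDeps {
  confirmTranscription(file: PickedFile): Promise<boolean>;
  upload(file: PickedFile): Promise<{ url: string }>;
  transcribe(audioUrl: string): Promise<string>;
}

/** Detects audio, asks the user, then uploads and optionally transcribes. */
async function handlePickedFile(file: PickedFile, deps: UploadFlowDeps): Promise<Draft> {
  // Only audio files trigger the transcription prompt.
  const wantsTranscript =
    file.type.startsWith("audio/") && (await deps.confirmTranscription(file));

  const { url } = await deps.upload(file);

  // Assumption: a declined or non-audio file is attached with empty content.
  const content = wantsTranscript ? await deps.transcribe(url) : "";
  return { content, attachments: [{ name: file.name, url }] };
}
```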

Future Enhancements

  • Support for longer recordings (currently limited to ~1 minute)
  • Enhanced audio processing (noise reduction, normalization)
  • Support for additional audio formats
  • Waveform visualization of the recording
  • Save/download options for voice recordings
  • Voice commands for controlling the AI
  • Multi-language transcription support
  • Batch processing of multiple audio files