Voice Input for Profile and Gift Descriptions
Time Safari currently requires users to type descriptions for their profiles and gifts manually. This project will enable users to speak their descriptions and have them automatically transcribed and filled into the appropriate text fields. The implementation will prioritize on-device processing to maintain privacy and enable offline functionality.
Overview
The goal is to add voice-to-text functionality that allows users to describe their profiles or gifts through speech, with the transcribed text automatically populating the relevant form fields. This is particularly valuable for:
- Users who prefer speaking over typing, especially on mobile devices
- Faster data entry, particularly for longer descriptions
- Accessibility improvements for users with mobility or dexterity challenges
- Natural language descriptions that capture nuance better than typing
- Privacy-focused users who want on-device processing without cloud transcription
This plan provides a comprehensive roadmap for implementing voice input in Time Safari, starting with quick wins to validate the approach before committing to full integration.
Task 1: Basic Voice Input Setup
1.1 Capacitor Speech Recognition Integration
Objective: Establish basic voice recording and transcription capabilities using Capacitor plugins.
Key Components:
Plugin Selection
- Primary Option: `@capacitor-community/speech-recognition` for cross-platform speech recognition
- Alternative: `@capacitor-community/voice-recorder` + on-device ML model (if available)
- Fallback: Web Speech API for browser environments
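The fallback order above can be sketched as a small selection function. This is a minimal sketch under stated assumptions: `EngineKind`, `Capabilities`, and `selectEngine` are hypothetical names, and the capability flags would be computed once at startup (for example from the plugin's own availability check and Web Speech feature detection). The voice-recorder alternative is left out for brevity.

```typescript
// Hypothetical engine identifiers for the options listed above.
type EngineKind = "native-plugin" | "web-speech" | "manual-only";

// Hypothetical capability flags gathered once at startup.
interface Capabilities {
  isNativePlatform: boolean;      // running inside a Capacitor iOS/Android shell
  nativePluginAvailable: boolean; // the speech-recognition plugin reports it is usable
  webSpeechAvailable: boolean;    // the browser exposes a Web Speech API constructor
}

// Walk the fallback chain: native plugin first, then Web Speech, else typing only.
function selectEngine(caps: Capabilities): EngineKind {
  if (caps.isNativePlatform && caps.nativePluginAvailable) {
    return "native-plugin";
  }
  if (caps.webSpeechAvailable) {
    return "web-speech";
  }
  // No recognizer at all: hide the microphone button and keep the plain text field.
  return "manual-only";
}
```

Returning `"manual-only"` instead of throwing keeps typing as the default path, so an unsupported platform never blocks the form.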
Initial Implementation Steps
- Install and configure Capacitor speech recognition plugin
- Request and handle microphone permissions
- Implement basic start/stop recording functionality
- Display recording status indicator to user
- Test on both iOS and Android devices
Basic Recording Interface
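```typescript
// Common contract for every voice backend (native plugin, Web Speech API, ...).
interface VoiceRecorder {
  start(): Promise<void>;                   // begin listening
  stop(): Promise<SpeechRecognitionResult>; // stop listening and resolve with the transcript
  cancel(): Promise<void>;                  // stop listening and discard any result
  isSupported(): boolean;                   // can this platform record at all?
  hasPermission(): Promise<boolean>;        // is microphone access already granted?
  requestPermission(): Promise<boolean>;    // prompt the user; resolves true if granted
}
```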
Quick Win Test:
- Record audio and verify it's captured
- Display transcript result in console/log
- No form integration yet - just prove recording works
1.2 Permission Management
Objective: Handle microphone permissions gracefully across platforms.
Platform-Specific Considerations:
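- iOS: `NSMicrophoneUsageDescription` is required in `Info.plist` (speech recognition additionally requires `NSSpeechRecognitionUsageDescription`)
- Android: the `RECORD_AUDIO` permission must be declared in `AndroidManifest.xml`
- Web: User gesture required for microphone access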
Permission Flow:
- Check permission status on component mount
- Request permission if not granted
- Show clear error messages if permission denied
- Provide a link to system settings so users can enable the permission manually
- Handle permission state changes (user grants/denies during session)
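One way to keep this permission flow testable is to reduce it to a pure function from permission state to UI action. The state names match the Web Permissions API's `PermissionState` values; `permissionAction`, `PermissionAction`, and the `alreadyAsked` flag are hypothetical, and the messages are placeholders.

```typescript
// Permission states, named as in the Web Permissions API.
type MicPermission = "granted" | "denied" | "prompt";

// Hypothetical UI actions the voice-input component can take.
type PermissionAction =
  | { kind: "ready" }
  | { kind: "request" }
  | { kind: "show-settings-hint"; message: string };

// Placeholder copy; the final wording belongs to the UX review.
const SETTINGS_HINT =
  "Microphone access is off. You can enable it in your device settings, or type your description instead.";

// Hypothetical helper: decide what the component should do for the current state.
function permissionAction(status: MicPermission, alreadyAsked: boolean): PermissionAction {
  switch (status) {
    case "granted":
      return { kind: "ready" };
    case "prompt":
      // Ask once per session; if the user dismissed the prompt, point to settings instead of nagging.
      return alreadyAsked ? { kind: "show-settings-hint", message: SETTINGS_HINT } : { kind: "request" };
    case "denied":
      return { kind: "show-settings-hint", message: SETTINGS_HINT };
  }
}
```

Re-running `permissionAction` whenever the app regains focus is one way to pick up permission changes the user made in system settings mid-session.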
Quick Win Test:
- Verify permission prompts appear correctly
- Test denial handling (show user-friendly message)
- Test re-requesting permissions after denial
1.3 Basic UI Components
Objective: Create minimal UI elements for voice recording.
Components Needed:
- Record Button: Visual indicator with recording state (idle/recording/processing)
- Status Indicator: Show "Listening..." or "Processing..." feedback
- Cancel Button: Allow user to abort recording
- Visual Feedback: Animated microphone icon or waveform during recording
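The button states above form a small state machine. A hypothetical sketch as a pure transition function; `nextState`, `statusLabel`, and the event names are illustrative, not part of an existing component:

```typescript
// The three visual states named above for the record button.
type RecordingState = "idle" | "recording" | "processing";

// Hypothetical events from the record/stop button, the cancel button, and the recognizer.
type RecordingEvent = "tap-record" | "tap-stop" | "tap-cancel" | "result" | "error";

// Pure transition function; events that make no sense in the current state are ignored.
function nextState(state: RecordingState, event: RecordingEvent): RecordingState {
  switch (state) {
    case "idle":
      return event === "tap-record" ? "recording" : "idle";
    case "recording":
      if (event === "tap-stop") return "processing";
      if (event === "tap-cancel" || event === "error") return "idle";
      return "recording";
    case "processing":
      // A result, an error, or a cancel all return the button to idle.
      return event === "result" || event === "error" || event === "tap-cancel" ? "idle" : "processing";
  }
}

// Text for the status indicator; empty while idle.
function statusLabel(state: RecordingState): string {
  if (state === "recording") return "Listening...";
  if (state === "processing") return "Processing...";
  return "";
}
```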
Quick Win Test:
- Create standalone component with record button
- Toggle recording state visually
- Display transcript result below button (simple div)
Task 2: Speech-to-Text Transcription
2.1 On-Device Transcription Setup
Objective: Configure on-device speech recognition to avoid cloud processing.
Implementation Approach:
- Use device-native speech recognition APIs through Capacitor
- Configure for offline/on-device processing when available
- Fall back to online recognition if on-device unavailable (with user consent)
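The on-device-first rule with a consent-gated cloud fallback could look like the sketch below. `chooseMode` and its inputs are hypothetical; the point is that the cloud path requires both connectivity and explicit consent, and missing consent turns into a prompt rather than a silent upload.

```typescript
// Hypothetical recognition modes; "ask-consent" means show the cloud-consent dialog first.
type RecognitionMode = "on-device" | "cloud" | "ask-consent" | "unavailable";

// Hypothetical inputs; real values would come from the recognizer and the app's settings.
interface ModeInputs {
  onDeviceAvailable: boolean;
  online: boolean;
  cloudConsentGiven: boolean;
}

function chooseMode({ onDeviceAvailable, online, cloudConsentGiven }: ModeInputs): RecognitionMode {
  if (onDeviceAvailable) return "on-device";
  if (!online) return "unavailable"; // no local model and no network: fall back to typing
  // Cloud recognition sends audio off the device, so it needs explicit consent first.
  return cloudConsentGiven ? "cloud" : "ask-consent";
}
```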
Platform Capabilities:
- iOS: Speech framework (on-device Siri recognition)
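- Android: SpeechRecognizer API (on-device availability depends on the Android version, the device, and installed language packs; a dedicated on-device recognizer was added in Android 12)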
- Web: Web Speech API (typically cloud-based, but can use local models in some browsers)
Configuration Options:
- Language selection (default to device language, allow override)
- Continuous vs. single-shot recognition
- Interim results vs. final results only
- Confidence thresholds for transcription quality
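A possible shape for these options, with defaults taken from the device language. This is a sketch: `RecognitionConfig`, `buildConfig`, and `pickBest` are hypothetical, the alternative fields mirror the Web Speech API's `SpeechRecognitionAlternative` (`transcript`, `confidence`), and the 0.6 threshold is an arbitrary starting value. Not every engine reports meaningful confidence scores, so the flag is best treated as a hint.

```typescript
// Hypothetical configuration covering the options listed above.
interface RecognitionConfig {
  language: string;        // BCP 47 tag, e.g. "en-US"
  continuous: boolean;     // keep listening across pauses vs. a single utterance
  interimResults: boolean; // stream partial text while the user speaks
  minConfidence: number;   // 0..1; results below this are flagged as low confidence
}

// Defaults follow the device language; any field can be overridden by user settings.
function buildConfig(deviceLanguage: string, overrides: Partial<RecognitionConfig> = {}): RecognitionConfig {
  const defaults: RecognitionConfig = {
    language: deviceLanguage,
    continuous: false,
    interimResults: true,
    minConfidence: 0.6, // arbitrary starting point; tune with real transcriptions
  };
  return { ...defaults, ...overrides };
}

interface Alternative {
  transcript: string;
  confidence: number;
}

// Pick the most confident alternative and flag it if it falls below the threshold.
function pickBest(alternatives: Alternative[], config: RecognitionConfig): { text: string; lowConfidence: boolean } | null {
  if (alternatives.length === 0) return null;
  const best = alternatives.reduce((a, b) => (b.confidence > a.confidence ? b : a));
  return { text: best.transcript, lowConfidence: best.confidence < config.minConfidence };
}
```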
Quick Win Test:
- Speak simple phrases and verify transcription accuracy
- Test with different languages if the device supports them
- Compare on-device vs. online transcription quality
2.2 Text Processing & Formatting
Objective: Clean and format transcribed text for form fields.
Text Processing Steps:
- Capitalization: Proper sentence capitalization (first letter, proper nouns)
- Punctuation: Add periods at sentence endings if missing
- Removal: Strip filler words like "um", "uh", "like" (optional, user preference)
- Normalization: Fix common speech-to-text errors (homophones, numbers)
- Formatting: Handle line breaks for longer descriptions
Processing Pipeline:
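```typescript
// Each step takes and returns plain text, so steps can be tested independently and chained.
interface TextProcessor {
  clean(text: string): string;          // run the full pipeline
  capitalize(text: string): string;     // sentence starts, the pronoun "I", proper nouns
  addPunctuation(text: string): string; // e.g. a final period if missing
  removeFillers(text: string): string;  // "um", "uh", ... (optional, per user preference)
  normalize(text: string): string;      // whitespace, homophones, spoken numbers
}
```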
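A minimal implementation of this pipeline, assuming English input. The interface is repeated so the block is self-contained; `basicProcessor` and the `FILLERS` list are illustrative. It handles only sentence starts and the pronoun "I" (proper nouns and homophone fixes are left out), and it deliberately does not strip "like", since naive removal would also delete the verb in "I like hiking".

```typescript
interface TextProcessor {
  clean(text: string): string;
  capitalize(text: string): string;
  addPunctuation(text: string): string;
  removeFillers(text: string): string;
  normalize(text: string): string;
}

// Hypothetical default filler list; "like" is omitted on purpose (see above).
const FILLERS = ["um", "uh", "er", "ah"];

const basicProcessor: TextProcessor = {
  // Collapse runs of whitespace and trim the ends.
  normalize: (text) => text.replace(/\s+/g, " ").trim(),

  // Drop standalone filler words plus an optional trailing comma and spaces.
  removeFillers: (text) =>
    text.replace(new RegExp(`\\b(?:${FILLERS.join("|")})\\b,?\\s*`, "gi"), "").trim(),

  // Uppercase the first letter of the text and of each sentence, plus the pronoun "i".
  capitalize: (text) =>
    text
      .replace(/(^|[.!?]\s+)([a-z])/g, (_m, pre: string, ch: string) => pre + ch.toUpperCase())
      .replace(/\bi\b/g, "I"),

  // Add a final period when the text does not already end in . ! or ?
  addPunctuation: (text) => (text === "" || /[.!?]$/.test(text) ? text : `${text}.`),

  // Full pipeline: normalize, strip fillers, normalize again, then capitalize and punctuate.
  clean(text) {
    const stripped = this.normalize(this.removeFillers(this.normalize(text)));
    return this.addPunctuation(this.capitalize(stripped));
  },
};
```

Because every step is a plain string function, the Phase 2 unit tests can exercise each one in isolation before testing `clean` end to end.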
Quick Win Test:
- Input test transcript with common issues
- Verify each processing step works independently
- Test with real transcriptions from speech recognition
2.3 Error Handling & Retry Logic
Objective: Handle transcription failures gracefully.
Error Scenarios:
- No speech detected (silence or background noise only)
- Network unavailable (if using cloud fallback)
- Recognition timeout
- Low confidence transcription
- Permission revoked during recording
Recovery Strategies:
- Show clear error messages with actionable guidance
- Allow user to retry recording easily
- Offer manual text input as fallback
- Save partial results if transcription interrupted
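The error scenarios and recovery strategies can live in one lookup table, so every failure has a message, a retry decision, and a rule for partial text. A sketch with hypothetical error codes (`VoiceErrorCode`), a hypothetical `handleVoiceError` helper, and placeholder wording:

```typescript
// Hypothetical error codes covering the scenarios listed above.
type VoiceErrorCode = "no-speech" | "network" | "timeout" | "low-confidence" | "permission-revoked";

interface ErrorGuidance {
  message: string;      // user-facing text with an actionable next step
  canRetry: boolean;    // show a "Try again" button
  keepPartial: boolean; // put any partial transcript into the field
}

// Placeholder copy for each scenario.
const GUIDANCE: Record<VoiceErrorCode, ErrorGuidance> = {
  "no-speech": { message: "We didn't hear anything. Try again a little closer to the microphone.", canRetry: true, keepPartial: false },
  network: { message: "No connection for online transcription. Try again later or type your description.", canRetry: true, keepPartial: true },
  timeout: { message: "Recording stopped after a long pause. Review the text or record again.", canRetry: true, keepPartial: true },
  "low-confidence": { message: "Some words may be wrong. Please review the text before saving.", canRetry: true, keepPartial: true },
  "permission-revoked": { message: "Microphone access was turned off. Enable it in settings or type instead.", canRetry: false, keepPartial: true },
};

// Returns the guidance plus the text to place in the field.
function handleVoiceError(code: VoiceErrorCode, partial: string): ErrorGuidance & { fieldText: string } {
  const guidance = GUIDANCE[code];
  return { ...guidance, fieldText: guidance.keepPartial ? partial.trim() : "" };
}
```

Using a `Record` keyed by the error union means the compiler rejects the table if a new error code is added without guidance.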
Quick Win Test:
- Simulate each error condition
- Verify error messages are user-friendly
- Test retry functionality
Task 3: Profile Description Integration
3.1 Profile Form Integration
Objective: Integrate voice input into profile description editing.
Integration Points:
- Add microphone button next to profile description textarea
- Replace or append to existing text based on user preference
- Maintain existing form validation and submission flow
- Preserve text if user switches between voice and typing
User Experience Flow:
- User taps microphone icon next to description field
- Recording starts, button shows recording state
- User speaks description
- User taps the stop button (or recording stops automatically after a pause)
- Transcribed text appears in description field
- User can edit transcribed text manually if needed
- User can record again to replace or append
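The replace-or-append choice reduces to a small merge function. A hypothetical sketch (`mergeTranscript` and `MergeMode` are illustrative names); appending joins the new sentence to the existing text with a single space:

```typescript
// Hypothetical user preference for what a new recording does to the field.
type MergeMode = "replace" | "append";

// Combine the current field text with a new transcript.
function mergeTranscript(existing: string, transcript: string, mode: MergeMode): string {
  const next = transcript.trim();
  if (mode === "replace" || existing.trim() === "") return next;
  if (next === "") return existing; // an empty recording leaves the field untouched
  return `${existing.trimEnd()} ${next}`;
}
```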
Form State Management:
- Track whether text came from voice or typing
- Allow editing of transcribed text
- Handle form submission with voice-generated text
- Save draft state including voice transcriptions
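To track whether text came from voice or typing, the draft can carry a source tag alongside the text. A hypothetical sketch (`DescriptionDraft`, `applyInput`, and the `"mixed"` state are assumptions, not existing Time Safari types):

```typescript
// Hypothetical draft shape: the field text plus where it came from.
type TextSource = "typed" | "voice" | "mixed" | "empty";

interface DescriptionDraft {
  text: string;
  source: TextSource;
}

// Any switch between voice and typing marks the draft as mixed.
function applyInput(draft: DescriptionDraft, newText: string, via: "typed" | "voice"): DescriptionDraft {
  if (newText.trim() === "") return { text: "", source: "empty" };
  const source: TextSource = draft.source === "empty" || draft.source === via ? via : "mixed";
  return { text: newText, source };
}

// Drafts are plain JSON, so they fit whatever local draft storage the form already uses.
const serializeDraft = (d: DescriptionDraft): string => JSON.stringify(d);
const parseDraft = (s: string): DescriptionDraft => JSON.parse(s) as DescriptionDraft;
```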
Quick Win Test:
- Add microphone button to profile form
- Record and fill description field
- Submit form and verify description saves correctly
3.2 Profile-Specific Text Processing
Objective: Optimize transcription processing for profile descriptions.
Profile Context Awareness:
- Profile descriptions are typically first-person ("I am...", "I enjoy...")
- May include skills, interests, location, availability
- Often include action-oriented language
Optimization Strategies:
- Recognize common profile phrases and ensure proper formatting
- Handle personal pronouns appropriately
- Format lists and bullet points if user says "first", "second", etc.
- Preserve natural language flow while cleaning up speech artifacts
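Spoken ordinals can be turned into a list with a simple split. This is a heuristic sketch, assuming English ordinals and a hypothetical `formatSpokenList` helper: it only reformats when at least two ordinals appear, and it can still misfire on phrases like "a second opinion", so the user should always see the result and be able to undo it.

```typescript
// Hypothetical set of spoken ordinals that start a new list item.
const ORDINALS = ["first", "second", "third", "fourth", "fifth"];

// Turn "first X, second Y, third Z" into a bulleted list; otherwise return the text unchanged.
function formatSpokenList(text: string): string {
  const pattern = new RegExp(`\\b(?:${ORDINALS.join("|")})\\b[,:]?\\s*`, "gi");
  // Split on the ordinals, then trim each piece and drop trailing separators.
  const parts = text.split(pattern).map((p) => p.trim().replace(/[,;.]$/, ""));
  const intro = parts[0] ?? ""; // any lead-in before the first ordinal
  const items = parts.slice(1);
  if (items.length < 2) return text; // a single "first" is probably not a list
  const bullets = items.filter((i) => i !== "").map((i) => `- ${i}`).join("\n");
  // End a non-empty lead-in with a colon so it reads as a list introduction.
  return intro === "" ? bullets : `${intro.replace(/:?$/, ":")}\n${bullets}`;
}
```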
Quick Win Test:
- Test with typical profile description phrases
- Verify formatting looks natural in profile preview
- Compare user satisfaction with typed vs. voice descriptions
Task 4: Gift Description Integration
4.1 Gift Form Integration
Objective: Integrate voice input into gift (Give) description editing.
Integration Points:
- Add microphone button to gift description/impact text fields
- Enable microphone while camera is active
- Support both "what was given" and "impact" descriptions
- Handle gifts that may be time-based or physical items
- Support longer descriptions that describe impact and outcomes
User Experience Flow: Similar to the profile flow, but optimized for the gift context:
- User initiates camera & microphone:
  - Tap a button to begin
  - Allow a shortcut from the desktop to go directly to this page
- Collect information from audio, e.g., the description first
- User taps the camera button to take a picture, which also completes audio capture
- Transcribed text fills the field
- User reviews and submits
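The camera-and-microphone flow can be modeled as a session that buffers transcript segments until the photo is taken. `GiftCaptureSession` and `GiftDraft` are hypothetical; the camera and recognizer wiring is omitted, and `photoUri` stands in for whatever the camera layer returns.

```typescript
// Hypothetical form values produced by the capture session for review.
interface GiftDraft {
  description: string;
  photo: string; // e.g. a file URI from the camera layer
}

// Hypothetical session: collects speech while the camera preview is open.
class GiftCaptureSession {
  private segments: string[] = [];
  private finished = false;

  // Called for each final transcript segment; ignored once the photo is taken.
  addTranscript(segment: string): void {
    const s = segment.trim();
    if (!this.finished && s !== "") this.segments.push(s);
  }

  // Taking the photo ends audio collection and returns the values to prefill.
  takePhoto(photoUri: string): GiftDraft {
    this.finished = true;
    return { description: this.segments.join(" "), photo: photoUri };
  }

  get isFinished(): boolean {
    return this.finished;
  }
}
```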
Quick Win Test:
- Enable camera & microphone button by default when gifting
- Verify gift submission works with voice transcriptions
4.2 Gift-Specific Text Processing
Objective: Optimize transcription for gift/impact descriptions.
Gift Context Awareness:
- Gift descriptions often include recipient names or pronouns
- Action-oriented language ("I helped", "I gave", "we organized")
- Impact descriptions may include outcomes and results
- May reference time commitments or physical items
Optimization Strategies:
- Recognize gift-related phrases and terminology
- Handle recipient references appropriately
- Format action verbs naturally
- Support both past-tense (completed gifts) and present-tense (ongoing) descriptions
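For pre-filling the giver and recipient, a naive pattern-based pass over the transcript is one option. Treat this as an illustrative heuristic only: `extractGiftHints` is hypothetical, it recognizes just a single capitalized word after "to", "for", or "helped" (so it misses pronouns and multi-word names, and may pick up place names), and anything it finds should be shown as a suggestion for the user to confirm.

```typescript
// Hypothetical hints used to suggest values for other gift form fields.
interface GiftHints {
  giverIsSelf: boolean;     // the description starts with "I" or "we"
  recipient: string | null; // capitalized word after "to", "for", or "helped"
}

function extractGiftHints(text: string): GiftHints {
  const giverIsSelf = /^\s*(?:i|we)\b/i.test(text);
  // Case-sensitive on purpose: relies on capitalization to spot a name.
  const match = text.match(/\b(?:to|for|helped)\s+([A-Z][a-z]+)/);
  return { giverIsSelf, recipient: match?.[1] ?? null };
}
```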
Quick Win Test:
- Test with various gift description styles
- Verify descriptions format well
- Verify that other fields, such as giver and recipient, are filled in
Task 5: Advanced Features & Optimization
5.1 Voice Command Recognition
Objective: Enable voice commands for common actions (future enhancement).
Potential Commands:
- "Next field" - Move to next form field
- "Delete that" - Remove last transcribed segment
- "Add period" - Add punctuation
- "Start over" - Clear current field and restart
- "Save draft" - Save current form state
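A simple way to separate commands from dictation is to treat a segment as a command only when the entire utterance matches a command phrase. The action identifiers and the `parseCommand`/`route` helpers below are hypothetical:

```typescript
// The commands listed above mapped to hypothetical action identifiers.
// A Map avoids accidental matches on Object.prototype keys such as "constructor".
const COMMANDS = new Map<string, string>([
  ["next field", "focus-next"],
  ["delete that", "delete-last-segment"],
  ["add period", "insert-period"],
  ["start over", "clear-field"],
  ["save draft", "save-draft"],
]);

// Whole-utterance match only, so "I can start over with new tools" stays dictation.
function parseCommand(segment: string): string | null {
  const normalized = segment.toLowerCase().replace(/[.!?,]/g, "").trim();
  return COMMANDS.get(normalized) ?? null;
}

// Split a stream of final segments into dictated text and command actions.
function route(segments: string[]): { text: string[]; actions: string[] } {
  const text: string[] = [];
  const actions: string[] = [];
  for (const s of segments) {
    const action = parseCommand(s);
    if (action === null) {
      text.push(s);
      continue;
    }
    if (action === "delete-last-segment") text.pop(); // applied immediately to the buffer
    actions.push(action);
  }
  return { text, actions };
}
```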
Quick Win Test (if implemented):
- Test single command recognition
- Verify commands work during recording
- Test command accuracy
5.2 Editing & Correction Interface
Objective: Make it easy to correct transcription errors.
Editing Features:
- Highlight uncertain words (low confidence transcriptions)
- Allow inline editing of transcribed text
- Quick correction suggestions for common errors
- Undo/redo for transcription changes
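Two pieces of this editing interface, sketched together: marking low-confidence words and a minimal undo/redo history. Both are hypothetical helpers; per-word confidence is assumed to be available from the engine (not all engines provide it), and the `[word?]` markers stand in for whatever highlight style the UI uses.

```typescript
// Hypothetical per-word result, assuming the engine reports word-level confidence.
interface Word {
  text: string;
  confidence: number;
}

// Wrap low-confidence words in markers the UI can style or make clickable.
function markUncertain(words: Word[], threshold = 0.5): string {
  return words.map((w) => (w.confidence < threshold ? `[${w.text}?]` : w.text)).join(" ");
}

// Minimal undo/redo history for a single text field.
class TextHistory {
  private past: string[] = [];
  private future: string[] = [];
  private current: string;

  constructor(initial: string) {
    this.current = initial;
  }

  get value(): string {
    return this.current;
  }

  set(next: string): void {
    if (next === this.current) return;
    this.past.push(this.current);
    this.current = next;
    this.future = []; // a new edit invalidates the redo stack
  }

  undo(): void {
    const prev = this.past.pop();
    if (prev === undefined) return;
    this.future.push(this.current);
    this.current = prev;
  }

  redo(): void {
    const next = this.future.pop();
    if (next === undefined) return;
    this.past.push(this.current);
    this.current = next;
  }
}
```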
Quick Win Test:
- Show confidence scores for words
- Allow clicking on words to edit
- Test undo functionality
5.3 Performance Optimization
Objective: Optimize voice input for speed and efficiency.
Optimization Areas:
- Reduce latency between recording and transcription display
- Implement incremental transcription (show words as recognized)
- Cache recognition models to avoid re-initialization
- Optimize audio processing for faster transcription
- Reduce battery consumption during recording
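Incremental display usually means keeping finalized segments separate from the latest interim hypothesis, which the engine may still revise. A hypothetical sketch (`IncrementalTranscript` is illustrative); only `committed()` text would be written into the form field:

```typescript
// Hypothetical buffer separating final segments from the current interim guess.
class IncrementalTranscript {
  private finals: string[] = [];
  private interim = "";

  // Interim results replace each other; they are never committed directly.
  onInterim(text: string): void {
    this.interim = text.trim();
  }

  // A final result is committed and supersedes the pending interim text.
  onFinal(text: string): void {
    const t = text.trim();
    if (t !== "") this.finals.push(t);
    this.interim = "";
  }

  // What the UI shows while recording: committed text plus in-progress words.
  display(): string {
    return [...this.finals, this.interim].filter((s) => s !== "").join(" ");
  }

  // What goes into the form field.
  committed(): string {
    return this.finals.join(" ");
  }
}
```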
Quick Win Test:
- Measure time from stop recording to text display
- Monitor battery usage during extended recording
- Test performance on lower-end devices
Implementation Phases
Phase 1: Proof of Concept (1-2 weeks)
Goal: Prove voice recording and basic transcription works
- Install Capacitor speech recognition plugin
- Create standalone test component with record button
- Test microphone permissions on iOS and Android
- Verify audio recording works
- Display raw transcription output
- Quick Win: Successfully record and see transcript in console
Deliverables: Working prototype that can record and show transcription
Phase 2: Basic Text Processing (1 week)
Goal: Clean and format transcribed text
- Implement basic text cleaning (capitalization, punctuation)
- Create text processing utility functions
- Test with various transcriptions
- Add unit tests for text processing
- Quick Win: Clean transcription appears formatted correctly
Deliverables: Text processing utilities with tests
Phase 3: Profile Form Integration (2 weeks)
Goal: Voice input works in profile description field
- Add microphone button to profile form
- Integrate recording with description textarea
- Handle form state (voice vs. typed text)
- Test form submission with voice transcriptions
- Add UI feedback (recording indicator, status messages)
- Handle errors gracefully (permissions, recognition failures)
- Quick Win: Can record and fill profile description, submit successfully
Deliverables: Working voice input in profile forms
Phase 4: Gift Form Integration (1-2 weeks)
Goal: Voice input works in gift description fields
- Add microphone buttons to gift form fields
- Integrate with both "what given" and "impact" fields
- Optimize text processing for gift context
- Test with various gift types and descriptions
- Quick Win: Can record and fill gift descriptions, submit successfully
Deliverables: Working voice input in gift forms
Phase 5: Polish & Enhancement (2-3 weeks)
Goal: Improve UX and handle edge cases
- Add loading states and animations
- Implement retry logic for failed transcriptions
- Add editing capabilities for transcribed text
- Optimize performance and battery usage
- Add accessibility improvements
- Comprehensive testing on multiple devices
- User testing and feedback incorporation
Deliverables: Production-ready voice input feature
Technical Considerations
Privacy & Security
- Prioritize on-device processing to avoid sending audio to servers
- Request explicit user consent before using any cloud-based recognition
- Ensure audio data is not stored permanently
- Clear audio buffers after transcription
- Document privacy implications in user-facing documentation
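Where the app itself holds raw audio (for example with the voice-recorder plus on-device model alternative), buffers can be zeroed as soon as transcription finishes. A sketch with a hypothetical `withEphemeralAudio` wrapper and `transcribe` callback; it clears only the buffer the app owns, not copies that the engine, the OS, or the JavaScript runtime may have made.

```typescript
// Hypothetical wrapper: run transcription on raw PCM samples, then zero the buffer,
// even if transcription throws.
async function withEphemeralAudio<T>(
  samples: Float32Array,
  transcribe: (audio: Float32Array) => Promise<T>,
): Promise<T> {
  try {
    return await transcribe(samples);
  } finally {
    samples.fill(0); // overwrite the app-owned buffer so it no longer holds recorded audio
  }
}
```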
Accessibility
- Ensure voice input doesn't interfere with screen readers
- Provide keyboard shortcuts for recording start/stop
- Support alternative input methods (voice should complement, not replace typing)
- Test with users who have speech impairments
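Keyboard access and screen-reader feedback can share one small mapping. This sketch assumes a hypothetical Ctrl/Cmd+Shift+M binding, which should be checked against browser and assistive-technology shortcuts before adoption, and returns text intended for an `aria-live` region. `KeyInfo` mirrors the relevant `KeyboardEvent` fields so the code has no DOM dependency.

```typescript
// Subset of KeyboardEvent fields the sketch needs.
interface KeyInfo {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
  shiftKey: boolean;
}

type ShortcutAction = "toggle-recording" | "cancel-recording" | null;

// Hypothetical bindings: Ctrl/Cmd+Shift+M toggles recording; Escape cancels while recording.
function shortcutFor(e: KeyInfo, isRecording: boolean): ShortcutAction {
  if ((e.ctrlKey || e.metaKey) && e.shiftKey && e.key.toLowerCase() === "m") return "toggle-recording";
  if (e.key === "Escape" && isRecording) return "cancel-recording";
  return null;
}

// Text for an aria-live="polite" region so screen-reader users hear state changes.
function liveAnnouncement(action: Exclude<ShortcutAction, null>, wasRecording: boolean): string {
  if (action === "cancel-recording") return "Recording canceled";
  return wasRecording ? "Recording stopped" : "Recording started";
}
```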
Performance
- Minimize battery usage during recording
- Optimize for lower-end devices
- Handle network connectivity changes gracefully
- Implement timeout mechanisms to prevent infinite recording
- Consider limiting maximum recording duration
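Timeouts and a maximum duration can be expressed as a pure check on timestamps, which also covers the auto-stop after a pause in the profile flow. `autoStopReason` and `StopLimits` are hypothetical, and the limit values would come from user testing.

```typescript
// Hypothetical limits; tune through user testing.
interface StopLimits {
  maxDurationMs: number; // hard cap on a single recording
  silenceMs: number;     // auto-stop after this long without recognized speech
}

type StopReason = "max-duration" | "silence" | null;

// Pure check with explicit timestamps (ms), so a timer or a test can supply any clock.
function autoStopReason(startedAt: number, lastSpeechAt: number, now: number, limits: StopLimits): StopReason {
  if (now - startedAt >= limits.maxDurationMs) return "max-duration";
  if (now - lastSpeechAt >= limits.silenceMs) return "silence";
  return null;
}
```

In a component, a short `setInterval` could call this with `Date.now()` (initializing `lastSpeechAt` to the start time and updating it on each recognition event) and stop the recognizer when a reason is returned.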
Platform Compatibility
- Test on iOS (multiple versions) and Android (multiple versions)
- Provide graceful degradation for unsupported platforms
- Handle platform-specific permission models
- Test in both Capacitor native and web environments
- Support browser environments with Web Speech API fallback
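Web Speech API feature detection needs to check the prefixed name too, since some browsers (Chrome, for example) expose the constructor only as `webkitSpeechRecognition`. A sketch with a hypothetical `findWebSpeech` helper that takes the global scope as a parameter so it can be tested without a browser:

```typescript
// Minimal constructor type, to avoid depending on DOM typings.
type RecognizerCtor = new () => unknown;

// Hypothetical helper: prefer the standard name, then fall back to the webkit-prefixed one.
function findWebSpeech(scope: Record<string, unknown>): RecognizerCtor | null {
  const candidate = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof candidate === "function" ? (candidate as RecognizerCtor) : null;
}
```

In the app it would be called as `findWebSpeech(globalThis as unknown as Record<string, unknown>)`; a `null` result means the typing-only path.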
User Experience
- Provide clear visual feedback during recording
- Make it easy to cancel or retry recordings
- Allow editing of transcribed text
- Don't force voice input - always allow typing as alternative
- Provide helpful error messages with actionable guidance
- Consider adding voice input tutorials or onboarding
Testing Strategy
- Unit tests for text processing utilities
- Integration tests for form submission with voice transcriptions
- Device testing on physical iOS and Android devices
- Test with various accents and speech patterns
- Test error scenarios (no permission, network issues, etc.)
- User acceptance testing with real users