
Trent Larson d913ed7482 add overall Gift Economies organization to this repo and refactor

2025-12-25 09:30:09 -07:00

15 KiB


Voice Input for Profile and Gift Descriptions

Time Safari currently requires users to type descriptions for their profiles and gifts manually. This project will enable users to speak their descriptions and have them automatically transcribed and filled into the appropriate text fields. The implementation will prioritize on-device processing to maintain privacy and enable offline functionality.

Overview

The goal is to add voice-to-text functionality that allows users to describe their profiles or gifts through speech, with the transcribed text automatically populating the relevant form fields. This is particularly valuable for:

Users who prefer speaking over typing, especially on mobile devices
Faster data entry, particularly for longer descriptions
Accessibility improvements for users with mobility or dexterity challenges
Natural language descriptions that capture nuance better than typing
Privacy-focused users who want on-device processing without cloud transcription

This plan provides a comprehensive roadmap for implementing voice input in Time Safari, starting with quick wins to validate the approach before committing to full integration.

Task 1: Basic Voice Input Setup

1.1 Capacitor Speech Recognition Integration

Objective: Establish basic voice recording and transcription capabilities using Capacitor plugins.

Key Components:

Plugin Selection

Primary Option: @capacitor-community/speech-recognition - Cross-platform speech recognition
Alternative: @capacitor-community/voice-recorder + on-device ML model (if available)
Fallback: Web Speech API for browser environments
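The primary/alternative/fallback ordering above can be expressed as a small selection function. This is only a sketch: the `Environment` flags and the `SpeechBackend` names are hypothetical, and how each flag is actually detected (plugin availability checks, model downloads, browser feature detection) is left open.

```typescript
// Hypothetical names for illustration; none of these come from a plugin API.
type SpeechBackend =
  | "capacitor-plugin"          // primary option
  | "recorder-with-local-model" // alternative option
  | "web-speech-api"            // browser fallback
  | "none";

interface Environment {
  isNativePlatform: boolean;      // running inside the native iOS/Android shell
  speechPluginAvailable: boolean; // primary plugin reports recognition is available
  localModelAvailable: boolean;   // an on-device ML model is present
  webSpeechAvailable: boolean;    // browser exposes a speech recognition API
}

// Walk the options in the order the plan lists them.
function selectBackend(env: Environment): SpeechBackend {
  if (env.isNativePlatform && env.speechPluginAvailable) return "capacitor-plugin";
  if (env.isNativePlatform && env.localModelAvailable) return "recorder-with-local-model";
  if (env.webSpeechAvailable) return "web-speech-api";
  return "none";
}
```

Returning "none" rather than throwing lets the UI hide the microphone button and leave typing as the only input method.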

Initial Implementation Steps

Install and configure Capacitor speech recognition plugin
Request and handle microphone permissions
Implement basic start/stop recording functionality
Display recording status indicator to user
Test on both iOS and Android devices

Basic Recording Interface

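interface VoiceRecorder {
  // Begin listening; resolves once recognition has started.
  start(): Promise<void>;
  // Stop listening and resolve with the final recognition result.
  stop(): Promise<SpeechRecognitionResult>;
  // Abort the current session and discard any partial result.
  cancel(): Promise<void>;
  // Whether speech recognition is available on this platform.
  isSupported(): boolean;
  // Whether microphone permission is currently granted.
  hasPermission(): Promise<boolean>;
  // Prompt for microphone permission; resolves to true if granted.
  requestPermission(): Promise<boolean>;
}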

Quick Win Test:

Record audio and verify it's captured
Display transcript result in console/log
No form integration yet - just prove recording works

1.2 Permission Management

Objective: Handle microphone permissions gracefully across platforms.

Platform-Specific Considerations:

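iOS: Info.plist requires NSMicrophoneUsageDescription and, when the Speech framework is used, NSSpeechRecognitionUsageDescription
Android: RECORD_AUDIO permission declared in AndroidManifest.xml and requested at runtime
Web: Microphone access requires a secure context (HTTPS) and should be triggered by a user gesture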

Permission Flow:

Check permission status on component mount
Request permission if not granted
Show clear error messages if permission denied
Provide settings link to enable permissions manually
Handle permission state changes (user grants/denies during session)
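The permission flow above reduces to a decision on the current permission status. The sketch below assumes a simplified three-value status (`MicPermission`) and hypothetical action names; real platforms may report additional states, and the actual status would come from the plugin or browser rather than being passed in by hand.

```typescript
// Simplified, hypothetical status and action types for illustration.
type MicPermission = "granted" | "denied" | "prompt";

type PermissionAction =
  | { kind: "ready" }                                   // recording may start
  | { kind: "request" }                                 // show the system prompt
  | { kind: "show-denied-message"; offerSettingsLink: boolean };

// Decide what the component should do for a given permission status.
function nextPermissionAction(current: MicPermission): PermissionAction {
  switch (current) {
    case "granted":
      return { kind: "ready" };
    case "prompt":
      return { kind: "request" };
    case "denied":
      // A denied permission usually cannot be re-prompted, so point to settings.
      return { kind: "show-denied-message", offerSettingsLink: true };
  }
}
```

Calling this on component mount and again whenever the status changes covers the case where the user grants or revokes permission during a session.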

Quick Win Test:

Verify permission prompts appear correctly
Test denial handling (show user-friendly message)
Test re-requesting permissions after denial

1.3 Basic UI Components

Objective: Create minimal UI elements for voice recording.

Components Needed:

Record Button: Visual indicator with recording state (idle/recording/processing)
Status Indicator: Show "Listening..." or "Processing..." feedback
Cancel Button: Allow user to abort recording
Visual Feedback: Animated microphone icon or waveform during recording
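The idle/recording/processing states of the record button can be modeled as a small state machine. This is a minimal sketch with hypothetical event names; the real component would dispatch these events from button taps and recognizer callbacks.

```typescript
type RecorderState = "idle" | "recording" | "processing";
type RecorderEvent = "start" | "stop" | "cancel" | "result" | "error";

// Pure transition function: given the current state and an event, return the next state.
function nextRecorderState(state: RecorderState, event: RecorderEvent): RecorderState {
  if (event === "cancel" || event === "error") return "idle";
  if (state === "idle" && event === "start") return "recording";
  if (state === "recording" && event === "stop") return "processing";
  if (state === "processing" && event === "result") return "idle";
  return state; // ignore events that do not apply in the current state
}

// Status indicator text for each state, matching the labels in the plan.
const statusLabel: Record<RecorderState, string> = {
  idle: "",
  recording: "Listening...",
  processing: "Processing...",
};
```

Keeping the transitions in one pure function makes the button, status indicator, and cancel button easy to test without a microphone.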

Quick Win Test:

Create standalone component with record button
Toggle recording state visually
Display transcript result below button (simple div)

Task 2: Speech-to-Text Transcription

2.1 On-Device Transcription Setup

Objective: Configure on-device speech recognition to avoid cloud processing.

Implementation Approach:

Use device-native speech recognition APIs through Capacitor
Configure for offline/on-device processing when available
Fall back to online recognition if on-device unavailable (with user consent)
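The on-device-first rule with a consent-gated online fallback can be sketched as follows. The function and parameter names are hypothetical, and how on-device availability is detected depends on the platform.

```typescript
type RecognitionMode = "on-device" | "online" | "unavailable";

// Prefer on-device recognition; use online recognition only with explicit consent.
function chooseRecognitionMode(
  onDeviceAvailable: boolean,
  networkAvailable: boolean,
  userConsentedToOnline: boolean,
): RecognitionMode {
  if (onDeviceAvailable) return "on-device";
  if (networkAvailable && userConsentedToOnline) return "online";
  return "unavailable";
}
```

When the result is "unavailable", the form would simply keep manual typing as the input method.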

Platform Capabilities:

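iOS: Speech framework (SFSpeechRecognizer; on-device recognition on iOS 13+ for supported languages)
Android: SpeechRecognizer API (on-device recognition depends on the device and installed language packs; RecognizerIntent.EXTRA_PREFER_OFFLINE is available on API 23+, and a dedicated on-device recognizer on Android 12+)
Web: Web Speech API (typically cloud-based, but can use local models in some browsers)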

Configuration Options:

Language selection (default to device language, allow override)
Continuous vs. single-shot recognition
Interim results vs. final results only
Confidence thresholds for transcription quality
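One way to hold these options is a single configuration object with defaults. The sketch below is an assumption about shape only: the field names, the "en-US" fallback, and the 0.6 threshold are illustrative values, not settings taken from any plugin.

```typescript
interface RecognitionConfig {
  language: string;        // BCP 47 tag such as "en-US"
  continuous: boolean;     // continuous vs. single-shot recognition
  interimResults: boolean; // interim results vs. final results only
  minConfidence: number;   // 0..1, below this a transcript is rejected
}

// Illustrative defaults (assumed values).
const FALLBACK_CONFIG: RecognitionConfig = {
  language: "en-US",
  continuous: false,
  interimResults: true,
  minConfidence: 0.6,
};

// Default to the device language when known, then apply user overrides.
function resolveConfig(
  deviceLanguage: string | null,
  overrides: Partial<RecognitionConfig> = {},
): RecognitionConfig {
  const base = deviceLanguage ? { ...FALLBACK_CONFIG, language: deviceLanguage } : FALLBACK_CONFIG;
  return { ...base, ...overrides };
}

interface Candidate {
  transcript: string;
  confidence: number;
}

// Pick the most confident candidate, or null if none meets the threshold.
function pickTranscript(candidates: Candidate[], config: RecognitionConfig): string | null {
  let best: Candidate | null = null;
  for (const c of candidates) {
    if (best === null || c.confidence > best.confidence) best = c;
  }
  return best !== null && best.confidence >= config.minConfidence ? best.transcript : null;
}
```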

Quick Win Test:

Speak simple phrases and verify transcription accuracy
Test with different languages if device supports
Compare on-device vs. online transcription quality

2.2 Text Processing & Formatting

Objective: Clean and format transcribed text for form fields.

Text Processing Steps:

Capitalization: Proper sentence capitalization (first letter, proper nouns)
Punctuation: Add periods at sentence endings if missing
Filler removal: Strip filler words such as "um" and "uh", and "like" only when it is used as a filler (optional, user preference)
Normalization: Fix common speech-to-text errors (homophones, numbers)
Formatting: Handle line breaks for longer descriptions

Processing Pipeline:

interface TextProcessor {
  clean(text: string): string;
  capitalize(text: string): string;
  addPunctuation(text: string): string;
  removeFillers(text: string): string;
  normalize(text: string): string;
}
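A minimal sketch of one possible implementation of this interface is shown below. It is deliberately conservative: `removeFillers` only strips "um" and "uh" (removing "like" safely needs more context), and `normalize` merely collapses whitespace as a stand-in for the homophone and number fixes described above. The step order inside `clean` is an assumption.

```typescript
interface TextProcessor {
  clean(text: string): string;
  capitalize(text: string): string;
  addPunctuation(text: string): string;
  removeFillers(text: string): string;
  normalize(text: string): string;
}

// Stand-in normalization: collapse runs of whitespace.
function normalize(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Remove "um"/"uh" (and stretched forms like "umm") as whole words.
function removeFillers(text: string): string {
  return text.replace(/\b(?:um+|uh+)\b,?\s*/gi, "").trim();
}

// Capitalize the pronoun "I" and the first letter of each sentence.
function capitalize(text: string): string {
  return text
    .replace(/\bi\b/g, "I")
    .replace(/(^|[.!?]\s+)([a-z])/g, (_m, lead: string, ch: string) => lead + ch.toUpperCase());
}

// Add a closing period when the text has no terminal punctuation.
function addPunctuation(text: string): string {
  const trimmed = text.trim();
  if (trimmed === "" || /[.!?]$/.test(trimmed)) return trimmed;
  return trimmed + ".";
}

const basicProcessor: TextProcessor = {
  normalize,
  removeFillers,
  capitalize,
  addPunctuation,
  // Full pipeline: normalize, strip fillers, capitalize, then punctuate.
  clean: (text) => addPunctuation(capitalize(removeFillers(normalize(text)))),
};
```

For example, `basicProcessor.clean("um i help with  gardening. uh call me anytime")` yields `"I help with gardening. Call me anytime."`.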

Quick Win Test:

Input test transcript with common issues
Verify each processing step works independently
Test with real transcriptions from speech recognition

2.3 Error Handling & Retry Logic

Objective: Handle transcription failures gracefully.

Error Scenarios:

No speech detected (silence or background noise only)
Network unavailable (if using cloud fallback)
Recognition timeout
Low confidence transcription
Permission revoked during recording

Recovery Strategies:

Show clear error messages with actionable guidance
Allow user to retry recording easily
Offer manual text input as fallback
Save partial results if transcription interrupted
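The error scenarios and recovery strategies above can be paired in a lookup table so the UI handles every case the same way. The error codes, field names, and message wording here are all hypothetical placeholders.

```typescript
type TranscriptionError =
  | "no-speech"
  | "network"
  | "timeout"
  | "low-confidence"
  | "permission-revoked";

interface Recovery {
  message: string;           // user-facing guidance (placeholder wording)
  canRetry: boolean;         // show a "try again" button
  offerManualInput: boolean; // point the user to typing instead
  keepPartialText: boolean;  // preserve whatever was transcribed so far
}

const RECOVERY: Record<TranscriptionError, Recovery> = {
  "no-speech": { message: "We didn't hear anything. Try speaking closer to the microphone.", canRetry: true, offerManualInput: true, keepPartialText: false },
  "network": { message: "Online transcription is unavailable right now.", canRetry: true, offerManualInput: true, keepPartialText: true },
  "timeout": { message: "Transcription took too long.", canRetry: true, offerManualInput: true, keepPartialText: true },
  "low-confidence": { message: "We're not sure we heard that correctly. Please review the text.", canRetry: true, offerManualInput: true, keepPartialText: true },
  "permission-revoked": { message: "Microphone access was turned off. Enable it in settings to continue.", canRetry: false, offerManualInput: true, keepPartialText: true },
};

function recoveryFor(error: TranscriptionError): Recovery {
  return RECOVERY[error];
}
```

Because the table is typed as `Record<TranscriptionError, Recovery>`, adding a new error code without a recovery entry becomes a compile-time error.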

Quick Win Test:

Simulate each error condition
Verify error messages are user-friendly
Test retry functionality

Task 3: Profile Description Integration

3.1 Profile Form Integration

Objective: Integrate voice input into profile description editing.

Integration Points:

Add microphone button next to profile description textarea
Replace or append to existing text based on user preference
Maintain existing form validation and submission flow
Preserve text if user switches between voice and typing

User Experience Flow:

User taps microphone icon next to description field
Recording starts, button shows recording state
User speaks description
User taps stop button (or auto-stops after pause)
Transcribed text appears in description field
User can edit transcribed text manually if needed
User can record again to replace or append
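The last step of this flow, replacing or appending to the existing description, can be sketched as a pure function. The `InsertMode` type and function name are hypothetical.

```typescript
type InsertMode = "replace" | "append";

// Merge a new transcript into the description field according to user preference.
function applyTranscript(existing: string, transcript: string, mode: InsertMode): string {
  const incoming = transcript.trim();
  if (incoming === "") return existing; // nothing recognized: keep what the user has
  if (mode === "replace" || existing.trim() === "") return incoming;
  return existing.trimEnd() + " " + incoming;
}
```

Returning the existing text unchanged on an empty transcript means a failed recording never wipes out text the user already typed.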

Form State Management:

Track whether text came from voice or typing
Allow editing of transcribed text
Handle form submission with voice-generated text
Save draft state including voice transcriptions
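Tracking where the text came from might look like the sketch below. The `DescriptionDraft` shape and the three source labels are assumptions for illustration; the draft is serialized as JSON so it can be stored wherever form drafts are kept.

```typescript
type TextSource = "typed" | "voice" | "voice-edited";

interface DescriptionDraft {
  text: string;
  source: TextSource;
}

// A fresh transcription replaces the draft and marks it as voice-generated.
function draftFromVoice(transcript: string): DescriptionDraft {
  return { text: transcript, source: "voice" };
}

// A manual edit keeps "typed" drafts typed and marks voice drafts as edited.
function applyManualEdit(draft: DescriptionDraft, text: string): DescriptionDraft {
  if (text === draft.text) return draft;
  const source = draft.source === "typed" ? "typed" : "voice-edited";
  return { text, source };
}

// Serialize and restore the draft, including its origin.
function saveDraft(draft: DescriptionDraft): string {
  return JSON.stringify(draft);
}

function loadDraft(saved: string): DescriptionDraft {
  return JSON.parse(saved) as DescriptionDraft;
}
```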

Quick Win Test:

Add microphone button to profile form
Record and fill description field
Submit form and verify description saves correctly

3.2 Profile-Specific Text Processing

Objective: Optimize transcription processing for profile descriptions.

Profile Context Awareness:

Profile descriptions are typically first-person ("I am...", "I enjoy...")
May include skills, interests, location, availability
Often include action-oriented language

Optimization Strategies:

Recognize common profile phrases and ensure proper formatting
Handle personal pronouns appropriately
Format lists and bullet points if user says "first", "second", etc.
Preserve natural language flow while cleaning up speech artifacts
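As an illustration of the list-formatting strategy, the sketch below turns spoken ordinals into dash-prefixed list items. It is a naive heuristic with hypothetical names: it only acts when at least two ordinal markers are present, and it can still misfire on ordinary prose that happens to contain "first" and "second", so its output should remain editable.

```typescript
const ORDINALS = ["first", "second", "third", "fourth", "fifth"];

// Convert "... first X second Y" into an intro line followed by "- X" / "- Y".
function formatSpokenList(text: string): string {
  const marker = new RegExp(`\\b(?:${ORDINALS.join("|")})\\b[,:]?\\s*`, "gi");
  const parts = text.split(marker);
  const intro = (parts[0] ?? "").trim();
  const items = parts
    .slice(1)
    .map((p) => p.trim().replace(/[,;]$/, ""))
    .filter((p) => p !== "");
  // A single ordinal ("my first job") is treated as ordinary prose.
  if (items.length < 2) return text;
  return [intro, ...items.map((item) => "- " + item)]
    .filter((line) => line !== "")
    .join("\n");
}
```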

Quick Win Test:

Test with typical profile description phrases
Verify formatting looks natural in profile preview
Compare user satisfaction with typed vs. voice descriptions

Task 4: Gift Description Integration

4.1 Gift Form Integration

Objective: Integrate voice input into gift (Give) description editing.

Integration Points:

Add microphone button to gift description/impact text fields
Enable microphone while camera is active
Support both "what was given" and "impact" descriptions
Handle gifts that may be time-based or physical items
Support longer descriptions that describe impact and outcomes

User Experience Flow: Similar to profile flow, but optimized for gift context:

User initiates camera & microphone
1.1. Tap a button to begin
1.2. Allow a shortcut from the desktop to go directly to this page
Collect information from audio (e.g., the description at first)
User taps the camera button to take a picture, which also ends the audio capture
Transcribed text fills the description field
User reviews and submits

Quick Win Test:

Enable camera & microphone button by default when gifting
Verify gift submission works with voice transcriptions

4.2 Gift-Specific Text Processing

Objective: Optimize transcription for gift/impact descriptions.

Gift Context Awareness:

Gift descriptions often include recipient names or pronouns
Action-oriented language ("I helped", "I gave", "we organized")
Impact descriptions may include outcomes and results
May reference time commitments or physical items

Optimization Strategies:

Recognize gift-related phrases and terminology
Handle recipient references appropriately
Format action verbs naturally
Support both past-tense (completed gifts) and present-tense (ongoing) descriptions

Quick Win Test:

Test with various gift description styles
Verify descriptions format well
Verify that other fields, such as giver and recipient, are filled in

Task 5: Advanced Features & Optimization

5.1 Voice Command Recognition

Objective: Enable voice commands for common actions (future enhancement).

Potential Commands:

"Next field" - Move to next form field
"Delete that" - Remove last transcribed segment
"Add period" - Add punctuation
"Start over" - Clear current field and restart
"Save draft" - Save current form state

Quick Win Test (if implemented):

Test single command recognition
Verify commands work during recording
Test command accuracy
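A minimal sketch of recognizing the potential commands listed above follows. It assumes the recognizer delivers speech in segments and treats a segment as a command only when the whole segment matches a phrase, so "start over" inside a longer sentence stays ordinary text. The `VoiceCommand` identifiers are hypothetical.

```typescript
type VoiceCommand = "next-field" | "delete-last" | "add-period" | "start-over" | "save-draft";

// Phrase-to-command table taken from the commands listed in the plan.
const COMMAND_PHRASES = new Map<string, VoiceCommand>([
  ["next field", "next-field"],
  ["delete that", "delete-last"],
  ["add period", "add-period"],
  ["start over", "start-over"],
  ["save draft", "save-draft"],
]);

// Return the command for a segment, or null if the segment is ordinary dictation.
function parseCommand(segment: string): VoiceCommand | null {
  const key = segment.trim().toLowerCase().replace(/[.!?,]+$/, "");
  return COMMAND_PHRASES.get(key) ?? null;
}
```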

5.2 Editing & Correction Interface

Objective: Make it easy to correct transcription errors.

Editing Features:

Highlight uncertain words (low confidence transcriptions)
Allow inline editing of transcribed text
Quick correction suggestions for common errors
Undo/redo for transcription changes
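Highlighting uncertain words could start from something like the sketch below. It assumes the recognizer exposes a per-word confidence score, which not every platform does, and uses a bracketed plain-text marker in place of real UI highlighting. The 0.6 threshold is an arbitrary illustrative value.

```typescript
interface RecognizedWord {
  text: string;
  confidence: number; // 0..1, assumed to be provided per word
}

// Render a transcript with low-confidence words wrapped as "[word?]".
function renderWithUncertainty(words: RecognizedWord[], threshold = 0.6): string {
  return words
    .map((w) => (w.confidence < threshold ? `[${w.text}?]` : w.text))
    .join(" ");
}
```

In the actual form, the same predicate would drive a highlight style on each word instead of bracket markers.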

Quick Win Test:

Show confidence scores for words
Allow clicking on words to edit
Test undo functionality

5.3 Performance Optimization

Objective: Optimize voice input for speed and efficiency.

Optimization Areas:

Reduce latency between recording and transcription display
Implement incremental transcription (show words as recognized)
Cache recognition models to avoid re-initialization
Optimize audio processing for faster transcription
Reduce battery consumption during recording
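Incremental transcription (showing words as they are recognized) generally means combining committed final segments with the latest interim guess. The sketch below assumes the recognizer reports updates flagged as interim or final; the class and field names are hypothetical.

```typescript
interface TranscriptUpdate {
  text: string;
  isFinal: boolean; // false for interim guesses that may still change
}

class IncrementalTranscript {
  private committed: string[] = [];
  private pending = "";

  // Final text is appended permanently; interim text replaces the previous guess.
  update(u: TranscriptUpdate): void {
    const text = u.text.trim();
    if (u.isFinal) {
      if (text !== "") this.committed.push(text);
      this.pending = "";
    } else {
      this.pending = text;
    }
  }

  // Text to display right now: all final segments plus the current interim guess.
  get text(): string {
    return [...this.committed, this.pending].filter((s) => s !== "").join(" ");
  }
}
```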

Quick Win Test:

Measure time from stop recording to text display
Monitor battery usage during extended recording
Test performance on lower-end devices

Implementation Phases

Phase 1: Proof of Concept (1-2 weeks)

Goal: Prove voice recording and basic transcription works

Install Capacitor speech recognition plugin
Create standalone test component with record button
Test microphone permissions on iOS and Android
Verify audio recording works
Display raw transcription output
Quick Win: Successfully record and see transcript in console

Deliverables: Working prototype that can record and show transcription

Phase 2: Basic Text Processing (1 week)

Goal: Clean and format transcribed text

Implement basic text cleaning (capitalization, punctuation)
Create text processing utility functions
Test with various transcriptions
Add unit tests for text processing
Quick Win: Clean transcription appears formatted correctly

Deliverables: Text processing utilities with tests

Phase 3: Profile Form Integration (2 weeks)

Goal: Voice input works in profile description field

Add microphone button to profile form
Integrate recording with description textarea
Handle form state (voice vs. typed text)
Test form submission with voice transcriptions
Add UI feedback (recording indicator, status messages)
Handle errors gracefully (permissions, recognition failures)
Quick Win: Can record and fill profile description, submit successfully

Deliverables: Working voice input in profile forms

Phase 4: Gift Form Integration (1-2 weeks)

Goal: Voice input works in gift description fields

Add microphone buttons to gift form fields
Integrate with both "what was given" and "impact" fields
Optimize text processing for gift context
Test with various gift types and descriptions
Quick Win: Can record and fill gift descriptions, submit successfully

Deliverables: Working voice input in gift forms

Phase 5: Polish & Enhancement (2-3 weeks)

Goal: Improve UX and handle edge cases

Add loading states and animations
Implement retry logic for failed transcriptions
Add editing capabilities for transcribed text
Optimize performance and battery usage
Add accessibility improvements
Comprehensive testing on multiple devices
User testing and feedback incorporation

Deliverables: Production-ready voice input feature

Technical Considerations

Privacy & Security

Prioritize on-device processing to avoid sending audio to servers
Request explicit user consent before using any cloud-based recognition
Ensure audio data is not stored permanently
Clear audio buffers after transcription
Document privacy implications in user-facing documentation

Accessibility

Ensure voice input doesn't interfere with screen readers
Provide keyboard shortcuts for recording start/stop
Support alternative input methods (voice should complement, not replace typing)
Test with users who have speech impairments

Performance

Minimize battery usage during recording
Optimize for lower-end devices
Handle network connectivity changes gracefully
Implement timeout mechanisms to prevent infinite recording
Consider limiting maximum recording duration
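The timeout and maximum-duration points can be checked with a small pure function called from a periodic timer. The limits below (two minutes maximum, three seconds of silence) are assumed placeholder values, and the names are hypothetical.

```typescript
interface RecordingLimits {
  maxDurationMs: number;    // hard cap on a single recording
  silenceTimeoutMs: number; // auto-stop after this much silence
}

// Placeholder limits; real values would come from testing.
const DEFAULT_LIMITS: RecordingLimits = { maxDurationMs: 120000, silenceTimeoutMs: 3000 };

type AutoStopReason = "max-duration" | "silence";

// Return why recording should stop now, or null to keep recording.
function autoStopReason(
  nowMs: number,
  startedAtMs: number,
  lastSpeechAtMs: number,
  limits: RecordingLimits = DEFAULT_LIMITS,
): AutoStopReason | null {
  if (nowMs - startedAtMs >= limits.maxDurationMs) return "max-duration";
  if (nowMs - lastSpeechAtMs >= limits.silenceTimeoutMs) return "silence";
  return null;
}
```

Passing timestamps in rather than reading the clock inside the function keeps the check deterministic and easy to unit test.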

Platform Compatibility

Test on iOS (multiple versions) and Android (multiple versions)
Provide graceful degradation for unsupported platforms
Handle platform-specific permission models
Test in both Capacitor native and web environments
Support browser environments with Web Speech API fallback

User Experience

Provide clear visual feedback during recording
Make it easy to cancel or retry recordings
Allow editing of transcribed text
Don't force voice input - always allow typing as alternative
Provide helpful error messages with actionable guidance
Consider adding voice input tutorials or onboarding

Testing Strategy

Unit tests for text processing utilities
Integration tests for form submission with voice transcriptions
Device testing on physical iOS and Android devices
Test with various accents and speech patterns
Test error scenarios (no permission, network issues, etc.)
User acceptance testing with real users
