Commit Graph

8 Commits

Author SHA1 Message Date
Matthew Raymer
ac39255672 test(android-test-app): unify presentation framework with evidence collection
Implement P0-P5 directives for operator clarity, consistent outcomes, and
easy evidence capture across all test phases.

Changes:
- alarm-test-lib.sh: Add evidence collection (capture_alarms, capture_logcat,
  capture_screenshot), verdict functions (verdict_pass/warn/fail), run directory
  management, and release gating support (RELEASE_GATE_PHASE3)

- test-phase1.sh: Refactor to unified framework with CLI modes (--setup,
  --run, --smoke, --all, --ci), micro-prompts, evidence capture, and verdict
  blocks for all 5 tests

- test-phase2.sh: Add evidence capture, verdict blocks, and STRICTNESS policy
  (soft/hard) for warn vs fail behavior

- test-phase3.sh: Add evidence capture, verdict blocks, release gating
  (--gate-phase3), and fatigue reduction (time estimates, automation hints)

- RUNBOOK-TESTING.md: New comprehensive operator guide (669 lines) covering
  prerequisites, all phases, evidence locations, verdict interpretation,
  common failures, and troubleshooting
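The verdict functions and STRICTNESS policy listed above might look roughly like the following. This is a minimal sketch, not the harness's code. It assumes that "soft" reports a warning and continues, while "hard" escalates warnings to failures. The counters, message text and overall_status helper are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of verdict helpers with a STRICTNESS policy (assumed semantics:
# soft = warnings are reported only, hard = warnings count as failures).
STRICTNESS="${STRICTNESS:-soft}"
PASS_COUNT=0
WARN_COUNT=0
FAIL_COUNT=0

verdict_pass() {
  PASS_COUNT=$((PASS_COUNT + 1))
  echo "VERDICT: PASS - $1"
}

verdict_fail() {
  FAIL_COUNT=$((FAIL_COUNT + 1))
  echo "VERDICT: FAIL - $1"
}

verdict_warn() {
  if [ "$STRICTNESS" = "hard" ]; then
    # Hard mode: escalate the warning to a failure.
    verdict_fail "$1 (escalated: STRICTNESS=hard)"
  else
    WARN_COUNT=$((WARN_COUNT + 1))
    echo "VERDICT: WARN - $1"
  fi
}

# Hypothetical run-level status: succeeds only if nothing failed.
overall_status() {
  [ "$FAIL_COUNT" -eq 0 ]
}
```

Call the verdict functions directly rather than inside `$(...)`. A command substitution runs in a subshell, so its counter updates would be lost.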

All test scripts now use consistent UI helpers (section, substep, info, ok,
warn, error), standardized evidence collection, and clear verdict reporting.
Evidence is saved to timestamped run directories (runs/<RUN_ID>/) with alarms,
logs, and screenshots organized by test phase and scenario.
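The runs/<RUN_ID>/ layout described above can be sketched as follows. The per-phase, per-scenario subdirectory scheme, the RUNS_ROOT variable and the evidence_dir helper are assumptions. Only capture_alarms, ADB_BIN and the runs/<RUN_ID>/ root come from the log. `adb shell dumpsys alarm` is the standard way to dump AlarmManager state.

```bash
#!/usr/bin/env bash
# Sketch: timestamped evidence directories (layout is an assumption).
ADB_BIN="${ADB_BIN:-adb}"
RUNS_ROOT="${RUNS_ROOT:-runs}"
RUN_ID="${RUN_ID:-$(date -u +%Y%m%dT%H%M%SZ)}"

# Hypothetical helper: create and print runs/<RUN_ID>/<phase>/<scenario>/<kind>.
evidence_dir() {
  local phase="$1" scenario="$2" kind="$3"
  local dir="$RUNS_ROOT/$RUN_ID/$phase/$scenario/$kind"
  mkdir -p "$dir"
  printf '%s\n' "$dir"
}

# Save the current AlarmManager dump under the alarms/ directory and
# print the path of the file written.
capture_alarms() {
  local phase="$1" scenario="$2" label="$3"
  local out
  out="$(evidence_dir "$phase" "$scenario" alarms)/${label}.txt"
  "$ADB_BIN" shell dumpsys alarm > "$out"
  printf '%s\n' "$out"
}
```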

Tests pass with consistent presentation and reproducible evidence collection.
2025-12-24 12:01:16 +00:00
Matthew Raymer
1053b668d0 test(phase1): automate TEST 4 invalid data handling verification
Implements automated testing for TEST 4 (Invalid Data Handling) to verify
recovery gracefully handles invalid database entries without crashing.

Changes:
- Add injectInvalidTestData plugin method for injecting invalid test data
  (empty schedule IDs, null nextRunAt, empty notification IDs)
- Make test app debuggable to enable direct database access
- Enhance test-phase1.sh with automated database injection and verification:
  * Detect debuggable app status (check for DEBUGGABLE flag)
  * Inject invalid data via direct SQL (schedules and notifications)
  * Handle WAL mode with checkpoint
  * Verify data injection success
  * Trigger recovery and check logs for "Skipping invalid" messages
  * Report pass/fail/inconclusive results

Fixes database constraint issues discovered during testing:
- Include jitterMs and backoffPolicy in schedule inserts
- Include priority, vibration_enabled, sound_enabled in notification inserts

Test results: PASSED
- Invalid data successfully injected
- Cold start recovery correctly skips invalid entries
- Recovery completes without crashing
- Boot recovery still processes invalid entries instead of skipping them
  (follow-up improvement needed)

This enables automated verification that recovery handles corrupted or
invalid database entries gracefully, preventing crashes in production.
2025-12-08 07:06:00 +00:00
Matthew Raymer
5bdb6979e1 fix(android): enforce one-per-day semantics in scheduleDailyNotification
Fix duplicate alarm bug where updating the schedule time created multiple
schedules in the database, violating the "one notification per day" contract.

Plugin Changes:
- Use stable scheduleId "daily_notification" instead of timestamp-based IDs
- Delete all existing notification schedules before creating new one
- Cancel alarms in AlarmManager before database deletion
- Add detailed logging for cleanup operations
- Make scheduleDailyReminder delegate to scheduleDailyNotification

Test Harness Changes:
- Make TEST 2 fail when alarm count > 1 after schedule update
- Make TEST 2 fail when alarm count > 1 after recovery
- Add clear failure messages explaining "one per day" violation
- Add final verdict section with detailed failure summary
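The TEST 2 rule above can be sketched as a small check run at each checkpoint. The function name is hypothetical. The sketch encodes only the rule stated here: more than one alarm fails. It does not treat a count of 0 as a failure, because the log does not specify that case for TEST 2.

```bash
#!/usr/bin/env bash
# Sketch: TEST 2 assertion at a checkpoint such as "after schedule update"
# or "after recovery". Fails only when more than one plugin alarm exists.
check_one_per_day() {
  local count="$1" checkpoint="$2"
  if [ "$count" -gt 1 ]; then
    echo "FAIL ($checkpoint): $count alarms found; expected at most 1 (one notification per day)"
    return 1
  fi
  echo "OK ($checkpoint): $count alarm(s)"
  return 0
}
```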

Results:
- Before: 2-3 alarms, 2 schedules in DB, "Pending: 2" in UI
- After: 1 alarm, 1 schedule in DB, "Pending: 1" in UI
- TEST 2 now correctly passes with proper validation

This ensures that updating schedule time maintains exactly one alarm
per day, preventing duplicate notifications and database bloat.
2025-12-08 06:36:16 +00:00
Matthew Raymer
ca194952e4 test(android): add auto-reset for TEST 1 and create golden run documentation
Add automatic app state reset for TEST 1 to ensure a clean starting state when
lingering alarms from TEST 0 are detected. Create PHASE1_TEST1_GOLDEN.md with
actual values from a successful run.

TEST 1 Auto-Reset:
- Detect lingering plugin alarms before TEST 1 starts
- Automatically uninstall/reinstall app to clear alarms
- Verify clean state (0 alarms) before proceeding
- Gracefully skip TEST 1 if clean state cannot be achieved
- Take failure screenshots when reset fails
- Wrap all TEST 1 steps in conditional to skip on reset failure
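The auto-reset gate above can be sketched as a single function that TEST 1 runs behind. The sketch makes two assumptions:

- The harness provides get_plugin_alarm_count, which is named elsewhere in this log.
- A helper, here called reinstall_app, uninstalls and reinstalls the test app. That name is hypothetical.

The function returns 0 when TEST 1 may run and 1 when it should be skipped.

```bash
#!/usr/bin/env bash
# Sketch of the TEST 1 clean-state gate. Relies on get_plugin_alarm_count
# and a hypothetical reinstall_app helper supplied by the harness.
ensure_clean_state_for_test1() {
  local count
  count="$(get_plugin_alarm_count)"
  if [ "$count" -eq 0 ]; then
    return 0
  fi
  echo "Found $count lingering plugin alarm(s); reinstalling app"
  reinstall_app || return 1
  # Verify the reset actually produced 0 alarms before continuing.
  count="$(get_plugin_alarm_count)"
  if [ "$count" -ne 0 ]; then
    echo "Still $count alarm(s) after reset; skipping TEST 1"
    return 1
  fi
  return 0
}

# Usage idea: run every TEST 1 step only when the gate passes, and take a
# failure screenshot in the else branch (arguments depend on the harness).
```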

Documentation:
- Create PHASE1_TEST1_GOLDEN.md with actual values from passing run
- Document auto-reset behavior in golden run steps
- Add cross-references between TEST 0 and TEST 1 golden docs
- Include actual timestamps, scheduleIds, and recovery metrics

This ensures TEST 1 always starts from a known clean state, making test
results reliable and reproducible. The golden doc serves as a baseline for
comparing future TEST 1 runs.
2025-12-04 10:22:35 +00:00
Matthew Raymer
1103513db3 test(android): fix alarm counting logic and add screenshot capture
Fix alarm counting to correctly parse dumpsys output where app ID and
action appear on different lines. Add screenshot capture for test
diagnostics and create golden run documentation.

Test Harness Improvements:
- Fix get_plugin_alarm_count() to track app ID and action separately
  across alarm block lines (fixes false 0-count bug)
- Add show_plugin_alarms_compact() to display complete alarm blocks
- Add wait_for_stable_plugin_alarm_count() polling helper to reduce
  race condition false negatives
- Add take_screenshot() and take_failure_screenshot() helpers for
  automatic test state capture
- Integrate screenshots into TEST 0 at key checkpoints
- Update TEST 0 messaging to handle race conditions gracefully
- Add screenshots/ to .gitignore
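A polling helper like wait_for_stable_plugin_alarm_count() could work as sketched below. It waits until the count returns the same value several times in a row, which avoids judging a transient 0 or 2 mid-reschedule. The argument order and defaults (max polls, required stable reads, interval in seconds) are assumptions, not the harness's actual signature.

```bash
#!/usr/bin/env bash
# Sketch: poll get_plugin_alarm_count until it returns the same value
# `stable_reads` times in a row, or give up after `max_polls` reads.
# Prints the last observed count; returns 1 if it never stabilized.
wait_for_stable_plugin_alarm_count() {
  local max_polls="${1:-10}" stable_reads="${2:-3}" interval="${3:-1}"
  local last="" current streak=0 i
  for ((i = 0; i < max_polls; i++)); do
    current="$(get_plugin_alarm_count)"
    if [ "$current" = "$last" ]; then
      streak=$((streak + 1))
    else
      streak=1
      last="$current"
    fi
    if [ "$streak" -ge "$stable_reads" ]; then
      echo "$current"
      return 0
    fi
    sleep "$interval"
  done
  echo "$last"
  return 1
}
```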

Documentation:
- Create PHASE1_TEST0_GOLDEN.md with actual values from successful run
- Document expected script output, UI state, dumpsys shape, and logcat
  patterns
- Include pass/fail checklist for future test runs

This fixes the issue where alarm counting always returned 0 because the
AWK logic required app ID and action on the same line, but dumpsys
output has them on separate lines (header line has app ID, tag line
has action).
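The cross-line fix can be sketched with AWK. The function remembers whether the current alarm's header line names the app, then checks the following tag= line for the action. The dumpsys line shapes in the test are an assumed, simplified format, and com.example.testapp and its NOTIFY action are placeholder identifiers. This is not the harness's actual get_plugin_alarm_count().

```bash
#!/usr/bin/env bash
# Sketch: count alarms whose header line (containing "Alarm{") names APP_ID
# and whose following tag= line names ACTION. Reads `dumpsys alarm` on stdin.
count_plugin_alarms() {
  local app_id="$1" action="$2"
  awk -v app="$app_id" -v act="$action" '
    # New alarm block: remember whether this header belongs to the app.
    index($0, "Alarm{") > 0 { in_app = (index($0, app) > 0); next }
    # Tag line of the same block: count it if the action matches.
    index($0, "tag=") > 0 && in_app && index($0, act) > 0 { count++; in_app = 0 }
    END { print count + 0 }
  '
}
```

Using `index()` instead of regexes keeps the dots in package names literal.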
2025-12-04 09:28:28 +00:00
Matthew Raymer
fc2f64bae3 fix(notify): eliminate duplicate alarm scheduling and fix test harness counting
Centralize all notification alarm scheduling through NotifyReceiver.scheduleExactNotification()
with idempotence checks to prevent duplicate alarms. Implement one-alarm policy using
setAlarmClock() only. Fix test harness alarm counting to deduplicate by Alarm handle.

Plugin Changes:
- Add ScheduleSource enum to track scheduling paths (INITIAL_SETUP, ROLLOVER_ON_FIRE, etc.)
- Add DB-level idempotence check before scheduling (prevents logical duplicates)
- Add explicit alarm cancellation before scheduling (safety net)
- Implement one-alarm policy: use setAlarmClock() only, no setExact* fallbacks for same event
- Add deep logging for all AlarmManager calls (variant, requestCode, pendingIntentHash)
- Update all rollover paths (DailyNotificationReceiver, DailyNotificationWorker) to use
  centralized function with ROLLOVER_ON_FIRE source
- Add @JvmStatic annotation to scheduleExactNotification for Java interop

Test Harness Changes:
- Fix get_plugin_alarm_count() to deduplicate by Alarm handle (prevents double-counting
  same alarm in main list and "Next wake from idle" section)
- Update TEST 0 messaging: treat 0 alarms as race condition (inconclusive, not failure)
- Make post-rollover check the authoritative assertion point (only fails on >1 or 0 alarms)
- Remove redundant "Found 0 alarms - test may not be accurate" messages

This fixes the duplicate alarm bug where two distinct AlarmManager entries were created
for the same daily notification, violating the "one notification per day" contract.
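Deduplicating by Alarm handle, as described in this commit, can be sketched by recording each matching block's `Alarm{<handle>` token in a set and counting unique keys. That way an alarm listed in both the main list and the "Next wake from idle" section counts once. The line shapes in the test are an assumed, simplified dumpsys format, and the package and action names are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: count distinct Alarm handles for APP_ID/ACTION in `dumpsys alarm`
# output on stdin, so the same alarm seen in two sections counts once.
count_unique_plugin_alarms() {
  local app_id="$1" action="$2"
  awk -v app="$app_id" -v act="$action" '
    index($0, "Alarm{") > 0 {
      # Handle = text between "Alarm{" and the next space.
      s = substr($0, index($0, "Alarm{") + 6)
      handle = substr(s, 1, index(s, " ") - 1)
      in_app = (index($0, app) > 0)
      next
    }
    index($0, "tag=") > 0 && in_app && index($0, act) > 0 {
      seen[handle] = 1
      in_app = 0
    }
    END { n = 0; for (h in seen) n++; print n }
  '
}
```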
2025-12-01 10:09:54 +00:00
Matthew Raymer
07ace32982 refactor(test): extract shared helpers into alarm-test-lib.sh
Extract common helper functions from test-phase1.sh, test-phase2.sh,
and test-phase3.sh into a shared library (alarm-test-lib.sh) to reduce
code duplication and improve maintainability.

Changes:
- Create alarm-test-lib.sh with shared configuration, UI helpers, ADB
  helpers, log parsing, and test selection logic
- Refactor all three phase test scripts to source the shared library
- Remove ~200 lines of duplicated code across the three scripts
- Preserve all existing behavior, CLI arguments, and test semantics
- Maintain Phase 1 compatibility (print_* functions, VERIFY_FIRE flag)
- Update all adb references to use $ADB_BIN variable
- Standardize alarm counting to use shared count_alarms() function
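A shared library in the spirit of this refactor might be structured as sketched below. It uses a load guard, an overridable ADB_BIN, and thin compatibility shims for old names. ADB_BIN comes from the log; the guard variable, adb_shell, info and the print_info shim are hypothetical, and the real print_* names may differ.

```bash
#!/usr/bin/env bash
# alarm-test-lib.sh (sketch): shared configuration and helpers.
# The guard makes repeated sourcing harmless.
if [ -n "${ALARM_TEST_LIB_LOADED:-}" ]; then
  return 0 2>/dev/null || exit 0
fi
ALARM_TEST_LIB_LOADED=1

# Every adb call goes through $ADB_BIN so callers can point it elsewhere.
ADB_BIN="${ADB_BIN:-adb}"

adb_shell() {
  "$ADB_BIN" shell "$@"
}

# Shared UI helper (hypothetical output format).
info() { printf '[INFO] %s\n' "$*"; }

# Phase 1 compatibility shim (hypothetical name): old print_* calls
# forward to the shared helper.
print_info() { info "$@"; }

# Phase scripts would source the library relative to their own path, e.g.:
#   SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
#   source "$SCRIPT_DIR/alarm-test-lib.sh"
```

Because `ADB_BIN` is only defaulted, running a phase script with `ADB_BIN=echo` is a cheap way to see which adb commands it would issue.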

Benefits:
- Single source of truth for shared helpers
- Easier maintenance (fix once, benefits all scripts)
- Consistent behavior across all test phases
- No functional changes to test execution or results
2025-11-28 08:53:42 +00:00
Matthew Raymer
3151a1cc31 feat(android): implement Phase 1 cold start recovery
Implements cold start recovery for missed notifications and future alarm
verification/rescheduling as specified in the Phase 1 directive.

Changes:
- Add ReactivationManager.kt with cold start recovery logic
- Integrate recovery into DailyNotificationPlugin.load()
- Fix NotifyReceiver to always store NotificationContentEntity for recovery
- Add Phase 1 emulator testing guide and verification doc
- Add test-phase1.sh automated test harness

Recovery behavior:
- Detects missed notifications on app launch
- Marks missed notifications in database
- Verifies future alarms are scheduled in AlarmManager
- Reschedules missing future alarms
- Completes within 2-second timeout (non-blocking)

Test harness:
- Automated script with 4 test cases
- UI prompts for plugin configuration
- Log parsing for recovery results
- Verified on Pixel 8 API 34 emulator

Related:
- Implements: android-implementation-directive-phase1.md
- Requirements: docs/alarms/03-plugin-requirements.md §3.1.2
- Testing: docs/alarms/PHASE1-EMULATOR-TESTING.md
- Verification: docs/alarms/PHASE1-VERIFICATION.md
2025-11-27 10:01:34 +00:00