Implement P0-P5 directives for operator clarity, consistent outcomes, and
easy evidence capture across all test phases.
Changes:
- alarm-test-lib.sh: Add evidence collection (capture_alarms, capture_logcat,
capture_screenshot), verdict functions (verdict_pass/warn/fail), run directory
management, and release gating support (RELEASE_GATE_PHASE3)
- test-phase1.sh: Refactor to unified framework with CLI modes (--setup,
--run, --smoke, --all, --ci), micro-prompts, evidence capture, and verdict
blocks for all 5 tests
- test-phase2.sh: Add evidence capture, verdict blocks, and STRICTNESS policy
(soft/hard) for warn vs fail behavior
- test-phase3.sh: Add evidence capture, verdict blocks, release gating
(--gate-phase3), and fatigue reduction (time estimates, automation hints)
- RUNBOOK-TESTING.md: New comprehensive operator guide (669 lines) covering
prerequisites, all phases, evidence locations, verdict interpretation,
common failures, and troubleshooting
All test scripts now use consistent UI helpers (section, substep, info, ok,
warn, error), standardized evidence collection, and clear verdict reporting.
Evidence is saved to timestamped run directories (runs/<RUN_ID>/) with alarms,
logs, and screenshots organized by test phase and scenario.
Tests pass with consistent presentation and reproducible evidence collection.
Implements automated testing for TEST 4 (Invalid Data Handling) to verify
recovery gracefully handles invalid database entries without crashing.
Changes:
- Add injectInvalidTestData plugin method for injecting invalid test data
(empty schedule IDs, null nextRunAt, empty notification IDs)
- Make test app debuggable to enable direct database access
- Enhance test-phase1.sh with automated database injection and verification:
* Detect debuggable app status (check for DEBUGGABLE flag)
* Inject invalid data via direct SQL (schedules and notifications)
* Handle WAL mode with checkpoint
* Verify data injection success
* Trigger recovery and check logs for "Skipping invalid" messages
* Report pass/fail/inconclusive results
Fixes database constraint issues discovered during testing:
- Include jitterMs and backoffPolicy in schedule inserts
- Include priority, vibration_enabled, sound_enabled in notification inserts
Test results: ✅ PASSED
- Invalid data successfully injected
- Cold start recovery correctly skips invalid entries
- Recovery completes without crashing
- Boot recovery processes invalid data (follow-up improvement needed)
This enables automated verification that recovery handles corrupted or
invalid database entries gracefully, preventing crashes in production.
Fix duplicate alarm bug where updating schedule time created multiple
schedules in database, violating "one notification per day" contract.
Plugin Changes:
- Use stable scheduleId "daily_notification" instead of timestamp-based IDs
- Delete all existing notification schedules before creating new one
- Cancel alarms in AlarmManager before database deletion
- Add detailed logging for cleanup operations
- Make scheduleDailyReminder delegate to scheduleDailyNotification
Test Harness Changes:
- Make TEST 2 fail when alarm count > 1 after schedule update
- Make TEST 2 fail when alarm count > 1 after recovery
- Add clear failure messages explaining "one per day" violation
- Add final verdict section with detailed failure summary
Results:
- Before: 2-3 alarms, 2 schedules in DB, "Pending: 2" in UI
- After: 1 alarm, 1 schedule in DB, "Pending: 1" in UI
- TEST 2 now correctly passes with proper validation
This ensures that updating schedule time maintains exactly one alarm
per day, preventing duplicate notifications and database bloat.
Add automatic app state reset for TEST 1 to ensure clean starting state when
lingering alarms from TEST 0 are detected. Create PHASE1_TEST1_GOLDEN.md with
actual values from successful run.
TEST 1 Auto-Reset:
- Detect lingering plugin alarms before TEST 1 starts
- Automatically uninstall/reinstall app to clear alarms
- Verify clean state (0 alarms) before proceeding
- Gracefully skip TEST 1 if clean state cannot be achieved
- Take failure screenshots when reset fails
- Wrap all TEST 1 steps in conditional to skip on reset failure
Documentation:
- Create PHASE1_TEST1_GOLDEN.md with actual values from passing run
- Document auto-reset behavior in golden run steps
- Add cross-references between TEST 0 and TEST 1 golden docs
- Include actual timestamps, scheduleIds, and recovery metrics
This ensures TEST 1 always starts from a known clean state, making test
results reliable and reproducible. The golden doc serves as a baseline for
comparing future TEST 1 runs.
Fix alarm counting to correctly parse dumpsys output where app ID and
action appear on different lines. Add screenshot capture for test
diagnostics and create golden run documentation.
Test Harness Improvements:
- Fix get_plugin_alarm_count() to track app ID and action separately
across alarm block lines (fixes false 0-count bug)
- Add show_plugin_alarms_compact() to display complete alarm blocks
- Add wait_for_stable_plugin_alarm_count() polling helper to reduce
race condition false negatives
- Add take_screenshot() and take_failure_screenshot() helpers for
automatic test state capture
- Integrate screenshots into TEST 0 at key checkpoints
- Update TEST 0 messaging to handle race conditions gracefully
- Add screenshots/ to .gitignore
Documentation:
- Create PHASE1_TEST0_GOLDEN.md with actual values from successful run
- Document expected script output, UI state, dumpsys shape, and logcat
patterns
- Include pass/fail checklist for future test runs
This fixes the issue where alarm counting always returned 0 because the
AWK logic required app ID and action on the same line, but dumpsys
output has them on separate lines (header line has app ID, tag line
has action).
Centralize all notification alarm scheduling through NotifyReceiver.scheduleExactNotification()
with idempotence checks to prevent duplicate alarms. Implement one-alarm policy using
setAlarmClock() only. Fix test harness alarm counting to deduplicate by Alarm handle.
Plugin Changes:
- Add ScheduleSource enum to track scheduling paths (INITIAL_SETUP, ROLLOVER_ON_FIRE, etc.)
- Add DB-level idempotence check before scheduling (prevents logical duplicates)
- Add explicit alarm cancellation before scheduling (safety net)
- Implement one-alarm policy: use setAlarmClock() only, no setExact* fallbacks for same event
- Add deep logging for all AlarmManager calls (variant, requestCode, pendingIntentHash)
- Update all rollover paths (DailyNotificationReceiver, DailyNotificationWorker) to use
centralized function with ROLLOVER_ON_FIRE source
- Add @JvmStatic annotation to scheduleExactNotification for Java interop
Test Harness Changes:
- Fix get_plugin_alarm_count() to deduplicate by Alarm handle (prevents double-counting
same alarm in main list and "Next wake from idle" section)
- Update TEST 0 messaging: treat 0 alarms as race condition (inconclusive, not failure)
- Make post-rollover check the authoritative assertion point (only fails on >1 or 0 alarms)
- Remove redundant "Found 0 alarms - test may not be accurate" messages
This fixes the duplicate alarm bug where two distinct AlarmManager entries were created
for the same daily notification, violating the "one notification per day" contract.
Extract common helper functions from test-phase1.sh, test-phase2.sh,
and test-phase3.sh into a shared library (alarm-test-lib.sh) to reduce
code duplication and improve maintainability.
Changes:
- Create alarm-test-lib.sh with shared configuration, UI helpers, ADB
helpers, log parsing, and test selection logic
- Refactor all three phase test scripts to source the shared library
- Remove ~200 lines of duplicated code across the three scripts
- Preserve all existing behavior, CLI arguments, and test semantics
- Maintain Phase 1 compatibility (print_* functions, VERIFY_FIRE flag)
- Update all adb references to use $ADB_BIN variable
- Standardize alarm counting to use shared count_alarms() function
Benefits:
- Single source of truth for shared helpers
- Easier maintenance (fix once, benefits all scripts)
- Consistent behavior across all test phases
- No functional changes to test execution or results