# TimeSafari Daily Notification Plugin - Observability Dashboards

**Author**: Matthew Raymer
**Version**: 1.0.0
**Created**: 2025-10-08 06:08:15 UTC

## Overview

This document provides sample dashboards, queries, and monitoring configurations for the TimeSafari Daily Notification Plugin. These can be imported into your monitoring system (Grafana, DataDog, New Relic, etc.) to track plugin health and performance.

## Key Metrics

### Core Performance Metrics
- **Fetch Success Rate**: Percentage of successful content fetches
- **Notification Delivery Rate**: Percentage of notifications successfully delivered
- **Callback Success Rate**: Percentage of successful callback executions
- **Average Fetch Time**: Mean time for content fetching operations
- **Average Notification Time**: Mean time for notification delivery

### User Interaction Metrics
- **User Opt-out Rate**: Percentage of users who opt out of notifications
- **Permission Grant Rate**: Percentage of users who grant notification permissions
- **Permission Denial Rate**: Percentage of users who deny notification permissions

### Platform-Specific Metrics
- **Android WorkManager Starts**: Number of Android background task starts
- **iOS Background Task Starts**: Number of iOS background task starts
- **Electron Notifications**: Number of Electron desktop notifications
- **Platform Error Rate**: Percentage of platform-specific errors

## Sample Queries

### Grafana Queries

#### 1. Notification Delivery Success Rate
```promql
# Success rate over last 24 hours
(
  sum(rate(dnp_notifications_success_total[24h])) /
  sum(rate(dnp_notifications_total[24h]))
) * 100
```

#### 2. Average Fetch Time
```promql
# Average fetch time over last hour
avg_over_time(dnp_fetch_duration_seconds[1h])
```

#### 3. User Opt-out Rate
```promql
# Opt-out rate over last 7 days
(
  sum(rate(dnp_user_opt_outs_total[7d])) /
  sum(rate(dnp_user_interactions_total[7d]))
) * 100
```

#### 4. Platform Error Rate
```promql
# Platform error rate over last hour
(
  sum(rate(dnp_platform_errors_total[1h])) /
  sum(rate(dnp_platform_events_total[1h]))
) * 100
```

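Queries 1, 3, and 4 above share one shape: the per-second increase of a "success" or "error" counter divided by the per-second increase of a "total" counter, multiplied by 100. As a rough illustration of that arithmetic only, the sketch below computes the same percentage from two counter samples taken at the start and end of a window. The sample values are made up, and the helper names are hypothetical; unlike Prometheus's `rate()`, this simplification ignores counter resets and extrapolation.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """A counter value observed at a point in time (seconds)."""
    timestamp: float
    value: float


def per_second_rate(start: CounterSample, end: CounterSample) -> float:
    """Simplified stand-in for rate(): increase divided by elapsed seconds."""
    elapsed = end.timestamp - start.timestamp
    if elapsed <= 0:
        raise ValueError("end sample must be later than start sample")
    return (end.value - start.value) / elapsed


def ratio_percent(numerator_rate: float, denominator_rate: float) -> float:
    """Return numerator/denominator as a percentage; 0.0 when there was no traffic."""
    if denominator_rate == 0:
        return 0.0
    return numerator_rate / denominator_rate * 100


# Hypothetical samples one hour apart (values are illustrative only).
success = (CounterSample(0, 1_000), CounterSample(3_600, 1_950))
total = (CounterSample(0, 1_020), CounterSample(3_600, 2_020))

success_rate = ratio_percent(per_second_rate(*success), per_second_rate(*total))
print(f"Success rate: {success_rate:.1f}%")
```

Note that PromQL returns `NaN` when the denominator is zero, whereas this sketch returns `0.0`; which behaviour suits a dashboard panel is a judgment call.
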
### DataDog Queries

#### 1. Health Status Dashboard
```datadog
# Notification health score
100 - (
  (sum:dnp.notifications.failed{*}.as_rate() /
   sum:dnp.notifications.total{*}.as_rate()) * 100
)
```

#### 2. Performance Trends
```datadog
# Fetch performance trend
avg:dnp.fetch.duration{*}.rollup(avg, 300)
```

#### 3. User Engagement
```datadog
# User engagement rate
(sum:dnp.user.opt_ins{*}.as_rate() /
 sum:dnp.user.interactions{*}.as_rate()) * 100
```

## Sample Dashboard Configurations

### 1. Overview Dashboard

**Purpose**: High-level plugin health and performance overview

**Panels**:
- **Notification Success Rate** (Gauge): Current success rate percentage
- **Active Schedules** (Stat): Number of active notification schedules
- **Recent Errors** (Logs): Last 10 error events
- **Performance Trends** (Time Series): Fetch and notification times over time
- **User Metrics** (Bar Chart): Opt-ins vs opt-outs over last 7 days

### 2. Platform-Specific Dashboard

**Purpose**: Monitor platform-specific performance and issues

**Panels**:
- **Android WorkManager Status** (Stat): Active background tasks
- **iOS Background Task Success** (Gauge): Success rate for iOS tasks
- **Electron Notification Count** (Counter): Desktop notifications sent
- **Platform Error Breakdown** (Pie Chart): Errors by platform
- **Platform Performance** (Time Series): Performance by platform

### 3. User Engagement Dashboard

**Purpose**: Track user interaction and engagement metrics

**Panels**:
- **Permission Grant Rate** (Gauge): Current permission grant rate
- **Opt-out Trends** (Time Series): Opt-out rate over time
- **User Interaction Heatmap** (Heatmap): User actions by time of day
- **Engagement Funnel** (Funnel): Permission → Opt-in → Active usage

## Alerting Rules

### Critical Alerts

#### 1. Notification Delivery Failure
```yaml
# The threshold treats dnp_notifications_success_rate as a ratio between 0 and 1
alert: NotificationDeliveryFailure
expr: dnp_notifications_success_rate < 0.95
for: 5m
labels:
  severity: critical
annotations:
  summary: "Notification delivery success rate below 95%"
  description: "Notification success rate is {{ $value | humanizePercentage }} for the last 5 minutes"
```

#### 2. High Error Rate
```yaml
alert: HighErrorRate
expr: rate(dnp_errors_total[5m]) > 0.1
for: 2m
labels:
  severity: warning
annotations:
  summary: "High error rate detected"
  description: "Error rate is {{ $value }} errors/second"
```

#### 3. Platform Errors
```yaml
alert: PlatformErrors
expr: rate(dnp_platform_errors_total[5m]) > 0.05
for: 3m
labels:
  severity: warning
annotations:
  summary: "Platform-specific errors detected"
  description: "Platform error rate is {{ $value }} errors/second"
```

### Warning Alerts

#### 1. Performance Degradation
```yaml
alert: PerformanceDegradation
expr: avg_over_time(dnp_fetch_duration_seconds[10m]) > 5
for: 5m
labels:
  severity: warning
annotations:
  summary: "Fetch performance degraded"
  description: "Average fetch time is {{ $value }} seconds"
```

#### 2. High Opt-out Rate
```yaml
alert: HighOptOutRate
# rate() yields a per-second value averaged over the 1h window
expr: rate(dnp_user_opt_outs_total[1h]) > 0.1
for: 10m
labels:
  severity: warning
annotations:
  summary: "High user opt-out rate"
  description: "Opt-out rate is {{ $value }} opt-outs/second"
```

## SLO Definitions

### Service Level Objectives

#### 1. Notification Delivery SLO
- **Target**: 99.5% success rate
- **Measurement**: Successful notifications / Total notifications
- **Time Window**: 30 days
- **Error Budget**: 0.5%

#### 2. Performance SLO
- **Target**: 95% of fetches complete within 3 seconds
- **Measurement**: Fetch duration percentiles
- **Time Window**: 7 days
- **Error Budget**: 5%

#### 3. Availability SLO
- **Target**: 99.9% uptime
- **Measurement**: Plugin health endpoint availability
- **Time Window**: 30 days
- **Error Budget**: 0.1%

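To make the error budgets above concrete, the following sketch converts a target and a time window into an absolute allowance. It is plain arithmetic under stated assumptions: the function names are hypothetical, and the traffic figure of 200,000 notifications per 30 days is invented for illustration.

```python
def allowed_failures(total_events: int, target: float) -> float:
    """Number of events that may fail before the SLO target is missed."""
    return total_events * (1 - target)


def allowed_downtime_minutes(window_days: int, target: float) -> float:
    """Minutes of downtime an availability target tolerates over the window."""
    return window_days * 24 * 60 * (1 - target)


def budget_remaining(total_events: int, failed_events: int, target: float) -> float:
    """Fraction of the error budget still unspent (negative once exhausted)."""
    budget = allowed_failures(total_events, target)
    if budget == 0:
        return 0.0
    return 1 - failed_events / budget


# Notification Delivery SLO: 99.5% over 30 days, assuming 200,000 notifications.
print(f"Allowed failed notifications: {allowed_failures(200_000, 0.995):.0f}")

# Availability SLO: 99.9% uptime over 30 days.
print(f"Allowed downtime: {allowed_downtime_minutes(30, 0.999):.1f} minutes")

# With 250 failures so far, how much of the delivery budget is left?
print(f"Budget remaining: {budget_remaining(200_000, 250, 0.995):.0%}")
```

Tracking the remaining budget as a fraction, instead of the raw success rate, makes it easier to see how quickly an incident is consuming the allowance for the window.
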
## Log Analysis

### Structured Log Patterns

#### 1. Error Analysis
```bash
# Show the first 20 failure events in the log, one event per line
# (grep matches by pattern only; it does not filter by time)
grep "DNP-.*-FAILURE" /var/log/timesafari/daily-notification.log | \
  jq -r '[.timestamp, .eventCode, .message] | @tsv' | \
  head -20
```

#### 2. Performance Analysis
```bash
# Find slow operations (duration above 5000, assumed to be milliseconds)
grep "DNP-FETCH-START\|DNP-FETCH-SUCCESS" /var/log/timesafari/daily-notification.log | \
  jq -r 'select(.duration > 5000) | .timestamp, .duration, .message'
```

#### 3. User Behavior Analysis
```bash
# Count user interaction events per event code and user
# (the timestamp is left out so that identical events group together)
grep "DNP-USER-\|DNP-PERMISSION-" /var/log/timesafari/daily-notification.log | \
  jq -r '[.eventCode, .data.userId] | @tsv' | \
  sort | uniq -c
```

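The `grep` pipelines above select events by pattern only and do not restrict results to a time window. Where a window such as "the last hour" matters, a small script can apply it. The sketch below assumes each log line is a JSON object with `timestamp` and `eventCode` fields (as the `jq` filters above imply) and that timestamps are ISO-8601 strings with an explicit UTC offset; that timestamp format, the function name, and the sample events are assumptions, not confirmed details of the plugin's log format.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Iterator


def recent_failures(
    lines: Iterable[str],
    now: datetime,
    window: timedelta = timedelta(hours=1),
) -> Iterator[Dict]:
    """Yield failure events whose timestamp falls within `window` before `now`."""
    cutoff = now - window
    for line in lines:
        try:
            event = json.loads(line)
            # Assumption: ISO-8601 timestamps with an explicit UTC offset.
            stamp = datetime.fromisoformat(event["timestamp"])
            code = str(event["eventCode"])
        except (ValueError, KeyError, TypeError):
            continue  # skip malformed or incomplete lines
        is_failure = code.startswith("DNP-") and code.endswith("-FAILURE")
        if is_failure and cutoff <= stamp <= now:
            yield event


# Hypothetical log lines for illustration only.
sample_lines = [
    '{"timestamp": "2025-10-08T05:30:00+00:00", "eventCode": "DNP-FETCH-FAILURE", "message": "timeout"}',
    '{"timestamp": "2025-10-08T03:00:00+00:00", "eventCode": "DNP-NOTIFY-FAILURE", "message": "too old"}',
    '{"timestamp": "2025-10-08T05:45:00+00:00", "eventCode": "DNP-FETCH-SUCCESS", "message": "ok"}',
    "not json",
]
now = datetime(2025, 10, 8, 6, 0, tzinfo=timezone.utc)

for event in recent_failures(sample_lines, now):
    print(event["timestamp"], event["eventCode"], event["message"])
```

Unlike the `grep` pattern, which matches anywhere in the line, this sketch checks the `eventCode` field specifically, so a failure code quoted inside a message would not count as a match.
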
## Monitoring Best Practices

### 1. Log Retention
- **Structured Logs**: Retain for 30 days
- **Error Logs**: Retain for 90 days
- **Performance Logs**: Retain for 7 days
- **User Interaction Logs**: Retain for 1 year (with privacy compliance)

### 2. Metric Collection
- **High-frequency metrics**: Collect every 30 seconds
- **Medium-frequency metrics**: Collect every 5 minutes
- **Low-frequency metrics**: Collect every 1 hour
- **User metrics**: Collect on-demand

### 3. Alert Tuning
- **Start with conservative thresholds**
- **Adjust based on historical data**
- **Use different severity levels**
- **Implement alert fatigue prevention**

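One common way to reduce alert fatigue is to suppress repeat notifications for the same alert within a cooldown period. The sketch below illustrates that idea in isolation; the class name and the 30-minute cooldown are hypothetical choices, not part of the plugin. In a Prometheus setup, Alertmanager's grouping and `repeat_interval` settings serve a similar purpose and would usually be preferred over custom code.

```python
from datetime import datetime, timedelta
from typing import Dict


class AlertCooldown:
    """Suppress repeat notifications for the same alert within a cooldown period."""

    def __init__(self, cooldown: timedelta) -> None:
        self.cooldown = cooldown
        self._last_sent: Dict[str, datetime] = {}

    def should_notify(self, alert_name: str, now: datetime) -> bool:
        """Return True and record the send time unless the alert is cooling down."""
        last = self._last_sent.get(alert_name)
        if last is not None and now - last < self.cooldown:
            return False
        self._last_sent[alert_name] = now
        return True


# Illustrative run with a hypothetical 30-minute cooldown.
cooldown = AlertCooldown(timedelta(minutes=30))
start = datetime(2025, 10, 8, 6, 0)

for minutes, name in [(0, "HighErrorRate"), (10, "HighErrorRate"), (10, "PlatformErrors"), (31, "HighErrorRate")]:
    sent = cooldown.should_notify(name, start + timedelta(minutes=minutes))
    print(f"t+{minutes:>2}m {name}: {'notify' if sent else 'suppressed'}")
```
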
### 4. Dashboard Design
- **Keep dashboards focused and actionable**
- **Use consistent color schemes**
- **Include context and annotations**
- **Review and update regularly**

## Integration Examples

### Grafana Dashboard JSON
```json
{
  "dashboard": {
    "title": "TimeSafari Daily Notification Plugin",
    "panels": [
      {
        "title": "Notification Success Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "(sum(rate(dnp_notifications_success_total[24h])) / sum(rate(dnp_notifications_total[24h]))) * 100"
          }
        ]
      }
    ]
  }
}
```

### Prometheus Recording Rules
```yaml
groups:
  - name: timesafari_daily_notification
    rules:
      - record: dnp:notification_success_rate
        expr: (sum(rate(dnp_notifications_success_total[5m])) / sum(rate(dnp_notifications_total[5m]))) * 100

      - record: dnp:fetch_duration_avg
        expr: avg_over_time(dnp_fetch_duration_seconds[5m])

      - record: dnp:user_opt_out_rate
        expr: (sum(rate(dnp_user_opt_outs_total[1h])) / sum(rate(dnp_user_interactions_total[1h]))) * 100
```

## Troubleshooting Guide

### Common Issues and Queries

#### 1. High Error Rate
```bash
# Check recent errors
# (-G with --data-urlencode keeps curl from treating [5m] as a URL glob range)
curl -s -G "http://localhost:9090/api/v1/query" \
  --data-urlencode "query=rate(dnp_errors_total[5m])" | jq .
```

#### 2. Performance Issues
```bash
# Check fetch performance
curl -s -G "http://localhost:9090/api/v1/query" \
  --data-urlencode "query=avg_over_time(dnp_fetch_duration_seconds[10m])" | jq .
```

#### 3. User Engagement Issues
```bash
# Check user metrics
curl -s -G "http://localhost:9090/api/v1/query" \
  --data-urlencode "query=rate(dnp_user_opt_outs_total[1h])" | jq .
```

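The same instant queries can be issued from a script, which helps when a troubleshooting check needs to run repeatedly. The sketch below uses only the Python standard library and the Prometheus `/api/v1/query` endpoint shown above; the function names are hypothetical, and the base URL `http://localhost:9090` is taken from the examples, so adjust it for your environment. PromQL expressions contain characters such as `(`, `[`, and `]`, so the query is percent-encoded before it goes into the URL.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL with the PromQL expression percent-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def instant_query(base_url: str, promql: str, timeout: float = 5.0) -> list:
    """Run an instant query and return the `result` list from the response."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as response:
        payload = json.load(response)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


# Building the URL needs no running server, so it is safe to try anywhere.
print(build_query_url("http://localhost:9090", "rate(dnp_errors_total[5m])"))
```

With a Prometheus server reachable at that address, `instant_query("http://localhost:9090", "rate(dnp_errors_total[5m])")` would return the matching series, or an empty list if the metric has no samples.
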
## Privacy and Compliance

### Data Retention
- **User interaction logs**: 1 year maximum
- **Performance metrics**: 90 days maximum
- **Error logs**: 30 days maximum
- **Personal data**: Redacted or anonymized

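As one possible way to keep raw identifiers out of retained logs, the sketch below replaces the `data.userId` field (the field used in the log analysis examples) with a keyed hash before an event is stored. This is an illustration under assumptions, not the plugin's actual mechanism: the function names, the 16-character truncation, and the key handling are hypothetical. Keyed hashing is pseudonymization, not full anonymization, so whether it satisfies a given compliance requirement needs separate review.

```python
import hashlib
import hmac
from typing import Any, Dict


def pseudonymize_user_id(user_id: str, secret_key: bytes) -> str:
    """Map a user ID to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; an arbitrary choice


def redact_event(event: Dict[str, Any], secret_key: bytes) -> Dict[str, Any]:
    """Return a copy of the event with data.userId replaced by a pseudonym."""
    redacted = dict(event)
    data = event.get("data")
    if isinstance(data, dict) and "userId" in data:
        redacted["data"] = {
            **data,
            "userId": pseudonymize_user_id(str(data["userId"]), secret_key),
        }
    return redacted


# Hypothetical event and key; a real key would come from secure configuration.
key = b"example-secret-key"
event = {"eventCode": "DNP-USER-OPT-OUT", "data": {"userId": "user-123"}}

print(redact_event(event, key))
```

Because the same key always maps a user ID to the same pseudonym, per-user counts remain possible on the redacted logs while the original identifier is not stored.
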
### GDPR Compliance
- **User consent**: Tracked and logged
- **Data portability**: Export capabilities
- **Right to deletion**: Automated cleanup
- **Privacy by design**: Built into observability system

---

**Note**: These dashboards and queries should be customized based on your specific monitoring infrastructure and requirements. Regular review and updates are recommended to ensure they remain relevant and actionable.