TimeSafari Daily Notification Plugin - Observability Dashboards
Author: Matthew Raymer
Version: 1.0.0
Created: 2025-10-08 06:08:15 UTC
Overview
This document provides sample dashboards, queries, and monitoring configurations for the TimeSafari Daily Notification Plugin. Adapt them to your monitoring system to track plugin health and performance. The queries below are written for Prometheus/Grafana (PromQL) and Datadog. Other tools, such as New Relic, need equivalent queries in their own query language.
Key Metrics
Core Performance Metrics
- Fetch Success Rate: Percentage of successful content fetches
- Notification Delivery Rate: Percentage of notifications successfully delivered
- Callback Success Rate: Percentage of successful callback executions
- Average Fetch Time: Mean time for content fetching operations
- Average Notification Time: Mean time for notification delivery
User Interaction Metrics
- User Opt-out Rate: Percentage of users who opt out of notifications
- Permission Grant Rate: Percentage of users who grant notification permissions
- Permission Denial Rate: Percentage of users who deny notification permissions
Platform-Specific Metrics
- Android WorkManager Starts: Number of Android background task starts
- iOS Background Task Starts: Number of iOS background task starts
- Electron Notifications: Number of Electron desktop notifications
- Platform Error Rate: Percentage of platform-specific errors
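Each rate listed above is a ratio of event counts over a reporting window. The following is a minimal sketch of how the percentages can be derived, assuming the plugin exposes raw counts under the hypothetical field names below. These names are not taken from the plugin's API. The sketch also handles windows with no events explicitly.

```python
from dataclasses import dataclass
from typing import Optional


def pct(part: int, whole: int) -> Optional[float]:
    """Return part/whole as a percentage, or None if the window had no events."""
    if whole == 0:
        return None
    return 100.0 * part / whole


@dataclass
class MetricWindow:
    """Hypothetical raw counts collected over one reporting window."""
    fetches_total: int
    fetches_ok: int
    notifications_total: int
    notifications_delivered: int
    permission_prompts: int
    permissions_granted: int
    permissions_denied: int

    def fetch_success_rate(self) -> Optional[float]:
        return pct(self.fetches_ok, self.fetches_total)

    def delivery_rate(self) -> Optional[float]:
        return pct(self.notifications_delivered, self.notifications_total)

    def permission_grant_rate(self) -> Optional[float]:
        return pct(self.permissions_granted, self.permission_prompts)

    def permission_denial_rate(self) -> Optional[float]:
        return pct(self.permissions_denied, self.permission_prompts)


window = MetricWindow(fetches_total=200, fetches_ok=190,
                      notifications_total=180, notifications_delivered=171,
                      permission_prompts=40, permissions_granted=30,
                      permissions_denied=10)
print(window.fetch_success_rate())      # 95.0
print(window.permission_denial_rate())  # 25.0
```

For an empty window, the sketch returns `None` instead of `0`. This avoids reporting a misleading 0% success rate when nothing actually happened.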
Sample Queries
Grafana Queries
1. Notification Delivery Success Rate
```promql
# Success rate over last 24 hours
(
  sum(rate(dnp_notifications_success_total[24h])) /
  sum(rate(dnp_notifications_total[24h]))
) * 100
```
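Both `rate()` calls divide by the same 24h window, so the ratio equals successes divided by attempts within that window. `rate()` also compensates for counter resets, such as those caused by an app restart. The sketch below reproduces that behavior on raw counter samples, which can help when checking the dashboard against exported data. It is a simplification: unlike `rate()`, it does not extrapolate to the window edges.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase of a monotonic counter, treating any drop as a reset to zero."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so the whole new value counts.
        total += cur - prev if cur >= prev else cur
    return total


def success_rate(success: Sequence[float], attempts: Sequence[float]) -> float:
    """Percentage of successes over the window covered by the samples."""
    denom = counter_increase(attempts)
    if denom == 0:
        raise ValueError("no notifications in window")
    return 100.0 * counter_increase(success) / denom


# Hypothetical samples with a counter reset between the 3rd and 4th scrape.
ok_samples = [100, 150, 190, 5, 40]
total_samples = [110, 165, 210, 6, 45]
print(success_rate(ok_samples, total_samples))
```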
2. Average Fetch Time
```promql
# Average fetch time over last hour
avg_over_time(dnp_fetch_duration_seconds[1h])
```
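`avg_over_time` averages scrape samples, not individual fetches. It therefore only approximates the mean fetch time, and only when `dnp_fetch_duration_seconds` is a gauge holding the latest duration. If the metric is a Prometheus histogram or summary, the per-fetch mean is `rate(dnp_fetch_duration_seconds_sum[1h]) / rate(dnp_fetch_duration_seconds_count[1h])`. The sketch below uses made-up numbers to show how the two approaches diverge when fetches are bursty.

```python
# Hypothetical window: five fetches, but only two scrapes of a gauge that
# holds the duration of the most recent fetch.
fetch_durations = [0.5, 0.5, 0.5, 0.5, 6.0]  # seconds, one per fetch
gauge_samples = [0.5, 6.0]                    # what the two scrapes recorded

# What avg_over_time(gauge[window]) computes: the mean of scrape samples.
sample_mean = sum(gauge_samples) / len(gauge_samples)

# What rate(..._sum) / rate(..._count) computes: the mean over every fetch.
per_fetch_mean = sum(fetch_durations) / len(fetch_durations)

print(f"avg_over_time-style mean: {sample_mean:.2f}s")    # 3.25s
print(f"per-fetch mean:           {per_fetch_mean:.2f}s")  # 1.60s
```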
3. User Opt-out Rate
```promql
# Opt-out rate over last 7 days
(
  sum(rate(dnp_user_opt_outs_total[7d])) /
  sum(rate(dnp_user_interactions_total[7d]))
) * 100
```
4. Platform Error Rate
```promql
# Platform error rate over last hour
(
  sum(rate(dnp_platform_errors_total[1h])) /
  sum(rate(dnp_platform_events_total[1h]))
) * 100
```
Datadog Queries
1. Health Status Dashboard
```
# Notification health score
100 - (
  (sum:dnp.notifications.failed{*}.as_rate() /
   sum:dnp.notifications.total{*}.as_rate()) * 100
)
```
2. Performance Trends
```
# Fetch performance trend
avg:dnp.fetch.duration{*}.rollup(avg, 300)
```
3. User Engagement
```
# User engagement rate
(sum:dnp.user.opt_ins{*}.as_rate() /
 sum:dnp.user.interactions{*}.as_rate()) * 100
```
Sample Dashboard Configurations
1. Overview Dashboard
Purpose: High-level plugin health and performance overview
Panels:
- Notification Success Rate (Gauge): Current success rate percentage
- Active Schedules (Stat): Number of active notification schedules
- Recent Errors (Logs): Last 10 error events
- Performance Trends (Time Series): Fetch and notification times over time
- User Metrics (Bar Chart): Opt-ins vs opt-outs over last 7 days
2. Platform-Specific Dashboard
Purpose: Monitor platform-specific performance and issues
Panels:
- Android WorkManager Status (Stat): Active background tasks
- iOS Background Task Success (Gauge): Success rate for iOS tasks
- Electron Notification Count (Counter): Desktop notifications sent
- Platform Error Breakdown (Pie Chart): Errors by platform
- Platform Performance (Time Series): Performance by platform
3. User Engagement Dashboard
Purpose: Track user interaction and engagement metrics
Panels:
- Permission Grant Rate (Gauge): Current permission grant rate
- Opt-out Trends (Time Series): Opt-out rate over time
- User Interaction Heatmap (Heatmap): User actions by time of day
- Engagement Funnel (Funnel): Permission → Opt-in → Active usage
Alerting Rules
Critical Alerts
1. Notification Delivery Failure
```yaml
- alert: NotificationDeliveryFailure
  # dnp:notification_success_rate is the percentage recording rule defined below
  expr: dnp:notification_success_rate < 95
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Notification delivery success rate below 95%"
    description: "Notification success rate is {{ $value | humanize }}% (5-minute window)"
```
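The `for:` clause requires the condition to hold on every evaluation for that duration before the alert fires. Until then, Prometheus keeps the alert in a pending state, and a single evaluation where the condition is false resets it. The sketch below is a simplified model of that state machine for a rule of the form `expr: value < threshold`. It assumes one evaluation per minute; in Prometheus, the real interval is set separately by `evaluation_interval`.

```python
from typing import List


def alert_states(values: List[float], threshold: float, for_evals: int) -> List[str]:
    """Simulate alert states for a rule `expr: value < threshold`.

    `for_evals` is the `for:` duration divided by the evaluation interval.
    """
    states = []
    active_for = 0  # consecutive evaluations where the condition held
    for value in values:
        if value < threshold:
            active_for += 1
            # Fires once the condition has held for the full `for:` duration,
            # i.e. for_evals intervals after it first became true.
            states.append("firing" if active_for > for_evals else "pending")
        else:
            active_for = 0
            states.append("inactive")
    return states


# Success rate (%) sampled once per minute; rule uses `for: 5m`.
rates = [99.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 99.0]
print(alert_states(rates, threshold=95.0, for_evals=5))
# ['inactive', 'pending', 'pending', 'pending', 'pending', 'pending', 'firing', 'inactive']
```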
2. High Error Rate
```yaml
- alert: HighErrorRate
  expr: rate(dnp_errors_total[5m]) > 0.1
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "High error rate detected"
    description: "Error rate is {{ $value }} errors/second"
```
3. Platform Errors
```yaml
- alert: PlatformErrors
  expr: rate(dnp_platform_errors_total[5m]) > 0.05
  for: 3m
  labels:
    severity: critical
  annotations:
    summary: "Platform-specific errors detected"
    description: "Platform error rate is {{ $value }} errors/second"
```
Warning Alerts
1. Performance Degradation
```yaml
- alert: PerformanceDegradation
  expr: avg_over_time(dnp_fetch_duration_seconds[10m]) > 5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Fetch performance degraded"
    description: "Average fetch time is {{ $value }} seconds"
```
2. High Opt-out Rate
```yaml
- alert: HighOptOutRate
  # rate() returns a per-second value averaged over the 1h range
  expr: rate(dnp_user_opt_outs_total[1h]) > 0.1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High user opt-out rate"
    description: "Opt-out rate is {{ $value }} opt-outs/second (1h average)"
```
SLO Definitions
Service Level Objectives
1. Notification Delivery SLO
- Target: 99.5% success rate
- Measurement: Successful notifications / Total notifications
- Time Window: 30 days
- Error Budget: 0.5%
2. Performance SLO
- Target: 95% of fetches complete within 3 seconds
- Measurement: Fetch duration percentiles
- Time Window: 7 days
- Error Budget: 5%
3. Availability SLO
- Target: 99.9% uptime
- Measurement: Plugin health endpoint availability
- Time Window: 30 days
- Error Budget: 0.1%
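An error budget is the share of failures that an SLO allows over its time window. Converting the percentages above into concrete numbers, such as minutes of downtime or failed notifications, makes them easier to act on. The following is a sketch; the notification volume and fetch durations in it are made-up examples.

```python
import math
from typing import Sequence


def allowed_downtime_minutes(target_pct: float, window_days: int) -> float:
    """Minutes of unavailability an availability SLO permits over its window."""
    return (1 - target_pct / 100) * window_days * 24 * 60


def allowed_failures(target_pct: float, total_events: int) -> int:
    """Failed events a success-rate SLO permits (rounded down)."""
    # round() first so float noise such as 499.9999999 does not floor to 499.
    return math.floor(round((1 - target_pct / 100) * total_events, 6))


def fraction_within(durations_s: Sequence[float], limit_s: float) -> float:
    """Share of operations that finished within limit_s seconds."""
    return sum(1 for d in durations_s if d <= limit_s) / len(durations_s)


# Availability SLO: 99.9% over 30 days.
print(round(allowed_downtime_minutes(99.9, 30), 1))  # 43.2
# Delivery SLO: 99.5%, assuming a hypothetical 100,000 notifications in 30 days.
print(allowed_failures(99.5, 100_000))  # 500
# Performance SLO: 95% of fetches within 3 seconds.
print(fraction_within([0.8, 1.2, 2.9, 3.4], 3.0) >= 0.95)  # False
```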
Log Analysis
Structured Log Patterns
1. Error Analysis
```bash
# Show the 20 most recent failure events, one line per event
grep 'DNP-.*-FAILURE' /var/log/timesafari/daily-notification.log | \
  tail -n 20 | \
  jq -r '"\(.timestamp) \(.eventCode) \(.message)"'
```
2. Performance Analysis
```bash
# Find fetches slower than 5 seconds (assumes duration is logged in milliseconds)
grep -E 'DNP-FETCH-(START|SUCCESS)' /var/log/timesafari/daily-notification.log | \
  jq -r 'select(.duration > 5000) | "\(.timestamp) \(.duration) \(.message)"'
```
3. User Behavior Analysis
```bash
# Count user and permission events per event code and user
grep -E 'DNP-(USER|PERMISSION)-' /var/log/timesafari/daily-notification.log | \
  jq -r '"\(.eventCode) \(.data.userId)"' | \
  sort | uniq -c | sort -rn
```
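Filtering by time is awkward in a grep and jq pipeline. The following Python sketch counts failure events per event code within a time window. It assumes each log line is a JSON object with an `eventCode` field and an ISO-8601 `timestamp` that carries a UTC offset or a trailing `Z`. The `jq` filters imply this format, but it is not confirmed here.

```python
import json
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def recent_failures(lines: Iterable[str], now: Optional[datetime] = None,
                    window: timedelta = timedelta(hours=1)) -> Counter:
    """Count DNP-*-FAILURE events per eventCode within the last `window`."""
    now = now or datetime.now(timezone.utc)
    counts: Counter = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(event, dict):
            continue
        code = str(event.get("eventCode") or "")
        raw_ts = event.get("timestamp")
        if not raw_ts or not (code.startswith("DNP-") and code.endswith("-FAILURE")):
            continue
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        ts = datetime.fromisoformat(str(raw_ts).replace("Z", "+00:00"))
        if timedelta(0) <= now - ts <= window:
            counts[code] += 1
    return counts
```

To use it, pass an open handle to the log file, for example `recent_failures(open("/var/log/timesafari/daily-notification.log"))`. Then call `.most_common()` on the result to list the noisiest event codes first.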
Monitoring Best Practices
1. Log Retention
- Structured Logs: Retain for 30 days
- Error Logs: Retain for 90 days
- Performance Logs: Retain for 7 days
- User Interaction Logs: Retain for 1 year (with privacy compliance)
2. Metric Collection
- High-frequency metrics: Collect every 30 seconds
- Medium-frequency metrics: Collect every 5 minutes
- Low-frequency metrics: Collect every hour
- User metrics: Collect on-demand
3. Alert Tuning
- Start with conservative thresholds
- Adjust based on historical data
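- Use severity levels consistently: `critical` for issues that need immediate action, `warning` for issues to review
- Prevent alert fatigue with `for:` durations, alert grouping, and inhibition (for example, Alertmanager's `group_by` and `inhibit_rules`)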
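Grouping is one common way to prevent alert fatigue. Alerts that share labels are collapsed into a single notification, and repeats are suppressed for a period of time. Alertmanager configures this with `group_by` and `repeat_interval`. The sketch below is a simplified, hypothetical model of the idea. It is not Alertmanager's actual algorithm, which also involves `group_wait` and `group_interval`.

```python
from typing import Dict, List, Tuple

Alert = Dict[str, str]


def notifications_sent(alerts: List[Tuple[float, Alert]],
                       group_by: Tuple[str, ...],
                       repeat_interval_s: float) -> List[Tuple[float, Tuple[str, ...]]]:
    """Return (time, group key) for each notification that would go out.

    Alerts sharing the same values for `group_by` labels form one group, and a
    group is re-notified only after `repeat_interval_s` seconds.
    """
    last_sent: Dict[Tuple[str, ...], float] = {}
    sent = []
    for ts, alert in sorted(alerts, key=lambda item: item[0]):
        key = tuple(alert.get(label, "") for label in group_by)
        if key not in last_sent or ts - last_sent[key] >= repeat_interval_s:
            last_sent[key] = ts
            sent.append((ts, key))
    return sent


events = [
    (0, {"alertname": "HighErrorRate"}),
    (60, {"alertname": "HighErrorRate"}),     # suppressed: same group, too soon
    (120, {"alertname": "PlatformErrors"}),
    (4000, {"alertname": "HighErrorRate"}),   # repeat_interval has elapsed
]
print(notifications_sent(events, group_by=("alertname",), repeat_interval_s=3600))
```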
4. Dashboard Design
- Keep dashboards focused and actionable
- Use consistent color schemes
- Include context and annotations
- Review and update dashboards regularly
Integration Examples
Grafana Dashboard JSON
```json
{
  "dashboard": {
    "title": "TimeSafari Daily Notification Plugin",
    "panels": [
      {
        "title": "Notification Success Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "(sum(rate(dnp_notifications_success_total[24h])) / sum(rate(dnp_notifications_total[24h]))) * 100"
          }
        ]
      }
    ]
  }
}
```
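Writing panel JSON by hand becomes repetitive as panels are added. The sketch below generates the same structure from a list of (title, query, unit) entries, using the recording rules defined below as example queries. The added `id`, `gridPos`, `fieldConfig.defaults.unit`, and `refId` are standard Grafana panel fields, but the layout they encode is an assumption. The outer `"dashboard"` wrapper matches the payload format of Grafana's `POST /api/dashboards/db` HTTP API. For the UI import dialog, use the inner object instead.

```python
import json
from typing import List, Tuple


def build_dashboard(title: str, panels: List[Tuple[str, str, str]]) -> dict:
    """Build a dashboard payload with one stat panel per (title, PromQL, unit)."""
    return {
        "dashboard": {
            "title": title,
            "panels": [
                {
                    "id": i + 1,
                    "title": panel_title,
                    "type": "stat",
                    # Two panels per row on Grafana's 24-column grid.
                    "gridPos": {"x": (i % 2) * 12, "y": (i // 2) * 8, "w": 12, "h": 8},
                    "fieldConfig": {"defaults": {"unit": unit}},
                    "targets": [{"refId": "A", "expr": expr}],
                }
                for i, (panel_title, expr, unit) in enumerate(panels)
            ],
        }
    }


dashboard = build_dashboard("TimeSafari Daily Notification Plugin", [
    ("Notification Success Rate", "dnp:notification_success_rate", "percent"),
    ("Average Fetch Time", "dnp:fetch_duration_avg", "s"),
])
print(json.dumps(dashboard, indent=2))
```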
Prometheus Recording Rules
```yaml
groups:
  - name: timesafari_daily_notification
    rules:
      - record: dnp:notification_success_rate
        expr: (sum(rate(dnp_notifications_success_total[5m])) / sum(rate(dnp_notifications_total[5m]))) * 100
      - record: dnp:fetch_duration_avg
        expr: avg_over_time(dnp_fetch_duration_seconds[5m])
      - record: dnp:user_opt_out_rate
        expr: (sum(rate(dnp_user_opt_outs_total[1h])) / sum(rate(dnp_user_interactions_total[1h]))) * 100
```
Troubleshooting Guide
Common Issues and Queries
1. High Error Rate
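```bash
# Check recent errors (-G sends the URL-encoded query as a GET parameter,
# which also stops curl from treating [5m] as a URL glob)
curl -sG "http://localhost:9090/api/v1/query" \
  --data-urlencode 'query=rate(dnp_errors_total[5m])' | jq
```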
2. Performance Issues
```bash
# Check fetch performance
curl -sG "http://localhost:9090/api/v1/query" \
  --data-urlencode 'query=avg_over_time(dnp_fetch_duration_seconds[10m])' | jq
```
3. User Engagement Issues
```bash
# Check user metrics
curl -sG "http://localhost:9090/api/v1/query" \
  --data-urlencode 'query=rate(dnp_user_opt_outs_total[1h])' | jq
```
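These checks can also be scripted. The following is a minimal sketch that uses only the Python standard library to call the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). It assumes Prometheus listens on `localhost:9090`, as in the commands above. `urlencode` escapes the brackets and parentheses in the PromQL expression, so the query reaches Prometheus intact.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS = "http://localhost:9090"  # assumption: same host as the examples above


def query_url(expr: str, base: str = PROMETHEUS) -> str:
    """Build an instant-query URL with the PromQL expression safely encoded."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"


def parse_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector API response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result's "value" is [unix_timestamp, "value as string"].
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


def instant_query(expr: str) -> List[Tuple[Dict[str, str], float]]:
    """Run an instant query and return one (labels, value) pair per series."""
    with urlopen(query_url(expr), timeout=10) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

For example, `instant_query("rate(dnp_errors_total[5m])")` returns one entry per error series. This makes it straightforward to compare the results against the alert thresholds above.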
Privacy and Compliance
Data Retention
- User interaction logs: 1 year maximum
- Performance metrics: 90 days maximum
- Error logs: 30 days maximum
- Personal data: Redacted or anonymized
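The retention limits above can be enforced by an automated cleanup job. The following hypothetical sketch uses the maxima from the list above to select the records that are due for deletion. The category names and the record shape are assumptions, and "1 year" is approximated as 365 days.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

# Maxima from the Data Retention list; the category keys are hypothetical and
# "1 year" is approximated as 365 days.
RETENTION = {
    "user_interaction": timedelta(days=365),
    "performance": timedelta(days=90),
    "error": timedelta(days=30),
}


def expired(records: Iterable[Tuple[str, datetime]],
            now: Optional[datetime] = None) -> List[Tuple[str, datetime]]:
    """Return (category, created_at) records past their retention limit."""
    now = now or datetime.now(timezone.utc)
    # RETENTION[category] raises KeyError for unknown categories, so a record
    # with a mistyped category fails loudly instead of being kept forever.
    return [(cat, created) for cat, created in records
            if now - created > RETENTION[cat]]
```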
GDPR Compliance
- User consent: Tracked and logged
- Data portability: Export capabilities
- Right to deletion: Automated cleanup
- Privacy by design: Built into observability system
Note: These dashboards and queries should be customized based on your specific monitoring infrastructure and requirements. Regular review and updates are recommended to ensure they remain relevant and actionable.