Engineering On-Call
Overview
To ensure system reliability and timely response to urgent issues, we’ve established an on-call rotation for the engineering team. This process defines responsibilities and expectations for engineers assigned to on-call duties, promoting accountability, learning, and continuous improvement.
🔄 Rotation Details
- 📅 Duration: Each on-call shift lasts 1 week
- 👥 Participants:
  - Primary On-Call Engineer: Main point of contact for any alerts or support issues during the rotation.
  - Secondary On-Call Engineer: Backup support for the primary. The primary may delegate tasks to the secondary if needed (e.g., during time off or when workloads overlap).
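As an illustration of the rotation described above, here is a minimal sketch that generates one-week shifts with a primary and a secondary. The `Shift` type, the `build_rotation` function, the engineer names, and the start date are all hypothetical. The rule that the next engineer in the list acts as secondary is an assumption, not an established policy; adjust it to however the team actually pairs people.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Shift:
    start: date      # first day of the one-week shift
    end: date        # last day of the shift (inclusive)
    primary: str
    secondary: str


def build_rotation(engineers: list[str], first_day: date, weeks: int) -> list[Shift]:
    """Return `weeks` consecutive one-week shifts, cycling through `engineers`."""
    if len(engineers) < 2:
        raise ValueError("need at least two engineers for primary and secondary")
    shifts = []
    for week in range(weeks):
        start = first_day + timedelta(weeks=week)
        primary = engineers[week % len(engineers)]
        # Assumption: the next engineer in the list is secondary,
        # so each person shadows for a week before becoming primary.
        secondary = engineers[(week + 1) % len(engineers)]
        shifts.append(Shift(start, start + timedelta(days=6), primary, secondary))
    return shifts


if __name__ == "__main__":
    # Hypothetical team and start date, for illustration only.
    for shift in build_rotation(["alice", "bob", "carol"], date(2025, 1, 6), 4):
        print(shift.start, shift.end, shift.primary, shift.secondary)
```

With this pairing, each week's secondary becomes the following week's primary, which fits the optional shadowing and ramp-up role of the secondary.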
📝 Responsibilities
1️⃣ Primary Engineer
- Actively monitor and respond to:
  - User reports: support-arc-twilio tickets
  - System alerts: ntfy-alerts
- Coordinate with relevant teams to resolve high-priority issues based on established SLAs.
- Keep stakeholders informed as needed.
- Document actions taken and any notable findings.
- Share a summary of the week’s activities and learnings by EOD Monday after the on-call week.
  - ✅ Summary of handled tickets: volumes, themes, patterns
  - 💡 Learnings and observations: what went well, what was confusing or unexpected
  - 🛠️ Fixes or follow-ups: e.g., PRs, eng handbook updates
  - 🚀 Proactive work suggestions: how to prevent similar issues, automation ideas, tooling gaps
2️⃣ Secondary Engineer
- Stay informed and available during the week in case backup is required.
- Assist with ticket resolution or alerts if the primary delegates a task.
- Optionally shadow the process for ramp-up or cross-training.