Bitnob Payout Incident Postmortem Template
1. Incident Overview
FIELD | DETAILS |
---|---|
Incident Title | Short descriptive title (e.g., "NIP Corridor Payout Delays - 24 April 2025") |
Date and Time of Incident | When the incident started and ended (UTC). |
Reported By | Who detected or reported the issue. |
Severity Level | Critical / High / Medium / Low. |
Affected Systems | APIs, Treasury, Webhooks, Internal Dashboards, Specific Corridors. |
Affected Customers | Number of customers impacted, specific partners if known. |
Initial User Impact | Delayed payouts, failed payouts, incorrect statuses, user escalations. |
2. Incident Timeline
TIME (UTC) | EVENT |
---|---|
13:00 | First alert triggered by liquidity buffer monitor for NGN payouts. |
13:15 | Webhook delivery failures reported from multiple partners. |
13:30 | On-call engineering team initiated investigation. |
14:00 | Root cause identified: upstream banking API outage. |
14:15 | Communication sent to impacted users. |
15:30 | Payout retries initiated manually. |
16:00 | Full service restoration confirmed. |
3. Root Cause Analysis
FIELD | DETAILS |
---|---|
Primary Root Cause | NIP upstream banking rail outage with no available redundancy. |
Contributing Factors | Treasury buffer was sufficient, but no alternate payout route activated automatically. |
Detection Weaknesses | Alerting on webhook failure was delayed by 15 minutes. |
Process Weaknesses | Manual escalation to Treasury delayed liquidity redeployment. |
4. Impact Assessment
FIELD | DETAILS |
---|---|
Number of Failed or Delayed Payouts | Total and breakdown per corridor. |
User Impact | Number of user support tickets, social escalations, refunds required. |
Financial Impact | Any losses, refunds, fees paid to users. |
Regulatory Impact | Any reportable incidents to regulators or partners. |
5. Immediate Actions Taken
ACTION | TIMESTAMP | OWNER |
---|---|---|
Liquidity buffer topped up manually for NGN corridor. | 14:00 UTC | Treasury Ops |
Partner bank outage escalated to account manager. | 14:05 UTC | BD Team |
User communication emails sent explaining delay. | 14:15 UTC | Customer Support |
Webhook replay queued for delayed payouts. | 15:00 UTC | Engineering |
6. Lessons Learned
Detection must happen within 5 minutes, not 15 minutes.
Redundant liquidity routes for high-risk corridors (e.g., backup NIP provider) must be operational.
User communication templates for payout delays must be pre-approved and ready for immediate use.
7. Preventative and Remediation Actions
ACTION ITEM | OWNER | DEADLINE |
---|---|---|
Integrate second NIP banking partner for automatic failover. | Treasury and Engineering | 15 May 2025 |
Implement faster webhook failure detection and alerting. | Engineering Ops | 1 May 2025 |
Review SLA thresholds and escalation policies for payouts. | PM + Ops | 30 April 2025 |
Create live liquidity monitoring dashboard visible to PMs and Operations. | Product Analytics | 5 May 2025 |
8. Communication Summary
AUDIENCE | CHANNEL | MESSAGE SENT |
---|---|---|
Affected Users | Email / In-App Notifications | Service interruption notice and next steps. |
Internal Teams | Slack / Incident Response Channel | Real-time updates and postmortem sharing. |
Strategic Partners (if needed) | Direct Email | Professional incident report if SLAs breached. |
Incident Severity: (Confirm based on real financial and user impact.)
Full Postmortem Distribution: (Confirm who will receive final write-up — e.g., leadership, compliance, strategic partners.)
A strong postmortem culture is not only about avoiding blame. It is about building payout products that:
Detect failures earlier,
Recover faster,
Communicate better,
Protect user trust under stress.
Serious payout platforms are not judged by whether incidents happen. They are judged by how systematically and transparently they respond and improve.
Sample Filled Payout Incident Postmortem Report
1. Incident Overview
FIELD | DETAILS |
---|---|
Incident Title | NIP Corridor Payout Delays - 24 April 2025 |
Date and Time of Incident | 24 April 2025, 12:30 UTC – 15:45 UTC |
Reported By | Treasury Ops Monitor |
Severity Level | High |
Affected Systems | NIP (NIBSS Instant Payment) payouts, webhook delivery |
Affected Customers | 143 users, 2 enterprise payout partners |
Initial User Impact | Payouts delayed beyond 30-minute SLA; increased support ticket volume |
2. Incident Timeline
TIME (UTC) | EVENT |
---|---|
12:30 | Liquidity buffer monitor flagged NGN payout delays. |
12:45 | Webhook delivery failures started appearing for NIP payouts. |
13:00 | On-call engineering initiated investigation. |
13:20 | Root cause identified: upstream bank API partial outage (Bank Partner A). |
13:30 | Treasury switched to backup bank partner manually. |
13:45 | User support escalation triggered communications. |
14:30 | Manual payout retries began. |
15:45 | All delayed payouts completed successfully. |
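The timeline above can be turned into a few response metrics for the write-up. The following is a minimal Python sketch, assuming all events fall on the same UTC day. The milestone names (`detected`, `mitigated`, and so on) and the choice to treat the 13:30 manual switch as the mitigation point are illustrative assumptions, not fields defined by the template.

```python
from datetime import datetime

# Milestones taken from the sample timeline (UTC, 24 April 2025).
# The keys are hypothetical labels chosen for this sketch.
MILESTONES = {
    "detected": "12:30",               # liquidity buffer monitor flagged delays
    "investigation_started": "13:00",  # on-call engineering began investigating
    "root_cause_identified": "13:20",  # upstream bank API partial outage
    "mitigated": "13:30",              # Treasury switched to backup bank partner
    "resolved": "15:45",               # all delayed payouts completed
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end, both 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def response_metrics(milestones: dict) -> dict:
    """Compute elapsed minutes from detection to each later milestone."""
    detected = milestones["detected"]
    return {
        "time_to_investigate_min": minutes_between(detected, milestones["investigation_started"]),
        "time_to_root_cause_min": minutes_between(detected, milestones["root_cause_identified"]),
        "time_to_mitigate_min": minutes_between(detected, milestones["mitigated"]),
        "total_duration_min": minutes_between(detected, milestones["resolved"]),
    }


metrics = response_metrics(MILESTONES)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

For the sample incident this gives 30 minutes to start investigating, 50 minutes to root cause, 60 minutes to mitigation, and 195 minutes in total.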
3. Root Cause Analysis
FIELD | DETAILS |
---|---|
Primary Root Cause | Upstream partner bank's NIP API degraded without automated failover. |
Contributing Factors | Liquidity buffer was sufficient, but platform did not auto-switch payout route. |
Detection Weaknesses | Webhook failure threshold was too high, delaying internal alert. |
Process Weaknesses | Manual Treasury intervention required; failover was not automated. |
4. Impact Assessment
FIELD | DETAILS |
---|---|
Number of Failed or Delayed Payouts | 157 payouts delayed; 0 permanently failed. |
User Impact | 27 user support tickets; 3 escalations to account managers. |
Financial Impact | No direct financial loss; goodwill refunds ($50 total) to key accounts. |
Regulatory Impact | None; SLA breaches did not materially exceed internal thresholds. |
5. Immediate Actions Taken
ACTION | TIMESTAMP | OWNER |
---|---|---|
Manual liquidity reallocation to backup bank. | 13:30 UTC | Treasury Ops |
User notification emails sent. | 13:45 UTC | Customer Support |
Triggered manual webhook replays for delayed payouts. | 14:30 UTC | Engineering Ops |
6. Lessons Learned
Relying on a single payout rail per corridor is operationally fragile.
Webhook delivery failures should trigger alerts faster (current threshold too lenient).
User communication templates should be ready to deploy immediately, not drafted during the incident.
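The first lesson, that a corridor should not depend on a single rail, can be illustrated with an ordered failover loop. This is a minimal sketch only: `PayoutRoute`, `RouteUnavailable`, `send_with_failover`, and the two stub bank functions are hypothetical names invented for illustration and do not describe any real Bitnob or bank partner API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


class RouteUnavailable(Exception):
    """Raised by a route when its upstream bank API is down or degraded."""


@dataclass
class PayoutRoute:
    name: str
    # Hypothetical signature: (payout_id, amount_in_minor_units) -> provider reference
    send: Callable[[str, int], str]


def send_with_failover(
    routes: Sequence[PayoutRoute], payout_id: str, amount: int
) -> Tuple[str, str]:
    """Try routes in priority order; return (route_name, reference) on first success."""
    errors = []
    for route in routes:
        try:
            return route.name, route.send(payout_id, amount)
        except RouteUnavailable as exc:
            # Record the failure and fall through to the next route.
            errors.append(f"{route.name}: {exc}")
    raise RouteUnavailable("all routes failed: " + "; ".join(errors))


# Stub routes standing in for real bank integrations (assumptions for the demo).
def bank_a(payout_id: str, amount: int) -> str:
    raise RouteUnavailable("NIP API degraded")


def bank_b(payout_id: str, amount: int) -> str:
    return f"B-{payout_id}"


ROUTES = [PayoutRoute("bank_partner_a", bank_a), PayoutRoute("bank_partner_b", bank_b)]
result = send_with_failover(ROUTES, "po_001", 500_000)
print(result)
```

A production version would also need to confirm that the primary route did not accept the payout before retrying elsewhere, for example with an idempotency key, so that a failover never produces a duplicate payout.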
7. Preventative and Remediation Actions
ACTION ITEM | OWNER | DEADLINE |
---|---|---|
Integrate auto-failover to multiple NIP bank partners. | Engineering + Treasury | 15 May 2025 |
Lower webhook failure alert threshold from 5% to 2%. | Engineering Ops | 2 May 2025 |
Pre-approve standard payout delay user notification templates. | Customer Support | 30 April 2025 |
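To show what lowering the webhook failure alert threshold from 5% to 2% changes in practice, here is a small Python sketch of a sliding-window failure-rate check. The report does not say how the failure rate is measured, so the 100-delivery window, the `WebhookFailureMonitor` class, and the `first_alert_index` helper are all assumptions made for illustration.

```python
from collections import deque
from typing import Iterable, Optional


class WebhookFailureMonitor:
    """Tracks the failure rate over the most recent `window` webhook deliveries."""

    def __init__(self, threshold: float, window: int = 100):
        self.threshold = threshold
        self.results = deque(maxlen=window)  # True means the delivery failed

    def record(self, failed: bool) -> None:
        self.results.append(failed)

    def failure_rate(self) -> float:
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)

    def should_alert(self) -> bool:
        # Alert only when the rate is strictly above the threshold.
        return self.failure_rate() > self.threshold


def first_alert_index(
    outcomes: Iterable[bool], threshold: float, window: int = 100
) -> Optional[int]:
    """Return the index of the delivery at which the monitor first alerts, or None."""
    monitor = WebhookFailureMonitor(threshold, window)
    for i, failed in enumerate(outcomes):
        monitor.record(failed)
        if monitor.should_alert():
            return i
    return None


# Simulated traffic: 100 successful deliveries, then every delivery fails.
outcomes = [False] * 100 + [True] * 10
print(first_alert_index(outcomes, 0.05))  # alert position with the 5% threshold
print(first_alert_index(outcomes, 0.02))  # alert position with the 2% threshold
```

In this simulation the 5% threshold alerts on the sixth consecutive failure and the 2% threshold on the third, so the lower threshold fires three deliveries earlier. How much wall-clock time that saves depends on delivery volume.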
8. Communication Summary
AUDIENCE | CHANNEL | MESSAGE SENT |
---|---|---|
Affected Users | Email + In-App Notifications | 14:00 UTC |
Internal Teams | Slack Incident Channel | Real-time updates throughout |
Enterprise Partners | Direct Email Reports | 17:00 UTC after incident closure |
Final Notes
Notes on Formatting
Here’s a ready-to-download Markdown version combining the Template and Sample:
Bitnob Payout Incident Postmortem Template
1. Incident Overview
| Field | Details |
|:------|:--------|
| Incident Title | |
| Date and Time of Incident | |
| Reported By | |
| Severity Level | |
| Affected Systems | |
| Affected Customers | |
| Initial User Impact | |
2. Incident Timeline
| Time (UTC) | Event |
|:-----------|:------|
3. Root Cause Analysis
| Field | Details |
|:------|:--------|
4. Impact Assessment
| Field | Details |
|:------|:--------|
5. Immediate Actions Taken
| Action | Timestamp | Owner |
|:-------|:----------|:------|
6. Lessons Learned
7. Preventative and Remediation Actions
| Action Item | Owner | Deadline |
|:------------|:------|:---------|
8. Communication Summary
| Audience | Channel | Message Sent |
|:---------|:--------|:-------------|
Sample Filled Payout Postmortem
1. Incident Overview
| Field | Details |
|:------|:--------|
| Incident Title | NIP Corridor Payout Delays - 24 April 2025 |
| Date and Time of Incident | 24 April 2025, 12:30 UTC – 15:45 UTC |
| Reported By | Treasury Ops Monitor |
| Severity Level | High |
| Affected Systems | NIP payouts, Webhook delivery delays |
| Affected Customers | 143 users, 2 enterprise partners |
| Initial User Impact | Payouts delayed beyond SLA, increased support tickets |
2. Incident Timeline
| Time (UTC) | Event |
|:-----------|:------|
| 12:30 | Liquidity monitor flagged NGN payout delays. |
| 12:45 | Webhook delivery failures appeared. |
| 13:00 | Investigation initiated. |
| 13:20 | Root cause found: upstream bank API degradation. |
| 13:30 | Manual switch to backup bank. |
| 13:45 | User communication triggered. |
| 14:30 | Manual webhook replays started. |
| 15:45 | Full payout restoration confirmed. |
3. Root Cause Analysis
| Field | Details |
|:------|:--------|
| Primary Root Cause | Upstream banking partner outage. |
| Contributing Factors | No automatic failover, webhook alert delay. |
4. Impact Assessment
| Field | Details |
|:------|:--------|
| Failed or Delayed Payouts | 157 delayed |
| User Impact | 27 support tickets |
| Financial Impact | $50 goodwill refunds |
| Regulatory Impact | None |
5. Immediate Actions Taken
| Action | Timestamp | Owner |
|:-------|:----------|:------|
| Liquidity switch | 13:30 | Treasury Ops |
| User notifications | 13:45 | Support |
| Webhook replays | 14:30 | Engineering Ops |
6. Lessons Learned
Auto-failover is critical for major corridors.
Faster webhook failure alerts needed.
7. Preventative and Remediation Actions
| Action Item | Owner | Deadline |
|:------------|:------|:---------|
| Add second bank partner | Engineering + Treasury | 15 May 2025 |
| Tighten webhook alert thresholds | Engineering Ops | 2 May 2025 |
| Approve user delay communication templates | Support | 30 April 2025 |
8. Communication Summary
| Audience | Channel | Message Sent |
|:---------|:--------|:-------------|
| Affected Users | Email, In-App | During incident |
| Internal | Slack | Continuous |
| Partners | Direct Email | Post-resolution |