Bitnob Payout Incident Postmortem Template

1. Incident Overview

| Field | Details |
|:------|:--------|
| Incident Title | Short descriptive title (e.g., "NIP Corridor Payout Delays - 24 April 2025") |
| Date and Time of Incident | When the incident started and ended (UTC). |
| Reported By | Who detected or reported the issue. |
| Severity Level | Critical / High / Medium / Low. |
| Affected Systems | APIs, Treasury, Webhooks, Internal Dashboards, Specific Corridors. |
| Affected Customers | Number of customers impacted, specific partners if known. |
| Initial User Impact | Delayed payouts, failed payouts, incorrect statuses, user escalations. |

2. Incident Timeline

| Time (UTC) | Event |
|:-----------|:------|
| 13:00 | First alert triggered by liquidity buffer monitor for NGN payouts. |
| 13:15 | Webhook delivery failures reported from multiple partners. |
| 13:30 | On-call engineering team initiated investigation. |
| 14:00 | Root cause identified: upstream banking API outage. |
| 14:15 | Communication sent to impacted users. |
| 15:30 | Payout retries initiated manually. |
| 16:00 | Full service restoration confirmed. |

3. Root Cause Analysis

| Field | Details |
|:------|:--------|
| Primary Root Cause | NIP upstream banking rail outage with no available redundancy. |
| Contributing Factors | Treasury buffer was sufficient, but no alternate payout route activated automatically. |
| Detection Weaknesses | Alerting on webhook failure was delayed by 15 minutes. |
| Process Weaknesses | Manual escalation to Treasury delayed liquidity redeployment. |

4. Impact Assessment

| Field | Details |
|:------|:--------|
| Number of Failed or Delayed Payouts | Total and breakdown per corridor. |
| User Impact | Number of user support tickets, social escalations, refunds required. |
| Financial Impact | Any losses, refunds, fees paid to users. |
| Regulatory Impact | Any reportable incidents to regulators or partners. |

5. Immediate Actions Taken

| Action | Timestamp | Owner |
|:-------|:----------|:------|
| Liquidity buffer topped up manually for NGN corridor. | 14:00 UTC | Treasury Ops |
| Partner bank outage escalated to account manager. | 14:05 UTC | BD Team |
| User communication emails sent explaining delay. | 14:15 UTC | Customer Support |
| Webhook replay queued for delayed payouts. | 15:00 UTC | Engineering |

6. Lessons Learned

Detection must happen within 5 minutes, not 15 minutes.

Redundant liquidity routes for high-risk corridors (e.g., backup NIP provider) must be operational.

User communication templates for payout delays must be pre-approved and ready for immediate use.

7. Preventative and Remediation Actions

| Action Item | Owner | Deadline |
|:------------|:------|:---------|
| Integrate second NIP banking partner for automatic failover. | Treasury and Engineering | May 15, 2025 |
| Implement faster webhook failure detection and alerting. | Engineering Ops | May 1, 2025 |
| Review SLA thresholds and escalation policies for payouts. | PM + Ops | April 30, 2025 |
| Create live liquidity monitoring dashboard visible to PMs and Operations. | Product Analytics | May 5, 2025 |

8. Communication Summary

| Audience | Channel | Message Sent |
|:---------|:--------|:-------------|
| Affected Users | Email / In-App Notifications | Service interruption notice and next steps. |
| Internal Teams | Slack / Incident Response Channel | Real-time updates and postmortem sharing. |
| Strategic Partners (if needed) | Direct Email | Professional incident report if SLAs breached. |

Incident Severity: (Confirm based on real financial and user impact.)

Full Postmortem Distribution: (Confirm who will receive final write-up — e.g., leadership, compliance, strategic partners.)

A strong postmortem culture is not about avoiding blame. It is about building payout products that detect failures earlier, recover faster, communicate better, and protect user trust under stress.

Serious payout platforms are not judged by whether incidents happen. They are judged by how systematically and transparently they respond and improve.

Sample Filled Payout Incident Postmortem Report

1. Incident Overview

| Field | Details |
|:------|:--------|
| Incident Title | NIP Corridor Payout Delays - 24 April 2025 |
| Date and Time of Incident | 24 April 2025, 12:30 UTC – 15:45 UTC |
| Reported By | Treasury Ops Monitor |
| Severity Level | High |
| Affected Systems | NIP (NIBSS Instant Payment) payouts, Webhook delivery delays |
| Affected Customers | 143 users, 2 enterprise payout partners |
| Initial User Impact | Payouts delayed beyond 30-minute SLA; increased support ticket volume |

2. Incident Timeline

| Time (UTC) | Event |
|:-----------|:------|
| 12:30 | Liquidity buffer monitor flagged NGN payout delays. |
| 12:45 | Webhook delivery failures started appearing for NIP payouts. |
| 13:00 | On-call engineering initiated investigation. |
| 13:20 | Root cause identified: upstream bank API partial outage (Bank Partner A). |
| 13:30 | Treasury switched to backup bank partner manually. |
| 13:45 | User support escalation triggered communications. |
| 14:30 | Manual payout retries began. |
| 15:45 | All delayed payouts completed successfully. |
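
The timeline above can be turned into the response intervals a postmortem usually reports (time to investigate, time to root cause, time to resolve). The following is a minimal sketch, not part of any Bitnob tooling: the stage names are hypothetical labels, the times are copied from the sample timeline, and the helper assumes every event falls on the same UTC day.

```python
from datetime import datetime

# Times copied from the sample timeline (24 April 2025, UTC).
# The stage names are hypothetical labels chosen for this sketch.
TIMELINE = {
    "detected": "12:30",
    "investigation_started": "13:00",
    "root_cause_identified": "13:20",
    "failover_to_backup": "13:30",
    "resolved": "15:45",
}


def minutes_between(start: str, end: str) -> int:
    """Minutes from start to end, both HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def timeline_metrics(timeline: dict) -> dict:
    """Minutes elapsed from detection to each later stage."""
    start = timeline["detected"]
    return {
        stage: minutes_between(start, time)
        for stage, time in timeline.items()
        if stage != "detected"
    }


metrics = timeline_metrics(TIMELINE)
for stage, minutes in metrics.items():
    print(f"{stage}: {minutes} min after detection")
```

For the sample incident this gives 30 minutes to start the investigation, 50 minutes to identify the root cause, and 195 minutes from detection to full resolution.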

3. Root Cause Analysis

| Field | Details |
|:------|:--------|
| Primary Root Cause | Upstream partner bank's NIP API degraded without automated failover. |
| Contributing Factors | Liquidity buffer was sufficient, but platform did not auto-switch payout route. |
| Detection Weaknesses | Webhook failure threshold was too high, delaying internal alert. |
| Process Weaknesses | Manual Treasury intervention required; failover was not automated. |

4. Impact Assessment

| Field | Details |
|:------|:--------|
| Number of Failed or Delayed Payouts | 157 payouts delayed; 0 permanently failed. |
| User Impact | 27 user support tickets; 3 escalations to account managers. |
| Financial Impact | No direct financial loss; goodwill refunds ($50 total) to key accounts. |
| Regulatory Impact | None; internal thresholds for SLA breaches not exceeded materially. |
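
The delayed and failed counts in the first row, and the per-corridor breakdown the template asks for, can be tallied from payout records. The sketch below is illustrative only: the `Payout` record, the corridor labels and the sample data are hypothetical, and the only value taken from this report is the 30-minute SLA.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

SLA_MINUTES = 30  # payout SLA quoted in the sample incident overview


@dataclass(frozen=True)
class Payout:
    """Hypothetical payout record used only for this sketch."""
    corridor: str
    minutes_to_complete: Optional[int]  # None = never completed


def classify(payout: Payout) -> str:
    """Label a payout as failed, delayed (beyond SLA), or on_time."""
    if payout.minutes_to_complete is None:
        return "failed"
    if payout.minutes_to_complete > SLA_MINUTES:
        return "delayed"
    return "on_time"


def impact_by_corridor(payouts: Iterable[Payout]) -> Dict[str, Counter]:
    """Count payouts per outcome, grouped by corridor."""
    summary: Dict[str, Counter] = {}
    for payout in payouts:
        summary.setdefault(payout.corridor, Counter())[classify(payout)] += 1
    return summary


# Hypothetical records, not data from the incident.
records = [
    Payout("NGN-NIP", 12),
    Payout("NGN-NIP", 95),
    Payout("NGN-NIP", 180),
    Payout("OTHER", 5),
]
report = impact_by_corridor(records)
for corridor, counts in report.items():
    print(corridor, dict(counts))
```

A payout that takes exactly 30 minutes counts as on time here; whether the SLA boundary is inclusive is an assumption to confirm against the actual SLA wording.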

5. Immediate Actions Taken

| Action | Timestamp | Owner |
|:-------|:----------|:------|
| Manual liquidity reallocation to backup bank. | 13:30 | Treasury Ops |
| User notification emails sent. | 13:45 | Customer Support |
| Triggered manual webhook replays for delayed payouts. | 14:30 | Engineering Ops |

6. Lessons Learned

Relying on a single payout rail per corridor is operationally fragile.

Webhook delivery failures should trigger alerts faster (current threshold too lenient).

User communication templates should be ready to deploy immediately, not drafted during incident.
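
The first lesson, that a corridor needs more than one payout rail, can be illustrated with a small route-selection sketch. This is not Bitnob's implementation: the `PayoutRoute` type, the `is_healthy` flag, the priority scheme and the provider names are all hypothetical, and a real system would derive health from live checks against each bank partner.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PayoutRoute:
    """One bank partner on a corridor (hypothetical model)."""
    provider: str
    is_healthy: bool  # assumed to come from a health check, not shown here
    priority: int     # lower value = preferred route


def select_route(routes: Sequence[PayoutRoute]) -> Optional[PayoutRoute]:
    """Return the highest-priority healthy route, or None if all are down."""
    healthy = [route for route in routes if route.is_healthy]
    return min(healthy, key=lambda route: route.priority, default=None)


# Hypothetical NGN corridor: the primary partner is degraded.
ngn_routes = [
    PayoutRoute("bank_partner_a", is_healthy=False, priority=1),
    PayoutRoute("bank_partner_b", is_healthy=True, priority=2),
]
chosen = select_route(ngn_routes)
print(chosen.provider if chosen else "no healthy route: escalate to Treasury")
```

Returning `None` rather than raising keeps the "every route is down" case explicit, so the caller can fall back to manual escalation.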

7. Preventative and Remediation Actions

| Action Item | Owner | Deadline |
|:------------|:------|:---------|
| Integrate auto-failover to multiple NIP bank partners. | Engineering + Treasury | 15 May 2025 |
| Lower webhook failure alert threshold from 5% to 2%. | Engineering Ops | 2 May 2025 |
| Pre-approve standard payout delay user notification templates. | Customer Support | 30 April 2025 |
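
To show what lowering the webhook failure threshold from 5% to 2% changes in practice, here is a minimal sketch of a failure-rate check. The function names and the window of delivery outcomes are hypothetical, and the report does not say how the rate is windowed or whether the comparison is inclusive, so both are assumptions.

```python
from typing import Sequence

OLD_THRESHOLD = 0.05  # 5%, the threshold the action item replaces
NEW_THRESHOLD = 0.02  # 2%, the proposed threshold


def failure_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of failed deliveries; each item is True if delivery succeeded."""
    if not outcomes:
        return 0.0
    failures = sum(1 for succeeded in outcomes if not succeeded)
    return failures / len(outcomes)


def should_alert(outcomes: Sequence[bool], threshold: float) -> bool:
    """Alert when the failure rate in the window exceeds the threshold."""
    return failure_rate(outcomes) > threshold


# Hypothetical window: 3 failures out of 100 deliveries (3% failure rate).
window = [False] * 3 + [True] * 97
print("old threshold alerts:", should_alert(window, OLD_THRESHOLD))
print("new threshold alerts:", should_alert(window, NEW_THRESHOLD))
```

A 3% failure rate stays silent under the 5% threshold but fires under the 2% threshold, which is the earlier detection the action item is after.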

8. Communication Summary

| Audience | Channel | Message Sent |
|:---------|:--------|:-------------|
| Affected Users | Email + In-App Notifications | 14:00 UTC |
| Internal Teams | Slack Incident Channel | Real-time updates throughout |
| Enterprise Partners | Direct Email Reports | 17:00 UTC after incident closure |

Final Notes

Incident Severity: High (user SLA breach but no financial or regulatory damage)

Postmortem distributed to Product, Treasury, Engineering, and Leadership.

Notes on Formatting

Both the template and the sample use plain Markdown, so they convert cleanly to a .md file or export to PDF.

The Markdown version below combines the template and the sample:

Bitnob Payout Incident Postmortem Template

1. Incident Overview

| Field | Details |
|:------|:--------|
| Incident Title | |
| Date and Time of Incident | |
| Reported By | |
| Severity Level | |
| Affected Systems | |
| Affected Customers | |
| Initial User Impact | |

2. Incident Timeline

| Time (UTC) | Event |
|:-----------|:------|

3. Root Cause Analysis

| Field | Details |
|:------|:--------|

4. Impact Assessment

| Field | Details |
|:------|:--------|

5. Immediate Actions Taken

| Action | Timestamp | Owner |
|:-------|:----------|:------|

6. Lessons Learned

7. Preventative and Remediation Actions

| Action Item | Owner | Deadline |
|:------------|:------|:---------|

8. Communication Summary

| Audience | Channel | Message Sent |
|:---------|:--------|:-------------|

Sample Filled Payout Postmortem

1. Incident Overview

| Field | Details |
|:------|:--------|
| Incident Title | NIP Corridor Payout Delays - 24 April 2025 |
| Date and Time of Incident | 24 April 2025, 12:30 UTC – 15:45 UTC |
| Reported By | Treasury Ops Monitor |
| Severity Level | High |
| Affected Systems | NIP payouts, Webhook delivery delays |
| Affected Customers | 143 users, 2 enterprise partners |
| Initial User Impact | Payouts delayed beyond SLA, increased support tickets |

2. Incident Timeline

| Time (UTC) | Event |
|:-----------|:------|
| 12:30 | Liquidity monitor flagged NGN payout delays. |
| 12:45 | Webhook delivery failures appeared. |
| 13:00 | Investigation initiated. |
| 13:20 | Root cause found: upstream bank API degradation. |
| 13:30 | Manual switch to backup bank. |
| 13:45 | User communication triggered. |
| 14:30 | Manual webhook replays started. |
| 15:45 | Full payout restoration confirmed. |

3. Root Cause Analysis

| Field | Details |
|:------|:--------|
| Primary Root Cause | Upstream banking partner outage. |
| Contributing Factors | No automatic failover, webhook alert delay. |

4. Impact Assessment

| Field | Details |
|:------|:--------|
| Failed or Delayed Payouts | 157 delayed |
| User Impact | 27 support tickets |
| Financial Impact | $50 goodwill refunds |
| Regulatory Impact | None |

5. Immediate Actions Taken

| Action | Timestamp | Owner |
|:-------|:----------|:------|
| Liquidity switch | 13:30 | Treasury Ops |
| User notifications | 13:45 | Support |
| Webhook replays | 14:30 | Engineering Ops |

6. Lessons Learned

  • Auto-failover is critical for major corridors.

  • Faster webhook failure alerts needed.

7. Preventative and Remediation Actions

| Action Item | Owner | Deadline |
|:------------|:------|:---------|
| Add second bank partner | Engineering + Treasury | 15 May 2025 |
| Tighten webhook alert thresholds | Engineering Ops | 2 May 2025 |
| Approve user delay communication templates | Support | 30 April 2025 |

8. Communication Summary

| Audience | Channel | Message Sent |
|:---------|:--------|:-------------|
| Affected Users | Email, In-App | During incident |
| Internal | Slack | Continuous |
| Partners | Direct Email | Post-resolution |
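
Pipe tables like the ones in this Markdown version can also be generated from structured rows, which helps when postmortems are filled in from incident tooling rather than by hand. The helper below is a minimal sketch under that assumption: `render_table` is a hypothetical name, and it does not escape `|` characters inside cells.

```python
from typing import List, Sequence


def render_table(headers: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Render rows as a left-aligned Markdown pipe table."""

    def render_row(cells: Sequence[str]) -> str:
        return "| " + " | ".join(cells) + " |"

    for row in rows:
        if len(row) != len(headers):
            raise ValueError("every row needs one cell per header")

    # Left-aligned separator sized to each header, e.g. "|:------|" for "Field".
    separator = "|" + "|".join(":" + "-" * (len(h) + 1) for h in headers) + "|"
    lines: List[str] = [render_row(headers), separator]
    lines.extend(render_row(row) for row in rows)
    return "\n".join(lines)


overview = render_table(
    ["Field", "Details"],
    [
        ["Incident Title", "NIP Corridor Payout Delays - 24 April 2025"],
        ["Severity Level", "High"],
    ],
)
print(overview)
```

Passing an empty `rows` list produces just the header and separator lines, which matches the blank template tables.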