Dead letter queue
Inspect, replay, and purge failed deliveries with per-subscription and system DLQs
When npayload cannot deliver a message after all retry attempts, it moves the message to a dead letter queue (DLQ). The DLQ gives you visibility into delivery failures and tools to fix and replay them.
How the DLQ works
npayload maintains two levels of dead letter queues:
- Per-subscription DLQ. Each subscription has its own DLQ. When a delivery fails for that subscription, the message lands here. This lets you debug failures specific to one endpoint or consumer.
- System DLQ. Catches messages that fail due to system-level issues (invalid channel configuration, serialization errors). These are rare but important to monitor.
What triggers a DLQ entry
| Trigger | Description |
|---|---|
| Retries exhausted | Webhook endpoint returned errors for all retry attempts |
| Permanent rejection | Endpoint returned 4xx (not retried, sent to DLQ immediately) |
| Circuit breaker timeout | Message exceeded the maximum queue time while the circuit was open |
| Serialization failure | Message could not be serialized for delivery |
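The first two rows of the table can be read as a small decision function. The sketch below is illustrative only: `classifyAttempt`, the `maxAttempts` parameter, and the use of status `0` for a network error are assumptions made for this example, not npayload's internal logic.

```typescript
// Hypothetical model of the trigger table; not npayload's internal logic.
type AttemptOutcome =
  | 'delivered'
  | 'retry'
  | 'dlq_permanent_rejection'
  | 'dlq_retries_exhausted';

// `status` is the HTTP status of the latest attempt (0 stands in for a network error).
// `attempt` is 1-based; `maxAttempts` is an assumed retry limit.
function classifyAttempt(
  status: number,
  attempt: number,
  maxAttempts: number
): AttemptOutcome {
  if (status >= 200 && status < 300) return 'delivered';
  // Per the table, 4xx responses skip retries and go straight to the DLQ.
  if (status >= 400 && status < 500) return 'dlq_permanent_rejection';
  // Everything else is retried until the attempts run out.
  return attempt >= maxAttempts ? 'dlq_retries_exhausted' : 'retry';
}

console.log(classifyAttempt(503, 2, 5)); // retry
console.log(classifyAttempt(404, 1, 5)); // dlq_permanent_rejection
```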
Inspecting DLQ entries
List entries to see what failed and why.
// List per-subscription DLQ entries
const entries = await npayload.dlq.list({
subscriptionGid: 'sub_abc123',
limit: 25,
});
for (const entry of entries.items) {
console.log(entry.gid); // DLQ entry ID
console.log(entry.messageGid); // Original message ID
console.log(entry.channel); // Source channel
console.log(entry.failureReason); // Why delivery failed
console.log(entry.lastAttemptAt); // When the last attempt was made
console.log(entry.attemptCount); // Total delivery attempts
console.log(entry.payload); // Original message payload
}

// List system DLQ entries
const systemEntries = await npayload.dlq.listSystem({ limit: 25 });

Replaying failed deliveries
After fixing the underlying issue (endpoint deployed, configuration corrected), replay messages from the DLQ.
Replay a single entry
await npayload.dlq.replay(entry.gid);

The message is re-delivered through the normal delivery pipeline with a fresh set of retries.
Bulk replay
Replay all entries for a subscription, or all entries matching a filter.
// Replay all entries for a subscription
const result = await npayload.dlq.replayAll({
subscriptionGid: 'sub_abc123',
});
console.log(result.replayed); // Number of entries replayed
// Replay entries from a specific channel
const result2 = await npayload.dlq.replayAll({
channel: 'orders',
});

Replayed messages go through the full delivery pipeline including retries. If the underlying issue is not resolved, they will return to the DLQ.
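Because unresolved failures return to the DLQ, it can help to replay only the entries whose failure reason matches the issue you fixed. The sketch below is one way to pick them from a listed page of entries. `selectReplayable`, the sample gids, and the exact `failureReason` strings are assumptions for illustration; check the values your entries actually carry.

```typescript
// Shape borrowed from the entry fields shown earlier; only the two used here.
interface DlqEntryRef {
  gid: string;
  failureReason: string;
}

// Returns the gids of entries whose failure reason is one you believe is now fixed.
function selectReplayable(
  entries: DlqEntryRef[],
  fixedReasons: string[]
): string[] {
  return entries
    .filter((entry) => fixedReasons.indexOf(entry.failureReason) !== -1)
    .map((entry) => entry.gid);
}

// Hypothetical sample data.
const sampleEntries: DlqEntryRef[] = [
  { gid: 'dlq_1', failureReason: 'connection_refused' },
  { gid: 'dlq_2', failureReason: '400' },
  { gid: 'dlq_3', failureReason: 'connection_refused' },
];

// The endpoint is back up, so pick only the connection failures.
const toReplay = selectReplayable(sampleEntries, ['connection_refused']);
console.log(toReplay); // [ 'dlq_1', 'dlq_3' ]
```

Each selected gid can then be passed to `npayload.dlq.replay(gid)`, leaving the remaining entries in the DLQ for separate investigation.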
DLQ alerts and monitoring
Monitor your DLQ to catch integration issues early. The SDK provides methods to check DLQ depth.
// Get DLQ stats for a subscription
const stats = await npayload.dlq.getStats({
subscriptionGid: 'sub_abc123',
});
console.log(stats.totalEntries); // Total entries in the DLQ
console.log(stats.oldestEntry); // Timestamp of the oldest entry

Set up monitoring based on these stats. A growing DLQ indicates a persistent delivery problem.
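A fixed threshold catches a large backlog, but "growing" is a trend. One possible check, sketched here under the assumption that a scheduled job records `stats.totalEntries` at intervals, compares consecutive readings. `DepthSample` and `isDlqGrowing` are hypothetical names, not part of the SDK.

```typescript
// One reading of DLQ depth, e.g. `stats.totalEntries` captured by a scheduled job.
interface DepthSample {
  takenAt: number; // epoch milliseconds
  totalEntries: number;
}

// True when depth rose across every consecutive pair of samples (oldest first).
// Requires at least `minSamples` readings (never fewer than 2) so that a single
// reading cannot count as growth.
function isDlqGrowing(samples: DepthSample[], minSamples: number = 3): boolean {
  const required = Math.max(2, minSamples);
  if (samples.length < required) return false;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i].totalEntries <= samples[i - 1].totalEntries) return false;
  }
  return true;
}

// Hypothetical readings taken one minute apart.
const readings: DepthSample[] = [
  { takenAt: 0, totalEntries: 4 },
  { takenAt: 60000, totalEntries: 9 },
  { takenAt: 120000, totalEntries: 15 },
];
console.log(isDlqGrowing(readings)); // true
```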
// Example: alert if DLQ exceeds threshold
const stats = await npayload.dlq.getStats({
subscriptionGid: 'sub_abc123',
});
if (stats.totalEntries > 100) {
await alertOpsTeam('DLQ threshold exceeded', {
subscription: 'sub_abc123',
entries: stats.totalEntries,
});
}

Purging entries
Remove DLQ entries you have investigated and do not need to replay.
// Purge a single entry
await npayload.dlq.purge(entry.gid);
// Purge all entries for a subscription
await npayload.dlq.purgeAll({
subscriptionGid: 'sub_abc123',
});

Purging is permanent. The message data is deleted and cannot be recovered. If you might need the data later, replay the entries to a logging channel before purging.
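If a logging channel is not available, another option is to snapshot the entries yourself before purging. The sketch below serializes listed entries as newline-delimited JSON. `ArchivableEntry` and `toArchiveLines` are hypothetical, the interface includes only fields shown earlier on this page, and where you store the output is up to you.

```typescript
// Subset of the DLQ entry fields shown earlier; `payload` is kept opaque.
interface ArchivableEntry {
  gid: string;
  messageGid: string;
  channel: string;
  failureReason: string;
  payload: unknown;
}

// Serializes entries as newline-delimited JSON, one entry per line.
// JSON.stringify escapes embedded newlines, so each entry stays on one line.
function toArchiveLines(entries: ArchivableEntry[]): string {
  return entries.map((entry) => JSON.stringify(entry)).join('\n');
}

// Hypothetical sample data.
const archive = toArchiveLines([
  { gid: 'dlq_1', messageGid: 'msg_1', channel: 'orders', failureReason: 'timeout', payload: { id: 1 } },
  { gid: 'dlq_2', messageGid: 'msg_2', channel: 'orders', failureReason: 'timeout', payload: 'line1\nline2' },
]);
console.log(archive.split('\n').length); // 2
```

Write the result to durable storage and confirm the write succeeded before calling `npayload.dlq.purge` or `npayload.dlq.purgeAll`.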
Common failure patterns
| Pattern | Cause | Resolution |
|---|---|---|
| All entries show connection_refused | Endpoint is down | Deploy or restart the endpoint, then replay |
| All entries show timeout | Endpoint too slow | Optimize endpoint response time or increase timeout |
| Entries show 401 or 403 | Auth credentials expired | Update webhook headers or secrets, then replay |
| Mixed 5xx errors | Intermittent failures | Check endpoint logs, fix the bug, then replay |
| Entries show 400 | Payload format mismatch | Check your consumer's input validation, update if needed |
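To tell an "all entries show X" pattern from a mixed one, it can help to tally a page of entries by failure reason. The following is a minimal sketch: `tallyByReason` and `uniformReason` are hypothetical helpers, and the reason strings in the sample are assumptions about what `failureReason` contains.

```typescript
// Only the field needed for triage.
interface FailedEntry {
  failureReason: string;
}

// Counts entries per failure reason.
function tallyByReason(entries: FailedEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of entries) {
    counts[entry.failureReason] = (counts[entry.failureReason] || 0) + 1;
  }
  return counts;
}

// Returns the single reason shared by every entry, or null when the
// reasons are mixed or the list is empty.
function uniformReason(entries: FailedEntry[]): string | null {
  const reasons = Object.keys(tallyByReason(entries));
  return reasons.length === 1 ? reasons[0] : null;
}

// Hypothetical sample data.
const failed: FailedEntry[] = [
  { failureReason: 'timeout' },
  { failureReason: 'timeout' },
  { failureReason: 'connection_refused' },
];
console.log(tallyByReason(failed)); // { timeout: 2, connection_refused: 1 }
console.log(uniformReason(failed)); // null
```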
Best practices
- Monitor DLQ depth for every subscription. A non-empty DLQ should always trigger an investigation.
- Replay entries promptly after fixing issues. Messages in the DLQ are subject to the channel's retention policy.
- Use the system DLQ as a health indicator. Entries here often point to configuration issues.
- Purge entries only after you have either replayed them or confirmed they are no longer needed.
- Build idempotent consumers so that replayed messages are processed safely.
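As a rough illustration of the last point, a consumer can skip messages it has already processed by keying on the message ID. This sketch assumes a replayed delivery carries the same ID that the DLQ entry exposes as `messageGid`; `IncomingMessage` and `makeIdempotent` are hypothetical names, and the in-memory record would be replaced by a durable store in a real consumer.

```typescript
// Assumed shape of a delivered message; only the ID matters for deduplication.
interface IncomingMessage {
  messageGid: string;
  payload: unknown;
}

// Wraps a handler so each messageGid is processed at most once per process.
// Returns true when the handler ran, false when the message was a duplicate.
function makeIdempotent(
  handler: (msg: IncomingMessage) => void
): (msg: IncomingMessage) => boolean {
  const seen: Record<string, boolean> = {};
  return (msg: IncomingMessage): boolean => {
    if (seen[msg.messageGid]) return false; // duplicate, e.g. a replay: skip
    handler(msg);
    seen[msg.messageGid] = true; // mark only after the handler succeeds
    return true;
  };
}

let processedCount = 0;
const handle = makeIdempotent(() => {
  processedCount += 1;
});

handle({ messageGid: 'msg_1', payload: { total: 10 } });
handle({ messageGid: 'msg_1', payload: { total: 10 } }); // replayed copy
console.log(processedCount); // 1
```

Marking the ID only after the handler returns means a handler that throws leaves the message eligible for another attempt.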