
Dead letter queue

Inspect, replay, and purge failed deliveries using the per-subscription and system DLQs.

When npayload cannot deliver a message after all retry attempts, it moves the message to a dead letter queue (DLQ). The DLQ gives you visibility into delivery failures and tools to fix and replay them.

How the DLQ works

npayload maintains two levels of dead letter queues:

  • Per-subscription DLQ. Each subscription has its own DLQ. When a delivery fails for that subscription, the message lands here. This lets you debug failures specific to one endpoint or consumer.
  • System DLQ. Catches messages that fail due to system-level issues (invalid channel configuration, serialization errors). These are rare but important to monitor.

What triggers a DLQ entry

| Trigger | Description |
| --- | --- |
| Retries exhausted | Webhook endpoint returned errors for all retry attempts |
| Permanent rejection | Endpoint returned 4xx (not retried, sent to DLQ immediately) |
| Circuit breaker timeout | Message exceeded the maximum queue time while the circuit was open |
| Serialization failure | Message could not be serialized for delivery |
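The triggers above can be read as a small decision function. The sketch below is only an illustration of that logic: the `DeliveryOutcome` type, its field names, and the `decide` function are hypothetical and are not part of the npayload SDK, and the service's actual rules may be more detailed.

```typescript
// Hypothetical model of a finished delivery attempt; npayload's internal types may differ.
type DeliveryOutcome =
  | { kind: 'http'; status: number; attempt: number; maxAttempts: number }
  | { kind: 'circuit_open'; queuedMs: number; maxQueueMs: number }
  | { kind: 'serialization_error' };

type Decision = 'delivered' | 'retry' | 'dlq';

function decide(outcome: DeliveryOutcome): Decision {
  switch (outcome.kind) {
    case 'serialization_error':
      // The message could not be serialized, so retrying cannot help.
      return 'dlq';
    case 'circuit_open':
      // Waiting is fine until the maximum queue time is exceeded.
      return outcome.queuedMs > outcome.maxQueueMs ? 'dlq' : 'retry';
    case 'http':
      if (outcome.status >= 200 && outcome.status < 300) return 'delivered';
      // 4xx is treated as a permanent rejection: no retry.
      if (outcome.status >= 400 && outcome.status < 500) return 'dlq';
      // Anything else (for example 5xx) is retried until attempts run out.
      return outcome.attempt >= outcome.maxAttempts ? 'dlq' : 'retry';
  }
}
```

Under these assumptions, a 503 on attempt 2 of 5 is retried, while a 404 on the first attempt goes straight to the DLQ.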

Inspecting DLQ entries

List entries to see what failed and why.

// List per-subscription DLQ entries
const entries = await npayload.dlq.list({
  subscriptionGid: 'sub_abc123',
  limit: 25,
});

for (const entry of entries.items) {
  console.log(entry.gid);              // DLQ entry ID
  console.log(entry.messageGid);       // Original message ID
  console.log(entry.channel);          // Source channel
  console.log(entry.failureReason);    // Why delivery failed
  console.log(entry.lastAttemptAt);    // When the last attempt was made
  console.log(entry.attemptCount);     // Total delivery attempts
  console.log(entry.payload);          // Original message payload
}

// List system DLQ entries
const systemEntries = await npayload.dlq.listSystem({ limit: 25 });
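When a DLQ holds many entries, a count per failure reason is often a quicker starting point than reading entries one by one. The helper below is a sketch that assumes `failureReason` is a plain string; `DlqEntry` and `countByReason` are illustrative names, not SDK exports.

```typescript
// Hypothetical minimal shape; the entries returned by the SDK carry more fields.
interface DlqEntry {
  gid: string;
  failureReason: string;
}

// Count how many entries share each failure reason.
function countByReason(entries: DlqEntry[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    counts.set(entry.failureReason, (counts.get(entry.failureReason) ?? 0) + 1);
  }
  return counts;
}
```

With the listing call shown earlier, this would be used as `countByReason(entries.items)`.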

Replaying failed deliveries

After fixing the underlying issue (endpoint deployed, configuration corrected), replay messages from the DLQ.

Replay a single entry

await npayload.dlq.replay(entry.gid);

The message is re-delivered through the normal delivery pipeline with a fresh set of retries.

Bulk replay

Replay all entries for a subscription, or all entries matching a filter.

// Replay all entries for a subscription
const result = await npayload.dlq.replayAll({
  subscriptionGid: 'sub_abc123',
});
console.log(result.replayed); // Number of entries replayed

// Replay entries from a specific channel
const result2 = await npayload.dlq.replayAll({
  channel: 'orders',
});

Replayed messages go through the full delivery pipeline including retries. If the underlying issue is not resolved, they will return to the DLQ.
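If only some failures have been fixed, you may prefer to replay a subset rather than the whole DLQ. The sketch below filters entries by failure reason on the client side and replays them one at a time. It relies only on the single-entry `replay(gid)` call shown above; `DlqReplayer`, `selectReplayable`, and `replayEach` are hypothetical helper names, and matching on the exact `failureReason` string is an assumption.

```typescript
// Hypothetical minimal shapes; the real SDK types may carry more fields.
interface DlqEntry {
  gid: string;
  failureReason: string;
}
interface DlqReplayer {
  replay(gid: string): Promise<unknown>;
}

// Pick the entries whose failure reason is one you believe is now fixed.
function selectReplayable(entries: DlqEntry[], fixedReasons: string[]): string[] {
  const fixed = new Set(fixedReasons);
  return entries.filter((e) => fixed.has(e.failureReason)).map((e) => e.gid);
}

// Replay one entry at a time so a single failure does not abort the batch.
async function replayEach(
  dlq: DlqReplayer,
  gids: string[],
): Promise<{ replayed: string[]; failed: string[] }> {
  const replayed: string[] = [];
  const failed: string[] = [];
  for (const gid of gids) {
    try {
      await dlq.replay(gid);
      replayed.push(gid);
    } catch {
      failed.push(gid);
    }
  }
  return { replayed, failed };
}
```

A call such as `replayEach(npayload.dlq, selectReplayable(entries.items, ['connection_refused']))` would then replay only the entries that failed while the endpoint was down.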

DLQ alerts and monitoring

Monitor your DLQ to catch integration issues early. The SDK provides methods to check DLQ depth.

// Get DLQ stats for a subscription
const stats = await npayload.dlq.getStats({
  subscriptionGid: 'sub_abc123',
});

console.log(stats.totalEntries);    // Total entries in the DLQ
console.log(stats.oldestEntry);     // Timestamp of the oldest entry

Set up monitoring based on these stats. A growing DLQ indicates a persistent delivery problem.

// Example: alert if DLQ exceeds threshold
// alertOpsTeam is a placeholder for your own notification helper.
const currentStats = await npayload.dlq.getStats({
  subscriptionGid: 'sub_abc123',
});

if (currentStats.totalEntries > 100) {
  await alertOpsTeam('DLQ threshold exceeded', {
    subscription: 'sub_abc123',
    entries: currentStats.totalEntries,
  });
}
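Depth is not the only useful signal: a small DLQ whose oldest entry has been waiting for hours also deserves attention. The following sketch checks both. It assumes `oldestEntry` is an ISO 8601 timestamp string (or `null` when the DLQ is empty), which the source does not specify; `Thresholds` and `dlqAlerts` are illustrative names.

```typescript
// Assumed shape of the stats object; the type of oldestEntry is a guess.
interface DlqStats {
  totalEntries: number;
  oldestEntry: string | null;
}
interface Thresholds {
  maxEntries: number;
  maxAgeMs: number;
}

// Return a human-readable message for every threshold that is exceeded.
function dlqAlerts(stats: DlqStats, limits: Thresholds, now: Date = new Date()): string[] {
  const alerts: string[] = [];
  if (stats.totalEntries > limits.maxEntries) {
    alerts.push(`DLQ depth ${stats.totalEntries} exceeds ${limits.maxEntries}`);
  }
  if (stats.oldestEntry !== null) {
    const ageMs = now.getTime() - new Date(stats.oldestEntry).getTime();
    if (ageMs > limits.maxAgeMs) {
      alerts.push(`Oldest DLQ entry is ${Math.round(ageMs / 60000)} minutes old`);
    }
  }
  return alerts;
}
```

Running this on a schedule and forwarding any returned messages to your alerting channel gives a basic depth-and-age monitor.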

Purging entries

Remove DLQ entries you have investigated and do not need to replay.

// Purge a single entry
await npayload.dlq.purge(entry.gid);

// Purge all entries for a subscription
await npayload.dlq.purgeAll({
  subscriptionGid: 'sub_abc123',
});

Purging is permanent. The message data is deleted and cannot be recovered. If you might need the data later, replay the entries to a logging channel before purging.
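One way to keep a record before a permanent purge is to write each entry to your own storage first and purge only after the write succeeds. The sketch below takes that approach instead of replaying to a logging channel. It uses the single-entry `purge(gid)` call shown above; `DlqPurger`, `ArchiveSink`, and `archiveThenPurge` are hypothetical names, and the sink could be a file, a database, or an object store.

```typescript
// Hypothetical minimal shapes; the real SDK types may carry more fields.
interface DlqEntry {
  gid: string;
  messageGid: string;
  failureReason: string;
  payload: unknown;
}
interface DlqPurger {
  purge(gid: string): Promise<unknown>;
}
// Receives one JSON line per entry and persists it somewhere durable.
type ArchiveSink = (line: string) => Promise<void>;

async function archiveThenPurge(
  dlq: DlqPurger,
  entries: DlqEntry[],
  sink: ArchiveSink,
): Promise<number> {
  let purged = 0;
  for (const entry of entries) {
    // If the sink throws, the loop stops and this entry is not purged.
    await sink(JSON.stringify(entry));
    await dlq.purge(entry.gid);
    purged++;
  }
  return purged;
}
```

Because the archive write comes first, a failure leaves the remaining entries in the DLQ rather than deleting data that was never saved.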

Common failure patterns

| Pattern | Cause | Resolution |
| --- | --- | --- |
| All entries show connection_refused | Endpoint is down | Deploy or restart the endpoint, then replay |
| All entries show timeout | Endpoint too slow | Optimize endpoint response time or increase timeout |
| Entries show 401 or 403 | Auth credentials expired | Update webhook headers or secrets, then replay |
| Mixed 5xx errors | Intermittent failures | Check endpoint logs, fix the bug, then replay |
| Entries show 400 | Payload format mismatch | Check your consumer's input validation, update if needed |
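The table above can be encoded as a lookup that turns a failure reason into a suggested next step, for example to annotate alerts. This is a sketch only: the exact format of `failureReason` is not documented here, so the function assumes the string contains either `connection_refused`, `timeout`, or a three-digit HTTP status code somewhere in it. `suggestResolution` is an illustrative name.

```typescript
// Map a failure reason string to the resolution from the table above.
// The string format is an assumption; adjust the matching to what your entries contain.
function suggestResolution(failureReason: string): string {
  const reason = failureReason.toLowerCase();
  if (reason.includes('connection_refused')) {
    return 'Deploy or restart the endpoint, then replay';
  }
  if (reason.includes('timeout')) {
    return 'Optimize endpoint response time or increase timeout';
  }
  // Look for a standalone three-digit status code such as 401 or 503.
  const status = Number(reason.match(/(?:^|\D)([1-5]\d{2})(?:\D|$)/)?.[1]);
  if (status === 401 || status === 403) {
    return 'Update webhook headers or secrets, then replay';
  }
  if (status === 400) {
    return "Check your consumer's input validation, update if needed";
  }
  if (status >= 500 && status <= 599) {
    return 'Check endpoint logs, fix the bug, then replay';
  }
  return 'No known pattern; inspect the entry manually';
}
```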

Best practices

  • Monitor DLQ depth for every subscription. A non-empty DLQ should always trigger an investigation.
  • Replay entries promptly after fixing issues. Messages in the DLQ are subject to the channel's retention policy, so they may not stay available indefinitely.
  • Use the system DLQ as a health indicator. Entries here often point to configuration issues.
  • Purge entries only after you have either replayed them or confirmed they are no longer needed.
  • Build idempotent consumers so that replayed messages are processed safely.
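A common way to make a consumer idempotent is to remember which message IDs it has already handled and skip repeats. The sketch below wraps a handler with that check. It assumes the consumer receives the original message ID as `messageGid`; `IncomingMessage` and `makeIdempotent` are hypothetical names, and the in-memory `Set` stands in for a durable store such as a database table with a unique key.

```typescript
// Hypothetical shape of what a consumer receives.
interface IncomingMessage {
  messageGid: string;
  payload: unknown;
}

// Wrap a handler so that each message ID is processed at most once.
// Returns true if the handler ran, false if the message was a duplicate.
function makeIdempotent(
  handler: (message: IncomingMessage) => void,
  seen: Set<string> = new Set<string>(),
): (message: IncomingMessage) => boolean {
  return (message) => {
    if (seen.has(message.messageGid)) {
      return false;
    }
    handler(message);
    // Record the ID only after the handler succeeds, so a failed attempt can be retried.
    seen.add(message.messageGid);
    return true;
  };
}
```

With this wrapper, a message that was already processed before it was replayed from the DLQ is acknowledged without repeating its side effects.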
