Overview:
We sincerely apologize for the disruption experienced by your users related to webhook completions. Below is a summary of what happened, the underlying cause, and the steps we are taking to address and prevent similar issues in the future.
What Happened:
- Issue Identified (February 2025): Webhooks were temporarily disabled to efficiently load records via the CSOD Required Learning API. Once the load was successfully completed, webhooks were re-established. However, it was discovered that CSOD retains a rolling 72-hour backlog of webhook traffic. When the connection was restored, CSOD attempted to resend all webhook events from the previous three days, resulting in an influx of data and causing delays in processing new, real-time completions.
- Fixes Implemented (February 2025): In collaboration with CSOD, the following actions were taken:
  - Raised the webhook throttling limit on Degreed’s side to accelerate processing.
  - Adjusted the webhook retention period in coordination with CSOD to prevent future bulk resends of outdated webhook events.
  - Suspended webhooks temporarily while system adjustments were made to ensure a smooth transition back to real-time processing.
- Impact on Users: As a result of the backlog, real-time completion updates were delayed before appearing in the system. No data was lost.
Root Cause:
Degreed and CSOD identified that CSOD retains webhook events in a rolling 72-hour buffer. When the webhook connection was re-established, CSOD replayed the buffered events, producing an unexpected influx of older completion records. Real-time processing was significantly delayed while the system worked through this backlog.
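To illustrate why a replayed backlog delays live completions, the arithmetic can be sketched roughly as follows. This is a simplified model, not a description of Degreed's or CSOD's systems: the function name `backlog_drain_hours` and all event rates are hypothetical, and only the 72-hour retention window comes from this summary.

```python
def backlog_drain_hours(events_per_hour, retention_hours, capacity_per_hour):
    """Estimate hours needed to clear a replayed backlog.

    Assumes a constant arrival rate and a constant processing capacity,
    with live events continuing to arrive while the backlog drains.
    """
    backlog = events_per_hour * retention_hours
    # Only capacity beyond the live arrival rate can reduce the backlog.
    spare_capacity = capacity_per_hour - events_per_hour
    if spare_capacity <= 0:
        return float("inf")  # the backlog never clears
    return backlog / spare_capacity


# Hypothetical figures, chosen only to show the effect of a higher limit.
print(backlog_drain_hours(1000, 72, 1500))  # 144.0
print(backlog_drain_hours(1000, 72, 4000))  # 24.0
```

Under these assumed numbers, raising processing capacity shortens the drain time from 144 hours to 24 hours, and a shorter retention window shrinks the backlog itself.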
Mitigation Efforts:
- Clients with real-time webhook dependencies were prioritized to ensure minimal impact.
- The throttling limit was raised so that backlogs are processed more efficiently going forward.
- Webhook retention settings were modified to prevent similar issues in future large-scale deployments.
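One general way to keep a replayed backlog from delaying live traffic is to process events that occurred after the reconnect before older, replayed ones. The sketch below shows that idea only; it is not the mechanism Degreed or CSOD used, and the names `CompletionQueue`, `QueuedEvent`, and `reconnected_at` are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(order=True)
class QueuedEvent:
    priority: int          # 0 = live event, 1 = replayed backlog event
    occurred_at: datetime  # ties within a priority are broken oldest-first
    event_id: str = field(compare=False)


class CompletionQueue:
    """Priority queue that serves live events ahead of replayed ones."""

    def __init__(self, reconnected_at):
        self.reconnected_at = reconnected_at
        self._heap = []

    def push(self, event_id, occurred_at):
        # Events older than the reconnect time are treated as replayed backlog.
        priority = 0 if occurred_at >= self.reconnected_at else 1
        heapq.heappush(self._heap, QueuedEvent(priority, occurred_at, event_id))

    def pop(self):
        return heapq.heappop(self._heap).event_id

    def __len__(self):
        return len(self._heap)


reconnect = datetime(2025, 1, 1, 12, 0)  # arbitrary timestamp for the example
queue = CompletionQueue(reconnect)
queue.push("replayed-1", reconnect - timedelta(hours=48))
queue.push("replayed-2", reconnect - timedelta(hours=2))
queue.push("live-1", reconnect + timedelta(minutes=1))

order = [queue.pop() for _ in range(len(queue))]
print(order)  # ['live-1', 'replayed-1', 'replayed-2']
```

The live event is served first even though it arrived last, so new completions are not stuck behind three days of older records.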
Next Steps:
- Improved Communication: Going forward, we will proactively notify clients of any changes that may impact webhook processing and share best practices for large-scale deployments.
- System Optimization: Degreed engineering teams are actively working on enhancements to optimize webhook traffic handling for high-volume clients.
- Client Support & Alignment: Our team is collaborating closely with CSOD and affected clients to refine webhook usage and ensure data flows efficiently without unnecessary delays.
We deeply regret any confusion and inconvenience caused by this issue and appreciate your patience as we work towards a seamless solution. If you have any additional questions or concerns, please don’t hesitate to reach out.