WHAT HAPPENED?
From 9:35AM EDT on Sunday, April 29th until 9:21AM EDT on Monday, April 30th, a single endpoint of the Harvest API, https://harvest.greenhouse.io/v1/applications, returned 500 errors for all requests from a subset of our customers.
WHO WAS AFFECTED?
Customers who were notified that they were affected by the database maintenance on Saturday, April 28th, and who also use the Harvest API Applications endpoint, were affected.
This issue did not affect any other endpoints in the Harvest API or any endpoints in the Job Board API.
WHAT WAS THE CAUSE?
On Saturday, April 28th, we performed scheduled maintenance on one of our databases. On Sunday, April 29th, while performing follow-up work to Saturday’s maintenance, we updated the database configuration for several of our applications, but failed to update a piece of the configuration that affects the Harvest API Applications endpoint.
For performance reasons, the Harvest API Applications endpoint is served separately from the rest of the Harvest API. While we have monitoring on the Harvest API in general, those monitors did not cover the Applications endpoint specifically, so this misconfiguration went unnoticed until the morning of Monday, April 30th.
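As a rough illustration, per-endpoint monitoring could be sketched as follows. This is a hypothetical Python example, not a description of the monitoring actually deployed: the function names, the `UNREACHABLE` sentinel, and the list of URLs to probe are all assumptions, and authentication is omitted for brevity. The idea is simply that each endpoint is probed individually, so a failure in a separately served endpoint cannot hide behind a healthy response from the rest of the API.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable

# Hypothetical sentinel for "no HTTP response at all" (DNS failure, timeout, etc.).
UNREACHABLE = 0


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to `url`.

    Authentication headers are deliberately left out of this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return err.code
    except OSError:
        # Covers URLError and socket timeouts: the endpoint gave no response.
        return UNREACHABLE


def failing_endpoints(
    urls: Iterable[str],
    fetch_status: Callable[[str], int] = http_status,
) -> Dict[str, int]:
    """Probe every URL on its own and return those that look unhealthy.

    An endpoint counts as failing if it is unreachable or returns a 5xx status.
    """
    failures: Dict[str, int] = {}
    for url in urls:
        status = fetch_status(url)
        if status == UNREACHABLE or status >= 500:
            failures[url] = status
    return failures


# Example usage (makes real network requests, so it is left commented out):
# failures = failing_endpoints(["https://harvest.greenhouse.io/v1/applications"])
# if failures:
#     print("Endpoints needing attention:", failures)
```

Passing `fetch_status` as a parameter keeps the check testable without network access, and listing each endpoint explicitly makes a gap in coverage visible in the configuration rather than only during an incident.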
WHAT ARE WE DOING TO PREVENT THIS FROM OCCURRING AGAIN?
We have already added monitors for the Harvest API Applications endpoint specifically, and we will analyze our other API endpoints to ensure they are all fully monitored. We take the reliability of our software very seriously, and are committed to making changes to prevent similar issues from occurring again.
Please accept our apologies for any inconvenience caused. If you have any questions, please reach out to your Customer Success Manager or create a ticket at https://www.greenhouse.io/support.