Greenhouse Recruiting down for some customers

Incident Report for Greenhouse

Postmortem

WHAT HAPPENED?

At 3:36pm ET on 1/23/2019, we made a change to Greenhouse Recruiting that included an update to our primary database. The database update degraded performance, and as a result we began serving errors at an increased rate.

Around 4:40pm ET, we applied a fix for the performance issues, and by 4:44pm ET error rates for Greenhouse Recruiting had returned to normal.

WHAT WAS THE EFFECT?

For the duration of the incident, a subset of users received errors when trying to access Greenhouse Recruiting.

A subset of Harvest API write requests and Partner API requests made during the incident also returned errors at an increased rate.

Outgoing webhooks in Greenhouse Recruiting may have been delayed during this period.

WHO WAS AFFECTED?

Most customers who access Greenhouse Recruiting through https://app.greenhouse.io either encountered increased errors or were unable to access the application at all.

Customers who access Greenhouse Recruiting through https://app2.greenhouse.io were unaffected by this incident.

About 50% of customers who access Greenhouse Recruiting through a company subdomain (e.g. https://mycompany.greenhouse.io) either encountered increased errors or were unable to access the application at all.

Candidates applying through Greenhouse Job Boards or the Job Board API were unaffected. Greenhouse Onboarding and Harvest API read requests were also unaffected.

WHAT WAS THE CAUSE?

The incident was caused by a change we made to a table in our database. A database can usually fetch the same result in several ways, and it uses a query planner to pick the most efficient one. After we made the change, the query planner began choosing inefficient plans for queries against the modified table. Because our application queries this table frequently, the bad query plans resulted in query timeouts and increased database utilization.

After we isolated the root cause, we updated the statistics for that table and the query planner began choosing more efficient query plans. Soon after, site performance returned to normal.
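
For readers unfamiliar with table statistics, the following is a minimal sketch of the general technique, not a reproduction of the fix we applied. This report does not name the database engine or the affected table, so the sketch uses SQLite from the Python standard library as a stand-in. The `applications` table, its `job_id` column, and the index name are all hypothetical. In SQLite, `EXPLAIN QUERY PLAN` shows the plan the planner would choose, and `ANALYZE` recomputes the statistics the planner relies on. Other engines offer comparable commands with their own syntax.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the human-readable plan step.
    return [row[-1] for row in rows]


def refresh_statistics(conn, table):
    """Recompute planner statistics for one table and return the stored rows."""
    # Table names cannot be bound as parameters, so validate before formatting.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    conn.execute(f"ANALYZE {table}")
    # SQLite keeps the gathered statistics in the sqlite_stat1 table.
    return conn.execute(
        "SELECT idx, stat FROM sqlite_stat1 WHERE tbl = ?", (table,)
    ).fetchall()


def build_demo_database():
    """Create an in-memory database with a hypothetical `applications` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE applications "
        "(id INTEGER PRIMARY KEY, job_id INTEGER, status TEXT)"
    )
    conn.execute("CREATE INDEX idx_applications_job_id ON applications (job_id)")
    conn.executemany(
        "INSERT INTO applications (job_id, status) VALUES (?, ?)",
        [(i % 50, "active" if i % 3 else "rejected") for i in range(5000)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_database()
    sql = "SELECT * FROM applications WHERE job_id = 7"
    print("plan before:", query_plan(conn, sql))
    print("statistics: ", refresh_statistics(conn, "applications"))
    print("plan after: ", query_plan(conn, sql))
```

In a small demo like this the plan may look the same before and after, since the planner's defaults are already reasonable here. The point is only to show where the statistics live and how they are refreshed. Plans tend to go wrong when the stored statistics no longer describe the table, for example after a schema or data change.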

WHAT ARE WE DOING TO PREVENT THIS FROM OCCURRING AGAIN?

We will review our database change management process to prevent similar incidents from occurring.

We take the reliability of our software very seriously, and are committed to making changes to prevent similar issues from occurring again. Please accept our apologies for any inconvenience caused. If you have any questions or concerns, please reach out via: https://support.greenhouse.io/hc/en-us/requests/new

Posted Jan 24, 2019 - 18:58 UTC

Resolved

Our monitoring system has indicated that error rates in Greenhouse Recruiting have returned to normal. We deeply apologize for the inconvenience that this incident has caused.

We will be issuing a detailed post-mortem tomorrow.

Posted Jan 23, 2019 - 23:57 UTC

Update

We are continuing to monitor for any further issues.

Posted Jan 23, 2019 - 22:58 UTC

Update

We are continuing to monitor for any further issues.

Posted Jan 23, 2019 - 22:38 UTC

Update

We are continuing to monitor for any further issues.

Posted Jan 23, 2019 - 22:18 UTC

Monitoring

Service has been restored. We are monitoring performance and will provide a full post-mortem shortly.

Posted Jan 23, 2019 - 21:57 UTC

Update

The Greenhouse infrastructure team has identified an issue in the database that is blocking additional web traffic. We're continuing to work on getting the database healthy.

Posted Jan 23, 2019 - 21:22 UTC

Identified

The issue has been identified and a fix is being implemented.

Posted Jan 23, 2019 - 20:50 UTC

Update

We are continuing to investigate this issue.

Posted Jan 23, 2019 - 20:49 UTC

Investigating

We are currently investigating this issue.

Posted Jan 23, 2019 - 20:46 UTC

This incident affected: Greenhouse Harvest API (Silo 1), Greenhouse Business Intelligence Connector (Silo 1), Greenhouse Recruiting (Document conversion, Silo 1), Greenhouse Job Boards, Greenhouse Onboarding, and Greenhouse Developer Portal.