Sunday 19th February 2017

(Sunday 19th February 2017 23:44:00 UTC)
API February 19th high latency incident post-mortem

We've identified and resolved the root cause of the high latency: the primary database server had run out of memory.

Within a few moments of the incident, a secondary database server took over as the primary and began accepting read/write queries from the API. It took the secondary a few minutes to work through the queued requests and bring latency back down to normal levels.

Once we confirmed the secondary server was stable and latency was back within acceptable levels, we increased the original primary server's available memory and began a full data resync, which took around 40 minutes to complete.

We will continuously monitor the database servers' resource utilization, with particular attention to memory usage, to prevent a similar incident from happening in the future.
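
As an illustration of the kind of check we have in mind, the short Python sketch below polls a host's memory utilization and raises an alert when it crosses a threshold. This is not a description of our actual monitoring stack; the psutil library, the 85% threshold, the poll interval, and the notify() hook are all assumptions made for the example.

    # Illustrative sketch only: poll memory utilization and alert when it
    # crosses a threshold. Threshold, interval, and notify() are assumed
    # values, not our production configuration.
    import time

    import psutil  # third-party: pip install psutil

    MEMORY_ALERT_THRESHOLD = 85.0  # alert above 85% memory in use (assumed)
    POLL_INTERVAL_SECONDS = 60     # check once a minute (assumed)


    def notify(message: str) -> None:
        # Placeholder for a real alerting hook (pager, chat, email, ...).
        print(message)


    def check_memory_once() -> None:
        usage = psutil.virtual_memory()
        if usage.percent >= MEMORY_ALERT_THRESHOLD:
            notify(f"Database host memory at {usage.percent:.1f}% "
                   f"({usage.available // (1024 ** 2)} MiB available)")


    if __name__ == "__main__":
        while True:
            check_memory_once()
            time.sleep(POLL_INTERVAL_SECONDS)

In practice a check like this would feed an alerting system rather than print to stdout, so that on-call engineers are paged well before the database exhausts its memory again.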