January 27, 2013: All systems functioning normally again

As of 2:53 PM on January 27, 2013, all systems are functioning normally again. We had been experiencing intermittent issues across our network.

Here’s what happened. 

One of our 4 memcached servers ran out of memory and locked up in the process. As a result, our database servers were seeing 8x their average number of calls. Because our monitors alerted us to higher-than-normal database activity, we started investigating the issue there.

What did we learn?

It turns out our monitoring of the memcached systems isn’t as good as we thought it was. Had we known that one of the memcached servers was out of commission, we would have been able to identify the problem and fix it, rather than investigating what was causing the spike in database usage.
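
To illustrate the kind of check that would have pointed us at the cache sooner, here is a minimal sketch of a memcached health probe. It is not our actual monitoring setup. It sends the standard `stats` command over memcached's text protocol and reads the `bytes` and `limit_maxbytes` counters. The host names, the 95% threshold, and the function names (`parse_stats`, `classify`, `fetch_stats`, `check_server`) are assumptions made up for this example.

```python
import socket


def parse_stats(raw):
    """Turn a memcached 'stats' response into a dict of name -> value strings."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def classify(stats, max_fill=0.95):
    """Return "OK" or a warning string based on memory use (threshold is an assumption)."""
    used = int(stats.get("bytes", 0))
    limit = int(stats.get("limit_maxbytes", 0))
    if limit and used / limit >= max_fill:
        return "WARN: %d of %d bytes in use" % (used, limit)
    return "OK"


def fetch_stats(host, port=11211, timeout=2.0):
    """Ask one memcached server for its stats; raises OSError if it does not answer."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        data = b""
        # The server terminates the stats listing with a line containing END.
        while not data.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    return parse_stats(data.decode("ascii", "replace"))


def check_server(host, port=11211):
    """Report DOWN if the server refuses or times out, otherwise classify its stats."""
    try:
        return classify(fetch_stats(host, port))
    except OSError as exc:  # covers refused connections and socket timeouts
        return "DOWN: %s" % exc


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for name in ["cache1.example.internal", "cache2.example.internal"]:
        print(name, check_server(name))
```

A server that has locked up would typically fail to answer within the timeout and show up as DOWN, which is the signal we were missing.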

 

 
