Category Archives: outage

Service Interruption

RESOLVED @ 3:22pm CST: The network issues have been resolved and everything is online at this time. We continue to investigate the root cause with our provider.

3:19pm CST: There is currently a problem with our hosting provider that is affecting all Pressable-hosted websites. We are actively troubleshooting the issue and will update this post with more details as they become available.

Service Interruption

RESOLVED @ 3:54 AM Central: Our hosting provider isolated the cause of the problems and made changes that should prevent further service disruptions.  We are not anticipating another loss of connectivity and will continue to monitor the situation.

UPDATE @ 3:41 AM Central: The network connectivity issues returned momentarily, but things have recovered. The root cause appears to be some network instability between Dallas and San Antonio, Texas. We will provide more details once they are available.

UPDATE @ 3:34 AM Central:  The network issues have been resolved and everything is online at this time.  We continue to investigate the root cause with our provider.

3:14 AM Central: There is currently a problem with our hosting provider that is affecting all Pressable-hosted websites. We are actively troubleshooting the issue and will update this post with more details as they become available.

Chicago Datacenter Issue – RESOLVED

Howdy,

We just finished dealing with an issue in our Chicago datacenter that was causing several other clusters to experience instability. Our “Ursa” cluster was taking on an extreme amount of traffic that looks to be, in large part, a bot attack.

This happened because a set of tools on the “Ursa” cluster that detects and mitigates issues like this was not running properly.

We’ve cleared this up, and all sites are now back up and running normally.

If you have any questions or concerns, please contact our help desk via your https://my.pressable.com control panel.

Thank you!

Rackspace Scheduled Critical Maintenance

This is a notice that Rackspace will be performing critical security-related updates to many cloud server host machines in order to patch vulnerabilities in the Xen hypervisor.

You can read more about this maintenance here:

https://community.rackspace.com/general/f/53/t/4978

These patches/updates will require host machines to be rebooted, which means the cloud servers hosted on them will be rebooted as well.

For our customers, here are the maintenance windows we have been given. We expect server reboots to occur during these windows, by cluster:

  • Hyperion, Pegasus, Cartwheel Clusters
    • Tuesday, March 3rd 01:00 – Tuesday, March 3rd 05:00 EST (COMPLETE)
  • Galaxy01, Thor, Bode, Ursa, Hydra Clusters
    • Wednesday, March 4th 22:00 – Thursday, March 5th 06:00 CST
    • Thursday, March 5th 22:00 – Friday, March 6th 02:00 CST
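
To check whether a particular moment falls inside one of the windows above, the listed times can be normalized to a single time zone. The following is only a rough sketch, not an official tool: it assumes the year is 2015 (this notice does not state one), treats EST and CST as fixed offsets of UTC-5 and UTC-6, and uses hypothetical names (`WINDOWS`, `clusters_in_maintenance`) that do not come from Pressable or Rackspace.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones; assumes the notice means standard time (no DST shift).
EST = timezone(timedelta(hours=-5), "EST")
CST = timezone(timedelta(hours=-6), "CST")

YEAR = 2015  # assumption: the notice itself does not state a year

# (clusters, window start, window end), transcribed from the list above.
SECOND_GROUP = ("Galaxy01", "Thor", "Bode", "Ursa", "Hydra")
WINDOWS = [
    (("Hyperion", "Pegasus", "Cartwheel"),
     datetime(YEAR, 3, 3, 1, 0, tzinfo=EST),
     datetime(YEAR, 3, 3, 5, 0, tzinfo=EST)),
    (SECOND_GROUP,
     datetime(YEAR, 3, 4, 22, 0, tzinfo=CST),
     datetime(YEAR, 3, 5, 6, 0, tzinfo=CST)),
    (SECOND_GROUP,
     datetime(YEAR, 3, 5, 22, 0, tzinfo=CST),
     datetime(YEAR, 3, 6, 2, 0, tzinfo=CST)),
]


def clusters_in_maintenance(moment):
    """Return the sorted names of clusters whose window contains `moment`."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    active = set()
    for clusters, start, end in WINDOWS:
        # Aware datetimes compare correctly across different UTC offsets.
        if start <= moment < end:
            active.update(clusters)
    return sorted(active)


if __name__ == "__main__":
    # 05:00 UTC on March 5th is 23:00 CST on March 4th.
    probe = datetime(YEAR, 3, 5, 5, 0, tzinfo=timezone.utc)
    print(clusters_in_maintenance(probe))
```

Passing a timezone-aware `datetime` in any zone works, since the comparison is done on absolute time rather than on the wall-clock values.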

To find out which cluster your sites are on, please see our knowledge base article on identifying your site's cluster.

We understand that these kinds of outages are not ideal, but we hope this early notice helps you notify your users, visitors, and customers.

If you have any questions, please feel free to contact the help desk via your my.pressable.com control panel.

Thank you!

All Systems Operational

We’re happy to report that all systems are back online and operational. We will publish a detailed explanation of what happened in the coming days, but this is what we can share so far.

  1. This was a coordinated attack on our systems.
  2. This attack used a modified version of the “Slow-Loris” attack against our platform.
  3. Due to the sophistication of this particular attack, it went undetected by the network security team at our provider Rackspace.  It made it look like our infrastructure was being overloaded, when it was not.
  4. We identified this as an attack at 1:00 AM on January 24th, 2015. By 5:30 AM, we had a solution in place that was blocking the majority of the attacks; this is when some customers on “Bode” started noticing their websites working again.

As of 1:30 PM on January 24th, 2015, we have the majority of the attacks blocked and have pushed the rules to block these attacks throughout our infrastructure.

We are working as fast as we can to answer tickets specific to your site, and will keep you posted.

Currently, our systems are reporting at 100%. Any issues you may be experiencing now are not related to this outage, and we encourage you to create a support ticket so we can help you.

Once again, we’re very sorry that this happened. We’re working to find out why we were targeted and by whom, but more importantly, we’re working to ensure we are protected against this in the future.

Credits/Refunds/Recompense?

We will be reaching out to all affected customers sometime next week to make this right. At the moment, we have some ideas, but our focus is on stability and prevention.

Configuration Issue Causing High Load

We are currently experiencing a configuration issue that is causing higher than normal load on some clusters in our network. This is resulting in some sites being unavailable or extremely slow to load and is unrelated to the attacks from earlier.

We will update again as soon as the load begins subsiding and this has been cleared up.

If you have questions or concerns, please submit a ticket via the https://my.pressable.com control panel.

UPDATE 5:05PM CST: At this time we’re seeing stability return to our systems. However, we’re still investigating with partners to determine the root cause of the earlier issues. We’ll provide an update when we’re confident in the stability of the systems.

UPDATE 5:53PM CST: We are continuing to see a configuration issue that is causing higher than normal load on some clusters in our network. This is resulting in some sites being unavailable or extremely slow to load and is unrelated to the attacks from earlier. We will update again as soon as the load begins subsiding and this has been cleared up.

UPDATE 6:40PM CST: Servers have again returned to normal levels, but until we’ve determined the root cause we won’t call this resolved. Please stay tuned to the status blog for updates.

UPDATE 8:55PM CST: Our team is still working with providers to determine the root cause of these issues. We’ve continued to see relatively normal operating levels across our machines, but until we’re able to determine the root cause we won’t be out of the woods. Thanks for your continued patience on a very trying day for all involved.