Rackspace Scheduled Critical Maintenance

This is a notice that Rackspace will be performing critical security-related updates to many cloud server host machines in order to patch vulnerabilities in the Xen hypervisor.

You can read more about this maintenance here:

https://community.rackspace.com/general/f/53/t/4978

These patches/updates require the host machines to be rebooted, which in turn means the cloud servers hosted on them will be rebooted as well.

For our customers, here are the maintenance windows we have been given. Expect server reboots to occur during these windows, by cluster:

  • Hyperion, Pegasus, Cartwheel Clusters
    • Tuesday, March 3rd 01:00 – Tuesday, March 3rd 05:00 EST COMPLETE
  • Galaxy01, Thor, Bode, Ursa, Hydra Clusters
    • Wednesday, March 4th 22:00 – Thursday, March 5th 06:00 CST
    • Thursday, March 5th 22:00 – Friday, March 6th 02:00 CST
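
Because the windows above are quoted in two different time zones (EST and CST), it can help to normalize them to UTC before notifying users elsewhere. The following is only a minimal sketch: the year 2015 is an assumption (the notice does not state it, although the weekdays listed match 2015), and the names `WINDOWS`, `to_utc`, and `windows_utc` are hypothetical helpers, not anything provided by Pressable or Rackspace.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets, as labeled in the notice.
EST = timezone(timedelta(hours=-5), "EST")
CST = timezone(timedelta(hours=-6), "CST")

# ASSUMPTION: the notice gives no year; 2015 matches the listed weekdays.
YEAR = 2015

# (clusters, (month, day, hour, minute) start, same for end, time zone)
WINDOWS = [
    ("Hyperion, Pegasus, Cartwheel", (3, 3, 1, 0), (3, 3, 5, 0), EST),
    ("Galaxy01, Thor, Bode, Ursa, Hydra", (3, 4, 22, 0), (3, 5, 6, 0), CST),
    ("Galaxy01, Thor, Bode, Ursa, Hydra", (3, 5, 22, 0), (3, 6, 2, 0), CST),
]


def to_utc(parts, tz):
    """Build a zone-aware datetime from (month, day, hour, minute) and convert it to UTC."""
    month, day, hour, minute = parts
    local = datetime(YEAR, month, day, hour, minute, tzinfo=tz)
    return local.astimezone(timezone.utc)


def windows_utc():
    """Return each maintenance window as (clusters, start_utc, end_utc)."""
    return [(c, to_utc(s, tz), to_utc(e, tz)) for c, s, e, tz in WINDOWS]


if __name__ == "__main__":
    for clusters, start, end in windows_utc():
        print(f"{clusters}: {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M} UTC")
```

Fixed offsets are used rather than a time zone database because the notice names standard-time abbreviations explicitly. If the windows were announced in local wall-clock time across a daylight saving change, a database-backed zone would be the safer choice.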

To find out which cluster your sites are on, please see our knowledge base article on identifying your site's cluster.

We understand that outages like these are not ideal, and we hope this early notice helps you notify your users, visitors, and customers in advance.

If you have any questions, please feel free to contact the help desk via your my.pressable.com control panel.

Thank you!

All Systems Operational

We’re happy to report that all systems are back online and operational. We will publish a detailed explanation of what happened in the coming days, but here is what we can share so far.

  1. This was a coordinated attack on our systems.
  2. This attack used a modified version of the “Slowloris” attack against our platform.
  3. Due to the sophistication of this particular attack, it went undetected by the network security team at our provider, Rackspace. It made it look like our infrastructure was being overloaded when it was not.
  4. We identified this as an attack at 1:00AM on January 24th, 2015. By 5:30AM we had a solution in place that was blocking the majority of the attacks, which is when some customers on “Bode” started noticing their websites working again.

As of 1:30PM January 24th 2015, we have the majority of the attacks blocked, and have pushed the rules to block these attacks throughout our infrastructure.

We are working as fast as we can to answer tickets specific to your site, and will keep you posted.

Currently, our systems are reporting at 100%. Any issues you may be experiencing now are not related to this outage, and we encourage you to create a support ticket so we can help you.

Once again, we’re very sorry that this happened. We’re working to find out why we were targeted and by whom but, more importantly, we’re working to ensure we are protected against this in the future.

Credits/Refunds/Recompense?

We will be reaching out to all affected customers sometime next week to make this right. At the moment we have some ideas, but our focus is on stability and prevention.

Configuration Issue Causing High Load

We are currently experiencing a configuration issue that is causing higher-than-normal load on some clusters in our network. This is resulting in some sites being unavailable or extremely slow to load and is unrelated to the attacks from earlier.

We will update again as soon as the load begins subsiding and the issue has been cleared up.

If you have questions or concerns, please submit a ticket via the https://my.pressable.com control panel.

UPDATE 5:05PM CST: At this time we’re seeing stability return to our systems. However, we’re still investigating with partners to determine the root cause of the earlier issues. We’ll provide an update when we’re confident in the stability of the systems.

UPDATE 5:53PM CST: We are continuing to see a configuration issue that is causing higher-than-normal load on some clusters in our network. This is resulting in some sites being unavailable or extremely slow to load and is unrelated to the attacks from earlier. We will update again as soon as the load begins subsiding and the issue has been cleared up.

UPDATE 6:40PM CST: Servers have again returned to normal levels, but until we’ve determined the root cause we won’t call this resolved. Please stay tuned to the status blog for updates.

UPDATE 8:55PM CST: Our team is still working with providers to determine the root cause of these issues. We’ve continued to see relatively normal operating levels across our machines, but until we’re able to determine the root cause we won’t be out of the woods. Thanks for your continued patience on a very trying day for all involved.

Network Degradation Impacting Site Availability

We’re currently experiencing an issue with our network causing a complete degradation of services and loss of traffic. We’re working with our provider to determine the cause and source of the issues, but early signs point to a targeted attack against our systems. We’ll provide more details as they become available.

UPDATE 3:25PM CST: We’re still working with our provider to determine the source of increased traffic and to correct any issues.

UPDATE 3:55PM CST: At this time things appear to have stabilized. However, we’re still working with our provider to determine the root cause of the issues and put any necessary measures in place to prevent similar issues.

Site Availability Issue on Galaxy01

We’re currently experiencing an issue with our Galaxy01 cluster of servers. This issue is causing intermittent 500/502 errors while processes fail to load. We’re currently evaluating the system and working to restore services to normal operational levels as quickly as possible. We’ll update this post with more information as we have it.

UPDATE 10:00AM CST: The team is still working to restore this cluster of servers to 100%. We’re getting closer as some new firewall rules and additional capacity come online. We’ll provide another update shortly.

UPDATE 10:45AM CST: At this time services are beginning to return to normal across the affected servers. We’re still waiting on some new rules to finish processing, but things are trending in a positive direction.

Server Outages

Howdy,

We are currently experiencing an issue with sites being down on a small portion of our network. This issue is a result of some of our servers going offline unexpectedly. We are working with Rackspace on getting these machines back up right now and things will resume working normally shortly.

If you have any questions or concerns, please contact our help desk via the https://my.pressable.com control panel.

UPDATE December 4th, 2014 @ 8:40AM Central: A large majority of servers are back online and functioning properly. We have a couple more servers waiting to come back online but most sites are back up and functioning properly again. We will update once more as soon as all servers are back online and functional.

UPDATE December 4th, 2014 @ 9:00 AM Central: All servers and sites are now back online and functional. If for some reason you are still seeing a site offline, please let us know and we can figure out what is causing it.

RESOLVED: Intermittent my.pressable.com Control Panel Downtime

Howdy,

We created a previous post about our email provider outage potentially causing issues with email sending from our servers but wanted to specifically point out that this also affects the https://my.pressable.com control panel.

Users are seeing the control panel time out or be completely unresponsive on an intermittent basis.

These issues are related to the same issues that are causing problems with email in general, because of the ties we have in place for email on my.pressable.com.

As soon as Mailgun has these issues resolved, the control panel will function properly once more.

You can follow the status of the Mailgun issues here:

http://status.mailgun.com/incidents/9s93dc83gtlw

If you have questions or concerns, please send an email to help@pressable.com.

Thank you!

UPDATE 09/30/2014 @ 12:40 PM Central: Email issues with all sites have been resolved. http://status.pressable.com/2014/09/29/email-provider-outage-causing-issues/

RESOLVED: Email Provider Outage Causing Issues

Howdy,

Our email provider, Mailgun, is experiencing issues that are causing our customer emails to not be sent in some cases. This originally resulted in emails from sites not being sent appropriately, in addition to ones that come from our https://my.pressable.com control panel.

These issues are the result of a larger emergency maintenance at Rackspace itself.

As this maintenance has progressed, our email provider has subsequently been able to restore service and emails from sites have begun sending on a much more consistent basis.

As of this morning, Mailgun is still working through some lingering issues and one of those issues affects emails being sent from our https://my.pressable.com control panel.

Once they are done addressing this, emails will work appropriately across our network again.

You can read more about the Mailgun outage here:

http://status.mailgun.com/incidents/9s93dc83gtlw

You can read more about the larger Rackspace Emergency Maintenance here:

https://status.rackspace.com/

If you have any questions or concerns, please contact our helpdesk at help@pressable.com.

UPDATE 09/29/2014 @ 11:07 AM Central: Our https://my.pressable.com control panel is experiencing intermittent downtime as a part of this ongoing issue. If you are having trouble accessing the control panel, you might try giving it a few minutes and trying again. Once this issue has subsided, the panel will work normally again.

UPDATE 09/30/2014 @ 6:30 AM Central: We’re still working with our email provider (Mailgun) to resolve lingering issues caused by their downtime. Currently issues seem to only be impacting customers in the same datacenter as Mailgun services. We’ll provide an update as soon as we’ve learned more from our provider.

UPDATE 09/30/2014 @ 11:00 AM Central: We’ve confirmed with our provider (Mailgun) that they are still experiencing issues with their systems. We’re in constant communication with them, providing details as they test fixes to bring services back to 100%. We’ll update you as we have more information.

UPDATE 09/30/2014 @ 12:40 PM Central: At this time mail services have been restored across our systems. Our tests show everything working properly now and we’re expecting no further issues with email delivery. A huge thanks to Rackspace and Mailgun for staying on top of this and getting things resolved.

RESOLVED: Instability on Web Clusters

Howdy,

We are experiencing suspicious activity that is causing instability on some of our web server clusters.

We are working on improving the stability and performance of these clusters and identifying the source of this suspicious activity. During this time, customers may experience intermittent 502s and slow-loading sites.

We will update the status blog as soon as this is finalized.

UPDATE 2:50PM CST: The team is still working to bring new hardware and firewall rules online to help mitigate the attacks on our systems. We’ve brought new hardware online for some customers which is beginning to return some stability to the systems.

UPDATE 4:00PM CST: At this time we’ve brought on our new hardware and firewall rules for all of the customers on Thor. We’re still working to get these in place for our customers on Galaxy01.

UPDATE 5:55PM CST: At this time we’ve brought up new hardware and firewall rules for all customers. Stability should be returning to the web servers, but we’re still experiencing database issues.