Fastly ‘service configuration’ responsible for massive website outage

FILE PHOTO: Vehicles drive past the New York Times headquarters in New York March 1, 2010. REUTERS/Lucas Jackson/File Photo

(Reuters) – Thousands of government, news and social media websites across the globe were coming back online on Tuesday after getting hit by a widespread hour-long outage linked to US-based cloud company Fastly Inc.

High-traffic sites including Reddit, Amazon, CNN, PayPal, Spotify, Al Jazeera Media Network and the New York Times went down, according to outage tracking website Downdetector.com. They came back after outages that ranged from a few minutes to around an hour, which struck early in the morning in the United States and in the middle of the day in Europe.

“Our global network is coming back online,” Fastly said.

One of the world’s most widely used cloud-based content delivery network providers, the company earlier reported disruption from a “service configuration” but on Wednesday announced the incident was caused by a bug in its software that was triggered when one of its customers changed their settings.

“This outage was broad and severe, and we’re truly sorry for the impact to our customers and everyone who relies on them,” the company said in a blog post authored by Nick Rockwell, its senior engineering and infrastructure executive.

He said the problem should have been anticipated.

Fastly operates a group of servers strategically placed around the world to help customers move and store content close to their end users quickly and safely.

The company post gave a timeline of events and promised to examine and explain why Fastly had failed to detect the software bug during its own testing process.

Fastly said the bug was in a software update shipped to customers on May 12 but lay dormant until one unidentified customer made a settings change that triggered the problem, “which caused 85% of our network to return errors.”

Fastly noticed the outage within a minute of it occurring at 0947 GMT, and engineers worked out the cause at 1027 GMT. Once they disabled the settings that triggered the problem, most of the company’s network quickly recovered.

“Within 49 minutes, 95% of our network was operating as normal,” the company said.

Its networks were fully recovered at 1235 GMT and it began rolling out a permanent software fix at 1725 GMT, Fastly said.
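
For readers who want to check the timeline, the intervals between the reported milestones can be worked out with a short calculation. The sketch below is illustrative only: it uses the GMT times given in this story, and the event labels and the `minutes_between` helper are hypothetical names chosen for the example, not anything published by Fastly.

```python
from datetime import datetime

# GMT times as reported in this article; the labels are illustrative.
EVENTS = {
    "outage_start": "09:47",
    "cause_identified": "10:27",
    "full_recovery": "12:35",
    "permanent_fix_rollout": "17:25",
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end, both HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Minutes elapsed from the start of the outage to each milestone.
elapsed = {
    name: minutes_between(EVENTS["outage_start"], time)
    for name, time in EVENTS.items()
}

for name, minutes in elapsed.items():
    print(f"{name}: {minutes} min after outage start")
```

On these figures, engineers identified the cause 40 minutes after the outage began, full recovery came 168 minutes in, and the permanent fix began rolling out 458 minutes in. The 49-minute mark quoted by the company is not recomputed here because the story does not give the clock time at which 95% of the network was restored.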

“Incidents like this underline the fragility of the internet and its dependence on a patchwork of fragmented technology. Ironically, this also underlines its inherent strength and how quickly it can recover,” Ben Wood, chief analyst at CCS Insight, said.

“The fact that an outage like this can grab headlines around the world shows how rare it is.”

Typical service configurations for a cloud service provider can include updating security rules to protect information, or instructing a server to refresh the contents of a news site before serving it to a customer, said Andy Champagne, senior vice president at Akamai, a cloud service company.

A simple typo can be propagated to thousands of servers and cause disruptions, he said. 

Fastly, which went public in 2019 and has a market capitalization of under $6 billion, is far smaller than peers like Amazon’s AWS. The company’s content delivery network (CDN) helps websites move content using less-congested routes, enabling them to reach consumers faster.

“In the grand scheme of things, we actually think that this is a little bit of a positive for other CDNs and also just shows how difficult managing a CDN can be,” said James Fish, analyst at Piper Sandler & Co. 

Apart from Fastly, the other main CDN providers include Akamai Technologies, Cloudflare and AWS.

“It certainly reminds us just how crucial so few sites and services are to our digital lives,” said Neil Campling, global TMT analyst at Mirabaud Securities.

Users received error messages quickly when they visited affected websites on Tuesday, which is an indication Fastly was not a victim of a DDoS attack, a type of cyberattack in which a bad actor overwhelms a network with a flood of internet traffic, Champagne said.

The United Kingdom’s attorney general earlier tweeted that the country’s main gov.uk website was down, providing an email for queries.

The disruption may have caused issues for citizens booking COVID-19 vaccinations or reporting test results, the Financial Times reported.

Websites operated by news outlets including the Financial Times, the Guardian and Bloomberg News also faced outages. 

Many of the websites affected earn revenue from digital advertising. Worldwide, websites lost over $29 million in digital ad revenue per hour during the outage, according to back-of-the-envelope estimates from media measurement firm Kantar.

News publishers came up with inventive workarounds to report about the outage when their websites failed to load.

Popular tech website the Verge used Google Docs to report news, while the Guardian’s UK technology editor started a Twitter thread to report on the problems.

At the onset of the outage, nearly 21,000 Reddit users reported issues with the social media platform, while more than 2,000 users reported problems with Amazon, according to Downdetector.com.

Twitter users quickly reacted to the outage, creating the #InternetShutdown hashtag, with KITKAT’s official handle telling its 441,500 followers “Guess it’s time to Have A Break.”

“We were offline for a few minutes because the whole internet broke down,” tweeted Jitse Groen, chief executive of food delivery group Just Eat Takeaway.com. 

Shares of New York-listed Fastly were up 7.7% after being down nearly 4% in pre-market trading. 

(Reporting by Subrat Patnaik, Noor Zainab Hussain, Chavi Mehta and Shubham Kalia in Bengaluru, Supantha Mukherjee in Stockholm, Sheila Dang in Dallas and Toby Sterling; Editing by Patrick Graham, Bernard Orr, Chris Reese and Nick Zieminski)
