Liv McMahon, Technology reporter, and
Lily Jamali, North America Technology Correspondent

Amazon Web Services (AWS) said late Monday that it had resolved a massive outage that knocked some of the world’s largest websites offline for much of the day.
More than 1,000 apps and websites – including social media platforms like Snapchat and banks such as Lloyds and Halifax – were impacted by problems that Amazon said were at the heart of the cloud computing giant’s operations in the US.
The platform outage monitor Downdetector said user reports of problems globally soared to more than 11 million during the outage on Monday.
Even after Amazon fixed the underlying problem, experts said the outage demonstrated the perils of having so many companies rely on a single, dominant provider.
“What this episode has highlighted is just how interdependent our infrastructure is,” said Prof Alan Woodward of the University of Surrey.
“So many online services rely upon third parties for their physical infrastructure, and this shows that problems can occur in even the largest of those third-party providers.
“Small errors, often human made, can have widespread and significant impact.”
The issues appear to have begun at around 07:00 BST on Monday, as users began to report problems accessing a slew of platforms.
These ranged from massive online games like Fortnite to the language-learning app Duolingo.
Early in the day, Downdetector told the BBC it had seen more than four million reports from users across 500 sites within just a few hours – more than double the number it would see across an entire regular weekday.
These later peaked at more than 11 million, it said, as more services including Reddit and Lloyds Bank attempted to recover.
At around 23:00 BST, Amazon said all AWS services had “returned to normal operations.”
But not before the company had to throttle parts of its own system in order to address the root issue.
A new series of “cascading failures” may have arisen after the initial outage, according to Mike Chapple, an information technology professor at the University of Notre Dame.
“It’s like when you have a large-scale power outage. Crews start working to try to bring it back on line,” Mr Chapple said. “The power might flicker a few times,” he explained, but it’s possible Amazon had initially “only addressed the symptoms” and not the cause.
What went wrong?
Amazon has not yet fully detailed what caused Monday’s outage or issued an official statement regarding it.
It said in an update on its service status web page the issue “appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1”.
DNS, which stands for Domain Name System, is often likened to a phone book for the internet.
It effectively translates the website names people use (like bbc.co.uk) into numbers which can be read and understood by computers.
This process basically underpins the way we use the internet, and disruptions to it can leave web browsers unable to locate the content they are looking for.
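As a rough illustration of the lookup described above, the following Python sketch asks the operating system's resolver to translate a name into addresses and returns an empty list when the lookup fails, which is roughly the situation an application faces during a DNS disruption. The function name `resolve` and its `lookup` parameter are hypothetical and chosen only for this example; this is not Amazon's code and does not reproduce the fault behind Monday's outage.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, unique IP addresses for hostname, or [] if the lookup fails.

    `lookup` defaults to the standard library resolver call; it is a parameter
    only so the sketch can be exercised without network access (an assumption
    made for this example).
    """
    try:
        results = lookup(hostname, None)
    except socket.gaierror:
        # The name could not be translated into an address.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in results})
```

Calling `resolve("bbc.co.uk")` on a machine with working DNS would return that site's current addresses. When the lookup fails, the caller gets an empty list and has nothing to connect to, even if the servers behind the name are running normally.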
Matthew Prince, chief executive of Cloudflare, told the BBC the AWS outage highlighted the power cloud services have over how the internet works.
“Everyone has a bad day, today Amazon had a bad day,” he said.
“There are amazing things about the cloud, it allows you to scale… but if you have an outage like this it can take down a lot of services we rely on.”
And Cori Crider, head of the Future of Technology Institute, told the BBC it was “a bit like a bridge collapsing”.
“An essential part of the economy has fallen to pieces,” she said.
And with so much of cloud computing relying on Amazon, Microsoft and Google – estimated at around 70% – she said the status quo was “unsustainable”.
“Once you have a concentrated supply in a handful of monopoly providers, when something like this falls over, it takes a huge percentage of the economy out with it,” she said.
“We should really look at trying to buy more local services, rather than relying on a handful of American monopoly platforms.
“That’s a risk to our security, our sovereignty and our economy and we need to look at structural separations to make our markets more resilient to these kind of shocks.”
One computer science expert says some of the responsibility rests with the companies that use AWS.
“Companies using Amazon haven’t been taking enough adequate care to build protection systems into their applications,” says Ken Birman, a computer science professor at Cornell University in New York.
Outages like the one on Monday occur frequently, although not always at this scale.
Birman tells the BBC that app developers should take care to invest in backing up mission-critical applications that live in the cloud.
“We know how to make these systems stronger, and we know how to do it securely,” Birman says.
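One common form of the protection Birman describes is failover: if a primary endpoint cannot be reached, the application tries a backup. The sketch below is a minimal illustration under stated assumptions, not a description of how any company named here builds its systems. The helper `call_with_failover`, the endpoint labels and the `request` callable are all hypothetical.

```python
def call_with_failover(endpoints, request):
    """Call request(endpoint) for each endpoint in order and return the first success.

    Returns a (endpoint, result) pair. Raises ConnectionError if every
    endpoint fails. `request` is any caller-supplied function that raises
    ConnectionError when its endpoint is unreachable (an assumption of this sketch).
    """
    failures = []
    for endpoint in endpoints:
        try:
            return endpoint, request(endpoint)
        except ConnectionError as exc:
            # Record the failure and move on to the next endpoint.
            failures.append((endpoint, str(exc)))
    raise ConnectionError(f"all endpoints failed: {failures}")
```

A real deployment would also need the backup to hold up-to-date data and to be exercised regularly, which this sketch does not address.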
The question of responsibility could well land in the courts.
More than a year after the massive CrowdStrike outage, Delta Air Lines is still wrangling with the company to recover more than $500m in losses.
Even after CrowdStrike had fixed the issue, the airline said it had to manually reset 40,000 servers, leading to major flight delays over several days.
Additional reporting by Esyllt Carr.
