Update (Jan. 12, 2024): T-Mobile has reached out to us to share a statement regarding the outage:
This was not a cyberattack. This has been resolved – it was an internal technical issue that temporarily impacted some platforms.
Earlier today, T-Mobile experienced a serious system outage that lasted a few hours. Although the issue has now been resolved, customers who tried to log in via the web or the app were met with an error and redirected to a dedicated outage page. This left many worried that T-Mo had been hacked and that their accounts had been compromised.
The outage affected several internal systems too. T-Mobile employees could not access customer accounts, so they were unable to do anything for customers. T-Mobile Money, the Un-carrier’s banking service, was also affected by the outage.
Because of the outage, some customers reportedly had their service suspended for non-payment. They couldn’t pay their bills while the systems were down, and customer support was unable to help them either.
According to a Reddit post (via The Mobile Report), the outage was caused by a “rogue admin ID” that deleted internal programs from servers. The post read:
“A script was executed that deleted every single namespace managed by the Conducktor platform. A namespace is essentially an abstraction of a cluster of EC2 instances that are leased from AWS. Conducktor manages the leasing, organization, networking configuration, and API orchestration to handle deployment and configuration of AWS stuff in general but it’s primarily EC2 instances, K8S configuration, some Redis, ElastiCache, Routing/Load Balancers, etc. It’s a lot. Too much to list and it gets complicated in a hurry and I don’t know how to succinctly summarize it. Maybe “Giant magical AWS wrapper”?
But the overall is that this means that every team that owned an application or service, that deployed to AWS via Conducktor, had their stuff nuked. Conducktor is very widely used in Digital for APIs and applications. So most UI applications that are served from a webserver, APIs running on a java server, etc., were impacted as those servers themselves were deployed to EC2 instances managed by Conducktor. This is why this was such a widespread problem across channels (Retail, Care/Telesales, Web and App) as well as across lines of business (Prepaid, Postpaid, Business, Tmobile Money – which nobody knows exists nor should they, etc etc).
Quoting from a guy on the bridge, this was done by “A rogue admin ID” …so …I dunno, that smells really bad to me. Like, someone’s going to jail kinda bad.”
Although the systems have been restored, T-Mobile has not yet issued an update on what caused the outage. The Reddit post describes a plausible scenario, but it remains unconfirmed.
It also appears that no customer data was exposed in the outage, but we will have to wait for an official statement from T-Mobile to be sure.
Source: The Mobile Report