[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlaplus] Will there be a lesson learned on AWS 2025 East-1 Outage due to DynamoDB with regards to TLA+?



Amazon’s 2022 paper, “Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service” [1], provides some insight into which parts of DynamoDB have been specified in TLA+.

In Section 5.4, the authors note:

“The core replication protocol was specified using TLA+ [12, 13]. When new features that affect the replication protocol are added, they are incorporated into the specification and model checked. Model checking has allowed us to catch subtle bugs that could have led to durability and correctness issues before the code went into production.”

Markus

https://www.amazon.science/publications/amazon-dynamodb-a-scalable-predictably-performant-and-fully-managed-nosql-database-service

> On Oct 29, 2025, at 11:31 AM, Felipe Oliveira Carvalho <felipekde@xxxxxxxxx> wrote:
> 
> "TLA+ is used in DynamoDB" doesn't mean everything is checked with TLA+. And the race condition was in the DNS service, so someone would have to think about the consequences of race conditions in a low-concurrency database of DNS records. You wouldn't even need TLA+ for that. For so long, checking and then updating DNS records non-atomically has been OK that no one thought it would be an issue.
> 
> --
> Felipe
> 
> On Wed, Oct 29, 2025 at 3:09 PM Andrew Helwer <andrew.helwer@xxxxxxxxx> wrote:
> Depends on whether someone who works at Amazon decides to write a TLA+ spec reproducing the bug or not; I don't think anybody outside the company has enough detail to do it. Amazon seems to have largely shifted from TLA+ to P as championed by Ankush Desai, but Ankush recently left Amazon so who knows what the future of Amazon formal methods use looks like. The last TLA+ conference was associated with ETAPS, and one of the industry track ETAPS talks was from an AWS employee where they talked about using a new Rust-based model-checker called Kani.
> 
> Andrew
> 
> On Mon, Oct 27, 2025 at 8:50 AM Chris Ortiz <zitroomega@xxxxxxxxx> wrote:
> Some people say it is a race condition, but as I understand TLA+ is used in DynamoDB and can catch behavior such as this thatcould have been prevented. 
> 
> Thanks,
> Zitro
> 
> -- 
> You received this message because you are subscribed to the Google Groups "tlaplus" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
> To view this discussion visit https://groups.google.com/d/msgid/tlaplus/ac3f4438-8cf7-41e8-9c19-2f42eb01d9fdn%40googlegroups.com.
> 
> -- 
> You received this message because you are subscribed to the Google Groups "tlaplus" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
> To view this discussion visit https://groups.google.com/d/msgid/tlaplus/CABj%3DxUVnugpc9WfomLuwZa4s70M5scNEVCZczUF%2BK8yw0FsN%2Bw%40mail.gmail.com.
> 
> -- 
> You received this message because you are subscribed to the Google Groups "tlaplus" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
> To view this discussion visit https://groups.google.com/d/msgid/tlaplus/CAOC8YXarDkG25uPaiBeet%2BK9v_xOFs_AVnSt2W7D%2BH3D80iZdA%40mail.gmail.com.

-- 
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion visit https://groups.google.com/d/msgid/tlaplus/82EA9BF7-A7EE-4FD4-9A65-C26CD1CC914B%40lemmster.de.