[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlaplus] Experimenting with PlusCal / TLA+ at Dropbox



Hi Dan,


I understand that this issue is not present in the spec that you posted.

I believe the issue was present in the original spec, but the modifications that you made to cas_sentinels fixed it. In the original spec, a writer can proceed with db_commit if cas_error_keys is empty, even if cas_success_keys is not equal to db_write_keys. Such a state can occur if db_write_keys is non-empty and the writer takes a cas_sentinels step when it is Down. This violates consistency because keys can be modified in the database when they are not Pending in the cache.

You fixed this issue by (a) removing cas_error_keys and comparing cas_success_keys to db_write_keys and (b) changing the if-statement in cas_sentinels to an await.

thanks for pointing this out. Indeed, I removed the variable cas_error_keys, relying on the putative invariant that it was the complement of cas_success_keys w.r.t. db_write_keys, and overlooking the fact that this need not hold when the server is down. It appears to me that this is a spec error rather than a protocol error: a server that is down should presumably not write to the database (i.e., why would db_commit not be guarded by the same if-statement as get_sentinels or cas_sentinels), but only Elliott would be able to tell for sure.

Anyway, the current spec only models server startup, but not server failures. It is easy to extend it to the case where servers have persistent memory that ensures that all local variables are restored after recovering from failure, but I don't know if this is the intended behavior.

Best regards,

Stephan