Hi Dan,
thanks for pointing this out. Indeed, I removed the variable cas_error_keys, relying on the putative invariant that it was the complement of cas_success_keys w.r.t. db_write_keys, and overlooking the fact that this need not hold when the server is down. It appears to me that this is a spec error rather than a protocol error: a server that is down should presumably not write to the database (i.e., why would db_commit not be guarded by the same if-statement as get_sentinels or cas_sentinels), but only Elliott would be able to tell for sure. Anyway, the current spec only models server startup, but not server failures. It is easy to extend it to the case where servers have persistent memory that ensures that all local variables are restored after recovering from failure, but I don't know if this is the intended behavior. Best regards, Stephan |