Novel queue-lock like concurrent spin-lock formalization in TLA+

See https://sstewartgallus.com/git?p=qlock.git;a=blob;f=tla/StackLock.tla;h=6884d808b5ffd605e912f4b12ae38928fd9b8313;hb=HEAD for the code.

On many-core machines spin locks can be slow because the cores ping each other constantly. To improve on this people often use test-and-test and set locks as loads from a memory location are often cheaper than exchange and compare and exchange operations. However, there is still a small amount of intercore communication which can be expensive on machines with a whole lot of cores. To improve on this the MCS and CLH queue locks were invented. Basically, in these algorithms each waiting core spins on its own separate memory location which is linked into a queue. This may possibly be faster in some scenarios although for little-core machines the added complexity is often slower. A deficiency with these algorithms is that the waiters are ordered in strict FIFO queue order and no other scheme is possible. For example, one might want to use a priority queue (for hard real-time scenarios) or a stack (for better raw performance.) I came up with an algorithm (which I think is new but might possibly not be as I am not really an expert on this topic) that allows the ADT details to be easily replaced by any structure that offers push, pop and isEmpty functions. This algorithm can then be used with stacks or priority queues.

I implemented this algorithm in Rust (with the specific case of a stack) and then formalized the abstract details in TLA+ which led to me discovering and fixing a deadlock bug. So far, I have run and tested the model under a number of threads but past 4 or so threads the model checker becomes very slow. Future steps should involve checking to see if more properties can be asserted and tested under the model, scaling the analysis to a larger number of threads, possibly formalizing an abstract proof of correctness for any number of threads and to experimentally validate and check an operation log of the implementation for linearizability with a checker such as Knossos.

Does my TLA+ code look correct? Can anyone any further steps I should take? Does anyone have any other thoughts?