Re: [tlaplus] Patterns for detecting data race with TLA+?

• From: Shuhao Wu <shuhao@xxxxxxxxxxxx>
• Date: Mon, 30 May 2022 20:57:12 -0400
• Feedback-id: ief094244:Fastmail
• Ironport-data: A9a23:irrt2q34n17odBp7dPbD5ZV2kn2cJEfYwER7XKvMYLTBsI5bpzMCm DBODzyOb/qCM2P1c9slYYm+9EtQv5DVyNc2TVc53Hw8FHgiRejtVY3IdB+oV8+xBpSeFxw/t 512hv3odp1coqr0/0/1WlTZhSAgk/nOHNIQMcacUsxLbVYMpBwJ1FQywobVvqYy2YLjW17W4 YuoyyHiEAbNNwBcYjp8B52r80sHUMTa4Fv0aXRjDRzjlAa2e0g9VPrzF4npR5fLatU88tqBe gr25OrRElU1UPsaIojNfr7TKiXmS1NJVOSEoiI+t6OK2nCuqsGuu0o2HKJ0VKtZt9mGt9Rsy YRVtp+pdQYGG5TjxMYhfh8GPxgraMWq+JefSZS+mcmazkmDdHK1hvszUgc5OooX/usxCmZLn RAaAGpVP1bT2qTvnur9F7cEascLdKEHOKsDvnh4ySzCTvwgSrrofIfyzoFq/Q4Rof5yQ8jlX uYzVBBBQTj+TjBwElgQD506keiygWTnaHtTr1f9Sa8fujWPkFMhgeeF3Nz9KvnVFclnmnShn 0HgwGr3KzsCJu614G/Qmp6rrraXwXmTtJgpPLG57fV3m0a72mgaThgNTx66p+O4gwi/XcheI goa4EITQbMa8UWqSpz6VkT9riPa+BEbXNVUHqsx7wTlJrfoDxixAnUFHyZCa8UateATfBN29 GOkmY3FPGk62FGKck61+rCRpDK0HCEaK24eeCMJJTfpBfGz8OnfaTqfHr5e/L6JYs7dQm6vn mjbxMQqr/BC0p5RjvTTEUXv2mr0/vD0ohgJChI7t19JAyt8bY+hIouhsB3VsKgGI4GeQV2M+ nMDnqByDdzi77nTyURho81XRtlFAspp1hWB2TaD+LF9p1yQF4aLJ9w43d2HDB4B3jw4UTHoe lTPngha+YVeOnCnBYcuPd/qWp93kfK/T4+6PhwxUjaoSsghHONg1HE+DXN8I0iw+KTRufpiY 8fEK5zE4YgyWPk+klJauNvxIZdynnxkrY8ibZ/8yBuj3NKjiI29GN843K+1RrlhtMus+V2Lm /4Gbpfi40gPAYXWP3iKmaZOfABiBSZhWfje9ZcNHsbdeVoOJY3UI7qPqV/XU9I1xf09eyah1 irVZ3K0P3Kk2iGacl/VNCo9AF4tNL4mxU8G0eUXFQ7A8xAejUyHtc/z2rNnJedPGDBLpRK1c xUERylEKvFGSzCC/D1EKJeg8sptcxOkgQ/INC2gCNT6k1iMWCSRkuIIvCO2nMXNMsZzncQ5p LKk2wzBRocbXEJpC8O+hDeH0QaqpXZE8A5tdxKgHzSQEXkANKBlLCv+ivI4OcYRMQ6FzTyfv +pT7dH0usGVy7IIHBL1aWxoYmtn/yaS3qaXIoUD0YuLCA==
• Ironport-sdr: k54a7MKxsxp/gJ1X4jBn3mvvfwcO+A8xGcZqRLptAxwiVXJyZIWnYRSJ2iIFAKHxRmyZqvI6A5 5OH4djlc0iUU5NJp/gjSfrvW8R/drYm7Z1gPcjIf2qAFLMlc9mOca6EBi4bCSJ1AoijEqFqzds vuBPKBLZPP5z/wiUkldH8ZSbYDhFWmR5zw7O6xcBkiOVgTHN6Rmr+it33xtRpEf82lFyAVy05i yQlIH9iTknWzIWLORGwrgDD96ZGLGMY1eAIIwyFISs7bUIRSt430qnXXQ11sKsRGv8PloUYGCK QU0=
• References: <c420702c-cd8c-2c58-ca2d-f9d1bd8569d0@shuhaowu.com> <6c3f1b9d-7000-1c53-0087-f6882d8afdc9@gmail.com> <923db76b-4cb2-1256-9130-0a86641d6c6f@shuhaowu.com> <CABf5HMj4YG91_qSGjgqCqtpGqt1GwuH9WJTAQQ879xAy7wWrfw@mail.gmail.com>
• User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.0

Thanks Calvin for the excellent example. You're right the state-space explosion is a concern for TLC. My approach is also similar to yours, with procedures and multi-steps. Except all I'm doing is incrementing and decrementing a counter:

CONSTANT N
CONSTANT Uninitialized

(*--algorithm DataRaceDetection

variables
Memory = [i \in 1..N |-> Uninitialized],
MemAccCnt = [i \in 1..N |-> [reads |-> 0, writes |-> 0]],

procedure Write(idx, value) begin
mem_write1:
Memory[idx] := value;
MemAccCnt[idx].writes := MemAccCnt[idx].writes + 1;
mem_write2:
MemAccCnt[idx].writes := MemAccCnt[idx].writes - 1;
return;
end procedure;

return;
end procedure;

fair process A = "A" begin
A1:
A2:
call Write(2, 2);
end process;

fair process B = "B" begin
B1:
B2:
end process;

end algorithm; *)

WriteDataRace == \E i \in DOMAIN MemAccCnt: MemAccCnt[i].writes > 1

ReadDataRace == \E i \in DOMAIN MemAccCnt: MemAccCnt[i].writes >= 1 /\ MemAccCnt[i].reads >= 1


\* Maybe better to check ~WriteDataRace and ~ReadDataRace in TLC for more detailed err msg
NoDataRace == ~DataRace


This approach is a bit cumbersome (especially since procedures cannot directly return a value), but should get the job done. It also is a bit tedious to parse in the error trace, as variables setting up the procedure can clutter up the error trace window a bit.

Thanks all,
Shuhao

On 2022-05-30 16:46, Calvin Loncaric wrote:

> What is the bad state that both processes writing at the same time would /cause/?


Normally this is a great idea, but as Shuhao points out, in C/C++ the data race itself /is/ the bad state, since data races are undefined behavior. Even if the data race is benign (in the sense that it does not lead to any bad states in the TLA+ model) you would still want to know about it, since it could cause bad behavior in a real C/C++ implementation.

> splitting the read/write operations into multiple steps


Yes, that is exactly how you would check such a thing in TLA+. PlusCal does not provide a way to express the statement "process P is about to do a read/write to variable X". However, by splitting reads and writes into multiple steps, we can make it possible to write such a statement.


Here's a spec I wrote that uses that idea. It uses auxiliary variables "op" and "addr" to record what each process is about to do before it does it. The extra steps for each read/write do unfortunately increase the state space, but this feels like the right approach to me.


---- MODULE DataRace ----

EXTENDS Naturals, Sequences

(*

--algorithm my_processes

variables
mem   = <<0, 0>>,
addr  = [p \in ProcSet |-> 0],
op    = [p \in ProcSet |-> "none"],
local = [p \in ProcSet |-> 0]; \* return value from read(a)

define
end define

begin
local[self] := mem[a];
op[self] := "none";
addr[self] := 0; \* optional; reduces the state space
return;
end procedure

procedure write(a, value)
begin
start_write:
op[self] := "write";
do_write:
mem[a] := value;
op[self] := "none";
addr[self] := 0; \* optional; reduces the state space
return;
end procedure

fair process A = "A"
begin
A1:
A2:
end process

fair process B = "B"
begin
B1:
B2:
end process

end algorithm

*)

\* BEGIN TRANSLATION
\* END TRANSLATION

\* A data race happens when
\*   (1) there are two concurrent operations to the same address and
\*   (2) at least one of those operations is a write.
DataRace ==
\E x, y \in ProcSet:
/\ x /= y
/\ op[x] /= "none"
/\ op[y] /= "none"
/\ op[x] = "write" \/ op[y] = "write"

NoDataRace == ~DataRace

====


On Mon, May 30, 2022 at 9:59 AM Shuhao Wu <shuhao@xxxxxxxxxxxx <mailto:shuhao@xxxxxxxxxxxx>> wrote:

Thanks Hillel for the suggestion. Your blog post is also very
enlightening, especially on the forbidden state transition checks
that I
also struggled with[1].

For my problem though, the race conditions that occur in real-life seem
difficult to model with TLA directly. For example, if I read/write to a
large (multi-word) variable in something like C/C++ without explicit
synchronization, it can result in a torn-read. Without synchronization,
the compiler may even re-order and/or optimize the statements in
unpredictable (undefined) manners. Since in TLA+, the variable values
are usually abstract values, it seems difficult to me to model the

I suppose what I could do is to create an auxiliary variable that
increments and then decrements with each read and write operation,
which
may require splitting the read/write operations into multiple steps for
the aux variable. Then I can use an invariant to assert that the aux
variable never increases above 1 to ensure synchronization is properly
implement. Not 100% sure if this is the simplest approach as I have yet
to test it, but it might be an acceptable work around for this purpose.

Thanks again,
Shuhao

[1] I've worked on an unfinished post detailing my journey on
discovering how to write temporaral properties that specifying
conditions like A leads to B and remains at B, or A leads to B at least
once. Once I get it done, I may ask for more feedback here.

On 2022-05-27 17:45, Hillel Wayne wrote:
> Hi Shuhao,
>
> Your spec doesn't /necessarily/ mean the two variables are
> writing the same variable at the same time. Remember, each label
> represents an /abstract/ amount of time: for all you know the two
> actions could be two years apart. It's a good proxy for bad
behavior,
> but it's still an "indirect" property.
>
> I find that "direct properties" scale better. What is the bad
state that
> both processes writing at the same time would /cause/? If you check
> /that/ as an invariant, race conditions will naturally appear in the
> error trace. You can then write indirect properties for the specific
> race conditions you see to make them trigger earlier in the model
checking.
>
> (One useful tool for this is Action Properties
> <https://www.hillelwayne.com/post/action-properties/
<https://www.hillelwayne.com/post/action-properties/>>, which let you
> assert properties on how variables change in a step.)
>
> H
>
> On 5/27/2022 4:13 PM, Shuhao Wu wrote:
>> Hello all:
>>
>> I'm currently working on some lock-less algorithms with TLA+
>> (specifically, with PlusCal). One of the things that needs to be
>> checked is that the algorithm is data-race-free, where no two
>> processes (in PlusCal terms) are reading and writing to the same
>> variable at the same time. Here's a sample spec with a data race on
>> the Global2 variable:
>>
>> variables Global1 = 0, Global2 = 0;
>>
>> fair process A = "A"
>>   variable local;
>> begin
>>   A1:
>>     local := Global1;
>>   A2:
>>     Global2 := 10;
>> end process;
>>
>> fair process B = "B"
>>   variable local;
>> begin
>>   B1:
>>     local := Global1;
>>   B2:
>>     local := Global2;
>> end process;
>>
>> Checking if a data race occurred in TLC is technically as simple as
>> checking the invariant ~(pc["A"] = "A2" /\ pc["B"] = "B2").
However, I
>> suspect this doesn't scale very well to specifications with more
>> variables and states, as you need to manually enumerate every write
>> (and read) step for the all "global" variables.
>>
>> Is there a known pattern somewhere to check for this class of
>> problems? My quick googling doesn't show anything. Am I approaching
>> this problem from an incorrect angle?
>>
>> Thanks,
>>
>> Shuhao
>>
> --
> You received this message because you are subscribed to the Google
> Groups "tlaplus" group.
> To unsubscribe from this group and stop receiving emails from it,
send
> an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx
<mailto:tlaplus%2Bunsubscribe@xxxxxxxxxxxxxxxx>
> <mailto:tlaplus+unsubscribe@xxxxxxxxxxxxxxxx
<mailto:tlaplus%2Bunsubscribe@xxxxxxxxxxxxxxxx>>.
> To view this discussion on the web visit
>

>


-- You received this message because you are subscribed to the Google
    Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx
<mailto:tlaplus%2Bunsubscribe@xxxxxxxxxxxxxxxx>.
To view this discussion on the web visit