[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlaplus] Patterns for detecting data race with TLA+?



> What is the bad state that both processes writing at the same time would cause?

Normally this is a great idea, but as Shuhao points out, in C/C++ the data race itself is the bad state, since data races are undefined behavior. Even if the data race is benign (in the sense that it does not lead to any bad states in the TLA+ model) you would still want to know about it, since it could cause bad behavior in a real C/C++ implementation.

> splitting the read/write operations into multiple steps

Yes, that is exactly how you would check such a thing in TLA+. PlusCal does not provide a way to express the statement "process P is about to do a read/write to variable X". However, by splitting reads and writes into multiple steps, we can make it possible to write such a statement.

Here's a spec I wrote that uses that idea. It uses auxiliary variables "op" and "addr" to record what each process is about to do before it does it. The extra steps for each read/write do unfortunately increase the state space, but this feels like the right approach to me.


---- MODULE DataRace ----

EXTENDS Naturals, Sequences

(*

--algorithm my_processes

    variables
        mem   = <<0, 0>>,
        addr  = [p \in ProcSet |-> 0],
        op    = [p \in ProcSet |-> "none"],
        local = [p \in ProcSet |-> 0]; \* return value from read(a)

    define
        Addr1 == 1
        Addr2 == 2
    end define

    procedure read(a)
    begin
        start_read:
            addr[self] := a;
            op[self] := "read";
        do_read:
            local[self] := mem[a];
            op[self] := "none";
            addr[self] := 0; \* optional; reduces the state space
            return;
    end procedure

    procedure write(a, value)
    begin
        start_write:
            addr[self] := a;
            op[self] := "write";
        do_write:
            mem[a] := value;
            op[self] := "none";
            addr[self] := 0; \* optional; reduces the state space
            return;
    end procedure

    fair process A = "A"
    begin
        A1:
            call read(Addr1);
        A2:
            call write(Addr2, 10);
    end process

    fair process B = "B"
    begin
        B1:
            call read(Addr1);
        B2:
            call write(Addr1, 4);
    end process

end algorithm

*)

\* BEGIN TRANSLATION
\* END TRANSLATION

\* A data race happens when
\*   (1) there are two concurrent operations to the same address and
\*   (2) at least one of those operations is a write.
DataRace ==
    \E x, y \in ProcSet:
        /\ x /= y
        /\ op[x] /= "none"
        /\ op[y] /= "none"
        /\ addr[x] = addr[y]
        /\ op[x] = "write" \/ op[y] = "write"

NoDataRace == ~DataRace

====



On Mon, May 30, 2022 at 9:59 AM Shuhao Wu <shuhao@xxxxxxxxxxxx> wrote:
Thanks Hillel for the suggestion. Your blog post is also very
enlightening, especially on the forbidden state transition checks that I
also struggled with[1].

For my problem though, the race conditions that occur in real-life seem
difficult to model with TLA directly. For example, if I read/write to a
large (multi-word) variable in something like C/C++ without explicit
synchronization, it can result in a torn-read. Without synchronization,
the compiler may even re-order and/or optimize the statements in
unpredictable (undefined) manners. Since in TLA+, the variable values
are usually abstract values, it seems difficult to me to model the
actual "bad" event.

I suppose what I could do is to create an auxiliary variable that
increments and then decrements with each read and write operation, which
may require splitting the read/write operations into multiple steps for
the aux variable. Then I can use an invariant to assert that the aux
variable never increases above 1 to ensure synchronization is properly
implement. Not 100% sure if this is the simplest approach as I have yet
to test it, but it might be an acceptable work around for this purpose.

Thanks again,
Shuhao

[1] I've worked on an unfinished post detailing my journey on
discovering how to write temporaral properties that specifying
conditions like A leads to B and remains at B, or A leads to B at least
once. Once I get it done, I may ask for more feedback here.

On 2022-05-27 17:45, Hillel Wayne wrote:
> Hi Shuhao,
>
> Your spec doesn't /necessarily/ mean the two variables are reading and
> writing the same variable at the same time. Remember, each label
> represents an /abstract/ amount of time: for all you know the two
> actions could be two years apart. It's a good proxy for bad behavior,
> but it's still an "indirect" property.
>
> I find that "direct properties" scale better. What is the bad state that
> both processes writing at the same time would /cause/? If you check
> /that/ as an invariant, race conditions will naturally appear in the
> error trace. You can then write indirect properties for the specific
> race conditions you see to make them trigger earlier in the model checking.
>
> (One useful tool for this is Action Properties
> <https://www.hillelwayne.com/post/action-properties/>, which let you
> assert properties on how variables change in a step.)
>
> H
>
> On 5/27/2022 4:13 PM, Shuhao Wu wrote:
>> Hello all:
>>
>> I'm currently working on some lock-less algorithms with TLA+
>> (specifically, with PlusCal). One of the things that needs to be
>> checked is that the algorithm is data-race-free, where no two
>> processes (in PlusCal terms) are reading and writing to the same
>> variable at the same time. Here's a sample spec with a data race on
>> the Global2 variable:
>>
>> variables Global1 = 0, Global2 = 0;
>>
>> fair process A = "A"
>>   variable local;
>> begin
>>   A1:
>>     local := Global1;
>>   A2:
>>     Global2 := 10;
>> end process;
>>
>> fair process B = "B"
>>   variable local;
>> begin
>>   B1:
>>     local := Global1;
>>   B2:
>>     local := Global2;
>> end process;
>>
>> Checking if a data race occurred in TLC is technically as simple as
>> checking the invariant ~(pc["A"] = "A2" /\ pc["B"] = "B2"). However, I
>> suspect this doesn't scale very well to specifications with more
>> variables and states, as you need to manually enumerate every write
>> (and read) step for the all "global" variables.
>>
>> Is there a known pattern somewhere to check for this class of
>> problems? My quick googling doesn't show anything. Am I approaching
>> this problem from an incorrect angle?
>>
>> Thanks,
>>
>> Shuhao
>>
> --
> You received this message because you are subscribed to the Google
> Groups "tlaplus" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx
> <mailto:tlaplus+unsubscribe@xxxxxxxxxxxxxxxx>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tlaplus/6c3f1b9d-7000-1c53-0087-f6882d8afdc9%40gmail.com
> <https://groups.google.com/d/msgid/tlaplus/6c3f1b9d-7000-1c53-0087-f6882d8afdc9%40gmail.com?utm_medium=email&utm_source=footer>.

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/923db76b-4cb2-1256-9130-0a86641d6c6f%40shuhaowu.com.

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/CABf5HMj4YG91_qSGjgqCqtpGqt1GwuH9WJTAQQ879xAy7wWrfw%40mail.gmail.com.