[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlaplus] Patterns for detecting data race with TLA+?



Thanks Calvin for the excellent example. You're right the state-space explosion is a concern for TLC. My approach is also similar to yours, with procedures and multi-steps. Except all I'm doing is incrementing and decrementing a counter:

CONSTANT N
CONSTANT Uninitialized

(*--algorithm DataRaceDetection

variables
  Memory = [i \in 1..N |-> Uninitialized],
  MemAccCnt = [i \in 1..N |-> [reads |-> 0, writes |-> 0]],
  ReadReturnValue; \* Return value of the Read procedure

procedure Write(idx, value) begin
  mem_write1:
    Memory[idx] := value;
    MemAccCnt[idx].writes := MemAccCnt[idx].writes + 1;
  mem_write2:
    MemAccCnt[idx].writes := MemAccCnt[idx].writes - 1;
    return;
end procedure;

procedure Read(idx) begin
  mem_read1:
    ReadReturnValue := Memory[idx];
    MemAccCnt[idx].reads := MemAccCnt[idx].reads + 1;
  mem_read2:
    MemAccCnt[idx].reads := MemAccCnt[idx].reads - 1;
    return;
end procedure;

fair process A = "A" begin
  A1:
    call Read(1);
  A2:
    call Write(2, 2);
end process;

fair process B = "B" begin
  B1:
    call Read(1);
  B2:
    call Read(2);
end process;

end algorithm; *)

WriteDataRace == \E i \in DOMAIN MemAccCnt: MemAccCnt[i].writes > 1
ReadDataRace == \E i \in DOMAIN MemAccCnt: MemAccCnt[i].writes >= 1 /\ MemAccCnt[i].reads >= 1

DataRace == WriteDataRace \/ ReadDataRace
\* Maybe better to check ~WriteDataRace and ~ReadDataRace in TLC for more detailed err msg
NoDataRace == ~DataRace

This approach is a bit cumbersome (especially since procedures cannot directly return a value), but should get the job done. It also is a bit tedious to parse in the error trace, as variables setting up the procedure can clutter up the error trace window a bit.

Thanks all,
Shuhao

On 2022-05-30 16:46, Calvin Loncaric wrote:
> What is the bad state that both processes writing at the same time would /cause/?

Normally this is a great idea, but as Shuhao points out, in C/C++ the data race itself /is/ the bad state, since data races are undefined behavior. Even if the data race is benign (in the sense that it does not lead to any bad states in the TLA+ model) you would still want to know about it, since it could cause bad behavior in a real C/C++ implementation.

 > splitting the read/write operations into multiple steps

Yes, that is exactly how you would check such a thing in TLA+. PlusCal does not provide a way to express the statement "process P is about to do a read/write to variable X". However, by splitting reads and writes into multiple steps, we can make it possible to write such a statement.

Here's a spec I wrote that uses that idea. It uses auxiliary variables "op" and "addr" to record what each process is about to do before it does it. The extra steps for each read/write do unfortunately increase the state space, but this feels like the right approach to me.


---- MODULE DataRace ----

EXTENDS Naturals, Sequences

(*

--algorithm my_processes

     variables
         mem   = <<0, 0>>,
         addr  = [p \in ProcSet |-> 0],
         op    = [p \in ProcSet |-> "none"],
         local = [p \in ProcSet |-> 0]; \* return value from read(a)

     define
         Addr1 == 1
         Addr2 == 2
     end define

     procedure read(a)
     begin
         start_read:
             addr[self] := a;
             op[self] := "read";
         do_read:
             local[self] := mem[a];
             op[self] := "none";
             addr[self] := 0; \* optional; reduces the state space
             return;
     end procedure

     procedure write(a, value)
     begin
         start_write:
             addr[self] := a;
             op[self] := "write";
         do_write:
             mem[a] := value;
             op[self] := "none";
             addr[self] := 0; \* optional; reduces the state space
             return;
     end procedure

     fair process A = "A"
     begin
         A1:
             call read(Addr1);
         A2:
             call write(Addr2, 10);
     end process

     fair process B = "B"
     begin
         B1:
             call read(Addr1);
         B2:
             call write(Addr1, 4);
     end process

end algorithm

*)

\* BEGIN TRANSLATION
\* END TRANSLATION

\* A data race happens when
\*   (1) there are two concurrent operations to the same address and
\*   (2) at least one of those operations is a write.
DataRace ==
     \E x, y \in ProcSet:
         /\ x /= y
         /\ op[x] /= "none"
         /\ op[y] /= "none"
         /\ addr[x] = addr[y]
         /\ op[x] = "write" \/ op[y] = "write"

NoDataRace == ~DataRace

====


On Mon, May 30, 2022 at 9:59 AM Shuhao Wu <shuhao@xxxxxxxxxxxx <mailto:shuhao@xxxxxxxxxxxx>> wrote:

    Thanks Hillel for the suggestion. Your blog post is also very
    enlightening, especially on the forbidden state transition checks
    that I
    also struggled with[1].

    For my problem though, the race conditions that occur in real-life seem
    difficult to model with TLA directly. For example, if I read/write to a
    large (multi-word) variable in something like C/C++ without explicit
    synchronization, it can result in a torn-read. Without synchronization,
    the compiler may even re-order and/or optimize the statements in
    unpredictable (undefined) manners. Since in TLA+, the variable values
    are usually abstract values, it seems difficult to me to model the
    actual "bad" event.

    I suppose what I could do is to create an auxiliary variable that
    increments and then decrements with each read and write operation,
    which
    may require splitting the read/write operations into multiple steps for
    the aux variable. Then I can use an invariant to assert that the aux
    variable never increases above 1 to ensure synchronization is properly
    implement. Not 100% sure if this is the simplest approach as I have yet
    to test it, but it might be an acceptable work around for this purpose.

    Thanks again,
    Shuhao

    [1] I've worked on an unfinished post detailing my journey on
    discovering how to write temporaral properties that specifying
    conditions like A leads to B and remains at B, or A leads to B at least
    once. Once I get it done, I may ask for more feedback here.

    On 2022-05-27 17:45, Hillel Wayne wrote:
     > Hi Shuhao,
     >
     > Your spec doesn't /necessarily/ mean the two variables are
    reading and
     > writing the same variable at the same time. Remember, each label
     > represents an /abstract/ amount of time: for all you know the two
     > actions could be two years apart. It's a good proxy for bad
    behavior,
     > but it's still an "indirect" property.
     >
     > I find that "direct properties" scale better. What is the bad
    state that
     > both processes writing at the same time would /cause/? If you check
     > /that/ as an invariant, race conditions will naturally appear in the
     > error trace. You can then write indirect properties for the specific
     > race conditions you see to make them trigger earlier in the model
    checking.
     >
     > (One useful tool for this is Action Properties
     > <https://www.hillelwayne.com/post/action-properties/
    <https://www.hillelwayne.com/post/action-properties/>>, which let you
     > assert properties on how variables change in a step.)
     >
     > H
     >
     > On 5/27/2022 4:13 PM, Shuhao Wu wrote:
     >> Hello all:
     >>
     >> I'm currently working on some lock-less algorithms with TLA+
     >> (specifically, with PlusCal). One of the things that needs to be
     >> checked is that the algorithm is data-race-free, where no two
     >> processes (in PlusCal terms) are reading and writing to the same
     >> variable at the same time. Here's a sample spec with a data race on
     >> the Global2 variable:
     >>
     >> variables Global1 = 0, Global2 = 0;
     >>
     >> fair process A = "A"
     >>   variable local;
     >> begin
     >>   A1:
     >>     local := Global1;
     >>   A2:
     >>     Global2 := 10;
     >> end process;
     >>
     >> fair process B = "B"
     >>   variable local;
     >> begin
     >>   B1:
     >>     local := Global1;
     >>   B2:
     >>     local := Global2;
     >> end process;
     >>
     >> Checking if a data race occurred in TLC is technically as simple as
     >> checking the invariant ~(pc["A"] = "A2" /\ pc["B"] = "B2").
    However, I
     >> suspect this doesn't scale very well to specifications with more
     >> variables and states, as you need to manually enumerate every write
     >> (and read) step for the all "global" variables.
     >>
     >> Is there a known pattern somewhere to check for this class of
     >> problems? My quick googling doesn't show anything. Am I approaching
     >> this problem from an incorrect angle?
     >>
     >> Thanks,
     >>
     >> Shuhao
     >>
     > --
     > You received this message because you are subscribed to the Google
     > Groups "tlaplus" group.
     > To unsubscribe from this group and stop receiving emails from it,
    send
     > an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx
    <mailto:tlaplus%2Bunsubscribe@xxxxxxxxxxxxxxxx>
     > <mailto:tlaplus+unsubscribe@xxxxxxxxxxxxxxxx
    <mailto:tlaplus%2Bunsubscribe@xxxxxxxxxxxxxxxx>>.
     > To view this discussion on the web visit
     >
    https://groups.google.com/d/msgid/tlaplus/6c3f1b9d-7000-1c53-0087-f6882d8afdc9%40gmail.com
    <https://groups.google.com/d/msgid/tlaplus/6c3f1b9d-7000-1c53-0087-f6882d8afdc9%40gmail.com>

     >
    <https://groups.google.com/d/msgid/tlaplus/6c3f1b9d-7000-1c53-0087-f6882d8afdc9%40gmail.com?utm_medium=email&utm_source=footer
    <https://groups.google.com/d/msgid/tlaplus/6c3f1b9d-7000-1c53-0087-f6882d8afdc9%40gmail.com?utm_medium=email&utm_source=footer>>.

-- You received this message because you are subscribed to the Google
    Groups "tlaplus" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx
    <mailto:tlaplus%2Bunsubscribe@xxxxxxxxxxxxxxxx>.
    To view this discussion on the web visit
    https://groups.google.com/d/msgid/tlaplus/923db76b-4cb2-1256-9130-0a86641d6c6f%40shuhaowu.com
    <https://groups.google.com/d/msgid/tlaplus/923db76b-4cb2-1256-9130-0a86641d6c6f%40shuhaowu.com>.

--
You received this message because you are subscribed to the Google Groups "tlaplus" group. To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx <mailto:tlaplus+unsubscribe@xxxxxxxxxxxxxxxx>. To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/CABf5HMj4YG91_qSGjgqCqtpGqt1GwuH9WJTAQQ879xAy7wWrfw%40mail.gmail.com <https://groups.google.com/d/msgid/tlaplus/CABf5HMj4YG91_qSGjgqCqtpGqt1GwuH9WJTAQQ879xAy7wWrfw%40mail.gmail.com?utm_medium=email&utm_source=footer>.

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/1f4279d5-e9d6-4bde-2b4c-24a12734e4fd%40shuhaowu.com.