Hello,I understand the idea but your definitions do not express what you think they do. (Did you at all play with the Toolbox, which lets you evaluate constant expressions, to test your spec?)You intend to express characters as sequences of bits, and sequences are indeed functions. However, in TLA+, functions are not represented as sets of pairs, so you cannot combine functions using set union or set enumeration. I recommend writing, for example,UTF8_1B == { s \in [ 0 .. 7 -> BIT] : s[0] = 0 }You may also use the :> and @@ operators defined in the TLC standard module for constructing elementary functions and for combining functions, as inUTF8_B234 == { (0 :>1) @@ (1 :> 0) @@ s : s \in [ 2..7 -> BIT ] }Follow up question is there better way to express this repeating pattern:{[0 -> 1], [1 -> 1], [2 -> 1], [3 -> 1], [4 -> 0]}See above how to write what you mean using :> and @@. Alternatively, you could write[ i \in 0..4 |-> IF i=4 THEN 0 ELSE 1 ]And indeed, if you decided to index your bit sequences from 1 instead of from 0, you could use standard TLA+ sequences and write <<1,1,1,0>>. The definition above could similarly be written asUTF8_B234 == { <<1,0>> @ s : s \in [ 1 .. 6 -> BIT ] }Hope this helps,StephanOn 18 Oct 2020, at 02:20, Igor Kim <bios...@xxxxxxxxx> wrote:Hi,I'm playing with TLA+ language so not a real application.Can I think if UTF-8 specification (ignoring endianness) as this?LOCAL INSTANCE NaturalsNotNegNat == {i \in Nat: i >= 0} \* array index\* BITBIT == {0, 1}\* ASCII == 1 byte UTF-8 == 0XXX_XXXXUTF8_1B == [0 -> 0] \cup [1 .. 7 -> BIT]ASCII == UTF8_1B\* bytes 2, 3, 4 for 2,3,4 byte UTF8 == 10XX_XXXXUTF8_B234 == {[0 -> 1], [1 -> 0], [2 .. 7 -> BIT]}\* 2 byte UTF8 == 110X_XXXX, 10XX_XXXXUTF8_2B ==[1 -> {[0 -> 1], [1 -> 1], [2 -> 0], [3 .. 7 -> BIT]}]\cup[2 -> UTF8_B234]\* 3 byte UTF8 == 1110_XXXX, 10XX_XXXX, 10XX_XXXXUTF8_3B ==[1 -> {[0 -> 1], [1 -> 1], [2 -> 1], [3 -> 0], [4 .. 7 -> BIT]}]\cup[2 .. 3 -> UTF8_B234]\* 4 byte UTF8 == 1111_0XXX, 10XX_XXXX, 10XX_XXXX, 10XX_XXXXUTF8_4B ==[1 -> {[0 -> 1], [1 -> 1], [2 -> 1], [3 -> 1], [4 -> 0], [5 .. 7 -> BIT]}]\cup[2 .. 4 -> UTF8_B234]UTF8_CHAR == UTF8_1B \/ UTF8_2B \/ UTF8_3B \/ UTF8_4B\* Sequence of UTF8 CharactersUTF8_SEQ == [NotNegNat -> UTF8_CHAR]--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/1ed97684-f9d4-4e62-bcce-8571345d0809n%40googlegroups.com.