Re: [tlaplus] Unicode

Ron,

What you are proposing to do involves modifying the core of SANY and
TLC. These programs were written on a shoestring and are neither well
structured nor well documented. They now have very few unknown bugs
because they have been largely unchanged for 15 years. Any plan to
make such extensive modifications needs to include plans for
extensively (and intelligently) testing the new version to make sure
that it doesn't introduce any bugs for ASCII specs. (For example, I
have worked to avoid having to make even the smallest changes to the
syntax based on experience with the prover because I am afraid of any
change to the JavaCC code.)

Given that, I will be reluctant to incorporate such changes without
evidence that there is sufficient demand for handling unicode symbols.
Chris Newcombe has told me that engineers do not like the
pretty-printed version of the specs because they find the symbols
unfamiliar, and they are more comfortable with ASCII specs. While I
like the pretty-printed versions, I find very little difference in
ease of reading between the two--the significant difference occurring
in symbols like \subseteq or \prec that don't occur very often. I
expect the advantages of ASCII (for example, the ability to do serious
editing in Emacs) mean that I wouldn't use unicode symbols in my
specs. So, I think you should find out how many users would find
your proposal a significant improvement.

Now to get to your proposal. There seem to be two viable approaches:

1. Make the actual .tla file ASCII and only have the unicode symbols
displayed in the Toolbox module editor. This means that there is no need
for the ASCII to be easy to read. For example, it would be quite
all right for the spec displayed as follows (where & stands for the
\in symbol)

pred(S) == \A value & S: \/ x == 1
(*NoteThisNiceCommentHere*)\/ x == 2

to produce this in the ASCII file

pred(S) == \A value \in S: \/ x == 1
(*NoteThisNiceComment Here*)\/ x == 2

Note that the ASCII file could have arbitrary information required to
display it encoded in special comments at the end of lines.

This would require almost no changes to the existing tools. The
Toolbox could translate locations in the ASCII file to locations in
the displayed text for displaying locations of errors reported by TLC
and of proof obligations reported by TLAPS. I'm not sure what should
be done about diplaying proof obligations created by TLAPS in the
course of a proof.

2. Put the Unicode in the .tla files. This potentially requires
changes to code scattered throughout SANY and TLC, making it hard to
avoid introducing bugs. One problem is in the communication between
the Toolbox and the tools. This is all done with stdio and stderr.
Do those interfaces support Unicode? If not, then thousands of print
and read statements in a hundred or so Java files need to be modified.

----

In any case, a lot of Toolbox code needs to be rewritten. You will
need to find someone to do that, because neither Markus nor I will
have time to do more than answer questions.

Cheers,

Leslie

P.S. The alignment algorithm of the pretty-printer does not guarantee
correctness. It's a heuristic that works only most of the time. But in

Plan 2, correctness of the translation to unicode isn't necessary, and

it's OK if people who want ASCII are required to write the spec in ASCII.