Hi Jamie,
>>we get some guarantees of atomicity from the underlying tools (Phoenix/Elixir/Erlang) but questioning those assumptions (and our understanding of those assumptions) would be a good exercise.
Explicitly capturing the assumptions is a very important part of the process. For example, I too built several systems using Erlang (at Amazon from 2006 to 2010; I can't reveal which systems), so I have some (albeit dated) knowledge of incorrect assumptions about that environment. For example, if you are using the send/receive operators of Distributed Erlang (not just local/send receive), then it's worth reading this paper:
The paper starts by honestly admitting that "Since ... Erlang starts in industry, and not in a university, Erlang is very much defined in terms of its implementation". It then describes the drawbacks of lack of a formal specification for Erlang (quote: "doing verification without formal semantics is very hard"), and some of the consequences.
The consequences boil down to application authors making incorrect assumptions. As an example, the authors cite two other papers that show how incorrect assumptions about handling of network partitions were the cause of 2 serious bugs in the original leader-election algorithm in Erlang's standard library. (Sidebar: IIRC, that algorithm was written by members of the Erlang engineering team or researchers with close links to the team -- i.e. people with the best knowledge of the implementation of Erlang. However, it's been 10+ years since I read those papers, so I might be misremembering.)
The bulk of the above paper is a definition of formal semantics for Distributed Erlang. The definition is in the form of "small-step operational semantics and transition rules". I haven't studied operational semantics, but I suspect that most engineers would struggle with this notation, and so find it very difficult to fully understand and internalize the semantics. My suspicion seems to be supported by the fact that the paper says this is the third attempt to define the semantics of Erlang (hence the name of the paper 'A more accurate semantics ...'). The authors mention that their own earlier attempt was 'incomplete'. I put 'incomplete' in quotes as I think it simply means incorrect, because the authors say, "With
incomplete we mean that there are possible behaviors in the Erlang
run-time system, which can not be described by the semantics."
I suspect that the semantics would be much more understandable (and so likely more accurate), and more useful to application engineers if it was given in the form of a TLA+ specification, but I've never had time or inclination to do the translation.
Finally, I'll point out that even the above semantics is not complete. E.g. I once ran into an issue with a production Erlang system that was caused by the fact that in Distributed Erlang, process IDs contain a hidden 'creation count', which is like a generation number of the node. But the generation number only has range 1..3, so can cycle very quickly if nodes are spin-crashing or the epmd process (Erlang Port Mapper Daemon) is crashing[1]. So in some circumstances, process ids in Erlang are much more likely to be re-used than application authors realize (back then most Erlang programmers assume that the chance of pid re-use was negligible.) This is another common bad assumption about "the environment", and could be caught with a specification at an appropriate level of abstraction.
I also hit various other surprising semantic issues when implementing the Erlang port driver for Berkeley DB[2], but I've thankfully forgotten all of the details.
It's that kind of craziness with concurrent and distributed systems --when the consequences can really matter to customers-- that led me to using TLA+.
cheers,
Chris
The incarnation aka "Creation" is stored in EPMD, the Erlang
Port Mapper Daemon, that is external to all nodes.
It cycles 1,2,3 then round to 1 again. You can not change it.
It is platform independent. An undistributed node has creation 0.
Creation 0 is also used for dirty tricks having the meaning
of the current creation.
We have had plans to change it from 2 bits to 32 but it would
hit hard on erl_interface and applications using it.
-- Raimo Niskanen, Erlang/OTP, Ericsson AB
_
cheers,
Chris