Writing Specifications for a Distributed System using Ivy

Before we jump into writing specifications in a distributed setting, we first define what a specification is. I take the definition from the magnificent Ken McMillan: a specification is a statement.

A statement describes an abstract view of a program. The view itself is often at an interface, which hides or abstracts internal states. A specification is stated in terms of two elements:

The way we write specifications is through an abstract program that observes or monitors all program events. This abstract program is able to remember the execution history of program being monitored, and decides, at any given moment, whether an action is allowable according to the specification.

One way to implement this abstract monitor program is to use guarded command form:

\[e(V):\ \gamma (S,V) \rightarrow {S := \tau(S, V)}\]

It means if a guarded command \(\gamma\) determines a given event \(e\) satisfies certain specifications with parameter \(V\) under state \(S\), then we accept the code and then deterministically update the state with a function \(\tau\).

The observation \(E\) of system is going to be a finite sequence of events, which corresponds to the system behavior, denoted as \(e_0(V_0)…e_{n-1}(V_{n-1})\). A run of \(E\) is a state sequence \(s_0\ …s_n\) such that for \(i\in 0\ … n- 1\), \(\gamma(s_i, V_i)\) is true and \(s_{i+1} = \tau(s_i, v_i)\). Observation \(E\) is accepted by the specification iff it has a run. We can test whether an observation is accepted by just executing the guarded commands. In layman’s term, if all guarded commands accepts the their corresponding event at a given time, then the sequence events must satisfy our specification and should be accepted.

Now let’s replicated file as an example. Out first informal attempt to the specification for “append” operation would be:

However, the problem with this specification is that this is a liveness property, meaning that we can’t practically test such property by observing a finite sequence of events. Therefore, we resort to a different safety specification we can test:

Now we convert liveness to safety by explicitly defining the moment hen the eventuality should hold.

Liveness property means a good thing eventually happens. A liveness property can be refuted by finite execution. Safety property means a bad thing never happens. A safety property can always be refuted by a finite execution.

To see how replicated file specification plays in action, we use the example given in Prof. McMillan’s presentation. The code is written in Ivy and is pretty self-explanatory. In this demo we only have two processes.

To install Ivy, simply execute virtualenv ivyenv && source ivyenv/bin/activate && pip install ms-ivy. This is tested on Ubuntu 18.04 LTS and may vary slight on other distros.

#lang ivy1.8

include numbers
include collections
include network

global {
    alias byte = uint[8]
    instance file : vector(byte)
    type pid = {0..1}
    instance net : tcp.net(byte)
}

process host(self:pid) = {
    export action append(val:byte)
    import action show(content:file)
    instance sock : net.socket
    var contents : file

    after init{
        contents := file.empty;
    }

    implement append {
        contents := contents.append(val);
        sock.send(host(1-self).sock.id, val);
        show(contents);
    }

    implement sock.recv(src:tcp.endpoint, val:byte) {
        contents := contents.append(val);
        show(contents);
    }
}

Then we form our specification based on the guarantee that if all sent messages are delivered, the two replicas are identical. The specification is equivalent to the guarded command we’ve talked about earlier.

specification {
    var msg_count : nat

    after init {
        msg_count := 0;
    }

    after host.sock.send(self:pid, dst:tcp.endpoint, val:byte) {
        msg_count := msg_count + 1;
    }

    after host.sock.recv(self:pid, src:tcp.endpoint, val:byte) {
        msg_count := msg_count - 1;
        ensure msg_count = 0 -> host(0).contents.eq(host(1).contents);
    }
}

We wrote the above code into a file named append.ivy and we generate the testing code using ivyc target=test append.ivy. Then we run the code using ivy_launch append.ivy.

Interestingly, the program yields an error message:

`ivy_shell`; ./append "[[0,{addr:0x7f000001,port:49124}],[1,{addr:0x7f000001,port:49125}]]"
> host.append(1,251)
< host.show(1,[251])
< host.show(0,[251])
> host.append(1,46)
< host.show(1,[251,46])
> host.append(0,183)
< host.show(0,[251,183])
< host.show(0,[251,183,46])
< host.show(1,[251,46,183])
assertion_failed("append.ivy: line 49")
append.ivy: line 49: error: assertion failed

What happens is the program generates tests that randomizes message arrival times and we can see a delivered message may arrive after its target sends another message, therefore creating corrupted file contents.

Notice that here we are actually running on real network to find counter examples, the downside is the test may be arbitrary long depending on the randomized testing cases. Instead, we will use bounded model checking (BMC) to test if the specification is correct. This way we can reply purely on the logic of our specification instead of running on the real network. The Ivy checker uses Z3 Theorem Prover.

BMC construct a boolean formula that is satisfiable if and only if the underlying state transition system can realize a finite sequence of state transitions that reaches certain states of interest.

To tell Ivy using bounded model checking, we add the following lines to append.ivy:

axiom host(0).sock.id ~= host(1).sock.id

attribute method=bmc[10]

Executing ivy_check detailed=false append.ivy, we see an error message:

> host.append(1,80)
< host.show(1,[80])
> host.append(0,64)
< host.show(0,[64])
> host.sock.recv(0,{tcp.endpoint.addr:...,tcp.endpoint.port:...},80)
< host.show(0,[64,80])
> host.sock.recv(1,{tcp.endpoint.addr:...,tcp.endpoint.port:...},64)
< host.show(1,[80,64])
append.ivy: line 49: error: assertion failed

Sometimes BMC can help us find the error faster because it is systematically checking all possible actions. However, increasing the number of steps for the BMC can result in the exploration space growing exponentially, so we are going to use some combination of BMC and randomized test cases.