I have a proprietary protocol that describes the exchange of messages of two participants (applications) via a messaging broker (mqtt). A participant is of type A or B. Each type is allowed to send and receive a specific set of messages. The protocol defines a strict sequence in which the messages are allowed to be exchanged.

For example:

  • A has to start the conversation with sending message M with field Z having value 1.

  • B then receives the message and has to answer with Message N with field Z having value (newValue = oldValue + 1) = 2.

I would like to offer all implementors of the protocol a programming language independent artifact that gives them the ability to verify that the received message is valid and allowed according to the order without writing business logic in their implementation.

I thought about modeling the protocol as a FSM (Finite State Machine) and generating a Regex from it.

Is this a good idea? Is there a standard solution for this problem?

2

A Finite State Machine is essentially a graph where each state is a node.
Since the total number of states is finite the current state can be represented as a single integer.

The edges of the graph are labelled with a trigger condition – in this case the transfer of a message. Assuming there are only two parties involved in the protocol (lets call them client and server) and that the client is the party that initiates the flow (sends the first message) every message that the client sends will be received by the server and vise versa – hence if both parties know their role (client or server) they can use the same FSM.

If ALL you need is an FSM, you simply need to pick a format that allows you to serialize the graph. Then you need a minimal implementation in whatever programming language you are using, to track the state of the conversation, specifically the state will be represented as a boolean and an integer:

  • The role of the current party (client or server)
  • The integer state of the conversation.

I am envisioning a wrapper that integrates with the connection code so that it sees all inbound and outbound messages – updating it’s state as each message passes – which would then throw an exception, if it ever gets into an invalid state. Perhaps this logic already exists in workflow engines, but that is probably overkill for what you need, you may be better off implemented it yourself.


However there is an elephant in the room:

If ALL you need is an FSM

As stated above, the FSM is not capable of capturing your requirement of newValue = oldValue + 1 it is simply tracking state transitions, based on the types of messages being sent.

Could we extend the graph representation to capture these kind of constraints ?

Maybe … we are now saying that any implementation of this validator has to maintain visibility on two messages (the previous one and the one about to be sent/received) and that transitions between states can have conditions that can check values in either message and/or compare values in both messages. This is getting cumbersome, but it is still technically possible to implement it.

However, at some point you may get to point where you need to store additional arbitrary state, at that point you have moved from an FSM to a full Turing machine / Turing complete language. In such a case I don’t see the value of inventing yet another language for this.

1

You describe some fairly general error checking, along the lines of
pre- and post-conditions.
No, a regex does not seem like a good fit for this.

There’s enough checks that if in 2023 we have some number of checks,
in 2024 it seems likely we will add one (or a bunch of) checks.
Placing a burden upon several implementors in various languages.
So avoiding “translate from this representation
to that target language” seems advantageous.

Recommend you offer a validation tool, written in python or some other
language you find convenient, which accepts log records.
Require implementations to have a logging verbosity flag
so when needed they produce log records in a standard format,
perhaps JSON.
Validate such logs, flagging any protocol violations.


Log records will likely include timestamps,
and perhaps the ID(s) of recently received message(s)
so we can infer the causality of happens-before relationships.
Your “oldValue + 1” example suggests that we might need to
be routinely logging a fair bit of state, local variables
that in some circumstances the validator may need to inspect.

Log messages should definitely mention they describe v1
of the protocol, or v2 or however you rev the spec.
Because it will change.
And you will care about back-compat interoperation
between peers running code of different vintage.

If timeouts figure into your protocol,
then you’ll want some approach to exercise corner cases,
minimally by introducing random delays or dropped messages.

A logging setup supports both an online proxy validator,
which can tear down a session immediately upon noticing
a violation, and also an offline historic checker
that can report on yesterday’s run.
Offline checking works nicely both for automated
integration tests and for scrutiny of a full day’s
worth of production interactions.

“I have a problem. I know, I’ll use regexes. Now I have two problems.”

What you have is a sequence of message summaries between two parties, and you want to verify that the sequence is correct. This is almost precisely the problem that Don Good’s group at The University of Texas at Austin worked, as a formal verification problem, in the early 1980s.

I would be VERY surprised to learn that J. Strother Moore’s group at UT Austin did not know of similar work, using ACL2. ACL2 contains a formally verifiable subset of Common LISP and includes a theorem prover. I’d start there.

You wrote

to verify that the received message is valid and allowed according to the order

which sounds to me like two separate problems:

  1. verify that a received message M (by B) or N (by A) is syntactically valid, and follow certain semantical contraints which can be validated by looking at the individual message alone

  2. verify that the order constraints are fulfilled, which can only be analysed by looking at M and N together (by A).

Note the 2nd problem should not be intermixed with monitoring any state of some counters. When the correctness of a message N depends on some former message M, N needs to include either a reference to the unique ID of M, or the full copy of the content of M, so party A (having kept a copy of M) can pass M and N together into a stateless validator. The information that M is the first message of a sequence can be indicated by not having a “former message” ID. So there should be no need for using any kind of state machine inside the validation – the only place where some state needs to be managed is the place where you keep a history of former messages, and that can be kept completely separate from the validation.

You also wrote it is a proprietary protocol – but how proprietary? When the protocol is based on some standard like JSON or XML, for part 1, one could utilize XmlSchema or JSON schema. For part 2, one could utilize XSLT or some XSLT equivalent for JSON, writing some predicates which test the conditions. XmlSchema validators and XSLT processors are available for any modern language environment, and I guess same is true for JSON validators and a few of the JSON-XSLT equivalents.

For a fully proprietary protocol, however, the situation can become arbitrary complex, or you need an equally proprietary schema language and query language – don’t expect a standard for validations if not even the messages themselve follow a syntacatical standard. A handful of condition types might be checked by regexes, but beyond a certain complexity this will become impractical.

As a pragmatic solution, you could provide a reference implementation for the validation in some popular, high-level language like Python or JavaScript. This allows implementors to either use that reference implementation as a blueprint for creating their own, or run an embedded Python or Javascript engine inside their environment for it.