Why You Should Not Try to Order Your Vector Clocks
Vector Clocks capture causal relationships between events in a distributed system. However, ordering them is not easy and not always…
Logical clocks capture chronological and causal relationships between events in a distributed system. Instead of a physical clock, the number of events is counted to create a notion of time within a system. For one, Lamport Timestamps employ a single, shared event counter that is exchanged with each message. These timestamps then allow the ordering of events to preserve causality between them.
Vector Clocks keep individual counters of events per server. While this increases complexity and message size, Vector Clocks can further identify concurrency between events. This allows for the detection and resolution of concurrent modifications. While ordering Vector Clocks is feasible, it is not straightforward (and should not be necessary).
What is Causality?
Causality describes the potential influence of some event a on some other event b. For example, if a happened before b on the same server, then a might have influenced b. The same holds if messages were exchanged between servers such that there is any way event a influenced event b.
This influence is captured by the so-called Happened-Before Relation (also denoted as ⟶). Two events are called concurrent if neither of them happened before the other.
What is the Clock Consistency Condition?
Lamport timestamps capture causal relationships in a higher timestamp, meaning that if a happened before b, then C(a) < C(b) (known as the clock consistency condition).
Vector Clocks provide a stronger notion: a happened before b iff VC(a) < VC(b). This means that Vector Clocks also capture if two events are not causally related, i.e., concurrent. For example, if neither VC(a) < VC(b) nor VC(b) < VC(a) holds — the events (or rather their timestamps) are then called concurrent, denoted by VC(a) || VC(b).
Why Do We Want Consistent Ordering?
Consistent ordering of distributed events mainly incurs two aspects:
- The order should preserve the causal relationships between events.
- The order should be consistent across nodes, i.e., the order has to be unique.
Once we can order our events consistently, each server can order events and apply changes similarly, ensuring consistent states across servers. Hence, consistent ordering directly enables consistent states (given the same events).
Ordering is Straightforward with Lamport Timestamps
As Lamport Timestamps are just integers, we can sort them with our favorite traditional sorting algorithm. And, since a potential causality always implies a higher timestamp, ordering based on Lamport Timestamps preserves the correct causal order of events.
However, it does not define a total order for sets of events: Two events with the same timestamp, i.e., need a “tie-breaker” such as the server ID that generated that event (or some other unique property). Without such a tie-breaker, the order of events might depend on the initial order and the sorting algorithm, i.e., it is not necessarily reliable.
Why Ordering Vector Clocks is Hard
Ordering Vector Clocks, however, is not as straightforward. While the Lamport Timestamps imply a partial order regarding event causality, the timestamps are totally ordered. Vector Clocks, on the other hand, can be concurrent. This concurrency, however, is not transitive, i.e., for three events a, b, and c, even if VC(a) || VC(b) and VC(b) || VC(c), it is not implied that VC(a) || VC(c).
An Example
Consider the following example:
VC(a) = [1, 0]
VC(b) = [0, 1]
VC(c) = [2, 0]
Then VC(a) || VC(b) and VC(b) || VC(c) but VC(a) < VC(c)
Hence, for this example, valid orders are [a b c], [b a c], and [a c b].
Note that even a tie-breaker cannot directly solve this issue if the initial order is different because Vector Clocks do not have a total order (as opposed to Lamport Timestamps) — this is a common misconception (and I used to believe this). The problem lies in how standard sorting algorithms work: They rely on a strongly connected property of the values, i.e., it always holds a < b or b < a (or a=b). This property does not hold with Vector Clocks because they can be concurrent (which does not mean they are equal).
One could first order the events based on some arbitrary tie-breaker (note that in the best case, this one should be globally unique) and then sort the events using a stable topological sorting algorithm, e.g., a stable version of Kahn’s Algorithm.
Why Employing Vector Clocks for Ordering Does Not Make Sense (Usually)
Compared to simple Lamport Timestamps, Vector Clocks can capture the concurrency of events. This is a convenient feature since it allows, e.g., resolving concurrent modifications.
However, the order itself is usually irrelevant when considering concurrent events. Typically, an arbitrary tie-breaker dictates the outcome (often, one would need to resolve the conflict manually). For the sole purpose of ordering, Vector Clocks do not provide additional benefits over Lamport Timestamps.
There are certain cases, however, when it could make sense to employ Vector Clocks for ordering. For instance, when your events are equipped with a wall clock timestamp, this physical timestamp can be used to order concurrent Vector Clocks, i.e., preserve causality while using a physical clock for improved ordering.
Ordering Vector Clocks: The Easy Way Out
Lamport Timestamps and Vector Clocks work similarly in gauging the number of events in the distributed system: Both increase counters and take the maximum when merging clocks. Hence, summing up all the entries derives a Lamport Timestamp from the Vector Clock. This sum then allows for deriving a total order with the help of a tie-breaker.
Conclusion
While feasible, Vector Clocks are not the solution to ordering your events. Vector clocks capture concurrency so potential conflicts can be resolved; the concurrency does not affect the sorting. Sorting Vector Clocks is more complicated, and the partial ordering implied by Vector Clocks is not solved by just adding tie-breakers, as with Lamport Timestamps, which (by the way) already provide an excellent way to order distributed events in a causally preserving way (cf. [1]).
[1]: Time, Clocks, and the Ordering of Events in a Distributed System — Leslie Lamport