Time and Order of Events in Distributed System

1. The Need for Logical Clock

One
of the challenges in distributed system is the lack of global time
clocks, it's very hard to timestamp events is different processes and
order them globally.

To solve the "Time & Order of Events in Distributed System" problem, let's rethink what "Time" and "Order" means - essentially, Time is a property of events that is used to Order or Sequence them.

In distributed system, many events occur in different process independently, we can't and no need
to find a deterministic order between two independent events. But are
there dependent/related events? Yes, events are related when one
causally affects another, for example through communication between
processes.

Causal action determines order among events,
and we must use some form of information to represent this kind of
orders. This kind of information catches the causal relationship of
events but don't need a real time/clock, it is called Logical Time or Logical Clock.

2. How to implement Logical Clock?

What we have in hand?
- Ordered Evends within a Process
- Causual Relationship between some Crosss-Process Events

With these information, we can define a relation ship called "Happend Before"
- 1. if Event E1 and E2 are from the same process, and E1 comes before E2, then E1 Happened Before E2
- 2. if E1 is sending of a message and E2 is receiving of the same message, then E1 Happened Before E2
- 3. if E1 Happened Before E2, and E2 Happened Before E3, then E1 Happened Before E3.

In essence, we have a partial order among the events in distributed system.

2.1 Lamport Clock

Lamport Clock is a mechanism that can extend a partial order into a total order that is consistent with the original partial order.

The algorithm is:

1. Each process set a initial clock value(arbitary) on start up;
2. A
process increments its counter between two consecutive events in that
process; (Send message and Receive message are all events)
3. When a process sends a message(after doing step 2), it includes its counter value with the message;
4. On receiving a message(after doing step 2), the receiver process sets its counter to be greater than the clock value in received message AND greater than or equal to its present clock value;
5. If two events have the same clock value, use its associated process ID to determine the order between them.

After
applying this algorithm, we can get a total order (each event pair can
be compared in terms of order) of all the events in a distributed
system.

The invariant of Lamport clock algorithm is that: if event E1 is Happened Before event E2, then Clock(E1) > Clock(E2).

But
the drawback of Lamport Clock is that: if Clock(E1) > Clock(E2), we
don't know whether E1 is happened before E2 or E2 is happened before
E1. The root cause of this problem is that, Lamport Clock lost some
information during the extending algorithm. In another word, the
resulting total order is just one of those valid total orders.

Sometimes, we need to keep the whole partial order information. To this end, people invented so called Vector Clock.

2.2 Vector Clock

The main purpose of vector clock mechanism is to retain the complete partial order information(I.E. all possible total orders, not just one of them, as in Lamport Clock) in a logical clock system.

The basic observation is that, in Lamport Clock algorithm, we only retain the information that a receiving message event is causally afftected by the sending message event, but some information about the fact that the receiving message event is NOT causally impacted by some events in other processes is lost.

To
fix this problem, we should not only use the clock value of the sending
process, but also the clock value from other processes to identify all
causal events from all processes in the whole distributed system.

So the basic idea is:
- Use a vector to represent event time/clock
- Each element stores the clock value for one process

Combining
the upper idea and Lamport Clock algorithm, we can design a new
algorithm to produce vector clock value for each events in the whole
distributed system:

1. Initially all clock values are set to zero;
2. Before a process experiences an internal event, it increments its own clock value in the vector by at least one.
3. Each time a process prepares to send a message, it conducts Step 2 and then sends its entire vector along with the message being sent.
4. Each time a process receives a message, it conducts Step 2 and updates each element in its vector by (suppose the sending process is Ps):

If local_vector[Ps] <= other_vector[Ps] then
local_vector[Ps] = other_vector[Ps] + 1;
for each element i other than Ps do
local_vector[i] = MAX(local_vector[i], other_vector[i]);

From the algorithm we can see that, an element of the vector clock of an event means that this event only causally dependents on the events happened before that time in the corresponding process.

How to use Vector Clock obtained by the upper algorithm to infer event order?
- Let vClock(x) denote the vector clock of event x
- $VC(x) ＜ VC(y) \iff \forall z [VC(x)_z \le VC(y)_z] \and \exists z' [ VC(x)_{z'} ＜ VC(y)_{z'} ]$
[ In English: vClock(x) is less than vClock(y) if and only if
at least one element in vClock(x) is strictly less that that of
vClock(y), and other elements are less than or equal to those in
vClock(y). ]
- X Happened Before Y <=> vClock(X) ＜ vClock(Y)

In a word, X Happened Before Y if and only if at
least one element in vClock(x) is strictly less that that of vClock(y),
and other elements are less than or equal to those in vClock(y)
.

[Reference]

1. Order

by Leslie Lamport (1978)
4. Timestamps in message-passing systems that preserve the partial ordering by Colin J. Fidge (1988)