Network Partition
Consider a set of machines connected in a network of some arbitrary topology, with the implicit expectation that every machine in the set can talk to any other machine in the set.
Strictly speaking, it is a process on one machine that sends messages to a process on another machine. For our purposes here, when we say "machine A talks to machine B", we mean "a process on machine A successfully sends a message to a process on machine B".
In its simplest formulation, a network partition is a type of failure that results in subsets of machines being isolated from the rest. Consider machines A and B connected to switch S1, switch S1 in turn connected to switch S2, and switch S2 connected to machines C, D, E and F. If a failure makes the connection between S1 and S2 unavailable, then the machines in the set { A, B } can talk to each other but can't talk to any machine in the set { C, D, E, F }, and vice versa. Each one of these sets is a partition.
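The switch example above can be reproduced with a small reachability computation. The following is only a sketch: it assumes links are bidirectional and that a link is either fully working or fully down, and the function name `partitions` is made up for illustration.

```python
from collections import defaultdict

def partitions(machines, links):
    """Group machines into sets that can still reach each other over working links."""
    neighbors = defaultdict(set)
    for a, b in links:
        # Assumption: every link carries traffic in both directions.
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    result = []
    for start in sorted(machines):
        if start in seen:
            continue
        # Depth-first walk over every node (machine or switch) reachable from start.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        # Report only the machines; switches merely forward traffic.
        result.append(component & set(machines))
    return result

machines = {"A", "B", "C", "D", "E", "F"}
links = [("A", "S1"), ("B", "S1"), ("S1", "S2"),
         ("S2", "C"), ("S2", "D"), ("S2", "E"), ("S2", "F")]

# All links working: a single group containing every machine.
healthy = partitions(machines, links)

# The S1-S2 link fails: the machines split into { A, B } and { C, D, E, F }.
broken = partitions(machines, [link for link in links if link != ("S1", "S2")])
```

With every link up, `healthy` holds one set with all six machines. With the S1-S2 link removed, `broken` holds the two partitions described above.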
Defined as above, network partitions are one type of communication failure, most commonly resulting from the failure of one particular component (e.g., a cable, switch or router) in a network topology. Note that this formulation allows a single machine that is disconnected from the network to be treated as an extreme case of a network partition: that machine is in a partition by itself.
Sometimes the concept of network partition is used more generally, to mean any connectivity failure: in this formulation, any failure that results in one or more machines in the set failing to reach one or more other machines in the set is considered a network partition. For example:
- In the set { A, B, C, D, E }, it is possible that A can talk to every other machine, while every machine other than A can talk to A and to nobody else. In this case the partitions are { A, B }, { A, C }, { A, D }, { A, E }. This could happen if machine A has 4 network interfaces (NICs) and has direct cable connections to B, C, D, and E. Every machine in this network can talk to every other machine as long as A has proper routing configured between its NICs; on a routing failure (e.g., one resulting from a mistake in a routing configuration change), the partitions listed above result.
- A machine M may be able to talk to a machine N while N is not able to talk back to M (e.g., because of a firewall misconfiguration).
In the last case above, due to the lack of symmetry, it is not possible to talk about sets of machines as individual partitions in the sense used before. Calling this state a network partition is a bit of a stretch, but it is possible to define a relation on the set of machines that captures the notion of "X can talk to Y" and build a definition on top of that. We are not interested in a precise mathematical definition for our purposes here, but it is important to understand that the term "network partition" can be used in different ways depending on context.
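As an informal illustration of such a relation (not a precise definition), "X can talk to Y" can be written as a set of ordered pairs. The sketch below uses hypothetical helper names, `mutual_groups` and `one_way_pairs`. It lists the largest groups in which every pair can talk in both directions and, separately, the pairs that can talk in one direction only. The search is brute force, so it is only suitable for a handful of machines.

```python
from itertools import combinations

def mutual_groups(machines, can_talk):
    """Return the maximal sets in which every pair can talk in both directions."""
    def all_mutual(group):
        return all((x, y) in can_talk and (y, x) in can_talk
                   for x, y in combinations(group, 2))

    groups = []
    # Try larger groups first, so any subset of an accepted group is skipped.
    for size in range(len(machines), 0, -1):
        for group in combinations(sorted(machines), size):
            candidate = set(group)
            if all_mutual(group) and not any(candidate < g for g in groups):
                groups.append(candidate)
    return groups

def one_way_pairs(can_talk):
    """Return the pairs (x, y) where x can talk to y but y cannot talk to x."""
    return sorted((x, y) for x, y in can_talk if (y, x) not in can_talk)

# Routing on A has failed: A can talk to everyone, and everyone else only to A.
hub_machines = {"A", "B", "C", "D", "E"}
hub_relation = {("A", x) for x in "BCDE"} | {(x, "A") for x in "BCDE"}
hub_groups = mutual_groups(hub_machines, hub_relation)
# hub_groups == [{"A", "B"}, {"A", "C"}, {"A", "D"}, {"A", "E"}]

# Firewall misconfiguration: M can talk to N, but N cannot talk back to M.
one_way = one_way_pairs({("M", "N")})
# one_way == [("M", "N")]
```

In the asymmetric case, `mutual_groups({"M", "N"}, {("M", "N")})` puts M and N in separate groups even though traffic still flows from M to N. That is one way to see why calling this state a partition is a stretch.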