Routing Chord

Repository Rust

A Chord is a peer to peer algorithm where the members of the system, nodes, are arranged in a circular fashion. Each node is responsible for the section of the circle between itself and the node that comes after it. While chords are typically used to implement a distributed hash table, this implementation uses the structure to route data instead.

How it functions

In most implementations each node needs to derive its own address, as this is easier and more distributed than assigning each address manually. Additionally, each node must be able to verify the address of incoming messages. Often this is done by using the IP address, as this is easily verifiable by the recipient.

Each node must at least track its successor and predecessor to properly form the circular structure of the chord. With just these two links messages can be routed clockwise around the circle, with each node passing the message along to its successor until the message has arrived. However, this leaves the system open to poor performance as the number of hops is proportional to the number of nodes. And the system is more brittle: if more than one node leaves at a time, the chord can end up split in two. To address this, the node also stores a selection of other nodes, called the finger table. These connections act as both backup connections and shortcuts when routing messages.

Since a chord has no centralized management structure, each node must participate in the correct maintenance of the overall structure. To do this, each node periodically performs the three following operations: stabilize, notify and fix_fingers. The first two are important for inducting new nodes into the chord. The third maintains the finger table for stability and performance.

Creating a new chord is straightforward when it comes to the logic. The first node starts and sets itself as its own predecessor. This must be true since there are no other nodes. In this scenario, the only tricky part is advertising such a small chord for others to join.

Joining a chord is more involved since other nodes must update their data to reflect the addition. First, the joining node connects to some other node in the existing chord. Since this node is the only other node it knows about, it sets that node as its successor. The chord is likely in an incorrect state, but it is connected enough for the stabilize and notify operations to take over and move the chord towards correctness.

During the stabilize operation, a node asks its successor which node it thinks is its predecessor. If the successor answers that the first node is its predecessor, then all is well and no further action is required. If, however, the successor indicates that some other third node is its predecessor, then the node must decide if it should update which node it thinks is its successor. If the new third node is closer to the node than its current successor, the new node becomes the successor. Otherwise, the original successor is maintained.

A diagram of a small chord with only a handful of nodes. It demonstrates a new node becoming the new predecessor of the node it joined behind. — When the node at 132 queries its supposed successor, it is instead informed of the joining node at 155

A diagram of a small chord with only a handful of nodes. It demonstrates a new node moving backwards in the chord, becoming the new predecessor of the node behind its join target. — In the opposite case, the joining node is the one to receive the new successor information. This puts it behind the successor's predecessor.

This process allows new nodes over time to find their correct position in the chord. However, this only updates a node’s successor. The predecessor needs updating as well. The notify operation takes care of this. Every so often each node alerts its successor to its existence. When a node receives such a notification, it checks the new location against the current location. If the new location is closer to it than the old location, it accepts the new node as its predecessor. Occasionally, nodes clear their predecessor to handle the case where their predecessor disconnected, allowing further nodes to become the new predecessor.

Through these two processes, the correct order of the nodes in the chord is maintained. The finger table, which stores connections to additional nodes, also needs to be kept up to date. When fix_fingers is run, it queries the chord to find the predecessor of chord locations that are ahead of it. The locations it queries are spaced out such that the ith query is 2^i locations ahead of the node itself. This allows the routing procedure to function like a binary search of the circular address space.

When messages are routed they are sent to the furthest node in the finger table that is still behind the destination. That node similarly routes the message ahead. Since the finger table entries are closely spaced near the node and widely spaced far away, this causes messages that are far away to take a large shortcut and smaller more precise steps as they get closer. The first entry in the finger tables is the node’s successor. The last entry in the finger tables is half way around the circle from the node itself.

This implementation

The chord structure can be used in different ways. Often it is used as a hash table to find some data that is associated with a key. In that scheme the key is a location in the chord, and that location’s predecessor is the node which is hosting the related data. This implementation does not store data. Instead, it is about creating tunnels from one of the nodes to another.

Addresses in this implementation are public keys. This allows nodes to generate their address themselves, but also messages can be confirmed to be from their claimed origin by signing the messages with the private key.

Routing data through a structure like this comes naturally since it already must route its internal query messages. This functionality is extended to allow each node to send arbitrary data to another node. In the simplest arrangement, the data is routed to a particular address in the circular address space and if that node exists, it accepts the data.

However, this implementation goes further. Instead of directly routing data from one node to another, it allows nodes to establish aliases where all data to an alias is re-routed to the aliased node. These aliases are known only by the node itself and the node that is hosting the alias. This allows nodes to create anonymous channels that are routed through the chord.

To establish an alias, a node generates a public/private key pair. It also generates a symmetric stream key to encrypt the incoming alias messages. The public key becomes the exposed alias address to which messages can be sent. The node can prove that it is the rightful owner of the alias, because it knows the corresponding private key.

The node then sends a request through the chord to discover which node is the predecessor of the alias address. That node is selected to be the host of the alias. Since addresses are public keys the node can use the address to encrypt further messages to the alias host. Only the host is able to receive these messages.

Next, the node sends the address of the alias and the address to which the alias should forward messages to the host node. The host receives the request, verifies that the message is signed by the alias’s private key, and stores an entry in its alias table.

When a message arrives for the alias, the host wraps the message with the alias’s stream key, and forwards it to the next address indicated by the alias table. When the original node receives this message it uses the alias’s private key to decrypt the message before accepting the data.

Nodes may also send messages forward through their aliases. They can sign instructions with the alias’s private key and send that message to the host node. The host can verify the message is from the alias’s owner, and perform an operation. One important operation is to send a message as if it were the alias. This allows the alias chain to be extended. The alias acts as if it were a full node, and sends a message out to another node to request it to host an alias back to it.

In this case the original is the one that generates the new key, tracks which aliases are pointing where, and generates the appropriate message for the alias to send. The host can’t be trusted to do this work, it only relays the constructed message as if it were the alias itself.

Once the chain is a few hops long no node except the original knows the full extent of the chain, and the identity of the true recipient is made anonymous. However, because people often do want to have a persistent identity, even if that identity is anonymous, the last alias’s key pair is saved from session to session. The intermediate keys can be changed and regenerated as needed to preserve the anonymity.

Additional thoughts

Since incoming messages are wrapped with the the alias’s stream key and outgoing messages are encrypted with the public/private keys, this creates a directional effect. Incoming messages will be encrypted with better performance than the outgoing messages. For this reason the outgoing messages are control messages to puppet the aliases, rather than a fully fledged communication channel.

To create a bi-directional connection, additional work is required. This could be implemented by having a single listen channel for each node. When a connection is established, a message is sent to this channel by the initiator with details about who it is and what alternate channel to use. Then if the recipient accepts the connection, it can reply with its own alternate channel. The two alternate channels then become the two halves of the bi-directional channel. Alternatively, each node can produce an incoming channel, and then de-multiplex the incoming messages using an additional identifier in the messages themselves.

This project used TCP to establish the connections, but since the data being transmitted is discrete messages this works at odds with TCP in some respects. Also, the system cannot assume that any operation succeeded on the other side, and therefore must retry operations to ensure they succeed. This also repeats some of the value that TCP provides. Future work might include a translation to UDP, which could also help with NAT hole punching.