DDB

Repository RustDistributed System

The ‘groups’ functionality of the Spider platform involves a chunk of data that is fully synchronized between each of the members of the group. This is required because the goal is that each member has the same view of the data as the others. While I was working on that project, I wondered what it would be like to relax that constraint. This project explores that premise through the implementation of a distributed key value store. I also used the opportunity to explore using UDP instead of TCP.

It should be noted, that since each node can have a different view of the data, this is clearly not intended for situations where that constraint is required. This experiment is not intended to be a replacement for traditional databases.

Since there is no single synchronized view of the data, there must be another way to determine what data is seen by any given node. Part of the motivation behind both the Spider’s group data and this experiment is to try to model a distributed system after the way that human beings interact with data socially. Therefore, each node can determine its view of the data based on its trust of the source(s) of that data.

The system propagates new data through it using a gossip protocol. When a node assigns a new value to a key, it broadcasts the value to a few other neighboring nodes. These nodes in turn broadcast the value to a few of their neighbors. If a node receives a broadcast it has already heard from another node, it ignores it. In this way the new data spreads through the system. This gives each node an opportunity to decide if it trusts the origination of the data. If the data is trusted it is forwarded to the neighbors and a copy is kept locally. If the data is neutral, it is forwarded but not kept locally. And finally, if the data is distrusted, it is rejected and neither kept nor forwarded.

Each node curates what data is allowed into its local set by determining trust. There are two parts to this determination. First, the user can set the node’s trust of some source to some value. This defaults to a neutral trust for new nodes. Next, each of the neighbors are asked what their trust of the source is. The neighbor’s trust is then adjusted by the node’s trust of that neighbor. Trusted neighbors’ opinions are given more weight than neutral ones. The combined result is applied as an adjustment to the base value set by the user.

When values are sent through the network, there needs to be a way to know if the new value should supersede the current value. Also, some node might not trust the most recent sender and consider a less recent value to be its current value. Two implications come from this. Values must have some kind of sequence number to order them, and some number of historical values should be kept for the system to be able to produce a value when the more recent values are undesirable.

Ordinarily, introducing a sequence number like this would imply some way to strictly assign the numbers uniquely, and in order. Unfortunately, this goes against the premise of the experiment. The system must be distributed, and not require nodes to strictly sequence anything. However, since the system is not required to provide the same view to each node, there is a way around this.

If a node sets a new value with the next known sequence number, but due to latencies in the network, another node has chosen its next sequence number to be greater, this is ok since other nodes can overwrite your values at any time, in this case it was immediately. If two nodes simultaneously choose the same sequence number, we can use the trust system to break the tie. Nodes will use whichever sequence number is from a source they trust more, and as nodes can have different views of the system this is not a violation.

Uses

A database where nodes can construct their own version of the stored data isn’t straightforwardly useful. What could this be used for? Since this system is designed in a way that mimics human social behaviors, it could be used to underpin a social media application. This way users can block each other, or indicate that they want to see more data from another trusted user. This behavior can then be directly integrated with the data layer instead of synthesized on top of it. The sequence structure of the messages is also conducive to building a feed, which is a common component of social media.

The sequenced or feed like structure of the messages could also be useful for disseminating other public data. E.g. temperature sensors could all write to some key, and then each user only trusts nearby or otherwise relevant sources.

The system also has a mechanism to remove antisocial nodes from the network. If a node is blocked by enough of its peers, its messages will not be relayed through the network and it will be effectively banned. This allows banning without relying on some authority to make the decisions.

One weakness of this system is that it requires nodes to see and present antisocial data or behaviors before the node or its user can react and then start to distrust the offending node. This means malicious content has to be seen by at least some portion of the userbase before the safety net kicks in. This could be at least partially solved through the use of automated trust adjustments. If an automated filter detected objectionable content, it could automatically decrease the trust level by some amount. This distrust would then spread more rapidly since many nodes would simultaneously apply distrust. Then since trust is partially based on neighbors, those neighbors would depress the trust level even more.

Other strategies could include adjusting the base trust level, or stricter filtering so that only actively trusted nodes’ content is displayed.