Federation – what flows where, and why?
With all of the recent hullabaloo around Gab, and now Kiwi Farms joining the fediverse today, a lot of people have been asking how data flows in the fediverse and what exposure they actually have.
I'm not a fan of either of those websites, but that's beside the point. The point here is to give an objective presentation of how instances federate with each other and how these federation transactions affect exposure.
How Instances Federate
To start, let's describe a basic model of a federated network. This network will have five actors in it:
(yeah yeah, I know, I'm not that good at making up fake domains.)
Next, we will build some relationships:
- Sophie follows Alyssa and Bob
- Emily follows Alyssa and Chris
- Chris follows Emily and Alyssa
- Bob follows Sophie and Alyssa
- Alyssa follows Bob and Emily
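As a rough sketch, the relationships above can be written down as a small mapping, and each actor's followers collection derived from it. The names `FOLLOWS` and `followers_of` are just made up for this illustration.

```python
# Who follows whom, copied from the relationship list above.
FOLLOWS = {
    "Sophie": {"Alyssa", "Bob"},
    "Emily": {"Alyssa", "Chris"},
    "Chris": {"Emily", "Alyssa"},
    "Bob": {"Sophie", "Alyssa"},
    "Alyssa": {"Bob", "Emily"},
}


def followers_of(actor):
    """Return the set of actors who follow `actor` (the followers collection)."""
    return {name for name, following in FOLLOWS.items() if actor in following}


for actor in sorted(FOLLOWS):
    print(actor, "is followed by", sorted(followers_of(actor)))
```

Note that following is not symmetric here: Sophie follows Alyssa, but Alyssa does not follow Sophie, so Sophie's followers collection contains only Bob.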
Here's what that looks like as a graph:
Normally, posts flow through the network in the form of broadcasts. A broadcast-style post is one that is sent to a predetermined set of targets, and only to them; typically that set is your followers collection.
So, this means that if Sophie makes a post, chatty.example is the only server that gets a copy of it. It does not matter that chatty.example is peered with other instances (
This is, by far, the majority of traffic inside the fediverse.
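Here is a minimal sketch of broadcast delivery under the model above. The `HOME` mapping is partly an assumption: the text places Bob on chatty.example and Alyssa on social.example, while the other three hosts are hypothetical placeholders I invented for the example.

```python
FOLLOWS = {
    "Sophie": {"Alyssa", "Bob"},
    "Emily": {"Alyssa", "Chris"},
    "Chris": {"Emily", "Alyssa"},
    "Bob": {"Sophie", "Alyssa"},
    "Alyssa": {"Bob", "Emily"},
}

# Actor -> home instance. Only Bob's and Alyssa's hosts come from the text;
# the ".invalid" hosts are hypothetical placeholders.
HOME = {
    "Bob": "chatty.example",
    "Alyssa": "social.example",
    "Sophie": "sophie-home.invalid",
    "Emily": "emily-home.invalid",
    "Chris": "chris-home.invalid",
}


def broadcast_targets(author):
    """Instances that receive a post addressed to the author's followers."""
    followers = {name for name, following in FOLLOWS.items() if author in following}
    return {HOME[name] for name in followers}


print(broadcast_targets("Sophie"))
```

Only instances hosting a follower show up in the result. Which other instances those servers happen to peer with never enters the calculation.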
The other kind of transaction is easily described as relaying.
To extend our example above, let's say that Bob chooses to Announce (Mastodon calls this a boost, Pleroma calls this a repeat) the post Sophie sent him.
Because Bob is followed by Sophie and Alyssa, both of them receive a copy of the Announce activity (an activity is a message which describes a transaction). Relay activities refer to the original message by its unique identifier, and recipients of Announce activities use that identifier to fetch the referenced message.
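To make the "reference, then fetch" step concrete, here is a toy sketch. The `Announce` type and the `object` field are standard ActivityStreams vocabulary; the URLs, the `ORIGIN` dictionary and the `fetch`/`receive` helpers are hypothetical stand-ins for real instances and real HTTP requests.

```python
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"

# Hypothetical identifiers; real ones depend on the instances involved.
NOTE_ID = "https://sophie-home.invalid/objects/1"
BOB_ID = "https://chatty.example/users/bob"

# Stand-in for the origin server's object storage.
ORIGIN = {
    NOTE_ID: {"id": NOTE_ID, "type": "Note", "content": "Sophie's original post"},
}

announce = {
    "@context": AS_CONTEXT,
    "type": "Announce",
    "actor": BOB_ID,
    "object": NOTE_ID,  # a reference to the post, not a copy of it
}


def fetch(object_id):
    """Stand-in for an HTTP GET of the object from its origin server."""
    return ORIGIN[object_id]


def receive(activity, local_store):
    """On an Announce, resolve the referenced object and keep a local copy."""
    if activity["type"] == "Announce":
        obj = fetch(activity["object"])
        local_store[obj["id"]] = obj
    return local_store


alyssa_instance_store = receive(announce, {})
print(sorted(alyssa_instance_store))
```

The Announce itself carries no post content. The copy on the receiving instance only exists because that instance went and fetched it.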
For now, we will assume that Alyssa's instance (social.example) succeeded in fetching the original post, because there is presently no access control in practice on fetching posts in ActivityPub.
This now means that Sophie's original post is present on three servers:
Relaying can cause perceived problems when an instance blocks another instance, but these problems are actually caused by a lack of access control on object fetches.
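A toy model of why a block does not help here. This assumes, purely for illustration, that a block stops direct delivery and that the origin could identify who is fetching; `OriginServer`, `fetch_checked` and the `blocked.invalid` host are all hypothetical and not taken from any real implementation.

```python
class OriginServer:
    """Toy origin instance holding posts and a set of blocked instances."""

    def __init__(self, blocked_hosts):
        self.blocked_hosts = set(blocked_hosts)
        self.objects = {}

    def delivery_targets(self, follower_hosts):
        # Assumption: a block stops direct delivery to the blocked instance.
        return set(follower_hosts) - self.blocked_hosts

    def fetch_open(self, object_id, requester_host):
        # No check on who is asking, which is the situation described above.
        return self.objects.get(object_id)

    def fetch_checked(self, object_id, requester_host):
        # Hypothetical: refuse the fetch when the requester is blocked.
        if requester_host in self.blocked_hosts:
            return None
        return self.objects.get(object_id)


origin = OriginServer(blocked_hosts={"blocked.invalid"})
origin.objects["post-1"] = "Sophie's post"

print(origin.delivery_targets({"chatty.example", "blocked.invalid"}))
print(origin.fetch_open("post-1", "blocked.invalid"))
print(origin.fetch_checked("post-1", "blocked.invalid"))
```

With `fetch_open`, the blocked instance gets the post as soon as it learns the identifier from a relay. The block only ever applied to the delivery path.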
A variant on the broadcast-style transaction is a Create activity that references an object as a reply.
Let's say Alyssa responds to Sophie's post that was boosted to her. She composes a reply that references Sophie's original post with the
Because Alyssa is followed by actors on the entire network, now the entire network goes and fetches Sophie's post and has a copy of it.
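Sketched in code, the reply case looks something like this. `Create` and `inReplyTo` are standard ActivityStreams vocabulary; `PARENT_ID` and the helper function are hypothetical, and actors are identified by bare names to keep the example short.

```python
FOLLOWS = {
    "Sophie": {"Alyssa", "Bob"},
    "Emily": {"Alyssa", "Chris"},
    "Chris": {"Emily", "Alyssa"},
    "Bob": {"Sophie", "Alyssa"},
    "Alyssa": {"Bob", "Emily"},
}

# Hypothetical identifier for Sophie's original post.
PARENT_ID = "https://sophie-home.invalid/objects/1"

reply = {
    "type": "Create",
    "actor": "Alyssa",
    "object": {
        "type": "Note",
        "inReplyTo": PARENT_ID,  # points back at Sophie's post
        "content": "Alyssa's reply",
    },
}


def recipients_resolving_parent(activity):
    """Followers who receive the reply and therefore learn the parent's id."""
    parent = activity["object"].get("inReplyTo")
    if parent is None:
        return set()
    author = activity["actor"]
    return {name for name, following in FOLLOWS.items() if author in following}


print(sorted(recipients_resolving_parent(reply)))
```

Since everyone else in the model follows Alyssa, every other actor's instance ends up holding a reference to Sophie's post.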
This too can cause problems when an instance blocks another. As in the relaying case, the cause is a lack of access control on object fetches.
From time to time, people talk about metadata leakage with ActivityPub. But what does that actually mean?
Some people erroneously believe that the metadata leakage problem has to do with public (without access control) posts appearing on instances which they have blocked. While that is arguably a problem, it is related to the lack of access controls on public posts. The technical term for a publicly available post is as:Public, a reference to the security label that is applied to it.
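For illustration, a check for that label might look like the sketch below. The full identifier `https://www.w3.org/ns/activitystreams#Public` and its compact forms come from the ActivityPub specification; the helper `is_public` is hypothetical and, as a simplification, only looks at the `to` and `cc` fields.

```python
# The public collection, in its full and compact spellings.
PUBLIC_IDS = {
    "https://www.w3.org/ns/activitystreams#Public",
    "as:Public",
    "Public",
}


def is_public(activity):
    """Return True when the activity is addressed to the public collection."""
    addressed = []
    for field in ("to", "cc"):
        value = activity.get(field, [])
        # A field may hold a single string or a list of strings.
        addressed.extend([value] if isinstance(value, str) else value)
    return any(target in PUBLIC_IDS for target in addressed)


public_post = {"to": ["https://www.w3.org/ns/activitystreams#Public"]}
followers_only = {"to": ["https://sophie-home.invalid/followers"]}  # hypothetical URL

print(is_public(public_post), is_public(followers_only))
```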
The metadata leakage problem is an entirely different problem. It deals with posts that are not labelled
The metadata leakage problem is this: if Sophie composes a post addressed to her followers collection, then only Bob receives it. So far, so good, no leakage. However, because of bad implementations (and other problems), if Bob replies back to Sophie, then his post will be sent not only to Sophie, but also to Alyssa. Based on that, Alyssa now knows that Sophie posted something, but has no actual idea what that something was. That's why it's called a metadata leakage problem: metadata about the existence of one of Sophie's objects, and hints about its contents (based on the text of the reply), are leaked to Alyssa.
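The leak can be reproduced with a few lines of set arithmetic. This is only a sketch of the addressing, assuming the reply goes to the original poster plus the replier's own followers; `POST_ID` is a hypothetical identifier.

```python
FOLLOWS = {
    "Sophie": {"Alyssa", "Bob"},
    "Emily": {"Alyssa", "Chris"},
    "Chris": {"Emily", "Alyssa"},
    "Bob": {"Sophie", "Alyssa"},
    "Alyssa": {"Bob", "Emily"},
}


def followers_of(actor):
    return {name for name, following in FOLLOWS.items() if actor in following}


POST_ID = "https://sophie-home.invalid/objects/2"  # hypothetical identifier

# Sophie's followers-only post: addressed to her followers collection.
post = {"id": POST_ID, "author": "Sophie", "recipients": followers_of("Sophie")}

# Bob's reply: addressed to Sophie and, in this model, to Bob's own followers.
reply = {
    "inReplyTo": POST_ID,
    "author": "Bob",
    "recipients": {"Sophie"} | followers_of("Bob"),
}

# Anyone who sees the reply but was never in the post's audience.
leaked_to = reply["recipients"] - post["recipients"] - {post["author"]}
print(leaked_to)
```

Alyssa ends up in `leaked_to`: she holds a reply whose `inReplyTo` points at a post she was never sent.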
This problem is the big one. It's not technically ActivityPub's fault, either, but a problem in how ActivityPub is typically implemented. But at the same time, it means that followers-only posts can be risky. Mastodon covers up the metadata leakage problem by hiding replies to users you don't follow, but that's all it is, a cover-up of the problem.
The solution to the metadata leakage problem is to have replies be forwarded to the OP's audience. But to do this, we need to rework the way the protocol works a bit. That's where proposals like moving to an OCAP-based variant of ActivityPub come into play. In those variants, doing this is easy. But in what we have now, doing this is difficult.
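As a very rough sketch of the forwarding idea (and only the idea: this says nothing about how an OCAP-based variant would actually authorize it), the reply goes to the OP alone, and the OP's server passes it on to the audience of the original post. The function `route_reply` is hypothetical.

```python
def route_reply(reply_author, op, op_audience):
    """Return everyone who ends up seeing a reply under OP-side forwarding.

    Step 1: the reply is delivered to the original poster only.
    Step 2: the original poster's server forwards it to the audience
            of the original post (minus the replier, who already has it).
    """
    first_hop = {op}
    forwarded = set(op_audience) - {reply_author}
    return first_hop | forwarded


# Sophie's followers-only post was addressed to Bob alone.
seen_by = route_reply(reply_author="Bob", op="Sophie", op_audience={"Bob"})
print(seen_by)
```

Under this routing, Bob's followers play no part in who sees the reply, so Alyssa never receives it.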
Anyway, I hope this post helps to explain how data flows through the network.