"DON" StatusNet Node - Part One: Read Protocols
Apr 8, 2017

While you're reading this, keep in mind that I'm available for hire! If you've got a JavaScript project getting out of hand, or a Golang program that's more "stop" than "go," feel free to get in touch with me. I might be able to help you. You can find my resume here.

Recently there’s been a bit of buzz about a piece of software called Mastodon. There’s a standard called OStatus, which builds atop Atom to provide a federated microblogging community, and Mastodon is an implementation of this standard. Its popularity has exploded recently, and that’s what got me looking into it. I wanted to set up a copy for myself, to experiment with it, but I decided I couldn’t be bothered setting up a whole Rails environment to give it a spin. Much easier to write my own version from scratch, right? Well, no. But it was more fun.

If you’d like to check it out, it’s hosted at don.fknsrs.biz, with source code at fknsrs.biz/p/don. If you search for a user, it’ll auto-subscribe to that user. Feel free to add anyone you like!

OStatus Standard and Protocols

The concepts behind OStatus are really neat - it ties together several existing protocols (Atom, PubSubHubbub, WebFinger, and Salmon, among others), resulting in what seems to be a pretty sensible network design. The protocols it builds on are all quite simple in isolation, and luckily OStatus doesn’t add too much complexity to the mix itself.

Acct URIs

The first thing to learn about is the acct URI scheme. It’s defined in RFC 7565 and used by WebFinger (RFC 7033), and is basically a standardisation of the classic username@host syntax. For example, I have a Twitter account, and as an acct URI, I could refer to it as acct:deoxxa@twitter.com. The significant thing about this is that, according to the WebFinger specification, you can use this URI to look up more information if the service it’s associated with operates an appropriate WebFinger endpoint.

Now, Twitter doesn’t host a WebFinger endpoint, so we can’t use that URI for anything, but I also have a mastodon.network account - and they do provide a WebFinger endpoint. We can call that account acct:deoxxa@mastodon.network. This is the account that I’ll be using as an example.
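Pulling an acct URI apart is about as simple as it sounds. Here's a minimal Python sketch (the function name parse_acct is made up for this example, and it's deliberately lenient about a missing acct: prefix, which is my assumption rather than anything the specs require).

```python
from typing import Tuple


def parse_acct(uri: str) -> Tuple[str, str]:
    """Split "acct:user@host" (or a bare "user@host") into (user, host)."""
    # Assumption: accept the bare form too, since people tend to type it that way.
    if uri.startswith("acct:"):
        uri = uri[len("acct:"):]
    # The host is everything after the last "@".
    user, sep, host = uri.rpartition("@")
    if not sep or not user or not host:
        raise ValueError("not a valid acct URI: %r" % uri)
    # Host names are case-insensitive, so normalise them.
    return user, host.lower()
```

The host half is the only part we need for discovery. The user half just rides along inside the URI.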

To get started, you have to find where the WebFinger endpoint is! There are actually two ways to do this. The first is just to slap /.well-known/webfinger onto the end of the host portion of the acct URI. This is actually standardised, so it should work nearly all the time, and you should probably do this first. However! It’s totally possible for someone to host a WebFinger endpoint at any location, and if you want to be able to handle that, you have to know a teeny bit about “Link-based Resource Descriptor Documents”, or LRDD for short. LRDD gives you a fallback for discovering the endpoint if, for some reason, someone hosts it at a non-standard path.
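The "slap it on the end" approach looks roughly like this in Python. It's a sketch, not DON's actual code: the function name is invented, and I'm assuming HTTPS and a resource query parameter, as in the examples later in this post.

```python
from urllib.parse import urlencode


def default_webfinger_url(acct_uri: str) -> str:
    """Build the well-known WebFinger URL for an acct URI."""
    # The host is whatever follows the last "@" in the acct URI.
    host = acct_uri.rpartition("@")[2]
    # urlencode percent-encodes the ":" and "@" inside the acct URI.
    query = urlencode({"resource": acct_uri})
    return "https://%s/.well-known/webfinger?%s" % (host, query)
```

For acct:deoxxa@mastodon.network this produces a URL on mastodon.network with the acct URI percent-encoded in the query string.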

Web Host Metadata

So, LRDD. There’s a “Web Host Metadata” specification in the form of RFC 6415 that details a way to request metadata about a web host that you can use to find additional information. Short version: request /.well-known/host-meta, via HTTPS, from the host you’d like to know about. This path (and scheme) is part of the protocol, so you can safely hard-code it. If we use mastodon.network as an example, their metadata looks like the following.

<?xml version="1.0"?>
<XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0">
  <Link rel="lrdd" type="application/xrd+xml" template="https://mastodon.network/.well-known/webfinger?resource={uri}"/>
</XRD>

Hooray XML. Get used to it, because literally all these protocols are XML.

Right there, look, it says “webfinger” - that must be it! Well, yes. It is, but you can’t just assume that it’s always going to have the string “webfinger” in the URL. It could technically be anything, and there could be many more Link elements depending on the service. What you have to do is parse the document, then find the Link element where the rel attribute is lrdd. This element will have a template attribute, which is a URI Template (as specified in RFC 6570). In this template, the {uri} bit is where you put the acct URI when you want to make a request.
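Here's how that lookup might go in Python, using only the standard library. Treat it as a sketch: the function names are mine, the sample document is a copy of the mastodon.network one, and the template handling only covers the plain {uri} case rather than all of RFC 6570.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# ElementTree spells namespaced tags as "{namespace}name".
XRD_NS = "{http://docs.oasis-open.org/ns/xri/xrd-1.0}"

HOST_META = """<?xml version="1.0"?>
<XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0">
  <Link rel="lrdd" type="application/xrd+xml" template="https://mastodon.network/.well-known/webfinger?resource={uri}"/>
</XRD>"""


def lrdd_template(host_meta_xml: str) -> str:
    """Return the template of the first Link whose rel is "lrdd"."""
    root = ET.fromstring(host_meta_xml)
    for link in root.iter(XRD_NS + "Link"):
        if link.get("rel") == "lrdd" and link.get("template"):
            return link.get("template")
    raise LookupError("host-meta has no lrdd link with a template")


def expand(template: str, acct_uri: str) -> str:
    """Substitute {uri}, percent-encoding everything but unreserved characters."""
    return template.replace("{uri}", quote(acct_uri, safe=""))
```

Calling expand(lrdd_template(HOST_META), "acct:deoxxa@mastodon.network") gives back the WebFinger URL for my account, with the acct URI percent-encoded.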


So, tying this all together, we end up with a URL of https://mastodon.network/.well-known/webfinger?resource=acct:deoxxa@mastodon.network. You probably won’t be able to just open that in your browser - your browser will request HTML, and the server won’t supply it, so you’ll get a blank page. Use cURL or something instead if you’d like to give it a try! It should look something like the following.

<?xml version="1.0"?>
<XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0">
  <Link rel="http://webfinger.net/rel/profile-page" type="text/html" href="https://mastodon.network/@deoxxa"/>
  <Link rel="http://schemas.google.com/g/2010#updates-from" type="application/atom+xml" href="https://mastodon.network/users/deoxxa.atom"/>
  <Link rel="salmon" href="https://mastodon.network/api/salmon/4714"/>
  <Link rel="magic-public-key" href="..."/>
  <Link rel="http://ostatus.org/schema/1.0/subscribe" template="https://mastodon.network/authorize_follow?acct={uri}"/>
</XRD>

This is an XRD document - an “Extensible Resource Descriptor”. Apparently that makes me a resource. But don’t laugh too soon, because if you have a Mastodon account, you are a resource too. So we’re in this together. Probably. Huddle tight for warmth.

So, now that we’ve got my user metadata, we can look at these different links. What are they? There are so many of them! Computers are awesome. The one we’re interested in at the moment is my Atom feed - it’ll always be located in a Link element with a rel of http://schemas.google.com/g/2010#updates-from. This doesn’t mean my updates are hosted on Google or anything - it’s just that Google initially defined that rel value for one of their products, then a few clients started using it, then everyone else started using it for compatibility’s sake. You can safely treat it as an opaque value.
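Picking a link out of the account document by its rel could look something like this. It's a Python sketch with invented names, and the sample XML is a trimmed copy of the document above.

```python
import xml.etree.ElementTree as ET

XRD_NS = "{http://docs.oasis-open.org/ns/xri/xrd-1.0}"
UPDATES_FROM = "http://schemas.google.com/g/2010#updates-from"

ACCOUNT_XRD = """<?xml version="1.0"?>
<XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0">
  <Link rel="http://webfinger.net/rel/profile-page" type="text/html" href="https://mastodon.network/@deoxxa"/>
  <Link rel="http://schemas.google.com/g/2010#updates-from" type="application/atom+xml" href="https://mastodon.network/users/deoxxa.atom"/>
  <Link rel="salmon" href="https://mastodon.network/api/salmon/4714"/>
</XRD>"""


def links_by_rel(xrd_xml: str) -> dict:
    """Map each rel to its href (or template), keeping the first one seen."""
    links = {}
    for link in ET.fromstring(xrd_xml).iter(XRD_NS + "Link"):
        rel = link.get("rel")
        target = link.get("href") or link.get("template")
        if rel and target:
            links.setdefault(rel, target)
    return links
```

Note that the rel is compared as an exact string. Nothing ever fetches that Google URL.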


According to the document up there, my update feed URL is https://mastodon.network/users/deoxxa.atom. Go take a look at it if you want! It should display fine in your browser, but it’ll be a bit ugly. I’ve copied it below with some highlighting.

<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:activity="http://activitystrea.ms/spec/1.0/" xmlns:poco="http://portablecontacts.net/spec/1.0" xmlns:media="http://purl.org/syndication/atommedia" xmlns:ostatus="http://ostatus.org/schema/1.0" xmlns:mastodon="http://mastodon.social/schema/1.0">
  <author>
    <link rel="alternate" type="text/html" href="https://mastodon.network/@deoxxa"/>
    <link rel="avatar" type="image/jpeg" media:width="120" media:height="120" href="https://mastodon.network/system/accounts/avatars/000/004/714/original/f5ae14273acf55ef.jpg?1491480661"/>
    <link rel="header" type="" media:width="700" media:height="335" href="https://mastodon.network/headers/original/missing.png"/>
  </author>
  <link rel="alternate" type="text/html" href="https://mastodon.network/@deoxxa"/>
  <link rel="self" type="application/atom+xml" href="https://mastodon.network/users/deoxxa.atom"/>
  <link rel="hub" href="https://mastodon.network/api/push"/>
  <link rel="salmon" href="https://mastodon.network/api/salmon/4714"/>
  <entry>
    <title>In the beginning...</title>
    <content type="html">&lt;p&gt;In the beginning...&lt;/p&gt;</content>
    <link rel="self" type="application/atom+xml" href="https://mastodon.network/users/deoxxa/updates/2810.atom"/>
    <link rel="alternate" type="text/html" href="https://mastodon.network/users/deoxxa/updates/2810"/>
    <link rel="mentioned" href="http://activityschema.org/collection/public" ostatus:object-type="http://activitystrea.ms/schema/1.0/collection"/>
  </entry>
</feed>

Arrrrggghhhh more XML!

Alright. There’s a lot there. Let’s start from the top. This is a plain old Atom feed - the kind you’d shove in your RSS reader, if it was 2009. I actually kind of miss 2009. First up we can see that the feed has a logo - that’s my avatar. That same image is included later on as an avatar link in the author section. Right at the bottom, there’s an entry element. Normally there’d be several of these, but I’ve only made one post on that account. It’s a good thing too, since that’s already far too much XML for one day.

The interesting part for us right now is actually in the middle - the link elements. You can see one where rel is set to hub - this is a PubSubHubbub server that we can use to subscribe to updates! If we ask it politely, we can have it notify us of every new post that I make. Just what you always wanted - a constant stream of unimportant garbage.
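Finding the hub is another "look for the right rel" job. A rough Python sketch follows; the function name is made up, and the sample feed is cut down to just the links we care about.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

FEED = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="self" type="application/atom+xml" href="https://mastodon.network/users/deoxxa.atom"/>
  <link rel="hub" href="https://mastodon.network/api/push"/>
  <entry>
    <link rel="self" type="application/atom+xml" href="https://mastodon.network/users/deoxxa/updates/2810.atom"/>
  </entry>
</feed>"""


def feed_links(feed_xml: str, rel: str) -> list:
    """Return the hrefs of feed-level link elements with the given rel."""
    root = ET.fromstring(feed_xml)
    # findall() with a plain tag only matches direct children of <feed>,
    # so links nested inside <entry> or <author> are skipped.
    return [
        link.get("href")
        for link in root.findall(ATOM_NS + "link")
        if link.get("rel") == rel and link.get("href")
    ]
```

It returns a list because a feed is allowed to advertise more than one hub. The self link is worth grabbing at the same time, since that's the topic URL you'll subscribe to.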


Superfeedr have an excellent article that details exactly how to implement a PubSubHubbub consumer. I’m not going to go into too much detail here.

Firstly, we have to understand the basics of PubSubHubbub. It’s made up of three parts - producers, hubs, and consumers (the spec calls producers “publishers” and consumers “subscribers”). Producers push events to hubs, consumers receive events from hubs, and hubs act as brokers between the two, forming a fan-out mechanism. Hubs also do some limited redelivery if consumers are unavailable when an event is pushed. This means consumers don’t have to poll for updates, and producers don’t have to implement all that messy retry logic on their own.

A hub is located at a particular URL. A consumer must make a request to that URL to subscribe to a particular “topic”. For our purposes, the topic is just the Atom feed URL. The consumer must also tell the hub where to send events to, when they come in from the producer. This forms the basic handshake.

As we saw before, we can find the hub URL by looking at the link elements in the Atom feed. If we make the right request to this URL, we can ask the hub to notify us of updates as they happen. That request has the following parameters.

  • hub.callback: the URL you’d like the hub to send events to
  • hub.mode: this will always be subscribe or unsubscribe
  • hub.topic: this is the feed URL
  • hub.verify: this will be sync or async - it determines whether the hub verifies your subscription in-band or out-of-band. This is an advisory parameter. The hub is not obliged to follow it.
  • hub.lease_seconds: this is how long you’d like to subscribe for. This is also advisory. The hub may have a maximum (or minimum) lease time.
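The parameters above go to the hub as a form-encoded POST. Here's a Python sketch of building and sending one. The function names and the don.example callback URL in the usage note are placeholders of mine, and the defaults (async verification, no lease unless you ask for one) are just assumptions for the example.

```python
import urllib.request
from urllib.parse import urlencode


def subscription_form(mode, topic, callback, lease_seconds=None, verify="async"):
    """Form-encode a subscribe/unsubscribe request body."""
    if mode not in ("subscribe", "unsubscribe"):
        raise ValueError("hub.mode must be subscribe or unsubscribe")
    fields = [
        ("hub.callback", callback),
        ("hub.mode", mode),
        ("hub.topic", topic),
        ("hub.verify", verify),
    ]
    # The lease is advisory, so only send it when the caller cares.
    if lease_seconds is not None:
        fields.append(("hub.lease_seconds", str(lease_seconds)))
    return urlencode(fields)


def send_subscription(hub_url, body):
    """POST the form to the hub; urlopen raises HTTPError on a refusal."""
    request = urllib.request.Request(
        hub_url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Usage would be something like send_subscription("https://mastodon.network/api/push", subscription_form("subscribe", "https://mastodon.network/users/deoxxa.atom", "https://don.example/pubsub/abc123", lease_seconds=86400)).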

Once you make this request, the hub will make a request to your callback URL to verify your intent (straight away for sync verification, possibly a little later for async). That request carries a hub.challenge parameter, whose value you have to echo back in the response body, to confirm to the hub that you did indeed intend to receive updates at that URL. This helps to prevent invalid (either malicious or accidental) subscriptions from being carried out. If you respond correctly to this request, the hub will make your subscription active, and will deliver events to you as they come in, until the subscription expires (as specified in the hub.lease_seconds parameter).
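The verification step can be sketched as a pure function that takes the query string of the hub's request and decides what to send back. This is Python with invented names, not DON's handler. The parameter the hub expects you to echo is hub.challenge, and answering 404 for anything unexpected follows my reading of the spec.

```python
from urllib.parse import parse_qs


def verify_subscription(query_string, expected_topics):
    """Return (status, body) for a hub's verification request."""
    params = parse_qs(query_string)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    # Only confirm changes we actually asked for.
    if mode in ("subscribe", "unsubscribe") and topic in expected_topics and challenge:
        # Echo the challenge back verbatim as the response body.
        return 200, challenge
    # Anything else is refused.
    return 404, ""
```

Wire that into whatever HTTP server you like: pass it the raw query string and the set of topics with pending requests, then write out the status and body it returns.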

In DON, I generate a unique identifier for each subscription, and that forms part of the callback URL that I send to the hub. This way I can correlate incoming messages with particular subscriptions and feed URLs, without having to rely on the metadata from the hub.
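The idea is simple enough to sketch in a few lines of Python. This isn't DON's code (the class name, the in-memory dict, and the URL layout are all stand-ins), but it shows the correlation trick.

```python
import secrets
from typing import Optional


class Subscriptions:
    """Tracks which unguessable callback ID belongs to which feed URL."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")
        self._topics = {}

    def create(self, topic: str) -> str:
        """Register a topic and return the callback URL to give the hub."""
        # token_urlsafe only uses URL-safe characters, so no escaping is needed.
        sub_id = secrets.token_urlsafe(16)
        self._topics[sub_id] = topic
        return "%s/%s" % (self.base_url, sub_id)

    def topic_for(self, callback_path: str) -> Optional[str]:
        """Look up the topic from the path an incoming request arrived on."""
        sub_id = callback_path.rstrip("/").rpartition("/")[2]
        return self._topics.get(sub_id)
```

A side benefit of random IDs is that nobody can push fake events at you without first guessing a callback URL.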

Feed Updates

With OStatus, post updates come in the form of Atom feed documents. These documents (usually!) contain author information, and exactly one entry element. You can use the author information to update any locally cached data you have, such as the user’s avatar, display name, etc. The entry will be one of several kinds of activity - it could be a post, a reply, a “share” (kind of like a retweet), or a deletion.
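Telling those kinds of activity apart mostly comes down to the activity:verb element in each entry. Here's a hedged Python sketch. The verb URIs are the Activity Streams ones, but treating a post that carries a thr:in-reply-to element as a reply, and treating a missing verb as a plain post, are my assumptions about how feeds tend to look rather than anything quoted above. The sample feed is invented.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ACTIVITY = "{http://activitystrea.ms/spec/1.0/}"
THR = "{http://purl.org/syndication/thread/1.0}"

VERBS = {
    "http://activitystrea.ms/schema/1.0/post": "post",
    "http://activitystrea.ms/schema/1.0/share": "share",
    "http://activitystrea.ms/schema/1.0/delete": "delete",
}

SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:activity="http://activitystrea.ms/spec/1.0/">
  <entry>
    <activity:verb>http://activitystrea.ms/schema/1.0/post</activity:verb>
    <thr:in-reply-to ref="tag:example.invalid,2017:1" href="https://example.invalid/1"/>
  </entry>
</feed>"""


def classify(entry) -> str:
    """Return "post", "reply", "share", "delete", or "unknown" for an entry."""
    verb = (entry.findtext(ACTIVITY + "verb") or "").strip()
    # Assumption: an entry with no verb at all is an ordinary post.
    kind = VERBS.get(verb, "unknown") if verb else "post"
    if kind == "post" and entry.find(THR + "in-reply-to") is not None:
        return "reply"
    return kind


def entry_kinds(feed_xml: str) -> list:
    """Classify every entry in a feed document."""
    return [classify(entry) for entry in ET.fromstring(feed_xml).iter(ATOM + "entry")]
```

Anything with a verb you don't recognise comes back as "unknown", which is a good cue to log it and move on rather than guess.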

In DON, I keep a history of posts for every user I’ve seen in the system. This may change in the future.

All Together

Let’s recap. We started from just an account - deoxxa@mastodon.network. We used the host portion of that account to look up some metadata about the host. That host metadata told us how to find information about an account (my account!) on the service. Then that account metadata told us how to find my updates. Then when we finally fetched my post feed, a link in there told us how to subscribe for notifications. Links on top of links on top of links. Quite neat though, overall. Kind of like clicking through websites.
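That whole chain fits in one small function if you squint. The Python sketch below takes a fetch callable so it can run without touching the network. FAKE_PAGES is a stand-in for real HTTP responses (trimmed copies of the documents from this post), and all the names are made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

XRD = "{http://docs.oasis-open.org/ns/xri/xrd-1.0}"
ATOM = "{http://www.w3.org/2005/Atom}"
UPDATES_FROM = "http://schemas.google.com/g/2010#updates-from"


def xrd_link(xml_text, rel, attr):
    """Return attribute attr of the first XRD Link with the given rel."""
    for link in ET.fromstring(xml_text).iter(XRD + "Link"):
        if link.get("rel") == rel and link.get(attr):
            return link.get(attr)
    raise LookupError("no %s link with a %s" % (rel, attr))


def discover(acct_uri, fetch):
    """Follow acct URI -> host-meta -> WebFinger -> Atom feed -> hub."""
    host = acct_uri.rpartition("@")[2]
    host_meta = fetch("https://%s/.well-known/host-meta" % host)
    template = xrd_link(host_meta, "lrdd", "template")
    account = fetch(template.replace("{uri}", quote(acct_uri, safe="")))
    feed_url = xrd_link(account, UPDATES_FROM, "href")
    feed = ET.fromstring(fetch(feed_url))
    for link in feed.findall(ATOM + "link"):
        if link.get("rel") == "hub" and link.get("href"):
            return feed_url, link.get("href")
    raise LookupError("feed does not advertise a hub")


# Stand-in for the network: URL -> response body.
FAKE_PAGES = {
    "https://mastodon.network/.well-known/host-meta":
        '<XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0">'
        '<Link rel="lrdd" template="https://mastodon.network/.well-known/webfinger?resource={uri}"/>'
        '</XRD>',
    "https://mastodon.network/.well-known/webfinger?resource=acct%3Adeoxxa%40mastodon.network":
        '<XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0">'
        '<Link rel="http://schemas.google.com/g/2010#updates-from" href="https://mastodon.network/users/deoxxa.atom"/>'
        '</XRD>',
    "https://mastodon.network/users/deoxxa.atom":
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        '<link rel="hub" href="https://mastodon.network/api/push"/>'
        '</feed>',
}
```

Calling discover("acct:deoxxa@mastodon.network", FAKE_PAGES.__getitem__) walks all three documents and hands back the feed URL and the hub URL, which is everything you need to subscribe.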

That just about wraps up the read side of these protocols. There is quite a bit of semantic behaviour I’ve glossed over for the sake of simplicity, but I’ll go through that when I describe the application logic itself.
