
#1: Nicholas Tollervey - Drogulus

15 Jul 2013

Nicholas Tollervey talks about the drogulus, his programmable peer-to-peer data store. In the process he describes how a Distributed Hash Table (DHT) works and what motivated him to start the project.



Kindly transcribed by: mhr

Francis Irving: Who’s from the Drogulus project, and he’s a technical Python person, and a trained musician, and he also used to be a teacher, and he’s making the Drogulus, and he’s going to tell us all about it. Hello!

Nicholas Tollervey: Hello!

Francis: So what is the Drogulus?

Nicholas: Well, basically the Drogulus is a programmable peer-to-peer data store that I’ve been working on during my commute to London, my 14-minute commute in the morning. Basically, what it is, it’s a bit of an experiment in peer-to-peer decentralization. It’s sort of a place for me to experiment and explore some ideas that have been knocking around in my head for quite a while. So the Drogulus itself is a global, federated, decentralized, open data store that can be programmed by anyone. And the identity and provenance of the people using the Drogulus is ensured by cryptographically signing digital assets; we use public-key cryptography for that. So being federated, the system consists of many independent entities, and being decentralized, there’s no one entity more important than any of the others. It means that users are free from choke points of authority that may be used to control access or usage of the system. Being open. . .

Francis: So, those are quite, kind of big words. From the user’s point of view, how will that look in the end?

Nicholas: How will it look [laugh]?

Francis: How will it work?

Nicholas: How will it work?

Francis: That’s the kind of. . .

Nicholas: Well.

Francis: In terms of the interface that someone using it will experience?

Nicholas: I’ve just not gotten to that part yet.

Francis: Yep.

Nicholas: It’s pretty low-level, I’ve got some ideas of how it might work. But, you know, the important thing for me at the moment is to get the basic technology right and working, and then build on top of that.

Francis: And that technology uses a relatively new algorithm, doesn’t it? The distributed hash table algorithm?

Nicholas: Yeah, okay, so yes. The distributed hash table. So, I’ll explain what that is. At a totally abstract and nontechnical level, I’ll explain how it works. This is the story of the distributed hash table, as it were. It’s sort of a peer-to-peer dictionary, so there’s a unique key in the dictionary that identifies some value. So, in the case of the traditional dictionary, the key is the word and then the associated value is its definition. Like ‘aardvark’ is an animal with a long snout that always appears at the beginning of most dictionaries.

Francis: [Laugh]

Nicholas: Being a data store, though, the distributed hash table allows users to create, retrieve, update, and delete their own keys and associated digital values. So, the hash table is distributed because it’s split up into the equivalent of, sort of, different volumes of a traditional dictionary, where each volume relates to a particular area in the whole dictionary, as it were, and each person who uses the distributed hash table has a copy of just one volume from the distributed hash table. But each volume can be distributed to many, many different users.

And so what users do is they keep track of their friends on the network to know which friend holds what volume, so that when they want to interact with the distributed hash table, they know whom to contact in order to retrieve a value or effect changes to the distributed hash table. And if they don’t know the person with the correct volume for the thing that they’re trying to interact with, then they play sort of a ‘six degrees of separation’ game with their friends until the person with the right volume is found. And the other important thing to mention with distributed hash tables is that they share an interesting property with BitTorrent, which is that the more popular an entry in the distributed hash table becomes, the more widespread it becomes in the dictionary itself, which means the performance is improved since popular items are actually easier to find. That’s kind of it, at a high level.
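
The ‘six degrees of separation’ lookup described above can be sketched in a few lines of Python. This is a simplified illustration, not the Drogulus code: the `Node` class, the `lookup` function and the hand-picked four-bit IDs are all assumptions made for the example. It measures closeness with XOR, as the Kademlia paper does, but follows only the single closest friend at each hop, whereas a real implementation queries several peers in parallel.

```python
class Node:
    """One participant holding a single 'volume' of the dictionary."""

    def __init__(self, name, node_id):
        self.name = name
        self.node_id = node_id
        self.friends = []  # the other nodes this node knows about
        self.volume = {}   # key ID -> value held by this node


def distance(a, b):
    """XOR distance between two IDs: smaller means closer."""
    return a ^ b


def lookup(start, key_id):
    """Hop from friend to friend until no friend is closer to the key."""
    current, hops = start, [start.name]
    while current.friends:
        best = min(current.friends, key=lambda n: distance(n.node_id, key_id))
        if distance(best.node_id, key_id) >= distance(current.node_id, key_id):
            break  # nobody we know is closer, so stop here
        current = best
        hops.append(current.name)
    return current, hops


# Four nodes in a chain of acquaintances (IDs chosen by hand for clarity).
a, b, c, d = Node("a", 1), Node("b", 4), Node("c", 12), Node("d", 14)
a.friends, b.friends, c.friends, d.friends = [b], [a, c], [b, d], [c]

holder, hops = lookup(a, 15)                      # find who should hold key 15
holder.volume[15] = "an animal with a long snout"  # store the value there
found, _ = lookup(b, 15)                          # a different node retrieves it
print(hops, found.volume[15])
```

With these four nodes, a search that starts at `a` for key 15 hops through `b` and `c` before arriving at `d`, the node whose ID is closest to the key.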

Francis: Hmm. So it’s like a totally different way of storing things. So rather than store it on a physical hard drive on my computer, they’re actually spread out over the Internet? Do I not even know where they are?

Nicholas: Absolutely [laugh].

Francis: [Laugh]

Nicholas: And they’re replicated because it’s a very nebulous thing, this distributed hash table, so there are peers joining and leaving the distributed hash table all the time. So part of the algorithm is that values are replicated through the hash table so that, you know, you’d have to get rid of a huge number of nodes to ensure that you got rid of a value.

Francis: Yeah. So I was about to say, suppose some computers happen to store a document that I’ve saved in the hash table, like a document that’s important to me. Would there be multiple copies of it, on different people’s machines?

Nicholas: [Nod] That is correct, yes.

Francis: And then they’d be encrypted, presumably.

Nicholas: The Drogulus only uses cryptography to sign digital assets.

Francis: Okay.

Nicholas: If you encrypt your data because you want to make it private, then that’s up to you. But that’s going to be dealt with at a higher level, obviously. But I’m working at a very low-level, here. To get the basic functionality right.
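
Signing without encrypting can be illustrated with a toy example. The sketch below is only an assumption-laden illustration of the hash-then-sign idea using textbook RSA in plain Python. The key is deliberately tiny, there is no padding, and none of the names come from the Drogulus code; a real system would rely on a vetted cryptography library.

```python
import hashlib

# Toy key pair built from two known Mersenne primes. Far too small and
# unpadded for real use; it only illustrates the sign/verify relationship.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(value: bytes) -> int:
    """Hash the value and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(value).digest(), "big") % N


def sign(value: bytes) -> int:
    """Only the holder of the private exponent D can produce this number."""
    return pow(digest(value), D, N)


def verify(value: bytes, signature: int) -> bool:
    """Anyone holding the public pair (N, E) can check the signature."""
    return pow(signature, E, N) == digest(value)


document = b"a value stored in the hash table"
signature = sign(document)
print(verify(document, signature))          # the untouched value checks out
print(verify(document + b"!", signature))   # an altered value does not
```

The value itself stays readable by anyone; the signature only shows who published it and that it has not been altered since.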

Francis: Yep. So I put my document in lots of different places, it’s spread automatically by the distributed hash table on the Internet, and then if several of those computers then disappear for some reason, or the person stops running software, or deletes all of the content in that node, does it then detect that and then replicate it, automatically, to other nodes?

Nicholas: Okay, yes, so the algorithm that I use is called Kademlia, and there’s a rather excellent paper from about ten years ago that explains this in great detail, but every X number of minutes, a node will try to replicate its values to close-by peers. So it will try to spread things out like that. The other thing is that the way Kademlia works is that it tries to use the best-performing peers in the distributed hash table. So it’ll use those that have demonstrated that they’ve had lots of uptime, if you see what I mean, and try to use those more than those that are a bit more transient, as it were.
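
The periodic replication described here might look roughly like the following. It is a minimal sketch under stated assumptions: the function names, the replication factor `k=2` and the small integer IDs are invented for the example, and the scheduling of the ‘every X minutes’ timer is left out entirely.

```python
def closest_peers(key_id, peer_ids, k):
    """Return the k peer IDs nearest to the key by XOR distance."""
    return sorted(peer_ids, key=lambda peer_id: peer_id ^ key_id)[:k]


def republish(local_store, peers, k=2):
    """Copy every locally held value to the k known peers closest to its key."""
    for key_id, value in local_store.items():
        for peer_id in closest_peers(key_id, peers, k):
            peers[peer_id][key_id] = value


local_store = {15: "my document"}
peers = {1: {}, 4: {}, 12: {}, 14: {}}  # peer ID -> that peer's own store

republish(local_store, peers)
first_round = sorted(pid for pid, store in peers.items() if 15 in store)

del peers[14]                   # one of the holders leaves the network
republish(local_store, peers)   # the next periodic run fills the gap
second_round = sorted(pid for pid, store in peers.items() if 15 in store)
print(first_round, second_round)
```

After the first run the value lives on peers 12 and 14. Once peer 14 disappears, the next run copies it to peer 4, so two copies survive.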

Francis: So basically, is there a kind of peer-rating system almost like eBay’s rating system, where the nodes rank each other?

Nicholas: Well, there’s something called the routing table, which is basically how — I just told you about the distributed hash table — that’s where the node on the network keeps track of its ‘friends’, as it were; ‘friends’ elsewhere on the distributed hash table. That’s actually ordered so that the most performing, the best performing nodes are ranked higher in the routing table than other nodes. So, yeah.
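
One way a routing table can favour long-lived peers is the bucket rule from the Kademlia paper: a peer that has been known for a long time and still responds is never displaced by a newcomer. The sketch below is a simplified, hypothetical version of that rule. The `Bucket` class, its capacity and the `ping` callback are assumptions for illustration, and the Drogulus routing table may be organised differently.

```python
from collections import OrderedDict


class Bucket:
    """A fixed-size list of known peers, least recently seen first."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.contacts = OrderedDict()  # peer name -> True, oldest contact first

    def seen(self, peer, ping=lambda other: True):
        """Record contact with a peer; return True if it ends up in the bucket."""
        if peer in self.contacts:
            self.contacts.move_to_end(peer)  # now the most recently seen
            return True
        if len(self.contacts) < self.capacity:
            self.contacts[peer] = True
            return True
        oldest = next(iter(self.contacts))
        if ping(oldest):
            # The long-lived peer still answers, so it keeps its place
            # and the newcomer is dropped.
            self.contacts.move_to_end(oldest)
            return False
        # The old peer has gone away: replace it with the newcomer.
        del self.contacts[oldest]
        self.contacts[peer] = True
        return True


bucket = Bucket(capacity=2)
bucket.seen("alice")
bucket.seen("bob")
kept = bucket.seen("carol")                              # alice answers the ping
swapped = bucket.seen("dave", ping=lambda other: False)  # bob does not answer
print(kept, swapped, list(bucket.contacts))
```

Here `carol` is turned away because `alice` is still alive, while `dave` takes the place of `bob`, who has stopped responding.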

Francis: So, the Drogulus, then, I presume it’s not the first implementation of DHT. What motivated you to make the Drogulus, and what’s interesting about it?

Nicholas: Okay, so there are lots of different distributed hash tables and obviously there was an implementation behind the original paper. The most famous implementation of Kademlia is probably the way that BitTorrent uses it for tracking, for replacing trackers.

Francis: That’s in the magnet links, specifically.

Nicholas: Yep, that’s the magnet links, right there. So my motivation for creating the Drogulus is a bit different to BitTorrent and things. Basically I have a growing unease with the current state of the Web, and this could be summarized in three ways. The first one is that on the Web, users are no longer in control of their data and identity online. They’re locked into websites that act as walled gardens of data, each requiring different sets of credentials, et cetera, et cetera, et cetera. The second problem, or unease, that I have is that programmers have to build on the Web using complicated and quirky technology that’s defined in a top-down manner by committees and things. You know, you only have to think about OAuth and calls and JavaScript Date objects, and things like that to realize that it’s a bit hacky, and there’s no way for developers to maybe get around that. They have to wait for browser developers to implement the latest version of JavaScript, or implement the latest HTML5 things, and they have no say in, you know, the fact that DRM is going into the new standard, and things like that. So, it’s top-down rather than bottom-up.

And the most important problem I have is that there are many inadvertent points of control and lock-in and authority built into the Web, by virtue of the way that it’s built/architected. Each of these problems is a potential mechanism for disempowerment, and spying, and exploitation, and things like that, which is obviously relevant given the recent shenanigans with Snowden, and the Pirate Bay being censored, and of course everybody knows about the Great Firewall of China. You know, I think that the beautifully simple and open hypertext system that Tim Berners-Lee created has grown into a mechanism of centralization and complication that’s beholden to dodgy commercial, political, and legal manipulation. And more worryingly, our data’s analyzed by companies and it’s sold in the form of targeted advertising, and governments get access to it without our consent.

So to get back to the Drogulus, rather than slagging off the Web, which I believe is a great thing, there are many aspects of today’s Web that are contrary to the concept that’s very important to me, and that’s autonomy. So by autonomy, I mean someone who is self-directing, they’re free to act of their own accord, and they lack imposition from others. And autonomy also suggests there’s some sort of intelligence and reason and awareness, enough to be able to enjoy and make use of this freedom that you have. And by having this intelligence it entails decision-making, so that people become accountable for their actions. And lastly, autonomy is sort of the opposite of such undesirable states as tyranny and slavery and nasty things like that. So I asked myself, you know, how would software designed to grow autonomy function, and I started to hack, and we get the Drogulus.

Francis: Hmm. Interesting, so it’s like, you think the original Web was or felt free, and that is kind of recreating it in some ways, or what it was originally meant to be?

Nicholas: Okay, I’m old enough to remember using the Web when it was just text [laugh]. And when I was at university back in 1993, using the Mosaic Web browser, and I remember actually staying up until the early hours in the computing lab just browsing the Net, and realizing that NASA is on the Web, and look, there’s all this information over here, and there’s this guy writing stuff over here, and you know, all this amazing stuff. And at the moment it just feels like — well, I was thinking about it just yesterday, which websites do I visit most? Well there’s Google for search, there’s the BBC News website, the Guardian, Hacker News for all my sort of technology stuff, I’ve got various RSS feeds, through which I used to use Google Reader, and that’s got shut down, so there’s only really a handful of websites that I might use and gone is this sort of proliferation of everyone had a different blog, and people had control because they were in control of their server, and so on and so forth. So yes, in a way, it is a little about getting back to that decentralized nature that was the beginning of the Web.

Francis: So there are quite a few of these projects that are thinking about how to redecentralize the Internet in different ways, so what do you think the implications are? What might happen, and what should we watch out for, both good and bad, when quite a few people start to use things like the Drogulus?

Nicholas: Okay, I mean the Drogulus isn’t finished, so you can’t use it yet, although it’s getting close to a usable state.

Francis: It works a bit, for programmers.

Nicholas: It works a bit for programmers. You can get the test suite to pass [laugh]. So basically, from my point of view, decentralization means a loss of power — or a movement of power — from those that control and use the centralized systems that we currently have to those who participate in and build the decentralized systems that are being built. And in a way, it’s sort of a way of answering three questions — What is the best way to organize diverse entities that coexist together in large dynamic groups, like in a society or in a network? How are these arrangements created? And who is responsible for making these things work? These are questions that are surprisingly important for political philosophy and the software engineer. It shows that, you know, there’s quite a bit of overlap between these two subjects, when you start to think about it.

So, questions like ‘What is the best way to organize diverse entities?’ can be answered in a political way by saying, well, use this form of government, and not that form of government, and so on and so forth. So peer-to-peer answers these questions by saying the most effective way to organize the most diverse, dynamic group of things (participants) is with a peer-to-peer architecture. Which can be for a technical reason, as with BitTorrent, where it’s just more efficient to do what you want to do in that particular way; or it might be for political reasons, as with Bitcoin, because you don’t want a central bank controlling a currency. And the means of creating such a network is an open protocol that describes the expected behavior of the participants, including checks and balances to ensure that participants are behaving themselves on the network, and it’s therefore the participants’ responsibility to correctly implement the protocol in order to make the system work correctly. So, I guess the redicent, redecitrent, [rede]centralization (I’ll try to say that properly). . .

Francis: [Laugh] It’s a bit long, isn’t it?

Nicholas: Yeah, it is. I think it’s going to become a significant force because people have seen the pendulum swing from a decentralized web to a very centralized web, and the pendulum’s swinging back. There’s a reaction to this centralization, and it’ll become a significant force for change, and that it’s sort of our responsibility as people who are participating in creating these peer-to-peer systems to make sure that what we do provides a net improvement on the way things are at the moment, and promotes autonomy, this thing that I think is valuable. Rather than facilitate disempowerment, and spying, and other nefarious sort of activities. And yeah, that’s about it.

Francis: Great! Thanks Nicholas, I’ve got to wrap up right now [laugh]. I feel like I have a bit more responsibility as a programmer.

Nicholas: Well, yes. Programming is a political activity because we’re creating the rules of the digital world, as programmers, and if you do it in an unthinking way, without considering the ethical implications of what you’re writing, then in some sense, you’re not being responsible. And this is something that’s important to me.

Francis: So if people want to contribute code to the Drogulus, where can they find it [laugh]?

Nicholas: It’s on that centralized code repository called GitHub [laugh].

Francis: [Laugh]

Nicholas: It’ll be github.com/ntoll/drogulus. So, there’s a website at drogul.us as well, so.

Francis: Fantastic! Thank you very much, Nicholas.

Nicholas: That’s alright, my pleasure.

Francis: drogul.us? I’ll put that on the. . .

Nicholas: drogul.us. Yes, exactly, that has all the details on it.

Francis: [Laugh] Okay, fantastic! Good talking to you, and have a good rest of this summery day.

Nicholas: I will! I have a hammock waiting for me!

Francis: [Laugh]

Nicholas: [Laugh]