Tony Arcieri talks about Cryptosphere, an open source P2P web application platform. In the process he describes how this can give more privacy and governance control to users of all kinds of network applications. 07 Aug 2013
Kindly transcribed by: David Hansen
Irina Bolychevsky: Hello everybody, and today we are talking to Tony Arcieri, the creator of Cryptosphere. And Cryptosphere is an open-source peer-to-peer web application platform which makes it easy to build decentralized privacy-preserving software so that users can keep control of their own content. Hi Tony!
Tony Arcieri: Hi!
Irina: Great to have you here! Why don’t you tell us a little more about yourself and what Cryptosphere is.
Tony: So I’m Tony Arcieri. I’m probably best known for a framework for concurrent and distributed computing I created called Celluloid, which is based on this thing called the actor model. So I basically created the [audio skips] because I was writing peer-to-peer software and I thought it was really hard. So in the past there have been some of these other frameworks, like Twisted for Python, EventMachine for Ruby, and probably the most famous of that sort was called node.js.
And I really struggled with them, trying to write this other peer-to-peer software about seven years ago; that software is called DistribuStream. So Cryptosphere is kind of my second attempt at building a peer-to-peer platform now that I’ve hopefully solved the problem of like, what framework do you even build the peer-to-peer software on top of.
Irina: Right, going through the process once. And what’s motivated you to build a peer-to-peer program?
Tony: So, I’m definitely a big fan of privacy, and I think the way the web works right now, basically if you want to use any web application whatsoever you have to hand all your data over to them; you know, everything about you. So you wind up with your Googles, your Facebooks. You’re giving them things that you may or may not want to be seen by the entire world, but the interest of say a Facebook is to make all that information as public as possible. So I really want to solve that problem generally. So I’m not trying to make a Diaspora competitor or anything like that, right? I want to build the framework that apps like Diaspora should be built on top of.
Irina: Right, so you’re more focusing on the underlying foundational structure upon which, ideally, the other people would build apps where the communication would be peer-to-peer, or is that. . . ?
Tony: Yeah, definitely. So I’m trying to build, basically, software that should make it easy for anybody to build a Diaspora type of software. And there are a bunch of other people working on these sort of apps, even just like web-based encryption apps. Cryptocat, the chat software [audio skips] and I think what the creators of this type of software are learning is that building secure software within the whole web environment right now is very, very hard. So I’m trying to not only create my own solutions, but put together existing solutions to a comprehensive package to where it should be easy for people to start writing these apps where everything is encrypted everywhere, and all the encryption happens on your local computer before anything is ever sent to the network.
Irina: Ok, so does that include metadata or is it just the content that’s going to get encrypted?
Tony: So there are a bunch of places this does leak certain types of information. So one of the other things that Cryptosphere does is store data. And it’s storing data on a peer-to-peer network where the peers are unreliable, basically, like you can’t really trust anybody in a peer-to-peer network. So one of the things it has to do is make it easy, if peers drop out of the network, to repair the [audio skips]. I mean it adds redundancy but, you know, if some of the peers go away, it has to find new peers to pick up the slack.
So to do that it does leak a little bit of information. It leaks information about the general structure of your data. So I mean, there are various types of metadata that get leaked. Somebody doing deep packet analysis can potentially learn a lot about, basically the behavior of what it’s doing. The idea is, beyond the things that are necessary for the network to even work, it tries to keep everything it can confidential.
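The repair step Tony describes (finding new peers to pick up the slack when others drop out) can be sketched in a few lines of Python. This is an illustration only, not Cryptosphere code: the function name `repair`, the `placement` mapping, and the use of plain whole-copy replication are all assumptions, since the interview only says that the system adds redundancy and re-homes data when peers disappear.

```python
import random


def repair(placement, live_peers, target_copies, rng=None):
    """Return a new placement in which each block has up to target_copies live holders.

    placement     -- dict mapping block id -> list of peers holding a copy (hypothetical layout)
    live_peers    -- peers that are currently reachable
    target_copies -- how many copies of every block we would like to keep
    """
    rng = rng or random.Random()
    live = set(live_peers)
    repaired = {}
    for block_id, holders in placement.items():
        # Keep the holders that are still online.
        survivors = [peer for peer in holders if peer in live]
        # Any live peer that does not already hold the block can take a copy.
        candidates = sorted(live - set(survivors))
        missing = max(0, target_copies - len(survivors))
        replacements = rng.sample(candidates, min(missing, len(candidates)))
        repaired[block_id] = survivors + replacements
    return repaired


if __name__ == "__main__":
    placement = {"block-1": ["a", "b", "c"], "block-2": ["b", "c", "d"]}
    print(repair(placement, ["a", "c", "d", "e"], target_copies=3))
```

Note that even this toy version shows where the leak Tony mentions comes from: to re-home a block, some party has to know which blocks exist and who holds them, even if the block contents stay encrypted.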
Irina: Ok, and what stage are you in the project now?
Tony: It’s still fairly early on. So one of the downsides of writing your own concurrency framework to build a project on top of is I spend a lot of time maintaining that. The upside is people are actually using that in the real world now, so there are a bunch of projects that are built on top of it, and I have a really good co-contributor on the project. He’s basically trying to take over most of the day-to-day work on it, which should hopefully free me up to work on the Cryptosphere more.
Irina: Awesome. And what kind of things are being written on top of it now?
Irina: So, sorry, could you explain a little bit more of this honing down and the difference between Celluloid that you mentioned?
Tony: So Celluloid itself doesn’t really have a security model, so you can use it in what’s called a trusted environment to build apps where basically every single node trusts each other. The problem with the peer-to-peer system is that doesn’t work at all; basically you have to assume every single node in the entire network is a potential attacker. So the Cryptosphere is a very limited, much more secured version of Celluloid itself, to where basically there are all of these mechanisms in place that make sure greedy peers, malicious peers, etc. can’t abuse the system.
Irina: And this extra security — is there a trade-off? What kind of implications does that have? What’s the benefit?
Tony: So there are a few things. So the goal of the Cryptosphere is to put people back in control of their own data. So if you don’t want Google or Facebook to store all your data, basically you either have to store that, which doesn’t work really well, because probably people don’t want to maintain their own servers, people are using laptops, their laptops go offline, etc., etc. So what the Cryptosphere does is try to have the peer-to-peer grid store all that data for you. And there are several other peer-to-peer systems that provide this sort of general storage service. Just off the top of my head the big ones are Freenet, GNUnet, and my personal favorite, which is this really obscure one called Tahoe-LAFS, by this guy Zooko.
Yeah, so those systems basically allow the peer-to-peer grid to store data. Some of the big unanswered questions with all these systems are things like accounting — basically, how do you make sure people contribute fairly to the network? Like if you’re storing stuff on the network, you should in some way contribute back to the network, right? So you either need to basically turn your home computer into a storage server and contribute equally to the network in order to participate, or if you can’t do that, you should be buying storage shares off of other people who are doing that.
Irina: Right. And so at the moment do you have any of these kinds of regulatory aspects in place?
Tony: No, they’re all just plans.
Irina: They’re all just plans?
Tony: Yeah, so I kind of want to do something somewhat similar to the Bitcoin blockchain. But instead of having one blockchain to rule them all, I want each peer to have their own individual blockchain, where basically they’re doing an IOU system. So you find a peer, that peer is like, ok, I’ll take your data; you effectively try to set up a lease with that peer, right? So both mutually sign off using digital signatures saying, this guy is going to offer me this storage service, and in exchange for that, if he asks me for storage service I will give it to him, and we’ll basically do tit for tat — you store a megabyte of data for a day, I’ll store a megabyte of data for a day for you.
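As a rough illustration of the per-peer ledger described above, the following Python sketch keeps a hash-chained list of leases that both parties sign, and derives a tit-for-tat balance in megabyte-days. Everything here is hypothetical: Tony says these are only plans, so the class name `LeaseChain`, the field names, and the entry layout are assumptions. In particular, the `sign` helper uses an HMAC from the standard library purely as a stand-in; a real system would use public-key digital signatures so that third parties can verify an entry without holding anyone's secret key.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def sign(secret, payload):
    # Stand-in only: an HMAC is NOT a digital signature (assumption for this sketch).
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def _payload(entry):
    body = {k: entry[k] for k in ("prev", "provider", "consumer", "megabyte_days")}
    return json.dumps(body, sort_keys=True).encode()


class LeaseChain:
    """One peer's own append-only history of storage leases."""

    def __init__(self, owner):
        self.owner = owner
        self.entries = []

    def append(self, provider, consumer, megabyte_days, keys):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"prev": prev, "provider": provider,
                 "consumer": consumer, "megabyte_days": megabyte_days}
        payload = _payload(entry)
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        # Both sides sign off on the same lease terms.
        entry["signatures"] = {p: sign(keys[p], payload) for p in (provider, consumer)}
        self.entries.append(entry)
        return entry

    def verify(self, keys):
        prev = GENESIS
        for entry in self.entries:
            payload = _payload(entry)
            if entry["prev"] != prev:
                return False
            if entry["hash"] != hashlib.sha256(payload).hexdigest():
                return False
            for p in (entry["provider"], entry["consumer"]):
                if not hmac.compare_digest(entry["signatures"][p], sign(keys[p], payload)):
                    return False
            prev = entry["hash"]
        return True

    def balance(self, peer):
        """Megabyte-days the owner has provided to peer, minus those received from peer."""
        total = 0
        for e in self.entries:
            if e["provider"] == self.owner and e["consumer"] == peer:
                total += e["megabyte_days"]
            elif e["provider"] == peer and e["consumer"] == self.owner:
                total -= e["megabyte_days"]
        return total
```

A positive balance means the other peer owes the owner storage; because each entry commits to the hash of the previous one, rewriting an old lease breaks verification of the chain.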
Irina: Cool. And so, just to explain a little bit, so obviously the idea of having this way of self-regulation — correct me if I’m misunderstanding — is precisely so there’s no central authority that’s like saying, yes that’s fine, no that’s bad, you’re not allowed in the network, yes you are; and that you want the system to regulate itself.
Tony: Yeah. So basically my goal is like, each peer decides its own destiny. So every peer tries to learn as much about the network as it can. So they try to model, basically, the entire structure of the Internet is the goal. So this has kind of been a big problem with peer-to-peer systems in the past, that every peer looks alike, which is definitely not the case with the Internet, right? Ideally you’re collaborating with peers that are — I want to say geographically, but that isn’t really what matters — but basically, you want people who are very few network hops from you, whom you have high bandwidth links with. So there have been a lot of various attempts in peer-to-peer systems to optimize this stuff in the past.
The main thing I can think of is this protocol called P4P, which would basically let internet service providers describe this information. . .
Irina: What was that — P3P, did you say, or?
Tony: P4P, P-the number 4-P.
Tony: I forget what it stands for, specifically [proactive network provider participation for P2P]. So they were trying to let ISPs tell peer-to-peer networks how to organize. And my goal is so they don’t have to do that at all, that the peer-to-peer network can actually just learn this information completely automatically.
So the way it does that goes back to that blockchain idea; so basically each peer maintains its own mutually verified history of how fast they can talk to every other peer in the network. And if they download a bunch of these little histories from the other peers that they actually work with on a regular basis and kind of grow to trust, just because the network seems to be working right, [audio skips] to lease out storage and obtain storage service from these other peers. So basically when you do that it’s kind of like going to a restaurant a bunch, and you eventually become a regular, right? So then you might ask people at the restaurant. . .
Irina: So you kind of upgrade through the levels of, you know, having proven yourself to be more reliable or more available, like more things get routed through you, or?
Tony: Yeah. So the idea is, basically these peers can talk to each other, right? So you can imagine going to a restaurant or a bar and being a regular, and so you ask the bartender, it’s like, ‘What other restaurants should I go to?’ That kind of thing, right?
Irina: Right, ok. So I’m not entirely sure about the comparison to the blockchain. But so each node — or each peer — in the network has a log of their connection to other peers, and speed or level, location, how convenient it is for you to connect to them. And when it comes to regulating the network or actually routing through this network, what is the actual algorithm or mechanism or way that you envisage that working?
Tony: So when you have all these logs, when you have all this metadata about how effectively I was able to communicate with these other peers, then basically you can start collecting all [audio skips] and so the algorithm you use to select. . . so what you want to do, you want to expand your network of peers. You have a bunch of peers you’ve been working with and you want to effectively grow your peer network, because you have either more data than you can store on your existing [audio skips].
So what you would do then is apply an algorithm. So the name for this type of algorithm is called collaborative filtering. It’s probably best known as the Amazon recommendation engine. So there are several types of algorithms that fit this category; the main one is known as singular value decomposition. You do all this crazy stuff with sparse matrices. But basically the idea is, based on your history with other peers, you can look at the peers that they have also interacted with, because every time you interact with one of these peers you’re going to grab their whole history of every other peer on the network they’ve interacted with. So basically. . .
Irina: So, wait. So the data for any individual peers, that’s stored locally, or what. . .?
Tony: Each peer stores its own history, and then when you want to. . .
Irina: And when you come across a peer, you get that history, or?
Tony: When you want to do something, when you want to engage in one of these leases with a peer, what they’re going to do is give you their individual blockchain and you’re going to sign off that. Like, yes, I’m giving storage service; or yes, I’m accepting storage service. So just by the way the network operates it’s going to grab these histories, you’re going to sign them, they’re going to sign them, but in the meantime you’re collecting; each peer collects all this information about the peers it’s interacted with.
And then once you have all that information you can kind of do the Amazon thing, right? You can go, here’s my history of peers I’ve interacted with, kind of like, here’s the products I’ve looked at or bought on Amazon. And you could go, ok, here’s all the other peers I know about, and based on that, which peers are most similar to the peers I’ve interacted with and had good service with?
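The "Amazon thing" can be illustrated with a small Python sketch. Note the substitution: Tony names singular value decomposition as the main algorithm in this family, whereas the code below uses a much simpler neighbourhood-style collaborative filter with cosine similarity, chosen only because it fits in a few lines of standard-library Python. The function names, the idea of a numeric service rating between 0 and 1, and the shape of the history dictionaries are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two histories (dicts of peer -> service rating)."""
    shared = set(a) & set(b)
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_peers(me, my_history, collected, top_n=3):
    """Suggest peers I have not worked with, based on histories I collected.

    my_history -- my own ratings of peers I have leased storage with
    collected  -- histories grabbed from those peers: peer -> {other peer -> rating}
    """
    scores, weights = {}, {}
    for their_history in collected.values():
        similarity = cosine(my_history, their_history)
        if similarity <= 0:
            continue  # nothing in common, so their opinion carries no weight
        for candidate, rating in their_history.items():
            if candidate == me or candidate in my_history:
                continue  # only recommend peers that are new to me
            scores[candidate] = scores.get(candidate, 0.0) + similarity * rating
            weights[candidate] = weights.get(candidate, 0.0) + similarity
    predicted = {c: scores[c] / weights[c] for c in scores}
    # Highest predicted rating first; ties broken by name for a stable order.
    return sorted(predicted, key=lambda c: (-predicted[c], c))[:top_n]


if __name__ == "__main__":
    mine = {"bob": 0.9, "carol": 0.8}
    collected = {
        "bob": {"alice": 0.9, "carol": 0.9, "dave": 0.95, "erin": 0.2},
        "carol": {"alice": 0.8, "bob": 0.85, "dave": 0.9, "frank": 0.5},
    }
    print(recommend_peers("alice", mine, collected))
```

The more a neighbour's history overlaps with mine, the more its ratings of unfamiliar peers count, which is the restaurant-regular intuition from earlier in the conversation expressed as arithmetic.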
Irina: Ok, awesome! I love this idea of using this Bitcoin-type style/method. And you talked a bit about other, let’s see, peer-to-peer protocols. What is it that made you want to make Cryptosphere? And if you were going to summarize how you think, you know, what does it bring, how does it distinguish itself, what is the focus of it?
Tony: So I should probably first start by talking about Tahoe. So Tahoe-LAFS is a very, very similar system. I’m taking a lot of their ideas, but it’s also a project I’ve contributed to. So they’re trying to do a peer-to-peer filesystem. Right now it’s mostly targeted at small groups, but they’re talking about expanding it out to larger and larger networks. The main thing that I think distinguishes the Cryptosphere from Tahoe is, I want to make it really easy to build web applications on top of the Cryptosphere. And [audio skips] avoiding that mostly because web security is really, really, really hard to get right, and they feel like if you can’t get it right, you shouldn’t do it at all.
The one other distinguishing characteristic is trying to heavily integrate with Git. So Git is a distributed version control system. It’s something people are really familiar with, and it’s a great way to manage things like the source code of the HTML apps. So the idea is, if somebody’s familiar with Git they can just write all their stuff, check it in to Git, Git push, and this gets kind of blasted out to the whole peer-to-peer network. So anybody who goes to a Cryptosphere address looking for a website gets the latest, greatest version of that code, and everything is kept secure, end to end the entire way without them ever having to think about that.
Irina: Cool! So you’re providing this service for web app developers. If you were to think about, in a few years this is where I want this to be, and this is what I really want people to be doing with it, you know, what’s the dream, what’s the goal?
Tony: I mean, so there are all sorts of things I can think of that you can build on top of this. The goal would be distributed Facebooks, distributed Wikipedias, all these things where there are these systems where a bunch of people are trying to collaborate on something. Or there are social networks.
The social network aspect is way more interesting to me, because you know, people want to share stuff with their friends. They want to keep it just within their group of friends. They don’t necessarily want to show the world or their employers or their parents, right? They want to go have a good time and be able to take pictures, but not worry about like, you know, somebody seeing something that is something unseemly that they wouldn’t want the entire world to see, right? I want people to be able to have that sort of sharing among their friends but without the sort of Facebook worries of, Facebook just wants all your data, they want everybody to be able to see everything.
Irina: Right. Exactly.
Tony: Yeah. The same with Wikipedia, right? You have this really cool collection of all the world’s knowledge, but a lot of people don’t like the way Wikipedia is being run right now. It would be great if [audio skips] Wikipedia, and let somebody else maybe experiment with the social policies around how Wikipedia is run or something. Like maybe it shouldn’t be that everybody is able to edit everything at all times. Maybe Wikipedia needs more of a security model that isn’t just, you know, hey, all these people are editing stuff and we don’t like your edits, so we’re the mods, we’re just going to remove it, right?
So I think if you or a group of people are really unhappy with the way Wikipedia is run, there shouldn’t be this giant infrastructural investment to try to make your own Wikipedia replacement, right? It should be as easy as like, I’m going to take Wikipedia, I’m just going to fork it, and the parts of Wikipedia that are shared between the old Wikipedia and the new Wikipedia just get shared by the network, and then just the things that have changed you can control yourself.
Irina: Ok, so that’s quite interesting. So I guess you’re also seeing this aspect of using Git and the ability to have — so you fork, and have versions, the ability to push — to work with what Cryptosphere is offering.
Tony: Yeah, definitely. I mean, this is something I see all over the place. If you look at something like the Domain Name System, right? There are governments that are like, we don’t like these domain names; we’re just going to shut them down. So right now DNS is very, very centralized, and it would be cool if DNS itself could be decentralized. There are people also working on this with systems like Namecoin.
I think it would be really neat if we have someone like ICANN who’s in control of DNS, and if enough people get mad at ICANN and they’re like, we want to make our own domain name registry that’s distinct from yours, they could basically fork ICANN and people could go, ok, we’re going to trust this alternative ICANN to be the domain name registry. It starts as just a fork of the original but it could be a fork that’s ICANN but, you know, the domain names they’re trying to censor, well, we’re just going to keep all those in there. So basically it makes it really hard for governments to leverage things like centralized entities to censor. . .
Irina: Push control, and push that — so to sort of start wrapping up, what is next, what are you working on, and how do people get involved? And just to kind of answer that, I guess it would be — from a user perspective, say if you are technical, if you’re not technical, how do you start using this now? And actually, how reliable is the network at present?
Tony: So the network at present is nonexistent.
Irina: Nonexistent? Ok!
Tony: So that’s what I’ve been focused on. The other things I’ve already tackled, which are fairly hard but, I think, not the hardest problems in the system: I’ve done an awful lot of work trying to make sure the [audio skips] is good. So I’ve developed my own wrapper to a fairly prestigious cryptography library called the Networking and Cryptography library by Dan Bernstein, otherwise known as NaCl, or he says it should be called ‘salt’! So Dan Bernstein is pretty much one of the leading cryptographers in the world, and I have built basically a wrapper to his library and worked with other people to make it easier to distribute, easier to install.
So ultimately my goal for the Cryptosphere is basically, you have one thing as an end user, right? You have one thing you can download. You have a nice, simple installer. And ideally what you get is a custom web browser that talks to this little back end that’s running locally, encrypting everything locally so nothing’s going through a centralized service at all. And basically you have this secure web browser with this complicated peer-to-peer back end that’s doing all the magic for you. So yeah, the UI in the end should be a web browser, in my opinion.
Irina: Awesome! Ok, and in terms of the applications that work on top of it, are they going to be just, essentially, apps that are specific for the browser?
Tony: Most notably, HTML5 added real sandboxing, this thing called ‘iframe’ sandbox. So basically I can have this little outer page that’s doing all the security stuff, all the secret management, and the actual apps that are running on the Cryptosphere will run inside of the sandbox, talk over these message channel things to the outer page, and the outer page is going to talk to this actual back end you’re going to have to download, which I’m writing in Ruby.
Irina: Ok, so eventually people will be able to download, have this browser, and then they can go and access and use awesome Facebook clones; everything is automatically encrypted for them, they don’t have to set anything up, and there’s no central routing; it’s all just through the peer-to-peer network. That sounds like an amazing feature! And how far away do you think you are, and what do you need to happen for that to be a reality that’s actually dependable and efficient and usable?
Tony: So my goal is to ship this in a year. So 10 years ago I presented my other peer-to-peer system at DEF CON, so my goal is to try to ship this at the 10th anniversary of my last [audio skips] my other system kind of fizzled out. But as far as what I’m looking for with collaborators, on the Ruby side I’m trying to pick people out of my other project, Celluloid. Celluloid itself has gotten a lot of people who are interested in these sort of ideas to come to a single place and talk. So I didn’t mention it before, but a lot of people who are using Celluloid now are actually using it to build stuff like Bitcoin trading websites. So there are already a lot of people who are interested in this sort of thing who are kind of hangers-on to Celluloid.
So there are already a bunch of these sort of people working on the parts I want to use in the browser. These technologies specifically are called Oasis and Conductor. Oasis is the sandboxing and Conductor is the application framework. So they would build these things called cards, kind of like in Twitter, you know, when you click on a Tweet it has media attached to it, right? You get a little tiny embedded view of that media. So that’s actually coming offsite, off of Twitter. So it’s the same idea. You can pull this third-party content into your system and still have it be secure.
Irina: Awesome! Ok, well thank you very much Tony! And for everybody, cryptosphere.org is the new site. Cool! Maybe we can talk to you again.
Tony: Yeah, definitely! Thanks.
Irina: Bye bye!