#7: Jeremie Miller - TeleHash

Jeremie Miller, creator of Jabber, talks about his new distributed wire protocol, TeleHash. What lessons has he learnt about how to make a protocol popular? How does Kademlia work? 17 Oct 2013



Kindly transcribed by: Ryan Pardieck

Francis Irving: Hello, welcome to Redecentralize where today we’re interviewing Jeremie Miller, who was the creator — from his farm in Iowa — of Jabber and founded the startup Singly. And now he’s working on TeleHash and he’s going to tell us all about it. Hello Jeremie.

Jeremie Miller: Hello. Great to be doing this. I’m excited!

Francis: So, can you give us a brief introduction to TeleHash, what it does?

Jeremie: So it’s hard to do a brief introduction. TeleHash, I’ve been working on it specifically under that name for about five years now. And for much of that lifetime it’s been sort of a research project. It was born of a lot of the problems that even back when the whole community of us was working on Jabber. . . Jabber was a federated model. But there was still a lot of desire to solve the peer-to-peer problems, and it didn’t fit well with the Jabber architecture, so we never really got to play with that strongly in that environment. And it was about the same time that Jabber was evolving, a lot of the distributed hash table research and a lot of experimentation was happening, and out there in the real world people were building apps with it.

So like I said, about five years ago I sort of came back to wanting to really take everything that had been learned about distributed hash tables and see if we could solve some of those original problems that we had wanted to do as part of Jabber, in relation to how do you connect people peer-to-peer, how do you take care of the real-time aspect of stuff going back and forth. . . Anyway, it evolved over the last five years, and there’s only been two versions of it implemented, so we’re on version 2 right now. There’s been probably a dozen different actual versions of the spec that had evolved, but this latest version has incorporated. . . Sorry, this hasn’t been very brief. The latest version of this over the last six months has incorporated a lot of the strongest cryptographic protocols we can incorporate into it, because we realized that in building a communications system, that it has to now natively — from the very ground up — be private as well; that it has to incorporate privacy from the very essence of its DNA, not just as a layer on top. So that has been a big piece of the new version.

Francis: Okay. And maybe to explain it, can you give us from the point of view of an end-user? Perhaps a technical one, but then somebody building applications built on top of it, what would it look like when it is finished? How does TeleHash feel and work?

Jeremie: So the goal is to actually build a set of communication apps that a user doesn’t necessarily know or see that anything special or different is happening other than that there is some assurance of trust that when they’re using the communication system, that they know that their messages and things that they’re sharing with somebody else are going straight to that other person. That they’re not having to upload or share them with some other company, or on some sites, or some other server. That they know that it’s going from their phone to the other person’s phone, or if the other person has like a photo sharing place that it’s coming back and forth from wherever they’re sharing photos from. And the same for media streaming, audio/video. We want that knowledge to be apparent to the person. But otherwise they don’t really see anything different.

We’re trying to build the same set of asynchronous and synchronous communication patterns — instant messaging, chat, sort of the mail-style patterns, full social network patterns. All of the typical communications systems that you use apps and technology for can be built on top of TeleHash again. So that’s from the perspective of a user. From the perspective of a developer who’s using and building something on top of TeleHash, the applications no longer need to care about hostnames, or DNS, or IP addresses or ports; all they have to care about is the fingerprint of the other endpoint they want to reach. TeleHash actually takes care of turning that fingerprint into a network path, and it takes the shortest network path possible to get there. So the perspective of the developer who tries to simplify everything to just — I have a fingerprint of someone I want to talk to which we call a ‘hash-name’, and I have data that I want to send back and forth; either an ongoing stream of data or just a one-time request-response.

Francis: So this is using distributed hash table stuff. I know lots of people don’t know about or understand it yet; can you kind of explain how that works? How it can send messages between two places without any intervening server getting hold of them?

Jeremie: So the distributed hash table that TeleHash is based on, since there are a number of different strong patterns out there of rules of how to create distributed hash tables — the one TeleHash is based on is called Kademlia. I think it’s pronounced correctly. It’s how I’ve heard other people pronounce it. It’s been around for about ten years now, and it’s one of the simplest. Its original design was sort of as a key-value store, but TeleHash doesn’t use it that way. It only uses it to resolve other endpoints that I want to connect from one place to another, and a distributed hash table will help coordinate and find that other endpoint. And to explain how it works is actually — there’s no special math — all it is is an exclusive OR. So I have my identity, which is a SHA-256 hash, and I have the identity of somebody that I want to reach, which is their SHA-256 hash. I can find the distance between me and them by just doing an XOR of all of the bits. And usually the first couple bits are different, so the distance is very far. Whereas. . .

Francis: That’s a distance in the kind of hash table space, rather than any physical distance.

Jeremie: Yup, and like in a hash table, you have to have seeds. So when I first turn on I have to go connect to somebody else in the distributed hash table, and how I find or how I resolve somebody else is I go to whoever I know that is closest to the one that I’m seeking. So I have a list of people I’m connected to, I sort them based on their distance from the hash-name I’m trying to reach, and I say ‘Hey, do you know this hash-name, or do you know anybody closer?’ And they do the same comparison of everybody they’re connected to, and they give me back a list of whoever’s closer. And it feels like that would be very brute-force, except that Kademlia has a rule about how you keep a list of buckets and you try to keep connections open to people that are close to you, so that you always have more knowledge of and more connections to other hash-names that are near to you. So the queries will actually consecutively get closer and closer to their endpoint.
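
The XOR distance and the "do you know anybody closer?" lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not TeleHash code: the names `to_id`, `distance`, `bucket_index`, `closest`, `lookup` and `toy_network` are all hypothetical, the network is a plain dictionary standing in for real connections, and the toy uses 4-bit identities instead of 256-bit SHA-256 hashes so the steps are easy to follow by hand.

```python
import hashlib


def to_id(data: bytes) -> int:
    """Hypothetical helper: a 256-bit identity as an integer, via SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def distance(a: int, b: int) -> int:
    """Kademlia-style distance: the XOR of the two identities."""
    return a ^ b


def bucket_index(a: int, b: int) -> int:
    """Position of the highest differing bit, or -1 if the identities match.
    Peers sharing a longer prefix with us land in lower-numbered buckets."""
    return distance(a, b).bit_length() - 1


def closest(peers, target, k=3):
    """The k known peers nearest to the target, nearest first."""
    return sorted(peers, key=lambda p: distance(p, target))[:k]


def lookup(start, target, network, k=3):
    """Repeatedly ask the closest known peers for anyone closer.
    `network` maps each identity to the identities it is connected to
    (an assumption standing in for real queries over the wire).
    Assumes `start` has at least one connection."""
    known = set(network[start])
    asked = set()
    while True:
        candidates = [p for p in closest(known, target, k) if p not in asked]
        if not candidates:
            # Nobody new to ask: return the nearest identity we learned of.
            return closest(known, target, 1)[0]
        for peer in candidates:
            asked.add(peer)
            # The peer answers with whoever it knows that is near the target.
            known.update(closest(network[peer], target, k))


# Toy 4-bit network: each node is mostly connected to nodes near itself.
toy_network = {
    0b0001: [0b0100, 0b1000],
    0b0100: [0b0001, 0b1000],
    0b1000: [0b0001, 0b1100],
    0b1100: [0b1000, 0b1110],
    0b1110: [0b1100, 0b1111],
    0b1111: [0b1110],
}

print(lookup(0b0001, 0b1111, toy_network, k=2))  # 15
```

In the toy run, node 1 starts out knowing only nodes 4 and 8, and each answer moves it one step nearer (8, then 12, then 14) until it learns of 15 itself. Because every node keeps more connections to identities near its own, each round of answers tends to shrink the remaining distance.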

Francis: Okay, on a very kind of practical level, how are you getting around various NATs and routing and firewalls and things, for those communications?

Jeremie: So I wanted to make sure that was completely built into and native to TeleHash, not dependent on any external service or any external provider for that. Whenever you’re connected to anybody else in the distributed hash table, they obviously know what your public IP and port is. And if I want to connect to you, the act of connecting to you means I search for your hash-name, so I’m talking to somebody who actually is already connected to you or who knows you. And they say, ‘Hey, I know them.’ And then they tell me what your public IP and port is and I send you a little packet so that I can open a path from my NAT towards yours, which might not get there yet, but I also go back to them and say, ‘Hey, I’m trying to connect to this person.’ And they hand my information over to them, so that they can then send a packet that punches all the way through the NATs back to me.
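
The hole-punching sequence described above can be modelled with a deliberately simplified NAT. This is a sketch under stated assumptions, not TeleHash's implementation: the `Nat` class and `punch` function are hypothetical, a NAT is reduced to one rule (an inbound packet gets through only if the host behind it has already sent something to that sender), and the introducing peer's relay step is left as a comment rather than modelled.

```python
class Nat:
    """Toy NAT: remembers who the host behind it has sent packets to,
    and only accepts inbound packets from those peers."""

    def __init__(self, name):
        self.name = name
        self.opened = set()  # names of peers we have sent a packet to

    def send(self, other):
        """Send one packet to `other`. Returns True if it was delivered."""
        self.opened.add(other.name)        # outbound traffic opens our side
        return self.name in other.opened   # delivered only if their side is open


def punch(alice, bob):
    """Replay the three steps and record which packets arrive."""
    results = []
    # 1. Alice learns Bob's public address from a mutual peer and sends a
    #    packet. It may be dropped, but it opens Alice's NAT towards Bob.
    results.append(alice.send(bob))
    # 2. The mutual peer, already connected to Bob, passes on Alice's address
    #    (not modelled here). Bob sends a packet, which now reaches Alice.
    results.append(bob.send(alice))
    # 3. Both NATs are open, so Alice's next packet reaches Bob directly.
    results.append(alice.send(bob))
    return results


print(punch(Nat("alice"), Nat("bob")))  # [False, True, True]
```

Real NATs track address and port mappings and time them out, so this only captures the ordering of the packets, which is the part the mutual peer makes possible.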

Francis: So once you’ve made the connection with Kademlia, you then have a direct connection between the two parties.

Jeremie: The goal, yes. The goal is that every hash-name is connected directly.

Francis: Okay, that’s kind of interesting. And the cryptography that you’re using, you said you’ve done lots of work recently on improving that.

Jeremie: Yep, so all the packets sent on the wire are always encrypted to the recipient so that anybody recording anything off the wire can’t actually see anything just by recording the traffic. We actually use two patterns of encryption. One is just for identity. So we’re using RSA for identity: the hash-name is the fingerprint of your RSA key, so the other side can sort of assert and say, ‘Yes, I am this person,’ and they can sign the request to guarantee who they are. And then you can encrypt a secret so that only they can decrypt. That’s actually not used for the content that’s sent back and forth. Once the identity is exchanged and verified, it’s basic forward secrecy using elliptic-curve Diffie-Hellman, such that each side creates a session key (a temporary elliptic-curve key), and then they use Diffie-Hellman to derive a shared secret and they use AES from that shared secret. I mean there’s a lot more stuff involved in this, but at a high level all of the content is actually sent encrypted using temporary keys, such that if something was recorded and cracked at any later point, it would only decrypt that session. As well as if either side was compromised you couldn’t actually decrypt the traffic, even if you were able to compromise the keys.
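
The two patterns described above, a fingerprint for identity and temporary keys for each session, can be illustrated with Python's standard library. This is a toy sketch under loud assumptions, not TeleHash's wire format: `hashname`, `ephemeral_keypair` and `session_key` are hypothetical names, the exact key encoding that gets hashed is an assumption, and classic finite-field Diffie-Hellman over a small prime is swapped in for the elliptic-curve version so that no third-party library is needed. RSA signing and AES encryption are omitted, and the parameters are far too small to be secure.

```python
import hashlib
import secrets


def hashname(public_key_bytes: bytes) -> str:
    """Fingerprint of a public key: here simply its SHA-256 digest in hex.
    (Which encoding of the key gets hashed is an assumption.)"""
    return hashlib.sha256(public_key_bytes).hexdigest()


# Toy Diffie-Hellman group. 2**127 - 1 is prime, but far too small for
# real use; it stands in for the elliptic curve used in practice.
P = 2**127 - 1
G = 3


def ephemeral_keypair():
    """A fresh, temporary key pair that is thrown away after the session."""
    private = secrets.randbelow(P - 3) + 2   # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def session_key(my_private: int, their_public: int) -> bytes:
    """Both sides compute the same shared secret, then hash it into a
    32-byte key that a cipher such as AES could use."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# Each side makes a temporary key pair and sends only the public half.
a_private, a_public = ephemeral_keypair()
b_private, b_public = ephemeral_keypair()

# Both ends derive the same key without it ever crossing the wire.
assert session_key(a_private, b_public) == session_key(b_private, a_public)
```

Because the key pairs are generated per session and then discarded, recovering one session's key reveals nothing about any other session, which is the forward-secrecy property mentioned above.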

Francis: Something that people always worry about with this is the meta-data analysis, just learning things from the fact that people are even talking to each other at all. Does any of this kind of distributed hashing stuff help with that?

Jeremie: It helps in that because of the function of Kademlia and a distributed hash table itself, you have many, many hash-names that you’re connected to and talking with all the time because you’re exchanging queries and basically status updates. So it’s not intentional — like to try and create fake traffic patterns — but just the nature of using the distributed hash table does create a lot of sort of random network traffic back and forth. But the real goal of TeleHash here isn’t to try and create an anonymous network. It’s to create a network that people can use to communicate with the people they know. You’re instant messaging with your friends, your family, and you’re sharing photos to family members, and you might be talking to work members. And we’re doing a lot of work to make sure that the Internet of things, that TeleHash supports them just as well. So that when I have a bunch of sensors and devices and computers around me, I can talk to them directly. So it isn’t anonymizing traffic, it’s actually almost the inverse. It’s about creating a trusted path to the people that you know.

Francis: Yeah, so I always find the resilience aspects really interesting, as well as the privacy ones. So things like if there was a hurricane, or if the servers went down of the central server, then presumably the packets get routed directly peer-to-peer.

Jeremie: Yeah, that’s actually one of the exciting things that has been one of the design principles from the beginning, and one of the things that I was disappointed that Jabber couldn’t do as easily. In that sort of a disaster scenario, or wherever the network might not be reliable or trusted or might block things at a higher level, because everything travels peer-to-peer, as long as you can establish a connection with somebody else locally, you can actually exchange and communicate with them. And they can help connect anybody else locally, so the hash table itself will reform with whatever network connectivity is available. And as well we’re designing, we call them switches, the ones that actually handle the crypto and do all the network traffic that an application sort of embeds into it. The switches should be able to take advantage of, in a phone, ideally both the cell network as well as the WiFi network. The goal is to know every network path available to another hash-name, such that if one doesn’t work you can fall back to another. And I’d love for someday when the neighborhood networks start to increase, that the neighborhood network is yet another path that any local application can then use to connect anybody else.

Francis: Okay, so in terms of take-up, how much response have you had? Are there any good applications running, and what’s your plan to increase that number and get people to be able to use it?

Jeremie: So the current status is that the version 2 basically has been developed over the last six months or so. And we have a couple of core implementations of that that are, I would say, quite unstable yet. So we’re not sort of in a production mode where people are using it on a day-to-day basis, but we have about a dozen people that are involved in helping implement in various different languages and environments, and getting those some real-world testing and real-world experience to make sure that all the NAT hole punching works, to make sure the heuristics about how to maintain the distributed hash table work well, and to make sure that all of the implementations work well with each other. So we’re at the early implementation stage. We have some sort of test chat and test messaging apps that we’ve built on top of it, but we’re just getting to the point now where we’re going to start to build some things where people who aren’t as technical can start to play with and experiment with.

Francis: Yeah. So particularly since real-world Jabber is used by millions — but hundreds of millions of people actually, potentially — are there any lessons you’ve learned from that as to how you can get adoption of a new system like this in a way that’s usable for everyone?

Jeremie: Well I think the biggest lesson is to take your time and don’t try to rush it. Do things well. The adoption comes through having done it well, having provided and created something that isn’t just temporary, that it actually has a lot of infrastructure and support and community behind it. So we’re trying to do that, and this is going to be a many-year project. It already has been many years, but getting it to scale is going to take many more years. So it’s not about trying to get some app that has hundreds of millions of users on it, it’s about creating some really open infrastructure and a lot of implementations of it such that it can become embedded in lots of other places.

Francis: So it’s kind of the opposite of the startup that just sits parasitically on top of something. It’s like a whole new thing, like the web, that provides resources and infrastructure capability to people.

Jeremie: Yeah, and I think a lot of what all of us that are working on it are trying to do is just to demonstrate that this is possible. Even if somebody looks at TeleHash and says, ‘Oh this is great, you know, I can build something on top of it.’ And then maybe they’re just inspired by it and they don’t actually use TeleHash; they’re just like, ‘Okay I can actually do a distributed app and use these different technologies,’ and that’s great. We’re trying to demonstrate that it’s possible to actually build all these communication networks that obey privacy, obey the intentions of the user using it, and have all of the same features — if not more — than the existing apps that are centralized.

Francis: Fantastic! Thank you very much, and is there anything else you want to say, and particularly anything that people watching can do or help to contribute?

Jeremie: So what would be wonderful is if people who are interested in this that are sort of into the developer, low-level systems side and getting the current implementations working well. It would be great to have anybody who’s interested in this kind of stuff and who likes to dabble with crypto and network sockets and low-level system things. Hit telehash.org or hit me up. I’m pretty easy to track down on the Internet.

Francis: Great! Thank you Jeremie, that’s fantastic. Good luck with that, and I hope that you succeed in building it.

Jeremie: All of us who are working in distributed technology, I think are going to make a difference in the long run here, so I’m excited to be just part of that larger community.

Francis: Definitely.