23 May 2023
WOLFGANG TREMMEL: Let's get this started. Hey, good afternoon. I am going to chair this session together with Brian here, and a short announcement, well, I am repeating myself, but hey, the Plenary sessions are governed by the Programme Committee and if you want to join the Programme Committee, there's still time to put your name forward: just send an email to pc [at] ripe [dot] net, and we will put you on the list and you can be elected and stand here at the next RIPE meeting.
Good. It's 2:00, and we are starting the first afternoon session, first presentation is by Ana Custura, and she is going to talk about what should networks do with IPv6 extension headers. The stage is yours.
ANA CUSTURA: Hi, everyone, I am doing a PhD at the University of Aberdeen and there I specialise mostly in wide‑scale Internet measurements to help protocol standardisation. I am going to talk about IPv6 extension headers and I am going to present to you some results from a recent survey we have done with RIPE Atlas.
IPv6. Pausing for effect. So as you know, it was standardised in the 1990s in RFC 2460, but what I want to point out is that it only became a full standard in 2017, with RFC 8200, so between the 1990s and six years ago you have some years of operational experience that informed what made it into the full standard.
To begin with, IPv6 promised a few things. It promised to fix the lack of IPv4 address space and it managed to do that; it promised a simplified base header leading to more efficient forwarding and routing, and indeed that header is less complicated than the IPv4 one.
It also promised improved IP packet fragmentation, and I guess in the original spec the way fragmentation was specified was a little bit too complex and flexible, and many RFCs, many refinements came along, so to speak, and so in the full version of the spec you have a more constrained way to do fragmentation, because it was too flexible for not a lot of benefit. IPv6 promised multicast and I put a question mark there; I am not talking about local multicast but rather inter‑domain multicast, which never really took off, or should I say not yet.
And it also promised end‑to‑end security, IPsec. And IPsec did take off, but I mean, it was mandatory in the original IPv6 standard in RFC 2460 but is no longer mandatory in the full standard. And in the meantime, other ways of doing end‑to‑end security have emerged, like QUIC or VPNs. Finally, IPv6 promised extensibility and this is what this talk is about. I am pointing some arrows from fragmentation and IPsec towards extensibility because they are done via IPv6's extensibility framework, which is extension headers.
They are the filling, the sandwich filling between the IPv6 base header and whatever upper layer protocol you have. And they add new functionality into IPv6, so you can have fragmentation via the fragment header, IPsec via the authentication header or the ESP one, or different ways of doing routing, using the routing header, and finally, the two extension headers that I measured and I am going to talk about are the destination options header and the hop‑by‑hop options header. So, as the name suggests, the destination options header carries options and these are intended for your destination, for the end node. And the hop‑by‑hop options header carries options that are meant to be looked at by the routers on the path, if configured to do so.
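To make the "sandwich filling" picture concrete, here is a minimal Python sketch of how a receiver might walk the chain of extension headers between the base header and the upper layer protocol. It is only an illustration: the function and variable names are made up for this example, it handles just the hop‑by‑hop, routing, fragment and destination options headers (AH and ESP encode their length differently and are left out), and it does no bounds checking.

```python
import struct

# Next Header values as assigned in the IANA protocol numbers registry.
HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS = 0, 43, 44, 60
UDP = 17
EXT_NAMES = {HOP_BY_HOP: "hop-by-hop", ROUTING: "routing",
             FRAGMENT: "fragment", DEST_OPTS: "destination options"}


def walk_header_chain(packet: bytes):
    """Return (extension header names, upper-layer protocol, upper-layer offset)."""
    next_header = packet[6]   # Next Header field of the 40-byte base header
    offset = 40
    chain = []
    while next_header in EXT_NAMES:
        chain.append(EXT_NAMES[next_header])
        following = packet[offset]          # first byte: what comes after this header
        if next_header == FRAGMENT:
            length = 8                      # the fragment header has a fixed size
        else:
            # Hdr Ext Len counts 8-byte units beyond the first 8 bytes
            length = (packet[offset + 1] + 1) * 8
        next_header, offset = following, offset + length
    return chain, next_header, offset


def base_header(next_header: int, payload_length: int) -> bytes:
    """Hypothetical helper: a base header with all-zero addresses."""
    first_word = 6 << 28                    # version 6, traffic class 0, flow label 0
    return struct.pack("!IHBB", first_word, payload_length, next_header, 64) + bytes(32)


padding = bytes([1, 4, 0, 0, 0, 0])         # one PadN option filling 6 bytes
hop_by_hop = bytes([DEST_OPTS, 0]) + padding
dest_opts = bytes([UDP, 0]) + padding
udp = struct.pack("!HHHH", 33000, 33434, 8, 0)
example_packet = base_header(HOP_BY_HOP, 24) + hop_by_hop + dest_opts + udp

print(walk_header_chain(example_packet))
```

The point of the sketch is that the transport header is only found after following every Next Header field in turn, which is the extra work the rest of the talk keeps coming back to.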
Look at all these amazing headers and all the functionality they bring to IPv6. Well, it's not that simple, because extension headers had a bit of a rocky start, there are many concerns around them and they are all summed up very nicely in the RFC in the title there, RFC 9098, and I am going to try to summarise it for you, but essentially the first routers that implemented IPv6 did not have a way to process packets with extension headers in hardware. And so they punted all of the packets with extension headers to the control plane of the router via the internal link, and essentially if you do extension headers you open up your router to all sorts of DDoS attacks. That was one problem.
Another problem is extension headers, you can have multiple of them in what is quite a complex arrangement, and that led to implementations that have quite a lot of bugs. And so you open routers up again to the DDoS attack by sending them multiple packets with extension headers. The implementations are buggy, not a lot of extension headers were going around so the code paths were not exercised a lot.
And there is such a bug that was discovered even in the past six months, so basically you can send a malformed extension header packet to the Linux kernel and make it panic. That's the link at the bottom of the slide.
Finally because you can have many extension headers in the same packet, you can have an IPv6 header that is quite big and that places extra processing demands on the router so routers end up not being able to forward packets with these headers at line rate, and so that's yet another problem that extension headers can have.
It comes as no surprise that many networks drop packets with extension headers, and measurements done maybe in 2016 in RFC 7872 point this out. But the story may not be so dire after all, because many of the problems that I mentioned in previous slides have seen improvements over the past 20 years: you now have better hardware that can process extension headers in a more sensible way, and recently, within the past five years or so, you actually see renewed interest in extension headers, many of them are being worked on and have been specified within the IETF. Probably the most widely deployed one that I am mentioning here is the IPv6 routing type; the rest are all nice, but they are destination options or hop‑by‑hop options used for some sort of telemetry or measurement. So you can have alt‑mark and PDM, and these options you can use to measure packet loss or delay in your network, and then you have options described in IOAM which allow you to essentially collect packet telemetry and operational information as packets flow through your routers.
The last option that I mention here is quite special, because all of the previous ones have something in common, they are supposed to be used only across an operator domain so in a controlled environment, in other words not across the Internet. The last option here the minimum PMTU can be used to discover path MTU, and of course then the scope is Internet‑wide. So the question that I'm trying to answer today is can options be used more widely across domains or in the Internet?
So the way to find this out is through measurements. I will be only focusing on destination options and hop‑by‑hop options as I speak, and here are some measurements that have been done over the past few years. They all report different things, that's because they measure slightly different things and use a slightly different methodology, but what they all agree on is that packets with extension headers don't really survive the Internet 100 percent of the time. So, I have done my own experiments with RIPE Atlas, and I have tried to also measure survival to begin with. The way to do this is I took all of the IPv6‑enabled RIPE Atlas probes, and there's between 5,000 and 6,000 of them depending on time of day, and I have sent out traceroute packets to seven destinations, some of which I controlled. For simplicity, today I will show results from one or two destinations at most.
So, yeah, you send packets across the Internet and see if they arrive at the destination AS. I tested both TCP and UDP to see if there's a protocol difference, and let's see what we found out.
Well, it looks like many paths support this, they see a very high survival rate, between 90 and 97% across all of the destinations I have tested, and here are the results for the UK. If at this point you are wondering what option we sent, well, RIPE Atlas allows one and that's a padding option. That was specified in RFC 2460 so it should be well understood by routers everywhere.
And we have sent the minimum possible size of option, which is 8 bytes. So, for UDP, very high traversal rates; for TCP, not so much. At the good end, a traversal of 70% for this particular destination; this varies quite a lot depending on where you are going. So I would say between 40 and 70%, but the transport difference is there.
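For readers who want to see what such a header looks like on the wire, the following Python sketch builds an options header that carries nothing but a single PadN padding option. It is an illustration under assumptions, not a description of what RIPE Atlas does internally: the function name `padded_options_header` is invented here, and the sketch only covers sizes from 8 to 256 bytes so that one PadN option is enough.

```python
UDP, TCP = 17, 6
PADN = 1   # option type of the PadN padding option


def padded_options_header(next_header: int, size: int = 8) -> bytes:
    """Build a destination (or hop-by-hop) options header filled with one PadN."""
    if size % 8 != 0 or not 8 <= size <= 256:
        raise ValueError("size must be a multiple of 8 between 8 and 256")
    hdr_ext_len = size // 8 - 1   # 8-byte units, not counting the first 8 bytes
    pad_data_len = size - 4       # minus Next Header, Hdr Ext Len, option type, option length
    return bytes([next_header, hdr_ext_len, PADN, pad_data_len]) + bytes(pad_data_len)


# The minimum-size header: 8 bytes in total, of which 6 are padding.
print(padded_options_header(UDP).hex())

# Sizes from 8 up to 64 bytes in steps of 8.
for size in range(8, 72, 8):
    assert len(padded_options_header(TCP, size)) == size
```

The same byte layout is used for both the destination options and the hop‑by‑hop options header; which of the two it is depends only on the Next Header value in the header that precedes it.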
What about hop‑by‑hop options? The story is slightly different here. While destination options survive many paths, hop‑by‑hop options only survive a few of them. The transport difference remains, so there are still more packets that survive over UDP than over TCP.
So, I guess the next question that we asked was why? Why does this happen? To start to answer that question we first looked at per‑AS survival: where do packets get dropped? Here we found that overwhelmingly it's the local AS that is responsible for most of the drops. So for example, in the case of destination options, the local AS, and that is the AS where the RIPE Atlas probe itself lives, drops maybe 5% of all UDP packets sent and 25% of all the TCP packets sent. So if you think about it like this, you send packets on 100 percent of the paths and then once they go through that first AS, you only get, say, 75% of paths still supporting them, in the case of TCP and destination options.
For hop‑by‑hop options the story is again slightly different: the local AS drops up to three‑quarters of the packets that you send or, rather, it drops all the packets for three‑quarters of the paths that we tested.
So local ASs are a problem. So, next we tried to work out what would actually happen if the packets would traverse that first AS. So in other words, are there any barriers for packets to traverse to the end of their destination if they were to survive it?
So the way to find this out is, we did the test in reverse, that is, we sent a traceroute from our original destination all the way back to the RIPE Atlas probe. We can do this because most of the RIPE Atlas probes have public, and by public I mean global unicast, IPv6 addresses, and we kept the protocol and port the same, and the question was: if you test it backwards, does the packet reach that first AS?
And we found out for destination options it definitely does, so that means we tested around 300 paths where packets were originally dropped and we found out that 97% of the reverse path tests were successful, so that means that if you were to eliminate those drops caused by the probe's local AS, you improve traversal by quite a bit because there are no other obstacles, from 92% to 96%, so that's great.
Hop‑by‑hop options: the story is slightly different again, because not all of the packets we sent on this reverse path made it back to the original AS. You could still improve overall traversal, still double it, if you were to eliminate those drops from the probe's local AS, but you couldn't necessarily bring traversal beyond 25%.
So the key take‑away from this slide is that transit networks don't drop packets with destination options but they do drop quite a lot more packets with hop‑by‑hop options.
Why does this happen? Well, you can have many different reasons. You can have weird architecture that simply drops packets because it doesn't support them. More commonly, probably, what you have is a network policy, so this is basically people, network operators, trying to protect their infrastructure with ACLs and dropping packets with extension headers. And then you have different devices that like to look into your packet, you have things like load balancers which often need access to the upper layer protocol, they want to know port numbers or something, and when they see a packet that has an extension header they just drop it.
The next question we asked was whether or not extension header size makes a difference to how it traverses.
For this, the experiment remains the same, except that RIPE Atlas allows you to specify a size in bytes for the extension header that you are about to send, and we sent many different sizes, we incremented it by 8 up to 64 bytes in size, and again it's the same test: the survival is successful if the packet makes it to your destination AS.
And we found out that yes, there is like some sort of magical number beyond which packets with extension headers get dropped a lot more. So TCP sees the biggest drop in traversal with extension headers that are around 48 bytes in size, and for UDP that's shifted by 8 bytes, because it sees the biggest drop with extension headers that are around 56 bytes in size. So, notice the 8‑byte shift between TCP and UDP, and this implies that the size of the transport header makes a difference to the survival of the packet with extension headers, so how does that make sense? Well, we know the TCP header is larger than the UDP header, so maybe traversal has something to do with the total size of the IPv6 header plus extension header plus transport header. The fact remains that where extension headers can be used end‑to‑end, often you want them smaller than 40 bytes for them to get through. So, they work if they are small enough but we don't know why, and especially we don't know why we see this difference; if you have any ideas let me know and I will buy you a beer.
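The "total size" idea can be checked with some simple arithmetic. The Python sketch below adds up the 40‑byte IPv6 base header, the extension header and the transport header for the two sizes where the biggest drop was seen. It assumes a 20‑byte TCP header with no TCP options and an 8‑byte UDP header; the actual probe packets may differ, so this is only a back‑of‑the‑envelope illustration and not an explanation of the result.

```python
IPV6_BASE_HEADER = 40   # fixed size of the IPv6 base header
TCP_HEADER_MIN = 20     # assumption: TCP header without options
UDP_HEADER = 8


def total_header_bytes(ext_header_size: int, transport_header_size: int) -> int:
    """Bytes a device must skip or parse before it reaches the payload."""
    return IPV6_BASE_HEADER + ext_header_size + transport_header_size


# Extension header sizes at which the biggest drop in traversal was reported.
for label, transport, ext in [("TCP", TCP_HEADER_MIN, 48), ("UDP", UDP_HEADER, 56)]:
    print(label, "extension header", ext, "bytes -> total headers",
          total_header_bytes(ext, transport), "bytes")
```

Under these assumptions the two totals come out close to each other (108 and 104 bytes) but not identical, which is consistent with the total header size playing some role without settling the question.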
Which brings me to the last experiment, this is the fun one, the novel one; maybe you have seen info on extension headers and size before but you have probably not seen this. The last experiment that we did has to do with load balancers or routers that do equal‑cost multipath routing. You have devices in the Internet that like to spread traffic on different paths and to do this they use some information: they use maybe source and destination port and maybe they use source and destination address. Port information becomes a bit less accessible, it is not in the same place, if you introduce an extension header in the packet. So if you do add extension headers, does this impact the function of devices that do ECMP? That was the question we tried to answer here, and to do this we used Paris traceroute, which is essentially like a traceroute but on steroids: instead of one you do 16 of them and you combine the results and then you work out what the topology is between a source and a destination. This works because between each of the 16 measurements you change the flow label and the source, or at least this is what Paris traceroute in RIPE Atlas does, and so load balancers send you on to a different path every single time. So what did we find?
Well, if we do a Paris traceroute measurement with packets with no extension headers whatsoever, we find a median of around 4 paths. This is great, this is exactly what we expected, because essentially this reproduces the findings in the original Paris traceroute paper. What we didn't expect to find is that as soon as you add an extension header into the packet, Paris traceroute starts detecting fewer paths. What this is probably due to is the fact that you get load balancers that use an offset as the basis for the hash function, and maybe they expect to find port information, but because you have the extension header they no longer find port information and they route you on to the same path, so that's why you detect fewer of them. This has some implications: it means that if you are going to use extension headers for something, you don't want to mix them into the same flow, so basically if you have a TCP flow that mixes packets with and without extension headers you have to be quite careful, because they might go on different paths because of this issue. Also, a possible fix for this might be to use the flow label, to at least mix the flow label into whatever function it is you are using to route packets on different paths. But I will now give you a nice practical example of the problem that I have just described, so I will show you three figures.
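The suspected mechanism can be illustrated with a toy load balancer in Python. This is a sketch of the hypothesis only, not of any real device: the hash inputs, the CRC32 hash, the probe layout and all function names are assumptions made for the example. The toy balancer hashes the addresses plus the ports, but it only looks for ports directly after the base header; optionally it also mixes in the flow label.

```python
import struct
import zlib

UDP, DEST_OPTS = 17, 60


def make_probe(src_port: int, flow_label: int, with_dest_opts: bool) -> bytes:
    """Hypothetical UDP probe from ::1 to ::2, optionally with an 8-byte options header."""
    ext = bytes([UDP, 0, 1, 4, 0, 0, 0, 0]) if with_dest_opts else b""
    udp = struct.pack("!HHHH", src_port, 33434, 8, 0)
    first_header = DEST_OPTS if with_dest_opts else UDP
    base = struct.pack("!IHBB", (6 << 28) | flow_label,
                       len(ext) + len(udp), first_header, 64)
    addresses = bytes(15) + b"\x01" + bytes(15) + b"\x02"
    return base + addresses + ext + udp


def hash_inputs(packet: bytes, use_flow_label: bool = False) -> bytes:
    fields = packet[8:40]                    # source and destination address
    if packet[6] in (6, UDP):                # ports are only read at the fixed offset 40
        fields += packet[40:44]
    if use_flow_label:
        flow_label = struct.unpack("!I", packet[:4])[0] & 0xFFFFF
        fields += flow_label.to_bytes(3, "big")
    return fields


def pick_path(packet: bytes, n_paths: int = 4, use_flow_label: bool = False) -> int:
    return zlib.crc32(hash_inputs(packet, use_flow_label)) % n_paths


def distinct_inputs(with_dest_opts: bool, use_flow_label: bool) -> int:
    """How many different hash inputs 16 probes with varying port and flow label produce."""
    probes = [make_probe(33000 + i, 0x10000 + i, with_dest_opts) for i in range(16)]
    return len({hash_inputs(p, use_flow_label) for p in probes})


print("no extension header:", distinct_inputs(False, False))
print("destination options:", distinct_inputs(True, False))
print("destination options + flow label:", distinct_inputs(True, True))
```

In this toy model the 16 probes look like 16 different flows without an extension header, collapse into a single flow once a destination options header hides the ports, and become distinguishable again when the flow label is part of the hash, which mirrors the fix suggested in the talk.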
All of these are basically measurements of the same source‑destination pair, and the first one, that says no extension header measurement, is the vanilla measurement, the one measured without any extension headers, so Paris traceroute in this case discovers four different paths and you can see there are two different load balancers there. Then in your second figure you can see what happens when you add a destination option into the mix: suddenly all of the packets take the exact same path, possibly due to the problem I have described. Now, you will probably not guess what happens when you add a hop‑by‑hop option on this particular source‑destination pair, and what happens is probably a bug or something quite unexpected: suddenly you run Paris traceroute with packets with hop‑by‑hop options and that first router seems to basically enumerate all of the interfaces it has, so this test takes 14 different paths. This is clearly not expected behaviour, this is probably some sort of bug, I would assume, but what it shows is that not all load balancers out there are necessarily able to cope with packets with extension headers today in the Internet. We get to the point where I answer the question in the title: what should networks do with them?
If you have vulnerable hardware so hardware you know processes extension headers on the slow path you have to have some firewall and I mean access control list in place to just block them or block the ones you don't need, even better.
If you have better hardware that doesn't do stupid stuff, then you don't necessarily want to firewall all of them, in fact if you unnecessarily block extension headers when you don't need to you can stifle innovation so basically you can discourage people from specifying new options.
There's two things that will help the extension header story: one is better hardware, which I have mentioned, and the other one is that there's a few new drafts in the IETF that have appeared that aim to talk about how you should process them and aim to steer the discussion in a direction that would make it easier in the future for these to be processed.
So, if the other question was can we extend IPv6 the answer is it is done right now, all of the options that I have talked to you about in the beginning, they are being used by people within their own domains, that's quite low risk because you control your own hardware and you know exactly what you are going to get.
If you want to use extension headers in the Internet opportunistically, well, you are going to have to wait a bit. Destination options are quite close, almost there, because they traverse most of the paths, but for hop‑by‑hop options we are going to have to wait and see. If I am going to speculate, I think the operational experience that you are getting now from people who run these things in their own domains and the improvements in hardware will eventually translate, so if the right option comes along, you might well see it go.
And this concludes my talk, there are some references on this last page.
BRIAN NISBET: Okay, thank you very much. Do we have any questions? We do. Please.
JEN LINKOVA: Thanks again. I just have a comment. I think I have said that before: operators normally permit what they need, right, so actually fragmentation might be broken as well in v6 if nobody cares, so that's why I think people are slowly fixing fragmentation. People actually are using ESP quite ‑‑ they will be using it more, because every time you use wi‑fi calling, for example, your phone actually establishes a tunnel to a gateway, so as soon as you switch to a v6‑only network you have to use ESP and if you block it you would notice it, right? So I guess it's ‑‑ we should not be putting the cart in front of the horse, right? There are headers which are used, and I think ‑‑ and the reason we have them broken and filtered is because we have had the safety net of happy eyeballs, and I would expect the situation to improve as soon as we start deprecating the legacy protocol in the network, like the RIPE network ‑‑
ANA CUSTURA: Yes, agreed.
BRIAN NISBET: Okay. Please.
AUDIENCE SPEAKER: Talking about happy eyeballs, did you test how stable your measurements are? Is it ‑‑ if you measure it once on certain paths, does it always stay like this, so if it's broken, it's broken and if it works, it works?
ANA CUSTURA: Yes, so I have repeated my measurements a number of times and that is generally the case. Interestingly, when I did the ECMP test what I found is you actually do have a handful of cases, not that many, where, if a load balancer routes you on to a different path, that path may be broken: three paths work, five don't, so depending on exactly where you are going, you might get there, you might not.
AUDIENCE SPEAKER: To conclude Jen's comment, it's a chicken and egg problem: we have happy eyeballs, it makes it work, it makes it more complicated, I don't know what to do.
ANA CUSTURA: I think maybe we need the right option to come along.
BRIAN NISBET: Okay. In that case, thank you very much.
Our next speaker who is coming to us live over the Internet is Geoff Huston from APNIC and let's just ‑‑ that's ‑‑ we have Geoff and we have some slides and so Geoff will be asking the question: DNSSEC, yes or no? Please take it away.
GEOFF HUSTON: Thanks, Brian. My apologies for not being able to be with you this week, I would have liked to, I almost got there but family medical emergencies intervened, but here I am.
You know, over the last few years I have become fascinated by technologies that just don't quite work the way we thought, and you know, sort of if you look at that, what hasn't worked the way we thought it would, you'd think IPv6 would be the major sort of area of study and you would probably be right: 30 years later we are still running a whole heap of v4 and v6 is still just waiting, it's kind of how much longer. But let's move along a little bit, because you have heard about v6 this afternoon, to another absolutely fascinating topic which is the DNS, and this whole issue of how are we going with DNSSEC? Because quite frankly, it's not working the way we thought, and why isn't Google.com signed with DNSSEC, or Amazon or Facebook or Microsoft or a whole bunch of other domain names? If DNSSEC is meant to be so great, why aren't major Internet enterprises using it? Are they just ignorant, and the fact they are not using DNSSEC is really that they just haven't heard about it yet, maybe on Thursday we will get around to it? Or are there real reasons why they are not? And one group of folk are saying look, DNSSEC is actually really very, very useful, you need to do it; if you want to get a new gTLD through this ICANN process, you have to DNSSEC‑sign that top level name. So some folk are going yea, yea, this is great, just do it, but there's a whole bunch of others that aren't. Why? Why the mixed signal? And I am kind of fascinated with that. When we sort of look at this technology, you'd think that having a mechanism that helps when something says this is the answer from the DNS, and you kind of go yeah, really?, having a mechanism to go is that a lie or is that the truth, would actually be really helpful. You'd think everyone would want it because, you know, the Internet is a really hostile place, if we turn the lights off it will glow in the dark it's so bad.
So you'd think having some mechanism to actually say this is a good answer, we would all be going I will have five of those, this is fantastic. Yet, there are a bunch of folk, big enterprises, major ones, who sit there going yeah, but no. So, let's have a look at this a little bit.
What I want to actually look at is the case for and against. So what's DNSSEC in one slide? There you go, DNSSEC in one slide. Hope you are a speed reader because it's gone.
The case for no, I don't need no DNSSEC, and part of the issue is, despite all of us saying keys are easy, keys are hard, keys are really hard and, quite frankly, we have a whole lot of trouble keeping NS delegation records in the DNS properly tamed, so this whole issue of trying to deal with keys, wow, we trip over our feet constantly. There is a website there that logs the mistakes as they come and I don't expect that to go out of action any time soon. The mistakes just keep on coming.
How often should you roll your keys? You are a zone admin. Every six months because you need to keep the practice in? RIPE first tried that with keys on their reverse zones and they found every time they rolled the keys the query traffic just rose and rose and rose because there is a bunch of people out there getting pre‑canned configurations that had these keys built in and every time they changed their key they go, wow, I don't like the answer. So sometimes regular rolling gets you into trouble. On the other hand, if you roll infrequently like, say, every five years, like the root zone key, then every time you do it, it's a new adventure into the unknown, with new staff and new procedures and new ways to stuff it up.
So, in some ways, it's not easy to figure out how to actually do that kind of you need to do this regularly but not every day kind of thing. So, we are still not sure how to deal with and manage keys.
The procedure for passing part of my key upward to my parent, you kind of go it's just like an NS record, like a delegation record, you just hand it to your parent. We have built an entire industry over there in the registry, registrar, ICANN land about trying to get the NS records right and it's an incredibly complex process, EPP, the lot, and when confronted with well, let's just add DS records to this, everybody throws up their hands in horror: can't do it, really difficult, I don't know how. All of a sudden we are confronted with yet another problem. The process of doing it with semi‑automation and polling always strikes me as bizarre: whenever you get a mechanism where you can't do explicit signalling, the first thing one reaches for is unconstrained polling, probably the worst idea, so you sit there going have you changed yet, have you changed yet? It just seems so wasteful.
So, this doesn't work. And of course, there's a whole lot in DNSSEC to get warped, to get twisted and to mangle. We have talked about key management; what about zone signing, how long should it be, 20 years or two weeks? Should we sign the entire zone all at once? Which is fine if you have got five; if you have got a few hundred million this could be a problem.
Do you use front end signers instead? What's going on here? What about the TTL settings on all of these records? And that's just the simple zone management problems.
Very few folk run their own computers in the basement any more, they just put it out to a Cloud, we are a service‑orientated world, and what do you do with DNS? I hand it to a DNS operator and they will look after it. What if they fail? I will hand it to two, surely two is better than one. What about that single key that you are using with your front end signers when you have got five different operators all handling DNS zone publication for you, how do you manage that? How do you synchronise across all these providers, and the issues with multiple keys and DNSSEC and behaviour?
And then again, you know, just to make life even more fun, the wonderful mess that is NSEC3 which could only have been invented in the IETF, couldn't it? No one else would have dared to wander into such a space, which although it sounds simple, has become hideously complex.
So, you know, things are getting more and more complicated. And then we get into the next issue of incremental signing, because we invented these negative records, no data and no such domain, because the assumption was that online signers wouldn't work, we shouldn't be putting private keys out at the front end, all this stuff doesn't work, we should assemble the zone, sign it, and serve static material. That was ten years ago. We have moved on. And now we are actually seeing a huge amount of use of these incremental signers, where you push the answer out to the edge of your infrastructure and then attach the signature as the answer leaves. Which is great until you think about NSEC records and NSEC3 records, where somehow you need to know a bit more about the zone than just this name doesn't exist. And so inventive folk have got extremely inventive and started to invent compact forms of these spanning records of NSEC and NSEC3 which actually just span the one byte around the name that doesn't exist; they really are a synthetic form of saying no, not really, I am not going to tell you what is in the domain, I am going to limit this all the way down. And now we are actually talking about saying well, it's the same NSEC response whether it's no data, in other words the name exists but you are asking for a resource record type that isn't defined for it, or there is no such name out there, and we are trying to look at redefining those two kinds into a single one that says no, that's not there. And this compact form of denial of existence could of course confuse resolvers out there, because resolvers always get confused.
What kind of crypto do you need? If you are talking post quantum resilience, this is a very, very difficult answer, because all of a sudden if you try and stack things up and get more complex algorithms they just get bigger, and the larger they get, the harder they are to compute; the harder they are to compute, then the more time is spent. And all of this means that as we get these responses that get larger, larger keys etc., then all of a sudden UDP starts to fail out, these large responses stress out DNS over UDP, you start to get truncation, let's switch to TCP, more roundtrip times get burned.
And of course if you get anything wrong in the configuration, as always in the DNS there's this huge amount of cached content out there that isn't going to fix itself, you have to wait for TTLs to expire. The problem that Slack got into, October 21, when it found itself in the black hole of there is no such thing called Slack, had to wait for 24 hours to repair because of the TTL choice. If they had had a smaller TTL they could have resolved it in an hour, but no, we want long TTLs; oops!
Last but not least, when we talk about validation, very, very few resolvers out there in sort of user land, my phone, my laptop, whatever, actually validate; they just rely on the information provided by the recursive resolver somewhere upstream. Hang on a second. That path between the recursive resolver and me is entirely unprotected. If someone attacks in there, the answer is, sucks to be you, I guess, because there's no protection. The only thing that says the recursive resolver did its job is that it sets one bit going yes, I validated this, I have set the AD bit, on an unprotected path between the resolver and the stub. Talk about trust. This just seems to be naive trust, incredible trust and probably misplaced trust.
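To show how thin that signal is, here is a small Python sketch that reads the AD (Authenticated Data) bit from the header of a DNS message, and then shows that anyone who can modify the unprotected response can simply set it. The example headers are hand‑built for illustration and the helper names are invented here; only the header layout and the position of the AD flag come from the DNS specifications.

```python
import struct

AD_FLAG = 0x0020   # Authenticated Data bit in the 16-bit DNS flags field


def ad_bit_set(dns_message: bytes) -> bool:
    """True if the response header claims the resolver validated the answer."""
    _txid, flags = struct.unpack("!HH", dns_message[:4])
    return bool(flags & AD_FLAG)


def forge_ad_bit(dns_message: bytes) -> bytes:
    """What an on-path attacker could do to an unprotected response."""
    txid, flags = struct.unpack("!HH", dns_message[:4])
    return struct.pack("!HH", txid, flags | AD_FLAG) + dns_message[4:]


# Hand-built 12-byte headers: ID, flags, then four zeroed section counts.
validated = struct.pack("!HHHHHH", 0x1234, 0x81A0, 0, 0, 0, 0)     # QR, RD, RA and AD set
not_validated = struct.pack("!HHHHHH", 0x1234, 0x8180, 0, 0, 0, 0) # QR, RD, RA only

print(ad_bit_set(validated), ad_bit_set(not_validated))
print(ad_bit_set(forge_ad_bit(not_validated)))
```

Nothing in the message itself lets the stub tell the genuine bit from the forged one, which is the naive trust being described.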
So, validation is a problem, it's slow, it's error‑prone, it stresses out UDP and most end users don't even put those validation functions in the edge. If they did, every wi‑fi hotspot would be non‑functional. Let's rely on someone else to do, set the bit and go we are good which seems bizarre for a security product.
There's a really big question: all of this work, all of this effort, what's the threat? Well, the late lamented Dan Kaminsky came up with a very novel form of attack: you didn't need to be on the path to see the query, but with a certain amount of jarring responses back down to the target victim, you could actually make a random chance of getting the right answer inserted back in first and mislead the end user. This style of attack worked, but there are lots of ways of countering it, and the way we have chosen, oddly enough, is not that everyone should run DNSSEC, far from it; we just randomise the port number and randomise the upper/lower case of the query name, and check that the response has the same randomisation, and then an off‑path attacker trying to guess what was in the query in order to get in first with the wrong answer is largely countered. So that's not a threat, not as bad as it was. So, all that's left is on‑path direct attacks of response substitution. But even then, you see, my app and your app, my browser, your browser, everyone else basically can't count on DNSSEC; the whole reason why we are all using TLS is because the application can't count on believing the DNS, it doesn't believe that the answer it was given is necessarily the service it needs to go to. And so, the belts and braces approach says look, irrespective of what the DNS says, I am going to start up a TLS session and, you know, do the certificate dance, and basically if I don't get the certificate for this name, then I'm not going to actually go there. If I don't get that key signed by the right private key, this is not the authentic server I want to go to.
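The randomisation countermeasure can be sketched in a few lines of Python. This is a simplified model, not a resolver implementation: queries are represented as plain dictionaries, the function names are invented for the example, and the 16‑bit transaction ID is included alongside the source port and the mixed‑case query name because a response has to match that as well.

```python
import secrets


def randomise_case(qname: str) -> str:
    """Flip each letter to upper or lower case at random."""
    return "".join(c.upper() if secrets.randbits(1) else c.lower() for c in qname)


def new_query(qname: str) -> dict:
    return {
        "qname": randomise_case(qname),
        "source_port": 1024 + secrets.randbelow(65536 - 1024),  # random unprivileged port
        "txid": secrets.randbits(16),                           # DNS transaction ID
    }


def response_matches(query: dict, resp_qname: str, resp_dest_port: int, resp_txid: int) -> bool:
    """Accept a response only if it echoes every randomised value exactly."""
    return (resp_qname == query["qname"]          # case-sensitive comparison on purpose
            and resp_dest_port == query["source_port"]
            and resp_txid == query["txid"])


query = new_query("www.example.com")
print(query)

# The genuine server echoes the question and answers to the port the query came from.
print(response_matches(query, query["qname"], query["source_port"], query["txid"]))

# An off-path attacker who has to guess typically gets the case pattern wrong.
print(response_matches(query, query["qname"].swapcase(), query["source_port"], query["txid"]))
```

Each random element multiplies the number of guesses an off‑path attacker needs, which is why the attack is largely countered without DNSSEC; an on‑path attacker, who can see the query, is not affected by any of this.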
In some ways all that's going on is TLS is actually what's saving us, DNSSEC is kind of well, you know. So, inside all of that, why bother? The case for no, the reason why I guess, Google does it, the Bank of America does it and a whole bunch of others don't, is it just makes things slower. There's more to go wrong, we don't even know what we are protecting against and we are adding complexity, fragility and costs without understanding the benefit, or by the way TLS is working just fine.
Have I convinced you? Have we made the case? Because of course, like any good debate, there is a yes case, and the yes case is I suppose somewhat more subtle, because you can accept the fact that the threat model around the Kaminsky attacks, etc., isn't really compelling so in some ways DNSSEC, as it stands, is not the answer, but DNSSEC does one thing that we are kind of missing that would be good to have, because if we really could trust the DNS responses we got, if we could believe they are authentic and current, then we could use the DNS for roles that we don't really trust it with today. We could use the DNS to actually do things where, today, it would be really, really unwise. And I suppose you have got to start with the question: Is TLS actually working? Is TLS doing a really fine job? And it's a good question, because quite frankly, the whole issue of domain name certificates sounds truly, truly bizarre. I go off to somebody else and I probably never met them before, some third party, and I say to them that's my domain name and they say well give us money, Geoff, and I peel off some money or these days if I use Let's Encrypt, I don't pay them any money at all, and they then issue me with a digital object that says this is Geoff's domain name, what? Okay, great.
And then I present that certificate to everyone else going, this is really my domain name, they have certified my key pair, I have signed this with my private key, this is genuine. Sounds sort of weird. Because quite frankly, there's a trust model going on here which is pretty odd. There's about 1,000 or so, I think it's about 850 entities that are members of this certificate authority and browser forum, the CAB Forum, and they issue domain name certificates that are trusted by browsers. So if you are a member of this forum, then the certificates you issue are trusted. Well, great, but what's going on here? Well, this certificate issuer says I'm going to take some tests to determine if you control a domain name. What's that test? I will put a text record in your domain zone. If you put this text record in, with some value, then you obviously own the domain name. Okay. Sounds a bit suss. But what's the certificate say or what does the process say? It firstly says I will never deviate from this test and always give everyone the same test when they deviate, registration agents etc. Secondly they will undertake they never, ever, ever lie about the certificates they issue. Except when they do, oops.
And they undertake that the systems they have online to actually do these tests and issue these certificates are never, ever, ever compromised by hostile attack. Except when they are.
So it's kind of a weird trust model, isn't it? And quite frankly, any of those trusted CAs can issue a certificate for any domain name, and so I don't have to, if I want to imitate or get a certificate about your domain name, I don't necessarily have to compromise your certificate authority, the one you used; any certificate authority is fair game, I can knock off any of them and get a false certificate for your domain name and folk will believe it because pinning is a really difficult problem, so difficult that in Chrome, the Google name is actually in the code, the pinning is done directly for Google, the rest of us, ah, well, you have all got a problem. Okay, let's solve this problem by transparency so all CAs should stash a copy of the issued certificate so all issuance transactions are recorded for posterity or something. I have seen attacks that last for all about an hour or so, nobody cares about the footprints they leave behind, the damage has been exfiltrated, nobody cares about transparency logs, they are a mere tissue, a pantomime of good security here, they don't work. The whole thing about revocation is a nightmare, I pulled up a certificate, revoked it immediately and started to run a massive ad and the number is scary because quite frankly, almost nobody, Apple does but almost no one else checks for revocation of a certificate so even when the issuer and the subject say this is a bad certificate don't use it, everyone else goes no, certificates don't care and so they keep on using it. OCSP, none of this stuff works well.
So it leads you to the thought why do we trust TLS? And I think the answer is scary, because we don't have any alternative. There's nothing better out there. We trust it because there's nothing else to trust. It's a desperation rather than a well‑engineered system. Could we do better? And this is where this thought bubble about DNSSEC comes in, because what we are really trying to do with certificates is to associate a domain name, a service name, with a key pair, and if you can do that secure association without all the paraphernalia and theatre of certificates and just simply put it in the DNS and sign it with DNSSEC, you get everything that certificates offer, the X.509 system, without that added rigmarole that the X.500 system brings along; you get currency and authenticity, and they aren't a part of the story any more. So DANE, which is what we are talking about, allows a clean association of keys and names, we can do this, whereas the domain name certificate approach of attaching third party commentary has had all kinds of problems. But that doesn't get over the fact that DNSSEC sucks, it's just hell. And the real answer is, don't use it. Let's not use DNSSEC and supplying signatures and validation using UDP at all. Walk away from this, it doesn't work. And the answer is, well, can we do this over a different transport?
And of course, the RFC series these days is a lot like the Bible, everything is there, everything, anything to be said, anything to be contradicted, there's an RFC for it no matter what you are trying to do and so if you look at RFC 7901 you see a very cute idea of pushing the workload of validation from the client across to the server, and the server actually assembles the validation chain and just passes it over to the client saying look, use your copy of the root zone key signing key, the KSK, and you authenticate all these answers you would have got if you had bothered asking, if you can validate that, this is good, which is a really cute idea. There's another RFC as well, that talks about using that chain extension inside the TLS handshake. So instead of passing certificates, I just pass the DNSSEC record that says this is the key, this is the reason why this key and this domain name are bound together, use your copy of the root zone KSK to validate this and press on and all of a sudden there's been no DNS queries, it's queryless, DANE and DNSSEC can fold into TLS without all of the time, trouble and hassle that we have currently got with the third party certificate system. So, if we are serious about DNSSEC, we have got a whole bunch of problems that we need to fix and one of them is, maybe we are just not thinking about the problem right. Incremental queries and answers and DNSSEC do not work well together, and let's not try to make it, we are going to waste an extraordinary amount of effort not solving this so let's move on and away and start looking at DoT and DoH and TCP transports and trying to fold DNSSEC directly into the application levels that need that kind of information.
Now, we can do a bunch of things here and change a lot of this, but it's going to take time, money and effort, which is fine until you realise that I have never paid for a DNS query in my life and neither have you. No one pays for the DNS, they just don't. It's kind of this oh, well it's crud in my ISP service fee, whatever. And the folk who run open resolvers well you know, success disaster, the more I use them, the more they have got costs that they are not getting any money out of me. So in some way the economy of DNS actually works against the deployment of improvements in DNSSEC. And we might be stuck with UDP because of economics, not because of technology.
So, maybe we should rethink this completely and walk away from DNSSEC being an attribute of queries and answers and think about DNSSEC as a server side function right up in the application. And in the same way that a huge amount of the Internet is actually being reimagined as application level interactions, even QUIC is an application level system, not a platform or infrastructure system. And just simply think of DNSSEC in the same context as an attribute of applications.
Because as far as I can see, there's kind of this chicken and egg problem if we try and make DNSSEC work in the DNS, that it's really hard to make a compelling use case that makes DNSSEC in the query response model essential, it just doesn't work. And quite frankly, we can do it differently and put it on the server side, but that only works if everybody signs. We can do a much better job in creating service side authenticity and leverage DNSSEC to do so, but quite frankly, with no one signing, there's no point in doing it. So, in some ways the why bother and no point sort of things wrap around each other and self‑reinforce the problem that it's really, really hard to change things.
And when we talk about well, is DNSSEC going to fly or not? The issue is, this is a really big Internet, it's massive, and widely distributed diverse systems hate change, they just loathe it, change is challenging to orchestrate, costs and benefits are misaligned, they are incredibly averse to flexibility and agility and even if the certificate system is creaking, groaning and falling apart, that might still not be enough to convince the world to change, the benefits are marginal, the transitional costs are really, really high.
And I think I can generalise this a little bit, why has v6 taken so long? Did we get the design wrong, is it really, really broken? And I think the answer is no, not really, it's a fine protocol, if that was our first thought back in 1983 we'd probably go yeah right, works fine. It's not that it's v6 or DNSSEC, it's that once a technology sits in the middle and is the dominant technology at the time, it is awfully hard to unseat and when we talk about security and bolting on security as an afterthought in so much of the IP infrastructure, we kind of land up with the issue that yeah it's a really rough fit but we are actually getting precisely the amount of security that we are prepared to pay for, and that's a sad answer. I have got a couple of minutes left by my clock, so hopefully if I hand it back to the room, there is some time for questions or bricks or anything in between. Thank you very much.
WOLFGANG TREMMEL: Thank you. I see three people lined up at the microphone and since we are quite short on time, I'm closing the microphones right now. So can you ask your questions but the microphone queues are then closed.
AUDIENCE SPEAKER: Max Tulyev from Netassist. The question is, if you compare on slow satellites DNSSEC and DNS over TLS, what will be faster and what approximately their rate, how much faster, thank you?
GEOFF HUSTON: DNS over TLS versus DNS over UDP works best in the stub to recursive resolver where the same session can be reused. So, it takes time to set up TLS, oddly enough DNS over QUIC is faster, so once you have set that up, and you are then using that same security association to ask again and again, it's much the same overhead, so in the stub to resolver case, continuously use session it makes no difference between you use UDP or TLS or QUIC even to actually handle the sequence of queries. In the recursive to authoritative server situation all bets are off, all bets are off because there's no single session that can get reused and so recursive resolver to, TLS, QUIC, they all become a little bit scary, scary because they just take up time and resource. Thank you. I will be quicker next time, I promise.
WOLFGANG TREMMEL: There's one online question from Shane Kerr:
"The concerns about compact online signing are overstated, several vendors have been using it with separate implementations for many years, it's different but it works."
It's a remark instead of question.
GEOFF HUSTON: Fine. I am well aware of Cloudflare's work; I think it's brilliant.
AUDIENCE SPEAKER: ISC, I think you raise good questions that we have to think about, but I also want to say that the case for no was sketched too dark: key management, key roll‑overs, there is Open Source software, BIND, Knot DNS, where it's turning on an option and it works for you, it works for 90% of the people here and yes, so when I think of DNSSEC is hard, it's hard on protocol level but really on operational level it's not that hard, it's very easy.
GEOFF HUSTON: I have so much sympathy for the good folk at Slack and Amazon who managed to kill Slack for 24 hours when they signed their zone. There's so many bits that just don't work right and unfortunately it's so easy to get it wrong, so Matthijs, yes, I would like to be confident too, I would love to be confident, it would make my night, but the experiences we have tend to say we still haven't got it quite right yet but we live and learn.
JIM REID: Another fellow traveller on the DNSSEC bus. Fantastic talk as always. I could probably rant to this microphone for as long as you have talked today about all the defects and problems with DNSSEC, I realise I am up against the clock, so I am going to keep it fairly short. I think one of the things that needs to happen now in my view is time for the IETF to think properly about this. I think we are at this stage now where DNSSEC is a stinking mess, it's never going to get deployed universally, give up, it's never going to happen and we should stop wasting time in the IETF trying to make it better and easier to deploy, the tooling is never going to happen there and never going to work because the protocol is fundamentally broken and the main reason is nobody gets the benefit of signing because the benefit is for others, if you sign you take a risk, if you are on the ‑‑ you are taking the risk so that's never going to work out, I think this time we have to sort of say stop this nonsense and perhaps rely on just using encrypted DNS transport such as DoH perhaps, or DoQ. DNSSEC is a dead, dead duck.
GEOFF HUSTON: I have a very quick response to that, Jim, and it kind of goes: This talk was never going to happen in the IETF, it was only going to happen in a room of network infrastructure operators, it's not the IETF's decision to say no, it's a standards body, they are addicted to saying yes, they can't resist, they will always standardise stuff, no matter what. It's actually the infrastructure operators who have the call on where and how they spend their money and the way they define their service. This is the room and the people who honestly need to think this through. So, this talk is to you, audience, and to you, folk, who run infrastructure going, think about what you are doing and what your customers want and if DNSSEC isn't working, don't flog the dead horse, move on. And I think in that, I agree with you completely.
JIM REID: Thanks, Geoff. I want to make one quick follow‑up remark, it's a pity you cannot bring this discussion into the IETF because I think it has to take place there and it's a crying shame that can't happen. It's an entirely different argument but I think that's an important thing that has to be borne in mind, too.
GEOFF HUSTON: Thank you all for your attention and bearing with me online. Thank you.
WOLFGANG TREMMEL: Thank you, Geoff. Quick reminder, please rate the talks and also the ‑‑ you can still put your name forward for the Programme Committee elections.
Next speaker is Randy Bush. Is he online? Randy is going to talk about RPKI ecosystem measurement. Randy, the stage is yours.
RANDY BUSH: First, I want to continue the conversation of a moment ago and note that there is a higher percentage of operators and operator presentations in RIPE than at the IETF and there is a higher percentage of researchers and research presentations at RIPE than at the IETF and there are no vendor presentations so far this meeting, and I don't miss them, thanks to the RIPE Programme Committee, RACI and to the community.
So anyway, so this is a story about the RPKI, as usual. Romain had to present this at the active measurement conference and an academic audience so first he had to describe the RPKI, and the RPKI is another example of complexity that I have said for some years, it started mostly ‑‑ started simply for most things, if we want to think of our general structure, the classic three planes except I didn't take a plane to get there, I am coming over the Internet so there's a management plane management protocols, we drive configuration down it, and RPKI and ROV are one aspect of configuration. And so they drive down the management plane, they affect how BGP happens and then they affect the payload and the payload is important because that's what the customer is paying us for, so we use the RIRs and the RPKI to maintain our data and it reaches all the way down to the data plane to move the packets. But what happens when something goes wrong up in the management plane and the RIRs and I put in a bad ROA, how long does it take to fix it?
Well, let's think about how to measure this, on the left we see the management plane, on the right the control plane and the data plane on the bottom and the ‑‑ we can take timing from, we call this the query, the user tells the RIR, hey, create a ROA, please, the RIR, we can measure when it signs it from the signing time, the publication from when it appears at the publication point, when validators get it out on the Internet, when BGP is affected, we see it at the route collectors and then on the data plane we can actually traceroute. So, being experimentalists, we get prefixes from all five of the RIRs, some /24s in v4, some /48s in v6, and announce them from one AS, which has both ROV upstreams and direct peers which are non‑validating, and as a separate experiment we get a bunch of prefixes from RIPE that we use from different ASs over on the other side of the world that are all in the non‑validating universe, and we measure for eleven months. Well, what do we measure? Well, we use something we call ROA beacons and so we ‑‑ for each of the five RIRs, depending upon their user interface, we either get to use an API or we use screen scraping to create and delete ROAs. We divide the prefixes into two groups, control and test. The controls always have good ROAs, those prefixes are always valid. The test prefixes always have an invalidating ROA, locked down all the time, and then, half the time we announce a validating ROA and if you are an RPKI freak you know that a validating ROA overrides the invalidating ROA.
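The remark that a validating ROA overrides an invalidating one follows from how route origin validation is defined: a route is valid if any covering ROA matches it, and invalid only if it is covered and nothing matches. A minimal sketch of that rule is below. The prefix, the AS numbers and the use of an AS 0 ROA as the "locked down" invalidating ROA are assumptions for illustration, not details taken from the experiment.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Roa:
    prefix: str       # e.g. "192.0.2.0/24"
    max_length: int   # longest announced prefix length the ROA authorises
    asn: int          # authorised origin AS


def validate(route_prefix: str, origin_asn: int, roas: list[Roa]) -> str:
    """Classify a BGP route as 'valid', 'invalid' or 'not-found'."""
    route = ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        # Only ROAs of the same address family that cover the route count.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"   # one matching ROA is enough
    return "invalid" if covered else "not-found"


# Hypothetical beacon: an always-present invalidating ROA, plus a validating
# ROA that is only there half of the time.
locked = [Roa("192.0.2.0/24", 24, 0)]
validating = Roa("192.0.2.0/24", 24, 64500)

print(validate("192.0.2.0/24", 64500, locked))                 # invalid
print(validate("192.0.2.0/24", 64500, locked + [validating]))  # valid
```

Creating and deleting only the validating ROA therefore flips the test prefix between valid and invalid, which is what makes the beacon observable in BGP.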
So, let's see what happens here.
We create ROAs, we see when they sign it, the not before time, the publication time, when relying parties get it and when it appears in BGP, this is ROA creation. First we notice that APNIC has a 20‑minute batching time so the mean is 10 minutes. The parentheses are IPv6, okay?
The publication time of ARIN and APNIC ‑‑ pardon me, ARIN and LACNIC, are greatly skewed, something is going on here. It turns out, and I will go into more detail, that the hardware security modules used to sign the ROAs, being normal infrastructure for grown‑ups, are doing everything in GMT, but ARIN and LACNIC are running their servers in local time. So you get a monstrous skew, where the not before is hours before the publication. Now, we noticed this after a while, we told them and they hacked around it, but this greatly skews their data. We see for instance, AFRINIC, within 15 minutes it's in BGP, RIPE about the same, this isn't bad. Okay.
So, the ROA creation delay is significantly different across RIRs and we know at least one NIR and we didn't measure NIRs, that only publishes once a day, and originally APNIC only committed to once a day but now they batch 20 minutes, so the relying party we used was unfortunately only one instance of software and you should check out Phillip Smith's measurements on how relying party software varies and it is not pretty. As Geoff says, we are running it in a pretty broken universe. We did not test RPKI to router performance because that felt like DNSs and therefore those measurements would taken effectively zip, so that's fast.
Some relying party software has head of line blocking, in other words it is going along and it hits a publication point that it can't reach and it stops dead until it times out and the other publication points are not fetched until the timeout has happened.
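The cost of head of line blocking can be sketched with a toy timing model that compares fetching publication points one after another with fetching them independently. The fetch times and the 300-second timeout are made-up numbers, and `None` stands for a publication point that cannot be reached.

```python
def sequential_finish_times(fetch_seconds, timeout):
    """Finish time of each publication point when they are fetched in order."""
    clock, finished = 0.0, []
    for cost in fetch_seconds:
        # An unreachable point (None) holds up everything behind it until timeout.
        clock += timeout if cost is None else min(cost, timeout)
        finished.append(clock)
    return finished


def concurrent_finish_times(fetch_seconds, timeout):
    """Finish time of each publication point when every fetch runs on its own."""
    return [timeout if cost is None else min(cost, timeout) for cost in fetch_seconds]


# Hypothetical run: the second publication point is unreachable.
points = [2.0, None, 3.0]
print(sequential_finish_times(points, timeout=300.0))   # [2.0, 302.0, 305.0]
print(concurrent_finish_times(points, timeout=300.0))   # [2.0, 300.0, 3.0]
```

In the sequential case the healthy third publication point is only refreshed after 305 seconds instead of 3, purely because it sits behind a broken one.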
The collectors at RIPE RIS, they are great. If the control prefix is missing, we throw away the measurement so we get the BGP effect. We only used RRC00 and RRC01 and studies have shown that's enough. RIPE RIS has had all the biases for years, just like leaking your earlier prefixes in an earlier presentation today.
ROA revoke delay for a ROA, APNIC has had a stucky interface and the 20‑minute delay which skewed their data greatly.
Click, clicky click, there we go. So we can see the ‑‑ we look at the CDF of BGP update, for a create, and we see the time zone bugs for LACNIC and ARIN. I have told you about the time zone bugs. Run your infrastructure in GMT, this is a global Internet. Withdraws are slower than announces, okay, and we will notice two interesting anomalies here, first, right shift, but a slow slope up to closure and there's two things happening:
The reason withdraws are slower is most of the big ISPs have multiple relying party caches. For a ROA to validate, only one of the caches needs to have a successful ROA. For a withdraw to work, it must have withdrawn from both of the caches. So, the polling skew of the caches skews the delay time for withdraws.
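The asymmetry can be put as a minimum versus a maximum over the caches' next polls: an announce takes effect at the first poll by any cache, a withdraw only at the last. The sketch below assumes each cache polls on a fixed interval with its own start offset; the 10-minute interval and the offsets are illustrative values, not measurements from the study.

```python
def next_poll(event_time, interval, offset):
    """First poll at or after event_time for a cache polling at offset, offset + interval, ..."""
    if event_time <= offset:
        return offset
    polls_elapsed = -(-(event_time - offset) // interval)  # ceiling division
    return offset + polls_elapsed * interval


def announce_effective(event_time, interval, offsets):
    """A new ROA takes effect as soon as ANY cache has fetched it."""
    return min(next_poll(event_time, interval, o) for o in offsets)


def withdraw_effective(event_time, interval, offsets):
    """A withdrawn ROA stops having effect only once EVERY cache has dropped it."""
    return max(next_poll(event_time, interval, o) for o in offsets)


# Two caches polling every 600 s, five minutes out of phase; event at t = 100 s.
offsets = [0, 300]
print(announce_effective(100, 600, offsets) - 100)  # 200 seconds of delay
print(withdraw_effective(100, 600, offsets) - 100)  # 500 seconds of delay
```

With more caches or more skew between them, the gap between the two delays grows, which fits the right shift seen for withdraws.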
So, here we look at Sprint and NTT with their consent. You will see the publication lags for the signing problems, we will see the slower withdraws, we also see the relying party software has a stuck ROA bug, if you have seen papers and presentations on BGP stuck prefixes, well, we have publication points and relying party software with stuck ROAs.
So, BGP update delay again showing slow pollers between these different ISPs, here is a range of ISPs, some have slower pollers and you see the stuck ROA, that it never reaches one in some cases. Okay? The stuck ROA bug again, this is a withdraw, okay, as opposed to an announce. And the data plane measurement, traceroute from Atlas probes to the test prefixes: interestingly, the results are similar to BGP, in other words the control plane does seem to be in sync with the data plane. The one cute thing is for those who know path hunting, path hunting is my router really has, depending upon how many peers it has, multiple paths for the same prefix, if I get a withdraw or an invalidation on one of them, I fall back to the second and similarly to the third and fourth, that's called path hunting, and here, the red vertical lines are where the validating ROA has been withdrawn and we see that we accept for a few minutes some other paths, that's the only thing interesting we learned on the data plane.
So, we have two major delays in fixing the bug where I issued the bad ROA, the delay at the RIR's publication to the publication point and the ISP delay getting it into BGP.
The polling delay of ISPs is big. If we assume time zone anomalies are fixed, let's not worry about those two and let's just look at AFRINIC, where the majority of the time is between publication and the relying party, the same for the APNIC prefixes, the same for the RIPE prefixes, so what we are seeing here is the ISPs are polling infrequently. This is a historical artefact because when there was only rsync, the RIRs worried about the loads on their servers, but now RRDP dominates, we don't need that fear. So, the problems we have are: BGP propagates in minutes, not as fast as we might like but it's still pretty darn fast; RPKI is significantly slower, which kills both, or damages both, time to repair and time to authorise a DDoS mitigator and things like that. We found two HSM problems, head of line blocking on some bad relying party software, and also, I am not going to discuss it here, when we dissected the ROAs between the RIRs we found some interesting differences. The limitations in the study were the relying party software, we only used one example and it wasn't a very nice one, it had a fixed rate so we couldn't vary it to see the resolution. We didn't measure RPKI to routers but that should be fast, my family has joked about "should". We also did not measure delegated CAs, like big ISPs etc. And between the RIRs, how to automate issuing and withdrawing ROAs was hell. The researcher who did the heavy lifting on this was Romain and he says he never wants to do an Ops‑heavy experiment again, we just lost too much time doing this.
Okay. This was a paper at PAM. Again the full author list.
And of course, this is an operator forum, that's nice research but what can we do? CAs and RIRs, please publish very frequently. Relying parties, poll at least every ten minutes or more frequently if using RRDP, because caching and the If‑Modified‑Since header mean the load on publication points is negligible, don't be afraid of hammering them.
Not too frequently if it's rsync, maybe 30 minutes and yes, this discourages rsync but we are trying to deprecate it and you can tell whether it's rsync or RRDP by the URL from which you are fetching and by you, I mean the relying party software.
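As a rough illustration of that advice, relying party software could choose its polling interval from the URI scheme and make RRDP fetches conditional. This is a minimal sketch: the function names and example hostnames are hypothetical, and the intervals are simply the ten and thirty minutes suggested above.

```python
from typing import Optional
from urllib.parse import urlsplit

RRDP_POLL_SECONDS = 10 * 60    # RRDP runs over HTTPS and is cheap to poll
RSYNC_POLL_SECONDS = 30 * 60   # rsync is heavier on the publication point


def poll_interval(uri: str) -> int:
    """Choose a polling interval from the scheme of the publication point URI."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme == "https":
        return RRDP_POLL_SECONDS
    if scheme == "rsync":
        return RSYNC_POLL_SECONDS
    raise ValueError(f"unexpected publication point scheme: {scheme!r}")


def conditional_headers(last_modified: Optional[str]) -> dict:
    """Headers for an RRDP fetch; an unchanged file can then be answered with 304."""
    return {"If-Modified-Since": last_modified} if last_modified else {}


# Hypothetical publication point URIs.
print(poll_interval("https://rrdp.example.net/notification.xml"))  # 600
print(poll_interval("rsync://rpki.example.net/repository/"))       # 1800
print(conditional_headers("Tue, 23 May 2023 12:00:00 GMT"))
```

When nothing has changed since the previous poll, the publication point only has to answer with a short "not modified" response, which is why frequent RRDP polling costs it so little.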
Okay? As protocol designers, BGP is the only large scale push protocol we have and I agree with Geoff, that poll protocols suck and money sucks but it's what we have got. BGP as a transport, unfortunately if we try to push RPKI data over it, it's a dangerously shared fate and by that I mean, the purpose of the RPKI is a parallel set of data which are meant to check each other, if you use one to transport the other, it's called shared fate and among other dangers of BGP, is it is utterly unordered, reordering at every hop is pretty much guaranteed. We did try back in '98 to design a system with DNS for it, that draft is still in the repository somewhere; the problem is, make before break, if I'm moving ‑‑ if I have connected to Geoff's ISP right now and I have a ROA for it, and I want to connect to Illich's, I have to issue a ROA for Illich's, I make before break, I switch the circuit and take away Geoff's ROA, but in DNS I can't have two delegations. Also DNS is again a pull protocol. We fantasise about that for the management plane that is immune to random attacks. I fantasise about world peace and ending world hunger, the IETF doesn't seem to be a great place to put big hopes in about this.
And a warning about what is happening in the IETF and it's a parallel to a warning Geoff made, which is: If you remember, Bert, five years back, did the rise in DNS complexity; well, 18 years before that I made a presentation at the IETF Plenary, it was the same presentation except it's a horse instead of a camel and the DNS, and we are overloading it and we continue to overload it.
So, I want to ‑‑ one of the most famous computer scientists of the last 100 years, Edsger Dijkstra, says don't be confused by the complexity of our own making and this is a big warning to the glorious IETF. Thanks to these people, rack space, bandwidth, routers, switches, Atlas probes, etc., etc., and that's it. Thank you for your indulgence.
BRIAN NISBET: We try to discourage position statements masquerading as questions.
AUDIENCE SPEAKER: Tray Darley, Accenture Security. Loved the talk, just an important, though, question: Where do you see the role of standards organisations in the future, like so much of the standards work seems to be happening in code, de facto, and defined in small cabals from what I see, I am not such a player in the IETF but I have been in some other standards organisations and I think you touched on a valid point, how do you see standards organisations evolving or what role do you see them playing in the coming decade or so?
RANDY BUSH: We need some place to generate complexity. Operators are too lazy and want things too simple.
BRIAN NISBET: Okay.
AUDIENCE SPEAKER: David Lawrence, Salesforce. I appreciate that this is an operator‑first community and I have an enormous amount of respect for Geoff and for Randy and for Jim, but there's quite a lot of negativity here for the IETF right now, I would just like to point out that in the next session we will be having an IETF BoF to better explain how we can be working together rather than at loggerheads with each other, the IETF very much does need operator input on how to make things better, I would just like to kind of veer us away from everything is terrible about the IETF.
RANDY BUSH: You are speaking to someone who sat at a different table and had one of IPv6's major designers look him in the face and say "operators don't even understand logarithms".
ROBERT KISTELEKI: Pretending for a moment that I don't work for the NCC, there's a slight difference between IPv4 and IPv6‑related timings in your slides. Can you offer an explanation for that?
RANDY BUSH: No.
ROBERT KISTELEKI: Thank you.
BRIAN NISBET: Okay.
RANDY BUSH: Interesting question, of course, no, no we didn't dive into it, I'm sorry.
AUDIENCE SPEAKER: Tim NLnet Labs. So, thank you all for doing this research, I think it's really valuable to see what the propagation times are that we are looking at, which, you know, have an impact on DDoS, have an impact on mistakes you make with ROAs. I will get to my question or statement:
I think it's really good if there is a strong indication made about what is an acceptable delay in RPKI validation because the thing is that I asked this question a couple of years ago and I didn't really get a clear answer on that. When I used to work for RIPE NCC, I worked on fully asynchronous RPKI validator that doesn't block but the cost is you get a huge amount of complexity and if the story is half an hour is good enough, then you are going to get code that does the simple thing, it does sequential processing, it blocks because it's easier to reason about, it doesn't make it right. If there's a strong message that no, this needs to be faster and needs to be resilient and needs to be not block because some repository is being a bad actor, which to my mind is well, pretty clear then they shouldn't. I think that message will help because that will guide relying party implementers to make that. And the story is no it should be simple to reason about and just do this thing eventually but do it well, then you get a different solution.
RANDY BUSH: I hear your point, Tim, and I think, as an operator, I would like it to be on the scale of BGP. That's not necessarily just two minutes end‑to‑end, but could we get it to 10 to 15, or somewhere that is reasonable, where a mistake is acceptable, and secondly I believe you can do relying party software that doesn't have head of line blocking, with reasonable software engineering, and I think that the people with the broken publication points that are causing that software to get head of line blocking are not stupid people, it's just operational mistakes, it's hey, I need to protect my publication points so I put it behind a firewall and some klutz changes the firewall rule until we catch it, you know, and whether that's the firewall of an institution or a firewall of a rather large country, sorry to rant on. But I hear your point, Tim.
BRIAN NISBET: Okay. There is nobody else at the mics so I think that's that. Thank you, again, very much Randy.
Okay. So, that concludes our session, coffee break, I would ask you again to rate the talks, if you suddenly wish to nominate yourself for the PC, you have 120 seconds, give or take, and yeah, enjoy your coffee break, thanks very much, folks.
LIVE CAPTIONING BY AOIFE DOWNES, RPR