MAT Working Group session
25 May 2023
STEPHEN STROWES: Good morning. Welcome to Thursday the 25th May and welcome to the RIPE 86 session of the MAT Working Group. My name is Stephen, I am one of the Chairs of the Working Group. We have in the room, Massimo, one of the other Chairs, and we have on Meetecho, Nina. This is a hybrid meeting, so if you're on Meetecho, welcome. If you're in the room, also welcome.
We have two of our speakers online today and we have three of our speakers in the room, I think that's a good balance for a hybrid meeting and I'd like to see that continue as we go forward. Online is good, on site is also good.
This is our agenda for this morning. We received more submissions than we can accept, which is a good position for the Working Group to be in; I think we accept around 50% of what we receive and we reject around 50%. That is fairly healthy, I think, for the community, and I think it puts us in a good position for sharing work between network researchers and network operators. More submissions are always good, more interesting submissions are always good. We will be courting you for future submissions as we head into the next meeting as well.
In this session we're going to cover a range of topics and we have a full agenda, so really I should be moving off the stage. We have six speakers in all, and our first speaker this morning is going to be Marco. Marco is a professor at the Sapienza University of Rome. His main research interests are in the field of traffic engineering and routing optimisation, software networking and network programmability. Marco, the floor is yours.
MARCO POLVERINI: Hi, hello everyone. Can you hear me? Okay.
Thank you for the introduction. So, the title of my talk is a digital twin based framework to enable what-if analysis in BGP optimisation.
So, I want to thank the organisers for including my talk in the agenda. I am very pleased to give this talk today and a bit sorry for being online rather than present in person.
This is the outline of the presentation. We will start by giving some motivation, then I will present the digital twin technology, we will see some use cases and an overview of the proposed framework, and finally a proof of concept.
So, the common approach used to configure and to optimise the configuration of BGP is based on the so-called tweak-and-pray approach. Generally, it means that you change your configuration in order to achieve a given objective, but you have no clue what the final outcome is going to be, and you don't find out until you actually apply the change. So, of course, this paradigm can create network instabilities, it can cause performance degradation or even service disruption.
Wouldn't it be great to have a means of knowing the effect of a BGP configuration change before actually applying it, so as to be able to predict the outcome we would get?
This is something that is known as "what-if" analysis, and in different engineering sectors the execution of what-if analysis is enabled by the digital twin concept. In this talk I'm going to propose a framework that allows you to create, manage and operate the digital twin of a BGP ecosystem, and so enable what-if analysis.
What is a digital twin? It's basically a faithful replica of a physical system. For instance, here we have a BGP system composed of four different autonomous systems, each of which has its own digital twin. The DT of an autonomous system can be represented by an emulated network environment that replicates the physical elements.
Very important for a digital twin is the continuous synchronisation it has with its physical counterpart, meaning that the digital twin has to faithfully replicate the current status of the physical twin.
Plus, we also need to interconnect all the digital twins of the different autonomous systems in order to get the digital twin of the inter-domain BGP ecosystem.
So, why can it be useful to have such a digital twin? As I explained in the introduction, it enables what-if analysis. Let's see some use cases that, in our opinion, can benefit from the availability of this framework.
The first one is balancing the incoming traffic. Let's assume that this autonomous system here wants to load-balance the incoming traffic sent to this prefix, which currently is unbalanced over the two different connections it has with its two neighbours. A possibility could be to perform AS path prepending, but we don't actually know what the final distribution of the traffic is going to be once we apply the configuration change over the physical system. In this case, we can apply the configuration change over the digital twin and get a prediction of the outcome we would obtain.
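To make the prepending use case concrete, here is a toy Python sketch of the decision the what-if analysis would let you preview: a neighbour's best-path selection preferring the shortest AS path. The function and route format are hypothetical illustrations, not part of the presented framework.

```python
# Toy illustration of AS-path prepending: a neighbour picks the route
# with the shortest AS path, so prepending our own ASN on one session
# shifts incoming traffic to the other session.

def best_route(routes):
    """Pick the route with the shortest AS path (ties: lowest neighbour ID)."""
    return min(routes, key=lambda r: (len(r["as_path"]), r["neighbor"]))

# Before: both upstreams see an equal-length path via AS 64500.
before = [
    {"neighbor": 1, "as_path": [64500]},
    {"neighbor": 2, "as_path": [64500]},
]

# What-if: AS 64500 prepends itself twice towards neighbour 1.
after = [
    {"neighbor": 1, "as_path": [64500, 64500, 64500]},
    {"neighbor": 2, "as_path": [64500]},
]

print(best_route(before)["neighbor"])  # -> 1 (tie broken by neighbour ID)
print(best_route(after)["neighbor"])   # -> 2 (shorter path wins)
```

Trying this change in the digital twin rather than in production is exactly what the what-if analysis provides.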
Another use case could be this one, where this autonomous system here wants to reduce the latency to reach this specific prefix. Currently the path it uses is this one. So it is wondering whether, to achieve this objective, it should create a new connection, perhaps towards an Internet Exchange Point, or maybe simply buy another transit from another autonomous system. Once more, in the physical system you don't know the answer until you actually set up the peering connection. On the other hand, if you have the digital twin enabled, you can simply try out the effect of creating a new connection.
And another interesting use case is avoiding misconfigurations. Let's assume that this autonomous system, which has two different upstream providers, is changing its configuration. For whatever reason, maybe a fat-finger problem or something else, it starts forwarding updates that it receives from this autonomous system towards this other one, and in this case the full Internet routing table is sent. Once more, if you have the digital twin technology, before applying the configuration change over the physical system you can test it over the digital twin and see whether the configuration affects the correct working of your network.
So, this is an overview of the framework that enables the creation of the digital twin. It is mainly composed of three elements: the digital twin manager, the digital twin itself and the communication module.
As you see here, this is the physical system. Each autonomous system has a digital twin manager that is in charge of managing the lifecycle of the digital twin, which means creating it and continuously synchronising it with the physical system. Then there is the digital twin of the single autonomous system, which basically is represented by an emulated network environment continuously synchronised with the network. As you see, all the different digital twins, which are now disconnected, need to be interconnected, and for this reason there is another module here which is in charge of connecting the different digital twins in such a way that the digital system replicates the configuration of the physical one.
We have tested our solution in a proof-of-concept scenario where basically everything is running inside a single host machine: both the physical system, which is this one, composed of four different autonomous systems, and each digital twin. So, in this case we are not running it in a real network environment.
The objective of this proof of concept is to load-balance the incoming traffic inside AS 4, and specifically it wants to change the routing path followed by the traffic originated by this prefix and sent to this prefix, and the technique it's going to use is AS path prepending. Initially the traffic passes over this path, so it is received through the ethernet zero interface. The framework creates the digital twin of every single autonomous system, and you have to imagine that here there is a horizontal block that interconnects the different digital twins.
In the test we have performed, this is a tcpdump capture done on this interface here. As you see, at the beginning the traffic originated by the red prefix, which is this one, and sent to the yellow one is received through the upper path, so through the ethernet zero interface. After that, the network operator performs the what-if analysis by applying this configuration change, not over the physical element but in the digital twin. So he is basically running a what-if analysis.
And as we see in this other figure here, which is instead a tcpdump capture performed over the ethernet one interface, after the digital twin has converged we start receiving traffic through the lower path, so passing through this other path, meaning that the AS path prepending technique in this case is able to achieve the objective the operator wanted.
So, this concludes my talk. I am open to answer questions.
MASSIMO CANDELA: Thank you very much Marco. Thank you for your presentation. So do we have questions?
SPEAKER: Okay, I have a question. So last time they told me that I have to come here. So which mic?
So, Massimo Candela. You presented this system, but I have a few questions. The part where you say emulated network: do you also have the software framework, do you provide that, do you have a link that people can use, or is it using something that is known to exist? Can you explain that a bit better?
MARCO POLVERINI: Yeah, yeah, of course, thank you very much for the question. Yes, we rely on an already existing tool which is freely downloadable and available on the Internet, which is the Kathara network emulator. Kathara is a software based on Docker containers, and each Docker container runs software implementing a software-based router, so it can replicate the control plane of an IP router. So, yes, it's free and the community can use it.
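For readers unfamiliar with Kathara, a lab topology is described in a lab.conf file along these lines. This is a hypothetical sketch with made-up machine and collision-domain names; check the Kathara documentation for the exact syntax.

```
# lab.conf: two routers connected through collision domain "A",
# each also attached to its own LAN ("B" and "C").
r1[0]="A"
r1[1]="B"
r2[0]="A"
r2[1]="C"
```

Each machine then gets its own startup script and routing daemon configuration, which is how a digital twin of an AS can be populated.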
MASSIMO CANDELA: We had Randy in the queue, but he disappeared now. Randy is back. Go with the question.
RANDY BUSH: Randy Bush, IIJ and Arrcus. I think he just answered my question, which is: what if I have another router vendor with a different CLI? And I guess the answer is you do not really go down to the CLI level.
MARCO POLVERINI: No, no, we're not going to the CLI level at this moment. Anyway, that's a good point, thank you for this question. The framework, at this moment, is creating the digital twin of an autonomous system relying on this Kathara framework, but in general you can think of changing it and perhaps using a different type of emulation like GNS3, which allows you to use images of real routers. So if you are using a Cisco router or a Juniper router, as soon as you have the image of the router you can directly use that framework to emulate your infrastructure, and in that case you have the reliability of all the features of your real infrastructure as well as the CLI.
AUDIENCE SPEAKER: SIDN. As I understand it, you have built a very interesting framework for automatically running tests on a digital twin of a BGP setup. Did you also allow for automatically collecting results and presenting them in a format that can be automatically parsed, for instance?
MARCO POLVERINI: Good question, thank you for asking it. At the moment everything has to be run manually; in the proof of concept that I showed you, we are basically acting in the CLI of one router, issuing a ping on the other, and interpreting the result. We can imagine automating this process, of course it requires time, and we can also imagine creating a graphical user interface to interact with the tool, perhaps creating a catalogue of different what-if analyses to perform if you are interested in one specific use case or another, and we can create templates for showing the results of the tests. For the moment, the tool does none of this; you have to do everything manually.
AUDIENCE SPEAKER: David Schweizer. I work for NetDEF. Do you have any provisions in this framework to capture traffic from your real-world application, on one end and another, and then replay it in the digital twin and analyse whether there is any difference?
MARCO POLVERINI: Okay, good question as well. What I have to say is that, of course, the digital twin concept is very fascinating, the idea of having a replica of your physical system, but we cannot imagine that everything happening in the real system is also happening in the digital twin, because they work at two different time scales. The resources the digital twin has are much lower than those of the physical network; it cannot replicate all the traffic that is in the physical system in the digital one as well. So we have to rely on some sort of high-level description of the status of the physical infrastructure. For instance, we can imagine that, together with the emulation framework, there is also a simulation framework based on synthetic data: instead of having traffic flowing over the digital twin, you have a high-level description like the utilisation of all the links, or the current delay, and so on and so forth.
MASSIMO CANDELA: Thank you very much. Thank you very much Marco.
And now we go to the next presentation. He is a well-known contributor to our community, Romain. He is the deputy director of the IIJ Research Lab, and he is going to present the Internet Yellow Pages, a resource that many of you already know.
ROMAIN FONTUGNE: Thank you very much. And thank you for having me. So, today I will present the Internet Yellow Pages. This is a collaboration with different people, but it really started with a discussion with my close colleague from the RIPE NCC, Emile Aben, when we realised that in our research we are constantly asking the same questions. I'm sure that people in the room, whether you are a researcher, a network operator, a peering coordinator, a policy maker, a regulator, whatever, all have these questions where you wonder: what is that AS? What is this prefix? Who announces it? How does it propagate? Do we have Atlas traceroutes for it? Is it in RPKI? Is it seen at IXP route servers? And for all those questions, we usually get answers from a lot of different places. What is very nice with our community is that a lot of groups publish nice datasets, so we can go to RIPEstat, PeeringDB, CAIDA, Whois, and find some piece of this information. The datasets are really nice. The problem, though, is that it's very laborious, and we found that we all have these scripts that take data from PeeringDB and merge it with some data from CAIDA, and we thought there should be a way to do better than that.
So we need a tool, or a place, where we can have all those datasets, and it should be open to anyone: everyone can use it, contribute, and add new data to it.
Now, to do that, in the design we found that there is a spectrum. It could be just a simple GitHub repo where people dump data, but this is not very useful because it's still very hard to use. And on the other end of the spectrum, you could think of a database where you can query your data, but here the problem we found is that for a traditional relational database it's quite hard to do, because you have to define the schema for all your datasets, and any time you want to add a new dataset, you have to rethink the schema. So we thought one of the requirements is that we want something quite flexible in structure.
And what we ended up doing is building a knowledge graph for network resources. Now, if you don't know what a knowledge graph is, the idea is very simple: it's just a graph where you put your data, and each of the nodes here, you have an example, each of the nodes will be a data point, and these data points will be linked in the graph. I like to show this picture here because I think this is the situation we are in now. There are a lot of cool datasets out there, and you could see these as very good books: maybe this collection is your PeeringDB data, and there is RIPEstat somewhere, and CAIDA, and BGP data.
All these books could be very nice and a very nice story, nice data, but it's very hard to merge them, to join information between the two.
So I think this step of pulling the data and putting it in a graph is very useful.
It's not only just importing the data. We also have this extra step where we add semantics to the graph. Smart people call this an ontology, but what it really means is just that each of our nodes can have a very specific meaning. So we define different types of nodes, this is still changing, but we have different types of nodes and different types of relationships, and if you look at this example here, which is an example from our knowledge graph, I think it's very intuitive. We start from this red node here; the red nodes in this example are domain names, so we start from ripe.net. You can see that it resolves to an IP address, that IP address is part of a prefix, this prefix is originated by an AS. It happens that this AS and this prefix are in an RPKI object called a ROA, and you can see that the AS is also a member of MANRS; it's managed by an organisation that is called RIPE NCC, this organisation has a website, it's registered in the Netherlands, and we could browse the graph even further.
From this very simple example you can see that it doesn't really matter where the data is from; we can easily browse the graph just using the semantics that we put on the nodes and the edges. One of the advantages of this knowledge graph is that instead of doing a syntactic search, where we know what the dataset looks like and we try to filter our dataset and pinpoint some rows in it, in the knowledge graph you can make what is called a semantic search, where you know the question you have and you just translate that question into a query. And that will give you this.
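As a toy illustration of such a semantic search, the ripe.net traversal described above can be mimicked with a tiny in-memory graph. This is a hypothetical sketch in plain Python with made-up relationship names; the actual project stores the graph in a database.

```python
# A toy knowledge graph: nodes are (type, name) pairs, edges are
# (subject, relationship, object) triples, as in the ripe.net example.
edges = [
    (("DomainName", "ripe.net"), "RESOLVES_TO", ("IP", "193.0.6.139")),
    (("IP", "193.0.6.139"), "PART_OF", ("Prefix", "193.0.6.0/24")),
    (("Prefix", "193.0.6.0/24"), "ORIGINATED_BY", ("AS", "AS3333")),
    (("AS", "AS3333"), "MANAGED_BY", ("Organization", "RIPE NCC")),
]

def follow(node, relationship):
    """One semantic-search step: follow a typed edge from a node."""
    return [obj for subj, rel, obj in edges if subj == node and rel == relationship]

# "Which AS originates the prefix that ripe.net resolves into?"
ip = follow(("DomainName", "ripe.net"), "RESOLVES_TO")[0]
prefix = follow(ip, "PART_OF")[0]
asn = follow(prefix, "ORIGINATED_BY")[0]
print(asn)  # -> ('AS', 'AS3333')
```

The point is that the question, not the layout of any one source dataset, drives the query.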
Okay, so we have a prototype that is currently running online, available at this web address. It might be quite hard to use if you don't know Neo4j, the database we're using: it has a query language called Cypher and a protocol called Bolt. If you know those two things, then it's going to be easy; if you don't, you might have to learn them. Again, it's an ongoing project; we are building some interfaces around that database so that it will be easier to use.
Currently we have data from 16 different organisations. There are the most popular ones, the ones you are probably used to, like PeeringDB; we have some data from the RIPE NCC, MANRS, CAIDA; and there are some maybe slightly more exotic datasets from research labs, for example a group at Stanford provides a classification of ASes, and a group at Georgia Tech gives a mapping of sibling ASes. We have data from the Cloudflare Radar, which I think is very, very nice; Cloudflare Radar gives us some stats and information from their 1.1.1.1 DNS resolver. And we are adding new datasets almost weekly right now.
So, now, I'd like to show the tool. Stop me if I speak for too long.
I'd like to show a few queries to give you an idea of what you can do with this. The applications are kind of endless; there are so many different things we can do with this. So these are just some very simple examples.
Here in this graph, I have written a query asking for the most popular domain names that end with .nl. Those are, let me zoom in here, those are the red nodes here. So you can see this u20.nl, it's a popular domain name, and the query is trying to find who is hosting those popular .nl domain names. We have the domain name, the IP address, the prefix and the AS, and here we have the name of the AS. It's very similar to what I showed in the slide.
The first thing you can see from this graph, just from its structure, is interesting, because a few clusters form, and we can look at what those ASes are; every time I click on a node or a link, you will see some information appearing here. So this is AWS, and AWS hosts a lot of these .nl services. There is Cloudflare, I think this is SURF, Microsoft, and, if you look at the broad picture, the good news is that all those domains are not consolidating around only one Cloud service. There is of course this big Cloud provider that hosts a lot of them, and that picture is going to look different for each country; in Japan, for example, it looks like there is a lot more consolidation around AWS. Here I think there is a healthy spread of all those domains.
That's just one example. A very similar query we can do is: for those domain names, I want to know which ones map to prefixes that are not in RPKI. Because those are popular domain names, we might expect them to be in RPKI, and you can see there are a lot fewer nodes there, which is very good news: most of the popular .nl domain names map to prefixes that are in RPKI.
And here, you can still see this big cluster here: this is Akamai. There is a small cluster here, this is Fastly, and then there are smaller providers here and there. And we could try to understand a bit better why those are not in RPKI. So I will zoom in on one of these, bring this one here. So, this is amsterdam.nl. It's mapped to that IP and this prefix. This prefix is originated by KPN, but it is not in RPKI. So we could try to understand why. If I click on this prefix, we can get more information about it. We get tonnes of things; I'm not going to explain everything. We can see its RPKI status is "not found". And that prefix was a /24, and we can see it's also part of a bigger prefix, a /21. And that /21 has other information related to it: it's assigned to this opaque ID, we have the delegated stats in the database, and now if we look at what that has, it has a few prefixes.
So, just to wrap up here: what I wanted to show is that we found that this popular domain name maps to a prefix, and this prefix is actually assigned to an organisation, which is this Amsterdam one; my Dutch friends said it's the municipality of Amsterdam, and they are the ones that should register the prefix in RPKI. So it's not on KPN to fix this; it would be for them to register that.
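The RPKI states mentioned here ("valid", "invalid", "not found") follow the standard origin-validation logic, which can be sketched as follows. The ROA entries and helper are hypothetical toy values, not real registry data.

```python
# Toy RPKI origin validation: a route is 'valid' if a ROA covers the
# prefix with the right origin AS and an allowed length, 'invalid' if
# covered but mismatching, and 'not-found' if no ROA covers it at all
# (the state of the /24 discussed above).
import ipaddress

roas = [  # (prefix, maxLength, origin AS) - hypothetical entries
    ("193.0.0.0/21", 21, 3333),
]

def validate(prefix, origin_as):
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_as in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            if origin_as == roa_as and net.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "not-found"

print(validate("193.0.0.0/21", 3333))   # -> valid
print(validate("193.0.0.0/24", 3333))   # -> invalid (longer than maxLength)
print(validate("10.0.0.0/24", 64500))   # -> not-found
```

In the amsterdam.nl case the /24 had no covering ROA at all, which is why the responsibility lies with the assignee, not the originating ISP.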
It's actually a simple example, and it shows how we can very easily go through tonnes of different things. We started from this domain name, this DNS data, and then we have BGP data here to see who originates this prefix, we have RPKI data, we have some geolocation information, we have delegated stats, we have names from different organisations. There is also this classification I mentioned: we can see that this AS is a government and public administration AS.
I don't have much more time left, so very quickly I'll show a second example. Imagine I am a peering coordinator and I'm interested in coming to the Netherlands and deploying some infrastructure there. We have this query where we can ask: what are the biggest eyeball networks in the Netherlands and the biggest transit networks in the Netherlands? These are all the yellow nodes you see here, so each is a big Dutch eyeball or a big Dutch transit, and we can ask which IXPs they are members of.
MASSIMO CANDELA: We are just, if you can do a closing remark on your presentation.
ROMAIN FONTUGNE: Okay. This is just to show that; we can see how these things are deployed there.
Actually, that's where I wanted to stop. So that's the Internet Yellow Pages. The last thing I want to say is that we have this public instance, but you can also deploy this yourself; what I was showing runs on my laptop. If you go to the GitHub repo, you can get a Docker image and a dump of the database and just use it locally, so you can also add your own data; if you have confidential data, you can add it and do your own analysis.
If you want to discuss this, my colleague Emile is in the room, and after the session I'd be happy to discuss and do another live demo. So please come and see Emile if you have more questions about this. Thank you, and sorry, I took too much time I guess.
STEPHEN STROWES: Thank you. We are running over but that's kind of on us, so we'll take this one question if you can make it brief enough.
AUDIENCE SPEAKER: Alex. I have a question about the accuracy of the database when you mix all those datasets together. Can you say something about that?
ROMAIN FONTUGNE: The accuracy of the data: we are not controlling it, but we are importing the data, so we know what it is; if it's rubbish we're not going to import it. As soon as we see some interesting data we are going to integrate it into the Yellow Pages, and as soon as we find problems in the data, we are going to ask the people behind the data to correct it. I think it's better for the community in general: we just ask that person to correct it, so everyone can enjoy the correction.
MASSIMO CANDELA: Thank you very much for this presentation, really interesting.
And now we go with the next presenter, this time in person. She is a Ph.D. student at the Université Grenoble Alpes in France.
YEVHENIYA NOSYK: I am a PhD student, and today I am going to talk about how DNS requests get intercepted and manipulated in the wild.
But before we dive into the manipulation part, let's very quickly recall how DNS is supposed to work in theory. Here we have a very basic example with the client on the left that wants to resolve the example.com domain name through its recursive resolver. The resolver does all the resolution: it first contacts the root name servers, then the .com name servers and the example.com name server, and when the final response is obtained, it's sent back to the client.
Now, provided that all the resolver caches are empty, every DNS resolution is supposed to start at the root zone. So of course the root servers must be highly available and distributed all over the world. What we have right now is 13 root server letters, which are announced as Anycast prefixes. However, behind those there are actually more than 1,700 individual Anycast instances. Those instances differ slightly, in that some of them are local, meaning that they only serve defined clients; for example, those could be clients from certain autonomous systems or roughly clients from a single country.
And then you have global instances, meaning that those are announced and available all over the world.
One important thing about the example I have just shown is that when we send such a query for the example.com domain name to a root server, we're not expecting to get any final response. The root servers are not authoritative for example.com, nor for any other second-level domain name. So the only thing we're expecting to get here is simply a bunch of referrals to the TLD name servers.
But it turns out that sometimes things do not go as expected and some weird things can happen. Two years ago it was reported that some clients from Mexico could not access the WhatsApp domain name, and after some initial troubleshooting it turned out that even more domain names were affected, and that there was some anomaly when contacting the k-root server.
So, the very same problem was reproduced using the RIPE Atlas probes. What we have here is a RIPE Atlas probe on the left, located in Mexico, and this probe is requested to resolve the facebook.com domain name by contacting the k-root server directly. And then, surprisingly enough, we get an IP address in response. This is surprising for at least two reasons. The first one is that the k-root server is not authoritative for facebook.com, so we're not actually expecting any response here. The second problem with this response is that the returned IP address is actually bogus: it does not belong to Facebook but actually belongs to Twitter.
Now, to understand where the query was routed, to which instance of the k-root server, a TXT query was sent to the server just to get the identifier of the instance, and it turned out that the query from Mexico was sent to the k-root instance located in China. Here is another problem: this root server instance is local, and it is not supposed to propagate beyond China. Not surprisingly, it was confirmed by the k-root operator that the root server instance was not expected to propagate beyond China; it should have stayed there. And of course it was made clear that k-root, and all the root server operators, do not serve bogus data, so the clients must have encountered some middleboxes on the way.
So the first question I wanted to answer was: was it a one-time event, or was the k-root instance occasionally reachable from outside?
To answer this question we relied on RIPE Atlas built-in measurements; in particular, we have this measurement running on all the connected probes towards the k-root servers that requests the identifier of the responding instance. We analysed the measurements two months before the event was reported, and nine months after. What we saw is that at least two months before the event was reported, so in September 2021, the local instance was already reachable from the outside: 57 probes from 15 countries reached that k-root server instance. Even after the leak was fixed, the instance was still occasionally reachable from outside, this time over IPv6 and from far fewer probes.
And finally, as I'll show later, those probes would also receive injected responses for the facebook.com domain name.
The second question we wanted to answer was slightly broader. This time we were wondering how many RIPE Atlas probes received injected responses when contacting DNS root servers.
We can once again rely on RIPE Atlas to answer this question, but this time we set up our own custom measurements. What we did was request each connected RIPE Atlas probe to send a bunch of DNS requests targeting all the root server letters over both IP protocols and both transport protocols. We requested the A and AAAA resource records, and the domain names queried were google.com, facebook.com and ripe.net. One more important thing about those measurements is that we asked to include the NSID option in responses; in this way we could know which instance was actually sending us the response. We ran these measurements during nine months in 2022, and we had more than one billion measurements to analyse in total.
So, when analysing those responses we broadly divided them into categories. The first one is non-injected, when we did not receive any response to our queries; that's something we expect, because we were sending queries for second-level domain names to root servers, which are not authoritative for any of those, so we're not expecting any response.
However, for less than 1% of queries we did receive some kind of response.
So let's take a look at those.
The most common injected response type was the A type. We received more than seven million of those, and interestingly enough, the great majority of responses were actually valid. We saw that almost 50% of facebook.com and almost 90% of google.com DNS queries resulted in correct responses: the IP addresses returned belonged to those entities, even though, once again, it's unexpected to see them from root servers.
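The classification rests on a simple rule: a root server should return only a referral for a second-level domain name, so any answer section at all signals injection, and the returned addresses tell you whether the injected response is valid or bogus. A rough sketch with hypothetical field names and addresses, not the study's actual pipeline:

```python
# Classify root-server responses: roots are not authoritative for
# second-level domains, so any answer section at all means injection.
def classify(response, expected_ips):
    """Return 'non-injected', 'injected-valid' or 'injected-bogus'."""
    answers = response.get("answers", [])
    if not answers:
        return "non-injected"        # referral only, as expected
    if all(ip in expected_ips for ip in answers):
        return "injected-valid"      # right addresses, wrong server
    return "injected-bogus"          # e.g. facebook.com -> a Twitter IP

facebook_ips = {"157.240.0.35"}      # hypothetical expected address set
print(classify({"answers": []}, facebook_ips))                # non-injected
print(classify({"answers": ["157.240.0.35"]}, facebook_ips))  # injected-valid
print(classify({"answers": ["104.244.42.1"]}, facebook_ips))  # injected-bogus
```

The Mexico case above would fall in the last category: an answer where none was expected, pointing at the wrong entity entirely.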
Then we see a quite similar trend for the AAAA resource record, which is the second most popular response type, and here we see that the ratio of correct responses is even higher.
Then we got a bunch of URI responses, all received on one RIPE Atlas probe. And finally we got a bunch of CNAMEs that were pointing google.com to forcesafesearch.google.com, which is a filtering service provided by Google to exclude explicit content from search results.
Then we were also wondering who was actually sending those responses to us. As I mentioned, we requested the NSID string to be included, whenever possible, in all the responses we received, and we analysed those from all the 1 billion measurements, whether injected or not. We got more than 12,000 unique name server identifier strings, and we mapped the majority of those to either root server instances, public resolvers or filtering services, but some of them were unclassified or empty.
Now, not surprisingly, whenever we received an injected response, it never came from valid root servers. In the great majority of cases the NSIDs were empty, so we could not fingerprint the responders, but otherwise we were getting responses from public resolvers, filtering services and some other unclassified entities.
Speaking about the persistence of this kind of manipulation: in the figure on the left we plotted, per week, the ratio of RIPE Atlas probes that received injected responses to all those participating in the measurements, and then we did the same for RIPE Atlas measurements. As you can see, during the nine months of the experiment the ratio of response injection stays relatively stable. Less than 1% of RIPE Atlas measurements result in injected responses, and between 3 and 4% of RIPE Atlas probes per week receive injected responses.
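The weekly ratios just described boil down to a simple aggregation. A minimal sketch, with a made-up input format rather than the authors' actual pipeline:

```python
from collections import defaultdict

def weekly_injection_ratio(results):
    """results: iterable of (week, probe_id, injected) tuples.
    Returns, per week, the fraction of participating probes that
    received at least one injected response."""
    per_week = defaultdict(lambda: (set(), set()))
    for week, probe, injected in results:
        total, injected_probes = per_week[week]
        total.add(probe)
        if injected:
            injected_probes.add(probe)
    return {w: len(inj) / len(tot) for w, (tot, inj) in per_week.items()}

# Toy input: in week 1, one of two probes saw an injection.
ratios = weekly_injection_ratio([
    ("2022-W01", "p1", True),
    ("2022-W01", "p2", False),
    ("2022-W02", "p1", False),
])
```

The same aggregation keyed on measurement IDs instead of probe IDs gives the per-measurement curve in the same figure.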
Then speaking about the duration of the manipulation: the figure on the right shows that slightly more than 20% of the RIPE Atlas probes that were receiving injected responses received them during all nine months of the experiment. For those, the response injection was constantly present.
Now, the question is how can we avoid this kind of manipulation so that end users do not receive injected responses?
So, one idea that was proposed back in 2018 is to include some geographical hints in BGP announcements of Anycast prefixes, so that when the routers at the destination network receive several announcements they can choose the instance which is geographically the closest. In that case, the users in Mexico could have chosen one of the instances located nearby, because there were some.
Then the second countermeasure would be to use QNAME minimisation, because, once again, for those users in Mexico, when sending the query to the root name servers it's not necessary to send the whole query name, as we are not expecting to receive the final answer from them; sending just the .net query would be more than enough in that case.
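As a toy illustration of the idea (a sketch, not a full implementation of the QNAME minimisation algorithm in RFC 7816): a minimising resolver reveals only one more label than the server being asked is already authoritative for.

```python
def minimised_qname(full_qname, labels_known=0):
    """Return the QNAME a minimising resolver would send when the server
    being asked is authoritative for `labels_known` trailing labels
    (0 for the root). Toy sketch, not a full RFC 7816 implementation."""
    labels = full_qname.rstrip(".").split(".")
    reveal = min(labels_known + 1, len(labels))
    return ".".join(labels[-reveal:]) + "."

# To a root server, only the TLD is revealed:
root_query = minimised_qname("whatsapp.net", labels_known=0)   # "net."
# To the .net servers, one more label:
tld_query = minimised_qname("whatsapp.net", labels_known=1)    # "whatsapp.net."
```

The point for the injection case is that an on-path middlebox watching root traffic would then only ever see "net.", never the full second-level name it wants to filter.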
Finally, one could use encrypted DNS so that on‑path entities do not see the traffic. However, it is not yet widely deployed between recursive resolvers and name servers; it's still a work in progress. And one could use DNSSEC. However, in that particular case, the queried domain name was not DNSSEC signed, and the resolver in the destination network was not doing DNSSEC validation.
So, what we have just seen is that almost 7% of the RIPE Atlas probes that participated in the measurements received injected responses when contacting DNS root servers. What we also saw is that in the great majority of cases those injected responses were actually valid, so this kind of manipulation stays transparent to end users.
However, it should be noted that this kind of DNS filtering can propagate beyond its intended scope, and that's exactly what happened for the clients in Mexico in November 2021, when a route leak happened and they experienced collateral damage from DNS manipulation somewhere else.
And finally we saw that BGP route leaks can stay unnoticed for a long period of time.
So that's all I want to talk about. Thank you.
MASSIMO CANDELA: Thank you very much. It's time for questions. We already have a queue.
AUDIENCE SPEAKER: Your research probes these injections, which I think could be avoided. Do you think we need to implement a recursive DNS resolver in each customer device?
YEVHENIYA NOSYK: I think in those particular cases it wouldn't even help, because the injection does not take place in the end client network; it takes place somewhere far from them, and they do not have control over what's happening in those networks. However, I do believe that those countermeasures could decrease the probability of receiving injected responses.
AUDIENCE SPEAKER: I mean, if the injection is performed on public DNS servers, then we need to make direct requests to root servers.
YEVHENIYA NOSYK: As far as I know, I don't know in how many networks, but you are often not allowed to query particular DNS servers, whether it's a resolver or an authoritative name server, so you will be intercepted; for example, in your university network your queries would be intercepted in any case.
AUDIENCE SPEAKER: But you were working on safe DNS requests, weren't you?
YEVHENIYA NOSYK: Yes.
AUDIENCE SPEAKER: Bonjour, Kostas Zorbadelos, CANAL+ Telecom. Very nice work and thank you for sharing it. I have just one question. During all your measurements in the nine months, did you manage to find a pattern in the Atlas probes that were affected by these injections? Were they all in specific ASes, in specific countries? Did you manage to categorise them somehow?
YEVHENIYA NOSYK: So, regarding countries, we have seen that those probes were located in 66 different countries, and I think all the RIPE Atlas probes reside in at least 170 countries, something like that, at least those that participated in our measurements. So it's spread geographically; it's not happening in some particular countries.
AUDIENCE SPEAKER: Chris, RIPE NCC. Thank you, interesting presentation. You probably know that we have a built‑in measurement, 30002, which uses the locally configured probe resolvers but with popular domain targets. Obviously that wouldn't be useful in this case because it's using whichever resolver the probe has configured. Do you think it would be useful to have a similar built‑in measurement which is rotating through popular domains but also rotating through the root servers directly? So on an ongoing basis rather than a single campaign?
YEVHENIYA NOSYK: I guess here the problem is not about contacting the root servers; the problem, I think, is in contacting any particular DNS server. Sometimes it's just not allowed in your local network. So this kind of measurement could be done, but it should not necessarily target the root servers; it can be whatever DNS server that is not pre‑configured.
MASSIMO CANDELA: Then, thank you very much.
I also would like to remind you of the RIPE Labs article competition, which is a competition that happens every RIPE meeting, so keep an eye on that. And we can go to the next presentation.
Nurullah Demir is a cyber security researcher at the Institute for Internet Security. His research focuses on web security and privacy, along with web measurement. The stage is yours.
NURULLAH DEMIR: Thank you very much. Good morning. Thank you for having me here. Today I'll present our study on understanding the update behaviour of websites and its impact on security. Our goal was to understand whether the (inaudible) on web pages are updated, and what the impact is when outdated software is used.
So, when we look at modern web architecture, it's pretty complex. On the client side we have many components: scripts, libraries, frameworks; and on the server side we have web servers, different programming languages, databases, hosting and content management systems, and many other third parties.
So, it's not really an easy task to keep the whole web environment secure because of this diversity, and any software product in this chain can endanger the whole web environment. One of the ways to keep the environment secure is applying software updates, and our motivation in this research was to understand how up to date utilised software is on the web, and the security implications of utilising outdated software. That's why we conducted a large scale analysis of 5.6 million websites. Here is an overview of our method.
So we used different open source datasets. First, we imported data from HTTP Archive, an open platform that crawls millions of pages monthly and makes the results openly available. It contains different metrics like requests, responses, performance data, and the software utilised by the web pages, so it was interesting for us to use this dataset, and we imported data from it for 18 months. At that point we knew which website uses which software and in which software version, but we still needed to know whether the utilised software was up to date or outdated, so for that we needed a release registry. Many web software components are open source, so we collected the release history of the software we could identify via the GitHub API, and for the software components that are not on GitHub we had to collect the release history manually. After this step we know whether a software version is outdated or not. To identify whether a software component is vulnerable, we used the database from the NVD, which provides vulnerabilities for publicly well known software components, so after this step we also know if a software component is vulnerable. We also did a lot of data cleaning: for example, we converted all the software versions to a standard format, so that when we identify a software version from the HTTP Archive we can compare it with the release history, and we dropped all records with polluted data.
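The "convert all software versions to a standard format" step mentioned above can be sketched as follows. This is a simplified normaliser assuming plain dotted version strings; real release strings (betas, date-based schemes and so on) need considerably more care, and this is not the authors' actual code.

```python
import re

def norm_version(v):
    """Normalise strings like 'v5.8' or '5.8.0' to a comparable tuple.
    Simplified sketch: pre-release suffixes etc. are ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", v))

def is_outdated(found, releases):
    """A detected component is outdated if any known release is newer."""
    return any(norm_version(r) > norm_version(found) for r in releases)

outdated = is_outdated("v5.8", ["5.7.2", "5.8", "5.8.1"])  # True: 5.8.1 is newer
```

Tuple comparison gives the usual semantic-version ordering for plain dotted versions, which is enough to flag a detected version as behind the release history.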
So now we turn to our results, with an overview of our dataset. We had 5.6 million distinct sites and about 32,000 releases. We also had about 342 distinct software products and 147,000 vulnerabilities in our dataset.
So now we look at the update behaviour of websites. We found a median of 3 observable software products per website, and we identified that 94% of all observed websites were not fully updated, meaning at least one software product on the website was outdated. And it wasn't just a single outdated product: 74% of the software on the web pages was outdated.
And on the adoption of software releases: each month there was a new update for 67% of software, and what's also interesting is that website operators process an update only about every hundred days. The most processed release type was a patch, so patch releases were processed most often.
So, what happens when outdated software is utilised? Now we turn to the vulnerability of the websites. We have seen that 94% of websites contained at least one vulnerable software component, and 12% of websites had at least one critical vulnerability, meaning the CVSS (Common Vulnerability Scoring System) score was 10: the vulnerability can be exploited easily and the impact is maximal.
Another interesting finding was that when an update is processed on a software component, the vulnerability score is reduced from 6.2 to 2.4, so it has a significant impact.
Of course, we had some limitations. HTTP Archive only crawls landing pages, so it doesn't crawl internal pages and it doesn't interact with the websites. Another limitation comes from Wappalyzer, which is built into HTTP Archive: Wappalyzer may not identify all utilised software. And for the NVD database, there are some discussions about its reliability, but it's still the most commonly used database when it comes to vulnerabilities of software components. There may also be conditional vulnerabilities: a vulnerability may require another software component, or a function that must be activated. We haven't done any validation of such conditional vulnerabilities.
So, in conclusion, our measurement tries to highlight the current state of the web. It showed the update behaviour of websites over the course of 18 months, showed that much of the software on websites is very old, mostly about four years old, and recorded that 95% of websites contain at least one vulnerable software component. Again, this must be seen as an upper bound because of the limitations. Still, website providers, and we as the web community, need to take more care about updates.
Thank you very much. Questions please, if you have any.
MASSIMO CANDELA: Question time. Otherwise I have two myself. Okay, then I go ahead.
So, the first question I have is: you talk about these vulnerabilities and you collected data. What is the most common vulnerability, or the most common vulnerabilities, that you found, and what would they allow in practice? Because there are various possibilities, so it would be nice to know if you have a statistic. And the second question: when you detect a vulnerable version of software on a website, can you say how that happens? How do you detect a specific vulnerable version of the software?
NURULLAH DEMIR: So for the first question: in our paper we had a top list of the most common vulnerabilities, and it almost matched the OWASP Top Ten, which is an industry standard. The most common vulnerabilities were absent input validations; the most common vulnerability was (inaudible), affecting about 90% of websites, and vulnerabilities like CSRF, SQL injection or buffer overflow were also at the top of the list. To your second question: there is a standard, CPE. In the NVD database the entries are in CPE format, so when you get a vulnerability for WordPress it has an identifier in CPE format; it's a standard where each software product gets a unique assignment ‑‑
STENOGRAPHER: Get the hell out of here fast!
Phew! False alarm!!
STEPHEN STROWES: Everything is fine! It was a test. Unfortunately they told us too late, we had already walked at least 1 kilometre when they told us. But I was with the next presenter, he was with me so I was sure that without us nothing was going to go ahead. So thank you very much for being here, we are going to try to take a few minutes from the break because otherwise there's no other way around this and the next presentations are very important so I think we can just go directly ‑‑
STEPHEN STROWES: So, what we want to do is restart the session, and Nurullah had just taken I think the last question. Maria was in the queue, I don't see Maria here, but can we thank him for his presentation.
MASSIMO CANDELA: It's your first time, it usually is not like this.
STENOGRAPHER: Don't mind him, it's always Kay mad! Ha ha.
EMILE ABEN: Thank you for coming back after that. I really tried to avoid presenting here, but...
I have to be quick. I also want to leave time for Robert, who is going to do a retrospective on what the RIPE NCC has been doing. I think this is an important and maybe undervalued little thing that we do in the RIS project.
I care deeply about the things we do with RIS, I am a product owner there.
So, what I want is to redo the RIS beacons. I'll explain what they do; this doesn't usually make a lot of noise. The tentative plan is to redo these things in quarter 3, and we want your input. I talked to a bunch of you already, but I thought this was an excellent stage to put this up and receive feedback, here, in the hallway or by e‑mail, on what we should be doing if we redo this. So that's why I want to be on this stage presenting even after a fire alarm.
RIS beacons, what are they? RIS is the RIPE NCC's route collection project, and one thing we do that people may not know about is that we have beacons: prefixes that we put up and tear down at specific intervals. A beacon goes up, goes down, and by that you can basically measure how it propagates over the Internet. Well, the time is too short to do a live demo, otherwise I would insist that we actually demonstrate how it propagates through the room, but I won't do that to you.
So, we have two flavours of this. We have route collectors, 20‑odd route collectors all over the world, and each one of them has a stable prefix that's always on, and one that goes up for two hours, down for two hours, and it looks a bit like this. You would expect it to be very clean, one announcement, one withdrawal, but there is a lot of interesting stuff happening here, which is basically how the Internet behaves at this point in time.
So it provides you with, I think, valuable input, and has been doing this since, I think, 2005. And here you have a list of all the prefixes that we are using for this.
We have a couple of specials.
We have prefixes with different RPKI states, we do Anycast failover, and we have a fast one that goes up for 20 minutes, down for 20 minutes.
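The up/down pattern is simple enough to sketch in a few lines. The assumption below that the cycle is anchored at 00:00 UTC is mine for illustration; check the RIS beacon documentation for the actual schedule.

```python
from datetime import datetime, timezone

def beacon_state(ts, up=2 * 3600, down=2 * 3600):
    """Expected state of an up/down beacon at time `ts`, assuming the
    cycle starts (announced) at 00:00 UTC. The anchor is an assumption
    for illustration, not taken from the RIS docs."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    phase = (ts - midnight).total_seconds() % (up + down)
    return "announced" if phase < up else "withdrawn"

t = datetime(2023, 5, 25, 3, 0, tzinfo=timezone.utc)
beacon_state(t)                      # "withdrawn": 2h up, then 2h down
beacon_state(t, up=1200, down=1200)  # same check for the fast 20-minute beacon
```

Comparing this expected state against what collectors actually saw is what makes the beacons useful for measuring propagation and convergence.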
And why would we want to redo this, given it's already quite nice?
Well, our IPv4 address space for this is almost all used up. There are 46 /24s, and v4 ran out, I heard. So: of the things that we are doing, can we make room for other, potentially more useful, things? That's what I want to have discussions about.
We get requests from researchers: can you do this, can you do that? We have cool research papers, like the work Randy presented on Tuesday on encrypting RPKI key materials. So, should we be doing these types of things? I think we should, but I would love people to tell me: yes, that's a good idea; no, that's a bad idea. And I also want to be a responsible steward for this project. For instance, how much BGP churn is acceptable? Is it every 20 minutes or ‑‑ and this is a part from a paper I really like; it has an ethics section that mentions that in their experiment they at some point generated half a percent of all the BGP updates worldwide. Whether that's fine or a bit too much depends on the purpose, on whether it's useful enough.
For the current beacons, I actually looked at their upstream diversity: how many networks propagate them from particular places, with the idea that maybe some places are not that useful, so we could actually reuse those prefixes. And I think we have a couple of places that are maybe not that useful.
For instance, there are a couple of zeros here, basically meaning we have a prefix that we announce but nobody propagates. Or we have a 1 here in Japan, which basically means there is a single upstream that propagates it, so we are basically measuring how that one upstream propagates from Tokyo. Is that useful? I don't think so, but I don't really know, and if people are depending on these being available, I would love to know. I think the bigger ones, RRC00 for instance, might be the ones that a lot of people use.
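Counting upstream diversity from collected AS paths can be sketched like this. The prepend-stripping and the "upstream = neighbour of the origin" definition are my assumptions about the method, not taken from the slides, and AS12654 (the RIS routing beacon AS) is used as the example origin.

```python
def upstream_diversity(as_paths, origin):
    """Count distinct neighbour ASes of `origin` across AS paths.
    Assumes the origin is the rightmost AS; strips prepends first."""
    upstreams = set()
    for path in as_paths:
        # Collapse consecutive duplicates (AS-path prepending).
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        if len(hops) >= 2 and hops[-1] == origin:
            upstreams.add(hops[-2])
    return len(upstreams)

paths = [
    [3333, 1103, 12654],        # via AS1103
    [2914, 1103, 12654],        # same upstream, different path
    [6939, 6939, 2914, 12654],  # prepended, via AS2914
]
upstream_diversity(paths, 12654)  # 2 distinct upstreams
```

A count of 0 corresponds to the "announced but nobody propagates" case on the slide, and a count of 1 to the single-upstream Tokyo case.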
And of course we can keep doing IPv6 for all of them and just remove some of the v4 prefixes, so we have some room for other experiments.
Some ideas there. Longer‑than‑/24 prefixes: we did a fun experiment with ARIN space, but we stopped that experiment and gave the space back, because of responsible stewardship of address space: it was ARIN experimental space, and if you are not using it, you give it back. But we could redo this experiment with address space that the project already has.
RPKI experiments, ASPA experiments: I mentioned this research already. We could do fun things with Anycast; the problem there is, if we anycast from RIS, because BGP typically doesn't give you back what you give it, we won't see it in RIS any more, so we would need to be a bit smarter there. Ghost and zombie route characterisation. Unknown BGP attributes: I had a version of the slides where I crossed this out, because we did this in 2010 and it caused operational problems around the Internet, so we probably shouldn't be doing this. But maybe there are experiments in this area we should be doing, and if so, I would love to hear it.
The fast beacon: is it interesting or not? Or even faster: what if we flap every minute for an hour or two, and then stop? Is that proper use of RIS? I think it is, but I would love to hear people's ideas.
And insert your idea here.
I want to write this up on RIPE Labs in the next couple of weeks, once we have a final plan, and before that I would love it if people would give me some feedback on this. And that's all I have. There are e‑mail addresses here, there is the hallway, and there is the RIPE Labs article that you can read in the next couple of weeks. So that's it for me.
STEPHEN STROWES: Thank you Emile. Ordinarily I would ask questions now but we're right in the coffee break, so questions to Emile please or in the MAT Working Group mailing list would be super welcome.
And last but not least we have Robert, also of the NCC to give us a regular tools update.
ROBERT KISTELEKI: Good morning everyone. I no longer know if I'm over time or within time, but I will try to save time for you and spare you the jokes about burning discussions and heated topics.
This is the tools update. I work for the NCC. And I usually give this update at the MAT Working Group. So I will do it again.
About RIPEstat: one of the focus points for RIPEstat has been service quality and monitoring. The team implemented an internal, I would like to call it highly advanced, way of testing the system itself, so that bugs and problems do not surface and you don't see them. That said, every now and then we have problems, most of the time with the back‑ends behind RIPEstat, of which there are dozens; some of them misbehave now and then, and we are trying to protect you, the users, from those as much as we can. The whole purpose of this exercise is to be aware of those problems and catch them early on, including potential bugs in the apps.
And of course, this led to identifying and fixing some of those bugs in the API.
RPKI history: for every AS query you can now get RPKI history as well, which is really nice. And some internal optimisations are in the pipeline that are not necessarily user visible, but you might notice that the answers are coming faster, which is ultimately good.
We have been cooperating with M‑Lab for a long time now, exposing their data via RIPEstat as well; in particular, bandwidth graphs have been available. For various reasons they were not available recently; we are in the process of restoring them and they will be live again soon.
Finally, you may have seen that RIPEstat changed its default user interface. The code name is 2020; in practice I think it shipped in 2021 or 2022. That's the default UI nowadays, and it's where we put most of our UI work. The old UI is still available, and the plan is to open source it once we get some interesting bugs fixed and code clean‑ups done. We would really like you to tell us if there were particular features in the old UI that you really liked and would miss if it went away, because then we are much more motivated to put them into the new UI as well. Eventually the old UI will go away, so please talk to us if you have any use cases you can only cover in the old UI.
There are three major things going on, and some other major ones internally which are not on the slides, but I would like to talk about these three.
The user interface renewal is in progress. We had some setbacks because of resource limitations, but now we are ramping that up again; it will improve the user experience, and the layout will change here and there, all for the better, hopefully. We have already renewed the documentation, so if you go to /docs it is much more useful and easier to search and use. Ambassador support is rolling out: we are getting it beta tested, promotional pages are up next, and more and more functionality will be converted.
Another focus point is that we are working on the probe packaging. We want to make it easier for ourselves to develop, because now we have to maintain six or seven different architectures in branches, everything in parallel, which is not easy. But one of the key points is also to make it easier for you to install software probes on various Linux platforms: RedHat, Debian and so on. And we want to cooperate more with open source packagers, so if you happen to be one, in the Debian space for example, we would love you to talk to us, because we want to know the quirks in Debian in particular, so that we can make it easier to package it up officially.
Finally, there is a large chunk of work going on in the background to renew our big data storage engine. As you probably know, we have been storing RIPE Atlas measurement results for 13 years now, and we have not thrown out anything. It's all in one system at the moment, and we are looking at the options to renew that technologically: maybe use the Cloud, maybe not; make it easier and nicer for users to use, and operationally cheaper as well.
We published an article on RIPE Labs called API usage anti‑patterns. If you are using Atlas, and especially if you are using it extensively, I would like you to take a look at the article, because there are some hints on how to be nice to the system, and some quirks that, if you are aware of them, will make your usage more optimal for you, not only for us. So that's good.
And finally, we have heard signals that some people may want to abuse software probes to run what we call probe farms: lots of probes just to earn credits. So you can expect a proposal from us on dealing with this, for example limiting how many of those can exist at the same time, and so on.
All right, IP Map. There are various improvements in IP Map. I will not go into too many details, but the concept is that IP Map is an infrastructure geolocation service: it aims to geolocate router interfaces and other infrastructure components. It may capture end users as well, but that's not the focus. It has multiple engines behind it, different ways of coming up with an answer, and a mixer or combiner that makes the ultimate decision on where we think those IP addresses are.
So, various improvements on the existing engines behind that. We are working on a crowdsourcing engine. And you may have heard about the geofeeds topic recently; that's something we also want to incorporate as one of these engines, because it's a high quality signal.
Finally, RIS. This is the complementary slide to what Emile said. One of the things that happened recently is that the team managed to optimise how we produce the MRT dumps, so they now appear publicly on our FTP site much, much faster, and for various reasons the actual size of those dumps is also smaller.
And finally, I will not go into the details of this slide, and I guess we will present it multiple times over time. It tries to illustrate that in each country or each region we would love the RIS peers to be as close as possible to our collectors, so that the quality of the data increases. If you are really interested in the details of this graph, then talk to Emile or me.
And that was it. I was trying to be fast enough.
STEPHEN STROWES: Thank you Robert. I think we're good on time. Thank you Robert once again for your good work and thank you to the room for interrupting your coffee break after the unplanned interruption. Please remember to rate the presentations. If you want to contact us, that's the Chairs e‑mail address up top and if you want to participate in the mailing list, please do. The mailing list is open to everybody and we welcome any and all discussions on anything that you saw here today or anything you would like to see covered at a future RIPE meeting. I think, unless there is anything else to add Massimo, I think that's us.
MASSIMO CANDELA: Thank you very much, thank you very much for coming back especially.
(Coffee break ‑‑ well a very small coffee)