16–17 May 2023
Vzdělávací komplex UTB - budova U18
Europe/Prague timezone

The CSNOG 2023 Meeting Report

The fifth meeting of the community of Czech and Slovak network administrators, CSNOG, took place on 16 and 17 May 2023. The CSNOG event is organized by CZ.NIC, NIX.CZ and CESNET. The program of this event is managed by the program committee.

Presentations and videos from this year's CSNOG are available on the event website under the program section.

CSNOG 2023 in numbers:

  • 119 participants, mainly from the Czech Republic and Slovakia
  • 20 talks
  • 7 partners:
    • GOLD: Wedos, Nokia, Salumanus
    • SILVER: RIPE NCC, ICANN, Quantcom, Alef Nula

This summary was written by Petr Krčmář, who is a member of the Program Committee.

May 16

Steve Jones: what to consider before deploying 400G using DWDM

400G technology is deployed primarily inside datacenters, but is gradually spreading to larger interconnection networks. 800G is close, but there is a big gap between marketing and reality. 400G is widely available and its deployment is growing rapidly. Transceivers differ mainly in reach, which ranges from tens of meters to more than 120 kilometers. Single-lambda DWDM transceivers are still relatively rare, although their number will certainly increase. The market is maturing rapidly and the situation is changing very fast. IP over DWDM allows passive DWDM to be connected directly to a switch or router. Some transceivers are capable of operating at lower speeds. The simplest way to deploy 400G is to use passive multiplexing that works with legacy transceivers. The 600G/64QAM technology is out of specification and not yet usable in production due to its error rate.

Matěj Pavelka: Anomaly Detection in Large Flow Datasets

Customers often use a mix of many different tools, which means operators may have to work in perhaps six different environments. The solution is to merge data sources into a single interface, such as the open tools Grafana or Kibana, which provide instant answers to queries. This matters because I have no idea what data I will need next and cannot have it calculated in advance. Most open data analysis tools are not suitable for fast queries because they were written twenty years ago and process at most hundreds of flows per second, while we can deliver answers in seconds.

It is important that all "boxes" can communicate with each other, which means supporting APIs and universal protocols. Detection is becoming more challenging and the results of existing detection systems are degrading, so splitting anomalies into groups and using dynamic boundaries may be a solution. Metrics should then end up in one open database that can also offer different aggregations. Traffic in networks does not grow linearly or quadratically, but exponentially. In a few years you will be moving at a much higher speed than today.
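The idea of dynamic boundaries can be illustrated with a minimal sketch. It is not the speaker's implementation: the function name, the window size and the multiplier k are assumptions chosen for illustration. Each sample is compared with a band derived from the mean and standard deviation of the preceding window, so the boundary moves with the traffic instead of being a fixed number.

```python
from statistics import mean, stdev

def find_anomalies(series, window=5, k=3.0, min_sigma=1.0):
    """Return indexes of samples outside mean +/- k*sigma of the preceding window.

    window, k and min_sigma are illustrative assumptions, not values from the talk.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        # min_sigma keeps a completely flat history from producing a zero-width band
        sigma = max(stdev(history), min_sigma)
        if abs(series[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies

# Synthetic flow counts with one obvious spike at index 6.
flows_per_second = [100, 102, 98, 101, 99, 100, 400, 101, 99, 100]
print(find_anomalies(flows_per_second))
```

Note that the spike itself widens the band for the following samples, which is one reason real systems tend to group anomalies and treat each group separately.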

David Čermák, Josef Miegl: Next Generation Network Management

When it comes to network management, manual procedures are often used, but they have their limitations. Communication between teams can be problematic and changes in the network are often the domain of network administrators alone. That is why we have decided to transition to full automation, which should be service-oriented rather than solely focused on the network itself.
We use Netbox for infrastructure management. It's a datacenter management system that allows you to store information about your entire infrastructure in one place, and it's also very customizable with plugins so you can bend it to your purposes. The central point of automation is Ansible, which uses templates in Jinja. The entire workflow is implemented using Pipelines in GitLab, allowing us to preserve the historical state of the network. 
During the transition to automation you will find that devices from different vendors have different configuration syntax and that you still need to make some changes manually. For quick fixes, it is better to bypass pipelines and templates, but you then have to make sure that the stored configuration does not diverge from the actual state. A small solution can be improved incrementally instead of deploying a huge tool. Ultimately, the advantage of automation is that everyone can look into it and the network can then be handled by many more people than there are in the networking team.
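The principle of generating device configuration from an inventory record can be sketched as follows. This is only an illustration under stated assumptions: the record and the template are hypothetical, and Python's standard string.Template stands in for Jinja so that the example has no external dependencies.

```python
from string import Template

# Hypothetical inventory record, standing in for data that would come from Netbox.
interface = {"name": "eth1", "description": "uplink to core", "vlan": 120}

# Hypothetical template; in the workflow described above this role is played by Jinja.
TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " switchport access vlan $vlan\n"
)

def render(record):
    """Fill the template from one inventory record; raises KeyError if a field is missing."""
    return TEMPLATE.substitute(record)

config = render(interface)
print(config)
```

Keeping the rendered output in version control is what makes it possible to compare the intended state with what is actually running on the device.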

Alexander Zubkov: Roleplay in BGP - what the new RFC 9234 brings

BGP violations can be divided into two groups: leakage and hijacking. In the case of leakage, the path goes to a legitimate target that has permission to announce the prefix. In the case of hijacking, some other operator announces the path and the traffic is then routed to that operator. While leaks are caused by a configuration error, hijacking is usually part of an attack.
Networks on the Internet have a hierarchical model of relationships between providers, customers, peers, and route servers. A leak can occur, for example, when a customer starts acting as a provider for another provider. Existing solutions rely on filtering with BGP communities, but each operator must configure them carefully to avoid unexpected traffic and subsequent damage to the Internet. The new RFC 9234, work on which began in 2016 and which was published in 2022, adds roles to BGP sessions and allows routers to check that routing relationships are set up correctly. Unfortunately, it does not work for multicast, VPNs or confederations. Many free router implementations already support roles, as do development versions of tools such as tcpdump and Wireshark. Of course, there is still a lot of software that does not yet implement the new RFC.
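The role mechanism can be sketched in a few lines. The following is a simplified model of the rules as described in RFC 9234, not code from any router implementation: the function names are illustrative, and only the role pairing and the Only to Customer (OTC) attribute handling are modeled. The role names follow the RFC (provider, customer, peer, route server, route server client).

```python
from enum import Enum

class Role(Enum):
    PROVIDER = 0
    RS = 1
    RS_CLIENT = 2
    CUSTOMER = 3
    PEER = 4

# For each local role, the role the neighbor has to announce in its OPEN message.
EXPECTED_REMOTE = {
    Role.PROVIDER: Role.CUSTOMER,
    Role.CUSTOMER: Role.PROVIDER,
    Role.RS: Role.RS_CLIENT,
    Role.RS_CLIENT: Role.RS,
    Role.PEER: Role.PEER,
}

def roles_compatible(local, remote):
    """A session should only come up when both sides agree on the relationship."""
    return EXPECTED_REMOTE[local] == remote

def ingress(neighbor_role, neighbor_as, otc):
    """Return (accepted, otc) for a route received from a neighbor with the given role."""
    if otc is not None:
        if neighbor_role in (Role.CUSTOMER, Role.RS_CLIENT):
            return False, otc  # a customer must never send a route already marked OTC
        if neighbor_role is Role.PEER and otc != neighbor_as:
            return False, otc  # a peer may only send routes it marked itself
        return True, otc
    if neighbor_role in (Role.PROVIDER, Role.PEER, Role.RS):
        return True, neighbor_as  # mark the route on the way in
    return True, None

def egress(neighbor_role, local_as, otc):
    """Return (advertise, otc) for a route about to be sent to a neighbor."""
    if otc is not None:
        # Marked routes may only go "down" to customers and route server clients.
        return neighbor_role not in (Role.PROVIDER, Role.PEER, Role.RS), otc
    if neighbor_role in (Role.CUSTOMER, Role.PEER, Role.RS_CLIENT):
        return True, local_as  # mark the route on the way out
    return True, None

# AS 64500 learns a route from its provider AS 64496 and must not pass it to another provider.
accepted, otc = ingress(Role.PROVIDER, 64496, None)
print(accepted, otc)
print(egress(Role.PROVIDER, 64500, otc))
```

The AS numbers in the example come from the documentation range and are purely illustrative.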

Ondřej Vondrouš: FlowPing - a tool for active measurement of communication networks

FlowPing is a software traffic generator that can generate a specific traffic flow and continuously decrease or increase it. It can be controlled from the command line or load a configuration file with the complete test flow. It is also possible to define asymmetric links if we do not want to load both directions. The program can run in two modes: classic system wait and active mode, which uses at least one processor core. The output data is in JSON format and can be processed either individually or within a given time range. The program can measure delay, jitter, loss rate, duplicate packets, etc. It can also perform penetration tests and inject specific traffic. FlowPing is an open-source project licensed under GNU GPL 3 and is available on GitHub.

Petr Hemerka: Measuring Tools and Services of the MSEK system

MSEK stands for Measurement System for Electronic Communications and was built between 2018 and 2021. It is an important information system that carries legal obligations. It is operated in Prague, as close as possible to the peering nodes, and has its own ASN. The publicly available speed measurement tool is CTU - NetTest, which is available on the web and as an Android mobile app. As of September 2021, 1.5 million tests have been conducted, with 3,100 more being added daily. A non-public tool for CTU technicians is Exfo, which allows measurement of quality of service parameters at a fixed location up to 10 Gbps using portable terminals. For mobile network measurements, the 4drive-box is used to measure up to four operators on highways, rail corridors and in municipalities. It is used to measure radio and especially data parameters. It is built into a measuring vehicle that travels around the Czech motorways every year. The measurement servers are connected via a 10 Gbps link to NIX.CZ and transit. Measurement results are available on the CTU Portal, where development criteria have recently been added and more information will follow. Plans are underway to increase capacity to the NIX.CZ peering node as the number of tests increases.

Josef Grill: Anycast Network on All Continents

Wedos faces regular DDoS attacks that can saturate three 100GE lines. In 2022, the strongest attack in the history of the Czech Internet appeared, with a volume of hundreds of gigabits per second and a long duration. Therefore, Wedos started buying hardware and building an anycast network that expanded to 25 points around the world. Without this network, it is impossible to defend against major attacks. Wedos has its own autonomous system, AS208414, and is connected to several upstreams and local IXPs. Each of the 45 servers has a 40 Gbps uplink and is connected to each of the switches by a separate 10GE link. The WEDOS Global anycast network uses BGP anycast technology, has over 1500 physical servers and connectivity of over 2.5 Tbps. It provides protection at network layers three and four and protects sites at layer seven. The largest sources of attacks are filtered at the ingress.
In March, a total of 1.9 billion requests from 8.7 million unique IP addresses were logged on the network. 10.8 million requests were blocked by WAF. Most requests were handled from the Czech Republic, USA and Slovakia.

Petr Špaček: Latency Fluctuations on Authoritative DNS Servers

One of the users of the BIND DNS server performed an upgrade from version 9.11 to 9.16 and observed a significant random delay in some responses. Traditional latency measurement tools proved ineffective because they only give aggregate measurement information or are not designed to measure authoritative servers. The developers extended the dnsperf tool to allow detailed tabulation of results for individual time ranges. Scripts for the shotgun tool have been created to display the results. An AWS cloud environment was used for testing, which proved to be very consistent.
The measurements then revealed that about 0.02% of the queries are unstable and take a different amount of time to answer in each measurement. The graph showed that this was noise, not rate limiting or buffers. The user experiencing problems had a large number of zones and was using the catalog zone. After the same setup was recreated in the test environment, the behavior there looked normal, but the user was still having issues.
It turned out that the user was aggressively modifying the catalog zone with a script and a typo in the source code was creating a hash table that was too small, causing high latency. Fixing it helped, but the problem was not completely resolved. The remaining problem was that the newer version of BIND handles data in multiple buffers, and the kernel itself assigns packets to threads. The kernel did not know whether a thread was busy updating catalogs, which was an architectural problem. Version 9.11 is unsupported and version 9.16 will be out of support in six months. Lesson learned: test and pay attention to edge cases because averages can be deceiving, and what appears to be noise may turn out to be the most critical aspect.
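Why averages can be deceiving is easy to show with numbers. The data below is synthetic and chosen only to match the 0.02% share mentioned above; the 1 ms and 500 ms values and the 100 ms cut-off are assumptions for illustration, not measurements from the talk.

```python
from statistics import mean

# Synthetic data: 9,998 answers in 1 ms and 2 stalled answers at 500 ms (0.02% of queries).
latencies_ms = [1.0] * 9998 + [500.0] * 2

average = mean(latencies_ms)
slow = [x for x in latencies_ms if x > 100.0]
share_slow = len(slow) / len(latencies_ms)

print(f"mean latency: {average:.2f} ms")
print(f"queries slower than 100 ms: {len(slow)} ({share_slow:.2%})")
print(f"worst case: {max(latencies_ms):.0f} ms")
```

The mean moves by less than a tenth of a millisecond, so an aggregate view hides the stalls completely, while a per-query view shows answers that are hundreds of times slower than normal.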

Maciej Andziński: finding sites for .CZ DNS servers

Currently, DNS servers for the .CZ domain are located in 13 countries on five continents and we are constantly adding new servers. We measure latency using passive TCP handshake response measurements, which gives us accurate data for each source IP address. We combine this data with Internet infrastructure information obtained from PeeringDB to automate the process of generating a list of locations where it would be worthwhile to add a new server. The result is a simple table that shows a list of countries where it is worth adding another server. The latest version of this table shows the United States, India, and Hong Kong. Latency is only one metric, of course, but it helps to address situations where connections suffer from too long a response time. However, other parameters also matter, such as the number of packets served per second.

Kryštof Šádek: Launching (another) anycast scope for DNS

The original state included four "letters" that are used to handle traffic from the .CZ TLD. The D-letter servers were shared with hosted partner domains, which was not ideal for selective RTBH and load balancing. It was necessary to add information to the RIPE database for anycast with two ASNs and create records for reverse zones. In addition, ROA records needed to be created for the routes. Approximately 100 DNS servers from different vendors were configured, requiring 18 different configuration methods. Automation using Ansible and Netbox is essential, as is the use of the BIRD and FRR daemons to announce the servers using BGP. In the current state, we have six letters from A to F, with the first four serving the .CZ domain exclusively and the others also hosting additional domains.

Ondřej Caletka: deploying an IPv6-mostly network

We are still in the process of transitioning to IPv6 and the best transition mechanism is dual-stack, where IPv6 is added alongside IPv4. However, dual-stack does not address the shortage of IPv4 addresses, which is the main reason for deploying IPv6. NAT64 maps the IPv4 address space into IPv6 so that the end device needs only IPv6 connectivity, which works great except for a few edge cases. Mobile platforms are prepared for this situation, and operators can push individual vendors on how their devices behave in such a network. Since 2016, for example, Apple has not released any app that does not work on such a network. The edge cases are then handled by the Happy Eyeballs 2.0 algorithm and tethering is handled by the CLAT component. Android solves the problem with CLAT, which does not require any changes from application developers. The situation on the desktop is worse because Happy Eyeballs 2.0 only works on Apple and CLAT is not available on Windows, Linux or ChromeOS. RFC 8925 allows IPv6-only network deployments using the new DHCP option number 108, which is supported by current versions of Android, iOS, and macOS. Support for this option is widespread, with 74% of connected devices declaring it during the last RIPE meeting in Belgrade. macOS activates CLAT from version 13 onwards with no special requirements. Running such a network does not require any special support for option 108 in the DHCP server; it is enough to add an entry to the server configuration that sends this option. The RA PREF64 option has little support so far, but patches adding it are appearing for other implementations. This solution combines everything from a dual-stack network and an IPv6-only network, which makes it the most complicated one. Additionally, you still need IPv4 for legacy devices, and communication between IPv6-only and dual-stack devices can be problematic.
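Two of the building blocks mentioned above are simple enough to show as a sketch. The function names are illustrative; the constants are taken from RFC 8925 (option code 108 with a 32-bit wait time in seconds and a 300-second minimum applied by the client) and RFC 6052 (the well-known NAT64 prefix 64:ff9b::/96). Any real deployment may use a different, network-specific prefix.

```python
import ipaddress
import struct

OPTION_V6ONLY_PREFERRED = 108  # DHCPv4 option code defined by RFC 8925
MIN_V6ONLY_WAIT = 300          # seconds; clients raise smaller values to this minimum

def v6only_option(wait_seconds):
    """Encode DHCPv4 option 108 as code, length (4) and a 32-bit wait time."""
    return struct.pack("!BBI", OPTION_V6ONLY_PREFERRED, 4, wait_seconds)

def effective_wait(wait_seconds):
    """Wait time a client would actually use before retrying IPv4."""
    return max(wait_seconds, MIN_V6ONLY_WAIT)

def nat64_address(ipv4, prefix="64:ff9b::/96"):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    network = ipaddress.IPv6Network(prefix)
    embedded = int(network.network_address) | int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Address(embedded)

print(v6only_option(1800).hex())
print(effective_wait(60))
print(nat64_address("192.0.2.33"))
```

A client that understands option 108 can skip IPv4 configuration entirely and reach IPv4-only destinations through addresses synthesized in this way.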

Marian Rychtecký: Automated Deployment of Cisco NXOS

The original dual-star architecture was easy to maintain manually, but NIX.CZ decided to expand, which made it unsustainable. Therefore, we built a circular topology with a capacity of 400 GE and 40 links. Automation using command line commands was not powerful and reliable enough, so we chose Cisco DME, which allows for quick configuration of devices and provides a lot of useful data for further processing. The DME works as an object tree, similar to SNMP, that interprets options into individual objects and translates them into the CLI. Initially this seemed like a complicated path, but we ended up creating a custom Python library containing the template translated into JSON objects. It took us about a month. It is now possible to turn on various properties using this library, but some things still need to be entered directly in the command line. So after unpacking a new feature, we have to enter four or five commands in the console before we can configure it automatically.
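The object tree idea can be sketched as nested JSON. This is not the NIX.CZ library: the function name is hypothetical, and the class names topSystem, interfaceEntity and l1PhysIf with the attributes id and descr are given as I recall them from the NX-API REST object model, so they should be treated as assumptions and checked against the documentation for the NX-OS release in use. The sketch only builds the payload and does not contact any device.

```python
import json

def interface_payload(interface_id, description):
    """Build a nested object-tree payload that sets an interface description.

    Class and attribute names are assumptions to be verified against the
    vendor documentation.
    """
    return {
        "topSystem": {
            "children": [{
                "interfaceEntity": {
                    "children": [{
                        "l1PhysIf": {
                            "attributes": {"id": interface_id, "descr": description}
                        }
                    }]
                }
            }]
        }
    }

payload = interface_payload("eth1/1", "uplink to core")
print(json.dumps(payload, indent=2))
```

The attraction of this model is that a template becomes plain data, which can be generated, compared and validated before anything is sent to a switch.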

May 17

Tomas Podermański: the limits of the Linux kernel network subsystem scheduler

A packet passes through the Linux kernel network subsystem and produces a response. During routing, the packet is turned around in the kernel and leaves through an interface. The network card raises an interrupt, fills the ring buffer, and a chain of processing functions is executed. The packet scheduler queues the packets and schedules their further processing. The scheduler can be controlled using the tc command or by setting the net.core.dev_weight property using sysctl. A multicore system complicates the situation. Modern network cards are smart enough to distribute incoming packets between different buffers. This way we can parallelize processing very well, except for the packet scheduler, which only runs in a single thread. If you want to limit traffic, there is a counting problem because you do not know which queue has handled how much traffic. The constant locking of the scheduler costs a lot of performance, and throughput is reduced by an order of magnitude. This can lead to large delays for clients and an inability to achieve the contracted speed. The MikroTik platform is in the same boat: if you use anything other than sfq, you will drop to around one million packets per second. The packet scheduler made it into the Linux kernel in 1997 and has remained virtually unchanged since then, which is remarkable considering how much network speeds have increased. There was an attempt to update it in 2015, but it turned out to be such a complex problem that all the modifications were removed after the fact. It is not possible to improve performance in the current implementation, so a new parallel implementation is needed. If you are using Linux as a server, leave the mq discipline enabled. If you want to use Linux as a router, it is better to get a processor with fewer cores and a higher clock speed and not use a single-threaded interface for VLANs.
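The counting problem can be illustrated with a token bucket, a common way to limit traffic. This is a simplified sketch and not the kernel's implementation; the class name and parameters are illustrative. Because the limit applies to the sum of all queues, the counter is shared, and every packet from every queue has to take the same lock.

```python
import threading

class TokenBucket:
    """One shared rate limiter; all receive queues contend for the same lock."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)
        self.last = 0.0
        self.lock = threading.Lock()

    def allow(self, packet_len, now):
        """Return True if the packet fits into the budget at time `now` (seconds)."""
        with self.lock:  # the serialization point shared by all CPU cores
            elapsed = now - self.last
            self.last = now
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
            if packet_len <= self.tokens:
                self.tokens -= packet_len
                return True
            return False

bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
print(bucket.allow(1500, now=0.0))  # burst budget is available
print(bucket.allow(1500, now=0.5))  # only 500 bytes have been refilled
print(bucket.allow(1500, now=1.5))  # budget is full again
```

Giving each queue its own bucket would remove the lock, but then no single counter would know the total, which is exactly the trade-off described above.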

Ondřej Blažek: eBPF/XDP and L4 loadbalancer

The original BPF has been with us since 1992 and its extended form is now called eBPF. It is an efficient way of extending the kernel that does not compromise its stability. The code is verified and compiled into native instructions with a limit of one million instructions. With eBPF, you can monitor network traffic, syscalls, and other parts of the kernel. There are various network hook points that you can attach to and process data. The lowest point is XDP, which is closest to the network card and allows you to work directly with memory space. Another option is the tc hook and attaching to sockets. The advantage of eBPF is that the packet stays in kernel space and there is no need to dedicate a processor core. This approach is compatible with the TCP/IP stack and is supported by most manufacturers of 10-gigabit and faster cards. Ideally, eBPF should be used for DDoS protection or load balancing.

Pavel Mráček, Tomáš Procházka: distributed NVME storage

Seznam.cz currently operates in three datacentres, housing 12 thousand physical servers with Intel Xeon Silver or AMD EPYC processors and 192 or 256 GB memory with 10 or 25 GE connectivity. They tackled the issue of port congestion during random data reads through testing and implementing QoS and scheduling on the server ports. As an additional measure, they deployed ECN (Explicit Congestion Notification). This does not drop the packet, but adds congestion information to the IP header. Cisco also supports AFD (Approximate Fair Drop) to reserve a certain portion of the buffer for specific flows. Similarly, DPP (Dynamic Packet Prioritization) can be deployed to allow smaller flows to be prioritized into a faster queue.
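How ECN adds congestion information to the IP header can be shown on the IPv4 TOS byte, whose two lowest bits carry the ECN field (RFC 3168). The sketch below is a simplified model of what a congested queue does; the function name is illustrative and it does not reflect any particular switch implementation.

```python
ECN_MASK = 0b11  # the two lowest bits of the TOS / Traffic Class byte
NOT_ECT = 0b00   # sender does not support ECN
ECT_1 = 0b01     # ECN-capable transport, codepoint 1
ECT_0 = 0b10     # ECN-capable transport, codepoint 0
CE = 0b11        # congestion experienced

def mark_congestion(tos):
    """Return (new_tos, dropped) for a packet hitting a congested queue."""
    if (tos & ECN_MASK) == NOT_ECT:
        return tos, True   # no ECN support: the only available signal is a drop
    return tos | CE, False  # ECN-capable: set CE and keep the DSCP bits untouched

print(mark_congestion(ECT_0))
print(mark_congestion(NOT_ECT))
```

The receiver echoes the mark back to the sender, which then slows down without any packet having been lost.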

Maria Matějka: BIRD 3: Already or not yet?

BIRD 3 is a new version that offers more performance than BIRD 2. However, there are still some problems, such as slower restart and communication with netlink. You can try to deploy BIRD 3, but you need to be aware of the changes in command line control. The developers are working on other improvements for both versions, including support for BGP Roles and the SNMP AgentX plugin. Support for BIRD version one will be discontinued at the end of 2023, because the developers no longer have the first version's code in their heads and have to study it again for any change. Now is a good time to upgrade to BIRD 2.13 or a reasonably current version. Kubernetes Calico also uses BIRD 1.6.8 internally, which someone will also have to sort out.

Jan Kolouch: NIS 2 and its transposition in the Czech Republic

The NIS 2 Directive was published in December 2022 and came into effect on 16 January 2023. Member States have 21 months to implement it in their own legislation. It is expected to be ready by October 2024. All implementing decrees, as well as the existing Cybersecurity Act, will be repealed. The new directive will regulate medium and large enterprises and selected services. The regulation will take place in two modes: higher and lower obligations. The National Cyber and Information Security Agency (NÚKIB) received thousands of comments on the law, but there are still problems with definitions and reporting under the different regulations. Probably the biggest problem is that the law works with the concept of significant impact, but nowhere defines how the degree of such impact is to be determined. Currently, there is a need to report under various regulations: GDPR, DORA, ENISA and others. NÚKIB argues that cooperation between authorities is needed and that it will take time. Similarly, the situation in which a system outage prevents the submission of reports has not been addressed. We are still at the beginning and it is still possible to comment and influence the outcome.

Jonáš Papoušek: coordinated vulnerability disclosure in the Czech Republic

CVD is a formalized process that allows ethical hackers to share information about vulnerabilities with the responsible organization. There are several ways to conduct ethical vulnerability discovery, but they can come with legal challenges. CVD should be built on accountability and the organization should evaluate whether it wants to deploy it. The advantages of CVD are that it is inexpensive, legal, and increases product security. Implementing CVD is a task for NÚKIB as part of the action plan for the National Cybersecurity Strategy 2021-2025. Other countries have different approaches to protecting vulnerability discoverers. The Czech Republic is trying to find an effective way within the framework of current legal standards.

Zbyněk Pospíchal: IoD

The Internet of Things (IoT) is a breeding ground for many security weaknesses. This area has a certain subset - remotely controlled erotic gadgets - which are as badly off or worse off in terms of security than the IoT as a whole. The first such devices appeared in the 1990s, when security was not yet an issue. Since these devices now use Bluetooth and can then be controlled over the internet, it is possible to misuse such devices to collect personal data, but it is also possible to monetise them. One example is a webcam vibrator that had Wi-Fi and a known default password. Similarly, one manufacturer collected data on device usage without the users' knowledge. Lawsuits were filed and one of the users received compensation of four million dollars. The thing to remember is that lost company data may not be the worst thing that can happen to you.

Ondřej Filip: Root Zone KSK Ceremony

DNS was not initially designed to be a secure protocol, much like many other protocols. Later, the DNSSEC protocol was created, which works and is widely used today. Then we started signing individual zones, starting with national zones. For DNSSEC to work globally, we need a so-called trust anchor - a point of trust. In this case, it is the root zone. The process of signing the root zone was discussed extensively and eventually the roles were split between IANA and Verisign. The signing process is very transparent and takes place four times a year in a secure room with smart cards. Transparency is essential and the whole process is very robust because it is a sensitive operation on a database that contains data from every country in the world. Community representatives are selected on the basis of reputation, knowledge of DNS and DNSSEC technology and cultural or geographical diversity.

Blažej Krajňák: the DDR mechanism in practice

DDR stands for Discovery of Designated Resolvers and allows you to discover information about encryption support when communicating with DNS resolvers. The way it works is that once the client gets the addresses of DNS resolvers, for example from DHCP, it queries each of them for a special _dns.resolver.arpa record that contains the necessary encryption information. The client then attempts to connect over this encrypted connection, and communication must be established using a valid TLS certificate issued to the actual IP address. This implies that the device must have a public IP address.
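The discovery query itself is an ordinary DNS question, which can be sketched with the standard library alone. The function names and the query ID are illustrative; the record type used for the special name is SVCB (type 64), as specified for DDR in RFC 9462. The sketch only builds the wire-format query and does not send it anywhere.

```python
import struct

SVCB_TYPE = 64  # record type queried by DDR clients
IN_CLASS = 1

def encode_name(name):
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        encoded += bytes([len(label)]) + label.encode("ascii")
    return encoded + b"\x00"

def ddr_query(query_id=0x1234):
    """Build a DNS query for _dns.resolver.arpa, type SVCB, class IN."""
    # Header: ID, flags (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name("_dns.resolver.arpa") + struct.pack("!HH", SVCB_TYPE, IN_CLASS)
    return header + question

packet = ddr_query()
print(len(packet), packet.hex())
```

A client would send this packet to each resolver address it learned, for example over UDP port 53, and then try the encrypted endpoints listed in the answer.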