This is not too different from wpa_supplicant, used by several operating systems for key management for wireless networks. The complicated key negotiation and authentication can remain in user space, while the encryption with the negotiated key can be done in the kernel (kTLS) or, when eBPF can control both sides, it can even be done without using TLS at all, by encrypting with a network-level encapsulation format so it works for non-TCP traffic as well.
The model I'm describing contains two pieces:
1) Moving away from sidecars to per-node proxies that can be better integrated into the Linux kernel concept of namespacing instead of artificially injecting them with complicated iptables redirection logic at the network level.
2) Providing the HTTP awareness directly with eBPF using eBPF-based protocol parsers. The parser itself is written in eBPF which has a ton of security benefits because it runs in a sandboxed environment.
We are doing both. Aspect 2) is currently done for HTTP visibility and we will be working on connection splicing and HTTP header mutation going forward.
Bounded loops plus the 1M instruction limit in the 5.4 kernel (I don't have the exact version at hand) give a large range of supported headers. Also note that this BPF code operates at the network level, so it is subject to the MTU limit as well, which is usually 1500 bytes but can now be tens of KBs (65,535 bytes maximal in theory according to https://www.lifewire.com/definition-of-mtu-817948, but my networking knowledge is poor). This makes it possible to effectively handle all possible headers.
HTTP is actually fine.
HTTP/2 will be a bigger issue, as it has HPACK and Huffman coding, which would be very complicated to maintain inside the BPF runtime. I haven't thought about it closely yet, but based on our experience at http://px.dev, I am not aware of any glaring technical obstacles.
This is interesting and all, but I've also written bounded loop BPF code on 5.6 kernels, and it is not easy to get the verifier to accept seemingly obvious loops. I'm not saying it's impossible, I'm saying I'd like to see what this code actually looks like. I'd be a little shocked if it just looked exactly like Node's HTTP parser.
I need to double-check when the bounded loop patch got into the kernel; I suppose it's 5.6 as you mentioned above.
What I was actually thinking is that one can write C code and ask the compiler to unroll it:
```
#pragma unroll
for (int i = 0; i < 100; ++i) {
    /* parsing code */
}
```
Also, the other comment notes the state bookkeeping needed for HTTP to maintain the parsing state when it spans multiple packets, assuming here we are talking about XDP probes.
One quick idea is to use BPF_TABLE(, uint128_t, some data structure)
I haven't tested whether uint128_t is OK as a key type, and the data structure in the value needs more thought. Roughly, I am thinking of turning any state bookkeeping into BPF tables, keyed by whatever data matches the context. This probably means uint128_t as the IPv4/IPv6 address, and a nested map with the port as key. Or a combined v4 IP & port.
It'll be interesting. I suppose the code from Isovalent will eventually be open sourced. Or is it already so? Haven't checked yet.
Bounded loops are 5.3. I'm just saying that after like 9 months of development following their introduction, it remained tricky to get the verifier to accept loops with seemingly obvious bounds. I know the feature works (I did ultimately get some loops working!) but I could not have straightforwardly ported userland C code to do it.
You've always been able to unroll loops, but of course you're chewing up code space doing that.
I don't know what BPF_TABLE is (I think it's a BCC-ism?) but BPF hash maps can take 16 byte keys. But notice that you're now writing something that looks nothing at all like Node's HTTP parser.
I'm not doubting that they did this work. I just want to know what it ends up looking like!
Another challenge I can see is where to actually store the state of a connection. Even if we just focus on HTTP/1.1, not all headers will be received at once, and data from previous segments needs to be carried forward. Would it be eBPF maps? Those also seem rather limited for this use case, and are probably also not extremely fast.
I can imagine getting something to work for HTTP/1.1, but HTTP/2 with multiplexing and stateful header compression is a completely different beast.
If you are in a position where you can do that then great. Most folks out there are in a position where they need to run arbitrary applications delivered by vendors without an ability to modify them.
The second aspect is that this can get extremely expensive if your applications are written in a wide number of language frameworks. That's obviously different at Google where the number of languages can be restricted and standardized.
But even then, you could also link a TCP library into your app. Why don't you?
One of the methods that Cilium (which implements this eBPF-based service mesh idea) uses to implement authentication between workloads is WireGuard. It does exactly what you describe above.
In addition, it can enforce policies based on service-specific keys/certificates as well.
It can do both. It can authenticate and encrypt all traffic between nodes, which then also encrypts all traffic between the pods running on those nodes. This is great because it also covers pod-to-node and all control plane traffic. The encryption can also use specific keys for different services to authenticate and encrypt pod-to-pod traffic individually.
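For reference, enabling this in Cilium is a Helm values fragment along these lines (option names per the Cilium docs; verify them against your Cilium version):

```yaml
# Transparent WireGuard encryption between Cilium-managed nodes and pods.
encryption:
  enabled: true
  type: wireguard
```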
What the proposed architecture allows is to continue using SPIFFE or another certificate management solution to generate and distribute the certificates but use either a per-node proxy or an eBPF implementation to enforce it. Even if the authentication handshake remains in a proxy but data encryption moves to the kernel then that is a massive benefit from an overhead perspective. This already exists and is called kTLS.
Many people seem to make an assumption that kernel code is perfect and that when code is merged into the Linux kernel, it is automatically secure. That is definitely not the case. Kernel developers make mistakes as well and they have devastating consequences.
Right now, the security of the Linux kernel code depends on a combination of code review, fuzzing, controlling the pace of code changes, and running LTS releases to increase the chance others found the bugs already.
eBPF further strengthens the security model of kernel development by adding a verification step. It means that there is an additional layer of protection in case of code imperfections.
The focus on eBPF safety is awesome. eBPF is software, and software has bugs; eBPF is no exception. The best way to improve the security of software is to question it. Given the widespread use of eBPF in highly critical and exposed scenarios, the pressure to make it as bug-free as possible is very high, so it's probably fair to assume that the scrutiny in place will lead to a high-quality implementation of the verifier.
Another good way to get started is https://ebpf.io/. It features pointers that go beyond just networking and also cover use cases involving tracing, profiling, security, ...
The shift from BPF to eBPF was less of an evolutionary step as the name might indicate. The overlap with the name BPF is primarily due to the requirement for eBPF to be a superset of BPF in order to avoid having to maintain two virtual machines long-term. This was one of the conditions for eBPF to be merged and in that context, the name eBPF made sense.
Disagree (see sibling post). Classic BPF could have been translated into any virtual machine design they came up with (because classic BPF is incredibly simple). When McCanne came up with the same design in 1998, his team called it "BPF+", for the same reason eBPF is called eBPF --- because it is pretty much an evolution of the earlier idea.
To be clear: the dispute over the history of BPF/eBPF is not interesting, and I don't want to litigate it any more than they do.
I'm just here to say that eBPF and BPF are in fact pretty closely related. The eBPF design is uncannily similar to Begel, McCanne, and Graham's BPF+ design[1]; in particular, the BPF+ paper spends a fair amount of time describing an SSA-based compiler for a RISC-y register ISA, and eBPF... just uses (at this point) LLVM for a RISC-y register ISA.
Most notably, the fundamental execution integrity model has, until pretty recently, remained the same --- forward jumps only, limited program size. And that's to me the defining feature of the architecture.
The lineage isn't important to me, so much as the sort of continuous unbroken line from BPF to eBPF, regardless of what LKML says.
I'm not going to spam this forum with a marketing pitch so I'll just refer to https://www.isovalent.com/product and add that you can buy a Cilium Enterprise distribution with enterprise specific add-ons from us.
The Future of Networking? Networking is not only Linux. eBPF is Linux-only. Everyone else uses the secure variant dTrace, which even has widespread user-space support, so you can trace across the kernel, processes, and their extensions/scripts. For decades.
Future of Security? eBPF is insecure. User-accessible arrays in the kernel can never be secure. dTrace did not do that for a reason; it was already compromised by the Spectre-like attacks, and the fixes were laughable at best, done to save face.
Linux might be advised to do better (or is it just NIH?), but advertising Worse as Better was fashionable in the '80s only.
I personally think that networking will be almost exclusively based on Linux in some form. If you want to interpret it as "eBPF - The Future of Linux Networking" then that is totally fine as well. That said, eBPF-based networking can already be offloaded to SmartNICs, so it may be less Linux-specific than you seem to assume right now.
Comparing dTrace and eBPF is definitely a very interesting question. I've actually asked Brendan Gregg in the Q&A of his keynote at eBPF summit this year how he compares dTrace and eBPF these days. Here is his answer (jumps right to the specific question):
https://youtu.be/jw8tEPP6jwQ?t=4618
I doubt that eBPF will remain a Linux-only technology. Ports to FreeBSD are already underway it seems [0] and Microsoft declared intent to invest into eBPF [1]. I'm not sure what that means on timeline for eBPF availability on Windows though. There are also several user space implementations for eBPF which could become interesting to provide a universal programmability approach across traditional kernels like Linux, microkernels like Snap and application kernels like gVisor.
Hint: We are hiring.