This is not too different from wpa_supplicant, used by several operating systems for key management for wireless networks. The complicated key negotiation and authentication can remain in user space, while the encryption with the negotiated key can be done in the kernel (kTLS) or, when eBPF can control both sides, it can even be done without using TLS at all, by encrypting with a network-level encapsulation format so it works for non-TCP traffic as well.
The model I'm describing contains two pieces:
1) Moving away from sidecars to per-node proxies that can be better integrated into the Linux kernel concept of namespacing instead of artificially injecting them with complicated iptables redirection logic at the network level.
2) Providing the HTTP awareness directly with eBPF using eBPF-based protocol parsers. The parser itself is written in eBPF which has a ton of security benefits because it runs in a sandboxed environment.
We are doing both. Aspect 2) is currently done for HTTP visibility and we will be working on connection splicing and HTTP header mutation going forward.
Bounded loops plus the 1M instruction limit in the 5.4 kernel (I don't have the exact version at hand) give a large range of supported headers. Also note that this BPF code operates at the network level, so it is subject to the MTU limit as well, which is usually 1500 bytes but can now be tens of KBs (65,535 bytes maximal in theory according to https://www.lifewire.com/definition-of-mtu-817948, but my networking knowledge is poor). This makes it possible to effectively handle all possible headers.
HTTP is actually fine.
HTTP/2 will be a bigger issue, as it has HPACK and Huffman coding, which would be very complicated to maintain inside the BPF runtime. I haven't thought about it closely yet, but based on our experience at http://px.dev, I am not aware of any glaring technical obstacles.
This is interesting and all, but I've also written bounded loop BPF code on 5.6 kernels, and it is not easy to get the verifier to accept seemingly obvious loops. I'm not saying it's impossible, I'm saying I'd like to see what this code actually looks like. I'd be a little shocked if it just looked exactly like Node's HTTP parser.
I need to double-check when the bounded loop patch got into the kernel; I suppose it's 5.6 as you mentioned above.
What I was actually thinking is that one can write C code and ask the compiler to unroll it:
```
#pragma unroll
for (int i = 0; i < 100; ++i) {
    /* parsing code */
}
```
Also, the other comment notes the state bookkeeping needed for HTTP to maintain the parsing state when it spans multiple packets, assuming here we are talking about XDP probes.
One quick idea is to use BPF_TABLE(, uint128_t, some data structure)
I haven't tested whether uint128_t is OK as a key type, and the data structure in the value needs more thought. Roughly, I am thinking of turning any state bookkeeping into BPF tables, keyed by whatever data matches the context. This probably means uint128_t as the IPv4/IPv6 address, and a nested map with the port as key. Or a combined v4 IP & port.
It'll be interesting. I suppose the code from Isovalent will eventually be open sourced. Or is it already so? Haven't checked yet.
Bounded loops are 5.3. I'm just saying that after like 9 months of development following their introduction, it remained tricky to get the verifier to accept loops with seemingly obvious bounds. I know the feature works (I did ultimately get some loops working!) but I could not have straightforwardly ported userland C code to do it.
You've always been able to unroll loops, but of course you're chewing up code space doing that.
I don't know what BPF_TABLE is (I think it's a BCC-ism?) but BPF hash maps can take 16 byte keys. But notice that you're now writing something that looks nothing at all like Node's HTTP parser.
I'm not doubting that they did this work. I just want to know what it ends up looking like!
Another challenge I can see is where to actually store the state of a connection. Even if we just focus on HTTP/1.1, not all headers will be received at once, and data from previous segments needs to be carried forward. Would it be eBPF maps? Those also seem rather limited for this use case, and are probably also not extremely fast.
I can imagine getting something to work for HTTP/1.1, but HTTP/2 with multiplexing and stateful header compression is a completely different beast.
If you are in a position where you can do that then great. Most folks out there are in a position where they need to run arbitrary applications delivered by vendors without an ability to modify them.
The second aspect is that this can get extremely expensive if your applications are written in a wide number of language frameworks. That's obviously different at Google where the number of languages can be restricted and standardized.
But even then, you could also link a TCP library into your app. Why don't you?
One of the methods that Cilium (which implements this eBPF-based service mesh idea) uses to implement authentication between workloads is WireGuard. It does exactly what you describe above.
In addition, it can enforce policies based on service-specific keys/certificates as well.
It can do both. It can authenticate and encrypt all traffic between nodes, which then also encrypts all traffic between the pods running on those nodes. This is great because it also covers pod-to-node and all control plane traffic. The encryption can also use specific keys for different services to authenticate and encrypt pod-to-pod traffic individually.
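For reference, enabling this in Cilium is a Helm values fragment along these lines (option names per the Cilium docs; verify them against your Cilium version):

```yaml
# Transparent WireGuard encryption between Cilium-managed nodes and pods.
encryption:
  enabled: true
  type: wireguard
```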
What the proposed architecture allows is to continue using SPIFFE or another certificate management solution to generate and distribute the certificates but use either a per-node proxy or an eBPF implementation to enforce it. Even if the authentication handshake remains in a proxy but data encryption moves to the kernel then that is a massive benefit from an overhead perspective. This already exists and is called kTLS.
Many people seem to make an assumption that kernel code is perfect and that when code is merged into the Linux kernel, it is automatically secure. That is definitely not the case. Kernel developers make mistakes as well and they have devastating consequences.
Right now, the security of the Linux kernel code depends on a combination of code review, fuzzing, controlling the pace of code changes, and running LTS releases to increase the chance others found the bugs already.
eBPF further strengthens the security model of kernel development by adding a verification step. It means that there is an additional layer of protection in case of code imperfections.
The focus on eBPF safety is awesome. eBPF is software, and software has bugs; eBPF is no exception. The best way to improve the security of software is to question it. Given the widespread use of eBPF in highly critical and exposed scenarios, the pressure to make it as bug-free as possible is very high, so it's probably fair to assume that the scrutiny in place will lead to a high-quality implementation of the verifier.
Another good way to get started is https://ebpf.io/. It features pointers that go beyond just networking and also cover use cases involving tracing, profiling, security, ...
The shift from BPF to eBPF was less of an evolutionary step as the name might indicate. The overlap with the name BPF is primarily due to the requirement for eBPF to be a superset of BPF in order to avoid having to maintain two virtual machines long-term. This was one of the conditions for eBPF to be merged and in that context, the name eBPF made sense.
Disagree (see sibling post). Classic BPF could have been translated into any virtual machine design they came up with (because classic BPF is incredibly simple). When McCanne came up with the same design in 1998, his team called it "BPF+", for the same reason eBPF is called eBPF --- because it is pretty much an evolution of the earlier idea.
To be clear: the dispute over the history of BPF/eBPF is not interesting, and I don't want to litigate it any more than they do.
I'm just here to say that eBPF and BPF are in fact pretty closely related. The eBPF design is uncannily similar to Begel, McCanne, and Graham's BPF+ design[1]; in particular, the BPF+ paper spends a fair amount of time describing an SSA-based compiler for a RISC-y register ISA, and eBPF... just uses (at this point) LLVM for a RISC-y register ISA.
Most notably, the fundamental execution integrity model has, until pretty recently, remained the same --- forward jumps only, limited program size. And that's to me the defining feature of the architecture.
The lineage isn't important to me, so much as the sort of continuous unbroken line from BPF to eBPF, regardless of what LKML says.
I'm not going to spam this forum with a marketing pitch so I'll just refer to https://www.isovalent.com/product and add that you can buy a Cilium Enterprise distribution with enterprise specific add-ons from us.
The Future of Networking? Networking is not only Linux. eBPF is Linux-only. Everyone else uses the secure variant dTrace, which even has widespread user-space support, so you can trace across the kernel, processes, and their extensions/scripts. For decades.
Future of Security? eBPF is insecure. User-accessible arrays in the kernel can never be secure. dTrace did not do that for a reason; it was already compromised by the Spectre-like attacks, and the fixes were laughable at best, done to save face.
Linux might be advised to do better (or is it just NIH?), but advertising Worse as Better was fashionable in the '80s only.
I personally think that networking will be almost exclusively based on Linux in some form. If you want to interpret it as "eBPF - The Future of Linux Networking" then that is totally fine as well. That said, eBPF-based networking can already be offloaded to SmartNICs, so it may be less Linux-specific than you seem to assume right now.
Comparing dTrace and eBPF is definitely a very interesting question. I've actually asked Brendan Gregg in the Q&A of his keynote at eBPF summit this year how he compares dTrace and eBPF these days. Here is his answer (jumps right to the specific question):
https://youtu.be/jw8tEPP6jwQ?t=4618
I doubt that eBPF will remain a Linux-only technology. Ports to FreeBSD are already underway it seems [0] and Microsoft declared intent to invest into eBPF [1]. I'm not sure what that means on timeline for eBPF availability on Windows though. There are also several user space implementations for eBPF which could become interesting to provide a universal programmability approach across traditional kernels like Linux, microkernels like Snap and application kernels like gVisor.
Hint: We are hiring.