
Neat! Is your tool open source?

It'd be nice if perf record had a fundamentally faster way of working. I found a nice description of how it works in the README.md for cargo-trace: "perf relies on perf_event_open_sys to sample the stack. Every time a sample is taken, the entire stack is copied into user space. Stack unwinding is performed in user space as a post processing step. This wastes bandwidth and is a security concern as it may dump secrets like private keys."

cargo-trace is apparently dormant now, but I found it really interesting. It does the unwinding via eBPF instead, which should be quicker while recording, not generate as much (sensitive) data, and not require as much post-processing. (Symbolization would still happen in post-processing.)



I'll ask about open-sourcing the tool. But just in case, the recipe is to use pipe mode and pre-parse all frames, then stream them as messages, sometimes to several targets (pub/sub), with some streaming zstd. It also splits the pmu/probes/Intel-PT streams and treats them separately. Stack traces are analysed (using a precomputed, cfg-optimised structure so unwinding is faster) before being stored in an ad hoc in-house format with all the other system traces. The only annoying thing is that changing perf-record settings (pid changes, need event X, new probe) means a restart, and I ran out of interns before we had no-loss switchover...


Sounds more specialized than I was imagining but a cool system.

The idea of a more efficient compressed encoding seems generally applicable. I imagine just piping through zstd would be an improvement over plain perf record directly to a file, but it sounds like your tool's splitting makes zstd more effective. It'd be handy to be able to just do perf record ... | fancy-recompress > out, and even better to upstream the format improvement into perf itself. I feel you on "ran out of interns"; there's always more to do...
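To make the "pipe it through a compressor" idea concrete, here is a minimal sketch of what a filter like the hypothetical fancy-recompress could look like. It is not the tool discussed above. It uses Python's stdlib zlib as a stand-in for zstd (which is not in the standard library on most Python versions), and the function name, chunk size, and compression level are assumptions chosen for illustration.

```python
import zlib


def recompress(src, dst, chunk_size=64 * 1024, level=6):
    """Stream bytes from src to dst through a compressor.

    src and dst are binary file-like objects. zlib stands in for zstd
    here; the streaming structure would be the same with a zstd binding.
    Returns the number of uncompressed bytes read.
    """
    comp = zlib.compressobj(level)
    total_in = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break  # end of input
        total_in += len(chunk)
        # compress() may return b"" while it buffers; writing that is harmless
        dst.write(comp.compress(chunk))
    # flush() emits whatever the compressor is still holding
    dst.write(comp.flush())
    return total_in
```

Called with sys.stdin.buffer and sys.stdout.buffer, this would behave as a shell filter sitting after perf record in pipe mode. It never holds more than one chunk of input in memory, so it can keep up with a long recording.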


Well, it started working even better when I separated the streams and compressed PMC, Intel-PT, and syscalls/dynamic probes separately.

But yes, in a pinch piping to zstd has far less overhead than writing directly to disk.
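For anyone wanting to experiment with the stream-separation idea, a rough sketch follows. It is only an illustration, not the in-house tool: it assumes the records have already been parsed into (stream name, payload) pairs, uses stdlib zlib in place of zstd, and invents a simple 4-byte length prefix as the framing. The stream names "pmu", "pt", and "probe" are hypothetical labels.

```python
import zlib
from collections import defaultdict


def split_and_compress(records, level=6):
    """Demultiplex (stream_name, payload) pairs and compress each stream
    with its own compressor, so similar records share one history window.

    Returns {stream_name: compressed_bytes}.
    """
    compressors = {}
    chunks = defaultdict(list)
    for name, payload in records:
        comp = compressors.get(name)
        if comp is None:
            comp = compressors[name] = zlib.compressobj(level)
        # Hypothetical framing: 4-byte little-endian length, then payload
        framed = len(payload).to_bytes(4, "little") + payload
        chunks[name].append(comp.compress(framed))
    for name, comp in compressors.items():
        chunks[name].append(comp.flush())
    return {name: b"".join(parts) for name, parts in chunks.items()}


def decompress_stream(blob):
    """Inverse of one stream: returns the list of payloads in order."""
    data = zlib.decompress(blob)
    payloads = []
    pos = 0
    while pos < len(data):
        size = int.from_bytes(data[pos:pos + 4], "little")
        pos += 4
        payloads.append(data[pos:pos + size])
        pos += size
    return payloads
```

The intuition is that counter samples, Intel-PT packets, and probe events look nothing like each other, so interleaving them dilutes the compressor's context. Whether the gain is worth the extra bookkeeping depends on the workload, so it is worth measuring on real captures.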



