
But Rust works badly with mmapped (memory-mapped) files, as the article notes. So in C you could load (and save!) stuff almost instantly, whereas in Rust you still have to de-serialize the input stream.


No you don't. I've written multiple programs that load things instantly off the file system via memory maps. See the fst crate[1], for example, which is designed to work with memory maps. imdb-rename[2] is a program I wrote that builds a simple IR index on your file system that can then instantly search it by virtue of memory maps.

Rust "works badly with memory mapped files" doesn't mean, "Rust can't use memory mapped files." It means, "it is difficult to reconcile Rust's safety story with memory maps." ripgrep for example uses memory maps because they are faster sometimes, and its safety contract[3] is a bit strained. But it works.

[1] - https://github.com/BurntSushi/fst/

[2] - https://github.com/BurntSushi/imdb-rename

[3] - https://docs.rs/grep-searcher/0.1.7/grep_searcher/struct.Mma...


I didn't read your code but one problem I suspect you ran into is that you had to re-invent your container data structures to make them work in a mmapped context.


No, I didn't. An fst is a compressed data structure, which means you use it in its compressed form without decompressing it first. If you ported the fst crate to C, it would use the same technique.

And in C, you have to design your data structures to be mmap friendly anyway. Same deal in Rust.

But this is moving the goal posts. This thread started with "you can't do this." But you can. And I have. Multiple times. And I showed you how.
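As a rough illustration of using a structure directly in its on-disk form, here is a minimal sketch that binary-searches sorted fixed-width records inside a byte slice without decoding anything up front. This is not the fst format; the record layout (8-byte little-endian key, then 8-byte little-endian value) and the names `lookup` and `encode` are assumptions made up for the example. The slice could just as well come from a memory map.

```rust
use std::convert::TryInto;

/// Width of one record: an 8-byte little-endian key, then an 8-byte little-endian value.
/// (Hypothetical layout chosen for this sketch.)
const RECORD_LEN: usize = 16;

/// Look up `key` in `data`, which is assumed to hold records sorted by key.
/// Only the records touched by the binary search are ever decoded.
pub fn lookup(data: &[u8], key: u64) -> Option<u64> {
    if data.len() % RECORD_LEN != 0 {
        return None; // malformed input: refuse rather than guess
    }
    let (mut lo, mut hi) = (0usize, data.len() / RECORD_LEN);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        let rec = &data[mid * RECORD_LEN..(mid + 1) * RECORD_LEN];
        let key_bytes: [u8; 8] = rec[..8].try_into().ok()?;
        let found = u64::from_le_bytes(key_bytes);
        if found == key {
            let value_bytes: [u8; 8] = rec[8..].try_into().ok()?;
            return Some(u64::from_le_bytes(value_bytes));
        } else if found < key {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    None
}

/// Helper that builds sample input for the sketch; the caller passes sorted records.
pub fn encode(records: &[(u64, u64)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(records.len() * RECORD_LEN);
    for (k, v) in records {
        out.extend_from_slice(&k.to_le_bytes());
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

Because `lookup` only needs a `&[u8]`, it does not care whether the bytes live on the heap or in a mapped file.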


> which means you use it in its compressed form without decompressing it first.

So your code operates directly on a block of raw bytes? I can see how that can work with mmap without much problems.

My argument was more about structured data (created using the type system), which is a level higher than raw bytes.


> So your code operates directly on a block of raw bytes? I can see how that can work with mmap without much problems.

Correct. It's a finite state machine. The docs of the crate give links to papers if you want to drill down.

> My argument was more about structured data (created using the type system), which is a level higher than raw bytes.

Yes. You should be able to do in Rust whatever you would do in C. You can tag your types with `repr(C)` to get a consistent memory layout equivalent to whatever C does. But when you memory map stuff like this, you need to take at least all the same precautions as you would in C. That is, you need to build your data structures to be mmap friendly. The most obvious thing that is problematic for mmap structures like this that is otherwise easy to do is pointer indirection.

With that said, this technique is not common in Rust because it requires `unsafe` to do it. And when you use `unsafe`, you want to be sure that it's justified.

This is all really beside the point. You'd have the same problems if you read a file into heap memory. The main problem in Rust land with memory maps is that they don't fit into Rust's safety story in an obvious way. But this in and of itself doesn't make them inaccessible to you. It just makes it harder to reason about safety.
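To make the `repr(C)` and pointer-indirection points concrete, here is a minimal sketch of a fixed-layout header that refers to its payload by file offset rather than by pointer. The `Header` fields, the `MAGIC` value, and the helper names are all hypothetical, and the sketch uses native byte order, so the file is only readable on machines with the same endianness. A `Vec<u8>` stands in for the mapped file.

```rust
use std::mem::size_of;

/// Fixed-layout header at the start of the file (hypothetical format).
/// `payload_offset` is a position within the file, not a pointer: a pointer
/// saved by one run is meaningless when the file is mapped at another address.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Header {
    pub magic: u32,
    pub payload_offset: u32,
    pub payload_len: u32,
}

/// Arbitrary marker value, made up for this sketch.
pub const MAGIC: u32 = 0x4D4D_4150;

/// Copy the header out of the front of `data`, if there is room and the magic matches.
pub fn read_header(data: &[u8]) -> Option<Header> {
    if data.len() < size_of::<Header>() {
        return None;
    }
    // SAFETY: the length check guarantees `size_of::<Header>()` readable bytes,
    // `read_unaligned` has no alignment requirement, and `Header` consists only
    // of `u32` fields with no padding, so every bit pattern is a valid value.
    let header = unsafe { std::ptr::read_unaligned(data.as_ptr() as *const Header) };
    if header.magic == MAGIC {
        Some(header)
    } else {
        None
    }
}

/// Resolve the header's offset and length against the buffer, with bounds checks.
pub fn payload<'a>(data: &'a [u8], header: &Header) -> Option<&'a [u8]> {
    let start = header.payload_offset as usize;
    let end = start.checked_add(header.payload_len as usize)?;
    data.get(start..end)
}

/// Build a sample file image: header fields in native byte order, then the payload.
pub fn build_file(payload: &[u8]) -> Vec<u8> {
    let offset = size_of::<Header>() as u32;
    let mut out = Vec::new();
    for field in [MAGIC, offset, payload.len() as u32] {
        out.extend_from_slice(&field.to_ne_bytes());
    }
    out.extend_from_slice(payload);
    out
}
```

The single `unsafe` block is the part that has to be justified by hand; everything that follows the offset is ordinary bounds-checked slicing.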


Dang, burntsushi up in the house! Hey, just wanted to say I enjoy your work; I've learned a lot from it. Thank you!


It's very tedious to debate with someone who explicitly makes assumptions about something (like code) without having read it, and puts the burden of refuting those assumptions on you...


It doesn’t say it “works badly”; it says the borrow checker can’t protect against external modifications to the file while it is memory-mapped, which causes a host of issues in C as well.

You can mmap files in Rust just fine, but it’s generally as dangerous as it is in C.


I don’t get this obsession with “dangerous.” Honestly, what does that even mean? I think a better word is “error-prone.” Danger is more like, “oh my god a crocodile!”


> Honestly, what does that even mean?

It has a very specific meaning in Rust: the user can cause memory unsafety if they make a mistake.

> I think a better word is “error-prone.”

The issue with the connotation there is that it's not about the rate of problems, it's about them going from "impossible" to "possible."


There can be real danger when the code is used in certain applications. For example when controlling the gate of the crocodile cage in a zoo.


Concurrency bugs can absolutely cause dangerous danger of the deadly variety:

https://en.m.wikipedia.org/wiki/Therac-25


Unfortunately, as is almost always the case, the cause was negligence rather than any particular language feature:

“A commission attributed the primary cause to general poor software design and development practices rather than single-out specific coding errors. In particular, the software was designed so that it was realistically impossible to test it in a clean automated way.”

Ergo, concurrency doesn’t kill people, people do.


You sound like you're making a refutation, but you really aren't. This whole discussion is about giving developers tools that are systematically less error-prone, which your quote suggests would have been helpful to that specific development team.


the main problem here is that C has the capability to declare mmap regions correctly: `volatile char[]` and Rust does not (`[Cell<u8>]` is close but not exactly right, and annoying)

most rust folks who use mmap don't mark the region as Celled, which means they risk UB in the form of incorrect behavior because the compiler assumes that the memory region is untouchable outside the single Rust program, and that's not true

(it's also not true generally b/c /dev/mem and /proc/pid/mem exist, but it's beyond Rust's scope that the OS allows intrusion like that)
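For reference, a minimal sketch of what a "Celled" view looks like using only the standard library. The helper names `as_cells` and `sum` are hypothetical, and a plain array stands in for the mapped region. This only tells the compiler that the bytes can change through other handles in the same thread; it is an assumption-laden illustration, not a claim that it covers writes from other processes.

```rust
use std::cell::Cell;

/// View a mutable byte buffer as a slice of cells. Shared handles to the
/// result may both read and write, so the compiler cannot assume a byte is
/// unchanged between two reads that are separated by a write through an alias.
pub fn as_cells(buf: &mut [u8]) -> &[Cell<u8>] {
    Cell::from_mut(buf).as_slice_of_cells()
}

/// Sum all bytes, reading each one through its cell.
pub fn sum(cells: &[Cell<u8>]) -> u32 {
    cells.iter().map(|c| u32::from(c.get())).sum()
}
```

The "annoying" part shows up quickly: APIs that expect `&[u8]` will not accept `&[Cell<u8>]`, so every read has to go through `get()`.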


Errors are up to interpretation. It just means the thing didn't happen as requested. Errors are meant to be expected or not expected depending on the context.

Dangerous means dangerous. It's not up for interpretation.

Languages have multiple, very different words, for exactly this reason.


Agreed. But still, folks make it sound bad. For instance, “danger” in many contexts could also be reframed as “powerful”, could it not?


But that may be of little solace. If you snapshot your entire heap into an mmapped file for fast I/O, then basically the entire advantage of Rust is gone.


Is there literally no other code in the application?

Rust has plenty of situations where you do unsafe things but wrap that in safe APIs. If you’re returning regions of that mmapped file, for example, a lifetime can be associated to those references to ensure that those are valid for the duration of the file being mmapped in the program.

It can be used to ensure that if you need to write back to that mmapped file (inside the same program) that there are no existing references to it, because those would be invalid after an update to the file. You need to do the same in C, but there are no guardrails you can build in C to make that same assurance.
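A minimal sketch of that guardrail, under loud assumptions: `Mapping` is a hypothetical type, and a `Vec<u8>` stands in for the mapped region so the example runs without any OS calls. A real version would hold the pointer and length returned by `mmap` and unmap in `Drop`.

```rust
use std::ops::Range;

/// Stand-in for a mapped file (hypothetical; backed by a `Vec<u8>` here).
pub struct Mapping {
    bytes: Vec<u8>,
}

impl Mapping {
    pub fn new(bytes: Vec<u8>) -> Self {
        Mapping { bytes }
    }

    /// The returned slice borrows `self`, so it cannot outlive the mapping.
    pub fn region(&self, range: Range<usize>) -> Option<&[u8]> {
        self.bytes.get(range)
    }

    /// Writing takes `&mut self`, so the compiler rejects this call while any
    /// slice handed out by `region` is still in use. Returns `false` if the
    /// write would fall outside the mapping.
    pub fn write_at(&mut self, offset: usize, src: &[u8]) -> bool {
        let end = match offset.checked_add(src.len()) {
            Some(end) => end,
            None => return false,
        };
        match self.bytes.get_mut(offset..end) {
            Some(dst) => {
                dst.copy_from_slice(src);
                true
            }
            None => false,
        }
    }
}
```

Holding on to the result of `m.region(0..5)` and then calling `m.write_at(...)` before the last use of that slice is a compile error. The check only covers this program; it says nothing about another process writing to the same file.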


> If you snapshot your entire heap into an mmapped file for fast I/O,

I've never heard of this trick. And my first reaction is "That would be a nightmare of memory unsafety if I did it in C++"

What's it used for? IPC?


I think emacs (used to?) do something awful like this. https://lwn.net/Articles/707615/


I'd call mmapping data structures into memory an advanced systems programming trick which can result in a nice performance boost but which also has some severe drawbacks (portability across big/little endian architectures and internal pointers being two examples).

I know some very skilled C++ and Rust developers who can pull it off. If you're at that skill level, Rust is not going to get in your way because you're just going to use unsafe and throw some sanitizers and fuzzers at it. I wouldn't trust myself to implement it.


You have to combine it with other techniques, e.g. journaling to make it safe, but this is not always necessary (e.g. when using large read-only data-structures)


In C you can access pointers to memory mapped files effortlessly in ways that are often extremely unsafe against the possible existence of other writers and against the mapping being unmapped and mapped elsewhere. It’s also traditional to pretend that putting types like int in a mapped file is reasonable, whereas one ought to actually store bytes and convert as needed. Rust at least requires a degree of honesty.
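"Store bytes and convert as needed" might look like the following minimal sketch. The function name `read_i32_le` and the choice of little-endian as the on-disk order are assumptions for illustration.

```rust
use std::convert::TryInto;

/// Decode a little-endian `i32` at `offset`, or `None` if it does not fit.
/// The offset does not need to be 4-byte aligned, and the result is the same
/// on big-endian and little-endian machines.
pub fn read_i32_le(buf: &[u8], offset: usize) -> Option<i32> {
    let end = offset.checked_add(4)?;
    let bytes: [u8; 4] = buf.get(offset..end)?.try_into().ok()?;
    Some(i32::from_le_bytes(bytes))
}
```

For example, `read_i32_le(&[0xFF, 0x78, 0x56, 0x34, 0x12], 1)` yields `Some(0x12345678)` even though offset 1 is misaligned for an `i32`.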


Is it something deeply ingrained in Rust, or is it something Rust is working on?


It's more like, Rust wants to make guarantees that just aren't possible for a block of memory that represents a world-writable file that any part of your process, or any other process in the OS, might decide to change on a whim.

In other words, mmapped files are hard, and Rust points this out. C just provides you with the footgun.


The problem is that compilers are allowed to make some general assumptions about how they may reorder code, always based on the premise that no other process is modifying the memory. For example, the optimizer may remove redundant reads. That's a problem if the read isn't really redundant -- if the pointer isn't targeting process-owned memory, but a memory mapped file that's modified by someone else. Programs might crash in very "interesting" ways depending on optimization flags.

C has this issue as well, but Rust's compiler/borrow checker is particularly strong at this kind of analysis, so it's potentially bitten even harder.
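One tool Rust does offer for the redundant-read case is `std::ptr::read_volatile`, which the compiler may not elide or merge with a neighbouring volatile read. A minimal sketch follows; the function name `load_twice` is hypothetical, and a `Cell<u8>` stands in for a byte of the mapped region. Whether volatile access alone is sufficient for a file that another process is writing is a separate question this sketch does not settle.

```rust
use std::cell::Cell;
use std::ptr;

/// Load the same byte twice. With plain reads the optimizer could fold the
/// second load into the first; with volatile reads it must perform both.
pub fn load_twice(byte: &Cell<u8>) -> (u8, u8) {
    let p = byte.as_ptr();
    // SAFETY: `p` comes from a live `&Cell<u8>`, so it is non-null, aligned,
    // and valid for reads of one byte for the duration of this call.
    unsafe {
        let first = ptr::read_volatile(p);
        let second = ptr::read_volatile(p);
        (first, second)
    }
}
```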



