WizTree is famously almost 50x faster than WinDirStat (on normal Windows NTFS drives) by reading the Master File Table (MFT) instead of walking the tree to measure each file.
WizTree isn't open-source like WinDirStat but "free as in beer" with optional donations.
What's the downside of just reading the MFT? Why doesn't Microsoft do it in file explorer, and why wouldn't every tool use it instead of walking through the file system? Maybe there's no downside but it's such a huge speed boost that it would be weird to not use it otherwise, right?
>What's the downside of just reading the MFT? Why doesn't Microsoft do it in file explorer, and why wouldn't every tool use it instead of walking through the file system?
One disadvantage is that you can't read the MFT of network shares or device emulators presenting "virtual drive letters" to the OS.
The typical (and slower) Win32 API functions FindFirstFile()/FindNextFile() used to iterate through the file structure work at a higher level of abstraction, so they also work on targets that don't have an NTFS MFT. Indeed, if you point WizTree to an SMB network share, it will be a lot slower because it can't directly read the MFT.
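For illustration, here is a minimal sketch of the portable "walk the tree" approach. It uses Python's `os.scandir`, which on Windows is implemented on top of FindFirstFileW/FindNextFileW; the function name `tree_size` is made up for this example.

```python
import os

def tree_size(root):
    """Sum file sizes under root by visiting every directory (the slow, portable way)."""
    total = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # Do not follow symlinks/junctions, to avoid loops and double counting.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        total += entry.stat(follow_symlinks=False).st_size
        except OSError:
            # Unreadable or vanished directory: skip whatever is left of it.
            continue
    return total
```

The same code works on a local disk, a network share, or a virtual drive, which is exactly the generality the MFT approach gives up.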
It's conceivable that Microsoft developers could have programmed Windows Explorer differently to have an optimized code path of reading MFT for local disks and then fall back to slower FindFirstFile()/FindNextFile() for non-MFT disks. Maybe that adds too much complexity and weird bugs. I notice that most of the 3rd-party "Win Explorer replacement" utilities also don't read MFT.
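A rough sketch of what such a two-path design could look like. Everything here is hypothetical: the `volume` dictionary, its keys, and the `mft_scan`/`tree_walk` callables are placeholders, not anything Explorer or WizTree actually exposes.

```python
def scan(volume, mft_scan, tree_walk):
    """Try the MFT fast path; fall back to the portable walk when it is unavailable."""
    fast_path_possible = (
        volume.get("fs") == "NTFS"      # only NTFS has an MFT
        and volume.get("local")         # not a network share or emulated drive
        and volume.get("elevated")      # raw volume access needs admin rights
    )
    if fast_path_possible:
        try:
            return "mft", mft_scan(volume["root"])
        except OSError:
            pass  # e.g. raw access denied: fall through to the slow path
    return "walk", tree_walk(volume["root"])
```

Even in this toy form there are two code paths that must produce identical results, which hints at where the extra complexity and the "weird bugs" would come from.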
> It's conceivable that Microsoft developers could have programmed Windows Explorer differently to have an optimized code path of reading MFT for local disks and then fall back to slower FindFirstFile()/FindNextFile() for non-MFT disks
Surely this would have been worth doing, even if it meant flushing out bugs elsewhere.
Along with the reasons others have mentioned, it would also bypass any filter driver in the file system stack (Windows has the concept of a stack of filter drivers that can sit in front of the file system or hardware) and would also ignore any permissions (ACLs) on who can see those files. There’s no way they can credibly use this technique outside of say something from SysInternals: it violates the security and layering of the operating system and its APIs.
Is there a Linux equivalent for those "filters"? I'm a bit clueless about win32 and NT sadly enough...
Would that mean that there's no way to "scope" the MFTs?
Edit:
That also makes sense, since if I got it right they aren't necessarily supposed to be consumed by userspace programs?
I guess that's why those tools always ask for admin access and basically all perms to the FS.
It's a bit sad that the user gets exposed to a much slower search and FS experience even if the system underneath has the potential to be as fast as it gets. And I don't think ReFS is intended to replace NTFS (not that it's necessarily more performant anyways)
There is no equivalent on Linux. That's why Linux has no online antivirus scanners (scanners that scan the file as it's opened) while this is a basic feature of every antivirus program on Windows.
Linux has device mappers (dm-crypt, dm-raid and friends). But those sit below the file system, emulating a device. Windows' file system filter drivers sit above the file system, intercepting API calls to and from the file system. That's super useful if you want to check file contents on access, track where files are going, keep an audit log of who accessed a file, transparently encrypt single files instead of whole volumes, etc. But you pay the price for all that flexibility in performance.
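As a purely conceptual model (this is not the Windows filter driver API, and all class names are invented), the layering can be pictured as wrappers that each see a request before passing it down to the file system:

```python
class BaseFileSystem:
    """Stands in for the real file system at the bottom of the stack."""
    def __init__(self, files):
        self.files = files

    def read(self, path):
        return self.files[path]

class AuditFilter:
    """Logs every access, then forwards the request to the layer below."""
    def __init__(self, lower, log):
        self.lower = lower
        self.log = log

    def read(self, path):
        self.log.append(path)
        return self.lower.read(path)

class ScanFilter:
    """Inspects file contents on access and blocks anything containing a signature."""
    def __init__(self, lower, signature):
        self.lower = lower
        self.signature = signature

    def read(self, path):
        data = self.lower.read(path)
        if self.signature in data:
            raise PermissionError(f"blocked by scanner: {path}")
        return data
```

Every read goes through every layer, which is where the performance cost comes from. A tool that reads the raw volume talks to none of these layers.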
> That's super useful if you want to check file contents on access, track where files are going, keep an audit log of who accessed a file, transparently encrypt single files instead of whole volumes
Or if you just want to generally make the filesystem so slow that everyone has to invent their own pack files just to avoid file system api calls as much as possible.
Filters are vaguely similar to things like mountpoints overlaying portions of the filesystem. E.g. in Linux you might have files in /d1/d2/{f1,f2,f3} in the root filesystem but you also have a mountpoint of a 2nd filesystem on /d1/d2 that completely changes the visibility / contents of d2. Filter drivers can do similar things (although they are not actually independent mountpoints).
You need admin permissions to read the MFT on Windows. The traditional security model of both Windows and Linux assumes that the kernel is a security barrier between system and unprivileged user, and between different unprivileged users. An admin being able to bypass security restrictions isn't traditionally seen as a problem.
Indeed, only in very recent history has the admin/root user/owner been seen as a threat to the system and the system employs defenses against them. I'm hoping that trend reverses because I really hate the direction things are going.
There are pretty good reasons to do that. We've been really lax in what is allowed to run as root/admin when in reality, those permissions should only be used when doing things like reading the MFT or snooping on all the network traffic with Wireshark. It should not be required to run as root/admin in order to install most software because installing software is a very common thing to do.
Even if you want more control over your system, I still think technically capable people would be better served by having a separate administrator account from your normal day-to-day account which you have to explicitly log into (so no UAC prompts, you need to go onto that other account and then you get the UAC prompt). Unfortunately, I think most Desktop OSes are still too unusable with this sort of workflow due to how much software insists on admin for installation.
I largely agree. I think what makes the "the user is a threat" model so difficult to me is that there is a lot of truth to it. Users often don't know enough to make good decisions.
I really like your idea of logging in separately, such that it isn't something you're going to do cavalierly. That seems like a great compromise to me! I fully agree that we way overuse admin and really don't need it for the majority of things.
> it would also bypass any filter driver in the file system stack
The main use case for filter drivers is antivirus, and that is primarily about file contents not file metadata - so if MFT access bypassed filter drivers, that might not be a major issue. I think most non-antivirus use cases are also primarily about data not metadata.
If necessary, one could even devise a design in which MFT access is combined with filter drivers - MFT scanning to find matching files, then for each matched file access its metadata via standard APIs (to ensure filter drivers are invoked) before returning to client. That would be slower than a pure MFT scan but still faster than a scan done purely with standard APIs. A registry key could turn this on/off so sites can decide for themselves where to place the performance versus security tradeoff
> and would also ignore any permissions (ACLs) on who can see those files
They could expose an API which enables MFT scanning with some degree of ACL checking added.
If you do the ACL check as late as possible in processing the query, it would give much better performance than standard APIs that evaluate ACLs on every access. For example, suppose I want to scan a volume for all files with the extension ‘*.exe’. The API would only have to do an ACL check on each matching entry, not on every entry it considers.
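A small sketch of that "filter first, check ACLs last" idea. The record dictionaries and the `can_read` callback are stand-ins for whatever a real service would read from the MFT and the security descriptors; no such Windows API exists as far as this thread establishes.

```python
def find_by_extension(records, ext, can_read):
    """Return readable names ending in ext, plus how many ACL checks were needed."""
    matches = []
    acl_checks = 0
    for rec in records:
        # Cheap name filter first: non-matching entries never trigger an ACL check.
        if not rec["name"].lower().endswith(ext):
            continue
        acl_checks += 1
        if can_read(rec):
            matches.append(rec["name"])
    return matches, acl_checks
```

With three records of which only two end in `.exe`, the expensive check runs twice rather than three times; on a volume with millions of entries the gap would be correspondingly larger.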
There also might be reasonable situations in which ACL checking could be bypassed. For example, if I am requesting a search for files of which I am the owner, just assume the owner should have the right to read the file’s metadata. Or, if I have read permission on a directory, assume I am allowed aggregate information on the count and total size of files in that directory and its recursive subdirectories. These “bypasses” could be controlled by system settings (registry entries / group policy), so customers with higher security needs could disable them at the cost of reduced performance.
Rather than putting this in the OS kernel, it could be a privileged system service which exports an API over LPC/COM/etc. Actually with that design it isn’t even necessary to wait for Microsoft to implement this, it could always be implemented as an open source project, if someone felt sufficiently motivated to do so. (Or even as a proprietary product, although I suspect that would limit its adoption, and the risk is if it takes off, Microsoft would just implement the same thing as a standard part of Windows.)
Reading the MFT directly requires Administrator permissions, and doing it correctly means reimplementing support for every nook and cranny of NTFS including things like hard links, junction/reparse/mount points, sparse files, etc.
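To give a feel for the level a tool has to work at, here is a sketch that decodes only the fixed header of one MFT file record from a byte buffer. The offsets follow the commonly described NTFS 3.x layout; several of these fields are marked "Reserved" in Microsoft's published structure, so the field meanings used here (link count, sizes) are an assumption based on community documentation. It does not apply update-sequence fixups or parse any attributes, which is where the real work is.

```python
import struct

# signature, update-seq offset, update-seq count, LSN, sequence number,
# hard link count, first attribute offset, flags, used size, allocated size,
# base record reference, next attribute id
HEADER = struct.Struct("<4sHHQHHHHIIQH")
FLAG_IN_USE = 0x0001
FLAG_DIRECTORY = 0x0002

def parse_file_record_header(buf):
    """Decode the fixed part of an MFT file record (no fixups, no attributes)."""
    (sig, _usa_off, _usa_count, _lsn, seq, links, attr_off,
     flags, used, allocated, base_ref, _next_id) = HEADER.unpack_from(buf, 0)
    if sig != b"FILE":
        raise ValueError("not a valid FILE record")
    return {
        "sequence": seq,
        "hard_links": links,
        "first_attribute_offset": attr_off,
        "in_use": bool(flags & FLAG_IN_USE),
        "is_directory": bool(flags & FLAG_DIRECTORY),
        "used_size": used,
        "allocated_size": allocated,
        # Low 48 bits are the record number; 0 means this is itself a base record.
        "base_record": base_ref & 0xFFFFFFFFFFFF,
    }
```

File names, sizes, hard links, reparse points and sparse runs all live in the attributes that follow this header, each with its own format.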
You call that a workaround but it’s basically the best possible situation security-wise. If this didn’t work securely then it wouldn’t be possible to implement disk defragmenter or even explorer. It’s so core to Windows NT’s security model that I wouldn’t call it a workaround.
You do similar things even with more modern stacks - assign a permission to an application and grant permissions to the application to the user.
The only real concern is that Windows NT permissions are not as granular as they could be.
> Windows NT permissions are not as granular as they could be.
For objects, Windows NT permissions are ridiculously granular; e.g. GENERIC_WRITE can be mapped to a half-dozen separately settable type-specific flags, depending on the object type (file, named pipe, etc.). It’s too granular for even an administrator to make sense of, arguably, and the documentation is somewhere between bad and nonexistent. (The UI varies from decent, like the ACL editor you can access from e.g. Explorer, to “you can’t make this shit up”, like SDDL[1].)
For subjects, the situation is not good, like on every other conventional OS. You could deal with that by introducing a “user” for each app, as on Android. But I’m not aware of any attempts to do that (that would expose this mechanism in a user-visible way).
(Then there’s the UWP sandbox, which as far as I can tell is built with complete disregard of the fundamental concepts above. I don’t think it’s worth taking seriously at this time.)
I have no idea if there’s a granular object permission that could give access to the MBR of a disk. I’ve thankfully never had to dig that deep into Windows internals.
I’ve had to work with SDDL before to set up granular permissions for WMI monitoring on a whole lot of computers and my god, did it make me love the Cloud and Linux. I can’t emphasize enough how much the unintuitiveness of setting these permissions creates systemic over-privileging.
Been using the portable version of 1.4 for decades after first coming across it in some PC magazine or something like that many years ago. Not terribly pretty, but it does what I need and it still works.
One possible reason is that it isn't a published part of the filesystem's external interface, and the format is not guaranteed to be static between versions or even point releases (though in reality, while the behaviours may be officially undefined, they are unlikely to change significantly).
Also, it requires admin elevation to access. Anything running elevated is a potential security concern as it can access much else too.
> Why doesn't Microsoft do it in file explorer
Not sure, but it could be because that would be seen as an unfair advantage so to avoid anti-trust allegations they would have to publish the format and make stability guarantees for it, so others could use it as easily/safely. That, and the reasons above & below too.
> and why wouldn't every tool use it instead of walking through the file system?
Largely because walking the filesystem works for all filesystems, local and remote, so you cover everything with one tree walk implementation. Implementing a tree-walk over the MFT data where available is extra work to implement and support for one filesystem, and not many care enough, or are not aware of the potential speed benefit at all, for it to be a huge selling point such that all toolmakers feel compelled to bother.
> One possible reason is that it isn't a published part of the filesystem's external interface, and the format is not guaranteed to be static between versions or even point releases (though in reality, while the behaviours may be officially undefined, they are unlikely to change significantly).
I am not going to pull every document, but the MFT structure is documented and published. I am uncertain what you mean by "external interface".
Though all the sub-pages of that state things like “[This structure is valid only for version 3 of NTFS volumes; it may be altered in future versions.]” — while it is true that any API could see breaking changes in future, this suggests that you should expect them, so I'd not call it supported in the same sense of the main file/directory access APIs which I would not expect to see breaking changes in (additional properties & functionality yes, but not existing things changing behaviour).
A lot of people talking about the details does not constitute official documentation though.
You can find a lot of articles talking about SQL Server's DBCC IND and DBCC PAGE, but that isn't official documentation: they are essentially internal functions, not supported, and could change or go away entirely despite having been around for many versions (as they have in Azure). Similarly, there are articles talking about sys.dm_db_database_page_allocations, which sort-of does the job of DBCC IND, but again this is not officially documented & supported.
> I am uncertain what you mean by "external interface".
I meant the published interface. Maybe "supported API" would have been a better phrase to use?
Though as pointed out below, there is at least some official documentation on the MFT structure.
It's probably also racy to access the raw MFT while there are concurrent programs creating new files (or deleting files). That complication can be avoided by using the ordinary OS directory iteration primitives.
Yep, but then the performance gains are completely discarded. The easiest solution is to take a snapshot with VSS, which is both fast and makes a quiesced copy of $MFT. From there, one could monitor FS changes if they wanted to have live updates.
With RAM sizes now, it's curious why any OS wouldn't just cache some or all of the metadata for some local volumes on a block basis rather than incur the greater resource usage of transforming on-disk data into different structures, and then caching and tracking individual entries.
I am building an advanced file manager (FileNinja) for Windows with fully integrated Everything search & query. You have the option of saving bookmarks to virtual folders that consist of Everything searches. Instant directory sizes, tags, custom file descriptions for NTFS. Anyone interested?
https://youtu.be/JREufgkf5pk?si=sP05UCOrskpX8OTq
Try Everything 1.5a - an "alpha" version with many improvements, in development for years but inexplicably hidden away on their website. Never experienced any instability.
You should not be starting it when you want to search. You should open it when you log in, and leave it in the tray. It will do a full index on launch then subscribe to filesystem notifications to keep itself up to date for as long as it’s open.
Do that and it’s alarmingly fast and responsive except for the minute or two right after launch.
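The "index once, then stay subscribed" pattern can be sketched roughly like this. The event names and the `apply` method are invented for illustration; how a real tool receives change notifications from the OS is not covered here.

```python
import os

class NameIndex:
    """In-memory path index: one full build, then cheap incremental updates."""
    def __init__(self):
        self.paths = set()

    def build(self, root):
        # The expensive part: visit everything once at startup.
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                self.paths.add(os.path.join(dirpath, name))

    def apply(self, event, path, new_path=None):
        # Hypothetical change events keep the index current without rescanning.
        if event == "created":
            self.paths.add(path)
        elif event == "deleted":
            self.paths.discard(path)
        elif event == "renamed":
            self.paths.discard(path)
            self.paths.add(new_path)

    def search(self, needle):
        needle = needle.lower()
        return sorted(p for p in self.paths
                      if needle in os.path.basename(p).lower())
```

Searches only touch the in-memory set, so they stay fast as long as the process keeps running and keeps applying events; closing the tool throws that state away and forces the full build again.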
Contrasting seemingly all the other responses to this, I use it the same way you do (only opening it when needed) and I'm fine with the delay: even at its slowest, rebuilding the index and searching is faster than the built-in Windows Search.
WizTree also understands things like OneDrive and Dropbox, and knows that files "stored in the cloud" aren't taking up any disk space -- WinDirStat thinks my drive is 140% full.
WizTree and WinDirStat will both double count hard links. I have a 12TB hard drive holding "17TB" because of sparse files and hard links. Windows file manager properties agree with WizTree and WinDirStat as far as space used. I think the file manager looks for free space and calculates that separately, while WizTree and WinDirStat are just adding up used space.
WizTree takes like 3 seconds where WDS takes 30. In really big analysis and cleanup scenarios with rescans, it's enough to let you do your job faster. In everyday scenarios, it removes any hesitation to visualize a system. It's basically free and near instant.
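The hard link part of that is easy to reproduce. This sketch compares a naive sum with one that counts each underlying file once, keyed on `(st_dev, st_ino)`; on Windows, recent Python versions fill `st_ino` with the NTFS file index, but treat that as an assumption to verify. It uses logical sizes only, so it does nothing about sparse files.

```python
import os

def sizes(root):
    """Return (naive_total, deduplicated_total) of logical file sizes under root."""
    naive = 0
    unique = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            naive += st.st_size
            # Hard links to the same file share a device and file ID.
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return naive, unique
```

A 1000-byte file with one extra hard link shows up as 2000 bytes in the naive total and 1000 in the deduplicated one.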
Fact is, WDS community must be kind of abandoned, or else it would be doing the same trick. It's SO much faster that it becomes a genuine quality of life improvement. I need it, and don't mind using a non free tool until the OSS solution has the capability.
Didn't try AltWinDirStat, but did try FastWinDirStat.
The thing is, FastWinDirStat uses a licensed proprietary component. No problem for me, but the author did have some back and forth with another user on GitHub.
Seems FastWinDirStat's license isn't compatible with using a closed-source library, or something...
As for its actual functioning, it does as it says. Works much faster than WinDirStat
Looks like a pretty clear violation of the WinDirStat license. They took WinDirStat, which is GPL, linked it with some other proprietary code and distributed the result.
(They could have been clear-ish (with caveats) by distributing only the source code and let the users do the compiling and linking, similarly to how you could download ZFS and build it into Linux. But you mustn't distribute the result further.)
You’ve got me interested but I’m finding it quite annoying that WizTree doesn’t actually have pictures of the software UI on the website. At least not under any of the obvious links I’ve checked.
SpaceSniffer's UI is less clunky, but WizTree's scan is an order of magnitude faster. That kind of speed difference can affect when you're willing to use the tool.
I find myself much more willing to pop open WizTree to get a quick view of my system or a particular storage folder.
There's also a fork of WinDirStat patched to read the MFT but I don't know anyone who's tried it: https://github.com/ariccio/altWinDirStat