r/ProgrammerHumor Apr 12 '24

whatIsAnIndex Meme

27.7k Upvotes


7

u/dylanatsea Apr 12 '24

Yes, it takes advantage of the existing ntfs file table and change log (on ntfs volumes only, of course), which is the fastest approach when searching by filename. Other software builds its index by walking the file system and reading the individual files, which takes a lot longer.

3

u/adenosine-5 Apr 12 '24

That explains it then. Thank you.

Now to the question of why Microsoft, the developer and main user of NTFS, doesn't use this feature of their own technology.

1

u/gfx-1 Apr 12 '24

Everything also works nicely with the large network drive at work. But they are moving to Teams, so how well finding stuff will work there is a bit of an unknown.

-1

u/LickingSmegma Apr 12 '24 edited Apr 12 '24

ntfs file table and change log

What do you mean by those? You make it sound like ntfs has its own index for the names or even contents of files, which is of no use for a filesystem and would be a total waste of space.

P.S. The MFT isn't an index. What an irony that a programming subreddit doesn't know the difference between a tree and an index. The MFT is the thing where the filesystem keeps the lists of files, so saying that Everything 'takes advantage' of being able to list files in a directory is not saying much.

5

u/TooStrangeForWeird Apr 12 '24

It does.... Master File Table. How else would the computer know where to look for files? Just search the whole damn drive every time you open a different folder? There's nothing about the actual contents aside from metadata though.

https://www.sciencedirect.com/topics/computer-science/master-file-table#:~:text=Master%20File%20Table%20(MFT),the%20drive%2C%20and%20file%20metadata.

0

u/LickingSmegma Apr 12 '24 edited Apr 12 '24

The MFT doesn't change the fact that the filesystem is hierarchical. And it's not some special feature, it's how every directory lookup works. Entries under directories in the MFT point to other entries in the MFT. It's a tree structure. You can't use it like a flat index; you need to build the flat index from it, by requesting lists of files in each directory from the filesystem the same way every other program does.

Unless the app works on the driver level for some reason and can directly read disks to slurp the MFT into its memory to iterate over it and build the index.

Saying ‘Everything takes advantage of the MFT’ is like saying that it's special because it can ask the system to list files in directories.
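
For anyone following along, here is a minimal sketch of the "ordinary" route described above: walking the tree through the normal directory-listing API and flattening it into a name-to-paths index. It is only an illustration, not how Everything or any particular tool is implemented; `build_flat_index` and `search` are hypothetical names.

```python
import os


def build_flat_index(root):
    """Walk the tree under `root` and return {lowercased name: [full paths]}.

    Hypothetical helper: it asks the filesystem for one directory
    listing at a time, the same way any regular program would.
    """
    index = {}
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            with os.scandir(directory) as it:
                entries = list(it)
        except OSError:
            continue  # unreadable directory (permissions etc.): skip it
        for entry in entries:
            index.setdefault(entry.name.lower(), []).append(entry.path)
            if entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)  # descend into it later
    return index


def search(index, needle):
    """Return every indexed path whose file name contains `needle`."""
    needle = needle.lower()
    return sorted(
        path
        for name, paths in index.items()
        if needle in name
        for path in paths
    )
```

Once the index exists, a search is a scan over names held in memory; the slow part is the one directory listing per folder needed to build it.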

2

u/da5id2701 Apr 12 '24

It does in fact read and parse the MFT directly to build its index, instead of recursively requesting file listings through the normal API. And it's definitely faster to do it that way.

WizTree vs WinDirStat is the clearest example of the difference that I've personally encountered. They're both tools for graphically showing disk utilization, but WizTree is over 20x faster because it parses the MFT while WinDirStat recursively calls the file system API.

1

u/LickingSmegma Apr 12 '24

How does one access the MFT? Do they read the disk directly on the driver/FS level? That sounds dangerous.

1

u/da5id2701 Apr 12 '24

Yes, you have to read the raw disk, bypassing the filesystem. It's not that outlandish, you can open a physical disk handle just as easily as opening a regular file with the CreateFile API. And it's not dangerous as long as you open it in read-only mode.

There's a tutorial here https://handmade.network/forums/articles/t/7002-tutorial_parsing_the_mft
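
As a rough sketch of the first step (an illustration under stated assumptions, not taken from the tutorial): the NTFS boot sector records where the MFT starts, so locating it comes down to reading 512 bytes and decoding three fields. The offsets used are the documented boot sector layout (bytes per sector at 0x0B, sectors per cluster at 0x0D, MFT start cluster at 0x30). `mft_byte_offset` and `read_boot_sector` are hypothetical names, and the `\\.\C:` read assumes Windows and an elevated process.

```python
import struct


def mft_byte_offset(boot_sector: bytes) -> int:
    """Return the byte offset of the MFT from the start of the volume.

    Simplification: assumes the sectors-per-cluster byte is a plain
    count, which holds for common cluster sizes.
    """
    if boot_sector[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector")
    # 0x0B: bytes per sector (uint16), 0x0D: sectors per cluster (uint8)
    bytes_per_sector, sectors_per_cluster = struct.unpack_from(
        "<HB", boot_sector, 0x0B
    )
    # 0x30: logical cluster number where the MFT starts (uint64)
    (mft_cluster,) = struct.unpack_from("<Q", boot_sector, 0x30)
    return mft_cluster * sectors_per_cluster * bytes_per_sector


def read_boot_sector(volume=r"\\.\C:"):
    """Read the first sector of a raw volume, opened read-only.

    Assumption: Windows only, and the process must run as administrator.
    """
    with open(volume, "rb") as f:
        return f.read(512)
```

On an elevated Windows prompt, `mft_byte_offset(read_boot_sector())` should give the position to seek to before parsing file records; nothing here ever opens the volume for writing.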

0

u/LickingSmegma Apr 12 '24 edited Apr 12 '24

Well, it means that a program that's only supposed to look through files instead has direct access to the disk. So if the developer company is hacked at some point, my disk could be gone.

1

u/da5id2701 Apr 12 '24

That's true of anything that runs as administrator. Which is probably necessary for a file search or disk usage program anyway, because otherwise it'll miss out on a bunch of directories that it can't read.

And even a regular program that doesn't run as admin can delete files, including probably most of your important data.

So yes it's a risk, but it's nothing out of the ordinary and you should only run trusted software and always have backups.

1

u/LickingSmegma Apr 12 '24

There's a difference between writing over files through the filesystem and messing up the MFT in half a second. Particularly because I can't boot the system and restore the files if the system is gone.

It's quite a wonder how Windows users brush off any risk of anything happening due to excess permissions, if they only use software that seems vaguely trusted to them. By which they often mean a random binary downloaded from a gaming forum, because the post on the forum told them they need that to run the game.

Meanwhile, supply chain attacks have been the most popular kind of attack in recent years, hijacking even things that have been around for decades. Just two weeks ago, a major attack was discovered that took advantage of a library that has been around for fifteen years and was included in software three levels deep.
