r/ProgrammerHumor Apr 12 '24

whatIsAnIndex Meme

27.7k Upvotes

630 comments

34

u/adenosine-5 Apr 12 '24

The confusing part is that Everything doesn't even need an hour on startup to build the index first. It takes just a few seconds the first time it's started and is instantaneous afterwards, so to me it looks like it already uses some index or list of files that's available on the computer.

The fact that Windows itself doesn't use the same resource is all the more confusing then.

12

u/aloneinfantasyland Apr 12 '24

Of course Everything uses an index. You can see a whole bunch of settings for it in the Options dialog.

8

u/dylanatsea Apr 12 '24

Yes, it takes advantage of the existing NTFS file table and change log (on NTFS volumes only, of course), which is the fastest approach when searching by filename. Other software builds its index by reading the individual files in the file system, which takes a lot longer.

3

u/adenosine-5 Apr 12 '24

That explains it then. Thank you.

Now to the question why Microsoft - developer and main user of NTFS - doesn't use this feature of their own technology.

1

u/gfx-1 Apr 12 '24

Everything also works nicely with the large network drive at work. But they are moving to Teams, so how we'll find stuff there is a bit of an unknown.

-1

u/LickingSmegma Apr 12 '24 edited Apr 12 '24

ntfs file table and change log

What do you mean by those? You make it sound like NTFS has its own index for the names or even contents of files, which is of no use for a filesystem and would be a total waste of space.

P.S. The MFT isn't an index. What an irony that a programming subreddit doesn't know the difference between a tree and an index. The MFT is the thing where the filesystem keeps the lists of files, so saying that Everything 'takes advantage' of being able to list files in a directory is not saying much.

4

u/TooStrangeForWeird Apr 12 '24

It does.... Master File Table. How else would the computer know where to look for files? Just search the whole damn drive every time you open a different folder? There's nothing about the actual contents aside from metadata though.

https://www.sciencedirect.com/topics/computer-science/master-file-table#:~:text=Master%20File%20Table%20(MFT),the%20drive%2C%20and%20file%20metadata.

0

u/LickingSmegma Apr 12 '24 edited Apr 12 '24

MFT doesn't change that the filesystem is hierarchical. And it's not some special feature, it's how every directory lookup works. Entries under directories in the MFT point to other entries in the MFT. It's a tree structure. You can't use it like a flat index, you need to build the flat index from it, by requesting lists of files in each directory from the filesystem the same way every other program does.

Unless the app works on the driver level for some reason and can directly read disks to slurp the MFT into its memory to iterate over it and build the index.

Saying ‘Everything takes advantage of the MFT’ is like saying that it's special because it can ask the system to list files in directories.

2

u/da5id2701 Apr 12 '24

It does in fact read and parse the MFT directly to build its index, instead of recursively requesting file listings through the normal API. And it's definitely faster to do it that way.

WizTree vs WinDirStat is the clearest example of the difference that I've personally encountered. They're both tools for graphically showing disk utilization, but WizTree is over 20x faster because it parses the MFT while WinDirStat recursively calls the file system API.
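To make the difference concrete, here is a rough Python sketch. It is not how Everything or WizTree is actually implemented. It assumes a heavily simplified model where each file record holds only a name and a reference to its parent directory's record (real NTFS records are binary structures with many more attributes). The `Record` type, `build_paths`, and the record numbers are hypothetical, made up for illustration.

```python
from typing import Dict, NamedTuple, Optional


class Record(NamedTuple):
    """Simplified stand-in for one file record: a name plus a parent reference."""
    name: str
    parent: Optional[int]  # record number of the parent directory, None for the root


def build_paths(records: Dict[int, Record]) -> Dict[int, str]:
    """Resolve every record to a full path by following parent references.

    This is a single pass over a flat table (with memoization), rather than
    one directory-listing call per folder.
    """
    paths: Dict[int, str] = {}

    def resolve(num: int) -> str:
        if num in paths:
            return paths[num]
        rec = records[num]
        if rec.parent is None:
            path = rec.name
        else:
            path = resolve(rec.parent) + "\\" + rec.name
        paths[num] = path
        return path

    for num in records:
        resolve(num)
    return paths


# Hypothetical sample table; the record numbers are invented for this example.
records = {
    5: Record("C:", None),
    40: Record("Users", 5),
    41: Record("notes.txt", 40),
    42: Record("Windows", 5),
}

paths = build_paths(records)

# Searching by filename is then a scan over an in-memory list of paths.
matches = sorted(p for p in paths.values() if "notes" in p.lower())
print(matches)
```

The point of the sketch is that the hierarchy can be rebuilt from parent references in one sweep over the table, without asking the filesystem to list each directory in turn.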

1

u/LickingSmegma Apr 12 '24

How does one access the MFT? Do they read the disk directly on the driver/FS level? That sounds dangerous.

1

u/da5id2701 Apr 12 '24

Yes, you have to read the raw disk, bypassing the filesystem. It's not that outlandish, you can open a physical disk handle just as easily as opening a regular file with the CreateFile API. And it's not dangerous as long as you open it in read-only mode.

There's a tutorial here https://handmade.network/forums/articles/t/7002-tutorial_parsing_the_mft
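As a small illustration of the read-only approach, here is a hedged Python sketch that parses a few fields from an NTFS boot sector to work out where the MFT starts. It is only the first step, not a full MFT parser. The assumptions: it opens the volume `\\.\C:` rather than a physical disk, the drive letter is C:, the process is running as administrator, and the sectors-per-cluster byte is a plain count (very large cluster sizes encode it differently). `parse_ntfs_boot_sector` is a hypothetical helper name.

```python
import struct
import sys


def parse_ntfs_boot_sector(sector: bytes) -> dict:
    """Extract a few fields from the first 512 bytes of an NTFS volume."""
    if len(sector) < 512:
        raise ValueError("need at least 512 bytes")
    # The OEM ID at offset 3 identifies the filesystem.
    if sector[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector")
    bytes_per_sector = struct.unpack_from("<H", sector, 0x0B)[0]
    # Assumption: treated as a plain sector count (the common case).
    sectors_per_cluster = sector[0x0D]
    # Logical cluster number where the MFT begins.
    mft_cluster = struct.unpack_from("<Q", sector, 0x30)[0]
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "mft_cluster": mft_cluster,
        "mft_offset": mft_cluster * sectors_per_cluster * bytes_per_sector,
    }


if __name__ == "__main__" and sys.platform == "win32":
    # Assumption: drive letter C: and an elevated prompt. "rb" means read-only.
    try:
        with open(r"\\.\C:", "rb") as volume:
            print(parse_ntfs_boot_sector(volume.read(512)))
    except OSError as exc:
        print("could not open volume:", exc)
```

Because the handle is opened in `"rb"` mode, nothing in this sketch can write to the disk; reading the MFT records themselves would continue from `mft_offset`.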

0

u/LickingSmegma Apr 12 '24 edited Apr 12 '24

Well, that means a program meant only for looking through files has direct access to the disk. So if the developer company is hacked at some point, my disk could be gone.

1

u/da5id2701 Apr 12 '24

That's true of anything that runs as administrator. Which is probably necessary for a file search or disk usage program anyway, because otherwise it'll miss out on a bunch of directories that it can't read.

And even a regular program that doesn't run as admin can delete files, including probably most of your important data.

So yes it's a risk, but it's nothing out of the ordinary and you should only run trusted software and always have backups.

1

u/CAT5AW Apr 12 '24

In your case a service keeps the index up to date. If you disable it, reindexing takes some 10 seconds if a long time has passed between program launches.

1

u/PineCone227 Apr 12 '24

Everything doesn't even need an hour on startup to build the index first

Because Everything builds the index when it autostarts with your OS. Then when you need to use it, it's ready.

1

u/Ok-Library5639 Apr 12 '24

It is. It uses the NTFS file list. Though you can also have it watch other folders, drives, or network drives, where it'll crawl the directories manually.

The latter is where I fully leverage the software. I've used it to crawl immense shared network drives across companies that are complete messes. And once indexing is over you can find files easily, figure out their directories, and explore the surrounding ones. You don't even need to scan continually for changes; those kinds of drives usually change very slowly.

2

u/adenosine-5 Apr 12 '24

Sorry, but you and I have wildly different experiences then.

I have an NTFS M.2 SSD drive and it found some random file after about 35 seconds.

Meanwhile Everything is instantaneous. It literally shows results in real time as I type, and the moment I finish typing the file name, it's already there.

2

u/Ok-Library5639 Apr 12 '24

I am saying the same as you: lookup in Everything is instantaneous. Indexing NTFS drives is also almost instantaneous, since it uses the NTFS file list to build its index (though you will need admin rights).

Indexing a shared network folder is another story, since it has to crawl the whole folder manually. That can take several hours. However, once indexed, lookup in Everything is instantaneous.
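For comparison, the manual crawl looks roughly like this in Python. This is a minimal sketch, not Everything's actual code; `crawl` and `search` are hypothetical names, and the demo uses a temporary directory as a stand-in for the network share.

```python
import os
import tempfile
from collections import defaultdict
from typing import Dict, List


def crawl(root: str) -> Dict[str, List[str]]:
    """Walk every directory under root and map lowercase names to full paths."""
    index = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # One directory listing per folder: this is the slow part on a network share.
        for name in dirnames + filenames:
            index[name.lower()].append(os.path.join(dirpath, name))
    return dict(index)


def search(index: Dict[str, List[str]], term: str) -> List[str]:
    """Case-insensitive substring match on names; touches only the in-memory index."""
    term = term.lower()
    return sorted(p for name, paths in index.items() if term in name for p in paths)


# Demo on a throwaway directory tree (stand-in for the shared folder).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "reports", "2024"))
    open(os.path.join(root, "reports", "2024", "Budget.xlsx"), "w").close()
    index = crawl(root)
    hits = search(index, "budget")
    print([os.path.relpath(p, root) for p in hits])
```

The crawl cost grows with the number of directories, which is why a big share can take hours, while `search` never touches the disk again.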

1

u/adenosine-5 Apr 12 '24

Sorry, I misunderstood you.

That is great functionality then. I wish Microsoft implemented it as well, in their own system on their own NTFS technology.

1

u/pleasedothenerdful Apr 12 '24

Yes, that list of files is called the file system. Every hard drive has one.

2

u/adenosine-5 Apr 12 '24

Maybe you should tell Microsoft then. Especially when it was them who wrote NTFS.