How to Build a Searchable File Index at Scale

How many files do you have on your computer? How much total space do they take up? Armed with only the file name, could you find a specific file that you worked on three years ago? If your disk were getting full, which files would you delete or move first to free up space?

Chances are that you know where to go to answer these questions, especially if you have file management software. These tools usually build a metadata index: a data structure that maps file names to metadata such as access time, modification time, and file size, and that supports aggregation and filter queries over this metadata. For example, a metadata index could quickly answer the question “what is the total size of all .mp3 …
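To make the idea concrete, here is a minimal in-memory sketch of such an index in Go. The names FileMeta, Index, and TotalSizeByExt are hypothetical and chosen for illustration; they are not taken from DataDiscover, and a real index at scale would not scan every entry per query.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"time"
)

// FileMeta holds the per-file attributes stored in the index.
// The field set is illustrative, not DataDiscover's actual schema.
type FileMeta struct {
	Size       int64     // size in bytes
	AccessTime time.Time // last access time
	ModTime    time.Time // last modification time
}

// Index maps a file path to its metadata.
type Index map[string]FileMeta

// TotalSizeByExt sums the sizes of all files whose extension matches ext.
// The match is case-insensitive and ext includes the leading dot (".mp3").
// This is a linear scan over every entry, which is fine for a sketch only.
func (idx Index) TotalSizeByExt(ext string) int64 {
	var total int64
	for path, meta := range idx {
		if strings.EqualFold(filepath.Ext(path), ext) {
			total += meta.Size
		}
	}
	return total
}

func main() {
	now := time.Now()
	idx := Index{
		"/music/a.mp3": {Size: 4_000_000, AccessTime: now, ModTime: now},
		"/music/b.MP3": {Size: 6_000_000, AccessTime: now, ModTime: now},
		"/docs/cv.pdf": {Size: 120_000, AccessTime: now, ModTime: now},
	}
	fmt.Println(idx.TotalSizeByExt(".mp3")) // prints 10000000
}
```

A filter query (for example, files not accessed in the last three years) would follow the same pattern, testing AccessTime instead of the extension.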


Why we built our own NFS client in Golang

This article is part of a series about building Igneous DataDiscover, a searchable file index. Click here for the overview.

Figure: diagram (go to the first post in this series for an overview of this diagram)

The first step in building a metadata index, as you might imagine, is actually getting the file metadata so that we can index it. This is easy: all we have to do is mount the NFS server at a local directory (localmount here) and then write code to walk the localmount directory, collecting metadata.
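As a rough illustration of this naive approach, the walk could look like the Go sketch below. It assumes the export is already mounted at a directory named localmount (or at whatever path is passed as the first argument). CollectMetadata and FileMeta are hypothetical names, and this is not the code Igneous shipped.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// FileMeta is one record of collected metadata (illustrative fields only).
type FileMeta struct {
	Path    string
	Size    int64
	ModTime time.Time
	IsDir   bool
}

// CollectMetadata walks root and returns one record per entry, including
// root itself. It stops at the first error it encounters.
func CollectMetadata(root string) ([]FileMeta, error) {
	var out []FileMeta
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // e.g. missing root or permission denied
		}
		// Info fetches size and timestamps; on a mounted volume this
		// request is served by the kernel's NFS client.
		info, err := d.Info()
		if err != nil {
			return err
		}
		out = append(out, FileMeta{
			Path:    path,
			Size:    info.Size(),
			ModTime: info.ModTime(),
			IsDir:   d.IsDir(),
		})
		return nil
	})
	return out, err
}

func main() {
	root := "localmount" // assumed mount point; override with the first argument
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	metas, err := CollectMetadata(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		os.Exit(1)
	}
	var total int64
	for _, m := range metas {
		if !m.IsDir {
			total += m.Size
		}
	}
	fmt.Printf("%d entries, %d bytes in non-directory entries\n", len(metas), total)
}
```

Every directory listing and attribute lookup in this sketch goes through the mounted volume, so its throughput is bounded by the kernel's NFS client.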

Ah, if only it had been that easy. We would have saved many late nights, and I could end this article here. The problem with reading from a mounted volume is that you’re limited by what the kernel’s NFS client can do, and the Linux kernel’s NFS client implementation is not exactly built for high performance. We can figure out what the NFS client is doing by inspecting the network traffic. …

About

Sudarshan Muralidhar

Software engineer at Igneous. Cofounder of Upbeat Music App. I do cloud things.
