Contents>

How to Spotlight Search Across Every Mac With Osquery

Or — Learn how to find potential data breaches before they happen

Have you ever wanted to find exactly the right file on your system in a hurry?

That’s why those clever folks at Apple built Spotlight.

Originally introduced in Mac OS X Tiger, Spotlight continuously maintains an index of all of the files on your Mac, which allows you to instantly search for files not just by their names, but by a files’s metadata, and even the text content inside of them.

As the UX designer at Kolide, I lean on Spotlight heavily everyday, to quickly locate and pull up the right design assets from the 1000+ Sketch files on my system.

While the benefits of Spotlight as a user of macOS are obvious and intuitive, you may be surprised to learn you can leverage this powerful feature across your Mac fleet to hunt for evidence of malware, data breaches, and other undesirable artifacts in your end-users’ devices.

In this article, we will discuss how you can programmatically access Spotlight using a utility called Osquery, and we’ll demonstrate useful queries you can run to find evidence of compromise in your organization.

What Is Osquery?

Osquery is a free open-source project created by Facebook that allows you to query a device with SQL as if it were a real relational database.

For example, if you wanted to list all of the apps installed on a device you can simply open your terminal, type osqueryi (osquery’s command line utility) and in the prompt run the following:

SELECT * FROM apps LIMIT 1;

                  name = 1Password 7.app
                  path = /Applications/1Password 7.app
     bundle_executable = 1Password 7
     bundle_identifier = com.agilebits.onepassword7
           bundle_name = 1Password 7
  bundle_short_version = 7.0.7
        bundle_version = 70007000
   bundle_package_type = APPL
           environment =
               element =
              compiler = com.apple.compilers.llvm.clang.1_0
    development_region = en
          display_name =
           info_string =
minimum_system_version = 10.12.6
              category = public.app-category.productivity
   applescript_enabled = 0
             copyright = Copyright © AgileBits Inc.
      last_opened_time = 1533860585.5755

When I run this query, osquery translates the SQL into live API calls that list all of the apps currently installed on my device. If I installed more apps and ran this command again, the output would immediately reflect those changes.

For more information on the architecture and inner-workings of osquery, see Zach Wasserman’s spectacular deep dive, Osquery: Under the Hood

In our query above, apps is called a Virtual Table in osquery terminology. Osquery has hundreds of these virtual tables, many of them which work across platforms. To see them all check out osquery’s schema documentation.

To download osquery on your device, simply visit the official website and find the right package for your platform.

This article focuses on a special Virtual Table called mdfind. This Virtual Table (named after the CLI utility mdfind) allows us to use Spotlight to locate files on a Mac that meet our search criteria.

The mdfind Virtual Table

Prior to mdfind’s inclusion into osquery, searching for files across the file-system when their location was unknown, necessitated great deals of recursion, and was best avoided to prevent undue strain on the device being queried.

You can read more about searching for files using the file table in my previous blog post: The File Table in Osquery is Amazing! — but it has drawbacks.

Enter macOS Spotlight and the mdfind table!

macOS Spotlight (mdfind) is like a lightweight grep without the p. The best part is, because it’s built around an index, it’s insanely fast and well suited to locating files where the precise location or name is unknown.

The mdfind Virtual Table was born first as a custom osquery go table written by Victor Vrantchan (groob) and later added (v3.2.6) to the core Osquery open-source project by Facebook developer Mitchell Grenier (obelisk)

Unlike most osquery Virtual Tables, where querying is straightforward if you are familiar with SQL, mdfind requires some Apple developer-level knowledge of Spotlight’s own unique query language. You should think of mdfind like a pass-through that allows you to access the raw power of Spotlight from osquery.

What Can mdfind Find?

Spotlight indexes an absolutely incredible breadth of data across your file system. There are over 125 published metadata attributes that Spotlight is capable of indexing and which you can search by. However most systems actually have closer to double that number and 3rd party applications can add even more via custom attributes.

You can check to see what metadata attributes are present on your own device by running the following command in the terminal:

mdimport -A

Sample `mdimport -A` output

The naming schema is straightforward and all attributes are prepended by kMDItem:

k - (Hungarian Notation for constant, used by Apple since Pascal)
MD - (metadata)
Item
AttributeName

The standard stuff is all there of course:

File Name (kMDItemFSName)
File Size (kMDItemFSSize)
File Creation Date (kMDItemFSCreationDate)

Spotlight also gives you access to metadata that you may not realize even exists across your files including:

Downloaded File Source (kMDItemWhereFroms)
File EXIF Altitude in Meters above sea-level (kMDItemAltitude)
PDF Password Security Method (kMDItemSecurityMethod)

While some of these are useful (and others just strange), they still require some basic level of knowledge about the files themselves. What if you are just looking for files of any type that contain specific phrases or confidential information? This is where Spotlight really shines (pun intended). This is accessed through the attribute…

File Text Contents (KMDItemTextContent)

Spotlight can index the text contents of any ASCII plain-text files, PDFs, Messages, emails, text files, csv’s, python files, shell scripts, JSON and other compatible formats, that are within an indexable location. That’s right, it’s pretty much the bee’s knees! Let’s discuss some practical applications of that utility.

Using mdfind to Prevent a Data Breach

Unlike other types of devastating cyber security incidents that involve advanced threat actors and malware, most data breaches for SaaS companies can be attributed to innocuous events. For example, a well-intentioned software engineer simply forgetting to delete a production database backup on their device after troubleshooting a customer issue.

Once a production database backup is on a device, forgetting to encrypt that device and leaving it in the back of a cab is all it takes for a simple oversight to turn into a disastrous headline.

So, let’s say you were afraid your engineers had unintentionally left a copy of your production database on their laptop. How would we find it? For this example, I will be using Kolide’s Live Query feature to run a search across a few of our own Kolide devices (but you could use osqueryi to test these queries yourself).

I have created a mock production db dump on my device to illustrate the process, located at:

/Users/fritz-imac/dev/pg/backups/backup_2018-07-11T06-57-36Z

** 👀 Let’s see if we can find it!**

* For the purposes of this post I am going to assume little experience in writing osquery SQL queries and share a couple useful tips along the way. If you want to skip ahead to the finished query feel free to jump ahead to the final attempt.

Attempt 1 — A basic mdfind query: Single condition, zero complexity

The most basic approach would be to search for any file containing the string 'backup'. All mdfind queries have the same basic building blocks, and they are all joined on a table that contains a path column, most typically the file table.

SELECT f.path FROM file AS f JOIN mdfind ON mdfind.path = f.path

This segment tells osquery to join file against mdfind on path and to return the file.path.

Next, we need to provide the mdfind.query component. This will be the method we use to search the device for the desired files and pass the appropriate path to the joined file table.

Our condition is represented here as:

AND mdfind.query = "kMDItemFSName == '*backup*'"

In it, we are telling the mdfind API to look for any files whose filename contains the partial string '*backup*' with the * characters representing wildcards which would allow us to match things like: 01-02-22-backup.zip or fritzsbackupfile.gzip

It’s important to note that the mdfind.query must be made within double quotations and individual operator comparison strings such as ‘backup’ must be within single quotes.

This gives us a complete and valid mdfind osquery query:

SELECT
  f.path
FROM file AS f
JOIN mdfind ON mdfind.path = f.path
 AND mdfind.query = "kMDItemFSName == '*backup*'"

Let’s give it a try now to see what is returned:

query results

Attempt 1 - 2016 Results

Oof!

With only 9 Devices Targeted, it is going to take a fair bit of sifting to find our needle in the haystack. You can imagine how much larger the result set becomes when you are querying thousands of machines.

Furthermore, because the only item returned by mdfind is path we need to ask the file table to return some more relevant information in our results:

“Tell me more, tell me more”

Attempt 2 — Two conditions and boolean logic

In order to expedite our ability to parse these results, let’s return some additional metadata such as:

file size: f.size
creation time: f.btime
last modified time: f.mtime

Additionally, we will use the datetime function to return time in the standard ISO-8601 format and we will use the ROUND function on f.size so that it returns in MB instead of bytes.

For more information on transforming osquery results into ‘pretty’ human readable output, I recommend reading: How to deal with dates and times in osquery

Alright, now that we have some data about the files that we can quickly scan by eye, let’s add some more conditional logic.

As we discussed earlier, Spotlight has a truly wild feature, which is the ability to search an item’s text content (across a pretty wide array of filetypes including .pdf, .olm, .py, etc.)

We can run our search against a file’s contents by calling the kMDItemTextContent metadata attribute in our query:

...AND mdfind.query = "kMDItemTextContent == 'foo'"

We need to think like a production database backup and consider what strings we would give a strong signal to noise ratio for filtering down results. For we might look for a CREATE TABLE statement that would indicate the presence of a standard SQL DB.

SELECT
  f.path,
  f.size AS size_bytes,
  ROUND((f.size * 10e-7),2) AS size_megabytes,
  f.btime AS file_created_epoch,
  datetime(f.btime, "unixepoch") AS file_created
FROM file f
JOIN mdfind ON mdfind.path = f.path
 AND mdfind.query =
"kMDItemFSName == '*backup*' && kMDItemTextContent == 'CREATE TABLE'"

Attempt 2 - 9 Results

🔔 Ding, ding, ding! 🔔 - I see our file!

But we can do better!!

Attempt 3 — Filtering down results

As you might have seen in our results set, there is an additional component we could be scoping our query with. Any uncompressed DB backup will likely range in size from hundreds of MB to hundreds of GB. Therefore, we can add a condition to our query to filter out any results that are below a certain threshold in size. For our example, we will ignore any file that is less than 100 MB.

SELECT
  f.path,
  f.size AS size_bytes,
  ROUND((f.size * 10e-7),2) AS size_megabytes,
  f.btime AS file_created_epoch,
  datetime(f.btime, "unixepoch") AS file_created
FROM file f
JOIN mdfind ON mdfind.path = f.path
 AND mdfind.query =
"kMDItemFSName == '*backup*' && kMDItemTextContent == 'CREATE TABLE'"
 AND size_megabytes > 100

the updated query

Attempt 3 - 1 Result

We did it! Using three clauses, we’ve isolated our test file!

Attempt 4 — But wait there’s more…

What if I told you there was another identical backup file without such a convenient naming schema located on the device? Well, we are going to miss it with that kMDItemFSName condition. Let’s try killing that condition and seeing what happens…

the new query showing false positives

🥜 Nuts! — Some false positives in there!

But I can see our other file. Let’s see if we can filter it down by further refining the kMDItemTextContent argument with some knowledge of the database we’re looking for.

I happen to know our production database should have a table called alerts, so let’s change the string to CREATE TABLE alerts:

SELECT
  f.path,
  f.size AS size_bytes,
  ROUND((f.size * 10e-7),2) AS size_megabytes,
  f.btime AS file_created_epoch,
  datetime(f.btime, "unixepoch") AS file_created
FROM file f
JOIN mdfind ON mdfind.path = f.path
 AND mdfind.query =
"kMDItemFSName == '*backup*' && kMDItemTextContent == 'CREATE TABLE alerts'"
 AND size_megabytes > 100

the new query showing false positives

Attempt 4 - 2 Results

Boom goes the dynamite!

We have found our db backups and can now reach out to the responsible end-user and ask that they delete the unused file.

Hopefully this iterative example will give you some inspiration into how you might leverage the mdfind table to locate and prevent unintentional catastrophic data breaches.

In the next section, we will discuss some of the basic syntax that you can use within the mdfind.query

mdfind Query Syntax Tips

Apple Support Documentation: File Metadata Query Expression Syntax

Comparative logic works using the following operators

== equals

!= not equal

< less than

> greater than

<= less than or equal to

>= greater than or equal to

c makes string case-insensitive

...mdfind.query = "kMDItemFSName = '*FoO'c"

d ignores diacritical marks (such as à, ê, ñ, ß, etc.)

...mdfind.query = "kMDItemFSName = '*föo'd"

* Wildcard lets you search for partial matches on either side of a string ...mdfind.query = "kMDItemFSName = '*foo'" && AND condition

...mdfind.query = "kMDItemFSName = 'foo' && kMDItemTextContent = 'bar'"

|| OR condition

...mdfind.query = "kMDItemFSName = 'foo' || kMDItemFSName = 'bar'"

( & ) Use parentheses to enclose multiple groups of conditions:

...mdfind.query = "(kMDItemFSName = 'foo' || kMDItemFSName = 'bar') && (kMDItemTextContent = 'paris' || kMDItemTextContent = 'france')

$time. Constraining to time such as files created in a time range: (eg. $time.now, $time.today, $time.yesterday, $time.this_week, $time.this_month, $time.this_year)

These can be further modified by providing a parenthetical number afterwards. The number in parenthesis refers to the unit of time measurement: (eg. now is registered in seconds, today in days, this_week weeks, etc.)

Let’s modify the $time.now example to search for files created in the last hour.

...mdfindquery = "kMDItemFSCreationDate >= $time.now(-3600)"

“Wow! This mdfind thing can do it all! There’s no stopping me!”

🐯 Easy tiger… there are a couple of gotchas that you have to look out for when using mdfind with osquery!

Caveats to the mdfind Osquery Table:

Spotlight can only give you the paths of matching files

“There’s no p in this gre”

While the contents of files can be read by mdfind, they cannot be printed (output) as part of the results set.

This is an intentional feature on the part of the Facebook osquery team. This limitation keeps the mdfind table within the scope of osquery’s privacy-minded development. The Facebook security team has stated: arbitrarily reading files has never been and will never be on the roadmap for core osquery development.

“It’s kind of like playing 20 questions”

Because you cannot print the surrounding strings that match within a file when querying kMDItemTextContent or examine the file by arbitrarily reading it in osquery. You must construct your query very intentionally in order to avoid false positives.

Remember our earlier example, if you wanted to look for downloaded copies of your production database across your infrastructure. You could not simply search for the string CREATE TABLE because it would net too many false positives. Instead you would want to specify a string that would be found only in a real db backup and strengthen your argument by adding exclusionary criteria such as scoping to file size.

No regular expressions

I want them, you want them, sadly none of us can have them.

As useful as regex would be, it is currently unsupported by Spotlight and Apple has shown no sign of intending to add regex functionality ever.

This means, no matter how cool it would be, you can’t run the following query within mdfind to search for plain-text files containing credit card numbers:

...AND mdfind.query = "-regex '^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})$'

You have to get more creative when it comes to finding things like files containing credit cards or social security numbers. The easiest way is typically looking for files with strings like cvv, or ssn, but those may produce large quantities of false positives.

Another approach is including canary fingerprint values in your db backups which you can use to key off of (eg. a fake credit card # of a known value that _always belongs to a fake user).

mdfind only indexes *some of your files

Because the intended usage of Spotlight is to quickly locate relevant user actionable items (apps, system preferences, files), Spotlight does not index hidden folders or files by default. While mdfind can be forced to import from new directories, Apple prevents system files and hidden directories from being indexed. This means you cannot run the following query to find all of your locally cloned repositories:

...AND mdfind.query = "kMDItemFSName = '.git'"

This also means forget about indexing your .ssh folders or any items located within hidden folders (those prepended with the . character).

macOS only & Spotlight must not be disabled

It should go without saying that you cannot use the mdfind table on any operating system other than macOS.

I would love to see someone take on the challenge of developing a similar table for Windows Cortana, (the Linux space has too many options to choose only one) but for the moment we just have to satisfy ourselves with querying solely the Macs in our fleet via this method.

Furthermore, due to indexing performance issues in earlier iterations of OS X there are still those (grumpy developers) who disable mdfind on their system.

Users can also exclude directories from being indexed and globally limit what types of files are imported on their system.

spotlight privacy settings

To output any user-specified excluded directories you can run the following query:

SELECT
  value
FROM plist
WHERE path = '/.Spotlight-V100/VolumeConfiguration.plist'
  AND key = 'Exclusions'
  AND value IS NOT NULL
  AND value != ''

the query checking for excluded directories

Categories of files can also be excluded irrespective of their parent directory. For instance, some folks don’t love Spotlight indexing their Mail.app, Outlook.app or Messages.app conversation and email history. As a result querying for these files will often not return results.

a screenshot of Spotlight Item Index Preferences

To output the Spotlight preferences of a user you can run the following osquery query against their device. Unfortunately, due to the handling of nested keys by osquery, the plist XML output will need to be humanly parsed and cannot be procedurally checked.

SELECT *
FROM plist
WHERE path LIKE '/Users/%/Library/Preferences/com.apple.Spotlight.plist';

Advanced and easy mode, but easy mode isn’t always so easy

The mdfind table can be queried in one of two ways: explicitly, wherein you specify the kMDItem attributes and their desired criteria, or implicitly* wherein you simply provide a simple string of text like a user would in the Spotlight Search bar.

*I didn’t discuss this method earlier because it is my firm belief that it doesn’t work as well and results in too many false-positives, but I will demonstrate it here for the sake of being comprehensive.

Query for any file that contains foo:

...AND mdfind.query = "foo"

Query for any file that contains BOTH foo and bar *

*By default all space delimited strings are treated as AND‘ed conditions when simple querying Spotlight

...AND mdfind.query = "foo bar"

Query for any file that contains foo but NOT bar

...AND mdfind.query = "foo(-bar)"

Know your metadata attributes, use the ones that work

Apple Support Documentation: Spotlight Metadata Attributes

While using Spotlight in the UI of macOS allows intelligent full-text search across available attributes, the mdfind table in osquery is much more useful if specify the attributes you wish to search and the criteria you want to match: (eg. kMDItemFSName, kMDItemTextContent, etc.)

Because you need to know your metadata attributes by name it helps to keep a cheat-sheet like the above link handy so that you can find what you are looking for.

In general however the most useful items I have found are:

1. `kMDItemFSName`

Great for finding files of a certain extension type:

...AND mdfind.query = "kMDItemFSName == '*.pdf'"

2. `kMDItemTextContent`

Similar to our article’s production backup example, great for finding strings inside of compatible documents.

...AND mdfind.query = "kMDItemTextContent == '*ssn,*'"

3. `kMDItemFSCreationDate` & `kMDItemFSContentChangeDate`

Great for finding any file that was created / modified on, or within a range.

...AND mdfind.query =
"(   kMDItemFSCreationDate >= $time.iso(2018-08-01T00:00Z)
  && kMDItemFSCreationDate <= $time.iso(2018-08-20T00:00Z))
  && (kMDItemFSName = '*.csv')"

Some mdfind operators do not work as documented

`onlyin` has only spotty support

mdfind has an argument called: -onlyin /path/you/want which would typically constrain the results to only items which are within the specified parent directory (any level of nesting below that parent directory). This argument can be used but must be formatted accordingly with the string first followed by -onlyin:

...AND mdfind.query = "foo -onlyin /path/you/want/"

Because of the way that we are forced to write our query within double quotations you cannot (to my frustration) constrain to a parent directory with a space in the path. Enclosing the path in single quotes does not work, \ escaping the spaces does not work. If you can figure out a way, I would love to hear about it.

`kMDItemKind` is kind of garbage

You may be tempted to search for files that are pdf’s by typing:

...AND mdfind.query = "kMDFSItemKind == 'pdf'"

but DON’T!

For some totally unclear reason, the ItemKind metadata attribute is inconsistent at best and will miss files that it shouldn’t.

You should instead rely on kMDItemFSName with wildcards and explicitly name the desired extension, as it is more reliable:

...AND mdfind.query = "kMDItemFSName == '*.pdf'"

Chained `OR` conditions

mdfind in the terminal supports strings such as: mdfind foo|bar(-baz)

This would return items that matched foo OR bar, but NOT baz

These pipe | OR conditions cannot be used in osquery when using the simple syntax.

In order to OR conditions you must use explicit kMDItem conditions separated by double pipes: || eg.

...AND mdfind.query "kMDItemFSName == 'foo' || kMDItemFSName == 'bar'"

And on that note…

Maximum 5 `OR` conditions for a single metadata attribute

Let’s say you wanted to find files that contain any of the following strings.

fritz@acme.com
jane@acme.com
john@acme.com
frank@acme.com
stella@acme.com
joyce@acme.com
rupert@acme.com

You would only be able to specify 5 OR'ed conditions using: kMDItemTextContent = 'fritz@acme.com' || kMDItemTextContent = 'jane@...

With more than 5 Conditions, mdfind stops returning results entirely.

If you feel demoralized you regarding the usage of mdfind, I am here to tell you that despite these limitations, mdfind has allowed us at Kolide to build dozens of compelling queries for our customers.

Wrapping Things Up

As you can see, there is a wealth of possibility in the mdfind table for quickly locating files or performing aggregation functions.

Stay tuned for the next blog post in this series, wherein, we will build on the knowledge presented in this article to focus less on the basics and explore more real world examples such as:

Non permitted Photos taken while at a secure facility.
Finding files downloaded from a specific source (eg. plain-text 2FA backup codes)
CSV files created in the last 14 days, containing the string 'SSN’

If you simply can’t wait to read the next article, more mdfind queries can be found in Kolide powering our Checks feature which looks for potential sources of data compromise across your fleet.

If you’d like to read more osquery content like this, sign up for our biweekly newsletter.