How to Spotlight Search Across Every Mac With Osquery
Or — Learn how to find potential data breaches before they happen
Have you ever wanted to find exactly the right file on your system in a hurry?
That’s why those clever folks at Apple built Spotlight.
Originally introduced in Mac OS X Tiger, Spotlight continuously maintains an index of all of the files on your Mac, which allows you to instantly search for files not just by their names, but by a files’s metadata, and even the text content inside of them.
As the UX designer at Kolide, I lean on Spotlight heavily everyday, to quickly locate and pull up the right design assets from the 1000+ Sketch files on my system.
While the benefits of Spotlight as a user of macOS are obvious and intuitive, you may be surprised to learn you can leverage this powerful feature across your Mac fleet to hunt for evidence of malware, data breaches, and other undesirable artifacts in your end-users’ devices.
In this article, we will discuss how you can programmatically access Spotlight using a utility called Osquery, and we’ll demonstrate useful queries you can run to find evidence of compromise in your organization.
What Is Osquery?
Osquery is a free open-source project created by Facebook that allows you to query a device with SQL as if it were a real relational database.
For example, if you wanted to list all of the apps installed on a device you
can simply open your terminal, type osqueryi
(osquery’s command line utility)
and in the prompt run the following:
SELECT * FROM apps LIMIT 1;
name = 1Password 7.app
path = /Applications/1Password 7.app
bundle_executable = 1Password 7
bundle_identifier = com.agilebits.onepassword7
bundle_name = 1Password 7
bundle_short_version = 7.0.7
bundle_version = 70007000
bundle_package_type = APPL
environment =
element =
compiler = com.apple.compilers.llvm.clang.1_0
development_region = en
display_name =
info_string =
minimum_system_version = 10.12.6
category = public.app-category.productivity
applescript_enabled = 0
copyright = Copyright © AgileBits Inc.
last_opened_time = 1533860585.5755
When I run this query, osquery translates the SQL into live API calls that list all of the apps currently installed on my device. If I installed more apps and ran this command again, the output would immediately reflect those changes.
For more information on the architecture and inner-workings of osquery, see Zach Wasserman’s spectacular deep dive, Osquery: Under the Hood
In our query above, apps
is called a Virtual Table in osquery terminology.
Osquery has hundreds of these virtual tables, many of them which work across
platforms. To see them all check out
osquery’s schema documentation.
To download osquery on your device, simply visit the official website and find the right package for your platform.
This article focuses on a special Virtual Table called mdfind. This Virtual
Table (named after the CLI utility mdfind
) allows us to use Spotlight to
locate files on a Mac that meet our search criteria.
The mdfind Virtual Table
Prior to mdfind’s inclusion into osquery, searching for files across the file-system when their location was unknown, necessitated great deals of recursion, and was best avoided to prevent undue strain on the device being queried.
You can read more about searching for files using the file
table in my previous
blog post: The File Table in Osquery is Amazing! — but it has drawbacks.
Enter macOS Spotlight and the mdfind table!
macOS Spotlight (mdfind) is like a lightweight grep
without the p
. The best
part is, because it’s built around an index, it’s insanely fast and well
suited to locating files where the precise location or name is unknown.
The mdfind
Virtual Table was born first as a custom osquery go table written
by Victor Vrantchan (groob) and later added
(v3.2.6) to the core Osquery open-source project by Facebook developer
Mitchell Grenier (obelisk)
Unlike most osquery Virtual Tables, where querying is straightforward if you are
familiar with SQL, mdfind
requires some Apple developer-level knowledge
of Spotlight’s own unique query language. You should think of mdfind
like a
pass-through that allows you to access the raw power of Spotlight from osquery.
What Can mdfind Find?
Spotlight indexes an absolutely incredible breadth of data across your file system. There are over 125 published metadata attributes that Spotlight is capable of indexing and which you can search by. However most systems actually have closer to double that number and 3rd party applications can add even more via custom attributes.
You can check to see what metadata attributes are present on your own device by running the following command in the terminal:
mdimport -A
The naming schema is straightforward and all attributes are prepended by
kMDItem
:
k - (Hungarian Notation for constant, used by Apple since Pascal)
MD - (metadata)
Item
AttributeName
The standard stuff is all there of course:
- File Name (
kMDItemFSName
) - File Size (
kMDItemFSSize
) - File Creation Date (
kMDItemFSCreationDate
)
Spotlight also gives you access to metadata that you may not realize even exists across your files including:
- Downloaded File Source (
kMDItemWhereFroms
) - File EXIF Altitude in Meters above sea-level (
kMDItemAltitude
) - PDF Password Security Method (
kMDItemSecurityMethod
)
While some of these are useful (and others just strange), they still require some basic level of knowledge about the files themselves. What if you are just looking for files of any type that contain specific phrases or confidential information? This is where Spotlight really shines (pun intended). This is accessed through the attribute…
- File Text Contents (
KMDItemTextContent
)
Spotlight can index the text contents of any ASCII plain-text files, PDFs, Messages, emails, text files, csv’s, python files, shell scripts, JSON and other compatible formats, that are within an indexable location. That’s right, it’s pretty much the bee’s knees! Let’s discuss some practical applications of that utility.
Using mdfind to Prevent a Data Breach
Unlike other types of devastating cyber security incidents that involve advanced threat actors and malware, most data breaches for SaaS companies can be attributed to innocuous events. For example, a well-intentioned software engineer simply forgetting to delete a production database backup on their device after troubleshooting a customer issue.
Once a production database backup is on a device, forgetting to encrypt that device and leaving it in the back of a cab is all it takes for a simple oversight to turn into a disastrous headline.
So, let’s say you were afraid your engineers had unintentionally left a copy of
your production database on their laptop. How would we find it? For this example,
I will be using Kolide’s Live Query feature to run a search across a few
of our own Kolide devices (but you could use osqueryi
to test
these queries yourself).
I have created a mock production db dump on my device to illustrate the process, located at:
/Users/fritz-imac/dev/pg/backups/backup_2018-07-11T06-57-36Z
** 👀 Let’s see if we can find it!**
* For the purposes of this post I am going to assume little experience in writing osquery SQL queries and share a couple useful tips along the way. If you want to skip ahead to the finished query feel free to jump ahead to the final attempt.
Attempt 1 — A basic mdfind query: Single condition, zero complexity
The most basic approach would be to search for any file containing the string
'backup'
. All mdfind queries have the same basic building blocks, and they are
all joined on a table that contains a path
column, most typically the file
table.
SELECT f.path FROM file AS f JOIN mdfind ON mdfind.path = f.path
This segment tells osquery to join file
against mdfind
on path
and to
return the file.path
.
Next, we need to provide the mdfind.query
component. This will be the method
we use to search the device for the desired files and pass the appropriate
path
to the joined file
table.
Our condition is represented here as:
AND mdfind.query = "kMDItemFSName == '*backup*'"
In it, we are telling the mdfind API to look for any files whose filename
contains the partial string '*backup*'
with the *
characters representing
wildcards which would allow us to match things like: 01-02-22-backup.zip
or
fritzsbackupfile.gzip
It’s important to note that the mdfind.query
must be made within double
quotations and individual operator comparison strings such as ‘backup’ must be
within single quotes.
This gives us a complete and valid mdfind osquery query:
SELECT
f.path
FROM file AS f
JOIN mdfind ON mdfind.path = f.path
AND mdfind.query = "kMDItemFSName == '*backup*'"
Let’s give it a try now to see what is returned:
Attempt 1 - 2016 Results
Oof!
With only 9 Devices Targeted, it is going to take a fair bit of sifting to find our needle in the haystack. You can imagine how much larger the result set becomes when you are querying thousands of machines.
Furthermore, because the only item returned by mdfind is path we need to ask the file table to return some more relevant information in our results:
“Tell me more, tell me more”
Attempt 2 — Two conditions and boolean logic
In order to expedite our ability to parse these results, let’s return some additional metadata such as:
- file size:
f.size
- creation time:
f.btime
- last modified time:
f.mtime
Additionally, we will use the datetime
function to return time in the standard
ISO-8601 format and we will use the ROUND
function on f.size
so that it
returns in MB instead of bytes.
For more information on transforming osquery results into ‘pretty’ human readable output, I recommend reading: How to deal with dates and times in osquery
Alright, now that we have some data about the files that we can quickly scan by eye, let’s add some more conditional logic.
As we discussed earlier, Spotlight has a truly wild feature, which is the ability to search an item’s text content (across a pretty wide array of filetypes including .pdf, .olm, .py, etc.)
We can run our search against a file’s contents by calling the
kMDItemTextContent
metadata attribute in our query:
...AND mdfind.query = "kMDItemTextContent == 'foo'"
We need to think like a production database backup and consider what strings we
would give a strong signal to noise ratio for filtering down results. For
we might look for a CREATE TABLE
statement that would indicate the presence of
a standard SQL DB.
SELECT
f.path,
f.size AS size_bytes,
ROUND((f.size * 10e-7),2) AS size_megabytes,
f.btime AS file_created_epoch,
datetime(f.btime, "unixepoch") AS file_created
FROM file f
JOIN mdfind ON mdfind.path = f.path
AND mdfind.query =
"kMDItemFSName == '*backup*' && kMDItemTextContent == 'CREATE TABLE'"
Attempt 2 - 9 Results
🔔 Ding, ding, ding! 🔔 - I see our file!
But we can do better!!
Attempt 3 — Filtering down results
As you might have seen in our results set, there is an additional component we could be scoping our query with. Any uncompressed DB backup will likely range in size from hundreds of MB to hundreds of GB. Therefore, we can add a condition to our query to filter out any results that are below a certain threshold in size. For our example, we will ignore any file that is less than 100 MB.
SELECT
f.path,
f.size AS size_bytes,
ROUND((f.size * 10e-7),2) AS size_megabytes,
f.btime AS file_created_epoch,
datetime(f.btime, "unixepoch") AS file_created
FROM file f
JOIN mdfind ON mdfind.path = f.path
AND mdfind.query =
"kMDItemFSName == '*backup*' && kMDItemTextContent == 'CREATE TABLE'"
AND size_megabytes > 100
Attempt 3 - 1 Result
We did it! Using three clauses, we’ve isolated our test file!
Attempt 4 — But wait there’s more…
What if I told you there was another identical backup file without such a
convenient naming schema located on the device? Well, we are going to miss it
with that kMDItemFSName
condition. Let’s try killing that condition and seeing
what happens…
🥜 Nuts! — Some false positives in there!
But I can see our other file. Let’s see if we can filter it down by further
refining the kMDItemTextContent
argument with some knowledge of the database
we’re looking for.
I happen to know our production database should have a table called alerts
, so
let’s change the string to CREATE TABLE alerts
:
SELECT
f.path,
f.size AS size_bytes,
ROUND((f.size * 10e-7),2) AS size_megabytes,
f.btime AS file_created_epoch,
datetime(f.btime, "unixepoch") AS file_created
FROM file f
JOIN mdfind ON mdfind.path = f.path
AND mdfind.query =
"kMDItemFSName == '*backup*' && kMDItemTextContent == 'CREATE TABLE alerts'"
AND size_megabytes > 100
Attempt 4 - 2 Results
Boom goes the dynamite!
We have found our db backups and can now reach out to the responsible end-user and ask that they delete the unused file.
Hopefully this iterative example will give you some inspiration into how you
might leverage the mdfind
table to locate and prevent unintentional
catastrophic data breaches.
In the next section, we will discuss some of the basic syntax that you can use
within the mdfind.query
mdfind Query Syntax Tips
Apple Support Documentation: File Metadata Query Expression Syntax
Comparative logic works using the following operators
==
equals
!=
not equal
<
less than
>
greater than
<=
less than or equal to
>=
greater than or equal to
c
makes string case-insensitive
...mdfind.query = "kMDItemFSName = '*FoO'c"
d
ignores diacritical marks (such as à, ê, ñ, ß, etc.)
...mdfind.query = "kMDItemFSName = '*föo'd"
*
Wildcard lets you search for partial matches on either side of a string
...mdfind.query = "kMDItemFSName = '*foo'"
&&
AND condition
...mdfind.query = "kMDItemFSName = 'foo' && kMDItemTextContent = 'bar'"
||
OR condition
...mdfind.query = "kMDItemFSName = 'foo' || kMDItemFSName = 'bar'"
(
& )
Use parentheses to enclose multiple groups of conditions:
...mdfind.query = "(kMDItemFSName = 'foo' || kMDItemFSName = 'bar') && (kMDItemTextContent = 'paris' || kMDItemTextContent = 'france')
$time.
Constraining to time such as files created in a time range:
(eg. $time.now
, $time.today
, $time.yesterday
, $time.this_week
,
$time.this_month
, $time.this_year
)
These can be further modified by providing a parenthetical number afterwards.
The number in parenthesis refers to the unit of time measurement: (eg. now
is
registered in seconds, today
in days, this_week
weeks, etc.)
Let’s modify the $time.now
example to search for files created in the last
hour.
...mdfindquery = "kMDItemFSCreationDate >= $time.now(-3600)"
“Wow! This mdfind thing can do it all! There’s no stopping me!”
🐯 Easy tiger… there are a couple of gotchas that you have to look out for when using mdfind with osquery!
Caveats to the mdfind Osquery Table:
Spotlight can only give you the paths of matching files
“There’s no p in this gre”
While the contents of files can be read by mdfind, they cannot be printed (output) as part of the results set.
This is an intentional feature on the part of the Facebook osquery team. This limitation keeps the mdfind table within the scope of osquery’s privacy-minded development. The Facebook security team has stated: arbitrarily reading files has never been and will never be on the roadmap for core osquery development.
“It’s kind of like playing 20 questions”
Because you cannot print the surrounding strings that match within a file when
querying kMDItemTextContent
or examine the file by arbitrarily reading it in
osquery. You must construct your query very intentionally in order to avoid
false positives.
Remember our earlier example, if you wanted to look for downloaded copies of
your production database across your infrastructure. You could not simply search
for the string CREATE TABLE
because it would net too many false positives.
Instead you would want to specify a string that would be found only in a real db
backup and strengthen your argument by adding exclusionary criteria such as
scoping to file size.
No regular expressions
I want them, you want them, sadly none of us can have them.
As useful as regex would be, it is currently unsupported by Spotlight and Apple has shown no sign of intending to add regex functionality ever.
This means, no matter how cool it would be, you can’t run the following query within mdfind to search for plain-text files containing credit card numbers:
...AND mdfind.query = "-regex '^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})$'
You have to get more creative when it comes to finding things like files
containing credit cards or social security numbers. The easiest way is typically
looking for files with strings like cvv
, or ssn
, but those may produce large
quantities of false positives.
Another approach is including canary fingerprint values in your db backups which you can use to key off of (eg. a fake credit card # of a known value that _always belongs to a fake user).
mdfind only indexes *some of your files
Because the intended usage of Spotlight is to quickly locate relevant user actionable items (apps, system preferences, files), Spotlight does not index hidden folders or files by default. While mdfind can be forced to import from new directories, Apple prevents system files and hidden directories from being indexed. This means you cannot run the following query to find all of your locally cloned repositories:
...AND mdfind.query = "kMDItemFSName = '.git'"
This also means forget about indexing your .ssh
folders or any items located
within hidden folders (those prepended with the .
character).
macOS only & Spotlight must not be disabled
It should go without saying that you cannot use the mdfind table on any operating system other than macOS.
I would love to see someone take on the challenge of developing a similar table for Windows Cortana, (the Linux space has too many options to choose only one) but for the moment we just have to satisfy ourselves with querying solely the Macs in our fleet via this method.
Furthermore, due to indexing performance issues in earlier iterations of OS X there are still those (grumpy developers) who disable mdfind on their system.
Users can also exclude directories from being indexed and globally limit what types of files are imported on their system.
To output any user-specified excluded directories you can run the following query:
SELECT
value
FROM plist
WHERE path = '/.Spotlight-V100/VolumeConfiguration.plist'
AND key = 'Exclusions'
AND value IS NOT NULL
AND value != ''
Categories of files can also be excluded irrespective of their parent directory. For instance, some folks don’t love Spotlight indexing their Mail.app, Outlook.app or Messages.app conversation and email history. As a result querying for these files will often not return results.
To output the Spotlight preferences of a user you can run the following osquery query against their device. Unfortunately, due to the handling of nested keys by osquery, the plist XML output will need to be humanly parsed and cannot be procedurally checked.
SELECT *
FROM plist
WHERE path LIKE '/Users/%/Library/Preferences/com.apple.Spotlight.plist';
Advanced and easy mode, but easy mode isn’t always so easy
The mdfind table can be queried in one of two ways: explicitly, wherein you
specify the kMDItem
attributes and their desired criteria, or implicitly*
wherein you simply provide a simple string of text like a user would in the
Spotlight Search bar.
*I didn’t discuss this method earlier because it is my firm belief that it doesn’t work as well and results in too many false-positives, but I will demonstrate it here for the sake of being comprehensive.
Query for any file that contains foo
:
...AND mdfind.query = "foo"
Query for any file that contains BOTH foo
and bar
*
*By default all space delimited strings are treated as AND
‘ed conditions when simple querying Spotlight
...AND mdfind.query = "foo bar"
Query for any file that contains foo
but NOT bar
...AND mdfind.query = "foo(-bar)"
Know your metadata attributes, use the ones that work
Apple Support Documentation: Spotlight Metadata Attributes
While using Spotlight in the UI of macOS allows intelligent full-text search
across available attributes, the mdfind
table in osquery is much more useful
if specify the attributes you wish to search and the criteria you want to match:
(eg. kMDItemFSName
, kMDItemTextContent
, etc.)
Because you need to know your metadata attributes by name it helps to keep a cheat-sheet like the above link handy so that you can find what you are looking for.
In general however the most useful items I have found are:
1. kMDItemFSName
Great for finding files of a certain extension type:
...AND mdfind.query = "kMDItemFSName == '*.pdf'"
2. kMDItemTextContent
Similar to our article’s production backup example, great for finding strings inside of compatible documents.
...AND mdfind.query = "kMDItemTextContent == '*ssn,*'"
3. kMDItemFSCreationDate
& kMDItemFSContentChangeDate
Great for finding any file that was created / modified on, or within a range.
...AND mdfind.query =
"( kMDItemFSCreationDate >= $time.iso(2018-08-01T00:00Z)
&& kMDItemFSCreationDate <= $time.iso(2018-08-20T00:00Z))
&& (kMDItemFSName = '*.csv')"
Some mdfind operators do not work as documented
onlyin
has only spotty support
mdfind has an argument called: -onlyin /path/you/want
which would typically
constrain the results to only items which are within the specified parent directory
(any level of nesting below that parent directory). This argument can be used
but must be formatted accordingly with the string first followed by -onlyin
:
...AND mdfind.query = "foo -onlyin /path/you/want/"
Because of the way that we are forced to write our query within double
quotations you cannot (to my frustration) constrain to a parent directory with a
space in the path. Enclosing the path in single quotes does not work, \
escaping the spaces does not work. If you can figure out a way, I would love to
hear about it.
kMDItemKind
is kind of garbage
You may be tempted to search for files that are pdf’s by typing:
...AND mdfind.query = "kMDFSItemKind == 'pdf'"
but DON’T!
For some totally unclear reason, the ItemKind metadata attribute is inconsistent at best and will miss files that it shouldn’t.
You should instead rely on kMDItemFSName with wildcards and explicitly name the desired extension, as it is more reliable:
...AND mdfind.query = "kMDItemFSName == '*.pdf'"
Chained OR
conditions
mdfind
in the terminal supports strings such as: mdfind foo|bar(-baz)
This would return items that matched foo
OR bar
, but NOT baz
These pipe |
OR conditions cannot be used in osquery when using the simple
syntax.
In order to OR
conditions you must use explicit kMDItem
conditions separated
by double pipes: || eg.
...AND mdfind.query "kMDItemFSName == 'foo' || kMDItemFSName == 'bar'"
And on that note…
Maximum 5 OR
conditions for a single metadata attribute
Let’s say you wanted to find files that contain any of the following strings.
fritz@acme.com
jane@acme.com
john@acme.com
frank@acme.com
stella@acme.com
joyce@acme.com
rupert@acme.com
You would only be able to specify 5 OR
'ed conditions using:
kMDItemTextContent = 'fritz@acme.com' || kMDItemTextContent = 'jane@...
With more than 5 Conditions, mdfind
stops returning results entirely.
If you feel demoralized you regarding the usage of mdfind, I am here to tell you
that despite these limitations, mdfind
has allowed us at Kolide to build dozens
of compelling queries for our customers.
Wrapping Things Up
As you can see, there is a wealth of possibility in the mdfind table for quickly locating files or performing aggregation functions.
Stay tuned for the next blog post in this series, wherein, we will build on the knowledge presented in this article to focus less on the basics and explore more real world examples such as:
- Non permitted Photos taken while at a secure facility.
- Finding files downloaded from a specific source (eg. plain-text 2FA backup codes)
- CSV files created in the last 14 days, containing the string 'SSN’
If you simply can’t wait to read the next article, more mdfind queries can be found in Kolide powering our Checks feature which looks for potential sources of data compromise across your fleet.
If you’d like to read more osquery content like this, sign up for our biweekly newsletter.