Get Notified When Critical Processes Stop Running Using Kolide + Osquery
Osquery delivers unparalleled speed for endpoint monitoring. With a Live Query, you can ask thousands of devices simultaneously to report on their current state and get results back in realtime. However, no one has time to sit at a console running queries all day long.
Users need a method to persistently monitor specific items and collect that data over time. This requirement led to the creation of Collections. Collections allow users to repeatedly run queries and cache the results for devices over time.
The scheduled nature of these Collections means we can stay up to date on the current or last known state of a device, without manually running the same Live Query hundreds of times a day.
An example of a collection might be one that catalogs all of the macOS applications installed across your fleet. The query would look like:
SELECT * from apps;
The data output by this Collection is stored in two formats side by side, Results and Events.
Collections results
Results represent the most current data of devices matching the conditions of the Collection.
In this example, the Results tab would be comprised of any application that is currently installed on a mac. For this device there are 450 currently installed applications.
Collections events
Events represent the entire historical dataset of devices which have ever matched the conditions of the Collection.
So in our example, it would include all installed applications: observed OR observed and then later removed. Hence, the reason we have 10,429 events.
You might think 10,429 historical applications sounds like a lot. This is because anytime a row of data is changed, it results in a removed and added event. To better understand this concept, we can take the example of an application that undergoes an update.
+-----------------+---------+---------+--------------------------+
| name | version | event | time |
+-----------------+---------+---------+--------------------------+
| Adobe Photoshop | 11.0.1 | added | 2019-01-23T17:40:28.000Z |
| Adobe Photoshop | 10.9.8 | removed | 2019-01-23T17:40:28.000Z |
| Adobe Photoshop | 10.9.8 | added | 2018-03-10T09:21:15.000Z |
+-----------------+---------+---------+--------------------------+
After being updated, the version number of the application will change. Since version number is part of our result set, the original row of data for that application will no longer match the updated values. The old row will have a removed event and the newly collected data (with the updated version) number will be an added event.
Considering the likelihood and impact of churn
If you want to avoid producing unwanted churn of mostly duplicative events,
you need to consider the schema of your queried table and what columns are
subject to frequent change. For example, if you were to query the mounts
table you would not want to SELECT *
or include inodes
or blocks
columns, as they will be different nearly every time the mounts
table
is queried.
Collections advanced search
The most powerful feature of Collections is the Advanced Search, which allows you to search offline datasets and to build notification rules using boolean logic.
This type of searching allows you to drill down through your results set without necessitating multiple iterative Live Query attempts. For instance, using this we can check for Adobe Photoshop apps that haven’t been opened in a long time to facilitate software license retrieval.
Creating event tags and notifications using advanced search
Using Advanced Search from the Events tab we can build conditional rulesets for Event Tags which can power external Notifications.
If we wanted to check that a critical process was always up and running (Apache webserver, etc.) we could create a Collection and Event Tag to check for the process’ absence. For our example, let’s create a Notification which alerts us anytime the Slack desktop app process is not running.
Crafting the proper query
You might be tempted to use the following query to monitor a process’ presence on a device:
SELECT * FROM processes
Unfortunately, because of the exceptionally dynamic state of processes on a device, this would create huge amounts of churn as we discussed above, and your dataset would easily reach millions of rows of data recording anytime a process started and stopped.
Finding the absence of data
We want to take a more targeted approach with the processes
table. Likewise,
because we are trying to chase the absence of data, we will need to utilize a
query with NOT EXISTS
logic:
SELECT * FROM system_info
WHERE
NOT EXISTS (SELECT *
FROM processes
WHERE name = "Slack");
This query returns the system_info
for any device which is found not to
have the Slack
process currently running.
Creating the collection
We can go ahead and create the collection in Kolide Cloud by navigating to: https://kolide.com/app/collections/new
After naming our new Collection and choosing which platforms it will run on, we can copy paste our query from above and create the new Collection. Collections get added to the osquery schedule and as a result, typically take several minutes to initially populate with data.
Once the collection is created we will go to the Events tab, open the Advanced Search and create a new Event Tag for any events that match:
_kolide_action
is added
This way, we will know anytime a device is added to the Collection, meaning in our case, that the Slack process stopped running.
Configuring notifications on top of collections
Now that we have created a new Event Tag we can navigate to the Notifications page of the app and create a New Slack Channel notification:
https://kolide.com/app/settings/notifiers
Once the Slack notification has been configured we will get notifications in the specified Slack channel anytime the Event Tag criteria is met. So if we were to kill our Slack app for testing purposes we should see a new Event produced and a subsequent notification:
That’s a Wrap!
We will now receive a notification every time a device is online but the Slack desktop app process is not active. This same logic could be applied to any other process or the installation of an application using a similar NOT EXISTS query. We’ve had customers use similar queries to:
Check that their desired antivirus app was installed
Check that Apache is running
Check for properly customized etc/hosts
Finding the absence of expected data is just as critical as detecting the presence of unwanted data.
If you haven’t explored the incredible utility of osquery yet on your own, Kolide offers the fastest and easiest route to get up and running.
If you’d like to read more osquery content like this, sign up for our biweekly newsletter.