Summary
Log, or machine, data can contain a wealth of information and be used for multiple use cases from Security Analytics to IT Operations and Monitoring. However, it can often be very difficult to extract any meaning out of this data due to its structure and sheer volume.
Looker can be used to give log data meaning and make it easier for end users to extract insights from a dataset that is traditionally difficult to work with. A “Brute Force Attack” is a common Security Analytics pattern that is used to detect potentially malicious activity when a user’s attempted logins are consecutively denied several times before eventually succeeding.
Looker’s threshold-based alerts, schedules, and actions can be used to detect this activity, alert someone via email, slack, or text, and also trigger other workflows like opening a support ticket.
Dataset
This should be setup on “Access” or “Audit” logs that contain information on users and attempted logins. In this example we are using GCP Audit Logs which contain the following required fields: Login Time, Service Name (name of the service the user is trying to access), User Email, and Login Granted (Y/N). Your data may vary depending on what logs you are collecting but we will use these four columns as the basis of our analysis.
Solution
Because we need to understand the specific order and timing of which these events occurred, we will need to need to use a derived table that leverages SQL window functions and user-input liquid parameters. Using a derived table with user inputs (parameters), we can construct a query to help identify security threats. Scheduling this query to alert you when certain criteria are met can help you stay on top of your security threats without having to manually check logs. Our solution uses four CTEs to calculate all of the necessary fields needed to identify threats. Below, we will dissect each CTE used in our analysis.
Disclaimer: This analysis is written in BigQuery. Some SQL functions may need to be altered if you are using a different dialect. The goal of this article is to give you the necessary building blocks so that you can implement this within your Looker instance.
CTE Breakdown
rank_logins CTE: The first CTE in our derived table is used to calculate three columns, Login Rank, Login Grant Rank, and Previous Login Time.
-
Login Rank - What number login is this for a given user? This rank is cumulative as time progresses and increases by one each login regardless of whether or not the user had a successful login.
-
Login Grant Rank - What number login is this for a given user but paritioned by whether or not it was a successful login. Again, this rank increases by one for each new login but in this case we rank the successful logins separately from the unsuccessful logins.
-
Previous Login Time - What is the time of the previous login?
Both of the ranks columns are calculated using the ROW_NUMBER window function from BigQuery. The Previous Login Time column is calculated using the LAG window function.
Using date parameters we can grab the start and end date based on a user’s filter selection to only capture logs within a given timeframe. This timeframe can be relative (last 24 hours) or absolute (between 2020-10-01 and 2020-10-15).
There is an optional filter if you’d only like to capture logs from certain users (e.g. only look at logs for users with an @company.com email).
consec_logins CTE: Our second CTE in our derived table is used to calculate two new columns, Consec Login Rank and Event ID Timestamp. Consec Login Rank is calculated from Login Rank and Login Grant Rank from the rank_logins CTE.
-
Consec Login Rank - How many cumulative logins is this for a given user and service assuming we start the count over whenever we switch between a successful and unsuccessful login (and vice-versa). If you have four successful logins following by three unsuccessful logins following by two successful logins, we would rank these: 1, 2, 3, 4, 1, 2, 3, 1, 2
-
Event ID Timestamp - What’s the max timestamp for a given series of logins (each series of successful or unsuccessful logins for a given series is considered an event).
For this CTE, we use the ROW_NUMER and MAX window functions.
login_facts CTE: Our third CTE in our derived table is used to calculate two new “facts” about our logins, Latest Successful Login and Max # of Failed Consecutive Logins
-
Latest Successful Login - When was our last successful login (within the given timeframe from the first CTE)?
-
Max # of Failed Consecutive Logins - What was the max number of consecutive unsuccessful logins?
These columns are calculated for a given user and service.
max_failed_login CTE: The last CTE in our derived table is used to calculate one new column, Max Failed Login Time
- Max Failed Login Time - What is the max time associated with the most consecutive failed logins. Note: This is not just our most recent failed login time. We specifically want the max time associated with the max number of consecutive unsuccessful logins.
This column is calculated for a given user and service.
Putting It All Together
Now that we have all of the necessary calculations, we will join the CTEs together to complete our analysis. In this final table, we will add a few filters to ensure we only pull back security threats that are relevant.
-
Filter for events where the max successful login time is greater than the max failed login time. We only want to capture those events where an individual was able to gain access to our system after several unsuccessful logins.
-
Filter for failed logins above our threshold. One of the filters required for this analysis is a failed_login_threshold. This tells us how many consecutive failed logins are required before we consider something to be suspicious. For example, if you set this threshold at five, the analysis will only return events that had at least five failed logins in a row for a given service and where there was a successful login after that fifth (or higher) unsuccessful login.
-
Filter for the most recent successful login after our series of failed logins (above our threshold) so that we can see that this user successfully logged into our system after a suspicious number of unsuccessful logins.