How We Built an SQL Firewall — AcraCensor

Intro

There are two main ways to mitigate SQL injections: inside the app (using prepared statements, stored procedures, escaping) and outside the app (using Web Application Firewalls or SQL firewalls).

WAFs analyse web and HTML traffic using rule sets based on regexs and are good for covering the known vulnerabilities. SQL firewalls sit closer to the database, analyse SQL statements for potentially malicious content, which makes them more flexible in SQL injections prevention. As our data encryption suite Acra implements defense in depth principle, it uses SQL firewall as one of the lines of defense for protected data.

We had previously covered the issue of “WAF vs SQL firewalls” in detail in the article “Preventing SQL Injections When WAF’s Not Enough” — you can read it if you’d like to delve deeper into the differences between these firewall types (or you can skip right to Building an SQL firewall to make SQL injection prevention easier section).

Why do you need an SQL firewall?

WAFs are a common way of providing database protection against SQL injections. The problem with this approach is that they mostly filter out the known SQL injection types and attack patterns, doing it on the network level, verifying the signatures in the traffic in the inner network. There is a million of ways to bypass WAF or to trigger false positives (i.e. ModSecurity is often criticised for triggering dozens of false alarms). With WAF, there a technical possibility for malicious code to be interpreted by the backend of the app and executed on the database.

SQL firewall can detect the malicious command in the query that would look harmless to a WAF

SQL firewalls improve security in cases where WAF isn't sufficient (also covered in the article mentioned above). When we’ve built Acra encryption suite, we quickly realized that having encryption/decryption proxy in front of a database is a perfect place for integrating an SQL firewall. But when we tried to find a suitable open source firewall, we faced the lack thereof. Since Acra has to parse SQL statements anyway, we created our own SQL filtering engine as inseparable part of Acra encryption suite — the AcraCensor SQL firewall.

Building an SQL firewall to make SQL injection prevention easier

Just like all the other components of Acra database protection suite, AcraCensor is open source and is written using Go. AcraCensor only supports SQL database types — MySQL and PostgreSQL (and their flavours i.e. MariaDB). AcraCensor logs SQL requests directed to the database, allows and denies those requests according to certain rules (configured as allowlist/denylist), or ignores them and logs exception (might be useful if some requests can’t be parsed by AcraCensor).

We made AcraCensor compatible with SIEM systems, with logs useable for alerts configuration and anomaly detection. To demonstrate the process of configuring AcraCensor for SQLi prevention in OWASP Mutillidae 2 example app, we created a Docker-based demo.

Here (and previously in this article) we’re using some slides from the conference talk by Artem Storozhuk, software security engineer at Cossack Labs who worked on AcraCensor. The slides help explain the principles and details of AcraCensor’s work.

A video demo of AcraCensor at work

AcraCensor is configured through a set of rules, which define which SQL queries to let through and which — to block:

handlers:
  - handler: deny
    queries:
      - SELECT * FROM users
    tables:
      - cars
    patterns:
      - SELECT user_address FROM users %%WHERE%%
      - "%%INSERT%%"
  - handler: allowall

A very simple example configuration of AcraCensor rules. Read on for more details.

Parsing SQL protocol: a thousand ways to shoot yourself in the foot

The first thing that AcraCensor does for blocking requests is parsing SQL statements, which is not easy. Let’s see how it’s done:

Each incoming query is parsed into its AST structure according to the syntax of the SQL protocol. Each rule from the configuration is also parsed into an AST. Upon launch, AcraCensor translates configuration rules into AST structures in memory so that it doesn’t compromise the performance on processing of each query. If an incoming query agrees with at least one allow rule, AcraCensor allows it to proceed; otherwise, it is blocked (actually rules processing is a more complex multi-step strategy as AcraCensor supports several ways of matching, we outline more details on this below).

The Difference in Syntax Between MySQL and PostgreSQL

Both MySQL and PostgreSQL implement their “supersets” above the SQL protocol. Their syntax and the keywords and very different. Here are the main differences we’ve tripped over:

  • RETURNING keyword (supported by PostgreSQL, but not by MySQL);
  • Interpretation of double quoted identifiers: “tableName” | “columnName”;
  • JSON (PostgreSQL has following syntax '{"a":1,"b":2}'::json->'b', MySQL has different one JSON_OBJECT('{"a":1,"b":2}')->"$.b");
  • Full text search (is not standardized in SQL so every database implements it in its own way);
  • Prepared Statements syntax (prepare FROM vs prepare AS);
  • Common table expressions (set up names for queries);
  • Typecasting (column_name::type_name vs column_name type_name).

And this is just a quick comparison — the complete list of differences can go on and on. If you want to see the bottom of this iceberg — read more about the differences in syntax between MySQL and PostgreSQL.

Lack of a good open source SQL parser

Because of the syntax differences, many projects only support a specific version of the SQL protocol for the exact database they work with. At first, we used SQLparser by xwb1989 — an open source parser written on Go, but it didn’t quite meet our needs as it only supports a limited set of queries and doesn’t support prepared statements, stored procedures, and complex requests.

A few pull requests would have turned SQLparser into a tool we could use. But the SQLparser’s repository looked abandoned so we decided to fork it and expand within Acra’s repository, adding specific tweaks for several database types we work with.

No SQL parser can parse all SQL statements

Unfortunately, even with our additions and improvements, AcraCensor still wasn’t able to parse all of the queries. And it's potentially dangerous to send such queries to the database as they might contain SQL injections. So for security reasons, we decided that AcraCensor will block "unparseable" queries by default. However, some of these queries might be totally legit, so users should have a way to allow them. To remedy this, we added a special ignore_parse_error field into the configuration.

By default, ignore_parse_error is false in AcraCensor. Setting it to true will push AcraCensor to ignore "unparseable" queries and send them to the database — allowing developers to configure AcraCensor rules during development and integration.

In production environment, security-oriented users leave ignore_parse_error disabled, but add such “unparseable but legit” queries to the QueryIgnore handler manually, thus having well-defined and secure firewall configuration. The users encountering parsing errors can send us their GitHub Issue or PR and we'll look into it and try to help.

Configuration and rules

Existing handlers

Here is an example of configuring AcraCensor. It consists of handlers that are applied top to bottom:


ignore_parse_error: true
parse_errors_log: unparsed_queries.log
handlers:
  - handler: deny
    queries:
      - SELECT * FROM users
    tables:
      - customers
      - secrets
      - salaries
    patterns:
      - SELECT EMP_ID, LAST_NAME FROM EMPLOYEE %%WHERE%%;
      - SELECT user_address FROM users %%WHERE%%
      - SELECT %%COLUMN%% FROM users WHERE userid = %%VALUE%%
  - handler: allow
    queries: 
      - SELECT username, password, mysignature, is_admin FROM accounts;
    tables:
      - credit_cards
      - accounts
      - blogs_table
      - page_hints
      - page_help
    patterns:
      - "%%INSERT%%"

Allowlist and denylist

There are two basic handler types for AcraCensor:

  • The allow, which allows something specific and restricts/forbids everything else. The allowall that should be a final statement, that means that all other queries will be allowed.
  • The deny, which allows everything and forbids something specific. The denyall means to block all queries (that haven’t been allowed or ignored before).

For each handler, there are settings that regulate:

  • Queries — exact match character-by-character of the incoming SQL query and configured one;
  • Tables — match all requests for specified tables;
  • Patterns — match specific SQL statement types.

You can configure the allowlist and the denylist separately or simultaneously. The order of priority for the lists is defined by their order in the configuration file. Priority of processing for each list is the following: queries, followed by tables, followed by patterns.

Pattern matching

Pattern matching is a feature that was really fun to build. As we mentioned before, each query is parsed into AST structure. Each input pattern from the configuration file is also parsed as AST structure.

Patterns work in the following manner: when a query addressing the database is checked, each pattern (if any detected) is parsed alongside the incoming query. Parsing process extracts the SQL statement type (SELECT, INSERT, UPDATE etc), the WHERE and UNION nodes, tables, columns, and values from the body of the query. If the pattern and the query have matching nodes, then tables are checked. If matches for tables are detected, then columns are checked. When separate columns match, a decision to block (through denylist) or to allow (through allowlist) the query is made.

The following piece of code illustrates how every incoming query is checked rule-by-rule:


// CheckQuery checks each query, returns false and error if query is not whitelisted or
// if query tries to access to non-whitelisted table
func (handler *AllowHandler) CheckQuery(normalizedQuery string, parsedQuery sqlparser.Statement) (bool, error) {
  // skip unparsed queries
  if parsedQuery == nil {
    return true, nil
  }
  //Check exact queries
  if len(handler.queries) != 0 {
    queryMatch := common.CheckExactQueriesMatch(normalizedQuery, handler.queries)
    if queryMatch {
      return false, nil
    }
  }
  //Check tables
  if len(handler.tables) != 0 {
    _, allTablesInWhitelist := common.CheckTableNamesMatch(parsedQuery, handler.tables)
    if allTablesInWhitelist {
      return false, nil
    }
  }
  //Check patterns
  if len(handler.patterns) != 0 {
    matchingOccurred := common.CheckPatternsMatching(handler.patterns, parsedQuery)
    if matchingOccurred {
      return false, nil
    }
  }
  return true, nil
}

CheckPatternsMatching method is basically a loop that tried to match query with every patterns from configuration. Jump inside the AcraCensor matching core to see details about pattern matching for each of SQL nodes.

Take a look at the example of COLUMN pattern:


rule
SELECT %%COLUMN%%, %%COLUMN%% FROM company

matching queries
SELECT users, cats FROM company
SELECT a, b FROM company

non-matching queries
SELECT users FROM company
SELECT users, cats, chameleons FROM company 
SELECT users, cats, chameleons FROM company
SELECT users, cats FROM zoo

AcraCensor supports several dozen of patterns, the full list can be found in the documentation for Acra/AcraCensor.

Security-wise configurations

Despite the fact that AcraCensor can be configured in many ways, we suggest the following structure of handlers for better security:

  • Put query_capture first to log every query into the censor_log.
  • Next, put query_ignore to ignore some database-specific control queries that occur when the database is starting and when the database drives check connections.
  • Following the secure by default technique, put the allow handler to allow listed queries.
  • Finally, put the denyall handler. In this case all queries that are not in the allowlist will be blocked.

There is a less secure, still a useable approach when the deny handler is set for blocking all the unwanted queries, and then the allowall handler is put to allow all the other queries.

Example configuration for “Allow — Denyall” rule:


parse_errors_log: unparsed_queries.log
handlers:
  - handler: query_capture
    filepath: censor.log
  - handler: query_ignore
    queries:
      - ROLLBACK
      - COMMIT
      - BEGIN
  - handler: deny
    patterns:
      - SELECT * FROM users
      - SELECT %%COLUMN%% FROM users %%WHERE%%
      - DROP table users
  - handler: allow
    tables:
      - users
      - accounts
      - blogs_table
    patterns:
      - "%%INSERT%%"
  - handler: denyall

Logging and masking requests

It’s very important that AcraCensor doesn’t log sensitive data. That’s why the first thing AcraCensor does is masking the values in SQL queries’ logs. After building AST structure, all “leaves” or “values” are edited and masked to replaced keyword.

This is what the logs from AcraCensor look like:


select * from accounts where cid = :replaced1
insert into hitlog(hostname, ip, browser, referer, `date`) values (:replaced1, :replaced2, :replaced3, :replaced4, now())
insert into accounts(username, password, mysignature) values (:replaced1, :replaced2, :replaced3)

This is what the code looks like:


/// HandleRawSQLQuery returns a normalized (lowercases SQL commands) SQL string,
// and redacted SQL string with the params stripped out for display.
// Taken from sqlparser package
func HandleRawSQLQuery(sql string) (normalizedQuery, redactedQuery string, parsedQuery sqlparser.Statement, err error) {
  bv := map[string]*querypb.BindVariable{}
  sqlStripped, _ := sqlparser.SplitMarginComments(sql)

  // sometimes queries might have ; at the end, that should be stripped
  sqlStripped = strings.TrimSuffix(sqlStripped, ";")

  stmt, err := sqlparser.Parse(sqlStripped)
  if err != nil {
    return "", "", nil, ErrQuerySyntaxError
  }
  outputStmt, _ := sqlparser.Parse(sqlStripped)

  normalizedQ := sqlparser.String(stmt)

  // redact and mask VALUES
  sqlparser.Normalize(stmt, bv, ValueMask)
  redactedQ := sqlparser.String(stmt)

  return normalizedQ, redactedQ, outputStmt, nil
}

Performance

Any database proxy is bound to introduce network performance drawbacks. We did some tests on working laptops (not a server-like configuration) and found out that AcraCensor introduces ~6,7% performance penalty compared to using Acra without firewall.

We measured the number of connections during one session from one client application to PostgreSQL database and back. First we did a few rounds of testing to measure the “ideal performance” of the client-server-database infrastructure, then we introduced Acra as data protection proxy between backend app and database and repeated tests, and finally, we enabled AcraCensor with complex SQL firewall rule set and repeated tests again. We had no goal to get the best possible results or to optimize the database performance, however we are quite happy with current AcraCensor performance results (the details about testing and measurements can be found in the talk “Building SQL firewall: insights from developers”).

Future improvements of AcraCensor

At the present moment, the users need to set the AcraCensor configuration rules manually. To do that more quickly, we recommend turning on the firewall in the “allowall” mode and running the app in the beta-testing mode to collect all firewall logs and processed queries. Based on the logs, you’ll be able to formulate the rules and patterns for the firewall.

But how to make AcraCensor more user-friendly? By adding “dynamic profiling”. The stage of rule-setting can be automated - we plan to add machine learning for the SQL firewall to be able to automatically accumulate the queries, define, and suggest the rules. From the security point of view, such machine learning system should not be run/tested in a production setting because there is a risk that the attackers may “teach” the firewall to set the “wrong” rules. Many security engineers are quite sceptical about all the “machine learning” hype, but we believe that it makes our work easier if used correctly.

Defense in depth

AcraCensor is part of Acra’s defense in depth approach, that includes:

  • Sensitive data encryption (encrypt data on client side in a way so it can’t be decrypted on client side or database side);
  • Trusted zone data decryption (decrypt data only in well-monitored trusted zone — AcraServer itself — that stores cryptographic keys securely and controls who and why wants to decrypt the data);
  • Access control (encrypt data per user/per application using different keys);
  • Intrusion detection system (based on poison records (honey tokens));
  • Audit logs (to have a track of entities who accesses data, with timestamps and access details);
  • Integration to SIEMs (for monitoring events and analyzing anomalies);
  • Logging, metrics, tracing (to understand what’s going on under the hood).

To sum it up, it will be worth mentioning that we still don’t expect AcraCensor or some other SQL firewall, or, in fact, any other single tool alone to become a “silver bullet” of security. One tool — even great one — is never enough for solid protection.

Security-wise, any project should implement echelonised defense principles and not rely on a single tool, building security for the whole data lifecycle instead. This is what defence in depth is — an independent set of security controls aimed at mitigating multiple risks in case of an attacker crossing the outer perimeter.

Copyright © 2014-2019 Cossack Labs Limited
Cossack Labs is a privately-held British company with a team of data security experts based in Kyiv, Ukraine.