Matomo

Transparent data encryption for SQL databases with Acra 0.93 | Cossack Labs

🇺🇦 We stand with Ukraine, and we stand for Ukraine. We offer free assessment and mitigation services to improve Ukrainian companies security resilience.

List of blogposts

Transparent data encryption for SQL databases with Acra 0.93

Recently, we released a new version of the Acra database security suite. Acra allows encrypting sensitive data fully transparently for the app and the database without any application code changes.

In this article, we explain details of how transparent data encryption works, what actually makes it “transparent”, and what features of SQL protocol Acra encapsulates under the hood for a comforting user experience.

Acra transparent data encryption is available as part of Acra Community Edition (free on GitHub, checkout ready-to-use engineering examples).

If you are familiar with terms field level encryption and transparent data encryption, scroll to the How does Acra work? section.

Cossack Labs Acra provides transparent data field encryption

  1. What are modern transparent data encryption techniques?
  2. Database wire protocols
  3. Integrating Acra type awareness to the database wire protocols
  4. Example: application uses simple text query protocol
  5. Example: application uses extended binary query protocol
  6. Summary

What are modern transparent data encryption techniques? #

In the context of databases, data protection goes beyond “data at rest encryption” and “row level security”. The modern approach is to use transparent column level encryption or field level encryption.

Field level encryption means that the database stores sensitive fields (names, emails, accounts) in encrypted form.

“Acra transparent data encryption” means that the encryption process happens transparently for both: the database and the application that talks with the database. The typical approach is to use an encryption proxy in the middle, like the Acra database security suite. It’s important that encryption happens outside the database, thus the database never gets access to the plaintext data or encryption keys.

Cossack Labs Acra Server works as database proxy and encrypts/decrypts sensitive data fields transparently

Acra Server works as database proxy and encrypts/decrypts sensitive data fields transparently.

Don’t confuse Acra transparent data encryption with Oracle TDE and SQL Server TDE: while these technologies have similar naming, they work differently under the hood.

With Acra transparent data encryption, neither the database nor the app know that the data is encrypted, because it happens transparently for them. Also, Acra uses unique encryption keys per each sensitive data field, significantly decreasing chances of plaintext data leakage.

Benefits of transparent data encryption #

The biggest benefit of Acra transparent data encryption is that it doesn’t require any change in the application code or dealing with cryptographic libraries and keys. Transparent data encryption saves resources on development and allows to protect data with minimum effort.

Acra’s transparent encryption is the most useful when the infrastructure (apps, databases) is already built. For example, when data privacy regulations change, your team is required to encrypt sensitive fields but doesn’t need to rebuild the existing system.

How does Acra work? #

In short: your application communicates with the database via Acra.

Note: Acra can work as SQL proxy, as API server, as client-side SDK, or even module for SCADA. Here we describe Acra that works as a SQL database proxy.

Acra works as a database proxy: it receives all incoming data from the applications and encrypts sensitive data fields before sending them to the database. In the reverse direction, Acra captures queried data from the database, decrypts it, and sends to the application.

Developers or DBAs customize which fields to consider sensitive and which actions Acra should take (encrypt, searchable encrypt, mask, tokenize, etc). Typically, sensitive fields are PII (names, emails, dates), financial data, regulatory-specific data.

Important point is that Acra adds extra protection for sensitive data fields, which works in addition to TLS and database-level security controls (like row-level security and data-at-rest encryption).

How transparent data encryption works in Acra? #

In databases, data fields are stored in certain types: string, int, and bytes.

But encrypted data is binary data, it doesn’t preserve data format and type. It is stored as a binary large object (blob) in the database, or, in some rare cases, encoded as base64 string.

While the format preserving encryption algorithms exist, they don’t provide acceptable performance and security guarantees.

FPE works well in small home projects, but it doesn’t satisfy modern security requirements of the enterprise solutions.

Acra uses AES-256-GCM and wraps data into the AcraBlock during encryption of the sensitive data fields.

So, when data is encrypted, the database doesn’t know its original type anymore. But when data is read from the database and decrypted, it should receive its type again. Developers prefer to work with string, ints, and their custom serialization types but not the binary chunks.

Thus, it is important that Acra not only decrypts but decodes data back to its original type.

Cossack Labs Acra wraps data into the AcraBlocks, special cryptographic containers, during encryption of the sensitive data fields

Acra wraps data into the AcraBlocks, special cryptographic containers, during encryption of the sensitive data fields.

How does Acra know the original data types? #

In the previous versions, Acra didn’t know the original data type and sent decrypted data back as binary. Developers added a few code lines to their apps to decode binary data back to its original type. It made the encryption almost transparent and required some application changes, which is not ideal in many scenarios.

Since Acra 0.93.0 release, Acra supports fully transparent encryption. Acra decrypts and decodes binary data back to the original type, so the applications require zero changes and continue to work with strings and ints.

We call this feature “type awareness”.

When users configure Acra which data fields to encrypt in encryptor_config, they can specify the data type of each field, so Acra knows what to expect.

With type awareness, Acra not only hides cryptographic details from developers but reduces encryption integration cost and time.

Building type awareness was much harder than it looks like. It required special expertise in PostgreSQL and MySQL wire protocols, encoding, and encryption.

So, how to achieve it?


Curious to try Acra Community Edition without coding?


Database wire protocols #

Let’s describe how database wire protocols work in the parts related to querying data.

PostgreSQL and MySQL wire protocols support 2 types of queries: simple and complex.

PostgreSQL calls them a “simple query” and “extended query”. MySQL uses a different naming: “text protocol” and “prepared statements”.

PostgreSQL supports text and binary formats of query results for both protocols. By default, a simple query is not configurable and supports only text format, while an extended query protocol supports format selection (text or binary).

MySQL supports text and binary formats of query results according to a protocol type.

In general, these protocols share a similar approach. The first one looks as simple as possible via querying data by SQL query. The second one looks like SQL’s prepared statements but on a wire protocol layer.

The PostgreSQL protocols are a bit more complicated, and we will use them as a reference. Please note that all steps described below are true for MySQL too. Packets are named and look a bit differently, but the main approach is the same.

A. Simple query protocols #

Here is schema of packets for a simple query INSERT INTO users(id, email) VALUES ('1', 'example@cossacklabs.com') RETURNING id:

Cossack Labs PosgtreSQL query simple protocol schema

Inserting data using simple query protocol (PostgreSQL).

The database driver (DB driver) sends one packet with a query and receives four packets in response at once.

In the first packet, the Row Description, the database describes how many fields are in each row, their types, and encoding.

The second packet Data row contains result rows with data.

The last two packets, Command completion and Ready for query, are the service packets signaling that the last commands were executed and the database is ready to accept a new command.

B. Extended query protocols #

Extended query protocols are more complicated:

Cossack Labs PosgtreSQL query extended protocol schema

Inserting data using extended query protocol (PostgreSQL).

The database driver sends a query in two steps: first—a query, next—parameters. Each step contains several packets to both sides.

In the first step, the driver sends 3 packets:

  1. The Parse packet with a query INSERT INTO users(id, email) VALUES ($1, $2) RETURNING id with placeholders $1, $2 instead of real values.

The driver gives a name to that query as a statement name. It works as a prepared statement and can be re-used and referenced in the future. Optionally the driver may specify OID for each parameter or leave 0 that means “unspecified”.

  1. The Describe packet—to request description for parameters and result rows.

  2. The Sync is a required packet from the driver for extended protocol and closes implicit transactions.

The database responds with 4 packets:

  1. The Parse completion packet notifies that it is processed successfully.

  2. The Parameter description packet contains type identifiers OID expected from the database for each parameter. It describes to drivers what types are expected and how they should encode language-specific types into database-specific ones.

  3. The Row description packet contains the count of fields of every row in the result set, field or column name, and type identifiers. It helps drivers to understand what types they should expect and verify that they can decode them into language-specific types.

  4. The Ready for query packet signals about finishing processing of this block of packets and readiness to accept new commands.

In the second step, the driver sends values for parameters, asks to execute a query, and returns the query result:

  1. The Bind packet contains values for every placeholder, which format was used for encoding, statement name related to these parameters, and optionally specifies desired formats for the result rows and a portal name for the values.

The portal name is similar to a named cursor in PostgreSQL protocol. More information about the portal name you can find in the PostgreSQL documentation.

  1. The Describe packet—to request description for the result rows (without parameter descriptions as in the first step). The driver should repeat it because the types can be changed according to real values of the parameters.

  2. The Execute packet signals—to execute a statement with passed values and link the result with the portal name.

  3. The Sync packet notifies about the end of this block of packets.

The database responds with 5 packets:

  1. The Bind packet notifies about successful bind operation.

  2. The Row description contains the same or updated types according to the received parameters.

  3. The Data row packets contain the result values according to the Row description metadata.

  4. The Command completion packet notifies that sending of the Data row packets is finished.

  5. The Ready for query packet notifies about finishing processing current block of commands and readiness to accept a new command.


Integrating Acra type awareness to the database wire protocols #

This is how databases communicate with applications through drivers.

While looking at these schemas, let’s find out how we can hide the fact that the data types on the database side differ from the application types. Since the extended protocol includes a simple query protocol, we will rely on a more complex extended protocol to cover all the packet types.

From the application side to the database side we can find Parse packets including specified parameter types via OID. It is optional, usually drivers leave 0 values that mean “unspecified”.

On the other hand, the database sends the ParameterDescription and RowDescription packets and they are the most interesting for us.

The ParameterDescription packet contains information about the types a database expects to receive as parameters according to declared table names in a query. Strict drivers rely on that values and validate ORM models and passed parameters into the function to query data. If the language-specific data types cannot be mapped correctly to the database type, then the database driver can raise an error.

The RowDescription packet contains information about the data types of the result fields in the rows. The database drivers use metadata to validate the data types of a receiver’s variables / structs and check if they can be mapped from the database type.


Example: application uses simple text query protocol #

How does an application and a database communicate data types to each other?

Let’s take a look at a case with one table CREATE TABLE (id INTEGER PRIMARY KEY, email BYTEA). BYTEA is a PostgreSQL type for binary data, MySQL uses BINARY.

An application gets an email from the user, encrypts it (itself via Acra), and stores ciphertext in the database.

The simplest flow looks like this:

Cossack Labs scheme application uses simple text query protocol to send encrypted data without Acra type awareness

Application sends data to the database, Acra encrypts it without type awareness, application needs to decode decrypted data.

  1. The application gets an email string and converts it to the language-specific binary type (bytes for Python, bytea[] for Java, []byte for Golang). Usually, string can be represented and used as is for binary arrays.

  2. The applications write email to the database. The database driver creates a query INSERT INTO users(id, email) VALUES ($1, $2) RETURNING id, email and pass it as arguments (1, example@cossacklabs.com).

Different drivers do different actions:

  • encode values as SQL literals, escape string values if needed, and put them into the query string: INSERT INTO users (id, email) VALUES (1, '[example@cossacklabs.com](mailto:example@cossacklabs.com)') RETURNING id, email, eventually send the result using textual wire protocol;
  • use a binary protocol and send query in two steps: first, a query with placeholders, and then only data in text or binary format;
  • send query as a prepared statement under the hood: sending it in two steps but explicitly using prepared statements as SQL commands.

Let’s continue with the first case.

A driver will encode email to a binary literal that is usually represented as HEX string: example@cossacklabs.com -> \x6578616d706c6540636f737361636b6c6162732e636f6d (PostgreSQL bytea literal).

  1. The database parses a SQL query, extracts table’s name users, loads table’s schema from the internal system tables with a description of the column’s types declared in CREATE TABLE query. Then the database understands that the literal 1 has an INTEGER type (but could be BIGINTEGER, SMALLINTEGER, etc.) and email is BYTEA, and it can validate type restrictions.

  2. The database finds data and prepares to send it to the application. The database stores all data in a binary format for better efficiency. So, before sending, the data should be encoded into the textual format according to a wire protocol. The same system table is used to understand which data types should be used to encode the results. In our example, the system should send inserted id and email values.

  3. The database sends metadata of the response and the result data to the application. Database sends data in 2 packets.

First—metadata describing a query result data format (in our example, it’s a textual format, but it can be binary as well) and the result data types, so the application’s driver understands how to parse and decode the data. Usually, the data types and formats are represented as DB-specific identifiers.

Second packet—the result data according to the types and formats.

  1. The application receives and passes these packets to the database driver. The driver parses the metadata and then decodes the result data to the supported language-specific types.

  2. The application decodes the binary value of email to the string type to use string-related methods or prepare to render it in the user interface.


How Acra injects type metadata into simple query protocol #

If Acra knows the types used by the application, it can capture the data, encrypt, and encode it to the binary type used on the database side.

In a reverse direction, Acra captures data from the database encoded as binary values. Then it updates metadata in the first packet with the type’s identifiers to make them acceptable by the application, decodes data to the proper types (in our example it is TEXT), and sends it to the application.

With Acra type awareness, the application sends email as a string value and gets back the same type. Under the hood, the email is encrypted, encoded to a binary type, then decrypted, and decoded back to its original type.

But it all happens transparently for the application and the database.

Cossack Labs scheme application sends data to the database, Acra encrypts it, by respects data types and uses 'bytea' with the database, but 'text' with the application. Application receives decrypted and correctly decoded data.

Application sends data to the database, Acra encrypts it, by respects data types and uses ‘bytea’ with the database, but ‘text’ with the application. Application receives decrypted and correctly decoded data.


Example: application uses extended binary query protocol #

The steps described above are true for a simple query protocol with text encoding.

But most modern databases use extended query protocols in binary format to improve performance. The binary protocol is usually several steps longer than the text protocol.

Let’s see the example of an application using extended binary query protocol to talk to the database.

  1. The application gets an email from the UI and converts it to the binary type.

  2. Then the application passes it to the database driver in the same way as described above. But now, the driver doesn’t escape values and puts them into the query as literals.

  3. The database driver sends this query in several steps. First, it sends a query with the types of placeholders used in the query: Parse[driver] -> ParseComplete[db]. Here this email will be sent as BYTEA type.

After that, the driver usually asks the database to return the types for the placeholders and types for data in the final response to the query.

The database responds with two packets: Describe[driver] -> ParameterDescription[db] -> RowDescription[db].

Now, the driver should send the data to use it instead of the placeholders encoded into the types declared before: Bind[driver] -> BindComplete[db].

Then, the driver asks to describe the result’s types as they can be changed according to passed parameters. For example, the result type can be changed if the application uses CASE statements in the SELECT.

The database responds with the types as described before: Describe[driver] -> RowDescription[db].

Finally, the application sends a command to execute a prepared query with the passed parameters and returns the result rows: Execute[driver] -> DataRows[db].

  1. And at the end, the application receives the data as described in the previous case of a simple flow and converts email to the string type.

How Acra injects type metadata into extended binary query protocol #

So, this flow is more complicated than the first one. What should Acra do to hide the encoding / decoding steps and allow the application to work with email as string, while the database stores it as BYTEA?

Cossack Labs scheme application sends data to the database using extended query protocol, Acra encrypts it, by respects data types and uses 'bytea' with the database, but 'text' with the application. Application receives decrypted and correctly decoded data.

Application sends data to the database using extended query protocol, Acra encrypts it, by respects data types and uses ‘bytea’ with the database, but ‘text’ with the application. Application receives decrypted and correctly decoded data.

Acra adds type awareness carefully handling the next metadata:

  1. Captures the Parse[driver] packet and tamper the metadata about the placeholder’s types if any.

The application wants to send the string types, the database wants to receive the BYTEA types as they were stored. So, Acra captures the metadata of the packet, looks at the encryptor_config, and finds columns that should be encrypted, then Acra replaces the types for all the related placeholders and puts a BYTEA type identifier instead.

  1. Capture the ParameterDescription[db] and RowDescription[db] packets.

With the ParameterDescription[db] packet, the database says that it expects to receive a BYTEA type for the encrypted columns. Acra changes these identifiers to the proper application’s types described in encryptor_config. In our case, it is changed to TEXT, and the application does not have any troubles.

Note, that different database drivers behave differently: some drivers don’t ask to describe these types or just ignore them. But strict drivers request and verify that in the application (f. e. in the ORM’s models) the types correspond to the database’s types and raise an exception if they don’t.

With the RowDescription[db] packet, the database responds with the result’s types. In our case, it returns id and email. This packet describes that id has an INTEGER’s type identifier and email has a BYTEA type. So, Acra captures, recognizes that email is an encrypted column, and replaces the result’s type to the application expected type.

  1. In the end, Acra decrypts the data in the DataRows and encodes it into the correct format.

The database driver can ask the database to encode the data into the text or binary format. It usually applies to integers, as the strings are formatted equally in both formats.

Integers are encoded into 8/16/32/64 bit values in a binary format (for example, 0x11223344) or into decimal numbers in a text representation (for example, 123). After the decryption, Acra encodes these values into the correct format.


We work with companies on demanding markets.
Read how we use Acra to protect data in critical infrastructure


Summary #

Encryption is complicated, transparent data encryption is even more complicated.

With the latest 0.93.0 release, Acra introduced a “type awareness” feature. Type awareness makes integrating field level encryption transparent for the database or the application. The users don’t need to change the application code or database schemas—only integrate Acra and configure which fields to encrypt and in what format.

Under the hood, Acra carefully parses database wire protocols, injects required type values, encrypts, encodes, decrypts, and decodes data, but it’s all hidden from developers eyes.

Our goal is to make strong data security accessible for developers without a cryptographic background. So developers can build their products and be confident that data security has been taken care of.

Acra Community Edition is free to use. Try our new engineering examples on GitHub to see how type awareness works in various examples.

Find out the description of new configuration settings related to type awareness in Acra documentation.

Switch to Acra Enterprise Edition, when your solution grows and requires elevated security guarantees.

Contact us

Get whitepaper

Apply for the position

Our team will review your resume and provide feedback
within 5 business days

Thank you!
We’ve received your request and will respond soon.
Your resume has been sent!
Our team will review your resume and provide feedback
within 5 business days