Secure Search Over Encrypted Data

Intro

More and more data is outsourced to remote (cloud) storage providers fuelled by “software as a service” trends in enterprise computing. Data owners want to be certain that their data is safe against thefts by outsiders, internal threats, and untrusted service providers alike. To safeguard the data, encryption is used.

Modern encryption is much more than enabling “data at rest encryption” checkbox on AWS S3 or using TLS connection between database and backend. For example, US Department of Defense shared their Ten Commandments of Software, where point 9 is “Data should always be encrypted unless it is part of an active computation.”, which means that the default state for valuable data should be “encrypted”, protected from adversaries and insiders alike.

However, encryption makes it harder to search over the data once it is encrypted, which is both what encryption is needed for and a huge operational downside. In an ideal world, it would be convenient to keep the data encrypted and still be able to securely search over it, without constraining the application architecture and today it is possible. In this article, we explore the methods of secure search over encrypted data, including our technology called Acra Searchable Encryption (Acra SE) that is available in Acra database security suite.

Need searchable encryption in your SQL database?

Searchable encryption schemes

When we discuss searchable encryption, we assume that data is stored encrypted in a database. There’s some application backend that interacts with a database, reads and writes data. There’s some client application (web or mobile app or another server) that uses the data. The goal of searchable encryption is to allow the client application to find requested data without decrypting it neither on the database side nor on the application backend side.

Naive/obvious/basic methods for searching over encrypted data

1. Decrypt, search, encrypt.

The most basic approach to searching through encrypted data is to download the data to the client's computer, decrypt it locally, and then search for the desired results in the plaintext data. For many reasons, this approach is neither practical nor scalable due to the possible significant data size.

Plus the key management problems – how to handle key distribution properly when the client needs the decryption key and can consequently leak it and the access credentials to encrypted storage.

Also, the emergence of a plaintext copy of the encrypted data in the “decrypt+search” scenario renders the encryption useless as the attacker can find a way to intercept and steal the plaintext copy.

2. Decrypt, run query, send results.

Another method lets the application backend (server side) request encrypted data from the database, decrypt it, run a search query on the server side and send only the results back to the user. Similarly to the scheme above, this introduces key management problems – as the server now has both data and decryption key.

In this case, security is compromised as all records protected by encryption will be revealed to every service and attacker that has access to the server environment.

Cryptographic schemes for secure search

There are two ways to implement searchable encryption (SE). It can be either done through encrypting data in a way that enables execution of queries’ directly on the ciphertext or by introducing a searchable index stored together with the encrypted data.

However, it is desirable to support the fullest possible search functionality on the application backend side, without decrypting the data, with the smallest possible loss of data confidentiality. In a properly implemented secure search scheme, the database server and application backend (server-side) never ever sees the decrypted data.

A typical workflow for a searchable encryption scheme would look something like this:

Abstract searchable encryption scheme.

The first SE scheme (and the first formal scientific definition of searchable encryption scheme) was proposed by Song, Wagner, and Perrig in 2000. The problem of search in encrypted data has since received much interest from governments, academic researchers, and industrial sector (Google’s Encrypted BigQuery, Microsoft’s SQL Server 2016, Azure SQL Database, Oracle Database Cloud, Amazon RDS, etc.).

Numerous approaches to searchable encryption. The first two – Searchable Symmetric Encryption (aka SSE, Symmetric Searchable Encryption) and Public Key Encryption with Keyword Search (PEKS, marked in violet) – are quite popular.

There is no single perfect approach to implementing SE as there will always be some tradeoff, be it in performance, functionality, security, or usability.

The main cryptographic techniques for constructing searchable encryption schemes are symmetric searchable encryption (SSE) and public key encryption with keyword search (PEKS). SSE-based schemes use symmetric key primitives and allow the user to create ciphertexts and perform the secure search. In PEKS schemes, the public key is used for creating (writing) ciphertexts and a private key is used for searching (reading). Thus, such schemes allow multi-user writing, but only one user (who is holding a private key) is able to read the data written by other users. Note that SSE schemes are usually more efficient than PEKS schemes due to the use of symmetric primitives.

It should also be noted that there exist more obscure techniques also related to searchable encryption, i.e. predicate encryption (PE), inner product encryption (IPE), or hidden vector encryption (HVE). These techniques focus on access control rather than search, so we’re not considering them in this article.

Another approach is called homomorphic encryption (HE), which – in general – allows performing computations on ciphertext that generate an encrypted result, which, when decrypted, matches what you would expect from performing similar computation on the plaintext. HE-based search methods exist both in theory and Proof-Of-Concept implementations, however, for most practical use-cases, homomorphic encryption is still sufficiently fragile and slow.

Attacks on searchable encryption schemes

The most dangerous types of attacks on searchable encryption schemes are leakage inference attacks. They are dangerous because they don’t outright “break” cryptography (which would be very hard) but employ statistical analysis of the queries’ and indexes’ metadata, which is always leakage-prone to a certain degree.

Let’s consider some low-entropy data like the binary gender field (“male”/”female”). The attacker doesn’t need to break the cryptographic algorithm used for encryption of records. It would be enough to observe the queries addressed to the database and the responses they get to increase the probability of simply guessing the plaintext content.

The most powerful cryptanalytic attacks on searchable encryption are:

  • Count Attack – 40% keyword recovery rate with 80% of dataset known to attacker. Works well if the keyword universe size does not exceed 5000.
  • Hierarchical-Search Attack – extension of the Count Attack, 40% keyword recovery rate under a condition that (at least) 40% of the data leaks. Attacker is able to inject a set of constructed records.

As you can see (by the conditions of mentioned attacks), there are two most important security parameters that should be controlled:

  • Encrypted dataset and metadata that can be potentially leaked to an attacker and
  • Keyword universe size of this dataset. The attempts to limit the leakage usually cause performance degradation or decline in query expressiveness.

No silver bullet

As it often happens in the world of security (and encryption in particular), there is no one-size-fits-all solution when it comes to secure search in encrypted data because different demands exist towards the performance efficiency and speed of work of various secure search schemes.

SE needs to balance security, performance, and usability, which gets harder and harder as databases get more specialised (think SQL, NoSQL, and NewSQL) trying to keep up with the demands of different users.

The searchable encryption tradeoff scheme below illustrates the fact that if you’re trying to improve a method in some aspects, you inevitably face the need to make tradeoffs in other aspects.

Most of the progress in the area of SE has been made in the area of keyword search on encrypted documents. While this has many practical applications (email, desktop search engines, cloud document storage, etc.), much of the data produced and consumed is stored and processed in relational databases queried using SQL.

Existing practical searchable encryption solutions for SQL databases

SQL databases enable users to store, query, and update their data in a user-friendly manner. A cryptographic data protection mechanism for searching over encrypted data stored in a SQL database should allow the database server to efficiently process the search queries without having access to the plaintext data. However, creating a method satisfying such constraint for SQL databases is not straightforward.

The existing secure search solutions for SQL databases use an index of keywords.

In relational databases, the records’ attributes are often interconnected through some relations, which provide the structure to the data. Data is arranged into tables and records. Queries are based on conditions that are applied to one or more attributes of a record. Keywords are a large plain list that is only connected with separate records and which provides no information on the ways the data in one record is related to other records. As a result, to come up with a keyword search scheme that could match SQL in its expressiveness, keywords would need to preserve the notion of an attribute, meaning that keywords would have to be added to the stored data.

The first searchable encrypted database solution was based on quantization (quite an academic and advanced approach we’re not going into here). CryptDB was the first non-quantization-based solution capable of handling a large subset of SQL. This became possible as CryptDB relies on property-preserving encryption like deterministic and order-preserving encryption, applied to a column of a SQL table. It is extremely fast, efficient, and supports almost all query types. CryptDB uses what’s called “onion-like cryptography”. The design of CryptDB influenced the Cipherbase system and the SEEED system.

“The model CryptDB for efficient storage of encrypted data using the so-called "onion-like": each value in the table sequentially encrypted by different encryption algorithms in order of increasing secrecy”. Source: Cryptowiki.net “Processing Queries on an Encrypted Database”

One of the main problems with CryptDB is that it’s also extremely good at allowing the person configuring it to shoot themselves in the foot (and there are more problems, all of which prevent CryptDB from becoming a de-facto standard with tons of wrappers for it). Making mistakes with CryptDB is easy and the results of such mistakes can be catastrophic.

For the want of a real-life example, if the level of security with CryptDB is set to maximum, very few queries will be let through. The administrator, in this case, will be flooded by users’ complaints about how “nothing’s working!”. As a result, the (considerably irritated) administrator might carelessly disable some crucial encryption layers and then many more queries will be let through, but the security level will be very low. It takes a very experienced system administrator with strong knowledge of cryptography, which is a rare combination of skills, to configure CryptDB in a truly secure way.

Mylar exists in a MongoDB world. It's a good lightweight solution that’s created to work with Meteor JavaScript framework.

Our needs are SQL-specific, which is why we turned to CipherSweet as it is both safe “out of the box” and SQL-oriented.

Secure Search in Acra security suite

Observing the need for a secure searchable encryption tool, we’ve created our own searchable encryption system in Acra security suite.

Acra Searchable Encryption (Acra SE) is a solution for secure search in an encrypted SQL database based on the blind indexing approach, developing and evolving the original idea of the CipherSweet project — storing keyed hashes of keywords on the database side.

It is proxy-based, with the capability of filtering the non-relevant query results (which is a limitation in the original CipherSweet) on the AcraServer’s side. Acra uses strong encryption AES-GCM-256, allows to encrypt specific database cells and then search through them.

Proxy between application and untrusted storage provider.

Unlike the existing solutions, our system provides adjustable performance-leakage ratio for protected search, strict separation of duties (based on proxying database requests — see the illustration above) which guarantees absence of cryptographic key leakage from application, secure storage and management of cryptographic keys, and a set of additional security features that better correspond to the real-world threats.

A detailed scientific paper on secure search in Acra is available in IACR archive of publications.

Summary — stay tuned

There are different ways to search through encrypted data, but not all of them are production-ready.

This article is just scratching the surface in the area of secure search and introduces the concept and our take on it – Acra Searchable Encryption – a solution for secure search in an encrypted data stored in SQL database. Acra SE uses strong modern cryptography and is based on the blind indexing approach.

You can find more info on secure search over encrypted data in the slides for the talk “Search over encrypted records: from academic dreams to production-ready tool” by one of our security engineers behind search in Acra.

Overview of existing searchable encryption approaches, as well as cryptographic scheme of Acra SE and its performance valuations are available in scientific paper.

Acra Searchable Encryption is part of Acra Enterprise, and in the following articles, we'll tell you more details how search works, stay tuned.

Copyright © 2014-2019 Cossack Labs Limited
Cossack Labs is a privately-held British company with a team of data security experts based in Kyiv, Ukraine.