This article was initially published in November 2018, then reviewed and updated with information regarding CCPA in April 2020.
We frequently see how regulatory requirements are mapped onto real-world demands during the integration of our tools and security consulting projects. Producing a coherent vision of which data assets need to be protected is the first step in designing encryption solutions – in the end, encryption comes at a cost and it makes sense to know where this cost is justified.
Today’s increasingly strict privacy regulations define sensitive data in different ways and call for different approaches. While encryption has gone from a near-esoteric practice to a more or less typical developer task, some extremely important but rarely asked questions remain unanswered.
Do you encrypt data properly, or just turn on the default "database encryption" option? What needs to be encrypted, and what is sensitive? This article explains how we approach this task and provides a cheatsheet answering these questions for a number of popular privacy regulations (GDPR, CCPA, HIPAA, FISMA, etc.)*.
Why not just encrypt everything?
Technically speaking, encrypting all the data at rest and in transit between the nodes of your system is the “encrypting everything” strategy, and it’s fairly easy. Just turn on the native configuration flags in your database, enable TLS, and you’re covered against some risks. But if the keys are easily available nearby, the security benefits of such encryption are limited.
Sometimes encrypting everything is far from being a good idea. When you encrypt and store data in one place, diligently store the keys elsewhere, and run decryption in a controlled environment – the data becomes protected, but less usable, less available, and harder to manage. So how do we decide where to ditch the convenience and what to encrypt in the name of security?
Compliances vs real-world risks
In the past, if you wanted to do business in finance and get people to trust you, you needed to have certain certifications like PCI DSS. You wanted to work in healthcare? Don’t return without a HIPAA compliance certificate. Education? FERPA needed. GICSP and ISOC for industrial business, NISP, and so on... The list of industry-specific certifications can go on and on.
The need to have so many different specifications for each separate industry is explained by the fact that each regulation dictates its own definition of the sensitive data that needs to be protected, and each industry has its own certification/standardisation bodies with different opinions. With the emergence of GDPR and CCPA, sensitive data and personally identifiable information (PII) became closely intertwined from the viewpoint of personal rights.
Aside from the “cost of doing business”, compliances (or rather the failure to comply with their regulations) pose a real-world risk: the fines are large and can easily affect your share price or get you or your product kicked out of the market.
So here we offer a cheatsheet** that can help your decision-making about encryption in your products when dealing with some of the most popular data privacy regulations today. Let’s go over them and see what you need to protect.
Cheatsheet: Sensitive data/PII as defined by data regulations
Personally identifiable information (PII) is information that can be used to identify or locate an individual. Personal and sensitive personal information can be loosely defined as information whose disclosure, through a data breach or otherwise and without the individual’s explicit consent, can lead to consequences for that individual ranging from mildly inconvenient to life-threatening.
When applied to the data you consider protecting, asking yourself “is it this kind of information” should be a good general guideline, and it’s always better to err on the side of providing excessive protection for the information as opposed to providing insufficient protection.
The following cheatsheet is our best effort in compiling a list of PII guidelines and variants as encountered in the wild in the actual data protection regulations.
Sensitive data definition
What is considered to be PII
There is no single unified data protection law in the USA
The exact definition of PII can vary depending on:
● The state where the PII is identified.
For instance, the California Consumer Privacy Act defines PII as any “information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household”.
The GSA definition of PII is:
The PII in the USA federal legislation is subdivided into “always sensitive” and “sensitive in certain contexts” types.
Generally considered to be PII in most states:
● Social security number,
● Credit card number,
● Financial/Bank account number,
● First name,
● Last name,
● Zip code,
● Email address (especially as login),
● Date of birth,
● Password or passphrase,
● Military ID,
● Driver’s license number,
● Vehicle license number,
● Phone and Fax numbers.
Sensitive PII even when not linked with additional PII or contextual information:
● Complete (9-digit) SSN,
● Alien Registration number (A-number),
● Driver’s license or state identification number,
● Passport number,
● Biometric identifiers (e.g. fingerprint, iris scan, voice print).
Sensitive PII when linked with the person’s name or other unique identifiers (e.g. phone number):
● Citizenship or immigration status,
● Criminal history,
● Medical information,
● Full date of birth,
● Authentication information such as mother’s maiden name or passwords,
● Portions of SSNs such as the last four digits,
● Financial account numbers,
● Other data created by DHS to identify or authenticate an individual’s identity, such as a fingerprint identification number (FIN) or Student and Exchange Visitor Information System (SEVIS) identification number.
PII sensitive in a certain context:
PII that might not include the data elements described above may still be sensitive and require special handling if those could cause:
● Substantial harm,
● Unfairness to an individual.
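The two tiers above ("always sensitive" and "sensitive when linked") can be sketched as a simple lookup. This is a minimal illustration rather than a compliance tool: the field names are hypothetical and need to be mapped onto your own schema, and the sketch only looks at fields within a single record, so it will not notice linkage across tables.

```python
# Hypothetical field names; adapt them to your own data model.
STANDALONE_SENSITIVE = {
    "ssn", "alien_registration_number", "drivers_license_number",
    "passport_number", "biometric_identifier",
}
SENSITIVE_WHEN_LINKED = {
    "immigration_status", "criminal_history", "medical_information",
    "date_of_birth", "mothers_maiden_name", "password", "ssn_last4",
    "financial_account_number",
}
LINKING_IDENTIFIERS = {"full_name", "phone_number"}


def sensitive_fields(record_fields):
    """Return the subset of field names to treat as sensitive PII."""
    fields = set(record_fields)
    # Tier 1: sensitive on their own, regardless of context.
    result = fields & STANDALONE_SENSITIVE
    # Tier 2: sensitive only if the same record also carries a name or
    # another unique identifier (a standalone identifier counts too).
    if fields & (LINKING_IDENTIFIERS | STANDALONE_SENSITIVE):
        result |= fields & SENSITIVE_WHEN_LINKED
    return result
```

For example, a record holding only a date of birth and a ZIP code yields an empty set, while adding a full name to it makes the date of birth sensitive. The third tier (PII sensitive in a certain context) depends on potential harm and cannot be decided from field names alone.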
Federal Agencies and The Federal Information Security Modernization Act (FISMA 2014)
FISMA is one of the most important cybersecurity laws affecting U.S. federal agencies. It applies to all federal information and information systems (including data, information systems, and information technology) and to all forms of information.
According to FISMA, PII is information that allows data subjects to be identified and/or contacted without their consent, and whose loss may lead to identity theft or other fraudulent use, resulting in substantial harm or embarrassment, e.g.:
● First/Last names,
● Social Security numbers,
● Biometric records,
● Other personal information linked or linkable to an individual.
Medical Information for Covered Entities: Health Insurance Portability and Accountability Act (HIPAA) / The Health Information Technology for Economic and Clinical Health Act (HITECH, a part of the ARRA)
HIPAA uses the term Protected Health Information (PHI) to refer to protected data, which overlaps a lot with PII defined in the other US legislative acts.
While PII is the information that can identify a person, PHI is subject to strict confidentiality and includes anything used in a medical context that can identify patients.
The disclosure requirements are much stricter than those that apply to most other industries in the United States. Protecting PHI is always legally required, but protecting PII is only mandated in some cases.
Certain information like full name, date of birth, address and biometric data is always considered to be PII.
Other data, like first name, first initial and last name or even height or weight may only count as PII in certain circumstances, or when combined with other information.
● First/Last name,
● Address (incl. ZIP code and country),
● Credit card number,
● Patient payment information,
● Health plan beneficiary numbers,
● Driver’s license,
● Medical records,
● Medical record numbers,
● Patient diagnostic information (past, present, future physical or mental health),
● Patient treatment information,
● Biometric identifiers (incl. fingerprints),
● Full facial photographs and images,
● Device identifiers and serial numbers,
● IP addresses and web URLs,
● Any other individual’s identifiable information.
EU General Data Protection Regulation (GDPR)
GDPR identifies 2 types of personal data:
● Personal data,
● Sensitive personal data.
GDPR – Personal data
“Personal data” is “any information relating to an identified or identifiable natural person (data subject); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”.
Any information relating to an individual, ranging from someone’s name to their physical appearance, e.g.:
● First and last name, patronymic,
● Email, phone, and fax numbers,
● Date and place of birth,
● General medical information,
● Physical address, ZIP code,
● Social security number,
● Driver/vehicle license numbers,
● Passport number,
● Military ID,
● Credit card number,
● Bank account number,
● Automated personal data which can be used for identification,
● Password or passphrase, etc.
GDPR – Sensitive personal data, expanded
Some of the sensitive data types defined in GDPR are new and extremely modern in comparison to other data privacy regulations, and we’d like to expand the list of the sensitive data with more detailed examples when it comes to technology-centric and digital-native data types and mediums.
(Some) Sensitive personal data expanded:
● Genetic data is personal data relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person.
● Biometric data is personal data resulting from specific technical processing relating to the physical, physiological, or behavioural characteristics of a natural person, e.g. facial images, voice prints, iris scans, or dactyloscopic data, including fingerprint login info.
● Data concerning health is personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about their health status.
● Automated personal data, unique online identifiers (e.g. “cookies”, IP addresses, other info gathered by various user behaviour trackers), and location data that can be used to identify or locate the natural person.
● Pseudonymized data – once the pseudonymisation process has taken place (the “personal” is “decoupled” from the personal data), no third party data processors/collectors shall be able to trace the data back to identify or locate the natural person the data belongs to.
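One common way to pseudonymize an identifier is to replace it with a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; it is one possible technique, not a method prescribed by GDPR. The function name and the normalisation step (trimming and lower-casing) are our assumptions, and the key is generated in place only for illustration: in practice it would live in a key store kept apart from the pseudonymized data.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed token (HMAC-SHA256, hex-encoded)."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


# Hypothetical key handling: generated here only to keep the sketch runnable.
key = secrets.token_bytes(32)

token_a = pseudonymize("Alice@example.com", key)
token_b = pseudonymize("alice@example.com ", key)
# The same person maps to the same token, so records can still be joined,
# but the token cannot be recomputed without the key.
```

A keyed hash is used instead of a plain hash because identifiers such as emails or phone numbers are easy to enumerate, and an unkeyed digest could be reversed by simply hashing candidate values. Whoever holds the key can still re-link tokens to people, which is why the key has to be stored and access-controlled separately from the data.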
California Consumer Privacy Act (CCPA)
The CCPA defines PII as “information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household”.
According to the CCPA, PII falls into roughly 11 categories, the last of which is inferences:
● Identifiers (Name, alias, postal address, unique personal identifier, online identifier, Internet Protocol (IP) address, email address, account name, social security number, driver’s license number, passport number, or other similar identifiers)
● Customer records information
● Characteristics of protected classifications under California or federal law (race, religion, sexual orientation, gender identity, gender expression, age)
● Commercial information
● Biometric information
● Internet or other electronic network activity information (browsing history, search history, and information regarding a consumer’s interaction with an Internet website, application, or advertisement)
● Geolocation data
● Audio, electronic, visual, thermal, olfactory, or similar information
● Professional or employment-related information
● Education information
● Inferences (preferences, characteristics, psychological trends, predispositions, behavior, attitudes, intelligence, abilities, aptitudes)
Publicly available information (e.g. data contained in publicly available federal, state, or local government records) is an exception and is not considered to be PII.
Educational Information Covered by The Family Educational Rights and Privacy Act (FERPA)
FERPA follows the GSA/Federal definitions of the PII with some additions and exceptions.
For instance, schools may disclose, without consent, "directory" information about a student, but must first warn parents and allow them to make a request not to disclose the “directory” information on students.
Directory information (can be disclosed in some cases):
● Student's full name,
● Telephone number,
● Date and place of birth,
● Honours and awards,
● Dates of attendance.
Non-directory information (not to be disclosed without consent):
● Student ID number,
● Family member names,
● Mother’s maiden name,
● Student educational records,
● Immunization records,
● Health records,
● Individuals with Disabilities (IDEA) records.
Payment Card Industry Data Security Standard (PCI DSS)
The Payment Card Industry Data Security Standard (PCI DSS) requires that merchants protect sensitive cardholder information from loss and use good security practices to detect and protect against security breaches.
Sensitive data according to this regulation is the data used for authentication and authorization.
May be stored if protected in accordance with PCI DSS (in an encrypted form):
● Primary Account Number (PAN; must be masked whenever it is displayed).
Payment card data that can only be stored if required for business purposes and properly protected:
● Cardholder’s name,
● Expiration date,
● Service code.
Data on the magnetic stripe or chip (referred to as full track, track, track 1, track 2, or magnetic stripe data, CVV, CVV2, etc.) must never be stored, even in encrypted form.
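The two PCI DSS rules above (mask the PAN on display, never store track data or card verification codes) can be sketched as follows. The function and field names are hypothetical. The limits of six leading and four trailing digits reflect our reading of the PCI DSS masking requirement, and the 13 to 19 digit length check is a sanity-check assumption, so verify both against the version of the standard that applies to you.

```python
# Hypothetical field names for data that must never be persisted.
NEVER_STORE = {"track1", "track2", "full_track", "cvv", "cvv2"}


def mask_pan(pan: str, show_first: int = 0, show_last: int = 4) -> str:
    """Return a display-safe PAN with the middle digits replaced by '*'."""
    digits = [c for c in pan if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        raise ValueError("unexpected PAN length")
    if not (0 <= show_first <= 6 and 0 <= show_last <= 4):
        raise ValueError("too many digits requested for display")
    hidden = len(digits) - show_first - show_last
    head = "".join(digits[:show_first])
    tail = "".join(digits[len(digits) - show_last:])
    return head + "*" * hidden + tail


def strip_prohibited(payment: dict) -> dict:
    """Drop fields that must not be stored, even in encrypted form."""
    return {k: v for k, v in payment.items() if k not in NEVER_STORE}
```

With the defaults, `mask_pan("4111 1111 1111 1111")` keeps only the last four digits. Masking is a display concern only: the stored PAN still has to be protected separately, for instance by encryption.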
Financial Data: Federal Financial Institutions Examination Council (FFIEC) Compliance
Non-public Personal Information (NPI) must be protected by banks, credit unions, and other financial institutions. This information includes personally identifying financial information.
In addition to Personally Identifying Information, sensitive customer information must also be protected according to FFIEC.
All personally identifiable financial information, e.g.:
● Credit score,
● Collection history,
● Family members’ PII and NPI,
● Any list, description, or other grouping of consumers (and publicly available information pertaining to them) that is derived using any “personally identifiable financial information” that is not publicly available.
Sensitive customer information is the customer’s name or telephone number, in conjunction with the customer’s:
● Social security number,
● Driver’s license number,
● Account number,
● Credit or debit card number, or
● Personal identification number or password that would permit access to the customer’s account.
Any combination of components of customer information that would allow someone to log into or access the customer’s account is also considered to be sensitive customer information, e.g.:
● Username + password,
● Password + account number, etc.
Other sensitive information
The following need more extensive controls to guard against alteration (e.g. integrity checkers and cryptographic hashes):
● System documentation,
● Application source code,
● Production transaction data.
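As an illustration of guarding such assets against alteration with cryptographic hashes, the sketch below builds a SHA-256 manifest of named artifacts and later reports which ones changed. The function names and the in-memory dictionary of artifacts are hypothetical; in a real system the contents would be read from files or a repository.

```python
import hashlib


def build_manifest(artifacts: dict) -> dict:
    """Map each artifact name to the SHA-256 digest of its content (bytes)."""
    return {
        name: hashlib.sha256(content).hexdigest()
        for name, content in artifacts.items()
    }


def altered(artifacts: dict, manifest: dict) -> list:
    """Return the names that were changed, added, or removed since the manifest."""
    current = build_manifest(artifacts)
    names = manifest.keys() | current.keys()
    return sorted(n for n in names if manifest.get(n) != current.get(n))


baseline = build_manifest({"app.py": b"print(1)", "README": b"docs"})
changed = altered({"app.py": b"print(2)", "README": b"docs"}, baseline)
```

The manifest is only as trustworthy as its storage: if an attacker can rewrite both the artifact and its recorded digest, the check passes. Keeping the manifest in a separate system, or authenticating it with a signature or a keyed MAC, addresses that.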
Going Beyond The PII Regulations
It might look like a lot of work to try and comply with the outlined regulations, but those regulations exist not to add problems, but to prevent them and protect the citizens (and in some cases – governments from citizens, too). What about your business? What do the regulations protect you and your users from?
In the classic map of business risks, data risks apply to:
Regulatory risks: risk of a change in regulations and law that might affect an industry or a business, which is basically the subject of this article :)
Strategic risks: inability to execute the corporate strategy due to a breach of confidentiality, integrity or availability of sensitive commercial data.
Operational risks: breach of operational processes due to:
● Process abuse based on adversaries gaining knowledge of sensitive data, both inside and outside of the business,
● Process halt due to a breach of the availability or integrity of sensitive data.
Reputational risks: businesses rely on having customers. When a data breach incident takes place, relationships with customers suffer greatly.
These risks can be illustrated with plenty of practical threats:
Insider risk: Half of the typical security measures and precautions are aimed at providing protection from external adversaries. Yet the history of publicly known incidents shows that the major risks in industries such as finance, insurance, and governmental management are posed by insiders. So, no matter how good the firewall is – unless access is accountable and protected cryptographically, the risk of an inside job is still higher than that of other types of threats.
Competitor intelligence: Competition is always about asymmetry in knowledge and resources. Whether it’s about your processes and unique designs or about the client base (which coincidentally overlaps with GDPR scope) – leaking commercially sensitive data will give your competitors some advantage. The worst-case scenario for data leaking to your competitors is when your business operates on an assumption that the sensitive data is safe, while it’s not and the competitor is able to manipulate you. Suddenly, when you thought you were swimming in a blue ocean, it might turn red.
Adversarial attacks: Some of the leaked data might be used by active adversaries to exploit your systems. Is leaking a customer loan scoring strategy bad? It depends on the financial losses it entails. Being in the possession of a clear decision-making tree for a bank’s credit processes, adversaries will find it easier to execute fraudulent activities even without having insiders in the business infrastructure.
What else should you consider protecting, from this standpoint? Any data that is sensitive in relation to operational processes, business strategy, competitive advantages, and reputation.
Technical constraints, right tools, and correct decisions
So why not just resort to encrypting everything, after all? In some businesses, this is still the best answer for safeguarding against all the possible risks. Yet, if the applications you run require access to customer data, and, moreover, your own internal sensitive data, this might not be the best solution.
It’s still a problem for your SQL database to search over protected records. Managing the keys is another challenge. Encryption adds overhead. What to do?
Choosing the right tools
Pick the right tools that best suit your technological approach, service model, and business style: End-to-end encryption (aka Zero knowledge architecture) or Centralized trust.
Next, define what and how you’re going to encrypt.
Making the right decisions
Define the risks:
Which regulation do you fall under?
Which processes does your regulation demand? Not everything can be “just encrypted” (e.g. doing so can conflict with some of the GDPR-defined rights).
What are the other 3 business risks in regard to the data you store?
Classify the data:
Enumerate all the potentially sensitive data types in your system in regard to the 4 aforementioned risks, including the definitions from all the regulations that apply to you.
Choose which data can be safely encrypted and never used programmatically without a key, encrypt it, and distribute the keys appropriately.
Manage the keys really well (works best if the key management model is as close to the real-world use-cases as possible).
Pseudonymize the rest of the PII data.
Analyze the risks of leaking large amounts of data that can be correlated with each other. Prevent this via decorrelation.
Detect intrusions by watching the access, poisoning the data, etc.
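The last step, detecting intrusions by poisoning the data, can be sketched as planting fake records that no legitimate process should ever read and raising an alert when one is touched. Everything here is hypothetical and simplified for illustration: the class name, the `cust-` identifier format, and the in-memory alert list stand in for a real database hook and alerting pipeline.

```python
import secrets


class PoisonRecordMonitor:
    """Tracks planted fake records and flags any access to them."""

    def __init__(self):
        self._poison_ids = set()
        self.alerts = []

    def plant(self) -> str:
        """Create an ID for a fake record to be stored among the real ones."""
        record_id = "cust-" + secrets.token_hex(8)
        self._poison_ids.add(record_id)
        return record_id

    def observe_access(self, record_id: str, accessor: str) -> bool:
        """Call on every read; returns True if a poison record was touched."""
        if record_id in self._poison_ids:
            self.alerts.append((record_id, accessor))
            return True
        return False


monitor = PoisonRecordMonitor()
poison_id = monitor.plant()
monitor.observe_access("cust-real-1", "billing-service")  # normal read, no alert
monitor.observe_access(poison_id, "report-job")           # recorded as an alert
```

The approach works only if poison records are indistinguishable from real ones to an attacker and if legitimate code paths are known not to read them, for instance because they are excluded from every genuine customer list.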
Encryption itself is just a risk-narrowing technique. To use it well, you need to understand the risk and to narrow the attack surface in a controlled manner.
Different regulations require different kinds of PII to be encrypted. In any case, you’ll be better off if you start by encrypting the data defined as risky by the data privacy regulations (to lift that external pressure off your operations) and then extend data security to other types of risks.
Even in cheatsheet form, privacy regulation norms are nowhere near brief, even when they are singled out. Trying to reach compliance with several security regulations at once can be tougher still. And you don’t really need to become a security expert when you’re a much better professional at what you do. You can and should ask for external security assistance. We have vast experience in taming privacy regulations for business and we can help you.
* This is our subjective take on the issue as we see it, valid at the moment of publication in November 2018 and of the update in April 2020.
** Cossack Labs Limited will not be responsible for any loss or damage whatsoever resulting from following the instructions or links provided in this article. The information provided here is not confirmed by the officials responsible for observing the mentioned legislative regulations.