Auditable Macros in C Code

Intro

Like death and taxes, one thing that you can be sure of is that using C macros in a modern software project will cause a debate. While for some macros remain a convenient and efficient way of achieving particular programming goals, for others they are opaque, introduce the unnecessary risk of coding errors, and reduce readability.

The criticism of macros is particularly acute in the wider security community. Among Cossack Labs’ engineers and the core Themis crypto library contributors there are people who previously worked on auditing cryptographic implementations of critical code. Their typical knee-jerk reaction to macros was always “kill it with fire and never use it again”. Taking no sides, we would like to assess both pros and cons of using such dangerous things as macros in security code (as we faced the issue when developing Themis) and suggest some techniques for lowering the accompanying risks.

We’ll also discuss a custom “for-audit” build target for Themis designed specifically to generate source code that exposes the macros to inspection because we appreciate the need for security software to be subject to detailed source code scrutiny.


Problem

Macros are a dangerous thing. Used carelessly, they will lead to potentially fatal problems. But like medicine — when applied with knowledge, in moderate quantities and in a controlled manner, they only help. So what’s so special about macros that the general approach is to treat them like a last-resort tool that should be replaced with functions wherever possible?

The problem with macros is that few people understand how to use them properly. Even fewer people are willing to invest time into learning how to implement the “evil” macros for the greater good of a specific project. Since macros rely on substitution and expansion, and cannot be checked by a compiler (which only checks the expanded expressions that use macros), they are prone to creating all sorts of unfortunate problems.
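To make the substitution problem concrete, here is a minimal sketch of two classic pitfalls: operator precedence and double evaluation. All names in it (DEMO_SQUARE_BAD, DEMO_SQUARE_OK, demo_next_value and the rest) are made up for this illustration and do not come from Themis.

```c
/* Hypothetical macros, named for this illustration only. */
#define DEMO_SQUARE_BAD(x) x * x          /* argument is not parenthesised */
#define DEMO_SQUARE_OK(x)  ((x) * (x))    /* parenthesised, but x is still evaluated twice */

/* Counts its own calls so that double evaluation becomes visible. */
int demo_calls = 0;
int demo_next_value(void) { return ++demo_calls; }

/* The function equivalent evaluates its argument exactly once. */
int demo_square_fn(int x) { return x * x; }

/* Expands to 1 + 2 * 1 + 2, which is 5 rather than 9. */
int demo_bad_precedence(void) { return DEMO_SQUARE_BAD(1 + 2); }

/* Expands to ((1 + 2) * (1 + 2)), which is 9. */
int demo_good_precedence(void) { return DEMO_SQUARE_OK(1 + 2); }

/* Returns how many times demo_next_value() ran inside one macro use. */
int demo_evaluations_in_macro(void)
{
    demo_calls = 0;
    (void)DEMO_SQUARE_OK(demo_next_value());
    return demo_calls;
}

/* Returns how many times demo_next_value() ran for one function call. */
int demo_evaluations_in_function(void)
{
    demo_calls = 0;
    (void)demo_square_fn(demo_next_value());
    return demo_calls;
}
```

The compiler accepts all of this, because it only ever sees the expanded text. The macro argument with a side effect runs twice, while the function argument runs once.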

As software developers of cryptographic tools, we are quite used to handling dangerous things :).

Our weapon of choice

We created Themis to be a highly parameterised library, which abstracts dependencies, allows for massive refactoring of wrappers around different layers of code, and is cross-platform and architecture independent. Building it would be nearly impossible without macros.

In doing so we were (and still are) using pure C as the programming language with the widest portability and the best performance characteristics for the tasks we have in mind*.

To ensure that the security of the resulting code is consistent and proper, a thorough audit is necessary. Auditing code full of macros demanded customised tools that expand the macros and make them checkable in a “for audit” build target of the Themis build system. By “for audit” we mean that the build system generates code (instead of binaries) intended for manual inspection by human developers. The resulting output is not intended for actual compilation (if attempted, you’d get a ton of compiler warnings from the duplicate #include and local #define statements).

Let’s get more specific.

Why use macros in the first place

Back in the day, macros were used for protecting headers from being included twice by the preprocessor. Nowadays, macros (when in skilled hands) allow us to use meta-programming and help create more concise code for error handling, parameter validation, and debugging.

Meta-programming

Macros allow code generation during compilation (meta-programming). Unfortunately, the chance to apply this in C arises quite rarely and the resulting code indeed looks a bit weird, so we try not to abuse this approach. However, sometimes it considerably shortens the code by moving the copying of code from the writing stage to the compilation stage. An example of such use of macros can be found in our soter/soter_sign.c.

One of the primary objectives we kept in mind when developing Themis is the ability to add or change not only the cryptographic primitives used by higher level crypto systems but the underlying “provider” libraries that deliver those primitives, too.

For example, here a macro generates the structure for algorithm selection based on the pre-set list of cryptographic algorithms:

...
switch(algId){
        case SOTER_SIGN_rsa_pss_pkcs8:
            //do something for rsa_pss_pkcs8
            break;
        case SOTER_SIGN_ecdsa_none_pkcs8:
            //do something for ecdsa_none_pkcs8
            break;
}
...

In fact, this fragment of code could have been written without macros as there are only two cases, but in soter/soter_sign.c there are many such switches, and if we suddenly decide to add or delete an algorithm (e.g. RSA with no padding: rsa_none_pkcs8), changing only the first “define” will be enough:

...
#define SOTER_SIGN_ALGS \
    SOTER_SIGN_ALG(rsa,pss,pkcs8)        \
    SOTER_SIGN_ALG(ecdsa,none,pkcs8)    \
    SOTER_SIGN_ALG(rsa,none,pkcs8)
...

During the compilation, all selections will be expanded automatically. Without the macros, it would have taken even more time and — quite possibly — introduced mistakes.
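The technique behind this is often called an “X-macro”. The following is a minimal sketch of how one list can drive several expansions. It is not the actual code from soter/soter_sign.c: the list mirrors the article’s example, while demo_sign_alg_t, demo_sign_alg_name, demo_sign_alg_from_name and the SOTER_SIGN_ALG_COUNT sentinel are names assumed for illustration.

```c
#include <string.h>

/* The algorithm list; this is the only place to edit when an algorithm is added. */
#define SOTER_SIGN_ALGS \
    SOTER_SIGN_ALG(rsa, pss, pkcs8) \
    SOTER_SIGN_ALG(ecdsa, none, pkcs8)

/* Expansion 1: one enum constant per list entry. */
#define SOTER_SIGN_ALG(alg, padding, key_fmt) SOTER_SIGN_##alg##_##padding##_##key_fmt,
typedef enum { SOTER_SIGN_ALGS SOTER_SIGN_ALG_COUNT } demo_sign_alg_t;
#undef SOTER_SIGN_ALG

/* Expansion 2: one switch case per list entry, returning the algorithm name. */
const char *demo_sign_alg_name(demo_sign_alg_t alg_id)
{
    switch (alg_id) {
#define SOTER_SIGN_ALG(alg, padding, key_fmt) \
    case SOTER_SIGN_##alg##_##padding##_##key_fmt: \
        return #alg "_" #padding "_" #key_fmt;
    SOTER_SIGN_ALGS
#undef SOTER_SIGN_ALG
    default:
        return "unknown";
    }
}

/* Expansion 3: reverse lookup from a name to its identifier. */
demo_sign_alg_t demo_sign_alg_from_name(const char *name)
{
#define SOTER_SIGN_ALG(alg, padding, key_fmt) \
    if (strcmp(name, #alg "_" #padding "_" #key_fmt) == 0) \
        return SOTER_SIGN_##alg##_##padding##_##key_fmt;
    SOTER_SIGN_ALGS
#undef SOTER_SIGN_ALG
    return SOTER_SIGN_ALG_COUNT; /* sentinel: name not found */
}
```

Adding SOTER_SIGN_ALG(rsa, none, pkcs8) to the list would extend the enum, the switch and the lookup at once, without touching any of the three expansions.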


The main problem with macros in C is that they do not check the parameter type, so if anything ill-fitting is introduced at the compilation stage, the macros can process it, and this will lead to extremely unpleasant results**.
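As a small illustration of the missing type check, consider a generic “maximum” macro fed with a signed and an unsigned value. This is a sketch with hypothetical names (DEMO_MAX, demo_max_mixed, demo_max_fn), not code from Themis.

```c
#include <limits.h>

/* Hypothetical macro: accepts arguments of any type without complaint. */
#define DEMO_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Mixing signed and unsigned operands: -1 is converted to unsigned
   before the comparison, so it "wins" and comes back as UINT_MAX. */
unsigned int demo_max_mixed(void)
{
    int negative = -1;
    unsigned int small = 1u;
    return DEMO_MAX(negative, small);
}

/* A function states its parameter types, so both values are compared as int. */
int demo_max_fn(int a, int b) { return a > b ? a : b; }
```

Here demo_max_mixed() returns UINT_MAX while demo_max_fn(-1, 1) returns 1. Depending on the warning options, a compiler may flag the signed/unsigned comparison in the expanded code, but the macro itself has no signature that could document or enforce the intended types.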

This seems to be the cause from which stems the dislike for macros — they can be hiding a careless mistake inside, but no one will detect it until they are expanded. However, macros are created to be used often, which is why they are usually refined by developers until a chance of a human error lurking inside is eliminated. In our case, the macros for C are additionally checked using a compiler until we’re sure everything works just as intended.

Error handling and parameter validation

Another thing we can effectively do using macros is creating more concise code, i.e. in the situations when the same code needs to be executed several times in different parts of the program, but it’s not necessary (or even possible) to turn this code into a separate function. This is when some developers create a macro with an intuitive name and wrap it around the code.

For instance, in Themis we used macros for error processing, i.e. instead of:

if(error){
       // do something
       return error_code;
}

we created the following macro:

THEMIS_CHECK(condition, error_code, snippet) // snippet is the code to be executed in place of "do something"

It is convenient, and the body of the macro can always be changed to something like:

if(error){
       fprintf(stderr, "error in file:line - %s:%d\n", __FILE__, __LINE__);
       // do something
       return error_code;
}

This brings up the exact file and line where the error took place, not just the error code, which helps with debugging. Since it is a macro and not a function, each error report carries the correct filename and line number.

This is the best and the most convenient feature — otherwise you’d need to comb through the whole code looking for the error message because in C the error processing in general usually looks like this:


if(condition){ // something 
        //some code;
}

In fact, using macros won’t make the code considerably shorter, but, for the sake of debugging, macros are an efficient way of pointing out where mistakes occur. A common practice is adding the following line inside the “if”:


fprintf(stderr, "error %s in file %s on line %d\n", "ERROR MESSAGE", __FILE__,
        __LINE__);

The predefined macros __FILE__ and __LINE__ expand to the name of the current source file and the line number on which they appear. The problem is that if you don’t use macros, you’ll need to add such a line to every error check. This would take considerable time and effort, and if we want this output to appear only in the DEBUG version of the product, we’ll need to add at least 3 lines everywhere:

...
#ifdef DEBUG
fprintf(stderr, "error %s in file %s on line %d\n", "ERROR MESSAGE", __FILE__,
        __LINE__);
#endif
...

But if you create a macro that looks like this:

#define THEMIS_CHECK(cond, error_handler) \
   do{ \
   if(cond){ \
      error_handler; \
   } }while(0) 

And write:

uint8_t* out_message = malloc(out_message_buffer);
assert(out_message);
THEMIS_CHECK(THEMIS_SUCCESS != themis_secure_message_unwrap(private_key,
        peer_public_key, in_message, in_message_length, out_message,
        &out_message_length), free(out_message); return THEMIS_FAIL);

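You can then add the debug output in a single place, the macro definition. Preprocessor directives cannot appear inside a macro body, so the #ifdef selects between two definitions of a small helper macro (THEMIS_DEBUG_TRACE is a name made up for this illustration):

#ifdef DEBUG
#define THEMIS_DEBUG_TRACE() \
   fprintf(stderr, "error %s in file %s on line %d\n", "ERROR MESSAGE", __FILE__, __LINE__)
#else
#define THEMIS_DEBUG_TRACE() ((void)0)
#endif

#define THEMIS_CHECK(cond, error_handler) \
   do{ \
   if(cond){ \
      THEMIS_DEBUG_TRACE(); \
      error_handler; \
   } }while(0)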

In all the places where error processing occurs, error output will be added.
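The do{ ... }while(0) wrapper used in THEMIS_CHECK deserves a short note: it makes the expansion behave like one ordinary statement, so the macro stays safe inside an unbraced if/else, and the error handler may contain several statements. A minimal sketch, assuming hypothetical names and placeholder status values (DEMO_CHECK, DEMO_SUCCESS, DEMO_FAIL, demo_unwrap, demo_process, demo_classify):

```c
#include <stdlib.h>

#define DEMO_SUCCESS 0   /* placeholder status values for this sketch */
#define DEMO_FAIL    11

/* Same shape as the check macro in the article: run the handler if cond holds. */
#define DEMO_CHECK(cond, error_handler) \
    do {                                \
        if (cond) {                     \
            error_handler;              \
        }                               \
    } while (0)

/* Hypothetical stand-in for a library call that may fail. */
int demo_unwrap(const char *in, char *out, size_t out_len)
{
    if (in == NULL || out_len == 0) return DEMO_FAIL;
    out[0] = in[0];
    return DEMO_SUCCESS;
}

/* The handler frees the buffer before returning, so the early exit does not leak. */
int demo_process(const char *in)
{
    char *out = malloc(16);
    DEMO_CHECK(out == NULL, return DEMO_FAIL);
    DEMO_CHECK(DEMO_SUCCESS != demo_unwrap(in, out, 16), free(out); return DEMO_FAIL);
    free(out);
    return DEMO_SUCCESS;
}

/* Thanks to do/while(0) the macro plus its semicolon is exactly one
   statement, so the else below still pairs with the outer if. */
int demo_classify(int value)
{
    if (value < 0)
        DEMO_CHECK(value < -100, return -2);
    else
        return 1;
    return -1;
}
```

Without the wrapper, the semicolon after the macro call would terminate the outer if and leave the else without a matching if, which is a compile error.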

Why not just use inline functions? The answer is simple: the preprocessor finishes its work before the compiler decides whether to inline a function. As a result, __FILE__ and __LINE__ inside an inline function are replaced exactly once, at the place where the function is defined, so they report the file and line of the function definition, not the place where the error occurred.
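The difference is easy to observe. In the sketch below (all names are hypothetical), the macro reports a new line number at every use, while the inline function always reports the line it was written on:

```c
/* __LINE__ is replaced by the preprocessor right here, at the definition,
   long before the compiler considers inlining the function. */
static inline int demo_line_via_function(void) { return __LINE__; }

/* __LINE__ is replaced at every place where the macro is used. */
#define DEMO_LINE_VIA_MACRO() (__LINE__)

/* Returns 1 if the macro tracks the call site and the function does not. */
int demo_lines_differ(void)
{
    int m1 = DEMO_LINE_VIA_MACRO();      /* number of this line */
    int m2 = DEMO_LINE_VIA_MACRO();      /* number of the next line */
    int f1 = demo_line_via_function();   /* line of the definition above */
    int f2 = demo_line_via_function();   /* the same definition line again */
    return (m2 == m1 + 1) && (f1 == f2) && (f1 < m1);
}
```

An error report produced inside the function would therefore point at the helper itself for every failure, which is of little use when hunting for the failing call.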

The downside to this is that when a “snippet” becomes complicated enough (i.e. contains many commands), the call to the macro turns into one long pile of code that is hard to read. Macro expansion of structured multi-line logic into a single line is one of the typical macro-driven obstacles to proper debugging. And this doesn’t just apply to machine debuggers: the human eyes like structure, too. So, to keep the resulting code debuggable, in Themis we try to keep the macros short and make sure that they read well when expanded by verifying it with a 'for-audit' target (see below).

Preprocessors with macros — splendours and miseries

Despite the many nice words we have said about macros, they still hurt the readability and auditability of the code. Can we keep using macros, but also make the code auditable? As it turns out, yes.

Audit-friendly code is code that is transparent and easy to read without jumping back and forth. This means that macros need to be expanded and turned into clear, comprehensible code. Well, gcc provides a mode (-E) that runs only the preprocessor and outputs the expanded source instead of building it. So let’s try using the command gcc -E on the following example. This is the code before executing gcc -E:

...
themis_status_t themis_secure_message_wrap(const uint8_t* private_key,
                                           const size_t private_key_length,
                                           const uint8_t* public_key,
                                           const size_t public_key_length,
                                           const uint8_t* message,
                                           const size_t message_length,
                                           uint8_t* wrapped_message,
                                           size_t* wrapped_message_length){
  THEMIS_CHECK_PARAM(private_key!=NULL);
  THEMIS_CHECK_PARAM(private_key_length!=0);
  THEMIS_CHECK_PARAM(message!=NULL);
  THEMIS_CHECK_PARAM(message_length!=0);
  THEMIS_CHECK_PARAM(wrapped_message_length!=NULL);
  if(public_key==NULL && public_key_length==0){
    themis_secure_message_signer_t* ctx=NULL;
    ctx = themis_secure_message_signer_init(private_key, private_key_length);
    THEMIS_CHECK(ctx!=NULL);
    themis_status_t res=themis_secure_message_signer_proceed(ctx, message, message_length, wrapped_message, wrapped_message_length);
    themis_secure_message_signer_destroy(ctx);
    return res;
  } else {
    THEMIS_CHECK_PARAM(public_key!=NULL);
    THEMIS_CHECK_PARAM(public_key_length!=0);
    themis_secure_message_encrypter_t* ctx=NULL;
    ctx = themis_secure_message_encrypter_init(private_key, private_key_length, public_key, public_key_length);
    THEMIS_CHECK__(ctx!=NULL, return THEMIS_INVALID_PARAMETER);
    themis_status_t res=themis_secure_message_encrypter_proceed(ctx, message, message_length, wrapped_message, wrapped_message_length);
    themis_secure_message_encrypter_destroy(ctx);
    return res;
  }
  return THEMIS_INVALID_PARAMETER;
}
...

And this is what it looks like after expanding the macros with the following command:

$ gcc -Isrc -E -CC src/themis/secure_message.c >secure_message.c.aud
$ cat secure_message.c.aud

The resulting output:

...
themis_status_t themis_secure_message_wrap(const uint8_t* private_key,
        const size_t private_key_length,
        const uint8_t* public_key,
        const size_t public_key_length,
        const uint8_t* message,
        const size_t message_length,
        uint8_t* wrapped_message,
        size_t* wrapped_message_length){
  if(!(private_key!=((void *)0))){ ; return 12; };
  if(!(private_key_length!=0)){ ; return 12; };
  if(!(message!=((void *)0))){ ; return 12; };
  if(!(message_length!=0)){ ; return 12; };
  if(!(wrapped_message_length!=((void *)0))){ ; return 12; };
  if(public_key==((void *)0) && public_key_length==0){
    themis_secure_message_signer_t* ctx=((void *)0);
    ctx = themis_secure_message_signer_init(private_key, private_key_length);
    if(!(ctx!=((void *)0))){ ; return 11; };
    themis_status_t res=themis_secure_message_signer_proceed(ctx, message, message_length, wrapped_message, wrapped_message_length);
    themis_secure_message_signer_destroy(ctx);
    return res;
  } else {
    if(!(public_key!=((void *)0))){ ; return 12; };
    if(!(public_key_length!=0)){ ; return 12; };
    themis_secure_message_encrypter_t* ctx=((void *)0);
    ctx = themis_secure_message_encrypter_init(private_key, private_key_length, public_key, public_key_length);
    do{if(!(ctx!=((void *)0))){return 12;}}while(0);
    themis_status_t res=themis_secure_message_encrypter_proceed(ctx, message, message_length, wrapped_message, wrapped_message_length);
    themis_secure_message_encrypter_destroy(ctx);
    return res;
  }
  return 12;
}
...

Oops. This can hardly be called auditable code.


Let’s see what we can do to get expanded code that makes sense.

Why so bloated

While an effective expansion of macros means that a variety of conditional clauses can be validated by the same macros, one particular disadvantage is that the error handling macros (in themis/themis_error.h) can cause functions to return with some error code which impacts the execution flow. This can be an impediment for those doing code audits through code inspection where the flow should be explicit.

This happens because gcc or clang (or any other) preprocessors cannot selectively expand only macros — they will expand whatever they lay their “hands” on — include statements, ifdef, endif, defines (constants). It turns any file into a monster filled with incomprehensible numbers (the constants defined through #define are expanded/substituted, too). The preprocessors need to be tweaked and modified for the expanded code to be useful in the “auditable” sense (readable). This is why we created a custom script that solves the problem for us.

Taming the preprocessor with a custom script

Theoretically speaking, there are ways to create a preprocessor that won’t need the custom treatment for every macro. This, however, would involve substantial patching of some non-gcc preprocessor and it is unrealistic to assume that the pre-patched preprocessor will be able to process correctly each and every exotic thing it might encounter when expanding a macro.

To address this concern, we created our custom script specifically for inspection of C code in Themis, but the whole approach should help with enhancing the readability of other code as well. It creates a copy of the source code with pre-expanded macros — to make the code easier to read for those who want to read it.

The script consists of 4 steps executed for each file in the project:

Step 1.

grep '^\s*#\s*include' $file_name > /tmp/include.c

Copies all the #include statements into a temporary file /tmp/include.c. This must be done because if the preprocessor touches the #include statements, the code will look much worse than before we started expanding the macros, and it will be totally useless for a sane audit process.

Step 2.

grep -Pv '^\s*#\s*include\b' $file_name >> /tmp/code.c

Copies everything except for the #include statements into a temporary file /tmp/code.c and adds the definitions from the "soter/soter_error.h" and "themis/themis_error.h" files to the start of the code. This is done so that the error processing macros can be expanded in the next step: since the #include statements are left out, the preprocessor would otherwise not be able to find those definitions.

Step 3.

gcc -I src -E -CC /tmp/code.c | grep -v ^# > /tmp/preprocessed.c 

File preprocessing is executed without the #include statements and the result is put into a temporary file /tmp/preprocessed.c.

Step 4.

cat /tmp/include.c > $2
cat /tmp/preprocessed.c >> $2

We bring together the #include statements and the result of preprocessing. The resulting source file retains the #define statements for function-like macros, which helps the person auditing the code understand that there was a macro which is now expanded.
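The line filter at the heart of steps 1 and 2 can also be expressed in C, which makes the intent of the regular expression explicit. This is a sketch of the stricter pattern from step 2 (^\s*#\s*include\b); the function name demo_is_include_line is hypothetical and the script itself relies on grep.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if the line is an #include directive, i.e. matches
   ^\s*#\s*include\b, and 0 otherwise. */
int demo_is_include_line(const char *line)
{
    static const char keyword[] = "include";
    const size_t keyword_len = sizeof(keyword) - 1;
    unsigned char next;

    while (isspace((unsigned char)*line)) line++;   /* leading whitespace */
    if (*line != '#') return 0;
    line++;
    while (isspace((unsigned char)*line)) line++;   /* whitespace after '#' */
    if (strncmp(line, keyword, keyword_len) != 0) return 0;

    /* \b: the keyword must not continue with another word character. */
    next = (unsigned char)line[keyword_len];
    return !(isalnum(next) || next == '_');
}
```

Lines for which the function returns 1 would go to /tmp/include.c, and all the others to /tmp/code.c. Note that an #include mentioned in a trailing comment does not count, because the directive has to start the line.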


Now how about this? What if we run this script on the original example (from src/themis/secure_message_wrapper.c):

$ ./scripts/pp.sh src/themis/secure_message.c secure_message.c.aud
$ cat secure_message.c.aud

The output now will look like this:

...
themis_status_t themis_secure_message_wrap(const uint8_t* private_key,
        const size_t private_key_length,
        const uint8_t* public_key,
        const size_t public_key_length,
        const uint8_t* message,
        const size_t message_length,
        uint8_t* wrapped_message,
        size_t* wrapped_message_length){
  if(!(private_key!=NULL)){ ; return SOTER_INVALID_PARAMETER; };
  if(!(private_key_length!=0)){ ; return SOTER_INVALID_PARAMETER; };
  if(!(message!=NULL)){ ; return SOTER_INVALID_PARAMETER; };
  if(!(message_length!=0)){ ; return SOTER_INVALID_PARAMETER; };
  if(!(wrapped_message_length!=NULL)){ ; return SOTER_INVALID_PARAMETER; };
  if(public_key==NULL && public_key_length==0){
    themis_secure_message_signer_t* ctx=NULL;
    ctx = themis_secure_message_signer_init(private_key, private_key_length);
    if(!(ctx!=NULL)){ ; return SOTER_FAIL; };
    themis_status_t res=themis_secure_message_signer_proceed(ctx, message, message_length, wrapped_message, wrapped_message_length);
    themis_secure_message_signer_destroy(ctx);
    return res;
  } else {
    if(!(public_key!=NULL)){ ; return SOTER_INVALID_PARAMETER; };
    if(!(public_key_length!=0)){ ; return SOTER_INVALID_PARAMETER; };
    themis_secure_message_encrypter_t* ctx=NULL;
    ctx = themis_secure_message_encrypter_init(private_key, private_key_length, public_key, public_key_length);
    do{if(!(ctx!=NULL)){return THEMIS_INVALID_PARAMETER;}}while(0);
    themis_status_t res=themis_secure_message_encrypter_proceed(ctx, message, message_length, wrapped_message, wrapped_message_length);
    themis_secure_message_encrypter_destroy(ctx);
    return res;
  }
  return THEMIS_INVALID_PARAMETER;
}
...

As we can see, macros and macros alone were expanded. All the constants remain untouched and the code is readable, turning the macros into — to misquote Goethe — a part of that force that always promises the trouble and sometimes produces the good.

Conclusions

Why did we bring up this topic, knowing that many developers won’t ever be convinced that selling your soul to macros is worth the trouble? Well, for one, for the sake of the good old mischievous myth-busting. Beyond that, having a flexible codebase is very important to us, and having an ability to manipulate it programmatically is just as important, and we’ll be returning to this notion in the upcoming articles.

Sure, like any other tool, macros can be used poorly. We all still remember that careless manipulation of defines and macros in a well-matured C codebase provides a chance to accidentally build an operating system, a Quake clone, or 3 alternative Haskell compilers. But a thoughtful application of macros provides useful instruments that are too good to be overlooked due to nothing but a widespread prejudice, even if this requires creating a workaround (i.e. custom script) for correct functioning. And the “demonic” power of macros suddenly comes in very handy — just as we’ve demonstrated in the code examples from Themis above.

Sure, the pitfalls of macros and the implied risks of choosing the preprocessor script over the safety afforded by the compiler regarding type-checking, namespaces, etc. are not going anywhere. But given the objectives and constraints, conservative and careful use of macros is appropriate and justified.

P.S. Field Trials or Organising the Process

If you want to see how this recipe works in a large real-world project, take a look at the ‘for-audit’ target in Themis build system. If you follow the instructions below, you’ll get a similar output with expanded auditable code.

1. Clone Themis to your machine and build for-audit sources:

$ git clone https://github.com/cossacklabs/themis.git
$ cd themis
$ make for-audit

2. The output will look like this:

compile build/for_audit/soter/soter_container.c [OK]
compile build/for_audit/soter/soter_crc32.c [OK]
compile build/for_audit/soter/soter_hmac.c [OK]
...
compile build/for_audit/themis/secure_message.h [OK]
compile build/for_audit/themis/themis_error.h [OK]
compile build/for_audit/soter/soter_hmac.h [OK]
compile build/for_audit/soter/soter_hash.h [OK]

3. The for-audit Soter and Themis code will be placed into a for-audit build folder.

$ tree build/for_audit/
build/for_audit
├── soter
│   ├── ed25519
│   │   ├── api.h
│   │   ├── base.h
│   │   ├── base2.h
...
whole source follows

That’s it!

P.P.S. Footnotes

* If we were using C++, we’d resort to templates or extensions instead of macros, which are far from being explicit, but a bit safer. However, we’re using the optimal tool for our particular case, which is the use of macros, feared and hated by many.

** C++ offers more here, although not through the preprocessor, which works the same way as in C: overloaded functions and templates let several definitions share one name with different parameters, and the compiler selects the one matching the call and checks the argument types.

This is our take on the subject. If you have something to add or would like to share your take on auditing of code with macros, please reach out to us via @CossackLabs or email.

Copyright © 2014-2017 Cossack Labs Limited
Cossack Labs is a privately-held British company with a team of data security experts based in Kyiv, Ukraine.