Plugging leaks in Go memory management

Intro

As many of you know, Go is an amazing modern programming language with automated memory management. We love Go: we've used it to build Acra, our database encryption suite, we further use it to build other products. Not being extremely fancy, Go is practical and efficient and is a way to get things done in a reasonable timeframe.

Sometimes, even in the age of modern, garbage-collected languages, with their own great profiling tools and well-thought memory management style, you'll still have to dust off old good valgrind to understand what's going on.

This is a story of a pointer management change for external libraries in CGo that made us look into the void really hard, just like our friend, looking into the abyss, holding steadily, waiting for the malicious insects:

While building Acra, we've stumbled upon a few interesting technical challenges, the memory leaks being one of the most interesting to solve.

In the context of memory management and data throughput, Acra works in a pretty straightforward way:

  • it listens to an application via secure channel
  • it proxies network traffic in both directions (from client to the database and the other way)
  • it scans responses from the database for cryptographic containers, and, if certain criteria is met, attempts to decrypt them and replace them inside the traffic stream with plaintext equivalent.

Acra relies on Themis cryptographic library, more specifically, on it's Go wrapper, to perform cryptographic operations.

What if the whole thing starts to leak so bad that OOMKiller just kills it during performance testing stage?

Memory leaks in Go, a bit of background

Memory leaks are very common in almost any language, including garbage collected languages. Go is no exception.

As with any highly abstracted language, a reference to an object, if not properly managed, may be left assigned even after it's not used anymore. This usually happens either within main application's non-deterministic loops or on the borders with external imports. With architectures like Acra's, it's quite hard to detect some memory leaks in development in the first place: they take hours or even days to reveal themselves.

Memory management problems in Go have three main possible causes:

  • Blocked goroutines in a loop
  • Deferred code in a loop
  • Memory issues caused by third-party imports libraries.

First two are well-known and obvious. The third is what happened to Themis wrapper for Go, when we've started to use it to process large volumes of data.

Investigating a leaky piece of Acra

At some point, during performance benchmarking on a machine rather limited in resources, some instances of AcraServer (the main binary within Acra suite) would suddenly cease to be. AcraServer is not dead, it's resting, some might have thought, but evident activities of OOMKiller told otherwise:

kernel: Out of memory: Kill process 9163 (acraserver) score 511 or sacrifice child

What should we do? Put a longer testing session on a shorter leash: collect memory consumption metrics on system level and peek into them once in a while:

Notes on graphs:

  • Here and in all further measurements we've measured memory state after passing of [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000] megabytes through AcraServer. Each graph's X axis changes scale in the middle, so each graph essentially combines 2 graphs: 0..100MB and 100..1000MB. Although these measurements are simple, they were sufficient to quickly detect trends and so are used to illustrate them in this article.
  • Here and afterward, all values are averaged over 100 test runs.
  • Units for all axis are megabytes.
  • "OS" measurements do not include memory size used by Go runtime, since the leak actually happened outside of it.

Workload testing Acra for some time (1-2 hours) shows that memory usage increases with every passing database request. After 2 hours of continuous testing, memory allocated by AcraServer was more than 500Mb.

htop picture:

    top - 15:53:02 up 8 days, 23:25,  0 users,  load average: 2.47, 2.33, 2.24
    Tasks:   3 total,   1 running,   2 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 19.2 us, 11.5 sy,  0.0 ni, 69.2 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
    KiB Mem:   5853768 total,  5562696 used,   291072 free,   129716 buffers
    KiB Swap:        0 total,        0 used,        0 free.  2070512 cached Mem

    %MEM   PID USER      PR  NI    VIRT    RES    SHR S  %CPU     TIME+ COMMAND
     1.3     1 root      20   0  529400   9636   8036 S   0.0   0:00.20 acraserver
     0.1    51 root      20   0   20248   3212   2728 S   0.0   0:00.02 bash
     0.1    57 root      20   0   21968   2488   2092 R   0.0   0:00.05 top

Tuning GC parameters didn't help eliminate the leak. Houston, we have a problem. What if this leak exposes something sensitive? This suddenly stops being a performance problem, do we want 'acrableed' before even releasing Acra in public?

Using pprof wasn't extremely helpful, since it shows only parts of memory allocated by Go during execution process, even memory used by Go itself is not profiled.

However, looking at pprof graph output 1 for some time, we've noticed that significant part of memory allocations was for cryptographic "objects" coming from crypto library (Themis).

These functions are actually wrappers for native code imported from themis.so library:

binary_decryptor.go

    smessage := message.New(private_key, pubkey)
    symmetric_key, err := smessage.Unwrap(decryptor.decoded_key_block_buffer[base.PUBLIC_KEY_LENGTH:])
    if err != nil {
            log.Printf("Warning: %v\n", ErrorMessage("can't unwrap symmetric key", err))
            return nil, decryptor.key_block_buffer[:n], base.ErrFakeAcraStruct
    }

This code uses Secure Message object from Go Themis wrapper:

message.go

    func (sm *SecureMessage) Unwrap(message []byte) ([]byte, error) {
            if nil == sm.private || 0 == len(sm.private.Value) {
                    return nil, errors.New("Private key was not provided")
            }

            if nil == sm.peerPublic || 0 == len(sm.peerPublic.Value) {
                    return nil, errors.New("Peer public key was not provided")
            }
            return SecureMessageUnwrap(sm.private, sm.peerPublic, message)
    }

This line in question is:

SecureMessageUnwrap(sm.private, sm.peerPublic, message)

First guess is that some deallocations just never happened.

Even though at this point the suspect is obvious - something between Go code that uses Themis and Themis itself, we wanted to understand Go's memory behavior better for the future. To analyze memory usage we use old good valgrind. Obviously, valgrind does not take into account any of Go's specifics, but if there are any allocation problems, we will be able to see them in valgrind stats.

By simply starting valgring ./acraserver (acraserver is our Go binary for Acra's AcraServer component) and sending plenty of data through the server, we can see some interesting errors of referencing and reading of previously freed data (we will talk about this next), but no problems with allocated memory blocks.

Advancing with first working hypothesis:

Maybe we are using very small memory blocks and Go's GC ignores them until memory usage grows bigger?

Then we stop valgrind ./acraserver, and see huge set of errors in valgring output, with definite next conclusion:

    ==15592== LEAK SUMMARY:
    ==15592==    definitely lost: 156,456,124 bytes in 536 blocks
    ==15592==    indirectly lost: 0 bytes in 0 blocks
    ==15592==      possibly lost: 1,152 bytes in 4 blocks
    ==15592==    still reachable: 2,080 bytes in 16 blocks
    ==15592==         suppressed: 0 bytes in 0 blocks

This says that some data, which passed through AcraServer, ended up leaking. It was not freed by Go's GC at all. So, based on what we already knew about allocation from pprof, Go did not free up any allocations for data passed to Themis and back.

Just to make sure that it is true, we've ran valgrind in sessions with increasing amount of data passing through AcraServer to see whether there is a direct correlation between the data amount and the leak amount:

... and there really was!

So, what does it mean in our case? Let's see how Themis is used in Acra:

  • Themis provides traffic protection between AcraServer and integration side within client application, called AcraProxy, via Secure Session.
  • AcraServer decrypts special cryptographic containers (AcraStructs) within a stream of responses from the database to the application.

So, what do we see on the graph and how to interpret the data? 0..100MB shows linear dependency, 100..1000MB does as well, 0..100MB shows growth, but with deviations (change in leak sometimes can be proportional to size of allocations), but once we started to measure leaks in 100Mb steps, the obvious 45 degree angle emerged.

The conclusion was that only some parts of Themis-Acra connection are leaking. If we'd have every Themis function leaking, we would see that leak is almost three times bigger than actual data volume (Secure Session from app to Acra, Secure Session from Acra to app, decryption attempts on traffic going through Acra).

Fixing leak cause

Core cryptographic library behind Acra - Themis, is distributed as a shared object library and is written on C. GoThemis is a wrapper, made using native imports like this:

message.go

func SecureMessageUnwrap(private_key []byte, public_key []byte, message []byte) ([]byte, error){
   var m = []byte
   var m_length = 0
   var output = (*byte)(unsafe.Pointer(m))
   var output_length = (C.size_t*)(unsafe.Pointer(&m_length))
   if(THEMIS_SUCCESS != 
      int(C.themis_secure_message_unwrap(unsafe.Pointer(&private_key[0],
                                                        C.size_t(len(private_key)),
                                                        unsafe.Pointer(&public_key[0]),
                                                        C.size_t(len(public_key)),
                                                        unsafe.Pointer(&message[0]),
                                                        C.size_t(len(message)),
                                                        output,
                                                        output_length)))){
       return nil, errors.New("Failed to wrap message")
   }
   return m, nil
}

In this piece of code, according to our current understanding of the way cgo works, themis_secure_message_unwrap function allocates memory for output pointer, which is then never freed.

When we've initially started building Themis for Go, dominant was Go 1.3, which had mechanisms to detect memory blocks allocated inside used libraries and all such blocks were to be freed by GC. Go 1.6 significantly improved its memory management, however as part of those improvements memory allocated by external libraries was no longer garbage collected by the runtime. The easiest method to avoid such leaks is to fully perform all memory management by the used third party library itself.

In new version all Themis interface functions (and some internal Themis functions) were rewritten with get-size/write-to-buffer:

C part (message.go):

    /*
    #cgo LDFLAGS: -lthemis -lsoter
    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdbool.h>
    #include <themis/error.h>
    #include <themis/secure_message.h>

    static bool get_unwrapped_message_size(const void *priv, size_t priv_len, const void *public, size_t pub_len, const void *message, size_t message_len, bool is_wrap, size_t *out_len)
    {
            themis_status_t res = themis_secure_message_unwrap(priv, priv_len, public, pub_len, message, message_len, NULL, out_len);
            return THEMIS_BUFFER_TOO_SMALL == res;
    }

    static bool message_unwrap(const void *priv, size_t priv_len, const void *public, size_t pub_len, const void *message, size_t message_len, bool is_wrap, void *out, size_t out_len)
    {
            themis_status_t res = themis_secure_message_unwrap(priv, priv_len, public, pub_len, message, message_len, out, &out_len);
            return THEMIS_SUCCESS == res;
    }

In this snippet, themis_secure_message_unwrap does not allocate the memory itself, it requires previously allocated block of memory to put decrypted data into it.

Go part (message.go):

    */
    import "C"
    ...
    func messageUnwrap(private []byte, peerPublic []byte, message []byte) ([]byte, error) {
            var output_length C.size_t
            if !bool(C.get_message_size(unsafe.Pointer(&private[0]),
                                        C.size_t(len(private)),
                                        unsafe.Pointer(&peerPublic[0]),
                                        C.size_t(len(peerPublic)),
                                        unsafe.Pointer(&message[0]),
                                        C.size_t(len(message)),
                                        &output_length)){
                    return nil, errors.New("Failed to get output size")
            }

            output := make([]byte, int(output_length), int(output_length))
            if !bool(C.message_unwrap(unsafe.Pointer(&private[0]),
                                        C.size_t(len(private)),
                                        unsafe.Pointer(&peerPublic[0]),
                                        C.size_t(len(peerPublic)),
                                        unsafe.Pointer(&message[0]),
                                        C.size_t(len(message)),
                                        unsafe.Pointer(&output[0]),
                                        output_length)){
                    return nil, errors.New("Failed to wrap message")
            }
            return output, nil
    }

By changing output allocation to output := make([]byte, int(output_length), int(output_length)), we make Go explicitly manage this block, and make this pointer visible to Go's GC for further deallocation.

Profiling the results

The first test was completely unscientific - let's just leave Acra for a few days crunching the numbers and see if it leaks. Result: nope, it doesn't. Time to run `valgrind`/`pprof` pair of tests on our initial setup:

 valgrind output:
    ==15592== LEAK SUMMARY:
    ==15592==    definitely lost: 0 bytes in 0 blocks
    ==15592==    indirectly lost: 0 bytes in 0 blocks
    ==15592==      possibly lost: 1,152 bytes in 4 blocks
    ==15592==    still reachable: 2,080 bytes in 16 blocks
    ==15592==         suppressed: 0 bytes in 0 blocks.

Things are fairly simple to fix when you know exactly where to look!

Total memory occupied by the process now is just pprof-visible part (and Go's internal runtime memory, which is out of scope of current article). Zero leaks in the system.

Conclusions, lessons learned

Whenever a new engineering challenge gets solved, it feels like all the reasons and solutions are extremely obvious, and should've been known beforehand. As useful as feeling a bit dumb sometimes is, it actually is hindsight is 20/20 rule (broader understood as particular case of retrospective predictability) and happens all the time. Still, the obvious lessons learned:

New languages are that volatile

When we wrote the first version of Acra, which used Go 1.3, nobody could have guessed that porting it to a more modern version of Go would introduce such drastic changes under the hood. Turning off automated memory management for external components is a serious move: it's not a change in syntactic sugar, which can be fixed with grep then sed incursion on your codebase. It's well grounded and understandable, though: pointer arithmetic is a tricky game, and if you can simplify things - you should.

It is worth mentioning that reading release notes to 1.6 left us bepuzzled and took a long road to understand that

The major change is the definition of rules for sharing Go pointers with C code, to ensure that such C code can coexist with Go's garbage collector.

actually means fix your pointer allocation because we have finally made garbage collector to ignore implicit CGo allocations.

Memory leaks could've leaked the sensitive data.

When developing for security, there is another significant consideration when taking care of memory management. Leak memory and you might leak something sensitive. Mismanage memory, and you might end up nullifying all your efforts in cryptographically protecting the data. Having luckily avoided AcraBleed this time, we always keep an eye on memory leaks, and so should you.

There is no silver bullet for non-transparent things.

We are routinely running GoFuzz against Acra as part of testing. We are also frequently using PProf to see what's going on with the memory under load. However, there are always things that slip undetected, which leads us to the next conclusion...

Extreme performance testing is a must.

One thing many of us learned before: automatic memory managed languages require extra attention in quality assurance. Unit tests and functional tests give some answers to the "does it work the way we anticipate it to work" question, but never quite answer the "will it work in production properly?" one. At that point in time, the only deployment of Acra with significant load ran on Go 1.3, and nobody had realistic production data of 1.6 version.

Memory management still dictates a lot of technical choices.

Even modern, GC-driven languages like Go still simetimes require a lot of precision in memory management. It feels like one of the reasons of absence in Go of many OOP/highly abstract primitives is that these things will dramatically increase complexity of memory management mechanisms, and degrade performance. Even without using all of these niceties, if you try hard enough, you can still cause mayhem.

And, you will have to fix mayhem, just like we did.


Footnotes

1 - Go world needs automagic pprof beautifier, based on number of people on the web wondering what do all these boxes and percents mean. It took us a few moments of confusion, too


Related Posts

Copyright © 2014-2017 Cossack Labs Limited
Cossack Labs is a privately-held British company with a team of data security experts based in Kyiv, Ukraine.