Recently, I’ve spent some time annotating string-processing C code with preconditions so that I can test our verification tool on it. All of these preconditions look quite similar—you get a pointer that you assume points to a null-terminated string; sometimes a pointer to a buffer with its size passed in another argument—but still I wanted to test somehow that they’re never violated at runtime. In this article I’ll show how to use AddressSanitizer to dynamically test such preconditions, including checking whether a pointer points to a correct null-terminated string or how much space is safely accessible in a buffer starting from a given pointer.
All code described in this article can be found here.
There is no way in standard C to write a function that checks whether a pointer points to a valid string or gives you the size of a buffer—that information simply does not exist anywhere at run-time. The proper way of maintaining it would be to keep a data structure that stores the metadata about all the valid regions of addresses, like in the tool described in the paper “An Optimized Memory Monitoring for Runtime Assertion Checking of C Programs”. Although a memory allocator might already have something similar to that for heap regions, tracking all the stack objects, automatic variables, and globals could be expensive.
AddressSanitizer, a popular memory error detection tool, uses a very efficient implementation of shadow memory to track the validity of individual memory cells instead of storing bounds of memory regions explicitly. However, it can still sometimes give you approximate information about ranges of valid addresses if these regions are separated by invalid memory. So, I was wondering how difficult it would be to extend it with support for the kinds of assertions that I want to check.
It seems that AddressSanitizer exposes some of its API as several functions in its public interface headers. These functions allow you to access the shadow memory and query information about region bounds and variable names, although some of the functions did not do exactly what I assumed they would. In particular, the documentation of __asan_address_is_poisoned says that it returns 1 if the given address is poisoned, so a result of 0 should mean that the address is safe to access. That holds for addresses in the general area of the stack, heap, or the globals section, but for some completely garbage addresses or NULL it misleadingly returns 0.
I’ve used LLVM 3.7. If you’re getting different results then maybe it’s the LLVM version since I don’t think this API is meant to be very stable. You probably shouldn’t rely on it too much or use it in production code. For testing or debugging it seems very useful, especially these functions:
__asan_locate_address gives you lots of useful information about a given pointer, including its location (whether it is on the stack, heap, globals section, etc.), its corresponding variable name, and, most importantly, the base address and size of the memory region it belongs to.
ASAN_POISON_MEMORY_REGION could be used if you have a larger block of allocated memory that you logically subdivide into smaller chunks (for example in your own allocator). You can leave some space between these chunks and poison it to be able to detect sequential buffer overflows.
__asan_describe_address seems very useful for debugging. It prints out the same kind of information about a given address that you get in ASAN error messages (including the allocation-time stack trace).
To check whether a pointer points to a valid region, check whether the region type returned by __asan_locate_address is either “global”, “heap”, “stack”, or “stack-fake”. Since it’s a bit unwieldy to deal with these string descriptors every time, I ended up creating the following wrapper:
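A minimal sketch of what such a wrapper could look like. The name region_bounds, its signature, and the BUILT_WITH_ASAN guard are assumptions made for this example rather than the original code; only __asan_locate_address comes from the ASan interface. When the file is compiled without -fsanitize=address, the sketch simply reports every pointer as invalid.

```c
#include <stddef.h>
#include <string.h>

/* Detect whether this file is being compiled with AddressSanitizer. */
#if defined(__SANITIZE_ADDRESS__)
#define BUILT_WITH_ASAN 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define BUILT_WITH_ASAN 1
#endif
#endif

#ifdef BUILT_WITH_ASAN
#include <sanitizer/asan_interface.h>
#endif

/* Hypothetical wrapper: returns 1 and fills *base and *size if ptr lies in a
 * region that ASan classifies as global, heap, stack or stack-fake;
 * returns 0 otherwise. */
int region_bounds(const void *ptr, void **base, size_t *size)
{
#ifdef BUILT_WITH_ASAN
    static const char *const valid_kinds[] = {
        "global", "heap", "stack", "stack-fake"
    };
    char name[64];              /* receives the variable name; unused here */
    void *region = NULL;
    size_t region_size = 0;
    const char *kind = __asan_locate_address((void *)ptr, name, sizeof name,
                                             &region, &region_size);

    for (size_t i = 0; i < sizeof valid_kinds / sizeof valid_kinds[0]; i++) {
        if (strcmp(kind, valid_kinds[i]) != 0)
            continue;
        /* Extra caution (an assumption of this sketch): also require that
         * the reported region is non-empty and really contains ptr. */
        const char *p = (const char *)ptr;
        const char *b = (const char *)region;
        if (region_size == 0 || p < b || p >= b + region_size)
            return 0;
        *base = region;
        *size = region_size;
        return 1;
    }
    return 0;
#else
    /* Without ASan there is no shadow memory to consult. */
    (void)ptr; (void)base; (void)size;
    return 0;
#endif
}
```

Returning a plain flag plus two out-parameters keeps the string descriptors in one place, so callers never compare against “heap” or “stack” themselves.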
AddressSanitizer’s concept of a region roughly corresponds to the notion of an object that appears in the C standard, although it’s a lot less restrictive. Individual globals or locals live in their own dedicated regions, while structure fields or array elements will usually be part of a bigger region that contains at least the whole structure. For dynamically allocated memory, the region corresponds to the whole chunk of the size requested in the malloc() call. That’s why ASAN will not detect sequential buffer overflows between adjacent buffers in the same struct.
Another thing worth mentioning is the treatment of function pointers. It looks like __asan_locate_address classifies them as “heap-invalid”, so they’re not considered to be valid pointers by __asan_region_bounds. Even though on many systems it might be possible to cast a function pointer to a data pointer and read the underlying machine code, the C standard doesn’t guarantee that, so it’s a reasonable choice to treat them as invalid.
To describe what I mean by checking preconditions for string functions, let’s take the C library function strcpy() as the example. The man page gives the following description:
The strcpy() function copies the string pointed to by src, including the terminating null byte ('\0'), to the buffer pointed to by dest. The strings may not overlap, and the destination string dest must be large enough to receive the copy.
To check this precondition dynamically we need to be able to check whether src is a properly null-terminated string, whether dest is big enough, and that the two regions do not overlap. In the following sections I will describe how to do it using the AddressSanitizer API. All the code together with a few testcases can be downloaded from here.
Note that AddressSanitizer alone will detect an invalid access if
dest is too small. The reason I’ve implemented these functions is to test
the preconditions I already have and need for static analysis to make sure
that I’m not verifying the function based on wrong assumptions.
src is a valid string
Since we know the region bounds we can simply check whether a string is null-terminated with a linear walk over the region. Since it’s easy to calculate the string’s length as a side effect of this walk, the most convenient way of implementing this is as a “safe strlen” function which accepts any pointer—including ones not pointing to properly-allocated memory.
The following implementation will return the length of the string starting at str if this string is properly null-terminated. If it isn’t, it returns a negative value. We need to be very careful with types here: just returning an int (which usually is 32-bit) can lead to overflows for strings bigger than 2 GB.
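A minimal sketch of such a “safe strlen”, assuming the region bounds have already been obtained (for instance from ASan) and are passed in explicitly. The name bounded_strlen and its parameters are made up for this example; a real version would look the bounds up itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length of the string starting at str, or -1 if str is
 * outside the region [base, base + size) or no '\0' is found before the
 * region ends. The result is 64-bit so that very long strings do not
 * overflow the return type. */
int64_t bounded_strlen(const char *str, const char *base, size_t size)
{
    if (str == NULL || base == NULL || str < base || str >= base + size)
        return -1;                      /* str does not point into the region */

    const char *end = base + size;      /* one past the last valid byte */
    for (const char *p = str; p < end; p++) {
        if (*p == '\0')
            return (int64_t)(p - str);
    }
    return -1;                          /* ran off the region: not a string */
}
```

For an 8-byte buffer holding "abc" followed by non-zero bytes, the call on the start of the buffer yields 3, while a call on the byte after the terminator yields -1 because no further '\0' appears inside the region.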
Of course, the only thing that this procedure really verifies is that there is a valid sequence of readable bytes starting at str and ending with '\0'. Thus, we might end up with accidentally null-terminated “strings” by taking addresses of non-string variables or pointers inside large random memory regions. Just because the function dynamically discovers something to be a valid string doesn’t mean that it’s always guaranteed to be one.
For example, if i is a 32-bit variable equal to 16777215 (or 0x00FFFFFF), then ((char *)&i) + 2 will point to a null-terminated “string” on little-endian architectures but not on big-endian.
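The byte-order effect can be seen with a small sketch. The helper names are invented for this example; it copies the integer into a byte array instead of reading through a cast pointer, so it never reads outside the variable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset (relative to byte `start`) of the first zero byte inside the 32-bit
 * value, or -1 if none of the remaining bytes is zero. */
int accidental_strlen(uint32_t value, size_t start)
{
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);    /* the in-memory byte layout */
    for (size_t k = start; k < sizeof value; k++) {
        if (bytes[k] == 0)
            return (int)(k - start);
    }
    return -1;
}

/* 1 on little-endian hosts, 0 on big-endian ones. */
int is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host the bytes of 16777215 are FF FF FF 00, so starting at offset 2 there is one non-zero byte followed by a terminator. On a big-endian host the bytes are 00 FF FF FF and no terminator follows offset 2 inside the variable.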
Space available in dest
Checking if the destination buffer is big enough to accommodate the input string boils down to determining the region bounds and doing some pointer arithmetic. For convenience, let’s define a function that, for a given pointer ptr, will return the biggest x such that for every i between 0 and x (inclusive), dereferencing ptr+i is safe. This takes into account the possibility of ptr being a completely invalid pointer (e.g. NULL or pointing outside an allocated region), in which case the function returns -1 since not even ptr+0 is safe.
Again, I return a signed integer for convenience. The cast of size_t to int64_t will overflow for buffers bigger than a few exabytes, so if you’re reading this in the very far future, please modify the code accordingly.
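A minimal sketch of this computation, again assuming the region bounds are already known and passed in explicitly. The name last_safe_offset is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the biggest x such that ptr[0] .. ptr[x] all lie
 * inside the region [base, base + size), or -1 if not even ptr[0] does. */
int64_t last_safe_offset(const char *ptr, const char *base, size_t size)
{
    if (ptr == NULL || base == NULL || size == 0)
        return -1;
    if (ptr < base || ptr >= base + size)
        return -1;                      /* ptr is outside the region */

    /* Bytes from ptr to the end of the region, minus one because the
     * bound x is inclusive. */
    size_t remaining = size - (size_t)(ptr - base);
    return (int64_t)remaining - 1;
}
```

For a 16-byte buffer this gives 15 at the start of the buffer, 0 at its last byte, and -1 one past the end.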
Checking if buffers are non-overlapping
This is a much more interesting property, since it might allow us to catch bugs that would not be caught with AddressSanitizer alone. To check that the two buffers do not overlap, it might be tempting to just compare their base addresses. This would be too restrictive, e.g.
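A minimal illustration of such a case, with a function name and buffers made up for the example:

```c
#include <string.h>

/* Performs two strcpy calls and returns 1 if both copies came out intact. */
int copy_examples(void)
{
    char first[8];
    char second[32] = "world";      /* one region, logically split in halves */

    /* 1: source and destination live in different regions. */
    strcpy(first, "hello");

    /* 2: both pointers fall inside the same region, yet the byte ranges
     * second[0..5] and second[16..21] are disjoint. */
    strcpy(second + 16, second);

    return strcmp(first, "hello") == 0 && strcmp(second + 16, "world") == 0;
}
```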
Both invocations of strcpy are given non-overlapping arguments, even though in the second case they belong to the same allocated region. A similar thing would happen, for example, for two arrays in the same class or structure.
However, since we know the string’s length, checking for overlap is simpler and doesn’t require any additional API calls. The buffers overlap when dest <= src and dest + length >= src, or, symmetrically, when src <= dest and src + length >= dest. Note that if you want to be really super strict about it, then comparing pointers that don’t point within the same array is undefined behavior according to the C standard. I don’t really know how to deal with this (checking ASAN’s base addresses is not enough) but it’s likely a non-issue since we’re making platform-specific assumptions anyway.
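A minimal sketch of the overlap test, covering both orderings of the two buffers. The name ranges_overlap is hypothetical. Converting the pointers to uintptr_t is one possible way to avoid comparing unrelated pointers directly; it assumes a flat address space where the integer values order the same way as the addresses, which is a platform assumption and not something the C standard promises.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the byte ranges [src, src + length] and [dest, dest + length]
 * share at least one byte, 0 otherwise. The ranges are inclusive of
 * +length because strcpy also copies the terminating '\0'. */
int ranges_overlap(const char *dest, const char *src, size_t length)
{
    uintptr_t d = (uintptr_t)dest;
    uintptr_t s = (uintptr_t)src;

    /* Two closed intervals intersect exactly when each one starts no later
     * than the other one ends. Address wrap-around is ignored here. */
    return d <= s + length && s <= d + length;
}
```

With a string of length 5 at buf, a destination of buf + 5 still overlaps (it would overwrite the terminator while copying), whereas buf + 6 does not.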
The full precondition for strcpy can be checked like this:
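A minimal sketch that puts the three checks together. To keep it self-contained and runnable without ASan, the region of each pointer is passed in as a small struct; in the setting of this article those bounds would come from __asan_locate_address. The struct and all function names are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a region, e.g. as reported by ASan. */
struct region {
    const char *base;   /* first valid byte, or NULL for an invalid pointer */
    size_t size;        /* number of valid bytes */
};

/* 1 if p points into r, 0 otherwise. */
static int inside(const char *p, struct region r)
{
    return p != NULL && r.base != NULL && p >= r.base && p < r.base + r.size;
}

/* Length of the string at str, or -1 if it is not terminated inside r. */
static int64_t length_in(const char *str, struct region r)
{
    if (!inside(str, r))
        return -1;
    for (const char *p = str; p < r.base + r.size; p++) {
        if (*p == '\0')
            return (int64_t)(p - str);
    }
    return -1;
}

/* Biggest x such that ptr[0] .. ptr[x] lie inside r, or -1. */
static int64_t space_left_in(const char *ptr, struct region r)
{
    if (!inside(ptr, r))
        return -1;
    return (int64_t)(r.size - (size_t)(ptr - r.base)) - 1;
}

/* Returns 1 if strcpy(dest, src) satisfies the man-page precondition. */
int strcpy_precondition(const char *dest, struct region dest_region,
                        const char *src, struct region src_region)
{
    int64_t length = length_in(src, src_region);
    if (length < 0)
        return 0;                   /* src is not a valid string */
    if (space_left_in(dest, dest_region) < length)
        return 0;                   /* no room for length + 1 bytes */

    uintptr_t d = (uintptr_t)dest;
    uintptr_t s = (uintptr_t)src;
    uintptr_t n = (uintptr_t)length;
    if (d <= s + n && s <= d + n)
        return 0;                   /* the byte ranges overlap */
    return 1;
}
```

The checks run in the order the precondition is stated: the source must be a string before its length can be used for the size and overlap tests.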