Shared memory in nginx

July 12th, 2008

Nginx, while being unthreaded, allows worker processes to share memory between them. However, this is quite different from the standard pool allocator, as the shared segment has a fixed size and cannot be resized without restarting nginx or destroying its contents in some other way.

The quick and dirty guide to shared memory in nginx

A (fore)word of caution

First of all, caveat hacker. This guide has been written several months after hands-on experience with shared memory in nginx and while I try my best to be accurate (and have spent some time refreshing my memory), in no way is it guaranteed. You've been warned.

Also, 100% of this knowledge comes from reading the source and reverse-engineering the core concepts, so there are probably better ways to do most of the stuff described.

Oh, and this guide is based on 0.6.31, though 0.5.x is 100% compatible AFAIK and 0.7.x also brings no compatibility-breaking changes that I know of.

For real-world usage of shared memory in nginx, see my upstream_fair module.

This probably does not work on Windows at all. Core dumps in the rear-view mirror are closer than they appear.

Creating and using a shared memory segment

To create a shared memory segment in nginx, you need to:

  • provide a constructor function to initialise the segment
  • call ngx_shared_memory_add

These two points contain the main gotchas (that I came across), namely:

  1. Your constructor will be called multiple times and it's up to you to find out whether you're being called for the first time (and should set something up), or not (and should probably leave everything alone). The prototype for the shared memory constructor looks like:

    static ngx_int_t init(ngx_shm_zone_t *shm_zone, void *data);
    

    The data variable will contain the contents of oshm_zone->data, where oshm_zone is the "old" shm zone descriptor (more about it later). This variable is the only value that can survive a reload, so you must use it if you don't want to lose the contents of your shared memory.

    Your constructor function will probably look roughly similar to the one in upstream_fair, i.e.:

    static ngx_int_t
    init(ngx_shm_zone_t *shm_zone, void *data)
    {
    	if (data) { /* we're being reloaded, propagate the data "cookie" */
    		shm_zone->data = data;
    		return NGX_OK;
    	}
    
    	/* set up whatever structures you wish to keep in the shm */
    
    	/* initialise shm_zone->data so that we know we have
    	been called; if nothing interesting comes to your mind, try
    	shm_zone->shm.addr or, if you're desperate, (void*) 1, just set
    	the value to something non-NULL for future invocations
    	*/
    	shm_zone->data = something_interesting;
    
    	return NGX_OK;
    }
    

  2. You must be careful about when you access the shm segment.

    The interface for adding a shared memory segment looks like:

    ngx_shm_zone_t *
    ngx_shared_memory_add(ngx_conf_t *cf, ngx_str_t *name, size_t size,
    	void *tag);
    

    cf is the reference to the config file (you'll probably create the segment in response to a config option), name is the name of the segment (as a ngx_str_t, i.e. a counted string), size is the size in bytes (which will usually get rounded up to the nearest multiple of the page size, e.g. 4KB on many popular architectures) and tag is a, well, tag for detecting naming conflicts.

    If you call ngx_shared_memory_add multiple times with the same name, tag and size, you'll get only a single segment. If you specify different names, you'll get several distinct segments, and if you specify the same name but a different size or tag, you'll get an error. A good choice for the tag value could be e.g. the pointer to your module descriptor.

    After you call ngx_shared_memory_add and receive the new shm_zone descriptor, you must set up the constructor in shm_zone->init. Wait... after you add the segment? Yes, and that's a major gotcha. This implies that the segment is not created while calling ngx_shared_memory_add (because you specify the constructor only later). What really happens looks like this (grossly simplified):

    1. parse the whole config file, noting requested shm segments

    2. afterwards, create/destroy all the segments in one go

      The constructors are called here. Note that every time your ctor is called, it is with another value of shm_zone. The reason is that the descriptor lives as long as the cycle (generation in Apache terms) while the segment lives as long as the master and all the workers. To let some data survive a reload, you have access to the old descriptor's ->data field (mentioned above).

    3. (re)start workers which begin handling requests

    4. upon receipt of SIGHUP, goto 1

    Also, you really must set the constructor, otherwise nginx will consider your segment unused and won't create it at all.

    Now that you know it, it's pretty clear that you cannot rely on having access to the shared memory while parsing the config. You can access the whole segment as shm_zone->shm.addr (which will be NULL before the segment gets really created). Any access after the first parsing run (e.g. inside request handlers or on subsequent reloads) should be fine.
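
The parse/create/reload dance above is easier to see in a toy model. The sketch below is plain C with no nginx headers: every toy_* and demo_* name is made up for illustration, calloc stands in for the real shared segment, and details such as resizing a zone on reload are ignored. It only tries to show that registration creates nothing, that zones without a constructor are skipped, and that the old descriptor's data is handed to the new constructor.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for illustration only; these are NOT the nginx types. */
typedef struct toy_zone toy_zone_t;
typedef int (*toy_zone_init_pt)(toy_zone_t *zone, void *data);

struct toy_zone {
    const char       *name;
    size_t            size;
    void             *tag;
    void             *addr;  /* NULL until the segment really exists */
    void             *data;  /* the "cookie" that survives a reload */
    toy_zone_init_pt  init;  /* constructor, set AFTER registration */
};

#define TOY_MAX_ZONES 8

typedef struct {
    toy_zone_t zones[TOY_MAX_ZONES];
    int        nzones;
} toy_cycle_t;

/* Step 1 (config parsing): only note the request, create nothing. */
toy_zone_t *
toy_zone_add(toy_cycle_t *cycle, const char *name, size_t size, void *tag)
{
    toy_zone_t *z;
    int         i;

    for (i = 0; i < cycle->nzones; i++) {
        z = &cycle->zones[i];
        if (strcmp(z->name, name) != 0) {
            continue;
        }
        /* same name: tag and size must agree, otherwise it is a conflict */
        return (z->tag == tag && z->size == size) ? z : NULL;
    }
    if (cycle->nzones == TOY_MAX_ZONES) {
        return NULL;
    }
    z = &cycle->zones[cycle->nzones++];
    memset(z, 0, sizeof(*z));
    z->name = name;
    z->size = size;
    z->tag = tag;
    return z;
}

/* Step 2 (after parsing): create or inherit segments, then call the ctors. */
int
toy_cycle_init(toy_cycle_t *cycle, toy_cycle_t *old)
{
    toy_zone_t *z, *oz;
    void       *data;
    int         i, j;

    for (i = 0; i < cycle->nzones; i++) {
        z = &cycle->zones[i];
        if (z->init == NULL) {
            continue;                /* no ctor: zone is treated as unused */
        }
        data = NULL;
        for (j = 0; old != NULL && j < old->nzones; j++) {
            oz = &old->zones[j];
            if (oz->addr != NULL && strcmp(oz->name, z->name) == 0) {
                z->addr = oz->addr;  /* the segment outlives the descriptor */
                data = oz->data;
                break;
            }
        }
        if (z->addr == NULL) {
            z->addr = calloc(1, z->size);  /* stands in for the real shm */
            if (z->addr == NULL) {
                return -1;
            }
        }
        if (z->init(z, data) != 0) {
            return -1;
        }
    }
    return 0;
}

int demo_setup_runs;  /* counts how often the real setup happened */

/* Same shape as the constructor shown earlier. */
int
demo_init(toy_zone_t *zone, void *data)
{
    if (data) {                      /* reload: just propagate the cookie */
        zone->data = data;
        return 0;
    }
    demo_setup_runs++;
    zone->data = zone->addr;         /* anything non-NULL will do */
    return 0;
}
```

Running two cycles through this model (the second one with the first as "old") gives two different descriptors pointing at the same segment, with the setup branch of demo_init executed only once.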

Using the slab allocator

Now that you have your new and shiny shm segment, how do you use it? The simplest way is to use another memory tool that nginx puts at your disposal, namely the slab allocator. Nginx is nice enough to initialise the slab for you in every new shm segment, so you can either use it, or ignore the slab structures and overwrite them with your own data.

The interface consists of two functions:

  • void *ngx_slab_alloc(ngx_slab_pool_t *pool, size_t size);
  • void ngx_slab_free(ngx_slab_pool_t *pool, void *p);

The first argument is simply (ngx_slab_pool_t *)shm_zone->shm.addr and the second one is either the size of the block to allocate, or the pointer to the block to free. (Trivia: not once is ngx_slab_free called in vanilla nginx code.)
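
To get a feel for what "an allocator living at the start of a fixed-size shared segment" means, here is a stand-alone sketch that does not use nginx at all. It maps an anonymous shared region with mmap, keeps a tiny bump-allocator header at its start (where nginx keeps its slab pool structure), and lets a forked "worker" write into an allocated block. All mini_* names are hypothetical, there is no free and no locking, and it assumes a POSIX system with MAP_ANONYMOUS.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical allocator header stored at the very start of the segment. */
typedef struct {
    size_t used;  /* bytes handed out so far, header included */
    size_t size;  /* total segment size; it never grows */
} mini_pool_t;

/* Map a shared anonymous segment and set up the header inside it. */
mini_pool_t *
mini_segment_create(size_t size)
{
    mini_pool_t *pool;
    void        *p;

    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    pool = p;
    pool->used = sizeof(mini_pool_t);
    pool->size = size;
    return pool;
}

/* Bump allocation, rounded up to 16 bytes; NULL once the segment is full. */
void *
mini_alloc(mini_pool_t *pool, size_t n)
{
    void *p;

    n = (n + 15) & ~(size_t) 15;
    if (n > pool->size - pool->used) {
        return NULL;
    }
    p = (char *) pool + pool->used;
    pool->used += n;
    return p;
}

/* Fork a "worker", let it write into a block, read it back in the parent. */
int
mini_demo(char *out, size_t outlen)
{
    mini_pool_t *pool;
    char        *msg;
    pid_t        pid;
    int          status;

    if (outlen == 0) {
        return -1;
    }
    pool = mini_segment_create(4096);
    if (pool == NULL) {
        return -1;
    }
    msg = mini_alloc(pool, 32);
    if (msg == NULL) {
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {                        /* child: the "worker" */
        strcpy(msg, "hello from the worker");
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0) {    /* parent waits, then reads */
        return -1;
    }
    strncpy(out, msg, outlen - 1);
    out[outlen - 1] = '\0';
    munmap(pool, 4096);
    return 0;
}
```

The parent sees the child's write because the mapping is shared across fork, and mini_alloc starts returning NULL as soon as the fixed-size segment runs out, which is exactly the situation you have to plan for with a real shm zone.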

Spinlocks, atomic memory access

Remember that shared memory is inherently dangerous because you can have multiple processes accessing it at the same time. The slab allocator has a per-segment lock (shpool->mutex) which is used to protect the segment against concurrent modifications.

You can also acquire and release the lock yourself, which is useful if you want to implement some more complicated operations on the segment, like searching or walking a tree. The two snippets below are essentially equivalent:

/*
void *new_block;
ngx_slab_pool_t *shpool = (ngx_slab_pool_t *)shm_zone->shm.addr;
*/

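/* snippet 1: ngx_slab_alloc takes the lock itself */
new_block = ngx_slab_alloc(shpool, ngx_pagesize);

/* snippet 2: take the lock yourself, then use the _locked variant */
ngx_shmtx_lock(&shpool->mutex);
new_block = ngx_slab_alloc_locked(shpool, ngx_pagesize);
ngx_shmtx_unlock(&shpool->mutex);

In fact, ngx_slab_alloc looks almost exactly like the second snippet.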
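
The same "plain function = lock + _locked variant + unlock" pattern can be sketched without nginx. The code below is an assumption-laden model, not nginx source: the locked_pool_* names are invented, the lock is a C11 atomic_flag spinlock standing in for shpool->mutex, and the allocator is a trivial bump allocator that ignores alignment. The last function shows why you might want to take the lock yourself: two allocations that must succeed or fail together.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical pool: a spinlock plus a tiny bump allocator. */
typedef struct {
    atomic_flag   lock;      /* stands in for shpool->mutex */
    size_t        used;
    unsigned char mem[256];
} locked_pool_t;

void
locked_pool_init(locked_pool_t *pool)
{
    atomic_flag_clear(&pool->lock);
    pool->used = 0;
}

/* Spin until the flag was previously clear, i.e. until we own the lock. */
void
locked_pool_lock(locked_pool_t *pool)
{
    while (atomic_flag_test_and_set_explicit(&pool->lock,
                                             memory_order_acquire))
    {
        /* busy-wait: this is what burns CPU under contention */
    }
}

void
locked_pool_unlock(locked_pool_t *pool)
{
    atomic_flag_clear_explicit(&pool->lock, memory_order_release);
}

/* Caller must already hold the lock. */
void *
locked_pool_alloc_locked(locked_pool_t *pool, size_t n)
{
    void *p;

    if (n > sizeof(pool->mem) - pool->used) {
        return NULL;
    }
    p = pool->mem + pool->used;
    pool->used += n;
    return p;
}

/* Convenience wrapper: lock, allocate, unlock. */
void *
locked_pool_alloc(locked_pool_t *pool, size_t n)
{
    void *p;

    locked_pool_lock(pool);
    p = locked_pool_alloc_locked(pool, n);
    locked_pool_unlock(pool);
    return p;
}

/* Compound operation: both blocks or neither, under a single lock. */
int
locked_pool_alloc_pair(locked_pool_t *pool, size_t na, size_t nb,
                       void **a, void **b)
{
    size_t room;
    int    rc = -1;

    locked_pool_lock(pool);
    room = sizeof(pool->mem) - pool->used;
    if (na <= room && nb <= room - na) {
        *a = locked_pool_alloc_locked(pool, na);
        *b = locked_pool_alloc_locked(pool, nb);
        rc = 0;
    }
    locked_pool_unlock(pool);
    return rc;
}
```

Calling locked_pool_alloc from inside locked_pool_alloc_pair would spin forever on a lock the caller already holds, which is the practical reason the _locked variants exist.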

If you perform any operations that rely on the segment not changing underneath you (no new allocations or, more to the point, no frees happening in the meantime), protect them with the slab mutex. However, remember that nginx mutexes are implemented as spinlocks (non-sleeping), so while they are very fast in the uncontended case, they can easily eat 100% CPU when waiting. So don't do any long-running operations while holding the mutex (especially I/O, but you should avoid any system calls at all).

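You can also use your own mutexes for more fine-grained locking. Be careful which family you pick, though: as far as I can tell, ngx_mutex_init(), ngx_mutex_lock() and ngx_mutex_unlock() belong to nginx's thread support and do nothing useful in a non-threaded build, so for locks shared between worker processes the ngx_shmtx_t functions (ngx_shmtx_create(), ngx_shmtx_lock() and ngx_shmtx_unlock(), the same ones the slab mutex uses) are the safer choice.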

As an alternative to locks, you can use atomic variables which are guaranteed to be read or written in an uninterruptible way (no worker process may see the value halfway as it's being written by another one).

Atomic variables are defined with the type ngx_atomic_t, which is a volatile ngx_atomic_uint_t (unsigned); ngx_atomic_int_t is the signed counterpart. They should have at least 32 bits. To simply read or unconditionally set an atomic variable, you don't need any special constructs:

ngx_atomic_t i = an_atomic_var;
an_atomic_var = i + 5;

Note that anything can happen between the two lines: context switches, execution of code on other CPUs, etc. If another worker changes an_atomic_var in between, its update is silently overwritten.

To atomically read and modify a variable, you have two functions (very platform-specific) with their interface declared in src/os/unix/ngx_atomic.h:

  • ngx_atomic_cmp_set(lock, old, new)

    Atomically compares *lock with old and, only if they are equal, stores new under the same address. Returns 1 if *lock was equal to old (and has been overwritten), 0 otherwise (in which case *lock is left untouched).

  • ngx_atomic_fetch_add(value, add)

    Atomically adds add to *value and returns the old *value.
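
If you want to play with these semantics outside nginx, C11's stdatomic.h has close analogues. The sketch below is not the nginx implementation (that one is per-platform assembly or compiler builtins); the demo_* wrappers are hypothetical names that mimic the two interfaces, and the last function shows the usual retry loop for a read-modify-write that has no dedicated primitive.

```c
#include <stdatomic.h>

/*
 * Analogue of compare-and-set: store new_val only if *v still equals old.
 * Returns 1 when the store happened, 0 when *v had a different value.
 */
int
demo_cmp_set(atomic_ulong *v, unsigned long old, unsigned long new_val)
{
    /* on failure the library writes the current value into 'old',
       which we simply discard here */
    return atomic_compare_exchange_strong(v, &old, new_val);
}

/* Analogue of fetch-and-add: add to *v and return the value it had before. */
unsigned long
demo_fetch_add(atomic_ulong *v, unsigned long add)
{
    return atomic_fetch_add(v, add);
}

/*
 * Atomic "double the value": read, compute, and try to publish the result.
 * If somebody else changed *v in between, the compare fails and we retry
 * with the fresh value instead of overwriting their update.
 */
unsigned long
demo_atomic_double(atomic_ulong *v)
{
    unsigned long old;

    do {
        old = atomic_load(v);
    } while (!demo_cmp_set(v, old, old * 2));

    return old * 2;
}
```

Compare this with the two-line read-then-write example earlier: the retry loop closes the window in which another process could sneak in an update.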

Using rbtrees

OK, you have your data neatly allocated, protected with a suitable lock but you'd also like to organise it somehow. Again, nginx has a very nice structure just for this purpose - a red-black tree.

Highlights (API-wise):

  • requires an insertion callback, which links the new element into the tree (probably according to some predefined order) and calls ngx_rbt_red(the_newly_added_node) to colour it red; ngx_rbtree_insert, which invokes the callback, rebalances the tree afterwards
  • requires all leaves to be set to a predefined sentinel object (not NULL)
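
Both points can be illustrated with a stripped-down model. The sketch below is not nginx code: the demo_* names are invented, and the rebalancing step that a real red-black tree performs after the callback is left out on purpose, so what you get is a plain binary search tree. It only shows the two API quirks: leaves point at a sentinel node rather than NULL, and the insertion callback descends to a sentinel slot, links the node in and marks it red.

```c
#include <stddef.h>

typedef struct demo_node demo_node_t;

struct demo_node {
    unsigned long  key;
    demo_node_t   *left;
    demo_node_t   *right;
    demo_node_t   *parent;
    unsigned char  color;     /* 1 = red, 0 = black */
};

typedef struct {
    demo_node_t *root;
    demo_node_t *sentinel;    /* every leaf points here, never at NULL */
} demo_tree_t;

void
demo_tree_init(demo_tree_t *tree, demo_node_t *sentinel)
{
    sentinel->color = 0;      /* the sentinel is always black */
    tree->root = sentinel;
    tree->sentinel = sentinel;
}

/* The "insertion callback": find the sentinel slot, link in, colour red. */
void
demo_insert_value(demo_node_t *temp, demo_node_t *node, demo_node_t *sentinel)
{
    demo_node_t **p;

    for ( ;; ) {
        p = (node->key < temp->key) ? &temp->left : &temp->right;
        if (*p == sentinel) {
            break;
        }
        temp = *p;
    }

    *p = node;
    node->parent = temp;
    node->left = sentinel;
    node->right = sentinel;
    node->color = 1;
}

/* Simplified driver: a real red-black insert would rebalance at the end. */
void
demo_tree_insert(demo_tree_t *tree, demo_node_t *node)
{
    if (tree->root == tree->sentinel) {
        node->parent = NULL;
        node->left = tree->sentinel;
        node->right = tree->sentinel;
        node->color = 0;      /* the root is black */
        tree->root = node;
        return;
    }
    demo_insert_value(tree->root, node, tree->sentinel);
    /* rebalancing (rotations and recolouring) intentionally omitted */
}

/* Walking the tree: stop at the sentinel, not at NULL. */
demo_node_t *
demo_tree_find(demo_tree_t *tree, unsigned long key)
{
    demo_node_t *n = tree->root;

    while (n != tree->sentinel) {
        if (key == n->key) {
            return n;
        }
        n = (key < n->key) ? n->left : n->right;
    }
    return NULL;
}
```

In shared memory the nodes and the sentinel would of course have to be allocated from the segment itself, so that the pointers mean the same thing in every worker.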

This guide is about shared memory, not rbtrees so shoo! Go read the source for upstream_fair to see creating and walking an rbtree in action.
