Shared memory in nginx
July 12th, 2008
The quick and dirty guide to shared memory in nginx
A (fore)word of caution
First of all, caveat hacker. This guide has been written several months after hands-on experience with shared memory in nginx and while I try my best to be accurate (and have spent some time refreshing my memory), in no way is it guaranteed. You've been warned.
Also, 100% of this knowledge comes from reading the source and reverse-engineering the core concepts, so there are probably better ways to do most of the stuff described.
Oh, and this guide is based on 0.6.31, though 0.5.x is 100% compatible AFAIK and 0.7.x also brings no compatibility-breaking changes that I know of.
For real-world usage of shared memory in nginx, see my upstream_fair module.
This probably does not work on Windows at all. Core dumps in the rear mirror are closer than they appear.
Creating and using a shared memory segment
To create a shared memory segment in nginx, you need to:
- provide a constructor function to initialise the segment
- call ngx_shared_memory_add
These two points contain the main gotchas (that I came across), namely:
- Your constructor will be called multiple times and it's up to you to find out whether you're called the first time (and should set something up), or not (and should probably leave everything alone). The prototype for the shared memory constructor looks like:
    static ngx_int_t init(ngx_shm_zone_t *shm_zone, void *data);
  The data variable will contain the contents of oshm_zone->data, where oshm_zone is the "old" shm zone descriptor (more about it later). This variable is the only value that can survive a reload, so you must use it if you don't want to lose the contents of your shared memory.
  Your constructor function will probably look roughly similar to the one in upstream_fair, i.e.:
    static ngx_int_t init(ngx_shm_zone_t *shm_zone, void *data)
    {
        if (data) {
            /* we're being reloaded, propagate the data "cookie" */
            shm_zone->data = data;
            return NGX_OK;
        }

        /* set up whatever structures you wish to keep in the shm */

        /* initialise shm_zone->data so that we know we have been called;
           if nothing interesting comes to your mind, try shm_zone->shm.addr
           or, if you're desperate, (void*) 1, just set the value to something
           non-NULL for future invocations */
        shm_zone->data = something_interesting;

        return NGX_OK;
    }
- You must be careful about when you access the shm segment.
The interface for adding a shared memory segment looks like:
ngx_shm_zone_t * ngx_shared_memory_add(ngx_conf_t *cf, ngx_str_t *name, size_t size, void *tag);
cf is the reference to the config file (you'll probably create the segment in response to a config option), name is the name of the segment (as an ngx_str_t, i.e. a counted string), size is the size in bytes (which will usually get rounded up to the nearest multiple of the page size, e.g. 4KB on many popular architectures) and tag is a, well, tag for detecting naming conflicts. If you call ngx_shared_memory_add multiple times with the same name, tag and size, you'll get only a single segment. If you specify different names, you'll get several distinct segments and if you specify the same name but different size or tag, you'll get an error. A good choice for the tag value could be e.g. the pointer to your module descriptor.
After you call ngx_shared_memory_add and receive the new shm_zone descriptor, you must set up the constructor in shm_zone->init. Wait... after you add the segment? Yes, and that's a major gotcha. This implies that the segment is not created while calling ngx_shared_memory_add (because you specify the constructor only later). What really happens looks like this (grossly simplified):
1. parse the whole config file, noting requested shm segments
2. afterwards, create/destroy all the segments in one go
   The constructors are called here. Note that every time your ctor is called, it is with another value of shm_zone. The reason is that the descriptor lives as long as the cycle (generation in Apache terms) while the segment lives as long as the master and all the workers. To let some data survive a reload, you have access to the old descriptor's ->data field (mentioned above).
3. (re)start workers which begin handling requests
4. upon receipt of SIGHUP, goto 1
Also, you really must set the constructor, otherwise nginx will consider your segment unused and won't create it at all.
Now that you know it, it's pretty clear that you cannot rely on having access to the shared memory while parsing the config. You can access the whole segment as shm_zone->shm.addr (which will be NULL before the segment gets really created). Any access after the first parsing run (e.g. inside request handlers or on subsequent reloads) should be fine.
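To make the reload dance a bit more tangible, here is a toy model of it in plain C. It is only a sketch: everything prefixed with demo_ is made up for illustration, and the ngx_shm_zone_t in it is a cut-down stand-in so the code compiles outside the nginx tree, not the real structure. The real create/reuse logic lives in the nginx core and is more involved. What it shows is a fresh descriptor per cycle, shm.addr being NULL during config parsing, and the old descriptor's data being handed to the constructor.

```c
#include <stddef.h>

/* Reduced stand-ins for the nginx definitions (assumption: just enough
   fields for this sketch; the real ones come from the nginx headers). */
typedef long ngx_int_t;
#define NGX_OK 0

typedef struct ngx_shm_zone_s ngx_shm_zone_t;
struct ngx_shm_zone_s {
    void       *data;                                     /* the "cookie" */
    struct { unsigned char *addr; size_t size; } shm;     /* the segment  */
    ngx_int_t (*init)(ngx_shm_zone_t *zone, void *data);  /* constructor  */
};

static int demo_setup_runs;   /* hypothetical: counts real initialisations */

static ngx_int_t
demo_init(ngx_shm_zone_t *shm_zone, void *data)
{
    if (data) {
        /* reload: the segment is already set up, keep the cookie */
        shm_zone->data = data;
        return NGX_OK;
    }

    /* first call: this is where the structures would be set up */
    demo_setup_runs++;
    shm_zone->data = shm_zone->shm.addr;

    return NGX_OK;
}

/* Hypothetical model of one cycle: a new descriptor is filled in while
   "parsing the config", then the segment is attached and the constructor
   is called with the OLD descriptor's data (NULL on the very first start). */
static ngx_int_t
demo_start_cycle(ngx_shm_zone_t *zone, ngx_shm_zone_t *old,
    unsigned char *segment, size_t size)
{
    /* config parsing: the zone is only noted, there is no memory yet */
    zone->data = NULL;
    zone->shm.addr = NULL;
    zone->shm.size = size;
    zone->init = demo_init;     /* set after the zone has been added */

    /* after parsing: segments get created (or reused) in one go */
    zone->shm.addr = segment;

    return zone->init(zone, old ? old->data : NULL);
}
```

Running two cycles against the same buffer gives two different descriptors, but the second one ends up with the same data pointer as the first and the set-up code runs only once.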
Using the slab allocator
Now that you have your new and shiny shm segment, how do you use it? The simplest way is to use another memory tool that nginx puts at your disposal, namely the slab allocator. Nginx is nice enough to initialise the slab for you in every new shm segment, so you can either use it, or ignore the slab structures and overwrite them with your own data.
The interface consists of two functions:
void *ngx_slab_alloc(ngx_slab_pool_t *pool, size_t size);
void ngx_slab_free(ngx_slab_pool_t *pool, void *p);
The first argument is (ngx_slab_pool_t *)shm_zone->shm.addr and the other one is either the size of the block to allocate, or the pointer to the block to free. (trivia: not once is ngx_slab_free called in vanilla nginx code)
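A rough sketch of how the two calls might be wrapped for a record type of your own. The demo_peer_t structure and the demo_ functions are hypothetical. To keep the snippet runnable outside nginx, ngx_slab_pool_t, ngx_slab_alloc and ngx_slab_free are replaced here by stubs that forward to malloc() and free(); in a module you would drop the stubs and pass (ngx_slab_pool_t *) shm_zone->shm.addr as the pool.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stubs standing in for the nginx slab API (assumption: same signatures
   as quoted above, but backed by the heap instead of the shm segment). */
typedef struct { int unused; } ngx_slab_pool_t;

static void *
ngx_slab_alloc(ngx_slab_pool_t *pool, size_t size)
{
    (void) pool;
    return malloc(size);
}

static void
ngx_slab_free(ngx_slab_pool_t *pool, void *p)
{
    (void) pool;
    free(p);
}

/* Hypothetical record kept in shared memory. */
typedef struct {
    unsigned long  requests;
    unsigned long  failures;
} demo_peer_t;

static demo_peer_t *
demo_peer_create(ngx_slab_pool_t *shpool)
{
    demo_peer_t  *peer;

    peer = ngx_slab_alloc(shpool, sizeof(demo_peer_t));
    if (peer == NULL) {
        return NULL;    /* allocation failed, e.g. the segment is full */
    }

    /* do not rely on the block being zeroed, clear it explicitly */
    memset(peer, 0, sizeof(demo_peer_t));

    return peer;
}

static void
demo_peer_destroy(ngx_slab_pool_t *shpool, demo_peer_t *peer)
{
    ngx_slab_free(shpool, peer);
}
```

The segment has a fixed size, so the NULL check is not optional: once the slab runs out of room, the allocation simply fails.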
Spinlocks, atomic memory access
Remember that shared memory is inherently dangerous because you can have
multiple processes accessing it at the same time. The slab allocator has
a per-segment lock (shpool->mutex) which is used to protect the segment
against concurrent modifications.
You can also acquire and release the lock yourself, which is useful if you want to implement some more complicated operations on the segment, like searching or walking a tree. The two snippets below are essentially equivalent:
/* void *new_block; ngx_slab_pool_t *shpool = (ngx_slab_pool_t *)shm_zone->shm.addr; */
new_block = ngx_slab_alloc(shpool, ngx_pagesize);

ngx_shmtx_lock(&shpool->mutex);
new_block = ngx_slab_alloc_locked(shpool, ngx_pagesize);
ngx_shmtx_unlock(&shpool->mutex);

In fact, ngx_slab_alloc looks almost exactly like above.
If you perform any operations which depend on no new allocations (or, more to the point, frees), protect them with the slab mutex. However, remember that nginx mutexes are implemented as spinlocks (non-sleeping), so while they are very fast in the uncontended case, they can easily eat 100% CPU when waiting. So don't do any long-running operations while holding the mutex (especially I/O, but you should avoid any system calls at all).
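If you wonder why a spinlock burns CPU while waiting, a toy version makes it obvious. This is a simplified sketch built on the GCC/Clang __sync builtins, with hypothetical demo_ names; it is not how ngx_shmtx is actually written, it only shows the busy-wait idea.

```c
/* A toy spinlock: 0 means free, 1 means taken. */
typedef volatile unsigned long demo_spinlock_t;

/* Try once; returns 1 if we took the lock, 0 if somebody else holds it. */
static int
demo_spin_trylock(demo_spinlock_t *lock)
{
    return *lock == 0 && __sync_bool_compare_and_swap(lock, 0, 1);
}

/* Busy-wait until the lock is free. The waiter never sleeps, so a long
   critical section costs every waiting process a full CPU. */
static void
demo_spin_lock(demo_spinlock_t *lock)
{
    while (!demo_spin_trylock(lock)) {
        /* spin */
    }
}

static void
demo_spin_unlock(demo_spinlock_t *lock)
{
    __sync_lock_release(lock);    /* stores 0 with release semantics */
}
```

The loop in demo_spin_lock is the whole story: there is nothing in it that gives the CPU back, so keep whatever sits between lock and unlock as short as you can.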
You can also use your own mutexes for more fine-grained locking, via the
ngx_mutex_init(), ngx_mutex_lock() and ngx_mutex_unlock() functions.
As an alternative to locks, you can use atomic variables which are guaranteed to be read or written in an uninterruptible way (no worker process may see the value halfway as it's being written by another one).
Atomic variables are declared with the type ngx_atomic_t, which is a volatile ngx_atomic_uint_t; ngx_atomic_int_t is the signed counterpart for plain values. They should have at least 32 bits. To simply read or unconditionally set an atomic variable, you don't need any special constructs:
ngx_atomic_t i = an_atomic_var;
an_atomic_var = i + 5;
Note that anything can happen between the two lines: context switches, execution of code on other CPUs, etc.
To atomically read and modify a variable, you have two functions (very
platform-specific) with their interface declared in
src/os/unix/ngx_atomic.h:
ngx_atomic_cmp_set(lock, old, new)
  Atomically compares *lock with old and, only if they are equal, stores new under the same address. Returns 1 if *lock was equal to old (i.e. the store happened) and 0 otherwise.
ngx_atomic_fetch_add(value, add)
  Atomically adds add to *value and returns the old *value.
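Here is a small sketch of both operations at work: a counter bumped with fetch_add and a "highest value seen" updated with the usual compare-and-set retry loop. To make it compile outside nginx, the types and the two macros are defined locally on top of the GCC/Clang __sync builtins. That mapping is an assumption made for this example; nginx picks its own platform-specific implementation. The demo_ functions are hypothetical.

```c
/* Local stand-ins for the nginx definitions (assumption, see above). */
typedef unsigned long               ngx_atomic_uint_t;
typedef volatile ngx_atomic_uint_t  ngx_atomic_t;

#define ngx_atomic_cmp_set(lock, old, set)                                   \
    __sync_bool_compare_and_swap(lock, old, set)
#define ngx_atomic_fetch_add(value, add)                                     \
    __sync_fetch_and_add(value, add)

/* Count one request; returns the counter value before the increment. */
static ngx_atomic_uint_t
demo_count_request(ngx_atomic_t *counter)
{
    return ngx_atomic_fetch_add(counter, 1);
}

/* Raise *peak to seen, unless another worker has already stored more.
   If *peak changes between the read and the cmp_set, the cmp_set fails
   and we simply try again with the fresh value. */
static void
demo_update_peak(ngx_atomic_t *peak, ngx_atomic_uint_t seen)
{
    ngx_atomic_uint_t  old;

    do {
        old = *peak;

        if (old >= seen) {
            return;
        }

    } while (!ngx_atomic_cmp_set(peak, old, seen));
}
```

The retry loop is the standard way to build any read-modify-write operation that fetch_add cannot express on its own.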
Using rbtrees
OK, you have your data neatly allocated, protected with a suitable lock but you'd also like to organise it somehow. Again, nginx has a very nice structure just for this purpose - a red-black tree.
Highlights (API-wise):
- requires an insertion callback, which inserts the element in the tree (probably according to some predefined order) and then calls ngx_rbt_red(the_newly_added_node) to colour it red, so that nginx can rebalance the tree afterwards
- requires all leaves to be set to a predefined sentinel object (not NULL)
This guide is about shared memory, not rbtrees so shoo! Go read the source for upstream_fair to see creating and walking an rbtree in action.