19 Oct 2017

Invariants hidden in callbacks

In this post I will discuss one of my favorite pet peeves: callbacks. This post is language-independent, though it will probably shine through that I mainly work in C, C++ and Node.js. Callbacks are super nice for accomplishing a wide array of tasks: separation of concerns, asynchronous execution, future extension, etc. However, there are several problems hidden in how callbacks can be implemented.


The first problem is an API that doesn't allow a context to be passed through to the callback. In modern languages with closures this is a non-problem, because the function automatically carries the extra state with it. However, in C you sometimes see a callback (which is a function pointer) used like this:

// Header

// A typedef called MyCallback for a pointer to a function that takes no arguments and returns void.
typedef void (*MyCallback)(void);

void registerCallback(MyCallback cb);
void invokeCallback(void);

// Source

#include <stddef.h> // for NULL

static MyCallback g_registeredCallback = NULL;

void registerCallback(MyCallback cb) {
  g_registeredCallback = cb;
}

void invokeCallback(void) {
  if (g_registeredCallback != NULL) {
    g_registeredCallback();
  }
}

We register our own callbacks like this:

#include <stdio.h>

void callbackFunction(void) {
  printf("callback was invoked\n");
}

int main(void) {
  registerCallback(callbackFunction);
  invokeCallback(); // prints "callback was invoked"
  return 0;
}

However, the API doesn't allow me to pass any extra state to the callback, so I can't attach the callback to any "object". The solution is to always pass in an extra context parameter:

// Header

// The callback now receives an opaque context pointer.
typedef void (*MyCallback)(void *context);

void registerCallback(MyCallback cb, void *context);
void invokeCallback(void);

// Source

#include <stddef.h> // for NULL

static MyCallback g_registeredCallback = NULL;
static void *g_registeredCallbackContext = NULL;

void registerCallback(MyCallback cb, void *context) {
  g_registeredCallback = cb;
  g_registeredCallbackContext = context;
}

void invokeCallback(void) {
  if (g_registeredCallback != NULL) {
    // Hand the stored context back so the callback knows which "object" it belongs to.
    g_registeredCallback(g_registeredCallbackContext);
  }
}

Now we have state, and we can connect a specific callback invocation to an object. In JavaScript this is solved automatically, since a function reference carries its closure with it (for non-JavaScript programmers, here is an example of what this means):

function generateCallback() {
  var closureVariable = 32;
  return function() {
    console.log('the closure variable is', closureVariable);
    closureVariable++;
  };
}

let callback1 = generateCallback();
callback1(); // prints 32
callback1(); // prints 33

let callback2 = generateCallback();
callback2(); // prints 32
callback1(); // prints 34
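To connect this back to the C example: because closures carry their state, a JavaScript port of the first, context-free registerCallback API is still enough to attach a callback to a specific object. The sketch below assumes a hypothetical Counter class as the "object"; the arrow function closes over it and plays the role that the void *context parameter plays in the C version.

```javascript
// A context-free registration API, mirroring the first C example:
// registerCallback takes only a function, with no separate context argument.
let registeredCallback = null;

function registerCallback(cb) {
  registeredCallback = cb;
}

function invokeCallback() {
  if (registeredCallback !== null) {
    registeredCallback();
  }
}

// Hypothetical "object" we want the callback attached to.
class Counter {
  constructor(name) {
    this.name = name;
    this.count = 0;
  }

  onEvent() {
    this.count++;
  }
}

const counter = new Counter('clicks');

// The arrow function closes over `counter`, so no context parameter is needed.
registerCallback(() => counter.onEvent());
invokeCallback();
invokeCallback();
console.log(counter.count); // 2
```

Passing counter.onEvent.bind(counter) instead of the arrow function would work the same way; either form bundles the object with the function.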

The next problem is that when we invoke a callback, we need to consider that the callback can do anything. Take the following example, where we store callbacks in an array and later process them and clear the callback queue (this time implemented in JavaScript, so the context problem from above is automatically solved):

let callbacks = [];

function registerCallback(cb) {
  callbacks.push(cb);
}

function processCallbacks() {
  callbacks.forEach(cb => cb());
  callbacks = [];
}

The implementation looks innocent; however, consider the following usage:

registerCallback(() => {
  registerCallback(() => {
    console.log('When is this called?');
  });
});

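and boom: the invariant is broken. forEach only visits the elements that were in the array when it started, so the inner callback is appended but not called in this pass, and the callbacks = [] line then throws it away, so "When is this called?" is never printed. If the loop instead re-read the array length on every iteration, a callback that re-registers itself would keep the loop running forever. This problem has many variations; for instance, if we allow a callback to be unregistered, can we unregister ourselves from within the callback? The common trait of these problems is that some invariant gets violated, i.e. when we wrote the functions we expected the callbacks array not to be modified while we are processing callbacks.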
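The unregister variation can be made concrete. The sketch below adds a hypothetical unregisterCallback function (not part of the API above) and keeps the naive processCallbacks. Because forEach walks the array by index, a callback that splices itself out shifts the next element into the slot that was just visited, so that element is skipped, and the final callbacks = [] then discards it.

```javascript
let callbacks = [];

function registerCallback(cb) {
  callbacks.push(cb);
}

// Hypothetical unregister function, assumed for this illustration.
function unregisterCallback(cb) {
  const index = callbacks.indexOf(cb);
  if (index !== -1) {
    callbacks.splice(index, 1);
  }
}

// The naive processCallbacks from above.
function processCallbacks() {
  callbacks.forEach(cb => cb());
  callbacks = [];
}

const calls = [];

function first() {
  calls.push('first');
  unregisterCallback(first); // removes itself while forEach is still running
}

function second() {
  calls.push('second');
}

registerCallback(first);
registerCallback(second);
processCallbacks();

console.log(calls.join(', ')); // first
```

The second callback was never unregistered, yet it never runs: the mutation shifted it out from under the loop.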

To ensure we don't violate the invariant, we can rewrite processCallbacks like this:

function processCallbacks() {
  let internalCallbacks = callbacks;
  callbacks = [];

  internalCallbacks.forEach(cb => cb());
}

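Now we first take the globally accessible array and replace it with a fresh one, so a callback registered while we process callbacks is not executed in this pass; it ends up in the new array and runs on the next call to processCallbacks.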
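With this version, "When is this called?" has a definite answer. The following sketch replays the nested registration against the rewritten processCallbacks and records the order in a log array (the log is only there for illustration).

```javascript
let callbacks = [];

function registerCallback(cb) {
  callbacks.push(cb);
}

// The rewritten processCallbacks: detach the current queue before running it.
function processCallbacks() {
  const internalCallbacks = callbacks;
  callbacks = [];
  internalCallbacks.forEach(cb => cb());
}

const log = [];

registerCallback(() => {
  log.push('outer');
  registerCallback(() => {
    log.push('inner');
  });
});

processCallbacks(); // runs 'outer'; 'inner' goes into the fresh array
processCallbacks(); // runs 'inner'
```

One consequence worth deciding on explicitly: an unregister function that searches callbacks will no longer see the callbacks in the detached array, so removing a still-pending callback during a pass has no effect on that pass.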

My final point about callbacks is to always invoke them in the same state. By state I mean the call stack, mutexes held, etc. Consider the following example:

let globalVariable = 0;

function doProcess(callback) {
  if (Math.random() < 0.5) {
    callback(Math.random());
    return;
  }

  globalVariable++;
  callback(Math.random());
}

When the callback is invoked, it cannot know whether the global variable has been updated or not. Common versions of this problem are execution paths that invoke the callback with different mutexes held, or a callback that is sometimes called synchronously and sometimes asynchronously. A nice solution is to refactor the code so the callback is only ever invoked from one place. Also notice that refactoring your code so the callback is invoked in a different state (for instance holding different mutexes or executing on a different thread) can lead to mysterious, hard-to-diagnose bugs.
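The synchronous/asynchronous variant is easy to hit with a cache. The sketch below is hypothetical (the cache Map, the two lookup functions and the uppercase "loader" are made up for illustration): getValueInconsistent calls back synchronously on a hit and from a timer on a miss, so callers see two different orderings. The refactored getValue funnels every path into a single invocation site, deliver, which always defers the callback with Node's process.nextTick.

```javascript
// Hypothetical cache, for illustration only.
const cache = new Map();

// Problematic version: synchronous on a cache hit, asynchronous on a miss.
function getValueInconsistent(key, callback) {
  if (cache.has(key)) {
    callback(cache.get(key)); // runs before getValueInconsistent returns
    return;
  }
  setTimeout(() => {
    cache.set(key, key.toUpperCase()); // stand-in for a slow lookup
    callback(cache.get(key)); // runs later, from the timer queue
  }, 0);
}

// Refactored version: every path goes through one invocation site,
// which always defers the callback until after the caller's code.
function getValue(key, callback) {
  const deliver = (value) => process.nextTick(callback, value);
  if (cache.has(key)) {
    deliver(cache.get(key));
    return;
  }
  setTimeout(() => {
    cache.set(key, key.toUpperCase()); // stand-in for a slow lookup
    deliver(cache.get(key));
  }, 0);
}

// Usage: prime the cache, then compare the observed orderings.
cache.set('a', 'A');

const order = [];
getValueInconsistent('a', () => order.push('inconsistent callback'));
order.push('after getValueInconsistent');
getValue('a', () => order.push('getValue callback'));
order.push('after getValue');
```

With getValueInconsistent, code written after the call may run before or after the callback depending on the cache state; with getValue it always runs first.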

To summarize, these bugs come down to broken invariants, and when the invariants are implicit the problems can be hard to spot. The invariants can be broken either by the callback doing things the callback invoker didn't expect or, conversely, by the signaller invoking the callback at times the callback didn't expect to be invoked.

In Node.js we can often get around the problems by executing the callbacks from a process.nextTick callback:

function processCallbacks() {
  callbacks.forEach(cb => process.nextTick(() => cb()));
  callbacks = [];
}

and then we need to accept that our callbacks will always fire asynchronously. In C and C++ there is no general solution for executing deferred callbacks, so in a future post we will look at what our options are there.
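A small check of that trade-off, assuming Node.js: with the nextTick version, processCallbacks returns before any callback has run, and by the time a callback runs, callbacks already points at the fresh array. A callback registered from inside a callback therefore waits for the next call to processCallbacks instead of being lost. The log array is only there for illustration.

```javascript
let callbacks = [];

function registerCallback(cb) {
  callbacks.push(cb);
}

// The process.nextTick variant: each callback is queued on the next-tick
// queue instead of being called inside the loop.
function processCallbacks() {
  callbacks.forEach(cb => process.nextTick(() => cb()));
  callbacks = [];
}

const log = [];

registerCallback(() => {
  log.push('outer');
  // Runs after callbacks was reset, so this lands in the fresh array.
  registerCallback(() => log.push('inner'));
});

processCallbacks();
log.push('processCallbacks returned'); // recorded before 'outer'
```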
