Thursday, June 28, 2012

careful with synchronous operations in async iterator

We use the async flow-control library for Node.js a lot at work. It provides various convenient functions to guide you through the mess of asynchronous programming. Lately, though, we ran into an interesting issue (and lesson) while using it:

Avoid purely synchronous operations inside the iterator, i.e. iterators that invoke the callback before they return. Otherwise, when the number of items to iterate over is large enough, you will exceed the maximum call stack size.

For example, the following snippet gives the basic idea (although, realistically, you don't need async to print a large array ;-)

var async = require('async');
var a = [];

for(var i = 0; i < 3040000; i++) {
 a.push(i);
}

async.forEachSeries(a,
 function (item, cb){
  console.log(item); // non-async operation
  cb();
 },
 function () {
  console.log('all done');
 }
);


When you run it, you will get:

0
1
2
...
RangeError: Maximum call stack size exceeded

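The problem here is that the iterator is entirely synchronous: it calls cb() before it returns, so the next iteration starts while the previous one is still on the call stack. Every item adds more stack frames, and with enough items this ends up maxing out the call stack.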
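To make the stack growth visible, here is a minimal sketch. Note that naiveEachSeries and maxNestingWithSyncCallback are hypothetical names I made up for illustration; this is not the async library's actual source, only a stripped-down version of the same pattern. It counts how many iterator calls are on the stack at the same time when the callback is invoked synchronously.

```javascript
// Hypothetical, stripped-down series iterator. This is NOT the async
// library's real source, only a sketch of the pattern that makes the stack grow.
function naiveEachSeries(items, iterator, done) {
  var index = 0;
  function next() {
    if (index >= items.length) {
      return done();
    }
    // If the iterator calls next() before returning, this call is still
    // on the stack when the next iteration starts.
    iterator(items[index++], next);
  }
  next();
}

// Counts how many iterator calls are on the stack at the same time.
function maxNestingWithSyncCallback(count) {
  var depth = 0;
  var maxDepth = 0;
  naiveEachSeries(new Array(count), function (item, cb) {
    depth++;
    maxDepth = Math.max(maxDepth, depth);
    cb(); // synchronous: the next iteration runs inside this call
    depth--;
  }, function () {});
  return maxDepth;
}

console.log(maxNestingWithSyncCallback(1000)); // prints 1000
```

The nesting depth equals the number of items, so the stack grows linearly with the array. With a few million items, that is more than the engine allows.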

If you really cannot avoid mixing synchronous and asynchronous code in an iterator (most of the time you can!), one simple workaround is to wrap the synchronous code inside a process.nextTick call, so that the current call stack unwinds before the next iteration starts instead of growing with every item.


var async = require('async');
var a = [];

for(var i = 0; i < 3040000; i++) {
 a.push(i);
}

async.forEachSeries(a,
 function (item, cb){
  process.nextTick(function () {
   console.log(item);
   cb();
  });
 },
 function () {
  console.log('all done');
 }
);
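As a rough check of why this helps, the sketch below measures the nesting depth when the callback is deferred with process.nextTick. Again, naiveEachSeries and maxNestingWithDeferredCallback are hypothetical helpers written for this illustration, not part of the async library.

```javascript
// Hypothetical, stripped-down series iterator (not the async library's source).
function naiveEachSeries(items, iterator, done) {
  var index = 0;
  function next() {
    if (index >= items.length) {
      return done();
    }
    iterator(items[index++], next);
  }
  next();
}

// Reports how many iterator bodies are on the stack at the same time
// when each one is deferred with process.nextTick.
function maxNestingWithDeferredCallback(count, report) {
  var depth = 0;
  var maxDepth = 0;
  naiveEachSeries(new Array(count), function (item, cb) {
    process.nextTick(function () {
      depth++;
      maxDepth = Math.max(maxDepth, depth);
      cb(); // only schedules the next tick, then returns right away
      depth--;
    });
  }, function () {
    report(maxDepth);
  });
}

maxNestingWithDeferredCallback(1000, function (maxDepth) {
  console.log(maxDepth); // prints 1
});
```

Each iteration now starts from a fresh stack, so the depth stays at 1 no matter how many items there are.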

This issue applies not only to forEachSeries but also to other functions such as mapSeries, whilst, and until. Here is a more detailed discussion thread, where people proposed a patch to add async.unwind to fix the error.