Salvatore Cordiano

When we talk about data interchange in web applications, JSON is the de facto standard, especially for RESTful web services. JSON won against its antagonist XML (SOAP) without a battle, but that didn’t prevent the development of alternatives like Google’s Protocol Buffers, Apache Avro or MessagePack. To be thorough, we should also mention gzip JSON compression (sometimes called “JSONC”) and BSON, a binary-encoded serialization of JSON-like documents, both derived directly from JSON. In this article we’ll discuss MessagePack in depth.

Figure: JSON vs MessagePack (source: msgpack.org)

What is MessagePack?

« MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON. But it’s faster and smaller. » To start using MessagePack we need to convert our application objects into the MessagePack format: this process is called serialization, while the reverse process is called deserialization. The following example shows what this means in practice. Consider this simple JSON document, containing an integer, a boolean and a string:

{
	"id": 4,
	"isActive": true,
	"fullname": "Homer Simpson"
}

JSON requires 56 bytes to represent this very simple user object (on a single line, with a space after each colon and comma), while MessagePack needs only 38 bytes: a compression ratio of 1.47, i.e. a 32% saving in size. Below is the output of the serialization process for the JSON above:

83 a2 69 64 04 a8 69 73 41 63 74 69 76 65 c3 a8 66 75 6c 6c 6e 61 6d 65 ad 48 6f 6d 65 72 20 53 69 6d 70 73 6f 6e

The official specification describes how MessagePack serialization works. Following it, we can split the hexadecimal output above to highlight and explain each data type:

83                                          // 3-element map
a2 69 64                                    // 2-byte string "id"
04                                          // integer 4
a8 69 73 41 63 74 69 76 65                  // 8-byte string "isActive" 
c3                                          // boolean true
a8 66 75 6c 6c 6e 61 6d 65                  // 8-byte string "fullname"
ad 48 6f 6d 65 72 20 53 69 6d 70 73 6f 6e   // 13-byte string "Homer Simpson"
                                            // total 38 bytes
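To make the byte layout above concrete, here is a minimal encoder sketch in Node.js. It is not the msgpack module used later in the benchmark: encode is a hypothetical helper that handles only the formats appearing in this example (nil, booleans, positive fixint, fixstr and fixmap) and throws for anything else.

```javascript
// Hypothetical minimal MessagePack encoder, limited to the formats used in
// the example above: nil, booleans, positive fixint, fixstr and fixmap.
function encode(value, out = []) {
  if (value === null) {
    out.push(0xc0);                               // nil
  } else if (typeof value === 'boolean') {
    out.push(value ? 0xc3 : 0xc2);                // true / false
  } else if (Number.isInteger(value) && value >= 0 && value <= 0x7f) {
    out.push(value);                              // positive fixint: 0xxxxxxx
  } else if (typeof value === 'string') {
    const bytes = Buffer.from(value, 'utf8');
    if (bytes.length > 31) throw new RangeError('only fixstr (up to 31 bytes) is supported');
    out.push(0xa0 | bytes.length, ...bytes);      // fixstr: 101xxxxx, then the UTF-8 bytes
  } else if (typeof value === 'object' && !Array.isArray(value)) {
    const keys = Object.keys(value);
    if (keys.length > 15) throw new RangeError('only fixmap (up to 15 entries) is supported');
    out.push(0x80 | keys.length);                 // fixmap: 1000xxxx
    for (const key of keys) {
      encode(key, out);                           // each key, followed by its value
      encode(value[key], out);
    }
  } else {
    throw new TypeError('value type not covered by this sketch');
  }
  return out;
}

const user = { id: 4, isActive: true, fullname: 'Homer Simpson' };
const packed = Buffer.from(encode(user));
console.log(packed.toString('hex').match(/../g).join(' '));
// → 83 a2 69 64 04 a8 69 73 41 63 74 69 76 65 c3 a8 66 75 6c 6c 6e 61 6d 65 ad 48 6f 6d 65 72 20 53 69 6d 70 73 6f 6e

// Size of the JSON text as written above (single line, a space after ':' and ',').
const jsonText = '{"id": 4, "isActive": true, "fullname": "Homer Simpson"}';
const jsonSize = Buffer.byteLength(jsonText, 'utf8');
console.log(jsonSize, packed.length, (jsonSize / packed.length).toFixed(2)); // → 56 38 1.47
```

Note that the fully minified form, JSON.stringify(user), is 51 bytes, so against minified JSON the saving for this object is smaller, about 25%.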

Now it’s easy to understand the sentence « Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves » from the headline of the MessagePack website.
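That headline follows from the specification always offering a compact format for small values. The sketch below uses two hypothetical helpers, packInt (limited to the range -2^31 to 2^32-1) and packStrHeader, to pick the shortest format: integers from 0 to 127 (and from -32 to -1) fit in one byte, larger ones add a type byte in front of 1, 2 or 4 payload bytes, and a string of up to 31 bytes needs a single header byte.

```javascript
// Hypothetical helpers: choose the shortest MessagePack format for an integer
// (limited to the -2^31 .. 2^32-1 range) and for a string header.
function packInt(n) {
  if (!Number.isInteger(n) || n < -0x80000000 || n > 0xffffffff) {
    throw new RangeError('this sketch only covers -2^31 .. 2^32-1');
  }
  if (n >= 0) {
    if (n <= 0x7f) return Buffer.from([n]);               // positive fixint: 1 byte
    if (n <= 0xff) return Buffer.from([0xcc, n]);         // uint 8: 2 bytes
    if (n <= 0xffff) {
      const b = Buffer.alloc(3);                          // uint 16: 3 bytes
      b[0] = 0xcd;
      b.writeUInt16BE(n, 1);
      return b;
    }
    const b = Buffer.alloc(5);                            // uint 32: 5 bytes
    b[0] = 0xce;
    b.writeUInt32BE(n, 1);
    return b;
  }
  if (n >= -32) return Buffer.from([n & 0xff]);           // negative fixint: 1 byte
  if (n >= -0x80) return Buffer.from([0xd0, n & 0xff]);   // int 8: 2 bytes
  if (n >= -0x8000) {
    const b = Buffer.alloc(3);                            // int 16: 3 bytes
    b[0] = 0xd1;
    b.writeInt16BE(n, 1);
    return b;
  }
  const b = Buffer.alloc(5);                              // int 32: 5 bytes
  b[0] = 0xd2;
  b.writeInt32BE(n, 1);
  return b;
}

// Header that precedes the UTF-8 bytes of a string, given their length.
function packStrHeader(byteLength) {
  if (byteLength <= 31) return Buffer.from([0xa0 | byteLength]);  // fixstr: 1 byte
  if (byteLength <= 0xff) return Buffer.from([0xd9, byteLength]); // str 8: 2 bytes
  if (byteLength <= 0xffff) {
    const b = Buffer.alloc(3);                                     // str 16: 3 bytes
    b[0] = 0xda;
    b.writeUInt16BE(byteLength, 1);
    return b;
  }
  const b = Buffer.alloc(5);                                       // str 32: 5 bytes
  b[0] = 0xdb;
  b.writeUInt32BE(byteLength, 1);
  return b;
}

for (const n of [4, 200, 70000, -5, -100]) {
  console.log(n, packInt(n).toString('hex'));
}
// prints, one per line: 4 04 / 200 ccc8 / 70000 ce00011170 / -5 fb / -100 d09c
```

The 4 in the user object is therefore a single 0x04 byte, and the 13-byte "Homer Simpson" costs one header byte (0xad) on top of its characters.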

The main features of MessagePack are:

  • it’s designed for network communication and to be transparently converted from and to JSON;
  • it supports in-place updating, so it’s possible to modify part of a stored object without reserializing it as a whole;
  • it has a flexible Remote Procedure Call (RPC) and streaming API implementation;
  • it supports static type checking.

Supported data types

The data types defined by the specification are very similar to JSON’s, with the addition of Binary and Extension:

  • Integer represents an integer;
  • Boolean represents true or false;
  • Nil represents nil;
  • Float represents IEEE 754 double-precision floating-point numbers, including NaN and Infinity;
  • String is a raw type and it represents a UTF-8 string;
  • Binary is a raw type and it represents binary data as a byte array;
  • Array represents a sequence of objects;
  • Map represents a dictionary (key-value pairs of objects);
  • Extension represents a tuple of data whose meaning is defined by applications.
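To see how these types surface in a host language, here is a decoder sketch. decode is a hypothetical function, not the API of the msgpack module, and it covers only a subset of the formats: nil, booleans, positive and negative fixint, uint 8, int 8, float 64, fixstr, str 8, bin 8, fixarray and fixmap. Extension types and the larger formats are deliberately left out and throw.

```javascript
// Hypothetical decoder for a subset of MessagePack formats, showing how each
// type in the specification maps to a JavaScript value.
function decode(buf) {
  let pos = 0;

  // Reserve `len` bytes and return the offset where they start.
  function take(len) {
    if (pos + len > buf.length) throw new RangeError('unexpected end of input');
    const start = pos;
    pos += len;
    return start;
  }

  function str(len) {                               // String -> decoded UTF-8 string
    const start = take(len);
    return buf.toString('utf8', start, start + len);
  }

  function array(size) {                            // Array -> JS array
    const arr = [];
    for (let i = 0; i < size; i++) arr.push(read());
    return arr;
  }

  function map(size) {                              // Map -> plain object (keys coerced to strings)
    const obj = {};
    for (let i = 0; i < size; i++) {
      const key = read();
      obj[key] = read();
    }
    return obj;
  }

  function read() {
    const b = buf[take(1)];
    if (b <= 0x7f) return b;                        // Integer: positive fixint
    if (b >= 0xe0) return b - 0x100;                // Integer: negative fixint
    if ((b & 0xf0) === 0x80) return map(b & 0x0f);  // Map: fixmap
    if ((b & 0xf0) === 0x90) return array(b & 0x0f); // Array: fixarray
    if ((b & 0xe0) === 0xa0) return str(b & 0x1f);  // String: fixstr
    switch (b) {
      case 0xc0: return null;                       // Nil
      case 0xc2: return false;                      // Boolean
      case 0xc3: return true;                       // Boolean
      case 0xc4: {                                  // Binary: bin 8 -> Buffer
        const len = buf[take(1)];
        const start = take(len);
        return buf.subarray(start, start + len);
      }
      case 0xcb: return buf.readDoubleBE(take(8));  // Float: float 64
      case 0xcc: return buf[take(1)];               // Integer: uint 8
      case 0xd0: return buf.readInt8(take(1));      // Integer: int 8
      case 0xd9: return str(buf[take(1)]);          // String: str 8
      default:
        throw new Error('format 0x' + b.toString(16) + ' is not covered by this sketch');
    }
  }

  return read();
}

// The 38 bytes from the serialization example, with the spaces removed.
const example = Buffer.from(
  ('83 a2 69 64 04 a8 69 73 41 63 74 69 76 65 c3 a8 66 75 6c 6c 6e 61 6d 65 ' +
   'ad 48 6f 6d 65 72 20 53 69 6d 70 73 6f 6e').replace(/ /g, ''), 'hex');
console.log(JSON.stringify(decode(example)));
// → {"id":4,"isActive":true,"fullname":"Homer Simpson"}
```

Passing the decoded object to JSON.stringify also illustrates how naturally these types convert back to JSON; Binary values, which JSON lacks, come back as Node.js Buffers.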

A naive benchmark

Up to this point our reasoning has focused on space efficiency, but a good theoretical computer scientist would criticize us for not mentioning time complexity. In fact, the time spent on serialization and deserialization is not negligible. We can, for example, compare the time required to parse a JSON document with the time required to unpack a MessagePack document: it’s not a rigorous approach, but it’s a start.

We wrote two Node.js scripts: one parses a sample JSON document 1 million times, the other unpacks the same data, encoded in MessagePack, 1 million times.

A simplified version of the code looks like this:

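// inside script "test_parse_json.js"
// jsonDocument: the sample document as a JSON string (loading code omitted)
for (var i = 0; i < 1000000; i++) {
    JSON.parse(jsonDocument); // JSON document parsing
}

// inside script "test_unpack_msgpack.js"
var msgpack = require('msgpack');
// msgPackDocument: the same data as a MessagePack Buffer (loading code omitted)
for (var i = 0; i < 1000000; i++) {
    msgpack.unpack(msgPackDocument); // MessagePack document unpacking
}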

To get a rough measure of each script’s running time, we can run it with the time command:

aiace:msgpack parallel$ time node test_parse_json.js

real	0m47.296s
user	0m47.202s
sys	0m0.059s

aiace:msgpack parallel$ time node test_unpack_msgpack.js

real	1m47.244s
user	1m47.050s
sys	0m0.120s

The numbers speak for themselves: the MessagePack binary is smaller than the minified JSON, but unpacking it with the msgpack module took more than twice as long as parsing the JSON with JSON.parse (1m47s versus 47s of real time).
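Timing a script with time also counts Node.js start-up and module loading. A slightly less naive approach, sketched below, measures the loop inside the process with process.hrtime.bigint() after a short warm-up. This API requires Node.js 10.7 or later, so it is newer than the v6.8.1 used for these tests; the bench helper and the sample document are hypothetical, since the benchmark’s real document isn’t reproduced here.

```javascript
// Hypothetical helper: run fn `iterations` times and return the elapsed time in
// milliseconds, measured inside the process so Node.js start-up is excluded.
// Requires Node.js 10.7+ for process.hrtime.bigint().
function bench(label, iterations, fn) {
  const warmUp = Math.min(iterations, 1000);
  for (let i = 0; i < warmUp; i++) fn();            // give the JIT a chance to optimize fn first
  const start = process.hrtime.bigint();            // nanoseconds, as a BigInt
  for (let i = 0; i < iterations; i++) fn();
  const totalMs = Number(process.hrtime.bigint() - start) / 1e6;
  const perCallUs = (totalMs * 1000) / iterations;
  console.log(`${label}: ${totalMs.toFixed(1)} ms total, ${perCallUs.toFixed(3)} µs per call`);
  return totalMs;
}

// Hypothetical sample document: the benchmark's real document is not shown here.
const jsonDocument = '{"id": 4, "isActive": true, "fullname": "Homer Simpson"}';
bench('JSON.parse', 100000, () => JSON.parse(jsonDocument));
```

With the msgpack module installed, the same helper could time msgpack.unpack(msgPackDocument) for a like-for-like comparison; absolute figures will of course depend on the document and on the machine.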

Before going on, note that all tests were run in the environment described below, and the full code of this benchmark is freely available here.

// Machine
OS : Darwin 15.6 (x64)
RAM: 16,384 MB
CPU: 2,200 MHz Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz

// Runtime versions
aiace:msgpack parallel$ node -v
v6.8.1
aiace:msgpack parallel$ npm -v
3.10.9

// Module versions
aiace:msgpack parallel$ npm list msgpack
[email protected] /Users/parallel/Facile/msgpack
└── [email protected]

aiace:msgpack parallel$ npm list fs
[email protected] /Users/parallel/Facile/msgpack
└── [email protected]

aiace:msgpack parallel$ npm list assert
[email protected] /Users/parallel/Facile/msgpack
└── [email protected]

Conclusions

MessagePack allows us to cut network bandwidth consumption by more than 40% with little more than one line of code. A smaller payload means that less data is transmitted, which is very useful in mobile and Internet of Things (IoT) applications, where power efficiency is a major concern; but we should also pay attention to the overall size of each request, to avoid the absurd situation in which the headers (the overhead) are larger than the payload itself.
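The overhead remark can be made concrete with a little arithmetic. Let H be the size of the headers sent with each request (a symbol introduced here only for illustration), and S_JSON and S_MsgPack the payload sizes in the two formats. The share of the whole request saved by switching the payload to MessagePack is then:

```latex
s = \frac{S_{\text{JSON}} - S_{\text{MsgPack}}}{H + S_{\text{JSON}}},
\qquad
\frac{56 - 38}{0 + 56} \approx 32\%,
\qquad
\frac{56 - 38}{500 + 56} \approx 3.2\%
```

With the 56-byte user object from the first example and no headers, this gives back the 32% computed earlier; with a hypothetical 500 bytes of headers, the same payload change saves only about 3.2% of the request, so the benefit is largest when payloads are big compared to their headers.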

It’s important to underline that, while MessagePack is supported by over 50 programming languages, it doesn’t seem particularly efficient from a computational perspective (at least in our Node.js test), and, since it isn’t human-readable, it can be harder to debug.
