== NUM_PTS 1000 ==

board           avg_ser     avg_deser
intel i5        0.000011    0.000025
arm1176jzf-s    0.001328    0.003354

OK, so serialization is definitely much slower on the ARM (over 100x for both serialization and deserialization), but with the latency test you're using it shouldn't affect the results.  Good to know, though.
 
== Latency Results ==

                  IPC       NoTCP     Bypass
intel i5      :   0.4ms     0.10ms    0.05ms
arm1176jzf-s  :   25.0ms    4.50ms    3.50ms

====== Raw Socket Tests ======

Tests the latency of sending a single char from server to client,
written using simple POSIX socket code.

intel i5      : 0.1ms
arm1176jzf-s  : 1.4ms
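
For anyone who wants to reproduce a number like this, here is a minimal sketch of a single-char socket latency test. It is not the code used for the results above; it is a Python stand-in that makes the same kind of socket calls and runs over loopback TCP. The names `serve_one_char` and `measure_latency` are made up for illustration. It also assumes a ping-pong exchange and reports half the round-trip time, since measuring a true one-way latency needs both ends to share a clock.

```python
import socket
import threading
import time


def serve_one_char(listener, rounds):
    """Accept one client, then answer each 1-byte request with 1 byte."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle's algorithm so single bytes are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            if not conn.recv(1):
                break  # client closed the connection early
            conn.sendall(b"x")


def measure_latency(rounds=1000):
    """Return the mean one-way latency in seconds, estimated as RTT / 2."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=serve_one_char, args=(listener, rounds))
    server.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    total = 0.0
    for _ in range(rounds):
        start = time.perf_counter()
        client.sendall(b"?")      # one char out
        reply = client.recv(1)    # one char back
        total += time.perf_counter() - start
        assert reply == b"x"

    client.close()
    server.join()
    listener.close()
    return total / rounds / 2.0


if __name__ == "__main__":
    print("mean one-way latency: %.3f ms" % (measure_latency() * 1e3))
```

Running the same script on both boards would give a baseline that involves only the kernel's socket path (plus interpreter overhead), which helps separate transport cost from anything the middleware adds on top.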

Do you have any way of profiling exactly what's taking the time, e.g. with gprof, Google perftools, or something else?

Josh