size_t size = 1024;
memseg1 = malloc(size);
memseg2 = malloc(size);
memcpy(memseg2, memseg1, size);
With size = 1024, both buffers (2 KB total) should fit entirely in L1
cache. If that's also true in the real use, then I'm not surprised the
time is near zero. If memcmp is being run on lots of different data, you
may be limited by main-memory speed, which is probably at least 10x slower.