    size_t size = 1024;
    memseg1 = malloc(size);
    memseg2 = malloc(size);
    memcpy(memseg2, memseg1, size);

This looks like it will all fit in L1 cache, since the two buffers are only 1 KB each. If that's also true in the real use, then I'm not surprised the time is near zero. If memcmp is instead being run on lots of different data, you may be running at memory speed, which is probably at least 10x slower.
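One way to check which case you are in is to time memcmp over a single pair of blocks and then over many distinct pairs. The following is only a rough sketch: `time_memcmp`, its parameters, and the `sink` output are hypothetical names I made up for illustration, and `clock()` is a coarse timer, so treat the numbers as indicative at best.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: run `iters` memcmp calls on `size`-byte blocks,
 * cycling through `nblocks` distinct block pairs.
 * nblocks == 1 keeps reusing the same two blocks (likely cache-resident);
 * a large nblocks makes the working set bigger than the cache.
 * Returns elapsed CPU seconds, or -1.0 if allocation fails.
 * The summed memcmp results go to *sink so the calls are not dead code. */
double time_memcmp(size_t size, size_t nblocks, size_t iters, long *sink)
{
    unsigned char *a = malloc(size * nblocks);
    unsigned char *b = malloc(size * nblocks);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }

    /* Fill both regions with identical data so every memcmp has to
     * scan the full block before it can return 0. */
    memset(a, 0x5a, size * nblocks);
    memcpy(b, a, size * nblocks);

    long acc = 0;
    clock_t start = clock();
    for (size_t i = 0; i < iters; i++) {
        size_t off = (i % nblocks) * size;
        acc += memcmp(a + off, b + off, size);
    }
    clock_t end = clock();

    *sink = acc;
    free(a);
    free(b);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To use it, call it twice with the same `iters`, for example `time_memcmp(1024, 1, n, &sink)` and `time_memcmp(1024, 65536, n, &sink)`, and compare the two results. The second call touches 64 MB per buffer, which I am assuming is larger than the caches on your machine; adjust `nblocks` if it is not. Because the blocks are visited sequentially, the hardware prefetcher may hide part of the difference, so the gap you see can be smaller than for truly scattered data.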