rdtsc and rdtscp are assembly instructions that read the time stamp counter.
rdtsc is non-serializing, so on its own it does not stop the processor from executing it out of order relative to the surrounding instructions.
We have to do two things to make the measurement behave the way we want.
1. We need to stop the compiler / optimizer from reordering our instructions:
Use a compiler-level memory barrier to prevent reordering:
Make the asm statement volatile:
__volatile__
This prevents the optimizer from removing the asm or moving it across any instructions that could need its results (or change its inputs), but the optimizer can still move it relative to unrelated operations. So __volatile__ alone is not enough.
Tell the compiler memory is being clobbered:
: "memory")
The "memory" clobber means that GCC cannot assume memory contents remain the same across the asm, and thus will not reorder memory accesses around it.
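Putting the two pieces of step 1 together, a minimal sketch of a TSC read might look like the following. It assumes GCC or Clang on x86/x86-64; the helper name read_tsc is hypothetical and only for illustration. Note that this handles the compiler side only, the processor can still execute the rdtsc out of order.

```c
#include <stdint.h>

/* Hypothetical helper: read the time stamp counter.
   rdtsc returns the low 32 bits in EAX and the high 32 bits in EDX. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("rdtsc"
                         : "=a"(lo), "=d"(hi) /* outputs: EAX, EDX */
                         :                     /* no inputs */
                         : "memory");          /* compiler barrier */

    /* Combine the two halves into one 64-bit counter value. */
    return ((uint64_t)hi << 32) | lo;
}
```

Calling read_tsc() before and after the code under test and subtracting the two values gives an elapsed count in TSC ticks.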
2. We need to prevent the processor from performing out-of-order execution around our timing call:
Use some other dummy serializing instruction to prevent out-of-order execution, e.g. cpuid.
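// requires #include <cpuid.h> (GCC/Clang) for __cpuid
// volatile so the store below cannot be optimized away
volatile int dont_remove __attribute__((unused));
unsigned tmp;
// cpuid is a serializing instruction; leaf 0 is executed only for that side effect
__cpuid(0, tmp, tmp, tmp, tmp);
// consume the result so the compiler cannot drop the cpuid
dont_remove = tmp;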
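For reference, here is a rough sketch of how both steps could be combined into one helper. It assumes GCC or Clang on x86/x86-64, and the name serialized_tsc is hypothetical. Instead of the __cpuid macro it writes the cpuid asm out directly, so that __volatile__ and the "memory" clobber are explicit and no dummy variable is needed to keep the instruction alive.

```c
#include <stdint.h>

/* Hypothetical helper: serialize with cpuid, then read the TSC. */
static inline uint64_t serialized_tsc(void)
{
    uint32_t eax, ebx, ecx, edx;
    uint32_t lo, hi;

    /* cpuid leaf 0, executed only for its serializing side effect.
       volatile keeps the compiler from deleting it; "memory" keeps
       memory accesses from being moved across it. */
    __asm__ __volatile__("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "a"(0)
                         : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx; /* results are not needed */

    /* Read the counter: low half in EAX, high half in EDX. */
    __asm__ __volatile__("rdtsc"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "memory");

    return ((uint64_t)hi << 32) | lo;
}
```

Keep in mind that cpuid itself costs a noticeable number of cycles, so very short measurements will include that overhead.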