Wednesday 29 August 2012

rdtscp / rdtsc and instruction reordering

rdtsc and rdtscp are assembly instructions to read the time stamp counter.

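rdtscp is partially serializing, but is only available on newer CPUs. It waits until all earlier instructions have executed before reading the counter, so earlier work cannot be reordered past the call to rdtscp. Later instructions may still begin executing before the read completes.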
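Since rdtscp is not available everywhere, it may be worth checking for it before use. Below is a minimal sketch, assuming GCC or Clang on x86 and the `<cpuid.h>` helper header. The function names `has_rdtscp` and `read_tscp` are made up for illustration; the feature flag is bit 27 of EDX in CPUID leaf 0x80000001.

```c
#include <cpuid.h>   /* GCC/Clang helper header for the cpuid instruction */
#include <stdint.h>

/* Hypothetical helper: returns 1 if the CPU reports rdtscp support, else 0. */
static int has_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;

    /* CPUID.80000001H:EDX bit 27 is the RDTSCP feature flag. */
    return (int)((edx >> 27) & 1u);
}

/* Hypothetical helper: reads the counter with rdtscp.
 * rdtscp also loads the IA32_TSC_AUX value into ecx; it is stored in *aux. */
static uint64_t read_tscp(uint32_t *aux)
{
    uint32_t lo, hi;

    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi), "=c"(*aux));
    return ((uint64_t)hi << 32) | lo;
}
```

Call `has_rdtscp()` once at startup and fall back to rdtsc plus a serializing instruction when it returns 0.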

rdtsc is non-serializing, so using it alone will not prevent the processor from performing out-of-order execution.

To get reliable timings from rdtsc we have to do two things.

1. We need to stop the compiler / optimizer from reordering our instructions:

Use a compiler-level memory barrier to prevent reordering. This takes two parts.

Make the call volatile:

    __volatile__

This prevents the optimizer from removing the asm or moving it across any instructions that need its results (or change its inputs), but the optimizer can still move it relative to unrelated operations. So __volatile__ alone is not enough.

Tell the compiler memory is being clobbered:

    : "memory")

The "memory" clobber means that GCC cannot assume memory contents remain the same across the asm, and thus will not reorder memory accesses around it.
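Putting the two pieces together, a wrapper around rdtsc might look like the sketch below. It assumes GCC or Clang extended inline asm on x86; the name `read_tsc` is made up for illustration. Note that this only deals with the compiler, not with out-of-order execution in the processor.

```c
#include <stdint.h>

/* Hypothetical helper: reads the time stamp counter with rdtsc. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__(       /* __volatile__: do not remove or merge this asm */
        "rdtsc"                 /* low 32 bits go to eax, high 32 bits to edx */
        : "=a"(lo), "=d"(hi)    /* outputs */
        :                       /* no inputs */
        : "memory");            /* compiler barrier for memory accesses */

    return ((uint64_t)hi << 32) | lo;
}
```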

2. We need to prevent the processor from performing out-of-order execution around our timing call:

Use some other serializing instruction as a dummy call before the timing read to prevent out-of-order execution, e.g. cpuid:

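    #include <cpuid.h>  // GCC/Clang header that provides the __cpuid macro

    // volatile so the optimizer cannot discard the store below
    volatile int dont_remove __attribute__((unused));
    // separate outputs, one per register written by cpuid
    unsigned eax, ebx, ecx, edx;
    // cpuid is a serializing instruction
    __cpuid(0, eax, ebx, ecx, edx);
    // use a result so the cpuid call is not optimized out
    dont_remove = eax;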
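The compiler-side and processor-side steps can also be combined in a single asm statement, so that nothing can be scheduled between cpuid and rdtsc. This is only a sketch, assuming GCC or Clang on x86-64 (clobbering ebx directly can be a problem for 32-bit position-independent code on older compilers). The name `serialized_tsc` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: cpuid followed by rdtsc in one asm block. */
static inline uint64_t serialized_tsc(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__(
        "cpuid\n\t"             /* serializing: earlier instructions complete first */
        "rdtsc"                 /* then read the counter into edx:eax */
        : "=a"(lo), "=d"(hi)    /* outputs */
        : "a"(0)                /* cpuid leaf 0 */
        : "ebx", "ecx",         /* cpuid also overwrites ebx and ecx */
          "memory");            /* compiler barrier for memory accesses */

    return ((uint64_t)hi << 32) | lo;
}
```

A measurement would then take one reading before the code under test and one after it, and subtract the two. cpuid itself costs a noticeable number of cycles, so that overhead is part of every reading.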
