Wednesday, 24 July 2013

Notes on performance counters and profiling with PAPI

Performance counters

There are two main ways to profile applications with performance counters: aggregate (direct) and statistical (indirect).

  • Aggregate: read the counters before and after a region of code executes and record the difference. This usage model permits explicit, highly accurate, fine-grained measurements. It has two sub-cases: summation, where the differences from multiple executions of an instrumented location are added together, and trace generation, where the counter values are recorded separately for every execution of the instrumented location.
  • Statistical: the performance monitoring (PM) hardware is set to generate an interrupt when a performance counter reaches a preset value. The interrupt carries important contextual information about the state of the processor at the time of the event; in particular, it includes the program counter (PC), i.e. the text address at which the interrupt occurred. Populating a histogram with these addresses gives a probabilistic distribution of PM interrupt events across the application's address space. This kind of profiling gives a good high-level understanding of where and why bottlenecks occur. For instance, questions such as "What code is responsible for most of the cache misses?" and "Where is the branch prediction hardware performing poorly?" can be answered quickly from a statistical profile.

PAPI supports two types of events: preset and native.
  • Preset events have a symbolic name (e.g. PAPI_L1_DCM) that is the same on every processor supported by PAPI, although a given preset is not necessarily available on every processor.
  • Native events, on the other hand, provide access to every event a particular platform can count, whether or not a predefined PAPI event name exists for it.

PAPI measures per thread: each measurement contains only the counts generated by the thread that performs the PAPI calls.

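#include <stdio.h>
#include <papi.h>
int main(void) {
    int events[2] = { PAPI_L1_DCM, PAPI_FP_OPS }; // L1 data cache misses; floating-point operations
    long long values[2];
    if (PAPI_start_counters(events, 2) != PAPI_OK) return 1; // start counting both events
    // do work
    PAPI_read_counters(values, 2); // copy the counts into values, then reset the counters
    printf("L1 DCM: %lld, FP ops: %lld\n", values[0], values[1]);
    PAPI_stop_counters(values, 2); // stop counting
    return 0;
}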


Taken from a Dr. Dobb's article.

