Wednesday 29 August 2012

rdtscp / rdtsc and instruction reordering

rdtsc and rdtscp are assembly instructions that read the time stamp counter.

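rdtscp waits until all previous instructions have executed before it reads the counter, so earlier work cannot be reordered past the call. It is not fully serializing, though: instructions after it may begin executing before the read completes. It is also only available on newer CPUs.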

rdtsc is non-serializing, so using it alone will not prevent the processor from performing out-of-order execution.

To keep the timing call from being reordered, we have to do two things.

1. We need to stop the compiler / optimizer from reordering our instructions:

Use a compiler-level memory fence to prevent reordering. This takes two steps.

Make the call volatile:

    __volatile__

This prevents the optimizer from removing the asm, or from moving it across any instructions that need its results or change its inputs. The optimizer can still move it relative to unrelated operations, so __volatile__ alone is not enough.

Tell the compiler memory is being clobbered:

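    unsigned lo, hi;  // low and high 32 bits of the counter
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");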

The "memory" clobber means that GCC cannot assume memory contents stay the same across the asm, and thus will not reorder memory accesses around it.

2. We need to prevent the processor from performing out-of-order execution around our timing call:

Use another serializing instruction as a dummy call before the timing instruction to prevent out-of-order execution, e.g. cpuid:

// __cpuid comes from GCC's <cpuid.h>
// volatile to stop optimizing
volatile int dont_remove __attribute__((unused));
// one variable per output register
unsigned eax, ebx, ecx, edx;
// cpuid is a serialising call
__cpuid(0, eax, ebx, ecx, edx);
// prevent optimizing out cpuid
dont_remove = eax;

Monday 13 August 2012

python decorators

Decorators allow us to 'wrap' functions with custom functionality.

Decorators can be thought of as functions which take a function as a parameter, and return a new function which wraps the passed in function with some new functionality.

def decorate(func):
    def wrapped():
        print('before')
        func()
        print('after')
    return wrapped

Let's say we have a function foo() which we want to decorate; we use the following syntax:


@decorate
def foo():
    print('foo')


Now, calling foo() results in the following:

    before
    foo
    after
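The @ syntax is shorthand for rebinding the function name to whatever the decorator returns. The sketch below, written for Python 3, shows both forms side by side. It also uses functools.wraps, a standard-library helper that is not part of the listing above, so that the wrapper keeps the original function's name; plain_foo is a made-up name for illustration.

```python
import functools


def decorate(func):
    @functools.wraps(func)  # copy func's __name__ and docstring onto the wrapper
    def wrapped(*args, **kwargs):
        print('before')
        result = func(*args, **kwargs)
        print('after')
        return result
    return wrapped


def plain_foo():
    print('foo')


# Explicit form: rebind the name to the wrapped function.
plain_foo = decorate(plain_foo)


# Decorator form: the same rebinding, done when foo is defined.
@decorate
def foo():
    print('foo')
```

Calling foo() or plain_foo() prints the same three lines, and foo.__name__ is still 'foo' rather than 'wrapped'.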

Here is a base class which can be extended to add custom before and after functionality:

class decorate(object):
    """
    A decorator which can be extended to allow custom before and after
    functionality to be called around a function
    """
    def before(self):
        pass

    def after(self):
        pass

    def __call__(self, func):
        def new_f(*args, **kw):
            self.before()
            try:
                return func(*args, **kw)
            finally:
                # runs exactly once, whether func returns or raises
                self.after()
        return new_f

Extend this class and override before and after:

class print_before_after(decorate):
    def before(self):
        print('before')

    def after(self):
        print('after')

@print_before_after() # the parentheses create an instance, so args can be passed here
def foo():
    print('foo')

PAPI: Performance API

PAPI aims to provide the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. PAPI enables software engineers to see, in near real time, the relation between software performance and processor events.

http://icl.cs.utk.edu/papi/

User Guide: http://icl.cs.utk.edu/projects/papi/files/documentation/PAPI_USER_GUIDE_23.htm

boost lockfree

Currently being reviewed for release - here is the documentation up for review:

http://tim.klingt.org/boost_lockfree/

inotify - monitor file system events

A mechanism for monitoring file system events.

Inotify can be used to monitor individual files, or to monitor directories. When a directory is monitored, inotify will return events for the directory itself, and for files inside the directory.

http://linux.die.net/man/7/inotify
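As a rough illustration of monitoring a directory, the sketch below calls the inotify functions in libc through ctypes, since Python's standard library has no inotify wrapper. It is Linux-only and written for Python 3. The helper name created_names, its action callback and its timeout are assumptions made for this example; the IN_CREATE value and the event layout (wd, mask, cookie, len, then the name) come from the man page.

```python
import ctypes
import os
import select
import struct

IN_CREATE = 0x00000100                # constant from <sys/inotify.h>
EVENT_HEADER = struct.Struct('iIII')  # wd, mask, cookie, len

# Symbols of the running process, which include libc on Linux.
libc = ctypes.CDLL(None, use_errno=True)


def created_names(directory, action, timeout=2.0):
    """Watch directory, run action(), return names of files it created."""
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), 'inotify_init failed')
    try:
        wd = libc.inotify_add_watch(fd, os.fsencode(directory), IN_CREATE)
        if wd < 0:
            raise OSError(ctypes.get_errno(), 'inotify_add_watch failed')
        action()
        names = []
        # Wait for events, but give up after the timeout instead of blocking.
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            return names
        data = os.read(fd, 4096)
        offset = 0
        while offset < len(data):
            _wd, mask, _cookie, name_len = EVENT_HEADER.unpack_from(data, offset)
            offset += EVENT_HEADER.size
            raw = data[offset:offset + name_len]  # name is padded with NULs
            offset += name_len
            if mask & IN_CREATE:
                names.append(os.fsdecode(raw.rstrip(b'\0')))
        return names
    finally:
        os.close(fd)
```

The watch is placed on the directory, and the event carries the name of the file inside it, which matches the behaviour described above.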

Sunday 5 August 2012

Building boost

Download the latest version

    http://www.boost.org/users/download/

I wanted to build the boost-python library for python 3, so make sure you have python 3 and its development package installed:


    sudo yum install python3
    sudo yum install python3-devel

I got python 3.2. However, the headers are installed to /usr/include/python3.2mu, whereas boost looks for the headers in <installdir>/python3.2.
To work around this I created a symbolic link in /usr/include:

    sudo ln -s python3.2mu python3.2

Other dependencies I didn't have:

    sudo yum install zlib-devel
    sudo yum install bzip2-devel

In the boost source dir:

    ./bootstrap.sh --with-python=python3 --prefix=/usr
    ./b2
    sudo ./b2 install

If you want versioned names you need to pass --layout=versioned to b2

    ./b2 --layout=versioned --with-regex debug threading=multi link=static

Building gcc

Download GCC, and extract the source to some location <gcc_src_dir>

    http://gcc.gnu.org/mirrors.html

Get makeinfo by installing texinfo

    sudo yum install texinfo

Build the gcc docs

    cd <gcc_src_dir>/gcc/doc
    ./install.texi2html

Point your browser to the install docs, and have a look at the prerequisites

    <gcc_src_dir>/gcc/doc/HTML/index.html

Obtain the prerequisites and build; typically you build the prerequisites as follows:

        ./configure --help     : display possible configuration options
        ./configure            : configure with defaults
        make                   : build from source
        make html              : generate documentation
        sudo make install      : install

Prerequisites:

    gmp: http://gmplib.org/
        I used gmp-5.0.5
        make sure you enable the C++ interface
            ./configure --enable-cxx

    mpfr: http://www.mpfr.org/
        I used mpfr-3.1.1

    mpc: http://www.multiprecision.org/
        I used mpc-1.0

    ppl: http://bugseng.com/products/ppl/Download/
        I used ppl-0.11.2

        tell it where gmp is (not sure why this didn't get picked up automatically - perhaps a clash with an existing version in /usr?)
            ./configure --with-gmp=/usr/local

    polylib: http://icps.u-strasbg.fr/polylib/
        I used polylib-5.22.5

    cloog-ppl: ftp://gcc.gnu.org/pub/gcc/infrastructure/
        I used cloog-ppl-0.15

Configuring gcc

    ../gcc-4.7.1/configure --with-pkgversion='Lori build 04/08/2012' --disable-multilib --disable-libgcj --build x86_64-fedora-linux --with-mpc=/usr/local --with-mpfr=/usr/local --with-cloog=/usr/local --with-gmp=/usr/local --with-ppl=/usr/local