Thursday, 27 September 2012

NUMA local PCI-Express interfaces


With Sandy Bridge, PCI-Express slots are now local to a particular socket, and therefore to a particular NUMA node.

If you're connecting to a specific NIC, it makes sense to run on the same NUMA node as that NIC, so that data does not have to be shuttled over the QPI between the kernel buffers and your user-space buffers.

The NUMA node a particular NIC is attached to, and the CPUs local to that node, can be found as follows:

/sys/class/net/eth0/device $ cat numa_node
0

/sys/class/net/eth0/device $ cat local_cpus
00005555

/sys/class/net/eth0/device $ cat local_cpulist
0,2,4,6,8,10,12,14
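
A minimal sketch of acting on this information from Python 3 on Linux: parse the `local_cpulist` (or `local_cpus` mask) format shown above, then pin the calling process to those CPUs. The helper names `parse_cpulist`, `parse_cpumask`, and `pin_to_nic_node` are hypothetical, and the sketch assumes the sysfs files exist for the given interface. `os.sched_setaffinity` is Linux-only.

```python
import os


def parse_cpulist(text):
    """Parse a sysfs CPU list such as '0,2,4-6' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # Ranges are inclusive, e.g. '4-6' means CPUs 4, 5 and 6.
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_cpumask(text):
    """Parse a sysfs hex CPU mask such as '00005555' into a set of CPU ids."""
    # Wide masks are printed as comma-separated 32-bit groups, so drop the commas.
    mask = int(text.strip().replace(",", ""), 16)
    return {bit for bit in range(mask.bit_length()) if mask & (1 << bit)}


def pin_to_nic_node(interface="eth0"):
    """Restrict the calling process to the CPUs local to the given NIC."""
    path = "/sys/class/net/{}/device/local_cpulist".format(interface)
    with open(path) as f:
        cpus = parse_cpulist(f.read())
    if cpus:
        os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

Calling `pin_to_nic_node("eth0")` before opening the socket would keep the process on the NIC's node. Note that this only sets CPU affinity; it does not by itself bind memory allocations to that node.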

Monday, 24 September 2012

Open BEAGLE - open source genetic programming framework

Open BEAGLE is a C++ Evolutionary Computation (EC) framework. It provides a high-level software environment to do any kind of EC, with support for tree-based genetic programming; bit string, integer-valued vector, and real-valued vector genetic algorithms; and evolution strategy.

http://code.google.com/p/beagle/

Friday, 21 September 2012

Using rsync to show progress whilst copying

rsync -r -v --progress -e ssh username@hostname:/source destination 

Mount USB drive by label


$ sudo mount /dev/disk/by-label/volume_name /mnt/mount_point -t auto

Wednesday, 29 August 2012

rdtscp / rdtsc and instruction reordering

rdtsc and rdtscp are assembly instructions that read the time stamp counter.

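rdtscp is only available on newer CPUs. It is not fully serializing, but it does wait until all earlier instructions have executed before reading the counter, so earlier work cannot be reordered past it. Later instructions may still begin executing before the read is performed.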

rdtsc is non-serializing, so using it alone will not prevent the processor from performing out-of-order execution.

To time a region reliably with rdtsc, we have to do two things.

1. We need to stop the compiler / optimizer from reordering our instructions:

Use a compiler barrier (a volatile asm with a "memory" clobber) to prevent reordering.

Make the asm statement volatile:

    __volatile__

This prevents the optimizer from removing the asm, or from moving it across any instructions that could need the results (or change the inputs) of the asm. The optimizer could still move it with respect to unrelated operations, so __volatile__ alone is not enough.

Tell the compiler memory is being clobbered:

    : "memory")

The "memory" clobber means that GCC cannot make any assumptions about memory contents remaining the same across the asm, and thus will not reorder memory accesses across it.

2. We need to prevent the processor from performing out-of-order execution around our timing call:

Use some other, dummy serializing instruction to prevent out-of-order execution, e.g. cpuid.

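#include <cpuid.h>  // GCC header that provides the __cpuid macro

// volatile so the store below is not optimized away
volatile int dont_remove __attribute__((unused));
// one variable per output register (eax, ebx, ecx, edx)
unsigned eax, ebx, ecx, edx;
// cpuid is a serializing instruction
__cpuid(0, eax, ebx, ecx, edx);
// use a result so the cpuid call itself is not optimized out
dont_remove = eax;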

Monday, 13 August 2012

python decorators

Decorators allow us to 'wrap' functions with custom functionality.

Decorators can be thought of as functions which take a function as a parameter, and return a new function which wraps the passed in function with some new functionality.

def decorate(func):
    def wrapped():
        print('before')
        func()
        print('after')
    return wrapped

Let's say we have a function foo() which we want to decorate; we use the following syntax:


@decorate
def foo():
    print('foo')


Now, calling foo() results in the following:

    before
    foo
    after
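
The `@decorate` line is shorthand for rebinding the name after the definition: `foo = decorate(foo)`. The sketch below (Python 3) shows that equivalence on a function that takes arguments. It is a variation rather than the exact code above: the wrapper forwards `*args` and `**kwargs`, returns the result, and uses `functools.wraps` to keep the original function's name. The `calls` list and the `add` function are illustrative only.

```python
import functools

calls = []  # records events so the call order is easy to inspect


def decorate(func):
    @functools.wraps(func)  # keep func's __name__ and docstring on the wrapper
    def wrapped(*args, **kwargs):
        calls.append("before")
        result = func(*args, **kwargs)
        calls.append("after")
        return result
    return wrapped


def add(a, b):
    calls.append("add")
    return a + b


# Equivalent to writing @decorate above the definition of add
add = decorate(add)
```

After this, `add(1, 2)` still returns the sum, and `calls` records the wrapper's events on either side of the call.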

Here is a base class which can be extended to add custom before and after functionality:

class decorate:
    """
    A decorator which can be extended to allow custom before and after
    functionality to be called around a function
    """
    def before(self):
        pass

    def after(self):
        pass

    def __call__(self, func):
        def new_f(*args, **kw):
            self.before()
            try:
                return func(*args, **kw)
            finally:
                # runs exactly once, whether func returns or raises
                self.after()
        return new_f

Extend this class and override before and after:

class print_before_after(decorate):
    def before(self):
        print('before')

    def after(self):
        print('after')

@print_before_after()  # note: this creates an instance, so arguments can be passed here
def foo():
    print('foo')
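
Because the decorator line calls the class, constructor arguments can be used to configure each decorated function. A minimal sketch in Python 3, assuming a hypothetical subclass `log_calls` that takes a `label` and a list to record into. The base class is restated (using `try`/`finally`) so that the block is self-contained.

```python
class decorate:
    """Base decorator: calls before() and after() around the wrapped function."""

    def before(self):
        pass

    def after(self):
        pass

    def __call__(self, func):
        def new_f(*args, **kw):
            self.before()
            try:
                return func(*args, **kw)
            finally:
                self.after()  # runs on both normal return and exception
        return new_f


class log_calls(decorate):
    """Hypothetical subclass: records labelled events in a caller-supplied list."""

    def __init__(self, label, log):
        self.label = label
        self.log = log

    def before(self):
        self.log.append(self.label + ": before")

    def after(self):
        self.log.append(self.label + ": after")


events = []


@log_calls("foo", events)  # arguments go to log_calls.__init__
def foo(x):
    events.append("foo({})".format(x))
    return x * 2


@log_calls("boom", events)
def boom():
    raise ValueError("failed")
```

Calling `foo(21)` records the before and after events around the call and returns the doubled value. Calling `boom()` still records the after event before the exception propagates.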

PAPI: Performance API

PAPI aims to provide the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. PAPI enables software engineers to see, in near real time, the relation between software performance and processor events.

http://icl.cs.utk.edu/papi/

User Guide: http://icl.cs.utk.edu/projects/papi/files/documentation/PAPI_USER_GUIDE_23.htm