Petaflop: 2012

Friday, 28 December 2012

xbmcfreak installation

I have an ASRock Ion 330HT PC with a 32GB Corsair SSD which I'm using as a media server.

Installation of XBMCFreak LiveCD 10:

Download from http://www.xbmcfreak.nl/downloads-10series/ and burn to CD

Boot up, choose Install LiveCD and walk through the installation.

For partitioning the hard-drive I chose to use Guided - use entire disk. (I initially tried LVM but this failed - I didn't really bother to find out why)

The installation takes about 5 or 10 minutes; be patient with the blank blue screen that is displayed while it installs.

Post installation configuration:

Mount NAS on NFS:

add nas ip-address to /etc/hosts
$ 192.168.1.12 nas

create mount point in fs
$ sudo mkdir /mnt/raid

install nfs client
$ sudo apt-get install nfs-common portmap

mount nfs
$ sudo mount -o rw,async -t nfs4 nas:/mnt/raid /mnt/raid

make it premanent by adding mount command to /etc/fstab
$ nas:/mnt/raid /mnt/raid nfs4 rw,async

Sunday, 9 December 2012

Puppet - open source configuration management

http://puppetlabs.com

Open source job schedulers

Open source job scheduler

http://www.sos-berlin.com/modules/cjaycontent/index.php?id=62&page=osource_scheduler_introduction_en.htm

GNU Batch

http://www.gnu.org/software/gnubatch/

Thursday, 29 November 2012

Pattern recognition algorithms

Boost based Computer Vision and Pattern Recognition Library implements many useful algorithms such as Principal Component Analysis, Eigen solver, etc.

http://boostcvpr.sourceforge.net/

Saturday, 17 November 2012

windows/fedora dual boot - update grub2

After kernel updates the grub boot menu will include both the new kernel version and the previous kernel version.

Remove old kernels (keeping 2)

$ yum install yum-utils
$ package-cleanup --oldkernels --count=2

Update grub's boot menu

It's probably a good idea to make a backup of the old grub.cfg file

$ grub2-mkconfig -o /boot/grub2/grub.cfg

This will update the grub config used to load the boot menu.

You can customize the menu order by renaming the 10_* entries in /etc/grub.d/

Customize the boot order:

Find the menu entries:

$ grep ^menuentry /boot/grub2/grub.cfg | cut -d "'" -f2

Set the default menu entry:

$ grub2-set-default 'one-of-the-above-menu-entries'

Check to see if it worked:

$ grub2-editenv list

Sunday, 4 November 2012

gtest - google unit testing framework

Primer

http://code.google.com/p/googletest/wiki/Primer

Simple test case

#include <gtest/gtest.h>

TEST(TestSuite, TestCase1)
{
ASSERT_TRUE(expr);
}

TEST(TestSuite, TestCase2)
{
ASSERT_TRUE(expr);
}

Get the main function for free

Link gtest_main.cc and you get RUN_ALL_TESTS free

What to do if a test fails

Abort the test on expression failure:

ASSERT_TRUE(expr);

Continue the test on expression failure:

EXPECT_TRUE(expr);

Floating point comparison:

ASSERT_FLOAT_EQ(val1, val2);

ASSERT_DOUBLE_EQ(val1, val2);

ASSERT_NEAR(val1, val2, epsilon);

EXPECT_FLOAT_EQ(val1, val2);

EXPECT_DOUBLE_EQ(val1, val2);

EXPECT_NEAR(val1, val2, epsilon);

Command line options

Repeat tests (useful for finding subtle race conditions)

--gtest_repeat=1000

Enter the debugger upon test failure

--gtest_break_on_failure

Generate an xml report "foobar.xml"

--gtest_output="xml:foobar"

Only run some tests

--gtest_filter=TestSuite* // runs all suites matching TestSuite*

--gtest_filter=TestSuite*-*.*2 // runs all suites matching TestSuite* except cases ending in '2'

--gtest_filter=Foo*:Bar* // separate different reg-ex's with ':'

mysql, python, peewee

I want to use python to work with my mysql database

install the python bindings

$ sudo yum install MySQL-python

download and install peewee - an object-relational mapping utility to python

$ wget https://github.com/coleifer/peewee/archive/master.tar.gz
$ gunzip peewee-master.tar.gz
$ tar -xvf peewee-master.tar
$ cd peewee-master/

$ sudo python setup.py install

$ sudo python setup.py test

Saturday, 3 November 2012

mysql

Install the the client and server software

$ sudo yum install mysql

$ sudo yum install mysql-server

start mysql server

$ sudo service mysqld start

automatically start mysql daemon on startup

$ sudo chkconfig --level 2345 mysqld on

set root password

$ mysqladmin -u root password foobar

$ mysql -u root -p
mysql>

create a new user

mysql> create user 'username'@'localhost' identified by 'password';

mysql> grant all privileges on *.* to 'username'@'localhost' with grant option;

mysql-workbench: admin GUI

$ sudo yum install mysql-workbench

Tutorial on creating a database using mysql-workbench here

Mount NTFS partition in linux

Display available hard disks:
$ sudo fdisk -l

Device Boot Start End Blocks Id System
/dev/sda2 206848 307202047 153497600 7 HPFS/NTFS/exFAT

Manually mount:

$ sudo mount -t ntfs-3g -o defaults,user,rw /dev/sda2 /mnt/windows/

Set up auto mounting in /etc/fstab:

add the following line to your /etc/fstab:

/dev/sda2 /mnt/windows/ ntfs-3g defaults,user,rw 0 0

Thursday, 1 November 2012

gnu screen rc and commands

I am not 100% happy with my screen.rc; it kind of works at the moment but I will edit this post as I find improvements.

.screenrc

# skip the startup message

startup_message off

# go to home dir

chdir

# Automatically detach on hangup.

autodetach on

# Change default scrollback value for new windows

defscrollback 1000

# start with visual bell as default

vbell on

vbell_msg "bell on %t (%n)"

# look and feel

term screen-256color

hardstatus alwayslastline "%{=r}%{G} %{c}%w"

shelltitle "shell"

# highlighting for text marking and printing messages
# '=': replace current settings
# 's': standout
# 'dd': default fore / default back

sorendition =s dd

Commands

Move the current window to a new position (window number):

^a:number [new window number]

eg: ^a:number 5 will move the current window to position number 5

Rename a window
^a<shift>a [new window name]

Wednesday, 24 October 2012

TCP slow start

Slow start is a congestion control strategy used by TCP.

On startup, and after an idle period, since the session doesn't know what the congestion on the network is, the session will start with a TCP congestion window of only 2 segments. This means the session will only send 2 segments and then wait for an ACK before exponentially increasing the congestion window.

With today's modern 10Gb networks, congestion is rarely an issue; and in a low-latency environment practitioners will often say "congestion be damned, send as much as we can and deal with the issues later!"

We can override the default setting on a route by route basis in linux by setting the initial congestion window size:

$ ip route show
10.80.32.0/22 dev eth0 proto kernel scope link src 10.80.33.247
169.254.0.0/16 dev eth0 scope link
127.0.0.0/8 dev lo scope link
default via 10.80.32.1 dev eth0

Now we can change the congestion window size on the default route as follows:

$ sudo ip route change default via 10.80.32.1 dev eth0 proto static initcwnd 10

Friday, 12 October 2012

Fedora 17: Squeezebox server 7.7.2

Go to http://www.mysqueezebox.com/ download and select the Linux RPM.

It will automatically be opened with the Software Installer, resolve dependencies and install the required packages.

Nothing is ever that simple though!

The service wasn't started, and attempts to force it to start using systemctl failed too

$ sudo systemctl start squeezeboxserver.service

Looking in /var/log/messages I found the following:

squeezeboxserver[3040]: Starting Squeezebox Server: Can't locate Slim/bootstrap.pm in @INC (...details of the $INC path)

I then went to /usr to find all files which had anything with Logitech, Squeezebox or Slim in their path

$ cd /usr

$ find . | egrep -i logitec\|squeeze\|slim > /tmp/squeezbox.details

I then searched for the missing file (Slim/bootstrap.pm)

$ grep bootstrap /tmp/squeezbox.details

./lib/perl5/vendor_perl/Slim/bootstrap.pm

So I created a symbolic link to Slim in one of the directories in the $INC path

$ sudo ln -sf /usr/lib/perl5/vendor_perl/Slim /usr/lib64/perl5/vendor_perl/Slim

It still failed to start, so I looked back in /var/log/messages

squeezeboxserver[22941]: Starting Squeezebox Server: Can't locate Digest/MD5.pm in @INC

Some perl modules were missing, which I installed with Yum

$ sudo yum install perl-Log-Log4perl perl-CGI perl-YAML-LibYAML

Now the service started

$ sudo systemctl start squeezeboxserver.service

$ sudo systemctl status squeezeboxserver.service

squeezeboxserver.service - LSB: Startup script for the Logitech Media Server

Loaded: loaded (/etc/rc.d/init.d/squeezeboxserver)

Active: active (running) since Sat, 13 Oct 2012 13:25:40 +1100; 1min 36s ago

I had to open a few ports on my firewall to enable my player to connect (namely 3483 tcp and udp for the player, and 9000 tcp for the web-interface).

$ system-config-firewall
Other ports
Add port 9000, tcp
Add port 3483, tcp and udp

Apply

go to localhost:9000 to configure

Fedora 17: Installation

I have an old Core 2 Duo with 1 500GB hard drive and 4 x 2TB hard drives stuffed in a cupboard.

I have created a 6TB software RAID drive from the 4 x 2TB drives, whilst the 500GB drive is the boot device.

Today I installed Fedora 17 with Gnome using the LiveUSB creator tool.

I used these instructions: http://fedoraproject.org/wiki/How_to_create_and_use_Live_USB

I downloaded the LiveUSB creator tool from here: http://fedorahosted.org/liveusb-creator

I downloaded the 64 bit Fedora 17 Desktop Edition with Gnome desktop manager from here: http://fedoraproject.org/en/get-fedora-options

My existing RAID configuration remained intact, and I installed over Fedora 16 I had on the 500GB drive.

The Quick Start guide make it trivial to set up... just follow the instructions on-screen.

Configure Auto-Login

Since I run this headless in a cupboard, I needed to configure it to automatically login.

Again this is trivial - Just go to System Settings - User Accounts - click on the Unlock button to enable changes, and select Auto Login

Thursday, 27 September 2012

NUMA local PCI-Express interfaces

With Sandybridge PCI-Express slots are now local to a particular socket and therefore NUMA node.

If you're connecting to a specific NIC then it makes sense to make sure you're on the same NUMA node as the NIC is, thereby preventing having to shuttle data over the QPI between the kernel buffers and your user-space buffers.

Details on what NUMA nodes a particular NIC is on can be found as follows:

/sys/class/net/eth0/device $ cat numa_node
0

/sys/class/net/eth0/device $ cat local_cpus
00005555

/sys/class/net/eth0/device $ cat local_cpulist
0,2,4,6,8,10,12,14

Monday, 24 September 2012

Open BEAGLE - open source genetic programming framework

Open BEAGLE is a C++ Evolutionary Computation (EC) framework. It provides an high-level software environment to do any kind of EC, with support for tree-based genetic programming; bit string, integer-valued vector, and real-valued vector genetic algorithms; and evolution strategy

http://code.google.com/p/beagle/

Friday, 21 September 2012

Using rsync to show progress whilst copying

rsync -r -v --progress -e ssh username@hostname:/source destination

Mount USB drive by label

$ sudo mount /dev/disk/by-label/volume_name /mnt/mount_point -t auto

Wednesday, 29 August 2012

rdtscp / rdtsc and instruction reordering

rdtscp and rdtscp are assembly instructions to read the time stamp counter.

rdtscp is a serializing call, but is only available on newer CPUs. It prevents reordering around the call to rdtscp.

rdtsc is non-serializing, so using it alone will not prevent the processor from performing out-of-order execution.

We have to do two things to have this work the way we want it to.

1. We need to stop the compiler / optimizer from reordering our instructions:

Use a memory fence to prevent reordering:

Make the call volatile:

__volatile__

What this does is prevent the optimizer from removing the asm or moving it across any instructions that could need the results (or change the inputs) of the asm, but it could still move it with respect to unrelated operations. So __volatile__ is not enough.

Tell the compiler memory is being clobbered:

: "memory")

The "memory" clobber means that GCC cannot make any assumptions about memory contents remaining the same across the asm, and thus will not reorder around it.

2. We need to prevent the processor from performing out-of-order execution around our timing call:

Use some other dummy serializing call to prevent out-of-order execution, eg: cpuid.

// volatile to stop optimizing
volatile int dont_remove __attribute__((unused));
unsigned tmp;
// cpuid is a serialising call
__cpuid(0, tmp, tmp, tmp, tmp);
// prevent optimizing out cpuid
dont_remove = tmp;

Monday, 13 August 2012

python decorators

Decorators allow us to 'wrap' functions with custom functionality.

Decorators can be thought of as functions which take a function as a parameter, and return a new function which wraps the passed in function with some new functionality.

def decorate(func):
def wrapped():
print 'before'
func()
print 'after'
return wrapped

Let's say we have a function foo() which we want to decorate; we use the following syntax:

@decorate
def foo():
print 'foo'

Now, calling foo() results in the following:

before
foo
after

Here is a base class which can be extended to add custom before and after functionality

class decorate:
"""
A decorator which can be extended to allow custom before and after
functionality to be called around a function
"""
def before(self):
pass

def after(self):
pass

def __call__(self,func):
def new_f(*args,**kw):
self.before()
try:
res = func(*args,**kw)
self.after()
return res
except:
self.after()
raise
return new_f

Extend this class and override before and after:

class print_before_after(decorate):

def before(self):

print 'before'

def after(self):

print 'after'

@print_before_after() # note here we can pass args

def foo():

print 'foo'

PAPI: Performance API

PAPI aims to provide the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. PAPI enables software engineers to see, in near real time, the relation between software performance and processor events.

http://icl.cs.utk.edu/papi/

User Guide: http://icl.cs.utk.edu/projects/papi/files/documentation/PAPI_USER_GUIDE_23.htm

boost lockfree

Currently being reviewed for release - here is the documentation up for review:

http://tim.klingt.org/boost_lockfree/

inotify - monitor file system events

A mechanism for monitoring file system events.

Inotify can be used to monitor individual files, or to monitor directories. When a directory is monitored, inotify will return events for the directory itself, and for files inside the directory.

http://linux.die.net/man/7/inotify

Sunday, 5 August 2012

Building boost

Download the latest version

http://www.boost.org/users/download/

I wanted to build the boost-python library for python 3; so make sure you have python 3 and it's developer package installed

sudo yum install python3

sudo yum install python3-devel

I got python 3.2. However, the headers are installed to /usr/include/python3.2u, whereas boost looks for the headers in <installdir>/python3.2

To work around this I created a symbolic link:

sudo ln -s python3.2mu python3.2

Other dependencies I didn't have:

sudo yum install zlib-devel

sudo yum install bzip2-devel

In the boost source dir:

./bootstrap.sh --with-python=python3 --prefix=/usr

./b2

sudo ./b2 install

If you want versioned names you need to pass --layout=versioned to b2

./b2 --layout=versioned --with-regex debug threading=multi link=static

Building gcc

Download GCC, and extract the source to some location <gcc_src_dir>

http://gcc.gnu.org/mirrors.html

Get makinfo via texinfo

sudo yum install texinfo

build the gcc docs

cd <gcc_src_dir>/gcc/doc
./install.texi2html

point your browser to the install docs, and have a look at the prerequisites

<gcc_src_dir>/gcc/doc/HTML/index.html

Obtain the prerequisites and build; typically you build the prerequisites as follows:

./configure --help : display possible configuration options
./configure : configure with defaults
make : build from source
make html : generate documentation
sudo make install : install

Prerequisites:

gmp: http://gmplib.org/
I used gmp-5.0.5
make sure you enable the C++ interface
./configure --enable-cxx

mpfr: http://www.mpfr.org/
I used mpfr-3.1.1

mpc: http://www.multiprecision.org/
I used mpc-1.0

ppl: http://bugseng.com/products/ppl/Download/
I used ppl-0.11.2

tell it where gmp is (not sure why this didn't get picked up automatically - perhaps a clash with an existing version in /usr?)
./configure --with-gmp=/usr/local

polylib: http://icps.u-strasbg.fr/polylib/
I used polylib-5.22.5

cloog-ppl: ftp://gcc.gnu.org/pub/gcc/infrastructure/
I used cloog-ppl-0.15

configuring gcc

../gcc-4.7.1/configure --with-pkgversion='Lori build 04/08/2012' --disable-multilib --disable-libgcj --build x86_64-fedora-linux --with-mpc=/usr/local --with-mpfr=/usr/local --with-cloog=/usr/local --with-gmp=/usr/local --with-ppl=/usr/local

Monday, 25 June 2012

Using strace & top to debug a multi threaded app

Displays all the system calls being made in each thread

strace -f ./AppName

Displays all threads for the pid in question.

top -p pid -H

Use f,j in top's interactive mode to show which cpus each thread is running on

You can get the same information from the /proc/pid filesystem
This displays each thread and the last cpu it ran on

/proc/pid/task $ for i in `ls -1`; do cat $i/stat | awk '{print $1 " is on " $(NF - 5)}'; done

Wednesday, 20 June 2012

cset set/proc - finer control of cpusets

http://code.google.com/p/cpuset/

Set

Create, adjust, rename, move and destroy cpusets

Commands

Create a cpuset, using cpus 1-3, use NUMA node 1 and call it "my_cpuset1"

$ cset set --cpu=1-3 --mem=1 --set=my_cpuset1

Change "my_cpuset1" to only use cpus 1 and 3

$ cset set --cpu=1,3 --mem=1 --set=my_cpuset1

Destroy a cpuset

$ cset set --destroy --set=my_cpuset1

Rename an existing cpuset

$ cset set --set=my_cpuset1 --newname=your_cpuset1

Create a hierarchical cpuset

$ cset set --cpu=3 --mem=1 --set=my_cpuset1/my_subset1

List existing cpusets (depth of level 1)

$ cset set --list

List existing cpuset and its children

$ cset set --list --set=my_cpuset1

List all existing cpusets

$ cset set --list --recurse

Proc

Manage threads and processes

Commands

List tasks running in a cpuset

$ cset proc --list --set=my_cpuset1 --verbose

Execute a task in a cpuset

$ cset proc --set=my_cpuset1 --exec myApp -- --arg1 --arg2

Moving a task

$ cset proc --toset=my_cpuset1 --move --pid 1234
$ cset proc --toset=my_cpuset1 --move --pid 1234,1236
$ cset proc --toset=my_cpuset1 --move --pid 1238-1340

Moving a task and all its siblings

$ cset proc --move --toset=my_cpuset1 --pid 1234 --threads

Move all tasks from one cpuset to another

$ cset proc --move --fromset=my_cpuset1 --toset=system

Move unpinned kernel threads into a cpuset

$ cset proc --kthread --fromset=root --toset=system

Forcibly move kernel threads (including those that are pinned to a specific cpu) into a cpuset (note: this may have dire consequences for the system - make sure you know what you're doing)

$ cset proc --kthread --fromset=root --toset=system --force

Hierarchy

Using hierarchical cpusets to create prioritised groupings

Example
1. Create a system cpuset with 1 cpu (0)

2. Create a prio_low cpuset with 1 cpu (1)

3. Create a prio_met cpuset with 2 cpus (1-2)

4. Create a prio_high cpuset with 3 cpus (1-3)

5. Create a prio_all cpuset with all 4 cpus (0-3) (note this the same as root; it is considered good practice to keep a separation from root)

To achieve the above you create prio_all, and then create subset prio_high under prio_all, etc

$ cset set --cpu=0 --set=system

$ cset set --cpu=0-3 --set=prio_all

$ cset set --cpu=1-3 --set=/prio_all/prio_high

$ cset set --cpu=1-2 --set=/prio_all/prio_high/prio_med

$ cset set --cpu=1 --set=/prio_all/prio_high/prio_med/prio_low

cset shield - easily configure cpusets

http://code.google.com/p/cpuset/

shield

Basic concept - 3 cpusets
root: present in all configurations and contains all cpus (unshielded)
system: contains cpus used for system tasks - the ones which need to run but aren't "important" (unshielded)
user: contains cpus used for "important" tasks - the ones we want to run in "realtime" mode (shielded)

The shield command manages these 3 cpusets.

During setup it moves all movable tasks into the unshielded cpuset (system) and during teardown it moves all movable tasks into the root cpuset.
After setup, the subcommand lets you move tasks into the shield (user) cpuset, and additionally, to move special tasks (kernel threads) from root to system.

Commands:
Create a shield (Example: 4-core non-NUMA machine: we want to dedicate 3 cores to the shield, and leave 1 core for unimportant tasks; since it is non-NUMA we don't need to specify any memory node parameters; we leave the kernel threads running in the root cpuset)

$ cset shield --cpu 1-3

Some kernel threads (those which aren't bound to specific cpus) can be moved into the system cpuset. (In general it is not a good idea to move kernel threads which have been bound to a specific cpu)

$ cset shield --kthread on

List what's running in the shield (user) or unshield (system) (-v for verbose, list the process names) (2nd -v to display more than 80 characters)

$ cset shield --shield -v
$ cset shield --unshield -v -v

Stop the shield (teardown)

$ cset shield --reset

Execute a process in the shield (commands following '--' are passed to the command to be executed, not to cset)

$ cset shield --exec mycommand -- -arg1 -arg2

Move a running process into the shield (move multiple processes by passing a comma separated list, or ranges (any process in the range will be moved, even if there are gaps))

$ cset shield --shield --pid 1234
$ cset shield --shield --pid 1234,1236
$ cset shield --shield --pid 1234,1237,1238-1240

Monday, 11 June 2012

Dr Dobb's Go Parallel

A multicore application performance blog hosted on Dr Dobbs:

http://www.drdobbs.com/go-parallel

Folly benchmarking

Facebook recently open sourced an internal C++ library called Folly.

They have an interesting benchmarking library.

Check it out here: https://github.com/facebook/folly/blob/master/folly/docs/Benchmark.md

Sunday, 13 May 2012

TTAS spin lock

http://askldjd.wordpress.com/2010/03/31/test-and-test-and-set-lock/

Thursday, 3 May 2012

Schwarz counter in C++11 - std::call_once

Looking at the code below makes me think it would be trivial to implement a Schwarz counter using C++11 idioms

std::unique_ptr<some_resource> resource_ptr;
std::once_flag resource_flag;

void foo()
{
std::call_once(resource_flag,[]{
resource_ptr.reset(new some_resource);
});
resource_ptr->do_something();
}

Wednesday, 18 April 2012

OSX - Get iChat to automatically log back in

Put a small apple script into your crontab which will look in the System Events log and tell iChat to login if it sees the disconnect event in there.

In a terminal:

crontab -e

*/5 * * * * osascript -e 'tell application "System Events" to if (processes whose name is "iChat") exists then tell application "iChat" to log in'

Tuesday, 17 April 2012

(Functional) Reactive Programming in C++

The introduction to Reactive Programming you've been missing
https://gist.github.com/staltz/868e7e9bc2a7b8c1f754

The canonical stackoverflow question
http://stackoverflow.com/questions/1028250/what-is-functional-reactive-programming

What is Reactive Programming?
http://www.paulstovell.com/reactive-programming

FRP in C++ and Its Application
ftp://netlib.bell-labs.com/who/Old/blume/icfp02/191.pdf

C++ implementation: C++ Rx library
https://github.com/Reactive-Extensions/RxCpp

Thursday, 12 April 2012

Kenel bypass & zero-copy

http://ttthebear.blogspot.com/2008/07/linux-kernel-bypass-and-performance.html

kernel bypass
http://www.networkworld.com/news/tech/2005/013105techupdate.html

zero-copy
http://lwn.net/2001/0419/kernel.php3 http://www.linuxjournal.com/article/6345

Interrupt Coalescence
It is possible to reduce mean latency, but will most likely increase the min latency

RDMA
http://www.networkworld.com/news/tech/2003/0324tech.html
http://www.networkworld.com/newsletters/lans/2002/01556276.html

Request completions might be processed either entirely in user space (by polling a user-level completion queue) or through the kernel in cases where the application wishes to sleep until a completion occurs.

A Fast Read/Write Process to Reduce RDMA Communication Latencyhttp://www.people.vcu.edu/~xhe2/publications/Conferences/FRRWP_NAS06.pdf

--- implementation of user-space waiting on rdma ---
create a condition variable based on a futex
have rdma completion handler wake the condition variable

Read Copy Update

http://lwn.net/Articles/262464/

c++11 reference implementation for std::min and std::max

Prepare to have your mind boggled!

This is a reference implementation for std::min and std::max, using c++11 features

namespace detail
{

template <class T, bool make_const, bool make_volatile>
struct union_cv_helper;

template <class T>
struct union_cv_helper<T, false, false>
{
typedef T type;
};

template <class T>
struct union_cv_helper<T, false, true>
{
typedef volatile T type;
};

template <class T>
struct union_cv_helper<T, true, false>
{
typedef const T type;
};

template <class T>
struct union_cv_helper<T, true, true>
{
typedef const volatile T type;
};

} // detail

template <class T, class U>
struct union_cv
{
static const bool make_const = std::tr1::is_const<T>::value || std::tr1::is_const::value;
static const bool make_volatile = std::tr1::is_volatile<T>::value || std::tr1::is_volatile::value;
typedef typename std::tr1::remove_cv<T>::type Tr;
typedef typename std::tr1::remove_cv::type Ur;
typedef typename detail::union_cv_helper<Tr, make_const, make_volatile>::type type;
};

template <class T, class U>
struct promote
{
static T t;
static U u;
static bool b;
typedef typename std::tr1::remove_cv<decltype(b ? t : u)>::type type;
};

namespace detail
{

template <class T, class Tr, class U, class Ur>
struct min_max_return_helper
{
typedef typename promote<T&, U&>::type type;
};

template <class T, class Tr, class U>
struct min_max_return_helper<T, Tr, U, Tr>
{
typedef typename union_cv<T, U>::type& type;
};

template <class T, T t, class U, U u>
struct safe_less_equal
{
static const bool Tsigned = std::tr1::is_signed<T>::value;
static const bool Usigned = std::tr1::is_signed::value;
static const bool Tneg = Tsigned && t < T(0);
static const bool Uneg = Usigned && u < U(0);
static const bool value = Tneg == Uneg ? t <= u : Tneg;
};

template <class T>
struct int_min
{
static const T value = std::tr1::is_signed<T>::value ? T(T(1) << std::numeric_limits<T>::digits) : T(0);
};

template <class T>
struct int_max
{
static const T value = T(~int_min<T>::value);
};

template <class T, class U, bool = std::tr1::is_integral<T>::value && std::tr1::is_integral::value>
struct safe_compare_imp
{
typedef typename promote<T, U>::type R;
static const R Rmin = int_min<R>::value;
static const R Rmax = int_max<R>::value;

static const T Tmin = int_min<T>::value;
static const T Tmax = int_max<T>::value;

static const U Umin = int_min::value;
static const U Umax = int_max::value;

static const bool value = safe_less_equal<R, Rmin, T, Tmin>::value &&
safe_less_equal<R, Rmin, U, Umin>::value &&
safe_less_equal<T, Tmax, R, Rmax>::value &&
safe_less_equal<U, Umax, R, Rmax>::value;
};

template <class T, class U>
struct safe_compare_imp<T, U, false>
{
static const bool value = true;
};

template <class T>
struct safe_compare_imp<T, T, true>
{
static const bool value = true;
};

template <class T>
struct safe_compare_imp<T, T, false>
{
static const bool value = true;
};

template <class T, class U>
struct safe_compare
{
private:
typedef typename std::tr1::remove_cv<typename std::tr1::remove_reference<T>::type>::type Tr;
typedef typename std::tr1::remove_cv<typename std::tr1::remove_reference::type>::type Ur;
public:
static const bool value = detail::safe_compare_imp<Tr, Ur>::value;
};

} // detail

template <class T, class U, bool = detail::safe_compare<T, U>::value>
struct min_max_return {};

template <class T, class U>
struct min_max_return<T&&, U&&, true>
{
typedef typename promote<T&&, U&&>::type type;
};

template <class T, class U>
struct min_max_return<T&&, U&, true>
{
typedef typename promote<T&&, U&>::type type;
};

template <class T, class U>
struct min_max_return<T&, U&&, true>
{
typedef typename promote<T&, U&&>::type type;
};

template <class T, class U>
struct min_max_return<T&, U&, true>
{
private:
typedef typename std::tr1::remove_cv<T>::type Tr;
typedef typename std::tr1::remove_cv::type Ur;
public:
typedef typename detail::min_max_return_helper<T, Tr, U, Ur>::type type;
};

template <class T, class U>
inline
typename min_max_return<T&&, U&&>::type
min(T&& a, U&& b)
{
if (b < a)
return std::forward(b);
return std::forward<T>(a);
}

template <class T, class U, class Compare>
inline
typename min_max_return<T&&, U&&>::type
min(T&& a, U&& b, Compare comp)
{
if (comp(b, a))
return std::forward(b);
return std::forward<T>(a);
}

template <class T, class U>
inline
typename min_max_return<T&&, U&&>::type
max(T&& a, U&& b)
{
if (a < b)
return std::forward(b);
return std::forward<T>(a);
}

template <class T, class U, class Compare>
inline
typename min_max_return<T&&, U&&>::type
max(T&& a, U&& b, Compare comp)
{
if (comp(a, b))
return std::forward(b);
return std::forward<T>(a);
}

Decrypting the CanYouCrackIt MOD puzzle

Awesome piece of hacking/deconstruction, including assembly deconstruction and a virtual machine implementation

http://gchqchallenge.blogspot.com/

Find cpuId current thread is running on

void get_cpu_id(uint32_t& cpu_id)
{
uint32_t leaf = 0x0B;
__asm__ __volatile__("cpuid" : "=d"(cpu_id) : "a"(leaf) : "%rbx", "%rcx");
}

Linux port-scanning

http://www.cyberciti.biz/faq/linux-port-scanning/

Lock-free IPC

http://www.drdobbs.com/cpp/189401457

spinlocks and read/write locks

http://locklessinc.com/articles/locks/

gcc symbol visibility

http://gcc.gnu.org/wiki/Visibility

cpuid - extracting processor features

http://softpixel.com/~cwright/programming/simd/cpuid.php

http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html

High Frequency Trading paper

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1858626

matplotlib - python math plotting library

http://matplotlib.sourceforge.net/

Disruptor concurrent programming framework

http://disruptor.googlecode.com/

c++'s volatile keyword

http://kernel.org/doc/Documentation/volatile-considered-harmful.txt
http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/

sandpile.org: The world's leading source for technical x86 processor information

http://www.sandpile.org/

low latency kernel resources

zabbix: open source enteprise monitoring system

www.zabbix.com

isolcpus - isolate cpus from the kernel scheduler

isolcpus kernel configuration

isolate cpus from the kernel scheduler

useful for removing the critical cores from the scheduler - therefore pinned threads won't be preempted when they block on IO or similar, and therefore that core's data caches won't be trashed by other threads

http://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html

Linux realtime performance articles

Drepper / Red Hat - “What every programmer should know about memory”
http://lwn.net/Articles/259710/

Duval / Red Hat - “From Fast to Predictably Fast”
http://kernel.org/doc/ols/2009/ols2009-pages-79-86.pdf

Lameter / Linux Foundation - “Shoot First and Stop the OS Noise”
http://kernel.org/doc/ols/2009/ols2009-pages-159-168.pdf

Prefaulting the stack

https://rt.wiki.kernel.org/articles/t/h/r/Threaded_RT-application_with_memory_locking_and_stack_handling_example_f48b.html

Prefaulting the heap

https://rt.wiki.kernel.org/index.php/Dynamic_memory_allocation_example

Thursday, 5 January 2012

0MQ

A simple abstraction of the socket api

Intro:
http://lwn.net/Articles/370307/

Guide:
http://zguide.zeromq.org/page:all

FAQ:
http://www.zeromq.org/area:faq

Home page:
http://www.zero.mq

RoCE - RDMA over Converged Ethernet

http://www.mellanox.com/related-docs/prod_software/ConnectX-2_RDMA_RoCE.pdf

RDMA Aware programming user manual:
http://www.mellanox.com/related-docs/prod_software/RDMA_Aware_Programming_user_manual.pdf

Tuesday, 3 January 2012

Network packet capture / analysis: tcpdump

http://www.linuxjournal.com/content/tcpdump-fu

Example: capture anything sent to/from port 7500 on interface eth0 and dump it to a file

$ sudo /usr/sbin/tcpdump -w /tmp/tcpdump.out -s 0 -i eth0 port 7500

We can now use Wireshark to read the file and analyse the packets

To filter the packets based on, for example, a port; use an expression such as tcp.port == 1234

Zero-copy

http://www.linuxjournal.com/article/6345