Monday, 25 June 2012

Using strace & top to debug a multi threaded app

Displays all the system calls made by each thread (-f follows threads and child processes as they are created)

    strace -f ./AppName
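
When the combined trace gets long, a per-thread tally can make it easier to see what each thread is busy with. The following is only a sketch: it assumes the trace was written to a file with `strace -f -o trace.log ./AppName`, in which case each line should start with the thread id. The names `count_syscalls` and `trace.log` are made up for this example.

```shell
# Tally system calls per thread from a trace written with
#   strace -f -o trace.log ./AppName
# Assumption: every line starts with the thread id, followed by "name(args...".
count_syscalls() {
    awk '
        # Keep only lines whose second field looks like "name(";
        # this skips "<... resumed>", signal and exit lines.
        $2 ~ /^[a-z_][a-z0-9_]*\(/ {
            split($2, part, "(")
            count[$1 " " part[1]]++
        }
        END {
            # One line per (thread id, syscall) pair: "tid name count"
            for (key in count) print key, count[key]
        }
    ' "$1" | sort -k1,1n -k2,2
}
```

Running `count_syscalls trace.log` prints one line per thread id and system call name, followed by how many times that call started in that thread.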

Displays all threads for the pid in question.

    top -p pid -H

Press f, then j, in top's interactive mode to show the P column (the cpu each thread last ran on)

You can get the same information from the /proc/pid filesystem.
This displays each thread and the last cpu it ran on (field 39 of each thread's stat file, per proc(5))

    /proc/pid/task $ for i in *; do awk '{print $1 " is on " $39}' $i/stat; done
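
Counting fields in a stat file is fragile when a thread's name contains spaces, because the name is field 2 and shifts everything after it. Here is a sketch that works around this, assuming the processor number is field 39 as documented in proc(5). The function names `last_cpu` and `thread_cpus` are made up for this example.

```shell
# Print the last-used CPU recorded in one /proc/<pid>/task/<tid>/stat file.
# Field 2 (the command name) may itself contain spaces, so drop everything up
# to the final ") " first; field 39 of the full line (processor, per proc(5))
# is then field 37 of what remains.
last_cpu() {
    sed 's/^.*) //' "$1" | awk '{ print $37 }'
}

# Print "<tid> is on <cpu>" for every thread of the given pid.
thread_cpus() {
    for stat in /proc/"$1"/task/*/stat; do
        [ -r "$stat" ] || continue      # thread may have exited meanwhile
        tid=${stat%/stat}               # strip the trailing "/stat"
        tid=${tid##*/}                  # keep only the last path component
        echo "$tid is on $(last_cpu "$stat")"
    done
}
```

For example, `thread_cpus 1234` lists every thread of pid 1234 together with the cpu it last ran on.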

Wednesday, 20 June 2012

cset set/proc - finer control of cpusets

http://code.google.com/p/cpuset/

Set
Create, adjust, rename, move and destroy cpusets

Commands
Create a cpuset using cpus 1-3 and NUMA memory node 1, and call it "my_cpuset1"

    $ cset set --cpu=1-3 --mem=1 --set=my_cpuset1

Change "my_cpuset1" to only use cpus 1 and 3

    $ cset set --cpu=1,3 --mem=1 --set=my_cpuset1

Destroy a cpuset

    $ cset set --destroy --set=my_cpuset1

Rename an existing cpuset

    $ cset set --set=my_cpuset1 --newname=your_cpuset1

Create a hierarchical cpuset

    $ cset set --cpu=3 --mem=1 --set=my_cpuset1/my_subset1

List existing cpusets (depth of level 1)

    $ cset set --list

List existing cpuset and its children

    $ cset set --list --set=my_cpuset1

List all existing cpusets

    $ cset set --list --recurse

Proc
Manage threads and processes

Commands
List tasks running in a cpuset

    $ cset proc --list --set=my_cpuset1 --verbose

Execute a task in a cpuset

    $ cset proc --set=my_cpuset1 --exec myApp -- --arg1 --arg2

Move a task (a single pid, a comma-separated list, or a range)

    $ cset proc --toset=my_cpuset1 --move --pid 1234
    $ cset proc --toset=my_cpuset1 --move --pid 1234,1236
    $ cset proc --toset=my_cpuset1 --move --pid 1238-1340

Move a task and all its sibling threads

    $ cset proc --move --toset=my_cpuset1 --pid 1234 --threads

Move all tasks from one cpuset to another

    $ cset proc --move --fromset=my_cpuset1 --toset=system

Move unpinned kernel threads into a cpuset

    $ cset proc --kthread --fromset=root --toset=system

Forcibly move kernel threads, including those pinned to a specific cpu, into a cpuset. Note: this may have dire consequences for the system, so make sure you know what you're doing.

    $ cset proc --kthread --fromset=root --toset=system --force
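
Before forcing a move it can help to see which kernel threads are pinned. This is a rough sketch, assuming that kernel threads can be recognised as children of kthreadd (pid 2) and that /proc/pid/status carries a Cpus_allowed_list line. The function names are made up for this example.

```shell
# Print the value of one "Key:" line from a /proc/<pid>/status style file.
status_field() {
    sed -n "s/^$2:[[:space:]]*//p" "$1"
}

# List kernel threads as "<pid> <name> <allowed cpus>".
# Assumption: kernel threads are the children of kthreadd (pid 2).
kthread_affinity() {
    for status in /proc/[0-9]*/status; do
        [ -r "$status" ] || continue    # process may have exited meanwhile
        [ "$(status_field "$status" PPid)" = "2" ] || continue
        echo "$(status_field "$status" Pid) $(status_field "$status" Name) $(status_field "$status" Cpus_allowed_list)"
    done
}
```

A thread whose allowed list is a single cpu (for example `ksoftirqd/1` with `1`) is pinned, and is one of those that only the --force variant would move.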

Hierarchy
Using hierarchical cpusets to create prioritised groupings

Example
1. Create a system cpuset with 1 cpu (0)
2. Create a prio_low cpuset with 1 cpu (1)
3. Create a prio_med cpuset with 2 cpus (1-2)
4. Create a prio_high cpuset with 3 cpus (1-3)
5. Create a prio_all cpuset with all 4 cpus (0-3) (note this is the same as root; it is considered good practice to keep a separation from root)

To achieve the above you create prio_all, and then create subset prio_high under prio_all, etc

    $ cset set --cpu=0 --set=system
    $ cset set --cpu=0-3 --set=prio_all
    $ cset set --cpu=1-3 --set=/prio_all/prio_high
    $ cset set --cpu=1-2 --set=/prio_all/prio_high/prio_med
    $ cset set --cpu=1 --set=/prio_all/prio_high/prio_med/prio_low
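
The nesting works because each child's cpus are a subset of its parent's, which the kernel requires of cpusets. A small sketch for checking that before creating a subset follows. `expand_cpus` and `is_subset` are made-up helper names, and only simple lists such as "1", "1-3" or "0,2-3" are assumed.

```shell
# Expand a cpu list such as "0,2-3" into one cpu number per line.
expand_cpus() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        last=${last:-$first}            # a single cpu is a range of one
        i=$first
        while [ "$i" -le "$last" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done
}

# Succeed if every cpu in list $1 also appears in list $2.
is_subset() {
    for cpu in $(expand_cpus "$1"); do
        expand_cpus "$2" | grep -qx "$cpu" || return 1
    done
    return 0
}
```

For example, `is_subset 1-2 1-3 && cset set --cpu=1-2 --set=/prio_all/prio_high/prio_med` only attempts to create prio_med if its cpus fit inside prio_high.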

cset shield - easily configure cpusets

http://code.google.com/p/cpuset/

shield

Basic concept - 3 cpusets
root: present in all configurations and contains all cpus (unshielded)
system: contains cpus used for system tasks - the ones which need to run but aren't "important" (unshielded)
user: contains cpus used for "important" tasks - the ones we want to run in "realtime" mode (shielded)

The shield command manages these 3 cpusets.

During setup it moves all movable tasks into the unshielded cpuset (system), and during teardown it moves all movable tasks back into the root cpuset.
After setup, the subcommand lets you move tasks into the shield (user) cpuset and, additionally, move special tasks (kernel threads) from root to system.

Commands:
Create a shield. Example: on a 4-core non-NUMA machine we want to dedicate 3 cores to the shield and leave 1 core for unimportant tasks. Since the machine is non-NUMA we don't need to specify any memory node parameters, and we leave the kernel threads running in the root cpuset.

    $ cset shield --cpu 1-3

Some kernel threads (those which aren't bound to specific cpus) can be moved into the system cpuset. (In general it is not a good idea to move kernel threads which have been bound to a specific cpu)

    $ cset shield --kthread on

List what's running in the shield (user) or in the unshielded cpuset (system). Use -v for verbose output that lists the process names, and a second -v to display more than 80 characters.

    $ cset shield --shield -v
    $ cset shield --unshield -v -v

Stop the shield (teardown)

    $ cset shield --reset

Execute a process in the shield (arguments following '--' are passed to the command being executed, not to cset)

    $ cset shield --exec mycommand -- -arg1 -arg2

Move a running process into the shield. Move multiple processes by passing a comma-separated list or ranges; any process in a range will be moved, even if there are gaps.

    $ cset shield --shield --pid 1234
    $ cset shield --shield --pid 1234,1236
    $ cset shield --shield --pid 1234,1237,1238-1240
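
To confirm that a move took effect, the kernel reports each process's current cpuset in /proc/pid/cpuset. Here is a minimal sketch, assuming the shielded cpuset keeps its default name "user". The function names and the PROC_ROOT variable are made up for this example.

```shell
# Print the cpuset a process currently belongs to, e.g. "/user".
# PROC_ROOT is a hypothetical override (only there so the function can be
# pointed at a test directory); it defaults to the real /proc.
cpuset_of() {
    cat "${PROC_ROOT:-/proc}/$1/cpuset"
}

# Succeed if the process is in the shield.
# Assumption: the shielded cpuset has the default name "user".
in_shield() {
    [ "$(cpuset_of "$1" 2>/dev/null)" = "/user" ]
}
```

For example, `in_shield 1234 && echo "1234 is shielded"` checks the first pid moved above.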

Monday, 11 June 2012

Dr Dobb's Go Parallel

A multicore application performance blog hosted on Dr Dobb's:

http://www.drdobbs.com/go-parallel

Folly benchmarking

Facebook recently open sourced an internal C++ library called Folly.

It includes an interesting benchmarking component.

Check it out here: https://github.com/facebook/folly/blob/master/folly/docs/Benchmark.md