Sunday, 15 March 2015

Pandas: merge 2 csv files of market data and plot the spread

import pandas as pd
import numpy as np
from datetime import datetime

%matplotlib inline

# load the csv files into pandas
def read_csv(filename):
    return pd.read_csv(
        filename, 
        dtype = {
            'bid_vol'  : np.float64, 
            'bid_price': np.float64, 
            'ask_vol'  : np.float64, 
            'ask_price': np.float64
        }, 
        na_values=['nan'], 
        index_col='time', 
        parse_dates=['time'], 
        date_parser=lambda x: datetime.strptime(x[:15], "%H:%M:%S.%f"))
df1 = read_csv('mkt1.best')
df2 = read_csv('mkt2.best')

# since both dataframes have the same column names for price data we
#   need to create a multiindex using the instrument id
def create_time_and_id_index(dfs):
    # concatenate all the dataframes into one big dataframe
    out = pd.concat(dfs)
    # add 'id' to the index
    out = out.set_index('id', append=True)
    # sort on timestamp
    out = out.sort_index()
    # pivot instr_id from the row index to the column index, leaving only timestamp as the row index
    out = out.unstack()
    # the times in the dataframes may not match up, so when they are combined, pandas will insert NaN
    #  values for the missing rows, so forward fill
    out = out.ffill()
    # reshuffle the column index so that instr_id is the top level, and the other column labels are the second level
    out = out.swaplevel(0, 1, axis=1)
    # resort the column labels so that columns are grouped by instr_id
    out = out.sort_index(axis=1)
    return out

best = create_time_and_id_index([df1, df2])

# create spread columns (assuming the instrument ids in the 'id' column are 'mkt1' and 'mkt2')
best['sell_spread'] = best['mkt2'].ask_price - best['mkt1'].bid_price
best['buy_spread']  = best['mkt1'].ask_price - best['mkt2'].bid_price

# plot the results!
best.plot(y='sell_spread', figsize=(20, 8))

Bash tab completion example

#!/bin/bash

_apps()
{
    echo $(cat ${APPS} | awk '{print $1}' | grep -v -e '^#\|^$')
}

_servers()
{
    echo $(cat ${SERVERS} | awk '{print $2}' | sort -u | cut -f2 -d@)
}

_options()
{
    echo "--help --verbose --validate --quiet --server"
}

_commands()
{
    echo "status start stop restart kill version config"
}

_contains()
{
    local e
    for e in ${@:2}; do
        if [[ "$e" == "$1" ]]; then
            echo 1
            return 0
        fi
    done
    echo 0
    return 1
}

_complete()
{
    local prev_cmd="${COMP_WORDS[COMP_CWORD-1]}"
    local curr_cmd="${COMP_WORDS[COMP_CWORD]}"

    if [[ ${prev_cmd} == "--server" ]]; then
        COMPREPLY=( $(compgen -W "$(_servers)" -- ${curr_cmd}) )
        return 0
    fi

    if [[ ${curr_cmd} == -* ]]; then
        COMPREPLY=( $(compgen -W "$(_options)" -- ${curr_cmd}) )
        return 0
    fi

    # previous command was an app name, so show commands
    if [[ $(_contains "${prev_cmd}" "$(_apps)") -eq 1 ]]; then
        COMPREPLY=( $(compgen -W "$(_commands)" -- ${curr_cmd}) )
        return 0
    fi

    # otherwise try match an app name
    COMPREPLY=( $(compgen -W "$(_apps)" -- ${curr_cmd}) )
}

_main()
{
    complete -F _complete cmd
}
_main
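The completion logic can be exercised outside an interactive shell by setting COMP_WORDS/COMP_CWORD by hand, the way readline does on TAB. This is a self-contained sketch: the APPS/SERVERS files and their contents ('pricer', 'risk', the hosts) are hypothetical sample data, not part of the original script.

```shell
#!/bin/bash
APPS=$(mktemp); SERVERS=$(mktemp)
printf 'pricer  some description\nrisk    another app\n' > "${APPS}"
printf 'pricer user@host1\nrisk user@host2\n' > "${SERVERS}"

_apps()     { echo $(cat ${APPS} | awk '{print $1}' | grep -v -e '^#\|^$'); }
_servers()  { echo $(cat ${SERVERS} | awk '{print $2}' | sort -u | cut -f2 -d@); }
_commands() { echo "status start stop restart kill version config"; }

_contains()
{
    local e
    for e in ${@:2}; do
        if [[ "$e" == "$1" ]]; then echo 1; return 0; fi
    done
    echo 0; return 1
}

_complete()
{
    local prev_cmd="${COMP_WORDS[COMP_CWORD-1]}"
    local curr_cmd="${COMP_WORDS[COMP_CWORD]}"
    if [[ ${prev_cmd} == "--server" ]]; then
        COMPREPLY=( $(compgen -W "$(_servers)" -- ${curr_cmd}) )
    elif [[ $(_contains "${prev_cmd}" "$(_apps)") -eq 1 ]]; then
        COMPREPLY=( $(compgen -W "$(_commands)" -- ${curr_cmd}) )
    else
        COMPREPLY=( $(compgen -W "$(_apps)" -- ${curr_cmd}) )
    fi
}

# simulate TAB after "cmd --server h": expect the server host names
COMP_WORDS=(cmd --server h); COMP_CWORD=2; _complete
SERVER_MATCHES="${COMPREPLY[*]}"

# simulate TAB after "cmd pricer st": expect matching commands
COMP_WORDS=(cmd pricer st); COMP_CWORD=2; _complete
COMMAND_MATCHES="${COMPREPLY[*]}"

echo "server matches:  ${SERVER_MATCHES}"
echo "command matches: ${COMMAND_MATCHES}"
rm -f "${APPS}" "${SERVERS}"
```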

Tuesday, 10 March 2015

C++11 - Unevaluated operands

Operands of sizeof, typeid, decltype and noexcept are never evaluated.

We therefore need only a declaration, not a definition, to use a function's or object's name in these contexts.

std::declval<T>()  returns T&&
std::declval<T&>() returns T&

decltype( foo(std::declval<T>()) ) returns foo's return type when foo is called with T&&

declval allows us to provide a declaration without having to evaluate the expression (ie: in an unevaluated context) - useful for SFINAE etc

Example: testing for copy-assignability

template<class T>
class is_copy_assignable
{
    template<class U, class=decltype(std::declval<U&>()=std::declval<const U&>())>
    static std::true_type try_assignment(U&&);

    template<class U>
    static std::false_type try_assignment(...); // catch-all fallback

public:
    using type = decltype(try_assignment(std::declval<T>()));
};

How this works:

try_assignment(...) will match anything, but is also always the worst match, so if the other try_assignment can match, it will.

type will be the return type of try_assignment, which will either be true_type or false_type

the true_type overload is viable only if the expression declval<U&>() = declval<const U&>() is valid - ie: if U is copy assignable

We use a second template parameter to allow SFINAE to kick in. It is unnamed because we only use it for SFINAE.

Example: testing for copy-assignability, and requiring an lvalue reference return type

The above example doesn't require the copy assignment operator to return an lvalue reference (T&).

If we give the returned type a name via an alias template:

template<class T>
using copy_assignment_t = decltype(std::declval<T&>() = std::declval<const T&>());

We can then check whether that type is T& in a SFINAE specialisation:

template<class T, class=void>
struct is_copy_assignable 
    : std::false_type {};

template<class T>
struct is_copy_assignable<T, void_t<copy_assignment_t<T>>>
    : std::is_same<copy_assignment_t<T>,T&> {};

Monday, 19 January 2015

find files older than today and zip them up with the date as part of the extension

1. find files older than today in a given directory with a given extension

$ find ${DIR} -mtime +1 -name "*.${EXT}"

2. calculate the last modified time (seconds since epoch)

$ MOD_SECS=$(stat -c%Y ${FILE})

3. convert the seconds since epoch into a human readable date format

$ MOD_DATE=$(date +\%Y-\%m-\%d --date="@${MOD_SECS}")

4. create a gzip file with the suffix including the date when the file was last modified

$ gzip -S .${MOD_DATE}.gz ${FILE}

5. putting it all together

for FILE in $(find ${DIR} -mtime +1 -name "*.${EXT}"); do
    MOD_SECS=$(stat -c%Y ${FILE})
    MOD_DATE=$(date +\%Y-\%m-\%d --date="@${MOD_SECS}")
    gzip -S .${MOD_DATE}.gz ${FILE}
done

6. as a script:

#!/bin/bash

if [ "$#" -ne 2 ]; then
    echo "Usage: $0 dir ext"
    exit 1
fi

DIR=$1
EXT=$2

for FILE in $(find ${DIR} -mtime +1 -name "*.${EXT}"); do
    MOD_SECS=$(stat -c%Y ${FILE})
    MOD_DATE=$(date +\%Y-\%m-\%d --date="@${MOD_SECS}")
    gzip -S .${MOD_DATE}.gz ${FILE}
done

Tuesday, 6 January 2015

bash command line parsing

We want to be able to mix optional flags, optional arguments and positional arguments:

optional flags: a getopts character not followed by a ':'
optional arguments: a getopts character followed by a ':' (which means "takes an argument")
positional arguments: after the getopts loop, use $OPTIND, which is the index of the first argument after the options getopts parsed.

$ script.sh [options] ARG1 ARG2

#!/bin/bash

usage()
{
    echo "Usage: $0 [-a foo] [-b] ARG1 ARG2" 1>&2
    exit 1
}

while getopts ":a:bh" o; do
    case "${o}" in
        a) a=${OPTARG};;
        b) b=YES;; # turn on flag
        h) usage ;; # display help
        *) usage ;; # unknown option
    esac
done

# store positional arguments
ARG1=${@:$OPTIND:1}
ARG2=${@:$OPTIND+1:1}

# check positional arguments have been supplied
if [ -z "${ARG1}" ] || [ -z "${ARG2}" ]; then
    usage
fi

# display the results
echo a=${a}
echo b=${b}
echo ARG1=${ARG1}
echo ARG2=${ARG2}
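As a usage run, the parser above can be exercised by writing it to a temporary script and invoking it the way a user would (the file name and arguments here are made up):

```shell
#!/bin/bash
SCRIPT=$(mktemp)
cat > "${SCRIPT}" <<'EOF'
#!/bin/bash
usage() { echo "Usage: $0 [-a foo] [-b] ARG1 ARG2" 1>&2; exit 1; }
while getopts ":a:bh" o; do
    case "${o}" in
        a) a=${OPTARG};;
        b) b=YES;;
        h) usage;;
        *) usage;;
    esac
done
ARG1=${@:$OPTIND:1}
ARG2=${@:$OPTIND+1:1}
if [ -z "${ARG1}" ] || [ -z "${ARG2}" ]; then usage; fi
echo a=${a}
echo b=${b}
echo ARG1=${ARG1}
echo ARG2=${ARG2}
EOF

# prints a=foo, b=YES, ARG1=one, ARG2=two on separate lines
OUT=$(bash "${SCRIPT}" -a foo -b one two)
echo "${OUT}"
rm -f "${SCRIPT}"
```

After the loop, OPTIND points just past "-a foo -b", so ${@:$OPTIND:1} picks up the first positional argument.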

Saturday, 22 November 2014

ssh tunnel

Create a tunnel to a remote host via a gateway

    ssh -L <local-port-to-listen>:<destination-host>:<destination-port> <gateway_user>@<gateway>

This opens a local port listening for traffic on <local-port-to-listen> and forwards that traffic via the gateway (user@gateway) to the remote destination <destination-host>:<destination-port>

-f executes ssh in the background
-N means no remote command (ie: just create a tunnel)

    ssh -N -f -L 8080:destination:8080 user@gateway

Pointing your browser to localhost:8080 will connect to the ssh tunnel which forwards the data to destination:8080, going via the gateway

Tuesday, 18 November 2014

Multicast troubleshooting


Check that the interface is configured with multicast:

$ ifconfig eth9.240
eth9.240 Link encap:Ethernet HWaddr 00:60:DD:44:67:9E
inet addr:10.185.131.41 Bcast:10.185.131.63 Mask:255.255.255.224
inet6 addr: fe80::260:ddff:fe44:679e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1


Check that the multicast addresses you are subscribing to have a route to that particular interface:

$ ip route
224.0.0.0/4 dev eth9.240 scope link


Run your application and check if the subscriptions are going to the correct interface:

$ netstat -g
IPv6/IPv4 Group Memberships
Interface RefCnt Group
[...]
eth9.240 1 239.1.127.215
eth9.240 1 239.1.1.215


Run tcpdump and check that you are indeed receiving traffic. Do this while your application is running; otherwise the IGMP subscription will not be active.

$ tcpdump -i eth9.240
10:15:13.385228 IP 10.0.8.121.45666 > 239.1.1.1.51001: UDP, length 16


If you got to the tcpdump part, the networking should be OK.

If your application is still not receiving packets, it is probably because of rp_filter (the reverse path filter) in Linux.
The rp_filter drops any packet whose source address is not routable via the interface it arrived on. In the example above, 10.0.8.121 may not be routable via eth9.240, so the solution is to disable the filter:

Check the current filter setting
$ cat /proc/sys/net/ipv4/conf/eth9.240/rp_filter

add this line to /etc/sysctl.conf (sysctl keys use '/' in place of the '.' in VLAN interface names) and reload
    net.ipv4.conf.eth9/240.rp_filter = 0
$ sudo sysctl -p


Check that it took effect
$ sysctl -a | grep "eth9/240.rp_filter"