TakTuk : a parallel, scalable launcher for cluster and “lightweight”grids

KaRun Tools: rshp and mput (built over TakTuk)



News Description Design Publications Characteristics Usage Install downloads



Last modified: Friday, July 2, 2004

News:

Description:

Taktuk is a parallel and scalable remote execution tool for cluster. It works by propagating the execution of a parallel program on all target nodes, using standard remote execution protocols (rsh, ssh, etc...). Remote call scheduling automatically adapts its behavior to the remote execution protocol used, and to the load of the network and remote hosts. This tool is completly independent of the remote protocol used. All remote execution protocol providing an IO redirection of the remote process launched may be used. A grammar may be optionally provided to describe the environment, hence providing increased overall deployment speed in the context of a complex topology (for instance in a grid environment). For a more responsive deployment a remote launch (rsh ssh or other) delay can be bounded to bypass slow nodes and ignore the specific timeout provided by the protocol used. Taktuk provides full IO and signal redirection to the original console user.

Experimental results give very good response times, in the order of 500 milliseconds for a 225-nodes cluster (Pentium III 733 interconnected by Ethernet 100) to launch a null time command (Unix command test with no arguments). Performance come from the good scheduling of the remote execution call. This schedule is based on a work stealing algorithm. When a node is up, it request a “to be launch” list to its father (in the tree). The schedule of remote execution calls automatically adapts to the protocol used, as well as to the load of network and remote hosts.

Taktuk is composed of a library that provides a broadcast execution service to the program who use it, the launch step is integrated in the library.
This Library was used in the Ka-run project (rshp and mput tools). Ka-Run is now included in this package.

Rshp is used to launch any shell command on a set of nodes.
Mput is used to braodcast files on a set of nodes.

Problematics:

Used by:

Performances and Capabilities:

Design:

This tool is scalable, extensible and portable (independent of the remote execution protocol used -- rsh, ssh, etc... --). It does an efficient deployment and initializes a communication layer on large clusters. To reach good performances a parallelized solution is adopted. There is two way to parallelize the deployment of a parallel application: first it's possible to execute many remote execution calls concurrently. This is a local solution. If standard protocol clients are used with many processes, this approach is bounded by system's limits. Currently many operating systems permit only a limited pool of processes for a user. The second solution is to distribute the deployment on several nodes, i.e build a launch spanning tree. This is a distributed solution. These two solutions are mixed and correctly interleaved to reach the best behavior.

Taktuk is composed of five modules : topology description, external connectors (standard remote execution protocol like: rsh, ssh, etc...), remote execution scheduler, communication layer (based on active message paradigm) and input/output/signal redirection modules.

For local concurrent remote execution calls , we use parallel clients of REP (Remote Execution Protocol). It has been done only for simple protocols like rsh (parallelized with asynchronous socket layer or with threads). This solution is efficient but needs a parallel implantation of the client-side protocol. For complex protocols and for better extensibility and portability, we concurrently fork and execute existing remote execution protocol clients (rsh, ssh, rexec, bexec, rex, etc...). In the second parallel method used --the distributed solution--, we exploit the number of nodes. The “to be launched” list is distributed on all already reached nodes. In this case, all these nodes participate to the launch phase. Once this launch step is done we obtain a “launching” spanning tree on all requested nodes.
All standard remote execution protocols provide a redirection of all IO of the remote process on the client protocol part. A bidirectional link is created by the protocol to redirect these IO. These links are used in Taktuk to build a communication layer between all started nodes. The topology of this virtual interconnect is based on the spanning tree used for the deployment. To establish a complete interconnection a thread is dedicated to a logical router on each nodes. This routing thread is also used to execute remote function calls (like a RPC) requested by other nodes (active messages paradigm).
To address a hierarchical cluster or a grid Taktuk include a description grammar. It's possible to specify more global or specific node informations like connectors (rsh, ssh, etc...) to reach it, specific timeout, user name, sub group nodes in the tree. This description may be easily extended to deal with specific grid constraints.
All process inputs and outputs are locally redirected to an IO thread, which forwards IOs (with active messages) to the tree root for the outputs. The root node broadcasts the inputs collected  to all other nodes (signals are redirected in the same way).

Publications on Taktuk (Previously called Spread):

Characteristics:

Example: helloworld

void main (argc,argv)
{
Taktuk T;
T.initialize(argc,argv);
T.commit();
printf(“hello world !!”);
T.terminate();
}

This code will be executed on all nodes passed as arguments (see previous section).
And I/O of this program will be redirected to the user's terminal, the same way it does with a “normal” program.

helloworld -crsh -m host1 -mhost2 -f file_of_hostname -- helloworld's arguments
This tool can use different remote execution protocol like rsh or ssh



Usage: (This is valid for all applications linked with Taktuk. Replace “taktuk” by the application's name below)

taktuk [[-a Arity] || [-d [Limit]]] [-i] [-v] [-P] [-c Connector] [-l user] [-f filename] || [-m hostname || -m host=XXX,connect=XXX,login=XXX... ]...] [ -[ ALL previous options -[ -] included -] ] -- arguments ...

list of connectors with suid bit on taktuk:

rshs (sequential)

rsha (asynchronous)

rsht (threaded: one thread by remote connection. Only during connection creation, they are destroyed after the launch step)

list of user level connectors:

rsh (forked system “rsh” call)

ssh (forked system “ssh” call)

fork (for SMP)

env (use the command define in the TAKTUK_RSH environment variable)

ssf (deprecated, old French ssh protocol)

This option can be increase with suboptions: connect=protocol,timeout=XXX (in msec)



Quick installation guide:

0/ untar the archive in a temporary repertory
tar -xzvf Taktuk2_dist.tgz

I/ Configuration of the library
Go to the Inuktitut directory just created.
Run './configure --help' to have information about switch that you may
need to configure Taktuk depending on your system.

> ./configure --prefix=/usr/local/

That way configure the library to be installed in '/usr/local/'
using the default C++ compiler ('g++').

II/ Compilation of the source files
Run 'make' to compile all the source files

III/ Installation of the library
Run 'make install' to finish the installation into the choosen directory

IV/ Generate documentation
Run 'make doclib' to extract documentation from source files,
this documentation is in html and is copied in the
“html” subdirectory of the installation directory.
Note that, it requires to have doxygen installed on your system
(http://www.stack.nl/~dimitri/doxygen/).

Warning: Documentation is broken in current release

V/ Clean-up the distribution
In order to cleanup some files generated during the compilation,
run 'make clean'

In order to cleanup all files generated during the compilation and
the configuration of the library (config.cache, config.status),
run 'make distclean'

VI/ rshp tool : parallel remote execution launcher
After the install step, binary is placed in the install_dir/bin

CAUTION: to have rshp works you must configure your host(s)
- the tuk program is launched on all nodes, so if tuk is not set in the PATH variable you must launch it with the full binary name:
example: /usr/local/bin/rshp .....
- the rshp binary must be on all host you want reach
- tuk can use all connector listed below
- to that you must configure your host to perform a rsh (or else) to the remote host without password prompt
- if you want use optimal internal connectors rshp must be set with suid bit:
chown root.root rshp
chmod +s rshp
- thats all folks

Report bugs to Cyrille.Martin@imag.fr