KaRun Tools: rshp and mput (built over TakTuk)
News Description Design Publications Characteristics Usage Install downloads
Last
modified:
deb package taktuk2-0.5-1.deb
New version taktuk2-0.5.tgz :
Package rebuild : Library and tools install
Mput new version : seems stable (in progress)
New parser for sub options
Heterogeneous platforms ia64 / ia32
Grid capacity
Internal version Taktuk2 v0.5. Warning not compatible with older version (version checking)
Renaming tarball: new version are now named Taktuk2_date.tgz
Merge of Taktuk (the library) and KaRun (binaries built over Taktuk) in the same package.
Version check (ensure all Taktuk binaries deployed on a cluster use the same version of libTakTuk)
Taktuk is a parallel and scalable remote execution tool for cluster. It works by propagating the execution of a parallel program on all target nodes, using standard remote execution protocols (rsh, ssh, etc...). Remote call scheduling automatically adapts its behavior to the remote execution protocol used, and to the load of the network and remote hosts. This tool is completly independent of the remote protocol used. All remote execution protocol providing an IO redirection of the remote process launched may be used. A grammar may be optionally provided to describe the environment, hence providing increased overall deployment speed in the context of a complex topology (for instance in a grid environment). For a more responsive deployment a remote launch (rsh ssh or other) delay can be bounded to bypass slow nodes and ignore the specific timeout provided by the protocol used. Taktuk provides full IO and signal redirection to the original console user.
Experimental results give very good response times, in the order of 500 milliseconds for a 225-nodes cluster (Pentium III 733 interconnected by Ethernet 100) to launch a null time command (Unix command test with no arguments). Performance come from the good scheduling of the remote execution call. This schedule is based on a work stealing algorithm. When a node is up, it request a “to be launch” list to its father (in the tree). The schedule of remote execution calls automatically adapts to the protocol used, as well as to the load of network and remote hosts.
Taktuk is composed of a library that
provides a broadcast execution service to the program who use it, the
launch step is integrated in the library.
This Library was used
in the Ka-run project
(rshp and mput tools). Ka-Run is now included in this
package.
Rshp is used to launch any shell command on a
set of nodes.
Mput is used to braodcast files on a set of
nodes.
Problematics:
Efficient and Scalable deployment.
Remote execution protocol independant.
Provide a communication layer between different nodes.
Deal with “lightweight” grids.
Control and redirect IO of remote processes.
Used by:
Ka-run (rshp and mput) distributed in the Clic Mandrake distribution NOW INCLUDED IN BIN DIRECTORY OF TAKTUK PACKAGE
OAR (Ressource management system for large cluster)
Inuktitut framework to deploy parallel applications built with.
Performances and Capabilities:
Less than one second to execute a simple Unix command on 200 nodes (obtain with rshp with “rsh” protocol).
High scalability and low system overload (tested with 3000 virtual nodes on a cluster composed of 200 real nodes)
This tool is scalable, extensible and portable (independent of the remote execution protocol used -- rsh, ssh, etc... --). It does an efficient deployment and initializes a communication layer on large clusters. To reach good performances a parallelized solution is adopted. There is two way to parallelize the deployment of a parallel application: first it's possible to execute many remote execution calls concurrently. This is a local solution. If standard protocol clients are used with many processes, this approach is bounded by system's limits. Currently many operating systems permit only a limited pool of processes for a user. The second solution is to distribute the deployment on several nodes, i.e build a launch spanning tree. This is a distributed solution. These two solutions are mixed and correctly interleaved to reach the best behavior.
Taktuk is composed of five modules : topology description, external connectors (standard remote execution protocol like: rsh, ssh, etc...), remote execution scheduler, communication layer (based on active message paradigm) and input/output/signal redirection modules.
For local concurrent remote execution
calls , we use parallel clients of REP (Remote Execution Protocol).
It has been done only for simple protocols like rsh (parallelized
with asynchronous socket layer or with threads). This solution is
efficient but needs a parallel implantation of the client-side
protocol. For complex protocols and for better extensibility and
portability, we concurrently fork and execute existing remote
execution protocol clients (rsh, ssh, rexec, bexec, rex, etc...). In
the second parallel method used --the distributed solution--, we
exploit the number of nodes. The “to be launched” list is
distributed on all already reached nodes. In this case, all these
nodes participate to the launch phase. Once this launch step is done
we obtain a “launching” spanning tree on all requested
nodes.
All standard remote execution protocols provide a
redirection of all IO of the remote process on the client protocol
part. A bidirectional link is created by the protocol to redirect
these IO. These links are used in Taktuk to build a communication
layer between all started nodes. The topology of this virtual
interconnect is based on the spanning tree used for the deployment.
To establish a complete interconnection a thread is dedicated to a
logical router on each nodes. This routing thread is also used to
execute remote function calls (like a RPC) requested by other nodes
(active messages paradigm).
To address a hierarchical cluster or
a grid Taktuk include a description grammar. It's possible to specify
more global or specific node informations like connectors (rsh, ssh,
etc...) to reach it, specific timeout, user name, sub group nodes in
the tree. This description may be easily extended to deal with
specific grid constraints.
All process inputs and outputs are
locally redirected to an IO thread, which forwards IOs (with active
messages) to the tree root for the outputs. The root node broadcasts
the inputs collected to all other nodes (signals are redirected
in the same way).
Parallel launcher for clusters of PCs, Cyrille Martin and Olivier Richard, Parco 2001.
Scalable monitoring and configuration tools for grids and clusters, P. Augerat, C. Martin, B. Stein, 10th Euromicro Workshop on Parallel, Distributed and Network-Based Processing, 2002.
French publications here
C++ Library
Included in the Inuktitut project:
Support large clusters and “lightweight” grids
Simple program use:
Example: helloworld
void
main (argc,argv)
{
Taktuk
T;
T.initialize(argc,argv);
T.commit();
printf(“hello
world !!”);
T.terminate();
}
This code will be
executed on all nodes passed as arguments (see previous section).
And I/O of this program will be redirected to the user's
terminal, the same way it does with a “normal” program.
Simple call of the binary:
helloworld
-crsh -m host1 -mhost2 -f file_of_hostname -- helloworld's
arguments
This tool can use different remote execution protocol
like rsh or ssh
taktuk [[-a Arity] || [-d [Limit]]] [-i] [-v] [-P] [-c Connector] [-l user] [-f filename] || [-m hostname || -m host=XXX,connect=XXX,login=XXX... ]...] [ -[ ALL previous options -[ -] included -] ] -- arguments ...
-a: arity of the spanning tree, by default is the number of nodes (obsolete for test only)
-d: Dynamic launcher, a “work stealing” algorithm is applied, with a task being a node to launch. All nodes ready to work steal tasks from their hierarchicaly parent nodes. Limit is an integer, when it's reached, a node waits for its current work to terminate before it steals another one.
-l: username on the remote host
-c: specify the connector you want use to connect remote host (rexec, rsh, other...)
list of connectors with suid bit on taktuk:
rshs (sequential)
rsha (asynchronous)
rsht (threaded: one thread by remote connection. Only during connection creation, they are destroyed after the launch step)
list of user level connectors:
rsh (forked system “rsh” call)
ssh (forked system “ssh” call)
fork (for SMP)
env (use the command define in the TAKTUK_RSH environment variable)
ssf (deprecated, old French ssh protocol)
This option can be increase with suboptions: connect=protocol,timeout=XXX (in msec)
-f: filename, file containing a list of host, this option can be used many times and mixed with the -m option.
-i: let fd opened by application when the exec command is used. WARNING for developper only! Don't use it if you don't know what you do ;-)
-m: specify one hostname
-m ... : this option can be increase with suboptions:
host=hostname
connect=connector_name, which connector we must use for this node
login=log_name, login for this node
-[ and -]: are delimiter for childrens specification of one node. Specifie it after a “-m” option
-v: verbose mode (all output and errput are tagged with the node's name and ID)
-P: profiling mode: execution time
0/ untar the
archive in a temporary repertory
tar -xzvf
Taktuk2_dist.tgz
I/ Configuration of the library
Go to the Inuktitut directory just created.
Run './configure
--help' to have information about switch that you may
need to
configure Taktuk depending on your system.
> ./configure
--prefix=/usr/local/
That way configure the library to be
installed in '/usr/local/'
using the default C++ compiler
('g++').
II/ Compilation of the source files
Run 'make' to compile all the source files
III/
Installation of the library
Run 'make install' to finish
the installation into the choosen directory
IV/
Generate documentation
Run 'make doclib' to extract
documentation from source files,
this documentation is in html
and is copied in the
“html” subdirectory of the
installation directory.
Note that, it requires to have doxygen
installed on your system
(http://www.stack.nl/~dimitri/doxygen/).
Warning: Documentation is broken in current release
V/ Clean-up the
distribution
In order to cleanup some files generated
during the compilation,
run 'make clean'
In order to
cleanup all files generated during the compilation and
the
configuration of the library (config.cache, config.status),
run
'make distclean'
VI/ rshp tool : parallel remote
execution launcher
After the install step, binary is
placed in the install_dir/bin
CAUTION: to have rshp works you
must configure your host(s)
- the tuk program is launched on all
nodes, so if tuk is not set in the PATH variable you must launch it
with the full binary name:
example: /usr/local/bin/rshp .....
-
the rshp binary must be on all host you want
reach
- tuk can use all connector listed below
-
to that you must configure your host to
perform a rsh (or else) to the remote host without password prompt
- if you want use optimal internal connectors rshp must be set
with suid bit:
chown root.root rshp
chmod +s rshp
- thats
all folks