ParMa2: porting VIA in LAM/MPI
What is VIA ?
VIA (Virtual Interface Architecture) is a communication protocol designed
to drastically reduce network latency. It does so by giving user processes
direct access to the network interface, avoiding kernel intervention and
intermediate copies of data.
The VIA specification is an industry standard (the project is led by Compaq,
Intel and Microsoft) and is independent of the operating system
and the hardware architecture.
VIA Home Page
Developer:
VI Architecture - Implementor Guide Text
What is M-VIA ?
M-VIA is a VIA implementation for Linux developed at NERSC.
So far we have tested all three beta releases of M-VIA (0.9b1, 0.9b2 and 0.9.3) and the first stable release, M-VIA 1.0, which supports the following network adapters:
- Ethernet/Fast Ethernet DEC 21x4x (Tulip) based cards.
- Intel Pro/100 Fast Ethernet cards.
- Packet Engines GNIC-I (Yellowfin) Gigabit Ethernet cards.
- Packet Engines GNIC-II (Hamachi) Gigabit Ethernet cards.
M-VIA 1.0 requires a Linux 2.2.x kernel.
Benchmark results
The following chart shows the latency improvement we obtained on
our cluster of four dual Pentium II 450 MHz machines.
TCP and UDP performance was measured with the Netperf benchmark suite, using the TCP_RR test, which sends 1-byte messages.
The M-VIA result was obtained with the vpingpong test provided in the M-VIA
distribution, using M-VIA 0.9b1 and Linux kernel 2.1.125.
M-VIA
Since M-VIA 1.0 was released on 25 September, it may be interesting to test
its performance:
- Switched Fast Ethernet performance, comparing the DEC 21140 and DEC 21143
network adapters and older M-VIA releases.
- Back-to-back performance.
- Communication performance with different data alignments.
- Connection: we were not able to connect more than two processes at a time correctly without looping the procedure; a connection failure could not be recovered by repeating the procedure; and sometimes the client was not notified of the connection failure.
- VipRegisterMem performance for different data lengths and alignments, compared with memcpy performance, to understand which approach (registration or memory copy) is better when communicating.
- Virtual Interface behaviour on asynchronous errors: while waiting for M-VIA to support reliable delivery, we are interested in detecting and possibly recovering from transmission errors.
We would also like to know the error rate of the current release and to discover the causes of errors, e.g. overflow of the network adapter buffers or of the VIA data structures.
If errors can be detected reliably, we may consider implementing an error recovery routine.
VIA Related Projects
We are now interested in improving the performance of parallel programs written for MPI. The low latency of M-VIA makes it possible to reduce inter-process communication time. Therefore, in December 1998 we began introducing a new communication layer within the LAM/MPI library.
Porting VIA within LAM-MPI: work done
Complete MPI communication function support
The current VIA-based release of LAM supports all the basic sets of communication functions:
- Send modes: standard send, synchronous send, buffered send, ready send.
- Non-blocking primitives.
- Tag and communicator control on messages.
- MPI_Probe and the non-blocking MPI_Iprobe, used to read a matching envelope.
- Support for receive from any process: MPI_Recv(..,MPI_ANY_SOURCE,..).
Specific data structures
These are the fundamental data structures used by our VIA layer:
- A set of linked lists, one for each peer, recording the location in memory of received messages that have not yet been read.
- One linked list for the global ordering of all messages.
- A linked list for self communication.
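A minimal sketch in C of how such lists could be organised. All names here (msg_node, via_queues, enqueue_msg, match_msg, MAX_PEERS, ANY_SOURCE) are hypothetical and are not taken from the LAM sources; the self-communication list is left out for brevity. Each unread message is linked both into its peer's list and into the global arrival-order list, so a receive from a specific source walks one peer list, while a receive from any source walks the global list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_PEERS  8     /* assumption: small, fixed number of peers */
#define ANY_SOURCE (-1)  /* stand-in for MPI_ANY_SOURCE */

/* One received message that has not been read yet. */
typedef struct msg_node {
    int src, tag;
    void *data;                              /* location of the payload */
    struct msg_node *peer_prev, *peer_next;  /* list of this peer's messages */
    struct msg_node *glob_prev, *glob_next;  /* global arrival order */
} msg_node;

typedef struct {
    msg_node *peer_head[MAX_PEERS], *peer_tail[MAX_PEERS];
    msg_node *glob_head, *glob_tail;
} via_queues;

/* Append a newly arrived message to its peer list and to the global list. */
msg_node *enqueue_msg(via_queues *q, int src, int tag, void *data)
{
    assert(q != NULL);
    if (src < 0 || src >= MAX_PEERS) return NULL;
    msg_node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->src = src; n->tag = tag; n->data = data;

    n->peer_prev = q->peer_tail[src];
    if (q->peer_tail[src]) q->peer_tail[src]->peer_next = n;
    else q->peer_head[src] = n;
    q->peer_tail[src] = n;

    n->glob_prev = q->glob_tail;
    if (q->glob_tail) q->glob_tail->glob_next = n;
    else q->glob_head = n;
    q->glob_tail = n;
    return n;
}

/* Unlink and return the oldest message matching (src, tag), or NULL.
   The caller owns the returned node and must free() it. */
msg_node *match_msg(via_queues *q, int src, int tag)
{
    msg_node *n;
    assert(q != NULL);
    if (src == ANY_SOURCE) n = q->glob_head;
    else if (src >= 0 && src < MAX_PEERS) n = q->peer_head[src];
    else return NULL;

    while (n && n->tag != tag)
        n = (src == ANY_SOURCE) ? n->glob_next : n->peer_next;
    if (n == NULL) return NULL;

    /* Unlink from the peer list. */
    if (n->peer_prev) n->peer_prev->peer_next = n->peer_next;
    else q->peer_head[n->src] = n->peer_next;
    if (n->peer_next) n->peer_next->peer_prev = n->peer_prev;
    else q->peer_tail[n->src] = n->peer_prev;

    /* Unlink from the global list. */
    if (n->glob_prev) n->glob_prev->glob_next = n->glob_next;
    else q->glob_head = n->glob_next;
    if (n->glob_next) n->glob_next->glob_prev = n->glob_prev;
    else q->glob_tail = n->glob_prev;
    return n;
}
```

Keeping every node on two lists costs four pointers per message, but it lets both kinds of receive find the oldest matching message without scanning messages from unrelated peers.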
Flow control
The communication protocol we implemented includes flow control,
to avoid exhausting the communication resources:
- RDMA area space,
- number of preposted descriptors.
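The page does not describe how the flow control works. A common way to protect these two resources is a credit scheme, sketched below in C under that assumption; the names peer_credits, credits_take and credits_return are hypothetical. The sender keeps a local estimate of the free RDMA space and of the unused preposted descriptors at each peer, and only sends when both are available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sender-side view of one peer's resources (hypothetical layout). */
typedef struct {
    size_t   rdma_free;  /* bytes still free in the peer's RDMA area */
    unsigned desc_free;  /* preposted receive descriptors still unused */
} peer_credits;

/* Try to reserve one descriptor and rdma_bytes of RDMA space.
   Returns false when the message must be queued and retried later. */
bool credits_take(peer_credits *c, size_t rdma_bytes)
{
    assert(c != NULL);
    if (c->desc_free == 0 || rdma_bytes > c->rdma_free)
        return false;
    c->desc_free--;
    c->rdma_free -= rdma_bytes;
    return true;
}

/* Called when the peer reports that it has consumed data and
   reposted descriptors, making those resources available again. */
void credits_return(peer_credits *c, size_t rdma_bytes, unsigned descs)
{
    assert(c != NULL);
    c->rdma_free += rdma_bytes;
    c->desc_free += descs;
}
```

In such a scheme the credit updates would typically be piggybacked on messages travelling in the opposite direction, so that no extra traffic is needed in the common case.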
Packet fragmentation
Since the VIA standard establishes a maximum message transfer size (currently 32 KB in M-VIA),
while MPI messages have no size limit, we implemented a fragmentation and reassembly mechanism.
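The arithmetic behind fragmentation and reassembly can be sketched in C as follows. This is an illustration under stated assumptions, not the code in the patch: the helper names (frag_count, frag_offset, frag_size, frag_reassemble) are hypothetical, and every fragment is assumed to carry its own index so that the receiver can copy it into place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAG ((size_t)32 * 1024)  /* M-VIA's current maximum transfer size */

/* Number of fragments needed for a message of len bytes. A zero-length
   message still needs one fragment to carry its envelope. */
size_t frag_count(size_t len)
{
    return len == 0 ? 1 : (len + MAX_FRAG - 1) / MAX_FRAG;
}

/* Byte offset of fragment idx within the message. */
size_t frag_offset(size_t idx)
{
    return idx * MAX_FRAG;
}

/* Payload size of fragment idx: MAX_FRAG for all but the last one. */
size_t frag_size(size_t len, size_t idx)
{
    size_t off = frag_offset(idx);
    if (off >= len) return 0;
    return (len - off < MAX_FRAG) ? len - off : MAX_FRAG;
}

/* Receiver side: copy fragment idx into its place in the buffer dst,
   which must be able to hold the whole message (len bytes). */
void frag_reassemble(char *dst, size_t len, size_t idx, const char *frag)
{
    assert(idx < frag_count(len));
    memcpy(dst + frag_offset(idx), frag, frag_size(len, idx));
}
```

For example, a 70000-byte message is split into three fragments of 32768, 32768 and 4464 bytes. Because each fragment is placed by its index, reassembly gives the same result even if fragments arrive out of order.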
VIA patch for LAM-6.3.2
You can try the results of our work by downloading a patch that adds VIA support to LAM release 6.3.2.
- Download lam-6.3.2.via-patch-0.3.gz.
- Unzip it with gzip -d lam-6.3.2.via-patch-0.3.gz,
apply the patch with patch -p0 < lam-6.3.2.via-patch-0.3,
then run the configure script with the option --with-rpi=via;
finally compile LAM with make.
- Do not forget to specify the network device used by VIA in the file $LAMHOME/config/via.config.
- From then on LAM-MPI will use VIA as its transport protocol unless you specify mpirun -lamd.
- All basic MPI communication operations are supported:
the various types of send, receive and probe, in both blocking and non-blocking form.
The wildcard MPI_ANY_SOURCE is also supported.
- It works with at least three processes on different nodes, and occasionally with four.
- For now, you cannot exploit SMP machines.
We update the release frequently, so some inconsistencies may occur.
If you are interested in this work, please report any bugs, comments or questions to Marco Panella.
Porting VIA within LAM-MPI: future work
Exploiting SMP machines
To exploit SMP machines we have to change the way the blocking MPI_Recv(...,MPI_ANY_SOURCE,...) is implemented.
For now it uses VIA's Completion Queue, which cannot check for messages from processes on the same node and from processes on other machines at the same time.
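One possible direction, not necessarily the one that will be adopted, is to replace the single blocking wait with a loop that polls each message source in turn without blocking. The C sketch below illustrates the idea; poll_fn, msg_source and recv_any are hypothetical names, and toy_source is only a stand-in for a real shared-memory or VIA poll routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical non-blocking poll: returns true and sets *msg
   when a message is ready on this source. */
typedef bool (*poll_fn)(void *ctx, void **msg);

typedef struct {
    poll_fn poll;
    void   *ctx;
} msg_source;

/* Wait for a message from any of the n sources by polling them
   round-robin. max_spins bounds the number of rounds (0 = no limit).
   Returns the index of the source that delivered, or -1 on timeout. */
int recv_any(const msg_source *src, size_t n, unsigned long max_spins,
             void **msg)
{
    assert(src != NULL && msg != NULL);
    for (unsigned long spin = 0; max_spins == 0 || spin < max_spins; spin++)
        for (size_t i = 0; i < n; i++)
            if (src[i].poll(src[i].ctx, msg))
                return (int)i;
    return -1;
}

/* Toy source for demonstration only: it becomes ready once its
   countdown has reached zero. */
typedef struct {
    int   countdown;
    void *payload;
} toy_source;

bool toy_poll(void *ctx, void **msg)
{
    toy_source *t = ctx;
    if (t->countdown > 0) {
        t->countdown--;
        return false;
    }
    *msg = t->payload;
    return true;
}
```

Polling trades CPU time for the ability to watch several transports at once, so a real implementation would probably need to limit spinning or yield the processor between rounds.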
Integration into the LAM file organization
To add VIA support to LAM without using patches, our files must be integrated into the LAM distribution.
We also have to decide how the user can choose, at run time, M-VIA instead of TCP/IP or shared memory support; this may require changing the RPI_SPLIT macro throughout the LAM
code.
Message ordering
MPI assumes that the underlying communication layer provides reliable, ordered message transport. VIA guarantees ordered delivery at the Reliable Delivery level, which M-VIA does not currently implement. We could therefore add functions to reorder messages.
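A reordering function could be built on per-peer sequence numbers. The C sketch below assumes that the sender stamps every message with an increasing sequence number and that at most WINDOW messages are outstanding at any time; reorder_buf, reorder_put and reorder_get are hypothetical names, and nothing here is taken from the existing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WINDOW 16  /* assumption: at most WINDOW messages outstanding */

typedef struct {
    unsigned next_seq;      /* next sequence number to deliver */
    void    *slot[WINDOW];  /* early arrivals, indexed by seq % WINDOW */
    bool     full[WINDOW];
} reorder_buf;

/* Store an arriving message. Returns false if the sequence number is
   outside the current window (too old or too new) or is a duplicate. */
bool reorder_put(reorder_buf *r, unsigned seq, void *msg)
{
    assert(r != NULL);
    unsigned ahead = seq - r->next_seq;  /* unsigned wrap-around is well defined */
    if (ahead >= WINDOW || r->full[seq % WINDOW])
        return false;
    r->slot[seq % WINDOW] = msg;
    r->full[seq % WINDOW] = true;
    return true;
}

/* Return the next message in sequence order, or NULL if it has
   not arrived yet. */
void *reorder_get(reorder_buf *r)
{
    assert(r != NULL);
    unsigned i = r->next_seq % WINDOW;
    if (!r->full[i])
        return NULL;
    r->full[i] = false;
    r->next_seq++;
    return r->slot[i];
}
```

This only restores ordering. Lost messages would still have to be detected and retransmitted by a separate mechanism.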
Any help or collaboration would be appreciated.
PAPERS
Master's thesis (in Italian)
International Conference Proceedings
- Massimo Bertozzi, Franco Boselli, Gianni Conte and Monica Reggiani,
An MPI implementation on the top of the Virtual Interface Architecture,
In Proceedings of Euro PVM/MPI, Barcelona, September 1999, volume 1167, pages 199-206. Springer-Verlag.
- Massimo Bertozzi, Marco Panella, and Monica Reggiani,
Design of a VIA based communication protocol for LAM/MPI suite,
In Procs. 9th Euromicro Workshop on Parallel and Distributed Processing PDP 2001, pages 27-33, Mantova, Italy, February 2001.
Italian conference proceedings
- Massimo Bertozzi, Marco Panella, and Monica Reggiani,
A VIA support for the LAM/MPI suite,
In Procs. Workshop on Sistemi Distribuiti: Algoritmi, Architetture e Linguaggi, Ischia, Italy, September 2000.
Last Update October 11, 2001
Page maintained by Marco Panella