Globus is a connected set of services for data management. It can be used for moving data between your local machine and the cluster. It is based on GridFTP.
Hadoop testing
TestDFSIO: Tests the I/O performance of your cluster, i.e. how fast HDFS can read and write data.
TeraSort: The goal of TeraSort is to sort 1 TB of data (or any other amount you choose) as fast as possible. It is a benchmark that exercises both the HDFS and MapReduce layers of a Hadoop cluster.
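A TeraSort run is usually done in three steps: generate the input with TeraGen, sort it with TeraSort, and check the result with TeraValidate. TeraGen takes a number of 100-byte rows rather than a size in bytes, so 1 TB corresponds to 10 billion rows. The sketch below works out that row count and assembles the three command lines without running them. The jar path and the HDFS directory names are placeholders chosen for illustration; they differ between Hadoop versions and installations.

```python
ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_rows(target_bytes: int) -> int:
    """Return the number of 100-byte rows needed to reach target_bytes."""
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    # Ceiling division, so the generated data is never smaller than requested.
    return -(-target_bytes // ROW_BYTES)


def terasort_commands(target_bytes: int, examples_jar: str, base_dir: str):
    """Build the argument lists for the three TeraSort steps.

    examples_jar and base_dir are hypothetical placeholders: pass the
    examples jar shipped with your Hadoop version and an HDFS directory
    you can write to.
    """
    rows = teragen_rows(target_bytes)
    input_dir = f"{base_dir}/terasort-input"
    output_dir = f"{base_dir}/terasort-output"
    report_dir = f"{base_dir}/terasort-validate"
    return [
        ["hadoop", "jar", examples_jar, "teragen", str(rows), input_dir],
        ["hadoop", "jar", examples_jar, "terasort", input_dir, output_dir],
        ["hadoop", "jar", examples_jar, "teravalidate", output_dir, report_dir],
    ]


if __name__ == "__main__":
    one_tb = 10**12  # 1 TB in decimal bytes
    for cmd in terasort_commands(one_tb, "hadoop-examples.jar", "/user/hduser"):
        print(" ".join(cmd))
```

Printing the commands instead of executing them makes it easy to review them first, or to try a much smaller size before committing the cluster to a full 1 TB run.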
MRBench: MRBench (see src/test/org/apache/hadoop/mapred/MRBench.java) loops a small job a number of times. As such, it is a very complementary benchmark to the “large-scale” TeraSort benchmark suite, because MRBench checks whether small job runs are responsive and running efficiently on your cluster. It focuses on the MapReduce layer, as its impact on the HDFS layer is very limited.
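The idea of looping a small job and watching its run time can be illustrated with a short, purely local sketch. This is not MRBench's code and it does not talk to Hadoop; `job` is a hypothetical stand-in for submitting one small MapReduce job, and the function names are made up for this example.

```python
import time
from statistics import mean


def run_repeatedly(job, num_runs: int) -> list:
    """Call job() num_runs times and return each run's duration in seconds."""
    if num_runs <= 0:
        raise ValueError("num_runs must be positive")
    durations = []
    for _ in range(num_runs):
        start = time.perf_counter()
        job()  # stand-in for submitting one small job and waiting for it
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: list) -> dict:
    """Summarize the timings: run count, average and slowest run."""
    return {
        "runs": len(durations),
        "avg_seconds": mean(durations),
        "max_seconds": max(durations),
    }


if __name__ == "__main__":
    # A trivial placeholder workload instead of a real MapReduce job.
    timings = run_repeatedly(lambda: sum(range(100_000)), num_runs=10)
    print(summarize(timings))
```

With many short runs, the average shows the per-job overhead of the framework, while the slowest run hints at occasional scheduling or startup delays.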