To compare performace, use the command system.time().
Here is an example:
Create a matrix of random normaly distributed numbers:
> A <- matrix(rnorm(1000000), 1000)
Check CPU time for unparallel matrix multiplication:
> system.time(A %*% A)
[1] 5.33 0.02 5.36 0.00 0.00
elapesed time was 5.36 seconds
... And for parallel matrix multiplication:
> system.time(parMM(cl, A, A))
[1] 2.49 2.09 9.16 0.00 0.00
elapsed time was 9.16 seconds.
Even in dedicated clusters with high speed networking,
communication is orders of magnitude slower than computation.
In order to reduce communication costs, eliminate unnecessary data
transfer.