snow Simplified
Many computationally demanding statistical procedures, such as bootstrapping and Markov chain Monte Carlo, can be sped up significantly by using several connected computers in parallel. The package snow (an acronym for Simple Network Of Workstations) provides a high-level interface for using a workstation cluster for parallel computations in R.
snow relies on the master/slave model of communication, in which one device or process (known as the master) controls one or more other devices or processes (known as slaves). In Los Angeles, officials pointed out that terms such as "master" and "slave" are unacceptable and offensive, and equipment manufacturers were asked not to use these terms. Some manufacturers changed the wording to primary/secondary (source: CNN).
snow Simplified is an adaptation of the article 'Simple parallel statistical computing in R' by Anthony Rossini, Luke Tierney and Na Li. It was prepared by Sigal Blay (sblay@sfu.ca). This work has been made possible by the Statistical Genetics Working Group at the Department of Statistics and Actuarial Science, SFU.
snow implements an interface to three different low-level mechanisms for creating a virtual connection between processes:
Socket
PVM (Parallel Virtual Machine)
MPI (Message Passing Interface)
Starting and Stopping Clusters
The way to initialize slave R processes depends on your system configuration. If MPI is installed and there are two computation nodes, use:
> cl <- makeCluster(2, type = "MPI")
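If neither PVM nor MPI is available, a socket cluster can be started instead by naming the machines to use (a minimal sketch; the host names here are placeholders):
> cl <- makeCluster(c("localhost", "localhost"), type = "SOCK")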
Shut down the cluster and clean up any remaining connections between machines:
> stopCluster(cl)
clusterCall(cl, fun, ...)
cl: the computer cluster created with makeCluster.
fun: the function to be applied.
clusterCall calls a specified function with identical arguments on each
node in the cluster.
The arguments to clusterCall are evaluated on the master; their values are then transmitted to the slave nodes, which execute the function call.
> myfunc <- function(x=2){x+1}
> myfunc_argument <- 5
> clusterCall(cl, myfunc, myfunc_argument)
is the same as
> clusterCall(cl, function(x=2){x+1}, 5)
and the result:
[[1]]
[1] 6
[[2]]
[1] 6
An alternative:
> clusterCall(cl, eval, myfunc(arg1,arg2,...))
clusterEvalQ(cl, expr)
cl: the computer cluster created with makeCluster.
expr: an expression, typically a function call.
clusterEvalQ evaluates a literal expression on each cluster node.
'expr' is not evaluated on the master; it is sent to the slave nodes as an unevaluated expression and evaluated there.
Finding the host name for each cluster node:
> clusterEvalQ(cl, Sys.getenv("HOST"))
[[1]]
HOST
"stat-db"
[[2]]
HOST
"stat-db1"
Loading the boot package on all cluster nodes:
> clusterEvalQ(cl, library(boot))
clusterEvalQ is the cluster version of the R function 'evalq'.
clusterApply(cl, seq, fun, ...)
clusterApply takes a cluster, a sequence of arguments (can be a vector or
a list), and a function, and calls the function with the first element of
the list on the first node, with the second element of the list on the
second node, and so on. The list of arguments must have at most as many
elements as there are nodes in the cluster.
> clusterApply(cl, 1:2, sum, 3)
[[1]]
[1] 4
[[2]]
[1] 5
clusterApplyLB(cl, seq, fun, ...)
clusterApplyLB is a load-balancing version of clusterApply. It hands out a balanced workload to the slave nodes when the length of seq is greater than the number of cluster nodes. It does not work with socket clusters (type = "SOCK").
> length(cl)
[1] 2
> clusterApplyLB(cl, 1:3, sum, 3)
[[1]]
[1] 4
[[2]]
[1] 5
[[3]]
[1] 6
Using clusterApplyLB can result in better cluster utilization than using clusterApply. However, the increased communication can reduce performance. Furthermore, the node that executes a particular job is nondeterministic, which can complicate ensuring reproducibility in simulations.
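As an aside not covered in the original text, snow can give each node an independent random number stream with clusterSetupRNG (it requires the rlecuyer or rsprng package). This makes clusterApply simulations reproducible; it does not help with clusterApplyLB, since the job-to-node assignment there remains nondeterministic:
> clusterSetupRNG(cl)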
clusterExport(cl, list)
clusterExport assigns the global values on the master of the variables named in 'list' to variables of the same names in the global environment of each slave node.
> myvar <- 666
> clusterExport(cl, "myvar")
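To verify the export, the variable can be read back on each node, for example with clusterEvalQ (expected output, assuming the two-node cluster used above):
> clusterEvalQ(cl, myvar)
[[1]]
[1] 666
[[2]]
[1] 666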
clusterSplit(cl, seq)
clusterSplit splits 'seq' into one consecutive piece for each cluster node and returns the result as a list with length equal to the number of cluster nodes. The pieces are chosen to be close to equal in length.
> clusterSplit(cl, 1:6)
[[1]]
[1] 1 2 3
[[2]]
[1] 4 5 6
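clusterSplit pairs naturally with clusterApply, so that each node processes one consecutive chunk. A minimal sketch, with each node summing its own piece (expected output shown):
> clusterApply(cl, clusterSplit(cl, 1:6), sum)
[[1]]
[1] 6
[[2]]
[1] 15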
parApply(cl, X, MARGIN, fun, ...)
X: the array to be used.
MARGIN: a vector giving the subscripts which the function will be
applied over. '1' indicates rows, '2' indicates columns,
'c(1,2)' indicates rows and columns.
fun: the function to be applied.
parApply is the parallel version of the R function 'apply'.
> A <- matrix(1:10, 5, 2)
> A
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
> parApply(cl, A, 1, sum)
[1] 7 9 11 13 15
parRapply(cl, X, fun, ...)
Row parallel 'apply' function for matrix X.
May be slightly more efficient than parApply.
parCapply(cl, X, fun, ...)
Column parallel 'apply' function for matrix X.
May be slightly more efficient than parApply.
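For example, reusing the matrix A from the parApply example above, the row sums can be computed as follows (expected output shown):
> parRapply(cl, A, sum)
[1] 7 9 11 13 15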
parLapply(cl, X, fun, ...)
parLapply is the parallel version of the R function 'lapply'.
'lapply' returns a list of the same length as 'X', each element of which is the result of applying 'fun' to the corresponding element of 'X'.
Create a list of two sequences of numbers:
> x <- list(alpha = 1:10, beta = exp(-3:3))
> x
$alpha
[1] 1 2 3 4 5 6 7 8 9 10
$beta
[1] 0.04978707 0.13533528 0.36787944 1.00000000
[5] 2.71828183 7.38905610 20.08553692
Calculate quantiles for each sequence - parLapply returns a list:
> parLapply(cl, x, quantile)
$alpha
0% 25% 50% 75% 100%
1.00 3.25 5.50 7.75 10.00
$beta
0% 25% 50% 75% 100%
0.04978707 0.25160736 1.00000000 5.05366896 20.08553692
parSapply(cl, X, fun, ..., simplify=TRUE, USE.NAMES=TRUE)
parSapply is the parallel version of the R function 'sapply'.
'sapply' is a "user-friendly" version of 'lapply'.
It returns a vector or matrix with 'dimnames' if appropriate.
Calculate quantiles for each sequence - parSapply returns a matrix:
> parSapply(cl, x, quantile)
alpha beta
0% 1.00 0.04978707
25% 3.25 0.25160736
50% 5.50 1.00000000
75% 7.75 5.05366896
100% 10.00 20.08553692
parMM(cl, A, B)
A, B: the matrices to be multiplied.
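parMM computes the matrix product A %*% B in parallel. A quick sanity check (a small sketch using the cluster cl from above):
> B <- matrix(rnorm(16), 4)
> all.equal(parMM(cl, B, B), B %*% B)
[1] TRUE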
How much faster is using snow?
To compare performance, use the command system.time().
Here is an example:
Create a matrix of random, normally distributed numbers:
> A <- matrix(rnorm(1000000), 1000)
Check CPU time for serial (non-parallel) matrix multiplication:
> system.time(A %*% A)
[1] 5.33 0.02 5.36 0.00 0.00
The elapsed time was 5.36 seconds.
... And for parallel matrix multiplication:
> system.time(parMM(cl, A, A))
[1] 2.49 2.09 9.16 0.00 0.00
The elapsed time was 9.16 seconds, slower than the serial multiplication.
Even in dedicated clusters with high-speed networking, communication is orders of magnitude slower than computation. To reduce communication costs, eliminate unnecessary data transfer.
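For instance (a hedged sketch using only functions shown above), when many calls need the same large object, export it to the nodes once instead of passing it as an argument to every call:
> bigdata <- rnorm(1000000)
> clusterExport(cl, "bigdata")    # transmit the data once
> clusterEvalQ(cl, mean(bigdata)) # later calls send only the expression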