module (bold = default) build dependencies
spark/2.0.2/b1 x86_64 jdk/1.8.0_111

Trying out spark interactively

Use the interactive script to get a job with however many nodes/cores you need - for example

  interactive -p debug -t 60 -N 2 --ntasks-per-node=22

You can then create a spark cluster within your slurm allocation by running

 module load spark
 source start-spark

You can then start a scala shell that will connect to your spark cluster by running


or a python shell by running


or if you launch a jupyter notebook using

  jupyter notebook --ip=`hostname`.bluehive.circ.private --port=8888

you can then connect to the spark context from within the notebook using

import os
import sys
spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/'))   
execfile(os.path.join(spark_home, 'python/pyspark/'))

you can then connect to your jupyter notebook using firefox on x2go or you can tunnel the connection back to your workstation following the instructions for jupyter

Starting a spark cluster inside of slurm

Below is a sample script that will create a spark cluster using the nodes in your job.

#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH -p debug
#SBATCH -t 4

. /software/modules/init/sh
module load spark/1.3.0
source start-spark
spark-submit --name "My app" --class SimpleCounter myfile.jar  # for scala or java 
spark-submit --name "My app"                         # for python

While your job is running you can view the status by going to port 8080 on the master from within a web browser running on bluehive. (ie through x2go)

also see Using Software.