Getting Started

How do I get an account?

Complete the online account request form.

What is a NetID and how is it used?

Logins are authenticated with the UR NetID issued to all employees and students at the university. It is the same login name you use to access HRMS, Blackboard, or the library. Visitors and collaborators may obtain a guest NetID.

How do I get a guest/collaborator NetID to access the clusters?

Someone from the university must sponsor the guest and approve the guest's use of UR computing systems. Guest NetIDs usually expire after a few months, but they can be extended. See the university's guest NetID page for information on obtaining one. Once the guest NetID has been issued, the collaborator must fill out an account request form. Off-campus collaborators must use the UR VPN client to log into the machines.

I have never used Linux before. Where do I get help with the basics?

We run a workshop for beginners every semester. The UNIX Tutorial for Beginners by Michael Stonebank can also help you get started with Linux. If you have any questions, please contact CIRC staff.

Connecting

I'm having trouble connecting.

  • Your username/password combination should be the same as for HRMS, Blackboard, or the library. Make sure you enter your username in all lowercase.
  • You can reset your password at the UR My-Identity site.
  • If you're still having trouble, email us.

How do I connect or transfer files to or from the clusters?

See Getting Started.

How do I connect to the VPN?

See University VPN.

The X2Go client for macOS stopped working.

If you have recently upgraded to OS X Yosemite, you will need to download an updated version of the X2Go client for Mac.

I cannot reconnect to my X2Go session.

Occasionally an X2Go session may not resume properly. You can terminate all of your open X2Go sessions and start fresh by running:

x2go-terminate-all-sessions

You can also terminate a particular session. To see a list of sessions, use:

x2golistsessions

The session ID is the long string in the second field of the pipe-delimited output. You can pass this session ID to x2goterminate-session:

[jcarrol5@bluehive ~]$ x2golistsessions
61260|jcarrol5-138-1439402097_stDGNOME_dp32|138|bluehive.....
92754|jcarrol5-123-1439402582_stDGNOME_dp32|123|bluehive.....
98306|jcarrol5-185-1439402594_stDGNOME_dp32|185|bluehive....
[jcarrol5@bluehive ~]$ x2goterminate-session jcarrol5-185-1439402594_stDGNOME_dp32
[jcarrol5@bluehive ~]$
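
To avoid copying the session ID by hand, the second field can be extracted with cut. The following is a minimal sketch; session_ids is a hypothetical helper name, not a command provided on the cluster, and it assumes the output format shown above.

```shell
# Hypothetical helper: print only the session IDs (the second
# pipe-delimited field) from x2golistsessions-style output on stdin.
session_ids() {
    cut -d'|' -f2
}

# Example usage on the cluster (shown as comments, not run here):
#   x2golistsessions | session_ids
#   x2goterminate-session "$(x2golistsessions | session_ids | head -n 1)"
```

The second usage line terminates only the first session listed; check the list before terminating anything you may still need.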

Running Jobs

What is the difference between the head node and the compute nodes?

The BlueHive head node is a gateway to the cluster and the common point of entry for all BlueHive users. When you first connect via X2Go or ssh to bluehive.circ.rochester.edu, you are on the head node. Because the head node is shared by all BlueHive users, avoid running commands from the terminal that require significant disk access, CPU time, or memory. Commands like rsync, gzip, tar, or even compilers will run much faster on dedicated resources on a compute node and will avoid disrupting other users. To run a terminal command on a compute node, preface it with srun along with any other Slurm options:

[johndoe@bluehive ~]$ srun -p debug -t 60 rsync ....

This creates a 60-minute job in the debug partition with the default 1 CPU and 2 GB of memory and runs the rsync command. Once the command finishes, the job terminates automatically.
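
If the defaults are not enough, the CPU count and memory can be requested explicitly with the standard srun options -c (CPUs per task) and --mem. The sketch below wraps this in a hypothetical helper named on_compute; the helper, the DRY_RUN variable, and the example file names are assumptions for illustration and are not part of the BlueHive setup.

```shell
# Hypothetical helper: run a command on a compute node in the debug
# partition for up to 60 minutes with an explicit CPU count and memory.
# Usage: on_compute CPUS MEMORY COMMAND [ARGS...]
# Set DRY_RUN=1 to print the srun command line instead of running it.
on_compute() {
    cpus=$1
    mem=$2
    shift 2
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo srun -p debug -t 60 -c "$cpus" --mem="$mem" "$@"
    else
        srun -p debug -t 60 -c "$cpus" --mem="$mem" "$@"
    fi
}

# Example on the cluster (not run here): archive a directory
# with 1 CPU and 4 GB of memory.
#   on_compute 1 4G tar czf results.tar.gz results/
```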

If you will be running several commands, you can also use the interactive script to get a shell on a compute node:

[johndoe@bluehive ~]$ interactive -p debug -t 60
...
[johndoe@bhc0001 ~]$ rsync ...

If you need more than 60 minutes, you can use the interactive partition for jobs lasting up to 8 hours:

[johndoe@bluehive ~]$ interactive -p interactive -t 8:00:00

How many nodes and CPUs do I get?

When you sign up for an account, you are given access to at most 16 nodes and 120 CPU cores on BlueHive at a given time. If you need more resources, please contact CIRC staff.

How should I run a job?

Any job that uses more than a few minutes of CPU time should run on a compute node, using the queuing system. See Running Jobs.

How do I get my job run sooner?

The fewer resources you request (wall time and memory), the sooner your job will run. See Running Jobs.

How much memory should I request?

Request as little memory as your job needs to run. This will ensure that your job is scheduled as early as possible.

What happens if I specify too little memory?

The operating system will terminate any process that exceeds the requested amount of memory.
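
One way to pick a request is to look at the peak memory of a similar job that already finished, then add some headroom. The sketch below assumes Slurm accounting is enabled so that sacct can report MaxRSS; the job ID, the helper name suggest_mem_mb, and the 20% headroom are illustrative assumptions, not CIRC recommendations.

```shell
# On the cluster (not run here), peak memory of a finished job can be
# listed with sacct; 1234567 is a placeholder job ID:
#   sacct -j 1234567 --format=JobID,MaxRSS,ReqMem,Elapsed,State

# Hypothetical helper: convert a MaxRSS value with a K, M, or G suffix
# (whole numbers only) to megabytes and add 20% headroom, rounding up.
suggest_mem_mb() {
    raw=$1
    num=${raw%[KMG]}      # numeric part, e.g. 1536000
    unit=${raw#"$num"}    # suffix, e.g. K
    case $unit in
        K) mb=$(( (num + 1023) / 1024 )) ;;
        M) mb=$num ;;
        G) mb=$(( num * 1024 )) ;;
        *) echo "unrecognized value: $raw" >&2; return 1 ;;
    esac
    echo $(( (mb * 120 + 99) / 100 ))
}

# Example: a job that peaked at 1536000K could request about 1800 MB,
# for instance with "#SBATCH --mem=1800M".
#   suggest_mem_mb 1536000K
```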

What is the maximum amount of time I can request for a job?

See Job Queues for the time limit associated with each queue.

What if I need more time?

You can request a temporary reservation. Email us with a specific request stating how many nodes you need, how long you need them, and for what application. BlueHive is shared by the entire university, so we can't guarantee that every request can be accommodated, but a temporary reservation for more CPU time can often be granted.

How many nodes and CPUs can I use?

Users are limited to 16 nodes and 120 CPU cores at one time. You can submit many jobs requesting more resources in aggregate, but in that case later jobs will not run until prior jobs have completed.

How many jobs can I submit?

Users can have up to 2000 jobs running or pending in the queue at one time. Additional jobs cannot be queued until prior ones have completed. Note that elements of a job array count as separate jobs towards this limit.

How do I request a GPU?

#SBATCH -p gpu --gres=gpu:1

The number after the colon (1 or 2) is how many GPUs per node you require.
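
In context, the directive sits at the top of a batch script. The following is a minimal sketch that requests one GPU for 60 minutes and reports what the job can see. It assumes that Slurm sets CUDA_VISIBLE_DEVICES for GPU jobs and that nvidia-smi is installed on the GPU nodes; neither is confirmed here.

```shell
#!/bin/bash
#SBATCH -p gpu --gres=gpu:1
#SBATCH -t 60

# Report which GPU(s) were assigned to this job. The variable is
# typically set by Slurm for GPU jobs; "none" is printed if it is unset.
gpus=${CUDA_VISIBLE_DEVICES:-none}
echo "GPUs visible to this job: $gpus"

# Show GPU status if the NVIDIA tool is available on this node.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi || true
fi
```

Replace the reporting commands with your own GPU program once the request works as expected.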

How do I convert files from dos format to unix format?

If you edit Slurm scripts in Windows or DOS, you may get the following error when submitting them to Slurm:

sbatch: error: Batch script contains DOS line breaks (\r\n)
sbatch: error: instead of expected UNIX line breaks (\n).

To convert the files use the dos2unix utility:

dos2unix myscript.slurm
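
To check a script before submitting it, or to convert one where dos2unix is not available, standard tools are enough. This is a minimal sketch; has_crlf and strip_crlf are hypothetical helper names, not commands installed on the cluster.

```shell
# Hypothetical helper: succeed if any line of the file ends with a
# carriage return (a DOS line break).
has_crlf() {
    cr=$(printf '\r')
    grep -q "${cr}\$" "$1"
}

# Hypothetical helper: remove the trailing carriage return from every
# line, rewriting the file in place via a temporary copy.
strip_crlf() {
    cr=$(printf '\r')
    tmp=$(mktemp) &&
        sed "s/${cr}\$//" "$1" > "$tmp" &&
        cat "$tmp" > "$1" &&
        rm -f "$tmp"
}

# Example usage (not run here):
#   if has_crlf myscript.slurm; then strip_crlf myscript.slurm; fi
```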

Files

I want to share files with other users, but when they try to read them, they get "Permission denied."

See Sharing Files.

How much disk space do I get?

See Storage.

I am over my quota. Can I have more disk space?

It may be helpful to first verify that you don't have old files you can get rid of. To view your disk usage for your home directory you can use

module load duc
duc index $HOME
duc gui $HOME

and for your scratch directory

module load duc
duc index $SCRATCH
duc gui $SCRATCH
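
If you are connected without a graphical session, a plain-text summary can be produced with du instead. This is a minimal sketch using GNU du options; largest_dirs is a hypothetical helper name, and the first line of its output is the total for the directory itself.

```shell
# Hypothetical helper: list the largest entries directly under a
# directory, in kilobytes, largest first.
# Usage: largest_dirs [DIRECTORY] [COUNT]
largest_dirs() {
    dir=${1:-$HOME}
    count=${2:-10}
    du -k --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Example usage (not run here):
#   largest_dirs "$HOME"
#   largest_dirs "$SCRATCH" 20
```

On a large directory tree this can take a while and generates significant disk access, so consider running it on a compute node.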

You can also request that your quota be increased temporarily. Email us with a specific request for how much more space you need, how long you need it, and for what application. As with CPU time, storage is a resource that is shared by the entire university so we can't guarantee that every request can be accommodated, but often a temporary allocation can be granted.

Software

What software is available?

See the Software Index for a complete list.

I need to use a software package that is not currently installed.

If the software is freely available, email us the request at circ@rochester.edu and we will install it. We cannot purchase licenses for proprietary software, but if funds are available from your advisor or another source, we can assist with installing and maintaining proprietary software as well.

When I run an MPI job on the login node or some compute nodes, I get the following message:

librdmacm: Fatal: no RDMA devices found
--------------------------------------------------------------------------
[[0,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: bluehive.circ.rochester.edu
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

or

[bhx0021:06951] mca: base: component_find: unable to open /software/openmpi/1.6.5/b2/lib/openmpi/mca_btl_ofud: librdmacm.so.1: cannot open shared object file: No such file or directory (ignored)

BlueHive uses InfiniBand, a high-speed interconnect. Executables compiled with Open MPI try to dynamically load shared objects from the InfiniBand software package so that they can use it. If they are run on compute nodes that don't have InfiniBand (or on the login node, which is a virtual machine), MPI prints a message about not finding them. The executable should still run correctly.

General

How much does it cost to use CIRC resources?

Research grants are charged a nominal annual fee. Additional capacity and services are available at quoted rates. Please contact the Director of CIRC (Brendan Mort - brendan.mort@rochester.edu) for more information.

Who may attend CIRC workshops?

Anyone associated with the university. Registration is required, however, and space may be limited.