Borg

Borg Information

Here's some quick information about the Borg Computer cluster. This page will be refined over time, but this should give you enough information to hit the ground running.

Hardware Description

Interconnect

Non-blocking switched gigabit Ethernet, with 0.2ms average latency.

Master Nodes

Borg has a master node for managing the operation of the cluster and for interactive use by the cluster users.

Pre/Post Processing Node

The host piglet.scorec.rpi.edu was purchased for pre- and post-processing of data used with this cluster. This machine is just like any other SCOREC host. As with all of the other general-purpose SCOREC systems, the borg/bigtmp file system is mounted on piglet under /fasttmp.

Piglet does require special authorization to use; email help@scorec.rpi.edu to request access. Note that requests not explaining why the work requires more than 16 GB of memory will be denied.

Compute Nodes

The cluster is composed of 50 compute nodes, each configured with 4 processors and 6 GB of memory.

Software Description

It is important to be aware that this cluster's processors are AMD64 chips. Code compiled for IA32 will generally run, but it will not be able to see all of the installed memory in a given node, nor will it be able to use the additional 8 processor registers. To get around this you must recompile your code with the GNU or Intel tools found on any of these systems (gcc/gfortran will be in your $PATH; icc/ifort are in /usr/local/intel). Note that if you plan to run binaries compiled with the Intel tools you will need to include the Intel libraries in your job scripts (replace 90 with the version of the Intel compiler that you are using):

export LD_LIBRARY_PATH=/usr/local/intel/compiler90/lib:$LD_LIBRARY_PATH
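
For example, a minimal sketch of running such a binary after setting the library path (the executable name is hypothetical):

 # make the Intel runtime libraries visible, then run the Intel-compiled binary
 export LD_LIBRARY_PATH=/usr/local/intel/compiler90/lib:$LD_LIBRARY_PATH
 ./my_intel_binary    # hypothetical executable name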

Some pertinent information about using AMD64 at SCOREC:

  • To compile 32 bit code on borg using the GNU compilers you must pass the -m32 flag to the compiler;
  • SCOREC's 32 bit Intel /usr/local can be found on the master node and all compute nodes at /usr/local32. To compile 32 bit code with the Intel toolchain you must use a 32 bit machine; 32 bit code cannot be built with the Intel tools on a 64 bit machine.
  • To run a 32 bit parallel job built with the Intel toolchain on borg, you must compile it with the mpicc located in /usr/local/mpich-icc90/32/latest on a 32 bit machine and run the job with /usr/local/mpich-icc90/32/latest/bin/mpirun on borg (see the sketch after this list).
  • Some programs may require the environment to appear to be 32 bit to run and consequently must be run with the linux32 command, for example:
linux32 /usr/local32/application
  • At this time we share the licenses for the Intel compilers with another group on campus, and as such you may find that the licenses are being used by someone else when you go to use the tools. If this happens please wait a few minutes and try again. If you persistently cannot check out a license email help@scorec.rpi.edu.
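
A minimal sketch of the 32 bit Intel MPI workflow described above; the source file name, process count, and the exact location of mpicc under /usr/local/mpich-icc90/32/latest are assumptions:

 # On a 32 bit SCOREC machine: build with the 32 bit MPICH/Intel wrapper
 /usr/local/mpich-icc90/32/latest/bin/mpicc -o myjob32 myjob.c
 # On borg: launch with the matching mpirun (process count is illustrative)
 /usr/local/mpich-icc90/32/latest/bin/mpirun -np 4 ./myjob32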

Job Submission

As of March 2009, Borg uses Slurm for job scheduling.

To start an interactive session on one node:

 salloc -N 1 -p <partition>

where <partition> is either debug or normal.

Execute compute jobs with srun; otherwise they will run on the master node. A minimal example follows.
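
A short sketch of an interactive session (node count, partition, task count, and the executable name are illustrative):

 salloc -N 1 -p debug
 srun -n 4 ./my_parallel_job    # hypothetical executable; runs on the allocated node, not the master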

Further information can be found on the CCNI Wiki and on the Slurm page.

Environment Variables

To pass environment variables through to the compute nodes, such as LD_LIBRARY_PATH, add them to your ~/.bashrc file.
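
For example, to make the Intel runtime libraries available on the compute nodes, you could add a line like the following to ~/.bashrc (a sketch; adjust the path for your compiler version):

 export LD_LIBRARY_PATH=/usr/local/intel/compiler90/lib:$LD_LIBRARY_PATH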

Killing Jobs

 scancel <jobid from squeue>
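
A short sketch of finding and cancelling a job (the job ID shown is illustrative):

 squeue -u $USER    # list your jobs and their job IDs
 scancel 12345      # cancel the job with ID 12345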

Further information can be found on the CCNI Wiki and on the Slurm page.

Disk Storage

/borg (16.1 TB)
  • ONLY items placed in the Backup directory are backed up
  • Input and output from parallel jobs should go here
  • Visible on all nodes and SCOREC-wide as /bigtmp

/users (500 MB)
  • Backed up nightly
  • Only for storage of ssh and shell environment files for each user
  • Visible on all nodes

/import/users (1 TB)
  • Exported from data.scorec.rpi.edu
  • This is your regular SCOREC home directory
  • All compilation and debugging should be done here
  • NOT visible on all nodes

/space (~55 GB)
  • Local disk scratch space
  • Data older than 14 days is automatically DESTROYED