By using the EU Processing cluster you agree to follow the ALMA data access policy. In a nutshell:
The status of the cluster can be shown with
pestat
The cluster gateway server is
arcp1.hq.eso.org
It can only be reached from within ESO's network. SSH into the gateway using your individual/visitor account.
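For example (replace <username> with your individual or visitor account):
ssh <username>@arcp1.hq.eso.org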
On the gateway, batch jobs and interactive shells are started with the following wrapper commands:
sb slurm_script.sh    # standard batch job
sbl slurm_script.sh   # large (256GB) batch job
sbh slurm_script.sh   # huge (512GB) batch job
si                    # standard interactive shell
sil                   # large (256GB) interactive shell
sih                   # huge (512GB) interactive shell
The huge queue must strictly only be used for jobs that absolutely require 512GB. Please log out of interactive shells as soon as you no longer need them.
In order to see which jobs are running, use the
squeue
command.
To see the full scheduling context of a given job, run e.g.
scontrol show job 1234
where 1234 is the job id as returned by squeue. Alternatively, the command
ssqueue
can be used. This has grepping capabilities built in, e.g.
ssqueue large bash
will show only those currently running 'large' jobs that have 'bash' in the line. Any number of arguments to grep for can be added.
To see which nodes are currently idle, run
nodeidle
Your own disk-space consumption can be checked with
squota
or the users with the largest disk-space consumption with
squota top
or for any other user or 'all'.
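In summary, assuming the local squota wrapper accepts a user name or 'all' as an argument (a syntax sketch, not verified):
squota              # your own disk-space consumption
squota top          # users with the largest consumption
squota <username>   # a specific user
squota all          # all users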
Please do not perform large downloads/uploads on the gateway arcp1 itself; instead, start an interactive shell with si and perform the downloads/uploads there.
CARTA is available on the cluster for image visualization; run e.g.
carta image.fits
For maximum graphical performance, copy the script /opsw/home/arcproc/cartaremote.sh to your local laptop/desktop, run chmod a+x cartaremote.sh and call it e.g.
./cartaremote.sh /opsw/.../somefile.fits
This will remotely start a CARTA backend on arcp1 as well as an ssh tunnel and launch the corresponding visualization URL in your own local browser. Hit CTRL+C in that shell to close the backend and tunnel again. Although ./cartaremote.sh is run on your local laptop/desktop, the file path has to be the path of the file on the cluster. Note that this works only with your own user-account.
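The full sequence on your local machine might look like this (copying via scp is just one possible way; the FITS path is a hypothetical example, use the cluster-side path of your own file):
scp <username>@arcp1.hq.eso.org:/opsw/home/arcproc/cartaremote.sh .
chmod a+x cartaremote.sh
./cartaremote.sh /opsw/work/<username>/somefile.fits   # path as seen on the cluster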
If carta does not open within 10 seconds, please change your ~/.carta-beta/config/preferences.json to
{ "$schema": "https://cartavis.github.io/schemas/preferences_schema_2.json", "lastUsedFolder": "", "telemetryConsentShown": true, "telemetryMode": "none", "telemetryUuid": "1", "version": 2 }
In addition to the home directory, each user has access to the workspace at
/opsw/work/<username>
Please use this directory only for data. The home directory is intended to hold personal configuration files (e.g. .bashrc) and possibly Python scripts, but no data.
The "work" alias command takes you to your workspace. The disk space on the lustre filesystem holding also the workspspace is shared amongst all users. Data should be removed therefore as soon as it is not needed any more. There is no backup available on that space, please consider it as being scratch space.
The environment for the use of CASA, the additional analysis software, the pipeline etc. is sourced in your .bashrc automatically. This allows us to adapt the settings to the changing environment without the users having to change their .bashrc.
To copy a directory from your workspace directly to a local disk, on your local computer do
cd /media/<username>/mylargeusbdisk
ssh <username>@arcp1.hq.eso.org 'cd /opsw/work/<username>/; tar zcf - directorythatshouldbetarredup' > directorythatshouldbetarredup.tgz
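To later restore such a tarball from the local disk back into your workspace, the reverse direction might look like this (a sketch, run from the local directory holding the tarball):
cat directorythatshouldbetarredup.tgz | ssh <username>@arcp1.hq.eso.org 'cd /opsw/work/<username>/; tar zxf -'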
If you access the cluster via a high-latency connection, you might want to start a virtual server (VNC) on the cluster and connect to that server over ssh. This also has the advantage that jobs that have been started can continue to run even if you disconnect (e.g. with your laptop while travelling). To start the vncserver (if none is running for you) and to get the commands to be used for the ssh tunneling, run on arcp1
getvnc.sh [-geometry WIDTHxHEIGHT]
Note that this command does not start a second server if one is already running, so it can be run at any time to obtain the tunnel and connection commands. Optionally, a geometry can be given, e.g. to obtain a larger window if the vncserver is accessed from a larger screen.
Once that command has been run on the cluster gateway arcp1, run the other two commands (one to start an ssh tunnel and one to connect to the vncserver) on your external machine (e.g. your laptop). Mac users can, for example, use "Chicken of the VNC" instead of vncviewer.
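The two commands printed by getvnc.sh look roughly like the following; the display/port numbers here are only placeholders, always use the values printed for you:
ssh -N -L 5901:localhost:5901 <username>@arcp1.hq.eso.org   # tunnel; port number is an example only
vncviewer localhost:5901                                    # connect through the tunnel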
Note that if your system uses TigerVNC, you might have to enter "localhost:5900" into the connection field and keep the port (if one is asked for) at 0.
If you do not need your vncserver any more, please run
getvnc.sh -k
As the vncservers consume resources, please remove yours as soon as you no longer need it.
Only the icewm desktop environment is available. If you prefer 'konsole' over 'xterm' as the terminal window, create a file called 'toolbar' in /lustre/home/`whoami`/.icewm with the content
prog "Terminal" utilities-terminal /usr/bin/konsole prog "Web browser" web-browser xdg-open about:blankand then kill and restart your vnc server.
Once you are logged in, you can start CASA using our local wrapper:
| casapy-6.4.1 | will start CASA 6.4.1 |
| casapy-x.y | will generally start CASA x.y if available |
| casapipe-x.y | will generally start CASA x.y with option --pipeline if available |
Run
ssh -Y -t -C -L 8000:localhost:8000 pipeproc@arcp1.hq.eso.org "echo Tunnel set up for 10 hours; sleep 36000"
in a shell (you probably want to create an alias for this in your .bashrc) and point your web browser to
localhost:8000
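A possible alias for your local .bashrc (the alias name is just an illustration):
alias pipetunnel='ssh -Y -t -C -L 8000:localhost:8000 pipeproc@arcp1.hq.eso.org "echo Tunnel set up for 10 hours; sleep 36000"'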
The following rules strictly apply:
1) Apptainer containers must be self-built and start from an official distribution of CentOS, RockyLinux or Ubuntu. E.g. the first two lines of the apptainer definition file must be
Bootstrap: docker
From: centos:centos8
The use of any other container, e.g. downloaded from DockerHub, is strictly forbidden. (A minimal example definition file is given below these rules.)
2) The entire responsibility for any issues arising through the use of containers (spam, cryptomining, violation of ESO's IT policies) is fully carried by the user running the container.
3) All container definition files must be sent to Enrique, Dirk and Felix for a basic inspection before the corresponding containers can be run.
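A minimal definition file that complies with rule 1 might look like this (the base image and the package installed in %post are just an illustration):
Bootstrap: docker
From: rockylinux:9

%post
    dnf -y install python3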
In addition to the use of interactive shells on the cluster, jobs, including those requiring X windows, can be submitted to the batch queue from the gateway server arcp1:
sbl slurm_script.sh
Here slurm_script.sh is a shell script to be executed. Run
man sbatch
to get a full description of what these job scripts can look like. Batch jobs using graphics need to connect to a virtual X frame buffer. Such a frame buffer is running on each host on display localhost:999, screen 0.
It is possible to pass values into the Slurm script with
--export=VARIABLENAME1=$VALUE1,VARIABLENAME2=$VALUE2
In the script, the variable $VARIABLENAME1 will then contain VALUE1, etc.
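For example, with plain sbatch (whether the sb/sbl/sbh wrappers forward extra options in the same way is an assumption; the variable names are purely illustrative):
sbatch --export=INPUTFILE=mydata.ms,NITER=500 slurm_script.sh
Inside slurm_script.sh, $INPUTFILE and $NITER then hold these values, e.g.
echo "Processing $INPUTFILE with $NITER iterations"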
Here is an example that executes "myscript.py" in CASA:
#! /bin/bash
#SBATCH --job-name=serial_job_test    # Job name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@eso.org     # Where to send mail
#SBATCH --output=output_%j.log        # Standard output and error log

echo "Starting job at `date +'%Y-%m-%d %H:%M:%S'`"

# Source the environment
source /home/`whoami`/.bash_profile
source /home/`whoami`/.bashrc

# Set the display to the virtual frame buffer
export DISPLAY=:999

# Change to the work directory
cd /lustre/opsw/work/`whoami`

echo "Running on host `uname -n`"

# Run myscript.py in CASA.
# The output of the actual job is redirected to a logfile in the workspace because the logfiles
# written by the batch system only appear at the very end of the job. Redirecting the output
# allows one to follow the job during batch execution.
casapy-stable --nologger --nogui -c "myscript.py" > /opsw/work/`whoami`/${SLURM_JOB_ID}.log 2>&1

echo "Done at `date +'%Y-%m-%d %H:%M:%S'`"
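Submitting and following such a job could then look like this (the job id placeholder is illustrative; the id is returned by squeue):
sb slurm_script.sh                        # submit to the standard queue
squeue                                    # check that the job is running
tail -f /opsw/work/`whoami`/<jobid>.log   # follow the CASA output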
If you have data on the cluster file system which is voluminous (500 GB or more) and which you don't need for the next few months but don't want to simply delete, you can have it written to LTO8 tape by the ESO IT service desk - and then delete it on Lustre. When you need the data again, it can be restored with the same directory structure, ownership and permissions back on to the Lustre file system. Also for that you need to ask the service desk. Response time is of the order of days.
The LTO8 tapes have a native capacity of 12 TB; thanks to compression, roughly 15 TB of uncompressed data fit on one tape.
The tape stays with the helpdesk until it is full or until you request to take it yourself and store it elsewhere (e.g. in your office).
Procedure:
1) Identify the data on Lustre which you want to move to tape. Only whole directories can be stored. (See the example below for finding large directories.)
2) File a service desk ticket by writing email to the ESO IT Service Desk <servicedesk@eso.org>. Sample text:
Subject: "Store backup data for user XXX on LTO"
Dear helpdesk, I have data on the EU ARC arcpX cluster, head node arcp1.hq.eso.org, which I would like to write to LTO tape and then delete. Please store the following directories for me on an LTO which was used for my data before or a new one:
/opsw/work/YYY..
<your list of directories giving full, absolute paths>
Thanks in advance!
3) Helpdesk will assign the task to a person who will contact you with more details.
4) Once the backup is complete, they will notify you and you can go ahead and delete all the directories in your list. (You can also ask them to do that for you.)
5) Depending on whether you want to keep the tape yourself or not, tell helpdesk what you would like them to do with the tape.
Restore: give the tape back to helpdesk (if you took it) and file a new ticket to request the restore.
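To identify candidate directories for step 1, something like the following can help (a sketch; adjust the path and the number of lines shown as needed):
du -sh /opsw/work/`whoami`/* 2>/dev/null | sort -h | tail -20   # 20 largest directories in your workspace, largest last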
IT will charge the ARC for the tape consumption. Each tape is ca. 70 Euros at the moment.
-- FelixStoehr - 26 Nov 2012