Running ECLIS on cluster Aneto

Quite simple: once things work on a PC, set a ’relance’ directory and a scratch directory on the Lustre file system /cnrm, and perform housekeeping regularly!

Article posted on 1 February 2016
last modified on 5 February 2016

by senesi

Aneto is CNRM’s cluster. Although it may sometimes be heavily loaded, using it for simulations can be handy, because it is binary compatible with CNRM’s PCs and servers. Thanks to plumbing work behind the curtain, it is easy to use once you are familiar with the specifics of running ECLIS on a PC (see Running ECLIS on a PC or server at CNRM). You will have to do some housekeeping, though. This is available since Eclis V6.7.

From the user’s point of view, the only ECLIS settings that are specific to Aneto (compared with Running ECLIS on a PC or server at CNRM) are:

  • your environment must be correctly set up for running jobs on Aneto; this includes setting up the ssh keys correctly (e.g. by using the command ssh-key-aneto) and setting RDPSERVER in your bash_profile if you want to submit jobs from your PC; see Aneto’s documentation and/or your team’s IT support
  • the directory used for the ’relance’ must be located on /cnrm. You can set it with the Eclis parameter RELDIR for each experiment that you want to run on Aneto (see attached example param_MAD150a)
  • you must define a ’scratch’ directory. This is where Eclis stores the experiment run directories, the file transfer directories and the job step directories; it must be located on /cnrm. Eclis performs limited housekeeping there (each new run of an experiment removes all of that experiment’s files older than 5 days), so you have to take care of deeper housekeeping yourself. You define the root location for these directories with the environment variable ECLIS_SCRATCH; this can be done in your bash_profile, at experiment installation time, or by setting it as an Eclis parameter in the param file (see attached example param_MAD150a)
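The scratch requirements above leave the deeper housekeeping to you. The Python sketch below is a hypothetical helper, not part of Eclis: it reads ECLIS_SCRATCH, warns if that root is not under /cnrm, and lists (or, with delete=True, removes) files older than a threshold. The 15-day threshold, the function names and the dry-run default are assumptions to adapt to how you manage your own experiments.

```python
import os
import time
from pathlib import Path

# The page requires the scratch root (ECLIS_SCRATCH) to be on /cnrm.
CNRM_ROOT = Path("/cnrm")


def is_on_cnrm(path):
    """True if path is /cnrm or lies below it (symlinks are not resolved)."""
    p = Path(os.path.abspath(path))
    return p == CNRM_ROOT or CNRM_ROOT in p.parents


def old_files(root, max_age_days, now=None):
    """Sorted list of files under root whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(full).st_mtime  # lstat: do not follow symlinks
            except FileNotFoundError:
                continue  # vanished meanwhile, e.g. removed by Eclis's own clean-up
            if mtime < cutoff:
                found.append(full)
    return sorted(found)


def housekeep(root, max_age_days, delete=False):
    """Report (dry run) or remove old files under root; return how many were found."""
    if not is_on_cnrm(root):
        print(f"warning: {root} is not on /cnrm")
    candidates = old_files(root, max_age_days)
    for path in candidates:
        print(("removing " if delete else "would remove ") + path)
        if delete:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass  # already gone
    return len(candidates)


if __name__ == "__main__":
    scratch = os.environ.get("ECLIS_SCRATCH")
    if scratch is None:
        print("ECLIS_SCRATCH is not set")
    else:
        # Hypothetical threshold; dry run by default, pass delete=True once the list looks right.
        housekeep(scratch, max_age_days=15, delete=False)
```

Running it first as a dry run shows what would be removed; review that list before deleting anything. Empty directories are left in place by this sketch.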

Some information on how Eclis uses Aneto:

  • you can submit from a PC or a server (but see the note on ’relance’ directory)
  • if you use your own version of Eclis, make sure that it is located on /cnrm
  • at the time of writing, Eclis requests 16 procs per node on Aneto, so that any node can be used
  • requesting more than one node on Aneto may slow down your simulation, because the inter-node network is somewhat slow
  • you should not request more than 4 nodes on Aneto, counting all your jobs
  • there is no accounting on Aneto yet
  • the version of ’relan’ used by the simulation run script is locked to the one colocated with your Eclis version
  • running Arpege is OK thanks to the synchronization of /home/common/sync from a GMAP server onto Aneto; a number of tools are looked up there (Grib_samples and libraries, mpi_exec, ...)
  • on some occasions, Aneto runs are slow; you may try to correlate this with the list of nodes actually used, which appears in your job output after the words "Nodes used :"
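The per-node and per-user figures above amount to simple arithmetic: at 16 procs per node, 4 nodes means at most 64 procs across all your running jobs. The hypothetical Python helper below checks a planned set of concurrent jobs against both rules of thumb, assuming each job occupies whole nodes; the names are illustrative and are not Eclis settings.

```python
PROCS_PER_NODE = 16  # procs per node requested by Eclis on Aneto (at the time of writing)
MAX_NODES = 4        # recommended ceiling on nodes, counting all your jobs


def nodes_needed(n_procs):
    """Whole nodes needed for n_procs tasks, assuming each job gets whole nodes."""
    if n_procs < 1:
        raise ValueError("n_procs must be a positive integer")
    return (n_procs + PROCS_PER_NODE - 1) // PROCS_PER_NODE  # integer ceiling


def review_jobs(job_procs):
    """Check a planned set of concurrent jobs against the two rules of thumb above."""
    per_job = [nodes_needed(n) for n in job_procs]
    total = sum(per_job)
    return {
        "total_nodes": total,
        "within_limit": total <= MAX_NODES,
        # indices of jobs spanning several nodes, exposed to the slow inter-node network
        "multi_node_jobs": [i for i, k in enumerate(per_job) if k > 1],
    }


print(review_jobs([16, 32, 8]))
# {'total_nodes': 4, 'within_limit': True, 'multi_node_jobs': [1]}
```

Here jobs of 16, 32 and 8 procs use 4 nodes in total, which stays within the limit, but the 32-proc job spans two nodes and may suffer from the slow inter-node network.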
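To correlate slow runs with the nodes they used, the node list printed after "Nodes used :" can be extracted from several job outputs and compared. The sketch below assumes the node names follow that marker on the same line, separated by spaces or commas; the exact format of Aneto job output is not described here, so check a real output (and expand any compressed range notation) before relying on it. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: the node list follows "Nodes used :" on the same line,
# with names separated by spaces and/or commas. Adapt after checking a real output.
NODES_USED = re.compile(r"Nodes used[ \t]*:[ \t]*(.*)")


def nodes_used(text):
    """Node names listed after every 'Nodes used :' marker in one job output."""
    nodes = []
    for match in NODES_USED.finditer(text):
        nodes.extend(n for n in re.split(r"[\s,]+", match.group(1)) if n)
    return nodes


def nodes_only_in_slow_runs(slow_outputs, normal_outputs):
    """Count, per node, the slow runs that used it, keeping only nodes never seen in a normal run."""
    seen_in_normal = {n for text in normal_outputs for n in nodes_used(text)}
    counts = Counter(n for text in slow_outputs for n in set(nodes_used(text)))
    return {n: c for n, c in counts.items() if n not in seen_in_normal}
```

The output texts can be read from your job output files, e.g. with `Path(p).read_text()`; a node that shows up only in slow runs is worth mentioning to your IT support.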
