Feb 11, 2020  12:02 AM | Bryan Jackson - TAMU
Conn18b SLURM issue
Hi,

I'm trying to analyze HCP data (n=223) and am having trouble submitting jobs to our cluster (https://hprc.tamu.edu/wiki/Terra). All jobs submit, but as far as CONN is concerned they stay in limbo indefinitely. SLURM, however, reports the jobs as complete after a few seconds with no obvious errors. When I reopen CONN, it says I have jobs running, then tries to merge, and nothing more happens.

I think the issue revolves around how the info.mat file is created. I'm using the default SLURM settings (except for mem and cpus). Looking through the logs after the submissions fail, the JOBID variable in the GRID config is picking up my account number, not the job ID. I tried to work around this by setting an environment variable, but it was lost when the jobs were submitted to other nodes, and the value needs to be set dynamically anyway. I imagine CONN looks up the job ID using one of the SLURM commands and is just reading from the wrong column or something similar. Any ideas?
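
To illustrate the "wrong column" guess, here is a minimal Python sketch of how a job ID could be pulled out of sbatch output by matching the expected message instead of taking a token by position. This is not how CONN does it (I have not confirmed what CONN actually runs); the function names are made up for the example, and the sample output with an extra account line is hypothetical. The only things taken from SLURM itself are the default "Submitted batch job <id>" message, the `--parsable` output form (`<id>` or `<id>;<cluster>`), and the SLURM_JOB_ID variable that SLURM sets inside each running job.

```python
import os
import re
from typing import Optional


def parse_sbatch_job_id(output: str) -> Optional[str]:
    """Return the job ID found in sbatch stdout, or None if none is found."""
    # Default sbatch message: "Submitted batch job <jobid>".
    # Searching for the full phrase ignores any extra lines (for example
    # account or billing information) that a site wrapper may print.
    match = re.search(r"Submitted batch job (\d+)", output)
    if match:
        return match.group(1)
    # `sbatch --parsable` prints "<jobid>" or "<jobid>;<cluster>".
    lines = output.strip().splitlines()
    if lines:
        match = re.fullmatch(r"(\d+)(;.*)?", lines[0].strip())
        if match:
            return match.group(1)
    return None


def job_id_inside_job() -> Optional[str]:
    """Return the ID of the current job when called from within a SLURM job."""
    # SLURM sets SLURM_JOB_ID on the compute node for every job, so it does
    # not need to be exported by hand from the login node.
    return os.environ.get("SLURM_JOB_ID")
```

With a hypothetical two-line output such as "Account: 0987" followed by "Submitted batch job 42", a parser that takes the last token of the first line would return the account number, while the pattern match above returns the job ID.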
