HPC-UGent system status

Status

Tier-2 UGent
Login nodes: Available
Compute nodes: Available
Shared filesystems: Available
Tier-1 Hortense (VSC)
Login nodes: Available (only for accepted Tier-1 compute projects - see https://www.vscentrum.be/compute)
Compute nodes: Available
Scratch filesystem: Available
Tier-1 Cloud (VSC): Available (only for accepted Tier-1 cloud projects - see https://www.vscentrum.be/cloud)
(network disruption expected Mon 11 July 08h00 - 20h00 - more information below)
VSC account page: Available

Known issues

[Fri 5 Aug 2022] Problems with logging in to HPC-UGent Tier-2 login nodes + communication with job scheduler

  • [20:52] The problems should be resolved now.
    Please contact hpc@ugent.be if you still see any issues.
  • [15:50] The situation is gradually improving, and the problems with logging in and contacting the job schedulers (for submitting jobs or checking the job queue) should hopefully be fully resolved in the coming 1-2 hours.
  • [14:30] Some problems are also occurring with the communication from the login nodes to the job scheduling software, which leads to errors like this when trying to submit a job:
    sbatch: error: Batch job submission failed: Socket timed out on send/recv operation
    
    We are still working to resolve the problems, and expect they will become gradually less frequent in the coming hour.
  • [14:06] Problems are occurring when logging in to the HPC-UGent Tier-2 infrastructure for some users, intermittently.
    This leads to errors like this when using SSH:
    vsc40000@login.hpc.ugent.be: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
    
    It also causes the HPC-UGent web portal to be unresponsive.
    The issue is caused by problems with the HPC-UGent login nodes themselves (not with your VSC account).
    We are working on resolving the problem ASAP.
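
For intermittent submission failures like the sbatch socket time-out quoted in the 14:30 update above, a small retry wrapper can save some manual resubmitting. The following is only a minimal sketch, not an official HPC-UGent tool: the function name run_with_retry, the job script name job.sh, and the attempt count and delay are all assumptions chosen for illustration.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=5, delay=30.0,
                   run=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with status 0 or the attempts are used up.

    Returns the result object of the last attempt. The run and sleep
    parameters exist only so the function can be exercised without a
    real scheduler.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            # Linear back-off: wait a little longer after each failure.
            sleep(delay * attempt)
    return result


if __name__ == "__main__":
    # "job.sh" is a hypothetical job script name.
    outcome = run_with_retry(["sbatch", "job.sh"])
    print(outcome.stdout or outcome.stderr)
```

A submission that reports a time-out may in some cases still have been accepted by the scheduler, so it is prudent to check the job queue before relying on automatic retries, to avoid duplicate jobs.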

Planned maintenance

Planned UGent network disruption on 11 July 2022

A scheduled disaster recovery procedure test will take place in the datacenter of Ghent University.
This will result in network disruptions throughout 11 July 2022, from at least 08h00 to 20h00; we cannot rule out that the disruption will last longer.

This will affect the following infrastructure:

  • VSC account page
    https://account.vscentrum.be will be unavailable throughout the day.
    Connected services (such as the VSC firewall) will also be unreachable.
  • UGent Tier-2 infrastructure
    Access to the Tier-2 login nodes and web portal, and access to the internet from the HPC infrastructure (incl. external license servers), will not be possible during the day.
    Jobs that specifically rely on an external connection (e.g. license server) could terminate prematurely on Monday 11 July.
    Other queued and running jobs should not be affected.
  • VSC Tier-1 Compute Hortense
    Access to the Tier-1 Hortense login nodes and web portal will not be possible from the UGent network, but should still be possible via other university networks (KULeuven, VUB, UAntwerpen).
    Access through the VSC firewall application will also not be possible.
    Note that the Tier-1 Hortense infrastructure will also not be able to connect to the internet (incl. external license servers) during the day.
    Jobs that specifically rely on an external connection (e.g. license server) could terminate prematurely on Monday 11 July.
    Other queued and running jobs should not be affected.
  • VSC Tier-1 Cloud
    VSC Tier-1 Cloud services (including web dashboard/API) and VMs will not be accessible during the day.
    Project VMs and storage should not be affected (everything will continue to run as usual), but will not be reachable from external networks.


Migration of Tier-2 clusters to RHEL8 (March-May 2022)

During the first half of 2022, the HPC-UGent login nodes and the oldest Tier-2 clusters that were using CentOS 7 as their operating system
were migrated to the RHEL8 (Red Hat Enterprise Linux 8) operating system.

More information, including an overview of the impact of this migration, is available here.

Reminders

  • [Wed 9 June 2021] New job command wrappers
    We switched to new job command wrappers for all HPC-UGent Tier-2 clusters.
    This switch should be transparent: you don't need to change your workflow or job scripts.
  • [Wed 27 May 2020] All SSH public keys uploaded before 20 May 2020 have been revoked.
    More information regarding this security operation at https://docs.vscentrum.be/en/latest/security_measures_20200520.html

Contact

For issues regarding Tier-2 UGent systems, contact hpc@ugent.be

For issues regarding Tier-1 Compute (VSC), contact compute@vscentrum.be

For issues regarding Tier-1 Cloud (VSC), contact cloud@vscentrum.be

Cluster load of Tier-2 UGent systems

Consult http://hpc.ugent.be/clusterstate/

(only available within the UGent network)