How to Use the Dashboards

Overview

This dashboard gives researchers and system administrators a consolidated view of the infrastructure, to help them manage resources and plan capacity.

Indicators

This section presents high-level statistics for the GPU computing resources.

Cluster Load

Real-time monitoring of the cluster status in terms of GPU and CPU utilisation.

Queueing

Lists all queued jobs. Possible reasons why a job is queued include:

  • The job requests more GPUs than remain available in the project.
  • The job is waiting for other jobs to finish before it can be scheduled.
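
The two reasons above can be illustrated with a minimal Python sketch. This is not the scheduler's actual logic: the `Project` class, the `queue_reason` function, and all field names are hypothetical, and the second reason is modelled here simply as "not enough free GPUs in the cluster right now", which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    """Hypothetical project record; field names are illustrative only."""
    name: str
    gpu_quota: int      # GPUs assigned to the project
    gpus_in_use: int    # GPUs currently allocated to the project's jobs

    @property
    def gpus_remaining(self) -> int:
        # Never report a negative remainder.
        return max(self.gpu_quota - self.gpus_in_use, 0)


def queue_reason(requested_gpus: int, project: Project,
                 cluster_free_gpus: int) -> Optional[str]:
    """Return a likely reason a job would queue, or None if it could start.

    Assumption: a job "waiting for other jobs to finish" is approximated
    as one that fits the project but not the GPUs currently free.
    """
    if requested_gpus > project.gpus_remaining:
        return "requested GPUs exceed the GPUs remaining in the project"
    if requested_gpus > cluster_free_gpus:
        return "waiting for other jobs to finish"
    return None


# Example: a project with 8 GPUs, 6 of them already in use.
demo = Project(name="demo", gpu_quota=8, gpus_in_use=6)
print(queue_reason(4, demo, cluster_free_gpus=10))
print(queue_reason(2, demo, cluster_free_gpus=1))
print(queue_reason(2, demo, cluster_free_gpus=10))
```

In this sketch the first call reports the project limit, the second reports waiting on other jobs, and the third returns None because the job could start immediately.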

Idle GPUs

Displays the number of GPUs that are allocated to running workloads but currently idle.

Running Workloads

A summary of the workloads that are currently running.

Analytics

This dashboard provides a more detailed breakdown of the DGX running status. Key statistics are reported at four levels:

  • Cluster
  • Project
  • Workloads
  • Nodes