Downscaling climate data faster using GPU-enabled Machine Learning and Multiprocessing

Dr. Sanaa Hobeichi (Postdoc, CLEX), Dr. Samuel Green (CMS team, CLEX), Jingbo Wang (Manager, Training, Research and User Services, NCI), and Nidhi Nishant (Postdoc, CLEX)

“Faster, more efficient software,” “speedups and code profiling,” and “parallel programming” – these were the central takeaways from the two-week National Computational Infrastructure (NCI) HPC-AI Hackathon hosted in partnership with NVIDIA, and the OpenACC organisation from 24 October to 4 November. Read the full story on the NCI website: GPU Hackathon builds research skills and software capacity.

Modern programming has benefited from GPU accelerators for more than two decades, and now scientific applications are taking advantage of these performance gains. The recent boom in AI has made accelerators even more critical. Nine teams with more than 50 researchers from across Australia, Taiwan, India, Singapore, and the US worked together and with the NCI and NVIDIA experts to crack open their codes and make them more efficient. The science domains covered climate and weather science, geophysics, material science, fluid dynamics, aircraft modelling, and computational chemistry. All teams created substantial improvements to their HPC or AI applications. NCI is proud to support cutting-edge research around Australia and the world.

Dr. Sanaa Hobeichi, Dr. Nidhi Nishant, and Dr. Sam Green formed a team focused on improving localised climate modelling, with the support of the Australian Research Council Centre of Excellence for Climate Extremes. They worked on “a project to develop and optimise machine learning models for downscaling climate data,” said Dr Hobeichi.

She said, “The HPC-AI Hackathon was a significant opportunity for us to develop improvements to our code by taking advantage of the modern graphics processing units (GPUs) dedicated during the hackathon. We also had a chance to work closely with NVIDIA experts Juntao Yang and Yi Kwan Zheng on optimising this work on GPUs.”

Figure 1: Low-resolution (Left) and High-resolution (Right) maps of Evapotranspiration over part of New South Wales.

The maps in Figure 1 show what the team is trying to achieve. As they say, “The aim of this project is to use GPU-enabled machine learning to produce the high-resolution map from the low-resolution one.” This 9-fold increase in resolution provides vital local information about water use, evaporation from the soil, and transpiration by plants. These fine details are essential for local planning of adaptation to climate change impacts on agriculture, water supply, and fire risk, among others.

NCI’s ongoing training collaboration with vendors has helped users to develop expertise with their codes directly on the Gadi system, working in real-time on the platform they use every day. The Hackathon has also provided an opportunity to develop the mentors’ skills in training, education and content delivery.

The NCI-NVIDIA HPC-AI Hackathon has set up researchers to perform bigger, better and more efficient calculations with the HPC resources available to them. NCI is looking forward to hosting further GPU training events for our users and international computational researchers in coming years. We hope more CLEX researchers can participate and benefit from these great events in the future.

Hackathon Outcomes

Coming into the Hackathon, we already had a working Python script that generates high-resolution maps of evapotranspiration (Figure 1) from low-resolution ones by building neural network models that emulate downscaling, using the GPU-enabled torch.nn package. The script builds a separate neural network for every gridcell in the low-resolution map, which contains more than 6000 gridcells, and processes them sequentially. A single gridcell takes about 1.5 minutes, so downscaling a region of 6000 gridcells takes roughly 150 hours. While there are several ways to parallelise processes on CPUs, we did not know how to run multiple processes on a GPU. Multiprocessing our code on GPUs would give a significant speed-up of the downscaling, which was our main goal for this hackathon. A simplified sketch of the sequential approach is shown below.
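
For illustration, a minimal sketch of the sequential per-gridcell approach might look like the following (the network architecture, data layout, and function names are simplified assumptions rather than the production code):

import torch
import torch.nn as nn

def build_model(n_inputs, n_outputs):
    # A small fully connected network mapping one coarse gridcell's predictors
    # to the fine-resolution cells it contains (9 of them for a 9-fold increase).
    return nn.Sequential(
        nn.Linear(n_inputs, 64),
        nn.ReLU(),
        nn.Linear(64, n_outputs),
    )

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def downscale_gridcell(x_coarse, y_fine, n_epochs=100):
    # Train one downscaling emulator for a single low-resolution gridcell.
    model = build_model(x_coarse.shape[1], y_fine.shape[1]).to(device)
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    x, y = x_coarse.to(device), y_fine.to(device)
    for _ in range(n_epochs):
        optimiser.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimiser.step()
    return model(x).detach().cpu()

# Sequential baseline: one gridcell after another (more than 6000 in total).
# predictions = [downscale_gridcell(x, y) for x, y in gridcell_data]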

Our journey started with meeting our mentors, explaining how our code worked, and outlining our goals for the event. Our mentors showed us how to use NVIDIA Nsight Systems to profile the code and helped us analyse how efficiently the script was using the CPUs and the GPU. We annotated the script with the NVTX library so that its main stages appeared in the profiler timeline. The profile showed that we were using only ~25% of the GPU memory and 5 of the 16 CPUs we had requested. We therefore needed a way to build multiple neural networks in parallel so that the GPU and CPUs would be fully used.
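
As a rough illustration of that annotation step, NVTX ranges can be pushed around the main stages of the script so that they appear as labelled regions in the Nsight Systems timeline (the stage names and the nsys command below are examples, not our exact setup):

import torch

# Mark major phases so they show up as named ranges in the profiler.
torch.cuda.nvtx.range_push("load_data")
# ... read the low- and high-resolution training data ...
torch.cuda.nvtx.range_pop()

torch.cuda.nvtx.range_push("train_gridcell")
# ... build and train the neural network for one gridcell ...
torch.cuda.nvtx.range_pop()

# The annotated script can then be profiled with, for example:
#   nsys profile -t cuda,nvtx -o downscaling_profile python downscale.py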

Our mentors also helped us upgrade the script to use GPU multiprocessing. To achieve this, we took advantage of the torch.multiprocessing library to assign each gridcell to its own process on the GPU. While doing this, we also refactored the script to reduce repetition and added more NVTX annotations to give the profiler output more detail.
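
A minimal sketch of this pattern, using torch.multiprocessing to train several gridcells at once on the same GPU, is shown below (the worker function, the split_gridcells helper, and gridcell_data are illustrative assumptions):

import torch
import torch.multiprocessing as mp

def worker(rank, batches):
    # Each process trains the emulators for its own subset of gridcells
    # on the shared GPU.
    for x_coarse, y_fine in batches[rank]:
        downscale_gridcell(x_coarse, y_fine)  # as in the sequential sketch above

if __name__ == "__main__":
    n_procs = 4                                         # gridcells trained in parallel
    batches = split_gridcells(gridcell_data, n_procs)   # hypothetical helper
    # mp.spawn uses the 'spawn' start method, which CUDA requires for child processes.
    mp.spawn(worker, args=(batches,), nprocs=n_procs, join=True)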

We evaluated the predicted evapotranspiration to make sure the neural networks trained in parallel were performing correctly, and used the Nsight tool to confirm that the processes were genuinely running in parallel with no obvious bottlenecks. We then began analysing how the runtime changed with the number of grid cells processed in parallel, keen to find out how many processes we could run at once to maximise the use of the GPU.

Finally, we increased the number of processes running in parallel and tracked various CPU/GPU usage parameters to determine the optimal number of processes that enables full usage of one GPU. Figure 2 shows the CPU and GPU usage and the runtime as we increase the number of processes (number of grid cells) run in parallel. We determined that a single GPU can handle 4 processes (4 grid cells) in parallel. Future work will be to scale this application across multiple GPUs to further reduce the runtime.

Figure 2: The usage of CPU and GPU resources versus the Number of grid cells downscaled in parallel
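
Continuing the earlier sketch, the kind of sweep behind numbers like those in Figure 2 can be as simple as timing a fixed workload with increasing process counts (the process counts shown are illustrative; GPU and CPU utilisation can be watched separately, for example with the Nsight Systems timeline):

import time
import torch.multiprocessing as mp

for n_procs in (1, 2, 4, 8):
    start = time.perf_counter()
    mp.spawn(worker, args=(split_gridcells(gridcell_data, n_procs),),
             nprocs=n_procs, join=True)
    print(f"{n_procs} processes: {time.perf_counter() - start:.1f} s")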

The NCI hackathon was a great opportunity to work closely with NVIDIA GPU experts. Their guidance helped us learn how to convert our code from sequential execution to GPU multiprocessing, how to profile the code and find bottlenecks, and how to use GPU and CPU resources more efficiently for our application. As a result, we created a downscaling emulator that uses fewer resources and is so far 4 times faster (see Figure 3). This is a work in progress, and we are excited to see how much further we can reduce the runtime on multiple GPUs, and to apply these new multiprocessing skills to future applications in climate science.

Figure 3: The number of grid cells versus downscaling time in the case of Parallel and Sequential processing.