Graphics processors are becoming a must-have in computing, so Nvidia is stepping up its work with standards bodies and open-source communities to push downstream technologies that were once largely exclusive to the company's development tools.
Much of that work centers on programming languages such as C++ and Fortran, which are seen as lacking native support for executing code across highly parallel systems.
The plan is to make generic computing environments and compilers more productive and approachable, Timothy Costa, group product manager for high-performance computing and quantum computing at Nvidia, told The Register.
“Ultimately, our goal with the open source community and programming is to enhance concurrency and parallelism for all. I say that because I do mean CPUs and GPUs,” Costa said.
Many of the technologies being opened up and brought mainstream stem from Nvidia's past work on CUDA, its parallel programming framework, which combines open and proprietary libraries.
CUDA was introduced in 2007 as a set of programming tools and frameworks for coders to write programs for GPUs. But the CUDA strategy changed as GPU usage expanded into more applications and sectors.
Nvidia is best known for dominating the GPU market, but CUDA is at the center of the company's repositioning as a software and services provider chasing a $1 trillion market valuation.
The long-term goal is for Nvidia to be a full-stack provider targeting specialized domains that include autonomous driving, quantum computing, health care, robotics, and cybersecurity.
Nvidia has built CUDA libraries specialized in those areas, and also provides the hardware and services that companies can tap into.
The full-stack strategy is best illustrated by the concept of an “AI factory” introduced by CEO Jensen Huang at the recent GPU Technology Conference. The concept is that customers can drop applications in Nvidia’s mega datacenters, with the output being a customized AI model that meets specific sector or application requirements.
Nvidia has two ways to earn money via concepts like the AI factory: through the utilization of GPU capacity or usage of domain-specific CUDA libraries. Programmers can use open-source parallel programming frameworks that include OpenCL on Nvidia's GPUs. But for those willing to invest, CUDA will provide that extra last-mile boost as it is tuned to work closely with Nvidia's GPUs.
Parallel for all
While parallel programming is widespread in HPC, Nvidia’s goal is to standardize it in mainstream computing. The company is helping the community standardize best-in-class tools to write parallel code that is portable across hardware platforms, independent of brand, accelerator type or parallel programming framework.
“The complication is – it may be measured as simply as lines of code. If you are, if you’re bouncing back and forth between many different programming models, you’re going to have more lines of code,” Costa said.
For one, Nvidia is involved in a C++ committee that is laying down the piping that orchestrates parallel execution of code that is portable across hardware. A context might be a CPU thread doing mainly IO, or a CPU or GPU thread doing intensive computation. Nvidia is specifically active in bringing a standard vocabulary and framework for asynchrony and parallelism that C++ programmers are demanding.
“Every institution, every major player, has a C++ and Fortran compiler, so it’d be crazy not to. As the language is advanced, we arrive at somewhere where we have true open standards with performance portability across platforms,” Costa said.
“Then users are of course always able, if they want to, to optimize with a vendor-specific programming model that’s tied to the hardware. I think we were arriving at kind of a mecca here of productivity for end users and developers,” Costa said.
Standardizing at a language level will make parallel programming more approachable to coders, which could ultimately also boost the adoption of open-source parallel programming frameworks like OpenCL, he noted.
Of course, Nvidia's own compiler will extract the best performance and value on its GPUs, but it is important to lower the hurdles to bringing parallelism into language standards, regardless of platform, Costa said.
“Focusing on the language standards is how we make sure we have true breadth of compilers and platform support for performance model programming,” he explained, adding that Nvidia has worked with the community for more than a decade to make low-level changes to languages for parallelism.
The initial work centered on the memory model adopted in C++11, which then had to be advanced as parallelism and concurrency took hold. The C++11 memory model focused on concurrent execution across multicore chips, but lacked the hooks for parallel programming.
The C++17 standard introduced the groundwork for higher-level parallelism features, with true portability coming in future standards. The current standard is C++20, with C++23 coming up.
“The great thing now is because that piping has been laid, if you start looking at the next iterations of the standard, you’ll see more and more user facing and productive features that are going into these languages, which are truly performance portable. Any hardware architecture in the CPU and GPU space will be able to leverage these,” Costa promised. ®