Equipped With a Brain That Surpasses Computers: Strengthening Cooperation With Arm for CPU
In this second installment of the series, Samsung Newsroom sat down with two project leaders at Samsung Electronics to better understand the roles of the CPU and NPU in mobile devices. A computer’s central processing unit (CPU) is often compared to the human cerebrum, the largest part of the brain, which handles many responsibilities. Similarly, the CPU is the most important unit in a computer, handling its four main functions: memory, decoding, operation and control. The CPU largely determines the overall performance of a PC. Likewise, a mobile CPU runs all software on an operating system (OS) and controls other hardware peripherals, helping a smartphone perform at its optimal level.
CPU performance is determined by a variety of factors, including the clock speed,1 IPC2 and the number of cores.3 The phones of the past were powered by a single-core CPU with a simple pipeline structure. Consequently, they were limited in parallel processing, and the maximum frequency only amounted to a few hundred MHz. However, the CPU in smartphones today has a superscalar4 structure, allowing it to execute multiple instructions in parallel. Additionally, it can run at 3 GHz, or 3 billion cycles per second, and have eight or more cores. Mobile CPUs now have a microarchitecture that pushes performance beyond that of desktop CPUs.
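To see how these three factors combine, the following is a minimal sketch of an idealized upper bound on instruction throughput: clock speed times IPC times core count. The function name, the 300 MHz figure standing in for "a few hundred MHz", and both IPC values are assumptions chosen for illustration and are not Exynos specifications. Real chips mix core types running at different clocks and rarely sustain their peak IPC.

```python
def peak_instructions_per_second(clock_hz: float, ipc: float, cores: int) -> float:
    """Idealized upper bound: cycles per second x instructions per cycle x cores."""
    return clock_hz * ipc * cores


# Hypothetical older phone: one simple pipelined core at 300 MHz, IPC assumed to be 1.
old_phone = peak_instructions_per_second(clock_hz=300e6, ipc=1.0, cores=1)

# Hypothetical modern phone: eight superscalar cores at 3 GHz, IPC assumed to be 4.
modern_phone = peak_instructions_per_second(clock_hz=3e9, ipc=4.0, cores=8)

print(f"old phone:    {old_phone:.2e} instructions/s")
print(f"modern phone: {modern_phone:.2e} instructions/s")
print(f"ratio:        {modern_phone / old_phone:.0f}x")
```

Under these assumed numbers, the tenfold clock increase accounts for only part of the gain; the rest comes from wider execution per cycle and from the additional cores.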
Exynos’ CPU has evolved from a big-core design to a big-little and then a big-mid-little structure to keep its size small and power consumption low. The big-little structure is a processing architecture that dynamically switches between two types of cores, big and little, to maximize either performance or power efficiency, depending on the task. For example, texting requires far less CPU performance than playing a 3D game. Therefore, when the user sends a text, the processor uses a smaller, power-efficient core instead of a high-performing core.
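The core-selection idea can be sketched as follows. This is a simplified illustration and not how Exynos or any real scheduler is implemented: the `CoreType` class, the `pick_core` function, and all capacity and power numbers are hypothetical. The policy shown simply picks the lowest-power core type that can still meet a task's performance demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreType:
    name: str
    capacity: int  # relative peak performance (hypothetical units)
    power: int     # relative power draw (hypothetical units)


# Hypothetical big-mid-little configuration.
CORES = [
    CoreType("little", capacity=30, power=1),
    CoreType("mid", capacity=60, power=3),
    CoreType("big", capacity=100, power=8),
]


def pick_core(demand: int) -> CoreType:
    """Return the lowest-power core type whose capacity covers the demand."""
    candidates = [core for core in CORES if core.capacity >= demand]
    if not candidates:
        # Nothing is fast enough, so fall back to the most capable core.
        return max(CORES, key=lambda core: core.capacity)
    return min(candidates, key=lambda core: core.power)


print(pick_core(10).name)  # light task such as texting
print(pick_core(50).name)  # moderate task
print(pick_core(90).name)  # heavy task such as a 3D game
```

With these assumed values, the light task lands on the little core and the heavy task on the big core, which mirrors the texting versus 3D game example.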
“CPU determines the competitiveness of all systems, including the SoC. It’s an influential area and the top priority when it comes to developing advanced semiconductor technology,” said Wookyeong Jeong, the SoC Design Team 2’s project leader who is in charge of all tasks related to the Exynos’ CPU. Jeong has worked in the CPU field for more than 20 years since joining Samsung.
“Achieving a high performance with a limited power budget is key,” said Jeong. “It is important to operate different types of CPU cores, including big, mid and little in appropriate combinations to achieve maximum efficiency in various situations.” Exynos’ CPU optimizes a combination of activated cores to deliver users the best experience in situations requiring high performance, such as playing a game or using a camera on mobile devices.
Based on the IP of semiconductor design company Arm, Samsung Electronics is taking the performance of CPUs up a notch. When Jeong was asked about the specific tasks of the team’s developers, he explained the team’s role and responsibilities.
“We decide the performance goal for the CPU of a product, acquire the CPU IP, predict and review the performance, validate and conduct debugging5 before mass production and further steps. We take care of the overall development work to enhance CPU performance,” Jeong explained. “The System LSI Business is responsible for taking the RTL CPU design from Arm to create an optimal semiconductor chip,” Jeong said. “The team is also responsible for designing and creating the CPU peripheral circuit, such as an appropriate memory subsystem, for maximizing CPU performance.”
“With the adoption of Arm CPU, we have a vision of becoming the mobile industry’s best CPU manufacturer by optimizing software not only on a chip level but also on a device level. We aim to become an E2E6 total solution provider,” said Jeong when asked about the future development direction of the company. “To achieve this goal, the CPU developers have been working very closely with Arm, device manufacturers, Samsung Foundry and others as one team since the early development stages. In addition, they’re seeking various ways to enhance performance, such as advanced packaging technology that enhances performance further,” Jeong explained.
“With the emergence of AR and the metaverse, appropriately utilizing all processors, such as CPU, GPU and NPU for comprehensive machine learning processing on a SoC level would give us an important, competitive edge. We’re going to focus on increasing our competitiveness by strengthening the CPU’s performance in machine learning processing as well,” Jeong added.
Real, Imaginative Technology: The Advancement of NPU Based on Proprietary Technology Throughout Six Generations
An NPU is a processor optimized for deep learning7 algorithm arithmetic. It can process a large amount of data as fast and efficiently as the human neural network. For this reason, it is mainly used for AI computation. While this may seem complicated, NPUs are already commonly used in devices. For example, thanks to the NPU, a smartphone’s camera can recognize the objects, environment and people in the frame and focus accordingly. It can automatically switch on the food filter mode for food photography or even remove unwanted subjects from the picture.
Before the NPU existed, the GPU mainly performed AI computation. However, its computation efficiency8 was low because of structural differences in the hardware. These days, the NPU is mainly in charge of AI computation, and it can process data more efficiently in mobile devices as well. It is optimized for parallel data computing so that AI-based applications can run faster at low power.
Exynos’ NPU development began in 2016. The first SoC equipped with the NPU was the Exynos 9820, which shipped in the Galaxy S10, released in 2019. “When the first task force was formed six years ago, we had only about 20 people, but now our team has grown tenfold if we include the members from our overseas research institutes,” said project leader Suknam Kwon. Kwon used to design the hardware of the SoC and has been working on the NPU since its second generation. “The NPU is an area of high interest these days, but back then, it was so unfamiliar and new that we had to learn from videos and university lectures overseas.”
In the past, there were few applications for the NPU, such as detecting objects in images. However, in the era of AI, market demand is increasing for high-performing IP that can handle a large amount of computation. Such IP can be used for tasks such as improving camera picture quality, voice services and more. In addition, since size and power consumption increase as IP performance is enhanced, determining the most efficient architecture is key.
As the NPU gets more powerful, it offers improvements in object recognition speed and photo enhancement. The performance of the NPU in the latest Exynos is twice that of the previous generation. Having independently developed the NPU for six product generations, the SoC Design team has built up expertise and know-how in NPU technology that is second to none. “With advantages in benchmarks such as MLPerf, power efficiency, size, etc., Exynos’ NPU is a highly competitive IP solution,” Kwon said. “Through optimization of architecture for performance and improvements in power efficiency, the NPU adds competitive value for the Exynos processor,” he said.
Going forward, the technologies that utilize NPU will continue to evolve. “I think the on-device AI, which performs AI computation in one’s smartphone rather than going through a server, will become more widely used because there is less risk of having sensitive personal information leaked,” Kwon said. “Because of this, mobile NPU performance needs to be even more enhanced. These days, one NPU is used for many computations, but I predict that there will be more demands for operating specialized AI algorithms for each application program. So, developing an NPU that is specialized for each domain will be important as well,” he added.
When asked about autonomous driving, Kwon discussed the role that NPU will play in the industry. “In the near future, the advanced driver-assistance system (ADAS) will become a reality,” Kwon said. “It requires hardware that can perform autonomous driving algorithms using a massive amount of data in real time. To accomplish this, a higher-performing NPU is needed, and Samsung is preparing an NPU with powerful capabilities for autonomous driving devices that meet the market’s demands.”
At the end of the interview, Kwon described what he has found most meaningful during development. “Each year, Exynos comes with a higher-performing NPU that is increasingly enhanced, which is very meaningful,” he said. “It will continue to become a key IP for future markets. I take a lot of pride in the fact that developing NPU has led to the growth of both myself and the company, and even contributes to the country’s overall competitiveness,” he said. “It’s the best field where it makes the things in one’s imagination come true.”
1 Clock: A signal that continuously oscillates between 0 and 1 to pace computation. Its frequency is expressed in Hz, and a higher clock figure means a faster processing speed.
2 IPC (Instructions per Cycle): The number of instructions processed per clock cycle. IPC is a metric that assesses how efficiently a CPU is operating.
3 Core: The key part in the physical processing circuit within the CPU. The more cores there are, the easier it is to perform multiple actions at the same time. Single-core means there’s one core, dual-core means there are two, quad-core means there are four and so on.
4 Superscalar: An architecture that combines the advantages of pipeline and parallel processing and enables instructions from multiple pipelines to be processed in parallel. The processing speed is fast because multiple instructions can be executed at the same time without first having to wait.
5 Debugging: A process of checking whether the designed program is accurate, identifying program errors and fixing them.
6 E2E: End to End
7 Deep Learning: Technology that enables a machine to learn, infer and reason like human beings using data.
8 In a mobile SoC, higher efficiency means using less power or achieving faster speeds.