[Editor's Note]: The author of this article is "Gun God", a contributing author and technical expert at Love to play computer games.

Many readers are unfamiliar with the concept of asynchronous multi-core mobile CPUs. Asynchronous multi-core is a distinguishing feature of Qualcomm's Snapdragon series, so how does it differ from synchronous multi-core processors? What advantages does it have? How does it save energy? And what characterizes "big and small core" processors such as the Samsung Exynos 5440? This article answers these questions one by one.

Asynchronous multi-core

Asynchronous multi-core, or aSMP (asynchronous SMP), was proposed by Qualcomm and used in its Snapdragon S3/S4 processors. Several claims have circulated about it: for example, that the cores of an asynchronous multi-core processor cannot communicate with each other, making it a "glued-together dual core"; or that only one core can accept instructions at any given time, which makes it inefficient. These claims are wrong.

What is asynchronous multi-core? The key is asynchronous frequency, so it can also be called an asynchronous clock architecture. In a multi-core processor designed this way, each core can run at its own voltage and frequency. Heavy computing tasks can be assigned to a core running at high frequency, while lighter tasks can be handled by a core running at a lower frequency. In a synchronous multi-core processor, all cores must run at the same voltage and frequency.

As shown in the figure below, suppose there are two tasks, one with a heavy computing load and one with a light load (the purple part of the figure indicates each task's load). An asynchronous multi-core processor lets the heavily loaded core, CPU0, run at a higher frequency (the blue part of the figure represents frequency), while the lightly loaded core, CPU1, runs at a lower frequency and voltage, reducing power consumption. In a synchronous multi-core processor, CPU1 is just as lightly loaded, but architectural constraints force it to stay at the same high frequency and voltage as CPU0, which wastes energy.
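To get a feel for the size of the saving, here is a minimal sketch using the common first-order approximation for CMOS dynamic power, P = C * V^2 * f. The operating points and the capacitance value are assumptions chosen for illustration, not Qualcomm figures, and leakage power is ignored.

```python
def dynamic_power(capacitance, voltage, freq_hz):
    """First-order CMOS dynamic power estimate: P = C * V^2 * f (watts)."""
    return capacitance * voltage ** 2 * freq_hz

# Hypothetical operating points as (volts, hertz); not measured Snapdragon data.
HIGH = (1.2, 1.5e9)
LOW = (0.9, 0.5e9)
C_EFF = 1e-9  # assumed effective switched capacitance per core, in farads

def total_power(points):
    """Sum the dynamic power of all cores, one (voltage, frequency) pair per core."""
    return sum(dynamic_power(C_EFF, v, f) for v, f in points)

# Synchronous: the lightly loaded CPU1 is forced to CPU0's operating point.
sync_power = total_power([HIGH, HIGH])
# Asynchronous: CPU1 drops to the lower operating point.
async_power = total_power([HIGH, LOW])

saving = 1 - async_power / sync_power
print(f"sync: {sync_power:.3f} W, async: {async_power:.3f} W, saving: {saving:.1%}")
```

With these made-up numbers the asynchronous configuration draws about 40% less dynamic power. Because voltage enters the formula squared, lowering voltage together with frequency matters more than lowering frequency alone.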

In Qualcomm's actual design, not only can the cores run at different voltages and frequencies, but their shared L2 cache can also run at its own separate voltage and frequency according to the actual load, to maximize energy savings.

The asynchronous multi-core architecture looks attractive, but it is not perfect. In some cases the asynchronous clock architecture loses performance. One case is when a core's L1 cache misses and the data must be fetched from the L2 cache: because the cores and the L2 cache run at different frequencies, the transfer takes longer to complete, as shown by arrow A in the figure. For example, the Krait CPU core in Qualcomm's S4 can run at up to 1.5GHz, while the L2 cache tops out at 1.3GHz. If the L2 cache is in a lower-frequency energy-saving state, the core has to wait even longer for the L2 cache to finish the transfer.

The other case costs even more performance. When one core's L1 cache misses, say CPU0's, and the required data sits in CPU1's L1 cache, the data has to be transferred from CPU1's L1 cache to CPU0's L1 cache, as shown by arrow B in the figure. If CPU1 happens to be lightly loaded and running at a low frequency, the transfer takes a long time, and CPU0, running at a high frequency, wastes cycles waiting.
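Both stall cases come down to a fast clock domain waiting on a slower one. The sketch below is a simplified model: it assumes the slower side needs a fixed number of its own clock cycles to deliver the data, and it ignores the extra synchronization delay of crossing clock domains. The 1.5GHz and 1.3GHz figures come from the text above; the low-power L2 frequency and the cycle count are hypothetical.

```python
def stall_cycles(core_hz, source_hz, source_cycles):
    """Core cycles spent waiting while the data source takes source_cycles of its own clock."""
    return source_cycles * core_hz / source_hz

CORE_HZ = 1.5e9      # Krait peak core frequency (from the text)
L2_MAX_HZ = 1.3e9    # L2 cache peak frequency (from the text)
L2_LOW_HZ = 0.65e9   # assumed energy-saving L2 frequency (hypothetical)
ACCESS_CYCLES = 20   # assumed access cost in source-clock cycles (hypothetical)

at_max = stall_cycles(CORE_HZ, L2_MAX_HZ, ACCESS_CYCLES)
at_low = stall_cycles(CORE_HZ, L2_LOW_HZ, ACCESS_CYCLES)

print(f"L2 at 1.3 GHz:  core waits {at_max:.1f} cycles")
print(f"L2 at 0.65 GHz: core waits {at_low:.1f} cycles")
```

The same function covers case B if `source_hz` is taken to be CPU1's frequency instead of the L2 cache's: in this model, halving the source frequency doubles the number of cycles the fast core spends idle.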

This also shows up in SiSoftware Sandra's multithreading efficiency test. Compared with the synchronous multi-core Tegra 2 (green in the figure), the asynchronous multi-core chip (purple in the figure, Sony Xperia S, Qualcomm 8660) has higher inter-core communication latency and lower bandwidth. The Intel Atom (blue in the figure), which uses Hyper-Threading, naturally has the lowest latency and highest bandwidth between cores, because its two virtual cores are really one physical core.

Big and Small Cores

How, then, can core capability be adjusted dynamically to match the weight of the computing task and maximize energy savings, while avoiding the performance loss that asynchronous multi-core suffers in some cases? ARM's answer is the big and small core architecture (big.LITTLE).

This architecture consists of one cluster of "big cores" and one cluster of "small cores". The cores within each cluster use the traditional synchronous clock architecture, running at the same frequency and voltage, so they do not suffer the performance loss of asynchronous multi-core. A "big core" is a high-performance core that runs at a higher voltage and frequency, consumes more energy, and handles heavy tasks; the Cortex-A15 is a typical example. A "small core" has lower performance but higher efficiency. The Cortex-A7, for example, delivers 1/2 the performance of the A15 but consumes only 1/7 of the power, so its energy efficiency is 3.5 times that of the A15. For tasks with little computing pressure, such as sending a text message, there is no need to wake the A15 core, which is powerful but power-hungry. The A7 core has enough performance for the job and saves a lot of power.
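The 3.5x figure follows directly from the two ratios quoted above. As a quick check, this sketch defines energy efficiency as performance per unit of power (a definition assumed here; the article does not state one) and uses exact fractions so no rounding creeps in.

```python
from fractions import Fraction

# Cortex-A7 relative to Cortex-A15, as quoted in the text.
relative_performance = Fraction(1, 2)
relative_power = Fraction(1, 7)

# Efficiency taken as performance per unit of power.
relative_efficiency = relative_performance / relative_power

# Energy for one fixed task = power * time; the A7 takes twice as long at 1/7 the power.
relative_energy_per_task = relative_power / relative_performance

print(f"A7 efficiency vs A15: {float(relative_efficiency)}x")
print(f"A7 energy per task vs A15: {relative_energy_per_task}")
```

Put another way, under these ratios the A7 finishes a given task in twice the time but spends only 2/7 of the energy the A15 would.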

The big and small cores use the same instruction set, and switching happens at the cluster level. As shown in the figure above, the system activates the two big cores when the workload is heavy and the two small cores when it is light. The clusters are connected by a specially designed bus, and during a switch the bus automatically transfers the state of one cluster to the other. Switching is very fast, taking less than 20 microseconds.
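Cluster-level switching can be pictured as a simple policy that keeps exactly one cluster active at a time. The sketch below is a toy model, not ARM's actual switching logic: the class name, the load thresholds, and the use of two thresholds to avoid flapping between clusters are all assumptions made for illustration, and the state transfer over the bus is not modeled.

```python
class ClusterSwitcher:
    """Toy cluster-migration policy: all work runs on exactly one cluster at a time."""

    def __init__(self, up_threshold=0.8, down_threshold=0.3):
        # Hypothetical thresholds; separate up/down values avoid rapid flapping.
        self.up_threshold = up_threshold
        self.down_threshold = down_threshold
        self.active = "little"
        self.switches = 0

    def update(self, load):
        """Choose the cluster for the given load (0.0 to 1.0) and return its name."""
        target = self.active
        if self.active == "little" and load > self.up_threshold:
            target = "big"
        elif self.active == "big" and load < self.down_threshold:
            target = "little"
        if target != self.active:
            # A real system would hand the cluster state across the interconnect here.
            self.active = target
            self.switches += 1
        return self.active


switcher = ClusterSwitcher()
trace = [switcher.update(load) for load in [0.1, 0.9, 0.5, 0.2, 0.4]]
print(trace, "switches:", switcher.switches)
```

In this trace the system moves to the big cluster when load jumps to 0.9, stays there at 0.5, and only returns to the small cluster once load falls below the lower threshold.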

In fact, before ARM proposed big and small cores, NVIDIA's Tegra 3 already embodied the idea. Tegra 3 includes four high-performance A9 cores (equivalent to big cores) and one lower-performance A9 companion core (a small core) built with a low-power design. Of course, that design was not as complete as big.LITTLE: it has no specially designed cache-coherent interconnect, and its switching time is longer, on the order of milliseconds.

Unsurprisingly, the Samsung Exynos 5440 will be the first SoC designed with big and small cores. It will use the 28nm HKMG process and integrate four Cortex-A15 cores as the big cores and four Cortex-A7 cores as the small cores. It is rumored that the GPU will also return to PowerVR, and the chip is likely to be used in the Galaxy S4.
