Early this morning, ARM announced the ultra-low-power Cortex-A35 processor core (codenamed Mercury) at its own technology conference. Manufacturers can configure the core to suit their performance and power requirements and use it across different markets.
ARM divides the Cortex-A series into three tiers: high performance, low power, and ultra-low power:
The high-performance tier is, of course, represented by ARM's big-core designs, the Cortex-A57 and A72 (along with the A15 and A17, which are gradually being phased out);
The low-power tier is represented by the A53, known for its good performance per watt. Depending on requirements, it can be used on its own in multi-core configurations or as the LITTLE cores in a big.LITTLE design;
The ultra-low-power tier, after the A5 and A7, now adds the A35.
Although it seems counterintuitive, the A35 and A53 are not in the same tier: the A35 is the successor to the A7/A5 (ARM's product numbers are climbing fast...).
In our Exynos 5433 tests, we found that the A53, nominally the upgrade to the A7, could not hold the A7's power level. It behaves more like an extension of the A7's performance curve, which means the A53's performance per watt did not improve much. But because the A53 usually runs at higher clock speeds, it can reach further on the performance side. The A35's headline gain in energy efficiency is, in effect, a slap in the face for the A53, which was marketed on its energy efficiency at the time.
Strategically, however, the most important change the A35 brings is that the high-, mid- and low-end tiers of the Cortex-A series will all use the 64-bit ARMv8 architecture. Once older cores such as the A5/A7 retire, ARM will have achieved its ambition of "64-bit across the whole family". In addition, the A35 can be paired with the A72, A57, A53 and other cores in a big.LITTLE SoC, letting manufacturers combine cores of different sizes as their needs dictate. Taken to an extreme, they could even build an SoC with three core clusters: A35+A53+A72.
Architecture improvement
The A35, like the A7/A53, is still an in-order, dual-issue design, and its 8-stage pipeline is the same length as in the previous cores. ARM improves energy efficiency mainly by refining individual blocks.
The key improvements are in front-end efficiency: a redesigned instruction prefetch unit and stronger branch prediction. To balance performance against power, the instruction fetch bandwidth has also been adjusted and the fetch queue shortened.
Compared with the A7, the A35's cache subsystem is significantly faster. The A35 borrows much of the A53's cache design. The L1 instruction and data caches are each configurable from 8KB to 64KB, and the core adds multi-stream data prefetching and write-stream detection. The L2 cache is configurable from 128KB to 1MB. Write efficiency has been improved, and additional optimizations reduce cache occupancy, which also improves performance when resources are shared.
Another major improvement is the NEON/FP pipeline. Besides faster store performance, the new unit is fully pipelined for double-precision multiplication. The pipeline has also been reworked for better area efficiency, which is an important reason why the A35 core is smaller than the A53.
In power management, the A35 is also very similar to the A53. It adds a state retention feature for the CPU and the NEON pipeline (each in its own power domain), which keeps the core's state intact while it sleeps (similar to the so-called "tombstone" mode of iOS). Dedicated hardware controls the CPU's entry into and exit from the retention state. Manufacturers rarely use this feature. So far it is confirmed only in the Snapdragon 810, and even there it was disabled in a later software update, perhaps because of the chip's overheating problems, so that idle cores are simply powered off.
Low power / highly configurable / ultra-small die area
The A35's power envelope is below 125mW, the usual range for the A7 and A5. To give us a reference point, ARM says a 1GHz A35 core on a 28nm process consumes only 90mW. Of course, as with the A53 in today's various SoCs, actual A35 core power will vary widely with core count, clock speed and process node (for example, moving to a 14/16nm process or pushing the clock to 2GHz).
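ARM's 90mW figure is easier to compare across cores when it is converted to energy per clock cycle. The following is a minimal sketch. It assumes the 90mW is the average power of a single core running flat out at 1GHz, and the function name is hypothetical.

```python
def energy_per_cycle_pj(power_mw: float, freq_ghz: float) -> float:
    """Average energy per clock cycle, in picojoules.

    Joules per cycle = watts / hertz. With power in mW (1e-3 W) and
    frequency in GHz (1e9 Hz) the scale factor is 1e-3 / 1e9 = 1e-12 J,
    so mW / GHz comes out directly in pJ.
    """
    return power_mw / freq_ghz


# ARM's quoted figure: one A35 core at 1GHz on 28nm, about 90mW
# (assumed here to be average power at full load).
a35_pj = energy_per_cycle_pj(90.0, 1.0)
print(f"A35 @ 1GHz, 28nm: {a35_pj:.0f} pJ/cycle")
```

This holds only at a single operating point. At 2GHz the supply voltage has to rise, so the energy per cycle will not stay at 90pJ.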
The A35 is also ARM's most configurable core. Manufacturers can choose not only the number of cores but also whether to include NEON, Crypto (encryption) and ACP (Accelerator Coherency Port) units, and even whether to include an L2 cache at all. This could make the A35 the most widely used core in the next generation of mobile phones. We are also likely to see many A35 chips customized for the IoT (Internet of Things), in wearables and on embedded platforms.
Compared with a quad-core A53 with 32KB of L1 cache, 1MB of L2 cache and the NEON/Crypto units, a stripped-down single-core A35 (8KB of L1 cache, no L2 cache) can stay under 0.4mm² even on a 28nm process (roughly 1mm × 0.4mm, much smaller than a grain of rice), which makes it very well suited to IoT chips.
Performance improvement
At the same core count and clock speed, ARM claims the A35 draws 10% less power and delivers 6-40% more performance than the A7. Integer performance (SPECint2006) improves by 6%, and floating-point performance (SPECfp2000) by 36%.
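The power and performance claims can be combined into a single performance-per-watt figure. This is a back-of-the-envelope sketch, and it assumes the 10% power saving applies equally to every workload in ARM's 6-40% range. ARM does not say that explicitly. The helper name is hypothetical.

```python
def relative_perf_per_watt(perf_ratio: float, power_ratio: float) -> float:
    """Performance per watt of a new core relative to a baseline core.

    Both arguments are new/baseline ratios measured at the same core
    count and clock, e.g. perf_ratio=1.06 means 6% faster.
    """
    return perf_ratio / power_ratio


# ARM's A35-vs-A7 claims: 10% lower power, 6% to 40% higher performance.
POWER_VS_A7 = 0.90
for gain in (0.06, 0.40):
    ppw = relative_perf_per_watt(1.0 + gain, POWER_VS_A7)
    print(f"+{gain:.0%} perf, -10% power -> {ppw:.2f}x perf/W vs A7")
```

Under that assumption, the A35 would offer roughly 1.18x to 1.56x the A7's performance per watt.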
Depending on the workload, the A35 delivers 80-100% of the A53's performance. The gap is largest in the browser test, where the A35 reaches only about 80% of the A53. In the integer test it reaches 84-85%. In cache-sensitive tests, where the improved cache subsystem pays off most, it comes close to the A53.
The A35's core area is only 75% of the A53's, and its power consumption only 68%. ARM says the two can be used together, even in a big.LITTLE SoC that pairs the A35 with the A53.
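The area, power and performance ratios above can be combined into efficiency figures relative to the A53. This is a simple sketch under one stated assumption: the 68% power figure and the 80-100% performance range describe the same operating point, which the slides do not guarantee. The names are illustrative only.

```python
# A35-vs-A53 ratios quoted in the article (A53 = 1.0).
AREA_VS_A53 = 0.75
POWER_VS_A53 = 0.68
PERF_RANGE_VS_A53 = (0.80, 1.00)  # browser worst case .. cache-bound best case


def efficiency(perf: float, power: float, area: float) -> dict[str, float]:
    """Return performance per watt and per unit area, relative to the A53."""
    return {"perf_per_watt": perf / power, "perf_per_area": perf / area}


for perf in PERF_RANGE_VS_A53:
    e = efficiency(perf, POWER_VS_A53, AREA_VS_A53)
    print(f"perf {perf:.0%}: {e['perf_per_watt']:.2f}x perf/W, "
          f"{e['perf_per_area']:.2f}x perf/area vs A53")
```

On these numbers, the A35 lands at roughly 1.18x to 1.47x the A53's performance per watt and 1.07x to 1.33x its performance per unit area.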
Simply raising the A35's clock speed can close the performance gap with the A53. Even in the worst case, where the higher clock eats up its power advantage, manufacturers still get a smaller die, which can save a lot of money.
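To see roughly how much extra clock speed "closing the gap" takes, assume performance scales linearly with frequency. That assumption is optimistic, because memory latency does not shrink as the core clock rises. The 1.5GHz A53 clock below is a hypothetical example and does not come from ARM.

```python
def clock_to_match(a53_clock_ghz: float, a35_rel_perf: float) -> float:
    """Clock an A35 would need to match an A53 running at a53_clock_ghz.

    a35_rel_perf is the A35's performance as a fraction of the A53's at
    equal clocks. Assumes performance scales linearly with frequency.
    """
    return a53_clock_ghz / a35_rel_perf


# Hypothetical 1.5GHz A53; A35 relative performance from the article
# (about 80% in the browser test, 84% in the integer test).
for rel in (0.80, 0.84):
    print(f"A35 at {rel:.0%} of A53 -> needs {clock_to_match(1.5, rel):.2f} GHz")
```

In the worst case of about 80%, the A35 needs a 25% higher clock. The higher voltage that clock requires is exactly what erodes its power advantage.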
If, as ARM's slides suggest, the A53 is no longer needed, whether in standalone multi-core SoCs or as the little cores paired with the A72 and other big cores, the A35 is likely to replace the A53.
ARM expects the first A35 devices to ship at the end of 2016. Given how widely it can be used and adapted, it will undoubtedly become one of the most important cores in ARM's lineup over the next few years.
In other words, having taken over from the A5/A7, the A35 may also send the A53 packing.
However, as a benchmark enthusiast, I am more interested in the performance of the high-performance A72 core. After all, to take on Apple's A9, Qualcomm's Kryo and Samsung's Mongoose all at once, ARM cannot afford to be careless.
via: