Although Samsung has announced that its next-generation flagship phone will use an Exynos 7 series processor, the 5433 is still Samsung's strongest processor among the devices released so far. It is also the first phone processor to use ARM's 64-bit A57 architecture. How does its performance differ from that of the previous generation?
Before analyzing the processor, let's take a look at the Note 4's many derivative models. There are 15 in total:
According to the table above, the S805 version of the Note 4 comes in 9 variants, each with slightly different baseband and RF configurations. The Exynos versions use a variety of basebands: the N910U uses Intel's baseband, the Korean N910S/K/L use Samsung's own baseband, and the remaining N910C and N910H use Ericsson's baseband, although the N910H does not support LTE networks.
Dizzy yet? Most of the Exynos versions currently on sale use Ericsson's baseband. These are the N910C and N910H, which cover South America, Central and Eastern Europe, Africa, the Middle East and some Asian countries. This baseband was announced as long as a year and a half ago, yet it remained obscure until its wide deployment in the Note 4. It is unclear whether Ericsson will stay in the baseband business. More interesting still, the 3G-only Note 4 also uses the M7450 baseband, whereas earlier 3G Note phones used Intel basebands.
In the end, however, these 15 Note 4 models all come down to whether the consumer gets the S805 processor or the Exynos 5433 (the official mainland China version sold through regular channels is, of course, only available with the S805), and the rest differs little (the mainland China version is dual-SIM with a 3000 mAh battery, while the single-SIM versions have a 3220 mAh battery). The tests that follow are all based on the N910C Note 4 running firmware KTU84P N910CXXU1ANJ8.
Now to the main topic: the Exynos 5433. It is the first mobile processor to use ARM's A53/A57 architecture and the T760 GPU. Samsung has not specifically emphasized this, but there is little doubt about it. After the 5433, however, Samsung released the new Exynos 7 series, while the 5433, going by its model number, is still a 5-series product.
When the 5433 first came out, the author doubted whether its 64-bit capability would ever be unlocked. After analyzing the N910C kernel, we found that the kernel treats this processor no differently from A7/A15 processors, and the software stack as a whole is essentially 32-bit. The Exynos version of the Note 4 has since been upgraded to Android 5.0 in some regions, yet even after the upgrade the 5433 still runs in 32-bit mode. Samsung seems to have had its reasons for positioning the 5433 as an Exynos 5 series processor, and the hope that it will gain 64-bit capability in the future is slim. Is this to spare the feelings of consumers who bought the S805 version?
The GPU of the 5433 keeps the 6-core design, so its core count matches the 5420, 5422 and 5430 that came before it. After getting hold of the Galaxy Alpha, we found that the 5430 is paired with memory running at 825MHz, so its memory bandwidth is 13.2GB/s rather than the widely quoted 17GB/s.
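The relationship between memory clock and bandwidth can be checked with a little arithmetic. The sketch below is only illustrative: it assumes a 64-bit double-data-rate memory interface (the bus width is not stated in the text), and the roughly 1066MHz clock used for the 17GB/s figure is likewise an inferred value rather than one quoted here.

```python
def dram_bandwidth_gbs(clock_mhz, bus_bits=64, transfers_per_clock=2):
    """Peak DRAM bandwidth in GB/s.

    bus_bits=64 and transfers_per_clock=2 (DDR) are assumptions,
    not figures taken from the article.
    """
    bytes_per_transfer = bus_bits // 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


# 825MHz is the memory clock reported for the 5430.
print(f"825 MHz  -> {dram_bandwidth_gbs(825):.1f} GB/s")
# 1066 MHz is a hypothetical clock that would give roughly 17GB/s.
print(f"1066 MHz -> {dram_bandwidth_gbs(1066):.1f} GB/s")
```

Under these assumptions the 825MHz clock works out to the 13.2GB/s quoted above.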
On the other hand, the 5433 and 5430 actually differ very little, because they share almost the same auxiliary IP blocks: the same ISP, hardware decoder and encoder, interfaces and bus architecture.
One big difference between the 805 and the 5433 is the display controller. Because the Note 4's screen has reached 1440 × 2560 and must support 32bpp at 60fps, a controller with a single 4-lane MIPI DSI link is not enough, so Qualcomm uses a dual-DSI design, as shown below.
The S805 uses two DSI interfaces with four lanes each, and the screen is driven as two 720 × 2560 halves. Because the Exynos processor has MIC (Mobile Image Compression) technology, it needs only one DSI interface. Bandwidth is saved, but the drawback is that a decompression unit must be added to the display driver for the display module to work properly.
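To get a feel for why a single link struggles, the raw pixel rate of the panel can be computed from the figures given here. This is a rough sketch only: it ignores blanking intervals and protocol overhead, and it does not model MIC, whose compression ratio is not given in the text.

```python
WIDTH, HEIGHT = 1440, 2560   # Note 4 panel resolution
BITS_PER_PIXEL = 32
FPS = 60


def pixel_rate_gbps(width, height, bpp=BITS_PER_PIXEL, fps=FPS):
    """Raw, uncompressed pixel data rate in Gbit/s (no blanking, no overhead)."""
    return width * height * bpp * fps / 1e9


full_panel = pixel_rate_gbps(WIDTH, HEIGHT)
half_panel = pixel_rate_gbps(WIDTH // 2, HEIGHT)   # one 720 x 2560 half

print(f"whole panel over one link: {full_panel:.2f} Gbit/s")
print(f"each half in dual-DSI:     {half_panel:.2f} Gbit/s")
```

Splitting the panel into two 720 × 2560 halves halves the rate each link has to carry, which is the effect the dual-DSI design relies on.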
However, since the Note 4 uses Samsung's own AMOLED display module, how to customize it is Samsung's own business. Bandwidth compression can save up to 150mW of display power, and close to 100mW on average. That is a considerable saving when the screen has to be refreshed frequently. Power saving for static images is left to panel self-refresh technology.
Since the 5430, Samsung has built its processors on a 20nm process. It is hard to say exactly how much power 20nm saves compared with 28nm, because every foundry's process is different. For example, the power consumption of Samsung's 28nm HKMG differs from that of TSMC's 28nm HPM.
The table above shows that after the process improvement, the average power consumption of the A15 core across frequencies fell by 24%, and by 29% at the maximum frequency. The average power consumption of the A7 core fell by as much as 40%, and by 56% at its maximum frequency. Samsung appears to have optimized the core layout and area as well: the core area differs even though the big and small core design is similar.
Looking at Qualcomm, though, the 805 consumes 965mW at its top frequency of 2.7GHz and 57mW at 300MHz, which hints at the efficiency advantage of HPM (these figures, of course, do not include the power consumption of the L2 cache). The voltage the 805 needs to reach its highest frequency is also lower than that of the 5430 and 5433.
Back to 20nm: its biggest advantage over 28nm is the saving in CPU area. At 20nm, the area of the A7 core fell by 45% and the area of the A15 core by a striking 64%, while the area of their respective auxiliary groups fell by about 15%.
On the GPU side, however, the 5430's GPU is larger than the 5420's, because the 5430 greatly increases the cache area. For Samsung, 20nm is only a stepping stone after all, and the 5430 and 5433 are among the few Samsung processors built on it. In 2015, Samsung's focus remains on 14nm FinFET technology. According to information released by Samsung, its high-end 14nm processor went into production as early as mid-November 2014, and in 2015 we should be able to get our hands on phones that use it (perhaps the Exynos 7420 in the Galaxy S6).
In general, Samsung's 20nm process is not as mature as TSMC's 28nm process, even though it is ahead of 28nm on paper.
The A53 is the first step of ARMv8. As the successor to the A7, the A53 has the same ultra-low power consumption, small die area and reasonably adequate performance, but the biggest change is the extension of the A7 architecture to 64 bits. In the low-end Android market, the A7 (and now the A53) is basically the only choice. For high-end Exynos processors, however, the main point of using the A7 and A53 is to balance overall power consumption through the big and small core arrangement.
Like the A7, the A53 uses an in-order execution design, which makes it the odd one out in the ARMv8 camp. Its sibling, the A57, gains high performance at the cost of large die area and power consumption. The goal of the A53 is to maximize in-order performance within a small area and low power budget, which forces ARM to optimize the in-order design as far as possible.
Taking the A53 apart, we find that its execution characteristics are the same as the A7's. The L1 data and instruction caches can each range from 8KB to 64KB, and the L2 cache offers more options, from 128KB to 2MB. The 5433 uses a 512KB L2 cache. The A53 cores are connected to the L2 cache through a wide interface, which lets each core access it in turn.
ARM has also optimized the dual-issue design of the A53 to improve the processor's dual-issue performance. On the A7, slot-0 (the first slot) of the dual-issue pair is fully featured, while slot-1 (the second slot) can only issue branch or integer data-processing instructions. On the A53, slot-1 gains the ability to issue load/store and FP/NEON instructions, which gives slot-1 and slot-0 the same capabilities. In other words, as long as ARM sticks to a dual-issue design, the A53's performance should be limited only by the number of CPU cores.
ARM also adds new conditional and indirect branch predictors to the A53. The former is a 6Kbit Gshare predictor, and the latter has 256 entries and makes use of history. Together they improve the A53's branch prediction, raising the hit rate and reducing the time wasted on mispredictions.
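For readers unfamiliar with the term, a Gshare predictor indexes a table of small saturating counters with the branch address XORed with a register of recent branch outcomes. The sketch below is a toy model of that general technique, not the A53's actual implementation: the table size (1024 two-bit counters), the use of two-bit counters, and the class and function names are all assumptions made for illustration.

```python
class GsharePredictor:
    """Toy gshare predictor: 2-bit saturating counters indexed by
    (branch address XOR global history). Sizes here are hypothetical."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start at "weakly not taken"
        self.history = 0                          # recent outcomes, one bit each

    def _index(self, pc):
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the real outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(predictor, pc, outcomes):
    """Fraction of outcomes predicted correctly for one branch address."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(outcomes)


# A loop branch: taken seven times, then not taken once, repeated.
pattern = ([True] * 7 + [False]) * 200
gshare_accuracy = accuracy(GsharePredictor(), 0x4000, pattern)
print(f"gshare accuracy on the loop pattern: {gshare_accuracy:.3f}")
```

On this pattern a static "always taken" guess is right seven times out of eight, while the history-indexed table can also learn the loop exit once it has warmed up, which is the kind of hit-rate gain described above.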
In terms of power consumption, the A53 also offers optional power modes. The newly added retention module can control the power consumption of each CPU core.
In terms of performance, ARM claims that the A53 has the same performance as the A9 at the same frequency. When considering the performance of the A9 and the core area of the A53, what ARM has done is quite impressive.
But when we looked at the test results for the 5433, it was hard to believe that its A53 cores have a 512KB L2 cache. The latency curve rises earlier and faster than expected, behaving no differently from a 256KB L2 cache design.
With latency covered, let's look at read bandwidth, where the 5433 surprises us again. To be honest, we don't know whether this is due to ARM's improved dual-issue design, some mysterious optimization by Samsung, or both. In read bandwidth, the A53 cores of the 5433 deliver nearly twice the performance of the A7 cores of the 5430. Some things remain a mystery.
Finally, a few more words about the dual-issue bottleneck of the A53. ARM has optimized the dual-issue design on the A53 as far as it can, and the design may well have reached its limit. ARM's latest CPU announcement does not update the A53 but introduces the A72, the successor to the A57. We are still looking forward to a successor to the A53.
As for performance, the 5433-based Galaxy Note 4 has not enabled AArch64 even after the upgrade to Android 5.0. In other words, the Note 4's 5433 runs in 32-bit mode regardless, which means the capabilities of ARMv8 are not fully used, and the benchmarks in this article also use 32-bit software.
Compared with the A7 of the 5430, the A53 scores about 30% higher overall in SPEC CPU2000. Conveniently, the small cores of the 5430 and the 5433 run at the same frequency, which makes the comparison straightforward.
In the score table above, ART shows the biggest improvement. This test uses an image recognition/neural network workload to exercise the CPU's floating-point performance. The VPR and TWOLF tests, whose scores declined, are probably affected by the L2 cache behavior mentioned earlier.
In GeekBench 3, ARMv8 benefits from the new cryptography instructions, which improve AES and SHA performance dramatically. Leaving those two items aside, the overall improvement looks more "normal" but is still impressive, at about 49% on average. We also noticed that the improvement in the single-threaded BZip compression and decompression tests is not as impressive, but it is huge in the multithreaded tests.
In the floating point test, A53 showed a huge improvement again compared with A7, especially in the case of multithreading.
When testing the power consumption of the small cores, we have to switch the big cores off manually so that they cannot wake up during the test and skew the results.
The test results clearly show that the A53 consumes more power than the A7. As the number of threads increases, however, the increase in power consumption per thread falls from 99 to 75 to 66 milliwatts. Overall, the increase in power consumption of the quad-core A53 over the quad-core A7 at the same frequency cannot be ignored. With the current big and small core designs, spreading work evenly across the cores is actually not ideal: it forces every core to run at the same frequency, with no independent power control of its own. For example, when the device has to wake from standby, a task that a single core could handle wakes all four cores at once, and the effect on power consumption cannot be ignored.
With all four cores running at 1.3GHz (the maximum frequency of the small cores in both the 5430 and the 5433), the A53 cores consume 122% more power than the A7 cores. From the frequency/power chart above we can also see that small-core power rises rapidly once the frequency exceeds 1GHz, and that the performance/power ratio of the 5433's small cores is lower than that of the 5430's.
However, looking only at frequency and power consumption is not entirely fair, because on the same process the power consumption of a CPU depends heavily on its area, and the A53 adds many new components on top of the A7. Comparing the 5430 with the 5433, we found that the CPU core area of the latter is 1.75 times that of the former, while the cluster area is 1.38 times.
Although performance per unit of power has fallen, ARM's A53 still deserves respect. The performance improvement is unquestionably a good thing, not only for big and small core designs but also for processors that use only A53 cores. With higher frequencies and more cores, such processors (for example the recently popular MT6752) can rival the performance of high-end flagship phones. The remaining question is whether the A53 can stay competitive as frequencies rise and memory is upgraded.
The A57 is, naturally, ARM's successor to the A15, and it is an evolution and continuation of the A15 design. The ARMv8 instruction set gives the A57 stronger single-thread performance and higher efficiency than the A15 while retaining the A15's strengths. The end result is that, although the A57 is not a great leap, its new features and optimizations are very promising.
The A57 is without doubt an out-of-order execution design. Compared with Nvidia's Denver, the A57 is more "traditional". Put simply, the A57 is designed to improve performance on the foundation of the A15. As for efficiency, ARM hopes the A57 can deliver a 25% to 50% performance improvement at 20% higher power consumption than the A15, which would improve performance per unit of power. The premise, of course, is that performance rises by more than power consumption does.
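The efficiency claim is easy to sanity-check. The snippet below is a simple illustration using only the percentages quoted above; the function name is made up for this example.

```python
def efficiency_gain(perf_gain, power_gain):
    """Relative change in performance per watt, given fractional
    increases in performance and in power consumption."""
    return (1 + perf_gain) / (1 + power_gain) - 1


# ARM's stated target range: 25% to 50% more performance at 20% more power.
for perf in (0.25, 0.50):
    gain = efficiency_gain(perf, 0.20)
    print(f"{perf:.0%} faster at 20% more power: {gain:+.1%} performance per watt")

# If performance rises by less than power does, efficiency goes backwards.
print(f"10% faster at 20% more power: {efficiency_gain(0.10, 0.20):+.1%}")
```

At the low end of ARM's range the efficiency gain is only a few percent, so the outcome depends heavily on where a real implementation lands.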
For the A57, the uncertain factor is the manufacturing process. The A57 can be produced on a 28nm process, but Samsung chose 20nm for the Exynos 5433, whether to optimize further or to show off its technical strength (?). A more advanced process naturally reduces CPU area, which makes it easier to add cores, and in theory it also helps save power. But Samsung may not be the only company to use the A57 in the future, so how the A57 actually performs will depend on each vendor's implementation: as the saying goes, the master opens the door, but the practice is up to the individual. In general, ARM is also trying to raise the IPC (instructions per clock) of its processors. Higher IPC lets manufacturers lower the clock frequency somewhat, which in turn allows voltage and power consumption to be reduced.
Compared with the A15, the A57 has countless small improvements. The L1 instruction cache has grown from the A15's 32KB to 48KB, and its associativity from two-way to three-way. This is naturally a change made for performance, although it is possible that the larger ARMv8 instruction set forced ARM's hand. The branch target buffer, which stores past branches to help predict future ones, has also been enlarged to 2K~4K entries. On the data side, the L1 data cache stays at 32KB.
The L2 cache can be sized by manufacturers according to their needs, from 512KB to 2MB, and is 16-way set associative. Each A57 core has its own interface to the L2 cache, so there is no bandwidth sharing at the interface level.
ARM has also optimized instruction issue and execution on the A57. First, the out-of-order window has grown. ARM has not announced the exact increase, but the core is said to handle 128 instructions in flight. The register file has also been adapted for AArch64: each 4Kbit block can now be organized as 128 32-bit or 64 64-bit entries, which keeps the move to 64 bits from wasting space unnecessarily.
As for the pipeline itself, both the integer and floating-point units gain performance and 64-bit support. Again, ARM has not published specific technical details. As far as we know, however, the integer datapath exists in both 32-bit and 64-bit form. This adds complexity, but it avoids the power cost of converting 32-bit data that a 64-bit-only datapath would incur.
The floating point/NEON unit has also been widened from 64 bits to 128 bits and can deliver double the NEON performance, provided the FP unit can be kept fully fed. ARM has not, however, given it multiple datapaths, and says it has optimized elsewhere to keep overall power consumption in check. The A57 also supports an optional cryptography acceleration unit, which improves AES and SHA1/SHA2-256 performance.
Finally, thanks to the improved load/store unit, loads in the A57 can bypass ambiguous stores, which can yield a 5% performance improvement. A dedicated predictor further helps out-of-order performance by keeping the A57 from speculating too aggressively when it predicts around stores.
The 5430 and 5433 show no difference in L2 cache latency, but the 5433's bandwidth is more consistent than the 5430's. As for memory latency, Samsung may have set this deliberately, and the bandwidth improvement probably comes from the A57's own improvements over the A15.
ARM has always had a reputation for memory performance, but charts may not reflect real-world experience, and the communication between the SoC and the memory controller cannot be ignored. ARM's approach differs from that of Apple and Nvidia in that it uses separate read and write data interfaces. At the cluster level, ARM uses two 128-bit interfaces (one for reads, one for writes) to reach the SoC's memory controller through the CCI (cache coherent interconnect). On the 5430 and 5433 the CCI runs at half the DRAM frequency, that is 412.5MHz, which works out to a physical bandwidth of 6.6GB/s in each direction.
To be fair, today's test method does not favor ARM, because the measured bandwidth covers only one direction, and the true maximum can be up to twice the measured value. Indeed, when read and write tests run at the same time (multithreaded across two CPUs, of course), bandwidth reaches the memory controller's theoretical peak of 13.2GB/s. More interestingly, ARM seems to use the same arrangement for the L2 cache, so L2 bandwidth reaches 25GB/s on the 5430's A15 and 27.5GB/s on the 5433's A57.
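The interconnect figures quoted here follow directly from the port width and clock. As a minimal sketch (the function name is made up for illustration, and one transfer per clock per port is assumed):

```python
def port_bandwidth_gbs(width_bits, clock_mhz):
    """Peak bandwidth of one interface in GB/s, assuming one
    transfer of width_bits per clock cycle."""
    return width_bits / 8 * clock_mhz * 1e6 / 1e9


CCI_CLOCK_MHZ = 412.5   # half the DRAM clock on the 5430 and 5433
PORT_WIDTH_BITS = 128   # one read port and one write port per cluster

one_way = port_bandwidth_gbs(PORT_WIDTH_BITS, CCI_CLOCK_MHZ)
read_plus_write = 2 * one_way

print(f"one direction:       {one_way:.1f} GB/s")
print(f"read + write at once: {read_plus_write:.1f} GB/s")
```

A read-only or write-only test can therefore see at most half of what the two ports can move together, which matches the one-way and combined figures above.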
As for why ARM designed it this way, we don't know. Perhaps it is for power and latency reasons, but in any case benchmark figures can never fully represent real-world experience. The A57 looks like the natural next step from the A15. Even the pipeline is not much different, though AArch64 helps keep it better fed. ARM and its partners may also want to use 20nm or even 16/14nm processes to squeeze more performance out of the A57. For now, however, the power consumption of the A15 is already considerable, and whether trading even higher power for performance pays off with the A57 remains to be tested.
Looking at the SPEC scores, it is easy to see that the A57's improvement over the A15 is far smaller than the A53's over the A7: about 25% on average. Allowing for the difference in maximum clock frequency (1.8GHz for the A15 in the 5430, 1.9GHz for the A57 in the 5433), the improvement at the same frequency may be only 18%.
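The clock normalization behind the SPEC figure can be reproduced in a couple of lines. This sketch assumes performance scales linearly with clock frequency, which is a simplification, and the function name is invented for the example.

```python
def same_clock_gain(raw_gain, old_ghz, new_ghz):
    """Performance gain after dividing out the clock-speed difference,
    assuming performance scales linearly with frequency."""
    return (1 + raw_gain) / (new_ghz / old_ghz) - 1


# SPEC: about 25% faster overall, A15 at 1.8GHz versus A57 at 1.9GHz.
spec_gain = same_clock_gain(0.25, 1.8, 1.9)
print(f"per-clock improvement: {spec_gain:.1%}")
```

With the SPEC numbers this gives a per-clock improvement of roughly 18%.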
GeekBench's integer scores tell the same story. Ignoring the cryptography results, the overall improvement is 31%, and the improvement at the same frequency is 29%.
The floating point part increased by about 21% overall.
For the power measurements, because the 5433 clocks higher, it is tested twice: once at 1.9GHz and once at 1.8GHz.
The difference in power consumption between the two is striking. The 5430's relatively low power consumption is very impressive. The A15 implementations we tested in the past (the 5410 and the Kirin 920) drew well over 1.5W per core. The 5430's use of the r3 A15 on 20nm seems to have cut the A15 core's power consumption greatly, to the point of matching Qualcomm's Krait. Samsung's work on the A15 over the past year was not in vain: the efficiency of the 5430 is as much as twice that of the 5420.
Today's protagonist, however, is the A57, and its power consumption is rather unattractive. Per core, the A57 draws as much as twice the power of the A15, which is quite worrying. With all four cores running at 1.9GHz, CPU power consumption even exceeds 7W, and the resulting heat naturally leads to throttling.
As for the chip area, the large core of 5433 is 4% larger than that of 5430.
In contrast to the small cores, however, the performance/power ratio of the 5433's big cores is higher than that of the 5430's, which seems counterintuitive. The difference is that this test uses BaseMark OS II's XML test, which lets each core manage its power and performance freely, in a more realistic way.
ARM claims that the A53 and A57 bring better and more mature power and frequency management. For now we can only compare these two Samsung devices, but in the tests above the A57 really does beat the A15 in efficiency.
When we run the performance/power test on the 5430 and 5433 as whole SoCs, the picture is a little different, and the 5433's showing is somewhat disappointing. With the A53 and A57 both in use, the result is worse than with the A53 alone, and even worse than with the A57 alone. This is probably because the workload switches frequently between the big and small cores.
We are still not sure why the 5433 behaves this way. Perhaps the benchmark itself, or its data, puts a heavy load on the CCI; perhaps the CCI-500 that ARM announced some time ago can solve the problem. Our results cannot represent everything, but we have also seen no sign that real-world use departs from the test results.
In addition, anyone familiar with Samsung devices knows that web performance suffers in Chrome on them, so we use Samsung's own browser, with its special optimizations, for our web tests. In the efficiency results, the 5430 beats the 5433 again. The margin is only 4%, but an advantage is an advantage. It is understandable, though: the 5430 is Samsung's sixth-generation processor using the A15, whereas the A57 is only in its second generation in Samsung's hands (counting their GH7 server processor). We hope Samsung can optimize it in future designs, or simply make a major update and replace the A57 with the recently announced A72.
In the phone lineup, the Exynos 5433 Note 4 is definitely the first "phone" of the new generation. In the current Android phone market, the A53 and A57 are the strongest processor architectures and hold a considerable performance advantage over Qualcomm's 805. If Samsung enables AArch64 on the Note 4's 5433, its CPU performance advantage will be even clearer. But the 5433's weakness is just as prominent: its power consumption is quite high at full performance. Performance has improved, but efficiency is hard to guarantee. ARM says it will improve on this in the future, so we had better wait and see.