[Computational Photography Explained] An algorithm engineer answers 14 common questions about mobile photography

Faced with the "algorithm black box" of smartphone makers' computational photography, users inevitably have all kinds of questions. For example: do manufacturers really weaken old flagships' cameras via OTA? Is the MediaTek platform really harder to tune? Can a phone's resin lens really feed a 100-megapixel CMOS? Does AI often have to "imagine" missing details? And so on.

This time, in Q&A form, we have invited Hawk Wang, a real computational photography engineer, to answer these well-worn questions.

Hawk Wang works at a top-3 imaging algorithm supplier, and he runs a hardcore WeChat official account where he shares articles on computational photography algorithms. You are welcome to follow it.

PS: Well-known image algorithm suppliers include ArcSoft (market cap 10 billion, 2021 revenue 570 million, R&D spend 270 million, 444 R&D staff), SenseTime (market cap 30 billion, ~2,000 staff), Megvii (market cap 20 billion, ~1,400 staff in total), Morpho (Japan, R&D headcount unknown, market cap 760 million), and Corephotonics (Israel, market cap 250 million, ~50 R&D staff).

If you are interested in this topic, you can also look back at our earlier articles on computational photography.

Platform and OTA

Aigoji reader: How hard is it to adapt camera products across Qualcomm, MediaTek, and Samsung platforms? Is the MediaTek platform really harder to tune? MediaTek's flagship ISP specs look far stronger than Qualcomm's, yet the common verdict is still that MediaTek phones can't take good photos. (Is the platform's stock algorithm too weak? Interface problems? Or is the integration between algorithm suppliers and MediaTek not deep enough?)

Hawk Wang: In fact, there is not much difference in adaptation difficulty between platforms. When tuning is hard, it is usually down to a lack of experience with the platform and a shortage of relevant talent. The influence of compute cannot be ruled out either: if the pipeline contains software image-quality algorithms, a platform with more compute can naturally run stronger algorithms and get better results (for example, fusing more frames, or more accurate inter-frame alignment).
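To make the "more frames, better alignment" point concrete, here is a minimal Python/OpenCV sketch of burst fusion with global alignment. It is purely illustrative, not any vendor's pipeline; real implementations use tile-based alignment, ghost rejection, and RAW data.

```python
# Illustrative only: bare-bones burst fusion with global alignment.
import cv2
import numpy as np

def fuse_frames(frames):
    """Align each BGR frame to the first, then average to reduce noise."""
    ref_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY).astype(np.float32)
    acc = frames[0].astype(np.float32)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        # Estimate a global translation between frames (sign convention
        # may need flipping depending on OpenCV version; verify on data).
        (dx, dy), _ = cv2.phaseCorrelate(ref_gray, gray)
        m = np.float32([[1, 0, -dx], [0, 1, -dy]])
        acc += cv2.warpAffine(frame, m,
                              (frame.shape[1], frame.shape[0])).astype(np.float32)
    # Averaging N aligned frames cuts random noise by roughly sqrt(N),
    # which is why more compute (more frames) buys cleaner output.
    return (acc / len(frames)).clip(0, 255).astype(np.uint8)
```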

 

Aigoji reader: Phones on the same platform often share a common imaging style (for example, Dimensity 9000 phones from different manufacturers all show problems such as overly heavy HDR darkening and unstable white balance). What might cause this? Is it because the base algorithm comes from the same supplier? Or from MediaTek's own platform algorithms or parameters?

Hawk Wang: In general, algorithms from different vendors each have their own imaging style and their own problems. And if the base algorithm depends heavily on the platform, the result will also be tied to the platform's own base algorithms and parameters.

 

Aigoji reader: Can phone manufacturers adjust the parameters of the algorithms a supplier provides? Are there heavily modified versions?

Hawk Wang: They can fine-tune them, but they are not always willing to. Customized versions for specific needs and projects are possible, so there is no wild "magic modification" going on.

 

Aigoji reader: Cameras still receive OTA updates after launch. Is that the algorithm supplier still maintaining the project, or the manufacturer using in-house algorithms, which lets it keep updating?

Hawk Wang: Continuous OTA updates are mainly driven by market feedback. It is the same whether the algorithm comes from a supplier or is developed in-house; meeting user needs is the key.

 

Aigoji reader: What kinds of improvements to photo quality do these OTAs actually make?

Hawk Wang: The improvement points vary. Some respond to market feedback that calls for re-tuning; others fix old problems that had not been solved before launch.

 

Aigoji reader: After a new flagship is released, the old flagship gets weakened by OTA. Is that true? Or are those just bugs in later tuning?

Hawk Wang: In general, it is unlikely that a manufacturer would use OTA to weaken the old flagship after the new one launches; that impression is largely psychological on consumers' part. In most cases, once the new flagship ships, the manufacturer's developers simply have no bandwidth left for the old projects.

 

Aigoji reader: What is the main reason Android video has never managed to catch up with Apple? Is Apple's ISP/NPU too strong? Or does Apple have some alien-grade secret algorithm?

Hawk Wang: There are many reasons. On one hand, it is Apple's ability to integrate the entire imaging chain and control both hardware and software. In the Android camp, some paper specs are not bad, but the integration capability is weaker.

On the other hand, over the past few years the Android camp focused more on still photography; its investment in video came a bit later than Apple's, and in stills there were many areas where it led Apple.

 

Algorithm and AI details

Aigoji reader: When shooting irregular textures such as wood grain or fabric weave in low light, every shot's texture comes out different. Is that because AI is used to fill in / hallucinate the missing detail? Or is it just that heavy noise makes ordinary algorithms process the texture differently each time?

Hawk Wang: Several situations are possible (a toy demonstration follows this list):

1. Heavy noise: denoising smears the texture, so the results come out inconsistent.

2. In many cases the algorithm's output is highly correlated with its input, so even a slight change in the input produces a different result.

3. AI can indeed hallucinate detail, but every manufacturer is still fairly restrained about AI-generated texture; after all, too much distortion is likely to put consumers off.
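As a toy demonstration of points 1 and 2, the sketch below (illustrative Python/OpenCV, not a production denoiser) runs the same faint texture through an ordinary bilateral filter under two independent noise draws; the fine detail that survives differs between the two "shots".

```python
# Toy demo: same faint texture, two independent noise samples,
# one ordinary denoiser -> visibly different surviving detail.
import cv2
import numpy as np

texture = (np.random.default_rng(0).random((128, 128)) * 40 + 100
           ).astype(np.float32)  # stand-in for faint wood grain

def shoot_and_denoise(img, seed):
    noise = np.random.default_rng(seed).normal(0, 25, img.shape)
    noisy = np.clip(img + noise, 0, 255).astype(np.uint8)
    # Heavy bilateral filtering smears low-contrast texture.
    return cv2.bilateralFilter(noisy, d=9, sigmaColor=75, sigmaSpace=75)

a = shoot_and_denoise(texture, seed=1)
b = shoot_and_denoise(texture, seed=2)
print("mean abs difference:", np.abs(a.astype(int) - b.astype(int)).mean())
```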

 

Aigoji reader: Press conferences keep showing semantic recognition that optimizes different parts of a picture separately. How many recognition targets and treatments are typically supported?

Hawk Wang: Scene recognition covers many categories, generally 10 to 30, but segmentation-based regional optimization is much rarer. Although the algorithms can handle many categories, only a few are genuinely useful: special processing is typically applied to portraits, faces, skin, sky, green plants, and especially the sun and moon; beyond those, the utilization rate is quite low.
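For a sense of what segmentation-driven regional processing looks like, here is a minimal sketch. The sky-mask input is hypothetical (real pipelines get it from an NPU-side segmentation model), and the tone curves are arbitrary placeholders.

```python
# Minimal sketch of per-region processing driven by a segmentation mask.
# sky_mask is a hypothetical HxW float mask (0.0 = not sky, 1.0 = sky).
import cv2
import numpy as np

def enhance_by_region(bgr, sky_mask):
    img = bgr.astype(np.float32) / 255.0
    sky = np.power(img, 0.8)    # lift/brighten the sky
    rest = np.power(img, 0.95)  # near-neutral curve elsewhere
    mask3 = cv2.merge([sky_mask.astype(np.float32)] * 3)
    # Feather the mask so the region boundary does not show a hard seam.
    mask3 = cv2.GaussianBlur(mask3, (31, 31), 0)
    out = sky * mask3 + rest * (1.0 - mask3)
    return (out * 255).clip(0, 255).astype(np.uint8)
```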

 

Aigoji reader: Samsung and LG cooperated with algorithm suppliers for many years; the 2015 Galaxy S6 and LG G4 both used ArcSoft algorithms, right? Why do phones of the same generation, with comparable hardware and algorithms, even algorithms from the same supplier, still show such obvious strengths and weaknesses?

Hawk Wang: An algorithm's effect depends on many things. For example, the hardware level determines what algorithm configuration, parameters, and inputs are even possible; it is systems engineering. Even the same supplier's algorithm will produce different-quality output when fed different-quality input. (Aigoji reader: LG Mobile's coffin lid can barely stay down.)

 

Aigoji reader: Is there any essential difference between the night modes of the Huawei P20 Pro and Google Night Sight back in 2018 and the earlier HDR+ style multi-frame short-shutter synthesis? Didn't similar algorithms already exist? Why could a usable shot only be guaranteed up to about 1/4 second before, while now much longer handheld exposures work?

Hawk Wang: Compared with older multi-frame synthesis, these super night modes differ mainly in two ways (a toy RAW-merge sketch follows this list):

1. They integrate the accumulated noise-reduction and HDR algorithm capabilities.

2. The algorithms moved from the YUV and RGB domains into the RAW domain. Conventional ISPs at the time had no data path for this at all, so manufacturers needed strong system capabilities; Google and Huawei did indeed start this work earlier.
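To illustrate point 2, here is a toy RAW-domain merge in Python/numpy: frames are averaged as linear Bayer data before any demosaicing, which is what the move away from the YUV/RGB domain enables. Alignment, ghost rejection, and tone mapping are omitted, and the black/white levels are illustrative.

```python
# Toy RAW-domain merge: average linear Bayer mosaics before demosaicing.
import numpy as np

def merge_raw(raw_frames, black_level=64, white_level=1023):
    """raw_frames: list of HxW uint16 Bayer mosaics from one burst."""
    stack = np.stack([f.astype(np.float32) for f in raw_frames])
    merged = stack.mean(axis=0)  # noise averaging on linear sensor data
    # Normalize to [0, 1] linear light; demosaic and the rest of the
    # ISP chain would then run on this cleaner input.
    return np.clip((merged - black_level) / (white_level - black_level), 0, 1)
```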

 

Aigoji reader: Huawei's Mate 40 Pro in 2020, Xiaomi's Mi 11 Ultra and vivo's X70 Pro+ in 2021, and OPPO's Find X5 Pro this year have all made significant progress in highlight suppression, with similar characteristics. Is it the same supplier's algorithm? Or one problem the whole industry finally cracked? What stopped it from being done earlier?

Hawk Wang: It is not all the same supplier's algorithm; there are third-party suppliers as well as manufacturers' own in-house work. It is a joint effort, and not a case of one algorithm suddenly being conquered by the industry: all of these improvements are incremental. They also have a lot to do with the steady growth of computing power. After all, speed and power consumption are important metrics too; as compute improves, more complex algorithms can be used to attack the problem.

 

Lenses and high-pixel shooting

Aigoji reader: Can the optical resolution of a phone lens feed a 100MP or even 200MP CMOS? The high-pixel modes of the old Lumia 1020 and of early IMX586 phones looked hazy. Was the optical resolution insufficient? Or was there simply not enough compute for the usual processing?

Hawk Wang: In theory, the optics of some flagship phone lenses can just about feed a 100-megapixel CMOS, but manufacturing tolerances and other factors make it difficult in practice. So insufficient optical resolution is indeed one of the reasons high-pixel modes look hazy.
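A back-of-the-envelope calculation shows why "can just about feed, hard in practice" is the right summary. The numbers below are illustrative assumptions (f/1.8 aperture, green light, the ~0.8µm pixel pitch typical of 108MP sensors):

```python
# Diffraction-limited spot size vs. pixel pitch (illustrative numbers).
wavelength_um = 0.55   # green light
f_number = 1.8         # typical flagship main-camera aperture
pixel_pitch_um = 0.8   # typical 108MP sensor pixel

airy_um = 2.44 * wavelength_um * f_number  # Airy disk diameter
print(f"Airy disk: {airy_um:.2f} um")                     # ~2.4 um
print(f"Pixels spanned: {airy_um / pixel_pitch_um:.1f}")  # ~3 pixels
# Even a perfect lens spreads a point over ~3 pixels here, and real
# manufacturing tolerances push resolving power further below the limit.
```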

 

Aigoji reader: Can the high-pixel mode of today's Quad Bayer sensors reliably beat the pixel-shift style ("shake-shake") results of low-pixel mode? Today's phone high-pixel modes are mostly interpolated from the low-pixel readout. Is that done to keep capture speed acceptable? Or is the lens's optical quality simply not good enough, so Quad Bayer's high-pixel mode gains too little?

Hawk Wang: Quad Bayer's high-pixel mode may not reliably beat the pixel-shift style results of low-pixel mode. And high-pixel output being mostly interpolated from the low-pixel readout is determined by the design of the sensor's pixel array itself.
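That design constraint can be seen in a toy sketch: in a Quad Bayer array each 2x2 cell shares one color filter, so binning the cell is the native readout, while full-resolution color output has to be remosaiced, i.e. interpolated. A minimal binning sketch, assuming an even-dimensioned HxW mosaic:

```python
# Native Quad Bayer low-pixel readout: average each 2x2 same-color cell,
# yielding a standard Bayer mosaic at quarter resolution. Full-resolution
# "high pixel" output must instead interpolate (remosaic) missing colors.
import numpy as np

def bin_quad_bayer(raw):
    h, w = raw.shape  # assumes even H and W
    return raw.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```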


Sensors and memory usage

Aigoji reader: The 2018 Samsung Galaxy S9 already stacked twelve frames into one, yet even now phones still merge only a dozen or so. Is stacking more frames pointless, or is there a bottleneck? (CMOS readout speed? Memory size? Chip compute? The algorithm models?)

Hawk Wang: After the IMX345/S5K2L3 (the 2018 Samsung Galaxy S9), sensor iteration went in a different direction, and DRAM-stacked high-speed sensors were not pushed further. That kind of high frame rate is expensive, and its cost-effectiveness is too low for mobile devices.

 

Aigoji reader: Many flagships grab 1-2GB of memory the moment the camera opens. For a common 12MP main camera, how much memory does producing one night-mode shot take (is that the memory hog)?

Hawk Wang: Night mode is a memory-hungry algorithm, but not absolutely so. Manufacturers set strict red lines on algorithm memory so that it cannot noticeably affect normal use. Also, there is no single way to count it: the memory is released once the photo is done and is not held for long, so it is hard to give a definite answer.
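For a rough feel of the numbers, here is an illustrative estimate for a 12MP burst. All figures are assumptions for the sake of arithmetic; actual allocations depend entirely on the vendor's pipeline and buffer reuse, as the answer above notes.

```python
# Rough, illustrative memory arithmetic for a 12MP night-mode burst.
mp = 12_000_000          # pixels per frame
raw_bytes_per_px = 2     # 10/12-bit RAW stored in 16-bit containers
frames = 8               # a plausible burst length

raw_buffers = mp * raw_bytes_per_px * frames  # the input burst
working = mp * 4 * 2                          # e.g. two float32 planes
print(f"~{(raw_buffers + working) / 2**20:.0f} MB")  # a few hundred MB
```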

 

Aigoji reader: Photo processing eating so much memory that background apps get killed has spread from the iPhone to Android. Has there been any improvement in recent years, or a direction for improvement?

Hawk Wang: Phone memory is very large now, and algorithms generally run under fairly strict memory limits; the algorithm itself usually occupies only a small amount of memory.

 

Algorithm direction

Aigoji reader: Where is the bottleneck of phone photography now? A few years ago chip compute was weak, and you could predict the ceiling from the ISP alone. Is the compute of mobile ISPs and NPUs still the bottleneck for photography algorithms (hence the add-on in-house ISPs?)? Is the bottleneck ISP/NPU performance and power consumption? Or algorithm licensing cost and R&D cycles?

Hawk Wang: For still-photography algorithms, compute is less of a bottleneck; it is more about balancing performance, power consumption, and effect within the product's positioning. For video algorithms, compute is still the absolute bottleneck.

As for where phone photography goes next, the bottleneck lies in the application direction itself: whether new high-value application scenarios will emerge.

 

Aigoji reader: Marc Levoy, who moved from Google to Adobe, said that the frame-stacking approach to computational photography has matured and it is time to look for new challenges. Is multi-frame synthesis, or computational photography in general, in its second half now? Is there an industry-recognized new breakthrough or direction?

Hawk Wang: I think there is still a great deal of potential and possibility. We are nowhere near the second half, or even its start; at best this is the end of the first half.

It is hard to name an industry-recognized new breakthrough or direction, but I am optimistic about imaging technologies that incorporate 3D information. In the past we simply fused multiple image frames, doing denoising, HDR and so on to get good image quality; that is a low-level image-processing track within computer vision.

Later, semantic information was added, such as segmentation of portraits and the sky, so that different areas of the image can be processed differently. But it is still rare to bring 3D information, such as the distance of the subject, into the imaging and image-processing pipeline. Yet in games we know that rendering a beautiful picture usually requires knowing each pixel's color, brightness, depth, normal vector, material, and so on.

If we can bring the depth, normal, and material information that computer graphics emphasizes into the image-quality enhancement process, we will surely get better images.
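As one speculative sketch of what that direction might look like, the toy function below varies processing strength with a depth map (a hypothetical input, e.g. from ToF or stereo). This is an illustration of the idea, not any shipping algorithm:

```python
# Speculative toy example: depth-dependent contrast enhancement.
# depth is a hypothetical HxW map in [0, 1], where 1 = far away.
import numpy as np

def depth_weighted_contrast(img, depth):
    """img: HxW float in [0, 1]. Distant regions lose contrast to haze
    and scattering, so boost them more than nearby regions."""
    mean = img.mean()
    gain = 1.0 + 0.5 * depth  # stronger contrast boost with distance
    return np.clip(mean + (img - mean) * gain, 0.0, 1.0)
```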

I will watch this space, and I believe I will take part in it myself before long.


Follow our Weibo: @Aigoji

Follow our WeChat official account: playphone

And of course, our Bilibili account: Aigoji

Author: Charles Fang
