A simple explainer: what does computational photography actually compute?

What is your phone doing after you press the shutter?

Mobile camera hardware has hit a ceiling. Once 1-inch sensors are everywhere next year, manufacturers' marketing will probably lean more and more on "algorithms" and "computational photography". Consider this an early marker to help you preview (or review) what "computational photography" covers.

"Computational photography" is not only a black box for users but also a very vague concept. The familiar HDR, night mode, beauty filters and special effects, the "shake-shake" pixel shift, super-resolution and multi-camera fusion, and even the older background blur (bokeh) and ultra-wide edge distortion correction all count as "computational photography".

Google Pixel 7 ↑

Google has a good tradition of sharing its technology on the Google AI Blog, so this article takes Google's perspective to trace the evolution and characteristics of the algorithms people care about most, such as HDR and night mode, and closes with a few words about Apple.

Like leveling up by fighting monsters, the process of running into problems, solving them, and refining and combining techniques often makes you want to shout "brilliant". Let's review the history.

HDR and HDR+

One of the earliest problems digital photography ran into is that the dynamic range of small CMOS sensors is nowhere near enough.

Digital Duo's very intuitive principle description ↑

To let a sensor that can only record a range of a few thousand to one capture a real world whose contrast is tens of thousands to one, engineers came up with multi-frame "exposure bracketing": an underexposed short-exposure frame records the highlights, an overexposed long-exposure frame records the shadows, and the two are merged into a high dynamic range (HDR) image that keeps detail in both. Looking back, it is still a very good approach.
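The bracketing idea can be sketched in a few lines. This is a minimal illustration, assuming linear pixel values normalized to 0..1 and a known exposure ratio; the function name `merge_bracket` and the `clip` threshold are made up for the example and do not describe any vendor's pipeline.

```python
def merge_bracket(short_frame, long_frame, exposure_ratio, clip=0.95):
    """Merge a short and a long exposure into one linear radiance estimate.

    short_frame, long_frame: lists of linear pixel values in [0, 1].
    exposure_ratio: long exposure time / short exposure time (assumed known).
    clip: level above which the long frame is treated as blown out.
    """
    merged = []
    for short_value, long_value in zip(short_frame, long_frame):
        if long_value < clip:
            # The long frame is not clipped, so it is the cleaner sample.
            # Divide by the ratio to bring it onto the short frame's scale.
            merged.append(long_value / exposure_ratio)
        else:
            # The long frame is blown out: fall back to the short frame.
            merged.append(short_value)
    return merged


# A dark, a mid-tone and a very bright pixel; the long exposure is 8x longer.
scene = [0.001, 0.05, 0.9]                      # on the short frame's scale
short = [min(v, 1.0) for v in scene]
long_ = [min(v * 8, 1.0) for v in scene]        # the bright pixel clips at 1.0
hdr = merge_bracket(short, long_, exposure_ratio=8)
```

Real pipelines blend the two frames smoothly and must align them first; the hard switch here only shows which frame contributes where.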

In September 2010, Apple introduced HDR in iOS 4.1 for the iPhone 4 (which had just moved to 5 megapixels and gained a front camera for the first time). Large-scale promotion of HDR, however, had to wait for the iPhone 4s (8 megapixels, 1080p video).

The reigning champion of mobile photography at the time did not get a similar feature until the Lumia 1020 in 2013 (Microsoft deserves half the blame), and its exposure bracketing required importing the photos to a computer and merging them manually.

The next milestone is Google's HDR+. Google recruited Professor Marc Levoy from Stanford, and the result of his research was HDR+, which debuted on the Nexus 5 in 2013 but did not really shine until the Nexus 5X/6P in 2015.

Professor Marc Levoy moved to Adobe as a vice president in 2020. At the time he said he would build a universal camera app, and people thought countless phones with mediocre algorithms were about to be rescued. That app has yet to appear.

Below are the problems HDR and HDR+ ran into, and the clever solutions to them.

Blur and motion ghosting:

HDR+ was originally designed to fix traditional HDR's tendency to produce blurry shots and motion ghosting (the "double image" of moving objects).

Traditional HDR is built from at least two frames, one long and one short, and the long-shutter frame blurs easily when the hand shakes or the subject moves. HDR+ therefore shoots multiple frames with the same short exposure (if the long shutter is unsafe, use short shutters throughout).

HDR+ shoots a burst of 2 to 15 underexposed frames at the same shutter speed, aligns them, picks a base frame, sets aside the frames spoiled by shake for multi-frame noise reduction, and merges everything into the final HDR+ photo.
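The burst steps can be sketched roughly as follows. This is a toy model, not Google's implementation: frames are 1-D lists assumed to be already aligned, the base frame is chosen by a crude sharpness score (an assumption for illustration), and `reject_threshold` is a hypothetical stand-in for ghost rejection.

```python
def sharpness(frame):
    """Sum of absolute neighbour differences: a crude sharpness score."""
    return sum(abs(a - b) for a, b in zip(frame, frame[1:]))


def merge_burst(frames, reject_threshold=0.1):
    """Average a burst of same-exposure frames onto the sharpest one.

    frames: equal-length lists of pixel values, assumed already aligned.
    reject_threshold: a pixel that differs from the base frame by more than
    this is left out of the average, so moving objects do not ghost.
    """
    base = max(frames, key=sharpness)
    merged = []
    for i, base_value in enumerate(base):
        # The base frame always accepts itself, so this list is never empty.
        accepted = [f[i] for f in frames if abs(f[i] - base_value) <= reject_threshold]
        merged.append(sum(accepted) / len(accepted))
    return merged


burst = [
    [0.10, 0.50, 0.12],
    [0.12, 0.52, 0.10],
    [0.11, 0.51, 0.90],   # something moved through the last pixel
]
merged = merge_burst(burst)
```

Averaging the accepted samples reduces noise where the scene is static, while the outlier in the third frame is simply ignored.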

Shutter lag:

Because multiple frames had to be shot before merging, "zero shutter lag" was not possible at first. After tapping the shutter you had to wait for the progress ring to finish before the picture was taken (the same is still true of today's night modes).

Google achieves zero shutter lag (ZSL) by capturing frames before the shutter is pressed. As soon as the camera is opened, the phone starts capturing frames and storing them in a buffer. When the shutter is pressed, the camera sends the most recent 9 or 15 frames to HDR+ or Super Res Zoom (super-resolution) for processing.
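The buffering trick is essentially a ring buffer. Here is a minimal sketch, assuming frames are opaque objects; the class name `ZslBuffer` and its methods are hypothetical and are not part of any camera API.

```python
from collections import deque


class ZslBuffer:
    """Keep only the most recent frames while the viewfinder is running."""

    def __init__(self, capacity=15):
        # A deque with maxlen drops the oldest frame automatically.
        self._frames = deque(maxlen=capacity)

    def on_new_frame(self, frame):
        self._frames.append(frame)

    def on_shutter(self, count=9):
        """Return the last `count` frames captured before the shutter press."""
        return list(self._frames)[-count:]


buffer = ZslBuffer(capacity=15)
for frame_id in range(100):           # the viewfinder has delivered 100 frames
    buffer.on_new_frame(frame_id)
burst = buffer.on_shutter(count=9)    # frames 91..99, all taken before the press
```

Because the burst already exists when the shutter is pressed, the photo corresponds to the moment the user saw, not a moment a second later.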

Processing speed (compute):

Aligning and merging multiple photos takes computing power. This really is "computational" photography.

On the Nexus 6P's Snapdragon 810, without thermal throttling, an HDR+ photo took 2.5 to 4 seconds (depending on how many frames were merged). A well-lit scene took 2.5 seconds in total, including 100 ms to capture, 250 ms to align and 580 ms to merge. A dark scene took 4 seconds in total, including 1 second to capture, 500 ms to align and 1200 ms to merge. Watching the spinner while shooting still made time feel frozen.


On the Pixel series from 2016 onward, even with the Hexagon DSP in Qualcomm's Snapdragon brought in to help, computing power was still nowhere near enough (in fact, it still is not).

For this reason, on the Pixel 2 in 2017, Google and Intel jointly designed a dedicated chip, the Pixel Visual Core, to accelerate HDR+ in hardware and let third-party apps call HDR+. Google claimed it ran HDR+ in 1/5 of the time at 1/10 of the power. Around the same time came Huawei's HiSilicon NPU and Apple's A-series chips with the "Bionic" suffix, followed years later by the vivo V1+ and OPPO MariSilicon X (Mariana X), all of which initially targeted computational photography as well.

White balance:

Beyond the usual demands on a white balance algorithm, tone-mapping a 12-bit or even 14-bit HDR image down to an 8-bit JPEG is prone to artifacts, halos, gradient inversion and loss of local contrast. Some users may remember that on early Google Nexus/Pixel phones, and on third-party phones running Google Camera ports, HDR+ suffered from white balance drift, odd textures in the highlights and similar problems.

Later, the Pixel 4 extended the machine-learning white balance, which the Pixel 3 used only in night mode, to all modes, greatly improving HDR+ white balance.

Shadow noise:

Put bluntly, HDR+ uses "many short shutters" to avoid the blur and motion ghosting that plague the long-exposure frame in traditional HDR. But underexposed short-shutter frames inevitably lose shadow detail. Because every short frame in HDR+ adds its own read noise, the merged result can never beat a single long exposure of the same total duration.
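The read-noise argument can be checked with a simple noise model. This sketch assumes shot noise whose variance equals the signal, plus an independent read noise added once per readout; the electron counts are illustrative, not measurements of any sensor.

```python
import math


def snr(total_signal_e, read_noise_e, frames):
    """SNR of `frames` summed exposures that share one total exposure time.

    total_signal_e: photo-electrons collected over the whole duration.
    read_noise_e: RMS read noise per readout, in electrons.
    Shot-noise variance equals the signal; each readout adds read_noise_e**2.
    """
    noise = math.sqrt(total_signal_e + frames * read_noise_e ** 2)
    return total_signal_e / noise


dark_pixel = 20     # electrons over the full duration (illustrative)
read_noise = 3      # electrons per readout (illustrative)
one_long = snr(dark_pixel, read_noise, frames=1)         # about 3.7
fifteen_short = snr(dark_pixel, read_noise, frames=15)   # about 1.6
```

In bright areas the shot noise dominates and the gap nearly disappears, which is why the penalty shows up mainly in the shadows.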

So in 2021 Google added a long-exposure frame back into HDR+ (full circle, whereas Apple's Smart HDR of 2018 had kept its long-exposure frame all along). The "brilliant" part this time is that Google still buffers short-exposure frames as soon as the camera opens, and captures the new long-exposure frame after the shutter is pressed, so it gets both zero shutter lag and better shadows.


Night Sight and Astrophotography Mode

Night Sight is an algorithm Google introduced on the Pixel series at the end of 2018. In plain terms, it is an automatic night mode that makes you wait a few seconds but lets you shoot a "slow shutter" handheld, the counterpart of the handheld super night mode on Huawei's P20 series from early 2018.

Night Sight is an enhanced HDR+ with slower shutter speeds and positive shutter lag (PSL). In HDR+ most frames are captured before the shutter press, whereas in Night Sight all frames are captured after it (hence the progress indicator that lingers for so long).

HDR and night modes are sensitive to hand shake, so Google naturally gave Night Sight an adaptive policy for shutter speed and frame count.

In 2018, the Pixel 3 began using optical flow to measure scene motion in the default photo mode and then automatically chose an exposure time that lowers the risk of blur. It also used short-exposure frames as the alignment reference, and an algorithm similar to Super Res Zoom to decide whether each pixel should take part in the merge.

If there is not much motion, Night Sight lengthens each frame to at most 1/3 second and shoots 15 frames in a row (5 seconds in total). If the phone is physically stabilized, for example braced against a wall or mounted on a tripod, each frame is lengthened to 1 second and 6 frames are shot (6 seconds in total).
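As a rough sketch, the policy boils down to a small decision function. The two stable cases use the figures quoted above; the function name `night_sight_plan`, its boolean inputs and the 1/15 second value for moving scenes are assumptions for illustration only, since the real policy derives exposure from measured motion.

```python
from fractions import Fraction


def night_sight_plan(on_tripod, motion_detected):
    """Return (seconds per frame, frame count) for one capture."""
    if on_tripod:
        return Fraction(1), 6             # 6 x 1 s = 6 s in total
    if not motion_detected:
        return Fraction(1, 3), 15         # 15 x 1/3 s = 5 s in total
    # Moving scene: use shorter frames. The 1/15 s figure is a placeholder.
    return Fraction(1, 15), 15


per_frame, count = night_sight_plan(on_tripod=False, motion_detected=False)
handheld_total = per_frame * count        # 5 seconds
per_frame, count = night_sight_plan(on_tripod=True, motion_detected=False)
tripod_total = per_frame * count          # 6 seconds
```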


In 2021, when long exposure was brought into HDR+, Night Sight also gained more long-exposure frames. Within the 15-frame handheld limit, three long exposures are interleaved among the first 6 frames, and frames that previously would have been discarded can now be used for noise reduction.

 

Left: thermal noise produced by a long exposure. Right: the result after the algorithm removes it ↑

Astrophotography mode is the extreme version of Night Sight: an exposure of up to 4 minutes made of 15 frames, each no longer than 16 seconds.

In addition, Google uses a neural network trained on 100,000 night-sky images to find the skyline where sky meets land, applies targeted noise reduction or contrast processing to the sky and the landscape separately, and removes the thermal noise caused by the sensor's long exposure.
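The text does not say how the thermal noise is removed. One common approach, shown here purely as an assumption, is to treat an isolated pixel that is far brighter than its neighbours as a hot pixel and replace it with the neighbour average. The function name and `threshold` value are illustrative.

```python
def remove_hot_pixels(image, threshold=0.5):
    """Replace isolated bright outliers with the mean of their neighbours.

    image: 2-D list of floats, at least 2 pixels in one dimension.
    threshold: how far above the neighbour mean a pixel must be to count
    as hot (illustrative value).
    """
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            mean = sum(neighbours) / len(neighbours)
            if image[y][x] - mean > threshold:
                cleaned[y][x] = mean
    return cleaned


sky = [
    [0.1, 0.1, 0.1],
    [0.1, 1.0, 0.1],   # one hot pixel in a dark sky
    [0.1, 0.1, 0.1],
]
cleaned = remove_hot_pixels(sky)
```

A real star also looks like a bright dot, so a production version needs a way to tell the two apart; this sketch does not attempt that.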

 

Super Res Zoom

Also introduced in 2018, Super Res Zoom (super-resolution zoom) is what is popularly called the "shake-shake" (pixel-shift) algorithm. Many manufacturers now use pixel shift combined with AI super-resolution at low magnifications of 1.5x to 3x.

 

Pixel 2 and Pixel 3 Super Res Zoom Effect Comparison ↑

At the time, the industry mainstream was to add a secondary telephoto camera. Stubborn Google used the Pixel 3 to show everyone what "Gurenxi" means, achieving with a single camera a zoom comparable to that of a secondary telephoto camera.

The Bayer color filter array commonly used on CMOS sensors records color with one red, green or blue primary-color filter per pixel (the common RGGB layout, though there are other schemes such as RYYB), in exchange for better light efficiency and pixel density.

A 1-megapixel screen has 3 million sub-pixels, while a 1-megapixel CMOS sensor has only 1 million. In other words, 2/3 of the information in the photos we normally see is interpolated, the result of the demosaicing algorithm guessing and reconstructing colors.
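The 2/3 figure follows directly from counting samples. A minimal sketch, assuming the common RGGB layout; the helper names are made up for the example.

```python
def bayer_channel(y, x):
    """Colour recorded at sensor position (y, x) in an RGGB layout."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"


def measured_fraction(height, width):
    """Fraction of the output RGB values that the sensor actually measures."""
    measured = height * width         # one colour sample per pixel
    needed = height * width * 3       # R, G and B for every output pixel
    return measured / needed


counts = {"R": 0, "G": 0, "B": 0}
for y in range(4):
    for x in range(4):
        counts[bayer_channel(y, x)] += 1
# counts is {"R": 4, "G": 8, "B": 4}, and measured_fraction(4, 4) is 1/3,
# so the other 2/3 of the RGB values have to come from demosaicing.
```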



The idea behind Super Res Zoom is similar to the "pixel-shift multi-shot" technique on DSLRs, which fills in the missing color information at each pixel by shifting the sensor by precisely controlled amounts. It triggers automatically above 1.2x magnification, shooting and aligning multiple frames and skipping the demosaicing step.
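Why shifting helps can be shown on an idealized RGGB grid. This sketch assumes exact whole-pixel shifts, which is the DSLR case; handheld shake gives irregular sub-pixel offsets that the real algorithm has to estimate. The helper names are illustrative.

```python
def bayer_channel(y, x):
    """Colour recorded at sensor position (y, x) in an RGGB layout."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"


def colours_seen(scene_y, scene_x, offsets):
    """Colours measured at one scene point across frames shifted by `offsets`."""
    return {bayer_channel(scene_y + dy, scene_x + dx) for dy, dx in offsets}


one_pixel_shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
full_colour = all(
    colours_seen(y, x, one_pixel_shifts) == {"R", "G", "B"}
    for y in range(4)
    for x in range(4)
)
single_frame = colours_seen(0, 0, [(0, 0)])   # just {"R"}
```

With four shifted frames every scene point is measured through a red, a green and a blue filter, so no color has to be guessed.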


Natural shaking effect ↑

The "brilliant" part is that Google not only uses the OIS module to force a displacement (hold the phone still and you may notice the viewfinder tracing a small ellipse), but also exploits the natural shake of the human hand to obtain the pixel shift.

Pixel-shift ("shake-shake") modes on cameras are generally recommended to stack 8 shots. The stacked result has 4 times the pixels, and up to 16 shots can be stacked. This is the future of digital photography, but current mobile chips are not fast enough for it.

Google published memory and speed figures for Super Res Zoom: on the Adreno 630 GPU (Snapdragon 845), processing a 12-megapixel photo uses 264 MB of memory and takes about 280 ms. It is still heavy on both performance and memory.

 

Bokeh and multi-camera fusion

Picture depth information ↑

Background blur (bokeh) is one of the best examples of Google's "never add another camera when one camera can do the job". The earliest bokeh measured distance through the parallax between two cameras (the triangulation and parallax you learned in middle school). On the Pixel series, Google instead used the green sub-pixels of the dual-pixel autofocus on its long-serving IMX363 (the red and blue pixels are not used) to measure distance, another masterstroke.

 

However, compared with rivals' dual cameras, the physical distance between the sub-pixels is far smaller and the depth information is not accurate enough (even worse in low light), so Google added a stereo algorithm.
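The effect of a small baseline can be estimated with the standard triangulation formula, depth = focal length x baseline / disparity. All numbers below are illustrative assumptions rather than the specifications of any phone.

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Distance in mm to a point, from its shift between the two views."""
    return focal_px * baseline_mm / disparity_px


def disparity_for_depth(focal_px, baseline_mm, depth_mm):
    """Shift in pixels that a point at depth_mm produces between the views."""
    return focal_px * baseline_mm / depth_mm


FOCAL_PX = 3000           # focal length in pixels (illustrative)
SUBJECT_MM = 2000         # subject 2 m away
MATCH_ERROR_PX = 0.25     # assumed error when matching the two views


def depth_error(baseline_mm):
    """Depth error in mm caused by a fixed matching error."""
    true_disparity = disparity_for_depth(FOCAL_PX, baseline_mm, SUBJECT_MM)
    measured = depth_from_disparity(
        FOCAL_PX, baseline_mm, true_disparity - MATCH_ERROR_PX
    )
    return measured - SUBJECT_MM


dual_camera_error = depth_error(baseline_mm=10.0)   # about 34 mm
dual_pixel_error = depth_error(baseline_mm=1.0)     # about 400 mm
```

The same quarter-pixel matching error costs roughly ten times more depth accuracy when the baseline is ten times smaller.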

In practice, HDR+ runs first. A convolutional neural network trained on 1 million photos then picks out the person, and the amount of blur for each pixel is computed by combining the person segmentation with the depth information. The front camera has no dual-pixel autofocus, so it relies on person segmentation alone.
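How the two signals might be combined can be sketched per pixel. The linear blur model, the parameter names and the uniform blur used when no depth is available are all assumptions for illustration; the text does not give the actual formula.

```python
def blur_radius(is_person, depth_mm, focus_depth_mm, strength=0.01, max_radius=12.0):
    """Blur radius in pixels for one pixel of a synthetic-bokeh render.

    is_person: True if the segmentation network marked this pixel as subject.
    depth_mm: estimated depth, or None when no depth map exists
    (for example a front camera without dual-pixel autofocus).
    """
    if is_person:
        return 0.0                    # keep the segmented subject sharp
    if depth_mm is None:
        return max_radius             # mask only: blur the background evenly
    # Blur grows with distance from the focus plane, up to a cap.
    return min(strength * abs(depth_mm - focus_depth_mm), max_radius)


subject = blur_radius(True, 1500, 1500)             # 0.0
near_background = blur_radius(False, 2000, 1500)    # about 5 px
far_background = blur_radius(False, 10000, 1500)    # capped at 12 px
selfie_background = blur_radius(False, None, None)  # 12 px, mask only
```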

To improve the result, Google also played a few tricks back then. In early portrait mode, for example, the front camera zoomed in 1.2x by default and the rear camera 1.5x, so that users would naturally step back and reduce perspective distortion.

Today every manufacturer can produce bokeh with a single ordinary camera, and can even customize the shape of the bokeh highlights. But the weakness of neural networks remains that they "rely on common sense" too much to recognize unusual subjects. In a photo of a person kissing a crocodile, for example, the crocodile may be blurred as background. They also fail easily on repeated textures or texture-free areas, such as an engineer's plaid shirt, a blank wall or a badly overexposed shot.


The typical use of multi-camera fusion is the 3x to 5x zoom range leading up to the telephoto: the telephoto image is overlaid in the middle of the main camera's digital zoom (so image quality changes abruptly between the edge and the center of the frame).
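Stripped of registration and blending, the overlay itself is a centered paste. This sketch assumes both images are already scaled to the same pixel grid and perfectly aligned; `fuse_center` is a hypothetical name.

```python
def fuse_center(wide, tele):
    """Overlay `tele` on the centre of `wide`; both are 2-D lists of pixels."""
    fused = [row[:] for row in wide]
    top = (len(wide) - len(tele)) // 2
    left = (len(wide[0]) - len(tele[0])) // 2
    for y, tele_row in enumerate(tele):
        # Replace the central run of this row with the sharper telephoto row.
        fused[top + y][left:left + len(tele_row)] = tele_row
    return fused


wide = [["w"] * 4 for _ in range(4)]    # upscaled main-camera crop
tele = [["T"] * 2 for _ in range(2)]    # telephoto image covering the centre
fused = fuse_center(wide, tele)
# fused rows: wwww / wTTw / wTTw / wwww
```

With no blending at the seam, the jump in quality between the pasted center and the upscaled border is exactly the abrupt change described above.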

Alternatively, as in the multi-camera fusion promoted on the Honor Magic3 series, the main camera image is overlaid in the middle of the ultra-wide image. If the device has enough computing power, it can even nest the main camera image inside the ultra-wide image and then nest the telephoto image inside that.


Apple

After years of development, the algorithms of different manufacturers have often reached the same destination by different routes: Google's HDR+ later brought back long-exposure frames, and both Google and Apple followed with handheld super night modes similar to Huawei's. Here is a brief look at Apple's two most-used algorithms, Smart HDR and Deep Fusion.

Smart HDR debuted on the iPhone Xs series in 2018. As its name implies, it handles high-contrast scenes.

It has always kept the traditional mix of long and short frames. The algorithm aims to widen dynamic range, and its effect on image quality is not obvious. By default Apple merges 9 frames, while Google merges 2 to 8. Google reduces the number of merged frames for the sake of a natural look, and bright scenes may use as few as 2 frames.

 

Deep Fusion was introduced with the iPhone 11 generation. It is also a 9-in-1 merge, but its core purpose is to improve image quality. It is one of the culprits behind Apple's shift from "true, faithful, elegant" to "fond of garish sharpening".

Deep Fusion has no switch. You can only tell which shots triggered it by using third-party apps such as Metapho. The main camera triggers Deep Fusion in low light. On Apple's weaker telephoto and ultra-wide cameras, it triggers almost by default, day or night. If you dislike the sharpened, plasticky look, you can use apps such as NOMO to avoid it, at the cost of a higher chance of blurry shots.

 

Algorithm suppliers

Finally, a word about algorithm suppliers.

Mobile photography was originally the domain of traditional algorithm suppliers such as ArcSoft (Hongruan). Apart from Apple, which has always developed in-house, and Google, the former leader in computational photography, other manufacturers have all used ArcSoft's algorithms. Megvii (Kuangshi) and SenseTime (Shangtang) are also important third-party suppliers.

Today Huawei, Xiaomi, vivo, OPPO and Honor all have their own algorithms, while Samsung and vivo remain keen users of ArcSoft's. Many manufacturers also use algorithms from the same supplier, or even the very same set (the multi-camera fusion on the Honor Magic3 series and the telephoto on the Xiaomi 11 Ultra both use the same generation of Megvii's algorithm), and some models have shipped with algorithms from several suppliers at once.


Charles Fang
Ordinary geek