How does the smartest smart speaker perform? Xiaodu Smart Speaker Review | Ainongji

Smart speakers are relatively cheap, and because they are speakers first and smart products second, the mass market accepts them more readily than other smart home products. That is why the category is so popular. Today's smart speakers fall into two groups: those with a screen and those without. A screen opens up more use cases, but it comes at the expense of the speaker itself and pushes up the price. Screenless speakers are the mainstream of the current market. First, they are cheaper and an easier purchase for consumers. Second, their functions are simple and practical, and they put more emphasis on sound.

When it comes to screenless smart speakers, Tmall, Xiaomi and Baidu have to be mentioned. The first two launched the Tmall Genie and the Xiaoai mini speaker at very low prices, hoping to grab an early share of the screenless market. Baidu recently launched the Xiaodu smart speaker, pitched as smarter, cheaper and better value, at a price of only 89 yuan. It is the subject of today's review.

Appearance

The Xiaodu smart speaker comes boxed. Inside the package are the speaker, a power adapter and a manual. The packaging and accessories are fairly simple.

The Xiaodu smart speaker is cylindrical, with a three-section design. The outer ring of the top is a mesh speaker opening. Four physical buttons are spaced evenly on the top in a cross-shaped layout, and a hidden "bear's paw" LED in the center lights up when the speaker is pairing or has been woken. The middle section is matte plastic, and the lower ring is another mesh speaker opening. The power port is on the bottom of the unit.

Most smart speakers on the market use white as their main color. White goes with almost anything, blends into most settings and is easy on the eye. Even if it wins no points, it never loses any. The Xiaodu smart speaker combines white and gray, which looks comfortable overall and fits into most settings just as well.

Specifications

CPU: ARM Cortex A53 Quad Core

System: DuerOS

Speaker: 1.75-inch full-range NdFeB internal-magnet driver

Speaker frequency response range: 80 Hz-14 kHz (-6 dB)

Speaker sensitivity: 80dB/m/W

Rated impedance: 6 Ω

Maximum output power: > 5 W

Size: φ90mm x 102.4mm

Weight: about 280g

Wi-Fi: 802.11 b/g/n

Bluetooth: Bluetooth V4.2

Power adapter: 12V/1A
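
As a rough reading of the sensitivity and power figures above, the usual rule of thumb (sound pressure level rises by 10·log10 of the input power and falls by 20·log10 of the distance) gives an estimate of how loud the speaker can get. This is only a sketch under stated assumptions: the 80 dB rating is taken as measured at 1 m with 1 W input, the driver is assumed to receive exactly 5 W (the list only says more than 5 W), and free-field conditions are assumed. The function name is illustrative and does not come from the manufacturer.

```python
import math


def estimated_spl_db(sensitivity_db: float, power_w: float, distance_m: float = 1.0) -> float:
    """Estimate sound pressure level in dB from sensitivity (dB at 1 m, 1 W).

    Assumes free-field propagation and no power compression.
    """
    power_gain = 10 * math.log10(power_w)        # more input power, louder
    distance_loss = 20 * math.log10(distance_m)  # inverse-square falloff
    return sensitivity_db + power_gain - distance_loss


# Figures from the spec list: 80 dB/m/W and 5 W (a lower bound).
print(round(estimated_spl_db(80, 5), 1))     # 87.0 dB at 1 m
print(round(estimated_spl_db(80, 5, 3), 1))  # 77.4 dB at 3 m
```

In a real room, reflections and the enclosure design will move these numbers, so they are best read as an order-of-magnitude check rather than a measurement.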

Intelligent Voice Assistant

Thanks to the DuerOS open platform, the Xiaodu smart speaker supports 400+ skills that cover most of users' everyday scenarios, including but not limited to encyclopedia lookups, weather, stocks, alarms, timers, exchange rates, calculations, history, unit conversion, geography and culture, idioms and poetry, open courses and talk shows. On the content side, the Xiaodu smart speaker also works with many mainstream content providers. Baidu's own products, such as Baidu Baijia, Baidu Knows and Baidu Music, are naturally included. Many third-party sources are supported as well, such as Keep, Kaola FM, Baby Bus, Dolphin Sleep, Financial Breakfast, Haochu, Sina News and Fruit Shell.

As a smart home hub, the Xiaodu smart speaker supports LifeSmart, Haier U+, BroadLink, Doodle Intelligence, Xiaocong Intelligence, OREBO and others, and more products will be added to the support list in the future.

Xiaodu speaker app

On-demand list in the app

Conversation history in the app

Custom commands

Before using the Xiaodu smart speaker, users need to download the Xiaodu speaker app, connect the speaker to the network and bind it. The app itself does not do much, since almost everything the speaker offers can be done by voice. The app lists content that can be played on demand and keeps a record of the conversations between the user and the speaker. The user can also define custom commands, specifying what Xiaodu should reply to a given question.

Custom commands work like simple programming. When the Xiaodu smart speaker hears a matching phrase, it gives the preset reply. The reply can be a fixed sentence or an action, such as playing a particular piece of music.
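
The preset replies described above behave much like a lookup table from trigger phrases to responses. The following is a minimal sketch of that idea, not the app's actual implementation: the class name, the `normalize` helper, the example phrases and the `play_lullaby` action are all hypothetical.

```python
from typing import Callable, Dict, Optional, Union

# A reply is either a fixed sentence or a zero-argument action returning text.
Reply = Union[str, Callable[[], str]]


def normalize(utterance: str) -> str:
    """Lowercase and drop punctuation so matching tolerates small differences."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


class CustomCommands:
    """Maps user-defined trigger phrases to preset replies or actions."""

    def __init__(self) -> None:
        self._rules: Dict[str, Reply] = {}

    def register(self, trigger: str, reply: Reply) -> None:
        self._rules[normalize(trigger)] = reply

    def handle(self, utterance: str) -> Optional[str]:
        reply = self._rules.get(normalize(utterance))
        if reply is None:
            return None  # no custom rule: fall through to the normal assistant
        return reply() if callable(reply) else reply


def play_lullaby() -> str:
    # Hypothetical action; a real device would start audio playback here.
    return "Playing: lullaby playlist"


commands = CustomCommands()
commands.register("Good night", "Good night, sleep well.")  # fixed sentence
commands.register("Bedtime music", play_lullaby)            # action

print(commands.handle("good night!"))
print(commands.handle("Bedtime music"))
print(commands.handle("What's the weather?"))
```

Returning `None` for unmatched phrases keeps the custom rules separate from the assistant's built-in skills, which would handle everything else.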

Usage:

For speech recognition, the Xiaodu smart speaker has to be woken each time with the voice command "Xiaodu Xiaodu" (geek mode does not need the wake word; see below);

The companion app is the Xiaodu speaker app. After pairing, you can use your phone to play content on demand, view your conversations with the speaker and set up custom commands;

Smart home products on the supported list can be switched on and off by voice (see the diagram for the list of supported products).

Experience:

In actual testing at a normal speaking volume, the Xiaodu smart speaker picks up commands within a radius of about 3 meters. Farther away, out to 5 meters, you have to raise your voice to get a response. It wakes within 0.5 seconds, and the response time after that depends on the request.

The following tests were run in a quiet room on a stable Wi-Fi connection.

Everyday conversation
User: "I'm hungry."
Xiaodu: "Go and eat when you are hungry, and order takeout if you are too lazy to go out."
Response time: 1s

Search dialogue
User: "How is China's stock market today?"
Xiaodu: "Today's A-share Shanghai Stock Exchange Index XXX..."
Response time: 0.5~1s

Functional dialogue
User: "Help me set an alarm for 6:00 pm."
Xiaodu: "OK, I will remind you at 6:00 tonight."
Response time: 0.5~1s

Content request dialogue
User: "Play today's international news."
Xiaodu: (plays "Listen to the Headlines: Today's International News")
Response time: 0.5~1s

On a stable network in a quiet room, the Xiaodu smart speaker generally responds in under 1 second. Surprisingly, it is even faster when it has to fetch content or search for an answer, sometimes replying in about 0.5 seconds. On other smart speakers, these two tests often take 2 seconds or more. The Xiaodu smart speaker is also fairly tolerant of recognition errors. In the app's conversation history, you can see occasional misrecognized words in what the user said, yet the speaker still corrects for them and answers the intended question.

Response speed and error tolerance are the yardsticks for whether a smart speaker is "smart". On both counts, the Xiaodu smart speaker left the editor very satisfied. It really is "smart".

Semantic recognition

Semantic recognition on the Xiaodu smart speaker

A critical part of speech recognition is "semantic recognition". This means that the system interprets each utterance in the context of what came before, rather than treating every sentence from the user as an isolated fragment.

For example, the editor first says "introduce Baidu" and then asks "who is its founder?". The second sentence tests the product's semantic recognition. Without semantic recognition, the system cannot tell what "its" refers to in the second sentence. Only by combining it with the first sentence, "introduce Baidu", can the system work out that "its" means Baidu. The Xiaodu smart speaker passed this semantic recognition test.

Similarly, in the picture on the left, the editor made four requests in a row. The first was to play a Jay Chou song, the second to change the song, the third to stop playing this series, and the fourth to have some Japanese songs. Taken together, the four requests are all about switching songs, and every one except the first depends on context. The fourth, "some Japanese songs", only makes sense as "switch to some Japanese songs". In the end, the Xiaodu smart speaker switched to a Japanese music album, so this test was also a success.
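
The context carry-over in these tests can be illustrated with a toy dialogue state that remembers the last intent. This is only a sketch of the general idea, using hand-written keyword rules. It says nothing about how DuerOS actually works, and the intent and slot names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DialogueState:
    """What the assistant remembers from the previous turn."""
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def interpret(utterance: str, state: DialogueState) -> DialogueState:
    """Return the new state, reusing the previous intent for follow-ups."""
    text = utterance.lower()
    # A complete request stands on its own.
    if "play" in text and "jay chou" in text:
        return DialogueState("play_music", {"artist": "Jay Chou"})
    # Follow-ups only make sense if the previous turn was about music.
    if state.intent == "play_music":
        if "japanese" in text:
            return DialogueState("play_music", {"language": "Japanese"})
        if "change" in text:
            return DialogueState("play_music", {**state.slots, "action": "next"})
    return DialogueState()  # not understood without context


state = DialogueState()
for line in ["Play a Jay Chou song", "Change the song", "Some Japanese songs"]:
    state = interpret(line, state)
    print(line, "->", state.intent, state.slots)

# The same follow-up with no history cannot be resolved.
print(interpret("Some Japanese songs", DialogueState()).intent)
```

The last line shows why context matters: "some Japanese songs" resolves to a music request only when the previous turn was already about music.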

Error tolerance and response speed show whether a smart speaker is "smart", while semantic recognition shows whether it has "intelligence". Semantic recognition is difficult for smart speakers because Chinese is tonal, its verbs have no tenses, and its dialects differ in grammar. These differences from English make commonly used models such as the Hidden Markov Model a poor fit for Chinese speech recognition.

The editor's bold guess is that the Xiaodu smart speaker is so smart and intelligent partly because of Deep Speech 2, Baidu's new-generation deep speech recognition system. The voice assistants built into most smart speakers on the market are not developed in-house by the manufacturer but rely on speech recognition technology from iFLYTEK, Sogou and others. Baidu may be an exception. This time Baidu has probably used Deep Speech 2, which was developed by its own lab on deep learning technology and draws on the large amount of Chinese audio data in Baidu's search engine. The recognition accuracy of this technology is as high as 97%.

Baidu therefore has a natural advantage over other manufacturers in making smart speakers, and it makes sense that the Xiaodu smart speaker does better than other products in response speed, error tolerance and semantic recognition.

Child mode

The first sentence is the reply in child mode, and the second sentence is the reply in normal mode

Child mode is a voice dialogue mode that the Xiaodu smart speaker offers specifically for children. In this mode, the speaker's functions stay the same, but its voice changes to a cute, lively child's voice. Its conversational style also becomes more childlike, which makes it feel friendlier to children.

In child mode, the Xiaodu smart speaker affectionately calls the user "baby", and its intonation, tone and wording all change. The same reply in normal mode has an entirely different tone and wording.

Parents will know this problem well. Children are very curious and often ask general-knowledge questions, one after another, and answering them all is tiring. This is where the Xiaodu smart speaker can be a real help to parents.

Built on the powerful DuerOS system, the Xiaodu smart speaker has Baidu Knows, Baidu Encyclopedia, a calculator and a wealth of educational resources built in, which can handle basically all of children's questions. For example, the editor read out a long string of numbers to calculate, and the speaker replied in less than 1 second. The same goes for general-knowledge questions such as "Was Tang Taizong the first emperor of the Tang Dynasty?", "Why does Japan often have earthquakes?", "What caused the Hong Kong financial crisis in 1997?" and "Who is Soros?". Drawing on Baidu Knows and Baidu Encyclopedia, the speaker gives answers in a very short time, not only by voice but also in the Xiaodu speaker app, where children can tap a link to see a more detailed and complete answer.

In child mode, when the user asks Xiaodu for music, stories, audiobooks and the like, Xiaodu gives priority to content suited to children. For example, if you ask for some music in child mode, it plays children's songs. In normal mode, the same request gets pop music.

Geek mode

Geek mode is another major feature of the Xiaodu smart speaker. In this mode, the speaker not only answers quickly but also responds to follow-up questions for 8 seconds without needing the wake word.

The user can say "Enter Geek Mode" to put the Xiaodu smart speaker into geek mode. In this mode, the speaker keeps listening for 8 seconds after each response. During those 8 seconds, the user can give another voice command without using the wake word.

Geek mode
User: How is the weather today?
Xiaodu: (reads out today's weather)

User: What about tomorrow?
Xiaodu: (reads out tomorrow's weather)

User: What about the day after tomorrow?
Xiaodu: (reads out the weather for the day after tomorrow)
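
The 8-second follow-up window can be sketched as a small gate that decides whether an utterance should be processed. This is an illustration only: the class, its method names and the wake phrase constant are assumptions, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from typing import Optional

WAKE_WORD = "xiaodu xiaodu"  # assumed wake phrase, for illustration
FOLLOW_UP_WINDOW_S = 8.0     # the 8-second gap described in the review


class WakeGate:
    """Decides whether the speaker should act on an utterance."""

    def __init__(self, geek_mode: bool = False) -> None:
        self.geek_mode = geek_mode
        self._last_reply_at: Optional[float] = None

    def accepts(self, utterance: str, now: float) -> bool:
        # The wake word always works, in either mode.
        if utterance.lower().strip().startswith(WAKE_WORD):
            return True
        # Without it, only geek mode listens, and only after a reply.
        if not self.geek_mode or self._last_reply_at is None:
            return False
        return (now - self._last_reply_at) <= FOLLOW_UP_WINDOW_S

    def mark_reply(self, now: float) -> None:
        """Record when the speaker finished answering."""
        self._last_reply_at = now


gate = WakeGate(geek_mode=True)
print(gate.accepts("Xiaodu Xiaodu, how is the weather today?", now=0.0))  # True
gate.mark_reply(now=2.0)
print(gate.accepts("What about tomorrow?", now=6.0))    # True, 4 s after reply
gate.mark_reply(now=7.0)
print(gate.accepts("What about the day after tomorrow?", now=20.0))  # False, 13 s
```

With `geek_mode=False`, the same follow-up questions would be ignored unless they begin with the wake word.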

The biggest difference between geek mode and normal mode is that the speaker does not have to be woken again. Users in this mode therefore tend to ask more follow-up questions, and because their conversation is likely to be continuous, semantic recognition becomes all the more important. As mentioned above, the Xiaodu smart speaker's semantic recognition is very accurate. Combined with the large library of resources and audio content in DuerOS, it gives users a better experience in geek mode.

Sound quality

In the speaker world there is an iron rule: the size of the cavity determines what kind of sound a speaker can make. All else being equal, the bigger the cavity, the better the sound. The Xiaodu smart speaker is small, and so is its cavity, but it uses an acoustic reflection cone design inside. Through the upper and lower mesh speaker openings, it forms a ring-shaped sound field centered on itself, creating a sense of stereo. The main benefit of this design is plentiful bass. Compared with speakers of the same cavity size, it reaches deeper.

In actual listening, what impressed the editor most about the Xiaodu smart speaker was its low and mid frequencies. Vocals and bass are clearly separated, the mids are rich and dense, and the bass has a strong sense of atmosphere. It is hard to believe this sound comes from such a small cavity. The impression is very similar to the one the Sony XB10 left on the editor earlier. The XB10 can basically be regarded as the limit of what a cavity of that size can produce. The Xiaodu smart speaker has a larger cavity than the XB10, and it sounds better.

Summary

Many users may say that a smart speaker is just a toy and that the voice assistant is not intelligent at all. Indeed, that is how many smart speakers have struck the editor too. But we should not generalize. If a smart speaker recognizes speech poorly and responds slowly, it is useless no matter how powerful its functions are.

The Xiaodu smart speaker is entirely different. Imagine that 97 out of every 100 sentences you say are recognized correctly, the response is extremely fast and the answer is the one you wanted. A speaker like that fits many more scenarios, helps users solve many problems and saves a lot of time. The Xiaodu smart speaker also offers a child mode for children and an efficient geek mode. Compared with other smart speakers, it is not only smart but also has more functions.

Yes, the editor also knows that talking about a product without its price is pointless. However strong its advantages, they cannot outweigh the single word "expensive". But this time, the Xiaodu smart speaker costs only 89 yuan.