the UK & European home automation networking and entertainment resource


Voice Recognition for Home Control (1/3/2006)

By David Milward, Linguamatics

For several years, demonstration 'homes of the future' have shown control of home devices by voice. The benefits are obvious. Voice allows hands-free control, which is vital for people who find keyboards difficult or impossible to use, and also useful to anyone who wants to control their music, telephone, etc. while doing other tasks, whether ironing, cooking or having a shower.

'Spoken dialogue technology' takes voice control one step further, allowing a system to interact with a user step by step, asking questions and responding to the replies. This makes the advanced functionality of multiple, complex, networked devices accessible to everyone, not just to the technically minded with the patience to wade through manuals. It also allows people to phone their home and control devices remotely, for example to turn on the heating or air conditioning before returning.

Voice control and spoken dialogue are starting to become commonplace in cars, and not just in top-of-the-range models. So what is the potential in the home, what are the component technologies, and what is needed for spoken control to become equally commonplace there?

Potential for Spoken Control in the Home

Although spoken control can be used for simple individual devices in the same room as the user, the real benefits are for control of remote devices, of complex or multiple devices, and interaction with services. Let us consider each of these in turn.

1. Remote devices
This includes controlling the home by phone from outside, or controlling devices in other rooms. Monitoring of devices at a remote location, for example the home of someone with a medical condition, is also possible. Devices themselves may initiate interactions: for example, a warning may be relayed over speakers if a device senses an unusual state, giving the user a chance to override any automatic shutdown.
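A device-initiated interaction of this kind can be sketched as follows. This is a minimal illustration, not a description of any real product: the `Alert` type, the `OVERRIDE_PHRASES` set and the decision to shut down when the user gives no reply are all assumptions. It shows the two steps described above: composing a spoken warning, then deciding whether the automatic shutdown goes ahead based on the user's recognised reply.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: a device reports an unusual state, the system
# announces a warning over the speakers, and the user's reply (if any)
# decides whether the automatic shutdown goes ahead.

@dataclass
class Alert:
    device: str
    state: str

# Illustrative phrases that count as an override (assumed, not standard).
OVERRIDE_PHRASES = {"leave it on", "keep it on", "cancel", "override"}

def warning_text(alert: Alert) -> str:
    """Text to pass to the speech synthesiser."""
    return (f"Warning: the {alert.device} reports {alert.state}. "
            "It will be switched off unless you say 'leave it on'.")

def resolve(alert: Alert, reply: Optional[str]) -> str:
    """Decide what to do once the override window has closed.

    `reply` is the recognised utterance, or None if the user said
    nothing before the (assumed) timeout.
    """
    if reply is not None and reply.strip().lower() in OVERRIDE_PHRASES:
        return f"keep {alert.device} on"
    # No reply, or an unrecognised reply: fall back to the safe action.
    return f"shut down {alert.device}"
```

Defaulting to shutdown when nothing is heard keeps the safe behaviour as the fallback; the spoken override only changes the outcome when it is positively recognised.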

2. Complex devices
Although it is possible to control complex devices using on-screen menus on LCD panels, menu structures can be difficult for new users to navigate. Voice shortcuts let people go straight to an option, such as a volume setting, without having to know, for example, that 'volume setting' sits below 'user configuration'. Hundreds of devices or services can be reached by voice, many more than can appear as shortcuts on a screen. It is also possible to skip several steps by setting parameters directly, using a command such as 'turn the TV volume to 5%'.
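The shortcut idea can be illustrated with a small sketch, assuming the device's menu is available as a tree. The menu layout, the `handle` function and the single command pattern below are hypothetical. The system searches the tree for the named setting, so the user never needs to know the path, and then sets the spoken value directly.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical TV menu: nested dicts are sub-menus, other values are settings.
MENU: Dict = {
    "user configuration": {
        "picture": {"brightness": 50, "contrast": 50},
        "sound": {"volume": 20, "balance": 0},
    },
    "channels": {"auto tune": 0},
}

def find_path(menu: Dict, option: str,
              path: Tuple[str, ...] = ()) -> Optional[Tuple[str, ...]]:
    """Depth-first search for the menu path that leads to a setting."""
    for name, child in menu.items():
        here = path + (name,)
        if isinstance(child, dict):
            found = find_path(child, option, here)
            if found:
                return found
        elif name == option:
            return here
    return None

def set_option(menu: Dict, path: Tuple[str, ...], value: int) -> None:
    """Walk down the path and overwrite the setting at its end."""
    node = menu
    for name in path[:-1]:
        node = node[name]
    node[path[-1]] = value

# Matches commands of the form "turn the TV <setting> to <number>[%]".
COMMAND = re.compile(r"turn the tv (\w+) to (\d+)%?")

def handle(utterance: str) -> str:
    match = COMMAND.fullmatch(utterance.strip().lower())
    if match is None:
        return "Sorry, I didn't understand."
    option, value = match.group(1), int(match.group(2))
    path = find_path(MENU, option)
    if path is None:
        return f"The TV has no '{option}' setting."
    set_option(MENU, path, value)
    return f"Set {' > '.join(path)} to {value}."
```

For example, `handle("Turn the TV volume to 5%")` finds `volume` under `user configuration > sound` and sets it in one step, instead of the user navigating three menu levels.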

3. Multiple devices
Controlling multiple devices and checking their status is very natural by voice, for example 'Turn off all the lights' or 'Have I left anything on?' Although lights and blinds may increasingly use sensors, homeowners will still want the ability to override them and to choose particular mood settings. The ability to program multiple devices, for example 'Switch the hall light on when the front door is opened', is also critical if users are to exploit the benefits of networked devices.
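The three example commands map naturally onto operations over a model of the home's devices. The sketch below is a minimal illustration under assumed names (`Home`, `Device`, `when`, `report`); it is not a real home-automation API. The first two commands become queries over all devices, and the programmed rule becomes a trigger that fires when a device reports a state change.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Device:
    name: str
    kind: str          # e.g. "light", "appliance", "sensor"
    on: bool = False   # for a door sensor, True means "open"

Action = Callable[["Home"], None]

class Home:
    def __init__(self, devices: List[Device]) -> None:
        self.devices: Dict[str, Device] = {d.name: d for d in devices}
        self.rules: List[Tuple[str, bool, Action]] = []

    def turn_off_all(self, kind: str) -> List[str]:
        """'Turn off all the lights': switch off every device of one kind."""
        changed = sorted(d.name for d in self.devices.values()
                         if d.kind == kind and d.on)
        for name in changed:
            self.devices[name].on = False
        return changed

    def left_on(self) -> List[str]:
        """'Have I left anything on?': every non-sensor device that is on."""
        return sorted(d.name for d in self.devices.values()
                      if d.on and d.kind != "sensor")

    def when(self, device: str, state: bool, action: Action) -> None:
        """Program a rule, e.g. 'switch the hall light on when the front door is opened'."""
        self.rules.append((device, state, action))

    def report(self, device: str, state: bool) -> None:
        """Record a state change reported by a device and fire matching rules."""
        self.devices[device].on = state
        for trigger, wanted, action in self.rules:
            if trigger == device and wanted == state:
                action(self)

def hall_light_on(home: Home) -> None:
    home.devices["hall light"].on = True
```

With a front-door sensor registered, `home.when("front door", True, hall_light_on)` programs the rule, and a later `home.report("front door", True)` switches the hall light on.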

4. Services
It is possible today to access services over the phone by voice, but these are not tailored to the home, nor able to interact with services from other suppliers. When booking a film and a taxi, it would be great to book the film and then just say 'We need a taxi to get there', instead of having to re-enter the day, time and destination.
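One way to avoid re-entering details is a shared dialogue context that services read from and write to. The sketch below assumes hypothetical `book_film` and `book_taxi` services and a simple slot dictionary. It only shows the carry-over of day, time and destination; a real system would, for instance, compute a pick-up time earlier than the film's start.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class DialogueContext:
    """Slots remembered across services during one conversation."""
    slots: Dict[str, str] = field(default_factory=dict)

def book_film(ctx: DialogueContext, title: str, cinema: str,
              day: str, time: str) -> str:
    # Hypothetical film-booking service; it shares what it learned.
    ctx.slots.update(destination=cinema, day=day, time=time)
    return f"Booked {title} at {cinema}, {day} {time}."

def book_taxi(ctx: DialogueContext, destination: Optional[str] = None,
              day: Optional[str] = None, time: Optional[str] = None) -> str:
    # Hypothetical taxi service: anything the user did not say is taken
    # from the shared context. This sketch simply reuses the film time
    # rather than working out an earlier pick-up time.
    destination = destination or ctx.slots.get("destination")
    day = day or ctx.slots.get("day")
    time = time or ctx.slots.get("time")
    missing = [name for name, value in
               (("destination", destination), ("day", day), ("time", time))
               if value is None]
    if missing:
        return "Please tell me the " + " and ".join(missing) + "."
    return f"Taxi booked to {destination}, {day} {time}."
```

After `book_film(...)`, a bare `book_taxi(ctx)` corresponds to 'We need a taxi to get there': every slot is filled from the context, and the user is asked only for whatever is still unknown.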

The technology

'Speech recognition' is the process of taking a speech signal and converting it into words. The technology has improved steadily, if not spectacularly, over the last few decades, but it is still not possible to accurately transcribe any speaker talking about any subject. The recogniser therefore needs either to be trained to one or more specific speakers ('speaker-dependent, large vocabulary' recognition), or to be restricted in the number of words it can recognise ('speaker-independent, small vocabulary' recognition).
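The effect of restricting the vocabulary can be illustrated with a small sketch. Real small-vocabulary recognisers usually apply their grammar during decoding. This example instead filters a recogniser's ranked list of hypotheses afterwards, which is simpler to show. The command set, the `(words, confidence)` input format and the 0.5 threshold are assumptions.

```python
from typing import List, Optional, Tuple

# A small, fixed command vocabulary of the kind a speaker-independent
# recogniser might be restricted to (illustrative only).
GRAMMAR = {
    "lights on", "lights off",
    "heating up", "heating down",
    "music play", "music stop",
}

def pick_command(n_best: List[Tuple[str, float]],
                 threshold: float = 0.5) -> Optional[str]:
    """Return the best-scoring hypothesis that is in the vocabulary.

    `n_best` holds (words, confidence) pairs from the recogniser.
    Hypotheses outside the vocabulary, or below the confidence
    threshold, are rejected so the system can ask the user to repeat.
    """
    for words, score in sorted(n_best, key=lambda h: h[1], reverse=True):
        if score < threshold:
            break  # remaining hypotheses score even lower
        if words in GRAMMAR:
            return words
    return None
```

Given `[("lights of", 0.8), ("lights off", 0.7)]`, the out-of-vocabulary top hypothesis is skipped and `"lights off"` is chosen. The small vocabulary compensates for errors that a speaker-independent recogniser would otherwise make.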

'Speech synthesis' is the opposite process of taking words and creating a speech signal. The quality of synthesis has improved rapidly in recent years, helped by larger computer memory. In the best systems, the robot-like voices of the past have been replaced by very natural-sounding speech, with a choice of accents.

In addition to converting from words to speech and vice versa, a system also has to understand what the words mean, and be able to convert instructions such as 'Turn the living room light on' or 'Switch on the light in the lounge' into the same command. This is 'language understanding' or 'natural language processing'.
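A very simple form of language understanding maps different phrasings onto one canonical command using synonym tables. The sketch below makes that assumption explicit: the tables, the `Command` type and the keyword-spotting approach are illustrative only, and far cruder than a full natural language processing system. It does, however, turn both example sentences into the same command.

```python
import re
from typing import Dict, NamedTuple, Optional

class Command(NamedTuple):
    device: str
    room: str
    action: str

# Hypothetical synonym tables mapping phrases to canonical names.
ROOMS = {"living room": "living room", "lounge": "living room",
         "kitchen": "kitchen", "hall": "hall"}
DEVICES = {"light": "light", "lamp": "light", "heating": "heating"}
ACTIONS = {"on": "on", "off": "off"}

def find(table: Dict[str, str], text: str) -> Optional[str]:
    """Return the canonical name of the first phrase found as whole words."""
    for phrase, canonical in table.items():
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            return canonical
    return None

def understand(utterance: str) -> Optional[Command]:
    text = re.sub(r"[^a-z ]", "", utterance.lower())
    room, device, action = find(ROOMS, text), find(DEVICES, text), find(ACTIONS, text)
    if room and device and action:
        return Command(device, room, action)
    return None  # not understood: the dialogue manager can ask again
```

Both `understand("Turn the living room light on")` and `understand("Switch on the light in the lounge")` yield `Command("light", "living room", "on")`.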

Finally, if the system is to interact with the user in a conversation, the system requires a 'dialogue manager'. In most systems, this is a fixed script describing each step of an interaction with a user, but more intelligent and flexible systems are being developed. For example, in ontology-based dialogue, the dialogue manager calculates what to say next based on its knowledge of the devices and services (e.g. a light with the ability to switch between on and off) and the scenario (e.g. a light which is in the lounge, which is downstairs, which is in the house).
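The contrast between a fixed script and a dialogue manager that reasons from its knowledge can be sketched briefly. The device list, the `abilities` field and the reply wording below are hypothetical, and only capabilities and rooms are modelled, not a full ontology. Instead of following a scripted step, `next_move` works out whether to act, ask a clarifying question, or explain that a request is impossible.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class Device:
    name: str
    kind: str
    room: str
    abilities: FrozenSet[str]   # actions this device supports

# Hypothetical knowledge of the house: what exists, where, and what it can do.
DEVICES: List[Device] = [
    Device("lounge light", "light", "lounge", frozenset({"switch on", "switch off", "dim"})),
    Device("hall light", "light", "hall", frozenset({"switch on", "switch off"})),
    Device("kitchen radio", "radio", "kitchen", frozenset({"switch on", "switch off"})),
]

def next_move(kind: str, action: str, room: str = "") -> str:
    """Compute the system's next utterance from device knowledge
    instead of following a fixed script."""
    candidates = [d for d in DEVICES
                  if d.kind == kind and (not room or d.room == room)]
    if not candidates:
        where = f"in the {room}" if room else "in the house"
        return f"There is no {kind} {where}."
    able = [d for d in candidates if action in d.abilities]
    if not able:
        if len(candidates) == 1:
            return f"The {candidates[0].name} cannot {action}."
        return f"None of the {kind}s can {action}."
    if len(able) > 1:
        # Ambiguous: ask a question that distinguishes the candidates.
        rooms = " or the ".join(sorted(d.room for d in able))
        return f"Which {kind}: the {rooms}?"
    return f"OK, I'll {action} the {able[0].name}."
```

'Switch on the light' produces "Which light: the hall or the lounge?", while 'dim the light' needs no question, because only the lounge light can dim. Adding a device changes the questions asked without rewriting any script.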

Components required for a home system

The following components are required for a home system, in addition to the networked home:

1. Microphone(s) and speaker(s). Portable devices range from Bluetooth headsets and mobile phones to tablet PCs with embedded microphones and speakers, as in the picture below. Array microphones can be used to pick up sound from any position in a room. Small, fixed microphones and speakers in chosen locations, such as above a kitchen worktop, provide a cheaper alternative.


Tablet PC with touchscreen, microphone and speakers

2. Speech recogniser/synthesiser. This is usually software running on a medium- to high-specification PC. There are systems designed for embedded devices, but quality tends to depend on the available memory and disk space. A telephony card or Voice over IP (VoIP) is required for remote use.

3. Language understanding and dialogue management. Again, this is software running on a PC, but it needs relatively little memory and processing power compared with speech recognition or synthesis.

4. Connection to devices. The dialogue manager needs to send messages to, and receive messages from, the devices. This may be done using X10, for example, or via an API to a home control software package.
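How components 2 to 4 fit together can be sketched as a set of interfaces joined by a dialogue manager. Every class and method name here is a hypothetical illustration, not a real product's API. The `DeviceLink` interface stands in for whatever actually reaches the devices, such as an X10 interface or a home control package's API. The simple stand-in classes at the end let the wiring run without any hardware.

```python
from typing import List, Optional, Protocol, Tuple

class Recogniser(Protocol):
    def recognise(self, audio: bytes) -> str: ...

class Synthesiser(Protocol):
    def speak(self, text: str) -> bytes: ...

class Understanding(Protocol):
    def parse(self, words: str) -> Optional[Tuple[str, str]]: ...  # (device, command)

class DeviceLink(Protocol):
    """Connection to devices, e.g. an X10 interface or a home control package's API."""
    def send(self, device: str, command: str) -> bool: ...

class DialogueManager:
    """Wires the components together for one spoken exchange."""
    def __init__(self, asr: Recogniser, tts: Synthesiser,
                 nlu: Understanding, link: DeviceLink) -> None:
        self.asr, self.tts, self.nlu, self.link = asr, tts, nlu, link

    def turn(self, audio: bytes) -> bytes:
        words = self.asr.recognise(audio)           # 2. speech recogniser
        meaning = self.nlu.parse(words)             # 3. language understanding
        if meaning is None:
            reply = "Sorry, could you say that again?"
        else:
            device, command = meaning
            if self.link.send(device, command):     # 4. connection to devices
                reply = f"The {device} is now {command}."
            else:
                reply = f"I couldn't reach the {device}."
        return self.tts.speak(reply)                # 2. speech synthesiser

# Stand-ins so the wiring can be exercised without real hardware.
class TextRecogniser:
    def recognise(self, audio: bytes) -> str:
        return audio.decode()

class TextSynthesiser:
    def speak(self, text: str) -> bytes:
        return text.encode()

class KeywordUnderstanding:
    def parse(self, words: str) -> Optional[Tuple[str, str]]:
        if "hall light" in words:
            return ("hall light", "off" if "off" in words else "on")
        return None

class RecordingLink:
    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def send(self, device: str, command: str) -> bool:
        self.sent.append((device, command))
        return True
```

Keeping each component behind a small interface reflects the list above. The recogniser, the understanding module or the device connection can each be replaced, for instance when moving from X10 to a software API, without touching the others.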

As well as installing the components, the homeowner has to carry out some configuration, and later reconfiguration, so that the system knows about the rooms in the house and the location of the devices.
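The configuration the homeowner provides might look like the sketch below, which assumes a simple containment hierarchy: each place names the place that contains it, and each device names its room. The place names, device names and helper functions are hypothetical. Walking up the hierarchy lets a request such as 'the lights downstairs' be resolved, and reconfiguration amounts to updating a device's room.

```python
from typing import Dict, List, Optional

# Hypothetical homeowner configuration: each place names the place that
# contains it (None for the house itself), and each device names its room.
PLACES: Dict[str, Optional[str]] = {
    "house": None,
    "downstairs": "house",
    "upstairs": "house",
    "lounge": "downstairs",
    "kitchen": "downstairs",
    "hall": "downstairs",
    "bedroom": "upstairs",
}
DEVICES: Dict[str, str] = {
    "lounge light": "lounge",
    "kitchen light": "kitchen",
    "bedroom light": "bedroom",
    "radio": "kitchen",
}

def within(place: str, area: str) -> bool:
    """True if `place` is `area` or lies inside it (walks up the containment chain)."""
    current: Optional[str] = place
    while current is not None:
        if current == area:
            return True
        current = PLACES[current]
    return False

def devices_in(area: str, name_contains: str = "") -> List[str]:
    """Devices located anywhere inside `area`, optionally filtered by name."""
    return sorted(d for d, room in DEVICES.items()
                  if within(room, area) and name_contains in d)

def move_device(device: str, new_room: str) -> None:
    """Reconfiguration: the homeowner moves a device to another room."""
    if new_room not in PLACES:
        raise ValueError(f"Unknown room: {new_room}")
    DEVICES[device] = new_room
```

Here `devices_in("downstairs", "light")` returns the kitchen and lounge lights. After `move_device("radio", "bedroom")`, the radio is found upstairs instead, with no other changes needed.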

Current status and future developments

Spoken interaction has the potential to provide a natural and very flexible way to talk to devices. Systems based on single commands or scripted interactions, with training to a speaker's voice, are already available as off-the-shelf software packages. Spoken dialogue systems that provide more flexible interactions are at the prototype and demonstration-home stage. For example, at the Advantica Test House in Loughborough, a demonstrator built as part of a DTI-sponsored trial run by TAHI (The Application Home Initiative) provides natural interactions using a multimodal interface, as shown in the diagram below.


Multimodal interaction for controlling home devices

Users can read what is on the screen, listen to instructions, or give voice instructions depending on what is most convenient. As well as controlling devices, the demonstrator includes a recipe application that reads out recipe instructions step by step.

So why is spoken control in the home less common than in-car control? Firstly, modern cars have networked devices as standard. Secondly, there is a fixed number of known devices in a car, so scripting interactions is feasible. Thirdly, the driver sits in a fixed position, so microphones and speakers can be carefully positioned for maximum accuracy.

None of this need prevent spoken systems from being used in the home now, or in similar environments such as care homes. However, in the next few years, the likely uses will be where there is a real need, such as for the elderly or disabled, or in expensive homes that are already highly automated. In the meantime, speech recognition technology is expected to continue to improve, and research projects such as the EU-sponsored Talk Project are developing the next generation of fully reconfigurable systems, which will be better suited to dynamic home environments where devices and services are continually changing.

David Milward is the CTO of Linguamatics Ltd, provider of natural language processing technology, including next-generation ontology-based dialogue management.

www.linguamatics.com


 