Multimedia Classification
As John Zubrzycki mentioned yesterday, this project, which is running as part of BBC R&D’s Archive Research Section, is developing new ways to open up the BBC’s archive to the public. The aim of the project is to allow people to search and browse the archive to find content that they want to watch or listen to, but didn’t know existed.
Currently, the majority of people search for programmes on BBC iPlayer by programme title – they know the name of the show they’ve missed and stand a reasonably good chance of finding it. If they don’t have a specific programme in mind, they can browse by channel or genre and pick something from there. However, once the archives are digitised, people browsing in this manner would have to wade through thousands of programmes to find what they want. There would be so many sitcoms or documentaries that finding a programme of interest would be a real challenge.
In order to allow people to search for and find programmes more effectively, we need what is known as metadata: information about a programme that tells you something about it, such as who is in it, what it’s about, and where the different scenes or sections are.
Currently we have metadata that would allow you to find a programme if you knew the title, when it was first broadcast and, in some instances, who the main presenters or actors were. But what if you didn’t want to search by these terms? What if you wanted an archived programme that was an exciting spy drama similar to Spooks, or a satirical comedy show about politics? In order to answer these types of questions we need new metadata.
In R&D, we are currently developing systems that can watch and listen to a programme in a similar way to people. We are developing systems that can recognise, and more importantly understand, what is in a programme (e.g. people, places, objects such as cars or Daleks), what these are doing (e.g. are characters talking or shouting to each other? Is someone running? What are the characters saying to each other?) and what the mood, or feeling, of the programme is. The mood element helps people find the programmes they want in order to be entertained – to match the mood of the programme to their current or desired mood.
In order to do this we are focussing on three main areas. The first is what we’ve termed characteristics extraction. This is where we analyse the audio and video signals and try to identify their key properties – such as cuts, motion, luminance, faces, any key audio frequencies or audio frequency combinations, or especially strong or weak parts of the signal – using signal processing techniques such as estimating the power spectral density or taking a Fast Fourier Transform of a section of the signal. This gives us a set of numbers, or vectors, which represent the audio and video signals based on their key properties.
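As a rough illustration of turning a section of audio into a vector of numbers, here is a minimal sketch. It is not our production code: the function names, the test tone and the choice of two characteristics (dominant frequency and average energy) are assumptions made for the example, and it uses a plain discrete Fourier transform from the Python standard library, where a real system would use a Fast Fourier Transform, which computes the same result far more quickly.

```python
import cmath
import math


def power_spectrum(samples):
    """Power at each non-negative frequency bin of a real signal (naive DFT)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def characteristics(samples, sample_rate):
    """Return a two-element vector: [dominant frequency in Hz, mean energy]."""
    spectrum = power_spectrum(samples)
    # Skip bin 0 (the constant offset) when looking for the strongest frequency.
    peak_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    dominant_hz = peak_bin * sample_rate / len(samples)
    energy = sum(x * x for x in samples) / len(samples)
    return [dominant_hz, energy]


# Hypothetical test signal: a 50 Hz tone sampled at 400 Hz for 0.2 seconds.
SAMPLE_RATE = 400
tone = [math.sin(2 * math.pi * 50 * t / SAMPLE_RATE) for t in range(80)]
vector = characteristics(tone, SAMPLE_RATE)
print(vector)
```

For this pure tone the first number comes out at 50 Hz and the second at about 0.5. A real programme would be cut into many short sections, each producing its own vector.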
The next stage is what we’ve termed feature extraction. We aim to use the extracted characteristics to identify key features, or objects, in the programme. We do this using machine intelligence techniques which map the extracted characteristics to a library of characteristics taken from known features. These systems then make a decision as to which of the known features the extracted characteristics most closely match. For example, one of the initial systems we developed aimed to find studio laughter in a programme, which would help us identify which programmes were comedies and the position of jokes within a programme. To do this, we extracted the characteristics of hundreds of clips of studio laughter from different BBC comedies, creating what is known as a ground truth data set. We then extracted the characteristics from other programmes and matched them to this ground truth data set (using a technique called Support Vector Machines), and were able to identify how much laughter a programme had and where it occurred. We can use similar techniques to identify other audio features, such as explosions, shouting and cheering. We can also use broadly similar techniques to identify objects in the video, such as people, places and Daleks. By showing the systems we’ve developed hundreds or thousands of examples of what we are looking for, we teach them to recognise those things in any other programme we show them.
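The idea of matching new material against a ground truth data set can be sketched as follows. To keep the example short and free of dependencies it uses a nearest-centroid classifier as a simple stand-in, not the Support Vector Machines mentioned above; the two-number characteristic vectors and the labels are invented for illustration.

```python
import math


def centroid(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def train(ground_truth):
    """Summarise each labelled set of example vectors by its centroid."""
    return {label: centroid(vectors) for label, vectors in ground_truth.items()}


def classify(model, vector):
    """Return the label whose centroid is closest to the given vector."""
    return min(model, key=lambda label: math.dist(model[label], vector))


# Hypothetical ground truth: characteristic vectors from hand-labelled clips.
GROUND_TRUTH = {
    "laughter": [[0.9, 0.8], [0.8, 0.9], [1.0, 0.7]],
    "speech": [[0.2, 0.1], [0.1, 0.3], [0.3, 0.2]],
}

model = train(GROUND_TRUTH)

# Hypothetical vectors taken from consecutive sections of an unseen programme.
segments = [[0.85, 0.75], [0.15, 0.2], [0.95, 0.9]]
labels = [classify(model, s) for s in segments]
laughter_fraction = labels.count("laughter") / len(labels)
print(labels, laughter_fraction)
```

The list of labels says where the laughter is, and the fraction says how much of the programme contains it. A Support Vector Machine plays the same role as `classify` here but learns a more flexible boundary between the classes.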
This helps us identify what is in a programme. The next stage of our research is to identify what types of mood and emotion a programme contains. This helps us classify programmes as exciting, tearful or happy. In order to do this we take a similar approach. We collected our ground truth by getting hundreds of people to watch hundreds of clips from the archive and tell us what the mood was at different points in each clip. We can then analyse the sections which have powerful emotions and identify the key characteristics which we can use to train our systems. We can also use the extracted features to help identify a mood. In the above example, where we found lots of laughter we could infer the programme was happy and light-hearted. If, however, there was lots of screaming and shouting, we could infer the opposite.
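Inferring a mood from detected features might look something like the sketch below. The weights, the threshold and the mood labels are all assumptions chosen for the example; in practice they would be learned from the ground truth collected from viewers, not written by hand.

```python
# Hypothetical weights: positive features pull towards a light-hearted mood,
# negative ones pull the other way.
MOOD_WEIGHTS = {
    "laughter": 1.0,
    "cheering": 0.5,
    "screaming": -1.0,
    "shouting": -0.5,
}


def mood_score(feature_seconds, duration_seconds):
    """Weighted seconds of each detected feature, per second of programme."""
    weighted = sum(
        MOOD_WEIGHTS.get(name, 0.0) * seconds
        for name, seconds in feature_seconds.items()
    )
    return weighted / duration_seconds


def mood_label(score, threshold=0.05):
    """Turn a score into a coarse label (labels are illustrative only)."""
    if score > threshold:
        return "light-hearted"
    if score < -threshold:
        return "tense"
    return "neutral"


# Hypothetical detections, in seconds, for two 30-minute programmes.
sitcom = {"laughter": 240, "shouting": 30}
thriller = {"screaming": 90, "shouting": 120, "laughter": 10}

print(mood_label(mood_score(sitcom, 1800)))
print(mood_label(mood_score(thriller, 1800)))
```

With these made-up numbers the sitcom comes out as light-hearted and the thriller as tense.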
We also look at other aspects of the programme. Music is an inherently important part of productions, helping to set the scene and reinforce its emotion. Working with the University of Salford and the British Science Association, we ran an experiment called MusicalMoods. This asked people to listen to TV theme tunes and rate the mood and emotion of each. We then use this data to identify the key elements of music which reflect the emotions (e.g. the key or tempo) and can then analyse other music. You can see a video we did about the experiment on YouTube.
In addition, we are very interested in what is being spoken in the programme. In some instances we can use the subtitles or any available scripts. In conjunction with Dr. Andrew MacFarlane at City University London, we are developing systems that can analyse these, identifying what people are talking about by analysing the actual words said and also their emotional content. We are also interested in how people are speaking – for example, are they shouting, whispering or laughing as they speak? In many cases, we have neither the subtitles nor the scripts available. To address these cases, we are part of a research project run by the Universities of Cambridge, Sheffield and Edinburgh which aims to create new methods for automatically transcribing the speech of a programme into text.
Once we have collected all of this metadata about a programme, what’s going on in it and its mood, the next area of our research is to develop systems that store and index it in a useful way, and to develop ways to allow people to search for what they want most effectively. Continuing our research with City University London, we are looking to develop a new type of information retrieval system. Traditional information retrieval systems aim to match a user’s query with any documents the system knows about. These tend to focus on keyword matching – matching words or synonyms in a user’s query with those in the indexed documents. We are developing systems that not only perform this function with BBC programmes, but take into account the mood of the programme as well.
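To give a feel for how keyword matching and mood might be combined in a search, here is a minimal sketch. The scoring formula, the mood scale running from -1 (dark) to 1 (light-hearted), the equal weighting and the tiny catalogue are all assumptions made for illustration, not a description of the system under development.

```python
def keyword_score(query_terms, description):
    """Fraction of query terms that appear as words in the description."""
    words = set(description.lower().split())
    return sum(1 for term in query_terms if term in words) / len(query_terms)


def mood_similarity(wanted, actual):
    """1.0 when the moods are identical, 0.0 when they are opposite."""
    return 1.0 - abs(wanted - actual) / 2.0


def rank(query_terms, wanted_mood, catalogue, mood_weight=0.5):
    """Order programme titles by a blend of keyword and mood scores."""
    scored = []
    for programme in catalogue:
        score = (
            (1 - mood_weight) * keyword_score(query_terms, programme["description"])
            + mood_weight * mood_similarity(wanted_mood, programme["mood"])
        )
        scored.append((score, programme["title"]))
    return [title for _, title in sorted(scored, reverse=True)]


# Hypothetical indexed programmes with a mood value attached to each.
CATALOGUE = [
    {"title": "Programme A", "description": "spy drama set in london", "mood": -0.6},
    {"title": "Programme B", "description": "political comedy panel show", "mood": 0.8},
    {"title": "Programme C", "description": "spy comedy spoof", "mood": 0.7},
]

print(rank(["spy", "drama"], -0.5, CATALOGUE))
print(rank(["comedy"], 0.8, CATALOGUE))
```

In the first query, Programme C matches the word "spy" but its light-hearted mood counts against it, so the darker Programme A ranks first. Setting `mood_weight` to 0 reduces this to plain keyword matching.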
We hope that by creating these systems, people will find content in the archive that they didn’t know was there and didn’t know they wanted. This will really help open up the archives and allow people to explore the programmes within them.
Comment number 1.
At 11th Oct 2011, Kit Green wrote: This is very interesting and potentially very useful to viewers, listeners and researchers.
How does it pay for itself? What is the business model?