Background art
Content-based video search has been a focus of industry research for some time. An information retrieval system generally comprises a core search database, a search dispatching system, and a group of servers. Compared with an ordinary information retrieval system, a video search system is more complex and contains more modules. Traditional video search systems obtain information about video programs through manual annotation and store that information in a database for later queries. This approach has significant limitations: manual annotation not only consumes a great deal of labor and time, but is also often highly subjective and cannot give an accurate, impartial description of program content. For this reason, image analysis, speech analysis, and subtitle analysis techniques are applied to video programs so that, with the computer as the main tool, feature information related to program content is extracted automatically to support content-based search.
The speech, subtitles, images, and metadata of the video content (metadata being descriptive information about the video content, such as the director, the performers, and the issuing organization) are extracted separately and stored in different databases. Once the decomposed storage of video content has been solved, another major problem remains: how to retrieve, from multiple databases, the video programs closest to the user's expectation according to the user's input conditions.
The search condition entered at the user side may be any one of the four classes of video content, or any combination of them. For subtitles and for each type of metadata (metadata can be divided into many types, such as director, performer, place of production, and synopsis), the user input is plain text. A picture that the user selects as a search condition must first be processed by the client software, which extracts the picture's feature data to serve as the actual search condition. Speech that the user enters through a microphone is likewise processed by the client software into data that can be matched against the speech database.
Whether the search uses a single class of condition or a compound of several classes, returning the video results closest to the user's expectation requires a set of rules that define the degree of matching between the data stored in the multimedia content databases and the user's input conditions. When each of the four classes of video content (subtitles, images, speech, and metadata) is compared with its corresponding search condition (subtitle keywords, a picture or video clip, speech data, metadata keywords), its degree of matching must be objectively defined. Text content such as subtitles and metadata can share the same standard. The match between a user-supplied picture and a picture in the database is usually characterized by the distance between the feature data of the two pictures. The match between user-supplied speech and data in the speech database is characterized by a similarity probability.
Logically, the search dispatching system sits between the client software and the multimedia databases. The client communicates only with the search dispatching system and is unaware that the multimedia database cluster exists. On the database side, every type of database has a different query interface; the search dispatching system holds the query interfaces of all types of multimedia database, and each single-class multimedia database only needs to accept query requests from the search dispatching system. This makes the whole retrieval system more flexible: whether the client or a database changes, only the search dispatching system needs to be upgraded.
The search dispatching system accepts the user's query conditions, analyzes them, splits out the individual conditions that the user has combined, converts the search request into data that meets the requirements of each multimedia database query interface, and forwards that condition data to the corresponding query interface. Each single-class multimedia database returns to the search dispatching system the partial results that approximately satisfy the query condition. For a single-class query, the search dispatching system only needs to score and sort these results and attach metadata information before returning them to the client. For a combined multi-class query, in addition to scoring and sorting the results returned by each class of database, it must also take the union of the result sets and score the merged results again.
The search dispatching system must pick, from the results returned by the databases, those that best match the user's expectation and place them at the top of the final result. This problem involves not only the underlying databases' algorithms for matching search conditions, but also the user's habits of thought and the relationships between the different classes of multimedia content, and much research is still needed before the results are satisfactory. The present search dispatching system aims to bring the search results as close as possible to the user's expectation, and therefore adopts a number of empirically derived scoring algorithms, which have produced fairly good results.
Summary of the invention
The object of the invention is to provide a search dispatching system based on video program content. It is the interface between the user side and the multimedia databases: it accepts user search requests, dispatches them to the different kinds of multimedia database, post-processes the result sets returned by the databases, and returns the search results to the client ordered by how well each result matches the search conditions entered by the user. The relationship between the search dispatching system, the client, and the multimedia databases is shown in Fig. 1.
A content-based video search dispatching system comprises: a network module; a database query interface module; a search condition parsing module; a search condition distribution module; a scoring policy module; a single-class condition search result scoring module; a multi-class combined query result fusion and scoring module; and a search result generation module.
- The search condition parsing module analyzes the search request and generates query conditions that the multimedia databases accept.
- The search condition distribution module distributes the multimedia database query conditions generated for the different kinds of search condition to the corresponding multimedia database query interfaces.
- The scoring policy module holds the preset weight values of the metadata fields, used when scoring metadata search results; the preset weight values for duplicate-shot results, used when scoring scene results; and the preset weight values of the media result classes, used when computing the final score of a multi-class combined search.
- The preset weight values of the metadata fields comprise the program name weight, program director weight, program performer weight, program language weight, program place-of-production weight, program category weight, program format weight, and program synopsis weight. Each weight is a floating-point number between 0 and 1, and all the weights sum to 1.
- The preset weight values for duplicate-shot results comprise the preset weight for image duplicate-shot results, the preset weight for speech duplicate-shot results, and the preset weight for subtitle duplicate-shot results.
- The preset weight values of the media result classes comprise the preset weights for image results, speech results, subtitle results, and metadata results. Each weight is a floating-point number between 0 and 1, and all the weights sum to 1.
- The single-class condition search result scoring module comprises a scoring algorithm for image search results, a scoring algorithm for subtitle search results, a scoring algorithm for metadata search results, and a scoring algorithm for duplicate-shot results.
- The scoring algorithm for image search results converts the distance between the query image and a result image into a normalized result score; the upper-limit distance threshold used in the conversion is a configurable floating-point number.
- The scoring algorithm for subtitle search results comprises word segmentation of the subtitle fragments actually retrieved from the subtitle database, and two scoring passes.
- The first of the two scoring passes checks whether the retrieved subtitle fragment contains the complete search-condition string and counts how often it occurs; the higher the frequency, the higher the result's score.
- The second of the two scoring passes segments the search-condition string into words, searches the segmented subtitle string for each of those words, and uses the frequency of each word as the basis for the score.
- The scoring algorithm for metadata search results comprises a scoring algorithm for a single class of metadata and a scoring algorithm for the metadata search result as a whole. The single-class algorithm reuses the two scoring passes of the subtitle scoring algorithm; the weight factors for the third scoring step are derived by conversion from the preset weight values of the metadata fields.
- The scoring algorithm for duplicate-shot results comprises scoring algorithms for the duplicate-shot results of image search, audio search, and subtitle search.
- The scoring algorithm for duplicate-shot results extracts the highest score among the duplicate-shot results produced by a search and adjusts that score according to the number of repetitions.
- The adjustment according to the number of repetitions raises the highest score by an appropriate amount according to the preset weight values for the duplicate-shot result classes.
- The multi-class combined query result fusion and scoring module takes the union of the media result classes of a multi-class search and, while taking the union, computes a weighted sum of the result items' scores according to the preset weight values.
- The weighted summation of result item scores first applies a weighted adjustment to the score of each result within its individual result set and then, when taking the union, sums the scores of identical results that appear in different result sets.
- The weighted adjustment of result scores within an individual result set comprises a second conversion of the preset weight values of the media result classes, followed by multiplication of each result score by the converted weight factor.
- The second conversion of the preset weight values of the media result classes normalizes those preset weights over the media classes present in the current search.
- The search result generation module sorts the final results in descending order of score and extracts program description information from the metadata database to generate the returned results.
The system of the invention provides a search dispatching and result ranking algorithm that finds, across multiple databases storing different media content, the video content that best matches the user's expectation. It receives client search requests, parses and forwards them to the multimedia databases, and generates ordered query results. The system comprises a search condition parsing module, a search condition distribution module, a scoring policy module, a single-class condition search result scoring module, a multi-class combined query result fusion and scoring module, and a search result generation module.
Embodiment
Referring to Fig. 1, Fig. 2:
1, network module
Both the client's search requests and the final result sets of queries are received and sent through the network module. A message format is defined between the client and the search dispatching system for this communication.
2, database query interface module
Video content can be decomposed into subtitles, speech, images, and metadata. Each kind of media content is stored in a different database, called respectively the subtitle database, speech database, image database, and metadata database. Each class of database is a cluster made up of multiple computers, and the data organization, storage method, and query interface differ from class to class. This module encapsulates the query interfaces of all the multimedia databases and exposes a unified calling interface to the upper layers.
3, search condition parsing module
This module sits at the very front of the search dispatching system and receives the user's search request. A single-class search request takes one of the following inputs: one picture (image search), one segment of speech (speech search), one subtitle string (subtitle search), one video clip (clip search), or one class of metadata or any combination of several classes of metadata (any request containing only metadata conditions is called a metadata search).
For an image search, the module receives not image data but image feature values already computed by the client software. Once it determines that the input consists of image feature values, the module processes them further and converts them into the data structure accepted by the image database. Speech conditions are handled in the same way. For subtitle and metadata search conditions, the client performs no processing and sends the user's input from the interface directly to the search dispatching system. After recognizing a subtitle or metadata search, the module performs word segmentation on the subtitle or metadata text and then generates data structures suited to the subtitle and metadata database query interfaces. For a clip search, the client extracts several pictures from the clip, extracts the feature values of each, and sends them to the search dispatching system. Once the search condition parsing module identifies a clip search, it handles it according to the image search flow; within this module, one clip search is equivalent to several image searches.
4, search condition distribution module
This module forwards the user's conditions to the corresponding database interfaces. For image searches and clip searches, the condition data is forwarded to the image database query interface; for speech searches, to the speech database query interface; for subtitle searches, to the subtitle database query interface; and for metadata searches, to the metadata database query interface.
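The routing described above can be sketched as a small lookup table. This is only an illustration: the handler functions, the `ROUTES` table, and the `dispatch` name are hypothetical stand-ins, since the source does not specify the actual query interfaces.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the four database query interfaces.
# Each returns a list of (database_name, condition) pairs so the routing is visible.
def query_image_db(cond: object) -> List[Tuple[str, object]]:
    return [("image_db", cond)]

def query_speech_db(cond: object) -> List[Tuple[str, object]]:
    return [("speech_db", cond)]

def query_subtitle_db(cond: object) -> List[Tuple[str, object]]:
    return [("subtitle_db", cond)]

def query_metadata_db(cond: object) -> List[Tuple[str, object]]:
    return [("metadata_db", cond)]

# Image and clip searches share the image database interface.
ROUTES: Dict[str, Callable[[object], List[Tuple[str, object]]]] = {
    "image": query_image_db,
    "clip": query_image_db,
    "speech": query_speech_db,
    "subtitle": query_subtitle_db,
    "metadata": query_metadata_db,
}

def dispatch(conditions: Dict[str, object]) -> Dict[str, list]:
    """Forward each parsed condition to the interface registered for its kind."""
    results = {}
    for kind, data in conditions.items():
        handler = ROUTES.get(kind)
        if handler is None:
            raise ValueError(f"unknown search kind: {kind}")
        results[kind] = handler(data)
    return results
```

Keeping the mapping in one table means that adding or replacing a database type touches only the dispatching layer, in line with the flexibility argument made in the background section.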
5, scoring policy module
This module stores the parameters needed for scoring results and the parameters of the system's operating mode, in four classes. The first three classes relate to scoring and the fourth to the operating mode. The parameters are read from a configuration file at system startup, which makes the system more configurable; they can also be changed while the system is running by sending commands to it.
1) preset weight values of the metadata fields
These comprise the program name weight, program director weight, program performer weight, program language weight, program place-of-production weight, program category weight, program format weight, and program synopsis weight. Each weight is a floating-point number between 0 and 1, and all the weights sum to 1.
2) preset weight values for duplicate-shot results
These comprise the preset weight for image duplicate-shot results, the preset weight for speech duplicate-shot results, and the preset weight for subtitle duplicate-shot results.
3) preset weight values of the media result classes
These comprise the preset weights for image results, speech results, subtitle results, and metadata results. Each weight is a floating-point number between 0 and 1, and all the weights sum to 1.
4) operating mode parameters
These include the threading mode of each kind of single-class search. Different kinds of search take different amounts of time: image and speech searches are relatively slow, often on the order of seconds, while metadata and subtitle searches are fast, generally about 10 milliseconds per search. To make a combined query as fast as possible, some classes of search must therefore run in parallel. In this system, image search and speech search each run in their own thread, while metadata and subtitle searches are processed serially. The total time of a combined query is then approximately equal to the longest time taken by any single-class search.
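A minimal sketch of this threading mode, assuming each single-class search is available as a zero-argument callable. The function name `run_combined_query` and the `parallel_kinds` parameter are illustrative and do not come from the source.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

def run_combined_query(
    searches: Dict[str, Callable[[], list]],
    parallel_kinds: Iterable[str] = ("image", "speech"),
) -> Dict[str, list]:
    """Run slow searches in their own threads and fast ones serially."""
    parallel = set(parallel_kinds)
    results: Dict[str, list] = {}
    with ThreadPoolExecutor() as pool:
        # Submit the slow searches first so they start running immediately.
        futures = {
            kind: pool.submit(fn)
            for kind, fn in searches.items()
            if kind in parallel
        }
        # Fast searches (metadata, subtitle) run serially in the calling thread
        # while the slow ones are still in progress.
        for kind, fn in searches.items():
            if kind not in parallel:
                results[kind] = fn()
        # Collect the slow results; the wait is bounded by the slowest search.
        for kind, future in futures.items():
            results[kind] = future.result()
    return results
```

With this arrangement the wall-clock time of a combined query is dominated by the slowest parallel search, plus the short serial searches only if they outlast it.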
6, single-class condition search result scoring module
1) scoring algorithm for image search results
The similarity between the image entered by the user and an image stored in the image database is measured by the distance between the two images: the smaller the distance, the more similar the images. This representation is inconvenient for multi-class combined queries, because the other kinds of search all express the similarity between condition and result as a percentage. The distance must therefore be converted into a similarity expressed as a percentage. A threshold is set for the conversion. Any record whose distance exceeds the threshold is filtered out and not returned to the user. The distance is then normalized against the threshold, and the normalized value is subtracted from 1 to give the percentage similarity.
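The conversion can be written out as a short function. This sketch assumes the similarity is one minus the distance divided by the threshold, which is the reading consistent with smaller distances meaning more similar images; the function name is illustrative.

```python
from typing import Optional

def distance_to_similarity(distance: float, threshold: float) -> Optional[float]:
    """Convert an image distance into a similarity in [0, 1].

    Returns None for records beyond the threshold, which are filtered out.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if distance > threshold:
        return None  # filtered: not returned to the user
    # Normalize against the threshold, then subtract from 1 so that
    # distance 0 maps to 1.0 and distance == threshold maps to 0.0.
    return 1.0 - distance / threshold
```

For example, with a threshold of 2.0, a distance of 0.5 gives a similarity of 0.75, and a distance of 3.0 is filtered out.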
2) scoring algorithm of captions Search Results
The main foundation of the matching degree between the program at the subtitle fragment place in the judgement caption database and the program that the user wishes to search out is:
Whether comprise complete condition captions string in A, the subtitle fragment, how many frequencies that condition captions string occurs in subtitle fragment is.
B, condition captions string are divided into after a plurality of speech, and what speech appear in the subtitle fragment, and how many frequencies that each speech occurs is.
If satisfy A then can obtain very high scoring, high more then this result's of frequency that complete condition captions string occurs scoring also can be high more, if do not comprise complete condition captions string in the subtitle fragment then mainly investigate the speech that comprises in the subtitle fragment in what condition captions strings, the scoring more at most that comprises is high more, and the frequency that the speech in the condition captions string occurs is high more, and then scoring also can be high more.As long as but satisfy just affirming of A standard than the scoring height that satisfies the B standard.
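The source does not give concrete formulas for the two passes, so the following is only one possible scheme that respects the stated ordering. It assumes scores in [0, 1], reserves the upper half of the range for full-string matches, caps frequency counts at a hypothetical `cap`, and uses whitespace splitting as a stand-in for a real word segmenter.

```python
def subtitle_score(query: str, fragment: str, cap: int = 5) -> float:
    """Two-pass subtitle score; full-string matches always outrank word matches."""
    if not query.strip():
        return 0.0

    # Pass 1 (criterion A): complete condition string and its frequency.
    full_hits = fragment.count(query)
    if full_hits > 0:
        # Scores lie in (0.5, 1.0]; more occurrences score higher.
        return 0.5 + 0.5 * min(full_hits, cap) / cap

    # Pass 2 (criterion B): segment the query and count each word.
    # str.split() is a placeholder for a real segmenter (assumption).
    query_words = query.split()
    fragment_words = fragment.split()
    counts = [fragment_words.count(word) for word in query_words]
    coverage = sum(1 for c in counts if c > 0) / len(query_words)
    frequency = min(sum(counts), cap) / cap
    # Scores lie in [0.0, 0.5], strictly below any pass-1 score.
    return 0.5 * (0.5 * coverage + 0.5 * frequency)
```

The equal split between coverage and frequency in pass 2 is arbitrary; the only property taken from the text is that more matched words and higher frequencies raise the score while staying below any criterion A result.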
3) scoring algorithm for metadata search results
Metadata search is the traditional way of searching. Although content-based search and metadata search differ greatly, they are not two completely isolated kinds of search. On the contrary, if programs have been manually catalogued, metadata search is both efficient and precise, and combining it with content-based search can greatly improve the performance of the latter. Metadata search is therefore included in this content-based video search system. There are many types of metadata; the system uses a subset of metadata items that can support search: program name, director, performers, language, place of production, category, format, and synopsis. User searches over metadata take two forms. The first is a full-library search, in which the keywords entered by the user are searched for in all the metadata type fields of the metadata database. Scoring proceeds as follows. First, a score is computed for the result item against each class of metadata, using the same method as the scoring algorithm for subtitle search results. Then the per-class scores are combined in a weighted sum according to the importance of each class. Each class of metadata is assigned a weight in advance, a floating-point number between 0 and 1 (for example, the program name matters far more to a video search than the place of production, so the weight of the program name is much higher than that of the place of production). The weighted sum is the final score of the result item.
The second form of metadata search is a combined search over several classes of metadata item. For example, the user may search for programs whose director includes Zhang Yimou and whose performers include Li Lianjie; this is a combined search over two classes of metadata. Scoring proceeds as follows. First, a score is computed for each metadata class in the combination. Then the single-class scores are combined in a weighted sum. The weights used here are not the original preset weights of the metadata classes but converted ones: the sum of the preset weights of all the metadata classes in the combination is taken as unity, and the share of each class's preset weight in that sum is used as its weight factor in the weighted sum.
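Both forms of metadata scoring reduce to a weighted sum and differ only in the weights. In the sketch below the weight values in `FIELD_WEIGHTS` are invented for illustration (the source only requires that they lie between 0 and 1 and sum to 1), and the field and function names are hypothetical.

```python
from typing import Dict

# Hypothetical preset weights for the eight metadata fields; they sum to 1.
FIELD_WEIGHTS: Dict[str, float] = {
    "name": 0.30, "director": 0.15, "performer": 0.15, "language": 0.05,
    "origin": 0.05, "category": 0.10, "format": 0.05, "synopsis": 0.15,
}

def full_library_score(field_scores: Dict[str, float]) -> float:
    """Full-library search: weight each per-field score by its preset weight."""
    return sum(FIELD_WEIGHTS[f] * s for f, s in field_scores.items())

def combined_score(field_scores: Dict[str, float]) -> float:
    """Combined search: rescale the preset weights of the fields actually
    searched so that they sum to 1, then take the weighted sum."""
    total = sum(FIELD_WEIGHTS[f] for f in field_scores)
    return sum(FIELD_WEIGHTS[f] / total * s for f, s in field_scores.items())
```

With these example weights, a director-plus-performer search gives each field a converted weight of 0.5, so a program matching both perfectly scores 1.0 rather than 0.3.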
4) scoring of speech search results
The score of a speech search result depends mainly on the similarity returned by the speech database query interface; this value is used directly as the result's score.
5) scoring algorithm of camera lens reproducible results
The situation that has the result of repetition from the result that database returns only is present in content-based search.Why can produce such situation and be because the decomposition granularity of video content and video segment granularity that customer requirements returns are inconsistent.When design of graphics picture, captions, speech data, video content is to be that minimum unit is stored in respectively in the three class databases with the camera lens.What then require to return in client is video scene, can comprise a plurality of video lens in the video scene.When carrying out the search of this three class user's search condition with database in media data all be to be that least unit is carried out matched and searched with the camera lens when mating.Also all be to be the result of unit when returning to search dispatching system with the camera lens.This just requires us only generating a scene result as a result the time in the face of a plurality of camera lenses of Same Scene.
Scene result's scoring mainly depends on the camera lens that occurs in best result in a plurality of camera lens results scoring and Same Scene number as a result.We can pick out the best result among the camera lens result, and further adjust as the benchmark of scene result scoring with it again.Many scenes result's scoring also can be high more more to belong to the camera lens number of results of same scene in the algorithm of adjusting.We can preestablish three values, and they are respectively the default weighted value of image lens reproducible results, the default weighted value of voice camera lens reproducible results, the default weighted value of captions camera lens reproducible results.Adjustment algorithm is to be multiplied by the weighted value of camera lens reproducible results except higher assessment different camera lens results scorings exceptionally, and is then to these value summations, last again with this with add that the highest scoring must arrive scene result's final scoring.
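The adjustment can be sketched as follows, assuming shot results arrive as (scene ID, score) pairs for one media class. The function names are illustrative, and the source does not say whether the adjusted score is capped, so no cap is applied here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def scene_score(shot_scores: List[float], duplicate_weight: float) -> float:
    """Highest shot score plus the weighted sum of the remaining shot scores."""
    ordered = sorted(shot_scores, reverse=True)
    best, others = ordered[0], ordered[1:]
    return best + duplicate_weight * sum(others)

def merge_shots_into_scenes(
    shot_results: Iterable[Tuple[str, float]], duplicate_weight: float
) -> Dict[str, float]:
    """Collapse per-shot results into one scored result per scene."""
    by_scene: Dict[str, List[float]] = defaultdict(list)
    for scene_id, score in shot_results:
        by_scene[scene_id].append(score)
    return {
        scene_id: scene_score(scores, duplicate_weight)
        for scene_id, scores in by_scene.items()
    }
```

A scene with a single matching shot keeps that shot's score unchanged, while each additional matching shot in the same scene raises the scene score in proportion to the weight configured for that media class.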
7, multi-class combined query result fusion and scoring module
The greatest strength of content-based video search is that video content can be searched by conditions on several classes of media content. Exploiting the intrinsic links between the media classes belonging to the same video content, and the differences in the amount of information that different media content conveys, can greatly improve the efficiency and precision of video search.
In a video search the client can therefore send the dispatching system a combined retrieval condition of several types; image, speech, subtitle, and metadata conditions can be combined arbitrarily. To handle this flexibly, each media class is given an empirically chosen preset weight; each weight is a floating-point number between 0 and 1, and all the weights sum to 1. Often the search conditions do not cover all media types, so the preset weights cannot be used directly to compute the final score. They must first be transformed linearly so that the relative influence of the media types present in this search on the result is preserved. Specifically, the weights of the media conditions contained in the search are normalized. For example, if the preset weight of metadata is 0.6 and that of image is 0.2, and the search combines only these two, the normalized weight of metadata is 0.6/(0.6+0.2)=0.75 and that of image is 0.2/(0.6+0.2)=0.25.
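The normalization and the weighted union can be sketched together. The metadata and image weights below come from the example in the text; the speech and subtitle values are assumptions chosen only so that the four weights sum to 1, and the function names are illustrative.

```python
from typing import Dict, Iterable

# Metadata 0.6 and image 0.2 follow the example in the text;
# speech and subtitle values are assumed for illustration.
MEDIA_WEIGHTS: Dict[str, float] = {
    "metadata": 0.6, "image": 0.2, "speech": 0.1, "subtitle": 0.1,
}

def normalize_weights(kinds: Iterable[str]) -> Dict[str, float]:
    """Rescale the preset weights of the media classes used in this search."""
    kinds = list(kinds)
    total = sum(MEDIA_WEIGHTS[k] for k in kinds)
    return {k: MEDIA_WEIGHTS[k] / total for k in kinds}

def fuse(result_sets: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Union the per-class result sets, summing weighted scores of items
    that appear in more than one set.

    result_sets maps media class -> {program_id: score}.
    """
    weights = normalize_weights(result_sets.keys())
    fused: Dict[str, float] = {}
    for kind, results in result_sets.items():
        for program_id, score in results.items():
            fused[program_id] = fused.get(program_id, 0.0) + weights[kind] * score
    return fused
```

A program found by both the metadata and the image search accumulates both weighted scores, so it ranks above a program with the same score from only one of them.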
8, Search Results generation module
Once final scoring is complete, the final result set is sorted in descending order of score. Because the multimedia databases cannot supply complete program description information for image, speech, subtitle, and some metadata searches, the metadata database is additionally queried by program ID to generate the final results. Only then can the user fully judge whether a program is the one needed.
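A minimal sketch of this last step, assuming the fused scores are keyed by program ID and that the metadata database can be modeled as a dictionary lookup. The `build_response` name and the record fields are hypothetical.

```python
from typing import Dict, List

def build_response(
    fused_scores: Dict[str, float],
    metadata_db: Dict[str, Dict[str, str]],
) -> List[dict]:
    """Sort by descending score and attach program descriptions by ID."""
    ranked = sorted(fused_scores.items(), key=lambda item: item[1], reverse=True)
    response = []
    for program_id, score in ranked:
        # Reverse lookup in the metadata database; missing entries
        # simply contribute no descriptive fields.
        description = metadata_db.get(program_id, {})
        response.append({"program_id": program_id, "score": score, **description})
    return response
```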