CA2856843A1 - Audio fingerprint for content identification - Google Patents

Publication number
CA2856843A1
Authority
CA
Canada
Prior art keywords
content
audio signal
audio
particular segment
television
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA2856843A
Other languages
French (fr)
Other versions
CA2856843C (en)
Inventor
Malcolm Slaney
Andres Hernandez Schafhauser
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yahoo Inc
Original Assignee
Yahoo Inc until 2017
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yahoo Inc until 2017
Publication of CA2856843A1
Application granted
Publication of CA2856843C
Legal status: Expired - Fee Related (current)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/835 Generation of protective data, e.g. certificates
    • H04N21/8352 Generation of protective data, e.g. certificates involving content or source identification data, e.g. Unique Material Identifier [UMID]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84 Generation or processing of descriptive data, e.g. content descriptors

Abstract

Methods and systems for identifying multimedia content streaming through a television include retrieving an audio signal from multimedia content selected for rendering at the television. The retrieved audio signal is partitioned into a plurality of segments of small intervals. A particular segment is analyzed to identify acoustic modulation and to generate a distinct vector for the particular segment based on the acoustic modulation, wherein the vector defines a unique fingerprint of the particular segment of the audio signal. A content database on a server is queried using the vector of the particular segment to obtain content information for multimedia content that matches the fingerprint of the particular segment. The content information is used to identify the multimedia content and the source of the multimedia content that matches the audio signal received for rendering.

Description

AUDIO FINGERPRINT FOR CONTENT IDENTIFICATION
BACKGROUND
Field of the Invention
[0001] The present invention relates to audio fingerprinting, and more particularly, to audio fingerprinting for connected television.
Description of the Related Art
[0002] Television viewing has changed over the years. Advances in technology have allowed television manufacturers to integrate Internet and web features into television sets, providing the ability to connect to and access online interactive media, Internet TV, over-the-top content and on-demand streaming media through these television sets. In addition to the television sets, some external devices, such as set-top boxes, Blu-ray players, game consoles and other companion devices, also come equipped with these Internet and web features so as to enable conventional television sets without such integrated features to access the Internet and web features through these external devices. With these Internet-equipped television sets, viewers are able to search and find videos, movies, photos and other content available on the web, available locally or provided directly by content providers, such as cable content providers, satellite content providers, other users, etc. The Internet features incorporated in the TVs and external devices also offer integration with social network sites so as to allow the viewers to interact socially while allowing traditional TV viewing.
[0003] The Internet equipped television sets engage various applications to allow a user to search and select the content for viewing. However, the identity of the content to be viewed and/or the source of the content may not be available at the television set.
It would be advantageous to be able to identify through a fingerprint the content that is selected for viewing so that additional information related to the content and promotional content, including event related content, can be presented to the viewers. In the current information age, any additional information related to the content is shown to increase user engagement and user satisfaction.
[0004] It is in this context that the embodiments of the invention arise.

SUMMARY
[0005] Embodiments of the present invention describe methods and systems that allow identification of multimedia content selected for viewing on a television. An algorithm executed by a processor of an Internet-enabled television set or an external device retrieves an audio signal from a multimedia content selected for rendering at a television device, performs fingerprinting of a portion of the audio signal by examining modulation characteristics of the audio signal and uses the fingerprint to identify information related to content from a content provider. The content information may be used to identify additional information or promotional media related to the content or for generating an event for rendering alongside the content.
[0006] The embodiments provide a way to determine the source of a multimedia content, such as a video content, using audio signal. Since most of the protected content is identifiable given the audio, analyzing images of the multimedia content is not as important as analyzing the spoken words and music that are broadcast. The current embodiments provide a way to focus on a small segment of the audio signal to identify the entire content by extracting the audio portion of the multimedia content selected for rendering, fingerprinting the audio portion and matching the fingerprint to a corresponding audio portion of multimedia content available in a database to determine the multimedia content. The current embodiments provide an efficient algorithm that focuses on the modulation characteristics of a portion of the audio signal to match to multimedia content obtained from a plurality of content providers. The algorithm also provides the ability to verify that the audio signal is for the same content by storing information related to the content in a local cache and performing periodic verification of the audio signal streaming to the television set. The algorithm performs periodic verification by generating new fingerprints for the streaming audio signal and comparing against the content information in the local cache to determine if the signals continue to match to the content in the local cache or if there is a deviation. If there is deviation, then the algorithm initiates a search on a database server to find a match of content stored therein and the matching cycle continues. If there is no deviation, there is no need to query a database server for finding a match, thereby resulting in resource optimization and matching speed while providing an efficient and accurate matching of the content.
[0007] It should be appreciated that the present invention can be implemented in numerous ways, such as, methods and systems. Several inventive embodiments of the present invention are described below.
[0008] In one embodiment, a method for identifying multimedia content streaming through a television is disclosed. The method includes retrieving an audio signal from multimedia content selected for rendering at the television. The retrieved audio signal is partitioned into a plurality of segments of small intervals. A particular segment is analyzed to identify acoustic modulations and to generate a distinct vector for the particular segment based on the acoustic modulation. The vector defines a unique fingerprint of the particular segment of the audio signal. A content database on a server is queried using the vector of the particular segment to obtain content information for multimedia content that matches the fingerprint of the particular segment. The content information is used to identify the multimedia content and the source of the multimedia content that matches the audio signal received for rendering.
[0009] In yet another embodiment, a method for identifying content streaming through a television is disclosed. The method includes retrieving an audio signal from a content selected for rendering at the television. The audio signal is partitioned into a plurality of segments of small intervals. A particular segment of the audio signal is analyzed to identify acoustic modulations and to generate a vector for the particular segment based on the acoustic modulation. The vector identifies a plurality of floating point numbers related to data points of the particular segment and defines a unique audio fingerprint for the particular segment of the audio signal. A content database is searched to identify one or more content with audio segments having data points that are closest to the plurality of floating point numbers of the particular segment. The content database is a repository of pre-computed data points for a plurality of audio segments representing different portions of a plurality of audio signals for a plurality of content obtained from a plurality of content providers. A content with an audio segment that has data points closest to the floating point numbers of the particular segment is identified. A content provider database is queried using a content identifier of the content with the audio segment that matches the particular segment. A portion of the content is received from the content provider database in response to the query. The portion of the content includes content recording matching the particular segment and additional recording for a pre-defined amount of time. The portion of the content received from the content provider database is used in subsequent matching of the audio signal streaming through the television.
[0010] In yet another embodiment, a method for matching promotional media for content streaming through a television is disclosed. The method includes retrieving an audio signal from a content selected for rendering at the television. The audio signal is partitioned into a plurality of segments of small intervals. A particular segment of the audio signal is analyzed to identify modulation characteristics and to generate a vector of a plurality of floating point numbers related to data points associated with the audio segment. The vector defines a unique fingerprint of the audio segment. A content database is searched to identify a content having an audio segment with data points that are closest to the plurality of floating point numbers of the particular segment of the audio signal. The content database is a repository of pre-computed data points for a plurality of audio segments representing different portions of a plurality of audio signals associated with a plurality of content obtained from a plurality of content providers. A promotional media related to the content is identified from a service database using the fingerprint of the particular segment. A portion of content is received from a content provider database, and metadata and assets related to the identified promotional media are received from an ad campaign database. Multimedia content for the promotional media is assembled using the retrieved metadata and assets for rendering alongside the content related to the audio signal streaming over the television.
[0011] Thus, the embodiments of the invention provide an efficient search and matching algorithm for identifying a source of the content streaming through the television set by fingerprinting a portion of the audio signal extracted from the content using acoustic modulation and matching the fingerprint against content stored in a content database. The matching algorithm uses optimal system resources while providing efficient matching. The algorithm continues to verify the validity of the matching through periodic fingerprinting and matching. The algorithm uses the result of the periodic matching to identify and update event or additional information for rendering alongside the content. The additional information relates to the content currently streaming through the television set and is provided alongside the content in a seamless manner, thereby enhancing the user's television viewing experience. The satisfactory user experience can be exploited to increase the monetization by targeting appropriate promotional media to the user.
[0012] Other aspects of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] The invention may best be understood by reference to the following description taken in conjunction with the accompanying drawings.
[0014] Figure 1 illustrates a simplified overview of a system equipped with an algorithm including various modules within the algorithm for identifying a source and content of multimedia content streaming through a television, in one embodiment of the invention.
[0015] Figures 2a-2f illustrate a simplified graph comparing modulation characteristics of a sample audio segment of an audio signal using C and Matlab implementation of an algorithm, in one embodiment of the invention.
[0016] Figure 3 illustrates a graphical representation of a locality sensitive hashing technology used in matching a particular segment to a corresponding segment of content, in one embodiment of the invention.
[0017] Figure 4 illustrates an exemplary modulation flowchart that is used to generate a distinct vector by analyzing the modulation characteristics of an audio segment, in one embodiment.
[0018] Figure 5 illustrates an exemplary audio fingerprint flowchart followed by an algorithm to generate a fingerprint of an audio segment, in one embodiment of the invention.
[0019] Figure 6 illustrates a flow chart of process flow operations used by an algorithm for identifying multimedia content streaming through a television, in one embodiment of the invention.
[0020] Figure 7 illustrates a flow chart of various process flow operations used by an algorithm for identifying multimedia content streaming through a television, in an alternate embodiment of the invention.
[0021] Figure 8 illustrates an alternate embodiment identifying process flow operations for matching promotional media to content streaming through a television.
DETAILED DESCRIPTION
[0022] Broadly speaking, the embodiments of the present invention provide methods and systems to identify multimedia content streaming through a television. An algorithm executing on a processor of an Internet-enabled television or an Internet-enabled external device connected to the television selects an audio segment from the content selected for rendering, generates an audio fingerprint and uses the audio fingerprint to identify a source of multimedia content and multimedia content information. The algorithm performs the matching using acoustic modulation characteristics of the audio segment and ensures proper matching through periodic verification while using network resources in an optimal and effective manner. The algorithm utilizes a local cache available to the algorithm for storing matching content and performing periodic verification to ensure that the identified content continues to relate to the streaming content at the television. The algorithm also uses the multimedia content information to identify additional information, such as promotional media and/or an event related to the content, for rendering alongside the content.
[0023] With the brief overview, various embodiments of the invention will now be described in detail with reference to the figures. Figure 1 illustrates a simplified overview of the system identifying high-level software/hardware modules that are used to identify multimedia content streaming to a television. The system includes a rendering device, such as a television 100, to request and receive content from a content provider. In one embodiment, the television includes an Internet-connection interface 110-a integrated into the television. In another embodiment, the television is connected to an external device, such as a set-top box 110-b with integrated Internet-enabled interface. The Internet-connection/enabled interface, for example, may include Internet protocol suite to receive television services over the Internet, instead of being delivered through traditional modes, such as satellite signal or cable television formats. The television services may include live television, time-shifted television and video-on-demand (VOD) content. Typically, in the Internet-enabled television, the content remains on a content provider's network servers and the requested program is streamed to the television. As a result, the Internet-connection interface in the television is unaware of the source of the requested content and information related to the content. The television is also equipped with a hardware audio-capturing system (HACs) 115 that is configured to interact with the Internet-enabled/connected interface and extract a portion of the audio signal from the content selected from a content provider's network server for streaming to the television, wherein the content selected for streaming is in response to a request by a viewer and could be any one of live television, time-shifted television or VOD content. 
The HACs interacts with an algorithm 120, such as audio processing algorithm, available at the television to transmit the audio signal captured from the Internet-connection interface for further processing.
[0024] The algorithm 120 receives a portion of the audio signal and partitions the portion of the audio signal into a plurality of segments of small intervals. In one embodiment, the portion of audio signal received by the algorithm may be partitioned into segments of 5 second intervals.
The algorithm then selects a particular segment for analyzing. In one embodiment, the algorithm may select a particular segment for analysis based on the payload data of the content contained within. The algorithm then analyzes the particular audio segment to determine acoustic modulations of the audio signal and to generate a distinct vector of floating-point numbers. The vector defines the audio fingerprint for the audio signal based on the modulation characteristic of the particular segment. The process of generating a distinct vector defining the audio fingerprint will be described further down with reference to Figure 1. In one embodiment, using the generated vector, the algorithm queries a content database available on a local server associated with the television to find a match of the fingerprint with data available on the server. The process of matching the fingerprint to content in a content database will be described in detail later with reference to other figures. Upon finding a match, the algorithm obtains content information including source of the multimedia content from a content database. The algorithm may use the content information to retrieve content recording that covers a time of the particular segment and additional recording for a pre-defined amount of time and store it in a local cache 125. The information in the local cache may be used by the algorithm to further verify the content streaming through the television.
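The partitioning and cache-verification steps of this paragraph can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `partition_segments` and `verify_against_cache`, the use of Euclidean distance, the `max_distance` threshold, and the choice to drop a trailing partial segment are all assumptions made for the example.

```python
import math
from typing import Dict, List, Optional, Sequence


def partition_segments(samples: Sequence[float], sample_rate: int,
                       segment_seconds: float = 5.0) -> List[List[float]]:
    """Split a captured audio signal into consecutive fixed-length segments.

    A trailing partial segment is dropped (an assumption of this sketch).
    """
    seg_len = int(sample_rate * segment_seconds)
    if seg_len <= 0:
        raise ValueError("segment length must be positive")
    return [list(samples[i:i + seg_len])
            for i in range(0, len(samples) - seg_len + 1, seg_len)]


def verify_against_cache(fingerprint: Sequence[float],
                         cached: Dict[str, Sequence[float]],
                         max_distance: float) -> Optional[str]:
    """Return the id of the closest cached fingerprint within max_distance.

    None signals a deviation, in which case the caller would fall back to
    querying the content database on the server.
    """
    best_id, best_dist = None, max_distance
    for content_id, cached_fp in cached.items():
        dist = math.dist(fingerprint, cached_fp)  # Euclidean distance (assumed metric)
        if dist <= best_dist:
            best_id, best_dist = content_id, dist
    return best_id
```

A caller would fingerprint each new segment, call `verify_against_cache`, and contact the server only when the result is `None`.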
[0025] In another embodiment, the local cache may be used to pre-populate content and corresponding fingerprints and the algorithm may use the information in the local cache to find a match of the segment of audio signal. In this embodiment, the backend server dynamically collects content related information and the corresponding fingerprint information based on what the user of the television device normally watches, watches more often, what is popular in a specific geographical area of the user (using a zip code of the user), etc.
When a user selects content for watching on the television, the algorithm at the television requests the server to download the cache. The server, in response to the request from the algorithm, pushes different subsets of content and the corresponding matching fingerprints onto the local cache of the television. The algorithm then uses the information in the local cache to identify the content selected by the user. The information in the local cache can be used until it expires. When the information expires, the algorithm sends a refresh request for the content and the fingerprints associated with the content to the backend server, and the backend server forwards the appropriate content and fingerprint information to load the local cache.
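The expire-and-refresh behaviour of the pre-populated cache might look like the sketch below. The class name `LocalCache`, the `fetch` callback standing in for the backend server request, and the single time-to-live value are hypothetical; the text does not say how expiry is tracked.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LocalCache:
    """Fingerprint cache that reloads from the backend once it has expired."""
    ttl_seconds: float
    fetch: Callable[[], Dict[str, List[float]]]  # hypothetical backend request
    entries: Dict[str, List[float]] = field(default_factory=dict)
    loaded_at: Optional[float] = None

    def get_entries(self, now: float) -> Dict[str, List[float]]:
        """Return cached fingerprints, refreshing first if never loaded or expired."""
        expired = self.loaded_at is None or now - self.loaded_at >= self.ttl_seconds
        if expired:
            self.entries = self.fetch()  # the "refresh request" to the server
            self.loaded_at = now
        return self.entries
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the expiry logic easy to test.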
[0026] In one embodiment, the algorithm performs fingerprint matching by querying one or more databases available on one or more network servers. For instance, the algorithm may first generate a fingerprint of the selected segment of audio signal and query a content database 210 on a network server to find a match of the fingerprint. The content database may be a repository of fingerprints for a plurality of portions of a plurality of audio signals obtained from a plurality of content providers. In one embodiment, the content information from a plurality of content providers may be obtained ahead of time and stored in a content database on a server that is locally available to the algorithm so that the content can be easily identified irrespective of time and location where it is broadcast. The audio portion of the content in the content database may be fingerprinted and these fingerprints may be stored either alongside the content or in a separate database on a server that is equipped with search software and used in the matching of the content that is presently selected for viewing at the television. The search software on the server aids in searching the database and finding a match for content. Using the information, the algorithm executing on a processor of the television then queries a second server, such as an event server or business information service (BIS) server, to determine if there are any BIS service(s), ad campaigns or events for this audio scheduled for the particular time of day that the selected content is streamed. If a service, event or ad campaign scheduled for the time period is found, then the algorithm fetches metadata and assets of the service/event/ad campaign from an ad campaign database to create an application or video for the service/ad campaign. The application or video is rendered alongside the content streaming in the television and provides additional information or promotional media related to the content. A viewer viewing the selected content is provided with additional information that is most relevant to the content being viewed, thereby enriching the user's viewing experience. The algorithm provides the ability to extract features of a small portion of the audio signal and use it to match and describe complete video content selected for streaming.
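The time-of-day lookup against the event or BIS server can be illustrated with a small in-memory filter. The `Campaign` record, its field names, and the assumption that a scheduling window does not cross midnight are all hypothetical; a real deployment would issue this query to a server.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Sequence


@dataclass(frozen=True)
class Campaign:
    """Hypothetical scheduling record for a service, event or ad campaign."""
    campaign_id: str
    content_id: str
    start: time
    end: time  # exclusive; windows crossing midnight are not handled here


def active_campaigns(campaigns: Sequence[Campaign], content_id: str,
                     at: datetime) -> List[Campaign]:
    """Return campaigns scheduled for the matched content at the given moment."""
    now = at.time()
    return [c for c in campaigns
            if c.content_id == content_id and c.start <= now < c.end]
```

The metadata and assets for each returned campaign would then be fetched from the ad campaign database.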
[0027] Feature extraction and fingerprinting will now be described in detail with reference to Figure 1. In a typical audio/video recording, the peaks and transitions of computed features of the media do not change much during editing, compression and transmission.
Further, in the speech world, it is determined that most of the speech information is centered around 4 Hz.
Consequently, the algorithm captures modulation characteristics of the audio signal using a modulation spectrogram and uses audio-modulation fingerprint technology to fingerprint the video. The algorithm generates the spectrogram over time for a particular selected segment of the audio signal and looks for energy distributed around different frequencies. In order to achieve this, the audio signal within the selected segment is split into different bands/channels using bandpass filters. In one embodiment, the selected audio segment is split using 13 linearly-spaced filters to obtain 13 different channels. Additional information related to splitting of the audio signal using bandpass filters is described in "Auditory Toolbox" available at https://engineering.purdue.edu/~malcolm/interval/1998-010/, which is incorporated herein by reference. One or more channels may be combined to provide wider channels for the analysis.
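As a rough illustration of the 13 linearly-spaced channels, the sketch below computes band edges and merges adjacent bands into wider ones. It does not design the bandpass filters themselves, and the frequency range in the usage note is a made-up example, since the text does not state one; both function names are hypothetical.

```python
from typing import List, Tuple

Band = Tuple[float, float]  # (low edge in Hz, high edge in Hz)


def linear_band_edges(low_hz: float, high_hz: float, n_bands: int = 13) -> List[Band]:
    """Divide [low_hz, high_hz] into n_bands equal-width bands."""
    if n_bands <= 0 or high_hz <= low_hz:
        raise ValueError("need n_bands > 0 and high_hz > low_hz")
    width = (high_hz - low_hz) / n_bands
    return [(low_hz + i * width, low_hz + (i + 1) * width) for i in range(n_bands)]


def combine_adjacent(bands: List[Band], group_size: int) -> List[Band]:
    """Merge each run of group_size consecutive bands into one wider band."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [(bands[i][0], bands[min(i + group_size, len(bands)) - 1][1])
            for i in range(0, len(bands), group_size)]
```

For example, `linear_band_edges(100.0, 4000.0)` yields 13 bands of 300 Hz each, and `combine_adjacent(bands, 2)` reduces them to 7 wider bands.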
[0028] After obtaining the audio signal in different channels, the algorithm computes the modulation energy in each channel by taking absolute value of each channel's signal and then uses a low-pass filter with a cut-off frequency at 6 Hz to smooth the response. The modulation energy is a rough measure of temporal information in the channel. The modulation energy provides an important measure of how the audio signal changes over time. In one embodiment, the algorithm uses fast Fourier transform algorithm (FFT) to analyze modulation in each channel. The magnitude obtained from the FFT provides a measure of how much energy is in each channel at each frequency. Figure 5 illustrates an audio fingerprint flowchart followed by the algorithm to generate the audio fingerprint for the audio segment extracted from the content streaming to the television, in one embodiment of the invention. As illustrated, the fingerprint is generated by extracting an audio signal from the streaming content and passing a particular segment of the audio signal through a filterbank to split the audio segment into a plurality of channels at different frequencies. The magnitude of modulation at each channel in each frequency is measured to determine the energy distribution in each channel at each frequency.
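The per-channel computation described above (rectify, smooth with a 6 Hz low-pass filter, then take the magnitude of the Fourier transform) can be sketched in plain Python. The one-pole filter is an assumed stand-in, because the text does not name a filter design, and a direct DFT over only the lowest bins replaces a full FFT for brevity; `modulation_envelope` and `dft_magnitudes` are illustrative names.

```python
import cmath
import math
from typing import List, Sequence


def modulation_envelope(channel: Sequence[float], sample_rate: float,
                        cutoff_hz: float = 6.0) -> List[float]:
    """Rectify one channel and smooth it with a one-pole low-pass filter."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)                  # smoothing factor of the one-pole filter
    y = abs(channel[0]) if channel else 0.0
    out = []
    for x in channel:
        y += alpha * (abs(x) - y)           # abs() is the rectification step
        out.append(y)
    return out


def dft_magnitudes(signal: Sequence[float], n_bins: int) -> List[float]:
    """Magnitudes of the first n_bins DFT bins (bin k is k * sample_rate / N Hz)."""
    n = len(signal)
    mags = []
    for k in range(n_bins):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        mags.append(abs(acc))               # phase is discarded
    return mags
```

With a 5 second segment the bins are 0.2 Hz apart, so bins 0 through 30 cover 0 Hz to 6 Hz.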
[0029] Focusing just on the magnitude and ignoring the phase of the frequency spectrum enables the algorithm to obtain the same fingerprint for the content even when the audio data has shifted slightly in the analysis window. Using the modulation spectrogram, the algorithm computes, for each bandpass channel, 18 measurements of each channel's modulation at frequencies from 0 Hz (DC) to about 6 Hz. The 18 measurements are selectively chosen from a two-dimensional array of channel number versus modulation frequency. Thus, with 13 channels of modulation spectrum and 18 independent frequency measurements at each channel, the algorithm computes a single, distinct vector of 234 elements (i.e., 13 * 18) for the selected segment of the audio signal. Each of the elements of the vector is a data point represented as a floating point number. This distinct vector succinctly describes the modulation in the audio signal over the short segment and forms the fingerprint for the audio signal.
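Assembling the 234-element vector from 13 channels of 18 measurements each can be sketched as below. The text does not say how the 18 measurements are chosen, so picking evenly spaced bins is purely an assumption of this example, as are the names `pick_bins` and `build_fingerprint`.

```python
from typing import List, Sequence

N_CHANNELS = 13
N_MEASUREMENTS = 18


def pick_bins(n_available: int, n_pick: int = N_MEASUREMENTS) -> List[int]:
    """Evenly spaced bin indices from 0 to n_available - 1 (assumed selection rule)."""
    if n_available < n_pick:
        raise ValueError("not enough modulation bins to pick from")
    return [round(i * (n_available - 1) / (n_pick - 1)) for i in range(n_pick)]


def build_fingerprint(channel_spectra: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the selected magnitudes of every channel into one vector."""
    if len(channel_spectra) != N_CHANNELS:
        raise ValueError(f"expected {N_CHANNELS} channels")
    vector: List[float] = []
    for spectrum in channel_spectra:
        vector.extend(float(spectrum[k]) for k in pick_bins(len(spectrum)))
    return vector  # 13 * 18 = 234 floating point numbers
```

Each input spectrum would be the list of modulation magnitudes for one bandpass channel, covering roughly 0 Hz to 6 Hz.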
[0030] Figure 4 illustrates a modulation flowchart followed by the algorithm to generate a distinct vector for an audio segment of an audio signal extracted from content that is selected for streaming at the television. The algorithm examines acoustic modulation of a particular channel and uses FFT to generate an acoustic spectrum for the particular channel. Selected data points from the acoustic spectrum (234 data points) are used to compute a vector of the audio segment.
[0031] Figures 2a-2f illustrate audio signal spectrograms generated and used by the algorithm to match to content from a content provider. Figures 2a, 2b and 2c are generated using a Matlab implementation of a three-modulated-tone test with frequency modulation 441, 881 and 1201 Hz modulated with 2, 3 and 4 Hz. When a low frequency modulator filter (e.g., 2 Hz) is used, a low channel with a low modulation frequency is recorded, as illustrated by Figure 2a (Matlab implementation). Similarly, Figure 2b illustrates the result from a slightly higher frequency modulator filter of 3 Hz and Figure 2c illustrates the result from a still higher frequency modulator filter of 4 Hz. It should be noted herein that the audio signal spectrogram generated by using the Matlab implementation is exemplary and should not be considered restrictive. Other types of implementation, such as a C implementation, may be used, as shown in Figures 2d, 2e and 2f. It can be noticed from Figures 2a-2f that the results from the C implementation are similar to the results from the Matlab implementation at each of the 3 different modulator frequencies. Further, each frequency of sound has its own unique fingerprint and an audio signal with these different frequencies will generate its own unique combination of fingerprints. The bigger the fingerprint, the easier it is to match. In order to get a good sampling, a 5 second window is selected for segmentation and fingerprinting, in one embodiment. The time period used for segmenting the audio signal, the number of channels and the number of frequencies are exemplary and should not be considered restrictive.
[0032] After generating the spectrogram for a particular audio segment and generating a distinct vector, the algorithm uses the vector to find a match of content in a content database. The content database may be located on a server and available to the algorithm through a network, such as the Internet. The content database is a repository of content received from a plurality of content providers with audio signals of the content already fingerprinted. The fingerprints of the audio signals are stored alongside the content or in a separate database with each fingerprint mapped to the content. The algorithm may use various techniques to find a match of the vector. In one embodiment, the algorithm uses a randomized algorithm, such as the Locality Sensitive Hashing (LSH) methodology, to look up and find a match of the content in the content database. When new content is selected for streaming to the television, the algorithm captures the audio portion of the content and partitions it into segments of small intervals of 5 seconds, for example. The algorithm then performs the same analysis (explained earlier) to obtain a fingerprint of a particular segment of the captured audio signal, and this fingerprint is matched against the ones stored in the database using the floating point numbers of the vector. It should be noted that even if the content of the captured audio signal is the same as an audio signal in the content database, the signals might not exactly match. This might be because the audio signal in the database may have undergone a different compression technique and may have a different temporal offset than the audio signal associated with the particular segment that is being matched. Thus, direct and regular matching will not provide the expected matching result. In order to accommodate this change in the compression techniques, the algorithm may use the LSH technique to find a nearest neighbor match.
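The LSH lookup may be pictured with the following minimal sketch. The description does not specify a hash family, so the sign-of-random-projection hashing used here is an assumption, as are the names `make_hyperplanes`, `lsh_key`, `build_index` and `candidates`, the 16-bit key width, and the synthetic database of random vectors.

```python
import random

DIM = 234       # fingerprint length from the description
NUM_BITS = 16   # hypothetical hash width

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (assumed hash family, not from the source)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: the sign of the projection of the vector onto it."""
    return tuple(sum(v * h for v, h in zip(vector, plane)) >= 0.0
                 for plane in hyperplanes)

def build_index(fingerprints, hyperplanes):
    """Map each hash key to the content IDs whose fingerprints fall in that bucket."""
    index = {}
    for content_id, vec in fingerprints.items():
        index.setdefault(lsh_key(vec, hyperplanes), []).append(content_id)
    return index

def candidates(query, index, hyperplanes):
    """Return the content IDs sharing the query's bucket (possibly empty)."""
    return index.get(lsh_key(query, hyperplanes), [])

# Hypothetical database of five pre-computed fingerprints.
rng = random.Random(1)
database = {f"content-{i}": [rng.random() for _ in range(DIM)] for i in range(5)}
planes = make_hyperplanes(NUM_BITS, DIM)
index = build_index(database, planes)

# A uniformly louder copy of a stored fingerprint lands in the same bucket,
# since scaling by a positive factor does not change any projection's sign.
louder = [2.0 * v for v in database["content-3"]]
print(candidates(louder, index, planes))
```

In such a scheme the bucket only narrows the search; the candidates it returns would still be compared by distance to select the nearest neighbor.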
[0033] Figure 3 illustrates the comparison of the fingerprint of a particular audio segment with the pre-determined fingerprints from a content database using the LSH matching technique. The LSH matching uses each of the 234 floating point numbers from the segment of audio signal for the new content streaming to the television and tries to match them to corresponding data points of an audio signal for content in the content database. The 234 floating point numbers were obtained using the modulation spectrogram as explained earlier. It should be understood that generating a vector of 234 floating point numbers and using the LSH matching technique for matching the vector of 234 floating point numbers is exemplary and should not be considered restrictive. As a result, alternate ways of matching the segment of audio signal may be employed. The algorithm computes the distance between each of the data points of an audio segment in the content database and the corresponding floating point numbers of the particular segment of audio signal. When the algorithm finds a plurality of audio signals with data points that are close to the corresponding data points of the particular audio signal, the algorithm determines the audio signal of content whose data points are closest to the data points defined by the floating point numbers in the vector of the particular audio segment. When more than one content has audio signals that are closest to the data points of the particular audio segment, the algorithm takes a further sampling by taking a subsequent audio segment of the content selected for streaming, analyzes the subsequent audio segment to define a second vector and uses the second vector to find a match. The sampling, analyzing and matching may be continued until a good match is found. For more information about the Locality Sensitive Hashing technique, reference can be made to the IEEE publication entitled "Locality-Sensitive Hashing for Finding Nearest Neighbors," by Malcolm Slaney and Michael Casey, IEEE Signal Processing Magazine, March 2008, which is incorporated herein by reference.
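The distance comparison and the handling of an ambiguous result may be sketched as follows. This is illustrative only: the description does not name a distance metric, so Euclidean distance is an assumption, as are the function names and the hypothetical `tie_margin` parameter. Short vectors are used in place of 234-element vectors for readability.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(query, candidate_vectors, tie_margin=1e-9):
    """Return (content_id, ambiguous) for the candidate closest to the query.

    ambiguous is True when the runner-up is within tie_margin of the best
    distance, in which case a subsequent segment would be sampled.
    """
    ranked = sorted((distance(query, vec), cid)
                    for cid, vec in candidate_vectors.items())
    best_dist, best_id = ranked[0]
    ambiguous = len(ranked) > 1 and ranked[1][0] - best_dist <= tie_margin
    return best_id, ambiguous

# Clear winner: the query sits next to "a".
clear = best_match([0.1, 0.0, 0.0], {"a": [0.0, 0.0, 0.0], "b": [1.0, 1.0, 1.0]})

# Tie: the query is equidistant from "a" and "b", so another segment is needed.
tied = best_match([1.0, 0.0], {"a": [0.0, 0.0], "b": [2.0, 0.0]})

print(clear, tied)
```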
[0034] The matching of the content enables the algorithm to identify the source of the content and to retrieve information associated with the content selected for streaming to the television. In one embodiment, the algorithm requests and receives content from a server that includes a match of the fingerprint for the content for the period of the particular segment to which it is matched and also additional upcoming fingerprints for a pre-defined amount of time. The server interacts with a plurality of content providers and receives content from these sources. The additional content is used for subsequent matching of the audio signal. In one embodiment, the content and the additional content are received and stored in a local cache available to the algorithm. The algorithm may ensure that the audio segment is matched to the correct content by verifying that one or more of the subsequent segments of the audio signal continue to match the audio segments of the content stored in the local cache. If the subsequent audio segments of the audio signal match the audio segments of the content, there is no need to query a server to obtain the content. Instead, the content may be provided from the local cache. If, on the other hand, the subsequent audio segments do not match the content stored in the local cache, new content from the content database matching the particular audio segment is retrieved and stored in the local cache for subsequent matching.
[0035] There are many options to cache and distribute the work using the audio fingerprint matching of the current embodiments. Some of the most important options include advance hinting, local caching, and verification. Advance hinting is a method where a single fingerprint request is answered with the matching content identifier and a sequence of upcoming fingerprints. The newly received fingerprint along with the content ID is stored in a local cache on the TV for subsequent reference and verification. The upcoming fingerprints allow the TV or set-top-box connected to the TV to identify what is coming in the future and simply check the newly calculated fingerprints of the content against the upcoming fingerprints stored in the local cache. If the newly calculated fingerprints match the expected upcoming fingerprints then there is no change in the content provider source, and no need to query the content provider for the content identifier.
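Advance hinting may be sketched, purely as an illustration, with a small cache that holds the content identifier and the upcoming fingerprints. The class name `HintCache`, the per-element `tolerance`, and the choice to flush the cache on a mismatch are assumptions introduced for this sketch.

```python
from collections import deque

class HintCache:
    """Local cache of a content ID plus the fingerprints expected next."""

    def __init__(self, tolerance=0.5):
        self.content_id = None
        self.upcoming = deque()
        self.tolerance = tolerance   # hypothetical per-element match tolerance

    def store(self, content_id, upcoming_fingerprints):
        """Save the server's answer: matched content ID and upcoming fingerprints."""
        self.content_id = content_id
        self.upcoming = deque(upcoming_fingerprints)

    def check(self, fingerprint):
        """Return the cached content ID if the fingerprint matches the next
        expected one; otherwise flush the cache and return None."""
        if not self.upcoming:
            return None
        expected = self.upcoming.popleft()
        worst = max(abs(a - b) for a, b in zip(fingerprint, expected))
        if worst <= self.tolerance:
            return self.content_id
        self.content_id = None       # content changed: a server query is needed
        self.upcoming.clear()
        return None

cache = HintCache()
cache.store("show-42", [[1.0, 2.0], [3.0, 4.0]])
first = cache.check([1.1, 2.0])    # close to the first expected fingerprint
second = cache.check([9.0, 9.0])   # far from the second expected fingerprint
print(first, second)
```

A return value of None in this sketch corresponds to the case in which the set-top box or TV would fall back to querying the server.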
[0036] In one embodiment, the local caching option is called, wherein the fingerprint and the content matching the fingerprint of the audio signal are downloaded and stored in the local cache for matching against the upcoming fingerprints of the audio signal. In another embodiment, content and a set of fingerprints related to a plurality of content are downloaded to the local device (i.e., the TV) and stored in the local cache. In this embodiment, the set of fingerprints may relate to content that is scheduled for a specific period of time. The client can request and receive the set of fingerprints periodically, such as once a day or once every 3 hours. In one embodiment, the client computes the fingerprint from the audio signal and only performs an action on the content if the content matches one of the known fingerprints stored in the local cache. By performing an action only when there is a match, network resources are preserved as the algorithm avoids making unnecessary server trips to find a match.
[0037] In one embodiment, the verification option is called, wherein the algorithm sends a request to the server along with a content identifier based on a best guess of the content. In one embodiment, the best guess of the content may be based on a previous query. The server receiving such a request just verifies and confirms that the fingerprint received from the algorithm in the TV is indeed the expected fingerprint of the content related to the content identifier obtained in the request. This option also saves network resources as the server is already provided with enough content related information to identify the content. The local cache, together with the fingerprint, thus provides a faster and more accurate match of the content that is selected for rendering at the TV while preserving network resources.
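The server side of the verification option may be sketched as a single comparison against the fingerprint expected for the guessed identifier. The function name `verify_guess`, the dictionary `expected_by_id` and the `tolerance` value are hypothetical and serve only to illustrate the idea.

```python
def verify_guess(content_id, fingerprint, expected_by_id, tolerance=0.5):
    """Confirm that the fingerprint is the one expected for the guessed content ID.

    Only one stored fingerprint is examined, so no database-wide search is needed.
    """
    expected = expected_by_id.get(content_id)
    if expected is None or len(expected) != len(fingerprint):
        return False
    return all(abs(a - b) <= tolerance for a, b in zip(fingerprint, expected))

# Hypothetical server-side table of expected fingerprints.
expected_by_id = {"show-42": [1.0, 2.0, 3.0]}

confirmed = verify_guess("show-42", [1.1, 2.0, 2.9], expected_by_id)
rejected = verify_guess("show-42", [5.0, 2.0, 3.0], expected_by_id)
unknown = verify_guess("show-99", [1.0, 2.0, 3.0], expected_by_id)
print(confirmed, rejected, unknown)
```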
[0038] The content identity information is used by the algorithm to identify an event, promotional media or ad campaign and fetch metadata and assets for the ad campaign or event, in one embodiment of the invention. In this embodiment, metadata and assets are used to assemble a video or application for rendering alongside the content. Once the video or application is rendered alongside the content, the algorithm continues to verify the validity of the matching by continuing to perform matching of subsequent segments of audio signal to ensure that the content has not changed over time. If the content has changed, then the algorithm reinitializes the data in the local cache and starts the extraction of audio signal, generation of the distinct vector and matching of the vector to content in a content database to identify source of the new content and information related to the new content so that the promotional media or event can be identified and assembled for rendering with the new content.
[0039] Figure 6 illustrates a flow chart of operations used for identifying multimedia content streaming through a television, in one embodiment of the invention. The method begins at operation 710 wherein an audio signal is retrieved from a multimedia content selected for rendering at the television. The multimedia content may be obtained from any one of the content sources including satellite provider, cable provider, DVR, Blu-ray player, live media from the Internet. The multimedia content might be stored on a content provider server and streamed to the television upon request from a viewer. As a result, the source of the content and content information is not available at the Internet-connection interface of the television or external device connected to the television. In order to identify the source of the content and content information, an algorithm may partition the audio signal into a plurality of segments of small intervals, as illustrated in operation 720.
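The partitioning of operation 720 may be sketched as follows, assuming the audio signal is available as a flat list of samples. The function name `partition`, the toy sample rate and the decision to drop a trailing partial segment are assumptions made for this sketch.

```python
def partition(samples, sample_rate, seconds=5):
    """Split samples into consecutive fixed-length segments.

    A trailing remainder shorter than one full segment is dropped (assumption).
    """
    size = sample_rate * seconds
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]

# Toy signal: 12 seconds at a hypothetical rate of 8 samples per second.
signal = list(range(12 * 8))
segments = partition(signal, sample_rate=8)
print(len(segments), [len(s) for s in segments])
```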
[0040] A particular segment of the audio signal is analyzed to identify acoustic modulations in the particular segment, as illustrated in operation 730. The particular segment is selected for analysis based on the payload data contained within. The analysis of the particular segment results in the identification of a plurality of data points represented by distinct floating point numbers. The plurality of floating point numbers is used to generate a vector. The vector of floating point numbers is used to query a content database on a server, as illustrated in operation 740. The server is equipped with a search algorithm that assists in the location of content from a particular content provider, wherein the content of the particular content provider includes a segment of data whose data points either match or are in close proximity to the floating point numbers of the particular segment. The content in the content database is obtained from a plurality of sources, and the audio signals of these contents are pre-fingerprinted and stored alongside the content or in a separate database and mapped to the contents in the content database. As a result, when an audio segment of the content from a particular content provider matches the particular segment of content streaming to the television, information related to the content and the source of the content is retrieved from the content provider. The retrieved information may be stored in a local cache and used for further verification of the content streaming through the television.
[0041] Figure 7 illustrates an alternate embodiment of the invention for identifying content streaming through a television. The process begins at operation 810 wherein an algorithm within the television recognizes selection of a particular content for streaming through the television. The content can be from any one of the content providers. An audio signal from the selected content is retrieved. The audio signal is partitioned into a plurality of segments of small intervals, as illustrated in operation 820. In one embodiment, each segment is of a pre-set duration of time, such as 5 seconds. A particular segment within the plurality of segments is selected and analyzed to identify acoustic modulations within the particular segment, as illustrated in operation 830. The acoustic modulations are obtained by passing the audio segment through bandpass filters and examining the modulation characteristics of the particular segment using an FFT to identify the energy distribution at each channel for each frequency of the audio segment. The examination of the modulation characteristics results in identifying a selective set of data points represented by floating point numbers. The set of floating point numbers is used to compute a distinct vector. The vector defines a unique audio fingerprint of the particular segment.
[0042] A content database is searched to identify one or more content with audio segments having data points that either match or are in close proximity to the floating point numbers of the vector of the particular segment, as illustrated in operation 840. As mentioned earlier, the content database includes content from a plurality of content providers having audio segments that have been fingerprinted by the algorithm using the same technique. When more than one audio segment from one or more content providers includes data points that match the data points of the particular audio segment, the algorithm identifies the content having an audio segment that is closest to the floating point numbers of the particular segment. The algorithm then obtains a content identifier of the content with the audio segment that closely matches the particular segment, as illustrated in operation 850. A content provider database is queried using information, such as the content identifier, obtained from the content database, as illustrated in operation 860. In response to the query, an ID portion of the identified content is received from the content provider database, as illustrated in operation 870. The portion may include the identifier of content matching the particular segment and additional fingerprints for a pre-defined amount of time. In one embodiment, the additional recording may include a recording for an additional 15 seconds in addition to the 5 seconds related to the particular segment. The recording of audio content obtained from the content provider database is stored in a local cache and is used for further verification and for matching promotional media or an event.
[0043] Figure 8 illustrates yet another alternate embodiment for matching promotional media for content streaming through a television. The method begins at operation 910, wherein an audio signal is retrieved from the content that is selected for rendering at the television. The audio signal is partitioned into a plurality of segments of small intervals, as illustrated in operation 920. A particular segment of the audio signal is selected for analysis to identify modulation characteristics, as illustrated in operation 930. The particular audio segment may be selected based on the payload contained within. The analysis of the particular segment includes generating an acoustic spectrogram for the particular segment and identifying a plurality of floating point numbers related to data points in the acoustic spectrogram that define the acoustic modulation of the particular segment of audio signal. A distinct vector is computed as a function of the floating point numbers. The vector defines a unique audio fingerprint of the audio segment.
[0044] In operation 940, a content database is searched to identify content that includes an audio segment with data points that match or are in close proximity to the plurality of floating point numbers of the particular audio segment. The content database is a repository of pre-computed data points for a plurality of audio segments representing different portions of a plurality of audio signals for a plurality of content obtained from a plurality of content providers. Upon identifying content with audio signals matching the particular audio segment, the source of the content and the content information related to the content may be retrieved from the content provider using a content identifier.
[0045] Using the content identifier, a promotional media or event related to the content is identified from a service database using the fingerprint of the particular segment, as illustrated in operation 950. The content provider database is queried to obtain content from the content provider database and an ad campaign database is queried to obtain metadata and assets related to the identified promotional media, as illustrated in operation 960. The process concludes with the assembly of the multimedia content from the content obtained from the content provider database and assembly of promotional media content/application using the metadata and assets retrieved from the ad campaign database for rendering at the television, as illustrated in operation 970. The promotional media content may be presented in the form of a widget either alongside the content or separately, in one embodiment of the invention.
[0046] The algorithm acts as a potential bridge for creating a broadcast interactivity service (BIS) for a user by determining what content a particular user is watching on his/her television. It does so by extracting features of the content through audio fingerprinting of a small segment of the audio signal related to the content and identifying a particular application or promotional multimedia related to the content for rendering alongside the content. The small segment of audio is matched against audio of a plurality of content received from content providers/broadcasters scheduled for the specific period of time, using a modulation detection process wherein the two signals are matched based on their modulation similarities. This approach uses fewer CPU resources and less time while providing a more efficient and accurate match. In addition to the modulation match, the algorithm also provides for faster matches by enabling a recording of the matched content for the time segment and for an additional predefined amount of time to be stored locally in a local cache of the television and by continuing to verify that the identified content continues to match the audio signal of multimedia content selected for rendering at the television. When a user changes the multimedia content selected for viewing, the algorithm determines that the content stored in the local cache does not match and flushes the content. The algorithm then goes through the audio fingerprinting using HACs and LSH technology as described earlier, making this a more robust and efficient algorithmic tool.
[0047] Embodiments of the present invention may be practiced with various computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers and the like. The invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a wire-based or wireless network.
[0048] With the above embodiments in mind, it should be understood that the invention could employ various computer-implemented operations involving data stored in computer systems. These operations can include the physical transformations of data, saving of data, and display of data. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared and otherwise manipulated. Data can also be stored in the network during capture and transmission over a network. The storage can be, for example, at network nodes and memory associated with a server, and other computing devices, including portable devices.
[0049] Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus can be specially constructed for the required purpose, or the apparatus can be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
[0050] The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system. The computer readable medium can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
[0051] Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications can be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
What is claimed is:

Claims (20)

1. A method for identifying multimedia content streaming through a television, the method executed by a processor of the television, comprising:
retrieving an audio signal from a multimedia content selected for rendering at the television;
partitioning the audio signal into a plurality of segments of small intervals;
analyzing a particular segment to identify acoustic modulations in the particular segment, the analysis generating a distinct vector for the particular segment based on the acoustic modulation, the vector defining an unique audio fingerprint of the particular segment of the audio signal; and querying a content database on a server using the vector of the particular segment of audio signal to obtain content information for multimedia content that matches the fingerprint of the particular segment, the content information used to obtain information related to the multimedia content from a content provider that matches the audio signal received for rendering.
2. The method of claim 1, wherein the audio signal is captured from multimedia content streamed to the television by a content provider or from a digital multimedia recording device.
3. The method of claim 1, wherein the small interval is a predefined interval of about 5 seconds.
4. The method of claim 1, wherein analyzing further includes, generating an acoustic spectrogram to identify acoustic modulation characteristics for the particular segment of audio signal at one or more frequencies, wherein the acoustic modulation characteristics are spread over a plurality of channels;
examining the acoustic modulation at each channel to measure magnitude, the magnitude identifying amount of energy in each channel at each frequency; and computing the vector of the particular segment of audio signal as a function of the measured magnitudes in each channel for each frequency for a time period associated with the particular segment of the audio signal, wherein the vector identifies a plurality of floating point numbers of data points representing the unique fingerprint for the particular segment of audio signal.
5. The method of claim 4, wherein the examination of the acoustic modulation and measuring of magnitude is done using Fast Fourier Transformation technology.
6. The method of claim 4, wherein querying further includes, searching the content database to identify one or more multimedia content with audio segments having data points that are closest to the plurality of floating point numbers of the particular segment of the audio signal, the content database being a repository of pre-computed data points for a plurality of audio segments representing different portions of a plurality of audio signals for multimedia content obtained from a plurality of content providers;
computing distance between data points of each audio segment of the identified multimedia content and the floating point numbers of the particular segment using iterative computation; and selecting the multimedia content that has data points closest to the floating point numbers, wherein the multimedia content is referenced using a unique identifier.
7. The method of claim 6, further includes retrieving multimedia content related to the entry from the content provider using the unique identifier, the multimedia content including multimedia content matching the particular segment and additional multimedia content for pre-defined amount of time related to the audio signal currently being rendered at the television, the retrieved multimedia content stored in a local cache of the television for subsequent verification of the audio signal for the content that continues to stream through the television.
8. The method of claim 6, further includes, when more than one multimedia content has data points closest to the floating point numbers of the particular segment, performing additional matching by selecting one or more additional segments of the audio signal for the content currently selected for rendering at the television.
9. The method of claim 1, further includes, identifying an event or promotional media related to the multimedia content that is scheduled for rendering from a service database, the event or promotional media identified using information from the fingerprint of the particular segment;
retrieving metadata and assets related to the identified event or promotional media from an ad campaign database; and assembling an application or multimedia content associated with the event or the promotional media using the retrieved metadata and assets, the assembled application or multimedia content related to event or promotional media rendered alongside the multimedia content related to the audio signal at the television.
10. A method for identifying content streaming through a television, the method executed by a processor of the television, comprising:
retrieving an audio signal from a content selected for rendering at the television;
partitioning the audio signal into a plurality of segments of small intervals;
analyzing a particular segment to identify acoustic modulations in the particular segment, the analysis generating a vector for the particular segment based on the acoustic modulation, the vector identifying a plurality of floating point numbers related to data points of the particular segment, the vector defining an unique audio fingerprint of the particular segment of the audio signal;
searching a content database to identify one or more content with audio segments having data points that are closest to the plurality of floating point numbers of the particular segment, the content database being a repository of pre-computed data points for a plurality of audio segments representing different portions of a plurality of audio signals for a plurality of content obtained from a plurality of content providers;
obtaining a content identifier of a content having an audio segment that has data points closest to the floating point numbers of the particular segment;
querying a content provider database using the content identifier for information related to the content with an audio segment that matches the particular audio segment; and receiving a portion of the content from the content provider database in response to the query, the portion of the content includes content recording matching the particular segment and additional recording for a pre-defined amount of time, the additional recording defining a sequence of audio fingerprints for the multimedia content, the portion of the content and additional recording received from the content provider database used in further matching subsequent segments of the audio signal.
11. The method of claim 10, wherein analyzing further includes, generating an acoustic spectrogram to identify acoustic modulation characteristics for the particular segment of audio signal at one or more frequencies, wherein the acoustic modulation characteristics are spread over a plurality of channels;
examining the acoustic modulation at each channel to measure magnitude, the magnitude identifying amount of energy in each channel at each frequency, the examining identifying data points related to the acoustic modulation of the particular segment of audio signal; and computing the vector of the particular segment of audio signal as a function of the measured magnitudes in each channel for each frequency for a time period associated with the particular segment of the audio signal, wherein the vector identifies a plurality of floating point numbers related to data points of the particular segment, the vector representing the unique fingerprint for the particular segment of audio signal.
12. The method of claim 10, wherein identifying the content identifier further includes, computing distance between data points of each content in the content database and corresponding floating point numbers of the audio segment using iterative computation; and identifying the content with a set of data points that are closest to the corresponding floating point numbers of the audio segment.
13. The method of claim 10, further includes storing the portion of the content and additional recording received from the content provider database in a local cache accessible to the processor of the television for further verification of the content of the audio signal streaming through the television.
14. The method of claim 13, further includes, generating additional fingerprints for additional segment of the streaming audio signal periodically; and comparing the additional fingerprints against the fingerprint and the sequence of fingerprint of the content and additional recording stored in the local cache to determine if the streaming audio signals continue to match the content in the local cache.
15. The method of claim 14, further includes, when the additional fingerprint does not match the fingerprint of the content stored in the local cache, clearing the content from the local cache;
initiating a search by querying the content database to identify content that matches the additional segment using the additional fingerprints; and retrieving content from the content provider database for storing in the local cache for subsequent verification.
16. The method of claim 10, further including: identifying promotional media related to the content from a service database, the promotional media being identified using information from the fingerprint of the particular segment;
retrieving metadata and assets related to the identified promotional media from an ad campaign database; and assembling multimedia content for the promotional media using the retrieved metadata and assets, the assembled multimedia content related to the promotional media being rendered alongside the content related to the audio signal at the television.
17. A method for identifying content streaming through a television, the method executed by a processor of the television, comprising:
retrieving a set of audio fingerprints associated with a plurality of contents that are scheduled for rendering;
storing the set of audio fingerprints in a local cache associated with the television;
receiving a request for rendering a content on the television;
retrieving an audio signal for the content selected for rendering at the television;
analyzing a particular segment of the audio signal to identify acoustic modulations in the particular segment, the analysis generating a vector for the particular segment based on the acoustic modulation, the vector identifying a plurality of floating point numbers related to data points of the particular segment, the vector defining a unique audio fingerprint of the particular segment of the audio signal;
determining if a match is found for the audio fingerprint of the particular segment of the audio signal within the local cache by comparing the audio fingerprint of the particular segment against the audio fingerprints of the plurality of contents;
when a match is found in the local cache, querying a content provider database using a content identifier of the particular content matching the audio fingerprint of the particular segment to obtain a portion of the particular content; and rendering the particular content obtained from the content provider database in response to the request from the user.
18. The method of claim 17, further including, when the audio fingerprint of the particular segment of the audio signal does not match the fingerprints of any of the plurality of contents stored in the local cache: forwarding a request to a content database for verification of a potential match of the audio fingerprint associated with the audio signal, wherein the request includes a content identifier for a content from a prior query; and
receiving confirmation from the content database of the potential match of the audio fingerprint of the audio signal.
19. The method of claim 17, further including: periodically generating additional fingerprints for additional segments of the streaming audio signal; and verifying that the additional fingerprints continue to match the particular content in the local cache by comparing the additional fingerprints against the corresponding fingerprint of the particular content stored in the local cache.
20. The method of claim 17, wherein the set of audio fingerprints associated with the plurality of contents scheduled for rendering is retrieved periodically and stored in the local cache, and wherein the local cache is cleared prior to storing the retrieved audio fingerprints.
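
By way of illustration only, and forming no part of the claims, the fingerprinting and matching steps recited above could be sketched in Python as follows. The sketch makes several assumptions that are not taken from the claims: a plain discrete Fourier transform stands in for the channel filtering, Euclidean distance stands in for the unspecified distance computation, and a match is accepted only below a caller-supplied threshold. All function names and parameters (fingerprint, closest_content, identify, frame_len, n_channels, n_mod_bins, threshold) are hypothetical.

```python
import cmath
import math


def dft_magnitudes(values, n_bins):
    """Magnitudes of the first n_bins DFT coefficients of a non-empty real sequence."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values)))
        for k in range(n_bins)
    ]


def fingerprint(samples, frame_len=64, n_channels=4, n_mod_bins=4):
    """Return a vector of floats built from per-channel modulation magnitudes.

    Assumption: channels are formed by grouping DFT bins of fixed-length frames.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    bins_per_channel = (frame_len // 2) // n_channels
    # One energy envelope over time per frequency channel.
    envelopes = [[] for _ in range(n_channels)]
    for frame in frames:
        spectrum = dft_magnitudes(frame, frame_len // 2)
        for c in range(n_channels):
            band = spectrum[c * bins_per_channel:(c + 1) * bins_per_channel]
            envelopes[c].append(sum(m * m for m in band))
    # Modulation magnitudes: how strongly each channel's energy varies at
    # each of the first n_mod_bins modulation frequencies.
    vector = []
    for envelope in envelopes:
        vector.extend(dft_magnitudes(envelope, n_mod_bins))
    return vector


def distance(a, b):
    """Euclidean distance between two fingerprint vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_content(query, database):
    """Iterate over a non-empty {content_id: vector} mapping; return (id, distance)."""
    best_id = min(database, key=lambda cid: distance(query, database[cid]))
    return best_id, distance(query, database[best_id])


def identify(query, local_cache, content_database, threshold):
    """Check the local cache first; on a miss, clear it and query the database."""
    if local_cache:
        cid, d = closest_content(query, local_cache)
        if d <= threshold:
            return cid, "cache"
        local_cache.clear()  # cached content no longer matches the stream
    cid, d = closest_content(query, content_database)
    if d > threshold:
        return None, "none"
    local_cache[cid] = content_database[cid]  # keep for subsequent verification
    return cid, "database"
```

In this sketch the local cache and the content database are ordinary dictionaries keyed by a content identifier. A television implementation would presumably replace them with the cache and remote database queries described in the claims.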
CA2856843A 2011-12-20 2012-11-30 Audio fingerprint for content identification Expired - Fee Related CA2856843C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13/332,331 US8949872B2 (en) 2011-12-20 2011-12-20 Audio fingerprint for content identification
US13/332,331 2011-12-20
PCT/US2012/067487 WO2013095893A1 (en) 2011-12-20 2012-11-30 Audio fingerprint for content identification

Publications (2)

Publication Number Publication Date
CA2856843A1 (en) 2013-06-27
CA2856843C (en) 2017-03-21

Family

ID=48611641

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2856843A Expired - Fee Related CA2856843C (en) 2011-12-20 2012-11-30 Audio fingerprint for content identification

Country Status (7)

Country Link
US (1) US8949872B2 (en)
EP (1) EP2795913B1 (en)
CN (1) CN103999473B (en)
CA (1) CA2856843C (en)
HK (1) HK1199344A1 (en)
TW (1) TWI516100B (en)
WO (1) WO2013095893A1 (en)

Also Published As

Publication number Publication date
EP2795913A4 (en) 2015-07-15
HK1199344A1 (en) 2015-06-26
US8949872B2 (en) 2015-02-03
TW201342890A (en) 2013-10-16
WO2013095893A1 (en) 2013-06-27
US20130160038A1 (en) 2013-06-20
CN103999473A (en) 2014-08-20
EP2795913A1 (en) 2014-10-29
CA2856843C (en) 2017-03-21
TWI516100B (en) 2016-01-01
CN103999473B (en) 2018-02-06
EP2795913B1 (en) 2019-11-27

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20140603

MKLA Lapsed

Effective date: 20201130