EP3155822A2 - Systems and methods for locally detecting consumed video content - Google Patents

Systems and methods for locally detecting consumed video content

Info

Publication number
EP3155822A2
Authority
EP
European Patent Office
Prior art keywords
video program
user
audio fingerprints
information
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP15797471.8A
Other languages
German (de)
French (fr)
Other versions
EP3155822B1 (en)
Inventor
Ant Oztaskent
Yaroslav Volovich
Ingrid McAulay TROLLOPE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Publication of EP3155822A2
Application granted
Publication of EP3155822B1
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213 Monitoring of end-user related data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/432 Query formulation
    • G06F16/433 Query formulation using audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4722 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting additional data associated with the content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/482 End-user interface for program selection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/482 End-user interface for program selection
    • H04N21/4828 End-user interface for program selection for searching program descriptors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • H04N21/8113 Monomedia components thereof involving special audio data, e.g. different tracks for different languages comprising music, e.g. song in MP3 format

Definitions

  • the disclosure relates generally to identifying video programs, and more specifically to providing a user with context-aware information based on identifying video content consumed by the user.
  • Some systems receive explicit information from a user to identify the user's context, but such systems are burdensome for users.
  • Other systems provide an opt-in feature where users choose to have their ambient sounds monitored. When the feature is enabled by a user, the sounds are collected and sent to a server (e.g., once a minute or once every five minutes), where they are analyzed and compared against a large database of known audio from video programs. When a match is found, the server is able to identify what video program is being presented in the vicinity of the user.
  • Such a system has several drawbacks. First, the frequent transmissions of data to the server consume lots of energy, and thus reduce battery life of the user's client device. Second, such a system is either burdensome (requiring periodic permission to continue tracking), or else creates privacy concerns by keeping the collection open too long.
  • a media server finds repeated segments of audio across many episodes of the same show (e.g., a theme song or a jingle).
  • the server computes audio fingerprints for these segments and sends the fingerprints to a user's client device (typically a mobile device, such as a smart phone).
  • the user's client device then continuously (or periodically) performs local matching of those fingerprints on the user's client device against computed fingerprints of the ambient sound. In this way, sound at the client device is not transmitted to a server.
  • This has several benefits. First, this provides greater respect for the user's privacy while simultaneously being less of a burden on the user.
  • a process runs on a server to identify a set of audio fingerprints that will be transmitted to a client device for matching. Rather than sending all possible audio fingerprints of video programs, the set transmitted to each client device is typically limited to a small number corresponding to video programs that a user is likely to watch.
  • the server collects audio content from live TV broadcasts (e.g., using a TV capture system) as well as on-demand video content libraries.
  • the server identifies theme songs, jingles, and other audio samples that commonly occur in many episodes of the same TV show. For movies, a short sample (e.g., 30 seconds) may be taken from some point in the first 5 minutes. Some implementations select the point to take the sample based on the audio level at the time offset and/or how unique the content is (e.g., only samples that do not match any other TV show or movie are picked).
  • the server then computes audio fingerprints for these common audio samples, which will be compared with ambient audio from a microphone associated with a user's client device.
  • Some implementations compute audio fingerprints using a format that minimizes the CPU usage of a client device to compute and compare audio fingerprints.
  • some implementations use a format that minimizes the size of the audio fingerprints.
  • Some implementations select small audio samples to reduce CPU usage.
  • the server selects a subset of TV shows and movies whose fingerprints will be sent to a user's client device.
  • Some implementations limit the audio fingerprints sent to a client device based on the number of independent video programs (a single video program has one or more audio fingerprints).
  • the number of video programs for which audio fingerprints are transmitted is limited to a predetermined number (e.g., 100 or 200).
  • Some implementations use various factors in the selection process, some of which are specific to an individual user, and some of which apply to a group of users (or all users).
  • the selection criteria include determining whether certain content (e.g., any episode of a video program) aired on TV during the previous week at a user's geographic location. In some implementations, the selection criteria include determining whether certain content was recently aired, and if so, the relative size of the viewership. In some implementations, the selection criteria include determining whether certain content is going to be aired on TV in the coming week. In some implementations, the selection criteria include determining whether the user watched the TV show before (e.g., a different episode of the same video program).
  • the selection criteria include determining whether the user showed interest in that TV show before (e.g., searched for the show using a search engine, set a calendar reminder for the show, followed the show on a social networking site, or expressed interest in the show on a social networking site). In some implementations, the selection criteria use a user's personal profile. In some implementations, the selection criteria include determining popularity of video programs.
  • the server transmits the selected subset of audio fingerprints to a user's client device (e.g., pushed to the device or pulled by the device by an application running on the device).
  • the process of selecting a subset of audio fingerprints and transmitting them to the user's device is typically done periodically (e.g., once a day or once each week). Fingerprints that already exist on the user's phone are generally not retransmitted. In some implementations, older audio fingerprints are discarded from a user's device when the corresponding video programs are no longer relevant.
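The server-side selection described above can be sketched as a simple ranking step. This is only an illustration under stated assumptions: the source lists the selection factors and a cap on the number of programs, but does not say how the factors are combined, so the `Candidate` fields, the `WEIGHTS` values, and the `select_fingerprints` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One video program and the (hypothetical) signals used to rank it."""
    program_id: str
    fingerprint_ids: tuple          # a program has one or more fingerprints
    aired_last_week: bool = False   # aired in the user's region last week
    airs_next_week: bool = False    # scheduled to air in the coming week
    watched_before: bool = False    # user watched a different episode
    showed_interest: bool = False   # e.g. searched for or followed the show
    popularity: float = 0.0         # 0.0 to 1.0 across all users


# Illustrative weights only; the source does not specify a scoring formula.
WEIGHTS = {
    "aired_last_week": 1.0,
    "airs_next_week": 1.0,
    "watched_before": 3.0,
    "showed_interest": 2.0,
}


def relevance(candidate):
    """Sum the popularity and the weight of every signal that is set."""
    score = candidate.popularity
    for name, weight in WEIGHTS.items():
        if getattr(candidate, name):
            score += weight
    return score


def select_fingerprints(candidates, max_programs=100,
                        already_on_device=frozenset()):
    """Pick the top programs, then list only fingerprints the device lacks."""
    # Ties are broken by program id so the result is deterministic.
    ranked = sorted(candidates, key=lambda c: (-relevance(c), c.program_id))
    chosen = ranked[:max_programs]
    to_send = [
        fp
        for c in chosen
        for fp in c.fingerprint_ids
        if fp not in already_on_device
    ]
    return [c.program_id for c in chosen], to_send
```

Capping by program count rather than by fingerprint count follows the text, which limits the number of independent video programs; skipping fingerprints already on the device mirrors the statement that existing fingerprints are generally not retransmitted.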
  • the microphone is opened by the user and kept open.
  • the user's device continuously compares ambient audio captured by its microphone against the fingerprints that were received from the server. Typically this involves computing audio fingerprints for the ambient sound, and comparing those computed fingerprints to the received fingerprints. A match indicates that the user is near a television presenting the corresponding video program. The user is presumed to be watching the video program, which is generally true.
  • the fact that the user is watching a certain TV show is stored on the user's device, and may be used to provide context-aware information to the user.
  • the record indicating that the user is watching the show is stored "permanently" in a log on the device.
  • records about watched shows are deleted after a certain period of time.
  • records about watched shows are deleted N minutes after the end of the show, where N is a predefined number (e.g., 15 minutes, 30 minutes, or 60 minutes).
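The retention options above (permanent log, deletion after a period, or deletion N minutes after the show ends) can be illustrated with a small in-memory log. This is a minimal sketch of the third option only; `WatchRecord`, `WatchLog`, and their fields are hypothetical names, and a real client would persist the records rather than hold them in a list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WatchRecord:
    program_id: str
    detected_at: datetime    # when the local match occurred
    show_ends_at: datetime   # scheduled end of the matched program


class WatchLog:
    """In-memory stand-in for the on-device log of watched programs."""

    def __init__(self, retention_minutes=30):
        # N from the text: how long a record survives after the show ends.
        self.retention = timedelta(minutes=retention_minutes)
        self._records = []

    def add(self, record):
        self._records.append(record)

    def purge(self, now):
        """Drop records whose show ended more than N minutes before `now`."""
        self._records = [
            r for r in self._records
            if now <= r.show_ends_at + self.retention
        ]

    def current_programs(self, now):
        """Return program ids that are still usable as context at `now`."""
        self.purge(now)
        return [r.program_id for r in self._records]
```

Purging on every read keeps the example short; a periodic cleanup task would work equally well.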
  • the context information about the user watching a specific video program can be used in various ways to provide the user with relevant information.
  • that information may be used to provide an information card about the program (e.g., information about the program and its cast, with links to relevant search topics).
  • the client device includes the video program (e.g., program name or identifier) with the search query, and the server uses that knowledge to provide the information card.
  • the server responds by confirming that the user is watching the identified video program (e.g., "Are you watching Big Bang Theory?") and prompts the user to enter a rich experience.
  • the user may enable audio detection, after which audio fingerprint detection may be used to identify the exact episode and time offset that is being watched. This allows the server to provide more detailed and specific information.
  • knowledge of what program a user is watching can be used to provide search auto complete suggestions (e.g., auto complete show name, actor names, or character names).
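One way to picture the auto complete use case is a prefix filter that lists terms tied to the detected program ahead of general suggestions. The source does not describe a ranking method, so this ordering rule and the `suggest` function are assumptions made purely for illustration.

```python
def suggest(prefix, general_terms, context_terms, limit=5):
    """Return completions for `prefix`, listing context terms first.

    `context_terms` stands for the show name, actor names, or character
    names of the program the user is presumed to be watching.
    """
    wanted = prefix.casefold()
    seen = set()
    results = []
    # Context terms come first so they outrank general suggestions.
    for term in list(context_terms) + list(general_terms):
        key = term.casefold()
        if key.startswith(wanted) and key not in seen:
            seen.add(key)
            results.append(term)
    return results[:limit]
```

For example, with context terms taken from a detected program, `suggest("big", ["big data"], ["Big Bang Theory"])` places the show name ahead of the general term.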
  • a method executes at a client with one or more processors, a microphone, and memory.
  • the memory stores one or more programs configured for execution by the one or more processors.
  • the process receives audio fingerprints for a plurality of video programs and information that correlates each respective received audio fingerprint to a respective video program.
  • a video program has two or more correlated audio fingerprints.
  • the process stores the received audio fingerprints and correlating information in the memory.
  • the process detects ambient sound using the microphone, which may include the sound track of a video program being presented in the vicinity of the client device.
  • the process computes one or more sample audio fingerprints from the detected ambient sound, and compares the computed audio fingerprints to the received audio fingerprints.
  • the process matches one of the sample audio fingerprints to a first stored audio fingerprint and uses the correlating information to identify a first video program corresponding to the matched sample audio fingerprint.
  • the process then provides the user with information related to the first video program.
  • the received audio fingerprints are received from a media server and are preselected by the media server according to a set of relevancy criteria.
  • preselecting the set of audio fingerprints according to the set of relevancy criteria includes limiting the selected set to a predefined maximum number (e.g., 100).
  • preselecting the set of audio fingerprints according to the set of relevancy criteria includes selecting one or more of the audio fingerprints based on stored preferences of the user.
  • preselecting the set of audio fingerprints according to the set of relevancy criteria includes selecting one or more of the audio fingerprints based on prior search queries by the user.
  • preselecting the set of audio fingerprints according to the set of relevancy criteria includes selecting one or more of the audio fingerprints based on popularity of the video programs correlated to the selected one or more audio fingerprints. In some implementations, preselecting the set of audio fingerprints according to the set of relevancy criteria includes selecting one or more of the audio fingerprints based on previous viewing by the user of video programs correlated to the selected one or more audio fingerprints.
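The client-side method summarized above (store received fingerprints with their correlating information, fingerprint the ambient sound, match locally, identify the program) can be sketched as follows. The source does not specify a fingerprint format or a distance measure, so `toy_fingerprint` (a bit pattern of rising frame energies) and the Hamming-distance threshold in `LocalMatcher` are stand-in assumptions, not the patented technique.

```python
def toy_fingerprint(frame_energies):
    """Toy stand-in for a real audio fingerprint.

    Bit i is set when the energy rises from frame i to frame i + 1.
    """
    bits = 0
    for i in range(len(frame_energies) - 1):
        if frame_energies[i + 1] > frame_energies[i]:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


class LocalMatcher:
    """Matches sample fingerprints against fingerprints held on the device."""

    def __init__(self, max_distance=2):
        self.max_distance = max_distance   # tolerance for noisy ambient audio
        self._program_for = {}             # stored fingerprint -> program id

    def store(self, fingerprints_by_program):
        """Keep the received fingerprints and the correlating information."""
        for program_id, fingerprints in fingerprints_by_program.items():
            for fp in fingerprints:
                self._program_for[fp] = program_id

    def identify(self, sample_fingerprints):
        """Return the program id of the closest stored fingerprint, or None."""
        best = None
        for sample in sample_fingerprints:
            for stored, program_id in self._program_for.items():
                distance = hamming(sample, stored)
                if distance <= self.max_distance and (
                    best is None or distance < best[0]
                ):
                    best = (distance, program_id)
        return best[1] if best else None
```

All comparison happens on the device, so no ambient audio or derived fingerprint leaves it; only the result of `identify` would be used to look up information about the program.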
  • Figure 2 is a block diagram of a client device according to some implementations.
  • Figure 3 is a block diagram of a server according to some implementations, which may be used in a server system.
  • Figures 4 and 5 illustrate various skeletal data structures or tables used by some implementations.
  • Figure 6 illustrates a process flow for providing context-aware information in accordance with some implementations.
  • Figures 7A and 7B provide a flowchart of a process, performed at a client device, for providing context-aware information about video programs according to some implementations.
  • Figure 1 is a block diagram that illustrates the major components of some implementations.
  • the various client devices 102 and servers 300 in server system 114 communicate over one or more networks 112 (such as the Internet).
  • a client environment 100 includes a television 108, which is typically connected to a set top box 106 (or a receiver / converter).
  • the set top box 106 receives media content from a content provider 110, such as a cable TV network, a satellite dish network, or broadcast over the airwaves. As illustrated in Figure 1, in some cases the media content is transmitted through the communication networks 112.
  • the client environment 100 also includes one or more client devices 102, such as smart phones, tablet computers, laptop computers, or desktop computers.
  • client device is typically in close proximity to the television 108.
  • client application 104 runs on the client device 102.
  • the client device 102 includes memory 214, as described in more detail below with respect to Figure 2.
  • the client application runs within a web browser 222.
  • Figure 1 illustrates a single set top box 106
  • one of skill in the art would recognize that other environments could consist of a plurality of distinct electronic components, such as a separate receiver, a separate converter, and a separate set top box.
  • some or all of the functionality of the set top box 106 (or converter or receiver) may be integrated with the television 108.
  • the server system 114 includes a plurality of servers 300, and the servers 300 may be connected by an internal communication network or bus 128.
  • the server system 114 includes a query processing module 116, which receives queries from users (e.g., from client devices 102) and returns responsive query results. The queries are tracked in a search query log 120 in a database 118.
  • the server system includes one or more databases 118.
  • the data stored in the database 118 includes a search query log 120, which tracks each search query submitted by a user.
  • the search query log is stored in an aggregated format to reduce the size of storage.
  • the database may include television program information 122.
  • the television program information 122 may include detailed information about each of the programs, including subtitles, as well as broadcast dates and times. Some of the information is described below with respect to Figures 4 and 5.
  • the database 118 stores user profiles 124 for users, which may include preferences explicitly identified by a user, as well as preferences inferred based on submitted search queries or television viewing history.
  • the server system 114 also includes a media subsystem 126, which is described in more detail below with respect to Figures 3 and 6. Included in the media subsystem 126 are various modules to capture media content, compute audio fingerprints, and select audio fingerprints that are likely to be relevant for each user.
  • Figure 2 is a block diagram illustrating a client device 102 that a user uses in a client environment 100.
  • a client device 102 typically includes one or more processing units (CPUs) 202 for executing modules, programs, or instructions stored in memory 214 and thereby performing processing operations; a microphone 203; one or more network or other communications interfaces 204; memory 214; and one or more communication buses 212 for interconnecting these components.
  • the communication buses 212 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components.
  • a client device 102 includes a user interface 206 comprising a display device 208 and one or more input devices or mechanisms 210.
  • the input device/mechanism includes a keyboard and a mouse; in some implementations, the input device/mechanism includes a "soft" keyboard, which is displayed as needed on the display device 208, enabling a user to "press keys" that appear on the display 208.
  • the memory 214 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices.
  • the memory 214 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
  • the memory 214 includes one or more storage devices remotely located from the CPU(s) 202.
  • the memory 214, or alternately the non-volatile memory device(s) within memory 214, comprises a non-transitory computer readable storage medium.
  • the memory 214, or the computer readable storage medium of memory 214, stores the following programs, modules, and data structures, or a subset thereof:
  • an operating system 216 which includes procedures for handling various basic system services and for performing hardware dependent tasks;
  • a communications module 218, which is used for connecting the client device 102 to other computers and devices via the one or more communication network interfaces 204 (wired or wireless) and one or more communication networks 112, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
  • a display module 220, which receives input from the one or more input devices 210 and generates user interface elements for display on the display device 208;
  • a web browser 222, which enables a user to communicate over a network 112 (such as the Internet) with remote computers or devices;
  • a client application 104 which may be used in conjunction with a television 108 to provide the user more context-aware information (e.g., information about television programs the user is watching).
  • the client application 104 runs within the web browser 222.
  • the client application 104 runs as an application separate from the web browser. The client application 104 is described in more detail with respect to Figure 6.
  • the client application 104 includes one or more submodules for performing specific tasks.
  • the client application 104 includes a local capture module 224, which captures ambient sounds using the microphone 203.
  • the client application 104 includes a local fingerprint module 226, which takes the captured sounds, and computes audio fingerprints.
  • the client application 104 includes a local matching module 228, which matches the computed audio fingerprints to audio fingerprints received from the media subsystem, thereby determining what video program the user is watching.
  • Each of the above identified executable modules, applications, or sets of procedures may be stored in one or more of the previously mentioned memory devices and corresponds to a set of instructions for performing a function described above.
  • the above identified modules or programs i.e., sets of instructions
  • the memory 214 may store a subset of the modules and data structures identified above.
  • the memory 214 may store additional modules or data structures not described above.
  • Figure 2 shows a client device 102
  • Figure 2 is intended more as a functional description of the various features that may be present rather than as a structural schematic of the implementations described herein.
  • items shown separately could be combined and some items could be separated.
  • Figure 3 is a block diagram illustrating a server 300 that may be used in a server system 114.
  • a typical server system includes many individual servers 300, which may be hundreds or thousands.
  • a server 300 typically includes one or more processing units (CPUs) 302 for executing modules, programs, or instructions stored in the memory 314 and thereby performing processing operations; one or more network or other communications interfaces 304; memory 314; and one or more communication buses 312 for interconnecting these components.
  • the communication buses 312 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components.
  • a server 300 includes a user interface 306, which may include a display device 308 and one or more input devices 310, such as a keyboard and a mouse.
  • the memory 314 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices.
  • the memory 314 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
  • the memory 314 includes one or more storage devices remotely located from the CPU(s) 302.
  • the memory 314, or alternately the non-volatile memory device(s) within memory 314, comprises a non-transitory computer readable storage medium.
  • the memory 314, or the computer readable storage medium of memory 314, stores the following programs, modules, and data structures, or a subset thereof:
  • an operating system 316 which includes procedures for handling various basic system services and for performing hardware dependent tasks;
  • a communications module 318 which is used for connecting the server 300 to other computers via the one or more communication network interfaces 304 (wired or wireless), an internal network or bus 128, or other communication networks 112, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
  • each query is logged in the search query log 120;
  • a media subsystem 126 which identifies various video programs that may be viewed by a user and transmits audio fingerprints of the video programs to a client device 102 corresponding to the user;
  • the media subsystem 126 includes a capture module 322, which captures broadcast video programs and video programs stored in video libraries;
  • the media subsystem includes a fingerprint module 324, which computes one or more audio fingerprints for each video program.
  • an audio fingerprint is a small representation of an audio sample, and is relatively unique;
  • the media subsystem 126 includes a matching module 326, which compares audio fingerprints to identify matches.
  • the matching module uses fuzzy matching techniques
  • the media subsystem 126 includes a fingerprint selection module 328 (which may also be referred to as a video program selection module), which selects specific audio fingerprints and corresponding video programs based on relevance to a user. For example, there may be hundreds or thousands of TV programs that a user may watch (and many more movies), but a specific user is not equally likely to watch all of the possible video programs.
  • the fingerprint selection module 328 identifies specific video programs (and their corresponding fingerprints) that are more likely to be watched by the user, and transmits the selected fingerprints to a user's client device 102. This is described in more detail with respect to Figure 6.
  • Each of the above identified elements in Figure 3 may be stored in one or more of the previously mentioned memory devices.
  • Each executable program, module, or procedure corresponds to a set of instructions for performing a function described above.
  • the above identified modules or programs i.e., sets of instructions
  • the memory 314 may store a subset of the modules and data structures identified above.
  • the memory 314 may store additional modules or data structures not described above.
  • Figure 3 illustrates a server 300
  • Figure 3 is intended more as a functional illustration of the various features that may be present in a set of one or more servers rather than as a structural schematic of the implementations described herein.
  • items shown separately could be combined and some items could be separated.
  • the actual number of servers used to implement these features, and how features are allocated among them, will vary from one implementation to another, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods.
  • the database 118 stores video program data 122.
  • Each video program includes a program ID 330, and various other information, which may be subdivided into separate data structures.
  • the video program data 122 includes the video program content 334 (i.e., the video program itself), which includes both audio and video.
  • the audio and video are stored separately.
  • the video program data also includes one or more audio fingerprints 338 for each video program. Typically a single video program will have several stored audio fingerprints.
  • the video program data for each program includes a program profile 332, which is described in more detail with respect to Figure 4.
  • the profile includes the program ID 330, which is a unique identifier for each video program.
  • the profile 332 includes a program description 402, which may comprise one or more paragraphs that describe the program.
  • the profile 332 may include cast information 404, which includes details about individual cast members or links to further information about the cast members (e.g., links to cast member web pages). For video programs that are part of a series, some implementations include series information 406 in the profile 332.
  • the profile 332 includes genre information 408, which may include general information about the genre of the video program, and may provide links to additional information.
  • the profile 332 includes related terms 410, which may include key terms that describe the video program or may identify terms that enable a user to identify related content.
  • Some implementations store information about when the video program has been or will be broadcast. Some implementations focus on video programs that are broadcast on a predefined schedule, and thus multiple viewers are viewing the same video program at the same time. Different techniques are applied to use video on demand (VOD) data, and may not use a broadcast data table 336.
  • Figure 5 illustrates a skeletal data structure for storing broadcast data 336.
  • Broadcast data 336 includes a program ID 330 and a broadcast list 502, which identifies when the video program has been or will be broadcast.
  • each broadcast instance has a start time 504 and an end time 506.
  • each broadcast instance includes a start time 504 and a duration.
  • each broadcast instance includes information 508 that specifies the channel, station, or other source of the broadcast.
  • each broadcast instance includes information 510 that specifies the geographic location or region where the broadcast occurred.
  • the information 510 is a broadcast area.
  • each broadcast instance stores the time zone 512 of the broadcast. For some video programs that have already been broadcast, viewership information 514 is collected and stored. The viewership information may include the number of viewers, the relative percent of viewers, and may be further subdivided based on demographic characteristics or geographic region.
  • the database 118 stores a TV viewing log, which identifies what programs a user has watched. This information may be provided to the server system 114 by the client application 104, or may be included in a search query submitted by the user. In some implementations, a user registers to have television viewing tracked (e.g., as part of a single source panel).
  • the database 118 stores calculated video program popularity data 342. As explained below in Figure 6, this information may be used by the media subsystem 126 to select relevant video program fingerprints for each user.
  • the database 118 stores a search query log 120.
  • each search query is assigned a unique query ID 344 (e.g., globally unique).
  • the log stores various search query data 346.
  • Each query includes a set of query terms, which may be parsed to eliminate punctuation. In some implementations, typographical errors are retained.
  • the query data 346 typically includes a timestamp that specifies when the query was issued.
  • the timestamp is based on the user time zone, which is also stored.
  • the timestamp represents a server generated timestamp indicating when the query was received.
  • Some server systems 114 include one or more servers 300 that accurately manage timestamps in order to guarantee both accuracy of the data as well as sequential consistency.
  • a server timestamp together with the user time zone allows the server system to accurately know when each query was submitted according to the user's local time, and does not rely on the user's client device 102.
  • the query data includes the user's IP address and the user's geographic location. The set of possible values for the user's geographic location typically corresponds to the same set of values for the geographic location or region 510 used for video broadcasts.
  • the database 118 stores user profiles 124.
  • a user profile 124 may include data explicitly provided by a user (e.g., preferences for specific television programs or genres).
  • user preferences are inferred based on television programs a user actually watches or based on submitted search queries.
  • Figure 6 illustrates a process of providing context-aware information to a user of a client device 102.
  • a media content provider 110 provides (602) media content 334 to a capture module 322 within the media subsystem 126.
  • the media content 334 may be provided in various forms, such as televised RF signals, electrical signals over a cable, IP packets over an IP network, or raw content from a video library.
  • the capture module 322 receives the media content 334, and extracts audio signals, and forwards (604) the audio signals to a fingerprint module 324.
  • the fingerprint module 324 takes the audio and computes one or more audio fingerprints. For example, portions of a video program may be partitioned into 30-second segments, and an audio fingerprint computed for each of the segments. The audio fingerprints may be computed and stored in any known format, as long as the format is consistent with the format used by the local fingerprint module 226. The audio fingerprints computed by the fingerprint module 324 are sent (606) to the matching module 326 for review. For each video program, it is useful to have an audio fingerprint that uniquely identifies the video program.
  • the matching module 326 identifies theme music or jingles by comparing and matching audio fingerprints from multiple episodes. This matching process thus identifies audio portions that uniquely identify the video program (e.g., the theme song for American Idol). Note that the matching process does not necessarily know beforehand which broadcasts are episodes of the same series.
  • a different process is used because there are not multiple episodes to compare.
  • multiple audio samples are taken from an early portion of the movie (e.g., ten 30-second segments from the first five minutes). From this set of samples, one is selected that is the most unique.
  • Some implementations use a large indexed library of audio fingerprints in order to select audio fingerprints that are the most unique.
  • the process of capturing, computing audio fingerprints, and matching fingerprints to identify theme songs or theme music can be repeated many times. At some interval (e.g., once a day or once a week), the fingerprint selection module 328 takes (608) the matched audio fingerprints (and representative audio fingerprints for movies), and selects a subset to transmit to each user. The selection process may use various criteria, but generally limits the selected subset to a small number (e.g., 50 or 100).
  • the selection criteria may use information about what shows have been or will be broadcast in the region where the user lives (e.g., based on the geographic location corresponding to the user's IP address), viewership or popularity information about the broadcast programs, the user's history of TV viewing, the user's history of submitted queries, information in a user profile, information from social media sites that illustrate a user's likes or dislikes, and so on.
  • the selected subset of fingerprints (and information to correlate the fingerprints to video programs) is sent (610) to the client device 102 and received by the client application 104 in the client environment 100.
  • the client application 104 stores the fingerprints and correlating information in its memory 214 (e.g., in non-volatile storage).
  • the client device 102 activates the microphone
  • the captured audio is sent (614) to the local fingerprint module 226, which computes one or more fingerprints from the captured audio. In some implementations, the captured audio is broken into segments for fingerprinting (e.g., 30 second segments). The computed fingerprints are then sent (616) to the local matching module 228.
  • the local matching module 228 compares the audio fingerprints received from the local fingerprint module 226 to the fingerprints received from the media subsystem 126. A detected match indicates what show the user is watching, and that information is stored in the memory 214 of the client device.
  • context-aware information is provided (618) to the user interface 206 on the client device 102 in various ways.
  • the stored information about what video program the user is watching is included with the query so that the search engine can provide more relevant search results.
  • an auto-complete feature uses the information about what show the user is watching to complete words or phrases (e.g., the name of the show, the name of an actor or actress, the name of a character in the show, or the name of a significant entity in the show, such as the Golden Gate Bridge or Mount Rushmore).
  • the client application transmits the name of the program the user is watching to the server system even without a search query, and the user receives information about the program (e.g., more information about the video program or links to specific types of information).
  • Figures 7A and 7B provide a flowchart of a process 700, performed by a client device 102 for providing (702) context-aware information.
  • the method is performed (704) by a client device with one or more processors, a microphone, and memory.
  • the memory stores (704) programs configured for execution by the one or more processors.
  • the process receives (706) audio fingerprints for a plurality of video programs and information that correlates each respective received audio fingerprint to a respective video program.
  • a video program can be an individual movie, a television series, a video documentary, and so on.
  • the term "video program" typically refers to the series instead of an individual episode in the series.
  • Each audio fingerprint corresponds to a video program, and the correspondence is typically unique (i.e., an audio fingerprint identifies a single video program).
  • the audio from a video program is divided into segments (e.g., 15 seconds, 30 seconds, or a minute), and a distinct audio fingerprint computed for each of the segments.
  • audio fingerprints may be computed at both a client device 102 as well as at a server system 114, so the formats used for the audio fingerprints at the client device 102 and at the server system 114 are the same or at least functionally compatible.
  • the received audio fingerprints correspond to video programs that the user of the client device is reasonably likely to watch in the near future (e.g., in the coming week). Here, reasonably likely may mean a 25% chance or higher, or greater than 10%.
  • the received audio fingerprints are received (708) from a media server (e.g., media subsystem 126) and are preselected by the media server according to a set of relevancy criteria.
  • preselecting the set of audio fingerprints according to the set of relevancy criteria includes (710) limiting the selected set to a predefined maximum number.
  • the preselected number is (712) one hundred.
  • Other implementations set a lower or higher limit (e.g., 50 or 200).
  • the limit applies to video programs, but in other implementations, the limit applies to the number of computed audio fingerprints.
  • limiting the number of video programs to 100 is roughly the same as limiting the number of audio fingerprints to 500.
  • Some implementations use a threshold probability of watching rather than a predefined maximum number. For example, select all audio fingerprints corresponding to video programs for which the estimated probability of watching is at least 10%.
  • Implementations use various selection criteria as described below.
  • an individual criterion is used by itself to identify a video program for inclusion in the preselected set.
  • multiple criteria are evaluated together to identify video programs for inclusion in the preselected set.
  • a score is computed for each video program based on the relevancy criteria (e.g., with each criterion contributing to an overall weighted score), and the scores enable selection of a specific number (e.g., the top 100) or those with scores exceeding a threshold value.
  • the relevancy criteria include (714) stored preferences of the user, which may be stored in a user profile 124.
  • a user may have preferences for (or against) specific programs, specific genres, or specific actors or actresses.
  • the user preferences are explicitly entered by the user.
  • user preferences may be inferred based on other data, such as previous programs viewed (e.g., as saved in a TV viewing log 340) or search queries previously submitted by the user (e.g., as saved in a search query log 120).
  • the relevancy criteria select (716) one or more of the audio fingerprints based on prior search queries by the user (e.g., in the search query log 120). For example, previous search queries may identify specific TV programs, the names of actors in a program, or the names of characters in a program.
  • video programs are selected (718) based on the popularity of the video programs.
  • popularity of a video program is computed for smaller groups of people, such as people in specific geographic areas or with certain demographic characteristics.
  • people are grouped based on other criteria, such as identified interests.
  • popularity for a video program is computed for each individual user based on the popularity of the program among the user's circle of friends (e.g., in a social network).
  • video programs are selected (720) based on previous viewing by the user. For example, if a user has already viewed one or more episodes of a TV series, the user is more likely to watch additional episodes of the same TV series. Similarly, if a user has watched a specific movie, the user is more likely to watch related movies (or even the same movie), movies of the same genre, sequels, etc.
  • the process 700 stores (722) the received audio fingerprints and correlating information in the memory 214 of the client device 102 (e.g., non-volatile memory).
  • the received audio fingerprints and correlating information may be appended to information previously received (e.g., receiving additional fingerprints daily or weekly). In some implementations, some of the older fingerprints are deleted after a period of non-use.
  • an application 104 opens up the microphone 203 on the client device 102 to detect (724) ambient sound.
  • detecting (724) ambient sounds occurs immediately after storing (722) the received audio fingerprints, but in other instances, detecting (724) may occur much later (e.g., hours or days). Note that the detecting (724) may start before storing the received audio fingerprints.
  • the local fingerprint module 226 computes (726) one or more sample audio fingerprints from the detected ambient sound. Each audio fingerprint typically corresponds to a short segment of time, such as 20 seconds or 30 seconds.
  • the local matching module 228 matches a sample audio fingerprint to a first stored audio fingerprint and uses the correlating information to identify a first video program corresponding to the matched sample audio fingerprint.
  • the client application has identified what video program the user is watching without transmitting information or audio to an external server.
  • the first video program is (730) a televised television program.
  • the first video program is (732) a movie, which may be broadcast, streamed from an online source, or played from a physical medium, such as a DVD.
  • the video program includes (734) a plurality of episodes of a television series.
  • the matching process identifies the series, but not necessarily the episode.
  • the process 700 provides (736) the user with information related to the matched first video program.
  • the user is provided (738) with information related to the first video program in response to submission of a search query, where the search results are adapted to the first video program.
  • the search query is transmitted to the server system 114, the name of the matched video program (or an identifier of the video program) is included with the search query. Because of this, the query processing module 116 is aware of the query context, and thus able to provide more relevant search results.
  • the search results include an information card about the matched video program and/or links to further information about the matched video program.
  • the information related to the first video program includes (740) information about cast members of the video program or information about the characters in the video program.
  • providing the user with information related to the first video program includes providing (742) auto-complete suggestions for a search query that the user is entering.
  • the auto-complete suggestions are (742) based on the first video program.
  • the auto-complete suggestions include (744) the video program name corresponding to the first video program, names of actors in the first video program, and/or names of characters in the first video program.
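The auto-complete behavior described in the last few items can be illustrated with a minimal sketch. It assumes the client already holds the name, actors, and characters of the locally matched program; the `ProgramContext` class, the `autocomplete` function, and the sample names are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class ProgramContext:
    """Hypothetical container for the program the device has matched locally."""
    name: str
    actors: list[str] = field(default_factory=list)
    characters: list[str] = field(default_factory=list)


def autocomplete(prefix: str, context: ProgramContext, limit: int = 5) -> list[str]:
    """Return context-derived suggestions that start with the typed prefix."""
    typed = prefix.strip().lower()
    if not typed:
        return []
    # Candidate order: program name first, then actors, then characters.
    candidates = [context.name, *context.actors, *context.characters]
    matches = [c for c in candidates if c.lower().startswith(typed)]
    return matches[:limit]


# Placeholder data, invented purely for illustration.
context = ProgramContext(
    name="Example Show",
    actors=["Alex Example", "Jordan Sample"],
    characters=["Captain Placeholder"],
)
suggestions = autocomplete("ex", context)
```

A production system would presumably merge such suggestions with general-purpose completions; this sketch only shows how the matched program can narrow the candidate set.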

Abstract

A process provides a user with context-aware information. The process is performed at a client device with one or more processors, a microphone, and memory. The memory stores one or more programs configured for execution by the one or more processors. The process receives audio fingerprints for a plurality of video programs and information that correlates each respective received audio fingerprint to a respective video program, and stores the received audio fingerprints and correlating information in memory. The process detects ambient sound using the microphone, and computes one or more sample audio fingerprints from the detected ambient sound. The process matches one of the sample audio fingerprints to a first stored audio fingerprint and uses the correlating information to identify a first video program corresponding to the matched sample audio fingerprint. The process then provides the user with information related to the first video program.

Description

Systems and Methods for Locally Detecting Consumed Video Content
TECHNICAL FIELD
[0001] The disclosure relates generally to identifying video programs, and more specifically to providing a user with context-aware information based on identifying video content consumed by the user.
BACKGROUND
[0002] People watch a lot of television every day, and therefore many users submit search queries to a search engine while watching TV. Knowing the context that the user is in while making a search query can help provide better and more contextual results. For example, if the search engine knows what TV program a person is watching, the search engine can provide search results that are more relevant, or even predict what the user may search for while watching that content.
[0003] Some systems receive explicit information from a user to identify the user's context, but such systems are burdensome for users. Other systems provide an opt-in feature where users choose to have their ambient sounds monitored. When the feature is enabled by a user, the sounds are collected and sent to a server (e.g., once a minute or once every five minutes), where they are analyzed and compared against a large database of known audio from video programs. When a match is found, the server is able to identify what video program is being presented in the vicinity of the user. Such a system has several drawbacks. First, the frequent transmissions of data to the server consume considerable energy, and thus reduce battery life of the user's client device. Second, such a system is either burdensome (requiring periodic permission to continue tracking), or else creates privacy concerns by keeping the collection open too long.
SUMMARY
[0004] Disclosed implementations address the above deficiencies and other problems associated with providing a user with context-aware information. In some implementations, a media server finds repeated segments of audio across many episodes of the same show (e.g., a theme song or a jingle). The server computes audio fingerprints for these segments and sends the fingerprints to a user's client device (typically a mobile device, such as a smart phone). The user's client device then continuously (or periodically) performs local matching of those fingerprints on the user's client device against computed fingerprints of the ambient sound. In this way, sound at the client device is not transmitted to a server. This has several benefits. First, this provides greater respect for the user's privacy while simultaneously being less of a burden on the user. Second, because the computing and matching of fingerprints is done locally, there is no need to keep a network connection open, which results in less consumption of battery life. When the user issues a search query, the information regarding what television program the user is watching can be included, and thus the search engine is able to provide better context-aware search results.
[0005] In some implementations, a process runs on a server to identify a set of audio fingerprints that will be transmitted to a client device for matching. Rather than sending all possible audio fingerprints of video programs, the set transmitted to each client device is typically limited to a small number corresponding to video programs that a user is likely to watch.
[0006] The server collects audio content from live TV broadcasts (e.g., using a TV capture system) as well as on-demand video content libraries. The server identifies theme songs, jingles, and other audio samples that commonly occur in many episodes of the same TV show. For movies, a short sample (e.g., 30 seconds) may be taken from some point in the first 5 minutes. Some implementations select the point to take the sample based on the audio level at the time offset and/or how unique the content is (e.g., only samples that do not match any other TV show or movie are picked).
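One way to picture the identification of recurring audio is the sketch below. It assumes that each episode has already been reduced to a list of per-segment fingerprints, that the episodes are already grouped by show, and that identical audio yields identical integer fingerprints. Real fingerprints would likely need approximate matching, and the function name and the 80% threshold are hypothetical.

```python
from collections import Counter


def recurring_fingerprints(episodes: list[list[int]], min_fraction: float = 0.8) -> set[int]:
    """Return segment fingerprints that appear in at least min_fraction of episodes."""
    if not episodes:
        return set()
    counts: Counter = Counter()
    for segments in episodes:
        # Count each fingerprint once per episode, however often it repeats inside it.
        counts.update(set(segments))
    needed = min_fraction * len(episodes)
    return {fp for fp, n in counts.items() if n >= needed}


# Three toy episodes; 0xA1 stands in for a theme-song segment shared by all of them.
episodes = [
    [0xA1, 0x10, 0x11],
    [0xA1, 0x20, 0x21],
    [0xA1, 0x30, 0x10],
]
theme = recurring_fingerprints(episodes)
```

Here only `0xA1` survives, because `0x10` appears in two of three episodes, which is below the assumed threshold.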
[0007] The server then computes audio fingerprints for these common audio samples, which will be compared with ambient audio from a microphone associated with a user's client device. Some implementations compute audio fingerprints using a format that minimizes the CPU usage of a client device to compute and compare audio fingerprints. In particular, some implementations use a format that minimizes the size of the audio fingerprints. Some implementations select small audio samples to reduce CPU usage.
[0008] There are many TV programs and many movies, but it would require excessive resources (e.g., network bandwidth, client device memory, client device CPU capacity, and client device battery) to download all of them and compare ambient sound at a client device against all of the possibilities. In some implementations, the server selects a subset of TV shows and movies whose fingerprints will be sent to a user's client device. Some implementations limit the audio fingerprints sent to a client device based on the number of independent video programs (a single video program has one or more audio fingerprints). In some implementations, the number of video programs for which audio fingerprints are transmitted is limited to a predetermined number (e.g., 100 or 200). Some implementations use various factors in the selection process, some of which are specific to an individual user, and some of which apply to a group of users (or all users).
[0009] In some implementations, the selection criteria include determining whether certain content (e.g., any episode of a video program) aired on TV during the previous week at a user's geographic location. In some implementations, the selection criteria include determining whether certain content was recently aired, and if so, the relative size of the viewership. In some implementations, the selection criteria include determining whether certain content is going to be aired on TV in the coming week. In some implementations, the selection criteria include determining whether the user watched the TV show before (e.g., a different episode of the same video program). In some implementations, the selection criteria include determining whether the user showed interest in that TV show before (e.g., searched for the show using a search engine, set a calendar reminder for the show, followed the show on a social networking site, or expressed interest in the show on a social networking site). In some implementations, the selection criteria use a user's personal profile. In some implementations, the selection criteria include determining popularity of video programs.
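The selection criteria listed above could be combined in many ways. The following is a minimal sketch of one possibility, a weighted score followed by a cap on the number of programs. The `Candidate` fields, the weight values, and the function names are all assumptions made for illustration; the disclosure does not specify any weights.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-user view of one video program."""
    program_id: str
    aired_last_week: bool
    airing_next_week: bool
    watched_before: bool
    showed_interest: bool
    popularity: float  # assumed to be normalized to the range 0.0 to 1.0


# Illustrative weights only; none of these values come from the source.
WEIGHTS = {
    "aired_last_week": 1.0,
    "airing_next_week": 1.5,
    "watched_before": 3.0,
    "showed_interest": 2.0,
    "popularity": 2.0,
}


def score(c: Candidate) -> float:
    """Combine the criteria into a single weighted relevancy score."""
    return (
        WEIGHTS["aired_last_week"] * c.aired_last_week
        + WEIGHTS["airing_next_week"] * c.airing_next_week
        + WEIGHTS["watched_before"] * c.watched_before
        + WEIGHTS["showed_interest"] * c.showed_interest
        + WEIGHTS["popularity"] * c.popularity
    )


def select_programs(candidates: list[Candidate], max_programs: int = 100) -> list[str]:
    """Return the IDs of the highest-scoring programs, capped at max_programs."""
    ranked = sorted(candidates, key=score, reverse=True)
    return [c.program_id for c in ranked[:max_programs]]


candidates = [
    Candidate("A", True, True, True, False, 0.5),
    Candidate("B", False, True, False, False, 0.9),
    Candidate("C", True, False, False, True, 0.1),
]
selected = select_programs(candidates, max_programs=2)
```

The cap of 100 mirrors the example limit given in the text; a threshold on the score would be an equally plausible alternative.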
[0010] The server transmits the selected subset of audio fingerprints to a user's client device (e.g., pushed to the device or pulled by the device by an application running on the device). The process of selecting a subset of audio fingerprints and transmitting them to the user's device is typically done periodically (e.g., once a day or once each week). Fingerprints that already exist on the user's phone are generally not retransmitted. In some implementations, older audio fingerprints are discarded from a user's device when the corresponding video programs are no longer relevant.
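The periodic update can be viewed as a set difference. The sketch below assumes fingerprints are tracked by string identifiers and treats "no longer in the selected set" as the test for relevance, which is only one possible reading of the text; `plan_sync` and the identifiers are hypothetical.

```python
def plan_sync(on_device: set[str], selected: set[str]) -> tuple[set[str], set[str]]:
    """Work out which fingerprint IDs to transmit and which to discard."""
    # Fingerprints already on the device are not retransmitted.
    to_send = selected - on_device
    # Assumption: a fingerprint that is no longer selected is no longer relevant.
    to_discard = on_device - selected
    return to_send, to_discard


on_device = {"fp-news", "fp-quiz", "fp-drama"}
selected = {"fp-quiz", "fp-drama", "fp-movie"}
to_send, to_discard = plan_sync(on_device, selected)
```

In this example only `fp-movie` would be transmitted and `fp-news` would be dropped from the device.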
[0011] At the user's client device, the microphone is opened by the user and kept open. In some implementations, the user's device continuously compares ambient audio captured by its microphone against the fingerprints that were received from the server. Typically this involves computing audio fingerprints for the ambient sound, and comparing those computed fingerprints to the received fingerprints. A match indicates that the user is near a television presenting the corresponding video program. The user is presumed to be watching the video program, which is generally true. The fact that the user is watching a certain TV show is stored on the user's device, and may be used to provide context-aware information to the user. In some implementations, the record indicating that the user is watching the show is stored "permanently" in a log on the device. In some implementations, records about watched shows are deleted after a certain period of time. In some implementations, records about watched shows are deleted N minutes after the end of the show, where N is a predefined number (e.g., 15 minutes, 30 minutes, or 60 minutes).
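The local comparison step might look like the following sketch. It assumes fingerprints are fixed-width bit strings held as Python integers and compared by Hamming distance; the disclosure leaves the fingerprint format open, so the comparison method, the distance threshold, and the names `hamming` and `match_program` are assumptions.

```python
from typing import Optional


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def match_program(sample: int, stored: dict[int, str], max_distance: int = 4) -> Optional[str]:
    """Return the program ID of the closest stored fingerprint, or None if none is close enough."""
    best_program = None
    best_distance = max_distance + 1
    for fingerprint, program_id in stored.items():
        distance = hamming(sample, fingerprint)
        if distance < best_distance:
            best_program, best_distance = program_id, distance
    return best_program


# Fingerprints received from the server, mapped to hypothetical program IDs.
stored = {
    0b1111000011110000: "show-1",
    0b0000111100001111: "show-2",
}
# A sample computed from ambient sound, differing from show-1 in a single bit.
watching = match_program(0b1111000011110001, stored)
```

Everything in this sketch runs on the device, so the captured audio and the sample fingerprint never need to leave it.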
[0012] The context information about the user watching a specific video program can be used in various ways to provide the user with relevant information. In some implementations, when the user submits a search query, and the user is known to be watching a specific video program in the last M minutes (e.g., 30 minutes), that information may be used to provide an information card about the program (e.g., information about the program and its cast, with links to relevant search topics). That is, the client device includes the video program (e.g., program name or identifier) with the search query, and the server uses that knowledge to provide the information card.
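Attaching the watched program to a search query can be sketched as follows, assuming the device keeps the most recent match as a (program ID, timestamp) pair. The payload keys `q` and `program`, the function name, and the 30-minute default are illustrative assumptions and not a documented protocol.

```python
from datetime import datetime, timedelta
from typing import Optional


def build_query(
    terms: str,
    last_match: Optional[tuple[str, datetime]],
    now: datetime,
    window_minutes: int = 30,
) -> dict:
    """Build a query payload, adding the program only if it was matched recently."""
    payload = {"q": terms}
    if last_match is not None:
        program_id, matched_at = last_match
        # Include context only when the match falls within the last M minutes.
        if now - matched_at <= timedelta(minutes=window_minutes):
            payload["program"] = program_id
    return payload


now = datetime(2015, 1, 1, 20, 30)
recent = build_query("lead actor", ("show-1", datetime(2015, 1, 1, 20, 10)), now)
stale = build_query("lead actor", ("show-1", datetime(2015, 1, 1, 19, 0)), now)
```

With a match 20 minutes old the program identifier is included; with a match 90 minutes old the query is sent without context.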
[0013] In some implementations, the server responds by confirming that the user is watching the identified video program (e.g., "Are you watching Big Bang Theory?") and prompts the user to enter a rich experience. For example, the user may enable audio detection, after which audio fingerprint detection may be used to identify the exact episode and time offset that is being watched. This allows the server to provide more detailed and specific information.
[0014] In some implementations, knowledge of what program a user is watching can be used to provide search auto complete suggestions (e.g., auto complete show name, actor names, or character names).
[0015] In accordance with some implementations, a method executes at a client with one or more processors, a microphone, and memory. The memory stores one or more programs configured for execution by the one or more processors. The process receives audio fingerprints for a plurality of video programs and information that correlates each respective received audio fingerprint to a respective video program. In some instances, a video program has two or more correlated audio fingerprints. The process stores the received audio fingerprints and correlating information in the memory. The process detects ambient sound using the microphone, which may include the sound track of a video program being presented in the vicinity of the client device. The process computes one or more sample audio fingerprints from the detected ambient sound, and compares the computed audio fingerprints to the received audio fingerprints. In some instances, the process matches one of the sample audio fingerprints to a first stored audio fingerprint and uses the correlating information to identify a first video program corresponding to the matched sample audio fingerprint. The process then provides the user with information related to the first video program.
[0016] In some implementations, the received audio fingerprints are received from a media server and are preselected by the media server according to a set of relevancy criteria. In some implementations, preselecting the set of audio fingerprints according to the set of relevancy criteria includes limiting the selected set to a predefined maximum number (e.g., 100). In some implementations, preselecting the set of audio fingerprints according to the set of relevancy criteria includes selecting one or more of the audio fingerprints based on stored preferences of the user. In some implementations, preselecting the set of audio fingerprints according to the set of relevancy criteria includes selecting one or more of the audio fingerprints based on prior search queries by the user. In some implementations, preselecting the set of audio fingerprints according to the set of relevancy criteria includes selecting one or more of the audio fingerprints based on popularity of the video programs correlated to the selected one or more audio fingerprints. In some implementations, preselecting the set of audio fingerprints according to the set of relevancy criteria includes selecting one or more of the audio fingerprints based on previous viewing by the user of video programs correlated to the selected one or more audio fingerprints.
[0017] Thus methods and systems are provided that locally detect what video programs a user is watching, and provide context-aware information to the user based on knowledge of those programs.
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] For a better understanding of the aforementioned implementations of the invention as well as additional implementations thereof, reference should be made to the Description of Implementations below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.
[0019] Figure 1 illustrates a context in which some implementations operate.
[0020] Figure 2 is a block diagram of a client device according to some implementations.
[0021] Figure 3 is a block diagram of a server according to some implementations, which may be used in a server system.
[0022] Figures 4 and 5 illustrate various skeletal data structures or tables used by some implementations.
[0023] Figure 6 illustrates a process flow for providing context-aware information in accordance with some implementations.
[0024] Figures 7A and 7B provide a flowchart of a process, performed at a client device, for providing context-aware information about video programs according to some implementations.
[0025] Reference will now be made in detail to implementations, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without these specific details.
DESCRIPTION OF IMPLEMENTATIONS
[0026] Figure 1 is a block diagram that illustrates the major components of some implementations. The various client devices 102 and servers 300 in server system 114 communicate over one or more networks 112 (such as the Internet). A client environment 100 includes a television 108, which is typically connected to a set top box 106 (or a receiver / converter). The set top box 106 receives media content from a content provider 110, such as a cable TV network, a satellite dish network, or broadcast over the airwaves. As illustrated in Figure 1, in some cases the media content is transmitted through the communication networks 112.
[0027] The client environment 100 also includes one or more client devices 102, such as smart phones, tablet computers, laptop computers, or desktop computers. In the context here, the client device is typically in close proximity to the television 108. Running on the client device 102 is a client application 104. The client device 102 includes memory 214, as described in more detail below with respect to Figure 2. In some implementations, the client application runs within a web browser 222. Although only a single client environment 100 is illustrated in Figure 1, there are typically millions of client environments at any time. Different client environments 100 may use different media content providers 110, and may use varying combinations of client devices 102 and boxes 106 that function as receivers, converters, or set top boxes. Although Figure 1 illustrates a single set top box 106, one of skill in the art would recognize that other environments could consist of a plurality of distinct electronic components, such as a separate receiver, a separate converter, and a separate set top box. Also, some or all of the functionality of the set top box 106 (or converter or receiver) may be integrated with the television 108.
[0028] The server system 114 includes a plurality of servers 300, and the servers 300 may be connected by an internal communication network or bus 128. The server system 114 includes a query processing module 116, which receives queries from users (e.g., from client devices 102) and returns responsive query results. The queries are tracked in a search query log 120 in a database 118.
[0029] The server system includes one or more databases 118. The data stored in the database 118 includes a search query log 120, which tracks each search query submitted by a user. In some implementations, the search query log is stored in an aggregated format to reduce the size of storage. The database may include television program information 122. The television program information 122 may include detailed information about each of the programs, including subtitles, as well as broadcast dates and times. Some of the information is described below with respect to Figures 4 and 5. In some implementations, the database 118 stores user profiles 124 for users, which may include preferences explicitly identified by a user, as well as preferences inferred based on submitted search queries or television viewing history.
[0030] The server system 114 also includes a media subsystem 126, which is described in more detail below with respect to Figures 3 and 6. Included in the media subsystem 126 are various modules to capture media content, compute audio fingerprints, and select audio fingerprints that are likely to be relevant for each user.
[0031] Figure 2 is a block diagram illustrating a client device 102 that a user uses in a client environment 100. A client device 102 typically includes one or more processing units (CPUs) 202 for executing modules, programs, or instructions stored in memory 214 and thereby performing processing operations; a microphone 203; one or more network or other communications interfaces 204; memory 214; and one or more communication buses 212 for interconnecting these components. The communication buses 212 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. A client device 102 includes a user interface 206 comprising a display device 208 and one or more input devices or mechanisms 210. In some implementations, the input device/mechanism includes a keyboard and a mouse; in some implementations, the input device/mechanism includes a "soft" keyboard, which is displayed as needed on the display device 208, enabling a user to "press keys" that appear on the display 208.
[0032] In some implementations, the memory 214 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. In some implementations, memory 214 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. In some implementations, the memory 214 includes one or more storage devices remotely located from the CPU(s) 202. The memory 214, or alternately the non-volatile memory device(s) within memory 214, comprises a non-transitory computer readable storage medium. In some implementations, the memory 214, or the computer readable storage medium of memory 214, stores the following programs, modules, and data structures, or a subset thereof:
• an operating system 216, which includes procedures for handling various basic system services and for performing hardware dependent tasks;
• a communications module 218, which is used for connecting the client device 102 to other computers and devices via the one or more communication network interfaces 204 (wired or wireless) and one or more communication networks 112, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
• a display module 220, which receives input from the one or more input devices 210, and generates user interface elements for display on the display device 208;
• a web browser 222, which enables a user to communicate over a network 112 (such as the Internet) with remote computers or devices;
• a client application 104, which may be used in conjunction with a television 108 to provide the user more context-aware information (e.g., information about television programs the user is watching). In some implementations, the client application 104 runs within the web browser 222. In some implementations, the client application 104 runs as an application separate from the web browser. The client application 104 is described in more detail with respect to Figure 6; and
• in some implementations, the client application 104 includes one or more submodules for performing specific tasks. In some implementations, the client application 104 includes a local capture module 224, which captures ambient sounds using the microphone 203. In some implementations, the client application 104 includes a local fingerprint module 226, which takes the captured sounds, and computes audio fingerprints. In some implementations, the client application 104 includes a local matching module 228, which matches the computed audio fingerprints to audio fingerprints received from the media subsystem, thereby determining what video program the user is watching. These submodules are described in more detail below with respect to Figure 6.
[0033] Each of the above identified executable modules, applications, or sets of procedures may be stored in one or more of the previously mentioned memory devices and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, the memory 214 may store a subset of the modules and data structures identified above. Furthermore, the memory 214 may store additional modules or data structures not described above.
[0034] Although Figure 2 shows a client device 102, Figure 2 is intended more as a functional description of the various features that may be present rather than as a structural schematic of the implementations described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated.
[0035] Figure 3 is a block diagram illustrating a server 300 that may be used in a server system 114. A typical server system includes many individual servers 300, potentially hundreds or thousands. A server 300 typically includes one or more processing units (CPUs) 302 for executing modules, programs, or instructions stored in the memory 314 and thereby performing processing operations; one or more network or other communications interfaces 304; memory 314; and one or more communication buses 312 for interconnecting these components. The communication buses 312 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. In some implementations, a server 300 includes a user interface 306, which may include a display device 308 and one or more input devices 310, such as a keyboard and a mouse.
[0036] In some implementations, the memory 314 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. In some implementations, the memory 314 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. In some implementations, the memory 314 includes one or more storage devices remotely located from the CPU(s) 302. The memory 314, or alternately the non-volatile memory device(s) within memory 314, comprises a non-transitory computer readable storage medium. In some implementations, the memory 314, or the computer readable storage medium of memory 314, stores the following programs, modules, and data structures, or a subset thereof:
• an operating system 316, which includes procedures for handling various basic system services and for performing hardware dependent tasks;
• a communications module 318, which is used for connecting the server 300 to other computers via the one or more communication network interfaces 304 (wired or wireless), an internal network or bus 128, or other communication networks 112, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
• a display module 320, which receives input from one or more input devices 310, and generates user interface elements for display on a display device 308;
• a query processing module 116, which receives search queries from the client device 102, and returns responsive search results. In some implementations, each query is logged in the search query log 120;
• a media subsystem 126, which identifies various video programs that may be viewed by a user and transmits audio fingerprints of the video programs to a client device 102 corresponding to the user;
• in some implementations, the media subsystem 126 includes a capture module 322, which captures broadcast video programs and video programs stored in video libraries;
• in some implementations, the media subsystem includes a fingerprint module 324, which computes one or more audio fingerprints for each video program. In some implementations, an audio fingerprint is a small representation of an audio sample, and is relatively unique;
• in some implementations, the media subsystem 126 includes a matching module 326, which compares audio fingerprints to identify matches. In some implementations, the matching module uses fuzzy matching techniques;
• in some implementations, the media subsystem 126 includes a fingerprint selection module 328 (which may also be referred to as a video program selection module), which selects specific audio fingerprints and corresponding video programs based on relevance to a user. For example, there may be hundreds or thousands of TV programs that a user may watch (and many more movies), but a specific user is not equally likely to watch all of the possible video programs. The fingerprint selection module 328 identifies specific video programs (and their corresponding fingerprints) that are more likely to be watched by the user, and transmits the selected fingerprints to a user's client device 102. This is described in more detail with respect to Figure 6; and
• one or more databases 118, which store various data used by the modules described herein.
[0037] Each of the above identified elements in Figure 3 may be stored in one or more of the previously mentioned memory devices. Each executable program, module, or procedure corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, the memory 314 may store a subset of the modules and data structures identified above. Furthermore, the memory 314 may store additional modules or data structures not described above.
[0038] Although Figure 3 illustrates a server 300, Figure 3 is intended more as a functional illustration of the various features that may be present in a set of one or more servers rather than as a structural schematic of the implementations described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. The actual number of servers used to implement these features, and how features are allocated among them, will vary from one implementation to another, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods.
[0039] In some implementations, the database 118 stores video program data 122. Each video program includes a program ID 330, and various other information, which may be subdivided into separate data structures. In some implementations, the video program data 122 includes the video program content 334 (i.e., the video program itself), which includes both audio and video. In some implementations, the audio and video are stored separately. The video program data also includes one or more audio fingerprints 338 for each video program. Typically a single video program will have several stored audio fingerprints.
[0040] In some implementations, the video program data for each program includes a program profile 332, which is described in more detail with respect to Figure 4. The profile includes the program ID 330, which is a unique identifier for each video program. In some implementations, the profile 332 includes a program description 402, which may comprise one or more paragraphs that describe the program. The profile 332 may include cast information 404, which includes details about individual cast members or links to further information about the cast members (e.g., links to cast member web pages). For video programs that are part of a series, some implementations include series information 406 in the profile 332. In some implementations, the profile 332 includes genre information 408, which may include general information about the genre of the video program, and may provide links to additional information. In some implementations, the profile 332 includes related terms 410, which may include key terms that describe the video program or may identify terms that enable a user to identify related content.
[0041] Some implementations store information about when the video program has been or will be broadcast. Some implementations focus on video programs that are broadcast on a predefined schedule, so that multiple viewers are viewing the same video program at the same time. Different techniques are applied to video on demand (VOD) data, which may not use a broadcast data table 336.
[0042] Figure 5 illustrates a skeletal data structure for storing broadcast data 336. Broadcast data 336 includes a program ID 330 and a broadcast list 502, which identifies when the video program has been or will be broadcast. In some implementations, each broadcast instance has a start time 504 and an end time 506. In some implementations, each broadcast instance includes a start time 504 and a duration. In some implementations, each broadcast instance includes information 508 that specifies the channel, station, or other source of the broadcast. In some implementations, each broadcast instance includes information 510 that specifies the geographic location or region where the broadcast occurred. In some implementations, the information 510 is a broadcast area. In some implementations, each broadcast instance stores the time zone 512 of the broadcast. For some video programs that have already been broadcast, viewership information 514 is collected and stored. The viewership information may include the number of viewers, the relative percent of viewers, and may be further subdivided based on demographic characteristics or geographic region.
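The skeletal structure of Figure 5 can be illustrated with a minimal Python sketch. The class names, field names, and the `upcoming` helper are hypothetical and chosen only for illustration; the comments map each field to the reference numeral it is assumed to represent.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class BroadcastInstance:
    """One entry in the broadcast list 502 (field names are hypothetical)."""
    start_time: datetime              # start time 504
    end_time: datetime                # end time 506
    source: Optional[str] = None      # channel, station, or other source 508
    region: Optional[str] = None      # geographic location or region 510
    time_zone: Optional[str] = None   # time zone 512
    viewers: Optional[int] = None     # part of viewership information 514

    @property
    def duration(self) -> timedelta:
        # Some implementations store a duration instead of an end time;
        # either one can be derived from the other plus the start time.
        return self.end_time - self.start_time


@dataclass
class BroadcastData:
    """Broadcast data 336 for a single video program."""
    program_id: str                   # program ID 330
    broadcast_list: List[BroadcastInstance] = field(default_factory=list)

    def upcoming(self, now: datetime) -> List[BroadcastInstance]:
        """Return the broadcast instances that have not yet ended."""
        return [b for b in self.broadcast_list if b.end_time > now]
```

Storing an end time and exposing the duration as a derived property is only one possible choice; an implementation that stores the duration directly would be equally consistent with the description above.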
[0043] In some implementations, the database 118 stores a TV viewing log 340, which identifies what programs a user has watched. This information may be provided to the server system 114 by the client application 104, or may be included in a search query submitted by the user. In some implementations, a user registers to have television viewing tracked (e.g., as part of a single source panel).
[0044] In some implementations, the database 118 stores calculated video program popularity data 342. As explained below in Figure 6, this information may be used by the media subsystem 126 to select relevant video program fingerprints for each user.
[0045] In some implementations, the database 118 stores a search query log 120. In some implementations, each search query is assigned a unique query ID 344 (e.g., globally unique). In addition, the log stores various search query data 346. Each query includes a set of query terms, which may be parsed to eliminate punctuation. In some implementations, typographical errors are retained.
[0046] The query data 346 typically includes a timestamp that specifies when the query was issued. In some implementations, the timestamp is based on the user time zone, which is also stored. In other implementations, the timestamp represents a server generated timestamp indicating when the query was received. Some server systems 114 include one or more servers 300 that accurately manage timestamps in order to guarantee both accuracy of the data as well as sequential consistency. In some implementations, a server timestamp together with the user time zone (as well as knowing the server time zone) allows the server system to accurately know when each query was submitted according to the user's local time, and does not rely on the user's client device 102. In some implementations, the query data includes the user's IP address and the user's geographic location. The set of possible values for the user's geographic location typically corresponds to the same set of values for the geographic location or region 510 used for video broadcasts.
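The conversion from a server timestamp to the user's local time can be sketched as follows. This is a minimal illustration that assumes time zones are represented as fixed UTC offsets; the function name and its parameters are hypothetical, and a real system would more likely use named time zones that account for daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def user_local_time(server_timestamp: datetime,
                    user_utc_offset_hours: float) -> datetime:
    """Convert a timezone-aware server timestamp to the user's local time.

    The user's clock is never consulted; only the stored user time zone
    (here a fixed UTC offset, which is an assumption) is needed.
    """
    if server_timestamp.tzinfo is None:
        raise ValueError("server timestamp must carry the server time zone")
    user_tz = timezone(timedelta(hours=user_utc_offset_hours))
    return server_timestamp.astimezone(user_tz)


# Example: a server at UTC-8 receives a query at 18:30 server time;
# for a user at UTC-5 the same instant is 21:30 local time.
server_tz = timezone(timedelta(hours=-8))
received = datetime(2015, 5, 1, 18, 30, tzinfo=server_tz)
local = user_local_time(received, -5)
```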
[0047] In some implementations, the database 118 stores user profiles 124. A user profile 124 may include data explicitly provided by a user (e.g., preferences for specific television programs or genres). In some implementations, user preferences are inferred based on television programs a user actually watches or based on submitted search queries.
[0048] Figure 6 illustrates a process of providing context-aware information to a user of a client device 102. A media content provider 110 provides (602) media content 334 to a capture module 322 within the media subsystem 126. The media content 334 may be provided in various forms, such as televised RF signals, electrical signals over a cable, IP packets over an IP network, or raw content from a video library. The capture module 322 receives the media content 334, and extracts audio signals, and forwards (604) the audio signals to a fingerprint module 324.
[0049] The fingerprint module 324 takes the audio and computes one or more audio fingerprints. For example, portions of a video program may be partitioned into 30-second segments, and an audio fingerprint computed for each of the segments. The audio fingerprints may be computed and stored in any known format, as long as the format is consistent with the format used by the local fingerprint module 226. The audio fingerprints computed by the fingerprint module 324 are sent (606) to the matching module 326 for review.
[0050] For each video program, it is useful to have an audio fingerprint that uniquely identifies the video program.
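The segmentation step of paragraph [0049] can be sketched in Python. The description does not prescribe a fingerprint format, so `toy_fingerprint` below is a deliberately simple placeholder (one bit per pair of adjacent frames, set when the energy rises) and is not the algorithm of any particular implementation. Both function names are hypothetical.

```python
from typing import List, Sequence


def split_segments(samples: Sequence[float], sample_rate: int,
                   segment_seconds: int = 30) -> List[Sequence[float]]:
    """Partition audio samples into consecutive fixed-length segments.

    A trailing partial segment is dropped.
    """
    size = sample_rate * segment_seconds
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]


def toy_fingerprint(segment: Sequence[float], frames: int = 33) -> int:
    """Placeholder fingerprint, NOT a production algorithm.

    The segment is cut into `frames` equal frames; each adjacent pair of
    frames contributes one bit, set when the energy rises.
    """
    frame_len = len(segment) // frames
    if frame_len == 0:
        raise ValueError("segment too short for the requested frame count")
    energies = [
        sum(x * x for x in segment[k * frame_len:(k + 1) * frame_len])
        for k in range(frames)
    ]
    bits = 0
    for prev, cur in zip(energies, energies[1:]):
        bits = (bits << 1) | (1 if cur > prev else 0)
    return bits
```

Whatever format is used, the same computation would need to run in both the fingerprint module 324 and the local fingerprint module 226 so that the results remain comparable.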
[0051] For a video program that includes multiple episodes (e.g., a TV series), the matching module 326 identifies theme music or jingles by comparing and matching audio fingerprints from multiple episodes. This matching process thus identifies audio portions that uniquely identify the video program (e.g., the theme song for American Idol). Note that the matching process does not necessarily know beforehand which broadcasts are episodes of the same series.
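As a rough sketch of how recurring audio such as theme music might be found, the following Python compares integer fingerprints across episodes using Hamming distance as a stand-in for fuzzy matching. It simplifies the process described above by assuming the episodes are already grouped by series, which the matching process does not necessarily know beforehand. The function names, the bit-string fingerprint representation, and the thresholds are all assumptions.

```python
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def recurring_fingerprints(episodes: Dict[str, List[int]],
                           max_distance: int = 2,
                           min_episodes: int = 2) -> List[int]:
    """Return fingerprints of the first episode that have a near match in
    at least `min_episodes` episodes in total (counting their own)."""
    names = list(episodes)
    if not names:
        return []
    first, others = names[0], names[1:]
    result = []
    for fp in episodes[first]:
        hits = 1 + sum(
            any(hamming(fp, other_fp) <= max_distance
                for other_fp in episodes[name])
            for name in others
        )
        if hits >= min_episodes:
            result.append(fp)
    return result
```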
[0052] For a video program that is a movie, a different process is used because there are not multiple episodes to compare. In some implementations, multiple audio samples are taken from an early portion of the movie (e.g., ten 30-second segments from the first five minutes). From this set of samples, one is selected that is the most unique. Some implementations use a large indexed library of audio fingerprints in order to select audio fingerprints that are the most unique.
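One way to read "most unique" is to pick the candidate fingerprint whose nearest neighbor in the library is farthest away. The sketch below takes that interpretation, which is an assumption, and uses a linear scan over a small list in place of the large indexed library mentioned above. Fingerprints are modeled as integers compared by Hamming distance, and the function names are hypothetical.

```python
from typing import Iterable, List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def most_distinctive(candidates: List[int], library: Iterable[int]) -> int:
    """Pick the candidate that is farthest from its closest library entry."""
    if not candidates:
        raise ValueError("at least one candidate fingerprint is required")
    library = list(library)

    def nearest(fp: int) -> float:
        # Distance to the closest library fingerprint; an empty library
        # makes every candidate equally (infinitely) distinctive.
        return min((hamming(fp, other) for other in library),
                   default=float("inf"))

    return max(candidates, key=nearest)
```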
[0053] The process of capturing, computing audio fingerprints, and matching fingerprints to identify theme songs or theme music can be repeated many times. At some interval (e.g., once a day or once a week), the fingerprint selection module 328 takes (608) the matched audio fingerprints (and representative audio fingerprints for movies), and selects a subset to transmit to each user. The selection process may use various criteria, but generally limits the selected subset to a small number (e.g., 50 or 100). The selection criteria may use information about what shows have been or will be broadcast in the region where the user lives (e.g., based on the geographic location corresponding to the user's IP address), viewership or popularity information about the broadcast programs, the user's history of TV viewing, the user's history of submitted queries, information in a user profile, information from social media sites that illustrate a user's likes or dislikes, and so on. The selected subset of fingerprints (and information to correlate the fingerprints to video programs) is sent (610) to the client device 102 and received by the client application 104 in the client environment 100. The client application 104 stores the fingerprints and correlating information in its memory 214 (e.g., in non-volatile storage).
[0054] When permitted by the user, the client device 102 activates the microphone 203 and ambient sounds are received (612) by the local capture module 224. In some instances, some of the ambient sound comes from a television 108 that is near the client device 102. The captured audio is sent (614) to the local fingerprint module 226, which computes one or more fingerprints from the captured audio. In some implementations, the captured audio is broken into segments for fingerprinting (e.g., 30 second segments). The computed fingerprints are then sent (616) to the local matching module 228.
[0055] The local matching module 228 compares the audio fingerprints received from the local fingerprint module 226 to the fingerprints received from the media subsystem 126. A detected match indicates what show the user is watching, and that information is stored in the memory 214 of the client device.
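The local comparison can be sketched as a nearest-neighbor lookup performed entirely on the client device. The sketch assumes integer fingerprints compared by Hamming distance with a tolerance; the description does not specify how the local matching module 228 compares fingerprints, so the function names, the `stored` mapping, and the threshold are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def match_program(sample_fp: int, stored: Dict[int, str],
                  max_distance: int = 3) -> Optional[str]:
    """Return the program ID of the closest stored fingerprint, or None.

    `stored` maps each fingerprint received from the media subsystem to
    its program ID (the correlating information). No network call is made.
    """
    best_program = None
    best_distance = max_distance + 1
    for fp, program_id in stored.items():
        d = hamming(sample_fp, fp)
        if d < best_distance:
            best_program, best_distance = program_id, d
    return best_program
```

Because the stored set is small (on the order of a hundred programs), a linear scan of this kind would be inexpensive on a phone or tablet.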
[0056] Subsequently, context-aware information is provided (618) to the user interface 206 on the client device 102 in various ways. In some instances, when a user submits a query to the server system, the stored information about what video program the user is watching is included with the query so that the search engine can provide more relevant search results. In some instances, as a user is entering a search query, an auto-complete feature uses the information about what show the user is watching to complete words or phrases (e.g., the name of the show, the name of an actor or actress, the name of a character in the show, or the name of a significant entity in the show, such as the Golden Gate Bridge or Mount Rushmore). In some implementations, the client application transmits the name of the program the user is watching to the server system even without a search query, and the user receives information about the program (e.g., more information about the video program or links to specific types of information).
[0057] Figures 7A and 7B provide a flowchart of a process 700, performed by a client device 102 for providing (702) context-aware information. The method is performed (704) by a client device with one or more processors, a microphone, and memory. The memory stores (704) programs configured for execution by the one or more processors.
[0058] The process receives (706) audio fingerprints for a plurality of video programs and information that correlates each respective received audio fingerprint to a respective video program. A video program can be an individual movie, a television series, a video documentary, and so on. For a series that includes multiple episodes, the term "video program" typically refers to the series instead of an individual episode in the series. Each audio fingerprint corresponds to a video program, and the correspondence is typically unique (i.e., an audio fingerprint identifies a single video program). However, there are generally multiple audio fingerprints for each video program. Commonly, the audio from a video program is divided into segments (e.g., 15 seconds, 30 seconds, or a minute), and a distinct audio fingerprint computed for each of the segments. One of skill in the art recognizes that there are many distinct formats for audio fingerprints and many distinct formulas or techniques that may be used to compute audio fingerprints. As disclosed herein, audio fingerprints may be computed at both a client device 102 as well as at a server system 114, so the formats used for the audio fingerprints at the client device 102 and at the server system 114 are the same or at least functionally compatible.
[0059] The received audio fingerprints correspond to video programs that the user of the client device is reasonably likely to watch in the near future (e.g., in the coming week). Here, reasonably likely may mean a 25% chance or higher, or greater than 10%.
[0060] In some implementations, the received audio fingerprints are received (708) from a media server (e.g., media subsystem 126) and are preselected by the media server according to a set of relevancy criteria. In some implementations, preselecting the set of audio fingerprints according to the set of relevancy criteria includes (710) limiting the selected set to a predefined maximum number. For example, in some implementations, the preselected number is (712) one hundred. Other implementations set a lower or higher limit (e.g., 50 or 200). In some implementations, the limit applies to video programs, but in other implementations, the limit applies to the number of computed audio fingerprints. For example, if each video program has approximately 5 audio fingerprints, then limiting the number of video programs to 100 is roughly the same as limiting the number of audio fingerprints to 500. Some implementations use a threshold probability of watching rather than a predefined maximum number. For example, select all audio fingerprints corresponding to video programs for which the estimated probability of watching is at least 10%.
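The two selection modes just described (a predefined maximum number versus a probability threshold) can be contrasted in a short sketch. It assumes that an estimated probability of watching is already available for each program; how that estimate is produced is outside the sketch, and the function names are hypothetical.

```python
from typing import Dict, List


def select_top_n(watch_probability: Dict[str, float],
                 limit: int = 100) -> List[str]:
    """Keep at most `limit` programs, most likely first."""
    ranked = sorted(watch_probability, key=watch_probability.get,
                    reverse=True)
    return ranked[:limit]


def select_by_threshold(watch_probability: Dict[str, float],
                        threshold: float = 0.10) -> List[str]:
    """Keep every program whose estimated probability of being watched
    is at least `threshold`, regardless of how many there are."""
    return [program for program, p in watch_probability.items()
            if p >= threshold]


def fingerprint_budget(program_limit: int,
                       fingerprints_per_program: int) -> int:
    """Approximate fingerprint count implied by a limit on programs."""
    return program_limit * fingerprints_per_program
```

With a limit of 100 programs and roughly 5 fingerprints per program, `fingerprint_budget(100, 5)` gives the figure of 500 fingerprints mentioned above.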
[0061] Implementations use various selection criteria as described below. In some instances, an individual criterion is used by itself to identify a video program for inclusion in the preselected set. In other instances, multiple criteria are evaluated together to identify video programs for inclusion in the preselected set. In some instances, a score is computed for each video program based on the relevancy criteria (e.g., with each criterion contributing to an overall weighted score), and the scores enable selection of a specific number (e.g., the top 100) or those with scores exceeding a threshold value.
[0062] In some implementations, the relevancy criteria include (714) stored preferences of the user, which may be stored in a user profile 124. For example, a user may have preferences for (or against) specific programs, specific genres, or specific actors or actresses. In some instances, the user preferences are explicitly entered by the user. In some instances, user preferences may be inferred based on other data, such as previous programs viewed (e.g., as saved in a TV viewing log 340) or search queries previously submitted by the user (e.g., as saved in a search query log 120).
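The weighted scoring described in paragraph [0061] can be sketched as a linear combination of per-criterion signals. The signal names, the weights, and the assumption that each signal is a number between 0 and 1 are all illustrative; the description does not specify how individual criteria are quantified or combined.

```python
from typing import Dict, List, Optional


def relevance_score(signals: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of criterion signals; missing signals count as 0."""
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())


def preselect(programs: Dict[str, Dict[str, float]],
              weights: Dict[str, float],
              limit: Optional[int] = None,
              min_score: Optional[float] = None) -> List[str]:
    """Rank programs by score, then apply a score threshold and/or a cap."""
    scored = {pid: relevance_score(sig, weights)
              for pid, sig in programs.items()}
    ranked = sorted(scored, key=scored.get, reverse=True)
    if min_score is not None:
        ranked = [pid for pid in ranked if scored[pid] >= min_score]
    if limit is not None:
        ranked = ranked[:limit]
    return ranked


# Hypothetical weights for the four criteria discussed in this section.
example_weights = {"preference": 0.4, "search": 0.2,
                   "popularity": 0.1, "viewing": 0.3}
```

A negative signal (e.g., a stored preference against a genre) could be expressed in the same scheme by allowing signal values below zero.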
[0063] In some implementations, the relevancy criteria select (716) one or more of the audio fingerprints based on prior search queries by the user (e.g., in the search query log 120). For example, previous search queries may identify specific TV programs, the names of actors in a program, or the names of characters in a program.
[0064] In some implementations, video programs are selected (718) based on the popularity of the video programs. Typically popularity of a video program is computed for smaller groups of people, such as people in specific geographic areas or with certain demographic characteristics. In some implementations, people are grouped based on other criteria, such as identified interests. In some implementations, popularity for a video program is computed for each individual user based on the popularity of the program among the user's circle of friends (e.g., in a social network).
[0065] In some implementations, video programs are selected (720) based on previous viewing by the user. For example, if a user has already viewed one or more episodes of a TV series, the user is more likely to watch additional episodes of the same TV series. Similarly, if a user has watched a specific movie, the user is more likely to watch related movies (or even the same movie), movies of the same genre, sequels, etc.
[0066] The process 700 stores (722) the received audio fingerprints and correlating information in the memory 214 of the client device 102 (e.g., non-volatile memory). The received audio fingerprints and correlating information may be appended to information previously received (e.g., receiving additional fingerprints daily or weekly). In some implementations, some of the older fingerprints are deleted after a period of non-use.
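The append-and-expire behavior can be sketched with a small in-memory store. This is a stand-in for persistent storage in the memory 214; the class name, method names, and the use of numeric timestamps in seconds are assumptions made for illustration.

```python
from typing import Dict, Tuple


class FingerprintStore:
    """In-memory stand-in for fingerprints kept on the client device."""

    def __init__(self, max_idle_seconds: float) -> None:
        self.max_idle_seconds = max_idle_seconds
        # fingerprint -> (program ID, time of last use)
        self._entries: Dict[int, Tuple[str, float]] = {}

    def add(self, received: Dict[int, str], now: float) -> None:
        """Append a newly received batch to what is already stored."""
        for fp, program_id in received.items():
            self._entries[fp] = (program_id, now)

    def mark_used(self, fp: int, now: float) -> None:
        """Record that a stored fingerprint matched at time `now`."""
        program_id, _ = self._entries[fp]
        self._entries[fp] = (program_id, now)

    def prune(self, now: float) -> int:
        """Delete fingerprints unused for too long; return how many."""
        stale = [fp for fp, (_, last_used) in self._entries.items()
                 if now - last_used > self.max_idle_seconds]
        for fp in stale:
            del self._entries[fp]
        return len(stale)

    def __len__(self) -> int:
        return len(self._entries)
```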
[0067] At some point, an application 104 opens up the microphone 203 on the client device 102 to detect (724) ambient sound. In some instances, detecting (724) ambient sounds occurs immediately after storing (722) the received audio fingerprints, but in other instances, detecting (724) may occur much later (e.g., hours or days). Note that the detecting (724) may start before storing the received audio fingerprints.
[0068] The local fingerprint module 226 computes (726) one or more sample audio fingerprints from the detected ambient sound. Each audio fingerprint typically corresponds to a short segment of time, such as 20 seconds or 30 seconds.
[0069] The local matching module 228 matches a sample audio fingerprint to a first stored audio fingerprint and uses the correlating information to identify a first video program corresponding to the matched sample audio fingerprint. In this way, the client application has identified what video program the user is watching without transmitting information or audio to an external server. In some instances, the first video program is (730) a televised television program. In some instances, the first video program is (732) a movie, which may be broadcast, streamed from an online source, or played from a physical medium, such as a DVD. In some instances, the video program includes (734) a plurality of episodes of a television series. In some instances, the matching process identifies the series, but not necessarily the episode.
[0070] At some point after the matching has occurred (e.g., 2 seconds later, a minute later, or half an hour later), the process 700 provides (736) the user with information related to the matched first video program. In some instances, the user is provided (738) with information related to the first video program in response to submission of a search query, where the search results are adapted to the first video program. When the user's search query is transmitted to the server system 114, the name of the matched video program (or an identifier of the video program) is included with the search query. Because of this, the query processing module 116 is aware of the query context, and thus able to provide more relevant search results. In some implementations, the search results include an information card about the matched video program and/or links to further information about the matched video program. In some implementations, the information related to the first video program includes (740) information about cast members of the video program or information about the characters in the video program.
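One way to include the matched program with a search query is sketched below. The parameter names `q` and `program_id` and both helper functions are assumptions for illustration only. The description states that a name or identifier of the matched program accompanies the query, but it does not define a wire format.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_search_request(query: str,
                         matched_program_id: Optional[str]) -> Dict[str, str]:
    """Build request parameters, adding the matched program when one exists."""
    params = {"q": query}
    if matched_program_id is not None:
        # Only the program identifier is attached; no audio or raw
        # fingerprint data is included in the request.
        params["program_id"] = matched_program_id
    return params


def to_query_string(params: Dict[str, str]) -> str:
    """Encode the parameters for use in a URL query string."""
    return urlencode(params)
```

With a matched program, a query such as "who plays the detective" carries enough context for the query processing module to adapt its results. Without a match, the request degrades to an ordinary search.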
[0071] In some implementations, providing the user with information related to the first video program includes providing (742) auto-complete suggestions for a search query that the user is entering. The auto-complete suggestions are (742) based on the first video program. In some instances, the auto-complete suggestions include (744) the video program name corresponding to the first video program, names of actors in the first video program, and/or names of characters in the first video program.
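The auto-complete behavior can be sketched as merging program-derived candidates ahead of ordinary suggestions. The `ProgramInfo` structure, the word-prefix matching rule, and the ordering policy are illustrative assumptions. The names in the usage example are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProgramInfo:
    name: str
    actors: List[str]
    characters: List[str]


def context_suggestions(prefix: str,
                        program: ProgramInfo,
                        generic: List[str],
                        limit: int = 5) -> List[str]:
    """Place program-related suggestions matching the typed prefix ahead of
    the generic suggestions, dropping case-insensitive duplicates."""
    needle = prefix.casefold()
    candidates = [program.name, *program.actors, *program.characters]
    # A candidate matches when any of its words starts with the prefix.
    boosted = [c for c in candidates
               if any(word.startswith(needle) for word in c.casefold().split())]
    seen = {s.casefold() for s in boosted}
    merged = list(boosted)
    for suggestion in generic:
        if suggestion.casefold() not in seen:
            merged.append(suggestion)
            seen.add(suggestion.casefold())
    return merged[:limit]
```

For example, with a matched program named "Harbor Street", an actor "Hal Example", and a character "Captain Hart", typing "ha" would surface all three before generic completions such as "hat sizes".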
[0072] The terminology used in the description of the invention herein is for the purpose of describing particular implementations only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.
[0073] The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations described herein were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various implementations with various modifications as are suited to the particular use contemplated.

Claims

What is claimed is:
1. A method of providing a user with context-aware information, comprising:
at a client device with one or more processors, a microphone, and memory storing one or more programs configured for execution by the one or more processors:
receiving audio fingerprints for a plurality of video programs and information that correlates each respective received audio fingerprint to a respective video program;
storing the received audio fingerprints and correlating information in the memory;
detecting ambient sound using the microphone, wherein the ambient sound includes sound of media being played on a second device in proximity to the client device;
computing one or more sample audio fingerprints from the detected ambient sound;
matching one of the sample audio fingerprints to a first stored audio fingerprint and using the correlating information to identify a first video program corresponding to the matched sample audio fingerprint; and
providing the user with information related to the first video program, wherein the first video program is being played on the second device.
2. The method of claim 1, wherein the first video program is a televised television program.
3. The method of claim 1, wherein the first video program is a movie.
4. The method of claim 1, wherein the first video program comprises a plurality of episodes of a television series.
5. The method of claim 1, wherein the received audio fingerprints are received from a media server and are preselected by the media server according to a set of relevancy criteria.
6. The method of any of claims 1-5, wherein preselecting the set of audio fingerprints according to the set of relevancy criteria comprises limiting the selected set to a predefined maximum number.
7. The method of claim 6, wherein the predefined maximum number is 100.
8. The method of any of claims 1-5, wherein preselecting the set of audio fingerprints according to the set of relevancy criteria comprises selecting one or more of the audio fingerprints based on stored preferences of the user.
9. The method of any of claims 1-5, wherein preselecting the set of audio fingerprints according to the set of relevancy criteria comprises selecting one or more of the audio fingerprints based on prior search queries by the user.
10. The method of any of claims 1-5, wherein preselecting the set of audio fingerprints according to the set of relevancy criteria comprises selecting one or more of the audio fingerprints based on popularity of the video programs correlated to the selected one or more audio fingerprints.
11. The method of any of claims 1-5, wherein preselecting the set of audio fingerprints according to the set of relevancy criteria comprises selecting one or more of the audio fingerprints based on previous viewing by the user of video programs correlated to the selected one or more audio fingerprints.
12. The method of any of claims 1-5, wherein providing the user with information related to the first video program is in response to user submission of a search query, and wherein the information includes search results that are adapted to the first video program.
13. The method of any of claims 1-5, wherein the information related to the first video program includes information about cast members of the video program.
14. The method of any of claims 1-5, wherein providing the user with information related to the first video program comprises providing auto-complete suggestions for a search query, and wherein the auto-complete suggestions are based on the first video program.
15. The method of claim 14, wherein the auto-complete suggestions are selected from the group consisting of a video program name corresponding to the first video program, names of actors in the first video program, and names of characters in the first video program.
16. A client device for providing a user with context-aware information, comprising:
one or more processors;
a microphone;
memory; and
one or more programs stored in the memory configured for execution by the one or more processors, the one or more programs comprising instructions for:
receiving audio fingerprints for a plurality of video programs and information that correlates each respective received audio fingerprint to a respective video program;
storing the received audio fingerprints and correlating information in the memory;
detecting ambient sound using the microphone, wherein the ambient sound includes sound of media being played on a second device in proximity to the client device;
computing one or more sample audio fingerprints from the detected ambient sound;
matching one of the sample audio fingerprints to a first stored audio fingerprint and using the correlating information to identify a first video program corresponding to the matched sample audio fingerprint; and
providing the user with information related to the first video program, wherein the first video program is being played on the second device.
17. The client device of claim 16, wherein the instructions for receiving the audio fingerprints comprise instructions for receiving the audio fingerprints from a media server, wherein the audio fingerprints are preselected by the media server according to a set of relevancy criteria, and wherein preselecting the set of audio fingerprints according to the set of relevancy criteria comprises limiting the selected set to a predefined maximum number.
18. The client device of claim 17, wherein preselecting the set of audio fingerprints according to the set of relevancy criteria comprises selecting one or more of the audio fingerprints based on prior search queries by the user.
19. The client device of claim 16, wherein the instructions for providing the user with information related to the first video program comprise instructions for providing the information in response to user submission of a search query, and providing search results that are adapted to the first video program.
20. A non-transitory computer readable storage medium storing one or more programs configured for execution by a client device having one or more processors, a microphone, and memory, the one or more programs configured for execution by the one or more processors and comprising instructions for:
receiving audio fingerprints for a plurality of video programs and information that correlates each respective received audio fingerprint to a respective video program;
storing the received audio fingerprints and correlating information in the memory;
detecting ambient sound using the microphone, wherein the ambient sound includes sound of media being played on a second device in proximity to the client device;
computing one or more sample audio fingerprints from the detected ambient sound;
matching one of the sample audio fingerprints to a first stored audio fingerprint and using the correlating information to identify a first video program corresponding to the matched sample audio fingerprint; and
providing the user with information related to the first video program, wherein the first video program is being played on the second device.
21. A client device for providing a user with context-aware information, comprising:
one or more processors;
a microphone;
memory; and
one or more programs stored in the memory configured for execution by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-15.
22. A non-transitory computer readable storage medium storing one or more program modules configured for execution by a client device having one or more processors, a microphone, and memory, the one or more program modules configured for execution by the one or more processors, and comprising instructions for causing the client device to perform the method of any of claims 1-15.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/303,506 US9894413B2 (en) 2014-06-12 2014-06-12 Systems and methods for locally detecting consumed video content
PCT/US2015/035162 WO2015191755A2 (en) 2014-06-12 2015-06-10 Systems and methods for locally detecting consumed video content

Publications (2)

Publication Number Publication Date
EP3155822A2 EP3155822A2 (en) 2017-04-19
EP3155822B1 EP3155822B1 (en) 2020-09-30

Family

ID=54601980

Family Applications (1)

Application Number Title Priority Date Filing Date
EP15797471.8A Active EP3155822B1 (en) 2014-06-12 2015-06-10 Systems and methods for locally recognizing viewed video content

Country Status (4)

Country Link
US (4) US9894413B2 (en)
EP (1) EP3155822B1 (en)
CN (1) CN106415546B (en)
WO (1) WO2015191755A2 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9497505B2 (en) * 2014-09-30 2016-11-15 The Nielsen Company (Us), Llc Systems and methods to verify and/or correct media lineup information
US10750236B2 (en) * 2015-04-23 2020-08-18 The Nielsen Company (Us), Llc Automatic content recognition with local matching
US9836535B2 (en) * 2015-08-25 2017-12-05 TCL Research America Inc. Method and system for content retrieval based on rate-coverage optimization
CN106558318B (en) 2015-09-24 2020-04-28 阿里巴巴集团控股有限公司 Audio recognition method and system
US11540009B2 (en) 2016-01-06 2022-12-27 Tvision Insights, Inc. Systems and methods for assessing viewer engagement
WO2017120469A1 (en) 2016-01-06 2017-07-13 Tvision Insights, Inc. Systems and methods for assessing viewer engagement
CN105635793B (en) * 2016-01-22 2018-06-19 天脉聚源(北京)传媒科技有限公司 A kind of method and device for playing program
CN106126617B (en) * 2016-06-22 2018-11-23 腾讯科技(深圳)有限公司 A kind of video detecting method and server
JP6530820B2 (en) * 2016-10-31 2019-06-12 北京小米移動軟件有限公司Beijing Xiaomi Mobile Software Co.,Ltd. Multimedia information reproducing method and system, collecting device, standard server
WO2018195391A1 (en) * 2017-04-20 2018-10-25 Tvision Insights, Inc. Methods and apparatus for multi-television measurements
KR102640422B1 (en) 2018-12-04 2024-02-26 삼성전자주식회사 Method for contents casting and electronic device therefor
US20210352368A1 (en) * 2019-12-12 2021-11-11 Martin Hannes Hybrid data collection system to create an integrated database of connected and detailed consumer video viewing data in real-time
US11785280B1 (en) * 2021-04-15 2023-10-10 Epoxy.Ai Operations Llc System and method for recognizing live event audiovisual content to recommend time-sensitive targeted interactive contextual transactions offers and enhancements
CN115119041A (en) * 2022-06-17 2022-09-27 深圳创维-Rgb电子有限公司 Cross-screen playing control method, device, equipment and computer storage medium

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030028796A1 (en) * 2001-07-31 2003-02-06 Gracenote, Inc. Multiple step identification of recordings
CN101309386A (en) * 2007-05-14 2008-11-19 深圳Tcl工业研究院有限公司 Method and television recording program receiving history of user
US8190627B2 (en) * 2007-06-28 2012-05-29 Microsoft Corporation Machine assisted query formulation
CN101673265B (en) * 2008-09-12 2012-09-05 未序网络科技(上海)有限公司 Video content searching device
US20100205628A1 (en) 2009-02-12 2010-08-12 Davis Bruce L Media processing methods and arrangements
US9110990B2 (en) * 2009-04-03 2015-08-18 Verizon Patent And Licensing Inc. Apparatuses, methods and systems for improving the relevancy of interactive program guide search results on a wireless user's handset and television
US8677400B2 (en) * 2009-09-30 2014-03-18 United Video Properties, Inc. Systems and methods for identifying audio content using an interactive media guidance application
US8161071B2 (en) * 2009-09-30 2012-04-17 United Video Properties, Inc. Systems and methods for audio asset storage and management
WO2011046719A1 (en) 2009-10-13 2011-04-21 Rovi Technologies Corporation Adjusting recorder timing
US9361387B2 (en) * 2010-04-22 2016-06-07 Microsoft Technology Licensing, Llc Context-based services
US20120191231A1 (en) * 2010-05-04 2012-07-26 Shazam Entertainment Ltd. Methods and Systems for Identifying Content in Data Stream by a Client Device
US8694533B2 (en) 2010-05-19 2014-04-08 Google Inc. Presenting mobile content based on programming context
US8522283B2 (en) * 2010-05-20 2013-08-27 Google Inc. Television remote control data transfer
US20130204415A1 (en) 2011-03-25 2013-08-08 Verisign, Inc. Systems and methods for using signal-derived segmented identifiers to manage resource contention and control access to data and functions
US8843584B2 (en) * 2011-06-02 2014-09-23 Google Inc. Methods for displaying content on a second device that is related to the content playing on a first device
CN102999493B (en) * 2011-09-08 2018-07-03 北京小度互娱科技有限公司 A kind of method and apparatus for being used to implement video resource recommendation
WO2013040533A1 (en) * 2011-09-16 2013-03-21 Umami Co. Second screen interactive platform
US8949872B2 (en) * 2011-12-20 2015-02-03 Yahoo! Inc. Audio fingerprint for content identification
US9292894B2 (en) * 2012-03-14 2016-03-22 Digimarc Corporation Content recognition and synchronization using local caching
US9703932B2 (en) 2012-04-30 2017-07-11 Excalibur Ip, Llc Continuous content identification of broadcast content
US8819737B2 (en) * 2012-08-27 2014-08-26 At&T Intellectual Property I, L.P. System and method of content acquisition and delivery
US9547647B2 (en) * 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
WO2014139120A1 (en) * 2013-03-14 2014-09-18 Microsoft Corporation Search intent preview, disambiguation, and refinement

Also Published As

Publication number Publication date
US20180167676A1 (en) 2018-06-14
US11924507B2 (en) 2024-03-05
WO2015191755A2 (en) 2015-12-17
US20200053424A1 (en) 2020-02-13
WO2015191755A3 (en) 2016-03-17
CN106415546B (en) 2019-10-25
US20220116682A1 (en) 2022-04-14
US10455281B2 (en) 2019-10-22
US11206449B2 (en) 2021-12-21
US9894413B2 (en) 2018-02-13
US20150365722A1 (en) 2015-12-17
CN106415546A (en) 2017-02-15
EP3155822B1 (en) 2020-09-30

Similar Documents

Publication Publication Date Title
US11924507B2 (en) Adapting search query processing according to locally detected video content consumption
US9396258B2 (en) Recommending video programs
US10509815B2 (en) Presenting mobile content based on programming context
CN106464986B (en) System and method for generating video program snippets based on search queries
US10154310B2 (en) System and method for associating individual household members with television programs viewed
US11743522B2 (en) Systems and methods that match search queries to television subtitles
US11223433B1 (en) Identification of concurrently broadcast time-based media
US20150128186A1 (en) Mobile Multimedia Terminal, Video Program Recommendation Method and Server Thereof

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20170110

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: GOOGLE LLC

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20200123

REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Ref document number: 602015059879

Country of ref document: DE

Free format text: PREVIOUS MAIN CLASS: H04N0021472200

Ipc: G06F0016432000

RIC1 Information provided on ipc code assigned before grant

Ipc: H04N 21/442 20110101ALI20200331BHEP

Ipc: H04N 21/439 20110101ALI20200331BHEP

Ipc: H04N 21/4722 20110101AFI20200331BHEP

Ipc: H04N 21/422 20110101ALI20200331BHEP

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

RIC1 Information provided on ipc code assigned before grant

Ipc: H04N 21/422 20110101ALI20200416BHEP

Ipc: H04N 21/442 20110101ALI20200416BHEP

Ipc: H04N 21/482 20110101ALI20200416BHEP

Ipc: H04N 21/4722 20110101ALI20200416BHEP

Ipc: H04N 21/439 20110101ALI20200416BHEP

Ipc: G06F 16/432 20190101AFI20200416BHEP

Ipc: G06F 16/783 20190101ALI20200416BHEP

INTG Intention to grant announced

Effective date: 20200519

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 1319523

Country of ref document: AT

Kind code of ref document: T

Effective date: 20201015

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602015059879

Country of ref document: DE

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20201230

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20201231

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20201230

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 1319523

Country of ref document: AT

Kind code of ref document: T

Effective date: 20200930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20200930

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210201

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210130

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602015059879

Country of ref document: DE

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

26N No opposition filed

Effective date: 20210701

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20210630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210610

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210630

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210610

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210130

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO

Effective date: 20150610

P01 Opt-out of the competence of the unified patent court (upc) registered

Effective date: 20230508

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200930

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20230626

Year of fee payment: 9

Ref country code: DE

Payment date: 20230626

Year of fee payment: 9

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20230627

Year of fee payment: 9