US7284255B1 - Audience survey system, and system and methods for compressing and correlating audio signals - Google Patents

Audience survey system, and system and methods for compressing and correlating audio signals

Info

Publication number
US7284255B1
Authority
US
United States
Prior art keywords
packet
packets
frequency band
audio
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US09/441,539
Inventor
Steven G. Apel
Stephen C. Kenyon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Assigned to APEL, STEVEN G. Assignment of assignors interest (see document for details). Assignors: KENYON, STEPHEN C.
Priority to US09/441,539 (US7284255B1)
Priority to AU56208/00A (AU5620800A)
Priority to BR0011762-5A (BR0011762A)
Priority to PCT/US2000/016729 (WO2000079709A1)
Priority to EP00941506A (EP1190510A1)
Priority to CA002375853A (CA2375853A1)
Priority to JP2001504616A (JP2003502936A)
Publication of US7284255B1
Application granted
Anticipated expiration
Current legal status: Expired - Fee Related

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04H - BROADCAST COMMUNICATION
    • H04H 60/00 - Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H 60/56 - Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54
    • H04H 60/58 - Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54, of audio
    • H04H 60/35 - Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users
    • H04H 60/38 - Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, for identifying broadcast time or space
    • H04H 60/41 - Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, for identifying broadcast space, i.e. broadcast channels, broadcast stations or broadcast areas
    • H04H 60/44 - Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, for identifying broadcast stations
    • H04H 60/76 - Arrangements characterised by transmission systems other than for broadcast, e.g. the Internet
    • H04H 60/81 - Arrangements characterised by transmission systems other than for broadcast, characterised by the transmission system itself
    • H04H 60/93 - Wired transmission systems
    • H04H 60/94 - Telephonic networks

Definitions

  • FIG. 7 is a flow chart of the audio signal acquisition strategy for the portable monitoring units.
  • the portable monitoring units activate periodically and compute features of the audio in the environment. If there is sufficient audio power the features are compressed and stored.
  • FIG. 8 is a flow chart of procedures used to collect and manage audio features received at central collection sites. This includes the three separate processes of audio collection, feature extraction, and deletion of old feature data.
  • FIG. 9 is a flow chart of the packet identification procedure. Packets are first synchronized with the database. Corresponding data blocks from broadcast audio sources are then matched to find the minimum weighted Euclidean distance to the unknown packet. If this distance is less than a threshold, the unknown packet is identified as matching the broadcast.
  • FIG. 10 is a flow chart of the pattern matching procedure. Unknown feature packets are first zero padded to double their length and then correlated with double length feature segments taken from the reference features on the central computer. The weighted Euclidean distance is then computed from the correlation values and the relative amplitudes of the features stored in the reference patterns.
  • FIG. 11 illustrates the process of averaging successive weighted distances to improve the signal-to-noise ratio and reduce the false detection rate. This is an exponential process where old data have a smaller effect than new data.
  • the audience measurement system consists of a potentially large number of body-worn portable collection units 4 and several central computers 7 located in various markets.
  • the portable monitoring units 4 periodically sample the audio environment and store features representing the structure of the audio presented to the wearer of the device.
  • the central computers continuously capture and store audio features from all available broadcast sources 1 through direct connections to radio and television receivers 6 .
  • the central computers 7 periodically interrogate the portable units 4 while they are idle in docking stations 10 at night via telephone connections and modems 9 .
  • the sampled audio feature packets are then transferred to the central computers for comparison with the broadcast sources. When a match is found, the presumption is that the wearer of the portable unit was listening to the corresponding broadcast station.
  • the resulting identification statistics are used to construct surveys of the listening habits of the users.
  • the portable monitoring units 4 compress the audio feature samples to 200 bytes per sample. Sampling at intervals of one minute, the storage requirements are 200 bytes per minute or 12 kilobytes per hour. During quiet intervals, feature packets are not stored. It is estimated that about 50 percent of the samples will be quiet. The average storage requirement is therefore about 144 kilobytes per day or approximately 1 Megabyte per week.
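As a check, the storage arithmetic above can be reproduced directly (a minimal sketch; the 50 percent quiet fraction is the estimate stated above):

    # Storage budget for one portable monitoring unit.
    PACKET_BYTES = 200        # one compressed feature packet
    PACKETS_PER_HOUR = 60     # one audio sample per minute
    QUIET_FRACTION = 0.5      # quiet packets are discarded, not stored

    per_hour = PACKET_BYTES * PACKETS_PER_HOUR         # 12,000 bytes
    per_day = per_hour * 24 * (1 - QUIET_FRACTION)     # ~144 kilobytes
    per_week = per_day * 7                             # ~1 megabyte
    print(f"{per_hour/1e3:.0f} KB/hour, {per_day/1e3:.0f} KB/day, "
          f"{per_week/1e6:.2f} MB/week")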
  • the portable monitoring units are capable of storing about one month of compressed samples.
  • the number of modems 9 required at the central computer 7 or collection site 33 depends on the number of portable monitoring units 4 .
  • a central computer 7 receives broadcast signals directly and stores feature data continuously on its local disk 8 .
  • the required storage is about 173 Megabytes per day or 1210 Megabytes per week. Data older than one week is deleted.
  • the storage requirements increase. However, even with 500 broadcast sources the system needs only 10 Gigabytes of storage for a week of continuous storage.
  • the recognition process requires that the central computer 7 locate time intervals in the stored feature blocks that are time aligned (within a few seconds) with the unknown feature packet. Since each portable monitoring unit 4 produces one packet per minute, the processing load with 500 broadcast sources is 500 pattern matches per minute or about 8 matches per second for each portable monitoring unit. Assuming that there are 500 portable monitoring units in a market the system must perform about 4000 matches per second.
  • When deployed on a large scale in many markets, the overall system architecture is somewhat different, as illustrated in FIG. 6.
  • the remote computers 33 record the broadcast sources in their particular markets as described above. In addition, they interrogate the portable monitoring units 34 in their area by modem 32 and download the collected feature packets.
  • the signal collection computers 33 are connected to a central site by a wide area data communication network 35 .
  • the central computer site consists of a network 37 of computers 39 that can share the pattern recognition processing load.
  • the local network 37 is connected to the wide area network 35 to allow the central site computers 39 to access the collected feature packets and broadcast feature data blocks.
  • a central computer 39 downloads a day's worth of feature packets from a portable monitoring unit 34 that have been collected by one of the remote computers 33 using modems 32 . Broadcast time segments that correspond to the packet times are then identified and transferred to the central site. The identification is then performed at the central site. Once an initial identification has been made, it is confirmed by matching subsequent packets with broadcast source features from the same channel as the previous recognition. This reduces the amount of data that must be transferred from the remote collection computer to the central site. This is based on the assumption that a listener will continue to listen (or stay tuned) to the same station for some amount of time. When a subsequent match fails, the remaining channels are downloaded for pattern recognition. This continues until a new match has been found. The system then reverts to the single-channel tracking mode.
  • An additional capability of this system configuration is the ability to match broadcast sources in different markets. This is useful where network affiliates may have several different selections of programming.
  • the audio signal received by small microphone 11 in a portable unit is amplified, lowpass filtered, and digitized by an analog to digital converter 13 .
  • the sample rate is 8 kilosamples per second, resulting in a Nyquist frequency of 4 kHz.
  • an analog lowpass filter 12 rejects frequencies greater than about 3.2 kHz.
  • the analog to digital converter 13 sends the audio samples to a digital signal processing microprocessor 17 that performs the audio processing and feature extraction.
  • the first step in this processing is spectrum analysis and partitioning of the audio spectrum into three frequency bands as shown in FIG. 3 .
  • the frequency bands, shown in FIG. 3, have been selected to contain approximately equal power on average.
  • the spectrum analysis is performed by periodically performing Fast Fourier Transforms (FFTs) on blocks of 64 samples. This produces spectra containing 32 frequency “bins”. The power in each bin is found by squaring its magnitude. The power in each band is then computed as the sum of the power in the corresponding frequency bins. A magnitude value is then computed for each band by taking the square root of the integrated power. The mean value of each of these streams is then removed by using a recursive high-pass filter. The data rate and bandwidth must then be reduced. This is accomplished using polyphase decimating lowpass filters. Two filter stages are employed for each of the three feature streams. Each of these filters reduces the sample rate by a factor of five, resulting in a sample rate of 10 samples per second (per stream) and a bandwidth of about 4 Hz. These are the audio data measurements that are used as features in the pattern recognition process.
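A minimal sketch of this feature extraction chain is given below (Python with NumPy/SciPy). The band edges and the 32-sample analysis hop are assumptions: the text defines the bands only by reference to FIG. 3, and a 32-sample hop is one choice that yields the stated 10 samples per second after two divide-by-five stages. SciPy's decimate stands in for the polyphase decimating lowpass filters.

    import numpy as np
    from scipy import signal

    FS = 8000   # audio sample rate (Hz)
    NFFT = 64   # FFT block size -> 32 frequency bins
    HOP = 32    # assumed hop: 250 blocks/s -> 10 features/s after /25
    # Placeholder band edges (FFT bin indices); the real edges are in FIG. 3.
    BANDS = [(1, 6), (6, 14), (14, 28)]

    def extract_features(audio):
        """Return three feature streams at ~10 samples/s from 8 kHz audio."""
        frames = np.lib.stride_tricks.sliding_window_view(audio, NFFT)[::HOP]
        power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # power per bin
        feats = []
        for lo, hi in BANDS:
            mag = np.sqrt(power[:, lo:hi].sum(axis=1))      # band magnitude
            # recursive high-pass filter removes the mean of the stream
            mag = signal.lfilter([1.0, -1.0], [1.0, -0.995], mag)
            # two decimation stages, each reducing the rate by a factor of 5
            mag = signal.decimate(signal.decimate(mag, 5), 5)
            feats.append(mag)
        return np.stack(feats)    # shape (3, n), n at 10 samples per second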
  • a similar process is performed at the central computer site as shown in FIG. 4 .
  • audio signals are obtained from direct connections to radio and television broadcast receivers. Since many audio sources must be collected simultaneously, a set of preamplifiers and analog lowpass filters 20 is included. The outputs of these filters are connected to a channel multiplexer 21 that switches sequentially between each audio signal and sends samples of these signals to the analog to digital converter 22 .
  • a digital signal processor 23 then operates on all of the audio time series waveforms to extract the features.
  • the system employs mu-law compression of the feature data. This reduces the data by a factor of two, compressing a 16-bit linear value to an eight bit logarithmic value. This maintains the full dynamic range while retaining adequate resolution for accurate correlation performance.
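The companding can be sketched as follows; the exact curve is not specified in the text, so the standard mu-law characteristic with mu = 255 is assumed here:

    import numpy as np

    MU = 255.0  # assumed companding constant

    def mulaw_compress(x16):
        """16-bit linear samples -> 8-bit logarithmic samples."""
        x = np.clip(x16.astype(np.float64) / 32767.0, -1.0, 1.0)
        y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
        return np.round(y * 127.0).astype(np.int8)   # one byte per sample

    def mulaw_expand(y8):
        """8-bit logarithmic samples -> 16-bit linear samples."""
        y = y8.astype(np.float64) / 127.0
        x = np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU
        return np.round(x * 32767.0).astype(np.int16)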
  • the same feature processing is used in both the portable monitoring units and the central computers. However, the portable monitoring units capture brief segments of 64 feature samples at intervals of approximately one minute as triggered by a timer in the portable monitoring unit.
  • Central computers record continuous streams of feature data.
  • the portable monitoring unit is based on a low-power digital signal processor of the type that is frequently used in such applications as audio processing for digital cellular telephones. Most of the time this processor is in an idle or sleep condition to conserve battery power. However, an electronic timer operates continuously and activates the DSP at intervals of approximately one minute.
  • the DSP 17 collects about six seconds of audio from the analog to digital converter 13 and extracts audio features from the three frequency bands as described previously. The value of the timer 15 is also read for use in time marking the collected signals.
  • the portable monitoring unit also includes a rechargeable battery 19 and a docking station data interface 18 .
  • the total audio power present in the six-second block is computed to determine if an audio signal is present.
  • the audio signal power is then compared with an activation threshold. If the power is less than the threshold the collected data are discarded, and the DSP 17 returns to the inactive state until the next sampling interval. This avoids the need to store data blocks that are collected while the user is asleep or in a quiet environment. If the audio power is greater than the threshold, then the data block is stored in a non-volatile memory 16 .
  • Feature data to be stored are organized as 64 samples of each of the three feature streams. These data are first mu-law compressed from 16 bit linear samples to 8 bit logarithmic samples. The resulting data packets therefore contain 192 data bytes. The data packets also contain a four-byte unit identification code and a four-byte timer value for a total of 200 bytes per packet. The data packets are stored in a non-volatile flash memory 16 so that they will be retained when power is not applied. After storing the data packet, the unit returns to the sleep-state until the next sampling interval. This procedure is illustrated in flow-chart form in FIG. 7 .
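The 200-byte packet layout translates directly into code; the field order and byte order below are assumptions, since the text specifies only the field sizes:

    import struct

    def build_packet(unit_id, timer_count, features_u8):
        """4-byte unit ID + 4-byte timer value + 192 bytes of mu-law
        compressed features (64 samples x 3 bands) = 200 bytes."""
        assert len(features_u8) == 192
        return struct.pack("<II", unit_id, timer_count) + bytes(features_u8)

    def parse_packet(pkt):
        unit_id, timer_count = struct.unpack_from("<II", pkt)
        return unit_id, timer_count, pkt[8:]   # trailing 192 feature bytes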
  • FIG. 5 is a block diagram of the portable unit docking station 10 .
  • the docking station includes a data interface 28 to the portable unit 4 and a dialup modem 29 that is used to communicate with modems 9 that are connected to the central computer 7 .
  • An AC power supply 31 supplies power to the docking station and also powers a battery charger 30 that is used to recharge the battery 19 in the portable monitoring unit 4 .
  • packets are transferred in reverse order. That is, the newest data packets are transferred first, proceeding backwards in time.
  • the central computer continues to transfer packets until it encounters a packet that has been previously transferred.
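This transfer rule might look as follows; the packet iterator and the use of the stored timer value as the packet identity are hypothetical details invented for illustration:

    def download_new_packets(unit, already_seen_timers):
        """Transfer packets newest-first, stopping at the first packet that
        was already transferred during a previous interrogation."""
        new_packets = []
        for pkt in unit.packets_newest_first():   # hypothetical interface
            if pkt.timer_count in already_seen_timers:
                break                             # older packets already held
            new_packets.append(pkt)
        return list(reversed(new_packets))        # restore chronological order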
  • Each portable monitoring unit 4 optionally includes a motion detector or sensor (not shown) that detects whether or not the device is actually being worn or carried by the user. Data indicating movement of the device are then stored (for later downloading and analysis) along with the audio feature information described above. In one embodiment, audio feature information is discarded or ignored in the survey process if the output of the motion detector indicates that the device 4 was not actually being worn or carried during a significant period of time when the audio information was being recorded.
  • Each portable monitoring unit 4 also optionally includes a receiver (not shown) used for determining the position of the unit (e.g., a GPS receiver, a cellular telephone receiver, etc.). Data indicating position of the device is then stored (for later downloading and analysis) along with the audio feature information described above. In one embodiment, the downloaded position information is used by the central computer to determine which signal collection station's features to access for comparison.
  • In contrast with the portable monitoring units, which sample the audio environment periodically, the central computer must operate continuously, storing feature data blocks from many audio sources. The central computer then compares feature packets that have been downloaded from the portable units with sections of audio files that occurred at the same date and time. There are three separate processes operating in the data collection and storage aspect of central computer operation. The first of these is the collection of digitized audio data and its storage on the disks 8 of the central computer. The second task is the extraction of feature data and the storage of time-tagged blocks of feature data on the disk. The third task is the automatic deletion of feature files that are old enough that they can be considered to be irrelevant (one week). These processes are illustrated in FIG. 8.
  • Audio signals may be received from any of a number of sources including broadcast radio and television, satellite distribution systems, subscription services, and the internet. Digitized audio signals are stored for a relatively short time (along with time markers) on the central computer pending processing to extract the audio features. It is frequently beneficial to directly compute the features in real-time using special purpose DSP boards that combine analog to digital conversion with feature extraction. In this case the temporary storage of raw audio is greatly reduced.
  • the audio feature blocks are computed in the same manner as for the portable monitoring units.
  • the central computer system 7 selects a block of audio data from a particular channel or source and performs a spectrum analysis. It then integrates the power in each of three frequency bands and outputs a measurement. Sequences of these measurements are lowpass filtered and decimated to produce a feature sample rate of 10 samples per second for each of the three bands. Mu-law compression is used to produce logarithmic amplitude measurements of one byte each, reducing the storage requirements. Feature samples are gathered into blocks, labeled with their source and time, and stored on the disk. This process is repeated for all available data blocks from all channels. The system then waits for more audio data to become available.
  • feature files are labeled with their date and time of initiation. For example, a file name may be automatically constructed that contains the day of the week and hour of the day. An independent task then scans the feature storage areas and deletes files that are older than a specified amount. While the system expects to interrogate portable monitoring units on a daily basis and to compare their collected features with the data base every day, there will be cases where it will not be possible to interrogate some of the portable units for several days. Therefore, feature data are retained at the central computer site for about a week. After that, the results will no longer be useful.
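The deletion task can be sketched as below; the directory path, and keying off file modification time rather than the name-encoded day and hour, are illustrative assumptions:

    import os, time

    FEATURE_DIR = "/data/features"    # hypothetical feature storage area
    MAX_AGE_SECONDS = 7 * 24 * 3600   # retain roughly one week

    def purge_old_feature_files():
        """Delete feature files older than the retention window."""
        now = time.time()
        for name in os.listdir(FEATURE_DIR):
            path = os.path.join(FEATURE_DIR, name)
            if now - os.path.getmtime(path) > MAX_AGE_SECONDS:
                os.remove(path)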
  • When the central computer 7 compares audio feature blocks stored on its own disk drive 8 with those from a portable monitoring unit 4, it must match its time markers with those transferred from the portable monitoring unit. This reduces the amount of searching that must be done, improving the speed and accuracy of the processing.
  • Each portable monitoring unit 4 contains its own internal clock 15. To avoid the need to set this clock or maintain any specific calibration, a simple 32-bit counter is used that is incremented at a 10 Hz rate. This 10 Hz signal is derived from an accurate crystal oscillator. In fact, the absolute accuracy of this oscillator is not very important; what is important is the stability of the oscillator.
  • the central site interrogates each portable monitoring unit at intervals ranging from one day to one week. As part of this procedure the central site reads the current value of the counter in the portable monitoring unit. It also notes its own time count and stores both values. To synchronize time, the system subtracts the time count that was read from the portable unit during the previous interrogation from the current value.
  • the system computes the number of counts that occurred at the central site (the official time) by subtracting its stored counter value from the current counter value. If the frequencies are the same, the same number of counts will have transpired over the same time interval (6.048 Million counts per week).
  • the portable unit 4 can be synchronized to the central computer 7 by adding the difference between the starting counts to the time markers that identify each audio feature measurement packet. This is the simplest case.
  • the typical case is where the oscillators are running at slightly different frequencies. It is still necessary to align the starting counter values, but the system must also compute a scale factor and apply it to time markers received from the portable monitoring unit.
  • This scale factor is computed by dividing the number of counts from the central computer by the number of counts from the portable unit that occurred over the same time interval.
  • the first order (linear) time synchronization requires computation of an offset and a scale factor to be applied to the time marks from the portable monitoring unit.
  • Compute Offset: Off = S_c - S_p
  • Compute Central Counts: C_c = E_c - S_c
  • Compute Portable Counts: C_p = E_p - S_p
  • Compute Scale Factor: F = C_c / C_p
  • Here S_p and E_p are the portable unit's counter values at the previous and current interrogations, and S_c and E_c are the central computer's counter values at the same two instants.
  • Time markers can then be converted from the portable monitoring unit to the central computer frame of reference: T_c = F * (T_p - S_p) + S_c, where T_p is a time marker from the portable unit and T_c is the equivalent time at the central computer.
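Putting the offset and scale-factor steps together, the first-order correction can be sketched as:

    def sync_time_markers(S_p, E_p, S_c, E_c, portable_marks):
        """Map portable-unit timer values onto the central time base.
        S_p, E_p: portable counter at the previous and current interrogations;
        S_c, E_c: central counter values captured at the same two instants."""
        C_c = E_c - S_c    # counts elapsed at the central site
        C_p = E_p - S_p    # counts elapsed at the portable unit
        F = C_c / C_p      # scale factor; 1.0 when the oscillators agree
        return [S_c + F * (T_p - S_p) for T_p in portable_marks]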
  • the remaining concern is short-term drift of the oscillator in the portable monitoring unit. This is primarily due to temperature changes. The goal is to stay within one second of the linearly interpolated time. The worst timing errors occur when the frequency deviates in one direction and then in the opposite direction. However, it has been determined that stability will be adequate over realistic temperature ranges.
  • the audience survey system includes pattern recognition algorithms that determine which of many possible audio sources was captured by a particular portable monitoring unit 4 at a certain time.
  • the central computers 7 preferably employ high performance PC's 25 that have been augmented by digital signal processors 26 that have been optimized to perform functions such as correlations and vector operations.
  • FIG. 9 summarizes the signal recognition procedure.
  • the system should be able to find stored feature blocks that are within about one second from the feature packets received from the portable units.
  • the tolerance for time alignment is about +/-3 seconds, leaving some room to deal with unusual situations.
  • the system can search for pattern matches outside of the tolerance window, but this slows down the processing.
  • the central computer can repeat all of the pattern matches using an expanded search window. Then when matches are found, their times of occurrence can be used as checkpoints to update the timing information.
  • the need to resort to these measures may indicate a malfunction of the portable monitoring unit or its exposure to environmental extremes.
  • the pattern recognition process involves computing the degree of match with reference patterns derived from features of each of the sources. As shown in FIG. 9, this degree of match is measured as a weighted Euclidean distance in three-dimensional space. The distance metric indicates a perfect match as a distance of zero. Small distances indicate a closer match than large distances. Therefore, the system must find the source that produces the smallest distance to the unknown feature packet. This distance is then compared with a threshold value. If the distance is below the threshold, the system will report that the unknown packet matches the corresponding source and record the source identification. If the minimum distance is greater than the threshold, the system presumes that the unknown feature packet does not match any of the sources and records that the source is unknown.
  • Feature packets from a portable monitoring unit 4 contain 64 samples from each of the three bands. These must first be mu-law decompressed to produce 16 bit linear values. Each of the three feature waveforms is then normalized by dividing each value by the standard deviation (square root of power) computed over the three signals. This corrects for the audio volume to which the portable unit was exposed when the feature packet was collected. Each of the three normalized waveforms is then padded with a block of zeroes to a total length of 128 samples per feature band. This is necessary to take advantage of a fast correlation algorithm based on the FFT.
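In code, the packet preparation looks roughly like this (reusing mulaw_expand from the companding sketch above; the band-major byte layout is an assumption):

    import numpy as np

    def prepare_packet(feature_bytes):
        """Decompress and condition one 192-byte payload for fast correlation."""
        f = mulaw_expand(np.frombuffer(feature_bytes, dtype=np.int8))
        f = f.reshape(3, 64).astype(np.float64)   # 3 bands x 64 samples
        f /= np.sqrt(np.mean(f ** 2))             # overall std (streams are zero-mean)
        return np.pad(f, ((0, 0), (0, 64)))       # zero-pad each band to 128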
  • the system locates a block of samples consisting of 128 samples of each feature as determined by the time alignment calculation. This will include the time offset needed to assure that the needed three second margins are present at the beginning and end of the expected location of the unknown packet.
  • the system calculates the cross-correlation functions between each of the three waveforms of the unknown feature packet and the corresponding source waveforms. In the fast correlation algorithm this requires that both the unknown and the reference source waveforms are transformed to the frequency domain using a fast Fourier transform.
  • the system then performs a conjugate vector cross-product of the resulting complex spectra and then performs an inverse fast Fourier transform on the result.
  • the resulting correlation functions are then normalized by the sliding standard deviation of each computed over a 64 sample window.
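For one band, the fast correlation can be sketched as follows; because the feature streams are already zero-mean after the high-pass stage, the sliding standard deviation reduces to a sliding RMS:

    import numpy as np

    def fast_correlate_band(packet_band, ref_band):
        """Correlate a zero-padded packet band (128 samples, first 64 are
        data) with a 128-sample reference segment via FFT, conjugate
        product, and inverse FFT, normalized to -1..+1 at each of 64 lags."""
        raw = np.fft.irfft(np.conj(np.fft.rfft(packet_band)) *
                           np.fft.rfft(ref_band))
        pkt_std = np.std(packet_band[:64])
        corr = np.empty(64)
        for lag in range(64):                          # fully overlapping lags
            ref_std = np.std(ref_band[lag:lag + 64])   # sliding std of reference
            corr[lag] = raw[lag] / (64.0 * pkt_std * ref_std)
        return corr    # +1 perfect match, 0 no correlation, -1 exact opposite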
  • Each of the three correlation functions representing the three frequency bands has values ranging from one for a perfect match, through zero for no correlation, to minus one for an exact opposite.
  • Each of the correlation values is converted to a distance component by subtracting it from one.
  • In this unweighted form, each component makes an equal contribution to the overall distance regardless of the relative amplitudes of the audio in the three bands.
  • the present invention aims to avoid situations where background noise in an otherwise quiet band disturbs the contributions of frequency bands containing useful signal energy. Therefore, the system reintroduces relative amplitude information to the distance calculation by weighting each component by the standard deviations computed from the reference pattern as shown in equation (2) below.
  • D_w = [((std_1)*(1 - cv_1))^2 + ((std_2)*(1 - cv_2))^2 + ((std_3)*(1 - cv_3))^2]^(1/2) / [(std_1)^2 + (std_2)^2 + (std_3)^2]^(1/2)   (2)
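Equation (2) translates directly, with cv holding the three correlation values at a given lag and std the reference-pattern standard deviations:

    import numpy as np

    def weighted_distance(cv, std):
        """Weighted Euclidean distance of equation (2); zero is a perfect
        match, and high-energy bands dominate the result."""
        cv = np.asarray(cv, dtype=float)
        std = np.asarray(std, dtype=float)
        num = np.sqrt(np.sum((std * (1.0 - cv)) ** 2))
        return num / np.sqrt(np.sum(std ** 2))

With cv = (1, 1, 1) the distance is 0; with cv = (0, 0, 0) it is exactly 1 whatever the weights, so the normalization leaves the unweighted extremes unchanged.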
  • the sequence of operations can be rearranged to combine some steps and eliminate others.
  • the resulting weighted Euclidean distance automatically adapts to the relative amplitudes of the frequency bands and will tend to reduce the effects of broadband noise that is present at the portable unit and not at the source.
  • a variation of the weighted Euclidean distance involves integrating or averaging successive distances calculated from a sequence of feature packets received from a portable unit as shown in FIG. 11 .
  • the weighted distance is computed as above for the first packet.
  • a second packet is then obtained and precisely aligned with feature blocks from the same source in the central computer. Again, the weighted Euclidean distance is calculated. If the two packets are from the same source, the minimum distance will occur at the same relative time delay in the distance calculation. For each of the 64 time delays in the distance array for a particular source the system computes a recursive update of the distance where the averaged distance is decayed slightly by multiplying it by a coefficient k that is less than one.
  • the decision rule for this process is the same as for the un-averaged case.
  • the minimum averaged distance from all sources is first found. This is compared with a distance threshold. If the minimum distance is less than the threshold, a detection has occurred and the source identification is recorded. Otherwise the system reports that the source is unknown.
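A sketch of the averaging and decision rule follows. The exact blend of old and new distances is an assumption: the text states only that the running average is decayed by a coefficient k less than one before the new distance is folded in, and the threshold value here is illustrative.

    import numpy as np

    K = 0.8          # decay coefficient, k < 1 (value assumed)
    THRESHOLD = 0.5  # detection threshold (value assumed)

    def update_distances(avg, new):
        """Exponential averaging over the 64-lag distance arrays:
        old data decay while new data enter with weight (1 - K)."""
        return K * avg + (1.0 - K) * new

    def identify(avg_by_source):
        """Return the best-matching source, or None if no source is close
        enough (the packet is then reported as unknown)."""
        best = min(avg_by_source, key=lambda s: avg_by_source[s].min())
        return best if avg_by_source[best].min() < THRESHOLD else None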

Abstract

A system and method are disclosed for performing audience surveys of broadcast audio from radio and television. A small body-worn portable collection unit samples the audio environment of the survey member and stores highly compressed features of the audio programming. A central computer simultaneously collects the audio outputs from a number of radio and television receivers representing the possible selections that a survey member may choose. On a regular schedule the central computer interrogates the portable units used in the survey and transfers the captured audio feature samples. The central computer then applies a feature pattern recognition technique to identify which radio or television station the survey member was listening to at various times of day. This information is then used to estimate the popularity of the various broadcast stations.

Description

This application claims the benefit of U.S. Provisional Application No. 60/140,190, filed Jun. 18, 1999.
FIELD OF THE INVENTION
The invention relates to a method and system for automatically identifying which of a number of possible audio sources is present in the vicinity of an audience member. This is accomplished through the use of audio pattern recognition techniques. A system and method is disclosed that employs small portable monitoring units worn or carried by people selected to form a panel that is representative of a given population. Audio samples taken at regular intervals are compressed and stored for later comparison with reference signals collected at a central site. This allows a determination to be made regarding which broadcast audio signals each survey member is listening to at different times of day. An automatic survey of listening preferences can then be conducted.
DISCUSSION OF THE PRIOR ART
Radio and television surveys have been conducted for many years to determine the relative popularity of programs and broadcast stations. This information is necessary for a number of reasons including the determination of advertising price structure and deciding if certain programs should be continued or canceled. One of the most common methods for performing these surveys is for survey members to manually record the radio and television stations that they listen to and watch at various times of day. Maintaining these manual logs is cumbersome and inaccurate. Additionally, transferring the information in the logs to an automated system is a further time-consuming process.
Various systems have been developed that provide a degree of automation to conducting these surveys. In a typical semiautomatic survey system an electronic device records which television station is being viewed in a survey member's home. The survey member may optionally enter the number of people who are viewing the program. These data are electronically transferred to a central location where survey statistics are compiled.
Automatic survey systems have been devised that substantially improve efficiency. Many of the methods used involve the injection of a coded identification signal within the audio or video. There are several problems with these so-called active identification systems. First, each broadcaster must cooperate with the survey organization by installing the coding equipment in its broadcast facility. This represents an additional expense and complication to the broadcaster that may not be acceptable. The use of identification codes can also result in audio or video artifacts that are objectionable to the audience. An active encoding system is described by Best et al. in U.S. Pat. No. 4,876,617. Best employs two notch filters to remove narrow frequency bands from the audio signal. A frequency shift keyed signal is then injected into these notches to carry the identification code. Codes are repeatedly inserted into the audio when there is sufficient signal energy to mask the codes. However, when the injection level of the code is sufficient to assure reliable decoding it is perceptible to listeners. Conversely, when the code injection level is reduced to become imperceptible decoding reliability suffers. Best has improved on this invention as taught in U.S. Pat. No. 5,113,437. This system uses several sets of code frequencies and switches among them in a pseudo-random manner. This reduces the audibility of the codes.
Fardeau et al. describe a different type of system in U.S. Pat. No. 5,574,962 and U.S. Pat. No. 5,581,800 where the energy in one or more frequency bands is modulated in a predetermined manner to create a coded message. A small body-worn (or carried) device receives the encoded audio from a microphone and recovers the embedded code. After decoding, the identification code is stored for later transfer to a central computer. The problem remains that all broadcast stations to be detected by the system must be persuaded to install code generation and insertion equipment in their audio feeds.
Broughton et al. describe a video signaling method in U.S. Pat. No. 4,807,031 that encodes a message by modulating the relative luminance of the two fields comprising a video frame. While intended for use in interactive television, this method can also be used to encode a channel identification code. An obvious limitation is that this method cannot be used for radio broadcasts. Additionally, the television broadcast equipment must be altered to include the identification code insertion.
Passive signal recognition techniques have been developed for the identification of prerecorded audio and video sources. These systems use the features of the signal itself as the identification key. The unknown signal is then compared with a library of similarly derived features using a pattern recognition procedure. One of the earliest works in this area is presented by Moon et al. in U.S. Pat. No. 3,919,479. Moon teaches that correlation functions can be used to identify audio segments by matching them with replicas stored in a database. Moon also describes the method of extracting sub-audio envelope features. These envelope signals are more robust than the audio itself, but Moon's approach still suffers from sensitivity to distortion and speed errors.
A multiple stage pattern recognition system is described by Kenyon et al. in U.S. Pat. No. 4,843,562. This method uses low-bandwidth features of the audio signal to quickly determine which patterns can be immediately rejected. Those that remain are subjected to a high-resolution correlation with time warping to compensate for speed errors. This system is intended for use with a large number of candidate patterns. The algorithms used are too complex to be used in a portable survey system.
Another representative passive signal recognition system and method is disclosed by Lamb et al. in U.S. Pat. No. 5,437,050. Lamb performs a spectrum analysis based on the semitones of the musical scale and extracts a sequence of measurements forming a spectrogram. Cells within this spectrogram are determined to be active or inactive depending on the relative power in each cell. The spectrogram is then compared to a set of reference patterns using a logical procedure to determine the identity of the unknown input. This technique is sensitive to speed variation and even small amounts of distortion.
Kiewit et al. have devised a system specifically for the purpose of conducting automatic audience surveys as disclosed in U.S. Pat. No. 4,697,209. This system uses trigger events such as scene changes or blank video frames to determine when features of the signal should be collected. When a trigger event is detected, features of the video waveform are extracted and stored along with the time of occurrence in a local memory. These captured video features are periodically transmitted to a central site for comparison with a set of reference video features from all of the possible television signals. The obvious shortcoming of this system is that it cannot be used to conduct audience surveys of radio broadcasts.
The present invention combines certain aspects of several of the above inventions, but in a unique and novel manner to define a system and method that is suited to conducting audience surveys of both radio and television broadcasts.
SUMMARY OF THE INVENTION
It is an objective of the present invention to provide a method and apparatus for conducting audience surveys of radio and television broadcasts. This is accomplished using a number of body-worn portable monitoring units. These units periodically sample the acoustic environment of each survey member using a microphone. The audio signal is digitized and features of the audio are extracted and compressed to reduce the amount of storage required. The compressed audio features are then marked with the time of acquisition and stored in a local memory.
A central computer extracts features from the audio of radio and television broadcast stations using direct connection to a group of receivers. The audio is digitized and features are extracted in the same manner as for the portable monitoring units. However, the features are extracted continuously for all broadcast sources in a market. The feature streams are compressed, time-marked and stored on the central computer disk drives.
When the portable monitoring units assigned to survey members are not being worn (or carried), they are stored in docking stations that recharge the batteries and also provide modems and telephone access. On a daily basis, or every several days, the central computer interrogates the docked portable monitoring unit using the modem and transfers the stored feature packets to the central computer for analysis. This is done late at night or early in the morning when the portable monitoring unit is not in use and the phone line is available.
In addition to transferring the feature packets, the current time marker is transferred from the portable monitoring unit to the central computer. By comparing the current time marker with the time marker transferred during the last interrogation the central computer can determine the apparent elapsed time as seen by the portable monitoring unit. The central computer then makes a similar calculation based on the absolute time of interrogation and the previous interrogation time. The central computer can then perform the necessary interpolations and time translations to synchronize the feature data packets received from the portable monitoring unit with feature data stored in the central computer.
By comparing the audio feature data collected by a portable monitoring unit with the broadcast audio features collected at the central computer site, the system can determine which broadcast station the survey member was listening to at a particular time. This is accomplished by computing cross-correlation functions for each of three audio frequency bands between the unknown feature packet and features collected at the same time by the central computer for many different broadcast stations. The fast correlation method based on the FFT algorithm is used to produce a set of normalized correlation values spanning a time window of approximately six seconds. This is sufficient to cover residual time synchronization errors between the portable monitoring unit and the central computer. The correlation functions for the three frequency bands will each have a value of +1.0 for a perfect match, 0.0 for no correlation, and -1.0 for an exact opposite. These three correlation functions are combined to form a figure of merit that is a three dimensional Euclidean distance from a perfect match. This distance is calculated as the square root of the sum of the squares of the individual distances, where the individual distance is equal to (1.0 - correlation value). In this representation, a perfect match has a distance of zero from the reference pattern. In an improved embodiment of the invention the contribution of each of the features is weighted according to the relative amplitudes of the feature waveforms stored in the central computer database. This has the effect of assigning more weight to features that are expected to have a higher signal-to-noise ratio.
The minimum value of the resulting distance is then found for each of the candidate patterns collected from the broadcast stations. This represents the best match for each of the broadcast stations. The minimum of these is then selected as the broadcast source that best matches the unknown feature packet from the portable monitoring unit. If this value is less than a predetermined threshold, the feature packet is assumed to be the same as the feature data from the corresponding broadcast station. The system then makes the assertion that the survey member was listening to that radio or television station at that particular time.
By collecting and processing these feature packets from many survey members in the context of many potential broadcast sources, comprehensive audience surveys can be conducted. Further, this can be done faster and more accurately than was possible using previous methods.
DESCRIPTION OF THE DRAWINGS
The features, objects, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the following drawings:
FIG. 1 illustrates the functional components of the invention and how they interact to function as an audience measurement system. Audience survey panel members wear portable monitor units that collect samples of audio in their environment. This includes audio signals from broadcast radio and television receivers. The radio and television broadcast signals in a survey market are also received by a set of receivers connected to a central computer. Audio features from all of the receivers are recorded in a database on the central computer. When not in use, portable monitor units are placed in docking stations where they can be interrogated by the central computer via dialup modems. Audio feature samples transferred from the portable monitor units are then matched with audio features of multiple broadcast stations stored in the database. This allows the system to determine which radio and television programs are being viewed or heard by each panel member.
FIG. 2 is a block diagram of a portable monitor unit. The portable monitoring unit contains a microphone for gathering audio. This audio signal is amplified and lowpass filtered to restrict frequencies to a little over 3 kHz. The filtered signal is then digitized using an analog to digital converter. Waveform samples are then transferred to a digital signal processor. A low-power timer operating from a separate lithium battery activates the digital signal processor at intervals of approximately one minute. It will be understood by those skilled in the art that the digital processor can collect the samples at any periodic interval, and that use of a one-minute period is a matter of design choice and should not be considered as limiting of the scope of the invention. The digital signal processor then reads samples from the analog to digital converter and extracts features from the audio waveform. The audio features are then compressed and stored in a non-volatile memory. Compressed feature packets with time tags are later transferred through a docking station to the central computer. A rechargeable battery is also included.
FIG. 3 shows the three frequency bands that are used for feature extraction in a particularly preferred embodiment of the present invention. The energy in each of these three frequency bands is sampled approximately ten times per second to produce feature waveforms.
FIG. 4 illustrates the major components of the central computer that continuously captures broadcast audio from multiple receivers and matches feature packets from portable units with possible broadcast sources. A set of audio amplifiers and lowpass antialias filters provide appropriate gain and restrict the audio frequencies to a little over 3 kHz. A channel multiplexer rapidly scans the filter outputs and transfers the waveforms sequentially to an analog to digital converter producing a multiplexed digital time series. A digital signal processor performs a spectrum analysis and produces energy measurements of each of three frequency bands from each of the input channels. These feature samples are then transferred to a host computer and stored for later comparison. The host computer contains a bank of modems that are used to interrogate the portable monitor units while they are docked. Feature data packets are transferred from the portable units during this interrogation. One or more digital signal processors are connected to the host computer to perform the feature pattern recognition process that identifies which broadcast channel, if any, matches the unknown feature packets from the portable monitoring units.
FIG. 5 is a block diagram of the docking station for the portable monitor unit. The docking station contains four components. The first component is a data interface that connects to the portable unit. This interface may include an electrical connection or an infrared link. The data interface connects to a modem that allows telephone communication and transfer of data. A battery charger in the docking station is used to recharge the battery in the portable unit. A modular power supply is included to provide power to the other components.
FIG. 6 illustrates an expanded survey system that is intended to operate in multiple cities or markets. A wide area network connects a group of remotely located signal collection systems with a central site. Each of the signal collection systems captures broadcast audio in its region and stores features. It also interrogates the portable monitoring units and gathers the stored feature packets. Data packets from the remote sites are transferred to the central site for processing.
FIG. 7 is a flow chart of the audio signal acquisition strategy for the portable monitoring units. The portable monitoring units activate periodically and compute features of the audio in the environment. If there is sufficient audio power the features are compressed and stored.
FIG. 8 is a flow chart of procedures used to collect and manage audio features received at central collection sites. This includes the three separate processes of audio collection, feature extraction, and deletion of old feature data.
FIG. 9 is a flow chart of the packet identification procedure. Packets are first synchronized with the database. Corresponding data blocks from broadcast audio sources are then matched to find the minimum weighted Euclidean distance to the unknown packet. If this distance is less than a threshold, the unknown packet is identified as matching the broadcast.
FIG. 10 is a flow chart of the pattern matching procedure. Unknown feature packets are first zero padded to double their length and then correlated with double length feature segments taken from the reference features on the central computer. The weighted Euclidean distance is then computed from the correlation values and the relative amplitudes of the features stored in the reference patterns.
FIG. 11 illustrates the process of averaging successive weighted distances to improve the signal-to-noise ratio and reduce the false detection rate. This is an exponential process where old data have a smaller effect than new data.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The audience measurement system according to the invention consists of a potentially large number of body-worn portable collection units 4 and several central computers 7 located in various markets. The portable monitoring units 4 periodically sample the audio environment and store features representing the structure of the audio presented to the wearer of the device. The central computers continuously capture and store audio features from all available broadcast sources 1 through direct connections to radio and television receivers 6. The central computers 7 periodically interrogate the portable units 4 while they are idle in docking stations 10 at night via telephone connections and modems 9. The sampled audio feature packets are then transferred to the central computers for comparison with the broadcast sources. When a match is found, the presumption is that the wearer of the portable unit was listening to the corresponding broadcast station. The resulting identification statistics are used to construct surveys of the listening habits of the users.
In typical operation, the portable monitoring units 4 compress the audio feature samples to 200 bytes per sample. Sampling at intervals of one minute, the storage requirements are 200 bytes per minute or 12 kilobytes per hour. During quiet intervals, feature packets are not stored. It is estimated that about 50 percent of the samples will be quiet. The average storage requirement is therefore about 144 kilobytes per day or approximately 1 Megabyte per week. The portable monitoring units are capable of storing about one month of compressed samples.
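These storage figures follow from simple arithmetic; the short sketch below reproduces them (the 50 percent quiet fraction is the estimate given above):

```python
BYTES_PER_PACKET = 200      # 192 feature bytes plus 8 bytes of header
PACKETS_PER_HOUR = 60       # one sample per minute
QUIET_FRACTION = 0.5        # estimated fraction of samples discarded as quiet

bytes_per_day = BYTES_PER_PACKET * PACKETS_PER_HOUR * 24 * (1 - QUIET_FRACTION)
print(bytes_per_day)        # 144000 -> about 144 kilobytes per day
print(bytes_per_day * 7)    # 1008000 -> approximately 1 Megabyte per week
```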
If the portable monitoring units are interrogated daily, approximately one minute will be required to transfer the most recent samples to a central computer or collection site. The number of modems 9 required at the central computer 7 or collection site 33 depends on the number of portable monitoring units 4.
In a single market or a relatively small region, a central computer 7 receives broadcast signals directly and stores feature data continuously on its local disk 8. Assuming that on average a market will have 10 TV stations and 50 radio stations, the required storage is about 173 Megabytes per day or 1210 Megabytes per week. Data older than one week is deleted. Obviously, as more sources are acquired through, e.g., satellite network feeds and cable television, the storage requirements increase. However, even with 500 broadcast sources the system needs only 10 Gigabytes of storage for a week of continuous storage.
The recognition process requires that the central computer 7 locate time intervals in the stored feature blocks that are time aligned (within a few seconds) with the unknown feature packet. Since each portable monitoring unit 4 produces one packet per minute, the processing load with 500 broadcast sources is 500 pattern matches per minute or about 8 matches per second for each portable monitoring unit. Assuming that there are 500 portable monitoring units in a market the system must perform about 4000 matches per second.
When deployed on a large scale in many markets the overall system architecture is somewhat different as is illustrated in FIG. 6. There are separate remote signal collection computers 33 installed in each city or market. The remote computers 33 record the broadcast sources in their particular markets as described above. In addition, they interrogate the portable monitoring units 34 in their area by modem 32 and download the collected feature packets. The signal collection computers 33 are connected to a central site by a wide area data communication network 35. The central computer site consists of a network 37 of computers 39 that can share the pattern recognition processing load. The local network 37 is connected to the wide area network 35 to allow the central site computers 39 to access the collected feature packets and broadcast feature data blocks. In operation, a central computer 39 downloads a day's worth of feature packets from a portable monitoring unit 34 that have been collected by one of the remote computers 33 using modems 32. Broadcast time segments that correspond to the packet times are then identified and transferred to the central site. The identification is then performed at the central site. Once an initial identification has been made, it is confirmed by matching subsequent packets with broadcast source features from the same channel as the previous recognition. This reduces the amount of data that must be transferred from the remote collection computer to the central site. This is based on the assumption that a listener will continue to listen (or stay tuned) to the same station for some amount of time. When a subsequent match fails, the remaining channels are downloaded for pattern recognition. This continues until a new match has been found. The system then reverts to the single-channel tracking mode.
The above process is repeated for all portable monitoring units 34 in all markets. In instances where markets overlap, feature packets from a particular portable unit can be compared with data from each market. This is accomplished by downloading the appropriate channel data from each market. In addition, signals that are available over a broad area such as satellite feeds, direct satellite broadcasts, etc. are collected directly at the central site using one or more satellite receivers 36. This includes many sources that are distributed over cable networks such as movie channels and other premium services. This reduces the number of sources that must be collected remotely (and redundantly) by the signal collection computers.
An additional capability of this system configuration is the ability to match broadcast sources in different markets. This is useful where network affiliates may have several different selections of programming.
In the preferred embodiment of the portable monitoring unit shown in FIG. 2 the audio signal received by small microphone 11 in a portable unit is amplified, lowpass filtered, and digitized by an analog to digital converter 13. The sample rate is 8 kilosamples per second, resulting in a Nyquist frequency of 4 kHz. To avoid alias distortion, an analog lowpass filter 12 rejects frequencies greater than about 3.2 kHz. The analog to digital converter 13 sends the audio samples to a digital signal processing microprocessor 17 that performs the audio processing and feature extraction. The first step in this processing is spectrum analysis and partitioning of the audio spectrum into three frequency bands as shown in FIG. 3.
The frequency bands have been selected to contain approximately equal power on average. In one embodiment, the frequency bands are:
Band 1: 50 Hz-500 Hz
Band 2: 500 Hz-1500 Hz
Band 3: 1500 Hz-3250 Hz
It will be understood by those skilled in the art that other frequency bands may be used to implement the teachings of the present invention.
The spectrum analysis is performed by periodically performing Fast Fourier Transforms (FFTs) on blocks of 64 samples. This produces spectra containing 32 frequency “bins”. The power in each bin is found by squaring its magnitude. The power in each band is then computed as the sum of the power in the corresponding frequency bins. A magnitude value is then computed for each band by taking the square root of the integrated power. The mean value of each of these streams is then removed using a recursive high-pass filter. The data rate and bandwidth must then be reduced. This is accomplished using polyphase decimating lowpass filters. Two filter stages are employed for each of the three feature streams. Each of these filters reduces the sample rate by a factor of five, resulting in a sample rate of 10 samples per second (per stream) and a bandwidth of about 4 Hz. These are the audio measurements that are used as features in the pattern recognition process.
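A simplified Python sketch of this front end follows. The band-power and magnitude steps track the description above, but a plain block average stands in for the recursive high-pass and polyphase decimating filters, whose exact designs are not given here:

```python
import numpy as np

FS = 8000                    # sample rate in Hz
N = 64                       # FFT block size -> 125 Hz bins up to 4 kHz
BANDS = [(50, 500), (500, 1500), (1500, 3250)]   # feature bands in Hz

def band_magnitudes(audio):
    """For each 64-sample block, integrate the FFT power falling in each
    band and take the square root to get a magnitude measurement."""
    freqs = np.fft.rfftfreq(N, d=1.0 / FS)
    nblocks = len(audio) // N
    mags = np.empty((nblocks, len(BANDS)))
    for i in range(nblocks):
        spectrum = np.fft.rfft(audio[i * N:(i + 1) * N])
        power = np.abs(spectrum) ** 2
        for b, (lo, hi) in enumerate(BANDS):
            mags[i, b] = np.sqrt(power[(freqs >= lo) & (freqs < hi)].sum())
    return mags

def to_feature_streams(mags, decimate=25):
    """Remove the mean of each stream, then reduce the rate. A block average
    stands in for the recursive high-pass and polyphase decimating filters;
    note that non-overlapping 64-sample blocks give 125 measurements per
    second, so the framing needed to reach exactly 10 samples per second
    per stream is an assumption here."""
    mags = mags - mags.mean(axis=0)
    usable = (len(mags) // decimate) * decimate
    return mags[:usable].reshape(-1, decimate, mags.shape[1]).mean(axis=1)
```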
A similar process is performed at the central computer site as shown in FIG. 4. However, audio signals are obtained from direct connections to radio and television broadcast receivers. Since many audio sources must be collected simultaneously, a set of preamplifiers and analog lowpass filters 20 is included. The outputs of these filters are connected to a channel multiplexer 21 that switches sequentially between each audio signal and sends samples of these signals to the analog to digital converter 22. A digital signal processor 23 then operates on all of the audio time series waveforms to extract the features.
To reduce the storage requirements in both the portable units and the central computers, the system employs mu-law compression of the feature data. This reduces the data by a factor of two, compressing a 16-bit linear value to an eight-bit logarithmic value. This maintains the full dynamic range while retaining adequate resolution for accurate correlation performance. The same feature processing is used in both the portable monitoring units and the central computers. However, the portable monitoring units capture brief segments of 64 feature samples at intervals of approximately one minute as triggered by a timer in the portable monitoring unit. Central computers record continuous streams of feature data.
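A compress/expand pair in the spirit of this step is sketched below; the patent does not name the exact companding curve, so the standard mu = 255 law and the scaling constants are assumptions:

```python
import numpy as np

MU = 255.0  # assumed companding constant (standard mu-law)

def mulaw_compress(x16):
    """Compress 16-bit linear samples to one-byte logarithmic values."""
    x = np.asarray(x16, dtype=np.float64) / 32768.0            # scale to [-1, 1)
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)   # logarithmic curve
    return np.round(y * 127.0).astype(np.int8)

def mulaw_expand(y8):
    """Invert the compression back to approximate 16-bit linear samples."""
    y = np.asarray(y8, dtype=np.float64) / 127.0
    x = np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU
    return np.clip(np.round(x * 32768.0), -32768, 32767).astype(np.int16)
```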
The portable monitoring unit is based on a low-power digital signal processor of the type that is frequently used in such applications as audio processing for digital cellular telephones. Most of the time this processor is in an idle or sleep condition to conserve battery power. However, an electronic timer operates continuously and activates the DSP at intervals of approximately one minute. The DSP 17 collects about six seconds of audio from the analog to digital converter 13 and extracts audio features from the three frequency bands as described previously. The value of the timer 15 is also read for use in time marking the collected signals. The portable monitoring unit also includes a rechargeable battery 19 and a docking station data interface 18.
In addition to the features that are collected, the total audio power present in the six-second block is computed to determine if an audio signal is present. The audio signal power is then compared with an activation threshold. If the power is less than the threshold, the collected data are discarded, and the DSP 17 returns to the inactive state until the next sampling interval. This avoids the need to store data blocks that are collected while the user is asleep or in a quiet environment. If the audio power is greater than the threshold, then the data block is stored in a non-volatile memory 16.
Feature data to be stored are organized as 64 samples of each of the three feature streams. These data are first mu-law compressed from 16 bit linear samples to 8 bit logarithmic samples. The resulting data packets therefore contain 192 data bytes. The data packets also contain a four-byte unit identification code and a four-byte timer value for a total of 200 bytes per packet. The data packets are stored in a non-volatile flash memory 16 so that they will be retained when power is not applied. After storing the data packet, the unit returns to the sleep-state until the next sampling interval. This procedure is illustrated in flow-chart form in FIG. 7.
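The 200-byte packet layout translates into a simple packing routine; the field order and big-endian byte order below are illustrative assumptions:

```python
import struct

def build_packet(unit_id, timer_count, feature_bytes):
    """Assemble a 200-byte packet: a four-byte unit identification code,
    a four-byte timer value, and 192 mu-law compressed feature bytes
    (64 samples for each of the three bands)."""
    assert len(feature_bytes) == 192
    return struct.pack(">II192s", unit_id, timer_count, bytes(feature_bytes))

packet = build_packet(0x00000001, 1234567, bytes(192))
assert len(packet) == 200
```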
FIG. 5 is a block diagram of the portable unit docking station 10. The docking station includes a data interface 28 to the portable unit 4 and a dialup modem 29 that is used to communicate with modems 9 that are connected to the central computer 7. An AC power supply 31 supplies power to the docking station and also powers a battery charger 30 that is used to recharge the battery 19 in the portable monitoring unit 4.
When the portable monitoring unit 4 is in its docking station 10 and communicates with a central computer 7, packets are transferred in reverse order. That is, the newest data packets are transferred first, proceeding backwards in time. The central computer continues to transfer packets until it encounters a packet that has been previously transferred.
Each portable monitoring unit 4 optionally includes a motion detector or sensor (not shown) that detects whether or not the device is actually being worn or carried by the user. Data indicating movement of the device are then stored (for later downloading and analysis) along with the audio feature information described above. In one embodiment, audio feature information is discarded or ignored in the survey process if the output of the motion detector indicates that the device 4 was not actually being worn or carried during a significant period of time when the audio information was being recorded.
Each portable monitoring unit 4 also optionally includes a receiver (not shown) used for determining the position of the unit (e.g., a GPS receiver, a cellular telephone receiver, etc.). Data indicating the position of the device are then stored (for later downloading and analysis) along with the audio feature information described above. In one embodiment, the downloaded position information is used by the central computer to determine which signal collection station's features to access for comparison.
In contrast with the portable monitoring units that sample the audio environment periodically, the central computer must operate continuously, storing feature data blocks from many audio sources. The central computer then compares feature packets that have been downloaded from the portable units with sections of audio files that occurred at the same date and time. There are three separate processes operating in the data collection and storage aspect of central computer operation. The first of these is the collection of digitized audio data and its storage on the disks 8 of the central computer. The second task is the extraction of feature data and the storage of time-tagged blocks of feature data on the disk. The third task is the automatic deletion of feature files that are old enough to be considered irrelevant (one week). These processes are illustrated in FIG. 8.
Audio signals may be received from any of a number of sources including broadcast radio and television, satellite distribution systems, subscription services, and the internet. Digitized audio signals are stored for a relatively short time (along with time markers) on the central computer pending processing to extract the audio features. It is frequently beneficial to directly compute the features in real-time using special purpose DSP boards that combine analog to digital conversion with feature extraction. In this case the temporary storage of raw audio is greatly reduced.
The audio feature blocks are computed in the same manner as for the portable monitoring units. The central computer system 7 selects a block of audio data from a particular channel or source and performs a spectrum analysis. It then integrates the power in each of three frequency bands and outputs a measurement. Sequences of these measurements are lowpass filtered and decimated to produce a feature sample rate of 10 samples per second for each of the three bands. Mu-law compression is used to produce logarithmic amplitude measurements of one byte each, reducing the storage requirements. Feature samples are gathered into blocks, labeled with their source and time, and stored on the disk. This process is repeated for all available data blocks from all channels. The system then waits for more audio data to become available.
In order to control the requirement for disk file storage, feature files are labeled with their date and time of initiation. For example, a file name may be automatically constructed that contains the day of the week and hour of the day. An independent task then scans the feature storage areas and deletes files that are older than a specified age. While the system expects to interrogate portable monitoring units on a daily basis and to compare their collected features with the database every day, there will be cases where it will not be possible to interrogate some of the portable units for several days. Therefore, feature data are retained at the central computer site for about a week. After that, the results will no longer be useful.
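A minimal sketch of such a cleanup task is shown below; it keys on file modification times rather than parsing the day-of-week and hour encoded in the file names, which is a simplification:

```python
import os
import time

RETENTION_SECONDS = 7 * 24 * 3600   # retain roughly one week of feature files

def purge_old_features(feature_dir):
    """Scan the feature storage area and delete files older than the
    retention window."""
    now = time.time()
    for name in os.listdir(feature_dir):
        path = os.path.join(feature_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > RETENTION_SECONDS:
            os.remove(path)
```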
When the central computer 7 compares audio feature blocks stored on its own disk drive 8 with those from a portable monitoring unit 4, it must match its time markers with those transferred from the portable monitoring unit. This reduces the amount of searching that must be done, improving the speed and accuracy of the processing.
Each portable monitoring unit 4 contains its own internal clock 15. To avoid the need to set this clock or maintain any specific calibration, a simple 32-bit counter is used that is incremented at a 10 Hz rate. This 10 Hz signal is derived from an accurate crystal oscillator. In fact, the absolute accuracy of this oscillator is not very important. What is important is the stability of the oscillator. The central site interrogates each portable monitoring unit at intervals of between one day and one week. As part of this procedure the central site reads the current value of the counter in the portable monitoring unit. It also notes its own time count and stores both values. To synchronize time, the system subtracts the time count that was read from the portable unit during the previous interrogation from the current value. Similarly, the system computes the number of counts that occurred at the central site (the official time) by subtracting its stored counter value from the current counter value. If the frequencies are the same, the same number of counts will have transpired over the same time interval (6.048 million counts per week). In this case the portable unit 4 can be synchronized to the central computer 7 by adding the difference between the starting counts to the time markers that identify each audio feature measurement packet. This is the simplest case.
The typical case is where the oscillators are running at slightly different frequencies. It is still necessary to align the starting counter values, but the system must also compute a scale factor and apply it to time markers received from the portable monitoring unit. This scale factor is computed by dividing the number of counts from the central computer by the number of counts from the portable unit that occurred over the same time interval. The first order (linear) time synchronization requires computation of an offset and a scale factor to be applied to the time marks from the portable monitoring unit.
Compute Offset: Off = Sc − Sp
Compute Central Counts: Cc = Ec − Sc
Compute Portable Counts: Cp = Ep − Sp
Compute Scale Factor: Scl = Cc / Cp
where Sp and Ep are the portable unit's counter values at the previous and current interrogations, and Sc and Ec are the central computer's counts at the same moments.
Time markers can then be converted from the portable monitoring unit to the central computer frame of reference:
Convert Time Marker: Tc = (Tp + Off) * Scl
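Expressed as code, the first-order synchronization might look like the following sketch, with variable names mirroring the formulas above:

```python
def sync_parameters(sp, ep, sc, ec):
    """sp/ep: portable-unit counter values at the previous and current
    interrogations; sc/ec: the central computer's counts at the same moments."""
    off = sc - sp                  # Off = Sc - Sp
    scl = (ec - sc) / (ep - sp)    # Scl = Cc / Cp
    return off, scl

def to_central_time(tp, off, scl):
    """Convert a portable-unit time marker: Tc = (Tp + Off) * Scl."""
    return (tp + off) * scl
```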
The remaining concern is short-term drift of the oscillator in the portable monitoring unit. This is primarily due to temperature changes. The goal is to stay within one second of the linearly interpolated time. The worst timing errors occur when the frequency deviates in one direction and then in the opposite direction. However, it has been determined that stability will be adequate over realistic temperature ranges.
The audience survey system includes pattern recognition algorithms that determine which of many possible audio sources was captured by a particular portable monitoring unit 4 at a certain time. To accomplish this with reasonable hardware cost, the central computers 7 preferably employ high performance PC's 25 that have been augmented by digital signal processors 26 that have been optimized to perform functions such as correlations and vector operations. FIG. 9 summarizes the signal recognition procedure.
As discussed previously, it is important to synchronize the time markers received from the portable monitoring units 4 with the time tags applied to feature blocks stored on the central computer systems 7. Once this has been done, the system should be able to find stored feature blocks that are within about one second of the feature packets received from the portable units. The tolerance for time alignment is about +/−3 seconds, leaving some room to deal with unusual situations. Additionally, the system can search for pattern matches outside of the tolerance window, but this slows down the processing. In cases where pattern matches are not found for a particular portable unit, the central computer can repeat all of the pattern matches using an expanded search window. Then when matches are found, their times of occurrence can be used as checkpoints to update the timing information. However, the need to resort to these measures may indicate a malfunction of the portable monitoring unit or its exposure to environmental extremes.
The pattern recognition process involves computing the degree of match with reference patterns derived from features of each of the sources. As shown in FIG. 9, this degree of match is measured as a weighted Euclidean distance in three-dimensional space. The distance metric indicates a perfect match as a distance of zero. Small distances indicate a closer match than large distances. Therefore, the system must find the source that produces the smallest distance to the unknown feature packet. This distance is then compared with a threshold value. If the distance is below the threshold, the system will report that the unknown packet matches the corresponding source and record the source identification. If the minimum distance is greater than the threshold, the system presumes that the unknown feature packet does not match any of the sources and records that the source is unknown.
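The decision rule reduces to a minimum search followed by a threshold test; a compact sketch (the station names and threshold value are made up for illustration):

```python
def identify_source(distances, threshold):
    """distances: mapping of source name -> minimum weighted distance.
    Returns the best-matching source, or None if no source is close enough."""
    source, d_min = min(distances.items(), key=lambda item: item[1])
    return source if d_min < threshold else None

print(identify_source({"WXYZ-FM": 0.12, "WABC-TV": 0.85}, threshold=0.3))  # WXYZ-FM
```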
The basic pattern matching procedure is illustrated in FIG. 10. Feature packets from a portable monitoring unit 4 contain 64 samples from each of the three bands. These must first be mu-law decompressed to produce 16 bit linear values. Each of the three feature waveforms is then normalized by dividing each value by the standard deviation (square root of power) computed over the three signals. This corrects for the audio volume to which the portable unit was exposed when the feature packet was collected. Each of the three normalized waveforms is then padded with a block of zeroes to a total length of 128 samples per feature band. This is necessary to take advantage of a fast correlation algorithm based on the FFT.
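The unpacking and conditioning of a feature packet might look like the sketch below, reusing the mu-law expander sketched earlier; the band ordering within the packet is an assumption:

```python
import numpy as np

def prepare_packet(feature_bytes):
    """feature_bytes: 192 one-byte samples, assumed ordered as three
    consecutive 64-sample bands. Decompress, normalize by the standard
    deviation over all three streams, and zero-pad each band to 128 samples."""
    x = mulaw_expand(np.frombuffer(feature_bytes, dtype=np.int8)).astype(np.float64)
    bands = x.reshape(3, 64)
    bands = bands / bands.std()                # volume normalization over the 3 signals
    return np.pad(bands, ((0, 0), (0, 64)))    # shape (3, 128), zero padded
```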
The system then locates a block of samples consisting of 128 samples of each feature as determined by the time alignment calculation. This includes the time offset needed to ensure that the required three-second margins are present at the beginning and end of the expected location of the unknown packet. Next, the system calculates the cross-correlation functions between each of the three waveforms of the unknown feature packet and the corresponding source waveforms. In the fast correlation algorithm this requires that both the unknown and the reference source waveforms be transformed to the frequency domain using a fast Fourier transform. The system then performs a conjugate vector cross-product of the resulting complex spectra and then performs an inverse fast Fourier transform on the result. The resulting correlation functions are then normalized by the sliding standard deviation of each, computed over a 64-sample window.
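One way to realize the fast correlation for a single band is sketched below. It assumes the feature streams are approximately zero-mean (the high-pass filter removed the mean) and normalizes by the packet's standard deviation times the reference's sliding 64-sample standard deviation, a simplification of the normalization described above:

```python
import numpy as np

def fast_correlation(unknown_padded, reference):
    """unknown_padded: 64 feature samples zero-padded to 128 samples;
    reference: 128 reference samples. Returns normalized correlation
    values for lags 0..64 (about +/-3 seconds at 10 samples per second)."""
    corr = np.fft.irfft(np.conj(np.fft.rfft(unknown_padded)) * np.fft.rfft(reference))
    u_std = unknown_padded[:64].std()
    cv = np.zeros(65)
    for lag in range(65):
        r_std = reference[lag:lag + 64].std()   # sliding 64-sample window
        denom = 64.0 * u_std * r_std
        if denom > 0.0:
            cv[lag] = corr[lag] / denom
    return cv
```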
Each of the three correlation functions representing the three frequency bands ranges from a maximum value of one for a perfect match, through zero for no correlation, to minus one for an exact opposite. Each of the correlation values is converted to a distance component by subtracting it from one. The Euclidean distance is preferably defined as set forth in equation (1) below, as the square root of the sum of the squares of the individual components:
D = [(1 − cv1)^2 + (1 − cv2)^2 + (1 − cv3)^2]^(1/2)  (1)
This results in a single number that measures how well a feature packet matches the reference (or source) pattern, combining the individual distances as though they were based on measurements taken in three dimensional space. However, by virtue of normalizing the feature waveforms, each component makes an equal contribution to the overall distance regardless of the relative amplitudes of the audio in the three bands. In one embodiment, the present invention aims to avoid situations where background noise in an otherwise quiet band disturbs the contributions of frequency bands containing useful signal energy. Therefore, the system reintroduces relative amplitude information to the distance calculation by weighting each component by the standard deviations computed from the reference pattern as shown in equation (2) below. This must be normalized by the total magnitude of the signal:
Dw = [((std1)*(1 − cv1))^2 + ((std2)*(1 − cv2))^2 + ((std3)*(1 − cv3))^2]^(1/2) / [(std1)^2 + (std2)^2 + (std3)^2]^(1/2)  (2)
The sequence of operations can be rearranged to combine some steps and eliminate others. The resulting weighted Euclidean distance automatically adapts to the relative amplitudes of the frequency bands and will tend to reduce the effects of broadband noise that is present at the portable unit and not at the source.
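Equation (2) translates directly into code; a sketch:

```python
import numpy as np

def weighted_distance(cv, std):
    """Equation (2): cv holds the per-band correlation values (length 3) and
    std the per-band standard deviations taken from the reference pattern."""
    cv = np.asarray(cv, dtype=np.float64)
    std = np.asarray(std, dtype=np.float64)
    return np.sqrt(np.sum((std * (1.0 - cv)) ** 2)) / np.sqrt(np.sum(std ** 2))
```

When the three reference standard deviations are equal, this reduces to the unweighted distance of equation (1) divided by the square root of three, so the weighting only changes the result when the bands differ in amplitude.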
A variation of the weighted Euclidean distance involves integrating or averaging successive distances calculated from a sequence of feature packets received from a portable unit as shown in FIG. 11. In this procedure, the weighted distance is computed as above for the first packet. A second packet is then obtained and precisely aligned with feature blocks from the same source in the central computer. Again, the weighted Euclidean distance is calculated. If the two packets are from the same source, the minimum distance will occur at the same relative time delay in the distance calculation. For each of the 64 time delays in the distance array for a particular source the system computes a recursive update of the distance, where the averaged distance is decayed slightly by multiplying it by a coefficient k that is less than one. The newly calculated distance is then scaled by multiplying it by (1−k) and added to the average distance. For a particular time delay value within the distance array the update procedure can be expressed as shown in equation (3) below:
Dw_avg(n) = k * Dw_avg(n−1) + (1 − k) * Dw(n)  (3)
Note that Dw_avg denotes the averaged value of the distance calculation, (n) refers to the current update cycle, and (n−1) refers to the previous update cycle. This process is repeated on subsequent blocks, recursively integrating more signal energy. The result is an improved signal-to-noise ratio in the distance calculation that reduces the probability of false detection.
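In code, the recursive update over the array of time delays is a one-line operation per cycle; the decay coefficient here is an assumed value:

```python
import numpy as np

K = 0.7  # decay coefficient k < 1; the actual value is a design choice

def update_average_distance(dw_avg, dw_new):
    """Recursive exponential average over the per-lag distance array:
    Dw_avg(n) = k * Dw_avg(n-1) + (1 - k) * Dw(n)."""
    return K * np.asarray(dw_avg) + (1.0 - K) * np.asarray(dw_new)
```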
The decision rule for this process is the same as for the un-averaged case. The minimum averaged distance from all sources is first found. This is compared with a distance threshold. If the minimum distance is less than the threshold, a detection has occurred and the source identification is recorded. Otherwise the system reports that the source is unknown.
The previous description of the preferred embodiments is provided to enable any person skilled in the art to make and use the present invention. The various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

What is claimed is:
1. A method for correlating a first packet of feature waveforms from an unknown source with a second packet of feature waveforms from a known broadcast audio source in order to associate a known broadcast audio source with the first packet of feature waveforms, comprising the steps of:
(A) receiving free field audio signals using a microphone that is included in a portable data collection unit, wherein the free field audio signals are audible to a user proximate the portable data collection unit, and generating the first packet of feature waveforms in accordance with said free field audio signals received by the microphone; and determining, with at least one processor, at least first, second and third correlation values (cv1, cv2, cv3) by correlating features from the first and second packets, wherein the first correlation value (cv1) is determined by correlating features associated with a first frequency band from the first and second packets, the second correlation value (cv2) is determined by correlating features associated with a second frequency band from the first and second packets, and the third correlation value (cv3) is determined by correlating features associated with a third frequency band from the first and second packets;
(B) computing, with said at least one processor, a first weighting value in accordance with the features from the second packet associated with the first frequency band, a second weighting value in accordance with the features from the second packet associated with the second frequency band, and a third weighting value in accordance with the features from the second packet associated with the third frequency band;
(C) computing, with said at least one processor, a weighted Euclidean distance value (Dw) representative of differences between the first and second packets from the first, second and third correlation values and the first, second and third weighting values;
wherein the first weighting value corresponds to a standard deviation (std1) of the features from the second packet associated with the first frequency band, the second weighting value corresponds to a standard deviation (std2) of the features from the second packet associated with the second frequency band, and the third weighting value corresponds to a standard deviation (std3) of the features from the second packet associated with the third frequency band;
wherein the weighted Euclidean distance value (Dw) is determined in accordance with the following equation:

Dw = [((std1)*(1 − cv1))^2 + ((std2)*(1 − cv2))^2 + ((std3)*(1 − cv3))^2]^(1/2) / [(std1)^2 + (std2)^2 + (std3)^2]^(1/2);
and
(D) determining, with said at least one processor and in accordance with the weighted Euclidean distance value (Dw), whether the first packet derived from the free field audio signals received by the microphone in the portable data collection unit is associated with the known broadcast audio source.
2. A method for correlating a packet of feature waveforms from an unknown source with a packet of feature waveforms from a known broadcast audio source in order to associate a known broadcast audio source with the packet of feature waveforms from the unknown source, comprising the steps of:
(A) receiving free field audio signals using a microphone that is included in a portable data collection unit, wherein the free field audio signals are audible to a user proximate the portable data collection unit, and generating a first packet of feature waveforms in accordance with said free field audio signals received by the microphone; and determining, with at least one processor, at least first, second and third correlation values by correlating features from the first packet and a second packet associated with the known broadcast audio source, wherein the first correlation value is determined by correlating features associated with a first frequency band from the first and second packets, the second correlation value is determined by correlating features associated with a second frequency band from the first and second packets, and the third correlation value is determined by correlating features associated with a third frequency band from the first and second packets;
(B) computing, with said at least one processor, a Euclidean distance value (D(n−1)) representative of differences between the first and second packets from the first, second and third correlation values;
(C) receiving free field audio signals using the microphone that is included in the portable data collection unit in order to generate a third packet of feature waveforms in accordance with said free field audio signals received by the microphone; and determining, with said at least one processor, at least fourth, fifth and sixth correlation values by correlating features from the third packet and a fourth packet associated with the known broadcast audio source, wherein the fourth correlation value is determined by correlating features associated with the first frequency band from the third and fourth packets, the fifth correlation value is determined by correlating features associated with the second frequency band from the third and fourth packets, and the sixth correlation value is determined by correlating features associated with the third frequency band from the third and fourth packets;
(D) computing, with said at least one processor, a Euclidean distance value (D(n)) representative of differences between the third and fourth packets from the fourth, fifth and sixth correlation values;
(E) updating, with said at least one processor, the Euclidean distance value (D(n)) using the Euclidean distance value (D(n−1)); and
(F) determining with said at least one processor and in accordance with the updated Euclidean distance value (D(n)), whether the third packet derived from the free field audio signals received by the microphone in the portable data collection unit is associated with the known broadcast audio source.
3. The method of claim 2, wherein the second and fourth packets are known a priori to represent signals broadcast from the known source.
4. The method of claim 3, wherein the third packet is positioned immediately after the first packet in a sequence of packets of feature waveforms.
5. The method of claim 4, wherein the fourth packet is positioned immediately after the second packet in a sequence of packets of feature waveforms.
6. The method of claim 5, wherein the updated Euclidean distance value (D(n)) is determined in step (E) in accordance with the following equation:

D(n) = k*D(n−1) + (1−k)*D(n)
where k is a coefficient that is less than 1.
7. The method of claim 2, wherein step (F) comprises:
(F) associating the third packet with the known source if the updated Euclidean distance value (D(n)) is less than a threshold.
US09/441,539 1999-06-18 1999-11-16 Audience survey system, and system and methods for compressing and correlating audio signals Expired - Fee Related US7284255B1 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
US09/441,539 US7284255B1 (en) 1999-06-18 1999-11-16 Audience survey system, and system and methods for compressing and correlating audio signals
EP00941506A EP1190510A1 (en) 1999-06-18 2000-06-16 Audience survey system, and systems and methods for compressing and correlating audio signals
BR0011762-5A BR0011762A (en) 1999-06-18 2000-06-16 Audience survey system, and systems and processes for compressing and correlating audio signals
PCT/US2000/016729 WO2000079709A1 (en) 1999-06-18 2000-06-16 Audience survey system, and systems and methods for compressing and correlating audio signals
AU56208/00A AU5620800A (en) 1999-06-18 2000-06-16 Audience survey system, and systems and methods for compressing and correlating audio signals
CA002375853A CA2375853A1 (en) 1999-06-18 2000-06-16 Audience survey system, and systems and methods for compressing and correlating audio signals
JP2001504616A JP2003502936A (en) 1999-06-18 2000-06-16 Audience survey system and system and method for compressing and correlating audio signals

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14019099P 1999-06-18 1999-06-18
US09/441,539 US7284255B1 (en) 1999-06-18 1999-11-16 Audience survey system, and system and methods for compressing and correlating audio signals

Publications (1)

Publication Number Publication Date
US7284255B1 true US7284255B1 (en) 2007-10-16

Family

ID=26837944

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/441,539 Expired - Fee Related US7284255B1 (en) 1999-06-18 1999-11-16 Audience survey system, and system and methods for compressing and correlating audio signals

Country Status (7)

Country Link
US (1) US7284255B1 (en)
EP (1) EP1190510A1 (en)
JP (1) JP2003502936A (en)
AU (1) AU5620800A (en)
BR (1) BR0011762A (en)
CA (1) CA2375853A1 (en)
WO (1) WO2000079709A1 (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070118375A1 (en) * 1999-09-21 2007-05-24 Kenyon Stephen C Audio Identification System And Method
US20070261073A1 (en) * 2006-04-25 2007-11-08 Xorbit, Inc. System and method for monitoring video data
US20070294057A1 (en) * 2005-12-20 2007-12-20 Crystal Jack C Methods and systems for testing ability to conduct a research operation
US20080082995A1 (en) * 2006-09-28 2008-04-03 K.K. Video Research Method and apparatus for monitoring TV channel selecting status
US20080114557A1 (en) * 2006-11-14 2008-05-15 2 Bit, Inc. Variable sensing using frequency domain
US20100095210A1 (en) * 2003-08-08 2010-04-15 Audioeye, Inc. Method and Apparatus for Website Navigation by the Visually Impaired
US20110106587A1 (en) * 2009-10-30 2011-05-05 Wendell Lynch Distributed audience measurement systems and methods
US7966494B2 (en) 1999-05-19 2011-06-21 Digimarc Corporation Visual content-based internet search methods and sub-combinations
US20110244784A1 (en) * 2004-02-19 2011-10-06 Landmark Digital Services Llc Method and apparatus for identification of broadcast source
US8768003B2 (en) 2012-03-26 2014-07-01 The Nielsen Company (Us), Llc Media monitoring using multiple types of signatures
US8885842B2 (en) 2010-12-14 2014-11-11 The Nielsen Company (Us), Llc Methods and apparatus to determine locations of audience members
US20150059459A1 (en) * 2013-08-28 2015-03-05 James Ward Girardeau, Jr. Method and apparatus for recreating machine operation parameters
US9106953B2 (en) 2012-11-28 2015-08-11 The Nielsen Company (Us), Llc Media monitoring based on predictive signature caching
US9158760B2 (en) 2012-12-21 2015-10-13 The Nielsen Company (Us), Llc Audio decoding with supplemental semantic audio recognition and report generation
US9179200B2 (en) 2007-03-14 2015-11-03 Digimarc Corporation Method and system for determining content treatment
US9183849B2 (en) 2012-12-21 2015-11-10 The Nielsen Company (Us), Llc Audio matching with semantic audio recognition and report generation
US9195649B2 (en) 2012-12-21 2015-11-24 The Nielsen Company (Us), Llc Audio processing techniques for semantic audio recognition and report generation
US9292894B2 (en) 2012-03-14 2016-03-22 Digimarc Corporation Content recognition and synchronization using local caching
US9426525B2 (en) 2013-12-31 2016-08-23 The Nielsen Company (Us), Llc. Methods and apparatus to count people in an audience
US9715626B2 (en) 1999-09-21 2017-07-25 Iceberg Industries, Llc Method and apparatus for automatically recognizing input audio and/or video streams
US9794619B2 (en) 2004-09-27 2017-10-17 The Nielsen Company (Us), Llc Methods and apparatus for using location information to manage spillover in an audience monitoring system
US9848222B2 (en) 2015-07-15 2017-12-19 The Nielsen Company (Us), Llc Methods and apparatus to detect spillover
US9924224B2 (en) 2015-04-03 2018-03-20 The Nielsen Company (Us), Llc Methods and apparatus to determine a state of a media presentation device
US10007723B2 (en) 2005-12-23 2018-06-26 Digimarc Corporation Methods for identifying audio or video content
US10242415B2 (en) 2006-12-20 2019-03-26 Digimarc Corporation Method and system for determining content treatment
US10423709B1 (en) 2018-08-16 2019-09-24 Audioeye, Inc. Systems, devices, and methods for automated and programmatic creation and deployment of remediations to non-compliant web pages or user interfaces
US10444934B2 (en) 2016-03-18 2019-10-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10681399B2 (en) * 2002-10-23 2020-06-09 The Nielsen Company (Us), Llc Digital data insertion apparatus and methods for use with compressed audio/video data
US10867120B1 (en) 2016-03-18 2020-12-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10896286B2 (en) 2016-03-18 2021-01-19 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11727195B2 (en) 2016-03-18 2023-08-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6560349B1 (en) 1994-10-21 2003-05-06 Digimarc Corporation Audio monitoring using steganographic information
US6505160B1 (en) 1995-07-27 2003-01-07 Digimarc Corporation Connected audio and other media objects
US8874244B2 (en) 1999-05-19 2014-10-28 Digimarc Corporation Methods and systems employing digital content
US7185201B2 (en) 1999-05-19 2007-02-27 Digimarc Corporation Content identifiers triggering corresponding responses
US8055588B2 (en) 1999-05-19 2011-11-08 Digimarc Corporation Digital media methods
US8572640B2 (en) 2001-06-29 2013-10-29 Arbitron Inc. Media data use measurement with remote decoding/pattern matching
GB2391322B (en) * 2002-07-31 2005-12-14 British Broadcasting Corp Signal comparison method and apparatus
US7920164B2 (en) 2003-07-28 2011-04-05 Nec Corporation Viewing surveillance system for carrying out surveillance independent of a broadcasting form
US7483975B2 (en) * 2004-03-26 2009-01-27 Arbitron, Inc. Systems and methods for gathering data concerning usage of media data
US8738763B2 (en) * 2004-03-26 2014-05-27 The Nielsen Company (Us), Llc Research data gathering with a portable monitor and a stationary device
MX2007002071A (en) 2004-08-18 2007-04-24 Nielsen Media Res Inc Methods and apparatus for generating signatures.
US7623823B2 (en) 2004-08-31 2009-11-24 Integrated Media Measurement, Inc. Detecting and measuring exposure to media content items
BRPI0514559A (en) * 2004-08-31 2008-06-17 Integrated Media Measurement I methods and systems for detecting user exposure to media items and for detecting media target transmission
US20060105702A1 (en) * 2004-11-17 2006-05-18 Muth Edwin A System and method for interactive monitoring of satellite radio use
CA2678942C (en) 2007-02-20 2018-03-06 Nielsen Media Research, Inc. Methods and apparatus for characterizing media
JP4645609B2 (en) * 2007-03-22 2011-03-09 ヤマハ株式会社 Broadcast identification device and automatic performance device
US10489795B2 (en) 2007-04-23 2019-11-26 The Nielsen Company (Us), Llc Determining relative effectiveness of media content items
US8458737B2 (en) 2007-05-02 2013-06-04 The Nielsen Company (Us), Llc Methods and apparatus for generating signatures
EP2210252B1 (en) 2007-11-12 2017-05-24 The Nielsen Company (US), LLC Methods and apparatus to perform audio watermarking and watermark detection and extraction
US8457951B2 (en) 2008-01-29 2013-06-04 The Nielsen Company (Us), Llc Methods and apparatus for performing variable black length watermarking of media
WO2009110932A1 (en) 2008-03-05 2009-09-11 Nielsen Media Research, Inc. Methods and apparatus for generating signatures
GB2465747A (en) 2008-11-21 2010-06-02 Media Instr Sa Audience measurement system and method of generating reference signatures
JP5302085B2 (en) * 2009-04-27 2013-10-02 株式会社ビデオリサーチ Survey system
GB2474508B (en) 2009-10-16 2015-12-09 Norwell Sa Audience measurement system
JP5372825B2 (en) * 2010-03-31 2013-12-18 株式会社エヌ・ティ・ティ・ドコモ Terminal device, program identification method, and program


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5481294A (en) * 1993-10-27 1996-01-02 A. C. Nielsen Company Audience measurement system utilizing ancillary codes and passive signatures
JP3099736B2 (en) * 1996-04-30 2000-10-16 日本電気株式会社 Mobile terminal device
JPH1168994A (en) * 1997-08-26 1999-03-09 Sony Corp Information transmitter, transmission distribution system and charger

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3919479A (en) * 1972-09-21 1975-11-11 First National Bank Of Boston Broadcast signal identification system
US4547804A (en) 1983-03-21 1985-10-15 Greenberg Burton L Method and apparatus for the automatic identification and verification of commercial broadcast programs
US4697209A (en) 1984-04-26 1987-09-29 A. C. Nielsen Company Methods and apparatus for automatically identifying programs viewed or recorded
US4677466A (en) 1985-07-29 1987-06-30 A. C. Nielsen Company Broadcast program identification method and apparatus
US5574962A (en) * 1991-09-30 1996-11-12 The Arbitron Company Method and apparatus for automatically identifying a program including a sound signal
US5373567A (en) * 1992-01-13 1994-12-13 Nikon Corporation Method and apparatus for pattern matching
US5835634A (en) * 1996-05-31 1998-11-10 Adobe Systems Incorporated Bitmap comparison apparatus and method using an outline mask and differently weighted bits
US5826165A (en) 1997-01-21 1998-10-20 Hughes Electronics Corporation Advertisement reconciliation system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
(Merriam Webster's Collegiate Dictionary; Merriam Webster Incorporated; 1997; Tenth Edition; p. 1146). *
(Microsoft Computer Dictionary; Microsoft Press; 1999; p. 421). *

Cited By (97)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7966494B2 (en) 1999-05-19 2011-06-21 Digimarc Corporation Visual content-based internet search methods and sub-combinations
US10449797B2 (en) * 1999-05-19 2019-10-22 Digimarc Corporation Audio-based internet search methods and sub-combinations
US9715626B2 (en) 1999-09-21 2017-07-25 Iceberg Industries, Llc Method and apparatus for automatically recognizing input audio and/or video streams
US7783489B2 (en) * 1999-09-21 2010-08-24 Iceberg Industries Llc Audio identification system and method
US20070118375A1 (en) * 1999-09-21 2007-05-24 Kenyon Stephen C Audio Identification System And Method
US8589169B2 (en) * 2002-07-31 2013-11-19 Nathan T. Bradley System and method for creating audio files
US10681399B2 (en) * 2002-10-23 2020-06-09 The Nielsen Company (Us), Llc Digital data insertion apparatus and methods for use with compressed audio/video data
US11223858B2 (en) * 2002-10-23 2022-01-11 The Nielsen Company (Us), Llc Digital data insertion apparatus and methods for use with compressed audio/video data
US8296150B2 (en) * 2003-08-08 2012-10-23 Audioeye, Inc. System and method for audio content navigation
US20100095210A1 (en) * 2003-08-08 2010-04-15 Audioeye, Inc. Method and Apparatus for Website Navigation by the Visually Impaired
US8046229B2 (en) * 2003-08-08 2011-10-25 Audioeye, Inc. Method and apparatus for website navigation by the visually impaired
US20110307259A1 (en) * 2003-08-08 2011-12-15 Bradley Nathan T System and method for audio content navigation
US9071371B2 (en) 2004-02-19 2015-06-30 Shazam Investments Limited Method and apparatus for identification of broadcast source
US8290423B2 (en) * 2004-02-19 2012-10-16 Shazam Investments Limited Method and apparatus for identification of broadcast source
US9225444B2 (en) 2004-02-19 2015-12-29 Shazam Investments Limited Method and apparatus for identification of broadcast source
US20110244784A1 (en) * 2004-02-19 2011-10-06 Landmark Digital Services Llc Method and apparatus for identification of broadcast source
US8811885B2 (en) 2004-02-19 2014-08-19 Shazam Investments Limited Method and apparatus for identification of broadcast source
US9794619B2 (en) 2004-09-27 2017-10-17 The Nielsen Company (Us), Llc Methods and apparatus for using location information to manage spillover in an audience monitoring system
US8185351B2 (en) * 2005-12-20 2012-05-22 Arbitron, Inc. Methods and systems for testing ability to conduct a research operation
US20070294057A1 (en) * 2005-12-20 2007-12-20 Crystal Jack C Methods and systems for testing ability to conduct a research operation
US8949074B2 (en) 2005-12-20 2015-02-03 The Nielsen Company (Us), Llc Methods and systems for testing ability to conduct a research operation
US8799054B2 (en) 2005-12-20 2014-08-05 The Nielsen Company (Us), Llc Network-based methods and systems for initiating a research panel of persons operating under a group agreement
US10007723B2 (en) 2005-12-23 2018-06-26 Digimarc Corporation Methods for identifying audio or video content
US8255963B2 (en) * 2006-04-25 2012-08-28 XOrbit Inc. System and method for monitoring video data
US20070261073A1 (en) * 2006-04-25 2007-11-08 Xorbit, Inc. System and method for monitoring video data
US20080082995A1 (en) * 2006-09-28 2008-04-03 K.K. Video Research Method and apparatus for monitoring TV channel selecting status
US8548763B2 (en) 2006-11-14 2013-10-01 2 Bit, Inc. Variable sensing using frequency domain
US8060325B2 (en) * 2006-11-14 2011-11-15 2 Bit, Inc. Variable sensing using frequency domain
US20080114557A1 (en) * 2006-11-14 2008-05-15 2 Bit, Inc. Variable sensing using frequency domain
US10242415B2 (en) 2006-12-20 2019-03-26 Digimarc Corporation Method and system for determining content treatment
US9179200B2 (en) 2007-03-14 2015-11-03 Digimarc Corporation Method and system for determining content treatment
US9785841B2 (en) 2007-03-14 2017-10-10 Digimarc Corporation Method and system for audio-video signal processing
US8990142B2 (en) * 2009-10-30 2015-03-24 The Nielsen Company (Us), Llc Distributed audience measurement systems and methods
US10672407B2 (en) 2009-10-30 2020-06-02 The Nielsen Company (Us), Llc Distributed audience measurement systems and methods
US9437214B2 (en) 2009-10-30 2016-09-06 The Nielsen Company (Us), Llc Distributed audience measurement systems and methods
US20200365165A1 (en) * 2009-10-30 2020-11-19 The Nielsen Company (Us), Llc Distributed audience measurement systems and methods
US20110106587A1 (en) * 2009-10-30 2011-05-05 Wendell Lynch Distributed audience measurement systems and methods
US11671193B2 (en) * 2009-10-30 2023-06-06 The Nielsen Company (Us), Llc Distributed audience measurement systems and methods
US8885842B2 (en) 2010-12-14 2014-11-11 The Nielsen Company (Us), Llc Methods and apparatus to determine locations of audience members
US9292894B2 (en) 2012-03-14 2016-03-22 Digimarc Corporation Content recognition and synchronization using local caching
US9986282B2 (en) 2012-03-14 2018-05-29 Digimarc Corporation Content recognition and synchronization using local caching
US9106952B2 (en) 2012-03-26 2015-08-11 The Nielsen Company (Us), Llc Media monitoring using multiple types of signatures
US11044523B2 (en) 2012-03-26 2021-06-22 The Nielsen Company (Us), Llc Media monitoring using multiple types of signatures
US11863820B2 (en) 2012-03-26 2024-01-02 The Nielsen Company (Us), Llc Media monitoring using multiple types of signatures
US8768003B2 (en) 2012-03-26 2014-07-01 The Nielsen Company (Us), Llc Media monitoring using multiple types of signatures
US9674574B2 (en) 2012-03-26 2017-06-06 The Nielsen Company (Us), Llc Media monitoring using multiple types of signatures
US10212477B2 (en) 2012-03-26 2019-02-19 The Nielsen Company (Us), Llc Media monitoring using multiple types of signatures
US11863821B2 (en) 2012-03-26 2024-01-02 The Nielsen Company (Us), Llc Media monitoring using multiple types of signatures
US9723364B2 (en) 2012-11-28 2017-08-01 The Nielsen Company (Us), Llc Media monitoring based on predictive signature caching
US9106953B2 (en) 2012-11-28 2015-08-11 The Nielsen Company (Us), Llc Media monitoring based on predictive signature caching
US9158760B2 (en) 2012-12-21 2015-10-13 The Nielsen Company (Us), Llc Audio decoding with supplemental semantic audio recognition and report generation
US11837208B2 (en) 2012-12-21 2023-12-05 The Nielsen Company (Us), Llc Audio processing techniques for semantic audio recognition and report generation
US9812109B2 (en) 2012-12-21 2017-11-07 The Nielsen Company (Us), Llc Audio processing techniques for semantic audio recognition and report generation
US10360883B2 (en) 2012-12-21 2019-07-23 The Nielsen Company (Us), Llc Audio matching with semantic audio recognition and report generation
US10366685B2 (en) 2012-12-21 2019-07-30 The Nielsen Company (Us), Llc Audio processing techniques for semantic audio recognition and report generation
US9754569B2 (en) 2012-12-21 2017-09-05 The Nielsen Company (Us), Llc Audio matching with semantic audio recognition and report generation
US11094309B2 (en) 2012-12-21 2021-08-17 The Nielsen Company (Us), Llc Audio processing techniques for semantic audio recognition and report generation
US9640156B2 (en) 2012-12-21 2017-05-02 The Nielsen Company (Us), Llc Audio matching with supplemental semantic audio recognition and report generation
US11087726B2 (en) 2012-12-21 2021-08-10 The Nielsen Company (Us), Llc Audio matching with semantic audio recognition and report generation
US9195649B2 (en) 2012-12-21 2015-11-24 The Nielsen Company (Us), Llc Audio processing techniques for semantic audio recognition and report generation
US9183849B2 (en) 2012-12-21 2015-11-10 The Nielsen Company (Us), Llc Audio matching with semantic audio recognition and report generation
US9880529B2 (en) * 2013-08-28 2018-01-30 James Ward Girardeau, Jr. Recreating machine operation parameters for distribution to one or more remote terminals
US20150059459A1 (en) * 2013-08-28 2015-03-05 James Ward Girardeau, Jr. Method and apparatus for recreating machine operation parameters
US9918126B2 (en) 2013-12-31 2018-03-13 The Nielsen Company (Us), Llc Methods and apparatus to count people in an audience
US11711576B2 (en) 2013-12-31 2023-07-25 The Nielsen Company (Us), Llc Methods and apparatus to count people in an audience
US9426525B2 (en) 2013-12-31 2016-08-23 The Nielsen Company (Us), Llc Methods and apparatus to count people in an audience
US11197060B2 (en) 2013-12-31 2021-12-07 The Nielsen Company (Us), Llc Methods and apparatus to count people in an audience
US10560741B2 (en) 2013-12-31 2020-02-11 The Nielsen Company (Us), Llc Methods and apparatus to count people in an audience
US11363335B2 (en) 2015-04-03 2022-06-14 The Nielsen Company (Us), Llc Methods and apparatus to determine a state of a media presentation device
US9924224B2 (en) 2015-04-03 2018-03-20 The Nielsen Company (Us), Llc Methods and apparatus to determine a state of a media presentation device
US11678013B2 (en) 2015-04-03 2023-06-13 The Nielsen Company (Us), Llc Methods and apparatus to determine a state of a media presentation device
US10735809B2 (en) 2015-04-03 2020-08-04 The Nielsen Company (Us), Llc Methods and apparatus to determine a state of a media presentation device
US11716495B2 (en) 2015-07-15 2023-08-01 The Nielsen Company (Us), Llc Methods and apparatus to detect spillover
US10264301B2 (en) 2015-07-15 2019-04-16 The Nielsen Company (Us), Llc Methods and apparatus to detect spillover
US9848222B2 (en) 2015-07-15 2017-12-19 The Nielsen Company (Us), Llc Methods and apparatus to detect spillover
US11184656B2 (en) 2015-07-15 2021-11-23 The Nielsen Company (Us), Llc Methods and apparatus to detect spillover
US10694234B2 (en) 2015-07-15 2020-06-23 The Nielsen Company (Us), Llc Methods and apparatus to detect spillover
US10896286B2 (en) 2016-03-18 2021-01-19 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10928978B2 (en) 2016-03-18 2021-02-23 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11061532B2 (en) 2016-03-18 2021-07-13 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10444934B2 (en) 2016-03-18 2019-10-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11151304B2 (en) 2016-03-18 2021-10-19 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11157682B2 (en) 2016-03-18 2021-10-26 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11029815B1 (en) 2016-03-18 2021-06-08 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10997361B1 (en) 2016-03-18 2021-05-04 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10809877B1 (en) 2016-03-18 2020-10-20 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11080469B1 (en) 2016-03-18 2021-08-03 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11455458B2 (en) 2016-03-18 2022-09-27 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10845947B1 (en) 2016-03-18 2020-11-24 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10867120B1 (en) 2016-03-18 2020-12-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10866691B1 (en) 2016-03-18 2020-12-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10860173B1 (en) 2016-03-18 2020-12-08 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11727195B2 (en) 2016-03-18 2023-08-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10845946B1 (en) 2016-03-18 2020-11-24 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11836441B2 (en) 2016-03-18 2023-12-05 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10762280B2 (en) 2018-08-16 2020-09-01 Audioeye, Inc. Systems, devices, and methods for facilitating website remediation and promoting assistive technologies
US10423709B1 (en) 2018-08-16 2019-09-24 Audioeye, Inc. Systems, devices, and methods for automated and programmatic creation and deployment of remediations to non-compliant web pages or user interfaces

Also Published As

Publication number Publication date
CA2375853A1 (en) 2000-12-28
EP1190510A1 (en) 2002-03-27
AU5620800A (en) 2001-01-09
WO2000079709A1 (en) 2000-12-28
BR0011762A (en) 2002-05-14
JP2003502936A (en) 2003-01-21

Similar Documents

Publication Title
US7284255B1 (en) Audience survey system, and system and methods for compressing and correlating audio signals
US7174293B2 (en) Audio identification system and method
EP2263335B1 (en) Methods and apparatus for generating signatures
EP1133090B1 (en) Apparatus for identifying the members of an audience which are watching a television programme or are listening to a broadcast programme
US9197931B2 (en) System and method for determining broadcast dimensionality
US7954120B2 (en) Analysing viewing data to estimate audience participation
US8060372B2 (en) Methods and apparatus for characterizing media
US20100262642A1 (en) Methods and apparatus for generating signatures
US20120203363A1 (en) Apparatus, system and method for activating functions in processing devices using encoded audio and audio signatures
EP2106050A2 (en) Audio matching system and method
EP0429469A1 (en) Radio meter
EP1737151A2 (en) Fingerprint-based technique for surveying an audience
US10757456B2 (en) Methods and systems for determining a latency between a source and an alternative feed of the source
EP1724755B9 (en) Method and system for comparing audio signals and identifying an audio source
WO2002013396A2 (en) Audience survey system and method

Legal Events

Date Code Title Description
AS Assignment

Owner name: APEL, STEVEN G., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KENYON, STEPHEN C.;REEL/FRAME:010403/0099

Effective date: 19991029

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20111016