WO2016207899A1 - System and method for secured capturing and authenticating of video clips - Google Patents

System and method for secured capturing and authenticating of video clips

Info

Publication number
WO2016207899A1
Authority
WO
WIPO (PCT)
Prior art keywords
video clip
frames
capturing
frame
captured
Prior art date
Application number
PCT/IL2016/050679
Other languages
French (fr)
Inventor
Shlomo MATICHIN
Original Assignee
Capester Ltd
Priority date
Filing date
Publication date
Application filed by Capester Ltd
Priority to IL249739A
Publication of WO2016207899A1
Priority to IL253192A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/27 Server based end-user applications
    • H04N21/274 Storing end-user multimedia data in response to end-user request, e.g. network recorder
    • H04N21/2743 Video hosting of uploaded data from client
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4408 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving video stream encryption, e.g. re-encrypting a decrypted video stream for redistribution in a home network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/10 Integrity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/60 Context-dependent security
    • H04W12/61 Time-dependent
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/60 Context-dependent security
    • H04W12/63 Location-dependent; Proximity-dependent

Definitions

  • the system may include one or a plurality of capturing devices, each of the capturing devices comprising a camera and a first processor, and configured to acquire a time stamp; capture a video clip of a scene and attach to the captured video clip metadata that includes the time stamp; transmit a notification that the capturing of the video clip has ended; upload the captured video clip with the attached metadata to a remote service unit.
  • the system may also include a remote service unit comprising a second processor and storage device, configured to: receive the notification; receive an uploaded video clip; authenticate the received video clip by: verifying that the received video clip is unedited with respect to the captured video clip; and verifying that a time stamp in a metadata attached to the received video clip is identical to the acquired time stamp and that a time interval between the time indicated by the attached time stamp and a time at which the notification was received is equal to a length of the video clip and a limited time margin.
  • each of said one or a plurality of the capturing devices is configured to calculate a hash or a digital signature for the captured video clip or the attached metadata, and wherein the remote service unit is configured to verify the hash or the digital signature.
  • the time stamp is based on a trusted clock source selected from the group consisting of a clock of the remote service unit, a clock of a trusted third party server, a trusted hardware module with clock capability.
  • the remote service unit is further configured to identify a falsified video clip, by computing camera movement of the capturing device using a computer vision technique applied on consecutive frames in the received video clip; and comparing physical motion that was detected by one or a plurality of motion sensors of a capturing device that captured the captured video clip while the capturing device was capturing that video clip with the computed camera movement; and if the recorded physical motion and the computed camera movement do not correlate, determining that the video clip is falsified.
  • the remote service unit is further configured to determine that a scene recorded in the received video clip was captured from a digital display device by: performing frequency analysis on one or a plurality of frames of the received video clip, and detecting artifacts in said one or a plurality of frames that indicate that the scene recorded in said one or a plurality of frames was captured from a digital display device.
  • the artifacts are identified as an energy level above a threshold for frequencies that are greater than a threshold in a vertical axis and greater than a threshold in a horizontal axis of a frequency analysis of said one or plurality of frames.
  • the artifacts are identified as echoes across a vertical axis or across a horizontal axis of a frequency analysis of said one or plurality of frames.
  • the remote service unit is further configured to determine that a video clip was digitally manipulated, by: compressing and decompressing one or a plurality of frames of the received video clip; subtracting each of said one or a plurality of frames from a corresponding frame of said compressed and decompressed one or a plurality of frames to obtain an error level analysis (ELA) image; and determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the ELA image is above a predetermined threshold.
  • the remote service unit is further configured to determine whether a video clip is a product of digital manipulation combining video from different capturing sessions, by: computing camera movement using a computer vision technique applied on consecutive frames in the received video clip; for each pair of a first and second consecutive frames of the received video clip, applying a perspective transform on the first frame based on the computed camera movement for that pair of frames to produce a transformed first frame, and subtracting the second frame from the transformed first frame, thereby producing a second video clip; accumulating frame data in the second video clip to produce a noise image; and determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the noise image deviates from a value of the same attribute of other patches of the noise image above a predetermined threshold.
  • the remote service unit is further configured to authenticate location data attached as metadata to the received video clip relating to a capturing device of said one or a plurality of capturing devices, by comparing location data for that device from two independent location data sources.
  • a service unit of a system for authenticating of video clips including one or a plurality of capturing devices, each of the capturing devices configured to acquire a time stamp; capture a video clip of a scene and attach to the captured video clip metadata including the time stamp, transmit a notification that the capturing of the video clip has ended, and upload the captured video clip with the attached metadata to a remote service unit;
  • the service unit comprising a second processor and storage device and configured to: receive the notification; receive an uploaded video clip; authenticate the received video clip by: verifying that the received video clip is unedited with respect to the captured video clip; and verifying that a time stamp in a metadata attached to the received video clip is identical to the acquired time stamp and that a time interval between the time indicated by the attached time stamp and a time at which the notification was received is equal to a length of the video clip and a limited time margin.
  • a method for secured capturing and authenticating of video clips for use in cooperation with one or a plurality of capturing devices, each of the capturing devices configured to acquire a time stamp; capture a video clip of a scene and attach to the captured video clip metadata including the time stamp, transmit a notification that the capturing of the video clip has ended; upload the captured video clip with the attached metadata to a remote service unit.
  • the method may include: receiving the notification; receiving an uploaded video clip; authenticating the received video clip by: verifying that the received video clip is unedited with respect to the captured video clip; and verifying that a time stamp in a metadata attached to the received video clip is identical to the acquired time stamp and that a time interval between the time indicated by the attached time stamp and a time at which the notification was received is equal to a length of the video clip and a limited time margin.
  • an authentication method for identifying a falsified video clip may include: recording physical motion detected by one or a plurality of motion sensors of a capturing device while the capturing device is capturing the video clip; computing camera movement of the capturing device using a computer vision technique applied on consecutive frames in the video clip; and comparing the recorded physical motion with the computed camera movement; and if the recorded physical motion and the computed camera movement do not correlate, determining that the video clip is falsified.
  • an authentication method for determining that a scene recorded in an image was captured from a digital display device may include: performing frequency analysis on the image, and detecting artifacts in the frequency analysis that indicate that the scene recorded in the image is a scene of a digital display device.
  • an authentication method for determining that a video clip was digitally manipulated may include: for each frame of the video clip, compressing and decompressing one or a plurality of frames of the received video clip; subtracting each of said one or a plurality of frames from a corresponding frame of said compressed and decompressed one or a plurality of frames to obtain an error level analysis (ELA) image; and determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the ELA image is above a predetermined threshold.
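  • By way of illustration, the ELA check described above might be sketched as follows in Python with OpenCV and NumPy; the JPEG quality, patch size and decision threshold are assumed values, not values specified here:

    import cv2
    import numpy as np

    def ela_image(frame, jpeg_quality=90):
        # Re-compress the frame and return the per-pixel absolute error image.
        ok, encoded = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
        recompressed = cv2.imdecode(encoded, cv2.IMREAD_COLOR)
        return cv2.absdiff(frame, recompressed)

    def frame_looks_manipulated(frame, patch=16, threshold=30.0):
        # Flag the frame if any pixel patch of the ELA image has a mean error above threshold.
        ela = ela_image(frame).astype(np.float32).mean(axis=2)  # average over color channels
        h, w = ela.shape
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                if ela[y:y + patch, x:x + patch].mean() > threshold:
                    return True
        return False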
  • an authentication method for determining whether a video clip is a product of digital manipulation combining video from different capturing sessions may include: computing camera movement using a computer vision technique applied on consecutive frames in the video clip; for each pair of a first and second consecutive frames of the video clip, applying a perspective transform on the first frame based on the computed camera movement for that pair of frames to produce a transformed first frame, and subtracting the second frame from the transformed first frame, thereby producing a second video clip; accumulating frame data in the second video clip to produce a noise image; and determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the noise image deviates from a value of the same attribute of other patches of the noise image above a predetermined threshold.
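  • By way of illustration, the noise-image test described above might be sketched as follows; estimate_homography stands in for any per-pair camera transform estimator (such as the feature-based estimation outlined further below), and the patch size and deviation factor are assumed values:

    import cv2
    import numpy as np

    def noise_image(frames, estimate_homography):
        # Accumulate residuals between each frame and its motion-compensated predecessor.
        acc = np.zeros(frames[0].shape[:2], dtype=np.float64)
        for prev, curr in zip(frames, frames[1:]):
            h_matrix = estimate_homography(prev, curr)  # 3x3 perspective transform for the pair
            warped = cv2.warpPerspective(prev, h_matrix, (prev.shape[1], prev.shape[0]))
            acc += cv2.absdiff(curr, warped).astype(np.float64).mean(axis=2)
        return acc / max(len(frames) - 1, 1)

    def patch_deviates(noise, patch=16, sigma_factor=3.0):
        # Flag the clip if any patch's mean noise deviates strongly from the other patches.
        h, w = noise.shape
        means = np.array([noise[y:y + patch, x:x + patch].mean()
                          for y in range(0, h - patch + 1, patch)
                          for x in range(0, w - patch + 1, patch)])
        return bool(np.any(np.abs(means - means.mean()) > sigma_factor * means.std()))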
  • a method for authenticating location data relating to a device may include: comparing location data for that device from two independent location data sources.
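  • By way of illustration, the cross-check of two independent location sources (e.g., a GPS fix versus a WiFi/cell based fix) might be sketched as follows; the 200 m tolerance is an assumed value:

    import math

    def haversine_m(lat1, lon1, lat2, lon2):
        # Great-circle distance in meters between two WGS84 coordinates.
        r = 6371000.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def locations_consistent(fix_a, fix_b, tolerance_m=200.0):
        # fix_a and fix_b are (latitude, longitude) pairs from two independent sources.
        return haversine_m(fix_a[0], fix_a[1], fix_b[0], fix_b[1]) <= tolerance_m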
  • FIG. 1 is a block diagram of a device for secured capturing of video clips according to embodiments of the present invention
  • FIG. 2A is a block diagram of a system for secured capturing and authenticating of video clips according to embodiments of the present invention
  • FIG. 2B is a block diagram of functionalities of a system for secured capturing and authenticating of video clips according to embodiments of the present invention
  • Fig. 2C is a time-line graph depicting the flow of data and operations between the various functionalities of the system for secured capturing and authenticating of video clips, according to embodiments of the present invention
  • FIG. 3A is a flow chart depicting a method for detecting tampered video clip based on movement detection and measurement, according to some embodiments of the present invention.
  • Figs. 3Bi-3Biv illustrate two consecutive frames out of a hypothetical video where due to camera motion a rectangular item in one of the frames appears rotated in the other frame.
  • the diagrams visualize the principle of operation of an algorithm for estimating camera movement from video frames (camera transform), according to some embodiments of the present invention
  • Fig. 3Bi shows the first frame
  • Fig. 3Bii shows the next consecutive frame
  • Fig. 3Biii points to specific features (corners) that are identified in the first frame
  • Fig. 3Biv shows these features identified on the rotated item, thus allowing computing the transform of the rectangular item between the first frame and second frame.
  • Fig. 3C depicts a frame from a video encoded with H.264 encoder, with the encoded motion vectors drawn as arrows over the frame.
  • Fig. 3D depicts a real-world example of two correlating signals, according to some embodiments of the present invention: the top signal is the y axis rotation rate in radians as measured by an accelerometer sensor on the capturing device over time, and the bottom signal is the two-dimensional (2D) transform in pixel units between each two consecutive frames of a recorded video.
  • Fig. 4A is a flow chart depicting a method for detecting whether a scene captured in a video clip was captured from a digital display device (such as a computer screen or TV set), according to some embodiments of the present invention.
  • Fig. 4B is the magnitude spectrum image computed for a still photograph taken of a real scene (not captured from a digital display device)
  • Fig. 4C is the magnitude spectrum image computed for a still photograph of a scene similar to the scene of the photograph whose magnitude spectrum image is shown in Fig. 4B, but captured from a digital display device, that displayed a digital image of that scene.
  • Fig. 4D is the magnitude spectrum image of Fig. 4C, after applying a thresholding computation step and an octagon mask, according to some embodiments of the present invention.
  • Fig. 4E is the image of Fig. 4D after applying a blurring step.
  • Fig. 4F is a frame taken from a video of a real scene.
  • Fig. 4G is a frame taken from a video capturing a computer screen (a digital display device) displaying the video of Fig. 4F.
  • Fig. 4H is the result of computing the magnitude spectrum for the frame of Fig. 4F.
  • Fig. 4I is the result of computing the magnitude spectrum for the frame of Fig. 4G.
  • Fig. 5A is a flow chart depicting a method for detecting whether a video clip was digitally manipulated, by compressing and decompressing each frame, according to some embodiments of the present invention.
  • Fig. 5B is a frame of a real video that was not digitally manipulated.
  • Fig. 5C is a frame of a video that was created from the video of Fig. 5B, but was digitally manipulated to replace the license plate number with another license plate number.
  • Fig. 5D is an Error Level Analysis (ELA) image calculated for the frame in Fig. 5B, zoomed in on the back of the vehicle depicted in that frame, according to some embodiments of the present invention.
  • Fig. 5E is the Error Level Analysis (ELA) image calculated for the frame in Fig. 5C, zoomed in on the back of the vehicle depicted in that frame.
  • Fig. 6A is a flow chart depicting a method for detecting whether a video clip is a product of digital manipulation combining video from different capturing sessions, according to some embodiments of the present invention
  • Fig. 6B is a noise image computed from a video captured on a Google
  • Fig. 6C is a noise image computed from a video of the same scene of Fig. 6B.
  • Fig. 7 is a flow chart depicting a method for detecting whether location data relating to a device was manipulated, according to some embodiments of the present invention.
  • the terms “plurality” and “a plurality” as used herein may include, for example, “multiple” or “two or more”.
  • the terms “plurality” or “a plurality” may be used throughout the specification to describe two or more components, devices, elements, units, parameters, or the like.
  • the term "set" when used herein may include one or more items.
  • the method embodiments described herein are not constrained to a particular order or sequence. Additionally, some of the described method embodiments or elements thereof can occur or be performed simultaneously, at the same point in time, or concurrently.
  • a memory used in embodiments of the present invention may be, or may include, for example, a Random Access Memory (RAM), a read only memory (ROM), a Dynamic RAM (DRAM), a Synchronous DRAM (SD-RAM), a double data rate (DDR) memory chip, a Flash memory, a volatile memory, a non-volatile memory, a cache memory, a buffer, a short term memory unit, a long term memory unit, or other suitable memory units or storage units.
  • Such memory may be, or may include, a plurality of possibly different memory units.
  • a processor, or processors used in embodiments of the present invention may be, for example, a central processing unit processor (CPU), a controller, an on-chip computing device, or any suitable computing or computational device configured to execute programs according to embodiments of the present invention.
  • an operating system may be or may include any code segment designed and/or configured to perform tasks involving coordination, scheduling, arbitration, supervising, controlling or otherwise managing operation of a processor according to embodiments of the present invention.
  • the operating system may be a commercial operating system and/or a proprietary operating system.
  • a storage unit may refer to any apparatus, device, system and/or array of devices that is configured to store data, for example, video recordings.
  • the storage unit may include a mass storage device, for example Secure Digital (SD) cards, an optical storage device such as a CD, a DVD, or a laser disk; a magnetic storage device such as a tape, a hard disk, Redundant Array of Independent Disks (RAID), Direct-Attached Storage (DAS), Storage Area Network (SAN), a Network Attached Storage (NAS), cloud storage or the like.
  • Each of the storage units may include the ability to write data to the storage and read the data from the storage unit for further use, e.g., video files may be read from the storage unit, upon a request, for example, when an investigation of an incident is required.
  • the term "organization” may refer to any organization, company, government institution, either profitable or non-profitable organization that may use a plurality of cameras (e.g., surveillance cameras) to monitor and record activities in a defined area which is under the supervision or influence of the organization.
  • such organizations may include, for example, airports, police stations, jails, casinos, banks, department stores, companies, industrial facilities or hospitals.
  • Some embodiments of the present invention involve computations on numerical values of pixels in video frames and still images. These computations are described in terms of performing them image-wise, but in essence the creation of an actual new image, i.e., a matrix of pixel values per channel, might not be necessary. For example, the result of subtraction of one image from a second image might be referred to as a third image, a matrix containing the numerical value of the subtraction per pixel per channel. Yet the computation may be performed without ever actually generating and storing such an image (i.e., matrix), instead handling the values in a different format, or even without storing the result of the computation at all. For visualization of the computation stages, figures that depict the resulting images for such computations (e.g., subtraction) were added, but these figures are intended to clarify and visualize the computation and do not represent the only way to complete these computations.
  • Some embodiments of the invention may be related to recording, in real time, of video data captured by cameras, for example, cameras of a surveillance system.
  • capturing, streaming, receiving, storing or performing any other action on a video stream in real time relates to performing these actions on the video stream at substantially the same rate in which the video stream is captured.
  • for example, receiving of incoming packets of a video stream that was captured over a certain time period may last substantially no more than that time period.
  • the video recording is carried out in off-line mode, or in near real-time mode.
  • a capturing device may record continuously (e.g., a dashboard camera), and when prompted, send a video clip lasting a predetermined period of time of the last footage obtained (e.g., last 1, 2 or 3 minutes, etc.).
  • the video data may be encoded and decoded (i.e., compressed and decompressed) according to any known or proprietary video compression standard such as H.264/MPEG-4, H.265, H.263, MPEG2, WebM, VP8/9, Ogg Vorbis, etc., using any known video codec, for example, H.264/MPEG-4 AVC, Microsoft On2, H.265 or H.263 codecs, WebM, VP8/9, Ogg Vorbis, Motion JPG, etc.
  • the video might also be a single-frame video (as some embodiments of the present invention do not require more than one frame for authentication), in which case the video might actually refer to a still image, which may be encoded and decoded (i.e., compressed and decompressed) according to any known or proprietary image compression standard such as JPEG, PNG, BMP, TIFF, etc.
  • Embodiments of the present invention allow capturing and collecting video clips in a secured manner so that the video clip is protected from changes / modifications / tampering after its original recording.
  • systems and methods according to some embodiments of the present invention seek to verify that a video clip that was received by a remote service unit only includes an unedited complete continuous image sequence depicting a real scene, as it was originally captured by a capturing device, and was not manipulated (e.g., by adding or removing details, or otherwise digitally modifying pixels or frames of the video clip or metadata attached to these clips).
  • the systems and methods according to some embodiments of the invention are designed to make sure that the received video clip is an identical copy (bit-to-bit) of a video clip that was captured by the capturing device.
  • the systems and methods according to some embodiments of the invention are designed to make sure that the entire scene captured in the captured video clip is a real scene and not a scene captured from a picture of a previously captured scene, and that the alleged time of recording of the received video clip is the actual time the scene was recorded on the captured video.
  • This novel feature may provide for admissibility of the video clip as evidence in legal proceedings, as well as for additional purposes.
  • a user of an application according to embodiments of the present invention that is executed, for example, on a user's smartphone (an “app” or “mobile app"), may record a video clip documenting, for example, a violation of law (a misdemeanor, felony, or crime).
  • the video clip may be saved, for example stored in a cloud based storage, and then may be authenticated.
  • the authentication process may include one or a plurality of tests, such as a sanity check of the time stamp of the video file, as well as more complicated tests, such as a process that crosschecks several inputs, for example readings from accelerometers collected all throughout the video recording on the user's recording device.
  • After recording of the video clip, the clip may be made available for inspection by a remote inspecting agent (e.g., remote application), such as a Backend web application.
  • the remote application may be operated and monitored by specialists, such as an expert staff of a service provider according to embodiments of the present invention and officials of a respective municipality/police or other law enforcement authority, who can decide whether the video clip actually depicts a violation of law that requires enforcement / intervention / penalization.
  • a copy of the evidential video file may be prepared for future use in legal proceedings, for example along with the required authenticity proofs.
  • Falsification, as used throughout the description of embodiments of the invention, relates to changes that are made in a video clip / video file after it was taken and recorded. Such changes may be made in order to change the visual appearance of certain elements in the video clip, or in order to change / delete / add data related to the video clip, such as data included in the video metadata.
  • Falsification of a video clip may take place in various forms and ways, such as editing the video (removing, adding, rearranging and stretching segments of the video); digital image manipulation (for example, using video or photographic editing tools, like "Adobe Photoshop”, “Adobe Premiere”, etc.): changing the video such that new details are added or removing details from the original video, or finally, changing proportions or distances.
  • An additional type of falsification may include, for example, keeping the graphical features of the video clip intact but using a video feed that does not represent a real event, either by appending incorrect time and location data to the video feed, or, in the case of a video feed that was actually recorded at the "correct" time and place, by capturing the video from a digital display device, such as a screen of another device (so that the time, location, and content purportedly indicate that an event was actually filmed there, but the video still does not represent a genuine event that took place in reality at that time and at that location).
  • Systems and methods according to some embodiments of the present invention are configured to protect against various threats.
  • A first type of threat relates to defects in the computing environment (i.e., "bugs") that can allow unauthorized insertion of incorrect data. If authenticity of an event of interest is based on the authenticity of video frames, but the time of the event as recorded with the video frames turns out to be incorrect, it does not matter that the origin of the problem was an accidental defect and not an intentional attack. Similarly, using a non-original video clip to prove authenticity of an event of interest, or non-matching graphical data, such as a wrong license plate recorded in the video or an incorrect address recorded in the data associated with the video, and so on, would have the same effect.
  • a second threat may be tampering with the system, which may be intentional or accidental. Most tampering events are intercepted at the verification stage, as explained in detail herein below. Tampering with the system may include:
  • Static reverse engineering includes reading the "disassembled" code of the communication protocol module used by the app, or the actual app.
  • Dynamic reverse engineering may include using a debugger to run the app and attempting to change in-memory data or code. Finally, the attacker may focus his efforts on the API between the video upload functionality 102 and communication handling functionality 104.
  • An additional threat vector stems from the fact that each of the online services of the system can be the target of a potential attack and, if access is gained, be used to inject false data, modify existing data, or erase data.
  • a method and system for securely capturing and authenticating video clips may incorporate three layers that greatly increase the credibility of authentication. These layers are: a hacker-proofing layer, a digital signature layer, and an algorithmic computer vision and signal processing layer.
  • the hacker-proofing layer may include tamper-proofing the video capturing device, anti-debugging and anti-reverse engineering techniques applied to the mobile phone application, and network security (e.g., firewalls and strong passwords) of the remote service unit.
  • the digital signature layer may include hashing and signing of captured data, that may include the video file, parts of the video file, and some or all attached metadata records (e.g., location data).
  • the third layer of computer vision and signal processing algorithms may provide confirmation that the data that was digitally signed was obtained from a genuine source (e.g., a video is not a capture of another digital display device, such as a computer monitor or a TV set).
  • Hash Function - A cryptographic hash function, such as SHA1, SHA256, SHA512, MD5, etc., which is used for data integrity verification.
  • Digital Signature Algorithm - A cryptographic algorithm such as RSA, ECDSA, etc.
  • OpenCV - Open Source Computer Vision Library; an open source computer vision and machine learning software library.
  • Pixel Patch - A continuous group of adjacent pixels from an image or a frame, for example, an 8 by 8 group of pixels, 16 by 16 pixels, etc.
  • Attribute of a Pixel Patch - The result of a calculation performed on the numerical values of the pixels in the pixel patch, producing a single number. With this single number other calculations can be performed, such as comparison with other pixel patches.
  • the numerical value for a pixel may include one or a plurality of numbers, per channel (e.g., color channel) in the image.
  • the calculation can be, for example, an energy function (the sum, over the pixels, of the pixel numerical value raised to a power, e.g., squared, i.e., power of 2).
  • the attribute is generally a single-dimensional number, but may be a number of several dimensions, e.g., a complex number.
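  • For example, the energy attribute of a patch might be computed as follows (a minimal NumPy sketch; the 8 by 8 patch size is only an example):

    import numpy as np

    def patch_energy(patch):
        # patch is, e.g., an 8x8xC block of pixel values; the result is a single number.
        return float(np.sum(patch.astype(np.float64) ** 2))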
  • Capturing Session - A single activation event of a capturing device.
  • different capturing sessions may refer to two different capturing activations on the same capturing device, or two different devices.
  • SIFT - Scale-Invariant Feature Transform; a known algorithm for extracting keypoints and computing descriptors.
  • FLANN - Fast Library for Approximate Nearest Neighbors; a software library that contains a collection of algorithms optimized for fast nearest-neighbor search in large datasets and for high-dimensional features.
  • Kalman filter - An algorithm that operates on streams of noisy input data to produce a statistically optimal estimate of the underlying system state.
  • FFT - Fast Fourier Transform; a known algorithm that converts time (or space) to frequency (or wavenumber) and vice versa.
  • ELA - Error Level Analysis; a forensic method for identifying areas in an image that are at different compression levels.
  • BSSID - Basic Service Set Identifier of a WiFi access point; it is the MAC address of the access point.
  • NMEA - A data protocol for communication with GPS receivers, as defined by the National Marine Electronics Association.
  • The protocol is composed of individual "sentences" of data.
  • Fig. 1 is a block diagram of device 100 for secured capturing, according to some embodiments of the present invention.
  • Device 100 may include camera 120 configured to capture still and/or video images and to store them as a file or files on storage device 130 under the management of controller 110.
  • Device 100 may further include internal movement / spatial location / orientation sensors 124, such as Micro Electro Mechanical Systems (MEMS) accelerometers, gyroscope, compass, pedometer or other sensors, or external location / orientation sensors (not shown), which are configured to provide data indicative of the orientation of device 100 (e.g., with respect to an external reference frame, such as the horizon).
  • Device 100 may further include internal time stamp source 126 configured to provide time stamps to images taken by camera 120.
  • Controller 110 may be configured to manage images taken by camera 120, e.g. save such images along with respective information provided by orientation sensors 124 and by time stamp source 126 in memory 130, for example in one of known graphical formats applicable for the taken stills or video images. Controller 110 may further be in operative communication with input/output (I/O) unit 140 configured to enable wired / wireless communication of device 100 with other units over one or more known communication networks or channels such as Internet, intranet, wireless networks (cellular networks, WiFi based communication channels), wired communication busses, and the like, as is known in the art.
  • Device 100 may be, for example, a smartphone on which an appropriate application was installed, which uses the built-in features and functionalities of the smartphone, a digital camera with communication capabilities, or other specifically designed device.
  • System 200 may include one or more devices 100 for securely capturing video clips, such as device 100 of Fig. 1, in operative communication with remote service unit 250.
  • Remote service unit 250 is configured to perform the role of managing the lifecycle of the evidence after its recording - such as, for example, storage, verification, archiving, user browsing interface and export to other systems.
  • Authentication of video clips taken by a user using a smartphone or other personal device may be performed in several steps, some of which may be performed on the user's own personal smart device and others on a remote unit, such as remote service unit 250 of Fig. 2A.
  • System 200 may comprise one or more devices 100, such as device 100 of Fig. 1 and remote service unit 250 in operative communication between them.
  • Device 100 and remote service unit 250 may be implemented as computer programs executed on computing devices, as hardware configured to perform the respective functionality or a combination of hardware and software, as is known in the art.
  • Device 100 may include video upload module 102, configured to read video files stored on a local storage and communicate them to remote service unit 250 for storing on a local storage 252.
  • Video upload module 102 may be implemented as software, written in any suitable language or development tool that conforms to the operating system (OS) of the respective smartphone, such as Java on Android, and Objective-C/Swift on iOS, or hardware, or a combination thereof.
  • Device 100 may further include communication and handling module 104.
  • Communication handling module 104 may be, for example, developed in any suitable programming language (e.g., C++ programming language) as a module that is executable unit in both Android and iOS applications.
  • Communication handling module 104 is configured to securely communicate with Endpoint server 258 of remote service unit 250, and to provide runtime security for the data being recorded. All data coming in or leaving the communication handling module 104 may be scrambled or encrypted - the communication from the API (application program interface) to video upload module 102 is scrambled. The communication handling module 104 may save encrypted settings file, and the communication protocol with Endpoint server 258 may also be encrypted (e.g., on top of SSL encryption protocol).
  • the code of the module itself is designed to prevent or substantially interfere with tampering attempts - static or dynamic reverse engineering, man in the middle attacks, and so on, by incorporating mechanisms such as, for example, code obfuscation, code scrambling, and anti-debugging techniques like debugger detection.
  • Endpoint server 258 may be a "web" application, written in any suitable language or tool, such as Golang, that may be executed on the host OS, such as Heroku (PaaS cloud service - at any given moment, it might run on 0, 1 or more servers). Endpoint server 258 communicates with communication handling functionality 104 using another layer of encryption on top of SSL. Endpoint server 258 may store the uploaded data to database (DB) 257, and may provide the mobile app with credentials to upload a file to storage unit 252, such as S3 cloud storage, after a video was successfully recorded.
  • the database 257 that is used may be any suitable remote DB, such as MongoDB, hosted on the cloud by MongoLab.
  • MongoLab uses Amazon EC2 to host their service.
  • Servant service 256 may be embodied with one or more computing instances, such as Amazon Elastic Compute Cloud (EC2) instances, that wait for "chores" to perform.
  • a signing chore may be created right after the endpoint server 258 received the notification that the capturing has finished and before the upload of the video clip file.
  • the signing chore may sign the video clip contents by signing its hash (i.e. message-digest), and may sign all the metadata attached to the video clip (location, time, sensors, etc.) termed the "metadata”.
  • the metadata may be formatted as a JavaScript Object Notation (JSON) formatted file.
  • the metadata may be signed by servant service 256, using the signing API provided by signing service 254.
  • the metadata may also contain the hash of the video clip.
  • Another chore may take place after the file has been successfully uploaded to storage unit 252. At this stage the automatic verification of the video clip takes place.
  • the collected clocks and various location inputs may be compared to detect inconsistencies.
  • Image processing may be applied to the video to detect unauthorized video clips, including comparison with the sensor data.
  • Signing service 254 may be embodied, for example, by a Windows VM hosted on Azure, which adheres to certification requirements, and thus the signature produced by it conforms to the requirements of the electronic (digital) signature regulations.
  • Other possible embodiments of signing service 254 include Linux or Mac OS X VMs hosted on any cloud hosting service or on any physical computer attached to the internet, a third-party digital signing service, or a Hardware Security Module (HSM).
  • Signing service 254 may validate that the request for verification originates from servant service functionality 256, using several methods, such as cryptography and network transaction origin filtering.
  • backend web application 260 may display unauthorized occurrences identified in previous stages to a human operator. The operator may watch this report using a suitable browser, displayed on a display device.
  • Modules in remote service 250 may be embodied using centralized or decentralized units available for operative communication with other modules of remote service unit 250. Certain modules may be embodied using cloud services, such as cloud based storage, cloud based computing, etc., as is known in the art.
  • Remote service unit 250 may further include servant service unit 256 configured to perform video file signing, for example using signing service 254 module and video file verification.
  • Video file verification may be performed, according to some embodiments of the present invention, based on comparison of values received from device 100, the user's personal device that originally captured the video clip. For example, the collected clocks may be compared to detect inconsistencies; the various location inputs may be compared; and image processing may be applied to the video to determine whether it is an unauthorized video clip, and the result may be compared with the sensor data.
  • Fig. 2C is a time-line graph depicting the flow of data and operations between the various functionalities of the system for secured capturing and authenticating of video clips, according to some embodiments of the present invention.
  • the timeline is from top to bottom, and each arrow is a network message between two functionalities.
  • the trigger for these interactions is the beginning of capturing of a video by the capturing device (e.g., the user of the device, which may be a mobile smartphone running a specific application, requests to start recording a new video by pressing a button on the screen).
  • the topmost two arrows relate to the device communication module 104 asking the endpoint server 258 to start recording.
  • Endpoint server 258 responds with a cryptographically verifiable token containing a timestamp.
  • the device 100 then performs the recording of the video, attaching location data (e.g., GPS), movement sensor data and the token as the metadata for the video recording.
  • the device calculates hashes for the video and metadata, during the recording process, and after the recording is complete.
  • a notification that the capturing of the video clip has ended, with the hashes is then sent from the communication module 104 of the capturing device 100 to the endpoint server 258 (arrow named "upload hashes" in Fig. 2C).
  • Endpoint server 258 in turn creates a database 257 record for that recording, which in turn triggers a "signing chore" message to the servant 256.
  • the servant uses the hashes received to request digital signatures from signing service 254, and finishes by storing the received signatures back to the database 257. Since digital signatures assure that the content of the video and attached metadata has not been changed since the hash for which the signature was calculated was produced (back on the capturing device 100), this allows the video file and metadata to be uploaded at a later time, for example when a WiFi internet connection is available, instead of using up the cellular data package to upload the video and metadata (which are sometimes rather large).
  • the capturing device 100 When the capturing device 100 is ready to upload the video and metadata, it first requests upload credentials from endpoint server 258 (a message from communication module 104), and can then upload directly to the storage unit 252 (using video file upload module 102), which allows upload only with credentials generated by the endpoint server 258.
  • the storage unit 252 notifies the endpoint server 258 that the upload is done, after which the endpoint server 258 marks the database 257 record as uploaded. This in turn triggers a "verification chore" message to the servant 256, and execution of the verification and authentication algorithms.
  • the servant may then verify the digital signature of the video and metadata, verify the timestamp in the metadata, and authenticate the video using the methods described in the present invention.
  • the sequence ends when the result of authentication is stored back into the database 257. Verification of authenticity of the video recording time may include checking the time interval between the time the timestamp was created (the time indicated by the time stamp), and the time the notification on the end of the capturing of the video clip was received at the endpoint server 258.
  • the timestamp might be acquired from a trusted clock source, such as the endpoint server 258 itself when the endpoint server 258 provides the token to the communication module 104 before the recording, but might also be acquired from other sources, such as other network services, for example a trusted third party server or trusted hardware modules with clock capabilities, as long as it can later be verified that the timestamp was not changed, for example by digitally signing the timestamp or by storing a copy of the acquired timestamp in the database 257. These two time values originate from trusted clock sources and may therefore serve in the authentication process of the video clip. If the time interval between acquiring the timestamp and receiving the notification at the endpoint server 258 is greater than the length of the video plus some limited margin for network communication, then authentication of the clock should fail. Some embodiments of the present invention may acquire the timestamp at other times, such as during the recording (instead of before the recording of the video starts). In some embodiments of the invention, network latency may be estimated and used (with an optional error factor) as the limited margin.
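  • By way of illustration, the clock sanity check described above might be sketched as follows; the parameter names and the 10-second network margin are assumptions for the example:

    from datetime import datetime, timedelta

    def clock_check(token_timestamp: datetime,
                    notification_received_at: datetime,
                    video_length: timedelta,
                    network_margin: timedelta = timedelta(seconds=10)) -> bool:
        # Fail if more time elapsed between acquiring the timestamp and receiving the
        # end-of-capture notification than the clip length plus a limited margin.
        elapsed = notification_received_at - token_timestamp
        return timedelta(0) <= elapsed <= video_length + network_margin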
  • Verification of authenticity of the video recording time may further comprise embedding the verifiable timestamp acquired into the video data as a watermark, or encoding it into some of the pixels in the video (e.g., a barcode), which further proves the video has only been encoded after the timestamp has been created.
  • verification of the video clip file and metadata file data integrity, i.e., that the received data processed on the servant 256 is bit-for-bit identical to the captured data on the capturing device 100
  • the hash value generated may be digitally signed, using, for example, a digital signature algorithm such as RSA.
  • the signing might be performed on the capturing device 100 itself, using, for example, a private and public key pair distributed over a Public Key Infrastructure (PKI), or on the remote service unit 250 (signing service 254) using a private key, or both.
  • Digitally signing the hash value proves the hash value was not modified since the time of the signature.
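  • By way of illustration, the hash-then-sign integrity check might be sketched as follows; SHA-256 stands in for any of the hash functions mentioned above, and verify_signature is a placeholder for whatever digital signature API (e.g., an RSA library or an HSM) is used:

    import hashlib

    def file_sha256(path):
        # Stream the file through SHA-256 and return the raw digest.
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.digest()

    def clip_is_intact(video_path, metadata_path,
                       signed_video_hash, signed_metadata_hash, verify_signature):
        # Recompute hashes of the uploaded files and check them against the signed hashes.
        return (verify_signature(signed_video_hash, file_sha256(video_path)) and
                verify_signature(signed_metadata_hash, file_sha256(metadata_path)))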
  • Verification of authenticity of a video clip may comprise checking of consistency based on comparison of movement derived from comparison of consecutive frames with movement data recorded from movement sensors while the video was being recorded. If both movements match then the video was indeed recorded on the device with the sensors, but if there is no match, this might indicate a falsification attempt by injecting video (as in attacking the camera mobile app), or a falsification attempt of recording a digital display device (e.g., another computer screen) playing a video.
  • Fig. 3A is a flow chart depicting a method of detecting a tampered video clip based on a movement verification method, according to some embodiments of the present invention.
  • A video clip is received (302) and spatial movements (displacement and rotation, a "transform") of the camera are searched for and identified (304) based on known methods in computer vision. This search and identification may include four stages (for each pair of consecutive frames):
  • the first stage is extracting "features” or "corners” for each of the frames.
  • “Features” and “corners” are terms referring to a pixel or a patch in the frame that are easily trackable (corners of objects depicted in the frame usually are easily tracked).
  • the first stage can be computed, for example, by the SIFT (Scale-Invariant Feature Transform) algorithm, by the Harris Corner Detection algorithm or Shi-Tomasi Corner Detection algorithm (for example, as implemented by OpenCV's goodFeaturesToTrack function), by the Minimum Eigen Value Corner Detection algorithm (for example, as implemented by OpenCV's goodFeaturesToTrack function), the SURF algorithm (Speeded Up Robust Features, for example, as implemented by OpenCV's SURF_create function), the FAST Algorithm for Corner Detection (Features from Accelerated Segment Test, for example, as implemented by OpenCV's FastFeatureDetector_create function), the CenSurE Feature Detector (for example, as implemented by OpenCV's StarDetector_create function), the ORB Feature Detector (Oriented FAST and Rotated BRIEF, for example, as implemented by OpenCV's ORB_create function), and other known computer vision techniques.
  • the result of this stage is a list of detected features (e.g., pixel coordinates) for each frame.
  • the second stage (for each two consecutive frames) is to match the features extracted from both frames.
  • the result of this stage, for each of the two consecutive frames, is a sub-list of the features that were also found in the other frame, and both lists match in order; item number j in frame X's matched features list and item number j in frame X+1's matched features list are the same feature, but not in the exact same location in the frame, as a result of (intended, but mostly inadvertent) movement of the camera in between capturing of the two consecutive frames.
  • This stage can be calculated, for example, by using a brute-force matcher (for example, as implemented by OpenCV's BFMatcher function), a FLANN (Fast Library for Approximate Nearest Neighbors) based matcher (for example, as implemented by OpenCV's FlannBasedMatcher, or the FLANN library itself), or an Optical Flow algorithm, for example the Lucas-Kanade algorithm (as implemented by OpenCV's calcOpticalFlowPyrLK).
  • the first two stages can be conceptually visualized by the example provided in Figs. 3Bi - 3Biv. Fig. 3Bi and Fig. 3Bii show two consecutive frames out of a hypothetical video 350 of a clockwise rotating rectangle 352.
  • the corners (354, 356, and additional two corners on the bottom) of the rectangle 352 are detected in both frames (a list of coordinates for each corner is maintained).
  • In the pairing stage, the corner 354 in the first frame (Fig. 3Biii) is matched to corner 354 in the second frame (Fig. 3Biv). The same applies to corner 356, which is matched between the two frames, and to the rest of the corners.
  • the third stage is estimation of a two-dimensional (2D) or a three-dimensional (3D) transform from the matching feature lists.
  • a result of a 2D transform is a matrix of dimensions 2x3, and of a 3D transform a matrix of dimensions 3x4.
  • These matrices, when multiplied by each feature in frame X (a feature is a 2D vector representing a pixel in frame X), will result in the matching feature list from frame X+1 (also called a camera transform between two consecutive frames), up to an acceptable error that the transform estimation algorithm allows.
  • This stage may be calculated by, for example, one of OpenCV's findHomography, estimateRigidTransform, getAffineTransform, getPerspectiveTransform, or other available algorithms and known computer vision methods (not necessarily from the OpenCV library).
  • the fourth and last stage consists of extracting camera displacement and rotation from the transform obtained in the previous stage.
  • the rotation about each axis can be obtained with the following formulas: for 2D transforms, the image rotation can be computed with atan2(-TransformMatrix[0,1], TransformMatrix[0,0]).
  • the Yaw, Pitch and Roll rotation angles can similarly be extracted from the transform in code (e.g., C++); the feature detection and matching stages can be summarized in pseudocode as:
  • PointsX = SIFT.detectPointsInImage( Frame X )
  • PointsX_1 = SIFT.detectPointsInImage( Frame X + 1 )
  • PointsMatchingInBothFrames = FLANN.match( PointsX, PointsX_1 )
  • Source_Points = PointsX that are also in PointsMatchingInBothFrames
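  • By way of illustration, the four stages might be condensed as follows in Python with OpenCV, using Shi-Tomasi corners, Lucas-Kanade optical flow and a 2D (partial affine) transform; the function choices and parameter values are assumptions, not the only possible implementation:

    import math
    import cv2

    def frame_to_frame_rotation(prev_bgr, curr_bgr):
        prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
        curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)

        # Stage 1: detect trackable corners in the first frame.
        prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                           qualityLevel=0.01, minDistance=10)
        if prev_pts is None:
            return None

        # Stage 2: match them into the next frame with Lucas-Kanade optical flow.
        curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None)
        good = status.ravel() == 1

        # Stage 3: estimate a 2D camera transform between the matched feature lists.
        transform, _ = cv2.estimateAffinePartial2D(prev_pts[good], curr_pts[good])
        if transform is None:
            return None

        # Stage 4: extract the in-plane rotation angle from the 2x3 transform matrix.
        return math.atan2(-transform[0, 1], transform[0, 0])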
  • Another alternative for searching and detecting spatial movement (304 in Fig. 3A) is to estimate camera translation and rotation between each two consecutive frames, by using motion vectors between the two consecutive frames.
  • MPEG, H.264 and other video encoding formats store, for each group of pixels in a frame (a "macroblock"), the relative movement from the previous frame, i.e., how many "pixels" away in each direction the block was located in the previous frame.
  • Reference is made to Fig. 3C, which visualizes what motion vectors are: this image is a frame from a video that shows the motion vectors embedded in the video stream, i.e., where the "same patch of pixels" was located in the previous frame.
  • Another augmentation would be, instead of reading the motion vectors from the encoded video stream, to re-run the "motion estimation" stage of any video encoder to deduce new motion vectors between the frames (the advantage is that a software video encoder can be configured for a larger "search space", whereas hardware video encoders, such as the video encoders embedded in smartphone devices, are limited in resources for finding motion vectors).
  • a translation signal per axis is obtained (a value for each point in time, between each two frames, across the horizontal axis, vertical axis and depth axis) and a rotation angle signal, for a total of 6 signals (or fewer if a 2D transform was used in step three).
  • Signals are also obtained from sensors on the camera device (for example, a smartphone); the processing may include:
  • ° Rotation rate (angular velocity) is recorded by gyroscope sensors.
• Some software APIs contain an Attitude pseudo-sensor, which contains the current angles in terms of Roll, Pitch and Yaw, relative to a reference point.
• The derivative of the Attitude pseudo-sensor signal gives the Rotation Rate signal.
• The signals from the two origins, computer vision and the hardware sensors of the camera device, are compared and graded (block 310) for their "similarity".
• The signals may first pass through a filtering or smoothing algorithm.
• Examples of algorithms that may be used to perform this function include a low-pass filter, a Kalman filter, Kalman smoothing, a weighted-average rolling window (1D convolution kernel), a Bayesian filter, and other known methods.
• There are several known methods to compare two signals, for example, Matlab's corrcoef function (correlation coefficients).
• The grade will receive a value close to 1 for similar signals, and close to 0 for non-similar signals.
• Fig. 3D depicts a real-world example of two such correlating signals, according to some embodiments of the present invention: rotation rate in radians over time (the top plot) as measured from a gyroscope sensor on a smartphone device, and the camera translation in units of "pixel size" over time (the bottom plot) on the same axis (using a 2D transform from OpenCV's estimateRigidTransform). Signals that are similar (graded close to 1) are signals like these; they "look" similar.
• The pairs of signals to compare may be, for example, a predetermined list (for example: the translation extracted from the 2D camera transform best correlates with the rotation rate on the same axis; another example: the translation extracted from the 3D camera transform correlates with the velocity (integrated acceleration) on the same axis).
• Another approach may be pairing the signals by "best match": grade each possible combination of pairs, and take the pairing that scores the highest.
• The final step, in 310, is to combine the similarity between the signals originating from 304 and the signals originating from 308 into a single number (for example, by using a weighted linear grade function).
• The result is a single number between 0 and 1, for which a threshold is defined; video recordings that score beneath this threshold are flagged as possibly falsified. A sketch of this comparison and grading step is given below.
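• As an illustration only (not from the original text), a minimal C++ sketch of this comparison step: smoothing with a weighted rolling window (a 1D convolution kernel), a correlation coefficient as the per-pair similarity grade, and a weighted linear combination of the grades checked against a threshold. The kernel, the equal weights and the 0.5 threshold are assumptions:

    #include <vector>
    #include <cmath>
    #include <numeric>

    // Smooth a signal with a weighted rolling window (a 1D convolution kernel).
    std::vector<double> smooth(const std::vector<double>& s, const std::vector<double>& kernel) {
        std::vector<double> out(s.size(), 0.0);
        int half = static_cast<int>(kernel.size()) / 2;
        for (int i = 0; i < static_cast<int>(s.size()); ++i) {
            double acc = 0.0, wsum = 0.0;
            for (int k = 0; k < static_cast<int>(kernel.size()); ++k) {
                int j = i + k - half;
                if (j < 0 || j >= static_cast<int>(s.size())) continue;
                acc += kernel[k] * s[j];
                wsum += kernel[k];
            }
            out[i] = wsum > 0 ? acc / wsum : s[i];
        }
        return out;
    }

    // Correlation coefficient between two equally long signals (close to 1 means similar).
    double correlation(const std::vector<double>& a, const std::vector<double>& b) {
        double ma = std::accumulate(a.begin(), a.end(), 0.0) / a.size();
        double mb = std::accumulate(b.begin(), b.end(), 0.0) / b.size();
        double num = 0, da = 0, db = 0;
        for (size_t i = 0; i < a.size(); ++i) {
            num += (a[i] - ma) * (b[i] - mb);
            da += (a[i] - ma) * (a[i] - ma);
            db += (b[i] - mb) * (b[i] - mb);
        }
        return (da > 0 && db > 0) ? num / std::sqrt(da * db) : 0.0;
    }

    // Combine the per-pair grades into a single number and flag possible falsification.
    bool possiblyFalsified(const std::vector<double>& grades) {
        if (grades.empty()) return false;
        double combined = 0.0;
        for (double g : grades) combined += g / grades.size();   // equal weights (assumption)
        return combined < 0.5;                                   // illustrative threshold
    }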
• The method may include using an odometer sensor ("step counter") as a signal source and estimating steps from the camera movement (e.g., shaking of the camera) as the compared signal, and/or using a magnetometer orientation sensor together with the camera rotation motion.
  • Another kind of falsification may involve filming a video that is playing on another digital display device, for example - video of another camera screen, another smartphone screen, another tablet screen, a computer screen, a TV set, or even a printed picture captured from another camera.
  • Such falsification can be attempted in order to insert into the system a video from another time and location, by filming the video playing on another device.
• Detection of such a falsified video can be achieved by comparing the movement of the camera and of the video (the previous method), but also with the following method. Reference is now made to Fig. 4A, depicting this method.
  • the magnitude spectrum image 404 may be calculated (as explained above, the term magnitude spectrum image refers to creating a matrix of values, but the computation does not require a specific storage in the form of a matrix, as it may be performed without creating an "image").
• Calculating the magnitude spectrum image for each frame typically includes using a frequency analysis method, for example, a Fourier Transform, a Fast Fourier Transform (FFT), a Discrete Cosine Transform (DCT), etc.
  • an additional step of using only the absolute value of the result may be performed.
  • the scale may be changed to a logarithmic scale.
  • the calculation may be performed on the luminance channel of the image, and in some embodiments the calculation may be performed on color planes or any combination thereof.
• A Matlab code for computing the magnitude spectrum image may look like this:

    I_gray = rgb2gray(I);
    I_FFT = fft2(I_gray);
    FFT_amp = log(1 + abs(fftshift(I_FFT)));  % absolute value, logarithmic scale (reconstructed intermediate step)
    FFT_amp = FFT_amp/max(FFT_amp(:));
  • Fig. 4B shows the resulting magnitude spectrum image of the Matlab function presented above, when computed for a still photograph of a real scene (not of a digital display device like a computer monitor).
  • Fig. 4C shows the resulting magnitude spectrum image of the Matlab function presented above, when computed over a still photograph of a computer screen which displayed the still photograph of Fig. 4B.
• The magnitude spectrum image of false still images (denoted 'FFT_amp' in the code above), e.g. Fig. 4C, exhibits high energy at high frequencies, visible as bright "stains" near the corners of the image, which do not appear in the magnitude spectrum image of a genuine scene such as Fig. 4B.
• Automatic detection of the presence of high energy in these high frequencies in the magnitude spectrum image (406) may be performed with thresholding / local thresholding of the image and then summing up the percentage of white-value pixels at the image corners. Thresholding refers to creating a new image in which all pixels above a certain value are "on" (usually, white) while all others are "off" (black). If needed, the thresholding can be done not only on the intensity level but also on the number of pixels (for example, not taking into account blobs that are smaller than 1-4 pixels). If needed, before the local thresholding, a filter can be used to smooth the "background" in the magnitude spectrum image, for example a median filter. Results of applying such a technique may be seen in Fig. 4D, which depicts an image resulting from applying thresholding (i.e. each pixel is "on" or "off" according to whether it was above a certain percentage threshold in the original image) to the image of Fig. 4C, according to some embodiments of the present invention.
• The stains (or "dots") at the corners are more distinct, as the noise of all the insignificant frequencies in the background is discarded from the computation, increasing the distinction between non-fabricated scenes and fabricated scenes; the stains can then be detected simply by counting white pixels in the corners of the thresholded magnitude spectrum image (natural scenes will have almost zero white pixels there, while fabricated scenes will have many, i.e., tens or hundreds).
  • This method may be applicable even without applying a threshold step.
  • a formula for what qualifies as “the corners” of the image, or a "high frequency” can be defined, for example, as the top 25 percent of frequencies, if sorted by the sum of the absolute horizontal and vertical components (magnitude spectrum images are two dimensional).
  • a visual representation of this process may be imagining an "octagon" mask applied to the magnitude spectrum image.
  • an additional blurring step may be used to "increase” the size of these "stains" (the high frequency artifacts).
  • Fig. 4E shows the result of masking out non high frequencies and applying blurring with the code sample below.
  • a Matlab code for this process may look like this:
    THR = 0.4;  %% or can be an adaptive threshold
    kern = [15 15];  %% median filter kernel size (illustrative value, reconstructed line)
    FFT1_binary = im2bw(medfilt2(FFT1, kern), THR);
    SE = strel('octagon', 300);
    mask1 = getnhood(SE);  %% octagon-shaped neighborhood used as the mask (reconstructed line)
    mask1 = imresize((1-mask1), size(FFT1), 'nearest');
    FFT1_masked = FFT1_binary.*mask1;
  • a natural, original image (that was not tampered with) is expected to return grade 0.
  • An image involving data from another screen will return grades between 0.01 and 0.03, as measured in experiments performed by the inventors of the present invention.
• A second technique for automatically detecting the presence of the "stains" in the magnitude spectrum image (406) is blurring the magnitude spectrum image, for example by using a median filter with a big kernel. These types of filters may be used to discard insignificant noise from the magnitude spectrum image and, similarly to the purpose of the thresholding stage, a simple sum of the pixel values at the high frequencies of the magnitude spectrum image can detect the presence of the said stains.
• Computation of the grade may be performed by computing the percentage of the energy of the filtered magnitude spectrum image that is in the high frequencies (simply put, the sum of white pixels in the corners versus the sum of all white pixels). An illustrative sketch of such grading follows.
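• For illustration only (not from the original text), a C++/OpenCV sketch of such grading: compute the magnitude spectrum of the luminance channel with cv::dft, apply a logarithmic scale, median-filter and threshold it (THR = 0.4 as in the Matlab sample above), and take the fraction of the remaining "on" pixels that fall in the high-frequency corners. The diamond-shaped corner test is a simple stand-in for the octagon mask / top-25-percent rule described above:

    #include <opencv2/opencv.hpp>
    #include <cstdlib>

    // Grade a frame by how much of its thresholded spectral energy sits in the
    // high-frequency "corners" (close to 0 for natural scenes, higher for re-filmed screens).
    double highFrequencyGrade(const cv::Mat& bgrFrame) {
        cv::Mat gray, f;
        cv::cvtColor(bgrFrame, gray, cv::COLOR_BGR2GRAY);      // luminance channel
        gray.convertTo(f, CV_32F);

        cv::Mat spectrum, planes[2], mag;
        cv::dft(f, spectrum, cv::DFT_COMPLEX_OUTPUT);
        cv::split(spectrum, planes);
        cv::magnitude(planes[0], planes[1], mag);              // absolute value
        mag += cv::Scalar::all(1.0);
        cv::log(mag, mag);                                     // logarithmic scale

        // Rearrange quadrants so the zero frequency is at the centre (like fftshift).
        mag = mag(cv::Rect(0, 0, mag.cols & -2, mag.rows & -2));
        int cx = mag.cols / 2, cy = mag.rows / 2;
        cv::Mat q0(mag, cv::Rect(0, 0, cx, cy)), q1(mag, cv::Rect(cx, 0, cx, cy));
        cv::Mat q2(mag, cv::Rect(0, cy, cx, cy)), q3(mag, cv::Rect(cx, cy, cx, cy));
        cv::Mat tmp;
        q0.copyTo(tmp); q3.copyTo(q0); tmp.copyTo(q3);
        q1.copyTo(tmp); q2.copyTo(q1); tmp.copyTo(q2);

        cv::normalize(mag, mag, 0, 1, cv::NORM_MINMAX);        // like FFT_amp above
        cv::Mat filt, bin;
        cv::medianBlur(mag, filt, 5);                          // smooth the background
        cv::threshold(filt, bin, 0.4, 1.0, cv::THRESH_BINARY); // THR = 0.4

        // Fraction of "on" pixels far from the centre on both axes combined
        // (approximating the "top 25 percent of frequencies" corner definition).
        double total = cv::sum(bin)[0], high = 0.0;
        for (int y = 0; y < bin.rows; ++y)
            for (int x = 0; x < bin.cols; ++x)
                if (std::abs(x - cx) + std::abs(y - cy) > 0.75 * (cx + cy))
                    high += bin.at<float>(y, x);
        return total > 0 ? high / total : 0.0;
    }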
  • a third technique for detecting the artifacts in high frequencies 406 may be to use a machine learning classification algorithm.
• The magnitude spectrum image of a frame extracted from a falsified video clip (a video clip of a digital display device playing the original video) will have "artifacts" at frequencies that are low in one dimension, compared with an image captured of a real scene. These artifacts are termed "echoes" in this document. They occur due to temporal aliasing effects between the frequency of the capturing camera and that of the screen playing the video.
  • Fig. 4F is a frame taken from a video of a real scene (not captured off a digital display device).
  • FIG. 4G is a frame taken from a video of a computer monitor displaying the video used for Fig. 4F.
  • Fig. 4H and Fig. 41 are the magnitude spectrum images computed for Fig. 4F and Fig. 4G respectively, according to some embodiments of the present invention.
• The artifacts discussed here, the echoes, can be observed as "thicker" lines along the horizontal and vertical axes (the axes meet in the center of the image). This is a mathematical representation of the camera capturing pixels from two consecutive frames at once, which can be seen in the original image on the passing car, especially around the rear left wheel.
• Automatic detection of such echo artifacts may use the same algorithms as mentioned herein above: either threshold the magnitude spectrum image (with or without filtering), blur the image, or use a machine learning image classification algorithm.
• Fig. 5A depicts a method for detection of falsification by means of graphical digital manipulations made in an original frame or frames. This may provide a second layer of security. Assume that a "falsified" video was injected (i.e., submitted) which was not recorded using a normal capturing device linked with the system, and that this video was previously edited (edited using a video editor, or otherwise digitally manipulated with details added or removed, e.g., by editing the frame and changing pixel values, etc.). According to this method, a dedicated algorithm may help in detecting this situation by flagging videos that are "suspected" of such manipulation. For example, Fig. 5B is a frame of a video that was not manipulated, and Fig. 5C is the same frame after the license plate was digitally manipulated.
• The detecting method comprises calculating an Error Level Analysis (ELA) image (504, 506, 508 and 510) of one or more frames from the video being authenticated, and detecting patches of pixels with an attribute value standing out with respect to the attribute values of the other pixel patches, e.g. a higher energy value in the pixel patch than in the rest of the pixel patches in the ELA image.
  • the first step 504 is creating a video encoder with a state supporting the calculation - a state as similar as possible to the state of the encoder that encoded the original frame.
  • the relevant state variables to restore into the encoder may consist of: the bitrate control parameters (type of algorithm), the previous frames stack, Group Of Pictures (GOP) size, and current quantization factors (initial Qp, Qi and Qb). All these state variables can be recovered from the encoded video stream.
  • a different encoder might be used, such as a JPEG encoder, MPEG2 encoder, etc.
• The JPEG encoder quality parameter must be set to match the lowest quantization factor of all macroblocks in the video frame.
  • the JPEG encoder quality is between 100 for highest, and 1 for lowest.
• The JPEG quality can be derived by subtracting Qmax raised to the power of A and multiplied by B, where Qmax is the highest quantization factor among all the macroblocks in the frame (lower quality), and A and B are parameters measured empirically.
  • the frame is encoded using the encoder created in the first step 504.
  • the resulting bitstream is then decoded again 508, and finally, the resulting compressed and then decompressed image is subtracted from the original frame 510.
  • Subtracting images here refers to calculating the difference in pixel values between the original frame from the video and the compressed-then-decompressed image (not necessarily building a new matrix in memory representing an actual image).
• Fig. 5D and Fig. 5E depict the resulting ELA images for Fig. 5B and Fig. 5C respectively (for print visibility over white paper, Fig. 5D and Fig. 5E are zoomed in on the back of the parked vehicle, turned into grey scale and inverted, since the computation result is colorful over a black background).
• Fig. 5E shows how the manipulation of the license plate is clearly noticeable, while no such result appears in Fig. 5D, which was calculated over a non-manipulated video.
• Automatic detection of the presence of manipulation can be performed by calculating an energy value for each patch of pixels in the image (a patch of pixels being a consecutive block of N by N pixels, where N is, for example, 8, 16, 32, 64, etc.). The energy value here is defined by summing the A-th power of all the pixel values in the pixel patch, where A can be 1, 1.5, 2, etc. If a patch exists whose energy value is above a certain threshold over the average energy value of all patches, then this frame, and therefore the whole video, is suspected of digital manipulation. An illustrative sketch follows.
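• A simplified illustration of this flow (not the encoder-state reconstruction described above): re-compress a frame as JPEG with OpenCV, subtract it from the original to obtain an ELA image, and flag the frame if the energy of one pixel patch stands out. The quality value, patch size N, power A = 2 and outlier factor are assumptions, and the frame is assumed to be an 8-bit BGR image:

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Compute a simple ELA image: compress-then-decompress the frame as JPEG and take the
    // per-pixel absolute difference from the original (steps 506-510, with a JPEG encoder).
    cv::Mat elaImage(const cv::Mat& frame, int jpegQuality = 90 /* assumed quality */) {
        std::vector<uchar> buf;
        std::vector<int> params = { cv::IMWRITE_JPEG_QUALITY, jpegQuality };
        cv::imencode(".jpg", frame, buf, params);                    // encode
        cv::Mat recompressed = cv::imdecode(buf, cv::IMREAD_COLOR);  // decode again
        cv::Mat ela;
        cv::absdiff(frame, recompressed, ela);                       // subtract from the original
        return ela;
    }

    // Flag the frame if one N-by-N patch has an energy value (A = 2, i.e. sum of squared
    // pixel values) far above the average patch energy.
    bool suspectedManipulation(const cv::Mat& ela, int N = 16, double factor = 4.0) {
        std::vector<double> energies;
        for (int y = 0; y + N <= ela.rows; y += N)
            for (int x = 0; x + N <= ela.cols; x += N)
                energies.push_back(cv::norm(ela(cv::Rect(x, y, N, N)), cv::NORM_L2SQR));
        if (energies.empty()) return false;
        double mean = 0;
        for (double e : energies) mean += e;
        mean /= energies.size();
        for (double e : energies) if (e > factor * mean) return true;
        return false;
    }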
• Another kind of video falsification is digital manipulation that combines two or more videos into a single video file. For example, overlaying a different license plate over each frame in the video file (combining a video file with an image file). Another example: filming an actor over a "green screen" in order to cut him out of that video and "insert" him into a video of a background filmed without the actor.
• Digital video cameras, especially of the quality existing in smartphone devices, typically have "noise" in the captured video, i.e., even when directing the camera at a white wall, the numerical value of each pixel may "wiggle" a bit between the frames of the video. The pattern of this noise, i.e. the noise in patches of pixels, differs between different capturing devices (e.g., different smartphones), but also between capturing sessions (different activations of the same capturing device), since it is also affected by the device temperature and by the ambient conditions of the device.
  • the term sessions is used to denote different capturing sessions whether on the same capturing device, or different devices.
• Fig. 6A depicts a method for determining whether a video clip is a product of digital manipulation combining video from different capturing sessions. For each two consecutive frames, the camera transform is identified based on computer vision techniques (604). This block and block 304 are implemented in the same way, as described herein above.
• The first frame is transformed by the calculated camera transform and subtracted from the second frame (block 606); the absolute value of each pixel value is then taken, to produce a delta frame. Combining all delta frames produces a new video, a delta video. (This description is a simple way to explain the calculation; it is not actually necessary to use an actual image, i.e. a matrix of pixel values, for the delta image, nor a list of frames or a video file for the delta video. In some embodiments of the present invention the computation can produce the same result without going through this form of storage.)
• An edge detection algorithm may be used, for example, the Sobel algorithm (such as implemented by OpenCV's Sobel function), the Scharr algorithm (such as implemented by OpenCV's Scharr function), the Canny edge detector (such as implemented by OpenCV's Canny function), etc.
• The edge detection algorithm, applied over the second frame (or the transformed first frame), is used to mask out the delta frame (if the pixel value in the image produced by the edge detection algorithm is above a certain threshold, the corresponding pixel value in the delta image is zeroed).
  • the resulting masked delta image therefore does not contain the noise around edges of objects in the frame, which improves the results of detection in step 610 described below.
  • the delta video is the result of combining the masked delta frames.
• The delta frame is further processed by relating each pixel value to the pixel value in the second frame (or transformed first frame); for example, the pixel value in the processed delta frame is computed by the formula (A + B*p1) / (D + C*p2), where p1 is the pixel value in the delta frame, p2 is the pixel value in the second frame, and A, B, C and D are numerical parameters of the formula adjusted empirically.
  • the delta video is accumulated to a single noise frame 608.
• The accumulated delta frames need not be all the delta frames in the delta video, but may be a partial group, for example, each group of 30 frames, each group of 60 frames, or every Nth frame.
• The accumulation is performed by summing the numerical value of the pixel in each of the said delta frames, by a linear combination of the values from said delta frames, or by using a Noise Level Function (NLF), as described in Kobayashi, Michihiro, Takahiro Okabe, and Yoichi Sato, "Detecting forgery from static-scene video based on inconsistency in noise level functions," IEEE Transactions on Information Forensics and Security 5.4 (2010): 883-892. Reference is now made to Fig. 6B and Fig. 6C.
  • Fig. 6B depicts the noise image created by summing pixel values of the first 120 frames of a video captured using a Google Nexus 5 smartphone (manufactured by LG).
• Fig. 6C is the noise image computed in the same way from a video captured using an LG G3 smartphone, of the same scene, minutes after the video of Fig. 6B was recorded. The two images are clearly distinguishable, exhibiting different amounts of noise.
  • the final step in the computation is to determine from the noise image of a video if the video was created by combining two other videos from different capturing sessions 610.
• Determining falsification may be implemented by relating to an attribute of each pixel patch and performing a statistical analysis of that attribute across all pixel patches. For example, for each pixel patch, calculate the difference between the energy in the pixel patch and the average energy across all pixel patches; if the difference exceeds the standard deviation of the energy across all pixel patches by a given factor, said falsification is determined. An illustrative sketch follows.
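• An illustrative sketch of this computation (not from the original text): the camera transform of block 604 is stood in for here by sparse optical flow plus estimateAffinePartial2D, edges are masked with Sobel gradients, and the per-patch test uses a standard-deviation rule. Frames are assumed to be single-channel 8-bit images; the edge threshold, patch size and factor are assumptions:

    #include <opencv2/opencv.hpp>
    #include <vector>
    #include <cmath>

    // Accumulate masked delta frames of a clip into a single noise image (blocks 604-608).
    cv::Mat noiseImage(const std::vector<cv::Mat>& grayFrames) {
        if (grayFrames.empty()) return cv::Mat();
        cv::Mat acc = cv::Mat::zeros(grayFrames[0].size(), CV_32F);
        for (size_t i = 0; i + 1 < grayFrames.size(); ++i) {
            // Camera transform between consecutive frames (stand-in for block 604).
            std::vector<cv::Point2f> src, dst;
            cv::goodFeaturesToTrack(grayFrames[i], src, 500, 0.01, 5);
            if (src.size() < 8) continue;
            std::vector<uchar> status;
            std::vector<float> err;
            cv::calcOpticalFlowPyrLK(grayFrames[i], grayFrames[i + 1], src, dst, status, err);
            std::vector<cv::Point2f> s2, d2;
            for (size_t k = 0; k < status.size(); ++k)
                if (status[k]) { s2.push_back(src[k]); d2.push_back(dst[k]); }
            if (s2.size() < 3) continue;
            cv::Mat T = cv::estimateAffinePartial2D(s2, d2);
            if (T.empty()) continue;

            // Warp the first frame by the transform and take the absolute difference (606).
            cv::Mat warped, delta;
            cv::warpAffine(grayFrames[i], warped, T, grayFrames[i].size());
            cv::absdiff(warped, grayFrames[i + 1], delta);
            delta.convertTo(delta, CV_32F);

            // Mask out pixels near object edges so that mostly sensor noise remains.
            cv::Mat gx, gy, edges;
            cv::Sobel(grayFrames[i + 1], gx, CV_32F, 1, 0);
            cv::Sobel(grayFrames[i + 1], gy, CV_32F, 0, 1);
            cv::magnitude(gx, gy, edges);
            delta.setTo(0, edges > 30);                      // edge threshold (assumption)

            acc += delta;                                    // accumulate into the noise frame (608)
        }
        return acc;
    }

    // Flag a combined-sessions video if one patch's energy deviates strongly from the rest (610).
    bool combinedSessions(const cv::Mat& noise, int N = 16, double sigmas = 3.0) {
        std::vector<double> e;
        for (int y = 0; y + N <= noise.rows; y += N)
            for (int x = 0; x + N <= noise.cols; x += N)
                e.push_back(cv::norm(noise(cv::Rect(x, y, N, N)), cv::NORM_L2SQR));
        if (e.empty()) return false;
        double mean = 0, var = 0;
        for (double v : e) mean += v;
        mean /= e.size();
        for (double v : e) var += (v - mean) * (v - mean);
        double sd = std::sqrt(var / e.size());
        for (double v : e) if (std::fabs(v - mean) > sigmas * sd) return true;
        return false;
    }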
• Another type of falsification may be an attempt to "inject" falsified location data into an otherwise legitimate recording (the video and time remaining truthful).
• An example of such falsification: running the application on an emulator of the capturing device and using existing emulator features for "injecting" location data (a mobile device emulator is a virtual machine environment that can run inside desktop computers, usually for development and testing purposes).
  • Fig. 7 depicts a method for detecting such a falsified video clip where location data was altered.
  • Such method may include acquiring location data from two independent location sources, 702 704 and finding if the two sources conform or contradict each other 706.
• While the capturing device 100 is recording the video, besides recording the location from the geo-location services (the software API governing acquisition of location data from the operating system) of the smartphone, all visible wireless transmissions (e.g., WiFi access points, cellular towers, etc.) are also recorded and attached as metadata to the video clip.
• The location of the capturing device that captured the video can then be cross-checked against the locations of the origins of these transmissions, as extracted from publicly available databases (for example, the Google geolocation cloud API can be queried with a WiFi access point BSSID ("MAC address") and will return a GPS point for the last location at which this access point was recorded; the database was collected by crowdsourcing from Android devices and by specially purposed vehicles surveying WiFi signals, also known as "war-driving").
  • the device may perform a query against any wireless-signal-to-geolocation database (such as WiFi BSSID-to-geolocation database, or cellular tower ID to geolocation database).
• If the two locations contradict each other, the location data is determined to have been falsified.
• A contradiction of location can be found between any two independent location data sources, such as location data originating from a GPS module, location data originating from the location of cellular signals, location data originating from the location of WiFi signals, location data originating from Bluetooth signals, location data originating from navigation beacons (for example, iBeacon), etc. A sketch of such a cross-check is given below.
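• For illustration only (not from the original text), a sketch of such a cross-check between a reported GPS fix and a recorded WiFi access point: the BSSID is resolved through some wireless-signal-to-geolocation database (the lookup is left as a stub) and the great-circle distance to the fix is compared against a tolerance; the tolerance value and function names are assumptions:

    #include <cmath>
    #include <string>
    #include <optional>

    struct GeoPoint { double latDeg, lonDeg; };

    // Great-circle (haversine) distance in meters between two points.
    double distanceMeters(const GeoPoint& a, const GeoPoint& b) {
        const double R = 6371000.0, d2r = 3.14159265358979323846 / 180.0;
        double dlat = (b.latDeg - a.latDeg) * d2r, dlon = (b.lonDeg - a.lonDeg) * d2r;
        double h = std::sin(dlat / 2) * std::sin(dlat / 2) +
                   std::cos(a.latDeg * d2r) * std::cos(b.latDeg * d2r) *
                   std::sin(dlon / 2) * std::sin(dlon / 2);
        return 2 * R * std::asin(std::sqrt(h));
    }

    // Stub for a wireless-signal-to-geolocation lookup (e.g., a BSSID-to-location database
    // or a cloud geolocation API); std::nullopt means the access point is unknown.
    std::optional<GeoPoint> lookupBssid(const std::string& /*bssid*/) {
        return std::nullopt;
    }

    // The reported GPS fix contradicts an observed WiFi signal if the resolved access-point
    // location is implausibly far from the fix (WiFi range is at most a few hundred meters).
    bool locationContradicts(const GeoPoint& gpsFix, const std::string& bssid,
                             double toleranceMeters = 2000.0) {
        std::optional<GeoPoint> ap = lookupBssid(bssid);
        if (!ap) return false;               // unknown access point: no evidence either way
        return distanceMeters(gpsFix, *ap) > toleranceMeters;
    }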
  • a second kind of independent data location source can be altitude queried from a trusted map service (such as Google Maps API), for a specific latitude and longitude.
  • a location data source that provides latitude, longitude and altitude data (such as a GPS module location data source), can contradict the altitude acquired by querying the trusted map service with the longitude and latitude acquired.
  • the two location data sources are independent only for the altitude data, longitude and latitude originating from a single location data source. This may indicate an attacker attempting manipulation of location data without properly manipulating all the fields in the location data (such as the altitude).
• A third kind of independent location data source can be a trusted method to compute or query the list of satellites that may have been visible at the acquired time and location (e.g., if a GPS satellite is over Brazil at a certain time, while the location data indicates a location in Turkey, on the other side of the planet, and the location data's list of satellites contains this satellite at this time, this is a contradiction).
  • the two location data sources are independent in the list of satellites, but dependent on a single source for longitude and latitude.
• The satellites are not geostationary, i.e., they are moving around the earth, changing their relative positions.
• The $GPGSA and $GPGSV NMEA records contain a list of satellites, which should be a subset of the possible list of satellites (the satellites in the hemisphere centered at the acquired longitude and latitude).
  • the position of each satellite for a specific time can be computed from the method described in the following publications: J. Sanz Subirana, J.M. Juan Zornoza and M. Hernandez-Pajares, "GPS and Galileo Satellite Coordinates Computation", Technical University of Catalonia, Spain, 2011 ; Global Positioning System Standard Positioning Service Signal Specification, 2nd Edition, 1995.
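• As an illustration only (not from the cited publications), a sketch of the visibility test once a satellite's earth-centred position has been computed for the reported time: under a spherical-earth approximation, a satellite listed in the NMEA records but computed to lie below the local horizon at the reported position indicates a contradiction; the elevation margin is an assumption:

    #include <cmath>

    struct Ecef { double x, y, z; };          // earth-centred, earth-fixed coordinates, meters

    const double kPi = 3.14159265358979323846;

    // Geodetic position to ECEF under a simple spherical-earth approximation.
    Ecef toEcef(double latDeg, double lonDeg, double altM) {
        const double R = 6371000.0, d2r = kPi / 180.0;
        double lat = latDeg * d2r, lon = lonDeg * d2r, r = R + altM;
        return { r * std::cos(lat) * std::cos(lon),
                 r * std::cos(lat) * std::sin(lon),
                 r * std::sin(lat) };
    }

    // Elevation (degrees) of the satellite above the local horizon at the receiver.
    double elevationDeg(const Ecef& rx, const Ecef& sat) {
        double dx = sat.x - rx.x, dy = sat.y - rx.y, dz = sat.z - rx.z;
        double dn = std::sqrt(dx * dx + dy * dy + dz * dz);
        double rn = std::sqrt(rx.x * rx.x + rx.y * rx.y + rx.z * rx.z);
        double sinEl = (dx * rx.x + dy * rx.y + dz * rx.z) / (dn * rn);
        return std::asin(sinEl) * 180.0 / kPi;
    }

    // A satellite listed in the NMEA records but computed to be well below the horizon at
    // the reported position and time indicates a contradiction.
    bool satelliteContradictsLocation(const Ecef& receiver, const Ecef& satellite) {
        return elevationDeg(receiver, satellite) < -5.0;   // margin for approximation error
    }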
• Other positioning systems may be used instead of GPS, for example, GLONASS.

Abstract

A system for secured capturing and authenticating of video clips may include: one or a plurality of capturing devices, each of the capturing devices comprising a camera and a first processor, and configured to: acquire a time stamp; capture a video clip of a scene and attach to the captured video clip metadata that includes the time stamp; transmit a notification that the capturing of the video clip has ended; upload the captured video clip with the attached metadata to a remote service unit. The system may also include a remote service unit comprising a second processor and storage device, configured to: receive the notification; receive an uploaded video clip; authenticate the received video clip by: verifying that the received video clip is unedited with respect to the captured video clip; and verifying that a time stamp in a metadata attached to the received video clip is identical to the acquired time stamp and that a time interval between the time indicated by the attached time stamp and a time at which the notification was received is equal to a length of the video clip and a limited time margin.

Description

SYSTEM AND METHOD FOR SECURED CAPTURING AND
AUTHENTICATING OF VIDEO CLIPS
BACKGROUND OF THE INVENTION
[001] With the substantial increase in the availability and quality of video capturing devices many individuals all over the world capture video clips on a daily basis. At times civilians (i.e. not law enforcement agents) may find themselves at a scene of a crime or other infraction (a law violation) and witness it as it happens. If documented in a video clip by a video camera (such as a video camera of a smartphone), such video clip may be used in law-enforcement proceedings against the perpetrator provided its authenticity is determined and legally admissible, i.e. if it is possible to confirm to a very high standard of proof that the video clip has not been tampered with (edited, modified, or otherwise changed). Tampering with the video clip may be motivated by any of the following:
• A person who may be adversely affected if the video clip is presented to law- enforcement authorities may be motivated to tamper with the video clip wishing to disprove the video clip as evidence, in order to be acquitted of the offense and escape punishment.
• A computer hacker seeking to discredit a system responsible for authenticating such video clips, by demonstrating its vulnerability, and as a result, precluding the admissibility of video clips in legal proceedings. For example, such an attack might be a published video rant showing the hacker was charged with a law violation, e.g. speeding, in reliance on a video clip taken at the time his car was actually parked in the garage (an alibi).
[002] There is a need for a system and method for protecting an original video clip from being tampered with or hacked and, if such a video clip was changed, a system and method that would identify and flag such change or changes. Such a solution should preferably be aligned with law enforcement standards of credibility.
SUMMARY OF THE INVENTION
[003] There is thus provided, in accordance with some embodiments of the invention a system for secured capturing and authenticating of video clips. The system may include one or a plurality of capturing devices, each of the capturing devices comprising a camera and a first processor, and configured to acquire a time stamp; capture a video clip of a scene and attach to the captured video clip metadata that includes the time stamp; transmit a notification that the capturing of the video clip has ended; upload the captured video clip with the attached metadata to a remote service unit.
[004] The system may also include and a remote service unit comprising a second processor and storage device, configured to: receive the notification; receive an uploaded video clip; authenticate the received video clip by: verifying that the received video clip is unedited with respect to the captured video clip; and verifying that a time stamp in a metadata attached to the received video clip is identical to the acquired time stamp and that a time interval between the time indicated by the attached time stamp and a time at which the notification was received is equal to a length of the video clip and a limited time margin.
[005] In some embodiments each of said one or a plurality of the capturing devices is configured to calculate a hash or a digital signature for the captured video clip or the attached metadata, and wherein the remote service unit is configured to verify the hash or the digital signature.
[006] In some embodiments, the time stamp is based on a trusted clock source selected from the group consisting of a clock of the remote service unit, a clock of a trusted third party server, a trusted hardware module with clock capability.
[007] In some embodiments the remote service unit is further configured to identify a falsified video clip, by computing camera movement of the capturing device using a computer vision technique applied on consecutive frames in the received video clip; and comparing physical motion that was detected by one or a plurality of motion sensors of a capturing device that captured the captured video clip while the capturing device was capturing that video clip with the computed camera movement; and if the recorded physical motion and the computed camera movement do not correlate determine that the video clip is falsified.
[008] In some embodiments the remote service unit is further configured to determine that a scene recorded in the received video clip was captured from a digital display device by: performing frequency analysis on one or a plurality of frames of the received video clip, and detecting artifacts in said one or a plurality of frames that indicate that the scene recorded in said one or a plurality of frames was captured from a digital display device.
[009] In some embodiments the artifacts are identified as an energy level above a threshold for frequencies that are greater than a threshold in a vertical axis and greater than a threshold in a horizontal axis of a frequency analysis of said one or plurality of frames.
[0010] In some embodiments, the artifacts are identified as echoes across a vertical axis or across a horizontal axis of a frequency analysis of said one or plurality of frames.
[0011] In some embodiments, the remote service unit is further configured to determine that a video clip was digitally manipulated, by: compressing and decompressing one or a plurality of frame of the received video clip; subtracting each of said one or a plurality of frames from a corresponding frame of said compressed and decompressed one or a plurality of frames to obtain an error level analysis (ELA) image; and determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the ELA image is above a predetermined threshold.
[0012] In some embodiments the remote service unit is further configured to determine whether a video clip is a product of digital manipulation combining video from different capturing sessions, by: computing camera movement using a computer vision technique applied on consecutive frames in the received video clip; for each pair of a first and second consecutive frames of the received video clip, applying a perspective transform on the first frame based on the computed camera movement for that pair of frames to produce a transformed first frame, and subtracting the second frame from the transformed first frame, thereby producing a second video clip; accumulating frame data in the second video clip to produce a noise image; and determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the noise image deviates from a value of the same attribute of other patches of the noise image above a predetermined threshold.
[0013] In some embodiments the remote service unit is further configured to authenticate location data attached as metadata to the received video clip relating to a capturing device of said one or a plurality of capturing devices, by comparing location data for that device from two independent location data sources.
[0014] There is also provided according to some embodiments of the invention a service unit of a system for authenticating of video clips, the system including one or a plurality of capturing devices, each of the capturing devices configured to acquire a time stamp; capture a video clip of a scene and attach to the captured video clip metadata including the time stamp, transmit a notification that the capturing of the video clip has ended, and upload the captured video clip with the attached metadata to a remote service unit; the service unit comprising a second processor and storage device and configured to: receive the notification; receive an uploaded video clip; authenticate the received video clip by: verifying that the received video clip is unedited with respect to the captured video clip; and verifying that a time stamp in a metadata attached to the received video clip is identical to the acquired time stamp and that a time interval between the time indicated by the attached time stamp and a time at which the notification was received is equal to a length of the video clip and a limited time margin.
[0015] There is further provided, according to some embodiments of the invention a method for secured capturing and authenticating of video clips, for use in cooperation with one or a plurality of capturing devices, each of the capturing devices configured to acquire a time stamp; capture a video clip of a scene and attach to the captured video clip metadata including the time stamp, transmit a notification that the capturing of the video clip has ended; upload the captured video clip with the attached metadata to a remote service unit. The method may include : receiving the notification; receiving an uploaded video clip; authenticating the received video clip by: verifying that the received video clip is unedited with respect to the captured video clip; and verifying that a time stamp in a metadata attached to the received video clip is identical to the acquired time stamp and that a time interval between the time indicated by the attached time stamp and a time at which the notification was received is equal to a length of the video clip and a limited time margin.
[0016] There is also provided, in accordance with some embodiments of the invention an authentication method for identifying a falsified video clip. The method may include: recording physical motion detected by one or a plurality of motion sensors of a capturing device while the capturing device is capturing the video clip; computing camera movement of the capturing device using a computer vision technique applied on consecutive frames in the video clip; and comparing the recorded physical motion with the computed camera movement; and if the recorded physical motion and the computed camera movement do not correlate determining that the video clip is falsified.
[0017] There is also provided, according to some embodiments of the invention an authentication method for determining that a scene recorded in an image that was captured from a digital display device. The method may include: performing frequency analysis on the image, and detecting artifacts in the frequency analysis that indicate that the scene recorded in the image is a scene of a digital display device.
[0018] There is also provided, in accordance with some embodiments of the invention an authentication method for determining that a video clip was digitally manipulated. The method may include: for each frame of the video clip, compressing and decompressing one or a plurality of frame of the received video clip; subtracting each of said one or a plurality of frames from a corresponding frame of said compressed and decompressed one or a plurality of frames to obtain an error level analysis (ELA) image; and determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the ELA image is above a predetermined threshold.
[0019] There is further provided, according to some embodiments of the invention an authentication method for determining whether a video clip is a product of digital manipulation combining video from different capturing sessions. The method may include: computing camera movement using a computer vision technique applied on consecutive frames in the video clip; for each pair of a first and second consecutive frames of the video clip, applying a perspective transform on the first frame based on the computed camera movement for that pair of frames to produce a transformed first frame, and subtracting the second frame from the transformed first frame, thereby producing a second video clip; accumulating frame data in the second video clip to produce a noise image; and determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the noise image deviates from a value of the same attribute of other patches of the noise image above a predetermined threshold.
[0020] There is also provided, in accordance with some embodiments of the invention a method for authenticating location data relating to a device. The method may include: comparing location data for that device from two independent location data sources.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:
[0022] Fig. 1 is a block diagram of a device for secured capturing of video clips according to embodiments of the present invention;
[0023] Fig. 2 A is a block diagram of a system for secured capturing and authenticating of video clips according to embodiments of the present invention;
[0024] Fig. 2B is a block diagram of functionalities of a system for secured capturing and authenticating of video clips according to embodiments of the present invention;
[0025] Fig. 2C is a time-line graph depicting the flow of data and operations between the various functionalities of the system for secured capturing and authenticating of video clips, according to embodiments of the present invention;
[0026] Fig. 3 A is a flow chart depicting a method for detecting tampered video clip based on movement detection and measurement, according to some embodiments of the present invention.
[0027] Figs. 3Bi-3Biv illustrate two consecutive frames out of a hypothetical video where, due to camera motion, a rectangular item in one of the frames appears rotated in the other frame. The diagram visualizes the principle of operation of an algorithm for estimating camera movement from video frames (camera transform), according to some embodiments of the present invention. Fig. 3Bi shows the first frame; Fig. 3Bii shows the next consecutive frame; Fig. 3Biii points to specific features (corners) that are identified in the first frame; and Fig. 3Biv shows these features identified on the rotated item, thus allowing computing the transform of the rectangular item between the first frame and the second frame.
[0028] Fig. 3C depicts a frame from a video encoded with an H.264 encoder, with the encoded motion vectors drawn as arrows over the frame.
[0029] Fig. 3D depicts a real-world example of two correlating signals, according to some embodiments of the present invention: the top signal is the y axis rotation rate in radians as measured by an accelerometer sensor on the capturing device over time, and the bottom signal is the two-dimensional (2D) transform in pixel units between each two consecutive frames of a recorded video.
[0030] Fig. 4A is a flow chart depicting a method for detecting whether a scene captured in a video clip was captured from a digital display device (such as a computer screen or TV set), according to some embodiments of the present invention.
[0031] Fig. 4B is the magnitude spectrum image computed for a still photograph taken of a real scene (not captured from a digital display device)
[0032] Fig. 4C is the magnitude spectrum image computed for a still photograph of a scene similar to the scene of the photograph whose magnitude spectrum image is shown in Fig. 4B, but captured from a digital display device, that displayed a digital image of that scene.
[0033] Fig. 4D is the magnitude spectrum image of Fig. 4C, after applying a thresholding computation step and an octagon mask, according to some embodiments of the present invention.
[0034] Fig. 4E is the image of Fig. 4D after applying a blurring step.
[0035] Fig. 4F is a frame taken from a video of a real scene.
[0036] Fig. 4G is a frame taken from a video capturing a computer screen (a digital display device) displaying the video of Fig. 4F.
[0037] Fig. 4H is the result of computing magnitude spectrum for the frame of
Fig. 4F.
[0038] Fig. 41 is the result of computing magnitude spectrum for the frame of
Fig. 4G.
[0039] Fig. 5 A is a flow chart depicting a method for detecting whether a video clip was digitally manipulated, by compressing and decompressing each frame, according to some embodiments of the present invention.
[0040] Fig. 5B is a frame of a real video that was not digitally manipulated.
[0041] Fig. 5C is a frame of a video that was created from the video of Fig. 5B, but was digitally manipulated to replace the license plate number with another license plate number.
[0042] Fig. 5D is an Error Level Analysis (ELA) image calculated for the frame in Fig. 5B, zoomed in on the back of the vehicle depicted in that frame, according to some embodiments of the present invention.
[0043] Fig. 5E is the Error Level Analysis (ELA) image calculated for the frame in Fig. 5C, zoomed in on the back of the vehicle depicted in that frame.
[0044] Fig. 6A is a flow chart depicting a method for detecting whether a video clip is a product of digital manipulation combining video from different capturing sessions, according to some embodiments of the present invention.
[0045] Fig. 6B is a noise image computed from a video captured on a Google
Nexus 5 smartphone camera.
[0046] Fig. 6C is a noise image computed from a video of the same scene of Fig.
6B, but captured using a different smartphone camera, a LG G3 device.
[0047] Fig. 7 is a flow chart depicting a method for detecting whether location data relating to a device was manipulated, according to some embodiments of the present invention.
[0048] It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
[0049] In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the present invention.
[0050] Although embodiments of the invention are not limited in this regard, discussions utilizing terms such as, for example, "processing," "computing," "calculating," "determining," "establishing", "analyzing", "checking", or the like, may refer to operation(s) and/or process(es) of a computer, a computing platform, a computing system, or other electronic computing device, that manipulates and/or transforms data represented as physical (e.g., electronic) quantities within the computer's registers and/or memories into other data similarly represented as physical quantities within the computer's registers and/or memories or other information non-transitory storage medium that may store instructions to perform operations and/or processes. Although embodiments of the invention are not limited in this regard, the terms "plurality" and "a plurality" as used herein may include, for example, "multiple" or "two or more". The terms "plurality" or "a plurality" may be used throughout the specification to describe two or more components, devices, elements, units, parameters, or the like. The term set when used herein may include one or more items. Unless explicitly stated, the method embodiments described herein are not constrained to a particular order or sequence. Additionally, some of the described method embodiments or elements thereof can occur or be performed simultaneously, at the same point in time, or concurrently.
[0051] A memory used in embodiments of the present invention may be, or may include, for example, a Random Access Memory (RAM), a read only memory (ROM), a Dynamic RAM (DRAM), a Synchronous DRAM (SD-RAM), a double data rate (DDR) memory chip, a Flash memory, a volatile memory, a non-volatile memory, a cache memory, a buffer, a short term memory unit, a long term memory unit, or other suitable memory units or storage units. Such memory may be, or may include a plurality of, possibly different memory units.
[0052] A processor, or processors used in embodiments of the present invention may be, for example, a central processing unit processor (CPU), a controller, an on-chip computing device, or any suitable computing or computational device configured to execute programs according to embodiments of the present invention.
[0053] As used herein, an operating system may be or may include any code segment designed and/or configured to perform tasks involving coordination, scheduling, arbitration, supervising, controlling or otherwise managing operation of a processor according to embodiments of the present invention. The operating system may be a commercial operating system and/or a proprietary operating system.
[0054] As used herein, the term "storage unit" may refer to any apparatus, device, system and/or array of devices that is configured to store data, for example, video recordings. The storage unit may include a mass storage device, for example Secure Digital (SD) cards, an optical storage device such as a CD, a DVD, or a laser disk; a magnetic storage device such as a tape, a hard disk, Redundant Array of Independent Disks (RAID), Direct-Attached Storage (DAS), Storage Area Network (SAN), a Network Attached Storage (NAS), cloud storage or the like. Each of the storage units may include the ability to write data to the storage and read the data from the storage unit for further use, e.g., video files may be read from the storage unit, upon a request, for example, when an investigation of an incident is required.
[0055] As used herein the term "organization" may refer to any organization, company, government institution, either profitable or non-profitable organization that may use a plurality of cameras (e.g., surveillance cameras) to monitor and record activities in a defined area which is under the supervision or influence of the organization. For example, the organization may be airports, police stations, jails, casinos, banks, department stores, companies, industrial facilities or hospitals.
[0056] "Recording", "capturing", "acquiring", etc., when used in the preset specification with respect to a camera, capturing device and the like, are interchangeable terms.
[0057] Some embodiments of the present invention contain computations on numerical values of pixels in video frames and still images. These computations are described in terms of performing them image- wise, but in essence the creation of an actual new image, i.e., a matrix of pixel values per channel, might not be necessary. For example, the result of subtraction of one image from a second image, might be referred to as a third image - a matrix containing the numerical value of the subtraction per pixel per channel. Yet, the computation may be performed without ever actually generating and storing such an image (i.e., matrix) but rather handling them in different format, or even not storing the result of the computation at all. For visualization of the computation stages, figures were added that depict the resulting images for such computation (e.g., subtraction), but adding those figures was intended to clarify the computation and visualize it, and is not the only method to complete these computations.
[0058] Some embodiments of the invention may be related to recording, in real time, of video data captured by cameras, for example, cameras of a surveillance system. As used herein, capturing, streaming, receiving, storing or performing any other action on a video stream in real time relates to performing these actions on the video stream at substantially the same rate in which the video stream is captured. Thus, receiving of incoming packets of video stream that was captured over a certain time period may last substantially no more than that time period. In some other embodiments of the invention, the video recording is carried out off-line mode, or in near real-time mode. In some other embodiments of the invention, a capturing device may record continuously (e.g., a dashboard camera), and when prompted, send a video clip lasting a predetermined period of time of the last footage obtained (e.g., last 1, 2 or 3 minutes, etc.).
[0059] The video data may be encoded and decoded (i.e., compressed and decompressed) according to any known or proprietary video compression standard such as H.264/MPEG-4, H.265, H.263, MPEG2, WebM, VP8/9, Ogg Vorbis, etc., using any known video codec, for example, H.264/MPEG-4 AVC, Microsoft On2, H.265 or H.263 codecs, WebM, VP8/9, Ogg Vorbis, Motion JPG, etc. The video length might also be of a single frame video (as some embodiments of the present invention do not require more than one frame for authentication), in which case the video might actually refer to a still image, which may be encoded and decoded (i.e., compressed and decompressed) according to any known or proprietary image compression standard such as JPEG, PNG, BMP, TIFF, etc.
[0060] Embodiments of the present invention allow capturing and collecting video clips in a secured manner so that the video clip is protected from changes / modifications / tampering with after its original recording. In essence, systems and methods according to some embodiments of the present invention seek to verify that a video clip that was received by a remote service unit only includes an unedited complete continuous image sequence depicting a real scene, as it was originally captured by a capturing device, and was not manipulated (e.g., by adding or removing details, or otherwise digitally modifying pixels or frames of the video clip or metadata attached to these clips).
[0061] The systems and methods according to some embodiments of the invention are designed to make sure that the received video clip is an identical copy (bit-to-bit) of a video clip that was captured by the capturing device.
[0062] The systems and methods according to some embodiments of the invention are designed to make sure that the entire scene captured in the captured video clip is a real scene and not a scene captured from a picture of a previously captured scene, and that the alleged time of recording of the received video clip is the actual time the scene was recorded on the captured video.
[0063] This novel feature may provide for admissibility of the video clip as an evidence in legal proceedings, as well as for additional purposes. For example, a user of an application according to embodiments of the present invention, that is executed, for example, on a user's smartphone (an "app" or "mobile app"), may record a video clip documenting, for example, a violation of law (a misdemeanor, felony, or crime). The video clip may be saved, for example stored in a cloud based storage, and then may be authenticated. The authentication process may include one or a plurality of tests such as sanity check of the time stamp of the video file as well as more complicated tests such as a process that crosschecks several inputs, such as readings from accelerometers, collected all throughout the video recording, on the user's recording device. After recording of the video clip, the clip may be made available for inspection by a remote inspecting agent (e.g., remote application), such as a Backend web application. The remote application may be operated and monitored by specialists, such as an expert staff of a service provider according to embodiments of the present invention and officials of a respective municipality/police or other law enforcement authority, which can decide whether the video clip actually depicts a violation of law that requires enforcement / intervention / penalization. When needed, a copy of the evidential video file may be prepared for future use in legal proceedings, for example along with the required authenticity proofs.
[0064] Falsification, as used throughout the description of embodiments of the invention, relates to changes that are made in a video clip / video file after it was taken and recorded. Such changes may be made in order to change the visual appearance of certain elements in the video clip, and changes that are made in order to change / delete / add data related to the video clip such as data included in the video metadata.
[0065] Falsification of a video clip may take place in various forms and ways, such as editing the video (removing, adding, rearranging and stretching segments of the video); digital image manipulation (for example, using video or photographic editing tools, like "Adobe Photoshop", "Adobe Premiere", etc.), i.e. changing the video such that new details are added, removing details from the original video, or, finally, changing proportions or distances. An additional type of falsification may include, for example, keeping the graphical features of the video clip intact but using a video feed that does not represent a real event, either by appending incorrect time and location data to the video feed, or, in the case of a video feed that was actually recorded at the "correct" time and place, by capturing the video from a digital display device, such as a screen of another device (so that the time, location, and content purportedly indicate that an event was actually filmed there, but the video still does not represent a genuine event that took place in reality at that time and at that location).
[0066] System and method according to some embodiments of the present invention are configured to protect from various threats.
[0067] A first type of threat relates to defects in the computing environment (i.e., "bugs") that can allow unauthorized insertion of incorrect data. If the authenticity of an event of interest is based on the authenticity of video frames, but it turns out that the time of the event, as recorded with the video frames, was incorrect, it does not matter that the origin of the problem is an accidental defect rather than an intentional attack. Similarly, if a non-original video clip is used for proving authenticity of an event of interest, or non-matching graphical data, such as a wrong license plate recorded in the video or an incorrect address recorded in the data associated with the video, and so on, these would also have the same effect. This may be dealt with by conducting rigorous testing of the system and adhering to best practices regarding data storage. Digitally signing the data itself may also be used in order to create an authentic chain of custody. Special attention needs to be directed to the time stamp issue, a notoriously complex and bug-prone issue for computer systems: each server can have its own time-zone, time drift and daylight savings time notation.
[0068] A second threat may be tampering with the system, which may be intentional or accidental. Most tampering events are intercepted at the verification stage, as is explained in detail herein below. Tampering with the system may include:
• Changing the clock of the mobile device.
• Filming another device screen, when it is playing a recording of a
violation (disguised as a "live" event).
• Using a software GPS emulator as a location source
• Using a software camera emulator as a video source
• Using a software sensor emulator as a sensor data source
• Using an Android/iPhone emulator to provide incorrect location
• Using an Android/iPhone emulator to provide incorrect location
• Using an Android/iPhone emulator to provide false video stream
• Using an Android/iPhone emulator to provide false sensor data
• Trying to fiddle with the stored state data on the mobile device
• Trying to replace the recorded video file without the app detecting it
• Filming a video that is not a violation (e.g. a video of a cat)
[0069] An additional possible threat may be an experienced hacker attempting to reverse engineer the mobile app. Static reverse engineering includes reading the "disassembled" code of the communication protocol module used by the app, or of the actual app. Dynamic reverse engineering may include using a debugger to run the app and attempting to change in-memory data or code. Finally, the attacker may focus his efforts on the API between the video upload functionality 102 and the communication handling functionality 104.
[0070] An additional threat vector stems from the fact that each of the online services of the system can be the target of a potential attack and, if access is gained, can be used to inject false data, modify existing data, or erase data.
[0071] According to some embodiments of the present invention, a method and system for securely capturing and authenticating video clips may incorporate three layers that greatly increase the credibility of authentication. These layers are: a hacker-proofing layer, a digital signature layer, and an algorithmic computer vision and signal processing layer. The hacker-proofing layer may include tamper-proofing the video capturing device, anti-debugging and anti-reverse engineering techniques applied to the mobile phone application, and network security (e.g., firewalls and strong passwords) of the remote service unit. The digital signature layer may include hashing and signing of captured data, which may include the video file, parts of the video file, and some or all attached metadata records (e.g., location data). A digital signature guarantees that the data was not changed, even in a single bit, after the time of performing the hash / signature, and is admissible in courts around the world as such proof. Therefore, generally, the earlier the hash / signature is calculated, the better. The third layer, of computer vision and signal processing algorithms, may provide confirmation that the data that was digitally signed was obtained from a genuine source (e.g., that a video is not a capture of another digital display device, such as a computer monitor or a TV set).
[0072] Definitions of Terms and Acronyms: the following terms and / or acronyms will have, throughout the entire description herein below, the meaning appearing next to them.
Hash Function - A cryptographic hash function, such as SHA1, SHA256, SHA512, MD5, etc., which is used for data integrity verification.
Digital Signature Algorithm - A cryptographic algorithm such as RSA, ECDSA, etc.
OpenCV - Open Source Computer Vision Library. An open source computer vision and machine learning software library.
Pixel Patch - A continuous group of adjacent pixels from an image or a frame. For example, 8 by 8 group of pixels, 16 by 16 pixels, etc.
Attribute of a Pixel Patch - The result of a calculation performed on the numerical values of the pixels in the pixel patch, producing a single number. With this single number other calculations can be performed, such as comparison with other pixel patches. The numerical value for a pixel may include one or a plurality of numbers, one per channel (e.g., color channel) in the image. The calculation can be, for example, an energy function (the sum of a power of the pixel numerical value per pixel, e.g., squared, i.e., power of 2). The attribute is generally a single-dimensional number, but may be a number of several dimensions, e.g., a complex number.
Capturing Session - A single activation event of a capturing device. For example, different capturing sessions may refer to two different capturing activations on the same capturing device, or two different devices.
SIFT - Scale-Invariant Feature Transform. A known algorithm for extracting keypoints and computing descriptors.
FLANN - Fast Library for Approximate Nearest Neighbors. A software library that contains a collection of algorithms optimized for fast nearest neighbor search in large datasets and for high-dimensional features.
RANSAC - Random Sample Consensus. An iterative method to estimate parameters of a mathematical model.
Kalman filter - An algorithm that operates on streams of noisy input data to produce a statistically optimal estimate of the underlying system state.
FFT - Fast Fourier Transform. A known algorithm that converts time (or space) to frequency (or wavenumber) and vice versa.
ELA - Error Level Analysis. A forensic method for identifying areas in an image that are at different compression levels.
BSSID - Basic Service Set Identifier of a WiFi access point. It is the MAC address of the access point.
NMEA - A data protocol for communication with GPS receivers, as defined by the National Marine Electronics Association. The protocol is comprised of individual "sentences" of data.
[0073] Reference is made now to Fig. 1, which is a block diagram of device 100 for secured capturing, according to some embodiments of the present invention. Device 100 may include camera 120 configured to capture still and / or video images and to store them as a file or files on storage device 130 under the management of controller 110. Device 100 may further include internal movement / spatial location / orientation sensors 124, such as Micro Electro Mechanical Systems (MEMS) accelerometers, a gyroscope, a compass, a pedometer or other sensors, or external location / orientation sensors (not shown), which are configured to provide data indicative of the orientation of device 100 (e.g., with respect to an external reference frame, such as the horizon). Device 100 may further include internal time stamp source 126 configured to provide time stamps to images taken by camera 120. Controller 110 may be configured to manage images taken by camera 120, e.g. save such images along with respective information provided by orientation sensors 124 and by time stamp source 126 in memory 130, for example in one of the known graphical formats applicable to the captured still or video images. Controller 110 may further be in operative communication with input/output (I/O) unit 140 configured to enable wired / wireless communication of device 100 with other units over one or more known communication networks or channels such as the Internet, an intranet, wireless networks (cellular networks, WiFi based communication channels), wired communication buses, and the like, as is known in the art. Device 100 may be, for example, a smartphone on which an appropriate application was installed, which uses the built-in features and functionalities of the smartphone, a digital camera with communication capabilities, or another specifically designed device.
[0074] Reference is made now also to Fig. 2A which is a block diagram of system 200 for securely capturing and authenticating video clips, according to embodiments of the present invention. System 200 may include one or more devices 100 for securely capturing video clips, such as device 100 of Fig. 1, in operative communication with remote service unit 250. Remote service unit 250 is configured to perform the role of managing the lifecycle of the evidence after its recording - such as, for example, storage, verification, archiving, user browsing interface and export to other systems.
[0075] Authentication of video clips taken by a user using a smartphone or other personal device, such as a camera, stills camera, video camera, android camera, dashboard camera, GPS-capable device, stationary camera and the like, may be performed in several steps, some of which may be performed on the user's own personal smart device and others on a remote unit, such as remote service unit 250 of Fig. 2A.
[0076] Reference is made now to Fig. 2B which is a block diagram of functionalities of system 200, according to some embodiments of the present invention. System 200 may comprise one or more devices 100, such as device 100 of Fig. 1 and remote service unit 250 in operative communication between them. Functionalities described herein below with respect to device 100 and remote service unit 250 may be implemented as computer programs executed on computing devices, as hardware configured to perform the respective functionality or a combination of hardware and software, as is known in the art.
[0077] Device 100 may include video upload module 102, configured to read video files stored on a local storage and communicate them to remote service unit 250 for storing on storage unit 252. Video upload module 102 may be implemented as software, written in any suitable language or development tool that conforms to the operating system (OS) of the respective smartphone, such as Java on Android and Objective-C/Swift on iOS, or as hardware, or a combination thereof. Device 100 may further include communication handling module 104. Communication handling module 104 may be, for example, developed in any suitable programming language (e.g., the C++ programming language) as a module that is an executable unit in both Android and iOS applications. Communication handling module 104 is configured to securely communicate with Endpoint server 258 of remote service unit 250, and to provide runtime security for the data being recorded. All data entering or leaving the communication handling module 104 may be scrambled or encrypted - the communication from the API (application program interface) to video upload module 102 is scrambled. The communication handling module 104 may save an encrypted settings file, and the communication protocol with Endpoint server 258 may also be encrypted (e.g., on top of the SSL encryption protocol). The code of the module itself is designed to prevent or substantially interfere with tampering attempts - static or dynamic reverse engineering, man-in-the-middle attacks, and so on - by incorporating mechanisms such as, for example, code obfuscation, code scrambling, and anti-debugging techniques like debugger detection. Endpoint server 258 may be a "web" application, written in any suitable language or tool, such as Golang, that may be hosted, for example, on Heroku (a PaaS cloud service - at any given moment, it might run on 0, 1 or more servers). Endpoint server 258 communicates with communication handling functionality 104 using another layer of encryption on top of SSL. Endpoint server 258 may store the uploaded data to database (DB) 257, and may provide the mobile app with credentials to upload a file to storage unit 252, such as S3 cloud storage, after a video was successfully recorded.
[0078] The database 257 that is used may be any suitable remote DB, such as MongoDB, hosted on the cloud by MongoLab. MongoLab uses Amazon EC2 to host their service.
[0079] Servant service 256 may be embodied with one or more computing instances, such as Amazon Elastic Compute Cloud (EC2) instances, that wait for "chores" to perform. For example, a signing chore may be created right after the endpoint server 258 has received the notification that the capturing has finished and before the upload of the video clip file. The signing chore may sign the video clip contents by signing its hash (i.e., message digest), and may sign all the metadata attached to the video clip (location, time, sensors, etc.), termed the "metadata". The metadata may be formatted as a JavaScript Object Notation (JSON) formatted file. The metadata may be signed by servant service 256, using the signing API provided by signing service 254. The metadata may also contain the hash of the video clip. Another chore, the verification chore, may take place after the file has been successfully uploaded to storage unit 252. At this stage the automatic verification of the video clip takes place. The collected clocks and various location inputs may be compared to detect inconsistencies. Image processing may be applied to the video to detect unauthorized video clips, including comparison with the sensor data.
[0080] Signing service 254 may be embodied, for example, by a Windows VM hosted on Azure, which adheres to certification requirements, so that the signature produced by it conforms to the requirements of the electronic (digital) signature regulations. Other possible embodiments of signing service 254 include: a Linux or MacOSX VM hosted on any cloud hosting service or on any physical computer attached to the internet, a third-party digital signing service, or a Hardware Security Module (HSM). Signing service 254 may validate that the request for verification originates from servant service functionality 256 using several methods, such as cryptography and network transaction origin filtering.
[0081] After servant service 256 has finished verifying the video, backend web application 260 may display unauthorized occurrences identified in previous stages to a human operator. The operator may watch this report using a suitable browser, displayed on a display device.
[0082] Modules in remote service 250 may be embodied using centralized or decentralized units available for operative communication with other modules of remote service unit 250. Certain modules may be embodied using cloud services, such as cloud based storage, cloud based computing, etc., as is known in the art.
[0083] Remote service unit 250 may further include servant service unit 256 configured to perform video file signing, for example using signing service 254, and video file verification. Video file verification may be performed, according to some embodiments of the present invention, based on comparison of values received from device 100, the user's personal device that originally captured the video clip. For example, the collected clocks may be compared to detect inconsistencies; the various location inputs may be compared; and image processing may be applied to the video, including comparison with the sensor data, to determine whether it is an unauthorized video clip.
[0084] Reference is made now to Fig. 2C which is a time-line graph depicting the flow of data and operations between the various functionalities of the system for secured capturing and authenticating of video clips, according to some embodiments of the present invention. In the figure, the timeline runs from top to bottom, and each arrow is a network message between two functionalities. The trigger for these interactions is the beginning of capturing of a video by the capturing device (e.g., the user of the device, which may be a mobile smartphone running a specific application, requests to start recording a new video by pressing a button on the screen). The topmost two arrows relate to the device communication module 104 asking the endpoint server 258 to start recording. Endpoint server 258 responds with a cryptographically verifiable token containing a timestamp. The device 100 then performs the recording of the video, attaching location data (e.g., GPS), movement sensor data and the token as the metadata for the video recording. The device calculates hashes for the video and metadata, during the recording process and after the recording is complete. A notification that the capturing of the video clip has ended, together with the hashes, is then sent from the communication module 104 of the capturing device 100 to the endpoint server 258 (the arrow named "upload hashes" in Fig. 2C). Endpoint server 258 in turn creates a database 257 record for that recording, which in turn triggers a "signing chore" message to the servant 256. The servant uses the hashes received to request digital signatures from signing service 254, and finishes by storing the received signatures back to the database 257. Since digital signatures assure that the content of the video and attached metadata has not been changed after the hash the signature was calculated for was produced (back on the capturing device 100), this allows the video file and metadata to be uploaded at a later time, for example when a WiFi internet connection is available, instead of using up the cellular data package to upload the video and metadata (which are sometimes rather large). When the capturing device 100 is ready to upload the video and metadata, it first requests upload credentials from endpoint server 258 (a message from communication module 104), and can then upload directly to the storage unit 252 (using video file upload module 102), which allows upload only with credentials generated by the endpoint server 258. Once the upload is done, the storage unit 252 notifies the endpoint server 258 that the upload is done, after which the endpoint server 258 marks the database 257 record as uploaded. This in turn triggers a "verification chore" message to the servant 256, and execution of the verification and authentication algorithms. The servant may then verify the digital signature of the video and metadata, verify the timestamp in the metadata, and authenticate the video by using the methods described in the present invention. The sequence ends when the result of authentication is stored back into the database 257. Verification of authenticity of the video recording time may include checking the time interval between the time the timestamp was created (the time indicated by the time stamp) and the time the notification of the end of the capturing of the video clip was received at the endpoint server 258.
The timestamp might be acquired from a trusted clock source, such as the endpoint server 258 itself when the endpoint server 258 provides the token to the communication module 104 before the recording, but might also be acquired from other sources, such as other network services, for example a trusted third party server or trusted hardware modules with clock capabilities, as long as it can later be verified that the timestamp was not changed - for example, by digitally signing the timestamp, or by storing a copy of the acquired timestamp in the database 257. These two time values originate from trusted clock sources and may therefore serve in the authentication process of the video clip. If the time interval between acquiring of the timestamp and receiving the notification at the endpoint server 258 is greater than the length of the video plus some limited margin for network communication, then authentication of the clock should fail. Some embodiments of the present invention may acquire the timestamp at other times, such as during the recording (instead of before the recording of the video starts). In some embodiments of the invention, network latency may be estimated and referred to (with an optional error factor) as the limited margin.
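As an illustration only, the following Python sketch shows the clock check described above; the names token_timestamp, notification_time, video_duration_s and network_margin_s are hypothetical and do not appear in the actual system:

from datetime import datetime, timedelta

def clock_authentication_passes(token_timestamp: datetime,
                                notification_time: datetime,
                                video_duration_s: float,
                                network_margin_s: float = 10.0) -> bool:
    # Fail if the gap between issuing the trusted timestamp and receiving the
    # end-of-capture notification exceeds the clip length plus a limited margin.
    elapsed = notification_time - token_timestamp
    allowed = timedelta(seconds=video_duration_s + network_margin_s)
    return elapsed <= allowed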
[0085] Verification of authenticity of the video recording time may further comprise embedding the acquired verifiable timestamp into the video data as a watermark, or encoding it into some of the pixels in the video (e.g., as a barcode), which further proves that the video was only encoded after the timestamp had been created.
[0086] According to some embodiments of the present invention, verification of the video clip file and metadata file data integrity, i.e., that the received data processed on the servant 256 is bit-for-bit identical to the captured data on the capturing device 100, may be performed by computing a hash function value or values, using, for example, the SHA512 hash algorithm, for the captured data on the capturing device 100, as early as possible (during the capturing of the video and metadata, or right afterwards), and verifying the received data on the servant 256 by computing hash function value or values for the data and testing whether the value or values match those computed on the capturing device 100. In some embodiments of the present invention, the hash value generated may be digitally signed, using, for example, a digital signature algorithm such as RSA. The signing might be performed on the capturing device 100 itself, using, for example, a private and public key pair distributed over a Public Key Infrastructure (PKI), or on the remote service unit 250 (signing service 254) using a private key, or both. Digitally signing the hash value proves the hash value was not modified since the time of the signature.
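A minimal Python sketch of this integrity check is given below; it assumes the device-side SHA512 digest is already available, and leaves the digital signature step to whichever signing facility is used (the function and parameter names are illustrative only):

import hashlib

def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def integrity_matches(device_side_hash: str, received_path: str) -> bool:
    # Bit-for-bit identity holds only if the digest recomputed on the servant equals
    # the digest computed on the capturing device during or right after recording.
    return sha512_of_file(received_path) == device_side_hash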
[0087] Verification of authenticity of a video clip may comprise checking consistency based on comparison of movement derived from comparison of consecutive frames with movement data recorded from movement sensors while the video was being recorded. If both movements match, then the video was indeed recorded on the device with the sensors; if there is no match, this might indicate a falsification attempt by injecting video (as in attacking the mobile camera app), or a falsification attempt of recording a digital display device (e.g., another computer screen) playing a video.
[0088] Reference is made now to Fig. 3A, which is a flow chart depicting a method of detecting a tampered video clip based on a movement verification method, according to some embodiments of the present invention. A video clip is received 302 and spatial movements (displacement and rotation - a "transform") of the camera are searched for and identified based on known methods in computer vision. This search and identification may include four stages (for each pair of consecutive frames):
[0089] The first stage is extracting "features" or "corners" for each of the frames. "Features" and "corners" are terms referring to a pixel or a patch in the frame that is easily trackable (corners of objects depicted in the frame are usually easily tracked). The first stage can be computed, for example, by the SIFT (Scale-Invariant Feature Transform) algorithm, by the Harris Corner Detection algorithm or the Shi-Tomasi Corner Detection algorithm (for example, as implemented by OpenCV's goodFeaturesToTrack function), by the Minimum Eigenvalue Corner Detection algorithm (for example, as implemented by OpenCV's goodFeaturesToTrack function), the SURF algorithm (Speeded Up Robust Features, for example, as implemented by OpenCV's SURF_create function), the FAST Algorithm for Corner Detection (Features from Accelerated Segment Test, for example, as implemented by OpenCV's FastFeatureDetector_create function), the CenSurE Feature Detector (for example, as implemented by OpenCV's StarDetector_create function), the ORB Feature Detector (Oriented FAST and Rotated BRIEF, for example, as implemented by OpenCV's ORB_create function), and other known computer vision techniques. The result of this stage is a generated list of coordinates (x,y) for each corner found.
[0090] The second stage (for each two consecutive frames) is to match the features extracted from both frames. The result of this stage, for each of the two consecutive frames, is a sub-list of the features that were also found in the other frame, and both lists match in order; item number j in frame X's matched features list and item number j in frame X+1's matched features list are the same feature, but not in the exact same location in the frame, as a result of (intended, but mostly inadvertent) movement of the camera between the capturing of the two consecutive frames. This stage can be calculated, for example, by using a brute-force matcher (for example, as implemented by OpenCV's BFMatcher function), a FLANN (Fast Library for Approximate Nearest Neighbors) based matcher (for example, as implemented by OpenCV's FlannBasedMatcher, or the FLANN library itself), or an Optical Flow algorithm, for example the Lucas-Kanade algorithm (as implemented by OpenCV's calcOpticalFlowPyrLK). The first two stages can be conceptually visualized by the example provided in Figs. 3Bi - 3Biv. Fig. 3Bi and Fig. 3Bii show two consecutive frames out of a hypothetical video 350 of a clockwise rotating rectangle 352. In the first stage, corner detection, as depicted in Fig. 3Biii and Fig. 3Biv respectively for the two frames depicted in Fig. 3Bi and Fig. 3Bii, the corners (354, 356, and two additional corners at the bottom) of the rectangle 352 are detected in both frames (a list of coordinates for each corner is maintained). In the second stage, the pairing stage, the corner 354 in the first frame, Fig. 3Biii, is matched to corner 354 in the second frame, Fig. 3Biv. The same applies to corner 356, which is matched between the two frames, and to the rest of the corners.
[0091] The third stage is estimation of a two-dimensional (2D) or a three-dimensional (3D) transform from the matching feature lists. The result of a 2D transform is a matrix of dimensions 2x3, and of a 3D transform a matrix of 3x4. These matrices, when multiplied by each feature in frame X (a feature is a 2D vector representing a pixel in frame X), will result in the matching feature list from frame X+1 (also called a camera transform between two consecutive frames), up to an acceptable error that the transform estimation algorithm allows. This stage may be calculated by, for example, one of OpenCV's findHomography, estimateRigidTransform, getAffineTransform, getPerspectiveTransform, or other available algorithms and known computer vision methods (not necessarily from the OpenCV library).
[0092] The fourth and last stage consists of extracting camera displacement and rotation from the transform obtained in the previous stage. The translation for each axis can be found in the last column of the transform (TranslationX, TranslationY, TranslationZ = TransformMatrix[(0,3), (1,3), (2,3)] for 3D transforms, and TranslationX, TranslationY = TransformMatrix[(0,2), (1,2)] for 2D transforms). The rotation about each axis can be obtained with the following formulas: for 2D transforms, the image rotation can be computed with atan2(-TransformMatrix[(0,1)], TransformMatrix[(0,0)]). For 3D transforms, the following C++ code extracts the Yaw, Pitch and Roll rotation angles:
if (TransformMatrix[(0,0)] == 1.0f) {
    Yaw = atan2f(TransformMatrix[(0,2)], TransformMatrix[(2,3)]);
    Pitch = 0;
    Roll = 0;
} else if (TransformMatrix[(0,0)] == -1.0f) {
    Yaw = atan2f(TransformMatrix[(0,2)], TransformMatrix[(2,3)]);
    Pitch = 0;
    Roll = 0;
} else {
    Yaw = atan2(-TransformMatrix[(2,0)], TransformMatrix[(0,0)]);
    Pitch = asin(TransformMatrix[(1,0)]);
    Roll = atan2(-TransformMatrix[(1,2)], TransformMatrix[(1,1)]);
}
[0093] The following pseudo code, which is given as an example, describes the four stages of calculation in searching for and identifying spatial movement, in the case where the SIFT algorithm is used for feature detection, the FLANN algorithm for feature matching, and findHomography for camera transform estimation:
PointsX = SIFT.detectPointsInImage( Frame X )
PointsX_1 = SIFT.detectPointsInImage( Frame X + 1 )
PointsMatchingInBothFrames = FLANN.match( PointsX, PointsX_1 )
Source_Points = PointsX that are also in PointsMatchingInBothFrames
Destination_Points = PointsX_1 that are also in PointsMatchingInBothFrames
HomographyMatrix = findHomography(Source_Points, Destination_Points)
TranslationX, TranslationY, TranslationZ = HomographyMatrix[(0,3), (1,3), (2,3)]
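For comparison, the following Python/OpenCV sketch implements the same four stages with corner tracking (goodFeaturesToTrack and Lucas-Kanade optical flow) and a 2D partial-affine estimate; the grouping into a single frame_to_frame_motion function, as well as the parameter values, are illustrative assumptions rather than part of the described method:

import math
import cv2
import numpy as np

def frame_to_frame_motion(prev_gray, curr_gray):
    # Stage 1: extract trackable corners in the first frame.
    corners = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=10)
    if corners is None or len(corners) < 4:
        return None
    # Stage 2: match the corners in the next frame with Lucas-Kanade optical flow.
    moved, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, corners, None)
    mask = status.flatten() == 1
    good_old, good_new = corners[mask], moved[mask]
    if len(good_old) < 4:
        return None
    # Stage 3: estimate the 2D camera transform between the matched point sets.
    transform, _ = cv2.estimateAffinePartial2D(good_old, good_new)
    if transform is None:
        return None
    # Stage 4: translation is in the last column; rotation from atan2, as in the text.
    tx, ty = transform[0, 2], transform[1, 2]
    rotation = math.atan2(-transform[0, 1], transform[0, 0])
    return tx, ty, rotation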
[0094] Another alternative for searching for and detecting spatial movement (304 in Fig. 3A) is to estimate camera translation and rotation between each two consecutive frames by using the motion vectors between the two consecutive frames. MPEG, H.264 and other video encoding formats store, per each group of pixels in a frame (a "macroblock"), the relative movement from the previous frame - how many "pixels" in each direction the block was displaced from its location in the previous frame. Reference is now made to Fig. 3C, which visualizes what motion vectors are: this image is a frame from a video which shows the motion vectors embedded in the video stream - where the "same patch of pixels" was located in the previous frame. Averaging this information across all macroblocks, or applying FLANN across all macroblocks in a frame, computes an approximation of the 2D camera movement, assuming the scene is mostly static. (This method for approximating 2D camera movement from motion vectors is described in detail in the following publication: Maurizio Pilu, "On Using Raw MPEG Motion Vectors To Determine Global Camera Motion", Digital Media Department, HP Laboratories Bristol, HPL-97-102, August 1997.) This method might be augmented by using RANSAC (Random Sample Consensus) instead of FLANN. Another augmentation would be, instead of reading the motion vectors from the encoded video stream, to re-run the "motion estimation" stage of any video encoder in order to deduce new motion vectors between the frames (the advantage is that a software video encoder can be configured for a larger "search space", whereas hardware video encoders, such as the video encoders embedded in smartphone devices, are limited in resources for finding motion vectors).
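As an illustrative sketch only, and assuming the per-macroblock motion vectors have already been parsed from the encoded stream (that extraction depends on the particular decoder and is not shown), the global 2D movement approximation can be as simple as:

import numpy as np

def approximate_camera_motion(motion_vectors):
    # motion_vectors: array of shape (num_macroblocks, 2) holding (dx, dy) per block.
    if motion_vectors.size == 0:
        return 0.0, 0.0
    # Plain averaging, as in the text; a median or a RANSAC fit could be used instead
    # to reduce the influence of moving foreground objects.
    dx, dy = motion_vectors.mean(axis=0)
    return float(dx), float(dy)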
[0095] After completing the search for and identification of spatial movement, a translation signal per axis is obtained (a value for each point in time, between each two frames, across the horizontal axis, vertical axis and depth axis) and a rotation angle signal, for a total of 6 signals (or fewer, if a 2D transform was used in stage three). The camera device (for example, a smartphone) may have motion sensors installed, which may contain accelerometers recording acceleration over 3 axes, and gyroscopes that record rotation rate over 3 axes. These signals are recorded 306 and obtained. These signals are then processed 308 to match either the camera translation or rotation from the recorded signals 306. The processing may include the following (a minimal code sketch of this processing is given after the list):
• For acceleration:
° The acceleration signal from an accelerometer includes a gravity component (9.81 meters per second squared). This component must be eliminated from the signal for the acceleration to describe the camera device acceleration relative to the recorded scene. Elimination of the gravity component can be done, for example, with a low pass filter or a Kalman filter. Alternatively, most software APIs for sensor input allow a "pseudo-sensor" input for Linear Acceleration, which already has the gravity component removed.
° The Linear Acceleration signal is then integrated once to receive a velocity signal, or twice to receive a displacement signal.
• For rotation rate:
° Rotation rate (angular velocity) is recorded by gyroscope sensors. Some software APIs contain an Attitude pseudo-sensor, which contains the current angle in terms of Roll, Pitch and Yaw, relative to a reference point. The derivative of the Attitude pseudo-sensor gives the Rotation Rate signal.
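A minimal Python sketch of this sensor-side processing is given below; the exponential low-pass constant, the fixed sample interval dt and the function names are assumptions made for illustration:

import numpy as np

def remove_gravity(accel, alpha=0.9):
    # accel: array of shape (n_samples, 3). Returns the linear acceleration per axis,
    # after removing a low-pass (slowly varying) gravity estimate.
    gravity = np.zeros(3)
    linear = np.empty_like(accel, dtype=np.float64)
    for i, sample in enumerate(accel):
        gravity = alpha * gravity + (1.0 - alpha) * sample
        linear[i] = sample - gravity
    return linear

def integrate(signal, dt):
    # Simple cumulative (rectangle-rule) integration: acceleration -> velocity,
    # and, applied twice, velocity -> displacement.
    return np.cumsum(signal, axis=0) * dt

For example, velocity = integrate(remove_gravity(accel), dt) and displacement = integrate(velocity, dt).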
[0096] Finally, the signals from the two origins - computer vision and the hardware sensors on the camera device - are compared and graded (block 310) for their "similarity". To check for similarity, the signals may first pass through a filtering or smoothing algorithm. Examples of algorithms that may be used to perform this function include a low pass filter, a Kalman filter, Kalman smoothing, a weighted-average rolling window (1D convolution kernel), a Bayesian filter, and other known methods. There are several known methods to compare two signals, for example, Matlab's corrcoef function (Cross Correlation Coefficients). Example source code in Matlab:
grade = corrcoef(SensorSignal, MotionSignal)(1, 2);
[0097] grade will receive a value close to 1 for similar signals, and close to 0 in non-similar cases.
[0098] Another possible technique to grade signal similarity is by using cross correlation, for example as implemented by Matlab's xcorr, and derived computations. For example, the Normalized Maximum Cross Correlation Magnitude may be calculated by the following example source code in Matlab:
norm_max_xcorr_mag = @(x,y)(max(abs(xcorr(x,y)))/(norm(x,2)*norm(y,2)));
grade = norm_max_xcorr_mag(SensorSignal, MotionSignal);
[0099] Since sensor data is usually acquired at 100Hz, and video motion is derived from the video frame rate, which is usually 25-30 frames per second, there may be an additional step to either drop sensor data or interpolate the video motion signal. Other methods to grade signal similarity might not require this step. An example of one such method: the calculation involves detecting the X most numerically significant "peaks" in the signal, and verifying that the time deltas between them, and their sequence, match the "peaks" from the other signal. According to one embodiment, local minima and maxima of displacement may be used as "peaks". Another known method for this step is called Dynamic Time Warping, for example as implemented in the FastDTW software library (this method is described in detail in the following publication: Al-Naymat, G., S. Chawla, and J. Taheri, "SparseDTW: A Novel Approach to Speed up Dynamic Time Warping", The 2009 Australasian Data Mining Conference, vol. 101, Melbourne, Australia, ACM Digital Library, pp. 117-127, 12/2009).
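A short sketch of this rate-matching and grading step, assuming both signals carry timestamps on the same clock, could be as follows (np.interp performs the linear interpolation; the Pearson coefficient parallels the Matlab corrcoef example above):

import numpy as np

def resample_to_sensor_rate(motion_t, motion_values, sensor_t):
    # Interpolate the per-frame motion signal onto the (denser) sensor sample times.
    return np.interp(sensor_t, motion_t, motion_values)

def similarity_grade(sensor_signal, motion_signal):
    # Pearson correlation coefficient, analogous to corrcoef(...)(1, 2) above.
    return float(np.corrcoef(sensor_signal, motion_signal)[0, 1])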
[00100] Fig. 3D depicts a real- world example of two such correlating signals, according to some embodiments of the present invention: rotation rate in radians over time (the top plot) as measured from a gyroscope sensor on a smartphone device, and the camera translation in units of "pixel size" over time (the bottom plot) on the same axis (using 2D transform from OpenCV's estimateRigidTransform). Signals that are similar (graded close to 1) are signals like these - they "look" similar.
[00101] As described above, searching for and identifying spatial movement - 304 in Fig. 3A - produces several signals, as does 308. The pairs of signals to compare (one from each source) may be, for example, a predetermined list (for example, translation extracted from the 2D camera transform best correlates with rotation on the same axis; as another example, translation extracted from the 3D camera transform correlates with velocity (integrated acceleration) on the same axis). Another approach may be pairing the signals by "best match": grade each possible combination of pairs, and take the pairing that scores the highest.
[00102] The final step, in 310, is to combine the similarity between the signals originating from 304 and the signals originating from 308 into a single number (for example, by using a weighted linear grade function). The result is a single number between 0 and 1, for which a threshold is defined; video recordings that score beneath this threshold are flagged as possibly falsified.
[00103] In some embodiments of the invention the method may include using an odometer sensor ("step counter") as a sensor signal and estimating steps from the camera movement (e.g., shaking of the camera) as the compared signal, and/or using a magnetometer orientation sensor and the camera rotation motion.
[00104] Another kind of falsification may involve filming a video that is playing on another digital display device, for example a video of another camera screen, another smartphone screen, another tablet screen, a computer screen, a TV set, or even a printed picture captured from another camera. Such falsification can be attempted in order to insert into the system a video from another time and location, by filming the video playing on another device. Detection of such a falsified video can be achieved by comparing the movement of the camera and video (the previous method), but also with the following method: Reference is now made to Fig. 4A, depicting this method. For each frame of the video, the magnitude spectrum image 404 may be calculated (as explained above, the term magnitude spectrum image refers to creating a matrix of values, but the computation does not require specific storage in the form of a matrix, as it may be performed without creating an "image"). Calculating the magnitude spectrum image for each frame typically includes using a frequency analysis method, for example, Fourier Transform, Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT), etc. In some embodiments of the invention, an additional step of using only the absolute value of the result may be performed. In some embodiments, the scale may be changed to a logarithmic scale. In some embodiments the calculation may be performed on the luminance channel of the image, and in some embodiments the calculation may be performed on color planes or any combination thereof. Below is example Matlab source code designed to compute the magnitude spectrum image using the Fast Fourier Transform (FFT), after converting an RGB image (an image containing pixel values for the Red, Green and Blue color channels) to a single grey channel, according to some embodiments of the present invention:
function FFT_amp = my_fft2( I )
I_gray = rgb2gray(I);
I_FFT = fft2(I_gray);
F = fftshift(I_FFT);
FFT_amp = abs(log(F+1));
FFT_amp = FFT_amp/max(FFT_amp(:));
end
[00105] Reference is now made to Fig. 4B and Fig. 4C. Fig. 4B shows the resulting magnitude spectrum image of the Matlab function presented above, when computed for a still photograph of a real scene (not of a digital display device like a computer monitor). Fig. 4C shows the resulting magnitude spectrum image of the Matlab function presented above, when computed over a still photograph of a computer screen which displayed the still photograph of Fig. 4B. The magnitude spectrum image of false still images (denoted as 'FFT_amp' in the code above, e.g. Fig. 4C) will have "artifacts" at high frequencies (frequencies above a predetermined frequency threshold, for a vertical axis and a horizontal axis), compared with an image of a real scene (not captured off a digital display device such as a computer monitor, e.g., Fig. 4B). This is attributed to aliasing effects of the pixels in the recorded screen. The center of a magnitude spectrum image represents the zero horizontal and vertical frequency, with the highest vertical and horizontal frequencies in the corners. The "dots" or "white stains" in the corners of the image in Fig. 4C are the "telltale sign" (the artifacts) that the image was captured off a digital display device (such as a computer monitor, smartphone screen, etc.) and is not an image of a genuine live scene. Genuine real scenes like in Fig. 4B (not captured off a digital display device) do not produce such stains. These stains at the edges are a visual representation of high energy at high frequencies - a phenomenon that does not typically occur in real world images.
[00106] Automatic detection of the presence of high energy in these high frequencies in a magnitude spectrum image 406 may be performed with thresholding / local thresholding of the image and then summing up the percentage of white-value pixels at the image corners. Thresholding refers to creating a new image in which all pixels above a certain value are "on" (usually, white) while all others are "off" (black). If needed, the thresholding can be done not only on the intensity level but also on the number of pixels (for example, not taking into account blobs that are smaller than 1-4 pixels). If needed, before the local thresholding, one can use a filter to smooth the 'background' in the magnitude spectrum image, for example a median filter. Results of applying such a technique may be seen in the image of Fig. 4D, which depicts an image resulting from applying "thresholding" (i.e. each pixel is "on" or "off" depending on whether it was above a certain percentage threshold in the original image) on the image in Fig. 4C, according to some embodiments of the present invention. Relative to the image of Fig. 4C, the stains (or "dots") at the corners are more distinct, as the noise of all the insignificant frequencies in the background is discarded from the computation, increasing the distinction between non-fabricated scenes and fabricated scenes; the stains can then be detected simply by counting white pixels in the corners of the thresholded magnitude spectrum image (natural scenes will have almost zero, while fabricated scenes will have many - i.e., tens or hundreds of - white pixels). This method may be applicable even without applying a threshold step.
[00107] A formula for what qualifies as "the corners" of the image, or a "high frequency", can be defined, for example, as the top 25 percent of frequencies, when sorted by the sum of the absolute horizontal and vertical components (magnitude spectrum images are two dimensional). A visual representation of this process is imagining an "octagon" mask applied to the magnitude spectrum image. In some embodiments, an additional blurring step may be used to "increase" the size of these "stains" (the high frequency artifacts). Fig. 4E shows the result of masking out non-high frequencies and applying blurring with the code sample below. Matlab code for this process may look like this:
kern = [7,7];
THR = 0.4; %% or can be an adaptive threshold
FFT1_binary = im2bw(medfilt2(FFT1, kern), THR);
SE = strel('octagon', 300);
mask1 = SE.getnhood;
mask1 = imresize((1 - mask1), size(FFT1), 'nearest');
FFT1_masked = FFT1_binary .* mask1;
figure; imshow(double(FFT1_masked))
grade = sum(FFT1_masked(:))/sum(mask1(:));
[00108] A natural, original image (that was not tampered with) is expected to return grade 0. An image involving data from another screen will return grades between 0.01 and 0.03, as measured in experiments performed by the inventors of the present invention.
[00109] A second technique for automatically detecting the presence of the "stains" in the magnitude spectrum image 406 is blurring the magnitude spectrum image, for example by using a median filter with a big kernel. These types of filters may be used to discard insignificant noise from the magnitude spectrum image, and, similarly to the purpose of the thresholding stage, a simple sum of the pixel values in the high frequencies of the magnitude spectrum image can detect the presence of the said stains.
[00110] Regardless of which of the filters is applied (none, thresholding, blur or Bayesian), computation of the grade may be performed by computing the percentage of the energy of the filtered magnitude spectrum image that is in the high frequencies (simply put, the sum of white pixels in the corners versus the sum of all white pixels).
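A Python/NumPy sketch of this grading, using a simple radial high-frequency mask instead of the octagon mask of the Matlab example (the 25 percent cutoff and the mask shape are assumptions made for illustration), may look like:

import numpy as np

def high_frequency_grade(gray_frame, cutoff=0.75):
    spectrum = np.fft.fftshift(np.fft.fft2(gray_frame.astype(np.float64)))
    magnitude = np.log1p(np.abs(spectrum))
    magnitude /= magnitude.max()
    h, w = magnitude.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Normalized distance from the spectrum center (0 = DC, 1 = the corners).
    dist = np.hypot((yy - h / 2) / (h / 2), (xx - w / 2) / (w / 2)) / np.sqrt(2)
    high = dist >= cutoff
    # Fraction of spectral energy located in the high-frequency region.
    return float(magnitude[high].sum() / magnitude.sum())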
[00111] A third technique for detecting the artifacts in high frequencies 406 may be to use a machine learning classification algorithm.
[00112] The magnitude spectrum image of a frame extracted from a falsified video clip (a video clip of a digital display device playing the original video) will have "artifacts" in frequencies that are low in one dimension, compared with an image captured of a real scene. These artifacts are termed "echoes" in this document. This occurs due to temporal aliasing effects between the frequency of the capturing camera and the screen playing the video. Reference is now made to Fig. 4F - Fig. 4I. Fig. 4F is a frame taken from a video of a real scene (not captured off a digital display device). Fig. 4G is a frame taken from a video of a computer monitor displaying the video used for Fig. 4F. Fig. 4H and Fig. 4I are the magnitude spectrum images computed for Fig. 4F and Fig. 4G respectively, according to some embodiments of the present invention. The artifacts discussed here, the echoes, can be observed as "thicker" lines along the horizontal and vertical axes (the axes meet in the center of the image). This is a mathematical representation of the camera capturing pixels from two consecutive frames at once, which can be seen in the original image, in the passing car, especially around the rear left wheel.
[00113] Automatic detection of such echo artifacts may use the same algorithms as mentioned herein above: either threshold the magnitude spectrum image (with or without filtering), blur the image, or use a machine learning image classification algorithm.
[00114] Reference is now made to Fig. 5A, which depicts a method for detection of falsification by means of graphical digital manipulations made in an original frame or frames. This may provide a second layer of security. Assume that a "falsified" video was injected (i.e., submitted) which was not recorded using a normal capturing device linked with the system, and that this video was previously edited (edited using a video editor, or otherwise digitally manipulated with details added or removed, e.g., by editing the frame - changing pixel values, etc.). According to this method, a dedicated algorithm may help in detecting this situation by flagging videos that are "suspected" of such manipulation. For example, Fig. 5B contains a frame from a non-modified captured video of a parked car. Fig. 5C, on the other hand, contains the same frame extracted from a video created by digitally manipulating the original video (the one used for Fig. 5B) to replace the license plate of the car with a different number. Fig. 5B represents a non-falsified video, while Fig. 5C is a falsified video. The detection method comprises calculating an Error Level Analysis (ELA) image (504, 506, 508 and 510) of one or more frames from the video being authenticated, and detecting patches of pixels with an attribute value that stands out with respect to the attribute values of the other pixel patches, e.g. a higher energy value in the pixel patch than in the rest of the pixel patches in the ELA image.
[00115] To calculate the Error Level Analysis (ELA) image, the first step 504 is creating a video encoder with a state supporting the calculation - a state as similar as possible to the state of the encoder that encoded the original frame. According to some embodiments of the present invention the relevant state variables to restore into the encoder may consist of: the bitrate control parameters (type of algorithm), the previous frames stack, the Group Of Pictures (GOP) size, and the current quantization factors (initial Qp, Qi and Qb). All these state variables can be recovered from the encoded video stream. According to some embodiments, a different encoder might be used, such as a JPEG encoder, an MPEG2 encoder, etc. According to some embodiments, the JPEG encoder quality parameter must be set to match the lowest quantization factor of all macroblocks in the video frame. E.g., if the video is compressed using H.264, the quantization factor is 51 for lowest quality and 0 for best quality (i.e., lossless compression), while JPEG encoder quality is between 100 for highest and 1 for lowest. According to some embodiments of the invention, for this case, the JPEG quality can be derived by subtracting Qmax raised to the power of A and multiplied by B, where Qmax is the highest quantization factor of all the macroblocks in the frame (lower quality), and A and B are parameters measured empirically.
[00116] In the second step of calculating the ELA image 506, the frame is encoded using the encoder created in the first step 504. The resulting bitstream is then decoded again 508, and finally, the resulting compressed-then-decompressed image is subtracted from the original frame 510. Subtracting images here refers to calculating the difference in pixel values between the original frame from the video and the compressed-then-decompressed image (not necessarily building a new matrix in memory representing an actual image).
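A simplified Error Level Analysis sketch is given below; it uses a plain JPEG re-encode (the Pillow library) instead of restoring the original video encoder state, which the text notes is an acceptable variant, and the quality value is an illustrative assumption:

import io
import numpy as np
from PIL import Image

def ela_image(frame_rgb, jpeg_quality=90):
    original = Image.fromarray(frame_rgb)
    buf = io.BytesIO()
    original.save(buf, format="JPEG", quality=jpeg_quality)  # step 506: re-encode
    buf.seek(0)
    recompressed = np.asarray(Image.open(buf), dtype=np.int16)  # step 508: decode again
    # Step 510: per-pixel absolute difference between the original frame and the
    # compressed-then-decompressed frame.
    return np.abs(frame_rgb.astype(np.int16) - recompressed).astype(np.uint8)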
[00117] Fig. 5D and Fig. 5E depict the resulting ELA images for Fig. 5B and Fig. 5C respectively (for print visibility over white paper, Fig. 5D and Fig. 5E are zoomed in on the back of the parked vehicle, converted to grey scale and inverted, since the computation result is colorful over a black background). Fig. 5E shows how the manipulation of the license plate is clearly noticeable, while there is no such result in Fig. 5D, which was calculated over a non-manipulated video.
[00118] Automatic detection of the presence of manipulation can be performed by calculating an energy value for each patch of pixels in the image (a patch of pixels being a consecutive block of N by N pixels, where N is, for example, 8, 16, 32, 64, etc.). The energy value here is defined by summing the power A of all the pixel values in the pixel patch, where A can be 1, 1.5, 2, etc. If a patch exists whose energy value is above a certain threshold over the average energy value for all patches, then this frame - and therefore the whole video - is suspected of digital manipulation.
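A sketch of this decision step, with the patch size, the energy exponent and the threshold factor chosen arbitrarily for illustration, may be:

import numpy as np

def frame_is_suspect(ela, n=16, power=2.0, factor=5.0):
    gray = ela.mean(axis=2) if ela.ndim == 3 else ela.astype(np.float64)
    h, w = gray.shape
    energies = []
    for y in range(0, h - n + 1, n):
        for x in range(0, w - n + 1, n):
            patch = gray[y:y + n, x:x + n]
            energies.append(np.sum(patch ** power))  # per-patch energy value
    energies = np.asarray(energies)
    # Flag the frame if some patch stands out far above the average patch energy.
    return bool(energies.max() > factor * energies.mean())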
[00119] Another kind of video falsification is digital manipulation that combines two or more videos into a single video file. For example, overlaying a different license plate over each frame in the video file (combining a video file with an image file). Another example: filming an actor over a "green screen" to cut him out of that video and "insert" the actor into a video of a background filmed without the actor. Digital video cameras, especially of the quality existing in smartphone devices, typically have "noise" in the captured video, i.e., even when directing the camera at a white wall, the numerical value of each pixel may "wiggle" a bit between the frames of the video. The pattern of this noise, i.e. the noise in patches of pixels, differs between different capturing devices (e.g., different smartphones), but also between capturing sessions - different activations of the same capturing device - since it is also affected by the device temperature and by the ambient conditions of the device. The term sessions is used to denote different capturing sessions, whether on the same capturing device or on different devices.
[00120] Reference is now made to Fig. 6A, which depicts a method for determining whether a video clip is a product of digital manipulation combining video from different capturing sessions. For each two consecutive frames, a camera transform is identified based on computer vision techniques 604. This block and block 304 are implemented the same way, as described herein above. For each two consecutive frames, the first frame is transformed by the calculated camera transform and subtracted from the second frame (block 606); the absolute value of each pixel value is then taken, to produce a delta frame, and all delta frames are combined into a new video, a delta video (this description is a simple way to explain the calculation; it is not actually necessary to use an actual image - a matrix of pixel values - for the delta image, or a list of frames or a video file for the delta video - in some embodiments of the present invention, the computation can produce the same result without going through this form of storage). Viewing the delta video will not present the objects captured in the video, but only the noise, i.e., if very little noise exists the delta video will contain mainly zero pixel values (a dark, "black" video).
[00121] In some embodiments of the present invention, an edge detection algorithm (for example, the Sobel algorithm, such as implemented by OpenCV's Sobel function, the Scharr algorithm, such as implemented by OpenCV's Scharr function, the Canny edge detector, such as implemented by OpenCV's Canny function, etc.) applied over the second frame (or the transformed first frame) is used to mask out the delta frame (if the pixel value in the image produced by the edge detection algorithm is above a certain threshold, zero the pixel value in the delta image). The resulting masked delta image therefore does not contain the noise around edges of objects in the frame, which improves the results of detection in step 610 described below. In these embodiments, the delta video is the result of combining the masked delta frames.
[00122] In some embodiments of the present invention the delta frame is further processed by relating each pixel value to the pixel value in the second frame (or the transformed first frame); for example, the pixel value in the processed delta frame is computed by the formula (A+B*p1)/(D+C*p2), where p1 is the pixel value in the delta frame, p2 is the pixel value in the second frame, and A, B, C and D are numerical parameters of the formula adjusted empirically. The purpose of this step is to handle videos from some capturing devices where the noise is relative to the brightness of the pixel, an effect that can originate from quantization in the video encoder that encoded the video (usually a hardware H.264 encoder). In the next step of the calculation, the delta video is accumulated into a single noise frame 608. In some embodiments of the present invention the accumulated delta frames are not all the delta frames in the delta video, but a partial group, for example, each group of 30 frames, each group of 60 frames, or every Nth frame. In some embodiments of the present invention, the accumulation is performed by summing the numerical value of the pixel in each of the said delta frames, by a linear combination of the values from said delta frames, or by using a Noise Level Function (NLF), as described in Kobayashi, Michihiro, Takahiro Okabe, and Yoichi Sato, "Detecting forgery from static-scene video based on inconsistency in noise level functions", Information Forensics and Security, IEEE Transactions on 5.4 (2010): 883-892. Reference is now made to Fig. 6B and Fig. 6C. Fig. 6B depicts the noise image created by summing pixel values of the first 120 frames of a video captured using a Google Nexus 5 smartphone (manufactured by LG). Fig. 6C is a noise image computed in the same way from a video captured using an LG G3 smartphone, of the same scene, minutes after the video recorded in Fig. 6B. These two images are clearly distinguishable, reflecting a different amount of noise. The final step in the computation is to determine, from the noise image of a video, whether the video was created by combining two other videos from different capturing sessions 610. For each patch of pixels in the noise image (e.g., 8x8 pixels, 16x16 pixels, etc.), compute an attribute for the pixel patch; if a pixel patch exists for which the attribute stands out relative to the computed attributes of another set of pixel patches, determine that they originated from different capturing sessions and that the video is the result of digitally combining them, and is therefore falsified. According to some embodiments of the present invention, the attribute used is the energy value calculated by a linear combination of a power of each pixel's value (of the pixels in the pixel patch). As a formula: Energy = per_pixel_sum(p^A), where p is the pixel numerical value and A is a numerical parameter of the computation, determined empirically. For each two pixel patches in the noise image, if the difference between their computed energy values is over a threshold, determine said falsification.
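A hedged sketch of the noise-image construction described above (the masking, grouping and Noise Level Function variants are omitted; a 2x3 affine camera transform per consecutive pair of grayscale frames is assumed) may look like:

import cv2
import numpy as np

def noise_image(frames, transforms):
    # frames: list of grayscale frames; transforms: one 2x3 affine matrix per consecutive pair.
    acc = np.zeros(frames[0].shape, dtype=np.float64)
    for (prev, curr), t in zip(zip(frames, frames[1:]), transforms):
        h, w = curr.shape
        warped_prev = cv2.warpAffine(prev, t, (w, h))  # apply the camera transform
        # Delta frame: absolute per-pixel difference, accumulated into a single noise frame.
        acc += np.abs(curr.astype(np.float64) - warped_prev.astype(np.float64))
    return acc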
[00123] According to some embodiments of the present invention, determining falsification may be implemented by relating to an attribute of each pixel patch, and performing a statistical analysis of that attribute across all pixel patches. For example, for each pixel patch, calculate the difference between the energy in the pixel patch and the average of energy across all pixel patches. If the difference is a factor over the standard deviation of the energy across all pixel patches, determine said falsification.
[00124] Another type of falsification may be an attempt to "inject" falsified location data into an otherwise legitimate recording (the video and time remaining truthful). An example of such falsification may be: running the application on an emulator of the capturing device, and using existing features for "injecting" location data (a mobile device emulator is a virtual machine environment that can run inside a desktop computer, usually for development and testing purposes).
[00125] Reference is now made to Fig. 7, which depicts a method for detecting such a falsified video clip where location data was altered. Such a method may include acquiring location data from two independent location sources 702, 704 and finding whether the two sources conform to or contradict each other 706. In some embodiments of the present invention, while the capturing device 100 is recording the video, besides recording the location from the geo-location services (the software API governing acquisition of location data from the operating system) of the smartphone, all visible wireless transmissions (e.g., WiFi access points, cellular towers, etc.) are also recorded and attached as metadata to the video clip. Later on, on the remote service unit 250, the location of the capturing device that captured the video can then be cross-checked with the locations of the origins of these transmissions, as extracted from publicly available databases (for example, the Google geolocation API for the cloud can be queried with a WiFi access point BSSID "MAC address", and it will return a GPS point for the last location this access point was recorded at; the database was collected by crowdsourcing from Android devices, and by specially purposed vehicles surveying WiFi signals, also known as "war-driving"). For each detected wireless signal (e.g., WiFi access point or cellular tower), the device may perform a query against any wireless-signal-to-geolocation database (such as a WiFi BSSID-to-geolocation database, or a cellular tower ID to geolocation database). If the location, as acquired and calculated based on this method, is farther from the location reported by the capturing device's integrated GPS / geo-location services by more than a predefined figure, for example 100 meters, the location data is determined to have been falsified. According to some embodiments of the present invention, contradiction of location can be found between any two independent location data sources, such as location data originating from a GPS module, location data originating from the locations of cellular signals, location data originating from the locations of WiFi signals, location data originating from Bluetooth signals, location data originating from navigation beacons (for example, iBeacon), etc.
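A small sketch of the distance cross-check between two independent location sources (the wireless-signal-to-geolocation lookup itself is not shown; the 100 meter figure follows the example in the text) may be:

import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two latitude/longitude points.
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def locations_contradict(device_fix, db_fix, max_distance_m=100.0):
    # device_fix / db_fix: (latitude, longitude) tuples from the two independent sources.
    return haversine_m(device_fix[0], device_fix[1], db_fix[0], db_fix[1]) > max_distance_m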
[00126] A second kind of independent data location source can be altitude queried from a trusted map service (such as Google Maps API), for a specific latitude and longitude. A location data source that provides latitude, longitude and altitude data (such as a GPS module location data source), can contradict the altitude acquired by querying the trusted map service with the longitude and latitude acquired. The two location data sources are independent only for the altitude data, longitude and latitude originating from a single location data source. This may indicate an attacker attempting manipulation of location data without properly manipulating all the fields in the location data (such as the altitude).
[00127] A third kind of independent location data source can be a trusted method to compute or query the list of satellites that may have been visible at the acquired time and location (e.g., if a GPS satellite is over Brazil at a certain time, and the list of satellites in the location data contains this satellite at this time while indicating a location in Turkey, on the other side of the planet, this constitutes a contradiction). Here the two location data sources are independent in the list of satellites, but depend on a single source for longitude and latitude. In the GPS system, for example, the satellites are not geostationary, i.e., they move around the Earth and their relative positions change. When GPS data is received, for example in the NMEA protocol format, the $GPGSA and $GPGSV NMEA records contain a list of satellites, which should be a subset of the possible list of satellites (the satellites in the hemisphere centered at the acquired longitude and latitude). For GPS satellites, the position of each satellite at a specific time can be computed using the methods described in the following publications: J. Sanz Subirana, J.M. Juan Zornoza and M. Hernandez-Pajares, "GPS and Galileo Satellite Coordinates Computation", Technical University of Catalonia, Spain, 2011; Global Positioning System Standard Positioning Service Signal Specification, 2nd Edition, 1995. In some embodiments of the present invention a positioning system other than GPS may be used, for example, GLONASS.
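The satellite-visibility check could be sketched as follows under a simplified spherical-Earth model; the satellite ECEF coordinates are assumed to be computed from an ephemeris as in the cited publications, and all function names are hypothetical.

```python
import math

def is_satellite_visible(obs_lat_deg, obs_lon_deg, sat_ecef_m):
    """Return True if a satellite at ECEF position `sat_ecef_m` (x, y, z in
    meters) is above the horizon of an observer at the given latitude and
    longitude, using a spherical Earth approximation."""
    r_earth = 6371000.0
    lat, lon = math.radians(obs_lat_deg), math.radians(obs_lon_deg)
    # Local "up" unit vector and observer position on the sphere.
    up = (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))
    obs = tuple(r_earth * u for u in up)
    # Line-of-sight vector from observer to satellite.
    los = tuple(s - o for s, o in zip(sat_ecef_m, obs))
    norm = math.sqrt(sum(c * c for c in los))
    # Positive projection onto "up" means the satellite is above the horizon.
    elevation_sin = sum(u * c for u, c in zip(up, los)) / norm
    return elevation_sin > 0.0

def satellites_contradict(reported_prns, visible_prns):
    """Contradiction if the NMEA-reported satellite list contains a satellite
    that could not have been visible at the acquired time and location."""
    return bool(set(reported_prns) - set(visible_prns))
```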
[00128] While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.

Claims

1. A system for secured capturing and authenticating of video clips, the system comprising:
one or a plurality of capturing devices, each of the capturing devices comprising a camera and a first processor, and configured to:
acquire a time stamp;
capture a video clip of a scene and attach to the captured video clip metadata that includes the time stamp;
transmit a notification that the capturing of the video clip has ended; upload the captured video clip with the attached metadata to a remote service unit; and
a remote service unit comprising a second processor and storage device, configured to:
receive the notification;
receive an uploaded video clip;
authenticate the received video clip by:
verifying that the received video clip is unedited with respect to the captured video clip; and
verifying that a time stamp in the metadata attached to the received video clip is identical to the acquired time stamp and that a time interval between the time indicated by the attached time stamp and a time at which the notification was received is equal to a length of the video clip plus a limited time margin.
2. The system of claim 1, wherein each of said one or a plurality of the capturing devices is configured to calculate a hash or a digital signature for the captured video clip or the attached metadata, and wherein the remote service unit is configured to verify the hash or the digital signature.
3. The system of claim 1, wherein the time stamp is based on a trusted clock source selected from the group consisting of a clock of the remote service unit, a clock of a trusted third party server, and a trusted hardware module with clock capability.
4. The system of claim 1, wherein the remote service unit is further configured to identify a falsified video clip, by computing camera movement of the capturing device using a computer vision technique applied on consecutive frames in the received video clip; comparing the computed camera movement with physical motion that was detected by one or a plurality of motion sensors of a capturing device that captured the captured video clip while the capturing device was capturing that video clip; and, if the recorded physical motion and the computed camera movement do not correlate, determining that the video clip is falsified.
5. The system of claim 1, wherein the remote service unit is further configured to determine that a scene recorded in the received video clip was captured from a digital display device by:
performing frequency analysis on one or a plurality of frames of the received video clip, and
detecting artifacts in said one or a plurality of frames that indicate that the scene recorded in said one or a plurality of frames was captured from a digital display device.
6. The system of claim 5, wherein the artifacts are identified as an energy level above a threshold for frequencies that are greater than a threshold in a vertical axis and greater than a threshold in a horizontal axis of a frequency analysis of said one or a plurality of frames.
7. The system of claim 5, wherein the artifacts are identified as echoes across a vertical axis or across a horizontal axis of a frequency analysis of said one or a plurality of frames.
8. The system of claim 1, wherein the remote service unit is further configured to determine that a video clip was digitally manipulated, by:
compressing and decompressing one or a plurality of frames of the received video clip;
subtracting each of said one or a plurality of frames from a corresponding frame of said compressed and decompressed one or a plurality of frames to obtain an error level analysis (ELA) image;
determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the ELA image is above a predetermined threshold.
9. The system of claim 1, wherein the remote service unit is further configured to determine whether a video clip is a product of digital manipulation combining video from different capturing sessions, by:
computing camera movement using a computer vision technique applied on consecutive frames in the received video clip;
for each pair of a first and second consecutive frames of the received video clip, applying a perspective transform on the first frame based on the computed camera movement for that pair of frames to produce a transformed first frame, and subtracting the second frame from the transformed first frame, thereby producing a second video clip;
accumulating frame data in the second video clip to produce a noise image; determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the noise image deviates from a value of the same attribute of other patches of the noise image above a predetermined threshold.
10. The system of claim 1, wherein the remote service unit is further configured to authenticate location data attached as metadata to the received video clip relating to a capturing device of said one or a plurality of capturing devices, by comparing location data for that device from two independent location data sources.
11. A service unit of a system for authenticating of video clips, the system including one or a plurality of capturing devices, each of the capturing devices configured to acquire a time stamp; capture a video clip of a scene and attach to the captured video clip metadata including the time stamp, transmit a notification that the capturing of the video clip has ended, and upload the captured video clip with the attached metadata to a remote service unit;
the service unit comprising a second processor and storage device and configured to:
receive the notification;
receive an uploaded video clip;
authenticate the received video clip by:
verifying that the received video clip is unedited with respect to the captured video clip; and
verifying that a time stamp in the metadata attached to the received video clip is identical to the acquired time stamp and that a time interval between the time indicated by the attached time stamp and a time at which the notification was received is equal to a length of the video clip plus a limited time margin.
12. The unit of claim 11, wherein each of said one or a plurality of the capturing devices is configured to calculate a hash or a digital signature for the captured video clip or the attached metadata, and wherein the remote service unit is configured to verify the hash or the digital signature.
13. The unit of claim 11, wherein the time stamp is based on a trusted clock source selected from the group consisting of a clock of the remote service unit, a clock of a trusted third party server, and a trusted hardware module with clock capability.
14. The unit of claim 11, further configured to identify a falsified video clip, by computing camera movement of the capturing device using a computer vision technique applied on consecutive frames in the received video clip; comparing the computed camera movement with physical motion that was detected by one or a plurality of motion sensors of a capturing device that captured the captured video clip while the capturing device was capturing that video clip; and, if the recorded physical motion and the computed camera movement do not correlate, determining that the video clip is falsified.
15. The unit of claim 11, further configured to determine that a scene recorded in the received video clip was captured from a digital display device by:
performing frequency analysis on one or a plurality of frames of the received video clip, and
detecting artifacts in said one or a plurality of frames that indicate that the scene recorded in said one or a plurality of frames was captured from a digital display device.
16. The unit of claim 15, wherein the artifacts are identified as an energy level above a threshold for frequencies that are greater than a threshold in a vertical axis and greater than a threshold in a horizontal axis of a frequency analysis of said one or a plurality of frames.
17. The unit of claim 15, wherein the artifacts are identified as echoes across a vertical axis or across a horizontal axis of a frequency analysis of said one or a plurality of frames.
18. The unit of claim 11, further configured to determine that a video clip was digitally manipulated, by:
compressing and decompressing one or a plurality of frames of the received video clip;
subtracting each of said one or a plurality of frames from a corresponding frame of said compressed and decompressed one or a plurality of frames to obtain an error level analysis (ELA) image;
determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the ELA image is above a predetermined threshold.
19. The unit of claim 11, further configured to determine whether a video clip is a product of digital manipulation combining video from different capturing sessions, by: computing camera movement using a computer vision technique applied on consecutive frames in the received video clip;
for each pair of a first and second consecutive frames of the received video clip, applying a perspective transform on the first frame based on the computed camera movement for that pair of frames to produce a transformed first frame, and subtracting the second frame from the transformed first frame, thereby producing a second video clip;
accumulating frame data in the second video clip to produce a noise image; determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the noise image deviates from a value of the same attribute of other patches of the noise image above a predetermined threshold.
20. The unit of claim 11, further configured to authenticate location data attached as metadata to the received video clip relating to a capturing device of said one or a plurality of capturing devices, by comparing location data for that device from two independent location data sources.
21. A method for secured capturing and authenticating of video clips, for use in cooperation with one or a plurality of capturing devices, each of the capturing devices configured to acquire a time stamp; capture a video clip of a scene and attach to the captured video clip metadata including the time stamp, transmit a notification that the capturing of the video clip has ended; upload the captured video clip with the attached metadata to a remote service unit, the method comprising:
receiving the notification;
receiving an uploaded video clip;
authenticating the received video clip by:
verifying that the received video clip is unedited with respect to the captured video clip; and
verifying that a time stamp in the metadata attached to the received video clip is identical to the acquired time stamp and that a time interval between the time indicated by the attached time stamp and a time at which the notification was received is equal to a length of the video clip plus a limited time margin.
22. The method of claim 21, wherein each of said one or a plurality of the capturing devices is configured to calculate a hash or a digital signature for the captured video clip or the attached metadata, and wherein the remote service unit is configured to verify the hash or the digital signature.
23. The method of claim 21, wherein the time stamp is based on a trusted clock source selected from the group consisting of a clock of the remote service unit, a clock of a trusted third party server, and a trusted hardware module with clock capability.
24. The method of claim 21, wherein the remote service unit is further configured to identify a falsified video clip, by computing camera movement of the capturing device using a computer vision technique applied on consecutive frames in the received video clip; comparing the computed camera movement with physical motion that was detected by one or a plurality of motion sensors of a capturing device that captured the captured video clip while the capturing device was capturing that video clip; and, if the recorded physical motion and the computed camera movement do not correlate, determining that the video clip is falsified.
25. The method of claim 21, further comprising determining that a scene recorded in the received video clip was captured from a digital display device by:
performing frequency analysis on one or a plurality of frames of the received video clip, and
detecting artifacts in said one or a plurality of frames that indicate that the scene recorded in said one or a plurality of frames was captured from a digital display device.
26. The method of claim 25, wherein the artifacts are identified as an energy level above a threshold for frequencies that are greater than a threshold in a vertical axis and greater than a threshold in a horizontal axis of a frequency analysis of said one or a plurality of frames.
27. The method of claim 25, wherein the artifacts are identified as echoes across a vertical axis or across a horizontal axis of a frequency analysis of said one or a plurality of frames.
28. The method of claim 21, further comprising determining that a video clip was digitally manipulated, by:
compressing and decompressing one or a plurality of frames of the received video clip;
subtracting each of said one or a plurality of frames from a corresponding frame of said compressed and decompressed one or a plurality of frames to obtain an error level analysis (ELA) image;
determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the ELA image is above a predetermined threshold.
29. The method of claim 21, further comprising determining whether a video clip is a product of digital manipulation combining video from different capturing sessions, by: computing camera movement using a computer vision technique applied on consecutive frames in the received video clip;
for each pair of a first and second consecutive frames of the received video clip, applying a perspective transform on the first frame based on the computed camera movement for that pair of frames to produce a transformed first frame, and subtracting the second frame from the transformed first frame, thereby producing a second video clip;
accumulating frame data in the second video clip to produce a noise image; determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the noise image deviates from a value of the same attribute of other patches of the noise image above a predetermined threshold.
30. The method of claim 21, further comprising authenticating location data attached as metadata to the received video clip relating to a capturing device of said one or a plurality of capturing devices, by comparing location data for that device from two independent location data sources.
31. An authentication method for identifying a falsified video clip, the method comprising:
recording physical motion detected by one or a plurality of motion sensors of a capturing device while the capturing device is capturing the video clip;
computing camera movement of the capturing device using a computer vision technique applied on consecutive frames in the video clip;
comparing the recorded physical motion with the computed camera movement; and if the recorded physical motion and the computed camera movement do not correlate, determining that the video clip is falsified.
32. An authentication method for determining that a scene recorded in an image was captured from a digital display device, the method comprising:
performing frequency analysis on the image, and
detecting artifacts in the frequency analysis that indicate that the scene recorded in the image is a scene of a digital display device.
33. The method of claim 32, wherein the image is a frame of a video clip and wherein the artifacts correspond to echoes across either a vertical frequency axis or a horizontal frequency axis in the frequency analysis.
34. The method of claim 32, wherein the image is a still photograph and wherein the artifacts correspond to an energy level above a threshold for frequencies that are greater than a threshold in a vertical axis and greater than a threshold in a horizontal axis of the frequency analysis.
35. An authentication method for determining that a video clip was digitally manipulated, the method comprising:
for each frame of the video clip,
compressing and decompressing one or a plurality of frames of the video clip;
subtracting each of said one or a plurality of frames from a corresponding frame of said compressed and decompressed one or a plurality of frames to obtain an error level analysis (ELA) image; determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the ELA image is above a predetermined threshold.
36. The method of claim 35, wherein the attribute is an energy level function.
37. An authentication method for determining whether a video clip is a product of digital manipulation combining video from different capturing sessions, the method comprising:
computing camera movement using a computer vision technique applied on consecutive frames in the video clip;
for each pair of a first and second consecutive frames of the video clip, applying a perspective transform on the first frame based on the computed camera movement for that pair of frames to produce a transformed first frame, and subtracting the second frame from the transformed first frame, thereby producing a second video clip;
accumulating frame data in the second video clip to produce a noise image; determining that the video clip was digitally manipulated if an attribute value of at least one patch of pixels in the noise image deviates from a value of the same attribute of other patches of the noise image above a predetermined threshold.
38. The method of claim 37, wherein the attribute is an energy level function.
39. The method of claim 37, wherein the predetermined threshold is computed using standard deviation.
40. A method for authenticating location data relating to a device, the method comprising:
comparing location data for that device from two independent location data sources.
41. The method of claim 40, wherein one of the two independent location sources is selected from the group of independent location sources consisting of GPS, WiFi, cellular network, Bluetooth, navigation beacon, and public Internet IP address.
42. The method of claim 40, wherein the two independent location sources comprise a first independent location source and a second location source that includes location information processed from location information from the first location source.
43. The method of claim 42, wherein the first location source is configured to provide longitude, latitude and altitude, and wherein the second location source is a trusted map service configured to provide altitude location data derived from longitude and latitude obtained from the first location source.
44. The method of claim 42, wherein the first location source is configured to provide longitude, latitude and a list of visible satellites, and wherein the second location source is a trusted source of a list of visible satellites for a given time, longitude and latitude.
PCT/IL2016/050679 2015-06-25 2016-06-26 System and method for secured capturing and authenticating of video clips WO2016207899A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
IL249739A IL249739A (en) 2015-06-25 2016-12-22 System and method for secured capturing and authenticating of video clips
IL253192A IL253192A0 (en) 2015-06-25 2017-06-27 System and method for secured capturing and authenticating of video clips

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562184261P 2015-06-25 2015-06-25
US62/184,261 2015-06-25

Publications (1)

Publication Number Publication Date
WO2016207899A1 true WO2016207899A1 (en) 2016-12-29

Family

ID=57586311

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2016/050679 WO2016207899A1 (en) 2015-06-25 2016-06-26 System and method for secured capturing and authenticating of video clips

Country Status (3)

Country Link
AR (1) AR105169A1 (en)
IL (2) IL249739A (en)
WO (1) WO2016207899A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070049250A1 (en) * 2005-08-23 2007-03-01 Agere Systems, Inc. Authenticating data units of a mobile communications device
US20120317420A1 (en) * 2010-02-23 2012-12-13 Fujitsu Advanced Engineering Limited Electronic signature device and electronic signature method
US20110311095A1 (en) * 2010-06-18 2011-12-22 Verizon Patent And Licensing, Inc. Content fingerprinting
CA2919749A1 (en) * 2013-07-31 2015-02-05 Salud Martinez Monreal Method implemented by computer for capturing evidentiary audiovisual and/or multimedia information and computer program
WO2015024603A1 (en) * 2013-08-23 2015-02-26 Nec Europe Ltd. Method and system for authenticating a data stream
US20160087806A1 (en) * 2014-09-19 2016-03-24 STEVEN Thomas GENTER Mobile device audio/video legal application software

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018127809A3 (en) * 2017-01-05 2018-09-27 Serelay Limited Verification of data captured by a consumer electronic device
US10678780B2 (en) * 2017-01-05 2020-06-09 Serelay Limited Cross verification of data captured by a consumer electronic device
US10820034B2 (en) 2017-05-26 2020-10-27 At&T Intellectual Property I, L.P. Providing streaming video from mobile computing nodes
US11563996B2 (en) 2017-05-26 2023-01-24 At&T Intellectual Property I, L.P. Providing streaming video from mobile computing nodes
US11128906B2 (en) 2017-05-26 2021-09-21 At&T Intellectual Property I, L.P. Providing streaming video from mobile computing nodes
US10754939B2 (en) 2017-06-26 2020-08-25 International Business Machines Corporation System and method for continuous authentication using augmented reality and three dimensional object recognition
CN107580196B (en) * 2017-07-28 2020-01-21 国创科视科技股份有限公司 Video data sharing system and method
CN107580196A (en) * 2017-07-28 2018-01-12 国创科视科技股份有限公司 A kind of video data shared system and method
CN108616720B (en) * 2018-03-06 2020-02-07 西安大衡天成信息科技有限公司 Multi-station frequency spectrum monitoring data compression processing method
CN108616720A (en) * 2018-03-06 2018-10-02 西安大衡天成信息科技有限公司 A kind of more station spectrum monitoring data compressing methods
CN109614203A (en) * 2018-12-08 2019-04-12 公安部第三研究所 A kind of Android application cloud data evidence-taking and analysis system and method based on application data simulation
CN109614203B (en) * 2018-12-08 2023-10-27 公安部第三研究所 Android application cloud data evidence obtaining and analyzing system and method based on application data simulation
US11271722B2 (en) 2019-05-07 2022-03-08 Nxp B.V. Apparatuses and methods involving authentication of radar-based digital data stream using cryptographic hashing
EP3809668A1 (en) * 2019-10-14 2021-04-21 Sandvine Corporation System and method for monitoring and managing video stream content
US11431638B2 (en) 2019-10-14 2022-08-30 Sandvine Corporation System and method for monitoring and managing video stream content
US11743195B2 (en) 2019-10-14 2023-08-29 Sandvine Corporation System and method for monitoring and managing video stream content
EP4088206A4 (en) * 2020-01-08 2024-01-10 Disney Entpr Inc Content authentication based on intrinsic attributes
CN111709459A (en) * 2020-05-27 2020-09-25 长春博立电子科技有限公司 Cloud platform-based machine vision algorithm training data management system and method
WO2022192811A1 (en) * 2021-03-09 2022-09-15 Qualcomm Incorporated Sensor-based image verification
US11877059B2 (en) 2021-03-09 2024-01-16 Qualcomm Incorporated Sensor-based image verification

Also Published As

Publication number Publication date
IL253192A0 (en) 2017-08-31
AR105169A1 (en) 2017-09-13
IL249739A0 (en) 2017-02-28
IL249739A (en) 2017-06-29

Similar Documents

Publication Publication Date Title
WO2016207899A1 (en) System and method for secured capturing and authenticating of video clips
US11532334B2 (en) Forensic video recording with presence detection
US11245676B2 (en) Security for scene-based sensor networks, with privacy management system
US11922532B2 (en) System for mitigating the problem of deepfake media content using watermarking
US10019773B2 (en) Authentication and validation of smartphone imagery
Piva An overview on image forensics
CN110046649B (en) Multimedia information monitoring method, device and system based on block chain
EP3234904B1 (en) Method and apparatus for publishing locational copyrighted watermarking video
JP2010258645A (en) Method and device for embedding digital watermark
CN110659604A (en) Video detection method, device, server and storage medium
US9473745B2 (en) System and method for providing live imagery associated with map locations
CN105847729B (en) Beidou web camera with preservation of evidence function
US10891702B2 (en) Duplicate image evidence management system for verifying authenticity and integrity
US20170287187A1 (en) Method of generating a synthetic image
WO2020255628A1 (en) Image processing device, and image processing program
US11755758B1 (en) System and method for evaluating data files
CN114913470B (en) Event detection method and device
JP2016122892A (en) Video system
EP3718300B1 (en) Digital ledger camera and image functions
CN110798656A (en) Method, device, medium and equipment for processing monitoring video file
US20170109596A1 (en) Cross-Asset Media Analysis and Processing
Lei et al. Research on live forensics in cloud environment
WO2020057351A1 (en) Secured data uploading method and device, and client
Meshram et al. Video forensic for video tamper detection
KR20190100844A (en) Computer program for preventing information spill displayed on display device and security service using the same

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 249739

Country of ref document: IL

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16813861

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16813861

Country of ref document: EP

Kind code of ref document: A1