This blog introduces the first step of building an AI video analytics product: understanding where to begin.
For a video analytics project, the dataset is video. In this blog we will learn about:
- Technical features of videos and Python libraries to decode those features.
- Pre-processing techniques used for video data processing.
- The significance of high FPS and low FPS in a video.
So, let us begin by understanding the technical features of a video.
Introduction to Video data
Video files have a range of technical features that define their quality, compatibility, and overall viewing experience. Here’s a list of the most common technical features and a brief definition:
Video features:
1. Resolution: Indicates the number of pixels in each dimension that can be displayed. Common resolutions include 480p, 720p, 1080p, 4K.
2. Aspect Ratio: The ratio of the width to the height of the video frame. Common aspect ratios are 16:9 (widescreen), 4:3 (standard), and 21:9 (cinematic).
3. Frame Rate (FPS): Frames per second (FPS) is the number of individual frames or images displayed per second in the video. Common frame rates include 24, 30, and 60 FPS.
4. Bitrate: The amount of data processed per unit of time in a video, usually measured in megabits per second (Mbps). A higher bitrate generally means better video quality. Higher bitrates require more bandwidth to stream smoothly, while lower bitrates make video more accessible over slower internet connections at the cost of reduced quality (see the worked example after this list).
5. Codec: The method for compressing and decompressing video and audio data. Encoding involves transforming video data into a format suitable for storage or transmission, using codecs like H.264, H.265, AV1 or VP9.
6. Container Format: The file format that contains the video, audio, and metadata. Examples include MP4, AVI, MOV, and MKV.
7. Color Depth: The number of bits used to represent the colour of a single pixel. Common colour depths include 8-bit, 10-bit, and 12-bit.
8. Chroma Subsampling: A method that describes how colour information is stored and compressed. Common formats include 4:4:4, 4:2:2, and 4:2:0.
9. Video Length/Duration: The total running time of the video.
10. HDR (High Dynamic Range): Indicates whether the video has a higher contrast ratio and colour depth, allowing for more realistic images.
11. File Size: The total digital size of the video file, usually measured in megabytes (MB) or gigabytes (GB).
12. Interlaced vs. Progressive Scan: Describes how the video is rendered — interlaced videos display alternate lines per frame, while progressive scan displays every line in each frame.
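As a rough illustration of how bitrate relates to file size and duration, here is a minimal sketch in Python. The numbers are taken from the sample output later in this post and are only for illustration:

file_size_bytes = 222.77 * 1024 * 1024   # a file of roughly 222.77 MB
duration_seconds = 130.63                # about 2 minutes 10 seconds

# Bitrate = total bits divided by duration
bitrate_mbps = (file_size_bytes * 8) / (1024 * 1024) / duration_seconds
print(f"Approximate bitrate: {bitrate_mbps:.2f} Mbps")  # ~13.64 Mbps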
Audio features:
1. Audio Channels: The number of separate audio channels. For example, stereo (2 channels), 5.1 surround sound (6 channels), or 7.1 surround sound (8 channels).
2. Audio Codec: The technology used to compress and decompress audio data. Examples include AAC, MP3, FLAC, and Dolby Atmos.
3. Audio Bitrate: Similar to video bitrate, this is the amount of audio data processed per second, impacting audio quality.
4. Subtitle Tracks: Information about the number and types of subtitle tracks included in the video file, which can be in various formats like SRT, ASS, or embedded.
Scripts to extract technical features of a video
To extract the technical features of a video file using Python, you will need OpenCV and pymediainfo. If you haven’t installed them yet, you can do so via pip:
Installations:
pip install opencv-python
pip install pymediainfo
Test video data:
OpenCV Script:
The script below uses OpenCV to extract the features of a video. Features like Color Depth, HDR, and Chroma Subsampling require additional libraries, which the MediaInfo script covers later.
import os
import cv2

def extract_video_features(video_path):
    # Open the video file
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        return "Could not open video file"

    # Extracting video properties
    width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
    height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    duration = frame_count / fps

    # Decode the FourCC code into a human-readable codec name
    codec_code = int(cap.get(cv2.CAP_PROP_FOURCC))
    codec = ''.join([chr((codec_code >> 8 * i) & 0xFF) for i in range(4)])

    # bitrate = cap.get(cv2.CAP_PROP_BITRATE)  # Note: This may not be supported in all environments
    # Estimate the bitrate from file size and duration instead
    file_size_bytes = os.path.getsize(video_path)
    file_size_bits = file_size_bytes * 8  # 1 byte = 8 bits
    bitrate_mbps = (file_size_bits / (1024 * 1024)) / duration
    file_size_mbs = file_size_bytes / (1024 * 1024)
    container_format = os.path.splitext(video_path)[1]

    # Release the video capture object
    cap.release()

    # Return the extracted information
    return {
        "Resolution": f"{int(width)}x{int(height)}",
        "Frame Rate (FPS)": fps,
        "Total Frames": frame_count,
        "Duration (seconds)": duration,
        "Codec": codec,
        "Bitrate (mbps)": bitrate_mbps,
        "File Size (MB)": file_size_mbs,
        "Container Format": container_format
    }
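Calling the function on the test video (the path below is a placeholder for your own file) returns a dictionary of features:

# Example usage
video_info = extract_video_features("path_to_your_video.mp4")
print(video_info)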
Output:
{'Resolution': '1920x1080',
'Frame Rate (FPS)': 30.0,
'Total Frames': 3919,
'Duration (seconds)': 130.63333333333333,
'Codec': 'h264',
'Bitrate (mbps)': 13.642249996791723,
'File Size (MB)': 222.76657390594482,
'Container Format': '.mp4'}
MediaInfo Script:
MediaInfo specializes in media file analysis and exposes fields that OpenCV does not, such as colour depth, HDR, and chroma subsampling.
from pymediainfo import MediaInfo
import os

def extract_video_features_mediainfo(video_path):
    # Extract media info from the file
    media_info = MediaInfo.parse(video_path)
    video_tracks = [track for track in media_info.tracks if track.track_type == "Video"]

    # Check if video track is available
    if not video_tracks:
        return "No video track found"

    # Assuming first video track contains the main video information
    video_track = video_tracks[0]

    # Extracting properties
    resolution = f"{video_track.width}x{video_track.height}"
    aspect_ratio = video_track.display_aspect_ratio
    fps = video_track.frame_rate
    total_frames = video_track.frame_count
    duration = video_track.duration / 1000  # converting milliseconds to seconds
    codec = video_track.codec_id_hint or video_track.codec_id or video_track.format
    color_depth = video_track.bit_depth
    hdr = "Yes" if video_track.hdr_format else "No"
    chroma_subsampling = video_track.chroma_subsampling

    # File size in MB
    file_size_mbs = os.path.getsize(video_path) / (1024 * 1024)

    # Container format
    container_format = os.path.splitext(video_path)[1]

    # Return the extracted information
    return {
        "Resolution": resolution,
        "Aspect Ratio": aspect_ratio,
        "Frame Rate (FPS)": fps,
        "Total Frames": total_frames,
        "Duration (seconds)": round(duration, 2),
        "Codec": codec,
        "File Size (MB)": round(file_size_mbs, 2),
        "Container Format": container_format,
        "Color Depth": color_depth,
        "HDR": hdr,
        "Chroma Subsampling": chroma_subsampling
    }
# Example usage
video_info = extract_video_features_mediainfo("path_to_your_video.mp4")
print(video_info)
Output:
{'Resolution': '1920x1080',
'Aspect Ratio': '1.778',
'Frame Rate (FPS)': '30.000',
'Total Frames': '3919',
'Duration (seconds)': 130.63,
'Codec': 'avc1',
'File Size (MB)': 222.77,
'Container Format': '.mp4',
'Color Depth': 8,
'HDR': 'No',
'Chroma Subsampling': '4:2:0'}
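The same approach can be extended to the audio features listed earlier in this post. Below is a minimal sketch, assuming the file has an audio track; the attribute names (channel_s, bit_rate, sampling_rate) are based on how pymediainfo maps MediaInfo fields, and some of them may be None for certain files:

from pymediainfo import MediaInfo

def extract_audio_features_mediainfo(video_path):
    media_info = MediaInfo.parse(video_path)
    audio_tracks = [track for track in media_info.tracks if track.track_type == "Audio"]
    if not audio_tracks:
        return "No audio track found"

    # Assuming the first audio track is the main one
    audio_track = audio_tracks[0]
    return {
        "Audio Codec": audio_track.format,             # e.g. AAC, MP3
        "Audio Channels": audio_track.channel_s,       # e.g. 2 for stereo
        "Audio Bitrate (bps)": audio_track.bit_rate,   # may be None for some files
        "Sampling Rate (Hz)": audio_track.sampling_rate
    }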
Different Labeling Conventions for Codec:
- OpenCV typically uses the more common, user-friendly names for codecs. In this case, it’s using ‘h264’, which is the widely recognized name for the H.264 codec.
- MediaInfo, on the other hand, reports the codec’s FourCC code or a more technical identifier. ‘avc1’ is the identifier commonly used for H.264 streams in MP4 containers, so it refers to essentially the same codec.
What is meant by the Aspect Ratio of a video:
An aspect ratio of 1.778 for a video essentially means that the width of the video is 1.778 times its height.
In more practical terms:
- If the video is 16 inches wide, it will be 9 inches high.
- If the video is 1920 pixels wide, it will be 1080 pixels high (as 1920/1080 is the same ratio as 16/9).
- Here is the formula: Aspect Ratio = Width of the Video Frame / Height of the Video Frame
- It’s important to note that the aspect ratio is a measure of the shape of the video frame, not its size.
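Here is a small sketch of how you might compute both the decimal aspect ratio and its reduced form (e.g. 16:9) from a frame’s width and height; the helper name is just for illustration:

from math import gcd

def aspect_ratio(width, height):
    # Decimal ratio (what MediaInfo reported as 1.778)
    decimal = width / height
    # Reduced integer ratio, e.g. 1920x1080 -> 16:9
    divisor = gcd(width, height)
    return round(decimal, 3), f"{width // divisor}:{height // divisor}"

print(aspect_ratio(1920, 1080))  # (1.778, '16:9')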
For reference, digital storage units follow the hierarchical structure below, which is what the file-size and bitrate calculations above rely on.
- 1 Byte = 8 Bits
- 1 Kilobyte (KB) = 1,024 Bytes
- 1 Megabyte (MB) = 1,024 Kilobytes (KB) = 1,048,576 Bytes
- 1 Gigabyte (GB) = 1,024 Megabytes (MB) = 1,073,741,824 Bytes
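As a quick sanity check of these conversions, here is a minimal sketch; the byte count is a hypothetical value chosen to roughly match the sample output above:

file_size_bytes = 233_588_000            # hypothetical size reported by os.path.getsize()
file_size_kb = file_size_bytes / 1024
file_size_mb = file_size_kb / 1024
print(f"{file_size_mb:.2f} MB")          # ~222.77 MB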
Pre-processing techniques used for video data processing
Pre-processing techniques in video processing are crucial for preparing raw video data for further analysis or enhancement. These techniques aim to improve the quality of the video or to standardize its format, making it more suitable for downstream tasks like analysis, compression, or editing. Here are some common pre-processing techniques used in video processing:
1. Noise Reduction:
— Removes unwanted noise or graininess from the video. Common techniques include temporal noise reduction (which looks across frames) and spatial noise reduction (which looks within a single frame).
2. Video Stabilization:
— Compensates for unwanted camera movements to produce smoother footage. This can be achieved through techniques like optical flow analysis or gyro-based stabilization.
3. Color Correction and Grading:
— Adjusts the colour balance and tones in the video to correct for lighting conditions or to achieve a desired aesthetic. Techniques include white balance correction, colour grading, and histogram equalization.
4. Deinterlacing:
— Converts interlaced video (common in standard-definition TV broadcasts) into a progressive format by interpolating additional lines, reducing flicker and improving clarity.
5. Resolution Upscaling:
— Increases the resolution of the video through algorithms that interpolate additional pixels. Advanced methods might use AI-based techniques for more natural-looking results.
6. Aspect Ratio Conversion:
— Changes the aspect ratio of a video to fit different display formats, often involving cropping or letterboxing/pillarboxing (adding black bars to the top/bottom or sides).
7. Frame Rate Conversion:
— Adjusts the frame rate of the video, either by adding or removing frames. Techniques include frame interpolation for increasing the frame rate and decimation for decreasing it.
8. Contrast Enhancement:
— Adjusts the contrast levels to make the video appear clearer, often by stretching the histogram of pixel intensity.
9. Motion Detection and Compensation:
— Detects and compensates for motion within video frames, which can be essential for subsequent tasks like object tracking or motion-based compression.
10. Format Standardization:
— Converts the video into a standard format or codec, ensuring compatibility with various downstream processing or display technologies.
11. Lens Distortion Correction:
— Corrects distortions caused by the camera lens, such as fisheye or barrel distortion, especially important in wide-angle or action camera footage.
12. Image Sharpening:
— Enhances the edges and details in the video frames, making the video appear more crisp and defined.
These pre-processing steps are often tailored to the specific requirements of the application or the condition of the source video. The choice of algorithms and the extent of processing can significantly impact the quality and usability of the final video output; a short sketch of a few of these steps follows below.
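Below is a minimal OpenCV sketch of three of the techniques above: spatial noise reduction, contrast enhancement via histogram equalization, and resizing every frame to a standard resolution. The parameter values and function names are illustrative choices, not the only way to do it:

import cv2

def preprocess_frame(frame):
    # Noise reduction (spatial): a light Gaussian blur within the frame
    denoised = cv2.GaussianBlur(frame, (5, 5), 0)

    # Contrast enhancement: histogram equalization on the luminance channel
    yuv = cv2.cvtColor(denoised, cv2.COLOR_BGR2YUV)
    yuv[:, :, 0] = cv2.equalizeHist(yuv[:, :, 0])
    enhanced = cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR)

    # Format standardization: resize every frame to 1280x720
    resized = cv2.resize(enhanced, (1280, 720), interpolation=cv2.INTER_AREA)
    return resized

cap = cv2.VideoCapture("path_to_your_video.mp4")
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    processed = preprocess_frame(frame)
    # ... pass 'processed' to your analysis pipeline here
cap.release()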
Significance of High FPS and Low FPS of a Video
High FPS (e.g., 60 FPS, 120 FPS):
- Smooth motion, particularly noticeable in fast-moving scenes or action sequences.
- Improved clarity, with reduced motion blur.
- Enhanced realism.
- Larger file size.
- Greater processing power needed: requires more computational power and better hardware capabilities.
Low FPS (e.g., 24 FPS, 30 FPS):
- Cinematic feel: the traditional 24 FPS is standard in cinema and gives movies their characteristic ‘film look’; many viewers associate this frame rate with high-quality storytelling.
- Reduced smoothness.
- Increased motion blur.
- Smaller file size.
- Lower hardware requirements.
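For analytics workloads you often do not need every frame. Here is a minimal sketch of frame decimation with OpenCV: reading a video at its native frame rate but keeping only roughly a target number of frames per second. The target value and path are placeholders:

import cv2

def sample_frames(video_path, target_fps=5):
    """Yield frames at roughly target_fps instead of the native frame rate."""
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS)
    step = max(1, round(native_fps / target_fps))  # keep every Nth frame

    index = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        if index % step == 0:
            yield frame
        index += 1
    cap.release()

# Example: a 30 FPS video sampled at ~5 FPS keeps every 6th frame
for frame in sample_frames("path_to_your_video.mp4", target_fps=5):
    pass  # run your model or pre-processing on 'frame' here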
Conclusion:
Balance is key when choosing the FPS of your data: stay aware of your computational power and hardware capabilities to achieve the best results for your product.
I hope this blog is helpful for anyone who wants to understand the basics of video data and its processing.
Readers are welcome to contribute to this blog series by suggesting more topics to explore in the comments section.
Feel free to follow my Medium page, where I write about exciting work on computer vision and AI.