Searching through videos has become a lot easier
Detect rusted areas of a bridge or a tower
Creating a grid-like heatmap can also show us the level of rust in each part of the bridge
You can use one classifier to detect the bridge in a large set of images, another to check whether it’s rusted, and a third to determine the level of rust.
A person on the ground takes multiple high-resolution images from multiple places. A computer vision expert splits each of these images into a grid of smaller images. Each of the smaller images is passed to a custom classifier that detects the presence of the metal structure versus other, non-metal structures. After this, the images containing metal are passed through another custom classifier that is trained to detect the presence of rust.
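The tile-then-classify workflow above, together with the grid-like heatmap idea, can be sketched roughly as follows. This is only an illustration under stated assumptions: `is_metal` and `rust_score` are hypothetical placeholders standing in for trained classifiers (here they are just crude colour rules), the image is a small synthetic array, and the layout is (3, rows, columns).

```python
import numpy as np

def split_into_tiles(image, tile_rows, tile_cols):
    """Split a (3, rows, cols) image into a grid of smaller images."""
    tiles = {}
    _, n_rows, n_cols = image.shape
    for r in range(0, n_rows, tile_rows):
        for c in range(0, n_cols, tile_cols):
            grid_pos = (r // tile_rows, c // tile_cols)
            tiles[grid_pos] = image[:, r:r + tile_rows, c:c + tile_cols]
    return tiles

def is_metal(tile):
    # Hypothetical placeholder for a trained metal vs. non-metal classifier.
    return tile.mean() > 50

def rust_score(tile):
    # Hypothetical placeholder for a trained rust classifier.
    # Returns the fraction of pixels whose red channel clearly dominates.
    red, green, blue = (channel.astype(int) for channel in tile)
    return float(np.mean((red > green + 40) & (red > blue + 40)))

def rust_heatmap(image, tile_rows, tile_cols):
    """One value per tile: 0 for non-metal tiles, otherwise the rust score."""
    tiles = split_into_tiles(image, tile_rows, tile_cols)
    grid_rows = max(i for i, _ in tiles) + 1
    grid_cols = max(j for _, j in tiles) + 1
    heatmap = np.zeros((grid_rows, grid_cols))
    for (i, j), tile in tiles.items():
        if is_metal(tile):
            heatmap[i, j] = rust_score(tile)
    return heatmap

# Synthetic 4x4 image (made-up colours, not real data).
RUST = np.array([180, 80, 40], dtype=np.uint8)
image = np.zeros((3, 4, 4), dtype=np.uint8)    # starts all black (non-metal)
image[:, 0:2, 0:2] = 120                       # top-left tile: grey metal
image[:, 0:2, 2:4] = RUST.reshape(3, 1, 1)     # top-right tile: fully rusted
image[:, 2:4, 2:4] = 120                       # bottom-right tile: grey metal...
image[:, 2, 2:4] = RUST.reshape(3, 1)          # ...with one rusted row

heatmap = rust_heatmap(image, tile_rows=2, tile_cols=2)
print(heatmap)
```

In a real pipeline the two placeholder functions would be replaced by trained models; the tiling and the per-tile heatmap would stay the same.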
Current Active Research in Computer Vision:
Computer Vision (just like much of AI) can be divided into two parts → Building the model, and then deploying that model
A digital image is a rectangular array of numbers, where the height is the number of rows and the width is the number of columns → Each pixel is stored as a value in the matrix
It is composed of pixels, each represented by intensity values between 0 and 255.
Intensity values determine shades of gray in a grayscale image.
As intensity goes down, i.e. from 255 → 0, the “brightness” of that pixel goes down.
255 would be max brightness, or white (in grayscale), whereas 0 would be black
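A tiny grayscale image written out as a matrix makes this concrete. The sketch below assumes NumPy and 8-bit pixels (values 0 to 255); the pixel values are made up for illustration.

```python
import numpy as np

# A 3x4 grayscale image: 3 rows (height) and 4 columns (width).
# Each entry is one pixel's intensity: 0 is black, 255 is white.
gray = np.array(
    [
        [0, 64, 128, 255],
        [0, 64, 128, 255],
        [0, 64, 128, 255],
    ],
    dtype=np.uint8,
)

rows, cols = gray.shape          # number of rows, number of columns
pixel = gray[1, 2]               # pixel at row 1, column 2 (a mid grey)
darkest, brightest = gray.min(), gray.max()

print(rows, cols, pixel, darkest, brightest)
```

Indexing is always row first, then column, which matches the "rows = height, columns = width" convention.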
Coloured images, or RGB images, are like a cube. They have 3 channels (Red, Green, Blue), which are combined to store the coloured image
Instead of a single matrix, as for a grayscale image, we now have one matrix per channel. So it’ll be a 3D tensor of shape (3, n, d), where n = number of rows and d = number of columns.
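As a rough sketch of the (3, n, d) layout, assuming NumPy and 8-bit channels: the example builds a small pure-red image and also shows the channels-last layout (n, d, 3), which some tools (for example Matplotlib's `imshow`) expect instead.

```python
import numpy as np

n, d = 2, 3                                   # rows, columns
rgb = np.zeros((3, n, d), dtype=np.uint8)     # channels-first: (3, n, d)
rgb[0] = 255                                  # fill the red channel -> pure red image

red, green, blue = rgb                        # each channel is an (n, d) matrix

# Same data in channels-last order: (n, d, 3)
channels_last = np.transpose(rgb, (1, 2, 0))

print(rgb.shape, red.shape, channels_last.shape)
print(channels_last[0, 0])                    # the (R, G, B) values of the top-left pixel
```

Both layouts hold exactly the same numbers; only the order of the axes differs, so it is worth checking which one a given library expects.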
Videos can be broken down into frames, which are also just images.
For example, the 5 frames below are from 1 video.
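In array terms, a video is then just a stack of image matrices with an extra "frame" axis in front. The sketch below assumes NumPy and uses 5 synthetic grayscale frames that get steadily brighter, not frames decoded from a real video file (decoding would need a separate video library, which is not shown here).

```python
import numpy as np

num_frames, n, d = 5, 4, 6    # 5 frames, each with 4 rows and 6 columns

# Synthetic stand-in for a decoded video: frame t has constant intensity 50 * t.
video = np.stack(
    [np.full((n, d), 50 * t, dtype=np.uint8) for t in range(num_frames)]
)

print(video.shape)            # (frames, rows, columns)

# Each frame is an ordinary grayscale image and can be processed on its own.
for t, frame in enumerate(video):
    print(t, frame.shape, frame.mean())
```

For colour video the same idea applies with one more axis for the 3 channels.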
Images can be stored as .png and .jpeg files