Edge Detection¶

Key Idea? Look for strong gradients, and then post-process.

How are edges caused?¶

Due to discontinuties in:

  1. Surface Normal
  2. Surface Color/Appearance
  3. Depth
  4. Illumination
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import datasets, misc
from PIL import Image
from scipy.ndimage import convolve, correlate, gaussian_filter, filters, laplace
import cv2 as cv

Egdes look like Steep Cliffs in images represented as a function¶

In [3]:
im = np.zeros((5,5))
for i in range(5):
    im[i, 2] = 1
f,ax = plt.subplots(1, 2)
ax[0].imshow(im, cmap='gray')
ax[0].title.set_text("Image")
ax[1].plot(im[0])
ax[1].title.set_text("Intensity Function")

Derivatives with Convolution¶

For discrete data, we can approximate using finite differences: $\frac{\partial{f(x,y)}}{\partial{x}} \approx \frac{f(x+1, y) - f(x, y)}{1}$

Example of such a filter: $\begin{bmatrix}-1 & 1\end{bmatrix}$, $\begin{bmatrix}-1 \\ 1\end{bmatrix}$, $\begin{bmatrix}1 \\ -1\end{bmatrix}$

In [28]:
derivative_filter = np.matrix([-1, 1])
derivative_of_image = convolve(im, derivative_filter)
plt.plot(derivative_of_image[0])
Out[28]:
[<matplotlib.lines.Line2D at 0x751d345827d0>]

The Extrema correspond to the edges

Standard Edge Detection Filters¶

In [4]:
ascent = datasets.ascent().astype('int32')
In [4]:
plt.imshow(ascent, cmap='gray')
Out[4]:
<matplotlib.image.AxesImage at 0x751d384299f0>

Sobel/Roberts/Prewitt Edge Detection¶

Prewitt (Vertical Version): $\begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}$
Sobel (Vertical Version): $\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$
Roberts (Diagnol Edges Detection): $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$

The above given version of Prewitt & Sobel detect edges going from black to white. If the 1st and 3rd columns are interchanged the filters would detect edges going from white to black
Assumption: $0 \rightarrow black$, $255 \rightarrow white$

In [5]:
prewitt = np.matrix([[-1,0,1], [-1,0,1], [-1,0,1]])
sobel = np.matrix([[1,0,-1], [2,0,-2], [1, 0, -1]])
roberts = np.matrix([[0, 1], [-1, 0]])
In [6]:
sobel_v = convolve(ascent, sobel)
sobel_h = convolve(ascent, sobel.T)
magnitude = np.sqrt(sobel_v**2 + sobel_h**2)
magnitude *= 255 / np.max(magnitude) # To bring the max of magnitude to 255
In [7]:
f, ax = plt.subplots(2,2)
ax[0,0].imshow(ascent, cmap='gray')
ax[0,0].title.set_text("Original")
ax[0,1].imshow(sobel_h, cmap='gray')
ax[0,1].title.set_text("Horizontal")
ax[1,0].imshow(sobel_v, cmap='gray')
ax[1,0].title.set_text("Vertical")
ax[1,1].imshow(magnitude, cmap='gray')
ax[1,1].title.set_text("Magnitude")

Image Graidents¶

  • Graident of an Image: $\nabla{f} = [\frac{\partial{f}}{\partial{x}}, \frac{\partial{f}}{\partial{y}}]$
  • If $\nabla{f} = [\frac{\partial{f}}{\partial{x}}, 0]$, then there is a no change in the image intensity in the y-direction and vice versa.
  • Graident/Edge Strength: $\sqrt{(\frac{\partial{f}}{\partial{x}})^2 + (\frac{\partial{f}}{\partial{y}})^2}$
  • Orientation of Gradient: $\theta = tan^{-1}({\frac{\frac{\partial{f}}{\partial{x}}} {\frac{\partial{f}}{\partial{y}}}})$

Effect of Noise?¶

Possible Solution:

    1. Smooth First
    1. Look for Peaks in $\frac{d(f * g)}{dx}$
In [93]:
x = np.linspace(-5, 5, 256)
denoised_y = 1/(1+np.exp(-x))
noise = np.random.normal(0,0.01,256)
y  = denoised_y + noise
to_image = y.reshape(16, 16)
post_gaussian = gaussian_filter(to_image, sigma=2)
to_function = post_gaussian.reshape(256,)
f, ax = plt.subplots(2, 2)
ax[0,0].plot(x, y)
ax[0,0].title.set_text("Image function with Noise")
ax[0,1].imshow(to_image, cmap='gray')
ax[0,1].title.set_text("Image with Noise")
ax[1,0].imshow(post_gaussian, cmap='gray')
ax[1,0].title.set_text("Image De-noised")
ax[1,1].plot(x, to_function)
ax[1,1].title.set_text("Image function post-smoothing")

We can now take the derivative of the function to detect edges.

Derivative Theorem of Convolution¶

Differentiation is acheived through convolution, and convolution is associative. Since, we know that gradient calculation can also be written as a convoluton operation

$$ \frac{d(f*g)}{dx} = f * \frac{d(g)}{dx} = g * \frac{d(f)}{dx} $$

This saves us an operation, because we can pre-calculate the derivative of the Gaussian.

In [113]:
f, ax = plt.subplots(1,2)
f.set_figwidth(15)
ax[0].plot(x, np.gradient(denoised_y))
ax[0].title.set_text("Image convolved by Gaussian")
ax[1].plot(x, np.gradient(np.gradient(denoised_y)))
ax[1].title.set_text("Image convolved by Laplacian")

The zero-crossing in the image convolved by the second-derivative indicates edge.

Gaussians and its derivatives¶

$\nabla^2{f} = \frac{\partial^2{f}}{\partial{x^2}} + \frac{\partial^2{f}}{\partial{y^2}}$

In [147]:
from scipy import signal
gs = signal.gaussian(49, 7)
dr = np.gradient(gs)
la = np.gradient(dr)

f, ax = plt.subplots(1, 3)
f.set_figheight(5)
f.set_figwidth(20)
ax[0].plot(gs)
ax[0].title.set_text("Gaussian")

ax[1].plot(dr)
ax[1].title.set_text("Derivative")

ax[2].plot(la)
ax[2].title.set_text("Laplacian")

Criteria for Good Detector¶

  1. Good Detection: Find real edges ignoring noise or other artifacts
  2. Good Localization: Detect edges as close as possible to true edges
  3. Single Response: Return one point only for each true edge point

Non-Maxima Supression¶

image.png

Assume the orange curve is the edge of an image and is full of noise, we would supress that noise by:

  • Check if a pixel is the local maximum along the graident direction.
  • May require checking the intensities of interpolatex pixels, as drawn in the image.
  • My understanding is this process would result in us picking out 1 pixel over a group of pixels(as shown in the right image)

Hysterisis Thresholding¶

A technique to handle discontinuities.
Check for well-connected egdes. How?
Use high-threshold to start edges curves and low threshold to continue them

image.png

  • If gradient at pixel > 'High' = Edge Pixel
  • If gradient at pixel < 'Low' = Non-Edge Pixel
  • If gradient at pixel >= 'Low' and <= 'High' = Edge Pixel iff it is connected to an edge pixel directly or via pixels between 'low' and 'high'.

Canny Edge Detection¶

  1. Filter the image with derivative of Gaussian
  2. Find Maginitude and Orientation of Gradient
  3. Non-Maxima Supression
  4. Do Hysterisis (AKA Double Thresholding and Edge Tracking)
In [135]:
canny_edges = cv.Canny(np.uint8(ascent),100,200, apertureSize=3)
f, ax = plt.subplots(1, 3)
f.set_figwidth(15)
ax[0].imshow(magnitude, cmap='gray')
ax[0].title.set_text("Sobel Filter")
ax[1].imshow(canny_edges, cmap='gray')
ax[1].title.set_text("Canny Edges")
ax[2].imshow(canny_edges)
Out[135]:
<matplotlib.image.AxesImage at 0x751d1c744280>

Effect of $\sigma$ in Canny Edge Detector¶

  • Large $\sigma$ detects large-scale edges
  • Small $\sigma$ detects fine edges

Straight-line Detection using Canny edges¶

  1. Compute Canny Edges
  2. Assign each edge pixel to one of the 8 directions
  3. For each direction 'd', get edgelets.
    • Find connected components for edge pixels with directions {d-1, d, d+1}
  4. For each edgelet, compute the straightness and orientation using the second moment matrix M of the edge pixels
$$ M = \begin{bmatrix} \Sigma(x-\mu_x)^2 & \Sigma(x-\mu_x)(y - \mu_y) \\ \Sigma(x-\mu_x)(y - \mu_y) & \Sigma(y-\mu_y)^2 \end{bmatrix} $$
  • $\theta = tan^{-1}(\frac{v_1}{v_0})$, where $v_1$ is the eigenvector corresponding to the largest eigenvalue
  • straightness = $\frac{\lambda_2}{\lambda_1}$, where $\lambda_1$ is the largest eigenvalue
  1. Threshold straightness appropriately and store line segments

Laplacian of Gaussian¶

Discrete approximation of Second-Derivative
$$ \frac{\partial^2{f}}{\partial{x^2}} = f(x+1, y) + f(x-1, y) - 2f(x, y) \\ \frac{\partial^2{f}}{\partial{y^2}} = f(x, y+1) + f(x, y-1) - 2f(x, y) $$ Substituting in Laplacian Operator $$ f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) - 4f(x,y) \\ \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0\end{bmatrix} $$

In [199]:
def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Generates a (size x size) Gaussian kernel."""
    kernel = np.fromfunction(
        lambda x, y: (1/(2*np.pi*sigma**2)) * np.exp(-((x-(size-1)/2)**2 + (y-(size-1)/2)**2) / (2*sigma**2)),
        (size, size)
    )
    return kernel / np.sum(kernel)

gaussian_kernel_31x31 = gaussian_kernel(31, 5)
In [201]:
log = laplace(gaussian_kernel_31x31)

plt.imshow(log, cmap='gray')

gaussian_kernel_17x17 = gaussian_kernel(17, 5)
log = laplace(gaussian_kernel_17x17)

Laplacian of Gaussian as a Blob Detector¶

Convolution with a filter can be viewed as comparing a little image of what you want to find against all the local regions of the image.

Since, LoG as an image looks like a Blob, it can be used for blob detection.

In [174]:
sunflower = cv2.imread('sunflower.jpg', cv2.IMREAD_GRAYSCALE)
In [175]:
plt.imshow(sunflower, cmap='gray')
Out[175]:
<matplotlib.image.AxesImage at 0x751d1c432d10>
In [202]:
plt.imshow(convolve(sunflower, log), cmap='gray')
Out[202]:
<matplotlib.image.AxesImage at 0x751d036d91e0>

From Blobs to Corners:¶

Motivation: What are the interesting features that set apart an image from every other image?

Possible Solution:

  1. Look for image regions that are unusual.
  2. Textureless patches are nearly impossible to Localize. (Eg: Sky)
  3. Patches with large contrast changes (gradients) are easier to localize (Edges)
  4. But edges suffer from aperture problem
  5. Significant Gradients in atleast 2 different orientations are the easiest (Eg: Corners)

Aperture Problem:¶

A feature may appear as 2 different things in 2 different scales(??)

  • In a "flat": No Change in Grad in all directions
  • In a "edge": No Change in Grad in the direction of the edge
  • In a "corner": Change in Grad in all directions

image.png

Autocorrelation¶

Correlation with itself
$ E_{AC}[\Delta{u}] = \Sigma_{x,y} W(P_i)[I(P_i + \partial{u}) - I(P_i)]^2 $, in other words: Sum of change in pixel intensities as we move the patch

$W(P_i)$ is the window function

Taylor Series Expansion of $I[P_i + \Delta{u}] = I(P_i) + \nabla{I}(P_i)\Delta{u}$, where $\Delta{u}$ indicates the shift.

Therefore, $$ E_{AC}[\Delta{u}] = \Sigma_{x,y} W(P_i)[I(P_i + \partial{u}) - I(P_i)]^2 \\ \approx \Sigma_{x,y} W(P_i)[I(P_i) + \nabla{I}(P_i)\Delta{u} - I(P_i)]^2 \\ \approx \Sigma_{x,y} W(P_i)[\nabla{I}(P_i)\Delta{u}]^2 \\ \approx \Delta{u^T} A \Delta{u} $$


$$ A = \Sigma_u\Sigma_v W(u,v) \begin{bmatrix} I_x^2 & I_xI_y \\ I_xI_y & I_y^2\end{bmatrix} \\ = W *\begin{bmatrix} I_x^2 & I_xI_y \\ I_xI_y & I_y^2\end{bmatrix} \\ = U \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} U^T $$

A tells us how much did the gradients change as the position of the patch is changed and the eigenvalues of A reveal the amount of intensity change in the two principal orthogonal directions in the window.

*How do eigenvalues determine if an image point is a corner?

image-2.png

  1. If $\lambda_1, \lambda_2$ are small, then it is a flat.
  2. If $\lambda_2 >> \lambda_1$ or $\lambda_1 >> \lambda_2$, then it is an edge
  3. If $\lambda_1, \lambda_2$ are large, and $\lambda_1 \approx \lambda_2$, then it is a corner. The $E_{AC}$ is almost constant in all directions.

Harris Corner Detector¶

a. Compute gradients at each point in the image.
b. Compute A for each image window to get its cornerness score
c. Compute eigenvalues or Compute $M_c$, where $M_c = \lambda_1\lambda_2 - k(\lambda_1 + \lambda_2)^2$ = $det - k * trace^2$
d. Find points who have $M_c > threshold$
e. Perform non-maxima supression of the points

Variants of $M_c$¶

  1. The $M_c$ given above. It is rotationally-invariant and downweights edge-like features
  2. Triggs '04 = $\lambda_0 - \alpha \lambda_1$
  3. Brown '05 = User Harmonic Mean, $\frac{det(A)}{trace(A)} = \frac{\lambda_0\lambda_1}{\lambda_0 + \lambda_1}$

Properties¶

  1. Not scale-invariant. If the image is zoomed-in, a pixel may be detected as an edge, but would be detected as corner, if the image is sufficiently zoomed-out. Add Image(??)
  2. Rotation Invariant - Yes
  3. Shift-Invariant($I = I + b$)- Yes, since only gradients(derivatives) are used.
  4. Photometric Change($I = aI + b$) - Partially invariant (Based on the chosen threshold)

image.png

In [15]:
## Implementation
gray = ascent.astype("uint8")
dst = cv.cornerHarris(gray, 2, 3, 0.04)
dst = cv.dilate(dst, None)
gray[dst>0.01*dst.max()]=255
plt.imshow(gray, cmap='gray')
Out[15]:
<matplotlib.image.AxesImage at 0x79cd26d955a0>

Selecting scale-invariant interest points¶

How can we select interest points in each image such that the detections are repeatable across different scales?

  1. Plot the region size vs corner measure.
  2. Find the regions in both the image which have peaks

image.png

In [141]:
filename = 'harris.png'
img = cv.imread(filename)
img_ = cv.imread(filename)

gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv.cornerHarris(gray,2,3,0.04)
dst_ = cv.cornerHarris(gray,10,3,0.04)
#result is dilated for marking the corners, not important
dst = cv.dilate(dst, None)
dst_ = cv.dilate(dst_, None)

# Threshold for an optimal value, it may vary depending on the image.
img[dst>0.1*dst.max()]=[0,0,255]
img_[dst_>0.1*dst_.max()] = [0,0,255]

f, ax = plt.subplots(1,2)
ax[0].imshow(img)
ax[1].imshow(img_)
Out[141]:
<matplotlib.image.AxesImage at 0x7d9d7c4bfa90>

Automatic Scale Selection¶

Design a function that takes a region around the point and outputs its response as a scalar given the region.
We will observe that the function outputs different response values depends on the region size. And more importantly, the local maxima of this function is a strong clue indicating that the region size corresponding to the local maxima should be invariant to image scale.

Textures¶

Regular or stochastic patterns caused by bumps, grooves or markings.
Why care about textures? Conveys more information that can be exploited to match regions of interest in images.
How to represent textures? Compute responses of blobs and edges at various orientations and scales.

Filter Banks¶

  1. They have multiple filters that only allow certain frequencies.
  2. They seperate the input signal into multiple components
  3. Each component carrying a single freq. sub-band of the original signal.
  4. Then, Process the image with each filter.

Examples: Gabor, Steerable, Isotropic Gaussian

Gabor¶

A special class of band-pass filters. They mimic the human visual system
It can be viewed as a product of the convolution between gaussian wave and a sinusoidal signal with a certain frequency and modulation.

image.png

Gabor Filters at an orientation of 15 degrees

Steerable Filter¶

A class of oriented filters that can be steered or expressed as a linear combination of a set of basis filters.

Sift Theory¶

What does it do? Extacts features and their descriptors that are invariant to translation, rotation, scale, and shear.
Steps:

  1. Scale-Space Extrema Detection: Detects interesting points invariant to scale and orientation using DoG (an approximation of LoG)
  2. Keypoint Localization: Accurately Locating the feature keypoints
  3. Orientation Estimation: Assignting orientation to keypoints.
  4. Keypoint Descriptor: Describing the keypoints as a high dimensional vector.

What does it output? It outputs keypoints(Location, Scale and Orientation information), descriptors (A 128-D vector that captures gradient information.)

Keypoint Localization¶

image.png

We are trying to nudge the detected extrema towards true extrema.

Solution: Taylor Series Expansion: $D(s_0 + \Delta s) = D(s_0) + \frac{\partial{D^T}}{\partial s} |_{s_0} \Delta s + \frac{1}{2}\Delta{s} \frac{\partial^2{D^T}}{\partial s} |_{s_0} \Delta{s}$, where $s_0 = (x_0, y_0, \sigma_0)^T$ and $\Delta{s} = (\delta{x}, \delta{y}, \delta{\sigma})^T$

The location of the extremum $\hat{s}$ is determined by taking the derivative of $D$ wrt to $\Delta{s}$ and setting it to $\hat{0}$

$$ \frac{\partial{D^T}}{\partial s} + \frac{\partial^2{D^T}}{\partial s} \Delta{s} = 0 \\ \hat{s} = (\frac{\partial^2{D^T}}{\partial s})^-1 \frac{\partial{D^T}}{\partial s}|_{s_0} $$

Low contrast Point Elimination: Reject $D(\hat{s})$ if smaller than 0.03

Edge Elimination: Hessian(H) = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}
$\alpha = $ largest eigenvalue, and $\beta = $ smallest eigenvalue.
$Tr(H) = \alpha + \beta$
$Det(H) = \alpha\beta$
$\frac{Tr(H)^2}{Det(H)} = \frac{(\alpha + \beta)^2}{\alpha \beta} = \frac{(r\beta + \beta)^2}{r \beta^2} = \frac{(r+1)^2}{r}, r = \frac{\alpha}{\beta}$. Minimum value when r = 1.
Reject if $\frac{Tr(H)^2}{Det(H)}$ > Threshold

Automatic Scale Selection¶

DoG: Think of Gaussian Blur as a socialist government. The DoG calculates who were the features that had rapid changhe in intensity by subtracting 'money' after a 'mild' socialist government and an 'extreme' socialist government. Rest of it is written above.

Orientation Estimation¶

  1. Use the correct scale of a point to choose correct image: $\hat{I(x, y)} = G(x, y,\sigma) * I(x,y)$
  2. Compute Gradient Magnitude and Orientation using Finite Differences.
  3. Take a certain region around a keypoint, and construct a histogram based on the orientation of the points in the region.
  4. Histogram entries are weighted by: a.) Gradient Magnitude, b.) A gaussian function with $\sigma$ equal to 1.5 times scale of the keypoint.
  5. Select the PEAK as the direction of the keypoint.
  6. Introduce additonal keypoints at same location if the another peak is within 80% of the max peak of the histogram but with a different direction.

Keypoint Descriptor:¶

  1. Take a 16X16 window around a keypoint.
  2. Divide 16X16, into 4X4 each.
  3. Construct a histogram of each of the 16 patches with '8' bins.
  4. It results in a 128-D vector.

To reduce effects of contrast or gain, the 128-D vector is normalized to unit length and values are cropped to 0.2, and again normalized...

SURF¶

3X faster than SIFT. Uses box filters instead of Gaussians to approximate Laplacians and many more. Uses Haar wavelets to get keypoint orientations. Haar Wavelets are very fast to compute

MOPS¶

Rotate patch according to its dominant graident direction.

In [1]:
## Sift Implementation

Log Polar¶

Co-ordinate system in 2 dimensions parametrized by logarithmic distance from origin and angle. It is location and scale invariant.

image.png

Two points are said to be in correspondance if they minimize value of $C_{ij} = \frac{1}{2} \Sigma_{k=1}^K \frac{(h_i(k) - h_j(k))^2}{h_i(k) + h_j(k)}$

Distance less than a threshold implies shape match

MSER¶

Method for Blob Detection. Identify regions that stay same despite fluctuations in gray-level threshold.

Methodology:

  1. Threshold the image at 'g'
  2. As we decrease 'g', new regions appear.
  3. As we increase 'g', regions coalesce.

Regions at a particular 'g' level denoted as $R_1^g, R_2^g, ..., R_n^g$, where $|R_i^g|$ = total number of pixels in $R_i^g$
$ \psi(R_i^g) = \frac{|R_j^{g-\Delta}| - |R_k^{g+\Delta}|}P $

In [ ]:
## HoG
In [ ]:
## Pyramidal HoG
In [ ]:
## LBP