Projective Transformations

Today’s problem is how to transform a given quadrilateral to another quadrilateral.

Let me be more precise. The transformation is to be considered in the context of photography. For example, look at this ‘Scrabble’ board.

Quadrilateral

As one might have expected for such a board to look like a square (instead of that unsightly trapezoid), it’s our job in this post to ‘correct’ the above image. That is, to transform a given quadrilateral to a ‘proper’ rectangle.

The solution comes in the form of Projective transformation in 2D. Projective transformations are basically fractional linear transformations:

x^{\prime}=\frac{ax+by+c}{gx+hy+i},\quad y^{\prime}=\frac{dx+ey+f}{gx+hy+i}.

Why isn’t just a simple linear transformation enough for this job? Because as we know, a 2D matrix can only describe scaling, rotation and shear. It doesn’t have translations and it also can NOT make converging lines parallel! So, not only linear transformation is insufficient, even affine transformation won’t fit our purposes. The transformation which can make parallel lines converge (i.e. make ∞ come to a finite point) is what we are really after and the projective one does that job. Still all is not lost. All these 5 transformations can be made linear if we are willing to go up a bit from 2D to 3D.

Let’s augment our 2D coordinates (x,y) by w and consider the following 3D linear transformation:

\begin{pmatrix}x^{\prime} \\ y^{\prime} \\ w^{\prime}\end{pmatrix}=\begin{pmatrix}a & b & c \\ d & e & f \\ g & h & i\end{pmatrix}\begin{pmatrix}x \\ y \\ w\end{pmatrix}

To get the 2D coordinates, we have to project back from 3D by dividing (x, y) by w. Since that division is a bit cumbersome, we will choose w=1. (The division makes sure w^{\prime}=1 too.) Let’s do a sanity check: a, b, d, e give the usual linear (2D) transformations, c, f give the translations, g, h give the projective transformations, and i is a global scaling. This last transformation is redundant for our purposes so we can set i=1.

Now, we have the right transformation tool. Why, you may ask? The answer is because this transformation matrix has 8 (unknown) parameters and given the 4 corners of the (source) quadrilateral & (destination) rectangle, we can write down 8 equations relating them. So 8 equations and 8 unknowns → any respectable linear algebra package should be able to do the ‘Maths’! (Caveat: You – How do you know the destination corners? Me – Well, that’s ‘beyond the scope’ of this post. You – $$$##%%^#)

Actually, we can do better than that. If one stares at those 8 equations long enough, one realizes that those equations can be solved analytically if the transformation is from a unit square to a given quadrilateral! You might be thinking that’s not too helpful; we want the reverse at the very least. Not quite! Because in image transformation business, if you think about it, “you don’t PUT the pixel, you GET the pixel”. (That’s my quote and you can fearlessly attribute it to me from now on. – Thanks.) For more discussion on this revelation, read this post.

So what we have to do now is simple:

  1. Get the 4 coordinates of the corners of the quadrilateral in the source image: X_q .
  2. Find the 8 parameters (a, \cdots, h) in the transformation matrix from the analytical solution for the coordinates X_q: T_s .
  3. Figure out the corners of the destination rectangle: X_r .
  4. Scale the transformation matrix in (x,y) direction appropriately so that the square scales to the required rectangle: T_r=T_s \cdot\text{diag}\left(\tfrac{1}{W}, \tfrac{1}{H}, 1\right) . (Width & Height are figured out from X_r.)
  5. Do the final transformation with translated coordinates: X_{\text{src}}=T_r \cdot (X_{\text{dest}}-X_{\text{trans}}) . (I find that incorporating the translations in the matrix is not that straightforward. Maybe it can be done, but translating the coordinates and then transforming them is simple enough. Also remember: “GET the Pixels”.)
  6. Crop the relevant portion of the transformed image.

After implementing the above algorithm, we can end up with something like this (I think the second last point becomes clear too):

Intermediary

Or to put it more bluntly, this:

Square

If you’re starting to think, I thought of all this… You’re giving me too much credit. Here’s the paper from where I learnt about this solution / algorithm (though, I think the solution given there may have some typos. I say this because that solution didn’t work ‘out of the box’):

Source