{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CSE 152: Introduction to Computer Vision, Spring 2018 – Assignment 2\n", "### Instructor: Ben Ochoa\n", "### Assignment Published On: Wednesday, April 11, 2018\n", "### Due On: Wednesday, April 25, 2018, 11:59 PM" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Instructions\n", "* Review the academic integrity and collaboration policies on the course website.\n", "* This assignment must be completed individually.\n", "* This assignment contains both math and programming problems.\n", "* All solutions must be written in this notebook.\n", "* For the math problems, you may use Markdown/LaTeX, or you can work them out on paper and upload the scanned copy after merging it with the PDF exported from this notebook. Remember to show your work and describe your solution.\n", "* Programming aspects of this assignment must be completed using Python in this notebook.\n", "* You may modify the skeleton code if you want to. It is provided only as a framework for the solution.\n", "* You may use Python packages for basic linear algebra (for example, NumPy or SciPy for basic operations), but you may not use packages that directly solve the problem.\n", "* If you are unsure about using a specific package or function, ask the instructor and teaching assistants for clarification.\n", "* You must submit this notebook exported as a PDF. You must also submit this notebook as an .ipynb file.\n", "* You must submit both files (.pdf and .ipynb) on Gradescope. You must mark each problem on Gradescope in the PDF.\n", "* It is highly recommended that you begin working on this assignment early." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Problem 1: Geometry [20 points]\n", "\n", "Consider a line in the 2D plane, whose equation is given by $a\\tilde{x} + b\\tilde{y} + c = 0$. 
This can equivalently be written as $\\boldsymbol{l}^\\top \\boldsymbol{x} = 0$, where $\\boldsymbol{l} = (a, b, c)^\\top$ and $\\boldsymbol{x} = (\\tilde{x}, \\tilde{y}, 1)^\\top$. Noticing that $\\boldsymbol{x}$ is a homogeneous representation of $\\tilde{\\boldsymbol{x}} = (\\tilde{x}, \\tilde{y})^\\top$, we can view $\\boldsymbol{l}$ as a homogeneous representation of the line $a\\tilde{x} + b\\tilde{y} + c = 0$. We see that the line is also defined only up to scale, since $(a, b, c)^\\top$ and $k(a, b, c)^\\top$ with $k \\neq 0$ represent the same line.\n", "All points $(\\tilde{x}, \\tilde{y})$ that lie on the line $a\\tilde{x} + b\\tilde{y} + c = 0$ satisfy the equation $\\boldsymbol{l}^\\top \\boldsymbol{x} = 0$.
\n", ">$\\textbf{Statement 1:} \\text{ A point } \\boldsymbol{x} \\text{ lies on the line } \\boldsymbol{l} \\text{ if and only if } \\boldsymbol{l}^\\top \\boldsymbol{x} = \\boldsymbol{x}^\\top \\boldsymbol{l} = 0$\n", "\n", "1. [4 points] Using Euclidean coordinates, find the equation of the line perpendicular to the family of lines $y = x + \\lambda$, where $\\lambda \\in (-\\infty, \\infty)$, and at a distance $d$ from the origin. Your answer should be expressed only in terms of the given parameters.

\n", "2. [6 points] Prove the following two statements that follow from $\\textbf{Statement 1}$.
\n", " a). The cross product of two points gives the line connecting the two points.
\n", " b). The cross product of two lines gives their point of intersection.

\n", "3. [4 points] What is the line, in homogeneous coordinates, joining the inhomogeneous points $(1, 4)$ and $(4, 5)$?

\n", "4. [6 points] When a rectangle $ABCD$ is observed under pinhole perspective, the image will be an arbitrary quadrilateral $A'B'C'D'$. Answer the following questions using your newly learned skills in working with homogeneous representations.
\n", " a). [3 points] For any arbitrarily imaged rectangle $ABCD$ with nonzero area, can $A'B'C'D'$ ever be a non-convex quadrilateral? Explain the intuition behind your answer. (Note: A convex polygon is a simple polygon (not self-intersecting) in which no line segment between two points on the boundary ever goes outside the polygon.)
\n", " b). [3 points] Let $A' = (t, t)$, $B' = (t, 6t)$, $C' = (4t, 6t)$ and $D' = (2t, 4t)$ be the vertices of the image. Find all the vanishing points of the quadrilateral (i.e., the points of intersection of the pairs of opposite lines $\\lbrace A'B', C'D' \\rbrace$ and $\\lbrace B'C', A'D' \\rbrace$), given $t = 1$.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 2: Image Formation and Rigid Body Transformations [20 points]\n", "\n", "In this problem we will practice rigid body transformations and image formation through the projective camera model. The goal will be to photograph the following four points \n", "$\\widetilde{\\boldsymbol{X}}_1 = [-1, -0.5, 2]^\\top$, $\\widetilde{\\boldsymbol{X}}_2 = [1, -0.5, 2]^\\top$, $\\widetilde{\\boldsymbol{X}}_3 = [1, 0.5, 2]^\\top$, $\\widetilde{\\boldsymbol{X}}_4 = [-1, 0.5, 2]^\\top$ in the world coordinate frame. First, recall the following formula for rigid body transformation\n", "$$\n", "\\widetilde{\\boldsymbol{X}}_\\text{cam} = \\text{R}\\widetilde{\\boldsymbol{X}} + \\boldsymbol{t}\n", "$$\n", "where $\\widetilde{\\boldsymbol{X}}_\\text{cam}$ is the point in the camera coordinate frame, $\\widetilde{\\boldsymbol{X}}$ is the point in the world coordinate frame, and $\\text{R}$ and $\\boldsymbol{t}$ are the rotation and translation that transform points from the world coordinate frame to the camera coordinate frame. Together, $\\text{R}$ and $\\boldsymbol{t}$ are the $\\textit{extrinsic}$ camera parameters. Once transformed to the camera coordinate frame, the points can be photographed using the $3 \\times 3$ camera calibration matrix $\\text{K}$, which embodies the $\\textit{intrinsic}$ camera parameters, and the canonical projection matrix $[\\text{I} | \\boldsymbol{0}]$. 
Given $\\text{K}, \\text{R}$, and $\\boldsymbol{t}$, the image of a point $\\widetilde{\\boldsymbol{X}}$ is $\\boldsymbol{x} = \\text{K}[\\text{I} | \\boldsymbol{0}]\\boldsymbol{X}_\\text{cam} = \\text{K}[\\text{R} | \\boldsymbol{t}]\\boldsymbol{X}$, where the homogeneous points are $\\boldsymbol{X}_\\text{cam} = (\\widetilde{\\boldsymbol{X}}_\\text{cam}^\\top, 1)^\\top$ and $\\boldsymbol{X} = (\\widetilde{\\boldsymbol{X}}^\\top, 1)^\\top$. We will consider four different settings of focal length, viewing angle, and camera position below. For each setting, determine the following.\n", "\n", "a). The extrinsic transformation matrix.\n", "\n", "b). The intrinsic camera matrix under the perspective camera assumption.\n", "\n", "c). The image of the four vertices, plotted using the supplied **plot_points** function (see e.g. output in figure below).\n", " \n", "1. [No rigid body transformation]. Focal length = 1. The optical axis of the camera is aligned with the z-axis.\n", "2. [Translation]. $\\boldsymbol{t} = [0, 0, 1]^\\top$. The optical axis of the camera is aligned with the z-axis.\n", "3. [Translation and Rotation]. Focal length = 1. $\\text{R}$ encodes a rotation of 30 degrees around the z-axis followed by a rotation of 60 degrees around the y-axis. $\\boldsymbol{t} = [0, 0, 1]^\\top$.\n", "4. [Translation and Rotation, long distance]. Focal length = 5. $\\text{R}$ encodes a rotation of 30 degrees around the z-axis followed by a rotation of 60 degrees around the y-axis. $\\boldsymbol{t} = [0, 0, 13]^\\top$.\n", "\n", "We will not use a full intrinsic camera matrix (e.g. one that maps centimeters to pixels and defines the coordinates of the center of the image), but only parameterize it with $f$, the focal\n", "length. In other words, the only parameter in the intrinsic camera matrix under the perspective assumption is $f$.\n", "\n", "For each of the four cases, include an image like the one above. Note that the axes are the same for each row, to facilitate comparison between the two camera models. 
Note: the angles and offsets used to generate these plots may differ from those in the problem statement; the plots only illustrate how to report your results.\n", "\n", "Also, explain any distortions you observe in the projection under this model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "# convert points from Euclidean to homogeneous\n", "def to_homog(points):\n", "    \"\"\"\n", "    your code here\n", "    \"\"\"\n", "    return points_homog\n", "\n", "# convert points from homogeneous to Euclidean\n", "def from_homog(points_homog):\n", "    \"\"\"\n", "    your code here\n", "    \"\"\"\n", "    return points\n", "\n", "# project 3D Euclidean points to 2D Euclidean\n", "def project_points(P_int, P_ext, pts):\n", "    \"\"\"\n", "    your code here\n", "    \"\"\"\n", "    # return the 2D Euclidean points\n", "    pts_2d=np.zeros([2,1])\n", "    return pts_2d\n", "\n", "def camera1():\n", "    \"\"\"\n", "    replace with your code\n", "    \"\"\"\n", "    P_int_proj = np.eye(3,4)\n", "    P_ext = np.eye(4,4)\n", "    return P_int_proj, P_ext\n", "\n", "def camera2():\n", "    \"\"\"\n", "    replace with your code\n", "    \"\"\"\n", "    P_int_proj = np.eye(3,4)\n", "    P_ext = np.eye(4,4)\n", "    return P_int_proj, P_ext\n", "\n", "def camera3():\n", "    \"\"\"\n", "    replace with your code\n", "    \"\"\"\n", "    P_int_proj = np.eye(3,4)\n", "    P_ext = np.eye(4,4)\n", "    return P_int_proj, P_ext\n", "\n", "def camera4():\n", "    \"\"\"\n", "    replace with your code\n", "    \"\"\"\n", "    P_int_proj = np.eye(3,4)\n", "    P_ext = np.eye(4,4)\n", "    return P_int_proj, P_ext\n", "\n", "\n", "#######################################################\n", "# test code. 
Do not modify\n", "#######################################################\n", "\n", "def plot_points(points, title='', style='.-r', axis=[]):\n", "    # visit every point (column), then return to the first one to close the polygon\n", "    inds = list(range(points.shape[1]))+[0]\n", "    plt.plot(points[0,inds], points[1,inds],style)\n", "    if title:\n", "        plt.title(title)\n", "    if axis:\n", "        plt.axis('scaled')\n", "        #plt.axis(axis)\n", "    \n", "def main():\n", "    point1 = np.array([[-1,-.5,2]]).T\n", "    point2 = np.array([[1,-.5,2]]).T\n", "    point3 = np.array([[1,.5,2]]).T\n", "    point4 = np.array([[-1,.5,2]]).T\n", "    points = np.hstack((point1,point2,point3,point4))\n", "    \n", "    for i, camera in enumerate([camera1, camera2, camera3, camera4]):\n", "        P_int_proj, P_ext = camera()\n", "        plt.subplot(1, 2, 1)\n", "        plot_points(project_points(P_int_proj, P_ext, points), title='Camera %d Projective'%(i+1), axis=[-.6,.6,-.6,.6])\n", "        plt.show()\n", "\n", "main()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 3: Image Rendering [20 points]\n", "\n", "\n", "In this exercise, we will render the image of a face with two different point light sources using a Lambertian reflectance model. We will use two albedo maps, one uniform and one that is more realistic. The face heightmap, the light sources, and the two albedos are given in facedata.npy for Python (each row of the `lightsource` variable encodes a light location). The data from facedata.npy is already provided to you.\n", "\n", "Note: Please make good use of subplot to display related images next to each other.\n", "\n", "$\\textbf{3.1 Plot the face in 2-D [2 pts]}$\n", "\n", "Plot both albedo maps using imshow. Explain what you see.\n", "\n", "$\\textbf{3.2 Plot the face in 3-D [2 pts]}$\n", "\n", "Using both the heightmap and the albedo, plot the face using plot\\_surface. Do this for both albedos. Explain what you see.\n", "\n", "$\\textbf{3.3 Surface normals [8 pts]}$\n", "\n", "Calculate the surface normals and display them as a quiver plot using quiver in matplotlib.pyplot in Python. 
Recall that the surface normals are given by \n", "\\begin{eqnarray}\n", "[-\\frac{\\partial f}{\\partial x}, -\\frac{\\partial f}{\\partial y}, 1],\n", "\\end{eqnarray}\n", "where $f(x, y)$ is the heightmap. Also, recall that each normal vector should be normalized to unit length.\n", "\n", "$\\textbf{3.4 Render images [8 pts]}$\n", "\n", "For each of the two albedos, render three images: one for each of the two light sources, and one for both light sources combined. Display these in a $2 \\times 3$ subplot figure with titles. Recall that the general image formation equation is given by\n", "\\begin{eqnarray}\n", "I = a(x,y) \\hat{\\boldsymbol{n}}(x, y)^\\top \\hat{\\boldsymbol{s}}(x, y)\\frac{s_0}{( d(x,y) )^{2}}\n", "\\end{eqnarray}\n", "where $a(x,y)$ is the albedo for pixel $(x, y)$, $\\hat{\\boldsymbol{n}}(x,y)$ is the corresponding surface normal, $\\hat{\\boldsymbol{s}}(x,y)$ is the light source direction, $s_0$ is the light source intensity, and $d(x,y)$ is the distance to the light source. Let the light source intensity be $1$ and do $\\textit{not}$ make the distant light source assumption.\n", "Use imshow with appropriate keyword arguments.\n" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from matplotlib.path import Path\n", "import matplotlib.patches as patches\n", "# Load facedata.npy as ndarray (allow_pickle=True is needed on NumPy >= 1.16.3 because the file stores a pickled dictionary)\n", "face_data = np.load('facedata.npy', encoding='latin1', allow_pickle=True)\n", "# Load albedo matrix \n", "albedo = face_data.item().get('albedo')\n", "# Load uniform albedo matrix\n", "uniform_albedo = face_data.item().get('uniform_albedo')\n", "# Load heightmap \n", "heightmap = face_data.item().get('heightmap')\n", "# Load light source\n", "light_source = face_data.item().get('lightsource')\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { 
"codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }