Ali U. Irturk
Ph.D. at University of California, San Diego
M.A. and M.S. at University of California, Santa Barbara
B.A. and B.S. at Turkish Naval Academy
Matrix Inversion
Explicit matrix inversion (analytic method) of a full matrix is a computationally intensive method. If the inversion is encountered, one should consider converting this problem into an easy decomposition problem which will result in analytic simplicity and computational convenience.Decomposition methods are generally viewed as the preferred methods for matrix inversion because they scale well for large matrix dimensions while the complexity of the analytic method increases dramatically as the matrix dimensions grow. However, for small matrices, the analytic method, which can exploit a significant amount of parallelism, outperforms the decomposition methods. Also note that Cholesky and LU decompositions work only with positive definite and nonsingular diagonally dominant square matrices, respectively. QR decomposition, on the other hand, is more general and can be applied to any matrix.
Design space exploration for matrix inversion architectures can be divided into two parts: inflection point analysis and architectural design alternatives analysis. We also compared our results with the previously published papers.
Inflection Point Analysis:We present different execution results, serial and parallel, for different bit widths and matrix dimensions to answer: at what matrix size does an inflection point occur and how does varying bit width and degree of parallelism change the inflection point? The comparisons for sequential and parallel executions of matrix decomposition and inversion methods are shown in Figure below (a, b, c, d) with different bit widths: 16, 32 and 64. Square, spade and triangle represent QR, LU and Cholesky methods respectively. Solid, dashed and smaller dashed lines represent 64, 32 and 16 bits of bit widths respectively. The balloons denote the inflection points between these methods for the different bit widths.
Different design space exploration: inflection point analyses, of our tool. Top row: decomposition methods only. Bottom row: matrix inversion.
Different design space exploration examples, specifically area and throughput results of different bit width and matrix dimensions, of our tool.
Comparisons: We provide a comparison between our results and previously published implementations for 4 × 4 matrices in Table below. We present all of our implementations with bit width 20 as this is the largest bit width value used in the related works. Though it is difficult to make direct comparisons between our designs and those of the related works (because we used fixed point arithmetic instead of floating point arithmetic and fully used FPGA resources (like DSP48s) instead of LUTs), we observe that our results are comparable. The main advantages of our implementation are that it provides the designer the ability to study the tradeoffs between architectures with different design parameters and provides a means to find an optimal design.
| Ref [1] | Ref [1] | GUSTO Impl A |
GUSTO Impl B |
GUSTO Impl C |
Ref [2] | Ref [3] | GUSTO | GUSTO | GUSTO | |
|---|---|---|---|---|---|---|---|---|---|---|
| Method | Analytic | Analytic | Analytic | Analytic | Analytic | QR | QR | QR | Cholesky | LU |
| Bit width | 16 | 20 | 20 | 20 | 20 | 12 | 20 | 20 | 20 | 20 |
| Data type | floating | floating | fixed | fixed | fixed | fixed | floating | fixed | fixed | fixed |
| Device type | Virtex 4 | Virtex 4 | Virtex 4 | Virtex 4 | Virtex 4 | Virtex 2 | Virtex 4 | Virtex 4 | Virtex 4 | Virtex 4 |
| Slices | 1561 | 2094 | 702 | 1400 | 2808 | 4400 | 9117 | 3584 | 2719 | 3682 |
| DSP48s | 0 | 0 | 4 | 8 | 16 | NR | 22 | 12 | 12 | 12 |
| BRAMs | NR | NR | 0 | 0 | 0 | NR | NR | 1 | 1 | 1 |
| Throughput | 1.04 | 0.83 | 0.38 | 0.72 | 1.3 | 0.28 | 0.12 | 0.26 | 0.33 | 0.25 |
Table Comparisons between our results and previously published papers. NR denotes not reported.
[1] J. Eilert, D. Wu, D. Liu, "Efficient Complex Matrix Inversion for MIMO Software Defined Radio", IEEE International Symposium on Circuits and Systems. (2007) 2610 - 2613.
[2] F. Edman, V. Owall, "A Scalable Pipelined Complex Valued Matrix Inversion Architecture", IEEE International Symposium on Circuits and Systems. (2005) 4489 - 4492.
[3] M. Karkooti, J.R. Cavallaro, C. Dick, "FPGA Implementation of Matrix Inversion Using QRD-RLS Algorithm", Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers (2005) 1625 - 162.