Embedded vision, embedded systems, parallel computing, FPGA/GPU/CPU systems, high-level synthesis, and applications of FPGAs in computer vision, big data, machine learning, data compression, and cloud computing.
I interned under Dr. Joo-Young Kim in the Computer Architecture Group at Microsoft Research in summer 2013. I designed and implemented canonical Huffman encoding as part of the Compression Accelerators project. The goal of this project was to design a real-time, efficient lossless data compression system on a low-power device. The prototype was implemented on three different platforms: a Zynq device using Vivado HLS, an ARM processor, and a NIOS soft-core processor.
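To illustrate what canonical Huffman encoding involves, here is a minimal software sketch of the standard canonical code assignment step: given a code length for each symbol, codes are handed out in order of increasing length, so a decoder only needs the lengths to rebuild the table. This is the textbook procedure, not the hardware design from the project. The function name `canonical_codes` and the sample symbols are assumptions made for illustration.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codes from per-symbol code lengths.

    lengths: dict mapping symbol -> code length in bits (0 means unused).
    Returns a dict mapping symbol -> code as a bit string.
    """
    # Shorter codes come first; ties are broken by symbol order.
    ordered = sorted((n, s) for s, n in lengths.items() if n > 0)

    codes = {}
    code = 0
    prev_len = 0
    for n, s in ordered:
        # Moving to a longer code length appends zero bits to the counter.
        code <<= (n - prev_len)
        codes[s] = format(code, "0{}b".format(n))
        code += 1
        prev_len = n
    return codes


if __name__ == "__main__":
    # Hypothetical code lengths, e.g. produced by a Huffman tree builder.
    print(canonical_codes({"A": 2, "B": 1, "C": 3, "D": 3}))
```

Because the codes depend only on the lengths and the symbol order, an encoder can transmit the lengths alone instead of the whole tree.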
I interned under Dr. Juanjo Noguera and Dr. Stephen Neuendorffer at Xilinx Research Labs, Dublin, in summer 2011. I designed a face detection system using the high-level synthesis tool Vivado HLS (formerly AutoESL). The face detection algorithm was based on the Viola-Jones algorithm. With help from my supervisors, I created an efficient architecture for the integral image calculation and classification parts of the algorithm. The final system achieved a throughput of one pixel per clock cycle for the integral image calculation and cascade classification parts, matching the performance of manually designed RTL.
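For readers unfamiliar with the integral image used by Viola-Jones, the following is a small software sketch, not the HLS architecture described above. It computes the integral image with one running row sum, consuming one pixel per inner-loop iteration, and then evaluates a rectangle sum with four lookups, which is what makes Haar feature evaluation cheap. The function names `integral_image` and `rect_sum` are illustrative assumptions.

```python
def integral_image(img):
    """Return the integral image of a 2D list of pixel values.

    The result has one extra row and column of zeros so that
    rectangle sums need no boundary checks.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            # Sum above this pixel plus the current row so far.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle with top-left corner (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ii = integral_image(image)
    print(rect_sum(ii, 1, 1, 2, 2))  # sum of 5, 6, 8, 9
```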
|See Other Projects|
A Complete and Real-Time Face Recognition System on an FPGA
What is a complete face recognition system?
We define a complete face recognition system as a system that interfaces with a video source, detects all face images in each frame, and sends only the detected face images to the face recognition subsystem, which in turn identifies them.
What does a complete face recognition system include? (Architecture)
It should have a face detection module, which detects the faces in each frame, and a face recognition module, which identifies each face image received from the detection module as a name or identity number. The overall architecture of a complete face recognition system is shown in the following figure.
How does the system work?
The current system is implemented on a Virtex-5 FPGA with a camera attached to the FPGA board. The FPGA reads a 640×480 frame and stores it in block RAM. The face detection subsystem then detects the faces in the current frame. We used a face detection system implemented here. It is based on the Viola-Jones object detection algorithm and uses Haar features from the OpenCV distribution. Detected faces are sent to the face recognition subsystem, which identifies each face as a person number. Based on the person number, we draw a colored box around the face in the frame shown on the display (for example, 1 = John = blue, 2 = Bob = green). The face recognition subsystem uses the Eigenface face recognition algorithm. The following picture shows the current setup of the system implementation.
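As a rough illustration of the recognition step, the sketch below shows the usual Eigenface classification idea in plain software: subtract the mean face, project the result onto a set of eigenfaces, and return the person number whose stored weight vector is nearest. It is not the FPGA implementation. The names `project` and `recognize`, the `gallery` structure, and the toy vectors are all assumptions made for illustration, and a real system would obtain the mean face and eigenfaces from training images.

```python
def project(face, mean, eigenfaces):
    """Project a flattened face image onto each eigenface."""
    centered = [p - m for p, m in zip(face, mean)]
    return [sum(c * e for c, e in zip(centered, ef)) for ef in eigenfaces]


def recognize(face, mean, eigenfaces, gallery):
    """Return the person number with the nearest weight vector.

    gallery: dict mapping person number -> stored weight vector
    (hypothetical structure for this sketch).
    """
    weights = project(face, mean, eigenfaces)
    best_id, best_dist = None, float("inf")
    for person_id, stored in gallery.items():
        # Squared Euclidean distance in eigenface space.
        dist = sum((a - b) ** 2 for a, b in zip(weights, stored))
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id


if __name__ == "__main__":
    # Toy 4-pixel "images" with two made-up eigenfaces.
    mean = [0, 0, 0, 0]
    eigenfaces = [[1, 0, 0, 0], [0, 1, 0, 0]]
    gallery = {1: [10, 0], 2: [0, 10]}
    print(recognize([9, 1, 5, 5], mean, eigenfaces, gallery))
```

The returned person number is what would select the box color drawn around the detected face.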
|Resolve: Computer Generation of High-Performance Sorting Architectures from High-Level Synthesis|
|Janarbek Matai, Dustin Richmond, Dajung Lee, Zac Blair, Qiongzhi Wu, Amin Abazari, and Ryan Kastner|
|International Symposium on Field Programmable Gate Arrays (FPGA), February 2016 – Full Paper Acceptance Rate 20/105 = 19%|
|Composable, Parameterizable Templates for High Level Synthesis|
|Janarbek Matai, Dajung Lee, Alric Althoff and Ryan Kastner|
|Design Automation and Test in Europe (DATE), March 2016 – Full Paper Acceptance Rate 199/829 = 24%|
|Quantifying Timing-Based Information Flow in Cryptographic Hardware|
|Baolei Mao, Wei Hu, Alric Althoff, Janarbek Matai, Jonathan Valamehr, Timothy Sherwood, Dejun Mu, and Ryan Kastner|
|IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2015.|
|Real-time 3D Reconstruction for FPGAs: A Case Study for Evaluating the Performance, Area, and Programmability Trade-offs of the Altera OpenCL|
|Quentin Gautier, Alexandria Shearer, Janarbek Matai, Dustin Richmond, Pingfan Meng, and Ryan Kastner|
|International Conference on Field-Programmable Technology (FPT), December 2014|
|Enabling FPGAs for the Masses|
|Janarbek Matai, Dustin Richmond, Dajung Lee, Ryan Kastner|
|First International Workshop on FPGAs for Software Programmers (FSP 2014), September 2014|
|High Throughput Channel Tracking for JTRS Wireless Channel Emulation|
|Dajung Lee, Janarbek Matai, Brad Weals, and Ryan Kastner|
|International Conference on Field Programmable Logic and Applications (FPL), September 2014|
|Energy Efficient Canonical Huffman Encoding|
|Janarbek Matai, Joo-Young Kim and Ryan Kastner|
|IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), June 2014 – Full Paper Acceptance Rate 22/85 = 25.9%|
|A Low-Power AdaBoost-Based Object Detection Processor Using Haar-Like Features|
|Motoki Kimura, Janarbek Matai, Matthew Jacobsen, and Ryan Kastner|
|IEEE International Conference on Consumer Electronics (ICCE-Berlin), September 2013|
|Designing a Hardware in the Loop Wireless Digital Channel Emulator for Software Defined Radio|
|Janarbek Matai, Pingfan Meng, Lingjuan Wu, Brad Weals, and Ryan Kastner|
|International Conference on Field-Programmable Technology (FPT2012), December 2012 - Acceptance Rate: 24/114 ≈ 21.1%|
|Trimmed VLIW: Moving Application Specific Processors Towards High Level Synthesis|
|Janarbek Matai, Jason Oberg, Ali Irturk, Taemin Kim, and Ryan Kastner|
|The Electronic System Level Synthesis Conference (ESLsyn 2012)|
|Design and Implementation of an FPGA-based Real-Time Face Recognition System|
|Janarbek Matai, Ali Irturk, and Ryan Kastner|
|IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM2011), May 2011 - Acceptance Rate: 42/119 = 35.3%|
|Simulate and Eliminate: A Top-to-Bottom Design Methodology for Automatic Generation of Application Specific Architectures|
|Ali Irturk, Janarbek Matai, Jason Oberg, Jeffrey Su, and Ryan Kastner|
|IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, issue 8, August 2011|
|Full list of publications|