Embedded vision, embedded systems, parallel computing, FPGA/GPU/CPU systems, high-level synthesis, and applications of FPGAs in computer vision, big data, machine learning, data compression, and cloud computing.

I interned under Dr. Joo-Young Kim in the Computer Architecture Group at Microsoft Research in Summer 2013. I designed and implemented canonical Huffman encoding as part of the Compression Accelerators project. The goal of the project was to design a real-time, efficient lossless data compression system on a low-power device. The system was prototyped on three platforms: a Zynq device using Vivado HLS, an ARM processor, and a NIOS soft-core processor.
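
For readers unfamiliar with the canonical form, the sketch below shows only its final step: turning per-symbol code lengths into codewords. It is a minimal software illustration, not the design built during the internship; the function name `canonical_codes` and the example lengths are assumptions made for this example.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codewords from per-symbol code lengths.

    lengths: dict mapping symbol -> code length in bits (each length > 0).
    Returns a dict mapping symbol -> codeword as a bit string.
    """
    if not lengths:
        return {}
    # Sort by code length first, then by symbol to break ties.
    ordered = sorted(lengths.items(), key=lambda item: (item[1], item[0]))
    codes = {}
    code = 0
    prev_len = ordered[0][1]
    for symbol, length in ordered:
        # When the length grows, extend the running code with zero bits.
        code <<= length - prev_len
        codes[symbol] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


# Hypothetical code lengths, as a Huffman tree might produce them.
example = canonical_codes({"a": 2, "b": 1, "c": 3, "d": 3})
```

Because the codewords follow from the lengths alone, a decoder only needs the list of lengths to rebuild the same table.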

I interned under Dr. Juanjo Noguera and Dr. Stephen Neuendorffer at Xilinx Research Labs, Dublin, in Summer 2011. I designed a face detection system using the high-level synthesis tool Vivado HLS (formerly AutoESL). The face detection algorithm was based on the Viola-Jones algorithm. With the help of my supervisors, I created an efficient architecture for the integral image calculation and the classification parts of the algorithm. The final system achieved a throughput of one pixel per clock cycle for both the integral image calculation and the cascade classification. This performance matched that of manually designed RTL.
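
The integral image mentioned above can be sketched in software as follows. This is an illustrative version under stated assumptions, not the HLS architecture from the internship: the function names `integral_image` and `rect_sum` and the 3×3 test image are made up for the example. The inner loop consumes one pixel per iteration using a running row sum, which loosely mirrors a streaming, one-pixel-at-a-time formulation.

```python
def integral_image(img):
    """Return the integral image of a 2D list of pixel values.

    The result has one extra zero row and column so that rectangle
    sums need no special handling at the image border.
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        row_sum = 0  # running sum of the current row
        for x in range(cols):
            row_sum += img[y][x]
            # Sum above this pixel's column plus the current row so far.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 3, 3)   # whole image
inner = rect_sum(ii, 1, 1, 2, 2)   # bottom-right 2x2 block
```

Each rectangle sum costs four lookups regardless of rectangle size, which is what makes Haar-like features cheap to evaluate in the classification stage.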

What is a complete face recognition system?

We define a complete face recognition system as one that interfaces with a video source, detects every face in each frame, and sends only the detected face images to a face recognition subsystem, which in turn identifies them.

What does a complete face recognition system include? (Architecture)

It should have a face detection module, which detects the face(s) in each frame, and a face recognition module, which maps each detected face image from the detection module to a name or identity number. The overall architecture of a complete face recognition system is shown in the following figure.
[Figure: overall architecture of the complete face recognition system]

How does the system work?

The current system is implemented on a Virtex-5 FPGA, with a camera attached to the FPGA board. The FPGA reads a 640×480 frame and stores it in the FPGA's block RAM. The face detection subsystem then detects the faces in the current frame. We used a face detection system implemented here. It is based on the Viola-Jones object detection algorithm and uses Haar features from the OpenCV distribution. Detected faces are sent to the face recognition subsystem, which identifies each face as a person number. Based on the person number, we draw a box around the face in the frame shown on the display (for example, 1 = John = blue, 2 = Bob = green). The face recognition subsystem uses the Eigenface face recognition algorithm. The following picture shows the current set-up of the system implementation.
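
The recognition step described above can be sketched roughly as follows. This is a toy software illustration, not the FPGA implementation: it assumes the mean face and the eigenfaces were computed offline, and every name in it (`project`, `recognize`, `LABELS`) and every data value is hypothetical, apart from the person-number labels taken from the text. A face is projected onto the eigenfaces and matched to the nearest stored weight vector.

```python
def project(face, mean_face, eigenfaces):
    """Project a flattened face image onto the eigenface basis."""
    centered = [p - m for p, m in zip(face, mean_face)]
    return [sum(c * e for c, e in zip(centered, ef)) for ef in eigenfaces]


def recognize(face, mean_face, eigenfaces, gallery):
    """Return the person number whose stored weights are closest.

    gallery: dict mapping person number -> weight vector.
    """
    weights = project(face, mean_face, eigenfaces)

    def squared_distance(person):
        return sum((a - b) ** 2 for a, b in zip(weights, gallery[person]))

    return min(gallery, key=squared_distance)


# Toy 4-pixel "images"; real faces would be much larger.
mean_face = [2, 2, 2, 2]
eigenfaces = [[0.5, 0.5, -0.5, -0.5],
              [0.5, -0.5, 0.5, -0.5]]
gallery = {
    1: project([4, 4, 0, 0], mean_face, eigenfaces),
    2: project([4, 0, 4, 0], mean_face, eigenfaces),
}
# Person number -> (name, box color), following the example in the text.
LABELS = {1: ("John", "blue"), 2: ("Bob", "green")}

person = recognize([3, 4, 1, 0], mean_face, eigenfaces, gallery)
name, color = LABELS[person]
```

Only the projection and the distance comparison run per detected face, so the per-frame work is a handful of dot products once the basis is fixed.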
Resolve: Computer Generation of High-Performance Sorting Architectures from High-Level Synthesis
Janarbek Matai, Dustin Richmond, Dajung Lee, Zac Blair, Qiongzhi Wu, Amin Abazari, and Ryan Kastner
International Symposium on Field Programmable Gate Arrays (FPGA), February 2016 – Full Paper Acceptance Rate 20/105 = 19%
Composable, Parameterizable Templates for High Level Synthesis
Janarbek Matai, Dajung Lee, Alric Althoff and Ryan Kastner
Design Automation and Test in Europe (DATE), March 2016 – Full Paper Acceptance Rate 199/829 = 24%
Quantifying Timing-Based Information Flow in Cryptographic Hardware
Baolei Mao, Wei Hu, Alric Althoff, Janarbek Matai, Jonathan Valamehr, Timothy Sherwood, Dejun Mu, and Ryan Kastner
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2015.
Real-time 3D Reconstruction for FPGAs: A Case Study for Evaluating the Performance, Area, and Programmability Trade-offs of the Altera OpenCL
Quentin Gautier, Alexandria Shearer, Janarbek Matai, Dustin Richmond, Pingfan Meng, and Ryan Kastner
International Conference on Field-Programmable Technology (FPT), December 2014
Enabling FPGAs for the Masses
Janarbek Matai, Dustin Richmond, Dajung Lee, Ryan Kastner
First International Workshop on FPGAs for Software Programmers (FSP 2014), September 2014
High Throughput Channel Tracking for JTRS Wireless Channel Emulation
Dajung Lee, Janarbek Matai, Brad Weals, and Ryan Kastner
International Conference on Field Programmable Logic and Applications (FPL), September 2014
Energy Efficient Canonical Huffman Encoding
Janarbek Matai, Joo-Young Kim and Ryan Kastner
IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), June 2014 – Full Paper Acceptance Rate 22/85 = 25.9%
A Low-Power AdaBoost-Based Object Detection Processor Using Haar-Like Features
Motoki Kimura, Janarbek Matai, Matthew Jacobsen, Ryan Kastner
IEEE International Conference on Consumer Electronics (ICCE-Berlin), September 2013
Designing a Hardware in the Loop Wireless Digital Channel Emulator for Software Defined Radio
Janarbek Matai, Pingfan Meng, Lingjuan Wu, Brad Weals, and Ryan Kastner
International Conference on Field-Programmable Technology (FPT2012), December 2012 - Acceptance Rate: 24/114 ≈ 21.1%
Trimmed VLIW: Moving Application Specific Processors Towards High Level Synthesis
Janarbek Matai, Jason Oberg, Ali Irturk, Taemin Kim, and Ryan Kastner
The Electronic System Level Synthesis Conference (ESLsyn 2012)
Design and Implementation of an FPGA-based Real-Time Face Recognition System
Janarbek Matai, Ali Irturk, and Ryan Kastner
IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM2011), May 2011 - Acceptance Rate: 42/119 = 35.3%
Simulate and Eliminate: A Top-to-Bottom Design Methodology for Automatic Generation of Application Specific Architectures
Ali Irturk, Janarbek Matai, Jason Oberg, Jeffrey Su, and Ryan Kastner
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, issue 8, August 2011