Work Experience

I currently work at NVIDIA. Before joining NVIDIA, I worked at AMD Research and Qualcomm. I have 5+ years of industry experience and 10+ years of professional experience in total. Below is a list of selected projects and public source code I have contributed to:

CUTLASS (CUDA Templates for Linear Algebra Subroutines) [BibTeX]

CUDA kernel development in CUTLASS to enable deep learning primitives

Convolution (forward and backward) kernel development for the NVIDIA Ampere, Turing, and Volta architectures, targeting Tensor Cores

Gaussian complex GEMMs using the 3m complex multiplication algorithm, targeting NVIDIA Ampere DMMA.884 Tensor Core operations for the F64 data type [Source code]

Complex GEMMs for F32 data, targeting NVIDIA Ampere HMMA.1688 Tensor Core operations with the TensorFloat-32 (TF32) data type [Source code]

Warp Matrix Multiply-Accumulate (WMMA) for the F16, S8, and S4 data types, targeting the NVIDIA Turing architecture [Source code]

Single-stage matrix multiply-accumulate (MMA) pipeline [Source code]

LLVM-inspired data structures to store operations' compile-time configuration and run-time arguments [Source code]

CUTLASS Profiler for verifying functional correctness and measuring the performance of GEMM operations [Source code]
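The 3m (Gauss) method behind the Gaussian complex GEMMs listed above computes a complex product with three real multiplications instead of four. For matrices A = Ar + i·Ai and B = Br + i·Bi, the three real GEMMs are T1 = (Ar + Ai)·Br, T2 = Ar·(Bi − Br), and T3 = Ai·(Br + Bi). The real part is T1 − T3 and the imaginary part is T1 + T2. The following is a minimal sketch in plain C++ that assumes naive row-major n x n matrices. The names `Mat`, `gemm`, `add`, `sub`, and `gemm_3m` are hypothetical, and this is not the CUTLASS implementation.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using Mat = std::vector<double>;  // hypothetical: row-major n x n real matrix

// Naive real GEMM: returns C = A * B for n x n row-major matrices.
Mat gemm(const Mat& A, const Mat& B, int n) {
  assert(A.size() == static_cast<size_t>(n * n) && B.size() == A.size());
  Mat C(n * n, 0.0);
  for (int i = 0; i < n; ++i)
    for (int k = 0; k < n; ++k)
      for (int j = 0; j < n; ++j)
        C[i * n + j] += A[i * n + k] * B[k * n + j];
  return C;
}

// Elementwise X + Y.
Mat add(const Mat& X, const Mat& Y) {
  assert(X.size() == Y.size());
  Mat Z(X.size());
  for (size_t i = 0; i < X.size(); ++i) Z[i] = X[i] + Y[i];
  return Z;
}

// Elementwise X - Y.
Mat sub(const Mat& X, const Mat& Y) {
  assert(X.size() == Y.size());
  Mat Z(X.size());
  for (size_t i = 0; i < X.size(); ++i) Z[i] = X[i] - Y[i];
  return Z;
}

// Complex GEMM (Cr + i*Ci) = (Ar + i*Ai) * (Br + i*Bi) using three real GEMMs.
void gemm_3m(const Mat& Ar, const Mat& Ai, const Mat& Br, const Mat& Bi,
             int n, Mat& Cr, Mat& Ci) {
  Mat T1 = gemm(add(Ar, Ai), Br, n);  // (Ar + Ai) Br
  Mat T2 = gemm(Ar, sub(Bi, Br), n);  // Ar (Bi - Br)
  Mat T3 = gemm(Ai, add(Br, Bi), n);  // Ai (Br + Bi)
  Cr = sub(T1, T3);                   // = Ar Br - Ai Bi
  Ci = add(T1, T2);                   // = Ai Br + Ar Bi
}
```

The method saves one O(n^3) real GEMM in exchange for a few O(n^2) matrix additions. The extra additions can change rounding slightly compared with the conventional four-GEMM formulation. A quick check is to compare `gemm_3m` against a reference product computed element by element with `std::complex<double>`.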

Please see my CV for more details.


I received my Ph.D. in Computer Science from the University of California, San Diego. My expertise is in GPUs, computer architecture, and compilers. The following is a list of my publications and patents:

Ph.D. Dissertation

Software Techniques to Enhance Reliability of Emerging Compute and Memory Units [BibTeX]

Manish Gupta (UC San Diego 2017)


Reliability-aware Data Placement for Heterogeneous Memory Architecture

Manish Gupta, Vilas Sridharan, David Roberts, Andreas Prodromou, Ashish Venkat, Dean Tullsen, Rajesh Gupta. In International Symposium on High-Performance Computer Architecture (HPCA 2018) [PPT | Talk | BibTeX]

Compiler Techniques to Reduce the Synchronization Overhead of GPU Redundant Multithreading

Manish Gupta, Daniel Lowell, John Kalamatianos, Steven Raasch, Vilas Sridharan, Dean Tullsen, Rajesh Gupta. In Design Automation Conference (DAC 2017) [PPT | Talk | BibTeX]

ASAR: Application-Specific Approximate Recovery to Mitigate Hardware Variability

Manish Gupta, Abbas Rahimi, Daniel Lowell, John Kalamatianos, Dean Tullsen, Rajesh Gupta. In Silicon Errors in Logic – System Effects (SELSE 2017) [PPT | Talk | BibTeX]

Reliability and Performance Trade-off Study of Heterogeneous Memories

Manish Gupta, David Roberts, Mitesh Meswani, Vilas Sridharan, Dean Tullsen, Rajesh Gupta. In International Symposium on Memory Systems (MEMSYS 2016) [PPT | Talk | BibTeX]

Verifying GPU Kernels by Test Amplification

Alan Leung, Manish Gupta, Yuvraj Agarwal, Rajesh Gupta, Ranjit Jhala, Sorin Lerner. In Programming Language Design and Implementation (PLDI 2012) [BibTeX]


I have multiple patent applications pending at the United States Patent and Trademark Office (USPTO). The following is a selected list:

Performance-aware and Reliability-aware Data Placement for N-level Heterogeneous Memory Systems [BibTeX]

Manish Gupta, David Roberts, Mitesh Meswani, Vilas Sridharan, Steven Raasch, Daniel Lowell

Waterfall Counters with Application to AVF Estimation [BibTeX]

Manish Gupta, David Roberts, Vilas Sridharan

Paired Value Comparison for Redundant Multi-Threading Operations [BibTeX]

Manish Gupta, Daniel Lowell

Bufferless Communication for Redundant Multithreading using Register Permutation [BibTeX]

Manish Gupta, Daniel Lowell


Self-Driving Cars: Reliability Challenges, Solutions, and Social Adoption


Basic Data Structures & OO Design, Teaching Assistant, Fall 2015. [Eval Section A00, Eval Section B00]

Software for Embedded Systems, Teaching Assistant, Spring 2014.

Links to some useful resources


Email: mgupta dot iitr at gmail dot com
Office: NVIDIA Endeavor
2788 San Tomas Expy, Santa Clara, CA 95051