Work Experience

I currently work at NVIDIA. Before joining NVIDIA, I worked at AMD Research and Qualcomm. I have 5+ years of industry experience and 10+ years of professional experience. Below is a list of selected projects and public source code I have contributed to:

CUTLASS (CUDA Templates for Linear Algebra Subroutines) [BibTeX]

Gaussian complex GEMMs using the 3m complex multiply algorithm, targeting NVIDIA Ampere DMMA.884 tensor operations for the F64 data type [Source code]

Complex GEMMs for F32 data, targeting NVIDIA Ampere HMMA.1688 tensor operations for the Tensor Float 32 (TF32) data type [Source code]

Warp Matrix Multiply Accumulate for F16, S8, and S4 data types, targeting the NVIDIA Turing architecture [Source code]

Single-stage Matrix-Multiply Accumulate (MMA) pipeline [Source code]

LLVM-inspired data structures to store operations' compile-time configuration and run-time arguments [Source code]

CUTLASS profiler to ensure functional correctness and measure the performance of GEMM operations [Source code]

Please see my CV for more details.

Research

I received my Ph.D. in Computer Science from the University of California, San Diego. My expertise is in GPUs, computer architecture, and compilers. The following is a list of my publications and patents:

Ph.D. Dissertation

Software Techniques to Enhance Reliability of Emerging Compute and Memory Units [BibTeX]

Manish Gupta (UC San Diego 2017)

Publications

Reliability-aware Data Placement for Heterogeneous Memory Architecture

Manish Gupta, Vilas Sridharan, David Roberts, Andreas Prodromou, Ashish Venkat, Dean Tullsen and Rajesh Gupta. In High-Performance Computer Architecture (HPCA 2018) [ PPT | Talk | BibTeX ]

Compiler Techniques to Reduce the Synchronization Overhead of GPU Redundant Multithreading

Manish Gupta, Daniel Lowell, John Kalamatianos, Steven Raasch, Vilas Sridharan, Dean Tullsen, Rajesh Gupta. In Design Automation Conference (DAC 2017) [ PPT | Talk | BibTeX ]

ASAR: Application-Specific Approximate Recovery to Mitigate Hardware Variability

Manish Gupta, Abbas Rahimi, Daniel Lowell, John Kalamatianos, Dean Tullsen, Rajesh Gupta. In Silicon Errors in Logic – System Effects (SELSE 2017) [ PPT | Talk | BibTeX ]

Reliability and Performance Trade-off Study of Heterogeneous Memories

Manish Gupta, David Roberts, Mitesh Meswani, Vilas Sridharan, Dean Tullsen, Rajesh Gupta. In International Symposium on Memory Systems (MEMSYS 2016) [ PPT | Talk | BibTeX ]

Verifying GPU Kernels by Test Amplification

Alan Leung, Manish Gupta, Yuvraj Agarwal, Rajesh Gupta, Ranjit Jhala, Sorin Lerner. In Programming Language Design and Implementation (PLDI 2012) [BibTeX]

Patents

I have multiple patents pending approval at the United States Patent and Trademark Office (USPTO). The following is a selected list of my patents:

Performance-aware and Reliability-aware Data Placement for N-level Heterogeneous Memory Systems [BibTeX]

Manish Gupta, David Roberts, Mitesh Meswani, Vilas Sridharan, Steven Raasch, Daniel Lowell

Waterfall Counters with Application to AVF Estimation [BibTeX]

Manish Gupta, David Roberts, Vilas Sridharan

Paired Value Comparison for Redundant Multi-Threading Operations [BibTeX]

Manish Gupta, Daniel Lowell

Bufferless Communication for Redundant Multithreading using Register Permutation [BibTeX]

Manish Gupta, Daniel Lowell

News

Self-Driving Cars: Reliability Challenges, Solutions, and Social Adoption

Teaching

Basic Data Structures & OO Design, Teaching Assistant, Fall 2015. [ Eval Section A00, Eval Section B00 ]

Software for Embedded Systems, Teaching Assistant, Spring 2014.

Links to some useful resources

Contact

Email: mgupta dot iitr at gmail dot com
Office: NVIDIA Endeavor
2788 San Tomas Expy, CA 95051