About
I am a Systems Research Engineer in the SRG group at Google. I design novel accelerator architectures and the interfaces that integrate them into the system-on-chip.
Prior to Google, I received my Ph.D. in Computer Science from the University of California, Los Angeles, advised by Prof. Tony Nowatzki. My research rethinks accelerator design to achieve general-purpose acceleration using reconfigurable architectures. My work has been recognized with the ACM SIGARCH and IEEE-CS TCCA Outstanding Dissertation Award Honorable Mention, two IEEE Micro Top Picks awards, an IEEE Micro Honorable Mention award, and an Outstanding Graduate Student award.
Before joining UCLA, I completed my undergraduate studies at the Indian Institute of Technology Roorkee in Electronics and Communication Engineering with a minor in Computer Science (2017). As an undergraduate, I worked with Prof. Onur Mutlu on designing memory scheduling techniques to mitigate inter-application interference. You can find my CV here.
PhD Work
During my PhD, I designed a programmable accelerator for irregular workloads, called the Sparse Processing Unit (SPU), by identifying fundamental data-dependence forms in workloads spanning machine learning, graph processing, and databases. We enhanced SPU's flexibility with the "TaskFlow" execution model, which supports fine-grained dynamic parallelism efficiently and makes SPU suitable for graph processing workloads. We proposed PolyGraph accelerators, which implement the TaskFlow execution model. My work on SPU and PolyGraph received IEEE Micro Top Picks awards, which are given to 12 papers in the field each year based on novelty and potential for long-term impact. These awards recognized the development of a systematic understanding of a large hardware-software co-design space. Our recent work, "TaskStream: Accelerating Task Parallel Workloads by Recovering Program Structure", develops a unified task-dataflow execution and programming model, so that SPU and PolyGraph workloads can be programmed in TaskStream while exploiting the performance advantage of dynamic work distribution for medium-granularity tasks. The framework for my research is open-source.
During my internships, I gained experience in performance characterization of various industrial architectures: the Configurable Spatial Accelerator (CSA) at Intel, the Edge Tensor Processing Unit (TPU) accelerator for ML at Google, and Microsoft's Azure Synapse Spark for analytical database processing.