Curriculum Vitae

Nachiket Kapre
Electrical and Computer Engineering
University of Waterloo
Canada
Email: nachiket at uwaterloo dot ca

Education

Ph.D. California Institute of Technology (USA), Computer Science
Dissertation: SPICE2 - A Spatial Parallel Architecture for Accelerating the SPICE Circuit Simulator
Degree Conferred: September 2010 [Link]

M.S., California Institute of Technology (USA), Computer Science
Degree Conferred: June 2006
Thesis: Packet-Switched FPGA-Overlay Networks [Link]

M.S., California Institute of Technology (USA), Electrical Engineering
Degree Conferred: June 2005

B.E., University of Pune (India), Electronics and Telecommunication Engineering
Project: FPGA based testing system for Siemens Railway Signalling Relays [Link] [slides] [guide]
Degree Conferred: August 2002

Research Interests

Concurrent and Spatial Architectures, Parallel Processing, Heterogeneous Architectures and Compilation Tools, Communication-Centric Design

Grants

AcRf Tier 1 Grant (Nov 2015) S$100K (1 year)
Delta Electronics Grant (Co-PI) (August 2015) S$100K (2 years)
MIT SMART Innovation Grant (Co-PI) (August 2015) S$50K (2 years)
NTU CELT Excellence in Education Grant (November 2014) S$37K (1 year)
AcRf Tier 1 Grant (March 2014) S$150K (2 years)
NTU CELT Excellence in Education Grant (October 2013) S$40K (1 year)
NTU CELT Excellence in Education Grant (March 2013) S$30K (1 year)
NTU CoE Competitive Seed Grant S$50K (Jan-March 2013)
NTU SCE Startup Grant S$100K (3 years)

Journal Publications

[PDF] "Hoplite: A Deflection-Routed Directional Torus NoC for FPGAs"
Nachiket Kapre, Jan Gray
ACM Transactions on Reconfigurable Technology and Systems (Special Issue FPL 2015)

[PDF] "Optimizing Soft Vector Processing in FPGA-based Embedded Systems"
Nachiket Kapre
ACM Transactions on Reconfigurable Technology and Systems (Special Issue FPL 2014), Published: 2016

[PDF] "A Case for Embedded FPGA-based SoCs in Energy-Efficient Acceleration of Graph Problems"
Pradeep Moorthy, Nachiket Kapre
Supercomputing Frontiers and Innovations (Special Best Papers Issue from 2015 Supercomputing Frontiers Conference), Published: 2016

[PDF] "Communication Optimization of Iterative Sparse Matrix-Vector Multiply on GPUs and FPGAs"
Abid Rafique, George Constantinides, Nachiket Kapre
IEEE Transactions on Parallel and Distributed Systems, Jan 2015

[PDF] "SPICE2: Spatial Processors Interconnected for Concurrent Execution for accelerating the SPICE Circuit Simulator using an FPGA"
Nachiket Kapre and André DeHon
Transactions in CAD (Special Issue on Parallel CAD), Volume 31 Issue 1 January 2012

[PDF] "Spatial Hardware Implementation for Sparse Graph Algorithms in GraphStep"
Michael deLorimier, Nachiket Kapre, Nikil Mehta and André DeHon
ACM Transactions on Autonomous and Adaptive Systems: Spatial Computing Special Issue, September 2011

[PDF] "An NoC Traffic Compiler for efficient FPGA implementation of Sparse Graph-Oriented Workloads"
Nachiket Kapre and André DeHon
International Journal of Reconfigurable Computing Volume 2011 Article ID 745147

[PDF] "Pipelined Saturated Accumulation"
Karl Papadantonakis, Nachiket Kapre, Stephanie Chan, and André DeHon
IEEE Transactions on Computers, February 2009.

Conference/Workshop Publications (Full papers)

[PDF] "On Bit-Serial NoCs for FPGAs"
Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, May 2017
(upcoming)

[PDF] "Implementing FPGA overlay NoCs using the Xilinx UltraScale memory cascades"
Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, May 2017
(upcoming)

[PDF] "eBSP: Managing NoC traffic for BSP workloads on the 16-core Adapteva Epiphany-III Processor"
Siddhartha, Nachiket Kapre
Design, Automation, and Test in Europe, March 2017

[PDF] "Deflection Routing for Multi-Level FPGA Overlay NoCs"
Chethan Kumar H B, Shubham Agarwal, Nachiket Kapre
International Conference on Field-Programmable Technology, December 2016

[PDF] "Preventive Detection of Mosquito Populations using Embedded Machine Learning on Low Power IoT Platforms"
Prashant Ravi, Uma Syam, Nachiket Kapre
Seventh ACM Symposium on Computing and Development, Nov 2016

[PDF] "CaffePresso: An Optimized Library for Deep Learning on Embedded Accelerator-based platforms" (Best Paper Award)
Gopalakrishna Hegde, Siddhartha, Nachiappan Ramasamy, Nachiket Kapre
International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, Oct 2016

[PDF] "Hoplite-DSP: Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs"
Chethan Kumar H B, Nachiket Kapre
26th International Conference on Field-Programmable Logic and Applications, Sep 2016

[PDF] "Boosting Convergence of Timing Closure using Feature Selection in a Learning-Driven Approach"
Que Yanghua, Harnhua Ng, Nachiket Kapre
26th International Conference on Field-Programmable Logic and Applications, Sep 2016

[PDF] "Survey of Domain-Specific Languages for FPGA Computing"
Nachiket Kapre, Samuel Bayliss
26th International Conference on Field-Programmable Logic and Applications, Sep 2016

[PDF] "Marathon: Statically-Scheduled Conflict-Free Routing on FPGA Overlay NoCs"
Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, May 2016

[PDF] "GPU-Accelerated High-Level Synthesis for Bitwidth Optimization of FPGA Datapaths"
Nachiket Kapre, Ye Deheng
International Symposium on Field-Programmable Gate Arrays, Feb 2016

[PDF] "Hoplite: Building Austere Overlay NoCs for FPGAs" (Best Paper Award)
Nachiket Kapre, Jan Gray
25th International Conference on Field-Programmable Logic and Applications, Sep 2015

[PDF] "Limits of FPGA Acceleration of 3D Green’s Function Computation for Geophysical Applications"
Nachiket Kapre, Selvakumar Jayakrishnan, Parjanya Gupta, Sagar Masuti, Sylvain Barbot
25th International Conference on Field-Programmable Logic and Applications, Sep 2015

[PDF] "Custom FPGA-based Soft-Processors for Sparse Graph Acceleration"
Nachiket Kapre
26th IEEE International Conference on Application-specific Systems, Architectures and Processors, July 2015

[PDF] "GraphMMU: Memory Management Unit for Sparse Graph Accelerators"
Nachiket Kapre, Han Jianglei, Andrew Bean, Pradeep Moorthy, Siddhartha
22nd Reconfigurable Architectures Workshop, 2015 (co-located with IPDPS 2015), May 2015

[PDF] "Enhancing Speedups for FPGA Accelerated SPICE through Frequency Scaling and Precision Reduction"
Nachiket Kapre, Lim Hui Hui
22nd Reconfigurable Architectures Workshop, 2015 (co-located with IPDPS 2015), May 2015

[PDF] "Zedwulf: Power-Performance Tradeoffs of a 32-node Zynq SoC cluster"
Pradeep Moorthy, Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, May 2015

[PDF] "Driving Timing Convergence of FPGA Designs through Machine Learning and Cloud Computing"
Nachiket Kapre, Bibin Chandrashekaran, Harnhua Ng, Kirvy Teo
International Symposium on Field-Programmable Custom Computing Machines, May 2015

[PDF] "Energy-Efficient Acceleration of OpenCV Saliency Computation using Soft Vector Processors"
Gopalakrishna Hegde, Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, May 2015

[PDF] "On Data Forwarding in Deeply Pipelined Soft Processors"
Hui Yan Cheah, Suhaib A. Fahmy and Nachiket Kapre
International Symposium on Field-Programmable Gate Arrays, February 2015

[PDF] "Relax-Miracle: GPU Parallelization of Semi-Analytic Fourier-Domain solvers for Earthquake Modeling"
Sagar Masuti, Sylvain Barbot, and Nachiket Kapre
International Conference on High Performance Computing, December 2014

[PDF] "Comparing Soft and Hard Vector Processing in FPGA-based Embedded Systems" (Best Paper Nominee)
Soh Jun Jie and Nachiket Kapre
International Conference on Field-Programmable Logic and Applications, September 2014

[PDF] "Limits of Statically Scheduled Token Dataflow Processing"
Nachiket Kapre and Siddhartha
4th International Workshop on Data-Flow Execution Models for Extreme Scale Computing (co-located with PACT 2014), August 2014

[PDF] "System-Level FPGA Device Driver with High-Level Synthesis Support"
Vipin Kizhepatt, Shreejit Shanker, Dulitha Gunasekara, Suhaib A Fahmy, Nachiket Kapre
International Conference on Field-Programmable Technology, December 2013

[PDF] "Exploiting Input Parameter Uncertainty for Reducing Datapath Precision of SPICE Device Models"
Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, April 2013

[PDF] "Application Composition and Communication Optimization of Iterative Solvers using FPGAs" (HiPEAC Paper Award)
Abid Rafique, Nachiket Kapre and George Constantinides
International Symposium on Field-Programmable Custom Computing Machines, April 2013

[PDF] "Enhancing Performance of Tall-Skinny QR factorization using FPGAs"
Abid Rafique, Nachiket Kapre and George Constantinides
International Conference on Field-Programmable Logic and Applications, August 2012

[PDF] "FX-SCORE: A Framework for Fixed-Point Compilation of SPICE Device Models using Gappa++"
Helene Martorell and Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, April 2012

[PDF] "VLIW-SCORE: Beyond C for Sequential Control of SPICE FPGA Acceleration" (Best Paper Award)
Nachiket Kapre and André DeHon
International Conference on Field-Programmable Technology, December 2011

[PDF] "SPICE2 - A Spatial Parallel Architecture for Accelerating the SPICE Circuit Simulator"
Nachiket Kapre and André DeHon
The First Workshop on the Intersections of Computer Architecture and Reconfigurable Logic, December 2010

[PDF] "An NoC Traffic Compiler for efficient FPGA implementation of Sparse Graph-Oriented Workloads"
Nachiket Kapre and André DeHon
Reconfigurable Communication-centric Systems on Chip, May 2010

[PDF] "Parallelizing Sparse Matrix-Solve for SPICE Circuit Simulation using FPGAs"
Nachiket Kapre and André DeHon
International Conference on Field-Programmable Technology, December 2009

[PDF] "Performance Comparison of Single-Precision SPICE Model-Evaluation on FPGA, GPU, Cell, and Multi-Core Processors"
Nachiket Kapre and André DeHon
International Conference on Field-Programmable Logic and Applications, September 2009

[PDF] "Accelerating SPICE Model-Evaluation using FPGAs"
Nachiket Kapre and André DeHon
IEEE Symposium on Field-Programmable Custom Computing Machines, April 2009

[PDF] "Optimistic Parallelization of Floating-Point Accumulation"
Nachiket Kapre and André DeHon
IEEE Symposium on Computer Arithmetic, June 2007.

[PDF] "Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks" (FCCM20 25-most influential papers award winner)
Nachiket Kapre, Nikil Mehta, Michael deLorimier, Raphael Rubin, Henry Barnor, Michael Wilson, Michael Wrighton and André DeHon
IEEE Symposium on Field-Programmable Custom Computing Machines, April 2006.

[PDF] "GraphStep: A System Architecture for Sparse Graph Algorithms"
Michael deLorimier, Nachiket Kapre, Nikil Mehta, Dominic Rizzo, Ian Eslick, Raphael Rubin, Tomas Uribe, Thomas Knight Jr., and André DeHon
IEEE Symposium on Field-Programmable Custom Computing Machines, April 2006.

[PDF] "Pipelined Saturated Accumulation"
Karl Papadantonakis, Nachiket Kapre, Stephanie Chan, and André DeHon
International Conference on Field-Programmable Technology, December 2005.

[PDF] "Design Patterns for Reconfigurable Computing"
André DeHon, Joshua Adams, Michael deLorimier, Nachiket Kapre, Yuki Matsuda, Helia Naeimi, Michael Vanier, and Michael Wrighton
IEEE Symposium on Field-Programmable Custom Computing Machines, April 2004.

Conference Publications (Short Papers)

[PDF] "120-core microAptiv MIPS Overlay for the Terasic DE5-NET FPGA board"
Chethan Kumar H B, Gourav Modi, Prashant Ravi, Nachiket Kapre
International Symposium on Field-Programmable Gate Arrays, Feb 2017

[PDF] "Vector Acceleration of 1-D DWT Computations using Sparse Matrix Skeletons"
Sidharth Maheshwari, Gourav Modi, Siddhartha, Nachiket Kapre
26th International Conference on Field-Programmable Logic and Applications, Sep 2016

[PDF] "Improving Classification Accuracy of a Machine Learning approach for FPGA Timing Closure"
Que Yanghua, Nachiket Kapre, Harnhua Ng, Kirvy Teo
International Symposium on Field-Programmable Custom Computing Machines, May 2016

[PDF] "Case for Design-Specific Machine Learning in Timing Closure of FPGA Designs"
Que Yanghua, Chinnakkannu Adaikkal Raj, Harnhua Ng, Kirvy Teo, and Nachiket Kapre
International Symposium on Field-Programmable Gate Arrays, Feb 2016

[PDF] "InTime: A Machine Learning Approach for Efficient Selection of FPGA CAD Tool Parameters"
Nachiket Kapre, Harnhua Ng, Kirvy Teo and Jaco Naude
International Symposium on Field-Programmable Gate Arrays, February 2015

[PDF] "Fanout Decomposition Dataflow Optimizations for FPGA-based Sparse LU Factorization"
Siddhartha and Nachiket Kapre
International Conference on Field-Programmable Technology, December 2014

[PDF] "Analysis and Optimization of a Deeply Pipelined FPGA Soft Processor"
Hui Yan Cheah, Suhaib A. Fahmy and Nachiket Kapre
International Conference on Field-Programmable Technology, December 2014

[PDF] "Heterogeneous Dataflow Architectures for FPGA-based Sparse LU Factorization"
Siddhartha and Nachiket Kapre
International Conference on Field-Programmable Logic and Applications, September 2014

[PDF] "Breaking Sequential Dependencies in FPGA-based Sparse LU Factorization"
Siddhartha and Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, May 2014

[PDF] "MixFX-SCORE: Heterogeneous Fixed-Point Compilation of Dataflow Computations"
Ye Deheng and Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, May 2014

[PDF] "Timing Fault Detection in FPGA-based Circuits"
Edward Stott, Joshua M. Levine, Peter Y. K. Cheung, and Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, May 2014

Posters

"Evaluating Embedded FPGA Accelerators for Deep Learning Applications"
Gopalakrishna Hegde, Siddhartha, Nachiappan Ramasamy, Vamsi Buddha, Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, May 2016

"Communication Optimization for the 16-core Epiphany Floating-Point Processor Array"
Siddhartha, Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, May 2016

"Machine-Learning driven Auto-Tuning of High-Level Synthesis for FPGAs"
Li Ting, Harri Sapto Wijaya, and Nachiket Kapre
International Symposium on Field-Programmable Gate Arrays, Feb 2016

"Sparse Graph Processing using Soft-Processors"
Nachiket Kapre
International Symposium on Field-Programmable Custom Computing Machines, May 2015

"FPGA Acceleration of Irregular Iterative Computations using Criticality-Aware Dataflow Optimizations"
Siddhartha and Nachiket Kapre
International Symposium on Field-Programmable Gate Arrays, February 2015

[PDF] "Measuring Timing Errors in FPGA-based Circuits"
Joshua Levine, Edward Stott, and Nachiket Kapre
The 10th IEEE Workshop on Silicon Errors in Logic - System Effects, April 2014

Magazine Articles

[PDF] "Saliency on a chip: a digital approach with an FPGA"
Nachiket Kapre, Dirk Walther, Christof Koch, and André DeHon
The Neuromorphic Engineer, Volume 1, Issue 2, Autumn 2004

Book Chapters

"Accelerating the SPICE Circuit Simulator using an FPGA - A Case Study"
Nachiket Kapre and André DeHon
From High-Performance Computing using FPGAs
Pages 389-427,
Edited by Wim Vanderbauwhede and Khaled Benkrid,
Published by Springer, Copyright 2013, ISBN-13: 978-1-4614-1790-3

"Programming FPGA Applications in VHDL"
Nachiket Kapre and André DeHon
From Reconfigurable Computing: The Theory and Practice of FPGA-based Computation,
Pages 129-153,
Edited by Scott Hauck and André DeHon,
Published by Morgan Kaufmann/Elsevier, Copyright 2008, ISBN-13: 978-0-12-370522-8

Selected Talks

"A Case for Embedded FPGA-based SoCs for Energy-Efficient Acceleration of Graph Problems"
Pradeep Moorthy, Siddhartha, Nachiket Kapre
Supercomputing Frontiers 2015, March 2015

"SPICE2- A Spatial Parallel Architecture for Accelerating the SPICE Circuit Simulator - Retrospective and Vision"
Talk at Maxeler Inc., University of Glasgow, University of York, Oxford, University of Southampton, National University of Singapore, Mahanakorn University of Technology, 2010-2013.

"Spatial SPICE Mapping and Lessons"
Vancouver, Canada. Talk at the University of British Columbia (UBC), August 2010.

"Accelerating the SPICE Circuit Simulator using FPGAs"
Bengaluru (Bangalore), India. Invited Talk at the Indian Institute of Science (IISC), March 2010.

"Accelerating the SPICE Circuit Simulator using FPGAs"
Austin, USA. Invited Talk at IBM Inc., August 2009.

"Accelerating SPICE Model-Evaluation using FPGAs"
San Jose, USA. Invited Talk at Xilinx Inc., February 2009.

"Exploiting Application Structure in On-Chip Network Design"
Invited Talk at University of Gent, Belgium and TU Munich, Germany, July-August 2007.

Patents

"Method and a circuit using an associative calculator for calculating a sequence of non-associative operations", André DeHon and Nachiket Kapre
Publication Number US7991817 B2, Applied Jan 2007, Granted Aug 2011.

Advising

NTU PhD
Siddhartha (2013-present): Dataflow Computing using FPGAs

NTU MSc
Chethan Kumar Basavaraju (2015): FPGA NoCs
Jayakrishnan Selva Kumar (2014): Maxeler Applications
Venugopal Swetha (2014): GPU Monte-Carlo Applications
Chinnakkannu Adaikkala Raj (2014): Machine Learning in FPGA CAD
Jianrong, Kiran Ganapathi, Kunal Gokhale (2014): Misc Topics
Kanchan Kaur, Shipeng Xu (2013): FPGA Placement/Routing

NTU UG (Final Year Projects)
Shubham Agarwal (2015): FPGA NoCs
Que Yanghua (2015): Machine Learning
Dakshina Pradeep Moorthy (2014): Parallel Graph Accelerators
Han Jianglei (2014): Parallel Graph Accelerators
Soh Jun Jie (2013-14): Vectorblox
Favian (2013-14): 3D Convolution using FPGAs
Lim Hui Hui (2013): SPICE Fault Tolerance

Imperial PhD, MSc, MEng, BEng and Interns
Andrew Bean (PhD student 2011-2016): Adaptive/Learning Systems using FPGAs
Abid Rafique (PhD student 2010-2013): Accelerating Semi-Definite Programming with FPGAs, GPUs and Multi-Cores
Siddhartha, Dulitha Gunasekara (BEng/MEng students 2011-2012): Different topics
Helene Martorell, Emmanouil Spanakis, Fang Zhou, Wei Lizhong (MSc students 2010-2011): Different topics
Coryan Wilson-Shah (UROP student 2011): Matrix-Free SPICE
Cody Huang (CAPA intern, UC Davis undergraduate 2011): GPU Code-Generation

Caltech Undergraduates and Summer Students
Henry Barnor (2005, now at Altera): VHDL Design of systolic hardware sorter/placer
Stephanie Chan (2005, now at NIST): Experiments on saturating accumulator
Ravi Teja Sukhavasi (2006, Caltech graduate student): Applying network-coding ideas to message traffic between parallel compute elements
Jon Ramirez (2006): Floating-point associative accumulator

Corporate Collaborations
Harnhua Ng, Kirvy Teo (Plunify): Machine-Learning for FPGA CAD
Jacob Bower (Maxeler): Maxeler Compiler Framework
Kumiko Nomura (Toshiba): Architecture analysis of 3D chips

Teaching Experience

Lecturer
Semester 1 2015, Nanyang Technological University, CE4052/ES6152: Embedded Systems Development
Semester 2 2014, Nanyang Technological University, CE4054/ES6154: Programmable Systems-on-Chip
Semester 1 2014, Nanyang Technological University, CE4052/ES6152: Embedded Systems Development
Semester 2 2013, Nanyang Technological University, CE4054/ES6154: Programmable Systems-on-Chip
Semester 1 2013, Nanyang Technological University, CE7451: Research Methods in Computer Science & Engineering
Semester 1 2013, Nanyang Technological University, CE4052/ES6152: Embedded Systems Development
Semester 1 2013, Nanyang Technological University, ES7501: Electronic Design Automation

Tutorials/Labs
Semester 1 2015, Nanyang Technological University, CE3001: Advanced Computer Architecture
Semester 1 2015, Nanyang Technological University, CE4052/ES6152: Embedded Systems Development
Semester 2 2014, Nanyang Technological University, CE4054/ES6154: Programmable Systems-on-Chip
Semester 1 2014, Nanyang Technological University, CE1005: Digital Logic (3 groups)
Semester 1 2014, Nanyang Technological University, CE4052/ES6152: Embedded Systems Development
Semester 2 2013, Nanyang Technological University, CE1005: Digital Logic (1 group)
Semester 1 2013, Nanyang Technological University, CE4052/ES6152: Embedded Systems Development

Guest Lectures
Fall 2011, Imperial College London, ISE2: Computer Architecture
Winter 2011, Imperial College London, DoC: Custom Computing

Teaching Assistant
Spring 2007, University of Pennsylvania, Electrical and Systems Engineering, ESE680s2: Computer Organization
Winter 2006, California Institute of Technology, Computer Science, CS137: Electronic Design Automation

Professional Experience

Nanyang Technological University, Assistant Professor (Oct 2012-Sep 2016)
Plunify, Inc., Chief Technology Officer (July 2014-)
Imperial College London, Junior Research Fellow (October 2010-September 2012)
Maxeler Inc., Consultant (July 2011-July 2012)
University of Pennsylvania, Visiting Graduate Student (October 2006-present)
Xilinx Inc., Summer Intern (Summer 2005)
Koch Lab (Caltech), Research Assistant (February 2004 to September 2004)
Paxonet Communications Inc. (now Conexant), Employee (August 2002 to August 2003)
Siemens Inc., Part-Time Intern (2002).

Copyright for all the PDF papers hosted here belongs to IEEE or ACM, as appropriate.