ASB 10817
Applied Science Building
8888 University Drive
Simon Fraser University
Burnaby, BC V5A 1S6
Email: akhil_baranwal [at] sfu.ca
I am a MASc student in Computer Engineering at Simon Fraser University, advised by Prof. Zhenman Fang.
Prior to SFU, I worked at Imagination Technologies as a CPU Design Engineer and at Micron Technology as an SoC Verification Engineer.
I received my B.E. from BITS Pilani, Hyderabad Campus.
My current research interests include the characterization of accelerator-rich architectures, systems for ML, and neuromorphic architectures. I think that closing the gap between software programmers and hardware design is as tremendously significant as it is challenging. I am also interested in the general study of cognition and behaviour of the mind, and I am always open to discussing bio-inspired hardware architecture design.
It's possible I'm currently sipping Chai (slurp)
We (CFAED-PD) propose a multilevel approach to scalable accelerator design for reinforcement learning on FPGAs.
We (MMNE) build an inexpensive, completely automated potentiostat suited for resource-constrained and non-precise applications.
(or view my Google Scholar profile)
C1 |
Memory Oriented Optimization Approach to Reinforcement Learning on FPGA-based Embedded Systems GLSVLSI '21
Reinforcement Learning (RL) represents the machine learning method that has come closest to showing human-like learning. While Deep RL is becoming increasingly popular for complex applications such as AI-based gaming, it has a high implementation cost in terms of both power and latency. Q-Learning, on the other hand, is a much simpler method that makes it more feasible for implementation on resource-constrained embedded systems for control and navigation. However, the optimal policy search in Q-Learning is a compute-intensive and inherently sequential process and a software-only implementation may not be able to satisfy the latency and throughput constraints of such applications. To this end, we propose a novel accelerator design with multiple design trade-offs for implementing Q-Learning on FPGA-based SoCs. Specifically, we analyze the various stages of the Epsilon-Greedy algorithm for RL and propose a novel microarchitecture that reduces the latency by optimizing the memory access during each iteration. Consequently, we present multiple designs that provide varying trade-offs between performance, power dissipation, and resource utilization of the accelerator. With the proposed approach, we report considerable improvement in throughput with lower resource utilization over state-of-the-art design implementations.
@inproceedings{10.1145/3453688.3461533,
author = {Sahoo, Siva Satyendra and Baranwal, Akhil Raj and Ullah, Salim and Kumar, Akash},
title = {MemOReL: A {Mem}ory-Oriented {O}ptimization Approach to {Re}inforcement {L}earning on FPGA-Based Embedded Systems},
year = {2021},
isbn = {9781450383936},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3453688.3461533},
doi = {10.1145/3453688.3461533},
abstract = {Reinforcement Learning (RL) represents the machine learning method that has come closest to showing human-like learning. While Deep RL is becoming increasingly popular for complex applications such as AI-based gaming, it has a high implementation cost in terms of both power and latency. Q-Learning, on the other hand, is a much simpler method that makes it more feasible for implementation on resource-constrained embedded systems for control and navigation. However, the optimal policy search in Q-Learning is a compute-intensive and inherently sequential process and a software-only implementation may not be able to satisfy the latency and throughput constraints of such applications. To this end, we propose a novel accelerator design with multiple design trade-offs for implementing Q-Learning on FPGA-based SoCs. Specifically, we analyze the various stages of the Epsilon-Greedy algorithm for RL and propose a novel microarchitecture that reduces the latency by optimizing the memory access during each iteration. Consequently, we present multiple designs that provide varying trade-offs between performance, power dissipation, and resource utilization of the accelerator. With the proposed approach, we report considerable improvement in throughput with lower resource utilization over state-of-the-art design implementations.},
booktitle = {Proceedings of the 2021 on Great Lakes Symposium on VLSI},
pages = {339--346},
numpages = {8},
keywords = {energy-efficient computing, hardware accelerators, memory-centric computing, fpga, high-level synthesis},
location = {Virtual Event, USA},
series = {GLSVLSI '21}
}
|
J4 |
ReLAccS: A Multilevel Approach to Accelerator Design for Reinforcement Learning on FPGA-Based Systems IEEE TCAD
Reinforcement learning (RL), specifically Q-learning, with human-like learning abilities to learn from experience without any a priori data, is being increasingly used in embedded systems in the field of control and navigation. However, finding the optimal policy in this approach can be highly compute-intensive, and a software-only implementation may not satisfy the application's timing constraints. To this end, we propose optimization methods at multiple levels of accelerator design for RL. Specifically, at the architecture-level, we exploit the instruction-level parallelism and the spatial parallelism in FPGAs to improve the throughput over state-of-the-art designs by up to 34%. Further, we propose lookup table-level optimizations to reduce the resource utilization and power dissipation of the accelerator. Finally, we propose algorithm-level approximation that can be used for acceleration of Q-learning problems with more states and for reducing the peak power dissipation. We report up to 10× reduction in power dissipation with marginal degradation in quality of results.
@ARTICLE{9211770,
author={Baranwal, Akhil Raj and Ullah, Salim and Sahoo, Siva Satyendra and Kumar, Akash},
journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
title={ReLAccS: A Multilevel Approach to Accelerator Design for Reinforcement Learning on FPGA-Based Systems},
year={2021},
volume={40},
number={9},
pages={1754--1767},
doi={10.1109/TCAD.2020.3028350}
}
|
J3 |
Development of Completely Automated Poly Potential Portable Potentiostat ECS JST
Various research activities related to profiling chemicals employ detection or measurement of the response from a specimen in terms of electric fields or currents. Hence, a portable poly-potential device forms one of the necessary measuring equipment essential to these domains. This work aims to propose a Poly-Potential Portable Potentiostat (P4), that can perform electrochemical analysis of solutions through easily integrable data-acquisition hardware and flexible software post-processing. The P4 device is based on a commercial development board, which provides an analog front-end (AFE) for working with 2-lead and 3-lead amperometric cells. An economical and portable design approach is prioritised while keeping the basic functions of the research-grade commercial instruments. A novel technique of dynamically changing the bias and reference potential is used to achieve a finer resolution, enabling qualitative estimation. P4 works by performing detailed mathematical post-processing on-board and therefore relies on hardware integrity as much as on software flexibility. Calibration of P4 was done using a standardised solution to function independently of any external hardware or software tools. P4 makes electrochemical analysis truly portable in remote or resource-constrained applications.
@article{Baranwal_2021,
doi = {10.1149/2162-8777/abdc15},
url = {https://dx.doi.org/10.1149/2162-8777/abdc15},
year = {2021},
month = {feb},
publisher = {IOP Publishing},
volume = {10},
number = {2},
pages = {027001},
author = {Akhil Raj Baranwal and Sohan Dudala and Prakash Rewatkar and Jaligam Murali Mohan and Mary Salve and Sanket Goel},
title = {Development of Completely Automated Poly Potential Portable Potentiostat},
journal = {ECS Journal of Solid State Science and Technology},
abstract = {Various research activities related to profiling chemicals employ detection or measurement of the response from a specimen in terms of electric fields or currents. Hence, a portable poly-potential device forms one of the necessary measuring equipment essential to these domains. This work aims to propose a Poly-Potential Portable Potentiostat (P4), that can perform electrochemical analysis of solutions through easily integrable data-acquisition hardware and flexible software post-processing. The P4 device is based on a commercial development board, which provides an analog front-end (AFE) for working with 2-lead and 3-lead amperometric cells. An economical and portable design approach is prioritised while keeping the basic functions of the research-grade commercial instruments. A novel technique of dynamically changing the bias and reference potential is used to achieve a finer resolution, enabling qualitative estimation. P4 works by performing detailed mathematical post-processing on-board and therefore relies on hardware integrity as much as on software flexibility. Calibration of P4 was done using a standardised solution to function independently of any external hardware or software tools. P4 makes electrochemical analysis truly portable in remote or resource-constrained applications.}
}
|
Please contact me via email: akhil_baranwal [at] sfu.ca