ASB 10817
Applied Science Building
8888 University Dr.
Simon Fraser University
Burnaby, BC V5A 1S6
Email: asa582 [at] sfu.ca
I am a Ph.D. student in Engineering Science at Simon Fraser University, advised by Prof. Zhenman Fang. My research interests include hardware/software co-design, multicore processing, high-performance computing, and approximate computing. Prior to joining SFU, I received my B.S. in Electrical Engineering from Shahid Beheshti University in 2015, where I worked on low-noise amplifiers for medical applications under the supervision of Dr. Alireza Fattah. In 2018, I received my M.Sc. in Integrated Electronics Design from Tabriz University, where I worked on the design and implementation of a RISC-based multiprocessor under the supervision of Prof. Jafar Sobhi and the co-supervision of Prof. Ziaedin Dayi Kouzeh Konani.
I am a high-performance computing researcher, adroit at building systems as well as debugging, optimizing, and laying out very large architectures.
You can find my CV here, or view my Google Scholar profile.
C4 |
Approximation-aware Task Partitioning on an Approximate-Exact MPSoC (AxE) NorCAS '23
As the demand for higher performance and lower energy consumption continues to grow, Quality of Service (QoS) adjustment approaches offer an effective way to meet these demands. One such method, approximation, has gained popularity in recent years because, by providing an approximate result, it enables faster execution and lower power consumption. The areas in which these trade-offs are acceptable are numerous, but hardware-based solutions are usually domain-specific and expensive to integrate. To tackle this issue, we take a different approach, in which approximate hardware can be used (or not) in a general-purpose environment via software decisions: a Multi-Processor System-on-Chip (MPSoC) that contains Central Processing Units (CPUs) offering approximate calculations alongside CPUs offering exact calculations. However, current task partitioning algorithms do not consider the specific capabilities or requirements of such an MPSoC. This paper introduces approximation-aware partitioning algorithms using different strategies and compares the results to the State-of-the-Art (SoA). Additionally, the resulting task partitions are executed to gauge their quality compared to the SoA. Experimental results show that using an approximate CPU together with approximation-aware task partitioning leads to an increased partition success rate of 21.5%. Furthermore, executing the partitioned tasks, i.e., scheduling them until energy starvation, achieves a 3.4% longer run-time.
@INPROCEEDINGS{10305464,
author={Huemer, S. and Baroughi, A. S. and Shahhoseini, H. S. and TaheriNejad, N.},
booktitle={2023 IEEE Nordic Circuits and Systems Conference (NorCAS)},
title={Approximation-aware Task Partitioning on an Approximate-Exact MPSoC (AxE)},
year={2023},
volume={},
number={},
pages={1-7},
doi={10.1109/NorCAS58970.2023.10305464}}
|
C3 |
An Approximate Carry Disregard Multiplier with Improved Mean Relative Error Distance and Probability of Correctness DSD '22
Nowadays, a wide range of applications can tolerate certain computational errors. Hence, approximate computing has become one of the most attractive topics in computer architecture. Reducing computational accuracy in a deliberate and controlled manner reduces architectural complexity and, as a result, can significantly improve performance, power consumption, and area. This paper proposes a novel approximate multiplier design. The proposed design has been implemented using 45 nm CMOS technology and has been extensively evaluated. Compared to existing approximate architectures, the proposed approximate multiplier has higher accuracy. It also improves critical path delay, power consumption, and area by up to 47.54%, 75.24%, and 92.49%, respectively. Compared to precise multipliers, our evaluations show that the critical path delay, power consumption, and area have been improved by 39%, 18%, and 6%, respectively.
@INPROCEEDINGS{9996856,
author={Amirafshar, N. and Baroughi, A. S. and Shahhoseini, H. S. and TaheriNejad, N.},
booktitle={2022 25th Euromicro Conference on Digital System Design (DSD)},
title={An Approximate Carry Disregard Multiplier with Improved Mean Relative Error Distance and Probability of Correctness},
year={2022},
volume={},
number={},
pages={46-52},
doi={10.1109/DSD57027.2022.00016}}
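Two metrics named in the title have standard definitions: the mean relative error distance (MRED) averages |approximate − exact| / exact over the input space, and the probability of correctness is the fraction of inputs for which the approximate product equals the exact one. The sketch below evaluates both exhaustively over 8-bit operands. It is only an illustration under stated assumptions: `truncated_multiply` is a hypothetical stand-in (a simple truncated array multiplier that drops partial-product bits in the k least significant columns), not the carry disregard design from the paper, and inputs whose exact product is zero are excluded from MRED because the relative error is undefined there.

```python
# Sketch: exhaustive MRED and probability-of-correctness evaluation.
# truncated_multiply is a HYPOTHETICAL stand-in approximate multiplier,
# NOT the carry disregard multiplier proposed in the paper.


def truncated_multiply(a: int, b: int, width: int = 8, k: int = 4) -> int:
    """Sum only the partial-product bits a_i*b_j whose column i+j >= k."""
    result = 0
    for i in range(width):
        if (a >> i) & 1:
            for j in range(width):
                if (b >> j) & 1 and i + j >= k:
                    result += 1 << (i + j)
    return result


def evaluate(approx_mul, width: int = 8) -> tuple:
    """Return (MRED, probability of correctness) over all width-bit operand pairs."""
    red_sum = 0.0
    red_count = 0  # number of inputs with a non-zero exact product
    correct = 0
    total = 0
    for a in range(1 << width):
        for b in range(1 << width):
            exact = a * b
            approx = approx_mul(a, b)
            total += 1
            if approx == exact:
                correct += 1
            if exact != 0:  # relative error is undefined when exact == 0
                red_sum += abs(approx - exact) / exact
                red_count += 1
    return red_sum / red_count, correct / total


if __name__ == "__main__":
    mred, p_correct = evaluate(truncated_multiply)
    print(f"MRED = {mred:.4f}, probability of correctness = {p_correct:.4f}")
```

Any other approximate multiplier model with the signature `f(a, b) -> int` can be passed to `evaluate` in place of the stand-in.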
|
C2 |
AxE: An Approximate-Exact Multi-Processor System-on-Chip Platform DSD '22
Due to the ever-increasing complexity of computing tasks, emerging computing paradigms that increase efficiency, such as approximate computing, are gaining momentum. However, so far, the majority of proposed solutions for hardware-based approximation have been application-specific and/or limited to smaller units of the computing system, and they require engineering effort to integrate into the rest of the system. In this paper, we present the Approximate and Exact Multi-Processor System-on-Chip (AxE) platform. AxE is the first general-purpose approximate Multi-Processor System-on-Chip (MPSoC). AxE is a heterogeneous RISC-V platform with exact and approximate cores that allows hardware approximation to be explored for any application using software instructions. Using the full capacity of an entire MPSoC, especially a heterogeneous one such as AxE, is an increasingly challenging problem. Therefore, we also propose a task mapping method for running exact and approximable applications on AxE. This is a mixed task mapping, in which applications are viewed as sets of tasks that can run independently on different processors with different capabilities (exact or approximate). We evaluated our proposed method on AxE and achieved a 32% average execution speed-up and a 21% energy saving with an average accuracy of 99.3% on three mixed workloads. We also ran a sample image processing application, namely a gray-scale filter, on AxE and present its results.
@INPROCEEDINGS{9996733,
author={Baroughi, A. S. and Huemer, S. and Shahhoseini, H. S. and TaheriNejad, N.},
booktitle={2022 25th Euromicro Conference on Digital System Design (DSD)},
title={AxE: An Approximate-Exact Multi-Processor System-on-Chip Platform},
year={2022},
volume={},
number={},
pages={60-66},
doi={10.1109/DSD57027.2022.00018}}
|
C1 |
High Performance Application-oriented Memory Management on Multicore Systems ICSPIS '20
As intelligent systems evolve and the demands they place on computing increase, higher performance has become crucial for keeping up with the daily growth in application size and complexity. Currently, most high-performance computing systems are multi-core, and much research has aimed at minimizing execution time efficiently. Cache performance affects program execution, especially on modern intelligent processing systems; therefore, cache replacement policies in set-associative caches have been investigated in great depth. We propose an approach that considers information derived from the coherence state of cache blocks in a time-interleaved manner. Our simulations on high-performance applications from the PARSEC 2.1 and SPLASH-2 benchmark suites show that our approach achieves a 10% lower miss rate and up to 10% more instructions per cycle than the LRU, MRU, and Random replacement policies, without additional power consumption.
@INPROCEEDINGS{9349590,
author={Baroughi, Ahmad Sadigh and Naderi, Madjid},
booktitle={2020 6th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS)},
title={High Performance Application-oriented Memory Management on Multicore Systems},
year={2020},
volume={},
number={},
pages={1-6},
doi={10.1109/ICSPIS51611.2020.9349590}}
|
J1 |
Carry Disregard Approximate Multipliers TCAS I
Several challenges in improving the performance of computing systems have given rise to emerging computing paradigms. One of these paradigms is approximate computing. Many applications require different levels of accuracy and are error-tolerant to a certain degree. Approximate computation can significantly reduce computational complexity and thus improve performance. Here, we propose a methodology for designing approximate N-bit array multipliers based on carry disregarding. We evaluate and analyze the proposed multipliers both experimentally and theoretically. Compared to the exact multiplier, the proposed 8-bit multipliers reduce the critical path delay, power consumption, and area by 29%, 29%, and 30% on average. Compared to existing approximate array architectures in the literature, they improve these metrics by 14.3%, 22.8%, and 26.4%, respectively. Compared to the exact 16-bit multiplier, the proposed 16-bit multipliers reduce the delay, power consumption, and area by 35%, 24%, and 23% on average. We also demonstrate the applicability of a wide range of the proposed multipliers in an image processing application, where they achieve a Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) of over 30 dB and 94%, respectively.
@ARTICLE{10235317,
author={Amirafshar, Nima and Baroughi, Ahmad Sadigh and Shahhoseini, Hadi Shahriar and TaheriNejad, Nima},
journal={IEEE Transactions on Circuits and Systems I: Regular Papers},
title={Carry Disregard Approximate Multipliers},
year={2023},
volume={70},
number={12},
pages={4840-4853},
doi={10.1109/TCSI.2023.3306071}}
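PSNR, one of the image-quality metrics quoted above, has a standard definition: PSNR = 10·log10(MAX² / MSE), where MSE is the mean squared pixel difference between the reference image (for example, one produced with an exact multiplier) and the test image (one produced with an approximate multiplier). A minimal sketch, assuming 8-bit grayscale images (MAX = 255) stored as flat lists of pixel values; the pixel values in the usage block are made up for illustration, and SSIM, which is considerably more involved, is omitted.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], test: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Hypothetical pixel rows from an exact and an approximate filter output.
    exact = [52, 55, 61, 66, 70, 61, 64, 73]
    approx = [52, 54, 61, 64, 70, 60, 64, 72]
    print(f"PSNR = {psnr(exact, approx):.2f} dB")
```

Higher PSNR means the approximate output is closer to the exact one.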
|