My name is Behnam Pourghassemi and I'm a Ph.D. student in computer engineering at the University of California, Irvine.
My research interests lie in high-performance computing (HPC), network systems, and performance analysis. I am interested in designing and developing scalable, low-overhead profiling tools for web services and for large-scale parallel codebases such as web browsers. I also work on HPC and parallel computing, with a focus on GPU computing and optimizing deep learning networks.
Education
Ph.D. in Computer Engineering, University of California, Irvine
2015-Present
M.S. in Computer Engineering, University of California, Irvine
2015-2017
B.S. in Electrical Engineering, Sharif University of Technology
2011-2015
Publications
-
AdPerf: Characterizing the Performance of Third-party Ads
Behnam Pourghassemi, Jordan Bonecutter, Zhou Li, Aparna Chandramowlishwaran
-
Only Relative Speed Matters: Virtual Causal Profiling
Behnam Pourghassemi, Ardalan Amiri Sani, Aparna Chandramowlishwaran
-
On the Limits of Parallelizing Convolutional Neural Networks on GPUs
Behnam Pourghassemi, Chenghao Zhang, Joo Hwan Lee, Aparna Chandramowlishwaran
-
Scalable Dynamic Analysis of Browsers for Privacy and Performance
-
What-If Analysis of Page Load Time in Web Browsers Using Causal Profiling
Behnam Pourghassemi, Ardalan Amiri Sani, Aparna Chandramowlishwaran
SIGMETRICS 2019 (nominated for best paper award)
-
Platform for Concurrent Execution of GPU Operations
Behnam Pourghassemi, Joo Hwan Lee, Yang Seok Ki
US Patent application number: 16/442,440 (in contract with Samsung Electronics)
-
Platform for Concurrent Execution of GPU Operations
Joo Hwan Lee, Yang Seok Ki, Behnam Pourghassemi
US Patent application number: 16/442,447 (in contract with Samsung Electronics)
-
CudaCR: An In-kernel Application-level Checkpoint/restart Scheme for CUDA-enabled GPUs
Behnam Pourghassemi, Aparna Chandramowlishwaran
-
Unsteady Navier-Stokes Computations on GPU Architectures
Bahareh Mostafazadeh, Ferran Marti, Behnam Pourghassemi, Aparna Chandramowlishwaran
Selected Projects
Leveraging parallelism for non-linear convolutional neural networks on GPU
GPUs are currently the platform of choice for training neural networks. However, training a deep neural network (DNN) is a time-consuming process even on GPUs because of the massive number of parameters that have to be learned. As a result, accelerating DNN training has been an area of significant research in the last couple of years. While earlier networks such as AlexNet had a linear dependency between layers and operations, more recent networks such as ResNet, PathNet, and GoogLeNet have a non-linear structure that exhibits a higher level of inter-operation parallelism. However, popular deep learning (DL) frameworks such as TensorFlow and PyTorch launch the majority of neural network operations, especially convolutions, serially on GPUs and do not exploit this inter-op parallelism. Accordingly, we make a case for the need and potential benefit of exploiting this rich parallelism in state-of-the-art non-linear networks to reduce training time. We identify the challenges and limitations in enabling concurrent layer execution on the GPU backends (such as cuDNN) of DL frameworks and propose potential solutions.
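As a rough illustration of the inter-op parallelism described above, the sketch below compares the serial execution time of a small branching block with its critical-path time, which is the best case if independent operations could run concurrently. The operation names and costs are hypothetical, and the model ignores GPU resource contention, so the result is an upper bound on the benefit rather than a prediction.

```python
from functools import lru_cache

# Hypothetical per-operation costs (ms) for a small inception-style block.
COST = {"input": 1.0, "conv1x1": 2.0, "conv3x3": 4.0,
        "conv5x5": 6.0, "pool": 1.5, "concat": 0.5}

# DEPS[op] lists the operations that must finish before op can start.
DEPS = {
    "input": [],
    "conv1x1": ["input"],
    "conv3x3": ["input"],
    "conv5x5": ["input"],
    "pool": ["input"],
    "concat": ["conv1x1", "conv3x3", "conv5x5", "pool"],
}


def serial_time(cost):
    """Time when every operation is launched one after another."""
    return sum(cost.values())


def critical_path_time(cost, deps):
    """Time when independent operations overlap perfectly (idealized)."""
    @lru_cache(maxsize=None)
    def finish(op):
        start = max((finish(d) for d in deps[op]), default=0.0)
        return start + cost[op]

    return max(finish(op) for op in cost)


serial = serial_time(COST)
ideal = critical_path_time(COST, DEPS)
print(f"serial: {serial} ms, ideal concurrent: {ideal} ms, "
      f"speedup bound: {serial / ideal:.2f}x")
```

For a purely linear network the two numbers coincide, which is why only non-linear structures leave room for concurrent layer execution.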
adPerf: Performance Characterization of Web Ads
We conduct an in-depth, first-of-its-kind performance evaluation of web ads without using adblockers.
We aim to characterize the cost of every component of an ad so that publishers, ad syndicates, and advertisers can improve ad performance with detailed guidance.
For this purpose, we develop an infrastructure, adPerf, for the Chrome browser that classifies page loading workloads into ad-related and main-content at the granularity of browser activities (such as JavaScript and Layout).
Our evaluations show that online advertising accounts for more than 15% of the browser's page loading workload, and approximately 88% of that is spent on JavaScript.
adPerf also tracks the sources and delivery chain of web ads and analyzes performance by the origin of the ad content.
We observe that two well-known third-party ad domains contribute 35% of the ad performance cost and, surprisingly, top news websites implicitly include unknown third-party ads that in some cases account for more than 37% of the ad performance cost.
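The classification step described above can be sketched as follows. This is a minimal illustration and not the adPerf implementation: the trace records, the activity names, and the ad-domain list are all hypothetical, and a real tool would derive them from browser traces and a filter list.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical ad-domain list (a real tool would use a filter list).
AD_DOMAINS = {"ads.example", "tracker.example"}

# Hypothetical trace records: (activity type, source URL, CPU time in ms).
TRACE = [
    ("Scripting", "https://news.example/app.js", 120.0),
    ("Scripting", "https://ads.example/banner.js", 40.0),
    ("Layout", "https://news.example/index.html", 30.0),
    ("Layout", "https://ads.example/frame.html", 5.0),
    ("Scripting", "https://cdn.tracker.example/t.js", 5.0),
]


def is_ad(url):
    """True if the URL's host is an ad domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in AD_DOMAINS)


def summarize(trace):
    """Return (ad share of total time, per-activity share of ad time)."""
    total = sum(ms for _, _, ms in trace)
    ad_by_activity = defaultdict(float)
    for activity, url, ms in trace:
        if is_ad(url):
            ad_by_activity[activity] += ms
    ad_total = sum(ad_by_activity.values())
    if total == 0 or ad_total == 0:
        return 0.0, {}
    breakdown = {a: ms / ad_total for a, ms in ad_by_activity.items()}
    return ad_total / total, breakdown


ad_share, breakdown = summarize(TRACE)
print(f"ad share of workload: {ad_share:.0%}")
for activity, share in breakdown.items():
    print(f"  {activity}: {share:.0%} of ad time")
```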
COZ+: Causal Performance Analysis of Browsers
We apply a comprehensive, quantitative what-if analysis to the web browser’s page loading process to detect performance bottlenecks. Unlike conventional profiling methods, we use causal profiling to precisely determine the impact of each computation stage, such as HTML parsing and Layout, on page load time (PLT). For this purpose, we develop COZ+, a high-performance causal profiler capable of analyzing large software systems such as the Chromium browser. COZ+ highlights the most influential spots for further optimization, which can be leveraged by browser developers and website designers. For instance, COZ+ shows that optimizing JavaScript by 40% is expected to improve the Chromium desktop browser’s page loading performance by more than 8.5% under typical network conditions.
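Causal profiling answers a what-if question without actually optimizing anything: it emulates a speedup of one code segment by delaying everything that runs concurrently with it, then subtracts the inserted delay. The toy model below illustrates that idea for two threads that run in parallel and join at the end. It is a sketch under that simplifying assumption, not the COZ+ implementation, and the function names are made up for this example.

```python
def actual_runtime(a, b, speedup):
    """Runtime if segment A (duration a) really ran `speedup` faster.

    Threads A and B run in parallel and join; speedup is in [0, 1].
    """
    return max(a * (1.0 - speedup), b)


def virtual_runtime(a, b, speedup):
    """Runtime predicted by a virtual speedup of segment A."""
    delay = a * speedup            # total pause inserted into thread B
    measured = max(a, b + delay)   # A is untouched, B is slowed down
    return measured - delay        # discount the inserted delay


for a, b in [(10.0, 6.0), (10.0, 9.0), (4.0, 8.0)]:
    for s in (0.2, 0.4, 0.8):
        real = actual_runtime(a, b, s)
        virt = virtual_runtime(a, b, s)
        print(f"a={a} b={b} speedup={s:.0%}: "
              f"actual={real:.2f} virtual={virt:.2f}")
```

In this model the two functions agree because max(a, b + d) - d equals max(a - d, b), so the delayed run, once corrected, matches the truly optimized run.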
cudaCR: In-kernel Checkpoint/restart for GPU
With the shift from petascale to exascale, the mean time between failures (MTBF) of large-scale machines is dropping, so errors may occur while GPU nodes are executing their kernels. Unlike previous frameworks, we design and implement an application-level checkpoint/restart scheme for CUDA applications that can capture the in-kernel state of the GPU and restart from the previous clean state inside the kernel. We inject CR code into the application source code that handles the data movement and memory footprint of threads. This scheme is well suited to compute-intensive, long-running kernels.
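The general pattern of application-level checkpoint/restart can be sketched on the host side as below. This is only an analogy in Python, not the in-kernel CUDA scheme: the state layout, the checkpoint interval, and the simulated fault are all assumptions made for illustration.

```python
import copy


class SimulatedFault(Exception):
    """Stands in for an error detected during a long-running computation."""


def run_with_checkpoints(n_steps, interval, fail_at=None):
    """Sum of squares 0..n_steps-1, surviving one injected fault.

    Returns (result, number of restarts).
    """
    state = {"step": 0, "acc": 0}
    checkpoint = copy.deepcopy(state)   # last known clean state
    restarts = 0
    while state["step"] < n_steps:
        try:
            if fail_at is not None and state["step"] == fail_at:
                fail_at = None          # inject the fault only once
                state["acc"] = -1       # simulate corrupted in-flight state
                raise SimulatedFault()
            state["acc"] += state["step"] ** 2
            state["step"] += 1
            if state["step"] % interval == 0:
                checkpoint = copy.deepcopy(state)
        except SimulatedFault:
            # Roll back to the last clean state instead of starting over.
            state = copy.deepcopy(checkpoint)
            restarts += 1
    return state["acc"], restarts


result, restarts = run_with_checkpoints(100, interval=10, fail_at=57)
print(result, restarts)
```

Only the work done since the last checkpoint is repeated, which is the motivation for checkpointing inside a long-running kernel rather than only between kernel launches.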
VCoz: Virtual Causal Profiler
Causal profiling is a novel and powerful profiling technique that
quantifies the potential impact of optimizing a code segment on the
program’s execution time. In this project, we first theoretically model and prove causal profiling,
the missing piece in the original paper; then we assert the necessary
condition to achieve virtual causal profiling on the secondary device.
Building upon the theory, we design VCoz, a virtual causal profiler
that enables profiling applications on the target devices by running
experiments on the host device. We implement a prototype of VCoz
by tuning multiple hardware components to preserve the relative
execution speed of code segments. Our experiments demonstrate that VCoz can
generate causal profiling reports of Nexus 6P (ARM-based device)
on a host x86 system with less than 16% variance.
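The role of relative execution speed can be seen in a toy model with two parallel code segments that join at the end. The sketch below is an illustration under that assumption, not the formal model or the VCoz implementation, and all durations are hypothetical: when the host runs every segment faster by the same factor, the predicted relative gain matches the target, and when the ratio between segments changes, it does not.

```python
def predicted_gain(a, b, speedup):
    """Relative runtime reduction from speeding up segment A by `speedup`.

    Segments A and B (durations a, b) run in parallel and join.
    """
    base = max(a, b)
    optimized = max(a * (1.0 - speedup), b)
    return (base - optimized) / base


# Hypothetical segment durations (ms) on the target device.
target = (12.0, 9.0)
# Host that is uniformly 3x faster: relative speed is preserved.
host_uniform = (4.0, 3.0)
# Host where B got 6x faster but A only 3x: relative speed is distorted.
host_skewed = (4.0, 1.5)

for name, (a, b) in [("target", target),
                     ("host, uniform scaling", host_uniform),
                     ("host, skewed scaling", host_skewed)]:
    print(f"{name}: gain from 50% speedup of A = "
          f"{predicted_gain(a, b, 0.5):.0%}")
```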
HiPer: Computational Fluid Dynamics Simulation on Heterogeneous System
The HPC Lab started a joint project with the Department of Mechanical and Aerospace Engineering at UCI to improve the performance of their CFD simulator. Our team implemented a scalable, high-performance second-order finite-volume Navier-Stokes simulator on a heterogeneous system. My contribution was to accelerate the GPU stencil computation using geometric multigrid and to simulate different test cases, such as cylinder channel, for steady and unsteady flows.
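For readers unfamiliar with stencil computations, the sketch below shows a plain Jacobi sweep with a 5-point stencil on a small 2D grid. It solves a Laplace problem as a stand-in and is not the Navier-Stokes solver from this project; geometric multigrid applies sweeps of this kind as smoothers on a hierarchy of coarser grids. The grid size and boundary values are arbitrary.

```python
def make_grid(n, boundary):
    """n x n grid with a fixed boundary value and a zero interior."""
    return [[boundary if i in (0, n - 1) or j in (0, n - 1) else 0.0
             for j in range(n)]
            for i in range(n)]


def jacobi_sweep(u):
    """One Jacobi sweep: each interior cell becomes the mean of its
    four neighbours (5-point stencil); boundary cells are unchanged."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1])
    return new


def solve(u, sweeps):
    """Apply a fixed number of Jacobi sweeps."""
    for _ in range(sweeps):
        u = jacobi_sweep(u)
    return u


grid = solve(make_grid(6, boundary=1.0), sweeps=200)
print(f"centre value after 200 sweeps: {grid[2][2]:.6f}")
```

Every interior cell is updated independently from the previous iterate, which is what makes this access pattern a natural fit for GPU threads.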
HetroFHE: Fully Homomorphic Encryption on Heterogeneous CPU-GPU System
Fully Homomorphic Encryption (FHE) is a relatively new asymmetric cryptosystem (the first lattice-based construction of this scheme was presented in Gentry's PhD thesis in 2009) that carries out computation on ciphertext. In other words, it lets users apply operations to encrypted data and get the same result as if they had applied them to the unencrypted data (plaintext). This makes FHE a good candidate to replace existing standard cryptosystems such as AES in cloud computing. However, its operations, such as encryption, key generation, and multiplication, require intensive computation over big integers. We implemented some of these operations on a heterogeneous CPU-GPU system. On the CPU side, we used the big-integer library NTL for initialization and encryption/decryption, and on the device side, we used the Chinese Remainder Theorem (CRT) for multiplication and addition.
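The reason CRT suits a GPU is that one big-integer operation becomes many independent small ones. The sketch below shows the idea in plain Python: each operand is reduced modulo a set of pairwise coprime moduli, addition and multiplication happen per residue, and the result is reconstructed at the end. The moduli (five Mersenne primes) and the operand values are chosen for illustration only and are not taken from the project.

```python
from itertools import combinations
from math import gcd, prod

# Illustrative moduli: five distinct Mersenne primes, hence pairwise coprime.
MODULI = (2**13 - 1, 2**17 - 1, 2**19 - 1, 2**31 - 1, 2**61 - 1)
assert all(gcd(p, q) == 1 for p, q in combinations(MODULI, 2))


def to_residues(x):
    """Represent x by its remainders modulo each modulus."""
    return [x % m for m in MODULI]


def add(xs, ys):
    """Residue-wise addition; each lane is independent of the others."""
    return [(x + y) % m for x, y, m in zip(xs, ys, MODULI)]


def mul(xs, ys):
    """Residue-wise multiplication; each lane is independent."""
    return [(x * y) % m for x, y, m in zip(xs, ys, MODULI)]


def from_residues(rs):
    """Reconstruct the integer in [0, prod(MODULI)) via the CRT."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(rs, MODULI):
        partial = big_m // m
        # pow(x, -1, m) is the modular inverse (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % big_m


a = 123456789012345678901
b = 98765432109876543210
product = from_residues(mul(to_residues(a), to_residues(b)))
print(product == a * b)
```

The result is exact only while the true value stays below the product of the moduli, so the number and size of the moduli must be chosen for the largest intermediate value expected.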
Skills
- Programming: C/C++, Java, Verilog, Python, MATLAB
- Parallel Environment: CUDA, MPI, OpenMP
- Hardware: GPU, FPGA, x86, ARM
- Operating System: Linux
- Software: Visual Studio, CUDA debugger/profiler, Vivado (HLS & design suite), Quartus
- Miscellaneous: LaTeX, R, Git
Curriculum Vitae
You can download my CV from here.
Contact Me
You can send me your message here.