Electrical and Computer Engineering Student

Ahmed Elmersawy

Purdue University

I build AI systems that learn to optimize software across competing goals (speed, memory, and energy) and study the fundamental training dynamics that determine when those systems can learn at all.

Explore Research

View CV

About

Research Vision

I’m an undergraduate researcher at Purdue University’s Duality Lab, advised by Prof. James Davis. My work sits at the intersection of machine learning and systems: I build AI systems that optimize code across competing goals, and I study the training dynamics that determine when learning is even possible.

Two projects define this work. Hydra is a preference-learning framework that teaches a code LLM to balance speed, memory, and energy simultaneously, treating multi-objective optimization as a decision problem, not just a metric. Variance Collapse asks the complementary question: before you train anything, can you predict whether the training signal will survive? It turns out you can, from a single measurement made before training begins.

My long-term interest is in AI systems that understand and adapt to their own computational cost: systems that are not just accurate, but resource-aware by design. I’m interested in research opportunities in AI systems, learned optimization, and ML theory.

Academic Highlights

DAC Young Fellowship

Design Automation Conference, 2026

NSF Grant Award

National Science Foundation, 2025

ECE Dean's List

Purdue University, 2023–2026

View all awards →

Research

One question, two directions

How does training signal behave, and can it be steered? Hydra acts on it to guide multi-objective code optimization. Variance Collapse predicts when that signal exists at all.

AI/LLMs, Code Opt, RL

Hydra: Multi-Objective Code Optimization

Research Thread 1: AI for Software Optimization. A DPO-based framework that fine-tunes a code LLM to navigate sampled trade-offs across runtime, memory, CPU cycles, throughput, and energy instead of optimizing a single metric. Each epoch, it reads the model's own per-metric training signal and steers sampling toward whichever objective the model is currently weakest on.

AI/LLMs

Variance Collapse & Gate Density

Research Thread 2: Optimization Dynamics & Learning Behavior. Whether the fraction of gradient-carrying units rises or falls during training is treated as activation-specific folklore. This project derives a mechanistic predictor from a single training-free quantity and validates it empirically across nearly 500 experiments spanning CNNs, MLP-Mixers, and Transformers, three datasets, and three optimizer families. It is the complement to Hydra: instead of steering the training signal, it asks when that signal structurally disappears.

Explore the full research graph

Projects

Selected work

Research projects that test a scientific claim, and engineering projects that ship a working system.

MOSO-DPO training pipeline: code samples are profiled to produce per-metric performance scores, which Adaptive Dirichlet Sampling aggregates into preference weights that are then used to train the LLM with the DPO loss.

Hydra: Multi-Objective Code Optimization

Most tools that make code faster only optimize for one thing (usually speed). Hydra teaches an AI model to balance speed, memory, and energy use together, the way a real engineer would.

An offline preference-learning pipeline that fine-tunes a 7B-parameter code LLM with Direct Preference Optimization (DPO) to navigate sampled trade-offs across runtime, memory, CPU cycles, throughput, and energy, using an adaptive Dirichlet sampling mechanism that reweights training toward whichever objective the model is currently weakest on.
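One way to picture the reweighting idea is the minimal sketch below. It is not the Hydra implementation: the score scale, the `base` and `boost` knobs, and the rule that maps scores to Dirichlet concentrations are all assumptions made for illustration. It only shows how a Dirichlet draw can be biased toward the objective with the weakest current score.

```python
import random

# Metric names follow the five objectives listed above.
METRICS = ["runtime", "memory", "cpu_cycles", "throughput", "energy"]


def concentration_from_scores(scores, base=1.0, boost=4.0):
    """Map per-metric scores in [0, 1] to Dirichlet concentrations.

    A lower score (weaker objective) gets a larger concentration, so it
    receives more weight on average. `base` and `boost` are hypothetical
    knobs, not values taken from the project.
    """
    return [base + boost * (1.0 - s) for s in scores]


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def mean_weights(alpha):
    """Expected Dirichlet weights: alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


rng = random.Random(0)
# Made-up scores for illustration only; energy is the weakest objective here.
scores = [0.8, 0.6, 0.7, 0.9, 0.2]
alpha = concentration_from_scores(scores)
weights = sample_dirichlet(alpha, rng)   # one random preference vector
expected = mean_weights(alpha)           # where the draws concentrate on average

for name, w, e in zip(METRICS, weights, expected):
    print(f"{name:>10}: sampled={w:.3f} expected={e:.3f}")
```

Sampling rather than always taking the expected weights keeps some exposure to every trade-off while still tilting training toward the lagging metric.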

Contribution: Built by a 4-person Purdue team (Arjun Gupte, Ahmed Elmersawy, Andre Lee, Stefan Maxim) advised by Prof. James Davis. My role: implementing and stabilizing the DPO training/inference pipeline, the QLoRA fine-tuning setup for Qwen2.5-Coder-7B, the adaptive Dirichlet sampling mechanism for the Python model, the enriched 5-metric PIE data pipeline, and the inference-time evaluation benchmark (Table III results below).
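For readers unfamiliar with DPO, the standard per-pair objective can be sketched in a few lines. This is the generic textbook form, not the project's training code: the function name, the argument names, and the `beta=0.1` default are assumptions, and the inputs are summed sequence log-probabilities that a real pipeline would obtain from the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) preference pair.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    where each term is a sequence log-probability.
    """
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
at_init = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the chosen log-probability relative to the reference lowers the loss.
improved = dpo_loss(-8.0, -12.0, -10.0, -12.0)
print(at_init, improved)
```

The loss depends only on how far the policy has moved from the reference on the chosen versus the rejected sample, which is why no separate reward model is needed.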

Python, PyTorch, DPO, QLoRA/LoRA, Qwen2.5-Coder-7B, SLURM / A100

46.7% latency, 36.0% CPU-cycle, and 36.0% energy reduction on a 7-program held-out Python inference benchmark
35,752 DPO preference pairs built from the enriched PIE Python split, scored across 5 system-level metrics

View on GitHub

Reconstruction IoU undergoes a sharp sigmoidal phase transition at sigmoid stiffness alpha*=28.28, with empirical measurements tightly matching the fitted theoretical curve.

Variance Collapse & Gate Density Divergence

When you train a neural network, some of its internal units gradually stop learning. For some activation functions this is normal and expected; for others it's a sign of trouble. This project shows that, for the tested activation and optimizer combinations, you can predict which case you are in before you even start training, using a property anyone can check in advance.

A hook-based instrumentation framework that recovers the exact gradient gate of any elementwise activation function, paired with a predictor derived from BatchNorm's known variance shrinkage under weight decay. For the tested configurations, the predictor uses a single training-free quantity to determine whether gate density rises (GELU/SiLU/Mish) or falls (ReLU) during ordinary training, and it explains why that split disappears under AdamW.
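To make "gradient gate" concrete, the sketch below treats the gate of an elementwise activation as its derivative at the pre-activation value, and gate density as the fraction of units whose gate exceeds a cutoff. Both the `threshold=0.5` cutoff and this exact definition of density are assumptions for illustration; the project's framework collects gates with hooks inside a real network, whereas this sketch works on a plain list of numbers.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu_gate(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def gelu_gate(x):
    """Derivative of exact GELU: Phi(x) + x * phi(x)."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def gate_density(pre_activations, gate_fn, threshold=0.5):
    """Fraction of units whose gate magnitude exceeds `threshold`.

    `threshold` is a hypothetical cutoff: smooth activations never have an
    exactly zero gate, so some cutoff is needed to call a unit "open".
    """
    open_units = sum(1 for x in pre_activations if abs(gate_fn(x)) > threshold)
    return open_units / len(pre_activations)


# Toy pre-activations, not measured data.
pre_acts = [-1.0, -0.5, 0.5, 2.0]
print("ReLU density:", gate_density(pre_acts, relu_gate))
print("GELU density:", gate_density(pre_acts, gelu_gate))
```

Shrinking or shifting the pre-activation distribution changes how many units sit on the open side of the gate, which is the quantity the predictor described above is concerned with.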

Python, PyTorch, CIFAR-native ResNet/VGG/ViT, SLURM / A100

48/48 architecture-fixed runs confirm the ReLU-vs-smooth-activation gate-density split (sign test p=2.44×10⁻⁴)
The same predictor, fed AdamW's measured statistics, correctly anticipates AdamW's different outcome with zero new free parameters
Directional claims validated across ~500 total runs spanning CNNs, MLP-Mixer, and Transformer-Encoder architectures on CIFAR, Tiny-ImageNet, and Places365
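As background on the sign test quoted above: when every paired comparison points the same way, the exact one-sided p-value is simply 0.5 raised to the number of pairs. The sketch below computes that tail probability. How the 48 runs are grouped into paired comparisons is not stated here, so the pair count of 12 used in the example is an assumption, chosen only because 2⁻¹² ≈ 2.44×10⁻⁴ matches the reported figure.

```python
from math import comb


def sign_test_p_one_sided(n_agree, n_total):
    """Exact one-sided sign test: P(X >= n_agree) for X ~ Binomial(n_total, 0.5)."""
    tail = sum(comb(n_total, k) for k in range(n_agree, n_total + 1))
    return tail / 2 ** n_total


# Assumed grouping for illustration: 12 paired comparisons, all in agreement.
p_twelve = sign_test_p_one_sided(12, 12)
print(p_twelve)  # 0.000244140625, i.e. 2**-12
```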

View on GitHub

View all projects

Get in touch

Open to research opportunities.

I'm an undergraduate researcher in Purdue's Duality Lab, advised by Prof. James Davis. I'm interested in research opportunities at the intersection of machine learning, systems, and software optimization. The fastest way to reach me is email.

Contact me