Ahmed Elmersawy, undergraduate researcher at Purdue University

Electrical and Computer Engineering Student

Ahmed Elmersawy

Purdue University

I build AI systems that learn to optimize software across competing goals (speed, memory, and energy) and study the fundamental training dynamics that determine when those systems can learn at all.


About

Research Vision

I’m an undergraduate researcher at Purdue University’s Duality Lab, advised by Prof. James Davis. My work sits at the intersection of machine learning and systems: I build AI systems that optimize code across competing goals, and I study the training dynamics that determine when learning is even possible.

Two projects define this work. Hydra is a preference-learning framework that teaches a code LLM to balance speed, memory, and energy simultaneously, treating multi-objective optimization as a decision problem rather than a single metric to maximize. Variance Collapse asks the complementary question: before you train anything, can you predict whether the training signal will survive? For the configurations tested, you can, from a single measurement made before training begins.

My long-term interest is in AI systems that understand and adapt to their own computational cost: systems that are not just accurate, but resource-aware by design. I’m interested in research opportunities in AI systems, learned optimization, and ML theory.

Academic Highlights

DAC Young Fellowship

Design Automation Conference, 2026

NSF Grant Award

National Science Foundation, 2025

ECE Dean's List

Purdue University, 2023–2026

Research

One question, two directions

How does a training signal behave, and can it be steered? Hydra steers it to guide multi-objective code optimization; Variance Collapse predicts when that signal exists at all.


Projects

Selected work

Research projects that test a scientific claim, and engineering projects that ship a working system.

MOSO-DPO training pipeline: code samples are profiled across per-metric performance scores, aggregated via Adaptive Dirichlet Sampling into preference weights, and used to train the LLM with DPO loss.

Hydra: Multi-Objective Code Optimization

Most tools that make code faster only optimize for one thing (usually speed). Hydra teaches an AI model to balance speed, memory, and energy use together, the way a real engineer would.

An offline preference-learning pipeline that fine-tunes a 7B-parameter code LLM with Direct Preference Optimization (DPO) to navigate sampled trade-offs across five metrics: runtime, memory, CPU cycles, throughput, and energy. An adaptive Dirichlet sampling mechanism reweights training toward whichever objective the model is currently weakest on.
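The reweighting idea can be sketched with a Dirichlet distribution over the five objectives. This is a minimal, hypothetical illustration and not Hydra's code. The per-objective `weakness` scores are assumed to lie in [0, 1], for example one minus a normalized validation improvement. The linear `base + scale * weakness` rule for the concentration parameters and every function name below are also assumptions. Raising an objective's concentration makes sampled weight vectors lean toward it while still exploring other trade-offs.

```python
import random

OBJECTIVES = ["runtime", "memory", "cpu_cycles", "throughput", "energy"]

def sample_dirichlet(alphas, rng):
    """One draw from Dirichlet(alphas): normalize independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def adaptive_concentrations(weakness, base=1.0, scale=4.0):
    """Hypothetical rule: the weaker the model is on an objective (weakness in [0, 1]),
    the larger that objective's concentration, so sampled weights lean toward it."""
    return [base + scale * w for w in weakness]

def sample_preference_weights(weakness, rng):
    """Return one weight per objective; the weights always sum to 1."""
    alphas = adaptive_concentrations(weakness)
    return dict(zip(OBJECTIVES, sample_dirichlet(alphas, rng)))

# Example: the model is currently weakest on energy.
rng = random.Random(0)
print(sample_preference_weights([0.1, 0.2, 0.1, 0.1, 0.9], rng))
```

A Dirichlet(α) draw has mean α_i / Σα. With the example weakness vector, energy therefore gets an expected weight of 4.6 / 10.6 ≈ 0.43, and every individual draw still sums to 1.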

Contribution: Built by a 4-person Purdue team (Arjun Gupte, Ahmed Elmersawy, Andre Lee, Stefan Maxim) advised by Prof. James Davis. My role: implementing and stabilizing the DPO training/inference pipeline, the QLoRA fine-tuning setup for Qwen2.5-Coder-7B, the adaptive Dirichlet sampling mechanism for the Python model, the enriched 5-metric PIE data pipeline, and the inference-time evaluation benchmark (Table III results below).
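For reference, the standard DPO loss for one preference pair (prompt x, chosen y_w, rejected y_l) is L = −log σ(β[(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]). The sketch below computes it from summed sequence log-probabilities. In a real pipeline those values would come from the policy being fine-tuned and a frozen reference model; here they are plain numbers so the example runs without a GPU. The default β = 0.1 is a common choice and is not a setting reported for Hydra.

```python
import math

def softplus(z):
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair; arguments are summed log pi(y|x) values under the
    trained policy and the frozen reference model."""
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is computed as softplus(-margin) to avoid overflow.
    return softplus(-margin)

def batch_dpo_loss(pairs, beta=0.1):
    """Mean DPO loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)

print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # ≈ 0.693, i.e. log 2
```

When the policy still matches the reference, the loss is log 2. It drops as the policy raises the chosen completion's likelihood relative to the rejected one.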

Python · PyTorch · DPO · QLoRA/LoRA · Qwen2.5-Coder-7B · SLURM / A100
  • 46.7% latency, 36.0% CPU-cycle, and 36.0% energy reduction on a 7-program held-out Python inference benchmark
  • 35,752 DPO preference pairs built from the enriched PIE Python split, scored across 5 system-level metrics
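One plausible way to turn 5-metric profiles into a DPO pair works in two steps. First, score each candidate by a weighted sum of its per-metric improvements over the original program. Second, keep the best- and worst-scoring candidates as (chosen, rejected). The sketch below is hypothetical and does not describe how the 35,752 pairs were actually built. Its assumptions are the relative-improvement normalization, the treatment of throughput as the only higher-is-better metric, the `min_margin` filter, and all names.

```python
# Hypothetical direction of each metric: True means a smaller value is better.
LOWER_IS_BETTER = {
    "runtime": True,
    "memory": True,
    "cpu_cycles": True,
    "throughput": False,
    "energy": True,
}

def relative_improvement(metric, baseline, value):
    """Signed improvement of `value` over `baseline`, as a fraction of the baseline."""
    if LOWER_IS_BETTER[metric]:
        return (baseline - value) / baseline
    return (value - baseline) / baseline

def scalarize(baseline, candidate, weights):
    """Weighted sum of per-metric relative improvements."""
    return sum(w * relative_improvement(m, baseline[m], candidate[m])
               for m, w in weights.items())

def make_pair(baseline, candidates, weights, min_margin=0.0):
    """Return (chosen_id, rejected_id), or None if the score gap is too small."""
    ranked = sorted(candidates,
                    key=lambda cid: scalarize(baseline, candidates[cid], weights),
                    reverse=True)
    best, worst = ranked[0], ranked[-1]
    gap = (scalarize(baseline, candidates[best], weights)
           - scalarize(baseline, candidates[worst], weights))
    return (best, worst) if gap > min_margin else None

baseline = {"runtime": 1.0, "memory": 100.0, "cpu_cycles": 1000.0,
            "throughput": 10.0, "energy": 5.0}
candidates = {
    "a": {"runtime": 0.5, "memory": 100.0, "cpu_cycles": 500.0,
          "throughput": 20.0, "energy": 2.5},
    "b": {"runtime": 1.2, "memory": 120.0, "cpu_cycles": 1200.0,
          "throughput": 8.0, "energy": 6.0},
}
uniform = {m: 0.2 for m in LOWER_IS_BETTER}
print(make_pair(baseline, candidates, uniform))  # ('a', 'b')
```

With uniform weights, candidate "a" scores 0.5 and "b" scores −0.2, so "a" becomes the chosen completion. Feeding in non-uniform, sampled weights changes which trade-off a given pair rewards.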

Reconstruction IoU undergoes a sharp sigmoidal phase transition at sigmoid stiffness α* = 28.28, with empirical measurements closely matching the fitted theoretical curve.

Variance Collapse & Gate Density Divergence

When you train a neural network, some of its internal units gradually stop learning. For some activation functions this is normal and expected; for others it signals trouble. This project shows that, for the activation and optimizer combinations tested, you can predict which case you are in before training starts, using a property anyone can check in advance.

A hook-based instrumentation framework that recovers the exact gradient gate of any elementwise activation function, plus a predictor derived from BatchNorm's known variance shrinkage under weight decay. For the tested configurations, the predictor uses a single training-free quantity to anticipate whether gate density rises (GELU/SiLU/Mish) or falls (ReLU) during ordinary training. It also explains why that split disappears under AdamW.
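Backpropagation through an elementwise activation y = f(x) multiplies the upstream gradient by f′(x); that factor is the gradient gate. A PyTorch backward hook that sees both `grad_output` and `grad_input` can therefore recover the gate as their elementwise ratio, wherever `grad_output` is nonzero. The framework-free sketch below checks the analytic gates of ReLU, GELU (exact erf form), SiLU, and Mish against a central-difference estimate. The `gate_density` helper is a hypothetical stand-in, defined here as the fraction of units whose gate exceeds a threshold `tau`; the project's own definition of gate density is not given here.

```python
import math

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def _softplus(x):
    # Numerically stable log(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def _phi(x):
    # Standard normal PDF.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _Phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# name -> (f, analytic gate f'); ReLU's gate is 0 at x = 0, matching PyTorch.
ACTIVATIONS = {
    "relu": (lambda x: max(x, 0.0), lambda x: 1.0 if x > 0 else 0.0),
    "gelu": (lambda x: x * _Phi(x), lambda x: _Phi(x) + x * _phi(x)),
    "silu": (lambda x: x * _sigmoid(x),
             lambda x: _sigmoid(x) * (1.0 + x * (1.0 - _sigmoid(x)))),
    "mish": (lambda x: x * math.tanh(_softplus(x)),
             lambda x: math.tanh(_softplus(x))
             + x * (1.0 - math.tanh(_softplus(x)) ** 2) * _sigmoid(x)),
}

def numerical_gate(f, x, h=1e-5):
    """Central-difference estimate of f'(x), i.e. grad_in / grad_out for a unit upstream gradient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def max_gate_error(name, xs):
    """Largest gap between the analytic gate and the numerical estimate over xs."""
    f, f_prime = ACTIVATIONS[name]
    return max(abs(numerical_gate(f, x) - f_prime(x)) for x in xs)

def gate_density(name, xs, tau=0.5):
    """Hypothetical definition: fraction of pre-activations whose gate exceeds tau."""
    _, f_prime = ACTIVATIONS[name]
    return sum(f_prime(x) > tau for x in xs) / len(xs)
```

The smooth gates are not confined to [0, 1]. SiLU's gate, for instance, dips slightly below zero for negative inputs and exceeds 1 for moderately positive ones, so any density threshold has to be chosen explicitly.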

Python · PyTorch · CIFAR-native ResNet/VGG/ViT · SLURM / A100
  • 48/48 architecture-fixed runs confirm the ReLU-vs-smooth-activation gate-density split (sign test p=2.44×10⁻⁴)
  • The same predictor, fed AdamW's measured statistics, correctly anticipates AdamW's different outcome with zero new free parameters
  • Directional claims validated across ~500 total runs spanning CNNs, MLP-Mixer, and Transformer-Encoder architectures on CIFAR, Tiny-ImageNet, and Places365
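A sign test asks how likely it is that this many independent units all point the same way if each were equally likely to go either way. The sketch below computes the exact p-value from the binomial distribution with p = 1/2. The resulting value depends on how runs are grouped into independent units (for example, by configuration rather than by seed), and this summary does not specify that grouping. The function name and the symmetric two-sided convention are assumptions for illustration.

```python
from math import comb

def sign_test_p(successes, n, two_sided=False):
    """Exact sign test under H0: each independent unit goes either way with probability 1/2.
    One-sided: P(X >= successes) for X ~ Binomial(n, 1/2).
    Two-sided: twice the tail beyond the more extreme count, capped at 1."""
    if not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n")
    k = max(successes, n - successes) if two_sided else successes
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail) if two_sided else tail

print(sign_test_p(12, 12))
```

For example, 12 of 12 concordant units give a one-sided p of 2⁻¹² ≈ 2.44×10⁻⁴. The same value arises two-sided for 13 of 13.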

Get in touch

Open to research opportunities.

I'm an undergraduate researcher in Purdue's Duality Lab, advised by Prof. James Davis. I'm interested in research opportunities at the intersection of machine learning, systems, and software optimization. The fastest way to reach me is email.
