Posters - GCASR 2026

Session #2

Room: Northwestern

Foundational Agent Pipeline for Cloud Infrastructure Configuration

Authors: Haozheng Luo, Yimin Wang, Xiangmin Shen, Yan Chen

Institution(s): Northwestern University

Room: Northwestern

Board #: N31

Session #: 2

Abstract: We introduce a Multi-Agent System (MAS) framework grounded in a Cloud Native World Model. The system leverages specialized agents (Config, YAML, Module, Docs) coordinated by a Master Agent to systematically decompose, validate, and synthesize complex cloud configurations. Rather than relying on traditional ground-truth comparison, we implement Dynamic Evaluation via a local Kubernetes environment, where automated deployment feedback and schema validation act as the evaluation oracle. The Environment-Reflection mechanism enables agents to learn from system feedback via a verifier-in-the-loop architecture (VERA), propagating system-level configuration error messages to the agent for iterative diagnosis and resolution. Our enhanced MAS pipeline improves configuration constraint retention by 18.6% over baselines, reduces knowledge staleness errors, and achieves a 73.3% win rate in A/B testing for end-to-end deployment success.

Author Bios:
Haozheng Luo is a Ph.D. candidate in Computer Science at Northwestern University, advised by Prof. Yan Chen. His research focuses on trustworthy and efficient foundation models, including AI safety, reasoning efficiency, quantization robustness, and alignment.
Yimin Wang is a senior undergraduate student studying Data Science at the University of Michigan and Mechanical Engineering at Shanghai Jiao Tong University. She will join UCLA as a Computer Science Ph.D. student advised by Prof. Yangruibo Ding.
Xiangmin Shen is an Assistant Professor of Computer Science at Hofstra University and co-director of the Cybersecurity Innovation & Research Center. He received his Ph.D. in Computer Science from Northwestern University under Prof. Yan Chen.
Yan Chen is a Professor of Computer Science at Northwestern University. His research focuses on network and system security, software security, vulnerability discovery, network measurement, and software-defined networking.

Efficiently Reproducing Distributed Workflows in Notebook-based Systems

Authors: Talha Azaz, Raza Ahmad, Md Saiful Islam, Douglas Thain, and Tanu Malik

Institution(s): University of Missouri, DePaul University, University of Notre Dame

Room: Northwestern

Board #: N17

Session #: 2

Abstract: Notebooks provide an author-friendly environment for iterative development, modular execution, and easy sharing. Distributed workflows are increasingly being authored and executed in notebooks, yet sharing and reproducing them remains challenging. Even small code or parameter changes often force complete re-execution of the distributed workflow, limiting iterative development for such workloads. Current methods for improving notebook execution largely operate on single-node workflows, while optimization techniques for distributed workflows typically sacrifice reproducibility. We introduce NBReplay, an end-to-end system for efficient, reproducible execution of distributed workflows in notebooks. NBReplay consists of two kernels, audit and repeat, which are based on data-flow analysis across cells. The audit kernel avoids redundant computations through task-level caching and incremental, cell-level checkpointing. The repeat kernel replays the cached tasks and reconstructs the checkpoints to enable partial re-execution of notebook cells that manage distributed workflows. Using real-world case studies, we demonstrate that NBReplay significantly improves notebook re-execution and enables portable, cross-site reproducibility of notebook-based distributed workflows on HPC systems, while adding minimal overhead in terms of execution time and storage space.

Author Bios:
Talha Azaz is a Platform & AI Systems Engineer with 5+ years of experience building backend systems, cloud infrastructure, and scientific computing tools that enhance visibility, performance, and reproducibility at scale. He holds a Master's in Computer Science from DePaul University in Chicago.
Raza Ahmad is a Computer Science researcher with interests in Data Science, Data Provenance, Reproducibility, Distributed Systems, Machine Learning, Natural Language Processing, Information Retrieval, and Graph Databases. He is currently pursuing his Ph.D. in Computer Science at DePaul University in Chicago.
Md Saiful Islam is an experienced software developer and has worked with numerous technology stacks. His research interests include distributed systems and computational reproducibility. He is currently a Ph.D. student at the University of Notre Dame in South Bend, Indiana.
Douglas Thain is a Professor of Computer Science at the University of Notre Dame in South Bend, Indiana, where he leads the Cooperative Computing Lab. His research focuses on the design of software systems for large-scale distributed computing.
Tanu Malik is an associate professor in the Department of Electrical Engineering and Computer Science at the University of Missouri, where she leads the RADIANT Lab. Her research focuses on big data management, accountable and reproducible distributed and parallel systems, and trustworthy AI.

A Compiler Framework for Packing Optimization and TPU Acceleration in Fully Homomorphic Encryption

Authors: Zohaib Azam, Akash Madhu, Jianming Tong, Minxuan Zhou

Institution(s): Illinois Institute of Technology, Georgia Institute of Technology

Room: Northwestern

Board #: N4

Session #: 2

Abstract: Fully homomorphic encryption (FHE) is a key enabler for privacy-preserving machine learning (PPML), allowing inference on encrypted data without exposing sensitive inputs. However, compiling PPML workloads under FHE remains challenging: packing - encoding ML tensors into ciphertexts, each storing a 1D vector - and mapping - transforming ML operators into homomorphic operations such as addition, multiplication, and rotation - are tightly coupled, with the chosen packing scheme constraining which mappings are valid. Meanwhile, orchestration of rescaling, relinearization, and data movement must respect the ciphertext's evolving modulus level. Current industry-oriented FHE compilers such as HEIR lack comprehensive packing and mapping optimization support and do not target emerging hardware accelerators like TPUs. We present a compilation framework, integrated into HEIR, that addresses both gaps. First, we introduce intermediate representations that describe general packing and mapping choices, enabling effective optimization of data layouts while integrating naturally with the existing HEIR infrastructure. Second, we develop a new backend targeting CROSS, a TPU acceleration library for CKKS. The backend analyzes the generated layouts and mappings and produces efficient TPU code through a level-aware context API that encapsulates key generation, modulus-aware kernel selection, and accelerator-specific wiring behind a stable compiler surface. Preliminary experiments on encrypted neural-network inference validate our implementation, demonstrating improvements over existing manual approaches. The proposed framework can support various research directions, including scheduling policies for heterogeneous CPU–TPU pipelines, cross-layer operator fusion to reduce memory traffic and launch overhead, and exploration of MPC–FHE hybrid compilation under a unified toolchain.

Author Bios:
Zohaib Azam is a Ph.D. student in Computer Science at Illinois Institute of Technology. His research interests include compilers, software-hardware co-design, and privacy-preserving computing, with a focus on building compiler and system-level support for efficient privacy-preserving machine learning.

Akash Madhu is a master’s student and systems researcher working across two labs, focusing on fully homomorphic encryption for machine learning in the Emerging Computing Lab and performance modeling of Processing-in-Memory architectures in the Programming Languages Research Lab.

Jianming Tong is a 5th-year PhD student and a Student Researcher at Google. He focuses on enabling AI systems for applications beyond AI, such as cryptography, and on using AI to improve kernels and architectures.

Minxuan Zhou is an Assistant Professor in the Department of Computer Science at Illinois Institute of Technology, where he is leading the Emerging Computing Systems (ECS or X) Lab. Prior to that, he obtained a PhD degree in computer science from UC San Diego.

Deepfake Video Detection

Authors: Khalidou Bass, Daniel Moreira, Florence Chee, Eric Chan-Tin

Institution(s): Loyola University Chicago

Room: Northwestern

Board #: N37

Session #: 2

Abstract: This poster explores a novel method to detect whether a video is real or fake based on facial recognition. The dataset consists of videos of celebrities, so a baseline of each face is easy to obtain as ground truth. We use a state-of-the-art facial recognition algorithm to test whether a celebrity's face can be recognized in a video. The intuition is that the face would be recognized in a real video but not in a deepfake video.

Author Bios:
Khalidou Bass is a graduate student in the Department of Computer Science at Loyola University Chicago.
Daniel Moreira is an Assistant Professor in the Department of Computer Science at Loyola University Chicago.
Florence Chee is an Associate Professor in the School of Communication and the Director of the Center for Digital Ethics and Policy at Loyola University Chicago.
Eric Chan-Tin is a Professor in the Department of Computer Science and the Director of the Center for Cybersecurity at Loyola University Chicago.

Beyond Assumptions: Measuring Federated Learning over Real 5G Networks

Authors: Robert J. Hayek, Kayla Comer, Joaquin Chung, Chandra R. Murthy, Rajkumar Kettimuthu, Igor Kadota

Institution(s): Argonne National Laboratory, Northwestern University, Indian Institute of Science

Room: Northwestern

Board #: N30

Session #: 2

Abstract: Federated Learning (FL) deployment on IoT devices is an area poised to benefit significantly from advances in NextG wireless. We deployed an FL application using a 5G-NR Standalone (SA) testbed with open-source and Commercial Off-the-Shelf (COTS) components. The 5G testbed architecture consists of a network of resource-constrained edge devices, namely Raspberry Pis, and a central server equipped with a Software Defined Radio (SDR) and running O-RAN software. Our testbed also allows edge devices to communicate with the server using WiFi or Ethernet instead of 5G. FL is deployed using the Flower FL framework, for which we developed a comprehensive instrumentation tool to collect and analyze diverse communications and machine learning performance metrics, including model aggregation time, downlink transmission time, training time, and uplink transmission time. Leveraging these measurements, we performed a comparative analysis of the FL application across three network interfaces (5G, WiFi, and Ethernet) as well as across 5G bandwidths and uplink-downlink scheduling ratios. Our experimental results challenge some common assumptions about communication time in FL over wireless and address the potential pitfalls of these assumptions. We find that there is a consistent straggler in about 70% of trials, while in the other 30%, high communication time causes competing stragglers. We also compare FL performance over 5G with and without external congestion and compare our testbed to commercial 5G to validate our findings in a broader context. For reproducibility, we have open-sourced our FL application, instrumentation tools, and testbed configuration.

Author Bios:
Robert J. Hayek is a predoctoral appointee within the Data Science and Learning Division at Argonne National Laboratory. His research experience lies in communication systems, specifically quantum communications, 5G cellular networks, federated learning, and synchronization protocol design. He received his M.S. in Electrical Engineering (2025) from Northwestern University, Evanston, IL, USA, and his B.S. in Computer Engineering (2023) from Ohio Northern University, Ada, OH, USA.

Kayla Comer is a PhD Candidate in Electrical Engineering at Northwestern University within the Communications and Networking Laboratory. Her research is at the intersection of communications theory and systems. She received her B.S. in Electrical Engineering from Purdue University, West Lafayette, IN, USA in 2023.

Joaquin Chung is a research scientist at the Data Science and Learning Division at Argonne National Laboratory. His main work focuses on studying architectures for scalable quantum networks. He received his Ph.D. in Electrical and Computer Engineering (2017) at the Georgia Institute of Technology, Atlanta, GA, USA.

Chandra R. Murthy received the Ph.D. degree in Electrical and Computer Engineering from the University of California, San Diego, USA, in 2006. He is a Professor in the Department of Electrical Communication Engineering at the Indian Institute of Science, Bengaluru, India. His research interests are in the areas of sparse signal recovery, energy harvesting-based communication, performance analysis, and optimization of 5G and beyond communications.

Rajkumar Kettimuthu is a senior scientist and group leader at Argonne National Laboratory. He has over 20 years of experience leading large-scale R&D efforts in high-performance computing, AI for science, scientific workflows, and advanced networking, with a recent emphasis on quantum networking and distributed quantum computing. He is a Distinguished Member of ACM, a Senior Member of IEEE, co-author of nearly 200 peer-reviewed publications, and a recipient of an R&D 100 Award.

Igor Kadota is an Assistant Professor of Electrical and Computer Engineering at Northwestern University. He received the Ph.D. degree from MIT LIDS. His research is on modeling, analysis, optimization, and implementation of next-generation communication networks, with an emphasis on advanced wireless systems and time-sensitive applications.

Cross-Architecture GPU Fault Injection at the LLVM IR Level

Authors: Maisy Dunlavy, Zhiling Lan, Michael Papka

Institution(s): University of Illinois Chicago, Argonne National Laboratory

Room: Northwestern

Board #: N9

Session #: 2

Abstract: Modern HPC systems increasingly rely on heterogeneous GPU platforms from NVIDIA, Intel, and AMD, yet most resilience studies remain confined to a single vendor stack. It therefore remains unclear how the same logical fault is handled by different GPU backends. We perform a cross-architecture fault injection study that applies semantically aligned single-bit faults at the LLVM IR level before backend-specific lowering, enabling direct comparison across CUDA, SYCL, and HIP compilation flows. We evaluate representative heterogeneous benchmarks on three production supercomputers, Polaris, Aurora, and Frontier. Our analysis extracts several key insights into GPU resilience and suggests practical guidelines for applying fault mitigation strategies more effectively on heterogeneous GPU platforms.

Author Bios:
Maisy Dunlavy is a Ph.D. student at the University of Illinois Chicago focused on HPC system reliability and resilience.

Zhiling Lan is a professor of Computer Science at the University of Illinois Chicago with a joint appointment at Argonne National Laboratory.

Michael Papka is a Collegiate Warren S. McCulloch Professor of Computer Science at the University of Illinois Chicago with a joint appointment at Argonne National Laboratory.

Efficient Compression of Structured and Unstructured Volumes via Learned 3D Gaussian Representation

Authors: Landon Dyken, Sidharth Kumar

Institution(s): University of Illinois Chicago

Room: Northwestern

Board #: N32

Session #: 2

Abstract: Recent work has shown that implicit neural representations (INRs) can be trained to effectively compress structured and unstructured volume data, allowing for direct data querying with a reduced memory footprint. However, as existing INRs for unstructured volumes do not encode geometry, they require partial mesh storage for later sampling, limiting achievable compression. At the same time, novel view synthesis methods have shown that explicit collections of 3D Gaussians can be used to accurately visualize volume data. In this work, we introduce an explicit model for volume data compression based on 3D Gaussian primitives. We reinterpret collections of 3D Gaussians as an explicit representation of a volume’s scalar field and use a sampling strategy that reconstructs scalar values at spatial locations through weighted aggregation of intersecting Gaussians. We develop optimized CUDA-accelerated pipelines for structured and unstructured model sampling, loss functions that encourage accurate domain encoding by our models, and a novel sampling-error based densification strategy. Our explicit formulation naturally encodes domain geometry, eliminating the need for mesh storage in unstructured volumes and introducing significantly higher compression opportunities. Compared to existing INRs, we demonstrate that our explicit model achieves competitive reconstruction quality with significant training speedups on structured volumes, while markedly outperforming in all metrics on unstructured volumes.

Author Bios:
Landon Dyken is a Ph.D. student in Computer Science at the University of Illinois Chicago, in the Electronic Visualization Lab (EVL) and High-Performance Computing (HPC) groups. Landon's research focus is on the use of machine learning and GPU-accelerated systems for scientific visualization tasks.

Sidharth Kumar is an associate professor in the Department of Computer Science at the University of Illinois Chicago. His research lies at the intersection of high-performance computing (HPC) and data visualization, working on problems in parallel I/O, big data processing, scalable algorithms, scientific visualization, GPUs, and performance modeling.

AI-based Health Companion

Authors: Naila Faiz, Dr. Yi Yang

Institution(s): Northeastern Illinois University

Room: Northwestern

Board #: N27

Session #: 2

Abstract: This project presents the development of an AI-based everyday health companion designed to help individuals monitor and better understand their daily well-being. The system allows users to log health-related data such as mood, stress levels, sleep patterns, symptoms, and lifestyle habits through an interactive interface. The primary objective is to convert user-generated data into meaningful, personalized insights. To achieve this, the system utilizes lightweight artificial intelligence techniques, including Natural Language Processing (NLP) to extract relevant health indicators from unstructured text inputs, and rule-based models combined with time-series analysis to identify behavioral and emotional trends over time. These approaches enable the detection of patterns such as increasing stress, declining mood, and irregular sleep cycles. Based on these patterns, the system generates personalized insights along with a risk assessment score, helping users recognize potential areas of concern. While the system is not intended for medical diagnosis, it focuses on predictive and preventive support by identifying correlations between habits and well-being outcomes. The application is built using a full-stack architecture, consisting of a FastAPI backend, a Streamlit-based frontend, and a PostgreSQL database hosted on Supabase, enabling real-time interaction and scalable data management. Future work includes enhancing predictive capabilities using machine learning models, improving personalization through adaptive user baselines, and expanding the system to incorporate more advanced health metrics for deeper insight generation.

Author Bios:
Naila Faiz is a Master’s student in Computer Science at Northeastern Illinois University, Chicago. Her academic interests include Artificial Intelligence, data-driven systems, and cloud computing. Her current work focuses on developing an AI-based health companion that leverages real-time data analysis and weekly trend modeling to generate personalized health insights.
Dr. Yi Yang earned her Ph.D. in Computer Science and Engineering from Penn State in 2010. She is currently an Associate Professor in the Computer Science Department at Northeastern Illinois University in Chicago, IL. Her research expertise includes Cybersecurity, Computer Networks, and Artificial Intelligence (AI).

The Hidden Cost of Storage Mismatches: Why Faster I/O Per Task Can Slow Down Entire Workflows

Authors: Meng Tang, Luanzheng Guo, Nathan R. Tallent, Anthony Kougkas, Xian-He Sun

Institution(s): Illinois Institute of Technology

Room: Northwestern

Board #: N15

Session #: 2

Abstract: Data-intensive scientific workflows in high-performance computing exchange intermediate data between pipeline stages through shared storage such as parallel file systems, local SSDs, and burst buffers. Existing workflow schedulers select storage to maximize individual task throughput, implicitly assuming that faster I/O per task yields faster end-to-end execution. We show this assumption is wrong. We present an empirical study across two scientific workflows—genomics and climate tracking—on a production HPC cluster. By benchmarking storage systems across operation types, transfer sizes, parallelism levels, and data volumes, we uncover two non-obvious findings. First, data movement between storage tiers scales fundamentally differently from read and write operations: copy throughput plateaus or degrades at parallelism levels where reads and writes continue to scale, making cross-tier staging costs unpredictable from standard I/O benchmarks. Second, producer-consumer task pairs can exhibit performance inversion—configurations that maximize throughput for individual tasks introduce costly inter-stage data movement that increases total workflow time by up to 6.8x compared to storage-aware alternatives. Across both workflows, I/O accounts for 42–98% of total execution time, yet the penalty from storage mismatches between dependent tasks can exceed the compute cost of the tasks themselves. These patterns hold across fundamentally different access profiles, from large sequential transfers to metadata-heavy many-small-file workloads. Our results demonstrate that producer-consumer data dependencies must be treated as first-class constraints in workflow scheduling.

Author Bios:
Meng Tang
PhD Candidate in Computer Science at Illinois Institute of Technology and member of the Gnosis Research Center, advised by Prof. Xian-He Sun. Her research focuses on distributed storage systems, scientific workflow I/O optimization, and containerization for HPC. She has authored work on dataflow-aware workflow optimization (DaYu) and I/O characterization of HPC workflows, with publications at SC, CLUSTER, and IPDPS.

Luanzheng Guo
Computer Scientist in the Future Computing Technologies group at Pacific Northwest National Laboratory, where he works at the intersection of scientific computing, large-scale data management, and AI for Science. He earned his PhD from UC Merced on system resilience in HPC, currently leads an LDRD on generative-AI-based microscope data compression, and is an NSF Trusted Cyberinfrastructure Fellow.

Nathan R. Tallent
Chief Computer Scientist in the Future Computing Technologies group at Pacific Northwest National Laboratory, where he directs the Performance Lab for EXtreme Computing and daTa (PerfLab-EXaCT). His research targets performance measurement, modeling, and optimization for distributed systems, scientific workflows, and machine learning. He earned his PhD from Rice University, was an original developer of HPCToolkit, and received a DOE Early Career Award.

Anthony Kougkas
Associate Research Professor of Computer Science at Illinois Institute of Technology and Deputy Director of the Gnosis Research Center, with a guest scientist appointment at Argonne National Laboratory. His work addresses HPC storage and I/O, multi-tiered architectures, and data management for AI and scientific ML. He has 50+ peer-reviewed publications, $12M+ in federal funding, and Best Paper awards at CCGrid'21 and HPDC'19.

Xian-He Sun
University Distinguished Professor and Ron Hochsprung Endowed Chair of Computer Science at Illinois Institute of Technology, Director of the Gnosis Research Center, and IEEE Fellow. His research spans parallel and distributed processing, memory and I/O systems, software systems for big-data applications, and performance evaluation and optimization. He has mentored many generations of researchers in HPC and storage.

A Joint Evaluation of Deployment Efficiency and Semantic Accuracy in Compact Vision-Language Models on Edge Systems

Authors: Michael E. Papka, Fatima Mora Garcia, Alejandra Rios

Institution(s): University of Illinois Chicago

Room: Northwestern

Board #: N36

Session #: 2

Abstract: Deploying vision-language models (VLMs) on edge systems requires balancing performance with constrained memory, power, and compute resources, particularly in heterogeneous research environments such as the NSF SAGE testbed. SAGE supports large-scale environmental and urban sensing for the scientific community, where image-to-text capabilities enable on-device summarization without transmitting raw visual data. In this setting, selecting an appropriate model requires understanding not only system-level performance, but also how accurately models describe visual content under deployment constraints. In this work, we present a unified evaluation of compact VLMs that jointly considers deployment efficiency and semantic correctness. We evaluate six officially distributed Ollama models (moondream:1.8b-v2, qwen3-vl:2b, qwen2.5vl:3b, llava-phi3:3.8b, gemma3:4b, and minicpm-v:8b) at uniform Q4_K_M quantization across two heterogeneous edge platforms: a Dell Pro Max GB10 and an NVIDIA Jetson Thor. Models are tested using a standardized five-task vision workload over a 100-image COCO val2017 sample, measuring latency, throughput, GPU power, energy, and output reliability. To complement these deployment metrics, we introduce a lightweight, object-grounded semantic analysis that compares generated outputs against COCO entity annotations to assess alignment with ground-truth content. This provides a practical proxy for output quality by estimating how consistently models identify relevant objects across deterministic (temperature=0) generations. We additionally evaluate cross-device reproducibility by computing semantic similarity between paired outputs from the same model on different hardware, providing a measure of how reliably outputs transfer across the heterogeneous SAGE node fleet. 
By integrating system performance with semantic correctness and cross-device reproducibility, we enable a joint analysis of accuracy and efficiency trade-offs across models and devices. Our results highlight meaningful variation in both deployment cost and semantic fidelity across compact VLMs, underscoring the importance of evaluating both dimensions when selecting models for scientific edge deployments.

Author Bios:
Michael E. Papka is the Warren S. McCulloch Professor of Computer Science at the University of Illinois Chicago, where he directs the Electronic Visualization Laboratory (EVL). He holds a joint appointment at Argonne National Laboratory as Senior Scientist, Distinguished Fellow, and Director of the Argonne Leadership Computing Facility. His research spans high-performance computing, large-scale data analysis, and visualization across the computing continuum from edge to HPC.

Alejandra Rios is a senior studying Computer Science at the University of Illinois Chicago with a passion for exploring the intersection of AI and systems performance. As a Research Assistant at the Electronic Visualization Laboratory (EVL), she benchmarks compact vision-language models across edge computing platforms, investigating deployment trade-offs to inform real-world AI applications. Outside of research, she is passionate about building software that is both technically rigorous and meaningfully accessible.

Fatima Mora Garcia is a senior studying Data Science and Computer Science at the University of Illinois Chicago and a first-generation college student interested in expanding access to education through technology. As a Research Assistant at the Electronic Visualization Laboratory (EVL), she works on edge AI applications using NVIDIA Jetson platforms, benchmarking models across devices to understand performance and deployment trade-offs. After graduation, she plans to pursue roles in analytics or product design.

Improving Edge AI Efficiency With High-Accuracy In-Memory Computing

Authors: Ryan Wong, Nabila Tasnim, Arjun Tyagi, Qingyuan Liu, Karan Annam, Saugata Ghose

Institution(s): University of Illinois Urbana-Champaign

Room: Northwestern

Board #: N34

Session #: 2

Abstract: Edge computing platforms are being employed for a wide range of use cases, but strict resource constraints can hamper the capabilities of such platforms. Particularly, as machine learning model sizes and dataset footprints continue to grow, the inefficiencies of conventional computing hardware become intolerable for a range of edge applications (e.g., UAVs, XR, smart sensing). In recent years, there has been significant research on using in-memory computing (IMC; a.k.a. processing-using-memory) to deliver multi-order-of-magnitude improvements in ML inference. However, existing IMC approaches introduce a significant number of computation errors that, while tolerable for several uses of inference, are barriers to safety-critical applications and to in-the-field training. We present a summary of work going on in the ARCANA Research Group at Illinois over the last year, where three separate research thrusts are working to overcome the reliability issues of IMC to make it practical for a much broader spectrum of edge applications. First, we are developing a new hybrid IMC architecture that combines analog IMC for inference with more error-resilient Boolean IMC operations. Second, we are developing a chip-scale architecture that uses emerging memories (ECRAM) to enable in-the-field continual learning, with a focus on SLAM. Third, we are working with device-level experts to co-design error correction techniques that can bring IMC error rates closer to those of conventional CPUs.

Author Bios:
Ryan Wong is a fifth-year Ph.D. student in computer science at the University of Illinois Urbana-Champaign. His research interests are in the broad area of computer architecture, with particular emphasis on memory and storage systems, as well as accelerators for machine learning, scientific computing and database systems. For more information, please visit his website at https://rwong.cs.illinois.edu/

Nabila Tasnim is a second-year Master’s student in computer science at the University of Illinois Urbana-Champaign. Her research lies broadly at the intersection of computer architecture and circuit design, with a particular focus on in-memory computation, hardware and system design for edge devices, and accelerators for in-device training and robotics. For more information, please visit her website at https://sites.google.com/view/nabila-tasnim/

Arjun Tyagi is a second-year Ph.D. student in computer science at the University of Illinois Urbana-Champaign. His research is on computer architecture and memory systems, with a focus on enabling efficient and viable processing-using-memory architectures using emerging memory technologies. For more information, please visit his website at https://arjuntyagi.cs.illinois.edu/

Qingyuan Liu is a second-year Master’s student in electrical and computer engineering at the University of Illinois Urbana-Champaign. His research focuses on next-generation memory systems using emerging devices, with a focus on how to enable reliable circuits and architectures for in-memory computing.

Karan Annam is a junior majoring in electrical engineering at the University of Illinois Urbana-Champaign, with a minor in computer science. His research interests are in building embedded architectures for the evaluation of emerging memory technologies and in-memory computing.

Saugata Ghose is an assistant professor in computer science at the University of Illinois Urbana-Champaign, where he leads the ARCANA Research Group. His current research interests include data-oriented computer architectures and systems, new interfaces between systems software and hardware, energy-efficient memory and storage, and architectures for emerging platforms and domains. For more information, please visit his website at https://ghose.cs.illinois.edu/

Hardware-based Kernel-Bypass Exceptions to Accelerate User-Level Services

Authors: Karl Hallsby, Liam Strand, Peter Dinda

Institution(s): Northwestern University

Room: Northwestern

Board #: N7

Session #: 2

Abstract: Delivering instruction exceptions to user-space code enables a range of services, including floating point tracing and virtualization. Unfortunately, the usual forms of such delivery, signals or even specialized kernel modules, have high latency and overhead. We propose kernel-bypass exceptions (KBEs), hardware support to deliver certain exceptions safely and directly to user-space handlers. We then describe RAFT-V, a proof-of-concept, open-source prototype that extends the SonicBOOM RISC-V core and Linux to implement KBEs. RAFT-V also implements floating point trap support, which was not previously available on RISC-V in any form. RAFT-V lowers the latency and overhead of delivering these and other instruction exceptions to a user-space handler by 30x compared to signals. We use this functionality in RISC-V ports of the open-source FPSpy and FPVM floating point tracing and virtualization tools, reducing their costs by a factor of 3. Our changes to add KBE and floating point trap features to RISC-V only marginally increase the overall hardware footprint and do not affect its critical path. We also discuss additional existing and novel events, such as page faults and value exceptions, that could benefit from KBEs.

Author Bios:
Karl Hallsby is a Ph.D. candidate in Computer Engineering at Northwestern University advised by Dr. Peter Dinda.
Their current work focuses on hardware-software co-design, specifically the interface between processor exceptions and software's handling of them.
They can be found online at https://raven.hallsby.com.

Liam Strand is completing his Master's in Computer Science at Northwestern University, where he researches adaptive hardware performance profiling using ML-driven counter orchestration and hardware accelerator synthesis.
His work spans the full stack, from GPU architecture to Linux kernel internals to distributed systems.
In August 2026, he will join Apple as a GPU Driver Engineer on the Apple Silicon team.
He can be found online at https://liam-strand.github.io/.

Peter Dinda is a professor in the Department of Computer Science at Northwestern University, and also holds an appointment in the Department of Electrical and Computer Engineering.
He works in experimental computer systems.
You can find out more about him at pdinda.org.

Directional Pedestrian Flow Analysis Using Edge Computing: An Educational Framework for AI-Driven Spatial Analytics

Authors: Michael Cortez, Elizabeth Cardoso, Fatima Mora Garcia, Om Patel, JD Pirtle, Project Supervisor: Dr. M Papka

Institution(s): University of Illinois at Chicago, EVL, Argonne, SAGE

Room: Northwestern

Board #: N35

Session #: 2

Abstract: Edge computing has emerged as a transformative paradigm for processing data at its source. The Electronic Visualization Laboratory at the University of Illinois Chicago has been exploring edge computing applications through the SAGE ecosystem, with particular focus on AI-driven spatial analytics. Traditional pedestrian tracking systems typically stream video to centralized servers for processing, raising privacy concerns and requiring substantial network bandwidth. We introduce an edge computing system that tracks walking directions in real time using NVIDIA Jetson Developer Kits. By running AI inference locally instead of in the cloud, the system identifies movement and then deletes all video to keep data private.

Author Bios:
Michael Cortez is a Senior Computer Science student and researcher at the University of Illinois Chicago, specializing in edge computing and spatial analytics. As a Research Assistant at the Electronic Visualization Laboratory (EVL), he develops AI-driven object detection systems on NVIDIA Jetson platforms. His research focuses on creating accessible educational frameworks for real-time machine learning applications.

Elizabeth Cardoso is an undergraduate Computer Science student at the University of Illinois Chicago and a Research Assistant at the Electronic Visualization Laboratory (EVL). Their work focuses on developing resilient edge computing infrastructure and AI-driven sensing platforms for the SAGE ecosystem. They are dedicated to bridging the gap between high-performance machine learning and real-world environmental monitoring through hardware-accelerated systems.

Fatima Mora Garcia is a senior studying Data Science and Computer Science at the University of Illinois Chicago and a first-generation college student interested in expanding access to education through technology. As a Research Assistant at the Electronic Visualization Laboratory (EVL), she works on edge AI applications using NVIDIA Jetson platforms, benchmarking models across devices to understand performance and deployment trade-offs. After graduation, she plans to pursue roles in analytics or product design.

Om Patel is a junior Computer Science student and researcher at the University of Illinois Chicago, specializing in distributed systems and edge computing. As an Undergraduate Research Assistant at the Electronic Visualization Laboratory (EVL), he builds data infrastructure and sensor telemetry platforms for projects on the SAGE testbed. His research focuses on designing lightweight, deployable systems that make real-time data accessible across distributed research networks.

JD Pirtle is a multidisciplinary artist, educator, and researcher based in Chicago, Illinois. Currently a PhD student at the University of Illinois Chicago, his practice spans traditional and new media, including creative coding, computational design, and ceramics. His academic research explores the political economy of education technology, critical pedagogy, and the integration of liberatory practices like Maker Education within contemporary learning environments.

Michael E. Papka is the Warren S. McCulloch Professor of Computer Science at the University of Illinois Chicago and Director of the Electronic Visualization Laboratory. He holds a joint appointment as a Senior Scientist and Distinguished Fellow at Argonne National Laboratory, where he leads the Argonne Leadership Computing Facility. His research focuses on high-performance computing, data visualization, and the integration of edge computing across the scientific continuum.

Leaky Language Models: Leaking Architectural and Deployment Information from Production Language Models through Timing Side Channels

Authors: Sadegh Majidi, Kazem Taram

Institution(s): Purdue University

Room: Northwestern

Board #: N12

Session #: 2

Abstract: This work presents LeakyLMs, a set of attacks that leak proprietary model, architecture, and deployment information from production language models. LeakyLMs is the first to demonstrate that key model and deployment details can be inferred using only token generation timing, even when interacting through remote APIs. LeakyLMs introduces two core attacks. The first attack targets inference optimizations and deployment strategies. For example, our attack detects whether a provider uses speculative decoding, a widely deployed inference-time optimization, and further identifies the context length of the draft model used in the pipeline. Our measurements show that Google Gemini Flash 2.5 uses speculative decoding with a draft context window of approximately 128K tokens. The second attack recovers key architectural properties, including the number of transformer layers, hidden dimension size, and number of attention heads. To achieve this, LeakyLMs builds a detailed and accurate model of token-generation timing on modern NVIDIA GPUs, characterizing how latency scales with model configuration and hardware parameters. The attack then performs a search over the architecture space using this timing model. In experiments with Llama models, the near-correct architectural configuration appears in the top-10 guesses more than 90% of the time.

Author Bios:
Sadegh Majidi is a second-year PhD student in Computer Science at Purdue University, advised by Prof. Kazem Taram. His research focuses on the intersection of computer security, machine learning systems, and modern computer architectures.

Kazem Taram is an assistant professor at Purdue University. His research interests are in computer architecture and computer security.

Parallel Batch Breadth-First Search: Framework and Applications

Authors: Letong Wang; Yan Gu; Yihan Sun

Institution(s): DePaul University; UC Riverside

Room: Northwestern

Board #: N19

Session #: 2

Abstract: Breadth-First Search (BFS) is a fundamental graph-processing tool underlying numerous applications. Many such applications require running BFS from multiple sources, which we refer to as the Multi-BFS (MBFS) problem. MBFS has been studied in various specific applications, such as distance oracles, closeness centrality, and eccentricity, among many others. Since each BFS may require traversing the entire graph, MBFS is inherently computationally expensive, requiring various forms of parallelism to improve performance. In this paper, we propose BitFluX, a general and efficient parallel MBFS framework, including its algorithms, analysis, implementations, and how it can be adapted to a broad range of applications. We leverage both thread- and bit-level parallelism, both effectively contributing to its high performance. While MBFS and its parallelization have been studied in the literature, prior works are usually tailored to specific use cases. Our work decouples the core MBFS algorithm from downstream tasks, making it a primitive that is readily extendable to multiple applications. To demonstrate its practical utility, we adapt the framework to three example applications: landmark labeling, eccentricity, and closeness centrality. Based on the framework, we also show a general theoretical analysis for MBFS. We test the performance of BitFluX using both microbenchmarks and end-to-end application comparisons against existing work. The microbenchmark results show that our performance gains compound multiplicatively compared to prior work using solely bit- or thread-level parallelism, indicating highly efficient utilization of each form of parallelism. For the applications, our general framework is competitive and often outperforms baselines tailored for these applications.
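To make the bit-level side of this idea concrete, here is a minimal sequential sketch of a bit-parallel multi-source BFS. It is not BitFluX's implementation: the function name `multi_bfs`, the adjacency-list input, and the per-vertex bitmask layout are all assumptions made for illustration, and thread-level parallelism is left out entirely. Each vertex stores one integer whose bit i records whether source i has reached it, so a single bitwise operation advances many searches at once.

```python
def multi_bfs(adj, sources):
    """Run BFS from every source at once using per-vertex bitmasks.

    adj[v] lists the neighbours of vertex v. Bit i of seen[v] is set
    once the search from sources[i] has reached v.
    Returns dist, where dist[i][v] is the hop count from sources[i]
    to v, or -1 if v is unreachable.
    """
    n = len(adj)
    dist = [[-1] * n for _ in sources]
    seen = [0] * n
    frontier = [0] * n
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        frontier[s] |= 1 << i
        dist[i][s] = 0

    level = 0
    while any(frontier):
        level += 1
        nxt = [0] * n
        for v in range(n):
            bits = frontier[v]
            if not bits:
                continue
            for u in adj[v]:
                # Searches that sit on v but have not reached u yet.
                new = bits & ~seen[u]
                if new:
                    nxt[u] |= new
                    seen[u] |= new
        # Record the level for every (source, vertex) pair discovered now.
        for u in range(n):
            bits, i = nxt[u], 0
            while bits:
                if bits & 1:
                    dist[i][u] = level
                bits >>= 1
                i += 1
        frontier = nxt
    return dist


# Path graph 0 - 1 - 2 - 3, searched from both ends.
path = [[1], [0, 2], [1, 3], [2]]
print(multi_bfs(path, [0, 3]))  # [[0, 1, 2, 3], [3, 2, 1, 0]]
```

In this sketch the inner edge loop does the same amount of work whether one search or many share a frontier vertex, which is where the bit-level saving comes from.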

Author Bios:
Letong Wang is an Assistant Professor in the Jarvis College of Computing and Digital Media at DePaul University. She received her Ph.D. from UC Riverside and her B.S. from ShanghaiTech University, and her research focuses on parallel algorithms and systems that are efficient in practice, theoretically grounded, and broadly applicable.

Yan Gu is an Associate Professor in the Computer Science and Engineering Department at UC Riverside. He received his Ph.D. from Carnegie Mellon University and his B.S. from Tsinghua University, and his research focuses on simple and efficient algorithms with strong theoretical guarantees and practical performance.

Yihan Sun is an Associate Professor in the Computer Science and Engineering Department at UC Riverside. She received her Ph.D. from Carnegie Mellon University and her B.S. from Tsinghua University, and her research spans the theory and practice of parallel computing, with a focus on bridging theoretical guarantees and practical performance.

Evaluation of Loss Functions under Fully Homomorphic Encryption for Multiclass Classification

Authors: Sami Ouabrk, Cesar Agullo Quiros, Minxuan Zhou, Yutong Wang

Institution(s): Illinois Institute of Technology

Room: Northwestern

Board #: N3

Session #: 2

Abstract: Privacy-preserving machine learning as a service allows data owners to encrypt their data and delegate model training to an untrusted server using fully homomorphic encryption (FHE). While most prior work has focused on private inference, private training remains challenging due to the high computational cost of non-linear operations such as exponentiation and division. A key design choice is the loss function, which directly impacts both model accuracy and FHE efficiency. However, the trade-offs between square loss and cross-entropy loss in encrypted training have not been systematically studied. We present an empirical comparison of multiclass square loss and cross-entropy loss under the CKKS scheme. For square loss, we implement full-batch gradient descent with ciphertext refresh via bootstrapping. For cross-entropy, we use a normalize-and-square softmax approximation that replaces costly exponentiation and division with low-degree polynomial approximations and SIMD rotations. We evaluate accuracy, multiplicative depth, and runtime across multiple benchmark datasets. We show that square loss requires significantly less multiplicative depth but converges more slowly, whereas cross-entropy achieves higher accuracy at the cost of deeper circuits and increased computational complexity. This work provides practical guidelines for selecting loss functions for privacy-preserving training, highlighting the trade-off between accuracy, efficiency, and circuit depth.

Author Bios:
Sami Ouabrk is an AI master’s student at Illinois Institute of Technology specializing in privacy-preserving machine learning and efficient deep learning. His research spans multimodal systems, self-supervised learning, ML systems optimization, and software–hardware co-design, with a focus on building intelligent, goal-driven AI systems.

César Agulló is an AI master’s student at Illinois Institute of Technology with a background in Computer Engineering and Business Administration. His interests include artificial intelligence and its applications in real-world systems.

Minxuan Zhou is an Assistant Professor in the Department of Computer Science at Illinois Institute of Technology, where he leads the Emerging Computing Systems (ECS) Lab. His research focuses on computer architecture, software–hardware co-design, machine learning acceleration, and privacy-preserving computing.

Yutong Wang is an Assistant Professor of Computer Science at Illinois Institute of Technology and a member of the IDEAL Institute. His research focuses on machine learning theory, overparameterized models, uncertainty estimation, and privacy-preserving AI.

On the Feasibility of Time-Multiplexed Control for Surface Code Fault Tolerant Computation

Authors: Konstantinos-Nikolaos Papadopoulos, Kaitlin N. Smith, Jakub Szefer

Institution(s): Northwestern University

Room: Northwestern

Board #: N1

Session #: 2

Abstract: Scaling superconducting quantum processors to the millions of qubits required for useful quantum applications is hindered by the prohibitive number of control lines needed for qubit control and readout. Time Domain Multiplexing (TDM) mitigates this wiring overhead by allowing multiple qubits to share control lines, albeit at the cost of reduced parallelism and increased idle time. In this work, we investigate the impact of TDM on all the levels in the surface code fault tolerance stack. We analyze the effect of multiplexing on syndrome extraction, lattice surgery, and magic state distillation. We further evaluate whether multiplexed control hardware can also be used for multiplexing syndrome extraction and calibration, allowing fault-tolerant execution to continue uninterrupted across the periodic recalibration that would otherwise pause execution. Finally, during evaluation, we account for realistic demultiplexer switches with limited port-to-port isolation, which induce crosstalk on idling qubits sharing the same switch. Together, these results enable a systematic analysis of the trade-offs between wiring overhead and fault-tolerant performance, providing guidance for the design of scalable superconducting fault-tolerant quantum architectures.

Author Bios:
Konstantinos-Nikolaos Papadopoulos is a first-year PhD student at Northwestern University advised by Prof. Jakub Szefer, broadly interested in quantum computing architectures and quantum computing security.

Kate Smith is an Assistant Professor of Computer Science at Northwestern University.
Within the scope of quantum computing, her research interests include system architecture, optimized compilation, context-aware error mitigation, distributed systems, high-dimensional / qudit processing, device characterization, and circuit simulation.

Jakub Szefer is an Assistant Professor of Electrical and Computer Engineering at Northwestern University and leads the Computer Architecture and Security Laboratory (CASLAB) with his research broadly focusing on computer security, with special interest in architectures and hardware for securing computer systems: from classical CPUs, GPUs, and FPGAs, to the nascent Quantum Computing systems.

Label-Flip Attack Detection via Trust-Weighted Aggregation in Federated Learning for Underground Mine Security

Authors: Md. Sazedur Rahman and Sanjay Madria

Institution(s): Missouri University of Science and Technology, Rolla, MO.

Room: Northwestern

Board #: N25

Session #: 2

Abstract: Underground mining operations are increasingly dependent on autonomous vehicles, robotic drilling systems, and intelligent inspection platforms operating in confined, GPS-denied tunnel environments. These systems rely on distributed perception models to interpret navigation cues, hazard warnings, and environmental signals in real time. While centralized deep learning can enhance model performance, transferring raw operational data across mining sites introduces serious confidentiality and security risks. Federated Learning (FL) offers a privacy-preserving alternative by enabling collaborative model training without sharing local datasets. However, deploying FL in underground mining introduces several critical challenges: (i) Training labels may be modified either maliciously by compromised clients or inadvertently due to harsh underground conditions such as low illumination or physical distortion of safety signs. Such alterations can result in label-flipping (LF) attacks, wherein corrupted semantic information propagates to the global model through poisoned local updates. (ii) Mining data is inherently heterogeneous and non-IID due to variations in tunnel shapes or uneven terrains, making malicious updates difficult to distinguish from natural distribution shifts. (iii) Underground communication networks are bandwidth-constrained, limiting the feasibility of computationally intensive cryptographic defenses or repeated validation procedures. To address these challenges, we propose TrustFed, a lightweight and unsupervised defense framework tailored for underground mining environments. TrustFed first filters client updates by detecting abnormal last-layer gradient norms and then performs clustering in gradient space to identify adversarial behavior. Clients are assigned adaptive trust scores based on their proximity to cluster centroids, and global aggregation is conducted using a soft trust-weighted mechanism that suppresses malicious contributions while preserving informative updates. To facilitate realistic evaluation, we further introduce MineSigns, a vision-based dataset capturing 13 safety-critical signage classes under authentic tunnel conditions, including low illumination and structural irregularities. Extensive experiments on MineSigns and standard benchmarks demonstrate that TrustFed effectively mitigates LF attacks and significantly improves global model robustness under both IID and non-IID mining scenarios.
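The filter-then-weight pattern described in the abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not TrustFed itself: the function name `trust_weighted_aggregate`, the `norm_factor` threshold, the exponential trust formula, and the use of a single coordinate-wise median in place of real clustering are all hypothetical choices made to keep the example short.

```python
import math
from statistics import median


def l2(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def trust_weighted_aggregate(updates, norm_factor=3.0):
    """Aggregate client update vectors with soft trust weights.

    Assumed simplifications: a norm filter relative to the median norm,
    one centroid (coordinate-wise median) standing in for clustering,
    and trust that decays exponentially with distance to that centroid.
    Returns (aggregated_update, trust_weights_of_kept_clients).
    """
    # Step 1: discard updates whose norm is abnormally large.
    norms = [l2(u) for u in updates]
    cutoff = norm_factor * median(norms)
    kept = [u for u, n in zip(updates, norms) if n <= cutoff]

    # Step 2: robust centroid of the surviving updates.
    dim = len(kept[0])
    centroid = [median(u[j] for u in kept) for j in range(dim)]

    # Step 3: soft trust scores, closer to the centroid means more trust.
    dists = [l2([a - c for a, c in zip(u, centroid)]) for u in kept]
    scale = median(dists) or 1.0  # avoid dividing by zero
    raw = [math.exp(-d / scale) for d in dists]
    total = sum(raw)
    trust = [r / total for r in raw]

    # Step 4: trust-weighted average instead of a plain mean.
    agg = [sum(t * u[j] for t, u in zip(trust, kept)) for j in range(dim)]
    return agg, trust


# Three consistent clients, one sign-flipped client, one huge-norm client.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [-1.0, -1.0], [50.0, 50.0]]
aggregate, trust = trust_weighted_aggregate(updates)
print(aggregate, trust)
```

In this toy input the huge-norm update is removed by the filter, while the sign-flipped update survives the filter but receives a trust weight close to zero, so it barely moves the aggregate. Soft weighting of this kind avoids a hard accept/reject decision for clients whose data is merely unusual.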

Author Bios:
Md. Sazedur Rahman is a PhD candidate in the Computer Science Department at Missouri University of Science and Technology, where his research focuses on secure and robust federated learning for safety-critical underground mining environments. His work addresses adversarial threats such as label-flipping and unreliable clients in highly non-IID settings, with the goal of building privacy-preserving and trustworthy AI systems for real-world industrial applications.

Sanjay K Madria is a Curators’ Distinguished Professor in the Department of Computer Science at the Missouri University of Science and Technology (formerly, University of Missouri-Rolla, USA).

Reducing the "Drag" of GPU Memory Accesses with Slipstreaming

Authors: Connor Selna, Atmn Patel, Vijay Kandiah, Nikos Hardavellas

Institution(s): Unaffiliated, Nvidia, Nvidia, Northwestern University

Room: Northwestern

Board #: N11

Session #: 2

Abstract: Memory accesses have long been the primary bottleneck in computation, and their performance relative to pure-compute operation grows worse year over year. This is a well-known and widely explored problem with a rich history of mitigation strategies, the most instrumental of which is the memory hierarchy, and, as an extension, the inclusion of on-die SRAM caches. These techniques facilitate higher utilization of compute hardware in Graphics Processing Units (GPUs), increasing performance. GPUs leverage their SIMT topology to coalesce loads at the warp-granularity; separate threads within the same warp performing accesses to the same cache line avoid generating duplicate requests to that line. This reduces contention in the memory system, improving compute throughput. Scheduling policies and optimizations are applied to order the execution of warps in pursuit of the same goal: maintaining high shader utilization. Through leveraging information from issued memory access instructions to inform a combined scheduling and cache replacement policy, slipstreaming encourages lock-stepped warp execution and mitigates thrashing by preventing the eviction of L1 cache lines common to contiguous warps in an SM. This optimization requires just 160KB of state (less than 1% of the area of L2) and offers a 2% geomean speedup in GPU-wide IPC. The co-designed Slipstream-Clustering Loose Round Robin Scheduling and Slipstream-Clustering Least Recently Used Caching Policies emulate inter-warp coalescing, reducing expensive accesses to L2 and Global Memory structures. In this paper, we demonstrate the latent potential for coalescing in GPU workloads. We investigate the properties of opportunities for coalescing, present and evaluate slipstreaming, and demonstrate its low cost.
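As a rough illustration of what "latent potential for coalescing" can mean, the sketch below counts memory requests for a set of warps under three assumptions: no coalescing, coalescing within each warp, and an idealized upper bound where warps touching the same cache line share one request. The function names and the 128-byte line size are assumptions for this example; the sketch does not model the scheduler, the cache replacement policy, or timing.

```python
def lines_touched(addresses, line_bytes=128):
    """Set of cache-line indices covered by a list of byte addresses."""
    return {addr // line_bytes for addr in addresses}


def coalescing_summary(warps, line_bytes=128):
    """Count requests under three coalescing assumptions.

    warps is a list of warps, each a list of per-thread byte addresses.
    Returns (per_thread, per_warp, across_warps):
      per_thread   - one request per thread access (no coalescing)
      per_warp     - one request per distinct line within each warp
      across_warps - one request per distinct line over all warps,
                     an idealized bound on inter-warp coalescing
    """
    per_thread = sum(len(w) for w in warps)
    per_warp = sum(len(lines_touched(w, line_bytes)) for w in warps)
    all_lines = set().union(*(lines_touched(w, line_bytes) for w in warps))
    return per_thread, per_warp, len(all_lines)


# Two hypothetical 4-thread warps that read neighbouring parts of one line.
warps = [[0, 4, 8, 12], [64, 68, 72, 76]]
print(coalescing_summary(warps))  # (8, 2, 1)
```

The gap between the second and third numbers is the headroom that an inter-warp technique could target in principle; how much of it is reachable depends on warps executing close together in time.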

Author Bios:
Connor Selna is an independent researcher focused on GPU architecture, memory-system behavior, and scheduling optimizations for massively parallel processors. He previously worked as a graduate research assistant and teaching assistant at Northwestern University, where he studied GPU architecture and optimizations. He holds an M.S. in Computer Science.

Atmn Patel is a Python Compiler Engineer at NVIDIA. He received his master’s degree in Computer Science from Northwestern University, where his research focused on architectural support for memory optimizations.

Vijay Kandiah is a senior compiler developer at NVIDIA in the HPC Compilers Group. He received his PhD in Computer Engineering from Northwestern University in 2023.

Nikos Hardavellas is a professor of Computer Science and Computer Engineering at Northwestern University, where he directs the Parallel Architecture Group at Northwestern, PARAG@N. His research focuses on computer architecture, specifically at the intersection of computer architecture with the computer systems stack (programming languages, compilers, operating systems), memory systems, nanophotonics, energy-efficient computing and quantum computing systems.

The Hidden Cost of Storage Mismatches: Why Faster I/O Per Task Can Slow Down Entire Workflows

Authors: Meng Tang, Luanzheng Guo, Nathan R. Tallent, Anthony Kougkas, Xian-He Sun

Institution(s): Illinois Institute of Technology

Room: Northwestern

Board #: N15

Session #: 2

Deployment-Agnostic Indoor Positioning Using BLE-Based Localization in Smart Spaces

Authors: Ariunaa Tsegmed, Ahmed Khaled

Institution(s): Northeastern Illinois University

Room: Northwestern

Board #: N33

Session #: 2

Abstract: Indoor positioning remains challenging due to the absence of GPS and the variability of Bluetooth Low Energy (BLE) signals. RSSI measurements are inherently noisy and unstable, making distance-based localization methods unreliable in indoor environments. This work presents an indoor positioning system that models BLE signals as fixed-length vectors and estimates user location using similarity-based matching. Instead of converting RSSI values into distance, the system captures signal patterns across multiple beacons and compares them with precomputed fingerprint vectors corresponding to spatial grid cells. Cosine similarity is used to identify the most likely location, making the approach robust to signal variability. The system is implemented as a microservice-based architecture. Edge devices collect BLE, QR, and NFC data and send them to a central service, where positioning is computed and location-based content is delivered in real time. QR and NFC interactions provide high-confidence signals, while BLE provides continuous observations. Fingerprint vectors are generated using a signal propagation model, eliminating the need for manual data collection. The system is deployment-agnostic and can be applied to new environments by updating configuration files such as beacon positions and layout, without modifying core logic. A simulation environment is used to generate realistic user interactions and evaluate system behavior. The system provides a foundation for future work in adaptive fingerprint refinement, multi-modal data integration, and uncertainty-aware localization.
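The matching step described above can be sketched as follows. The fingerprint values, grid-cell names, and the function `locate` are hypothetical, and the sketch assumes that signal strengths have already been mapped to non-negative values with one fixed slot per beacon; how the real system normalizes RSSI or handles missing beacons is not specified here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no signal at all: treat as no similarity
    return dot / (norm_a * norm_b)


def locate(observation, fingerprints):
    """Return the grid cell whose fingerprint best matches the observation."""
    return max(
        fingerprints,
        key=lambda cell: cosine_similarity(observation, fingerprints[cell]),
    )


# Hypothetical precomputed fingerprints: one normalized strength per beacon.
FINGERPRINTS = {
    "A1": [0.9, 0.2, 0.1],
    "B2": [0.2, 0.9, 0.3],
    "C3": [0.1, 0.3, 0.9],
}

print(locate([0.8, 0.3, 0.2], FINGERPRINTS))  # A1
```

Because cosine similarity compares the direction of the vectors and ignores their overall magnitude, a uniform drop in received strength across all beacons leaves the match unchanged in this sketch.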

Author Bios:
Ariunaa Tsegmed is a Computer Science student at Northeastern Illinois University, focused on IoT systems, distributed architectures, and real-time data processing. Their work includes developing SmartMQTT, a real-time IoT data platform, and SmartLBS, an indoor positioning system based on BLE signal fingerprinting.

Ahmed Khaled is an Associate Professor at Northeastern Illinois University. He earned his Ph.D. in Computer Engineering from the University of Florida in 2018. His research focuses on IoT, distributed systems, and data modeling and analysis.

AI for Scientific Integrity: Detecting Paper-Mill Publications Through NLP and Scholarly Metadata

Authors: Alessandra Valentina Vellucci Solari, Emil Shahbazov

Institution(s): Loyola University Chicago

Room: Northwestern

Board #: N28

Session #: 2

Abstract: The proliferation of paper mills, organizations that produce fraudulent scientific manuscripts for profit, poses a growing threat to the integrity and reliability of academic publishing. Detecting such publications remains challenging because paper-mill articles often imitate legitimate research structures, language, and citation patterns. This work explores an artificial intelligence approach to support scientific integrity by identifying potential paper-mill publications using natural language processing (NLP) and scholarly metadata. Our research is currently in progress and focuses on building a large-scale dataset combining suspicious and legitimate scientific publications. Retracted papers associated with potential paper-mill activity are collected from publicly available retraction databases, while a large set of non-retracted academic papers is being gathered through the OpenAlex snapshot API server to represent legitimate scientific output. These datasets are used to construct a binary classification task distinguishing fraudulent or suspicious publications (positive class) from legitimate research papers (negative class). To model textual patterns in scientific writing, we are fine-tuning a BERT-based transformer model to analyze article titles, abstracts, and related textual content. In addition to linguistic signals captured through NLP, selected scholarly metadata features are incorporated to capture structural patterns that may be indicative of paper-mill production. By combining transformer-based language representations with metadata analysis, this project aims to identify subtle signals of fraudulent publication practices. The long-term goal is to develop AI-assisted tools that can help editors, publishers, and research integrity investigators prioritize suspicious manuscripts for further review and strengthen safeguards for scientific publishing.

Author Bios:
I’m Alessandra Vellucci, a Master’s student in Computer Science at Loyola University Chicago, specializing in Artificial Intelligence. I work as a Graduate Research Assistant focused on AI for scientific integrity, where I study how to detect paper-mill publications using NLP and scholarly metadata. I’m passionate about building ethical AI solutions that have real-world impact.
My name is Emil Shahbazov. I am a Master’s student at Loyola University Chicago, majoring in Computer Science.

Improving automatic verification of NL2SQL queries with reinforcement learning

Authors: Donghyun Sohn, Junho Won

Institution(s): Northwestern University

Room: Northwestern

Board #: N14

Session #: 2

Abstract: Current work on Natural Language-to-SQL (NL2SQL) models and systems largely focuses on improving end-to-end generation and execution accuracy of SQL queries, as measured by popular benchmarks such as BIRD-Bench and Spider. However, execution accuracy alone is insufficient in practice because the ground-truth correctness is not always known or clear to the data analyst. In our ongoing work, we propose an LLM SQL verifier as an important part of the NL2SQL workflow. At a basic level, verifiers are an LLM-as-a-judge approach to evaluate the correctness of the generated SQL queries. They use the natural language, schema, and other auxiliary information, but do not involve comparing against ground truth as in the above benchmarks. With better verifiers, we can improve the generation accuracy for NL2SQL systems, using them as selectors for multiple generated queries and thus augmenting prior multiple-generation approaches. In a different direction, they can be used to provide a confidence metric in real-life database use cases where ground truths are not known to the data analyst, which is a point typically neglected in current literature. To this end, we present methods to improve the accuracy of verifiers, in particular through reinforcement learning (GRPO). Finally, we discuss our work on going beyond single-model verifiers and building agentic frameworks using verifiers to automate NL2SQL generation. We plan to further develop approaches using iterative verification and refinement for NL2SQL generation without relying on execution verification.

Author Bios:
Donghyun Sohn is a 4th-year Computer Science PhD student at Northwestern University, working with Prof. Jennie Rogers on privacy-preserving database systems. His research sits at the intersection of cryptographic protocols and database query optimization, with a particular interest in how hardware properties—memory hierarchy, thread allocation, and accelerator topology—shape the design of efficient secure analytics engines.

Junho Won is a 6th-year Mathematics PhD student at Northwestern University, working with Prof. Bao Le Hung on number theory. His research focuses on geometric structures arising in the p-adic Langlands program and connections to representation theory. Recently, he has also worked on applications of LLMs to mathematics problem solving, and database systems.

HyPIM: Accelerating Hyperbolic Machine Learning via Processing-In-Memory

Authors: Jinlin Wu, Ferit Ozdaban, Selvanathan Sadhasivam, Xiaoyang Lu, Yunhui Guo, Minxuan Zhou

Institution(s): Illinois Institute of Technology, University of Texas at Dallas

Room: Northwestern

Board #: N8

Session #: 2

Abstract: Recent research has demonstrated that hyperbolic machine learning, which utilizes hyperbolic geometry to represent data and define arithmetic operations, can substantially enhance the efficiency and accuracy of machine learning tasks on hierarchical data. Unfortunately, the existing machine learning accelerators cannot efficiently support hyperbolic operations, which require memory-intensive data shuffling and reduction operations, significantly degrading the utilization of arithmetic-intensive hardware. In this work, we propose HyPIM, a novel hardware-software co-design that exploits processing-in-memory (PIM) for accelerating memory-intensive operations in a heterogeneous system. Our experiments on real-world hyperbolic machine learning models show that HyPIM significantly improves the performance of state-of-the-art GPUs by 3×.

Author Bios:
Jinlin Wu is a Ph.D. student in the Department of Computer Science at Illinois Institute of Technology. His research focuses on computer architecture, machine-learning systems, and hardware/software co-design for efficient AI acceleration.
Ferit Ozdaban is a student in Computer Science and Artificial Intelligence at Illinois Institute of Technology.
Selvanathan Sadhasivam is a graduate student in the Department of Computer Science at Illinois Institute of Technology.
Xiaoyang Lu is an Associate Professor in the Department of Computer Science at Illinois Institute of Technology. His research interests include computer architecture, high-performance computing, memory systems, and data-centric system design.
Yunhui Guo is an Associate Professor in the Department of Computer Science at the University of Texas at Dallas. His research interests include computer vision, machine learning, data-efficient learning, transfer learning, and geometric learning.
Minxuan Zhou is an Associate Professor in the Department of Computer Science at Illinois Institute of Technology. His research focuses on computer architecture, hardware/software co-design, emerging memory technologies, and efficient AI systems.

CASTLE: Collaborative Analytics via Secure, Trustworthy and Scalable Query Evaluation

Authors: Xiling Li, Jennie Rogers

Institution(s): Northwestern University

Room: Northwestern

Board #: N6

Session #: 2

Abstract: Businesses and the public sector are increasingly engaging in collaborative data analytics on the union of their private data. One approach is a private data federation (PDF) with secure multiparty computation (MPC). This enables them to jointly evaluate queries without divulging anything except the query answers. Although prior works employed maliciously secure protocols for query evaluation, they do not provide end-to-end guarantees because they allow local computation for some partitions in plaintext, without verifying that their intermediate results are correct and complete with respect to their private databases. Moreover, these systems did not scale to large data volumes or numerous parties in MPC. To this end, we propose CASTLE, a collaborative analytic system with secure, trustworthy, and scalable query evaluation. First, we use zero-knowledge proofs (ZKPs) to verify the integrity of local evaluation. Then we evaluate the remaining operators under MPC. To securely bridge them, we design an authenticated protocol that verifies ZKPs' outputs, ensuring end-to-end trustworthiness from the inputs to the outputs of all data providers. In addition, we employ a dealer-aided MPC backend. The dealer generates and distributes pseudo-randomness, thereby avoiding overwhelming communication overhead and enabling scalability to more computing parties in practice. Moreover, we incorporate state-of-the-art ZK protocols to achieve linear-time verification complexity. Taken together, this system enhances scalability with respect to both data sizes and the number of nodes. Finally, we demonstrate up to an 18.7× speedup over the full MPC baseline on the TPC-H benchmark.

Author Bios:
Xiling Li is a Ph.D. candidate in computer science at Northwestern University advised by Dr. Jennie Rogers. Dr. Jennie Rogers is an associate professor of computer science at Northwestern University. Both of them work on secure and private data management.

Security Challenges in AI-based Quantum Control and Error Correction

Authors: Anthony Etim and Jakub Szefer

Institution(s): Yale University

Room: Northwestern

Board #: N2

Session #: 2

Abstract: AI-based quantum control and error correction is increasingly used in quantum computing systems to improve multi-qubit readout discrimination and to mitigate correlated readout errors. As such, AI is becoming an integral component of today’s quantum computers’ control and readout stacks. Despite this growing role, the security implications of AI in these systems remain largely unexamined, and the variety of possible security attacks is currently overlooked by the vast majority of quantum computing research. This work brings to light the susceptibility of AI-based control and error correction to a variety of security attacks. In particular, AI-based classifiers can be vulnerable to physical fault-injection attacks, which can result in the generation of incorrect readout results from quantum computers. More broadly, fault-injection or side-channel attacks on AI algorithms, whether realized in software or hardware of the control stack, can cause a range of problems, from integrity violations to loss of intellectual property or exposure of sensitive data.

Author Bios:
Anthony Etim is a fifth year Ph.D. student at Yale University, advised by Professor Jakub Szefer. His research interests lie in machine learning security and hardware security, as well as attacks and defenses in machine learning accelerators and algorithms to exploit vulnerabilities and leak sensitive information.

Restructuring Workflows for Large-Scale High Energy Physics Data Analysis

Authors: Barry Sly-Delgado

Institution(s): University of Notre Dame

Room: Northwestern

Board #: N5

Session #: 2

Abstract: Workflow management systems allow users to create large-scale distributed scientific applications. Different systems provide different paradigms for translating user-provided code or specifications into a workflow; some, such as those used in the domain of high energy physics (HEP), provide libraries in languages such as Python. In the HEP domain, users provide functions that represent tasks in a broader DAG. However, the initial topology of the DAG may inhibit the performance of the workflow during runtime and is often not visible to the user. The resulting inefficiencies include increased task overhead, lack of concurrency, and susceptibility to faults due to the high granularity, which causes distribution. Notably, a graph can be restructured, producing an identical result with a better topology that can eliminate these issues. This work provides the methodology for restructuring workflows via task merging, which combines tasks into a single executable entity, and task reshaping via labeling, which informs the underlying workflow stack of the algebraic properties of a function, such as variable arity, commutativity, and associativity, enabling restructuring for better topology under a restricted high energy physics workflow stack. This is incorporated into the composition of the frameworks Dask and TaskVine. We then show improvement with real-world HEP applications in regard to runtime, task overhead, and fault tolerance.

Author Bios:
Barry Sly-Delgado is a Ph.D. Candidate studying Computer Science at the University of Notre Dame, advised by Dr. Douglas Thain within the Cooperative Computing Lab. His current research interests broadly include Distributed Computing, Scientific Workflows, Data Management, Data Provenance, and Filesystems.

An Overview of XK.BLAS.jl: Composable and Portable Multi-GPU BLAS in Julia

Authors: Romain PEREIRA, Alexis MONTOISON, Michel SCHANEN, Swann PERARNAU

Institution(s): Mathematics and Computer Science Division (MCS, CELS) Argonne National Laboratory

Room: Northwestern

Board #: N10

Session #: 2

Abstract: Multi-GPU systems are the building blocks of modern AI and HPC clusters. The variety of vendors and hardware complexity creates a challenge in conceiving composable, performant and portable applications [1]. Macro-dataflow systems are a solution to meet this challenge, delegating the responsibility from programs to architecture-aware compilers and runtime systems. Workloads are split into tasks and distributed on available processors, implicitly overlapping computation and data transfers following a data-flow [2]. This poster presents XK.BLAS: the BLAS module of the Julia package XK.jl [3]. It provides BLAS APIs for multi-device architectures natively in Julia, portably across all three major GPU vendors (AMD, Intel, NVIDIA). XK.BLAS is built on top of the XKRT tasking runtime system and the XKBlas library, which implement a macro-dataflow programming model [4]. We evaluate XK.BLAS primitives and its use as a multi-GPU backend for the package Krylov.jl [5], showing up to 2.8× speedup when scaling from 1 to 4 H100 GPUs. More importantly, XK.BLAS solves Krylov.jl problems that would not otherwise fit on a single GPU with no code changes at all.

Author Bios:
N/A

Exploring WebGPU as a Platform for Scalable and Portable GPU Computing: A Case Study with Set Operations

Authors: Jiaxin Lu, Landon Dyken, Sidharth Kumar

Institution(s): University of Illinois Chicago

Room: Northwestern

Board #: N13

Session #: 2

Abstract: WebGPU is a cross-platform GPU programming standard that maps a single shader codebase to Vulkan, Direct3D 12, and Metal, reaching every major GPU vendor on both web browsers and native runtimes. However, unlike vendor-specific ecosystems that ship mature compute libraries (e.g., Thrust for CUDA), WebGPU currently lacks a standard library of reusable GPU compute primitives. As a step toward filling this gap, we implement a representative family of GPU set operations, namely intersection, union, difference, and symmetric difference, in WebGPU. Our implementation provides eight set operations (four keys-only and four key-value variants) through a single pipeline abstraction and introduces a batched execution model that processes thousands of input-array pairs in a single GPU submission. We demonstrate the efficacy of our implementation across five GPU platforms, NVIDIA RTX 3060, NVIDIA A100 (Polaris supercomputer), AMD, Intel integrated, and Apple Silicon, and three application domains: frequent itemset mining (ECLAT), SQL query execution (TPC-H Q12), and genomic Jaccard similarity. On the A100, WebGPU runs within 14% of native CUDA across all workloads; on the RTX 3060, the gap widens to 1.16×–1.60× (WebGPU slower). Our batched pipeline outperforms Thrust by up to 9.9× end-to-end on high-pair-count workloads. The same WGSL code produces correct results on all five platforms without modification, confirming that WebGPU is a viable platform for scalable and portable GPU computing.

Author Bios:
Jiaxin Lu: My focus is on data visualization and high performance computing.
Landon Dyken: His focus is on data visualization and high performance computing, specifically using applied ML, web-based tools and GPU-accelerated systems.
Sidharth Kumar: His research lies at the intersection of high-performance computing (HPC) and data visualization. More broadly, he works on problems in parallel I/O, big data processing, scalable algorithms, scientific visualization, GPUs & performance modeling.

Parallel Worst-Case Optimal Joins for Distributed-Memory Systems

Authors: Kunting Qi, Yihao Sun, Kristopher Micinski, Sidharth Kumar

Institution(s): University of Illinois Chicago, Syracuse University

Room: Northwestern

Board #: N16

Session #: 2

Abstract: Scalable logic-based analytics are frequently dominated by cyclic multi-way joins over large relational datasets. State-of-the-art systems evaluate these through chained binary joins, incurring substantial overhead from large intermediate relations that must be materialized and redistributed across the network. Worst-Case Optimal Joins (WCOJ) offer a provably superior alternative by evaluating joins one variable at a time, eliminating intermediate blowup. However, all existing WCOJ implementations target shared-memory architectures, making distributed-memory execution an open problem. We present two contributions toward parallelizing WCOJ on distributed HPC systems: a hypercube partitioning scheme that hash-partitions relations along variable-aligned grid dimensions and selectively replicates them along missing dimensions to maximize locality, and an iterative semi-naive execution pipeline that supports recursive fixpoint computation via MPI collective redistribution of newly derived tuples. We evaluate our system on leadership class supercomputers Aurora and Polaris at ANL using graph mining and program analysis benchmarks, demonstrating strong scalability and outperforming state-of-the-art engines. This is the first truly scalable distributed WCOJ framework.
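
The hypercube partitioning described in the abstract can be illustrated on the triangle query R(a,b), S(b,c), T(a,c). The following is a minimal single-process sketch, not the authors' implementation: the grid shape, the toy hash `h`, and all function names are assumptions made for illustration, and the per-cell join is a plain nested loop rather than a worst-case optimal join. Each relation is hash-partitioned along the grid dimensions of the variables it contains and replicated along the one dimension it lacks, so every triangle is found in exactly one cell.

```python
from itertools import product

def h(value, buckets):
    # Toy hash (assumption): keys are small non-negative integers.
    return value % buckets

def hypercube_partition(R, S, T, dims):
    """Place tuples of R(a,b), S(b,c), T(a,c) on a pa x pb x pc grid of cells."""
    pa, pb, pc = dims
    cells = {cell: {"R": [], "S": [], "T": []}
             for cell in product(range(pa), range(pb), range(pc))}
    for a, b in R:              # R has no c: replicate along the c dimension
        for k in range(pc):
            cells[(h(a, pa), h(b, pb), k)]["R"].append((a, b))
    for b, c in S:              # S has no a: replicate along the a dimension
        for i in range(pa):
            cells[(i, h(b, pb), h(c, pc))]["S"].append((b, c))
    for a, c in T:              # T has no b: replicate along the b dimension
        for j in range(pb):
            cells[(h(a, pa), j, h(c, pc))]["T"].append((a, c))
    return cells

def local_triangles(cell):
    """Naive local join inside one cell (stand-in for a real WCOJ kernel)."""
    t_set = set(cell["T"])
    return [(a, b, c)
            for a, b in cell["R"]
            for b2, c in cell["S"]
            if b2 == b and (a, c) in t_set]

def distributed_triangles(R, S, T, dims=(2, 2, 2)):
    """Join every cell independently and concatenate the results."""
    result = []
    for cell in hypercube_partition(R, S, T, dims).values():
        result.extend(local_triangles(cell))
    return sorted(result)
```

In a real distributed setting each cell would be an MPI rank, and the replication loops would become network sends; the sketch only shows why no further communication is needed once tuples are placed.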

Author Bios:
Kunting Qi is a Ph.D. student in Computer Science at the University of Illinois Chicago. His research interests include high-performance computing, data science, and scalable distributed systems, with current work on distributed Datalog, worst-case optimal joins, and MPI-based execution on HPC platforms.

Yihao Sun is a Ph.D. candidate at Syracuse University working on databases, programming language theory, high-performance computing, and GPGPU systems. His research focuses on scalable Datalog execution and GPU-accelerated deductive reasoning systems.

Kristopher Micinski is an assistant professor in Electrical Engineering and Computer Science at Syracuse University. His research focuses on programming languages, program analysis, formal methods, security, and scalable logical reasoning systems for code analysis, analytic reasoning, and symbolic AI.

Thomas Gilray is an Associate Professor of Computer Science at Washington State University. His research focuses on scalable systems for reasoning about code, including program analysis, verification, declarative logic programming, Datalog, and high-performance logic solving on parallel systems and GPUs.

Sidharth Kumar is an associate professor of Computer Science at the University of Illinois Chicago. His research lies at the intersection of high-performance computing, data visualization, big data processing, scalable algorithms, GPUs, and performance modeling.

SafeNet: A Neural-Symbolic Network for Safe Planning in Robotic Systems using Formal Method-Guided LLM Fine-Tuning

Authors: Jialiang Fan, Fanxin Kong

Institution(s): University of Notre Dame

Room: Northwestern

Board #: N18

Session #: 2

Abstract: Robotic systems present unique safety challenges due to their complex integration of computational and physical processes and direct interaction with humans and environments. Traditional approaches to robot safety planning either rely on conventional methods, which struggle with the complexity of modern robotic systems, or on pure machine learning techniques, which lack formal safety guarantees. While recent advances in Large Language Models (LLMs) offer promising capabilities, pre-trained LLMs alone lack the specific domain expertise required for effective robotic safety planning. This paper introduces SafeNet, a novel neural-symbolic network architecture that enhances LLMs' safety planning capabilities through formal method-guided fine-tuning for robotic applications. Our approach integrates formal logical knowledge and reward machines into pre-trained LLMs by carefully designed fine-tuning, creating a neural-symbolic approach that combines the flexibility of neural networks with the precision of formal methods for robot trajectory generation and task planning. Experimental results demonstrate significant improvements in safe trajectory generation for robotic systems, with planning success rates increasing from 1.17% to 91.60% for the block manipulation task and from 7.23% to 90.63% for the robotic path planning task.

Author Bios:
Jialiang Fan is a second-year Ph.D. student in Computer Science and Engineering at the University of Notre Dame, advised by Prof. Fanxin Kong. His research focuses on safe and trustworthy AI, including LLM post-training (SFT, RLHF, GRPO), Vision-Language-Action models, reinforcement learning, and formal-method-guided task planning for robotic and cyber-physical systems.

Fanxin Kong is a tenure-track Assistant Professor in the Department of Computer Science and Engineering at the University of Notre Dame, working on assured and intelligent cyber-physical robotic systems across embodied AI, security, and safety. He received his Ph.D. from McGill University and was a postdoctoral researcher at the PRECISE Center, University of Pennsylvania. He is a recipient of the 2025 ACM SIGBED Early Career Researcher Award and the NSF CAREER Award.

Fault Injection Attacks and Countermeasures on TinyML Algorithms

Authors: Anthony Etim, Srilalith Nampally, Aubtin Rasouli, Dustin Mazza, Krishna Chilakapati, Tinghung Chiu, Ferhat Erata, Leyla Nazhandali, Wenjie Xiong and Jakub Szefer

Institution(s): Yale University

Room: Northwestern

Board #: N20

Session #: 2

Abstract: Tiny Machine Learning (TinyML) algorithms, designed to operate on constrained devices such as those found in Internet of Things (IoT) systems, are vulnerable to adversarial threats, including fault injection attacks. These attacks exploit physical means to induce errors in computation, compromising the reliability of the device and the TinyML models running on it. This work investigates the operation of TinyML models under fault injection attacks. Through systematic experimentation with voltage glitching and EM fault injection attacks on microcontrollers, this work identifies configurations that adversaries can exploit to induce faults without triggering a system reset, thus focusing on finding the more stealthy attacks. This study analyzes four types of TinyML models, and demonstrates that all four evaluated TinyML models will generate inference outputs with reduced accuracy under the two types of fault injection attacks. Further, in some instances, this work shows that it may be feasible for the attackers to use the faults to cause inference operations to return a predictable output, not just random incorrect inference results. This highlights the need for more robust fault injection protection mechanisms in TinyML implementations. In order to provide one such protection, this work demonstrates the use of Randomized Self-Reduction (RSR) schemes and majority voting for intermediate values as a means to protect the TinyML models.
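
The majority-voting countermeasure mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `majority_vote` and `protected_compute`, the choice of three redundant runs, and the use of hashable (e.g. integer, quantized) intermediate values are all hypothetical, and the Randomized Self-Reduction (RSR) scheme is not shown.

```python
from collections import Counter

def majority_vote(values):
    """Return the value held by a strict majority, or raise if there is none."""
    winner, count = Counter(values).most_common(1)[0]
    if count * 2 <= len(values):
        raise ValueError("no strict majority among redundant results")
    return winner

def protected_compute(compute, *args, runs=3):
    """Recompute an intermediate value `runs` times and vote on the result.

    A transient fault that corrupts a minority of the runs is outvoted.
    """
    return majority_vote([compute(*args) for _ in range(runs)])
```

The design trades extra computation for resilience: with three runs, a single glitched run is masked, while two identical corrupted runs would still win the vote.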

AI-driven Moving Target Defense

Authors: Qurat ul ain Khalid, Li Wang

Institution(s): Northeastern Illinois University

Room: Northwestern

Board #: N22

Session #: 2

Abstract: Moving Target Defense (MTD) is a proactive security paradigm designed to increase uncertainty and complexity for attackers by constantly changing a system’s attack surface. Despite its potential, deploying MTD in complex computation environments remains challenging due to the manual overhead required for strategy formulation and the risk of operational disruption. As large language models (LLMs) rapidly advance, they have shown positive results in many security applications. This paper introduces a novel framework, AI-driven MTD, which leverages the reasoning and generative capabilities of LLMs to automate the lifecycle of MTD strategies. Our approach utilizes LLMs to interpret security requirements and automatically generate executable configurations for diverse infrastructure layers. Specifically, the framework orchestrates dynamic network topology restructuring, automated firewall rule reconfiguration, and adaptive load balancing adjustments. By bridging the gap between high-level defensive intent and low-level system execution, our method enables an autonomous transition among different MTD strategies. Preliminary results indicate that LLM-driven orchestration significantly reduces the window of opportunity for adversaries while minimizing the manual burden on security administrators. This research represents a critical step toward autonomous, self-evolving cyber defenses, demonstrating the viability of LLMs in managing complex, real-time infrastructure mutations within production environments.

Author Bios:
Qurat-ul-Ain Khalid is a graduate student in Computer Science at Northeastern Illinois University with a strong background in software quality assurance and project management. She has over 1.5 years of industry experience in software testing and more than three years of teaching experience in computer science and project management. She is currently working on an LLM-driven Moving Target Defense (GEN-MTD) framework, focusing on automated strategy generation, verification, and optimization for enhancing network security and resilience. Her research interests include software reliability, cybersecurity, and AI-assisted system evaluation.

Vulnerability Exploration of Safe Reinforcement Learning in Cyber-Physical Systems via STL Mining

Authors: Jialiang Fan, Shixiong Jiang, Mengyu Liu, Fanxin Kong

Institution(s): University of Notre Dame

Room: Northwestern

Board #: N23

Session #: 2

Abstract: Safe Reinforcement Learning (safe RL) has been widely used in safety-critical cyber-physical systems (CPS) to achieve task goals while satisfying safety constraints. Analyzing vulnerabilities that can be exploited to violate safety (i.e., safety-violated vulnerabilities) is crucial for understanding and improving the robustness of safe RL policies in CPS. However, existing works are inadequate for addressing such vulnerabilities, as they either focus on vulnerabilities that merely degrade task performance (rather than causing safety violations) or rely on strong assumptions about an adversary’s capability (e.g., requiring explicit knowledge of the safety constraints). This paper aims to bridge this gap by studying safety-violated vulnerabilities of safe RL in CPS without requiring prior knowledge of the underlying safety constraints. To this end, we propose a novel adversarial framework based on Signal Temporal Logic (STL) mining. The framework first mines STL formulas to uncover the implicit safety constraints of a safe RL policy, and then synthesizes perturbation attacks that violate these constraints. The generated attacks can effectively and efficiently induce safety violations by adapting perturbations and identifying critical time intervals for applying them. We conduct extensive experiments across multiple CPS environments, and the results demonstrate the effectiveness and efficiency of our method.

Author Bios:
Jialiang Fan is a CSE Ph.D. student working on LLM post-training, reinforcement learning, cyber-physical systems, and AI safety.
Shixiong Jiang is a CSE Ph.D. student working on reinforcement learning, segmentation foundation models, cyber-physical systems, and AI safety.
Mengyu Liu is an assistant professor at Washington State University Tri-Cities, with a Ph.D. from the University of Notre Dame (2025) under the supervision of Professor Fanxin Kong.
Fanxin Kong is a tenure-track assistant professor in the Department of Computer Science and Engineering at the University of Notre Dame. Before that, he worked as a tenure-track assistant professor at Syracuse University and as a postdoctoral researcher with Prof. Insup Lee in the PRECISE Center at the University of Pennsylvania. He obtained his Ph.D. in Computer Science at McGill University, advised by Prof. Xue Liu.

Multi-Tenant Framework for Collective Communication Scheduling in Shared Clusters

Authors: Animesh Saxena, Ratul Sikder, Balajee Vamanan

Institution(s): University of Illinois Chicago

Room: Northwestern

Board #: N24

Session #: 2

Abstract: Modern AI workloads are increasingly trained on clusters with hundreds to thousands of accelerators, such as GPUs. These accelerators are interconnected by high-speed fabrics, and a significant portion of training time is spent on communication rather than computation, synchronizing gradients across devices through collective operations such as All-Reduce, All-Gather, and Reduce-Scatter. Current collective communication libraries for distributed machine learning remain bound by a fixed set of predefined collective algorithms. Such constraints severely limit the potential for communication optimization, particularly in modern clusters with increasingly heterogeneous and asymmetric network topologies. We use a data structure called the Time Expanded Network (TEN), which represents communication by unfolding the physical topology into discrete time steps and connecting nodes across successive time steps with edges that represent bandwidth and latency. In this framework, TEN serves as the basis for the synthesis of topology-aware collective schedules: it supports the decision-making about when a chunk can move, which path it can take, and which transfers can proceed concurrently. Beyond the existing single-application research, our framework is being extended to support multi-tenant workloads in which multiple jobs concurrently share the network. This makes it a practical platform for studying collective communication not only in isolation, but also under the concurrent conditions common in real shared AI clusters.
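
The Time Expanded Network described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' framework: the function names, the dictionary-based link format, and the restriction to integer latencies (bandwidth and link contention are not modelled) are assumptions. Each physical node is copied once per time step, a "wait" edge connects a node to itself at the next step, and a link with latency d connects (u, t) to (v, t + d).

```python
from collections import deque

def build_ten(links, steps):
    """Unfold a topology into a time-expanded graph.

    links: {(u, v): latency_in_steps} for directed physical links.
    Returns an adjacency dict keyed by (node, time) vertices.
    """
    nodes = {n for link in links for n in link}
    ten = {(n, t): [] for n in nodes for t in range(steps + 1)}
    for t in range(steps):
        for n in nodes:
            ten[(n, t)].append((n, t + 1))          # wait edge: chunk stays put
        for (u, v), latency in links.items():
            if t + latency <= steps:
                ten[(u, t)].append((v, t + latency))  # transfer edge
    return ten

def earliest_arrival(ten, src, dst):
    """Earliest time step at which a chunk starting at (src, 0) can reach dst."""
    seen = {(src, 0)}
    queue = deque([(src, 0)])
    best = None
    while queue:
        node, t = queue.popleft()
        if node == dst:
            best = t if best is None else min(best, t)
            continue
        for nxt in ten[(node, t)]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return best  # None if dst is unreachable within the horizon
```

For example, with links A→B and B→C of latency 1 and a direct A→C link of latency 3, the two-hop path wins. A schedule synthesizer would additionally have to decide which transfers may share a link in the same time step, which this sketch leaves out.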

Author Bios:
Animesh Saxena, PhD Student, with interests in distributed machine learning systems with a focus on collective communication optimization.

Ratul Sikder, PhD student in Computer Science at University of Illinois Chicago, specializing in Networks and Systems.

Balajee Vamanan, Associate Professor in the Department of Computer Science at University of Illinois at Chicago.

NOMAD: LLM-Guided Indexing for Non-Metric Time-Series Search

Authors: Zheng Zhang, Andrew Crotty

Institution(s): Northwestern University

Room: Northwestern

Board #: N26

Session #: 2

Abstract: We present NOMAD, a framework for scalable time-series retrieval under arbitrary, potentially non-metric distance functions. Rather than assuming a fixed embedding space or metric index, NOMAD treats the user-provided distance function F as the source of task-specific similarity. Given F's implementation and documentation, an LLM helps infer symbolic structure, invariances, sensitivities, and candidate conversion logic for mapping raw time series into a controlled language representation. These candidates are evaluated against F through an iterative calibration loop, where retrieval failures guide refinement until the representation sufficiently preserves F-neighborhoods. Search then proceeds in two stages: efficient candidate recall over the calibrated representation, followed by exact reranking with F. This design offers a general path toward indexing domain-specific distance functions while keeping F as the final oracle for correctness.
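
The two-stage search described in the abstract (cheap candidate recall, then exact reranking with F) can be sketched as follows. This is an illustration under assumptions, not NOMAD's implementation: `represent` and `cheap_distance` stand in for the calibrated representation and its comparison function, all names and default parameters are hypothetical, and the LLM-driven calibration loop is not shown.

```python
import heapq

def two_stage_search(query, series, represent, cheap_distance, F, k=1, recall=3):
    """Return indices of the top-k series under F, searching only `recall` candidates.

    Stage 1: rank all series by a cheap distance over their representations.
    Stage 2: rerank the recalled candidates with the exact distance F.
    """
    q_rep = represent(query)
    candidates = heapq.nsmallest(
        recall,
        range(len(series)),
        key=lambda i: cheap_distance(q_rep, represent(series[i])),
    )
    # F stays the final oracle: only it decides the order of the answer.
    return sorted(candidates, key=lambda i: F(query, series[i]))[:k]
```

In practice the representations would be computed once and stored in an index rather than recomputed per query. The result is only as good as stage 1's recall, which is why the abstract's calibration loop refines the representation until it preserves F-neighborhoods.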

Author Bios:
Zheng Zhang is a 5th year Ph.D. candidate in Computer Science at Northwestern University, advised by Andrew Crotty. His research focuses on database systems, indexing, and AI-assisted data exploration.

Andrew Crotty is an Assistant Professor of Computer Science at Northwestern University. His research focuses on building systems for data management, data science, and machine learning.

Personalized Detection of Motor Symptoms in Parkinson’s Disease with Federated Learning

Authors: Rijul Sareen, Nathaniel Hudson

Institution(s): Illinois Institute of Technology, Argonne National Laboratory

Room: Northwestern

Board #: N29

Session #: 2

Abstract: Parkinson’s disease (PD) severity assessment traditionally relies on clinical visits, while continuous, privacy-preserving monitoring remains an open challenge. Federated Learning (FL) addresses this by training a shared global model across distributed devices, where each device trains on its own data and shares only model updates, never raw recordings. This work investigates FL for PD severity estimation using wearable force-plate gait signals from the PhysioNet Gait in Parkinson’s Disease dataset, which contains recordings from 93 PD patients and 73 healthy controls walking at self-selected pace on level ground. Sixteen force sensors beneath each foot captured vertical ground reaction forces at 100 Hz, producing variable-length multivariate time series per subject. We treat each subject as an isolated client, a naturally non-IID setting reflecting real-world deployment where patient recordings never leave the individual device, and train an LSTM regression model using the Flower framework with FedAvg aggregation. We present results demonstrating that meaningful severity estimation is achievable under these privacy-preserving constraints. As a secondary effort, we explored voice-based severity estimation using four aggregated public datasets, which surfaced fundamental challenges in federation design stemming from the absence of natural client boundaries and label scarcity.
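
The FedAvg aggregation step named in the abstract can be written out in plain Python. This is a minimal sketch of the standard weighted-averaging rule, not the Flower framework's API and not the authors' training code; the function name and the flat-list weight format are assumptions made for illustration.

```python
def fedavg(client_updates):
    """Aggregate client models by sample-weighted averaging.

    client_updates: list of (num_samples, weights), where weights is a flat
    list of floats and every client uses the same model shape.
    """
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    return [
        sum(n * w[i] for n, w in client_updates) / total
        for i in range(dim)
    ]
```

Only `(num_samples, weights)` pairs reach the server, which matches the privacy constraint in the abstract: raw gait recordings stay on each subject's device. Clients with more samples pull the global model further toward their local weights.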

Author Bios:
Nathaniel Hudson:
Nathaniel Hudson is an Assistant Professor of Computer Science at the Illinois Institute of Technology in the Department of Computer Science. His research studies the design of systems for serving AI on edge computing infrastructure — i.e., Edge Intelligence (EI) — for smart city applications.

Rijul Sareen:
Rijul Sareen is an undergraduate student in Computer Science at the Illinois Institute of Technology. His research focuses on personalized detection of motor symptoms in Parkinson's disease using federated learning, under the mentorship of Professor Nathaniel Hudson.

Room: Lake

Multi-Node Inertial Motion Capture Device

Authors: Ananya Kankane

Institution(s): Illinois Institute of Technology

Room: Lake

Board #: L20

Session #: 2

Abstract: Continuous kinematic monitoring is essential for managing post-stroke hemiparesis, epilepsy, and geriatric mobility. However, existing solutions rely on centralized, smartphone-dependent architectures that suffer from packet loss and temporal de-synchronization. This research introduces a decentralized, dual-node STM32WB55 system designed to capture high-fidelity, synchronized 9-DOF data, including orientation quaternions and tri-axial acceleration, at a deterministic 100 Hz sampling rate. To ensure 100% data integrity, the system bypasses wireless streaming latency by logging data directly to 16 GB industrial MicroSD cards via a high-speed SPI interface. Optimized for clinical utility, each node is powered by a 500 mAh Li-Po battery, supporting a 10-hour operational cycle that matches standard daily observation windows. The hardware is integrated into a compact wearable enclosure, maximizing patient compliance in real-world environments. System accuracy is verified through a 2-DOF robotic validation rig utilizing SG92R servos, targeting an orientation error of less than 2.5% against mechanical ground truth. This architecture provides a scalable foundation for precise, longitudinal biomechanical assessment, bridging the gap between laboratory-grade precision and ambulatory freedom.

Author Bios:
Ananya Kankane is an Autonomous Systems and Robotics Master's student at the Illinois Institute of Technology with a focus on motion tracking and wearable technology. She specializes in developing low-latency data pipelines and multi-node inertial systems using STM32 microcontrollers to bridge the gap between human kinetics and real-time gesture recognition. Her work integrates custom PCB design and advanced sensor fusion to create ergonomic, assistive solutions for motor-impaired individuals.

AgentRecall: Provenance and Rewind for Multi-Agent Workflows

Authors: Keith Bateman, Xian-He Sun, Anthony Kougkas

Institution(s): Illinois Institute of Technology

Room: Lake

Board #: L18

Session #: 2

Abstract: Complex scientific tasks have increasingly become automated through the utilization of agents. This leads to complex multi-agent systems where agent reasoning, tool calls, computation, and more are interleaved to solve a problem. In spite of this trend, a recent taxonomy of 1,642 multi-agent traces reports failure rates of 41 to 87 percent, with 37 percent of failures caused by inter-agent misalignment. Some provenance systems have been created to record the lineage of agentic workflows, but they are intended as observers and cannot intervene; recovery tools exist to rewind state but lack the knowledge provided by formal provenance models. There is no system that unifies both capabilities for multi-agent settings. We present AgentRecall, a provenance and rewind system that captures agent interactions, maintains multi-agent context histories, and enables cost-aware, consistent rewind across multiple collaborating agents. AgentRecall contributes (1) a multi-interface provenance collection framework spanning heterogeneous stores and multi-level agent pipelines, (2) a cost-aware recovery mechanism enabling consistent multi-agent context rewind and targeted error correction while reducing unnecessary re-execution, and (3) an annotated context history store supporting automated lineage reasoning and manual inspection of agent behavior. Preliminary evaluations of AgentRecall on benchmarks such as Kramabench and ParEval demonstrate that rollback can improve correctness by 50% over non-rewind baselines.

Author Bios:
Keith Bateman is a PhD Candidate in the Department of Computer Science at the Illinois Institute of Technology and a member of the Gnosis Research Center. Keith Bateman works under the guidance of Dr. Xian-He Sun and Dr. Anthony Kougkas. His primary areas of research include HPC, Distributed Storage, and Task Systems.

Dr. Xian-He Sun is the Director of the Gnosis Research Center and an IEEE Fellow. He is a University Distinguished Professor and the Ron Hochsprung Endowed Chair of Computer Science at Illinois Institute of Technology. His current research interests include parallel and distributed processing, memory and I/O systems, software systems for Big Data applications, and performance evaluation and optimization.

Dr. Anthony Kougkas is the Deputy Director of the Gnosis Research Center at the Illinois Institute of Technology. With a Ph.D. in Computer Science, he is committed to solving data management and I/O challenges in extreme-scale distributed applications. His research has led to advancements in multi-tiered storage systems, data prefetching, replication, compression, and in-transit/in-situ techniques, resulting in performance improvements for HPC, Cloud, and AI workloads.

AQNSA: Agentic Quantum Network Simulator Arena

Authors: Pablo Cesar Bedolla Ortiz, Joaquin Chung, Rajkumar Kettimuthu

Institution(s): Dominican University, Argonne National Laboratory

Room: Lake

Board #: L3

Session #: 2

Abstract: Advances in artificial intelligence (AI), particularly large language models (LLMs), have demonstrated effectiveness in technical tasks such as software engineering and code generation. However, their direct application to scientific experiments remains challenging, particularly in quantum networks, where strict formalism and complex reasoning are demanded. We introduce Agentic Quantum Network Simulator Arena (AQNSA), a framework designed to replicate and extend quantum networking research using AI agents. Given an experimental prompt, AQNSA orchestrates a multi-agent workflow in which a research agent develops a structured experiment plan and an engineering agent translates that plan into executable simulations using the Simulator of QUantum Network Communication (SeQUeNCe) framework. To ground agent reasoning and support planning and execution, we construct a knowledge graph from the cited materials of eight quantum networking studies selected for replication. Domain-expert graders evaluate agent reasoning traces, simulations, and results. Our results investigate the feasibility of agentic research workflows for scientific experiments in quantum networks.

Author Bios:
Pablo Cesar Bedolla Ortiz (pablobedolla.com) is a visiting student researcher at Argonne National Laboratory and a senior Mathematics undergraduate at Dominican University, with plans to continue at the Georgia Institute of Technology in the M.S. in Computer Science program. Through four internships at NASA's Jet Propulsion Laboratory and his ongoing research at Argonne, his work has spanned satellite data processing, AI-driven mission control tooling, and AI systems for scientific research.

Joaquin Chung is a research scientist at the Data Science and Learning Division of Argonne National Laboratory. His work spans diverse areas, from architecting future quantum networked systems to designing systems that enable memory-to-memory data streaming between federated instruments.

Rajkumar Kettimuthu is a Senior Computer Scientist at Argonne National Laboratory, a Senior Scientist at UChicago CASE, and a Senior Fellow at NAISE. His research focuses on networking and AI/ML for science, with current emphasis on quantum networks and AI-driven scientific workflows. He leads major federal programs in heterogeneous quantum networking and has co-developed widely deployed scientific software including Globus GridFTP and the SeQUeNCe quantum network simulator.

Reducing Energy Cost of On-Device LLM Inference via DVFS and Adaptive Decoding

Authors: Weisi Yang, Stephen Xia

Institution(s): Northwestern University

Room: Lake

Board #: L14

Session #: 2

Abstract: Deploying large language models (LLMs) on edge devices enables privacy and low-latency applications, but remains challenging due to limited compute capacity and strict thermal constraints. Existing power management approaches, such as dynamic voltage and frequency scaling (DVFS), primarily operate at the hardware level and often face a fundamental trade-off between performance and energy efficiency. In this work, we present PELM, a system that jointly optimizes hardware control and LLM decoding behavior for energy-efficient on-device inference. Our key observation is that not all tokens require full-depth execution to maintain high output quality. Building on this insight, PELM expands the optimization space beyond frequency scaling by incorporating self-speculative decoding and adaptive verification depth, enabling dynamic control over both computation and hardware usage. To manage this multi-dimensional design space, PELM employs a lightweight reinforcement learning-based controller that continuously adjusts processor frequencies and decoding parameters at runtime, adapting to workload characteristics and thermal conditions. We implement PELM on NVIDIA Jetson edge platforms and evaluate it across diverse workloads. Results show that PELM reduces energy consumption by up to 52.4% while achieving up to 23.1% speedup compared to existing DVFS-based approaches, without significant degradation in task performance. These findings highlight the importance of cross-layer co-design for efficient LLM deployment on resource-constrained devices.

Author Bios:
Weisi Yang is a Ph.D. student in Computer Engineering at Northwestern University. His research interests lie in computer systems and machine learning systems, with a focus on efficient, adaptive, and reliable AI systems for resource-constrained and real-world computing environments.

Stephen Xia is an Assistant Professor of Electrical and Computer Engineering (with a courtesy appointment in Computer Science) at Northwestern University. His research lies at the intersection of embedded systems, artificial intelligence, and cyber-physical systems, with a focus on enabling intelligent, resource-efficient sensing and computing in everyday environments.

UpDown: A Supercomputer Co-designed for Scalable Graph Processing

Authors: Andrew A. Chien, Charles Colley, Jianru Ding, Alexander Fell, David F. Gleich, Henry Hoffmann, Moubarak Jeje, Rajat Khandelwal, Yanjing Li, Jose M. Monsalve Diaz, Marziyeh Nourian, Andronicus Rajasukumar, Jiya Su, Tianshuo Su, Yuqing (Ivy) Wang, Wenyi Wang, Ruiqi Xu, Tianchi Zhang

Institution(s): University of Chicago, Argonne National Laboratory, Tactical Computing Laboratories, Purdue University

Room: Lake

Board #: L12

Session #: 2

Abstract: Traditional supercomputers have focused on dense computation performance as exemplified by HPL. Graph processing applications differ with extreme irregularity (10^9 imbalance in skewed, real-world graphs) that produces unpredictable work, parallelism, memory access, and communication. Together, these make scalable performance and programming difficult. We describe the UpDown system architecture, co-designed for irregular graph computations. UpDown provides efficient fine-grained thread invocations (∼10 instructions), direct messaging (no network interface card) for scalable local and global messaging, and split-transaction memory operations that enable extremely high memory bandwidth. Combined with architectural support for global addressing and an aggressive network design, these UpDown features enable direct exploitation of edge and vertex parallelism, using it to deliver breakthrough graph processing performance and programmability. We evaluate the performance of the UpDown system using a challenging suite of graph applications (Pagerank, Breadth-first Search, Triangle Counting, Partial Match, etc). For a single node, results show a 100-fold performance advantage over multicore CPUs. Compared to today’s fastest scalable parallel computers, UpDown achieves 1000-fold performance increases. UpDown delivers these levels of performance with high-level programmability; these programs directly express vertex-edge parallelism, which UpDown exploits directly in hardware.

Author Bios:
Andrew A. Chien received the BS, MS, and PhD degrees from the Massachusetts Institute of Technology. He is the William Eckhardt Distinguished Service Professor in Computer Science at the University of Chicago and a senior computer scientist at Argonne National Laboratory. He has broad interests in computer systems and renewable energy. Chien served as Vice President of Research of Intel Corporation (2005–10), the SAIC Professor in Computer Science at the University of California, San Diego (1998–2005), and professor of computer science at the University of Illinois (1990–98). He is a fellow of the ACM, IEEE, and AAAS.

Charles Colley received his BS/MS degrees from Tufts University and his PhD from Purdue University. His research interests are in numerical linear algebra, spectral graph theory, and parallel algorithms. He is currently a research scientist at Instagram.

Jianru Ding is a Ph.D. candidate at the University of Chicago and received his B.S. from The Ohio State University. His research focuses on adaptive runtimes for scalable distributed systems, bridging application-level optimizations with underlying hardware characteristics. He is particularly interested in improving the performance and efficiency of large-scale machine learning workloads.

Alexander Fell received his PhD from the Indian Institute of Science, Bangalore, India, and during his career, he contributed to the RISC-V architecture and LLVM compiler development at the Barcelona Supercomputing Center and Nanyang Technological University, Singapore. Now, he is a Senior Research Specialist at the University of Chicago, focusing on the compiler and simulator for the UpDown architecture.

David F. Gleich received the BS degree from Harvey Mudd College and the PhD degree from Stanford University. He is a Professor of Computer Science at Purdue University and a University Faculty Scholar. His research is on data-driven scientific computing, matrix computations, network and graph algorithms, and parallel and distributed computing. He is a fellow of SIAM.

Henry Hoffmann received the BS degree from UNC-Chapel Hill and the SM and PhD degrees from MIT. He is the Liew Family Chair of the Department of Computer Science at the University of Chicago. His research includes adaptive and self-aware computing systems.

Moubarak Jeje received his BS and MS degrees from the University of Wisconsin–Madison. He currently works as a Senior Research Engineer at Tactical Computing Laboratories (2023–2026). His expertise includes FPGA simulation, hardware-software co-design, design verification, RTL microarchitecture, and the RISC-V ISA.

Rajat Khandelwal is a PhD student at the University of Chicago. He received his B.Tech from the Indian Institute of Technology BHU, India. He worked as a System Software Developer at Intel (2020–2023) on USB, Type-C, and Thunderbolt subsystems. His current research spans systems and architecture, focusing on networking and high-performance data movement.

Yanjing Li received the BS and MS degrees from Carnegie Mellon University and a PhD from Stanford. She is an Associate Professor of the Department of Electrical and Computer Engineering at Northeastern University. Her research includes intersections of AI and systems, computing architecture for emerging technologies and applications, and hardware security.

Jose M. Monsalve Diaz is a Member of Technical Staff at AMD. He received his PhD and MS in Electrical and Computer Engineering from the University of Delaware and his BSc in Electronics Engineering from Pontificia Universidad Javeriana, Colombia. He was a Postdoctoral Researcher at Argonne National Laboratory. His research interests include runtime systems, compilers, and heterogeneous computing.

Marziyeh Nourian received her Ph.D. in Computer Engineering from North Carolina State University. Her work addresses challenges in high-performance computing, focusing on data transformation, scalable execution of large-scale workloads, and hardware–software co-design. Her expertise includes parallel computing, heterogeneous and reconfigurable architectures, and compiler and runtime techniques.

Andronicus Rajasukumar is a PhD candidate at the University of Chicago. He received his BE from the College of Engineering Guindy, India, and MTech from the Indian Institute of Technology, Delhi. He worked as a Senior Staff Engineer at Qualcomm (2014–20) and Intel (2010–14) in 3D-graphics GPU pre-silicon performance. His current research spans specialized and general-purpose computer architectures and processing near memory.

Jiya Su is a Ph.D. student in Computer Science at the University of Chicago. She received the B.S. degree from Renmin University of China in 2020 and the M.S. degree from the Illinois Institute of Technology in 2023. Her research focuses on graph processing, parallel computing, and computer architecture.

Tianshuo Su received his BS from the University of Wisconsin–Madison and his MS from Georgia Institute of Technology. His research interests include computer architecture and high-performance computing. He was a senior engineer at Qualcomm working on DSP performance. Currently, he works at Google in cloud infrastructure efficiency.

Yuqing (Ivy) Wang is a computer science PhD student at the University of Chicago. She received her BSci degree from the University of Edinburgh. Her research spans parallel programming, operating systems, and unified memory models for novel accelerator systems.

Wenyi Wang is a PhD student at the University of Chicago. He earned his B.E. from Northeastern University, China, and his M.S. from Northwestern University, USA. His research focuses on high-performance computing and parallel computing.

Ruiqi Xu received the BS and MS degrees from Northwestern University in 2023. He is currently working toward a PhD degree at the University of Chicago. His research interests include computer architecture and parallel programming.

Tianchi Zhang received the B.S. degree from the University of Michigan and Shanghai Jiao Tong University in 2020 and the M.S. degree from the University of Michigan in 2023. He is currently a Ph.D. student at the University of Chicago. His research interest is computer architecture.

Evaluating I/O Behavior of ATLAS Derivations Across Container Runtimes on HPC Platforms

Authors: Wesley Kwiecinski

Institution(s): University of Illinois Chicago

Room: Lake

Board #: L8

Session #: 2

Abstract: The ATLAS project is adopting container technologies to improve workflow portability and reproducibility. However, this causes concern about I/O performance in derivation workflows on HPC systems, where inefficient I/O can affect both ATLAS and other users. To enable reliable production deployment on HPC resources, we systematically study I/O behavior in ATLAS derivations across container runtimes supported by ATLAS, including Podman, Shifter, and Apptainer, and compare them against native execution. We conduct experiments on both CERN computing resources and the Perlmutter supercomputer at NERSC. Our study finds that containerization can generate distinct, runtime-dependent I/O performance differences in ATLAS derivations, particularly related to shared file system scalability and bottlenecks. We identify which container runtimes degrade or maintain I/O efficiency when compared to native execution. These outcomes provide practical guidance for deploying ATLAS derivations on HPC systems to secure efficient and fair use of shared storage infrastructure.

Author Bios:
Wesley is a CS Master's student at the University of Illinois Chicago. His primary research focus is high-performance computing, and his broader interests include data visualization and computational physics.

qMEMO: Quantum-Resistant Proof-of-Space Blockchain System

Authors: Likitha Shankar, Venkata Harsha Pedada, Ioan Raicu

Institution(s): Illinois Institute of Technology

Room: Lake

Board #: L2

Session #: 2

Abstract: Shor's algorithm threatens every blockchain that relies on ECDSA signatures for transaction authentication. NIST has standardized post-quantum digital signature schemes, ML-DSA (Dilithium) and Falcon, but their cost in a real blockchain pipeline, where signatures must be serialized, transmitted, batch-verified, and committed under strict block deadlines, remains poorly understood. Most existing studies measure isolated cryptographic primitives, not their interaction with consensus and networking layers. We integrated Falcon-512 and ML-DSA-44 into the MEMO Proof-of-Space blockchain, replacing fixed-size Ed25519 transactions with a unified crypto_backend abstraction that supports compile-time scheme selection and runtime hybrid operation, where validators using different signature schemes coexist on the same network. Variable-length signatures (64 to 2,420 bytes) and public keys (32 to 1,312 bytes) flow end-to-end through wallet, transaction pool, OpenMP-parallel batch verification, block construction, and ledger commitment. We conducted controlled experiments on Chameleon Cloud across multiple transaction batch sizes. At 500 transactions, Ed25519, Falcon-512, and ML-DSA-44 deliver statistically equivalent end-to-end throughput (1,400 to 2,600 TPS), confirming the pipeline is network-bound rather than verify-bound at this scale. At 1,000 transactions, Falcon-512 outperforms ML-DSA-44 by 68% (2,572 vs. 1,533 TPS) despite ML-DSA's 2x faster raw verification speed. Signature size differences drive this divergence: Ed25519 at 64 bytes, Falcon-512 at roughly 700 bytes, and ML-DSA-44 at roughly 2,400 bytes, with the latter saturating serialization and ZMQ transfer bandwidth. Our results demonstrate that for deployed blockchains, signature size, not verification speed, is the dominant cost of post-quantum migration.
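
The size argument in this abstract can be made concrete with a rough calculation of authentication bytes per batch. This is a sketch under stated assumptions, not the authors' measurement code: it assumes each transaction carries exactly one signature and one public key, uses the abstract's rounded 700-byte figure for Falcon-512 signatures, and takes the 897-byte Falcon-512 public key size from the Falcon specification rather than from the abstract. The function names are hypothetical.

```python
# Signature ("sig") and public-key ("pk") sizes in bytes.
# 64 / 2,420 and 32 / 1,312 are the range endpoints quoted in the abstract.
# 700 is the abstract's rounded Falcon-512 signature size.
# ASSUMPTION: 897 is the Falcon-512 public-key size from the Falcon
# specification; the abstract does not state it.
SCHEMES = {
    "Ed25519": {"sig": 64, "pk": 32},
    "Falcon-512": {"sig": 700, "pk": 897},
    "ML-DSA-44": {"sig": 2420, "pk": 1312},
}


def auth_bytes_per_batch(scheme, n_tx):
    """Bytes of signature + public key for n_tx transactions.

    ASSUMPTION: one signature and one public key per transaction.
    """
    sizes = SCHEMES[scheme]
    return n_tx * (sizes["sig"] + sizes["pk"])


def relative_gain_percent(faster_tps, slower_tps):
    """Percentage by which faster_tps exceeds slower_tps."""
    return (faster_tps / slower_tps - 1) * 100


if __name__ == "__main__":
    for name in SCHEMES:
        print(name, auth_bytes_per_batch(name, 1000), "bytes per 1,000 transactions")
    # Throughput figures quoted in the abstract for the 1,000-transaction batch.
    print(f"Falcon-512 vs. ML-DSA-44: +{relative_gain_percent(2572, 1533):.0f}%")
```

With these inputs, a 1,000-transaction batch carries more than twice as many authentication bytes under ML-DSA-44 as under Falcon-512, which is consistent with the abstract's explanation that transfer size rather than verification speed separates the two schemes.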

Author Bios:
Likitha Shankar: Likitha is a final-year Master's student in Computer Science at Illinois Institute of Technology, graduating in May 2026. Her work on the qMEMO project focuses on integrating NIST-standardized post-quantum signature schemes, specifically Falcon-512 and ML-DSA-44, into the MEMO Proof-of-Space blockchain and benchmarking their end-to-end performance against classical baselines. She brings prior production engineering experience from AR Systems Co., Ltd in Tokyo.
Venkata Harsha Pedada: Harsha is a first-year PhD student in Computer Science at Illinois Institute of Technology, advised by Prof. Ioan Raicu at the DataSys Lab. He holds an M.S. in Computer Science (2024) and has industry experience as a Data Engineer at a venture-backed startup. His research interests include distributed systems, blockchain consensus mechanisms, and proof-of-space protocols, with a focus on designing high-throughput decentralized systems.
Dr. Ioan Raicu: Dr. Ioan Raicu is an Associate Professor in the Department of Computer Science at Illinois Institute of Technology and guest research faculty at Argonne National Laboratory. He is the founder and director of the Data-Intensive Distributed Systems Laboratory (DataSys Lab). His research focuses on distributed systems, many-task computing, and data-intensive computing at extreme scales, with recognition including the IEEE TCSC Young Achievers in Scalable Computing award and an NSF CAREER award.

Towards Automation and Reproducibility for Research Data Meshes

Authors: Michael Lukowski, Matthew West, Robert L. Grossman

Institution(s): University of Chicago

Room: Lake

Board #: L15

Session #: 2

Abstract: A research data mesh is a data platform consisting of a collection of data commons, data repositories, cloud-based computational resources, and other cloud-based resources ("data nodes") that interoperate using a common set of core software services; it is designed to accelerate research by supporting the management, exploration, analysis, and sharing of data. In this work we present Mesh Cards and Node Cards, which are JSON documents designed to automate the setup and maintenance of research data meshes. Mesh Cards provide a machine-readable advertisement to data nodes of valid standards that can be used for interoperation, whereas Node Cards provide API-specific details for data meshes to use for aggregation. We also explore the use of publish-subscribe mechanisms within the data mesh ecosystem for a highly scalable data sharing protocol. In our use case, users can share metadata with the centralized hub to facilitate search and discovery while also creating a history of data publications for future reference. In addition, because of the nature of publish-subscribe paradigms, multiple services can consume broadcast data from publishers. This allows for easy internode and intranode communication and clearly defined mesh state, since any subscriber may observe changes to a node or mesh by simply observing the channels describing the changes.

Author Bios:
Michael Lukowski is a PhD student in the Department of Computer Science and a Software Engineer at the Center for Translational Data Science at the University of Chicago.

Matthew West is a computer science PhD student at the University of Chicago. His research is focused on scientific reproducibility in the context of data mesh platforms.

Robert L. Grossman is the Frederick H. Rawson Distinguished Service Professor of Medicine and Computer Science and the Director of the Center for Translational Data Science at the University of Chicago.

Large Language Models (LLM) and Temporal Logic of Actions (TLA): How Effective are LLMs for Verification Systems

Authors: Brian Ortiz, Arslan Bisharat, Eric Spencer, Khushboo Bhadauria, Anisa Ramos, Mohammed Abuhamad, Konstantin Laufer, TaiNing Wang, George K. Thiruvathukal

Institution(s): Loyola University Chicago

Room: Lake

Board #: L11

Session #: 2

Abstract: TLA+ has supported industrial verification at companies such as Amazon and Microsoft, yet writing correct TLA+ specifications from natural language still requires time and expertise, which limits adoption. LLMs show promise, but no prior study measures whether they produce semantically correct TLA+ specifications from natural language. Our research presents the first systematic evaluation of LLM-based TLA+ specification synthesis from natural language. Our study evaluates 30 LLMs across eight families on a curated dataset of 205 TLA+ specifications: 25 open-weight models across four prompting strategies (2,600 runs) and 5 proprietary models under few-shot prompting (130 runs), all validated by the SANY parser and TLC model checker. LLMs achieve up to 26.6% syntactic correctness but only 8.6% semantic correctness, with successes exclusive to progressive prompting. Results show that model size does not predict quality, e.g., DeepSeek r1:8b outperforms its 70B variant across all strategies, which suggests the importance of reasoning alignment for formal languages. Code-specialized models consistently underperform due to negative transfer from mainstream language training. We identify five recurring hallucination categories, all traceable to specific training data biases. These results suggest that current LLMs do not generate reliable TLA+ specifications without expert oversight. We release the evaluation framework, code, and dataset to support reproducibility and future research.

Author Bios:
Brian Ortiz is a graduate student at Loyola University Chicago finishing his Master's in Computer Science this spring. He works professionally as a DevSecOps Engineer. He is a part of the AI for Formal Methods Lab at LUC and conducts research related to TLA+ and Large Language Models.
Arslan Bisharat: PhD researcher in Computer Science at Loyola University Chicago, focused on adversarial machine learning, natural language processing, and social computing. Builds AI systems that are robust, socially aware, and reliable in high-stakes environments, with an emphasis on safety, fairness, and accountability.
Eric Spencer: Eric Spencer is a Senior Undergraduate Student majoring in Computer Science at Loyola University Chicago. He is a part of the AI for Formal Methods Lab at LUC and conducts research related to TLA+ and Large Language Models. He is planning to continue research at Loyola and work for a software agency after graduation.
Khushboo Bhadauria: Khushboo Bhadauria is a graduate student pursuing a Master’s degree in Computer Science at Loyola University Chicago. Her academic interests include artificial intelligence, machine learning, natural language processing, data analysis, and data engineering. She is a member of the AI for Formal Methods Laboratory at Loyola University Chicago, where she contributes to research involving TLA+, formal methods, and large language models (LLMs).
Anisa Ramos: Anisa Ramos is a senior undergraduate student at Loyola University Chicago majoring in Cybersecurity with a minor in Computer Crime and Forensic Science. She conducts research in the AI for Formal Methods Lab, focusing on TLA+ and large language models, and has experience in digital forensics and security analysis. She has interned with Motorola Solutions and Northwestern Medicine, and she plans to pursue a career in cybersecurity and research after graduation.
Mohammed Abuhamad: I am an assistant professor of Computer Science at Loyola University Chicago. I received a Ph.D. degree in Computer Science from the University of Central Florida (UCF) in 2020. I also received a Ph.D. degree in Electrical and Computer Engineering from INHA University (Incheon, Republic of Korea) in 2020. I received a Master's degree in Information Technology (Artificial Intelligence) from the National University of Malaysia (Bangi, Malaysia) in 2013.
Konstantin Läufer: I joined Loyola University Chicago's computer science faculty in 1992 after completing my PhD at NYU under Ben Goldberg and Martin Odersky. My work spans programming languages, formal methods, software architecture, and computer science education. I currently co-direct the AI for Formal Methods Lab and am a co-inventor on two Lucent Technologies patents.
TaiNing Wang: Dr. TaiNing Wang is a tenure-track Assistant Professor of Computer Science at Loyola University Chicago. Her research focuses on database systems, query processing and optimization, graph data, applied AI/ML, AI accountability, and formal methods. Her work has been published in leading venues such as ACM SIGMOD, IEEE ICDE, and Future Generation Computer Systems.
George K. Thiruvathukal: Dr. George K. Thiruvathukal is a full professor and chairperson of computer science at Loyola University Chicago. He received the PhD and MS degrees in computer science from Illinois Institute of Technology in 1995 and 1990, respectively, and BA degrees in physics and computer science (mathematics minor) from Lewis University (Romeoville, IL) in 1988.

Jetson Sensor Hub: A Virtual Sensor Telemetry Platform for Real-Time Environmental Monitoring

Authors: Om Patel, Dr Michael E. Papka

Institution(s): University of Illinois Chicago

Room: Lake

Board #: L23

Session #: 2

Abstract: Research labs that rely on environmental monitoring often face a common problem: getting sensor data off edge devices and into a place where researchers can actually use it, without spinning up heavyweight infrastructure. We present Jetson Sensor Hub, a lightweight virtual sensor telemetry platform built to collect, store, and visualize environmental data from multiple NVIDIA Jetson devices and expose it publicly without cloud dependency. Each Jetson runs a Python agent that reads the onboard BME680 hardware sensor channels (temperature, humidity, barometric pressure, and gas resistance), parses raw Linux IIO sysfs output into structured JSON, and pushes readings to a central server via HTTP at a configurable interval (five seconds by default). The server, built on FastAPI with SQLite-backed storage, handles per-device token authentication, stores time-series readings, and exposes a versioned REST API publicly at sage.evl.uic.edu. For low-latency use cases, the server supports WebSocket streaming, delivering live readings to subscribers in under one second without polling overhead. A lightweight HTML/JavaScript dashboard built with Chart.js provides real-time metrics, historical time-series plots, and per-device CSV export, all accessible from any browser with no installation required. The system is designed around simplicity and deployability. All components run as systemd services, agents operate statelessly, and the stack avoids external brokers or cloud services entirely. The platform is currently deployed at the Electronic Visualization Laboratory at UIC, where it supports ongoing NSF-funded research on the SAGE testbed, a national-scale distributed sensing infrastructure spanning edge, fog, and cloud computing tiers.
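
The agent step described in this abstract, turning raw IIO sysfs text into structured JSON, might look roughly like the sketch below. It is not the project's code. The attribute file names, scale factors, and JSON keys are assumptions chosen for illustration and should be checked against the actual driver on the device; the HTTP push and token authentication are left out.

```python
import json
import time
from pathlib import Path

# ASSUMPTION: attribute names and divisors are illustrative placeholders for a
# BME680 exposed through Linux IIO; verify the real names and units on the device.
# sysfs attribute -> (JSON key, divisor applied to the raw value)
CHANNELS = {
    "in_temp_input": ("temperature_c", 1000.0),
    "in_humidityrelative_input": ("humidity_pct", 1000.0),
    "in_pressure_input": ("pressure_raw", 1.0),
    "in_resistance_input": ("gas_resistance_ohm", 1.0),
}


def read_sysfs(device_dir):
    """Read whichever known attribute files exist under device_dir."""
    raw = {}
    for attr in CHANNELS:
        path = Path(device_dir) / attr
        if path.is_file():
            raw[attr] = path.read_text()
    return raw


def parse_readings(raw, device_id, timestamp=None):
    """Convert raw attribute text into a flat, JSON-serializable reading."""
    reading = {
        "device_id": device_id,
        "timestamp": time.time() if timestamp is None else timestamp,
    }
    for attr, (key, divisor) in CHANNELS.items():
        text = raw.get(attr)
        if text is None:
            continue  # channel missing on this device: skip rather than guess
        reading[key] = float(text.strip()) / divisor
    return reading


def to_json(reading):
    """Serialize a reading as the body of a push to the central server."""
    return json.dumps(reading, sort_keys=True)


if __name__ == "__main__":
    sample = {"in_temp_input": "23450\n", "in_resistance_input": "120000\n"}
    print(to_json(parse_readings(sample, "jetson-01", timestamp=0.0)))
```

Keeping parsing separate from file access means the conversion can be exercised with sample strings on any machine, which fits the abstract's description of stateless agents.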

UpDown: Efficient Manycore based on Many Threading and Scalable Memory Parallelism

Authors: Andronicus Rajasukumar, Ruiqi Xu, Tianchi Zhang, Yuqing (Ivy) Wang, Tianshuo Su, Marziyeh Nourian, Jianru Ding, Jiya Su, Rajat Khandelwal, Alexander Fell, David F. Gleich, Yanjing Li, Henry Hoffmann, Andrew A. Chien

Institution(s): University of Chicago, Purdue University

Room: Lake

Board #: L9

Session #: 2

Abstract: Manycore architectures are a promising direction for single-chip performance. They typically use in-order cores and caches as building blocks and can produce good performance on regular applications. However, on irregular applications, they have low core utilization due to data-dependent control flow and memory access. We propose the UpDown manycore, which employs a novel core with Event-Driven Scheduling (EDS), Software-controlled Lightweight Threading (SLT), and Split-transaction Memory Access with software synchronization (SMA) to deliver both high core utilization and chip performance. On a variety of graph and sparse applications, UpDown outperforms a much larger commercial multicore chip (20-core, OoO) by up to 81x. Compared to simple, in-order cores, the UpDown core’s novel mechanisms provide a 2.4-5.9x performance advantage, specifically 1.9x (EDS), 1.4x (SLT), and 1.4x (SMA). For a manycore chip, these architectural benefits enable a 2048-core UpDown to outperform an 8192-core in-order chip of similar Si area by 3.1x overall. By efficiently scheduling computation from many parallel threads, UpDown achieves high core utilization. Further, UpDown mechanisms enable it to more effectively exploit the growing bandwidth available from HBMs and tolerate higher NoC latencies, supporting future scaling.

Author Bios:
Andronicus Rajasukumar is a PhD candidate at the University of Chicago. He received his BE from the College of Engineering Guindy, India, and MTech from the Indian Institute of Technology, Delhi. He worked as a Senior Staff Engineer at Qualcomm (2014-20) and Intel (2010-14) in 3D-graphics GPU pre-silicon performance. His current research spans specialized and general-purpose computer architectures and processing near memory.

Ruiqi Xu received the BS and MS degrees from Northwestern University in 2023. He is currently working toward a PhD degree at the University of Chicago. His research interests include computer architecture and parallel programming.

Tianchi Zhang received the B.S. degree from the University of Michigan and Shanghai Jiao Tong University in 2020 and the M.S. degree from the University of Michigan in 2023. He is currently a Ph.D. student at the University of Chicago. His research interest is computer architecture.

Yuqing (Ivy) Wang is a computer science PhD student at the University of Chicago. She received her BSci degree from the University of Edinburgh. Her research spans parallel programming, operating systems, and unified memory models for novel accelerator systems.

Tianshuo Su received his BS from the University of Wisconsin - Madison and his MS from Georgia Institute of Technology. His research interests include computer architecture and high-performance computing. He was a senior engineer at Qualcomm working on DSP performance. Currently, he works at Google in cloud infrastructure efficiency.

Marziyeh Nourian received her Ph.D. in Computer Engineering from North Carolina State University. Her work addresses challenges in high-performance computing, focusing on data transformation, scalable execution of large-scale workloads, and hardware–software co-design. Her expertise includes parallel computing, heterogeneous and reconfigurable architectures, and compiler and runtime techniques.

Jianru Ding is a Ph.D. candidate at the University of Chicago and received his B.S. from The Ohio State University. His research focuses on adaptive runtimes for scalable distributed systems, bridging application-level optimizations with underlying hardware characteristics. He is particularly interested in improving the performance and efficiency of large-scale machine learning workloads.

Jiya Su is a Ph.D. student in Computer Science at the University of Chicago. She received the B.S. degree from Renmin University of China in 2020 and the M.S. degree from the Illinois Institute of Technology in 2023. Her research focuses on graph processing, parallel computing, and computer architecture.

Rajat Khandelwal is a PhD student at the University of Chicago. He received his B.Tech from the Indian Institute of Technology BHU, India. He worked as a System Software Developer at Intel (2020–2023) on USB, Type-C, and Thunderbolt subsystems. His current research spans systems and architecture, focusing on networking and high-performance data movement.

Alexander Fell received his PhD from the Indian Institute of Science, Bangalore, India, and during his career, he contributed to the RISC-V architecture and LLVM compiler development at the Barcelona Supercomputing Center and Nanyang Technological University, Singapore. Now, he is a Senior Research Specialist at the University of Chicago, focusing on the compiler and simulator for the UpDown architecture.

David F. Gleich received the BS degree from Harvey Mudd College and the PhD degree from Stanford University. He is a Professor of Computer Science at Purdue University and a University Faculty Scholar. His research is on data-driven scientific computing, matrix computations, network and graph algorithms, and parallel and distributed computing. He is a fellow of SIAM.

Yanjing Li received the BS and MS degrees from Carnegie Mellon University and a PhD from Stanford. She is an Associate Professor in the Department of Electrical and Computer Engineering at Northeastern University. Her research includes intersections of AI and systems, computing architecture for emerging technologies and applications, and hardware security.

Henry Hoffmann received the BS degree from UNC-Chapel Hill and the SM and PhD degrees from MIT. He is the Liew Family Chair of the Department of Computer Science at the University of Chicago. His research includes adaptive and self-aware computing systems.

Andrew A. Chien received the BS, MS, and PhD degrees from the Massachusetts Institute of Technology. He is the William Eckhardt Distinguished Service Professor in Computer Science at the University of Chicago and a senior computer scientist at Argonne National Laboratory. He has broad interests in computer systems and renewable energy. Chien served as Vice President of Research of Intel Corporation (2005-10), the SAIC Professor in computer science at the University of California, San Diego (1998-2005), and professor of computer science at the University of Illinois (1990-98). He is a fellow of the ACM, IEEE, and AAAS.

Evaluating Anomaly Detection Models for Cybersecurity: From Simulation to Real-World Data

Authors: Maheen Syeda, Yi Yang

Institution(s): Northeastern Illinois University

Room: Lake

Board #: L17

Session #: 2

Abstract: Modern cybersecurity systems generate large volumes of network and log data, making reliable detection of malicious activity increasingly challenging. This project presents an anomaly detection system for cybersecurity traffic using both simulated security logs and the CICIDS2017 real-world dataset. The study evaluates rule-based detection, unsupervised anomaly detection models and a supervised benchmark to better understand how different approaches perform across controlled and real-world environments. The project first explored rule-based methods, Local Outlier Factor, One-Class Support Vector Machine and Isolation Forest. These methods highlighted the difficulty of detecting attacks without labeled training data, especially in noisy and imbalanced cybersecurity traffic. Isolation Forest achieved the strongest performance among the unsupervised models after feature selection, sampling and parameter tuning. To improve detection accuracy, a supervised Random Forest model was added using labeled normal and attack traffic. Random Forest achieved the best overall performance, with an F1 score of approximately 0.97. The final system includes an interactive Streamlit dashboard that displays model performance, precision, recall, F1 score, false positives, false negatives, ROC curve, feature importance and sample predictions. This work highlights the practical difference between unsupervised anomaly detection and supervised classification in cybersecurity settings. Future work will focus on enhanced feature engineering, additional hybrid models and improving the dashboard for real-time security monitoring and analysis.
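
The dashboard metrics named above (precision, recall, F1 score, false positives, false negatives) all derive from the same four confusion counts. A minimal sketch of that arithmetic follows; the function name and the toy labels are illustrative assumptions, not the project's code or data.

```python
def classification_metrics(y_true, y_pred):
    """Confusion counts, precision, recall, and F1 for binary labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal traffic passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

On imbalanced traffic, accuracy alone can look high while recall on the attack class stays low, which is one reason to report F1 alongside the false-positive and false-negative counts.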

Author Bios:
Maheen Syeda is a graduate student in Computer Science at Northeastern Illinois University. Her research interests include artificial intelligence, machine learning, cybersecurity and data-driven systems. Her current work focuses on anomaly detection for cybersecurity network traffic using supervised and unsupervised learning methods.

Dr. Yi Yang earned her PhD in Computer Science and Engineering from Penn State in 2010. She is currently an Associate Professor in the Computer Science Department at Northeastern Illinois University in Chicago, IL. Her research expertise includes Cybersecurity, Computer Networks and Artificial Intelligence (AI).

Drava: An Event-Driven Runtime for Energy-Efficient, Agent-Aided Scientific Workflows

Authors: Ahmedur Rahman Shovon, Romain Pereira, Brice Videau, Swann Perarnau

Institution(s): Argonne National Laboratory

Room: Lake

Board #: L16

Session #: 2

Abstract: Next-generation detectors in high-energy physics (HEP) and X-ray science (XRS) are driving continuous, multi-stage streaming pipelines with facility-scale data rates approaching petabytes per second. These pipelines require real-time reconstruction under strict energy constraints, integrating data ingestion, preprocessing, AI inference, and feedback-driven control, with intermediate results steering detector configuration and analysis decisions. Such closed-loop workflows have usually relied on custom, one-of-a-kind hardware components, with limited ability to adapt to new sensor or compute technologies after deployment. We argue that a well-designed software infrastructure can efficiently orchestrate these workflows, avoiding manual tuning and allowing general-purpose computing hardware integration. We present Drava, an event-driven runtime system for scientific pipelines based on a lightweight event-driven model. Developers express application stages as reusable components, while Drava manages event scheduling, batching, data movement, and transport-aware execution across heterogeneous resources. This design separates scientific logic from system concerns and enables control over execution granularity, data locality, and resource utilization. Our contributions are threefold. First, we introduce a runtime abstraction for composing multi-stage scientific workflows as end-to-end streaming pipelines. Second, Drava provides stage-level and pipeline-level instrumentation to quantify how batching, concurrency, transport, and resource placement affect throughput, latency, and energy efficiency. Third, we enable agentic workflows for runtime optimization: one agent performs guided exploration to identify top-performing configurations, while another learns a surrogate model from sampled executions to predict promising configurations. 
We demonstrate Drava using multi-stage ptychography and tomography applications, showing sustained high-throughput execution and the ability to adapt to the increasing data rates and bandwidth demands of next-generation detectors.
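
To make the idea of stages as reusable components with runtime-managed batching concrete, here is a toy, single-process sketch. It is not Drava's API: the `Stage` class, `run_pipeline`, and the two example stages are hypothetical names chosen for illustration, and real concerns such as transport, placement, and concurrency are left out.

```python
from collections import deque


class Stage:
    """A reusable component that applies `fn` to one batch of events at a time."""

    def __init__(self, name, fn, batch_size=1):
        self.name = name
        self.fn = fn
        self.batch_size = batch_size
        self.pending = []

    def push(self, event):
        """Buffer an event; return output events once a full batch is ready."""
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return []

    def flush(self):
        """Process whatever is buffered, even if the batch is only partly full."""
        batch, self.pending = self.pending, []
        return self.fn(batch) if batch else []


def run_pipeline(stages, events):
    """Route events through the stages in order using one event queue."""
    results = []
    queue = deque((0, e) for e in events)

    def drain():
        while queue:
            idx, event = queue.popleft()
            if idx == len(stages):
                results.append(event)  # event left the last stage
            else:
                queue.extend((idx + 1, out) for out in stages[idx].push(event))

    drain()
    # End of stream: flush partial batches, upstream stages first.
    for idx, stage in enumerate(stages):
        queue.extend((idx + 1, out) for out in stage.flush())
        drain()
    return results


# Two hypothetical stages: scale each sample, then reduce each batch to a sum.
preprocess = Stage("preprocess", lambda batch: [x * 2 for x in batch], batch_size=2)
reduce_sum = Stage("reduce", lambda batch: [sum(batch)], batch_size=3)
outputs = run_pipeline([preprocess, reduce_sum], [1, 2, 3, 4, 5])
```

The stage functions contain only application logic, while batch sizes are plain parameters. That separation is what would let a tuner, or an agent, explore configurations without touching the stage code.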

Author Bios:
Ahmedur Rahman Shovon is a Postdoctoral Appointee in the Mathematics and Computer Science Division at Argonne National Laboratory. His research focuses on designing system software and runtime infrastructure for energy-efficient high-performance computing (HPC) platforms that support large-scale scientific data streaming workflows.

Romain Pereira is a postdoctoral appointee at the Argonne National Laboratory. He works on runtime systems towards performant, portable and reliable execution of parallel programs.

Brice Videau is a computer scientist at Argonne National Laboratory working on HPC-related topics such as heterogeneous platform programming and auto-tuning. He is also Performance Engineering Team Lead at the Argonne Leadership Computing Facility.

Swann Perarnau is a Computer Scientist in the Mathematics and Computer Science division of Argonne, where he currently studies the use of system software methods to improve the energy-efficiency of scientific computing platforms.

Rethinking Dependences for Compiler Parallelization

Authors: Federico Sossai

Institution(s): Northwestern University

Room: Lake

Board #: L4

Session #: 2

Abstract: Automatic parallelization of sequential programs remains ineffective outside small kernels with regular structure. As a consequence, expert programmers achieve multithreaded execution manually by reasoning about coarser-grained operations, reshaping data structure layouts to support parallel access, and exploiting algorithm-specific knowledge. None of the insights behind these transformations can be inferred by static analysis. Existing compilers reason about fine-grained dependences and fall short beyond scalar and array reductions, failing to capture opportunities that expert programmers routinely exploit. We propose a new compiler abstraction called the Dependence Termination Graph (DTG) that pairs dependences with programmer-defined transformations that can terminate them. By making these transformations first-class in compiler reasoning, DTG exposes parallelism that prior abstractions cannot represent or infer. This allows compilers to parallelize complex operations such as sequence appends and hash-table insertions across different concrete implementations. We define the DTG over a lightweight extension of LLVM IR that captures dependences between high-level memory operations beyond traditional low-level reads and writes. Programmers express relations through an annotation interface that lowers directly to the IR extension. We evaluate DTG on 11 benchmarks from PARSEC, GAPBS, and SPEC CPU 2017 where prior compilers failed to match the performance of manual parallel implementations. On a 28-core processor, our compiler T-800 outperforms prior automatic and semi-automatic approaches by 3.1×, closing 96% of the speedup achievable via manual parallelization.

Author Bios:
[Federico Sossai]
I am a fourth-year Computer Science PhD candidate at Northwestern University, advised by Simone Campanoni. My research explores abstractions and intermediate representations that help compilers uncover parallelism in CPU programs beyond what static analyses can infer automatically.

Shipyard: A Multi-Leader Consensus Protocol with Auto-Balanced Leadership of the Sharded Keyspace

Authors: Xincheng Yang, Kyle Hale

Institution(s): Illinois Institute of Technology

Room: Lake

Board #: L5

Session #: 2

Abstract: Leader-based consistency protocols often struggle with scalability due to the centralization of client request ordering. The leader, responsible for ensuring consistency by ordering reads and writes, becomes a bottleneck under high load, limiting overall system performance. This limitation is exacerbated by the tight coupling of consistency protocols with fault tolerance mechanisms, which further strain the leader’s capacity. Keyspace sharding, a widely adopted technique in databases and blockchain systems, helps mitigate this bottleneck by distributing the workload. However, it introduces two primary challenges: first, maintaining consistency across shards, and second, balancing the workload between them. While several multi-leader approaches have been proposed to address consistency issues, they often neglect the crucial aspect of leadership balancing. Furthermore, these existing solutions frequently fail to consider dynamic factors such as CPU utilization, request frequency, and other performance metrics that significantly impact overall system efficiency. To address these gaps, we propose Skiff, a variant of the Raft protocol that automatically balances leadership for a single shard. Furthermore, we introduce Shipyard, a higher-level protocol that manages all sharded keyspaces as a unified system, integrating failure recovery and offering linearizable reads and writes.

Author Bios:
Xincheng Yang is a PhD student at Illinois Institute of Technology. His research focuses on distributed systems, particularly consistency protocols in sharding-based environments.

Kyle Hale is an Associate Professor in the School of Electrical Engineering and Computer Science at Oregon State University. Previously at Illinois Institute of Technology, his research spans several areas in systems, including operating systems, computer architecture, embedded systems, system security, and virtualization.

Optimizing LLM Batch Inference with Workload-Aware DVFS

Authors: Yiheng Tao, Yuping Fan, Michael E. Papka, Zhiling Lan

Institution(s): University of Illinois Chicago, Argonne National Laboratory

Room: Lake

Board #: L22

Session #: 2

Abstract: Most prior work on large language model (LLM) inference has focused on online serving, where systems must react to dynamically arriving requests under strict latency and fairness constraints. However, batch inference is becoming increasingly important in many emerging workloads, including literature mining, synthetic data generation, scientific data annotation, and document summarization. Despite its growing practical importance, batch inference remains underexplored as a systems problem. A defining property of batch inference is that the workload is available before execution. This creates opportunities for optimizations that are difficult or impossible in online serving. In this work, we argue that advance knowledge of batch composition can be leveraged to analyze workload characteristics and proactively optimize execution for energy efficiency. We propose a workload-aware system design that analyzes a batch before or during execution and uses dynamic voltage and frequency scaling (DVFS) to better match hardware behavior to workload demand. By exploiting the predictability of batch jobs, such a system can improve the energy efficiency of LLM inference while maintaining competitive performance. This poster highlights workload-aware DVFS as a promising direction for energy-efficient LLM batch inference.

Author Bios:
Yiheng Tao is a Ph.D. student in the Department of Computer Science at the University of Illinois Chicago. His research focuses on high-performance computing, LLM infrastructure, and related systems topics.
Yuping Fan is a Postdoctoral Researcher at Argonne National Laboratory. Her current research focuses on optimization of Large Language Model inference systems and distributed systems. Prior to joining Argonne, she served as a Research Scientist at Meta Platforms (2022–2025) and earned her Ph.D. in Computer Science from the Illinois Institute of Technology in 2021, where her doctoral work focused on intelligent job scheduling for high-performance computing.
Michael E. Papka is the Warren S. McCulloch Professor of Computer Science and Director of the Electronic Visualization Laboratory at the University of Illinois Chicago (UIC). He is also an Argonne Distinguished Fellow at Argonne National Laboratory, where he serves as a Deputy Associate Laboratory Director and Director of the Argonne Leadership Computing Facility. In addition, he is the founding co-director of the Crabtree Institute, a collaborative initiative between UIC and Argonne. Dr. Papka’s research focuses on advancing scientific discovery through high-performance computing, artificial intelligence, large-scale data analysis, and visualization. Prior to joining UIC, he was a Presidential Research, Scholarship, and Artistry Professor of Computer Science at Northern Illinois University.
Zhiling Lan is a Full Professor of Computer Science at the University of Illinois Chicago (UIC) and holds a joint appointment with Argonne National Laboratory. Previously, she was a tenured Full Professor of Computer Science at the Illinois Institute of Technology from 2002 to 2023. Her research interests lie in high-performance computing (HPC), with particular emphasis on resource management and job scheduling, energy efficiency, fault tolerance, and digital twins. She has served as a program chair and program committee member for numerous international conferences, including SC, IPDPS, HPDC, ICS, Cluster, CCGrid, JSSPP, and SIGSIM PADS, among others.

Correct Is Not I/O Efficient: Measuring Storage Costs of LLM-Generated Scientific Workflows

Authors: Izzet Yildirim, Jaime Cernuda, Xian-He Sun, Anthony Kougkas

Institution(s): Illinois Institute of Technology

Room: Lake

Board #: L1

Session #: 2

Abstract: Large language model coding agents are increasingly used to generate scientific data-analysis workflows, but evaluations usually stop at output correctness. For data-intensive science, this is incomplete: two workflows can produce the same valid result while placing very different pressure on storage systems. This poster studies I/O behavior in correctness-passing LLM-generated scientific workflows. We evaluated Sonnet 4.5, Gemini 2.5 Flash, and Gemma 4 26B on five tasks: climate reanalysis, satellite raster processing, meteorological time-series composition, neuroscience HDF5/NWB analysis, and genomic variant-format conversion. Each workflow was executed in a container and validated against expected scientific outputs. For passing runs, we traced POSIX-level reads, writes, metadata operations, and per-step behavior, then compared the resulting I/O profiles against expert-written implementations. Across these five tasks, 46 of 105 agent-generated workflows passed validation. We summarized only cases where the same model completed the same task successfully at least three times. In all of those comparisons, the expert implementation used fewer POSIX operations and read less data. The median validated agent workflow used 3.8x more POSIX operations and read 6.5x more data than the expert baseline. We decomposed I/O into final-solution execution versus exploratory steps such as failed attempts, inspection, and verification. Exploration overhead varied by model: some passing runs concentrated nearly all I/O in the final solution, while others spent much more I/O during exploration. These results suggest that correctness alone is insufficient for evaluating LLM-generated scientific code. Storage behavior should be measured directly, especially before deploying agent-generated workflows on shared HPC or cloud storage systems.

Author Bios:
Izzet Yildirim: Izzet Yildirim is a Ph.D. candidate in the Department of Computer Science at the Illinois Institute of Technology and a member of the Gnosis Research Center. His research focuses on high-performance computing, with emphasis on I/O performance optimization, I/O characterization, and automated detection of I/O bottlenecks in HPC and deep learning workflows.

Dr. Jaime Cernuda: Dr. Jaime Cernuda is a Research Assistant Professor in the Department of Computer Science at the Illinois Institute of Technology and a member of the Gnosis Research Center. He specializes in high-performance computing infrastructure, with expertise in distributed storage systems, real-time data processing, and exa-scale computing environments.

Dr. Xian-He Sun: Dr. Xian-He Sun is the Director of the Gnosis Research Center and an IEEE Fellow. He is a University Distinguished Professor and the Ron Hochsprung Endowed Chair of Computer Science at the Illinois Institute of Technology. His research interests include parallel and distributed processing, memory and I/O systems, software systems for Big Data applications, and performance evaluation and optimization.

Dr. Anthony Kougkas: Dr. Anthony Kougkas is the Deputy Director of the Gnosis Research Center at the Illinois Institute of Technology. With a Ph.D. in Computer Science, he focuses on solving data management and I/O challenges in extreme-scale distributed applications, with research spanning multi-tiered storage, prefetching, replication, compression, and in-transit/in-situ techniques.

Everything You Think You Know About Compiled and Vectorized Queries Is Wrong

Authors: Alex Butler

Institution(s): Northwestern University

Room: Lake

Board #: L6

Session #: 2

Abstract: Kersten et al. (2018) compared vectorized and data-centric compiled query execution within a single test framework on a single-socket Skylake X at moderate thread counts, concluding that the two paradigms achieve broadly similar performance with principled tradeoffs by query class. Their conclusions and recommendations have since been treated as settled. We reproduce their methodology on five contemporary platforms — multi-socket Cascade Lake, Neoverse N1, Neoverse V2 in single- and dual-NUMA configurations, and AMD EPYC — at higher thread counts and finer thread-count sampling. We find that the paper's qualitative mechanisms generalize: compilation wins register-resident compute, vectorization hides cache-miss latency better, and the relative engine ranking on most queries matches the paper's findings at single thread. The quantitative conclusions do not generalize. Engine performance ratios at high thread counts are governed by which platform-specific bottleneck binds — memory bandwidth on single-socket hardware, cross-NUMA coordination on multi-NUMA hardware, execution-port contention under SMT — and these bottlenecks vary in ways the paper's measurement platform could not have surfaced. Several specific recommendations from the paper actively mislead on modern hardware: vector size 1024 is suboptimal across all tested platforms, SIMD optimizations that the paper found neutral on AVX-512 produce substantial engine-ranking shifts on ARM, and hyperthreading degrades rather than improves throughput on most queries on modern server parts. We argue that engine choice on modern many-core systems is a function of platform topology and operating point that no single benchmark configuration can capture.
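
For readers unfamiliar with the two paradigms, the sketch below evaluates the same filtered aggregate both ways. It only illustrates the structural difference: the query, function names, and data are assumptions, and Python will not reproduce the performance behavior discussed above.

```python
def compiled_style(prices, qtys, threshold):
    """Data-centric style: one fused loop; intermediates stay in local variables."""
    total = 0
    for price, qty in zip(prices, qtys):
        if qty > threshold:
            total += price * qty
    return total


def vectorized_style(prices, qtys, threshold, vector_size=1024):
    """Vector-at-a-time style: each primitive handles a whole chunk and
    materializes its result before the next primitive runs."""
    total = 0
    for start in range(0, len(prices), vector_size):
        p = prices[start:start + vector_size]
        q = qtys[start:start + vector_size]
        selection = [i for i, qty in enumerate(q) if qty > threshold]  # select
        products = [p[i] * q[i] for i in selection]                    # map
        total += sum(products)                                         # aggregate
    return total


# Made-up columns for illustration.
prices = list(range(1, 2001))
qtys = [i % 20 for i in range(2000)]
assert compiled_style(prices, qtys, 10) == vectorized_style(prices, qtys, 10)
```

The `vector_size` parameter is the knob the abstract refers to: it determines how much intermediate state each primitive materializes per call, so its best value depends on the cache hierarchy of the platform.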

Author Bios:
Alex Butler is a Computer Science PhD student at Northwestern University, advised by Peter Dinda in the Prescience Lab. His research lies at the intersection of firmware and computer architecture, with recent work on synchronization and communication primitives for parallel systems and architectural support for databases. He previously earned his bachelor's in Electrical & Computer Engineering from Olin College and worked on IBM Z mainframes across IBM's Millicode and System Architecture teams.

HermeticMerge: Provenance-Aware Capture and Replay for Reproducible Distributed Workflows

Authors: Talha Azaz, Raza Ahmad, Tanu Malik

Institution(s): University of Missouri, Columbia, MO, USA; DePaul University, Chicago, IL, USA

Room: Lake

Board #: L13

Session #: 2

Abstract: Reproducing distributed scientific workflows in HPC environments requires capturing the complete execution state of both the workflow coordinator and all compute nodes. While application virtualization techniques can audit a single process and produce a self-contained replayable container, applying this to distributed workflows introduces three unsolved problems: (1) auditing each worker independently produces heavily overlapping file sets, making naive aggregation impractical at scale due to redundant data transfer; (2) each worker produces its own isolated provenance log, leaving cross-node data dependencies (files produced by one worker and consumed by another) unrecorded, with process identities that are only locally meaningful; and (3) replaying the workflow on a new cluster requires efficiently distributing the captured environment without shipping files that were not true dependencies (read before write). This paper proposes a system that addresses all three problems. To capture worker environments efficiently, each worker sends its provenance log to the head node, which uses content hashes to identify files already present and requests only the missing delta container, reducing transfer proportionally to environment overlap across workers. To unify provenance, we merge per-worker logs into a single coherent record by mapping intermediate files to a canonical identity via content hashing and by namespacing process identifiers to eliminate conflicts across nodes. For replay, we construct a minimal hermetic container containing only the files accessed at audit time, which is efficiently distributed to compute nodes on the target cluster. Together, these contributions enable faithful, auditable reproduction of distributed workflows without access to the original execution environment.
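
The two mechanisms described above, content-hash deduplication and namespaced process identifiers, can be sketched in a few lines. This is an illustration under stated assumptions, not the system's implementation: SHA-256 as the hash, the log record shape `(pid, op, digest)`, and all function names are hypothetical, and event ordering is ignored.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Canonical identity for file contents (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def missing_delta(worker_manifest, head_store):
    """Paths whose contents the head node does not already hold.

    worker_manifest: path -> content hash, taken from a worker's provenance log.
    head_store: set of content hashes already present on the head node.
    """
    return sorted(path for path, digest in worker_manifest.items()
                  if digest not in head_store)


def namespace_pid(node, pid):
    """Make a locally meaningful PID unique across nodes."""
    return f"{node}:{pid}"


def merge_logs(per_worker_logs):
    """Merge per-worker logs and derive producer -> consumer dependency edges.

    per_worker_logs: node -> list of (pid, op, digest), op in {"read", "write"}.
    Simplification: timestamps are ignored, so any read of a written hash
    counts as a dependency.
    """
    merged, writers = [], {}
    for node, log in per_worker_logs.items():
        for pid, op, digest in log:
            proc = namespace_pid(node, pid)
            merged.append((proc, op, digest))
            if op == "write":
                writers[digest] = proc
    edges = sorted({(writers[d], proc) for proc, op, d in merged
                    if op == "read" and d in writers and writers[d] != proc})
    return merged, edges


lib = content_hash(b"shared library bytes")
inter = content_hash(b"intermediate result bytes")
delta = missing_delta({"/usr/lib/libfoo.so": lib, "/tmp/out.dat": inter}, {lib})
merged, edges = merge_logs({
    "w1": [(42, "write", inter)],
    "w2": [(42, "read", inter), (42, "read", lib)],
})
```

Both workers use PID 42 in this example. After namespacing, the merged log distinguishes `w1:42` from `w2:42`, and the shared content hash links the file written on one node to the read on the other.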

Speeding Up Transformation Search with Lightweight Statistics

Authors: Rubab Zahra Sarfraz, Boris Glavic

Institution(s): University of Illinois Chicago

Room: Lake

Board #: L19

Session #: 2

Abstract: Data lakes contain highly heterogeneous data with little to no schema, resulting in columns that are semantically joinable but syntactically incompatible (e.g., "Springfield, IL" vs. "Springfield"). Transformation discovery systems such as AutoJoin enumerate possible transformation sequences to identify one that enables equi-joins between such columns. However, exhaustive search is prohibitively expensive at data lake scale. In this work, we investigate the use of inexpensive column-level statistics (e.g., string-length distributions, delimiter frequencies, and token counts) to prune large portions of the transformation search space. To handle the diversity of transformation types that may be needed to achieve joinability, we study how to automatically derive statistics and pruning rules from the preconditions of transformation operators. As a proof of concept, we integrate our techniques into the AutoJoin transformation search algorithm. Our preliminary results on 10 real-world datasets show up to a 47x speedup over vanilla AutoJoin. Beyond AutoJoin, these techniques can be integrated into other transformation discovery systems, such as FlashFill, making them more feasible at data lake scale.
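
A rough sketch of precondition-based pruning: compute cheap statistics once per column, then drop any candidate operator whose precondition those statistics already rule out. The operator set, the preconditions, and every name below are assumptions made for illustration; they are not AutoJoin's operators or the authors' rules.

```python
def column_stats(values, delimiters=(",", " ", "-")):
    """Cheap per-column statistics (assumes a non-empty list of strings)."""
    n = len(values)
    return {
        "min_len": min(len(v) for v in values),
        "max_len": max(len(v) for v in values),
        # Fraction of values that contain each delimiter at least once.
        "delim_freq": {d: sum(d in v for v in values) / n for d in delimiters},
    }


def candidate_operators(delimiters=(",", " ", "-"), prefix_lengths=(2, 12)):
    """Hypothetical operators, each paired with a precondition over the stats."""
    ops = []
    for d in delimiters:
        # Splitting on a delimiter that never occurs cannot change any value.
        ops.append((f"split({d!r})", lambda s, d=d: s["delim_freq"][d] > 0))
    for k in prefix_lengths:
        # Assumed precondition: every value must have at least k characters.
        ops.append((f"prefix({k})", lambda s, k=k: s["min_len"] >= k))
    return ops


def prune(ops, stats):
    """Keep only the operators whose precondition the statistics satisfy."""
    return [name for name, precondition in ops if precondition(stats)]


source = ["Springfield, IL", "Chicago, IL", "Peoria, IL"]
kept = prune(candidate_operators(), column_stats(source))
```

Here `split('-')` and `prefix(12)` are discarded without ever being applied to the data, so every transformation sequence that begins with them drops out of the search as well.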

Author Bios:
Rubab Zahra Sarfraz is a first-year PhD student in the database group at the University of Illinois Chicago, advised by Prof. Boris Glavic (https://www.cs.uic.edu/~bglavic/dbgroup/members/bglavic.html). Her research interests are in data discovery, data cleaning, and data integration. She is currently working on making data transformations in large data lakes scalable and automatic.

Boris Glavic (https://www.cs.uic.edu/~bglavic/dbgroup/members/bglavic.html) is an Associate Professor in the Department of Computer Science at the University of Illinois at Chicago, where he leads the DBGroup. His research spans several areas of database systems and data science including data provenance, data integration, query execution and optimization, uncertain data, and data curation. Boris strives to build systems that are based on solid theoretical foundations.

Distributed Edge Computing Task Allocation with Network Effects

Authors: Henry W. Abrahamson, Yongho Kim, Seongha Park, Ermin Wei

Institution(s): Northwestern University, Argonne National Lab

Room: Lake

Board #: L21

Session #: 2

Abstract: Field-deployable edge computing nodes form a network and are used to complete scientific tasks for remote sensing and monitoring. The networked nodes collectively decide which scientific applications to run while they are constrained by various factors, such as differing hardware constraints from heterogeneous nodes and time-varying quality of service (QoS) requirements. We model the problem of task allocation as an optimization problem that maximizes the QoS, subject to the constraints. We solve the optimization problem using a dual-descent method, which can be easily implemented in a distributed way subject to the communication constraints of the network. Using a simulation that uses real-world data collected from Sage, a distributed sensor network, we analyze our policy’s performance in dynamic situations where the required QoS and the nodes’ capabilities change, and verify that it can adapt and return a feasible solution while accounting for those changes.
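
To show why a dual method distributes naturally, here is a toy dual gradient iteration on a stand-in problem. The log-utility objective, the single capacity constraint, and all names and parameter values are assumptions for illustration; this is not the authors' QoS model or policy.

```python
def dual_allocation(weights, capacity, step=0.1, iterations=200, price=2.0):
    """Toy dual gradient iteration for

        maximize   sum_i w_i * log(x_i)
        subject to sum_i x_i <= capacity.

    For a given price (the dual variable), node i maximizes
    w_i * log(x_i) - price * x_i on its own, which gives x_i = w_i / price.
    The price rises when total demand exceeds capacity and falls otherwise.
    """
    for _ in range(iterations):
        allocation = [w / price for w in weights]   # local, per-node step
        excess = sum(allocation) - capacity         # needs only an aggregate
        price = max(1e-9, price + step * excess)    # dual update, kept positive
    return [w / price for w in weights], price


allocation, price = dual_allocation(weights=[1.0, 2.0, 3.0], capacity=6.0)
```

Each node needs only its own weight and the current price, and the price update needs only the total demand. The network therefore exchanges an aggregate and a scalar rather than full state, which is what makes this style of method fit limited communication. If capacities or weights change, continuing the iteration from the current price lets the allocation adjust.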

Author Bios:
Henry Abrahamson is a PhD student of Electrical and Computer Engineering at Northwestern University under Professor Ermin Wei. His research interests include distributed systems, optimization & control, and online algorithms.

Ermin Wei is an associate professor in the Department of Electrical Engineering & Computer Science and Industrial Engineering & Management Sciences at Northwestern University. Her research interests include distributed optimization methods, smart grid and other networked markets, network optimization and control, and game theory.

Yongho Kim is an assistant computer scientist at Argonne National Laboratory. His research interests include AI@edge, edge computing, task scheduling, resource management and control, and agentic AI.

Seongha Park is an assistant computer scientist at Argonne National Laboratory. Her research interests include agentic AI for scientific research, probabilistic embeddings, and AI for error mitigation and characterization in quantum computing.

Room: Rock

Evaluating Experimental Designs for Short Video Studies

Authors: Annapurna B. Puttaswamy, Aaron D. Striegel

Institution(s): University of Notre Dame

Room: Rock

Board #: R6

Session #: 2

Abstract: Short video platforms such as TikTok and Instagram Reels have gained popularity in multimedia by reshaping video consumption. The short video duration and distinctive swipe interaction on these platforms make recommendation and preloading mechanisms crucial. Existing studies have focused on optimizing these mechanisms to enhance the user experience, using both conventional and learning-based approaches. However, these studies evaluate their mechanisms under different experimental designs, which can make the results less reliable across environments. Hence, in this work we explore various experimental designs and evaluate the efficiency of existing mechanisms under each of them. We provide guidelines for making better experimental design choices.

Author Bios:
ANNAPURNA B. PUTTASWAMY received the B.E. degree in Computer Science and Engineering in 2016 and the M.Tech. degree in Computer Networks and Engineering in 2018 from Visvesvaraya Technological University, India, and is currently pursuing a Ph.D. in Computer Science and Engineering at the University of Notre Dame, USA. Current research focuses on multimedia streaming, QoE optimization, healthcare mobile apps, and wearables.

AARON D. STRIEGEL received a Ph.D. degree in Computer Science and Engineering from Iowa State University in 2002 and is currently a Professor of Computer Science and Engineering at the University of Notre Dame, USA. Research interests focus on instrumenting the wireless networked ecosystem, privacy, healthcare applications, and security dynamics.

Graph Neural Networks for Wireless Interference Estimation

Authors: Yanzhi Li

Institution(s): Northwestern University

Room: Rock

Board #: R10

Session #: 2

Abstract: Accurate interference characterization is essential for channel assignment and resource allocation in wireless networks. Pairwise interference among access points (APs) can be quantified using received signal strength (RSS) measurements. However, constructing a complete RSS map via direct measurement is often impractical due to the high measurement cost. Existing approaches to construct RSS maps face important trade-offs in terms of estimation accuracy, measurement overhead, side-information requirements, and adequacy to unseen/dynamic scenarios. We propose a graph-based framework for inferring RSS maps from sparse measurements. By modeling the wireless network as a directed graph and formulating RSS prediction as an edge inference problem, we employ a Message-Passing Graph Neural Network (MPGNN) to capture both local and global interference dependencies without requiring explicit scene-level environmental models and maps, which existing solutions, including ray tracing, require as side information. Evaluations on measurement and ray-tracing datasets suggest that the proposed approach generalizes across network topologies and remains robust to deployment changes and measurement sparsity. Compared to representative baseline methods, it reduces normalized root mean square error (NRMSE) by up to 0.15 in low-observation regimes while substantially lowering measurement overhead.

Author Bios:
I am a Computer Engineering PhD candidate at Northwestern University, advised by Prof. Igor Kadota. My research goal is to apply machine learning to solve dynamic spectrum sharing problems.

DCGen 1.1: Generating Datacenter Configs (IT, Power, Cooling)

Authors: Wedan Emmanuel Gnibga and Andrew A. Chien

Institution(s): University of Chicago

Room: Rock

Board #: R4

Session #: 2

Abstract: Diversification of digital applications and workloads has driven the development of diverse datacenter architectures on ever-larger scales. These datacenters consist of complex IT, power, and cooling systems with interdependencies that influence configuration and performance. As datacenters scale and power density increases, designing realistic models becomes more difficult, particularly for research, because it requires understanding all layers of the datacenter and how they interact. Consequently, many studies rely on outdated or unrealistic designs. To support research in datacenter hardware design principles, operational dynamics, cooling mechanisms, and interactions of these facilities with the electrical grid, we have designed DCGen, a tool which can generate a variety of datacenter configurations (including IT hardware, cooling and power distribution infrastructures) at various electrical power, compute capability, and area targets. The tool captures power and space characteristics of IT, cooling, and power infrastructures at both the rack and datacenter levels, enabling modeling of power, energy, and space. DCGen leverages specific use cases such as AI training, AI inference, and cloud services, to select reference and canonical IT hardware configurations, producing realistic mixes of server types. It can target datacenter scale in terms of both power (e.g., 10 MW, 100 MW, 1 GW) and compute capability. For cooling and power distribution infrastructures, DCGen chooses components from a production equipment catalog that optimizes for space or power efficiency while meeting the datacenter capacity requirements. This tool supports research using realistic datacenter designs through “what-if” scenario exploration, including studies of power density evolution over time, grid interconnection capacity planning, datacenter-grid interactions, and space management.

Author Bios:
Wedan Emmanuel Gnibga received his PhD in Computer Science from the University of Rennes (France) in 2024 and an engineering degree in embedded electronic systems and systems control from the National School of Applied Sciences in Marrakech, Morocco, in 2020. He is currently a postdoctoral researcher at the University of Chicago.

Andrew A. Chien is the William Eckhardt Professor at the University of Chicago and a Senior Scientist at Argonne National Laboratory. In prior roles he served as Vice President of Research at Intel and as Professor at UCSD and the University of Illinois. Dr. Chien is a global research leader for sustainable computing, datacenters and power grids, parallel computing, computer architecture, clusters, resource scheduling, and cloud computing, and has received numerous awards for his research. Dr. Chien is an ACM, IEEE, and AAAS Fellow, and earned his PhD, MS, and BS from M.I.T.

Applications do I/O. What if files actually went... somewhere else?

Authors: Neeraj Rajesh, Xian-He Sun, Anthony Kougkas

Institution(s): Illinois Tech

Room: Rock

Board #: R5

Session #: 2

Abstract: Applications are chained to their storage—moving from POSIX files to HDF5, ADIOS2, or cloud formats requires rewriting code. I/O Router breaks that chain. Our framework intercepts file operations at three levels (LD_PRELOAD, FUSE, eBPF) and routes them to diverse backends without application modification. Our insight: separate data from metadata. System calls like fstat() and access() pass through to the real filesystem (keeping HPC benchmarks happy), while read() and write() redirect to structured storage via configurable path rules (/data/**/*.h5). A hybrid architecture makes this practical: eBPF tracks process lifecycles via ring buffers, a userspace controller maintains interception sets, and LD_PRELOAD performs O(1) file-descriptor routing at open() time. This avoids race conditions and preserves compatibility with IOR, HDF5, and MPI-IO. Built in Rust with lock-free concurrency, I/O Router achieves <50% overhead for data-path operations while enabling zero-code-modification adoption of next-generation storage.
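
The data/metadata split and the path-rule idea described above can be illustrated with a small sketch. This is not the I/O Router implementation (which the abstract says is built in Rust); the backend name `hdf5_backend`, the `route` helper, and the exact glob semantics (`**/` spanning zero or more directories) are assumptions made for illustration only.

```python
import re

# Operation classes taken from the calls named in the abstract.
METADATA_OPS = {"fstat", "access"}   # pass through to the real filesystem
DATA_OPS = {"read", "write"}         # candidates for redirection


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob into a regex.

    Assumed semantics: '**/' matches zero or more directories,
    '*' matches within a single path component.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")
            i += 3
        elif pattern[i] == "*":
            parts.append(r"[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def route(op: str, path: str, rules) -> str:
    """Return the backend for a data operation, or 'passthrough' otherwise."""
    if op in DATA_OPS:
        for regex, backend in rules:
            if regex.match(path):
                return backend
    # Metadata calls and unmatched paths go to the real filesystem.
    return "passthrough"


# Hypothetical rule table using the path rule quoted in the abstract.
RULES = [(glob_to_regex("/data/**/*.h5"), "hdf5_backend")]
```

With this rule table, `route("read", "/data/run1/out.h5", RULES)` selects the structured backend, while `route("fstat", "/data/run1/out.h5", RULES)` stays on the passthrough path.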

Author Bios:
Neeraj Rajesh is a PhD Candidate in Computer Science at Illinois Institute of Technology. His research focuses on parallel and distributed systems, high-performance computing, and ML-assisted systems optimization. He is passionate about free and open-source software, federated protocols, and democratizing data ownership through accessible technology.

Anthony Kougkas is an Associate Research Professor of Computer Science at Illinois Institute of Technology and Deputy Director of the Gnosis Research Center. His research focuses on HPC storage and I/O, multi-tiered storage architectures, and data management for AI/ML workflows. He is a Guest Research Scientist at Argonne National Laboratory with 50+ publications and recipient of the HPDC'19 and CCGrid'21 Best Paper Awards.

Dr. Xian-He Sun is a University Distinguished Professor and Ron Hochsprung Endowed Chair of Computer Science at Illinois Tech. An IEEE Fellow, his research focuses on parallel and distributed processing, high-performance memory and I/O systems, and performance optimization for big data applications.

Agentic Search Efficiency for Scientific Data Discovery

Authors: Shazzadul Islam, Jaime Cernuda, Xian-He Sun, Anthony Kougkas

Institution(s): Illinois Institute of Technology

Room: Rock

Board #: R9

Session #: 2

Abstract: Finding scientific data is an iterative reasoning task. A scientist issues a query, skims results, reformulates when units mismatch, inspects candidates, and repeats until something fits. Agentic systems promise to automate this loop, but the cost is alarming: a single discovery query can consume a million tokens and tens of minutes, mostly re-reading metadata already seen. The bottleneck is not model capability; current agents treat retrieval as an opaque tool and reason about every intermediate result in context. We present CLIO Search, which separates reasoning from execution. Domain knowledge lives in the retrieval layer, not the LLM. Deterministic science-aware operators (unit conversion, formula matching) run as first-class branches alongside BM25 and vector search, correct by construction rather than similarity. CLIO profiles each corpus before querying, selects branches, routes across HPC backends, and iteratively refines. The LLM is advisory; scoring, filtering, and ranking stay deterministic, making the correctness model-independent. We compare two baseline paths against CLIO Search across 10 queries on the National Data Platform (NDP; 341 datasets) using claude-sonnet-4-20250514. Claude Native runs a Claude agent with web search and sub-agents, but no domain tools. NDP-MCP exposes NDP's catalog via a Model Context Protocol server. CLIO cuts tokens by ~52% and time by ~91% vs Claude Native, ~52% and ~75% vs NDP-MCP. Per-query context drops two orders of magnitude; 9 of 10 queries resolve in a single CLIO pass compared to 26 and 8 agent iterations on the baseline paths. Agentic search scales through improved harnessing, not prompting.
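
To make the idea of a deterministic, science-aware operator concrete, here is a minimal sketch of unit-normalized matching. It is not CLIO Search code: the catalog entries, field names, and the `datasets_covering` helper are hypothetical, and only temperature units are handled.

```python
# Conversion into kelvin as (scale, offset): K = value * scale + offset.
TO_KELVIN = {
    "K": (1.0, 0.0),
    "degC": (1.0, 273.15),
    "degF": (5.0 / 9.0, 459.67 * 5.0 / 9.0),
}


def to_kelvin(value: float, unit: str) -> float:
    """Normalize a temperature to kelvin."""
    scale, offset = TO_KELVIN[unit]
    return value * scale + offset


def datasets_covering(query_value: float, query_unit: str, catalog) -> list:
    """Return names of datasets whose recorded range contains the query value.

    Matching is done after unit normalization, so a query in kelvin can hit
    metadata recorded in Celsius or Fahrenheit without any similarity search.
    """
    q = to_kelvin(query_value, query_unit)
    hits = []
    for entry in catalog:
        low = to_kelvin(entry["min"], entry["unit"])
        high = to_kelvin(entry["max"], entry["unit"])
        if low <= q <= high:
            hits.append(entry["name"])
    return hits


# Hypothetical metadata records, invented for this example.
CATALOG = [
    {"name": "buoy_a", "min": -5.0, "max": 30.0, "unit": "degC"},
    {"name": "lab_b", "min": 250.0, "max": 260.0, "unit": "K"},
    {"name": "station_c", "min": 50.0, "max": 90.0, "unit": "degF"},
]
```

Because the comparison is arithmetic rather than learned, the result does not depend on which LLM is attached, which is the property the abstract describes as model-independent correctness.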

Author Bios:

Shazzadul Islam is a Ph.D. student at Illinois Institute of Technology and a member of the Gnosis Research Center. His research focuses on AI-driven scientific data systems, high-performance computing, and scalable data infrastructures, including workflow optimization and large-scale data management.

Dr. Jaime Cernuda is a Research Assistant Professor at Illinois Institute of Technology and a member of the Gnosis Research Center. His research interests include distributed systems, high-performance computing, and streaming applications for large-scale scientific data processing.

Dr. Xian-He Sun is a University Distinguished Professor and Director of the Gnosis Research Center at Illinois Institute of Technology. His research focuses on parallel and distributed systems, memory and I/O systems, and performance optimization for data-intensive computing.

Dr. Anthony Kougkas is a Research Associate Professor and Deputy Director of the Gnosis Research Center at Illinois Institute of Technology. His research focuses on storage systems and I/O optimization for extreme-scale distributed applications and HPC environments.

Parameterized Algorithms and Parameter Selection for Fast GPU-GPU Collective Communication

Authors: Peizhi Liu, Sean Rhee, Michael Wilkins, Peter Dinda

Institution(s): Northwestern University, New York University, Cornelis Networks

Room: Rock

Board #: R2

Session #: 2

Abstract: High-performance collective communication among GPUs in modern supercomputers is crucial for enabling many applications. Complex hierarchical interconnects between GPU devices necessitate collective algorithms that can effectively leverage the underlying network topology. We present parameterized algorithms for two GPU-to-GPU collectives, Allgather and Allreduce, as well as an optimized permutation kernel used to further enhance GPU collective communication. By employing a LogGP-based model calibrated with real machine measurements, we can efficiently simulate various parameter choices to identify optimal settings for specific device allocations and message sizes. We perform a comprehensive evaluation on NCSA Delta and further assessment on Argonne Polaris supercomputers. We demonstrate, through Delta, that our parameterized algorithms can achieve, on average, a 20% speedup over their non-parameterized counterparts, with our parameter selection process capturing 98% of the potential speedup.

Author Bios:
Peizhi Liu is a Ph.D. candidate in computer science at Northwestern University, advised by Professor Peter Dinda. His research focuses on heterogeneous interconnects and communication optimization in large-scale parallel and distributed systems, with an emphasis on making complex hardware more efficient, programmable, and portable. He is an NSF Graduate Research Fellow.

Sean is a PhD student at NYU working on networks and formal methods.

Mike Wilkins is a Senior Software Engineer at Cornelis Networks. He received his Ph.D. in computer engineering from Northwestern University and was previously a Maria Goeppert Mayer fellow at Argonne National Laboratory. His research focuses on networks and communication for HPC systems, especially collective communication algorithms and autotuning.

Peter Dinda is a professor in the Department of Computer Science at Northwestern University, and also holds an appointment in the Department of Electrical and Computer Engineering. He works in experimental computer systems. You can find out more about him at pdinda.org.

Shards on a Shoestring: Putting Blockchain Sharding Simulations to the Test

Authors: Om Amith Gandhi, Sohini Sahukar, Dr. Ioan Raicu

Institution(s): Illinois Institute of Technology

Room: Rock

Board #: R3

Session #: 2

Abstract: Modern blockchains struggle to scale - Bitcoin processes roughly 6 transactions per second while Visa handles thousands. Sharding, which splits transaction processing across parallel groups of validators, is the most promising path forward. NEAR Protocol's Nightshade sharding design claims to achieve up to one million transactions per second, but how well do the simulations that inform such designs actually reflect reality? This poster presents an ongoing effort to find out. We take a validated discrete-event sharding simulator modeled after NEAR's Nightshade architecture and put its predictions to the test by running real sharding experiments on Chameleon Cloud, a bare-metal testbed, adapting NEAR's one-million-TPS benchmark harness from its original GCP infrastructure to a scaled-down but representative Chameleon configuration. By instrumenting both systems for the same core metrics including throughput, confirmation latency, and coordination overhead, we can directly measure the gap between what the simulation predicts and what real hardware delivers, and interrogate what that gap says about the assumptions sharding models rely on. The broader implication is what makes this work matter: if simulations systematically misrepresent sharding performance, the field needs to know. If they hold up, they become a powerful and accessible tool for guiding protocol design decisions without depending on commercial cloud deployments.

Author Bios:
Sohini Sahukar
Sohini Sahukar is a graduate student in Computer Science at Illinois Institute of Technology with interests in distributed systems and blockchain technologies. She is particularly interested in understanding the real-world performance of consensus mechanisms and sharded architectures.

Om Amit Gandhi
Om Amit Gandhi is a Computer Science student at Illinois Institute of Technology specializing in systems programming, GPU computing, and blockchain technology. His work spans parallel computing, cryptography, and high-performance architecture, with projects including GPU-accelerated blockchain plotters and distributed sharding simulations. He actively builds portfolio projects bridging academic research and real-world systems.

Dr. Ioan Raicu
Dr. Ioan Raicu is an Associate Professor in the Department of Computer Science at Illinois Institute of Technology and guest research faculty at Argonne National Laboratory. He is the founder and director of the Data-Intensive Distributed Systems Laboratory (DataSys Lab). His research focuses on distributed systems, many-task computing, and data-intensive computing at extreme scales, with recognition including the IEEE TCSC Young Achievers in Scalable Computing award and an NSF CAREER award.

Uncertainty-Aware GPU Hour Allocation via Conformal Risk Control

Authors: Iqtedar Uddin, André Bauer

Institution(s): Illinois Institute of Technology

Room: Rock

Board #: R8

Session #: 2

Abstract: In shared GPU clusters, jobs must be assigned a time allocation before their true runtime is known. Jobs exceeding their allocation are killed, wasting all prior compute, while over-allocating wastes scarce GPU hours. Studies of production clusters report that GPU utilization often sits at 30 to 40%, driven by conservative over-allocation. Current practice relies on fixed multiplicative buffers such as 2x predicted runtime, applying uniform safety margins regardless of per-job uncertainty. We propose an uncertainty-aware allocation framework that combines Conformal Quantile Regression with Conformal Risk Control to produce per-job time allocations with formal, finite-sample guarantees on expected overrun. Unlike fixed buffers, our approach generates heteroscedastic prediction intervals that adapt to each job's uncertainty: routine jobs receive tight allocations while hard-to-predict jobs receive wider margins. A tail-adaptive variant partitions jobs by runtime bucket and applies risk control independently to each, protecting expensive long-running jobs that dominate cluster costs. For recurring job groups, an online adaptation mechanism updates intervals in real time. We evaluate on the Alibaba PAI-2020 trace (732K jobs) using the ATLAS leakage-free benchmark, with cross-cluster validation on the Microsoft Philly trace. Our approach achieves up to 50% lower scheduling cost than the best fixed buffer at matched coverage levels, Pareto-dominates fixed buffers across all cost ratios, and reaches high-coverage operating points that fixed buffers cannot without catastrophic cost inflation. A case study simulating an Earliest Deadline First scheduler confirms the lowest worst-case job stretch. The allocation path requires 26.3 microseconds per job. Integration with the Tiresias scheduler is ongoing.
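
As a rough illustration of how a conformal calibration step turns a predicted upper runtime quantile into an allocation, consider the sketch below. It implements plain one-sided split conformal calibration, which gives a coverage guarantee; it is a simplification and not the authors' Conformal Risk Control procedure, which targets expected overrun. The function names and the toy numbers in the usage note are assumptions.

```python
import math


def conformal_margin(pred_upper, actual, alpha):
    """Calibrate an additive margin from held-out jobs.

    pred_upper: predicted upper runtime quantiles for calibration jobs.
    actual:     true runtimes of the same jobs.
    alpha:      target miscoverage rate (e.g. 0.1 for 90% coverage).

    The score of each job is how far its true runtime exceeded the
    prediction. The margin is the ceil((n + 1) * (1 - alpha))-th smallest
    score, the standard split conformal quantile.
    """
    scores = sorted(y - q for q, y in zip(pred_upper, actual))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration jobs for this alpha: no finite guarantee.
        return math.inf
    return scores[k - 1]


def allocate(pred_upper_new, margin):
    """Time allocation for a new job: predicted quantile plus margin."""
    return pred_upper_new + margin
```

For example, with seven calibration jobs all predicted at 10 hours and true runtimes of 8 through 14 hours, `conformal_margin(..., alpha=0.25)` returns 3, so a new job predicted at 10 hours would be allocated 13. Jobs whose quantile model is already accurate yield small scores and therefore a small margin.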

Author Bios:
Iqtedar Uddin is an Accelerated Master’s student at the Illinois Institute of Technology pursuing degrees in Computer Science and Data Science. His research focus centers on uncertainty quantification and the development of reliable machine learning systems. He will join Amazon (AWS) as a Data Engineering Intern in the summer of 2026.

André Bauer is an Assistant Professor of Computer Science at the Illinois Institute of Technology. He is the founder and elected chair of the SPEC RG Predictive Data Analytics Working Group. His research focuses on performance engineering, distributed and scientific computing, time series forecasting, and privacy-preserving computation for data-driven science.

Analytical Quantum Layout Synthesis

Authors: Yuchen Zhu, Nikos Hardavellas

Institution(s): Northwestern University

Room: Rock

Board #: R1

Session #: 2

Abstract: Quantum layout synthesis (QLS) maps logical quantum circuits onto hardware with limited qubit connectivity by inserting SWAP gates to satisfy nearest-neighbor constraints. Although exact QLS methods can produce optimal solutions, they scale poorly, while heuristic methods are more scalable but often incur large routing overheads. We propose Analytical Quantum Layout Synthesis (Analytical QLS), a differentiable optimization framework that models layer-wise qubit mappings as the central optimization variables. Our method formulates each layer’s routing cost using qubit interaction and hardware distance matrices, and introduces inter-layer consistency penalties to reduce additional routing between adjacent layers. Discrete placement matrices are relaxed into doubly stochastic matrices through a Sinkhorn layer, enabling gradient-based optimization with Nesterov’s accelerated gradient descent. A post-processing step then projects the relaxed solutions back to valid mappings and inserts SWAPs between layers. Across benchmark circuits, Analytical QLS outperforms prior methods on 86% of cases, reducing SWAP count by 45% on average and by up to 90% in the best case, while showing stronger scalability on large circuits than widely used heuristic approaches. Our results demonstrate that analytical optimization is a compelling alternative to traditional exact and heuristic QLS techniques for near-term superconducting quantum devices.
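
The relaxation step mentioned above, turning a score matrix into a doubly stochastic matrix, can be sketched with the classic Sinkhorn normalization. This is a generic pure-Python version for illustration; the temperature parameter, iteration count, and function name are assumptions, and it does not reproduce the authors' cost function or optimizer.

```python
import math


def sinkhorn(scores, temperature=1.0, iterations=100):
    """Map a square score matrix to an approximately doubly stochastic one.

    Entries are exponentiated (so they are strictly positive), then rows
    and columns are normalized alternately. Lower temperatures push the
    result closer to a hard permutation matrix.
    """
    n = len(scores)
    top = max(max(row) for row in scores)  # subtracted for numerical stability
    p = [[math.exp((s - top) / temperature) for s in row] for row in scores]
    for _ in range(iterations):
        # Row normalization: each logical qubit's placement weights sum to 1.
        for i in range(n):
            total = sum(p[i])
            p[i] = [v / total for v in p[i]]
        # Column normalization: each physical qubit's weights sum to 1.
        for j in range(n):
            total = sum(p[i][j] for i in range(n))
            for i in range(n):
                p[i][j] /= total
    return p
```

Reading row i as a soft assignment of logical qubit i over physical qubits, a projection step (for instance, picking a permutation that maximizes total weight) would recover a valid discrete mapping, which corresponds to the post-processing the abstract describes.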

Author Bios:
Yuchen Zhu is a first-year PhD student in Computer Science at Northwestern University, advised by Prof. Nikos Hardavellas. His research interests lie in hardware-software co-design for quantum computing systems.
Nikos Hardavellas is a professor of Computer Science and Computer Engineering at Northwestern University, where he directs the Parallel Architecture Group. His research focuses on computer architecture, specifically at the intersection of computer architecture with the computer systems stack, memory systems, nanophotonics, energy-efficient computing and quantum computing systems.

APEX: Agent Policy-Driven Execution Runtime

Authors: Aditya Gandhi, Jay Yu, Lavangi Yadava, Omar Attia, Rohan Potta, James Davis

Institution(s): Purdue University, West Lafayette

Room: Rock

Board #: R7

Session #: 2

Abstract: AI agent frameworks for software engineering lack runtime constraints. We build a runtime layer for OpenSWE that enforces budgets, routes models, and applies recovery policies. We evaluate this system on SWE-bench to show cost and reliability gains.

Author Bios:
For presenting authors,
Aditya Gandhi (S):
Aditya Gandhi is a Computer Science Honors student at Purdue University, West Lafayette. He currently conducts research with the Pre-Trained Models team in the VIP program at Purdue, and is also a Teaching Assistant for CS 38100, the Analysis of Algorithms course at Purdue. His interests are in software engineering and machine learning, and his research is on a coding agent that gives developers more control over budget.

Jay Yu (S):
Jay is a Computer Science student at Purdue University. His work focuses on large language model applications, including fine-tuning, evaluation, and agentic workflows, with an emphasis on software engineering use cases.

James Davis (S):
James C. Davis is an assistant professor in the Elmore Family School of Electrical and Computer Engineering at Purdue University, West Lafayette. His research interests include human and technical aspects of software engineering and cybersecurity, with an emphasis on failure analysis. Contact him at davisjam@purdue.edu.

Perception Constrained Autonomous Driving

Authors: Rushikesh R Shirsat, Neil Klingensmith

Institution(s): Loyola University Chicago

Room: Rock

Board #: R11

Session #: 2

Abstract: Autonomous vehicles rely on accurate perception systems, particularly LiDAR-based point clouds, to understand and navigate complex environments. However, real-world driving conditions introduce perception constraints such as occlusions, limited sensor range, and reduced data density, which can significantly impact safety and decision-making. This project aims to investigate how autonomous driving performance is affected under constrained perception scenarios and to explore potential solutions to mitigate these limitations. The study is conducted using the CARLA Simulator integrated with Autoware AI. Simulated environments are used to systematically reduce LiDAR point cloud density and introduce occlusions to replicate real-world perception challenges. The project will analyze how these constraints influence object detection, localization, and planning behaviors. In addition, this work proposes a cooperative perception framework in which autonomous vehicles share point cloud data with nearby vehicles. This approach aims to reduce blind spots and improve situational awareness by enabling vehicles to perceive objects beyond their direct line of sight. The expected outcome of this research is to provide insights into the limitations of perception-constrained systems and evaluate the potential of collaborative sensing for improving safety in autonomous driving.

Author Bios:
Rushikesh R Shirsat is a graduate student at Loyola University Chicago working with Neil Klingensmith.

Neil Klingensmith is an Assistant Professor in the Computer Science Department at Loyola University Chicago.

Room: Dillo

Multi-Knob Job-Level Performance-Energy Optimization via Phase-Aware Bandit Control and GEOPM

Authors: Niccolò Brembilla, Zhiling Lan, Michael E. Papka

Institution(s): University of Illinois Chicago

Room: Dillo

Board #: D9

Session #: 2

Abstract: High-Performance Computing (HPC) and large-scale AI systems now consume tens of megawatts of power, making energy efficiency a first-class design constraint alongside performance. Existing job-level power management relies on single-knob, static policies that fail to capture the complex interactions between multiple hardware controls (node power limits and GPU, CPU, and Uncore DVFS) across heterogeneous node configurations and dynamic workload phases. We aim to develop a multi-knob, job-level performance-energy optimization framework for any HPC system. The approach builds lightweight phase-aware response models describing how runtime and energy vary across different knob combinations at distinct phases of a workload. These models are learned online using a bandit controller interfacing directly with GEOPM (Global Extensible Open Power Manager), which enforces control decisions across all nodes allocated to a job in real time. By incorporating node-to-node hardware variability, the framework supports non-uniform power assignment across the node allocation, redistributing power headroom toward nodes on the critical path to improve load balancing and total job throughput. The best knob configuration at each phase is identified by combining the learned response models with performance and facility power constraints, using the Energy-Delay Product (EDP) as the optimization objective. We aim to demonstrate measurable EDP reduction across representative HPC and large-scale AI workloads, establishing a generalizable foundation for adaptive multi-knob power optimization in heterogeneous environments.
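
The final selection step, choosing a knob configuration per phase by EDP under constraints, might look like the sketch below. It assumes a response model has already been learned and omits the online bandit learning and the GEOPM interface entirely; the knob tuples, runtimes, and energies are invented for illustration.

```python
def edp(runtime_s: float, energy_j: float) -> float:
    """Energy-Delay Product: energy multiplied by runtime."""
    return energy_j * runtime_s


def best_config(response_model, power_cap_w, max_runtime_s):
    """Pick the feasible knob configuration with the lowest EDP.

    response_model maps a knob configuration to (runtime_s, energy_j) for
    one workload phase. A configuration is feasible if its average power
    (energy / runtime) respects the power cap and its runtime respects the
    performance constraint. Returns None if nothing is feasible.
    """
    feasible = {
        cfg: (t, e)
        for cfg, (t, e) in response_model.items()
        if e / t <= power_cap_w and t <= max_runtime_s
    }
    if not feasible:
        return None
    return min(feasible, key=lambda cfg: edp(*feasible[cfg]))


# Hypothetical response model for one phase.
# Keys are (CPU GHz, GPU MHz); values are (runtime in s, energy in J).
PHASE_MODEL = {
    (2.0, 1000): (12.0, 3000.0),  # 250 W average, EDP 36000
    (2.4, 1200): (10.0, 3200.0),  # 320 W average, EDP 32000
    (3.0, 1400): (9.0, 3780.0),   # 420 W average, EDP 34020
}
```

Under a 350 W cap the middle configuration wins on EDP; tightening the cap to 300 W leaves only the slowest configuration feasible, which shows how the facility power constraint can override the unconstrained optimum.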

Author Bios:
Niccolò Brembilla is a Computer Science PhD student and Graduate Research Assistant at the University of Illinois Chicago. His work focuses on high-performance computing, log analysis, and energy-efficient supercomputing. He holds a double M.S. in Computer Science from Politecnico di Milano and UIC, where he was recognized as one of the best students in the joint degree program.

Zhiling Lan: Professor at UIC

Michael E. Papka: Professor at UIC

Centralizing Task-based Approach to Quantum Network Control

Authors: Alexander Pirker, Robert Hayek, Alexander Kolar, Igor Kadota, Joaquin Chung, Rajkumar Kettimuthu

Institution(s): Quantum Network Design GmbH, Argonne National Laboratory, University of Chicago, Northwestern University

Room: Dillo

Board #: D1

Session #: 2

Abstract: For the last decade, layered stacks have dominated the way of reasoning about architectures for quantum networks. However, layered architectures impose stringent design and timing constraints on quantum networks, adding latency to the time required to serve an entanglement generation request. Moreover, increasing delays from the layered approach to network control cause degradation of state, effectively reducing achievable fidelities. In this work we simulate a resource-centric, task-based approach to quantum network control by utilizing a centralized controller. Using the SeQUeNCe quantum network simulator, we implement the centralized controller, which tracks quantum memory availability across all nodes and schedules objectives in an offline fashion using a priority-based scheduler. We evaluate the performance of this controller on multiple topologies (bottleneck, grid, star, caveman) of significant scale, with varying reservation patterns, thereby also demonstrating the viability of the resource-centric task-based quantum network control framework at scale. Our simulation results show that the caveman and grid topologies have a higher fraction of delivered requests with low delay compared to the star topology, but with a higher fraction of highly delayed requests as well. Furthermore, we find a linear shift of the CDFs in terms of queue size for all topologies depending on the reservation delay. More interestingly, we conclude that the CDFs of priority queues for the star topology converge quickly to saturation for increasing request arrival rates, demonstrating together with the other results that the framework is robust in high-load scenarios in quantum networks.

Author Bios:
Alexander Pirker is the CEO of Quantum Network Design (QND). The vision of QND is to provide simulation software to design, analyze and optimize quantum networks before building them. His main research fields and interests are quantum network architectures and protocols, their control frameworks, simulations and protocol stacks.

Robert J. Hayek is a predoctoral appointee within the Data Science and Learning Division at Argonne National Laboratory. He received his M.S. in Electrical Engineering from Northwestern University, Evanston, IL, USA (2025), and his B.S. in Computer Engineering (2023) from Ohio Northern University, Ada, OH, USA.

Alexander Kolar is a Ph.D. candidate in Quantum Science and Engineering at the Pritzker School of Molecular Engineering, University of Chicago, as well as a visiting graduate student in the Data Science and Learning Division at Argonne National Laboratory.

Igor Kadota is an Assistant Professor of Electrical and Computer Engineering at Northwestern University. Previously, he was a Postdoctoral Research Scientist at Columbia University. He received the Ph.D. degree from MIT LIDS and his B.Sc. degree from the Aeronautics Institute of Technology (ITA) in Brazil.

Joaquin Chung is a research scientist at the Data Science and Learning Division of Argonne National Laboratory. His work spans diverse areas that go from architecting of future quantum networked systems to designing systems for enabling memory-to-memory data streaming between federated instruments.

Rajkumar Kettimuthu is a senior scientist and group leader at Argonne National Laboratory. He has over 20 years of experience leading large-scale R&D efforts in high-performance computing, AI for science, scientific workflows, and advanced networking, with a recent emphasis on quantum networking and distributed quantum computing.

Optimizing Record/Replay through Relaxed Total Ordering and Multi-Version eXecution

Authors: David Schwartz

Institution(s): UIC

Room: Dillo

Board #: D6

Session #: 2

Abstract: Record/Replay (RR) allows developers to record an execution and deterministically replay its behavior later to diagnose production bugs. Unfortunately, RR introduces non-negligible performance overhead for multi-threaded and I/O-bound workloads. Recordings capture either a total or partial order of synchronization events, which the replayer later enforces. Partial orders produce faster replays, but require pre-processing before replay. Recording also effectively doubles the I/O performed, as any data read must be duplicated and written to the log. We present two complementary techniques to reduce the overhead of RR. First, Relaxed Total Order (RTO) is a weakening of total order that preserves cross-thread constraints needed for replay while avoiding unnecessary serialization. RTO can operate online (i.e., during runtime), enabling deterministic replay without pre-processing the log. RTO's strictness is a novel point between total and partial order. A prototype implementation in an existing RR system, JMVX, reduces recording overhead from 21.7% to 13.7% and replay overhead from 64.9% to 13.2%. Second, we combine RR with Multi-Version eXecution (MVX), a technique similar to online RR, to eliminate RR's poor performance on I/O-bound processes. A follower variant, i.e., an online replayer in a separate process, absorbs the I/O needed for logging and backfills I/O which can be safely re-executed on the same system, keeping the user-facing leader off the critical path. Our prototype reduces the overhead on I/O-bound programs from 196.1% to just 25.8%. Together, RTO and hybrid MVX/RR substantially narrow the gap between today’s RR systems and practical, low-overhead, always-on deployment.

Author Bios:
David Schwartz is a PhD student at the University of Illinois Chicago who studies Record/Replay (RR) and Multi-Version eXecution (MVX) systems. The focus of this work is solving fundamental issues with RR/MVX techniques that prevent production deployment.

Enabling Floating Point Virtualization With Tiny Numbers

Authors: Kevin Hayes and Peter Dinda

Institution(s): Northwestern University

Room: Dillo

Board #: D3

Session #: 2

Abstract: Floating point virtualization allows existing, unmodified application binaries to be run using an alternative arithmetic system. Such virtualization is geared to alternative numbers that are “larger” (require more bits) than the IEEE 754 numbers (e.g., 64-bit doubles) they replace. In this work, we approach the challenge of virtualizing with “smaller” numbers (requiring fewer bits), which is of increasing interest given the explosion of low-precision hardware targeting AI. We focus specifically on the ubiquitous x64 architecture through a hardware/software co-design that leverages x64 functionality that currently lies fallow. The design combines (a) instruction traps via lazy FPU abduction, and (b) simplified memory management by tiny value boxing. We also develop an example tiny alternative arithmetic system that allows smaller IEEE 754 numbers, down to 3 bits, with the exact precision able to be specified on a per-value or per-instruction basis at runtime. Our prototype system is evaluated using validation and performance tests based on running NAS and other benchmarks with a range of lower precision numbers.
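
As a rough illustration of what a "smaller" IEEE 754-style number looks like, the sketch below decodes a bit pattern with a configurable number of exponent and mantissa bits. The function name, its parameters, and the 8-bit and 3-bit layouts in the example are assumptions made for illustration; the abstract does not specify the encodings the prototype uses.

```python
import math


def decode_tiny(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode an IEEE 754-style value: 1 sign bit, exp_bits, man_bits."""
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    frac = man / (1 << man_bits)
    if exp == (1 << exp_bits) - 1:          # all-ones exponent: inf or NaN
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:                            # zero and subnormals
        return sign * frac * 2.0 ** (1 - bias)
    return sign * (1.0 + frac) * 2.0 ** (exp - bias)


# Assumed 8-bit layout: 1 sign, 4 exponent, 3 mantissa bits.
print(decode_tiny(0b0_0111_100, 4, 3))
# Assumed 3-bit layout: 1 sign, 2 exponent, 0 mantissa bits.
print(decode_tiny(0b0_10, 2, 0))
```

The same routine covers every width because only the field sizes change; the bias, subnormal, and special-value rules follow the usual IEEE 754 pattern.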

Author Bios:
Kevin Hayes:

Kevin Hayes is a current BS/MS student in Computer Science at Northwestern University, as well as an incoming PhD student at Northwestern. His primary interests lie in exploring novel/unique computer systems designs.

Peter Dinda:

Peter Dinda is a professor in the Department of Computer Science at Northwestern University, and also holds an appointment in the Department of Electrical and Computer Engineering. He works in experimental computer systems. You can find out more about him at pdinda.org.

StickyInvoc: Rethinking Task Models for High-throughput Workflows in the LLM Era

Authors: Thanh Son Phung, Douglas Thain

Institution(s): University of Notre Dame

Room: Dillo

Board #: D4

Session #: 2

Abstract: Executing large-scale, high-throughput lightweight Large Language Model (LLM) inference workflows on conventional shared clusters leads to long job queues due to inefficient static resource allocation. While opportunistic GPU clusters offer a vast pool of compute, their transient nature makes them challenging to utilize, as the overhead of repeatedly initializing an inference job's state with billions of parameters after preemption is prohibitively high. We introduce StickyInvoc, a symbiotic relationship between two new task models for high-throughput workflows. Specifically, a “sticky” task creates a persistent and inheritable state on a worker node from a user-provided template, while subsequent “invocation” tasks inherit this state and perform the actual computation without incurring the state creation overhead or destroying the state upon exit. This allows efficient reuse and rapid context transfer between inferences across intermittently available GPUs, virtually eliminating initialization costs. With StickyInvoc, a claim verification workflow achieves a 3.6x speedup on a stable testbed with 20 GPUs, and completes in just 784 seconds using 186 opportunistic GPUs.
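
The sticky/invocation split can be sketched as follows. This is a toy, single-process model under stated assumptions: the `Worker` class, its method names, and the `load_model` and `verify_claim` functions are hypothetical and are not StickyInvoc's API.

```python
from typing import Any, Callable, Dict


class Worker:
    """Toy worker node that keeps sticky state alive between invocations."""

    def __init__(self) -> None:
        self.states: Dict[str, Any] = {}
        self.init_count = 0

    def run_sticky(self, name: str, template: Callable[[], Any]) -> None:
        """Create the persistent state once from a user-provided template."""
        if name not in self.states:
            self.states[name] = template()
            self.init_count += 1

    def run_invocation(self, name: str, fn: Callable[[Any, Any], Any], arg: Any) -> Any:
        """Reuse existing state; neither rebuilds nor destroys it on exit."""
        return fn(self.states[name], arg)


def load_model() -> Dict[str, str]:
    # Stand-in for an expensive step such as loading model parameters.
    return {"model": "loaded"}


def verify_claim(state: Dict[str, str], claim: str) -> str:
    # Stand-in for one inference that uses the already-initialized state.
    return f"{state['model']}:{claim}"


worker = Worker()
worker.run_sticky("llm", load_model)
results = [worker.run_invocation("llm", verify_claim, c) for c in ["c1", "c2", "c3"]]
print(results, worker.init_count)
```

Three invocations run against a single initialization, which is the cost structure the abstract describes: the expensive state is built once per worker rather than once per task.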

Author Bios:
Thanh Son Phung is a PhD candidate in Computer Science and Engineering at the University of Notre Dame. His research focuses on optimizing large-scale workflow systems, specifically targeting improvements in resource efficiency and execution performance.

Douglas Thain is a Professor of Computer Science and Engineering at the University of Notre Dame, where he leads research in large-scale scientific computing. He holds a PhD from the University of Wisconsin-Madison and specializes in developing robust workflow and data management systems.

Bridging the Gap: Semantic Vector Middleware for Knowledge Graph Construction

Authors: Abdul Haseeb Shams, Mohammad Bachcha, Ahmed Khaled

Institution(s): Northeastern Illinois University

Room: Dillo

Board #: D7

Session #: 2

Abstract: Semi-structured data has created a wealth of information that remains difficult to navigate due to its nested and often inconsistent nature. Knowledge Graphs (KGs) offer a powerful and visual solution by transforming these isolated data objects into a connected web of entities and relationships. However, such data objects can represent similar or different entity types and can range from simple key-value pairs to deeply nested ones. Building a coherent, structured KG requires a carefully designed data pipeline that maintains semantic integrity as well as the context of the information. Such a data pipeline faces two main challenges. The first challenge is identifying when different data objects represent the same entity in order to merge them. Traditional string-matching or rule-based systems often fail to detect similarity in varied schemas, leading to duplicate nodes and fragmented graphs. The second challenge is determining the optimal level of normalization. Over-normalization can lead to trivial nodes that obscure high-level relationships, while excessive denormalization results in big nodes that lose the granular connectivity essential for graph-based discovery and reasoning. In this project we propose Doc2Graph to address these challenges. Doc2Graph implements a multi-tiered similarity solution that uses vector databases for rapid cosine-similarity filtering, followed by a Large Language Model (LLM) for a deep contextual similarity check. Doc2Graph also explores an adaptive and user-driven schema that balances normalization and denormalization, ensuring that the resulting KG representation is semantically rich across diverse entity types. The project is a work in progress and, in the presented work, we focus on addressing the first challenge.
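
The two-tier similarity check might look roughly like the sketch below. It is an illustration only: an in-memory dictionary stands in for the vector database, `stub_llm_check` is a hypothetical placeholder for the LLM call, and the 0.9 threshold and the example vectors are assumed values, not figures from Doc2Graph.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_matches(
    query: Vector,
    index: Dict[str, Vector],
    deep_check: Callable[[str], bool],
    threshold: float = 0.9,
) -> List[str]:
    """Tier 1: cheap cosine filter. Tier 2: expensive check on survivors only."""
    candidates = [k for k, v in index.items() if cosine(query, v) >= threshold]
    return [k for k in candidates if deep_check(k)]


index = {"person:1": [1.0, 0.0], "person:2": [0.9, 0.1], "org:1": [0.0, 1.0]}
checked: List[str] = []


def stub_llm_check(key: str) -> bool:
    # Hypothetical stand-in for a contextual LLM comparison.
    checked.append(key)
    return key != "person:2"


matches = find_matches([1.0, 0.05], index, stub_llm_check)
print(matches, checked)
```

The expensive check runs only on candidates that pass the cosine filter, so the dissimilar `org:1` entry never reaches the second tier.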

Author Bios:
Abdul Haseeb Shams is a Computer Science graduate student at Northeastern Illinois University with a background in IT infrastructure and systems integration and a track record in technical leadership, from managing large-scale operational systems to managing daily operations. A dedicated academic, Shams has been recognized with the Rector's Merit Award for maintaining a 4.0 GPA and previously served as a Peer Tutor for subjects like Compiler Construction.

Dr. Ahmed Khaled is an Associate Professor at Northeastern Illinois University and a Visiting Associate Professor at the University of Chicago, specializing in the design and development of distributed systems and Internet of Things (IoT) applications. His research focuses on smart healthcare, location-aware services, and data analytics, building on a Ph.D. from the University of Florida where he developed autonomous programming models for smart devices. Prior to his current work, he conducted research at Cairo University involving security schemes for Wireless Sensor Networks and fuzzy-based clustering for energy efficiency.

KVMSR+UDWeave: Extreme-Scaling with Fine-grained Parallelism on the UpDown Graph Supercomputer

Authors: Alexander Fell, Yuqing Wang, Tianshuo Su, Marziyeh Nourian, Wenyi Wang, Jose M. Monsalve-Diaz, Andronicus Rajasukumar, Jiya Su, Ruiqi Xu, Rajat Khandelwal, Tianchi Zhang, David F. Gleich, Yanjing Li, Hank Hoffmann, Andrew A. Chien

Institution(s): University of Chicago

Room: Dillo

Board #: D5

Session #: 2

Abstract: Programming irregular graph applications is challenging on today's scalable supercomputers. We describe a novel programming model, KVMSR+UDWeave, that supports extreme scaling by exposing fine-grained parallelism. By enabling the expression of maximum parallelism, it opens the door for extreme scaling on both small and large graph problems. KVMSR+UDWeave cleanly separates the three key dimensions of parallel programming: parallelism, computation binding, and data placement. This decomposition reduces the effort needed to achieve scalable high performance for graph algorithms on real-world, highly skewed graphs. Key features of the UpDown supercomputer (computation location naming and shared global address space) enable decomposition and scalable high performance. In the IARPA AGILE program, we built numerous graph benchmarks and workflows, and use them to illustrate the programming model. Simulation results for UpDown show excellent strong-scaling to million-fold hardware parallelism and high absolute performance. Results suggest KVMSR+UDWeave enables reduced programming effort for scaling the most demanding irregular applications.

Author Bios:
Alexander Fell received his PhD from the Indian Institute of Science, Bangalore, India, and during his career, he contributed to the RISC-V architecture and LLVM compiler development at the Barcelona Supercomputing Center and Nanyang Technological University, Singapore. Now, he is a Senior Research Specialist at the University of Chicago, focusing on the compiler and simulator for the UpDown architecture.

Yuqing (Ivy) Wang is a computer science PhD student at the University of Chicago. She received her BSci degree from the University of Edinburgh. Her research spans parallel programming, operating systems, and unified memory models for novel accelerator systems.

Tianshuo Su received his BS from the University of Wisconsin - Madison and his MS from Georgia Institute of Technology. His research interests include computer architecture and high-performance computing. He was a senior engineer at Qualcomm working on DSP performance. Currently, he works at Google in cloud infrastructure efficiency.

Marziyeh Nourian received her Ph.D. in Computer Engineering from North Carolina State University. Her work addresses challenges in high-performance computing, focusing on data transformation, scalable execution of large-scale workloads, and hardware–software co-design. Her expertise includes parallel computing, heterogeneous and reconfigurable architectures, and compiler and runtime techniques.

Wenyi Wang is a PhD student at the University of Chicago. He earned his B.E. from Northeastern University, China, and his M.S. from Northwestern University, USA. His research focuses on high-performance computing and parallel computing.

Jose M. Monsalve Diaz is a Member of Technical Staff at Advanced Micro Devices (AMD). He received his PhD and MS in Electrical and Computer Engineering from the University of Delaware and his BSc in Electronics Engineering from Pontificia Universidad Javeriana, Colombia. He was a Postdoctoral Researcher at Argonne National Laboratory. His research interests include runtime systems, compilers, and heterogeneous computing.

Andronicus Rajasukumar is a PhD candidate at the University of Chicago. He received his BE from the College of Engineering, Guindy, India, and MTech from the Indian Institute of Technology, Delhi. He worked as a Senior Staff Engineer at Qualcomm (2014-20) and Intel (2010-14) in 3D-graphics GPU pre-silicon performance. His current research spans specialized and general-purpose computer architectures and processing near memory.

Jiya Su is a Ph.D. student in Computer Science at the University of Chicago. She received the B.S. degree from Renmin University of China in 2020 and the M.S. degree from the Illinois Institute of Technology in 2023. Her research focuses on graph processing, parallel computing, and computer architecture.

Ruiqi Xu received the BS and MS degrees from Northwestern University in 2023. He is currently working toward a PhD degree at the University of Chicago. His research interests include computer architecture and parallel programming.

Rajat Khandelwal is a PhD student at the University of Chicago. He received his B.Tech from the Indian Institute of Technology BHU, India. He worked as a System Software Developer at Intel (2020–2023) on USB, Type-C, and Thunderbolt subsystems. His current research spans systems and architecture, focusing on networking and high-performance data movement.

Tianchi Zhang received the B.S. degree from the University of Michigan and Shanghai Jiao Tong University in 2020 and the M.S. degree from the University of Michigan in 2023. He is currently a Ph.D. student at the University of Chicago. His research interest is computer architecture.

David F. Gleich received the BS degree from Harvey Mudd College and the PhD degree from Stanford University. He is a Professor of Computer Science at Purdue University and a University Faculty Scholar. His research is on data-driven scientific computing, matrix computations, network and graph algorithms, and parallel and distributed computing. He is a fellow of SIAM.

Yanjing Li received the BS and MS degrees from Carnegie Mellon University and a PhD from Stanford. She is an Associate Professor of the Department of Electrical and Computer Engineering at Northeastern University. Her research includes intersections of AI and systems, computing architecture for emerging technologies and applications, and hardware security.

Henry Hoffmann received the BS degree from UNC-Chapel Hill and the SM and PhD degrees from MIT. He is the Liew Family Chair of the Department of Computer Science at the University of Chicago. His research includes adaptive and self-aware computing systems.

Andrew A. Chien received the BS, MS, and PhD degrees from the Massachusetts Institute of Technology. He is the William Eckhardt Distinguished Service Professor in Computer Science at the University of Chicago and a senior computer scientist at Argonne National Laboratory. He has broad interests in computer systems and renewable energy. Chien served as Vice President of Research of Intel Corporation (2005-10), the SAIC Professor in computer science at the University of California, San Diego (1998-2005), and professor of computer science at the University of Illinois (1990-8). He is a fellow of the ACM, IEEE, and AAAS.

Minimal Hardware Extensions for Object-Level Memory Management

Authors: Nick Wanninger, Nikos Hardavellas, Peter Dinda

Institution(s): Northwestern University

Room: Dillo

Board #: D2

Session #: 2

Abstract: The creation of virtual memory was a watershed moment for systems software. It enabled safe multi-tenancy, memory protection, and demand paging through a simple but powerful abstraction: the page. Yet the page is a poor match for how modern applications actually organize memory. User-space programs allocate, access, and optimize around objects, often only tens of bytes in size, while the kernel manages memory in multi-kilobyte pages. This granularity mismatch forces virtual-memory mechanisms such as swapping, protection, and access tracking to operate over pages that may indiscriminately mix hot and cold objects. This is fundamentally because the virtual memory system cannot have generalized introspection into each application's specific memory semantics and use cases. In this work, we present Yukon, a hardware extension that extends the virtual-memory philosophy to individual heap objects through handles: a level of indirection managed entirely in user space. Yukon adds a simple hardware structure, the Handle Translation Lookaside Buffer (HTLB), to the load/store unit allowing handle-based object translation to occur directly in hardware rather than through software handle dereferences. We describe an implementation of the HTLB in the BOOMv3 RISC-V out-of-order core running in FireSim. With a small hardware addition and a simple hardware/software interface, Yukon enables virtual-memory-like mechanisms at object granularity and under user-space control. By making individual objects relocatable and observable, Yukon opens the door to fine-grained memory management techniques such as object-level swapping, compression, protection, and locality optimization without requiring the kernel to understand application-specific data structures.
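
The handle indirection can be modeled in software as a minimal sketch. This is not Yukon's hardware design or its interface: the `HandleHeap` class, its method names, and the dictionary-based cache standing in for the HTLB are all assumptions made for illustration.

```python
from typing import Dict


class HandleHeap:
    """Software model of handle indirection with a small translation cache."""

    def __init__(self) -> None:
        self.memory: Dict[int, bytes] = {}   # address -> object payload
        self.table: Dict[int, int] = {}      # handle -> current address
        self.cache: Dict[int, int] = {}      # stand-in for an HTLB
        self.next_handle = 0
        self.table_walks = 0

    def alloc(self, addr: int, payload: bytes) -> int:
        handle = self.next_handle
        self.next_handle += 1
        self.memory[addr] = payload
        self.table[handle] = addr
        return handle

    def translate(self, handle: int) -> int:
        if handle not in self.cache:         # miss: consult the handle table
            self.table_walks += 1
            self.cache[handle] = self.table[handle]
        return self.cache[handle]

    def load(self, handle: int) -> bytes:
        return self.memory[self.translate(handle)]

    def move(self, handle: int, new_addr: int) -> None:
        """Relocate an object; holders of the handle are unaffected."""
        old_addr = self.table[handle]
        self.memory[new_addr] = self.memory.pop(old_addr)
        self.table[handle] = new_addr
        self.cache.pop(handle, None)         # invalidate the stale translation


heap = HandleHeap()
h = heap.alloc(0x1000, b"hot object")
before = heap.load(h)
heap.load(h)                                 # served from the cache
heap.move(h, 0x9000)
after = heap.load(h)
print(before == after, heap.table_walks)
```

Because the program holds only the handle, the object can be relocated (for example, to group hot objects together) without updating any references; only the table entry and the cached translation change.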

Author Bios:
Nick Wanninger is a Ph.D. candidate in Computer Science at Northwestern University, advised by Peter Dinda. His research focuses on compiler and runtime techniques for memory management in unmanaged languages, including his current project Alaska, which brings object mobility to C and C++. He is expected to graduate in 2026 and is pursuing research roles in industry.

LimAgents: Scalable Multi-Agent RAG for Scientific Critique Generation at Corpus Scale

Authors: Ibrahim Al Azher, Hamed Alhoori

Institution(s): Northern Illinois University

Room: Dillo

Board #: D8

Session #: 2

Abstract: Identifying research limitations is central to rigorous scientific assessment. However, zero-shot large language models (LLMs) often produce superficial or generic critiques. They usually restate the author's disclosed weaknesses while missing deeper methodological and contextual flaws. This problem is made worse because many authors disclose only partial or trivial limitations. We introduce SciLimAgents, a model-agnostic multi-agent framework for generating evidence-based limitations from scholarly text. It consists of LimitAgents and NovAgents. LimitAgents uses empirically derived specialist roles to decompose limitation generation across complementary critique dimensions, while NovAgents, a novelty-focused extension, compares each paper against retrieved related work to surface novelty-specific weaknesses that single-paper analysis misses. To ground critique beyond the input manuscript, our framework incorporates shared-memory retrieval over a 120K-paper scholarly corpus and citation-grounded evidence from referenced work. We evaluate on a large-scale AI/ML benchmark, where the full framework consistently outperforms strong zero-shot and multi-agent baselines, improving F1 by 20.15 to 44.77 points. We further validate both the limitation extraction pipeline and the evaluation protocol with human annotators, obtaining agreement of at least 0.92.

Author Bios:
a) Ibrahim Al Azher, PhD student, Department of Computer Science, Northern Illinois University. b) Hamed Alhoori, Associate Professor, Department of Computer Science, Northern Illinois University.

Session #1

Room: Rock

TIDES: Test-time Inference Drift Exploitation via Scaling

Authors: Haoran Dai, Haozheng Luo, Haotian Zhang, Meng lin, Yan Chen, Binghui Wang

Institution(s): NU, IIT

Room: Rock

Board #: R9

Session #: 1

Abstract: We propose TIDES, a reasoning-attacking method that exposes a previously unrecognized failure of test-time scaling: as reasoning traces lengthen, model performance degrades sharply rather than improves. Unlike prior attacks on large reasoning models (LRMs), TIDES exploits the intrinsic properties of test-time scaling laws to manipulate reasoning trace length, producing degradations that are inherently difficult to detect. Methodologically, we define Depth-Guided Latent Tracker (DLT), a depth-based tracker that stealthily injects microscopic steering vectors into intermediate reasoning traces and combines them with on-policy distillation to precisely position LRMs under test-time scaling. Theoretically, we model latent space as a depth-indexed dynamic process and prove that under test-time scaling, small bounded perturbations introduced at intermediate layers induce non-vanishing trajectory drift, explaining why DLT remains effective yet difficult to detect in large reasoning models. Empirically, we evaluate TIDES on multiple reasoning benchmarks using two strong reasoning models, DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B, where it consistently outperforms state-of-the-art reasoning attack methods such as DecepChain and BadChain. Notably, TIDES delivers an average 30.3% improvement in attack performance over the baselines, demonstrating that TIDES remains efficient within large reasoning model generation.

Author Bios:
Haozheng Luo is a fourth-year Ph.D. candidate in Computer Science at Northwestern University, advised by Professor Yan Chen. As an Adobe and DAAD AINeT Fellow, his research centers on AI safety and efficient reasoning within large language and multi-modal models. Additionally, he founded GLB², an open-source community dedicated to extending foundation models for everyday, real-world applications.

Haoran Dai is a Ph.D. student in Computer Science at the Illinois Institute of Technology, advised by Professor Binghui Wang. His research expertise lies in Trustworthy AI, with a specific focus on evaluating the safety, security, and inherent vulnerabilities of large reasoning models and multimodal diffusion architectures.

An Agentic AI Flow for Alert Correlation in Enterprise Networks

Authors: Cheng-Yun King Yang, Dongyan Xu

Institution(s): Purdue University

Room: Rock

Board #: R7

Session #: 1

Abstract: Alert correlation is a critical process that transforms isolated detections into coherent attack narratives, enabling analysts to capture multi-stage intrusions that would otherwise surface as hundreds of unrelated, low-severity events. Microsoft’s Managed Extended Detection and Response (MXDR) facilitates this by automating alert grouping. However, it often exhibits systematic blind spots in complex scenarios. For instance, MXDR frequently fragments alerts into disconnected incidents when an attacker targets multiple users from a single IP or when identical malware re-executes on the same account. To bridge these gaps, we propose a daily AI agent pipeline that operates alongside MXDR. A correlation subagent first scans ingested alerts for shared entities — IPs, file hashes, and user accounts — across time windows, producing a structured analysis packet. An analysis subagent then interprets these packets, produces prioritized SOC reports, and feeds confirmed misses back into a rule refinement subagent—incrementally expanding correlation coverage with each iteration. Evaluated on real enterprise alerts, the system surfaced multi-day attack campaigns and recurring correlation patterns that MXDR consistently missed.
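
The shared-entity scan can be sketched as a grouping problem. The code below is an assumption-laden illustration, not the authors' pipeline: the alert field names (`ip`, `file_hash`, `user`) are hypothetical, and time windows are omitted for brevity. It links any two alerts that share an entity value using union-find.

```python
from collections import defaultdict
from typing import Dict, List


def correlate(alerts: List[Dict[str, str]]) -> List[List[str]]:
    """Group alert ids that share any entity value (union-find)."""
    parent = list(range(len(alerts)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    first_seen: Dict[tuple, int] = {}
    for i, alert in enumerate(alerts):
        for field in ("ip", "file_hash", "user"):
            value = alert.get(field)
            if not value:
                continue
            key = (field, value)
            if key in first_seen:           # shared entity: merge the groups
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    groups: Dict[int, List[str]] = defaultdict(list)
    for i, alert in enumerate(alerts):
        groups[find(i)].append(alert["id"])
    return sorted(groups.values())


alerts = [
    {"id": "a1", "ip": "10.0.0.5", "user": "alice"},
    {"id": "a2", "ip": "10.0.0.5", "user": "bob"},
    {"id": "a3", "file_hash": "abc123", "user": "carol"},
    {"id": "a4", "file_hash": "abc123", "user": "carol"},
    {"id": "a5", "ip": "10.0.0.9", "user": "dave"},
]
incidents = correlate(alerts)
print(incidents)
```

Alerts `a1` and `a2` mirror the first blind spot described in the abstract (one IP targeting multiple users), and `a3` and `a4` mirror the second (the same file hash recurring on one account).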

Author Bios:
Cheng-Yun King Yang is a PhD student at Purdue University. His research bridges theoretical ML robustness in PIDS and embedded systems with practical AI agent applications, such as automated alert triage and red team simulations.

Dongyan Xu is a Samuel D. Conte Professor of Computer Science and Director of CERIAS, Purdue's cybersecurity research center. His research focuses on cyber and cyber-physical security. He has also made early contributions to the areas of cloud computing and peer-to-peer media streaming/distribution.

ChronoLog: Distributed Shared Tiered Log Store

Authors: Eneko Gonzalez, Kun Feng, Inna Brodkin, Kyle Chard, Anthony Kougkas, Xian-He Sun

Institution(s): Illinois Institute of Technology, University of Chicago

Room: Rock

Board #: R5

Session #: 1

Abstract: Modern computing systems generate data at unprecedented rates exceeding TB/s. This growth is driven by the proliferation of sensors, scientific instruments, Internet-of-Things devices, and human activities such as web, mobile, and edge computing. Beyond conventional data storage, an increasingly prevalent requirement is the capture of activity data, also known as log data, which records events as they occur rather than static state. A wide range of domains, including scientific applications, depend on high-performance logging methods that existing solutions cannot deliver at scale. ChronoLog is a scalable distributed log store designed to handle the growing volume of activity data, tailored for applications ranging from edge computing to High Performance Computing (HPC) systems. ChronoLog provides several capabilities that distinguish it from existing data streaming platforms: it guarantees log ordering across distributed nodes using physical time, thereby avoiding expensive synchronization operations; employs multiple storage tiers to scale log capacity; supports efficient concurrent access to data; and enables range queries for partial log processing. This poster presents the ChronoLog architecture and its principal components—ChronoVisor, ChronoKeeper, ChronoGrapher, and ChronoPlayer—together with two representative use cases. The first demonstrates live HPC telemetry monitoring, in which system statistics such as CPU, memory, and network utilization are logged into ChronoLog and visualized through a Grafana plugin developed as a custom data source. The second presents the ChronoLog Model Context Protocol (MCP) server, which provides episodic memory for Claude Code-based agentic workflows by recording agent behaviors and retrieving them via ChronoLog's time-based ordering guarantees.
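
Ordering by physical time and range queries over the resulting log can be illustrated with a short sketch. This is not ChronoLog's API; the record layout and the function names are assumptions, and each per-node log is assumed to be already sorted by timestamp.

```python
import heapq
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Record = Tuple[float, str]  # (physical timestamp, payload); assumed layout


def merge_by_time(per_node_logs: List[List[Record]]) -> List[Record]:
    """Merge per-node logs, each already sorted by timestamp, into one order."""
    return list(heapq.merge(*per_node_logs))


def range_query(log: List[Record], start: float, end: float) -> List[Record]:
    """Return records with start <= timestamp <= end using binary search."""
    times = [t for t, _ in log]
    return log[bisect_left(times, start):bisect_right(times, end)]


node_a = [(1.0, "cpu=10"), (4.0, "cpu=30")]
node_b = [(2.0, "mem=5"), (3.0, "mem=7")]
merged = merge_by_time([node_a, node_b])
window = range_query(merged, 2.0, 3.5)
print(merged)
print(window)
```

Because every record carries its own timestamp, the nodes need no coordination while appending; the global order is established only when the logs are merged, and a time window can then be extracted without scanning the whole log.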

Author Bios:
Eneko Gonzalez is a Research Software Engineer at the Gnosis Research Center (Illinois Institute of Technology), specializing in distributed systems for high-performance computing and AI infrastructure. He holds a Double Master's degree in Computer Science (IIT) and Telecommunications Engineering (UPV/EHU, Spain), with prior experience in international standards-based communications and computer vision research.
Dr. Kun Feng is a Research Software Engineer in the Department of Computer Science at the Illinois Institute of Technology, where he is a member of the Gnosis Research Center. Working under the mentorship of Dr. Xian-He Sun, his research focuses on high-performance computing, scalable I/O systems, and data management for distributed workflows.
Inna Brodkin is a Senior Software and Data Engineer based in Evanston, Illinois, with over a decade of experience building scalable, high-performance distributed systems for Capital Markets and Internet industries. Currently a Research Software Engineer at the University of Chicago, she holds degrees from the University of Chicago and New York University.
Dr. Kyle Chard is a Research Associate Professor in the Department of Computer Science at the University of Chicago, with a joint appointment at Argonne National Laboratory. He co-leads the Globus Labs research group, focusing on data-intensive computing and research data management. He earned his Ph.D. from Victoria University of Wellington, New Zealand, and is a member of both ACM and IEEE.
Dr. Anthony Kougkas is an Associate Research Professor of Computer Science at Illinois Tech and Deputy Director of the Gnosis Research Center, where he also serves as a Guest Research Scientist at Argonne National Laboratory. His research spans HPC storage and I/O, data management for scientific workflows, and systems for AI and machine learning, with over 50 peer-reviewed publications.
Dr. Xian-He Sun is a University Distinguished Professor, the Ron Hochsprung Endowed Chair of Computer Science, and Director of the Gnosis Research Center at Illinois Institute of Technology. An IEEE Fellow with over 300 publications and six U.S. patents, his research focuses on parallel and distributed processing, high-performance memory and I/O systems, and software for Big Data applications.

A Lightweight AI-Driven Framework for Rapid Intrusion Detection in IoT

Authors: Mina Habibollahi and Sharief M. A. Oteafy

Institution(s): DePaul University

Room: Rock

Board #: R8

Session #: 1

Abstract: IoT systems are notoriously susceptible to network attacks, given their ubiquity and often resource-constrained Intrusion Detection Systems (IDS). While various IDS solutions have been proposed to secure IoT systems, machine learning (ML) and deep neural network (DNN) based techniques are of recent focus in this field, given their ability to adapt to newer attacks. However, such ML and DNN based IDSs are typically classified as heavyweight or lightweight based on their resource footprint. While lightweight IDSs are often chosen for deployment on IoT and edge devices due to their limited resource requirements, their accuracy is typically limited, and improving it typically impacts the speed of detection, and vice versa. To remedy this, we introduce a framework that provides an effective tradeoff between accuracy and speed, while keeping AI-based IDS models lightweight for IoT Edge deployment. By combining edge and cloud support, our self-improving framework achieves a precision of 0.92, recall of 1.0, and F1-score of 0.93, providing reliable and rapid attack detection while demonstrating significantly faster inference compared to baseline models.

Author Bios:
Mina Habibollahi is a third-year PhD student at DePaul University, working under the supervision of Sharief Oteafy. Her research focuses on AI-based intrusion detection systems (IDS) for Internet of Things (IoT) devices. She works at the intersection of IoT systems and artificial intelligence, with a focus on agentic IoT for resource-constrained environments.

Dr. Sharief M. A. Oteafy is an Associate Professor in the School of Computing at DePaul University, USA, and founder of the Next Generation Networking Laboratory. His research focuses on the Tactile Internet, medical IoT, next-generation networking systems, and big sensed data orchestration. He is a Senior Member of IEEE, lead Technical Editor of the IEEE Tactile Internet Standard, and an editor for several IEEE journals.

Explaining the Unseen: Multimodal Vision-Language Reasoning for Situational Awareness in Underground Mining Disasters

Authors: Mizanur Rahman Jewel and Sanjay Madria

Institution(s): Missouri University of Science and Technology, Rolla, MO

Room: Rock

Board #: R11

Session #: 1

Abstract: Underground mining disasters produce pervasive darkness, dust, and collapses that obscure vision and make situational awareness difficult for humans and conventional systems. To address this, we propose MDSE, Multimodal Disaster Situation Explainer, a novel vision-language framework that automatically generates detailed textual explanations of post-disaster underground scenes. MDSE introduces three innovations: (i) Context-Aware Cross-Attention for robust alignment of visual and textual features even under severe degradation; (ii) Segmentation-aware dual pathway visual encoding that fuses global and region-specific embeddings; and (iii) Resource-Efficient Transformer-Based Language Model for expressive caption generation with minimal compute cost. To support this task, we present the Underground Mine Disaster (UMD) dataset, the first image-caption corpus of real underground disaster scenes, enabling rigorous training and evaluation. Extensive experiments on UMD and related benchmarks show that MDSE substantially outperforms state-of-the-art captioning models, producing more accurate and contextually relevant descriptions that capture crucial details in obscured environments, improving situational awareness for underground emergency response. The code is at https://github.com/mizanJewel/Multimodal-Disaster-Situation-Explainer.

Author Bios:
Mizanur Rahman Jewel is a PhD candidate in Computer Science at Missouri University of Science and Technology, where his research focuses on multimodal AI and computer vision for safety-critical underground mine environments. He leverages vision–language models and sensor fusion to build reliable situational awareness systems for post-disaster response.
Sanjay K Madria is a Curators’ Distinguished Professor in the Department of Computer Science at the Missouri University of Science and Technology (formerly, University of Missouri-Rolla, USA).

QuantEM: The Quantum Error Management Compiler

Authors: Ji Liu, Quinn Langfitt, Mingyoung Jessica Jeng, Alvin Gonzales, Noble Agyeman-Bobie, Kaiya Jones, Siddharth Vijaymurugan, Daniel Dilley, Zain H. Saleem, Nikos Hardavellas, Kaitlin N. Smith

Institution(s): Argonne National Laboratory, Northwestern University, Grambling State University, Tuskegee University

Room: Rock

Board #: R2

Session #: 1

Abstract: During the transition from NISQ devices to fault-tolerant systems, quantum error detection (QED) codes offer a practical means for improving the utility of quantum computers. However, applying QED is challenging: developers must manually select codes, insert checks, manage ancillas, and remap circuits—a complex, error-prone process where trade-offs are difficult to assess under real hardware constraints. Here, we introduce the Quantum Error Management Compiler (QuantEM), a modular framework that automates the integration of QED into arbitrary quantum circuits. QuantEM performs program analysis to identify regions amenable to protection, selects appropriate detection codes given algorithmic and hardware constraints, and inserts ancilla-based subcircuits without disturbing the original computation. QuantEM currently supports Pauli Check Sandwiching and Iceberg codes, and its extensible design enables the integration of additional detection schemes and hardware platforms. Furthermore, QuantEM incorporates novel QED applications. One application is the error mitigation technique Pauli Check Extrapolation (PCE). Benchmarks for PCE show significant performance improvement over ZNE, especially as circuit sizes increase. On real IBM hardware, PCE achieves an accuracy of up to 99.2% (56.2% improvement over baseline), compared to ZNE's 82% accuracy, on 4-qubit circuits. Another application is Dynamic Resource Allocation (DRA) with Pauli Checks, which is a method for characterizing and mapping quantum circuits at runtime, utilizing the Upper-Confidence Bound algorithm to identify optimal resource allocation regions. When applied to QAOA, DRA with Pauli Checks increases circuit fidelity by 15% on average (up to ~30%) while providing shot budget savings up to 58%.
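
As a rough illustration of the Upper-Confidence Bound idea mentioned in the abstract, the sketch below runs the textbook UCB1 rule over a few candidate options. It is a generic bandit example, not QuantEM's DRA implementation: the function names, the exploration constant `c`, and the fixed reward values are all assumptions made for the example.

```python
import math

def ucb1_select(counts, means, t, c=2.0):
    """Pick the option with the highest upper confidence bound (textbook UCB1)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # try every option once before comparing bounds
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

def run_ucb1(reward_fns, rounds):
    """Spend `rounds` trials across options, tracking a running mean reward per option."""
    counts = [0] * len(reward_fns)
    means = [0.0] * len(reward_fns)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = reward_fns[arm]()
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means

# Hypothetical options with fixed "fidelity" rewards; real rewards would be noisy.
options = [lambda: 0.2, lambda: 0.9, lambda: 0.5]
counts, means = run_ucb1(options, rounds=200)
print(counts)
```

Most trials end up on the option with the highest mean reward, while the others are still sampled occasionally.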

Author Bios:

Ji Liu is an Assistant Computer Scientist under the supervision of Dr. Paul Hovland. He received his Ph.D. from North Carolina State University, Raleigh, NC, where he majored in computer engineering under the supervision of Prof. Huiyang Zhou.

Quinn Langfitt is a second-year PhD student in Computer Science at Northwestern University, where he is advised by Prof. Kaitlin N. Smith. His research focuses on quantum computing, with an emphasis on quantum error mitigation and characterization, and classical machine learning for quantum systems.

Mingyoung Jessica Jeng is a third-year PhD student in Computer Science at Northwestern University, where she is advised by Prof. Nikos Hardavellas. Her research interests include modular quantum architectures, quantum data representations, and hybrid quantum-classical algorithms.

Alvin Gonzales received his Ph.D. in 2021 from Southern Illinois University Carbondale and joined Argonne under the Intelligence Community Postdoctoral Research Fellowship Program in October 2023.

Noble Agyeman-Bobie is an undergraduate student at Grambling State University, where he is pursuing degrees in Computer Science, Mathematics, and Physics. His research interests include quantum error mitigation for NISQ systems and higher-dimensional quantum computing.

Kaiya Jones graduated summa cum laude with a BS in Computer Science and a minor in Mathematics from Tuskegee University. She was an Open Quantum Initiative (OQI) Fellow through the Chicago Quantum Exchange, during which she conducted quantum computing research at Northwestern University.

Siddharth Vijaymurugan is a second-year undergraduate student at Duke University, where he is pursuing degrees in Physics and Mathematics. He was a SULI research intern at Argonne National Laboratory, contributing to the development of the QuantEM compiler.

Daniel Dilley is a Postdoctoral Researcher at Argonne National Laboratory. His research focuses on quantum information theory, including quantum nonlocality, entanglement theory, quantum error correction, and distributed quantum computing.

Zain Hamid Saleem is a theoretical physicist working in the area of quantum information and computation at the Argonne National Laboratory.

Nikos Hardavellas is a professor of Computer Science and Computer Engineering at Northwestern University, where he directs the Parallel Architecture Group at Northwestern.

Kaitlin N. Smith is an Assistant Professor of Computer Science at Northwestern University. Her research is focused on quantum computing systems, software, and architecture.

Towards Noise-Aware Quantum Compilation for Hamiltonian Simulation

Authors: Mu-Te Lau, April Wang, Kaitlin N. Smith, Nikos Hardavellas

Institution(s): Department of Computer Science, Northwestern University

Room: Rock

Board #: R3

Session #: 1

Abstract: Simulating quantum systems on quantum computers requires compiling fermionic Hamiltonians into hardware-executable circuits, a process where efficiency and noise tolerance are critical. We build on the Treespilation pipeline, which synthesizes hardware-aware circuits by vertically integrating multiple compilation stages. We show that total Pauli weight serves as a computationally cheaper alternative metric for optimizing Treespilation's internal ternary tree representation. We further introduce noise-awareness into the pipeline by penalizing less reliable qubit couplings, reducing geometric mean two-qubit gate error rates by ~25% across multiple benchmarks.
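
For readers unfamiliar with the metric, the total Pauli weight of a qubit Hamiltonian can be computed by counting the non-identity factors in each Pauli string and summing over all terms. The sketch below assumes a hypothetical representation of the Hamiltonian as a list of Pauli strings; it is not taken from the Treespilation code.

```python
def pauli_weight(pauli_string: str) -> int:
    """Number of non-identity factors in a Pauli string such as 'XIZY'."""
    return sum(1 for p in pauli_string if p != "I")

def total_pauli_weight(terms: list[str]) -> int:
    """Sum of the weights of all Pauli strings in the Hamiltonian."""
    return sum(pauli_weight(t) for t in terms)

# Hypothetical 4-qubit Hamiltonian terms (coefficients omitted).
example_terms = ["ZIII", "ZZII", "XZYI", "IXZY"]
print(total_pauli_weight(example_terms))  # 1 + 2 + 3 + 3 = 9
```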

Author Bios:
Mu-Te Lau is a PhD student in Computer Science at Northwestern University. He builds compilers and other system software for quantum computers.

April Wang:
April is a third-year undergraduate student at Northwestern University studying Computer Science and Integrated Science, and minoring in Physics. Her research interests include the development and application of quantum algorithms, particularly for scientific problems. After graduating, she hopes to pursue a graduate degree in quantum information science.

Kate Smith:
Kate is an Assistant Professor of Computer Science at Northwestern University.

Nikos Hardavellas:
Nikos Hardavellas is a professor of Computer Science and Computer Engineering at Northwestern University, where he directs the Parallel Architecture Group at Northwestern (PARAG@N, https://paragon.cs.northwestern.edu/).

TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

Authors: Yueyuan Sui, Minghui Zhao, Junxi Xia, Xiaofan Jiang, Stephen Xia

Institution(s): Northwestern University

Room: Rock

Board #: R10

Session #: 1

Abstract: We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for two reasons: (i) data collection is labor-intensive, resulting in scarcity; and (ii) there is a performance gap between state-of-the-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-the-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speedup of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; and (iii) requires a memory footprint of less than 20.0 MB.

Author Bios:
Yueyuan Sui is a Computer Engineering PhD student at Northwestern University. His research focuses on mobile, wearable, and embedded AI systems, with an emphasis on efficient machine learning for real-time sensing and speech enhancement on resource-constrained platforms.

Minghui (Scott) Zhao is a Ph.D. candidate in Electrical Engineering at Columbia University, advised by Prof. Xiaofan (Fred) Jiang in the Intelligent and Connected Systems Lab. His research focuses on embodied and embedded AI systems, mobile and ubiquitous computing, and hardware-software co-design for intelligent systems operating in the physical world.

Junxi Xia is a Computer Engineering Ph.D. student at Northwestern University. His research interests include mobile and embedded AI systems, wearable sensing, speech enhancement, and cyber-physical systems, with a focus on building practical intelligent systems for real-world sensing and deployment.

Xiaofan (Fred) Jiang is an Associate Professor of Electrical Engineering at Columbia University and Chair of the Smart Cities Center at the Data Science Institute. His research focuses on intelligent and connected systems, spanning mobile and embedded computing, IoT, cyber-physical systems, smart cities, and data-driven sensing systems.

Stephen Xia is an Assistant Professor of Electrical and Computer Engineering at Northwestern University, with a courtesy appointment in Computer Science. His research focuses on bringing artificial intelligence into the physical world through resource-efficient systems for mobile, embedded, wearable, IoT, cyber-physical, and smart-environment applications.

Evaluating the Robustness of Jailbreak Defenses to Speech-Input Large Language Models

Authors: Kris Yun, Janelys Graciano Betancourt, Rafael Molina

Institution(s): Northwestern University

Room: Rock

Board #: R6

Session #: 1

Abstract: Speech Large Language Models (SpeechLLMs) unify speech perception and language reasoning in a single framework and are rapidly being deployed in assistants, accessibility tools, and multimodal agents. While the safety of text-only LLMs has been extensively studied, the safety of SpeechLLMs is less understood, especially under audio-conditioned jailbreak and adversarial attacks. This work investigates whether current SpeechLLM safety is sufficient under a suite of strong attacks and defenses. Building on recent safety benchmarks for audio-language models, we use the AABench dataset of red-team and harmful prompts as audio queries and quantify robustness with Attack Success Rate (ASR) and StrongReject-based refusal metrics. We consider several prompt-level and audio-level attacks: JBC, PAIR, PGD, AutoDAN, BOOST+GPTFuzzer, TAP, and AutoAttack. Our experimental pipeline targets a diverse set of open and closed SpeechLLMs, evaluates models with and without an external StrongReject safety layer, and measures safety-utility trade-offs.

Author Bios:
Kris is an undergraduate student at Northwestern majoring in computer science with a concentration in security. In school, she has been a peer mentor for Foundations of Security, Introduction to Computer Systems, and Data Structures and Algorithms. Outside of school, Kris has worked at Amazon, Shostack + Associates, and CAS LLC in various engineering and security roles.

Evaluating Gate Decomposition Strategies for High-Dimensional Quantum Systems

Authors: Katie Harrison, Leonardo Bove, Yuchen Zhu, Mu-Te Lau, Tanay Roy, José Cruz Serrallés, Nikos Hardavellas, Kate Smith

Institution(s): Northwestern University, Fermi National Accelerator Laboratory

Room: Rock

Board #: R1

Session #: 1

Abstract: Higher-dimensional quantum systems (qudits) provide a promising route toward more efficient quantum computation by increasing the information encoded per physical mode. However, compiling arbitrary unitary operations into hardware-executable instructions remains a key bottleneck, particularly for superconducting cavity-based platforms. In this work, we study practical compilation strategies for qudit systems using SNAP (Selective Number-dependent Arbitrary Phase) and displacement gates, a universal control set for bosonic architectures. We evaluate both structured and search-based decomposition methods across a range of system sizes, circuit depths, and initialization conditions. Our analysis focuses on fidelity, gate count, and variability across optimization runs. Through systematic parameter sweeps, we identify tradeoffs between expressivity and stability, showing that search-based approaches can achieve high fidelity but exhibit significant run-to-run variability due to non-convex optimization landscapes. In contrast, structured methods provide more consistent performance, though often at the cost of deeper circuits. These results highlight key challenges in compiling for near-term qudit hardware and provide guidance on selecting decomposition strategies under realistic constraints. More broadly, this work advances efforts to bridge the gap between abstract quantum algorithms and practical, hardware-level implementations.

Author Bios:
Katie Harrison
Katie Harrison is a PhD student in Computer Science at Northwestern University working on quantum computing systems, with a focus on compiler design for qudit architectures. Her research explores gate decomposition using SNAP and displacement operations for hardware-executable quantum control.

Leonardo Bove
Leonardo Bove is a researcher at the SQMS Center at Fermilab, where he works on superconducting qubit systems and calibration frameworks for transmon and cavity-based architectures. He focuses on hardware bring-up, automated calibration, and bosonic qubit encodings.

Yuchen Zhu
Yuchen Zhu is a PhD student at Northwestern University studying architectural support for next-generation quantum computing systems. His work focuses on circuit design, compilation, and bridging high-level programs with hardware execution.

Mu-Te Lau
Mu-Te Lau is a PhD student at Northwestern University researching system software for quantum computing, with an emphasis on quantum compilation. His work focuses on improving efficiency and reducing the cost of quantum algorithms through compiler optimizations.

Tanay Roy
Tanay Roy is an Associate Scientist at the SQMS Center at Fermilab, where he develops quantum computing architectures based on superconducting radio-frequency cavities. His research focuses on high-dimensional qudit systems and scalable quantum technologies.

José E. Cruz Serrallés
José E. Cruz Serrallés is a Postdoctoral Fellow at NYU Langone working on quantum algorithms for MRI and scientific computing. His background includes algorithm development for inverse problems and imaging systems.

Nikos Hardavellas
Nikos Hardavellas is a Professor at Northwestern University and director of PARAG, specializing in computer architecture and systems. His research spans compilers, memory systems, and quantum computing, with a focus on hardware–software co-design.

Kate Smith
Kate Smith is an Assistant Professor at Northwestern University whose research focuses on quantum computing systems, software, and architecture. Her work includes quantum compilation, error mitigation, and simulation across diverse quantum platforms.

StreamGuard: Low-Overhead Resilience for Real-time HPC Data Streams

Authors: Hai Duc Nguyen, Bogdan Nicolae, Tekin Bicer, Amal Gueroudji, Matthieu Dorier, Kyle Chard, and Ian Foster

Institution(s): Argonne National Laboratory and University of Chicago

Room: Rock

Board #: R4

Session #: 1

Abstract: Real-time scientific workflows operate on continuous data streams and must produce timely, high-quality results despite executing on complex, failure-prone infrastructure. Hardware faults, network disruptions, and performance anomalies caused by resource contention or system heterogeneity can severely degrade performance and violate real-time constraints. We focus on strengthening the resilience of the producer-consumer streaming pattern, a fundamental building block of scientific streaming workflows. We present two complementary techniques: (i) a dynamic, asynchronous, non-blocking checkpointing mechanism that preserves progress without interrupting computation, and (ii) a progress-aware load redistribution strategy that detects slow workers and proactively rebalances tasks. Together, these mechanisms maintain forward progress and balanced execution even in highly error-prone environments. Experimental results show that our approach reduces the impact of failures and performance anomalies by up to 6x, while introducing less than 1% overhead in failure-free execution.

Author Bios:
Hai Duc Nguyen is a Postdoctoral Appointee in the Data Science and Learning Division at Argonne National Laboratory and a CASE Postdoctoral Scholar at the University of Chicago. His work focuses on resilient, large-scale high-performance software systems. He received his Ph.D. and M.Sc. in Computer Science from the University of Chicago and his B.E. from Vietnam National University Ho Chi Minh City.

Bogdan Nicolae is a Computer Scientist at Argonne National Laboratory and a Research Professor at the Illinois Institute of Technology. His research focuses on scalable storage, data management, and fault tolerance for large-scale distributed systems, particularly at the intersection of high-performance computing, big data analytics, and artificial intelligence. He is a senior member of ACM and IEEE.

Tekin Bicer is a Computer Scientist in the Data Science and Learning Division at Argonne National Laboratory, with joint appointments at the University of Chicago and the Advanced Photon Source. His research focuses on parallel and distributed systems, high-performance computing, cloud computing, and data science. He is a senior member of IEEE and ACM.

Amal Gueroudji is a Postdoctoral Appointee at Argonne National Laboratory, specializing in high-performance computing and data management. She received her Ph.D. from Université Grenoble Alpes and conducted research at Maison de la Simulation. Her work focuses on workflows, data services, distributed computing, in situ analytics, and task-based programming for scientific computing.

Matthieu Dorier is a Software Development Specialist at Argonne National Laboratory. His research focuses on data storage and management for HPC, data services, in situ analysis and visualization, and software development methodologies for HPC applications. He is one of the core developers of the R&D 100 Award-winning Mochi project.

Kyle Chard is a Research Associate Professor in the Department of Computer Science at the University of Chicago and holds a joint appointment at Argonne National Laboratory. His research focuses on data-intensive computing and research data management. He co-leads the Globus Labs research group and has received several awards, including the IEEE TCHPC Award for Excellence for Early Career Researchers in HPC.

Ian Foster is the Director of the Data Science and Learning Division, Senior Scientist, and Distinguished Fellow at Argonne National Laboratory, and the Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago. His research spans high-performance computing, distributed systems, and data-driven discovery. He is a Fellow of AAAS, ACM, BCS, and IEEE.

Room: Lake

SAIN: Improving ICS Attack Detection Sensitivity via State-Aware Invariants

Authors: Syed Ghazanfar Abbas, Muslum Ozgur Ozmen, Abdulellah Alsaheel, Arslan Khan, Z. Berkay Celik, Dongyan Xu

Institution(s): Purdue University

Room: Lake

Board #: L20

Session #: 1

Abstract: Industrial Control Systems (ICSs) rely on Programmable Logic Controllers (PLCs) to operate within a set of states. The states are composed of variables that determine how sensor data is interpreted, configuration parameters are applied, and actuator commands are issued. Recent works have shown that attackers can manipulate these variables to compromise ICS safety and security. To detect such attacks, previous approaches have leveraged invariants—a set of rules defining the correct behavior of an ICS. However, these invariants suffer from a critical limitation: they are state-agnostic. This means they define variable ranges across all possible ICS states, leading to loosely bounded detection thresholds. Unfortunately, attackers can exploit these loose bounds and launch stealthy attacks that evade detection without violating such invariants. In this paper, we introduce SAIN, an automated method to derive state-aware ICS invariants with tighter bounds and enforce them through a PLC-based monitor. SAIN first generates invariant templates by identifying the PLC program states, state transitions, and the inter-dependencies among sensing, actuation, and configuration variables within each state through program analysis. It then partitions the ICS data traces into state-specific sub-traces and quantifies the invariant templates with concrete, tighter bounds, as system-specific knowledge about the subject ICS. Lastly, it enforces the state-aware invariants through a run-time monitor. We evaluate SAIN on a Fischertechnik manufacturing plant and a chemical plant simulator against 17 attacks. SAIN protects the plants, on average, with a false positive rate of 2% and a run-time overhead of 3%.
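
The difference between state-agnostic and state-aware bounds can be sketched in a few lines. The example below is only an illustration of the idea described in the abstract, not SAIN itself: the state names, the tank-level readings, and the simple min/max bound derivation are assumptions made for the example.

```python
from collections import defaultdict

def derive_bounds(trace):
    """trace: list of (state, value) samples.

    Returns the state-agnostic (min, max) over all samples and a dict of
    state-specific (min, max) bounds computed from per-state sub-traces.
    """
    values = [v for _, v in trace]
    global_bounds = (min(values), max(values))
    per_state = defaultdict(list)
    for state, v in trace:
        per_state[state].append(v)
    state_bounds = {s: (min(vs), max(vs)) for s, vs in per_state.items()}
    return global_bounds, state_bounds

def violates(bounds, value):
    """True if value falls outside the closed interval given by bounds."""
    lo, hi = bounds
    return not (lo <= value <= hi)

# Hypothetical tank-level readings recorded in two PLC states.
trace = [("FILL", 10.0), ("FILL", 55.0), ("FILL", 90.0),
         ("HOLD", 88.0), ("HOLD", 90.0), ("HOLD", 92.0)]
global_bounds, state_bounds = derive_bounds(trace)

# An attacker forces the level to 40.0 while the PLC is in HOLD.
spoofed = 40.0
print(violates(global_bounds, spoofed))         # False: inside the loose bound
print(violates(state_bounds["HOLD"], spoofed))  # True: outside the HOLD bound
```

The spoofed value passes the state-agnostic check because it is normal during filling, but it is flagged once the bound is tied to the current state.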

Author Bios:
Syed Ghazanfar Abbas is a graduate researcher in the Department of Computer Science at Purdue University and a member of the Purdue Security Laboratory (PurSec Lab). His research focuses on the design and evaluation of security for software and systems, particularly in Cyber-Physical Systems and the Internet of Things (IoT).

pMVX: Policy-Level Multi-Version Execution for Agentic OS Kernel Self-Tuning

Authors: Sujot Singh, Kenan Alghythee, Chalmers Phua, Xiaoguang Wang

Institution(s): University of Illinois Chicago

Room: Lake

Board #: L14

Session #: 1

Abstract: Operating system performance depends critically on kernel policy choices, yet these policies are typically selected statically despite dynamic and heterogeneous workloads. While an agent managing an operating system can observe the outcome of the policy it applies, it fundamentally lacks access to counterfactual feedback: how alternative kernel policies would have behaved under the same workload, obtained without risking system stability. We present pMVX, a framework for adaptive kernel policy selection based on policy-level multi-version execution. pMVX evaluates multiple kernel policy variants concurrently across isolated OS instances executing identical workloads, enabling empirical comparison without impacting production systems. The resulting observations are distilled into a lightweight mapping from eBPF-derived workload characteristics to effective policies, which guides safe runtime policy selection. We instantiate pMVX with a self-tuning Linux scheduler built on top of sched_ext, and show that it improves performance over static schedulers across a range of scheduling-sensitive workloads.

Author Bios:

Xiaoguang Wang is an Assistant Professor of Computer Science at the University of Illinois Chicago and director of the SysSec Research Group. His research focuses on operating systems performance and security, including heterogeneous CPU architectures, software diversification, virtualization, and the application of large language models to systems and security challenges.

Sujot Singh is a recent graduate with a Master's degree in Computer Science. During his time in the SysSec Research Group at the University of Illinois Chicago, his research focused on operating systems and kernel-level policy management, with a particular interest in agentic approaches to OS self-tuning.

Chalmers Phua is a first-year PhD student in the SysSec Research Group at the University of Illinois Chicago. His research focuses on system performance for AI workloads. Before starting his PhD, he worked in firmware and driver programming, with hands-on experience in embedded systems and hardware-software integration, including work through UIC's Aeronautics and Astronautics student group.

Kenan Alghythee is a second-year PhD student in the SysSec Research Group at the University of Illinois Chicago, advised by Prof. Xiaoguang Wang. His research spans automated program repair and OS-level self-tuning.

A Structured Benchmarking Dataset for TLA+ Specification Reasoning

Authors: Arslan Bisharat, Eric Spencer, Khushboo Bhadauria, Anisa Ramos, Brian Ortiz, Mohammed Abuhamad, Konstantin Läufer, TaiNing Wang, George K. Thiruvathukal

Institution(s): Loyola University Chicago

Room: Lake

Board #: L11

Session #: 1

Abstract: TLA+ is a formal specification language used to model and verify concurrent and distributed systems. Despite its practical value, no structured dataset exists to support future training and benchmarking of language models on TLA+ reasoning tasks. We present a benchmarking dataset built from real-world TLA+ specifications sourced from the TLA+ Examples repository. The dataset supports two tasks: structured feature extraction and natural language-to-specification generation. For feature extraction, we design a 60+ field JSON schema capturing syntactic and semantic properties of TLA+ specifications, verified against primary sources including Lamport's TLA+ Summary, the TLA+2 Preliminary Guide, the formal TLA+ grammar, and the TLAPS proof system documentation. For the generation task, natural language descriptions are derived exclusively from external documentation so that TLA+ specification files serve as ground truth outputs. All annotations are verified using SANY and TLC. This dataset is intended as a foundation for future training and benchmarking of language models on formal specification reasoning.

Author Bios:
Arslan Bisharat: PhD researcher in Computer Science at Loyola University Chicago, focused on adversarial machine learning, natural language processing, and social computing. Builds AI systems that are robust, socially aware, and reliable in high-stakes environments, with an emphasis on safety, fairness, and accountability.
Eric Spencer: Eric Spencer is a senior undergraduate student majoring in Computer Science at Loyola University Chicago. He is a part of the AI for Formal Methods Lab at LUC and conducts research related to TLA+ and Large Language Models. He is planning to continue research at Loyola and work for a software agency after graduation.
Khushboo Bhadauria: Khushboo Bhadauria is a graduate student pursuing a Master’s degree in Computer Science at Loyola University Chicago. Her academic interests include artificial intelligence, machine learning, natural language processing, data analysis, and data engineering. She is a member of the AI for Formal Methods Laboratory at Loyola University Chicago, where she contributes to research involving TLA+, formal methods, and large language models (LLMs).
Anisa Ramos: Anisa Ramos is a senior undergraduate student at Loyola University Chicago majoring in Cybersecurity with a minor in Computer Crime and Forensic Science. She conducts research in the AI for Formal Methods Lab, focusing on TLA+ and large language models, and has experience in digital forensics and security analysis. She has interned with Motorola Solutions and Northwestern Medicine, and plans to pursue a career in cybersecurity and research after graduation.
Brian Ortiz: Brian Ortiz is a graduate student at Loyola University Chicago finishing his Master's in Computer Science this spring. He works professionally as a DevSecOps Engineer. He is a part of the AI for Formal Methods Lab at LUC and conducts research related to TLA+ and Large Language Models.
Mohammed Abuhamad: I am an assistant professor of Computer Science at Loyola University Chicago. I received a Ph.D. degree in Computer Science from the University of Central Florida (UCF) in 2020. I also received a Ph.D. degree in Electrical and Computer Engineering from INHA University (Incheon, Republic of Korea) in 2020. I received a Master's degree in Information Technology (Artificial Intelligence) from the National University of Malaysia (Bangi, Malaysia) in 2013.
Konstantin Läufer: I joined Loyola University Chicago's computer science faculty in 1992 after completing my PhD at NYU under Ben Goldberg and Martin Odersky. My work spans programming languages, formal methods, software architecture, and computer science education. I currently co-direct the AI for Formal Methods Lab and am a co-inventor on two Lucent Technologies patents.
TaiNing Wang: Dr. TaiNing Wang is a tenure-track Assistant Professor of Computer Science at Loyola University Chicago. Her research focuses on database systems, query processing and optimization, graph data, applied AI/ML, AI accountability, and formal methods. Her work has been published in leading venues such as ACM SIGMOD, IEEE ICDE, and Future Generation Computer Systems.
George K. Thiruvathukal: Dr. George K. Thiruvathukal is a full professor of computer science at Loyola University Chicago and chairperson. He received the PhD and MS degrees in computer science from Illinois Institute of Technology in 1995 and 1990, respectively, and BA degrees in physics and computer science (mathematics minor) from Lewis University (Romeoville, IL) in 1988.

Formal-assisted Reinforcement Learning

Authors: Thuan T. Cao; Stefan Mitsch

Institution(s): DePaul University

Room: Lake

Board #: L13

Session #: 1

Abstract: Reinforcement learning (RL) performs well for tasks in complex environments when clearly specified desired outcomes are available, but it comes at the cost of a lack of formal safety guarantees. This drawback is in part due to the "black-box" nature of RL and the finiteness of training runs, and in part due to semantic differences between formal safety models (nondeterministic physics models and nondeterministic action safety envelopes) and the RL training environment (simulated environment with stochastic actions). This makes translating the safety logic to the RL environment a safety-critical, laborious, and error-prone process. We propose a formal specification language that ties formal models to RL environments through automated synthesis. Our method is based on differential dynamic logic to specify safety models, extended with annotations to aid translation to the Gymnasium environment. Structured syntactic annotations in the form of comments in the formal model allow the user to link the formal specification to the reward/penalty signals and the gradients more precisely.

    // Rewarding speed (reward=isEfficient <=> reward=v-0)
    // @progress: reward=isEfficient, on_success=continue, on_failure=continue
    Bool isEfficient(Real v) <-> ( v > 0 );
    // Limit speed and enforce safety distance (shape=isSafe <=> shape=(abs(x-xo) - stopDist(v))
    // @safety: penalty=-1000, shape=isSafe, on_failure=truncate:status=failure, on_success=continue
    Bool isSafe(Real x, Real xo, Real v) <-> (abs(x-xo) > stopDist(v));

The framework then automatically generates an RL environment and reward signals. This allows the RL agent to be trained on the same safety specifications that were formally proven correct in differential dynamic logic and greatly reduces the effort of reward engineering.
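
One plausible reading of the two annotations in the abstract, written as a plain Python step signal, is sketched below. This is an assumption-laden illustration rather than the framework's generated code: the abstract does not define `stopDist`, so a hypothetical braking-distance formula is used, and the way the progress reward, the penalty, and the truncation flag are combined is a guess.

```python
def stop_dist(v: float, b: float = 2.0) -> float:
    """Hypothetical braking distance v^2 / (2b); the abstract does not define stopDist."""
    return v * v / (2.0 * b)

def step_signal(x: float, xo: float, v: float):
    """Return (reward, truncated, safety_margin) for one environment step.

    safety_margin mirrors shape = abs(x - xo) - stopDist(v) from the @safety comment.
    """
    safety_margin = abs(x - xo) - stop_dist(v)
    if safety_margin <= 0:                  # isSafe is false
        return -1000.0, True, safety_margin  # penalty=-1000, on_failure=truncate
    progress = v - 0.0                       # reward = v - 0 from the @progress comment
    return progress, False, safety_margin    # on_success=continue

print(step_signal(x=0.0, xo=10.0, v=2.0))  # safe: reward equals the speed
print(step_signal(x=0.0, xo=1.0, v=4.0))   # unsafe: penalty and truncation
```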

Author Bios:
Thuan T. Cao short bio:
Thuan Cao is a Master of Computer Science student and Graduate Assistant in the PROVE Lab at DePaul University. His research focuses on integrating formal verification with reinforcement learning to create verified training environments for autonomous systems.

Stefan Mitsch short bio:
Stefan Mitsch is an Associate Professor in the School of Computing at DePaul University. His research on formal verification for autonomous cyber-physical systems combines offline verification with code synthesis and runtime verification to provide rigorous guarantees about runtime behavior and the correctness of AI components.

Contour: Elevating Scientific Applications to Scalable Workflows

Authors: Colin Thomas, Andrés Iglesias, Douglas Thain

Institution(s): University of Notre Dame

Room: Lake

Board #: L6

Session #: 1

Abstract: Scientific applications are executed in a broad variety of cluster ecosystems which use several layers of supporting technology to connect processes with data and manage resources. The applications themselves are composed with invocations of various domain specific software packages. Many applications benefit in performance, scalability, and portability when described and executed as a task-based workflow, with order of execution defined by a DAG (Directed-Acyclic-Graph). The challenge in writing such a workflow is the description of tasks and their dependencies in order to form the DAG. Programs often exhibit implicit behavior and depend on unexpected information, making it difficult to define the behavior of a task. This work presents Contour, an I/O analysis and workflow composition toolkit which observes a running application to create an execution contract. The contract describes the behavior of the application and facilitates its transposition to a high-level distributed workflow. Contour enables task-based workflow description and execution for complex scientific applications.
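As an illustration of how observed I/O can yield a DAG, the following sketch derives task dependencies from a toy "contract" that lists the files each task reads and writes. The contract layout, task names, and file names are hypothetical and are not Contour's actual format.

```python
from graphlib import TopologicalSorter


def build_dag(contract: dict) -> dict:
    """Map each task to the set of tasks that produce the files it reads."""
    producers = {
        path: task
        for task, io in contract.items()
        for path in io["writes"]
    }
    return {
        task: {
            producers[path]
            for path in io["reads"]
            if path in producers and producers[path] != task
        }
        for task, io in contract.items()
    }


# Hypothetical observed behavior of three tasks.
contract = {
    "simulate": {"reads": {"params.json"}, "writes": {"raw.h5"}},
    "analyze": {"reads": {"raw.h5"}, "writes": {"stats.csv"}},
    "plot": {"reads": {"stats.csv", "raw.h5"}, "writes": {"fig.png"}},
}

dag = build_dag(contract)
# A valid execution order: every task appears after the tasks it depends on.
order = list(TopologicalSorter(dag).static_order())
```

Files with no producer in the contract (here `params.json`) are treated as external inputs and create no dependency.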

Author Bios:
Colin Thomas and Andrés Iglesias are graduate students at the University of Notre Dame, working in the Cooperative Computing Lab with Professor Douglas Thain. The CCL researches and develops workflow system technology for data intensive scientific applications.

VaultX: A Lightweight Proof-of-Space Consensus System

Authors: Samuel Fatunmbi, Ioan Raicu

Institution(s): Illinois Institute of Technology

Room: Lake

Board #: L5

Session #: 1

Abstract: Blockchain consensus mechanisms face the trilemma of balancing security, decentralization, and resource efficiency. The two dominant consensus algorithms, Bitcoin’s Proof-of-Work (PoW) and Ethereum’s Proof-of-Stake (PoS), each trade off security, scalability, and energy efficiency: PoW is energy-intensive, while PoS risks concentrating power among wealthy validators, which can lead to governance capture, censorship, and reduced network fairness over time. Chia’s Proof-of-Space (PoSp) offers a middle ground, using storage (instead of computation) for validation in the network while maintaining decentralization. PoSp turns a compute-intensive problem into a data-intensive one. Participants dedicate unused disk space to store pre-computed cryptographic proofs, which can later be verified with minimal computational effort. We present VaultX, a clean-slate PoSp system that takes a minimalist approach, redesigned from the ground up for hardware efficiency, simplicity, and scalability. It uses a two-table construction with BLAKE3 cryptographic hashing that defends against time–memory trade-off attacks while dramatically reducing algorithmic complexity. Parallelism is achieved in VaultX through OpenMP with I/O-aware data layouts, along with optimizations that target mechanical hard drives through large sequential I/O calls. We evaluate VaultX on the Mystic testbed across 9 heterogeneous machines, from 4-core Raspberry Pi nodes to 384-core HPC servers, spanning HDD, SSD, NVMe, NFS, and Ceph storage. VaultX can generate 2^32 cryptographic proofs (a K32 vault) in as little as 6 minutes, versus 18 minutes for BladeBit, Chia’s fastest CPU plotter, working across 64 cores and 512GB RAM. Our hash compression strategy allows hash values to be omitted from the final saved state, reducing stored information from 40 bytes per proof to 8 bytes, producing 32GB K32 compressed vaults compared to Chia’s 105GB K32 plots, or 160GB K32 uncompressed vaults.
An economic analysis shows that PoSp is cost-competitive with PoW over a 5-year horizon, with significantly lower ongoing energy expenditure and less e-waste, making VaultX a viable, sustainable, and egalitarian foundation for decentralized consensus.
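The vault sizes quoted above follow from the per-proof byte counts. The short check below reproduces that arithmetic, assuming that "GB" in the abstract means 2^30 bytes; the function name is illustrative only.

```python
GIB = 2**30  # assumption: the abstract's "GB" is read as 2^30 bytes


def vault_size_gib(k: int, bytes_per_proof: int) -> float:
    """Size of a vault holding 2^k proofs at a fixed number of bytes each."""
    return (2**k) * bytes_per_proof / GIB


compressed = vault_size_gib(32, 8)     # 8 bytes per proof after compression
uncompressed = vault_size_gib(32, 40)  # 40 bytes per proof before compression
saving = uncompressed / compressed     # compression factor
```

Under that assumption the K32 figures come out to 32 and 160, matching the abstract, with a fivefold reduction from compression.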

Author Bios:
Samuel Fatunmbi
Samuel Fatunmbi is a first-year PhD student in the Department of Computer Science at Illinois Institute of Technology. He is a research assistant to Professor Ioan Raicu, with a research focus on high-performance computing, blockchain systems, distributed systems, and data-intensive computing. He is currently working on a redesign of the Proof-of-Space consensus mechanism in Prof. Raicu's lab.

Prof. Ioan Raicu
Prof. Ioan Raicu is an Associate Professor in the Department of Computer Science at Illinois Institute of Technology and guest research faculty at Argonne National Laboratory. He is the founder and director of the Data-Intensive Distributed Systems Laboratory (DataSys Lab). His research focuses on distributed systems, many-task computing, and data-intensive computing at extreme scales, with recognition including the IEEE TCSC Young Achievers in Scalable Computing award and an NSF CAREER award.

A Scalable Benchmarking Toolkit for AI-Driven Image Search

Authors: Francisco Lozano

Institution(s): Northwestern University, Argonne National Laboratory

Room: Lake

Board #: L23

Session #: 1

Abstract: Evaluating AI systems at scale, particularly domain-specific tasks, remains difficult due to the lack of extensible and reproducible benchmarking frameworks. We present a benchmarking toolkit for large-scale image search, designed to enable systematic evaluation, comparison, and improvement of retrieval systems across models, datasets, and deployment environments. Our toolkit supports the full benchmarking lifecycle: dataset ingestion, query generation, relevance annotation, and metric-driven evaluation. It enables the construction of domain-specific benchmarks with rich metadata and controllable difficulty, supporting use cases such as environmental monitoring and wildfire detection. The framework integrates standard retrieval metrics (e.g., NDCG, precision, recall) alongside diversity-aware measures such as intra-list similarity, providing a more complete view of system performance. A key feature is its tight coupling with real-world retrieval pipelines. The toolkit evaluates search systems that combine semantic embeddings, keyword retrieval (BM25), and reranking models, while remaining model-agnostic and infrastructure-independent. This allows researchers to benchmark different architectures, indexing strategies, and ranking approaches under consistent conditions. The system is designed with reproducibility and scalability as first-class concerns: experiments are fully scriptable, benchmarks are portable, and evaluation workflows can be applied to both historical datasets and live data streams. When deployed on large-scale datasets such as those from the Sage Continuum (tens of millions of images), the toolkit enables rigorous, system-level analysis of trade-offs between retrieval quality and latency. This work establishes a general-purpose, extensible foundation for benchmarking multimodal retrieval systems in domain-specific settings.
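For readers unfamiliar with the metrics named above, here is a small standard-library sketch of NDCG@k and intra-list similarity. It is not the toolkit's implementation: the function names are illustrative, the ideal ranking is computed only from the labels of the retrieved list, and cosine similarity is an assumed choice for the pairwise measure.

```python
import math
from itertools import combinations


def dcg(relevances):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances))


def ndcg_at_k(relevances, k):
    """DCG of the top-k results divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def intra_list_similarity(embeddings):
    """Mean pairwise similarity of a result list; lower means more diverse."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

A perfectly ordered list scores an NDCG of 1, and a list of identical embeddings scores an intra-list similarity of 1, so the two measures can pull in different directions.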

Author Bios:
Francisco Lozano is a software engineer specializing in AI, IoT, and edge computing, currently contributing to large-scale systems at Argonne National Laboratory and Northwestern University. His work focuses on applying AI and distributed systems to solve real-world challenges across data-intensive environments.

Edge-Based Audio Sensing for Real-Time Event Detection and Speech Processing

Authors: Samin Sohrabi

Institution(s): University of Illinois at Chicago

Room: Lake

Board #: L24

Session #: 1

Abstract: Edge computing enables real-time data processing with low latency and improved privacy, making it well-suited for continuous audio sensing applications. In this work, we present a real-time speech transcription system deployed on NVIDIA Jetson edge devices using optimized implementations of OpenAI Whisper. The system captures live audio from a microphone and performs on-device inference to generate continuous text output without relying on cloud resources. We focus on building a streaming pipeline that supports continuous transcription by segmenting incoming audio into overlapping time windows and incrementally processing them. To operate under the resource constraints of edge hardware, we evaluate multiple Whisper model variants and analyze trade-offs between accuracy, latency, and memory usage. System-level optimizations, including GPU acceleration and efficient audio buffering, are applied to improve throughput and responsiveness. We evaluate the system using real-world audio streams and benchmark its performance against reference transcriptions. Results show that real-time transcription is feasible on edge platforms with acceptable accuracy and latency, demonstrating the practicality of deploying modern speech models outside of cloud environments. This work highlights the potential of edge-based speech intelligence systems and contributes a scalable approach for continuous, on-device audio transcription in real-world settings.
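The overlapping-window segmentation described above can be sketched as follows. This is a generic illustration, not the authors' pipeline: the function name, the sample-count parameters, and the handling of trailing samples are assumptions.

```python
def overlapping_windows(samples, window, hop):
    """Yield (start_index, chunk) pairs of length `window`, advancing by `hop`.

    Consecutive chunks overlap by `window - hop` samples. Trailing samples
    that do not reach the next hop boundary are left for a later call, as
    they would be in a streaming buffer that is still filling.
    """
    if window <= 0 or not 0 < hop <= window:
        raise ValueError("require window > 0 and 0 < hop <= window")
    last_start = max(len(samples) - window, 0)
    for start in range(0, last_start + 1, hop):
        yield start, samples[start:start + window]


# Example: 10 samples, windows of 4 with a hop of 2 (50% overlap).
chunks = list(overlapping_windows(list(range(10)), window=4, hop=2))
```

In a transcription loop, each chunk would be passed to the speech model and the overlapping region used to reconcile text across chunk boundaries.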

Author Bios:
I am Samin Sohrabi. I am completing my master's degree and starting my PhD in Computer Science at UIC under the supervision of Prof. Mike Papka.

Empowering scientific workflows with federated agents

Authors: Alok Kamatar, J Gregory Pauloski, Yadu Babuji, Ryan Chard, Mansi Sakarvadia, Daniel Babnigg, Kyle Chard, Ian Foster, Suman Raj, Augustus Ellerm

Institution(s): University of Chicago, Argonne National Laboratory

Room: Lake

Board #: L21

Session #: 1

Abstract: Agentic systems, in which diverse agents cooperate to tackle challenging problems, are exploding in popularity in the AI community. However, existing agentic frameworks take a relatively narrow view of agents, apply a centralized model, and target conversational, cloud-native applications (e.g., LLM-based AI chatbots). In contrast, scientific applications require myriad agents be deployed and managed across diverse cyberinfrastructure. Here we introduce ACADEMY, a modular and extensible middleware designed to deploy autonomous agents across the federated research ecosystem, including HPC systems, experimental facilities, and data repositories. To meet the demands of scientific computing, ACADEMY supports asynchronous execution, heterogeneous resources, high-throughput data flows, and dynamic resource availability. It provides abstractions for expressing stateful agents, managing inter-agent coordination, and integrating computation with experimental control. We present microbenchmark results that demonstrate high performance and scalability in HPC environments. To explore the breadth of applications that can be supported by agentic workflow designs, we also present case studies in materials discovery, astronomy, decentralized learning, and information extraction in which agents are deployed across diverse HPC systems.

Author Bios:
Alok Kamatar is a Ph.D. student in Computer Science at the University of Chicago, affiliated with Globus Labs.
J. Gregory Pauloski is a computer scientist formerly at the University of Chicago and Argonne National Laboratory, now at NVIDIA.
Yadu Babuji is a Principal Software Engineer at the University of Chicago, with a joint appointment at Argonne National Laboratory.
Ryan Chard is a computer scientist in the Data Science and Learning Division at Argonne National Laboratory.
Mansi Sakarvadia is a Ph.D. candidate in Computer Science at the University of Chicago, co-advised by Ian Foster and Kyle Chard in Globus Labs.
Daniel Babnigg is an MS student at the University of Chicago and a former research assistant at Globus Labs.
Kyle Chard is a Research Associate Professor in the Department of Computer Science at the University of Chicago, with a joint appointment at Argonne National Laboratory.
Ian Foster is the Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago and Director of the Data Science and Learning Division at Argonne National Laboratory.
Suman Raj is a Postdoctoral Scholar in the Department of Computer Science at the University of Chicago, advised by Ian Foster and Kyle Chard.
Augustus Ellerm is a Postdoctoral Scholar in the Department of Computer Science at the University of Chicago, advised by Ian Foster and Kyle Chard.

HAMMER: Homomorphic Analytics with Multiparty Method for Encrypted Relational Queries

Authors: Donghyun Sohn, Zohaib Azam, Minxuan Zhou, Jennie Rogers

Institution(s): Northwestern University, Illinois Institute of Technology

Room: Lake

Board #: L3

Session #: 1

Abstract: User queries may reveal a client’s intent through their predicate parameters. Concealing these parameters from the untrusted server during queries over a public database is private information retrieval (PIR), a well-studied problem in cryptography. This work is the first SQL query evaluation system to address a stronger variation of this problem, the symmetric PIR (SPIR) setting, which preserves both predicate confidentiality and the privacy of the database’s records by not revealing them to the client. Prior PIR work used heavyweight fully homomorphic encryption (FHE) schemes while selectively offloading some tasks onto the client to make their performance practical. In our work, the client does not take part in query evaluation, only receiving the final query answer. These settings are well-suited to lightweight clients such as mobile devices. SPIR querying is also useful for settings like anti-money laundering compliance for banks, where regulators issue private queries to learn about how transactions are moving through accounts without divulging information about all their customers to the government. We present HAMMER, a query execution system that offers SPIR over SQL queries using a combination of local, server-side query evaluation, FHE, and multiparty computation (MPC) to make this process secure and scalable. FHE supports SIMD, which makes it 1-2 orders of magnitude faster than MPC on data-parallel operators. We therefore evaluate private filters and aggregates under FHE, with operators organized to reduce their circuit depth, and we support precise integer arithmetic. For steps with long sequences of conditional logic, such as sorts, we turn to MPC via our novel secure context switch. We convert FHE-encrypted intermediate results into secret shares among the original server and a non-colluding MPC helper node with no client assistance.
In addition, HAMMER selects per-query FHE parameters guided by analytical depth bounds, and it supports parallelism on multicore CPUs and GPUs. Our hybrid query evaluation achieves 5.3×–8.4× speedups over the state-of-the-art FHE database on TPC-H workloads. With GPU-accelerated FHE evaluation, the majority of our query workload runs at SF1 in under a minute.

Author Bios:
Donghyun Sohn
Donghyun Sohn is a 4th-year PhD student in Computer Science at Northwestern University, advised by Prof. Jennie Rogers. His research focuses on privacy-preserving database systems, with an emphasis on query optimization for fully homomorphic encryption (FHE) and secure multi-party computation (MPC), and on how hardware properties shape secure analytics engines.

Zohaib Azam
Zohaib Azam is a PhD student in Computer Science at Illinois Institute of Technology, advised by Prof. Minxuan Zhou in the Emerging Computing Systems (ECS) Lab. His research interests include hardware-software co-design for privacy-preserving computation, with a focus on accelerating fully homomorphic encryption (FHE).

Minxuan Zhou
Minxuan Zhou is an Assistant Professor of Computer Science at Illinois Institute of Technology, where he leads the Emerging Computing Systems (ECS) Lab. His research spans computer architecture, software-hardware co-design, emerging memory technologies, machine learning acceleration, and privacy-preserving computing, with a particular emphasis on hardware acceleration for fully homomorphic encryption.

Jennie Rogers
Jennie Rogers is an Associate Professor of Computer Science at Northwestern University. Her research focuses on pragmatic privacy-preserving data analytics, federated databases over multiple data models, and database query optimization, with a particular emphasis on private data federations that allow mutually distrustful parties to compute SQL queries over their joint data without revealing sensitive inputs.

Portable Execution of Data-Intensive Notebook Workflows Across Heterogeneous HPC Systems

Authors: Md Saiful Islam, Douglas Thain

Institution(s): University of Notre Dame

Room: Lake

Board #: L16

Session #: 1

Abstract: Notebook workflows are widely used for exploratory and iterative scientific computing, but they remain difficult to scale and reproduce across heterogeneous HPC systems. While distributed execution frameworks enable notebooks to run at scale, portability is often hindered by incomplete execution context and site-specific assumptions. For data-intensive workflows, the primary barrier is data: datasets originate from heterogeneous storage systems, and notebooks frequently embed hard-coded paths and ad hoc staging logic that limit reuse and reproducibility. We present a data-centric approach for portable execution of notebook workflows using the Backpack abstraction, which captures software environments, data dependencies, and resource requirements in a portable specification. To address data portability, we introduce a declarative data specification that separates data handling from notebook code and enables consistent materialization of datasets at stable runtime paths across systems. Our design supports multiple data sources and integrates a metadata-driven fingerprinting mechanism that combines source metadata with specification attributes to generate deterministic identifiers for datasets. This enables efficient cache reuse across executions while ensuring that changes in data or configuration trigger re-staging. We evaluate our approach using three real-world data-intensive applications across three HPC systems with different schedulers and storage architectures. Results show that workflows execute unchanged across sites, with improved reproducibility and reduced data staging overhead through cache reuse.
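The fingerprinting idea above, combining source metadata with specification attributes into a deterministic identifier, can be illustrated with a short sketch. The field names and the choice of SHA-256 over canonical JSON are assumptions made for this example, not the Backpack implementation.

```python
import hashlib
import json


def dataset_fingerprint(source_meta: dict, spec_attrs: dict) -> str:
    """Return a deterministic identifier for a dataset materialization.

    Keys are sorted and whitespace is fixed, so the same metadata and
    specification always serialize to the same bytes and the same digest.
    """
    payload = json.dumps(
        {"source": source_meta, "spec": spec_attrs},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical metadata for one dataset and its declared runtime path.
meta = {"size": 1048576, "mtime": "2026-01-15T12:00:00Z", "etag": "abc123"}
spec = {"mount": "/data/input", "format": "parquet"}
cache_key = dataset_fingerprint(meta, spec)
```

A staging layer could reuse a cached copy when `cache_key` matches a previous run and re-stage whenever the source metadata or the specification changes.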

Author Bios:
Md Saiful Islam is a Ph.D. student in Computer Science and Engineering at the University of Notre Dame, advised by Professor Douglas Thain. His research focuses on portable and reproducible scientific workflows on HPC systems, and he is a primary contributor to the Floability project.

Douglas Thain is a Professor of Computer Science and Engineering at the University of Notre Dame, where he leads the Cooperative Computing Lab. His research focuses on large-scale distributed computing for data-intensive scientific applications, and he develops open-source workflow tools including TaskVine and Makeflow.

TraceProv: All You Need is Log - Efficient Bolt-on Provenance Capture for Complex Relational Queries

Authors: Vinayak Jha, Dr. Boris Glavic

Institution(s): University of Illinois at Chicago

Room: Lake

Board #: L15

Session #: 1

Abstract: A long line of work in data provenance has investigated how to support provenance capture for relational databases. However, achieving high performance for complex queries has been challenging: systems with low overhead had to heavily modify the execution engine (and other components) of the database system, while less-invasive systems incur high overhead. In this work, we propose TraceProv, a data provenance capture system that achieves low overhead without being invasive. Our approach is based on a lightweight, parallelism-friendly logging framework that uses memory-mapped files and a two-phase capture strategy that opportunistically combines logging with propagation. TraceProv utilizes extensibility mechanisms supported by most databases, and supports complex analytical queries using features such as nested subqueries, aggregation, order-by with limit, window functions, and many others. We implement TraceProv on PostgreSQL and DuckDB, and experimentally compare against state-of-the-art relational provenance systems. TraceProv outperforms non-invasive techniques and is competitive with or faster than invasive systems. Compared to invasive systems, our approach requires significantly less implementation effort and is easier to port to new database systems or new versions of an already supported system.

Author Bios:
Vinayak Jha:
Vinayak Jha is a first-year Master's student in Computer Science at the University of Illinois at Chicago. He is currently serving as a research assistant under Dr. Boris Glavic. His research interests span database systems and OS internals, with a focus on kernel-level database integration.

Dr. Boris Glavic:
Boris Glavic (https://www.cs.uic.edu/~bglavic/dbgroup/members/bglavic.html) is an Associate Professor in the Department of Computer Science at the University of Illinois at Chicago leading the DBGroup. His research spans several areas of database systems and data science including data provenance, data integration, query execution and optimization, uncertain data, and data curation. Boris strives to build systems that are based on solid theoretical foundations.

Persistent Translation Validation

Authors: Tommy McMichen, Leyla Latifova

Institution(s): Northwestern University

Room: Lake

Board #: L9

Session #: 1

Abstract: As AI is increasingly used to automate software engineering tasks, one emerging use case is the integration of models into compilation pipelines. Because LLMs are intrinsically stochastic, the correctness of program transformations can no longer rest on time-tested compilers, which makes the resulting pipelines untrustworthy. Validating those transformations increases reliability, but at a cost: validation takes a considerable amount of time. There is, however, redundancy as the program goes through a chain of transformations. For example, in the vast majority of programs, a pass through a language model preserves the program's memory backbone, which is also the hardest constituent to verify. Identifying such redundancies would greatly reduce the problem size. We therefore propose to break programs down into persistent and non-persistent components, so that only the latter need to be validated with each new transformation. This work thus enables AI transformation passes to be integrated into compilation pipelines without overwhelming overhead.

Author Bios:
Tommy McMichen is a final-year Ph.D. student at Northwestern University, advised by Simone Campanoni. His research builds compiler infrastructure to preserve high-level semantic information across system boundaries — most notably through MemOIR, which brings data collections into the compiler as first-class citizens. He is now applying this methodology to AI coding agents, grounding their reasoning in trustworthy compiler analysis to make them economical and correct by construction.

Leyla Latifova is a junior CS exchange student from ETH Zürich. Her current work focuses on speeding up the verification of correctness of program transformations. She previously did research in theoretical computer science.

Turbocharging Parallel Backtesting with High-Performance Storage Systems

Authors: Lan Nguyen, Ioan Raicu

Institution(s): Illinois Institute of Technology

Room: Lake

Board #: L12

Session #: 1

Abstract: The rapid growth of high-frequency time-series data across domains such as computational finance and scientific simulation has introduced significant challenges in storage, querying, and large-scale analysis. Workloads such as parameter sweeps in backtesting pipelines often require executing billions of simulations over decades of fine-grained data, placing extreme pressure on storage systems in terms of throughput, scalability, and resource utilization. Traditional storage solutions such as MongoDB, InfluxDB, CSV (via Pandas), and Parquet struggle to balance performance and efficiency under highly concurrent workloads and diverse access patterns.

Author Bios:
N/A

MPI-LLM: Extreme-scale Multi-LLM Serving for HPC

Authors: Shu Shi, Wenyi Wang, Yadu Nand Babuji, Kyle Chard, Ian Foster

Institution(s): University of Chicago

Room: Lake

Board #: L8

Session #: 1

Abstract: Large Language Models (LLMs) have become a critical part of broad workflows. Increasingly, users use many LLMs in their work, ranging from small to large models, fine-tuned on different corpora, and optimized for different use cases. As such, institutions face the challenge of serving many LLMs concurrently to their users. Existing solutions, like vLLM, focus on high-performance multi-GPU model serving for a single model. Deploying many such instances can be inefficient at scale. We explore serving multiple machine learning models with varied GPU requirements in a resource-constrained environment with a limited number of GPUs. We introduce MPI-LLM, a serving system designed to support efficient, scalable, multi-node inference of LLMs. Our work extends vLLM, the leading model serving system, with a novel Message Passing Interface backend, providing faster startup and resource allocation optimized for HPC systems. MPI-LLM offers three primary contributions: (i) an MPI-based backend that accelerates initialization and ensures compatibility with HPC infrastructure, (ii) support for hosting multiple models simultaneously, and (iii) rapid model switching capabilities.

Author Bios:
Shu Shi is a first-year Ph.D. student in Computer Science at the University of Chicago. He works with Prof. Ian Foster and Prof. Kyle Chard in Globus Labs. His research focuses on machine learning systems, with an emphasis on scalable LLM serving on high-performance computing platforms.

Wenyi Wang is a Ph.D. student in Computer Science at the University of Chicago and a member of Globus Labs, advised by Prof. Ian Foster and Prof. Kyle Chard. His research focuses on computer systems and machine learning, including scalable LLM inference on many-node HPC systems and fine-grained parallelism techniques.

Yadu N. Babuji is a Senior Software Engineer at the University of Chicago. He conducts systems research at Globus Labs, focusing on scalable distributed systems for scientific computing. He is the primary developer and technical lead of Parsl, and has also worked on funcX, a function-serving platform for science.

Kyle Chard is a Research Associate Professor in the Department of Computer Science at the University of Chicago and a researcher at Argonne National Laboratory. He co-leads Globus Labs, where his research focuses on systems for computational and data-intensive science, including distributed computing, research automation, and research data management.

Ian Foster is an Argonne Senior Scientist and Distinguished Fellow, Director of Argonne's Data Science and Learning Division, and the Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago. His research focuses on distributed, parallel, and data-intensive computing technologies and their applications to scientific problems.

SafeWall: Risk-Aware Job Walltime Recommendation for HPC Scheduling

Authors: Kanglin Xu, Michael Papka, Zhiling Lan

Institution(s): University of Illinois Chicago

Room: Lake

Board #: L17

Session #: 1

Abstract: In high-performance computing (HPC), users must submit resource requirements for their jobs, with user-supplied walltime being a critical parameter. However, these estimates are notoriously inaccurate, as users tend to overestimate walltime to prevent the scheduler from terminating their jobs prematurely. While existing work on learning-based walltime prediction primarily focuses on maximizing prediction accuracy, such predictions are inherently risky for walltime recommendations. Specifically, underestimation leads to job termination before completion, resulting in lost progress and wasted resources. In this study, we present SafeWall, a novel recommendation method that adjusts walltime predictions by incorporating an explicit, nonnegative safety buffer to mitigate underestimation risk. The core of our approach is a risk-aware buffer that dynamically calibrates safety margins based on each user's historical behavior. When user data is limited, the system employs statistical shrinkage to align these margins with a global baseline, ensuring robustness. This yields a lightweight mechanism that optimizes the trade-off between prediction accuracy and termination risk. SafeWall is modular and applicable to various learning-based prediction models. We evaluate SafeWall using five public workload traces, including the full-lifetime job log from the 4,392-node Theta machine at the Argonne Leadership Computing Facility. Across all traces, SafeWall substantially reduces underestimation risk compared to the unbuffered base predictors, while preserving competitive accuracy.
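The abstract does not give SafeWall's buffer formula, so the sketch below shows only a generic form of statistical shrinkage: a per-user margin is blended with a global baseline, with the user's weight growing as more history accumulates. The function names, the `prior_strength` parameter, and the use of mean positive underestimation as the user margin are all assumptions.

```python
def shrunk_buffer(user_errors, global_buffer, prior_strength=10.0):
    """Blend a per-user safety margin with a global baseline.

    `user_errors` holds past (actual - predicted) runtimes in seconds for
    one user; positive values are underestimates. With few observations
    the result stays close to `global_buffer`.
    """
    n = len(user_errors)
    user_margin = sum(max(e, 0.0) for e in user_errors) / n if n else 0.0
    weight = n / (n + prior_strength)
    # The buffer is never negative, so it can only lengthen the walltime.
    return max(0.0, weight * user_margin + (1.0 - weight) * global_buffer)


def recommend_walltime(predicted, user_errors, global_buffer):
    """Add the shrunk, nonnegative buffer to a base model's prediction."""
    return predicted + shrunk_buffer(user_errors, global_buffer)
```

With no history the recommendation uses the global buffer alone; with ten observations and the default `prior_strength`, the user's own margin and the global baseline are weighted equally.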

Author Bios:
Kanglin Xu is a Ph.D. student in Computer Science at the University of Illinois Chicago. His research interests include high-performance computing, job scheduling, and machine learning for large-scale system optimization.

Michael E. Papka is the Warren S. McCulloch Professor of Computer Science at the University of Illinois Chicago and director of the Electronic Visualization Laboratory. A Senior Scientist and Distinguished Fellow at Argonne National Laboratory, he leads the Argonne Leadership Computing Facility and co-directs the George Crabtree Institute for Discovery. His research spans high-performance computing, large-scale data analysis, and scientific visualization across the computing continuum.

Zhiling Lan is a Professor of Computer Science at the University of Illinois Chicago with a joint appointment at Argonne National Laboratory. She leads the SPEAR team (Systems for Performance, Energy, and Resiliency) within the UIC EVL Lab. Previously a professor at the Illinois Institute of Technology for over two decades, her research focuses on parallel and distributed systems and high-performance computing.

Beyond Single-Node: High-Performance Parallel Sorting for CKKS Encrypted Data

Authors: Charis Hulu, Valentino Guerrini, Sidharth Kumar, Anrin Chakraborti

Institution(s): University of Illinois Chicago

Room: Lake

Board #: L1

Session #: 1

Abstract: Sorting encrypted data under fully homomorphic encryption (FHE) is a critical primitive for privacy-preserving computation, enabling applications from secure database operations to confidential machine learning pipelines. However, the prohibitive computational cost of homomorphic operations makes efficient sorting on encrypted data exceptionally challenging. While recent work has demonstrated single-node parallel sorting algorithms for FHE data using shared-memory architectures, these approaches remain fundamentally limited by the computational capacity and memory constraints of individual machines. This paper presents the first distributed-memory parallel sorting algorithm for homomorphically encrypted data, extending beyond single-node limitations to multi-node supercomputing environments. We develop a hybrid MPI+OpenMP implementation that exploits the constant-depth, embarrassingly parallel structure of the Mazzone sorting algorithm (USENIX Security 2025), enabling effective utilization of distributed computing resources. Our approach incorporates efficient load balancing schemes to ensure optimal workload distribution across processors and nodes. The compute-dominated nature of FHE operations, combined with low-latency interconnects on modern supercomputers, makes the marginal communication overhead negligible compared to computational gains from massive parallelism. Experimental evaluation on the Polaris supercomputer at the Argonne Leadership Computing Facility shows that distributed execution significantly mitigates the quadratic runtime growth observed in the single-node setting and achieves up to a 2^7x speedup. We successfully scale to 1,024 MPI processes and 16,384 threads to sort 65,536 CKKS-encrypted elements in 3,069 seconds, the largest reported FHE sorting experiment to date.

Author Bios:
Charis Hulu is a PhD student at the University of Illinois Chicago whose research focuses on high-performance computing, distributed systems, and homomorphic encryption.

Valentino Guerrini is a researcher interested in high-performance computing, FPGA acceleration, and hardware/software co-design for scalable computing systems.

Sidharth Kumar is an Associate Professor at the University of Illinois Chicago whose research focuses on high-performance computing, scalable algorithms, data-intensive systems, and parallel computing.

Anrin Chakraborti is an Assistant Professor at the University of Illinois Chicago whose research lies at the intersection of cryptography, systems, and privacy-preserving computing.

Experimental Demonstration of Software-Orchestrated Quantum Network Applications over a Campus-Scale Testbed

Authors: Md Shariful Islam, Joaquin Chung, Ely Marcus Eastman, Robert J. Hayek, Prem Kumar, Rajkumar Kettimuthu

Institution(s): Northwestern University, Argonne National Laboratory

Room: Lake

Board #: L2

Session #: 1

Abstract: To fulfill their promise, quantum networks must transform from isolated testbeds into scalable infrastructures for distributed quantum applications. In this paper, we present a prototype orchestrator for the Argonne Quantum Network (ArQNet) testbed that leverages design principles of software-defined networking (SDN) to automate typical quantum communication experiments across buildings on the Argonne campus connected over deployed telecom fiber. Our implementation validates a scalable architecture supporting service-level abstraction of quantum networking tasks, distributed time synchronization, and entanglement verification across remote nodes. We present a prototype service of continuous, stable entanglement distribution between remote sites that ran for 12 hours, which defines a promising path towards scalable quantum networks.

Author Bios:
Ely M. Eastman is a graduate research assistant at Northwestern University and Argonne National Laboratory. His research interests include photonic quantum communication and silicon photonics.

Robert J. Hayek is a predoctoral appointee within the Data Science and Learning Division at Argonne National Laboratory. His research experience lies in quantum communications, 5G cellular networks, federated learning, and synchronization protocol design.

Joaquin Chung is a research scientist at the Data Science and Learning Division at Argonne National Laboratory. His main work focuses on studying architectures for scalable quantum networks.

Prem Kumar is a Professor of Electrical and Computer Engineering and the Director of the Center for Photonic Communication and Computing. His research spans the fields of Nonlinear Optics, Optical Communication Networks, and Quantum Computing and Communication.

Rajkumar Kettimuthu is a senior scientist and group leader at Argonne National Laboratory. He has over 20 years of experience leading large-scale R&D efforts in high-performance computing, AI for science, scientific workflows, and advanced networking.

Toward Scalable Privacy-Preserving Machine Learning: Unified GPU Acceleration for Offline Correlation Generation

Authors: Fatih Cetin, Chenkai Wang, Yanxue Jia, Minxuan Zhou

Institution(s): Illinois Institute of Technology, Arizona State University

Room: Lake

Board #: L4

Session #: 1

Abstract: Privacy-preserving machine learning (PPML) protocols partition computation across two paradigms: function secret sharing (FSS) handles non-linear operations such as ReLU and comparison, while additive secret sharing handles linear operations. Both adopt offline/online decompositions to minimize online latency, but this shifts the burden to the offline phase, where generating and storing correlated randomness becomes the dominant bottleneck. Our profiling of Orca, a GPU-accelerated FSS-based PPML system, reveals that precomputed FSS keys grow from gigabytes to tens of gigabytes as model complexity increases, with key storage, I/O, and memory movement consuming as much time as GPU computation even for moderate-sized models. These costs worsen as FSS-based frameworks target heavier workloads. We propose a unified acceleration framework addressing this offline bottleneck across both paradigms. For FSS-based non-linear evaluation, we replace large precomputed keys with compact seeds expanded on-the-fly on GPU. Chunked online key generation reduces peak staged key footprint by up to 128× compared to one-shot generation with under 2× time overhead, providing a tunable memory-efficiency knob. For secret-sharing-based linear evaluation, we GPU-accelerate Ring-LPN pseudorandom correlation generators (PCGs). Our GPU NTT engine, adapted from Cheddar's two-phase kernel structure, achieves roughly 89× polynomial multiplication speedup over the NFLLib CPU baseline, providing the fast polynomial core that online PCG expansion requires. Built on this, our GPU Ring-LPN VOLE expansion prototype demonstrates that PCG-based correlation generation runs efficiently on GPU, with validated correctness across polynomial degrees from 8,192 to over one million for both 32-bit and 64-bit modulus widths. We are currently integrating these components into Orca.
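
The memory trade-off behind chunked, seed-based generation can be illustrated with a small CPU sketch. It is not FSS key generation and not the authors' GPU code: SHAKE-256 from Python's `hashlib` stands in for the pseudorandom expansion, and the helper names (`expand_chunk`, `one_shot`, `chunked`, `stream_digest`) are hypothetical. The point is only that a consumer of chunked output holds one chunk at a time, while one-shot generation stages everything at once.

```python
import hashlib

def expand_chunk(seed: bytes, index: int, chunk_bytes: int) -> bytes:
    """Expand the seed into one chunk; the index keeps chunks independent."""
    return hashlib.shake_256(seed + index.to_bytes(8, "big")).digest(chunk_bytes)

def one_shot(seed: bytes, num_chunks: int, chunk_bytes: int) -> bytes:
    """Stage all expanded material at once (peak = num_chunks * chunk_bytes)."""
    return b"".join(expand_chunk(seed, i, chunk_bytes) for i in range(num_chunks))

def chunked(seed: bytes, num_chunks: int, chunk_bytes: int):
    """Yield expanded material one chunk at a time (peak = chunk_bytes)."""
    for i in range(num_chunks):
        yield expand_chunk(seed, i, chunk_bytes)

def stream_digest(chunks) -> str:
    """Stand-in consumer that processes each chunk and then lets it go."""
    h = hashlib.sha256()
    for c in chunks:
        h.update(c)
    return h.hexdigest()
```

Both paths produce the same bytes, so the number of chunks acts as a knob: more chunks lower the peak staged footprint at the cost of more expansion calls.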

Author Bios:
Fatih Alper Cetin is a graduate researcher at the Illinois Institute of Technology, where he works at the intersection of privacy-preserving machine learning and high-performance computing. His current research focuses on GPU-accelerated cryptographic primitives for secure multi-party computation, with an emphasis on scalable offline correlation generation.

Minxuan Zhou is an Assistant Professor in the Department of Computer Science at Illinois Institute of Technology, where he is leading the Emerging Computing Systems (ECS or X) Lab. Prior to that, he obtained a PhD degree in computer science from UC San Diego.

Yanxue Jia is an assistant professor in the Department of Computer Science at Illinois Institute of Technology since 2025. Before joining Illinois Tech, she was a post-doctoral researcher at Purdue University, and she earned her Ph.D. in Computer Science from Shanghai Jiao Tong University in 2022.

Accelerating Multi-Agent Orchestration with Speculative Dispatching

Authors: Jie Ye, Shazzadul Islam, Jaime Cernuda, Xian-He Sun, Anthony Kougkas

Institution(s): Illinois Institute of Technology

Room: Lake

Board #: L10

Session #: 1

Abstract: Scientific computing workflows are increasingly being restructured around autonomous AI agents. These agents coordinate complex tasks across heterogeneous tools and infrastructure, ranging from molecular simulations spanning quantum chemistry solvers and machine learning potentials to cross-facility manufacturing experiments orchestrated by multi-agent teams. However, existing orchestration approaches either follow static execution graphs defined at design time or delegate routing to an LLM that greedily selects the next agent at each step, without considering resource availability, heterogeneous model and provider selection, and infrastructure constraints. Achieving resource-aware, dynamic, and deterministic orchestration that reasons about task decomposition, agent capabilities, and infrastructure state demands complex planning and dispatching decisions at runtime. This may introduce significant latency before any agent can begin execution. We propose a speculative dispatch mechanism to accelerate multi-agent orchestration. We are the first to apply speculative execution to the dispatch decision. Specifically, our approach speculatively dispatches agents ahead of time while the full orchestration decision is being computed. Since speculative dispatch can mispredict, a reconciliation engine reconciles the speculative and optimal dispatch plans. Unlike single-agent speculation approaches that accept or discard results entirely, our framework commits correct work, salvages partial matches, and flushes mispredictions, reducing the cost of misprediction. As workload patterns evolve over time, a learner progressively refines speculation accuracy from reconciliation outcomes, enabling the system to improve dispatch predictions. Preliminary results show that the speculative planner produces partially correct dispatch plans (0.50–0.78 similarity) at 4–10× lower latency than the optimal planner, confirming the latency-quality gap that speculative dispatch exploits.
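
The commit, salvage, and flush outcomes described above can be sketched as a comparison of two dispatch plans. The abstract does not define what counts as a partial match, so this sketch assumes one: the speculative plan picked the right agent for a task but the wrong model. The `Dispatch` record, the `reconcile` function, and the extra `launch` bucket for tasks the speculation missed are all hypothetical.

```python
from typing import Dict, List, NamedTuple

class Dispatch(NamedTuple):
    agent: str   # which agent handles the task
    model: str   # which model or provider the agent runs on (assumed field)

def reconcile(speculative: Dict[str, Dispatch],
              optimal: Dict[str, Dispatch]) -> Dict[str, List[str]]:
    """Classify each speculatively dispatched task against the optimal plan."""
    result = {"commit": [], "salvage": [], "flush": [], "launch": []}
    for task, spec in speculative.items():
        target = optimal.get(task)
        if target == spec:
            result["commit"].append(task)    # exact match: keep the work
        elif target is not None and target.agent == spec.agent:
            result["salvage"].append(task)   # assumed partial match: same agent
        else:
            result["flush"].append(task)     # misprediction: discard the work
    # Tasks the optimal plan needs but speculation never started.
    result["launch"] = [t for t in optimal if t not in speculative]
    return result
```

A learner of the kind the abstract mentions could consume these buckets as feedback, for example by tracking the fraction of committed tasks over time.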

Author Bios:
Jie Ye is a PhD student in the Department of Computer Science at the Illinois Institute of Technology (IIT) and a member of the Gnosis Research Center, advised by Prof. Xian-He Sun and Dr. Anthony Kougkas at IIT and co-advised by Dr. Bogdan Nicolae at Argonne National Laboratory. Her research focuses on accelerating and optimizing DNN/LLM inference serving.

Shazzadul Islam is a PhD student in the Department of Computer Science at the Illinois Institute of Technology and a member of the Gnosis Research Center, working under the guidance of Dr. Xian-He Sun and Dr. Anthony Kougkas. He holds an M.S. in Computer Science from IIT, and his research specializes in high-performance computing and artificial intelligence.

Dr. Jaime Cernuda is a Research Assistant Professor in the Department of Computer Science at the Illinois Institute of Technology and a member of the Gnosis Research Center. He specializes in high-performance computing infrastructure, with expertise in distributed storage systems, real-time data processing, and exa-scale computing environments.

Dr. Xian-He Sun is the Director of the Gnosis Research Center and an IEEE Fellow. He is a University Distinguished Professor and the Ron Hochsprung Endowed Chair of Computer Science at the Illinois Institute of Technology. His research interests include parallel and distributed processing, memory and I/O systems, software systems for Big Data applications, and performance evaluation and optimization.

Dr. Anthony Kougkas is the Deputy Director of the Gnosis Research Center at the Illinois Institute of Technology. With a Ph.D. in Computer Science, he focuses on solving data management and I/O challenges in extreme-scale distributed applications, with research spanning multi-tiered storage, prefetching, replication, compression, and in-transit/in-situ techniques.

Optimized Task Scheduling for Connectomics and Large-Scale Image Processing Workloads

Authors: Jyotsna Rajaraman, Michael E. Papka

Institution(s): University of Illinois Chicago

Room: Lake

Board #: L18

Session #: 1

Abstract: Volumetric image processing pipelines, particularly in connectomics, process petascale datasets through block-wise decomposition, resulting in irregular, multi-stage workloads. Existing task scheduling frameworks address computational imbalance but overlook key domain-specific characteristics, including spatial locality, overlapping (halo) regions, object-level irregularity, and data movement across HPC systems. As a result, these approaches often incur redundant computation, excessive data movement and sub-optimal resource usage, which are significant bottlenecks at scale. In this work, we identify the limitations of state-of-the-art scheduling approaches for connectomics workloads and demonstrate that load balancing alone is insufficient for optimal performance. We show that effective scheduling must also consider task granularity, spatial adjacency and pipeline-stage-specific behavior to minimize end-to-end runtime. To address these irregularities, we propose a domain-aware task scheduling framework which incorporates (1) spatial locality-aware task placement to reduce communication and I/O overhead, (2) cost-aware scheduling using predictors derived from input characteristics and (3) adaptive task scheduling policies that tailor scheduling strategies to different phases of the connectomics pipeline. We will also evaluate this framework on different stages of a connectomics workflow and compare it with existing task-scheduling strategies.

Author Bios:
Jyotsna Rajaraman: Computer Science graduate student at University of Illinois Chicago
Michael E. Papka: Professor of Computer Science

The Sage Grande Testbed

Authors: Neal Conrad, Peter Lebiedzinski, Francisco Lozano, Yongho Kim, Sean Shahkarami, Nicola Ferrier, Rajesh Sankaran, Pete Beckman

Institution(s): Argonne National Laboratory, Northwestern University

Room: Lake

Board #: L22

Session #: 1

Abstract: The Sage Grande Testbed is an open artificial intelligence testbed that supports edge computing and intelligent sensing. The platform allows researchers to develop new AI algorithms and measurement strategies. After a decade of experience, our current project is set to deploy 300 new state-of-the-art AI-enabled platforms across the USA, and in every state. Scientists and students can explore next-generation AI-enabled infrastructure for real-time monitoring of sensor data for many applications including wildfire, flooding, and agriculture. As edge computing platforms become more powerful, they can support bigger AI models, including large language models (LLMs). In the future, scientists will be able to use natural language prompts and AI agents to program their sensing strategies and collect relevant data. With new AI-enabled infrastructure, Sage can provide users with new ways to write programs and control Sage nodes.

Author Bios:
Neal Conrad is a Principal Specialist in Research Software Engineering at Argonne National Laboratory. He works at the intersection of advanced computing, scientific discovery, and user‑centered design, with a focus on AI at the Edge. His expertise spans data visualization, AI workflows, and UI/UX design that brings clarity and usability to complex scientific data.

Room: Northwestern

A Client-Side Architecture for Scalable Ingestion and Interactive Visualization of Cosmological Particle Data

Authors: Idunnuoluwa A. Adeniji, Joseph Insley, Michael E. Papka

Institution(s): University of Illinois Chicago, Argonne National Laboratory

Room: Northwestern

Board #: N20

Session #: 1

Abstract: We present the data ingestion and processing architecture of the OpenCosmo Visualization Application (OVA), a browser-based system for interactive exploration of large-scale cosmological particle data designed for direct integration within the OpenCosmo portal. In this workflow, users query subsets of simulation outputs containing millions of particles, requiring efficient, low-latency transformation from columnar data into interactive visual form. OVA executes this pipeline entirely on the client. It ingests Apache Parquet datasets directly in the browser using WebAssembly, leveraging Parquet’s columnar layout and high compression efficiency to reduce data transfer costs while enabling selective decoding of relevant fields. Data is processed in parallel via Web Workers and prepared as GPU-ready buffers for rendering, enabling end-to-end exploration without preprocessing or server-side visualization infrastructure. Profiling a baseline pipeline processing 3.3M–52.9M particles reveals that WebAssembly decoding dominates per-file ingestion cost (up to 84%), while structured-clone serialization across the Worker–main thread boundary dominates end-to-end latency, averaging 14.5 seconds at 3.3M particles and failing at 16.5M due to memory exhaustion. We address this bottleneck through a columnar Transferable buffer architecture that transfers typed array ownership without copying, reducing transfer time by 13× and memory usage by 4.3×. Evaluation demonstrates linear ingestion scaling, near-constant transfer overhead, and sustained interactive performance on commodity hardware, extending feasible dataset sizes to 52.9M particles. This work shows that careful systems design enables scalable, client-side analysis of HPC-generated cosmological data without reliance on external visualization pipelines.

Author Bios:
Idunnuoluwa A. Adeniji: Computer Science PhD student, Creative Technologist and Visualization Researcher, UIC
Michael E. Papka: Professor of Computer Science, UIC
Joseph Insley: Visualization Lead, Argonne National Laboratory

Pre-RoPE versus Post-RoPE: Per-Step Compute for a Reduced KV-Cache Memory Footprint

Authors: Jaime Cernuda, Zia Uddin Chowdhury, Jie Ye, Anthony Kougkas, Xian-He Sun

Institution(s): Illinois Institute of Technology

Room: Northwestern

Board #: N14

Session #: 1

Abstract: Modern LLMs built on the transformer architecture rely on self-attention. To avoid recomputing past representations at each decoding step, inference engines maintain a key–value (KV) cache that stores the projected keys and values of previously seen tokens. Since self-attention is position-agnostic, models inject positional information through Rotary Position Embedding (RoPE), which rotates query and key vectors by an angle tied to each token's position, so that attention scores depend on the relative distance between tokens. A new trend has emerged exploring the usage of Pre-RoPE KV-cache, which defers this rotation until attention time, leaving cached keys position-free. Post-RoPE storage remains the universal default across production inference engines today. Because every cached key has its position baked in, subsequent attention steps read them as-is, yielding efficient decoding. Identical content at different positions therefore produces distinct keys and cannot share a cache entry. For example, when an agent reads the same file twice, the cache may store two separate copies, inflating the memory footprint. The pre-RoPE alternative stores unrotated keys and applies RoPE at attention time, adding a per-step rotation overhead over every cached key. We provide the modeling and analysis of pre-RoPE versus post-RoPE storage in vLLM. Our results on Llama-3.2-1B-Instruct show that pre-RoPE incurs roughly 33% higher TTFT and 30% lower decode throughput while preserving output correctness, quantifying the per-step compute price that position-independent caching trades for the possibility of a reduced KV-cache memory footprint.
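
The storage trade-off can be made concrete with a minimal pure-Python sketch of RoPE on a toy vector. It uses the standard pairwise rotation with frequencies `base ** (-i / d)`; the vectors, positions, and helper names (`rope`, `dot`) are illustrative and unrelated to vLLM's implementation. The sketch shows that identical content at two positions yields distinct post-RoPE keys, and that attention scores depend only on the relative distance, which is why unrotated keys can be stored once and rotated at attention time.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of vec by an angle proportional to pos."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]   # the same content, seen at positions 3 and 10

# Post-RoPE cache: position is baked in, so the two entries differ.
post_cache = {3: rope(k, 3), 10: rope(k, 10)}

# Pre-RoPE cache: one unrotated entry, rotated on every attention step.
pre_cache = {"shared_content": k}
score_post = dot(rope(q, 12), post_cache[10])
score_pre = dot(rope(q, 12), rope(pre_cache["shared_content"], 10))
```

Here `score_pre` equals `score_post`, but the pre-RoPE path calls `rope` on the cached key at every step, which is the per-step overhead the abstract measures.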

Author Bios:
Dr. Jaime Cernuda is a Research Assistant Professor in the Department of Computer Science at the Illinois Institute of Technology and a member of the Gnosis Research Center. Jaime Cernuda works under the guidance of Dr. Xian-He Sun.

Zia Uddin Chowdhury is a first year PhD student and Research Assistant in the Department of Computer Science at the Illinois Institute of Technology. He is a member of the Gnosis Research Center and works under the guidance of Dr. Xian-He Sun and Dr. Anthony Kougkas.

Jie Ye is a PhD candidate in the Department of Computer Science at the Illinois Institute of Technology and a member of the Gnosis Research Center. She is advised by Dr. Xian-He Sun and Dr. Anthony Kougkas.

Dr. Anthony Kougkas is an Associate Research Professor in Computer Science at Illinois Tech, where he serves as the Deputy Director of the Gnosis Research Center. His work at Illinois Tech and as a Guest Research Scientist at Argonne National Laboratory involves developing cutting-edge data management and storage solutions.

Dr. Xian-He Sun is the director of Gnosis Research Center at the Illinois Institute of Technology. He is a University Distinguished Professor and the Ron Hochsprung Endowed Chair of Computer Science and a former department chair of the Department of Computer Science at Illinois Tech.

PropCov: Effective Coverage Reporting for Property-Based Testing

Authors: Jesse Coultas; Joseph Wiseman; Luís Pina

Institution(s): University of Illinois Chicago

Room: Northwestern

Board #: N7

Session #: 1

Abstract: Property-based testing (PBT), introduced by Haskell’s QuickCheck, is becoming more popular through tools like junit-quickcheck. Developers define a property and a generator that produces well-formed, randomized inputs. The property is repeatedly tested against fresh inputs to verify a property of the system, such as a serialization round-trip. Failures often expose edge-case bugs, while passing tests increase confidence in the system. However, traditional test coverage tools perform poorly when evaluating PBT coverage. In this poster, we present PropCov, a tool for understanding coverage in PBT that also provides suggestions for coverage improvement. PropCov employs a novel combination of static analysis with PBT to approximate the maximum possible coverage, providing an effective measure of the PBT coverage and making suggestions on paths to improve existing tests. PropCov features an easily extensible architecture that can support new languages, build systems, and PBT frameworks. We evaluated PropCov on 25 Java projects that use junit-quickcheck or jqwik, totaling 293 properties, and found that existing tools report missed coverage that is impossible to reach (86% of lines that JaCoCo reports as not covered), which leads developers to consider hundreds of extra lines of code (2910). Unlike existing coverage tools, PropCov results are accurate: only 7.5% of all properties contain infeasible code, and PropCov only misses 4% of reachable code. Using PropCov’s suggestions, we increased the coverage of 42 tests over 7 projects and found 5 new bugs in 4 projects.
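
For readers new to PBT, the generator-plus-property pattern can be sketched without any framework. The projects in the abstract use Java tools (junit-quickcheck, jqwik); the Python below is a hand-rolled illustration, and `check`, `gen_record`, `gen_int_keys`, and `prop_round_trip` are hypothetical names. The property is a JSON serialization round-trip: it holds for string keys and fails for integer keys, because JSON object keys are always strings.

```python
import json
import random

def gen_record(rng):
    """Generator: small dicts with string keys and integer values."""
    size = rng.randint(0, 5)
    return {f"k{rng.randint(0, 99)}": rng.randint(-10**6, 10**6) for _ in range(size)}

def gen_int_keys(rng):
    """Generator that exposes the edge case: integer keys become strings in JSON."""
    return {rng.randint(0, 9): 0}

def prop_round_trip(record):
    """Property: decoding the encoded record gives back the original."""
    return json.loads(json.dumps(record)) == record

def check(prop, gen, trials=200, seed=0):
    """Run the property on fresh inputs; return a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = gen(rng)
        if not prop(value):
            return value
    return None
```

Calling `check(prop_round_trip, gen_record)` finds no counterexample, while `check(prop_round_trip, gen_int_keys)` returns a failing input on the first trial. Which branches of the code under test such generators can ever reach is the coverage question the poster addresses.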

Author Bios:
Jesse Coultas:
Jesse Coultas is a Ph.D. student in Computer Science whose research focuses on Property-Based Testing, coverage analysis, and AI-assisted developer tools. His work explores how coverage information and large language models can help developers understand reachability, improve test inputs, and evaluate what behavior their tests actually exercise.
Joseph Wiseman:
Joseph Wiseman is a computer science PhD student at the University of Illinois at Chicago, where his research focuses on programming languages, static program analysis, and software engineering. He has earned a Bachelor of Science in Physics from Loyola University Chicago, and a Bachelor and Master of Science in Computer Science at University of Illinois at Chicago.
Luís Pina:
Luís Pina is an Assistant Professor of Computer Science at the University of Illinois Chicago. His research focuses on programming languages and systems, with an emphasis on multi-version execution, record/replay, concurrency, and managed runtime systems. His work develops techniques that make complex software systems more reliable, reproducible, and secure.

The Science of Science: Loyola University's Research Network

Authors: Dawson Gallay, Satyaki Sikdar

Institution(s): Loyola University Chicago

Room: Northwestern

Board #: N24

Session #: 1

Abstract: This study analyzes bibliometric data to map the research footprint of Loyola University Chicago. We started with a dataset provided by SciSciNet and OpenAlex that documents the collaboration networks, topics, funding, and other factors that go into a successful research paper: 250 million papers by 100 million authors representing over 100 thousand institutions across the world; 30+ tables, hundreds of fields, and over 100GB. To navigate this complex schema, we utilized Parquet push-down predicates to mine the data, target Loyola's ID, and filter through nearly 1 billion paper-author affiliations to create a focused subset: 42,015 papers with at least one author from Loyola University. We then mapped Loyola's research network throughout US institutions and the world, analyzing trends in impact and collaboration shifts. These metrics tell the story of a school with global reach, broad interests, and recognized expertise. Looking ahead, we can expand this analysis beyond Loyola's historical output. Our robust dataset provides the opportunity to build a complete picture of the scientific ecosystem: the untapped collaboration networks, the way topic switches are encouraged by historical events like COVID, and the downstream effects of funding. Understanding this data is the gateway to future models that promote smarter, more efficient academic growth and scientific development.

Author Bios:
Dr. Satyaki Sikdar is an Assistant Professor of Computer Science at Loyola University Chicago. He earned his Ph.D. from the University of Notre Dame and completed postdoctoral work at Indiana University. His recent projects use generative models and embeddings to mine large‑scale complex networks, with results appearing in venues such as Nature Human Behaviour, IEEE TKDE, and Scientific Reports.

Dawson Gallay is a Computer Science graduate student at Loyola University Chicago. His research utilizes large-scale bibliometric data to model scientific discovery. After earning his Bachelor's degree from Boston College, he served as a Trading Systems Engineer at Société Générale and an Associate Director of Market Data at Janus Henderson Investors.

Understanding the Resource Usage of High-Energy Physics Applications in HPC Systems

Authors: Akshar Patel, Zhiling Lan

Institution(s): University of Illinois Chicago

Room: Northwestern

Board #: N16

Session #: 1

Abstract: High-Energy Physics (HEP) applications, like many other scientific applications, are growing in scale, requiring more energy, memory, and time out of HPC systems. As a result, optimizing resource usage has become a requirement in order to maintain sustainable high-throughput computing environments. This necessitates better methods to understand and apply resource usage data in our everyday computing. This work analyzes the power and memory usage data of a single HEP application in order to explore how this data can be used to inform how we manage HEP scientific applications. This work also explores methods to profile and fingerprint this data to assist in this goal.

Author Bios:
Akshar Patel: I am a first-year PhD student at UIC in the SPEAR group of the Electronic Visualization Lab (EVL). My advisor is Dr. Zhiling Lan, and my interests involve verifying concurrent programs for functional correctness, data-race freedom, and many other bugs that may show themselves in concurrent programs.

Zhiling Lan: Professor of Computer Science at UIC

What Do AI Coding Agents Need? Mining Trajectories to Identify Gaps in Language Tooling

Authors: Tommy McMichen

Institution(s): Northwestern University

Room: Northwestern

Board #: N10

Session #: 1

Abstract: Modern AI coding agents are capable of software development tasks ranging from large greenfield projects to low-level optimizations in production-quality codebases. However, these agents are computationally expensive and require specialized hardware that typically must be accessed from cloud providers, making token efficiency important. A necessary step toward token efficiency is to offload work to traditional language tooling—such as compilers, linters, and static analyzers—yet this tooling was never designed with AI agents in mind. There is currently no systematic method for identifying where existing tools fail these agents and how they should be extended. In this work, we propose a workflow that applies statistical analysis and LLM-as-a-judge to agent trajectories (the reasoning and actions taken when solving a problem) to automatically identify these gaps, reducing the need for labor-intensive manual analysis. Applying this workflow to Python coding agents, we find that agents manually implement common program analyses such as program slicing, and get stuck in loops when resolving complex import errors. Both findings indicate absent or inadequate tooling support, and our workflow prescribes targeted extensions to Python's tooling ecosystem for agent workflows. With this methodology, programming language implementers, toolchain designers, and agent framework developers can evaluate and extend their tools to better support AI coding agents.

Energy and Thermal Evaluation of Sage Thor-Blade Edge Computing Platform for Continuous Foundation Model Inference

Authors: Yongho Kim, Peter Lebiedzinski, Rajesh Sankaran, Sean Shahkarami, Neal Conrad, Francisco Lozano, Nicola Ferrier, and Pete Beckman

Institution(s): Argonne National Laboratory

Room: Northwestern

Board #: N22

Session #: 1

Abstract: Deploying foundation models on edge hardware requires balancing inference capability with strict thermal and energy limits. This work presents a thermal characterization of an edge-computing hardware design under sustained AI inference, with emphasis on the effect of different power modes and cooling configurations. We evaluate workloads that vary by inference frequency, model size, and quantization level, and measure the resulting energy consumption and thermal behavior during continuous AI operation. Our analysis compares transient and steady-state thermal responses across workload and cooling conditions, revealing how workload intensity and model configuration interact with hardware cooling capacity. The results highlight operating regions where the platform sustains stable inference and regions where thermal stress becomes a limiting factor. This study provides insight into the thermal robustness of edge hardware for foundation AI model deployment and motivates future work on thermal-aware workload scheduling and co-design software through hardware.

Author Bios:
Yongho Kim: Yongho Kim is an assistant computer scientist at Argonne National Laboratory. His research interests include AI@edge, edge computing, task scheduling, resource management and control, and agentic AI.

Peter Lebiedzinski: Piotr Lebiedzinski is a Senior Developer at Northwestern University's Northwestern-Argonne Institute of Science and Engineering (NAISE). His research interests include edge computing, 3D diffusion, and agentic AI.

Rajesh Sankaran: Rajesh Sankaran is a Principal Specialist and R&D Leader at Argonne National Laboratory, with research interests spanning edge computing, embedded systems, distributed systems, machine learning, and cyber-physical sensing platforms. His work applies electrical and computer engineering methods to scientific and engineering problems, with emphasis on sensor-driven computation, distributed sensing, and AI-enabled edge infrastructure.

Sean Shahkarami: Sean Shahkarami leads the software development team on Sage Grande at Northwestern University.

Neal Conrad: Neal Conrad is a Principal Specialist in Research Software Engineering at Argonne National Laboratory. He works at the intersection of advanced computing, scientific discovery, and user‑centered design, with a focus on AI at the Edge. His expertise spans data visualization, AI workflows, and UI/UX design that brings clarity and usability to complex scientific data.

Francisco Lozano: Francisco Lozano is a software engineer currently contributing to the Sage Grande project at Northwestern University. His interests include AI, IoT, and edge computing.

Nicola Ferrier: Nicola Ferrier is a Senior Computer Scientist in the Mathematics and Computer Science (MCS) Division of Argonne National Laboratory and a Research Fellow in the Northwestern Argonne Institute for Science and Engineering. Ferrier’s research interests are in the use of artificial intelligence and computer vision to control robots, machinery, and devices, with applications as diverse as medical systems, manufacturing, and biology.

Pete Beckman: Pete Beckman is the Co-Director of the Northwestern University / Argonne Institute for Science and Engineering. His current research interests include artificial intelligence at the edge, smart sensing, distributed sensor networks, and extreme-scale operating systems.

Agentic Orchestrator for Scientific LLM Inference

Authors: Aldo Malaquias Cabrera, Michael Papka, Zhiling Lan

Institution(s): University of Illinois Chicago

Room: Northwestern

Board #: N31

Session #: 1

Abstract: This project introduces Agentic Orchestrator, a framework for adaptive inference routing across heterogeneous compute resources, including edge devices, cloud servers, and HPC clusters. For each incoming request, an LLM planner produces a service-level task DAG that specifies operations and dependencies without binding steps to specific machines. A Thompson-sampling contextual multi-armed bandit then routes each step to a concrete worker using task and fleet context, while combining quality, latency, and energy signals to reflect user priorities rather than throughput alone. Human feedback is a central part of the design. Because inference quality judgments may be delayed and sparse, deferred ratings are used as high-value supervision to calibrate the quality bandit and learn preference-aligned trade-offs across quality, speed, and efficiency. Between feedback events, runtime telemetry supports continual adaptation as workers join, leave, or degrade, while reward normalization and decay mechanisms help maintain stability under non-stationarity. Overall, Agentic Orchestrator provides an extensible, feedback-aware foundation for planning and executing scientific workflows on heterogeneous AI infrastructure.
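
The routing idea can be illustrated with a deliberately simplified sketch. It drops the task and fleet context entirely and uses plain Beta-posterior Thompson sampling with one arm per worker, a normalized reward that blends quality, latency, and energy, and a decay step that pulls old evidence back toward the prior. The class `ThompsonRouter`, the function `blended_reward`, the weights, the caps, and the worker names are all assumptions for illustration, not the project's design.

```python
import random

def blended_reward(quality, latency_s, energy_j,
                   weights=(0.6, 0.3, 0.1), latency_cap=10.0, energy_cap=100.0):
    """Fold quality (in [0, 1]), latency, and energy into one reward in [0, 1]."""
    speed = 1.0 - min(latency_s / latency_cap, 1.0)
    efficiency = 1.0 - min(energy_j / energy_cap, 1.0)
    wq, wl, we = weights
    return wq * quality + wl * speed + we * efficiency

class ThompsonRouter:
    def __init__(self, workers, decay=0.99, seed=0):
        self.rng = random.Random(seed)
        self.decay = decay
        # Beta(1, 1) prior per worker: [alpha, beta].
        self.params = {w: [1.0, 1.0] for w in workers}

    def choose(self):
        """Sample a plausible reward for each worker and route to the best sample."""
        samples = {w: self.rng.betavariate(a, b) for w, (a, b) in self.params.items()}
        return max(samples, key=samples.get)

    def update(self, worker, reward):
        """Decay all evidence toward the prior, then credit the chosen worker."""
        for p in self.params.values():
            p[0] = 1.0 + self.decay * (p[0] - 1.0)
            p[1] = 1.0 + self.decay * (p[1] - 1.0)
        self.params[worker][0] += reward
        self.params[worker][1] += 1.0 - reward

def simulate(rounds=500):
    """Toy fleet with fixed (assumed) rewards; returns how often each worker was chosen."""
    true_reward = {"edge": 0.2, "hpc": 0.9}
    router = ThompsonRouter(true_reward)
    counts = {w: 0 for w in true_reward}
    for _ in range(rounds):
        w = router.choose()
        counts[w] += 1
        router.update(w, true_reward[w])
    return counts
```

Because of the decay, an unused worker drifts back toward an uninformed prior and is occasionally retried, which is one simple way to keep adapting when workers join, leave, or degrade.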

Author Bios:
Aldo Malaquias Cabrera is a computer science PhD student at UIC exploring agentic systems for at-scale language model inference, with prior work in environmental monitoring, wireless sensing, and edge devices.

Michael Papka is the Warren S. McCulloch Professor at UIC, director of EVL, co-director of CS+Design, Argonne Senior Scientist and Distinguished Fellow, and ALCF director.

Zhiling Lan is a Professor of Computer Science at the University of Illinois Chicago, holds a joint Argonne appointment, and co-leads the SPEAR Team at UIC EVL with Professor Mike Papka.

Icicle: Scalable Metadata Indexing and Real-Time Monitoring for HPC File Systems

Authors: Haochen Pan; Ryan Chard; Song Young Oh; Maxime Gonthier; Valérie Hayot-Sasson; Geoffrey Lentner; Joe Bottigliero; Rachana Ananthakrishnan; Kyle Chard; Ian Foster

Institution(s): University of Chicago; Argonne National Laboratory; École de technologie supérieure; Purdue University

Room: Northwestern

Board #: N18

Session #: 1

Abstract: Modern HPC file systems can contain billions of files and hundreds of petabytes of data, making even simple questions increasingly intractable to answer. Traditional file system utilities such as find and du fail to scale to these sizes. While external indexing tools like GUFI and Brindexer improve query performance, they remain batch-oriented and unsuitable for heterogeneous, rapidly evolving environments. We present Icicle, a scalable framework for continuous file system metadata indexing and monitoring. Icicle maintains a unified, up-to-date, and queryable view of file system state while supporting both periodic snapshot-based ingestion for bulk metadata updates and event-based ingestion for real-time synchronization from production systems such as Lustre and IBM Storage Scale. Built on Apache Kafka and Apache Flink, Icicle provides high-throughput, fault-tolerant, and horizontally scalable ingestion of metadata events into two complementary search indexes, enabling both individual file discovery and aggregate summary statistics by user, group, and directory. This architecture enables efficient support for both coarse-grained administrative queries and interactive analytics over billions of objects. Our experimental evaluation on production-scale HPC datasets demonstrates order-of-magnitude throughput improvements over existing monitoring and indexing approaches, with tunable options for balancing consistency, latency, and metadata freshness.
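
To make the idea of event-driven summary statistics concrete, the following is a minimal in-memory sketch. It is not Icicle's implementation (which the abstract describes as built on Apache Kafka and Apache Flink); the event fields `op`, `path`, `owner`, and `size` and the `SummaryIndex` class are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class SummaryIndex:
    """Keeps per-user and per-directory totals current as metadata events arrive."""

    def __init__(self):
        self.files = {}  # path -> (owner, size) for every live file
        self.by_user = defaultdict(lambda: {"count": 0, "bytes": 0})
        self.by_dir = defaultdict(lambda: {"count": 0, "bytes": 0})

    def _adjust(self, path, owner, size, sign):
        self.by_user[owner]["count"] += sign
        self.by_user[owner]["bytes"] += sign * size
        # Credit every ancestor directory so du-style queries become lookups.
        for parent in PurePosixPath(path).parents:
            totals = self.by_dir[str(parent)]
            totals["count"] += sign
            totals["bytes"] += sign * size

    def apply(self, event):
        path = event["path"]
        if path in self.files:  # retract the stale record first
            owner, size = self.files.pop(path)
            self._adjust(path, owner, size, -1)
        if event["op"] in ("create", "modify"):
            self.files[path] = (event["owner"], event["size"])
            self._adjust(path, event["owner"], event["size"], +1)
```

Because each event only adjusts the affected user and the ancestors of one path, aggregate queries never need to rescan the file system.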

Author Bios:
Haochen Pan (University of Chicago; Argonne National Laboratory) — Haochen Pan is a fifth-year Ph.D. student in Computer Science at the University of Chicago and a member of Globus Labs, advised by Dr. Kyle Chard, Dr. Ian Foster, and Dr. Ryan Chard. His research lies at the intersection of distributed systems and cloud computing, focusing on coordination across HPC facilities through hierarchical event fabrics and event-driven architectures.

Ryan Chard (Argonne National Laboratory) — Ryan Chard is a consultant computer scientist with the Data Science and Learning Division at Argonne National Laboratory and Globus. He received his Ph.D. from Victoria University of Wellington, New Zealand, and was previously a Maria Goeppert Mayer Fellow at Argonne. He develops cyberinfrastructure to enable scientific research.

Song Young Oh (University of Chicago) — Song Young Oh is a second-year Ph.D. student in Computer Science at the University of Chicago, advised by Dr. Kyle Chard and Dr. Ian Foster at Globus Labs. Her research spans data management systems for scientific AI in HPC environments, with a focus on vector databases and multimodal representations.

Maxime Gonthier (University of Chicago; Argonne National Laboratory) — Maxime Gonthier is a postdoctoral researcher at INRIA Bordeaux, having previously held a joint appointment at the University of Chicago and Argonne National Laboratory. He received his Ph.D. from the École Normale Supérieure de Lyon. His research focuses on data locality, batch scheduling, energy in HPC, and scheduling for runtime systems.

Valérie Hayot-Sasson (École de technologie supérieure) — Valérie Hayot-Sasson is an Assistant Professor in the Department of Software Engineering and Information Technology at École de technologie supérieure in Montréal. She completed her Ph.D. at Concordia University in 2022 and was a postdoctoral scholar at the University of Chicago. Her research interests include open science, reproducibility, data management, and scientific workflows.

Geoffrey Lentner (Purdue University) — Geoffrey Lentner is a lead research data scientist at the Rosen Center for Advanced Computing at Purdue University, where he serves as an HPC facilitator, educator, consultant, and research software engineer. With a domain background in astrophysics, he leads efforts in data science for HPC and supports faculty and industry partners on Purdue's cyberinfrastructure.

Joe Bottigliero (University of Chicago) — Joe Bottigliero is a Senior Software Engineer and technology leader at the University of Chicago, working with the Globus team to develop research cyberinfrastructure.

Rachana Ananthakrishnan (University of Chicago; Argonne National Laboratory) — Rachana Ananthakrishnan is Executive Director and Head of Products at the University of Chicago, where she leads Globus product strategy and development. She also holds a joint staff appointment at Argonne National Laboratory.

Kyle Chard (University of Chicago; Argonne National Laboratory) — Kyle Chard is a Research Associate Professor of Computer Science at the University of Chicago with a joint appointment at Argonne National Laboratory. He co-leads the Globus Labs research group on data-intensive computing and research data management, and received the IEEE TCHPC Award for Excellence for Early Career Researchers in HPC.

Ian Foster (Argonne National Laboratory; University of Chicago) — Ian Foster is Senior Scientist and Distinguished Fellow at Argonne National Laboratory, where he directs the Data Science and Learning Division, and the Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago. His research spans distributed, parallel, and data-intensive computing. He is a Fellow of the AAAS, ACM, BCS, and IEEE, and a recipient of the ACM/IEEE CS Ken Kennedy Award.

MEMO: High-Throughput Blockchain using Memoization

Authors: Venkata Harsha Pedada, Ioan Raicu

Institution(s): Illinois Institute of Technology

Room: Northwestern

Board #: N6

Session #: 1

Abstract: Blockchain technology enables decentralized, tamper-proof transaction processing but faces challenges in terms of energy efficiency and throughput. Bitcoin's Proof-of-Work (PoW) power consumption, coupled with its low throughput of 7 transactions per second (TPS), yields an effective per-transaction cost of over $100, limiting Bitcoin's adoption for many small everyday transactions. We present MEMO, a Proof-of-Space (PoSpace) blockchain system written in C that explores high-throughput block creation through a multi-process architecture with dedicated components for transaction pooling, proof-of-space consensus, block validation, and timing coordination. Power efficiency is achieved through the memoization of cryptographic proofs, which are stored in and retrieved from persistent storage. MEMO introduces deadline-aware batch processing, where validators verify EdDSA signatures in parallel using OpenMP and construct blocks within strict time budgets. Validators send partially filled blocks when deadlines approach rather than missing them, ensuring consistent block intervals across all configurations. The architecture separates concerns across independent processes connected via ZeroMQ messaging and Google Protocol Buffer serialization, a design that mirrors real-world blockchain deployments where transaction submitters, memory pools, validators, and chain nodes operate as distinct entities. This separation enables each component to be independently optimized, tested, and eventually distributed across multiple machines without altering the core consensus logic. We evaluate MEMO across 42 parameter configurations spanning block intervals of 1–32 seconds and block sizes up to 64K transactions per block, demonstrating over 8K transactions/sec of end-to-end confirmed throughput on a single node. Future work targets distributed deployment across 10 to 1K nodes using gossip-based propagation, persistent storage backends, and post-quantum signature support through a pluggable cryptographic interface.
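
The deadline-aware batching idea can be sketched in a few lines. This is a sequential Python illustration only: MEMO itself is written in C and verifies signatures in parallel with OpenMP, and the function name `build_block`, the `verify` callback, and the injectable `clock` are assumptions made for the example.

```python
import time
from collections import deque


def build_block(mempool, verify, max_txs, deadline_s, clock=time.monotonic):
    """Fill a block until it is full, the pool is empty, or the deadline arrives."""
    start = clock()
    block = []
    while mempool and len(block) < max_txs:
        if clock() - start >= deadline_s:
            break  # ship a partially filled block rather than miss the interval
        tx = mempool.popleft()
        if verify(tx):  # invalid transactions are dropped, not retried
            block.append(tx)
    return block
```

Passing the clock as a parameter keeps the time budget testable; unprocessed transactions stay in the pool for the next block.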

Author Bios:
Venkata Harsha Pedada:
Harsha is a first-year PhD student in Computer Science at Illinois Institute of Technology, advised by Prof. Ioan Raicu at the DataSys Lab. He holds an M.S. in Computer Science (2024) and has industry experience as a Data Engineer. His research interests include distributed systems, blockchain consensus mechanisms, and proof-of-space protocols, with a focus on designing high-throughput decentralized systems.

Ioan Raicu:
Dr. Ioan Raicu is an Associate Professor in the Department of Computer Science at Illinois Institute of Technology and guest research faculty at Argonne National Laboratory. He is the founder and director of the Data-Intensive Distributed Systems Laboratory (DataSys Lab). His research focuses on distributed systems, many-task computing, and data-intensive computing at extreme scales, with recognition including the IEEE TCSC Young Achievers in Scalable Computing award and an NSF CAREER award.

Conventional Attacks on Scientific Applications

Authors: Michael Polinski, Peter Dinda

Institution(s): Northwestern University

Room: Northwestern

Board #: N5

Session #: 1

Abstract: How vulnerable are scientific applications to conventional attacks mounted through their inputs? Prior work largely leaves this question unanswered. We begin to answer it through a challenging adaptation of security-focused fuzzing and bug triage to identify and characterize vulnerabilities in a set of scientific codebases. Our process discovers almost 1000 unique crash clusters (distinct bugs) across Enzo, GROMACS, Laghos, and LAMMPS (about 3 million lines of code). Per target, an average of 7.3% of crash clusters represent easily exploitable vulnerabilities according to heuristic security analysis. Additionally, 25% of all clusters (52% for GROMACS) are exploitable heap memory safety issues. Our overall analysis points to over 110 distinct vulnerabilities that may permit arbitrary code execution. Manual inspection confirms the existence of exploitable bugs. Our results suggest untrusted data could have catastrophic effects on artifact integrity, creating a real risk for institutions and society as computational scientific results drive increasingly consequential decisions.

Author Bios:
Michael is a PhD candidate at Northwestern University advised by Peter Dinda. Broadly, his research interests lie in security and privacy in computer systems and networks with a focus on applying innovations in software and hardware to address recurring and pervasive security issues. His current work examines how both well-known and novel techniques can be used to exploit scientific applications to manipulate results and compromise infrastructure.

STARD-Net: SpatioTemporal Attention for Robust Detection of Tiny Airborne Objects from Moving Drones

Authors: Md. Hasibur Rahman and Sanjay Madria

Institution(s): Missouri University of Science and Technology

Room: Northwestern

Board #: N33

Session #: 1

Abstract: The rapid adoption of drones across various domains, alongside advancements in computer vision, has driven growing interest in vision-based airborne object detection from moving aerial platforms. However, this task remains challenging due to the small scale of objects, camouflage within cluttered backgrounds, and occlusions. To address these challenges, we introduce an end-to-end detection framework that integrates a Drone Receptive Field Block (DRFB) to extract multiscale and geometrically diverse features, specifically designed to enhance the detection of small and camouflaged airborne objects. To model motion patterns over time while preserving spatial structure, particularly for detecting camouflaged, cluttered, and occluded objects with limited appearance cues, we incorporate a Convolutional Long Short-Term Memory (ConvLSTM) module, which effectively captures temporal dependencies across consecutive frames. Additionally, we introduce a SpatioTemporal Attention Block (STAB), inspired by Multi-Head Attention, to aggregate spatial and temporal context for improved semantic understanding. The detection head combines a Swin Transformer with a Cross Stage Partial (CSP) Bottleneck, offering lightweight yet powerful global context modeling for robust detection in complex aerial scenes. We evaluate our model on four publicly available airborne object detection datasets from moving drones, achieving significant improvements in accuracy while maintaining real-time inference speed. Moreover, when integrated into various You Only Look Once (YOLO) architectures, our spatial feature extraction module (DRFB) consistently boosts performance, demonstrating its broad applicability and effectiveness.

Author Bios:
Md Hasibur Rahman is a Ph.D. candidate in Computer Science at Missouri University of Science and Technology. His research focuses on applied artificial intelligence, computer vision, and multimodal learning, with applications in UAV-based perception, small object detection and tracking, GeoAI, remote sensing, and biomedical AI.

Sanjay K Madria is a Curators’ Distinguished Professor in the Department of Computer Science at the Missouri University of Science and Technology, Rolla, MO.

Private Inference for Decentralized Multimodal Data Silos

Authors: Vivek Rai, Nathaniel Hudson

Institution(s): Illinois Institute of Technology

Room: Northwestern

Board #: N2

Session #: 1

Abstract: Scientific discovery increasingly relies on the collaboration between diverse institutions, such as telescope observatories, satellites, and laboratories, which each observe only a fragment of the underlying phenomenon. We study private, distributed inference in this setting, where such parties hold partial, multi-modal observations over local timestamps and cannot directly share raw data. We model the system as a multilayer tree comprising participant nodes, intermediate forwarders, and a trusted root orchestrator that coordinates request fulfilment and computation. Each node submits requests when the required modalities for a fixed calculation are missing locally. We evaluate our privacy-aware network routing protocol under varying environmental conditions: data availability, network topology, and request constraints such as accuracy and speed. To validate this protocol, we compare it against an integer linear program baseline for the model system with the objective set to minimise communication overhead across the network. We also take into account the variable sizes of these modalities (e.g. numerical data, images, videos). In preliminary per-timestamp settings, our protocol matched the ILP baseline on feasibility and communication cost across 25,536 request-level evaluations while reducing mean runtime from 8.15s to 0.08s, a roughly 99x improvement. It maintained 100% successful calculation completion with zero failures, while a random baseline incurred 14.8% higher communication cost. These results suggest that structured routing can preserve optimization-quality performance while making distributed scientific inference far more practical.

Author Bios:
Nathaniel Hudson:
Nathaniel Hudson is an Assistant Professor in the Department of Computer Science at the Illinois Institute of Technology. His research studies the design of systems for serving AI on edge computing infrastructure, i.e., Edge Intelligence (EI), for smart city applications.

Vivek Rai:
Vivek Rai is a Master’s student in Computer Science at the Illinois Institute of Technology, focused on accelerating scientific discovery through efficient AI systems. His work spans distributed learning, autonomous systems, and multimodal models, with experience building scalable ML, cloud NLP pipelines, and real-world decision-making systems.

An Investigation of LLM Use in Scientific Papers

Authors: Meredith Sauer, Satyaki Sikdar

Institution(s): Loyola University Chicago

Room: Northwestern

Board #: N13

Session #: 1

Abstract: Since the advent of ChatGPT in late 2022, LLMs have become ubiquitous in academic settings. Several recent papers have investigated the increase in LLM use in scientific papers since 2022, finding a broad and increasing presence of LLM-modified text in papers across disciplines. Previous research has attempted to measure LLM use in shorter texts: paper abstracts and conference peer reviews. In this project, we build on this work by developing a model to quantify the use of LLMs in the full text of scientific papers that have been posted on arXiv, an open access archive for pre-print articles primarily in Physics, Mathematics, and Computer Science. Our method uses individual word frequencies in known human-written and AI-generated articles to train a model to provide a population-level estimate of the proportion of text produced by an LLM in a mixed set of human-written and AI-generated papers. We then use this model to estimate the percentage of text that has been substantially modified by LLMs in a corpus of papers posted on arXiv between 2023 and 2024. This lets us track and quantify the rise of LLM use in the full text of papers across several disciplines spanning different research areas.
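
One common way to turn word frequencies into a population-level estimate is a two-component mixture fitted by maximum likelihood. The sketch below assumes that formulation; the abstract does not state the exact model, so the mixture form, the function names, and the add-one smoothing are assumptions for illustration.

```python
import math
from collections import Counter


def word_dist(tokens, vocab, smoothing=1.0):
    """Smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}


def estimate_llm_fraction(corpus_tokens, p_human, p_llm, iters=60):
    """Maximum-likelihood alpha for the mixture (1 - alpha) * human + alpha * llm."""
    counts = Counter(t for t in corpus_tokens if t in p_human)

    def loglik(alpha):
        return sum(c * math.log((1 - alpha) * p_human[w] + alpha * p_llm[w])
                   for w, c in counts.items())

    lo, hi = 0.0, 1.0
    for _ in range(iters):  # ternary search: the log-likelihood is concave in alpha
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loglik(m1) < loglik(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2
```

The estimate describes the corpus as a whole; it does not classify any individual paper.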

Author Bios:
Dr. Satyaki Sikdar is an Assistant Professor of Computer Science at Loyola University Chicago. He earned his Ph.D. from the University of Notre Dame and completed postdoctoral work at Indiana University. His recent projects use generative models and embeddings to mine large‑scale complex networks, with results appearing in venues such as Nature Human Behaviour, IEEE TKDE, and Scientific Reports.

Meredith Sauer received her MA in Digital Humanities from Loyola University Chicago. During her time at Loyola, she served as a graduate assistant for the Department of Computer Science, assisting with research in natural language processing. She holds a BA in English Literature from Kenyon College.

Building Reliable AI Through Explainable AI-Enhanced Training Pipelines

Authors: Shilpika, Carlo Graziani, Bethany Lusch, Michael E. Papka

Institution(s): Argonne National Laboratory, University of Illinois Chicago

Room: Northwestern

Board #: N30

Session #: 1

Abstract: Improving the reliability of artificial intelligence systems is essential for their adoption in scientific discovery and large-scale domain workflows. In collaboration with domain scientists working on AI and machine learning pipelines at scale, our work investigates how Scalable Explainable AI (XAI) techniques can be embedded directly into the model training process to enhance reliability. Rather than treating explainability as a post hoc step applied only after prediction, we explore the use of explanations during training to guide models toward learning patterns that are more transparent, robust, and scientifically meaningful. This training-integrated approach has the potential to improve model trustworthiness, facilitate validation by domain experts, and better align learned representations with real-world scientific expectations. By incorporating explainability into the learning process itself, the proposed framework aims to bridge the gap between high-performing AI models and the interpretability requirements of scientific practice, supporting the development of AI systems that are both effective and dependable in real-world applications.

Author Bios:
Dr. Shilpika is a Postdoctoral Appointee in the Leadership Computing Facility (LCF) organization. Her research focuses on data visualization and analysis of high-performance computing systems, including visualization and interpretation of AI for science to enable informed decision-making in AI workflows.

Carlo Graziani is a computational scientist in the Mathematics and Computer Science Division of Argonne National Laboratory. He received his PhD in Physics at the University of Chicago in 1993. He has worked on problems in theoretical astrophysics, computational fluid dynamics, plasma physics, mathematical statistics, uncertainty quantification, and machine learning. He joined Argonne in 2017.

Dr. Bethany Lusch is a Computer Scientist in the data science group at the Argonne Leadership Computing Facility at Argonne National Lab. Her research expertise includes developing methods and tools to integrate AI with science, especially for dynamical systems and PDE-based simulations.

Dr. Michael E. Papka is an Argonne Senior Scientist and Distinguished Fellow. He serves as the deputy associate laboratory director for Computing, Environment, and Life Sciences (CELS) and the division director of the Argonne Leadership Computing Facility (ALCF). His leadership focuses on leveraging high-performance computing to advance scientific discovery and innovation.

SMART: A Surrogate Model for Predicting Application Runtime in Dragonfly Systems

Authors: Xin Wang, Pietro Lodi Rizzini, Sourav Medya, Zhiling Lan

Institution(s): University of Illinois Chicago

Room: Northwestern

Board #: N21

Session #: 1

Abstract: The Dragonfly network, with its high-radix and low-diameter structure, is a leading interconnect in high-performance computing. A major challenge is workload interference on shared network links. Parallel discrete event simulation (PDES) is commonly used to analyze workload interference. However, high-fidelity PDES is computationally expensive, making it impractical for large-scale or real-time scenarios. Hybrid simulation that incorporates data-driven surrogate models offers a promising alternative, especially for forecasting application runtime, a task complicated by the dynamic behavior of network traffic. We present SMART, a surrogate model that combines graph neural networks (GNNs) and large language models (LLMs) to capture both spatial and temporal patterns from port-level router data. SMART outperforms existing statistical and machine learning baselines, enabling accurate runtime prediction and supporting efficient hybrid simulation of Dragonfly networks.

Author Bios:
Xin Wang is a Postdoctoral Research Associate in the Department of Computer Science at the University of Illinois Chicago. She received her Ph.D. in Computer Science from the Illinois Institute of Technology. Her research focuses on high performance computing, including resource management, job scheduling, and scalable modeling and simulation for large scale systems.

Pietro Lodi Rizzini received his Master’s degree in Computer Science and conducted research under the supervision of Sourav Medya at the University of Illinois Chicago. His work focused on data-driven methods for large-scale systems.

Sourav Medya is an Assistant Professor in the Department of Computer Science, University of Illinois Chicago (UIC). Before joining UIC, he was a research assistant professor in the Kellogg School of Management at Northwestern University and the Northwestern Institute on Complex Systems (NICO). He received his Ph.D. in Computer Science from the University of California, Santa Barbara.

Zhiling Lan is a Professor of Computer Science at the University of Illinois Chicago, with a joint appointment at Argonne National Laboratory. Her research focuses on high performance computing, including system reliability, performance modeling, and workload management for large scale systems.

Characterizing Spectrum Usage from Long-Term Passive Measurements

Authors: Naicheng Wei, Cynthia Hood

Institution(s): Illinois Institute of Technology

Room: Northwestern

Board #: N23

Session #: 1

Abstract: Understanding how spectrum is used in real-world environments is central to dynamic spectrum access and spectrum policy. Most existing approaches rely on threshold-based occupancy, which indicates activity but does not distinguish different usage patterns. We present a proof-of-concept pipeline for characterizing spectrum use from long-term passive measurements. Using year-long data collected at a fixed monitoring site in Chicago, we extract interpretable features from sliding windows of channel-level power and apply density-based clustering to identify recurring operational regimes. These regimes capture distinct patterns such as persistent transmission, burst-dominated activity, and extended silence. Representing each day as a mixture of regimes provides a compact summary of channel behavior and enables tracking of long-term stability and structural change. The resulting representation can support labeling of large-scale measurement data and can be combined with contextual information for improved interpretation. The approach is measurement-driven and designed to generalize across bands and deployments.

Author Bios:
Naicheng Wei is a PhD student in Computer Science at Illinois Institute of Technology. Her research focuses on spectrum measurement and data-driven analysis of wireless systems, with an emphasis on real-world measurement data.

Cynthia Hood is an Associate Professor of Computer Science and Engineering at Illinois Tech. Her research involves using AI to automate the analysis and modeling of wireless spectrum utilization using long-term, wideband measurements collected by the IIT Spectrum Observatory. Dr. Hood is the recipient of an NSF CAREER Award and was a Fulbright Scholar in Poland in 2023.

Characterizing IR Generated by Rustc and Clang

Authors: Benjamin Ye

Institution(s): Northwestern University

Room: Northwestern

Board #: N8

Session #: 1

Abstract: The Rust programming language introduces many zero-cost abstractions aimed at improving memory safety and expressiveness. This work investigates whether these features lead to identifiable differences in IR compared to that generated from more minimal languages such as C, and considers their potential implications. We analyze two C benchmark suites alongside their Rust reimplementations. Our evaluation focuses on several metrics, including opcode frequency, type usage, and def-use graphs. We find consistent and measurable differences in the generated LLVM IR, with rustc generating IR with substantially more types and def-use graphs 10x wider. These structural differences unlock potential for parameterized algorithms and suggest new opportunities for optimizations.

Author Bios:
Benjamin Ye is an M.S. student in Computer Science at Northwestern University studying compilers, with an emphasis on IR design and optimization. His research explores how language-level design decisions influence code generation and performance.

SchedTwin: Simulation-Assisted Online Learning for Adaptive HPC Scheduling

Authors: Jordan Zhang; Kanglin Xu; Yash Kurkure; Michael Papka; Zhiling Lan

Institution(s): University of Illinois Chicago; Argonne National Laboratory

Room: Northwestern

Board #: N19

Session #: 1

Abstract: Modern High-Performance Computing (HPC) systems are increasingly bottlenecked by static, heuristic-based schedulers that cannot adapt to heterogeneous and dynamic workloads. While Deep Reinforcement Learning (DRL) offers adaptability in principle, its high training cost and retraining overhead limit practical deployment. We present SchedTwin, a framework that formulates adaptive scheduling as a contextual multi-armed bandit (CMAB) problem, enabling continuous online learning of policy selection strategies without retraining. SchedTwin’s key innovation is a dual-context representation of real-time system state and simulator-projected policy outcomes, overcoming the delayed feedback inherent to job scheduling and enabling the CMAB agent to select policies based on both current conditions and anticipated future impact. Evaluation on production traces from ALCF systems ranging from 560 to 10,624 nodes shows that SchedTwin consistently outperforms traditional heuristics and DRL-based baselines, reducing job wait time by up to 86.6% and job slowdown by up to 92.6%. Live deployment on a PBS-managed cluster confirms that SchedTwin incurs only seconds of overhead per scheduling decision, demonstrating its practicality for production HPC environments.

Author Bios:
Jordan Zhang is a second-year CS PhD student at UIC, advised by Prof. Zhiling Lan.

Kanglin Xu is a first-year CS PhD student at UIC, advised by Prof. Zhiling Lan.

Yash Kurkure is a second-year CS PhD student at UIC, advised by Prof. Michael Papka.

Michael Papka is a CS Professor at UIC.

Zhiling Lan is a CS Professor at UIC.

TRUSTCHECKPOINTS: Time Defeats Malware for Unconditional Software Root of Trust

Authors: Friedrich Doku, Peter Dinda

Institution(s): Northwestern University

Room: Northwestern

Board #: N3

Session #: 1

Abstract: Modern IoT and embedded platforms must start execution from a known trusted state to thwart persistent malware and protect critical infrastructure. Current approaches to establish a root of trust depend on manufacturer-provisioned secret keys and specialized secure hardware. If secrets are leaked or a hardware vulnerability is discovered, the security of the entire device is permanently subverted. Furthermore, these methods drive up costs, may involve third parties, and rely on assumptions about an attacker’s computational power that may not hold over the device’s lifespan. This paper presents TRUSTCHECKPOINTS, the first implementation of an unconditional software root of trust on commodity hardware. We define malware operationally as any deviation from the device’s expected memory contents, and detect it by physical footprint. Our novel algorithm, MULTIPASS, uses k-independent randomized polynomial evaluation (via Horner’s rule) to force persistent malware into slower off-chip storage, causing detectable timing delays. An external verifier, physically connected to the device by the owner, measures these delays to detect even single-instruction payloads, enabling integrity checking without cryptographic keys or specialized trusted hardware. Our ARM Cortex-A53 prototype validates 192 KB of SRAM in ∼10 s using 500 passes. Two modes, SRAM-bootstrap and full-memory scan, balance speed and coverage on unmodified hardware. Our evaluation introduces new assumptions that bound unconditional software root-of-trust establishment.
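
The core arithmetic, evaluating a polynomial whose coefficients are the memory words at random points with Horner's rule, can be sketched as follows. This is not the MULTIPASS algorithm: it does not model k-independence, timing measurement, or the verifier protocol, and the modulus, seed handling, and function names are assumptions chosen for the example.

```python
import random

PRIME = (1 << 61) - 1  # a Mersenne prime used as the field modulus (assumed)


def horner_checksum(memory_words, x, p=PRIME):
    """Evaluate sum(word_i * x**(n-1-i)) mod p with one multiply-add per word."""
    acc = 0
    for word in memory_words:
        acc = (acc * x + word) % p
    return acc


def multi_pass(memory_words, passes, seed):
    """Run several passes, each at an independently drawn nonzero point."""
    rng = random.Random(seed)
    points = [rng.randrange(1, PRIME) for _ in range(passes)]
    return [horner_checksum(memory_words, x) for x in points]
```

Each pass must read every word in order, so altering even one word changes the result; in the timing-based setting described above, the verifier additionally checks how long the passes take.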

Author Bios:
Friedrich is a second-year PhD candidate and systems security researcher working on operating systems, computer architecture, and the foundations of trusted computing. His recent work explores keyless, unconditional roots of trust for embedded and IoT devices, and capability-based hardware mechanisms for safely giving applications direct access to devices. He is especially interested in how minimal trusted components and physical assumptions can replace today's complex attestation stacks. More at https://fdoku.me.

Peter Dinda is a professor in the Department of Computer Science at Northwestern University, and also holds an appointment in the Department of Electrical and Computer Engineering. He works in experimental computer systems. You can find out more about him at pdinda.org.

Deferred Execution for High-Performance Code Generation on Processing-in-Memory Systems

Authors: David Krasowska, Andrew Crotty, Peter Dinda

Institution(s): Northwestern University

Room: Northwestern

Board #: N4

Session #: 1

Abstract: Processing-in-Memory (PIM) reduces the data-to-compute critical path, but its low-level interfaces make PIM difficult to program. To bridge this gap, we have designed and implemented PolymerPIM for the UPMEM PIM system. PolymerPIM enables the programmer to write composable vector operations with broadcast semantics in either C++ or Julia. At runtime, these operations queue tasks that represent deferred execution. The executor fuses and compiles tasks upon first use. By applying both vertical and horizontal fusion, PolymerPIM automatically generates highly optimized kernels that minimize data movement and maximize the amount of computation per byte transferred. PolymerPIM improves ease of use without sacrificing speed, matching the performance of hand-tuned code.
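
Deferred execution with vertical fusion can be illustrated with a small sketch. It is written in Python for brevity (PolymerPIM targets C++ and Julia), does not generate or compile kernels, and does not show horizontal fusion; the `Lazy` class and its methods are hypothetical names, not PolymerPIM's API.

```python
class Lazy:
    """A vector expression whose elementwise operations are queued, not run."""

    def __init__(self, sources, fn):
        self.sources = sources  # input lists, all of the same length
        self.fn = fn            # fused per-element function over the sources

    @staticmethod
    def of(data):
        return Lazy([list(data)], lambda x: x)

    def _binary(self, other, op):
        if not isinstance(other, Lazy):  # broadcast a scalar operand
            f = self.fn
            return Lazy(self.sources, lambda *xs: op(f(*xs), other))
        n, f, g = len(self.sources), self.fn, other.fn
        return Lazy(self.sources + other.sources,
                    lambda *xs: op(f(*xs[:n]), g(*xs[n:])))

    def __add__(self, other):
        return self._binary(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._binary(other, lambda a, b: a * b)

    def run(self):
        # One pass over the inputs: no intermediate vectors are materialized.
        return [self.fn(*xs) for xs in zip(*self.sources)]
```

For example, `((a * 2 + b) * a).run()` performs a single loop over the inputs, whereas eager evaluation would allocate a temporary vector for each of the three operations.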

Author Bios:
Author 1: David Krasowska is a DOE Computational Science Graduate Fellow and Computer Science Ph.D. candidate at Northwestern University. His research focuses on enabling domain scientists to efficiently run their workloads on distributed, heterogeneous computing systems and emerging AI accelerators using novel runtimes. More: krasow.dev.

Author 2: Andrew Crotty is an Assistant Professor of Computer Science at Northwestern University. His research focuses on building systems for data management, data science, and machine learning.

Author 3: Peter Dinda is a professor in the Department of Computer Science at Northwestern University, and also holds an appointment in the Department of Electrical and Computer Engineering. He works in experimental computer systems. You can find out more about him at pdinda.org.

When Does the Bias Pay Off? An Empirical Study of Reference Counting in Free-Threaded CPython

Authors: Farhan Saif

Institution(s): University of Illinois at Chicago

Room: Northwestern

Board #: N9

Session #: 1

Abstract: CPython 3.13's experimental free-threaded mode removes the Global Interpreter Lock (GIL) to enable multi-threaded parallelism. Its centerpiece is biased reference counting, which splits each object's refcount into a thread-local field — updated without atomics by the owning thread — and a shared field requiring atomics from other threads. The design bets that most objects are accessed primarily by their creating thread. No systematic study has characterized when this bet pays off. We present an empirical evaluation across the object-sharing spectrum using three CPython builds from the same commit: a GIL baseline, a free-threaded release build, and a free-threaded instrumented build that decomposes every `Py_INCREF` and `Py_DECREF` into its local or shared component. We measure throughput, per-object memory overhead, deallocation latency, and cache behavior across six object types (dict, list, tuple, slots, int, str). Seven microbenchmarks span the sharing spectrum — thread-local allocation, producer-consumer handoff, shared-read, shared-write, mixed-locality, object transfer, thread churn — complemented by five application workloads including a thread-pooled web server, parallel parser, and multi-stage pipeline. Thread counts span 1 to 32, and we parameterize the local-to-shared ratio to find the crossover where the bias's overhead exceeds its benefit. Using hardware counters (`perf stat`, `perf c2c`), we characterize last-level cache misses and false sharing on refcount fields, and measure stop-the-world pause durations and per-object merge costs during thread termination — a previously unmeasured cost of the biased scheme. We provide actionable guidance on which concurrency patterns preserve biased refcounting's advantages and which inadvertently defeat them.
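
The split between an owner-only counter and a shared counter can be modeled in plain Python. This is a toy under stated assumptions: the class and field names are illustrative rather than CPython's object layout, a lock stands in for atomic instructions, and the merge and deallocation logic discussed in the abstract is left out.

```python
import threading


class BiasedRefCount:
    """Toy model: the owning thread updates a plain counter, others a locked one."""

    def __init__(self):
        self.owner = threading.get_ident()
        self.local = 1                  # owner-only field, no synchronization
        self.shared = 0                 # field used by every other thread
        self._lock = threading.Lock()   # stands in for atomic instructions

    def incref(self):
        if threading.get_ident() == self.owner:
            self.local += 1             # fast path
        else:
            with self._lock:            # slow path
                self.shared += 1

    def decref(self):
        if threading.get_ident() == self.owner:
            self.local -= 1
        else:
            with self._lock:
                self.shared -= 1

    def total(self):
        with self._lock:
            return self.local + self.shared


obj = BiasedRefCount()
obj.incref()
obj.incref()


def worker():
    for _ in range(3):
        obj.incref()


t = threading.Thread(target=worker)
t.start()
t.join()
```

After this run the owner's updates land in `local` and the worker's in `shared`, which is the separation the instrumented build in the study is described as measuring.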

Author Bios:
Farhan Saif is an MS student at UIC working in the lab of Professor Luis Pina.

Nemo: Efficient Caching with Index Pushdown for Disaggregated OLTP Databases

Authors: Wenjie Hu

Institution(s): University of Wisconsin-Madison

Room: Northwestern

Board #: N15

Session #: 1

Abstract: Storage-disaggregated databases decouple compute from storage, improving elasticity but suffering from network bottlenecks. Most cloud databases primarily use page-based B+trees and buffer pools. Under disaggregation, each index traversal incurs serial network round-trips. Page-granularity transfers and caching also amplify network traffic and lower cache efficiency. Pushing down the full index to storage and caching only hot records at compute is appealing as a solution, but an efficient design must (1) provide a holistic record cache that accelerates both point and range queries, (2) mitigate the negative search penalty of confirming non-existence via remote probes, and (3) enforce storage-agnostic phantom protection despite the index being encapsulated by the storage service. We introduce Nemo, a holistic index-and-buffer design optimized for storage-disaggregated OLTP with index pushdown. Nemo employs a tree-structured record buffer that unifies caching and buffering and accelerates both point and range queries. It further reduces negative search penalty by caching negative search results. Nemo provides storage-agnostic phantom protection that is robust against large key domains, varying data skew, and partition schemes. Experiments show that Nemo outperforms page-based B+Tree and the state-of-the-art 2-Tree by up to 11.42× and 2.97× on positive point queries, and achieves up to 4.76× speedup on negative searches over a hash-indexed record cache. For phantom protection, Nemo outperforms PCL by up to 1.55× even under a dense key domain. Moreover, Nemo maintains stable performance while PCL degrades sharply as the key domain grows large and sparse.

Author Bios:
Wenjie Hu is a fourth-year Ph.D. student at the University of Wisconsin-Madison, advised by Prof. Xiangyao Yu. Her research focuses on bridging the compute-storage gap in disaggregated OLTP databases through efficient data access and coordination mechanisms. She is also interested in agentic data systems.

Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems

Authors: Kirill Nagaitsev, Luka Grbcic, Samuel Williams, Costin Iancu

Institution(s): Northwestern University, Lawrence Berkeley National Laboratory

Room: Northwestern

Board #: N17

Session #: 1

Abstract: Maximizing performance on available GPU hardware is an ongoing challenge for modern AI inference systems. Traditional approaches include writing custom GPU kernels and using specialized model compilers to tune high-level code for specific GPU targets. Recent work shows that LLM-based multi-agent systems can effectively perform such tuning, often outperforming existing compilers and eliminating the need for manual kernel development. However, the dynamics of multi-agent systems for this task remain unexplored. In this work, we present a logical framework for comparing multi-agent PyTorch optimization systems. Our evaluation shows that exploit-heavy strategies perform best when paired with error-fixing agents, and that performance correlates with the granularity of optimization steps. The best implementation achieves an average 2.88× speedup over PyTorch Eager (1.85× over torch.compile) on an H100 GPU across diverse tasks in KernelBench, a benchmark suite covering a range of machine learning architectures in PyTorch. Code is publicly available at: https://github.com/pike-project/pike

Author Bios:
Kirill Nagaitsev: I am a computer science Ph.D. candidate and DOE CSGF fellow at Northwestern University, advised by Peter Dinda as part of the Prescience Lab. My research applies LLM-based systems to compiler and performance-engineering problems. Recently, I have focused on understanding and improving agent-based approaches to performance engineering tasks such as PyTorch inference optimization.

Luka Grbcic: I am a postdoctoral researcher at the Lawrence Berkeley National Laboratory in the Applied Math and Computational Research Division. I am a part of the Applied Computing for Scientific Discovery group, where my research is focused on developing computational tools and methods that accelerate scientific discovery.

Samuel Williams: Dr. Williams is a Senior Scientist at the forefront of high-performance computing (HPC) and machine learning (ML). His research is focused on the development of performance modeling and analytical capabilities for novel and emerging accelerated HPC and ML architectures and the co-design of algorithms and architectures that balance mathematical and ML performance with computational efficiency and distributed memory scalability.

Costin Iancu: Over the years I've been involved in multiple disciplines. I tend to favor simple and practical designs. I am still performing research in the areas of programming models and code optimization for large scale parallel systems. The emphasis here is in composing calls to parallel libraries either at the runtime or at language level.

A Modular Architecture for Detecting Data Drift in Research Systems

Authors: Kenny Lyons

Institution(s): Loyola University Chicago

Room: Northwestern

Board #: N25

Session #: 1

Abstract: Modern data-driven systems operate in dynamic environments where distributional drift can silently degrade performance, reliability, and interpretability. This work presents a modular supervision architecture for real-time drift detection across heterogeneous research systems. The framework combines embedding-based similarity, lightweight statistical signals, and k-nearest neighbor classification to identify deviations from expected behavior in a transparent and domain-agnostic manner. To support non-stationary data, the system incorporates both fixed and adaptive reference baselines, enabling it to distinguish meaningful divergence from routine variation. The architecture is designed for extensibility and low overhead, making it suitable for integration into machine learning pipelines, experimental instrumentation, and broader scientific workflows. By prioritizing interpretability, configurability, and generality, this work offers a practical foundation for monitoring and maintaining reliability in modern research infrastructure.
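
One way to picture the fixed and adaptive reference baselines is the sketch below, which compares each incoming embedding with both. The class name, the similarity threshold, and the moving-average update are assumptions made for illustration; the statistical signals and k-nearest neighbor classification named in the abstract are not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


class DriftMonitor:
    """Compares each embedding with a fixed and an adaptive reference baseline."""

    def __init__(self, reference, threshold=0.9, alpha=0.1):
        self.fixed = list(reference)      # never changes after construction
        self.adaptive = list(reference)   # slowly tracks routine variation
        self.threshold = threshold        # assumed similarity cutoff
        self.alpha = alpha                # assumed smoothing factor

    def observe(self, embedding):
        fixed_sim = cosine(embedding, self.fixed)
        adaptive_sim = cosine(embedding, self.adaptive)
        drifted = adaptive_sim < self.threshold
        if not drifted:
            # Fold non-drifting samples into the adaptive baseline (moving average).
            self.adaptive = [
                (1 - self.alpha) * a + self.alpha * e
                for a, e in zip(self.adaptive, embedding)
            ]
        return fixed_sim, adaptive_sim, drifted


monitor = DriftMonitor([1.0, 0.0])
routine = monitor.observe([0.99, 0.05])   # close to the reference
shifted = monitor.observe([0.0, 1.0])     # orthogonal to the reference
```

Reporting both similarities lets a caller tell slow, routine movement (adaptive similarity stays high while fixed similarity decays) from an abrupt departure (both drop at once).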

Author Bios:
Kenny Lyons is a Master’s student in Computer Science at Loyola University Chicago focused on machine learning systems, reliability, and data drift in dynamic environments. His work explores modular supervision architectures for monitoring and stabilizing evolving data pipelines. He will be joining Lawrence Berkeley National Laboratory as a summer research fellow and is interested in pursuing PhD research at the intersection of machine learning and complex systems.

Trigger, Render, Reason: Multimodal LLM Agents for Science-Aware Scientific Simulation Monitoring

Authors: Hua Xu

Institution(s): Illinois Institute of Technology

Room: Northwestern

Board #: N26

Session #: 1

Abstract: Scientific simulations on leadership-class HPC systems routinely waste millions of core-hours producing invalid results, yet no tool monitors whether a running simulation computes valid science. We call this blind spot "dark waste": simulations that run to completion while computing the wrong answer. Large language models are increasingly used to automate scientific visualization, yet they cannot directly process the volumetric, multi-resolution data produced by scientific simulations. Current approaches either control visualization tools without accessing data content, or reduce simulation outputs to statistical summaries that discard spatial structure. We introduce a multi-resolution translation layer that converts raw simulation data in standard formats into LLM-native representations, combining hierarchical metadata extraction and derived quantities to encode spatial, temporal, and topological relationships in a form LLMs can reason about, not merely view as rendered images. Building on this capability, we present Vigil, which couples deterministic science-aware triggers with multimodal LLM confirmation through a trigger-render-reason pipeline. Vigil instruments simulations with lightweight checks on derived-quantity operators and regions of interest; when a check detects a potential anomaly, the flagged data is translated and analyzed by a multimodal model to assess scientific validity. Upon confirming invalid behavior, Vigil can terminate or restart the simulation from a checkpoint, preventing further expenditure on "dark waste". Together, the translation layer and trigger-render-reason pipeline enable a new class of data-aware visualization agents capable of anomaly localization and physics-informed visualization design, which neither tool-mediation nor metadata-only approaches can support. We evaluate Vigil across four codes spanning numerical instability, pattern dynamics, turbulence dissipation, and molecular-dynamics failures. Vigil detects every deliberately configured failure with zero false positives and supplies an actionable semantic verdict for each alert.

Author Bios:
Hua Xu is a PhD Student in the Department of Computer Science at the Illinois Institute of Technology and a member of the Gnosis Research Center. Hua Xu works under the guidance of Dr. Xian-He Sun. Before that, he worked at S&C Electric as a software engineer, specializing in .NET development and CI/CD.

Exploring the Cost–Performance Tradeoff in Federated RAG Systems

Authors: Song Young Oh, Arham Khan, Mansi Sakarvadia, Haochen Pan, Ian Foster, Kyle Chard

Institution(s): University of Chicago

Room: Northwestern

Board #: N27

Session #: 1

Abstract: Retrieval-Augmented Generation (RAG) over federated knowledge sources introduces a key tradeoff: querying more vector databases improves recall but increases communication and latency cost, while querying fewer risks missing the evidence needed to answer a question. Prior work has largely framed this as a routing problem: deciding which (and how many) sources to contact. We revisit this question and ask where the real cost–performance tradeoff arises in federated RAG systems. Through controlled experiments on QA benchmarks, we find that different routing strategies often make similar routing and retrieval decisions, and the minor variations between them have limited impact on downstream LLM answer quality. Further, we find that while retrieval performance across different RAG methods is relatively consistent, the cost of each method is much more variable. In particular, we find that ML-based routing methods incur substantially higher computational overhead than non-ML-based routers, but routing quality does not differ significantly across these settings. In light of this finding, we pose the question: how should federated RAG systems efficiently construct information-dense contexts that maximize utility for generation? We introduce a proxy metric for information density, which we use to characterize the efficiency of different routing methods. We use this metric to perform a Pareto-optimal analysis and choose routing strategies that effectively balance routing cost against downstream LLM performance. Preliminary results suggest that routers which can quickly retrieve a higher number of "reasonable" documents have greater potential to impact downstream model performance than more expensive routing methods, which may return a smaller subset of highly curated documents under the same compute budget.
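
A Pareto-optimal analysis of this kind keeps only the options that no other option beats on both axes. The sketch below shows the filtering step; the router names and the (cost, quality) numbers are invented for illustration and are not results from this work.

```python
def pareto_front(points):
    """Return the names of entries not dominated by any other entry.

    Each entry is (name, cost, quality); lower cost and higher quality are better.
    """
    front = []
    for name, cost, quality in points:
        dominated = any(
            (c <= cost and q >= quality) and (c < cost or q > quality)
            for _, c, q in points
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical routers with made-up (cost, quality) values, for illustration only.
routers = [
    ("router_a", 1.0, 0.60),
    ("router_b", 5.0, 0.62),
    ("router_c", 2.0, 0.58),
    ("router_d", 8.0, 0.70),
]
front = pareto_front(routers)
```

In this made-up data, `router_c` is dropped because `router_a` is both cheaper and better; the remaining three each represent a different point on the cost-quality tradeoff.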

Author Bios:
Song Young Oh is a second-year PhD student in computer science at the University of Chicago.

Arham Khan is a fourth-year PhD student in computer science at the University of Chicago.

Mansi Sakarvadia is a fourth-year PhD student in computer science at the University of Chicago.

Haochen Pan is a fifth-year PhD student in computer science at the University of Chicago.

Ian Foster is the Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago.

Kyle Chard is a Research Associate Professor in the Department of Computer Science at the University of Chicago.

Scalable and Interactive Visualization for Out-of-Distribution Data

Authors: Marija Stojanoska, Nandhini Gulasingam, Alexandru Orhean, Jacob Furst, Daniela Raicu

Institution(s): DePaul University

Room: Northwestern

Board #: N28

Session #: 1

Abstract: Over the past decade, driven by advances in computer technology, the rapid growth of data, and the reemergence of Artificial Intelligence, considerable effort has been dedicated to improving the efficiency and effectiveness of Data Science and Machine Learning models, both in terms of the quality of the model and the performance of the implementation. Typically, machine learning or deep learning models can be tuned and improved to obtain very high accuracy and precision over specific training and test datasets. However, when the same model is confronted with data outside of the training and test datasets, commonly called Out-of-Distribution data, the performance of the model can decrease dramatically. This can have severe repercussions in critical domains. For example, in medicine the model could misdiagnose a patient, which would lead to incorrect treatment. In home robotics, it could misidentify an object or a command, leading to unintended actions. Detecting and highlighting Out-of-Distribution data points becomes cumbersome when using popular standard tools such as Python Matplotlib and Jupyter Notebooks. Thus, this project will focus on the design and implementation of a scalable and interactive Out-of-Distribution data visualization software platform that will automate and simplify the process of analyzing and identifying Out-of-Distribution data. The proposed software platform will be used to enhance the Machine Learning models developed in previous work that focus on predicting lung nodule tumor malignancy from medical images.

Author Bios:
Marija Stojanoska - Computer Science Combined Degree Student at DePaul University
Nandhini Gulasingam - PhD Student at DePaul University and Adjunct Faculty Member
Alexandru Orhean - Assistant Professor at Jarvis School of Computing DePaul University
Jacob Furst - Professor at Jarvis School of Computing DePaul University
Daniela Raicu - Professor at Jarvis School of Computing DePaul University

Bottlenet: Estimating data movement latency within a clustered GPU topology using GNN

Authors: Chalmers Phua, Animesh Saxena

Institution(s): University of Illinois Chicago

Room: Northwestern

Board #: N32

Session #: 1

Abstract: The rapid growth of deep learning model sizes has made single-GPU training and inference increasingly infeasible. State-of-the-art language models, vision transformers, and multimodal architectures routinely exceed the memory and compute capacity of individual accelerators, driving organizations toward multi-GPU infrastructures as the primary vehicle for both training and serving. We present a GNN-based approach to estimating data movement latency within clustered GPU topologies. We evaluate three GNN architectures, MPNN (NNConv), GAT, and GCN, using scenario-aware data splitting to prevent counterfactual leakage and ensure generalization to unseen topologies. Our results demonstrate that GNNs are a viable and generalizable alternative to static cost models for data movement estimation in GPU clusters, with potential applications in real-time routing and adaptive congestion avoidance for collective communication libraries.

Author Bios:
Chalmers Phua is a PhD student at the University of Illinois Chicago, where he conducts research on dynamic workload optimization, automated program repair, and agentic AI.

Animesh Saxena is a PhD student in Computer Science at the University of Illinois Chicago, where he researches distributed machine learning systems with a focus on collective communication optimization for heterogeneous multi-GPU clusters.

Room: Dillo

Simulation-Based Performance Evaluation of Sharded Blockchain Architectures

Authors: Om Amit Gandhi and Ioan Raicu

Institution(s): Illinois Institute of Technology

Room: Dillo

Board #: D4

Session #: 1

Abstract: Public blockchains continue to struggle with scalability because improving throughput is not as simple as increasing block size or reducing block interval. Larger blocks increase validation and transmission cost, while shorter intervals raise the likelihood of propagation delays, forks, and stale blocks. These limits motivate sharding, where transaction processing is divided across multiple parallel shard groups. In this work, we present a configurable SimPy-based discrete-event simulator for evaluating sharded blockchain architectures under controlled workload and network assumptions. The simulator models mining, verification, inter-shard coordination, block dissemination, measured throughput, average block time, and communication overhead. Our goal is not to claim production deployment performance, but to study scaling trends in a controlled environment. Our simulator achieves 1.4M TPS at 256 shards under a local datacenter-like setup and 0.5M TPS in a global WAN setup, showing strong throughput gains from parallel execution. However, the gains are not unbounded: beyond a certain number of shards, coordination traffic, synchronization, and network overhead begin to dominate, leading to diminishing returns.

Author Bios:
Om Amit Gandhi
Om Amit Gandhi is a Computer Science student at Illinois Institute of Technology specializing in systems programming, GPU computing, and blockchain technology. His work spans parallel computing, cryptography, and high-performance architecture, with projects including GPU-accelerated blockchain plotters and distributed sharding simulations. He actively builds portfolio projects bridging academic research and real-world systems.

Dr. Ioan Raicu
Dr. Ioan Raicu is an Associate Professor in the Department of Computer Science at Illinois Institute of Technology and guest research faculty at Argonne National Laboratory. He is the founder and director of the Data-Intensive Distributed Systems Laboratory (DataSys Lab). His research focuses on distributed systems, many-task computing, and data-intensive computing at extreme scales, with recognition including the IEEE TCSC Young Achievers in Scalable Computing award and an NSF CAREER award.

Hybrid Edge-HPC Systems for Low-Latency Data-Driven Inference

Authors: Ryan Hartung, Douglas Thain

Institution(s): University of Notre Dame

Room: Dillo

Board #: D12

Session #: 1

Abstract: Emerging cyber-physical systems increasingly require real-time inference from streaming sensor data while maintaining models that reflect complex and evolving physical processes. However, high-fidelity simulations and model training often require execution on remote high-performance computing (HPC) systems, creating a fundamental mismatch between the latency requirements of edge applications and the computational cost and availability of HPC resources. We present RBF (Reverse Backfill), a hybrid edge–HPC learning and inference architecture that integrates real-time edge inference with asynchronous simulation-driven model improvement on HPC systems. RBF reinterprets HPC backfilling by using opportunistic, delay-tolerant computation to improve model accuracy rather than system utilization. RBF decouples low-latency inference from expensive simulation and training workflows by deploying lightweight surrogate models with a regular cadence. It also uses batch-controlled HPC resources to generate improved models that are retrained opportunistically as resources become available. The architecture supports pluggable surrogate models and orchestrates computation across heterogeneous infrastructure spanning edge devices, private 5G networking, cloud services, and HPC clusters. To demonstrate and evaluate RBF, we instantiate it using a real digital agriculture application. The deployment couples edge sensing with computational fluid dynamics (CFD) simulations to infer spatial airflow patterns in a large agricultural screenhouse. Our evaluation characterizes the end-to-end performance of the RBF pipeline under realistic system constraints, quantifying simulation latency, model training cost, inference throughput, and the impact of delayed model updates on prediction accuracy. Results show that HPC resources can be effectively integrated into operational cyber-physical systems to deliver real-time inference from AI surrogates trained on high-fidelity simulations.

Author Bios:
Ryan Hartung is a first-year Ph.D. student in Computer Science and Engineering at the University of Notre Dame and a member of the Cooperative Computing Lab. His research focuses on distributed and high-performance computing systems, with an emphasis on building scalable and portable tools and infrastructure for data-intensive scientific applications.

Douglas Thain is a Professor of Computer Science and Engineering at the University of Notre Dame, where he leads the Cooperative Computing Lab. His research focuses on distributed, cloud, and high-performance computing systems, developing tools that enable large-scale scientific applications across clusters and grids. He is also an educator and author of widely used materials in computing systems.

Medovik: A Triple-Layered Approach to Enabling x86 Linux Compatibility in TrustZone

Authors: Lucian Mocan, Michael Polinski, Friedrich Doku, Peter Dinda

Institution(s): Northwestern University (* University of Strasbourg)

Room: Dillo

Board #: D1

Session #: 1

Abstract: Trusted Execution Environments such as ARM TrustZone offer strong hardware-backed isolation, yet their restrictive programming model makes them difficult to use with standard Linux toolchains and legacy applications. Porting non-trivial codebases into the secure world is often prohibitively expensive. We introduce Medovik, a triple-layered sandbox that runs a WebAssembly runtime inside OP-TEE, which in turn hosts an x86 emulator (Bochs). This design provides a Linux-like environment behind three nested protection boundaries, at the expense of significant performance overhead. On a Neoverse-N1 CPU, the full stack exhibits a 205.6× slowdown on CoreMark (2k) and a 219.6× slowdown on the compute-intensive SPEC CPU2017 LBM benchmark (single-threaded). To mitigate this overhead, we developed a punch-through hypercall mechanism that selectively bypasses abstraction layers for hot paths. By manually offloading a single performance-critical function (LBM_performStreamCollideTRT, ~140 lines out of 1299) to AOT-compiled WebAssembly, we reduced the slowdown from 219.6× to ~7×, a roughly 30× relative improvement. Although the resulting performance remains well below native execution, these results demonstrate that selective hot-path offloading can make heavily nested TEE sandboxes practical for targeted compute-heavy workloads. We discuss the performance, compatibility, and security trade-offs of this approach.

Author Bios:
Lucian Mocan is a computer science researcher and software engineer pursuing his M.S. at the University of Strasbourg, currently a visiting student researcher at Northwestern University's Prescience Lab under Professor Peter Dinda. His work spans systems research, language tooling, and full-stack engineering, with a growing interest in how AI can accelerate and potentially automate systems research. He is open to opportunities starting September 2026 at lucianmocan.com.

Michael is a PhD candidate at Northwestern University advised by Peter Dinda. Broadly, his research interests lie in security and privacy in computer systems and networks with a focus on applying innovations in software and hardware to address recurring and pervasive security issues. His current work examines how both well-known and novel techniques can be used to exploit scientific applications to manipulate results and compromise infrastructure.

Friedrich is a second-year PhD candidate and systems security researcher working on operating systems, computer architecture, and the foundations of trusted computing. His recent work explores keyless, unconditional roots of trust for embedded and IoT devices, and capability-based hardware mechanisms for safely giving applications direct access to devices. He is especially interested in how minimal trusted components and physical assumptions can replace today's complex attestation stacks. More at https://fdoku.me.

Peter Dinda is a professor in the Department of Computer Science at Northwestern University, and also holds an appointment in the Department of Electrical and Computer Engineering. He works in experimental computer systems. You can find out more about him at pdinda.org.

Extending gem5 with Intel VMX Support for Full-System Virtualization Research

Authors: Seaver Olson

Institution(s): Loyola University Chicago

Room: Dillo

Board #: D2

Session #: 1

Abstract: gem5 is an excellent tool for cycle-accurate simulation of machines and full-system payloads; however, it currently lacks support for modern hardware virtualization on x86 machines. This prevents researchers from running cycle-accurate simulations for hypervisor and other virtualization research. Intel's Virtual Machine Extensions (VMX) allow x86 machines to handle virtualization through an architecture governed by two separate modes of operation, VMX root and VMX non-root. My work aims to extend gem5 with VMX support for x86 full-system simulation. The implementation targets the core structures and behaviors required to model virtualization. Enabling virtualization opens gem5 up to research on hypervisors and virtualization overhead.

Author Bios:
I am an undergraduate computer systems researcher with an interest in operating systems and computer architecture.

Status-Coherent Clique Mining in Signed Networks

Authors: Layla Payton, TaiNing Wang

Institution(s): Loyola University Chicago

Room: Dillo

Board #: D9

Session #: 1

Abstract: Signed graphs encode positive and negative relationships between entities and arise naturally in social network systems, trust systems, and opinion platforms. Existing work on clique mining in signed graphs relies on structural balance theory as the coherence criterion, but balance theory is known to be a poor fit for directed networks. Status theory, which interprets a positive edge from A to B as A asserting higher standing for B, better explains empirically observed link patterns in directed signed graphs. We study the problem of mining maximal cliques that are coherent under status theory rather than structural balance. This shift introduces concrete systems challenges: status coherence requires verifying consistent directed status orderings within candidate cliques, which demands efficient triangle counting, directed triad enumeration, and constraint propagation across partial solutions. Standard pruning and bounding techniques developed for unsigned or balance-based clique search do not transfer cleanly, as the status coherence constraint lacks the monotone decomposability that makes those approaches effective. We present an algorithmic framework that addresses these challenges through status-aware graph reduction, adapted branch-and-bound search, and early pruning via local status inconsistency detection. Experiments on real-world signed networks demonstrate significant search space reduction compared to naive baselines. This work opens new directions for coherence-aware subgraph mining on signed graphs.
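
To make the coherence test concrete, the sketch below checks whether the status relations implied by a set of signed, directed edges admit a consistent ordering. It assumes one common reading of status theory (a positive edge from A to B places B above A, a negative edge places A above B) and treats coherence as the absence of a cycle; the function name and this exact criterion are assumptions, not the framework presented in this work.

```python
def is_status_coherent(nodes, signed_edges):
    """Check that the status orderings implied by signed edges contain no cycle.

    signed_edges: iterable of (src, dst, sign) with sign +1 or -1.
    A +1 edge src->dst is read as "dst outranks src"; a -1 edge as "src outranks dst".
    """
    higher = {n: set() for n in nodes}  # higher[u] = nodes asserted to outrank u
    for src, dst, sign in signed_edges:
        low, high = (src, dst) if sign > 0 else (dst, src)
        higher[low].add(high)

    # Kahn's algorithm: a consistent ordering exists only if the relation is acyclic.
    indegree = {n: 0 for n in nodes}
    for low in higher:
        for high in higher[low]:
            indegree[high] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for high in higher[n]:
            indegree[high] -= 1
            if indegree[high] == 0:
                ready.append(high)
    return visited == len(nodes)


triad = ["a", "b", "c"]
coherent = is_status_coherent(triad, [("a", "b", 1), ("b", "c", 1), ("c", "a", -1)])
incoherent = is_status_coherent(triad, [("a", "b", 1), ("b", "c", 1), ("c", "a", 1)])
```

The first triad orders the nodes as a below b below c, and the negative edge from c to a agrees with that order. The second asks for a to be below b, b below c, and c below a, which no ordering can satisfy.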

Author Bios:
Layla Payton is a PhD student at Loyola University Chicago researching graph data and clustering optimizations. Her interests include designing efficient methods for analyzing complex real-world data, as well as implementing analytical algorithms into social media networks.

Dr. TaiNing Wang is a tenure-track Assistant Professor of Computer Science at Loyola University Chicago. Her research focuses on database systems, query processing and optimization, graph data, applied AI/ML, AI accountability, and formal methods. Her work has been published in leading venues such as ACM SIGMOD, IEEE ICDE, and Future Generation Computer Systems.

Benchmarking the Performance of Semantic Search Techniques for Agent Discovery

Authors: Rajni Parshuram Pawar, Luke Logan, Xian-He Sun, Anthony Kougkas

Institution(s): Illinois Institute of Technology

Room: Dillo

Board #: D10

Session #: 1

Abstract: AI agents are becoming the primary interface to scientific data, yet discovering datasets on HPC filesystems remains a guess-and-check process. Agents repeatedly invoke find and ls with regex patterns, wasting compute, time, and LLM credits on failed searches. Filesystem metadata — rich type information, dimensional annotations, and simulation parameters embedded in formats like HDF5 — is invisible to these tools. Ideally, storage systems would support semantic search natively, but scientific metadata is highly distributed across cluster nodes, and no benchmark exists to evaluate which search technique best serves agent-driven discovery in this setting. In this project, we benchmark four search backends — BM25 keyword matching, Elasticsearch (external BM25 over REST), Neo4j knowledge graph traversal, and Qdrant vector cosine similarity — integrated into CLIO, an optimized multi-tiered context storage system with support for filesystem semantics. Each backend is plugged into CLIO's Context Transfer Engine through a factory pattern, receiving identical LLM-generated summaries (Qwen 2.5 3B) extracted from HDF5 simulation metadata. This enables controlled comparison across keyword, graph, and semantic approaches. We evaluate on 273 GADGET-2 N-body astrophysics snapshots spanning 16 simulation configurations with 20 natural-language queries. Embedding BM25 directly in CLIO's I/O layer achieves 70% Top-1 accuracy at 0.047 ms per query — 350× faster than Elasticsearch running the same algorithm over REST (16.4 ms), isolating the speedup to transport elimination. Qdrant cosine similarity reaches 95% Top-1 by bridging synonym gaps that keyword methods miss. Neo4j graph traversal enables structured entity-relationship queries at 62% accuracy. On a 4-node distributed deployment, accuracy remains invariant while query latency stays below 2 ms. 
These results establish performance tradeoffs between keyword, semantic, and graph-based search for agent-driven scientific data discovery.
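
To illustrate the keyword-matching side of this comparison, here is a minimal sketch of BM25 scoring and a Top-1 accuracy check. It is not CLIO's implementation: the summaries, queries, and expected answers are hypothetical toy data, the function names are invented for illustration, and `k1=1.5`, `b=0.75` are commonly used values assumed here rather than taken from the poster.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF, which stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def top1_accuracy(queries, docs, expected):
    """Fraction of queries whose best-scoring document is the expected one."""
    hits = 0
    for query, want in zip(queries, expected):
        scores = bm25_scores(query, docs)
        best = max(range(len(docs)), key=scores.__getitem__)
        hits += best == want
    return hits / len(queries)

# Hypothetical summaries standing in for LLM-generated metadata descriptions.
SUMMARIES = [
    "dark matter halo simulation with 128 particles at redshift 0",
    "gas cloud collapse simulation with cooling enabled",
    "galaxy merger snapshot with star formation",
]
QUERIES = [
    "halo simulation at redshift 0",
    "galaxy collision snapshot",
    "cooling gas cloud",
    "colliding galaxies",  # shares no token with any summary
]
EXPECTED = [0, 2, 1, 2]

print(top1_accuracy(QUERIES, SUMMARIES, EXPECTED))
```

The last query shares no exact token with the intended summary, so every BM25 score is zero and the lookup misses. This is the kind of synonym gap the abstract says vector similarity can bridge.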

Author Bios:
Rajni Pawar is a Ph.D. student in Computer Science at Illinois Institute of Technology and an HPC software engineer at the Gnosis Research Center. Working under Xian-He Sun, her research focuses on distributed storage systems, high-performance computing, and scalable algorithm design.

Dr. Luke Logan is a Research Assistant Professor at Illinois Institute of Technology and an HPC software engineer at the Gnosis Research Center. Working under Xian-He Sun, his research focuses on distributed storage, operating systems, and data-intensive HPC and AI workloads.

Dr. Xian-He Sun is a University Distinguished Professor and the Ron Hochsprung Endowed Chair of Computer Science at Illinois Institute of Technology, and Director of the Gnosis Research Center. He is an IEEE Fellow specializing in parallel and distributed systems.

Dr. Anthony Kougkas is the Deputy Director of the Gnosis Research Center at Illinois Institute of Technology. His research focuses on extreme-scale distributed systems, addressing data management and I/O challenges for HPC, cloud, and AI workloads.

Toward Risk-Aware Bitcoin Forecasting with LLMs: Temporal Robustness and Internal Uncertainty Signals

Authors: Het Patel, Anver Kurmushiev, John Chumara, Ioan Raicu

Institution(s): Illinois Institute of Technology

Room: Dillo

Board #: D7

Session #: 1

Abstract: Large language models (LLMs) have recently emerged as promising tools for time series forecasting by repurposing sequence modeling capabilities originally developed for natural language. In this work, we study CryptexLLM, an LLM-based framework for Bitcoin forecasting that adapts structured OHLCV time series into forms compatible with frozen language model backbones. Our goal is not only to evaluate predictive performance, but also to examine model stability and uncertainty in a highly volatile, non-stationary financial setting. First, we investigate temporal robustness by training CryptexLLM on different historical Bitcoin windows while keeping the same future evaluation period fixed. Using a reproducible pipeline with strict time-ordered splits, automated inference on a fixed 2022–2025 holdout window, and unified experiment tracking, we compare multiple fixed and expanding training windows. Results suggest that while some windows yield slightly better metrics, overall behavior remains broadly consistent across market eras. Second, we explore whether internal LLM signals such as hidden-state features and attention-derived representations can provide useful indicators of predictive uncertainty. We analyze correlations between these internal representations and downstream error metrics to assess whether they help identify unreliable forecasts before deployment. We further position these signals alongside traditional uncertainty estimation approaches in time series analysis to evaluate their complementary value. Taken together, our findings suggest that CryptexLLM is relatively stable across shifting historical regimes and that internal LLM dynamics may offer a promising path toward risk-aware forecasting. This work supports future development of filtering and risk-adjustment pipelines for decision-oriented financial prediction.
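
The temporal-robustness setup described above (several training windows, one fixed future holdout) can be sketched as follows. This is an illustration under stated assumptions, not the CryptexLLM pipeline: the exact holdout dates, the window start years, the helper names, and the synthetic series are all hypothetical.

```python
from datetime import date, timedelta

# Assumed boundaries for the fixed 2022-2025 holdout mentioned in the abstract.
HOLDOUT = (date(2022, 1, 1), date(2025, 12, 31))

def expanding_windows(start_years, end):
    """Windows sharing one end date, each starting in a different year."""
    return [(date(y, 1, 1), end) for y in start_years]

def fixed_windows(start_years, n_years):
    """Equal-length windows covering different market eras."""
    return [(date(y, 1, 1), date(y + n_years - 1, 12, 31)) for y in start_years]

def time_ordered_split(rows, window, holdout=HOLDOUT):
    """Split (day, value) rows so that training strictly precedes the holdout."""
    start, end = window
    if end >= holdout[0]:
        raise ValueError("training window overlaps the holdout period")
    train = [r for r in rows if start <= r[0] <= end]
    test = [r for r in rows if holdout[0] <= r[0] <= holdout[1]]
    return train, test

# Hypothetical daily series: one synthetic value per day, 2015 through 2025.
first = date(2015, 1, 1)
n_days = (date(2025, 12, 31) - first).days + 1
rows = [(first + timedelta(days=i), float(i)) for i in range(n_days)]

windows = expanding_windows([2015, 2017, 2019], date(2021, 12, 31))
windows += fixed_windows([2015, 2017, 2019], 3)
for window in windows:
    train, test = time_ordered_split(rows, window)
    print(window[0].year, window[1].year, len(train), len(test))
```

Every window is evaluated against the same test rows, so differences in downstream metrics can be attributed to the training era rather than to a shifting evaluation period.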

Author Bios:
Het Patel is an M.S. in Computer Science student at Illinois Institute of Technology with experience in machine learning, generative AI, and software engineering. His work spans LLM-based forecasting, transformer models, experiment pipelines, and full-stack application development. He is particularly interested in time-series forecasting, financial ML, and building practical AI systems for real-world problems.

John Chumara is a Master’s student in Artificial Intelligence and a member of the DataSys Lab at IIT, focusing on time-series forecasting with LLMs and analyzing hidden representations to improve risk-adjusted decision-making.

Anver Kurmushiev is a Master’s student in Artificial Intelligence and a member of the DataSys Lab at IIT, with research focused on LLM-based time-series forecasting and the study of hidden model representations for improving risk-aware decision-making.

Ioan Raicu is an Associate Professor of Computer Science at Illinois Institute of Technology, Associate Program Director for the Master of Data Science program, and leader of the DataSys Lab. His research focuses on distributed systems, data-intensive computing, and scalable system design for high-impact computing applications.

Establishing a Quantum Error Correction Testbed

Authors: Jack Lange, Christopher Zimmer, Travis Humble

Institution(s): Oak Ridge National Laboratory

Room: Dillo

Board #: D3

Session #: 1

Abstract: The Quantum Science Center is investigating the integration of Quantum Computing with traditional HPC systems in preparation for deploying a scientifically useful quantum computer by the end of the decade. One of the central challenges we are facing is the effective implementation of Quantum Error Correction (QEC) in order to achieve the Fault Tolerant Quantum Computing (FTQC) regime. At ORNL, we are deploying an ecosystem of QEC testbeds in order to explore the capabilities of HPC architectures to support the classical processing needed by the QEC control loop. These testbeds will be designed to support the integration of different quantum modalities, control systems, HPC accelerators, and QEC codes/algorithms in a flexible ecosystem that can be dynamically configured for exploratory experimentation. The goal of these testbeds is to provide an experimental environment that supports the collaboration between quantum and classical HPC vendors, Department of Energy scientists, and the academic research community, with the ability to develop cross-stack solutions incorporating advances that span algorithmic approaches to low-level hardware architectural modifications.

Author Bios:
Jack Lange is the System Architecture Team Leader in the Technology Integration Group within the National Center for Computational Science and Associate Professor of Computer Science at the University of Pittsburgh. He is currently leading the Quantum Controls and Error Correction effort in the Quantum-HPC Architectures Thrust in the US Department of Energy’s Quantum Science Center.

Christopher Zimmer is the Group Leader for the Technology Integration Group within the National Center for Computational Science. Christopher has served as a key technical lead in the deployment of the Frontier and upcoming Discovery Leadership Supercomputers in addition to leading the Quantum-HPC Architectures Thrust in the US Department of Energy’s Quantum Science Center.

Travis Humble is director of the US Department of Energy’s Quantum Science Center and a Distinguished Scientist at Oak Ridge National Laboratory. Travis also holds a joint faculty appointment with the University of Tennessee Bredesen Center for Interdisciplinary Research and Graduate Education.

FlowCompile: An Optimizing Compiler for Structured LLM Workflows

Authors: Junyan Li, Zhang-Wei Hong, Maohao Shen, Yang Zhang, Chuang Gan

Institution(s): University of Massachusetts Amherst, MIT-IBM Watson AI Lab, Massachusetts Institute of Technology

Room: Dillo

Board #: D5

Session #: 1

Abstract: Structured LLM workflows, in which specialized LLM sub-agents are executed according to a predefined execution graph, have emerged as a powerful abstraction for complex tasks. Optimizing such workflows, i.e., selecting configurations for each sub-agent to achieve favorable accuracy–latency trade-offs under resource constraints, is challenging due to the combinatorial design space over model choices, reasoning budgets, and even the execution graph itself. Existing cost-aware approaches typically formulate this problem as routing: they learn a policy to select a single configuration per query that optimizes a predefined cost objective. In this work, we instead cast workflow optimization as a compilation problem. Analogous to machine learning compilers for neural network kernels, we aim to develop a systematic approach to analyze and optimize structured LLM workflows. We introduce FlowCompile, an optimizing compiler designed for structured LLM workflows that performs compile-time optimization over a unified workflow design space and produces a diverse set of optimized workflow configurations in a single compile-time pass, covering a wide range of accuracy–latency trade-offs. FlowCompile decomposes a workflow into sub-agents, profiles each component under different configurations, composes these local estimates into workflow-level accuracy and latency predictions, and efficiently identifies globally optimized configurations. Experiments across diverse workflows and challenging benchmarks demonstrate that FlowCompile consistently outperforms heuristically optimized workflow configurations and routing-based baselines by a large margin. Moreover, our results highlight key advantages of the compilation paradigm, including the possibility to combine with routing methods to further boost performance, as well as task transferability.
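
The profile-compose-select idea can be illustrated with a toy example. This sketch is not FlowCompile's algorithm. It assumes a sequential two-agent chain in which accuracies multiply and latencies add, uses invented profile numbers and configuration names, and replaces the efficient search with brute-force enumeration.

```python
from itertools import product

# Hypothetical per-sub-agent profiles: config name -> (accuracy, latency in seconds).
PROFILES = {
    "planner": {"small": (0.80, 1.0), "large": (0.95, 3.0)},
    "solver": {
        "small": (0.70, 2.0),
        "small+think": (0.72, 5.0),
        "large": (0.90, 6.0),
    },
}

def compose(choice):
    """Combine local estimates into a workflow-level (accuracy, latency) pair.

    Assumes a sequential chain: accuracies multiply, latencies add.
    """
    accuracy, latency = 1.0, 0.0
    for agent, config in choice.items():
        acc, lat = PROFILES[agent][config]
        accuracy *= acc
        latency += lat
    return accuracy, latency

def pareto_front(candidates):
    """Keep candidates that no other candidate beats on accuracy and latency."""
    front = []
    for choice, (acc, lat) in candidates:
        dominated = any(
            other_acc >= acc and other_lat <= lat
            and (other_acc > acc or other_lat < lat)
            for _, (other_acc, other_lat) in candidates
        )
        if not dominated:
            front.append((choice, (acc, lat)))
    return sorted(front, key=lambda item: item[1][1])

# Enumerate one configuration per sub-agent (brute force, for illustration only).
agents = list(PROFILES)
candidates = []
for configs in product(*(PROFILES[a] for a in agents)):
    choice = dict(zip(agents, configs))
    candidates.append((choice, compose(choice)))

front = pareto_front(candidates)
for choice, (acc, lat) in front:
    print(choice, round(acc, 3), lat)
```

In this toy data the "small+think" solver never reaches the front, because another combination is both faster and more accurate. The surviving set is a range of accuracy-latency trade-offs rather than a single chosen configuration.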

Author Bios:
Junyan Li: UMass Amherst 3rd year PhD student, working with Prof. Chuang Gan
Zhang-Wei Hong: Principal Investigator and Research Staff Member at the MIT-IBM Watson AI Lab
Maohao Shen: PhD student at EECS@MIT supervised by Prof. Gregory Wornell
Yang Zhang: Research scientist at MIT-IBM Watson AI Lab
Chuang Gan: Faculty member at UMass Amherst and research manager at MIT-IBM Watson AI Lab

How To Evaluate Compiler Diagnostics for AI Coding Agents

Authors: Akash Deo, Simone Campanoni, Tommy McMichen

Institution(s): Northwestern University

Room: Dillo

Board #: D6

Session #: 1

Abstract: AI coding agents get feedback from many tools: linters, build systems, compilers, profilers, and more. However, there are no methods to systematically measure the quality of this feedback. We introduce an experimental methodology that evaluates feedback for AI coding agents on two axes: descriptiveness (explaining what is wrong) and prescriptiveness (how to fix it). We apply this methodology to a case study on triggering compiler auto-vectorization. We find that while Clang's compiler feedback can help AI coding agents trigger auto-vectorization 3x more frequently, many remarks were not very descriptive. After we made Clang's compiler feedback more descriptive by adding specific data dependence remarks, we observed a 43% improvement in how frequently AI coding agents triggered auto-vectorization. This demonstrates that decomposing feedback into descriptiveness and prescriptiveness helps systems researchers improve their tools.

Author Bios:
Akash Deo is a master's student at Northwestern University, advised by Tommy McMichen and Simone Campanoni. His research focuses on making compilers better collaborators for AI agents. His master's thesis work, Canary, evaluates how compiler diagnostics for auto-vectorization can help - and hinder - AI agents. Akash has published at the CoDAIM workshop and IEEE BSN, and previously worked on backend APIs at Capital One.

Rising Above the Relational vs Document Database Debate: A Unified Perspective on Data Modeling

Authors: Steve Tarzia

Institution(s): MongoDB, Inc.

Room: Dillo

Board #: D8

Session #: 1

Abstract: I'll discuss the performance implications of MongoDB's Document Model. Compared to Relational systems, a Document database provides much more opportunity for spatial locality of data. Savvy users can take advantage of this to achieve superior read performance for many workloads. However, the Relational Model has its own strengths, including: clearer semantics for query optimization, better support for data integrity constraints, and excellent single-write performance. I'll talk about how MongoDB is broadening support for relational-style workloads, and what the open research challenges are in a world where Document vs Relational is not a binary choice, but two ends of a continuous schema design space.

Author Bios:
Steve Tarzia is a Sr. Staff Engineer on the Query team at MongoDB, where he was Director of the Query Optimization team for the past 3 years. Previously he was an Assistant Professor of Instruction in CS at Northwestern. His PhD research was in the mobile systems domain.

Cascade: A Distributed Agentic Framework for On-the-Fly Surrogate Learning in Scientific Simulations

Authors: Michael Tynes, Logan Ward, Kyle Chard, Ian Foster

Institution(s): University of Chicago, NVIDIA, Argonne National Laboratory

Room: Dillo

Board #: D11

Session #: 1

Abstract: Training machine learning (ML) surrogates for expensive subroutines in simulations is traditionally done up-front on large datasets, with fine-tuning often employed to specialize these surrogates on specific systems. Even with fine-tuning, highly dynamic simulations often encounter system states outside the training domain of the ML surrogate, where the ML prediction is highly uncertain and can render the simulation results inaccurate. On-the-fly learning, which involves updating the ML surrogate when such states are encountered, is a promising but underexplored solution to this problem. We propose Cascade, a distributed agentic framework built with Academy that enables on-the-fly training of ML simulation surrogates across distributed resources. In Cascade, specialized agents continue simulations until an auditor agent halts them, which happens when a control system identifies that the simulation has drifted too far from the training data. Additional agents produce new training data, train new surrogate models, and make them available to runner agents. We discuss the design considerations of Cascade and their impact on scalability. We also discuss results from a serial implementation that reduces error from 80% to 5% by carefully blending in new ML surrogates over time rather than naively swapping in newly trained surrogates. By separating simulation, auditing, and retraining into coordinated agents, Cascade creates a testbed for evaluating strategies for training data drift detection, data selection, and surrogate integration. Cascade can be distributed across heterogeneous resources to account for the varying hardware demands of running simulations and training surrogate models, aiming for efficient HPC resource utilization.
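
The contrast between blending and naive swapping can be sketched as below. The abstract does not specify the blending schedule, so the linear ramp, the function names, and the two stand-in surrogates are assumptions made for illustration only.

```python
def blend_weight(step, swap_step, ramp_steps):
    """Weight placed on the new surrogate at a given simulation step.

    The weight is 0 before swap_step and rises linearly to 1 over ramp_steps.
    With ramp_steps <= 0 this degenerates to a naive hard swap.
    """
    if ramp_steps <= 0:
        return 1.0 if step >= swap_step else 0.0
    return min(1.0, max(0.0, (step - swap_step) / ramp_steps))

def blended_prediction(old_model, new_model, x, step, swap_step, ramp_steps):
    """Mix the old and new surrogate outputs according to the blend weight."""
    w = blend_weight(step, swap_step, ramp_steps)
    return (1.0 - w) * old_model(x) + w * new_model(x)

# Hypothetical surrogates: simple functions standing in for trained models.
def old_model(x):
    return 2.0 * x

def new_model(x):
    return 2.0 * x + 1.0

for step in range(8, 16):
    print(step, blended_prediction(old_model, new_model, 1.0, step, 10, 4))
```

With a ramp of four steps the prediction moves from the old surrogate's output to the new one gradually, avoiding the discontinuity a hard swap would introduce into the simulation state.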

Author Bios:
Michael Tynes is a PhD student in Computer Science at the University of Chicago, advised by Ian Foster and Kyle Chard. His current research focuses on developing methods for training and deploying machine learning surrogates for use in scientific simulations, including methods for on-the-fly learning to meet simulation error tolerances and highly parallelizable methods for sampling training data for multi-scale surrogate models.

Logan Ward is a Senior Application Engineer at NVIDIA. He currently works on a GenAI for Science team, developing machine learning tools for designing new materials and chemicals on leadership-class supercomputing systems.

Kyle Chard is a Research Associate Professor in the Department of Computer Science at the University of Chicago, with a joint appointment at Argonne National Laboratory. Together with Ian Foster, he co-leads the Globus Labs research group, focusing on distributed systems, data-intensive computing, and research data management.

Ian T. Foster is Senior Scientist and Distinguished Fellow at Argonne National Laboratory and the Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago. His research focuses on distributed computing.