I have extensive research experience through three NSF REUs and multiple assistantships,
specializing in agentic AI.
I bring strong experience both in designing research projects from the ground up and in joining ongoing efforts
where I can contribute immediately and effectively.
Towards Capable and Secure Autonomous Computer Use Agents - Fall '24 to Present
Project Website
Conducted as an NSF REU research scholar under the mentorship and guidance of
Dr. Carlos Rubio-Medrano.
In this project, funded by two cycles of the NSF Computing Alliance of Hispanic-Serving Institutions REU program,
I investigated autonomous computer-use agents (ACUAs) — systems powered by large language models that
can operate a computer end-to-end. Unlike traditional chatbots, ACUAs navigate interfaces, execute tasks,
and make independent decisions, raising important questions about their reliability and security.
I designed and introduced one of the first systematic evaluation frameworks for ACUAs, testing agents from
OpenAI, Anthropic, and open-source projects across five task domains of increasing complexity, adapting
principles from an IBM HCI quantitative UI/UX assessment to measure complexity. The study identified two classes of agents:
full-computer-access agents and browser-based agents.
Performance was measured with a seven-factor rubric assessing accuracy, adaptability, efficiency,
robustness, security, relevance, and consistency. Quantitative data such as completion rates, completion time, failed
interactions, and remediation percentages were collected for evaluation.
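To make the rubric concrete, the sketch below shows one way a per-trial record could be structured; the field names, the 1-5 scale, and the scoring helper are illustrative assumptions, not the study's exact schema.

```python
# Illustrative sketch only: a minimal record for one agent-task trial, combining the
# seven rubric factors with the quantitative metrics described above. Field names and
# the assumed 1-5 scale are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class TrialRecord:
    agent: str                  # e.g., a full-computer-access or browser-based agent
    task_domain: str            # one of the five task domains
    # Rubric factors, each scored on an assumed 1-5 scale
    accuracy: int
    adaptability: int
    efficiency: int
    robustness: int
    security: int
    relevance: int
    consistency: int
    # Quantitative metrics
    completed: bool
    completion_time_s: float
    failed_interactions: int
    remediation_pct: float      # share of failed interactions the agent recovered from

    def rubric_mean(self) -> float:
        """Average of the seven rubric factors, for quick cross-agent comparison."""
        factors = (self.accuracy, self.adaptability, self.efficiency, self.robustness,
                   self.security, self.relevance, self.consistency)
        return sum(factors) / len(factors)
```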
Findings revealed significant limitations: full computer-access agents often failed due to hallucinations,
navigation errors, and unauthorized system changes, while browser-based agents achieved higher success
rates but still showed vulnerabilities to prompt injection and inconsistent security awareness.
These results now guide the development of an ACUA that integrates multi-agent orchestration,
machine learning, retrieval-augmented generation (RAG), and security frameworks including access control and prompt verification.
Potential approaches for addressing LLM limitations, including chain-of-thought (CoT) reasoning, are under investigation
to improve decision making.
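As a rough illustration of how an access-control and prompt-verification layer could sit in front of agent actions, here is a minimal sketch; the interface, injection patterns, allow-list, and deny-list are all hypothetical placeholders, not the project's actual implementation.

```python
# Minimal sketch of a guard layer for an ACUA, assuming a hypothetical orchestrator that
# routes each proposed agent action through policy checks before execution.
# All names, patterns, and policies here are illustrative only.
import re

INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"disregard your (system )?prompt",
]

ALLOWED_ACTIONS = {"click", "type", "scroll", "read"}   # assumed allow-list
BLOCKED_TARGETS = {"/etc/passwd", "system settings"}    # assumed deny-list

def verify_prompt(text: str) -> bool:
    """Reject page content or instructions that look like prompt injection."""
    return not any(re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS)

def authorize_action(action: str, target: str) -> bool:
    """Simple access-control check on the action the agent proposes to take."""
    return action in ALLOWED_ACTIONS and target.lower() not in BLOCKED_TARGETS

def guarded_step(action: str, target: str, context: str) -> str:
    """Run verification before letting the agent act; otherwise block the step."""
    if not verify_prompt(context):
        return "blocked: possible prompt injection in context"
    if not authorize_action(action, target):
        return "blocked: action not authorized"
    return f"execute {action} on {target}"
```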
This work has been recognized with multiple honors, including a GMiS Student Poster Scholarship, an NSF Louis Stokes Alliance
for Minority Participation (LSAMP) Scholarship, a second NSF CAHSI REU award, and a scholarship and second-place finish at the
international WiCyS Student Poster Competition.
The Cost of Being Helpful: Limitations and Vulnerabilities in RLHF-Trained Agents - Fall '25 to Present
Conducted as a research assistant in the Cybersecurity Research and Innovation Laboratory at TAMU-CC, with PI
Dr. Carlos Rubio-Medrano, collaborating closely with Ph.D. student Jennifer Mondragon.
Large language models increasingly serve as the foundation for agentic intelligence. However, our lab's prior evaluations revealed a critical flaw: when unsure what action to take, LLMs tend to
commit rather than omit. This research investigates the root of this error more closely, which we hypothesize is heavily influenced by human intervention during RLHF.
I authored the research proposal for this work, which introduces the Commission-Induced Overreach (CIO) framework, a failure mode in which
RLHF-trained agents over-optimize for helpfulness and perform unreasonable, unsafe actions. The framework includes the Commission-Induced Access Vulnerabilities (CIAV) taxonomy, which categorizes three vulnerability types in
RLHF-trained agents: authorization overreach, policy circumvention, and hallucinated asset injection.
To evaluate our hypothesis, we are currently designing an experimental methodology comparing baseline RLHF against negative sample reinforcement, cost-constrained safe fine-tuning,
and KL penalties.
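For context, the textbook KL-penalized RLHF objective that these variants build on can be written as follows; this is the standard formulation rather than our exact training setup, with r denoting the reward model, \pi_{\mathrm{ref}} the frozen reference policy, and \beta the penalty weight.

```latex
% Standard KL-penalized RLHF objective (textbook form; shown for context only).
\max_{\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_{\theta}(\cdot \mid x)}\!\left[ r(x, y) \right]
\;-\; \beta\, \mathbb{D}_{\mathrm{KL}}\!\left( \pi_{\theta}(\cdot \mid x) \,\middle\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right)
```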
This work has been submitted for presentation at the International Women in Cybersecurity Conference 2026.
Developing and Optimizing LLM Pipelines for Smart City Safety Analysis - Summer '25 to Fall '25
Conducted as an NSF REU research scholar and research assistant on the team of
Dr. Jee Woong Park of UNLV,
working on the projects of Unmesa Ray and Niloy Das.
In this project, funded by the NSF Smart Cities Research Experience for Undergraduates program, I investigated
large language model optimization techniques for analyzing unstructured construction safety data to enhance urban
infrastructure development. Unlike traditional data analysis approaches, LLM-pipeline optimization enables automated extraction
of critical insights from accident reports, raising important questions about scalability and real-world implementation in smart city frameworks.
Project I: As the computer scientist in a team of civil engineers, I developed novel LLM-pipeline optimization methods
to analyze construction accident reports, creating a prototype that processes unstructured safety data and extracts meaningful insights.
My work included researching optimized retrieval-augmented generation (RAG) and machine learning pipelines, and evaluating strategies
to improve efficiency while enhancing narrative tone extraction.
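To illustrate the general shape of such a pipeline, the sketch below outlines a bare-bones RAG flow over accident reports; `embed`, `llm_generate`, and the brute-force dot-product retrieval are hypothetical stand-ins assumed for illustration, and do not reflect the project's actual models or infrastructure.

```python
# Bare-bones RAG sketch for accident-report analysis, under stated assumptions:
# `embed` and `llm_generate` are hypothetical callables for an embedding model and an LLM.
from typing import Callable, List, Tuple

def build_index(reports: List[str],
                embed: Callable[[str], List[float]]) -> List[Tuple[List[float], str]]:
    """Embed each unstructured accident report once and keep (vector, text) pairs."""
    return [(embed(r), r) for r in reports]

def retrieve(query: str, index: List[Tuple[List[float], str]],
             embed: Callable[[str], List[float]], k: int = 3) -> List[str]:
    """Return the k reports most similar to the query by dot-product similarity."""
    q = embed(query)
    scored = sorted(index, key=lambda item: -sum(a * b for a, b in zip(q, item[0])))
    return [text for _, text in scored[:k]]

def analyze(query: str, index: List[Tuple[List[float], str]],
            embed: Callable[[str], List[float]],
            llm_generate: Callable[[str], str]) -> str:
    """Ground the LLM's answer in the retrieved reports rather than the full corpus."""
    context = "\n---\n".join(retrieve(query, index, embed))
    prompt = f"Using only the accident reports below, {query}\n\n{context}"
    return llm_generate(prompt)
```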
Project II: I contributed to research conducted under an NDA, focused on machine learning for system analysis, collaborating
on techniques designed to enhance LLM performance and improve the accuracy of pattern extraction. As the only computer scientist
on the team, my role was to implement the AI aspects of the experiments.
While I cannot share specifics or findings prior to publication, I have been invited to continue collaborating as a research assistant. Project I
has evolved quickly, and I now lead its LLM-driven analytical methods. In addition, I have contributed to manuscript development
and gained valuable experience quickly integrating into a new interdisciplinary team.