research
2025
- HiLDe: Intentional Code Generation via Human-in-the-Loop Decoding. Emmanuel Anaya González*, Raven Rothkopf*, Sorin Lerner, and Nadia Polikarpova. arXiv e-prints, 2025
While AI programming tools hold the promise of increasing programmers' capabilities and productivity to a remarkable degree, they often exclude users from essential decision-making processes, causing many to effectively "turn off their brains" and over-rely on solutions provided by these systems. These behaviors can have severe consequences in critical domains, like software security. We propose Human-in-the-Loop Decoding, a novel interaction technique that allows users to observe and directly influence LLM decisions during code generation, in order to align the model's output with their personal requirements. We implement this technique in HiLDe, a code completion assistant that highlights critical decisions made by the LLM and provides local alternatives for the user to explore. In a within-subjects study (N=18) on security-related tasks, we found that HiLDe led participants to generate significantly fewer vulnerabilities and better align code generation with their goals compared to a traditional code completion assistant.
@article{anaya2025hilde, title = {HiLDe: Intentional Code Generation via Human-in-the-Loop Decoding}, author = {Anaya Gonz{\'a}lez, Emmanuel and Rothkopf, Raven and Lerner, Sorin and Polikarpova, Nadia}, journal = {arXiv e-prints}, pages = {arXiv--2505}, year = {2025}, doi = {10.48550/arXiv.2505.22906}, }
2024
- Rose: Composable Autodiff for the Interactive Web. Sam Estep, Wode Ni, Raven Rothkopf, and Joshua Sunshine. In 38th European Conference on Object-Oriented Programming (ECOOP 2024), 2024
Reverse-mode automatic differentiation (autodiff) has been popularized by deep learning, but its ability to compute gradients is also valuable for interactive use cases such as bidirectional computer-aided design, embedded physics simulations, visualizing causal inference, and more. Unfortunately, the web is ill-served by existing autodiff frameworks, which use autodiff strategies that perform poorly on dynamic scalar programs, and pull in heavy dependencies that would result in unacceptable webpage sizes. This work introduces Rose, a lightweight autodiff framework for the web using a new hybrid approach to reverse-mode autodiff, blending conventional tracing and transformation techniques in a way that uses the host language for metaprogramming while also allowing the programmer to explicitly define reusable functions that comprise a larger differentiable computation. We demonstrate the value of the Rose design by porting two differentiable physics simulations, and evaluate its performance on an optimization-based diagramming application, showing Rose outperforming the state-of-the-art in web-based autodiff by multiple orders of magnitude.
@inproceedings{estep2024rose, title = {Rose: Composable Autodiff for the Interactive Web}, author = {Estep, Sam and Ni, Wode and Rothkopf, Raven and Sunshine, Joshua}, booktitle = {38th European Conference on Object-Oriented Programming (ECOOP 2024)}, pages = {15--1}, year = {2024}, organization = {Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik}, doi = {10.4230/LIPIcs.ECOOP.2024.15}, demo = {https://rosejs.dev/}, }
- Procedural Adherence and Interpretability Through Neuro-symbolic Generative Agents. Raven Rothkopf, Hannah Tongxin Zeng, and Mark Santolucito. arXiv preprint arXiv:2402.16905, 2024
The surge in popularity of large language models (LLMs) has opened doors for new approaches to the creation of interactive agents. However, managing and interpreting the temporal behavior of such agents over the course of a potentially infinite interaction remain challenging. The stateful, long-term horizon reasoning required for coherent agent behavior does not fit well into the LLM paradigm. We propose a combination of formal logic-based program synthesis and LLM content generation to bring guarantees of procedural adherence and interpretability to generative agent behavior. To illustrate the benefit of procedural adherence and interpretability, we use Temporal Stream Logic (TSL) to generate an automaton that enforces an interpretable, high-level temporal structure on an agent. With the automaton tracking the context of the interaction and making decisions to guide the conversation accordingly, we can drive content generation in a way that allows the LLM to focus on a shorter context window. We evaluated our approach on different tasks involved in creating an interactive agent specialized for generating choose-your-own-adventure games. We found that over all of the tasks, an automaton-enhanced agent with procedural guarantees achieves at least 96% adherence to its temporal constraints, whereas a purely LLM-based agent demonstrates as low as 14.67% adherence.
@article{rothkopf2024procedural, title = {Procedural Adherence and Interpretability Through Neuro-symbolic Generative Agents}, author = {Rothkopf, Raven and Zeng, Hannah Tongxin and Santolucito, Mark}, journal = {arXiv preprint arXiv:2402.16905}, year = {2024}, doi = {10.48550/arXiv.2402.16905}, demo = {https://barnard-pl-labs.github.io/CYOA-TSL/}, }
- Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game. Prisha Samdarshi, Mariam Mustafa, Anushka Kulkarni, Raven Rothkopf, Tuhin Chakrabarty, and Smaranda Muresan. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov 2024
The New York Times Connections game has emerged as a popular and challenging pursuit for word puzzle enthusiasts. We collect 438 Connections games to evaluate the performance of state-of-the-art large language models (LLMs) against expert and novice human players. Our results show that even the best-performing LLM, Claude 3.5 Sonnet, which has otherwise shown impressive reasoning abilities on a wide variety of benchmarks, can only fully solve 18% of the games. Novice and expert players perform better than Claude 3.5 Sonnet, with expert human players significantly outperforming it. We create a taxonomy of the knowledge types required to successfully cluster and categorize words in the Connections game. We find that while LLMs are decent at categorizing words based on semantic relations, they struggle with other types of knowledge such as Encyclopedic Knowledge, Multiword Expressions, or knowledge that combines both Word Form and Meaning. Our results establish the New York Times Connections game as a challenging benchmark for evaluating abstract reasoning capabilities in humans and AI systems.
@inproceedings{samdarshi-etal-2024-connecting, title = {Connecting the Dots: Evaluating Abstract Reasoning Capabilities of {LLM}s Using the {N}ew {Y}ork {T}imes Connections Word Game}, author = {Samdarshi, Prisha and Mustafa, Mariam and Kulkarni, Anushka and Rothkopf, Raven and Chakrabarty, Tuhin and Muresan, Smaranda}, editor = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung}, booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)}, month = nov, year = {2024}, address = {Miami, Florida, USA}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2024.emnlp-main.1182/}, doi = {10.18653/v1/2024.emnlp-main.1182}, pages = {21219--21236}, }
- Towards Reactive Synthesis as a Programming Paradigm. Leyi Cui*, Raven Rothkopf*, and Mark Santolucito. In PLATEAU Workshop, Nov 2024
Reactive program synthesis from logical specifications has yet to match the user-friendly approach of example-based programming for spreadsheets, despite its success in specific domains. A main challenge hindering the broader adoption of reactive synthesis is in the complexity of specification engineering in temporal logics. We map out challenges and tools that arise as users write temporal logic specifications in Temporal Stream Logic. Our goal is to provide a roadmap for future usability work that can elevate temporal specification engineering for synthesis to match the usability support available for software engineering. By generalizing these concepts, we can gain a deeper insight into the challenges people face when reasoning about the temporal behavior of their systems.
@inproceedings{cui2024towards, title = {Towards Reactive Synthesis as a Programming Paradigm}, author = {Cui, Leyi and Rothkopf, Raven and Santolucito, Mark}, year = {2024}, booktitle = {PLATEAU Workshop}, doi = {10.1184/R1/25587741.v1}, }
2023
- Rose: Extensible Autodiff on the Web (Student Research Competition, 3rd Place). Raven Rothkopf. In Companion Proceedings of the 2023 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity (SPLASH), Nov 2023
Automatic differentiation (AD) has become the backbone for a new wave of optimization-driven domains such as computer graphics and machine learning over the past decade. However, existing AD systems face limitations, either lacking support for in-browser development or failing to harness more recent, compiler-based approaches to achieve both expressiveness and size-preserving differentiation. This work introduces Rose, a portable, extensible AD library that runs on the web. Through Rose, we aim to increase accessibility to AD and empower end-user programming in optimization-driven domains. We plan to evaluate Rose by replacing the AD engines of real-world, client-side optimization systems and assess the improvements on the computation power and expressiveness of such systems.
@inproceedings{rothkopf2023rose, title = {Rose: Extensible Autodiff on the Web (Student Research Competition, 3rd Place)}, author = {Rothkopf, Raven}, booktitle = {Companion Proceedings of the 2023 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity (SPLASH)}, pages = {46--48}, year = {2023}, doi = {10.1145/3618305.3623602}, }
- Towards the usability of reactive synthesis: Building blocks of temporal logic. Raven Rothkopf, Angel Leyi Cui, Hannah Tongxin Zeng, Arya Sinha, and Mark Santolucito. In PLATEAU Workshop, Nov 2023
Temporal logic specifications can be used to synthesize reactive systems by writing high-level descriptions of desired behavior, without the need to manually program a complete system. While synthesis from temporal logics has long been focused on hardware systems, recent work has expanded applications of synthesis to include areas of broader interest, such as mobile apps, visualization, and self-driving cars. These new application areas have the potential to bring new types of users into the synthesis community, but significant usability hurdles remain. In this work, we investigate how Temporal Stream Logic (TSL), a temporal logic specification language, can be made more usable and approachable to programmers of all skill levels. We propose a study design to evaluate the usefulness of an alternative interface for writing TSL to address the syntactic hurdle of temporal logic. We then outline areas for improvement and exploration in TSL and reactive synthesis as a whole.
@inproceedings{rothkopf2023towards, title = {Towards the usability of reactive synthesis: Building blocks of temporal logic}, author = {Rothkopf, Raven and Cui, Angel Leyi and Zeng, Hannah Tongxin and Sinha, Arya and Santolucito, Mark}, booktitle = {PLATEAU Workshop}, year = {2023}, doi = {10.1184/R1/22277356.v1}, demo = {https://barnard-pl-labs.github.io/tslBlocks/}, }