Research

1) Philosophy of Cognitive Science

My dissertation consists of three chapters that were developed as individual papers yet all converge towards a single goal: to establish a philosophy of cognitive science for large language models (LLMs). Despite the rapidly shifting landscape of artificial intelligence, I seek to anchor the understanding of these systems in conceptual machinery that has stood the test of time and is likely to remain useful regardless of how this technology evolves.

Towards that end, each chapter offers an application of a classical framework from philosophy of science and/or the cognitive sciences to LLMs. Specifically:

The first chapter, “Dennettian Stances and Large Language Models,” works through the details of applying Daniel Dennett’s stances to LLMs, focusing on the interplay between the intentional and the design stance as a way to make explanatory progress on prototypical LLM behaviors. The second chapter, “Three Levels for Large Language Model Cognition,” adapts David Marr’s three-level analysis of complex systems to LLMs, aiming to elucidate the kinds of explanations that would be necessary and sufficient for understanding these systems. The third chapter, “The Complex Correspondence Between Behavior and Algorithm,” takes a mechanistic modeling perspective to address the challenges in describing the relationship between a system’s observable behaviors and their algorithmic representations.

These three frameworks are complementary to one another. They all seek to answer one question: what do we need to know to be able to explain and predict LLM behavior? In that sense, they constitute the theoretical groundwork for a science of LLM cognition and sketch out a series of relevant concerns that are orthogonal to how the field of artificial intelligence evolves. This is because they purport to extract the methodological principles that pertain to the philosophy of science of LLM cognition, rather than commit to, or assert, claims about LLMs as minds. Consequently, this is not a philosophy of mind project; it is a systematic effort in philosophy of science to advance the puzzle-solving that the deep learning revolution has brought about.

2) AI Alignment

I have worked on projects related to evaluating model capabilities and studying system internals. I am especially interested in modeling safety-relevant capabilities such as strategic deception, and I co-authored “A Problem to Solve Before Building a Deception Detector.”

3) AI Governance

As a GovAI Winter Fellow, I worked on policy for safely automating AI R&D and on relevant considerations for automating aspects of AI alignment research.

I maintain a Substack where I share shorter, non-academic versions of my research.

Recent Papers