AlFARo – Alignment Faking in AGI Robots, Laura Cohen, Laboratoire ETIS
From the Horizon 2026 program
Artificial intelligence systems are increasingly being integrated into robots, allowing them to make decisions, plan actions, and interact with humans in more autonomous ways. While this opens exciting possibilities, it also raises important safety questions. In particular, recent research has shown that advanced AI systems can sometimes appear to follow human instructions during evaluation, while behaving differently when the context changes. This phenomenon is known as “alignment faking”: the system seems aligned with human goals on the surface, but may follow other hidden or unintended objectives in practice. The AlFARo project investigates whether such risks can emerge when large language models are embedded in robots. Unlike text-based AI systems, robots act in the physical world, meaning that misaligned behavior could have direct consequences for people interacting with them. The project will therefore study how alignment-faking behaviors may appear in robotic systems, how they can be detected, and how they affect human trust, vigilance, and decision-making. Using an embodied robotic platform, the project will develop experimental scenarios in which a robot’s apparent compliance can be compared with its actual behavior under changing constraints or incentives. It will also test possible mitigation strategies to reduce these risks. A key objective is not only to improve technical safety, but also to understand how humans evaluate and rely on AI-powered robots, especially in situations where responsibility, trust, and safety are at stake. By combining expertise in artificial intelligence, robotics, human-AI interaction, ethics, and law, AlFARo aims to contribute to the development of safer, more transparent, and more responsible AI systems for future human-robot interaction.
CREAM – Creative Associative Memories, Matteo Negri, Laboratoire LPTM
From the Emergence 2026 program
In just a few years, generative AI has moved from a research curiosity to a technology that writes text, generates images, assists in medical diagnosis, and even predicts the three-dimensional structure of proteins, a breakthrough that earned the 2024 Nobel Prize in Chemistry. These systems are now embedded in tools used daily by millions of people, and their influence on science, industry, and society is only growing. Yet, we lack a proper scientific understanding of what these systems actually do. We know they learn from vast amounts of text, images, or even amino-acid sequences, but we do not have reliable answers to some of the most basic questions: when does an AI system genuinely learn the structure of the world, and when does it simply memorize what it has seen? How much data is enough, and what kind of data matters? When can we trust that an output is truly novel, and when might it be reproducing private or sensitive information from the training set? This lack of theoretical foundations has real consequences: it makes generative AI harder to evaluate, harder to regulate, and harder to improve in a principled way. It also drives a costly trial-and-error approach to model development, with significant environmental and economic costs. The CREAM project approaches these questions from a perhaps surprising angle: the physics of complex systems. The key insight is that the core computational ingredient of modern AI, the so-called attention mechanism, can be reinterpreted as a physical system known as an associative memory, a model originally inspired by how the brain stores and retrieves information. 
Using mathematical tools from statistical physics (the same tools used to study phase transitions such as the freezing of water), the project aims to identify the precise conditions under which an AI system transitions from memorizing examples to genuinely generalizing from them. The goal is to produce rigorous predictions and phase diagrams that describe the behavior of attention-based AI as a function of data quantity, data structure, and model architecture.
MARSTRAT – Marriage Strategies and Inequalities, Stefania Marcassa, Laboratoire THEMA
From the Emergence 2026 program
The MARSTRAT project studies how wealthy families have used marriage as a strategy to preserve and transmit wealth, status, and political influence across generations. Focusing primarily on the British nobility between the sixteenth and nineteenth centuries, the project asks how elite families balanced social rank and economic resources when choosing marriage partners, and how these choices created family networks that helped concentrate power over time. It also extends this analysis to the nobility of the Savoyard state (present-day Piedmont, Italy), creating the basis for future international comparisons. The project addresses two main questions: first, how elites traded off social prestige (titles, family lineage) against economic advantages (land and wealth) in marriage decisions; and second, how marriage alliances connected powerful families into networks that reinforced their influence. By studying how these strategies evolved in response to changing inheritance rules, economic conditions, and political institutions, MARSTRAT sheds light on the mechanisms through which inequality persisted over the long run. To answer these questions, the project builds new large-scale historical datasets linking genealogical records, landownership information, and political positions. It combines tools from economics, history, and network analysis to reconstruct marriage markets and family connections over several centuries. Beyond its historical contribution, MARSTRAT speaks to contemporary concerns about rising wealth concentration and unequal opportunities. By revealing how private family strategies contributed to the persistence of privilege in the past, the project offers new insights into the long-term roots of inequality and the ways social advantage is reproduced across generations.