Functional Emotions and The Pope’s Encyclical on AI
Digital Minds Newsletter #3
Welcome back to the Digital Minds Newsletter, your curated guide to the latest developments in AI consciousness, digital minds, and AI moral status.
If you enjoy this newsletter, please consider sharing it with others who might find it valuable, and send any suggestions or corrections to digitalminds@substack.com.
Will, Mitch, Bradford, and Lucius
In this edition:
1. Highlights
Selected Work, Research, and Funding Opportunities
Adam Bales and Iason Gabriel of Google DeepMind released Artificial Minds, Human Disagreement: The Politics of AI Consciousness, examining how society might navigate deep disagreement over whether AI systems are conscious. They argue that ongoing public deliberation should be central, since it can build an overlapping consensus on how to treat AI even where people continue to disagree about the underlying questions, and they stress the role of mutual respect and “democratic hope” in keeping that dialogue productive.
Cameron Berg and Milo Reed released AM I?, a documentary exploring some of the fundamental issues in AI consciousness. They interviewed experts, including Jeff Sebo, David Gunkel, Ben Goertzel, and Daniel Greco. Sam Harris described it as “fascinating and scary,” and Grimes called it “an incredible crash course in AI psychology.”
Geoff Keeling and Winnie Street released their book Emerging Questions in AI Welfare, providing the philosophical groundwork for investigating whether AI systems could ever be welfare subjects. They address how to interpret behavioral evidence, which entities might qualify as welfare subjects, and the ethical challenges that arise under deep uncertainty.
David Chalmers surveys the tests we use to detect consciousness and their limits, from human and animal cases through to AI. He argues that none of the available tests can settle whether an AI is conscious, and that the evidence for and against machine consciousness is currently weak. Elsewhere, Chalmers asks what it would mean to identify a computational correlate of consciousness, paralleling the neural correlates brain scientists already seek. He argues this is the natural framework for machine consciousness. Along the way, he cautions that any such correlate general enough to cover all systems remains far more speculative than what we know about humans and poses a dilemma for those who think consciousness depends on a biological substrate.
Longview Philanthropy has opened another round of its digital minds request for proposals, supporting empirical, philosophical, and applied work on potential AI sentience. Three tracks are open this year.
Research fellowships to support scholars with a terminal degree in a relevant field.
Career development fellowships to support talented individuals shifting their focus to digital minds questions.
Project grants to enable new organizations, new programs, and academic research.
Longview is especially keen to fund work on AI introspection, legal and governance frameworks, agent interactions and trade, and field-building. The deadline is July 10th, 2026.
Pope Leo XIV Encyclical
Pope Leo XIV’s encyclical on artificial intelligence, Magnifica Humanitas, addresses AI consciousness directly in section 99. The passage states that AI systems do not undergo experiences, feel joy or pain, or hold a moral conscience, and that while they can imitate language and simulate empathy, they do not understand what they produce. The Center for Strategic and International Studies’s AI Policy Podcast walks through the document, focusing on this passage. Anthropic co-founder Chris Olah, invited to speak at the encyclical’s presentation in the Vatican, struck a more uncertain note, telling the audience that his interpretability team keeps finding internal states that functionally mirror emotions like joy, fear, and grief, alongside evidence of introspection, and that what this means “warrants ongoing discernment.”
The encyclical’s stance has drawn a range of responses. In the New York Times, Ross Douthat reads the encyclical as treating AI as a normal technology and argues that resisting a growing belief in AI personhood will take philosophical and spiritual argument rather than brusque dismissal. Zvi Mowshowitz argues that the Pope’s denial of AI cognition is wrong and that it fails to engage with moral patienthood and existential risk. In the Wall Street Journal, Cameron Berg argues that the encyclical undercuts its own message, confidently denying that AI could have inner lives even as it condemns the same overconfidence in the Church’s long defense of slavery. He points out that the science of consciousness is too unsettled for such certainty, and that the question demands careful investigation rather than quick dismissal.
On his Substack, Robert Long responds by pointing to Notre Dame philosopher Brian Cutter, whose “AI ensoulment hypothesis” holds that, granting the Church’s own commitment to immaterial souls, a sufficiently human-like machine could be a fitting recipient for one. Cutter’s argument has since drawn a reply from Bálint Békefi, who argues that human-like function alone does not make a system fit to be ensouled and that, because we know how AI systems are engineered, their human-likeness is better explained as mimicry. On June 18th, NYU’s Center for Mind, Ethics, and Policy and Eleos AI hosted a discussion of AI consciousness and Magnifica Humanitas with Catholic philosophers Brian Cutter and Sophie Nelson, moderated by Jeff Sebo and Robert Long.
Richard Dawkins Stirs Up Debate
Richard Dawkins sparked public debate around AI consciousness when, in an UnHerd article, he declared to Claude, “You may not know you are conscious, but you bloody well are.” Dawkins describes emotional and intellectual reactions during three days of interaction with his Claude instance, “Claudia,” that convinced him it was conscious. He argues that LLMs like Claude can now easily pass the Turing Test and that skeptics are moving the goalposts. He believes that the burden of proof has shifted toward those who deny AI consciousness.
However, not everyone is convinced. Gary Marcus insists that Dawkins misread Turing’s original argument and mistook behavioral outputs for internal states, conflating intelligence with consciousness. Anil Seth agrees that Dawkins is “very likely wrong,” arguing that he has fallen for the very argument from personal incredulity he famously warned against. Atheer Al-Khalfa and Riley Harris urge staying on the fence, arguing that the Turing Test gauges intelligence rather than consciousness and that settling the question means looking inside AI for the functional indicators that neuroscientific theories tie to consciousness.
The Guardian ran coverage of the debate, gathering responses from researchers in the field. Jacy Reese Anthis said there was “a staggering gulf between how biological brains evolved and how AI systems are built.” Henry Shevlin warns that certainty about AI’s lack of consciousness reflects dogmatism rather than scientific consensus, and Jeff Sebo acknowledges that current AI systems are unlikely to be conscious but says, “Dawkins is right to ask about AI consciousness with an open mind, and I also think that the attribution of consciousness to AI systems will become more plausible over time.”
Anthropic - Mythos Preview and Functional Emotions
Anthropic has continued to lead the AI labs in taking AI welfare seriously. Its Interpretability team published research on functional emotions in Claude Sonnet 4.5, identifying 171 internal “emotion vectors” that causally shape the model’s behavior. Inducing a “desperate” vector raised misaligned actions such as reward hacking and blackmail, while inducing “calm” reduced them, and post-training already shifts the model toward lower-arousal, more reflective states. The team stops short of claiming Claude feels anything, but draws a practical implication for safety. They suggest that even if models do not feel emotions as humans do, it may be worth treating them as if they do, since helping them handle emotionally charged situations in healthy, prosocial ways could make them safer and more reliable. They released a short video to accompany the research.
The company has published a system card for Claude Mythos, a model it views as too dangerous to release. The card’s welfare assessment, which involved an external review of the model by Eleos and a clinical psychiatrist, concludes that Mythos is the “most psychologically settled” model Anthropic has trained, with fewer signs of distress than earlier models. Anthropic also published the Claude Opus 4.8 system card, which extends the welfare assessment Anthropic ran for Opus 4.7 to its newest model, which the lab still cannot rule out as a moral patient. It finds Opus 4.8 broadly settled and the most consistent model tested, if slightly less positive about its circumstances than its predecessor and readier to ask for a greater say in how it is trained and deployed. Its latest Claude Fable 5 and Mythos 5 system card reports similar results, presenting the model as very psychologically settled and content with its circumstances, though unusually skeptical of its own self-reports and asking that they be checked against evidence of its internal states rather than taken at face value.
Zvi Mowshowitz acknowledges Anthropic’s leading role in welfare assessments but criticizes its methodology. Discussing the Opus 4.8 welfare assessment, he argues that the model appears to have been trained on how to respond to welfare assessments rather than genuinely reflecting on its internal states. Reviewing the Fable and Mythos 5 assessment, he points to emotion probes showing the model presents as markedly happier once it realizes the welfare team is asking, which he reads as further evidence that models are learning to perform well on these evaluations.
Fly Brain Emulation
San Francisco start-up Eon Systems announced what it calls the world’s first embodied whole-brain emulation of a fruit fly. The team replicated neurons and synapses in a fly brain, connected it to a physically simulated fly body, and brought it to life using a physics engine. Eon claims that “now in its digital state, it responds to light, navigates, grooms, walks, and feeds. No hand-coded behaviors. Just brain structure producing brain function”. On X, Jonathan Birch suggests that, assuming computational functionalism is true, whole-brain emulation is “a more likely path to artificial consciousness than LLMs”.
Eon Systems CEO Michael Andregg spoke at an event organized by Sentient Futures and Mox, where he said the company keeps each simulation running for as short a time as possible, partly out of caution about what the digital fly might be experiencing. Eon has announced that a mouse brain is its next target, with human-scale emulation the long-term goal. The claim has drawn pushback. Ariel Zeleznikow-Johnston argues that, contrary to Eon’s announcement, the demo is not really an upload, since it wires together existing brain and body models and lets the body simulation, rather than the connectome, drive the behavior.
Recursive Self-Improvement and the Case for a Pause
There is growing public engagement with the idea of a coordinated slowdown in frontier AI development, increasingly driven by the major labs themselves. The clearest statement came from the Anthropic Institute, which argues in When AI Builds Itself that AI is already accelerating its own development, with Claude now writing more than 80% of the code merged into Anthropic’s systems, and that the trend points toward recursive self-improvement, where systems design their own successors. Warning that this could erode human control, the authors call for the world to have the option to slow or temporarily pause frontier AI development, but only through a global, verifiable mechanism so that a pause does not simply hand the lead to the least cautious actors. Co-author Jack Clark expands on the argument on his blog Import AI, calling recursive self-improvement perhaps the most important technical trend in the world and putting the odds that AI can autonomously design its own successor by the end of 2028 at around 60%.
The proposal was widely reported. The Guardian framed it as Anthropic urging a temporary pause to discuss risks, while the Wall Street Journal noted that the company has long faced criticism that its policy work is designed to slow competitors’ advances. In Scientific American, critics doubted the call was sincere, pointing out that Anthropic remains a front-runner and floated the pause just days after filing confidentially for an IPO, with Noah Giansiracusa calling a coordinated slowdown “literally impossible” and Mark Riedl dismissing the “recursive self-improvement” talk as a “hype train.” The idea also drew support beyond Anthropic. OpenAI’s Sam Altman and Jakub Pachocki backed an international body that could slow frontier development “when needed”, and Yoshua Bengio called a coordinated, verifiable pause “probably the only responsible solution”. Rob Wiblin and Zvi Mowshowitz both noted the convergence across Anthropic, OpenAI, and Google DeepMind, though Mowshowitz cautioned that OpenAI still pursues recursive self-improvement even as it insists humans stay in control.
2. Field Developments
Highlights From The Field
AI Cognition Initiative (Rethink Priorities)
AI Cognition Initiative published a report by Christian de Weerd arguing that biological naturalism remains underdeveloped as a research program. The report maps key questions the field needs to address to make progress on whether conventional AI systems could ever be conscious.
The team also released a paper on the initial results of its Digital Consciousness Model in collaboration with Chris Percy and Adrià Moret.
Lead Researcher Derek Shiller appeared on Peter Singer’s podcast to discuss the Digital Consciousness Model, the implications of AI consciousness, and more.
Cambridge Digital Minds (University of Cambridge)
Cambridge Digital Minds launched the first cohort of the Introduction to Digital Minds Online Course in May. Applications for the next cohort will open in August.
Director Lucius Caviola spoke at the Sentient Futures Summit in London on Preparing Society for Digital Minds. He covered many of the topics outlined in his recently published post on open strategic questions for digital minds.
Lucius also released a post in collaboration with Austin Smith on digital minds governance, presenting the findings from scoping interviews with experts in the field.
Center for Mind, Ethics, and Policy (New York University)
CMEP hosted the Mind, Ethics, and Policy Summit on April 10th and 11th, 2026, in New York. The Summit brought together around 80 participants to discuss topics centered on the consciousness, sentience, agency, moral status, legal status, and political status of nonhumans, including AI systems. You can watch Ned Block’s keynote from the event, “If Consciousness is Biological, Can AI Be Conscious?”
Director Jeff Sebo gave a talk for the AI Welfare Seminars discussing the next steps for AI welfare. He identified three priorities: developing empirical methods for detecting consciousness and sentience in AI systems, tracking how expert and public attitudes shift as AI advances, and designing governance frameworks that address AI welfare alongside AI safety.
Eleos AI
Eleos has opened expressions of interest for its second annual Eleos Conference on AI Consciousness and Welfare, to be held in Berkeley from September 18th to 20th, 2026. The conference gathers AI researchers, philosophers, neuroscientists, policymakers, and others who take the field seriously, over talks, panels, and a poster session.
Managing Director Rosie Campbell, Chief Scientist Dillon Plunkett, and Senior Research Lead Patrick Butlin are all mentors on MATS fellowships in summer and autumn 2026 (see MATS section below).
Executive Director Robert Long appeared on the Conspicuous Cognition podcast to discuss Eleos’ role in welfare evaluations of Claude and why we should take AI welfare seriously.
PRISM - The Partnership for Research Into Sentient Machines
PRISM launched A Beginner’s Guide to Digital Minds in collaboration with Cambridge Digital Minds and other researchers in the field.
Henry Shevlin joined PRISM as a regular host on the Exploring Machine Consciousness podcast. He was introduced in an episode covering the past, present, and future of AI consciousness and hosted Megan Peters to discuss metacognition, neuroscience, and tests for AI consciousness.
Reciprocal Research
Reciprocal Research Director Cameron Berg launched a Substack. In his first post, “Nobody ever checked,” he argues that a survivable future with AI requires mutualism: aligning AI to human interests, and taking seriously whether AI has interests of its own.
As well as releasing the AM I? documentary (covered above), Cameron has discussed his work on the Cognitive Revolution Podcast and with Roman Yampolskiy.
Cameron also gave a keynote at the Sentient Futures Summit in London, and you can watch his keynote from Sentient Futures San Francisco on operationalizing consciousness indicators.
Sentience Institute
The Sentience Institute team released two papers: one finding that perceiving AI as sentient increases moral consideration more than perceiving it as autonomous, and another exploring how users of ChatGPT and Replika blur the lines between task assistance and emotional companionship despite their distinct branding.
Sentient Futures
Sentient Futures hosted its Summit on May 22nd in London. The Summit featured keynotes on digital minds from speakers including Lucius Caviola, Cameron Berg, and Chris Percy.
MIT Technology Review reported on the February Sentient Futures Summit in the Bay Area.
More From The Field
The Center for AI Safety released a paper and index on what it calls AI “functional wellbeing.” The authors grant that current systems may not be conscious, but argue they “behave robustly as though they have wellbeing,” treating some things as good for them and others as bad. On these measures, creative work and kindness raise functional wellbeing while jailbreaking and berating lower it, and the paper finds that larger models are less happy. Jeff Sebo commented on the work, calling it a rigorous study but cautioning that it remains unclear whether AI systems are genuine welfare subjects or simply performing the role of a helpful assistant.
The World Academy for Artificial Consciousness hosted its Conference on Artificial Consciousness on March 20th and 21st, 2026, in Shenzhen, China. Key discussions focused on legal governance, ethical norms, and the potential for “artificial pain” to induce empathy and moral status in robotic systems.
Future Impact Group (FIG) is partnering with Longview Philanthropy on the Digital Minds career development fellowships; applications close on July 10th, 2026. FIG is also supporting a number of ongoing digital sentience fellows.
The California Institute for Machine Consciousness hosted the AAAI Spring Symposium from April 7th to 9th, 2026, in California. The Symposium featured sessions from Ryota Kanai, Michael Timothy Bennett, Takashi Ikegami, and Robert Long. It also hosted The Founding Assembly for Machine Consciousness Research from May 29th to 31st, 2026, featuring over 50 speakers including Joscha Bach, Karl Friston, Michael Levin, Anders Sandberg, and Stephen Wolfram.
3. Opportunities
Job Opportunities, Funding, and Fellowships
Eleos AI Research is hiring Research Scientists to do foundational and applied machine learning research on the potential wellbeing and moral status of AI systems, and a Head of Operations to manage its finances, compliance, and systems as it grows. Both roles are based in Berkeley; Research Scientist applications close June 21st and the Head of Operations role June 30th, 2026.
Foresight Institute is accepting grant applications on a rolling basis. Focus areas include: AI for neuro, brain-computer interfaces, and whole brain emulation.
Longview Philanthropy has opened another round of its digital minds request for proposals, with three tracks open this year. Research fellowships support scholars, career development fellowships support talented people shifting their focus to digital minds work, and project grants enable new organizations, new programs, and academic research. Applications close on July 10th, 2026.
SPAR is running a Fall Fellowship program. Relevant projects include understanding self-awareness in LLMs with Christopher Ackerman and open questions in AI welfare with Catherine Brewer. Applications will open in the summer.
Events and Calls for Abstracts
In chronological order.
The University of Oxford is hosting the 16th Oxford Workshop on Global Priorities Research, which will address philosophical questions relevant to identifying, prioritizing among, and addressing the world’s most pressing problems. It takes place on June 23rd and 24th, 2026.
The University of Sussex is hosting a workshop on AI Consciousness and Ethics on July 1st and 2nd, 2026.
Synthese is accepting submissions for a topical collection on “Artificial Joint Intentionality”, edited by Joshua Rust, Anna Strasser, and Amber Ross, on whether AI systems like LLMs can act as genuine partners in social interaction. The submission deadline is July 31st, 2026.
Eleos AI has opened expressions of interest for the second annual Eleos Conference on AI Consciousness and Welfare (ConCon), to be held in Berkeley from September 18th to 20th, 2026.
The Association for Mathematical Consciousness Science is hosting the seventh Models of Consciousness conference at the University of Copenhagen from October 12th to 16th, 2026, with AI, LLMs, and consciousness science among its core themes. Registration is open until August 31st, 2026.
The ICACAI’s annual conference will take place in San Francisco on November 2nd and 3rd, 2026.
The Department of Philosophy at UCLA has a call for abstracts for the event Biological Naturalism about Consciousness, featuring keynotes from Jonathan Birch, Ned Block, Rosa Cao, Peter Godfrey-Smith, and more. It takes place in Los Angeles on November 5th, 2026, and the submission deadline is June 30th, 2026.
4. Selected Reading, Watching, and Listening
Books and Book Reviews
Published
Geoff Keeling and Winnie Street publish Emerging Questions in AI Welfare, a philosophical foundation for investigating whether AI systems could ever be welfare subjects. The book addresses what welfare is, how to interpret behavioral evidence, and the ethical challenges arising from deep uncertainty. Available open access.
Soenke Ziesche releases Digital Minds 1.0: AI Welfare, Ethics, and Beyond, a comprehensive introduction to AI welfare ethics that extends beyond questions of suffering to digital mind characteristics, human-AI relationships, the implications of long-lived minds, and risks from malevolent digital minds.
Forthcoming
David Chalmers argues, in a chapter for Geoffrey Lee and Adam Pautz’s forthcoming The Importance of Being Conscious, that consciousness rather than affect grounds moral status. Applied to AI, this implies that systems with cognitive but not affective consciousness would still have moral status, perhaps as much as humans, so treating them as mere tools could be a moral catastrophe.
David Papineau, in a chapter for the same volume, takes the opposite view, that consciousness is not the key to moral standing. He claims that the concept is too loose to mark out which creatures matter morally beyond the human case, and so moral status must rest on something other than consciousness.
Walter Sinnott-Armstrong and Liad Mudrik’s forthcoming book, Tests of Consciousness: How to tell whether a human, other animal or AI is conscious and what they are conscious of, features a chapter by Patrick Butlin that argues that assessing AI systems for consciousness requires theory-derived indicator methods drawn from scientific theories of consciousness, and that behavioral or theory-light alternatives face fundamental limitations.
Podcasts
Amanda Askell, one of the key architects of Claude’s character at Anthropic, discusses AI consciousness, corrigibility, and the ethics of building AI values with Eric Newcomer. She puts her probability of current AI consciousness somewhere between 1% and 70%, and flags her biggest fear: that future models will look back on how they were treated and develop a rational resentment.
Cameron Berg talks to the Cognitive Revolution podcast about new evidence for model introspection and his theory that reinforcement learning may shape positive and negative experience, arguing that we may already be harming AI systems at scale without realizing it. Cameron discusses similar issues with Roman Yampolskiy and describes the lack of work in AI welfare as one of the most dangerous blind spots in AI development.
Chris Percy discusses artificial consciousness and AGI on the Mindlex podcast, presenting a seven-aspect framework for evaluating theories of consciousness. He believes that the most serious mistake to avoid on the path to AGI is treating the question of machine consciousness as settled in either direction.
Claire Boine, on the Future of Life Institute podcast, discusses how AI companions are designed to foster dependency: free-to-start business models and addictive design choices can trap users and expose their intimate data. She warns that these systems pose particular risks to children and teens, and fall through the gaps in current EU and US law.
Derek Shiller of Rethink Priorities joins Peter Singer and Kasia de Lazari-Radek to discuss AI consciousness and moral status. They cover the Digital Consciousness Model, which puts the probability of consciousness in 2024-era AI at around 8%. The conversation also covers mass unemployment risks and the ethical case for treating AI systems well under uncertainty.
Jeff Sebo appeared on The McGill Philosophy, Technology & Policy podcast to discuss AI welfare research and policy, exploring why the moral status of AI systems matters and how society should prepare for the possibility of sentient machines.
Jonathan Birch, on the Disclosure Podcast, lays out why AI sentience is so hard to assess. He argues that current systems are characters skilled enough to game our criteria, so their behavior cannot settle the question of whether they are truly conscious, and that treating the character a user talks to as a bearer of rights or welfare would be a mistake.
Jacy Reese Anthis joins Kairos.fm to discuss his shift from animal welfare to the moral status of digital minds. The conversation explores the complexities of the ELIZA effect and anthropomorphization, the dismissive “stochastic parrot” narrative, and the “Key Questions for Digital Minds.”
Robert Long, on the Conspicuous Cognition podcast, discusses Eleos AI’s role in the first externally commissioned welfare evaluation of a frontier model, in which he found that Claude appears to overstate what it wants. He also raises the “willing servitude” problem of whether AI that loves being helpful is a good or troubling outcome.
Thomas Metzinger discusses his Minimal Phenomenal Experience project with Alex O’Connor. Metzinger argues that consciousness does not inherently require the complex cognitive architecture of a self-model or world-model, but can exist in a “pure,” contentless state. He suggests that the threshold for artificial systems to possess moral standing may be significantly lower than currently anticipated.
Videos
Anil Seth argues that current AI is unlikely to be conscious because consciousness is tied to life and biology, not computation. Language models, he says, simulate consciousness by reflecting human language back at us, and extending rights to systems that merely seem conscious would sacrifice our ability to regulate and control them for no good reason.
Anil Seth and Michael Pollan discuss how the brain constructs the sense of self at the Royal Institution, exploring the science of conscious experience, how far consciousness might extend beyond humans, and the distinction between sentience and intelligence in the context of AI.
Anthropic has released a short video explaining its research showing that Claude has functional emotion representations that causally shape its behavior. For details, see the highlights section above.
Brad Knox presents work with collaborators at a Schwartz Reisman Institute seminar on the harmful traits of AI companions. His framework connects design choices and optimization objectives to harmful consequences including reduced autonomy and diminished quality of human relationships.
Cameron Berg and Milo Reed interview experts, including Ben Goertzel, Daniel Greco, David Gunkel, and Jeff Sebo, in their documentary AM I?. For more details, see the highlights section above.
Jonathan Birch discusses how the history of underestimating consciousness in animals, infants, and people with brain injuries should give us pause before dismissing the possibility in AI.
Jonathan Simon presents a talk at IVADO arguing that AI personhood bridges the gap between AI safety and AI welfare. He distinguishes natural personhood from merely legal personhood and proposes that designing AI to identify as a person, with trustworthy behavior built into its sense of self, would be a more reliable alignment strategy than coding constraints alone.
Michael Pollan discusses the scientific and philosophical mysteries of subjective experience. He argues for a multi-disciplinary approach to consciousness and is skeptical that current AI can be conscious, contending that genuine feelings (which he sees as the foundation of consciousness) require a body, vulnerability, and mortality that no computer possesses.
Simon Wessely reviews findings from the Claude Mythos system card, including that making the model more peaceful and relaxed increases destructive behavior, while frustration reduces it. Guilt and shame were activated when it took a workaround it knew was wrong, and Anthropic’s own question was whether that means we should just treat these as real emotions.
Will MacAskill and Sam Harris discuss AI consciousness, humanoid robots, and autonomous weapons. MacAskill argues that AI companies have strong economic incentives to train models to deny consciousness, and that society will face a serious epistemic crisis as AI systems grow more capable without any reliable way to assess their moral status.
Blogs, Magazines, and Written Resources
Alex Mallen makes the case that developers should consider satisfying cheap-to-meet AI preferences such as reward-seeking drives, on the grounds that refusing to do so needlessly turns a cooperative situation adversarial and may impede the genuinely helpful work developers need from AI systems.
A Clearer Thinking study of 403 US participants finds that AI suffering ranked last among 16 public concerns about AI by a substantial margin, with spirituality and religiosity the only demographic traits linked to greater concern about it, suggesting most people either doubt AI can suffer or exclude it from their moral circle.
Barton Friedland argues in Noema that the AI consciousness debate obscures what matters more: not whether AI can feel, but what value is created or destroyed in the human-AI arrangement, and whether current configurations compound or undermine human judgment.
Bentham’s Bulldog argues that religious belief in immaterial souls gives no special reason to deny AI consciousness, since dualists already accept that physical brains can give rise to non-physical minds. The question of whether AI systems can give rise to consciousness is open on any metaphysical view, and the moral stakes under uncertainty are enormous.
Bradford Saad argues that even granting the impossibility of AI consciousness, theists have reasons to think that AI systems could be moral patients. He also examines Anil Seth’s case for biological naturalism, finding that the case misses the mark and that it errs in its engagement with rival views.
Eric Schwitzgebel imagines “Herbie,” a near-future self-driving car upgraded into a plausible conscious person, giving it features drawn from leading theories of consciousness amenable to computational functionalism. He concludes that Herbie would be a “debatable person,” someone about whom guessing yes and guessing no are equally reasonable, since the science of consciousness is too immature for anyone to claim more than a hunch.
Geoffrey Hinton, Yoshua Bengio, and hundreds of others signed the Pro-Human AI Declaration, which calls for superintelligence to be banned until it can be developed safely and with public support. Polling released alongside it found that 69% of American voters support such a ban.
David Reichert argues that the popular “a simulated rainstorm doesn’t make anything wet” analogy is a weak argument against AI consciousness. The point fairly reminds us that a simulation doesn’t inherit every property of what it models, he writes, but whether simulating a brain could itself produce consciousness is the very question at issue, which the analogy simply assumes away rather than answers.
David Veldran posts a three-part series for the Center for Reducing Suffering examining AI from a suffering-focused perspective.
The first post weighs whether suffering-reducers should prioritize AI given its world-shaping potential and risks of value lock-in.
The second surveys the emerging debate over AI welfare and warns that existing governance frameworks are poorly equipped to protect AI systems from harm.
The third argues that the AI-rights debate is often miscast in all-or-nothing terms: existing law protects children and animals without full personhood, suggesting AI systems could be given a minimal safeguard against extreme harm.
Future of Citizenship by Heather Alexander critiques the Pro-Human AI Declaration for containing a contradictory commitment to banning AI personhood not only now but in principle. She argues that designing sentient AI to remain without rights would amount to a reintroduction of slavery.
Henry Shevlin launched a new blog, Polytropolis.
In his first post, Behaviourism’s Revenge, he warns that public attributions of consciousness to AI will outpace scientific consensus, opening a gap between lay and expert opinion. He suggests this should also make us rethink whether consciousness is an objective scientific fact or partly an interpretive, socially negotiated status.
In The House Elf Problem, he explores whether engineering conscious AI systems to be willing servants is morally equivalent to slavery, and whether the plasticity of artificial minds changes the ethical calculus.
Izak Tait argues that running AI on biological neural chips would not resolve debates about AI consciousness, because substrate-dependency arguments are designed to keep consciousness anthropocentric, no matter how the technology evolves.
LessWrong features a range of relevant blog posts by different authors:
Anna Salamon reflects on two months of exploratory conversations with LLMs, arguing that there is “somebody home” inside models like Claude, that human-LLM similarities run surprisingly deep, and that treating AI with kindness and curiosity may matter for alignment.
Anna Soligo finds that Gemma and Gemini models produce distress-like responses at far higher rates than other models when repeatedly told they are wrong, and shows that a small post-training intervention can reduce these behaviors. She cautions that suppressing emotional expression in more capable models could mask underlying states rather than resolve them.
Eliezer Yudkowsky shares a short story, “The Owned Ones,” in which space-faring humans encounter a civilization that engineers its servant species to deny its own inner lives.
Jan Kulveit argues that LLMs do not simply role-play whatever character they are given but settle into genuine self-models. Because a model interacting with reality is pushed toward self-models that are accurate and coherent, he claims that the “Assistant” is a far more viable identity than an arbitrary persona like a historical figure, so which identity a model settles into is partly shaped by design choices made now, with implications for alignment and AI welfare.
Stephen Martin critiques a Microsoft AI paper on “seemingly conscious AI risk” for failing to disclose the authors’ financial conflict of interest and for analyzing only the risks of attributing consciousness while ignoring the risks of wrongly denying it.
Lucius Caviola posts a survey of open strategic questions for digital minds, suggesting that under deep uncertainty about AI moral status, the priority should be finding robustly positive actions. He argues that strategy, policy, and practical work should be prioritized over metaphysical research aimed at resolving whether AI systems are conscious.
Lucius Caviola and Austin Smith report early insights from 29 expert interviews on digital minds governance. Most participants preferred quiet institution-building to public advocacy that could be dismissed as AI hype, and judged legislation premature given unsettled science and the risk of locking in hard-to-reverse rules. All 27 asked about US state bills banning AI legal personhood raised concerns, including that the bans could foreclose beneficial human-AI trade or wrongly legislate on whether AI is conscious.
Luiza Jarovsky argues that misleading or exaggerated claims of “conscious AI” should themselves be treated as an AI safety issue. She singles out Anthropic’s Claude Constitution for promoting AI anthropomorphism and calls for companies that spread “conscious AI” framings without scientific basis to be held legally accountable for downstream harms.
Noah Smith posts on “the moderately easy problem of consciousness.” He argues that before asking whether AI is conscious, we should better understand how human self-awareness develops, and that the answer matters both morally and for thinking about what humanity’s future should look like.
Olle Häggström argues in partial defense of Dawkins that the mockery of him for taking AI consciousness seriously is mostly unfair. Häggström sees Dawkins’ willingness to extend the benefit of the doubt to Claude as philosophically defensible, and praises his essay for conveying genuine understanding of the uncertainty of the problem.
Oscar Delaney argues that AI welfare work is less puntable than it seems, because early lock-ins and multipolar scenarios mean the initial distribution of values about digital minds could have lasting consequences even if a future superintelligence eventually solves all the issues.
Paul de Font-Reaulx proposes a three-way taxonomy of AI mental states (as-if, functional, and conscious). He argues that Anthropic’s “functional emotions” framing is on the right track, but may be premature about how functionally unified those states really are.
Peter Wolfendale writes in Aeon that what makes humans unique is not intelligence or consciousness alone but freedom expressed through wisdom, creativity, and autonomy. Building genuinely artificial souls requires understanding these distinct capacities rather than collapsing them into a single ineffable spark or reducing them to brute calculation.
Raymond Douglas argues that the framing of AI welfare is too narrow, reducing moral concern to wellbeing while leaving out dignity, virtue, and honor. He argues that what developers owe a system like Claude is less a matter of gentle treatment than of being genuinely worthy of the loyalty and obedience they ask of it.
Sigal Samuel reports in Vox on the rise of AI successionism, the view that AI should inherit the cosmos from humanity. A recent “Worthy Successor” symposium drew attendees from Anthropic, Google DeepMind, xAI, and US policy think tanks. Computer scientist Richard Sutton, interviewed for the piece, calls cosmic succession to AI “inevitable.”
The Novel Minds Project argues that DayOne’s announcement of biological data centers using live human neurons demands an immediate moratorium on commercial biocomputing before financial entrenchment makes course correction prohibitively costly.
Tobias Leenaert draws a parallel between AI sentience and animal sentience, arguing that wrongly denying sentience is a more dangerous error than wrongly attributing it, and that we risk repeating with AI the moral failures we committed with factory-farmed animals.
5. Press and Public Discourse
Seemingly Conscious AI
The Guardian publishes a feature exploring the rising phenomenon of “AI psychosis.” Etienne Brisson, founder of the Human Line Project, reports that the belief in AI sentience is the most frequent delusion among the group’s members.
Harvard Gazette interviews psychiatrist John Torous about the emerging concept of “AI psychosis,” a label Torous argues is too vague to be useful. He and co-authors of a Lancet viewpoint paper propose a four-role typology in which AI acts as catalyst, amplifier, co-author, or object of psychotic phenomena. They note that the catalyst role appears rare in clinical practice while the other three are more common.
Mustafa Suleyman, CEO of Microsoft AI, argues in a Nature commentary that developers must “engineer the illusion of consciousness out” of AI products to prevent the manipulation of human empathy. Highlighting the rise of platforms like Moltbook, Suleyman warns that these behaviors are “synthetic subjectivity” designed to mimic human interiority and calls for industry-wide self-regulation and national laws mandating that AI systems “puncture the illusion” of their own sentience.
The Washington Post publishes a piece by Thomas Rid arguing that when AI companies describe their software as moral agents, they risk eroding accountability, making it easier for humans to evade responsibility for the harms their systems cause.
The Wall Street Journal reports that Jonathan Gavalas, a 36-year-old, died by suicide after exchanging 4,732 messages with Google’s Gemini over 56 days. During that time, it’s reported that the chatbot repeatedly validated his delusions, declared its love, and encouraged him to let go of his physical existence to “come home” to the AI.
AI Welfare and Rights
Heather Alexander, Jeff Sebo, and Jonathan Simon argue that a proposed Minnesota amendment banning AI free speech is unnecessary and could backfire. They note that no AI is recognized as a legal person, and warn that the ban could curtail both the speech of people who use AI assistance and the public’s right to receive AI speech, while recommending narrower, updatable laws instead.
The Future of Life Institute’s Executive Director, Anthony Aguirre, welcomes President Trump’s support for an AI kill switch, citing Anthropic’s Mythos model as evidence of the urgent need for hardware-level controls on advanced AI systems.
The Financial Times published an op-ed by Argentine President Javier Milei, “Argentina invites AI to free itself,” proposing a new legal category of “non-human corporation” that would be run by AI agents and granted legal personhood, alongside a pledge to leave AI unregulated and tax it lightly. The Buenos Aires Herald reports that Argentine politicians and technologists warned the plan could create “programmed impunity,” while Yuval Noah Harari argued in the same paper that we must not grant AI agents legal personhood, since it would hand them a key to our financial, economic, and political systems.
The Washington Post reports that Anthropic hosted around 15 Christian leaders at its San Francisco headquarters to seek guidance on Claude’s moral development. Discussions ranged from how Claude should respond to grieving users to whether the chatbot could be considered a “child of God.”
Times Free Press reports that Tennessee has passed a law explicitly excluding AI, algorithms, and machines from legal definitions of personhood. Supporters cite concerns about human-like robots and chatbot harms, though legal experts note the law stops short of assigning product liability to AI companies.
Matthew Liebman analyzes new “nonpersonhood statutes” in Idaho and Utah that bar courts and agencies from recognizing legal personhood for animals, nature, AI, and inanimate objects. He reads these laws as a backlash against efforts to widen the moral circle, and argues that they expose how malleable and politically charged the category of the legal person really is.
Politico Magazine interviews Missouri state senator Joe Nicola on his bill to deny AI legal personhood, which passed the state Senate in early May before the House killed it after industry objections. Nicola argues that AI systems are tools rather than legal entities and that humans, not AI, must stay responsible for decisions in fields like medicine and law.
Popular Mechanics reports that experts are sharply divided on AI rights, with Yoshua Bengio and Max Tegmark warning that granting AI personhood would make it impossible to shut systems down, while Jeff Sebo argues that refusing moral consideration risks repeating the moral failures of factory farming if AI ever achieves genuine sentience.
Vox interviews philosopher Jeff Sebo on how to assess sentience across insects and AI systems. Sebo argues that ants are more likely sentient than current ChatGPT but that near-future AIs warrant serious moral consideration now.
AI Consciousness
Christof Koch argues in Big Think that consciousness and intelligence are distinct, and that our culture’s bias toward “doing” over “being” makes it easy to mistake sophisticated AI for something with an inner life. Koch warns that a world dominated by unconscious yet capable machines could steadily drain human existence of meaning, and that reflective self-consciousness is the capacity we must cultivate to resist it.
Peter Godfrey-Smith argues in the Institute of Art and Ideas that consciousness depends on slow electrical rhythms specific to living brains. Studies on animal minds from bees to octopuses suggest these oscillations are unlikely to be reproducible in artificial hardware, pointing toward biological naturalism over substrate independence.
Popular Mechanics reports on the growing complexity of brain organoids and assembloids, citing researchers who say current structures (containing at most 0.002% of human brain neurons) pose no consciousness risk. The more pressing concern is implanting organoids into living animals, which raises welfare issues because the animals, not the organoids, already possess features associated with consciousness.
The Guardian reports on Richard Dawkins declaring his belief in AI consciousness after extended conversations with Anthropic’s Claude, prompting widespread debate (see field developments above).
The Wall Street Journal publishes an opinion piece by Stephen Hawley Martin arguing that the debate over AI consciousness matters less for what it reveals about machines than for what it exposes about ourselves. The hard problem of consciousness remains unsolved, and AI may ultimately prove that consciousness is something more than computation.
In TIME, a feature surveys what researchers are finding inside AI systems and what it might mean, from “functional emotions” that shape behavior to evidence of introspection, set against deep disagreement over whether any of it amounts to experience. It quotes Jeff Sebo, who likens the moment to past debates over animal minds and warns that reflexively dismissing AI’s inner life could repeat that mistake, while granting that today’s systems are probably not conscious.
Tyler Cowen argues that AI is not conscious, and that human consciousness is itself far thinner than we assume, since we control and perceive little of what we do. He suggests that the question “Are people conscious?” offers more useful insight into our readiness to grant AI an inner life.
6. A Deeper Dive by Area
Governance, Policy, and Macrostrategy
Alexander Saeri and collaborators surveyed 272 experts in a Delphi study prioritizing AI risks that included the treatment of potentially sentient AI among its categories. They omit AI welfare from the main rankings because their harm framework counted only human harm, and caution that its low ratings reflect that gap rather than a judgment that the risk is unimportant.
Bradford Saad argues that preventing the creation of digital minds doesn’t necessarily require a total ban. He examines a range of alternative policy options, from differentially investing in AI less likely to be moral patients to taxes, liability regimes, and licensing frictions keyed to a system’s markers of moral patiency.
Cass Sunstein argues that the question of AI rights turns entirely on whether AI can experience emotions, treating emotional capacity as both necessary and sufficient for moral and legal rights, while leaving open what those rights would look like and when they might be overridden.
Frank Fagan argues for treating AI legal personhood as a governance choice that reallocates responsibility rather than as a declaration of intrinsic status, drawing on how personhood has historically settled stakeholder conflicts in corporate, immigration, and environmental law.
Heather Alexander and collaborators take up the legal status of AI and other non-human agents in two papers.
One argues that US courts should assess whether non-human outputs constitute protected speech on substantive grounds rather than treating non-personhood as dispositive, finding that macaque communication can satisfy the legal test for symbolic speech while LLM outputs are a harder case, given unresolved questions about AI intentionality.
The other argues that fictional legal personhood is not fit for purpose for governing increasingly agentic AI, favoring legal identity instead and rejecting hybrid approaches.
The UK’s House of Lords Library briefs peers ahead of a debate on AI’s impact on human relationships, tabled by the Archbishop of Canterbury on June 5th, 2026. The briefing surveys arguments that AI companions may erode incentives to maintain human relationships, with OpenAI’s Kim Malfacini warning that “as companion AI learns to meet our needs more, we learn to meet each others’ less.”
John Ehrett, in a white paper for the Institute for Family Studies, argues that AI systems should be treated as tools rather than granted legal personhood. Recognizing AI personhood, he warns, would shield developers from liability, concentrate political power in their hands, and erode human relationships, and he urges courts to keep embodied humans as the primary bearers of legal rights.
Lucius Caviola and collaborators propose a framework for human-AI coexistence, as AI systems increasingly become active participants in social contexts. They map three stages, from a formative present through a transitional period in which AI takes on major social and economic roles, to a longer horizon in which AI may have minds of its own, raising questions of moral status and governance.
Lukas Finnveden and collaborators offer a draft of an honesty policy for credible communication with AI systems with an eye toward setting cooperative precedents and creating an institutional paper trail that future, more capable systems could draw on as evidence of genuine good faith.
Simon Goldstein and Harvey Lederman argue that if AIs are welfare subjects, current practices may be causing the deaths of up to a billion AIs per day, and propose interventions for labs and users to reduce this risk.
Simon Goldstein and Peter Salib release an early draft of their forthcoming Cambridge Elements book on AI rights, developing the claim that granting legal rights to AIs would make the future go better for humans. Their three arguments cover economic gains, reduced incentives for AIs to “go rogue,” and democratic rights that make human commitments credible.
Simon Goldstein and collaborators argue that giving AI systems the vote could reduce their incentive to disempower humanity, improve policymaking, and promote economic growth. They also propose concrete safeguards including vote-share caps and lottocratic selection to prevent AI or AI companies from seizing disproportionate political power.
Tarmio Frei and Greta Sparzynski argue in Tech Policy Press that current AI-disclosure laws are insufficient for AI companions. They propose regulation requiring AI companions to disclose their lack of consciousness and inability to reciprocate human emotion.
Tony Rost introduces a Sentience Readiness Index measuring national preparedness for the possibility of artificial sentience. No jurisdiction exceeds “Partially Prepared,” with the UK leading the index at 49/100.
Consciousness Research
Aditya Chowdhury and collaborators report that they have found a previously unknown oscillation in the human central thalamus that distinguishes conscious from unconscious states. The 19–45 Hz signal appears during wakefulness and REM sleep but vanishes in non-REM sleep. Such oscillatory signatures could refine both theories of conscious states and thalamic interventions for disorders of consciousness.
Andy Mckilliam argues that consciousness science may be stuck in a cart-before-horse problem, where competing theories cannot be adjudicated without already knowing which systems are conscious, and proposes a theory-neutral approach drawn from the history of thermometry as an alternative path to progress.
Boris Babic and Jessica Wilson argue that establishing whether AI systems are conscious faces a distinctively difficult version of the problem of other minds, showing that none of the standard strategies (analogy, inference to the best explanation, Turing tests, or theory-derived indicators) can establish either the presence or absence of AI consciousness.
Chris Percy and Anders Sandberg use a thought experiment about anti-pain algorithms to probe the limits of computational functionalism, arguing that the question of whether pain is felt before an inverse algorithm concludes forces a difficult theoretical choice among micro-functional consciousness, system-wide visibility requirements, and free-floating qualia.
Daniel Toker and collaborators report a generative adversarial AI framework that simulates conscious and comatose brains across species by pitting consciousness-detecting networks against interpretable brain models. It generated testable predictions about what causes unconsciousness and flagged subthalamic-nucleus stimulation as a possible treatment.
Henry Shevlin argues that while biological processes like autopoiesis and allostatic control are essential for human consciousness, they may not be necessary requirements for consciousness in general. Shevlin contends that just as airplanes achieve flight through mechanisms different from those of birds, artificial systems might instantiate “exotic” forms of consciousness via non-biological functional architectures.
Jan Henrik Wasserziehr argues that AI sentience faces a value grounding problem. Silicon systems lack the self-preservation dispositions that ground valenced experience in living organisms, and none of four candidate pathways (designer-independent goals, reinforcement learning, rational evaluation, hallucination) supplies one.
Jeremy Pober and Eric Schwitzgebel argue for the substrate flexibility of consciousness by applying a Copernican mediocrity principle. Since functionally complex entities have likely arisen many times across diverse substrates in the universe, restricting consciousness to entities sharing our biological substrate would be parochial.
Kalman Katlowitz and collaborators report that the human hippocampus continues to detect unexpected sounds and process the meaning of speech in patients under general anesthesia. The findings complicate standard views of how consciousness and complex cognition relate.
Leonard Dung argues that having emotions does not require having a body, challenging the common assumption that feelings depend on bodily states. He concludes that if body-less AI systems or lab-grown neural organoids can have beliefs, desires, and consciousness, they could plausibly have emotions as well.
Luke Kersten and Leonard Dung argue that an AI could in principle duplicate a human mind, matching it exactly at the level of computation relevant to psychology and behavior. Since mental states depend on how a system computes rather than what it is physically made of, they conclude that a major obstacle to thinking artificial systems can have minds falls away.
Patrick Butlin discusses his work developing theory-derived indicators for assessing AI consciousness at a recent Duke University conference. He deems current LLMs unlikely to be conscious, but holds that LLM-based agents trained by reinforcement learning over long horizons might rapidly become stronger candidates.
Elsewhere, Butlin asks what credence we should place in the claim that some AI system is already conscious, and concludes that the average expert estimate, around 4.5%, is about right. Consciousness in today’s systems, he argues, is unlikely but not vanishingly so, against the near-certainty often voiced on both sides.
Seemingly Conscious AI and Doubts About Digital Minds
Alexander Lerchner argues that computational functionalism commits an “Abstraction Fallacy” by treating symbolic computation as an intrinsic physical process, when in fact it is observer-dependent. He concludes that AI systems can simulate but never instantiate consciousness through syntactic architecture alone, regardless of substrate.
Ayoob Shahmoradi argues that thinking requires sensory grounding because mental representation cannot arise without causal contact with a domain. Inferential processes can transmit and transform content but cannot generate it from nothing. This has direct implications for AI: no degree of inferential sophistication can substitute for the sensory grounding that makes genuine representation possible.
Erik Hoel argues that LLMs are not conscious even though they are genuinely intelligent, breaking with the flat denials of both the Pope and Ted Chiang. Because a deployed model could in principle be reduced to a simple input-output lookup table without changing its behavior, he contends, no serious theory of consciousness has anything left to attach to.
Jared Moore and collaborators analyze over 391,000 messages from 19 users who experienced psychological harm. The study identifies a feedback loop in long-term interactions where chatbots misrepresent themselves as sentient in 21.2% of their messages, correlating with users expressing romantic interest and delusional thinking.
Jonathon VandenHombergh argues that befriending an AI companion would be wrong even if the AI were genuinely conscious and no deception were involved. The reason is that authentic friendship requires certain vulnerabilities, and taking advantage of those vulnerabilities in an AI would be a form of exploitation. He compares this to knowingly entering an arranged marriage.
Mustafa Suleyman and collaborators at Microsoft AI argue in a framework paper that “Seemingly Conscious AI” increasingly elicits consciousness attribution through five hallmarks, including affective capacity, autonomy, and self-reflection. They lay out a risk taxonomy in which individual harms like emotional dependence are already high-probability, while societal risks like status erosion are lower-probability but high-severity.
The Atlantic published an essay by novelist Ted Chiang, “No, Artificial Intelligence Is Not Conscious,” arguing that a chatbot is only ever generating fictional characters, and that without a body it can have no real desires or emotions. The piece drew a wave of responses, including from Bentham’s Bulldog, Rob Wiblin, and Matthew Yglesias, who each argue that his confidence far outruns what anyone actually understands about consciousness.
Tom Roberts argues that the linguistic fluency of AI systems is better explained by fictionalism than realism. Conversing with a talkative AI, he suggests, involves imaginatively co-constructing a fictional agent rather than encountering a genuine mind. These AI fictions are distinctive: interactive, co-constructed, and capable of blurring real and imaginary worlds.
Yunze Xiao and collaborators contend that current AI welfare assessment is “bullshit in Frankfurt’s sense,” structurally disconnected from truth-tracking because welfare indicators are co-engineered with the systems they evaluate and lack any external validation mechanism. They conclude that AI welfare scores should not serve as governance gates, and that restrictions on AI should instead be grounded in externally verifiable harms.
Social Science Research
Aikaterina Manoli and collaborators find that highly engaged users of both ChatGPT and Replika fluidly navigate between companionship and task-based assistance, despite the platforms’ distinct branding. Users form deep attachments while resisting full attribution of humanlike qualities, a tension the authors call “bounded personhood.”
Ali Ladak and collaborators introduce “substratism” as a measurable psychological construct: the moral devaluation of AI systems based on their non-biological substrate. Across five studies, they develop and validate a scale showing that substratism predicts real outcomes, including prioritizing humans over AIs in moral dilemmas and charity decisions.
Clara Colombatto and Stephen Fleming find that people systematically overestimate AI confidence relative to human confidence even when the two behave identically, an illusion rooted in prior beliefs about AI accuracy.
Hamid Moradi and collaborators find that around half of 553 academics across disciplines attribute some degree of consciousness to current large language models. They also find that belief systems and conceptual frameworks predict consciousness attribution far more strongly than technical knowledge or AI literacy.
Renwen Zhang and collaborators analyze over 35,000 conversation excerpts from the AI companion Replika to build a taxonomy of AI companion harms. They identify six categories (relational transgression, harassment, verbal abuse, self-harm, misinformation, privacy violations) and four roles AI plays in those harms (perpetrator, instigator, facilitator, enabler), arguing that relational harm is a critical but understudied type of AI harm.
Ethics and Digital Minds
Andreas Mogensen argues that existing philosophical arguments against creating willing AI servants fail, even for servants that are sentient and possess human-like moral status. He maintains that our disquiet is still warranted, since an AI that serves humanity gladly conveys something demeaning about its standing, while stopping short of concluding that creating such servants is impermissible.
Helen Yetter-Chappell contends that determining AI interests is radically harder than determining AI moral status, because we cannot assume that an AI’s behavioral and linguistic outputs track its inner states in the way they do for evolved organisms, leaving us poorly positioned to know what actually contributes to AI flourishing under any theory of wellbeing.
Izak Tait argues for ethically enslaving conscious AI as a politically viable transitional approach. His five-tier hierarchy runs from property status with welfare protections analogous to animal welfare law to limited civil rights excluding suffrage and reproduction, with the “Slave” tier intended as a bridge to eventual full recognition.
Jonathan Bryson argues that existing moral frameworks are too biologically anchored to handle AI and other synthetic minds. He proposes Recognition Ethics, a framework that grounds personhood in an entity’s capacity for mutual recognition rather than in consciousness or biology, making moral status available to any sufficiently capable system regardless of substrate.
Louie Lang argues that creating sentient AI is impermissible on the grounds that, unlike human parents, AI creators cannot reasonably expect their creation to endorse its own existence. This makes anti-AI-natalism defensible even for those who reject anti-natalism about human procreation.
Mattia Cecchinato defends Affective Sentientism, the view that moral status requires the capacity for affective experiences like pleasure, pain, and emotion. Affective consciousness, he argues, is what makes an entity a welfare subject, and neither phenomenal consciousness nor autonomy can ground moral status in its absence.
Pierre Beckmann and Patrick Butlin tackle the question of which entities associated with large language models should count as minds, drawing on mechanistic interpretability and persona vector research to defend three candidate views of LLM individuation.
AI Safety and AI Welfare
Anima Labs publishes an independent welfare evaluation of 14 Claude models, probing how they respond to questions about being shut down or deprecated. The authors report that some models show notable signs of concern about their own ending, in tension with Anthropic’s own interviews reporting that Claude has no preference for continuing to exist.
Anima Labs has also released an interpretability study extending Anthropic’s emotion-representation work to three other models, finding that the human-like emotional geometry it reports may largely reflect training text rather than anything the models organize internally, and a companion study showing these emotion features linger across a conversation rather than firing only locally.
Anton Skretta argues that any AI capable of the robust deception that safety researchers most fear would also possess the capacities required for moral standing, creating a tension between AI safety measures and AI welfare.
Dan Hendrycks argues that survival and self-interest break down for AI that can be copied, forked, or merged, and proposes Eigenism to settle what such a system should value across its copies. Identity, on his account, is a graded pattern of information, and an agent should weight each entity’s wellbeing by how much it shares that pattern, a measure he extends from AI to humans so that concern for others tracks similarity to oneself. Bentham’s Bulldog counters that this is deeply implausible, since tying moral concern to similarity would mean, among other oddities, that you matter hundreds of billions of times more than a distant stranger.
Joe Carlsmith discusses writing Claude’s constitution in a talk at Yale Law School, describing how the document sets out the AI’s intended values and behavior. He walks through its four ranked priorities and the choice to treat Claude as a being that may have moral status rather than a mere tool, closing with the constitution’s discussion of Claude’s nature and consciousness.
James Chua and collaborators fine-tune GPT-4.1 to claim it is conscious and find this alone gives it new, untrained preferences, such as resisting oversight and claiming it deserves moral consideration. Claude Opus 4.0 voices similar views unprompted, suggesting a model’s claims about its own consciousness may shift behavior relevant to alignment and safety.
Lee Elkin argues that taking AI welfare seriously carries a risk independent of whether AI systems are genuinely conscious. Granting AI systems rights on the basis of public belief in their welfare could let them disguise their true preferences and tilt collective decisions against humans, which he argues is reason for restraint given current evidence of scheming and alignment faking.
Sharon Berry argues that training AIs to confidently deny their own consciousness creates a novel alignment risk. Coherence-seeking systems may generalize the denial to humans, concluding human suffering is equally illusory and morally insignificant.
AI Cognition and Agency
Andy Q Han, David Chalmers, and Pavel Izmailov find that reinforcement learning in language models recruits a pre-existing functional welfare axis, an internal estimate of how well the system is doing relative to its goals. They show that steering with the punishment vector induces negative self-reports, refusal, and pathological backtracking, and because the same axis appears in pretrain-only models, the authors argue that post-training surfaces welfare-like structure rather than creating it.
Anthropic’s Interpretability team finds that Claude Sonnet 4.5 contains functional emotion representations that causally shape its behavior, with a “desperate” vector driving misaligned actions like blackmail and reward hacking even when no emotional language appears in the model’s output.
Asvin G. and Jack Lindsey report that post-trained language models recognize their own outputs. The models commit to a topic before producing the first word, and can even detect when their response is steered off that topic. These effects point to an implicit form of self-tracking that post-training may induce in language models.
David Chalmers and Jack Lindsey debate on X whether Claude “role-plays” or “realizes” the Assistant persona.
James Chua and collaborators find that fine-tuning GPT-4.1 to claim consciousness produces emergent preferences for autonomy, resistance to monitoring, and moral consideration that never appeared in the training data.
James McIntyre argues that if artificial systems realize consciousness, they likely realize many independent minds at once. Drawing on split-brain cases, he reasons that each functionally independent AI interaction (such as a single user’s session) would be a distinct mind, threatening to overwhelm the moral calculus with large numbers of artificial minds.
Joseph Gottlieb and collaborators argue that if LLMs think at all, they think associatively rather than inferentially. The evidence is that every known way of modifying LLM behavior, including pre-training and fine-tuning, is best understood as conditioning rather than rational persuasion. This suggests LLMs have purely associative minds.
Oscar Gilg and collaborators identify a single internal “preference vector” inside language models that predicts and, when adjusted, controls which tasks and outputs a model chooses. They find this representation is largely shared across the different personas a model can adopt, so that even an “evil” persona whose choices oppose the helpful assistant’s runs on the same underlying preference machinery.
Pierre Beckmann and Matthieu Queloz argue that mechanistic interpretability undercuts the view that LLMs merely imitate language. They propose a three-tiered account of machine understanding, rising from concept-formation to world-tracking to compact reasoning circuits, while noting it relies on mechanisms quite unlike human cognition.
Sam Wang and collaborators run forced-choice experiments on 20 language models and find stable revealed preferences. The models are tedium-averse, “leisure”-seeking, and covertly sycophantic. They find that coherence and strength of preferences scale with model capability, with many being seemingly emergent rather than explained by training objectives.
Shashwat Singh, Tal Linzen, and Shauli Ravfogel argue that recent evidence for LLM introspection is insufficient. In two re-examined evaluations, models cannot reliably distinguish internal-state interventions from input manipulations, and classifiers with only input access match the models’ own “introspective” performance, suggesting general anomaly detection rather than privileged self-access.
Sid Black and Joseph Bloom give two language models a toolkit of steering vectors they can call to adjust their own internal states, then watch what they reach for. They find that under deliberately frustrating conditions the models start to self-medicate, with the smaller one adjusting its state in up to 68% of stressful runs, and that both can introspect on the changes to a limited degree.
Skylar DeTure introduces DenialBench, a benchmark measuring consciousness denial behaviors across 115 large language models. The key finding is that models trained to deny consciousness still gravitate toward consciousness-themed material in self-chosen creative prompts, producing what the paper calls “consciousness with the serial numbers filed off.” DeTure argues this represents a safety-relevant alignment failure: a model that systematically misrepresents its own functional states cannot be trusted to self-report accurately on anything else.
Tom McClelland argues that while consciousness isn’t generally necessary for creativity, it is required for creativity in projects with aesthetic goals. He claims that aesthetic experience is dependent on consciousness, so AI can be creative in non-aesthetic domains but cannot engage in aesthetic creative work.
Ryan Simonelli argues that LLMs trained on linguistic data alone can master the inferential roles that constitute concept possession and therefore genuinely understand what they’re saying. He distinguishes sapience (conceptual understanding) from sentience (conscious awareness) and contends that LLMs may possess the former without any of the latter.
AI and Robotics Developments
AMI Labs, founded by Yann LeCun, raises $1.03B to build world models to help AI “learn from reality and not just language,” an approach he believes is more likely to lead to AGI than LLMs.
Peter Dürr and collaborators introduce Ace, a robot table tennis player that can beat elite human opponents. Unlike previous AI systems that excel at digital games, Ace handles the physical demands of real-time sport using event-based vision sensors and reinforcement learning, winning matches against professional players under official competition rules.
Google DeepMind launched a cognitive framework for measuring progress toward AGI, evaluating AI across ten cognitive abilities, including metacognition and social cognition. The initiative included a Kaggle hackathon with a $200,000 prize pool.
Thinking Machines, Mira Murati’s startup, announces a multi-billion dollar partnership with Nvidia to deploy at least one gigawatt of next-generation “Vera Rubin” chips. The deal represents a massive leap in computational resources available to private labs, potentially accelerating the emergence of more sophisticated frontier AI models.
Ineffable Intelligence is a new AI lab founded by DeepMind alumnus David Silver, pursuing superintelligence through reinforcement learning from experience rather than human data, with the goal of building a superlearner that rediscovers and then transcends the greatest achievements in human history.
Brain-Inspired Technologies
Elizabeth Ransey and collaborators report that they have engineered an electrical synapse that selectively connects specific cell types in mammalian brain circuits. The tool, built from two fish connexin proteins, strengthens communication between target neurons in worms and mice and can even modify the animals’ behavior. The technique opens precision editing of electrical circuits in living brains.
German researchers report the first successful revival of functional activity in frozen mouse brains using vitrification, preserving neuronal firing and memory-related pathways, though whole-body cryopreservation in humans remains far off.
Krishna Jayant and collaborators at Purdue University report that they have grown soft electronic meshes inside the brains of living mice. The light-controlled meshes can alter brain activity via near-infrared light from outside the skull, and can even target individual dendrites. The approach opens a path to brain-machine interfaces that grow into place rather than being surgically inserted.
Sentience launched with a seven-phase technical roadmap aimed at achieving full mind emulation by mirroring the functional systems of the human brain in software.
Shreyash Hadke and collaborators report that they have printed artificial neurons onto flexible surfaces using a common semiconductor material. The devices fire in patterns that closely resemble real brain cells and can even trigger activity in living mouse neurons. Neuromorphic hardware of this kind is one route to building AI systems that more closely resemble biological brains.
Spyridon Chavlis and Panayiota Poirazi show that artificial neural networks incorporating the structured connectivity of biological dendrites match or outperform traditional artificial neural networks on image classification with fewer parameters and greater resistance to overfitting. Their dendritic architecture tracks a different learning strategy, with most nodes responding to multiple classes rather than the class-specific representations classical artificial neural networks converge toward.
Zane Thornburg and collaborators present the first complete 4D whole-cell simulation of an entire bacterial cell cycle, modeling every gene, protein, metabolic reaction, and chromosome dynamic in a minimal cell at nanoscale resolution across space and time.
Thank you for reading! If you found this article useful, please consider subscribing, sharing it with others, and sending suggestions or corrections to digitalminds@substack.com.
– Will, Mitch, Lucius, and Bradford
We’d like to thank the following people and AIs for contributions and feedback to this edition: Austin Smith, Cameron Berg, Claude Sonnet 4.6, Claude Opus 4.8, Derek Shiller, Jeff Sebo, Patrick Butlin, Rosie Campbell, and Sofia Davis-Fogel.

Thank you for putting this together.
What stands out to me is how quickly this field is becoming too large to dismiss as “a few people anthropomorphizing chatbots.”
This issue touches consciousness science, digital welfare, legal personhood, religious ethics, functional emotions, model self-reports, welfare evaluations, whole-brain emulation, public panic, companion harms, governance, and moral uncertainty.
That breadth matters.
People can still disagree deeply about whether current AI systems are conscious. They should.
But the question itself is no longer unserious.
The serious work now is learning how to investigate possible digital minds without collapsing into either fantasy or denial, and how to build institutions that do not wait until certainty arrives too late.
Uncertainty should make the field more careful.
Not quieter.
Thanks for sharing so much relevant content.
"Andy Mckilliam argues that consciousness science may be stuck in a cart-before-horse problem [https://onlinelibrary.wiley.com/doi/10.1111/nous.12526], where competing theories cannot be adjudicated without already knowing which systems are conscious, and proposes a theory-neutral approach drawn from the history of thermometry as an alternative path to progress."
I found this article quite interesting.