The disclosure paradox: what AI developers are getting wrong about mental health

Originally published on LinkedIn.

One of the reasons there is excitement about AI in mental health is the belief that people will tell AI things they won’t tell humans. On the surface, the belief seems obviously true. Mental illness is still heavily stigmatised in most parts of the world, stigma is fundamentally relational, and so the things people most need to talk about are precisely the things they are least willing to bring to another human. Therefore, because AI isn’t a human, the barrier should be lower.

This is largely supported by decades of disclosure research. In a 1999 meta-analysis of 61 studies, Richman and colleagues found that computer-administered interviews elicited less social desirability distortion than face-to-face interviews [1]; however, this effect wasn’t always there. It showed up most clearly when respondents felt alone, unobserved, and free to take their time, which are conditions a typical AI conversation already meets. Gnambs and Kaspar (2015) showed a related effect with a different comparison, where people disclosed more sensitive personal details on computerised surveys than on the same surveys administered on paper [2]. A 2024 review by Papneja and Yadav, synthesising 26 studies on self-disclosure to conversational AI, finds the same pattern now well-documented for chatbots and LLMs specifically [3]. The methods differ, but the direction doesn’t; as the felt presence of another human recedes, honest disclosure tends to rise.

What I find most interesting in this work is that the active ingredient isn’t really the technology. Lucas and colleagues (2014) ran a clever experiment in which everyone spoke to the same on-screen virtual interviewer, but half were told it was being controlled by a human and half were told it was running automatically [4]. The interface didn’t change. Only the perceived presence of a human did. The participants who believed they were talking to a computer reported less fear of being judged, displayed sadness more openly, and were rated by independent observers as more willing to disclose. Croes and Antheunis (2024) ran a more recent experimental study at a music festival with confessional booths. They found that participants perceived chatbots as less judgmental than human conversation partners, and that perceived anonymity predicted self-reported disclosure intimacy, although there was no overall difference between conditions in how intimately people disclosed [5]. What seems important for disclosure, at least at the moment of opening up, is the perceived absence of a judging human. In those conditions, people’s apprehension about being evaluated drops, they engage in less impression management, and they start to talk about the things they’d quietly been carrying. The disclosure literature describes this as perceived anonymity, and it’s the property AI mental health products are quietly relying on whether they know it or not.

It is worth being clear about how broadly this property operates. Perceived anonymity isn’t just shaping how deeply someone discloses once they have chosen to engage. It is also shaping whether they choose to engage at all. The decision to bring something stigmatised to an AI in the first place, before any disclosure has happened, is its own moment, and it is gated by the same property. Anthropic’s recent 81,000-person study shows the same pattern in open-ended conversations rather than structured screenings, with people sharing grief, mental health struggles, and financial precarity [6]. The study itself doesn’t compare AI with human conditions, so we can’t say from this data alone what people would or wouldn’t have brought to a human researcher. But Anthropic’s own framing of the appeal (patience, availability, and the absence of judgement) is a description that thirty years of disclosure literature would have predicted almost word for word, and it describes both why people come to AI with these things and why they are able to say them once there.

This is one of AI’s genuine structural advantages over human-delivered support. Not because AI is better at therapy, but because the perceived absence of human judgement lowers the barrier to disclosure, and disclosure is often the first, hardest step. People avoid seeking help in part because of stigma, and perceived anonymity reduces that barrier in a way that scales. Which is why the dominant design strategy in this space, anthropomorphism, deserves a closer look than it usually gets.

The picture gets more complicated when you turn from the literature to the products

Pickard and Roster (2020) compared a faceless audio interview system, an embodied conversational agent with a virtual face, and a human interviewer, and found significantly higher disclosure in the faceless condition than in either embodied one, with no significant difference between the agent and the human [7]. Read alongside Lucas, the implication is that both perceived agency and visible cues of humanness can work against disclosure, in different ways and through different channels; believing it’s a computer matters, and so does the visible face.

The insight isn’t new. In 1564, at the Council of Milan, Charles Borromeo mandated that confessionals be built with a screen partition between priest and penitent, specifically to enable disclosure under conditions of anonymity. The penitent knew there was a human on the other side, but could not see them, and was not seen. That arrangement was a partial solution to a problem the disclosure literature would later describe in psychological terms, and AI extends the same logic a step further. Whereas the confessional removed the visible cues of the listener, AI removes the human listener altogether. The use of AI tools in mental health is a continuation of this mechanism, but what is new is the scale and the choices designers are making about what to put back in.

One of the main things designers are putting back is anthropomorphism: giving an AI a name, a face, a voice, a personality, sometimes a backstory. The instinct behind it isn’t careless. It comes from people taking the design problem seriously and trying to build something that feels safe, human-paced, and trustworthy. There is genuine evidence that anthropomorphic warmth supports rapport, perceived empathy, and a sense of being met, all of which contribute to engagement and adherence. A coaching agent that feels too cold may struggle to hold someone in the kinds of repeated, sustained interactions where capacity actually gets built. The complication is that anthropomorphic warmth and perceived anonymity operate on different mechanisms and at different points in the interaction. Perceived anonymity does most of its work at the entry, both in the decision to engage and in the moment of first disclosure, when someone is deciding whether to bring something stigmatised into view at all. Warmth tends to matter later, in sustaining engagement once disclosure has happened.

Most AI mental health products are designed at the warmth end of that dial. Branded personas, human-sounding names, faces, avatars, warm voice acting. Until recently, character.ai’s Psychologist persona, complete with a photo, a bio, and no meaningful disclosure that the user was talking to a language model, was one of the more visible examples among many, and a clear outlier on the careless end. That specific character has since been removed, at least in the UK, and the Psychology character that replaced it now carries a disclaimer banner clarifying that it isn’t a real person or licensed professional. The disclaimer adds transparency, but it sits outside the persona, as a banner, while the persona itself (the name, the framing, the in-conversation experience) remains broadly intact. That’s a useful illustration of how partial the fix is. Informed consent and perceived anonymity are different aspects of design, and a banner addresses the first more than the second. The mechanism the product is relying on for first disclosure is shaped by the in-conversation experience, not the regulatory metadata above it.

Which is where the paradox sits. The design features intended to build trust are the same features that can erode the property responsible for the trust in the first place. If a product wants someone to bring the thing they haven’t told anyone else, the design pattern is, at that moment, working against itself.

The risks of adding a face, giving the AI a human name, or obscuring its nature are usually framed around informed consent, realistic expectations, and trust, and those framings are right as far as they go. But there is a deeper concern that doesn’t get the same attention. These choices don’t just mislead users about what they’re talking to; they erode the perceived anonymity that lets users disclose in the first place. It isn’t only a transparency failure; it’s a design failure against the mechanism the product is relying on.

The honest version of the picture, then, is that the paradox sharpens into a trade-off. There are two mechanisms, both supported by evidence, working in different directions at different moments. For some users, the tension barely registers. For others, particularly those carrying something stigmatised they have not told anyone, perceived anonymity matters most at the points that matter most, both at the threshold of choosing to engage and at the moment of first disclosure, and a warm persona may be the thing tipping them away from saying it, or from arriving in the first place. The same product, with the same persona, sits very differently for the user already mid-engagement than for the user trying to find a way to begin.

This is one of the design questions I sit with most often in building Nova. Responsible AI development in mental health isn’t only about safety guardrails and crisis routing. It’s about understanding why AI works at all in this domain, and building in a way that preserves the properties doing the work rather than eroding them in the pursuit of a warmer brand experience.

What this points toward, in practice, is to design for the property, not the persona. Perceived anonymity is what gives AI its structural advantage in mental health, and any design choice that erodes it without thought is a design choice trading against the product’s own purpose. Transparency, warmth, and perceived anonymity aren’t simply in conflict, but they aren’t free of one another either. That means they need to be designed as interrelated properties rather than collapsed into a single dimension called trust.

What looks like building trust can be exactly what breaks it.

References

[1] Richman, W.L., Kiesler, S., Weisband, S., & Drasgow, F. (1999). A meta-analytic study of social desirability distortion in computer-administered questionnaires, traditional questionnaires, and interviews. Journal of Applied Psychology, 84(5), 754–775. psycnet.apa.org/record/1999-01454-009

[2] Gnambs, T., & Kaspar, K. (2015). Disclosure of sensitive behaviors across self-administered survey modes: A meta-analysis. Behavior Research Methods, 47(4), 1237–1259. link.springer.com/article/10.3758/s13428-014-0533-4

[3] Papneja, H., & Yadav, N. (2024). Self-disclosure to conversational AI: a literature review, emergent framework, and directions for future research. Personal and Ubiquitous Computing, 29, 119–151. link.springer.com/article/10.1007/s00779-024-01823-7

[4] Lucas, G.M., Gratch, J., King, A., & Morency, L.-P. (2014). It’s only a computer: Virtual humans increase willingness to disclose. Computers in Human Behavior, 37, 94–100. sciencedirect.com/science/article/abs/pii/S0747563214002647

[5] Croes, E.A.J., & Antheunis, M.L. (2024). Digital Confessions: The Willingness to Disclose Intimate Information to a Chatbot and its Impact on Emotional Well-Being. Interacting with Computers, 36(5), 279–292. academic.oup.com/iwc/article/36/5/279/7692197

[6] Huang, S., Carter, S., Eaton, J., Pollack, S., Callender III, D., Makagiansar, N., Gonzalez, M., Carr, S., Hong, J., Handa, K., McCain, M., Millar, T., Julapalli, M., Yun, G., Alt, A.J., Larsson, C., Leibrock, J., Gallivan, M., Sumers, T., Durmus, E., Kearney, M., Shen, J.H., Clark, J., Stern, M., & Ganguli, D. (2026). What 81,000 People Want from AI. Anthropic. anthropic.com/features/81k-interviews

[7] Pickard, M.D., & Roster, C.A. (2020). Using computer automated systems to conduct personal interviews: Does the mere presence of a human face inhibit disclosure? Computers in Human Behavior, 105, 106197. sciencedirect.com/science/article/abs/pii/S0747563219304170