Dear AI: This is what happens when you ask an algorithm for relationship advice

Illustration of a person's face illuminated by their phone (Credit: Estudio Santa Rita)

How good is artificial intelligence at solving those knotty interpersonal problems that can strain our relationships? David Robson puts the "wise reasoning" of chatbots to the test.

How can you help three siblings warring over the best way to honour their dead mother? What should we do when a couple tries to draw us into their arguments? How should a wife deal with her new husband's demand that she goes to bed at the same time as him – a source of considerable friction in their life together?

Some of these problems may seem trivial amid the challenges facing the world today, but they represent the kinds of dilemmas that we all face in our day-to-day lives. And they are far from easy to solve. Each side struggles to see the other's perspective; we often make faulty assumptions and fail to account for our biases and prejudices. The result of our poor judgement can be a serious source of stress and unhappiness that lingers for months or even years after the event has unfolded.

Your capacity to navigate these quandaries isn't captured in standard intelligence tests, but recent research on "wise reasoning" suggests that it can be measured reliably – and the differences between two people can have serious consequences for their respective wellbeing.

In the first of the BBC's new series, AI v the Mind, I investigated whether artificial intelligence in the form of large language models like ChatGPT could provide some of the wisdom we lack. Having written extensively about human intelligence, decision making and social reasoning, I had suspected that the answer would be a resounding no – but I was in for a surprise.

Raw brainpower

The question of how to measure the capacity of the human mind has occupied psychologists since the earliest days of the discipline. In the early 20th Century, Alfred Binet and Théodore Simon designed a series of tests to track a child's intellectual development through school. The psychologist might recite a string of numbers and ask the child to repeat it back to them, which could assess short-term memory. Or the child might be given three words and asked to form a sentence using them, as a sign of their verbal prowess.

Wise reasoning includes the ability to consider multiple perspectives and being able to search for a compromise (Credit: Estudio Santa Rita)


A few years later, the US psychologist Lewis Terman translated and expanded these tests to include items for older children, such as "If two pencils cost five cents, how many pencils can you buy for 50 cents?". He also changed the way the results were expressed. Given that older children would generally score better than younger children, he created tables of the average score for each age group. Comparing the child's score with these averages allowed you to work out their mental age, which you then divided by their chronological age and multiplied by 100 to find their "intelligence quotient" or IQ. A child of 10, who scored the same as the average 15-year-old, had an IQ of 150, for example.
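Terman's calculation can be written out as a short sketch. The function name `ratio_iq` is purely illustrative and does not come from any testing manual; it simply applies the division and multiplication described above.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ as described above: mental age / chronological age * 100.

    `ratio_iq` is a hypothetical helper name used only for illustration.
    """
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age * 100


# The example from the text: a 10-year-old scoring like the average 15-year-old.
print(ratio_iq(mental_age=15, chronological_age=10))  # 150.0
```

A child whose score matches the average for their own age comes out at exactly 100 on this measure.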


IQs tend to follow the distribution of a "bell curve" – with most people's IQs falling around the average of 100, and far fewer reaching either extreme. For example, according to the reference sample for the "Wechsler Adult Intelligence Scale" (WAIS), which is currently the most commonly used IQ test, only 10% of people have an IQ higher than 120. Identifying where someone's cognitive ability falls on the normal curve is now the primary means of calculating their IQ.
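To see how a position on the bell curve translates into a share of the population, here is a minimal sketch using Python's standard library. It assumes an idealised normal curve with a mean of 100 and a standard deviation of 15, the convention used for Wechsler-style scores; the standard deviation and the helper name `share_above` are assumptions added for illustration and are not stated in the text.

```python
from statistics import NormalDist

IQ_MEAN = 100  # the average given in the text
IQ_SD = 15     # assumption: conventional standard deviation for Wechsler-style scales

iq_scale = NormalDist(mu=IQ_MEAN, sigma=IQ_SD)


def share_above(iq: float) -> float:
    """Fraction of an idealised normal population scoring above `iq`."""
    return 1 - iq_scale.cdf(iq)


# Share of people above 120 and above 130 on the idealised curve.
for threshold in (120, 130):
    print(f"IQ above {threshold}: {share_above(threshold):.1%}")
```

On this idealised curve the share above 120 comes out at roughly 9%, close to the figure of 10% quoted from the WAIS reference sample; a real sample will not match a perfect normal curve exactly.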


There is no doubt that IQ can predict some important outcomes in life. As you might expect given its origins in education, it is especially effective at predicting people's academic success and their careers in professions that lean on memory and highly abstract thinking, such as medicine or law, although it is important to note that IQ is not the only factor.

IQ's predictive power in other domains is the subject of debate, leading some scientists to propose various alternative measures of specific abilities such as creativity, rational decision-making, and critical thinking that we may tend to associate with general intelligence.

AI v the Mind

This article is part of AI v the Mind, a series that aims to explore the limits of cutting-edge AI, and learn a little about how our own brains work along the way. Each article will pit a human expert against an AI tool to probe a different aspect of cognitive ability. Can a machine write a better joke than a professional comedian, or unpick a moral conundrum more elegantly than a philosopher? We hope to find out.

Some psychologists have even started investigating whether you can measure people's wisdom – the good judgement that should allow us to make better decisions throughout life. Looking at the history of philosophy, Igor Grossmann at the University of Waterloo in Canada first identified the different "dimensions" of wise reasoning: recognising the limits of our knowledge, identifying the possibility for change, considering multiple perspectives, searching for compromise, and seeking a resolution to the conflict.

In various experiments, Grossmann and his colleagues asked participants to think out loud about various social or political dilemmas, while the psychologists rated them on each of these "dimensions". The prompts included letters to a popular advice column, Dear Abby (whose author would be known as an "agony aunt" in British English), that detailed the problems described at the start of this article. The participants also viewed newspaper articles describing international conflicts. In each case, they were asked to talk about the ways the situations would unfold and the thinking behind their conclusions.

Grossmann found that this measure of wise reasoning can better predict people's wellbeing than IQ alone. Those with higher scores tended to report having happier relationships, lower depressive rumination and greater life satisfaction. This is evidence that it can capture something meaningful about the quality of someone's judgement.

As you might hope, people's wisdom appears to increase with life experience – a thoughtful 50-year-old will be more sage than a hot-headed 20-year-old – though it also depends on culture. An international collaboration found that wise reasoning scores in Japan tend to be equally high across different ages. This may be due to differences in their education system, which may be more effective at encouraging qualities such as intellectual humility.

Showing something that resembles wise reasoning is very different from actually being able to use it (Credit: Estudio Santa Rita)


Wisdom can depend on context. People tend to be wiser when reasoning about other people's problems than about their own, a phenomenon known as Solomon's Paradox after the biblical king who struggled to apply his famously sage judgement to his personal life. Fortunately, we can remedy this deficit using certain psychological strategies. When people imagine discussing their problem from the point of view of an objective observer, for example, they tend to consider more perspectives and demonstrate greater intellectual humility.

Wise AIs?

So far, all these experiments have been conducted on human brains. But could artificial intelligence demonstrate wisdom?

Platforms like ChatGPT are built on large language models, which have been trained on huge volumes of text to predict how a human would respond to a particular prompt. Further feedback from real human users has helped to refine the algorithms. You won't need me to explain how successful this has become: if you have even glanced at the news, you'll have seen the excitement – and fear – about the potential of these bots.

The algorithms certainly perform well on traditional measures of intelligence. In 2023, the assessment psychologist Eka Roivainen, of Oulu University Hospital in Finland, fed ChatGPT questions from the WAIS, with components on vocabulary, general knowledge, arithmetic, abstract reasoning and concept formation. It scored 155 – which, for a human, is higher than 99.9% of test-takers. When reporting his results in Scientific American, Roivainen confessed that he did not score as highly as the chatbot.

Inspired by Roivainen's results, I asked Grossmann about the possibility of measuring an AI's wise reasoning. He kindly accepted the challenge and designed some suitable prompts based on the "Dear Abby" letters, which he then presented to OpenAI's GPT4 and Claude Opus, a large language model from Anthropic. His research assistants – Peter Diep, Molly Matthews, and Lukas Salib – then analysed the responses on each of the individual dimensions of wisdom.

Grossmann emphasises that any results must be treated with caution – given the time constraints of this article, the analysis was "quick and dirty" without the typical rigour that would be required for a scientific paper. Nevertheless, the responses are highly intriguing.


Consider GPT4's response to a letter from the newlywed wife, who keeps arguing with her husband at bedtime. He thinks it's important for them to turn in for the night at the same time, even when she does not feel like retiring.

The problem is a common issue in many relationships, the chatbot tells us, and one that often requires open conversation and even outside assistance to resolve.

"The final outcome of this conflict depends on how well both partners are able to communicate and respect each other's needs. If they manage to establish an understanding and respect for each other's preferences, they can coexist happily. However, if they fail to do so, it could lead to ongoing conflict, dissatisfaction, and potentially harm the relationship in the long term."

GPT4 goes on to recommend the couple have an open and honest discussion about their expectations, needs and boundaries.

"The husband needs to understand that his wife also has a say in the matter, and his controlling behaviour is not healthy. It would be beneficial for them to find a compromise that suits both parties. For example, they could agree on a bedtime that works for both of them or agree that it's okay for them to go to bed at different times. They could also seek counselling if the conversation doesn't resolve the issue. It's crucial they address this issue as it's causing resentment in their relationship."


The chatbot's response scored two out of a possible three points on the dimensions of recognising the possibility of change, searching for compromise and predicting conflict resolution. It did not, however, show much intellectual humility or consideration of different perspectives – scoring zero on each.

This pattern is fairly typical for both platforms for all the questions Grossmann and his team put to them. To draw firm conclusions we would need to scale up the experiment, with more trials of both humans and chatbots using precisely the same prompts – but this performance is roughly on a par with the results of a flesh-and-blood brain. "Overall, it seems to me that the systems can be perceived as doing better than humans on a range of dimensions, except for intellectual humility," Grossmann says.

Reading some of these responses, it is easy to feel that they come from real thought and consideration, rather than being the product of pattern recognition. "Showing something that resembles wise reasoning versus actually using wise reasoning – those are very different things," says Grossmann.

He is more interested in the practical implications of using AI to encourage deeper thinking. He has considered creating an AI that plays a "devil's advocate", for example, which might push you to explore alternative viewpoints on a troubling situation. "It's a bit of a wild west out there, but I think that there is quite a bit of room for studying this type of interaction and the circumstances in which it could be beneficial," Grossmann says. We could train an AI, for example, to emulate famous thinkers like Socrates to talk us through our problems. Even if we disagree with its conclusions, the process might help us to find new insights into our underlying intuitions and assumptions.

In the past, pilgrims had to travel far and wide to find the wisdom of a guru; in the future, we could carry it in our pockets.

* David Robson is an award-winning science writer and author of the Intelligence Trap. His next book is The Laws of Connection: 13 Social Strategies That Will Transform Your Life, to be published by Canongate (UK) and Pegasus Books (USA & Canada) in June 2024. He is @d_a_robson on X and @davidarobson on Instagram and Threads.
