Is Intent Analysis the Next Frontier After Step-by-Step Thinking?
An Interview with Yuwei Yin
Yuwei Yin from the University of British Columbia outlines his research on enhancing large language models through intent analysis. He explains how identifying the purpose behind a query can improve the accuracy and coherence of AI-generated answers.
DAILOGUES: Querying a large language model to think step by step has proven to be a very successful strategy. Do you have an idea why this is so?
Yuwei Yin: In short, the answer would be that context matters for large language models, and thinking step by step enriches the context by adding bits of logical reasoning to the input of the model. Let us briefly consider how LLMs are developed. A good LLM should generate plausible sentences, such as “I like eating apples” instead of “I eating like apples.” We obtain such a model by training it to predict each token based on the previous tokens in the sequence. In other words, given the context of the previous input, the model chooses the most probable wording for the continuation of the sentence. For this purpose, LLMs are trained on very large text corpora so that they can learn which relationships between words or tokens are likely in different situations. In recent years, these models have grown larger and larger, both in terms of the data used for their training and in terms of their size, that is, the number of their parameters. With the release of GPT-3 in 2020, we began to observe what is known as “in-context learning” (ICL): given several examples in its input, the model can also make good predictions for similar new examples without having to be retrained on them. ICL even helped models address more complex tasks, such as mathematical reasoning and programming code generation. Finally, prompting LLMs like GPT-3 to “think step by step” has proven effective, as it encourages them to generate so-called reasoning traces – that is, additional context they produce on their own while solving such tasks.
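To make the next-token view a bit more concrete, here is a minimal illustrative sketch (not from the interview or the cited papers). A toy bigram model stands in for a large neural network, but the scoring principle is the same: a sentence is scored by chaining conditional next-token probabilities, so the fluent ordering comes out more probable than the scrambled one.

```python
from collections import defaultdict
import math

# Toy corpus standing in for the "very large text corpora" mentioned above.
corpus = [
    "i like eating apples",
    "i like eating oranges",
    "you like eating apples",
]

# Count how often each token follows each other token.
bigram_counts = defaultdict(int)
context_counts = defaultdict(int)
vocab = set()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split()
    vocab.update(tokens)
    for prev, curr in zip(tokens, tokens[1:]):
        bigram_counts[(prev, curr)] += 1
        context_counts[prev] += 1

def sequence_log_prob(sentence: str) -> float:
    """Score a sentence as the sum of log P(token | previous token),
    with add-one smoothing so unseen pairs keep a small probability."""
    tokens = ["<s>"] + sentence.split()
    total = 0.0
    for prev, curr in zip(tokens, tokens[1:]):
        prob = (bigram_counts[(prev, curr)] + 1) / (context_counts[prev] + len(vocab))
        total += math.log(prob)
    return total

# The fluent ordering gets a higher (less negative) score than the scrambled one.
print(sequence_log_prob("i like eating apples"))
print(sequence_log_prob("i eating like apples"))
```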
DAILOGUES: This was observed in the literature around 2022. This observation has also been incorporated into the training of newer models. Could you explain how?
Yuwei Yin: We can directly ask the model to think step by step when presented with a complex problem. This is called zero-shot Chain-of-Thought (CoT) prompting. But we can also show the model exemplars of step-by-step solutions, adding them to the context, from which it can pick up similar problem-solving patterns and thus find the solution to the problem at hand. This is called few-shot CoT prompting. For the newer large reasoning models (LRMs), many such reasoning chains have been collected and used to train the models.
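As an illustration (not from the interview), the sketch below shows how the two prompting styles differ in practice. The question and the worked exemplar are made up, with the exemplar written in the style popularized by the chain-of-thought literature; the resulting strings would be sent to whatever LLM is being prompted.

```python
question = "A train travels 60 km in 1.5 hours. What is its average speed?"

# Zero-shot CoT: no worked examples, just the reasoning trigger phrase.
zero_shot_cot = f"Q: {question}\nA: Let's think step by step."

# Few-shot CoT: one or more worked exemplars with explicit reasoning steps,
# followed by the new question in the same format.
exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11."
)
few_shot_cot = f"{exemplar}\n\nQ: {question}\nA:"

print(zero_shot_cot)
print()
print(few_shot_cot)
```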
DAILOGUES: In your research, you identified another strategy that helps elicit good answers from large language models. You ask the model to analyze the intent of an input query before generating an output. Could you explain this in more detail?
Yuwei Yin: We call this method ARR: analyze, retrieve, and reason. It is a general framework for question answering. The first step the model should take, according to ARR, is to analyze what the provided question is about: What is the intention behind the question? Secondly, the model should gather enough information to answer the question. Thirdly, it should reason over the collected information with the question’s intent in mind. It turns out that the first step is vital, because if the model misunderstands the question, the whole solution will go astray. For this purpose, we let the model analyze the question, for example by reformulating it, to ensure it understands the question’s content. While steps two and three are mainstream research trajectories, intent analysis has lagged behind.
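To give a flavour of how such a framework can be realized as a prompt, here is a hypothetical sketch. The trigger wording below paraphrases the analyze/retrieve/reason steps described above and is not quoted verbatim from the ARR paper.

```python
def build_arr_prompt(question: str, options: list[str]) -> str:
    """Build a multiple-choice QA prompt with an ARR-style answer trigger
    that asks the model to analyze the question's intent, gather relevant
    information, and then reason step by step. The wording is illustrative."""
    option_lines = "\n".join(
        f"({chr(ord('A') + i)}) {option}" for i, option in enumerate(options)
    )
    trigger = (
        "Answer: Let's analyze the intent of the question, "
        "find relevant information, and answer the question "
        "with step-by-step reasoning."
    )
    return f"Question: {question}\n{option_lines}\n\n{trigger}"

print(build_arr_prompt(
    "Which gas do plants primarily absorb during photosynthesis?",
    ["Oxygen", "Carbon dioxide", "Nitrogen", "Hydrogen"],
))
```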
DAILOGUES: What do you understand as intention or intent?
Yuwei Yin: By definition, an intent is usually a clearly formulated or planned intention. We can also understand it as the act or fact of intending something. If you want to do something, then your desire stems from a source, which we can refer to as an intention. Some synonyms for intent would be aim, goal, or purpose. I consider intent as part of our mental states. I believe the brain-mind distinction can be used as a good analogy for our research. The brain part is more about information processing, like a computer. But we also have a mind, which is a mental or psychological part of our human condition. Concerning the current research agenda of AI, I believe that the mind is often overlooked, although it is a crucial component of human intelligence.
DAILOGUES: How can we know someone’s intention? For example, we sometimes say one thing but actually want something else. Human communication is often ambiguous. To figure out someone’s intent, we often need to read between the lines. This is not easy. How can a model replicate this?
Yuwei Yin: This is indeed very difficult for models. But we assume that a model can pick up on some of the intent behind most sentences. Consider again how LLMs, such as ChatGPT, are trained. Part of their training is called Reinforcement Learning from Human Feedback, where the model is trained to generate answers according to human preferences. These preferences are intentions in some sense. In our ARR framework, we prompt the model to spell out the possible intents behind a given question. This alone improves the model’s performance across many tasks. As with Chain-of-Thought, the next step would be to directly fine-tune a model on intention-focused data.
DAILOGUES: You have also developed the idea of “speaking with intent” for language models. How does that work?
Yuwei Yin: In this case, we simply ask the model to speak as if it possessed a specific intent. In this fashion, it is supposed to generate an intent with which it can then reason to find a good answer. Each intent statement should clearly explain the sentence that follows it. As with ARR, our experiments have shown that this method helps the models generate better answers. I think we have provided a proof of concept for how LLMs can be further enhanced with cognitive-like functions. Now we are investigating intents in more depth and working on training intentional models.
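As a rough illustration, one way to elicit this kind of behaviour is to instruct the model to precede each statement with an explicit intent. The exact instruction wording and the tag format below are assumptions made for this sketch, not quotations from the SWI paper.

```python
# Illustrative SWI-style instruction; the wording and the <intent> tag
# format are assumptions made for this sketch.
swi_instruction = (
    "Before each sentence of your answer, state the intent behind it, "
    "wrapped in <intent>...</intent> tags. Every intent should clearly "
    "explain the sentence that follows it."
)

question = "Why does ice float on water?"
prompt = f"{swi_instruction}\n\nQuestion: {question}\nAnswer:"
print(prompt)

# A hand-written example of what an answer in this format could look like:
example_output = (
    "<intent>Explain the relevant physical property.</intent> "
    "Ice is less dense than liquid water because its molecules form an open "
    "crystal lattice. "
    "<intent>Connect that property to the observed behaviour.</intent> "
    "A less dense substance floats on a denser one, so ice floats on water."
)
print(example_output)
```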
DAILOGUES: How has the combination of your ARR framework and speaking with intent helped improve results?
Yuwei Yin: We tested our frameworks on three tasks: reasoning, question answering, and text summarization. In our experiments, our methods improved the models’ scores on the respective benchmarks compared to answers generated without them. We also hired human annotators to assess the quality of the generated intents in terms of effectiveness, interpretability, and coherence. The annotators’ judgments corroborated the improvements seen in the benchmark tests. For this reason, we are confident that our methods are moving in a promising direction.
DAILOGUES: So far, you have conducted all experiments in English. Do you know if all languages and cultures have these kinds of concepts of intention or intent? Maybe they are universal psychological features, maybe they are not?
Yuwei Yin: This is still an interesting open line of research. Multilingual research involves multicultural investigation, and different cultures may regard intent (and other cognitive concepts) differently, much like the distinct views on mind, body, and soul in Western and Eastern philosophies. It would be interesting to observe how this affects our methods.
DAILOGUES: Why could your approach make models more transparent and safer?
Yuwei Yin: If we can force a language model to show its genuine intent behind every sentence or action, then we would not only gain more transparency, but also leverage to make the models safer. For example, we could detect malicious intents and intervene before AI agents produce harmful actions.
DAILOGUES: This assumes that models possess minds and genuine intentions, rather than being mere prediction machines.
Yuwei Yin: Maybe humans are also just prediction machines. Maybe we are attached to an illusion of free will and intentions. But you are right, these are open questions that we need to consider. Do you know the “duck test”? It says if it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck. In a similar vein, we could ask ourselves if an AI system speaks like a human, thinks like a human, and acts like a human, what would it be? We are still in the early stages of human-like AI, but I believe a further alignment with human abilities is an inevitable direction of AI development. We should try our best to build AIs that will be helpful and kind to us. By providing models with a better understanding of intentions, I think we are taking a step in this direction.
DAILOGUES: In the field of mechanistic interpretability, it has been shown that models do not always truly follow the reasoning steps that they generate. Although they may state them during the generation process, their predictions can rely on different, latent information encoded in their weights. This would also undermine your methods: while a model provides intent statements, it might actually be doing something else.
Yuwei Yin: This is a considerable problem. To address it, we might even need different model architectures that effectively prevent models from lying about their intents.
We thank Yuwei Yin for the DAILOGUE.
References:
Yin, Y., & Carenini, G. (2025). ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning. ArXiv, abs/2502.04689.
Yin, Y., Hwang, E., & Carenini, G. (2025). SWI: Speaking with Intent in Large Language Models. ArXiv, abs/2503.21544.
About the Author
Yuwei Yin
University of British Columbia