Health Systems Action

An Old German Philosopher Rescues AI Users

Advances in artificial intelligence have come at breathtaking speed but have outpaced most ordinary people’s ability to make productive use of these amazing technologies. Some people walk away disappointed because they use Large Language Models (LLMs) inappropriately, as if they were search engines, and get less than impressive results. A good answer to how to use LLMs more effectively comes from the German philosopher Immanuel Kant (not Kantor 😊), who developed his ideas more than two centuries ago.

Image: Hejia Geng, Boxun Xu, Peng Li. UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities. arXiv 2024.

Chain of Thought

Chain of Thought (CoT) prompting is one way to get better results. This approach encourages an LLM-based assistant such as GPT-4, Gemini or Bing Chat to generate intermediate steps, a “chain of thought”, that lead to the final answer. It’s like asking the model to “think aloud” as it solves the problem. Breaking the process down into more manageable parts makes the LLM’s reasoning more transparent and understandable, and CoT can improve accuracy on tasks that require reasoning, because it forces the model to consider and articulate each step of the thought process.
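In practice, CoT can be as simple as wrapping the question in an instruction to show intermediate steps. Here is a minimal sketch in Python; `make_cot_prompt` is an illustrative helper name, and the commented-out `complete(...)` call stands in for whatever function your LLM API provides:

```python
def make_cot_prompt(question: str) -> str:
    """Wrap a question in a Chain of Thought instruction so the model
    articulates each intermediate step before giving the final answer."""
    return (
        f"Question: {question}\n"
        "Let's think step by step. Show each intermediate result, "
        "then state the final answer on its own line."
    )

# The wrapped prompt would then be sent to any LLM, for example:
# answer = complete(make_cot_prompt("If a train covers 60 km in 45 minutes, what is its speed in km/h?"))
print(make_cot_prompt("What is 17 * 24?"))
```

The same question asked directly often works for simple cases; the CoT wrapper matters most when the problem has several steps that the model might otherwise skip over.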

Kant, not Kantor

This article introduces UPAR – which stands for Understand, Plan, Act, and Reflect – as a way to make LLMs even smarter, and more understandable. Immanuel Kant thought deeply about thinking, and about knowledge, and the article is inspired by his ideas[1].

Understand is the first stage of the framework: look at the information given and pick out the important parts.

The next stage is to make a Plan on how to solve the problem.

Act is next – follow the plan to come up with an answer.

Finally, Reflect – check the work and learn from any mistakes.
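The four stages above can be sketched as a simple sequential pipeline. This is an illustrative Python sketch, not the paper’s implementation: `ask` is a placeholder for any function that sends a prompt to an LLM, and the stage instructions paraphrase those used in the worked example later in this article:

```python
# The four UPAR stages, each as a (name, instruction) pair.
UPAR_STAGES = [
    ("Understand", "Identify the key pieces of information and the question being asked."),
    ("Plan", "Outline the steps needed to solve the problem based on the information identified."),
    ("Act", "Execute the plan step by step."),
    ("Reflect", "Review the solution process and outcome. Is there anything overlooked?"),
]

def upar(question, ask):
    """Run the four UPAR stages in order, feeding each stage's
    output into the next as accumulated context."""
    context = question
    transcript = {}
    for stage, instruction in UPAR_STAGES:
        context = ask(f"{instruction}\n\nContext so far:\n{context}")
        transcript[stage] = context
    return transcript

# With a stub in place of a real model, the pipeline simply threads
# context through all four stages:
result = upar("Cyril has 5 apples...", lambda prompt: prompt)
```

The design point is that each stage sees the previous stage’s output, so the Reflect step can catch errors introduced anywhere earlier in the chain.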

This method helps the LLM explain its thinking better and make fewer mistakes. For example, in a difficult math and science benchmark test, UPAR helped GPT-4 score much higher than usual[2], without needing extra examples or tools (both of which are ways to get better results too).

Most other current methods of making LLMs smarter are less structured and don’t have a solid theory behind them.

One of my favourite podcasters has a weekly seminar about his approach to effective prompting, called PPP – Prime Prompt Polish. He makes fun of online vendors who sell random sets of prompts, with big claims, for ludicrous amounts of money. His structured method of prompting and testing makes sense, has multiple testimonials, and it’s free. But it has no theoretical base or published studies to provide empirical evidence.

UPAR tries to address the challenge by using Kant’s ideas to make a system that works more like how humans think and learn, and is therefore not just more accurate but also easier to understand and trust.

Applying the UPAR framework to a simple example will show how this works.

“Cyril has 5 apples. He buys 8 more and then gives 3 to his friend. How many apples does Cyril have now?”

Understand: the prompt for the LLM is: “Identify the key pieces of information and the question being asked.” The LLM’s task is to recognize that Cyril initially has 5 apples, buys 8 more, gives away 3, and we need to find out how many apples he has in the end.

Plan: the prompt is “Outline the steps needed to solve the problem based on the information identified.” The LLM’s task is to plan to first add the 8 apples to the initial 5 to find out how many apples Cyril has before giving any away. Then, subtract the 3 apples given to his friend to find the final count.

Act: the prompt for the LLM is “Execute the plan step by step.” The LLM’s task is to calculate (5 + 8 = 13) apples before giving any away. Then, calculate (13 – 3 = 10) apples as the final count.

Reflect: the prompt is “Review the solution process and outcome. Is there anything that might have been overlooked or could be explained better?” The LLM’s task is to reflect on the calculation process and confirm that the steps followed logically lead to the correct answer, ensuring no step was missed or miscalculated.

Final Output: Cyril has 10 apples now.
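The arithmetic from the Act stage is easy to verify directly:

```python
initial = 5                  # apples Cyril starts with
after_buying = initial + 8   # 13 apples before giving any away
final = after_buying - 3     # 10 apples after giving 3 to his friend
print(final)  # → 10
```

This is exactly the check the Reflect stage asks the model to perform on itself.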

Clear reasoning, explainability and accuracy

The UPAR framework guides the LLM through solving a problem in a structured and understandable manner, mimicking human cognition. By breaking down the task into these four phases, the LLM is not only more likely to arrive at the correct answer but provides a clear and traceable path of reasoning that can be understood and verified by humans.

If you give an LLM a question without guiding it through the UPAR stages, the LLM might still be able to solve the problem, especially if it’s straightforward. LLMs like GPT-4 are trained on vast amounts of text data, including many examples of problem-solving, so they can often come up with the correct answer to simple questions directly. However, the process might be less transparent, and for more complex problems, the chance of errors or “hallucinations” (producing incorrect or nonsensical answers) increases without a structured approach.

UPAR goes a step further than CoT. It’s not just about thinking aloud. It follows a structured method that mirrors human thinking and increases the explainability and reliability of LLMs. It not only breaks down the reasoning process but incorporates reflection to evaluate and potentially correct the thought process.

Both CoT and UPAR aim to improve LLMs’ problem-solving abilities and explainability but UPAR’s structured framework might offer better insights into the model’s thought process and potential areas for improvement, especially in more complex or ambiguous scenarios.

[1] Kant’s model of human thinking describes specific reasoning layers that form built-in comprehension abilities that we unconsciously use to make sense of the world. These include sensibility (assembling perceptions), understanding (organizing perceptions), reasoning (abstracting higher principles) and judgment (grounding back in empirical experience).

[2] This approach increased the accuracy from a CoT baseline of 22.9% to 58.3% in a challenging subset of GSM8K, a benchmark test, and from 67.9% to 75.4% in a causal judgment task.

1 thought on “An Old German Philosopher Rescues AI Users”

  1. TSP – Tenacity, Sagacity and Perspicacity
    The LLM needs to be tenacious in searching for answers.
    The LLM then needs to search for wisdom. Is there wisdom in the answers?
    The LLM then finally needs to consider the wisdom of its answers.
