Fact Finder - Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs
Abstract
The article introduces Fact Finder, a hybrid question-answering system that combines Large Language Models (LLMs) and Knowledge Graphs (KGs) to provide accurate and comprehensive answers to scientific questions, particularly in the medical domain. The key points are:
- LLMs have limitations in terms of domain-specific knowledge and can produce incorrect or incomplete answers, which is problematic for applications that require factual correctness, such as target identification in life sciences.
- KGs can provide useful additional context to enhance the factual correctness and completeness of LLM responses.
- Fact Finder leverages both LLMs and KGs: an LLM generates Cypher queries to retrieve relevant information from the KG, and the graph results are then combined with LLM-based verbalization to produce the final answer.
- The system includes features to ensure transparency and explainability, such as providing the generated Cypher query, graph response, and a visualization of the relevant subgraph.
- Evaluations show that the hybrid Fact Finder system outperforms standalone LLMs in terms of accuracy and completeness, and can also detect when the KG response is insufficient to answer a question.
Q&A
[01] Introduction
1. What are the limitations of LLMs that the article aims to address?
- LLMs are limited by the timeframe of their training data and can produce incorrect statements (hallucinations) or incomplete answers by missing relevant entities not included in their internal knowledge.
- In domains like life sciences, obtaining answers with current and factual information is paramount for many use cases, such as target identification, designing effective field or clinical trials, and competitive intelligence.
2. How do Knowledge Graphs (KGs) help improve the factual correctness of LLMs?
- KGs represent entities and their relationships in a structured network, providing useful additional context for LLMs to enable precise and relevant information retrieval.
- KGs allow systems to leverage current and comprehensive information, including recent data not available during the LLMs' training phase.
- Integrating KGs enhances top-tier LLMs with proprietary or specialized knowledge, enabling the inclusion of unique organizational data sources.
[02] System Description
1. What are the key components of the Fact Finder system? The system consists of the following key components (a minimal pipeline sketch follows the list):
- Cypher Query Generation: LLMs are used to generate Cypher queries from natural language questions, leveraging the graph schema information.
- Query Pre-Processors: Various preprocessing steps are applied to the generated Cypher queries to improve robustness, such as formatting fixes, synonym mapping, and handling of deprecated Cypher syntax.
- Graph Question Answering and Verbalization: The preprocessed Cypher query is executed on the graph, and the results are incorporated into a prompt template and sent to an LLM to generate the final natural language answer.
- Explainability through Evidence: The system provides various forms of evidence, such as the generated Cypher query, graph response, and a visualization of the relevant subgraph, to enhance transparency and explainability.
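The sketch below illustrates how these components could fit together in Python. The toy schema, prompt wording, model name, and connection details are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a Cypher-generation + graph QA + verbalization pipeline.
from neo4j import GraphDatabase
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Assumed toy schema; the real system injects the actual graph schema here.
GRAPH_SCHEMA = "(:Drug)-[:TREATS]->(:Disease), (:Gene)-[:ASSOCIATED_WITH]->(:Disease)"

CYPHER_PROMPT = (
    "Using the graph schema below, translate the question into a single Cypher query.\n"
    "Return only the query.\nSchema: {schema}\nQuestion: {question}"
)

VERBALIZATION_PROMPT = (
    "Answer the question using only the facts in the graph result.\n"
    "Question: {question}\nGraph result: {records}"
)

def call_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def preprocess(cypher: str) -> str:
    # Strip markdown fences the model may wrap around the query; real pre-processors
    # also map entity synonyms and rewrite deprecated Cypher syntax.
    return (
        cypher.strip()
        .removeprefix("```cypher")
        .removeprefix("```")
        .removesuffix("```")
        .strip()
    )

def answer_question(question: str, driver) -> str:
    # 1. Cypher query generation from the question and the graph schema.
    cypher = call_llm(CYPHER_PROMPT.format(schema=GRAPH_SCHEMA, question=question))
    # 2. Query pre-processing.
    cypher = preprocess(cypher)
    # 3. Graph question answering: run the query and collect the records.
    with driver.session() as session:
        records = [record.data() for record in session.run(cypher)]
    # 4. Verbalization of the graph result into a natural language answer.
    return call_llm(VERBALIZATION_PROMPT.format(question=question, records=records))

if __name__ == "__main__":
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    print(answer_question("Which drugs treat asthma?", driver))
```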
2. How does the system handle cases where the Cypher query generation step produces an incorrect query? The system is designed to detect when the Cypher query generation step produces an incorrect query, resulting in irrelevant graph results. In such cases, the LLM-based verbalization component can recognize the irrelevant information and respond with "I don't know", demonstrating the system's ability to enhance reliability by leveraging both structured and world knowledge.
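A hedged sketch of how the verbalization step might implement this fallback is shown below; the prompt wording and the empty-result short-circuit are assumptions, not quoted from the article.

```python
# Sketch of the verbalization fallback for irrelevant or empty graph results.
FALLBACK_ANSWER = "I don't know"

GUARDED_VERBALIZATION_PROMPT = (
    "Answer the question strictly from the graph result below. Do not add facts from "
    "your own knowledge. If the graph result is empty or irrelevant to the question, "
    'reply exactly: "I don\'t know".\n'
    "Question: {question}\nGraph result: {records}"
)

def verbalize(question: str, records: list, call_llm) -> str:
    # An empty result usually signals a wrong or over-constrained Cypher query,
    # so the model is never shown misleading context in that case.
    if not records:
        return FALLBACK_ANSWER
    return call_llm(
        GUARDED_VERBALIZATION_PROMPT.format(question=question, records=records)
    )
```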
[03] Evaluation
1. How does the system evaluate the performance of the Cypher query generation step? The system uses a manually curated dataset of 69 text-to-Cypher query pairs to quantitatively evaluate the Cypher query generation step. It computes metrics like intersection over union (IoU), precision, and recall by comparing the expected and generated graph result sets.
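As an illustration of how these set-based metrics are computed, here is a small sketch; the toy result sets are invented for the example and are not from the paper's dataset.

```python
# Result-set metrics for text-to-Cypher evaluation, assuming records are
# represented as hashable tuples.
def result_set_metrics(expected: set, generated: set) -> dict:
    intersection = expected & generated
    union = expected | generated
    return {
        "iou": len(intersection) / len(union) if union else 1.0,
        "precision": len(intersection) / len(generated) if generated else 0.0,
        "recall": len(intersection) / len(expected) if expected else 0.0,
    }

# Example: expected answers vs. what the generated Cypher actually returned.
expected = {("Aspirin",), ("Ibuprofen",), ("Naproxen",)}
generated = {("Aspirin",), ("Ibuprofen",), ("Paracetamol",)}
print(result_set_metrics(expected, generated))
# {'iou': 0.5, 'precision': 0.666..., 'recall': 0.666...}
```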
2. How does the system evaluate the correctness and completeness of the final answers? The system uses the LLM-as-a-Judge approach to evaluate the quality of the LLM-generated answers. It compares the answers from the hybrid KG-LLM-based system to those from an LLM-only system, and also evaluates the reliability of LLM verbalization of the information provided by the KG. Correctness is defined as the inclusion of only facts from the graph nodes, and completeness as the inclusion of all such facts.
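A rough sketch of what such an LLM-as-a-Judge check could look like follows; the judge prompt, the JSON output format, and the call_llm parameter are assumptions for illustration, not the paper's exact rubric.

```python
# Sketch of an LLM-as-a-Judge check for correctness and completeness.
import json

JUDGE_PROMPT = """You are grading a generated answer against a set of graph facts.
Correctness: the answer states only facts present in the graph facts.
Completeness: the answer includes every graph fact relevant to the question.
Return JSON: {{"correct": true/false, "complete": true/false}}.

Question: {question}
Graph facts: {facts}
Answer: {answer}
"""

def judge_answer(question: str, facts: list, answer: str, call_llm) -> dict:
    raw = call_llm(JUDGE_PROMPT.format(question=question, facts=facts, answer=answer))
    return json.loads(raw)
```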