Text2SQL is Not Enough: Unifying AI and Databases with TAG
Abstract
The article proposes a unified paradigm called "Table-Augmented Generation" (TAG) for answering natural language questions over databases. It highlights the limitations of existing methods like Text2SQL and Retrieval-Augmented Generation (RAG) in handling queries that require semantic reasoning or world knowledge beyond what is directly available in the database. The TAG model aims to leverage the reasoning capabilities of language models (LMs) and the computational power of database management systems (DBMS) to answer a broader range of natural language queries over data.
Q&A
[01] The TAG Model
1. What are the three key steps of the TAG model? The TAG model consists of three key steps:
- Query Synthesis (syn): Translates the user's natural language request into an executable database query.
- Query Execution (exec): Executes the generated query on the database system to efficiently compute the relevant data.
- Answer Generation (gen): Utilizes the computed data and the language model to generate the final natural language answer.
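The three steps above can be sketched as a small pipeline. This is a minimal illustration, not the article's implementation: it assumes an in-memory SQLite database, and `toy_lm`, `exec_query`, `tag`, and the `movies` table are hypothetical names introduced here so the example runs without a real language model.

```python
import sqlite3
from typing import Callable

# Hypothetical stand-in for a language model: maps a prompt string to text.
LM = Callable[[str], str]

def syn(request: str, lm: LM) -> str:
    """Query synthesis: natural language request -> executable query."""
    return lm(f"Write a SQL query for: {request}")

def exec_query(conn: sqlite3.Connection, query: str) -> list[tuple]:
    """Query execution: run the query on the database and return the rows."""
    return conn.execute(query).fetchall()

def gen(request: str, table: list[tuple], lm: LM) -> str:
    """Answer generation: request plus computed data -> natural language answer."""
    return lm(f"Answer '{request}' using rows: {table}")

def tag(request: str, conn: sqlite3.Connection, lm: LM) -> str:
    """Chain the three steps: syn, then exec, then gen."""
    return gen(request, exec_query(conn, syn(request, lm)), lm)

def toy_lm(prompt: str) -> str:
    # Canned responses (an assumption) so the sketch runs without a model.
    if prompt.startswith("Write a SQL query"):
        return "SELECT title, revenue FROM movies ORDER BY revenue DESC LIMIT 1"
    return "Stub answer based on: " + prompt

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Titanic", 2257.0), ("Up", 735.0)],
)
answer = tag("Which movie grossed the most?", conn, toy_lm)
```

Keeping the language model behind a plain callable means the same pipeline shape works whether the model is a stub, a local model, or a remote API.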
2. How does the TAG model unify prior methods like Text2SQL and RAG? Text2SQL and RAG are special cases of TAG, each serving only a limited subset of user questions. Text2SQL covers only natural language queries that can be expressed in relational algebra, while RAG covers only queries that can be answered with point lookups to one or a few data records in the database.
3. What are some of the key design choices in the TAG model? The TAG model has a rich design space, including:
- Query types: The TAG model can handle both point queries and aggregation queries, as well as queries requiring different levels of knowledge and reasoning capabilities.
- Data model: The underlying data model can take many forms, including relational databases, unstructured data, semi-structured data, etc.
- Database execution engine and API: The TAG model can leverage a variety of database execution engines and APIs, including SQL-based systems, vector embeddings, semantic operators, and LM user-defined functions.
- LM generation patterns: The answer generation step in TAG can use various LM-based algorithms, including single-call generation as well as iterative or recursive generation patterns.
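One of the execution-engine options listed above, LM user-defined functions, can be illustrated with SQLite's support for registering Python functions. This is a sketch under stated assumptions: `lm_is_positive` is a hypothetical keyword check standing in for a real LM call, and the `reviews` table is invented for the example.

```python
import sqlite3

def lm_is_positive(text: str) -> int:
    # Hypothetical stand-in for an LM judging the sentiment of one row.
    # A real system would prompt a model here; the keyword check only
    # keeps the sketch runnable.
    return int(any(w in text.lower() for w in ("great", "love", "excellent")))

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it once per row.
conn.create_function("LM_IS_POSITIVE", 1, lm_is_positive)

conn.execute("CREATE TABLE reviews (product TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [
        ("kettle", "Love it, boils fast"),
        ("kettle", "Stopped working in a week"),
        ("lamp", "Excellent light, great price"),
    ],
)

# The database does the filtering and aggregation; the "LM" supplies
# only the per-row semantic judgment.
rows = conn.execute(
    "SELECT product, COUNT(*) FROM reviews "
    "WHERE LM_IS_POSITIVE(body) = 1 "
    "GROUP BY product ORDER BY product"
).fetchall()
```

The split shown here mirrors the TAG idea: exact computation stays in the DBMS, while the semantic judgment that SQL cannot express is delegated to the model.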
[02] Evaluation
1. What are the key findings from the evaluation of baseline methods on the TAG benchmark? The evaluation found that existing methods like Text2SQL, RAG, and their extensions struggle to perform well on queries requiring semantic reasoning or world knowledge. The Text2SQL baseline achieved no more than 20% exact match accuracy, while the RAG baseline failed to answer a single query correctly across all query types.
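For reference, exact match accuracy is the fraction of queries whose predicted answer equals the labeled answer. The sketch below is one plausible way to compute it; the case and whitespace normalization is an assumption made here, not a description of how the benchmark scores answers.

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that equal their reference after light normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    def normalize(s: str) -> str:
        # Assumed normalization: lowercase and collapse whitespace.
        return " ".join(s.lower().split())

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Two of the three toy predictions match their references.
score = exact_match_accuracy(["Paris", "42 ", "no"], ["paris", "42", "yes"])
```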
2. How did the hand-written TAG pipelines perform in comparison to the baselines? The hand-written TAG pipelines, implemented using the LOTUS runtime, consistently achieved 40% or higher exact match accuracy, significantly outperforming the baseline methods. This demonstrates the promise of building efficient TAG systems that can effectively leverage the capabilities of both LMs and database systems.
3. What are the key advantages of the hand-written TAG pipelines? In addition to higher accuracy, the hand-written TAG pipelines also ran faster, with up to 3.1x lower execution time than the baseline methods. This suggests that an efficient TAG system can be built by exploiting batched LM inference and optimized semantic query execution in the database.
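The batching idea mentioned above can be sketched as grouping rows so the model is invoked once per batch rather than once per row. This illustrates only the call structure, not the reported speedup: `toy_lm_batch`, `label_rows`, and the batch size are hypothetical, and no real inference is performed.

```python
from typing import Callable, Iterator, Sequence

def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def label_rows(
    rows: Sequence[str],
    lm_batch: Callable[[Sequence[str]], list[str]],
    batch_size: int = 4,
) -> list[str]:
    """Apply a batch-capable LM to all rows, one call per batch."""
    labels: list[str] = []
    for chunk in batched(rows, batch_size):
        labels.extend(lm_batch(chunk))
    return labels

call_log: list[int] = []  # records the size of each batch sent to the "LM"

def toy_lm_batch(prompts: Sequence[str]) -> list[str]:
    # Hypothetical batch endpoint: one invocation handles many prompts.
    call_log.append(len(prompts))
    return [p.upper() for p in prompts]

rows = [f"row {i}" for i in range(10)]
labels = label_rows(rows, toy_lm_batch, batch_size=4)
# Ten rows are served by three calls (batches of 4, 4, and 2) instead of ten.
```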