Survey on Embedding Models for Knowledge Graphs and Their Applications
Abstract
The article discusses knowledge graphs (KGs) and their representation, as well as deep learning models and knowledge graph embedding techniques. It covers:
- Basics of KG representation, including different models like RDF, property-centric, and Wikidata
- Overview of large-scale KGs like Freebase, DBpedia, Wikidata, and YAGO
- Introduction to deep learning models like RNNs, LSTMs, GRUs, and CNNs
- Discussion of translation-based and neural network-based KG embedding models, including TransE, TransR, DistMult, ComplEx, SME, MLP, NTN, NAM, ConvKB, and KBGAN
- Applications of KG embedding in NLP tasks like link prediction, triple classification, entity classification, and entity resolution
- Use cases of KGs in domains like fake news/rumor detection, drug-related applications, suicidal ideation detection, and completing KGs with social media data
Q&A
[01] Introduction
1. What are the key differences between Knowledge Base (KB) and Knowledge Graph (KG)?
- A KB represents facts as a set of triples (subject, predicate, object), while a KG represents the same triples as a graph whose nodes are entities and whose edges are labeled relations (contrasted in the sketch below).
- A KG provides a more flexible schema to handle the growing nature of data and allows graph algorithms to be used for querying, summarizing, and reasoning about the semantics.
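As a toy illustration of this contrast (the entities and relations here are hypothetical examples, not facts from the article), the same knowledge can be stored as a set of triples or loaded into a directed multigraph where graph algorithms apply directly:

```python
# The same facts as KB-style triples and as a KG-style directed multigraph.
import networkx as nx

triples = [
    ("Barack_Obama", "born_in", "Honolulu"),
    ("Honolulu", "located_in", "Hawaii"),
]

kg = nx.MultiDiGraph()
for head, relation, tail in triples:
    kg.add_edge(head, tail, key=relation)  # nodes = entities, edges = relations

# Graph algorithms now apply directly, e.g. multi-hop reachability:
print(nx.has_path(kg, "Barack_Obama", "Hawaii"))  # True
```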
2. What are the different models used to represent knowledge in KGs? The article discusses three main models:
- Resource Description Framework (RDF): Represents entities and relations using Uniform Resource Identifiers (URIs); a minimal example follows this list
- Property-Centric model: Represents nodes and edges as key-value pairs, used in graph databases like Neo4j
- Wikidata model: Represents nodes as items and properties, and edges as statements with additional information like references and qualifiers
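For the RDF model, a minimal sketch with the rdflib library (the namespace and the fact are illustrative placeholders) shows how every entity and relation is named by a URI:

```python
# A minimal RDF sketch using rdflib; the namespace and fact are illustrative.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Barack_Obama, EX.born_in, EX.Honolulu))  # every term is a URI

for subj, pred, obj in g:
    print(subj, pred, obj)
```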
3. What are the challenges associated with representing knowledge in traditional KG models? The key challenges are:
- Computational inefficiency of graph algorithms for large-scale KGs
- Data sparsity, making it difficult to accurately calculate semantic or inferential relations
- Requirement for manual feature engineering to use machine learning on KG data
[02] Knowledge Graph Embedding
1. What is the key idea behind knowledge graph embedding? Knowledge graph embedding aims to tackle the challenges of traditional KG representation by learning low-dimensional vector representations of entities and relations that capture the semantic relationships between them.
2. What are the main steps involved in the knowledge graph embedding process?
- Assign random initial values to the entity and relation embeddings.
- Define a scoring function that measures the plausibility of a triple.
- Generate negative examples by randomly replacing the head or tail entity of observed triples.
- Iteratively update the embeddings with an optimization algorithm so that observed triples score higher than their corrupted counterparts (a minimal training sketch follows this list).
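These steps are concrete enough to sketch. Below is a minimal TransE-style training loop in PyTorch; the entity/relation counts, the triples, and the hyperparameters are toy placeholders, not values from the article:

```python
import torch
import torch.nn as nn

num_entities, num_relations, dim, margin = 100, 10, 50, 1.0
ent = nn.Embedding(num_entities, dim)  # step 1: random initial embeddings
rel = nn.Embedding(num_relations, dim)

def score(h, r, t):
    # step 2: plausibility of (h, r, t) as the translation distance ||h + r - t||
    return (ent(h) + rel(r) - ent(t)).norm(p=2, dim=-1)

triples = torch.tensor([[0, 1, 2], [3, 0, 4]])  # toy (head, relation, tail) ids
opt = torch.optim.SGD(list(ent.parameters()) + list(rel.parameters()), lr=0.01)

for _ in range(100):
    h, r, t = triples[:, 0], triples[:, 1], triples[:, 2]
    # step 3: negatives from randomly replacing the tail entity
    t_neg = torch.randint(num_entities, t.shape)
    # step 4: margin-based ranking loss pushes true triples below negatives
    loss = torch.relu(margin + score(h, r, t) - score(h, r, t_neg)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```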
3. What are the two broad categories of KG embedding models discussed in the article?
- Translation-based models: e.g., TransE, TransR, DistMult, ComplEx
- Neural network-based models: e.g., SME, MLP, NTN, NAM, ConvKB, KBGAN
4. How do the translation-based and neural network-based models differ?
- Translation-based models embed entities and relations as vectors and score a triple by how well the relation, treated as a translation, carries the head embedding to the tail embedding (e.g., the distance ||h + r - t||).
- Neural network-based models pass the embeddings through learned network layers (linear, tensor, or convolutional) that act as the scoring function, gaining expressiveness at the cost of more parameters. Both scoring styles are sketched below.
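A side-by-side sketch of the two scoring styles (the dimensions and vectors are random toy placeholders, and the MLP stands in for the neural family generally):

```python
import torch
import torch.nn as nn

dim = 50
h, r, t = torch.randn(dim), torch.randn(dim), torch.randn(dim)

# Translation-based: the relation acts as a translation in the shared space.
transe_score = -(h + r - t).norm(p=2)

# Neural network-based (MLP-style): a learned network judges the triple.
mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
mlp_score = mlp(torch.cat([h, r, t]))
```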
5. What are the key capabilities and limitations of the different KG embedding models discussed?
- TransE is simple and efficient but cannot model symmetric relations or 1-to-N relations, since a single translation vector forces all valid tails to coincide.
- TransR addresses this by keeping separate entity and relation spaces, projecting entities into the relation space with a relation-specific matrix before translating.
- DistMult can model 1-to-N and symmetric relations but not anti-symmetric, inverse, or composition relations.
- ComplEx extends DistMult to complex-valued embeddings and can model symmetric, anti-symmetric, inverse, and 1-to-N relations, but still not composition relations (both scoring functions are sketched below).
- Neural network-based models like SME, MLP, NTN, and ConvKB can capture more complex patterns but have more parameters to learn.
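The DistMult and ComplEx scoring functions are compact enough to sketch directly; the vectors below are random toy placeholders, and the comments note why the algebra yields the capabilities listed above:

```python
import numpy as np

dim = 50
h, r, t = (np.random.randn(dim) for _ in range(3))

# DistMult: symmetric by construction, since h * r * t = t * r * h.
distmult_score = np.sum(h * r * t)

# ComplEx: complex-valued embeddings; conjugating the tail breaks the
# symmetry, so anti-symmetric and inverse relations become expressible.
hc, rc, tc = (np.random.randn(dim) + 1j * np.random.randn(dim) for _ in range(3))
complex_score = np.real(np.sum(hc * rc * np.conj(tc)))
```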
[03] Applications of Knowledge Graph Embedding
1. How are KG embeddings used in NLP tasks? KG embeddings can be applied to various NLP tasks:
- Link prediction: Predicting the missing head or tail entity of a partial triple (a ranking sketch follows this list)
- Triple classification: Verifying if an unseen triple is true or false
- Entity classification: Categorizing entities into semantic classes
- Entity resolution: Identifying if two entities refer to the same object
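Link prediction is commonly run as a ranking task: score every candidate entity and sort. A sketch with TransE-style embeddings, where ent_emb and rel_emb are random toy placeholders standing in for trained matrices:

```python
import numpy as np

ent_emb = np.random.randn(100, 50)  # one row per entity
rel_emb = np.random.randn(10, 50)   # one row per relation

def rank_tails(head_id, rel_id):
    # Score every entity as a candidate tail: smaller ||h + r - t|| is better.
    translated = ent_emb[head_id] + rel_emb[rel_id]
    distances = np.linalg.norm(translated - ent_emb, axis=1)
    return np.argsort(distances)    # entity ids, most plausible first

print(rank_tails(0, 1)[:10])        # top-10 predicted tail entities
```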
2. How are KG embeddings used in the domain of fake news/rumor detection?
- Knowledge-based approaches to fake news detection use KG embeddings to verify the truthfulness of claims by finding semantic paths in the KG.
- Embeddings produced by KG models can also serve as input features for classifiers that distinguish fake news from true news (sketched below).
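A minimal sketch of that classifier route; the features and labels below are random placeholders standing in for real triple embeddings and annotations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each claim triple -> concatenated [head; relation; tail] embedding.
X = np.random.randn(200, 150)       # 200 claims, 3 x 50-dim embeddings
y = np.random.randint(2, size=200)  # 1 = true news, 0 = fake news

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:5]))
```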
3. How are KG embeddings used in drug-related applications?
- Drug-relation graphs are constructed from social media data and embedded to detect drug-related content and activities.
- Graph neural network models like R-GCN learn node embeddings in heterogeneous graphs of users, posts, and drug-related keywords to detect illicit drug traffickers (a minimal sketch follows this list).
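A minimal R-GCN sketch with PyTorch Geometric, as one way such heterogeneous graphs can be embedded; all sizes and the random graph are toy placeholders, not the cited system:

```python
import torch
from torch_geometric.nn import RGCNConv

num_nodes, num_relations, in_dim, out_dim = 50, 3, 16, 8
x = torch.randn(num_nodes, in_dim)               # initial node features
edge_index = torch.randint(num_nodes, (2, 200))  # source/target node ids
edge_type = torch.randint(num_relations, (200,)) # relation id per edge

# One relational convolution: each relation type gets its own weights.
conv = RGCNConv(in_dim, out_dim, num_relations)
node_emb = conv(x, edge_index, edge_type)        # relation-aware embeddings
```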
4. How are KG embeddings used in suicidal ideation detection?
- Personal knowledge graphs are constructed from user data like age, gender, location, mental health history, and social media activity.
- Graph neural network models are then used to learn node embeddings in these personal KGs to detect signals of suicidal ideation.
5. How are KG embeddings used to complete missing information in existing knowledge graphs?
- Information from KGs is combined with matched social media profiles, and clustering and classification over the combined data identify missing links and attributes in KGs (a toy sketch follows).
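A toy sketch of that idea; the features and labels are synthetic placeholders, and a real system would first align entities between the KG and the social media profiles:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

kg_feats = np.random.randn(300, 20)       # per-entity KG features
profile_feats = np.random.randn(300, 15)  # matched social-media features
X = np.hstack([kg_feats, profile_feats])  # combined representation
y = np.random.randint(2, size=300)        # 1 = link/attribute exists

clf = RandomForestClassifier().fit(X, y)
missing = clf.predict_proba(X)[:, 1] > 0.9  # high-confidence candidate links
```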