Knowledge Graph Dataset

Introduction to Knowledge Graphs and Graph Neural Networks with a Practical Use Case

1. Introduction:

Knowledge Graph

A knowledge graph represents a collection of interlinked descriptions of entities: objects, events, or concepts. In essence, it is a way of finding relationships within data and storing them in a format that can be leveraged for data integration, advanced analytics (such as clustering the data by link type), and more. Some companies have benefited greatly from knowledge graphs. For instance, the Google Knowledge Graph is a knowledge base that Google and its services use to enhance search engine results with information gathered from a variety of sources.
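To make the idea concrete, here is a minimal sketch, assuming a knowledge graph stored as a plain Python list of (source, edge, target) triples. The entities and the `neighbors` helper are illustrative only, not part of any library; indexing by relation shows how facts sharing a link type can be grouped together.

```python
from collections import defaultdict

# Each fact is a (source, edge, target) triple; these entities are examples only.
triples = [
    ("Google", "develops", "Google Search"),
    ("Google Knowledge Graph", "enhances", "Google Search"),
    ("Google Knowledge Graph", "is_a", "knowledge base"),
]

# Index the facts by relation (edge) type so triples sharing a link type
# can be retrieved or analysed together.
by_relation = defaultdict(list)
for source, edge, target in triples:
    by_relation[edge].append((source, target))

def neighbors(entity):
    """Return the (edge, target) pairs of every fact whose source is `entity`."""
    return [(e, t) for s, e, t in triples if s == entity]

print(neighbors("Google Knowledge Graph"))
# [('enhances', 'Google Search'), ('is_a', 'knowledge base')]
```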

Graph Neural Networks (GNNs)

Graph neural networks are connectionist models that capture the dependencies within graphs via message passing between their nodes. Unlike standard neural networks, GNNs retain a state that can represent information from a node's neighborhood with arbitrary depth. Although early attempts at training GNNs were very difficult, advances in architectures and parallel computing have led to several variants, such as the graph convolutional network (GCN), the graph attention network (GAT), and the gated graph neural network (GGNN), which have demonstrated groundbreaking performance on many traditional machine learning tasks.
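Message passing can be sketched with a single GCN-style layer in NumPy, using the commonly cited propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). This is an illustrative sketch that assumes an undirected graph, a dense adjacency matrix, and fixed (untrained) weights; the function name `gcn_layer` is hypothetical. Stacking layers is what lets a node's state reflect neighbors that are several hops away.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One GCN-style message-passing step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])         # self-loops: a node keeps its own state
    deg = a_hat.sum(axis=1)                     # degree of each node, self-loop included
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalisation
    messages = norm_adj @ features              # each node aggregates its neighbours
    return np.maximum(messages @ weight, 0.0)   # linear transform followed by ReLU

# A 3-node path graph 0 - 1 - 2 with 2-dimensional node features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
weight = np.eye(2)  # identity weights keep the example easy to inspect

h1 = gcn_layer(adj, features, weight)  # node 0 now sees node 1 (one hop)
h2 = gcn_layer(adj, h1, weight)        # node 0 now also sees node 2 (two hops)
```

After one layer, node 0's representation depends only on nodes 0 and 1; after the second layer it also depends on node 2.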

2. Types of Graphs:

Directed Graph: A directed graph is a set of objects (called vertices or nodes) connected by edges, where every edge is directed from one vertex to another. A directed graph is sometimes called a digraph or a directed network.
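A minimal NetworkX sketch of a directed graph; the node names are arbitrary examples. Because every edge has a direction, a node's successors (where its edges point) differ from its predecessors (where incoming edges come from).

```python
import networkx as nx

# Every edge has a direction: A -> B does not imply B -> A.
g = nx.DiGraph()
g.add_edges_from([("A", "B"), ("B", "C"), ("A", "C")])

print(list(g.successors("A")))    # nodes that A points to: ['B', 'C']
print(list(g.predecessors("C")))  # nodes pointing to C: ['B', 'A']
print(g.has_edge("B", "A"))       # False: the reverse edge was never added
```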

Heterogeneous Graphs: Heterogeneous graphs, or heterographs for short, are graphs that contain different types of nodes and edges. The different types of nodes and edges tend to have different types of attributes that are designed to capture the characteristics of each node and edge type.
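Heterogeneous graphs are often stored by grouping edges under a canonical edge type (source node type, relation, destination node type); DGL's `dgl.heterograph()` is keyed this way. The sketch below uses plain Python dictionaries rather than DGL, with hypothetical Reddit-style node types (user, post, subreddit) and attributes, to show how each node type can carry its own attribute schema.

```python
# Each node type has its own attribute schema (hypothetical example data).
nodes = {
    "user": {"u0": {"karma": 120}, "u1": {"karma": 5}},
    "post": {"p0": {"text": "I am sorry for you"}},
    "subreddit": {"s0": {"name": "r/example"}},
}

# Edges grouped by canonical edge type: (source type, relation, destination type).
edges = {
    ("user", "writes", "post"): [("u0", "p0")],
    ("post", "posted_in", "subreddit"): [("p0", "s0")],
    ("user", "replies_to", "user"): [("u1", "u0")],
}

def check_types(nodes, edges):
    """Raise if any edge connects nodes whose types differ from its edge type."""
    for (src_type, _, dst_type), pairs in edges.items():
        for src, dst in pairs:
            if src not in nodes[src_type] or dst not in nodes[dst_type]:
                raise ValueError(f"edge {src}->{dst} violates its edge type")
    return True

def edge_types_from(edges, node_type):
    """List the canonical edge types whose source is `node_type`."""
    return [etype for etype in edges if etype[0] == node_type]
```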

Graphs with Edge Information: In this variant, each edge carries additional information, such as its weight or its type.
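A minimal NetworkX sketch of edge information, assuming a numeric weight and a categorical type stored as edge attributes (the attribute names and values are illustrative). The weight changes which path counts as shortest, and the type can be used to filter edges.

```python
import networkx as nx

g = nx.Graph()
# Each edge carries extra information: a weight and an edge type.
g.add_edge("a", "b", weight=1.0, type="friend")
g.add_edge("b", "c", weight=1.0, type="friend")
g.add_edge("a", "c", weight=5.0, type="colleague")

print(g["a"]["c"])                                     # {'weight': 5.0, 'type': 'colleague'}
print(nx.shortest_path(g, "a", "c"))                   # fewest hops: ['a', 'c']
print(nx.shortest_path(g, "a", "c", weight="weight"))  # lowest total weight: ['a', 'b', 'c']

# Keep only the edges of one type.
friend_edges = [(u, v) for u, v, t in g.edges(data="type") if t == "friend"]
```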

Dynamic Graphs: Another variant is the dynamic graph, which has a static graph structure and dynamic input signals. To capture both kinds of information, DCRNN (Diffusion Convolutional Recurrent Neural Network) first collects spatial information with GNNs, then feeds the outputs into a sequence model such as a sequence-to-sequence model or a CNN.
Other approaches instead extend the static graph structure with temporal connections, so that traditional GNNs can be applied to the extended graph.
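The idea of extending a static structure with temporal connections can be sketched in plain Python: each node v at time step t becomes a new node (v, t), the static edges are copied into every time step, and temporal edges link (v, t) to (v, t + 1). This is a generic illustration under those assumptions, with hypothetical signal values; specific models build their extended graphs in their own ways.

```python
def extend_with_time(nodes, static_edges, num_steps):
    """Unroll a graph with a fixed structure over `num_steps` time steps.

    Node (v, t) stands for node v at time step t.
    """
    edges = []
    for t in range(num_steps):
        # Spatial edges: the same static structure is copied into every step.
        edges += [((u, t), (v, t)) for u, v in static_edges]
    for t in range(num_steps - 1):
        # Temporal edges: each node is linked to its own copy at the next step.
        edges += [((v, t), (v, t + 1)) for v in nodes]
    return edges

nodes = ["x", "y", "z"]
static_edges = [("x", "y"), ("y", "z")]
num_steps = 3

# Dynamic input signal: one (hypothetical) reading per node per time step.
signals = {"x": [0.1, 0.3, 0.2], "y": [0.5, 0.4, 0.6], "z": [0.9, 0.8, 0.7]}

extended_edges = extend_with_time(nodes, static_edges, num_steps)
# Node features of the extended, now static, graph come from the time-varying signal.
extended_features = {(v, t): signals[v][t] for v in nodes for t in range(num_steps)}

print(len(extended_edges))  # 6 spatial + 6 temporal = 12
```

Any ordinary GNN can then run on `extended_edges` and `extended_features`, since the time dimension has been folded into the graph itself.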


3. Use Case: Understanding the Concepts Through a Real-World Problem:

Problem statement: Create a knowledge graph from web-scraped Reddit data and use a GNN to perform sentiment classification on it.

Step 1: The first step is to create a knowledge graph tuple dataset (Source, Edge, Target) from the scraped data. This involves several tokenization and lemmatization techniques, because the raw scraped text has to be converted into proper nodes and edges. The idea is to divide each sentence into three chunks: the subject, the main verb, and the object, where the main verb and the object together form the predicate of the sentence. The subject becomes the Source, the main verb becomes the Edge, and the object becomes the Target.

E.g., given the sentence: “I am sorry for you”

Transformation into a KG tuple:

Source | Edge | Target
I | be | sorry you

* Lemmatization reduces words to their root form (e.g., “am” becomes “be”), which eliminates repetition and duplication during tuple creation.
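The article does not show its extraction code, so the following is only a toy sketch of the Source/Edge/Target split, assuming very simple sentences of the form subject, verb, object. The `LEMMAS` and `STOPWORDS` tables and the `extract_triple` helper are hypothetical stand-ins; a real pipeline would use a proper tokenizer, lemmatizer, and dependency parser.

```python
# Hypothetical lookup tables standing in for a real lemmatizer and stopword list.
LEMMAS = {"am": "be", "is": "be", "are": "be", "was": "be", "were": "be",
          "wants": "want", "wanted": "want", "needs": "need", "needed": "need"}
STOPWORDS = {"for", "to", "a", "an", "the", "of"}

def lemmatize(token):
    """Map a token to its root form using the toy lookup table."""
    return LEMMAS.get(token.lower(), token.lower())

def extract_triple(sentence):
    """Naive split: first token = subject (Source), second token = main verb
    (Edge), remaining non-stopword tokens = object (Target)."""
    tokens = sentence.strip(" .!?").split()
    if len(tokens) < 3:
        return None  # too short for a subject-verb-object split
    source = tokens[0]
    edge = lemmatize(tokens[1])
    target = " ".join(lemmatize(t) for t in tokens[2:] if t.lower() not in STOPWORDS)
    return (source, edge, target)

print(extract_triple("I am sorry for you"))  # ('I', 'be', 'sorry you')
```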

[Figure: Knowledge Graph dataset]
[Figure 3: Knowledge Graph edge ‘want’]

Figure 3 shows the part of the knowledge graph where the edge is ‘want’.

Step 2: Next, let’s cluster the tuples that have similar edge types. Here I’ll use the Affinity Propagation technique (with Euclidean affinity) to group similar edge types into distinct clusters.

E.g., for the cluster ‘need’, the similar words are:
beg, get, help, keep, let, need, send.

Clusters created:

[Figure: Clusters created]
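A minimal sketch of the clustering step with scikit-learn's `AffinityPropagation`, assuming each edge label has already been turned into a vector. The article does not say how edges were vectorized, so the 2-D vectors below are hypothetical stand-ins for word embeddings. With `affinity="euclidean"`, scikit-learn uses the negative squared Euclidean distance as the similarity.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical 2-D "embeddings" for a few edge labels; in practice each edge
# verb would be mapped to a real word vector.
edge_labels = ["need", "get", "help", "beg", "like", "love", "enjoy", "adore"]
vectors = np.array([
    [0.15, 0.15], [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],      # "need"-like verbs
    [5.15, 5.15], [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],      # "like"-like verbs
])

# Similarity = negative squared Euclidean distance between vectors.
model = AffinityPropagation(affinity="euclidean", random_state=0).fit(vectors)

# Group labels under the edge label chosen as each cluster's exemplar.
clusters = {}
for label, cluster_id in zip(edge_labels, model.labels_):
    exemplar = edge_labels[model.cluster_centers_indices_[cluster_id]]
    clusters.setdefault(exemplar, []).append(label)
print(clusters)
```

Affinity Propagation picks the number of clusters itself; the `preference` parameter (by default the median similarity) controls how many exemplars emerge.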

Step 3: Creating a knowledge graph using the NetworkX library

The NetworkX library makes it easy to build a knowledge graph from different data sources. Its from_pandas_edgelist API, which creates a graph from a pandas DataFrame, is especially useful when the data is already stored as (Source, Edge, Target) tuples in a DataFrame.
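A minimal sketch of Step 3, assuming the tuples sit in a pandas DataFrame with columns `Source`, `Edge`, and `Target` (the rows below are examples). Using a `MultiDiGraph` is a design choice: it preserves edge direction and lets the same pair of nodes share several relations.

```python
import networkx as nx
import pandas as pd

# KG tuples in the (Source, Edge, Target) layout from Step 1 (example rows).
df = pd.DataFrame(
    [("I", "be", "sorry you"),
     ("I", "want", "help"),
     ("you", "need", "help")],
    columns=["Source", "Edge", "Target"],
)

# Store the relation name as an edge attribute called "Edge".
kg = nx.from_pandas_edgelist(
    df, source="Source", target="Target", edge_attr="Edge",
    create_using=nx.MultiDiGraph(),
)

for u, v, rel in kg.edges(data="Edge"):
    print(f"{u} --{rel}--> {v}")

# Select all edges with one relation, like the 'want' sub-graph in Figure 3.
want_edges = [(u, v) for u, v, rel in kg.edges(data="Edge") if rel == "want"]
```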

Step 4: Using the Deep Graph Library (DGL) to perform sentiment classification (positive, negative) on the knowledge graph.

  • Step 4.1: Forming graph mini-batches using the dgl.batch() API: One of the most common practices for training neural networks efficiently is to form a mini-batch by collating multiple samples together. However, batching graph inputs poses two challenges:
  1. Graphs are sparse.
  2. Graphs can vary in size, i.e., in their number of nodes and edges.
    To address this, DGL provides the dgl.batch() API. It leverages the idea that a batch of graphs can be viewed as one large graph with many disjoint connected components.
    Below is a visualization that gives the general idea.
    [Figure: DGL batch API]
  • Step 4.2: DGL graph classifier: Given a batch of graphs, the classifier first performs message passing and graph convolution so that nodes can communicate with their neighbors. After message passing, a tensor representing each graph is computed from the node (and edge) attributes; this step is often called readout or aggregation. Finally, the graph representations are fed into a classifier g to predict the graph labels.
    [Figure: DGL graph classifier]

    The final label is obtained from the predicted probabilities with a custom function: label = 1 if the probability > 0.5, and label = 0 if the probability < 0.5.
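The batching, readout, and thresholding steps above can be sketched in plain NumPy. This is not DGL code: `batch_graphs`, `message_pass`, `readout`, and `predict_labels` are hypothetical helpers that mirror the ideas behind dgl.batch(), a graph convolution, a mean readout, and the 0.5 threshold, and the classifier weights are made-up, untrained values. In DGL itself these roles are played by dgl.batch(), layers such as dgl.nn.pytorch.GraphConv, and readout functions such as dgl.mean_nodes().

```python
import numpy as np

def batch_graphs(graphs):
    """Merge (num_nodes, edge_list) graphs into one graph whose components are disjoint."""
    src, dst, graph_ids = [], [], []
    offset = 0
    for gid, (num_nodes, edges) in enumerate(graphs):
        for u, v in edges:
            src.append(u + offset)  # shift node ids so graphs never overlap
            dst.append(v + offset)
        graph_ids.extend([gid] * num_nodes)
        offset += num_nodes
    return np.array(src, dtype=int), np.array(dst, dtype=int), np.array(graph_ids, dtype=int), offset

def message_pass(src, dst, feats):
    """Each node averages its own features with those of its in-neighbours."""
    agg = feats.copy()
    count = np.ones(len(feats))
    np.add.at(agg, dst, feats[src])  # sum messages sent along each edge u -> v
    np.add.at(count, dst, 1)
    return agg / count[:, None]

def readout(feats, graph_ids, num_graphs):
    """Mean-pool node features into one representation per graph."""
    sums = np.zeros((num_graphs, feats.shape[1]))
    np.add.at(sums, graph_ids, feats)
    return sums / np.bincount(graph_ids, minlength=num_graphs)[:, None]

def predict_labels(graph_reprs, w, b):
    """Logistic classifier g, then threshold (a probability of exactly 0.5 maps to 0 here)."""
    probs = 1.0 / (1.0 + np.exp(-(graph_reprs @ w + b)))
    return probs, (probs > 0.5).astype(int)

graphs = [(3, [(0, 1), (1, 2)]),  # graph 0: a 3-node chain
          (2, [(0, 1)])]          # graph 1: a single edge
src, dst, graph_ids, num_nodes = batch_graphs(graphs)
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]])

h = message_pass(src, dst, feats)
reprs = readout(h, graph_ids, len(graphs))
probs, labels = predict_labels(reprs, w=np.array([1.0, -2.0]), b=0.0)  # hypothetical weights
```

Because no edge crosses from one graph to another, message passing on the batched graph gives the same result as processing each graph separately, which is why batching this way is safe.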

4. Python Libraries to Get Started with KGs and GNNs

  • NetworkX
  • Deep Graph Library
  • Spektral
  • StellarGraph
  • Ktrain
  • CogDL

Author: Aninda Bhattacharjee

