What is Neural Structured Learning?
This blog is an introduction to Neural Structured Learning (NSL). Before we explore NSL, let’s review some basic concepts that we need in order to understand it.
Neural networks are a class of non-linear mappings from inputs to outputs. They are composed of multiple layers that can potentially learn useful representations for predicting the outputs.
In semi-supervised learning, the algorithm is trained on a combination of labeled and unlabeled data. By leveraging a large amount of unlabeled data, this type of algorithm can improve prediction performance compared to techniques that use only labeled data. Typically we have a very small amount of labeled data and a large amount of unlabeled data, which is much cheaper to obtain than labeled data. One common approach is to first cluster similar data points using unsupervised machine learning and then use the available labeled data to label the rest of the unlabeled data.
Label propagation is a semi-supervised machine learning algorithm that assigns labels to previously unlabeled data points. It constructs a graph over the labeled and unlabeled data, since graphs make it easy to describe the relationships between data points, and it encourages labels to vary smoothly across that graph. Edges in the graph connect semantically similar nodes (data points), and edge weights, if present, reflect how strong those similarities are. Given a set of labeled nodes, the technique iteratively refines the node labels by aggregating information from each node’s neighbors and propagating the labels along the edges.
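To make the iterative idea concrete, here is a minimal sketch of label propagation in plain Python. It is only an illustration under simple assumptions: the graph is undirected, the labeled (seed) nodes are kept fixed, and every other node repeatedly takes the weighted average of its neighbors’ label distributions. The function name `propagate_labels` and the toy graph are made up for this example.

```python
def propagate_labels(edges, seed_labels, num_nodes, num_classes, iterations=20):
    """Spread label distributions over a weighted, undirected graph."""
    # Build a symmetric adjacency list: node -> list of (neighbor, weight).
    neighbors = {n: [] for n in range(num_nodes)}
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    # Unlabeled nodes start uniform; seed nodes get a one-hot distribution.
    uniform = [1.0 / num_classes] * num_classes
    dist = [list(uniform) for _ in range(num_nodes)]
    for node, label in seed_labels.items():
        dist[node] = [1.0 if c == label else 0.0 for c in range(num_classes)]

    for _ in range(iterations):
        new_dist = []
        for node in range(num_nodes):
            # Keep seed nodes fixed and leave isolated nodes unchanged.
            if node in seed_labels or not neighbors[node]:
                new_dist.append(dist[node])
                continue
            total = sum(w for _, w in neighbors[node])
            # Weighted average of the neighbors' current distributions.
            new_dist.append([
                sum(w * dist[nbr][c] for nbr, w in neighbors[node]) / total
                for c in range(num_classes)
            ])
        dist = new_dist

    # Final label of each node is the most probable class.
    return [max(range(num_classes), key=lambda c: d[c]) for d in dist]


# Chain 0 - 1 - 2 - 3 - 4 where only the two end nodes are labeled.
# The weak edge between nodes 2 and 3 acts as a soft boundary.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.2), (3, 4, 1.0)]
predicted = propagate_labels(edges, {0: 0, 4: 1}, num_nodes=5, num_classes=2)
print(predicted)
```

In this toy chain, nodes 1 and 2 end up with the label of node 0 and node 3 ends up with the label of node 4, because the weak edge between nodes 2 and 3 lets little information through.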
Neural Graph Learning
Neural Graph Machines is a training framework that combines the power of neural networks and label propagation through the objective of graph regularization. It was proposed in the research paper Neural Graph Learning: Training Neural Networks Using Graphs (WSDM’18) by Thang D. Bui, Sujith Ravi, and Vivek Ramavajjala.
If a cat image and a dog image are strongly connected in a graph, and the cat node is labeled as an animal, the predicted probability of the dog node being an animal is also high. In contrast, the standard neural network training objective only takes into account the labeled instances and ensures correct predictions on the training set. As a consequence, a neural network trained on the cat image alone may not make an accurate prediction for the dog image.
This shortcoming of neural network training can be rectified by biasing the network using prior knowledge about the relationships between instances in the dataset. Training instances that are connected in a graph, whether labeled or unlabeled (for example, the dog and the cat above), should have similar predictions. This can be done by encouraging neighboring data points to have similar hidden representations learned by the neural network, which results in a modified objective function for training neural network architectures using both labeled and unlabeled data points. The architectures trained using this objective are called Neural Graph Machines (NGM).
Neural Graph Machines is a general framework for graph-augmented training of neural networks. Its objective function encourages the neural network to make accurate node-level predictions, as in vanilla neural network training, and also constrains the network to learn similar hidden representations for nodes connected by an edge in the graph. The objective function is a weighted sum of the neural network cost and the label propagation cost; it can be optimized with stochastic gradient descent and scaled to large graphs. The new objective adds a regularization term for generic neural network architectures that enforces similarity between nodes in the graph, inspired by the objective function of label propagation.
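The weighted sum described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the exact formulation from the paper: it assumes a single weight `alpha` for all edges (the paper distinguishes edges between labeled and unlabeled nodes), cross-entropy as the supervised cost, and squared L2 distance between hidden representations. All names and numbers here are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Supervised cost for one labeled node."""
    return -math.log(probs[label])


def squared_l2(a, b):
    """Distance between two hidden representations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ngm_objective(predictions, hidden, labels, edges, alpha):
    """Supervised cost plus alpha times the graph regularization cost."""
    # Neural network cost: only labeled nodes contribute.
    supervised = sum(cross_entropy(predictions[n], y) for n, y in labels.items())
    # Label-propagation-style cost: connected nodes should have similar
    # hidden representations, scaled by the edge weight.
    graph_term = sum(w * squared_l2(hidden[u], hidden[v]) for u, v, w in edges)
    return supervised + alpha * graph_term


# Node 0 is labeled, node 1 is unlabeled, and the two are connected.
predictions = {0: [0.9, 0.1], 1: [0.6, 0.4]}
hidden = {0: [1.0, 0.0], 1: [0.0, 1.0]}
loss = ngm_objective(predictions, hidden, labels={0: 0},
                     edges=[(0, 1, 0.5)], alpha=0.1)
print(round(loss, 4))
```

Note that the unlabeled node still influences the loss through the graph term, which is how unlabeled data ends up shaping the learned representation.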
The new objective allows neural networks to combine supervised learning on labeled data with label propagation on unlabeled data. The network trains on labeled data as in the supervised setting, while being biased to learn similar hidden representations for neighboring nodes on a graph, in the same vein as label propagation.
The training objective uses graphs to augment neural network learning. It works with many forms of graphs, such as natural graphs and constructed graphs, and with any type of neural network architecture, including feed-forward NNs, CNNs, and RNNs, on various training datasets and prediction tasks. Hence, Neural Graph Machines is a generalized framework for graph-augmented training of neural networks, as it directly learns better predictive models from the graph.
Adversarial examples are malicious inputs intentionally designed to fool machine learning models into making wrong predictions or classifications. They often transfer from one model to another, allowing attackers to mount black-box attacks without knowledge of the target model’s parameters. Several machine learning models, including neural networks, consistently misclassify adversarial examples: inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input causes the model to output an incorrect answer with high confidence. Adversarial training is the process of explicitly training a model on adversarial examples in order to make it more robust to attack or to reduce its test error on clean inputs.
Below is a demonstration of an adversarial example from the paper Explaining and Harnessing Adversarial Examples.
This example demonstrates fast adversarial example generation applied to GoogLeNet (Szegedy et al., 2014a) on ImageNet: adding an imperceptibly small perturbation changes GoogLeNet’s classification of the image. If you want to read more about Adversarial Learning, please refer to this blog.
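The perturbation in that paper is produced by the fast gradient sign method: take the gradient of the loss with respect to the input, keep only its sign, and step a small distance epsilon in that direction. The sketch below applies the same idea to a tiny logistic regression model instead of GoogLeNet, so that the gradient can be written by hand. The weights, the input, and the value of epsilon are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of the positive class under a logistic regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, y, epsilon):
    """Perturb x by epsilon in the direction that increases the loss."""
    p = predict(weights, bias, x)
    # For binary cross-entropy, d(loss)/d(x_i) = (p - y) * w_i.
    grad = [(p - y) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


weights, bias = [2.0, -3.0, 1.0], 0.0
x, y = [0.2, -0.1, 0.1], 1          # a clean input whose true label is 1

x_adv = fgsm(weights, bias, x, y, epsilon=0.2)
clean_p = predict(weights, bias, x)
adv_p = predict(weights, bias, x_adv)
print(f"clean: {clean_p:.2f}  adversarial: {adv_p:.2f}")
```

Each feature moves by at most epsilon, yet the predicted probability of the true class drops from above 0.5 to below 0.5, so the model’s decision flips.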
Neural Structured Learning
Neural Structured Learning (NSL) is a framework in TensorFlow that allows you to train neural networks with structured signals. NSL is based on the concept of Neural Graph Learning. It jointly optimizes both features and structured signals, which allows you to train better and more robust neural networks by leveraging the structure in the data. It is also supported in TF2.0 and Keras.
So what exactly is a structure?
For example, consider a photo album. It is a structure over photos: the photos in the album have some sort of connection or relationship with each other. Similarly, consider a biomedical research paper, where each reference or citation to another paper represents a link. This kind of relationship exists in many data sources, and that’s what NSL tries to leverage.
The core idea behind NSL is to take a neural network and feed it structured signals in addition to the feature inputs. Take a cat and dog classifier, for example: in addition to images with labels, we also feed the neural network the connections and relationships between the samples themselves.
The majority of machine learning applications require annotated, labeled data, and annotation is a very tedious task. This is where a framework like NSL can be very useful. NSL captures the structure and relationships in the data with minimal supervision, so that you can train classifiers and prediction systems with comparable accuracy using fewer labels.
There are three major benefits of NSL:
1. Less labeled data
2. More robust model
3. Higher accuracy
If the data changes or gets corrupted, a model can suddenly start to misbehave, so using NSL helps improve the quality and robustness of the network.
Consider an example of document classification. Suppose we have plenty of raw books. One person wants to classify the books according to the genre of the books. Another person wants to classify the books according to the author of the books. Similarly, everyone has different needs according to their problem statement. But is it possible to annotate and label data according to different tasks to achieve high accuracy? Probably not.
With NSL, the idea is that you label only 5% to 10% of the data and still train a classifier or prediction system on it.
As explained in the Neural Graph Learning section, the structured signals are used to regularize the training of a neural network. The model is forced to learn accurate predictions by minimizing the supervised loss, while at the same time maintaining the structural similarity of the inputs by minimizing the neighbor loss.
For explicit neighbor-based regularization, we typically compute the neighbor loss as the distance between the sample’s embedding and the neighbor’s embedding. However, any layer of the neural network may be used to compute the neighbor loss. On the other hand, for induced neighbor-based regularization, we compute the neighbor loss as the distance between the output prediction of the induced adversarial neighbor and the ground truth label. This is commonly used in adversarial learning. Hence, NSL generalizes to Neural Graph Learning if neighbors are explicitly represented by a graph, and to Adversarial Learning if neighbors are implicitly induced by adversarial perturbation.
For this purpose, the NSL framework provides AdversarialRegularization and GraphRegularization as two separate wrapper classes. Each wraps a Keras base model and adds adversarial regularization or graph regularization to the training objective.
As explained in the Neural Graph Learning section, this technique is generic and can be applied to arbitrary neural architectures such as feed-forward NNs, convolutional NNs, and recurrent NNs.
Workflow for Neural Structured Learning
The training samples are augmented to include structured signals.
When structured signals are not explicitly provided, there are two ways to create them:
Construct the graph by preprocessing
In an example like sentence classification, we create an embedding for every text sample and then use the embeddings to build a similarity graph, so that nodes in the graph correspond to samples and edges in the graph correspond to the similarity between pairs of nodes.
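As a rough sketch of this preprocessing step, the plain-Python snippet below connects two samples whenever the cosine similarity of their embeddings reaches a threshold. It only illustrates the idea and is not how NSL implements graph building; the embeddings, the sample ids, and the threshold value are invented for the example, and the pairwise loop would be too slow for a large dataset.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_similarity_graph(embeddings, threshold):
    """Return (id_a, id_b, similarity) edges for sufficiently similar pairs."""
    edges = []
    # Compare every pair of samples once (quadratic in the number of samples).
    for (id_a, emb_a), (id_b, emb_b) in combinations(sorted(embeddings.items()), 2):
        similarity = cosine_similarity(emb_a, emb_b)
        if similarity >= threshold:
            edges.append((id_a, id_b, similarity))
    return edges


# Hypothetical 3-dimensional embeddings for three reviews.
embeddings = {
    "review_1": [0.9, 0.1, 0.0],
    "review_2": [0.8, 0.2, 0.1],
    "review_3": [0.0, 0.1, 0.9],
}
graph = build_similarity_graph(embeddings, threshold=0.8)
for id_a, id_b, similarity in graph:
    print(id_a, id_b, round(similarity, 3))
```

Only the first two reviews point in nearly the same direction, so they are the only pair that gets an edge; the edge weight is the similarity itself.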
Construct the graph dynamically (by inducing)
The core idea of adversarial learning is to train a model with adversarially perturbed data (called adversarial examples) in addition to the organic training data. The adversarially perturbed version of each original sample becomes its neighbor, and in this way we can create the graph dynamically.
The augmented training samples (including both original samples and their corresponding neighbors) are fed to the neural network for calculating their embeddings.
The distance between a sample’s embedding and its neighbor’s embedding is calculated and used as the neighbor loss, which is treated as a regularization term and added to the final loss.
Graph regularization for sentiment classification.
The example below is a binary classification task, an important and widely applicable kind of machine learning problem. It classifies movie reviews as positive or negative using the text of the review. The following are the general steps for building a graph-regularized model using the Neural Structured Learning (NSL) framework when the input does not contain an explicit graph.
Step 1: Generate Embeddings
1. Install the Neural Structured Learning package and import the required libraries.
2. Load the IMDB data from Keras.
3. Generate embeddings for each review (input) in the IMDB data by creating a text-embedding lookup.
Step 2: Build Graph and prepare graph input for NSL.
Build a graph based on the generated embeddings by using a similarity metric such as the ‘L2’ distance, ‘cosine’ distance, etc. Nodes in the graph correspond to samples and edges in the graph correspond to the similarity between pairs of samples. Generate training data from the above-synthesized graph and sample features. The resulting training data will contain neighbor features in addition to the original node features.
1. Build a graph based on these embeddings using the build_graph utility of NSL.
2. Create sample features (the feature definition); each sample will include the id, words, and label.
3. Generate augmented training data for Neural Structured Learning. The NSL framework provides a library to combine the graph and the sample features to produce the final training data for graph regularization. The resulting training data will include the original sample features as well as the features of their corresponding neighbors.
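To show what "original sample features plus neighbor features" can look like, here is a small plain-Python sketch that merges a graph with sample features. It is not NSL’s own utility. The feature names (`NL_nbr_0_words`, `NL_nbr_0_weight`, `NL_num_nbrs`) are modeled on the naming used in the NSL tutorials and should be treated as an assumption, as should the simplifications: edges are undirected, every sample is augmented, and the strongest neighbors are kept first.

```python
def pack_neighbors(samples, edges, max_neighbors):
    """Attach the features of each sample's top neighbors to the sample."""
    # Collect (neighbor_id, weight) pairs for every sample, in both directions.
    neighbors = {sample_id: [] for sample_id in samples}
    for u, v, weight in edges:
        neighbors[u].append((v, weight))
        neighbors[v].append((u, weight))

    augmented = []
    for sample_id, features in samples.items():
        # Keep only the strongest max_neighbors edges.
        top = sorted(neighbors[sample_id], key=lambda nw: nw[1], reverse=True)
        top = top[:max_neighbors]

        example = dict(features)          # original features (words, label)
        example["id"] = sample_id
        for i, (nbr_id, weight) in enumerate(top):
            example[f"NL_nbr_{i}_words"] = samples[nbr_id]["words"]
            example[f"NL_nbr_{i}_weight"] = weight
        example["NL_num_nbrs"] = len(top)
        augmented.append(example)
    return augmented


# Hypothetical word-id sequences and labels for three reviews.
samples = {
    "r1": {"words": [1, 5, 9], "label": 1},
    "r2": {"words": [1, 5, 7], "label": 1},
    "r3": {"words": [2, 4], "label": 0},
}
edges = [("r1", "r2", 0.98), ("r1", "r3", 0.4)]

augmented = pack_neighbors(samples, edges, max_neighbors=1)
print(augmented[0])
```

With `max_neighbors=1`, sample `r1` keeps only its strongest neighbor `r2`, and the model later sees both the words of `r1` and the words of `r2` in the same training example.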
Step 3: Graph Regularised Keras Model
Create a neural network as a base model using the Keras sequential, functional, or subclass API. Wrap the base model with the GraphRegularization wrapper class, which is provided by the NSL framework, to create a new graph Keras model. This new model will include a graph regularization loss as the regularization term in its training objective. Train and evaluate the graph Keras model.
1. Read the data by creating train and test datasets using the function make_datasets.
2. Create the base model. For this problem we can create a Bi-LSTM model or even a simple feed-forward neural network.
3. Configure the graph model by wrapping the base model with graph regularization.
4. Compile, fit, and evaluate.
The supervision ratio is the ratio of training samples to the total number of samples, which includes training, validation, and test samples. The graphs above show the final results of the sentence classification problem. It can be observed that as the supervision ratio decreases, model accuracy also decreases. This is true for both the base model and the graph-regularized model, regardless of the model architecture used. In these results, the joint training approach (using both structured signals and features) outperforms the Bi-LSTM model on its own.
NSL is a very interesting and vast topic to explore. For those who want hands-on experience with Neural Structured Learning, please refer to these detailed tutorials.
Thank You for reading!
Please feel free to share your doubts or suggestions. I am one of the members of team Nsemble.ai, we love to research and develop challenging products using artificial intelligence. Nsemble has developed several solutions in the domain of Industry 4.0 and E-commerce. We will be happy to help you!