Entropy-based optimization strategies for convolutional neural networks : [a thesis submitted to Auckland University of Technology in fulfilment of the requirements for the degree of Doctor of Philosophy (PhD), 2021] / Nidhi Nijagunappa Gowdra ; supervisors: Roopak Sinha, Wei Qi Yan, Stephen MacDonell.

Bibliographic Details
Main Author: Gowdra, Nidhi Nijagunappa (Author)
Corporate Author: Auckland University of Technology
Format: Ethesis
Language: English
Subjects: Neural networks (Computer science); Machine learning
Online Access: Click here to access this resource online

MARC

LEADER 00000ntm a2200000 i 4500
005 20221116113852.0
006 m|||| o||d| ||||||
007 cr |n ||||||a|
008 210618s2021 nz a omb 000 0 eng d
040 |a Z5A  |b eng  |e rda  |c Z5A 
082 0 4 |a 006.3  |2 23 
100 1 |a Gowdra, Nidhi Nijagunappa,  |e author 
245 1 0 |a Entropy-based optimization strategies for convolutional neural networks :  |b [a thesis submitted to Auckland University of Technology in fulfilment of the requirements for the degree of Doctor of Philosophy (PhD), 2021] /  |c Nidhi Nijagunappa Gowdra ; supervisors: Roopak Sinha, Wei Qi Yan, Stephen MacDonell. 
264 0 |c [2021] 
300 |a 1 online resource 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
347 |a PDF  |c 2.255 Mb  |3 Thesis 
502 |a Thesis  |b PhD  |c Auckland University of Technology  |d 2021 
504 |a Includes bibliographical references. 
516 |a Text (PDF file (172 pages, 2.255 Mb)) 
520 3 |a Deep convolutional neural networks are state-of-the-art for image classification, and significant strides have been made to improve neural network model performance, which can now even exceed human-level abilities. However, these gains have been achieved through increased model depths and rigorous, specialized manual fine-tuning of model HyperParameters (HPs). These strategies cause considerable over-parameterization and elevated complexity in Convolutional Neural Network (CNN) model training. Training over-parameterized CNN models tends to induce afflictions such as overfitting, increased sensitivity to noise and decreased generalization ability, which contribute to deterioration of model performance. Furthermore, training over-parameterized CNN models requires specialized regimes and vast computing power, subsequently increasing the complexity and difficulty of training. In this thesis, we develop several novel entropy-based techniques to abate the effects of over-parameterization, reduce the number of manually tuned HPs, increase generalization ability and enhance the performance of CNN models. Specifically, we examine information propagation and feature extraction/generation & representation in CNNs. Armed with this knowledge, we develop a heuristic and several optimization strategies to simplify model training and improve model performance by addressing the problem of over-parameterization in CNNs. We cultivate the techniques in this thesis using quantitative metrics such as Shannon's Entropy (SE), Maximum Entropy (ME) and Signal-to-Noise Ratio (SNR). Our methodology involves a multi-faceted approach of incorporating iterative and continuous integration of quantitatively defined feedback loops, which allows us to test numerous research hypotheses efficiently using the design science research framework. We start by exploring and understanding the hierarchical feature extraction & representational capabilities of CNNs. Through our experimentation we were able to explore the sparsity of feature representations and analyze the underlying learning mechanisms in CNNs for non-convex optimization problems such as image classification. Equipped with this knowledge, we experimentally demonstrate and validate the notion that, for low- and high-quality input data (determined through ME and SNR measures), using deeper and shallower networks could lead to the phenomena of information underflow and overflow, respectively, degrading classification performance. To mitigate the negative effects of information underflow and overflow in the context of kernel saturation, we propose and evaluate a novel hypothesis of augmenting the data distribution of the input dataset with negative images. Our experiments yielded classification accuracy increases of 3%-7% on various datasets. One limitation argued against the validity of our novel augmentation was model training time; in particular, models require large amounts of computing power and time to train. To address these criticisms, we theorize an SE-based heuristic that resolves the problem of over-parameterization by forcing feature abstractions in the convolutional layers up to their theoretical limit as defined by their SE measure. The SE-based model trained 45.22% faster, without compromising classification accuracy, when compared to deeper models. Further arguments were posed relating to model training afflictions such as overfitting and poor generalizability. 
To mitigate the concerns raised around model training afflictions such as overfitting and poor generalizability in deep CNN models, we introduce a Maximum Entropy-based Learning raTE enhanceR (MELTER), to dynamically schedule and adapt model learning during training, and a Maximum Categorical Cross-Entropy (MCCE) loss function, derived from the commonly used Categorical Cross-Entropy (CCE) loss function, to reduce model overfitting. MELTER and MCCE utilize a priori knowledge of the input data to curtail several risks encountered during model training that affect performance, such as sensitivity to random noise, overfitting to the training data and lack of generalizability to new, unseen data. To this end, MELTER outperforms manually tuned models by 2%-6% on various benchmarking datasets by exploring a larger solution space. MCCE-trained models showed a reduction in overfitting of up to 5.4% and outperformed CCE-trained models in terms of classification accuracy by up to 6.17% on two facial (ethnicity) recognition datasets, colorFERET and UTKFace, along with standard benchmarking datasets such as CIFAR-10 and MNIST. Through this series of experiments, we conclude that entropy-based optimization strategies for tuning the HPs of deep learning models are viable, and either match or exceed baseline classification accuracies achieved by networks trained using traditional methods. Furthermore, the entropy-based optimization methods outlined in this thesis also mitigate several well-known training afflictions, such as overfitting, lack of generalizability and slow convergence, while eliminating manual fine-tuning of HPs. 
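The abstract above leans on Shannon's Entropy (SE) as a measure of information in the input data and reports accuracy gains from augmenting datasets with negative images. As a minimal, illustrative Python/NumPy sketch of those two concepts only (not the thesis' own code; the function names here are our assumptions), one could compute per-image entropy and generate negatives like this:

```python
import numpy as np

def shannon_entropy(image: np.ndarray) -> float:
    """Shannon Entropy (SE, in bits) of an 8-bit grayscale image,
    estimated from its normalized intensity histogram."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist / hist.sum()              # empirical intensity distribution
    p = p[p > 0]                       # drop zero bins (0 * log 0 := 0)
    return float(-np.sum(p * np.log2(p)))

def negative_image(image: np.ndarray) -> np.ndarray:
    """Negative of an 8-bit image: intensities reflected about the
    midpoint. The histogram is mirrored, so SE is preserved while the
    pixel-value distribution of the augmented dataset changes."""
    return 255 - image

# Toy check: entropy is invariant under negation.
img = np.random.randint(0, 256, size=(32, 32), dtype=np.uint8)
print(shannon_entropy(img), shannon_entropy(negative_image(img)))
```

Because negation only mirrors the histogram, each negative carries the same SE as its source image while still shifting the dataset's distribution, which is consistent with the abstract's framing of the augmentation.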
520 8 |a Author supplied keywords: EBCLE; MELTER; MCCE; Entropy; Optimization. 
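The abstract also names a CCE-derived loss (MCCE) and an entropy-driven learning-rate scheduler (MELTER), whose exact formulations appear only in the full thesis. The sketch below therefore shows just the standard CCE baseline that MCCE is derived from, plus a hypothetical entropy-scaled learning rate to illustrate the general idea of a priori entropy informing training; `entropy_scaled_lr` is our assumption, not the published MELTER schedule:

```python
import numpy as np

def categorical_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Standard CCE loss over a batch: y_true holds one-hot labels of
    shape (n, k), y_pred holds softmax probabilities of the same shape."""
    eps = 1e-12                                   # numerical floor for log
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))

def entropy_scaled_lr(base_lr: float, measured_entropy: float,
                      max_entropy: float) -> float:
    """Hypothetical schedule: shrink the base learning rate by the ratio
    of the data's measured SE to its ME ceiling (8 bits for 8-bit images).
    Purely illustrative; not the thesis' MELTER formulation."""
    return base_lr * (measured_entropy / max_entropy)

# Uniform predictions over 10 classes give CCE = ln(10) ≈ 2.3026.
y_true = np.eye(10)[[3, 7]]                       # two one-hot labels
y_pred = np.full((2, 10), 0.1)
print(categorical_cross_entropy(y_true, y_pred))
print(entropy_scaled_lr(0.1, measured_entropy=6.5, max_entropy=8.0))
```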
650 0 |a Neural networks (Computer science).  |9 327371 
650 0 |a Machine learning.  |9 320264 
700 1 |a Sinha, Roopak,  |e degree supervisor. 
700 1 |a Yan, Wei Qi,  |e degree supervisor. 
700 1 |a MacDonell, Stephen G.  |q (Stephen Gerard),  |d 1967-,  |e degree supervisor.  |9 1232266 
710 2 |a Auckland University of Technology.  |9 331914 
710 2 |a Auckland University of Technology,  |e degree granting institution.  |9 331914 
856 4 0 |u http://hdl.handle.net/10292/14258  |z Click here to access this resource online 
907 |a .b30720539  |b 25-06-21  |c 18-06-21 
942 |c ET  |2 ddc  |n 0 
998 |b 18-06-21  |c m  |d s   |e - 
999 |c 1633269  |d 1633269 