Entropy-based optimization strategies for convolutional neural networks : [a thesis submitted to Auckland University of Technology in fulfilment of the requirements for the degree of Doctor of Philosophy (PhD), 2021] / Nidhi Nijagunappa Gowdra ; supervisors: Roopak Sinha, Wei Qi Yan, Stephen MacDonell.

Bibliographic Details
Main Author: Gowdra, Nidhi Nijagunappa (Author)
Corporate Author: Auckland University of Technology
Format: Ethesis
Language: English
Subjects: Neural networks (Computer science); Machine learning
Online Access: Click here to access this resource online

MARC

LEADER 00000ntm a2200000 i 4500
005 20221116113852.0
006 m|||| o||d| ||||||
007 cr |n ||||||a|
008 210618s2021 nz a omb 000 0 eng d
040 |a Z5A  |b eng  |e rda  |c Z5A 
082 0 4 |a 006.3  |2 23 
100 1 |a Gowdra, Nidhi Nijagunappa,  |e author 
245 1 0 |a Entropy-based optimization strategies for convolutional neural networks :  |b [a thesis submitted to Auckland University of Technology in fulfilment of the requirements for the degree of Doctor of Philosophy (PhD), 2021] /  |c Nidhi Nijagunappa Gowdra ; supervisors: Roopak Sinha, Wei Qi Yan, Stephen MacDonell. 
264 0 |c [2021] 
300 |a 1 online resource 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
347 |a PDF  |c 2.255 Mb  |3 Thesis 
502 |a Thesis  |b PhD  |c Auckland University of Technology  |d 2021 
504 |a Includes bibliographical references. 
516 |a Text (PDF file (172 pages, 2.255 Mb)) 
520 3 |a Deep convolutional neural networks are state-of-the-art for image classification, and significant strides have been made to improve neural network model performance, which can now even exceed human-level abilities. However, these gains have been achieved through increased model depths and rigorous, specialized manual fine-tuning of model HyperParameters (HPs). These strategies cause considerable over-parameterization and elevated complexity in Convolutional Neural Network (CNN) model training. Training over-parameterized CNN models tends to induce afflictions such as overfitting, increased sensitivity to noise and decreased generalization ability, which contribute to deterioration of model performance. Furthermore, training over-parameterized CNN models requires specialized regimes and vast computing power, subsequently increasing the complexity and difficulty of training. In this thesis, we develop several novel entropy-based techniques to abate the effects of over-parameterization, reduce the number of manually tuned HPs, increase generalization ability and enhance the performance of CNN models. Specifically, we examine information propagation and feature extraction/generation & representation in CNNs. Armed with this knowledge, we develop a heuristic and several optimization strategies to simplify model training and improve model performance by addressing the problem of over-parameterization in CNNs. We cultivate the techniques in this thesis using quantitative metrics such as Shannon's Entropy (SE), Maximum Entropy (ME) and Signal-to-Noise Ratio (SNR). Our methodology involves a multi-faceted approach of incorporating iterative and continuous integration of quantitatively defined feedback loops, which allows us to test numerous research hypotheses efficiently using the design science research framework. We start by exploring and understanding the hierarchical feature extraction & representational capabilities of CNNs. Through our experimentation we were able to explore the sparsity of feature representations and analyze the underlying learning mechanisms in CNNs for non-convex optimization problems such as image classification. Equipped with this knowledge, we experimentally demonstrate and validate the notion that, for low- and high-quality input data (determined through ME and SNR measures), using deeper and shallower networks could lead to the phenomena of information underflow and overflow, respectively, degrading classification performance. To mitigate the negative effects of information underflow and overflow in the context of kernel saturation, we propose and evaluate a novel hypothesis of augmenting the data distribution of the input dataset with negative images. Our experiments yielded classification accuracy increases of 3%-7% on various datasets. One limitation argued against the validity of our novel augmentation was model training time; in particular, models require large amounts of computing power and time to train. To address these criticisms, we theorize an SE-based heuristic that resolves the problem of over-parameterization by forcing feature abstractions in the convolutional layers up to their theoretical limit as defined by their SE measure. The SE-based model trained 45.22% faster, without compromising classification accuracy, when compared to deeper models. Further arguments were posed relating to model training afflictions such as overfitting and poor generalizability. 
To mitigate the concerns raised around model training afflictions such as overfitting and poor generalizability in deep CNN models, we introduce a Maximum Entropy-based Learning raTE enhanceR (MELTER), to dynamically schedule and adapt model learning during training, and a Maximum Categorical Cross-Entropy (MCCE) loss function, derived from the commonly used Categorical Cross-Entropy (CCE) loss function, to reduce model overfitting. MELTER and MCCE utilize a priori knowledge of the input data to curtail several risks encountered during model training that affect performance, such as sensitivity to random noise, overfitting to the training data and lack of generalizability to new, unseen data. To this end, MELTER outperforms manually tuned models by 2%-6% on various benchmarking datasets by exploring a larger solution space. MCCE-trained models showed a reduction in overfitting of up to 5.4% and outperformed CCE-trained models in terms of classification accuracy by up to 6.17% on two facial (ethnicity) recognition datasets, colorFERET and UTKFace, along with standard benchmarking datasets such as CIFAR-10 and MNIST. Through this series of experiments, we conclude that entropy-based optimization strategies for tuning the HPs of deep learning models are viable, and either match or exceed baseline classification accuracies achieved by networks trained using traditional methods. Furthermore, the entropy-based optimization methods outlined in this thesis also mitigate several well-known training afflictions, such as overfitting, lack of generalizability and slow convergence, while eliminating manual fine-tuning of HPs. 
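The abstract above leans on Shannon's Entropy (SE) as a measure of information in the input data and reports accuracy gains from augmenting datasets with negative images. As a minimal, illustrative Python/NumPy sketch of those two concepts only (not the thesis' own code; the function names here are our assumptions), one could compute per-image entropy and generate negatives like this:

```python
import numpy as np

def shannon_entropy(image: np.ndarray) -> float:
    """Shannon Entropy (SE, in bits) of an 8-bit grayscale image,
    estimated from its normalized intensity histogram."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist / hist.sum()              # empirical intensity distribution
    p = p[p > 0]                       # drop zero bins (0 * log 0 := 0)
    return float(-np.sum(p * np.log2(p)))

def negative_image(image: np.ndarray) -> np.ndarray:
    """Negative of an 8-bit image: intensities reflected about the
    midpoint. The histogram is mirrored, so SE is preserved while the
    pixel-value distribution of the augmented dataset changes."""
    return 255 - image

# Toy check: entropy is invariant under negation.
img = np.random.randint(0, 256, size=(32, 32), dtype=np.uint8)
print(shannon_entropy(img), shannon_entropy(negative_image(img)))
```

Because negation only mirrors the histogram, each negative carries the same SE as its source image while still shifting the dataset's distribution, which is consistent with the abstract's framing of the augmentation.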
520 8 |a Author supplied keywords: EBCLE; MELTER; MCCE; Entropy; Optimization. 
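The abstract also names a CCE-derived loss (MCCE) and an entropy-driven learning-rate scheduler (MELTER), whose exact formulations appear only in the full thesis. The sketch below therefore shows just the standard CCE baseline that MCCE is derived from, plus a hypothetical entropy-scaled learning rate to illustrate the general idea of a priori entropy informing training; `entropy_scaled_lr` is our assumption, not the published MELTER schedule:

```python
import numpy as np

def categorical_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Standard CCE loss over a batch: y_true holds one-hot labels of
    shape (n, k), y_pred holds softmax probabilities of the same shape."""
    eps = 1e-12                                   # numerical floor for log
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))

def entropy_scaled_lr(base_lr: float, measured_entropy: float,
                      max_entropy: float) -> float:
    """Hypothetical schedule: shrink the base learning rate by the ratio
    of the data's measured SE to its ME ceiling (8 bits for 8-bit images).
    Purely illustrative; not the thesis' MELTER formulation."""
    return base_lr * (measured_entropy / max_entropy)

# Uniform predictions over 10 classes give CCE = ln(10) ≈ 2.3026.
y_true = np.eye(10)[[3, 7]]                       # two one-hot labels
y_pred = np.full((2, 10), 0.1)
print(categorical_cross_entropy(y_true, y_pred))
print(entropy_scaled_lr(0.1, measured_entropy=6.5, max_entropy=8.0))
```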
650 0 |a Neural networks (Computer science).  |9 327371 
650 0 |a Machine learning.  |9 320264 
700 1 |a Sinha, Roopak,  |e degree supervisor. 
700 1 |a Yan, Wei Qi,  |e degree supervisor. 
700 1 |a MacDonell, Stephen G.  |q (Stephen Gerard),  |d 1967-,  |e degree supervisor.  |9 1232266 
710 2 |a Auckland University of Technology.  |9 331914 
710 2 |a Auckland University of Technology,  |e degree granting institution.  |9 331914 
856 4 0 |u http://hdl.handle.net/10292/14258  |z Click here to access this resource online 
907 |a .b30720539  |b 25-06-21  |c 18-06-21 
942 |c ET  |2 ddc  |n 0 
998 |b 18-06-21  |c m  |d s   |e - 
999 |c 1633269  |d 1633269 