The following is a guest post from Mohammad Muquit, here to discuss implementing multiple-order modeling to improve the accuracy of deep learning models.
In typical classification problems, deep neural network (DNN) accuracy is measured as the percentage of correct class predictions against ground truths. However, a DNN's final layer contains more than just a class name: it also produces a probability density array containing the probability of each class. As a result, typical approaches may ignore a significant amount of output data.
In this blog, we use the idea that this probability density array itself can serve as a set of predictors for additional modeling, and we look into cases where this otherwise lost information can be used to improve a DNN's accuracy.
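As a minimal sketch of what is discarded, assume a hypothetical two-class score matrix `scores` (rows are images, columns are Normal and Defected). The values are made up for illustration and do not come from the case study. Keeping only the class label hides how confident each prediction was:

```matlab
% Hypothetical softmax outputs for three images (columns: Normal, Defected).
% These values are invented for illustration only.
scores = [0.52 0.48;
          0.97 0.03;
          0.30 0.70];
classNames = ["Normal" "Defected"];

% Conventional approach: keep only the most probable class per image.
[topScore, idx] = max(scores, [], 2);
labels = reshape(classNames(idx), [], 1);

% The first two images get the same label, although their scores differ widely.
disp(table(labels, topScore))
```

The first two rows both read Normal, yet one is a near coin flip and the other is almost certain. That difference is the information the later approaches try to exploit.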
Fig. 1 shows the basic idea behind the proposed approach.
We introduce a quality assurance (QA) application as a case example here.
For each sample product to be inspected, multiple images of that sample are used as inputs. The images are captured by rotating the sample product in front of a QA imaging system, yielding 60 images per sample. In total, 135 unique samples are used in this case study: 55 of them had some defects and the remaining 80 were normal. We split this data into training and test data as follows:
[Table: total unique samples, split into training and test data]
We first train a DNN on the training images using transfer learning with GoogleNet. The overall accuracy on the 2100 test images was 67.43%, where the accuracies for Normal and Defected images were 67.05% and 87.18%, respectively. Though it can identify individual defected images with higher accuracy, it fails to identify normal images in almost one-third of the cases.
Note that these results are only at the individual image-to-class level, yet we have 60 images per sample. Even when these 60 predictions have very low accuracy individually, collating them as input for an additional model (shown below in Fig. 2) might result in very high accuracy, which we'll explore in the next section.
Fig. 2: Images of the same sample are collated to form a very long predictor array as input for a second model. This figure shows the idea of second-order modeling, but note that it can be extended further to multiple-order modeling.
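The collation step can be sketched as a reshape. This assumes a hypothetical vector `pNormal` holding the per-image Normal probability, ordered sample by sample (all images of sample 1, then sample 2, and so on). The sizes and values are toy numbers chosen for illustration.

```matlab
% Toy sizes for illustration; the case study uses 60 images per sample.
numImg = 4;          % images per sample
numSamples = 3;      % number of samples

% Hypothetical per-image Normal probabilities, ordered sample by sample.
pNormal = [0.9 0.8 0.7 0.6, 0.2 0.4 0.1 0.3, 0.55 0.45 0.65 0.35]';

% reshape fills column by column, so build numImg-by-numSamples and transpose
% to get one row of predictors per sample.
predictors = reshape(pNormal, numImg, numSamples)';

% Wrap in a table so each image position becomes a named predictor column.
predictorTable = array2table(predictors);
disp(predictorTable)
```

Each row of `predictorTable` then describes one sample, which is the shape a second model would need as input.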
Approaches for multiple-order modeling
I will present four approaches to multiple-order modeling, from simplest to most complex, all of which improve on the original individual image-to-class accuracy.
The code to recreate these experiments and plots is available on File Exchange.
We set a rule that if at least N out of the 60 images are predicted as Defected, then the sample is called Defected. Fig. 3 shows how the accuracy for Normal and Defected samples changes with the value of N. Fig. 4 shows the ROC curve.
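numImg = width(TrainDataTable)-1; % number of image predictors per sample
% dfctIndx is assumed to be defined earlier: logical vector marking Defected test samples
nrIndx = ~dfctIndx;  % Normal test samples
Nnr = sum(nrIndx);   % number of Normal samples
Ndf = sum(dfctIndx); % number of Defected samples
% Per sample, count the images predicted as Defected (value <= 0.5)
SA = sum(table2array(TestDataTable) <= 0.5,2);
nAcr = zeros(numImg,1); % Normal accuracy for each N
dAcr = zeros(numImg,1); % Defected accuracy for each N
for k = 1:numImg
    rslt = SA >= k; % sample is called Defected if at least k images are predicted Defected
    dAcr(k) = sum(rslt == 1 & dfctIndx == 1)/Ndf;
    nAcr(k) = sum(rslt == 0 & nrIndx == 1)/Nnr;
end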
From the point where the two accuracy curves coincide in Fig. 3, we can see that if we set N to 17 or 18, Normal and Defected samples are detected with accuracies as low as 4.35% and 8.33%, respectively. This suggests that such an approach is not effective for collating the outputs of a model with very low prediction accuracy on individual images.
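One way to locate the point where the two curves meet is to pick the N that minimizes the gap between them. The sketch below applies the counting rule to a small made-up dataset; the probabilities and labels are invented, and `bestN` is a name introduced here rather than one from the original code.

```matlab
% Made-up Normal probabilities: 4 samples (rows) x 5 images (columns).
probs = [0.9 0.8 0.4 0.7 0.6;    % Normal
         0.6 0.3 0.9 0.8 0.7;    % Normal
         0.2 0.1 0.6 0.3 0.4;    % Defected
         0.4 0.3 0.2 0.7 0.8];   % Defected
dfctIndx = logical([0; 0; 1; 1]);
nrIndx = ~dfctIndx;
numImg = size(probs, 2);

% Count images predicted as Defected (Normal probability <= 0.5) per sample.
SA = sum(probs <= 0.5, 2);

nAcr = zeros(numImg, 1);
dAcr = zeros(numImg, 1);
for k = 1:numImg
    rslt = SA >= k;                               % called Defected at threshold k
    dAcr(k) = sum(rslt & dfctIndx) / sum(dfctIndx);
    nAcr(k) = sum(~rslt & nrIndx) / sum(nrIndx);
end

% Choose the N where the two accuracy curves are closest to each other.
[~, bestN] = min(abs(nAcr - dAcr));
fprintf('N = %d: Normal %.1f%%, Defected %.1f%%\n', ...
    bestN, 100*nAcr(bestN), 100*dAcr(bestN));
```

Minimizing the gap is only one possible selection rule. When several values of N tie, `min` returns the smallest one.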
Normal-Defected pattern as predictor for machine learning
In this second approach, we train a second model using only the category predictions of all 60 images. For each individual sample, we create a 1 x 60 array of binary values, assigning each image a value of either 1 or 0 (i.e., Normal: 1, Defected: 0). We create arrays for the training samples, train a model, and then create arrays for the test samples to evaluate the model. We find that the accuracy improves to more than 90%. Instead of looking only at the number of 0's or 1's, looking at how the 0's and 1's are arranged in the array is much more effective in differentiating the samples.
DiscTestDataTable = double(table2array(TestDataTable(:,1:end)) > 0.5);
DiscTestDataTable = array2table(DiscTestDataTable);%Array to table conversion
For both test and training data, each value is set to 1 if normal and 0 if defected. A value counts as normal when it is greater than 0.5 in the original probability density table.
So using the first sample in the test data, the conversion would look like this:
Test Sample 1, images 1-20:
Test Sample 1 with threshold of 0.5:
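The conversion can be illustrated on a few made-up probabilities (these are not the actual values of Test Sample 1):

```matlab
% Made-up Normal probabilities for the first 8 images of one sample.
sampleProbs = [0.91 0.47 0.63 0.12 0.50 0.88 0.05 0.74];

% Same rule as above: 1 (Normal) if the value is greater than 0.5, else 0 (Defected).
sampleBinary = double(sampleProbs > 0.5);

disp(sampleProbs)
disp(sampleBinary)
```

Note that a value of exactly 0.5 maps to 0 under this rule, because the comparison is strict.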
We train a machine learning classifier to identify the pattern of 0's and 1's to differentiate normal and defected samples.
bTM = trainClassifier(DiscTrainDataTable,numImg);
bP = bTM.predictFcn(DiscTestDataTable);
bAcc = 100*sum(bP==TestProdLabels)/numel(bP);
disp(['Thresholded Data Accuracy:', num2str(bAcc),'%'])
Thresholded Data Accuracy: 91.4286%
Probability distribution value as predictor for machine learning
In the previous approach, there is no guarantee that 0.5 is the best value to divide the predictors into two different classes. Therefore, using the probability density value itself (a continuous value) as predictor might be the next step for improvement. We train and evaluate models as shown in the code below. Because information loss is reduced, we see a good rise in accuracy (97.14%) in this approach compared to the previous one (91.43%).
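To see why a fixed cut-off discards information, the sketch below thresholds the same made-up probabilities at three different values. The numbers and the variable `thresholds` are illustrative assumptions. The binary pattern changes with the cut-off, while the continuous values stay the same:

```matlab
% Made-up Normal probabilities for six images of one sample.
p = [0.58 0.48 0.52 0.95 0.41 0.61];
thresholds = [0.4 0.5 0.6];

% One row of binary predictors per candidate threshold.
patterns = zeros(numel(thresholds), numel(p));
for t = 1:numel(thresholds)
    patterns(t, :) = p > thresholds(t);
end
disp(patterns)

% Values close to each other on either side of 0.5 (0.52 vs 0.48)
% are mapped to opposite classes by the 0.5 threshold.
```

Feeding `p` directly to the second model avoids having to choose among these patterns.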
TM = trainClassifier(TrainDataTable,numImg);% Modeling using continuous pattern as predictors
P = TM.predictFcn(TestDataTable);% Prediction regarding the Test data (Sample-to-Class level prediction)
acc = 100*sum(P==TestProdLabels)/numel(P);
disp(['Normal Data Accuracy:',num2str(acc),'%'])
Training an LSTM neural network
One more point remains: the coherence among the 60 probability density values obtained from the 60 images, which were captured in a time-series manner. The idea is that, for a given Defected sample, the defects should be visible in some of the 60 images. As a result, for such Defected samples, the probability density values indicating a Defected condition should appear clustered together. For a given Normal sample, by contrast, even if some probability density values mistakenly indicate a Defected condition, they should appear at random positions, caused by noise or other factors.
In this section, we use the probability density data to train an LSTM neural network and evaluate its accuracy. This time, the accuracy reaches 100% with a neural network trained for 100 epochs.
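A rough way to picture this clustering idea is to measure the longest run of consecutive images whose value indicates Defected. This is only an illustration of the intuition, not the method used in this post; the helper `longestRun` and the sample values are made up.

```matlab
% Made-up Normal probabilities for 12 consecutive images of two samples.
clustered = [0.9 0.8 0.2 0.1 0.3 0.2 0.9 0.8 0.9 0.7 0.8 0.9]; % low values in a bunch
scattered = [0.9 0.2 0.8 0.9 0.3 0.8 0.9 0.1 0.8 0.9 0.4 0.9]; % isolated low values

% Longest run of consecutive true values in a logical row vector:
% pad with 0, then take (run end index) - (run start index) + 1.
longestRun = @(v) max([0, find(diff([v, 0]) == -1) - find(diff([0, v]) == 1) + 1]);

runClustered = longestRun(clustered <= 0.5);
runScattered = longestRun(scattered <= 0.5);
fprintf('Longest Defected run: clustered = %d, scattered = %d\n', ...
    runClustered, runScattered);
```

Both toy samples contain the same number of Defected-looking images, so a simple count cannot separate them. Only the arrangement differs, which is the kind of ordering information a sequence model can pick up.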
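A minimal sketch of how such a network could be set up with Deep Learning Toolbox follows. The layer sizes, the solver, and the variable names (`XTrain`, `YTrain`, `numHiddenUnits`) are assumptions for illustration and are not taken from the original code; only the 100 epochs come from the text. Random data stands in for the real probability table.

```matlab
% Stand-in data: 10 samples x 60 probabilities, with made-up labels.
rng(0);
X = rand(10, 60);
YTrain = categorical([repmat("Normal", 5, 1); repmat("Defected", 5, 1)]);

% trainNetwork expects one sequence per cell: features x time steps (here 1 x 60).
XTrain = num2cell(X, 2);

numHiddenUnits = 50;              % assumed value, not from the original post
layers = [
    sequenceInputLayer(1)
    lstmLayer(numHiddenUnits, 'OutputMode', 'last')
    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'MaxEpochs', 100, ...
    'Verbose', false);

net = trainNetwork(XTrain, YTrain, layers, options);
PL = classify(net, XTrain);       % sample-to-class predictions
```

Setting `'OutputMode'` to `'last'` makes the LSTM emit one output per sequence, which matches the goal of one label per sample rather than one per image.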
%% Evaluation of the LSTM network
PL = classify(net,tstLstm);% Prediction on test data
lAcc = 100*sum(PL==TestProdLabels)/numel(PL);%Calculation of accuracy
In this blog, we showed that the general approach of reducing the output of a neural network to a single decision may not be the best practice for getting optimal results. The approaches introduced here show that multiple-order modeling, using outputs from a prior deep neural network (DNN) with apparently very low accuracy, can eventually contribute to high accuracy in practical applications.
We also introduced different approaches to multiple-order modeling to show that using probability distribution values instead of predicted classes may help obtain better accuracy. In addition, for cases where the input images are acquired in a time-series manner, LSTM-based approaches might further help improve accuracy.
The full code is available here: https://www.mathworks.com/matlabcentral/fileexchange/79092-multiple-order-modeling-for-deep-learning
Have any comments or questions for Mohammad? Leave a comment below.