Correcting training samples to account for errors in measuring object characteristics when constructing classifiers by supervised learning
Abstract
Article received: 17.06.2021
The paper considers noise in training samples, which consists mainly of outliers and novelty. The main causes of outliers in training samples are analyzed, and the principal existing approaches to detecting them are reviewed. Building on the nearest neighbors method, a modified method for comparing generalized distances from objects to classes is proposed. Justified values of the safety factors used in this method are derived for the main types of metrics used in feature spaces. To assess the quality of a training sample programmatically and to make a reasoned choice of outlier correction method, the use of permissible fractions of corrected and removed outliers is proposed. An algorithm for detecting outliers in a set of training examples is presented, together with an estimate of its complexity in terms of the input length. An algorithm for evaluating and correcting training samples has been developed.
Keywords: classification problem, classifier, decision function, training sample, precedent, erroneous data, analysis, correction, artificial intelligence, compactness hypothesis, novelty, learning
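The abstract describes the distance comparison only in general terms, so the following is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that the "generalized distance" from an object to a class is the mean Euclidean distance to its k nearest members of that class, and that an object is flagged as an outlier when the distance to its own class exceeds a safety factor times the distance to the closest other class. The names `class_distance`, `find_outliers`, `outlier_fraction`, `k`, and `safety_factor` are hypothetical, and the default values are illustrative rather than the justified values derived in the paper.

```python
import math


def class_distance(x, points, k):
    """Assumed 'generalized distance': mean Euclidean distance from x
    to its k nearest points within a single class."""
    nearest = sorted(math.dist(x, p) for p in points)[:k]
    return sum(nearest) / len(nearest)


def find_outliers(samples, labels, k=2, safety_factor=1.0):
    """Return indices of samples that lie farther from their own class
    than safety_factor times the distance to the closest other class."""
    outliers = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Group all other samples by class label (leave-one-out).
        by_class = {}
        for j, (p, c) in enumerate(zip(samples, labels)):
            if j != i:
                by_class.setdefault(c, []).append(p)
        # Skip samples with no classmates or no competing class.
        if y not in by_class or len(by_class) < 2:
            continue
        own = class_distance(x, by_class[y], k)
        other = min(class_distance(x, pts, k)
                    for c, pts in by_class.items() if c != y)
        if own > safety_factor * other:
            outliers.append(i)
    return outliers


def outlier_fraction(outliers, samples):
    """Fraction of flagged samples, to compare with a permissible fraction."""
    return len(outliers) / len(samples)


# Toy data: the point (5, 5) sits inside class "B" but is labeled "A".
samples = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["A", "A", "A", "A", "B", "B", "B"]

flagged = find_outliers(samples, labels, k=2, safety_factor=1.0)
print(flagged)  # [3]
```

The resulting `outlier_fraction(flagged, samples)` could then be compared with permissible fractions of corrected and removed outliers to decide whether to relabel or drop the flagged objects; the abstract does not state those thresholds, so none are assumed here.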