Using instance hardness measures in curriculum learning
Project status: Under Development
Overview / Usage
Curriculum learning consists of training strategies for machine learning techniques in which the easiest observations are presented first, progressing into more difficult cases as training proceeds. To assemble the curriculum, the observations of a dataset must be ordered according to their difficulty. This work investigates how instance hardness measures, which assess the difficulty level of each observation in a dataset from different perspectives, can be used to assemble a curriculum. Experiments with four CIFAR-100 sub-problems demonstrated the feasibility of using instance hardness measures: the main advantage is in convergence speed, and accuracy gains can also be verified for some datasets.
Methodology / Approach
Using hardness measures (HM) in curriculum learning (CL) is straightforward with the methodology proposed by [Hacohen and Weinshall 2019]. We embed the HM as the scoring function f in Algorithm 1, allowing the use of those metrics with minimal modifications to the CL framework, which we describe next. All HMs described in Section 2 are already standardized so that higher values are output for harder instances. The PyHard package was used to compute the HM. In our experimental evaluation, we use as a basis the experimental framework delineated in the recent work of [Hacohen and Weinshall 2019], which uses a convolutional neural network (CNN) of moderate size to classify groups of images from the CIFAR-100 dataset. The CIFAR-100 dataset contains 60,000 32x32x3 colored images divided into 100 classes [Krizhevsky et al. 2009]. The classes are also grouped into super-classes, each comprising five similar classes. Each super-class contains 3,000 images, further divided into 2,500 training images and 500 test images. The datasets used in this work are: (i) “people”, (ii) “small mammals”, (iii) “trees” and (iv) “vehicles 2”, which show different difficulty levels.
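As a concrete illustration, the sketch below shows how precomputed hardness scores can play the role of the scoring function f, together with a fixed exponential pacing function in the style of [Hacohen and Weinshall 2019]. The function names and the pacing hyperparameters (start_frac, inc, step_length) are illustrative assumptions, not the exact values used in our experiments.

```python
import numpy as np

def order_by_hardness(X, y, hardness):
    """Sort a dataset from easiest to hardest using precomputed
    instance hardness scores (higher score = harder instance)."""
    order = np.argsort(hardness)  # ascending: easiest observations first
    return X[order], y[order]

def exponential_pacing(step, start_frac=0.04, inc=1.9,
                       step_length=100, n_total=2500):
    """Fixed exponential pacing: the fraction of the hardness-sorted
    training set exposed to the model grows by a factor of `inc`
    every `step_length` training steps (hyperparameters are
    illustrative assumptions)."""
    frac = start_frac * (inc ** (step // step_length))
    return min(n_total, int(frac * n_total))

# At training step t, mini-batches are drawn only from the easiest
# exponential_pacing(t) examples of the sorted dataset.
```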
The CNN used in the experiments has eight convolutional layers with 32, 32, 64, 64, 128, 128, 256 and 256 filters, respectively. The first six layers have filters of size 3 x 3, while the last two layers have filters of size 2 x 2. Additionally, after every second convolutional layer there is a 2 x 2 max-pooling layer and a 0.25 dropout layer. After the convolutional layers, the units are flattened and there is a fully-connected layer with 512 units followed by a 0.5 dropout layer. The output layer is a fully-connected layer with a number of output units equal to the number of classes, followed by a softmax layer. Finally, the network is trained using an SGD optimizer, with cross-entropy loss and batch size of 100.
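A minimal Keras sketch of this architecture follows. Details the text does not specify (same padding, ReLU activations, learning rate) are assumptions made so the example is runnable.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(n_classes=5, input_shape=(32, 32, 3)):
    """Moderate-size CNN matching the description above: eight
    convolutional layers, 2x2 max-pooling and 0.25 dropout after
    every second convolution, then a 512-unit dense layer."""
    filters = [32, 32, 64, 64, 128, 128, 256, 256]
    model = keras.Sequential()
    model.add(layers.Input(shape=input_shape))
    for i, f in enumerate(filters):
        k = 3 if i < 6 else 2  # 3x3 filters, then 2x2 in the last two layers
        model.add(layers.Conv2D(f, k, padding="same", activation="relu"))
        if i % 2 == 1:  # after every second convolutional layer
            model.add(layers.MaxPooling2D(pool_size=2))
            model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer=keras.optimizers.SGD(),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training would use batch_size=100, e.g.:
# model = build_cnn(); model.fit(X_train, y_train, batch_size=100)
```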
Hacohen and Weinshall [Hacohen and Weinshall 2019] used a transfer scoring function in their experiments. It takes the Inception network [Szegedy et al. 2016], pre-trained on the ImageNet dataset [Deng et al. 2009], runs each training-set observation through it and uses the activation levels of the penultimate layer of the network as a feature vector. This yields a new dataset consisting of the extracted feature vectors of the images (with 2048 features) and their original labels. Finally, an SVM classifier is trained over this new dataset and its classification probability scores are used as the scoring function. Given these scores, the network is trained using a fixed exponential pacing function. We also use the features from the Inception network to build the datasets fed into the HM, but without needing the SVM classifier.
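The sketch below illustrates this transfer scoring function, assuming TensorFlow/Keras's InceptionV3 and scikit-learn's SVC. Upsampling the 32x32 images to Inception's input size and taking the true-class probability as the easiness score are assumptions for illustration, not necessarily the original implementation.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
from sklearn.svm import SVC

def transfer_scores(images, labels):
    """Transfer scoring function (sketch): extract 2048-d
    penultimate-layer features with an ImageNet pre-trained Inception
    network, fit an SVM, and use the classifier's probability for the
    true class as the easiness score of each instance."""
    inception = InceptionV3(weights="imagenet", include_top=False,
                            pooling="avg")  # global pooling -> 2048-d vector
    # Inception expects larger inputs, so upsample the 32x32 CIFAR images
    # (fine in memory for 2,500 images; batch this for larger sets).
    big = tf.image.resize(images, (299, 299)).numpy()
    feats = inception.predict(preprocess_input(big))  # shape (n, 2048)

    svm = SVC(probability=True).fit(feats, labels)
    proba = svm.predict_proba(feats)  # columns ordered by svm.classes_
    # Probability assigned to the correct label: high = easy instance.
    cols = np.searchsorted(svm.classes_, labels)
    return proba[np.arange(len(labels)), cols]
```

In our HM variant, the 2048-d feature vectors (feats above) would instead be passed to PyHard to compute the hardness measures, skipping the SVM entirely.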