Running and Training PSPNet

# Preface

Back at school after my internship, I am resuming my research on semantic image segmentation in preparation for my thesis. I started by surveying recent segmentation papers, and found a very useful chart in the ICNet paper showing the speed and accuracy of various methods on the Cityscapes test set:


[Figure: inference time vs. accuracy of segmentation methods on the Cityscapes test set, from the ICNet paper]

The chart shows that the highest-accuracy methods are ResNet-38, PSPNet, and DUC, while the only method that is fast and still reasonably accurate is ICNet. Beyond the methods in the chart, the 2018 model DeepLab v3+ achieves the best accuracy on this test set at 82.1 (runtime not reported). I therefore use PSPNet as the baseline for my experiments, and the first step of this long journey is getting PSPNet to run.

# Setting Up the Environment
## Preliminaries

The first step of any deep-learning project is, of course, setting up the environment; a good environment is half the battle. I will not go into detail here: thanks to my four months away on internship, the server I had configured earlier was left untouched, so I still have a fully working Caffe training setup. If you need to configure one, you can refer to my earlier post on configuring Caffe and debugging DeepLab on Ubuntu.

My own environment is Ubuntu 16.04 + NVIDIA driver NVIDIA-Linux-x86_64-384.98
+ CUDA 8.0 + Anaconda2 + cuDNN 5.0.5 + OpenCV 2.4.13.4 + Caffe.

On top of this environment, you first need to download:

1. The PSPNet paper
2. The PSPNet Caffe source code
3. MATLAB 2015b (link: https://pan.baidu.com/s/10q16mB_62EZL_aVCdo545w password: 1ggm). A small note here: I originally downloaded 2016b, but then saw which MATLAB releases the Caffe website lists as supported: [Figure: MATLAB versions supported by Caffe, from the official site]

So I sheepishly switched back to 2015, partly because the 2016b setup during my internship never worked out. (Embarrassingly, only while writing this post did I notice I had misread it yet again: Caffe supports 2015a, and what I downloaded was 2015b. When will I shake off this carelessness...) It still compiled successfully in the end, but if you want to be safe, download 2015a. MATLAB is needed because the authors' evaluation code is written in MATLAB; to stay consistent with them, it is worth building matcaffe.

## Building

1. First install MATLAB 2015b. I mainly followed:
https://blog.csdn.net/hejunqing14/article/details/50265049

2. Then comes the rather painful source build, mainly following:
https://blog.csdn.net/WZZ18191171661/article/details/70149070. Start by editing Makefile.config; I copied the Makefile.config from my previously working Caffe build straight into the PSPNet source tree, and the first problem appeared immediately:

```
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
                 from ./include/caffe/common.hpp:19,
                 from src/caffe/common.cpp:7:
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::createPoolingDesc(cudnnPoolingStruct**, caffe::PoolingParameter_PoolMethod, cudnnPoolingMode_t*, int, int, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:127:41: error: too few arguments to function ‘cudnnStatus_t cudnnSetPooling2dDescriptor(cudnnPoolingDescriptor_t, cudnnPoolingMode_t, cudnnNanPropagation_t, int, int, int, int, int, int)’
       pad_h, pad_w, stride_h, stride_w));
                                        ^
./include/caffe/util/cudnn.hpp:15:28: note: in definition of macro ‘CUDNN_CHECK’
     cudnnStatus_t status = condition; \
                            ^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
                 from ./include/caffe/util/device_alternate.hpp:40,
                 from ./include/caffe/common.hpp:19,
                 from src/caffe/common.cpp:7:
/usr/local/cuda-7.5//include/cudnn.h:803:27: note: declared here
 cudnnStatus_t CUDNNWINAPI cudnnSetPooling2dDescriptor(
                           ^
make: *** [.build_release/src/caffe/common.o] Error 1
```


This error is caused by a cuDNN version mismatch between my previously built Caffe and the PSPNet fork. The method in
https://blog.csdn.net/u011070171/article/details/52292680 resolved it cleanly.

3. During make runtest I hit the following error:

[Figure: make runtest error output]

The usual explanation online is mismatched feature-map sizes within a network, but I was not training my own network at that point, so I was at a loss and simply moved on; it did not affect the later testing and training. If anyone knows the real cause, I would appreciate hearing it.

4. Next, build the MATLAB interface for Caffe. make matcaffe went through without errors, but make mattest failed:

[Figure: make mattest error output]

I mainly consulted http://www.cnblogs.com/laiqun/p/6031925.html and the official Caffe site. But when I checked the C++ shared-library dependencies with

```shell
ldd ./matlab/+caffe/private/caffe_.mexa64
```

I found two runtime libraries that were not being resolved. I tried all sorts of ways to link them in, yet the build still failed; in the end it worked after I simply restarted the terminal once, and I honestly do not know why.

That essentially completes the whole build process, and we can move on to the next stage.

# Running PSPNet
## Preparing the Datasets
The datasets used in the PSPNet paper are:

- ADE20K
- Cityscapes
- PASCAL VOC2012 plus the PASCAL VOC2012 augmented set

## Preprocessing the Datasets
The authors run a series of preprocessing steps on each dataset before use, described below:

1. ADE20K
After downloading and unpacking, the dataset splits into training and validation parts, each made up of many subfolders. The images and labels look like this:

[Figure: ADE20K sample image and annotation files]

The official description of these files:

[Figure: official description of the ADE20K annotation files]

The authors use the original images and the segmentation label images (ADE_val_00000001.jpg and ADE_val_00000001_seg.png above), and preprocess the label images into grayscale images whose pixel values are class indices. The conversion tool is the official ADE20K MATLAB code; the official site describes it here:

[Figure: official notes on the ADE20K conversion code]

The conversion is done by modifying demo.m. I never got it working myself (if anyone has, please @ me, many thanks); there is also a corresponding issue on the official GitHub repo: https://github.com/hszhao/PSPNet/issues/76
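Although I never wrote the MATLAB conversion, the pixel encoding itself is straightforward to decode. The sketch below is based on my understanding of the encoding in ADE20K's official loadAde20K.m (full object class index = (R/10)*256 + G in the *_seg.png files); note that mapping these full indices down to the 150 SceneParsing classes additionally requires the official lookup table (e.g. objectName150.mat), which is not reproduced here:

```python
import numpy as np

def decode_ade20k_seg(seg_rgb):
    """Decode an ADE20K *_seg.png RGB array into full class indices.

    Assumes the encoding used by ADE20K's official loadAde20K.m:
    class index = (R // 10) * 256 + G per pixel.
    """
    r = seg_rgb[..., 0].astype(np.uint16)
    g = seg_rgb[..., 1].astype(np.uint16)
    return (r // 10) * 256 + g
```

Reducing the result to the 150-class grayscale labels then becomes a per-pixel table lookup, the same shape of operation as the Cityscapes conversion below.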

2. Cityscapes
The authors use these four packages:

[Figure: the four Cityscapes download packages used]

For an introduction to this dataset, see: https://blog.csdn.net/Cxiazaiyu/article/details/81866173

This dataset also needs preprocessing. First, as described in the post above, run the cityscapesscripts/helpers/labels.py script to print the class definitions:

[Figure: Cityscapes class table printed by labels.py]

The classes used in the paper match this default trainId assignment, so we can convert the dataset directly with the cityscapesscripts/preparation/createTrainIdLabelImgs.py script. Just arrange the folders as follows:

[Figure: expected Cityscapes folder structure]

Then modify the path-lookup part of the script:

```python
def main():
    # Where to look for Cityscapes
    if 'CITYSCAPES_DATASET' in os.environ:
        cityscapesPath = os.environ['CITYSCAPES_DATASET']
    else:
        cityscapesPath = os.path.join(os.path.dirname(os.path.realpath(__file__)), '..', '..')
```

and the original label images will be converted into the class layout shown in the table above.
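For intuition, the conversion boils down to a per-pixel lookup from raw labelIds (0-33) to the 19 training classes plus 255 for "ignore". A minimal numpy sketch (the id-to-trainId pairs reflect the standard assignment in cityscapesscripts/helpers/labels.py; verify against your copy):

```python
import numpy as np

# Standard Cityscapes labelId -> trainId table; everything else maps to 255.
ID_TO_TRAINID = {7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7,
                 21: 8, 22: 9, 23: 10, 24: 11, 25: 12, 26: 13, 27: 14,
                 28: 15, 31: 16, 32: 17, 33: 18}

def labelid_to_trainid(label_img):
    """Map a labelId image to a trainId image (255 = ignore)."""
    out = np.full(label_img.shape, 255, dtype=np.uint8)
    for label_id, train_id in ID_TO_TRAINID.items():
        out[label_img == label_id] = train_id
    return out
```

createTrainIdLabelImgs.py does the real work (including instance polygons); this sketch only illustrates the id remapping step.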

3. PASCAL VOC2012
For processing this dataset, mainly follow: https://blog.csdn.net/Xmo_jiao/article/details/77897109

## Running and Testing

In the official PSPNet code, the .m files in the evaluation folder together with the files under evaluationCode make up the entire test pipeline. The key scripts to pay attention to are listed below; see also my notes on reading the PSPNet test code:

- eval_all.m: the main test entry point
- eval_sub.m: runs the caffemodel to produce prediction images
- eval_acc.m: compares the predictions against the label images and computes the test metrics

Adjust the paths in the scripts for your own setup, put the test images in the locations the code expects, and you can run the evaluation.
The main script to edit is eval_all.m:

```matlab
% Path settings (ADE20K as the example)
isVal = true; %evaluation on valset
step = 2000; %equals the number of images divided by the number of GPUs in testing, e.g. 500 = 2000/4
data_root = '/home/t7810/data/ADE20K'; %root path of dataset
eval_list = 'list/ADE20K_val.txt'; %evaluation list, refer to lists in folder 'samplelist'
save_root = '/home/t7810/data/ADE20K/mc_result/pspnet50_473/'; %root path to store the result image
model_weights = '/home/t7810/project/PSPNet-master/evaluation/model/pspnet50_ADE20K.caffemodel';
model_deploy = '/home/t7810/project/PSPNet-master/evaluation/prototxt/pspnet50_ADE20K_473.prototxt';
fea_cha = 150; %number of classes
base_size = 512; %base size for scaling
crop_size = 473; %crop size fed into network
data_class = 'objectName150.mat'; %class name
data_colormap = 'color150.mat'; %color map

......

% GPU settings (my machine only has one GPU)
gpu_id_array = [0:3]; %multi-GPUs for parfor testing; if the number of GPUs is changed, remember to change the variable 'step'
runID = 1;
%gpu_num = size(gpu_id_array,2);
gpu_num = 1;
index_array = [(runID-1)*gpu_num+1:runID*gpu_num];

for i = 1:gpu_num %change 'parfor' to 'for' if single-GPU testing is used
    eval_sub(data_name,data_root,eval_list,model_weights,model_deploy,fea_cha,base_size,crop_size,data_class,data_colormap, ...
        is_save_feat,save_gray_folder,save_color_folder,save_feat_folder,gpu_id_array(i),index_array(i),step,skipsize,scale_array,mean_r,mean_g,mean_b);
end
```

The test-image folder structure is:

[Figure: evaluation folder structure]

The test results were as follows:
1. PASCAL VOC2012:

[Figure: PASCAL VOC2012 test results]

2. Cityscapes:

[Figure: Cityscapes test results]

3. ADE20K
Since I never got the conversion code above working, the test scores here were very low (the predicted class labels cannot be matched against the ground-truth labels).

# Training PSPNet

I started by training on the PASCAL VOC dataset; the corresponding labels and data can all be downloaded via the links in https://blog.csdn.net/Xmo_jiao/article/details/77897109, so the main work is in the configuration files.

## Configuration Files

### solver.prototxt
Based on the settings reported in the paper:

[Figure: training hyperparameters reported in the PSPNet paper]

we can derive the following solver file:

```
train_net: "prototxt/VOC2012_train.prototxt"

# Our lab only has one GPU, so a batch size of 16 is not possible;
# iter_size below is used as a substitute.
iter_size: 16

lr_policy: "poly"
power: 0.9
# In practice, setting the learning rate straight to the paper's 0.01
# without an initialization model makes the loss explode; with the rate
# below and finetuning from the authors' trained model, the loss behaves.
base_lr: 1e-3

average_loss: 20
display: 20
momentum: 0.9
weight_decay: 0.0001

# ImageNet
# max_iter: 150000
# PASCAL VOC
max_iter: 30000
# Cityscapes
#max_iter: 90000

snapshot: 1000
snapshot_prefix: "/evaluation/snapshot/voc2012"
solver_mode: GPU
```
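For intuition, here is a toy numpy sketch of the gradient-accumulation trick that Caffe's iter_size performs: gradients from iter_size consecutive forward/backward passes are summed and averaged before a single weight update, so batch_size=1 with iter_size=16 behaves like an effective batch of 16 (names are illustrative; plain SGD without momentum):

```python
import numpy as np

def sgd_with_iter_size(grads_per_sample, lr=1e-3, iter_size=16):
    """Accumulate iter_size per-sample gradients before each update."""
    w = np.zeros_like(grads_per_sample[0])
    accum = np.zeros_like(w)
    for i, g in enumerate(grads_per_sample, start=1):
        accum += g
        if i % iter_size == 0:            # one update per iter_size passes
            w -= lr * accum / iter_size   # average the gradients, then step
            accum[:] = 0
    return w
```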

### train.prototxt
The training network has to address the following points:

[Figure: points to handle in the training network]

I mainly used the network shared at:
https://github.com/SoonminHwang/caffe-segmentation/tree/master/pspnet/models

### run.sh

```shell
#!/bin/sh

cd ../
## MODIFY PATH for YOUR SETTING
CAFFE_DIR=/home/t7810/project/PSPNet-master
CONFIG_DIR=${CAFFE_DIR}/evaluation/prototxt
MODEL_DIR=${CAFFE_DIR}/evaluation/model
CAFFE_BIN=${CAFFE_DIR}/build/tools/caffe
DEV_ID=0

sudo ${CAFFE_BIN} train \
    -solver=${CONFIG_DIR}/VOC2012_solver.prototxt \
    -weights=${MODEL_DIR}/pspnet101_VOC2012.caffemodel \
    -gpu=${DEV_ID} \
    2>&1 | tee ${CAFFE_DIR}/evaluation/snapshot/train.log
```

## Data Augmentation

As shown in the paper screenshot above, the authors use the following augmentations:

- random mirroring
- random resizing with a scale factor in [0.5, 2]
- random rotation between -10 and 10 degrees
- random Gaussian blur

The augmentation script is below. The rotation step needs care: rotation introduces black borders, and since every pixel of a label image is a class index, those borders could corrupt training. I therefore followed
https://blog.csdn.net/YhL_Leo/article/details/51510432, which computes the largest inscribed rectangle of the rotated image, and ported that post's C++ code to Python for the rotation part.

"""
This script shows how to relize data augmentation in PSPNet.
1. random mirror for all datasets
2. random resize between 0.5 and 2 for all datasets
3. random rotation between -10 and 10 degrees for ImageNet(ADE20K) and PASCAL VOC
4. random Gaussian blur for ImageNet(ADE20K) and PASCAL VOC
"""

from __future__ import division
import os
import numpy as np
import re
import cv2
import math
import shutil

dataset_list = ['ADE20K', 'cityscapes', 'VOC2012']
dataset_name = dataset_list[0]
DATA_ROOT = "/Users/camlin_z/Data/data" # change to your data root directory
img_data_dir_name = "images" # original images directory name
anno_data_dir_name = "annotation" # label images directory name

img_data_dir = os.path.join(DATA_ROOT, img_data_dir_name)
anno_data_dir = os.path.join(DATA_ROOT, anno_data_dir_name)
# rotate image global param: judge the cols is bigger than rows or not(image is vertical or horizon)
isColBigger = True

def mkr(dr):
if not os.path.exists(dr):
os.mkdir(dr)

# 1. random mirror
def mirror_process(img_name, anno_name, mirror_img_data_dir, mirror_anno_data_dir):
"""
mirror process
"""
img = cv2.imread(os.path.join(img_data_dir, img_name))
anno = cv2.imread(os.path.join(anno_data_dir ,anno_name))

flip_flag = np.random.randint(-1, 2)
img_mirror = cv2.flip(img, flip_flag)
cv2.imwrite(os.path.join(mirror_img_data_dir, img_name), img_mirror)
anno_mirror = cv2.flip(anno, flip_flag)
cv2.imwrite(os.path.join(mirror_anno_data_dir, anno_name), anno_mirror)

# 2. random resize between 0.5 and 2 for all datasets
def resize_process(img_name, anno_name, resize_img_data_dir, resize_anno_data_dir):
"""
resize process
"""
img = cv2.imread(os.path.join(img_data_dir, img_name))
anno = cv2.imread(os.path.join(anno_data_dir, anno_name))
height, width = img.shape[:2]

resize_flag = np.random.uniform(0.5, 2)
new_height = int(height * resize_flag)
new_width = int(width * resize_flag)
img_resize = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_CUBIC)
cv2.imwrite(os.path.join(resize_img_data_dir, img_name), img_resize)
anno_resize = cv2.resize(anno, (new_width, new_height), interpolation=cv2.INTER_CUBIC)
cv2.imwrite(os.path.join(resize_anno_data_dir, anno_name), anno_resize)

# 3. random rotation between -10 and 10 degrees
# reference material: https://blog.csdn.net/YhL_Leo/article/details/51510432
def rotate_process(img_name, anno_name, rotate_img_data_dir, rotate_anno_data_dir):
"""
rotate process
"""
img = cv2.imread(os.path.join(img_data_dir, img_name))
anno = cv2.imread(os.path.join(anno_data_dir, anno_name))

rows, cols = img.shape[:2]
if cols < rows:
isColBigger = False

# Notes: clockwise rotation is negative and anticlockwise rotation is positive
theta = np.random.randint(-10, 10)
if theta < 0:
theta += 360

# rotate image
# Notes: param "-theta" for accord with our intuition, make clockwise rotation positive
M_rotation = cv2.getRotationMatrix2D((cols / 2, rows / 2), -theta, 1)
img_rotated = cv2.warpAffine(img, M_rotation, (cols, rows))
anno_rotated = cv2.warpAffine(anno, M_rotation, (cols, rows))

# compute max rect
theta -= int(theta / 180) * 180
vertex = np.array([[-(cols-1)/2, (rows-1)/2], [(cols-1)/2, (rows-1)/2],
[(cols-1)/2, -(rows-1)/2], [-(cols-1)/2,-(rows-1)/2]])

if theta > 0 and theta < 90:
rMat = rotateMat(theta)
r_vertext = rotateVertex(vertex, rMat)
maxp = getCrossPoint(r_vertext)

if maxp[0] > cols/2 or maxp[1] > rows/2:
maxp = getSpecialCrossPoint(r_vertext)

maxp[0] = maxp[0] if maxp[0] < cols/2 else cols/2
maxp[1] = maxp[1] if maxp[1] < rows/2 else rows/2

# Notes: first slice param is rows(height) scale, second is cols(width) scale
img_rot = img_rotated[int(rows / 2 - maxp[1]): int(rows / 2 - maxp[1] + 2 * abs(maxp[1])),
int(cols / 2 - maxp[0]): int(cols / 2 - maxp[0] + 2 * abs(maxp[0]))]
anno_rot = anno_rotated[int(rows / 2 - maxp[1]): int(rows / 2 - maxp[1] + 2 * abs(maxp[1])),
int(cols / 2 - maxp[0]): int(cols / 2 - maxp[0] + 2 * abs(maxp[0]))]
cv2.imwrite(os.path.join(rotate_img_data_dir, img_name), img_rot)
cv2.imwrite(os.path.join(rotate_anno_data_dir, anno_name), anno_rot)

elif theta == 90:
if cols > rows:
img_rot = img_rotated[0:rows, int((cols-rows)/2):int((cols-rows)/2 + rows)]
anno_rot = anno_rotated[0:rows, int((cols-rows)/2):int((cols-rows)/2 + rows)]
else:
img_rot = img_rotated[0:cols, int((rows-cols)/2):int((rows-cols)/2+cols)]
anno_rot = anno_rotated[0:cols, int((rows-cols)/2):int((rows-cols)/2+cols)]
cv2.imwrite(os.path.join(rotate_img_data_dir, img_name), img_rot)
cv2.imwrite(os.path.join(rotate_anno_data_dir, anno_name), anno_rot)

elif theta > 90:
theta2 = 180 - theta
rMat = rotateMat(theta2)
r_vertext = rotateVertex(vertex, rMat)
maxp = getCrossPoint(r_vertext)

if maxp[0] > cols/2 or maxp[1] > rows/2:
maxp = getSpecialCrossPoint(r_vertext)

maxp[0] = maxp[0] if maxp[0] < cols / 2 else cols / 2
maxp[1] = maxp[1] if maxp[1] < rows / 2 else rows / 2

img_rot = img_rotated[int(rows / 2 - maxp[1]): int(rows / 2 - maxp[1] + 2 * abs(maxp[1])),
int(cols / 2 - maxp[0]): int(cols / 2 - maxp[0] + 2 * abs(maxp[0]))]
anno_rot = anno_rotated[int(rows / 2 - maxp[1]): int(rows / 2 - maxp[1] + 2 * abs(maxp[1])),
int(cols / 2 - maxp[0]): int(cols / 2 - maxp[0] + 2 * abs(maxp[0]))]
cv2.imwrite(os.path.join(rotate_img_data_dir, img_name), img_rot)
cv2.imwrite(os.path.join(rotate_anno_data_dir, anno_name), anno_rot)

else:
cv2.imwrite(os.path.join(rotate_img_data_dir, img_name), img_rotated)
cv2.imwrite(os.path.join(rotate_anno_data_dir, anno_name), anno_rotated)

def rotateMat(radian):
"""
compute rotate matrix
:param radian: radian system angle
"""
alpha = radian
alpha *= np.pi / 180
# below rotate matrix is different from Wikipedia
# ([[math.cos(alpha), -math.sin(alpha)], [math.sin(alpha), math.cos(alpha)]])
# like the above "Notes: param "-theta" ", make clockwise rotation is positive
return np.array([[math.cos(alpha), math.sin(alpha)], [-math.sin(alpha), math.cos(alpha)]])

def rotateVertex(vertexs, rMat):
"""
compute rectangle coordinate after rotated "rt" by rotate matrix "rMat"
:param vertexs: original rectangle coordinate
"""
rt = vertexs
for i in range(vertexs.shape[0]):
v_i = np.array([[vertexs[i][0]], [vertexs[i][1]]])
v_r = np.matmul(rMat, v_i)
rt[i] = (v_r[0][0], v_r[1][0])
return rt

def getCrossPoint(vertexs):
ln_ab = lineFunction(vertexs)
return getMaxRectRegion(ln_ab)

def getSpecialCrossPoint(vertexs):
line0_1 = lineFunction(vertexs[0], vertexs[1])
line1_2 = lineFunction(vertexs[1], vertexs[2])

(a1, b1, c1) = line0_1
(a2, b2, c2) = line1_2
x = -(a1*c2 + a2*c1) / (a2*b1 + a1*b2)
y = -(b1*x+c1) / a1
return np.array([x, y])

def lineFunction(v1, v2=np.array([])):
if v2.shape[0] == 0:
pa = v1[0]
pb = v1[1]
if not isColBigger:
pb = v1[3]
else:
pa = v1
pb = v2

delta_x = pa[0] - pb[0]
delta_y = pa[1] - pb[1]

line = np.array([delta_x, -delta_y, -pb[1] * delta_x + pb[0] * delta_y])

# normalization param
m_line = np.sqrt(line[0] * line[0] + line[1] * line[1])
# compute the param a, b, s in the blog
line *= 1 / m_line

# assume a >= 0
if line[0] < 0:
line = -line

return line

def getMaxRectRegion(line):
if line[0] != 0 and line[1] != 0:
(a, b, c) = line

if not isColBigger:
b *= -1
return np.array([-c/(2*b), -c/(2*a)])
else:
return np.array([0, 0])

def getImageRange(vertexs):
pMin = np.array(0, 0)
pMax = np.array(0, 0)
for i in range(vertexs.shape[0]):
pMin[0] = pMin[0] if pMin[0] < vertexs[i][0] else vertexs[i][0]
pMin[1] = pMin[1] if pMin[1] < vertexs[i][1] else vertexs[i][1]
pMax[0] = pMax[0] if pMax[0] < vertexs[i][0] else vertexs[i][0]
pMax[1] = pMax[1] if pMax[1] < vertexs[i][1] else vertexs[i][1]
return ([pMax[0] - pMin[0] + 1, pMax[1] - pMin[1] + 1])

# 4. random Gaussian blur for ImageNet(ADE20K) and PASCAL VOC
def gaussian_process(img_name, anno_name, gaussian_img_data_dir, gaussian_anno_data_dir):
"""
gaussian blur process
"""

img = cv2.imread(os.path.join(img_data_dir, img_name))
anno = cv2.imread(os.path.join(anno_data_dir, anno_name))

# gaussian blur param
kernel_size_list = [3, 5, 7]
kernel_size = kernel_size_list[np.random.randint(0, 3)]
sigma = np.random.uniform(0, 10)

img_gaussian = cv2.GaussianBlur(img, (kernel_size, kernel_size), sigma)
anno_gaussian = cv2.GaussianBlur(anno, (kernel_size, kernel_size), sigma)

cv2.imwrite(os.path.join(gaussian_img_data_dir, img_name), img_gaussian)
cv2.imwrite(os.path.join(gaussian_anno_data_dir, anno_name), anno_gaussian)


def main():
"""
The main entrance
"""

######################### make new director ######################
# mirrior data director
mirror_img_data_dir = os.path.join(DATA_ROOT, img_data_dir_name + "_mirror")
mirror_anno_data_dir = os.path.join(DATA_ROOT, anno_data_dir_name + "_mirror")
mkr(mirror_img_data_dir)
mkr(mirror_anno_data_dir)

# resize data director
resize_img_data_dir = os.path.join(DATA_ROOT, img_data_dir_name + "_resize")
resize_anno_data_dir = os.path.join(DATA_ROOT, anno_data_dir_name + "_resize")
mkr(resize_img_data_dir)
mkr(resize_anno_data_dir)

# rotate data director
rotate_img_data_dir = os.path.join(DATA_ROOT, img_data_dir_name + "_rotate")
rotate_anno_data_dir = os.path.join(DATA_ROOT, anno_data_dir_name + "_rotate")
mkr(rotate_img_data_dir)
mkr(rotate_anno_data_dir)

# gaussian data director
gaussian_img_data_dir = os.path.join(DATA_ROOT, img_data_dir_name + "_gaussian")
gaussian_anno_data_dir = os.path.join(DATA_ROOT, anno_data_dir_name + "_gaussian")
mkr(gaussian_img_data_dir)
mkr(gaussian_anno_data_dir)

if dataset_name == 'cityscapes':
shutil.rmtree(rotate_img_data_dir)
shutil.rmtree(rotate_anno_data_dir)
shutil.rmtree(gaussian_img_data_dir)
shutil.rmtree(gaussian_anno_data_dir)

####################### main process loop ######################
for _, _, anno_name_list in os.walk(anno_data_dir):
for anno_name in anno_name_list:

# generate image name by annotation name
if dataset_name == 'ADE20K':
img_name = re.split(r'.seg', anno_name)[0] + ".jpg"
elif dataset_name == 'cityscapes':
img_name = re.split(r'.leftImg8bit', anno_name)[0] + "_gtFine_labelTrainIds.png"
elif dataset_name == 'VOC2012':
img_name = anno_name.split('.')[0] + ".jpg"
else:
print "wrong dataset name!"
return

print "Processing: ", img_name
if not (os.path.exists(os.path.join(img_data_dir, img_name)) and os.path.join(mirror_anno_data_dir, anno_name)):
print "file not exists!"
continue

# process image and annotation
mirror_process(img_name, anno_name, mirror_img_data_dir, mirror_anno_data_dir)
resize_process(img_name, anno_name, resize_img_data_dir, resize_anno_data_dir)
if dataset_name == 'ADE20K' or dataset_name == 'VOC2012':
rotate_process(img_name, anno_name, rotate_img_data_dir, rotate_anno_data_dir)
gaussian_process(img_name, anno_name, gaussian_img_data_dir, gaussian_anno_data_dir)


if __name__ == '__main__':
main()

The matching script for processing the training list files is:

```python
import os

root = "/home/t7810/data/VOC2012_train"
path_old_file = "VOC2012_train.txt"
path_new_file = "VOC2012_train_aug.txt"
dir_suffix_list = ['_mirror', '_resize', '_rotate', '_gaussian']

file_new = open(os.path.join(root, path_new_file), 'w')

with open(os.path.join(root, path_old_file)) as file:
    for line in file:
        # for PASCAL VOC 2012
        file_new.write(line)
        line_split = line.strip().split()
        (_, img_dir_name, img_name) = line_split[0].strip().split('/')
        (_, anno_dir_name, anno_name) = line_split[1].strip().split('/')

        for dir_suffix in dir_suffix_list:
            img_dir_name_new = img_dir_name + dir_suffix
            anno_dir_name_new = anno_dir_name + dir_suffix
            file_new.write("/" + img_dir_name_new + "/" + img_name + " "
                           + "/" + anno_dir_name_new + "/" + anno_name + "\n")

file_new.close()
```

After augmentation, the image folder structure is:


[Figure: folder structure after data augmentation]

With the configuration above, training runs:

[Figure: training log after one day of finetuning]

This is after finetuning from the released model for a day, and it clearly has not converged. The first number is the overall network accuracy; the two losses are the total loss and the loss of the pulled-out auxiliary branch; the accuracies below correspond to the 21 classes. Since the batch size per iteration is 1, only one or two class accuracies show up at a time, and they are clearly low; testing with the intermediate caffemodels also gives poor results. If anyone has fully reproduced the paper, I would love to compare notes and learn from you; I will also keep exploring. Many thanks!
