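Contents:
I. Evaluation metrics for neural networks
II. Plotting the loss curve for image classification
1. Dataset
2. Training
3. Plotting the loss and accuracy curves
III. Computing precision, recall, and F-measure for image classification
1. Prediction
2. Computation
IV. Summary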
https://github.com/eastmountyxz/AI-for-TensorFlow
https://github.com/eastmountyxz/AI-for-Keras
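I have been learning Python for nearly eight years and have met many experts and friends along the way, for which I am grateful. My goal is to help more beginners get started, so all the code is open-sourced on GitHub and updates are also posted on my WeChat official account. I know how much I still have to learn and that I must keep working hard; there are no shortcuts in programming, you just have to do it. I hope to study more deeply and write better articles in the future, and to learn to do truly independent research during my PhD years. Many thanks as well to the authors of the articles listed in the references.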
- https://blog.csdn.net/eastmount
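True Positive (TP): the number of positive samples predicted correctly (predicted positive, actually positive)
False Positive (FP): the number of samples wrongly predicted as positive (actually negative, predicted positive)
True Negative (TN): the number of negative samples predicted correctly (predicted negative, actually negative)
False Negative (FN): the number of samples wrongly predicted as negative (actually positive, predicted negative)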
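The four counts above are enough to derive the usual classification metrics. The sketch below uses the standard definitions (accuracy, precision, recall, F1); the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much really is positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything really positive, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts, not taken from the experiment in this article
acc, p, r, f1 = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(acc, p, r, f1)
```

The zero checks only guard against empty denominators, for example a class that is never predicted.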
from sklearn.metrics import r2_score
y_true = [1,2,4]
y_pred = [1.3,2.5,3.7]
r2_score(y_true,y_pred)
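To see what `r2_score` measures, the same value can be computed by hand from the definition R² = 1 - SS_res / SS_tot. This is a minimal pure-Python sketch for the three points used above; the helper name `r2_manual` is hypothetical, and the sketch assumes `y_true` is not constant (otherwise SS_tot is zero).

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination computed from its definition."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


r2 = r2_manual([1, 2, 4], [1.3, 2.5, 3.7])
print(round(r2, 4))  # 0.9079
```

Here SS_res = 0.43 and SS_tot = 14/3, so the model explains roughly 91% of the variance in `y_true`.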
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 7 13:39:19 2020
@author: xiuzhang Eastmount CSDN
"""
import os
import glob
import cv2
import numpy as np
import tensorflow as tf  # written for the TensorFlow 1.x API (tf.placeholder, tf.layers, tf.Session)
# Image directory
path = 'photo/'
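#--------------------------------- Step 1: read the images -----------------------------------
def read_img(path):
    # One sub-directory per class; sorted so the class index follows the directory names
    cate = [path + x for x in sorted(os.listdir(path)) if os.path.isdir(path + x)]
    imgs = []
    labels = []
    fpath = []
    for idx, folder in enumerate(cate):
        # Walk every .jpg file in the class directory
        for im in glob.glob(folder + '/*.jpg'):
            #print('reading the images:%s' % (im))
            img = cv2.imread(im)              # read the pixels with OpenCV
            img = cv2.resize(img, (32, 32))   # resize every image to 32x32
            imgs.append(img)                  # image data
            labels.append(idx)                # class label
            # `im` already starts with `path`, so the stored name carries a doubled prefix
            fpath.append(path+im)             # image path
            #print(path+im, idx)
    # np.bytes_ replaces np.string_, which was removed in NumPy 2.0
    return np.asarray(fpath, np.bytes_), np.asarray(imgs, np.float32), np.asarray(labels, np.int32)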
# Read the images
fpaths, data, label = read_img(path)
print(data.shape) # (1000, 32, 32, 3)
# Count the number of classes
num_classes = len(set(label))
print(num_classes)
# Build an index array and shuffle it to randomize the sample order
num_example = data.shape[0]
arr = np.arange(num_example)
np.random.shuffle(arr)
data = data[arr]
label = label[arr]
fpaths = fpaths[arr]
# Split into 80% training set and 20% test set
ratio = 0.8
s = int(num_example * ratio)  # np.int was removed in NumPy 1.24; the built-in int is equivalent
x_train = data[:s]
y_train = label[:s]
fpaths_train = fpaths[:s]
x_val = data[s:]
y_val = label[s:]
fpaths_test = fpaths[s:]
print(len(x_train),len(y_train),len(x_val),len(y_val)) #800 800 200 200
print(y_val)
#--------------------------------- Step 2: build the neural network -----------------------------------
# Define the placeholders
xs = tf.placeholder(tf.float32, [None, 32, 32, 3])  # each image has 32*32*3 values
ys = tf.placeholder(tf.int32, [None])               # one label per sample
# Placeholder for the dropout rate
drop = tf.placeholder(tf.float32)                   # 0.25 for training, 0 for testing
# Convolution layer conv0
conv0 = tf.layers.conv2d(xs, 20, 5, activation=tf.nn.relu)     # 20 filters, kernel size 5, ReLU activation
# Max-pooling layer pool0
pool0 = tf.layers.max_pooling2d(conv0, [2, 2], [2, 2])         # 2x2 window, 2x2 stride
print("Layer0:\n", conv0, pool0)
# Convolution layer conv1
conv1 = tf.layers.conv2d(pool0, 40, 4, activation=tf.nn.relu)  # 40 filters, kernel size 4, ReLU activation
# Max-pooling layer pool1
pool1 = tf.layers.max_pooling2d(conv1, [2, 2], [2, 2])         # 2x2 window, 2x2 stride
print("Layer1:\n", conv1, pool1)
# Flatten the 3-D feature maps into a 1-D vector
flatten = tf.layers.flatten(pool1)
# Fully connected layer producing a 400-dimensional feature vector
fc = tf.layers.dense(flatten, 400, activation=tf.nn.relu)
print("Layer2:\n", fc)
# Dropout against overfitting. Note: tf.layers.dropout defaults to training=False,
# so as written this layer passes fc through unchanged; pass training=True to enable it.
dropout_fc = tf.layers.dropout(fc, drop)
# Output layer without activation (logits)
logits = tf.layers.dense(dropout_fc, num_classes)
print("Output:\n", logits)
# Predicted class = index of the largest logit (tf.arg_max is deprecated in favor of tf.argmax)
predicted_labels = tf.argmax(logits, 1)
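#--------------------------------- Step 3: define the loss and the optimizer ---------------------------------
# Cross-entropy loss per sample
losses = tf.nn.softmax_cross_entropy_with_logits(
    labels = tf.one_hot(ys, num_classes),  # convert the integer labels to one-hot vectors
    logits = logits)
# Mean loss over the batch
mean_loss = tf.reduce_mean(losses)
# Adam optimizer with a learning rate of 0.0001; it minimizes the scalar mean loss
# (passing the per-sample vector `losses` would make TensorFlow sum it implicitly)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(mean_loss)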
#------------------------------------ Step 4: train and predict -----------------------------------
# Used to save and restore the model
saver = tf.train.Saver()
# True to train, False to predict
train = True
# Model file path
model_path = "model/image_model"
with tf.Session() as sess:
    if train:
        print("训练模式")  # "training mode"
        # Initialize the variables before training
        sess.run(tf.global_variables_initializer())
        # Feed the inputs and labels; the dropout rate is 0.25 during training
        train_feed_dict = {
            xs: x_train,
            ys: y_train,
            drop: 0.25
        }
        # Train for 1000 steps
        for step in range(1000):
            _, mean_loss_val = sess.run([optimizer, mean_loss], feed_dict=train_feed_dict)
            if step % 20 == 0:  # report every 20 steps
                # Accuracy on the training set
                pre = sess.run(predicted_labels, feed_dict=train_feed_dict)
                accuracy = 1.0*sum(y_train==pre) / len(pre)
                print("{},{},{}".format(step, mean_loss_val,accuracy))
        # Save the model
        saver.save(sess, model_path)
        print("训练结束,保存模型到{}".format(model_path))  # "training finished, model saved to ..."
    else:
        print("测试模式")  # "test mode"
        # Restore the saved parameters
        saver.restore(sess, model_path)
        print("从{}载入模型".format(model_path))  # "model loaded from ..."
        # Mapping from label id to class name
        label_name_dict = {
            0: "人类",  # people
            1: "沙滩",  # beach
            2: "建筑",  # building
            3: "公交",  # bus
            4: "恐龙",  # dinosaur
            5: "大象",  # elephant
            6: "花朵",  # flower
            7: "野马",  # wild horse
            8: "雪山",  # snow mountain
            9: "美食"   # food
        }
        # Feed the inputs and labels; the dropout rate is 0 during testing
        test_feed_dict = {
            xs: x_val,
            ys: y_val,
            drop: 0
        }
        # True labels versus the labels predicted by the model
        predicted_labels_val = sess.run(predicted_labels, feed_dict=test_feed_dict)
        for fpath, real_label, predicted_label in zip(fpaths_test, y_val, predicted_labels_val):
            # Convert label ids to label names
            real_label_name = label_name_dict[real_label]
            predicted_label_name = label_name_dict[predicted_label]
            print("{}\t{} => {}".format(fpath, real_label_name, predicted_label_name))
        # Evaluate the result
        print("正确预测个数:", sum(y_val==predicted_labels_val))  # "number of correct predictions"
        print("准确度为:", 1.0*sum(y_val==predicted_labels_val) / len(y_val))  # "accuracy"
        # Print one "real predicted" pair per line
        for real_label, predicted_label in zip(y_val, predicted_labels_val):
            print(real_label, predicted_label)
(1000, 32, 32, 3)
10
800 800 200 200
[4 4 3 0 0 0 8 3 8 6 7 1 7 7 9 0 4 7 0 6 0 7 7 0 9 5 4 3 5 1 2 2 8 2 8 5 1
7 8 7 1 7 7 2 6 4 0 9 0 6 1 1 2 7 4 3 9 6 2 2 1 2 3 3 4 1 6 0 5 3 0 4 8 1
8 1 6 5 9 3 6 9 8 4 2 7 2 9 2 0 3 3 0 8 6 5 0 4 4 2 7 2 4 4 3 5 9 6 8 0 9
0 4 6 9 9 3 5 0 9 8 1 4 1 8 5 3 2 6 5 1 9 0 2 1 9 9 3 0 8 5 7 8 8 3 4 4 4
0 5 6 2 8 1 5 5 8 9 7 2 0 8 6 1 5 8 9 9 2 8 2 6 0 7 8 0 2 1 9 0 4 3 1 9 0
0 4 3 3 3 3 1 8 8 1 5 9 8 0 9]
训练模式
0,62.20244216918945,0.12
20,8.619616508483887,0.3625
40,3.896609306335449,0.545
...
940,0.0003522337938193232,1.0
960,0.00033640244510024786,1.0
980,0.00032152896164916456,1.0
训练结束,保存模型到model/image_model
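The plotting script that follows reads `train_data.txt`, one `step,loss,accuracy` row per line, but the article does not show how that file is produced. One possible approach, sketched here as an assumption rather than the author's method, is to filter the console output and keep only the numeric rows. The helper name `extract_training_rows` is hypothetical.

```python
def extract_training_rows(log_lines):
    """Keep only 'step,loss,accuracy' rows from raw console output."""
    rows = []
    for line in log_lines:
        parts = line.strip().split(',')
        if len(parts) != 3:
            continue  # status messages, "..." placeholders, blank lines
        try:
            rows.append((int(parts[0]), float(parts[1]), float(parts[2])))
        except ValueError:
            continue  # three comma-separated fields that are not numbers
    return rows


# A few lines copied from the console output above
console = [
    "训练模式",
    "0,62.20244216918945,0.12",
    "20,8.619616508483887,0.3625",
    "40,3.896609306335449,0.545",
    "训练结束,保存模型到model/image_model",
]
rows = extract_training_rows(console)

# Write the rows in the format the plotting script expects
with open('train_data.txt', 'w') as f:
    for step, loss, acc in rows:
        f.write("{},{},{}\n".format(step, loss, acc))
```

The status lines are skipped because they do not split into three ASCII-comma-separated numeric fields. Writing the rows directly from the training loop would work just as well.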
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import host_subplot
# Iteration, mean loss, accuracy
train_iterations = []
train_loss = []
train_accuracy = []
# Read and parse the log: one "step,loss,accuracy" row per line
with open('train_data.txt', 'r') as fp:
    for line in fp:
        con = line.strip('\n').split(',')
        print(con)
        train_iterations.append(int(con[0]))
        train_loss.append(float(con[1]))
        train_accuracy.append(float(con[2]))
# Draw the curves on a host axis with a second y-axis on the right
host = host_subplot(111)
plt.subplots_adjust(right=0.8) # adjust the right boundary of the plot window
par1 = host.twinx()
# Axis labels (the logged accuracy is measured on the training set)
host.set_xlabel("iterations")
host.set_ylabel("loss")
par1.set_ylabel("training accuracy")
# Plot the curves
p1, = host.plot(train_iterations, train_loss, "b-", label="training loss")
p2, = host.plot(train_iterations, train_loss, ".") # data points on the curve
p3, = par1.plot(train_iterations, train_accuracy, label="training accuracy")
p4, = par1.plot(train_iterations, train_accuracy, "1")
# Legend position
# 1: upper right, 2: upper left, 3: lower left
# 4: lower right, 5: right ...
host.legend(loc=5)
# Color each axis label like its curve
host.axis["left"].label.set_color(p1.get_color())
par1.axis["right"].label.set_color(p3.get_color())
# Set the x range
host.set_xlim([-10, 1000])
plt.draw()
plt.show()
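When a loss curve is noisy, it can help to plot a smoothed copy next to the raw values. The following is a minimal sketch of a trailing moving average in pure Python; the function name `moving_average` and the window size are assumptions for illustration and are not part of the original script.

```python
def moving_average(values, window=3):
    """Trailing mean over up to `window` most recent values."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


print(moving_average([4.0, 2.0, 6.0, 8.0], window=2))  # [4.0, 3.0, 4.0, 7.0]
```

The result has the same length as the input, so it can be plotted against the same list of iterations as the raw loss.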
(1000, 32, 32, 3)
10
800 800 200 200
[9 4 8 7 0 7 5 7 1 4 9 3 0 5 8 0 0 2 5 8 7 4 7 8 8 9 4 1 6 7 8 4 8 4 9 9 6
1 6 7 9 8 6 3 1 8 7 8 0 4 6 9 8 5 2 6 0 0 1 9 9 6 8 1 5 9 1 1 6 0 1 7 2 1
7 1 8 7 9 7 7 5 1 0 6 0 1 5 5 0 7 5 8 6 7 7 5 0 9 7 8 9 7 3 0 9 2 4 7 9 1
7 0 2 2 5 6 5 1 0 9 5 9 7 0 6 2 5 4 4 2 6 8 6 2 5 7 1 5 0 0 4 5 7 9 3 5 5
4 6 1 3 9 9 7 5 6 9 2 3 3 2 4 1 4 8 2 7 3 4 3 9 1 5 7 6 4 2 6 4 0 0 4 5 1
7 2 4 6 6 2 4 1 7 5 0 6 8 3 7]
测试模式
INFO:tensorflow:Restoring parameters from model/image_model
从model/image_model载入模型
b'photo/photo/9\\960.jpg' 美食 => 美食
b'photo/photo/4\\414.jpg' 恐龙 => 恐龙
b'photo/photo/8\\809.jpg' 雪山 => 雪山
b'photo/photo/7\\745.jpg' 野马 => 大象
b'photo/photo/0\\12.jpg' 人类 => 人类
...
b'photo/photo/0\\53.jpg' 人类 => 人类
b'photo/photo/6\\658.jpg' 花朵 => 花朵
b'photo/photo/8\\850.jpg' 雪山 => 雪山
b'photo/photo/3\\318.jpg' 公交 => 美食
b'photo/photo/7\\796.jpg' 野马 => 野马
正确预测个数: 181
准确度为: 0.905
9 9
4 4
8 8
...
6 6
8 8
3 9
7 7
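The `real predicted` pairs printed at the end of the test run can be summarized in a confusion matrix, where row i and column j count the samples of class i that were predicted as class j. The sketch below is pure Python and uses only the pairs visible in the excerpt above; the function name `confusion_matrix` is a hypothetical helper, not a library call.

```python
def confusion_matrix(real, pred, num_classes):
    """matrix[i][j] = number of samples of class i predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for r, p in zip(real, pred):
        matrix[r][p] += 1
    return matrix


# Only the pairs shown in the output excerpt, not the full 200-sample test set
pairs = [(9, 9), (4, 4), (8, 8), (6, 6), (8, 8), (3, 9), (7, 7)]
real = [r for r, _ in pairs]
pred = [p for _, p in pairs]

cm = confusion_matrix(real, pred, num_classes=10)
correct = sum(cm[i][i] for i in range(10))  # the diagonal holds the correct predictions
print(cm[3])    # row for class 3: the one sample shown was predicted as class 9
print(correct)
```

Off-diagonal entries show which classes get confused with each other, which a single accuracy number hides.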
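Read the dataset
For each of the 10 classes (0-9), count the correctly recognized samples and the total number of predictions
Compute precision, recall, and F-measure with the formulas from Part I
Draw a comparison bar chart with matplotlib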
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 7 13:39:19 2020
@author: xiuzhang Eastmount CSDN
"""
import numpy as np
import matplotlib.pyplot as plt
#--------------------------------------------------------------------------
# Part 1: compute precision, recall, and F-measure
#--------------------------------------------------------------------------
real = []   # true labels
pre = []    # predicted labels
# Read and parse the file: one "real predicted" pair per line
with open('test_data.txt', 'r') as fp:
    for line in fp:
        con = line.strip('\n').split(' ')
        #print(con)
        real.append(int(con[0]))
        pre.append(int(con[1]))
# Per-class counters for the 10 classes. They must start at zero:
# list(range(0, 10)) would start class k at k and inflate every count.
real_10 = [0] * 10    # number of true samples per class
pre_10 = [0] * 10     # number of predictions per class
right_10 = [0] * 10   # number of correct predictions per class
for v1, v2 in zip(real, pre):
    print(v1, v2)
    real_10[v1] += 1
    pre_10[v2] += 1
    if v1 == v2:
        right_10[v1] += 1
print("统计各类数量")  # "counts per class"
print(real_10, pre_10, right_10)
# Precision = correct / predicted (0 when a class is never predicted)
precision = [r / p if p else 0.0 for r, p in zip(right_10, pre_10)]
print(precision)
# Recall = correct / real
recall = [r / n if n else 0.0 for r, n in zip(right_10, real_10)]
print(recall)
# F-measure = 2 * precision * recall / (precision + recall)
f_measure = [2 * p * r / (p + r) if (p + r) else 0.0 for p, r in zip(precision, recall)]
print(f_measure)
#--------------------------------------------------------------------------
# Part 2: draw the bar chart
#--------------------------------------------------------------------------
# Number of classes
n_groups = 10
fig, ax = plt.subplots()
index = np.arange(n_groups)
bar_width = 0.2
opacity = 0.4
error_config = {'ecolor': '0.3'}
# Draw one group of three bars per class
rects1 = ax.bar(index, precision, bar_width,
                alpha=opacity, color='b',
                error_kw=error_config,
                label='precision')
rects2 = ax.bar(index + bar_width, recall, bar_width,
                alpha=opacity, color='m',
                error_kw=error_config,
                label='recall')
rects3 = ax.bar(index + bar_width + bar_width, f_measure, bar_width,
                alpha=opacity, color='r',
                error_kw=error_config,
                label='f_measure')
# Center each tick under the middle bar of its group
ax.set_xticks(index + bar_width)
ax.set_xticklabels(('0-people', '1-beach', '2-building', '3-bus', '4-dinosaur',
                    '5-elephant', '6-flower', '7-horse', '8-mountain', '9-food'))
# Legend and axis labels
ax.legend()
plt.xlabel("class")
plt.ylabel("score")
fig.tight_layout()
plt.savefig('result.png', dpi=200)
plt.show()
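统计各类数量
[21, 22, 17, 13, 24, 28, 27, 36, 26, 31]
[19, 23, 18, 12, 24, 30, 29, 34, 25, 31]
[17, 19, 15, 11, 24, 26, 27, 34, 24, 29]
[0.8947368421052632, 0.8260869565217391, 0.8333333333333334, 0.9166666666666666, 1.0,
0.8666666666666667, 0.9310344827586207, 1.0, 0.96, 0.9354838709677419]
[0.8095238095238095, 0.8636363636363636, 0.8823529411764706, 0.8461538461538461, 1.0,
0.9285714285714286, 1.0, 0.9444444444444444, 0.9230769230769231, 0.9354838709677419]
[0.8500000000000001, 0.8444444444444444, 0.8571428571428571, 0.8799999999999999, 1.0,
0.896551724137931, 0.9642857142857143, 0.9714285714285714, 0.9411764705882353, 0.9354838709677419]
The first line of output means "counts per class". Note that this output was produced with counters initialised by list(range(0, 10)), so class k starts at k instead of 0. The real and predicted counts each sum to 245 (200 test samples plus 0+1+...+9 = 45), and the correct counts sum to 226 rather than the 181 correct predictions reported in the test run. Subtracting k from the counts of class k gives the actual values (scores rounded to four decimals):
real: [21, 21, 15, 10, 20, 23, 21, 29, 18, 22]
predicted: [19, 22, 16, 9, 20, 25, 23, 27, 17, 22]
correct: [17, 18, 13, 8, 20, 21, 21, 27, 16, 20]
precision: [0.8947, 0.8182, 0.8125, 0.8889, 1.0, 0.84, 0.9130, 1.0, 0.9412, 0.9091]
recall: [0.8095, 0.8571, 0.8667, 0.8, 1.0, 0.9130, 1.0, 0.9310, 0.8889, 0.9091]
F-measure: [0.85, 0.8372, 0.8387, 0.8421, 1.0, 0.875, 0.9545, 0.9643, 0.9143, 0.9091]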
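Per-class scores are often condensed into a single number by macro-averaging, that is, taking the unweighted mean over the classes. The sketch below shows the idea in pure Python on a small hypothetical 3-class example; the function name `macro_scores` and the labels are illustrative and not taken from the experiment. For real projects, scikit-learn's `precision_recall_fscore_support` offers the same kind of summary.

```python
from collections import Counter


def macro_scores(real, pred):
    """Return macro-averaged (precision, recall, F-measure)."""
    classes = sorted(set(real) | set(pred))
    real_n = Counter(real)   # true samples per class
    pred_n = Counter(pred)   # predictions per class
    right_n = Counter(r for r, p in zip(real, pred) if r == p)  # correct per class
    precisions, recalls, f_values = [], [], []
    for c in classes:
        p = right_n[c] / pred_n[c] if pred_n[c] else 0.0
        r = right_n[c] / real_n[c] if real_n[c] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f_values.append(f)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f_values) / n


# Hypothetical labels for illustration only
real = [0, 0, 0, 1, 1, 2]
pred = [0, 0, 1, 1, 2, 2]
macro_p, macro_r, macro_f = macro_scores(real, pred)
print(macro_p, macro_r, macro_f)
```

Macro-averaging weights every class equally, so a weak class with few samples pulls the score down more than it would pull down the overall accuracy.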
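13. How to evaluate a neural network, plot loss curves, and compute the F-measure for an image classification case
As heaven moves with vigor, the noble person strives unceasingly to strengthen himself.
As earth is receptive, the noble person carries all things with great virtue.
(By: Eastmount, 2022-01-19, at night in Guiyang)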
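[1] Gonzalez. Digital Image Processing (3rd ed.) [M]. Beijing: Publishing House of Electronics Industry, 2013.
[2] Yang Xiuzhang, Yan Na. Python Web Data Crawling and Analysis: From Beginner to Master (Analysis Volume) [M]. Beijing: Beihang University Press, 2018.
[3] Luo Zijiang et al. Image Processing in Python [M]. Science Press, 2020.
[4] Video course by 莫烦 (Morvan) on NetEase Cloud Classroom
[5] https://study.163.com/course/courseLearn.htm?courseId=1003209007
[6] TensorFlow [Minimal] CNN - Yellow_python
[7] https://github.com/siucaan/CNN_MNIST
[8] https://github.com/eastmountyxz/AI-for-TensorFlow
[9] Zhou Zhihua. Machine Learning.
[10] Evaluation metrics for neural network models - ZHANG ALIN
[11] [Deep learning] Differences between the classification metrics accuracy, recall, precision, etc. - z小白
[12] The difference between precision and accuracy as classification metrics - mxp_neu
[13] Study notes 2: evaluating regression models with r2_score in scikit-learn - Softdiamonds
[14] A comparison of variance, covariance, standard deviation, mean square deviation, root mean square, mean squared error, and root mean squared error - cqfdcw
[15] Machine learning: metrics for linear regression (MSE, RMSE, MAE, R Squared) - volcao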