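Keras Siamese network: a simple example with detailed comments
This code comes from the official Keras examples and shows how to write a Siamese network in Keras. It uses the MNIST dataset. I have added detailed comments to make it easier to follow.
Here is also a diagram of the network: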
'''Trains a Siamese MLP on pairs of digits from the MNIST dataset.
It follows Hadsell-et-al.'06 [1] by computing the Euclidean distance on the
output of the shared network and by optimizing the contrastive loss (see paper
for more details).
# References
- Dimensionality Reduction by Learning an Invariant Mapping
http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
Gets to 97.2% test accuracy after 20 epochs.
2 seconds per epoch on a Titan X Maxwell GPU
'''
from __future__ import absolute_import
from __future__ import print_function
import numpy as np
import random
import keras
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Flatten, Dense, Dropout, Lambda
from keras.optimizers import RMSprop
from keras import backend as K
num_classes = 10
epochs = 20
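def euclidean_distance(vects):
    '''
    Compute the Euclidean distance between the two feature vectors.
    K.maximum clamps the sum of squares to at least K.epsilon() (a very
    small number), so K.sqrt never receives exactly zero. The gradient of
    sqrt at zero is infinite, which would break training.
    The distance measures how similar the two image features are:
    a small distance means similar, a large distance means dissimilar.
    :param vects: list of two tensors, the outputs of the two branches
    :return: tensor of shape (batch_size, 1) holding the distances
    '''
    # vects is a list with two elements: the result of processed_a first,
    # the result of processed_b second
    x, y = vects
    sum_square = K.sum(K.square(x - y), axis=1, keepdims=True)
    return K.sqrt(K.maximum(sum_square, K.epsilon()))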
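def eucl_dist_output_shape(shapes):
    '''
    Receives the shapes of the two inputs and returns the output shape,
    which tells Keras the shape of the Lambda layer's output.
    The output size is fixed to 1 per sample: (batch_size, 1).
    shape1[0] is the batch dimension, i.e. the number of input images.
    :param shapes: list with the shapes of the two input tensors
    :return: the output shape as a tuple
    '''
    shape1, shape2 = shapes
    return (shape1[0], 1)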
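def contrastive_loss(y_true, y_pred):
    '''Contrastive loss from Hadsell-et-al.'06
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    A custom loss function.
    Note that y_true is the pair label, 1 (same class) or 0 (different class),
    while y_pred is the Euclidean distance between the two features computed
    by the siamese network, so its range is (0, +inf).
    With margin = 1: when y_pred < 1, margin - y_pred is positive, a value
    between 0 and 1; when y_pred >= 1, margin - y_pred is zero or negative
    and K.maximum returns 0.
    For a negative pair (y_true = 0):
    1. if the predicted distance is small, margin - y_pred is large and the
       loss is large;
    2. if the predicted distance is at least the margin, the clamped term is 0
       and the loss is 0.
    For a positive pair (y_true = 1) the loss is y_pred squared, so it grows
    with the distance.
    '''
    margin = 1
    square_pred = K.square(y_pred)
    margin_square = K.square(K.maximum(margin - y_pred, 0))
    return K.mean(y_true * square_pred + (1 - y_true) * margin_square)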
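def create_pairs(x, digit_indices):
    '''Positive and negative pair creation.
    Alternates between positive and negative pairs.
    Receives all the images x and digit_indices, a list that holds,
    for each class, the indices of the images of that class.
    '''
    pairs = []
    labels = []
    # size of the smallest class, minus one
    # (each positive pair uses images i and i + 1)
    n = min([len(digit_indices[d]) for d in range(num_classes)]) - 1
    # loop over all classes
    for d in range(num_classes):
        # loop over the first n images; n comes from the smallest class,
        # so every class has at least n + 1 images and no index is missing
        for i in range(n):
            # take two consecutive images of the same class; z1 and z2 are
            # still image indices here, x[z1] and x[z2] turn them into images
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            # a pair of images of the same class
            pairs += [[x[z1], x[z2]]]
            # random increment between 1 and 9
            inc = random.randrange(1, num_classes)
            # adding it modulo num_classes always gives a different class
            dn = (d + inc) % num_classes
            # pair the i-th image of this class with the i-th image of the other class
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            # a pair of images of different classes
            pairs += [[x[z1], x[z2]]]
            # labels: 1 for the positive pair, 0 for the negative pair
            labels += [1, 0]
    # pairs ends up with n * 10 * 2 = 20n entries; labels also has 20n entries,
    # alternating 1, 0
    # for the MNIST training set the shapes are
    # (108400, 2, 28, 28)
    # (108400,)
    return np.array(pairs), np.array(labels)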
def create_base_network(input_shape):
    '''Base network to be shared (eq. to feature extraction).
    Receives the shape of a single sample:
    (28, 28)
    '''
    input = Input(shape=input_shape)
    # input has shape (?, 28, 28); Flatten turns it into (?, 784)
    x = Flatten()(input)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.1)(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.1)(x)
    x = Dense(128, activation='relu')(x)
    return Model(input, x)
def compute_accuracy(y_true, y_pred):
    '''Compute classification accuracy with a fixed threshold on distances.
    '''
    pred = y_pred.ravel() < 0.5
    return np.mean(pred == y_true)
def accuracy(y_true, y_pred):
    '''Compute classification accuracy with a fixed threshold on distances.
    '''
    return K.mean(K.equal(y_true, K.cast(y_pred < 0.5, y_true.dtype)))
if __name__ == '__main__':
    # the data, split between train and test sets
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    # x_train = np.reshape(x_train, (60000, 1, 28, 28))
    input_shape = x_train.shape[1:]
    # create training+test positive and negative pairs
    # group the image indices by class: digit_indices is a list of ten arrays,
    # each holding the indices of the images of the corresponding digit
    digit_indices = [np.where(y_train == i)[0] for i in range(num_classes)]
    # build positive and negative image pairs: two images of the same class,
    # or two images of different classes
    tr_pairs, tr_y = create_pairs(x_train, digit_indices)
    # do the same for the test data
    digit_indices = [np.where(y_test == i)[0] for i in range(num_classes)]
    te_pairs, te_y = create_pairs(x_test, digit_indices)
    # network definition: a base network with 7 layers counting the input
    # layer (flatten, dense layers with dropout in between, and a final
    # 128-dimensional output); nothing fancy
    base_network = create_base_network(input_shape)
    # two inputs, input_a and input_b, each of shape (?, 28, 28)
    input_a = Input(shape=input_shape)
    input_b = Input(shape=input_shape)
    # because we re-use the same instance `base_network`,
    # the weights of the network
    # will be shared across the two branches
    # processed_a is the output of base_network for input_a,
    # and processed_b is the output for input_b
    processed_a = base_network(input_a)
    processed_b = base_network(input_b)
    # compute the distance with a Lambda layer, which wraps an arbitrary
    # function as a Keras layer.
    # The official documentation says the output shape can be inferred
    # automatically with the TensorFlow backend; this official example, which
    # has two inputs, passes output_shape explicitly anyway.
    # euclidean_distance receives a single argument, the list
    # [processed_a, processed_b], and unpacks it into two tensors.
    distance = Lambda(euclidean_distance,
                      output_shape=eucl_dist_output_shape)([processed_a, processed_b])
    model = Model([input_a, input_b], distance)
    # plot_model needs pydot and graphviz to be installed
    keras.utils.plot_model(model, "mysiamese.png", show_shapes=True)
    # train
    # The optimizer used here is RMSprop. As far as I know, Adam combines
    # RMSprop-style adaptive learning rates with momentum, so I think Adam
    # would be a reasonable choice as well.
    rms = RMSprop()
    # use the custom loss
    model.compile(loss=contrastive_loss, optimizer=rms, metrics=[accuracy])
    model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
              batch_size=128,
              epochs=epochs,
              validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y))
    # compute final accuracy on training and test sets
    y_pred = model.predict([tr_pairs[:, 0], tr_pairs[:, 1]])
    tr_acc = compute_accuracy(tr_y, y_pred)
    y_pred = model.predict([te_pairs[:, 0], te_pairs[:, 1]])
    te_acc = compute_accuracy(te_y, y_pred)
    print('* Accuracy on training set: %0.2f%%' % (100 * tr_acc))
    print('* Accuracy on test set: %0.2f%%' % (100 * te_acc))
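
To make the arithmetic in euclidean_distance, contrastive_loss and compute_accuracy easier to check by hand, here is a minimal NumPy sketch of the same formulas on four hand-made pairs. It is not part of the official script: the `_np` function names, the `EPS` constant (standing in for K.epsilon(), assumed to be at the Keras default of 1e-7) and the 2-D "embeddings" are illustrative assumptions.

```python
import numpy as np

EPS = 1e-7  # assumption: stands in for K.epsilon() at its default value


def euclidean_distance_np(x, y):
    """Row-wise Euclidean distance with the same clamping, shape (batch, 1)."""
    sum_square = np.sum(np.square(x - y), axis=1, keepdims=True)
    return np.sqrt(np.maximum(sum_square, EPS))


def contrastive_loss_np(y_true, y_pred, margin=1.0):
    """Mean of y * d^2 + (1 - y) * max(margin - d, 0)^2 over the batch."""
    # Reshape the labels to (batch, 1) so they line up with y_pred
    # instead of broadcasting to a (batch, batch) matrix.
    y_true = np.asarray(y_true, dtype="float64").reshape(-1, 1)
    square_pred = np.square(y_pred)
    margin_square = np.square(np.maximum(margin - y_pred, 0.0))
    return np.mean(y_true * square_pred + (1.0 - y_true) * margin_square)


# Four hypothetical pairs of 2-D feature vectors (not real MNIST features).
a = np.zeros((4, 2))
b = np.array([[0.0, 0.2], [0.0, 0.2], [3.0, 4.0], [3.0, 4.0]])
labels = np.array([1, 0, 1, 0])  # 1 = same class, 0 = different class

d = euclidean_distance_np(a, b)  # distances are about 0.2, 0.2, 5.0, 5.0
loss = contrastive_loss_np(labels, d)
# Per-pair terms: 0.04 (close positive), 0.64 (close negative),
# 25.0 (far positive), 0.0 (far negative); their mean is about 6.42.

# Same rule as compute_accuracy: distance below 0.5 means "same class".
pred = d.ravel() < 0.5
acc = np.mean(pred == labels)  # pairs 1 and 4 are classified correctly

print(d.ravel(), loss, acc)
```

The far positive pair dominates the loss, while the far negative pair contributes nothing: positives are always pulled together, and negatives are only pushed apart until they reach the margin.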
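
The pairing logic of create_pairs is easier to follow on a tiny dataset than on MNIST. The sketch below repeats the same loop on made-up data: 3 classes with 4 samples each, where every "image" is a 2x2 array filled with its class id. The name `create_pairs_toy`, the explicit `num_classes` argument and the toy data are assumptions for illustration only.

```python
import random

import numpy as np


def create_pairs_toy(x, digit_indices, num_classes):
    """Same alternation as create_pairs, with num_classes passed explicitly."""
    pairs, labels = [], []
    # Size of the smallest class minus one, because positive pairs use i and i + 1.
    n = min(len(digit_indices[d]) for d in range(num_classes)) - 1
    for d in range(num_classes):
        for i in range(n):
            # Positive pair: two consecutive samples of class d.
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs.append([x[z1], x[z2]])
            # Negative pair: a non-zero offset modulo num_classes never returns d.
            inc = random.randrange(1, num_classes)
            dn = (d + inc) % num_classes
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            pairs.append([x[z1], x[z2]])
            labels += [1, 0]
    return np.array(pairs), np.array(labels)


# Hypothetical toy data: labels 0,0,0,0,1,1,1,1,2,2,2,2.
toy_classes = 3
y = np.repeat(np.arange(toy_classes), 4)
x = np.stack([np.full((2, 2), c, dtype="float32") for c in y])
toy_indices = [np.where(y == c)[0] for c in range(toy_classes)]

pairs, labels = create_pairs_toy(x, toy_indices, toy_classes)
print(pairs.shape)   # (18, 2, 2, 2): n = 3, so 2 * 3 * 3 = 18 pairs
print(labels.shape)  # (18,)

# Because each image is filled with its class id, the two images of a pair
# are equal exactly when the pair is labelled 1.
same = (pairs[:, 0] == pairs[:, 1]).all(axis=(1, 2))
print(np.array_equal(same, labels == 1))
```

With 10 classes the same counting gives 2 * 10 * n pairs, which is where the 20n in the comments of create_pairs comes from.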