
[Source Code] Convolutional Two-Stream Network Fusion for Video Action Recognition


Related Links

Overview: http://www.robots.ox.ac.uk/~vgg/software/two_stream_action/
Code: https://github.com/feichtenhofer/twostreamfusion

Environment Setup

  1. Download the code: https://github.com/feichtenhofer/twostreamfusion

  2. Compile MatConvNet

    1. Install a C++ build environment
      Compile with Visual Studio [recommended]
      Compile with MinGW
    2. Configure mex in MATLAB to use the C++ compiler
    3. Run compile.m
  • 'cl.exe' is not recognized as an internal or external command, operable program or batch file.
    Warning: CL.EXE not found in PATH. Trying to guess out of mex setup.

    Fix: search the Visual Studio installation directory for cl.exe, add its directory to the PATH environment variable, and restart MATLAB (see the sketch after this list).

  • Error using mex
    data.cpp
    error C2027: use of undefined type 'vl::CudaHelper'
    note: see declaration of 'vl::CudaHelper'
    error C2228: left of '.getLastCudnnErrorMessage' must have class/struct/union

    Fix: comment out the vl_compilenn call in compile.m and compile the CPU-only version instead (see the sketch after this list).

  • Error using make_all>search_cuda_devkit (line 442)
    Could not find a valid NVCC executable\n

    Fix: change line 19 of MexConv3D/make_all.m to opts.enableGpu = false;
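
A minimal sketch of the two CPU-only workarounds above; the Visual Studio path below is hypothetical and must be adjusted to your installation:

% add the directory containing cl.exe to PATH for this MATLAB session
setenv('PATH', [getenv('PATH') ';C:\Program Files (x86)\Microsoft Visual Studio\VC\bin']);
% compile MatConvNet without GPU support
vl_compilenn('enableGpu', false);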

Running the Code

The following takes cnn_ucf101_spatial as the example; cnn_ucf101_temporal runs the same way.

  1. Fake the dataset

    Just comment out three lines and add two (a rough sketch of the idea follows below).
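
As a rough idea, a faked imdb only needs the fields the training code reads later (imdb.classes.name, imdb.images.set, plus image names and labels). Everything below is a hypothetical sketch, not the actual lines:

% hypothetical minimal imdb with two UCF-101 classes and four clips
imdb.classes.name = {'ApplyEyeMakeup', 'Archery'};
imdb.images.name  = {'v1.avi', 'v2.avi', 'v3.avi', 'v4.avi'};
imdb.images.label = [1 2 1 2];
imdb.images.set   = [1 1 1 2];   % 1 = train, 2 = validation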

  2. Download an ImageNet pre-trained model

Model Download Size Year
vgg-m imagenet-vgg-m-2048.mat 329M 2013
vgg-16 imagenet-vgg-verydeep-16.mat 491M 2014
res-50 imagenet-resnet-50-dag.mat 91.5M 2015
res101 imagenet-resnet-101-dag.mat 159M 2015
res152 imagenet-resnet-152-dag.mat 215M 2015

The default is res-50. After downloading, put the model file in the models directory and change the value of model accordingly:
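
A hypothetical example of that change (the variable layout follows the snippets later in this post):

% point the script at the downloaded pre-trained model
model = 'imagenet-resnet-50-dag.mat';
opts.model = fullfile('models', model);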

Be sure to download from the links above; do not use the latest models from the official site, or net = dagnn.DagNN.loadobj(net) will later fail with:
No public field dilate exists for class dagnn.Conv

Under normal circumstances, cnn_ucf101_spatial can now be run; it just cannot be trained yet.

In MATLAB, press F12 to set a breakpoint, F5 to run/continue, and F10 to step over. The Workspace pane at the lower left shows variable values, and the Command Window below it lets you run commands live while debugging.

  3. Download the pre-trained spatial and temporal networks

To run the fusion network cnn_ucf101_fusion, first download the pre-trained spatial and temporal networks. Pick one of res50, res152, or vgg16; vgg16 is the safest choice, because the other models use different layer names and would require changes in many places in the code. Put the downloaded files in the models directory and adjust the corresponding file names in the code:

opts.modelA = fullfile(opts.modelPath, [opts.dataSet '-img-vgg16-split' num2str(opts.nSplit) '.mat']) ;
opts.modelB = fullfile(opts.modelPath, [opts.dataSet '-TVL1flow-vgg16-split' num2str(opts.nSplit) '.mat']) ;

Reading the Code

Dependencies

Directory structure

Entry-point files:
cnn_ucf101_spatial: trains the spatial network
cnn_ucf101_temporal: trains the temporal network
cnn_ucf101_fusion: loads the trained models and trains the final fusion network

Three data-loading files:
cnn_ucf101_get_frame_batch: loads single-frame images
cnn_ucf101_get_flow_batch: loads optical-flow images
cnn_ucf101_get_im_flow_batch: purpose unclear

Other files:
cnn_setup_environment: initializes environment variables
cnn_ucf101_setup_data: initializes the dataset
cnn_train_dag: trains a CNN; independent of the specific model
compile: compiles MatConvNet

cnn_ucf101_spatial

The spatial network is really just a standard CNN: its input is a 224x224 RGB image and its output is a UCF-101 class, so the input dimensions are [224 224 3] and the output dimensions are [101 1]. The architecture in between can be any generic network such as vgg-m, vgg16, res50, res101, or res152.

The paper uses networks pre-trained on ImageNet, which brings two benefits: the architectures have been validated by extensive experiments, and after pre-training the hidden layers can already extract fairly complex image features.

Lightly modifying a pre-trained network yields the spatial network. To make the inputs and outputs match the experimental data, modify (1) the input layer's weights and biases and (2) the last fully connected layer's weights and biases; then, for training, (3) set the loss and derOutputs; finally, for the experiments, add (4) dropout, top1error, and top5error. Parameters such as numEpochs, epochFactor, learningRate, and batchSize must also be set.

On DagNN and SimpleNN:
Official documentation: DagNN
Official documentation: SimpleNN
A separate article discusses some differences between DagNN and SimpleNN.
The loaded net may be either a DagNN or a SimpleNN. A SimpleNN is converted to a DagNN via dagnn.DagNN.fromSimpleNN(net), and training then goes through cnn_train_dag.
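
A minimal sketch of that conversion (dagnn.DagNN.fromSimpleNN is the standard MatConvNet API; the 'canonicalNames' option is optional):

% convert a SimpleNN into a DagNN so cnn_train_dag can train it
if ~isa(net, 'dagnn.DagNN')
  net = dagnn.DagNN.fromSimpleNN(net, 'canonicalNames', true);
end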

Main parameters

opts.dataSet = 'ucf101'; 
opts.dropOutRatio = 0 ; 
opts.inputdim  = [ 224,  224, 3] ; 

opts.train.batchSize = 256  ;
opts.train.augmentation = 'borders25';
opts.train.learningRate =  [1e-2*ones(1, 2) 1e-2*ones(1, 3) 1e-3*ones(1, 3) 1e-4*ones(1, 3)] ;

imdb = load(opts.imdbPath) ; % the image dataset
nClasses = length(imdb.classes.name); % number of classes in the dataset
net = load(opts.model); % the pre-trained network

(1) The input layer

No change is needed here: the pre-trained model already takes [224 224 3] input, matching the spatial network.

(2) The last fully connected layer

Modify the last fully connected layer's two parameters:

  1. fc_filter: dimensions go from [1 1 2048 1000] to [1 1 2048 101], randomly initialized
  2. fc_bias: dimensions go from [1000 1] to [101 1], initialized to zero
  % imagenet has 1000 classes, so the classifier outputs a (1000, 1) vector
  % resize the last fully connected layer's weights and biases so the output is (101, 1)
  % replace 1000-way imagenet classifiers
  for p = 1 : numel(net.params)
    sz = size(net.params(p).value);
    disp(sz);
    if any(sz == 1000)
      sz(sz == 1000) = nClasses;
      fprintf('replace classifier layer of %s\n', net.params(p).name);
      % change fc1000_filter from [1 1 2048 1000] to [1 1 2048 101]
      if numel(sz) > 2
         net.params(p).value = 0.01 * randn(sz,  class(net.params(p).value));
      % change fc1000_bias from [1000 1] to [101 1]
      else
         net.params(p).value = zeros(sz,  class(net.params(p).value));
      end
    end
  end
  % set normalization.border = [32 32]
  net.meta.normalization.border = [256 256] - net.meta.normalization.imageSize(1:2);
  net = dagnn.DagNN.loadobj(net);
  if strfind(model, 'bnorm')
    net = insert_bnorm_layers(net) ;
  end

(3) Setting the loss and derOutputs

% remove the SoftMax layer, add a Loss layer, and register the loss in derOutputs for backpropagation
opts.train.derOutputs = {} ;
for l=numel(net.layers):-1:1
  if isa(net.layers(l).block, 'dagnn.Loss') && isempty(strfind(net.layers(l).name, 'err'))
      opts.train.derOutputs = {opts.train.derOutputs{:}, net.layers(l).outputs{:}, 1} ;
  end
  % remove the SoftMax layer
  if isa(net.layers(l).block, 'dagnn.SoftMax') 
    net.removeLayer(net.layers(l).name)
    l = l - 1; % note: reassigning the loop variable has no effect in a MATLAB for loop
  end
end

if isempty(opts.train.derOutputs)
  % add a Loss layer
  net = dagnn.DagNN.insertLossLayers(net, 'numClasses', nClasses) ;
  fprintf('setting derivative for layer %s \n', net.layers(end).name);
  % set the model's output as the derivative target
  opts.train.derOutputs = {opts.train.derOutputs{:}, net.layers(end).outputs{:}, 1} ;
end
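
For reference, derOutputs is the cell array of {outputVariable, seedValue, ...} pairs that cnn_train_dag eventually hands to net.eval for the backward pass. A sketch (the names 'objective', 'input', and 'label' are hypothetical; use the network's actual variable names):

% seed the derivative of the loss output with 1, then run forward + backward
opts.train.derOutputs = {'objective', 1} ;
net.eval({'input', im, 'label', lbl}, opts.train.derOutputs) ;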

(4) Dropout, top1error, and top5error

% configure dropout layers according to opts.dropOutRatio
if ~isnan(opts.dropOutRatio)
  dr_layers = find(arrayfun(@(x) isa(x.block,'dagnn.DropOut'), net.layers)) ;
  % update the existing dropout layers
  if ~isempty(dr_layers)
    if opts.dropOutRatio > 0
      for i=dr_layers, net.layers(i).block.rate = opts.dropOutRatio; end
    else
      net.removeLayer({net.layers(dr_layers).name});
    end
  else
    % no dropout layer found in net; add one after the last pooling layer
    if opts.dropOutRatio > 0
      pool5_layer = find(arrayfun(@(x) isa(x.block,'dagnn.Pooling'), net.layers)) ;
      conv_layers = pool5_layer(end);
      for i=conv_layers
        block = dagnn.DropOut() ;   block.rate = opts.dropOutRatio ;
        newName = ['drop_' net.layers(i).name];

        net.addLayer(newName, ...
          block, ...
          net.layers(i).outputs, ...
          {newName}) ;

        for l = 1:numel(net.layers)-1
          for f = net.layers(i).outputs
             sel = find(strcmp(f, net.layers(l).inputs )) ;
             if ~isempty(sel)
              [net.layers(l).inputs{sel}] = deal(newName) ;
             end
          end
        end
      end
    end
  end
end
% add two layers after the loss to compute the error rates
lossLayers = find(arrayfun(@(x) isa(x.block,'dagnn.Loss') && strcmp(x.block.loss,'softmaxlog'),net.layers));

net.addLayer('top1error', ...
             dagnn.Loss('loss', 'classerror'), ...
             net.layers(lossLayers(end)).inputs, ...
             'top1error') ;

net.addLayer('top5error', ...
             dagnn.Loss('loss', 'topkerror', 'opts', {'topK', 5}), ...
             net.layers(lossLayers(end)).inputs, ...
             'top5error') ;

Other training parameters

opts.train.train = find(ismember(imdb.images.set, [1])) ;
opts.train.train = repmat(opts.train.train,1,opts.train.epochFactor);
opts.train.valmode = '250samples';
opts.train.denseEval = 1;

opts.train.train is an index array selecting the samples that take part in training.
The training indices are repeated opts.train.epochFactor times, so every sample is visited several times per epoch.
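
A toy illustration of those two lines:

% samples 1, 2, and 4 belong to set 1 (training)
train = find(ismember([1 1 2 1 3], 1))   % -> [1 2 4]
% with epochFactor = 3, every epoch draws each training sample three times
train = repmat(train, 1, 3)              % -> [1 2 4 1 2 4 1 2 4]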

cnn_ucf101_temporal

This differs little from the spatial network. The main change is that the input dimensions grow from [224, 224, 3] to [224, 224, 20], so the input layer's weights must be expanded to match:

    opts.inputdim  = [netNorm.imageSize(1:2), 20] ;
    % average the RGB filters over the channel dimension, then replicate the
    % mean across all 20 optical-flow input channels
    net.layers{1}.weights{1} = repmat(mean(net.layers{1}.weights{1},3), [1 1 opts.inputdim(3) 1]) ;
    net.meta.normalization.averageImage = [];
    net.meta.normalization.border = [256 256] - netNorm.imageSize(1:2);
    net = replace_last_layer(net, [1 2], [1 2], nClasses, opts.dropOutRatio);
    net.normalization.imageSize = opts.inputdim ;
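
A toy check of that weight expansion, using a hypothetical [7 7 3 96] first-layer filter bank:

w = randn(7, 7, 3, 96, 'single');       % RGB filters: 7 x 7 x 3 x 96
w20 = repmat(mean(w, 3), [1 1 20 1]);   % mean over RGB, tiled to 20 channels
disp(size(w20))                         % 7 7 20 96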

cnn_ucf101_fusion

Main parameters

addConv3D = 1 ;
addPool3D = 1 ;
doSum = 0 ;
imdb = load(opts.imdbPath) ;
netA = load(opts.modelA) ;
netB = load(opts.modelB) ;

Finding the layers to fuse

opts.train.fusionType = 'conv';
opts.train.fusionLayer = {'relu5_3', 'relu5_3'; };

fusionLayerA = []; fusionLayerB = [];
if ~isempty(opts.train.fusionLayer)
  for i=1:numel(netA.layers)
    if isfield(netA.layers(i),'name') && any(strcmp(netA.layers(i).name,opts.train.fusionLayer(:,1)))
      fusionLayerA = [fusionLayerA i];
    end
  end
  for i=1:numel(netB.layers)
    if isfield(netB.layers(i),'name') && any(strcmp(netB.layers(i).name,opts.train.fusionLayer(:,2)))
      fusionLayerB = [fusionLayerB i];
    end
  end
end

netA and netB are structurally identical; in both, the last ReLU layer before the fully connected layers is named relu5_3, and that is the layer on which fusion takes place, as follows.

Spatial fusion: adding a Concat fusion layer

for i = 1:size(opts.train.fusionLayer,1)
  if strcmp(opts.train.fuseInto,'spatial')
    i_fusion = find(~cellfun('isempty', strfind({net.layers.name}, ...
      [opts.train.fusionLayer{i,1} '_' opts.train.fuseInto])));
  else
    i_fusion = find(~cellfun('isempty', strfind({net.layers.name}, ...
      [opts.train.fusionLayer{i,2} '_' opts.train.fuseInto])));
  end
  name_concat = [opts.train.fusionLayer{i,2} '_concat'];
 
  if doSum
    block = dagnn.Sum() ;
    net.addLayerAt(i_fusion(end), name_concat, block, ...
               [net.layers(strcmp({net.layers.name},[opts.train.fusionLayer{i,1} '_spatial'])).outputs ...
                net.layers(strcmp({net.layers.name},[opts.train.fusionLayer{i,2} '_temporal'])).outputs], ...
                name_concat) ;   
              

  else
    block = dagnn.Concat() ;
    net.addLayerAt(i_fusion(end), name_concat, block, ...
               [net.layers(strcmp({net.layers.name},[opts.train.fusionLayer{i,1} '_spatial'])).outputs ...
                net.layers(strcmp({net.layers.name},[opts.train.fusionLayer{i,2} '_temporal'])).outputs], ...
                name_concat) ;   
  end

  % set input for fusion layer
  net.layers(i_fusion(end)+2).inputs{1} = name_concat;
end

The key code:

% add the fusion layer
net.addLayerAt( ...
  i_fusion(end), ...                            % where to insert it (just a layer index; little effect)
  name_concat, ...                              % name of the fusion layer (relu5_3_concat)
  block, ...                                    % type of the fusion layer: dagnn.Sum() or dagnn.Concat()
  {'relu5_3_spatial', 'relu5_3_temporal'}, ...  % the two inputs being fused
  name_concat ...                               % output of the fusion layer (relu5_3_concat)
);
% make the fusion layer the input of pool5_spatial
net.layers(i_fusion(end)+2).inputs{1} = name_concat;

The code above fuses the two networks by inserting a concat layer between relu5 and pool5.

The network structure before fusion:

The network structure after fusion:

Temporal fusion: Conv3D

if addConv3D
  block = dagnn.Conv3D() ;
  params(1).name = 'conv3Df' ;
  % input channels = conv5_3 output channels of both streams (e.g. 512 + 512 for vgg16)
  in = size(net.params(net.getParamIndex('conv5_3f_spatial')).value,4) + ...
    size(net.params(net.getParamIndex('conv5_3f_temporal')).value,4) ;
  out = 512;

  % identity mapping from each stream's channels, weighting the two halves
  % of the concatenated input by .25 and .75
  kernel = eye(in/2,out,'single');
  kernel = cat(1, .25 * kernel, .75 * kernel);
  kernel = permute(kernel, [4 5 3 1 2]);   % -> 1 x 1 x 1 x in x out

  % modulate with a normalized 3x3x3 Gaussian, giving a 3 x 3 x 3 x in x out kernel
  sigma = 1;
  [X,Y,Z] = ndgrid(-1:1, -1:1, -1:1);
  G3 = exp( -((X.*X)/(sigma*sigma) + (Y.*Y)/(sigma*sigma) + (Z.*Z)/(sigma*sigma))/2 );
  G3 = G3./sum(G3(:));
  kernel = bsxfun(@times, kernel, G3);

  params(1).value = kernel;
  params(2).name = 'conv3Db' ;
  params(2).value = zeros(1, out ,'single') ;

  % pad so the output keeps the input's spatio-temporal size
  pads = size(kernel); pads = ceil(pads(1:3) / 2) - 1;
  block.pad = [pads(1),pads(1), pads(2),pads(2), pads(3),pads(3)]; 
  block.stride = [1 1 1]; 
  block.size = size(kernel);

  % insert the conv53D layer right after relu5_3_concat ...
  i_relu5 = find(~cellfun('isempty', strfind({net.layers.name},'relu5_3_concat')));

  net.addLayerAt(i_relu5, 'conv53D',  block, ...
               [net.layers(i_relu5).outputs ], ...
                'conv3D5', {params.name}) ;  

  net.params(net.getParamIndex(params(1).name)).value = params(1).value ;
  net.params(net.getParamIndex(params(2).name)).value = params(2).value ;

  % ... followed by a ReLU
  block = dagnn.ReLU() ;
  net.addLayerAt(i_relu5+1, 'relu3D5',  block, ...
               [net.layers(i_relu5+1).outputs ], ...
                'relu3D5') ;

  % rewire pool5 of the fused-into stream to read from relu3D5
  net.layers(find(~cellfun('isempty', strfind({net.layers.name},['pool5_' opts.train.fuseInto])))).inputs = {'relu3D5'};
end

This adds a conv3D5 layer followed by a relu3D5 layer after relu5_3_concat, then connects them to pool5.

Temporal fusion: Pool3D

if addPool3D
  block = dagnn.Pooling3D() ;
  block.method = 'max' ;

  i_pool5 = find(~cellfun('isempty', strfind({net.layers.name},['pool5_' opts.train.fuseInto])));     
  block.poolSize = [net.layers(i_pool5).block.poolSize nFrames];         
  block.pad = [net.layers(i_pool5).block.pad 0,0]; 
  block.stride = [net.layers(i_pool5).block.stride 2];     
  net.addLayerAt(i_pool5, ['pool3D5_' opts.train.fuseInto], block, ...
               [net.layers(i_pool5).inputs], ...
                 [net.layers(i_pool5).outputs]) ; 
  net.removeLayer(['pool5_' opts.train.fuseInto], 0) ;    


  i_pool5 = find(~cellfun('isempty', strfind({net.layers.name},['pool5_' opts.train.fuseFrom ])));                 
  if ~isempty(i_pool5)
    block = dagnn.Pooling3D() ;

    block.poolSize = [net.layers(i_pool5).block.poolSize nFrames];
    block.pad = [net.layers(i_pool5).block.pad 0,0];
    block.stride = [net.layers(i_pool5).block.stride 2];

    net.addLayerAt(i_pool5, ['pool3D5_' opts.train.fuseFrom], block, ...
                 [net.layers(i_pool5).inputs], ...
                   [net.layers(i_pool5).outputs]) ;      

    net.removeLayer(['pool5_' opts.train.fuseFrom ], 0) ;    
  end

end

This replaces pool5 with pool3D5.

Setting the output derivatives

opts.train.derOutputs = {} ;
for l=1:numel(net.layers)
  if isa(net.layers(l).block, 'dagnn.Loss') && isempty(strfind(net.layers(l).block.loss, 'err'))
    if opts.backpropFuseFrom || ~isempty(strfind(net.layers(l).name, opts.train.fuseInto ))
      fprintf('setting derivative for layer %s \n', net.layers(l).name);
      opts.train.derOutputs = [opts.train.derOutputs, net.layers(l).outputs, {1}] ;
    end
     net.addLayer(['err1_' net.layers(l).name(end-7:end) ], dagnn.Loss('loss', 'classerror'), ...
             net.layers(l).inputs, 'error') ;
  end
end

The fusion network has two loss outputs: loss_spatial and loss_temporal. Which one should drive backpropagation? Since the whole fusion network fuses the temporal network into (fuses into) the spatial network, the spatial loss loss_spatial must take part in backpropagation; otherwise the fusion itself would never be trained.

This raises two further questions: first, can backpropagation use more than one loss? Second, should loss_temporal also be used? The answer to the first is yes; the second is decided by two parameters: opts.backpropFuseFrom and opts.train.removeFuseFrom.

If opts.backpropFuseFrom is true, loss_temporal also joins the backpropagation, so the network has two backpropagation paths; this can be verified in the debugger and in the source comments.
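
In that case derOutputs ends up holding both loss outputs. A sketch, assuming the loss layers' output variables carry the names printed in the execution order below:

% both losses seed the backward pass
opts.train.derOutputs = {'loss39_spatial', 1, 'loss39_temporal', 1} ;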


If opts.train.removeFuseFrom is true, the entire second half of the temporal network is removed, and loss_temporal disappears with it.

Concretely, how do two losses backpropagate at the same time, sequentially or alternately? The backward order can be obtained with the following command:

net.layers(fliplr(net.getLayerExecutionOrder())).name

err1_temporal
err1__spatial
loss39_temporal
loss39_spatial
prediction_temporal
prediction_spatial
layer37_temporal
layer37_spatial
relu7_temporal
relu7_spatial
fc7_temporal
fc7_spatial
layer34_temporal
layer34_spatial
relu6_temporal
relu6_spatial
fc6_temporal
fc6_spatial
------------------------------
pool3D5_temporal
pool3D5_spatial
relu3D5
conv53D
relu5_3_concat
------------------------------
relu5_3_temporal
relu5_3_spatial
conv5_3_temporal
conv5_3_spatial
relu5_2_temporal
relu5_2_spatial
conv5_2_temporal
conv5_2_spatial
relu5_1_temporal
relu5_1_spatial
conv5_1_temporal
conv5_1_spatial
pool4_temporal
pool4_spatial
relu4_3_temporal
relu4_3_spatial
conv4_3_temporal
conv4_3_spatial
relu4_2_temporal
relu4_2_spatial
conv4_2_temporal
conv4_2_spatial
relu4_1_temporal
relu4_1_spatial
conv4_1_temporal
conv4_1_spatial
pool3_temporal
pool3_spatial
relu3_3_temporal
relu3_3_spatial
conv3_3_temporal
conv3_3_spatial
relu3_2_temporal
relu3_2_spatial
conv3_2_temporal
conv3_2_spatial
relu3_1_temporal
relu3_1_spatial
conv3_1_temporal
conv3_1_spatial
pool2_temporal
pool2_spatial
relu2_2_temporal
relu2_2_spatial
conv2_2_temporal
conv2_2_spatial
relu2_1_temporal
relu2_1_spatial
conv2_1_temporal
conv2_1_spatial
pool1_temporal
pool1_spatial
relu1_2_temporal
relu1_2_spatial
conv1_2_temporal
conv1_2_spatial
relu1_1_temporal
relu1_1_spatial
conv1_1_temporal
conv1_1_spatial

As the listing shows, before and after the fusion layers the two networks' layers execute alternately.


Reposted from: https://blog.csdn.net/u013588351/article/details/101872292