Understanding input dimensions, SoftmaxWithLoss and labels in Caffe

Keywords: SoftmaxWithLoss, labels, Caffe, input. Updated: 2023-10-16

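I am trying to use a network I trained myself, with my own data, from C++. I trained and tested the network on ".jpg" data with an ImageData layer, and then adapted the basic Caffe example "classification.cpp" to pass images through memory one by one. So I need to know the probabilities of 2 classes:
1 - object,
2 - environment.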

My input layer for regular use (outside training) looks like this:

layer {
    name: "data"
    top:  "data"
    top:  "label"
    type: "Input"
    input_param { shape: { dim: 1 dim: 3 dim: 256 dim: 256 }}
}

Output layers:

layer {
    name: "fc6"
    top:  "fc6"
    type: "InnerProduct"
    bottom: "drop5"
    inner_product_param {
        num_output: 2
        weight_filler {
            type: "xavier"
            std: 0.1
        }
    }
}
layer {
    name: "prob"
    top:  "prob"
    type: "SoftmaxWithLoss"
    bottom: "fc6"
    bottom: "label"
}
layer {
    name: "accuracy"
    top:  "accuracy"
    type: "Accuracy"
    bottom: "fc6"
    bottom: "label"
    include {
        phase: TEST
    }
}

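In the test phase the network reached accuracy = 0.93, but now that I want to use it from C++ I cannot figure out some basic concepts, and I get an error when the model is parsed.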

Check failure stack trace:
...
caffe::SoftmaxWithLossLayer<>::Reshape()
caffe::Net<>::Init()
caffe::Net<>::Net()
...
Check failed: outer_num_ * inner_num_ == bottom[1]->count() (1 vs. 196608) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.

OK, 1x3x256x256 = 196608, but why do I need this label count? I have a file "labels.txt", as in the example "classification.cpp":

environment
object

Why is the number of labels != the number of classes? How should I handle SoftmaxWithLoss and the input dimensions?

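Your input layer does not define a shape for the label. When an Input layer is given a single shape, Caffe applies that shape to every top, so "label" also became 1x3x256x256 = 196608 values, while SoftmaxWithLoss expects one integer label (0 or 1 here) per prediction. A label in this sense is the class index of an image, not the list of class names in "labels.txt". Assuming each image has only one label, give the label its own shape: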

layer {
  name: "data"
  top:  "data"
  top:  "label"
  type: "Input"
  input_param { shape: { dim: 1 dim: 3 dim: 256 dim: 256 }
                shape: { dim: 1 dim: 1 }}  # one label per image
}
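
To make the arithmetic behind the failed check concrete, here is a minimal standalone sketch of the comparison quoted in the error message, assuming the softmax axis is 1. It does not use Caffe. The helper names Count and LabelsMatchPredictions are hypothetical and only imitate the "outer_num_ * inner_num_ == bottom[1]->count()" condition.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: product of the dimensions in [begin, end).
// An empty range yields 1, so a (N, C) blob has an inner count of 1.
std::size_t Count(const std::vector<int>& shape,
                  std::size_t begin, std::size_t end) {
  std::size_t n = 1;
  for (std::size_t i = begin; i < end; ++i) {
    n *= static_cast<std::size_t>(shape[i]);
  }
  return n;
}

// Hypothetical helper imitating the condition from the error message:
// the number of labels must equal outer_num * inner_num, where
// outer_num covers the dimensions before the softmax axis and
// inner_num covers the dimensions after it.
bool LabelsMatchPredictions(const std::vector<int>& prediction_shape,
                            const std::vector<int>& label_shape,
                            std::size_t softmax_axis = 1) {
  const std::size_t outer_num = Count(prediction_shape, 0, softmax_axis);
  const std::size_t inner_num =
      Count(prediction_shape, softmax_axis + 1, prediction_shape.size());
  const std::size_t label_count = Count(label_shape, 0, label_shape.size());
  return outer_num * inner_num == label_count;
}
```

For the shapes in this question, "fc6" is (1, 2), so there is 1 prediction. LabelsMatchPredictions({1, 2}, {1, 3, 256, 256}) is false because the label blob holds 196608 values, which is the "1 vs. 196608" in the error. LabelsMatchPredictions({1, 2}, {1, 1}) is true, which is what the corrected input layer provides.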