Building tensorflow-serving from source with bazel fails with error: C++ compilation of rule '@org_tensorflow//…' failed (Exit 4)

Updated: 2023-10-16

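I am trying to build tensorflow serving from https://github.com/tensorflow/serving on CentOS 7.3 using bazel. My gcc version is 4.8.5 and my bazel version is 0.10.1. I am sure I followed the installation instructions, and I have set up all the required prerequisites. Every time I run the command:

bazel build -c opt tensorflow_serving/model_servers/...

it runs for about 10 to 15 minutes and then stops with this error: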

ERROR: /root/.cache/bazel/_bazel_root/2d16d9349bff8cf3d8fc4a53d2a23056/external/org_tensorflow/tensorflow/core/kernels/BUILD:3120:1: C++ compilation of rule '@org_tensorflow//tensorflow/core/kernels:conv_ops' failed (Exit 4)
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
INFO: Elapsed time: 881.803s, Critical Path: 37.21s
FAILED: Build did NOT complete successfully

If I try another command:

bazel build -c opt tensorflow_serving/model_servers/...

to build only the serving subdirectory, the error looks like this:

ERROR: /home/serving/tensorflow_serving/batching/BUILD:122:1: C++ compilation of rule '//tensorflow_serving/batching:batching_util' failed (Exit 4)
tensorflow_serving/batching/batching_util.cc: In function 'std::map<std::basic_string<char>, std::vector<int> > tensorflow::serving::CalculateMaxDimSizes(const std::vector<std::vector<std::pair<std::basic_string<char>, tensorflow::Tensor> > >&)':
tensorflow_serving/batching/batching_util.cc:165:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (int i = 0; i < batch.size(); ++i) {
                                  ^
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
INFO: Elapsed time: 1486.641s, Critical Path: 211.94s
FAILED: Build did NOT complete successfully

Please forgive my poor English and help me... I have been stuck on this problem for a long time.

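I found a similar issue on TensorFlow's GitHub (349); most likely you are running out of memory. They suggest adding the parameters "--jobs 1 --local_resources 2048,.5,1.0" so that Bazel spawns no more than one compiler process at a time and limits its use of system resources.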
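For reference, here is a minimal sketch of how those flags could be added to the build command from the question. The variable names (BAZEL_FLAGS, TARGET, BUILD_CMD) are only illustrative and are not part of bazel or tensorflow serving. The three-value form of --local_resources (RAM in MB, CPU cores, I/O) is the one quoted above and matches old Bazel releases such as 0.10.1; newer releases may not accept it.

```shell
#!/bin/sh
# Flags quoted in the answer: one job at a time, and tell Bazel to assume
# 2048 MB of RAM, 0.5 CPU cores and 1.0 units of I/O capacity.
BAZEL_FLAGS="--jobs 1 --local_resources 2048,.5,1.0"

# Target pattern taken from the question.
TARGET="tensorflow_serving/model_servers/..."

# Hypothetical helper variable holding the full command line.
BUILD_CMD="bazel build -c opt $BAZEL_FLAGS $TARGET"
echo "$BUILD_CMD"

# Only run the build when bazel is installed and the current directory
# looks like the serving checkout; otherwise just print the command.
if command -v bazel >/dev/null 2>&1 && [ -d tensorflow_serving ]; then
    # Left unquoted on purpose so the string splits into separate arguments.
    $BUILD_CMD
fi
```

These flags only reduce how much Bazel runs in parallel; a single cc1plus process can still need more memory than the machine has.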

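I figured out what the problem is: the machine does not have enough memory. Adding the parameters --local_resources and --jobs does not actually help. I then tried building on a Google Cloud instance with 4 CPUs and 25G of memory, and found that the build can need 8G of memory at its peak. So 1.8G of memory is not enough.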
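A quick way to check a machine before starting the build is to compare its total memory against that 8G peak. This is only a sketch: it assumes Linux (it reads /proc/meminfo), the 8G threshold is just the figure observed above and not an official requirement, and the names REQUIRED_KB, has_enough_memory and total_memory_kb are made up for illustration.

```shell
#!/bin/sh
# Peak memory observed during the build in the answer above (8G), in kB.
REQUIRED_KB=$((8 * 1024 * 1024))

# Succeeds when the first argument (kB available) is at least the
# second argument (kB required).
has_enough_memory() {
    [ "$1" -ge "$2" ]
}

# Print total physical memory in kB as reported by the kernel (Linux only).
total_memory_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

total_kb=$(total_memory_kb)
if has_enough_memory "$total_kb" "$REQUIRED_KB"; then
    echo "MemTotal ${total_kb} kB: should cover the observed 8G peak"
else
    echo "MemTotal ${total_kb} kB: below the observed 8G peak, cc1plus may get killed"
fi
```

On the 1.8G machine from the question this takes the second branch; on the 25G cloud instance it takes the first.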