How to resolve "cudaSuccess = err (0 vs. 8)" error on Paddle v0.8.0b?
I installed PaddlePaddle using the .deb file from https://github.com/baidu/Paddle/releases/download/V0.8.0b1/paddle-gpu-0.8.0b1-Linux.deb.
I have CUDA 8.0 and cuDNN v5.1 installed on a machine with 4x GTX 1080, without the NVIDIA Accelerated Graphics Driver:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
I have set the shell variables:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
CUDA itself works fine, since I have run all of the NVIDIA_CUDA-8.0_Samples and they all "PASS" their tests.
The quick_start demo in Paddle/demo/quick_start also runs smoothly without throwing any error.
But when I try to run the image_classification demo from the Paddle GitHub repo, I get an invalid device function error. Is there a way to fix this?
hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
Full traceback:
~/Paddle/demo/image_classification$ bash train.sh
I1005 14:34:51.929863 10461 Util.cpp:151] commandline: /home/ltan/Paddle/binary/bin/../opt/paddle/bin/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model
I1005 14:34:56.705898 10461 Util.cpp:126] Calling runInitFunctions
I1005 14:34:56.706171 10461 Util.cpp:139] Call runInitFunctions done.
[INFO 2016-10-05 14:34:56,918 layers.py:1620] channels=3 size=3072
[INFO 2016-10-05 14:34:56,919 layers.py:1620] output size for __conv_0__ is 32
[INFO 2016-10-05 14:34:56,920 layers.py:1620] channels=64 size=65536
[INFO 2016-10-05 14:34:56,920 layers.py:1620] output size for __conv_1__ is 32
[INFO 2016-10-05 14:34:56,922 layers.py:1681] output size for __pool_0__ is 16*16
[INFO 2016-10-05 14:34:56,923 layers.py:1620] channels=64 size=16384
[INFO 2016-10-05 14:34:56,923 layers.py:1620] output size for __conv_2__ is 16
[INFO 2016-10-05 14:34:56,924 layers.py:1620] channels=128 size=32768
[INFO 2016-10-05 14:34:56,925 layers.py:1620] output size for __conv_3__ is 16
[INFO 2016-10-05 14:34:56,926 layers.py:1681] output size for __pool_1__ is 8*8
[INFO 2016-10-05 14:34:56,927 layers.py:1620] channels=128 size=8192
[INFO 2016-10-05 14:34:56,927 layers.py:1620] output size for __conv_4__ is 8
[INFO 2016-10-05 14:34:56,928 layers.py:1620] channels=256 size=16384
[INFO 2016-10-05 14:34:56,929 layers.py:1620] output size for __conv_5__ is 8
[INFO 2016-10-05 14:34:56,930 layers.py:1620] channels=256 size=16384
[INFO 2016-10-05 14:34:56,930 layers.py:1620] output size for __conv_6__ is 8
[INFO 2016-10-05 14:34:56,932 layers.py:1681] output size for __pool_2__ is 4*4
[INFO 2016-10-05 14:34:56,932 layers.py:1620] channels=256 size=4096
[INFO 2016-10-05 14:34:56,933 layers.py:1620] output size for __conv_7__ is 4
[INFO 2016-10-05 14:34:56,934 layers.py:1620] channels=512 size=8192
[INFO 2016-10-05 14:34:56,934 layers.py:1620] output size for __conv_8__ is 4
[INFO 2016-10-05 14:34:56,936 layers.py:1620] channels=512 size=8192
[INFO 2016-10-05 14:34:56,936 layers.py:1620] output size for __conv_9__ is 4
[INFO 2016-10-05 14:34:56,938 layers.py:1681] output size for __pool_3__ is 2*2
[INFO 2016-10-05 14:34:56,938 layers.py:1681] output size for __pool_4__ is 1*1
[INFO 2016-10-05 14:34:56,941 networks.py:1125] The input order is [image, label]
[INFO 2016-10-05 14:34:56,941 networks.py:1132] The output order is [__cost_0__]
I1005 14:34:56.948256 10461 Trainer.cpp:170] trainer mode: Normal
F1005 14:34:56.949136 10461 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
@ 0x7fa557316daa (unknown)
@ 0x7fa557316ce4 (unknown)
@ 0x7fa5573166e6 (unknown)
@ 0x7fa557319687 (unknown)
@ 0x78a939 hl_gpu_apply_unary_op<>()
@ 0x7536bf paddle::BaseMatrixT<>::applyUnary<>()
@ 0x7532a9 paddle::BaseMatrixT<>::applyUnary<>()
@ 0x73d82f paddle::BaseMatrixT<>::zero()
@ 0x66d2ae paddle::Parameter::enableType()
@ 0x669acc paddle::parameterInitNN()
@ 0x66bd13 paddle::NeuralNetwork::init()
@ 0x679ed3 paddle::GradientMachine::create()
@ 0x6a6355 paddle::TrainerInternal::init()
@ 0x6a2697 paddle::Trainer::init()
@ 0x53a1f5 main
@ 0x7fa556522f45 (unknown)
@ 0x545ae5 (unknown)
@ (nil) (unknown)
/home/xxx/Paddle/binary/bin/paddle: line 81: 10461 Aborted (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
No data to plot. Exiting!
According to issue #158 on the git repo, this should have been resolved by #170, which added support for the GTX 1080 with CUDA 8.0, yet the error is still thrown when the GPU functions are accessed. (Sorry, I cannot add more than 2 links with low reputation.)
Does anyone know how to fix this, or how to install Paddle so that image_classification can run?
I have also tried compiling and installing from source; the same error is thrown, while the quick_start demo still runs fine.
The problem was due to the architecture flags set in Paddle/cmake/flags.cmake for CUDA 8.0. It is fixed in https://github.com/baidu/Paddle/pull/165/files by adding compute_52, sm_52, compute_60 and sm_60.
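For context, here is a minimal standalone sketch of my own (not code from the Paddle repo or from the PR) that shows why those flags matter: built without a target usable on Pascal it fails on a GTX 1080 with the same error code 8 (invalid device function), and built with the added compute_52/sm_52 and compute_60/sm_60 targets it runs. The file name and build commands in the comments are purely illustrative.

// minimal_check.cu -- hypothetical sketch, not Paddle code.
// Illustrative build commands:
//   fails on a GTX 1080: nvcc -gencode arch=compute_30,code=sm_30 minimal_check.cu -o minimal_check
//   works on a GTX 1080: nvcc -gencode arch=compute_52,code=sm_52 \
//                             -gencode arch=compute_60,code=sm_60 minimal_check.cu -o minimal_check
#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop() {}   // trivial kernel, only there so the binary needs device code

int main() {
    noop<<<1, 1>>>();
    // If the fat binary contains neither SASS usable on the current GPU nor PTX
    // the driver can JIT, the launch fails with cudaErrorInvalidDeviceFunction,
    // which is error code 8 -- the same "(0 vs. 8)" that Paddle's check prints.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error %d: %s\n", (int)err, cudaGetErrorString(err));
        return 1;
    }
    printf("kernel ran OK\n");
    return 0;
}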
I know nothing about Paddle. However, the CUDA error is almost certainly caused by the binary you installed not containing code for the (fairly new) GTX 1080. Either find a version that supports Pascal GPUs, or build your own from source.
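To see what the cards actually report, a small query program (again just a sketch of mine, not part of Paddle) prints each device's compute capability. A GTX 1080 reports 6.1 (Pascal), so the binary must contain sm_6x machine code or PTX the driver can JIT-compile; otherwise every kernel launch fails with "invalid device function".

// query_cc.cu -- hypothetical helper sketch, not Paddle code.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA devices visible\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // A GTX 1080 is Pascal and reports compute capability 6.1.
        printf("GPU %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}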