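When training YOLOv5, the loss values come out as nan and the evaluation metrics as 0. Training on the CPU works fine; only the GPU run shows the nan and 0 values. This is usually caused by the CUDA version combined with the graphics card.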
GTX 16XX-series cards run into this problem when a newer version of CUDA is used.
AutoAnchor: 6.13 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs\train\exp7
Starting training for 100 epochs...

     Epoch   gpu_mem       box       obj       cls    labels  img_size
      0/99     1.88G       nan       nan       nan        10       640: 100%|██████████| 14/14 [00:35<00:00,  2.52s/it]
D:\19837\anaconda3\envs\pytorch\lib\site-packages\torch\optim\lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 7/7 [00:07<00:00,  1.09s/it]
                 all        106          0          0          0          0          0

     Epoch   gpu_mem       box       obj       cls    labels  img_size
      1/99     1.96G       nan       nan       nan       104       640:   7%|▋         | 1/14 [00:02<00:38,  2.92s/it]
Process finished with exit code -1
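The first visible symptom is the nan in the box/obj/cls columns, so a small helper that scans log lines for it can be handy when comparing a CPU run against a GPU run. This is a minimal sketch that assumes the column layout shown in the log above (Epoch, gpu_mem, box, obj, cls, labels, img_size); `parse_losses` and `has_nan_loss` are hypothetical helpers written for this post, not part of YOLOv5.

```python
import math
import re
from typing import Optional, Tuple

# One loss value as printed in the log: nan, inf, or a plain decimal number.
_NUM = r"(nan|inf|\d+(?:\.\d+)?(?:e[-+]?\d+)?)"

# Assumed progress-line layout (hypothetical, taken from the log above):
# "<epoch>/<last> <gpu_mem> <box> <obj> <cls> <labels> <img_size>:"
_PROGRESS_RE = re.compile(
    r"(\d+)/(\d+)\s+([\d.]+G)\s+"
    + _NUM + r"\s+" + _NUM + r"\s+" + _NUM
    + r"\s+(\d+)\s+(\d+):"
)


def parse_losses(line: str) -> Optional[Tuple[int, float, float, float]]:
    """Return (epoch, box, obj, cls) from a progress line, or None if it doesn't match."""
    m = _PROGRESS_RE.search(line)
    if m is None:
        return None
    epoch = int(m.group(1))
    # Groups 4-6 hold the box, obj and cls losses; float("nan") parses to NaN.
    box, obj, cls = (float(m.group(i)) for i in (4, 5, 6))
    return epoch, box, obj, cls


def has_nan_loss(line: str) -> bool:
    """True if any of the box/obj/cls losses on this progress line is NaN."""
    parsed = parse_losses(line)
    return parsed is not None and any(math.isnan(v) for v in parsed[1:])
```

On the GPU run above, the very first progress line (`0/99 1.88G nan nan nan ...`) already makes `has_nan_loss` return True, so a wrapper script could stop the run after one epoch instead of wasting all 100.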
The fix is to switch to CUDA 10.2. I have packaged the matching CUDA and cuDNN installers (CUDA_10.2.zip) for everyone. Download link: https://www.123pan.com/s/lgZzVv-eTQk3.html (extraction code: AMDZ)
Then install the cu102 build of PyTorch:
pip install torch==1.10.1+cu102 torchvision==0.11.2+cu102 torchaudio==0.10.1 -f https://download.pytorch.org/whl/torch_stable.html
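To confirm that the cu102 build is the one actually being imported (and which GPU it sees), a quick check like the following can help. It is a minimal sketch: `cuda_build_tag` and `is_gtx16xx` are hypothetical helpers, and the GTX 16xx name match is only a heuristic on the device-name string. The PyTorch calls it uses (`torch.__version__`, `torch.version.cuda`, `torch.backends.cudnn.version()`, `torch.cuda.is_available()`, `torch.cuda.get_device_name`) are standard PyTorch APIs.

```python
import re
from typing import Optional


def cuda_build_tag(version: str) -> Optional[str]:
    """Extract the local build tag from a wheel version, e.g. '1.10.1+cu102' -> 'cu102'."""
    match = re.search(r"\+(cu\d+|cpu)$", version)
    return match.group(1) if match else None


def is_gtx16xx(device_name: str) -> bool:
    """Heuristic: True if the device name looks like a GeForce GTX 16xx card (e.g. 1650, 1660 Ti)."""
    return re.search(r"\bGTX\s*16\d{2}\b", device_name, re.IGNORECASE) is not None


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("torch:", torch.__version__, "-> build tag:", cuda_build_tag(torch.__version__))
        # CUDA version this torch build was compiled against (None for CPU-only builds).
        print("CUDA:", torch.version.cuda)
        print("cuDNN:", torch.backends.cudnn.version())
        if torch.cuda.is_available():
            name = torch.cuda.get_device_name(0)
            print("GPU 0:", name, "(GTX 16xx)" if is_gtx16xx(name) else "")
        else:
            print("CUDA is not available to this PyTorch build")
```

After the pip command above, the build tag should read `cu102` and `torch.version.cuda` should report `10.2`; if it still shows a newer CUDA, the old wheel is probably still installed in the active environment.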
Next, go back and run the training again:
AutoAnchor: 6.13 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs\train\exp10
Starting training for 100 epochs...

     Epoch   gpu_mem       box       obj       cls    labels  img_size
      0/99     1.85G    0.1244    0.0515   0.06827        10       640: 100%|██████████| 14/14 [01:59<00:00,  8.53s/it]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 7/7 [00:11<00:00,  1.62s/it]
                 all        106        433    0.00107    0.00842   0.000487   0.000122

     Epoch   gpu_mem       box       obj       cls    labels  img_size
      1/99     1.96G    0.1171   0.06178   0.06603        63       640:  50%|█████     | 7/14 [00:30<00:30,  4.37s/it]
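The box, obj and cls losses are now real numbers instead of nan, and the metrics are no longer all 0, so the problem is solved.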