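After a long wait, my wife finally approved the budget and I bought an ASUS GPU laptop (RTX 2070). It ships with a genuine Windows 10 preinstalled, but I am more used to Linux for daily work, so this post records how I installed Ubuntu 18.04 on it.
1. Download the Ubuntu 18.04 ISO image
Get it from the official site: https://www.ubuntu.com/download/desktop
2. Write the ISO image to a USB drive
I recommend balenaEtcher, which runs on Linux, Windows, and macOS.
3. Prepare disk space for Ubuntu
Open Disk Management, select the C: drive (by default it is the only drive), right-click it and choose Shrink Volume, then enter how much space to split off from C:. I entered 512000 MB, i.e. 500 GB, for Ubuntu. Do not format the freed space.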
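The size entered when shrinking the volume is in MB, so it is worth double-checking the arithmetic first. A minimal Python sketch, assuming 1 GB is counted as 1024 MB as in the figure above (the function name is only illustrative):

```python
def gb_to_mb(size_gb: int) -> int:
    """Convert a size in GB to MB, counting 1 GB as 1024 MB."""
    return size_gb * 1024


# 500 GB for Ubuntu, the value used in this post
print(gb_to_mb(500))  # 512000
```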
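4. Reboot and press F2 to change the BIOS settings
4.1 Set Secure Boot to Disabled
Otherwise you get a black screen.
4.2 Change the boot order so the USB drive comes before Windows 10
4.3 Change SATA Mode to AHCI
Otherwise the Ubuntu installer cannot find the disk when partitioning. However, switching to AHCI stops Windows 10 from booting, and I have not found a better solution yet.
5. Install Ubuntu
If you see "No EFI System Partition was found. This system will likely….. Go back ….", create a new partition for EFI; 100 MB is enough.
6. Boot into Ubuntu and install the GPU driver
6.1 Preparation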
sudo apt-get install vim-gtk openssh-server
sudo vim /etc/default/grub
GRUB_GFXMODE=1920x1080
sudo update-grub
sudo apt-get update
sudo apt-get upgrade
6.2 Disable the nouveau driver
sudo vim /etc/modprobe.d/blacklist.conf #append the following lines at the end:
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist nvidiafb
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
sudo update-initramfs -u
reboot
lsmod | grep nouveau #no output means nouveau has been blocked
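The same check can be scripted. The sketch below is only an illustration: it matches the module name exactly in the first column of the lsmod output, which is a little stricter than a plain grep. The function name and the sample text are made up for the example.

```python
def module_loaded(lsmod_output: str, module: str) -> bool:
    """Return True if `module` appears in the first column of lsmod output."""
    for line in lsmod_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0] == module:
            return True
    return False


# Made-up sample text, not real output from this laptop
sample = "Module Size Used by\nsome_other_module 1 0\n"
print(module_loaded(sample, "nouveau"))  # False
```

On a real machine the text could come from subprocess.check_output(["lsmod"], universal_newlines=True).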
6.3 Install the RTX 2070 graphics driver
sudo apt-get remove --purge 'nvidia*' #remove any existing driver version before installing
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev #dependencies
sudo add-apt-repository ppa:graphics-drivers/ppa #add the Graphics Drivers PPA
sudo apt-get update
ubuntu-drivers devices #look for a suitable driver version
sudo apt-get install nvidia-driver-440 #install the 440 driver
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 440.82
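#The 440 driver can also be downloaded from the NVIDIA site https://www.geforce.cn/drivers by picking the driver for the RTX 2070; the downloaded file is a .run file
sudo sh ./NVIDIA-Linux-x86_64-440.82.run --no-opengl-files #just press Enter through the prompts
reboot #restart
nvidia-smi #check that the graphics driver works
Open the "Details" section of System Settings and check that the graphics card shown is the one you just installed the driver for.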
6.4 Install CUDA 10.0
Choose a CUDA version:
Distribution | Kernel* | GCC | GLIBC | ICC | PGI | XLC | CLANG |
---|---|---|---|---|---|---|---|
x86_64 | |||||||
RHEL 8.1 | 4.18 | 8.2.1 | 2.28 | ||||
RHEL 7.7 | 3.10 | 4.8.5 | 2.17 | 19.0 | 18.x, 19.x | NO | 8.0.0 |
RHEL 6.10 | 2.6.32 | 4.4.7 | 2.12 | ||||
CentOS 7.7 | 3.10 | 4.8.5 | 2.17 | ||||
CentOS 6.10 | 2.6.32 | 4.4.7 | 2.12 | ||||
Fedora 29 | 4.16 | 8.0.1 | 2.27 | ||||
OpenSUSE Leap 15.1 | 4.15.0 | 7.3.1 | 2.26 | ||||
SLES 15.1 | 4.12.14 | 7.2.1 | 2.26 | ||||
SLES 12.4 | 4.12.14 | 4.8.5 | 2.22 | ||||
Ubuntu 18.04.3 (**) | 4.15.0 | 7.3.0 | 2.27 | ||||
Ubuntu 16.04.6 (**) | 4.4 | 5.4.0 | 2.23 |
Distribution | Kernel* | Default GCC | GLIBC | GCC | ICC | PGI | XLC | CLANG | Arm C/C++ |
---|---|---|---|---|---|---|---|---|---|
x86_64 | |||||||||
RHEL 8.1 | 4.18 | 8.3.1 | 2.28 | 9.x | 19.1 | 19.x, 20.x | NO | 9.0.0 | NO |
CentOS 8.1 | 4.18 | 8.2.1 | 2.28 | ||||||
RHEL 7.7 | 3.10 | 4.8.5 | 2.17 | ||||||
CentOS 7.7 | 3.10 | 4.8.5 | 2.17 | ||||||
OpenSUSE Leap 15.1 | 4.15.0 | 7.3.1 | 2.26 | ||||||
SUSE SLES 15.1 | 4.12.14 | 7.2.1 | 2.26 | ||||||
Ubuntu 18.04.4 (**) | 4.15.0 | 7.4.0 | 2.27 | ||||||
Ubuntu 16.04.6 (**) | 4.4 | 5.4.0 | 2.23 |
As the tables show, on Ubuntu 18.04 CUDA 11 needs GCC 9.x, CUDA 10.0/10.2 needs GCC 7.3, and CUDA 10.1 needs GCC 7.4. My fresh Ubuntu 18.04 install actually ships GCC 7.5 by default, so CUDA 10 is the one to install.
CUDA Toolkit | Linux x86_64 Driver Version | Windows x86_64 Driver Version |
---|---|---|
CUDA 10.2.89 | >= 440.33 | >= 441.22 |
CUDA 10.1 (10.1.105 general release, and updates) | >= 418.39 | >= 418.96 |
CUDA 10.0.130 | >= 410.48 | >= 411.31 |
CUDA 9.2 (9.2.148 Update 1) | >= 396.37 | >= 398.26 |
CUDA 9.2 (9.2.88) | >= 396.26 | >= 397.44 |
CUDA 9.1 (9.1.85) | >= 390.46 | >= 391.29 |
CUDA 9.0 (9.0.76) | >= 384.81 | >= 385.54 |
CUDA 8.0 (8.0.61 GA2) | >= 375.26 | >= 376.51 |
CUDA 8.0 (8.0.44) | >= 367.48 | >= 369.30 |
CUDA 7.5 (7.5.16) | >= 352.31 | >= 353.66 |
CUDA 7.0 (7.0.28) | >= 346.46 | >= 347.62 |
cat /proc/driver/nvidia/version shows that the driver we installed is 440.82, which satisfies the driver requirement for CUDA 10.x.
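The comparison against the table can also be done in code. This is a rough sketch rather than an official tool: the minimum versions are copied from the CUDA 10.x rows of the table above, and the function names are hypothetical.

```python
import re

# Minimum Linux x86_64 driver versions, copied from the table above
MIN_LINUX_DRIVER = {
    "10.2": (440, 33),
    "10.1": (418, 39),
    "10.0": (410, 48),
}


def parse_driver_version(proc_text: str):
    """Extract (major, minor) from the NVRM line of /proc/driver/nvidia/version."""
    match = re.search(r"Kernel Module\s+(\d+)\.(\d+)", proc_text)
    if match is None:
        raise ValueError("no driver version found")
    return int(match.group(1)), int(match.group(2))


def supports_cuda(proc_text: str, cuda: str) -> bool:
    """True if the installed driver meets the minimum listed for `cuda`."""
    return parse_driver_version(proc_text) >= MIN_LINUX_DRIVER[cuda]


# The NVRM line shown earlier in this post
line = "NVRM version: NVIDIA UNIX x86_64 Kernel Module 440.82"
print(parse_driver_version(line))  # (440, 82)
print([v for v in MIN_LINUX_DRIVER if supports_cuda(line, v)])  # ['10.2', '10.1', '10.0']
```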
GPU
Version | Python version | Compiler | Build tools | cuDNN | CUDA |
---|---|---|---|---|---|
tensorflow-2.1.0 | 2.7, 3.5-3.7 | GCC 7.3.1 | Bazel 0.27.1 | 7.6 | 10.1 |
tensorflow-2.0.0 | 2.7, 3.3-3.7 | GCC 7.3.1 | Bazel 0.26.1 | 7.4 | 10.0 |
tensorflow_gpu-1.14.0 | 2.7, 3.3-3.7 | GCC 4.8 | Bazel 0.24.1 | 7.4 | 10.0 |
tensorflow_gpu-1.13.1 | 2.7, 3.3-3.7 | GCC 4.8 | Bazel 0.19.2 | 7.4 | 10.0 |
tensorflow_gpu-1.12.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.15.0 | 7 | 9 |
pip3 install tensorflow-gpu installed tensorflow-gpu-1.14.0 by default, and PyTorch 1.2.0 and later can generally use CUDA 10.0 as well, so to be on the safe side I went with CUDA 10.0 + cuDNN 7.4.
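To see at a glance which TensorFlow builds line up with a given CUDA/cuDNN pair, the table can be turned into a small lookup. The rows below are copied from the GPU table above; the helper name is just for illustration.

```python
# (package, cuDNN, CUDA) rows copied from the GPU table above
TF_GPU_BUILDS = [
    ("tensorflow-2.1.0", "7.6", "10.1"),
    ("tensorflow-2.0.0", "7.4", "10.0"),
    ("tensorflow_gpu-1.14.0", "7.4", "10.0"),
    ("tensorflow_gpu-1.13.1", "7.4", "10.0"),
    ("tensorflow_gpu-1.12.0", "7", "9"),
]


def builds_for(cuda: str, cudnn: str):
    """List the packages whose table row matches the given CUDA and cuDNN."""
    return [name for name, dnn, cu in TF_GPU_BUILDS if cu == cuda and dnn == cudnn]


print(builds_for("10.0", "7.4"))
# ['tensorflow-2.0.0', 'tensorflow_gpu-1.14.0', 'tensorflow_gpu-1.13.1']
```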
Install CUDA 10.0:
Download cuda-10.0 from https://developer.nvidia.com/cuda-10.0-download-archive , choosing linux + x86_64 + ubuntu + 18.04 + runfile(local).
#Download and install CUDA 10.0 (install it manually; do not use sudo apt install nvidia-cuda-toolkit, that version is very old)
wget https://developer.download.nvidia.cn/compute/cuda/10.0/secure/Prod/local_installers/cuda_10.0.130_410.48_linux.run
chmod +x cuda_10.0.130_410.48_linux.run
sudo sh cuda_10.0.130_410.48_linux.run
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n #the graphics driver was already installed above, do not install it again
Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-10.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/work ]:
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
Installing the CUDA Samples in /home/work ...
Copying samples to /home/work/NVIDIA_CUDA-10.0_Samples now...
Finished copying samples.
#Download and install CUDA 10.0 patch 1
wget https://developer.download.nvidia.cn/compute/cuda/10.0/Prod/patches/1/cuda_10.0.130.1_linux.run
chmod +x cuda_10.0.130.1_linux.run
sudo sh cuda_10.0.130.1_linux.run
#Check the CUDA version
cd /usr/local/cuda-10.0/bin
./nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
#Add environment variables
vim ~/.bashrc
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda-10.0
source ~/.bashrc
#Test CUDA
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
sudo make
sudo ./deviceQuery
If it prints PASS, CUDA was installed successfully.
At this point, also try nvidia-smi again to check that the GPU driver still works.
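The ${PATH:+:${PATH}} form in the export lines above may look odd at first. It appends ":" plus the old value only when the variable is already set and non-empty, so no stray trailing colon is left behind when the variable was empty. A small Python imitation of that behaviour, for illustration only (the function name is made up):

```python
def prepend_dir(new_dir: str, current: str) -> str:
    """Mimic new_dir${VAR:+:${VAR}}: add ':' + current only if current is non-empty."""
    return new_dir + (":" + current if current else "")


# Variable empty or unset: no trailing colon
print(prepend_dir("/usr/local/cuda-10.0/lib64", ""))
# /usr/local/cuda-10.0/lib64

# Variable already set: the old value is kept after the new directory
print(prepend_dir("/usr/local/cuda-10.0/bin", "/usr/bin:/bin"))
# /usr/local/cuda-10.0/bin:/usr/bin:/bin
```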
6.5 Install cuDNN 7.4
https://developer.nvidia.com/rdp/cudnn-archive
Register an account and download the matching cuDNN version. I chose cuDNN Library for Linux under Download cuDNN v7.4.2 (Dec 14, 2018), for CUDA 10.0.
#Download cuDNN
wget https://developer.download.nvidia.cn/compute/machine-learning/cudnn/secure/v7.4.2/prod/10.0_20181213/cudnn-10.0-linux-x64-v7.4.2.24.tgz
#The extracted folder is named cuda; copy its files into the cuda-10.0 directory under /usr/local
tar zxvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
# Copy the cuDNN header files
sudo cp ./cuda/include/* /usr/local/cuda-10.0/include/
# Copy the cuDNN libraries
sudo cp ./cuda/lib64/* /usr/local/cuda-10.0/lib64/
# Add execute permission
sudo chmod +x /usr/local/cuda-10.0/include/cudnn.h
sudo chmod +x /usr/local/cuda-10.0/lib64/libcudnn*
#Check that cuDNN is installed
cat /usr/local/cuda-10.0/include/cudnn.h | grep CUDNN_MAJOR -A 2
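The grep above prints the CUDNN_MAJOR line and the two lines after it. A rough Python equivalent that assembles those into one version string is sketched below. It assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integer macros; the sample text is illustrative, not copied from the real file.

```python
import re


def cudnn_version(header_text: str) -> str:
    """Build 'major.minor.patch' from the #define lines in cudnn.h."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + macro + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError(macro + " not found")
        parts.append(match.group(1))
    return ".".join(parts)


# Illustrative sample matching the v7.4.2 download used in this post
sample = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
"""
print(cudnn_version(sample))  # 7.4.2
```

On the real system, pass in open("/usr/local/cuda-10.0/include/cudnn.h").read() instead of the sample.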
7. Install pytorch/tensorflow-gpu and test that CUDA works
sudo apt install python3-pip
pip3 install torch tensorflow-gpu
vim test_gpu.py
import torch
print(torch.cuda.is_available())
device = torch.device("cuda:0")
print(device)
print(torch.cuda.get_device_name(0))
print(torch.rand(3,3).cuda())
print(torch.rand(5,5).cuda() + torch.rand(5,5).cuda())
import tensorflow as tf
print(tf.test.is_gpu_available())
python3 test_gpu.py
True
cuda:0
GeForce RTX 2070
tensor([[0.9530, 0.4746, 0.9819],
[0.7192, 0.9427, 0.6768],
[0.8594, 0.9490, 0.6551]], device='cuda:0')
True
nvidia-smi -l #watch which processes are using the GPU
yan 20.6.13 0:55
Reference:
Dual-boot installation of Ubantu18.04 + RTX2070 + CUDA10.1 + Cudnn7.5.1+anaconda3+tensorflow-gpu