Guide: Building PyTorch 2.9.0a0 on RISC-V (Debian/RockOS) for CPU-Only Use
Hello Megrez community,
I recently built PyTorch 2.9.0a0+git9a665ca from source on a RISC-V system running a Debian-based RockOS distribution, targeting a CPU-only setup with Python 3.11. This process was complex and time-consuming, but I’ve successfully installed and tested PyTorch, achieving functional tensor operations. Below is a detailed step-by-step guide to help you replicate this on your RISC-V system, along with warnings about the challenges and tips for using AI to troubleshoot errors.
Warning: Complexity and Time Involved
- Complexity: Building PyTorch on RISC-V is not straightforward. It involves cross-compilation (or native compilation), managing dependencies, configuring CMake, and handling Python packaging. You’ll need familiarity with Linux, compilers, and CMake.
- Time: Compilation can take several hours (4–10 hours depending on hardware), especially for cross-compilation. Ensure you have ~10 GiB of disk space and 16 GiB+ of RAM to avoid crashes.
- Potential Issues: Common pitfalls include compiler mismatches, CMake generator conflicts (Ninja vs. Unix Makefiles), and Python module resolution errors. Patience and careful debugging are essential.
Prerequisites
Before starting, ensure you have:
- A Debian-based system (e.g., RockOS or standard Debian).
- At least 10 GiB free disk space (df -h) and 16 GiB RAM (free -h).
- Python 3.11 in a virtual environment (I used pyenv).
- Administrative privileges for installing dependencies.
Step-by-Step Guide
Follow these steps to build and install PyTorch 2.9.0a0+git9a665ca on RISC-V for CPU-only use.
Step 1: Set Up the Environment
- Install Dependencies:
  Install the required libraries and tools:
  sudo apt update
  sudo apt install git cmake ninja-build build-essential libopenblas-dev liblapack-dev libprotobuf-dev protobuf-compiler python3-dev python3-pip
  - libopenblas-dev and liblapack-dev enable optimized matrix operations.
  - cmake (version ≥3.27) is critical for configuration.
- Set Up Python Virtual Environment:
  I used pyenv to manage Python 3.11:
  curl https://pyenv.run | bash
  echo 'export PATH="$HOME/.pyenv/bin:$PATH"' >> ~/.bashrc
  echo 'eval "$(pyenv init --path)"' >> ~/.bashrc
  echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bashrc
  source ~/.bashrc
  pyenv install 3.11.9
  pyenv virtualenv 3.11.9 pytorch-build-env
  pyenv activate pytorch-build-env
  pip install --upgrade pip
  pip install numpy wheel
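Before moving on, it can save a failed configure later to confirm the key tool versions now. This is just a quick sanity check, nothing PyTorch-specific:
cmake --version     # the guide above needs ≥3.27
ninja --version
python --version    # should report 3.11.x from the pytorch-build-env virtualenv
pip show numpy wheel | grep -E '^(Name|Version)'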
Step 2: Install RISC-V Toolchain
For RISC-V cross-compilation, install the RISC-V GNU toolchain:
sudo apt install gcc-riscv64-linux-gnu g++-riscv64-linux-gnu
Verify the compilers:
/usr/bin/riscv64-linux-gnu-gcc --version
/usr/bin/riscv64-linux-gnu-g++ --version
Note: On my RockOS system, the compilers were named riscv64-linux-gnu-gcc and riscv64-linux-gnu-g++ (not riscv64-unknown-linux-gnu-gcc). Check your binary names:
ls /usr/bin/*riscv64*linux-gnu*
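Before committing to a multi-hour build, a quick smoke test of the toolchain can rule out an obvious misconfiguration. This is only a sanity check I’d suggest; the file names are arbitrary:
# Compile a trivial program and confirm the output is a RISC-V binary
echo 'int main(void) { return 0; }' > /tmp/tc-test.c
riscv64-linux-gnu-gcc /tmp/tc-test.c -o /tmp/tc-test
file /tmp/tc-test   # expect something like: ELF 64-bit LSB executable, UCB RISC-V, ...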
Step 3: Clone PyTorch Source
Clone the PyTorch repository at the specific commit I used:
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git checkout 9a665ca
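Because checking out 9a665ca moves the tree to an older commit than the fresh clone, the submodule pins may no longer match. Re-syncing them, as PyTorch’s own source-build instructions also suggest, avoids confusing errors later:
# Re-align the third_party/ submodules with the checked-out commit
git submodule sync
git submodule update --init --recursive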
Step 4: Configure and Build PyTorch
- Create Build Directory:
  mkdir build
  cd build
- Configure CMake:
  Use the following CMake command to configure a CPU-only build for RISC-V:
  cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DPYTHON_EXECUTABLE=$(which python) \
    -DUSE_CUDA=OFF \
    -DUSE_ROCM=OFF \
    -DUSE_NNPACK=OFF \
    -DUSE_QNNPACK=OFF \
    -DUSE_PYTORCH_QNNPACK=OFF \
    -DUSE_CUDNN=OFF \
    -DUSE_FBGEMM=OFF \
    -DUSE_KINETO=OFF \
    -DUSE_NUMPY=ON \
    -DUSE_OPENMP=ON \
    -DUSE_SYSTEM_BLAS=ON \
    -DUSE_SYSTEM_LAPACK=ON \
    -DBUILD_TEST=OFF \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_C_COMPILER=/usr/bin/riscv64-linux-gnu-gcc \
    -DCMAKE_CXX_COMPILER=/usr/bin/riscv64-linux-gnu-g++ \
    -DCMAKE_POLICY_DEFAULT_CMP0126=NEW \
    -DUSE_NCCL=OFF \
    -DBUILD_PYTHON=True
  - Adjust compiler paths if your binaries differ (e.g., /usr/bin/riscv64-linux-gnu-gcc-14).
  - -DUSE_NCCL=OFF prevents unnecessary GPU library cloning.
  - -DBUILD_TEST=OFF skips tests to save time.
  - If you prefer, the same compiler settings can live in a CMake toolchain file; see the sketch at the end of this step.
- Build PyTorch:
  make -j$(nproc)
  Warning: This step is time-consuming (hours). Monitor memory (free -h) and reduce parallelism (e.g., make -j4) if memory is low.
- Verify Build Completion:
  Look for [100%] Built target functorch in the output. Check the artifacts:
  ls lib/
  You should see libtorch.so, libtorch_cpu.so, etc.
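As mentioned under Configure CMake: the compiler settings can also be kept in a CMake toolchain file instead of on the command line. This is only a sketch and not what I used for my build; the file name is made up and the paths are the same assumptions as above, so adapt them:
# Write a minimal toolchain description (hypothetical file name)
cat > riscv64-toolchain.cmake <<'EOF'
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR riscv64)
set(CMAKE_C_COMPILER /usr/bin/riscv64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER /usr/bin/riscv64-linux-gnu-g++)
# If you cross-compile from another host, also point CMAKE_FIND_ROOT_PATH at your RISC-V sysroot.
EOF
# Use it by replacing the two -DCMAKE_*_COMPILER flags with:
#   -DCMAKE_TOOLCHAIN_FILE=$PWD/riscv64-toolchain.cmake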
Step 5: Create and Install the Python Wheel
- Clean CMake Cache (to avoid Ninja conflicts):
  cd ~/pytorch/build
  rm -rf CMakeCache.txt CMakeFiles
- Build the Wheel:
  cd ~/pytorch
  python setup.py bdist_wheel
  If setup.py tries to use Ninja or unwanted settings, use:
  PYTHON_CMAKE_FLAGS="\
    -DCMAKE_BUILD_TYPE=Release \
    -DPYTHON_EXECUTABLE=$(which python) \
    -DUSE_CUDA=OFF \
    -DUSE_ROCM=OFF \
    -DUSE_NNPACK=OFF \
    -DUSE_QNNPACK=OFF \
    -DUSE_PYTORCH_QNNPACK=OFF \
    -DUSE_CUDNN=OFF \
    -DUSE_FBGEMM=OFF \
    -DUSE_KINETO=OFF \
    -DUSE_NUMPY=ON \
    -DUSE_OPENMP=ON \
    -DUSE_SYSTEM_BLAS=ON \
    -DUSE_SYSTEM_LAPACK=ON \
    -DBUILD_TEST=OFF \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_C_COMPILER=/usr/bin/riscv64-linux-gnu-gcc \
    -DCMAKE_CXX_COMPILER=/usr/bin/riscv64-linux-gnu-g++ \
    -DCMAKE_POLICY_DEFAULT_CMP0126=NEW \
    -DUSE_NCCL=OFF \
    -DBUILD_PYTHON=True \
    -G'Unix Makefiles'" python setup.py bdist_wheel
- Install the Wheel:
  ls dist/
  pip install dist/torch-2.9.0a0+git9a665ca-*.whl
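Before copying the wheel to another board, it can be reassuring to peek at what actually got packaged. This is an optional check; the exact file names inside the wheel may differ on your build:
# List the packaged native bits; expect torch/_C*.so and torch/lib/libtorch_cpu.so
python -m zipfile -l dist/torch-2.9.0a0+git9a665ca-*.whl | grep -E '_C\.|libtorch_cpu'
# Confirm what pip ended up installing
pip show torch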
Step 6: Verify Installation
Run tests from outside the pytorch directory to avoid source conflicts:
cd ~
python -c "import torch; print(torch.__version__); print(torch.__file__)"
Expected output:
2.9.0a0+git9a665ca
/home/<user>/.pyenv/versions/pytorch-build-env/lib/python3.11/site-packages/torch/__init__.py
Test tensor operations:
python -c "import torch; print(torch.randn(2, 3)); print(torch.cuda.is_available())"
Expected: A random 2x3 tensor and False for CUDA.
Test BLAS performance:
python -c "import torch; a = torch.randn(1000, 1000); b = torch.randn(1000, 1000); print((a @ b).sum())"
Common Issues and Fixes
- Ninja vs. Unix Makefiles Mismatch:
  If setup.py fails with CMake Error: Error: generator : Ninja, clean the CMake cache (rm -rf build/CMakeCache.txt build/CMakeFiles) and use PYTHON_CMAKE_FLAGS with -G'Unix Makefiles'.
- Compiler Not Found:
  If CMake reports CMAKE_C_COMPILER not set, verify your toolchain:
  ls /usr/bin/*riscv64*linux-gnu*
  Update the -DCMAKE_C_COMPILER and -DCMAKE_CXX_COMPILER paths accordingly.
- Source Directory Conflict:
  If you see ImportError: Failed to load PyTorch C extensions, avoid running Python from ~/pytorch. Use cd ~ first.
- Memory Issues:
  Monitor memory (free -h). If the build crashes, reduce parallelism (make -j4).
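Related to the Memory Issues item above: when the compile is driven by setup.py rather than by make directly, the -j flag isn’t available, but to the best of my knowledge PyTorch’s build scripts honor the MAX_JOBS environment variable, so a sketch for a memory-constrained board would be:
cd ~/pytorch
export MAX_JOBS=4        # assumption: setup.py reads MAX_JOBS to cap parallel compile jobs
python setup.py bdist_wheel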
Tips for Using AI to Troubleshoot
I used an AI assistant (Grok) to debug issues, and it was a lifesaver. Here’s how to leverage AI effectively:
- Provide Full Error Output: Copy-paste the entire error message (e.g., CMake errors, Python tracebacks) into your AI query. This helps pinpoint the issue.
  - Example: Share CMake Error: Error: generator : Ninja with the command you ran.
- Include Context: Mention your system (RISC-V, RockOS), PyTorch version (2.9.0a0+git9a665ca), and whether you’re cross-compiling or building natively.
- Ask for Specific Fixes: Request step-by-step solutions for errors, like “How do I fix a Ninja mismatch in CMake?” or “Why is my compiler not found?”
- Iterate with Follow-Ups: If the AI’s suggestion fails, provide the new error output and ask for clarification. For example, I shared compiler check outputs (ls /usr/bin/*riscv64*) to fix a toolchain issue.
- Verify AI Suggestions: Cross-check AI advice with PyTorch’s official docs or GitHub issues to ensure accuracy.
Final Notes
Building PyTorch on RISC-V is a significant undertaking, but it’s rewarding to get it running. My setup was cross-compiled for RISC-V using riscv64-linux-gnu-gcc/g++ on a RockOS system. If you’re building natively, you may need gcc and g++ instead. The wheel file (dist/torch-2.9.0a0+git9a665ca-*.whl) is portable to other RISC-V systems with Python 3.11 and compatible libraries.
If you hit errors, share them in this thread with:
- Full error output.
- Your CMake command or setup.py invocation.
- Output of ls /usr/bin/*riscv64*linux-gnu* and df -h.
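To make gathering that information easier, you could dump it all into one file and paste it here. This is just a convenience helper; adjust it as you like:
# Collect the basics requested above into a single paste-able file
{
  echo "== toolchain =="; ls /usr/bin/*riscv64*linux-gnu*
  echo "== disk ==";      df -h
  echo "== memory ==";    free -h
  echo "== python ==";    python --version
  echo "== torch ==";     pip show torch 2>/dev/null | head -n 2
} > debug-info.txt 2>&1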
Good luck, and happy coding with PyTorch on RISC-V!