Guide: Building PyTorch 2.9.0a0 on RISC-V (Debian/RockOS) for CPU-Only Use

Hello Megrez community,

I recently built PyTorch 2.9.0a0+git9a665ca from source on a RISC-V system running a Debian-based RockOS distribution, targeting a CPU-only setup with Python 3.11. The process was complex and time-consuming, but I successfully installed PyTorch and confirmed that tensor operations work. Below is a detailed step-by-step guide to help you replicate this on your RISC-V system, along with warnings about the challenges and tips for using AI to troubleshoot errors.

Warning: Complexity and Time Involved

  • Complexity: Building PyTorch on RISC-V is not straightforward. It involves cross-compilation (or native compilation), managing dependencies, configuring CMake, and handling Python packaging. You’ll need familiarity with Linux, compilers, and CMake.

  • Time: Compilation can take several hours (4–10 hours depending on hardware), especially when building natively on a RISC-V board. Make sure you have at least 10 GiB of free disk space and 16 GiB or more of RAM to avoid out-of-memory crashes.

  • Potential Issues: Common pitfalls include compiler mismatches, CMake generator conflicts (Ninja vs. Unix Makefiles), and Python module resolution errors. Patience and careful debugging are essential.

Prerequisites

Before starting, ensure you have:

  • A Debian-based system (e.g., RockOS or standard Debian).

  • At least 10 GiB free disk space (df -h) and 16 GiB RAM (free -h).

  • Python 3.11 in a virtual environment (I used pyenv).

  • Administrative privileges for installing dependencies.

Step-by-Step Guide

Follow these steps to build and install PyTorch 2.9.0a0+git9a665ca on RISC-V for CPU-only use.

Step 1: Set Up the Environment

  1. Install Dependencies:
    Install required libraries and tools:

    sudo apt update
    sudo apt install git cmake ninja-build build-essential libopenblas-dev liblapack-dev libprotobuf-dev protobuf-compiler python3-dev python3-pip
    
    
    • libopenblas-dev and liblapack-dev enable optimized matrix operations.

    • cmake must be version 3.27 or newer. Check with cmake --version, since some distribution releases ship an older one.

  2. Set Up Python Virtual Environment:
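    I used pyenv to manage Python 3.11. pyenv compiles Python from source, so first install the packages from pyenv’s suggested build environment (on Debian these include libssl-dev, zlib1g-dev, libbz2-dev, libreadline-dev, libsqlite3-dev, libffi-dev and liblzma-dev). The shell setup uses pyenv init -, which defines the pyenv shell function that pyenv activate relies on; pyenv init --path alone only adjusts PATH:

    curl https://pyenv.run | bash
    echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
    echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
    echo 'eval "$(pyenv init -)"' >> ~/.bashrc
    echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bashrc
    source ~/.bashrc
    pyenv install 3.11.9
    pyenv virtualenv 3.11.9 pytorch-build-env
    pyenv activate pytorch-build-env
    pip install --upgrade pip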
    pip install numpy wheel
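
Before the long build, a quick preflight check can catch a wrong interpreter, missing Python packages or an outdated CMake. This is a minimal sketch under stated assumptions: the module list (numpy, wheel, yaml, typing_extensions) is illustrative rather than authoritative (PyTorch’s requirements.txt is the real list), cmake is assumed to be on PATH, and the helper names are hypothetical.

```python
"""Preflight check for the PyTorch build environment (a sketch).

Assumptions: REQUIRED_MODULES is illustrative, not PyTorch's official list;
cmake is expected on PATH; 3.27 is the minimum CMake version from this guide.
"""
import importlib.util
import re
import shutil
import subprocess
import sys

REQUIRED_MODULES = ["numpy", "wheel", "yaml", "typing_extensions"]  # assumption
MIN_CMAKE = (3, 27)


def parse_cmake_version(text):
    """Extract (major, minor, patch) from `cmake --version` output, or None."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def main():
    print("Python:", sys.version.split()[0])
    if sys.version_info[:2] != (3, 11):
        print("  warning: this guide was tested with Python 3.11")
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
    cmake = shutil.which("cmake")
    if cmake is None:
        print("cmake not found on PATH")
        return
    out = subprocess.run([cmake, "--version"], capture_output=True, text=True).stdout
    version = parse_cmake_version(out)
    ok = version is not None and version[:2] >= MIN_CMAKE
    print("CMake:", version, "OK" if ok else "too old (need >= 3.27)")


if __name__ == "__main__":
    main()
```

Run it inside the activated pytorch-build-env so it inspects the same interpreter the build will use.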
    
    

Step 2: Install RISC-V Toolchain

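Check which compiler you will use. On a native RISC-V Debian system, the regular gcc/g++ packages (pulled in by build-essential) already provide riscv64-linux-gnu-gcc and riscv64-linux-gnu-g++. The separate cross toolchain below is only needed when compiling on a non-RISC-V host: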

sudo apt install gcc-riscv64-linux-gnu g++-riscv64-linux-gnu

Verify the compilers:

/usr/bin/riscv64-linux-gnu-gcc --version
/usr/bin/riscv64-linux-gnu-g++ --version

Note: On my RockOS system, the compilers were named riscv64-linux-gnu-gcc and riscv64-linux-gnu-g++ (not riscv64-unknown-linux-gnu-gcc). Check your binary names:

ls /usr/bin/*riscv64*linux-gnu*
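A scripted version of that check is sketched below: it does the same as the ls command and groups the matches into C and C++ compilers. It assumes the binaries live in /usr/bin and follow the riscv64-[unknown-]linux-gnu-gcc[-N] naming pattern; the function name is hypothetical.

```python
"""Locate riscv64 GCC/G++ binaries (a sketch of `ls /usr/bin/*riscv64*linux-gnu*`).

Assumption: names look like riscv64-linux-gnu-gcc, riscv64-linux-gnu-g++-14
or riscv64-unknown-linux-gnu-gcc; helper tools such as gcc-ar are ignored.
"""
import re
from pathlib import Path

COMPILER_RE = re.compile(r"^riscv64-(?:unknown-)?linux-gnu-(gcc|g\+\+)(?:-(\d+))?$")


def find_riscv_compilers(bindir="/usr/bin"):
    """Return {'gcc': [...], 'g++': [...]} with sorted full paths."""
    found = {"gcc": [], "g++": []}
    directory = Path(bindir)
    if not directory.is_dir():
        return found
    for entry in directory.iterdir():
        match = COMPILER_RE.match(entry.name)
        if match:
            found[match.group(1)].append(str(entry))
    for paths in found.values():
        paths.sort()
    return found


if __name__ == "__main__":
    for kind, paths in find_riscv_compilers().items():
        print(f"{kind}: {paths or 'none found'}")
```

Use the printed paths for -DCMAKE_C_COMPILER and -DCMAKE_CXX_COMPILER in Step 4.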

Step 3: Clone PyTorch Source

Clone the PyTorch repository at the specific commit I used:

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
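git checkout 9a665ca
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt

The clone’s --recursive flag checks out the submodule versions recorded on the default branch, so after switching to 9a665ca the submodules must be synced and updated to match that commit. requirements.txt lists the Python packages the build expects (such as pyyaml and typing_extensions); if one of them has no riscv64 wheel and fails to build, install the remaining ones individually.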
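To confirm the submodules really match the checked-out commit, you can parse the output of git submodule status. This sketch relies only on the prefix characters Git documents for that command ('-' not initialized, '+' checked-out commit differs from the recorded one, 'U' merge conflicts); run it from the pytorch directory. The function name is hypothetical.

```python
"""Report submodules that are not at the commit the superproject records (a sketch)."""
import subprocess

PREFIX_MEANING = {"-": "not initialized", "+": "wrong commit", "U": "merge conflict"}


def problem_submodules(status_output):
    """Return (path, problem) pairs parsed from `git submodule status` output."""
    problems = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        prefix = line[0]
        if prefix in PREFIX_MEANING:
            # After the prefix come the SHA-1 and the submodule path.
            fields = line[1:].split()
            problems.append((fields[1], PREFIX_MEANING[prefix]))
    return problems


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["git", "submodule", "status", "--recursive"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as err:
        print("could not run git submodule status:", err)
    else:
        issues = problem_submodules(result.stdout)
        for path, problem in issues:
            print(f"{path}: {problem}")
        print("all submodules OK" if not issues else
              "run: git submodule sync && git submodule update --init --recursive")
```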

Step 4: Configure and Build PyTorch

  1. Create Build Directory:

    mkdir build
    cd build
    
    
  2. Configure CMake:
    Use the following CMake command to configure a CPU-only build for RISC-V:

    cmake .. \
        -DCMAKE_BUILD_TYPE=Release \
        -DPYTHON_EXECUTABLE=$(which python) \
        -DUSE_CUDA=OFF \
        -DUSE_ROCM=OFF \
        -DUSE_NNPACK=OFF \
        -DUSE_QNNPACK=OFF \
        -DUSE_PYTORCH_QNNPACK=OFF \
        -DUSE_CUDNN=OFF \
        -DUSE_FBGEMM=OFF \
        -DUSE_KINETO=OFF \
        -DUSE_NUMPY=ON \
        -DUSE_OPENMP=ON \
        -DUSE_SYSTEM_BLAS=ON \
        -DUSE_SYSTEM_LAPACK=ON \
        -DBUILD_TEST=OFF \
        -DBUILD_SHARED_LIBS=ON \
        -DCMAKE_C_COMPILER=/usr/bin/riscv64-linux-gnu-gcc \
        -DCMAKE_CXX_COMPILER=/usr/bin/riscv64-linux-gnu-g++ \
        -DCMAKE_POLICY_DEFAULT_CMP0126=NEW \
        -DUSE_NCCL=OFF \
        -DBUILD_PYTHON=True
    
    
    • Adjust compiler paths if your binaries differ (e.g., /usr/bin/riscv64-linux-gnu-gcc-14).

    • -DUSE_NCCL=OFF skips NCCL, NVIDIA’s multi-GPU communication library, which a CPU-only build does not need.

    • -DBUILD_TEST=OFF skips tests to save time.

  3. Build PyTorch:

    make -j$(nproc)
    
    

    Warning: This step is time-consuming (hours). Monitor memory (free -h) and reduce parallelism (e.g., make -j4) if memory is low.

  4. Verify Build Completion:
    Look for [100%] Built target functorch in the output. Check artifacts:

    ls lib/
    
    

    You should see libtorch.so, libtorch_cpu.so, etc.
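A scripted version of the ls lib/ check is sketched below. libtorch.so and libtorch_cpu.so come from the output described above; adding libc10.so is an assumption on my part (it is the core c10 library that shared builds normally produce), so adjust the list to what your build shows. The function name is hypothetical.

```python
"""Check that the expected shared libraries exist after `make` (a sketch)."""
import sys
from pathlib import Path

# libtorch.so and libtorch_cpu.so are from the guide; libc10.so is an assumption.
EXPECTED_LIBS = ["libtorch.so", "libtorch_cpu.so", "libc10.so"]


def missing_libs(lib_dir, expected=EXPECTED_LIBS):
    """Return the expected library names that are not present in lib_dir."""
    directory = Path(lib_dir)
    return [name for name in expected if not (directory / name).exists()]


if __name__ == "__main__":
    # Defaults to ./lib, matching `ls lib/` run from the build directory.
    lib_dir = sys.argv[1] if len(sys.argv) > 1 else "lib"
    missing = missing_libs(lib_dir)
    print("missing:", missing if missing else "none")
```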

Step 5: Create and Install the Python Wheel

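  1. Clean the CMake Cache:
    The manual configure in Step 4 used CMake’s default Unix Makefiles generator, while setup.py uses Ninja when ninja is installed (ninja-build was installed in Step 1). Removing the cache lets setup.py reconfigure without a generator mismatch: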

    cd ~/pytorch/build
    rm -rf CMakeCache.txt CMakeFiles
    
    
  2. Build the Wheel:

    cd ~/pytorch
    python setup.py bdist_wheel
    
    

    If setup.py still tries to use Ninja or ignores your settings, pass the options explicitly and force the Unix Makefiles generator:

    PYTHON_CMAKE_FLAGS="\
    -DCMAKE_BUILD_TYPE=Release \
    -DPYTHON_EXECUTABLE=$(which python) \
    -DUSE_CUDA=OFF \
    -DUSE_ROCM=OFF \
    -DUSE_NNPACK=OFF \
    -DUSE_QNNPACK=OFF \
    -DUSE_PYTORCH_QNNPACK=OFF \
    -DUSE_CUDNN=OFF \
    -DUSE_FBGEMM=OFF \
    -DUSE_KINETO=OFF \
    -DUSE_NUMPY=ON \
    -DUSE_OPENMP=ON \
    -DUSE_SYSTEM_BLAS=ON \
    -DUSE_SYSTEM_LAPACK=ON \
    -DBUILD_TEST=OFF \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_C_COMPILER=/usr/bin/riscv64-linux-gnu-gcc \
    -DCMAKE_CXX_COMPILER=/usr/bin/riscv64-linux-gnu-g++ \
    -DCMAKE_POLICY_DEFAULT_CMP0126=NEW \
    -DUSE_NCCL=OFF \
    -DBUILD_PYTHON=True \
    -G'Unix Makefiles'" python setup.py bdist_wheel
    
    
  3. Install the Wheel:

    ls dist/
    pip install dist/torch-2.9.0a0+git9a665ca-*.whl
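
An alternative to the PYTHON_CMAKE_FLAGS invocation in the wheel step is to pass each option as its own environment variable. As far as I can tell, PyTorch’s setup.py forwards variables whose names start with USE_, BUILD_ or CMAKE_ to CMake and falls back to Unix Makefiles when USE_NINJA=0 is set; I have not confirmed that it reads PYTHON_CMAKE_FLAGS at all. Treat that forwarding behavior as an assumption to verify in tools/setup_helpers of your checkout. The script and function names are hypothetical.

```python
"""Run `python setup.py bdist_wheel` with options passed via the environment (a sketch).

Assumption: setup.py forwards USE_*, BUILD_* and CMAKE_* environment variables
to CMake and honors USE_NINJA=0. Verify this against your PyTorch checkout.
"""
import os
import subprocess
import sys

# Same CPU-only options as the Step 4 cmake command, expressed as env vars.
BUILD_OPTIONS = {
    "USE_CUDA": "0", "USE_ROCM": "0", "USE_NCCL": "0", "USE_CUDNN": "0",
    "USE_NNPACK": "0", "USE_QNNPACK": "0", "USE_PYTORCH_QNNPACK": "0",
    "USE_FBGEMM": "0", "USE_KINETO": "0", "USE_NUMPY": "1", "USE_OPENMP": "1",
    "BUILD_TEST": "0",
    "USE_NINJA": "0",  # ask setup.py to use Unix Makefiles instead of Ninja
    "CMAKE_C_COMPILER": "/usr/bin/riscv64-linux-gnu-gcc",
    "CMAKE_CXX_COMPILER": "/usr/bin/riscv64-linux-gnu-g++",
}


def build_env(base=None, options=BUILD_OPTIONS):
    """Return a copy of the environment (or `base`) with the build options applied."""
    env = dict(os.environ if base is None else base)
    env.update(options)
    return env


def build_wheel(source_dir):
    """Invoke setup.py in source_dir; raises CalledProcessError on failure."""
    subprocess.run(
        [sys.executable, "setup.py", "bdist_wheel"],
        cwd=source_dir, env=build_env(), check=True,
    )


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: build_wheel.py /path/to/pytorch")
    else:
        build_wheel(os.path.expanduser(sys.argv[1]))
```

Clean the CMake cache first, as in the wheel step, so the new generator choice takes effect.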
    
    

Step 6: Verify Installation

Run these checks from outside the pytorch source directory; otherwise Python may import the torch/ folder of the source tree instead of the installed package:

cd ~
python -c "import torch; print(torch.__version__); print(torch.__file__)"

Expected output:

2.9.0a0+git9a665ca
/home/<user>/.pyenv/versions/pytorch-build-env/lib/python3.11/site-packages/torch/__init__.py
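The same check can be scripted so that it also flags the source-tree problem described under Common Issues. This sketch assumes the checkout lives at ~/pytorch, as in the commands above, and imports torch only inside the main block so the helper works without it. The helper name is hypothetical.

```python
"""Confirm torch is imported from the installed wheel, not the source tree (a sketch)."""
from pathlib import Path

SOURCE_DIR = Path.home() / "pytorch"  # assumption: location of the checkout


def is_inside(path, directory):
    """True if `path` lies inside `directory` (both resolved to absolute paths)."""
    try:
        Path(path).resolve().relative_to(Path(directory).resolve())
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError as err:
        print("torch failed to import:", err)
    else:
        print(torch.__version__, torch.__file__)
        if is_inside(torch.__file__, SOURCE_DIR):
            print("warning: torch was loaded from the source tree; run from another directory")
```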

Test tensor operations:

python -c "import torch; print(torch.randn(2, 3)); print(torch.cuda.is_available())"

Expected: a random 2x3 tensor, followed by False for CUDA.

Test a larger matrix multiplication, which exercises the BLAS backend (this checks correctness, not speed):

python -c "import torch; a = torch.randn(1000, 1000); b = torch.randn(1000, 1000); print((a @ b).sum())"
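The one-liner above only checks that the product is computed. For a rough speed figure, the sketch below times the same 1000x1000 product and converts the time to GFLOP/s, using the usual count of about 2n³ floating-point operations for an n×n by n×n matrix product. It is a quick sanity check, not a rigorous benchmark; the warm-up run and best-of-5 timing are my own choices, and the function names are hypothetical.

```python
"""Rough matmul throughput check (a sketch, not a rigorous benchmark)."""
import time


def matmul_gflops(n, seconds):
    """Approximate GFLOP/s for one n x n @ n x n product that took `seconds`."""
    return 2 * n ** 3 / seconds / 1e9


def time_matmul(n=1000, repeats=5):
    """Return the best wall-clock time in seconds for a float32 n x n matmul."""
    import torch  # imported here so matmul_gflops works without torch

    a = torch.randn(n, n)
    b = torch.randn(n, n)
    _ = a @ b  # warm-up: thread pool and BLAS initialization
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        _ = a @ b
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    try:
        seconds = time_matmul()
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print(f"best time {seconds:.4f} s, ~{matmul_gflops(1000, seconds):.1f} GFLOP/s")
```

Comparing the figure before and after changing OMP_NUM_THREADS is one way to see whether OpenMP is taking effect.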

Common Issues and Fixes

  • Ninja vs. Unix Makefiles Mismatch:
    If setup.py fails with CMake Error: Error: generator : Ninja, clean the CMake cache (rm -rf build/CMakeCache.txt build/CMakeFiles) and use PYTHON_CMAKE_FLAGS with -G'Unix Makefiles'.

  • Compiler Not Found:
    If CMake reports CMAKE_C_COMPILER not set, verify your toolchain:

    ls /usr/bin/*riscv64*linux-gnu*
    
    

    Update -DCMAKE_C_COMPILER and -DCMAKE_CXX_COMPILER paths accordingly.

  • Source Directory Conflict:
    If you see ImportError: Failed to load PyTorch C extensions, Python is picking up the torch/ folder of the source tree instead of the installed wheel. Don’t run Python from ~/pytorch; cd ~ first.

  • Memory Issues:
    Monitor memory (free -h). If the build crashes, reduce parallelism (make -j4).
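
Two of these checks can be scripted: which generator an existing build/CMakeCache.txt was configured with (CMake records it in the CMAKE_GENERATOR cache entry), and a make -j value derived from available memory. The sketch assumes roughly 2 GiB per compile job, which is a guess rather than a measured figure; raise it if jobs still get killed. Helper names are hypothetical.

```python
"""Troubleshooting helpers: cached CMake generator and a memory-aware -j (a sketch)."""
import os
import re
from pathlib import Path

GIB_PER_JOB = 2.0  # assumption: rough memory per compile job, tune for your board


def cache_generator(cache_path):
    """Return the CMAKE_GENERATOR value recorded in a CMakeCache.txt, or None."""
    path = Path(cache_path)
    if not path.is_file():
        return None
    for line in path.read_text(errors="replace").splitlines():
        # The trailing colon excludes CMAKE_GENERATOR_PLATFORM, _TOOLSET, etc.
        if line.startswith("CMAKE_GENERATOR:"):
            return line.split("=", 1)[1]
    return None


def mem_available_gib(meminfo_text):
    """Parse MemAvailable (reported in kB, i.e. KiB) from /proc/meminfo text."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) / (1024 * 1024) if match else None


def suggest_jobs(available_gib, cpus, gib_per_job=GIB_PER_JOB):
    """Cap parallel jobs by both CPU count and memory, never below 1."""
    return max(1, min(cpus, int(available_gib // gib_per_job)))


if __name__ == "__main__":
    print("generator in build/CMakeCache.txt:", cache_generator("build/CMakeCache.txt"))
    try:
        meminfo = Path("/proc/meminfo").read_text()
    except OSError:
        meminfo = ""
    gib = mem_available_gib(meminfo)
    if gib is not None:
        jobs = suggest_jobs(gib, os.cpu_count() or 1)
        print(f"MemAvailable {gib:.1f} GiB -> try make -j{jobs}")
```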

Tips for Using AI to Troubleshoot

I used an AI assistant (Grok) to debug issues, and it was a lifesaver. Here’s how to leverage AI effectively:

  1. Provide Full Error Output: Copy-paste the entire error message (e.g., CMake errors, Python tracebacks) into your AI query. This helps pinpoint the issue.

    • Example: Share CMake Error: Error: generator : Ninja with the command you ran.
  2. Include Context: Mention your system (RISC-V, RockOS), PyTorch version (2.9.0a0+git9a665ca), and whether you’re cross-compiling or building natively.

  3. Ask for Specific Fixes: Request step-by-step solutions for errors, like “How do I fix a Ninja mismatch in CMake?” or “Why is my compiler not found?”

  4. Iterate with Follow-Ups: If the AI’s suggestion fails, provide the new error output and ask for clarification. For example, I shared compiler check outputs (ls /usr/bin/*riscv64*) to fix a toolchain issue.

  5. Verify AI Suggestions: Cross-check AI advice with PyTorch’s official docs or GitHub issues to ensure accuracy.

Final Notes

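Building PyTorch on RISC-V is a significant undertaking, but it’s rewarding to get it running. My build ran on the RockOS RISC-V system itself, using the riscv64-linux-gnu-gcc/g++ compilers; on a native riscv64 Debian host these are the triplet-prefixed names of the system compiler, so plain gcc and g++ should work as well. The resulting wheel (dist/torch-2.9.0a0+git9a665ca-*.whl) should install on other riscv64 Linux systems with Python 3.11, provided they have a compatible glibc and the shared libraries it links against (such as OpenBLAS and the OpenMP runtime).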
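
To see which systems the wheel can go to, look at its tags. Wheel file names follow name-version[-build]-python-abi-platform.whl (PEP 427), and a wheel built locally with setup.py is typically tagged linux_riscv64 rather than manylinux, so pip will only accept it on riscv64 Linux with CPython 3.11 (cp311). A minimal parser, with a hypothetical function name:

```python
"""Print the python/abi/platform tags of wheel files (a sketch based on PEP 427)."""
from pathlib import Path


def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) parsed from a wheel file name."""
    name = Path(filename).name
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = name[: -len(".whl")].split("-")
    # 5 parts without a build tag, 6 with one; the last three are always the tags.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename}")
    return parts[-3], parts[-2], parts[-1]


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        python_tag, abi_tag, platform_tag = wheel_tags(arg)
        print(f"{Path(arg).name}: python={python_tag} abi={abi_tag} platform={platform_tag}")
```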

If you hit errors, share them in this thread with:

  • Full error output.

  • Your CMake command or setup.py invocation.

  • Output of ls /usr/bin/*riscv64*linux-gnu* and df -h.
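
If it helps, the sketch below gathers most of those system details into one paste-able block: machine architecture, kernel release, Python version, free disk space and the riscv64 toolchain binaries (a Python equivalent of the ls and df commands above). The fields are only a suggestion and the function names are hypothetical; you still need to paste the full error output and your build command yourself.

```python
"""Collect build-host details for a forum post (a sketch)."""
import platform
import shutil
import sys
from pathlib import Path


def human_gib(num_bytes):
    """Format a byte count in GiB with one decimal place."""
    return f"{num_bytes / 1024 ** 3:.1f} GiB"


def system_report(bindir="/usr/bin", disk_path="/"):
    """Return a multi-line report string about the build host."""
    lines = [
        f"machine: {platform.machine()}",
        f"kernel: {platform.release()}",
        f"python: {sys.version.split()[0]}",
    ]
    usage = shutil.disk_usage(disk_path)
    lines.append(f"disk {disk_path}: {human_gib(usage.free)} free of {human_gib(usage.total)}")
    directory = Path(bindir)
    tools = sorted(p.name for p in directory.glob("*riscv64*linux-gnu*")) if directory.is_dir() else []
    lines.append("riscv64 toolchain: " + (", ".join(tools) or "none found"))
    return "\n".join(lines)


if __name__ == "__main__":
    print(system_report())
```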

Good luck, and happy coding with PyTorch on RISC-V!