GPUs with 16 GB+ VRAM running GPT-OSS:20B on Ubuntu 24, ROCm 7.1.x, Vulkan, SYCL : llama.cpp


GPUs in this test:

AMD
Vega 64, Radeon VII, Instinct MI50, RX 6800, RX 7900 XTX, and RX 9070 XT

NVIDIA
RTX 3090, 4080, 5060 Ti, 5070 Ti, 5090

Intel
Arc A770

8096-token context window

batch size 512

flash attention (“fa”) on

Here’s the model used in all benchmarks:
https://huggingface.co/unsloth/gpt-oss-20b-GGUF/tree/main (Q4_K_M)
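
For reference, here’s roughly what those settings look like on the llama.cpp command line. The filename and -ngl value are placeholders, and the flash-attention flag varies by build (newer builds take -fa on/off/auto, older ones take a bare -fa switch), so adjust to your version:

    # 8096-token context, batch 512, flash attention on, all layers offloaded to the GPU
    ./llama-cli -m gpt-oss-20b-Q4_K_M.gguf -c 8096 -b 512 -fa on -ngl 99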

Here’s a link to the benchmark I used in the video, so that you can try it on your own rig:
https://drive.google.com/file/d/1eXtBOATYKkchl96b7vS_ie3O4Lvdkyi3/view?usp=drive_link



Comments


  1. VikeBeer

    How are you getting the gfx9xx cards running on ROCm 7.1?

    1. David Jarboe

      If you go through my 2nd blog post, running dual MI50s, I walk you through it…
      The most important thing is copying / pasting the tensor files for gfx906 (or whatever architecture) into the base ROCm rocBLAS library folder immediately after the ROCm / amdgpu installation and prior to rebooting.

      The original link that I used from Arch Linux used to contain 156 gfx906 tensor files… now it only has 2. You may have to source another ROCm library that contains all of those old tensor files in rocblas. I don’t know why they just drop them from the newer ROCm versions… seems silly, especially since to fix it / make it compatible you just copy / paste files from the older ROCm versions.
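
      For anyone following along, the copy step looks roughly like this (assuming a default /opt/rocm install; the source path is a placeholder for wherever you extracted an older rocBLAS package that still has the full gfx906 set):

        # copy the old gfx906 kernel / tensor files into the new ROCm's rocBLAS library
        sudo cp /path/to/older-rocblas/library/*gfx906* /opt/rocm/lib/rocblas/library/
        # then reboot so the fresh install picks them up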

      But yes… visit my 2nd blog post. Walks you through it.

  2. Peter

    Hi, thanks for your awesome tests!
    Did you consider testing vLLM vs llama.cpp on dual MI50s?
    Especially using the vLLM fork for the MI50: https://github.com/nlzy/vllm-gfx906.git ?

    1. David Jarboe

      Ha… I just replied to someone else with that exact same setup. Yes, I should have a new motherboard arriving today… a Threadripper board, with PCIe 4.0 x16 lanes aplenty. I’ll test llama.cpp vs vLLM in a dual-GPU setup once I get it booted and set up, to compare parallelism performance between the two. Thanks for the recommendation… exactly on path with my channel goals. 🙂
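
      Roughly what I have in mind, command-wise (model names / paths are placeholders, and the vllm-gfx906 fork may need its own extra flags):

        # llama.cpp spread across both MI50s (default layer split)
        ./llama-cli -m gpt-oss-20b-Q4_K_M.gguf -ngl 99 -sm layer -c 8096 -b 512
        # vLLM with 2-way tensor parallelism
        vllm serve <model> --tensor-parallel-size 2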

  3. VikeBeer

    All the repositories are being purged of source and binaries for all the 8xx cards; AMD is issuing takedowns. Don’t lose your files and keep backups. Share or email me a good Discord invite, please.

    1. David Jarboe

      Try this on for size… I set it to be never-expiring / unlimited. Please let me know if it doesn’t work:
      https://discord.gg/3cUNWaf5hX

  4. CavemanStu

    Was wondering, for multi-GPU setups: when you run a mix of NVIDIA cards, say one or two 3090s and one or two 5060 Tis, does that affect performance at all? Can NVIDIA GPUs from different generations be utilized effectively? Can you just put them on one board, provided you have the power, cooling, PCIe lanes, and bandwidth, and run them together?

    BTW, awesome videos. I am amazed that some of these older cards can run with such great results. I also love that you actually show how to put the hardware together and set everything up. I’d love a Discord invite if possible. All the same, thanks for taking the time to make these benchmarks; the Star Wars motif made me laugh a good bit.

    1. David Jarboe

      Yes, I’ve mixed and matched 50-series cards with no issues. I built llama.cpp with multi-architecture support so that any / all of the GPUs that I had would run. So yes, to your question.
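
      For reference, a rough sketch of that kind of multi-architecture CUDA build (the architecture list is an example covering the cards above; 120 needs a recent CUDA toolkit):

        # 86 = RTX 3090, 89 = RTX 4080, 120 = RTX 50-series
        cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="86;89;120"
        cmake --build build --config Release -j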

      Try this discord link:
      https://discord.gg/3cUNWaf5hX
