billymg[asciilifeform]: http://logs.bitdash.io/pest/2024-12-27#1034669 << yeah, i think this is true. but you could look into "fine tuning", which is within the reach of a beefy home setup or some rented server farm time
billymg[asciilifeform]: i think it's roughly taking an existing model and feeding it enough domain-specific data that it can be used effectively for a given use case
discord_bridge[asciilifeform]: (awtho) billymg: I built llama.cpp. I attempted to run a 17gb deepseek model, but it ended up freezing my macbook. I tried another deepseek model using LM Studio (which should be accessible via Cline) but it is very, very slow.
billymg[asciilifeform]: awt: is it an intel macbook pro or arm?
billymg[asciilifeform]: how much total ram?
billymg[asciilifeform]: when you try running the model with llama-server you can open 'Activity Monitor' and see how much memory you have available
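A rough sketch of that step, with a made-up model path: launch llama-server from a terminal, then watch memory in Activity Monitor while the model loads.
    # model filename is hypothetical; point -m at whatever gguf was downloaded
    ./llama-server -m ~/models/some-model.gguf --port 8080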
discord_bridge[asciilifeform]: (awtho) Is there a way I can safely do that without freezing my machine?
billymg[asciilifeform]: ah, def not enough then for the 17gb model, it must have been swapping and that's what froze it
billymg[asciilifeform]: considering the OS, plus browser, IDE, and whatever other random things are gonna take up at least 50% of your ram, i'd say your best bet is trying it out on your desktop PC (assuming that has the specs for it)
billymg[asciilifeform]: you can run it on a desktop pc and serve on your local network too, so your macbook's VS Code plugin will just be making requests to llama-server on your desktop
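A minimal sketch of that setup; the LAN address and model path here are made up, and llama-server's OpenAI-compatible endpoint is what the editor plugin (or a quick curl test) would talk to.
    # on the desktop: listen on the LAN rather than just localhost
    ./llama-server -m ./models/model.gguf --host 0.0.0.0 --port 8080
    # from the macbook: sanity-check the endpoint before pointing the plugin at it
    curl http://192.168.1.50:8080/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"messages":[{"role":"user","content":"hello"}]}'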
discord_bridge[asciilifeform]: (awtho) Desktop has: 16 GB Radeon RX 6900 XT with 5120 stream processors, 128 GB ECC ram.
billymg[asciilifeform]: that oughta be enough to get it going. i've only tried it with nvidia but you can build it with HIP for AMD GPUs. llama-server then has a flag, -ngl (--gpu-layers), that lets you control how many layers to offload to VRAM
billymg[asciilifeform]: it will exit if you exceed your available vram, so the idea is to increase the layer count until it fails
billymg[asciilifeform]: the rest of the model will then load into your regular ram and those layers will run on the CPU, so it will be slower but usable
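A sketch of that workflow, assuming ROCm is installed and a recent llama.cpp checkout (the HIP-related cmake flag names have shifted between versions, so check the repo's build docs); the model path is made up, and gfx1030 is the RX 6900 XT's target.
    # build with HIP so the AMD GPU can be used
    cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030
    cmake --build build --config Release -j
    # start with a modest -ngl, raise it until the server fails to allocate VRAM, then back off
    ./build/bin/llama-server -m ./models/model.gguf -ngl 24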