Thursday, 7 August 2025

Running your own AI LLM locally on your computer

 

I recently purchased a new M4 MacBook Pro and was keen to see how well it performs. What better way than trying to run a local AI language model? While there are different ways to deploy LLMs, this post focuses on the quick and easy way and aims to compare it to what is available in the cloud. Why would you want to run a local LLM, you may ask? Firstly, it can run offline and your prompt details are not sent to a third party (i.e. OpenAI, Meta, Google, etc.), and secondly, because you can :)

For this post I am using LM Studio (https://lmstudio.ai/), which is extremely simple to download and install (it uses approximately 500 MB of space). Once installed you are prompted to choose a model, with the default being OpenAI's gpt-oss-20b (~12 GB), which you can read more about here.
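As an aside, LM Studio also bundles an OpenAI-compatible local server (started from its Developer tab, listening on port 1234 by default). The snippet below is a minimal sketch, assuming that server is running and that you have the openai Python package installed, to confirm the downloaded model is visible:

# Minimal sketch: list the models exposed by LM Studio's local server.
# Assumes the server has been started from the Developer tab on the default port 1234.
from openai import OpenAI

# LM Studio ignores the API key, but the client library requires one to be set.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

for model in client.models.list():
    print(model.id)  # should include the gpt-oss-20b entry once the download has finished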


Once the model is downloaded you can "start a new chat". Below is my idle system usage prior to launching the model.



And when the model has loaded



And when it's in use. As you can see, memory doesn't change much, but CPU usage increases while a response is being generated. Further investigation shows that the GPU is also used while the response is being generated.
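If you want numbers rather than screenshots, a rough sketch along these lines (using the psutil package, which is my choice here rather than anything LM Studio provides) samples system-wide CPU and memory before and during generation. Note that psutil cannot see the Apple Silicon GPU, so GPU load still has to be checked in Activity Monitor.

# Rough sketch: snapshot system-wide CPU and memory usage (pip install psutil).
# psutil cannot report Apple Silicon GPU usage, so that part stays in Activity Monitor.
import psutil

def snapshot(label: str) -> None:
    cpu = psutil.cpu_percent(interval=1)   # CPU % averaged over a 1-second window
    mem = psutil.virtual_memory()          # system memory statistics
    print(f"{label}: cpu={cpu:.0f}% mem_used={mem.used / 2**30:.1f} GiB")

snapshot("idle")        # run before sending a prompt
input("Send a prompt in LM Studio, then press Enter while it is generating...")
snapshot("generating")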



After reading some posts about speed and responsiveness, I was pleasantly surprised that it was quite performant, taking less than a couple of seconds to respond. LM Studio also supports RAG, so I may update this post with more details on RAG and/or a performance comparison with other LLMs that can be loaded into LM Studio.
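To put a number on "less than a couple of seconds", the local server mentioned earlier can be timed programmatically. This is only a sketch: it assumes the server is running on the default port and that the model name below matches what the /v1/models endpoint reports for gpt-oss-20b on your machine.

# Minimal timing sketch against LM Studio's OpenAI-compatible server.
# The model name is an assumption - use whatever /v1/models reports on your machine.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.perf_counter()
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Explain what an LLM is in two sentences."}],
)
elapsed = time.perf_counter() - start

tokens = response.usage.completion_tokens  # token count reported by the server
print(f"{elapsed:.1f}s total, roughly {tokens / elapsed:.1f} tokens/sec")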


One consideration is to increase the context length when loading the model, as it defaults to 4,096 tokens.
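For context, those 4,096 tokens have to hold the system prompt, the whole chat history and the model's reply. A very rough rule of thumb is about four characters per token (a heuristic, not the model's real tokeniser), so a quick sketch like this shows how fast a pasted document eats the window (the file name is purely hypothetical):

# Back-of-the-envelope sketch: how much of the default context window a prompt uses.
# The 4 characters/token figure is a rough heuristic, not the model's actual tokenizer.
CONTEXT_LENGTH = 4096  # LM Studio's default when loading the model

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

document = open("notes.txt").read()  # hypothetical file you want to chat about
used = estimate_tokens(document)
print(f"~{used} of {CONTEXT_LENGTH} tokens ({100 * used / CONTEXT_LENGTH:.0f}%) "
      "used before the model writes a single token of output")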
