Recent posts

#51
🔥 **Side-by-Side Coding Showdown for PowerBASIC Developers**

Hello PowerBASIC friends, 👋

I made a new video where we test the **original Qwen 3.5 9B model** against the **same model fine-tuned specifically for PowerBASIC 10**.

The question is simple:

👉 **Can fine-tuning really make a visible difference for PowerBASIC coding?**
👉 **Does a PB10-specialized model give better answers than the original base model?**

In the video, both models are tested side by side with real PowerBASIC-related prompts.

---

## 🧪 What we tested

✅ **Test 1:**
The fine-tuned model gives the better PowerBASIC 10 solution.

✅ **Test 2:**
The original model gives a long, general explanation.
The fine-tuned model gives a short, targeted PB10 answer.

✅ **Test 3:**
The original model can still get very good results if it uses the **SindByte MCP Server** and Internet search.
But the fine-tuned model already knows the solution immediately.

✅ **Test 4:**
We switch the fine-tuned model from **Q4** to **Q8** and compare the quality improvement.

---

## 💡 Main conclusion

A general AI model can become very powerful when it has access to tools, MCP servers, and Internet search.

But a **fine-tuned specialist model** has a big advantage:

🚀 It answers faster
🎯 It is more focused
🧠 It already knows PowerBASIC-specific details
🛠️ It produces more practical PB10 code
📉 It needs less explanation and less correction

For PowerBASIC developers, this is especially interesting because many general AI models do not know enough about real PB10 syntax, compiler behavior, and coding conventions.

---

## 🧰 Also shown in the video

The video also demonstrates how useful a local MCP setup can be.

With the **SindByte MCP Server**, even the original model can search, use tools, and improve its results when it does not know the answer directly.

So the strongest setup may be:

⭐ **Fine-tuned PowerBASIC model**
➕ **local MCP Server tools**
➕ **Internet search when needed**

That combination gives both:

✅ built-in specialist knowledge
✅ external tool power
✅ better real-world coding workflow

---

## 🎥 Video title

**Qwen 3.5 9B vs PB10 Fine-Tuned: PowerBASIC 10 Coding Showdown**

📺 Watch the video here:
👉

---

## 🔗 Links

🌐 SindByte MCP Server:
https://smart-ai-robot.com/en/index.html

🐦 Follow me on X:
https://x.com/TheoGottwald

---

## ❓Question for the forum

What would you prefer for PowerBASIC work?

**A)** A large general model with MCP tools and Internet search
**B)** A smaller fine-tuned PowerBASIC specialist model
**C)** Both together

My current impression:
👉 **The fine-tuned model is clearly better for direct PB10 coding, but MCP tools make the full setup much stronger.**

Looking forward to your opinions and test ideas! 🚀

Qwen 3.5 9B (Q4-Q8), optimized for PB10, is available for download.*

*Use with LM Studio, Ollama, etc.


#52
China's Mega Machines Just Did the IMPOSSIBLE — Engineers Around the World Are Speechless


https://www.youtube.com/watch?v=uY1MnsI4i2Q
#53
Project Progress and Learning / Blog 28.04.2026
Last post by Theo Gottwald - April 28, 2026, 12:44:22 PM
When Kimi thinks it's Claude.
Don't believe that AI has no weaknesses.

2026-04-28 08_13_04-Greenshot.png
#54
80 years ago we would have had this engineering in Germany.
Now it is everywhere in the world, just not in Germany.
This is the consequence of the lost war.
#55
I tested that some time ago. As I remember, it gets limited somewhere after a while; you will see.
If not, just use it.
KILO-Code can also be used inside VS Code as an add-on, which is possibly better than the pure CLI.

Kilo.png

There are several models that can be used for free (though there is a daily usage cap).

Kilo2.png
#56
Right now, I'm just messing around. I'm new at this, trying to get a better picture. I came across this video, "UNLIMITED FREE Deepseek-V4 PRO AI Coder:" "https://www.youtube.com/watch?v=e5aud8zON8o"
NVIDIA Developer Program: free
Has these new models:
Deepseek-V4 PRO and FLASH
Minmax-27
glm-5.1
Has kimi-2.5, doesn't have 2.6 yet.
So, I signed up. Got access to Deepseek-V4 PRO. I didn't try to do anything other than test it to see if it was working. Works!

Another very powerful free AI tool is Google AI mode in the browser. It won't write an application for you, but if you tell it what you want, it will tell you what you need and problems you might have. It will write all kinds of helper code for you.

Before I signed up for the free NVIDIA Developer Program, I used Google AI to get a better picture. If anyone is interested in the NVIDIA Developer Program, Google AI can explain it better than I can. There's so much going on there that you'll probably need Google AI to find the actual URL for using an AI model.

I tried out a new local model, DavidAU/Qwen3.6-27B-NEO-CODE-Di-IMatrix-MAX-GGUF. I asked it to write a resizable C Windows GUI text editor with File, Edit and Font menus, plus a menu to turn line wrap on and off. Everything worked except for the line wrap. It's a dense model and slow even on the best hardware, so you have to let it run in the background. Of the models I have tried, this is the first to pass that test. I can reload it, get it to fix the line wrap, and keep adding pieces.
#58
Fine-tuning Qwen 27B (Q4) for PowerBASIC.
Let's see what we get. Even without the fine-tuning, the model is really good.
Let's see if it picks up more nuances.

Finetuning.png
#59
OxygenBasic / Re: OxygenBasic PreRelease
Last post by Charles Pegge - April 24, 2026, 07:17:34 PM
Thanks Theo. If I do further work on the o2 compiler, it might be worth further analysis by the agents. I was quite pleased with the feedback they provided.

Thanks Nicola, I think JuliaRings came from Mike Lobanovsky with minor adaptations. He also provided us with demos\GL\GLSL\glJulia1.o2bas, which is also a beautiful animation, but more difficult to understand, since it involves OpenGL shaders.
#60
    🚀 The Way of Local AI: From Prompting to Fine-Tuning 🧠🔥

    If you go the "way of the local AI", you will of course test multiple AIs/models, different prompts, different system messages, different quantizations, different samplers, different context sizes and all sorts of settings.

    But the final step, when you really want the model to behave in your own special way, is:

    🎯 Model Fine-Tuning

    This is where you stop only "talking nicely" to the model and start teaching it a specific behavior, style, format, workflow or domain knowledge.

    Examples:

    ✅ Your own coding style 
    ✅ Your own support-bot behavior 
    ✅ Your own company documentation style 
    ✅ Your own tool-calling format 
    ✅ Your own agent workflow 
    ✅ Your own prompt language 
    ✅ Your own local AI assistant personality 
    ✅ Better answers for your exact use-case 

    But fine-tuning is not magic. It is also the point where many things can come back to bite you. 😄



    🧩 First Important Question: Do You Really Need Fine-Tuning?

    Before fine-tuning, check these easier steps:

    1. Better prompting ✍️ 
    Many problems are simply bad prompts, missing examples or unclear system instructions.

    2. RAG / Document Search 📚 
    If the model only needs access to facts, manuals, PDFs, source code or documentation, then RAG is often better than fine-tuning.

    3. Tool usage / MCP / Agents 🛠️
    If the model must do real actions, use tools. Fine-tuning does not magically give the model access to files, APIs or programs.

    4. Fine-tuning 🧠 
    Use fine-tuning when you want the model to learn a repeatable response pattern, format, coding style, classification behavior or domain-specific task behavior.



    🖥️ Hardware Needed for Local Fine-Tuning

    For local fine-tuning, the most important thing is not CPU speed.

    The most important thing is:

    🔥 VRAM, VRAM, VRAM

    The GPU memory decides what model size, batch size, context length and training method you can use.

    Rough practical guide:

    • 8 GB VRAM – small models only, very limited experiments
    • 12 GB VRAM – small 3B/7B experiments with heavy quantization
    • 16 GB VRAM – useful for 7B/8B QLoRA experiments
    • 24 GB VRAM – good for 7B/8B/14B fine-tuning, depending on settings
    • 32 GB VRAM – very good local enthusiast level, e.g. RTX 5090 class
    • 48 GB+ VRAM – serious workstation level
    • 80 GB+ VRAM – A100/H100/H200/B200 cloud or enterprise level

    A modern local high-end card like an RTX 5090 with 32 GB VRAM is already very strong for LoRA/QLoRA style fine-tuning, but it is still not the same as having an 80 GB data-center GPU.
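
To get a feeling for why VRAM is the limit, here is a rough back-of-the-envelope sketch in Python. It only estimates the memory for the stored weights (parameter count times bytes per parameter). The function name and the precision table are my own illustration, not part of any library. Real training also needs memory for activations, gradients, optimizer state and the context, and real 4-bit formats store extra scaling data, so treat the numbers as a lower bound.

```python
# Rough bytes needed per parameter for the stored weights only.
# Assumption: idealized formats; real quantized files are slightly larger.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9


if __name__ == "__main__":
    for size in (7, 14, 32):
        row = ", ".join(
            f"{name}: {weight_memory_gb(size, name):.1f} GB"
            for name in BYTES_PER_PARAM
        )
        print(f"{size}B weights -> {row}")
```

For example, a 7B model in fp16 needs about 14 GB for the weights alone, which is why quantized QLoRA is the usual choice on consumer cards.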

    For most private users, the realistic local path is:

    ✅ LoRA / QLoRA fine-tuning 
    ❌ Not full fine-tuning of huge models



    🧠 LoRA, QLoRA and Full Fine-Tuning

    Full fine-tuning changes the whole model. 
    This needs much more VRAM, more compute, more storage and more care.

    LoRA trains small adapter layers instead of the whole model. 
    This is much cheaper and much easier.

    QLoRA uses quantization to reduce memory usage even more. 
    This is currently the realistic method for most local users.

    So for local AI users, the usual recommendation is:

    Start with QLoRA. Do not start with full fine-tuning.
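
Why LoRA is so much cheaper can be shown with simple arithmetic. The sketch below assumes the usual LoRA construction, where a frozen weight matrix gets two small trainable matrices of rank r added next to it. The layer size 4096 and rank 16 are just illustrative values, not taken from a specific model.

```python
def full_layer_params(d_in: int, d_out: int) -> int:
    """Number of weights in one dense layer of shape d_out x d_in."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two small matrices: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d, r = 4096, 16  # illustrative values only
    full = full_layer_params(d, d)
    lora = lora_trainable_params(d, d, r)
    print(f"full layer: {full:,} weights")
    print(f"LoRA adapter: {lora:,} trainable weights ({100 * lora / full:.2f}%)")
```

With these example values the adapter trains 131,072 weights instead of 16,777,216, which is under 1% of the layer. QLoRA keeps the same small adapters and additionally stores the frozen base weights in quantized form.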



    ☁️ Renting GPUs Instead of Buying Hardware

    If you do not want to buy expensive hardware, you can rent GPUs in the cloud.

    Typical GPU rental platforms include providers like:

    • RunPod
    • Vast.ai
    • Lambda
    • Paperspace
    • FluidStack
    • Akash
    • Other cloud GPU marketplaces

    The advantage:

    ✅ No need to buy a 3000–6000 EUR workstation 
    ✅ You can rent stronger GPUs like A100, H100, H200 or B200 
    ✅ You only pay while training 
    ✅ Good for experiments 
    ✅ Good if you need more VRAM only sometimes 

    The disadvantage:

    ⚠️ You must upload your data 
    ⚠️ You must secure your API keys and SSH keys 
    ⚠️ Bad configuration can waste money fast 
    ⚠️ Storage costs can continue after the GPU is stopped 
    ⚠️ Some cheap marketplace machines may be unreliable 
    ⚠️ You need Linux knowledge 
    ⚠️ You must download your finished adapters/models before deleting the machine 

    Very important:

    🛑 Stop the GPU when you are finished. 
    🛑 Delete unused volumes if you no longer need them. 
    🛑 Do not leave expensive machines running overnight by mistake.

    A small LoRA job can be cheap. 
    A badly configured cloud training run can become expensive very quickly. 💸
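
A small cost sketch makes the "forgotten overnight" problem visible. All prices below are made-up placeholders, not quotes from any provider, and the function name is hypothetical. Check the current price list of the platform you actually use.

```python
def training_cost(
    gpu_hours: float,
    gpu_rate_per_hour: float,
    storage_gb: float = 0.0,
    storage_rate_per_gb_month: float = 0.0,
    storage_months: float = 0.0,
) -> float:
    """GPU time plus storage that keeps billing after the GPU is stopped."""
    gpu_cost = gpu_hours * gpu_rate_per_hour
    storage_cost = storage_gb * storage_rate_per_gb_month * storage_months
    return gpu_cost + storage_cost


if __name__ == "__main__":
    rate = 2.00  # placeholder hourly price, not a real quote
    print("3 h LoRA test run:", training_cost(3, rate))
    print("same run, left on 12 h overnight:", training_cost(3 + 12, rate))
    print(
        "plus a 100 GB volume kept for a month:",
        training_cost(15, rate, storage_gb=100,
                      storage_rate_per_gb_month=0.10, storage_months=1),
    )
```

The point is not the exact numbers but the ratio: idle hours and forgotten volumes can easily cost more than the training itself.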



    🧱 Dense Models vs. MoE Models

    This is one of the most important differences.

    1. Dense Models

    A dense model uses the whole model for each token.

    Examples:

    • Llama 8B
    • Qwen 7B / 14B / 32B
    • Mistral 7B
    • Gemma dense models

    If it is a 14B dense model, then essentially all 14B parameters are active for every token during inference.

    Advantages:

    ✅ Easier to understand 
    ✅ Easier to fine-tune 
    ✅ Easier to deploy 
    ✅ More predictable memory behavior 
    ✅ Usually simpler for beginners 

    Disadvantages:

    ⚠️ Bigger dense models need more VRAM 
    ⚠️ Training cost rises directly with model size 
    ⚠️ A 32B dense model is much heavier than a 7B dense model 

    2. MoE Models – Mixture of Experts

    MoE means "Mixture of Experts".

    The model contains multiple expert networks, but only some experts are active for each token.

    Example idea:

    A model may have 8 experts, but only 2 are used per token.

    So the model may have a large total parameter count, but a smaller active parameter count.

    Advantages:

    ✅ Can be very powerful 
    ✅ Can have high total capacity 
    ✅ Only part of the model is active per token 
    ✅ Often strong for reasoning and broad knowledge 

    Disadvantages:

    ⚠️ More complicated to fine-tune 
    ⚠️ More complicated to serve 
    ⚠️ Can need a lot of total VRAM anyway 
    ⚠️ Expert routing can behave unexpectedly 
    ⚠️ Multi-GPU setups can become more difficult 
    ⚠️ Not always beginner-friendly 

    Important:

    A MoE model may say "only 13B active parameters", 
    but you may still need to load a much larger total model into memory.

    So do not only look at "active parameters". 
    Also check:

    • total parameters
    • quantization format
    • VRAM requirement
    • context length
    • framework support
    • fine-tuning support
    • inference speed
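
The difference between total and active parameters can be sketched like this. The split into "shared" parameters (attention, embeddings) and "per expert" parameters is a simplification, and the sizes are invented for illustration. They do not describe any real model.

```python
def moe_param_counts(
    shared_billion: float,
    expert_billion: float,
    num_experts: int,
    experts_per_token: int,
) -> tuple[float, float]:
    """Return (total, active) parameter counts in billions for a simplified MoE."""
    total = shared_billion + expert_billion * num_experts
    active = shared_billion + expert_billion * experts_per_token
    return total, active


if __name__ == "__main__":
    # Invented example: 8 experts, 2 used per token, as in the text above.
    total, active = moe_param_counts(2.0, 5.0, num_experts=8, experts_per_token=2)
    print(f"total: {total:.0f}B, active per token: {active:.0f}B")
    # All experts must be loaded, so memory follows the TOTAL count.
    print(f"weights at ~0.5 bytes/param: about {total * 0.5:.0f} GB")
```

In this invented example only 12B parameters work on each token, but 42B must sit in memory, roughly 21 GB even at 4-bit. Compute follows the active count; memory follows the total count.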



    📦 What Data Do You Need?

    Fine-tuning quality depends heavily on your dataset.

    Bad data creates a bad model.

    You need examples like:

    User: Please convert this CSV to JSON.
    Assistant: Sure. Here is the JSON output...

    Or for coding:

    Instruction: Write a PowerBASIC function that trims and validates a string.
    Answer: FUNCTION ...

    Good training data should be:

    ✅ Clean 
    ✅ Consistent 
    ✅ Correct 
    ✅ Deduplicated 
    ✅ Legally usable 
    ✅ In the right format 
    ✅ Similar to the task you want later 
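
A minimal cleaning step for such a dataset could look like the sketch below. The field names `instruction` and `answer` simply mirror the example above and are an assumption. Your training tool will define its own expected schema, so adapt the keys to whatever it requires. JSONL (one JSON object per line) is used here only because it is a common exchange format.

```python
import json


def clean_examples(examples: list[dict]) -> list[dict]:
    """Strip whitespace, drop incomplete pairs, remove exact duplicates (order kept)."""
    seen = set()
    cleaned = []
    for ex in examples:
        instruction = ex.get("instruction", "").strip()
        answer = ex.get("answer", "").strip()
        if not instruction or not answer:
            continue  # incomplete example: useless for training
        key = (instruction, answer)
        if key in seen:
            continue  # exact duplicate
        seen.add(key)
        cleaned.append({"instruction": instruction, "answer": answer})
    return cleaned


def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)


if __name__ == "__main__":
    raw = [
        {"instruction": "Please convert this CSV to JSON. ", "answer": "Sure. Here is the JSON output..."},
        {"instruction": "Please convert this CSV to JSON.", "answer": "Sure. Here is the JSON output..."},
        {"instruction": "Write a PowerBASIC function.", "answer": "   "},
    ]
    print(to_jsonl(clean_examples(raw)))
```

This only catches exact duplicates and empty fields. Whether the answers are actually correct still has to be checked by a human or by compiling and running the code.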

    Bad training data causes:

    ❌ hallucinations 
    ❌ broken code 
    ❌ strange formatting 
    ❌ overfitting 
    ❌ repeated phrases 
    ❌ worse general ability 
    ❌ model personality damage 



    ⚠️ Things That Can Come Back to Bite You

    Here are the common traps:

    1. Too little VRAM 🧯 
    The training crashes with CUDA out-of-memory errors.

    2. Wrong model format 📦 
    GGUF is usually for inference. Fine-tuning often needs Hugging Face / safetensors models.

    3. Wrong tokenizer 🔤 
    If tokenizer and model do not match, the result can be broken.

    4. Bad dataset format 📄 
    The model learns garbage formatting.

    5. Too high learning rate 🔥 
    The model becomes stupid very quickly.

    6. Too many epochs 🔁 
    The model memorizes your examples instead of generalizing.

    7. No evaluation set 🧪 
    You do not know if the model improved or just became worse.

    8. Mixing languages badly 🌍 
    If you mix English, German, code, comments and instructions without structure, the model may become inconsistent.

    9. Expecting new knowledge from fine-tuning 🧠 
    Fine-tuning is not a database. For facts, use RAG.

    10. Fine-tuning the wrong base model 🎯 
    If the base model is bad at your task, fine-tuning may not rescue it.

    11. Ignoring licensing ⚖️ 
    Check model license and data license before commercial usage.

    12. Forgetting deployment 🚀 
    Training is only half the job. You also need to run the result locally in LM Studio, Ollama, text-generation-webui, vLLM, llama.cpp or your own system.
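
Traps 6 and 7 (too many epochs, no evaluation set) belong together: with a validation loss per epoch you can see where the model stops improving. The sketch below is a plain-Python illustration of simple early stopping with a patience counter. The loss values are invented, and real training frameworks have their own built-in mechanisms for this.

```python
def best_epoch(val_losses: list[float], patience: int = 2) -> int:
    """Return the 0-based index of the best epoch.

    Stops looking once the validation loss has not improved
    for `patience` epochs in a row.
    """
    best_idx = 0
    best_loss = float("inf")
    waited = 0
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_idx, waited = loss, i, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs: stop here
    return best_idx


if __name__ == "__main__":
    # Invented validation losses: improving, then getting worse (overfitting).
    losses = [1.9, 1.4, 1.2, 1.3, 1.5]
    print("keep the checkpoint from epoch index", best_epoch(losses))
```

If the training loss keeps falling while the validation loss rises, the model is memorizing your examples. Keep the checkpoint from the best validation epoch, not the last one.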



    🛠️ Typical Fine-Tuning Workflow

    A practical workflow looks like this:

    1. Choose the base model
    2. Prepare clean training examples
    3. Split into train and validation data
    4. Start with LoRA or QLoRA
    5. Use a small test run first
    6. Evaluate with real prompts
    7. Adjust learning rate, epochs and dataset
    8. Merge or export the adapter if needed
    9. Quantize for local inference if needed
    10. Test inside your real application
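
The train/validation split from the workflow can be sketched in a few lines. The 10% validation share and the fixed seed are my assumptions for the example, not a rule. The important part is that the validation examples are never used for training.

```python
import random


def split_train_val(examples: list, val_fraction: float = 0.1, seed: int = 42):
    """Shuffle reproducibly and return (train, validation) lists."""
    items = list(examples)  # copy, so the caller's list is not modified
    random.Random(seed).shuffle(items)
    if len(items) < 2:
        return items, []
    n_val = max(1, round(len(items) * val_fraction))
    return items[n_val:], items[:n_val]


if __name__ == "__main__":
    data = [f"example {i}" for i in range(100)]
    train, val = split_train_val(data)
    print(len(train), "training examples,", len(val), "validation examples")
```

A fixed seed keeps the split identical between runs, so results from different learning rates or epoch counts stay comparable.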

    Do not start with a huge training run.

    Start small:

    ✅ 100 examples 
    ✅ short test 
    ✅ check result 
    ✅ then scale up



    🧪 What Should You Fine-Tune For?

    Good fine-tuning targets:

    ✅ answer format 
    ✅ coding style 
    ✅ tool-call syntax 
    ✅ classification 
    ✅ support replies 
    ✅ domain-specific workflows 
    ✅ structured output 
    ✅ JSON output 
    ✅ agent behavior 
    ✅ company-specific style 

    Bad fine-tuning targets:

    ❌ storing large documentation 
    ❌ replacing a search engine 
    ❌ fixing a fundamentally bad base model 
    ❌ forcing a small model to become GPT-5 
    ❌ training from random scraped garbage 
    ❌ training without tests 
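
For targets like structured or JSON output, "training with tests" can start very simply: collect the model's answers to your test prompts and count how many are valid JSON. The sketch below only checks syntax, and the sample outputs are invented. How you collect the answers (LM Studio, Ollama, your own script) is left open here.

```python
import json


def json_valid_rate(outputs: list[str]) -> float:
    """Fraction of model outputs that parse as valid JSON."""
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            json.loads(text)
            valid += 1
        except json.JSONDecodeError:
            pass  # broken output counts as a failure
    return valid / len(outputs)


if __name__ == "__main__":
    # Invented sample answers from a base model and a fine-tuned model.
    base_outputs = ['{"name": "test"}', "Sure! Here is the JSON: {...}", '{"a": }']
    tuned_outputs = ['{"name": "test"}', '{"id": 1}', "[1, 2, 3]"]
    print("base model:", json_valid_rate(base_outputs))
    print("fine-tuned:", json_valid_rate(tuned_outputs))
```

Run the same prompts before and after fine-tuning. If the number does not go up, the training did not help for this target.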



    🏁 Practical Recommendation

    For most local AI users:

    Use a good dense 7B/8B/14B model first. 
    Then try QLoRA. 
    Then test with your real use-case. 
    Only then move to larger dense or MoE models.

    Dense models are usually easier for beginners.

    MoE models can be powerful, but they bring more complexity, especially for fine-tuning and deployment.

    If you have a strong local GPU, use it for experiments. 
    If you need more VRAM, rent a cloud GPU for a few hours.

    But always remember:

    Fine-tuning is not magic. 
    Fine-tuning is data quality + correct settings + evaluation. 🧠⚙️

    The real secret is not just the GPU.

    The real secret is:

    Good examples. Good tests. Good workflow. 🚀


    2026-04-24 15_35_59-Greenshot.png