Get Started with the AI HAT+ 2 on Raspberry Pi 5: A Step-by-Step Setup and First Projects

webbclass
2026-01-21 12:00:00
9 min read

A step-by-step guide to unboxing the AI HAT+ 2, installing drivers, and running generative models on a Raspberry Pi 5, plus three beginner projects to build.

Beat the fragmentation: get your Raspberry Pi 5 running local generative AI in one afternoon

Too many tutorials stop at theory or show ancient hardware. If you're a student, teacher, or hobbyist who wants a practical, repeatable path to run local generative AI on a Raspberry Pi 5, this guide walks you through unboxing the AI HAT+ 2, installing drivers, running a sample model, and building three beginner-friendly projects. These steps reflect the latest edge-AI trends in late 2025 and early 2026 — when on-device models, quantized GGUF runtimes, and ARM/NPU acceleration became mainstream for makers.

The AI HAT+ 2 in 2026: why it matters

In 2025–2026 we've seen a clear shift: developers want models that run locally for privacy, latency, and cost predictability. The AI HAT+ 2 is one of the accessible hardware accelerators that bring that capability to the Raspberry Pi 5 at a hobbyist price point. It adds a dedicated inference engine (NPU/accelerator) and vendor drivers that make small-to-medium generative models practical at the edge.

Edge AI is no longer niche in 2026 — the ecosystem (ggml/gguf, llama.cpp, ONNX/ORT, optimized quantized models) is mature enough for classrooms and small labs.

What you'll need (hardware & accounts)

  • Raspberry Pi 5 (64-bit OS recommended)
  • AI HAT+ 2 (unit + mounting screws)
  • Power supply rated for the Pi 5 plus HAT (the official 27 W, 5 V / 5 A Pi 5 PSU is the safe choice; check the HAT's power requirements)
  • microSD or NVMe boot drive with Raspberry Pi OS 2026 build (64-bit)
  • USB keyboard/mouse and HDMI monitor (or SSH enabled)
  • Optional: fan/heatsink for sustained inference
  • Internet access for driver downloads and model pulls

Unboxing and quick hardware checklist

Open the box and verify the contents. A typical AI HAT+ 2 package contains:

  • AI HAT+ 2 board
  • Mounting screws and standoffs
  • USB/flat ribbon cable (if required)
  • Quick-start guide / QR code to vendor docs

Mount the HAT

  1. Power down the Pi and unplug it.
  2. Attach the HAT to the Pi's 40-pin header (align pins carefully).
  3. Secure with the supplied standoffs; connect any ribbon cables as instructed.
  4. Reattach power and boot.

Tip: the Pi 5 can run hotter under load. Use a small fan or heatsink when benchmarking models.
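
To confirm whether thermal throttling actually occurs during a benchmark, you can query the Pi's firmware from a terminal (standard Raspberry Pi OS commands; the exact output format varies by firmware version):

# Current SoC temperature
vcgencmd measure_temp
# Throttling flags: throttled=0x0 means no throttling has occurred
vcgencmd get_throttled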

Prepare Raspberry Pi OS (2026 build)

Use a current 64-bit Raspberry Pi OS (Debian base in 2026). Update and install build essentials and Python:

sudo apt update && sudo apt upgrade -y
sudo apt install -y python3 python3-venv python3-pip git build-essential cmake libopenblas-dev 

Create a virtual environment to isolate dependencies:

python3 -m venv ~/ai-hat-env
source ~/ai-hat-env/bin/activate
pip install --upgrade pip

Install AI HAT+ 2 drivers & runtime

Vendor drivers are required to enable the HAT's NPU. Follow the vendor instructions — most provide a GitHub repo and an installer script. The high-level flow is:

  1. Clone the vendor repo.
  2. Run the installer (it usually installs kernel modules, a user-space runtime, and udev rules).
  3. Reboot and verify the device is visible.

# Example (replace the placeholder repo URL below with the official vendor URL)
git clone https://github.com/vendor/ai-hat-plus-2.git
cd ai-hat-plus-2
sudo ./install.sh
sudo reboot

After reboot, check dmesg or lsusb / lspci depending on interface:

dmesg | tail
# or
ls /dev | grep ai_hat

Troubleshoot: If the device doesn't appear, re-run the install with sudo, check kernel version compatibility, and confirm you installed the vendor-specified kernel headers. For broader deploy patterns and hybrid fallbacks, see our creator-led, cost-aware cloud playbook.
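
On Raspberry Pi OS, the kernel headers package is typically raspberrypi-kernel-headers (package names can differ between OS releases); installing it and re-running the installer resolves most "module failed to build" errors:

# Check the running kernel, install matching headers, then retry the installer
uname -r
sudo apt install -y raspberrypi-kernel-headers
sudo ./install.sh && sudo reboot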

Runtimes and frameworks to install

In 2026 the common stack for edge generative AI is:

  • llama.cpp (via llama-cpp-python): CPU-friendly, supports GGML/GGUF quantized models
  • ONNX Runtime / OpenVINO / vendor runtime: for models converted to ONNX and accelerated on NPU
  • whisper.cpp and whisperx: for offline speech transcription
  • diffusers with optimized backends (for small image models)

Install core Python libraries:

pip install llama-cpp-python transformers onnxruntime flask uvicorn fastapi streamlit
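
A quick sanity check that the core packages import cleanly inside the virtualenv (the versions printed on your system will differ):

python -c "import llama_cpp, onnxruntime, transformers; print(llama_cpp.__version__, onnxruntime.__version__, transformers.__version__)"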

For speech: build whisper.cpp (fast and easy on Pi 5):

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make -j4
# you'll get a small binary for transcription
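
You also need a GGML model file for whisper.cpp; the repo ships a download helper (model names like base.en or small are typical choices, and the file lands in whisper.cpp/models/):

# Still inside the whisper.cpp directory
bash ./models/download-ggml-model.sh small
# Produces models/ggml-small.bin; copy or symlink it if you keep models in ~/models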

Run your first local generative AI (chat) — step-by-step

We'll run a tiny quantized model with llama.cpp through the Python binding. For classroom use pick models licensed for local use (check model license on Hugging Face). The example here uses a small community GGUF model; replace the name with any model suitable for education.

  1. Download a small GGUF model (example):
# create a models directory
mkdir -p ~/models && cd ~/models
# Example: replace with actual model path you choose
wget https://huggingface.co/the-community/small-gguf-model/resolve/main/small-gguf-model.gguf -O small.gguf

  2. Run a quick Python program that loads the model and responds:
cat > ~/chat_local.py << 'PY'
import os
from llama_cpp import Llama

# expanduser is needed because the loader does not expand '~' on its own
llm = Llama(model_path=os.path.expanduser('~/models/small.gguf'))
resp = llm.create_completion(
    prompt='You are a helpful tutor. Explain HTTP in simple terms.',
    max_tokens=150)
print(resp['choices'][0]['text'])
PY
python ~/chat_local.py

You should see a short response. Performance depends on the model size, quantization, and whether the HAT runtime is used. For larger models, configure the vendor runtime or ONNX acceleration to use the NPU. For tips on profiling and on-device signals, check our note on edge performance & on-device signals.
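
To get a rough tokens-per-second figure for those reports, a small timing loop around the same binding is enough (a minimal sketch; ~/models/small.gguf is the placeholder model from above):

python - << 'PY'
import os, time
from llama_cpp import Llama

llm = Llama(model_path=os.path.expanduser('~/models/small.gguf'))
start = time.time()
resp = llm.create_completion(prompt='Explain what an NPU does in one sentence.', max_tokens=64)
elapsed = time.time() - start
generated = resp['usage']['completion_tokens']  # tokens actually produced
print(f'{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.2f} tok/s')
PY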

Three beginner projects to showcase the AI HAT+ 2

These projects scale from simple to slightly more involved, each producing a deployable portfolio item.

Project 1 — Local Chatbot Web App (Streamlit)

Purpose: Build a simple web chat that runs locally and demonstrates low-latency, private inference.

  1. Use the project folder and virtual environment created earlier.
  2. Install Streamlit and llama-cpp-python (done above).
  3. Save this minimal app as ~/chat_app.py:
cat > ~/chat_app.py << 'PY'
import os
import streamlit as st
from llama_cpp import Llama

st.title('Pi Local Chat')
# expanduser is needed because the loader does not expand '~' on its own
llm = Llama(model_path=os.path.expanduser('~/models/small.gguf'))

if 'history' not in st.session_state:
    st.session_state.history = []

prompt = st.text_input('Ask something')
if st.button('Send') and prompt:
    resp = llm.create_completion(prompt=prompt, max_tokens=200)
    txt = resp['choices'][0]['text']
    st.session_state.history.append(('You: ' + prompt))
    st.session_state.history.append(('Bot: ' + txt))

for msg in st.session_state.history:
    st.write(msg)
PY

streamlit run ~/chat_app.py --server.port=8501 --server.address=0.0.0.0

Open your Pi's IP at port 8501 to chat. This is a neat demo for class presentations or adding to your portfolio. If you want reusable UI components for demos, see the new component marketplaces for lightweight micro‑UIs you can adapt.
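
One practical refinement: Streamlit re-runs the whole script on every interaction, so the app above reloads the model on each rerun. Caching the loader with st.cache_resource keeps a single model instance per server process (a sketch; replace the llm = Llama(...) line in chat_app.py with this):

import os
import streamlit as st
from llama_cpp import Llama

@st.cache_resource  # load the model once, reuse it across reruns
def load_model():
    return Llama(model_path=os.path.expanduser('~/models/small.gguf'))

llm = load_model()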

Project 2 — Voice-enabled Assistant (Whisper + LLM + TTS)

Purpose: Demonstrate multimodal edge AI — transcribe voice, ask an LLM, and reply via TTS.

  1. Record audio (arecord) and transcribe with whisper.cpp.
  2. Pass transcription to the LLM (llama.cpp).
  3. Synthesize the response using a lightweight TTS (e.g., eSpeak or Coqui TTS lightweight model).
# Record 5 seconds
arecord -D plughw:1,0 -f cd -t wav -d 5 -r 16000 voice.wav
# Transcribe
./whisper.cpp/main -m ~/models/ggml-small.bin -f voice.wav -otxt
# Use the text file as prompt to the LLM, then TTS

Wrap this in a Python script to orchestrate the steps and produce a one-file demo. This project is great for classroom demos about speech-to-text pipelines and responsible AI (local, private). For lessons and starter templates that combine local inference with cloud fallbacks, see our hybrid edge/cloud workflows playbook.
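
A minimal orchestration sketch, assuming the paths used above (the arecord device, whisper.cpp binary, and model files are placeholders to adjust for your setup):

cat > ~/voice_assistant.py << 'PY'
import os, subprocess
from llama_cpp import Llama

HOME = os.path.expanduser('~')
WAV = os.path.join(HOME, 'voice.wav')

# 1. Record 5 seconds of audio (same arecord flags as above)
subprocess.run(['arecord', '-D', 'plughw:1,0', '-f', 'cd', '-t', 'wav',
                '-d', '5', '-r', '16000', WAV], check=True)

# 2. Transcribe; -otxt writes the text next to the input as voice.wav.txt
subprocess.run([os.path.join(HOME, 'whisper.cpp', 'main'),
                '-m', os.path.join(HOME, 'models', 'ggml-small.bin'),
                '-f', WAV, '-otxt'], check=True)
with open(WAV + '.txt') as f:
    question = f.read().strip()

# 3. Ask the local LLM
llm = Llama(model_path=os.path.join(HOME, 'models', 'small.gguf'))
resp = llm.create_completion(prompt=question, max_tokens=150)
answer = resp['choices'][0]['text'].strip()
print('Q:', question)
print('A:', answer)

# 4. Speak the answer (sudo apt install -y espeak-ng)
subprocess.run(['espeak-ng', answer], check=False)
PY
python ~/voice_assistant.py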

Project 3 — Image Captioning + Small Image Generation

Purpose: Show how to combine a vision model with a generative model. On Pi 5, full SDXL is heavy — use a tiny/optimized image model or offload to a small ONNX model that runs with the HAT runtime.

  1. Use a vision encoder (CLIP-like) distilled model to create captions.
  2. Feed captions into the LLM for refinement.
  3. Optionally run a lightweight image generator (a small ONNX or heavily quantized diffusion model) or provide a server-side fallback if local generation is too slow.
# Example: caption an image using a small Hugging Face model
pip install transformers pillow
python - << 'PY'
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

img = Image.open('photo.jpg')
processor = BlipProcessor.from_pretrained('Salesforce/blip-image-captioning-base')
model = BlipForConditionalGeneration.from_pretrained('Salesforce/blip-image-captioning-base')
inputs = processor(images=img, return_tensors='pt')
out = model.generate(**inputs)
caption = processor.decode(out[0], skip_special_tokens=True)
print('Caption:', caption)
PY
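
To cover step 2 (refinement), feed the caption into the same local LLM used earlier (a sketch; the caption string and model path are placeholders):

python - << 'PY'
import os
from llama_cpp import Llama

caption = 'a dog sitting on a wooden bench in a park'  # output from the captioning step
llm = Llama(model_path=os.path.expanduser('~/models/small.gguf'))
resp = llm.create_completion(
    prompt=f'Rewrite this image caption as one vivid, friendly sentence: {caption}',
    max_tokens=60)
print(resp['choices'][0]['text'].strip())
PY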

If you then want to generate or refine images, try a small ONNX model or use an API as a hybrid approach — part local, part cloud — and clearly label what runs where for privacy transparency.

Optimization & real-world tips (what I learned teaching these labs)

  • Use quantized models: 4-bit and 8-bit GGUF models dramatically reduce RAM and inference time. See edge performance guidance for profiling on-device signals.
  • Model selection matters: choose models explicitly licensed for local/offline use.
  • Enable a swap file carefully: it helps avoid OOM errors with larger models but slows inference. Use a fast NVMe drive if available (see the commands after this list).
  • Thermals and power: sustained inference can trigger throttling. Add a fan/heatsink and a good PSU.
  • Benchmark: run small prompt loops to measure tokens/sec; log the results for classroom reports. For broader benchmarking and deployment patterns see edge AI platform notes.
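
If you do enable swap, the stock dphys-swapfile service on Raspberry Pi OS manages it (a sketch; 2048 MB is only an example size, and swap on a microSD card will be slow):

# Edit CONF_SWAPSIZE (e.g. CONF_SWAPSIZE=2048), then apply it
sudo nano /etc/dphys-swapfile
sudo dphys-swapfile swapoff
sudo dphys-swapfile setup
sudo dphys-swapfile swapon
free -h   # verify the new swap size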

Troubleshooting checklist

  • No device found after install: verify kernel headers & recompile modules. Check vendor issues on GitHub.
  • Slow model response: confirm quantized model in use and that vendor runtime is enabled.
  • Model fails to load: insufficient RAM — pick a smaller GGUF or use a swap file.
  • Permission errors: ensure udev rules applied; reboot after install.

Security, licensing, and classroom best practices (2026)

Edge AI makes privacy easier but not automatic. In 2026, institutions expect students to document model origins and licenses. For every model you use, keep a simple README in your project that documents:

  • Model name & source (URL)
  • License or terms of use
  • Quantization applied and any conversion steps

Also follow these safety practices:

  • Don't expose local models to the public internet without authentication. See privacy-by-design patterns for APIs if you expose a local service.
  • Label outputs when showing demos ("generated text/image") and avoid using sensitive training prompts.

Beyond licensing and safety, by early 2026 the ecosystem had matured in three ways that directly affect Pi projects:

  1. Standardized quantized formats (GGUF) — making distribution and inference consistent across runtimes.
  2. Wider ARM/NPU optimization — vendors provide optimized runtimes for small NPUs, improving throughput.
  3. Hybrid edge/cloud workflows — it is now common practice to run private prompts locally and fall back to the cloud for heavy tasks; see the hybrid playbook for recommended patterns and the minimal fallback sketch below.
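
A minimal local-first sketch of that hybrid pattern (the fallback branch is a placeholder; substitute whatever hosted API your class actually uses and label remote outputs clearly):

python - << 'PY'
import os
from llama_cpp import Llama

llm = Llama(model_path=os.path.expanduser('~/models/small.gguf'))

def answer(prompt, max_tokens=150):
    try:
        resp = llm.create_completion(prompt=prompt, max_tokens=max_tokens)
        return resp['choices'][0]['text']
    except Exception as exc:  # e.g. out-of-memory or runtime failure
        # Placeholder: call your cloud endpoint here and mark the output as remote
        return f'[local inference failed ({exc}); route this prompt to your cloud fallback]'

print(answer('Summarise why on-device inference helps privacy.'))
PY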

Resources & learning next steps

  • Hugging Face model hub — filter by license and size
  • llama.cpp and llama-cpp-python repos — fast local inference helpers
  • whisper.cpp — tiny offline transcription
  • Vendor's AI HAT+ 2 GitHub & docs (always the authoritative driver source)
  • Field kits and beginner STEM hardware: see the FieldLab Explorer Kit for ideas on hands-on classroom kits.

Wrap-up: actionable takeaways

  • Install vendor drivers first, then set up a Python virtualenv.
  • Start small: run a tiny GGUF model with llama.cpp to validate your stack.
  • Build one project end-to-end (chat, voice, or vision) to produce a portfolio demo.
  • Document model licenses and performance notes for reproducibility.

Call to action

Ready to build your first Pi 5 + AI HAT+ 2 project? Pick one of the three projects above, follow the step-by-step commands, and publish a short demo video or GitHub repo. If you want guided lessons, check our practical workshops at webbclass that walk students through each lab with starter templates and graded assignments. Start now — local AI on the Pi is a perfect, low-cost way to learn modern ML engineering in 2026.


Related Topics

#RaspberryPi #AI #Tutorial

webbclass

Contributor

Senior editor and content strategist. Writing about technology, design, and the future of digital media. Follow along for deep dives into the industry's moving parts.
