Building an AI Analysis Lab: Introduction

All rigorous AI Analysis needs a sturdy AI Analysis Lab. There is no better way to learn than to get hands on, so let’s get hand on. If you have not seen some of the practical examples in the Malice in the Mesh series, I highly suggest you go over there and check it out. They are the product of our AI Analysis Lab put to good use, and have crystallized the concepts discussed. We have gone from theory to practice, and this multi-part walkthrough is the recipe for building the infrastructure to do so.

Our lab stack will consist of these tools:

Layer 0a: Host-native observability

eBPF and Linux tracing toolkit. Installed in Phase 1, used in nearly every later phase.

bpftrace: high-level eBPF tracing language. Custom scripts for execve, openat, connect, write, clone.
bcc-tools (bpfcc): pre-built eBPF utilities. The blueprint specifically wraps execsnoop-bpfcc, opensnoop-bpfcc, and tcpconnect-bpfcc into lab-*.sh helper scripts.
perf: kernel performance counters and sampling profiler. CPU profiles during inference.
strace: syscall tracer. Wrapped as lab-strace.sh
ltrace: library call tracer.
auditd: kernel audit subsystem for file access and execve monitoring; persistent rule-based logging.
bpftool: inspect loaded eBPF programs (validates Falco modern_ebpf driver is loaded).
tcpdump: packet capture, wrapped as lab-tcpdump.sh.
tshark : Wireshark CLI dissector for parsing pcaps
ss: socket listing.
/proc filesystem: process inspection (/proc/<pid>/environ, etc.).
Falco (host mode): runtime security with the modern_ebpf driver.

Layer 0b: Kubernetes observability substrate

Docker: container runtime under kind.
kind: Kubernetes-in-Docker for the lab cluster (3 nodes: 1 control-plane + 2 workers).
kubectl: cluster CLI.
Helm: chart-based installs for Cilium, Tetragon, Falco.
Cilium: eBPF-based CNI (replaces kind’s default CNI).
Cilium CLI: install and status management.
Hubble: Cilium’s flow observability layer (pod-to-pod flows, DNS, TLS/JA3).
Hubble CLI and Hubble UI: flow inspection.
Tetragon: eBPF-based process and file event tracing in Kubernetes; uses TracingPolicy CRDs.
tetra CLI: Tetragon’s command-line client.
Falco (Kubernetes mode): runtime security inside the cluster.

Layer 1: Model gateway

LiteLLM: the single observable entrypoint for all frontier-model requests. Also serves as the front-end for local models in Phase 9.
PostgreSQL: metadata persistence for LiteLLM, reused by LangGraph for checkpoints.

Layer 2: Frontier model path

Anthropic API (Claude): locked frontier provider, reached via LiteLLM.

Layer 3: Tool protocol boundary (MCP)

MCP Python SDK (mcp): installed into the lab venv to build both transports.
Custom stdio MCP server: built in-lab.
Custom network MCP server: built in-lab (HTTP transport).
FastMCP: cloned for repo analysis in Phase 6.
python-sdk (official MCP): second analysis target repo in Phase 6.

RAG / memory extension stack

ChromaDB: vector store (pinned around FastAPI <0.132 due to a wire format regression).
sentence-transformers / HuggingFace embeddings: embedding model.
FastAPI, httpx, uvicorn: service stack underneath Chroma.

The construction of our lab is purposeful, one phase building upon the last and laying the groundwork for the next.

Phase 1 and 2 turns a Ubuntu 24.04 box into an observability station where we can drop anything into the middle and collect telemetry. Every tool has a validation test that proves it works on a known good process before any AI workload exists. At these phases is where we deploy Layer 0a and Layer0b from above.

Phase 3 is where we configure the gateway. LiteLLM deployed in cluster, backed by PostgreSQL, with Claude API. The gateway becomes the single chokepoint through which every future model call flows, which is what makes correlation possible later.

Phase 4 is where we set up the orchestrator. LangGraph wired to LiteLLM for model calls and to the same Postgres for checkpoint persistence.

Phase 5 is the tool boundary. We build one stdio MCP server and one network MCP server.

Now we begin to drop real subjects into our lab for analysis.

Phase 6 mounts real repos. FastMCP and the official MCP python-sdk, and builds a task library against them: map repo structure, find entrypoint, read depositories, run tests, read README. Then it wraps those tasks in a LangGraph agent and runs the whole thing under full instrumentation. Now you’re capturing the syscalls and network egress that happen when an agent actually walks a real codebase.

Phase 7 is the real meat. We add the sandbox action layer: OpenHands in Docker sandbox mode, model calls routed through LiteLLM to Claude, real repos mounted into the workspace. Action tasks now include things that have side effects: running tests, making code changes, starting local services.

Phase 8 points the host-native instrumentation at coding agents running directly on the OS. We capture syscall fingerprints with bpftrace, write Falco rules that detect coding-agent behavior patterns, and compare agents side-by-side using a standard task.

Phase 9 adds local model serving: GPT-J-6B registered in LiteLLM as just another upstream, which means every exercise from Phase 4 onward can now be re-run against local instead of frontier and the telemetry compared directly.

Phase 10 adds n8n as the first workflow-builder target, with a webhook to LLM to action workflow studied at the network and process layers, plus community-node installation as a supply-chain study.

Come back for the next installment where we will dive into the first phase.

The content published on this site reflects personal views and research only. It does not represent the views, positions, or policies of any current or former employer, client, or affiliated organization.

Any references to technologies, vulnerabilities, or security practices are for educational and informational purposes only. Nothing on this site should be interpreted as endorsement, disclosure of confidential information, or professional advice.

All examples are generalized or fictionalized unless explicitly stated otherwise.

Latest Posts

roccofiorecyber@gmail.com

Building an AI Analysis Lab: Part 2May 20, 2026
Welcome to the second installment of our AI Analysis Lab. Part 1 gave a thorough ordering of our host-native stack. Part 2 walks through the build of the Kubernetes observability layer that turns nameless syscalls into named flows. A kind cluster running Cilium as the CNI, Hubble for the flow log, Tetragon for in-container process… Read more: Building an AI Analysis Lab: Part 2
Building an AI Analysis Lab: Part 1May 11, 2026
Welcome to the first installment of our AI Analysis Lab, built for running the practical examples in the Malice in the Mesh series, as well as further continuing analysis. The first thing to know as we jump right in, is the examples below are examples only. While much of the same code and process will… Read more: Building an AI Analysis Lab: Part 1
Building an AI Analysis Lab: IntroductionMay 10, 2026
All rigorous AI Analysis needs a sturdy AI Analysis Lab. There is no better way to learn than to get hands on, so let’s get hand on. If you have not seen some of the practical examples in the Malice in the Mesh series, I highly suggest you go over there and check it out.… Read more: Building an AI Analysis Lab: Introduction
Malice in the Mesh // 06: Behavioral Analysis at the Network and Memory LayersMay 9, 2026
Welcome back. For the foreseeable future, we will be tacking detection engineering for AI Agents one detection surface at a time. In this installment, we begin by diving into behavioral analysis at the network layer. Each analysis type may be comprised of multiple installments, as we move through the discipline in a slow, but smooth… Read more: Malice in the Mesh // 06: Behavioral Analysis at the Network and Memory Layers
Malice in the Mesh // 05: Intro to Detection Engineering for AI Agent SystemsMay 2, 2026
If modern AI agents were as simple as one outbound API call and a model output, there would not be a need for a comprehensive guide regarding detection engineering of these systems. Agents and orchestration tools have grown to now encompass tool use, file access, local execution, memory, workflow orchestration, network access. Not only have… Read more: Malice in the Mesh // 05: Intro to Detection Engineering for AI Agent Systems