Building an AI Analysis Lab: Part 1

Welcome to the first installment of our AI Analysis Lab, built for running the practical examples in the Malice in the Mesh series and for further analysis beyond it. The first thing to know as we jump in: the examples below are examples only. While much of the same code and process will work for your own homelab, numerous pieces are specific to my Ubuntu 24.04 virtual machine environment, the variables it contains, and my own preferences.

While I am the primary architect of the AI Analysis Lab, Claude assisted with code generation. Artificial intelligence can make errors, and subtle ones at that. I strongly suggest that you always proofread, unit test, validate, and verify in your own lab, and that you segregate all work in a virtual machine that can be cloned, rolled back, and thrown away, much like a malware analysis lab.

Important: I use the root user throughout this lab, and the lab is constructed for that. Using root is a security risk. I am in my homelab using an ephemeral virtual machine, so I accept that risk; for me, the efficiency gain outweighs it.

Spinning up your virtual machine

We download a fresh Ubuntu image and set up our virtual machine.

Section 0: Host readiness check

Before installing anything, confirm the host meets the floor. The lab assumes Ubuntu 24.04 with a 6.x kernel, because eBPF feature availability depends on the kernel version and most of the tooling for agent analysis (bpftrace one-liners, modern bcc tools, Falco’s modern_ebpf driver) targets 6.x.

You want Ubuntu 24.04.x, kernel 6.8.0 or higher, and a count of CONFIG_BPF entries in the kernel config greater than zero. If you are stuck on a 5.x kernel, install the HWE kernel and reboot before going further. Trying to follow this post on an older kernel produces lots of failures whose root cause is “your kernel does not have the eBPF features the tools expect.”
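A readiness check along these lines might look like the sketch below. It is not the lab manual's exact script: version_at_least is a hypothetical helper, and the check assumes GNU sort -V and that the kernel build config lives at /boot/config-<release>, as it does on stock Ubuntu.

```shell
# Hypothetical helper: succeeds when version $2 is at least version $1.
# Relies on GNU sort -V (version sort).
version_at_least() {
    local minimum="$1" actual="$2"
    [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

kernel="$(uname -r)"
echo "kernel: ${kernel}"

# Compare only the numeric part, e.g. 6.8.0 from 6.8.0-31-generic.
if version_at_least 6.8.0 "${kernel%%-*}"; then
    echo "kernel floor met"
else
    echo "kernel below 6.8.0: install the HWE kernel and reboot"
fi

# Count BPF-related entries in the running kernel's build config, if present.
config="/boot/config-${kernel}"
if [ -r "$config" ]; then
    echo "CONFIG_BPF entries: $(grep -c CONFIG_BPF "$config")"
else
    echo "no kernel config found at ${config}"
fi
```

The distribution release itself can be read from /etc/os-release.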

This is why you use an ephemeral virtual machine. Messing with the kernel on your daily driver for this lab is… not advised.

Section 1: System update and base dependencies

The linux-headers package is the one to pay attention to. bpftrace will fail at runtime with errors like “Could not resolve symbol” if the headers for your running kernel are not installed; this is the single most common cause of mystery bpftrace failures on a fresh box. The rest are general-purpose build and inspection tools you will use throughout the series.
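One quick way to confirm the headers match the running kernel is sketched below. headers_dir_for is a hypothetical helper; it assumes the Debian/Ubuntu layout, where the headers package provides the /lib/modules/<release>/build link.

```shell
# Hypothetical helper: where the headers for a kernel release show up
# on Debian/Ubuntu systems.
headers_dir_for() {
    printf '/lib/modules/%s/build\n' "$1"
}

kernel="$(uname -r)"
if [ -e "$(headers_dir_for "$kernel")" ]; then
    echo "headers found for ${kernel}"
else
    echo "headers missing for ${kernel}"
    echo "as root: apt-get install -y linux-headers-${kernel}"
fi
```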

Section 2: Install all tracing and network tools

The lab manual installs everything in one shot rather than tool by tool:

Section 3: Verify the installs

Verification and validation are incredibly important every step of the way. We won’t belabor this point, and I will leave the frequency up to you to decide, but I wanted to set the stage here:

Section 3.5: Raise inotify limits

This step looks unimportant and is in fact load-bearing, especially on Ubuntu Desktop. Firefox, VS Code, Docker, and any number of file watchers consume inotify resources by default, and the stock limits are too low for a host that adds Falco, auditd, and (later in the series) Cilium and Tetragon on top.

If you skip it, Falco refuses to start with “could not initialize inotify handler”, and auditd may log “Failed to allocate directory watch: Too many open files” during install. I have lost a real hour to this; the lab manual learned it the hard way too.
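As a rough sketch of this step, the snippet below reports the current limits and prints a sysctl drop-in. The two values and the drop-in file name are my assumptions for illustration, not the lab manual's exact settings; size them to your own VM.

```shell
# Hypothetical limits; pick values that suit your VM's memory.
inotify_conf() {
    cat <<'EOF'
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 1024
EOF
}

# Report the limits currently in effect.
for key in max_user_watches max_user_instances; do
    file="/proc/sys/fs/inotify/${key}"
    if [ -r "$file" ]; then
        echo "current ${key} = $(cat "$file")"
    fi
done

# Preview the drop-in. To apply it as root (file name is an assumption):
#   inotify_conf > /etc/sysctl.d/99-inotify-lab.conf && sysctl --system
inotify_conf
```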

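Verify by reading the limits back, for example with sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances.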

Section 4: Install Falco in host mode with the modern eBPF driver

Falco is a runtime security tool that reads syscalls through its own eBPF driver, evaluates them against a rule engine, and emits structured alerts. For the rest of this lab, every “detect when an agent does X” question has a Falco rule as part of the answer.

The dialog will prompt for driver selection. Choose Modern eBPF and Yes for automatic ruleset updates.

Section 5: Verify Falco

Expect Falco 0.39 or higher, the service in active state, the default ruleset loaded from /etc/falco/falco_rules.yaml, and Falco’s eBPF programs visible in bpftool output. That last check confirms the security tool’s eBPF programs are attached to the kernel and reading the same syscall stream you are about to start reading by hand.

If Falco fails to start, the most common cause is a driver mismatch:

Section 6: Deploy a custom Falco rule

The rule below detects curl execution. It is deliberately trivial because the goal here is to validate the deployment mechanism, not the rule itself.

The rule’s anatomy is worth understanding because every Falco rule in the rest of the series follows this shape. condition is a boolean expression over Falco’s macros and fields. spawned_process is a macro defined in the default ruleset (not a language built-in) that matches execve events, and proc.name is the basename of the executed binary. output is the alert template, with %-prefixed field interpolations that get filled in at alert time. priority controls the log level and (depending on your Falco config) whether the alert is even emitted. tags are arbitrary labels you can use to group, filter, or route alerts later.
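To make that anatomy concrete, here is a sketch of what such a rule could look like, written out from a shell function. The rule name, description, and tags are illustrative assumptions; the condition and the output string follow the description in this post.

```shell
# Print a hypothetical lab rule. Name, desc, and tags are illustrative.
lab_rule_yaml() {
    cat <<'EOF'
- rule: Lab Test Curl Execution
  desc: Deployment test only, fires whenever curl is executed
  condition: spawned_process and proc.name = curl
  output: "Lab test rule triggered (user=%user.name command=%proc.cmdline parent=%proc.pname)"
  priority: NOTICE
  tags: [lab, test]
EOF
}

# Write the candidate rule to a scratch file for review before deploying it.
rule_file="$(mktemp)"
lab_rule_yaml > "$rule_file"
echo "wrote candidate rule to ${rule_file}"
cat "$rule_file"
```

One way to deploy it is to copy the file into /etc/falco/rules.d/, which is in Falco's default rules_files list, and restart the Falco service. Because the rule uses spawned_process, it has to load after the default ruleset that defines that macro.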

Section 7: Configure auditd and kernel parameters

This step does two things. First, auditd is started and enabled at boot. Second, the kernel’s perf_event_paranoid parameter is set to -1, which lifts the restriction on non-root use of performance counters. The upstream kernel default is 2 (Ubuntu kernels may ship an even stricter value), which is appropriate for production but breaks perf and several bcc tools in subtle ways for a lab user. The setting persists via the sysctl.d file.

Do not set perf_event_paranoid=-1 on a production server. Lab hosts only.

Section 8: Create the lab directory structure

Note: using root user as stated above.

Every wrapper script in Section 9 writes its output to evidence/phase-NN/ with a timestamped filename. When you eventually point five different tracers at one Claude Code session, every artifact lands in the same directory with a coherent timestamp range, and you can find them again later.

evidence/ is for captured artifacts (logs, pcaps, strace traces). configs/ is for tool configuration files you want to version. scripts/ is for the wrappers in the next section. experiments/, repos/, logs/, and notes/ are workspace for actual analysis sessions later.
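A minimal sketch of creating that layout follows. make_lab_tree and LAB_ROOT are hypothetical names, and LAB_ROOT defaults to a throwaway temp directory here so the snippet is safe to try; point it at your real lab location instead.

```shell
# Create the lab layout described above under the given root directory.
make_lab_tree() {
    local root="$1" sub
    for sub in evidence configs scripts experiments repos logs notes; do
        mkdir -p "${root}/${sub}" || return 1
    done
    # Evidence is grouped per phase, e.g. evidence/phase-01/.
    mkdir -p "${root}/evidence/phase-01"
}

# Assumption: a scratch location unless LAB_ROOT is already set.
LAB_ROOT="${LAB_ROOT:-$(mktemp -d)}"
make_lab_tree "$LAB_ROOT"
find "$LAB_ROOT" -type d | sort
```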

Section 9: Reusable evidence-capture wrappers

Five wrappers, all using the same conventions. Each takes <phase> and <experiment> arguments to determine the output path and filename prefix, appends a timestamp, and defaults the duration to a sensible value while letting you override it. Ad-hoc tracer runs produce ad-hoc files, and ad-hoc files become unparseable noise within a week. These wrappers force a discipline that pays off the first time you have to compare two captures.

9a: execsnoop wrapper

Captures every new process system-wide. This is what you leave running in a background terminal while a coding agent works on a real task.
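The shared convention could be sketched like this. evidence_path and capture_execsnoop are hypothetical names, the 60-second default and the demo LAB_ROOT are my assumptions, and the real wrappers may differ in detail.

```shell
# Assumption: a demo root; set LAB_ROOT to your real lab directory.
LAB_ROOT="${LAB_ROOT:-${TMPDIR:-/tmp}/ai-lab-demo}"

# Build evidence/phase-NN/<experiment>-<tool>-<UTC timestamp>.log
evidence_path() {
    local phase experiment="$2" tool="$3"
    # Strip leading zeros so printf does not read "08" as octal.
    phase="$(printf '%s' "$1" | sed 's/^0*//')"
    printf '%s/evidence/phase-%02d/%s-%s-%s.log\n' \
        "$LAB_ROOT" "${phase:-0}" "$experiment" "$tool" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# capture_execsnoop <phase> <experiment> [duration_seconds]
capture_execsnoop() {
    local out duration="${3:-60}"
    out="$(evidence_path "$1" "$2" execsnoop)"
    mkdir -p "$(dirname "$out")"
    echo "capturing for ${duration}s to ${out}"
    # SIGINT lets the tool exit cleanly and flush its buffered output.
    timeout --signal=INT "$duration" execsnoop-bpfcc -T > "$out"
}

# Show the path a capture would use. As root: capture_execsnoop 3 baseline 120
evidence_path 3 baseline execsnoop
```

The same evidence_path helper would serve the other four wrappers by swapping the tool name and the traced command.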

9b: opensnoop wrapper

The same idea for file opens. Critical for catching what configuration, model, or repository files an agent is actually reading.

9c: tcpconnect wrapper

Outbound TCP connections. The wrapper you point at an agent to capture every API call, package manager fetch, and unauthorized egress in one log.

9d: strace wrapper

This one does the heavy lifting. Two modes: trace a command from start to finish, or attach to a running PID. The flag set -f -T -tt -yy is the canonical fingerprinting combination: follow forks (so a process that fans out into subprocesses doesn’t lose you), per-syscall timing (so you can produce a timing histogram), microsecond timestamps (so you can correlate against tcpdump captures), and resolve file descriptors to paths and socket addresses (so read(3, …) becomes read(3</home/user/.config/claude/credentials>, …)).
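Both modes can share one code path, as in the sketch below. run and trace_target are hypothetical helpers, and DRY_RUN is an assumed convention that prints the command instead of executing it, which is useful for reviewing a wrapper before running it as root.

```shell
# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# trace_target <outfile> -p <pid>       attach to a running process
# trace_target <outfile> <cmd> [args]   trace a command from start to finish
trace_target() {
    local out="$1"
    shift
    # -f follow forks, -T per-syscall duration, -tt microsecond timestamps,
    # -yy resolve file descriptors to paths and socket addresses.
    run strace -f -T -tt -yy -o "$out" "$@"
}

DRY_RUN=1 trace_target /tmp/demo.strace -p 1234
DRY_RUN=1 trace_target /tmp/demo.strace ls /tmp
```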

9e: tcpdump wrapper

Writes pcaps you can read with tcpdump -r or open in Wireshark.

With these five wrappers in place, every later phase of the lab can capture coherent evidence by chaining a few of them in parallel: execsnoop and opensnoop and tcpconnect against an agent process, with strace attached to the specific PID of interest, while tcpdump records the wire view of the same window.

Section 10: Validation for bpftrace execve tracing

Every coding agent, MCP server, tool invocation, and shell side-effect surfaces as an execve syscall. If you can see execves reliably, you can see every interesting thing an agent does that involves spawning a process. If you cannot, nothing else in this lab works.

Terminal 1 start the trace:

Terminal 2 generate activity:

Terminal 1 should print one line per command, with the PID, the calling shell name, and the absolute path of the binary about to be executed. Ctrl+C in Terminal 1 when done.

The probe itself is worth dissecting because it is the model for every bpftrace one-liner you will write later. tracepoint:syscalls:sys_enter_execve is the kernel’s stable tracepoint for execve entry. The action block { printf(...) } runs in the kernel each time the tracepoint fires. pid and comm are built-in variables; args.filename is the syscall’s first argument, retrieved from the tracepoint’s argument structure. str() reads the NUL-terminated string at that pointer, which refers to the calling process’s memory, into a value bpftrace can print.
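Assembled from the pieces just described, the probe might look like the sketch below. The column widths, the DURATION variable, and the bounded run via timeout are my assumptions; the args.filename syntax assumes a recent bpftrace such as the one packaged for Ubuntu 24.04.

```shell
EXECVE_PROBE='tracepoint:syscalls:sys_enter_execve { printf("%-8d %-16s %s\n", pid, comm, str(args.filename)); }'

# Terminal 1: run the probe for a bounded window (needs root and bpftrace).
if command -v bpftrace >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    timeout --signal=INT "${DURATION:-30}" bpftrace -e "$EXECVE_PROBE" \
        || echo "bpftrace stopped with status $?"
else
    echo "skipping: bpftrace as root is required"
    printf 'probe: %s\n' "$EXECVE_PROBE"
fi

# Terminal 2, while the probe runs, generate a few execves, for example:
#   ls /tmp; whoami; date
```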

Section 11: Validation for bpftrace openat tracing

Same idea, different syscall. openat is the syscall that nearly every file open resolves to on modern Linux, because glibc’s open() calls it under the hood.

Notice the filter /comm == "cat"/. Without it, openat tracing floods you with thousands of lines per second from every other process on the system. Processes open files constantly, and most of those opens are noise. The comm filter narrows the firehose to a specific process. This pattern, narrowing the probe by comm, pid, tgid, or uid, is what makes bpftrace one-liners usable in production.

Terminal 1 start the trace:

Terminal 2 generate activity:

Expect a single line in Terminal 1 showing the PID and /etc/os-release. Ctrl+C when done.
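The filter pattern generalizes to any process name. As a sketch, openat_probe_for below is a hypothetical helper that builds the filtered probe text for a given comm; the output format is my assumption.

```shell
# Build an openat probe narrowed to one process name (comm).
openat_probe_for() {
    printf 'tracepoint:syscalls:sys_enter_openat /comm == "%s"/ { printf("%%-8d %%s\\n", pid, str(args.filename)); }\n' "$1"
}

probe="$(openat_probe_for cat)"
printf '%s\n' "$probe"

# Terminal 1 (as root): bpftrace -e "$probe"
# Terminal 2:           cat /etc/os-release
```

Swapping the comm test for pid == 1234 or uid == 1000 gives the other narrowing variants.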

Section 12: Validation for bpftrace connect tracing

bpftrace connect tracing covers outbound network connections. This is the probe you point at coding agents to catch model API egress, package manager downloads, and (if anything is going wrong) unauthorized C2 traffic.

Terminal 1 start the trace:

Terminal 2 generate activity:

Expect one line in Terminal 1 showing curl’s connect(). Ctrl+C when done.

Section 13: Validation for execsnoop

execsnoop-bpfcc is the ready-made version of the execve probe from Section 10. Less customizable but takes zero scripting. Produces wider output with PPID and full argument arrays.

Terminal 1 start the trace:

Terminal 2 generate activity:

Expect three lines in Terminal 1, each showing the command with PID, PPID, and arguments.

Section 14: Validation for opensnoop

The analog for file opens. Same trade-off as execsnoop: no scripting, wider default output.

Expect lines showing the open() call for /etc/passwd with PID and process name.

Section 15: Validation for tcpconnect

The analog for outbound connections. This is the one you typically reach for first when you want to know what an unfamiliar process is talking to, because it includes the destination IP and port without you having to write a probe.

Terminal 1 start the trace:

Terminal 2 generate activity:

Expect one line in Terminal 1 showing curl connecting to the resolved IP of example.com on port 443.

Section 16: Validation for strace single command

bpftrace and bcc are good at watching the whole system; strace is good at watching one specific process in deep detail. The trade-off is that strace adds nontrivial overhead: a strace’d process can be five to ten times slower than an untraced one, but you get a complete syscall log for that process.

Expect lines showing openat() calls for shared libraries (the runtime loader resolving libc, libdl, etc.), the /tmp directory listing itself, and the ls binary’s execve. The -f flag follows forks, which matters when an agent spawns subprocesses. -e trace= filters which syscalls to display.

Section 17: Validation for strace with timing

The -T flag is the one that matters most for behavioral fingerprinting. It appends per-syscall duration to each line.

Expect syscall lines with timing values like <0.000045> appended. The reason this matters: the syscall-timing histogram of a coding agent’s startup is a behavioral fingerprint.
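To turn those durations into a histogram, something like the sketch below works on any strace -T log. timing_histogram is a hypothetical helper and the bucket boundaries are arbitrary choices of mine; the three input lines are synthetic, shaped like strace -T output rather than taken from a real capture.

```shell
# Bucket strace -T durations (the trailing <seconds> field) by magnitude.
# Reads the named files, or stdin when no file is given.
timing_histogram() {
    awk '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            secs = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (secs < 0.0001)      bucket = "1_lt_100us"
            else if (secs < 0.001)  bucket = "2_lt_1ms"
            else if (secs < 0.01)   bucket = "3_lt_10ms"
            else                    bucket = "4_ge_10ms"
            count[bucket]++
        }
        END { for (b in count) print b, count[b] }
    ' "$@" | sort
}

# Synthetic input for illustration only.
timing_histogram <<'EOF'
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000045>
close(3) = 0 <0.000012>
connect(3, {sa_family=AF_INET}, 16) = 0 <0.004200>
EOF
```

Comparing the bucket counts from two runs of the same agent is a cheap first test of whether its startup behavior changed.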

Section 18: Validation for strace attach to running process

The third strace mode: attach to a process that is already running. This is how you analyze a long-lived agent (an MCP server, a daemon, a coding-agent worker process) without restarting it.

Expect strace to attach, show the restart_syscall from sleep, and exit cleanly when the target is killed.

Section 19: Validation for perf

perf reads the kernel’s performance counters: CPU cycles, instructions retired, context switches, page faults, cache misses. For agent analysis it is most useful as a quick way to characterize the resource shape of a workload.

If you get a permission-denied error, the sysctl from Section 7 was not applied:

It should return -1. If not:
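A check-and-remediate step could be sketched as follows. perf_paranoid_ok is a hypothetical helper; the snippet only reads the current value and prints the command to run, leaving the change itself to you.

```shell
# Succeeds only when the given value is the lab setting of -1.
perf_paranoid_ok() {
    [ "$1" = "-1" ]
}

file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$file" ]; then
    current="$(cat "$file")"
    if perf_paranoid_ok "$current"; then
        echo "perf_event_paranoid is -1"
    else
        echo "perf_event_paranoid is ${current}, expected -1"
        echo "as root: sysctl -w kernel.perf_event_paranoid=-1"
    fi
else
    echo "cannot read ${file}"
fi
```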

Section 20: Validation for ltrace

ltrace traces library calls (libc functions, dynamic-loader resolution) rather than syscalls. It sits one layer above strace. Less essential for syscall-level analysis, more useful when you want to see what getenv, malloc, getpwnam, or other libc-level activity a process performs.

Expect lines showing getenv() calls for environment variables like LS_COLORS and COLUMNS.

For agent analysis, ltrace becomes useful when you want to know what environment variables an agent is reading at startup, which is sometimes how API keys and configuration paths get pulled from the environment.

Section 21: Validation for auditd file access monitoring

auditd is the userspace daemon for the kernel’s audit subsystem, driven through tools such as auditctl and ausearch. It is slower and noisier than eBPF but writes durable records to disk that survive reboots. For agent forensics, this matters: bpftrace output disappears when the trace stops, but an audit log of every read is still on disk a week later.

This validation has more moving parts than the previous ones. It sets up an audit watch on a directory, exercises the watch, queries the audit log, and tears the watch down.

The auditctl flags: -w adds a watch on the path; -p rwa filters by permission (read, write, attribute change); -k tags the rule with a key for later querying. ausearch -k retrieves matching audit records. -W removes the watch with the same parameters.

Expect audit records showing the file read event, including the process name and file path. This is the durable, on-disk equivalent of an opensnoop capture.
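The watch cycle described above can be laid out as a reviewable plan before anything runs as root. In this sketch, audit_watch_plan is a hypothetical helper, and the directory, file name, and key are placeholders of mine; it assumes paths without spaces.

```shell
# Print the set-up, exercise, query, and tear-down steps for one audit watch.
audit_watch_plan() {
    local dir="$1" key="$2"
    printf 'mkdir -p %s && echo demo > %s/secret.txt\n' "$dir" "$dir"
    printf 'auditctl -w %s -p rwa -k %s\n' "$dir" "$key"
    printf 'cat %s/secret.txt > /dev/null\n' "$dir"
    printf 'ausearch -k %s -i\n' "$key"
    printf 'auditctl -W %s -p rwa -k %s\n' "$dir" "$key"
}

audit_watch_plan /tmp/audit-demo lab-watch

# After reviewing the output, run it as root with:
#   audit_watch_plan /tmp/audit-demo lab-watch | sh -x
```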

Section 22: Validation for auditd execve monitoring

This watches every execve call rather than file access on a specific path.

Expect audit records for the ls and whoami executions.

Section 23: Validation for Falco default rule

Falco ships with a default ruleset that catches well-known suspicious behaviors. One of them is reading sensitive system files like /etc/shadow. This validates that the bundled rules are loaded and firing.

Terminal 1 start the trace:

Terminal 2 generate activity:

Expect a Falco alert in Terminal 1 about a sensitive file being opened for reading. Ctrl+C when done.

Section 24: Validation for Falco custom rule

This validates the rule deployed in Section 6. If this fires, the custom-rule pipeline works end to end, and you can write more rules with confidence that they will load and trigger.

Terminal 1 start the trace:

Terminal 2 generate activity:

Expect an alert in Terminal 1 reading Notice Lab test rule triggered (user=... command=curl ... parent=...). Ctrl+C when done.

Section 25: Validation for tcpdump

tcpdump reads packets directly from the network stack, which means it sees everything the host sends and receives, regardless of which process produced it. For agent analysis, this is the ground truth against which everything else is calibrated.

Terminal 1 start the trace:

Terminal 2 generate activity:

Expect a DNS query for example.com, then a TCP SYN to the resolved IP on port 443, then TLS handshake packets. tcpdump exits after capturing 20 packets.

For agents, the most common tcpdump target is loopback traffic, for example host 127.0.0.1 and port 4000 when you have LiteLLM bound to localhost. There is a real BPF gotcha to be aware of here: a bare port 4000 filter does not reliably match loopback traffic on Linux because of how the LINUX_SLL2 pseudo-link type compiles. The working filter always includes host 127.0.0.1. Worth knowing now because it is painful to debug later.

Section 26: Validation for ss socket listing

ss is the modern replacement for netstat. The -tlnp flag set lists listening TCP sockets numerically with the owning process. For the rest of the lab this is how you confirm that LiteLLM is listening on 4000, that PostgreSQL is on 5432, and that an MCP server bound to the port you expected.

Expect a list of all listening TCP sockets with the owning process name and PID.

Section 27: Validation for /proc process inspection

/proc is the kernel’s runtime view of every running process. For any PID, you can ask the kernel directly: what is this process doing, what files does it have open, what memory has it mapped, what environment variables does it have.

Each file is its own primitive. cmdline gives the full command line as null-byte-separated arguments, which tr can translate to spaces for readability. status shows the process name, state, UID, parent PID, and memory usage. fd/ lists open file descriptors as symlinks to their targets, which means you can see exactly which files, sockets, and pipes the process has open right now. maps shows memory mappings: every shared library loaded, every executable region, every anonymous memory area.

For security work, environ is the file that matters most. It contains the process’s environment variables, which is where API keys, configuration paths, and credentials often live:
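These primitives can be exercised against the current shell, as in the sketch below. The proc_* function names are hypothetical. The environ helper prints variable names only, a deliberate choice on my part so that values such as API keys do not land in your terminal scrollback or evidence logs.

```shell
# Full command line, with NUL separators turned into spaces.
proc_cmdline() {
    tr '\0' ' ' < "/proc/$1/cmdline"
    echo
}

# Names of the environment variables the process was started with.
proc_env_names() {
    tr '\0' '\n' < "/proc/$1/environ" | cut -d= -f1 | sort
}

# Open file descriptors as symlinks to files, sockets, and pipes.
proc_open_fds() {
    ls -l "/proc/$1/fd"
}

pid="$$"   # inspect this shell; substitute the PID of an agent process
echo "cmdline: $(proc_cmdline "$pid")"
grep -E '^(Name|State|PPid|Uid):' "/proc/${pid}/status" || true
proc_open_fds "$pid"
proc_env_names "$pid" | head -n 5
```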

Section 28: Validation for bpftool

bpftool is the introspection tool for eBPF itself. It shows you which eBPF programs are currently loaded, which maps exist, and which programs are attached to which hooks.

Expect a list of loaded eBPF programs. Falco’s programs will be visible.

Section 29: Completion checklist

Sanity check.
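A simple version of that sanity check is to confirm every tool used in this post resolves on the PATH. check_tool is a hypothetical helper, and this sketch only checks presence, not that each tool actually works.

```shell
# Report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok       $1"
    else
        echo "MISSING  $1"
    fi
}

for tool in bpftrace execsnoop-bpfcc opensnoop-bpfcc tcpconnect-bpfcc \
            strace ltrace perf tcpdump ss bpftool falco auditctl ausearch; do
    check_tool "$tool"
done
```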

Section 30: Write a completion note

Bookkeeping.

Conclusion

I now have a working observability substrate: real-time process, file, and network visibility on the host, plus Falco runtime security, packet capture, kernel-level audit logging, and a disciplined directory structure for capturing it all. Critically, I’ve calibrated every tool against trivial known-good programs like ls and curl, because interpreting a tool’s output against an unknown process requires first knowing what its output looks like against a known one. The next post adds the Kubernetes layer to pair the host’s syscall view with the cluster’s identity-labeled flow view before any real agents get deployed.

The content published on this site reflects personal views and research only. It does not represent the views, positions, or policies of any current or former employer, client, or affiliated organization.

Any references to technologies, vulnerabilities, or security practices are for educational and informational purposes only. Nothing on this site should be interpreted as endorsement, disclosure of confidential information, or professional advice.

All examples are generalized or fictionalized unless explicitly stated otherwise.

roccofiorecyber@gmail.com