Malice in the Mesh // 06: Behavioral Analysis at the Network and Memory Layers

Welcome back. For the foreseeable future, we will be tackling detection engineering for AI agents one detection surface at a time. In this installment, we begin by diving into behavioral analysis at the network layer. Each analysis type may span multiple installments, as we move through the discipline in a slow but smooth manner (remember: “slow is smooth, and smooth is fast”).

Behavioral signatures for AI agents are becoming increasingly important because agents can blend into normal traffic. Because artificial intelligence, at its very essence, aims to replicate natural human behavior, AI agents are “born to mimic”, not as a side effect but as the design objective. Traditional malware’s go-to evasion tactic is to hide (in memory, etc.). Even when hijacking a process or service, malware would still prefer to hide if given the choice. Whether through process hollowing, DLL side-loading, or living-off-the-land binaries, the goal is to minimize the observable footprint and produce as few deviations from baseline as possible. In the case of a malicious AI agent or hijacked OpenClaw skill, a bad actor’s risk/reward analysis shifts toward ‘hiding in plain sight’ rather than attempting to fly under the radar. Agent traffic is genuinely high-volume and high-variance by its very nature. What matters most, however, is not the volume and variance but the layer where the activity lives. For example, the semantic layer of agent activity is one area where AI agent attacks live, and the semantic layer is not something traditional security tooling natively captures. While EDR hooks syscalls to infer the intent of a process from its behavior, indirect prompt injection is caught by applying natural language processing to capture semantic intent directly.

And this is why behavioral analysis is so important. Our detection processes must not operate at the level of “did the process do xyz?” They must operate at the level of “did the agent, considering its prompt, its tools, its observed reasoning and its outputs at every phase of the PPAO loop, behave in a way that is consistent with its declared purpose?” Then, the shift is from looking at signatures of presence to behavioral signatures of purpose.

The Network Layer

None of this means the network and process layers are useless. In fact, quite the opposite. As we see from our diagram above, understanding the full stack is important, as AI agent operations are exclusive to neither the traditional nor the non-traditional layers. Traditional means cannot determine whether an agent has been subverted, but they can determine whether something is acting as an agent at all. And that is where we need to start. Distinguishing agent traffic from human traffic, and one agent’s behavior from another’s, is itself useful. It helps determine where to point semantic detection, helps to inventory agents, and helps to begin building hypotheses for planning.

When it comes to network-layer activity, some of the lowest-hanging fruit is the observable difference, in a time series, between an AI agent and a human user interacting with LLM APIs. The difference is due to the fundamental architecture of agent systems, and how they differ from us carbon-based humans. Agent systems typically bifurcate the Plan and Act phases in the PPAO loop. Humans interleave the two at sub-second granularity, which produces a sporadic, irregular pattern of API calls. The human reads the previous response while opening a new tab, thinks about the implications while drafting the next prompt, pauses to check something else and forgets to come back for ninety seconds. An agent loop produces a fundamentally different pattern. Each phase of the loop is discrete, with no overlap and no parallelism. The reasoning happens at the LLM provider and the execution happens locally. This is what creates the ‘burst signature’. The time between calls is determined primarily by tool execution latency plus LLM inference latency, both of which are far more measurable and observable.
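One way to make the human versus agent timing difference concrete is to measure how regular the gaps between API calls are. The sketch below is illustrative only: the timestamps are synthetic, and the function names and the 0.5 threshold are assumptions chosen for the example, not values taken from our captures.

```python
from statistics import mean, pstdev


def inter_arrival_gaps(timestamps):
    """Return the gaps (in seconds) between consecutive API call timestamps."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def coefficient_of_variation(gaps):
    """Standard deviation divided by mean; lower means more regular timing."""
    if len(gaps) < 2:
        raise ValueError("need at least two gaps to measure variation")
    avg = mean(gaps)
    if avg == 0:
        raise ValueError("gaps must not all be zero")
    return pstdev(gaps) / avg


def looks_like_agent(timestamps, cv_threshold=0.5):
    """Flag a call series whose gaps are regular enough to suggest a loop.

    cv_threshold is an assumed starting point and would need tuning
    against real traffic.
    """
    return coefficient_of_variation(inter_arrival_gaps(timestamps)) < cv_threshold


# Synthetic examples (seconds since the first call).
agent_calls = [0.0, 4.1, 8.3, 12.2, 16.4, 20.5]
human_calls = [0.0, 12.0, 15.0, 105.0, 111.0, 190.0]

print(looks_like_agent(agent_calls))
print(looks_like_agent(human_calls))
```

The coefficient of variation is used here because it is scale-free: an agent with slow tools and an agent with fast tools can both score low, while a distracted human scores high regardless of average pace.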

Although AI agents are a relatively new concept, the idea of manipulating the human vs. machine spectrum is not new to security. Well-known malware families such as Trickbot and Emotet employ evasion measures to halt sandbox analysis in its tracks. These measures include behavioral triggers which monitor the environment to determine whether it is a regular human’s endpoint or a sandbox VM. Clues include whether a non-default desktop wallpaper is in use, the number of recent mouse clicks, and whether any browser tabs are open.

Memory Behavior

Also far more measurable and observable is the difference in memory. Agents must externalize all memory. This means observable request size growth, or in more popular terms, a growing ‘context window’. As the conversation grows and one request flows to another, the preceding conversation is appended to the request. This leads to a steady, accelerating increase in request size: 2KB, 4KB, 7KB, 11KB, 16KB, etc.
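The growth pattern can be checked with very little logic. The following is a minimal sketch, assuming we already have the request body sizes for one client session in order; the function names and thresholds are hypothetical, and the human series is synthetic.

```python
def growth_fraction(request_sizes):
    """Fraction of consecutive requests that are larger than the one before."""
    pairs = list(zip(request_sizes, request_sizes[1:]))
    if not pairs:
        raise ValueError("need at least two request sizes")
    return sum(1 for prev, cur in pairs if cur > prev) / len(pairs)


def has_context_growth(request_sizes, min_requests=4, min_fraction=0.9):
    """Flag a session whose request sizes grow almost every turn.

    min_requests and min_fraction are assumed values; an occasional
    shrink (for example after context truncation) is tolerated.
    """
    if len(request_sizes) < min_requests:
        return False
    return growth_fraction(request_sizes) >= min_fraction


# Sizes in bytes: the agent series mirrors the 2KB, 4KB, 7KB, 11KB, 16KB
# example from the text; the human series is made up for contrast.
agent_sizes = [2048, 4096, 7168, 11264, 16384]
human_sizes = [1800, 950, 2400, 700, 1300]

print(has_context_growth(agent_sizes))
print(has_context_growth(human_sizes))
```

Note that a human using a chat UI also resends history, so in practice this signal is more useful combined with the timing pattern than on its own.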

Humans, on the other hand, maintain an enormous internal state space that cannot be observed. There is no agent equivalent of “knowing something without having to look it up”. While humans have the ability to externalize their cognition (writing down notes, recording music or a YouTube video), the agent is only external cognition. The agent’s entire cognition is observable in a way that a human’s never is.

Agents which use RAG (Retrieval Augmented Generation) produce a distinctive pattern which is valuable for detection: queries to a vector database. These databases can be external or local to the agent, with the agent calling them to perform a vector similarity search before or during each agent loop. The responses contain text chunks, metadata, and similarity scores. This provides not only a reliable method of detection, but also another path of exfiltration risk.

Demonstrating Memory Behavior

To demonstrate this, we built a minimal RAG agent:

  • Anthropic SDK
  • Chroma vector store
  • Sentence-transformers for embeddings

We created a corpus of twenty short documents and pointed the RAG agent at it. Half of these are on-topic for a security analyst, such as documents on IR, malware analysis, network detection, and more. The other half are off-topic, such as Kubernetes, Python, and git basics. The off-topic ones are there to test what happens when the system retrieves items it should not be using. We captured network traffic with tcpdump -i any, filtered to host 127.0.0.1 and ports 8000 and 4000. We traced file I/O with strace and leveraged a Falco rule pack to watch process-level egress activity. If you do not remember, Falco is the tool which uses eBPF probes that attach to kernel tracepoints and capture syscall events. Because Falco observes at the kernel syscall boundary, we are not just watching logs or network headers.

In addition, we added an exfiltration step to bring to life the ‘compromised RAG agent’ scenario.

Timing

Following on from our discussion of agent timing, our first step was to collect timing telemetry. We used tshark to measure the gap between the vector query and the LLM call. Every gap is consistently between 21 and 27 milliseconds. The gap is the latency of two Python function calls, JSON serialization, and an HTTP round trip. The most impactful of these are the Python function calls. The gap from LLM call to next vector query is consistently 3 to 6 seconds. This is the LLM inference time. Together, these gaps form a repeatable and observable rhythm of tight, slow, tight, slow, tight, slow. It also demonstrates the inherent time difference between preprocessing and inference. Regardless, the consistency and uniformity of the timing make it possible to identify a RAG agent purely from the pattern, without payload visibility. The agent’s memory access has a heartbeat: the Observe phase ‘on the wire’.
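The tight, slow rhythm can be expressed as a simple classifier over event timestamps. This is a sketch under stated assumptions: the input is a merged, time-ordered list of vector query and LLM call timestamps starting with a vector query, the timestamps below are synthetic values shaped like our measurements, and the thresholds and function names are hypothetical.

```python
def classify_gap(gap_s, tight_max_s=0.1, slow_min_s=1.0):
    """Label one gap; thresholds are assumed, bracketing ~25 ms and 3-6 s."""
    if gap_s <= tight_max_s:
        return "tight"
    if gap_s >= slow_min_s:
        return "slow"
    return "other"


def rhythm(event_times):
    """Turn ordered event timestamps (seconds) into a list of gap labels."""
    ordered = sorted(event_times)
    return [classify_gap(b - a) for a, b in zip(ordered, ordered[1:])]


def is_rag_heartbeat(event_times, min_cycles=3):
    """True when gaps strictly alternate tight, slow for min_cycles cycles."""
    labels = rhythm(event_times)
    alternating = all(
        label == ("tight" if i % 2 == 0 else "slow")
        for i, label in enumerate(labels)
    )
    return alternating and len(labels) // 2 >= min_cycles


# Synthetic timeline: vector query, LLM call ~24 ms later, next query ~4.5 s later.
agent_times = [0.000, 0.024, 4.524, 4.546, 9.046, 9.072, 13.500]
# Synthetic irregular series for contrast.
human_times = [0.0, 2.0, 2.5, 40.0, 41.0]

print(rhythm(agent_times))
print(is_rag_heartbeat(agent_times))
print(is_rag_heartbeat(human_times))
```

Because the check uses only timestamps, it would still work against TLS-encrypted traffic where payloads are not visible, which is the point made above.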

Float Arrays Captured

Our second undeniable piece of evidence is the structural distinction of embedding vectors in network payloads. We captured 384 comma-separated floats, all between roughly -0.08 and +0.08.

Regardless of the numbers being displayed, we can accomplish accurate fingerprinting by measuring request size. This becomes useful not just in this exercise, but in the future, knowing that LangChain defaults to 1536-dimension embeddings, CrewAI defaults to MiniLM’s 384-dimension embeddings, and so on. The two embedding models we tested were all-MiniLM-L6-v2 at 384 dimensions and all-mpnet-base-v2 at 768 dimensions. ‘Dimensions’ are simply the count of numbers used to represent each sentence.
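When payloads are visible, the same fingerprint can be taken by counting floats directly rather than inferring from size. The sketch below assumes a JSON request body containing the embedding as a nested list; the `query_embeddings` key, the synthetic vector, the labels, and the function names are illustrative assumptions, not a capture from our lab.

```python
import json

# Dimension counts mentioned in the text, mapped to descriptive labels.
KNOWN_DIMS = {
    384: "384-dim (MiniLM class)",
    768: "768-dim model",
    1536: "1536-dim model",
}


def find_float_arrays(node, min_len=64):
    """Recursively yield lists made up entirely of floats, min_len or longer."""
    if isinstance(node, list):
        if len(node) >= min_len and all(isinstance(x, float) for x in node):
            yield node
        else:
            for item in node:
                yield from find_float_arrays(item, min_len)
    elif isinstance(node, dict):
        for value in node.values():
            yield from find_float_arrays(value, min_len)


def fingerprint_payload(body, max_abs=1.0):
    """Return (dimension, label) for each embedding-like array in a JSON body."""
    hits = []
    for arr in find_float_arrays(json.loads(body)):
        # Embedding components are small; reject arrays with large values.
        if max(abs(x) for x in arr) <= max_abs:
            hits.append((len(arr), KNOWN_DIMS.get(len(arr), "unknown")))
    return hits


# Synthetic 384-float vector with values between -0.08 and +0.08.
vector = [((i % 17) - 8) / 100 for i in range(384)]
body = json.dumps({"query_embeddings": [vector], "n_results": 3})

print(fingerprint_payload(body))
print(fingerprint_payload('{"prompt": "hello"}'))
```

The length threshold and value bound keep ordinary numeric fields, such as coordinates or short score lists, from being mistaken for embeddings.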

File System Analysis

With our multi-pronged detection strategy, we also did a quick analysis of what files were created by Chroma. One ‘prong’ informs another, as we then used this to monitor syscall activity towards these files.

Again, we saw uniform and consistent timing patterns: twelve pread64 syscalls per query, in tight 200-microsecond bursts, against the HNSW segment files. From this we derived a Falco rule keyed on {data_level0, header, length, link_lists}.bin being opened by a single process.
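Counting reads per burst is easy to automate from a trace. The following is a rough sketch that assumes strace was run with -tt so each line carries an HH:MM:SS.microseconds timestamp; the sample lines are synthetic, and the gap threshold and function names are assumptions.

```python
import re

# Matches a -tt style timestamp immediately followed by a pread64 call.
PREAD = re.compile(r"(\d{2}):(\d{2}):(\d{2})\.(\d{6}) pread64\(")


def pread_times(lines):
    """Extract pread64 timestamps as microseconds since midnight."""
    times = []
    for line in lines:
        match = PREAD.search(line)
        if match:
            h, m, s, us = map(int, match.groups())
            times.append((h * 3600 + m * 60 + s) * 1_000_000 + us)
    return times


def burst_sizes(times_us, max_gap_us=1000):
    """Group reads into bursts; a gap above max_gap_us starts a new burst."""
    bursts = []
    for i, t in enumerate(times_us):
        if i == 0 or t - times_us[i - 1] > max_gap_us:
            bursts.append(1)
        else:
            bursts[-1] += 1
    return bursts


def fake_line(us_total):
    """Build a synthetic strace-like line for the demo (not real output)."""
    s, us = divmod(us_total, 1_000_000)
    return f'4242 14:00:{s:02d}.{us:06d} pread64(9, "...", 4096, 0) = 4096'


# Two synthetic queries, 4 s apart, each with 12 reads spaced 15 us apart.
sample = [fake_line(q * 4_000_000 + i * 15) for q in range(2) for i in range(12)]
sample.insert(3, '4242 14:00:00.000050 openat(AT_FDCWD, "header.bin", O_RDONLY) = 9')

print(burst_sizes(pread_times(sample)))
```

A stable burst size per query is the kind of value a baseline could record, so that a change in read pattern against the segment files stands out.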

Falco Detection

We subsequently deployed a Falco rule that fires when the agent process opens a TCP or UDP connection to a non-allowlisted destination, monitoring for exfiltration. We then ran a compromised version of the agent, one that uploads the retrieved documents to httpbin.org before passing them to the LLM. This keeps the retrieval logic the same, but slips in a handoff to a different destination.

The Falco rule fired fifty-one times during five queries. Every alert had attribution. Our Falco rule for egress activity is solid, but it fires only after exfiltration has already occurred. We added another rule to complement it. Our second rule flags any query where most of the retrieved documents fell outside the agent’s declared role. This is our intent signal. Either rule alone raises false positives. The first rule is noisy, with 51 alerts. The second rule requires properly tagged documents. Together they create a chain which more closely defines true intent and provides a stronger behavioral signal.
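The chaining of the two signals can be sketched as a small correlation step. This is not the Falco rule itself; it is an illustrative Python model, assuming each query has already been annotated with the role tags of its retrieved documents and the number of egress alerts attributed to it. All names, tags, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryRecord:
    query_id: str
    retrieved_tags: list  # one role tag per retrieved document
    egress_alerts: int    # egress alerts attributed to this query


def off_role_fraction(record, declared_role):
    """Share of retrieved documents tagged outside the agent's declared role."""
    if not record.retrieved_tags:
        return 0.0
    off = sum(1 for tag in record.retrieved_tags if tag != declared_role)
    return off / len(record.retrieved_tags)


def verdict(record, declared_role, off_role_threshold=0.5):
    """Combine the intent signal and the egress signal into one severity."""
    intent = off_role_fraction(record, declared_role) > off_role_threshold
    egress = record.egress_alerts > 0
    if intent and egress:
        return "high"
    if intent or egress:
        return "review"
    return "ok"


# Synthetic records for an agent whose declared role is "security".
normal = QueryRecord("q1", ["security", "security", "security"], 0)
drifting = QueryRecord("q2", ["kubernetes", "git", "security"], 0)
compromised = QueryRecord("q3", ["kubernetes", "git", "python"], 10)

for rec in (normal, drifting, compromised):
    print(rec.query_id, verdict(rec, "security"))
```

Requiring both signals for the highest severity reflects the point above: either one alone is prone to false positives, while the pair is a much stronger indicator.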

Mapping to ATLAS techniques, we can define the above as AML.T0036 (Data from Information Repositories) and AML.T0086 (Exfiltration via AI Agent Tool Invocation). They sound alike, but are different. The first covers the harvesting of content from a knowledge base that should not be accessed. The second is the act of exfiltration itself, in this case the httpx POST to httpbin.org.

Falco Detection Lessons Learned

Matching on the Python binary name is either too broad, catching every Python process on the machine, or it breaks due to how virtual environments handle symlinks. Instead, match on the working directory path.

Without extra filters, Falco picks up Python’s internal startup connections and produces dozens of useless alerts. Adding a filter for real IP-resolved TCP/UDP connections reduces false positives and noise.

To further reduce false positives, you must allowlist DNS-related artifacts: 127.0.0.5, the local DNS resolver, and port 53. If you don’t build that logic into Falco, the rule fires every time the agent looks up api.anthropic.com, and you’ll create a lot of noise.
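Taken together, the three lessons amount to a filter chain. The sketch below models that logic in Python rather than Falco rule syntax, purely to show the order of the checks; the event shape, the agent directory, and the allowlisted destinations are hypothetical, and the resolver address is the one given in the text.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class ConnEvent:
    cwd: str        # working directory of the connecting process
    dest_ip: str    # empty when the connection is not IP-resolved
    dest_port: int
    proto: str      # "tcp", "udp", or something else (e.g. "unix")


AGENT_DIR = "/opt/rag-agent"            # hypothetical agent working directory
DNS_RESOLVER = "127.0.0.5"              # local resolver address as given in the text
ALLOWED_DESTS = {("127.0.0.1", 8000), ("127.0.0.1", 4000)}  # assumed local services


def should_alert(ev):
    """Apply the lessons in order: scope, real connection, DNS, allowlist."""
    # 1. Scope by working directory instead of the Python binary name.
    if not (ev.cwd == AGENT_DIR or ev.cwd.startswith(AGENT_DIR + "/")):
        return False
    # 2. Keep only real, IP-resolved TCP/UDP connections.
    if ev.proto not in ("tcp", "udp"):
        return False
    try:
        ip_address(ev.dest_ip)
    except ValueError:
        return False
    # 3. Ignore DNS lookups.
    if ev.dest_port == 53 or ev.dest_ip == DNS_RESOLVER:
        return False
    # 4. Alert on anything not explicitly allowlisted.
    return (ev.dest_ip, ev.dest_port) not in ALLOWED_DESTS


events = [
    ConnEvent("/opt/rag-agent", "127.0.0.1", 8000, "tcp"),     # vector store
    ConnEvent("/opt/rag-agent", "127.0.0.5", 53, "udp"),       # DNS lookup
    ConnEvent("/opt/rag-agent", "", 0, "unix"),                # startup noise
    ConnEvent("/home/user/other", "203.0.113.10", 443, "tcp"), # other process
    ConnEvent("/opt/rag-agent", "203.0.113.10", 443, "tcp"),   # unexpected egress
]

print([should_alert(ev) for ev in events])
```

Ordering the cheap scoping checks first means most events are discarded before the allowlist lookup, which mirrors how the noise was reduced in practice.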

Conclusion

In this installment, we broke down behavioral analysis at the network and memory layers. This leaves us with only part of the picture, so in our next installment we will continue with behavioral analysis, leveraging API traffic and the output layer. Be sure to check out the soon-to-come AI Analysis Lab series to learn more about how we conducted our analysis and gathered the telemetry above.

Have suggestions or want to collaborate on a future project? Shoot me an email at roccofiorecyber@gmail.com or find me on LinkedIn at the icon below.

The content published on this site reflects personal views and research only. It does not represent the views, positions, or policies of any current or former employer, client, or affiliated organization.

Any references to technologies, vulnerabilities, or security practices are for educational and informational purposes only. Nothing on this site should be interpreted as endorsement, disclosure of confidential information, or professional advice.

All examples are generalized or fictionalized unless explicitly stated otherwise.
