A formal framework for LLM-assisted automated generation of Zeek signatures from binary artifacts

Abstract

Designing semantically meaningful and operationally effective intrusion detection signatures remains a labor-intensive and expertise-driven task, particularly within the Zeek network monitoring framework. In this paper, we introduce a formalized and modular system for automating Zeek signature generation using Large Language Models (LLMs). Our pipeline begins with static analysis of binary artifacts, extracts salient behavioral features, and transforms them into structured prompts for an LLM tasked with synthesizing Zeek scripts. We provide a rigorous formal framework that defines each stage of this transformation, along with theoretical models for prompt distortion, injection resilience, and sanitization. Furthermore, we explore the adversarial surface exposed by LLMs through a taxonomy of injection attacks, prompt inversion risks, and behavioral feedback loops, and we propose mitigations grounded in filtering and robust prompt engineering. Our approach not only accelerates signature creation but also enhances interpretability and adaptability in evolving threat environments. The framework lays the groundwork for future extensions involving dynamic analysis and automated post-validation of generated signatures.

Publication
Future Generation Computer Systems