AI Data Analyst
LLM-driven security investigation agent with pre-built anomaly index and scratchpad-based context management.
$ cat story.md
Started in January 2026 as a generic data analyst on consumer hardware (RTX 5080, 7B model). Built query tools, got it producing findings on small test datasets. Wrote up the techniques from that phase as standalone findings.
Came back to the project after getting a Dell Pro Max GB10 (128GB unified memory). The new hardware meant I could run 70B models locally. Pointed the agent at real data (Splunk BOTS v1, 5M log lines) and every tool call timed out. The old tools loaded entire 1.5GB files into memory per query. Zero findings.
Rebuilt the tool layer: a single Python script streams all data sources into a SQLite index with pre-computed anomaly scores, entity relationships, and temporal bins. Tools went from 30-second timeouts to sub-millisecond lookups. Ran three models (Hermes 4 70B, Nemotron Cascade 30B, GPT-OSS 20B) against the same dataset. Hermes found 9 threats including ransomware C2 infrastructure. GPT-OSS finished in 24 seconds.
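A minimal sketch of the pre-built index idea, with an assumed schema (table and column names here are illustrative, not the project's actual ones): anomaly scores are computed once at build time, indexed, and every agent tool call becomes an indexed read instead of a raw file scan.

```python
import sqlite3

# Hypothetical schema: entities with pre-computed behavioral scores,
# plus hourly temporal bins. The real index described above holds
# 22K entities and 120K relationships.
def build_index(db_path, records):
    con = sqlite3.connect(db_path)
    con.executescript("""
        CREATE TABLE IF NOT EXISTS entities (
            id TEXT PRIMARY KEY,
            role TEXT,               -- client/server/scanner/sparse
            behavioral_score REAL,   -- computed once at build time
            alert_count INTEGER
        );
        CREATE TABLE IF NOT EXISTS temporal_bins (
            entity_id TEXT,
            hour INTEGER,            -- hourly activity bin
            events INTEGER
        );
        CREATE INDEX IF NOT EXISTS idx_score
            ON entities (behavioral_score DESC);
    """)
    con.executemany(
        "INSERT OR REPLACE INTO entities VALUES (?, ?, ?, ?)",
        records,
    )
    con.commit()
    return con

# Query side: the agent's first tool call is an indexed lookup --
# this is what turns 30-second timeouts into sub-millisecond reads.
con = build_index(":memory:", [("23.22.63.114", "scanner", 100.0, 0)])
top = con.execute(
    "SELECT id, behavioral_score FROM entities "
    "ORDER BY behavioral_score DESC LIMIT 5"
).fetchall()
```

The design choice that matters: all the expensive work (streaming 1.5GB of logs, scoring, binning) happens once, before the agent ever runs.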
Added behavioral scoring that catches threats with zero IDS alerts (scanner 23.22.63.114 had behavioral_score=100 with no alerts). Built role labeling (client/server/scanner/sparse) for peer comparison. Added similar_behavior detection using Jaccard similarity on auth username sets with star topology for exact-match groups. Discovered a coordinated brute-force campaign: 69 IPs sharing an identical 25-username wordlist.
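The similar_behavior pass can be sketched as follows (IPs and the threshold value are placeholders): Jaccard similarity over each IP's set of attempted usernames, with exact-duplicate sets collapsed into a star around one hub so a 69-member campaign doesn't explode into a fully-connected clique.

```python
from itertools import combinations

def jaccard(a, b):
    # Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|
    return len(a & b) / len(a | b) if (a or b) else 0.0

def similar_behavior(auth_sets, threshold=0.8):
    """auth_sets: {ip: set of attempted usernames}. Returns edges."""
    hub_of = {}   # frozenset(usernames) -> first IP seen with it
    edges = []
    for ip in sorted(auth_sets):
        key = frozenset(auth_sets[ip])
        if key in hub_of:
            # Identical wordlist: star topology, link member to hub only
            edges.append((hub_of[key], ip, 1.0))
        else:
            hub_of[key] = ip
    # Near-matches compared only between distinct wordlist hubs
    for a, b in combinations(hub_of.values(), 2):
        s = jaccard(auth_sets[a], auth_sets[b])
        if s >= threshold:
            edges.append((a, b, round(s, 3)))
    return edges

edges = similar_behavior({
    "198.51.100.1": {"root", "admin", "oracle"},
    "198.51.100.2": {"root", "admin", "oracle"},  # identical wordlist
    "198.51.100.3": {"root", "admin", "test"},
})
```

With an identical 25-username wordlist across 69 IPs, the star collapse is what keeps the graph readable: 68 edges to one hub instead of 2,346 pairwise links.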
Solved the context management problem that kills most agent loops at scale. Scratchpad-based architecture: evict old messages after each finding, inject progress at the top of context every turn, cache entity details, track dedup pairs, detect stuck loops. Went from agents that loop forever to clean 21-call investigations with zero wasted turns.
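The scratchpad pattern above can be sketched like this (class and field names are illustrative, not the project's actual code): evict the old tool-call transcript after each finding, re-inject a compact progress header at the top of every turn, track dedup pairs, and flag stuck loops.

```python
class Scratchpad:
    def __init__(self):
        self.findings = []
        self.seen_pairs = set()    # dedup: (entity, verdict) pairs
        self.recent_calls = []     # window for stuck-loop detection

    def record_finding(self, entity, verdict):
        self.findings.append((entity, verdict))
        self.seen_pairs.add((entity, verdict))

    def is_stuck(self, call, window=3):
        # Same tool call repeated `window` times in a row => stuck
        self.recent_calls.append(call)
        recent = self.recent_calls[-window:]
        return len(recent) == window and len(set(recent)) == 1

    def header(self):
        # Injected as the first non-system message every turn
        done = ", ".join(e for e, _ in self.findings) or "none"
        return f"Progress: {len(self.findings)} findings ({done})."

def compact_context(messages, scratchpad, keep_last=4):
    # Evict old messages after a finding: keep the system prompt,
    # the scratchpad header, and only the most recent exchanges.
    system = [m for m in messages if m["role"] == "system"]
    tail = [m for m in messages if m["role"] != "system"][-keep_last:]
    return system + [{"role": "user", "content": scratchpad.header()}] + tail
```

Context stays bounded no matter how long the investigation runs, which is the difference between looping forever and a clean 21-call run.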
$ cat agent-tools.txt
All tools read from a pre-built SQLite index (22K entities, 120K relationships). No raw file scanning at query time.
Top anomalous entities ranked by pre-computed scores. Agent calls this first.
Full cross-source profile: alerts, DNS, HTTP, auth, behavioral flags, similar_behavior links.
Follow 6 relationship types: resolved_to, queried, http_to, alerted_with, shared_target, similar_behavior.
Hourly activity bins and concurrent high-anomaly entities in the same window.
Percentile ranks vs behavioral peers (same role, type). Calibrates whether the behavior is normal or an outlier.
Structured verdict with evidence chain. Triggers context eviction and scratchpad update.
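To make the "index, not raw files" point concrete, here is a sketch of the peer-percentile tool against an assumed schema (table and column names are illustrative): rank one entity's behavioral score against peers sharing its role with a single indexed query.

```python
import sqlite3

def peer_percentile(con, entity_id):
    # Fetch the entity's role and score, then rank it against
    # peers with the same role -- all from the pre-built index.
    role, score = con.execute(
        "SELECT role, behavioral_score FROM entities WHERE id = ?",
        (entity_id,),
    ).fetchone()
    below, total = con.execute(
        "SELECT SUM(behavioral_score <= ?), COUNT(*) "
        "FROM entities WHERE role = ?",
        (score, role),
    ).fetchone()
    return 100.0 * below / total

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE entities (id TEXT, role TEXT, behavioral_score REAL)")
con.executemany("INSERT INTO entities VALUES (?, ?, ?)", [
    ("23.22.63.114", "scanner", 100.0),
    ("10.0.0.5", "scanner", 40.0),
    ("10.0.0.6", "scanner", 55.0),
])
pct = peer_percentile(con, "23.22.63.114")  # 100.0: top of its peer group
```

Comparing within a role is what gives the calibration: a score that looks alarming in isolation may be routine for a server, and unremarkable for one client may be extreme for a scanner.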
$ ls ./components
- Running local LLMs with llama.cpp
- Tool-use patterns for structured output
- Pre-computing vs runtime data processing
- Multi-model evaluation on the same task
- Deterministic anomaly detection for agent consumption
- Scratchpad-based context management for agent loops
- Cross-entity behavioral similarity detection
$ ls ./findings/ --project="AI Data Analyst"
Agent Trace Telemetry
What to measure about an agentic investigation loop, and how a trace explorer turns raw run data into evidence for the next prompt or harness change.
Three Local Models Compared on One Investigation
Running Hermes 4 70B, Nemotron Cascade 30B, and GPT-OSS 20B against the same security investigation exposes a speed-vs-depth tradeoff that shows up clearly when tools are fast.
Agent Investigation With Query Tools
Giving a 7B model two query tools and a 5 W's output format is enough to find attacks on a raw auth.log. The architecture beats dumping the logs into the prompt.
Entity Profiling Over Anomaly Flagging
Message-centric anomaly detection flags 269,000 'rare' events on 86,000 auth.log records. Entity profiling asks a different question and produces actionable intelligence instead.
Deterministic Validation for LLM Output
Schema-based validation catches the variance an LLM data cleaner produces between runs. Pattern: deterministic where you can, LLM where you must.
Local LLM Security Agent on Consumer Hardware
Running a security investigation agent on a 16GB consumer GPU with llama.cpp, the OpenAI-compatible API, and a small 7B model.