AI Data Analyst
An agent pipeline that turns messy data into structured analysis.
$ cat story.md
This project exists because of time. At work, I build tools for security analysts who need to process massive amounts of data quickly. The bottleneck isn't computing power, it's the manual work of cleaning, formatting, and making sense of messy inputs.
The dream: feed it CSVs, logs, API responses, whatever, and get back structured output ready for analyst review. Not just formatted data, but actual analysis with anomalies flagged and patterns identified.
I'm building this on local AI (llama.cpp) for three reasons: cost (API calls add up), privacy (some data shouldn't leave the machine), and learning (I want to understand how this works at a lower level than just calling an API).
$ cat pipeline.txt
Basic tool schema, local LLM setup
Complex output, error reporting
Multi-field processing, transformation
Pattern detection, auto-format detection
Multi-tool orchestration
Multi-source extraction, format handling
Agent tool design, data summarization
Agentic workflows, LLM tool orchestration
$ cat current-status.md
Project Complete
Retrospective
This project is done. What started as a generic data analyst became a security investigation agent, and that scope creep taught me more than the original plan would have.
The final architecture: Python handles orchestration and data gathering, the LLM reasons about what it finds. Not the autonomous agent I imagined at the start, but it works on consumer hardware (RTX 5080, 7B model) and produces analyst-ready output.
Eleven blog posts document the journey, including a capstone retrospective on what I learned about scope creep, AI collaboration, and planning for outputs first. The tools work. The architecture is sound. Time for the next project.
Milestones
8 / 8
$ cat agent-tools.txt
The Investigation Agent queries log data to assess threats. Python handles aggregation, the LLM does reasoning.
Activity summary across Suricata, FortiGate, DNS, HTTP logs
DNS queries, resolved IPs, alert associations for a domain
Windows logon events, privilege escalations, host access
Find connected IPs, domains, or users for lateral movement
- Running local LLMs with llama.cpp
- Tool-use patterns for structured output
- Multi-step agent pipelines
- Data transformation at scale
$ ls ./blog/ --project="AI Data Analyst"
When the Agent Found the Attacks
Jan 5, 2026
I gave a 7B model query tools and asked it to answer the 5 W's of an investigation. On two different datasets, it found the attacks.
Why I Stopped Flagging Anomalies and Started Profiling Entities
Jan 5, 2026
I expected the LLM to do the heavy lifting. I learned most of the work should be deterministic.
Deterministic Where You Can, LLM Where You Must
Jan 3, 2026
I ran my data cleaner three times on the same input. Got three different results. That's when I started building safety nets.
Building a Local LLM Security Agent on Consumer Hardware
Jan 2, 2026
I avoided local AI for months. Work forced my hand, and I had a security agent running in a week.
Published "From Data Analyst to Security Analyst: A Week of Building with AI" covering the full journey: scope creep, architectural pivots, what I learned about AI collaboration, and why the next project needs guardrails.
Tested the agent on auth.log using statistics alone (no IDS signatures). Agent correctly identified brute force and credential stuffing patterns from behavioral data. Missed IoT default credentials, but architecture is sound.
Published "From Profilers to Agent Investigation" explaining the architectural pivot, and "Testing the Agent on 12 Million Events" documenting the BOTS results with actual agent output.
Ran investigation on 12.6M events (no entity limit). Agent found scanner IP, target server, ransomware victim, and C2 infrastructure in ~20 minutes. Results match published BOTS writeups used for analyst training.
Limited test with 3 entities confirmed the approach works. Agent correctly identified the attacker IP, target server, and victim domain from the BOTS scenario.
Built query_ip and query_domain tools that summarize data for the LLM. Investigator agent uses tool calling to query entities extracted from alerts. Context resets between entities to stay within 8K limit.
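For reference, a minimal sketch of how tools like these can be declared in the OpenAI-compatible tools format that tool-capable models served by llama.cpp (e.g. Hermes 2 Pro) understand; the parameter schemas here are illustrative, not the project's exact definitions:

```python
# Tool schemas handed to the model; Python executes the matching function
# when the model emits a tool call, then feeds the summary back as context.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "query_ip",
            "description": "Summarize all log activity for a source or destination IP.",
            "parameters": {
                "type": "object",
                "properties": {"ip": {"type": "string"}},
                "required": ["ip"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "query_domain",
            "description": "Summarize DNS queries, resolved IPs, and alerts for a domain.",
            "parameters": {
                "type": "object",
                "properties": {"domain": {"type": "string"}},
                "required": ["domain"],
            },
        },
    },
]
```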
Extracted 12.6M events from BOTS v1: Fortigate (7.7M), Suricata (3.6M), DNS (1.4M), HTTP (39K). Windows Event Logs skipped (no handler built, scope control).
Profilers worked for auth.log but building one per field type per log source would not scale. New approach: LLM-driven investigator queries data on demand starting from alerts. Same principle (Python aggregates, LLM reasons) but agent drives the investigation.
LLM correlation tested on 86k records. Identified coordinated credential stuffing campaign across 7 IPs. Smart cleaning implemented (skip when validation passes). Pipeline now runs in 22 seconds, down from 617s.
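A rough sketch of that smart-cleaning gate; `passes_validation` and `llm_clean` are placeholders standing in for the pipeline's own validator and cleaner:

```python
def clean_if_needed(records, schema, passes_validation, llm_clean):
    """Skip the non-deterministic LLM cleaner when validation already passes."""
    cleaned, llm_calls = [], 0
    for record in records:
        if passes_validation(record, schema):
            cleaned.append(record)          # deterministic fast path, no LLM call
            continue
        llm_calls += 1
        fixed = llm_clean(record, schema)   # LLM normalizes types, enums, whitespace
        # Re-validate: keep the fix only if the deterministic check now passes.
        cleaned.append(fixed if passes_validation(fixed, schema) else record)
    return cleaned, llm_calls
```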
Integrated IP Profiler, Username Profiler, and Correlator into analysis_pipeline.py. New flow: detect → parse → validate → clean → profile → correlate. Removed old anomaly-flagger stages. Added --no-llm flag for fast deterministic analysis.
Built tool that combines IP and username profiles. Pre-correlation (Python) finds credential stuffing, distributed brute force, potential compromises. Optional LLM analysis for deep campaign detection. 125 HIGH severity findings from test data.
Groups events by target username. Detects distributed attacks (many IPs targeting one account) and potential compromises (success after failures). Pure Python, 0% variance. 543 usernames profiled, 69 HIGH priority.
Groups events by source IP, classifies into SCANNING, AUTH_FAILURE, AUTH_SUCCESS, DISCONNECT, PRIVILEGE. Scores by behavioral diversity. Pure Python, 0% variance. Processes 86k records in 3.3 seconds (was 41s with LLM triage).
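A condensed sketch of the entity-centric idea: group by source IP, bucket each event into a behavior category, score by diversity. The regex patterns and field names are simplified stand-ins, not the profiler's actual rules:

```python
import re
from collections import defaultdict

CATEGORIES = {
    "AUTH_FAILURE": re.compile(r"Failed password|authentication failure", re.I),
    "AUTH_SUCCESS": re.compile(r"Accepted (password|publickey)", re.I),
    "SCANNING": re.compile(r"Did not receive identification|invalid user", re.I),
    "DISCONNECT": re.compile(r"Received disconnect|Connection closed", re.I),
    "PRIVILEGE": re.compile(r"sudo|session opened for user root", re.I),
}

def profile_ips(events: list[dict]) -> dict[str, dict]:
    """events: [{'src_ip': ..., 'message': ...}] -> per-IP behavior profile."""
    profiles: dict[str, dict] = defaultdict(lambda: {"counts": defaultdict(int)})
    for ev in events:
        for category, pattern in CATEGORIES.items():
            if pattern.search(ev["message"]):
                profiles[ev["src_ip"]]["counts"][category] += 1
                break
    for prof in profiles.values():
        # Score by behavioral diversity: more distinct behaviors = more interesting.
        prof["score"] = len(prof["counts"])
    return dict(profiles)
```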
Core insight from variance testing: the problem was not the prompt, it was the model. Message-centric anomaly detection creates noise (50 scanner IPs = 50 anomalies). Entity-centric profiling creates signal (50 scanner IPs = 50 behaviors on 1 profile). Analysts naturally group this way.
Built orchestrator that chains all 5 tools. Single command runs full pipeline with auto-detect, metrics collection, and variance testing. Validated on 86k records in 41s. All pipeline components complete.
Ran 3 iterations to measure consistency. Statistical detection: perfectly stable (0% CoV). Investigation severity: stable. LLM triage decisions: variable (26% CoV). Now we know exactly where to focus prompt tuning.
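The consistency number here is a coefficient of variation (standard deviation over mean). A minimal sketch of the measurement, with illustrative run counts:

```python
from statistics import mean, pstdev

def coefficient_of_variation(counts: list[float]) -> float:
    """CoV as a percentage; 0% means identical results across runs."""
    return 100.0 * pstdev(counts) / mean(counts)

# e.g. anomalies kept by LLM triage in each of 3 runs (illustrative numbers)
print(f"{coefficient_of_variation([19, 14, 12]):.0f}% CoV")
```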
Observed inconsistent severity ratings (same pattern rated HIGH vs CRITICAL). Tested hypothesis: prompting for justification before rating. Result: all 5 investigations got consistent HIGH ratings, and analysts now get the "why" behind each rating. Chain-of-thought forcing works.
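A sketch of the reordered prompt; the wording is illustrative, the point is that the justification is requested before the severity field:

```python
TRIAGE_PROMPT = """You are reviewing one anomaly from SSH authentication logs.

First, in 2-3 sentences, explain what the pattern is and why it matters.
Only after the justification, assign a severity.

Respond as JSON: {"justification": "...", "severity": "LOW|MEDIUM|HIGH|CRITICAL"}
"""
```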
Ran complete three-phase pipeline on 86k SSH attack logs. LLM triage filtered 20 → 19 (only 1 dismissed). Deep investigation found real attack patterns: brute force with common usernames, scanning probes, unusual disconnects. Added --summary-only flag for clean output.
Solved the noise problem in the same session. Added auto-detection: skip fields by name pattern (timestamp, pid, uuid) or cardinality ratio (>60% unique). Result: 16% fewer anomalies, 5x faster runtime. The analyst shouldn't have to pre-analyze data.
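A sketch of that auto-skip heuristic, assuming flat parsed records; the name patterns and the 60% threshold are the ones mentioned above:

```python
SKIP_NAME_PATTERNS = ("timestamp", "pid", "uuid")

def skippable_fields(records: list[dict], unique_ratio: float = 0.60) -> set[str]:
    """Fields to exclude from anomaly flagging: noisy by name or by cardinality."""
    skip = set()
    fields = {key for record in records for key in record}
    for field in fields:
        if any(pat in field.lower() for pat in SKIP_NAME_PATTERNS):
            skip.add(field)
            continue
        values = [r[field] for r in records if field in r]
        if values and len(set(values)) / len(values) > unique_ratio:
            skip.add(field)   # near-unique fields (timestamps, PIDs) create noise
    return skip
```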
Tested full pipeline on SecRepo auth.log (86,839 SSH attack logs). Log Parser: 0.53 seconds, 100% success, regex-first validated. Anomaly Flagger: revealed design gap where high-cardinality fields (timestamps, PIDs) create noise.
Published fourth blog post covering the three-phase architecture. Core insight: treat context window as a finite resource. Stats do bulk work (free), LLM triage is batched (efficient), deep investigation resets context between anomalies (prevents bias).
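A sketch of the context-reset pattern: every deep investigation starts from a fresh message list so earlier findings can't bias later ones. `chat` is a placeholder for the local LLM call:

```python
def investigate(anomalies, chat, system_prompt):
    """Each anomaly gets a fresh context: no history carried between investigations."""
    reports = []
    for anomaly in anomalies:
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Investigate this anomaly:\n{anomaly}"},
        ]
        reports.append(chat(messages))   # chat() wraps the local LLM HTTP call
    return reports
```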
Knowledge check revealed sub-optimal design: Format Converter (always LLM) and Log Parser (regex-first) have overlapping functionality. Future consideration: add a routing layer that samples data and picks the fastest path automatically. Analyst shouldn't need to pre-sort data.
Fifth tool done. Second Level 3 tool. Three-phase architecture: stats find outliers, LLM triage filters noise, deep investigation provides actionable analysis. Context window managed as a resource.
Fourth tool done. First Level 3 tool. Auto-detects log format (syslog, JSON, key=value) using regex patterns, with LLM fallback for unknown formats. Regex-first for speed, LLM for flexibility.
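A simplified sketch of the regex-first detection; these patterns are illustrative stand-ins for the tool's own:

```python
import json
import re

SYSLOG_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s\S+\s\S+")
KEYVALUE_RE = re.compile(r"^(\S+=\S+\s*)+$")

def detect_format(line: str) -> str:
    """Cheap regex checks first; only unknown formats fall through to the LLM."""
    try:
        json.loads(line)
        return "json"
    except ValueError:
        pass
    if SYSLOG_RE.match(line):
        return "syslog"
    if KEYVALUE_RE.match(line):
        return "key=value"
    return "unknown"   # caller routes these to the LLM fallback

print(detect_format('{"event": "login", "user": "root"}'))      # json
print(detect_format("action=deny src=10.0.0.5 dst=10.0.0.9"))   # key=value
```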
Third tool done. Uses local LLM to normalize JSON data based on schema. Handles type conversions, enum normalization, whitespace trimming. Key learning: LLMs are non-deterministic, which is why you chain with the deterministic Schema Validator.
Second tool done. Validates JSON against JSON Schema files, outputs structured violation reports with paths, messages, and expected vs actual values. Pure Python with jsonschema library, no LLM needed for v1. Teaches handling multiple inputs and complex error reporting.
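A minimal sketch of that validation pattern with the jsonschema library; the report fields here are illustrative, not the tool's exact output:

```python
import json
from jsonschema import Draft7Validator

def validate(record: dict, schema: dict) -> list[dict]:
    """Return structured violations: path, message, expected vs actual."""
    violations = []
    for err in Draft7Validator(schema).iter_errors(record):
        violations.append({
            "path": "/".join(str(p) for p in err.absolute_path) or "<root>",
            "message": err.message,
            "expected": err.schema.get(err.validator),  # e.g. the expected type or enum
            "actual": err.instance,
        })
    return violations

schema = {"type": "object",
          "properties": {"port": {"type": "integer"}},
          "required": ["port"]}
print(json.dumps(validate({"port": "443"}, schema), indent=2))
```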
First tool in the pipeline is working. Built a Python CLI that calls the local LLM via HTTP and converts messy data (CSV, logs, key-value) to structured JSON. Tested with real firewall logs. Understanding how LLM APIs work (messages array, roles, HTTP transport) made building the tool straightforward.
Python calling llama.cpp server via HTTP requests. OpenAI-compatible format: messages array with system/user/assistant roles. Same pattern works with any LLM API.
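A minimal sketch of the call, assuming the llama.cpp server is listening on localhost:8080 with its OpenAI-compatible endpoint enabled; the prompts are placeholders:

```python
import requests

def chat(system_prompt: str, user_prompt: str) -> str:
    """One chat completion against a local llama.cpp server."""
    payload = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.2,
    }
    resp = requests.post("http://localhost:8080/v1/chat/completions",
                         json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("You convert messy logs to JSON.",
               "src=10.0.0.5 dst=10.0.0.9 action=deny"))
```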
Tested llama.cpp server mode on port 8080. OpenAI-compatible API confirmed working. Multi-line prompts work properly via JSON body. Performance: ~22,000 t/s prompt (cached), 93 t/s generation.
Interactive prompts work. Learned that each newline submits a message (no multi-line input). Workaround: use -f prompt.txt for complex prompts.
RTX 5080 needs CUDA 12.8 for native sm_120 support. Upgraded from 12.6 and got 19x performance improvement. Generation went from 5.2 t/s to 100 t/s.
Downloaded Hermes 2 Pro 7B Q4_K_M (4.1GB). Built llama.cpp with CUDA support. Model loads and generates text.
Chose Format Converter as the first tool to build. Set up llama.cpp infrastructure. The goal: learn tool building from the ground up.