created January 5, 2026 · 4 min read · AI Data Analyst

Entity Profiling Over Anomaly Flagging

Message-centric anomaly detection flags 269,000 'rare' events on 86,000 auth.log records. Entity profiling asks a different question and produces actionable intelligence instead.

architecture · security · detection · agents

A message-centric anomaly detector on 86,000 auth.log records produces 269,000 "anomalies" because every unique IP is statistically rare. An entity-profiling approach groups events by actor and asks what each actor is doing overall. This page walks through the shift, why message-centric detection fails on real log volumes, and what replaces it.

The failure mode

Fed to a rarity-based anomaly detector, 86,000 auth.log records produced 269,000 anomalies. Every unique IP got flagged because that specific IP only appeared a few times. Technically correct. Completely useless. An analyst would be buried before finding anything real.
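The failure is easy to reproduce in miniature. A minimal sketch of a rarity-based flagger (not the original tool; function and threshold are illustrative): because each unique source IP makes the message text unique, every line falls below the frequency threshold and gets flagged.

```python
from collections import Counter

def rarity_flagger(messages, threshold=0.01):
    """Flag any message whose exact text falls below a frequency threshold."""
    counts = Counter(messages)
    total = len(messages)
    return [m for m in messages if counts[m] / total < threshold]

# Each unique source IP makes the message text unique, so every line is "rare".
logs = [f"Failed password for root from 10.0.{i // 256}.{i % 256}"
        for i in range(1000)]
flagged = rarity_flagger(logs)
print(len(flagged))  # 1000 — every single line flagged, each appears exactly once
```

Scale that up to 86,000 records with multiple flaggable fields per record and you get the 269,000-anomaly pile.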

Prompt tuning made it worse. Adding chain-of-thought to make the LLM justify its triage decisions increased variance. Ten runs produced: 0, 20, 20, 20, 20, 9, 20, 20, 20, 20 anomalies flagged. The detector was answering the wrong question.
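Variance here means coefficient of variation (standard deviation over mean) across repeated runs. A quick check on the ten counts above (computed over these exact numbers; the 26% CoV figure in the comparison table presumably comes from a different run set):

```python
from statistics import mean, pstdev

runs = [0, 20, 20, 20, 20, 9, 20, 20, 20, 20]  # anomalies flagged per run
cov = pstdev(runs) / mean(runs)  # coefficient of variation
print(f"mean={mean(runs):.1f}, CoV={cov:.0%}")  # mean=16.9, CoV=39%
```

A deterministic detector would flag the same count every run; any CoV above zero means the triage output depends on which run you happened to get.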

The analyst question

A real analyst handed that output would ignore individual messages and group by IP first. The next question is behavioral, not statistical: what is this IP doing overall?

Message-centric (what the tool produced):

Anomaly #1: "Failed password for root from 61.197.203.243" (MEDIUM)
Anomaly #2: "Failed password for admin from 61.197.203.243" (MEDIUM)
Anomaly #3: "Failed password for test from 61.197.203.243" (LOW)
... 47 more anomalies from the same IP ...

Entity-centric (what an analyst needs):

CREDENTIAL STUFFING (HIGH):
  IP 61.197.203.243 attempted 47 different usernames
  → Block this IP. Investigate if any succeeded elsewhere.

The profiler does the grouping. The output is actionable intelligence rather than raw events that require mental aggregation.
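The grouping itself is a few lines of standard-library Python. A minimal sketch (helper name and threshold are illustrative; a real profiler tracks more signals than distinct-username count):

```python
import re
from collections import defaultdict

FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")

def profile_ips(lines, stuffing_threshold=10):
    """Group failed logins by source IP; flag IPs trying many usernames."""
    users_by_ip = defaultdict(set)
    for line in lines:
        m = FAILED.search(line)
        if m:
            user, ip = m.groups()
            users_by_ip[ip].add(user)
    return {
        ip: {"distinct_users": len(users),
             "verdict": "CREDENTIAL STUFFING"
             if len(users) >= stuffing_threshold else "ok"}
        for ip, users in users_by_ip.items()
    }

lines = [f"Failed password for user{i} from 61.197.203.243" for i in range(47)]
lines += ["Failed password for alice from 10.0.0.5"]
profiles = profile_ips(lines)
print(profiles["61.197.203.243"])
# {'distinct_users': 47, 'verdict': 'CREDENTIAL STUFFING'}
```

The 47 per-message anomalies collapse into one finding, and the benign single-user IP stays out of the report.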

Profilers work until they don't

A pair of profilers (one for IPs, one for usernames) covered auth.log analysis cleanly. Fast, deterministic, zero variance.

The problem is generalization. A richer dataset like Splunk BOTS has firewall logs, IDS logs, DNS logs, and HTTP logs, each with different field types. Firewall logs have source and destination IPs, ports, and actions. IDS logs have signatures and severity. HTTP logs have URLs and user agents. Building a profiler for every field type across every log source does not scale.

The agent replacement

Instead of pre-computing profiles for every field combination upfront, the agent pattern exposes query tools and lets the agent pull what it needs during investigation:

# Before: pre-computed profiles for everything
ip_profiles = build_ip_profiles(records)       # Requires knowing fields upfront
user_profiles = build_user_profiles(records)

# After: agent queries what it needs
def query_ip(ip: str) -> dict:
    """Returns behavioral summary for any IP across all log sources."""
    return aggregate_entity_data(ip, entity_type="ip")

The agent starts from alerts (like a real analyst would) and pulls relevant data as the investigation unfolds. Adding a new data source means adding a query function, not building another profiler.
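One way the aggregation behind query_ip could look (a sketch under assumed record shapes; field names like src_ip and action are illustrative, not the BOTS schema, and here the records are passed explicitly rather than looked up by entity type):

```python
from collections import Counter

def aggregate_entity_data(ip, records):
    """Summarize one IP's behavior across normalized multi-source records."""
    mine = [r for r in records if r.get("src_ip") == ip]
    return {
        "ip": ip,
        "event_count": len(mine),
        "sources": sorted({r["source"] for r in mine}),
        "top_actions": Counter(r.get("action", "unknown")
                               for r in mine).most_common(3),
    }

records = [
    {"source": "firewall", "src_ip": "61.197.203.243", "action": "blocked"},
    {"source": "ids",      "src_ip": "61.197.203.243", "action": "alert"},
    {"source": "http",     "src_ip": "10.0.0.5",       "action": "GET"},
]
summary = aggregate_entity_data("61.197.203.243", records)
print(summary["event_count"], summary["sources"])  # 2 ['firewall', 'ids']
```

The point of the shape: one function answers "what is this IP doing" across every source, so onboarding a new log type only means normalizing it into the shared record format.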

Two tools (query_ip, query_domain) were enough to trace an attack from web scanner to ransomware infection on the BOTS dataset.
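The loop behind that trace can be a plain dispatch table: the model names a tool and an argument, Python executes it, and the result goes back into the context. A sketch with a stubbed model (no real LLM call; tool outputs are illustrative):

```python
def query_ip(ip):
    return {"ip": ip, "distinct_users_tried": 47}

def query_domain(domain):
    return {"domain": domain, "resolved_ips": ["61.197.203.243"]}

TOOLS = {"query_ip": query_ip, "query_domain": query_domain}

def run_agent(plan):
    """Execute a sequence of (tool, arg) calls; a real agent would let the
    LLM pick each next call from the accumulated evidence."""
    evidence = []
    for tool, arg in plan:
        evidence.append({"tool": tool, "arg": arg, "result": TOOLS[tool](arg)})
    return evidence

trace = run_agent([("query_domain", "bad.example"),
                   ("query_ip", "61.197.203.243")])
print(len(trace))  # 2
```

Everything inside the tools stays deterministic; the only nondeterminism left is which tool the model calls next.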

The numbers

Metric                          Message-centric   Profilers                Agent
Processing time (86K records)   617s              22.7s                    ~20 min
Variance                        26% CoV           0%                       Low (reasoning only)
New data source effort          N/A               Build new profiler       Add query function
Output usefulness               Noise             Good for single source   Good for any source

The agent is slower than profilers for a single known source. But it scales to any data source without custom code per field type, which profilers cannot do.

The pattern across the pipeline

Every iteration moved more work out of the LLM and into deterministic code:

Tool               LLM role
Format converter   LLM for everything
Log parser         Regex-first, LLM-fallback
Anomaly flagger    Stats-first, LLM-selective
Profilers          Python-only
Agent              LLM for reasoning only

The last step still uses an LLM, but only to reason about pre-filtered summaries. The Python layer does the counting, grouping, and aggregation.
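That division of labor can be made concrete: Python builds a compact summary, and only the summary, never the 86K raw records, goes into the prompt. A sketch with an illustrative prompt (field names and the 500-character bound are assumptions, not the pipeline's actual values):

```python
import json
from collections import Counter

def summarize(records):
    """Deterministic layer: all counting and grouping, no LLM involved."""
    by_ip = Counter(r["src_ip"] for r in records)
    return {"total": len(records), "top_ips": by_ip.most_common(3)}

def build_prompt(summary):
    """The LLM only ever sees this pre-aggregated summary."""
    return ("You are a security analyst. Explain what these aggregates "
            "suggest:\n" + json.dumps(summary))

records = [{"src_ip": "61.197.203.243"}] * 47 + [{"src_ip": "10.0.0.5"}] * 3
prompt = build_prompt(summarize(records))
print(len(prompt) < 500)  # True — the prompt stays tiny at any record count
```

The prompt size is bounded by the summary shape, not the log volume, which is what keeps the LLM step cheap and its nondeterminism confined to interpretation.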

What I got wrong first

Expecting the LLM to make detection decisions on raw logs. LLMs are expensive, slow, and non-deterministic. Asking one to count and group across 86,000 records is using it against its strengths. The correct split is Python for counting and grouping, LLM for reasoning about what the groups mean.

Tradeoffs

Profilers are faster and zero-variance but require knowing the schema upfront. Agents are slower and bring back a little variance but handle any data source with minimal new code. The decision lives on the "how many sources do you have and how often do they change" axis.

  • Single source, known fields: profilers.
  • Multiple sources, varied fields: agent with tools.
  • Either way: do as much as possible in Python. Let the LLM reason, not count.