From c68acaca75b5065822e05bd1fbc91662756adff6 Mon Sep 17 00:00:00 2001
From: Rico Komenda
Date: Fri, 19 Jun 2026 13:33:48 +0200
Subject: [PATCH] feat(C12): consolidate logging requirements from all chapters into C12
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Moves 12 logging-specific requirements scattered across C1, C2, C3, C8, C9, and C11 into C12 as the single authoritative source for logging controls. Remaining requirements in source chapters are renumbered.

Changes in C12:
- C12.1: add 12.1.7 (screening logs), 12.1.8 (hosted model identifier), 12.1.9 (RAG retrieval events) — moved from C2.2.3, C3.3.4, C8.1.4
- C12.2: add 12.2.7 (extraction-alert metadata) — moved from C11.5.2; renumber former 12.2.7-8 to 12.2.8-9
- C12.6: add 12.6.5 (agent audit log records), 12.6.6 (kill-switch logging), 12.6.7 (self-modification logging) — moved from C9.4.3, C9.6.3, C11.9.3; renumber former 12.6.5 to 12.6.8
- C12.7 (new): Training Data & Model Lifecycle Audit — adds 12.7.1-5 moved from C1.1.2, C1.2.2, C3.2.2, C8.1.2, C1.1.7
---
 ...raining-Data-Integrity-and-Traceability.md | 15 ++++------
 1.0/en/0x10-C02-Input-Validation.md           |  5 ++--
 1.0/en/0x10-C03-Model-Lifecycle-Management.md |  6 ++--
 ...8-Memory-Embeddings-and-Vector-Database.md |  6 ++--
 ...10-C09-Orchestration-and-Agentic-Action.md |  8 ++---
 1.0/en/0x10-C11-Adversarial-Robustness.md     | 16 +++++-----
 1.0/en/0x10-C12-Monitoring-and-Logging.md     | 29 ++++++++++++++++---
 7 files changed, 47 insertions(+), 38 deletions(-)

diff --git a/1.0/en/0x10-C01-Training-Data-Integrity-and-Traceability.md b/1.0/en/0x10-C01-Training-Data-Integrity-and-Traceability.md
index 1c5cf45..fff82ce 100644
--- a/1.0/en/0x10-C01-Training-Data-Integrity-and-Traceability.md
+++ b/1.0/en/0x10-C01-Training-Data-Integrity-and-Traceability.md
@@ -13,12 +13,10 @@ Training data origin and data security are critical to the security and trustwor
 | # | Description | Level |
 | :--------: | --------------------------------------------------------------------------------------------------------------------- | :---: |
 | **1.1.1** | **Verify that** training data includes only features, attributes, and fields required for the model's stated purpose. | 1 |
-| **1.1.2** | **Verify that** the lineage of each dataset and its components, including all transformations, augmentations, and merges, is recorded and can be reconstructed. | 1 |
-| **1.1.3** | **Verify that** an up-to-date inventory is kept of every training-data source, including its origin, responsible party, license, collection method, intended use constraints, and processing history. | 2 |
-| **1.1.4** | **Verify that** datasets are watermarked so their use can be attributed and any unauthorized use detected. | 3 |
-| **1.1.5** | **Verify that** data integrity is provided when training data is stored and transferred. | 2 |
-| **1.1.6** | **Verify that** integrity monitoring is applied to guard against unauthorized modifications or corruption of training data. | 2 |
-| **1.1.7** | **Verify that** all training datasets are uniquely identified, with change tracking, to support rollback and forensic analysis. | 3 |
+| **1.1.2** | **Verify that** an up-to-date inventory is kept of every training-data source, including its origin, responsible party, license, collection method, intended use constraints, and processing history. | 2 |
+| **1.1.3** | **Verify that** datasets are watermarked so their use can be attributed and any unauthorized use detected. | 3 |
+| **1.1.4** | **Verify that** data integrity is provided when training data is stored and transferred. | 2 |
+| **1.1.5** | **Verify that** integrity monitoring is applied to guard against unauthorized modifications or corruption of training data. | 2 |
 
 ---
 
@@ -29,9 +27,8 @@ Labeling and annotation processes must be protected against unauthorized modific
 | # | Description | Level |
 | :--------: | --------------------------------------------------------------------------------------------------------------------- | :---: |
 | **1.2.1** | **Verify that** labeling platforms enforce access controls that restrict who can create, modify, or approve annotations. | 1 |
-| **1.2.2** | **Verify that** all labeling activities are recorded in logs. | 1 |
-| **1.2.3** | **Verify that** cryptographic integrity is applied to labeling artifacts. | 2 |
-| **1.2.4** | **Verify that** sensitive information in labels is redacted, anonymized, or encrypted before being used in any labeling artifact. | 2 |
+| **1.2.2** | **Verify that** cryptographic integrity is applied to labeling artifacts. | 2 |
+| **1.2.3** | **Verify that** sensitive information in labels is redacted, anonymized, or encrypted before being used in any labeling artifact. | 2 |
 
 ---
 
diff --git a/1.0/en/0x10-C02-Input-Validation.md b/1.0/en/0x10-C02-Input-Validation.md
index 8d7f43a..491a9de 100644
--- a/1.0/en/0x10-C02-Input-Validation.md
+++ b/1.0/en/0x10-C02-Input-Validation.md
@@ -31,9 +31,8 @@ Syntactically valid prompts may request disallowed content such as instructions
 | :--------: | ------------------------------------------------------------------------------------------------------------------- | :---: |
 | **2.2.1** | **Verify that** every prompt is scored by a content classifier for violence, self-harm, hate, and sexual content against configurable thresholds. Prompts that exceed those thresholds are rejected or sanitized before reaching model context. | 1 |
 | **2.2.2** | **Verify that** prompt content classification is evaluated for languages that are not supported. | 1 |
-| **2.2.3** | **Verify that** screening logs include classifier confidence scores and policy category tags with applied stage and trace metadata. | 2 |
-| **2.2.4** | **Verify that** non-text inputs (image/video/audio) are checked for adversarial perturbations, steganographic payloads, hidden or embedded content, or known attack patterns. | 2 |
-| **2.2.5** | **Verify that** coordinated attacks spanning multiple input types (e.g., steganographic payloads in images combined with prompt injection in text) are detected and blocked. | 3 |
+| **2.2.3** | **Verify that** non-text inputs (image/video/audio) are checked for adversarial perturbations, steganographic payloads, hidden or embedded content, or known attack patterns. | 2 |
+| **2.2.4** | **Verify that** coordinated attacks spanning multiple input types (e.g., steganographic payloads in images combined with prompt injection in text) are detected and blocked. | 3 |
 
 ---
 
diff --git a/1.0/en/0x10-C03-Model-Lifecycle-Management.md b/1.0/en/0x10-C03-Model-Lifecycle-Management.md
index 3bb017b..cae5f4d 100644
--- a/1.0/en/0x10-C03-Model-Lifecycle-Management.md
+++ b/1.0/en/0x10-C03-Model-Lifecycle-Management.md
@@ -25,9 +25,8 @@ Models must pass defined security and safety validations before deployment.
 | # | Description | Level |
 | :--------: | --------------------------------------------------------------------------------------------------------------- | :---: |
 | **3.2.1** | **Verify that** models undergo automated input validation testing, safety evaluation testing and output sanitization testing before deployment. | 1 |
-| **3.2.2** | **Verify that** all model changes (deployment, configuration, retirement) generate immutable audit records. | 2 |
-| **3.2.3** | **Verify that** models that are subjected to post-training quantization will be re-evaluated against the same safety and alignment test suite on the compressed artifact before deployment. | 2 |
-| **3.2.4** | **Verify that** provider model, version, or routing changes trigger security re-evaluation before continued use. | 3 |
+| **3.2.2** | **Verify that** models that are subjected to post-training quantization will be re-evaluated against the same safety and alignment test suite on the compressed artifact before deployment. | 2 |
+| **3.2.3** | **Verify that** provider model, version, or routing changes trigger security re-evaluation before continued use. | 3 |
 
 ---
 
@@ -40,7 +39,6 @@ Model deployments must be controlled, monitored, and reversible.
 | **3.3.1** | **Verify that** production deployments implement rollout mechanisms with automated rollback triggers. | 2 |
 | **3.3.2** | **Verify that** rollback capabilities restore the complete model state. | 2 |
 | **3.3.3** | **Verify that** model versions running in parallel use isolated runtime state so that AI-specific shared resources are not shared across deployments. | 2 |
-| **3.3.4** | **Verify that** logs record the exact hosted model identifier returned by the provider. | 2 |
 
 ---
 
diff --git a/1.0/en/0x10-C08-Memory-Embeddings-and-Vector-Database.md b/1.0/en/0x10-C08-Memory-Embeddings-and-Vector-Database.md
index d62e266..f86fc07 100644
--- a/1.0/en/0x10-C08-Memory-Embeddings-and-Vector-Database.md
+++ b/1.0/en/0x10-C08-Memory-Embeddings-and-Vector-Database.md
@@ -13,10 +13,8 @@ Enforce fine-grained access controls and query-time scope enforcement for every
 | # | Description | Level |
 | :--: | --- | :---: |
 | **8.1.1** | **Verify that** vector identifiers and namespaces enforce uniqueness per tenant and prevent cross-tenant collisions. | 1 |
-| **8.1.2** | **Verify that** every ingested document is tagged at write time with source, writer identity, and timestamp. | 2 |
-| **8.1.3** | **Verify that** document metadata tags are immutable after the initial write. | 2 |
-| **8.1.4** | **Verify that** RAG pipeline retrieval events are logged, including the query, documents retrieved, and knowledge source. | 2 |
-| **8.1.5** | **Verify that** retrieval operations enforces scope constraints. | 2 |
+| **8.1.2** | **Verify that** document metadata tags are immutable after the initial write. | 2 |
+| **8.1.3** | **Verify that** retrieval operations enforces scope constraints. | 2 |
 
 ---
 
diff --git a/1.0/en/0x10-C09-Orchestration-and-Agentic-Action.md b/1.0/en/0x10-C09-Orchestration-and-Agentic-Action.md
index c1a4bca..b3fb529 100644
--- a/1.0/en/0x10-C09-Orchestration-and-Agentic-Action.md
+++ b/1.0/en/0x10-C09-Orchestration-and-Agentic-Action.md
@@ -59,9 +59,8 @@ Make every action attributable and every mutation detectable.
 | :--: | --- | :---: |
 | **9.4.1** | **Verify that** each agent instance has a unique cryptographic identity and authenticates as a first-class principal to downstream systems. | 2 |
 | **9.4.2** | **Verify that** agent-initiated actions are cryptographically bound to each step of the execution chain for non-repudiation. | 2 |
-| **9.4.3** | **Verify that** audit log records include identity, scope, authorization decisions, tool parameters, and outcomes. | 2 |
-| **9.4.4** | **Verify that** agent identity credentials rotate on a defined schedule. | 3 |
-| **9.4.5** | **Verify that** agent state persisted between invocations is integrity-protected. | 3 |
+| **9.4.3** | **Verify that** agent identity credentials rotate on a defined schedule. | 3 |
+| **9.4.4** | **Verify that** agent state persisted between invocations is integrity-protected. | 3 |
 
 ---
 
@@ -88,8 +87,7 @@ Provide shutdown and graceful degradation paths under human control, with mechan
 | :--: | --- | :---: |
 | **9.6.1** | **Verify that** a manual kill-switch mechanism exists to immediately halt AI model inference and outputs. | 1 |
 | **9.6.2** | **Verify that** when a human-approval gate is not satisfied within the defined approval time, the system blocks the pending action. | 2 |
-| **9.6.3** | **Verify that** kill-switch activations and override commands are logged. | 2 |
-| **9.6.4** | **Verify that** kill-switch commands are implemented through an out-of-band channel that is isolated from the agent runtime. | 3 |
+| **9.6.3** | **Verify that** kill-switch commands are implemented through an out-of-band channel that is isolated from the agent runtime. | 3 |
 
 ---
 
diff --git a/1.0/en/0x10-C11-Adversarial-Robustness.md b/1.0/en/0x10-C11-Adversarial-Robustness.md
index eb716cf..f6bdb88 100644
--- a/1.0/en/0x10-C11-Adversarial-Robustness.md
+++ b/1.0/en/0x10-C11-Adversarial-Robustness.md
@@ -68,11 +68,10 @@ Detect and deter unauthorized model cloning through API abuse. Rate limiting, qu
 | # | Description | Level |
 | :--------: | ------------------------------------------------------------------------------------------------------------------- | :---: |
 | **11.5.1** | **Verify that** inference endpoints enforce per-principal and global rate limits sized to the extraction threat model, and not solely as a generic API throttle. | 1 |
-| **11.5.2** | **Verify that** extraction-alert events include offending query metadata (e.g., source principal, query volume, input distribution statistics) to support investigation. | 2 |
-| **11.5.3** | **Verify that** query-pattern analysis (e.g., query diversity, input distribution anomalies, output-space coverage anomalies) feeds an automated extraction-attempt detector. | 2 |
-| **11.5.4** | **Verify that** raw model outputs (e.g., full posterior distributions, output vectors) are not directly exposed beyond the application backend, and that externally visible responses minimize output informativeness calibrated to the extraction risk level. | 2 |
-| **11.5.5** | **Verify that** model watermarking or fingerprinting techniques are applied so that unauthorized copies can be identified. | 3 |
-| **11.5.6** | **Verify that** detection of suspected extraction activity triggers adaptive response measures proportional to estimated extraction risk. | 3 |
+| **11.5.2** | **Verify that** query-pattern analysis (e.g., query diversity, input distribution anomalies, output-space coverage anomalies) feeds an automated extraction-attempt detector. | 2 |
+| **11.5.3** | **Verify that** raw model outputs (e.g., full posterior distributions, output vectors) are not directly exposed beyond the application backend, and that externally visible responses minimize output informativeness calibrated to the extraction risk level. | 2 |
+| **11.5.4** | **Verify that** model watermarking or fingerprinting techniques are applied so that unauthorized copies can be identified. | 3 |
+| **11.5.5** | **Verify that** detection of suspected extraction activity triggers adaptive response measures proportional to estimated extraction risk. | 3 |
 
 ---
 
@@ -121,10 +120,9 @@ Security controls for systems where the AI can modify its own configuration, pro
 | :--------: | ------------------------------------------------------------------------------------------------------------------- | :---: |
 | **11.9.1** | **Verify that** any self-modification capability (e.g., prompt rewriting, tool-list changes, parameter updates) is restricted to explicitly designated areas with enforced boundaries. | 2 |
 | **11.9.2** | **Verify that** proposed self-modifications undergo security impact assessment or policy validation before taking effect. | 2 |
-| **11.9.3** | **Verify that** self-modifications are explicitly classified as security-relevant events and logged with sufficient detail to reconstruct what changed, when, by which agent or principal, and under what authorization. This logging applies even if self-modification is not otherwise documented as a logged event. | 2 |
-| **11.9.4** | **Verify that** self-modifications are reversible and subject to integrity verification, so that rollback to a known-good state is possible and can be confirmed. | 2 |
-| **11.9.5** | **Verify that** self-modification scope is bounded (e.g., maximum change magnitude, rate limits on updates, prohibited modification targets) to prevent runaway or adversarially induced changes. | 3 |
-| **11.9.6** | **Verify that** when safety violation data (blocked inputs, filtered outputs, flagged hallucinations) is used as training signal for model improvement, the feedback pipeline includes integrity verification, poisoning detection, and human review gates to prevent adversarial manipulation of the improvement mechanism. | 3 |
+| **11.9.3** | **Verify that** self-modifications are reversible and subject to integrity verification, so that rollback to a known-good state is possible and can be confirmed. | 2 |
+| **11.9.4** | **Verify that** self-modification scope is bounded (e.g., maximum change magnitude, rate limits on updates, prohibited modification targets) to prevent runaway or adversarially induced changes. | 3 |
+| **11.9.5** | **Verify that** when safety violation data (blocked inputs, filtered outputs, flagged hallucinations) is used as training signal for model improvement, the feedback pipeline includes integrity verification, poisoning detection, and human review gates to prevent adversarial manipulation of the improvement mechanism. | 3 |
 
 ## C11.10 Adversarial Bias Exploitation Defense
 
diff --git a/1.0/en/0x10-C12-Monitoring-and-Logging.md b/1.0/en/0x10-C12-Monitoring-and-Logging.md
index 8676492..e8523b8 100644
--- a/1.0/en/0x10-C12-Monitoring-and-Logging.md
+++ b/1.0/en/0x10-C12-Monitoring-and-Logging.md
@@ -4,7 +4,7 @@
 
 Deliver real-time and forensic visibility into what the model and other AI components see, do, and return, so AI-specific threats can be detected, triaged, and learned from.
 
-This chapter focuses on controls unique to AI systems for monitoring, logging, and anomaly detection: AI-specific log content (model identifier, token usage, safety filter outcomes, prompt/response handling), AI-specific abuse and attack detection (jailbreak, prompt injection, extraction, multi-turn trajectory, covert channels over LLM endpoints), model and data drift detection, AI-specific telemetry signals (token attribution, output/input ratio anomalies), AI incident response, and proactive agent behavior monitoring.
+This chapter focuses on controls unique to AI systems for monitoring, logging, and anomaly detection: AI-specific log content (model identifier, token usage, safety filter outcomes, prompt/response handling), AI-specific abuse and attack detection (jailbreak, prompt injection, extraction, multi-turn trajectory, covert channels over LLM endpoints), model and data drift detection, AI-specific telemetry signals (token attribution, output/input ratio anomalies), AI incident response, proactive agent behavior monitoring, and training data and model lifecycle audit logging.
 
 ---
 
@@ -18,6 +18,9 @@ This chapter focuses on controls unique to AI systems for monitoring, logging, a
 | **12.1.4** | **Verify that** policy decisions and safety filtering actions are logged with enough detail to audit and debug content moderation systems. | 2 |
 | **12.1.5** | **Verify that** log entries for AI inference events follow a structured, interoperable schema that includes at least the model identifier, token usage (input and output), provider name, and operation type, so AI observability stays consistent across tools and platforms. | 2 |
 | **12.1.6** | **Verify that** full prompt and response content is logged only when a security-relevant event is detected (e.g., safety filter trigger, prompt injection detection, anomaly flag), or when required by explicit user consent and a documented legal basis. | 2 |
+| **12.1.7** | **Verify that** screening logs include classifier confidence scores and policy category tags with applied stage and trace metadata. | 2 |
+| **12.1.8** | **Verify that** logs record the exact hosted model identifier returned by the provider. | 2 |
+| **12.1.9** | **Verify that** RAG pipeline retrieval events are logged, including the query, documents retrieved, and knowledge source. | 2 |
 
 ---
 
@@ -33,8 +36,9 @@ Detect AI-specific attack patterns (jailbreak, prompt injection, model extractio
 | **12.2.4** | **Verify that** custom rules detect AI-specific threat patterns, including coordinated jailbreak attempts, prompt injection campaigns, system prompt extraction attempts, and model extraction attacks. | 2 |
 | **12.2.5** | **Verify that** per-user and per-session token consumption triggers an alert when consumption exceeds defined thresholds. | 2 |
 | **12.2.6** | **Verify that** automated incident response workflows can isolate compromised models and block malicious users. | 2 |
-| **12.2.7** | **Verify that** session-level conversation trajectory analysis detects multi-turn jailbreak patterns where no single turn looks overtly malicious on its own, but the conversation as a whole shows attack indicators. | 3 |
-| **12.2.8** | **Verify that** LLM API traffic is monitored for covert channel indicators, including Base64-encoded payloads, structured non-human query patterns, and communication signatures consistent with malware command-and-control activity using LLM endpoints. | 3 |
+| **12.2.7** | **Verify that** extraction-alert events include offending query metadata (e.g., source principal, query volume, input distribution statistics) to support investigation. | 2 |
+| **12.2.8** | **Verify that** session-level conversation trajectory analysis detects multi-turn jailbreak patterns where no single turn looks overtly malicious on its own, but the conversation as a whole shows attack indicators. | 3 |
+| **12.2.9** | **Verify that** LLM API traffic is monitored for covert channel indicators, including Base64-encoded payloads, structured non-human query patterns, and communication signatures consistent with malware command-and-control activity using LLM endpoints. | 3 |
 
 ---
 
@@ -88,7 +92,24 @@ Detect and prevent security threats arising from proactive (agent-initiated) beh
 | **12.6.2** | **Verify that** autonomous initiative triggers include security context evaluation and threat landscape assessment. | 2 |
 | **12.6.3** | **Verify that** proactive behavior patterns are analyzed for potential security implications and unintended consequences. | 2 |
 | **12.6.4** | **Verify that** audit logs capture the complete approval chain for security-critical proactive actions, including approver identity, timestamp, action parameters, and decision outcome. | 2 |
-| **12.6.5** | **Verify that** behavioral anomaly detection identifies deviations in proactive agent patterns that may indicate compromise. | 3 |
+| **12.6.5** | **Verify that** audit log records include identity, scope, authorization decisions, tool parameters, and outcomes. | 2 |
+| **12.6.6** | **Verify that** kill-switch activations and override commands are logged. | 2 |
+| **12.6.7** | **Verify that** self-modifications are explicitly classified as security-relevant events and logged with sufficient detail to reconstruct what changed, when, by which agent or principal, and under what authorization. | 2 |
+| **12.6.8** | **Verify that** behavioral anomaly detection identifies deviations in proactive agent patterns that may indicate compromise. | 3 |
+
+---
+
+## C12.7 Training Data & Model Lifecycle Audit
+
+Ensure that the provenance and change history of training data, model artifacts, and knowledge sources are auditable throughout the AI development lifecycle.
+
+| # | Description | Level |
+| :--------: | ------------------------------------------------------------------------------------------------------------------- | :---: |
+| **12.7.1** | **Verify that** the lineage of each dataset and its components, including all transformations, augmentations, and merges, is recorded and can be reconstructed. | 1 |
+| **12.7.2** | **Verify that** all labeling activities are recorded in logs. | 1 |
+| **12.7.3** | **Verify that** all model changes (deployment, configuration, retirement) generate immutable audit records. | 2 |
+| **12.7.4** | **Verify that** every ingested document is tagged at write time with source, writer identity, and timestamp. | 2 |
+| **12.7.5** | **Verify that** all training datasets are uniquely identified, with change tracking, to support rollback and forensic analysis. | 3 |
 
 ---
 