Merged
15 changes: 6 additions & 9 deletions 1.0/en/0x10-C01-Training-Data-Integrity-and-Traceability.md
@@ -13,12 +13,10 @@ Training data origin and data security are critical to the security and trustwor
| # | Description | Level |
| :--------: | --------------------------------------------------------------------------------------------------------------------- | :---: |
| **1.1.1** | **Verify that** training data includes only features, attributes, and fields required for the model's stated purpose. | 1 |
| **1.1.2** | **Verify that** the lineage of each dataset and its components, including all transformations, augmentations, and merges, is recorded and can be reconstructed. | 1 |
| **1.1.3** | **Verify that** an up-to-date inventory is kept of every training-data source, including its origin, responsible party, license, collection method, intended use constraints, and processing history. | 2 |
| **1.1.4** | **Verify that** datasets are watermarked so their use can be attributed and any unauthorized use detected. | 3 |
| **1.1.5** | **Verify that** data integrity is provided when training data is stored and transferred. | 2 |
| **1.1.6** | **Verify that** integrity monitoring is applied to guard against unauthorized modifications or corruption of training data. | 2 |
| **1.1.7** | **Verify that** all training datasets are uniquely identified, with change tracking, to support rollback and forensic analysis. | 3 |
| **1.1.2** | **Verify that** an up-to-date inventory is kept of every training-data source, including its origin, responsible party, license, collection method, intended use constraints, and processing history. | 2 |
| **1.1.3** | **Verify that** datasets are watermarked so their use can be attributed and any unauthorized use detected. | 3 |
| **1.1.4** | **Verify that** data integrity is provided when training data is stored and transferred. | 2 |
| **1.1.5** | **Verify that** integrity monitoring is applied to guard against unauthorized modifications or corruption of training data. | 2 |
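
The integrity and integrity-monitoring requirements above do not prescribe a mechanism. One possible reading, as a minimal sketch: record a SHA-256 digest per file when a dataset is approved, then recompute and compare later. The function names and the manifest format (relative path mapped to hex digest) are assumptions for illustration, not part of the standard.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path) -> dict:
    """Map each file's relative path to its digest (hypothetical manifest format)."""
    return {
        p.relative_to(dataset_dir).as_posix(): sha256_file(p)
        for p in sorted(dataset_dir.rglob("*"))
        if p.is_file()
    }


def find_modified(dataset_dir: Path, manifest: dict) -> list:
    """List files changed, removed, or added since the manifest was built."""
    current = build_manifest(dataset_dir)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

A digest manifest only detects change; it would need to be stored and signed separately from the data for the comparison to be trustworthy.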

---

@@ -29,9 +27,8 @@ Labeling and annotation processes must be protected against unauthorized modific
| # | Description | Level |
| :--------: | --------------------------------------------------------------------------------------------------------------------- | :---: |
| **1.2.1** | **Verify that** labeling platforms enforce access controls that restrict who can create, modify, or approve annotations. | 1 |
| **1.2.2** | **Verify that** all labeling activities are recorded in logs. | 1 |
| **1.2.3** | **Verify that** cryptographic integrity is applied to labeling artifacts. | 2 |
| **1.2.4** | **Verify that** sensitive information in labels is redacted, anonymized, or encrypted before being used in any labeling artifact. | 2 |
| **1.2.2** | **Verify that** cryptographic integrity is applied to labeling artifacts. | 2 |
| **1.2.3** | **Verify that** sensitive information in labels is redacted, anonymized, or encrypted before being used in any labeling artifact. | 2 |
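
"Cryptographic integrity" for labeling artifacts could take several forms. The sketch below assumes an HMAC-SHA256 tag over a canonical JSON encoding of each annotation record; the record fields, function names, and key handling are hypothetical.

```python
import hashlib
import hmac
import json


def sign_annotation(annotation: dict, key: bytes) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    payload = json.dumps(annotation, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_annotation(annotation: dict, tag: str, key: bytes) -> bool:
    """Compare in constant time; False means the record or its tag was altered."""
    return hmac.compare_digest(sign_annotation(annotation, key), tag)
```

Sorting keys before serializing keeps the tag stable when a record is re-serialized with a different field order.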

---

5 changes: 2 additions & 3 deletions 1.0/en/0x10-C02-Input-Validation.md
@@ -31,9 +31,8 @@ Syntactically valid prompts may request disallowed content such as instructions
| :--------: | ------------------------------------------------------------------------------------------------------------------- | :---: |
| **2.2.1** | **Verify that** every prompt is scored by a content classifier for violence, self-harm, hate, and sexual content against configurable thresholds. Prompts that exceed those thresholds are rejected or sanitized before reaching model context. | 1 |
| **2.2.2** | **Verify that** prompt content classification is evaluated for languages that are not supported. | 1 |
| **2.2.3** | **Verify that** screening logs include classifier confidence scores and policy category tags with applied stage and trace metadata. | 2 |
| **2.2.4** | **Verify that** non-text inputs (image/video/audio) are checked for adversarial perturbations, steganographic payloads, hidden or embedded content, or known attack patterns. | 2 |
| **2.2.5** | **Verify that** coordinated attacks spanning multiple input types (e.g., steganographic payloads in images combined with prompt injection in text) are detected and blocked. | 3 |
| **2.2.3** | **Verify that** non-text inputs (image/video/audio) are checked for adversarial perturbations, steganographic payloads, hidden or embedded content, or known attack patterns. | 2 |
| **2.2.4** | **Verify that** coordinated attacks spanning multiple input types (e.g., steganographic payloads in images combined with prompt injection in text) are detected and blocked. | 3 |
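
The threshold check in 2.2.1 can be illustrated independently of any particular classifier. In this sketch the scores are assumed to arrive as a category-to-probability mapping from some upstream classifier; the threshold values, category keys, and names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-category thresholds; real values would come from policy configuration.
DEFAULT_THRESHOLDS = {"violence": 0.8, "self_harm": 0.5, "hate": 0.7, "sexual": 0.7}


@dataclass(frozen=True)
class ScreeningDecision:
    allowed: bool
    violations: tuple


def screen_prompt(scores: dict, thresholds: Optional[dict] = None) -> ScreeningDecision:
    """Reject when any configured category score exceeds its threshold.

    A category missing from `scores` is scored as 1.0, so a partial or
    failed classifier response blocks the prompt instead of letting it through.
    """
    limits = DEFAULT_THRESHOLDS if thresholds is None else thresholds
    violations = tuple(
        category
        for category, limit in sorted(limits.items())
        if scores.get(category, 1.0) > limit
    )
    return ScreeningDecision(allowed=not violations, violations=violations)
```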

---

6 changes: 2 additions & 4 deletions 1.0/en/0x10-C03-Model-Lifecycle-Management.md
@@ -25,9 +25,8 @@ Models must pass defined security and safety validations before deployment.
| # | Description | Level |
| :--------: | --------------------------------------------------------------------------------------------------------------- | :---: |
| **3.2.1** | **Verify that** models undergo automated input validation testing, safety evaluation testing and output sanitization testing before deployment. | 1 |
| **3.2.2** | **Verify that** all model changes (deployment, configuration, retirement) generate immutable audit records. | 2 |
| **3.2.3** | **Verify that** models that are subjected to post-training quantization will be re-evaluated against the same safety and alignment test suite on the compressed artifact before deployment. | 2 |
| **3.2.4** | **Verify that** provider model, version, or routing changes trigger security re-evaluation before continued use. | 3 |
| **3.2.2** | **Verify that** models that are subjected to post-training quantization will be re-evaluated against the same safety and alignment test suite on the compressed artifact before deployment. | 2 |
| **3.2.3** | **Verify that** provider model, version, or routing changes trigger security re-evaluation before continued use. | 3 |

---

@@ -40,7 +39,6 @@ Model deployments must be controlled, monitored, and reversible.
| **3.3.1** | **Verify that** production deployments implement rollout mechanisms with automated rollback triggers. | 2 |
| **3.3.2** | **Verify that** rollback capabilities restore the complete model state. | 2 |
| **3.3.3** | **Verify that** model versions running in parallel use isolated runtime state so that AI-specific shared resources are not shared across deployments. | 2 |
| **3.3.4** | **Verify that** logs record the exact hosted model identifier returned by the provider. | 2 |
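
An "automated rollback trigger" (3.3.1) is left open by the text. One simple form, sketched here under assumed parameters, watches the error rate over a sliding window of recent requests; the class name, window size, and limits are hypothetical.

```python
from collections import deque


class RollbackTrigger:
    """Signal a rollback when the error rate over a sliding window exceeds a limit."""

    def __init__(self, window: int = 100, max_error_rate: float = 0.05, min_samples: int = 20):
        self._outcomes = deque(maxlen=window)  # oldest outcomes fall off automatically
        self._max_error_rate = max_error_rate
        self._min_samples = min_samples

    def record(self, success: bool) -> bool:
        """Record one request outcome; return True when rollback should start."""
        self._outcomes.append(success)
        if len(self._outcomes) < self._min_samples:
            return False  # too little data to judge the new version
        failures = sum(1 for ok in self._outcomes if not ok)
        return failures / len(self._outcomes) > self._max_error_rate
```

The `min_samples` floor avoids rolling back on the first failed request of a fresh deployment.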

---

6 changes: 2 additions & 4 deletions 1.0/en/0x10-C08-Memory-Embeddings-and-Vector-Database.md
@@ -13,10 +13,8 @@ Enforce fine-grained access controls and query-time scope enforcement for every
| # | Description | Level |
| :--: | --- | :---: |
| **8.1.1** | **Verify that** vector identifiers and namespaces enforce uniqueness per tenant and prevent cross-tenant collisions. | 1 |
| **8.1.2** | **Verify that** every ingested document is tagged at write time with source, writer identity, and timestamp. | 2 |
| **8.1.3** | **Verify that** document metadata tags are immutable after the initial write. | 2 |
| **8.1.4** | **Verify that** RAG pipeline retrieval events are logged, including the query, documents retrieved, and knowledge source. | 2 |
| **8.1.5** | **Verify that** retrieval operations enforce scope constraints. | 2 |
| **8.1.2** | **Verify that** document metadata tags are immutable after the initial write. | 2 |
| **8.1.3** | **Verify that** retrieval operations enforce scope constraints. | 2 |
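
Per-tenant identifier uniqueness and query-time scope enforcement can be shown with a toy store. This is an in-memory stand-in, not any real vector database API; the class names and the dot-product ranking are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredVector:
    tenant_id: str
    doc_id: str
    embedding: tuple


class TenantScopedStore:
    """In-memory stand-in for a vector database, keyed by (tenant_id, doc_id)."""

    def __init__(self) -> None:
        self._items = {}

    def upsert(self, item: StoredVector) -> None:
        # The composite key lets two tenants reuse a doc_id without colliding.
        self._items[(item.tenant_id, item.doc_id)] = item

    def search(self, tenant_id: str, query: tuple, top_k: int = 3) -> list:
        """Rank only the caller's own vectors by dot product with the query."""
        candidates = [v for (owner, _), v in self._items.items() if owner == tenant_id]
        candidates.sort(
            key=lambda v: sum(a * b for a, b in zip(v.embedding, query)),
            reverse=True,
        )
        return [v.doc_id for v in candidates[:top_k]]
```

The scope filter is applied inside `search` rather than by the caller, so a caller cannot omit it.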

---

8 changes: 3 additions & 5 deletions 1.0/en/0x10-C09-Orchestration-and-Agentic-Action.md
@@ -59,9 +59,8 @@ Make every action attributable and every mutation detectable.
| :--: | --- | :---: |
| **9.4.1** | **Verify that** each agent instance has a unique cryptographic identity and authenticates as a first-class principal to downstream systems. | 2 |
| **9.4.2** | **Verify that** agent-initiated actions are cryptographically bound to each step of the execution chain for non-repudiation. | 2 |
| **9.4.3** | **Verify that** audit log records include identity, scope, authorization decisions, tool parameters, and outcomes. | 2 |
| **9.4.4** | **Verify that** agent identity credentials rotate on a defined schedule. | 3 |
| **9.4.5** | **Verify that** agent state persisted between invocations is integrity-protected. | 3 |
| **9.4.3** | **Verify that** agent identity credentials rotate on a defined schedule. | 3 |
| **9.4.4** | **Verify that** agent state persisted between invocations is integrity-protected. | 3 |
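
Binding each action to the preceding step of an execution chain can be sketched as a tag chain, where every tag covers the action and the previous tag. The example uses HMAC from the standard library as a stand-in; true non-repudiation would need per-agent asymmetric signatures, which this sketch does not provide. All names are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous tag" for the first step


def _tag(prev_tag: str, action: dict, agent_key: bytes) -> str:
    body = json.dumps({"prev": prev_tag, "action": action}, sort_keys=True).encode("utf-8")
    return hmac.new(agent_key, body, hashlib.sha256).hexdigest()


def append_step(chain: list, action: dict, agent_key: bytes) -> None:
    """Append an action whose tag covers both the action and the previous tag."""
    prev_tag = chain[-1]["tag"] if chain else GENESIS
    chain.append({"action": action, "prev": prev_tag, "tag": _tag(prev_tag, action, agent_key)})


def verify_chain(chain: list, agent_key: bytes) -> bool:
    """Recompute every tag in order.

    An edited, reordered, or interior-removed step fails verification;
    truncation of the tail is not detected by this scheme.
    """
    prev_tag = GENESIS
    for step in chain:
        expected = _tag(prev_tag, step["action"], agent_key)
        if step["prev"] != prev_tag or not hmac.compare_digest(expected, step["tag"]):
            return False
        prev_tag = step["tag"]
    return True
```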

---

@@ -88,8 +87,7 @@ Provide shutdown and graceful degradation paths under human control, with mechan
| :--: | --- | :---: |
| **9.6.1** | **Verify that** a manual kill-switch mechanism exists to immediately halt AI model inference and outputs. | 1 |
| **9.6.2** | **Verify that** when a human-approval gate is not satisfied within the defined approval time, the system blocks the pending action. | 2 |
| **9.6.3** | **Verify that** kill-switch activations and override commands are logged. | 2 |
| **9.6.4** | **Verify that** kill-switch commands are implemented through an out-of-band channel that is isolated from the agent runtime. | 3 |
| **9.6.3** | **Verify that** kill-switch commands are implemented through an out-of-band channel that is isolated from the agent runtime. | 3 |
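
The approval-gate behavior in 9.6.2 amounts to a default-deny rule with a deadline. A minimal sketch, assuming timestamps from a monotonic clock and a hypothetical `ApprovalGate` class:

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ApprovalGate:
    """Pending action that stays blocked unless approved before the deadline."""

    timeout_seconds: float
    created_at: float = field(default_factory=time.monotonic)
    approved_at: Optional[float] = None

    def approve(self, now: Optional[float] = None) -> None:
        """Record the approval time; `now` can be injected for testing."""
        self.approved_at = time.monotonic() if now is None else now

    def may_proceed(self) -> bool:
        """Allow only if an approval arrived within the window; otherwise block."""
        if self.approved_at is None:
            return False
        return self.approved_at - self.created_at <= self.timeout_seconds
```

A late approval is recorded but never unblocks the action, so an expired request has to be resubmitted.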

---

16 changes: 7 additions & 9 deletions 1.0/en/0x10-C11-Adversarial-Robustness.md
@@ -68,11 +68,10 @@ Detect and deter unauthorized model cloning through API abuse. Rate limiting, qu
| # | Description | Level |
| :--------: | ------------------------------------------------------------------------------------------------------------------- | :---: |
| **11.5.1** | **Verify that** inference endpoints enforce per-principal and global rate limits sized to the extraction threat model, and not solely as a generic API throttle. | 1 |
| **11.5.2** | **Verify that** extraction-alert events include offending query metadata (e.g., source principal, query volume, input distribution statistics) to support investigation. | 2 |
| **11.5.3** | **Verify that** query-pattern analysis (e.g., query diversity, input distribution anomalies, output-space coverage anomalies) feeds an automated extraction-attempt detector. | 2 |
| **11.5.4** | **Verify that** raw model outputs (e.g., full posterior distributions, output vectors) are not directly exposed beyond the application backend, and that externally visible responses minimize output informativeness calibrated to the extraction risk level. | 2 |
| **11.5.5** | **Verify that** model watermarking or fingerprinting techniques are applied so that unauthorized copies can be identified. | 3 |
| **11.5.6** | **Verify that** detection of suspected extraction activity triggers adaptive response measures proportional to estimated extraction risk. | 3 |
| **11.5.2** | **Verify that** query-pattern analysis (e.g., query diversity, input distribution anomalies, output-space coverage anomalies) feeds an automated extraction-attempt detector. | 2 |
| **11.5.3** | **Verify that** raw model outputs (e.g., full posterior distributions, output vectors) are not directly exposed beyond the application backend, and that externally visible responses minimize output informativeness calibrated to the extraction risk level. | 2 |
| **11.5.4** | **Verify that** model watermarking or fingerprinting techniques are applied so that unauthorized copies can be identified. | 3 |
| **11.5.5** | **Verify that** detection of suspected extraction activity triggers adaptive response measures proportional to estimated extraction risk. | 3 |
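
Requirement 11.5.1 asks for both per-principal and global limits. The sketch below assumes a sliding time window and caller-supplied timestamps; the class name and the specific limits are illustrative, and sizing them to an extraction threat model is outside what the code shows.

```python
from collections import defaultdict, deque


class ExtractionRateLimiter:
    """Sliding-window limiter with a per-principal cap and a global cap."""

    def __init__(self, per_principal: int, global_limit: int, window_seconds: float):
        self._per_principal = per_principal
        self._global_limit = global_limit
        self._window = window_seconds
        self._by_principal = defaultdict(deque)
        self._all = deque()

    @staticmethod
    def _expire(timestamps: deque, cutoff: float) -> None:
        # Drop timestamps that have aged out of the window.
        while timestamps and timestamps[0] <= cutoff:
            timestamps.popleft()

    def allow(self, principal: str, now: float) -> bool:
        """Admit the query only if both caps have room; rejected queries are not counted."""
        cutoff = now - self._window
        own = self._by_principal[principal]
        self._expire(own, cutoff)
        self._expire(self._all, cutoff)
        if len(own) >= self._per_principal or len(self._all) >= self._global_limit:
            return False
        own.append(now)
        self._all.append(now)
        return True
```

The global cap matters because an extraction attempt can be spread across many principals that each stay under their individual limit.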

---

@@ -121,10 +120,9 @@ Security controls for systems where the AI can modify its own configuration, pro
| :--------: | ------------------------------------------------------------------------------------------------------------------- | :---: |
| **11.9.1** | **Verify that** any self-modification capability (e.g., prompt rewriting, tool-list changes, parameter updates) is restricted to explicitly designated areas with enforced boundaries. | 2 |
| **11.9.2** | **Verify that** proposed self-modifications undergo security impact assessment or policy validation before taking effect. | 2 |
| **11.9.3** | **Verify that** self-modifications are explicitly classified as security-relevant events and logged with sufficient detail to reconstruct what changed, when, by which agent or principal, and under what authorization. This logging applies even if self-modification is not otherwise documented as a logged event. | 2 |
| **11.9.4** | **Verify that** self-modifications are reversible and subject to integrity verification, so that rollback to a known-good state is possible and can be confirmed. | 2 |
| **11.9.5** | **Verify that** self-modification scope is bounded (e.g., maximum change magnitude, rate limits on updates, prohibited modification targets) to prevent runaway or adversarially induced changes. | 3 |
| **11.9.6** | **Verify that** when safety violation data (blocked inputs, filtered outputs, flagged hallucinations) is used as training signal for model improvement, the feedback pipeline includes integrity verification, poisoning detection, and human review gates to prevent adversarial manipulation of the improvement mechanism. | 3 |
| **11.9.3** | **Verify that** self-modifications are reversible and subject to integrity verification, so that rollback to a known-good state is possible and can be confirmed. | 2 |
| **11.9.4** | **Verify that** self-modification scope is bounded (e.g., maximum change magnitude, rate limits on updates, prohibited modification targets) to prevent runaway or adversarially induced changes. | 3 |
| **11.9.5** | **Verify that** when safety violation data (blocked inputs, filtered outputs, flagged hallucinations) is used as training signal for model improvement, the feedback pipeline includes integrity verification, poisoning detection, and human review gates to prevent adversarial manipulation of the improvement mechanism. | 3 |
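
The bounds listed for self-modification scope (maximum change magnitude, rate limits on updates, prohibited targets) can be combined in one policy check. This sketch assumes numeric parameters and caller-supplied timestamps; the target names, limits, and class are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical policy values; the target names are illustrative only.
PROHIBITED_TARGETS = frozenset({"safety_policy", "audit_config"})


@dataclass
class SelfModificationGuard:
    """Reject updates that hit prohibited targets, change too much, or arrive too often."""

    max_delta: float = 0.1
    max_updates_per_window: int = 3
    window_seconds: float = 3600.0
    _accepted_at: list = field(default_factory=list)

    def check(self, target: str, old_value: float, new_value: float, now: float) -> bool:
        if target in PROHIBITED_TARGETS:
            return False  # prohibited modification target
        if abs(new_value - old_value) > self.max_delta:
            return False  # change magnitude too large
        recent = [t for t in self._accepted_at if now - t < self.window_seconds]
        if len(recent) >= self.max_updates_per_window:
            return False  # too many accepted updates in the window
        recent.append(now)
        self._accepted_at = recent
        return True
```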

## C11.10 Adversarial Bias Exploitation Defense
