What Is Annex IV Technical Documentation Under the EU AI Act?

A plain-English breakdown of all 9 categories of technical documentation required by Article 11 and Annex IV of the EU AI Act — what each section contains and what reviewers look for.

Article 11 of the EU AI Act requires providers of high-risk AI systems to draw up technical documentation before the system is placed on the EU market or put into service. The minimum content of that documentation is defined in Annex IV, which lists nine categories of information that must be included.

This post explains what each category requires in plain language — without the regulatory boilerplate — and highlights where teams most commonly run into trouble.

Why Annex IV Exists

Technical documentation is the primary mechanism by which market surveillance authorities verify that a high-risk AI system complies with the Act. Unlike physical products, AI systems cannot be inspected by picking them up and examining them. The technical documentation is the inspectable artifact.

It also serves a second function: the discipline of producing it forces providers to make explicit the decisions that are often left implicit during development, such as intended purpose, performance thresholds, acceptable risks, and what "good enough" looks like for each obligation in Articles 9 to 15.

The documentation must be ready before the system is placed on the EU market or put into service, and it must be kept up to date throughout the system's lifecycle. It is not a one-time deliverable.

The 9 Annex IV Categories

1. General Description of the AI System

This section describes the system as a product. It must include:

The intended purpose — specifically and precisely stated, not "an AI system that helps HR teams"
The name, version number, and system identifier
The hardware and software environment the system runs on
The form in which the system is placed on the market (SaaS, API, embedded module, etc.)
How the system interacts with other hardware or software components

What reviewers look for: Version specificity. The documentation must describe the specific version being assessed, not the product in general. Vague intended purpose descriptions are a common gap — "helps HR teams evaluate candidates" is insufficient; "an AI system that ranks job applications for software engineering roles at mid-market technology companies based on CV content and structured interview scores" is the level of precision required.

2. Description of Elements and Development Process

This is the most technically demanding section. It requires:

Methods and steps performed during development
The system's general logic, algorithm description, and key design choices
Rationale for design choices, including trade-offs considered
System architecture description: how components interact
Datasheets: description of training, validation, and test datasets — sources, size, collection methodology, known limitations
Assessment of circumstances that could cause the system to fail or underperform

What reviewers look for: Training data documentation is the part of this section that is most often thin or absent. Describing the model architecture without describing the data that trained it leaves a significant gap. A statement such as "we used 100,000 labelled examples from internal data" does not meet the datasheet requirement; reviewers also expect demographic coverage, collection methodology, and known limitations.
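One way to keep datasheets from going thin is to hold them as structured records and check them for gaps. The following is a minimal Python sketch, not a prescribed format: the `DatasetSheet` class, its field names, and the `missing_fields` helper are illustrative assumptions based on the items listed above, and Annex IV does not mandate any particular schema.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DatasetSheet:
    """Hypothetical datasheet record; field names are assumptions, not Annex IV terms."""
    name: str
    split: str  # "training", "validation", or "test"
    sources: List[str] = field(default_factory=list)
    size: int = 0
    collection_methodology: str = ""
    demographic_coverage: str = ""
    known_limitations: List[str] = field(default_factory=list)


def missing_fields(sheet: DatasetSheet) -> List[str]:
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(sheet) if not getattr(sheet, f.name)]


# The one-line description quoted above, expressed as a record.
thin = DatasetSheet(name="internal-labelled-data", split="training", size=100_000)
print(missing_fields(thin))
```

Run against the one-line description, the check reports that sources, collection methodology, demographic coverage, and known limitations are all still undocumented.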

3. Monitoring, Functioning, and Control

This section covers the operational characteristics of the system:

Performance capabilities and limitations, including accuracy and error rates on defined test sets
Known or foreseeable circumstances that could lead to failures or increased risks
Description of human oversight measures — the technical design of how oversight is enabled
Input data specifications and validation mechanisms

What reviewers look for: This section is where Article 14 (human oversight) requirements must be documented concretely. "Human oversight is provided by the deployment team" is not sufficient. The documentation must describe how the system is designed to support oversight — what information is surfaced to human operators, what intervention mechanisms exist, and how they were tested.
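To make "intervention mechanism" concrete, here is a small Python sketch of one possible oversight design: low-confidence outputs are held for a human decision, and the record keeps what the operator was shown and whether they overrode the model. Everything in it is a hypothetical illustration, including `REVIEW_THRESHOLD`, the `Decision` fields, and the function names. The Act does not prescribe this pattern, and a confidence threshold is only one of several ways to enable oversight.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

REVIEW_THRESHOLD = 0.80  # hypothetical value; a real threshold needs its own justification


@dataclass
class Decision:
    case_id: str
    model_output: str
    confidence: float
    top_factors: Tuple[str, ...]  # information surfaced to the human operator
    needs_review: bool
    final_output: Optional[str] = None
    reviewer: Optional[str] = None
    overridden: bool = False


def propose(case_id, model_output, confidence, top_factors):
    """Wrap a model output; low-confidence cases are held until a human decides."""
    needs_review = confidence < REVIEW_THRESHOLD
    return Decision(
        case_id=case_id,
        model_output=model_output,
        confidence=confidence,
        top_factors=tuple(top_factors),
        needs_review=needs_review,
        final_output=None if needs_review else model_output,
    )


def record_review(decision, reviewer, final_output):
    """Intervention mechanism: the reviewer's choice replaces the model output."""
    decision.final_output = final_output
    decision.reviewer = reviewer
    decision.overridden = final_output != decision.model_output
    return decision


held = propose("case-001", "reject", 0.62, ["employment gap", "missing certificate"])
record_review(held, reviewer="reviewer-17", final_output="accept")
```

A record like this also gives the documentation something testable to point to: the share of cases routed to review and the override rate can both be reported.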

4. Appropriateness of Performance Metrics

This section justifies the metrics used to evaluate the system:

Description of the metrics chosen and why they are appropriate for the intended purpose
Benchmark results on defined test sets
Performance disaggregated by relevant demographic subgroups (where applicable)

What reviewers look for: Accuracy is almost never sufficient as the sole metric for a high-risk system. For systems that affect individuals, reviewers will look for demographic breakdowns. A credit scoring system evaluated only on AUC without demographic parity analysis does not satisfy this section.
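A disaggregated evaluation can be sketched in a few lines of Python. This is an illustration under assumptions, not a required method: the function names are hypothetical, the data is a toy example rather than real results, and the selection-rate gap shown here is only one of several fairness measures a team might justify for its intended purpose.

```python
from collections import defaultdict


def disaggregate(y_true, y_pred, groups):
    """Per-group sample count, accuracy, and selection rate (share of positive predictions)."""
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "selected": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        c["n"] += 1
        c["correct"] += int(truth == pred)
        c["selected"] += int(pred == 1)
    return {
        g: {
            "n": c["n"],
            "accuracy": c["correct"] / c["n"],
            "selection_rate": c["selected"] / c["n"],
        }
        for g, c in counts.items()
    }


def demographic_parity_difference(report):
    """Largest gap in selection rate between any two groups."""
    rates = [r["selection_rate"] for r in report.values()]
    return max(rates) - min(rates)


# Toy data for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

report = disaggregate(y_true, y_pred, groups)
gap = demographic_parity_difference(report)
print(report, gap)
```

On the toy data, group "a" has a selection rate of 0.5 and group "b" has 0.25, a gap of 0.25 that an aggregate accuracy figure would hide. Reporting the per-group sample size alongside each rate lets a reviewer judge how much weight the breakdown can bear.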

5. Risk Management System

This section incorporates the Article 9 risk management documentation:

Summary or reference to the risk management system maintained under Article 9
Identified risks and their assessment
Mitigation measures adopted
Residual risk evaluation and acceptability judgment

What reviewers look for: This section must contain substance — not just a reference to a separate risk management document. At minimum, include a summary of the risk identification methodology, the most significant identified risks, and the rationale for accepting residual risks. The explicit acceptability judgment is frequently missing.
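The missing acceptability judgment is easy to catch if the risk register is kept as data. Below is a minimal Python sketch under stated assumptions: the `Risk` fields, the 1 to 5 scales, and the `gaps` helper are hypothetical choices for illustration, not a format taken from Article 9.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Risk:
    risk_id: str
    description: str
    severity: int    # 1 (minor) to 5 (severe); hypothetical scale
    likelihood: int  # 1 (rare) to 5 (frequent); hypothetical scale
    mitigations: List[str] = field(default_factory=list)
    residual_severity: Optional[int] = None
    residual_likelihood: Optional[int] = None
    residual_accepted: Optional[bool] = None  # the explicit acceptability judgment
    acceptance_rationale: str = ""


def gaps(register: List[Risk]) -> Dict[str, List[str]]:
    """Map each risk_id to the documentation elements it is still missing."""
    found = {}
    for r in register:
        missing = []
        if not r.mitigations:
            missing.append("mitigation measures")
        if r.residual_severity is None or r.residual_likelihood is None:
            missing.append("residual risk evaluation")
        if r.residual_accepted is None or not r.acceptance_rationale.strip():
            missing.append("acceptability judgment")
        if missing:
            found[r.risk_id] = missing
    return found


register = [
    Risk("R-01", "Lower ranking accuracy for career changers", 4, 3,
         mitigations=["Reweighted training data"],
         residual_severity=3, residual_likelihood=2,
         residual_accepted=True,
         acceptance_rationale="Human review of all rejections in this segment"),
    Risk("R-02", "Input drift after CV template change", 3, 3,
         mitigations=["Input schema validation"],
         residual_severity=2, residual_likelihood=2),
]
print(gaps(register))
```

Here R-02 has mitigations and a residual evaluation but no recorded decision to accept the residual risk, which is exactly the gap described above.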

6. Changes Made Through the Lifecycle

A version-controlled log of material changes to the system after initial documentation:

What changed and when
Assessment of whether any change constitutes a "substantial modification" requiring a new conformity assessment
Updated documentation references for changed sections

What reviewers look for: This section is the first thing reviewers check when investigating an incident. An empty change log — or one that only records major releases and ignores model retraining runs — is a significant documentation gap. Maintain this log from first deployment, not from the compliance deadline.
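A change log that covers retraining runs as well as releases can be as simple as an append-only list of structured entries. The Python sketch below is illustrative only: `ChangeEntry`, its fields, the version numbers, and the dates are all hypothetical, and whether a change counts as a substantial modification remains a judgment the provider has to make and justify.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class ChangeEntry:
    changed_on: date
    system_version: str
    kind: str                          # e.g. "release", "retraining", "config"
    description: str
    substantial_modification: bool     # outcome of the provider's own assessment
    assessment_rationale: str          # why the change is or is not substantial
    updated_sections: Tuple[int, ...]  # Annex IV categories whose text was revised


def incomplete_entries(log: List[ChangeEntry]) -> List[Tuple[str, str]]:
    """List (version, problem) pairs for entries a reviewer would question."""
    problems = []
    for entry in log:
        if not entry.assessment_rationale.strip():
            problems.append((entry.system_version, "missing substantial-modification rationale"))
        if not entry.updated_sections:
            problems.append((entry.system_version, "missing updated documentation references"))
    return problems


# Example entries with made-up versions and dates.
log = [
    ChangeEntry(date(2026, 1, 12), "2.3.0", "release", "New ranking feature",
                False, "Intended purpose and risk profile unchanged", (1, 2)),
    ChangeEntry(date(2026, 2, 3), "2.3.1", "retraining", "Retrained on Q4 data",
                False, "", ()),
]
print(incomplete_entries(log))
```

The retraining entry is logged but flagged twice, once for the missing rationale and once for the missing documentation references. Recording `kind` explicitly makes it obvious when a log contains only major releases.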

7. Standards, Specifications, and Conformity

A reference to the technical standards and specifications relied upon:

List of harmonised standards applied (in full or in part)
Where no harmonised standards exist: description of the technical solutions adopted to meet the requirements of Chapter III, Section 2

What reviewers look for: As of 2026, EU AI Act harmonised standards are still being developed by CEN/CENELEC. In their absence, this section requires a technical narrative explaining how your implementation satisfies each relevant obligation — not just a statement that you are compliant. Reference technical approaches, testing methodologies, and industry standards used as proxies.

8. EU Declaration of Conformity

A copy of the EU declaration of conformity referred to in Article 47, signed by the provider:

Identifies the provider and the system
States that the system complies with the requirements of Chapter III, Section 2
References the conformity assessment procedure applied

What reviewers look for: This is a legal document, not a template to be completed at the last minute. It requires an authorised signatory, a specific system identifier, and an accurate reference to the conformity procedure. The declaration must be updated when the system is substantially modified.

9. Post-Market Monitoring System

A description of the monitoring infrastructure in place after deployment, including the post-market monitoring plan required by Article 72:

How production performance data is collected and analysed
How incidents and near-misses are detected and reported
How monitoring findings feed back into the risk management system (Article 9)
Incident reporting procedures and timeframes

What reviewers look for: The monitoring system must exist at the time documentation is finalised — it cannot be described as a future plan. This means production logging infrastructure, demographic parity monitoring, and incident escalation procedures must be implemented before market placement.
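As a rough picture of what "demographic parity monitoring" can mean in production, the Python sketch below keeps a rolling window of recent decisions and flags when the selection-rate gap between groups exceeds a limit. The `ParityMonitor` class and its default values for `window`, `max_gap`, and `min_per_group` are hypothetical; real thresholds should come from the risk management work, not from this example.

```python
from collections import defaultdict, deque


class ParityMonitor:
    """Rolling check of selection-rate gaps between groups (illustrative only)."""

    def __init__(self, window=1000, max_gap=0.10, min_per_group=30):
        self.events = deque(maxlen=window)  # oldest decisions fall out automatically
        self.max_gap = max_gap
        self.min_per_group = min_per_group

    def record(self, group, selected):
        self.events.append((group, bool(selected)))

    def selection_rates(self):
        totals, positives = defaultdict(int), defaultdict(int)
        for group, selected in self.events:
            totals[group] += 1
            positives[group] += int(selected)
        # Skip groups with too few recent decisions to give a stable rate.
        return {g: positives[g] / totals[g]
                for g in totals if totals[g] >= self.min_per_group}

    def breach(self):
        """Return the gap if it exceeds max_gap, otherwise None."""
        rates = self.selection_rates()
        if len(rates) < 2:
            return None
        gap = max(rates.values()) - min(rates.values())
        return gap if gap > self.max_gap else None


monitor = ParityMonitor(window=100, max_gap=0.10, min_per_group=5)
for i in range(10):
    monitor.record("a", selected=i < 5)  # 5 of 10 selected
    monitor.record("b", selected=i < 2)  # 2 of 10 selected
print(monitor.selection_rates(), monitor.breach())
```

A non-`None` result from `breach()` is the kind of signal that would feed the incident escalation procedure and, from there, the Article 9 risk management system.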

The Most Common Documentation Failures

Three patterns appear repeatedly in EU AI Act documentation reviews:

The point-in-time snapshot: Documentation accurately describes the system as it was when documentation was written, but is not updated when the system changes. By the time a market surveillance authority reviews it, the documentation describes a system that no longer exists.

The vague intended purpose: Insufficient precision in Category 1 makes it impossible to verify which Annex III category applies, and makes risk identification in Category 5 inconsistent with the system's actual deployment context.

The missing data documentation: Category 2 training data requirements are treated as optional or given a single paragraph. The datasheet requirement for training, validation, and test data is substantive and often goes unmet.

Nytivo generates all 9 Annex IV categories from your system data, keeps them linked to your risk management documentation, and exports the full pack as a structured PDF.