-
Notifications
You must be signed in to change notification settings - Fork 4.3k
[FC-0118] docs: add ADR-0029 standardized error responses decision #38246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,298 @@ | ||
| Standardize Error Responses | ||
| ============================ | ||
|
|
||
| :Status: Accepted | ||
| :Date: 2026-03-31 | ||
| :Deciders: API Working Group | ||
| :Technical Story: Open edX REST API Standards – Error response interoperability | ||
|
|
||
| Context | ||
| ------- | ||
|
|
||
| Open edX APIs currently return errors in multiple incompatible shapes (e.g., ``{"error": ...}``, | ||
| ``{"detail": ...}``, nested field errors, and even HTTP 200 responses containing ``"success": false``). This | ||
| inconsistency makes it difficult for external clients and AI systems to reliably detect and map error | ||
| states across services. | ||
|
|
||
| Objectives | ||
| ---------- | ||
|
|
||
| We want error responses that: | ||
|
|
||
| * Use **correct HTTP status codes** (4xx/5xx) for failures, and avoid masking errors behind HTTP 200. | ||
| * Provide a **single, predictable JSON shape** so clients can implement one parsing path across services. | ||
| * Include **machine-readable identifiers** (e.g. a URI for the error class) so tools and integrations can | ||
| classify failures without scraping free-form text. | ||
| * Carry a **short human-readable summary** plus a **specific explanation** for this request when helpful. | ||
| * Tie errors to the **request** when useful (e.g. request path or URL) for support and logging. | ||
| * Represent **validation failures** in a consistent way (e.g. field/path to messages) instead of ad-hoc nesting. | ||
| * Are **documented and enforced** in DRF (central exception handling + schema generation). | ||
|
|
||
| Decision | ||
| -------- | ||
|
|
||
| We will standardize all Open edX REST APIs to return errors using a **structured JSON error object** for | ||
| non-2xx responses that meets the objectives above. | ||
|
|
||
| Implementation requirements: | ||
|
|
||
| * Use appropriate HTTP status codes (4xx/5xx). Avoid returning HTTP 200 for error conditions. | ||
| * Return a consistent payload with these core fields: | ||
|
|
||
| * ``type`` (URI identifying the problem type) | ||
| * ``title`` (short, developer/operator-facing summary of the error class; not intended for display to end users) | ||
| * ``status`` (HTTP status code) | ||
| * ``detail`` (stable, developer-facing explanation specific to this occurrence; safe for log | ||
| aggregators and APM tools — see *Note on RFC 9457 deviation* below) | ||
| * ``instance`` (the URI of the request that produced this error, e.g. the request path; see | ||
| *Note on ``instance``* below) | ||
| * ``user_message`` *(optional)* — a human-readable, translatable string intended for | ||
| display in MFEs or end-user UIs. MFE clients should prefer mapping the ``type`` URI to a | ||
| locally-translated string; use ``user_message`` when the server must supply context that cannot | ||
| be expressed by ``type`` alone. | ||
|
|
||
| * For validation errors, include a predictable extension member ``errors``: a dict mapping each | ||
| invalid field/path to a list of error message strings. This maps directly onto DRF's native | ||
| ``ValidationError.detail`` dict, so the central exception handler can populate it without | ||
| per-view changes. Example:: | ||
|
|
||
| "errors": { | ||
| "course_id": ["This field is required."], | ||
| "display_name": ["Ensure this field has no more than 255 characters."] | ||
| } | ||
|
|
||
| * Define a small catalog of common ``type`` URIs for shared errors. Initial entries: | ||
|
feanil marked this conversation as resolved.
|
||
|
|
||
| .. list-table:: | ||
| :header-rows: 1 | ||
| :widths: 50 10 40 | ||
|
|
||
| * - URI | ||
| - Status | ||
| - When to use | ||
| * - ``https://docs.openedx.org/errors/not-found`` | ||
| - 404 | ||
| - Resource does not exist | ||
| * - ``https://docs.openedx.org/errors/authz`` | ||
| - 403 | ||
| - Authenticated but not authorized | ||
| * - ``https://docs.openedx.org/errors/authn`` | ||
| - 401 | ||
| - Not authenticated | ||
| * - ``https://docs.openedx.org/errors/validation`` | ||
| - 400 | ||
| - Request body / query-param validation failure | ||
| * - ``https://docs.openedx.org/errors/rate-limited`` | ||
| - 429 | ||
| - Rate limit exceeded | ||
| * - ``https://docs.openedx.org/errors/internal`` | ||
| - 500 | ||
| - Unexpected server error | ||
|
|
||
| App-specific types may extend this catalog; they must still be absolute URIs. | ||
|
|
||
| While many catalog entries map 1-to-1 with an HTTP status code, ``type`` provides | ||
| sub-category granularity that HTTP status alone cannot express (e.g. ``authn`` vs | ||
| ``authz`` vs ``validation`` vs ``not-found`` are all 4xx but represent distinct failure | ||
| classes). App-specific ``type`` extensions add even finer-grained identifiers (e.g. | ||
| ``https://docs.openedx.org/errors/enrollment/already-enrolled``). The ``status`` field is | ||
| a convenience duplicate for clients that triage responses by status code without | ||
| inspecting the body further. | ||
|
|
||
| These URIs serve as **opaque, stable identifiers** first. They *should* eventually resolve to | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we want to have these resolve to valid URLs eventually, does it make more sense to put them on the docs.openedx.org site or some other subsite? It seems strange to keep this catalog at openedx.org which is a marketing site for the product.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. updated to docs.openedx.org |
||
| human-readable documentation pages on ``docs.openedx.org`` describing the error class, its | ||
| causes, and remediation steps — but dereference-ability is not a requirement for the initial | ||
| rollout. Clients must treat ``type`` as an opaque string and never rely on HTTP-fetching it at | ||
| runtime. | ||
| * Error responses must respect the content type signalled by the request. The platform must not | ||
| produce HTML error pages when the request used JSON (i.e. when ``Content-Type: application/json`` | ||
| or ``Accept: application/json`` was sent). The platform-level DRF exception handler must catch | ||
| exceptions that would otherwise produce Django's default HTML error page and return a JSON body | ||
| in the standardized format instead. Endpoints not using DRF's ``APIView`` must be identified and | ||
| wrapped accordingly. | ||
| * For **5xx / unhandled exceptions** in **production** (``DEBUG=False``), the handler must return | ||
| a **generic error body** — no stack traces, no internal exception messages, and no sensitive | ||
| system details must be included in the response. Only the ``https://docs.openedx.org/errors/internal`` | ||
| ``type`` and a fixed ``"Internal Server Error"`` title are safe to return. Detailed diagnostics | ||
| belong in server-side logs and APM tooling, not in API responses. | ||
|
|
||
| In **development** (``DEBUG=True``), the handler MAY include additional diagnostic information | ||
| (e.g. the exception class and message) in an extension field (e.g. ``debug_detail``) to ease | ||
| local debugging. Stack traces should still be written to the server log regardless of mode. | ||
| * Preserve **CORS headers** on error responses. When the exception handler short-circuits the | ||
| normal response cycle, ``Access-Control-*`` headers set by ``django-cors-headers`` can be | ||
| dropped, causing browsers to surface a misleading CORS error rather than the actual error | ||
| body. The platform-level exception handler must ensure CORS headers are not stripped from | ||
| error responses. | ||
| * Ensure the schema is **documented in drf-spectacular** by registering the standardized error | ||
| shape as a reusable component (``#/components/schemas/ErrorResponse``), so all API endpoint | ||
| docs automatically reference it for 4xx/5xx response types. | ||
|
|
||
| Note on RFC 9457 deviation | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| `RFC 9457 <https://www.rfc-editor.org/rfc/rfc9457>`_ (Problem Details for HTTP APIs) defines | ||
| ``detail`` as a "human-readable explanation" intended for the client/end-user. This ADR | ||
| intentionally deviates from that definition: we use ``detail`` for a **stable, developer-facing, | ||
| English-language** string that is safe to forward to APM systems and log aggregators. User-facing | ||
| copy is carried in the separate ``user_message`` field instead. This separation keeps localizable, | ||
| UI-bound strings out of the machine-readable layer while still providing a meaningful explanation | ||
| for developers and on-call engineers. | ||
|
|
||
| Note on ``instance`` | ||
| ~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| The ``instance`` field in this ADR is the **path of the request that produced the error** (e.g. | ||
| ``request.path``, yielding ``/api/courses/v1/``). A path-only value is preferred over a full | ||
| absolute URL (``request.build_absolute_uri()``) because it is useful for correlation and support | ||
| without embedding the server hostname or protocol, which can vary across environments. RFC 9457 | ||
| permits ``instance`` to be either relative or absolute and does not require it to be | ||
| dereferenceable; using the request path is a valid application of the field. | ||
|
|
||
| Relevance in edx-platform | ||
| ------------------------- | ||
|
|
||
| Current error shapes in the codebase are inconsistent: | ||
|
|
||
| * **DeveloperErrorViewMixin** (``openedx/core/lib/api/view_utils.py``) returns | ||
| ``{"developer_message": "...", "error_code": "..."}`` and for validation | ||
| ``{"developer_message": "...", "field_errors": {field: {"developer_message": "..."}}}``. | ||
| * **Instructor API** (``lms/djangoapps/instructor/views/api.py``) uses | ||
| ``JsonResponse({"error": msg}, 400)``. | ||
| * **Registration** (``openedx/core/djangoapps/user_authn/views/register.py``) returns | ||
| HTTP 200 with ``success: true/false`` and ``error_code`` for some failures. | ||
| * **ORA Staff Grader** (``lms/djangoapps/ora_staff_grader/errors.py``) uses a custom | ||
| ``ErrorSerializer`` with an ``error`` field. | ||
| * **Enrollment API** (``openedx/core/djangoapps/enrollments/``) returns | ||
| ``{"message": "..."}`` or ``{"message": "...", "localizedMessage": "..."}`` for errors. | ||
|
|
||
|
taimoor-ahmed-1 marked this conversation as resolved.
|
||
| Code example (target shape) | ||
| --------------------------- | ||
|
|
||
| **Example structured error response (4xx):** | ||
|
|
||
| .. code-block:: json | ||
|
|
||
| { | ||
| "type": "https://docs.openedx.org/errors/validation", | ||
| "title": "Validation Error", | ||
| "status": 400, | ||
| "detail": "The request body failed validation.", | ||
| "user_message": "Some required fields are missing or invalid.", | ||
| "instance": "/api/courses/v1/", | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This example and the code below is inconsistent. Here we have the path without giving the full url. I like that. It's useful without identifying the specific server. But below, in the code example you're using
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. updated to request.path |
||
| "errors": { | ||
| "course_id": ["This field is required."], | ||
| "display_name": ["Ensure this field has no more than 255 characters."] | ||
| } | ||
| } | ||
|
|
||
| **Attaching a** ``user_message`` **to an exception:** | ||
|
|
||
| Because ``user_message`` is detected via ``hasattr``, it can be set on any ``APIException`` | ||
| instance before raising — no subclass required: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from django.utils.translation import gettext_lazy as _ | ||
| from rest_framework.exceptions import APIException | ||
|
|
||
| exc = APIException("Enrollment limit reached for course-v1:edX+DemoX+Demo_Course.") | ||
| exc.user_message = _("This course is currently full. Please try again later.") | ||
| raise exc | ||
|
|
||
| The central exception handler's ``hasattr(exc, 'user_message')`` check picks this up | ||
| automatically, requiring no per-view changes. | ||
|
|
||
| **Example DRF exception handler emitting the standard shape:** | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| # Central exception handler (e.g. in openedx/core/lib/api/exceptions.py) | ||
| def standardized_error_exception_handler(exc, context): | ||
| from rest_framework.views import exception_handler | ||
| response = exception_handler(exc, context) | ||
| if response is None: | ||
| # DRF returned None — unhandled exception (e.g. IntegrityError, unexpected 5xx). | ||
| # Always return a generic body; never include stack traces or exception details. | ||
| return Response( | ||
| { | ||
| "type": "https://docs.openedx.org/errors/internal", | ||
| "title": "Internal Server Error", | ||
| "status": 500, | ||
| "detail": "An unexpected error occurred. Please try again later.", | ||
| }, | ||
| status=500, | ||
| ) | ||
| request = context.get("request") | ||
| body = { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have updated the code example to show how |
||
| "type": f"https://docs.openedx.org/errors/{_error_type(exc)}", | ||
| "title": _error_title(exc), | ||
| "status": response.status_code, | ||
| "detail": _flatten_detail(response.data), | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's a problem here because the DRF exception_handler does not catch all 5xx errors. Sending all the response data here would include error messages and stack traces.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch on both of the points, I have addressed in the updated handler example |
||
| } | ||
| if request: | ||
| body["instance"] = request.path | ||
| if hasattr(exc, "user_message") and exc.user_message: | ||
| body["user_message"] = exc.user_message | ||
|
Comment on lines
+235
to
+236
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So the standard APIException object doesn't have a user_message attribute. Are we going to make a new base exception for this? Or do we want to show an example of how we add this extra detail to an existing APIException message? It would be good to show what good code looks like that properly uses this standard error response.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. added code on how |
||
| if isinstance(exc, ValidationError) and hasattr(exc, "detail"): | ||
| body["errors"] = _normalize_validation_errors(exc.detail) | ||
| response.data = body | ||
| response["Content-Type"] = "application/json" | ||
| return response | ||
|
|
||
| Consequences | ||
| ------------ | ||
|
|
||
| Positive | ||
| ~~~~~~~~ | ||
|
|
||
| * Clients can implement a single error-handling path across services. | ||
| * AI agents and external integrations can programmatically detect and classify error states. | ||
| * Removes "hidden failures" caused by HTTP 200 + ``success: false`` patterns. | ||
|
|
||
| Negative / Trade-offs | ||
| ~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| * Requires refactoring of existing endpoints and tests that currently depend on ad-hoc error shapes. | ||
| * Some clients may need a migration period if they parse legacy error formats. | ||
|
|
||
| Alternatives Considered | ||
| ----------------------- | ||
|
|
||
| * **Keep per-app formats**: rejected due to interoperability and client complexity. | ||
| * **Use DRF defaults only**: rejected because DRF defaults still vary across validation/auth exceptions | ||
| unless centrally handled and documented. | ||
| * **`drf-standardized-errors <https://github.com/ghazi-git/drf-standardized-errors>`_**: a well-maintained | ||
| third-party library that implements RFC 9457-style responses for DRF. Considered but not adopted | ||
| because: (a) it would add a new dependency to platform core, (b) we need custom behavior for CORS | ||
| header preservation and the non-``APIView`` 500 path that would require overriding most of the | ||
| library anyway, and (c) the contract defined here is lightweight enough to implement directly in | ||
| the platform exception handler without a library. | ||
|
|
||
| Rollout Plan | ||
| ------------ | ||
|
|
||
| Error response format changes are considered backwards-compatible: well-behaved clients should | ||
| handle unexpected JSON fields gracefully (robustness principle). The default migration path is | ||
| therefore **in-place** — update the exception handler and, where needed, individual views without | ||
| bumping the URL version. Teams with clients that are tightly coupled to a legacy error shape MAY | ||
| version their endpoint following ADR-0037 (API Versioning Strategy) and maintain both shapes | ||
| during a deprecation window. | ||
|
|
||
| 1. Introduce a shared DRF exception handler (platform-level) that emits the standardized error shape, | ||
| including catching unhandled exceptions that would otherwise produce Django's HTML 500 page. | ||
| 2. Verify CORS headers (``Access-Control-*``) are preserved on all error responses; update the | ||
| exception handler if ``django-cors-headers`` does not run before it. | ||
| 3. Update existing endpoint unit tests to assert the standardized error shape. Contract tests | ||
| across services are optional but encouraged for endpoints consumed by external clients. | ||
| 4. Audit and fix endpoints that still return HTML errors on 500 (e.g. non-``APIView`` entry points). | ||
| 5. Migrate apps module-by-module; keep a short deprecation window for legacy shapes where feasible. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you imagine making these changes in-place on existing API endpoints or adding a new version url that uses the new behavior?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In-place is the intended default. Added a clarifying paragraph at the top of the Rollout Plan. |
||
| 6. Update API documentation to specify the standard error schema. | ||
|
|
||
| References | ||
| ---------- | ||
|
|
||
| * Open edX REST API Standards: "Inconsistent Error Response Structure" and alignment with structured, | ||
| interoperable error payloads across services. | ||
| * `RFC 9457 – Problem Details for HTTP APIs <https://www.rfc-editor.org/rfc/rfc9457>`_ | ||
| * `drf-standardized-errors <https://github.com/ghazi-git/drf-standardized-errors>`_ | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about in OpenAPI specs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the standardized error shape should be reflected in OpenAPI specs. The ADR now includes an explicit implementation requirement for this