Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
298 changes: 298 additions & 0 deletions docs/decisions/0029-standardize-error-responses.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,298 @@
Standardize Error Responses
============================

:Status: Accepted
:Date: 2026-03-31
:Deciders: API Working Group
:Technical Story: Open edX REST API Standards – Error response interoperability

Context
-------

Open edX APIs currently return errors in multiple incompatible shapes (e.g., ``{"error": ...}``,
``{"detail": ...}``, nested field errors, and even HTTP 200 responses containing ``"success": false``). This
inconsistency makes it difficult for external clients and AI systems to reliably detect and map error
states across services.

Objectives
----------

We want error responses that:

* Use **correct HTTP status codes** (4xx/5xx) for failures, and avoid masking errors behind HTTP 200.
* Provide a **single, predictable JSON shape** so clients can implement one parsing path across services.
* Include **machine-readable identifiers** (e.g. a URI for the error class) so tools and integrations can
classify failures without scraping free-form text.
* Carry a **short human-readable summary** plus a **specific explanation** for this request when helpful.
* Tie errors to the **request** when useful (e.g. request path or URL) for support and logging.
* Represent **validation failures** in a consistent way (e.g. field/path to messages) instead of ad-hoc nesting.
* Are **documented and enforced** in DRF (central exception handling + schema generation).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about in OpenAPI specs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the standardized error shape should be reflected in OpenAPI specs. The ADR now includes an explicit implementation requirement for this


Decision
--------

We will standardize all Open edX REST APIs to return errors using a **structured JSON error object** for
non-2xx responses that meets the objectives above.

Implementation requirements:

* Use appropriate HTTP status codes (4xx/5xx). Avoid returning HTTP 200 for error conditions.
* Return a consistent payload with these core fields:

* ``type`` (URI identifying the problem type)
* ``title`` (short, developer/operator-facing summary of the error class; not intended for display to end users)
* ``status`` (HTTP status code)
* ``detail`` (stable, developer-facing explanation specific to this occurrence; safe for log
aggregators and APM tools — see *Note on RFC 9457 deviation* below)
* ``instance`` (the URI of the request that produced this error, e.g. the request path; see
*Note on ``instance``* below)
* ``user_message`` *(optional)* — a human-readable, translatable string intended for
display in MFEs or end-user UIs. MFE clients should prefer mapping the ``type`` URI to a
locally-translated string; use ``user_message`` when the server must supply context that cannot
be expressed by ``type`` alone.

* For validation errors, include a predictable extension member ``errors``: a dict mapping each
invalid field/path to a list of error message strings. This maps directly onto DRF's native
``ValidationError.detail`` dict, so the central exception handler can populate it without
per-view changes. Example::

"errors": {
"course_id": ["This field is required."],
"display_name": ["Ensure this field has no more than 255 characters."]
}

* Define a small catalog of common ``type`` URIs for shared errors. Initial entries:
Comment thread
feanil marked this conversation as resolved.

.. list-table::
:header-rows: 1
:widths: 50 10 40

* - URI
- Status
- When to use
* - ``https://docs.openedx.org/errors/not-found``
- 404
- Resource does not exist
* - ``https://docs.openedx.org/errors/authz``
- 403
- Authenticated but not authorized
* - ``https://docs.openedx.org/errors/authn``
- 401
- Not authenticated
* - ``https://docs.openedx.org/errors/validation``
- 400
- Request body / query-param validation failure
* - ``https://docs.openedx.org/errors/rate-limited``
- 429
- Rate limit exceeded
* - ``https://docs.openedx.org/errors/internal``
- 500
- Unexpected server error

App-specific types may extend this catalog; they must still be absolute URIs.

While many catalog entries map 1-to-1 with an HTTP status code, ``type`` provides
sub-category granularity that HTTP status alone cannot express (e.g. ``authn`` vs
``authz`` vs ``validation`` vs ``not-found`` are all 4xx but represent distinct failure
classes). App-specific ``type`` extensions add even finer-grained identifiers (e.g.
``https://docs.openedx.org/errors/enrollment/already-enrolled``). The ``status`` field is
a convenience duplicate for clients that triage responses by status code without
inspecting the body further.

These URIs serve as **opaque, stable identifiers** first. They *should* eventually resolve to
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to have these resolve to valid URLs eventually, does it make more sense to put them on the docs.openedx.org site or some other subsite? It seems strange to keep this catalog at openedx.org which is a marketing site for the product.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to docs.openedx.org

human-readable documentation pages on ``docs.openedx.org`` describing the error class, its
causes, and remediation steps — but dereference-ability is not a requirement for the initial
rollout. Clients must treat ``type`` as an opaque string and never rely on HTTP-fetching it at
runtime.
* Error responses must respect the content type signalled by the request. The platform must not
produce HTML error pages when the request used JSON (i.e. when ``Content-Type: application/json``
or ``Accept: application/json`` was sent). The platform-level DRF exception handler must catch
exceptions that would otherwise produce Django's default HTML error page and return a JSON body
in the standardized format instead. Endpoints not using DRF's ``APIView`` must be identified and
wrapped accordingly.
* For **5xx / unhandled exceptions** in **production** (``DEBUG=False``), the handler must return
a **generic error body** — no stack traces, no internal exception messages, and no sensitive
system details must be included in the response. Only the ``https://docs.openedx.org/errors/internal``
``type`` and a fixed ``"Internal Server Error"`` title are safe to return. Detailed diagnostics
belong in server-side logs and APM tooling, not in API responses.

In **development** (``DEBUG=True``), the handler MAY include additional diagnostic information
(e.g. the exception class and message) in an extension field (e.g. ``debug_detail``) to ease
local debugging. Stack traces should still be written to the server log regardless of mode.
* Preserve **CORS headers** on error responses. When the exception handler short-circuits the
normal response cycle, ``Access-Control-*`` headers set by ``django-cors-headers`` can be
dropped, causing browsers to surface a misleading CORS error rather than the actual error
body. The platform-level exception handler must ensure CORS headers are not stripped from
error responses.
* Ensure the schema is **documented in drf-spectacular** by registering the standardized error
shape as a reusable component (``#/components/schemas/ErrorResponse``), so all API endpoint
docs automatically reference it for 4xx/5xx response types.

Note on RFC 9457 deviation
~~~~~~~~~~~~~~~~~~~~~~~~~~

`RFC 9457 <https://www.rfc-editor.org/rfc/rfc9457>`_ (Problem Details for HTTP APIs) defines
``detail`` as a "human-readable explanation" intended for the client/end-user. This ADR
intentionally deviates from that definition: we use ``detail`` for a **stable, developer-facing,
English-language** string that is safe to forward to APM systems and log aggregators. User-facing
copy is carried in the separate ``user_message`` field instead. This separation keeps localizable,
UI-bound strings out of the machine-readable layer while still providing a meaningful explanation
for developers and on-call engineers.

Note on ``instance``
~~~~~~~~~~~~~~~~~~~~

The ``instance`` field in this ADR is the **path of the request that produced the error** (e.g.
``request.path``, yielding ``/api/courses/v1/``). A path-only value is preferred over a full
absolute URL (``request.build_absolute_uri()``) because it is useful for correlation and support
without embedding the server hostname or protocol, which can vary across environments. RFC 9457
permits ``instance`` to be either relative or absolute and does not require it to be
dereferenceable; using the request path is a valid application of the field.

Relevance in edx-platform
-------------------------

Current error shapes in the codebase are inconsistent:

* **DeveloperErrorViewMixin** (``openedx/core/lib/api/view_utils.py``) returns
``{"developer_message": "...", "error_code": "..."}`` and for validation
``{"developer_message": "...", "field_errors": {field: {"developer_message": "..."}}}``.
* **Instructor API** (``lms/djangoapps/instructor/views/api.py``) uses
``JsonResponse({"error": msg}, 400)``.
* **Registration** (``openedx/core/djangoapps/user_authn/views/register.py``) returns
HTTP 200 with ``success: true/false`` and ``error_code`` for some failures.
* **ORA Staff Grader** (``lms/djangoapps/ora_staff_grader/errors.py``) uses a custom
``ErrorSerializer`` with an ``error`` field.
* **Enrollment API** (``openedx/core/djangoapps/enrollments/``) returns
``{"message": "..."}`` or ``{"message": "...", "localizedMessage": "..."}`` for errors.

Comment thread
taimoor-ahmed-1 marked this conversation as resolved.
Code example (target shape)
---------------------------

**Example structured error response (4xx):**

.. code-block:: json

{
"type": "https://docs.openedx.org/errors/validation",
"title": "Validation Error",
"status": 400,
"detail": "The request body failed validation.",
"user_message": "Some required fields are missing or invalid.",
"instance": "/api/courses/v1/",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example and the code below is inconsistent. Here we have the path without giving the full url. I like that. It's useful without identifying the specific server. But below, in the code example you're using build_absolute_uri which will include the full url including protocol. Can you update the code to match what we have documented.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to request.path

"errors": {
"course_id": ["This field is required."],
"display_name": ["Ensure this field has no more than 255 characters."]
}
}

**Attaching a** ``user_message`` **to an exception:**

Because ``user_message`` is detected via ``hasattr``, it can be set on any ``APIException``
instance before raising — no subclass required:

.. code-block:: python

from django.utils.translation import gettext_lazy as _
from rest_framework.exceptions import APIException

exc = APIException("Enrollment limit reached for course-v1:edX+DemoX+Demo_Course.")
exc.user_message = _("This course is currently full. Please try again later.")
raise exc

The central exception handler's ``hasattr(exc, 'user_message')`` check picks this up
automatically, requiring no per-view changes.

**Example DRF exception handler emitting the standard shape:**

.. code-block:: python

# Central exception handler (e.g. in openedx/core/lib/api/exceptions.py)
def standardized_error_exception_handler(exc, context):
from rest_framework.views import exception_handler
response = exception_handler(exc, context)
if response is None:
# DRF returned None — unhandled exception (e.g. IntegrityError, unexpected 5xx).
# Always return a generic body; never include stack traces or exception details.
return Response(
{
"type": "https://docs.openedx.org/errors/internal",
"title": "Internal Server Error",
"status": 500,
"detail": "An unexpected error occurred. Please try again later.",
},
status=500,
)
request = context.get("request")
body = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the user_message it seems like that's not being handled here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the code example to show how user_message gets populated

"type": f"https://docs.openedx.org/errors/{_error_type(exc)}",
"title": _error_title(exc),
"status": response.status_code,
"detail": _flatten_detail(response.data),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem here because the DRF exception_handler does not catch all 5xx errors. Sending all the response data here would include error messages and stack traces.
A really important job of this function would be to catch any uncaught or unknown exceptions that are not handled by DRF, and send a standard "Internal Server Error" message.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on both of the points, I have addressed in the updated handler example

}
if request:
body["instance"] = request.path
if hasattr(exc, "user_message") and exc.user_message:
body["user_message"] = exc.user_message
Comment on lines +235 to +236
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the standard APIException object doesn't have a user_message attribute. Are we going to make a new base exception for this? Or do we want to show an example of how we add this extra detail to an existing APIException message? It would be good to show what good code looks like that properly uses this standard error response.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added code on how user_message be added.

if isinstance(exc, ValidationError) and hasattr(exc, "detail"):
body["errors"] = _normalize_validation_errors(exc.detail)
response.data = body
response["Content-Type"] = "application/json"
return response

Consequences
------------

Positive
~~~~~~~~

* Clients can implement a single error-handling path across services.
* AI agents and external integrations can programmatically detect and classify error states.
* Removes "hidden failures" caused by HTTP 200 + ``success: false`` patterns.

Negative / Trade-offs
~~~~~~~~~~~~~~~~~~~~~

* Requires refactoring of existing endpoints and tests that currently depend on ad-hoc error shapes.
* Some clients may need a migration period if they parse legacy error formats.

Alternatives Considered
-----------------------

* **Keep per-app formats**: rejected due to interoperability and client complexity.
* **Use DRF defaults only**: rejected because DRF defaults still vary across validation/auth exceptions
unless centrally handled and documented.
* **`drf-standardized-errors <https://github.com/ghazi-git/drf-standardized-errors>`_**: a well-maintained
third-party library that implements RFC 9457-style responses for DRF. Considered but not adopted
because: (a) it would add a new dependency to platform core, (b) we need custom behavior for CORS
header preservation and the non-``APIView`` 500 path that would require overriding most of the
library anyway, and (c) the contract defined here is lightweight enough to implement directly in
the platform exception handler without a library.

Rollout Plan
------------

Error response format changes are considered backwards-compatible: well-behaved clients should
handle unexpected JSON fields gracefully (robustness principle). The default migration path is
therefore **in-place** — update the exception handler and, where needed, individual views without
bumping the URL version. Teams with clients that are tightly coupled to a legacy error shape MAY
version their endpoint following ADR-0037 (API Versioning Strategy) and maintain both shapes
during a deprecation window.

1. Introduce a shared DRF exception handler (platform-level) that emits the standardized error shape,
including catching unhandled exceptions that would otherwise produce Django's HTML 500 page.
2. Verify CORS headers (``Access-Control-*``) are preserved on all error responses; update the
exception handler if ``django-cors-headers`` does not run before it.
3. Update existing endpoint unit tests to assert the standardized error shape. Contract tests
across services are optional but encouraged for endpoints consumed by external clients.
4. Audit and fix endpoints that still return HTML errors on 500 (e.g. non-``APIView`` entry points).
5. Migrate apps module-by-module; keep a short deprecation window for legacy shapes where feasible.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you imagine making these changes in-place on existing API endpoints or adding a new version url that uses the new behavior?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In-place is the intended default. Added a clarifying paragraph at the top of the Rollout Plan.

6. Update API documentation to specify the standard error schema.

References
----------

* Open edX REST API Standards: "Inconsistent Error Response Structure" and alignment with structured,
interoperable error payloads across services.
* `RFC 9457 – Problem Details for HTTP APIs <https://www.rfc-editor.org/rfc/rfc9457>`_
* `drf-standardized-errors <https://github.com/ghazi-git/drf-standardized-errors>`_
Loading