From 9dcc51e5363587d21ddf2a7d783e98f26ce4dda2 Mon Sep 17 00:00:00 2001
From: choldgraf
Date: Fri, 15 May 2026 07:03:47 -0700
Subject: [PATCH 1/3] Add light data subprocessors doc

---
 community-lead/user-policy/index.md          |  1 +
 community-lead/user-policy/sub-processors.md | 51 ++++++++++++++++++++
 2 files changed, 52 insertions(+)
 create mode 100644 community-lead/user-policy/sub-processors.md

diff --git a/community-lead/user-policy/index.md b/community-lead/user-policy/index.md
index fdfb044..0cdb05e 100644
--- a/community-lead/user-policy/index.md
+++ b/community-lead/user-policy/index.md
@@ -7,4 +7,5 @@ These describe the expectations and rules around the service.
 code-of-conduct
 acceptable-use
 privacy
+sub-processors
 ```
diff --git a/community-lead/user-policy/sub-processors.md b/community-lead/user-policy/sub-processors.md
new file mode 100644
index 0000000..0677c18
--- /dev/null
+++ b/community-lead/user-policy/sub-processors.md
@@ -0,0 +1,51 @@
+# Sub-processors
+
+A **sub-processor** is a third-party service that processes data on behalf of 2i2c while we operate infrastructure for a community.
+This page describes the sub-processors 2i2c uses in a typical hub deployment.
+The specifics will differ based on each community's cluster configuration, but this gives a broad idea.
+See [](#data-processor:find) below to determine the list for a particular hub.
+
+:::{admonition} This is not a legal document
+:class: warning
+This document is provided to set expectations and understanding about 2i2c's cloud infrastructure service.
+It is not legally binding.
+:::
+
+## Typical sub-processors
+
+The services below are common touchpoints for user data (e.g., their identity, files they create, etc).
+
+### Cloud or infrastructure provider
+
+An infrastructure provider (like a cloud provider) hosts the core infrastructure that runs the hub, and provides compute, storage, networking, etc.
+2i2c either manages its own account, or uses one managed by the community.
+All user-generated content (notebooks, files, outputs) and the hub's authentication database live within this account.
+
+Here are a few common cloud providers we use and their sub-processors pages:
+
+- **Amazon Web Services (AWS)**. [GDPR Center](https://aws.amazon.com/compliance/gdpr-center/), [sub-processors](https://aws.amazon.com/compliance/sub-processors/)
+- **Google Cloud Platform (GCP)**. [GDPR resource center](https://cloud.google.com/privacy/gdpr), [sub-processors](https://cloud.google.com/terms/subprocessors)
+- **Microsoft Azure**. [Data Protection Addendum](https://www.microsoft.com/licensing/docs/view/Microsoft-Products-and-Services-Data-Protection-Addendum-DPA), [service trust page](https://servicetrust.microsoft.com)
+
+### Identity provider
+
+The service used to authenticate users when they log in. It receives the user identifiers needed to grant access to the hub (e.g., GitHub username or email).
+Here are a few common ones we use:
+
+- **GitHub**. [Privacy Statement](https://docs.github.com/en/site-policy/privacy-policies/github-general-privacy-statement)
+- **CILogon**. operated by the [University of Illinois NCSA](https://www.cilogon.org/about)
+- **Google**. [Privacy & Terms](https://policies.google.com/privacy)
+
+**Note**: If a hub uses CILogon, the user's home institution is itself the identity provider and effectively the relevant sub-processor for that user's authentication data.
+
+(data-processor:find)=
+## How to find the data processors for a specific hub
+
+The sub-processors used by a given hub are determined by its configuration in the [2i2c infrastructure repository](https://github.com/2i2c-org/infrastructure/).
+This will usually involve digging through `.yaml` configuration to find the services that your hub uses.
+If you need this information, please [open a support ticket](../../support.md).
+
+## Related
+
+- [](./privacy.md). what data 2i2c does and does not retain
+- [](./acceptable-use.md). expectations for using 2i2c-managed infrastructure

From 9f49b9e2e7235f2a9efa27e0249297ca6d607e53 Mon Sep 17 00:00:00 2001
From: choldgraf
Date: Fri, 15 May 2026 08:39:30 -0700
Subject: [PATCH 2/3] Add some clarification

---
 community-lead/user-policy/sub-processors.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/community-lead/user-policy/sub-processors.md b/community-lead/user-policy/sub-processors.md
index 0677c18..ab09412 100644
--- a/community-lead/user-policy/sub-processors.md
+++ b/community-lead/user-policy/sub-processors.md
@@ -17,9 +17,10 @@ The services below are common touchpoints for user data (e.g., their identity, f
 
 ### Cloud or infrastructure provider
 
-An infrastructure provider (like a cloud provider) hosts the core infrastructure that runs the hub, and provides compute, storage, networking, etc.
-2i2c either manages its own account, or uses one managed by the community.
-All user-generated content (notebooks, files, outputs) and the hub's authentication database live within this account.
+An infrastructure provider (like a cloud provider) hosts the core infrastructure that runs the hub, and provides **compute**, **storage**, **logging**, **username database**, etc.
+This includes the files that a user stores when they do their work (e.g. notebooks, data, etc).
+
+2i2c either manages its own cloud provider account, or uses one managed by the community.
 
 Here are a few common cloud providers we use and their sub-processors pages:
 
@@ -30,6 +31,8 @@ Here are a few common cloud providers we use and their sub-processors pages:
 ### Identity provider
 
 The service used to authenticate users when they log in. It receives the user identifiers needed to grant access to the hub (e.g., GitHub username or email).
+Generally speaking, the hub _uses_ these providers to authenticate but does not store authentication on the providers, the user or their home institution have a direct relationship with the identity provider.
+
 Here are a few common ones we use:
 
 - **GitHub**. [Privacy Statement](https://docs.github.com/en/site-policy/privacy-policies/github-general-privacy-statement)

From 510b185d57b4debeb7fb69f0fda302809503f6f8 Mon Sep 17 00:00:00 2001
From: choldgraf
Date: Fri, 15 May 2026 13:15:23 -0700
Subject: [PATCH 3/3] Make the usage situations clearer for data subprocessors

---
 community-lead/user-policy/sub-processors.md | 25 ++++++++++++++------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/community-lead/user-policy/sub-processors.md b/community-lead/user-policy/sub-processors.md
index ab09412..2129519 100644
--- a/community-lead/user-policy/sub-processors.md
+++ b/community-lead/user-policy/sub-processors.md
@@ -1,8 +1,8 @@
 # Sub-processors
 
 A **sub-processor** is a third-party service that processes data on behalf of 2i2c while we operate infrastructure for a community.
-This page describes the sub-processors 2i2c uses in a typical hub deployment.
-The specifics will differ based on each community's cluster configuration, but this gives a broad idea.
+This page describes the sub-processors used in a typical hub deployment and the kinds of personal data each one handles.
+The specifics differ per hub (e.g., the cloud region where data is stored, or any add-on services), so the list below is not exhaustive.
 See [](#data-processor:find) below to determine the list for a particular hub.
 
 :::{admonition} This is not a legal document
@@ -17,11 +17,16 @@ The services below are common touchpoints for user data (e.g., their identity, f
 
 ### Cloud or infrastructure provider
 
-An infrastructure provider (like a cloud provider) hosts the core infrastructure that runs the hub, and provides **compute**, **storage**, **logging**, **username database**, etc.
-This includes the files that a user stores when they do their work (e.g. notebooks, data, etc).
-
+An infrastructure provider (like a cloud provider) hosts the core infrastructure that runs the hub. This includes **compute**, **storage**, and **logging**.
+The cloud region is configured per hub and determines where this data is stored.
 2i2c either manages its own cloud provider account, or uses one managed by the community.
 
+Personal data processed by this sub-processor:
+
+- User home directory contents (e.g., notebooks, data files created during use)
+- The JupyterHub username / authentication database
+- System and access logs
+
 Here are a few common cloud providers we use and their sub-processors pages:
 
 - **Amazon Web Services (AWS)**. [GDPR Center](https://aws.amazon.com/compliance/gdpr-center/), [sub-processors](https://aws.amazon.com/compliance/sub-processors/)
@@ -30,8 +35,14 @@ Here are a few common cloud providers we use and their sub-processors pages:
 
 ### Identity provider
 
-The service used to authenticate users when they log in. It receives the user identifiers needed to grant access to the hub (e.g., GitHub username or email).
-Generally speaking, the hub _uses_ these providers to authenticate but does not store authentication on the providers, the user or their home institution have a direct relationship with the identity provider.
+The service used to authenticate users when they log in.
+The hub _uses_ these providers to authenticate but does not store authentication credentials on them. The user or their home institution has a direct relationship with the identity provider.
+
+Personal data processed by this sub-processor (exchanged at login only):
+
+- Username
+- Email address (depending on the provider)
+- Membership in a designated organization (where applicable)
 
 Here are a few common ones we use: