diff --git a/pages/_meta.js b/pages/_meta.js index 20b8f4e..560b6b0 100644 --- a/pages/_meta.js +++ b/pages/_meta.js @@ -14,9 +14,24 @@ export default { introduction: "Introduction", auto_drive: 'Auto Drive', sdk: 'Auto SDK', - evm: 'Auto EVM', - application_examples: 'Example Applications', - auto_agents_framework: 'Autonomys Agents Framework', + evm: { + title: 'Auto EVM', + theme: { + collapsed: true, + }, + }, + application_examples: { + title: 'Example Applications', + theme: { + collapsed: true, + }, + }, + auto_agents_framework: { + title: 'Autonomys Agents Framework', + theme: { + collapsed: true, + }, + }, llm_friendly_docs: { title: 'LLM-Friendly Docs', theme: { diff --git a/pages/auto_drive/_meta.js b/pages/auto_drive/_meta.js index d77b716..3e96e47 100644 --- a/pages/auto_drive/_meta.js +++ b/pages/auto_drive/_meta.js @@ -1,4 +1,5 @@ export default { 'index': 'Overview', 'pay_with_ai3': 'Pay with AI3', + 'rclone': 'rclone integration', } diff --git a/pages/auto_drive/rclone.mdx b/pages/auto_drive/rclone.mdx new file mode 100644 index 0000000..861cb36 --- /dev/null +++ b/pages/auto_drive/rclone.mdx @@ -0,0 +1,669 @@ +# Using rclone with Auto Drive + +## What is rclone? + +[rclone](https://rclone.org) is an open-source command-line program — often described as "rsync for cloud storage" — that syncs, copies, and serves files between local disks and 70+ cloud storage providers through a single, consistent interface. It is widely used for backups, archival, migrations, and as the backbone of many self-hosted file workflows. + +Highlights of what rclone offers out of the box: + +- **One CLI for many backends.** S3-compatible providers, Google Drive, Dropbox, Backblaze B2, Azure Blob, SFTP, WebDAV, and many more — all behind the same commands (`copy`, `sync`, `ls`, `mount`, etc.). 
+- **Resumable, parallel, deduplicating transfers.** Idempotent file copies, configurable parallelism, automatic retries, and skip-on-already-uploaded behavior make large transfers safe to interrupt and re-run. +- **Integrity verification.** Built-in checksum comparison via `rclone check` and `rclone md5sum` confirms that what's stored matches what was sent. +- **Mount as a filesystem.** `rclone mount` exposes a remote as a FUSE filesystem so any tool can read remote files as if they were local. +- **Serve over HTTP, WebDAV, FTP, S3.** `rclone serve` turns any remote into a server that other tools can read from. +- **Encryption, compression, and chunking.** Built-in transformers can transparently encrypt or compress data before upload. +- **Scheduling-friendly.** Predictable exit codes, structured logs, and dry-run support make it easy to embed in cron, systemd timers, GitHub Actions, GitLab CI, or any orchestration system. + +Because Auto Drive exposes an S3-compatible API, you can use rclone to talk to it the same way you would to AWS S3 — with one important difference: **Auto Drive is permanent, immutable, decentralized storage.** Files stored on the Autonomys Distributed Storage Network (DSN) cannot be modified, overwritten, or deleted. That is the core guarantee. rclone operations that assume mutability (`delete`, `purge`, overwrite, rename) will not work — by design. + +The rest of this page is the technical reference for configuring rclone against Auto Drive and using it day-to-day. + +--- + +## Quickstart + +**1. Get an API key.** Sign in at [ai3.storage](https://ai3.storage), open the **Developers** section, and create an API key. A free tier is available for getting started. + +**2. Install rclone.** + +```bash +# macOS +brew install rclone + +# Linux +curl https://rclone.org/install.sh | sudo bash + +# Windows +winget install Rclone.Rclone +``` + +**3. 
Configure the Auto Drive remote.** Create or edit `~/.config/rclone/rclone.conf` (Linux/macOS) or `%APPDATA%\rclone\rclone.conf` (Windows): + +```ini +[autodrive] +type = s3 +provider = Other +access_key_id = YOUR_API_KEY_HERE +secret_access_key = placeholder +endpoint = https://public.auto-drive.autonomys.xyz/s3 +no_check_bucket = true +list_version = 2 +``` + +**4. Verify the connection.** + +```bash +# List your buckets +rclone lsd autodrive: + +# Copy a file to permanent storage +rclone copy ./my-file.txt autodrive:my-archive/ --immutable + +# Download it back +rclone copy autodrive:my-archive/my-file.txt ./downloaded/ +``` + +--- + +## Configuration Reference + +| Option | Value | Notes | +| --- | --- | --- | +| `type` | `s3` | Use the generic S3 backend | +| `provider` | `Other` | Generic S3-compatible mode — no provider-specific assumptions | +| `access_key_id` | Your Auto Drive API key | From the **Developers** tab at [ai3.storage](https://ai3.storage) | +| `secret_access_key` | Any non-empty string | Required by rclone but ignored by Auto Drive | +| `endpoint` | `https://public.auto-drive.autonomys.xyz/s3` | Auto Drive S3 API base URL | +| `no_check_bucket` | `true` | Skip bucket existence check — buckets are created implicitly on first write | +| `list_version` | `2` | **Required.** Auto Drive implements ListObjectsV2 only; without this, rclone defaults to V1 and listings fail | + +For local development against a self-hosted Auto Drive instance: + +```ini +endpoint = http://localhost:3000/s3 +``` + +### Interactive setup + +If you prefer the interactive flow, run: + +```bash +rclone config +``` + +1. Select `n` for new remote +2. Name it `autodrive` +3. Select `s3` (Amazon S3 Compliant Storage Providers) +4. Select `Other` as the provider +5. Enter your API key as `access_key_id` +6. Enter `placeholder` for `secret_access_key` +7. Enter `https://public.auto-drive.autonomys.xyz/s3` as the endpoint +8. 
Accept defaults for the remaining options + +Then manually add `no_check_bucket = true` and `list_version = 2` to the config section. + +--- + +## Bucket Model + +Auto Drive treats the **first segment** of an object key as the bucket name. When you copy files to `autodrive:my-archive/`, rclone uses `my-archive` as the bucket and the remaining path as the key: + +| rclone path | Bucket | Key | +| --- | --- | --- | +| `autodrive:my-archive/file.txt` | `my-archive` | `file.txt` | +| `autodrive:my-archive/sub/file.txt` | `my-archive` | `sub/file.txt` | + +Buckets are created implicitly on first write — there is no `CreateBucket` call. `rclone lsd autodrive:` lists every bucket you have written to. + +Objects uploaded before bucket support was introduced (flat keys with no `/`) remain accessible in the `default` bucket. + +--- + +## Recommended Flags + +| Flag | Purpose | When to use | +| --- | --- | --- | +| `--immutable` | Prevent overwrite attempts on existing files | All upload commands | +| `-v` / `--verbose` | Show progress and transfer details | Debugging | +| `--log-file=FILE` | Write logs to a file | Production and CI/CD | +| `--transfers=N` | Number of parallel transfers (default: 4) | Large jobs | +| `--ignore-checksum` | Skip MD5 checksum comparison | Legacy objects without MD5 ETags | +| `--size-only` | Compare by size instead of checksum | Alternative to `--ignore-checksum` | + +### A note on checksums + +Auto Drive returns standard MD5 hashes as S3 ETags for all objects uploaded after MD5 ETag support was introduced. `rclone check` and `rclone md5sum` work correctly on those objects without any special flags. + +Objects uploaded before MD5 ETag support have no stored MD5 — for those legacy objects, add `--ignore-checksum` or `--size-only` to skip checksum comparison. + +--- + +## Operations Reference + +### Supported operations + +#### `copy` (upload) + +Copies files from source to destination without deleting source files. 
The primary command for archiving to Auto Drive. + +```bash +# Single file +rclone copy ./report.pdf autodrive:my-archive/reports/ --immutable + +# Whole directory +rclone copy ./backup/ autodrive:my-archive/backup-2026-05-03/ --immutable -v +``` + +Files larger than rclone's `--s3-upload-cutoff` (200 MiB by default) automatically use multipart upload, sent in parts of `--s3-chunk-size` (5 MiB by default). + +#### `copy` (download) + +```bash +rclone copy autodrive:my-archive/reports/report.pdf ./downloaded/ +rclone copy autodrive:my-archive/backup-2026-05-03/ ./restored/ +``` + +Byte-range downloads are also supported. + +#### `cat` — read file contents to stdout + +```bash +rclone cat autodrive:my-archive/logs/app.log +``` + +#### `copyto` — copy a single file to a specific path + +```bash +rclone copyto ./local-file.txt autodrive:my-archive/archive/renamed-file.txt --immutable +``` + +#### `rcat` — upload from stdin + +```bash +echo "log entry" | rclone rcat autodrive:my-archive/logs/entry.txt --immutable +tar czf - ./project/ | rclone rcat autodrive:my-archive/archives/project.tar.gz --immutable +``` + +#### `ls` / `lsl` — list files + +```bash +rclone ls autodrive:my-archive/ +rclone lsl autodrive:my-archive/reports/ +``` + +Uses ListObjectsV2 with prefix and delimiter folding. + +#### `lsd` — list buckets or directories + +```bash +# All buckets +rclone lsd autodrive: + +# Top-level virtual directories in a bucket +rclone lsd autodrive:my-archive/ +``` + +The top-level form uses `ListBuckets`; the per-bucket form uses `ListObjectsV2` with delimiter folding. + +#### `tree` — recursive tree view + +```bash +rclone tree autodrive:my-archive/ +``` + +#### `md5sum`, `check`, `hashsum` + +```bash +rclone md5sum autodrive:my-archive/ +rclone check ./local/ autodrive:my-archive/ +rclone hashsum md5 autodrive:my-archive/ +``` + +All three use the MD5 ETag for verification. `rclone md5sum` is shorthand for `rclone hashsum md5`; only `md5` is supported because Auto Drive stores MD5 ETags. Add `--ignore-checksum` for legacy objects without a stored MD5. 
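To make the MD5 ETag verification concrete, here is a minimal TypeScript sketch of the kind of comparison an MD5-based check amounts to. It is not rclone's implementation; `md5ETag` and `matchesETag` are hypothetical helper names, and the sketch assumes the ETag is a quoted 32-character hex MD5, as described above for non-legacy, non-multipart objects.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: format a body's MD5 the way S3 quotes ETags.
function md5ETag(body: Buffer): string {
  const hex = createHash("md5").update(body).digest("hex");
  return `"${hex}"`;
}

// Hypothetical helper: compare local content against an ETag from the remote.
// Returns false when the ETag is not a plain 32-hex MD5 (for example a legacy
// CID-based value), because such values cannot be verified as checksums.
function matchesETag(body: Buffer, etag: string): boolean {
  const unquoted = etag.replace(/^"|"$/g, "");
  if (!/^[0-9a-f]{32}$/i.test(unquoted)) return false;
  return md5ETag(body).slice(1, -1) === unquoted.toLowerCase();
}
```

A legacy object whose ETag is a CID fails the hex test and is reported as non-matching, which is the situation `--ignore-checksum` and `--size-only` are meant to work around.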
+ +#### `size` — total size and count + +```bash +rclone size autodrive:my-archive/ +rclone size autodrive:my-archive/reports/ +``` + +### Supported with caveats + +#### `sync` — partially supported + +`rclone sync` makes the destination match the source, including deletions on the destination side. With Auto Drive, the upload side works but deletions fail with `403`. + +```bash +rclone sync ./local/ autodrive:my-archive/mirror/ --immutable +``` + +**Recommendation:** prefer `rclone copy` over `rclone sync`. The sync mental model assumes the ability to delete, which does not apply to permanent storage. + +#### `move` — partially supported + +```bash +rclone move ./local-files/ autodrive:my-archive/archive/ --immutable +``` + +The upload succeeds. Local source deletion depends on rclone's post-transfer behavior. Moves *from* Auto Drive will fail at the remote-delete step. + +#### `mount` — read-only fully supported, read-write partial + +```bash +mkdir -p /mnt/autodrive +rclone mount autodrive:my-archive/ /mnt/autodrive \ + --read-only \ + --vfs-cache-mode full \ + --vfs-read-chunk-size 5M +``` + +| Flag | Purpose | +| --- | --- | +| `--read-only` | Prevents confusing behavior from failed write/delete operations | +| `--vfs-cache-mode full` | Caches files locally for better read performance | +| `--vfs-read-chunk-size 5M` | Reads files in 5 MB chunks for efficient streaming | + +A read-write mount lets you create new files (which are uploaded normally), but `delete`, `rename`, and overwrite operations fail at the S3 layer. File explorers may show "deleted" files reappearing on refresh. Use `--read-only` unless you have a specific reason not to. + +Unmount with: + +```bash +fusermount -u /mnt/autodrive # Linux +umount /mnt/autodrive # macOS +``` + +### Unsupported by design + +These commands require mutability and will fail: + +- `rclone delete` — returns `403 Access Denied: Auto Drive storage is immutable. 
Objects cannot be deleted from the Autonomys DSN.` +- `rclone purge` — same 403 +- `rclone rmdir` / `rmdirs` — no `DeleteObject` to call +- `rclone cleanup` — returns `501`; the multipart-upload APIs it relies on (`ListMultipartUploads`, `AbortMultipartUpload`) are not implemented +- `rclone mkdir` — buckets are created implicitly on first write; explicit `CreateBucket` is not implemented + +### Operations summary + +| Command | Status | Notes | +| --- | --- | --- | +| `copy` (upload) | ✅ Supported | Use `--immutable` | +| `copy` (download) | ✅ Supported | Byte-range supported | +| `cat` | ✅ Supported | Stream file to stdout | +| `copyto` | ✅ Supported | Copy single file to specific path | +| `rcat` | ✅ Supported | Upload from stdin | +| `ls` / `lsl` | ✅ Supported | Uses ListObjectsV2 | +| `lsd` | ✅ Supported | Bucket and directory listing | +| `tree` | ✅ Supported | Recursive tree via ListObjectsV2 | +| `md5sum` | ✅ Supported | MD5 ETags on new objects | +| `check` | ✅ Supported | Same caveat as `md5sum` for legacy objects | +| `size` | ✅ Supported | Aggregates across prefix | +| `mount` (read-only) | ✅ Supported | Use `--read-only --vfs-cache-mode full` | +| `sync` | ⚠️ Partial | Upload-only; deletions error | +| `move` | ⚠️ Partial | Upload works; source delete may fail | +| `mount` (read-write) | ⚠️ Partial | New files work; delete/rename fail | +| `about` | ❌ Not implemented | No quota/usage API | +| `delete` | ❌ Unsupported by design | Returns 403 | +| `purge` | ❌ Unsupported by design | Returns 403 | +| `rmdir` / `rmdirs` | ❌ Unsupported by design | No deletion | +| `mkdir` | ❌ Not implemented | Buckets created implicitly | +| `cleanup` | ❌ Not implemented | Returns 501; multipart-upload APIs not implemented | + +--- + +## Workflows + +### Archive a local directory + +```bash +rclone copy ./important-data/ autodrive:my-archive/important-data/ \ + --immutable \ + -v +``` + +Re-running the same command after adding files only uploads the new ones (existing 
keys are skipped due to `--immutable`), making large transfers safely resumable. + +### Scheduled archival with cron + +A cron-ready archive script template: + +```bash +#!/usr/bin/env bash +set -euo pipefail + +SOURCE_DIR="${SOURCE_DIR:-/path/to/your/data}" +BUCKET="${BUCKET:-my-archive}" +DEST_PREFIX="${DEST_PREFIX:-daily}" +REMOTE="autodrive" +TRANSFERS="${TRANSFERS:-4}" + +TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S') +echo "[$TIMESTAMP] Starting archive: $SOURCE_DIR -> $REMOTE:$BUCKET/$DEST_PREFIX/" + +rclone copy "$SOURCE_DIR" "$REMOTE:$BUCKET/$DEST_PREFIX/" \ + --immutable \ + --transfers "$TRANSFERS" \ + -v +``` + +Crontab entry (daily at 02:00): + +``` +0 2 * * * BUCKET=my-archive SOURCE_DIR=/path/to/data /path/to/archive-script.sh >> /var/log/autodrive-archive.log 2>&1 +``` + +### Browse and verify + +```bash +rclone lsd autodrive: # all buckets +rclone lsd autodrive:my-archive/ # top-level dirs in bucket +rclone ls autodrive:my-archive/ # all files +rclone ls autodrive:my-archive/reports/ # files under prefix +rclone tree autodrive:my-archive/ # recursive tree +rclone check ./local/ autodrive:my-archive/ # verify checksums +rclone cat autodrive:my-archive/logs/app.log # stream a file +``` + +### Read-only mount + +```bash +mkdir -p /mnt/autodrive +rclone mount autodrive:my-archive/ /mnt/autodrive \ + --read-only \ + --vfs-cache-mode full \ + --vfs-read-chunk-size 5M +``` + +Then browse with any standard tool: + +```bash +ls /mnt/autodrive/ +cat /mnt/autodrive/reports/q1.pdf +``` + +### Migrate from another cloud provider + +The pattern is the same regardless of source remote: + +```bash +rclone copy <source-remote>:<source-path>/ autodrive:<bucket>/<prefix>/ --immutable -v +``` + +```bash +# AWS S3 +rclone copy s3:my-source-bucket/data/ autodrive:migrated/aws-data/ --immutable -v + +# Google Cloud Storage +rclone copy gcs:my-gcs-bucket/archives/ autodrive:migrated/gcs-archives/ --immutable -v + +# Backblaze B2 +rclone copy b2:my-b2-bucket/ autodrive:migrated/b2-backup/ --immutable -v +``` + +Tips for 
large migrations: + +- Use `-v` or `--progress` to monitor progress +- Tune `--transfers=N` to your bandwidth +- Capture full logs with `--log-file=migration.log` +- Run inside `screen` or `tmux` for long jobs +- Use `--immutable` to make the migration resumable; if interrupted, just re-run the same command + +### CI/CD integration + +#### GitHub Actions + +```yaml +name: Archive Build Artifacts + +on: + push: + branches: [main] + +jobs: + build-and-archive: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - run: make build + + - name: Install rclone + run: curl https://rclone.org/install.sh | sudo bash + + - name: Configure rclone + run: | + mkdir -p ~/.config/rclone + cat > ~/.config/rclone/rclone.conf << 'EOF' + [autodrive] + type = s3 + provider = Other + access_key_id = ${{ secrets.AUTO_DRIVE_API_KEY }} + secret_access_key = placeholder + endpoint = https://public.auto-drive.autonomys.xyz/s3 + no_check_bucket = true + list_version = 2 + EOF + + - name: Archive build artifacts + run: | + TIMESTAMP=$(date +%Y%m%d-%H%M%S) + rclone copy ./dist/ \ + autodrive:ci-builds/${{ github.sha }}-${TIMESTAMP}/ \ + --immutable -v +``` + +#### GitLab CI + +```yaml +archive-artifacts: + stage: deploy + image: rclone/rclone:latest + script: + - mkdir -p ~/.config/rclone + - | + cat > ~/.config/rclone/rclone.conf << 'EOF' + [autodrive] + type = s3 + provider = Other + access_key_id = ${AUTO_DRIVE_API_KEY} + secret_access_key = placeholder + endpoint = https://public.auto-drive.autonomys.xyz/s3 + no_check_bucket = true + list_version = 2 + EOF + - TIMESTAMP=$(date +%Y%m%d-%H%M%S) + - rclone copy ./dist/ autodrive:ci-builds/${CI_COMMIT_SHA}-${TIMESTAMP}/ --immutable -v + only: + - main +``` + +### Compliance and audit logs + +Permanent storage meets retention requirements by construction — records cannot be tampered with or accidentally deleted. 
+ +```bash +# Daily application logs +rclone copy /var/log/app/ autodrive:compliance/logs/$(date +%Y/%m/%d)/ \ + --immutable \ + --include "*.log" + +# Database backup +pg_dump mydb | rclone rcat autodrive:compliance/db-backups/$(date +%Y%m%d-%H%M%S).sql --immutable +``` + +### Pipe and stream + +```bash +# Compressed tarball +tar czf - ./project/ | rclone rcat autodrive:my-archive/archives/project-$(date +%Y%m%d).tar.gz --immutable + +# Database dump +mysqldump --all-databases | rclone rcat autodrive:backups/mysql-$(date +%Y%m%d).sql --immutable + +# Docker image +docker save my-app:latest | rclone rcat autodrive:docker-images/my-app-$(date +%Y%m%d).tar --immutable +``` + +### Run rclone in Docker + +The rclone official image works against Auto Drive with a config volume or env-driven config. Below is a `docker-compose.yml` that serves a bucket over HTTP (read-only) using `rclone serve http` — simpler than a FUSE mount in containers since FUSE requires `--privileged` or `--cap-add SYS_ADMIN`. 
+ +```yaml +services: + rclone: + image: rclone/rclone:latest + container_name: rclone-autodrive + restart: unless-stopped + environment: + - AUTO_DRIVE_API_KEY=${AUTO_DRIVE_API_KEY:?Set AUTO_DRIVE_API_KEY environment variable} + - AUTO_DRIVE_BUCKET=${AUTO_DRIVE_BUCKET:-my-archive} + volumes: + - ./data:/data:ro + ports: + - "8181:8181" + entrypoint: ["/bin/sh", "-c"] + command: + - | + mkdir -p /root/.config/rclone + cat > /root/.config/rclone/rclone.conf << EOF + [autodrive] + type = s3 + provider = Other + access_key_id = $${AUTO_DRIVE_API_KEY} + secret_access_key = placeholder + endpoint = https://public.auto-drive.autonomys.xyz/s3 + no_check_bucket = true + list_version = 2 + EOF + + rclone serve http "autodrive:$${AUTO_DRIVE_BUCKET}/" \ + --addr :8181 \ + --read-only \ + --vfs-cache-mode full +``` + +To upload from inside the container: + +```bash +docker compose exec rclone rclone copy /data autodrive:my-archive/ --immutable +``` + +--- + +## Troubleshooting + +### Listing fails (404 NoSuchKey, or 500 on older deployments) + +**Symptom:** `rclone ls`, `rclone lsd`, `rclone check`, or `rclone sync` fail with `ListObjects ... 404 NoSuchKey`. (Older Auto Drive deployments returned `500 InternalServerError` for the same cause.) + +**Cause:** Auto Drive implements **ListObjectsV2** only. By default rclone's generic S3 backend (`provider = Other`) uses the older **ListObjectsV1** API; Auto Drive does not recognise it and falls through to a per-object lookup, which reports the key as missing. + +**Fix:** Set `list_version = 2` in your rclone config. + +```bash +rclone config update autodrive list_version 2 +``` + +### Checksum mismatch on legacy objects + +**Symptom:** + +``` +ERROR : file.txt: corrupted on transfer: md5 hash differ "abc123" vs "bafkr..." +``` + +**Cause:** The object was uploaded before MD5 ETag support shipped, so its ETag is the CID, which rclone interprets as a failed checksum. 
+ +**Fix:** Add `--ignore-checksum` (or `--size-only`) to your command: + +```bash +rclone copy ./files/ autodrive:my-archive/ --immutable --ignore-checksum +``` + +Objects uploaded after the MD5 ETag feature do not need this flag. + +### Delete returns 403 + +**Symptom:** + +``` +ERROR : file.txt: Failed to delete: 403 Access Denied: Auto Drive storage is immutable. Objects cannot be deleted from the Autonomys DSN. +``` + +**Cause:** Auto Drive storage is permanent by design. The S3 layer rejects all DELETE requests. + +**Affected commands:** `rclone delete`, `rclone purge`, `rclone rmdir`, `rclone move` (delete phase), `rclone sync` (when the destination has files no longer in the source). + +**Fix:** Use `rclone copy` instead of `rclone sync` or `rclone move`. + +### Authentication errors + +**Symptoms:** `AccessDenied`, `401 Unauthorized`. + +**Possible causes:** + +1. The API key is incorrect or has been revoked +2. The API key has expired (if you set an expiry) +3. Your account has no remaining credits + +**Fix:** + +- Verify the key at [ai3.storage](https://ai3.storage) → Developers +- Create a new key if needed +- Check your credit balance and top up if required +- Confirm `access_key_id` in your rclone config matches the API key exactly + +### Connection timeouts + +**Symptom:** `Failed to copy: connection timed out`. + +**Fix:** + +- `curl -I https://public.auto-drive.autonomys.xyz/s3` to verify reachability +- Raise the timeouts above rclone's defaults (`--contimeout` is 1m, `--timeout` is 5m), for example `--contimeout=2m --timeout=10m` +- Add retries: `--retries=5 --retries-sleep=10s` + +### Immutable file conflict + +**Symptom:** `File exists and --immutable is set`. + +**Cause:** You are trying to upload a key that already exists, with `--immutable` set (as recommended). + +- If the content is the same, this is expected — the file is already safely stored. +- If you want to publish a new version, write to a new key (e.g. add a timestamp or version suffix). 
Removing `--immutable` and re-uploading to the same key creates a new object and updates the key mapping, but the old data still persists on the DSN. + +### Bucket not found + +**Symptom:** `NoSuchBucket: The specified bucket does not exist`. + +**Cause:** You are referencing a bucket that has never had any objects written to it. Buckets exist implicitly only after the first write. + +**Fix:** Write an object first, or correct the bucket name. Also confirm `no_check_bucket = true` is in your config so rclone does not attempt a `HeadBucket` probe before each operation. + +### Diagnostic commands + +```bash +# Inspect rclone config +rclone config show autodrive + +# Test endpoint reachability +curl -I https://public.auto-drive.autonomys.xyz/s3 + +# Verbose upload with full logging +rclone copy ./test.txt autodrive:test-bucket/ \ + --immutable -vv \ + --log-file=rclone-debug.log \ + --dump headers + +# rclone version (use 1.60+ for full S3 compatibility) +rclone version +``` + +--- + +## Further Reading + +- [Auto Drive S3 Layer Guide](/sdk/auto-drive/s3_layer) — the underlying S3-compatible API used by rclone +- [Auto Drive dashboard](https://ai3.storage) — manage your account, API keys, and credits +- [Autonomys Academy: Rewards and Fees](https://academy.autonomys.xyz/autonomys-network/rewards-and-fees) — how storage pricing works on the Autonomys Network +- [rclone documentation](https://rclone.org/docs/) — full rclone reference +- [Auto Drive rclone examples](https://github.com/autonomys/auto-drive/tree/main/docs/rclone/examples) — annotated config, archive script, and Docker compose recipe diff --git a/pages/sdk/auto-drive/_meta.js b/pages/sdk/auto-drive/_meta.js index a8b268b..9fd40fb 100644 --- a/pages/sdk/auto-drive/_meta.js +++ b/pages/sdk/auto-drive/_meta.js @@ -2,9 +2,9 @@ export default { 'overview_setup': 'Overview & Setup', 'create_api_key': 'Create an API Key', 'available_functions': 'Available Functions', + 'pay_with_ai3': 'Pay with AI3', 's3_layer': 
'S3 Layer (S3 compatibility)', 'usage_examples': 'Usage Examples', - 'pay_with_ai3': 'Pay with AI3', 'encryption': 'File Encryption Specification', 'gateway': "Gateway", 'api_reference': 'API Reference', diff --git a/pages/sdk/auto-drive/s3_layer.mdx b/pages/sdk/auto-drive/s3_layer.mdx index 99331a0..e697456 100644 --- a/pages/sdk/auto-drive/s3_layer.mdx +++ b/pages/sdk/auto-drive/s3_layer.mdx @@ -2,33 +2,118 @@ ## Overview -**Auto Drive** provides an **S3-compatible API layer** that allows you to interact with **decentralized storage(DSN)** using standard **Amazon Web Services Simple Storage Service (AWS S3)** SDK commands. This bridges the gap between familiar cloud storage patterns and next-generation decentralized infrastructure, giving developers the best of both worlds: the **reliability and developer experience of S3 APIs** with the **permanence and censorship-resistance** of decentralized storage. +**Auto Drive** provides an **S3-compatible API layer** that allows you to interact with **decentralized storage (DSN)** using standard **Amazon Web Services Simple Storage Service (AWS S3)** SDK commands. This bridges the gap between familiar cloud storage patterns and next-generation decentralized infrastructure, giving developers the best of both worlds: the **reliability and developer experience of S3 APIs** with the **permanence and censorship-resistance** of decentralized storage. -For those unfamiliar, [Amazon Web Services Simple Storage Service (AWS S3)](https://aws.amazon.com/s3/) is an industry-standard object storage service that powers much of the modern web's file storage needs. **Auto Drive** maintains complete compatibility with S3's APIs while storing your data on a **decentralized network** instead of **centralized servers**. +For those unfamiliar, [Amazon Web Services Simple Storage Service (AWS S3)](https://aws.amazon.com/s3/) is an industry-standard object storage service that powers much of the modern web's file storage needs. 
**Auto Drive** maintains compatibility with the most-used parts of S3's APIs while storing your data on a **decentralized network** instead of **centralized servers**. + +> **Permanent and immutable by design.** Auto Drive storage is permanent. Objects on the Autonomys DSN cannot be modified, overwritten, or deleted. `DeleteObject` returns `403 Forbidden`. Re-uploading the same key creates a new object — the old data persists on the DSN. Plan your data model around this guarantee. ## How It Works -Auto Drive maintains an object_mappings table in the database that maps S3 object keys to Content Identifiers (CIDs). When you upload via S3 API, the system: -1. Stores the file content on the decentralized network (DSN) -2. Records the key-to-CID mapping in the database -3. Returns the CID as the ETag for S3 compatibility -4. Enables cross-API access between S3 and Auto Drive APIs +Auto Drive maps S3 (bucket, key) pairs to Content Identifiers (CIDs) and MD5 checksums. When you upload via the S3 API, the system: +1. Stores the file content on the decentralized network (DSN) +2. Computes the MD5 of the content and records the `(bucket, key) → (cid, md5)` mapping +3. Returns the **MD5 as the `ETag`** (standard S3 format) and exposes the **CID in the `x-amz-meta-cid` response header** +4. Enables cross-API access between the S3 API and the native Auto Drive API ## Key Features -### 1. **Standard S3 SDK Compatibility** +### 1. Standard S3 SDK and CLI Compatibility + +- Works with the official AWS S3 SDKs (`@aws-sdk/client-s3`, `boto3`), the **AWS CLI**, and S3-compatible tools like **[rclone](#rclone-integration)** +- Supported operations: `ListBuckets`, `ListObjectsV2`, `PutObject`, `GetObject`, `HeadObject`, multipart uploads (`CreateMultipartUpload`, `UploadPart`, `CompleteMultipartUpload`) +- No code changes required for existing S3 applications other than swapping the endpoint and credentials + +### 2. Buckets + +Buckets behave like standard S3 buckets. 
The **first path segment of the object key is the bucket name**, and the remainder is the key: + +| S3 path | Bucket | Key | +| --- | --- | --- | +| `my-archive/report.pdf` | `my-archive` | `report.pdf` | +| `my-archive/sub/file.txt` | `my-archive` | `sub/file.txt` | +| `test.txt` (no slash) | `default` | `test.txt` | + +Buckets are **created implicitly** on first write — there is no `CreateBucket` call. `ListBuckets` (`GET /`) returns every distinct bucket you have written to. + +```typescript +import { ListBucketsCommand } from "@aws-sdk/client-s3"; + +const result = await s3Client.send(new ListBucketsCommand({})); +console.log(result.Buckets); // [{ Name: "my-archive", CreationDate: ... }, ...] +``` + +Single-segment keys uploaded before bucket support was introduced remain accessible under the `default` bucket. + +### 3. ListObjectsV2 with Prefix, Delimiter, and Pagination -- Use official AWS S3 SDK (`@aws-sdk/client-s3`) -- Supports all major S3 operations: `PutObject`, `GetObject`, `HeadObject`, multipart uploads (`CreateMultipartUploadCommand`, `UploadPartCommand` & `CompleteMultipartUploadCommand`) -- No code changes required for existing S3 applications +Auto Drive implements `ListObjectsV2` end-to-end, with the parameters most clients depend on: -### 2. **Enhanced Metadata Support** +- **Prefix filtering** — `?prefix=subdir/` returns only keys starting with `subdir/` +- **Delimiter folding** — `?delimiter=/` collapses keys into `<CommonPrefixes>` virtual directories +- **Pagination** — `?max-keys=N` plus `?continuation-token=…` for cursor-based paging +- **Object size** — `<Size>` is populated from indexed metadata +- **MD5 ETag in listings** — listing entries return the MD5 ETag, so checksum verification does not require an extra `HeadObject` per object + +> Auto Drive implements **`ListObjectsV2` only**. The legacy `ListObjects` (V1) API is not implemented. rclone users must set `list_version = 2` in their remote config (see [rclone integration](#rclone-integration)). 
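The cursor-based paging described above can be sketched as a loop that keeps requesting pages until the listing is no longer truncated. This is a minimal sketch under stated assumptions: `ListPage`, `FetchPage`, and `listAllKeys` are hypothetical names, and the fetcher is injected so the loop stays independent of any particular client. In a real client, the fetcher would send a `ListObjectsV2Command` with `ContinuationToken` set to the supplied token.

```typescript
// Hypothetical page shape, mirroring the ListObjectsV2 fields discussed above.
interface ListPage {
  keys: string[];
  isTruncated: boolean;
  nextContinuationToken?: string;
}

// Hypothetical fetcher signature: returns one page for the given token
// (undefined means "first page").
type FetchPage = (token?: string) => Promise<ListPage>;

// Collect every key by following continuation tokens until the last page.
async function listAllKeys(fetchPage: FetchPage): Promise<string[]> {
  const keys: string[] = [];
  let token: string | undefined = undefined;
  do {
    const page: ListPage = await fetchPage(token);
    keys.push(...page.keys);
    // Only continue when the service says there is more and gives a cursor.
    token = page.isTruncated ? page.nextContinuationToken : undefined;
  } while (token !== undefined);
  return keys;
}
```

Injecting the fetcher keeps the paging logic easy to exercise with canned pages before pointing it at a live endpoint.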
+ +```typescript +import { ListObjectsV2Command } from "@aws-sdk/client-s3"; + +const result = await s3Client.send( + new ListObjectsV2Command({ + Bucket: "my-archive", + Prefix: "logs/", + Delimiter: "/", + MaxKeys: 100, + }) +); + +console.log(result.Contents); // objects directly under "logs/" +console.log(result.CommonPrefixes); // virtual subdirectories like "logs/2026/" + +if (result.IsTruncated) { + // Fetch the next page using result.NextContinuationToken +} +``` + +### 4. MD5 ETags + CID via `x-amz-meta-cid` + +`PutObject`, `GetObject`, `HeadObject`, `ListObjectsV2`, and `CompleteMultipartUpload` return a **standard quoted MD5 ETag** (e.g. `"d41d8cd98f00b204e9800998ecf8427e"`) for objects uploaded after the MD5 ETag feature shipped. For multipart uploads, the composite ETag follows the AWS format `"<md5-of-part-md5s>-<part-count>"`. Legacy objects (those uploaded before the feature) have no stored MD5: `HEAD`/`GET` omit the `ETag` header entirely, and `ListObjectsV2` falls back to wrapping the CID as the ETag value. See the migration note below. + +The Autonomys CID is exposed on every object response as a custom header: + +```http +x-amz-meta-cid: bafkreig... +``` + +This makes `rclone check`, `rclone md5sum`, AWS CLI checksum verification, and any other MD5-based tooling work out of the box. + +> **Migration note.** Objects uploaded before the MD5 ETag feature shipped have a `NULL` MD5 in the database. They will not return an `ETag` header on `HEAD`/`GET` until they are re-uploaded. The CID is always available via `x-amz-meta-cid`. For listings of legacy objects, the CID is returned as a fallback. + +```typescript +import { HeadObjectCommand } from "@aws-sdk/client-s3"; + +const head = await s3Client.send( + new HeadObjectCommand({ Bucket: "my-archive", Key: "report.pdf" }) +); + +console.log(head.ETag); // '"d41d8cd98f00b204e9800998ecf8427e"' (MD5) +console.log(head.Metadata?.cid); // 'bafkreig...' (CID, via x-amz-meta-cid) +``` + +### 5. 
Deletion is Forbidden by Design + +`DeleteObject` always returns `403 Forbidden` with an informative message — Auto Drive storage is permanent. Tools that try to delete (e.g. `rclone delete`, `rclone purge`, `rclone sync` with deletions) will see the 403 and surface it as an error. Re-uploading the same key creates a new object pointing at new content; the prior data still exists on the DSN. + +### 6. Enhanced Metadata Support + +Custom user metadata is stored alongside the object and returned on subsequent `HEAD`/`GET` calls. Auto Drive recognizes `compression` and `encryption` metadata for content stored with those transforms applied client-side. ```typescript -// Compression and encryption metadata const command = new PutObjectCommand({ - Bucket: "https://public.auto-drive.autonomys.xyz/api/s3", + Bucket: "my-archive", Key: "file.txt", Body: buffer, Metadata: { @@ -38,132 +123,139 @@ const command = new PutObjectCommand({ }); ``` -### 3. **Range Requests** +### 7. Range Requests -- Partial file downloads supported -- Standard HTTP Range headers +Partial file downloads using standard HTTP `Range` headers are supported. ```typescript const command = new GetObjectCommand({ - Bucket: bucket, - Key: key, - Range: "bytes=0-9", // Download first 10 bytes + Bucket: "my-archive", + Key: "large-file.bin", + Range: "bytes=0-9", // first 10 bytes }); ``` -### 4. **Multipart Upload Support** +### 8. Multipart Uploads -- Full multipart upload workflow -- Create → Upload Parts → Complete pattern -- Automatic chunking for large files +The full multipart workflow (`CreateMultipartUpload` → `UploadPart` × N → `CompleteMultipartUpload`) is supported. `UploadPart` returns the MD5 of the part body; `CompleteMultipartUpload` returns the standard AWS composite ETag `"<md5-of-part-md5s>-<part-count>"`. 
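
To make the composite ETag concrete, here is a minimal sketch of the widely used AWS convention: take the raw MD5 digest of each part, concatenate the digests in part order, hash the result with MD5, and append `-` plus the part count. The function names are hypothetical, and the assumption that Auto Drive computes its value exactly this way rests only on the "standard AWS composite ETag" wording above.

```typescript
import { createHash } from "node:crypto";

// Raw (binary) MD5 digest of a single part body.
function md5Bytes(data: Uint8Array): Buffer {
  return createHash("md5").update(data).digest();
}

// AWS-style composite ETag: hex MD5 of the concatenated raw part digests,
// followed by "-" and the number of parts, wrapped in double quotes.
function compositeETag(parts: Uint8Array[]): string {
  const digests = parts.map(md5Bytes);
  const combined = createHash("md5").update(Buffer.concat(digests)).digest("hex");
  return `"${combined}-${parts.length}"`;
}
```

A client that kept its part boundaries could recompute this value locally and compare it with the `ETag` returned by `CompleteMultipartUpload`. Note that the result depends on how the file was split into parts, so it does not equal the plain MD5 of the whole file.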
```typescript -// Complete multipart upload example -const key = "large-file.txt"; -const fileContent = Buffer.from("Large file content..."); +const key = "large-file.bin"; +const Bucket = "my-archive"; // Step 1: Create multipart upload -const createCommand = new CreateMultipartUploadCommand({ - Bucket: "https://public.auto-drive.autonomys.xyz/api/s3", - Key: key, -}); -const createResult = await s3Client.send(createCommand); -const uploadId = createResult.UploadId!; - -// Step 2: Upload parts -const uploadPartCommand = new UploadPartCommand({ - Bucket: "https://public.auto-drive.autonomys.xyz/api/s3", - Key: key, - UploadId: uploadId, - PartNumber: 1, - Body: fileContent, -}); -const partResult = await s3Client.send(uploadPartCommand); - - -// Step 3: Complete multipart upload -const completeCommand = new CompleteMultipartUploadCommand({ - Bucket: "https://public.auto-drive.autonomys.xyz/api/s3", - Key: key, - UploadId: uploadId, - MultipartUpload: { - Parts: [ - { - ETag: partResult.ETag!, - PartNumber: 1, - }, - ], - }, -}); -const completeResult = await s3Client.send(completeCommand); +const { UploadId } = await s3Client.send( + new CreateMultipartUploadCommand({ Bucket, Key: key }) +); + +// Step 2: Upload one or more parts +const part1 = await s3Client.send( + new UploadPartCommand({ + Bucket, + Key: key, + UploadId: UploadId!, + PartNumber: 1, + Body: fileChunk1, + }) +); + +// Step 3: Complete the upload +const result = await s3Client.send( + new CompleteMultipartUploadCommand({ + Bucket, + Key: key, + UploadId: UploadId!, + MultipartUpload: { + Parts: [{ ETag: part1.ETag!, PartNumber: 1 }], + }, + }) +); + +console.log(result.ETag); // '"<md5-of-part-md5s>-1"' ``` ## Configuration -### Client Setup +### Endpoint + +| Environment | Endpoint | +| --- | --- | +| **Mainnet (public)** | `https://public.auto-drive.autonomys.xyz/s3` | + +### Client Setup (AWS JS SDK) + +The Auto Drive backend exposes objects under a single `/:key(*)` route and parses bucket and key from the path, 
so a path-style client works well in practice: ```typescript +import { S3Client } from "@aws-sdk/client-s3"; + const s3Client = new S3Client({ - region: "us-east-1", + region: "us-east-1", // required by the SDK; ignored by Auto Drive + endpoint: "https://public.auto-drive.autonomys.xyz/s3", credentials: { - accessKeyId: "your-auto-drive-api-key", // Your Auto Drive API key - secretAccessKey: "", // Always empty for Auto Drive + accessKeyId: "your-auto-drive-api-key", // your Auto Drive API key + secretAccessKey: "placeholder", // any non-empty string; ignored }, - bucketEndpoint: true, // Required for custom endpoints + forcePathStyle: true, // bucket lives in the path }); ``` -### Endpoint Configuration +Then use bucket names as you would with AWS S3: ```typescript -// The "Bucket" parameter becomes part of the endpoint URL -const Bucket = `${baseURL}/s3`; // e.g., "https://public.auto-drive.autonomys.xyz/api/s3" -// No actual S3 bucket is created - it's just URL routing +await s3Client.send( + new PutObjectCommand({ Bucket: "my-archive", Key: "report.pdf", Body: buffer }) +); ``` -- Mainnet: `https://public.auto-drive.autonomys.xyz/api/s3` -- Base URL: http://localhost:3000/s3 (development) -- Bucket name becomes the full endpoint path -- No actual bucket concept - uses path-based routing + +> **Alternative: `bucketEndpoint` style.** The upstream backend integration tests configure the SDK with `bucketEndpoint: true` and bake the bucket into the endpoint path (e.g. `Bucket: "my-archive/s3"`). Either pattern works against the `/:key(*)` route; `forcePathStyle: true` with a plain bucket name is shown here because it's the more familiar S3 configuration. 
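
The path-based routing described in this section can be illustrated with two small helpers. This is a sketch of the documented mapping only (first path segment is the bucket, the rest is the key, and a path without a slash falls into `default`); it is not the backend's actual implementation, and `splitBucketAndKey` and `pathStyleUrl` are hypothetical names.

```typescript
interface ObjectAddress {
  bucket: string;
  key: string;
}

// First path segment is the bucket, the remainder is the key. A path with
// no slash is placed in the "default" bucket, as in the table of examples.
function splitBucketAndKey(path: string): ObjectAddress {
  const trimmed = path.replace(/^\/+/, "");
  const slash = trimmed.indexOf("/");
  if (slash === -1) return { bucket: "default", key: trimmed };
  return { bucket: trimmed.slice(0, slash), key: trimmed.slice(slash + 1) };
}

// URL that a path-style client (forcePathStyle: true) would request.
function pathStyleUrl(endpoint: string, bucket: string, key: string): string {
  const base = endpoint.replace(/\/+$/, "");
  // Encode each key segment separately so "/" separators survive.
  const encodedKey = key.split("/").map(encodeURIComponent).join("/");
  return `${base}/${encodeURIComponent(bucket)}/${encodedKey}`;
}
```

For example, `pathStyleUrl("https://public.auto-drive.autonomys.xyz/s3", "my-archive", "sub/file.txt")` yields a URL whose path after `/s3/` is `my-archive/sub/file.txt`, which `splitBucketAndKey` maps back to bucket `my-archive` and key `sub/file.txt`.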
## Authentication -- Uses Auto Drive API key-based authentication -- Integrates with Auto Drive's user management system -- API key goes in `accessKeyId`, `secretAccessKey` remains empty -- Supports the same authentication as the Auto Drive API +- Uses Auto Drive API key-based authentication (the same keys as the native Auto Drive API) +- The API key goes in `accessKeyId`; `secretAccessKey` must be present but is ignored (use any non-empty placeholder) +- Files uploaded via the S3 API are owned by the user the API key belongs to -## File Ownership & Access +Get an API key from the **Developers** section at [ai3.storage](https://ai3.storage). -- **Cross-API compatibility**: Files uploaded via S3 API are accessible through Auto Drive API and vice versa -- **Centralized ownership**: File ownership is tracked centrally, not per-API -- **Content deduplication**: Multiple users uploading identical content will share the same underlying CID -- **Shared access**: If different users upload the same file via different APIs, both can access it through either API +## File Ownership & Cross-API Access + +- **Cross-API compatibility** — files uploaded via the S3 API are accessible via the native Auto Drive API, and vice versa +- **Centralized ownership** — file ownership is tracked centrally per user, not per API surface +- **Content deduplication** — multiple users uploading identical content share the same underlying CID +- **Shared access** — if different users upload the same file, both can access it through either API ## Storage Characteristics ### Content Addressing -- Files are stored using Content Identifiers (CIDs) -- ETag returned is the actual CID of the uploaded content -- Immutable storage - same content always produces same CID +- Files are stored using Content Identifiers (CIDs) on the Autonomys DSN +- The CID is exposed via the `x-amz-meta-cid` response header on every object response (and as a fallback ETag for legacy objects) +- Storage is **immutable** — the 
same content always produces the same CID ### Decentralized Backend -- Files stored on the DSN of Autonomys Network (available on Autonomys Mainnet & Testnet) +- Files are stored on the Distributed Storage Network of the Autonomys Network - Automatic replication and redundancy - No single point of failure +## rclone Integration + +Auto Drive's S3 layer is fully [rclone](https://rclone.org)-compatible — including `rclone check`, `rclone md5sum`, virtual directory listings, multipart uploads, and pagination — so any rclone-driven workflow (archival, scheduled backups, mounts, cloud-to-cloud migration, CI/CD artifact storage) works against Auto Drive. + +For the dedicated walkthrough — quickstart, full config reference, every supported command, workflows, and troubleshooting — see the [Using rclone](/auto_drive/rclone) guide. + ## Migrating from AWS S3 For developers moving from traditional AWS S3: -1. **Update endpoint** to Auto Drive server URL -2. **Change credentials** to use Auto Drive API key (with empty secret) -3. **Set `bucketEndpoint`**: true in S3Client configuration -4. **Handle longer response times** due to blockchain network latency -5. **Expect CIDs as ETags** instead of MD5 hashes -6. **Update bucket references** to use full endpoint URLs -7. **Test multipart uploads** as they may behave slightly differently +1. **Update the endpoint** to `https://public.auto-drive.autonomys.xyz/s3` +2. **Change credentials** to use your Auto Drive API key as `accessKeyId`; set `secretAccessKey` to a placeholder +3. **Set `forcePathStyle: true`** in the S3 client configuration +4. **Drop bucket-creation calls** — buckets are created implicitly on first write +5. **Remove deletion logic** — `DeleteObject` returns 403; design around immutability +6. **Read CIDs from `x-amz-meta-cid`**, not from the ETag (the ETag is the standard MD5) +7. **Handle longer response times** due to network latency vs. 
AWS edge ```typescript // Before (AWS S3) @@ -178,40 +270,76 @@ const s3Client = new S3Client({ // After (Auto Drive) const s3Client = new S3Client({ region: "us-east-1", + endpoint: "https://public.auto-drive.autonomys.xyz/s3", credentials: { accessKeyId: "your-auto-drive-api-key", - secretAccessKey: "", + secretAccessKey: "placeholder", }, - bucketEndpoint: true, + forcePathStyle: true, }); ``` +## Building Your Own S3-Compatible Layer + +If you are building your own S3-compatible service on top of Autonomys storage, the [Auto SDK](https://github.com/autonomys/auto-sdk) ships reusable, framework-agnostic S3 server-side helpers in [`@autonomys/file-server`](https://www.npmjs.com/package/@autonomys/file-server): + +- `buildListResult(rows, prefix, delimiter, maxKeys)` — `ListObjectsV2` delimiter folding into `CommonPrefixes` plus `maxKeys` pagination with correct continuation-token placement +- `computeListObjectsDbLimit(maxKeys, delimiter)` — how many rows to fetch from storage for a single page +- `finalizeListObjects(params, fetchedRows, dbLimit)` — wraps `buildListResult` and applies the full-batch truncation override that prevents duplicate `CommonPrefixes` across page boundaries +- `md5Hex`, `formatETag`, `multipartETag` — S3 ETag computation, including the AWS composite multipart format +- Types: `S3ObjectListing`, `ListObjectsParams`, `ListObjectsResult` + +```typescript +import { + computeListObjectsDbLimit, + finalizeListObjects, + multipartETag, +} from "@autonomys/file-server"; + +const dbLimit = computeListObjectsDbLimit(maxKeys, delimiter); +const rows = await fetchSortedObjectsFromStorage(prefix, continuationToken, dbLimit); + +const result = finalizeListObjects( + { bucket, prefix, delimiter, maxKeys, continuationToken }, + rows, + dbLimit, +); +// result is ready to render as ListObjectsV2 XML + +const completedETag = multipartETag(parts.map((p) => p.etag)); +``` + +Auto Drive itself uses these helpers, so you get the same delimiter folding, 
pagination, and composite-ETag behavior the public service is validated against. + ## Limitations & Considerations ### Performance -- The DSN (on-chain) storage has higher latency than traditional S3 -- Multipart uploads recommended for files > 5MB -- Range requests may have different performance characteristics +- The DSN has higher write latency than traditional S3 — use multipart uploads for files > 5 MB +- Listings paginate at the server's configured `maxKeys` (default 1,000); requested values larger than 1,000 are clamped to the 1,000 hard cap +- Range requests may have different performance characteristics than AWS S3 ### Compatibility Notes -- Not all S3 features supported (e.g., versioning, lifecycle policies) -- Custom metadata handling for compression/encryption -- Bucket operations are virtual (no actual bucket creation) +- **Implemented**: `ListBuckets`, `ListObjectsV2` (with `prefix`, `delimiter`, `max-keys`, `continuation-token`), `PutObject`, `GetObject`, `HeadObject`, multipart uploads +- **Not implemented (return 501)**: `ListObjects` (V1) — use V2; `CreateBucket` — buckets are created implicitly; `CopyObject`; `DeleteObjects` (bulk/multi-object delete); `versioning`, `lifecycle policies`, `ACLs`, `bucket policies`, presigned URLs +- **Forbidden by design (return 403)**: `DeleteObject`, `DeleteBucket` +- **Effectively no-ops**: any operation that assumes mutability ## Best Practices -1. **Use Multipart Uploads** for files larger than 5MB -2. **Leverage Range Requests** for partial file access -3. **Include Compression/Encryption** metadata when needed -4. **Handle ETags as CIDs** for content verification -5. **Implement Retry Logic** for blockchain network delays +1. **Treat storage as append-only** — design data models around immutability rather than fighting it +2. **Use multipart uploads** for files larger than 5 MB +3. **Read CIDs from `x-amz-meta-cid`**, not from the ETag +4. 
**Use the MD5 ETag for integrity verification** — it matches `md5sum` of the content for non-multipart objects +5. **Paginate listings** with `MaxKeys` + `NextContinuationToken` rather than fetching everything in one call +6. **Implement retry logic** for occasional network-induced timeouts +7. **For rclone**, always pass `--immutable` and set `list_version = 2` ## Error Handling -- Standard S3 error responses -- Additional blockchain-specific error codes -- Network timeouts may be longer than traditional S3 +- Standard S3 XML error responses +- `403 Forbidden` for any deletion attempt — distinguish this from a permissions error in your client +- Network timeouts may be longer than traditional S3; configure your SDK accordingly -This S3 layer provides a familiar interface while leveraging the benefits of decentralized storage, making it easy to migrate existing S3-based applications to Auto Drive. \ No newline at end of file +This S3 layer provides a familiar interface while leveraging the benefits of decentralized storage, making it easy to migrate existing S3-based applications — and S3-compatible tooling like rclone and the AWS CLI — to Auto Drive.