chore: add restore database script #527

Open
dan2k3k4 wants to merge 3 commits into dev from add-restore-database-script
@dan2k3k4 (Member) commented May 22, 2026

Greptile Summary

Adds a new scripts/restore_database.py utility that restores a PostgreSQL database from a Lagoon .tar.gz backup: it optionally takes a pre-restore safety backup via pg_dump -Fc, drops and recreates the target database, extracts the nested archive with path-traversal protection, and applies the dump with pg_restore -Fd.

  • Safety backup now uses pg_dump --format=custom capturing all schema objects (sequences, indexes, views, functions, triggers), a significant improvement over row-by-row SQL reconstruction.
  • Archive extraction validates every member against the destination path and rejects non-regular-file/non-directory entries before calling extractall.
  • When --yes is passed and the backup fails, the script now correctly aborts rather than falling through to drop the live database — however, the pg_dump error reason is silently discarded at the except SystemExit: call site in main(), leaving the user with no diagnostic information about why the backup failed.
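The extraction check described in the second bullet could look roughly like the sketch below. This is not the code from scripts/restore_database.py; the name safe_extract_tar is taken from the flowchart, but the body, the error messages, and the use of os.path.commonpath are assumptions made for illustration.

```python
import os
import tarfile


def safe_extract_tar(archive_path: str, dest_dir: str) -> None:
    """Extract a .tar.gz into dest_dir, refusing unsafe members (illustrative sketch)."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Allow only regular files and directories: no symlinks,
            # hardlinks, device nodes, or FIFOs.
            if not (member.isreg() or member.isdir()):
                raise ValueError(f"unsupported tar entry type: {member.name}")
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # The resolved target must stay inside the destination directory.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"tar entry escapes destination: {member.name}")
        # Every member is validated before anything is written to disk.
        tar.extractall(dest_root, members=members)
```

Validating the full member list first means a malicious entry late in the archive cannot leave a half-extracted tree behind.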

Confidence Score: 3/5

The script is safe to merge for non-automated use, but the missing error message on backup failure makes production/pipeline use opaque and harder to recover from.

The backup failure path catches SystemExit without binding the exception, so the actual pg_dump error (authentication failure, disk full, connection refused) is never shown to the user. In --yes mode the script exits cleanly, but operators get no clue what went wrong and must re-run pg_dump manually to diagnose. Interactive mode similarly silently drops the reason before prompting "Continue without backup?". This is a real defect in the error-reporting path of a destructive operation.
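One way to keep the failure reason is to bind the exception and print its payload before deciding whether to abort. The sketch below is hypothetical: backup_current_database here is a stand-in that always fails, and run_backup_step and assume_yes are illustrative names, not identifiers confirmed from the script's main().

```python
import sys


def backup_current_database() -> None:
    """Hypothetical stand-in that fails the way a pg_dump wrapper might."""
    raise SystemExit("pg_dump failed: connection refused")


def run_backup_step(assume_yes: bool) -> bool:
    """Return True if the backup succeeded; report the reason if it did not."""
    try:
        backup_current_database()
        return True
    except SystemExit as exc:
        # exc.code holds whatever was passed to SystemExit (message or status).
        print(f"Safety backup failed: {exc.code}", file=sys.stderr)
        if assume_yes:
            # Non-interactive mode: abort rather than drop the live database.
            raise SystemExit(1) from exc
        # Interactive mode: the caller can now prompt with the reason visible.
        return False
```

With the reason printed first, the "Continue without backup?" prompt in interactive mode is shown alongside the actual pg_dump error.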

scripts/restore_database.py — specifically the except SystemExit: block in main() around the backup step.

Important Files Changed

Filename | Overview
scripts/restore_database.py | New script to restore a PostgreSQL database from a Lagoon .tar.gz backup; uses pg_dump -Fc for safety backup and pg_restore -Fd for restore, with path-traversal protection in archive extraction. One P1: pg_dump failure reason is silently dropped when the SystemExit is caught in main().

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([Start]) --> B{--no-backup?}
    B -- No --> C[pg_dump -Fc → safety_backup.dump]
    C -- Success --> D[Step 2: Extract]
    C -- Failure --> E{--yes?}
    E -- Yes --> F[sys.exit 1\n⚠ error reason discarded]
    E -- No --> G{User: continue?}
    G -- No --> H([Abort])
    G -- Yes --> D
    B -- Yes --> D
    D[Extract .tar.gz\nsafe_extract_tar validates\nreg/dir only + path bounds] --> I[Locate toc.dat\ndump directory]
    I --> J[Step 3: dropdb --force\nthen createdb]
    J -- dropdb/createdb fails --> K([SystemExit with error])
    J -- Success --> L[Step 4: pg_restore -Fd\n--no-owner --no-privileges\n--exit-on-error]
    L -- Failure --> M([SystemExit + recovery hint])
    L -- Success --> N([Restore complete\ncleanup tmpdir])
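
The four steps in the flowchart roughly correspond to four client-tool invocations. The sketch below only builds the argument lists; build_restore_commands and its parameters are hypothetical names, and it assumes the password is supplied separately (for example via the PGPASSWORD environment variable) rather than on the command line. The flags themselves are the ones named in the summary plus the standard connection options.

```python
def build_restore_commands(
    host: str,
    port: int,
    user: str,
    dbname: str,
    backup_file: str,
    dump_dir: str,
) -> list[list[str]]:
    """Return argv lists for backup, drop, create, and restore (illustrative)."""
    conn = ["--host", host, "--port", str(port), "--username", user]
    return [
        # Step 1: custom-format safety backup of the current database.
        ["pg_dump", "--format=custom", f"--file={backup_file}", *conn, dbname],
        # Step 3: drop (terminating open connections) and recreate the target.
        ["dropdb", "--force", *conn, dbname],
        ["createdb", *conn, dbname],
        # Step 4: restore the extracted directory-format dump.
        [
            "pg_restore",
            "--format=directory",
            "--no-owner",
            "--no-privileges",
            "--exit-on-error",
            *conn,
            f"--dbname={dbname}",
            dump_dir,
        ],
    ]
```

Passing argument lists (not shell strings) to subprocess avoids quoting problems with unusual database or role names.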

Reviews (4): Last reviewed commit: "fix unresolved restore review findings"

Greptile also left 1 inline comment on this PR.

Copilot AI review requested due to automatic review settings May 22, 2026 06:53
Comment threads on scripts/restore_database.py: 8 marked Fixed
Copilot AI (Contributor) left a comment

Pull request overview

Adds a new operational script to restore a PostgreSQL database from Lagoon-provided backups (tar.gz containing pg_dump directory-format artifacts), including optional pre-restore “safety backup”, extraction, DB recreation, and a restore flow that streams table data via COPY ... FROM STDIN.

Changes:

  • Introduces scripts/restore_database.py to restore a DB from Lagoon .tar.gz backups by extracting and applying restore.sql + .dat files.
  • Adds optional pre-restore backup generation and interactive confirmations/disk space checks.
  • Implements a tar extraction safety check and a restore pipeline that rewrites COPY-from-file into COPY-from-STDIN streaming.

Comment threads on scripts/restore_database.py: 9 marked Outdated, 1 marked Fixed, 1 with no status shown
@dan2k3k4 dan2k3k4 force-pushed the add-restore-database-script branch from 13cba29 to e789e4f Compare May 22, 2026 08:50
Comment thread scripts/restore_database.py Outdated
@dan2k3k4 dan2k3k4 force-pushed the add-restore-database-script branch 3 times, most recently from ace32e5 to 5d055e8 Compare May 22, 2026 19:00
Comment thread scripts/restore_database.py Outdated
@dan2k3k4 dan2k3k4 requested a review from dspachos May 22, 2026 19:54
@dspachos (Contributor) left a comment

1. Safety backup is completely broken — binary/text mode mismatch

(throughout backup_current_database)

The file is opened in binary mode ("wb") but every f.write(...) call passes a plain Python string. This raises TypeError: a bytes-like object is required, not 'str' on the very first write, meaning no backup data is ever written. Since this is the only safety net before a destructive DROP DATABASE, a failed restore would leave the database unrecoverable.

  # Current (broken):
  with os.fdopen(fd, "wb") as f:
      f.write("-- Pre-restore safety backup\n")  # TypeError!

  # Fix: encode all string writes, or open in text mode and use a separate binary file for COPY

Either:

  • Encode all string writes: f.write("...".encode()), or
  • Open in text mode and handle copy_expert via a separate binary buffer
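
A minimal sketch of the first option, assuming the file must stay in binary mode because later writes are raw bytes. write_backup_header and its arguments are hypothetical names used only for this example; per the Greptile summary, later commits replaced this hand-written backup with pg_dump -Fc.

```python
import os


def write_backup_header(path: str, dbname: str) -> None:
    """Write text header lines to a file opened in binary mode (illustrative)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        # A binary-mode file accepts bytes only, so each str is encoded first.
        f.write("-- Pre-restore safety backup\n".encode("utf-8"))
        f.write(f"-- Database: {dbname}\n".encode("utf-8"))
```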

2. ALTER OWNER is a dead pattern — causes false-fatal aborts

Line 468

  non_critical_prefixes = ("COMMENT ", "GRANT ", "REVOKE ", "SET ", "SELECT ", "ALTER OWNER ")

No PostgreSQL statement starts with "ALTER OWNER". pg_dump generates ownership transfers as ALTER TABLE ... OWNER TO ... or ALTER SEQUENCE ... OWNER TO .... When these fail (common when the target role doesn't exist), the script treats them as fatal and aborts, leaving a dropped, empty database. The latest commit appears to have fixed this with an in-body check (" OWNER TO " in block_upper), but the old dead prefix is still in the prefix tuple. Verify the latest commit actually removed it.
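
For reference, a classifier without the dead prefix might look like the sketch below. The " OWNER TO " in-body check comes from the comment above; the function name is_non_critical, the whitespace normalization, and the extra requirement that the statement start with ALTER are assumptions for illustration.

```python
NON_CRITICAL_PREFIXES = ("COMMENT ", "GRANT ", "REVOKE ", "SET ", "SELECT ")


def is_non_critical(statement: str) -> bool:
    """Return True if a failed statement can be skipped without aborting."""
    # Collapse whitespace so multi-line statements match the same way.
    block_upper = " ".join(statement.upper().split())
    if block_upper.startswith(NON_CRITICAL_PREFIXES):
        return True
    # Ownership changes look like "ALTER <TYPE> <name> OWNER TO <role>",
    # so they are matched in the body, not by a prefix.
    return block_upper.startswith("ALTER ") and " OWNER TO " in block_upper
```

Other ALTER statements, such as adding a column, stay critical under this check.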

@dan2k3k4 dan2k3k4 force-pushed the add-restore-database-script branch from ee35b61 to 2532c99 Compare May 26, 2026 12:54
Comment threads on scripts/restore_database.py: 3 marked Fixed
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment threads on scripts/restore_database.py: 3 marked Outdated, 3 marked Fixed
@dan2k3k4 (Member, Author) commented

@copilot implement fixes for unresolved review comments in this PR

Comment threads on scripts/restore_database.py: 2 marked Fixed
Copilot AI (Contributor) commented May 27, 2026

@copilot implement fixes for unresolved review comments in this PR

Implemented fixes for the unresolved review comments in commit 83a0451:

  • Decoded percent-encoded DATABASE_URL components correctly (including path edge cases).
  • Hardened tar extraction validation to reject non-regular/non-directory entries.
  • Updated restore configuration logging to avoid clear-text sensitive-data findings from CodeQL.
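
The first fix can be illustrated with the standard library: urlsplit leaves percent-escapes in place, so each component needs an explicit unquote. This is a sketch under assumptions, not the code from commit 83a0451; parse_database_url, the returned keys, and the defaults are hypothetical.

```python
from urllib.parse import unquote, urlsplit


def parse_database_url(url: str) -> dict:
    """Split a DATABASE_URL and percent-decode each component (illustrative)."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname or "localhost",
        "port": parts.port or 5432,
        # username, password, and path are returned still percent-encoded.
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "dbname": unquote(parts.path.lstrip("/")),
    }
```

For example, a password written as p%40ss%2Fword in the URL decodes to p@ss/word, which is the value the PostgreSQL tools need.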

@dan2k3k4 dan2k3k4 requested a review from dspachos May 27, 2026 12:15
Comment thread scripts/restore_database.py
@dspachos (Contributor) left a comment

1. pg_restore error diagnostics silently lost

restore_database.py:156 apply_restore()

run_pg_tool(cmd, config, capture=False)

When capture=False, subprocess.run does not capture stdout/stderr — they stream to the terminal. But then in run_pg_tool (line 91), result.stderr is None, so the CalledProcessError is raised with an empty stderr string. The apply_restore SystemExit message becomes:

pg_restore failed: CalledProcessError(...) — with no actual error detail

This is a destructive operation (database already dropped). If restore fails, the operator has zero diagnostic information about why.

Fix: Either always capture stderr for error handling, or capture separately:

def run_pg_tool(cmd, config, *, capture=True, check=True):
    result = subprocess.run(
        cmd,
        env=pg_env(config),
        stdout=subprocess.PIPE if capture else None,
        stderr=subprocess.PIPE,        # ALWAYS capture stderr for sanitization
        text=True,
        check=False,
    )

2. Unsanitized pg_restore stderr leaks password to terminal

restore_database.py:156 apply_restore()

Related to #1: capture=False sends pg_restore stderr directly to the terminal bypassing the sanitize() function. If pg_restore includes connection info in error messages, the database password is logged in cleartext.

Fix: Same as above — always capture stderr, sanitize it, then print it yourself.
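
Both findings point at the same change, which could be sketched as follows. This is a simplified stand-in, not the script's run_pg_tool: run_tool, its env and secret parameters, and the body of sanitize are assumptions (the real sanitize() and pg_env() are not shown in this thread).

```python
import subprocess


def sanitize(text: str, secret: str) -> str:
    """Hypothetical helper: mask the password wherever it appears."""
    return text.replace(secret, "***") if secret else text


def run_tool(cmd: list[str], env: dict[str, str], secret: str) -> str:
    """Run a client tool, always capturing stderr so it can be sanitized."""
    result = subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # never streamed raw to the terminal
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # Include the sanitized stderr so the operator sees why it failed.
        detail = sanitize(result.stderr.strip(), secret)
        raise SystemExit(f"{cmd[0]} failed (exit {result.returncode}): {detail}")
    return result.stdout
```

The trade-off is that stderr is no longer shown live during a long restore; if live progress matters, the output would have to be read and sanitized line by line instead.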

dan2k3k4 and others added 2 commits May 29, 2026 12:20
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@dan2k3k4 dan2k3k4 force-pushed the add-restore-database-script branch from 51e4c51 to ae2bb6e Compare May 29, 2026 10:20
@dan2k3k4 dan2k3k4 force-pushed the add-restore-database-script branch from ae2bb6e to 674cd03 Compare May 29, 2026 10:28
5 participants