chore: add restore database script#527
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new operational script to restore a PostgreSQL database from Lagoon-provided backups (tar.gz containing pg_dump directory-format artifacts), including optional pre-restore “safety backup”, extraction, DB recreation, and a restore flow that streams table data via COPY ... FROM STDIN.
Changes:
- Introduces
scripts/restore_database.pyto restore a DB from Lagoon.tar.gzbackups by extracting and applyingrestore.sql+.datfiles. - Adds optional pre-restore backup generation and interactive confirmations/disk space checks.
- Implements a tar extraction safety check and a restore pipeline that rewrites COPY-from-file into COPY-from-STDIN streaming.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
13cba29 to
e789e4f
Compare
ace32e5 to
5d055e8
Compare
dspachos
left a comment
There was a problem hiding this comment.
1. Safety backup is completely broken — binary/text mode mismatch
(throughout backup_current_database)
The file is opened in binary mode ("wb") but every f.write(...) call passes a plain Python string. This raises TypeError: a bytes-like object is required, not
'str' on the very first write, meaning no backup data is ever written. Since this is the only safety net before a destructive DROP DATABASE, a failed restore
would leave the database unrecoverable.
# Current (broken):
with os.fdopen(fd, "wb") as f:
f.write("-- Pre-restore safety backup\n") # TypeError!
# Fix: encode all string writes, or open in text mode and use a separate binary file for COPYEither:
- Encode all string writes: f.write("...".encode()), or
- Open in text mode and handle copy_expert via a separate binary buffer
2. ALTER OWNER is a dead pattern — causes false-fatal aborts
Line 468
non_critical_prefixes = ("COMMENT ", "GRANT ", "REVOKE ", "SET ", "SELECT ", "ALTER OWNER ")No PostgreSQL statement starts with "ALTER OWNER". pg_dump generates ownership transfers as ALTER TABLE ... OWNER TO ... or ALTER SEQUENCE ... OWNER TO ....
When these fail (common when the target role doesn't exist), the script treats them as fatal and aborts — leaving a dropped, empty database. The latest commit
appears to have fixed this with an in-body check (" OWNER TO " in block_upper), but the old dead prefix is still in the prefix tuple. Verify the latest commit
actually removed it.
ee35b61 to
2532c99
Compare
|
@copilot implement fixes for unresolved review comments in this PR |
Implemented fixes for the unresolved review comments in commit
|
dspachos
left a comment
There was a problem hiding this comment.
1. pg_restore error diagnostics silently lost
restore_database.py:156 — apply_restore()
run_pg_tool(cmd, config, capture=False)When capture=False, subprocess.run does not capture stdout/stderr — they stream to the terminal. But then in run_pg_tool (line 91), result.stderr is None, so the CalledProcessError is raised with an empty stderr string. The apply_restore SystemExit message becomes:
pg_restore failed: CalledProcessError(...)— with no actual error detail
This is a destructive operation (database already dropped). If restore fails, the operator has zero diagnostic information about why.
Fix: Either always capture stderr for error handling, or capture separately:
def run_pg_tool(cmd, config, *, capture=True, check=True):
result = subprocess.run(
cmd,
env=pg_env(config),
stdout=subprocess.PIPE if capture else None,
stderr=subprocess.PIPE, # ALWAYS capture stderr for sanitization
text=True,
check=False,
)2. Unsanitized pg_restore stderr leaks password to terminal
restore_database.py:156 — apply_restore()
Related to #1: capture=False sends pg_restore stderr directly to the terminal bypassing the sanitize() function. If pg_restore includes connection info in error messages, the database password is logged in cleartext.
Fix: Same as above — always capture stderr, sanitize it, then print it yourself.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
51e4c51 to
ae2bb6e
Compare
ae2bb6e to
674cd03
Compare
Greptile Summary
Adds a new
scripts/restore_database.pyutility that restores a PostgreSQL database from a Lagoon.tar.gzbackup: it optionally takes a pre-restore safety backup viapg_dump -Fc, drops and recreates the target database, extracts the nested archive with path-traversal protection, and applies the dump withpg_restore -Fd.pg_dump --format=customcapturing all schema objects (sequences, indexes, views, functions, triggers), a significant improvement over row-by-row SQL reconstruction.extractall.--yesis passed and the backup fails, the script now correctly aborts rather than falling through to drop the live database — however, the pg_dump error reason is silently discarded at theexcept SystemExit:call site inmain(), leaving the user with no diagnostic information about why the backup failed.Confidence Score: 3/5
The script is safe to merge for non-automated use, but the missing error message on backup failure makes production/pipeline use opaque and harder to recover from.
The backup failure path catches
SystemExitwithout binding the exception, so the actual pg_dump error (authentication failure, disk full, connection refused) is never shown to the user. In--yesmode the script exits cleanly, but operators get no clue what went wrong and must re-run pg_dump manually to diagnose. Interactive mode similarly silently drops the reason before prompting "Continue without backup?". This is a real defect in the error-reporting path of a destructive operation.scripts/restore_database.py — specifically the
except SystemExit:block inmain()around the backup step.Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A([Start]) --> B{--no-backup?} B -- No --> C[pg_dump -Fc → safety_backup.dump] C -- Success --> D[Step 2: Extract] C -- Failure --> E{--yes?} E -- Yes --> F[sys.exit 1\n⚠ error reason discarded] E -- No --> G{User: continue?} G -- No --> H([Abort]) G -- Yes --> D B -- Yes --> D D[Extract .tar.gz\nsafe_extract_tar validates\nreg/dir only + path bounds] --> I[Locate toc.dat\ndump directory] I --> J[Step 3: dropdb --force\nthen createdb] J -- dropdb/createdb fails --> K([SystemExit with error]) J -- Success --> L[Step 4: pg_restore -Fd\n--no-owner --no-privileges\n--exit-on-error] L -- Failure --> M([SystemExit + recovery hint]) L -- Success --> N([Restore complete\ncleanup tmpdir])Reviews (4): Last reviewed commit: "fix unresolved restore review findings" | Re-trigger Greptile