Add online shared_buffers resize without server restart#13
Open
Add online shared_buffers resize without server restart#13
Conversation
This eliminates MultiXactOffset wraparound and the 2^32 limit on the total number of multixid members. Multixids are still limited to 2^31, but this is a nice improvement because 'members' can grow much faster than the number of multixids. On such systems, you can now run longer before hitting hard limits or triggering anti-wraparound vacuums. Not having to deal with MultiXactOffset wraparound also simplifies the code and removes some gnarly corner cases. We no longer need to perform emergency anti-wraparound freezing because of running out of 'members' space, so the offset stop limit is gone. But you might still not want 'members' to consume huge amounts of disk space. For that reason, I kept the logic for lowering vacuum's multixid freezing cutoff if a large amount of 'members' space is used. The thresholds for that are roughly the same as the "safe" and "danger" thresholds used before, 2 billion transactions and 4 billion transactions. This keeps the behavior for the freeze cutoff roughly the same as before. It might make sense to make this smarter or configurable, now that the threshold is only needed to manage disk usage, but that's left for the future. Add code to pg_upgrade to convert multitransactions from the old to the new format, rewriting the pg_multixact SLRU files. Because pg_upgrade now rewrites the files, we can get rid of some hacks we had put in place to deal with old bugs and upgraded clusters. Bump catalog version for the pg_multixact/offsets format change. Author: Maxim Orlov <orlovmg@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Discussion: https://www.postgresql.org/message-id/CACG%3DezaWg7_nt-8ey4aKv2w9LcuLthHknwCawmBgEeTnJrJTcw@mail.gmail.com
Per OS X buildfarm members.
Author: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://www.postgresql.org/message-id/CA%2BFpmFf-hWXtrC0Q3Cr_Xo78zuP_M_VC5xgWPOYOkwqOD0T8eg@mail.gmail.com
POSIX has for a long time defined the "j" length modifier for printf conversions as meaning the size of intmax_t or uintmax_t. We got away without supporting that so far, because we were not using intmax_t anywhere. However, commit e6be843 re-introduced upstream's use of intmax_t and PRIdMAX into zic.c. It emerges that on some platforms (at least FreeBSD and macOS), <inttypes.h> defines PRIdMAX as "jd", so that snprintf.c falls over if that is used. (We hadn't noticed yet because it would only be apparent if bad data is fed to zic, resulting in an error report, and even then the only visible symptom is a missing line number in the error message.) We could revert that decision from our copy of zic.c, but on the whole it seems better to update snprintf.c to support this standard modifier. There might well be extensions, now or in future, that expect it to work. I did this in the lazy man's way of translating "j" to either "l" or "ll" depending on a compile-time sizeof() check, just as was done long ago to support "z" for size_t. One could imagine promoting intmax_t to have full support in snprintf.c, for example converting fmtint()'s value argument and internal arithmetic to use [u]intmax_t not [unsigned] long long. But that'd be more work and I'm hesitant to do it anyway: if there are any platforms out there where intmax_t is actually wider than "long long", this would doubtless result in a noticeable speed penalty to snprintf(). Let's not go there until we have positive evidence that there's a reason to, and some way to measure what size of penalty we're taking. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3210703.1765236740@sss.pgh.pa.us
As in commit 59d6c03, use <function> rather than <structname> in the <title> to be consistent with how other functions in this module are documented. Oversights in commits dcf7e16 and 9ccc049. Author: Noboru Saito <noborusai@gmail.com> Discussion: https://postgr.es/m/CAAM3qn%2B7KraFkCyoJCHq6m%3DurxcoHPEPryuyYeg%3DQ0EjJxjdTA%40mail.gmail.com Backpatch-through: 18
The new columns, mode and started_by, indicate the vacuum
mode ('normal', 'aggressive', or 'failsafe') and the initiator of the
vacuum ('manual', 'autovacuum', or 'autovacuum_wraparound'),
respectively. This allows users and monitoring tools to better
understand VACUUM behavior.
Bump catalog version.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Discussion: https://postgr.es/m/CAOzEurQcOY-OBL_ouEVfEaFqe_md3vB5pXjR_m6L71Dcp1JKCQ@mail.gmail.com
The new column, started_by, indicates the initiator of the
analyze ('manual' or 'autovacuum'), helping users and monitoring tools
to better understand ANALYZE behavior.
Bump catalog version.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Discussion: https://postgr.es/m/CAA5RZ0suoicwxFeK_eDkUrzF7s0BVTaE7M%2BehCpYcCk5wiECpw%40mail.gmail.com
Presently, the "echo" and "quiet" variables are carted around to various functions, which is a bit tedious. To simplify things, this commit moves them into the vacuumingOptions struct and removes the related function parameters. While at it, remove some redundant initialization code in vacuumdb's main() function. This is preparatory work for a follow-up commit that will add a --dry-run option to vacuumdb. Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CADkLM%3DckHkX7Of5SrK7g0LokPUwJ%3Dkk8JU1GXGF5pZ1eBVr0%3DQ%40mail.gmail.com
This commit refactors the code for marking a ParallelSlot as idle to a new static inline function. This can be used to mark a slot that was obtained via ParallelSlotGetIdle() but that we don't intend to actually use for a query as idle again. This is preparatory work for a follow-up commit that will add a --dry-run option to vacuumdb. Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com Discussion: https://postgr.es/m/CADkLM%3DckHkX7Of5SrK7g0LokPUwJ%3Dkk8JU1GXGF5pZ1eBVr0%3DQ%40mail.gmail.com
This new option instructs vacuumdb to print, but not execute, the VACUUM and ANALYZE commands that would've been sent to the server. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CADkLM%3DckHkX7Of5SrK7g0LokPUwJ%3Dkk8JU1GXGF5pZ1eBVr0%3DQ%40mail.gmail.com
PostgreSQL's src/port/open.c has always set bInheritHandle = TRUE when opening files on Windows, making all file descriptors inheritable by child processes. This meant the O_CLOEXEC flag, added to many call sites by commit 1da569c (v16), was silently ignored. The original commit included a comment suggesting that our open() replacement doesn't create inheritable handles, but it was a mis- understanding of the code path. In practice, the code was creating inheritable handles in all cases. This hasn't caused widespread problems because most child processes (archive_command, COPY PROGRAM, etc.) operate on file paths passed as arguments rather than inherited file descriptors. Even if a child wanted to use an inherited handle, it would need to learn the numeric handle value, which isn't passed through our IPC mechanisms. Nonetheless, the current behavior is wrong. It violates documented O_CLOEXEC semantics, contradicts our own code comments, and makes PostgreSQL behave differently on Windows than on Unix. It also creates potential issues with future code or security auditing tools. To fix, define O_CLOEXEC to _O_NOINHERIT in master, previously used by O_DSYNC. We use different values in the back branches to preserve existing values. In pgwin32_open_handle() we set bInheritHandle according to whether O_CLOEXEC is specified, for the same atomic semantics as POSIX in multi-threaded programs that create processes. Backpatch-through: 16 Author: Bryan Green <dbryan.green@gmail.com> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> (minor adjustments) Discussion: https://postgr.es/m/e2b16375-7430-4053-bda3-5d2194ff1880%40gmail.com
The idea is to encourage more the use of these new routines across the tree, as these offer stronger type safety guarantees than palloc(). This batch of changes includes most of the trivial changes suggested by the author for src/backend/. A total of 334 files are updated here. Among these files, 48 of them have their build change slightly; these are caused by line number changes as the new allocation formulas are simpler, shaving around 100 lines of code in total. Similar work has been done in 0c3c5c3 and 31d3847. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
A comment in tuplesort.c was claiming that the code was defining INITIAL_MEMTUPSIZE so that it *does not* exceed ALLOCSET_SEPARATE_THRESHOLD, but the code actually ensures that we purposefully *do* exceed ALLOCSET_SEPARATE_THRESHOLD for the initial allocation of the tuples array, as per reasons detailed in the commentary of grow_memtuples(). Also, there's not much need to repeat the mention about ALLOCSET_SEPARATE_THRESHOLD in each location where INITIAL_MEMTUPSIZE is used, so remove those comments. Author: ChangAo Chen <cca5507@qq.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: David G. Johnston <david.g.johnston@gmail.com> Discussion: https://postgr.es/m/tencent_6FA14F85D6B5B5291532D6789E07F4765C08%40qq.com
Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CABPTF7V8CbOXGePqrad6EH3Om7DRhNiO3C0rQ-62UuT7RdU-GQ@mail.gmail.com
The warning was showing up in the early stages of the meson build, when the contents of Makefile.global is generated based on the configuration of meson for PGXS. NM is added to pgxs_empty. This declaration is only used internally for the libpq sanity check, so there is no point in exposing it in PGXS. Oversight in 4a8e6f4. Reported-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/4423e01f-1e52-4f47-a6ca-05cc8081c888@eisentraut.org
REGRESS has forgotten about the test nbtree_half_dead_pages, and a .gitignore was missing from the module. Oversights in c085aab for REGRESS and 1e4e578 for the missing .gitignore. Discussion: https://postgr.es/m/aTipJA1Y1zVSmH3H@paquier.xyz
Buildfarm members skimmer and crake have reported that pg_upgrade running from v18 fails due to the changes of d52c24b, with the expectations that the objects removed in the test module injection_points should still be present post upgrades, but the test module does not have them anymore. The origin of the issue is that the following test modules depend on injection_points, but they do not drop the extension once the tests finish, leaving its traces in the dumps used for the upgrades: - gin, down to v17 - typcache, down to v18 - nbtree, HEAD-only Test modules have no upgrade requirements, as they are used only for.. Tests, so there is no point in keeping them around. An alternative solution would be to drop the databases created by these modules in AdjustUpgrade.pm, but the solution of this commit to drop the extension is simpler. Note that there would be a catch if using a solution based on AdjustUpgrade.pm as the database name used for the test runs differs between configure and meson: - configure relies on USE_MODULE_DB for the database name unicity, that would build a database name based on the *first* entry of REGRESS, that lists all the SQL tests. - meson relies on a "name" field. For example, for the test module "gin", the regression database is named "regression_gin" under meson, while it is more complex for configure, as of "contrib_regression_gin_incomplete_splits". So a AdjustUpgrade.pm would need a set of DROP DATABASE IF EXISTS to solve this issue, to cope with each build system. The failure has been caused by d52c24b, and the problem can happen with upgrade dumps from v17 and v18 to HEAD. This problem is not currently reachable in the back-branches, but it could be possible that a future change in injection_points in stable branches invalidates this theory, so this commit is applied down to v17 in the test modules that matter. Per discussion with Tom Lane and Heikki Linnakangas. Discussion: https://postgr.es/m/2899652.1765167313@sss.pgh.pa.us Backpatch-through: 17
pthread_exit() is added to the list of symbols allowed when building libpq. This has been reported as possible when libpq is statically linked to libcrypto, where pthread_exit() could be called. Reported-by: Torsten Rupp <torsten.rupp@gmx.net> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/19095-6d8256d0c37d4be2@postgresql.org
These functions took a ResourceOwner argument, but only checked if it was NULL, and then used CurrentResourceOwner for the actual work. Surely the intention was to use the passed-in resource owner. All current callers passed CurrentResourceOwner or NULL, so this has no consequences at the moment, but it's an accident waiting to happen for future caller and extensions. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEze2Whnfv8VuRZaohE-Af+GxBA1SNfD_rXfm84Jv-958UCcJA@mail.gmail.com Backpatch-through: 17
This function gets the list of relations associated with the publication but the comment said the opposite. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CANhcyEV3C_CGBeDtjvKjALDJDMH-Uuc9BWfSd=eck8SCXnE=fQ@mail.gmail.com
Phase I vacuum gives the page a once-over after pruning and freezing to check that the values of all_visible and all_frozen agree with the result of heap_page_is_all_visible(). This is meant to keep the logic in phase I for determining visibility in sync with the logic in phase III. Rewrite the assertion to avoid an Assert(false). Suggested by Andres Freund. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/mhf4vkmh3j57zx7vuxp4jagtdzwhu3573pgfpmnjwqa6i6yj5y%40sy4ymcdtdklo
The comment above heap_xlog_visible() about the critical integrity requirement for PD_ALL_VISIBLE and the visibility map should also be in heap_xlog_prune_freeze() where we set PD_ALL_VISIBLE. Oversight in add323d Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
1. The test initially focuses on the "parent" table, then switches to the "part" table, and goes back to the "parent" table. That seems a little weird, so move the tests around so that all the commands on the "parent" table are done first, followed by the "part" table. 2. ALTER TABLE ALTER COLUMN SET EXPRESSION was not tested, so add that. Author: jian he <jian.universality@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CACJufxFDi7fnwB-8xXd_ExML7-7pKbTaK4j46AJ=4-14DXvtVg@mail.gmail.com
The test seemed to incorrectly think that query_safe() takes an argument that describes what the query does, similar to e.g. command_ok(). Until commit bd8d9c9 the extra arguments were harmless and were just ignored, but when commit bd8d9c9 introduced a new optional argument to query_safe(), the extra arguments started clashing with that, causing the test to fail. Backpatch to v17, that's the oldest branch where the test exists. The extra arguments didn't cause any trouble on the older branches, but they were clearly bogus anyway.
It's only useful for an ILIKE optimization for the libc provider using a single-byte encoding and a non-C locale, but it creates significant internal complexity. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
True if character has multiple case forms. Will be a useful multibyte-aware replacement for char_is_cased(). Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Always return TIDs in descending order when returning groups of TIDs from an nbtree posting list tuple during nbtree backwards scans. This makes backwards scans tend to require fewer buffer hits, since the scan is less likely to repeatedly pin and unpin the same heap page/buffer (we'll get exactly as many buffer hits as we get with a similar forwards scan case). Commit 0d861bb, which added nbtree deduplication, originally did things this way to avoid interfering with _bt_killitems's approach to setting LP_DEAD bits on posting list tuples. _bt_killitems makes a soft assumption that it can always iterate through posting lists in ascending TID order, finding corresponding killItems[]/so->currPos.items[] entries in that same order. This worked out because of the prior _bt_readpage backwards scan behavior. If we just changed the backwards scan posting list logic in _bt_readpage, without altering _bt_killitems itself, it would break its soft assumption. Avoid that problem by sorting the so->killedItems[] array at the start of _bt_killitems. That way the order that dead items are saved in from btgettuple can't matter; so->killedItems[] will always be in the same order as so->currPos.items[] in the end. Since so->currPos.items[] is now always in leaf page order, regardless of the scan direction used within _bt_readpage, and since so->killedItems[] is always in that same order, the _bt_killitems loop can continue to make a uniform assumption about everything being in page order. In fact, sorting like this makes the previous soft assumption about item order into a hard invariant. Also deduplicate the so->killedItems[] array after it is sorted. That way there's no risk of the _bt_killitems loop becoming confused by a duplicate dead item/TID. This was possible in cases that involved a scrollable cursor that encountered the same dead TID more than once (within the same leaf page/so->currPos context). This doesn't come up very much in practice, but it seems best to be as consistent as possible about how and when _bt_killitems will LP_DEAD-mark index tuples. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Mircea Cadariu <cadariu.mircea@gmail.com> Reviewed-By: Victor Yegorov <vyegorov@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=Wut2pKvbW-u3hJ_LXwsYeiXHiW8oN1GfbKPavcGo8Ow@mail.gmail.com
Although clang claims to be compatible with gcc's printf format archetypes, this appears to be a falsehood: it likes __syslog__ (which gcc does not, on most platforms) and doesn't accept gnu_printf. This means that if you try to use gcc with clang++ or clang with g++, you get compiler warnings when compiling printf-like calls in our C++ code. This has been true for quite awhile, but it's gotten more annoying with the recent appearance of several buildfarm members that are configured like this. To fix, run separate probes for the format archetype to use with the C and C++ compilers, and conditionally define PG_PRINTF_ATTRIBUTE depending on __cplusplus. (We could alternatively insist that you not mix-and-match C and C++ compilers; but if the case works otherwise, this is a poor reason to insist on that.) No back-patch for now, but we may want to do that if this patch survives buildfarm testing. Discussion: https://postgr.es/m/986485.1764825548@sss.pgh.pa.us
Oversight in commit bd8d9c9. Discussion: https://postgr.es/m/CAH2-WzmvwVKZ+0Z=RL_+g_aOku8QxWddDCXmtyLj02y+nYaD0g@mail.gmail.com
An array of LLVMBasicBlockRef is allocated with the size used for an element being "LLVMBasicBlockRef *" rather than "LLVMBasicBlockRef". LLVMBasicBlockRef is a type that refers to a pointer, so this did not directly cause a problem because both should have the same size, still it is incorrect. This issue has been spotted while reviewing a different patch, and exists since 2a0faed, so backpatch all the way down. Discussion: https://postgr.es/m/CA+hUKGLngd9cKHtTUuUdEo2eWEgUcZ_EQRbP55MigV2t_zTReg@mail.gmail.com Backpatch-through: 14
GUC is a commonly used term but previously appeared only in the acronym documentation. This commit adds glossary and documentation index entries for GUC to make it easier to find and understand. Author: Robert Treat <rob@xzilla.net> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/CABV9wwPQnkeo_G6-orMGnHPK9SXGVWm7ajJPzsbE6944tDx=hQ@mail.gmail.com
Followup to 44f4951. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CAHGQGwH_UfN96vcvLGA%3DYro%2Bo6qCn0nEgEGoviwzEiLTHtt2Pw%40mail.gmail.com Backpatch-through: 18
Introduced in b139bd3. Reported-by: Man Zeng <zengman@halodbtech.com> Discussion: https://postgr.es/m/tencent_121C1BB152CAF3195C99D56C@qq.com
During catcache searches, the most-recently searched entries are kept at the head of the list to speed up subsequent searches, keeping the "freshest" entries at its beginning. A rehash of the catcache was doing the opposite: fresh entries were moved to the tail of the newly-created buckets, causing a rehash to slow down a bit. When a rehash is done, this commit switches the code to use dlist_push_tail() instead of dlist_push_head(), so as fresh entries are kept at the head of the lists, not their tail. Author: ChangAo Chen <cca5507@qq.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/tencent_9EA10D8512B5FE29E7323F780A0749768708@qq.com
Commit c7eab0e changed the default password_encryption setting to 'scram-sha-256', so update the example for creating a user with an assigned password. In addition, commit 08951a7 added new options that in turn pass default tokens NOBYPASSRLS and NOREPLICATION to the CREATE ROLE command, so fix this omission as well for v16 and later. Reported-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/cff1ea60-c67d-4320-9e33-094637c2c4fb%40iki.fi Backpatch-through: 14
Reported-by: Xueyu Gao <gaoxueyu_hope@163.com> Discussion: https://www.postgresql.org/message-id/42b5c99a.856d.19b73d858e2.Coremail.gaoxueyu_hope%40163.com
…up()" This reverts commit f30848c. Due to buildfarm failures related to recovery conflicts while executing the WAIT FOR command. It appears that WAIT FOR still holds xmin for a while. Reported-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BL0OQR8dQtsNBSaA3FNNyQeFabdeRVzurm0b7Xtp592g%40mail.gmail.com
The USER and IN GROUP clauses of CREATE ROLE are deprecated, and commit 8e78f0a removed them from the CREATE ROLE syntax syntax synopsis in the docs. However, previously CREATE USER and CREATE GROUP docs still listed these clauses. Since CREATE USER is equivalent to CREATE ROLE ... WITH LOGIN and CREATE GROUP is equivalent to CREATE ROLE, their documented syntax synopsis should match CREATE ROLE to avoid confusion. Therefore this commit removes the deprecated USER and IN GROUP clauses from the CREATE USER and CREATE GROUP syntax synopsis in the docs. Author: Japin Li <japinli@hotmail.com> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/MEAPR01MB3031C30E72EF16CFC08C8565B687A@MEAPR01MB3031.ausprd01.prod.outlook.com
Fix comments that incorrectly described transformations performed by the "Avoid extra index searches through preprocessing" mechanism introduced by commit b3f1a13. Author: Yugo Nagata <nagata@sraoss.co.jp> Reviewed-By: Chao Li <li.evan.chao@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/20251230190145.c3c88c5eb0f88b136adda92f@sraoss.co.jp Backpatch-through: 18
This commit does the following to get tests passing for MSVC/AArch64: * Implements spin_delay() with an ISB instruction (like we do for gcc/clang on AArch64). * Sets USE_ARMV8_CRC32C unconditionally. Vendor-supported versions of Windows for AArch64 require at least ARMv8.1, which is where CRC extension support became mandatory. * Implements S_UNLOCK() with _InterlockedExchange(). The existing implementation for MSVC uses _ReadWriteBarrier() (a compiler barrier), which is insufficient for this purpose on non-TSO architectures. There are likely other changes required to take full advantage of the hardware (e.g., atomics/arch-arm.h, simd.h, pg_popcount_aarch64.c), but those can be dealt with later. Author: Niyas Sait <niyas.sait@linaro.org> Co-authored-by: Greg Burd <greg@burd.me> Co-authored-by: Dave Cramer <davecramer@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Tested-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://postgr.es/m/A6152C7C-F5E3-4958-8F8E-7692D259FF2F%40greg.burd.me Discussion: https://postgr.es/m/CAFPTBD-74%2BAEuN9n7caJ0YUnW5A0r-KBX8rYoEJWqFPgLKpzdg%40mail.gmail.com
The isolation tests for INSERT ON CONFLICT behavior during CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY (added by bc32a12, 2bc7e88, and 90eae92) were disabled in 77038d6 due to persistent CI flakiness, after several attempts at stabilization. This commit removes them and introduces a TAP test in test_misc module (010_index_concurrently_upsert.pl) that covers the same scenarios. This new test should hopefully be more stable while providing assurance that the fixes in all those commits (plus 81f7211) continue to work. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/ccssrhafzbp3a3beju3ptyc56a7gbfimj4vwkbokoldofckrc7@bso37rxskjtf Discussion: https://postgr.es/m/CANtu0ogv+6wqRzPK241jik4U95s1pW3MCZ3rX5ZqbFdUysz7Qw@mail.gmail.com Discussion: https://postgr.es/m/202512112014.icpomgc37zx4@alvherre.pgsql
This commit adds tab completion support for the keywords "pstdin" and "pstdout" in the \copy command. "pstdin" is now suggested after FROM, and "pstdout" is suggested after TO, alongside filenames and other keywords. Author: Yugo Nagata <nagata@sraoss.co.jp> Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/20251231183953.95132e171e43abd5e9b78084@sraoss.co.jp
…tion When repurposing a standby to a logical replica, pg_createsubscriber uses for the new replica a set of configuration parameters saved into postgresql.auto.conf, to force recovery patterns when the physical replica is promoted. While not wrong in practice, this approach can cause issues when forcing again recovery on a logical replica or its base backup as the recovery parameters are not reset on the target server once pg_createsubscriber is done with the node. This commit aims at improving the situation, by changing the way recovery parameters are saved on the target node. Instead of writing all the configuration to postgresql.auto.conf, this file now uses an include_if_exists, that points to a pg_createsubscriber.conf. This new file contains all the recovery configuration, and is renamed to pg_createsubscriber.conf.disabled when pg_createsubscriber exits. This approach resets the recovery parameters, and offers the benefit to keep a trace of the setup used when the target node got promoted, for debugging purposes. If pg_createsubscriber.conf cannot be renamed (unlikely scenario), a warning is issued to inform users that a manual intervention may be required to reset this configuration. This commit includes a test case to demonstrate the problematic case: a standby node created from a base backup of what was the target node of pg_createsubscriber does not get confused when started. If removing this new logic, the test fails with the standby not able to start due to an incorrect recovery target setup, where the startup process fails quickly with a FATAL. I have provided the design idea for the patch, that Alyona has written (with some code adjustments from me). This could be considered as a bug, but after discussion this is put into the bucket for improvements. Redesigning pg_createsubscriber would not be acceptable in the stable branches anyway. Author: Alyona Vinter <dlaaren8@gmail.com> Reviewed-by: Ilyasov Ian <ianilyasov@outlook.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Andrey Rudometov <unlimitedhikari@gmail.com> Discussion: https://postgr.es/m/CAGWv16K6L6Pzm99i1KiXLjFWx2bUS3DVsR6yV87-YR9QO7xb3A@mail.gmail.com
Remove all configure checks and workarounds for strnlen() missing. It is required by POSIX 2008. Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/98ce805c-6103-421b-adc3-fcf8f3dddbe3%40eisentraut.org
rindex() has been removed from POSIX 2008. Replace the one remaining use with the equivalent and more standard strrchr(). Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/98ce805c-6103-421b-adc3-fcf8f3dddbe3%40eisentraut.org
Author: "Dewei Dai" <daidewei1970@163.com> Author: zengman <zengman@halodbtech.com> Author: Zhiyuan Su <suzhiyuan_pg@126.com> Discussion: https://postgr.es/m/2026010719201902382410@163.com Discussion: https://postgr.es/m/tencent_4DC563C83443A4B1082D2BFF@qq.com Discussion: https://postgr.es/m/44656d72.2a63.19b9b92b0a3.Coremail.suzhiyuan_pg@126.com
The only user-visible change is the fix in the "malformed pg_dependencies" error detail. That one is new in commit e1405aa, so no backpatching required.
Reported-by: Shinya Kato <shinya11.kato@gmail.com> Discussion: https://www.postgresql.org/message-id/CAOzEurS=PzRzGba3mpNXgEhbnQFA0dxXaU0ujCJ0aa9yMSH6Pw@mail.gmail.com
btree_gist's gist_inet_ops and gist_cidr_ops opclasses are fundamentally broken: they rely on an approximate representation of the inet values and hence sometimes miss rows they should return. We want to eventually get rid of them altogether, but as the first step on that journey, we should mark them not-opcdefault. To do that, roll up the preceding deltas since 1.2 into a new base script btree_gist--1.9.sql. This will allow installing 1.9 without going through a transient situation where gist_inet_ops and gist_cidr_ops are marked as opcdefault; trying to create them that way will fail if there's already a matching default opclass in the core system. Additionally provide btree_gist--1.8--1.9.sql, so that a database that's been pg_upgraded from an older version can be migrated to 1.9. I noted along the way that commit 57e3c51 had missed marking the gist_bool_ops support functions as PARALLEL SAFE. While that probably has little harmful effect (since AFAIK we don't check that when calling index support functions), this seems like a good time to make things consistent. Readers will also note that I removed the former habit of installing some opclass operators/functions with ALTER OPERATOR FAMILY, instead just rolling them all into the CREATE OPERATOR CLASS steps. The comment in btree_gist--1.2.sql that it's necessary to use ALTER for pg_upgrade reproducibility has been obsolete since we invented the amadjustmembers infrastructure. Nowadays, gistadjustmembers will force all operators and non-required support functions to have "soft" opfamily dependencies, regardless of whether they are installed by CREATE or ALTER. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us
This patch completes the transition to making inet_ops be default for inet/cidr columns, rather than btree_gist's opclasses. Once we do that, though, pg_upgrade has a big problem. A dump from an older version will see btree_gist's opclasses as being default, so it will not mention the opclass explicitly in CREATE INDEX commands, which would cause the restore to create the indexes using inet_ops. Since that's not compatible with what's actually in the files, havoc would ensue. This isn't readily fixable, because the CREATE INDEX command strings are built by the older server's pg_get_indexdef() function; pg_dump hasn't nearly enough knowledge to modify those strings successfully. Even if we cared to put in the work to make that happen in pg_dump, it would be counterproductive because the end goal here is to get people off of these opclasses. Allowing such indexes to persist through pg_upgrade wouldn't advance that goal. Therefore, this patch just adds code to pg_upgrade to detect indexes that would be problematic and refuse to upgrade. There's another issue too: even without any indexes to worry about, pg_dump in binary-upgrade mode will reproduce the "CREATE OPERATOR CLASS ... DEFAULT" commands for btree_gist's opclasses, and those will fail because now we have a built-in opclass that provides a conflicting default. We could ask users to drop the btree_gist extension altogether before upgrading, but that would carry very severe penalties. It would affect perfectly-valid indexes for other data types, and it would drop operators that might be relied on in views or other database objects. Instead, put a hack in DefineOpClass to ignore the DEFAULT clauses for these opclasses when in binary-upgrade mode. This will result in installing a version of btree_gist that isn't quite the version it claims to be, but that can be fixed by issuing ALTER EXTENSION UPDATE afterwards. Since we don't apply that hack when not in binary-upgrade mode, it is now impossible to install any version of btree_gist less than 1.9 via CREATE EXTENSION. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us
In the wake of the previous commit, this script will fail if executed via CREATE EXTENSION, so it's useless. Remove it, but keep the delta scripts, allowing old (even very old) versions of the btree_gist SQL objects to be upgraded to 1.9 via ALTER EXTENSION UPDATE after a pg_upgrade. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us
Comprehensive design proposal for making shared_buffers dynamically adjustable via SIGHUP without requiring a PostgreSQL restart. Covers: - Detailed analysis of all NBuffers-dependent data structures and code paths - Cross-system prior art (MySQL/InnoDB, Oracle SGA, SQL Server, MariaDB) - Phased implementation: virtual address reservation, online grow, online shrink, dynamic hash table resizing - ProcSignalBarrier-based coordination protocol - 15 edge/corner cases analyzed (concurrent resize, crash recovery, pinned condemned buffers, huge pages, AIO interactions, etc.) - Portability layer for Linux, FreeBSD, macOS, Windows - Performance impact analysis with zero steady-state overhead goal - Testing strategy covering unit, concurrency, crash recovery, and stress - References to Dmitry Dolgov's active pgsql-hackers RFC patch series https://claude.ai/code/session_01BvQguZ27Xd1dpgKCfsuFos
Add infrastructure for dynamically resizing the shared buffer pool
via SIGHUP, without requiring a PostgreSQL restart. This implements
Phases 1-3 from the design document.
Key changes:
New GUC: max_shared_buffers (PGC_POSTMASTER)
When set > shared_buffers, reserves virtual address space at startup
for buffer pool arrays, enabling online resize up to that limit.
Default 0 means same as shared_buffers (no online resize, preserving
current behavior).
Changed GUC: shared_buffers (PGC_POSTMASTER -> PGC_SIGHUP)
Now dynamically adjustable via ALTER SYSTEM + pg_reload_conf() when
max_shared_buffers is configured. Check/assign hooks validate limits
and initiate resize requests.
Buffer pool memory management (buf_resize.c):
- BufferPoolReserveMemory(): reserves VA space via MAP_SHARED|MAP_NORESERVE
- Grow: commits memory, initializes new descriptors, emits barrier
- Shrink: drains condemned buffers (flush dirty, wait for unpins),
emits barrier, decommits memory via MADV_DONTNEED
- Zero steady-state overhead: base pointers never change, NBuffers
remains a plain int updated via ProcSignalBarrier
Coordination protocol:
- New PROCSIGNAL_BARRIER_BUFFER_POOL_RESIZE barrier type
- ProcessBarrierBufferPoolResize() updates backend-local NBuffers
- Serialized resize operations with progress tracking
Hash table pre-sizing:
- When max_shared_buffers > shared_buffers, the buffer lookup hash
table is pre-sized for MaxNBuffers, avoiding rehashing on grow
Separate memory allocation path:
- When max_shared_buffers is configured, buffer arrays are allocated
from separately-mapped memory regions instead of main shmem segment
- BufferManagerShmemSize() excludes array sizes from main shmem
- Global pointers (BufferDescriptors, BufferBlocks, etc.) remain
stable across resize operations
https://claude.ai/code/session_01BvQguZ27Xd1dpgKCfsuFos
Major fixes to make the online buffer pool resize actually work: 1. GUC variable indirection: The GUC variable for shared_buffers was pointing directly at NBuffers, which caused the GUC mechanism to overwrite NBuffers on SIGHUP before any actual resize occurred. Introduce SharedBuffersGUC as the GUC target variable. NBuffers is now only updated by the resize code (or at startup). 2. Postmaster/child coordination: The assign hook now properly distinguishes between postmaster (requests resize) and child processes (read current_buffers from shared memory). Replaced ProcSignalBarrier approach with SIGHUP-based coordination since the postmaster lacks a PGPROC for ConditionVariable waits. 3. ExecuteBufferPoolResize in postmaster loop: Added call to process_pm_reload_request() after ProcessConfigFile() but before SignalChildren(), ensuring the resize completes before children update their NBuffers. 4. MADV_POPULATE_WRITE fallback: Handle older kernels (pre-5.14) that don't support MADV_POPULATE_WRITE by falling back to manual page touching for memory commit. Tested: grow (32MB->64MB->128MB), shrink (128MB->48MB), grow-back (48MB->96MB), exceed-max rejection, resize under concurrent pgbench load (zero failed transactions), clean shutdown/restart persistence. https://claude.ai/code/session_01BvQguZ27Xd1dpgKCfsuFos
…writer drain Address code review findings across all severity levels: CRITICAL: - Add pg_read_barrier() in child processes after reading current_buffers atomic to pair with pg_write_barrier() in GrowBufferPool/ShrinkBufferPool, ensuring descriptor initialization is visible on ARM/POWER architectures - Add rollback logic to BufferPoolCommitMemory() for partial MADV_POPULATE_WRITE failures (release already-committed pages via MADV_DONTNEED when a later array fails) HIGH: - Replace bare arithmetic with mul_size()/add_size() throughout BufferPoolReserveMemory, BufferPoolCommitMemory, BufferPoolDecommitMemory to detect integer overflow on 32-bit systems - Fix stale comment claiming MAP_PRIVATE when code uses MAP_SHARED - Fix header comment claiming BufResizeLock (LWLock) when struct uses mutex spinlock - Update file header comment to describe SIGHUP-based coordination instead of removed ProcSignalBarrier approach MEDIUM: - Remove unused includes (postmaster/bgwriter.h, storage/ipc.h, storage/pg_shmem.h, storage/lwlock.h) - Add comment on magic number 16 in Assert (matches GUC minimum) Move buffer eviction from postmaster to bgwriter: - ShrinkBufferPool now only updates NBuffers and records condemned range - New BufPoolDrainCondemnedBuffers() runs in bgwriter main loop with full backend infrastructure (ResourceOwner, private refcounts) - Fixes SIGSEGV crash when EvictUnpinnedBuffer was called from postmaster https://claude.ai/code/session_01BvQguZ27Xd1dpgKCfsuFos
…sues Round 2 code review fixes addressing findings from 5 focused review agents. CRITICAL fixes: - Bgwriter stale decommit race: BufPoolDrainCondemnedBuffers now validates that status is still BUF_RESIZE_DRAINING and drain_from/drain_to match cached values before decommitting. Prevents MADV_REMOVE from destroying pages that a concurrent grow has reclaimed for active buffers. - GrowBufferPool active descriptor race: Skip reinitialization of descriptors that still have BM_TAG_VALID set from a cancelled shrink, as backends may hold active pins. Such buffers are naturally reused by clock sweep. - EXEC_BACKEND guard: Emit FATAL on platforms using EXEC_BACKEND (Windows) when max_shared_buffers is configured, since fork()-based MAP_SHARED inheritance is required. HIGH fixes: - Switch decommit to MADV_REMOVE for buffer blocks (always page-aligned). MADV_DONTNEED on MAP_SHARED|MAP_ANONYMOUS only zaps PTEs without releasing shmem-backed pages. MADV_REMOVE punches holes in the shmem backing store. Falls back to MADV_DONTNEED if MADV_REMOVE unavailable. - Make BufferPoolCommitMemory incremental: only commit [old, new) range instead of [0, new). Avoids re-touching live pages and ensures rollback on partial failure only affects the new range. MEDIUM fixes: - Remove dead BUF_RESIZE_COMPLETING enum value (never set anywhere) - Fix remaining overcount in drain: only count buffers that fail eviction, not all valid buffers before attempting eviction https://claude.ai/code/session_01BvQguZ27Xd1dpgKCfsuFos
7fae1ee to
41687fa
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Allow
shared_buffersto be changed at runtime viaALTER SYSTEM+pg_reload_conf()(SIGHUP) without requiring a PostgreSQL restart, when a new GUCmax_shared_buffersis set at startup to reserve virtual address space.max_shared_buffersGUC (PGC_POSTMASTER) reserves VA space at startup viaMAP_SHARED|MAP_ANONYMOUS|MAP_NORESERVE, enabling online resize up to that limit with zero steady-state overheadshared_buffersfrom PGC_POSTMASTER to PGC_SIGHUP with check/assign hooks for runtime validation and resize requestsNBuffersatomically, signals children via SIGHUPNBuffersimmediately, bgwriter asynchronously drains condemned buffers (flush dirty, evict unpinned), then decommits memory viaMADV_REMOVEMaxNBuffersto avoid rehashing on growChanges
buf_resize.c(new, 864 lines): Core resize implementation — VA reservation, incremental commit/decommit, grow/shrink logic, bgwriter drain, GUC hooksbuf_resize.h(new): Shared memory control structure (BufPoolResizeCtl), function declarationsbuf_init.c: Dual-path initialization — uses pre-reserved memory whenmax_shared_buffersis configured, traditionalShmemInitStructotherwisepostmaster.c: CallsExecuteBufferPoolResize()afterProcessConfigFile()before signaling childrenbgwriter.c: CallsBufPoolDrainCondemnedBuffers()afterBgBufferSync()each cyclefreelist.c: Hash table pre-sized forMaxNBufferswhen configuredipci.c: CallsBufferPoolReserveMemory()andBufPoolResizeShmemInit()during shared memory setupguc_parameters.dat:shared_bufferschanged to PGC_SIGHUP with hooks; newmax_shared_buffersentryglobals.c/miscadmin.h: New globalsSharedBuffersGUC,MaxNBuffersDesign highlights
BufferDescriptors,BufferBlocks, etc.) never move;NBuffersremains a plain intcurrent_bufferswrites with descriptor initialization visibility; bgwriter validates drain range under spinlock before decommitting to prevent races with concurrent growshared_bufferson restartfork()forMAP_SHAREDinheritance; EXEC_BACKEND (Windows) emits FATAL ifmax_shared_buffersis set;MADV_REMOVEwithMADV_DONTNEEDfallback;MADV_POPULATE_WRITEwith manual page-touch fallback for kernels < 5.14Test plan
https://claude.ai/code/session_01BvQguZ27Xd1dpgKCfsuFos