1/*-------------------------------------------------------------------------
2 *
3 * vacuumlazy.c
4 * Concurrent ("lazy") vacuuming.
5 *
6 * Heap relations are vacuumed in three main phases. In phase I, vacuum scans
7 * relation pages, pruning and freezing tuples and saving dead tuples' TIDs in
8 * a TID store. If that TID store fills up or vacuum finishes scanning the
9 * relation, it progresses to phase II: index vacuuming. Index vacuuming
10 * deletes the dead index entries referenced in the TID store. In phase III,
11 * vacuum scans the blocks of the relation referred to by the TIDs in the TID
12 * store and reaps the corresponding dead items, freeing that space for future
13 * tuples.
14 *
15 * If there are no indexes or index scanning is disabled, phase II may be
16 * skipped. If phase I identified very few dead index entries or if vacuum's
17 * failsafe mechanism has triggered (to avoid transaction ID wraparound),
18 * vacuum may skip phases II and III.
19 *
20 * If the TID store fills up in phase I, vacuum suspends phase I and proceeds
21 * to phases II and III, cleaning up the dead tuples referenced in the current
22 * TID store. This empties the TID store, allowing vacuum to resume phase I.
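 *
 * For example, if the TID store fills up exactly once mid-scan, the overall
 * order of operations is: phase I (until the TID store fills), phase II,
 * phase III, then phase I resumes where it left off, followed by a final
 * round of phases II and III once the scan completes.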
23 *
24 * In a way, the phases are more like states in a state machine, but they have
25 * been referred to colloquially as phases for so long that we continue to
26 * do so here.
27 *
28 * Manually invoked VACUUMs may scan indexes during phase II in parallel. For
29 * more information on this, see the comment at the top of vacuumparallel.c.
30 *
31 * In between phases, vacuum updates the freespace map (every
32 * VACUUM_FSM_EVERY_PAGES).
33 *
34 * After completing all three phases, vacuum may truncate the relation if it
35 * has emptied pages at the end. Finally, vacuum updates relation statistics
36 * in pg_class and the cumulative statistics subsystem.
37 *
38 * Relation Scanning:
39 *
40 * Vacuum scans the heap relation, starting at the beginning and progressing
41 * to the end, skipping pages as permitted by their visibility status, vacuum
42 * options, and various other requirements.
43 *
44 * Vacuums are either aggressive or normal. Aggressive vacuums must scan every
45 * unfrozen tuple in order to advance relfrozenxid and avoid transaction ID
46 * wraparound. Normal vacuums may scan otherwise skippable pages for one of
47 * two reasons:
48 *
49 * When page skipping is not disabled, a normal vacuum may scan pages that are
50 * marked all-visible (and even all-frozen) in the visibility map if the range
51 * of skippable pages is below SKIP_PAGES_THRESHOLD. This is primarily for the
52 * benefit of kernel readahead (see comment in heap_vac_scan_next_block()).
53 *
54 * A normal vacuum may also scan skippable pages in an effort to freeze them
55 * and decrease the backlog of all-visible but not all-frozen pages that have
56 * to be processed by the next aggressive vacuum. These are referred to as
57 * eagerly scanned pages. Pages scanned due to SKIP_PAGES_THRESHOLD do not
58 * count as eagerly scanned pages.
59 *
60 * Eagerly scanned pages that are set all-frozen in the VM are successful
61 * eager freezes and those not set all-frozen in the VM are failed eager
62 * freezes.
63 *
64 * Because we want to amortize the overhead of freezing pages over multiple
65 * vacuums, normal vacuums cap the number of successful eager freezes to
66 * MAX_EAGER_FREEZE_SUCCESS_RATE of the number of all-visible but not
67 * all-frozen pages at the beginning of the vacuum. Since eagerly frozen pages
68 * may be unfrozen before the next aggressive vacuum, capping the number of
69 * successful eager freezes also caps the downside of eager freezing:
70 * potentially wasted work.
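 *
 * As a worked example: a vacuum that begins with 10,000 all-visible but not
 * all-frozen pages stops eager scanning after 0.2 * 10,000 = 2,000
 * successful eager freezes, MAX_EAGER_FREEZE_SUCCESS_RATE being 0.2.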
71 *
72 * Once the success cap has been hit, eager scanning is disabled for the
73 * remainder of the vacuum of the relation.
74 *
75 * Success is capped globally because we don't want to limit our successes if
76 * old data happens to be concentrated in a particular part of the table. This
77 * is especially likely to happen for append-mostly workloads where the oldest
78 * data is at the beginning of the unfrozen portion of the relation.
79 *
80 * On the assumption that different regions of the table are likely to contain
81 * similarly aged data, normal vacuums use a localized eager freeze failure
82 * cap. The failure count is reset for each region of the table -- comprised
83 * of EAGER_SCAN_REGION_SIZE blocks. In each region, we tolerate
84 * vacuum_max_eager_freeze_failure_rate of EAGER_SCAN_REGION_SIZE failures
85 * before suspending eager scanning until the end of the region.
86 * vacuum_max_eager_freeze_failure_rate is configurable both globally and per
87 * table.
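 *
 * As a worked example: with a vacuum_max_eager_freeze_failure_rate of 0.03,
 * up to 0.03 * 4096 = ~122 eagerly scanned pages may fail to freeze in each
 * 4096-block region before eager scanning is suspended until the next
 * region.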
88 *
89 * Aggressive vacuums must examine every unfrozen tuple and thus are not
90 * subject to any of the limits imposed by the eager scanning algorithm.
91 *
92 * Once vacuum has decided to scan a given block, it must read the block and
93 * obtain a cleanup lock to prune tuples on the page. A non-aggressive vacuum
94 * may choose to skip pruning and freezing if it cannot acquire a cleanup lock
95 * on the buffer right away. In this case, it may miss cleaning up dead tuples
96 * and their associated index entries (though it is free to reap any existing
97 * dead items on the page).
98 *
99 * After pruning and freezing, pages that are newly all-visible and all-frozen
100 * are marked as such in the visibility map.
101 *
102 * Dead TID Storage:
103 *
104 * The major space usage for vacuuming is storage for the dead tuple IDs that
105 * are to be removed from indexes. We want to ensure we can vacuum even the
106 * very largest relations with finite memory space usage. To do that, we set
107 * upper bounds on the memory that can be used for keeping track of dead TIDs
108 * at once.
109 *
110 * We are willing to use at most maintenance_work_mem (or perhaps
111 * autovacuum_work_mem) memory space to keep track of dead TIDs. If the
112 * TID store is full, we must call lazy_vacuum to vacuum indexes (and to vacuum
113 * the pages that we've pruned). This frees up the memory space dedicated to
114 * storing dead TIDs.
115 *
116 * In practice VACUUM will often complete its initial pass over the target
117 * heap relation without ever running out of space to store TIDs. This means
118 * that there only needs to be one call to lazy_vacuum, after the initial pass
119 * completes.
120 *
121 * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
122 * Portions Copyright (c) 1994, Regents of the University of California
123 *
124 *
125 * IDENTIFICATION
126 * src/backend/access/heap/vacuumlazy.c
127 *
128 *-------------------------------------------------------------------------
129 */
130#include "postgres.h"
131
132#include "access/genam.h"
133#include "access/heapam.h"
134#include "access/htup_details.h"
135#include "access/multixact.h"
136#include "access/tidstore.h"
137#include "access/transam.h"
138#include "access/visibilitymap.h"
139#include "access/xloginsert.h"
140#include "catalog/storage.h"
141#include "commands/progress.h"
142#include "commands/vacuum.h"
143#include "common/int.h"
144#include "common/pg_prng.h"
145#include "executor/instrument.h"
146#include "miscadmin.h"
147#include "pgstat.h"
150#include "storage/bufmgr.h"
151#include "storage/freespace.h"
152#include "storage/latch.h"
153#include "storage/lmgr.h"
154#include "storage/read_stream.h"
155#include "utils/lsyscache.h"
156#include "utils/pg_rusage.h"
157#include "utils/timestamp.h"
158#include "utils/wait_event.h"
159
160
161/*
162 * Space/time tradeoff parameters: do these need to be user-tunable?
163 *
164 * To consider truncating the relation, we want there to be at least
165 * REL_TRUNCATE_MINIMUM or (relsize / REL_TRUNCATE_FRACTION) (whichever
166 * is less) potentially-freeable pages.
167 */
168#define REL_TRUNCATE_MINIMUM 1000
169#define REL_TRUNCATE_FRACTION 16
170
171/*
172 * Timing parameters for truncate locking heuristics.
173 *
174 * These were not exposed as user tunable GUC values because it didn't seem
175 * that the potential for improvement was great enough to merit the cost of
176 * supporting them.
177 */
178#define VACUUM_TRUNCATE_LOCK_CHECK_INTERVAL 20 /* ms */
179#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL 50 /* ms */
180#define VACUUM_TRUNCATE_LOCK_TIMEOUT 5000 /* ms */
181
182/*
183 * Threshold that controls whether we bypass index vacuuming and heap
184 * vacuuming as an optimization
185 */
186#define BYPASS_THRESHOLD_PAGES 0.02 /* i.e. 2% of rel_pages */
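
/*
 * As a worked example (other conditions also apply; see lazy_vacuum()): in
 * a 1,000,000-page table, index and heap vacuuming may be bypassed only
 * when fewer than 0.02 * 1,000,000 = 20,000 pages have LP_DEAD items.
 */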
187
188/*
189 * Perform a failsafe check each time we scan another 4GB of pages.
190 * (Note that this is deliberately kept to a power-of-two, usually 2^19.)
191 */
192#define FAILSAFE_EVERY_PAGES \
193 ((BlockNumber) (((uint64) 4 * 1024 * 1024 * 1024) / BLCKSZ))
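
/*
 * With the default 8 kB BLCKSZ, FAILSAFE_EVERY_PAGES works out to
 * 4 GB / 8 kB = 524288 blocks, i.e. 2^19.
 */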
194
195/*
196 * When a table has no indexes, vacuum the FSM after every 8GB, approximately
197 * (it won't be exact because we only vacuum FSM after processing a heap page
198 * that has some removable tuples). When there are indexes, this is ignored,
199 * and we vacuum FSM after each index/heap cleaning pass.
200 */
201#define VACUUM_FSM_EVERY_PAGES \
202 ((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
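
/*
 * With the default 8 kB BLCKSZ, VACUUM_FSM_EVERY_PAGES works out to
 * 8 GB / 8 kB = 1048576 blocks, i.e. 2^20.
 */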
203
204/*
205 * Before we consider skipping a page that's marked as clean in the
206 * visibility map, we must've seen at least this many clean pages.
207 */
208#define SKIP_PAGES_THRESHOLD ((BlockNumber) 32)
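
/*
 * That is, vacuum only jumps over a run of at least 32 consecutive
 * skippable blocks (256 kB with the default 8 kB BLCKSZ); see
 * heap_vac_scan_next_block().
 */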
209
210/*
211 * Size of the prefetch window for lazy vacuum backwards truncation scan.
212 * Needs to be a power of 2.
213 */
214#define PREFETCH_SIZE ((BlockNumber) 32)
215
216/*
217 * Macro to check if we are in a parallel vacuum. If true, we are in
218 * parallel mode and the DSM segment is initialized.
219 */
220#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
221
222/* Phases of vacuum during which we report error context. */
232
233/*
234 * An eager scan of a page that is set all-frozen in the VM is considered
235 * "successful". To spread out freezing overhead across multiple normal
236 * vacuums, we limit the number of successful eager page freezes. The maximum
237 * number of eager page freezes is calculated as a ratio of the all-visible
238 * but not all-frozen pages at the beginning of the vacuum.
239 */
240#define MAX_EAGER_FREEZE_SUCCESS_RATE 0.2
241
242/*
243 * On the assumption that different regions of the table tend to have
244 * similarly aged data, once vacuum fails to freeze
245 * vacuum_max_eager_freeze_failure_rate of the blocks in a region of size
246 * EAGER_SCAN_REGION_SIZE, it suspends eager scanning until it has progressed
247 * to another region of the table with potentially older data.
248 */
249#define EAGER_SCAN_REGION_SIZE 4096
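
/*
 * Illustrative sketch only (a hypothetical helper, not part of the original
 * file): the per-region eager freeze failure cap described above is the
 * failure rate taken as a fraction of EAGER_SCAN_REGION_SIZE, e.g.
 * 0.03 * 4096 = 122 tolerated failures per region.
 */
static inline BlockNumber
example_eager_scan_fail_cap(double max_eager_freeze_failure_rate)
{
	return (BlockNumber) (max_eager_freeze_failure_rate *
						  EAGER_SCAN_REGION_SIZE);
}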
250
251typedef struct LVRelState
252{
253 /* Target heap relation and its indexes */
257
258 /* Buffer access strategy and parallel vacuum state */
261
262 /* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
264 /* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
266 /* Consider index vacuuming bypass optimization? */
268
269 /* Doing index vacuuming, index cleanup, rel truncation? */
273
274 /* VACUUM operation's cutoffs for freezing and pruning */
277 /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
281
282 /* Error reporting state */
283 char *dbname;
285 char *relname;
286 char *indname; /* Current index name */
287 BlockNumber blkno; /* used only for heap operations */
288 OffsetNumber offnum; /* used only for heap operations */
290 bool verbose; /* VACUUM VERBOSE? */
291
292 /*
293 * dead_items stores TIDs whose index tuples are deleted by index
294 * vacuuming. Each TID points to an LP_DEAD line pointer from a heap page
295 * that has been processed by lazy_scan_prune. Also needed by
296 * lazy_vacuum_heap_rel, which marks the same LP_DEAD line pointers as
297 * LP_UNUSED during the second heap pass.
298 *
299 * Both dead_items and dead_items_info are allocated in shared memory in
300 * parallel vacuum cases.
301 */
302 TidStore *dead_items; /* TIDs whose index tuples we'll delete */
304
305 BlockNumber rel_pages; /* total number of pages */
306 BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
307
308 /*
309 * Count of all-visible blocks eagerly scanned (for logging only). This
310 * does not include skippable blocks scanned due to SKIP_PAGES_THRESHOLD.
311 */
313
314 BlockNumber removed_pages; /* # pages removed by relation truncation */
315 BlockNumber new_frozen_tuple_pages; /* # pages with newly frozen tuples */
316
317 /* # pages newly set all-visible in the VM */
319
320 /*
321 * # pages newly set all-visible and all-frozen in the VM. This is a
322 * subset of new_all_visible_pages. That is, new_all_visible_pages
323 * includes all pages set all-visible, but
324 * new_all_visible_all_frozen_pages includes only those which were also
325 * set all-frozen.
326 */
328
329 /* # all-visible pages newly set all-frozen in the VM */
331
332 BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
333 BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
334 BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
335
336 /* Statistics output by us, for table */
337 double new_rel_tuples; /* new estimated total # of tuples */
338 double new_live_tuples; /* new estimated total # of live tuples */
339 /* Statistics output by index AMs */
341
342 /* Instrumentation counters */
346 /* Counters that follow are only for scanned_pages */
347 int64 tuples_deleted; /* # deleted from table */
348 int64 tuples_frozen; /* # newly frozen */
349 int64 lpdead_items; /* # deleted from indexes */
350 int64 live_tuples; /* # live tuples remaining */
351 int64 recently_dead_tuples; /* # dead, but not yet removable */
352 int64 missed_dead_tuples; /* # removable, but not removed */
353
354 /* State maintained by heap_vac_scan_next_block() */
355 BlockNumber current_block; /* last block returned */
356 BlockNumber next_unskippable_block; /* next unskippable block */
357 bool next_unskippable_eager_scanned; /* if it was eagerly scanned */
358 Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
359
360 /* State related to managing eager scanning of all-visible pages */
361
362 /*
363 * A normal vacuum that has failed to freeze too many eagerly scanned
364 * blocks in a region suspends eager scanning.
365 * next_eager_scan_region_start is the block number of the first block
366 * eligible for resumed eager scanning.
367 *
368 * When eager scanning is permanently disabled, either initially
369 * (including for aggressive vacuum) or due to hitting the success cap,
370 * this is set to InvalidBlockNumber.
371 */
373
374 /*
375 * The remaining number of blocks a normal vacuum will consider eager
376 * scanning when it is successful. When eager scanning is enabled, this is
377 * initialized to MAX_EAGER_FREEZE_SUCCESS_RATE of the total number of
378 * all-visible but not all-frozen pages. For each eager freeze success,
379 * this is decremented. Once it hits 0, eager scanning is permanently
380 * disabled. It is initialized to 0 if eager scanning starts out disabled
381 * (including for aggressive vacuum).
382 */
384
385 /*
386 * The maximum number of blocks which may be eagerly scanned and not
387 * frozen before eager scanning is temporarily suspended. This is
388 * configurable both globally, via the
389 * vacuum_max_eager_freeze_failure_rate GUC, and per table, with a table
390 * storage parameter of the same name. It is calculated as
391 * vacuum_max_eager_freeze_failure_rate of EAGER_SCAN_REGION_SIZE blocks.
392 * It is 0 when eager scanning is disabled.
393 */
395
396 /*
397 * The number of eagerly scanned blocks vacuum failed to freeze (due to
398 * age) in the current eager scan region. Vacuum resets it to
399 * eager_scan_max_fails_per_region each time it enters a new region of the
400 * relation. If eager_scan_remaining_fails hits 0, eager scanning is
401 * suspended until the next region. It is also 0 if eager scanning has
402 * been permanently disabled.
403 */
406
407
408/* Struct for saving and restoring vacuum error information. */
415
416
417/* non-export function prototypes */
418static void lazy_scan_heap(LVRelState *vacrel);
420 const VacuumParams params);
422 void *callback_private_data,
423 void *per_buffer_data);
426 BlockNumber blkno, Page page,
427 bool sharelock, Buffer vmbuffer);
430 int nlpdead_items,
431 Buffer vmbuffer,
432 uint8 *vmbits);
434 BlockNumber blkno, Page page,
435 Buffer vmbuffer,
436 bool *has_lpdead_items, bool *vm_page_frozen);
438 BlockNumber blkno, Page page,
439 bool *has_lpdead_items);
440static void lazy_vacuum(LVRelState *vacrel);
444 Buffer buffer, OffsetNumber *deadoffsets,
445 int num_offsets, Buffer vmbuffer);
450 double reltuples,
454 double reltuples,
455 bool estimated_count,
461static void dead_items_alloc(LVRelState *vacrel, int nworkers);
462static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *offsets,
463 int num_offsets);
466
467#ifdef USE_ASSERT_CHECKING
469 TransactionId OldestXmin,
470 bool *all_frozen,
471 TransactionId *visibility_cutoff_xid,
473#endif
475 TransactionId OldestXmin,
476 OffsetNumber *deadoffsets,
477 int ndeadoffsets,
478 bool *all_frozen,
479 TransactionId *visibility_cutoff_xid,
482static void vacuum_error_callback(void *arg);
485 int phase, BlockNumber blkno,
486 OffsetNumber offnum);
489
490
491
492/*
493 * Helper to set up the eager scanning state for vacuuming a single relation.
494 * Initializes the eager scan management related members of the LVRelState.
495 *
496 * Caller provides whether or not an aggressive vacuum is required due to
497 * vacuum options or for relfrozenxid/relminmxid advancement.
498 */
499static void
501{
505 float first_region_ratio;
507
508 /*
509 * Initialize eager scan management fields to their disabled values.
510 * Aggressive vacuums, normal vacuums of small tables, and normal vacuums
511 * of tables without sufficiently old tuples disable eager scanning.
512 */
513 vacrel->next_eager_scan_region_start = InvalidBlockNumber;
514 vacrel->eager_scan_max_fails_per_region = 0;
515 vacrel->eager_scan_remaining_fails = 0;
516 vacrel->eager_scan_remaining_successes = 0;
517
518 /* If eager scanning is explicitly disabled, just return. */
519 if (params.max_eager_freeze_failure_rate == 0)
520 return;
521
522 /*
523 * The caller will have determined whether or not an aggressive vacuum is
524 * required by either the vacuum parameters or the relative age of the
525 * oldest unfrozen transaction IDs. An aggressive vacuum must scan every
526 * all-visible page to safely advance the relfrozenxid and/or relminmxid,
527 * so scans of all-visible pages are not considered eager.
528 */
529 if (vacrel->aggressive)
530 return;
531
532 /*
533 * Aggressively vacuuming a small relation shouldn't take long, so it
534 * isn't worth amortizing. We use two times the region size as the size
535 * cutoff because the eager scan start block is a random spot somewhere in
536 * the first region, making the second region the first to be eagerly
537 * scanned normally.
538 */
539 if (vacrel->rel_pages < 2 * EAGER_SCAN_REGION_SIZE)
540 return;
541
542 /*
543 * We only want to enable eager scanning if we are likely to be able to
544 * freeze some of the pages in the relation.
545 *
546 * Tuples with XIDs older than OldestXmin or MXIDs older than OldestMxact
547 * are technically freezable, but we won't freeze them unless the criteria
548 * for opportunistic freezing are met. Only tuples with XIDs/MXIDs older
549 * than the FreezeLimit/MultiXactCutoff are frozen in the common case.
550 *
551 * So, as a heuristic, we wait until the FreezeLimit has advanced past the
552 * relfrozenxid or the MultiXactCutoff has advanced past the relminmxid to
553 * enable eager scanning.
554 */
555 if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
556 TransactionIdPrecedes(vacrel->cutoffs.relfrozenxid,
557 vacrel->cutoffs.FreezeLimit))
559
561 MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
562 MultiXactIdPrecedes(vacrel->cutoffs.relminmxid,
563 vacrel->cutoffs.MultiXactCutoff))
565
567 return;
568
569 /* We have met the criteria to eagerly scan some pages. */
570
571 /*
572 * Our success cap is MAX_EAGER_FREEZE_SUCCESS_RATE of the number of
573 * all-visible but not all-frozen blocks in the relation.
574 */
576
577 vacrel->eager_scan_remaining_successes =
580
581 /* If every all-visible page is frozen, eager scanning is disabled. */
582 if (vacrel->eager_scan_remaining_successes == 0)
583 return;
584
585 /*
586 * Now calculate the bounds of the first eager scan region. Its end block
587 * will be a random spot somewhere in the first EAGER_SCAN_REGION_SIZE
588 * blocks. This affects the bounds of all subsequent regions and avoids
589 * eager scanning and failing to freeze the same blocks each vacuum of the
590 * relation.
591 */
593
594 vacrel->next_eager_scan_region_start = randseed % EAGER_SCAN_REGION_SIZE;
595
598
599 vacrel->eager_scan_max_fails_per_region =
602
603 /*
604 * The first region will be smaller than subsequent regions. As such,
605 * adjust the eager freeze failures tolerated for this region.
606 */
607 first_region_ratio = 1 - (float) vacrel->next_eager_scan_region_start /
609
610 vacrel->eager_scan_remaining_fails =
611 vacrel->eager_scan_max_fails_per_region *
613}
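
/*
 * Illustrative sketch only (a hypothetical helper, not part of the original
 * file): mirrors the proration arithmetic above. Because the first eager
 * scan region is shorter than a full EAGER_SCAN_REGION_SIZE region, its
 * failure allowance is scaled by this ratio before being assigned to
 * eager_scan_remaining_fails.
 */
static inline float
example_first_region_ratio(BlockNumber next_eager_scan_region_start)
{
	return 1 - (float) next_eager_scan_region_start / EAGER_SCAN_REGION_SIZE;
}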
614
615/*
616 * heap_vacuum_rel() -- perform VACUUM for one heap relation
617 *
618 * This routine sets things up for and then calls lazy_scan_heap, where
619 * almost all work actually takes place. Finalizes everything after the call
620 * returns by managing relation truncation and updating rel's pg_class
621 * entry. (Also updates pg_class entries for any indexes that need it.)
622 *
623 * At entry, we have already established a transaction and opened
624 * and locked the relation.
625 */
626void
628 BufferAccessStrategy bstrategy)
629{
631 bool verbose,
632 instrument,
633 skipwithvm,
641 TimestampTz starttime = 0;
643 startwritetime = 0;
646 ErrorContextCallback errcallback;
647 char **indnames = NULL;
649
650 verbose = (params.options & VACOPT_VERBOSE) != 0;
651 instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
652 params.log_vacuum_min_duration >= 0));
653 if (instrument)
654 {
656 if (track_io_timing)
657 {
660 }
661 }
662
663 /* Used for instrumentation and stats report */
664 starttime = GetCurrentTimestamp();
665
667 RelationGetRelid(rel));
670 params.is_wraparound
673 else
676
677 /*
678 * Setup error traceback support for ereport() first. The idea is to set
679 * up an error context callback to display additional information on any
680 * error during a vacuum. During different phases of vacuum, we update
681 * the state so that the error context callback always displays current
682 * information.
683 *
684 * Copy the names of heap rel into local memory for error reporting
685 * purposes, too. It isn't always safe to assume that we can get the name
686 * of each rel. It's convenient for code in lazy_scan_heap to always use
687 * these temp copies.
688 */
691 vacrel->relnamespace = get_namespace_name(RelationGetNamespace(rel));
692 vacrel->relname = pstrdup(RelationGetRelationName(rel));
693 vacrel->indname = NULL;
695 vacrel->verbose = verbose;
696 errcallback.callback = vacuum_error_callback;
697 errcallback.arg = vacrel;
698 errcallback.previous = error_context_stack;
699 error_context_stack = &errcallback;
700
701 /* Set up high level stuff about rel and its indexes */
702 vacrel->rel = rel;
704 &vacrel->indrels);
705 vacrel->bstrategy = bstrategy;
706 if (instrument && vacrel->nindexes > 0)
707 {
708 /* Copy index names used by instrumentation (not error reporting) */
709 indnames = palloc_array(char *, vacrel->nindexes);
710 for (int i = 0; i < vacrel->nindexes; i++)
712 }
713
714 /*
715 * The index_cleanup param either disables index vacuuming and cleanup or
716 * forces it to go ahead when we would otherwise apply the index bypass
717 * optimization. The default is 'auto', which leaves the final decision
718 * up to lazy_vacuum().
719 *
720 * The truncate param allows the user to avoid attempting relation truncation,
721 * though it can't force truncation to happen.
722 */
725 params.truncate != VACOPTVALUE_AUTO);
726
727 /*
728 * While VacuumFailsafeActive is reset to false before calling this, we
729 * still need to reset it here due to recursive calls.
730 */
731 VacuumFailsafeActive = false;
732 vacrel->consider_bypass_optimization = true;
733 vacrel->do_index_vacuuming = true;
734 vacrel->do_index_cleanup = true;
735 vacrel->do_rel_truncate = (params.truncate != VACOPTVALUE_DISABLED);
737 {
738 /* Force disable index vacuuming up-front */
739 vacrel->do_index_vacuuming = false;
740 vacrel->do_index_cleanup = false;
741 }
742 else if (params.index_cleanup == VACOPTVALUE_ENABLED)
743 {
744 /* Force index vacuuming. Note that failsafe can still bypass. */
745 vacrel->consider_bypass_optimization = false;
746 }
747 else
748 {
749 /* Default/auto, make all decisions dynamically */
751 }
752
753 /* Initialize page counters explicitly (be tidy) */
754 vacrel->scanned_pages = 0;
755 vacrel->eager_scanned_pages = 0;
756 vacrel->removed_pages = 0;
757 vacrel->new_frozen_tuple_pages = 0;
758 vacrel->lpdead_item_pages = 0;
759 vacrel->missed_dead_pages = 0;
760 vacrel->nonempty_pages = 0;
761 /* dead_items_alloc allocates vacrel->dead_items later on */
762
763 /* Allocate/initialize output statistics state */
764 vacrel->new_rel_tuples = 0;
765 vacrel->new_live_tuples = 0;
766 vacrel->indstats = (IndexBulkDeleteResult **)
767 palloc0(vacrel->nindexes * sizeof(IndexBulkDeleteResult *));
768
769 /* Initialize remaining counters (be tidy) */
770 vacrel->num_index_scans = 0;
771 vacrel->num_dead_items_resets = 0;
772 vacrel->total_dead_items_bytes = 0;
773 vacrel->tuples_deleted = 0;
774 vacrel->tuples_frozen = 0;
775 vacrel->lpdead_items = 0;
776 vacrel->live_tuples = 0;
777 vacrel->recently_dead_tuples = 0;
778 vacrel->missed_dead_tuples = 0;
779
780 vacrel->new_all_visible_pages = 0;
781 vacrel->new_all_visible_all_frozen_pages = 0;
782 vacrel->new_all_frozen_pages = 0;
783
784 /*
785 * Get cutoffs that determine which deleted tuples are considered DEAD,
786 * not just RECENTLY_DEAD, and which XIDs/MXIDs to freeze. Then determine
787 * the extent of the blocks that we'll scan in lazy_scan_heap. It has to
788 * happen in this order to ensure that the OldestXmin cutoff field works
789 * as an upper bound on the XIDs stored in the pages we'll actually scan
790 * (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
791 *
792 * Next acquire vistest, a related cutoff that's used in pruning. We use
793 * vistest in combination with OldestXmin to ensure that
794 * heap_page_prune_and_freeze() always removes any deleted tuple whose
795 * xmax is < OldestXmin. lazy_scan_prune must never become confused about
796 * whether a tuple should be frozen or removed. (In the future we might
797 * want to teach lazy_scan_prune to recompute vistest from time to time,
798 * to increase the number of dead tuples it can prune away.)
799 */
800 vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
802 vacrel->vistest = GlobalVisTestFor(rel);
803
804 /* Initialize state used to track oldest extant XID/MXID */
805 vacrel->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
806 vacrel->NewRelminMxid = vacrel->cutoffs.OldestMxact;
807
808 /*
809 * Initialize state related to tracking all-visible page skipping. This is
810 * very important to determine whether or not it is safe to advance the
811 * relfrozenxid/relminmxid.
812 */
813 vacrel->skippedallvis = false;
814 skipwithvm = true;
816 {
817 /*
818 * Force aggressive mode, and disable skipping blocks using the
819 * visibility map (even those set all-frozen)
820 */
821 vacrel->aggressive = true;
822 skipwithvm = false;
823 }
824
825 vacrel->skipwithvm = skipwithvm;
826
827 /*
828 * Set up eager scan tracking state. This must happen after determining
829 * whether or not the vacuum must be aggressive, because only normal
830 * vacuums use the eager scan algorithm.
831 */
833
834 /* Report the vacuum mode: 'normal' or 'aggressive' */
836 vacrel->aggressive
839
840 if (verbose)
841 {
842 if (vacrel->aggressive)
844 (errmsg("aggressively vacuuming \"%s.%s.%s\"",
845 vacrel->dbname, vacrel->relnamespace,
846 vacrel->relname)));
847 else
849 (errmsg("vacuuming \"%s.%s.%s\"",
850 vacrel->dbname, vacrel->relnamespace,
851 vacrel->relname)));
852 }
853
854 /*
855 * Allocate dead_items memory using dead_items_alloc. This handles
856 * parallel VACUUM initialization as part of allocating shared memory
857 * space used for dead_items. (But do a failsafe precheck first, to
858 * ensure that parallel VACUUM won't be attempted at all when relfrozenxid
859 * is already dangerously old.)
860 */
863
864 /*
865 * Call lazy_scan_heap to perform all required heap pruning, index
866 * vacuuming, and heap vacuuming (plus related processing)
867 */
869
870 /*
871 * Save dead items max_bytes and update the memory usage statistics before
872 * cleanup, since they are freed in parallel vacuum cases during
873 * dead_items_cleanup().
874 */
875 dead_items_max_bytes = vacrel->dead_items_info->max_bytes;
876 vacrel->total_dead_items_bytes += TidStoreMemoryUsage(vacrel->dead_items);
877
878 /*
879 * Free resources managed by dead_items_alloc. This ends parallel mode in
880 * passing when necessary.
881 */
884
885 /*
886 * Update pg_class entries for each of rel's indexes where appropriate.
887 *
888 * Unlike the later update to rel's pg_class entry, this is not critical.
889 * Maintains relpages/reltuples statistics used by the planner only.
890 */
891 if (vacrel->do_index_cleanup)
893
894 /* Done with rel's indexes */
895 vac_close_indexes(vacrel->nindexes, vacrel->indrels, NoLock);
896
897 /* Optionally truncate rel */
900
901 /* Pop the error context stack */
902 error_context_stack = errcallback.previous;
903
904 /* Report that we are now doing final cleanup */
907
908 /*
909 * Prepare to update rel's pg_class entry.
910 *
911 * Aggressive VACUUMs must always be able to advance relfrozenxid to a
912 * value >= FreezeLimit, and relminmxid to a value >= MultiXactCutoff.
913 * Non-aggressive VACUUMs may advance them by any amount, or not at all.
914 */
915 Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
916 TransactionIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.FreezeLimit :
917 vacrel->cutoffs.relfrozenxid,
918 vacrel->NewRelfrozenXid));
919 Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
920 MultiXactIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.MultiXactCutoff :
921 vacrel->cutoffs.relminmxid,
922 vacrel->NewRelminMxid));
923 if (vacrel->skippedallvis)
924 {
925 /*
926 * Must keep original relfrozenxid in a non-aggressive VACUUM that
927 * chose to skip an all-visible page range. The state that tracks new
928 * values will have missed unfrozen XIDs from the pages we skipped.
929 */
930 Assert(!vacrel->aggressive);
931 vacrel->NewRelfrozenXid = InvalidTransactionId;
932 vacrel->NewRelminMxid = InvalidMultiXactId;
933 }
934
935 /*
936 * For safety, clamp relallvisible to be not more than what we're setting
937 * pg_class.relpages to
938 */
939 new_rel_pages = vacrel->rel_pages; /* After possible rel truncation */
943
944 /*
945 * An all-frozen block _must_ be all-visible. As such, clamp the count of
946 * all-frozen blocks to the count of all-visible blocks. This matches the
947 * clamping of relallvisible above.
948 */
951
952 /*
953 * Now actually update rel's pg_class entry.
954 *
955 * In principle new_live_tuples could be -1 indicating that we (still)
956 * don't know the tuple count. In practice that can't happen, since we
957 * scan every page that isn't skipped using the visibility map.
958 */
959 vac_update_relstats(rel, new_rel_pages, vacrel->new_live_tuples,
961 vacrel->nindexes > 0,
962 vacrel->NewRelfrozenXid, vacrel->NewRelminMxid,
964
965 /*
966 * Report results to the cumulative stats system, too.
967 *
968 * Deliberately avoid telling the stats system about LP_DEAD items that
969 * remain in the table due to VACUUM bypassing index and heap vacuuming.
970 * ANALYZE will consider the remaining LP_DEAD items to be dead "tuples".
971 * It seems like a good idea to err on the side of not vacuuming again too
972 * soon in cases where the failsafe prevented significant amounts of heap
973 * vacuuming.
974 */
976 Max(vacrel->new_live_tuples, 0),
977 vacrel->recently_dead_tuples +
978 vacrel->missed_dead_tuples,
979 starttime);
981
982 if (instrument)
983 {
985
986 if (verbose || params.log_vacuum_min_duration == 0 ||
989 {
990 long secs_dur;
991 int usecs_dur;
992 WalUsage walusage;
993 BufferUsage bufferusage;
995 char *msgfmt;
996 int32 diff;
997 double read_rate = 0,
998 write_rate = 0;
1002
1004 memset(&walusage, 0, sizeof(WalUsage));
1006 memset(&bufferusage, 0, sizeof(BufferUsage));
1008
1009 total_blks_hit = bufferusage.shared_blks_hit +
1010 bufferusage.local_blks_hit;
1011 total_blks_read = bufferusage.shared_blks_read +
1012 bufferusage.local_blks_read;
1014 bufferusage.local_blks_dirtied;
1015
1017 if (verbose)
1018 {
1019 /*
1020 * Aggressiveness already reported earlier, in dedicated
1021 * VACUUM VERBOSE ereport
1022 */
1023 Assert(!params.is_wraparound);
1024 msgfmt = _("finished vacuuming \"%s.%s.%s\": index scans: %d\n");
1025 }
1026 else if (params.is_wraparound)
1027 {
1028 /*
1029 * While it's possible for a VACUUM to be both is_wraparound
1030 * and !aggressive, that's just a corner-case -- is_wraparound
1031 * implies aggressive. Produce distinct output for the corner
1032 * case all the same, just in case.
1033 */
1034 if (vacrel->aggressive)
1035 msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
1036 else
1037 msgfmt = _("automatic vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
1038 }
1039 else
1040 {
1041 if (vacrel->aggressive)
1042 msgfmt = _("automatic aggressive vacuum of table \"%s.%s.%s\": index scans: %d\n");
1043 else
1044 msgfmt = _("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n");
1045 }
1047 vacrel->dbname,
1048 vacrel->relnamespace,
1049 vacrel->relname,
1050 vacrel->num_index_scans);
1051 appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total), %u eagerly scanned\n"),
1052 vacrel->removed_pages,
1054 vacrel->scanned_pages,
1055 orig_rel_pages == 0 ? 100.0 :
1056 100.0 * vacrel->scanned_pages /
1058 vacrel->eager_scanned_pages);
1060 _("tuples: %" PRId64 " removed, %" PRId64 " remain, %" PRId64 " are dead but not yet removable\n"),
1061 vacrel->tuples_deleted,
1062 (int64) vacrel->new_rel_tuples,
1063 vacrel->recently_dead_tuples);
1064 if (vacrel->missed_dead_tuples > 0)
1066 _("tuples missed: %" PRId64 " dead from %u pages not removed due to cleanup lock contention\n"),
1067 vacrel->missed_dead_tuples,
1068 vacrel->missed_dead_pages);
1070 vacrel->cutoffs.OldestXmin);
1072 _("removable cutoff: %u, which was %d XIDs old when operation ended\n"),
1073 vacrel->cutoffs.OldestXmin, diff);
1075 {
1076 diff = (int32) (vacrel->NewRelfrozenXid -
1077 vacrel->cutoffs.relfrozenxid);
1079 _("new relfrozenxid: %u, which is %d XIDs ahead of previous value\n"),
1080 vacrel->NewRelfrozenXid, diff);
1081 }
1082 if (minmulti_updated)
1083 {
1084 diff = (int32) (vacrel->NewRelminMxid -
1085 vacrel->cutoffs.relminmxid);
1087 _("new relminmxid: %u, which is %d MXIDs ahead of previous value\n"),
1088 vacrel->NewRelminMxid, diff);
1089 }
1090 appendStringInfo(&buf, _("frozen: %u pages from table (%.2f%% of total) had %" PRId64 " tuples frozen\n"),
1091 vacrel->new_frozen_tuple_pages,
1092 orig_rel_pages == 0 ? 100.0 :
1093 100.0 * vacrel->new_frozen_tuple_pages /
1095 vacrel->tuples_frozen);
1096
1098 _("visibility map: %u pages set all-visible, %u pages set all-frozen (%u were all-visible)\n"),
1099 vacrel->new_all_visible_pages,
1100 vacrel->new_all_visible_all_frozen_pages +
1101 vacrel->new_all_frozen_pages,
1102 vacrel->new_all_frozen_pages);
1103 if (vacrel->do_index_vacuuming)
1104 {
1105 if (vacrel->nindexes == 0 || vacrel->num_index_scans == 0)
1106 appendStringInfoString(&buf, _("index scan not needed: "));
1107 else
1108 appendStringInfoString(&buf, _("index scan needed: "));
1109
1110 msgfmt = _("%u pages from table (%.2f%% of total) had %" PRId64 " dead item identifiers removed\n");
1111 }
1112 else
1113 {
1115 appendStringInfoString(&buf, _("index scan bypassed: "));
1116 else
1117 appendStringInfoString(&buf, _("index scan bypassed by failsafe: "));
1118
1119 msgfmt = _("%u pages from table (%.2f%% of total) have %" PRId64 " dead item identifiers\n");
1120 }
1122 vacrel->lpdead_item_pages,
1123 orig_rel_pages == 0 ? 100.0 :
1124 100.0 * vacrel->lpdead_item_pages / orig_rel_pages,
1125 vacrel->lpdead_items);
1126 for (int i = 0; i < vacrel->nindexes; i++)
1127 {
1128 IndexBulkDeleteResult *istat = vacrel->indstats[i];
1129
1130 if (!istat)
1131 continue;
1132
1134 _("index \"%s\": pages: %u in total, %u newly deleted, %u currently deleted, %u reusable\n"),
1135 indnames[i],
1136 istat->num_pages,
1137 istat->pages_newly_deleted,
1138 istat->pages_deleted,
1139 istat->pages_free);
1140 }
1142 {
1143 /*
1144 * We bypass the changecount mechanism because this value is
1145 * only updated by the calling process. We also rely on the
1146 * above call to pgstat_progress_end_command() to not clear
1147 * the st_progress_param array.
1148 */
1149 appendStringInfo(&buf, _("delay time: %.3f ms\n"),
1151 }
1152 if (track_io_timing)
1153 {
1154 double read_ms = (double) (pgStatBlockReadTime - startreadtime) / 1000;
1155 double write_ms = (double) (pgStatBlockWriteTime - startwritetime) / 1000;
1156
1157 appendStringInfo(&buf, _("I/O timings: read: %.3f ms, write: %.3f ms\n"),
1158 read_ms, write_ms);
1159 }
1160 if (secs_dur > 0 || usecs_dur > 0)
1161 {
1163 (1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
1165 (1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
1166 }
1167 appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
1170 _("buffer usage: %" PRId64 " hits, %" PRId64 " reads, %" PRId64 " dirtied\n"),
1175 _("WAL usage: %" PRId64 " records, %" PRId64 " full page images, %" PRIu64 " bytes, %" PRIu64 " full page image bytes, %" PRId64 " buffers full\n"),
1176 walusage.wal_records,
1177 walusage.wal_fpi,
1178 walusage.wal_bytes,
1179 walusage.wal_fpi_bytes,
1180 walusage.wal_buffers_full);
1181
1182 /*
1183 * Report the dead items memory usage.
1184 *
1185 * The num_dead_items_resets counter increases when we reset the
1186 * collected dead items, so the counter is non-zero if at least
1187 * one dead item was collected, even if index vacuuming is
1188 * disabled.
1189 */
1191 ngettext("memory usage: dead item storage %.2f MB accumulated across %d reset (limit %.2f MB each)\n",
1192 "memory usage: dead item storage %.2f MB accumulated across %d resets (limit %.2f MB each)\n",
1193 vacrel->num_dead_items_resets),
1194 (double) vacrel->total_dead_items_bytes / (1024 * 1024),
1195 vacrel->num_dead_items_resets,
1196 (double) dead_items_max_bytes / (1024 * 1024));
1197 appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
1198
1199 ereport(verbose ? INFO : LOG,
1200 (errmsg_internal("%s", buf.data)));
1201 pfree(buf.data);
1202 }
1203 }
1204
1205 /* Cleanup index statistics and index names */
1206 for (int i = 0; i < vacrel->nindexes; i++)
1207 {
1208 if (vacrel->indstats[i])
1209 pfree(vacrel->indstats[i]);
1210
1211 if (instrument)
1212 pfree(indnames[i]);
1213 }
1214}
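
/*
 * Illustrative sketch only (a hypothetical helper, not part of the original
 * file): the clamping described in heap_vacuum_rel() above. relallvisible
 * may not exceed relpages, and relallfrozen may not exceed relallvisible,
 * since an all-frozen block must also be all-visible.
 */
static inline void
example_clamp_vm_counts(BlockNumber relpages,
						BlockNumber *relallvisible,
						BlockNumber *relallfrozen)
{
	if (*relallvisible > relpages)
		*relallvisible = relpages;
	if (*relallfrozen > *relallvisible)
		*relallfrozen = *relallvisible;
}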
1215
1216/*
1217 * lazy_scan_heap() -- workhorse function for VACUUM
1218 *
1219 * This routine prunes each page in the heap, and considers the need to
1220 * freeze remaining tuples with storage (not including pages that can be
1221 * skipped using the visibility map). Also performs related maintenance
1222 * of the FSM and visibility map. These steps all take place during an
1223 * initial pass over the target heap relation.
1224 *
1225 * Also invokes lazy_vacuum_all_indexes to vacuum indexes, which largely
1226 * consists of deleting index tuples that point to LP_DEAD items left in
1227 * heap pages following pruning. The earlier initial pass over the heap will
1228 * have collected the TIDs whose index tuples need to be removed.
1229 *
1230 * Finally, invokes lazy_vacuum_heap_rel to vacuum heap pages, which
1231 * largely consists of marking LP_DEAD items (from vacrel->dead_items)
1232 * as LP_UNUSED. This has to happen in a second, final pass over the
1233 * heap, to preserve a basic invariant that all index AMs rely on: no
1234 * extant index tuple can ever be allowed to contain a TID that points to
1235 * an LP_UNUSED line pointer in the heap. We must disallow premature
1236 * recycling of line pointers to avoid index scans that get confused
1237 * about which TID points to which tuple immediately after recycling.
1238 * (Actually, this isn't a concern when the target heap relation happens to
1239 * have no indexes, which allows us to safely apply the one-pass strategy
1240 * as an optimization).
1241 *
1242 * In practice we often have enough space to fit all TIDs, and so won't
1243 * need to call lazy_vacuum more than once, after our initial pass over
1244 * the heap has totally finished. Otherwise things are slightly more
1245 * complicated: our "initial pass" over the heap applies only to those
1246 * pages that were pruned before we needed to call lazy_vacuum, and our
1247 * "final pass" over the heap only vacuums these same heap pages.
1248 * However, we process indexes in full every time lazy_vacuum is called,
1249 * which makes index processing very inefficient when memory is in short
1250 * supply.
1251 */
1252static void
1254{
1255 ReadStream *stream;
1256 BlockNumber rel_pages = vacrel->rel_pages,
1257 blkno = 0,
1260 vacrel->eager_scan_remaining_successes; /* for logging */
1261 Buffer vmbuffer = InvalidBuffer;
1262 const int initprog_index[] = {
1266 };
1268
1269 /* Report that we're scanning the heap, advertising total # of blocks */
1271 initprog_val[1] = rel_pages;
1272 initprog_val[2] = vacrel->dead_items_info->max_bytes;
1274
1275 /* Initialize for the first heap_vac_scan_next_block() call */
1276 vacrel->current_block = InvalidBlockNumber;
1277 vacrel->next_unskippable_block = InvalidBlockNumber;
1278 vacrel->next_unskippable_eager_scanned = false;
1279 vacrel->next_unskippable_vmbuffer = InvalidBuffer;
1280
1281 /*
1282 * Set up the read stream for vacuum's first pass through the heap.
1283 *
1284 * This could be made safe for READ_STREAM_USE_BATCHING, but only with
1285 * explicit work in heap_vac_scan_next_block.
1286 */
1288 vacrel->bstrategy,
1289 vacrel->rel,
1292 vacrel,
1293 sizeof(bool));
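
	/*
	 * Note: the stream's per-buffer data is a single bool, filled in by
	 * heap_vac_scan_next_block(), recording whether the block it returned
	 * was chosen by the eager scanning logic.
	 */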
1294
1295 while (true)
1296 {
1297 Buffer buf;
1298 Page page;
1299 bool was_eager_scanned = false;
1300 int ndeleted = 0;
1301 bool has_lpdead_items;
1302 void *per_buffer_data = NULL;
1303 bool vm_page_frozen = false;
1304 bool got_cleanup_lock = false;
1305
1306 vacuum_delay_point(false);
1307
1308 /*
1309 * Regularly check if wraparound failsafe should trigger.
1310 *
1311 * There is a similar check inside lazy_vacuum_all_indexes(), but
1312 * relfrozenxid might start to look dangerously old before we reach
1313 * that point. This check also provides failsafe coverage for the
1314 * one-pass strategy, and the two-pass strategy with the index_cleanup
1315 * param set to 'off'.
1316 */
1317 if (vacrel->scanned_pages > 0 &&
1318 vacrel->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
1320
1321 /*
1322 * Consider if we definitely have enough space to process TIDs on this
1323 * page already. If we are close to overrunning the available space for
1324 * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
1325 * this page. However, let's force at least one page-worth of tuples
1326 * to be stored so as to ensure we do at least some work when the memory
1327 * configured is so low that we run out before storing anything.
1328 */
1329 if (vacrel->dead_items_info->num_items > 0 &&
1330 TidStoreMemoryUsage(vacrel->dead_items) > vacrel->dead_items_info->max_bytes)
1331 {
1332 /*
1333 * Before beginning index vacuuming, we release any pin we may
1334 * hold on the visibility map page. This isn't necessary for
1335 * correctness, but we do it anyway to avoid holding the pin
1336 * across a lengthy, unrelated operation.
1337 */
1338 if (BufferIsValid(vmbuffer))
1339 {
1340 ReleaseBuffer(vmbuffer);
1341 vmbuffer = InvalidBuffer;
1342 }
1343
1344 /* Perform a round of index and heap vacuuming */
1345 vacrel->consider_bypass_optimization = false;
1347
1348 /*
1349 * Vacuum the Free Space Map to make newly-freed space visible on
1350 * upper-level FSM pages. Note that blkno is the previously
1351 * processed block.
1352 */
1354 blkno + 1);
1356
1357 /* Report that we are once again scanning the heap */
1360 }
1361
1362 buf = read_stream_next_buffer(stream, &per_buffer_data);
1363
1364 /* The relation is exhausted. */
1365 if (!BufferIsValid(buf))
1366 break;
1367
1368 was_eager_scanned = *((bool *) per_buffer_data);
1370 page = BufferGetPage(buf);
1371 blkno = BufferGetBlockNumber(buf);
1372
1373 vacrel->scanned_pages++;
1375 vacrel->eager_scanned_pages++;
1376
1377 /* Report as block scanned, update error traceback information */
1380 blkno, InvalidOffsetNumber);
1381
1382 /*
1383 * Pin the visibility map page in case we need to mark the page
1384 * all-visible. In most cases this will be very cheap, because we'll
1385 * already have the correct page pinned anyway.
1386 */
1387 visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
1388
1389 /*
1390 * We need a buffer cleanup lock to prune HOT chains and defragment
1391 * the page in lazy_scan_prune. But when it's not possible to acquire
1392 * a cleanup lock right away, we may be able to settle for reduced
1393 * processing using lazy_scan_noprune.
1394 */
1396
1397 if (!got_cleanup_lock)
1399
1400 /* Check for new or empty pages before lazy_scan_[no]prune call */
1402 vmbuffer))
1403 {
1404 /* Processed as new/empty page (lock and pin released) */
1405 continue;
1406 }
1407
1408 /*
1409 * If we didn't get the cleanup lock, we can still collect LP_DEAD
1410 * items in the dead_items area for later vacuuming, count live and
1411 * recently dead tuples for vacuum logging, and determine if this
1412 * block could later be truncated. If we encounter any xid/mxids that
1413 * require advancing the relfrozenxid/relminmxid, we'll have to wait
1414 * for a cleanup lock and call lazy_scan_prune().
1415 */
1416 if (!got_cleanup_lock &&
1417 !lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
1418 {
1419 /*
1420 * lazy_scan_noprune could not do all required processing. Wait
1421 * for a cleanup lock, and call lazy_scan_prune in the usual way.
1422 */
1423 Assert(vacrel->aggressive);
1426 got_cleanup_lock = true;
1427 }
1428
1429 /*
1430 * If we have a cleanup lock, we must now prune, freeze, and count
1431 * tuples. We may have acquired the cleanup lock originally, or we may
1432 * have gone back and acquired it after lazy_scan_noprune() returned
1433 * false. Either way, the page hasn't been processed yet.
1434 *
1435 * Like lazy_scan_noprune(), lazy_scan_prune() will count
1436 * recently_dead_tuples and live tuples for vacuum logging, determine
1437 * if the block can later be truncated, and accumulate the details of
1438 * remaining LP_DEAD line pointers on the page into dead_items. These
1439 * dead items include those pruned by lazy_scan_prune() as well as
1440 * line pointers previously marked LP_DEAD.
1441 */
1442 if (got_cleanup_lock)
1443 ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
1444 vmbuffer,
1446
1447 /*
1448 * Count an eagerly scanned page as a failure or a success.
1449 *
1450 * Only lazy_scan_prune() freezes pages, so if we didn't get the
1451 * cleanup lock, we won't have frozen the page. However, we only count
1452 * pages that were too new to require freezing as eager freeze
1453 * failures.
1454 *
1455 * We could gather more information from lazy_scan_noprune() about
1456 * whether or not there were tuples with XIDs or MXIDs older than the
1457 * FreezeLimit or MultiXactCutoff. However, for simplicity, we simply
1458 * exclude pages skipped due to cleanup lock contention from eager
1459 * freeze algorithm caps.
1460 */
1462 {
1463 /* Aggressive vacuums do not eager scan. */
1464 Assert(!vacrel->aggressive);
1465
1466 if (vm_page_frozen)
1467 {
1468 if (vacrel->eager_scan_remaining_successes > 0)
1469 vacrel->eager_scan_remaining_successes--;
1470
1471 if (vacrel->eager_scan_remaining_successes == 0)
1472 {
1473 /*
1474 * Report only once that we disabled eager scanning. We
1475 * may eagerly read ahead blocks in excess of the success
1476 * or failure caps before attempting to freeze them, so we
1477 * could reach here even after disabling additional eager
1478 * scanning.
1479 */
1480 if (vacrel->eager_scan_max_fails_per_region > 0)
1481 ereport(vacrel->verbose ? INFO : DEBUG2,
1482 (errmsg("disabling eager scanning after freezing %u eagerly scanned blocks of relation \"%s.%s.%s\"",
1484 vacrel->dbname, vacrel->relnamespace,
1485 vacrel->relname)));
1486
1487 /*
1488 * If we hit our success cap, permanently disable eager
1489 * scanning by setting the other eager scan management
1490 * fields to their disabled values.
1491 */
1492 vacrel->eager_scan_remaining_fails = 0;
1493 vacrel->next_eager_scan_region_start = InvalidBlockNumber;
1494 vacrel->eager_scan_max_fails_per_region = 0;
1495 }
1496 }
1497 else if (vacrel->eager_scan_remaining_fails > 0)
1498 vacrel->eager_scan_remaining_fails--;
1499 }
1500
1501 /*
1502 * Now drop the buffer lock and, potentially, update the FSM.
1503 *
1504 * Our goal is to update the freespace map the last time we touch the
1505 * page. If we'll process a block in the second pass, we may free up
1506 * additional space on the page, so it is better to update the FSM
1507 * after the second pass. If the relation has no indexes, or if index
1508 * vacuuming is disabled, there will be no second heap pass; if this
1509 * particular page has no dead items, the second heap pass will not
1510 * touch this page. So, in those cases, update the FSM now.
1511 *
1512 * Note: In corner cases, it's possible to miss updating the FSM
1513 * entirely. If index vacuuming is currently enabled, we'll skip the
1514 * FSM update now. But if failsafe mode is later activated, or there
1515 * are so few dead tuples that index vacuuming is bypassed, there will
1516 * also be no opportunity to update the FSM later, because we'll never
1517 * revisit this page. Since updating the FSM is desirable but not
1518 * absolutely required, that's OK.
1519 */
1520 if (vacrel->nindexes == 0
1521 || !vacrel->do_index_vacuuming
1522 || !has_lpdead_items)
1523 {
1524 Size freespace = PageGetHeapFreeSpace(page);
1525
1527 RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
1528
1529 /*
1530 * Periodically perform FSM vacuuming to make newly-freed space
1531 * visible on upper FSM pages. This is done after vacuuming if the
1532 * table has indexes. There will only be newly-freed space if we
1533 * held the cleanup lock and lazy_scan_prune() was called.
1534 */
1535 if (got_cleanup_lock && vacrel->nindexes == 0 && ndeleted > 0 &&
1537 {
1539 blkno);
1541 }
1542 }
1543 else
1545 }
1546
1547 vacrel->blkno = InvalidBlockNumber;
1548 if (BufferIsValid(vmbuffer))
1549 ReleaseBuffer(vmbuffer);
1550
1551 /*
1552 * Report that everything is now scanned. We never skip scanning the last
1553 * block in the relation, so we can pass rel_pages here.
1554 */
1556 rel_pages);
1557
1558 /* now we can compute the new value for pg_class.reltuples */
1559 vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
1560 vacrel->scanned_pages,
1561 vacrel->live_tuples);
1562
1563 /*
1564 * Also compute the total number of surviving heap entries. In the
1565 * (unlikely) scenario that new_live_tuples is -1, take it as zero.
1566 */
1567 vacrel->new_rel_tuples =
1568 Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
1569 vacrel->missed_dead_tuples;
1570
1571 read_stream_end(stream);
1572
1573 /*
1574 * Do index vacuuming (call each index's ambulkdelete routine), then do
1575 * related heap vacuuming
1576 */
1577 if (vacrel->dead_items_info->num_items > 0)
1579
1580 /*
1581 * Vacuum the remainder of the Free Space Map. We must do this whether or
1582 * not there were indexes, and whether or not we bypassed index vacuuming.
1583 * We can pass rel_pages here because we never skip scanning the last
1584 * block of the relation.
1585 */
1586 if (rel_pages > next_fsm_block_to_vacuum)
1588
1589 /* report all blocks vacuumed */
1591
1592 /* Do final index cleanup (call each index's amvacuumcleanup routine) */
1593 if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
1595}
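
/*
 * Illustrative sketch only (a hypothetical helper, not part of the original
 * file): the failsafe cadence used in lazy_scan_heap() above -- one
 * wraparound failsafe check per FAILSAFE_EVERY_PAGES (4 GB worth of blocks)
 * scanned.
 */
static inline bool
example_should_check_failsafe(BlockNumber scanned_pages)
{
	return scanned_pages > 0 &&
		scanned_pages % FAILSAFE_EVERY_PAGES == 0;
}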
1596
1597/*
1598 * heap_vac_scan_next_block() -- read stream callback to get the next block
1599 * for vacuum to process
1600 *
1601 * Every time lazy_scan_heap() needs a new block to process during its first
1602 * phase, it invokes read_stream_next_buffer() with a stream set up to call
1603 * heap_vac_scan_next_block() to get the next block.
1604 *
1605 * heap_vac_scan_next_block() uses the visibility map, vacuum options, and
1606 * various thresholds to skip blocks which do not need to be processed and
1607 * returns the next block to process or InvalidBlockNumber if there are no
1608 * remaining blocks.
1609 *
1610 * The visibility status of the next block to process and whether or not it
1611 * was eagerly scanned are set in the per_buffer_data.
1612 *
1613 * callback_private_data contains a reference to the LVRelState, passed to the
1614 * read stream API during stream setup. The LVRelState is an in/out parameter
1615 * here (locally named `vacrel`). Vacuum options and information about the
1616 * relation are read from it. vacrel->skippedallvis is set if we skip a block
1617 * that's all-visible but not all-frozen (to ensure that we don't update
1618 * relfrozenxid in that case). vacrel also holds information about the next
1619 * unskippable block -- as bookkeeping for this function.
1620 */
1621static BlockNumber
1623 void *callback_private_data,
1624 void *per_buffer_data)
1625{
1627 LVRelState *vacrel = callback_private_data;
1628
1629 /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
1631
1632 /* Have we reached the end of the relation? */
1633 if (next_block >= vacrel->rel_pages)
1634 {
1635 if (BufferIsValid(vacrel->next_unskippable_vmbuffer))
1636 {
1637 ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
1638 vacrel->next_unskippable_vmbuffer = InvalidBuffer;
1639 }
1640 return InvalidBlockNumber;
1641 }
1642
1643 /*
1644 * We must be in one of the three following states:
1645 */
1646 if (next_block > vacrel->next_unskippable_block ||
1647 vacrel->next_unskippable_block == InvalidBlockNumber)
1648 {
1649 /*
1650 * 1. We have just processed an unskippable block (or we're at the
1651 * beginning of the scan). Find the next unskippable block using the
1652 * visibility map.
1653 */
1654 bool skipsallvis;
1655
1657
1658 /*
1659 * We now know the next block that we must process. It can be the
1660 * next block after the one we just processed, or something further
1661 * ahead. If it's further ahead, we can jump to it, but we choose to
1662 * do so only if we can skip at least SKIP_PAGES_THRESHOLD consecutive
1663 * pages. Since we're reading sequentially, the OS should be doing
1664 * readahead for us, so there's no gain in skipping a page now and
1665 * then. Skipping such a range might even discourage sequential
1666 * detection.
1667 *
1668 * This test also enables more frequent relfrozenxid advancement
1669 * during non-aggressive VACUUMs. If the range has any all-visible
1670 * pages then skipping makes updating relfrozenxid unsafe, which is a
1671 * real downside.
1672 */
1673 if (vacrel->next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
1674 {
1675 next_block = vacrel->next_unskippable_block;
1676 if (skipsallvis)
1677 vacrel->skippedallvis = true;
1678 }
1679 }
1680
1681 /* Now we must be in one of the two remaining states: */
1682 if (next_block < vacrel->next_unskippable_block)
1683 {
1684 /*
1685 * 2. We are processing a range of blocks that we could have skipped
1686 * but chose not to. We know that they are all-visible in the VM,
1687 * otherwise they would've been unskippable.
1688 */
1689 vacrel->current_block = next_block;
1690 /* Block was not eager scanned */
1691 *((bool *) per_buffer_data) = false;
1692 return vacrel->current_block;
1693 }
1694 else
1695 {
1696 /*
1697 * 3. We reached the next unskippable block. Process it. On next
1698 * iteration, we will be back in state 1.
1699 */
1700 Assert(next_block == vacrel->next_unskippable_block);
1701
1702 vacrel->current_block = next_block;
1703 *((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
1704 return vacrel->current_block;
1705 }
1706}
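/*
 * Illustrative sketch (not part of vacuumlazy.c): roughly how phase I could
 * wire the callback above into the read stream machinery, modeled on the
 * phase-III setup in lazy_vacuum_heap_rel() further down in this file. The
 * flag choice and the per-buffer size (one bool holding the eager-scan flag)
 * are assumptions for illustration only.
 */
#if 0
	stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE,
										vacrel->bstrategy,
										vacrel->rel,
										MAIN_FORKNUM,
										heap_vac_scan_next_block,
										vacrel,			/* callback_private_data */
										sizeof(bool));	/* per-buffer eager-scan flag */
#endif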
1707
1708/*
1709 * Find the next unskippable block in a vacuum scan using the visibility map.
1710 * The next unskippable block and its visibility information are updated in
1711 * vacrel.
1712 *
1713 * Note: our opinion of which blocks can be skipped can go stale immediately.
1714 * It's okay if caller "misses" a page whose all-visible or all-frozen marking
1715 * was concurrently cleared, though. All that matters is that caller scan all
1716 * pages whose tuples might contain XIDs < OldestXmin, or MXIDs < OldestMxact.
1717 * (Actually, non-aggressive VACUUMs can choose to skip all-visible pages with
1718 * older XIDs/MXIDs. The *skippedallvis flag will be set here when the choice
1719 * to skip such a range is actually made, making everything safe.)
1720 */
1721static void
1722find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
1723{
1724 BlockNumber rel_pages = vacrel->rel_pages;
1725 BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
1726 Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
1727 bool next_unskippable_eager_scanned = false;
1728
1729 *skipsallvis = false;
1730
1731 for (;; next_unskippable_block++)
1732 {
1733 uint8 mapbits = visibilitymap_get_status(vacrel->rel,
1734 next_unskippable_block,
1735 &next_unskippable_vmbuffer);
1736
1737
1738 /*
1739 * At the start of each eager scan region, normal vacuums with eager
1740 * scanning enabled reset the failure counter, allowing vacuum to
1741 * resume eager scanning if it had been suspended in the previous
1742 * region.
1743 */
1744 if (next_unskippable_block >= vacrel->next_eager_scan_region_start)
1745 {
1746 vacrel->eager_scan_remaining_fails =
1747 vacrel->eager_scan_max_fails_per_region;
1748 vacrel->next_eager_scan_region_start += EAGER_SCAN_REGION_SIZE;
1749 }
1750
1751 /*
1752 * A block is unskippable if it is not all visible according to the
1753 * visibility map.
1754 */
1755 if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
1756 {
1757 Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
1758 break;
1759 }
1760
1761 /*
1762 * Caller must scan the last page to determine whether it has tuples
1763 * (caller must have the opportunity to set vacrel->nonempty_pages).
1764 * This rule avoids having lazy_truncate_heap() take access-exclusive
1765 * lock on rel to attempt a truncation that fails anyway, just because
1766 * there are tuples on the last page (it is likely that there will be
1767 * tuples on other nearby pages as well, but those can be skipped).
1768 *
1769 * Implement this by always treating the last block as unsafe to skip.
1770 */
1771 if (next_unskippable_block == rel_pages - 1)
1772 break;
1773
1774 /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
1775 if (!vacrel->skipwithvm)
1776 break;
1777
1778 /*
1779 * All-frozen pages cannot contain XIDs < OldestXmin (XIDs that aren't
1780 * already frozen by now), so this page can be skipped.
1781 */
1782 if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
1783 continue;
1784
1785 /*
1786 * Aggressive vacuums cannot skip any all-visible pages that are not
1787 * also all-frozen.
1788 */
1789 if (vacrel->aggressive)
1790 break;
1791
1792 /*
1793 * Normal vacuums with eager scanning enabled only skip all-visible
1794 * but not all-frozen pages if they have hit the failure limit for the
1795 * current eager scan region.
1796 */
1797 if (vacrel->eager_scan_remaining_fails > 0)
1798 {
1799 next_unskippable_eager_scanned = true;
1800 break;
1801 }
1802
1803 /*
1804 * All-visible blocks are safe to skip in a normal vacuum. But
1805 * remember that the final range contains such a block for later.
1806 */
1807 *skipsallvis = true;
1808 }
1809
1810 /* write the local variables back to vacrel */
1811 vacrel->next_unskippable_block = next_unskippable_block;
1812 vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
1813 vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
1814}
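/*
 * Worked example (not part of vacuumlazy.c): assuming SKIP_PAGES_THRESHOLD
 * is 32, its long-standing value in this file, a run of 31 consecutive
 * all-visible pages is still read sequentially, while a run of 32 or more is
 * jumped over. If any page in a jumped range was all-visible but not
 * all-frozen, *skipsallvis tells the caller that relfrozenxid must not be
 * advanced by this VACUUM.
 */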
1815
1816/*
1817 * lazy_scan_new_or_empty() -- lazy_scan_heap() new/empty page handling.
1818 *
1819 * Must call here to handle both new and empty pages before calling
1820 * lazy_scan_prune or lazy_scan_noprune, since they're not prepared to deal
1821 * with new or empty pages.
1822 *
1823 * It's necessary to consider new pages as a special case, since the rules for
1824 * maintaining the visibility map and FSM with empty pages are a little
1825 * different (though new pages can be truncated away during rel truncation).
1826 *
1827 * Empty pages are not really a special case -- they're just heap pages that
1828 * have no allocated tuples (including even LP_UNUSED items). You might
1829 * wonder why we need to handle them here all the same. It's only necessary
1830 * because of a corner-case involving a hard crash during heap relation
1831 * extension. If we ever make relation-extension crash safe, then it should
1832 * no longer be necessary to deal with empty pages here (or new pages, for
1833 * that matter).
1834 *
1835 * Caller must hold at least a share lock. If caller holds only a share
1836 * lock, we might need to escalate it to exclusive, so the type of lock
1837 * caller holds must be specified using the 'sharelock' argument.
1838 *
1839 * Returns false in common case where caller should go on to call
1840 * lazy_scan_prune (or lazy_scan_noprune). Otherwise returns true, indicating
1841 * that lazy_scan_heap is done processing the page, releasing lock on caller's
1842 * behalf.
1843 *
1844 * No vm_page_frozen output parameter (like that passed to lazy_scan_prune())
1845 * is passed here because neither empty nor new pages can be eagerly frozen.
1846 * New pages are never frozen. Empty pages are always set frozen in the VM at
1847 * the same time that they are set all-visible, and we don't eagerly scan
1848 * frozen pages.
1849 */
1850static bool
1851lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
1852 Page page, bool sharelock, Buffer vmbuffer)
1853{
1854 Size freespace;
1855
1856 if (PageIsNew(page))
1857 {
1858 /*
1859 * All-zeroes pages can be left over in two ways: a backend extends the
1860 * relation by a single page but crashes before the newly initialized
1861 * page has been written out, or the relation is bulk-extended (which
1862 * creates a number of empty pages at the tail end of the relation and
1863 * enters them into the FSM).
1864 *
1865 * Note we do not enter the page into the visibilitymap. That has the
1866 * downside that we repeatedly visit this page in subsequent vacuums,
1867 * but otherwise we'll never discover the space on a promoted standby.
1868 * The harm of repeated checking ought normally to be modest. The
1869 * space should usually get used at some point; otherwise there
1870 * wouldn't be any regular vacuums.
1871 *
1872 * Make sure these pages are in the FSM, to ensure they can be reused.
1873 * Do that by testing if there's any space recorded for the page. If
1874 * not, enter it. We do so after releasing the lock on the heap page;
1875 * the FSM is approximate, after all.
1876 */
1877 UnlockReleaseBuffer(buf);
1878
1879 if (GetRecordedFreeSpace(vacrel->rel, blkno) == 0)
1880 {
1881 freespace = BLCKSZ - SizeOfPageHeaderData;
1882
1883 RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
1884 }
1885
1886 return true;
1887 }
1888
1889 if (PageIsEmpty(page))
1890 {
1891 /*
1892 * It seems likely that caller will always be able to get a cleanup
1893 * lock on an empty page. But don't take any chances -- escalate to
1894 * an exclusive lock (still don't need a cleanup lock, though).
1895 */
1896 if (sharelock)
1897 {
1898 LockBuffer(buf, BUFFER_LOCK_UNLOCK);
1899 LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
1900
1901 if (!PageIsEmpty(page))
1902 {
1903 /* page isn't new or empty -- keep lock and pin for now */
1904 return false;
1905 }
1906 }
1907 else
1908 {
1909 /* Already have a full cleanup lock (which is more than enough) */
1910 }
1911
1912 /*
1913 * Unlike new pages, empty pages are always set all-visible and
1914 * all-frozen.
1915 */
1916 if (!PageIsAllVisible(page))
1917 {
1918 START_CRIT_SECTION();
1919
1920 /* mark buffer dirty before writing a WAL record */
1921 MarkBufferDirty(buf);
1922
1923 /*
1924 * It's possible that another backend has extended the heap,
1925 * initialized the page, and then failed to WAL-log the page due
1926 * to an ERROR. Since heap extension is not WAL-logged, recovery
1927 * might try to replay our record setting the page all-visible and
1928 * find that the page isn't initialized, which will cause a PANIC.
1929 * To prevent that, check whether the page has been previously
1930 * WAL-logged, and if not, do that now.
1931 */
1932 if (RelationNeedsWAL(vacrel->rel) &&
1933 PageGetLSN(page) == InvalidXLogRecPtr)
1934 log_newpage_buffer(buf, true);
1935
1936 PageSetAllVisible(page);
1937 PageClearPrunable(page);
1938 visibilitymap_set(vacrel->rel, blkno, buf,
1939 InvalidXLogRecPtr,
1940 vmbuffer, InvalidTransactionId,
1941 VISIBILITYMAP_ALL_VISIBLE |
1942 VISIBILITYMAP_ALL_FROZEN);
1943 END_CRIT_SECTION();
1944
1945 /* Count the newly set all-visible and all-frozen pages for logging */
1946 vacrel->new_all_visible_pages++;
1947 vacrel->new_all_visible_all_frozen_pages++;
1948 }
1949
1950 freespace = PageGetHeapFreeSpace(page);
1951 UnlockReleaseBuffer(buf);
1952 RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
1953 return true;
1954 }
1955
1956 /* page isn't new or empty -- keep lock and pin */
1957 return false;
1958}
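/*
 * Worked example (not part of vacuumlazy.c): for a new page, the free space
 * recorded above is BLCKSZ - SizeOfPageHeaderData; with the default 8kB
 * block size that is 8192 - 24 = 8168 bytes, i.e. everything on the page
 * except the page header itself.
 */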
1959
1960/* qsort comparator for sorting OffsetNumbers */
1961static int
1962cmpOffsetNumbers(const void *a, const void *b)
1963{
1964 return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
1965}
1966
1967/*
1968 * Helper to correct any corruption detected on a heap page and its
1969 * corresponding visibility map page after pruning but before setting the
1970 * visibility map. It examines the heap page, the associated VM page, and the
1971 * number of dead items previously identified.
1972 *
1973 * This function must be called while holding an exclusive lock on the heap
1974 * buffer, and the dead items must have been discovered under that same lock.
1975 *
1976 * The provided vmbits must reflect the current state of the VM block
1977 * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
1978 * is pinned, and the heap buffer is exclusively locked, ensuring that no
1979 * other backend can update the VM bits corresponding to this heap page.
1980 *
1981 * If this function clears corrupt VM bits, it zeroes out *vmbits to match.
1982 */
1983static void
1984identify_and_fix_vm_corruption(Relation rel, Buffer buf,
1985 BlockNumber heap_blk, Page heap_page,
1986 int nlpdead_items,
1987 Buffer vmbuffer,
1988 uint8 *vmbits)
1989{
1990 Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
1991
1993
1994 /*
1995 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
1996 * page-level bit is clear. However, it's possible that the bit got
1997 * cleared after heap_vac_scan_next_block() was called, so we must recheck
1998 * with buffer lock before concluding that the VM is corrupt.
1999 */
2000 if (!PageIsAllVisible(heap_page) &&
2001 ((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
2002 {
2003 ereport(WARNING,
2004 (errcode(ERRCODE_DATA_CORRUPTED),
2005 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
2006 RelationGetRelationName(rel), heap_blk)));
2007
2008 visibilitymap_clear(rel, heap_blk, vmbuffer,
2009 VISIBILITYMAP_VALID_BITS);
2010 *vmbits = 0;
2011 }
2012
2013 /*
2014 * It's possible for the value returned by
2015 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
2016 * wrong for us to see tuples that appear to not be visible to everyone
2017 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
2018 * never moves backwards, but GetOldestNonRemovableTransactionId() is
2019 * conservative and sometimes returns a value that's unnecessarily small,
2020 * so if we see that contradiction it just means that the tuples that we
2021 * think are not visible to everyone yet actually are, and the
2022 * PD_ALL_VISIBLE flag is correct.
2023 *
2024 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
2025 * however.
2026 */
2027 else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
2028 {
2029 ereport(WARNING,
2030 (errcode(ERRCODE_DATA_CORRUPTED),
2031 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
2032 RelationGetRelationName(rel), heap_blk)));
2033
2034 PageClearAllVisible(heap_page);
2035 MarkBufferDirty(buf);
2036 visibilitymap_clear(rel, heap_blk, vmbuffer,
2037 VISIBILITYMAP_VALID_BITS);
2038 *vmbits = 0;
2039 }
2040}
2041
2042/*
2043 * lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
2044 *
2045 * Caller must hold pin and buffer cleanup lock on the buffer.
2046 *
2047 * vmbuffer is the buffer containing the VM block with visibility information
2048 * for the heap block, blkno.
2049 *
2050 * *has_lpdead_items is set to true or false depending on whether, upon return
2051 * from this function, any LP_DEAD items are still present on the page.
2052 *
2053 * *vm_page_frozen is set to true if the page is newly set all-frozen in the
2054 * VM. The caller currently only uses this for determining whether an eagerly
2055 * scanned page was successfully set all-frozen.
2056 *
2057 * Returns the number of tuples deleted from the page during HOT pruning.
2058 */
2059static int
2060lazy_scan_prune(LVRelState *vacrel,
2061 Buffer buf,
2062 BlockNumber blkno,
2063 Page page,
2064 Buffer vmbuffer,
2065 bool *has_lpdead_items,
2066 bool *vm_page_frozen)
2067{
2068 Relation rel = vacrel->rel;
2069 PruneFreezeResult presult;
2070 PruneFreezeParams params = {
2071 .relation = rel,
2072 .buffer = buf,
2073 .reason = PRUNE_VACUUM_SCAN,
2074 .options = HEAP_PAGE_PRUNE_FREEZE,
2075 .vistest = vacrel->vistest,
2076 .cutoffs = &vacrel->cutoffs,
2077 };
2078 uint8 old_vmbits = 0;
2079 uint8 new_vmbits = 0;
2080
2081 Assert(BufferGetBlockNumber(buf) == blkno);
2082
2083 /*
2084 * Prune all HOT-update chains and potentially freeze tuples on this page.
2085 *
2086 * If the relation has no indexes, we can immediately mark would-be dead
2087 * items LP_UNUSED.
2088 *
2089 * The number of tuples removed from the page is returned in
2090 * presult.ndeleted. It should not be confused with presult.lpdead_items;
2091 * presult.lpdead_items's final value can be thought of as the number of
2092 * tuples that were deleted from indexes.
2093 *
2094 * We will update the VM after collecting LP_DEAD items and freezing
2095 * tuples. Pruning will have determined whether or not the page is
2096 * all-visible.
2097 */
2098 if (vacrel->nindexes == 0)
2099 params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
2100
2101 heap_page_prune_and_freeze(&params,
2102 &presult,
2103 &vacrel->offnum,
2104 &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
2105
2106 Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
2107 Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
2108
2109 if (presult.nfrozen > 0)
2110 {
2111 /*
2112 * We don't increment the new_frozen_tuple_pages instrumentation
2113 * counter when nfrozen == 0, since it only counts pages with newly
2114 * frozen tuples (don't confuse that with pages newly set all-frozen
2115 * in VM).
2116 */
2117 vacrel->new_frozen_tuple_pages++;
2118 }
2119
2120 /*
2121 * VACUUM will call heap_page_is_all_visible() during the second pass over
2122 * the heap to determine all_visible and all_frozen for the page -- this
2123 * is a specialized version of the logic from this function. Now that
2124 * we've finished pruning and freezing, make sure that we're in total
2125 * agreement with heap_page_is_all_visible() using an assertion.
2126 */
2127#ifdef USE_ASSERT_CHECKING
2128 if (presult.set_all_visible)
2129 {
2130 TransactionId debug_cutoff;
2131 bool debug_all_frozen;
2132
2133 Assert(presult.lpdead_items == 0);
2134
2135 Assert(heap_page_is_all_visible(vacrel->rel, buf,
2136 vacrel->cutoffs.OldestXmin, &debug_all_frozen,
2137 &debug_cutoff, &vacrel->offnum));
2138
2139 Assert(presult.set_all_frozen == debug_all_frozen);
2140
2141 Assert(!TransactionIdIsValid(debug_cutoff) ||
2142 debug_cutoff == presult.vm_conflict_horizon);
2143 }
2144#endif
2145
2146 /*
2147 * Now save details of the LP_DEAD items from the page in vacrel
2148 */
2149 if (presult.lpdead_items > 0)
2150 {
2151 vacrel->lpdead_item_pages++;
2152
2153 /*
2154 * deadoffsets are collected incrementally in
2155 * heap_page_prune_and_freeze() as each dead line pointer is recorded,
2156 * with an indeterminate order, but dead_items_add requires them to be
2157 * sorted.
2158 */
2159 qsort(presult.deadoffsets, presult.lpdead_items, sizeof(OffsetNumber),
2160 cmpOffsetNumbers);
2161
2162 dead_items_add(vacrel, blkno, presult.deadoffsets, presult.lpdead_items);
2163 }
2164
2165 /* Finally, add page-local counts to whole-VACUUM counts */
2166 vacrel->tuples_deleted += presult.ndeleted;
2167 vacrel->tuples_frozen += presult.nfrozen;
2168 vacrel->lpdead_items += presult.lpdead_items;
2169 vacrel->live_tuples += presult.live_tuples;
2170 vacrel->recently_dead_tuples += presult.recently_dead_tuples;
2171
2172 /* Can't truncate this page */
2173 if (presult.hastup)
2174 vacrel->nonempty_pages = blkno + 1;
2175
2176 /* Did we find LP_DEAD items? */
2177 *has_lpdead_items = (presult.lpdead_items > 0);
2178
2179 Assert(!presult.set_all_visible || !(*has_lpdead_items));
2180 Assert(!presult.set_all_frozen || presult.set_all_visible);
2181
2182 old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
2183
2184 identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
2185 presult.lpdead_items, vmbuffer,
2186 &old_vmbits);
2187
2188 if (!presult.set_all_visible)
2189 return presult.ndeleted;
2190
2191 /* Set the visibility map and page visibility hint */
2192 new_vmbits |= VISIBILITYMAP_ALL_VISIBLE;
2193
2194 if (presult.set_all_frozen)
2195 new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
2196
2197 /* Nothing to do */
2198 if (old_vmbits == new_vmbits)
2199 return presult.ndeleted;
2200
2201 /*
2202 * It should never be the case that the visibility map page is set while
2203 * the page-level bit is clear (and if so, we cleared it above), but the
2204 * reverse is allowed (if checksums are not enabled). Regardless, set both
2205 * bits so that we get back in sync.
2206 *
2207 * The heap buffer must be marked dirty before adding it to the WAL chain
2208 * when setting the VM. We don't worry about unnecessarily dirtying the
2209 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
2210 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
2211 * the VM bits clear, so there is no point in optimizing it.
2212 */
2213 PageSetAllVisible(page);
2214 PageClearPrunable(page);
2215 MarkBufferDirty(buf);
2216
2217 /*
2218 * If the page is being set all-frozen, we pass InvalidTransactionId as
2219 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
2220 * everything safe for REDO was logged when the page's tuples were frozen.
2221 */
2222 Assert(!presult.set_all_frozen ||
2223 !TransactionIdIsValid(presult.vm_conflict_horizon));
2224
2225 visibilitymap_set(vacrel->rel, blkno, buf,
2226 InvalidXLogRecPtr,
2227 vmbuffer, presult.vm_conflict_horizon,
2228 new_vmbits);
2229
2230 /*
2231 * If the page wasn't already set all-visible and/or all-frozen in the VM,
2232 * count it as newly set for logging.
2233 */
2234 if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
2235 {
2236 vacrel->new_all_visible_pages++;
2237 if (presult.set_all_frozen)
2238 {
2239 vacrel->new_all_visible_all_frozen_pages++;
2240 *vm_page_frozen = true;
2241 }
2242 }
2243 else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
2244 presult.set_all_frozen)
2245 {
2246 vacrel->new_all_frozen_pages++;
2247 *vm_page_frozen = true;
2248 }
2249
2250 return presult.ndeleted;
2251}
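/*
 * Illustrative summary (not part of vacuumlazy.c): the counter updates above
 * implement a small transition table on the VM bits. Assuming the usual bit
 * definitions, the cases are:
 *
 *   old_vmbits         new_vmbits               counters incremented
 *   -----------------  -----------------------  --------------------------------
 *   neither bit set    ALL_VISIBLE              new_all_visible_pages
 *   neither bit set    ALL_VISIBLE|ALL_FROZEN   new_all_visible_pages,
 *                                               new_all_visible_all_frozen_pages
 *   ALL_VISIBLE only   ALL_VISIBLE|ALL_FROZEN   new_all_frozen_pages
 */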
2252
2253/*
2254 * lazy_scan_noprune() -- lazy_scan_prune() without pruning or freezing
2255 *
2256 * Caller need only hold a pin and share lock on the buffer, unlike
2257 * lazy_scan_prune, which requires a full cleanup lock. While pruning isn't
2258 * performed here, it's quite possible that an earlier opportunistic pruning
2259 * operation left LP_DEAD items behind. We'll at least collect any such items
2260 * in dead_items for removal from indexes.
2261 *
2262 * For aggressive VACUUM callers, we may return false to indicate that a full
2263 * cleanup lock is required for processing by lazy_scan_prune. This is only
2264 * necessary when the aggressive VACUUM needs to freeze some tuple XIDs from
2265 * one or more tuples on the page. We always return true for non-aggressive
2266 * callers.
2267 *
2268 * If this function returns true, *has_lpdead_items gets set to true or false
2269 * depending on whether, upon return from this function, any LP_DEAD items are
2270 * present on the page. If this function returns false, *has_lpdead_items
2271 * is not updated.
2272 */
2273static bool
2274lazy_scan_noprune(LVRelState *vacrel,
2275 Buffer buf,
2276 BlockNumber blkno,
2277 Page page,
2278 bool *has_lpdead_items)
2279{
2280 OffsetNumber offnum,
2281 maxoff;
2282 int lpdead_items,
2283 live_tuples,
2284 recently_dead_tuples,
2285 missed_dead_tuples;
2286 bool hastup;
2287 HeapTupleHeader tupleheader;
2288 TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
2289 MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
2290 OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
2291
2292 Assert(BufferGetBlockNumber(buf) == blkno);
2293
2294 hastup = false; /* for now */
2295
2296 lpdead_items = 0;
2297 live_tuples = 0;
2298 recently_dead_tuples = 0;
2299 missed_dead_tuples = 0;
2300
2301 maxoff = PageGetMaxOffsetNumber(page);
2302 for (offnum = FirstOffsetNumber;
2303 offnum <= maxoff;
2304 offnum = OffsetNumberNext(offnum))
2305 {
2306 ItemId itemid;
2307 HeapTupleData tuple;
2308
2309 vacrel->offnum = offnum;
2310 itemid = PageGetItemId(page, offnum);
2311
2312 if (!ItemIdIsUsed(itemid))
2313 continue;
2314
2315 if (ItemIdIsRedirected(itemid))
2316 {
2317 hastup = true;
2318 continue;
2319 }
2320
2321 if (ItemIdIsDead(itemid))
2322 {
2323 /*
2324 * Deliberately don't set hastup=true here. See same point in
2325 * lazy_scan_prune for an explanation.
2326 */
2327 deadoffsets[lpdead_items++] = offnum;
2328 continue;
2329 }
2330
2331 hastup = true; /* page prevents rel truncation */
2332 tupleheader = (HeapTupleHeader) PageGetItem(page, itemid);
2333 if (heap_tuple_should_freeze(tupleheader, &vacrel->cutoffs,
2334 &NoFreezePageRelfrozenXid,
2335 &NoFreezePageRelminMxid))
2336 {
2337 /* Tuple with XID < FreezeLimit (or MXID < MultiXactCutoff) */
2338 if (vacrel->aggressive)
2339 {
2340 /*
2341 * Aggressive VACUUMs must always be able to advance rel's
2342 * relfrozenxid to a value >= FreezeLimit (and be able to
2343 * advance rel's relminmxid to a value >= MultiXactCutoff).
2344 * The ongoing aggressive VACUUM won't be able to do that
2345 * unless it can freeze an XID (or MXID) from this tuple now.
2346 *
2347 * The only safe option is to have caller perform processing
2348 * of this page using lazy_scan_prune. Caller might have to
2349 * wait a while for a cleanup lock, but it can't be helped.
2350 */
2351 vacrel->offnum = InvalidOffsetNumber;
2352 return false;
2353 }
2354
2355 /*
2356 * Non-aggressive VACUUMs are under no obligation to advance
2357 * relfrozenxid (even by one XID). We can be much laxer here.
2358 *
2359 * Currently we always just accept an older final relfrozenxid
2360 * and/or relminmxid value. We never make caller wait or work a
2361 * little harder, even when it likely makes sense to do so.
2362 */
2363 }
2364
2365 ItemPointerSet(&(tuple.t_self), blkno, offnum);
2366 tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
2367 tuple.t_len = ItemIdGetLength(itemid);
2368 tuple.t_tableOid = RelationGetRelid(vacrel->rel);
2369
2370 switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
2371 buf))
2372 {
2373 case HEAPTUPLE_DELETE_IN_PROGRESS:
2374 case HEAPTUPLE_LIVE:
2375
2376 /*
2377 * Count both cases as live, just like lazy_scan_prune
2378 */
2379 live_tuples++;
2380
2381 break;
2382 case HEAPTUPLE_DEAD:
2383
2384 /*
2385 * There is some useful work for pruning to do, that won't be
2386 * done due to failure to get a cleanup lock.
2387 */
2388 missed_dead_tuples++;
2389 break;
2390 case HEAPTUPLE_RECENTLY_DEAD:
2391
2392 /*
2393 * Count in recently_dead_tuples, just like lazy_scan_prune
2394 */
2395 recently_dead_tuples++;
2396 break;
2397 case HEAPTUPLE_INSERT_IN_PROGRESS:
2398
2399 /*
2400 * Do not count these rows as live, just like lazy_scan_prune
2401 */
2402 break;
2403 default:
2404 elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
2405 break;
2406 }
2407 }
2408
2409 vacrel->offnum = InvalidOffsetNumber;
2410
2411 /*
2412 * By here we know for sure that caller can put off freezing and pruning
2413 * this particular page until the next VACUUM. Remember its details now.
2414 * (lazy_scan_prune expects a clean slate, so we have to do this last.)
2415 */
2416 vacrel->NewRelfrozenXid = NoFreezePageRelfrozenXid;
2417 vacrel->NewRelminMxid = NoFreezePageRelminMxid;
2418
2419 /* Save any LP_DEAD items found on the page in dead_items */
2420 if (vacrel->nindexes == 0)
2421 {
2422 /* Using one-pass strategy (since table has no indexes) */
2423 if (lpdead_items > 0)
2424 {
2425 /*
2426 * Perfunctory handling for the corner case where a single pass
2427 * strategy VACUUM cannot get a cleanup lock, and it turns out
2428 * that there are one or more LP_DEAD items: just count the LP_DEAD
2429 * items as missed_dead_tuples instead. (This is a bit dishonest,
2430 * but it beats having to maintain specialized heap vacuuming code
2431 * forever, for vanishingly little benefit.)
2432 */
2433 hastup = true;
2434 missed_dead_tuples += lpdead_items;
2435 }
2436 }
2437 else if (lpdead_items > 0)
2438 {
2439 /*
2440 * Page has LP_DEAD items, and so any references/TIDs that remain in
2441 * indexes will be deleted during index vacuuming (and then marked
2442 * LP_UNUSED in the heap)
2443 */
2444 vacrel->lpdead_item_pages++;
2445
2446 dead_items_add(vacrel, blkno, deadoffsets, lpdead_items);
2447
2448 vacrel->lpdead_items += lpdead_items;
2449 }
2450
2451 /*
2452 * Finally, add relevant page-local counts to whole-VACUUM counts
2453 */
2454 vacrel->live_tuples += live_tuples;
2455 vacrel->recently_dead_tuples += recently_dead_tuples;
2456 vacrel->missed_dead_tuples += missed_dead_tuples;
2457 if (missed_dead_tuples > 0)
2458 vacrel->missed_dead_pages++;
2459
2460 /* Can't truncate this page */
2461 if (hastup)
2462 vacrel->nonempty_pages = blkno + 1;
2463
2464 /* Did we find LP_DEAD items? */
2465 *has_lpdead_items = (lpdead_items > 0);
2466
2467 /* Caller won't need to call lazy_scan_prune with same page */
2468 return true;
2469}
2470
2471/*
2472 * Main entry point for index vacuuming and heap vacuuming.
2473 *
2474 * Removes items collected in dead_items from table's indexes, then marks the
2475 * same items LP_UNUSED in the heap. See the comments above lazy_scan_heap
2476 * for full details.
2477 *
2478 * Also empties dead_items, freeing up space for later TIDs.
2479 *
2480 * We may choose to bypass index vacuuming at this point, though only when the
2481 * ongoing VACUUM operation will definitely only have one index scan/round of
2482 * index vacuuming.
2483 */
2484static void
2485lazy_vacuum(LVRelState *vacrel)
2486{
2487 bool bypass;
2488
2489 /* Should not end up here with no indexes */
2490 Assert(vacrel->nindexes > 0);
2491 Assert(vacrel->lpdead_item_pages > 0);
2492
2493 if (!vacrel->do_index_vacuuming)
2494 {
2495 Assert(!vacrel->do_index_cleanup);
2496 dead_items_reset(vacrel);
2497 return;
2498 }
2499
2500 /*
2501 * Consider bypassing index vacuuming (and heap vacuuming) entirely.
2502 *
2503 * We currently only do this in cases where the number of LP_DEAD items
2504 * for the entire VACUUM operation is close to zero. This avoids sharp
2505 * discontinuities in the duration and overhead of successive VACUUM
2506 * operations that run against the same table with a fixed workload.
2507 * Ideally, successive VACUUM operations will behave as if there are
2508 * exactly zero LP_DEAD items in cases where there are close to zero.
2509 *
2510 * This is likely to be helpful with a table that is continually affected
2511 * by UPDATEs that can mostly apply the HOT optimization, but occasionally
2512 * have small aberrations that lead to just a few heap pages retaining
2513 * only one or two LP_DEAD items. This is pretty common; even when the
2514 * DBA goes out of their way to make UPDATEs use HOT, it is practically
2515 * impossible to predict whether HOT will be applied in 100% of cases.
2516 * It's far easier to ensure that 99%+ of all UPDATEs against a table use
2517 * HOT through careful tuning.
2518 */
2519 bypass = false;
2520 if (vacrel->consider_bypass_optimization && vacrel->rel_pages > 0)
2521 {
2522 BlockNumber threshold;
2523
2524 Assert(vacrel->num_index_scans == 0);
2525 Assert(vacrel->lpdead_items == vacrel->dead_items_info->num_items);
2526 Assert(vacrel->do_index_vacuuming);
2527 Assert(vacrel->do_index_cleanup);
2528
2529 /*
2530 * This crossover point at which we'll start to do index vacuuming is
2531 * expressed as a percentage of the total number of heap pages in the
2532 * table that are known to have at least one LP_DEAD item. This is
2533 * much more important than the total number of LP_DEAD items, since
2534 * it's a proxy for the number of heap pages whose visibility map bits
2535 * cannot be set on account of bypassing index and heap vacuuming.
2536 *
2537 * We apply one further precautionary test: the space currently used
2538 * to store the TIDs (TIDs that now all point to LP_DEAD items) must
2539 * not exceed 32MB. This limits the risk that we will bypass index
2540 * vacuuming again and again until eventually there is a VACUUM whose
2541 * dead_items space is not CPU cache resident.
2542 *
2543 * We don't take any special steps to remember the LP_DEAD items (such
2544 * as counting them in our final update to the stats system) when the
2545 * optimization is applied. Though the accounting used in analyze.c's
2546 * acquire_sample_rows() will recognize the same LP_DEAD items as dead
2547 * rows in its own stats report, that's okay. The discrepancy should
2548 * be negligible. If this optimization is ever expanded to cover more
2549 * cases then this may need to be reconsidered.
2550 */
2551 threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
2552 bypass = (vacrel->lpdead_item_pages < threshold &&
2553 TidStoreMemoryUsage(vacrel->dead_items) < 32 * 1024 * 1024);
2554 }
2555
2556 if (bypass)
2557 {
2558 /*
2559 * There are almost zero TIDs. Behave as if there were precisely
2560 * zero: bypass index vacuuming, but do index cleanup.
2561 *
2562 * We expect that the ongoing VACUUM operation will finish very
2563 * quickly, so there is no point in considering speeding up as a
2564 * failsafe against wraparound failure. (Index cleanup is expected to
2565 * finish very quickly in cases where there were no ambulkdelete()
2566 * calls.)
2567 */
2568 vacrel->do_index_vacuuming = false;
2569 }
2570 else if (lazy_vacuum_all_indexes(vacrel))
2571 {
2572 /*
2573 * We successfully completed a round of index vacuuming. Do related
2574 * heap vacuuming now.
2575 */
2576 lazy_vacuum_heap_rel(vacrel);
2577 }
2578 else
2579 {
2580 /*
2581 * Failsafe case.
2582 *
2583 * We attempted index vacuuming, but didn't finish a full round/full
2584 * index scan. This happens when relfrozenxid or relminmxid is too
2585 * far in the past.
2586 *
2587 * From this point on the VACUUM operation will do no further index
2588 * vacuuming or heap vacuuming. This VACUUM operation won't end up
2589 * back here again.
2590 */
2591 Assert(VacuumFailsafeActive);
2592 }
2593
2594 /*
2595 * Forget the LP_DEAD items that we just vacuumed (or just decided to not
2596 * vacuum)
2597 */
2598 dead_items_reset(vacrel);
2599}
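/*
 * Worked example (not part of vacuumlazy.c): assuming BYPASS_THRESHOLD_PAGES
 * is 0.02, its historical value in this file, a 100,000-page table bypasses
 * index vacuuming only when fewer than 100,000 * 0.02 = 2,000 heap pages
 * have LP_DEAD items and the TIDs stored for them occupy less than 32MB.
 */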
2600
2601/*
2602 * lazy_vacuum_all_indexes() -- Main entry for index vacuuming
2603 *
2604 * Returns true in the common case when all indexes were successfully
2605 * vacuumed. Returns false in rare cases where we determined that the ongoing
2606 * VACUUM operation is at risk of taking too long to finish, leading to
2607 * wraparound failure.
2608 */
2609static bool
2610lazy_vacuum_all_indexes(LVRelState *vacrel)
2611{
2612 bool allindexes = true;
2613 double old_live_tuples = vacrel->rel->rd_rel->reltuples;
2614 const int progress_start_index[] = {
2615 PROGRESS_VACUUM_PHASE,
2616 PROGRESS_VACUUM_INDEXES_TOTAL
2617 };
2618 const int progress_end_index[] = {
2619 PROGRESS_VACUUM_INDEXES_TOTAL,
2620 PROGRESS_VACUUM_INDEXES_PROCESSED,
2621 PROGRESS_VACUUM_NUM_INDEX_VACUUMS
2622 };
2623 int64 progress_start_val[2];
2624 int64 progress_end_val[3];
2625
2626 Assert(vacrel->nindexes > 0);
2627 Assert(vacrel->do_index_vacuuming);
2628 Assert(vacrel->do_index_cleanup);
2629
2630 /* Precheck for XID wraparound emergencies */
2631 if (lazy_check_wraparound_failsafe(vacrel))
2632 {
2633 /* Wraparound emergency -- don't even start an index scan */
2634 return false;
2635 }
2636
2637 /*
2638 * Report that we are now vacuuming indexes and the number of indexes to
2639 * vacuum.
2640 */
2641 progress_start_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_INDEX;
2642 progress_start_val[1] = vacrel->nindexes;
2643 pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
2644
2645 if (!ParallelVacuumIsActive(vacrel))
2646 {
2647 for (int idx = 0; idx < vacrel->nindexes; idx++)
2648 {
2649 Relation indrel = vacrel->indrels[idx];
2650 IndexBulkDeleteResult *istat = vacrel->indstats[idx];
2651
2652 vacrel->indstats[idx] = lazy_vacuum_one_index(indrel, istat,
2653 old_live_tuples,
2654 vacrel);
2655
2656 /* Report the number of indexes vacuumed */
2657 pgstat_progress_update_param(PROGRESS_VACUUM_INDEXES_PROCESSED,
2658 idx + 1);
2659
2660 if (lazy_check_wraparound_failsafe(vacrel))
2661 {
2662 /* Wraparound emergency -- end current index scan */
2663 allindexes = false;
2664 break;
2665 }
2666 }
2667 }
2668 else
2669 {
2670 /* Outsource everything to parallel variant */
2671 parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, old_live_tuples,
2672 vacrel->num_index_scans);
2673
2674 /*
2675 * Do a postcheck to consider applying wraparound failsafe now. Note
2676 * that parallel VACUUM only gets the precheck and this postcheck.
2677 */
2678 if (lazy_check_wraparound_failsafe(vacrel))
2679 allindexes = false;
2680 }
2681
2682 /*
2683 * We delete all LP_DEAD items from the first heap pass in all indexes on
2684 * each call here (except calls where we choose to do the failsafe). This
2685 * makes the next call to lazy_vacuum_heap_rel() safe (except in the event
2686 * of the failsafe triggering, which prevents the next call from taking
2687 * place).
2688 */
2689 Assert(vacrel->num_index_scans > 0 ||
2690 vacrel->dead_items_info->num_items == vacrel->lpdead_items);
2691 Assert(allindexes || VacuumFailsafeActive);
2692
2693 /*
2694 * Increase and report the number of index scans. Also, we reset
2695 * PROGRESS_VACUUM_INDEXES_TOTAL and PROGRESS_VACUUM_INDEXES_PROCESSED.
2696 *
2697 * We deliberately include the case where we started a round of bulk
2698 * deletes that we weren't able to finish due to the failsafe triggering.
2699 */
2700 vacrel->num_index_scans++;
2701 progress_end_val[0] = 0;
2702 progress_end_val[1] = 0;
2703 progress_end_val[2] = vacrel->num_index_scans;
2704 pgstat_progress_update_multi_param(3, progress_end_index, progress_end_val);
2705
2706 return allindexes;
2707}
2708
2709/*
2710 * Read stream callback for vacuum's third phase (second pass over the heap).
2711 * Gets the next block from the TID store and returns it or InvalidBlockNumber
2712 * if there are no further blocks to vacuum.
2713 *
2714 * NB: Assumed to be safe to use with READ_STREAM_USE_BATCHING.
2715 */
2716static BlockNumber
2717vacuum_reap_lp_read_stream_next(ReadStream *stream,
2718 void *callback_private_data,
2719 void *per_buffer_data)
2720{
2721 TidStoreIter *iter = callback_private_data;
2722 TidStoreIterResult *iter_result;
2723
2724 iter_result = TidStoreIterateNext(iter);
2725 if (iter_result == NULL)
2726 return InvalidBlockNumber;
2727
2728 /*
2729 * Save the TidStoreIterResult for later, so we can extract the offsets.
2730 * It is safe to copy the result, according to TidStoreIterateNext().
2731 */
2732 memcpy(per_buffer_data, iter_result, sizeof(*iter_result));
2733
2734 return iter_result->blkno;
2735}
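/*
 * Illustrative sketch (not part of vacuumlazy.c): the minimal TID store
 * iteration pattern this callback participates in, shown standalone. The
 * variable dead_items is assumed to be a TidStore built as elsewhere in this
 * file; each TidStoreIterResult describes one block's dead-item offsets.
 */
#if 0
	TidStoreIter *iter = TidStoreBeginIterate(dead_items);
	TidStoreIterResult *result;

	while ((result = TidStoreIterateNext(iter)) != NULL)
	{
		OffsetNumber offsets[MaxOffsetNumber];
		int			num = TidStoreGetBlockOffsets(result, offsets,
												  lengthof(offsets));

		/* result->blkno plus offsets[0..num-1] identify the dead items */
	}
	TidStoreEndIterate(iter);
#endif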
2736
2737/*
2738 * lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
2739 *
2740 * This routine marks LP_DEAD items in vacrel->dead_items as LP_UNUSED. Pages
2741 * that never had lazy_scan_prune record LP_DEAD items are not visited at all.
2742 *
2743 * We may also be able to truncate the line pointer array of the heap pages we
2744 * visit. If there is a contiguous group of LP_UNUSED items at the end of the
2745 * array, it can be reclaimed as free space. These LP_UNUSED items usually
2746 * start out as LP_DEAD items recorded by lazy_scan_prune (we set items from
2747 * each page to LP_UNUSED, and then consider if it's possible to truncate the
2748 * page's line pointer array).
2749 *
2750 * Note: the reason for doing this as a second pass is we cannot remove the
2751 * tuples until we've removed their index entries, and we want to process
2752 * index entry removal in batches as large as possible.
2753 */
2754static void
2755lazy_vacuum_heap_rel(LVRelState *vacrel)
2756{
2757 ReadStream *stream;
2758 BlockNumber vacuumed_pages = 0;
2759 Buffer vmbuffer = InvalidBuffer;
2760 LVSavedErrInfo saved_err_info;
2761 TidStoreIter *iter;
2762
2763 Assert(vacrel->do_index_vacuuming);
2764 Assert(vacrel->do_index_cleanup);
2765 Assert(vacrel->num_index_scans > 0);
2766
2767 /* Report that we are now vacuuming the heap */
2768 pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
2769 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
2770
2771 /* Update error traceback information */
2772 update_vacuum_error_info(vacrel, &saved_err_info,
2773 VACUUM_ERRCB_PHASE_VACUUM_HEAP,
2774 InvalidBlockNumber, InvalidOffsetNumber);
2775
2776 iter = TidStoreBeginIterate(vacrel->dead_items);
2777
2778 /*
2779 * Set up the read stream for vacuum's second pass through the heap.
2780 *
2781 * It is safe to use batchmode, as vacuum_reap_lp_read_stream_next() does
2782 * not need to wait for IO and does not perform locking. Once we support
2783 * parallelism it should still be fine, as presumably the holder of locks
2784 * would never be blocked by IO while holding the lock.
2785 */
2786 stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
2787 READ_STREAM_USE_BATCHING,
2788 vacrel->bstrategy,
2789 vacrel->rel,
2790 MAIN_FORKNUM,
2791 vacuum_reap_lp_read_stream_next,
2792 iter,
2793 sizeof(TidStoreIterResult));
2794
2795 while (true)
2796 {
2797 BlockNumber blkno;
2798 Buffer buf;
2799 Page page;
2800 TidStoreIterResult *iter_result;
2801 Size freespace;
2802 OffsetNumber offsets[MaxOffsetNumber];
2803 int num_offsets;
2804
2805 vacuum_delay_point(false);
2806
2807 buf = read_stream_next_buffer(stream, (void **) &iter_result);
2808
2809 /* The relation is exhausted */
2810 if (!BufferIsValid(buf))
2811 break;
2812
2813 vacrel->blkno = blkno = BufferGetBlockNumber(buf);
2814
2815 num_offsets = TidStoreGetBlockOffsets(iter_result, offsets,
2816 lengthof(offsets));
2817 Assert(num_offsets <= lengthof(offsets));
2818
2819 /*
2820 * Pin the visibility map page in case we need to mark the page
2821 * all-visible. In most cases this will be very cheap, because we'll
2822 * already have the correct page pinned anyway.
2823 */
2824 visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
2825
2826 /* We need a non-cleanup exclusive lock to mark dead_items unused */
2827 LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
2828 lazy_vacuum_heap_page(vacrel, blkno, buf, offsets,
2829 num_offsets, vmbuffer);
2830
2831 /* Now that we've vacuumed the page, record its available space */
2832 page = BufferGetPage(buf);
2833 freespace = PageGetHeapFreeSpace(page);
2834
2835 UnlockReleaseBuffer(buf);
2836 RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
2837 vacuumed_pages++;
2838 }
2839
2840 read_stream_end(stream);
2841 TidStoreEndIterate(iter);
2842
2843 vacrel->blkno = InvalidBlockNumber;
2844 if (BufferIsValid(vmbuffer))
2845 ReleaseBuffer(vmbuffer);
2846
2847 /*
2848 * We set all LP_DEAD items from the first heap pass to LP_UNUSED during
2849 * the second heap pass. No more, no less.
2850 */
2851 Assert(vacrel->num_index_scans > 1 ||
2852 (vacrel->dead_items_info->num_items == vacrel->lpdead_items &&
2853 vacuumed_pages == vacrel->lpdead_item_pages));
2854
2855 ereport(DEBUG2,
2856 (errmsg("table \"%s\": removed %" PRId64 " dead item identifiers in %u pages",
2857 vacrel->relname, vacrel->dead_items_info->num_items,
2858 vacuumed_pages)));
2859
2860 /* Revert to the previous phase information for error traceback */
2861 restore_vacuum_error_info(vacrel, &saved_err_info);
2862}
2863
2864/*
2865 * lazy_vacuum_heap_page() -- free page's LP_DEAD items listed in the
2866 * vacrel->dead_items store.
2867 *
2868 * Caller must have an exclusive buffer lock on the buffer (though a full
2869 * cleanup lock is also acceptable). vmbuffer must be valid and already have
2870 * a pin on blkno's visibility map page.
2871 */
2872static void
2873lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
2874 OffsetNumber *deadoffsets, int num_offsets,
2875 Buffer vmbuffer)
2876{
2877 Page page = BufferGetPage(buffer);
2878 OffsetNumber unused[MaxHeapTuplesPerPage];
2879 int nunused = 0;
2880 TransactionId visibility_cutoff_xid;
2881 TransactionId conflict_xid = InvalidTransactionId;
2882 bool all_frozen;
2883 LVSavedErrInfo saved_err_info;
2884 uint8 vmflags = 0;
2885
2886 Assert(vacrel->do_index_vacuuming);
2887
2888 pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
2889
2890 /* Update error traceback information */
2891 update_vacuum_error_info(vacrel, &saved_err_info,
2892 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
2893 InvalidOffsetNumber);
2894
2895 /*
2896 * Before marking dead items unused, check whether the page will become
2897 * all-visible once that change is applied. This lets us reap the tuples
2898 * and mark the page all-visible within the same critical section,
2899 * enabling both changes to be emitted in a single WAL record. Since the
2900 * visibility checks may perform I/O and allocate memory, they must be
2901 * done outside the critical section.
2902 */
2903 if (heap_page_would_be_all_visible(vacrel->rel, buffer,
2904 vacrel->cutoffs.OldestXmin,
2905 deadoffsets, num_offsets,
2906 &all_frozen, &visibility_cutoff_xid,
2907 &vacrel->offnum))
2908 {
2909 vmflags |= VISIBILITYMAP_ALL_VISIBLE;
2910 if (all_frozen)
2911 {
2912 vmflags |= VISIBILITYMAP_ALL_FROZEN;
2913 Assert(!TransactionIdIsValid(visibility_cutoff_xid));
2914 }
2915
2916 /*
2917 * Take the lock on the vmbuffer before entering a critical section.
2918 * The heap page lock must also be held while updating the VM to
2919 * ensure consistency.
2920 */
2921 LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
2922 }
2923
2924 START_CRIT_SECTION();
2925
2926 for (int i = 0; i < num_offsets; i++)
2927 {
2928 ItemId itemid;
2929 OffsetNumber toff = deadoffsets[i];
2930
2931 itemid = PageGetItemId(page, toff);
2932
2933 Assert(ItemIdIsDead(itemid) && !ItemIdHasStorage(itemid));
2934 ItemIdSetUnused(itemid);
2935 unused[nunused++] = toff;
2936 }
2937
2938 Assert(nunused > 0);
2939
2940 /* Attempt to truncate line pointer array now */
2941 PageTruncateLinePointerArray(page);
2942
2943 if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
2944 {
2945 /*
2946 * The page is guaranteed to have had dead line pointers, so we always
2947 * set PD_ALL_VISIBLE.
2948 */
2949 PageSetAllVisible(page);
2950 PageClearPrunable(page);
2951 visibilitymap_set_vmbits(blkno,
2952 vmbuffer, vmflags,
2953 vacrel->rel->rd_locator);
2954 conflict_xid = visibility_cutoff_xid;
2955 }
2956
2957 /*
2958 * Mark buffer dirty before we write WAL.
2959 */
2960 MarkBufferDirty(buffer);
2961
2962 /* XLOG stuff */
2963 if (RelationNeedsWAL(vacrel->rel))
2964 {
2965 log_heap_prune_and_freeze(vacrel->rel, buffer,
2966 vmflags != 0 ? vmbuffer : InvalidBuffer,
2967 vmflags,
2968 conflict_xid,
2969 false, /* no cleanup lock required */
2970 PRUNE_VACUUM_CLEANUP,
2971 NULL, 0, /* frozen */
2972 NULL, 0, /* redirected */
2973 NULL, 0, /* dead */
2974 unused, nunused);
2975 }
2976
2978
2980 {
2981 /* Count the newly set VM page for logging */
2982 LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
2983 vacrel->new_all_visible_pages++;
2984 if (all_frozen)
2985 vacrel->new_all_visible_all_frozen_pages++;
2986 }
2987
2988 /* Revert to the previous phase information for error traceback */
2989 restore_vacuum_error_info(vacrel, &saved_err_info);
2990}
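/*
 * Illustrative summary (not part of vacuumlazy.c): the ordering above
 * follows the usual buffer-update protocol, roughly:
 *
 *   1. visibility checks (may do I/O or allocate) -- outside the critical section
 *   2. lock the VM buffer if its bits will change
 *   3. START_CRIT_SECTION()
 *   4. modify the page: ItemIdSetUnused(), line pointer array truncation,
 *      PD_ALL_VISIBLE, and the VM bits
 *   5. MarkBufferDirty()
 *   6. log_heap_prune_and_freeze() -- one WAL record covering all of it
 *   7. END_CRIT_SECTION()
 */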
2991
2992/*
2993 * Trigger the failsafe to avoid wraparound failure when vacrel table has a
2994 * relfrozenxid and/or relminmxid that is dangerously far in the past.
2995 * Triggering the failsafe makes the ongoing VACUUM bypass any further index
2996 * vacuuming and heap vacuuming. Truncating the heap is also bypassed.
2997 *
2998 * Any remaining work (work that VACUUM cannot just bypass) is typically sped
2999 * up when the failsafe triggers. VACUUM stops applying any cost-based delay
3000 * that it started out with.
3001 *
3002 * Returns true when failsafe has been triggered.
3003 */
3004static bool
3005lazy_check_wraparound_failsafe(LVRelState *vacrel)
3006{
3007 /* Don't warn more than once per VACUUM */
3008 if (VacuumFailsafeActive)
3009 return true;
3010
3011 if (unlikely(vacuum_xid_failsafe_check(&vacrel->cutoffs)))
3012 {
3013 const int progress_index[] = {
3014 PROGRESS_VACUUM_INDEXES_TOTAL,
3015 PROGRESS_VACUUM_INDEXES_PROCESSED
3017 };
3019
3020 VacuumFailsafeActive = true;
3021
3022 /*
3023 * Abandon use of a buffer access strategy to allow use of all of
3024 * shared buffers. We assume the caller who allocated the memory for
3025 * the BufferAccessStrategy will free it.
3026 */
3027 vacrel->bstrategy = NULL;
3028
3029 /* Disable index vacuuming, index cleanup, and heap rel truncation */
3030 vacrel->do_index_vacuuming = false;
3031 vacrel->do_index_cleanup = false;
3032 vacrel->do_rel_truncate = false;
3033
3034 /* Reset the progress counters and set the failsafe mode */
3036
3038 (errmsg("bypassing nonessential maintenance of table \"%s.%s.%s\" as a failsafe after %d index scans",
3039 vacrel->dbname, vacrel->relnamespace, vacrel->relname,
3040 vacrel->num_index_scans),
3041 errdetail("The table's relfrozenxid or relminmxid is too far in the past."),
3042 errhint("Consider increasing configuration parameter \"maintenance_work_mem\" or \"autovacuum_work_mem\".\n"
3043 "You might also need to consider other ways for VACUUM to keep up with the allocation of transaction IDs.")));
3044
3045 /* Stop applying cost limits from this point on */
3046 VacuumCostActive = false;
3047 VacuumCostBalance = 0;
3048
3049 return true;
3050 }
3051
3052 return false;
3053}
3054
3055/*
3056 * lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
3057 */
3058static void
3059lazy_cleanup_all_indexes(LVRelState *vacrel)
3060{
3061 double reltuples = vacrel->new_rel_tuples;
3062 bool estimated_count = vacrel->scanned_pages < vacrel->rel_pages;
3063 const int progress_start_index[] = {
3064 PROGRESS_VACUUM_PHASE,
3065 PROGRESS_VACUUM_INDEXES_TOTAL
3066 };
3067 const int progress_end_index[] = {
3068 PROGRESS_VACUUM_INDEXES_TOTAL,
3069 PROGRESS_VACUUM_INDEXES_PROCESSED
3070 };
3071 int64 progress_start_val[2];
3072 int64 progress_end_val[2] = {0, 0};
3073
3074 Assert(vacrel->do_index_cleanup);
3075 Assert(vacrel->nindexes > 0);
3076
3077 /*
3078 * Report that we are now cleaning up indexes and the number of indexes to
3079 * cleanup.
3080 */
3081 progress_start_val[0] = PROGRESS_VACUUM_PHASE_INDEX_CLEANUP;
3082 progress_start_val[1] = vacrel->nindexes;
3083 pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
3084
3085 if (!ParallelVacuumIsActive(vacrel))
3086 {
3087 for (int idx = 0; idx < vacrel->nindexes; idx++)
3088 {
3089 Relation indrel = vacrel->indrels[idx];
3090 IndexBulkDeleteResult *istat = vacrel->indstats[idx];
3091
3092 vacrel->indstats[idx] =
3093 lazy_cleanup_one_index(indrel, istat, reltuples,
3094 estimated_count, vacrel);
3095
3096 /* Report the number of indexes cleaned up */
3097 pgstat_progress_update_param(PROGRESS_VACUUM_INDEXES_PROCESSED,
3098 idx + 1);
3099 }
3100 }
3101 else
3102 {
3103 /* Outsource everything to parallel variant */
3104 parallel_vacuum_cleanup_all_indexes(vacrel->pvs, reltuples,
3105 vacrel->num_index_scans,
3106 estimated_count);
3107 }
3108
3109 /* Reset the progress counters */
3110 pgstat_progress_update_multi_param(2, progress_end_index, progress_end_val);
3111}
3112
3113/*
3114 * lazy_vacuum_one_index() -- vacuum index relation.
3115 *
3116 * Delete all the index tuples containing a TID collected in
3117 * vacrel->dead_items. Also update running statistics. Exact
3118 * details depend on index AM's ambulkdelete routine.
3119 *
3120 * reltuples is the number of heap tuples to be passed to the
3121 * bulkdelete callback. It's always assumed to be estimated.
3122 * See indexam.sgml for more info.
3123 *
3124 * Returns bulk delete stats derived from input stats
3125 */
3126static IndexBulkDeleteResult *
3127lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
3128 double reltuples, LVRelState *vacrel)
3129{
3130 IndexVacuumInfo ivinfo;
3131 LVSavedErrInfo saved_err_info;
3132
3133 ivinfo.index = indrel;
3134 ivinfo.heaprel = vacrel->rel;
3135 ivinfo.analyze_only = false;
3136 ivinfo.report_progress = false;
3137 ivinfo.estimated_count = true;
3138 ivinfo.message_level = DEBUG2;
3139 ivinfo.num_heap_tuples = reltuples;
3140 ivinfo.strategy = vacrel->bstrategy;
3141
3142 /*
3143 * Update error traceback information.
3144 *
3145 * The index name is saved during this phase and restored immediately
3146 * after this phase. See vacuum_error_callback.
3147 */
3148 Assert(vacrel->indname == NULL);
3149 vacrel->indname = pstrdup(RelationGetRelationName(indrel));
3150 update_vacuum_error_info(vacrel, &saved_err_info,
3151 VACUUM_ERRCB_PHASE_VACUUM_INDEX,
3152 InvalidBlockNumber, InvalidOffsetNumber);
3153
3154 /* Do bulk deletion */
3155 istat = vac_bulkdel_one_index(&ivinfo, istat, vacrel->dead_items,
3156 vacrel->dead_items_info);
3157
3158 /* Revert to the previous phase information for error traceback */
3159 restore_vacuum_error_info(vacrel, &saved_err_info);
3160 pfree(vacrel->indname);
3161 vacrel->indname = NULL;
3162
3163 return istat;
3164}
3165
3166/*
3167 * lazy_cleanup_one_index() -- do post-vacuum cleanup for index relation.
3168 *
3169 * Calls index AM's amvacuumcleanup routine. reltuples is the number
3170 * of heap tuples and estimated_count is true if reltuples is an
3171 * estimated value. See indexam.sgml for more info.
3172 *
3173 * Returns bulk delete stats derived from input stats
3174 */
3175static IndexBulkDeleteResult *
3176lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
3177 double reltuples, bool estimated_count,
3178 LVRelState *vacrel)
3179{
3180 IndexVacuumInfo ivinfo;
3181 LVSavedErrInfo saved_err_info;
3182
3183 ivinfo.index = indrel;
3184 ivinfo.heaprel = vacrel->rel;
3185 ivinfo.analyze_only = false;
3186 ivinfo.report_progress = false;
3187 ivinfo.estimated_count = estimated_count;
3188 ivinfo.message_level = DEBUG2;
3189
3190 ivinfo.num_heap_tuples = reltuples;
3191 ivinfo.strategy = vacrel->bstrategy;
3192
3193 /*
3194 * Update error traceback information.
3195 *
3196 * The index name is saved during this phase and restored immediately
3197 * after this phase. See vacuum_error_callback.
3198 */
3199 Assert(vacrel->indname == NULL);
3200 vacrel->indname = pstrdup(RelationGetRelationName(indrel));
3201 update_vacuum_error_info(vacrel, &saved_err_info,
3202 VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
3203 InvalidBlockNumber, InvalidOffsetNumber);
3204
3205 istat = vac_cleanup_one_index(&ivinfo, istat);
3206
3207 /* Revert to the previous phase information for error traceback */
3208 restore_vacuum_error_info(vacrel, &saved_err_info);
3209 pfree(vacrel->indname);
3210 vacrel->indname = NULL;
3211
3212 return istat;
3213}
3214
3215/*
3216 * should_attempt_truncation - should we attempt to truncate the heap?
3217 *
3218 * Don't even think about it unless we have a shot at releasing a goodly
3219 * number of pages. Otherwise, the time taken isn't worth it, mainly because
3220 * an AccessExclusive lock must be replayed on any hot standby, where it can
3221 * be particularly disruptive.
3222 *
3223 * Also don't attempt it if wraparound failsafe is in effect. The entire
3224 * system might be refusing to allocate new XIDs at this point. The system
3225 * definitely won't return to normal unless and until VACUUM actually advances
3226 * the oldest relfrozenxid -- which hasn't happened for target rel just yet.
3227 * If lazy_truncate_heap attempted to acquire an AccessExclusiveLock to
3228 * truncate the table under these circumstances, an XID exhaustion error might
3229 * make it impossible for VACUUM to fix the underlying XID exhaustion problem.
3230 * There is very little chance of truncation working out when the failsafe is
3231 * in effect in any case. lazy_scan_prune makes the optimistic assumption
3232 * that any LP_DEAD items it encounters will always be LP_UNUSED by the time
3233 * we're called.
3234 */
3235static bool
3236should_attempt_truncation(LVRelState *vacrel)
3237{
3238 BlockNumber possibly_freeable;
3239
3240 if (!vacrel->do_rel_truncate || VacuumFailsafeActive)
3241 return false;
3242
3243 possibly_freeable = vacrel->rel_pages - vacrel->nonempty_pages;
3244 if (possibly_freeable > 0 &&
3245 (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
3246 possibly_freeable >= vacrel->rel_pages / REL_TRUNCATE_FRACTION))
3247 return true;
3248
3249 return false;
3250}
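/*
 * Worked example (not part of vacuumlazy.c): assuming the historical
 * thresholds REL_TRUNCATE_MINIMUM = 1000 and REL_TRUNCATE_FRACTION = 16, a
 * 64,000-page relation attempts truncation once at least 1,000 tail pages
 * are empty: the REL_TRUNCATE_MINIMUM arm is satisfied long before the
 * fractional arm (64,000 / 16 = 4,000 pages) would be.
 */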
3251
3252/*
3253 * lazy_truncate_heap - try to truncate off any empty pages at the end
3254 */
3255static void
3256lazy_truncate_heap(LVRelState *vacrel)
3257{
3258 BlockNumber orig_rel_pages = vacrel->rel_pages;
3259 BlockNumber new_rel_pages;
3260 bool lock_waiter_detected;
3261 int lock_retry;
3262
3263 /* Report that we are now truncating */
3264 pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
3265 PROGRESS_VACUUM_PHASE_TRUNCATE);
3266
3267 /* Update error traceback information one last time */
3268 update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_TRUNCATE,
3269 vacrel->nonempty_pages, InvalidOffsetNumber);
3270
3271 /*
3272 * Loop until no more truncating can be done.
3273 */
3274 do
3275 {
3276 /*
3277 * We need full exclusive lock on the relation in order to do
3278 * truncation. If we can't get it, give up rather than waiting --- we
3279 * don't want to block other backends, and we don't want to deadlock
3280 * (which is quite possible considering we already hold a lower-grade
3281 * lock).
3282 */
3283 lock_waiter_detected = false;
3284 lock_retry = 0;
3285 while (true)
3286 {
3287 if (ConditionalLockRelation(vacrel->rel, AccessExclusiveLock))
3288 break;
3289
3290 /*
3291 * Check for interrupts while trying to (re-)acquire the exclusive
3292 * lock.
3293 */
3294 CHECK_FOR_INTERRUPTS();
3295
3296 if (++lock_retry > (VACUUM_TRUNCATE_LOCK_TIMEOUT /
3297 VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL))
3298 {
3299 /*
3300 * We failed to establish the lock in the specified number of
3301 * retries. This means we give up truncating.
3302 */
3303 ereport(vacrel->verbose ? INFO : DEBUG2,
3304 (errmsg("\"%s\": stopping truncate due to conflicting lock request",
3305 vacrel->relname)));
3306 return;
3307 }
3308
3309 (void) WaitLatch(MyLatch,
3310 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
3311 VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL,
3312 WAIT_EVENT_VACUUM_TRUNCATE);
3313 ResetLatch(MyLatch);
3314 }
3315
3316 /*
3317 * Now that we have exclusive lock, look to see if the rel has grown
3318 * whilst we were vacuuming with non-exclusive lock. If so, give up;
3319 * the newly added pages presumably contain non-deletable tuples.
3320 */
3321 new_rel_pages = RelationGetNumberOfBlocks(vacrel->rel);
3322 if (new_rel_pages != orig_rel_pages)
3323 {
3324 /*
3325 * Note: we intentionally don't update vacrel->rel_pages with the
3326 * new rel size here. If we did, it would amount to assuming that
3327 * the new pages are empty, which is unlikely. Leaving the numbers
3328 * alone amounts to assuming that the new pages have the same
3329 * tuple density as existing ones, which is less unlikely.
3330 */
3331 UnlockRelation(vacrel->rel, AccessExclusiveLock);
3332 return;
3333 }
3334
3335 /*
3336 * Scan backwards from the end to verify that the end pages actually
3337 * contain no tuples. This is *necessary*, not optional, because
3338 * other backends could have added tuples to these pages whilst we
3339 * were vacuuming.
3340 */
3341 new_rel_pages = count_nondeletable_pages(vacrel, &lock_waiter_detected);
3342 vacrel->blkno = new_rel_pages;
3343
3344 if (new_rel_pages >= orig_rel_pages)
3345 {
3346 /* can't do anything after all */
3347 UnlockRelation(vacrel->rel, AccessExclusiveLock);
3348 return;
3349 }
3350
3351 /*
3352 * Okay to truncate.
3353 */
3354 RelationTruncate(vacrel->rel, new_rel_pages);
3355
3356 /*
3357 * We can release the exclusive lock as soon as we have truncated.
3358 * Other backends can't safely access the relation until they have
3359 * processed the smgr invalidation that smgrtruncate sent out ... but
3360 * that should happen as part of standard invalidation processing once
3361 * they acquire lock on the relation.
3362 */
3363 UnlockRelation(vacrel->rel, AccessExclusiveLock);
3364
3365 /*
3366 * Update statistics. Here, it *is* correct to adjust rel_pages
3367 * without also touching reltuples, since the tuple count wasn't
3368 * changed by the truncation.
3369 */
3370 vacrel->removed_pages += orig_rel_pages - new_rel_pages;
3371 vacrel->rel_pages = new_rel_pages;
3372
3373 ereport(vacrel->verbose ? INFO : DEBUG2,
3374 (errmsg("table \"%s\": truncated %u to %u pages",
3375 vacrel->relname,
3376 orig_rel_pages, new_rel_pages)));
3377
3378 } while (new_rel_pages > vacrel->nonempty_pages && lock_waiter_detected);
3379}
3380
3381/*
3382 * Rescan end pages to verify that they are (still) empty of tuples.
3383 *
3384 * Returns number of nondeletable pages (last nonempty page + 1).
3385 */
3386static BlockNumber
3387count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
3388{
3389 StaticAssertStmt((PREFETCH_SIZE & (PREFETCH_SIZE - 1)) == 0,
3390 "prefetch size must be power of 2");
3391
3392 BlockNumber blkno;
3393 BlockNumber prefetchedUntil;
3394 instr_time starttime;
3395
3396 /* Initialize the starttime if we check for conflicting lock requests */
3397 INSTR_TIME_SET_CURRENT(starttime);
3398
3399 /*
3400 * Start checking blocks at what we believe relation end to be and move
3401 * backwards. (Strange coding of loop control is needed because blkno is
3402 * unsigned.) To make the scan faster, we prefetch a few blocks at a time
3403 * in forward direction, so that OS-level readahead can kick in.
3404 */
3405 blkno = vacrel->rel_pages;
3406 prefetchedUntil = InvalidBlockNumber;
3407 while (blkno > vacrel->nonempty_pages)
3408 {
3409 Buffer buf;
3410 Page page;
3411 OffsetNumber offnum,
3412 maxoff;
3413 bool hastup;
3414
3415 /*
3416 * Check if another process requests a lock on our relation. We are
3417 * holding an AccessExclusiveLock here, so they will be waiting. We
3418 * only do this once per VACUUM_TRUNCATE_LOCK_CHECK_INTERVAL, and we
3419 * only check if that interval has elapsed once every 32 blocks to
3420 * keep the number of system calls and actual shared lock table
3421 * lookups to a minimum.
3422 */
3423 if ((blkno % 32) == 0)
3424 {
3425 instr_time currenttime;
3426 instr_time elapsed;
3427
3428 INSTR_TIME_SET_CURRENT(currenttime);
3429 elapsed = currenttime;
3430 INSTR_TIME_SUBTRACT(elapsed, starttime);
3431 if ((INSTR_TIME_GET_MICROSEC(elapsed) / 1000)
3432 >= VACUUM_TRUNCATE_LOCK_CHECK_INTERVAL)
3433 {
3434 if (LockHasWaitersRelation(vacrel->rel, AccessExclusiveLock))
3435 {
3436 ereport(vacrel->verbose ? INFO : DEBUG2,
3437 (errmsg("table \"%s\": suspending truncate due to conflicting lock request",
3438 vacrel->relname)));
3439
3440 *lock_waiter_detected = true;
3441 return blkno;
3442 }
3443 starttime = currenttime;
3444 }
3445 }
3446
3447 /*
3448 * We don't insert a vacuum delay point here, because we have an
3449 * exclusive lock on the table which we want to hold for as short a
3450 * time as possible. We still need to check for interrupts however.
3451 */
3452 CHECK_FOR_INTERRUPTS();
3453
3454 blkno--;
3455
3456 /* If we haven't prefetched this lot yet, do so now. */
3457 if (prefetchedUntil > blkno)
3458 {
3459 BlockNumber prefetchStart;
3460 BlockNumber pblkno;
3461
3462 prefetchStart = blkno & ~(PREFETCH_SIZE - 1);
3463 for (pblkno = prefetchStart; pblkno <= blkno; pblkno++)
3464 {
3465 PrefetchBuffer(vacrel->rel, MAIN_FORKNUM, pblkno);
3466 CHECK_FOR_INTERRUPTS();
3467 }
3468 prefetchedUntil = prefetchStart;
3469 }
3470
3471 buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
3472 vacrel->bstrategy);
3473
3474 /* In this phase we only need shared access to the buffer */
3475 LockBuffer(buf, BUFFER_LOCK_SHARE);
3476
3477 page = BufferGetPage(buf);
3478
3479 if (PageIsNew(page) || PageIsEmpty(page))
3480 {
3481 UnlockReleaseBuffer(buf);
3482 continue;
3483 }
3484
3485 hastup = false;
3486 maxoff = PageGetMaxOffsetNumber(page);
3487 for (offnum = FirstOffsetNumber;
3488 offnum <= maxoff;
3489 offnum = OffsetNumberNext(offnum))
3490 {
3491 ItemId itemid;
3492
3493 itemid = PageGetItemId(page, offnum);
3494
3495 /*
3496 * Note: any non-unused item should be taken as a reason to keep
3497 * this page. Even an LP_DEAD item makes truncation unsafe, since
3498 * we must not have cleaned out its index entries.
3499 */
3500 if (ItemIdIsUsed(itemid))
3501 {
3502 hastup = true;
3503 break; /* can stop scanning */
3504 }
3505 } /* scan along page */
3506
3507 UnlockReleaseBuffer(buf);
3508
3509 /* Done scanning if we found a tuple here */
3510 if (hastup)
3511 return blkno + 1;
3512 }
3513
3514 /*
3515 * If we fall out of the loop, all the previously-thought-to-be-empty
3516 * pages still are; we need not bother to look at the last known-nonempty
3517 * page.
3518 */
3519 return vacrel->nonempty_pages;
3520}
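/*
 * Worked example (not part of vacuumlazy.c): assuming PREFETCH_SIZE is 32,
 * the mask computation above rounds the scan position down to a
 * prefetch-aligned boundary. For blkno = 1000, prefetchStart =
 * 1000 & ~31 = 992, so blocks 992..1000 are prefetched in forward order even
 * though the verification scan itself moves backwards.
 */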
3521
3522/*
3523 * Allocate dead_items and dead_items_info (either using palloc, or in dynamic
3524 * shared memory). Sets both in vacrel for caller.
3525 *
3526 * Also handles parallel initialization as part of allocating dead_items in
3527 * DSM when required.
3528 */
3529static void
3530dead_items_alloc(LVRelState *vacrel, int nworkers)
3531{
3532 VacDeadItemsInfo *dead_items_info;
3533 int vac_work_mem = AmAutoVacuumWorkerProcess() &&
3534 autovacuum_work_mem != -1 ?
3535 autovacuum_work_mem : maintenance_work_mem;
3536
3537 /*
3538 * Initialize state for a parallel vacuum. As of now, only one worker can
3539 * be used for an index, so we invoke parallelism only if there are at
3540 * least two indexes on a table.
3541 */
3542 if (nworkers >= 0 && vacrel->nindexes > 1 && vacrel->do_index_vacuuming)
3543 {
3544 /*
3545 * Since parallel workers cannot access data in temporary tables, we
3546 * can't perform parallel vacuum on them.
3547 */
3548 if (RelationUsesLocalBuffers(vacrel->rel))
3549 {
3550 /*
3551 * Give warning only if the user explicitly tries to perform a
3552 * parallel vacuum on the temporary table.
3553 */
3554 if (nworkers > 0)
3555 ereport(WARNING,
3556 (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
3557 vacrel->relname)));
3558 }
3559 else
3560 vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
3561 vacrel->nindexes, nworkers,
3562 vac_work_mem,
3563 vacrel->verbose ? INFO : DEBUG2,
3564 vacrel->bstrategy);
3565
3566 /*
3567 * If parallel mode started, dead_items and dead_items_info spaces are
3568 * allocated in DSM.
3569 */
3570 if (ParallelVacuumIsActive(vacrel))
3571 {
3572 vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs,
3573 &vacrel->dead_items_info);
3574 return;
3575 }
3576 }
3577
3578 /*
3579 * Serial VACUUM case. Allocate both dead_items and dead_items_info
3580 * locally.
3581 */
3582
3583 dead_items_info = palloc_object(VacDeadItemsInfo);
3584 dead_items_info->max_bytes = vac_work_mem * (Size) 1024;
3585 dead_items_info->num_items = 0;
3586 vacrel->dead_items_info = dead_items_info;
3587
3588 vacrel->dead_items = TidStoreCreateLocal(dead_items_info->max_bytes, true);
3589}
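/*
 * Worked example (not part of vacuumlazy.c): vac_work_mem is in kilobytes,
 * so with the default maintenance_work_mem of 64MB the TID store above is
 * created with max_bytes = 65536 * 1024 bytes = 64MB. Autovacuum workers use
 * autovacuum_work_mem instead when it is set (i.e. != -1).
 */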
3590
3591/*
3592 * Add the given block number and offset numbers to dead_items.
3593 */
3594static void
3595dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *offsets,
3596 int num_offsets)
3597{
3598 const int prog_index[2] = {
3599 PROGRESS_VACUUM_NUM_DEAD_ITEM_IDS,
3600 PROGRESS_VACUUM_DEAD_TUPLE_BYTES
3601 };
3602 int64 prog_val[2];
3603
3604 TidStoreSetBlockOffsets(vacrel->dead_items, blkno, offsets, num_offsets);
3605 vacrel->dead_items_info->num_items += num_offsets;
3606
3607 /* update the progress information */
3608 prog_val[0] = vacrel->dead_items_info->num_items;
3609 prog_val[1] = TidStoreMemoryUsage(vacrel->dead_items);
3610 pgstat_progress_update_multi_param(2, prog_index, prog_val);
3611}
3612
3613/*
3614 * Forget all collected dead items.
3615 */
3616static void
3617dead_items_reset(LVRelState *vacrel)
3618{
3619 /* Update statistics for dead items */
3620 vacrel->num_dead_items_resets++;
3621 vacrel->total_dead_items_bytes += TidStoreMemoryUsage(vacrel->dead_items);
3622
3623 if (ParallelVacuumIsActive(vacrel))
3624 {
3625 parallel_vacuum_reset_dead_items(vacrel->pvs);
3624 {
3626 vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs,
3627 &vacrel->dead_items_info);
3628 return;
3629 }
3630
3631 /* Recreate the tidstore with the same max_bytes limitation */
3632 TidStoreDestroy(vacrel->dead_items);
3633 vacrel->dead_items = TidStoreCreateLocal(vacrel->dead_items_info->max_bytes, true);
3634
3635 /* Reset the counter */
3636 vacrel->dead_items_info->num_items = 0;
3637}
3638
3639/*
3640 * Perform cleanup for resources allocated in dead_items_alloc().
3641 */
3642static void
3643dead_items_cleanup(LVRelState *vacrel)
3644{
3645 if (!ParallelVacuumIsActive(vacrel))
3646 {
3647 /* Don't bother with pfree here */
3648 return;
3649 }
3650
3651 /* End parallel mode */
3652 parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
3653 vacrel->pvs = NULL;
3654}
3655
3656#ifdef USE_ASSERT_CHECKING
3657
3658/*
3659 * Wrapper for heap_page_would_be_all_visible() which can be used by callers
3660 * that expect no LP_DEAD items on the page. Currently assert-only, but there is no
3661 * reason not to use it outside of asserts.
3662 */
3663static bool
3664heap_page_is_all_visible(Relation rel, Buffer buf,
3665 TransactionId OldestXmin,
3666 bool *all_frozen,
3667 TransactionId *visibility_cutoff_xid,
3668 OffsetNumber *logging_offnum)
3669{
3670
3671 return heap_page_would_be_all_visible(rel, buf,
3672 OldestXmin,
3673 NULL, 0,
3674 all_frozen,
3675 visibility_cutoff_xid,
3676 logging_offnum);
3677}
3678#endif
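
A sketch of the kind of assert-only call site the wrapper serves, assuming the caller holds at least a share lock on buf and expects the page to be all-visible with no LP_DEAD items (illustrative only; the real call sites live elsewhere in this file):

#ifdef USE_ASSERT_CHECKING
static void
assert_page_all_visible_sketch(LVRelState *vacrel, Buffer buf)
{
    bool        all_frozen;
    TransactionId cutoff_xid;

    Assert(heap_page_is_all_visible(vacrel->rel, buf,
                                    vacrel->cutoffs.OldestXmin,
                                    &all_frozen, &cutoff_xid,
                                    &vacrel->offnum));
}
#endif
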
3679
3680/*
3681 * Check whether the heap page in buf is all-visible except for the dead
3682 * tuples referenced in the deadoffsets array.
3683 *
3684 * Vacuum uses this to check if a page would become all-visible after reaping
3685 * known dead tuples. This function does not remove the dead items.
3686 *
3687 * This cannot be called in a critical section, as the visibility checks may
3688 * perform IO and allocate memory.
3689 *
3690 * Returns true if the page is all-visible other than the provided
3691 * deadoffsets and false otherwise.
3692 *
3693 * OldestXmin is used to determine visibility.
3694 *
3695 * Output parameters:
3696 *
3697 * - *all_frozen: true if every tuple on the page is frozen
3698 * - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
3699 * - *logging_offnum: OffsetNumber of current tuple being processed;
3700 * used by vacuum's error callback system.
3701 *
3702 * Callers looking to verify that the page is already all-visible can call
3703 * heap_page_is_all_visible().
3704 *
3705 * This logic is closely related to heap_prune_record_unchanged_lp_normal().
3706 * If you modify this function, ensure consistency with that code. An
3707 * assertion cross-checks that both remain in agreement. Do not introduce new
3708 * side-effects.
3709 */
3710static bool
3711heap_page_would_be_all_visible(Relation rel, Buffer buf,
3712 TransactionId OldestXmin,
3713 OffsetNumber *deadoffsets,
3714 int ndeadoffsets,
3715 bool *all_frozen,
3716 TransactionId *visibility_cutoff_xid,
3717 OffsetNumber *logging_offnum)
3718{
3719 Page page = BufferGetPage(buf);
3720 BlockNumber blockno = BufferGetBlockNumber(buf);
3721 OffsetNumber offnum,
3722 maxoff;
3723 bool all_visible = true;
3724 int matched_dead_count = 0;
3725
3726 *visibility_cutoff_xid = InvalidTransactionId;
3727 *all_frozen = true;
3728
3729 Assert(ndeadoffsets == 0 || deadoffsets);
3730
3731#ifdef USE_ASSERT_CHECKING
3732 /* Confirm input deadoffsets[] is strictly sorted */
3733 if (ndeadoffsets > 1)
3734 {
3735 for (int i = 1; i < ndeadoffsets; i++)
3736 Assert(deadoffsets[i - 1] < deadoffsets[i]);
3737 }
3738#endif
3739
3740 maxoff = PageGetMaxOffsetNumber(page);
3741 for (offnum = FirstOffsetNumber;
3742 offnum <= maxoff && all_visible;
3743 offnum = OffsetNumberNext(offnum))
3744 {
3745 ItemId itemid;
3746 HeapTupleData tuple;
3748
3749 /*
3750 * Set the offset number so that we can display it along with any
3751 * error that occurred while processing this tuple.
3752 */
3753 *logging_offnum = offnum;
3754 itemid = PageGetItemId(page, offnum);
3755
3756 /* Unused or redirect line pointers are of no interest */
3757 if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
3758 continue;
3759
3760 ItemPointerSet(&(tuple.t_self), blockno, offnum);
3761
3762 /*
3763 * Dead line pointers can have index pointers pointing to them, so
3764 * they can't be treated as visible.
3765 */
3766 if (ItemIdIsDead(itemid))
3767 {
3768 if (!deadoffsets ||
3769 matched_dead_count >= ndeadoffsets ||
3770 deadoffsets[matched_dead_count] != offnum)
3771 {
3772 *all_frozen = all_visible = false;
3773 break;
3774 }
3775 matched_dead_count++;
3776 continue;
3777 }
3778
3779 Assert(ItemIdIsNormal(itemid));
3780
3781 tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
3782 tuple.t_len = ItemIdGetLength(itemid);
3783 tuple.t_tableOid = RelationGetRelid(rel);
3784
3785 /* Visibility checks may do IO or allocate memory */
3787 switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
3788 {
3789 case HEAPTUPLE_LIVE:
3790 {
3791 TransactionId xmin;
3792
3793 /* Check comments in lazy_scan_prune. */
3794 if (!HeapTupleHeaderXminCommitted(tuple.t_data))
3795 {
3796 all_visible = false;
3797 *all_frozen = false;
3798 break;
3799 }
3800
3801 /*
3802 * The inserter definitely committed. But is it old enough
3803 * that everyone sees it as committed?
3804 */
3805 xmin = HeapTupleHeaderGetXmin(tuple.t_data);
3806 if (!TransactionIdPrecedes(xmin, OldestXmin))
3807 {
3808 all_visible = false;
3809 *all_frozen = false;
3810 break;
3811 }
3812
3813 /* Track newest xmin on page. */
3814 if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
3815 TransactionIdIsNormal(xmin))
3816 *visibility_cutoff_xid = xmin;
3817
3818 /* Check whether this tuple is already frozen or not */
3819 if (all_visible && *all_frozen &&
3820 heap_tuple_needs_eventual_freeze(tuple.t_data))
3821 *all_frozen = false;
3822 }
3823 break;
3824
3825 case HEAPTUPLE_DEAD:
3826 case HEAPTUPLE_RECENTLY_DEAD:
3827 case HEAPTUPLE_INSERT_IN_PROGRESS:
3828 case HEAPTUPLE_DELETE_IN_PROGRESS:
3829 {
3830 all_visible = false;
3831 *all_frozen = false;
3832 break;
3833 }
3834 default:
3835 elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
3836 break;
3837 }
3838 } /* scan along page */
3839
3840 /* Clear the offset information once we have processed the given page. */
3841 *logging_offnum = InvalidOffsetNumber;
3842
3843 return all_visible;
3844}
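
To make the intended call pattern concrete, a simplified sketch: given the sorted LP_DEAD offsets vacuum is about to reap from a page, ask whether the page could be marked all-visible afterwards. The wrapper below is illustrative; the real decision is made inside lazy_vacuum_heap_page():

static bool
would_page_become_all_visible_sketch(LVRelState *vacrel, Buffer buf,
                                     OffsetNumber *deadoffsets, int ndead,
                                     bool *all_frozen,
                                     TransactionId *cutoff_xid)
{
    /* deadoffsets must be sorted ascending, as asserted above */
    return heap_page_would_be_all_visible(vacrel->rel, buf,
                                          vacrel->cutoffs.OldestXmin,
                                          deadoffsets, ndead,
                                          all_frozen, cutoff_xid,
                                          &vacrel->offnum);
}
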
3845
3846/*
3847 * Update index statistics in pg_class if the statistics are accurate.
3848 */
3849static void
3850update_relstats_all_indexes(LVRelState *vacrel)
3851{
3852 Relation *indrels = vacrel->indrels;
3853 int nindexes = vacrel->nindexes;
3854 IndexBulkDeleteResult **indstats = vacrel->indstats;
3855
3856 Assert(vacrel->do_index_cleanup);
3857
3858 for (int idx = 0; idx < nindexes; idx++)
3859 {
3860 Relation indrel = indrels[idx];
3861 IndexBulkDeleteResult *istat = indstats[idx];
3862
3863 if (istat == NULL || istat->estimated_count)
3864 continue;
3865
3866 /* Update index statistics */
3867 vac_update_relstats(indrel,
3868 istat->num_pages,
3869 istat->num_index_tuples,
3870 0, 0,
3871 false,
3872 InvalidTransactionId,
3873 InvalidMultiXactId,
3874 NULL, NULL, false);
3875 }
3876}
3877
3878/*
3879 * Error context callback for errors occurring during vacuum. The error
3880 * context messages for index phases should match the messages set in parallel
3881 * vacuum. If you change this function for those phases, change
3882 * parallel_vacuum_error_callback() as well.
3883 */
3884static void
3885vacuum_error_callback(void *arg)
3886{
3887 LVRelState *errinfo = arg;
3888
3889 switch (errinfo->phase)
3890 {
3891 case VACUUM_ERRCB_PHASE_SCAN_HEAP:
3892 if (BlockNumberIsValid(errinfo->blkno))
3893 {
3894 if (OffsetNumberIsValid(errinfo->offnum))
3895 errcontext("while scanning block %u offset %u of relation \"%s.%s\"",
3896 errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);
3897 else
3898 errcontext("while scanning block %u of relation \"%s.%s\"",
3899 errinfo->blkno, errinfo->relnamespace, errinfo->relname);
3900 }
3901 else
3902 errcontext("while scanning relation \"%s.%s\"",
3903 errinfo->relnamespace, errinfo->relname);
3904 break;
3905
3906 case VACUUM_ERRCB_PHASE_VACUUM_HEAP:
3907 if (BlockNumberIsValid(errinfo->blkno))
3908 {
3909 if (OffsetNumberIsValid(errinfo->offnum))
3910 errcontext("while vacuuming block %u offset %u of relation \"%s.%s\"",
3911 errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);
3912 else
3913 errcontext("while vacuuming block %u of relation \"%s.%s\"",
3914 errinfo->blkno, errinfo->relnamespace, errinfo->relname);
3915 }
3916 else
3917 errcontext("while vacuuming relation \"%s.%s\"",
3918 errinfo->relnamespace, errinfo->relname);
3919 break;
3920
3921 case VACUUM_ERRCB_PHASE_VACUUM_INDEX:
3922 errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
3923 errinfo->indname, errinfo->relnamespace, errinfo->relname);
3924 break;
3925
3926 case VACUUM_ERRCB_PHASE_INDEX_CLEANUP:
3927 errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",
3928 errinfo->indname, errinfo->relnamespace, errinfo->relname);
3929 break;
3930
3931 case VACUUM_ERRCB_PHASE_TRUNCATE:
3932 if (BlockNumberIsValid(errinfo->blkno))
3933 errcontext("while truncating relation \"%s.%s\" to %u blocks",
3934 errinfo->relnamespace, errinfo->relname, errinfo->blkno);
3935 break;
3936
3937 case VACUUM_ERRCB_PHASE_UNKNOWN:
3938 default:
3939 return; /* do nothing; the errinfo may not be
3940 * initialized */
3941 }
3942}
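
For reference, the callback above is installed with the standard ErrorContextCallback pattern. In outline (the real registration is in heap_vacuum_rel(), and vacrel is the caller's LVRelState; this fragment is not compilable in isolation):

    ErrorContextCallback errcallback;

    errcallback.callback = vacuum_error_callback;
    errcallback.arg = vacrel;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    /* ... do the vacuum work; any ereport() now appends a context line ... */

    error_context_stack = errcallback.previous;
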
3943
3944/*
3945 * Updates the information required for the vacuum error callback. This also saves
3946 * the current information, which can later be restored via restore_vacuum_error_info().
3947 */
3948static void
3949update_vacuum_error_info(LVRelState *vacrel, LVSavedErrInfo *saved_vacrel,
3950 int phase, BlockNumber blkno, OffsetNumber offnum)
3951{
3952 if (saved_vacrel)
3953 {
3954 saved_vacrel->offnum = vacrel->offnum;
3955 saved_vacrel->blkno = vacrel->blkno;
3956 saved_vacrel->phase = vacrel->phase;
3957 }
3958
3959 vacrel->blkno = blkno;
3960 vacrel->offnum = offnum;
3961 vacrel->phase = phase;
3962}
3963
3964/*
3965 * Restores the vacuum information saved via a prior call to update_vacuum_error_info.
3966 */
3967static void
3968restore_vacuum_error_info(LVRelState *vacrel,
3969 const LVSavedErrInfo *saved_vacrel)
3970{
3971 vacrel->blkno = saved_vacrel->blkno;
3972 vacrel->offnum = saved_vacrel->offnum;
3973 vacrel->phase = saved_vacrel->phase;
3974}
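
And the save/restore pairing the two helpers above support, in outline; the real uses bracket nested operations such as lazy_vacuum_heap_page(), with blkno being the caller's current block (fragment only, not compilable in isolation):

    LVSavedErrInfo saved_err_info;

    update_vacuum_error_info(vacrel, &saved_err_info,
                             VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
                             InvalidOffsetNumber);

    /* ... process the page; errors report the phase and block set above ... */

    restore_vacuum_error_info(vacrel, &saved_err_info);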