PostgreSQL Source Code (git master)
nodeHashjoin.c
1 /*-------------------------------------------------------------------------
2  *
3  * nodeHashjoin.c
4  * Routines to handle hash join nodes
5  *
6  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
7  * Portions Copyright (c) 1994, Regents of the University of California
8  *
9  *
10  * IDENTIFICATION
11  * src/backend/executor/nodeHashjoin.c
12  *
13  * HASH JOIN
14  *
15  * This is based on the "hybrid hash join" algorithm described briefly at the
16  * following page
17  *
18  * https://en.wikipedia.org/wiki/Hash_join#Hybrid_hash_join
19  *
20  * and in detail in the referenced paper:
21  *
22  * "An Adaptive Hash Join Algorithm for Multiuser Environments"
23  * Hansjörg Zeller; Jim Gray (1990). Proceedings of the 16th VLDB conference.
24  * Brisbane: 186–197.
25  *
26  * If the inner side tuples of a hash join do not fit in memory, the hash join
27  * can be executed in multiple batches.
28  *
29  * If the statistics on the inner side relation are accurate, the planner
30  * chooses a multi-batch strategy and estimates the number of batches.
31  *
32  * The query executor measures the real size of the hashtable and increases the
33  * number of batches if the hashtable grows too large.
34  *
35  * The number of batches is always a power of two, so increasing the number
36  * of batches means doubling it (see the sketch after the #includes below).
37  *
38  * Serial hash join measures batch size lazily -- waiting until it is loading
39  * a batch to determine if it will fit in memory. While inserting tuples into
40  * the hashtable, serial hash join will, if adding a tuple would exceed
41  * work_mem, dump out the hashtable's tuples, reassigning each either to
42  * another batch file or to the current batch resident in the hashtable.
43  *
44  * Parallel hash join, on the other hand, completes all changes to the number
45  * of batches during the build phase. If it increases the number of batches, it
46  * dumps out all the tuples from all batches and reassigns them to entirely new
47  * batch files. Then it checks every batch to ensure it will fit in the space
48  * budget for the query.
49  *
50  * In both parallel and serial hash join, the executor currently makes a best
51  * effort. If a particular batch will not fit in memory, it tries doubling the
52  * number of batches. If, after an increase in the number of batches, there is
53  * a batch which retained all or none of its tuples, the executor disables
54  * growth in the number of batches globally. After growth is disabled, all
55  * batches that would have previously triggered an increase in the number of
56  * batches instead simply exceed the space allowed.
57  *
58  * PARALLELISM
59  *
60  * Hash joins can participate in parallel query execution in several ways. A
61  * parallel-oblivious hash join is one where the node is unaware that it is
62  * part of a parallel plan. In this case, a copy of the inner plan is used to
63  * build a copy of the hash table in every backend, and the outer plan could
64  * either be built from a partial or complete path, so that the results of the
65  * hash join are correspondingly either partial or complete. A parallel-aware
66  * hash join is one that behaves differently, coordinating work between
67  * backends, and appears as Parallel Hash Join in EXPLAIN output. A Parallel
68  * Hash Join always appears with a Parallel Hash node.
69  *
70  * Parallel-aware hash joins use the same per-backend state machine to track
71  * progress through the hash join algorithm as parallel-oblivious hash joins.
72  * In a parallel-aware hash join, there is also a shared state machine that
73  * co-operating backends use to synchronize their local state machines and
74  * program counters. The shared state machine is managed with a Barrier IPC
75  * primitive. When all attached participants arrive at a barrier, the phase
76  * advances and all waiting participants are released.
77  *
78  * When a participant begins working on a parallel hash join, it must first
79  * figure out how much progress has already been made, because participants
80  * don't wait for each other to begin. For this reason there are switch
81  * statements at key points in the code where we have to synchronize our local
82  * state machine with the phase, and then jump to the correct part of the
83  * algorithm so that we can get started.
84  *
85  * One barrier called build_barrier is used to coordinate the hashing phases.
86  * The phase is represented by an integer which begins at zero and increments
87  * one by one, but in the code it is referred to by symbolic names as follows.
88  * An asterisk indicates a phase that is performed by a single arbitrarily
89  * chosen process.
90  *
91  * PHJ_BUILD_ELECT -- initial state
92  * PHJ_BUILD_ALLOCATE* -- one sets up the batches and table 0
93  * PHJ_BUILD_HASH_INNER -- all hash the inner rel
94  * PHJ_BUILD_HASH_OUTER -- (multi-batch only) all hash the outer
95  * PHJ_BUILD_RUN -- building done, probing can begin
96  * PHJ_BUILD_FREE* -- all work complete, one frees batches
97  *
98  * While in the phase PHJ_BUILD_HASH_INNER a separate pair of barriers may
99  * be used repeatedly as required to coordinate expansions in the number of
100  * batches or buckets. Their phases are as follows:
101  *
102  * PHJ_GROW_BATCHES_ELECT -- initial state
103  * PHJ_GROW_BATCHES_REALLOCATE* -- one allocates new batches
104  * PHJ_GROW_BATCHES_REPARTITION -- all repartition
105  * PHJ_GROW_BATCHES_DECIDE* -- one detects skew and cleans up
106  * PHJ_GROW_BATCHES_FINISH -- finished one growth cycle
107  *
108  * PHJ_GROW_BUCKETS_ELECT -- initial state
109  * PHJ_GROW_BUCKETS_REALLOCATE* -- one allocates new buckets
110  * PHJ_GROW_BUCKETS_REINSERT -- all insert tuples
111  *
112  * If the planner got the number of batches and buckets right, those won't be
113  * necessary, but on the other hand we might end up needing to expand the
114  * buckets or batches multiple times while hashing the inner relation to stay
115  * within our memory budget and load factor target. For that reason it's a
116  * separate pair of barriers using circular phases.
117  *
118  * The PHJ_BUILD_HASH_OUTER phase is required only for multi-batch joins,
119  * because we need to divide the outer relation into batches up front in order
120  * to be able to process batches entirely independently. In contrast, the
121  * parallel-oblivious algorithm simply throws tuples 'forward' to 'later'
122  * batches whenever it encounters them while scanning and probing, which it
123  * can do because it processes batches in serial order.
124  *
125  * Once PHJ_BUILD_RUN is reached, backends split up and process
126  * different batches, or gang up and work together on probing batches if there
127  * aren't enough to go around. For each batch there is a separate barrier
128  * with the following phases:
129  *
130  * PHJ_BATCH_ELECT -- initial state
131  * PHJ_BATCH_ALLOCATE* -- one allocates buckets
132  * PHJ_BATCH_LOAD -- all load the hash table from disk
133  * PHJ_BATCH_PROBE -- all probe
134  * PHJ_BATCH_SCAN* -- one does right/right-anti/full unmatched scan
135  * PHJ_BATCH_FREE* -- one frees memory
136  *
137  * Batch 0 is a special case, because it starts out in phase
138  * PHJ_BATCH_PROBE; populating batch 0's hash table is done during
139  * PHJ_BUILD_HASH_INNER so we can skip loading.
140  *
141  * Initially we try to plan for a single-batch hash join using the combined
142  * hash_mem of all participants to create a large shared hash table. If that
143  * turns out either at planning or execution time to be impossible then we
144  * fall back to regular hash_mem sized hash tables.
145  *
146  * To avoid deadlocks, we never wait for any barrier unless it is known that
147  * all other backends attached to it are actively executing the node or have
148  * finished. Practically, that means that we never emit a tuple while attached
149  * to a barrier, unless the barrier has reached a phase that means that no
150  * process will wait on it again. We emit tuples while attached to the build
151  * barrier in phase PHJ_BUILD_RUN, and to a per-batch barrier in phase
152  * PHJ_BATCH_PROBE. These are advanced to PHJ_BUILD_FREE and PHJ_BATCH_SCAN
153  * respectively without waiting, using BarrierArriveAndDetach() and
154  * BarrierArriveAndDetachExceptLast() respectively. The last to detach
155  * receives a different return value so that it knows that it's safe to
156  * clean up. Any straggler process that attaches after that phase is reached
157  * will see that it's too late to participate or access the relevant shared
158  * memory objects.
159  *
160  *-------------------------------------------------------------------------
161  */
162 
163 #include "postgres.h"
164 
165 #include "access/htup_details.h"
166 #include "access/parallel.h"
167 #include "executor/executor.h"
168 #include "executor/hashjoin.h"
169 #include "executor/nodeHash.h"
170 #include "executor/nodeHashjoin.h"
171 #include "miscadmin.h"
172 #include "utils/sharedtuplestore.h"
173 #include "utils/wait_event.h"
174 
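/*
 * Editor's sketch (not part of the original file): how a hash value maps to a
 * bucket and a batch when nbuckets and nbatch are powers of two, and why
 * doubling nbatch splits each batch cleanly in two. The real logic lives in
 * ExecHashGetBucketAndBatch() in nodeHash.c (which uses a rotate rather than
 * the plain shift shown here); all names below are illustrative only.
 */
#include <stdint.h>
#include <stdio.h>

static void
sketch_bucket_and_batch(uint32_t hashvalue, int nbuckets, int log2_nbuckets,
                        int nbatch, int *bucketno, int *batchno)
{
    /* low bits select the bucket ... */
    *bucketno = hashvalue & (nbuckets - 1);
    /* ... and the next-higher bits select the batch */
    *batchno = (hashvalue >> log2_nbuckets) & (nbatch - 1);
}

int
main(void)
{
    int bucketno, batchno;

    sketch_bucket_and_batch(0xDEADBEEF, 1024, 10, 4, &bucketno, &batchno);
    printf("nbatch=4: bucket=%d batch=%d\n", bucketno, batchno);   /* batch 3 */

    /* After doubling, a tuple keeps its batch number or gains one new high
     * bit (3 -> 3 or 7 here); it never moves anywhere else. */
    sketch_bucket_and_batch(0xDEADBEEF, 1024, 10, 8, &bucketno, &batchno);
    printf("nbatch=8: bucket=%d batch=%d\n", bucketno, batchno);   /* batch 7 */
    return 0;
}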
175 
176 /*
177  * States of the ExecHashJoin state machine
178  */
179 #define HJ_BUILD_HASHTABLE 1
180 #define HJ_NEED_NEW_OUTER 2
181 #define HJ_SCAN_BUCKET 3
182 #define HJ_FILL_OUTER_TUPLE 4
183 #define HJ_FILL_INNER_TUPLES 5
184 #define HJ_NEED_NEW_BATCH 6
185 
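/*
 * Editor's note (added summary, derived from the switch in ExecHashJoinImpl
 * below): the usual flow through these states is HJ_BUILD_HASHTABLE ->
 * HJ_NEED_NEW_OUTER -> HJ_SCAN_BUCKET -> HJ_FILL_OUTER_TUPLE -> back to
 * HJ_NEED_NEW_OUTER. When a batch's outer side is exhausted we go to
 * HJ_NEED_NEW_BATCH (via HJ_FILL_INNER_TUPLES first, if unmatched inner
 * tuples must be null-filled), and a successful batch switch returns to
 * HJ_NEED_NEW_OUTER.
 */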
186 /* Returns true if doing null-fill on outer relation */
187 #define HJ_FILL_OUTER(hjstate) ((hjstate)->hj_NullInnerTupleSlot != NULL)
188 /* Returns true if doing null-fill on inner relation */
189 #define HJ_FILL_INNER(hjstate) ((hjstate)->hj_NullOuterTupleSlot != NULL)
190 
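/*
 * Editor's note: these macros test the null-fill slots set up in
 * ExecInitHashJoin() below; e.g. a FULL join initializes both
 * hj_NullInnerTupleSlot and hj_NullOuterTupleSlot, so both macros return
 * true, while an INNER join sets neither.
 */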
191 static TupleTableSlot *ExecHashJoinOuterGetTuple(PlanState *outerNode,
192  HashJoinState *hjstate,
193  uint32 *hashvalue);
194 static TupleTableSlot *ExecParallelHashJoinOuterGetTuple(PlanState *outerNode,
195  HashJoinState *hjstate,
196  uint32 *hashvalue);
197 static TupleTableSlot *ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
198  BufFile *file,
199  uint32 *hashvalue,
200  TupleTableSlot *tupleSlot);
201 static bool ExecHashJoinNewBatch(HashJoinState *hjstate);
202 static bool ExecParallelHashJoinNewBatch(HashJoinState *hjstate);
203 static void ExecParallelHashJoinPartitionOuter(HashJoinState *hjstate);
204 
205 
206 /* ----------------------------------------------------------------
207  * ExecHashJoinImpl
208  *
209  * This function implements the Hybrid Hashjoin algorithm. It is marked
210  * with an always-inline attribute so that ExecHashJoin() and
211  * ExecParallelHashJoin() can inline it. Compilers that respect the
212  * attribute should create versions specialized for parallel == true and
213  * parallel == false with unnecessary branches removed.
214  *
215  * Note: the relation we build hash table on is the "inner"
216  * the other one is "outer".
217  * ----------------------------------------------------------------
218  */
219 static pg_attribute_always_inline TupleTableSlot *
220 ExecHashJoinImpl(PlanState *pstate, bool parallel)
221 {
222  HashJoinState *node = castNode(HashJoinState, pstate);
223  PlanState *outerNode;
224  HashState *hashNode;
225  ExprState *joinqual;
226  ExprState *otherqual;
227  ExprContext *econtext;
228  HashJoinTable hashtable;
229  TupleTableSlot *outerTupleSlot;
230  uint32 hashvalue;
231  int batchno;
232  ParallelHashJoinState *parallel_state;
233 
234  /*
235  * get information from HashJoin node
236  */
237  joinqual = node->js.joinqual;
238  otherqual = node->js.ps.qual;
239  hashNode = (HashState *) innerPlanState(node);
240  outerNode = outerPlanState(node);
241  hashtable = node->hj_HashTable;
242  econtext = node->js.ps.ps_ExprContext;
243  parallel_state = hashNode->parallel_state;
244 
245  /*
246  * Reset per-tuple memory context to free any expression evaluation
247  * storage allocated in the previous tuple cycle.
248  */
249  ResetExprContext(econtext);
250 
251  /*
252  * run the hash join state machine
253  */
254  for (;;)
255  {
256  /*
257  * It's possible to iterate this loop many times before returning a
258  * tuple, in some pathological cases such as needing to move much of
259  * the current batch to a later batch. So let's check for interrupts
260  * each time through.
261  */
261  */
262  CHECK_FOR_INTERRUPTS();
263 
264  switch (node->hj_JoinState)
265  {
266  case HJ_BUILD_HASHTABLE:
267 
268  /*
269  * First time through: build hash table for inner relation.
270  */
271  Assert(hashtable == NULL);
272 
273  /*
274  * If the outer relation is completely empty, and it's not
275  * right/right-anti/full join, we can quit without building
276  * the hash table. However, for an inner join it is only a
277  * win to check this when the outer relation's startup cost is
278  * less than the projected cost of building the hash table.
279  * Otherwise it's best to build the hash table first and see
280  * if the inner relation is empty. (When it's a left join, we
281  * should always make this check, since we aren't going to be
282  * able to skip the join on the strength of an empty inner
283  * relation anyway.)
284  *
285  * If we are rescanning the join, we make use of information
286  * gained on the previous scan: don't bother to try the
287  * prefetch if the previous scan found the outer relation
288  * nonempty. This is not 100% reliable since with new
289  * parameters the outer relation might yield different
290  * results, but it's a good heuristic.
291  *
292  * The only way to make the check is to try to fetch a tuple
293  * from the outer plan node. If we succeed, we have to stash
294  * it away for later consumption by ExecHashJoinOuterGetTuple.
295  */
296  if (HJ_FILL_INNER(node))
297  {
298  /* no chance to not build the hash table */
299  node->hj_FirstOuterTupleSlot = NULL;
300  }
301  else if (parallel)
302  {
303  /*
304  * The empty-outer optimization is not implemented for
305  * shared hash tables, because no one participant can
306  * determine that there are no outer tuples, and it's not
307  * yet clear that it's worth the synchronization overhead
308  * of reaching consensus to figure that out. So we have
309  * to build the hash table.
310  */
311  node->hj_FirstOuterTupleSlot = NULL;
312  }
313  else if (HJ_FILL_OUTER(node) ||
314  (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
315  !node->hj_OuterNotEmpty))
316  {
317  node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode);
318  if (TupIsNull(node->hj_FirstOuterTupleSlot))
319  {
320  node->hj_OuterNotEmpty = false;
321  return NULL;
322  }
323  else
324  node->hj_OuterNotEmpty = true;
325  }
326  else
327  node->hj_FirstOuterTupleSlot = NULL;
328 
329  /*
330  * Create the hash table. If using Parallel Hash, then
331  * whoever gets here first will create the hash table and any
332  * later arrivals will merely attach to it.
333  */
334  hashtable = ExecHashTableCreate(hashNode,
335  node->hj_HashOperators,
336  node->hj_Collations,
337  HJ_FILL_INNER(node));
338  node->hj_HashTable = hashtable;
339 
340  /*
341  * Execute the Hash node, to build the hash table. If using
342  * Parallel Hash, then we'll try to help hashing unless we
343  * arrived too late.
344  */
345  hashNode->hashtable = hashtable;
346  (void) MultiExecProcNode((PlanState *) hashNode);
347 
348  /*
349  * If the inner relation is completely empty, and we're not
350  * doing a left outer join, we can quit without scanning the
351  * outer relation.
352  */
353  if (hashtable->totalTuples == 0 && !HJ_FILL_OUTER(node))
354  {
355  if (parallel)
356  {
357  /*
358  * Advance the build barrier to PHJ_BUILD_RUN before
359  * proceeding so we can negotiate resource cleanup.
360  */
361  Barrier *build_barrier = &parallel_state->build_barrier;
362 
363  while (BarrierPhase(build_barrier) < PHJ_BUILD_RUN)
364  BarrierArriveAndWait(build_barrier, 0);
365  }
366  return NULL;
367  }
368 
369  /*
370  * need to remember whether nbatch has increased since we
371  * began scanning the outer relation
372  */
373  hashtable->nbatch_outstart = hashtable->nbatch;
374 
375  /*
376  * Reset OuterNotEmpty for scan. (It's OK if we fetched a
377  * tuple above, because ExecHashJoinOuterGetTuple will
378  * immediately set it again.)
379  */
380  node->hj_OuterNotEmpty = false;
381 
382  if (parallel)
383  {
384  Barrier *build_barrier;
385 
386  build_barrier = &parallel_state->build_barrier;
387  Assert(BarrierPhase(build_barrier) == PHJ_BUILD_HASH_OUTER ||
388  BarrierPhase(build_barrier) == PHJ_BUILD_RUN ||
389  BarrierPhase(build_barrier) == PHJ_BUILD_FREE);
390  if (BarrierPhase(build_barrier) == PHJ_BUILD_HASH_OUTER)
391  {
392  /*
393  * If multi-batch, we need to hash the outer relation
394  * up front.
395  */
396  if (hashtable->nbatch > 1)
397  ExecParallelHashJoinPartitionOuter(node);
398  BarrierArriveAndWait(build_barrier,
399  WAIT_EVENT_HASH_BUILD_HASH_OUTER);
400  }
401  else if (BarrierPhase(build_barrier) == PHJ_BUILD_FREE)
402  {
403  /*
404  * If we attached so late that the job is finished and
405  * the batch state has been freed, we can return
406  * immediately.
407  */
408  return NULL;
409  }
410 
411  /* Each backend should now select a batch to work on. */
412  Assert(BarrierPhase(build_barrier) == PHJ_BUILD_RUN);
413  hashtable->curbatch = -1;
414  node->hj_JoinState = HJ_NEED_NEW_BATCH;
415 
416  continue;
417  }
418  else
419  node->hj_JoinState = HJ_NEED_NEW_OUTER;
420 
421  /* FALL THRU */
422 
423  case HJ_NEED_NEW_OUTER:
424 
425  /*
426  * We don't have an outer tuple, try to get the next one
427  */
428  if (parallel)
429  outerTupleSlot =
430  ExecParallelHashJoinOuterGetTuple(outerNode, node,
431  &hashvalue);
432  else
433  outerTupleSlot =
434  ExecHashJoinOuterGetTuple(outerNode, node, &hashvalue);
435 
436  if (TupIsNull(outerTupleSlot))
437  {
438  /* end of batch, or maybe whole join */
439  if (HJ_FILL_INNER(node))
440  {
441  /* set up to scan for unmatched inner tuples */
442  if (parallel)
443  {
444  /*
445  * Only one process is currently allowed to
446  * handle each batch's unmatched tuples in a
447  * parallel join.
448  */
449  if (ExecParallelPrepHashTableForUnmatched(node))
450  node->hj_JoinState = HJ_FILL_INNER_TUPLES;
451  else
452  node->hj_JoinState = HJ_NEED_NEW_BATCH;
453  }
454  else
455  {
456  ExecPrepHashTableForUnmatched(node);
457  node->hj_JoinState = HJ_FILL_INNER_TUPLES;
458  }
459  }
460  else
461  node->hj_JoinState = HJ_NEED_NEW_BATCH;
462  continue;
463  }
464 
465  econtext->ecxt_outertuple = outerTupleSlot;
466  node->hj_MatchedOuter = false;
467 
468  /*
469  * Find the corresponding bucket for this tuple in the main
470  * hash table or skew hash table.
471  */
472  node->hj_CurHashValue = hashvalue;
473  ExecHashGetBucketAndBatch(hashtable, hashvalue,
474  &node->hj_CurBucketNo, &batchno);
475  node->hj_CurSkewBucketNo = ExecHashGetSkewBucket(hashtable,
476  hashvalue);
477  node->hj_CurTuple = NULL;
478 
479  /*
480  * The tuple might not belong to the current batch (where
481  * "current batch" includes the skew buckets if any).
482  */
483  if (batchno != hashtable->curbatch &&
484  node->hj_CurSkewBucketNo == INVALID_SKEW_BUCKET_NO)
485  {
486  bool shouldFree;
487  MinimalTuple mintuple = ExecFetchSlotMinimalTuple(outerTupleSlot,
488  &shouldFree);
489 
490  /*
491  * Need to postpone this outer tuple to a later batch.
492  * Save it in the corresponding outer-batch file.
493  */
494  Assert(parallel_state == NULL);
495  Assert(batchno > hashtable->curbatch);
496  ExecHashJoinSaveTuple(mintuple, hashvalue,
497  &hashtable->outerBatchFile[batchno],
498  hashtable);
499 
500  if (shouldFree)
501  heap_free_minimal_tuple(mintuple);
502 
503  /* Loop around, staying in HJ_NEED_NEW_OUTER state */
504  continue;
505  }
506 
507  /* OK, let's scan the bucket for matches */
508  node->hj_JoinState = HJ_SCAN_BUCKET;
509 
510  /* FALL THRU */
511 
512  case HJ_SCAN_BUCKET:
513 
514  /*
515  * Scan the selected hash bucket for matches to current outer
516  */
517  if (parallel)
518  {
519  if (!ExecParallelScanHashBucket(node, econtext))
520  {
521  /* out of matches; check for possible outer-join fill */
522  node->hj_JoinState = HJ_FILL_OUTER_TUPLE;
523  continue;
524  }
525  }
526  else
527  {
528  if (!ExecScanHashBucket(node, econtext))
529  {
530  /* out of matches; check for possible outer-join fill */
531  node->hj_JoinState = HJ_FILL_OUTER_TUPLE;
532  continue;
533  }
534  }
535 
536  /*
537  * In a right-semijoin, we only need the first match for each
538  * inner tuple.
539  */
540  if (node->js.jointype == JOIN_RIGHT_SEMI &&
541  HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
542  continue;
543 
544  /*
545  * We've got a match, but still need to test non-hashed quals.
546  * ExecScanHashBucket already set up all the state needed to
547  * call ExecQual.
548  *
549  * If we pass the qual, then save state for next call and have
550  * ExecProject form the projection, store it in the tuple
551  * table, and return the slot.
552  *
553  * Only the joinquals determine tuple match status, but all
554  * quals must pass to actually return the tuple.
555  */
556  if (joinqual == NULL || ExecQual(joinqual, econtext))
557  {
558  node->hj_MatchedOuter = true;
559 
560  /*
561  * This is really only needed if HJ_FILL_INNER(node) or if
562  * we are in a right-semijoin, but we'll avoid the branch
563  * and just set it always.
564  */
565  if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
566  HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
567 
568  /* In an antijoin, we never return a matched tuple */
569  if (node->js.jointype == JOIN_ANTI)
570  {
571  node->hj_JoinState = HJ_NEED_NEW_OUTER;
572  continue;
573  }
574 
575  /*
576  * If we only need to consider the first matching inner
577  * tuple, then advance to next outer tuple after we've
578  * processed this one.
579  */
580  if (node->js.single_match)
581  node->hj_JoinState = HJ_NEED_NEW_OUTER;
582 
583  /*
584  * In a right-antijoin, we never return a matched tuple.
585  * If it's not an inner_unique join, we need to stay on
586  * the current outer tuple to continue scanning the inner
587  * side for matches.
588  */
589  if (node->js.jointype == JOIN_RIGHT_ANTI)
590  continue;
591 
592  if (otherqual == NULL || ExecQual(otherqual, econtext))
593  return ExecProject(node->js.ps.ps_ProjInfo);
594  else
595  InstrCountFiltered2(node, 1);
596  }
597  else
598  InstrCountFiltered1(node, 1);
599  break;
600 
601  case HJ_FILL_OUTER_TUPLE:
602 
603  /*
604  * The current outer tuple has run out of matches, so check
605  * whether to emit a dummy outer-join tuple. Whether we emit
606  * one or not, the next state is NEED_NEW_OUTER.
607  */
608  node->hj_JoinState = HJ_NEED_NEW_OUTER;
609 
610  if (!node->hj_MatchedOuter &&
611  HJ_FILL_OUTER(node))
612  {
613  /*
614  * Generate a fake join tuple with nulls for the inner
615  * tuple, and return it if it passes the non-join quals.
616  */
617  econtext->ecxt_innertuple = node->hj_NullInnerTupleSlot;
618 
619  if (otherqual == NULL || ExecQual(otherqual, econtext))
620  return ExecProject(node->js.ps.ps_ProjInfo);
621  else
622  InstrCountFiltered2(node, 1);
623  }
624  break;
625 
626  case HJ_FILL_INNER_TUPLES:
627 
628  /*
629  * We have finished a batch, but we are doing
630  * right/right-anti/full join, so any unmatched inner tuples
631  * in the hashtable have to be emitted before we continue to
632  * the next batch.
633  */
634  if (!(parallel ? ExecParallelScanHashTableForUnmatched(node, econtext)
635  : ExecScanHashTableForUnmatched(node, econtext)))
636  {
637  /* no more unmatched tuples */
638  node->hj_JoinState = HJ_NEED_NEW_BATCH;
639  continue;
640  }
641 
642  /*
643  * Generate a fake join tuple with nulls for the outer tuple,
644  * and return it if it passes the non-join quals.
645  */
646  econtext->ecxt_outertuple = node->hj_NullOuterTupleSlot;
647 
648  if (otherqual == NULL || ExecQual(otherqual, econtext))
649  return ExecProject(node->js.ps.ps_ProjInfo);
650  else
651  InstrCountFiltered2(node, 1);
652  break;
653 
654  case HJ_NEED_NEW_BATCH:
655 
656  /*
657  * Try to advance to next batch. Done if there are no more.
658  */
659  if (parallel)
660  {
661  if (!ExecParallelHashJoinNewBatch(node))
662  return NULL; /* end of parallel-aware join */
663  }
664  else
665  {
666  if (!ExecHashJoinNewBatch(node))
667  return NULL; /* end of parallel-oblivious join */
668  }
669  node->hj_JoinState = HJ_NEED_NEW_OUTER;
670  break;
671 
672  default:
673  elog(ERROR, "unrecognized hashjoin state: %d",
674  (int) node->hj_JoinState);
675  }
676  }
677 }
678 
679 /* ----------------------------------------------------------------
680  * ExecHashJoin
681  *
682  * Parallel-oblivious version.
683  * ----------------------------------------------------------------
684  */
685 static TupleTableSlot * /* return: a tuple or NULL */
686 ExecHashJoin(PlanState *pstate)
687 {
688  /*
689  * On sufficiently smart compilers this should be inlined with the
690  * parallel-aware branches removed.
691  */
692  return ExecHashJoinImpl(pstate, false);
693 }
694 
695 /* ----------------------------------------------------------------
696  * ExecParallelHashJoin
697  *
698  * Parallel-aware version.
699  * ----------------------------------------------------------------
700  */
701 static TupleTableSlot * /* return: a tuple or NULL */
702 ExecParallelHashJoin(PlanState *pstate)
703 {
704  /*
705  * On sufficiently smart compilers this should be inlined with the
706  * parallel-oblivious branches removed.
707  */
708  return ExecHashJoinImpl(pstate, true);
709 }
710 
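/*
 * Editor's sketch (not part of the original file): the specialization trick
 * used by the two wrappers above, in miniature. With a compile-time-constant
 * flag and forced inlining, the compiler can drop the untaken branch from
 * each specialized copy. All names here are illustrative.
 */
static inline int
work_impl(int x, int parallel)
{
    if (parallel)
        return x * 2;           /* parallel-only path */
    return x + 1;               /* serial-only path */
}

static int work_serial(int x)   { return work_impl(x, 0); }
static int work_parallel(int x) { return work_impl(x, 1); }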
711 /* ----------------------------------------------------------------
712  * ExecInitHashJoin
713  *
714  * Init routine for HashJoin node.
715  * ----------------------------------------------------------------
716  */
717 HashJoinState *
718 ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
719 {
720  HashJoinState *hjstate;
721  Plan *outerNode;
722  Hash *hashNode;
723  TupleDesc outerDesc,
724  innerDesc;
725  const TupleTableSlotOps *ops;
726 
727  /* check for unsupported flags */
728  Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
729 
730  /*
731  * create state structure
732  */
733  hjstate = makeNode(HashJoinState);
734  hjstate->js.ps.plan = (Plan *) node;
735  hjstate->js.ps.state = estate;
736 
737  /*
738  * See ExecHashJoinInitializeDSM() and ExecHashJoinInitializeWorker()
739  * where this function may be replaced with a parallel version, if we
740  * managed to launch a parallel query.
741  */
742  hjstate->js.ps.ExecProcNode = ExecHashJoin;
743  hjstate->js.jointype = node->join.jointype;
744 
745  /*
746  * Miscellaneous initialization
747  *
748  * create expression context for node
749  */
750  ExecAssignExprContext(estate, &hjstate->js.ps);
751 
752  /*
753  * initialize child nodes
754  *
755  * Note: we could suppress the REWIND flag for the inner input, which
756  * would amount to betting that the hash will be a single batch. Not
757  * clear if this would be a win or not.
758  */
759  outerNode = outerPlan(node);
760  hashNode = (Hash *) innerPlan(node);
761 
762  outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
763  outerDesc = ExecGetResultType(outerPlanState(hjstate));
764  innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
765  innerDesc = ExecGetResultType(innerPlanState(hjstate));
766 
767  /*
768  * Initialize result slot, type and projection.
769  */
770  ExecInitResultTupleSlotTL(&hjstate->js.ps, &TTSOpsVirtual);
771  ExecAssignProjectionInfo(&hjstate->js.ps, NULL);
772 
773  /*
774  * tuple table initialization
775  */
776  ops = ExecGetResultSlotOps(outerPlanState(hjstate), NULL);
777  hjstate->hj_OuterTupleSlot = ExecInitExtraTupleSlot(estate, outerDesc,
778  ops);
779 
780  /*
781  * detect whether we need only consider the first matching inner tuple
782  */
783  hjstate->js.single_match = (node->join.inner_unique ||
784  node->join.jointype == JOIN_SEMI);
785 
786  /* set up null tuples for outer joins, if needed */
787  switch (node->join.jointype)
788  {
789  case JOIN_INNER:
790  case JOIN_SEMI:
791  case JOIN_RIGHT_SEMI:
792  break;
793  case JOIN_LEFT:
794  case JOIN_ANTI:
795  hjstate->hj_NullInnerTupleSlot =
796  ExecInitNullTupleSlot(estate, innerDesc, &TTSOpsVirtual);
797  break;
798  case JOIN_RIGHT:
799  case JOIN_RIGHT_ANTI:
800  hjstate->hj_NullOuterTupleSlot =
801  ExecInitNullTupleSlot(estate, outerDesc, &TTSOpsVirtual);
802  break;
803  case JOIN_FULL:
804  hjstate->hj_NullOuterTupleSlot =
805  ExecInitNullTupleSlot(estate, outerDesc, &TTSOpsVirtual);
806  hjstate->hj_NullInnerTupleSlot =
807  ExecInitNullTupleSlot(estate, innerDesc, &TTSOpsVirtual);
808  break;
809  default:
810  elog(ERROR, "unrecognized join type: %d",
811  (int) node->join.jointype);
812  }
813 
814  /*
815  * now for some voodoo. our temporary tuple slot is actually the result
816  * tuple slot of the Hash node (which is our inner plan). we can do this
817  * because Hash nodes don't return tuples via ExecProcNode() -- instead
818  * the hash join node uses ExecScanHashBucket() to get at the contents of
819  * the hash table. -cim 6/9/91
820  */
821  {
822  HashState *hashstate = (HashState *) innerPlanState(hjstate);
823  TupleTableSlot *slot = hashstate->ps.ps_ResultTupleSlot;
824 
825  hjstate->hj_HashTupleSlot = slot;
826  }
827 
828  /*
829  * initialize child expressions
830  */
831  hjstate->js.ps.qual =
832  ExecInitQual(node->join.plan.qual, (PlanState *) hjstate);
833  hjstate->js.joinqual =
834  ExecInitQual(node->join.joinqual, (PlanState *) hjstate);
835  hjstate->hashclauses =
836  ExecInitQual(node->hashclauses, (PlanState *) hjstate);
837 
838  /*
839  * initialize hash-specific info
840  */
841  hjstate->hj_HashTable = NULL;
842  hjstate->hj_FirstOuterTupleSlot = NULL;
843 
844  hjstate->hj_CurHashValue = 0;
845  hjstate->hj_CurBucketNo = 0;
846  hjstate->hj_CurSkewBucketNo = INVALID_SKEW_BUCKET_NO;
847  hjstate->hj_CurTuple = NULL;
848 
849  hjstate->hj_OuterHashKeys = ExecInitExprList(node->hashkeys,
850  (PlanState *) hjstate);
851  hjstate->hj_HashOperators = node->hashoperators;
852  hjstate->hj_Collations = node->hashcollations;
853 
854  hjstate->hj_JoinState = HJ_BUILD_HASHTABLE;
855  hjstate->hj_MatchedOuter = false;
856  hjstate->hj_OuterNotEmpty = false;
857 
858  return hjstate;
859 }
860 
861 /* ----------------------------------------------------------------
862  * ExecEndHashJoin
863  *
864  * clean up routine for HashJoin node
865  * ----------------------------------------------------------------
866  */
867 void
868 ExecEndHashJoin(HashJoinState *node)
869 {
870  /*
871  * Free hash table
872  */
873  if (node->hj_HashTable)
874  {
875  ExecHashTableDestroy(node->hj_HashTable);
876  node->hj_HashTable = NULL;
877  }
878 
879  /*
880  * clean up subtrees
881  */
882  ExecEndNode(outerPlanState(node));
883  ExecEndNode(innerPlanState(node));
884 }
885 
886 /*
887  * ExecHashJoinOuterGetTuple
888  *
889  * get the next outer tuple for a parallel-oblivious hashjoin: either by
890  * executing the outer plan node in the first pass, or from the temp
891  * files for the hashjoin batches.
892  *
893  * Returns a null slot if no more outer tuples (within the current batch).
894  *
895  * On success, the tuple's hash value is stored at *hashvalue --- this is
896  * either originally computed, or re-read from the temp file.
897  */
898 static TupleTableSlot *
899 ExecHashJoinOuterGetTuple(PlanState *outerNode,
900  HashJoinState *hjstate,
901  uint32 *hashvalue)
902 {
903  HashJoinTable hashtable = hjstate->hj_HashTable;
904  int curbatch = hashtable->curbatch;
905  TupleTableSlot *slot;
906 
907  if (curbatch == 0) /* if it is the first pass */
908  {
909  /*
910  * Check to see if first outer tuple was already fetched by
911  * ExecHashJoin() and not used yet.
912  */
913  slot = hjstate->hj_FirstOuterTupleSlot;
914  if (!TupIsNull(slot))
915  hjstate->hj_FirstOuterTupleSlot = NULL;
916  else
917  slot = ExecProcNode(outerNode);
918 
919  while (!TupIsNull(slot))
920  {
921  /*
922  * We have to compute the tuple's hash value.
923  */
924  ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
925 
926  econtext->ecxt_outertuple = slot;
927  if (ExecHashGetHashValue(hashtable, econtext,
928  hjstate->hj_OuterHashKeys,
929  true, /* outer tuple */
930  HJ_FILL_OUTER(hjstate),
931  hashvalue))
932  {
933  /* remember outer relation is not empty for possible rescan */
934  hjstate->hj_OuterNotEmpty = true;
935 
936  return slot;
937  }
938 
939  /*
940  * That tuple couldn't match because of a NULL, so discard it and
941  * continue with the next one.
942  */
943  slot = ExecProcNode(outerNode);
944  }
945  }
946  else if (curbatch < hashtable->nbatch)
947  {
948  BufFile *file = hashtable->outerBatchFile[curbatch];
949 
950  /*
951  * In outer-join cases, we could get here even though the batch file
952  * is empty.
953  */
954  if (file == NULL)
955  return NULL;
956 
957  slot = ExecHashJoinGetSavedTuple(hjstate,
958  file,
959  hashvalue,
960  hjstate->hj_OuterTupleSlot);
961  if (!TupIsNull(slot))
962  return slot;
963  }
964 
965  /* End of this batch */
966  return NULL;
967 }
968 
969 /*
970  * ExecHashJoinOuterGetTuple variant for the parallel case.
971  */
972 static TupleTableSlot *
973 ExecParallelHashJoinOuterGetTuple(PlanState *outerNode,
974  HashJoinState *hjstate,
975  uint32 *hashvalue)
976 {
977  HashJoinTable hashtable = hjstate->hj_HashTable;
978  int curbatch = hashtable->curbatch;
979  TupleTableSlot *slot;
980 
981  /*
982  * In the Parallel Hash case we only run the outer plan directly for
983  * single-batch hash joins. Otherwise we have to go to batch files, even
984  * for batch 0.
985  */
986  if (curbatch == 0 && hashtable->nbatch == 1)
987  {
988  slot = ExecProcNode(outerNode);
989 
990  while (!TupIsNull(slot))
991  {
992  ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
993 
994  econtext->ecxt_outertuple = slot;
995  if (ExecHashGetHashValue(hashtable, econtext,
996  hjstate->hj_OuterHashKeys,
997  true, /* outer tuple */
998  HJ_FILL_OUTER(hjstate),
999  hashvalue))
1000  return slot;
1001 
1002  /*
1003  * That tuple couldn't match because of a NULL, so discard it and
1004  * continue with the next one.
1005  */
1006  slot = ExecProcNode(outerNode);
1007  }
1008  }
1009  else if (curbatch < hashtable->nbatch)
1010  {
1011  MinimalTuple tuple;
1012 
1013  tuple = sts_parallel_scan_next(hashtable->batches[curbatch].outer_tuples,
1014  hashvalue);
1015  if (tuple != NULL)
1016  {
1017  ExecForceStoreMinimalTuple(tuple,
1018  hjstate->hj_OuterTupleSlot,
1019  false);
1020  slot = hjstate->hj_OuterTupleSlot;
1021  return slot;
1022  }
1023  else
1024  ExecClearTuple(hjstate->hj_OuterTupleSlot);
1025  }
1026 
1027  /* End of this batch */
1028  hashtable->batches[curbatch].outer_eof = true;
1029 
1030  return NULL;
1031 }
1032 
1033 /*
1034  * ExecHashJoinNewBatch
1035  * switch to a new hashjoin batch
1036  *
1037  * Returns true if successful, false if there are no more batches.
1038  */
1039 static bool
1040 ExecHashJoinNewBatch(HashJoinState *hjstate)
1041 {
1042  HashJoinTable hashtable = hjstate->hj_HashTable;
1043  int nbatch;
1044  int curbatch;
1045  BufFile *innerFile;
1046  TupleTableSlot *slot;
1047  uint32 hashvalue;
1048 
1049  nbatch = hashtable->nbatch;
1050  curbatch = hashtable->curbatch;
1051 
1052  if (curbatch > 0)
1053  {
1054  /*
1055  * We no longer need the previous outer batch file; close it right
1056  * away to free disk space.
1057  */
1058  if (hashtable->outerBatchFile[curbatch])
1059  BufFileClose(hashtable->outerBatchFile[curbatch]);
1060  hashtable->outerBatchFile[curbatch] = NULL;
1061  }
1062  else /* we just finished the first batch */
1063  {
1064  /*
1065  * Reset some of the skew optimization state variables, since we no
1066  * longer need to consider skew tuples after the first batch. The
1067  * memory context reset we are about to do will release the skew
1068  * hashtable itself.
1069  */
1070  hashtable->skewEnabled = false;
1071  hashtable->skewBucket = NULL;
1072  hashtable->skewBucketNums = NULL;
1073  hashtable->nSkewBuckets = 0;
1074  hashtable->spaceUsedSkew = 0;
1075  }
1076 
1077  /*
1078  * We can always skip over any batches that are completely empty on both
1079  * sides. We can sometimes skip over batches that are empty on only one
1080  * side, but there are exceptions:
1081  *
1082  * 1. In a left/full outer join, we have to process outer batches even if
1083  * the inner batch is empty. Similarly, in a right/right-anti/full outer
1084  * join, we have to process inner batches even if the outer batch is
1085  * empty.
1086  *
1087  * 2. If we have increased nbatch since the initial estimate, we have to
1088  * scan inner batches since they might contain tuples that need to be
1089  * reassigned to later inner batches.
1090  *
1091  * 3. Similarly, if we have increased nbatch since starting the outer
1092  * scan, we have to rescan outer batches in case they contain tuples that
1093  * need to be reassigned.
1094  */
1095  curbatch++;
1096  while (curbatch < nbatch &&
1097  (hashtable->outerBatchFile[curbatch] == NULL ||
1098  hashtable->innerBatchFile[curbatch] == NULL))
1099  {
1100  if (hashtable->outerBatchFile[curbatch] &&
1101  HJ_FILL_OUTER(hjstate))
1102  break; /* must process due to rule 1 */
1103  if (hashtable->innerBatchFile[curbatch] &&
1104  HJ_FILL_INNER(hjstate))
1105  break; /* must process due to rule 1 */
1106  if (hashtable->innerBatchFile[curbatch] &&
1107  nbatch != hashtable->nbatch_original)
1108  break; /* must process due to rule 2 */
1109  if (hashtable->outerBatchFile[curbatch] &&
1110  nbatch != hashtable->nbatch_outstart)
1111  break; /* must process due to rule 3 */
1112  /* We can ignore this batch. */
1113  /* Release associated temp files right away. */
1114  if (hashtable->innerBatchFile[curbatch])
1115  BufFileClose(hashtable->innerBatchFile[curbatch]);
1116  hashtable->innerBatchFile[curbatch] = NULL;
1117  if (hashtable->outerBatchFile[curbatch])
1118  BufFileClose(hashtable->outerBatchFile[curbatch]);
1119  hashtable->outerBatchFile[curbatch] = NULL;
1120  curbatch++;
1121  }
1122 
1123  if (curbatch >= nbatch)
1124  return false; /* no more batches */
1125 
1126  hashtable->curbatch = curbatch;
1127 
1128  /*
1129  * Reload the hash table with the new inner batch (which could be empty)
1130  */
1131  ExecHashTableReset(hashtable);
1132 
1133  innerFile = hashtable->innerBatchFile[curbatch];
1134 
1135  if (innerFile != NULL)
1136  {
1137  if (BufFileSeek(innerFile, 0, 0, SEEK_SET))
1138  ereport(ERROR,
1139  (errcode_for_file_access(),
1140  errmsg("could not rewind hash-join temporary file")));
1141 
1142  while ((slot = ExecHashJoinGetSavedTuple(hjstate,
1143  innerFile,
1144  &hashvalue,
1145  hjstate->hj_HashTupleSlot)))
1146  {
1147  /*
1148  * NOTE: some tuples may be sent to future batches. Also, it is
1149  * possible for hashtable->nbatch to be increased here!
1150  */
1151  ExecHashTableInsert(hashtable, slot, hashvalue);
1152  }
1153 
1154  /*
1155  * after we build the hash table, the inner batch file is no longer
1156  * needed
1157  */
1158  BufFileClose(innerFile);
1159  hashtable->innerBatchFile[curbatch] = NULL;
1160  }
1161 
1162  /*
1163  * Rewind outer batch file (if present), so that we can start reading it.
1164  */
1165  if (hashtable->outerBatchFile[curbatch] != NULL)
1166  {
1167  if (BufFileSeek(hashtable->outerBatchFile[curbatch], 0, 0, SEEK_SET))
1168  ereport(ERROR,
1169  (errcode_for_file_access(),
1170  errmsg("could not rewind hash-join temporary file")));
1171  }
1172 
1173  return true;
1174 }
1175 
1176 /*
1177  * Choose a batch to work on, and attach to it. Returns true if successful,
1178  * false if there are no more batches.
1179  */
1180 static bool
1181 ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
1182 {
1183  HashJoinTable hashtable = hjstate->hj_HashTable;
1184  int start_batchno;
1185  int batchno;
1186 
1187  /*
1188  * If we were already attached to a batch, remember not to bother checking
1189  * it again, and detach from it (possibly freeing the hash table if we are
1190  * last to detach).
1191  */
1192  if (hashtable->curbatch >= 0)
1193  {
1194  hashtable->batches[hashtable->curbatch].done = true;
1195  ExecHashTableDetachBatch(hashtable);
1196  }
1197 
1198  /*
1199  * Search for a batch that isn't done. We use an atomic counter to start
1200  * our search at a different batch in every participant when there are
1201  * more batches than participants.
1202  */
1203  batchno = start_batchno =
1204  pg_atomic_fetch_add_u32(&hashtable->parallel_state->distributor, 1) %
1205  hashtable->nbatch;
1206  do
1207  {
1208  uint32 hashvalue;
1209  MinimalTuple tuple;
1210  TupleTableSlot *slot;
1211 
1212  if (!hashtable->batches[batchno].done)
1213  {
1214  SharedTuplestoreAccessor *inner_tuples;
1215  Barrier *batch_barrier =
1216  &hashtable->batches[batchno].shared->batch_barrier;
1217 
1218  switch (BarrierAttach(batch_barrier))
1219  {
1220  case PHJ_BATCH_ELECT:
1221 
1222  /* One backend allocates the hash table. */
1223  if (BarrierArriveAndWait(batch_barrier,
1224  WAIT_EVENT_HASH_BATCH_ELECT))
1225  ExecParallelHashTableAlloc(hashtable, batchno);
1226  /* Fall through. */
1227 
1228  case PHJ_BATCH_ALLOCATE:
1229  /* Wait for allocation to complete. */
1230  BarrierArriveAndWait(batch_barrier,
1231  WAIT_EVENT_HASH_BATCH_ALLOCATE);
1232  /* Fall through. */
1233 
1234  case PHJ_BATCH_LOAD:
1235  /* Start (or join in) loading tuples. */
1236  ExecParallelHashTableSetCurrentBatch(hashtable, batchno);
1237  inner_tuples = hashtable->batches[batchno].inner_tuples;
1238  sts_begin_parallel_scan(inner_tuples);
1239  while ((tuple = sts_parallel_scan_next(inner_tuples,
1240  &hashvalue)))
1241  {
1242  ExecForceStoreMinimalTuple(tuple,
1243  hjstate->hj_HashTupleSlot,
1244  false);
1245  slot = hjstate->hj_HashTupleSlot;
1246  ExecParallelHashTableInsertCurrentBatch(hashtable, slot,
1247  hashvalue);
1248  }
1249  sts_end_parallel_scan(inner_tuples);
1250  BarrierArriveAndWait(batch_barrier,
1251  WAIT_EVENT_HASH_BATCH_LOAD);
1252  /* Fall through. */
1253 
1254  case PHJ_BATCH_PROBE:
1255 
1256  /*
1257  * This batch is ready to probe. Return control to
1258  * caller. We stay attached to batch_barrier so that the
1259  * hash table stays alive until everyone's finished
1260  * probing it, but no participant is allowed to wait at
1261  * this barrier again (or else a deadlock could occur).
1262  * All attached participants must eventually detach from
1263  * the barrier and one worker must advance the phase so
1264  * that the final phase is reached.
1265  */
1266  ExecParallelHashTableSetCurrentBatch(hashtable, batchno);
1267  sts_begin_parallel_scan(hashtable->batches[batchno].outer_tuples);
1268 
1269  return true;
1270  case PHJ_BATCH_SCAN:
1271 
1272  /*
1273  * In principle, we could help scan for unmatched tuples,
1274  * since that phase is already underway (the thing we
1275  * can't do under current deadlock-avoidance rules is wait
1276  * for others to arrive at PHJ_BATCH_SCAN, because
1277  * PHJ_BATCH_PROBE emits tuples, but in this case we just
1278  * got here without waiting). That is not yet done. For
1279  * now, we just detach and go around again. We have to
1280  * use ExecHashTableDetachBatch() because there's a small
1281  * chance we'll be the last to detach, and then we're
1282  * responsible for freeing memory.
1283  */
1284  ExecParallelHashTableSetCurrentBatch(hashtable, batchno);
1285  hashtable->batches[batchno].done = true;
1286  ExecHashTableDetachBatch(hashtable);
1287  break;
1288 
1289  case PHJ_BATCH_FREE:
1290 
1291  /*
1292  * Already done. Detach and go around again (if any
1293  * remain).
1294  */
1295  BarrierDetach(batch_barrier);
1296  hashtable->batches[batchno].done = true;
1297  hashtable->curbatch = -1;
1298  break;
1299 
1300  default:
1301  elog(ERROR, "unexpected batch phase %d",
1302  BarrierPhase(batch_barrier));
1303  }
1304  }
1305  batchno = (batchno + 1) % hashtable->nbatch;
1306  } while (batchno != start_batchno);
1307 
1308  return false;
1309 }
1310 
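/*
 * Editor's note (added summary): the switch above shows the general
 * attach-and-catch-up pattern for Barrier-coordinated work. A backend reads
 * the phase returned by BarrierAttach() and jumps straight to the matching
 * step, falling through later steps, instead of assuming the batch starts at
 * phase zero; BarrierArriveAndWait() returns true in exactly one backend,
 * which performs the serial steps (marked with * in the header comment).
 */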
1311 /*
1312  * ExecHashJoinSaveTuple
1313  * save a tuple to a batch file.
1314  *
1315  * The data recorded in the file for each tuple is its hash value,
1316  * then the tuple in MinimalTuple format.
1317  *
1318  * fileptr points to a batch file in one of the hashtable arrays.
1319  *
1320  * The batch files (and their buffers) are allocated in the spill context
1321  * created for the hashtable.
1322  */
1323 void
1324 ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
1325  BufFile **fileptr, HashJoinTable hashtable)
1326 {
1327  BufFile *file = *fileptr;
1328 
1329  /*
1330  * The batch file is lazily created. If this is the first tuple written to
1331  * this batch, the batch file is created and its buffer is allocated in
1332  * the spillCxt context, NOT in the batchCxt.
1333  *
1334  * During the build phase, buffered files are created for inner batches.
1335  * Each batch's buffered file is closed (and its buffer freed) after the
1336  * batch is loaded into memory during the outer side scan. Therefore, it
1337  * is necessary to allocate the batch file buffer in a memory context
1338  * which outlives the batch itself.
1339  *
1340  * Also, we use spillCxt instead of hashCxt for a better accounting of the
1341  * spilling memory consumption.
1342  */
1343  if (file == NULL)
1344  {
1345  MemoryContext oldctx = MemoryContextSwitchTo(hashtable->spillCxt);
1346 
1347  file = BufFileCreateTemp(false);
1348  *fileptr = file;
1349 
1350  MemoryContextSwitchTo(oldctx);
1351  }
1352 
1353  BufFileWrite(file, &hashvalue, sizeof(uint32));
1354  BufFileWrite(file, tuple, tuple->t_len);
1355 }
1356 
1357 /*
1358  * ExecHashJoinGetSavedTuple
1359  * read the next tuple from a batch file. Return NULL if no more.
1360  *
1361  * On success, *hashvalue is set to the tuple's hash value, and the tuple
1362  * itself is stored in the given slot.
1363  */
1364 static TupleTableSlot *
1365 ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
1366  BufFile *file,
1367  uint32 *hashvalue,
1368  TupleTableSlot *tupleSlot)
1369 {
1370  uint32 header[2];
1371  size_t nread;
1372  MinimalTuple tuple;
1373 
1374  /*
1375  * We check for interrupts here because this is typically taken as an
1376  * alternative code path to an ExecProcNode() call, which would include
1377  * such a check.
1378  */
1379  CHECK_FOR_INTERRUPTS();
1380 
1381  /*
1382  * Since both the hash value and the MinimalTuple length word are uint32,
1383  * we can read them both in one BufFileRead() call without any type
1384  * cheating.
1385  */
1386  nread = BufFileReadMaybeEOF(file, header, sizeof(header), true);
1387  if (nread == 0) /* end of file */
1388  {
1389  ExecClearTuple(tupleSlot);
1390  return NULL;
1391  }
1392  *hashvalue = header[0];
1393  tuple = (MinimalTuple) palloc(header[1]);
1394  tuple->t_len = header[1];
1395  BufFileReadExact(file,
1396  (char *) tuple + sizeof(uint32),
1397  header[1] - sizeof(uint32));
1398  ExecForceStoreMinimalTuple(tuple, tupleSlot, true);
1399  return tupleSlot;
1400 }
1401 
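/*
 * Editor's note (added summary): each record in a batch file is therefore
 * laid out as
 *
 *     uint32 hashvalue | MinimalTuple (whose first uint32 is t_len)
 *
 * which is why the function above can read the hash value and the tuple
 * length together as header[0] and header[1], then fetch the remaining
 * t_len - sizeof(uint32) bytes of the tuple.
 */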
1402 
1403 void
1404 ExecReScanHashJoin(HashJoinState *node)
1405 {
1406  PlanState *outerPlan = outerPlanState(node);
1407  PlanState *innerPlan = innerPlanState(node);
1408 
1409  /*
1410  * In a multi-batch join, we currently have to do rescans the hard way,
1411  * primarily because batch temp files may have already been released. But
1412  * if it's a single-batch join, and there is no parameter change for the
1413  * inner subnode, then we can just re-use the existing hash table without
1414  * rebuilding it.
1415  */
1416  if (node->hj_HashTable != NULL)
1417  {
1418  if (node->hj_HashTable->nbatch == 1 &&
1419  innerPlan->chgParam == NULL)
1420  {
1421  /*
1422  * Okay to reuse the hash table; needn't rescan inner, either.
1423  *
1424  * However, if it's a right/right-anti/full join, we'd better
1425  * reset the inner-tuple match flags contained in the table.
1426  */
1427  if (HJ_FILL_INNER(node))
1428  ExecHashTableResetMatchFlags(node->hj_HashTable);
1429 
1430  /*
1431  * Also, we need to reset our state about the emptiness of the
1432  * outer relation, so that the new scan of the outer will update
1433  * it correctly if it turns out to be empty this time. (There's no
1434  * harm in clearing it now because ExecHashJoin won't need the
1435  * info. In the other cases, where the hash table doesn't exist
1436  * or we are destroying it, we leave this state alone because
1437  * ExecHashJoin will need it the first time through.)
1438  */
1439  node->hj_OuterNotEmpty = false;
1440 
1441  /* ExecHashJoin can skip the BUILD_HASHTABLE step */
1442  node->hj_JoinState = HJ_NEED_NEW_OUTER;
1443  }
1444  else
1445  {
1446  /* must destroy and rebuild hash table */
1447  HashState *hashNode = castNode(HashState, innerPlan);
1448 
1449  Assert(hashNode->hashtable == node->hj_HashTable);
1450  /* accumulate stats from old hash table, if wanted */
1451  /* (this should match ExecShutdownHash) */
1452  if (hashNode->ps.instrument && !hashNode->hinstrument)
1453  hashNode->hinstrument = (HashInstrumentation *)
1454  palloc0(sizeof(HashInstrumentation));
1455  if (hashNode->hinstrument)
1456  ExecHashAccumInstrumentation(hashNode->hinstrument,
1457  hashNode->hashtable);
1458  /* for safety, be sure to clear child plan node's pointer too */
1459  hashNode->hashtable = NULL;
1460 
1461  ExecHashTableDestroy(node->hj_HashTable);
1462  node->hj_HashTable = NULL;
1463  node->hj_JoinState = HJ_BUILD_HASHTABLE;
1464 
1465  /*
1466  * if chgParam of subnode is not null then plan will be re-scanned
1467  * by first ExecProcNode.
1468  */
1469  if (innerPlan->chgParam == NULL)
1470  ExecReScan(innerPlan);
1471  }
1472  }
1473 
1474  /* Always reset intra-tuple state */
1475  node->hj_CurHashValue = 0;
1476  node->hj_CurBucketNo = 0;
1477  node->hj_CurSkewBucketNo = INVALID_SKEW_BUCKET_NO;
1478  node->hj_CurTuple = NULL;
1479 
1480  node->hj_MatchedOuter = false;
1481  node->hj_FirstOuterTupleSlot = NULL;
1482 
1483  /*
1484  * if chgParam of subnode is not null then plan will be re-scanned by
1485  * first ExecProcNode.
1486  */
1487  if (outerPlan->chgParam == NULL)
1488  ExecReScan(outerPlan);
1489 }
1490 
1491 void
1492 ExecShutdownHashJoin(HashJoinState *node)
1493 {
1494  if (node->hj_HashTable)
1495  {
1496  /*
1497  * Detach from shared state before DSM memory goes away. This makes
1498  * sure that we don't have any pointers into DSM memory by the time
1499  * ExecEndHashJoin runs.
1500  */
1501  ExecHashTableDetachBatch(node->hj_HashTable);
1502  ExecHashTableDetach(node->hj_HashTable);
1503  }
1504 }
1505 
1506 static void
1507 ExecParallelHashJoinPartitionOuter(HashJoinState *hjstate)
1508 {
1509  PlanState *outerState = outerPlanState(hjstate);
1510  ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
1511  HashJoinTable hashtable = hjstate->hj_HashTable;
1512  TupleTableSlot *slot;
1513  uint32 hashvalue;
1514  int i;
1515 
1516  Assert(hjstate->hj_FirstOuterTupleSlot == NULL);
1517 
1518  /* Execute outer plan, writing all tuples to shared tuplestores. */
1519  for (;;)
1520  {
1521  slot = ExecProcNode(outerState);
1522  if (TupIsNull(slot))
1523  break;
1524  econtext->ecxt_outertuple = slot;
1525  if (ExecHashGetHashValue(hashtable, econtext,
1526  hjstate->hj_OuterHashKeys,
1527  true, /* outer tuple */
1528  HJ_FILL_OUTER(hjstate),
1529  &hashvalue))
1530  {
1531  int batchno;
1532  int bucketno;
1533  bool shouldFree;
1534  MinimalTuple mintup = ExecFetchSlotMinimalTuple(slot, &shouldFree);
1535 
1536  ExecHashGetBucketAndBatch(hashtable, hashvalue, &bucketno,
1537  &batchno);
1538  sts_puttuple(hashtable->batches[batchno].outer_tuples,
1539  &hashvalue, mintup);
1540 
1541  if (shouldFree)
1542  heap_free_minimal_tuple(mintup);
1543  }
1544  CHECK_FOR_INTERRUPTS();
1545  }
1546 
1547  /* Make sure all outer partitions are readable by any backend. */
1548  for (i = 0; i < hashtable->nbatch; ++i)
1549  sts_end_write(hashtable->batches[i].outer_tuples);
1550 }
1551 
1552 void
1553 ExecHashJoinEstimate(HashJoinState *state, ParallelContext *pcxt)
1554 {
1555  shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelHashJoinState));
1556  shm_toc_estimate_keys(&pcxt->estimator, 1);
1557 }
1558 
1559 void
1560 ExecHashJoinInitializeDSM(HashJoinState *state, ParallelContext *pcxt)
1561 {
1562  int plan_node_id = state->js.ps.plan->plan_node_id;
1563  HashState *hashNode;
1564  ParallelHashJoinState *pstate;
1565 
1566  /*
1567  * Disable shared hash table mode if we failed to create a real DSM
1568  * segment, because that means that we don't have a DSA area to work with.
1569  */
1570  if (pcxt->seg == NULL)
1571  return;
1572 
1573  ExecSetExecProcNode(&state->js.ps, ExecParallelHashJoin);
1574 
1575  /*
1576  * Set up the state needed to coordinate access to the shared hash
1577  * table(s), using the plan node ID as the toc key.
1578  */
1579  pstate = shm_toc_allocate(pcxt->toc, sizeof(ParallelHashJoinState));
1580  shm_toc_insert(pcxt->toc, plan_node_id, pstate);
1581 
1582  /*
1583  * Set up the shared hash join state with no batches initially.
1584  * ExecHashTableCreate() will prepare at least one later and set nbatch
1585  * and space_allowed.
1586  */
1587  pstate->nbatch = 0;
1588  pstate->space_allowed = 0;
1589  pstate->batches = InvalidDsaPointer;
1590  pstate->old_batches = InvalidDsaPointer;
1591  pstate->nbuckets = 0;
1592  pstate->growth = PHJ_GROWTH_OK;
1593  pstate->chunk_work_queue = InvalidDsaPointer;
1594  pg_atomic_init_u32(&pstate->distributor, 0);
1595  pstate->nparticipants = pcxt->nworkers + 1;
1596  pstate->total_tuples = 0;
1597  LWLockInitialize(&pstate->lock,
1598  LWTRANCHE_PARALLEL_HASH_JOIN);
1599  BarrierInit(&pstate->build_barrier, 0);
1600  BarrierInit(&pstate->grow_batches_barrier, 0);
1601  BarrierInit(&pstate->grow_buckets_barrier, 0);
1602 
1603  /* Set up the space we'll use for shared temporary files. */
1604  SharedFileSetInit(&pstate->fileset, pcxt->seg);
1605 
1606  /* Initialize the shared state in the hash node. */
1607  hashNode = (HashState *) innerPlanState(state);
1608  hashNode->parallel_state = pstate;
1609 }
1610 
1611 /* ----------------------------------------------------------------
1612  * ExecHashJoinReInitializeDSM
1613  *
1614  * Reset shared state before beginning a fresh scan.
1615  * ----------------------------------------------------------------
1616  */
1617 void
1618 ExecHashJoinReInitializeDSM(HashJoinState *state, ParallelContext *pcxt)
1619 {
1620  int plan_node_id = state->js.ps.plan->plan_node_id;
1621  ParallelHashJoinState *pstate =
1622  shm_toc_lookup(pcxt->toc, plan_node_id, false);
1623 
1624  /*
1625  * It would be possible to reuse the shared hash table in single-batch
1626  * cases by resetting and then fast-forwarding build_barrier to
1627  * PHJ_BUILD_FREE and batch 0's batch_barrier to PHJ_BATCH_PROBE, but
1628  * currently shared hash tables are already freed by now (by the last
1629  * participant to detach from the batch). We could consider keeping it
1630  * around for single-batch joins. We'd also need to adjust
1631  * finalize_plan() so that it doesn't record a dummy dependency for
1632  * Parallel Hash nodes, preventing the rescan optimization. For now we
1633  * don't try.
1634  */
1635 
1636  /* Detach, freeing any remaining shared memory. */
1637  if (state->hj_HashTable != NULL)
1638  {
1639  ExecHashTableDetachBatch(state->hj_HashTable);
1640  ExecHashTableDetach(state->hj_HashTable);
1641  }
1642 
1643  /* Clear any shared batch files. */
1644  SharedFileSetDeleteAll(&pstate->fileset);
1645 
1646  /* Reset build_barrier to PHJ_BUILD_ELECT so we can go around again. */
1647  BarrierInit(&pstate->build_barrier, 0);
1648 }
1649 
1650 void
1651 ExecHashJoinInitializeWorker(HashJoinState *state,
1652  ParallelWorkerContext *pwcxt)
1653 {
1654  HashState *hashNode;
1655  int plan_node_id = state->js.ps.plan->plan_node_id;
1656  ParallelHashJoinState *pstate =
1657  shm_toc_lookup(pwcxt->toc, plan_node_id, false);
1658 
1659  /* Attach to the space for shared temporary files. */
1660  SharedFileSetAttach(&pstate->fileset, pwcxt->seg);
1661 
1662  /* Attach to the shared state in the hash node. */
1663  hashNode = (HashState *) innerPlanState(state);
1664  hashNode->parallel_state = pstate;
1665 
1666  ExecSetExecProcNode(&state->js.ps, ExecParallelHashJoin);
1667 }
Definition: nodeHashjoin.c:868
#define HJ_FILL_OUTER_TUPLE
Definition: nodeHashjoin.c:182
static bool ExecHashJoinNewBatch(HashJoinState *hjstate)
static TupleTableSlot * ExecParallelHashJoinOuterGetTuple(PlanState *outerNode, HashJoinState *hjstate, uint32 *hashvalue)
Definition: nodeHashjoin.c:973
#define HJ_FILL_INNER(hjstate)
Definition: nodeHashjoin.c:189
static bool ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
static TupleTableSlot * ExecHashJoinGetSavedTuple(HashJoinState *hjstate, BufFile *file, uint32 *hashvalue, TupleTableSlot *tupleSlot)
static TupleTableSlot * ExecParallelHashJoin(PlanState *pstate)
Definition: nodeHashjoin.c:702
void ExecShutdownHashJoin(HashJoinState *node)
#define HJ_FILL_INNER_TUPLES
Definition: nodeHashjoin.c:183
void ExecHashJoinEstimate(HashJoinState *state, ParallelContext *pcxt)
static TupleTableSlot * ExecHashJoinOuterGetTuple(PlanState *outerNode, HashJoinState *hjstate, uint32 *hashvalue)
Definition: nodeHashjoin.c:899
void ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue, BufFile **fileptr, HashJoinTable hashtable)
#define HJ_NEED_NEW_OUTER
Definition: nodeHashjoin.c:180
#define HJ_FILL_OUTER(hjstate)
Definition: nodeHashjoin.c:187
static TupleTableSlot * ExecHashJoin(PlanState *pstate)
Definition: nodeHashjoin.c:686
void ExecReScanHashJoin(HashJoinState *node)
void ExecHashJoinReInitializeDSM(HashJoinState *state, ParallelContext *pcxt)
void ExecHashJoinInitializeWorker(HashJoinState *state, ParallelWorkerContext *pwcxt)
static void ExecParallelHashJoinPartitionOuter(HashJoinState *hjstate)
#define HJ_BUILD_HASHTABLE
Definition: nodeHashjoin.c:179
#define makeNode(_type_)
Definition: nodes.h:155
#define castNode(_type_, nodeptr)
Definition: nodes.h:176
@ JOIN_SEMI
Definition: nodes.h:307
@ JOIN_FULL
Definition: nodes.h:295
@ JOIN_INNER
Definition: nodes.h:293
@ JOIN_RIGHT
Definition: nodes.h:296
@ JOIN_RIGHT_SEMI
Definition: nodes.h:309
@ JOIN_LEFT
Definition: nodes.h:294
@ JOIN_RIGHT_ANTI
Definition: nodes.h:310
@ JOIN_ANTI
Definition: nodes.h:308
#define innerPlan(node)
Definition: plannodes.h:181
#define outerPlan(node)
Definition: plannodes.h:182
MemoryContextSwitchTo(old_ctx)
void SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg)
Definition: sharedfileset.c:56
void SharedFileSetDeleteAll(SharedFileSet *fileset)
Definition: sharedfileset.c:83
void SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
Definition: sharedfileset.c:38
MinimalTuple sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
void sts_end_write(SharedTuplestoreAccessor *accessor)
void sts_end_parallel_scan(SharedTuplestoreAccessor *accessor)
void sts_puttuple(SharedTuplestoreAccessor *accessor, void *meta_data, MinimalTuple tuple)
void sts_begin_parallel_scan(SharedTuplestoreAccessor *accessor)
void shm_toc_insert(shm_toc *toc, uint64 key, void *address)
Definition: shm_toc.c:171
void * shm_toc_allocate(shm_toc *toc, Size nbytes)
Definition: shm_toc.c:88
void * shm_toc_lookup(shm_toc *toc, uint64 key, bool noError)
Definition: shm_toc.c:232
#define shm_toc_estimate_chunk(e, sz)
Definition: shm_toc.h:51
#define shm_toc_estimate_keys(e, cnt)
Definition: shm_toc.h:53
TupleTableSlot * ecxt_innertuple
Definition: execnodes.h:257
TupleTableSlot * ecxt_outertuple
Definition: execnodes.h:259
HashJoinTuple hj_CurTuple
Definition: execnodes.h:2224
int hj_CurSkewBucketNo
Definition: execnodes.h:2223
List * hj_OuterHashKeys
Definition: execnodes.h:2217
TupleTableSlot * hj_NullOuterTupleSlot
Definition: execnodes.h:2227
TupleTableSlot * hj_OuterTupleSlot
Definition: execnodes.h:2225
bool hj_OuterNotEmpty
Definition: execnodes.h:2232
TupleTableSlot * hj_NullInnerTupleSlot
Definition: execnodes.h:2228
List * hj_HashOperators
Definition: execnodes.h:2218
ExprState * hashclauses
Definition: execnodes.h:2216
JoinState js
Definition: execnodes.h:2215
TupleTableSlot * hj_FirstOuterTupleSlot
Definition: execnodes.h:2229
bool hj_MatchedOuter
Definition: execnodes.h:2231
uint32 hj_CurHashValue
Definition: execnodes.h:2221
List * hj_Collations
Definition: execnodes.h:2219
int hj_CurBucketNo
Definition: execnodes.h:2222
HashJoinTable hj_HashTable
Definition: execnodes.h:2220
TupleTableSlot * hj_HashTupleSlot
Definition: execnodes.h:2226
ParallelHashJoinBatchAccessor * batches
Definition: hashjoin.h:373
double totalTuples
Definition: hashjoin.h:332
ParallelHashJoinState * parallel_state
Definition: hashjoin.h:372
MemoryContext spillCxt
Definition: hashjoin.h:364
int * skewBucketNums
Definition: hashjoin.h:322
BufFile ** innerBatchFile
Definition: hashjoin.h:343
BufFile ** outerBatchFile
Definition: hashjoin.h:344
HashSkewBucket ** skewBucket
Definition: hashjoin.h:319
List * hashcollations
Definition: plannodes.h:867
List * hashclauses
Definition: plannodes.h:865
List * hashoperators
Definition: plannodes.h:866
Join join
Definition: plannodes.h:864
List * hashkeys
Definition: plannodes.h:873
struct ParallelHashJoinState * parallel_state
Definition: execnodes.h:2790
HashJoinTable hashtable
Definition: execnodes.h:2771
PlanState ps
Definition: execnodes.h:2770
HashInstrumentation * hinstrument
Definition: execnodes.h:2787
JoinType jointype
Definition: execnodes.h:2113
PlanState ps
Definition: execnodes.h:2112
ExprState * joinqual
Definition: execnodes.h:2116
bool single_match
Definition: execnodes.h:2114
List * joinqual
Definition: plannodes.h:793
JoinType jointype
Definition: plannodes.h:791
bool inner_unique
Definition: plannodes.h:792
dsm_segment * seg
Definition: parallel.h:42
shm_toc_estimator estimator
Definition: parallel.h:41
shm_toc * toc
Definition: parallel.h:44
SharedTuplestoreAccessor * outer_tuples
Definition: hashjoin.h:221
ParallelHashJoinBatch * shared
Definition: hashjoin.h:209
SharedTuplestoreAccessor * inner_tuples
Definition: hashjoin.h:220
Barrier grow_batches_barrier
Definition: hashjoin.h:261
dsa_pointer old_batches
Definition: hashjoin.h:249
dsa_pointer chunk_work_queue
Definition: hashjoin.h:254
Barrier grow_buckets_barrier
Definition: hashjoin.h:262
ParallelHashGrowth growth
Definition: hashjoin.h:253
pg_atomic_uint32 distributor
Definition: hashjoin.h:263
SharedFileSet fileset
Definition: hashjoin.h:265
dsa_pointer batches
Definition: hashjoin.h:248
dsm_segment * seg
Definition: parallel.h:52
Instrumentation * instrument
Definition: execnodes.h:1126
ExprState * qual
Definition: execnodes.h:1137
Plan * plan
Definition: execnodes.h:1116
EState * state
Definition: execnodes.h:1118
ExprContext * ps_ExprContext
Definition: execnodes.h:1155
TupleTableSlot * ps_ResultTupleSlot
Definition: execnodes.h:1154
ProjectionInfo * ps_ProjInfo
Definition: execnodes.h:1156
ExecProcNodeMtd ExecProcNode
Definition: execnodes.h:1122
Cost total_cost
Definition: plannodes.h:129
Cost startup_cost
Definition: plannodes.h:128
Definition: regguts.h:323
static TupleTableSlot * ExecClearTuple(TupleTableSlot *slot)
Definition: tuptable.h:454
#define TupIsNull(slot)
Definition: tuptable.h:306