nodeAgg.c
1 /*-------------------------------------------------------------------------
2  *
3  * nodeAgg.c
4  * Routines to handle aggregate nodes.
5  *
6  * ExecAgg normally evaluates each aggregate in the following steps:
7  *
8  * transvalue = initcond
9  * foreach input_tuple do
10  * transvalue = transfunc(transvalue, input_value(s))
11  * result = finalfunc(transvalue, direct_argument(s))
12  *
13  * If a finalfunc is not supplied then the result is just the ending
14  * value of transvalue.
15  *
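As a concrete illustration of the steps above, here is a minimal, self-contained C sketch (hypothetical types and functions, not part of this file) of how avg() fits the transvalue/transfunc/finalfunc scheme:

/* Illustrative sketch only: avg() modeled outside the executor. */
typedef struct AvgTrans
{
    double      sum;
    long        count;
} AvgTrans;

/* transvalue = transfunc(transvalue, input_value) */
static AvgTrans
avg_transfunc(AvgTrans trans, double input)
{
    trans.sum += input;
    trans.count += 1;
    return trans;
}

/* result = finalfunc(transvalue) */
static double
avg_finalfunc(AvgTrans trans)
{
    return (trans.count > 0) ? trans.sum / trans.count : 0.0;
}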
16  * Other behaviors can be selected by the "aggsplit" mode, which exists
17  * to support partial aggregation. It is possible to:
18  * * Skip running the finalfunc, so that the output is always the
19  * final transvalue state.
20  * * Substitute the combinefunc for the transfunc, so that transvalue
21  * states (propagated up from a child partial-aggregation step) are merged
22  * rather than processing raw input rows. (The statements below about
23  * the transfunc apply equally to the combinefunc, when it's selected.)
24  * * Apply the serializefunc to the output values (this only makes sense
25  * when skipping the finalfunc, since the serializefunc works on the
26  * transvalue data type).
27  * * Apply the deserializefunc to the input values (this only makes sense
28  * when using the combinefunc, for similar reasons).
29  * It is the planner's responsibility to connect up Agg nodes using these
30  * alternate behaviors in a way that makes sense, with partial aggregation
31  * results being fed to nodes that expect them.
32  *
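For orientation, a plan combining these modes typically has the following shape (an assumed example; the exact shape is up to the planner):

/*
 *   Finalize Aggregate           -- combinefunc (+ deserializefunc) + finalfunc
 *     -> Gather
 *          -> Partial Aggregate  -- transfunc (+ serializefunc), finalfunc skipped
 *               -> Parallel Seq Scan
 */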
33  * If a normal aggregate call specifies DISTINCT or ORDER BY, we sort the
34  * input tuples and eliminate duplicates (if required) before performing
35  * the above-depicted process. (However, we don't do that for ordered-set
36  * aggregates; their "ORDER BY" inputs are ordinary aggregate arguments
37  * so far as this module is concerned.) Note that partial aggregation
38  * is not supported in these cases, since we couldn't ensure global
39  * ordering or distinctness of the inputs.
40  *
41  * If transfunc is marked "strict" in pg_proc and initcond is NULL,
42  * then the first non-NULL input_value is assigned directly to transvalue,
43  * and transfunc isn't applied until the second non-NULL input_value.
44  * The agg's first input type and transtype must be the same in this case!
45  *
46  * If transfunc is marked "strict" then NULL input_values are skipped,
47  * keeping the previous transvalue. If transfunc is not strict then it
48  * is called for every input tuple and must deal with NULL initcond
49  * or NULL input_values for itself.
50  *
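Taken together, the strictness rules above amount to roughly this control flow (a simplified sketch with hypothetical helper names, shown for a one-argument aggregate; compare the real logic in advance_transition_function() below):

/* Sketch only: strict-transfunc NULL handling for one input column. */
if (transfn_is_strict)
{
    if (input_is_null)
        return;                 /* skip this row, keep the prior transvalue */
    if (no_transvalue_yet)
    {
        /* adopt the first non-NULL input as the transition value */
        transvalue = copy_into_aggcontext(input);
        no_transvalue_yet = false;
        return;
    }
    if (transvalue_is_null)
        return;                 /* a strict fn must not see a NULL state */
}
transvalue = transfunc(transvalue, input);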
51  * If finalfunc is marked "strict" then it is not called when the
52  * ending transvalue is NULL, instead a NULL result is created
53  * automatically (this is just the usual handling of strict functions,
54  * of course). A non-strict finalfunc can make its own choice of
55  * what to return for a NULL ending transvalue.
56  *
57  * Ordered-set aggregates are treated specially in one other way: we
58  * evaluate any "direct" arguments and pass them to the finalfunc along
59  * with the transition value.
60  *
61  * A finalfunc can have additional arguments beyond the transvalue and
62  * any "direct" arguments, corresponding to the input arguments of the
63  * aggregate. These are always just passed as NULL. Such arguments may be
64  * needed to allow resolution of a polymorphic aggregate's result type.
65  *
66  * We compute aggregate input expressions and run the transition functions
67  * in a temporary econtext (aggstate->tmpcontext). This is reset at least
68  * once per input tuple, so when the transvalue datatype is
69  * pass-by-reference, we have to be careful to copy it into a longer-lived
70  * memory context, and free the prior value to avoid memory leakage. We
71  * store transvalues in another set of econtexts, aggstate->aggcontexts
72  * (one per grouping set, see below), which are also used for the hashtable
73  * structures in AGG_HASHED mode. These econtexts are rescanned, not just
74  * reset, at group boundaries so that aggregate transition functions can
75  * register shutdown callbacks via AggRegisterCallback.
76  *
77  * The node's regular econtext (aggstate->ss.ps.ps_ExprContext) is used to
78  * run finalize functions and compute the output tuple; this context can be
79  * reset once per output tuple.
80  *
81  * The executor's AggState node is passed as the fmgr "context" value in
82  * all transfunc and finalfunc calls. It is not recommended that the
83  * transition functions look at the AggState node directly, but they can
84  * use AggCheckCallContext() to verify that they are being called by
85  * nodeAgg.c (and not as ordinary SQL functions). The main reason a
86  * transition function might want to know this is so that it can avoid
87  * palloc'ing a fixed-size pass-by-ref transition value on every call:
88  * it can instead just scribble on and return its left input. Ordinarily
89  * it is completely forbidden for functions to modify pass-by-ref inputs,
90  * but in the aggregate case we know the left input is either the initial
91  * transition value or a previous function result, and in either case its
92  * value need not be preserved. See int8inc() for an example. Notice that
93  * the EEOP_AGG_PLAIN_TRANS step is coded to avoid a data copy step when
94  * the previous transition value pointer is returned. It is also possible
95  * to avoid repeated data copying when the transition value is an expanded
96  * object: to do that, the transition function must take care to return
97  * an expanded object that is in a child context of the memory context
98  * returned by AggCheckCallContext(). Also, some transition functions want
99  * to store working state in addition to the nominal transition value; they
100  * can use the memory context returned by AggCheckCallContext() to do that.
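As a sketch of the pattern this paragraph describes, a user-defined transition function might look like the following (hypothetical function and state type, assuming a non-strict transfunc with an "internal" transition type; loosely modeled on the int8inc() idea, not code from this file):

#include "postgres.h"
#include "fmgr.h"

typedef struct MySumState
{
    int64       sum;
} MySumState;

PG_FUNCTION_INFO_V1(my_sum_transfn);

Datum
my_sum_transfn(PG_FUNCTION_ARGS)
{
    MemoryContext aggcontext;
    MySumState *state;

    /* verify we are being called by nodeAgg.c, and get its memory context */
    if (!AggCheckCallContext(fcinfo, &aggcontext))
        elog(ERROR, "my_sum_transfn called in non-aggregate context");

    if (PG_ARGISNULL(0))
    {
        /* first call: allocate state in the long-lived aggregate context */
        state = (MySumState *) MemoryContextAllocZero(aggcontext,
                                                      sizeof(MySumState));
    }
    else
        state = (MySumState *) PG_GETARG_POINTER(0);

    if (!PG_ARGISNULL(1))
        state->sum += PG_GETARG_INT64(1);   /* scribble on the state in place */

    PG_RETURN_POINTER(state);
}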
101  *
102  * Note: AggCheckCallContext() is available as of PostgreSQL 9.0. The
103  * AggState is available as context in earlier releases (back to 8.1),
104  * but direct examination of the node is needed to use it before 9.0.
105  *
106  * As of 9.4, aggregate transition functions can also use AggGetAggref()
107  * to get hold of the Aggref expression node for their aggregate call.
108  * This is mainly intended for ordered-set aggregates, which are not
109  * supported as window functions. (A regular aggregate function would
110  * need some fallback logic to use this, since there's no Aggref node
111  * for a window function.)
112  *
113  * Grouping sets:
114  *
115  * A list of grouping sets which is structurally equivalent to a ROLLUP
116  * clause (e.g. (a,b,c), (a,b), (a)) can be processed in a single pass over
117  * ordered data. We do this by keeping a separate set of transition values
118  * for each grouping set being concurrently processed; for each input tuple
119  * we update them all, and on group boundaries we reset those states
120  * (starting at the front of the list) whose grouping values have changed
121  * (the list of grouping sets is ordered from most specific to least
122  * specific).
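A worked example of the reset rule (illustrative), for GROUPING SETS ((a,b,c), (a,b), (a)) over input sorted by (a, b, c):

/*
 *   c changes  -> reset the (a,b,c) state only
 *   b changes  -> reset the (a,b,c) and (a,b) states
 *   a changes  -> reset all three states
 */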
123  *
124  * Where more complex grouping sets are used, we break them down into
125  * "phases", where each phase has a different sort order (except phase 0
126  * which is reserved for hashing). During each phase but the last, the
127  * input tuples are additionally stored in a tuplesort which is keyed to the
128  * next phase's sort order; during each phase but the first, the input
129  * tuples are drawn from the previously sorted data. (The sorting of the
130  * data for the first phase is handled by the planner, as it might be
131  * satisfied by underlying nodes.)
132  *
133  * Hashing can be mixed with sorted grouping. To do this, we have an
134  * AGG_MIXED strategy that populates the hashtables during the first sorted
135  * phase, and switches to reading them out after completing all sort phases.
136  * We can also support AGG_HASHED with multiple hash tables and no sorting
137  * at all.
138  *
139  * From the perspective of aggregate transition and final functions, the
140  * only issue regarding grouping sets is this: a single call site (flinfo)
141  * of an aggregate function may be used for updating several different
142  * transition values in turn. So the function must not cache in the flinfo
143  * anything which logically belongs as part of the transition value (most
144  * importantly, the memory context in which the transition value exists).
145  * The support API functions (AggCheckCallContext, AggRegisterCallback) are
146  * sensitive to the grouping set for which the aggregate function is
147  * currently being called.
148  *
149  * Plan structure:
150  *
151  * What we get from the planner is actually one "real" Agg node which is
152  * part of the plan tree proper, but which optionally has an additional list
153  * of Agg nodes hung off the side via the "chain" field. This is because an
154  * Agg node happens to be a convenient representation of all the data we
155  * need for grouping sets.
156  *
157  * For many purposes, we treat the "real" node as if it were just the first
158  * node in the chain. The chain must be ordered such that hashed entries
159  * come before sorted/plain entries; the real node is marked AGG_MIXED if
160  * there are both types present (in which case the real node describes one
161  * of the hashed groupings, other AGG_HASHED nodes may optionally follow in
162  * the chain, followed in turn by AGG_SORTED or (one) AGG_PLAIN node). If
163  * the real node is marked AGG_HASHED or AGG_SORTED, then all the chained
164  * nodes must be of the same type; if it is AGG_PLAIN, there can be no
165  * chained nodes.
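Schematically, the permitted chain layouts are (assumed illustration):

/*
 *   AGG_MIXED:  real node (one hashed set), chain = [more AGG_HASHED...,
 *               then AGG_SORTED... or one AGG_PLAIN]
 *   AGG_HASHED: real node, chain = [more AGG_HASHED...]
 *   AGG_SORTED: real node, chain = [more AGG_SORTED...]
 *   AGG_PLAIN:  real node only, no chain
 */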
166  *
167  * We collect all hashed nodes into a single "phase", numbered 0, and create
168  * a sorted phase (numbered 1..n) for each AGG_SORTED or AGG_PLAIN node.
169  * Phase 0 is allocated even if there are no hashes, but remains unused in
170  * that case.
171  *
172  * AGG_HASHED nodes actually refer to only a single grouping set each,
173  * because for each hashed grouping we need a separate grpColIdx and
174  * numGroups estimate. AGG_SORTED nodes represent a "rollup", a list of
175  * grouping sets that share a sort order. Each AGG_SORTED node other than
176  * the first one has an associated Sort node which describes the sort order
177  * to be used; the first sorted node takes its input from the outer subtree,
178  * which the planner has already arranged to provide ordered data.
179  *
180  * Memory and ExprContext usage:
181  *
182  * Because we're accumulating aggregate values across input rows, we need to
183  * use more memory contexts than just simple input/output tuple contexts.
184  * In fact, for a rollup, we need a separate context for each grouping set
185  * so that we can reset the inner (finer-grained) aggregates on their group
186  * boundaries while continuing to accumulate values for outer
187  * (coarser-grained) groupings. On top of this, we might be simultaneously
188  * populating hashtables; however, we only need one context for all the
189  * hashtables.
190  *
191  * So we create an array, aggcontexts, with an ExprContext for each grouping
192  * set in the largest rollup that we're going to process, and use the
193  * per-tuple memory context of those ExprContexts to store the aggregate
194  * transition values. hashcontext is the single context created to support
195  * all hash tables.
196  *
197  * Spilling To Disk
198  *
199  * When performing hash aggregation, if the hash table memory exceeds the
200  * limit (see hash_agg_check_limits()), we enter "spill mode". In spill
201  * mode, we advance the transition states only for groups already in the
202  * hash table. For tuples that would need to create new hash table
203  * entries (and initialize new transition states), we instead spill them to
204  * disk to be processed later. The tuples are spilled in a partitioned
205  * manner, so that subsequent batches are smaller and less likely to exceed
206  * hash_mem (if a batch does exceed hash_mem, it must be spilled
207  * recursively).
208  *
209  * Spilled data is written to logical tapes. These provide better control
210  * over memory usage, disk space, and the number of files than if we were
211  * to use a BufFile for each spill. We don't know the number of tapes needed
212  * at the start of the algorithm (because it can recurse), so a tape set is
213  * allocated at the beginning, and individual tapes are created as needed.
214  * As a particular tape is read, logtape.c recycles its disk space. When a
215  * tape is read to completion, it is destroyed entirely.
216  *
217  * Tapes' buffers can take up substantial memory when many tapes are open at
218  * once. We only need one tape open at a time in read mode (using a buffer
219  * that's a multiple of BLCKSZ); but we need one tape open in write mode (each
220  * requiring a buffer of size BLCKSZ) for each partition.
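As rough worst-case arithmetic (assuming the default BLCKSZ of 8192 bytes and the HASHAGG_MAX_PARTITIONS limit of 1024 defined below):

    1024 write tapes x 8192 B = 8 MiB of write-buffer memory

whereas reading needs only a single buffer of a small multiple of BLCKSZ at a time.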
221  *
222  * Note that it's possible for transition states to start small but then
223  * grow very large; for instance in the case of ARRAY_AGG. In such cases,
224  * it's still possible to significantly exceed hash_mem. We try to avoid
225  * this situation by estimating what will fit in the available memory, and
226  * imposing a limit on the number of groups separately from the amount of
227  * memory consumed.
228  *
229  * Transition / Combine function invocation:
230  *
231  * For performance reasons transition functions, including combine
232  * functions, aren't invoked one-by-one from nodeAgg.c after computing
233  * arguments using the expression evaluation engine. Instead
234  * ExecBuildAggTrans() builds one large expression that does both argument
235  * evaluation and transition function invocation. That avoids performance
236  * issues due to repeated uses of expression evaluation, complications due
237  * to filter expressions having to be evaluated early, and allows to JIT
238  * to filter expressions having to be evaluated early, and allows the
239  * entire expression to be JIT compiled into one native function.
240  * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
241  * Portions Copyright (c) 1994, Regents of the University of California
242  *
243  * IDENTIFICATION
244  * src/backend/executor/nodeAgg.c
245  *
246  *-------------------------------------------------------------------------
247  */
248 
249 #include "postgres.h"
250 
251 #include "access/htup_details.h"
252 #include "access/parallel.h"
253 #include "catalog/objectaccess.h"
254 #include "catalog/pg_aggregate.h"
255 #include "catalog/pg_proc.h"
256 #include "catalog/pg_type.h"
257 #include "common/hashfn.h"
258 #include "executor/execExpr.h"
259 #include "executor/executor.h"
260 #include "executor/nodeAgg.h"
261 #include "lib/hyperloglog.h"
262 #include "miscadmin.h"
263 #include "nodes/makefuncs.h"
264 #include "nodes/nodeFuncs.h"
265 #include "optimizer/optimizer.h"
266 #include "parser/parse_agg.h"
267 #include "parser/parse_coerce.h"
268 #include "utils/acl.h"
269 #include "utils/builtins.h"
270 #include "utils/datum.h"
271 #include "utils/dynahash.h"
272 #include "utils/expandeddatum.h"
273 #include "utils/logtape.h"
274 #include "utils/lsyscache.h"
275 #include "utils/memutils.h"
276 #include "utils/syscache.h"
277 #include "utils/tuplesort.h"
278 
279 /*
280  * Control how many partitions are created when spilling HashAgg to
281  * disk.
282  *
283  * HASHAGG_PARTITION_FACTOR is multiplied by the estimated number of
284  * partitions needed such that each partition will fit in memory. The factor
285  * is set higher than one because there's not a high cost to having a few too
286  * many partitions, and it makes it less likely that a partition will need to
287  * be spilled recursively. Another benefit of having more, smaller partitions
288  * is that small hash tables may perform better than large ones due to memory
289  * caching effects.
290  *
291  * We also specify a min and max number of partitions per spill. Too few might
292  * mean a lot of wasted I/O from repeated spilling of the same tuples. Too
293  * many will result in lots of memory wasted buffering the spill files (which
294  * could instead be spent on a larger hash table).
295  */
296 #define HASHAGG_PARTITION_FACTOR 1.50
297 #define HASHAGG_MIN_PARTITIONS 4
298 #define HASHAGG_MAX_PARTITIONS 1024
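A simplified sketch (assumed names and logic) of how these constants feed the partition-count choice; compare hash_choose_num_partitions(), declared below:

static int
choose_npartitions_sketch(double input_groups, double hashentrysize,
                          double hash_mem_bytes)
{
    /* how many partitions would let each spill batch fit in hash_mem? */
    double  needed = (input_groups * hashentrysize) / hash_mem_bytes;
    int     npartitions = (int) (HASHAGG_PARTITION_FACTOR * needed);

    if (npartitions < HASHAGG_MIN_PARTITIONS)
        npartitions = HASHAGG_MIN_PARTITIONS;
    if (npartitions > HASHAGG_MAX_PARTITIONS)
        npartitions = HASHAGG_MAX_PARTITIONS;

    return npartitions;     /* the real code also rounds to a power of two */
}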
299 
300 /*
301  * For reading from tapes, the buffer size must be a multiple of
302  * BLCKSZ. Larger values help when reading from multiple tapes concurrently,
303  * but that doesn't happen in HashAgg, so we simply use BLCKSZ. Writing to a
304  * tape always uses a buffer of size BLCKSZ.
305  */
306 #define HASHAGG_READ_BUFFER_SIZE BLCKSZ
307 #define HASHAGG_WRITE_BUFFER_SIZE BLCKSZ
308 
309 /*
310  * HyperLogLog is used for estimating the cardinality of the spilled tuples in
311  * a given partition. 5 bits corresponds to a size of about 32 bytes and a
312  * worst-case error of around 18%. That's effective enough to choose a
313  * reasonable number of partitions when recursing.
314  */
315 #define HASHAGG_HLL_BIT_WIDTH 5
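The 18% figure follows from the standard HyperLogLog error bound for m = 2^b registers:

    error ~= 1.04 / sqrt(m) = 1.04 / sqrt(2^5) ~= 0.18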
316 
317 /*
318  * Estimate chunk overhead as a constant 16 bytes. XXX: should this be
319  * improved?
320  */
321 #define CHUNKHDRSZ 16
322 
323 /*
324  * Represents partitioned spill data for a single hashtable. Contains the
325  * necessary information to route tuples to the correct partition, and to
326  * transform the spilled data into new batches.
327  *
328  * The high bits are used for partition selection (when recursing, we ignore
329  * the bits that have already been used for partition selection at an earlier
330  * level).
331  */
332 typedef struct HashAggSpill
333 {
334  int npartitions; /* number of partitions */
335  LogicalTape **partitions; /* spill partition tapes */
336  int64 *ntuples; /* number of tuples in each partition */
337  uint32 mask; /* mask to find partition from hash value */
338  int shift; /* after masking, shift by this amount */
339  hyperLogLogState *hll_card; /* cardinality estimate for contents */
340 } HashAggSpill;
341 
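A sketch (assumed, simplified) of partition routing with the fields above; compare hashagg_spill_tuple() later in this file:

static inline int
spill_partition_sketch(HashAggSpill *spill, uint32 hash)
{
    /* mask selects this level's hash bits, then shift them down to an index */
    return (int) ((hash & spill->mask) >> spill->shift);
}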
342 /*
343  * Represents work to be done for one pass of hash aggregation (with only one
344  * grouping set).
345  *
346  * Also tracks the bits of the hash already used for partition selection by
347  * earlier iterations, so that this batch can use new bits. If all bits have
348  * already been used, no partitioning will be done (any spilled data will go
349  * to a single output tape).
350  */
351 typedef struct HashAggBatch
352 {
353  int setno; /* grouping set */
354  int used_bits; /* number of bits of hash already used */
355  LogicalTape *input_tape; /* input partition tape */
356  int64 input_tuples; /* number of tuples in this batch */
357  double input_card; /* estimated group cardinality */
358 } HashAggBatch;
359 
360 /* used to find referenced colnos */
361 typedef struct FindColsContext
362 {
363  bool is_aggref; /* is under an aggref */
364  Bitmapset *aggregated; /* column references under an aggref */
365  Bitmapset *unaggregated; /* other column references */
366 } FindColsContext;
367 
368 static void select_current_set(AggState *aggstate, int setno, bool is_hash);
369 static void initialize_phase(AggState *aggstate, int newphase);
370 static TupleTableSlot *fetch_input_tuple(AggState *aggstate);
371 static void initialize_aggregates(AggState *aggstate,
372  AggStatePerGroup *pergroups,
373  int numReset);
374 static void advance_transition_function(AggState *aggstate,
375  AggStatePerTrans pertrans,
376  AggStatePerGroup pergroupstate);
377 static void advance_aggregates(AggState *aggstate);
378 static void process_ordered_aggregate_single(AggState *aggstate,
379  AggStatePerTrans pertrans,
380  AggStatePerGroup pergroupstate);
381 static void process_ordered_aggregate_multi(AggState *aggstate,
382  AggStatePerTrans pertrans,
383  AggStatePerGroup pergroupstate);
384 static void finalize_aggregate(AggState *aggstate,
385  AggStatePerAgg peragg,
386  AggStatePerGroup pergroupstate,
387  Datum *resultVal, bool *resultIsNull);
388 static void finalize_partialaggregate(AggState *aggstate,
389  AggStatePerAgg peragg,
390  AggStatePerGroup pergroupstate,
391  Datum *resultVal, bool *resultIsNull);
392 static inline void prepare_hash_slot(AggStatePerHash perhash,
393  TupleTableSlot *inputslot,
394  TupleTableSlot *hashslot);
395 static void prepare_projection_slot(AggState *aggstate,
396  TupleTableSlot *slot,
397  int currentSet);
398 static void finalize_aggregates(AggState *aggstate,
399  AggStatePerAgg peraggs,
400  AggStatePerGroup pergroup);
401 static TupleTableSlot *project_aggregates(AggState *aggstate);
402 static void find_cols(AggState *aggstate, Bitmapset **aggregated,
403  Bitmapset **unaggregated);
404 static bool find_cols_walker(Node *node, FindColsContext *context);
405 static void build_hash_tables(AggState *aggstate);
406 static void build_hash_table(AggState *aggstate, int setno, long nbuckets);
407 static void hashagg_recompile_expressions(AggState *aggstate, bool minslot,
408  bool nullcheck);
409 static long hash_choose_num_buckets(double hashentrysize,
410  long ngroups, Size memory);
411 static int hash_choose_num_partitions(double input_groups,
412  double hashentrysize,
413  int used_bits,
414  int *log2_npartitions);
415 static void initialize_hash_entry(AggState *aggstate,
416  TupleHashTable hashtable,
417  TupleHashEntry entry);
418 static void lookup_hash_entries(AggState *aggstate);
419 static TupleTableSlot *agg_retrieve_direct(AggState *aggstate);
420 static void agg_fill_hash_table(AggState *aggstate);
421 static bool agg_refill_hash_table(AggState *aggstate);
422 static TupleTableSlot *agg_retrieve_hash_table(AggState *aggstate);
423 static TupleTableSlot *agg_retrieve_hash_table_in_memory(AggState *aggstate);
424 static void hash_agg_check_limits(AggState *aggstate);
425 static void hash_agg_enter_spill_mode(AggState *aggstate);
426 static void hash_agg_update_metrics(AggState *aggstate, bool from_tape,
427  int npartitions);
428 static void hashagg_finish_initial_spills(AggState *aggstate);
429 static void hashagg_reset_spill_state(AggState *aggstate);
430 static HashAggBatch *hashagg_batch_new(LogicalTape *input_tape, int setno,
431  int64 input_tuples, double input_card,
432  int used_bits);
433 static MinimalTuple hashagg_batch_read(HashAggBatch *batch, uint32 *hashp);
434 static void hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset,
435  int used_bits, double input_groups,
436  double hashentrysize);
437 static Size hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
438  TupleTableSlot *inputslot, uint32 hash);
439 static void hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill,
440  int setno);
441 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
442 static void build_pertrans_for_aggref(AggStatePerTrans pertrans,
443  AggState *aggstate, EState *estate,
444  Aggref *aggref, Oid transfn_oid,
445  Oid aggtranstype, Oid aggserialfn,
446  Oid aggdeserialfn, Datum initValue,
447  bool initValueIsNull, Oid *inputTypes,
448  int numArguments);
449 
450 
451 /*
452  * Select the current grouping set; affects current_set and
453  * curaggcontext.
454  */
455 static void
456 select_current_set(AggState *aggstate, int setno, bool is_hash)
457 {
458  /*
459  * When changing this, also adapt ExecAggPlainTransByVal() and
460  * ExecAggPlainTransByRef().
461  */
462  if (is_hash)
463  aggstate->curaggcontext = aggstate->hashcontext;
464  else
465  aggstate->curaggcontext = aggstate->aggcontexts[setno];
466 
467  aggstate->current_set = setno;
468 }
469 
470 /*
471  * Switch to phase "newphase", which must either be 0 or 1 (to reset) or
472  * current_phase + 1. Juggle the tuplesorts accordingly.
473  *
474  * Phase 0 is for hashing, which we currently handle last in the AGG_MIXED
475  * case, so when entering phase 0, all we need to do is drop open sorts.
476  */
477 static void
478 initialize_phase(AggState *aggstate, int newphase)
479 {
480  Assert(newphase <= 1 || newphase == aggstate->current_phase + 1);
481 
482  /*
483  * Whatever the previous state, we're now done with whatever input
484  * tuplesort was in use.
485  */
486  if (aggstate->sort_in)
487  {
488  tuplesort_end(aggstate->sort_in);
489  aggstate->sort_in = NULL;
490  }
491 
492  if (newphase <= 1)
493  {
494  /*
495  * Discard any existing output tuplesort.
496  */
497  if (aggstate->sort_out)
498  {
499  tuplesort_end(aggstate->sort_out);
500  aggstate->sort_out = NULL;
501  }
502  }
503  else
504  {
505  /*
506  * The old output tuplesort becomes the new input one, and this is the
507  * right time to actually sort it.
508  */
509  aggstate->sort_in = aggstate->sort_out;
510  aggstate->sort_out = NULL;
511  Assert(aggstate->sort_in);
512  tuplesort_performsort(aggstate->sort_in);
513  }
514 
515  /*
516  * If this isn't the last phase, we need to sort appropriately for the
517  * next phase in sequence.
518  */
519  if (newphase > 0 && newphase < aggstate->numphases - 1)
520  {
521  Sort *sortnode = aggstate->phases[newphase + 1].sortnode;
522  PlanState *outerNode = outerPlanState(aggstate);
523  TupleDesc tupDesc = ExecGetResultType(outerNode);
524 
525  aggstate->sort_out = tuplesort_begin_heap(tupDesc,
526  sortnode->numCols,
527  sortnode->sortColIdx,
528  sortnode->sortOperators,
529  sortnode->collations,
530  sortnode->nullsFirst,
531  work_mem,
532  NULL, TUPLESORT_NONE);
533  }
534 
535  aggstate->current_phase = newphase;
536  aggstate->phase = &aggstate->phases[newphase];
537 }
538 
539 /*
540  * Fetch a tuple from either the outer plan (for phase 1) or from the sorter
541  * populated by the previous phase. Copy it to the sorter for the next phase
542  * if any.
543  *
544  * Callers cannot rely on memory for tuple in returned slot remaining valid
545  * past any subsequently fetched tuple.
546  */
547 static TupleTableSlot *
548 fetch_input_tuple(AggState *aggstate)
549 {
550  TupleTableSlot *slot;
551 
552  if (aggstate->sort_in)
553  {
554  /* make sure we check for interrupts in either path through here */
555  CHECK_FOR_INTERRUPTS();
556  if (!tuplesort_gettupleslot(aggstate->sort_in, true, false,
557  aggstate->sort_slot, NULL))
558  return NULL;
559  slot = aggstate->sort_slot;
560  }
561  else
562  slot = ExecProcNode(outerPlanState(aggstate));
563 
564  if (!TupIsNull(slot) && aggstate->sort_out)
565  tuplesort_puttupleslot(aggstate->sort_out, slot);
566 
567  return slot;
568 }
569 
570 /*
571  * (Re)Initialize an individual aggregate.
572  *
573  * This function handles only one grouping set, already set in
574  * aggstate->current_set.
575  *
576  * When called, CurrentMemoryContext should be the per-query context.
577  */
578 static void
579 initialize_aggregate(AggState *aggstate, AggStatePerTrans pertrans,
580  AggStatePerGroup pergroupstate)
581 {
582  /*
583  * Start a fresh sort operation for each DISTINCT/ORDER BY aggregate.
584  */
585  if (pertrans->aggsortrequired)
586  {
587  /*
588  * In case of rescan, maybe there could be an uncompleted sort
589  * operation? Clean it up if so.
590  */
591  if (pertrans->sortstates[aggstate->current_set])
592  tuplesort_end(pertrans->sortstates[aggstate->current_set]);
593 
594 
595  /*
596  * We use a plain Datum sorter when there's a single input column;
597  * otherwise sort the full tuple. (See comments for
598  * process_ordered_aggregate_single.)
599  */
600  if (pertrans->numInputs == 1)
601  {
602  Form_pg_attribute attr = TupleDescAttr(pertrans->sortdesc, 0);
603 
604  pertrans->sortstates[aggstate->current_set] =
605  tuplesort_begin_datum(attr->atttypid,
606  pertrans->sortOperators[0],
607  pertrans->sortCollations[0],
608  pertrans->sortNullsFirst[0],
609  work_mem, NULL, TUPLESORT_NONE);
610  }
611  else
612  pertrans->sortstates[aggstate->current_set] =
613  tuplesort_begin_heap(pertrans->sortdesc,
614  pertrans->numSortCols,
615  pertrans->sortColIdx,
616  pertrans->sortOperators,
617  pertrans->sortCollations,
618  pertrans->sortNullsFirst,
619  work_mem, NULL, TUPLESORT_NONE);
620  }
621 
622  /*
623  * (Re)set transValue to the initial value.
624  *
625  * Note that when the initial value is pass-by-ref, we must copy it (into
626  * the aggcontext) since we will pfree the transValue later.
627  */
628  if (pertrans->initValueIsNull)
629  pergroupstate->transValue = pertrans->initValue;
630  else
631  {
632  MemoryContext oldContext;
633 
634  oldContext = MemoryContextSwitchTo(aggstate->curaggcontext->ecxt_per_tuple_memory);
635  pergroupstate->transValue = datumCopy(pertrans->initValue,
636  pertrans->transtypeByVal,
637  pertrans->transtypeLen);
638  MemoryContextSwitchTo(oldContext);
639  }
640  pergroupstate->transValueIsNull = pertrans->initValueIsNull;
641 
642  /*
643  * If the initial value for the transition state doesn't exist in the
644  * pg_aggregate table then we will let the first non-NULL value returned
645  * from the outer procNode become the initial value. (This is useful for
646  * aggregates like max() and min().) The noTransValue flag signals that we
647  * still need to do this.
648  */
649  pergroupstate->noTransValue = pertrans->initValueIsNull;
650 }
651 
652 /*
653  * Initialize all aggregate transition states for a new group of input values.
654  *
655  * If there are multiple grouping sets, we initialize only the first numReset
656  * of them (the grouping sets are ordered so that the most specific one, which
657  * is reset most often, is first). As a convenience, if numReset is 0, we
658  * reinitialize all sets.
659  *
660  * NB: This cannot be used for hash aggregates, as for those the grouping set
661  * number has to be specified from further up.
662  *
663  * When called, CurrentMemoryContext should be the per-query context.
664  */
665 static void
666 initialize_aggregates(AggState *aggstate,
667  AggStatePerGroup *pergroups,
668  int numReset)
669 {
670  int transno;
671  int numGroupingSets = Max(aggstate->phase->numsets, 1);
672  int setno = 0;
673  int numTrans = aggstate->numtrans;
674  AggStatePerTrans transstates = aggstate->pertrans;
675 
676  if (numReset == 0)
677  numReset = numGroupingSets;
678 
679  for (setno = 0; setno < numReset; setno++)
680  {
681  AggStatePerGroup pergroup = pergroups[setno];
682 
683  select_current_set(aggstate, setno, false);
684 
685  for (transno = 0; transno < numTrans; transno++)
686  {
687  AggStatePerTrans pertrans = &transstates[transno];
688  AggStatePerGroup pergroupstate = &pergroup[transno];
689 
690  initialize_aggregate(aggstate, pertrans, pergroupstate);
691  }
692  }
693 }
694 
695 /*
696  * Given new input value(s), advance the transition function of one aggregate
697  * state within one grouping set only (already set in aggstate->current_set)
698  *
699  * The new values (and null flags) have been preloaded into argument positions
700  * 1 and up in pertrans->transfn_fcinfo, so that we needn't copy them again to
701  * pass to the transition function. We also expect that the static fields of
702  * the fcinfo are already initialized; that was done by ExecInitAgg().
703  *
704  * It doesn't matter which memory context this is called in.
705  */
706 static void
707 advance_transition_function(AggState *aggstate,
708  AggStatePerTrans pertrans,
709  AggStatePerGroup pergroupstate)
710 {
711  FunctionCallInfo fcinfo = pertrans->transfn_fcinfo;
712  MemoryContext oldContext;
713  Datum newVal;
714 
715  if (pertrans->transfn.fn_strict)
716  {
717  /*
718  * For a strict transfn, nothing happens when there's a NULL input; we
719  * just keep the prior transValue.
720  */
721  int numTransInputs = pertrans->numTransInputs;
722  int i;
723 
724  for (i = 1; i <= numTransInputs; i++)
725  {
726  if (fcinfo->args[i].isnull)
727  return;
728  }
729  if (pergroupstate->noTransValue)
730  {
731  /*
732  * transValue has not been initialized. This is the first non-NULL
733  * input value. We use it as the initial value for transValue. (We
734  * already checked that the agg's input type is binary-compatible
735  * with its transtype, so straight copy here is OK.)
736  *
737  * We must copy the datum into aggcontext if it is pass-by-ref. We
738  * do not need to pfree the old transValue, since it's NULL.
739  */
740  oldContext = MemoryContextSwitchTo(aggstate->curaggcontext->ecxt_per_tuple_memory);
741  pergroupstate->transValue = datumCopy(fcinfo->args[1].value,
742  pertrans->transtypeByVal,
743  pertrans->transtypeLen);
744  pergroupstate->transValueIsNull = false;
745  pergroupstate->noTransValue = false;
746  MemoryContextSwitchTo(oldContext);
747  return;
748  }
749  if (pergroupstate->transValueIsNull)
750  {
751  /*
752  * Don't call a strict function with NULL inputs. Note it is
753  * possible to get here despite the above tests, if the transfn is
754  * strict *and* returned a NULL on a prior cycle. If that happens
755  * we will propagate the NULL all the way to the end.
756  */
757  return;
758  }
759  }
760 
761  /* We run the transition functions in per-input-tuple memory context */
762  oldContext = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
763 
764  /* set up aggstate->curpertrans for AggGetAggref() */
765  aggstate->curpertrans = pertrans;
766 
767  /*
768  * OK to call the transition function
769  */
770  fcinfo->args[0].value = pergroupstate->transValue;
771  fcinfo->args[0].isnull = pergroupstate->transValueIsNull;
772  fcinfo->isnull = false; /* just in case transfn doesn't set it */
773 
774  newVal = FunctionCallInvoke(fcinfo);
775 
776  aggstate->curpertrans = NULL;
777 
778  /*
779  * If pass-by-ref datatype, must copy the new value into aggcontext and
780  * free the prior transValue. But if transfn returned a pointer to its
781  * first input, we don't need to do anything. Also, if transfn returned a
782  * pointer to a R/W expanded object that is already a child of the
783  * aggcontext, assume we can adopt that value without copying it.
784  *
785  * It's safe to compare newVal with pergroup->transValue without regard
786  * for either being NULL, because ExecAggTransReparent() takes care to set
787  * transValue to 0 when NULL. Otherwise we could end up accidentally not
788  * reparenting, when the transValue has the same numerical value as
789  * newValue, despite being NULL. This is a somewhat hot path, making it
790  * undesirable to instead solve this with another branch for the common
791  * case of the transition function returning its (modified) input
792  * argument.
793  */
794  if (!pertrans->transtypeByVal &&
795  DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue))
796  newVal = ExecAggTransReparent(aggstate, pertrans,
797  newVal, fcinfo->isnull,
798  pergroupstate->transValue,
799  pergroupstate->transValueIsNull);
800 
801  pergroupstate->transValue = newVal;
802  pergroupstate->transValueIsNull = fcinfo->isnull;
803 
804  MemoryContextSwitchTo(oldContext);
805 }
806 
807 /*
808  * Advance each aggregate transition state for one input tuple. The input
809  * tuple has been stored in tmpcontext->ecxt_outertuple, so that it is
810  * accessible to ExecEvalExpr.
811  *
812  * We have two sets of transition states to handle: one for sorted aggregation
813  * and one for hashed; we do them both here, to avoid multiple evaluation of
814  * the inputs.
815  *
816  * When called, CurrentMemoryContext should be the per-query context.
817  */
818 static void
819 advance_aggregates(AggState *aggstate)
820 {
821  bool dummynull;
822 
823  ExecEvalExprSwitchContext(aggstate->phase->evaltrans,
824  aggstate->tmpcontext,
825  &dummynull);
826 }
827 
828 /*
829  * Run the transition function for a DISTINCT or ORDER BY aggregate
830  * with only one input. This is called after we have completed
831  * entering all the input values into the sort object. We complete the
832  * sort, read out the values in sorted order, and run the transition
833  * function on each value (applying DISTINCT if appropriate).
834  *
835  * Note that the strictness of the transition function was checked when
836  * entering the values into the sort, so we don't check it again here;
837  * we just apply standard SQL DISTINCT logic.
838  *
839  * The one-input case is handled separately from the multi-input case
840  * for performance reasons: for single by-value inputs, such as the
841  * common case of count(distinct id), the tuplesort_getdatum code path
842  * is around 300% faster. (The speedup for by-reference types is less
843  * but still noticeable.)
844  *
845  * This function handles only one grouping set (already set in
846  * aggstate->current_set).
847  *
848  * When called, CurrentMemoryContext should be the per-query context.
849  */
850 static void
851 process_ordered_aggregate_single(AggState *aggstate,
852  AggStatePerTrans pertrans,
853  AggStatePerGroup pergroupstate)
854 {
855  Datum oldVal = (Datum) 0;
856  bool oldIsNull = true;
857  bool haveOldVal = false;
858  MemoryContext workcontext = aggstate->tmpcontext->ecxt_per_tuple_memory;
859  MemoryContext oldContext;
860  bool isDistinct = (pertrans->numDistinctCols > 0);
861  Datum newAbbrevVal = (Datum) 0;
862  Datum oldAbbrevVal = (Datum) 0;
863  FunctionCallInfo fcinfo = pertrans->transfn_fcinfo;
864  Datum *newVal;
865  bool *isNull;
866 
867  Assert(pertrans->numDistinctCols < 2);
868 
869  tuplesort_performsort(pertrans->sortstates[aggstate->current_set]);
870 
871  /* Load the column into argument 1 (arg 0 will be transition value) */
872  newVal = &fcinfo->args[1].value;
873  isNull = &fcinfo->args[1].isnull;
874 
875  /*
876  * Note: if input type is pass-by-ref, the datums returned by the sort are
877  * freshly palloc'd in the per-query context, so we must be careful to
878  * pfree them when they are no longer needed.
879  */
880 
881  while (tuplesort_getdatum(pertrans->sortstates[aggstate->current_set],
882  true, false, newVal, isNull, &newAbbrevVal))
883  {
884  /*
885  * Clear and select the working context for evaluation of the equality
886  * function and transition function.
887  */
888  MemoryContextReset(workcontext);
889  oldContext = MemoryContextSwitchTo(workcontext);
890 
891  /*
892  * If DISTINCT mode, and not distinct from prior, skip it.
893  */
894  if (isDistinct &&
895  haveOldVal &&
896  ((oldIsNull && *isNull) ||
897  (!oldIsNull && !*isNull &&
898  oldAbbrevVal == newAbbrevVal &&
899  DatumGetBool(FunctionCall2Coll(&pertrans->equalfnOne,
900  pertrans->aggCollation,
901  oldVal, *newVal)))))
902  {
903  MemoryContextSwitchTo(oldContext);
904  continue;
905  }
906  else
907  {
908  advance_transition_function(aggstate, pertrans, pergroupstate);
909 
910  MemoryContextSwitchTo(oldContext);
911 
912  /*
913  * Forget the old value, if any, and remember the new one for
914  * subsequent equality checks.
915  */
916  if (!pertrans->inputtypeByVal)
917  {
918  if (!oldIsNull)
919  pfree(DatumGetPointer(oldVal));
920  if (!*isNull)
921  oldVal = datumCopy(*newVal, pertrans->inputtypeByVal,
922  pertrans->inputtypeLen);
923  }
924  else
925  oldVal = *newVal;
926  oldAbbrevVal = newAbbrevVal;
927  oldIsNull = *isNull;
928  haveOldVal = true;
929  }
930  }
931 
932  if (!oldIsNull && !pertrans->inputtypeByVal)
933  pfree(DatumGetPointer(oldVal));
934 
935  tuplesort_end(pertrans->sortstates[aggstate->current_set]);
936  pertrans->sortstates[aggstate->current_set] = NULL;
937 }
938 
939 /*
940  * Run the transition function for a DISTINCT or ORDER BY aggregate
941  * with more than one input. This is called after we have completed
942  * entering all the input values into the sort object. We complete the
943  * sort, read out the values in sorted order, and run the transition
944  * function on each value (applying DISTINCT if appropriate).
945  *
946  * This function handles only one grouping set (already set in
947  * aggstate->current_set).
948  *
949  * When called, CurrentMemoryContext should be the per-query context.
950  */
951 static void
952 process_ordered_aggregate_multi(AggState *aggstate,
953  AggStatePerTrans pertrans,
954  AggStatePerGroup pergroupstate)
955 {
956  ExprContext *tmpcontext = aggstate->tmpcontext;
957  FunctionCallInfo fcinfo = pertrans->transfn_fcinfo;
958  TupleTableSlot *slot1 = pertrans->sortslot;
959  TupleTableSlot *slot2 = pertrans->uniqslot;
960  int numTransInputs = pertrans->numTransInputs;
961  int numDistinctCols = pertrans->numDistinctCols;
962  Datum newAbbrevVal = (Datum) 0;
963  Datum oldAbbrevVal = (Datum) 0;
964  bool haveOldValue = false;
965  TupleTableSlot *save = aggstate->tmpcontext->ecxt_outertuple;
966  int i;
967 
968  tuplesort_performsort(pertrans->sortstates[aggstate->current_set]);
969 
970  ExecClearTuple(slot1);
971  if (slot2)
972  ExecClearTuple(slot2);
973 
974  while (tuplesort_gettupleslot(pertrans->sortstates[aggstate->current_set],
975  true, true, slot1, &newAbbrevVal))
976  {
977  CHECK_FOR_INTERRUPTS();
978 
979  tmpcontext->ecxt_outertuple = slot1;
980  tmpcontext->ecxt_innertuple = slot2;
981 
982  if (numDistinctCols == 0 ||
983  !haveOldValue ||
984  newAbbrevVal != oldAbbrevVal ||
985  !ExecQual(pertrans->equalfnMulti, tmpcontext))
986  {
987  /*
988  * Extract the first numTransInputs columns as datums to pass to
989  * the transfn.
990  */
991  slot_getsomeattrs(slot1, numTransInputs);
992 
993  /* Load values into fcinfo */
994  /* Start from 1, since the 0th arg will be the transition value */
995  for (i = 0; i < numTransInputs; i++)
996  {
997  fcinfo->args[i + 1].value = slot1->tts_values[i];
998  fcinfo->args[i + 1].isnull = slot1->tts_isnull[i];
999  }
1000 
1001  advance_transition_function(aggstate, pertrans, pergroupstate);
1002 
1003  if (numDistinctCols > 0)
1004  {
1005  /* swap the slot pointers to retain the current tuple */
1006  TupleTableSlot *tmpslot = slot2;
1007 
1008  slot2 = slot1;
1009  slot1 = tmpslot;
1010  /* avoid ExecQual() calls by reusing abbreviated keys */
1011  oldAbbrevVal = newAbbrevVal;
1012  haveOldValue = true;
1013  }
1014  }
1015 
1016  /* Reset context each time */
1017  ResetExprContext(tmpcontext);
1018 
1019  ExecClearTuple(slot1);
1020  }
1021 
1022  if (slot2)
1023  ExecClearTuple(slot2);
1024 
1025  tuplesort_end(pertrans->sortstates[aggstate->current_set]);
1026  pertrans->sortstates[aggstate->current_set] = NULL;
1027 
1028  /* restore previous slot, potentially in use for grouping sets */
1029  tmpcontext->ecxt_outertuple = save;
1030 }
1031 
1032 /*
1033  * Compute the final value of one aggregate.
1034  *
1035  * This function handles only one grouping set (already set in
1036  * aggstate->current_set).
1037  *
1038  * The finalfn will be run, and the result delivered, in the
1039  * output-tuple context; caller's CurrentMemoryContext does not matter.
1040  * (But note that in some cases, such as when there is no finalfn, the
1041  * result might be a pointer to or into the agg's transition value.)
1042  *
1043  * The finalfn uses the state as set in the transno. This also might be
1044  * being used by another aggregate function, so it's important that we do
1045  * nothing destructive here.
1046  */
1047 static void
1048 finalize_aggregate(AggState *aggstate,
1049  AggStatePerAgg peragg,
1050  AggStatePerGroup pergroupstate,
1051  Datum *resultVal, bool *resultIsNull)
1052 {
1053  LOCAL_FCINFO(fcinfo, FUNC_MAX_ARGS);
1054  bool anynull = false;
1055  MemoryContext oldContext;
1056  int i;
1057  ListCell *lc;
1058  AggStatePerTrans pertrans = &aggstate->pertrans[peragg->transno];
1059 
1060  oldContext = MemoryContextSwitchTo(aggstate->ss.ps.ps_ExprContext->ecxt_per_tuple_memory);
1061 
1062  /*
1063  * Evaluate any direct arguments. We do this even if there's no finalfn
1064  * (which is unlikely anyway), so that side-effects happen as expected.
1065  * The direct arguments go into arg positions 1 and up, leaving position 0
1066  * for the transition state value.
1067  */
1068  i = 1;
1069  foreach(lc, peragg->aggdirectargs)
1070  {
1071  ExprState *expr = (ExprState *) lfirst(lc);
1072 
1073  fcinfo->args[i].value = ExecEvalExpr(expr,
1074  aggstate->ss.ps.ps_ExprContext,
1075  &fcinfo->args[i].isnull);
1076  anynull |= fcinfo->args[i].isnull;
1077  i++;
1078  }
1079 
1080  /*
1081  * Apply the agg's finalfn if one is provided, else return transValue.
1082  */
1083  if (OidIsValid(peragg->finalfn_oid))
1084  {
1085  int numFinalArgs = peragg->numFinalArgs;
1086 
1087  /* set up aggstate->curperagg for AggGetAggref() */
1088  aggstate->curperagg = peragg;
1089 
1090  InitFunctionCallInfoData(*fcinfo, &peragg->finalfn,
1091  numFinalArgs,
1092  pertrans->aggCollation,
1093  (void *) aggstate, NULL);
1094 
1095  /* Fill in the transition state value */
1096  fcinfo->args[0].value =
1097  MakeExpandedObjectReadOnly(pergroupstate->transValue,
1098  pergroupstate->transValueIsNull,
1099  pertrans->transtypeLen);
1100  fcinfo->args[0].isnull = pergroupstate->transValueIsNull;
1101  anynull |= pergroupstate->transValueIsNull;
1102 
1103  /* Fill any remaining argument positions with nulls */
1104  for (; i < numFinalArgs; i++)
1105  {
1106  fcinfo->args[i].value = (Datum) 0;
1107  fcinfo->args[i].isnull = true;
1108  anynull = true;
1109  }
1110 
1111  if (fcinfo->flinfo->fn_strict && anynull)
1112  {
1113  /* don't call a strict function with NULL inputs */
1114  *resultVal = (Datum) 0;
1115  *resultIsNull = true;
1116  }
1117  else
1118  {
1119  *resultVal = FunctionCallInvoke(fcinfo);
1120  *resultIsNull = fcinfo->isnull;
1121  }
1122  aggstate->curperagg = NULL;
1123  }
1124  else
1125  {
1126  *resultVal =
1127  MakeExpandedObjectReadOnly(pergroupstate->transValue,
1128  pergroupstate->transValueIsNull,
1129  pertrans->transtypeLen);
1130  *resultIsNull = pergroupstate->transValueIsNull;
1131  }
1132 
1133  MemoryContextSwitchTo(oldContext);
1134 }
1135 
1136 /*
1137  * Compute the output value of one partial aggregate.
1138  *
1139  * The serialization function will be run, and the result delivered, in the
1140  * output-tuple context; caller's CurrentMemoryContext does not matter.
1141  */
1142 static void
1143 finalize_partialaggregate(AggState *aggstate,
1144  AggStatePerAgg peragg,
1145  AggStatePerGroup pergroupstate,
1146  Datum *resultVal, bool *resultIsNull)
1147 {
1148  AggStatePerTrans pertrans = &aggstate->pertrans[peragg->transno];
1149  MemoryContext oldContext;
1150 
1151  oldContext = MemoryContextSwitchTo(aggstate->ss.ps.ps_ExprContext->ecxt_per_tuple_memory);
1152 
1153  /*
1154  * serialfn_oid will be set if we must serialize the transvalue before
1155  * returning it
1156  */
1157  if (OidIsValid(pertrans->serialfn_oid))
1158  {
1159  /* Don't call a strict serialization function with NULL input. */
1160  if (pertrans->serialfn.fn_strict && pergroupstate->transValueIsNull)
1161  {
1162  *resultVal = (Datum) 0;
1163  *resultIsNull = true;
1164  }
1165  else
1166  {
1167  FunctionCallInfo fcinfo = pertrans->serialfn_fcinfo;
1168 
1169  fcinfo->args[0].value =
1170  MakeExpandedObjectReadOnly(pergroupstate->transValue,
1171  pergroupstate->transValueIsNull,
1172  pertrans->transtypeLen);
1173  fcinfo->args[0].isnull = pergroupstate->transValueIsNull;
1174  fcinfo->isnull = false;
1175 
1176  *resultVal = FunctionCallInvoke(fcinfo);
1177  *resultIsNull = fcinfo->isnull;
1178  }
1179  }
1180  else
1181  {
1182  *resultVal =
1183  MakeExpandedObjectReadOnly(pergroupstate->transValue,
1184  pergroupstate->transValueIsNull,
1185  pertrans->transtypeLen);
1186  *resultIsNull = pergroupstate->transValueIsNull;
1187  }
1188 
1189  MemoryContextSwitchTo(oldContext);
1190 }
1191 
1192 /*
1193  * Extract the attributes that make up the grouping key into the
1194  * hashslot. This is necessary to compute the hash or perform a lookup.
1195  */
1196 static inline void
1197 prepare_hash_slot(AggStatePerHash perhash,
1198  TupleTableSlot *inputslot,
1199  TupleTableSlot *hashslot)
1200 {
1201  int i;
1202 
1203  /* transfer just the needed columns into hashslot */
1204  slot_getsomeattrs(inputslot, perhash->largestGrpColIdx);
1205  ExecClearTuple(hashslot);
1206 
1207  for (i = 0; i < perhash->numhashGrpCols; i++)
1208  {
1209  int varNumber = perhash->hashGrpColIdxInput[i] - 1;
1210 
1211  hashslot->tts_values[i] = inputslot->tts_values[varNumber];
1212  hashslot->tts_isnull[i] = inputslot->tts_isnull[varNumber];
1213  }
1214  ExecStoreVirtualTuple(hashslot);
1215 }
1216 
1217 /*
1218  * Prepare to finalize and project based on the specified representative tuple
1219  * slot and grouping set.
1220  *
1221  * In the specified tuple slot, force to null all attributes that should be
1222  * read as null in the context of the current grouping set. Also stash the
1223  * current group bitmap where GroupingExpr can get at it.
1224  *
1225  * This relies on three conditions:
1226  *
1227  * 1) Nothing is ever going to try and extract the whole tuple from this slot,
1228  * only reference it in evaluations, which will only access individual
1229  * attributes.
1230  *
1231  * 2) No system columns are going to need to be nulled. (If a system column is
1232  * referenced in a group clause, it is actually projected in the outer plan
1233  * tlist.)
1234  *
1235  * 3) Within a given phase, we never need to recover the value of an attribute
1236  * once it has been set to null.
1237  *
1238  * Poking into the slot this way is a bit ugly, but the consensus is that the
1239  * alternative was worse.
1240  */
1241 static void
1242 prepare_projection_slot(AggState *aggstate, TupleTableSlot *slot, int currentSet)
1243 {
1244  if (aggstate->phase->grouped_cols)
1245  {
1246  Bitmapset *grouped_cols = aggstate->phase->grouped_cols[currentSet];
1247 
1248  aggstate->grouped_cols = grouped_cols;
1249 
1250  if (TTS_EMPTY(slot))
1251  {
1252  /*
1253  * Force all values to be NULL if working on an empty input tuple
1254  * (i.e. an empty grouping set for which no input rows were
1255  * supplied).
1256  */
1257  ExecStoreAllNullTuple(slot);
1258  }
1259  else if (aggstate->all_grouped_cols)
1260  {
1261  ListCell *lc;
1262 
1263  /* all_grouped_cols is arranged in desc order */
1264  slot_getsomeattrs(slot, linitial_int(aggstate->all_grouped_cols));
1265 
1266  foreach(lc, aggstate->all_grouped_cols)
1267  {
1268  int attnum = lfirst_int(lc);
1269 
1270  if (!bms_is_member(attnum, grouped_cols))
1271  slot->tts_isnull[attnum - 1] = true;
1272  }
1273  }
1274  }
1275 }
1276 
1277 /*
1278  * Compute the final value of all aggregates for one group.
1279  *
1280  * This function handles only one grouping set at a time, which the caller must
1281  * have selected. It's also the caller's responsibility to adjust the supplied
1282  * pergroup parameter to point to the current set's transvalues.
1283  *
1284  * Results are stored in the output econtext aggvalues/aggnulls.
1285  */
1286 static void
1287 finalize_aggregates(AggState *aggstate,
1288  AggStatePerAgg peraggs,
1289  AggStatePerGroup pergroup)
1290 {
1291  ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
1292  Datum *aggvalues = econtext->ecxt_aggvalues;
1293  bool *aggnulls = econtext->ecxt_aggnulls;
1294  int aggno;
1295 
1296  /*
1297  * If there were any DISTINCT and/or ORDER BY aggregates, sort their
1298  * inputs and run the transition functions.
1299  */
1300  for (int transno = 0; transno < aggstate->numtrans; transno++)
1301  {
1302  AggStatePerTrans pertrans = &aggstate->pertrans[transno];
1303  AggStatePerGroup pergroupstate;
1304 
1305  pergroupstate = &pergroup[transno];
1306 
1307  if (pertrans->aggsortrequired)
1308  {
1309  Assert(aggstate->aggstrategy != AGG_HASHED &&
1310  aggstate->aggstrategy != AGG_MIXED);
1311 
1312  if (pertrans->numInputs == 1)
1313  process_ordered_aggregate_single(aggstate,
1314  pertrans,
1315  pergroupstate);
1316  else
1317  process_ordered_aggregate_multi(aggstate,
1318  pertrans,
1319  pergroupstate);
1320  }
1321  else if (pertrans->numDistinctCols > 0 && pertrans->haslast)
1322  {
1323  pertrans->haslast = false;
1324 
1325  if (pertrans->numDistinctCols == 1)
1326  {
1327  if (!pertrans->inputtypeByVal && !pertrans->lastisnull)
1328  pfree(DatumGetPointer(pertrans->lastdatum));
1329 
1330  pertrans->lastisnull = false;
1331  pertrans->lastdatum = (Datum) 0;
1332  }
1333  else
1334  ExecClearTuple(pertrans->uniqslot);
1335  }
1336  }
1337 
1338  /*
1339  * Run the final functions.
1340  */
1341  for (aggno = 0; aggno < aggstate->numaggs; aggno++)
1342  {
1343  AggStatePerAgg peragg = &peraggs[aggno];
1344  int transno = peragg->transno;
1345  AggStatePerGroup pergroupstate;
1346 
1347  pergroupstate = &pergroup[transno];
1348 
1349  if (DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit))
1350  finalize_partialaggregate(aggstate, peragg, pergroupstate,
1351  &aggvalues[aggno], &aggnulls[aggno]);
1352  else
1353  finalize_aggregate(aggstate, peragg, pergroupstate,
1354  &aggvalues[aggno], &aggnulls[aggno]);
1355  }
1356 }
1357 
1358 /*
1359  * Project the result of a group (whose aggs have already been calculated by
1360  * finalize_aggregates). Returns the result slot, or NULL if no row is
1361  * projected (suppressed by qual).
1362  */
1363 static TupleTableSlot *
1364 project_aggregates(AggState *aggstate)
1365 {
1366  ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
1367 
1368  /*
1369  * Check the qual (HAVING clause); if the group does not match, ignore it.
1370  */
1371  if (ExecQual(aggstate->ss.ps.qual, econtext))
1372  {
1373  /*
1374  * Form and return projection tuple using the aggregate results and
1375  * the representative input tuple.
1376  */
1377  return ExecProject(aggstate->ss.ps.ps_ProjInfo);
1378  }
1379  else
1380  InstrCountFiltered1(aggstate, 1);
1381 
1382  return NULL;
1383 }
1384 
1385 /*
1386  * Find input-tuple columns that are needed, dividing them into
1387  * aggregated and unaggregated sets.
1388  */
1389 static void
1390 find_cols(AggState *aggstate, Bitmapset **aggregated, Bitmapset **unaggregated)
1391 {
1392  Agg *agg = (Agg *) aggstate->ss.ps.plan;
1393  FindColsContext context;
1394 
1395  context.is_aggref = false;
1396  context.aggregated = NULL;
1397  context.unaggregated = NULL;
1398 
1399  /* Examine tlist and quals */
1400  (void) find_cols_walker((Node *) agg->plan.targetlist, &context);
1401  (void) find_cols_walker((Node *) agg->plan.qual, &context);
1402 
1403  /* In some cases, grouping columns will not appear in the tlist */
1404  for (int i = 0; i < agg->numCols; i++)
1405  context.unaggregated = bms_add_member(context.unaggregated,
1406  agg->grpColIdx[i]);
1407 
1408  *aggregated = context.aggregated;
1409  *unaggregated = context.unaggregated;
1410 }
1411 
1412 static bool
1413 find_cols_walker(Node *node, FindColsContext *context)
1414 {
1415  if (node == NULL)
1416  return false;
1417  if (IsA(node, Var))
1418  {
1419  Var *var = (Var *) node;
1420 
1421  /* setrefs.c should have set the varno to OUTER_VAR */
1422  Assert(var->varno == OUTER_VAR);
1423  Assert(var->varlevelsup == 0);
1424  if (context->is_aggref)
1425  context->aggregated = bms_add_member(context->aggregated,
1426  var->varattno);
1427  else
1428  context->unaggregated = bms_add_member(context->unaggregated,
1429  var->varattno);
1430  return false;
1431  }
1432  if (IsA(node, Aggref))
1433  {
1434  Assert(!context->is_aggref);
1435  context->is_aggref = true;
1436  expression_tree_walker(node, find_cols_walker, (void *) context);
1437  context->is_aggref = false;
1438  return false;
1439  }
1440  return expression_tree_walker(node, find_cols_walker,
1441  (void *) context);
1442 }
1443 
1444 /*
1445  * (Re-)initialize the hash table(s) to empty.
1446  *
1447  * To implement hashed aggregation, we need a hashtable that stores a
1448  * representative tuple and an array of AggStatePerGroup structs for each
1449  * distinct set of GROUP BY column values. We compute the hash key from the
1450  * GROUP BY columns. The per-group data is allocated in lookup_hash_entry(),
1451  * for each entry.
1452  *
1453  * We have a separate hashtable and associated perhash data structure for each
1454  * grouping set for which we're doing hashing.
1455  *
1456  * The contents of the hash tables always live in the hashcontext's per-tuple
1457  * memory context (there is only one of these for all tables together, since
1458  * they are all reset at the same time).
1459  */
1460 static void
1461 build_hash_tables(AggState *aggstate)
1462 {
1463  int setno;
1464 
1465  for (setno = 0; setno < aggstate->num_hashes; ++setno)
1466  {
1467  AggStatePerHash perhash = &aggstate->perhash[setno];
1468  long nbuckets;
1469  Size memory;
1470 
1471  if (perhash->hashtable != NULL)
1472  {
1473  ResetTupleHashTable(perhash->hashtable);
1474  continue;
1475  }
1476 
1477  Assert(perhash->aggnode->numGroups > 0);
1478 
1479  memory = aggstate->hash_mem_limit / aggstate->num_hashes;
1480 
1481  /* choose reasonable number of buckets per hashtable */
1482  nbuckets = hash_choose_num_buckets(aggstate->hashentrysize,
1483  perhash->aggnode->numGroups,
1484  memory);
1485 
1486  build_hash_table(aggstate, setno, nbuckets);
1487  }
1488 
1489  aggstate->hash_ngroups_current = 0;
1490 }
1491 
1492 /*
1493  * Build a single hashtable for this grouping set.
1494  */
1495 static void
1496 build_hash_table(AggState *aggstate, int setno, long nbuckets)
1497 {
1498  AggStatePerHash perhash = &aggstate->perhash[setno];
1499  MemoryContext metacxt = aggstate->hash_metacxt;
1500  MemoryContext hashcxt = aggstate->hashcontext->ecxt_per_tuple_memory;
1501  MemoryContext tmpcxt = aggstate->tmpcontext->ecxt_per_tuple_memory;
1502  Size additionalsize;
1503 
1504  Assert(aggstate->aggstrategy == AGG_HASHED ||
1505  aggstate->aggstrategy == AGG_MIXED);
1506 
1507  /*
1508  * Used to make sure initial hash table allocation does not exceed
1509  * hash_mem. Note that the estimate does not include space for
1510  * pass-by-reference transition data values, nor for the representative
1511  * tuple of each group.
1512  */
1513  additionalsize = aggstate->numtrans * sizeof(AggStatePerGroupData);
1514 
1515  perhash->hashtable = BuildTupleHashTableExt(&aggstate->ss.ps,
1516  perhash->hashslot->tts_tupleDescriptor,
1517  perhash->numCols,
1518  perhash->hashGrpColIdxHash,
1519  perhash->eqfuncoids,
1520  perhash->hashfunctions,
1521  perhash->aggnode->grpCollations,
1522  nbuckets,
1523  additionalsize,
1524  metacxt,
1525  hashcxt,
1526  tmpcxt,
1527  DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit));
1528 }
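
/*
 * [Editorial note, not part of nodeAgg.c.] additionalsize asks the hashtable
 * to budget room for the per-group transition-state array next to each
 * entry: with numtrans = 3, that is 3 * sizeof(AggStatePerGroupData) extra
 * bytes per group, matching the allocation initialize_hash_entry() makes for
 * entry->additional once a group is actually created.
 */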
1529 
1530 /*
1531  * Compute columns that actually need to be stored in hashtable entries. The
1532  * incoming tuples from the child plan node will contain grouping columns,
1533  * other columns referenced in our targetlist and qual, columns used to
1534  * compute the aggregate functions, and perhaps just junk columns we don't use
1535  * at all. Only columns of the first two types need to be stored in the
1536  * hashtable, and getting rid of the others can make the table entries
1537  * significantly smaller. The hashtable only contains the relevant columns,
1538  * and is packed/unpacked in lookup_hash_entries() / agg_retrieve_hash_table()
1539  * into the format of the normal input descriptor.
1540  *
1541  * Additional columns, beyond the ones grouped by, come from two sources:
1542  * first, functionally dependent columns that we don't need to group by
1543  * themselves, and second, ctids for row-marks.
1544  *
1545  * To eliminate duplicates, we build a bitmapset of the needed columns, and
1546  * then build an array of the columns included in the hashtable. We might
1547  * still have duplicates if the passed-in grpColIdx has them, which can happen
1548  * in edge cases from semijoins/distinct; these can't always be removed,
1549  * because it's not certain that the duplicate cols will be using the same
1550  * hash function.
1551  *
1552  * Note that the array is preserved over ExecReScanAgg, so we allocate it in
1553  * the per-query context (unlike the hash table itself).
1554  */
1555 static void
1556 find_hash_columns(AggState *aggstate)
1557 {
1558  Bitmapset *base_colnos;
1559  Bitmapset *aggregated_colnos;
1560  TupleDesc scanDesc = aggstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
1561  List *outerTlist = outerPlanState(aggstate)->plan->targetlist;
1562  int numHashes = aggstate->num_hashes;
1563  EState *estate = aggstate->ss.ps.state;
1564  int j;
1565 
1566  /* Find Vars that will be needed in tlist and qual */
1567  find_cols(aggstate, &aggregated_colnos, &base_colnos);
1568  aggstate->colnos_needed = bms_union(base_colnos, aggregated_colnos);
1569  aggstate->max_colno_needed = 0;
1570  aggstate->all_cols_needed = true;
1571 
1572  for (int i = 0; i < scanDesc->natts; i++)
1573  {
1574  int colno = i + 1;
1575 
1576  if (bms_is_member(colno, aggstate->colnos_needed))
1577  aggstate->max_colno_needed = colno;
1578  else
1579  aggstate->all_cols_needed = false;
1580  }
1581 
1582  for (j = 0; j < numHashes; ++j)
1583  {
1584  AggStatePerHash perhash = &aggstate->perhash[j];
1585  Bitmapset *colnos = bms_copy(base_colnos);
1586  AttrNumber *grpColIdx = perhash->aggnode->grpColIdx;
1587  List *hashTlist = NIL;
1588  TupleDesc hashDesc;
1589  int maxCols;
1590  int i;
1591 
1592  perhash->largestGrpColIdx = 0;
1593 
1594  /*
1595  * If we're doing grouping sets, then some Vars might be referenced in
1596  * tlist/qual for the benefit of other grouping sets, but not needed
1597  * when hashing; i.e. prepare_projection_slot will null them out, so
1598  * there'd be no point storing them. Use prepare_projection_slot's
1599  * logic to determine which.
1600  */
1601  if (aggstate->phases[0].grouped_cols)
1602  {
1603  Bitmapset *grouped_cols = aggstate->phases[0].grouped_cols[j];
1604  ListCell *lc;
1605 
1606  foreach(lc, aggstate->all_grouped_cols)
1607  {
1608  int attnum = lfirst_int(lc);
1609 
1610  if (!bms_is_member(attnum, grouped_cols))
1611  colnos = bms_del_member(colnos, attnum);
1612  }
1613  }
1614 
1615  /*
1616  * Compute maximum number of input columns accounting for possible
1617  * duplications in the grpColIdx array, which can happen in some edge
1618  * cases where HashAggregate was generated as part of a semijoin or a
1619  * DISTINCT.
1620  */
1621  maxCols = bms_num_members(colnos) + perhash->numCols;
1622 
1623  perhash->hashGrpColIdxInput =
1624  palloc(maxCols * sizeof(AttrNumber));
1625  perhash->hashGrpColIdxHash =
1626  palloc(perhash->numCols * sizeof(AttrNumber));
1627 
1628  /* Add all the grouping columns to colnos */
1629  for (i = 0; i < perhash->numCols; i++)
1630  colnos = bms_add_member(colnos, grpColIdx[i]);
1631 
1632  /*
1633  * First build mapping for columns directly hashed. These are the
1634  * first, because they'll be accessed when computing hash values and
1635  * comparing tuples for exact matches. We also build simple mapping
1636  * for execGrouping, so it knows where to find the to-be-hashed /
1637  * compared columns in the input.
1638  */
1639  for (i = 0; i < perhash->numCols; i++)
1640  {
1641  perhash->hashGrpColIdxInput[i] = grpColIdx[i];
1642  perhash->hashGrpColIdxHash[i] = i + 1;
1643  perhash->numhashGrpCols++;
1644  /* delete already mapped columns */
1645  colnos = bms_del_member(colnos, grpColIdx[i]);
1646  }
1647 
1648  /* and add the remaining columns */
1649  while ((i = bms_first_member(colnos)) >= 0)
1650  {
1651  perhash->hashGrpColIdxInput[perhash->numhashGrpCols] = i;
1652  perhash->numhashGrpCols++;
1653  }
1654 
1655  /* and build a tuple descriptor for the hashtable */
1656  for (i = 0; i < perhash->numhashGrpCols; i++)
1657  {
1658  int varNumber = perhash->hashGrpColIdxInput[i] - 1;
1659 
1660  hashTlist = lappend(hashTlist, list_nth(outerTlist, varNumber));
1661  perhash->largestGrpColIdx =
1662  Max(varNumber + 1, perhash->largestGrpColIdx);
1663  }
1664 
1665  hashDesc = ExecTypeFromTL(hashTlist);
1666 
1667  execTuplesHashPrepare(perhash->numCols,
1668  perhash->aggnode->grpOperators,
1669  &perhash->eqfuncoids,
1670  &perhash->hashfunctions);
1671  perhash->hashslot =
1672  ExecAllocTableSlot(&estate->es_tupleTable, hashDesc,
1673  &TTSOpsMinimalTuple);
1674 
1675  list_free(hashTlist);
1676  bms_free(colnos);
1677  }
1678 
1679  bms_free(base_colnos);
1680 }
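
/*
 * [Editorial illustration, not part of nodeAgg.c.] Suppose the input has
 * four columns (a, b, c, d), this grouping set hashes on GROUP BY b, and the
 * targetlist additionally needs d. The mappings built above become:
 *
 *     hashGrpColIdxInput[] = {2, 4};    input attnums kept in the hashtable
 *     hashGrpColIdxHash[]  = {1};       the grouping column is attno 1 of
 *                                       the narrower hashtable descriptor
 *
 * so stored tuples carry only (b, d), and agg_retrieve_hash_table_in_memory()
 * scatters them back into positions 2 and 4 of the scan-format slot.
 */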
1681 
1682 /*
1683  * Estimate per-hash-table-entry overhead.
1684  */
1685 Size
1686 hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
1687 {
1688  Size tupleChunkSize;
1689  Size pergroupChunkSize;
1690  Size transitionChunkSize;
1691  Size tupleSize = (MAXALIGN(SizeofMinimalTupleHeader) +
1692  tupleWidth);
1693  Size pergroupSize = numTrans * sizeof(AggStatePerGroupData);
1694 
1695  tupleChunkSize = CHUNKHDRSZ + tupleSize;
1696 
1697  if (pergroupSize > 0)
1698  pergroupChunkSize = CHUNKHDRSZ + pergroupSize;
1699  else
1700  pergroupChunkSize = 0;
1701 
1702  if (transitionSpace > 0)
1703  transitionChunkSize = CHUNKHDRSZ + transitionSpace;
1704  else
1705  transitionChunkSize = 0;
1706 
1707  return
1708  sizeof(TupleHashEntryData) +
1709  tupleChunkSize +
1710  pergroupChunkSize +
1711  transitionChunkSize;
1712 }
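
/*
 * [Editorial illustration, not part of nodeAgg.c.] The estimate is simply
 * the sum of the per-entry allocations, each padded with a chunk header.
 * For example, with numTrans = 2, tupleWidth = 24 and transitionSpace = 0:
 *
 *     entry = sizeof(TupleHashEntryData)
 *           + CHUNKHDRSZ + MAXALIGN(SizeofMinimalTupleHeader) + 24
 *           + CHUNKHDRSZ + 2 * sizeof(AggStatePerGroupData);
 *           // no transition chunk, since transitionSpace == 0
 */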
1713 
1714 /*
1715  * hashagg_recompile_expressions()
1716  *
1717  * Identifies the right phase, compiles the right expression given the
1718  * arguments, and then sets phase->evaltrans to that expression.
1719  *
1720  * Different versions of the compiled expression are needed depending on
1721  * whether hash aggregation has spilled or not, and whether it's reading from
1722  * the outer plan or a tape. Before spilling to disk, the expression reads
1723  * from the outer plan and does not need to perform a NULL check. After
1724  * HashAgg begins to spill, new groups will not be created in the hash table,
1725  * and the AggStatePerGroup array may be NULL; therefore we need to add a null
1726  * pointer check to the expression. Then, when reading spilled data from a
1727  * tape, we change the outer slot type to be a fixed minimal tuple slot.
1728  *
1729  * It would be wasteful to recompile every time, so cache the compiled
1730  * expressions in the AggStatePerPhase, and reuse when appropriate.
1731  */
1732 static void
1733 hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
1734 {
1735  AggStatePerPhase phase;
1736  int i = minslot ? 1 : 0;
1737  int j = nullcheck ? 1 : 0;
1738 
1739  Assert(aggstate->aggstrategy == AGG_HASHED ||
1740  aggstate->aggstrategy == AGG_MIXED);
1741 
1742  if (aggstate->aggstrategy == AGG_HASHED)
1743  phase = &aggstate->phases[0];
1744  else /* AGG_MIXED */
1745  phase = &aggstate->phases[1];
1746 
1747  if (phase->evaltrans_cache[i][j] == NULL)
1748  {
1749  const TupleTableSlotOps *outerops = aggstate->ss.ps.outerops;
1750  bool outerfixed = aggstate->ss.ps.outeropsfixed;
1751  bool dohash = true;
1752  bool dosort = false;
1753 
1754  /*
1755  * If minslot is true, that means we are processing a spilled batch
1756  * (inside agg_refill_hash_table()), and we must not advance the
1757  * sorted grouping sets.
1758  */
1759  if (aggstate->aggstrategy == AGG_MIXED && !minslot)
1760  dosort = true;
1761 
1762  /* temporarily change the outerops while compiling the expression */
1763  if (minslot)
1764  {
1765  aggstate->ss.ps.outerops = &TTSOpsMinimalTuple;
1766  aggstate->ss.ps.outeropsfixed = true;
1767  }
1768 
1769  phase->evaltrans_cache[i][j] = ExecBuildAggTrans(aggstate, phase,
1770  dosort, dohash,
1771  nullcheck);
1772 
1773  /* change back */
1774  aggstate->ss.ps.outerops = outerops;
1775  aggstate->ss.ps.outeropsfixed = outerfixed;
1776  }
1777 
1778  phase->evaltrans = phase->evaltrans_cache[i][j];
1779 }
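
/*
 * [Editorial illustration, not part of nodeAgg.c.] The cache is a 2x2 matrix
 * indexed by (minslot, nullcheck), so at most four variants exist per phase:
 *
 *     evaltrans_cache[0][0]   reading the outer plan, before any spill
 *     evaltrans_cache[0][1]   reading the outer plan, after spilling starts
 *     evaltrans_cache[1][1]   refilling from a tape while in spill mode
 *
 * The fourth slot, [1][0], appears never to be requested by the callers in
 * this file, since reading from a tape always comes with the NULL check.
 */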
1780 
1781 /*
1782  * Set limits that trigger spilling to avoid exceeding hash_mem. Consider the
1783  * number of partitions we expect to create (if we do spill).
1784  *
1785  * There are two limits: a memory limit, and also an ngroups limit. The
1786  * ngroups limit becomes important when we expect transition values to grow
1787  * substantially larger than the initial value.
1788  */
1789 void
1790 hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
1791  Size *mem_limit, uint64 *ngroups_limit,
1792  int *num_partitions)
1793 {
1794  int npartitions;
1795  Size partition_mem;
1796  Size hash_mem_limit = get_hash_memory_limit();
1797 
1798  /* if not expected to spill, use all of hash_mem */
1799  if (input_groups * hashentrysize <= hash_mem_limit)
1800  {
1801  if (num_partitions != NULL)
1802  *num_partitions = 0;
1803  *mem_limit = hash_mem_limit;
1804  *ngroups_limit = hash_mem_limit / hashentrysize;
1805  return;
1806  }
1807 
1808  /*
1809  * Calculate expected memory requirements for spilling, which is the size
1810  * of the buffers needed for all the tapes that need to be open at once.
1811  * Then, subtract that from the memory available for holding hash tables.
1812  */
1813  npartitions = hash_choose_num_partitions(input_groups,
1814  hashentrysize,
1815  used_bits,
1816  NULL);
1817  if (num_partitions != NULL)
1818  *num_partitions = npartitions;
1819 
1820  partition_mem =
1821  HASHAGG_READ_BUFFER_SIZE +
1822  HASHAGG_WRITE_BUFFER_SIZE * npartitions;
1823 
1824  /*
1825  * Don't set the limit below 3/4 of hash_mem. In that case, we are at the
1826  * minimum number of partitions, so we aren't going to dramatically exceed
1827  * work mem anyway.
1828  */
1829  if (hash_mem_limit > 4 * partition_mem)
1830  *mem_limit = hash_mem_limit - partition_mem;
1831  else
1832  *mem_limit = hash_mem_limit * 0.75;
1833 
1834  if (*mem_limit > hashentrysize)
1835  *ngroups_limit = *mem_limit / hashentrysize;
1836  else
1837  *ngroups_limit = 1;
1838 }
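
/*
 * [Editorial illustration, not part of nodeAgg.c.] A worked example under
 * assumed numbers: hash_mem_limit = 4MB, hashentrysize = 64, and
 * input_groups = 1 million. The projected table size (~64MB) exceeds the
 * limit, so the spill path is taken. Supposing the chosen partition count
 * works out to partition_mem = 512kB of tape buffers:
 *
 *     4MB > 4 * 512kB, so *mem_limit = 4MB - 512kB = 3.5MB
 *     *ngroups_limit = 3.5MB / 64 = 57344 groups
 */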
1839 
1840 /*
1841  * hash_agg_check_limits
1842  *
1843  * After adding a new group to the hash table, check whether we need to enter
1844  * spill mode. Allocations may happen without adding new groups (for instance,
1845  * if the transition state size grows), so this check is imperfect.
1846  */
1847 static void
1848 hash_agg_check_limits(AggState *aggstate)
1849 {
1850  uint64 ngroups = aggstate->hash_ngroups_current;
1851  Size meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt,
1852  true);
1853  Size hashkey_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory,
1854  true);
1855 
1856  /*
1857  * Don't spill unless there's at least one group in the hash table so we
1858  * can be sure to make progress even in edge cases.
1859  */
1860  if (aggstate->hash_ngroups_current > 0 &&
1861  (meta_mem + hashkey_mem > aggstate->hash_mem_limit ||
1862  ngroups > aggstate->hash_ngroups_limit))
1863  {
1864  hash_agg_enter_spill_mode(aggstate);
1865  }
1866 }
1867 
1868 /*
1869  * Enter "spill mode", meaning that no new groups are added to any of the hash
1870  * tables. Tuples that would create a new group are instead spilled, and
1871  * processed later.
1872  */
1873 static void
1874 hash_agg_enter_spill_mode(AggState *aggstate)
1875 {
1876  aggstate->hash_spill_mode = true;
1877  hashagg_recompile_expressions(aggstate, aggstate->table_filled, true);
1878 
1879  if (!aggstate->hash_ever_spilled)
1880  {
1881  Assert(aggstate->hash_tapeset == NULL);
1882  Assert(aggstate->hash_spills == NULL);
1883 
1884  aggstate->hash_ever_spilled = true;
1885 
1886  aggstate->hash_tapeset = LogicalTapeSetCreate(true, NULL, -1);
1887 
1888  aggstate->hash_spills = palloc(sizeof(HashAggSpill) * aggstate->num_hashes);
1889 
1890  for (int setno = 0; setno < aggstate->num_hashes; setno++)
1891  {
1892  AggStatePerHash perhash = &aggstate->perhash[setno];
1893  HashAggSpill *spill = &aggstate->hash_spills[setno];
1894 
1895  hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
1896  perhash->aggnode->numGroups,
1897  aggstate->hashentrysize);
1898  }
1899  }
1900 }
1901 
1902 /*
1903  * Update metrics after filling the hash table.
1904  *
1905  * If reading from the outer plan, from_tape should be false; if reading from
1906  * another tape, from_tape should be true.
1907  */
1908 static void
1909 hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
1910 {
1911  Size meta_mem;
1912  Size hashkey_mem;
1913  Size buffer_mem;
1914  Size total_mem;
1915 
1916  if (aggstate->aggstrategy != AGG_MIXED &&
1917  aggstate->aggstrategy != AGG_HASHED)
1918  return;
1919 
1920  /* memory for the hash table itself */
1921  meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);
1922 
1923  /* memory for the group keys and transition states */
1924  hashkey_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory, true);
1925 
1926  /* memory for read/write tape buffers, if spilled */
1927  buffer_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;
1928  if (from_tape)
1929  buffer_mem += HASHAGG_READ_BUFFER_SIZE;
1930 
1931  /* update peak mem */
1932  total_mem = meta_mem + hashkey_mem + buffer_mem;
1933  if (total_mem > aggstate->hash_mem_peak)
1934  aggstate->hash_mem_peak = total_mem;
1935 
1936  /* update disk usage */
1937  if (aggstate->hash_tapeset != NULL)
1938  {
1939  uint64 disk_used = LogicalTapeSetBlocks(aggstate->hash_tapeset) * (BLCKSZ / 1024);
1940 
1941  if (aggstate->hash_disk_used < disk_used)
1942  aggstate->hash_disk_used = disk_used;
1943  }
1944 
1945  /* update hashentrysize estimate based on contents */
1946  if (aggstate->hash_ngroups_current > 0)
1947  {
1948  aggstate->hashentrysize =
1949  sizeof(TupleHashEntryData) +
1950  (hashkey_mem / (double) aggstate->hash_ngroups_current);
1951  }
1952 }
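
/*
 * [Editorial note, not part of nodeAgg.c.] The re-estimate above just
 * averages observed memory over observed groups: if hashkey_mem is 1MB
 * after 16384 groups, hashentrysize becomes sizeof(TupleHashEntryData) + 64,
 * which then feeds the next hash_agg_set_limits() and
 * hash_choose_num_partitions() calls for spilled batches.
 */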
1953 
1954 /*
1955  * Choose a reasonable number of buckets for the initial hash table size.
1956  */
1957 static long
1958 hash_choose_num_buckets(double hashentrysize, long ngroups, Size memory)
1959 {
1960  long max_nbuckets;
1961  long nbuckets = ngroups;
1962 
1963  max_nbuckets = memory / hashentrysize;
1964 
1965  /*
1966  * Underestimating is better than overestimating. Too many buckets crowd
1967  * out space for group keys and transition state values.
1968  */
1969  max_nbuckets >>= 1;
1970 
1971  if (nbuckets > max_nbuckets)
1972  nbuckets = max_nbuckets;
1973 
1974  return Max(nbuckets, 1);
1975 }
1976 
1977 /*
1978  * Determine the number of partitions to create when spilling, which will
1979  * always be a power of two. If log2_npartitions is non-NULL, set
1980  * *log2_npartitions to the log2() of the number of partitions.
1981  */
1982 static int
1983 hash_choose_num_partitions(double input_groups, double hashentrysize,
1984  int used_bits, int *log2_npartitions)
1985 {
1986  Size hash_mem_limit = get_hash_memory_limit();
1987  double partition_limit;
1988  double mem_wanted;
1989  double dpartitions;
1990  int npartitions;
1991  int partition_bits;
1992 
1993  /*
1994  * Avoid creating so many partitions that the memory requirements of the
1995  * open partition files are greater than 1/4 of hash_mem.
1996  */
1997  partition_limit =
1998  (hash_mem_limit * 0.25 - HASHAGG_READ_BUFFER_SIZE) /
1999  HASHAGG_WRITE_BUFFER_SIZE;
2000 
2001  mem_wanted = HASHAGG_PARTITION_FACTOR * input_groups * hashentrysize;
2002 
2003  /* make enough partitions so that each one is likely to fit in memory */
2004  dpartitions = 1 + (mem_wanted / hash_mem_limit);
2005 
2006  if (dpartitions > partition_limit)
2007  dpartitions = partition_limit;
2008 
2009  if (dpartitions < HASHAGG_MIN_PARTITIONS)
2010  dpartitions = HASHAGG_MIN_PARTITIONS;
2011  if (dpartitions > HASHAGG_MAX_PARTITIONS)
2012  dpartitions = HASHAGG_MAX_PARTITIONS;
2013 
2014  /* HASHAGG_MAX_PARTITIONS limit makes this safe */
2015  npartitions = (int) dpartitions;
2016 
2017  /* ceil(log2(npartitions)) */
2018  partition_bits = my_log2(npartitions);
2019 
2020  /* make sure that we don't exhaust the hash bits */
2021  if (partition_bits + used_bits >= 32)
2022  partition_bits = 32 - used_bits;
2023 
2024  if (log2_npartitions != NULL)
2025  *log2_npartitions = partition_bits;
2026 
2027  /* number of partitions will be a power of two */
2028  npartitions = 1 << partition_bits;
2029 
2030  return npartitions;
2031 }
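
/*
 * [Editorial illustration, not part of nodeAgg.c.] Continuing the example of
 * 1 million input groups, hashentrysize = 64 and hash_mem_limit = 4MB, and
 * assuming a partition factor of 1.5:
 *
 *     mem_wanted     = 1.5 * 1000000 * 64 = 96MB
 *     dpartitions    = 1 + 96MB / 4MB ~= 24
 *     partition_bits = my_log2(24) = 5          // ceil(log2)
 *     npartitions    = 1 << 5 = 32
 *
 * The factor and the MIN/MAX clamps are compile-time constants defined
 * earlier in this file.
 */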
2032 
2033 /*
2034  * Initialize a freshly-created TupleHashEntry.
2035  */
2036 static void
2037 initialize_hash_entry(AggState *aggstate, TupleHashTable hashtable,
2038  TupleHashEntry entry)
2039 {
2040  AggStatePerGroup pergroup;
2041  int transno;
2042 
2043  aggstate->hash_ngroups_current++;
2044  hash_agg_check_limits(aggstate);
2045 
2046  /* no need to allocate or initialize per-group state */
2047  if (aggstate->numtrans == 0)
2048  return;
2049 
2050  pergroup = (AggStatePerGroup)
2051  MemoryContextAlloc(hashtable->tablecxt,
2052  sizeof(AggStatePerGroupData) * aggstate->numtrans);
2053 
2054  entry->additional = pergroup;
2055 
2056  /*
2057  * Initialize aggregates for new tuple group, lookup_hash_entries()
2058  * already has selected the relevant grouping set.
2059  */
2060  for (transno = 0; transno < aggstate->numtrans; transno++)
2061  {
2062  AggStatePerTrans pertrans = &aggstate->pertrans[transno];
2063  AggStatePerGroup pergroupstate = &pergroup[transno];
2064 
2065  initialize_aggregate(aggstate, pertrans, pergroupstate);
2066  }
2067 }
2068 
2069 /*
2070  * Look up hash entries for the current tuple in all hashed grouping sets.
2071  *
2072  * Be aware that lookup_hash_entry can reset the tmpcontext.
2073  *
2074  * Some entries may be left NULL if we are in "spill mode". The same tuple
2075  * will belong to different groups for each grouping set, so may match a group
2076  * already in memory for one set and match a group not in memory for another
2077  * set. When in "spill mode", the tuple will be spilled for each grouping set
2078  * where it doesn't match a group in memory.
2079  *
2080  * NB: It's possible to spill the same tuple for several different grouping
2081  * sets. This may seem wasteful, but it's actually a trade-off: if we spill
2082  * the tuple multiple times for multiple grouping sets, it can be partitioned
2083  * for each grouping set, making the refilling of the hash table very
2084  * efficient.
2085  */
2086 static void
2087 lookup_hash_entries(AggState *aggstate)
2088 {
2089  AggStatePerGroup *pergroup = aggstate->hash_pergroup;
2090  TupleTableSlot *outerslot = aggstate->tmpcontext->ecxt_outertuple;
2091  int setno;
2092 
2093  for (setno = 0; setno < aggstate->num_hashes; setno++)
2094  {
2095  AggStatePerHash perhash = &aggstate->perhash[setno];
2096  TupleHashTable hashtable = perhash->hashtable;
2097  TupleTableSlot *hashslot = perhash->hashslot;
2098  TupleHashEntry entry;
2099  uint32 hash;
2100  bool isnew = false;
2101  bool *p_isnew;
2102 
2103  /* if hash table already spilled, don't create new entries */
2104  p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
2105 
2106  select_current_set(aggstate, setno, true);
2107  prepare_hash_slot(perhash,
2108  outerslot,
2109  hashslot);
2110 
2111  entry = LookupTupleHashEntry(hashtable, hashslot,
2112  p_isnew, &hash);
2113 
2114  if (entry != NULL)
2115  {
2116  if (isnew)
2117  initialize_hash_entry(aggstate, hashtable, entry);
2118  pergroup[setno] = entry->additional;
2119  }
2120  else
2121  {
2122  HashAggSpill *spill = &aggstate->hash_spills[setno];
2123  TupleTableSlot *slot = aggstate->tmpcontext->ecxt_outertuple;
2124 
2125  if (spill->partitions == NULL)
2126  hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
2127  perhash->aggnode->numGroups,
2128  aggstate->hashentrysize);
2129 
2130  hashagg_spill_tuple(aggstate, spill, slot, hash);
2131  pergroup[setno] = NULL;
2132  }
2133  }
2134 }
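
/*
 * [Editorial illustration, not part of nodeAgg.c.] For GROUPING SETS
 * ((a), (b)) in spill mode, a single input tuple can take both paths at
 * once: it may match an existing group for set 0 but none for set 1,
 * leaving
 *
 *     hash_pergroup[0] = <existing per-group state>   advanced normally
 *     hash_pergroup[1] = NULL                         tuple spilled
 *
 * The NULL-checking variant of the transition expression (see
 * hashagg_recompile_expressions) then skips set 1 for this tuple.
 */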
2135 
2136 /*
2137  * ExecAgg -
2138  *
2139  * ExecAgg receives tuples from its outer subplan and aggregates over
2140  * the appropriate attribute for each aggregate function use (Aggref
2141  * node) appearing in the targetlist or qual of the node. The number
2142  * of tuples to aggregate over depends on whether grouped or plain
2143  * aggregation is selected. In grouped aggregation, we produce a result
2144  * row for each group; in plain aggregation there's a single result row
2145  * for the whole query. In either case, the value of each aggregate is
2146  * stored in the expression context to be used when ExecProject evaluates
2147  * the result tuple.
2148  */
2149 static TupleTableSlot *
2150 ExecAgg(PlanState *pstate)
2151 {
2152  AggState *node = castNode(AggState, pstate);
2153  TupleTableSlot *result = NULL;
2154 
2155  CHECK_FOR_INTERRUPTS();
2156 
2157  if (!node->agg_done)
2158  {
2159  /* Dispatch based on strategy */
2160  switch (node->phase->aggstrategy)
2161  {
2162  case AGG_HASHED:
2163  if (!node->table_filled)
2164  agg_fill_hash_table(node);
2165  /* FALLTHROUGH */
2166  case AGG_MIXED:
2167  result = agg_retrieve_hash_table(node);
2168  break;
2169  case AGG_PLAIN:
2170  case AGG_SORTED:
2171  result = agg_retrieve_direct(node);
2172  break;
2173  }
2174 
2175  if (!TupIsNull(result))
2176  return result;
2177  }
2178 
2179  return NULL;
2180 }
2181 
2182 /*
2183  * ExecAgg for non-hashed case
2184  */
2185 static TupleTableSlot *
2186 agg_retrieve_direct(AggState *aggstate)
2187 {
2188  Agg *node = aggstate->phase->aggnode;
2189  ExprContext *econtext;
2190  ExprContext *tmpcontext;
2191  AggStatePerAgg peragg;
2192  AggStatePerGroup *pergroups;
2193  TupleTableSlot *outerslot;
2194  TupleTableSlot *firstSlot;
2195  TupleTableSlot *result;
2196  bool hasGroupingSets = aggstate->phase->numsets > 0;
2197  int numGroupingSets = Max(aggstate->phase->numsets, 1);
2198  int currentSet;
2199  int nextSetSize;
2200  int numReset;
2201  int i;
2202 
2203  /*
2204  * get state info from node
2205  *
2206  * econtext is the per-output-tuple expression context
2207  *
2208  * tmpcontext is the per-input-tuple expression context
2209  */
2210  econtext = aggstate->ss.ps.ps_ExprContext;
2211  tmpcontext = aggstate->tmpcontext;
2212 
2213  peragg = aggstate->peragg;
2214  pergroups = aggstate->pergroups;
2215  firstSlot = aggstate->ss.ss_ScanTupleSlot;
2216 
2217  /*
2218  * We loop retrieving groups until we find one matching
2219  * aggstate->ss.ps.qual
2220  *
2221  * For grouping sets, we have the invariant that aggstate->projected_set
2222  * is either -1 (initial call) or the index (starting from 0) in
2223  * gset_lengths for the group we just completed (either by projecting a
2224  * row or by discarding it in the qual).
2225  */
2226  while (!aggstate->agg_done)
2227  {
2228  /*
2229  * Clear the per-output-tuple context for each group, as well as
2230  * aggcontext (which contains any pass-by-ref transvalues of the old
2231  * group). Some aggregate functions store working state in child
2232  * contexts; those now get reset automatically without us needing to
2233  * do anything special.
2234  *
2235  * We use ReScanExprContext not just ResetExprContext because we want
2236  * any registered shutdown callbacks to be called. That allows
2237  * aggregate functions to ensure they've cleaned up any non-memory
2238  * resources.
2239  */
2240  ReScanExprContext(econtext);
2241 
2242  /*
2243  * Determine how many grouping sets need to be reset at this boundary.
2244  */
2245  if (aggstate->projected_set >= 0 &&
2246  aggstate->projected_set < numGroupingSets)
2247  numReset = aggstate->projected_set + 1;
2248  else
2249  numReset = numGroupingSets;
2250 
2251  /*
2252  * numReset can change on a phase boundary, but that's OK; we want to
2253  * reset the contexts used in _this_ phase, and later, after possibly
2254  * changing phase, initialize the right number of aggregates for the
2255  * _new_ phase.
2256  */
2257 
2258  for (i = 0; i < numReset; i++)
2259  {
2260  ReScanExprContext(aggstate->aggcontexts[i]);
2261  }
2262 
2263  /*
2264  * Check if input is complete and there are no more groups to project
2265  * in this phase; move to next phase or mark as done.
2266  */
2267  if (aggstate->input_done == true &&
2268  aggstate->projected_set >= (numGroupingSets - 1))
2269  {
2270  if (aggstate->current_phase < aggstate->numphases - 1)
2271  {
2272  initialize_phase(aggstate, aggstate->current_phase + 1);
2273  aggstate->input_done = false;
2274  aggstate->projected_set = -1;
2275  numGroupingSets = Max(aggstate->phase->numsets, 1);
2276  node = aggstate->phase->aggnode;
2277  numReset = numGroupingSets;
2278  }
2279  else if (aggstate->aggstrategy == AGG_MIXED)
2280  {
2281  /*
2282  * Mixed mode; we've output all the grouped stuff and have
2283  * full hashtables, so switch to outputting those.
2284  */
2285  initialize_phase(aggstate, 0);
2286  aggstate->table_filled = true;
2287  ResetTupleHashIterator(aggstate->perhash[0].hashtable,
2288  &aggstate->perhash[0].hashiter);
2289  select_current_set(aggstate, 0, true);
2290  return agg_retrieve_hash_table(aggstate);
2291  }
2292  else
2293  {
2294  aggstate->agg_done = true;
2295  break;
2296  }
2297  }
2298 
2299  /*
2300  * Get the number of columns in the next grouping set after the last
2301  * projected one (if any). This is the number of columns to compare to
2302  * see if we reached the boundary of that set too.
2303  */
2304  if (aggstate->projected_set >= 0 &&
2305  aggstate->projected_set < (numGroupingSets - 1))
2306  nextSetSize = aggstate->phase->gset_lengths[aggstate->projected_set + 1];
2307  else
2308  nextSetSize = 0;
2309 
2310  /*----------
2311  * If a subgroup for the current grouping set is present, project it.
2312  *
2313  * We have a new group if:
2314  * - we're out of input but haven't projected all grouping sets
2315  * (checked above)
2316  * OR
2317  * - we already projected a row that wasn't from the last grouping
2318  * set
2319  * AND
2320  * - the next grouping set has at least one grouping column (since
2321  * empty grouping sets project only once input is exhausted)
2322  * AND
2323  * - the previous and pending rows differ on the grouping columns
2324  * of the next grouping set
2325  *----------
2326  */
2327  tmpcontext->ecxt_innertuple = econtext->ecxt_outertuple;
2328  if (aggstate->input_done ||
2329  (node->aggstrategy != AGG_PLAIN &&
2330  aggstate->projected_set != -1 &&
2331  aggstate->projected_set < (numGroupingSets - 1) &&
2332  nextSetSize > 0 &&
2333  !ExecQualAndReset(aggstate->phase->eqfunctions[nextSetSize - 1],
2334  tmpcontext)))
2335  {
2336  aggstate->projected_set += 1;
2337 
2338  Assert(aggstate->projected_set < numGroupingSets);
2339  Assert(nextSetSize > 0 || aggstate->input_done);
2340  }
2341  else
2342  {
2343  /*
2344  * We no longer care what group we just projected, the next
2345  * projection will always be the first (or only) grouping set
2346  * (unless the input proves to be empty).
2347  */
2348  aggstate->projected_set = 0;
2349 
2350  /*
2351  * If we don't already have the first tuple of the new group,
2352  * fetch it from the outer plan.
2353  */
2354  if (aggstate->grp_firstTuple == NULL)
2355  {
2356  outerslot = fetch_input_tuple(aggstate);
2357  if (!TupIsNull(outerslot))
2358  {
2359  /*
2360  * Make a copy of the first input tuple; we will use this
2361  * for comparisons (in group mode) and for projection.
2362  */
2363  aggstate->grp_firstTuple = ExecCopySlotHeapTuple(outerslot);
2364  }
2365  else
2366  {
2367  /* outer plan produced no tuples at all */
2368  if (hasGroupingSets)
2369  {
2370  /*
2371  * If there was no input at all, we need to project
2372  * rows only if there are grouping sets of size 0.
2373  * Note that this implies that there can't be any
2374  * references to ungrouped Vars, which would otherwise
2375  * cause issues with the empty output slot.
2376  *
2377  * XXX: This is no longer true; we currently deal with
2378  * this in finalize_aggregates().
2379  */
2380  aggstate->input_done = true;
2381 
2382  while (aggstate->phase->gset_lengths[aggstate->projected_set] > 0)
2383  {
2384  aggstate->projected_set += 1;
2385  if (aggstate->projected_set >= numGroupingSets)
2386  {
2387  /*
2388  * We can't set agg_done here because we might
2389  * have more phases to do, even though the
2390  * input is empty. So we need to restart the
2391  * whole outer loop.
2392  */
2393  break;
2394  }
2395  }
2396 
2397  if (aggstate->projected_set >= numGroupingSets)
2398  continue;
2399  }
2400  else
2401  {
2402  aggstate->agg_done = true;
2403  /* If we are grouping, we should produce no tuples too */
2404  if (node->aggstrategy != AGG_PLAIN)
2405  return NULL;
2406  }
2407  }
2408  }
2409 
2410  /*
2411  * Initialize working state for a new input tuple group.
2412  */
2413  initialize_aggregates(aggstate, pergroups, numReset);
2414 
2415  if (aggstate->grp_firstTuple != NULL)
2416  {
2417  /*
2418  * Store the copied first input tuple in the tuple table slot
2419  * reserved for it. The tuple will be deleted when it is
2420  * cleared from the slot.
2421  */
2422  ExecForceStoreHeapTuple(aggstate->grp_firstTuple,
2423  firstSlot, true);
2424  aggstate->grp_firstTuple = NULL; /* don't keep two pointers */
2425 
2426  /* set up for first advance_aggregates call */
2427  tmpcontext->ecxt_outertuple = firstSlot;
2428 
2429  /*
2430  * Process each outer-plan tuple, and then fetch the next one,
2431  * until we exhaust the outer plan or cross a group boundary.
2432  */
2433  for (;;)
2434  {
2435  /*
2436  * During phase 1 only of a mixed agg, we need to update
2437  * hashtables as well in advance_aggregates.
2438  */
2439  if (aggstate->aggstrategy == AGG_MIXED &&
2440  aggstate->current_phase == 1)
2441  {
2442  lookup_hash_entries(aggstate);
2443  }
2444 
2445  /* Advance the aggregates (or combine functions) */
2446  advance_aggregates(aggstate);
2447 
2448  /* Reset per-input-tuple context after each tuple */
2449  ResetExprContext(tmpcontext);
2450 
2451  outerslot = fetch_input_tuple(aggstate);
2452  if (TupIsNull(outerslot))
2453  {
2454  /* no more outer-plan tuples available */
2455 
2456  /* if we built hash tables, finalize any spills */
2457  if (aggstate->aggstrategy == AGG_MIXED &&
2458  aggstate->current_phase == 1)
2459  hashagg_finish_initial_spills(aggstate);
2460 
2461  if (hasGroupingSets)
2462  {
2463  aggstate->input_done = true;
2464  break;
2465  }
2466  else
2467  {
2468  aggstate->agg_done = true;
2469  break;
2470  }
2471  }
2472  /* set up for next advance_aggregates call */
2473  tmpcontext->ecxt_outertuple = outerslot;
2474 
2475  /*
2476  * If we are grouping, check whether we've crossed a group
2477  * boundary.
2478  */
2479  if (node->aggstrategy != AGG_PLAIN && node->numCols > 0)
2480  {
2481  tmpcontext->ecxt_innertuple = firstSlot;
2482  if (!ExecQual(aggstate->phase->eqfunctions[node->numCols - 1],
2483  tmpcontext))
2484  {
2485  aggstate->grp_firstTuple = ExecCopySlotHeapTuple(outerslot);
2486  break;
2487  }
2488  }
2489  }
2490  }
2491 
2492  /*
2493  * Use the representative input tuple for any references to
2494  * non-aggregated input columns in aggregate direct args, the node
2495  * qual, and the tlist. (If we are not grouping, and there are no
2496  * input rows at all, we will come here with an empty firstSlot
2497  * ... but if not grouping, there can't be any references to
2498  * non-aggregated input columns, so no problem.)
2499  */
2500  econtext->ecxt_outertuple = firstSlot;
2501  }
2502 
2503  Assert(aggstate->projected_set >= 0);
2504 
2505  currentSet = aggstate->projected_set;
2506 
2507  prepare_projection_slot(aggstate, econtext->ecxt_outertuple, currentSet);
2508 
2509  select_current_set(aggstate, currentSet, false);
2510 
2511  finalize_aggregates(aggstate,
2512  peragg,
2513  pergroups[currentSet]);
2514 
2515  /*
2516  * If there's no row to project right now, we must continue rather
2517  * than returning a null since there might be more groups.
2518  */
2519  result = project_aggregates(aggstate);
2520  if (result)
2521  return result;
2522  }
2523 
2524  /* No more groups */
2525  return NULL;
2526 }
2527 
2528 /*
2529  * ExecAgg for hashed case: read input and build hash table
2530  */
2531 static void
2532 agg_fill_hash_table(AggState *aggstate)
2533 {
2534  TupleTableSlot *outerslot;
2535  ExprContext *tmpcontext = aggstate->tmpcontext;
2536 
2537  /*
2538  * Process each outer-plan tuple, and then fetch the next one, until we
2539  * exhaust the outer plan.
2540  */
2541  for (;;)
2542  {
2543  outerslot = fetch_input_tuple(aggstate);
2544  if (TupIsNull(outerslot))
2545  break;
2546 
2547  /* set up for lookup_hash_entries and advance_aggregates */
2548  tmpcontext->ecxt_outertuple = outerslot;
2549 
2550  /* Find or build hashtable entries */
2551  lookup_hash_entries(aggstate);
2552 
2553  /* Advance the aggregates (or combine functions) */
2554  advance_aggregates(aggstate);
2555 
2556  /*
2557  * Reset per-input-tuple context after each tuple, but note that the
2558  * hash lookups do this too
2559  */
2560  ResetExprContext(aggstate->tmpcontext);
2561  }
2562 
2563  /* finalize spills, if any */
2564  hashagg_finish_initial_spills(aggstate);
2565 
2566  aggstate->table_filled = true;
2567  /* Initialize to walk the first hash table */
2568  select_current_set(aggstate, 0, true);
2569  ResetTupleHashIterator(aggstate->perhash[0].hashtable,
2570  &aggstate->perhash[0].hashiter);
2571 }
2572 
2573 /*
2574  * If any data was spilled during hash aggregation, reset the hash table and
2575  * reprocess one batch of spilled data. After reprocessing a batch, the hash
2576  * table will again contain data, ready to be consumed by
2577  * agg_retrieve_hash_table_in_memory().
2578  *
2579  * Should only be called after all in-memory hash table entries have been
2580  * finalized and emitted.
2581  *
2582  * Return false when input is exhausted and there's no more work to be done;
2583  * otherwise return true.
2584  */
2585 static bool
2586 agg_refill_hash_table(AggState *aggstate)
2587 {
2588  HashAggBatch *batch;
2589  AggStatePerHash perhash;
2590  HashAggSpill spill;
2591  LogicalTapeSet *tapeset = aggstate->hash_tapeset;
2592  bool spill_initialized = false;
2593 
2594  if (aggstate->hash_batches == NIL)
2595  return false;
2596 
2597  /* hash_batches is a stack, with the top item at the end of the list */
2598  batch = llast(aggstate->hash_batches);
2599  aggstate->hash_batches = list_delete_last(aggstate->hash_batches);
2600 
2601  hash_agg_set_limits(aggstate->hashentrysize, batch->input_card,
2602  batch->used_bits, &aggstate->hash_mem_limit,
2603  &aggstate->hash_ngroups_limit, NULL);
2604 
2605  /*
2606  * Each batch only processes one grouping set; set the rest to NULL so
2607  * that advance_aggregates() knows to ignore them. We don't touch
2608  * pergroups for sorted grouping sets here, because they will be needed if
2609  * we rescan later. The expressions for sorted grouping sets will not be
2610  * evaluated after we recompile anyway.
2611  */
2612  MemSet(aggstate->hash_pergroup, 0,
2613  sizeof(AggStatePerGroup) * aggstate->num_hashes);
2614 
2615  /* free memory and reset hash tables */
2616  ReScanExprContext(aggstate->hashcontext);
2617  for (int setno = 0; setno < aggstate->num_hashes; setno++)
2618  ResetTupleHashTable(aggstate->perhash[setno].hashtable);
2619 
2620  aggstate->hash_ngroups_current = 0;
2621 
2622  /*
2623  * In AGG_MIXED mode, hash aggregation happens in phase 1 and the output
2624  * happens in phase 0. So, we switch to phase 1 when processing a batch,
2625  * and back to phase 0 after the batch is done.
2626  */
2627  Assert(aggstate->current_phase == 0);
2628  if (aggstate->phase->aggstrategy == AGG_MIXED)
2629  {
2630  aggstate->current_phase = 1;
2631  aggstate->phase = &aggstate->phases[aggstate->current_phase];
2632  }
2633 
2634  select_current_set(aggstate, batch->setno, true);
2635 
2636  perhash = &aggstate->perhash[aggstate->current_set];
2637 
2638  /*
2639  * Spilled tuples are always read back as MinimalTuples, which may be
2640  * different from the outer plan, so recompile the aggregate expressions.
2641  *
2642  * We still need the NULL check, because we are only processing one
2643  * grouping set at a time and the rest will be NULL.
2644  */
2645  hashagg_recompile_expressions(aggstate, true, true);
2646 
2647  for (;;)
2648  {
2649  TupleTableSlot *spillslot = aggstate->hash_spill_rslot;
2650  TupleTableSlot *hashslot = perhash->hashslot;
2651  TupleHashEntry entry;
2652  MinimalTuple tuple;
2653  uint32 hash;
2654  bool isnew = false;
2655  bool *p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
2656 
2658 
2659  tuple = hashagg_batch_read(batch, &hash);
2660  if (tuple == NULL)
2661  break;
2662 
2663  ExecStoreMinimalTuple(tuple, spillslot, true);
2664  aggstate->tmpcontext->ecxt_outertuple = spillslot;
2665 
2666  prepare_hash_slot(perhash,
2667  aggstate->tmpcontext->ecxt_outertuple,
2668  hashslot);
2669  entry = LookupTupleHashEntryHash(perhash->hashtable, hashslot,
2670  p_isnew, hash);
2671 
2672  if (entry != NULL)
2673  {
2674  if (isnew)
2675  initialize_hash_entry(aggstate, perhash->hashtable, entry);
2676  aggstate->hash_pergroup[batch->setno] = entry->additional;
2677  advance_aggregates(aggstate);
2678  }
2679  else
2680  {
2681  if (!spill_initialized)
2682  {
2683  /*
2684  * Avoid initializing the spill until we actually need it so
2685  * that we don't assign tapes that will never be used.
2686  */
2687  spill_initialized = true;
2688  hashagg_spill_init(&spill, tapeset, batch->used_bits,
2689  batch->input_card, aggstate->hashentrysize);
2690  }
2691  /* no memory for a new group, spill */
2692  hashagg_spill_tuple(aggstate, &spill, spillslot, hash);
2693 
2694  aggstate->hash_pergroup[batch->setno] = NULL;
2695  }
2696 
2697  /*
2698  * Reset per-input-tuple context after each tuple, but note that the
2699  * hash lookups do this too
2700  */
2701  ResetExprContext(aggstate->tmpcontext);
2702  }
2703 
2704  LogicalTapeClose(batch->input_tape);
2705 
2706  /* change back to phase 0 */
2707  aggstate->current_phase = 0;
2708  aggstate->phase = &aggstate->phases[aggstate->current_phase];
2709 
2710  if (spill_initialized)
2711  {
2712  hashagg_spill_finish(aggstate, &spill, batch->setno);
2713  hash_agg_update_metrics(aggstate, true, spill.npartitions);
2714  }
2715  else
2716  hash_agg_update_metrics(aggstate, true, 0);
2717 
2718  aggstate->hash_spill_mode = false;
2719 
2720  /* prepare to walk the first hash table */
2721  select_current_set(aggstate, batch->setno, true);
2722  ResetTupleHashIterator(aggstate->perhash[batch->setno].hashtable,
2723  &aggstate->perhash[batch->setno].hashiter);
2724 
2725  pfree(batch);
2726 
2727  return true;
2728 }
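
/*
 * [Editorial note, not part of nodeAgg.c.] hash_batches works as a stack,
 * and re-spilling narrows the hash space each generation: a batch created
 * with used_bits = 0 and 32 partitions hands its children used_bits = 5, so
 * a tuple's next partition is always chosen from hash bits its ancestors
 * never consumed.
 */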
2729 
2730 /*
2731  * ExecAgg for hashed case: retrieving groups from hash table
2732  *
2733  * After exhausting in-memory tuples, also try refilling the hash table using
2734  * previously-spilled tuples. Only returns NULL after all in-memory and
2735  * spilled tuples are exhausted.
2736  */
2737 static TupleTableSlot *
2738 agg_retrieve_hash_table(AggState *aggstate)
2739 {
2740  TupleTableSlot *result = NULL;
2741 
2742  while (result == NULL)
2743  {
2744  result = agg_retrieve_hash_table_in_memory(aggstate);
2745  if (result == NULL)
2746  {
2747  if (!agg_refill_hash_table(aggstate))
2748  {
2749  aggstate->agg_done = true;
2750  break;
2751  }
2752  }
2753  }
2754 
2755  return result;
2756 }
2757 
2758 /*
2759  * Retrieve the groups from the in-memory hash tables without considering any
2760  * spilled tuples.
2761  */
2762 static TupleTableSlot *
2763 agg_retrieve_hash_table_in_memory(AggState *aggstate)
2764 {
2765  ExprContext *econtext;
2766  AggStatePerAgg peragg;
2767  AggStatePerGroup pergroup;
2768  TupleHashEntryData *entry;
2769  TupleTableSlot *firstSlot;
2770  TupleTableSlot *result;
2771  AggStatePerHash perhash;
2772 
2773  /*
2774  * get state info from node.
2775  *
2776  * econtext is the per-output-tuple expression context.
2777  */
2778  econtext = aggstate->ss.ps.ps_ExprContext;
2779  peragg = aggstate->peragg;
2780  firstSlot = aggstate->ss.ss_ScanTupleSlot;
2781 
2782  /*
2783  * Note that perhash (and therefore anything accessed through it) can
2784  * change inside the loop, as we change between grouping sets.
2785  */
2786  perhash = &aggstate->perhash[aggstate->current_set];
2787 
2788  /*
2789  * We loop retrieving groups until we find one satisfying
2790  * aggstate->ss.ps.qual
2791  */
2792  for (;;)
2793  {
2794  TupleTableSlot *hashslot = perhash->hashslot;
2795  int i;
2796 
2797  CHECK_FOR_INTERRUPTS();
2798 
2799  /*
2800  * Find the next entry in the hash table
2801  */
2802  entry = ScanTupleHashTable(perhash->hashtable, &perhash->hashiter);
2803  if (entry == NULL)
2804  {
2805  int nextset = aggstate->current_set + 1;
2806 
2807  if (nextset < aggstate->num_hashes)
2808  {
2809  /*
2810  * Switch to next grouping set, reinitialize, and restart the
2811  * loop.
2812  */
2813  select_current_set(aggstate, nextset, true);
2814 
2815  perhash = &aggstate->perhash[aggstate->current_set];
2816 
2817  ResetTupleHashIterator(perhash->hashtable, &perhash->hashiter);
2818 
2819  continue;
2820  }
2821  else
2822  {
2823  return NULL;
2824  }
2825  }
2826 
2827  /*
2828  * Clear the per-output-tuple context for each group
2829  *
2830  * We intentionally don't use ReScanExprContext here; if any aggs have
2831  * registered shutdown callbacks, they mustn't be called yet, since we
2832  * might not be done with that agg.
2833  */
2834  ResetExprContext(econtext);
2835 
2836  /*
2837  * Transform representative tuple back into one with the right
2838  * columns.
2839  */
2840  ExecStoreMinimalTuple(entry->firstTuple, hashslot, false);
2841  slot_getallattrs(hashslot);
2842 
2843  ExecClearTuple(firstSlot);
2844  memset(firstSlot->tts_isnull, true,
2845  firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
2846 
2847  for (i = 0; i < perhash->numhashGrpCols; i++)
2848  {
2849  int varNumber = perhash->hashGrpColIdxInput[i] - 1;
2850 
2851  firstSlot->tts_values[varNumber] = hashslot->tts_values[i];
2852  firstSlot->tts_isnull[varNumber] = hashslot->tts_isnull[i];
2853  }
2854  ExecStoreVirtualTuple(firstSlot);
2855 
2856  pergroup = (AggStatePerGroup) entry->additional;
2857 
2858  /*
2859  * Use the representative input tuple for any references to
2860  * non-aggregated input columns in the qual and tlist.
2861  */
2862  econtext->ecxt_outertuple = firstSlot;
2863 
2864  prepare_projection_slot(aggstate,
2865  econtext->ecxt_outertuple,
2866  aggstate->current_set);
2867 
2868  finalize_aggregates(aggstate, peragg, pergroup);
2869 
2870  result = project_aggregates(aggstate);
2871  if (result)
2872  return result;
2873  }
2874 
2875  /* No more groups */
2876  return NULL;
2877 }
2878 
2879 /*
2880  * hashagg_spill_init
2881  *
2882  * Called after we determined that spilling is necessary. Chooses the number
2883  * of partitions to create, and initializes them.
2884  */
2885 static void
2886 hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
2887  double input_groups, double hashentrysize)
2888 {
2889  int npartitions;
2890  int partition_bits;
2891 
2892  npartitions = hash_choose_num_partitions(input_groups, hashentrysize,
2893  used_bits, &partition_bits);
2894 
2895  spill->partitions = palloc0(sizeof(LogicalTape *) * npartitions);
2896  spill->ntuples = palloc0(sizeof(int64) * npartitions);
2897  spill->hll_card = palloc0(sizeof(hyperLogLogState) * npartitions);
2898 
2899  for (int i = 0; i < npartitions; i++)
2900  spill->partitions[i] = LogicalTapeCreate(tapeset);
2901 
2902  spill->shift = 32 - used_bits - partition_bits;
2903  spill->mask = (npartitions - 1) << spill->shift;
2904  spill->npartitions = npartitions;
2905 
2906  for (int i = 0; i < npartitions; i++)
2907  initHyperLogLog(&spill->hll_card[i], HASHAGG_HLL_BIT_WIDTH);
2908 }
2909 
2910 /*
2911  * hashagg_spill_tuple
2912  *
2913  * No room for new groups in the hash table. Save for later in the appropriate
2914  * partition.
2915  */
2916 static Size
2917 hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
2918  TupleTableSlot *inputslot, uint32 hash)
2919 {
2920  TupleTableSlot *spillslot;
2921  int partition;
2922  MinimalTuple tuple;
2923  LogicalTape *tape;
2924  int total_written = 0;
2925  bool shouldFree;
2926 
2927  Assert(spill->partitions != NULL);
2928 
2929  /* spill only attributes that we actually need */
2930  if (!aggstate->all_cols_needed)
2931  {
2932  spillslot = aggstate->hash_spill_wslot;
2933  slot_getsomeattrs(inputslot, aggstate->max_colno_needed);
2934  ExecClearTuple(spillslot);
2935  for (int i = 0; i < spillslot->tts_tupleDescriptor->natts; i++)
2936  {
2937  if (bms_is_member(i + 1, aggstate->colnos_needed))
2938  {
2939  spillslot->tts_values[i] = inputslot->tts_values[i];
2940  spillslot->tts_isnull[i] = inputslot->tts_isnull[i];
2941  }
2942  else
2943  spillslot->tts_isnull[i] = true;
2944  }
2945  ExecStoreVirtualTuple(spillslot);
2946  }
2947  else
2948  spillslot = inputslot;
2949 
2950  tuple = ExecFetchSlotMinimalTuple(spillslot, &shouldFree);
2951 
2952  partition = (hash & spill->mask) >> spill->shift;
2953  spill->ntuples[partition]++;
2954 
2955  /*
2956  * All hash values destined for a given partition have some bits in
2957  * common, which causes bad HLL cardinality estimates. Hash the hash to
2958  * get a more uniform distribution.
2959  */
2960  addHyperLogLog(&spill->hll_card[partition], hash_bytes_uint32(hash));
2961 
2962  tape = spill->partitions[partition];
2963 
2964  LogicalTapeWrite(tape, &hash, sizeof(uint32));
2965  total_written += sizeof(uint32);
2966 
2967  LogicalTapeWrite(tape, tuple, tuple->t_len);
2968  total_written += tuple->t_len;
2969 
2970  if (shouldFree)
2971  pfree(tuple);
2972 
2973  return total_written;
2974 }
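
/*
 * [Editorial illustration, not part of nodeAgg.c.] The partition choice just
 * peels off the next-highest unused hash bits. With used_bits = 0 and
 * npartitions = 32 (partition_bits = 5), hashagg_spill_init() sets up:
 *
 *     shift = 32 - 0 - 5 = 27;
 *     mask  = 31 << 27;
 *     partition = (hash & mask) >> shift;    // top 5 bits, 0..31
 */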
2975 
2976 /*
2977  * hashagg_batch_new
2978  *
2979  * Construct a HashAggBatch item, which represents one iteration of HashAgg to
2980  * be done.
2981  */
2982 static HashAggBatch *
2983 hashagg_batch_new(LogicalTape *input_tape, int setno,
2984  int64 input_tuples, double input_card, int used_bits)
2985 {
2986  HashAggBatch *batch = palloc0(sizeof(HashAggBatch));
2987 
2988  batch->setno = setno;
2989  batch->used_bits = used_bits;
2990  batch->input_tape = input_tape;
2991  batch->input_tuples = input_tuples;
2992  batch->input_card = input_card;
2993 
2994  return batch;
2995 }
2996 
2997 /*
2998  * hashagg_batch_read
2999  * read the next tuple from a batch's tape. Return NULL if no more.
3000  */
3001 static MinimalTuple
3002 hashagg_batch_read(HashAggBatch *batch, uint32 *hashp)
3003 {
3004  LogicalTape *tape = batch->input_tape;
3005  MinimalTuple tuple;
3006  uint32 t_len;
3007  size_t nread;
3008  uint32 hash;
3009 
3010  nread = LogicalTapeRead(tape, &hash, sizeof(uint32));
3011  if (nread == 0)
3012  return NULL;
3013  if (nread != sizeof(uint32))
3014  ereport(ERROR,
3015  (errcode_for_file_access(),
3016  errmsg("unexpected EOF for tape %p: requested %zu bytes, read %zu bytes",
3017  tape, sizeof(uint32), nread)));
3018  if (hashp != NULL)
3019  *hashp = hash;
3020 
3021  nread = LogicalTapeRead(tape, &t_len, sizeof(t_len));
3022  if (nread != sizeof(uint32))
3023  ereport(ERROR,
3024  (errcode_for_file_access(),
3025  errmsg("unexpected EOF for tape %p: requested %zu bytes, read %zu bytes",
3026  tape, sizeof(uint32), nread)));
3027 
3028  tuple = (MinimalTuple) palloc(t_len);
3029  tuple->t_len = t_len;
3030 
3031  nread = LogicalTapeRead(tape,
3032  (char *) tuple + sizeof(uint32),
3033  t_len - sizeof(uint32));
3034  if (nread != t_len - sizeof(uint32))
3035  ereport(ERROR,
3036  (errcode_for_file_access(),
3037  errmsg("unexpected EOF for tape %p: requested %zu bytes, read %zu bytes",
3038  tape, t_len - sizeof(uint32), nread)));
3039 
3040  return tuple;
3041 }
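
/*
 * [Editorial illustration, not part of nodeAgg.c.] The on-tape record
 * layout, as written by hashagg_spill_tuple() and consumed above:
 *
 *     | hash (uint32) | MinimalTuple: t_len (uint32), then t_len -
 *     |               | sizeof(uint32) remaining bytes              |
 *
 * which is why the reader fetches t_len first and then reads the rest of
 * the tuple directly after its own length field.
 */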
3042 
3043 /*
3044  * hashagg_finish_initial_spills
3045  *
3046  * After a HashAggBatch has been processed, it may have spilled tuples to
3047  * disk. If so, turn the spilled partitions into new batches that must later
3048  * be executed.
3049  */
3050 static void
3051 hashagg_finish_initial_spills(AggState *aggstate)
3052 {
3053  int setno;
3054  int total_npartitions = 0;
3055 
3056  if (aggstate->hash_spills != NULL)
3057  {
3058  for (setno = 0; setno < aggstate->num_hashes; setno++)
3059  {
3060  HashAggSpill *spill = &aggstate->hash_spills[setno];
3061 
3062  total_npartitions += spill->npartitions;
3063  hashagg_spill_finish(aggstate, spill, setno);
3064  }
3065 
3066  /*
3067  * We're not processing tuples from the outer plan any more; only
3068  * processing batches of spilled tuples. The initial spill structures
3069  * are no longer needed.
3070  */
3071  pfree(aggstate->hash_spills);
3072  aggstate->hash_spills = NULL;
3073  }
3074 
3075  hash_agg_update_metrics(aggstate, false, total_npartitions);
3076  aggstate->hash_spill_mode = false;
3077 }
3078 
3079 /*
3080  * hashagg_spill_finish
3081  *
3082  * Transform spill partitions into new batches.
3083  */
3084 static void
3085 hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
3086 {
3087  int i;
3088  int used_bits = 32 - spill->shift;
3089 
3090  if (spill->npartitions == 0)
3091  return; /* didn't spill */
3092 
3093  for (i = 0; i < spill->npartitions; i++)
3094  {
3095  LogicalTape *tape = spill->partitions[i];
3096  HashAggBatch *new_batch;
3097  double cardinality;
3098 
3099  /* if the partition is empty, don't create a new batch of work */
3100  if (spill->ntuples[i] == 0)
3101  continue;
3102 
3103  cardinality = estimateHyperLogLog(&spill->hll_card[i]);
3104  freeHyperLogLog(&spill->hll_card[i]);
3105 
3106  /* rewinding frees the buffer while not in use */
3107  LogicalTapeRewindForRead(tape, HASHAGG_READ_BUFFER_SIZE);
3108 
3109  new_batch = hashagg_batch_new(tape, setno,
3110  spill->ntuples[i], cardinality,
3111  used_bits);
3112  aggstate->hash_batches = lappend(aggstate->hash_batches, new_batch);
3113  aggstate->hash_batches_used++;
3114  }
3115 
3116  pfree(spill->ntuples);
3117  pfree(spill->hll_card);
3118  pfree(spill->partitions);
3119 }
3120 
3121 /*
3122  * Free resources related to a spilled HashAgg.
3123  */
3124 static void
3125 hashagg_reset_spill_state(AggState *aggstate)
3126 {
3127  /* free spills from initial pass */
3128  if (aggstate->hash_spills != NULL)
3129  {
3130  int setno;
3131 
3132  for (setno = 0; setno < aggstate->num_hashes; setno++)
3133  {
3134  HashAggSpill *spill = &aggstate->hash_spills[setno];
3135 
3136  pfree(spill->ntuples);
3137  pfree(spill->partitions);
3138  }
3139  pfree(aggstate->hash_spills);
3140  aggstate->hash_spills = NULL;
3141  }
3142 
3143  /* free batches */
3144  list_free_deep(aggstate->hash_batches);
3145  aggstate->hash_batches = NIL;
3146 
3147  /* close tape set */
3148  if (aggstate->hash_tapeset != NULL)
3149  {
3150  LogicalTapeSetClose(aggstate->hash_tapeset);
3151  aggstate->hash_tapeset = NULL;
3152  }
3153 }
3154 
3155 
3156 /* -----------------
3157  * ExecInitAgg
3158  *
3159  * Creates the run-time information for the agg node produced by the
3160  * planner and initializes its outer subtree.
3161  *
3162  * -----------------
3163  */
3164 AggState *
3165 ExecInitAgg(Agg *node, EState *estate, int eflags)
3166 {
3167  AggState *aggstate;
3168  AggStatePerAgg peraggs;
3169  AggStatePerTrans pertransstates;
3170  AggStatePerGroup *pergroups;
3171  Plan *outerPlan;
3172  ExprContext *econtext;
3173  TupleDesc scanDesc;
3174  int max_aggno;
3175  int max_transno;
3176  int numaggrefs;
3177  int numaggs;
3178  int numtrans;
3179  int phase;
3180  int phaseidx;
3181  ListCell *l;
3182  Bitmapset *all_grouped_cols = NULL;
3183  int numGroupingSets = 1;
3184  int numPhases;
3185  int numHashes;
3186  int i = 0;
3187  int j = 0;
3188  bool use_hashing = (node->aggstrategy == AGG_HASHED ||
3189  node->aggstrategy == AGG_MIXED);
3190 
3191  /* check for unsupported flags */
3192  Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
3193 
3194  /*
3195  * create state structure
3196  */
3197  aggstate = makeNode(AggState);
3198  aggstate->ss.ps.plan = (Plan *) node;
3199  aggstate->ss.ps.state = estate;
3200  aggstate->ss.ps.ExecProcNode = ExecAgg;
3201 
3202  aggstate->aggs = NIL;
3203  aggstate->numaggs = 0;
3204  aggstate->numtrans = 0;
3205  aggstate->aggstrategy = node->aggstrategy;
3206  aggstate->aggsplit = node->aggsplit;
3207  aggstate->maxsets = 0;
3208  aggstate->projected_set = -1;
3209  aggstate->current_set = 0;
3210  aggstate->peragg = NULL;
3211  aggstate->pertrans = NULL;
3212  aggstate->curperagg = NULL;
3213  aggstate->curpertrans = NULL;
3214  aggstate->input_done = false;
3215  aggstate->agg_done = false;
3216  aggstate->pergroups = NULL;
3217  aggstate->grp_firstTuple = NULL;
3218  aggstate->sort_in = NULL;
3219  aggstate->sort_out = NULL;
3220 
3221  /*
3222  * phases[0] always exists, but is dummy in sorted/plain mode
3223  */
3224  numPhases = (use_hashing ? 1 : 2);
3225  numHashes = (use_hashing ? 1 : 0);
3226 
3227  /*
3228  * Calculate the maximum number of grouping sets in any phase; this
3229  * determines the size of some allocations. Also calculate the number of
3230  * phases, since all hashed/mixed nodes contribute to only a single phase.
3231  */
3232  if (node->groupingSets)
3233  {
3234  numGroupingSets = list_length(node->groupingSets);
3235 
3236  foreach(l, node->chain)
3237  {
3238  Agg *agg = lfirst(l);
3239 
3240  numGroupingSets = Max(numGroupingSets,
3241  list_length(agg->groupingSets));
3242 
3243  /*
3244  * additional AGG_HASHED aggs become part of phase 0, but all
3245  * others add an extra phase.
3246  */
3247  if (agg->aggstrategy != AGG_HASHED)
3248  ++numPhases;
3249  else
3250  ++numHashes;
3251  }
3252  }
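
/*
 * [Editorial illustration, not part of nodeAgg.c.] For an AGG_MIXED node
 * whose chain holds two more AGG_HASHED nodes and one AGG_SORTED node, the
 * loop above yields numHashes = 1 + 2 = 3 (all sharing phase 0) and
 * numPhases = 1 + 1 = 2 (phase 0 for all the hashes, plus one phase for
 * the sorted chain node).
 */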
3253 
3254  aggstate->maxsets = numGroupingSets;
3255  aggstate->numphases = numPhases;
3256 
3257  aggstate->aggcontexts = (ExprContext **)
3258  palloc0(sizeof(ExprContext *) * numGroupingSets);
3259 
3260  /*
3261  * Create expression contexts. We need three or more, one for
3262  * per-input-tuple processing, one for per-output-tuple processing, one
3263  * for all the hashtables, and one for each grouping set. The per-tuple
3264  * memory context of the per-grouping-set ExprContexts (aggcontexts)
3265  * replaces the standalone memory context formerly used to hold transition
3266  * values. We cheat a little by using ExecAssignExprContext() to build
3267  * all of them.
3268  *
3269  * NOTE: the details of what is stored in aggcontexts and what is stored
3270  * in the regular per-query memory context are driven by a simple
3271  * decision: we want to reset the aggcontext at group boundaries (if not
3272  * hashing) and in ExecReScanAgg to recover no-longer-wanted space.
3273  */
3274  ExecAssignExprContext(estate, &aggstate->ss.ps);
3275  aggstate->tmpcontext = aggstate->ss.ps.ps_ExprContext;
3276 
3277  for (i = 0; i < numGroupingSets; ++i)
3278  {
3279  ExecAssignExprContext(estate, &aggstate->ss.ps);
3280  aggstate->aggcontexts[i] = aggstate->ss.ps.ps_ExprContext;
3281  }
3282 
3283  if (use_hashing)
3284  aggstate->hashcontext = CreateWorkExprContext(estate);
3285 
3286  ExecAssignExprContext(estate, &aggstate->ss.ps);
3287 
3288  /*
3289  * Initialize child nodes.
3290  *
3291  * If we are doing a hashed aggregation then the child plan does not need
3292  * to handle REWIND efficiently; see ExecReScanAgg.
3293  */
3294  if (node->aggstrategy == AGG_HASHED)
3295  eflags &= ~EXEC_FLAG_REWIND;
3296  outerPlan = outerPlan(node);
3297  outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
3298 
3299  /*
3300  * initialize source tuple type.
3301  */
3302  aggstate->ss.ps.outerops =
3303  ExecGetResultSlotOps(outerPlanState(&aggstate->ss),
3304  &aggstate->ss.ps.outeropsfixed);
3305  aggstate->ss.ps.outeropsset = true;
3306 
3307  ExecCreateScanSlotFromOuterPlan(estate, &aggstate->ss,
3308  aggstate->ss.ps.outerops);
3309  scanDesc = aggstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
3310 
3311  /*
3312  * If there are more than two phases (including a potential dummy phase
3313  * 0), input will be resorted using tuplesort. Need a slot for that.
3314  */
3315  if (numPhases > 2)
3316  {
3317  aggstate->sort_slot = ExecInitExtraTupleSlot(estate, scanDesc,
3318  &TTSOpsMinimalTuple);
3319 
3320  /*
3321  * The output of the tuplesort, and the output from the outer child
3322  * might not use the same type of slot. In most cases the child will
3323  * be a Sort, and thus return a TTSOpsMinimalTuple type slot - but the
3324  * input can also be presorted due to an index, in which case it could be
3325  * a different type of slot.
3326  *
3327  * XXX: For efficiency it would be good to instead/additionally
3328  * generate expressions with corresponding settings of outerops* for
3329  * the individual phases - deforming is often a bottleneck for
3330  * aggregations with lots of rows per group. If there's multiple
3331  * sorts, we know that all but the first use TTSOpsMinimalTuple (via
3332  * the nodeAgg.c internal tuplesort).
3333  */
3334  if (aggstate->ss.ps.outeropsfixed &&
3335  aggstate->ss.ps.outerops != &TTSOpsMinimalTuple)
3336  aggstate->ss.ps.outeropsfixed = false;
3337  }
3338 
3339  /*
3340  * Initialize result type, slot and projection.
3341  */
3342  ExecInitResultTupleSlotTL(&aggstate->ss.ps, &TTSOpsVirtual);
3343  ExecAssignProjectionInfo(&aggstate->ss.ps, NULL);
3344 
3345  /*
3346  * initialize child expressions
3347  *
3348  * We expect the parser to have checked that no aggs contain other agg
3349  * calls in their arguments (and just to be sure, we verify it again while
3350  * initializing the plan node). This would make no sense under SQL
3351  semantics, and it's forbidden by the spec. Because that holds, we
3352  * don't need to worry about evaluating the aggs in any particular order.
3353  *
3354  * Note: execExpr.c finds Aggrefs for us, and adds them to aggstate->aggs.
3355  * Aggrefs in the qual are found here; Aggrefs in the targetlist are found
3356  * during ExecAssignProjectionInfo, above.
3357  */
3358  aggstate->ss.ps.qual =
3359  ExecInitQual(node->plan.qual, (PlanState *) aggstate);
3360 
3361  /*
3362  * We should now have found all Aggrefs in the targetlist and quals.
3363  */
3364  numaggrefs = list_length(aggstate->aggs);
3365  max_aggno = -1;
3366  max_transno = -1;
3367  foreach(l, aggstate->aggs)
3368  {
3369  Aggref *aggref = (Aggref *) lfirst(l);
3370 
3371  max_aggno = Max(max_aggno, aggref->aggno);
3372  max_transno = Max(max_transno, aggref->aggtransno);
3373  }
3374  numaggs = max_aggno + 1;
3375  numtrans = max_transno + 1;
3376 
3377  /*
3378  * For each phase, prepare grouping set data and fmgr lookup data for
3379  * compare functions. Accumulate all_grouped_cols in passing.
3380  */
3381  aggstate->phases = palloc0(numPhases * sizeof(AggStatePerPhaseData));
3382 
3383  aggstate->num_hashes = numHashes;
3384  if (numHashes)
3385  {
3386  aggstate->perhash = palloc0(sizeof(AggStatePerHashData) * numHashes);
3387  aggstate->phases[0].numsets = 0;
3388  aggstate->phases[0].gset_lengths = palloc(numHashes * sizeof(int));
3389  aggstate->phases[0].grouped_cols = palloc(numHashes * sizeof(Bitmapset *));
3390  }
3391 
3392  phase = 0;
3393  for (phaseidx = 0; phaseidx <= list_length(node->chain); ++phaseidx)
3394  {
3395  Agg *aggnode;
3396  Sort *sortnode;
3397 
3398  if (phaseidx > 0)
3399  {
3400  aggnode = list_nth_node(Agg, node->chain, phaseidx - 1);
3401  sortnode = castNode(Sort, outerPlan(aggnode));
3402  }
3403  else
3404  {
3405  aggnode = node;
3406  sortnode = NULL;
3407  }
3408 
3409  Assert(phase <= 1 || sortnode);
3410 
3411  if (aggnode->aggstrategy == AGG_HASHED
3412  || aggnode->aggstrategy == AGG_MIXED)
3413  {
3414  AggStatePerPhase phasedata = &aggstate->phases[0];
3415  AggStatePerHash perhash;
3416  Bitmapset *cols = NULL;
3417 
3418  Assert(phase == 0);
3419  i = phasedata->numsets++;
3420  perhash = &aggstate->perhash[i];
3421 
3422  /* phase 0 always points to the "real" Agg in the hash case */
3423  phasedata->aggnode = node;
3424  phasedata->aggstrategy = node->aggstrategy;
3425 
3426  /* but the actual Agg node representing this hash is saved here */
3427  perhash->aggnode = aggnode;
3428 
3429  phasedata->gset_lengths[i] = perhash->numCols = aggnode->numCols;
3430 
3431  for (j = 0; j < aggnode->numCols; ++j)
3432  cols = bms_add_member(cols, aggnode->grpColIdx[j]);
3433 
3434  phasedata->grouped_cols[i] = cols;
3435 
3436  all_grouped_cols = bms_add_members(all_grouped_cols, cols);
3437  continue;
3438  }
3439  else
3440  {
3441  AggStatePerPhase phasedata = &aggstate->phases[++phase];
3442  int num_sets;
3443 
3444  phasedata->numsets = num_sets = list_length(aggnode->groupingSets);
3445 
3446  if (num_sets)
3447  {
3448  phasedata->gset_lengths = palloc(num_sets * sizeof(int));
3449  phasedata->grouped_cols = palloc(num_sets * sizeof(Bitmapset *));
3450 
3451  i = 0;
3452  foreach(l, aggnode->groupingSets)
3453  {
3454  int current_length = list_length(lfirst(l));
3455  Bitmapset *cols = NULL;
3456 
3457  /* planner forces this to be correct */
3458  for (j = 0; j < current_length; ++j)
3459  cols = bms_add_member(cols, aggnode->grpColIdx[j]);
3460 
3461  phasedata->grouped_cols[i] = cols;
3462  phasedata->gset_lengths[i] = current_length;
3463 
3464  ++i;
3465  }
3466 
3467  all_grouped_cols = bms_add_members(all_grouped_cols,
3468  phasedata->grouped_cols[0]);
3469  }
3470  else
3471  {
3472  Assert(phaseidx == 0);
3473 
3474  phasedata->gset_lengths = NULL;
3475  phasedata->grouped_cols = NULL;
3476  }
3477 
3478  /*
3479  * If we are grouping, precompute fmgr lookup data for inner loop.
3480  */
3481  if (aggnode->aggstrategy == AGG_SORTED)
3482  {
3483  /*
3484  * Build a separate function for each subset of columns that
3485  * need to be compared.
3486  */
3487  phasedata->eqfunctions =
3488  (ExprState **) palloc0(aggnode->numCols * sizeof(ExprState *));
3489 
3490  /* for each grouping set */
3491  for (int k = 0; k < phasedata->numsets; k++)
3492  {
3493  int length = phasedata->gset_lengths[k];
3494 
3495  /* nothing to do for empty grouping set */
3496  if (length == 0)
3497  continue;
3498 
3499  /* if we already had one of this length, it'll do */
3500  if (phasedata->eqfunctions[length - 1] != NULL)
3501  continue;
3502 
3503  phasedata->eqfunctions[length - 1] =
3504  execTuplesMatchPrepare(scanDesc,
3505  length,
3506  aggnode->grpColIdx,
3507  aggnode->grpOperators,
3508  aggnode->grpCollations,
3509  (PlanState *) aggstate);
3510  }
3511 
3512  /* and for all grouped columns, unless already computed */
3513  if (aggnode->numCols > 0 &&
3514  phasedata->eqfunctions[aggnode->numCols - 1] == NULL)
3515  {
3516  phasedata->eqfunctions[aggnode->numCols - 1] =
3517  execTuplesMatchPrepare(scanDesc,
3518  aggnode->numCols,
3519  aggnode->grpColIdx,
3520  aggnode->grpOperators,
3521  aggnode->grpCollations,
3522  (PlanState *) aggstate);
3523  }
3524  }
3525 
3526  phasedata->aggnode = aggnode;
3527  phasedata->aggstrategy = aggnode->aggstrategy;
3528  phasedata->sortnode = sortnode;
3529  }
3530  }
3531 
3532  /*
3533  * Convert all_grouped_cols to a descending-order list.
3534  */
3535  i = -1;
3536  while ((i = bms_next_member(all_grouped_cols, i)) >= 0)
3537  aggstate->all_grouped_cols = lcons_int(i, aggstate->all_grouped_cols);
3538 
3539  /*
3540  * Set up aggregate-result storage in the output expr context, and also
3541  * allocate my private per-agg working storage
3542  */
3543  econtext = aggstate->ss.ps.ps_ExprContext;
3544  econtext->ecxt_aggvalues = (Datum *) palloc0(sizeof(Datum) * numaggs);
3545  econtext->ecxt_aggnulls = (bool *) palloc0(sizeof(bool) * numaggs);
3546 
3547  peraggs = (AggStatePerAgg) palloc0(sizeof(AggStatePerAggData) * numaggs);
3548  pertransstates = (AggStatePerTrans) palloc0(sizeof(AggStatePerTransData) * numtrans);
3549 
3550  aggstate->peragg = peraggs;
3551  aggstate->pertrans = pertransstates;
3552 
3553 
3554  aggstate->all_pergroups =
3555  (AggStatePerGroup *) palloc0(sizeof(AggStatePerGroup)
3556  * (numGroupingSets + numHashes));
3557  pergroups = aggstate->all_pergroups;
3558 
3559  if (node->aggstrategy != AGG_HASHED)
3560  {
3561  for (i = 0; i < numGroupingSets; i++)
3562  {
3563  pergroups[i] = (AggStatePerGroup) palloc0(sizeof(AggStatePerGroupData)
3564  * numaggs);
3565  }
3566 
3567  aggstate->pergroups = pergroups;
3568  pergroups += numGroupingSets;
3569  }
3570 
3571  /*
3572  * Hashing can only appear in the initial phase.
3573  */
3574  if (use_hashing)
3575  {
3576  Plan *outerplan = outerPlan(node);
3577  uint64 totalGroups = 0;
3578 
3579  aggstate->hash_metacxt = AllocSetContextCreate(aggstate->ss.ps.state->es_query_cxt,
3580  "HashAgg meta context",
3581  ALLOCSET_DEFAULT_SIZES);
3582  aggstate->hash_spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
3583  &TTSOpsMinimalTuple);
3584  aggstate->hash_spill_wslot = ExecInitExtraTupleSlot(estate, scanDesc,
3585  &TTSOpsVirtual);
3586 
3587  /* this is an array of pointers, not structures */
3588  aggstate->hash_pergroup = pergroups;
3589 
3590  aggstate->hashentrysize = hash_agg_entry_size(aggstate->numtrans,
3591  outerplan->plan_width,
3592  node->transitionSpace);
3593 
3594  /*
3595  * Consider all of the grouping sets together when setting the limits
3596  * and estimating the number of partitions. This can be inaccurate
3597  * when there is more than one grouping set, but should still be
3598  * reasonable.
3599  */
3600  for (int k = 0; k < aggstate->num_hashes; k++)
3601  totalGroups += aggstate->perhash[k].aggnode->numGroups;
3602 
3603  hash_agg_set_limits(aggstate->hashentrysize, totalGroups, 0,
3604  &aggstate->hash_mem_limit,
3605  &aggstate->hash_ngroups_limit,
3606  &aggstate->hash_planned_partitions);
3607  find_hash_columns(aggstate);
3608 
3609  /* Skip massive memory allocation if we are just doing EXPLAIN */
3610  if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
3611  build_hash_tables(aggstate);
3612 
3613  aggstate->table_filled = false;
3614 
3615  /* Initialize this to 1, meaning nothing has spilled yet */
3616  aggstate->hash_batches_used = 1;
3617  }
3618 
3619  /*
3620  * Initialize current phase-dependent values to initial phase. The initial
3621  * phase is 1 (first sort pass) for all strategies that use sorting (if
3622  * hashing is being done too, then phase 0 is processed last); but if only
3623  * hashing is being done, then phase 0 is all there is.
3624  */
3625  if (node->aggstrategy == AGG_HASHED)
3626  {
3627  aggstate->current_phase = 0;
3628  initialize_phase(aggstate, 0);
3629  select_current_set(aggstate, 0, true);
3630  }
3631  else
3632  {
3633  aggstate->current_phase = 1;
3634  initialize_phase(aggstate, 1);
3635  select_current_set(aggstate, 0, false);
3636  }
3637 
3638  /*
3639  * Perform lookups of aggregate function info, and initialize the
3640  * unchanging fields of the per-agg and per-trans data.
3641  */
3642  foreach(l, aggstate->aggs)
3643  {
3644  Aggref *aggref = lfirst(l);
3645  AggStatePerAgg peragg;
3646  AggStatePerTrans pertrans;
3647  Oid aggTransFnInputTypes[FUNC_MAX_ARGS];
3648  int numAggTransFnArgs;
3649  int numDirectArgs;
3650  HeapTuple aggTuple;
3651  Form_pg_aggregate aggform;
3652  AclResult aclresult;
3653  Oid finalfn_oid;
3654  Oid serialfn_oid,
3655  deserialfn_oid;
3656  Oid aggOwner;
3657  Expr *finalfnexpr;
3658  Oid aggtranstype;
3659 
3660  /* Planner should have assigned aggregate to correct level */
3661  Assert(aggref->agglevelsup == 0);
3662  /* ... and the split mode should match */
3663  Assert(aggref->aggsplit == aggstate->aggsplit);
3664 
3665  peragg = &peraggs[aggref->aggno];
3666 
3667  /* Check if we initialized the state for this aggregate already. */
3668  if (peragg->aggref != NULL)
3669  continue;
3670 
3671  peragg->aggref = aggref;
3672  peragg->transno = aggref->aggtransno;
3673 
3674  /* Fetch the pg_aggregate row */
3675  aggTuple = SearchSysCache1(AGGFNOID,
3676  ObjectIdGetDatum(aggref->aggfnoid));
3677  if (!HeapTupleIsValid(aggTuple))
3678  elog(ERROR, "cache lookup failed for aggregate %u",
3679  aggref->aggfnoid);
3680  aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
3681 
3682  /* Check permission to call aggregate function */
3683  aclresult = object_aclcheck(ProcedureRelationId, aggref->aggfnoid, GetUserId(),
3684  ACL_EXECUTE);
3685  if (aclresult != ACLCHECK_OK)
3686  aclcheck_error(aclresult, OBJECT_AGGREGATE,
3687  get_func_name(aggref->aggfnoid));
3688  InvokeFunctionExecuteHook(aggref->aggfnoid);
3689 
3690  /* planner recorded transition state type in the Aggref itself */
3691  aggtranstype = aggref->aggtranstype;
3692  Assert(OidIsValid(aggtranstype));
3693 
3694  /* Final function only required if we're finalizing the aggregates */
3695  if (DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit))
3696  peragg->finalfn_oid = finalfn_oid = InvalidOid;
3697  else
3698  peragg->finalfn_oid = finalfn_oid = aggform->aggfinalfn;
3699 
3700  serialfn_oid = InvalidOid;
3701  deserialfn_oid = InvalidOid;
3702 
3703  /*
3704  * Check if serialization/deserialization is required. We only do it
3705  * for aggregates that have transtype INTERNAL.
3706  */
3707  if (aggtranstype == INTERNALOID)
3708  {
3709  /*
3710  * The planner should only have generated a serialize agg node if
3711  * every aggregate with an INTERNAL state has a serialization
3712  * function. Verify that.
3713  */
3714  if (DO_AGGSPLIT_SERIALIZE(aggstate->aggsplit))
3715  {
3716  /* serialization only valid when not running finalfn */
3717  Assert(DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit));
3718 
3719  if (!OidIsValid(aggform->aggserialfn))
3720  elog(ERROR, "serialfunc not provided for serialization aggregation");
3721  serialfn_oid = aggform->aggserialfn;
3722  }
3723 
3724  /* Likewise for deserialization functions */
3725  if (DO_AGGSPLIT_DESERIALIZE(aggstate->aggsplit))
3726  {
3727  /* deserialization only valid when combining states */
3728  Assert(DO_AGGSPLIT_COMBINE(aggstate->aggsplit));
3729 
3730  if (!OidIsValid(aggform->aggdeserialfn))
3731  elog(ERROR, "deserialfunc not provided for deserialization aggregation");
3732  deserialfn_oid = aggform->aggdeserialfn;
3733  }
3734  }
3735 
3736  /* Check that aggregate owner has permission to call component fns */
3737  {
3738  HeapTuple procTuple;
3739 
3740  procTuple = SearchSysCache1(PROCOID,
3741  ObjectIdGetDatum(aggref->aggfnoid));
3742  if (!HeapTupleIsValid(procTuple))
3743  elog(ERROR, "cache lookup failed for function %u",
3744  aggref->aggfnoid);
3745  aggOwner = ((Form_pg_proc) GETSTRUCT(procTuple))->proowner;
3746  ReleaseSysCache(procTuple);
3747 
3748  if (OidIsValid(finalfn_oid))
3749  {
3750  aclresult = object_aclcheck(ProcedureRelationId, finalfn_oid, aggOwner,
3751  ACL_EXECUTE);
3752  if (aclresult != ACLCHECK_OK)
3753  aclcheck_error(aclresult, OBJECT_FUNCTION,
3754  get_func_name(finalfn_oid));
3755  InvokeFunctionExecuteHook(finalfn_oid);
3756  }
3757  if (OidIsValid(serialfn_oid))
3758  {
3759  aclresult = object_aclcheck(ProcedureRelationId, serialfn_oid, aggOwner,
3760  ACL_EXECUTE);
3761  if (aclresult != ACLCHECK_OK)
3762  aclcheck_error(aclresult, OBJECT_FUNCTION,
3763  get_func_name(serialfn_oid));
3764  InvokeFunctionExecuteHook(serialfn_oid);
3765  }
3766  if (OidIsValid(deserialfn_oid))
3767  {
3768  aclresult = object_aclcheck(ProcedureRelationId, deserialfn_oid, aggOwner,
3769  ACL_EXECUTE);
3770  if (aclresult != ACLCHECK_OK)
3771  aclcheck_error(aclresult, OBJECT_FUNCTION,
3772  get_func_name(deserialfn_oid));
3773  InvokeFunctionExecuteHook(deserialfn_oid);
3774  }
3775  }
3776 
3777  /*
3778  * Get actual datatypes of the (nominal) aggregate inputs. These
3779  * could be different from the agg's declared input types, when the
3780  * agg accepts ANY or a polymorphic type.
3781  */
3782  numAggTransFnArgs = get_aggregate_argtypes(aggref,
3783  aggTransFnInputTypes);
3784 
3785  /* Count the "direct" arguments, if any */
3786  numDirectArgs = list_length(aggref->aggdirectargs);
3787 
3788  /* Detect how many arguments to pass to the finalfn */
3789  if (aggform->aggfinalextra)
3790  peragg->numFinalArgs = numAggTransFnArgs + 1;
3791  else
3792  peragg->numFinalArgs = numDirectArgs + 1;
3793 
3794  /* Initialize any direct-argument expressions */
3795  peragg->aggdirectargs = ExecInitExprList(aggref->aggdirectargs,
3796  (PlanState *) aggstate);
3797 
3798  /*
3799  * build expression trees using actual argument & result types for the
3800  * finalfn, if it exists and is required.
3801  */
3802  if (OidIsValid(finalfn_oid))
3803  {
3804  build_aggregate_finalfn_expr(aggTransFnInputTypes,
3805  peragg->numFinalArgs,
3806  aggtranstype,
3807  aggref->aggtype,
3808  aggref->inputcollid,
3809  finalfn_oid,
3810  &finalfnexpr);
3811  fmgr_info(finalfn_oid, &peragg->finalfn);
3812  fmgr_info_set_expr((Node *) finalfnexpr, &peragg->finalfn);
3813  }
3814 
3815  /* get info about the output value's datatype */
3816  get_typlenbyval(aggref->aggtype,
3817  &peragg->resulttypeLen,
3818  &peragg->resulttypeByVal);
3819 
3820  /*
3821  * Build working state for invoking the transition function, if we
3822  * haven't done it already.
3823  */
3824  pertrans = &pertransstates[aggref->aggtransno];
3825  if (pertrans->aggref == NULL)
3826  {
3827  Datum textInitVal;
3828  Datum initValue;
3829  bool initValueIsNull;
3830  Oid transfn_oid;
3831 
3832  /*
3833  * If this aggregation is performing state combines, then instead
3834  * of using the transition function, we'll use the combine
3835  * function.
3836  */
3837  if (DO_AGGSPLIT_COMBINE(aggstate->aggsplit))
3838  {
3839  transfn_oid = aggform->aggcombinefn;
3840 
3841  /* If not set then the planner messed up */
3842  if (!OidIsValid(transfn_oid))
3843  elog(ERROR, "combinefn not set for aggregate function");
3844  }
3845  else
3846  transfn_oid = aggform->aggtransfn;
3847 
3848  aclresult = object_aclcheck(ProcedureRelationId, transfn_oid, aggOwner, ACL_EXECUTE);
3849  if (aclresult != ACLCHECK_OK)
3850  aclcheck_error(aclresult, OBJECT_FUNCTION,
3851  get_func_name(transfn_oid));
3852  InvokeFunctionExecuteHook(transfn_oid);
3853 
3854  /*
3855  * initval is potentially null, so don't try to access it as a
3856  * struct field. Must do it the hard way with SysCacheGetAttr.
3857  */
3858  textInitVal = SysCacheGetAttr(AGGFNOID, aggTuple,
3859  Anum_pg_aggregate_agginitval,
3860  &initValueIsNull);
3861  if (initValueIsNull)
3862  initValue = (Datum) 0;
3863  else
3864  initValue = GetAggInitVal(textInitVal, aggtranstype);
3865 
3866  if (DO_AGGSPLIT_COMBINE(aggstate->aggsplit))
3867  {
3868  Oid combineFnInputTypes[] = {aggtranstype,
3869  aggtranstype};
3870 
3871  /*
3872  * When combining there's only one input, the to-be-combined
3873  * transition value. The transition value is not counted
3874  * here.
3875  */
3876  pertrans->numTransInputs = 1;
3877 
3878  /* aggcombinefn always has two arguments of aggtranstype */
3879  build_pertrans_for_aggref(pertrans, aggstate, estate,
3880  aggref, transfn_oid, aggtranstype,
3881  serialfn_oid, deserialfn_oid,
3882  initValue, initValueIsNull,
3883  combineFnInputTypes, 2);
3884 
3885  /*
3886  * Ensure that a combine function to combine INTERNAL states
3887  * is not strict. This should have been checked during CREATE
3888  * AGGREGATE, but the strict property could have been changed
3889  * since then.
3890  */
3891  if (pertrans->transfn.fn_strict && aggtranstype == INTERNALOID)
3892  ereport(ERROR,
3893  (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
3894  errmsg("combine function with transition type %s must not be declared STRICT",
3895  format_type_be(aggtranstype))));
3896  }
3897  else
3898  {
3899  /* Detect how many arguments to pass to the transfn */
3900  if (AGGKIND_IS_ORDERED_SET(aggref->aggkind))
3901  pertrans->numTransInputs = list_length(aggref->args);
3902  else
3903  pertrans->numTransInputs = numAggTransFnArgs;
3904 
3905  build_pertrans_for_aggref(pertrans, aggstate, estate,
3906  aggref, transfn_oid, aggtranstype,
3907  serialfn_oid, deserialfn_oid,
3908  initValue, initValueIsNull,
3909  aggTransFnInputTypes,
3910  numAggTransFnArgs);
3911 
3912  /*
3913  * If the transfn is strict and the initval is NULL, make sure
3914  * input type and transtype are the same (or at least
3915  * binary-compatible), so that it's OK to use the first
3916  * aggregated input value as the initial transValue. This
3917  * should have been checked at agg definition time, but we
3918  * must check again in case the transfn's strictness property
3919  * has been changed.
3920  */
3921  if (pertrans->transfn.fn_strict && pertrans->initValueIsNull)
3922  {
3923  if (numAggTransFnArgs <= numDirectArgs ||
3924  !IsBinaryCoercible(aggTransFnInputTypes[numDirectArgs],
3925  aggtranstype))
3926  ereport(ERROR,
3927  (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
3928  errmsg("aggregate %u needs to have compatible input type and transition type",
3929  aggref->aggfnoid)));
3930  }
3931  }
3932  }
3933  else
3934  pertrans->aggshared = true;
3935  ReleaseSysCache(aggTuple);
3936  }
3937 
3938  /*
3939  * Update aggstate->numaggs to be the number of unique aggregates found.
3940  * Also set numtrans to the number of unique transition states found.
3941  */
3942  aggstate->numaggs = numaggs;
3943  aggstate->numtrans = numtrans;
3944 
3945  /*
3946  * Last, check whether any more aggregates got added onto the node while
3947  * we processed the expressions for the aggregate arguments (including not
3948  * only the regular arguments and FILTER expressions handled immediately
3949  * above, but any direct arguments we might've handled earlier). If so,
3950  * we have nested aggregate functions, which is semantically nonsensical,
3951  * so complain. (This should have been caught by the parser, so we don't
3952  * need to work hard on a helpful error message; but we defend against it
3953  * here anyway, just to be sure.)
3954  */
3955  if (numaggrefs != list_length(aggstate->aggs))
3956  ereport(ERROR,
3957  (errcode(ERRCODE_GROUPING_ERROR),
3958  errmsg("aggregate function calls cannot be nested")));
3959 
3960  /*
3961  * Build expressions doing all the transition work at once. We build a
3962  * different one for each phase, as the number of transition function
3963  * invocations can differ between phases. Note this'll work both for
3964  * transition and combination functions (although there'll only be one
3965  * phase in the latter case).
3966  */
3967  for (phaseidx = 0; phaseidx < aggstate->numphases; phaseidx++)
3968  {
3969  AggStatePerPhase phase = &aggstate->phases[phaseidx];
3970  bool dohash = false;
3971  bool dosort = false;
3972 
3973  /* phase 0 doesn't necessarily exist */
3974  if (!phase->aggnode)
3975  continue;
3976 
3977  if (aggstate->aggstrategy == AGG_MIXED && phaseidx == 1)
3978  {
3979  /*
3980  * Phase one, and only phase one, in a mixed agg performs both
3981  * sorting and aggregation.
3982  */
3983  dohash = true;
3984  dosort = true;
3985  }
3986  else if (aggstate->aggstrategy == AGG_MIXED && phaseidx == 0)
3987  {
3988  /*
3989  * No need to compute a transition function for an AGG_MIXED phase
3990  * 0 - the contents of the hashtables will have been computed
3991  * during phase 1.
3992  */
3993  continue;
3994  }
3995  else if (phase->aggstrategy == AGG_PLAIN ||
3996  phase->aggstrategy == AGG_SORTED)
3997  {
3998  dohash = false;
3999  dosort = true;
4000  }
4001  else if (phase->aggstrategy == AGG_HASHED)
4002  {
4003  dohash = true;
4004  dosort = false;
4005  }
4006  else
4007  Assert(false);
4008 
4009  phase->evaltrans = ExecBuildAggTrans(aggstate, phase, dosort, dohash,
4010  false);
4011 
4012  /* cache compiled expression for outer slot without NULL check */
4013  phase->evaltrans_cache[0][0] = phase->evaltrans;
4014  }
4015 
4016  return aggstate;
4017 }
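
To make the strictness check near the end of ExecInitAgg concrete: below is a minimal sketch (not part of nodeAgg.c; the function name is hypothetical) of a STRICT transition function for a max()-style aggregate over int4. Declared with a NULL initcond, the executor assigns the first non-NULL input directly as the transition value, which is why the input type and transtype must be binary-compatible, and why a strict transfn body never sees a NULL.

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(my_int4larger);

    /*
     * Hypothetical strict transition function: keep the larger of the current
     * transition value and the new input.  NULL handling is entirely the
     * executor's job; strict functions are simply not called for NULL inputs.
     */
    Datum
    my_int4larger(PG_FUNCTION_ARGS)
    {
        int32   state = PG_GETARG_INT32(0);
        int32   newval = PG_GETARG_INT32(1);

        PG_RETURN_INT32(state > newval ? state : newval);
    }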
4018 
4019 /*
4020  * Build the state needed to calculate a state value for an aggregate.
4021  *
4022  * This initializes all the fields in 'pertrans'. 'aggref' is the aggregate
4023  * to initialize the state for. 'transfn_oid', 'aggtranstype', and the rest
4024  * of the arguments could be calculated from 'aggref', but the caller has
4025  * calculated them already, so might as well pass them.
4026  *
4027  * 'transfn_oid' may be either the Oid of the aggtransfn or the aggcombinefn.
4028  */
4029 static void
4030 build_pertrans_for_aggref(AggStatePerTrans pertrans,
4031  AggState *aggstate, EState *estate,
4032  Aggref *aggref,
4033  Oid transfn_oid, Oid aggtranstype,
4034  Oid aggserialfn, Oid aggdeserialfn,
4035  Datum initValue, bool initValueIsNull,
4036  Oid *inputTypes, int numArguments)
4037 {
4038  int numGroupingSets = Max(aggstate->maxsets, 1);
4039  Expr *transfnexpr;
4040  int numTransArgs;
4041  Expr *serialfnexpr = NULL;
4042  Expr *deserialfnexpr = NULL;
4043  ListCell *lc;
4044  int numInputs;
4045  int numDirectArgs;
4046  List *sortlist;
4047  int numSortCols;
4048  int numDistinctCols;
4049  int i;
4050 
4051  /* Begin filling in the pertrans data */
4052  pertrans->aggref = aggref;
4053  pertrans->aggshared = false;
4054  pertrans->aggCollation = aggref->inputcollid;
4055  pertrans->transfn_oid = transfn_oid;
4056  pertrans->serialfn_oid = aggserialfn;
4057  pertrans->deserialfn_oid = aggdeserialfn;
4058  pertrans->initValue = initValue;
4059  pertrans->initValueIsNull = initValueIsNull;
4060 
4061  /* Count the "direct" arguments, if any */
4062  numDirectArgs = list_length(aggref->aggdirectargs);
4063 
4064  /* Count the number of aggregated input columns */
4065  pertrans->numInputs = numInputs = list_length(aggref->args);
4066 
4067  pertrans->aggtranstype = aggtranstype;
4068 
4069  /* account for the current transition state */
4070  numTransArgs = pertrans->numTransInputs + 1;
4071 
4072  /*
4073  * Set up infrastructure for calling the transfn. Note that invtrans is
4074  * not needed here.
4075  */
4076  build_aggregate_transfn_expr(inputTypes,
4077  numArguments,
4078  numDirectArgs,
4079  aggref->aggvariadic,
4080  aggtranstype,
4081  aggref->inputcollid,
4082  transfn_oid,
4083  InvalidOid,
4084  &transfnexpr,
4085  NULL);
4086 
4087  fmgr_info(transfn_oid, &pertrans->transfn);
4088  fmgr_info_set_expr((Node *) transfnexpr, &pertrans->transfn);
4089 
4090  pertrans->transfn_fcinfo =
4091  (FunctionCallInfo) palloc(SizeForFunctionCallInfo(numTransArgs));
4092  InitFunctionCallInfoData(*pertrans->transfn_fcinfo,
4093  &pertrans->transfn,
4094  numTransArgs,
4095  pertrans->aggCollation,
4096  (void *) aggstate, NULL);
4097 
4098  /* get info about the state value's datatype */
4099  get_typlenbyval(aggtranstype,
4100  &pertrans->transtypeLen,
4101  &pertrans->transtypeByVal);
4102 
4103  if (OidIsValid(aggserialfn))
4104  {
4105  build_aggregate_serialfn_expr(aggserialfn,
4106  &serialfnexpr);
4107  fmgr_info(aggserialfn, &pertrans->serialfn);
4108  fmgr_info_set_expr((Node *) serialfnexpr, &pertrans->serialfn);
4109 
4110  pertrans->serialfn_fcinfo =
4111  (FunctionCallInfo) palloc(SizeForFunctionCallInfo(1));
4112  InitFunctionCallInfoData(*pertrans->serialfn_fcinfo,
4113  &pertrans->serialfn,
4114  1,
4115  InvalidOid,
4116  (void *) aggstate, NULL);
4117  }
4118 
4119  if (OidIsValid(aggdeserialfn))
4120  {
4121  build_aggregate_deserialfn_expr(aggdeserialfn,
4122  &deserialfnexpr);
4123  fmgr_info(aggdeserialfn, &pertrans->deserialfn);
4124  fmgr_info_set_expr((Node *) deserialfnexpr, &pertrans->deserialfn);
4125 
4126  pertrans->deserialfn_fcinfo =
4127  (FunctionCallInfo) palloc(SizeForFunctionCallInfo(2));
4128  InitFunctionCallInfoData(*pertrans->deserialfn_fcinfo,
4129  &pertrans->deserialfn,
4130  2,
4131  InvalidOid,
4132  (void *) aggstate, NULL);
4133  }
4134 
4135  /*
4136  * If we're doing either DISTINCT or ORDER BY for a plain agg, then we
4137  * have a list of SortGroupClause nodes; fish out the data in them and
4138  * stick them into arrays. We ignore ORDER BY for an ordered-set agg,
4139  * however; the agg's transfn and finalfn are responsible for that.
4140  *
4141  * When the planner has set the aggpresorted flag, the input to the
4142  * aggregate is already correctly sorted. For ORDER BY aggregates we can
4143  * simply treat these as normal aggregates. For presorted DISTINCT
4144  * aggregates an extra step must be added to remove duplicate consecutive
4145  * inputs.
4146  *
4147  * Note that by construction, if there is a DISTINCT clause then the ORDER
4148  * BY clause is a prefix of it (see transformDistinctClause).
4149  */
4150  if (AGGKIND_IS_ORDERED_SET(aggref->aggkind))
4151  {
4152  sortlist = NIL;
4153  numSortCols = numDistinctCols = 0;
4154  pertrans->aggsortrequired = false;
4155  }
4156  else if (aggref->aggpresorted && aggref->aggdistinct == NIL)
4157  {
4158  sortlist = NIL;
4159  numSortCols = numDistinctCols = 0;
4160  pertrans->aggsortrequired = false;
4161  }
4162  else if (aggref->aggdistinct)
4163  {
4164  sortlist = aggref->aggdistinct;
4165  numSortCols = numDistinctCols = list_length(sortlist);
4166  Assert(numSortCols >= list_length(aggref->aggorder));
4167  pertrans->aggsortrequired = !aggref->aggpresorted;
4168  }
4169  else
4170  {
4171  sortlist = aggref->aggorder;
4172  numSortCols = list_length(sortlist);
4173  numDistinctCols = 0;
4174  pertrans->aggsortrequired = (numSortCols > 0);
4175  }
4176 
4177  pertrans->numSortCols = numSortCols;
4178  pertrans->numDistinctCols = numDistinctCols;
4179 
4180  /*
4181  * If we have either sorting or filtering to do, create a tupledesc and
4182  * slot corresponding to the aggregated inputs (including sort
4183  * expressions) of the agg.
4184  */
4185  if (numSortCols > 0 || aggref->aggfilter)
4186  {
4187  pertrans->sortdesc = ExecTypeFromTL(aggref->args);
4188  pertrans->sortslot =
4189  ExecInitExtraTupleSlot(estate, pertrans->sortdesc,
4190  &TTSOpsMinimalTuple);
4191  }
4192 
4193  if (numSortCols > 0)
4194  {
4195  /*
4196  * We don't implement DISTINCT or ORDER BY aggs in the HASHED case
4197  * (yet)
4198  */
4199  Assert(aggstate->aggstrategy != AGG_HASHED && aggstate->aggstrategy != AGG_MIXED);
4200 
4201  /* ORDER BY aggregates are not supported with partial aggregation */
4202  Assert(!DO_AGGSPLIT_COMBINE(aggstate->aggsplit));
4203 
4204  /* If we have only one input, we need its len/byval info. */
4205  if (numInputs == 1)
4206  {
4207  get_typlenbyval(inputTypes[numDirectArgs],
4208  &pertrans->inputtypeLen,
4209  &pertrans->inputtypeByVal);
4210  }
4211  else if (numDistinctCols > 0)
4212  {
4213  /* we will need an extra slot to store prior values */
4214  pertrans->uniqslot =
4215  ExecInitExtraTupleSlot(estate, pertrans->sortdesc,
4216  &TTSOpsMinimalTuple);
4217  }
4218 
4219  /* Extract the sort information for use later */
4220  pertrans->sortColIdx =
4221  (AttrNumber *) palloc(numSortCols * sizeof(AttrNumber));
4222  pertrans->sortOperators =
4223  (Oid *) palloc(numSortCols * sizeof(Oid));
4224  pertrans->sortCollations =
4225  (Oid *) palloc(numSortCols * sizeof(Oid));
4226  pertrans->sortNullsFirst =
4227  (bool *) palloc(numSortCols * sizeof(bool));
4228 
4229  i = 0;
4230  foreach(lc, sortlist)
4231  {
4232  SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
4233  TargetEntry *tle = get_sortgroupclause_tle(sortcl, aggref->args);
4234 
4235  /* the parser should have made sure of this */
4236  Assert(OidIsValid(sortcl->sortop));
4237 
4238  pertrans->sortColIdx[i] = tle->resno;
4239  pertrans->sortOperators[i] = sortcl->sortop;
4240  pertrans->sortCollations[i] = exprCollation((Node *) tle->expr);
4241  pertrans->sortNullsFirst[i] = sortcl->nulls_first;
4242  i++;
4243  }
4244  Assert(i == numSortCols);
4245  }
4246 
4247  if (aggref->aggdistinct)
4248  {
4249  Oid *ops;
4250 
4251  Assert(numArguments > 0);
4252  Assert(list_length(aggref->aggdistinct) == numDistinctCols);
4253 
4254  ops = palloc(numDistinctCols * sizeof(Oid));
4255 
4256  i = 0;
4257  foreach(lc, aggref->aggdistinct)
4258  ops[i++] = ((SortGroupClause *) lfirst(lc))->eqop;
4259 
4260  /* lookup / build the necessary comparators */
4261  if (numDistinctCols == 1)
4262  fmgr_info(get_opcode(ops[0]), &pertrans->equalfnOne);
4263  else
4264  pertrans->equalfnMulti =
4265  execTuplesMatchPrepare(pertrans->sortdesc,
4266  numDistinctCols,
4267  pertrans->sortColIdx,
4268  ops,
4269  pertrans->sortCollations,
4270  &aggstate->ss.ps);
4271  pfree(ops);
4272  }
4273 
4274  pertrans->sortstates = (Tuplesortstate **)
4275  palloc0(sizeof(Tuplesortstate *) * numGroupingSets);
4276 }
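
The equalfnOne / equalfnMulti comparators prepared above are applied later, while the sorted input is drained; as a rough sketch of the single-column DISTINCT case (the variable names here are illustrative, not the executor's own):

    /* skip an input value that compares equal to the previous one */
    bool    keep;

    keep = !haveOldVal ||
        !DatumGetBool(FunctionCall2Coll(&pertrans->equalfnOne,
                                        pertrans->aggCollation,
                                        oldVal, newVal));
    if (keep)
    {
        /* feed newVal to the transition function, and remember it */
        oldVal = newVal;
        haveOldVal = true;
    }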
4277 
4278 
4279 static Datum
4280 GetAggInitVal(Datum textInitVal, Oid transtype)
4281 {
4282  Oid typinput,
4283  typioparam;
4284  char *strInitVal;
4285  Datum initVal;
4286 
4287  getTypeInputInfo(transtype, &typinput, &typioparam);
4288  strInitVal = TextDatumGetCString(textInitVal);
4289  initVal = OidInputFunctionCall(typinput, strInitVal,
4290  typioparam, -1);
4291  pfree(strInitVal);
4292  return initVal;
4293 }
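
For illustration, this is the round trip GetAggInitVal performs for a common case; a sketch assuming an int8 transition type whose agginitval string is "0":

    Oid     typinput;
    Oid     typioparam;
    Datum   initVal;

    /* look up int8's input function and call it on the initcond string */
    getTypeInputInfo(INT8OID, &typinput, &typioparam);
    initVal = OidInputFunctionCall(typinput, "0", typioparam, -1);
    /* initVal now holds the int8 Datum 0 */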
4294 
4295 void
4296 ExecEndAgg(AggState *node)
4297 {
4298  PlanState *outerPlan;
4299  int transno;
4300  int numGroupingSets = Max(node->maxsets, 1);
4301  int setno;
4302 
4303  /*
4304  * When ending a parallel worker, copy the statistics gathered by the
4305  * worker back into shared memory so that it can be picked up by the main
4306  * process to report in EXPLAIN ANALYZE.
4307  */
4308  if (node->shared_info && IsParallelWorker())
4309  {
4310  AggregateInstrumentation *si;
4311 
4312  Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
4313  si = &node->shared_info->sinstrument[ParallelWorkerNumber];
4314  si->hash_batches_used = node->hash_batches_used;
4315  si->hash_disk_used = node->hash_disk_used;
4316  si->hash_mem_peak = node->hash_mem_peak;
4317  }
4318 
4319  /* Make sure we have closed any open tuplesorts */
4320 
4321  if (node->sort_in)
4322  tuplesort_end(node->sort_in);
4323  if (node->sort_out)
4324  tuplesort_end(node->sort_out);
4325 
4326  hashagg_reset_spill_state(node);
4327 
4328  if (node->hash_metacxt != NULL)
4329  {
4330  MemoryContextDelete(node->hash_metacxt);
4331  node->hash_metacxt = NULL;
4332  }
4333 
4334  for (transno = 0; transno < node->numtrans; transno++)
4335  {
4336  AggStatePerTrans pertrans = &node->pertrans[transno];
4337 
4338  for (setno = 0; setno < numGroupingSets; setno++)
4339  {
4340  if (pertrans->sortstates[setno])
4341  tuplesort_end(pertrans->sortstates[setno]);
4342  }
4343  }
4344 
4345  /* And ensure any agg shutdown callbacks have been called */
4346  for (setno = 0; setno < numGroupingSets; setno++)
4347  ReScanExprContext(node->aggcontexts[setno]);
4348  if (node->hashcontext)
4349  ReScanExprContext(node->hashcontext);
4350 
4351  /*
4352  * We don't actually free any ExprContexts here (see comment in
4353  * ExecFreeExprContext), just unlinking the output one from the plan node
4354  * suffices.
4355  */
4356  ExecFreeExprContext(&node->ss.ps);
4357 
4358  /* clean up tuple table */
4359  ExecClearTuple(node->ss.ss_ScanTupleSlot);
4360 
4361  outerPlan = outerPlanState(node);
4362  ExecEndNode(outerPlan);
4363 }
4364 
4365 void
4366 ExecReScanAgg(AggState *node)
4367 {
4368  ExprContext *econtext = node->ss.ps.ps_ExprContext;
4369  PlanState *outerPlan = outerPlanState(node);
4370  Agg *aggnode = (Agg *) node->ss.ps.plan;
4371  int transno;
4372  int numGroupingSets = Max(node->maxsets, 1);
4373  int setno;
4374 
4375  node->agg_done = false;
4376 
4377  if (node->aggstrategy == AGG_HASHED)
4378  {
4379  /*
4380  * In the hashed case, if we haven't yet built the hash table then we
4381  * can just return; nothing done yet, so nothing to undo. If subnode's
4382  * chgParam is not NULL then it will be re-scanned by ExecProcNode,
4383  * else no reason to re-scan it at all.
4384  */
4385  if (!node->table_filled)
4386  return;
4387 
4388  /*
4389  * If we do have the hash table, and it never spilled, and the subplan
4390  * does not have any parameter changes, and none of our own parameter
4391  * changes affect input expressions of the aggregated functions, then
4392  * we can just rescan the existing hash table; no need to build it
4393  * again.
4394  */
4395  if (outerPlan->chgParam == NULL && !node->hash_ever_spilled &&
4396  !bms_overlap(node->ss.ps.chgParam, aggnode->aggParams))
4397  {
4398  ResetTupleHashIterator(node->perhash[0].hashtable,
4399  &node->perhash[0].hashiter);
4400  select_current_set(node, 0, true);
4401  return;
4402  }
4403  }
4404 
4405  /* Make sure we have closed any open tuplesorts */
4406  for (transno = 0; transno < node->numtrans; transno++)
4407  {
4408  for (setno = 0; setno < numGroupingSets; setno++)
4409  {
4410  AggStatePerTrans pertrans = &node->pertrans[transno];
4411 
4412  if (pertrans->sortstates[setno])
4413  {
4414  tuplesort_end(pertrans->sortstates[setno]);
4415  pertrans->sortstates[setno] = NULL;
4416  }
4417  }
4418  }
4419 
4420  /*
4421  * We don't need to ReScanExprContext the output tuple context here;
4422  * ExecReScan already did it. But we do need to reset our per-grouping-set
4423  * contexts, which may have transvalues stored in them. (We use rescan
4424  * rather than just reset because transfns may have registered callbacks
4425  * that need to be run now.) For the AGG_HASHED case, see below.
4426  */
4427 
4428  for (setno = 0; setno < numGroupingSets; setno++)
4429  {
4430  ReScanExprContext(node->aggcontexts[setno]);
4431  }
4432 
4433  /* Release first tuple of group, if we have made a copy */
4434  if (node->grp_firstTuple != NULL)
4435  {
4436  heap_freetuple(node->grp_firstTuple);
4437  node->grp_firstTuple = NULL;
4438  }
4439  ExecClearTuple(node->ss.ss_ScanTupleSlot);
4440 
4441  /* Forget current agg values */
4442  MemSet(econtext->ecxt_aggvalues, 0, sizeof(Datum) * node->numaggs);
4443  MemSet(econtext->ecxt_aggnulls, 0, sizeof(bool) * node->numaggs);
4444 
4445  /*
4446  * With AGG_HASHED/MIXED, the hash table is allocated in a sub-context of
4447  * the hashcontext. This used to be an issue, but now, resetting a context
4448  * automatically deletes sub-contexts too.
4449  */
4450  if (node->aggstrategy == AGG_HASHED || node->aggstrategy == AGG_MIXED)
4451  {
4452  ReScanExprContext(node->hashcontext);
4453 
4454  node->hash_ever_spilled = false;
4455  node->hash_spill_mode = false;
4456  node->hash_ngroups_current = 0;
4457 
4458  hashagg_reset_spill_state(node);
4459  /* Rebuild an empty hash table */
4460  build_hash_tables(node);
4461  node->table_filled = false;
4462  /* iterator will be reset when the table is filled */
4463 
4464  hashagg_recompile_expressions(node, false, false);
4465  }
4466 
4467  if (node->aggstrategy != AGG_HASHED)
4468  {
4469  /*
4470  * Reset the per-group state (in particular, mark transvalues null)
4471  */
4472  for (setno = 0; setno < numGroupingSets; setno++)
4473  {
4474  MemSet(node->pergroups[setno], 0,
4475  sizeof(AggStatePerGroupData) * node->numaggs);
4476  }
4477 
4478  /* reset to phase 1 */
4479  initialize_phase(node, 1);
4480 
4481  node->input_done = false;
4482  node->projected_set = -1;
4483  }
4484 
4485  if (outerPlan->chgParam == NULL)
4486  ExecReScan(outerPlan);
4487 }
4488 
4489 
4490 /***********************************************************************
4491  * API exposed to aggregate functions
4492  ***********************************************************************/
4493 
4494 
4495 /*
4496  * AggCheckCallContext - test if a SQL function is being called as an aggregate
4497  *
4498  * The transition and/or final functions of an aggregate may want to verify
4499  * that they are being called as aggregates, rather than as plain SQL
4500  * functions. They should use this function to do so. The return value
4501  * is nonzero if being called as an aggregate, or zero if not. (Specific
4502  * nonzero values are AGG_CONTEXT_AGGREGATE or AGG_CONTEXT_WINDOW, but more
4503  * values could conceivably appear in future.)
4504  *
4505  * If aggcontext isn't NULL, the function also stores at *aggcontext the
4506  * identity of the memory context that aggregate transition values are being
4507  * stored in. Note that the same aggregate call site (flinfo) may be called
4508  * interleaved on different transition values in different contexts, so it's
4509  * not kosher to cache aggcontext under fn_extra. It is, however, kosher to
4510  * cache it in the transvalue itself (for internal-type transvalues).
4511  */
4512 int
4513 AggCheckCallContext(FunctionCallInfo fcinfo, MemoryContext *aggcontext)
4514 {
4515  if (fcinfo->context && IsA(fcinfo->context, AggState))
4516  {
4517  if (aggcontext)
4518  {
4519  AggState *aggstate = ((AggState *) fcinfo->context);
4520  ExprContext *cxt = aggstate->curaggcontext;
4521 
4522  *aggcontext = cxt->ecxt_per_tuple_memory;
4523  }
4524  return AGG_CONTEXT_AGGREGATE;
4525  }
4526  if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
4527  {
4528  if (aggcontext)
4529  *aggcontext = ((WindowAggState *) fcinfo->context)->curaggcontext;
4530  return AGG_CONTEXT_WINDOW;
4531  }
4532 
4533  /* this is just to prevent "uninitialized variable" warnings */
4534  if (aggcontext)
4535  *aggcontext = NULL;
4536  return 0;
4537 }
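
A typical use of this API, sketched as a hypothetical extension function (my_concat_transfn and its use of a StringInfo state are illustrative, not a real built-in): the transition state is allocated in the returned aggcontext so that it survives from row to row.

    #include "postgres.h"
    #include "fmgr.h"
    #include "lib/stringinfo.h"
    #include "utils/builtins.h"

    PG_FUNCTION_INFO_V1(my_concat_transfn);

    Datum
    my_concat_transfn(PG_FUNCTION_ARGS)
    {
        MemoryContext aggcontext;
        StringInfo    state;

        if (!AggCheckCallContext(fcinfo, &aggcontext))
            elog(ERROR, "my_concat_transfn called in non-aggregate context");

        if (PG_ARGISNULL(0))
        {
            /* first row of the group: build state in the aggregate context */
            MemoryContext oldcxt = MemoryContextSwitchTo(aggcontext);

            state = makeStringInfo();
            MemoryContextSwitchTo(oldcxt);
        }
        else
            state = (StringInfo) PG_GETARG_POINTER(0);

        if (!PG_ARGISNULL(1))
            appendStringInfoString(state, text_to_cstring(PG_GETARG_TEXT_PP(1)));

        PG_RETURN_POINTER(state);
    }

Such a transfn would be paired, via CREATE AGGREGATE with stype = internal, with a final function that converts the StringInfo to text.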
4538 
4539 /*
4540  * AggGetAggref - allow an aggregate support function to get its Aggref
4541  *
4542  * If the function is being called as an aggregate support function,
4543  * return the Aggref node for the aggregate call. Otherwise, return NULL.
4544  *
4545  * Aggregates sharing the same inputs and transition functions can get
4546  * merged into a single transition calculation. If the transition function
4547  * calls AggGetAggref, it will get one of the Aggrefs for which it is
4548  * executing. It must therefore not pay attention to the Aggref fields that
4549  * relate to the final function, as those are indeterminate. But if a final
4550  * function calls AggGetAggref, it will get a precise result.
4551  *
4552  * Note that if an aggregate is being used as a window function, this will
4553  * return NULL. We could provide a similar function to return the relevant
4554  * WindowFunc node in such cases, but it's not needed yet.
4555  */
4556 Aggref *
4557 AggGetAggref(FunctionCallInfo fcinfo)
4558 {
4559  if (fcinfo->context && IsA(fcinfo->context, AggState))
4560  {
4561  AggState *aggstate = (AggState *) fcinfo->context;
4562  AggStatePerAgg curperagg;
4563  AggStatePerTrans curpertrans;
4564 
4565  /* check curperagg (valid when in a final function) */
4566  curperagg = aggstate->curperagg;
4567 
4568  if (curperagg)
4569  return curperagg->aggref;
4570 
4571  /* check curpertrans (valid when in a transition function) */
4572  curpertrans = aggstate->curpertrans;
4573 
4574  if (curpertrans)
4575  return curpertrans->aggref;
4576  }
4577  return NULL;
4578 }
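
For example, an aggregate support function can inspect its Aggref to learn about the call site; a small sketch (the error message wording is illustrative):

    Aggref     *aggref = AggGetAggref(fcinfo);
    int         nargs;

    if (aggref == NULL)
        elog(ERROR, "function was not called as an aggregate");

    /* number of aggregated (non-direct) arguments at this call site */
    nargs = list_length(aggref->args);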
4579 
4580 /*
4581  * AggGetTempMemoryContext - fetch short-term memory context for aggregates
4582  *
4583  * This is useful in agg final functions; the context returned is one that
4584  * the final function can safely reset as desired. This isn't useful for
4585  * transition functions, since the context returned MAY (we don't promise)
4586  * be the same as the context those are called in.
4587  *
4588  * As above, this is currently not useful for aggs called as window functions.
4589  */
4590 MemoryContext
4591 AggGetTempMemoryContext(FunctionCallInfo fcinfo)
4592 {
4593  if (fcinfo->context && IsA(fcinfo->context, AggState))
4594  {
4595  AggState *aggstate = (AggState *) fcinfo->context;
4596 
4597  return aggstate->tmpcontext->ecxt_per_tuple_memory;
4598  }
4599  return NULL;
4600 }
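
A final function might use the returned context for scratch data it wants to throw away wholesale; a minimal sketch:

    MemoryContext tmpcxt = AggGetTempMemoryContext(fcinfo);

    if (tmpcxt != NULL)
    {
        MemoryContext oldcxt = MemoryContextSwitchTo(tmpcxt);

        /* ... build transient structures not needed after this call ... */

        MemoryContextSwitchTo(oldcxt);
        MemoryContextReset(tmpcxt);
    }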
4601 
4602 /*
4603  * AggStateIsShared - find out whether transition state is shared
4604  *
4605  * If the function is being called as an aggregate support function,
4606  * return true if the aggregate's transition state is shared across
4607  * multiple aggregates, false if it is not.
4608  *
4609  * Returns true if not called as an aggregate support function.
4610  * This is intended as a conservative answer, ie "no you'd better not
4611  * scribble on your input". In particular, will return true if the
4612  * aggregate is being used as a window function, which is a scenario
4613  * in which changing the transition state is a bad idea. We might
4614  * want to refine the behavior for the window case in future.
4615  */
4616 bool
4617 AggStateIsShared(FunctionCallInfo fcinfo)
4618 {
4619  if (fcinfo->context && IsA(fcinfo->context, AggState))
4620  {
4621  AggState *aggstate = (AggState *) fcinfo->context;
4622  AggStatePerAgg curperagg;
4623  AggStatePerTrans curpertrans;
4624 
4625  /* check curperagg (valid when in a final function) */
4626  curperagg = aggstate->curperagg;
4627 
4628  if (curperagg)
4629  return aggstate->pertrans[curperagg->transno].aggshared;
4630 
4631  /* check curpertrans (valid when in a transition function) */
4632  curpertrans = aggstate->curpertrans;
4633 
4634  if (curpertrans)
4635  return curpertrans->aggshared;
4636  }
4637  return true;
4638 }
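
A final function can consult this before destructively reusing its transition state; a sketch with a hypothetical state type and copy helper (MyState and copy_my_state are not real APIs):

    MyState    *working;

    if (AggStateIsShared(fcinfo))
        working = copy_my_state(state);   /* other aggregates still read it */
    else
        working = state;                  /* sole owner: modify in place */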
4639 
4640 /*
4641  * AggRegisterCallback - register a cleanup callback for an aggregate
4642  *
4643  * This is useful for aggs to register shutdown callbacks, which will ensure
4644  * that non-memory resources are freed. The callback will occur just before
4645  * the associated aggcontext (as returned by AggCheckCallContext) is reset,
4646  * either between groups or as a result of rescanning the query. The callback
4647  * will NOT be called on error paths. The typical use-case is for freeing of
4648  * tuplestores or tuplesorts maintained in aggcontext, or pins held by slots
4649  * created by the agg functions. (The callback will not be called until after
4650  * the result of the finalfn is no longer needed, so it's safe for the finalfn
4651  * to return data that will be freed by the callback.)
4652  *
4653  * As above, this is currently not useful for aggs called as window functions.
4654  */
4655 void
4656 AggRegisterCallback(FunctionCallInfo fcinfo,
4657  ExprContextCallbackFunction func,
4658  Datum arg)
4659 {
4660  if (fcinfo->context && IsA(fcinfo->context, AggState))
4661  {
4662  AggState *aggstate = (AggState *) fcinfo->context;
4663  ExprContext *cxt = aggstate->curaggcontext;
4664 
4665  RegisterExprContextCallback(cxt, func, arg);
4666 
4667  return;
4668  }
4669  elog(ERROR, "aggregate function cannot register a callback in this context");
4670 }
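
The canonical use is ending a tuplesort when the group's aggcontext is reset, much as the ordered-set aggregates in orderedsetaggs.c do; a sketch with a hypothetical state struct MyAggState:

    /* shutdown callback: release the tuplesort's memory and temp files */
    static void
    my_agg_shutdown(Datum arg)
    {
        MyAggState *st = (MyAggState *) DatumGetPointer(arg);

        if (st->sortstate)
            tuplesort_end(st->sortstate);
        st->sortstate = NULL;
    }

    /* in the transition function, right after creating st->sortstate: */
    AggRegisterCallback(fcinfo, my_agg_shutdown, PointerGetDatum(st));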
4671 
4672 
4673 /* ----------------------------------------------------------------
4674  * Parallel Query Support
4675  * ----------------------------------------------------------------
4676  */
4677 
4678  /* ----------------------------------------------------------------
4679  * ExecAggEstimate
4680  *
4681  * Estimate space required to propagate aggregate statistics.
4682  * ----------------------------------------------------------------
4683  */
4684 void
4685 ExecAggEstimate(AggState *node, ParallelContext *pcxt)
4686 {
4687  Size size;
4688 
4689  /* don't need this if not instrumenting or no workers */
4690  if (!node->ss.ps.instrument || pcxt->nworkers == 0)
4691  return;
4692 
4693  size = mul_size(pcxt->nworkers, sizeof(AggregateInstrumentation));
4694  size = add_size(size, offsetof(SharedAggInfo, sinstrument));
4695  shm_toc_estimate_chunk(&pcxt->estimator, size);
4696  shm_toc_estimate_keys(&pcxt->estimator, 1);
4697 }
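
The sizing arithmetic here deliberately goes through the overflow-checked Size helpers rather than bare operators; the same idiom for a hypothetical flexible-array struct (DemoSharedInfo is illustrative only):

    #include "postgres.h"
    #include "storage/shmem.h"      /* add_size(), mul_size() */

    typedef struct DemoSharedInfo   /* hypothetical */
    {
        int     num_workers;
        int     counters[FLEXIBLE_ARRAY_MEMBER];
    } DemoSharedInfo;

    /* header plus one slot per worker; ereport()s on Size overflow */
    static Size
    demo_shared_size(int nworkers)
    {
        return add_size(offsetof(DemoSharedInfo, counters),
                        mul_size(nworkers, sizeof(int)));
    }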
4698 
4699 /* ----------------------------------------------------------------
4700  * ExecAggInitializeDSM
4701  *
4702  * Initialize DSM space for aggregate statistics.
4703  * ----------------------------------------------------------------
4704  */
4705 void
4706 ExecAggInitializeDSM(AggState *node, ParallelContext *pcxt)
4707 {
4708  Size size;
4709 
4710  /* don't need this if not instrumenting or no workers */
4711  if (!node->ss.ps.instrument || pcxt->nworkers == 0)
4712  return;
4713 
4714  size = offsetof(SharedAggInfo, sinstrument)
4715  + pcxt->nworkers * sizeof(AggregateInstrumentation);
4716  node->shared_info = shm_toc_allocate(pcxt->toc, size);
4717  /* ensure any unfilled slots will contain zeroes */
4718  memset(node->shared_info, 0, size);
4719  node->shared_info->num_workers = pcxt->nworkers;
4720  shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
4721  node->shared_info);
4722 }
4723 
4724 /* ----------------------------------------------------------------
4725  * ExecAggInitializeWorker
4726  *
4727  * Attach worker to DSM space for aggregate statistics.
4728  * ----------------------------------------------------------------
4729  */
4730 void
4731 ExecAggInitializeWorker(AggState *node, ParallelWorkerContext *pwcxt)
4732 {
4733  node->shared_info =
4734  shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
4735 }
4736 
4737 /* ----------------------------------------------------------------
4738  * ExecAggRetrieveInstrumentation
4739  *
4740  * Transfer aggregate statistics from DSM to private memory.
4741  * ----------------------------------------------------------------
4742  */
4743 void
4744 ExecAggRetrieveInstrumentation(AggState *node)
4745 {
4746  Size size;
4747  SharedAggInfo *si;
4748 
4749  if (node->shared_info == NULL)
4750  return;
4751 
4752  size = offsetof(SharedAggInfo, sinstrument)
4753  + node->shared_info->num_workers * sizeof(AggregateInstrumentation);
4754  si = palloc(size);
4755  memcpy(si, node->shared_info, size);
4756  node->shared_info = si;
4757 }
AclResult
Definition: acl.h:182
@ ACLCHECK_OK
Definition: acl.h:183
void aclcheck_error(AclResult aclerr, ObjectType objtype, const char *objectname)
Definition: aclchk.c:2679
AclResult object_aclcheck(Oid classid, Oid objectid, Oid roleid, AclMode mode)
Definition: aclchk.c:3783
int16 AttrNumber
Definition: attnum.h:21
int ParallelWorkerNumber
Definition: parallel.c:113
int bms_next_member(const Bitmapset *a, int prevbit)
Definition: bitmapset.c:1047
void bms_free(Bitmapset *a)
Definition: bitmapset.c:209
int bms_num_members(const Bitmapset *a)
Definition: bitmapset.c:649
bool bms_is_member(int x, const Bitmapset *a)
Definition: bitmapset.c:428
Bitmapset * bms_add_member(Bitmapset *a, int x)
Definition: bitmapset.c:739
Bitmapset * bms_union(const Bitmapset *a, const Bitmapset *b)
Definition: bitmapset.c:226
Bitmapset * bms_add_members(Bitmapset *a, const Bitmapset *b)
Definition: bitmapset.c:796
Bitmapset * bms_del_member(Bitmapset *a, int x)
Definition: bitmapset.c:776
bool bms_overlap(const Bitmapset *a, const Bitmapset *b)
Definition: bitmapset.c:495
Bitmapset * bms_copy(const Bitmapset *a)
Definition: bitmapset.c:74
int bms_first_member(Bitmapset *a)
Definition: bitmapset.c:1000
#define TextDatumGetCString(d)
Definition: builtins.h:95
unsigned int uint32
Definition: c.h:490
#define MAXALIGN(LEN)
Definition: c.h:795
#define Max(x, y)
Definition: c.h:982
#define MemSet(start, val, len)
Definition: c.h:1004
#define OidIsValid(objectId)
Definition: c.h:759
size_t Size
Definition: c.h:589
Datum datumCopy(Datum value, bool typByVal, int typLen)
Definition: datum.c:132
int my_log2(long num)
Definition: dynahash.c:1760
int errcode_for_file_access(void)
Definition: elog.c:881
int errcode(int sqlerrcode)
Definition: elog.c:858
int errmsg(const char *fmt,...)
Definition: elog.c:1069
#define ERROR
Definition: elog.h:39
#define ereport(elevel,...)
Definition: elog.h:149
void ExecReScan(PlanState *node)
Definition: execAmi.c:78
Datum ExecAggTransReparent(AggState *aggstate, AggStatePerTrans pertrans, Datum newValue, bool newValueIsNull, Datum oldValue, bool oldValueIsNull)
List * ExecInitExprList(List *nodes, PlanState *parent)
Definition: execExpr.c:319
ExprState * ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase, bool doSort, bool doHash, bool nullcheck)
Definition: execExpr.c:3263
ExprState * ExecInitQual(List *qual, PlanState *parent)
Definition: execExpr.c:210
void execTuplesHashPrepare(int numCols, const Oid *eqOperators, Oid **eqFuncOids, FmgrInfo **hashFunctions)
Definition: execGrouping.c:96
TupleHashEntry LookupTupleHashEntryHash(TupleHashTable hashtable, TupleTableSlot *slot, bool *isnew, uint32 hash)
Definition: execGrouping.c:361
TupleHashEntry LookupTupleHashEntry(TupleHashTable hashtable, TupleTableSlot *slot, bool *isnew, uint32 *hash)
Definition: execGrouping.c:306
TupleHashTable BuildTupleHashTableExt(PlanState *parent, TupleDesc inputDesc, int numCols, AttrNumber *keyColIdx, const Oid *eqfuncoids, FmgrInfo *hashfunctions, Oid *collations, long nbuckets, Size additionalsize, MemoryContext metacxt, MemoryContext tablecxt, MemoryContext tempcxt, bool use_variable_hash_iv)
Definition: execGrouping.c:154
void ResetTupleHashTable(TupleHashTable hashtable)
Definition: execGrouping.c:285
ExprState * execTuplesMatchPrepare(TupleDesc desc, int numCols, const AttrNumber *keyColIdx, const Oid *eqOperators, const Oid *collations, PlanState *parent)
Definition: execGrouping.c:59
void ExecEndNode(PlanState *node)
Definition: execProcnode.c:557
PlanState * ExecInitNode(Plan *node, EState *estate, int eflags)
Definition: execProcnode.c:142
const TupleTableSlotOps TTSOpsVirtual
Definition: execTuples.c:83
TupleTableSlot * ExecStoreVirtualTuple(TupleTableSlot *slot)
Definition: execTuples.c:1552
MinimalTuple ExecFetchSlotMinimalTuple(TupleTableSlot *slot, bool *shouldFree)
Definition: execTuples.c:1692
TupleTableSlot * ExecStoreAllNullTuple(TupleTableSlot *slot)
Definition: execTuples.c:1576
TupleTableSlot * ExecStoreMinimalTuple(MinimalTuple mtup, TupleTableSlot *slot, bool shouldFree)
Definition: execTuples.c:1446
TupleTableSlot * ExecInitExtraTupleSlot(EState *estate, TupleDesc tupledesc, const TupleTableSlotOps *tts_ops)
Definition: execTuples.c:1831
void ExecInitResultTupleSlotTL(PlanState *planstate, const TupleTableSlotOps *tts_ops)
Definition: execTuples.c:1799
const TupleTableSlotOps TTSOpsMinimalTuple
Definition: execTuples.c:85
TupleDesc ExecTypeFromTL(List *targetList)
Definition: execTuples.c:1938
TupleTableSlot * ExecAllocTableSlot(List **tupleTable, TupleDesc desc, const TupleTableSlotOps *tts_ops)
Definition: execTuples.c:1171
void ExecForceStoreHeapTuple(HeapTuple tuple, TupleTableSlot *slot, bool shouldFree)
Definition: execTuples.c:1469
TupleDesc ExecGetResultType(PlanState *planstate)
Definition: execUtils.c:497
void ReScanExprContext(ExprContext *econtext)
Definition: execUtils.c:445
ExprContext * CreateWorkExprContext(EState *estate)
Definition: execUtils.c:323
const TupleTableSlotOps * ExecGetResultSlotOps(PlanState *planstate, bool *isfixed)
Definition: execUtils.c:506
void ExecCreateScanSlotFromOuterPlan(EState *estate, ScanState *scanstate, const TupleTableSlotOps *tts_ops)
Definition: execUtils.c:689
void ExecAssignExprContext(EState *estate, PlanState *planstate)
Definition: execUtils.c:487
void ExecAssignProjectionInfo(PlanState *planstate, TupleDesc inputDesc)
Definition: execUtils.c:542
void RegisterExprContextCallback(ExprContext *econtext, ExprContextCallbackFunction function, Datum arg)
Definition: execUtils.c:932
void ExecFreeExprContext(PlanState *planstate)
Definition: execUtils.c:657
void(* ExprContextCallbackFunction)(Datum arg)
Definition: execnodes.h:209
#define InstrCountFiltered1(node, delta)
Definition: execnodes.h:1135
#define outerPlanState(node)
Definition: execnodes.h:1127
#define ScanTupleHashTable(htable, iter)
Definition: execnodes.h:834
#define ResetTupleHashIterator(htable, iter)
Definition: execnodes.h:832
struct AggStatePerGroupData * AggStatePerGroup
Definition: execnodes.h:2354
struct AggStatePerTransData * AggStatePerTrans
Definition: execnodes.h:2353
struct TupleHashEntryData TupleHashEntryData
struct AggregateInstrumentation AggregateInstrumentation
struct AggStatePerAggData * AggStatePerAgg
Definition: execnodes.h:2352
#define EXEC_FLAG_BACKWARD
Definition: executor.h:58
#define EXEC_FLAG_REWIND
Definition: executor.h:57
static TupleTableSlot * ExecProject(ProjectionInfo *projInfo)
Definition: executor.h:364
#define ResetExprContext(econtext)
Definition: executor.h:532
static bool ExecQual(ExprState *state, ExprContext *econtext)
Definition: executor.h:401
static bool ExecQualAndReset(ExprState *state, ExprContext *econtext)
Definition: executor.h:428
static Datum ExecEvalExpr(ExprState *state, ExprContext *econtext, bool *isNull)
Definition: executor.h:321
static Datum ExecEvalExprSwitchContext(ExprState *state, ExprContext *econtext, bool *isNull)
Definition: executor.h:336
#define EXEC_FLAG_EXPLAIN_ONLY
Definition: executor.h:56
#define EXEC_FLAG_MARK
Definition: executor.h:59
static TupleTableSlot * ExecProcNode(PlanState *node)
Definition: executor.h:257
#define MakeExpandedObjectReadOnly(d, isnull, typlen)
Datum FunctionCall2Coll(FmgrInfo *flinfo, Oid collation, Datum arg1, Datum arg2)
Definition: fmgr.c:1136
void fmgr_info(Oid functionId, FmgrInfo *finfo)
Definition: fmgr.c:127
Datum OidInputFunctionCall(Oid functionId, char *str, Oid typioparam, int32 typmod)
Definition: fmgr.c:1741
#define SizeForFunctionCallInfo(nargs)
Definition: fmgr.h:102
#define InitFunctionCallInfoData(Fcinfo, Flinfo, Nargs, Collation, Context, Resultinfo)
Definition: fmgr.h:150
#define AGG_CONTEXT_WINDOW
Definition: fmgr.h:762
#define LOCAL_FCINFO(name, nargs)
Definition: fmgr.h:110
#define AGG_CONTEXT_AGGREGATE
Definition: fmgr.h:761
struct FunctionCallInfoBaseData * FunctionCallInfo
Definition: fmgr.h:38
#define FunctionCallInvoke(fcinfo)
Definition: fmgr.h:172
#define fmgr_info_set_expr(expr, finfo)
Definition: fmgr.h:135
char * format_type_be(Oid type_oid)
Definition: format_type.c:339
int work_mem
Definition: globals.c:125
uint32 hash_bytes_uint32(uint32 k)
Definition: hashfn.c:610
void heap_freetuple(HeapTuple htup)
Definition: heaptuple.c:1338
MinimalTupleData * MinimalTuple
Definition: htup.h:27
#define HeapTupleIsValid(tuple)
Definition: htup.h:78
#define SizeofMinimalTupleHeader
Definition: htup_details.h:647
#define GETSTRUCT(TUP)
Definition: htup_details.h:653
void initHyperLogLog(hyperLogLogState *cState, uint8 bwidth)
Definition: hyperloglog.c:66
double estimateHyperLogLog(hyperLogLogState *cState)
Definition: hyperloglog.c:186
void addHyperLogLog(hyperLogLogState *cState, uint32 hash)
Definition: hyperloglog.c:167
void freeHyperLogLog(hyperLogLogState *cState)
Definition: hyperloglog.c:151
#define IsParallelWorker()
Definition: parallel.h:61
static int initValue(long lng_val)
Definition: informix.c:677
int j
Definition: isn.c:74
int i
Definition: isn.c:73
if(TABLE==NULL||TABLE_index==NULL)
Definition: isn.c:77
Assert(fmt[strlen(fmt) - 1] !='\n')
List * lcons_int(int datum, List *list)
Definition: list.c:512
List * lappend(List *list, void *datum)
Definition: list.c:338
void list_free(List *list)
Definition: list.c:1545
void list_free_deep(List *list)
Definition: list.c:1559
List * list_delete_last(List *list)
Definition: list.c:956
LogicalTape * LogicalTapeCreate(LogicalTapeSet *lts)
Definition: logtape.c:680
void LogicalTapeRewindForRead(LogicalTape *lt, size_t buffer_size)
Definition: logtape.c:846
size_t LogicalTapeRead(LogicalTape *lt, void *ptr, size_t size)
Definition: logtape.c:928
void LogicalTapeClose(LogicalTape *lt)
Definition: logtape.c:733
void LogicalTapeSetClose(LogicalTapeSet *lts)
Definition: logtape.c:667
void LogicalTapeWrite(LogicalTape *lt, const void *ptr, size_t size)
Definition: logtape.c:761
LogicalTapeSet * LogicalTapeSetCreate(bool preallocate, SharedFileSet *fileset, int worker)
Definition: logtape.c:556
long LogicalTapeSetBlocks(LogicalTapeSet *lts)
Definition: logtape.c:1183
void get_typlenbyval(Oid typid, int16 *typlen, bool *typbyval)
Definition: lsyscache.c:2209
RegProcedure get_opcode(Oid opno)
Definition: lsyscache.c:1267
void getTypeInputInfo(Oid type, Oid *typInput, Oid *typIOParam)
Definition: lsyscache.c:2832
char * get_func_name(Oid funcid)
Definition: lsyscache.c:1590
void MemoryContextReset(MemoryContext context)
Definition: mcxt.c:314
void pfree(void *pointer)
Definition: mcxt.c:1436
void * palloc0(Size size)
Definition: mcxt.c:1241
void * MemoryContextAlloc(MemoryContext context, Size size)
Definition: mcxt.c:1005
Size MemoryContextMemAllocated(MemoryContext context, bool recurse)
Definition: mcxt.c:655
void MemoryContextDelete(MemoryContext context)
Definition: mcxt.c:387
void * palloc(Size size)
Definition: mcxt.c:1210
#define AllocSetContextCreate
Definition: memutils.h:129
#define ALLOCSET_DEFAULT_SIZES
Definition: memutils.h:153
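
The mcxt.c/memutils.h entries above cover the memory-context lifecycle the aggregate code leans on: per-group state lives in a child context that can be reset or deleted wholesale instead of being pfree'd piecemeal. A self-contained sketch; context_demo is a hypothetical name:

#include "postgres.h"
#include "utils/memutils.h"

/* Hypothetical demo of the create/switch/allocate/reset/delete cycle. */
static void
context_demo(void)
{
    MemoryContext cxt;
    MemoryContext old;
    char       *buf;

    cxt = AllocSetContextCreate(CurrentMemoryContext,
                                "demo context",
                                ALLOCSET_DEFAULT_SIZES);
    old = MemoryContextSwitchTo(cxt);
    buf = palloc0(1024);            /* zeroed allocation, owned by cxt */
    (void) buf;
    MemoryContextSwitchTo(old);

    MemoryContextReset(cxt);        /* frees buf and all other chunks */
    MemoryContextDelete(cxt);       /* destroys the context itself */
}
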
#define CHECK_FOR_INTERRUPTS()
Definition: miscadmin.h:121
Oid GetUserId(void)
Definition: miscinit.c:502
static void hashagg_finish_initial_spills(AggState *aggstate)
Definition: nodeAgg.c:3051
static long hash_choose_num_buckets(double hashentrysize, long ngroups, Size memory)
Definition: nodeAgg.c:1958
static void hash_agg_check_limits(AggState *aggstate)
Definition: nodeAgg.c:1848
static void initialize_hash_entry(AggState *aggstate, TupleHashTable hashtable, TupleHashEntry entry)
Definition: nodeAgg.c:2037
static void find_hash_columns(AggState *aggstate)
Definition: nodeAgg.c:1556
static bool agg_refill_hash_table(AggState *aggstate)
Definition: nodeAgg.c:2586
static void build_hash_table(AggState *aggstate, int setno, long nbuckets)
Definition: nodeAgg.c:1496
void ExecAggEstimate(AggState *node, ParallelContext *pcxt)
Definition: nodeAgg.c:4685
struct FindColsContext FindColsContext
static void hash_agg_enter_spill_mode(AggState *aggstate)
Definition: nodeAgg.c:1874
struct HashAggBatch HashAggBatch
static Datum GetAggInitVal(Datum textInitVal, Oid transtype)
Definition: nodeAgg.c:4280
static void find_cols(AggState *aggstate, Bitmapset **aggregated, Bitmapset **unaggregated)
Definition: nodeAgg.c:1390
void AggRegisterCallback(FunctionCallInfo fcinfo, ExprContextCallbackFunction func, Datum arg)
Definition: nodeAgg.c:4656
#define HASHAGG_HLL_BIT_WIDTH
Definition: nodeAgg.c:315
static void agg_fill_hash_table(AggState *aggstate)
Definition: nodeAgg.c:2532
static void initialize_aggregate(AggState *aggstate, AggStatePerTrans pertrans, AggStatePerGroup pergroupstate)
Definition: nodeAgg.c:579
static TupleTableSlot * fetch_input_tuple(AggState *aggstate)
Definition: nodeAgg.c:548
static void hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
Definition: nodeAgg.c:3085
static bool find_cols_walker(Node *node, FindColsContext *context)
Definition: nodeAgg.c:1413
void ExecAggInitializeWorker(AggState *node, ParallelWorkerContext *pwcxt)
Definition: nodeAgg.c:4731
AggState * ExecInitAgg(Agg *node, EState *estate, int eflags)
Definition: nodeAgg.c:3165
void ExecAggRetrieveInstrumentation(AggState *node)
Definition: nodeAgg.c:4744
static TupleTableSlot * ExecAgg(PlanState *pstate)
Definition: nodeAgg.c:2150
static TupleTableSlot * project_aggregates(AggState *aggstate)
Definition: nodeAgg.c:1364
static MinimalTuple hashagg_batch_read(HashAggBatch *batch, uint32 *hashp)
Definition: nodeAgg.c:3002
struct HashAggSpill HashAggSpill
static void process_ordered_aggregate_multi(AggState *aggstate, AggStatePerTrans pertrans, AggStatePerGroup pergroupstate)
Definition: nodeAgg.c:952
void ExecReScanAgg(AggState *node)
Definition: nodeAgg.c:4366
int AggCheckCallContext(FunctionCallInfo fcinfo, MemoryContext *aggcontext)
Definition: nodeAgg.c:4513
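
AggCheckCallContext is the supported way for an aggregate support function to verify it is being called by nodeAgg (or the window-aggregate code) and to obtain the long-lived aggregate memory context, which is what makes in-place modification of the transition value safe. A sketch of the canonical test; my_count_step is a hypothetical transition function:

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(my_count_step);

/* Hypothetical int8 transition function that counts input rows. */
Datum
my_count_step(PG_FUNCTION_ARGS)
{
    MemoryContext aggcontext;

    /* nonzero result means we were called as an aggregate/window step */
    if (!AggCheckCallContext(fcinfo, &aggcontext))
        elog(ERROR, "my_count_step called in non-aggregate context");

    /* allocations meant to survive across calls would go in aggcontext */
    (void) aggcontext;

    PG_RETURN_INT64(PG_GETARG_INT64(0) + 1);
}
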
static void advance_transition_function(AggState *aggstate, AggStatePerTrans pertrans, AggStatePerGroup pergroupstate)
Definition: nodeAgg.c:707
static void hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
Definition: nodeAgg.c:1909
static void finalize_aggregates(AggState *aggstate, AggStatePerAgg peraggs, AggStatePerGroup pergroup)
Definition: nodeAgg.c:1287
static void initialize_phase(AggState *aggstate, int newphase)
Definition: nodeAgg.c:478
Size hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
Definition: nodeAgg.c:1686
static void initialize_aggregates(AggState *aggstate, AggStatePerGroup *pergroups, int numReset)
Definition: nodeAgg.c:666
static TupleTableSlot * agg_retrieve_hash_table_in_memory(AggState *aggstate)
Definition: nodeAgg.c:2763
void ExecAggInitializeDSM(AggState *node, ParallelContext *pcxt)
Definition: nodeAgg.c:4706
static void finalize_aggregate(AggState *aggstate, AggStatePerAgg peragg, AggStatePerGroup pergroupstate, Datum *resultVal, bool *resultIsNull)
Definition: nodeAgg.c:1048
#define HASHAGG_MAX_PARTITIONS
Definition: nodeAgg.c:298
static void lookup_hash_entries(AggState *aggstate)
Definition: nodeAgg.c:2087
static TupleTableSlot * agg_retrieve_direct(AggState *aggstate)
Definition: nodeAgg.c:2186
static void hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
Definition: nodeAgg.c:1733
static void prepare_projection_slot(AggState *aggstate, TupleTableSlot *slot, int currentSet)
Definition: nodeAgg.c:1242
bool AggStateIsShared(FunctionCallInfo fcinfo)
Definition: nodeAgg.c:4617
static void build_pertrans_for_aggref(AggStatePerTrans pertrans, AggState *aggstate, EState *estate, Aggref *aggref, Oid transfn_oid, Oid aggtranstype, Oid aggserialfn, Oid aggdeserialfn, Datum initValue, bool initValueIsNull, Oid *inputTypes, int numArguments)
Definition: nodeAgg.c:4030
Aggref * AggGetAggref(FunctionCallInfo fcinfo)
Definition: nodeAgg.c:4557
#define CHUNKHDRSZ
Definition: nodeAgg.c:321
static TupleTableSlot * agg_retrieve_hash_table(AggState *aggstate)
Definition: nodeAgg.c:2738
static void process_ordered_aggregate_single(AggState *aggstate, AggStatePerTrans pertrans, AggStatePerGroup pergroupstate)
Definition: nodeAgg.c:851
static void advance_aggregates(AggState *aggstate)
Definition: nodeAgg.c:819
static void prepare_hash_slot(AggStatePerHash perhash, TupleTableSlot *inputslot, TupleTableSlot *hashslot)
Definition: nodeAgg.c:1197
static void build_hash_tables(AggState *aggstate)
Definition: nodeAgg.c:1461
void ExecEndAgg(AggState *node)
Definition: nodeAgg.c:4296
#define HASHAGG_READ_BUFFER_SIZE
Definition: nodeAgg.c:306
static void hashagg_reset_spill_state(AggState *aggstate)
Definition: nodeAgg.c:3125
static Size hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill, TupleTableSlot *inputslot, uint32 hash)
Definition: nodeAgg.c:2917
static void select_current_set(AggState *aggstate, int setno, bool is_hash)
Definition: nodeAgg.c:456
static void finalize_partialaggregate(AggState *aggstate, AggStatePerAgg peragg, AggStatePerGroup pergroupstate, Datum *resultVal, bool *resultIsNull)
Definition: nodeAgg.c:1143
static void hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits, double input_groups, double hashentrysize)
Definition: nodeAgg.c:2886
#define HASHAGG_MIN_PARTITIONS
Definition: nodeAgg.c:297
void hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits, Size *mem_limit, uint64 *ngroups_limit, int *num_partitions)
Definition: nodeAgg.c:1790
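
hash_agg_entry_size (indexed earlier) and hash_agg_set_limits fit together: the per-entry size estimate feeds the limit calculation that decides how much memory and how many groups hash aggregation may accumulate before spilling. A sketch with arbitrary stand-in values; demo_size_hashagg is hypothetical:

#include "postgres.h"
#include "executor/nodeAgg.h"

/* Hypothetical demo: derive hashagg memory and group-count limits. */
static void
demo_size_hashagg(void)
{
    double      hashentrysize;
    Size        mem_limit;
    uint64      ngroups_limit;
    int         num_partitions;

    /* estimated bytes per entry: 1 trans value, ~32-byte grouping keys */
    hashentrysize = hash_agg_entry_size(1, 32, 0);

    /* limits for an expected one million input groups, no used bits */
    hash_agg_set_limits(hashentrysize, 1000000, 0,
                        &mem_limit, &ngroups_limit, &num_partitions);

    elog(DEBUG1, "mem_limit=%zu ngroups_limit=" UINT64_FORMAT " partitions=%d",
         mem_limit, ngroups_limit, num_partitions);
}
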
MemoryContext AggGetTempMemoryContext(FunctionCallInfo fcinfo)
Definition: nodeAgg.c:4591
#define HASHAGG_PARTITION_FACTOR
Definition: nodeAgg.c:296
static HashAggBatch * hashagg_batch_new(LogicalTape *input_tape, int setno, int64 input_tuples, double input_card, int used_bits)
Definition: nodeAgg.c:2983
#define HASHAGG_WRITE_BUFFER_SIZE
Definition: nodeAgg.c:307
static int hash_choose_num_partitions(double input_groups, double hashentrysize, int used_bits, int *log2_npartitions)
Definition: nodeAgg.c:1983
struct AggStatePerGroupData AggStatePerGroupData
Oid exprCollation(const Node *expr)
Definition: nodeFuncs.c:764
#define expression_tree_walker(n, w, c)
Definition: nodeFuncs.h:151
size_t get_hash_memory_limit(void)
Definition: nodeHash.c:3390
#define DO_AGGSPLIT_SKIPFINAL(as)
Definition: nodes.h:394
#define IsA(nodeptr, _type_)
Definition: nodes.h:179
#define DO_AGGSPLIT_DESERIALIZE(as)
Definition: nodes.h:396
#define DO_AGGSPLIT_COMBINE(as)
Definition: nodes.h:393
@ AGG_SORTED
Definition: nodes.h:363
@ AGG_HASHED
Definition: nodes.h:364
@ AGG_MIXED
Definition: nodes.h:365
@ AGG_PLAIN
Definition: nodes.h:362
#define DO_AGGSPLIT_SERIALIZE(as)
Definition: nodes.h:395
#define makeNode(_type_)
Definition: nodes.h:176
#define castNode(_type_, nodeptr)
Definition: nodes.h:197
#define InvokeFunctionExecuteHook(objectId)
Definition: objectaccess.h:213
static MemoryContext MemoryContextSwitchTo(MemoryContext context)
Definition: palloc.h:138
void build_aggregate_finalfn_expr(Oid *agg_input_types, int num_finalfn_inputs, Oid agg_state_type, Oid agg_result_type, Oid agg_input_collation, Oid finalfn_oid, Expr **finalfnexpr)
Definition: parse_agg.c:2123
void build_aggregate_deserialfn_expr(Oid deserialfn_oid, Expr **deserialfnexpr)
Definition: parse_agg.c:2099
void build_aggregate_transfn_expr(Oid *agg_input_types, int agg_num_inputs, int agg_num_direct_inputs, bool agg_variadic, Oid agg_state_type, Oid agg_input_collation, Oid transfn_oid, Oid invtransfn_oid, Expr **transfnexpr, Expr **invtransfnexpr)
Definition: parse_agg.c:2015
int get_aggregate_argtypes(Aggref *aggref, Oid *inputTypes)
Definition: parse_agg.c:1895
void build_aggregate_serialfn_expr(Oid serialfn_oid, Expr **serialfnexpr)
Definition: parse_agg.c:2076
bool IsBinaryCoercible(Oid srctype, Oid targettype)
@ OBJECT_AGGREGATE
Definition: parsenodes.h:1970
@ OBJECT_FUNCTION
Definition: parsenodes.h:1988
#define ACL_EXECUTE
Definition: parsenodes.h:90
FormData_pg_aggregate * Form_pg_aggregate
Definition: pg_aggregate.h:109
int16 attnum
Definition: pg_attribute.h:83
FormData_pg_attribute * Form_pg_attribute
Definition: pg_attribute.h:207
void * arg
#define FUNC_MAX_ARGS
#define lfirst(lc)
Definition: pg_list.h:172
#define llast(l)
Definition: pg_list.h:198
static int list_length(const List *l)
Definition: pg_list.h:152
#define NIL
Definition: pg_list.h:68
#define lfirst_int(lc)
Definition: pg_list.h:173
#define linitial_int(l)
Definition: pg_list.h:179
static void * list_nth(const List *list, int n)
Definition: pg_list.h:299
#define list_nth_node(type, list, n)
Definition: pg_list.h:327
FormData_pg_proc * Form_pg_proc
Definition: pg_proc.h:136
#define outerPlan(node)
Definition: plannodes.h:186
static bool DatumGetBool(Datum X)
Definition: postgres.h:90
uintptr_t Datum
Definition: postgres.h:64
static Datum ObjectIdGetDatum(Oid X)
Definition: postgres.h:252
static Pointer DatumGetPointer(Datum X)
Definition: postgres.h:312
#define InvalidOid
Definition: postgres_ext.h:36
unsigned int Oid
Definition: postgres_ext.h:31
#define OUTER_VAR
Definition: primnodes.h:212
void shm_toc_insert(shm_toc *toc, uint64 key, void *address)
Definition: shm_toc.c:171
void * shm_toc_allocate(shm_toc *toc, Size nbytes)
Definition: shm_toc.c:88
void * shm_toc_lookup(shm_toc *toc, uint64 key, bool noError)
Definition: shm_toc.c:232
#define shm_toc_estimate_chunk(e, sz)
Definition: shm_toc.h:51
#define shm_toc_estimate_keys(e, cnt)
Definition: shm_toc.h:53
Size add_size(Size s1, Size s2)
Definition: shmem.c:502
Size mul_size(Size s1, Size s2)
Definition: shmem.c:519
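
The shm_toc and size helpers above implement the estimate/initialize/attach handshake that ExecAggEstimate, ExecAggInitializeDSM, and ExecAggInitializeWorker (indexed earlier) follow for parallel instrumentation. A sketch; DEMO_TOC_KEY and the demo_* functions are hypothetical:

#include "postgres.h"
#include "storage/shm_toc.h"
#include "storage/shmem.h"      /* mul_size */

#define DEMO_TOC_KEY    UINT64CONST(0xD000000000000001) /* hypothetical */

/* Phase 1, leader planning: reserve space and one toc key. */
static void
demo_estimate(shm_toc_estimator *e, int nworkers)
{
    shm_toc_estimate_chunk(e, mul_size(nworkers, sizeof(int)));
    shm_toc_estimate_keys(e, 1);
}

/* Phase 2, leader initialization: carve out and publish the chunk. */
static int *
demo_initialize_dsm(shm_toc *toc, int nworkers)
{
    int        *space;

    space = shm_toc_allocate(toc, mul_size(nworkers, sizeof(int)));
    shm_toc_insert(toc, DEMO_TOC_KEY, space);
    return space;
}

/* Phase 3, worker: attach to the published chunk (error if absent). */
static int *
demo_attach(shm_toc *toc)
{
    return shm_toc_lookup(toc, DEMO_TOC_KEY, false);
}
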
FmgrInfo finalfn
Definition: nodeAgg.h:207
bool resulttypeByVal
Definition: nodeAgg.h:225
List * aggdirectargs
Definition: nodeAgg.h:218
Aggref * aggref
Definition: nodeAgg.h:195
int16 resulttypeLen
Definition: nodeAgg.h:224
FmgrInfo * hashfunctions
Definition: nodeAgg.h:314
TupleHashTable hashtable
Definition: nodeAgg.h:311
TupleTableSlot * hashslot
Definition: nodeAgg.h:313
TupleHashIterator hashiter
Definition: nodeAgg.h:312
AttrNumber * hashGrpColIdxHash
Definition: nodeAgg.h:320
AttrNumber * hashGrpColIdxInput
Definition: nodeAgg.h:319
Bitmapset ** grouped_cols
Definition: nodeAgg.h:285
ExprState * evaltrans
Definition: nodeAgg.h:291
ExprState * evaltrans_cache[2][2]
Definition: nodeAgg.h:299
ExprState ** eqfunctions
Definition: nodeAgg.h:286
AggStrategy aggstrategy
Definition: nodeAgg.h:282
bool * sortNullsFirst
Definition: nodeAgg.h:108
FmgrInfo serialfn
Definition: nodeAgg.h:89
FmgrInfo equalfnOne
Definition: nodeAgg.h:115
TupleDesc sortdesc
Definition: nodeAgg.h:143
TupleTableSlot * sortslot
Definition: nodeAgg.h:141
FmgrInfo transfn
Definition: nodeAgg.h:86
Aggref * aggref
Definition: nodeAgg.h:44
ExprState * equalfnMulti
Definition: nodeAgg.h:116
Tuplesortstate ** sortstates
Definition: nodeAgg.h:162
TupleTableSlot * uniqslot
Definition: nodeAgg.h:142
FmgrInfo deserialfn
Definition: nodeAgg.h:92
FunctionCallInfo deserialfn_fcinfo
Definition: nodeAgg.h:175
AttrNumber * sortColIdx
Definition: nodeAgg.h:105
FunctionCallInfo serialfn_fcinfo
Definition: nodeAgg.h:173
FunctionCallInfo transfn_fcinfo
Definition: nodeAgg.h:170
MemoryContext hash_metacxt
Definition: execnodes.h:2402
ScanState ss
Definition: execnodes.h:2360
Tuplesortstate * sort_out
Definition: execnodes.h:2393
uint64 hash_disk_used
Definition: execnodes.h:2420
AggStatePerGroup * all_pergroups
Definition: execnodes.h:2429
AggStatePerGroup * hash_pergroup
Definition: execnodes.h:2424
AggStatePerPhase phase
Definition: execnodes.h:2366
List * aggs
Definition: execnodes.h:2361
ExprContext * tmpcontext
Definition: execnodes.h:2373
int max_colno_needed
Definition: execnodes.h:2387
int hash_planned_partitions
Definition: execnodes.h:2414
HeapTuple grp_firstTuple
Definition: execnodes.h:2398
Size hash_mem_limit
Definition: execnodes.h:2412
ExprContext * curaggcontext
Definition: execnodes.h:2375
AggStatePerTrans curpertrans
Definition: execnodes.h:2378
bool table_filled
Definition: execnodes.h:2400
AggStatePerTrans pertrans
Definition: execnodes.h:2370
int current_set
Definition: execnodes.h:2383
struct LogicalTapeSet * hash_tapeset
Definition: execnodes.h:2403
AggStrategy aggstrategy
Definition: execnodes.h:2364
int numtrans
Definition: execnodes.h:2363
ExprContext * hashcontext
Definition: execnodes.h:2371
AggSplit aggsplit
Definition: execnodes.h:2365
int projected_set
Definition: execnodes.h:2381
SharedAggInfo * shared_info
Definition: execnodes.h:2432
uint64 hash_ngroups_limit
Definition: execnodes.h:2413
bool input_done
Definition: execnodes.h:2379
AggStatePerPhase phases
Definition: execnodes.h:2391
List * all_grouped_cols
Definition: execnodes.h:2385
bool hash_spill_mode
Definition: execnodes.h:2410
AggStatePerGroup * pergroups
Definition: execnodes.h:2396
AggStatePerHash perhash
Definition: execnodes.h:2423
Size hash_mem_peak
Definition: execnodes.h:2417
double hashentrysize
Definition: execnodes.h:2416
int numphases
Definition: execnodes.h:2367
uint64 hash_ngroups_current
Definition: execnodes.h:2418
int hash_batches_used
Definition: execnodes.h:2421
Tuplesortstate * sort_in
Definition: execnodes.h:2392
TupleTableSlot * hash_spill_wslot
Definition: execnodes.h:2407
AggStatePerAgg curperagg
Definition: execnodes.h:2376
struct HashAggSpill * hash_spills
Definition: execnodes.h:2404
TupleTableSlot * sort_slot
Definition: execnodes.h:2394
bool hash_ever_spilled
Definition: execnodes.h:2409
int numaggs
Definition: execnodes.h:2362
int num_hashes
Definition: execnodes.h:2401
AggStatePerAgg peragg
Definition: execnodes.h:2369
List * hash_batches
Definition: execnodes.h:2408
TupleTableSlot * hash_spill_rslot
Definition: execnodes.h:2406
int maxsets
Definition: execnodes.h:2390
ExprContext ** aggcontexts
Definition: execnodes.h:2372
Bitmapset * colnos_needed
Definition: execnodes.h:2386
int current_phase
Definition: execnodes.h:2368
bool all_cols_needed
Definition: execnodes.h:2388
bool agg_done
Definition: execnodes.h:2380
Bitmapset * grouped_cols
Definition: execnodes.h:2384
struct Agg
Definition: plannodes.h:998
AggSplit aggsplit
Definition: plannodes.h:1005
List * chain
Definition: plannodes.h:1032
long numGroups
Definition: plannodes.h:1018
List * groupingSets
Definition: plannodes.h:1029
Bitmapset * aggParams
Definition: plannodes.h:1024
Plan plan
Definition: plannodes.h:999
int numCols
Definition: plannodes.h:1008
uint64 transitionSpace
Definition: plannodes.h:1021
AggStrategy aggstrategy
Definition: plannodes.h:1002
Oid aggfnoid
Definition: primnodes.h:419
List * aggdistinct
Definition: primnodes.h:449
List * aggdirectargs
Definition: primnodes.h:440
List * args
Definition: primnodes.h:443
Expr * aggfilter
Definition: primnodes.h:452
List * aggorder
Definition: primnodes.h:446
MemoryContext es_query_cxt
Definition: execnodes.h:656
List * es_tupleTable
Definition: execnodes.h:658
MemoryContext ecxt_per_tuple_memory
Definition: execnodes.h:255
TupleTableSlot * ecxt_innertuple
Definition: execnodes.h:249
Datum * ecxt_aggvalues
Definition: execnodes.h:266
bool * ecxt_aggnulls
Definition: execnodes.h:268
TupleTableSlot * ecxt_outertuple
Definition: execnodes.h:251
Bitmapset * aggregated
Definition: nodeAgg.c:364
Bitmapset * unaggregated
Definition: nodeAgg.c:365
bool is_aggref
Definition: nodeAgg.c:363
bool fn_strict
Definition: fmgr.h:61
fmNodePtr context
Definition: fmgr.h:88
NullableDatum args[FLEXIBLE_ARRAY_MEMBER]
Definition: fmgr.h:95
int used_bits
Definition: nodeAgg.c:354
int64 input_tuples
Definition: nodeAgg.c:356
double input_card
Definition: nodeAgg.c:357
LogicalTape * input_tape
Definition: nodeAgg.c:355
hyperLogLogState * hll_card
Definition: nodeAgg.c:339
int64 * ntuples
Definition: nodeAgg.c:336
LogicalTape ** partitions
Definition: nodeAgg.c:335
int npartitions
Definition: nodeAgg.c:334
uint32 mask
Definition: nodeAgg.c:337
struct List
Definition: pg_list.h:54
struct Node
Definition: nodes.h:129
Datum value
Definition: postgres.h:75
bool isnull
Definition: postgres.h:77
shm_toc_estimator estimator
Definition: parallel.h:42