PostgreSQL Source Code  git master
nodeAgg.c
1 /*-------------------------------------------------------------------------
2  *
3  * nodeAgg.c
4  * Routines to handle aggregate nodes.
5  *
6  * ExecAgg normally evaluates each aggregate in the following steps:
7  *
8  * transvalue = initcond
9  * foreach input_tuple do
10  * transvalue = transfunc(transvalue, input_value(s))
11  * result = finalfunc(transvalue, direct_argument(s))
12  *
13  * If a finalfunc is not supplied then the result is just the ending
14  * value of transvalue.
15  *
16  * Other behaviors can be selected by the "aggsplit" mode, which exists
17  * to support partial aggregation. It is possible to:
18  * * Skip running the finalfunc, so that the output is always the
19  * final transvalue state.
20  * * Substitute the combinefunc for the transfunc, so that transvalue
21  * states (propagated up from a child partial-aggregation step) are merged
22  * rather than processing raw input rows. (The statements below about
23  * the transfunc apply equally to the combinefunc, when it's selected.)
24  * * Apply the serializefunc to the output values (this only makes sense
25  * when skipping the finalfunc, since the serializefunc works on the
26  * transvalue data type).
27  * * Apply the deserializefunc to the input values (this only makes sense
28  * when using the combinefunc, for similar reasons).
29  * It is the planner's responsibility to connect up Agg nodes using these
30  * alternate behaviors in a way that makes sense, with partial aggregation
31  * results being fed to nodes that expect them.
32  *
33  * If a normal aggregate call specifies DISTINCT or ORDER BY, we sort the
34  * input tuples and eliminate duplicates (if required) before performing
35  * the above-depicted process. (However, we don't do that for ordered-set
36  * aggregates; their "ORDER BY" inputs are ordinary aggregate arguments
37  * so far as this module is concerned.) Note that partial aggregation
38  * is not supported in these cases, since we couldn't ensure global
39  * ordering or distinctness of the inputs.
40  *
41  * If transfunc is marked "strict" in pg_proc and initcond is NULL,
42  * then the first non-NULL input_value is assigned directly to transvalue,
43  * and transfunc isn't applied until the second non-NULL input_value.
44  * The agg's first input type and transtype must be the same in this case!
45  *
46  * If transfunc is marked "strict" then NULL input_values are skipped,
47  * keeping the previous transvalue. If transfunc is not strict then it
48  * is called for every input tuple and must deal with NULL initcond
49  * or NULL input_values for itself.
50  *
51  * If finalfunc is marked "strict" then it is not called when the
52  * ending transvalue is NULL; instead a NULL result is created
53  * automatically (this is just the usual handling of strict functions,
54  * of course). A non-strict finalfunc can make its own choice of
55  * what to return for a NULL ending transvalue.
56  *
57  * Ordered-set aggregates are treated specially in one other way: we
58  * evaluate any "direct" arguments and pass them to the finalfunc along
59  * with the transition value.
60  *
61  * A finalfunc can have additional arguments beyond the transvalue and
62  * any "direct" arguments, corresponding to the input arguments of the
63  * aggregate. These are always just passed as NULL. Such arguments may be
64  * needed to allow resolution of a polymorphic aggregate's result type.
65  *
66  * We compute aggregate input expressions and run the transition functions
67  * in a temporary econtext (aggstate->tmpcontext). This is reset at least
68  * once per input tuple, so when the transvalue datatype is
69  * pass-by-reference, we have to be careful to copy it into a longer-lived
70  * memory context, and free the prior value to avoid memory leakage. We
71  * store transvalues in another set of econtexts, aggstate->aggcontexts
72  * (one per grouping set, see below), which are also used for the hashtable
73  * structures in AGG_HASHED mode. These econtexts are rescanned, not just
74  * reset, at group boundaries so that aggregate transition functions can
75  * register shutdown callbacks via AggRegisterCallback.
76  *
77  * The node's regular econtext (aggstate->ss.ps.ps_ExprContext) is used to
78  * run finalize functions and compute the output tuple; this context can be
79  * reset once per output tuple.
80  *
81  * The executor's AggState node is passed as the fmgr "context" value in
82  * all transfunc and finalfunc calls. It is not recommended that the
83  * transition functions look at the AggState node directly, but they can
84  * use AggCheckCallContext() to verify that they are being called by
85  * nodeAgg.c (and not as ordinary SQL functions). The main reason a
86  * transition function might want to know this is so that it can avoid
87  * palloc'ing a fixed-size pass-by-ref transition value on every call:
88  * it can instead just scribble on and return its left input. Ordinarily
89  * it is completely forbidden for functions to modify pass-by-ref inputs,
90  * but in the aggregate case we know the left input is either the initial
91  * transition value or a previous function result, and in either case its
92  * value need not be preserved. See int8inc() for an example. Notice that
93  * the EEOP_AGG_PLAIN_TRANS step is coded to avoid a data copy step when
94  * the previous transition value pointer is returned. It is also possible
95  * to avoid repeated data copying when the transition value is an expanded
96  * object: to do that, the transition function must take care to return
97  * an expanded object that is in a child context of the memory context
98  * returned by AggCheckCallContext(). Also, some transition functions want
99  * to store working state in addition to the nominal transition value; they
100  * can use the memory context returned by AggCheckCallContext() to do that.
101  *
102  * Note: AggCheckCallContext() is available as of PostgreSQL 9.0. The
103  * AggState is available as context in earlier releases (back to 8.1),
104  * but direct examination of the node is needed to use it before 9.0.
105  *
106  * As of 9.4, aggregate transition functions can also use AggGetAggref()
107  * to get hold of the Aggref expression node for their aggregate call.
108  * This is mainly intended for ordered-set aggregates, which are not
109  * supported as window functions. (A regular aggregate function would
110  * need some fallback logic to use this, since there's no Aggref node
111  * for a window function.)
112  *
113  * Grouping sets:
114  *
115  * A list of grouping sets which is structurally equivalent to a ROLLUP
116  * clause (e.g. (a,b,c), (a,b), (a)) can be processed in a single pass over
117  * ordered data. We do this by keeping a separate set of transition values
118  * for each grouping set being concurrently processed; for each input tuple
119  * we update them all, and on group boundaries we reset those states
120  * (starting at the front of the list) whose grouping values have changed
121  * (the list of grouping sets is ordered from most specific to least
122  * specific).
123  *
124  * Where more complex grouping sets are used, we break them down into
125  * "phases", where each phase has a different sort order (except phase 0
126  * which is reserved for hashing). During each phase but the last, the
127  * input tuples are additionally stored in a tuplesort which is keyed to the
128  * next phase's sort order; during each phase but the first, the input
129  * tuples are drawn from the previously sorted data. (The sorting of the
130  * data for the first phase is handled by the planner, as it might be
131  * satisfied by underlying nodes.)
132  *
133  * Hashing can be mixed with sorted grouping. To do this, we have an
134  * AGG_MIXED strategy that populates the hashtables during the first sorted
135  * phase, and switches to reading them out after completing all sort phases.
136  * We can also support AGG_HASHED with multiple hash tables and no sorting
137  * at all.
138  *
139  * From the perspective of aggregate transition and final functions, the
140  * only issue regarding grouping sets is this: a single call site (flinfo)
141  * of an aggregate function may be used for updating several different
142  * transition values in turn. So the function must not cache in the flinfo
143  * anything which logically belongs as part of the transition value (most
144  * importantly, the memory context in which the transition value exists).
145  * The support API functions (AggCheckCallContext, AggRegisterCallback) are
146  * sensitive to the grouping set for which the aggregate function is
147  * currently being called.
148  *
149  * Plan structure:
150  *
151  * What we get from the planner is actually one "real" Agg node which is
152  * part of the plan tree proper, but which optionally has an additional list
153  * of Agg nodes hung off the side via the "chain" field. This is because an
154  * Agg node happens to be a convenient representation of all the data we
155  * need for grouping sets.
156  *
157  * For many purposes, we treat the "real" node as if it were just the first
158  * node in the chain. The chain must be ordered such that hashed entries
159  * come before sorted/plain entries; the real node is marked AGG_MIXED if
160  * there are both types present (in which case the real node describes one
161  * of the hashed groupings, other AGG_HASHED nodes may optionally follow in
162  * the chain, followed in turn by AGG_SORTED or (one) AGG_PLAIN node). If
163  * the real node is marked AGG_HASHED or AGG_SORTED, then all the chained
164  * nodes must be of the same type; if it is AGG_PLAIN, there can be no
165  * chained nodes.
166  *
167  * We collect all hashed nodes into a single "phase", numbered 0, and create
168  * a sorted phase (numbered 1..n) for each AGG_SORTED or AGG_PLAIN node.
169  * Phase 0 is allocated even if there are no hashes, but remains unused in
170  * that case.
171  *
172  * AGG_HASHED nodes actually refer to only a single grouping set each,
173  * because for each hashed grouping we need a separate grpColIdx and
174  * numGroups estimate. AGG_SORTED nodes represent a "rollup", a list of
175  * grouping sets that share a sort order. Each AGG_SORTED node other than
176  * the first one has an associated Sort node which describes the sort order
177  * to be used; the first sorted node takes its input from the outer subtree,
178  * which the planner has already arranged to provide ordered data.
179  *
180  * Memory and ExprContext usage:
181  *
182  * Because we're accumulating aggregate values across input rows, we need to
183  * use more memory contexts than just simple input/output tuple contexts.
184  * In fact, for a rollup, we need a separate context for each grouping set
185  * so that we can reset the inner (finer-grained) aggregates on their group
186  * boundaries while continuing to accumulate values for outer
187  * (coarser-grained) groupings. On top of this, we might be simultaneously
188  * populating hashtables; however, we only need one context for all the
189  * hashtables.
190  *
191  * So we create an array, aggcontexts, with an ExprContext for each grouping
192  * set in the largest rollup that we're going to process, and use the
193  * per-tuple memory context of those ExprContexts to store the aggregate
194  * transition values. hashcontext is the single context created to support
195  * all hash tables.
196  *
197  * Spilling To Disk
198  *
199  * When performing hash aggregation, if the hash table memory exceeds the
200  * limit (see hash_agg_check_limits()), we enter "spill mode". In spill
201  * mode, we advance the transition states only for groups already in the
202  * hash table. For tuples that would need to create new hash table
203  * entries (and initialize new transition states), we instead spill them to
204  * disk to be processed later. The tuples are spilled in a partitioned
205  * manner, so that subsequent batches are smaller and less likely to exceed
206  * hash_mem (if a batch does exceed hash_mem, it must be spilled
207  * recursively).
208  *
209  * Spilled data is written to logical tapes. These provide better control
210  * over memory usage, disk space, and the number of files than if we were
211  * to use a BufFile for each spill. We don't know the number of tapes needed
212  * at the start of the algorithm (because it can recurse), so a tape set is
213  * allocated at the beginning, and individual tapes are created as needed.
214  * As a particular tape is read, logtape.c recycles its disk space. When a
215  * tape is read to completion, it is destroyed entirely.
216  *
217  * Tapes' buffers can take up substantial memory when many tapes are open at
218  * once. We only need one tape open at a time in read mode (using a buffer
219  * that's a multiple of BLCKSZ); but we need one tape open in write mode (each
220  * requiring a buffer of size BLCKSZ) for each partition.
221  *
222  * Note that it's possible for transition states to start small but then
223  * grow very large; for instance in the case of ARRAY_AGG. In such cases,
224  * it's still possible to significantly exceed hash_mem. We try to avoid
225  * this situation by estimating what will fit in the available memory, and
226  * imposing a limit on the number of groups separately from the amount of
227  * memory consumed.
228  *
229  * Transition / Combine function invocation:
230  *
231  * For performance reasons transition functions, including combine
232  * functions, aren't invoked one-by-one from nodeAgg.c after computing
233  * arguments using the expression evaluation engine. Instead
234  * ExecBuildAggTrans() builds one large expression that does both argument
235  * evaluation and transition function invocation. That avoids performance
236  * issues due to repeated uses of expression evaluation, complications due
237  * to filter expressions having to be evaluated early, and allows the
238  * entire expression to be JIT-compiled into one native function.
239  *
240  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
241  * Portions Copyright (c) 1994, Regents of the University of California
242  *
243  * IDENTIFICATION
244  * src/backend/executor/nodeAgg.c
245  *
246  *-------------------------------------------------------------------------
247  */
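
/*
 * Illustrative sketch only, not part of nodeAgg.c: the basic evaluation loop
 * from the header comment, for a strict transition function with a NULL
 * initcond and no finalfunc.  NullableInt, toy_max_trans and toy_aggregate
 * are hypothetical names; a plain long stands in for a Datum.
 */
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy nullable integer: stands in for a Datum plus its isnull flag. */
typedef struct NullableInt
{
	long		value;
	bool		isnull;
} NullableInt;

/* Hypothetical strict transition function for a max()-like aggregate. */
static long
toy_max_trans(long transvalue, long input)
{
	return (input > transvalue) ? input : transvalue;
}

/*
 * Aggregate n inputs.  NULL inputs are skipped, keeping the prior
 * transvalue.  The first non-NULL input is assigned directly to the
 * transvalue, and the transfn is first applied to the second non-NULL input.
 * With no finalfunc, the result is the ending transvalue.
 */
static NullableInt
toy_aggregate(const NullableInt *inputs, size_t n)
{
	NullableInt trans = {0, true};	/* initcond is NULL */
	bool		no_trans_value = true;

	assert(inputs != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		if (inputs[i].isnull)
			continue;			/* strict: skip NULL input */

		if (no_trans_value)
		{
			/* first non-NULL input becomes the transvalue */
			trans.value = inputs[i].value;
			trans.isnull = false;
			no_trans_value = false;
			continue;
		}

		trans.value = toy_max_trans(trans.value, inputs[i].value);
	}

	return trans;
}
```
/*
 * If every input is NULL, the sketch returns a NULL result, matching the
 * behavior described above for a strict transfn with a NULL initcond.
 */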
248 
249 #include "postgres.h"
250 
251 #include "access/htup_details.h"
252 #include "access/parallel.h"
253 #include "catalog/objectaccess.h"
254 #include "catalog/pg_aggregate.h"
255 #include "catalog/pg_proc.h"
256 #include "catalog/pg_type.h"
257 #include "common/hashfn.h"
258 #include "executor/execExpr.h"
259 #include "executor/executor.h"
260 #include "executor/nodeAgg.h"
261 #include "lib/hyperloglog.h"
262 #include "miscadmin.h"
263 #include "nodes/nodeFuncs.h"
264 #include "optimizer/optimizer.h"
265 #include "parser/parse_agg.h"
266 #include "parser/parse_coerce.h"
267 #include "utils/acl.h"
268 #include "utils/builtins.h"
269 #include "utils/datum.h"
270 #include "utils/dynahash.h"
271 #include "utils/expandeddatum.h"
272 #include "utils/logtape.h"
273 #include "utils/lsyscache.h"
274 #include "utils/memutils.h"
275 #include "utils/syscache.h"
276 #include "utils/tuplesort.h"
277 
278 /*
279  * Control how many partitions are created when spilling HashAgg to
280  * disk.
281  *
282  * HASHAGG_PARTITION_FACTOR is multiplied by the estimated number of
283  * partitions needed such that each partition will fit in memory. The factor
284  * is set higher than one because there's not a high cost to having a few too
285  * many partitions, and it makes it less likely that a partition will need to
286  * be spilled recursively. Another benefit of having more, smaller partitions
287  * is that small hash tables may perform better than large ones due to memory
288  * caching effects.
289  *
290  * We also specify a min and max number of partitions per spill. Too few might
291  * mean a lot of wasted I/O from repeated spilling of the same tuples. Too
292  * many will result in lots of memory wasted buffering the spill files (which
293  * could instead be spent on a larger hash table).
294  */
295 #define HASHAGG_PARTITION_FACTOR 1.50
296 #define HASHAGG_MIN_PARTITIONS 4
297 #define HASHAGG_MAX_PARTITIONS 1024
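
/*
 * Illustrative sketch only: one plausible way the three constants above
 * could combine.  toy_choose_num_partitions is a hypothetical helper, and
 * rounding up to a power of two is an assumption made here so that a
 * partition can be selected from hash bits; the real
 * hash_choose_num_partitions() also takes memory limits into account.
 */
```c
#include <assert.h>
#include <stddef.h>

#define HASHAGG_PARTITION_FACTOR 1.50
#define HASHAGG_MIN_PARTITIONS 4
#define HASHAGG_MAX_PARTITIONS 1024

/*
 * Scale the estimated partition count by the factor, clamp it to the
 * [min, max] range, then round up to the next power of two.  The base-2
 * logarithm of the result is stored in *log2_npartitions when non-NULL.
 */
static int
toy_choose_num_partitions(double estimated_partitions, int *log2_npartitions)
{
	double		scaled = estimated_partitions * HASHAGG_PARTITION_FACTOR;
	int			bits = 0;

	assert(estimated_partitions >= 0.0);

	if (scaled < HASHAGG_MIN_PARTITIONS)
		scaled = HASHAGG_MIN_PARTITIONS;
	if (scaled > HASHAGG_MAX_PARTITIONS)
		scaled = HASHAGG_MAX_PARTITIONS;

	/* smallest power of two not less than the (truncated) clamped value */
	while ((1 << bits) < (int) scaled)
		bits++;

	if (log2_npartitions != NULL)
		*log2_npartitions = bits;

	return 1 << bits;
}
```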
298 
299 /*
300  * For reading from tapes, the buffer size must be a multiple of
301  * BLCKSZ. Larger values help when reading from multiple tapes concurrently,
302  * but that doesn't happen in HashAgg, so we simply use BLCKSZ. Writing to a
303  * tape always uses a buffer of size BLCKSZ.
304  */
305 #define HASHAGG_READ_BUFFER_SIZE BLCKSZ
306 #define HASHAGG_WRITE_BUFFER_SIZE BLCKSZ
307 
308 /*
309  * HyperLogLog is used for estimating the cardinality of the spilled tuples in
310  * a given partition. 5 bits corresponds to a size of about 32 bytes and a
311  * worst-case error of around 18%. That's effective enough to choose a
312  * reasonable number of partitions when recursing.
313  */
314 #define HASHAGG_HLL_BIT_WIDTH 5
315 
316 /*
317  * Estimate chunk overhead as a constant 16 bytes. XXX: should this be
318  * improved?
319  */
320 #define CHUNKHDRSZ 16
321 
322 /*
323  * Represents partitioned spill data for a single hashtable. Contains the
324  * necessary information to route tuples to the correct partition, and to
325  * transform the spilled data into new batches.
326  *
327  * The high bits are used for partition selection (when recursing, we ignore
328  * the bits that have already been used for partition selection at an earlier
329  * level).
330  */
331 typedef struct HashAggSpill
332 {
333  int npartitions; /* number of partitions */
334  LogicalTape **partitions; /* spill partition tapes */
335  int64 *ntuples; /* number of tuples in each partition */
336  uint32 mask; /* mask to find partition from hash value */
337  int shift; /* after masking, shift by this amount */
338  hyperLogLogState *hll_card; /* cardinality estimate for contents */
339 } HashAggSpill;
340 
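/*
 * Illustrative sketch only: how a mask and shift like those in HashAggSpill
 * can route a 32-bit hash value to a partition using its high bits, while
 * skipping bits already consumed at an earlier recursion level.
 * ToySpillRouting, toy_spill_routing and toy_select_partition are
 * hypothetical names, and the exact formulas are assumptions consistent with
 * the comment above.
 */
```c
#include <assert.h>
#include <stdint.h>

typedef struct ToySpillRouting
{
	uint32_t	mask;			/* selects the partition bits */
	int			shift;			/* moves them down to the low end */
} ToySpillRouting;

/*
 * used_bits high bits were consumed by earlier levels; the next
 * partition_bits bits below them select the partition at this level.
 */
static ToySpillRouting
toy_spill_routing(int used_bits, int partition_bits)
{
	ToySpillRouting r;

	assert(used_bits >= 0 && partition_bits >= 0 && partition_bits < 32);
	assert(used_bits + partition_bits <= 32);

	r.shift = 32 - used_bits - partition_bits;
	if (r.shift < 32)
		r.mask = (((uint32_t) 1 << partition_bits) - 1) << r.shift;
	else
		r.mask = 0;				/* no bits in use: a single partition */

	return r;
}

static int
toy_select_partition(ToySpillRouting r, uint32_t hash)
{
	if (r.mask == 0)
		return 0;				/* avoids shifting by the full width */
	return (int) ((hash & r.mask) >> r.shift);
}
```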
341 /*
342  * Represents work to be done for one pass of hash aggregation (with only one
343  * grouping set).
344  *
345  * Also tracks the bits of the hash already used for partition selection by
346  * earlier iterations, so that this batch can use new bits. If all bits have
347  * already been used, no partitioning will be done (any spilled data will go
348  * to a single output tape).
349  */
350 typedef struct HashAggBatch
351 {
352  int setno; /* grouping set */
353  int used_bits; /* number of bits of hash already used */
354  LogicalTape *input_tape; /* input partition tape */
355  int64 input_tuples; /* number of tuples in this batch */
356  double input_card; /* estimated group cardinality */
357 } HashAggBatch;
358 
359 /* used to find referenced colnos */
360 typedef struct FindColsContext
361 {
362  bool is_aggref; /* is under an aggref */
363  Bitmapset *aggregated; /* column references under an aggref */
364  Bitmapset *unaggregated; /* other column references */
365 } FindColsContext;
366 
367 static void select_current_set(AggState *aggstate, int setno, bool is_hash);
368 static void initialize_phase(AggState *aggstate, int newphase);
369 static TupleTableSlot *fetch_input_tuple(AggState *aggstate);
370 static void initialize_aggregates(AggState *aggstate,
371  AggStatePerGroup *pergroups,
372  int numReset);
373 static void advance_transition_function(AggState *aggstate,
374  AggStatePerTrans pertrans,
375  AggStatePerGroup pergroupstate);
376 static void advance_aggregates(AggState *aggstate);
377 static void process_ordered_aggregate_single(AggState *aggstate,
378  AggStatePerTrans pertrans,
379  AggStatePerGroup pergroupstate);
380 static void process_ordered_aggregate_multi(AggState *aggstate,
381  AggStatePerTrans pertrans,
382  AggStatePerGroup pergroupstate);
383 static void finalize_aggregate(AggState *aggstate,
384  AggStatePerAgg peragg,
385  AggStatePerGroup pergroupstate,
386  Datum *resultVal, bool *resultIsNull);
387 static void finalize_partialaggregate(AggState *aggstate,
388  AggStatePerAgg peragg,
389  AggStatePerGroup pergroupstate,
390  Datum *resultVal, bool *resultIsNull);
391 static inline void prepare_hash_slot(AggStatePerHash perhash,
392  TupleTableSlot *inputslot,
393  TupleTableSlot *hashslot);
394 static void prepare_projection_slot(AggState *aggstate,
395  TupleTableSlot *slot,
396  int currentSet);
397 static void finalize_aggregates(AggState *aggstate,
398  AggStatePerAgg peraggs,
399  AggStatePerGroup pergroup);
400 static TupleTableSlot *project_aggregates(AggState *aggstate);
401 static void find_cols(AggState *aggstate, Bitmapset **aggregated,
402  Bitmapset **unaggregated);
403 static bool find_cols_walker(Node *node, FindColsContext *context);
404 static void build_hash_tables(AggState *aggstate);
405 static void build_hash_table(AggState *aggstate, int setno, long nbuckets);
406 static void hashagg_recompile_expressions(AggState *aggstate, bool minslot,
407  bool nullcheck);
408 static long hash_choose_num_buckets(double hashentrysize,
409  long ngroups, Size memory);
410 static int hash_choose_num_partitions(double input_groups,
411  double hashentrysize,
412  int used_bits,
413  int *log2_npartitions);
414 static void initialize_hash_entry(AggState *aggstate,
415  TupleHashTable hashtable,
416  TupleHashEntry entry);
417 static void lookup_hash_entries(AggState *aggstate);
418 static TupleTableSlot *agg_retrieve_direct(AggState *aggstate);
419 static void agg_fill_hash_table(AggState *aggstate);
420 static bool agg_refill_hash_table(AggState *aggstate);
423 static void hash_agg_check_limits(AggState *aggstate);
424 static void hash_agg_enter_spill_mode(AggState *aggstate);
425 static void hash_agg_update_metrics(AggState *aggstate, bool from_tape,
426  int npartitions);
427 static void hashagg_finish_initial_spills(AggState *aggstate);
428 static void hashagg_reset_spill_state(AggState *aggstate);
429 static HashAggBatch *hashagg_batch_new(LogicalTape *input_tape, int setno,
430  int64 input_tuples, double input_card,
431  int used_bits);
432 static MinimalTuple hashagg_batch_read(HashAggBatch *batch, uint32 *hashp);
433 static void hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset,
434  int used_bits, double input_groups,
435  double hashentrysize);
436 static Size hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
437  TupleTableSlot *inputslot, uint32 hash);
438 static void hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill,
439  int setno);
440 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
441 static void build_pertrans_for_aggref(AggStatePerTrans pertrans,
442  AggState *aggstate, EState *estate,
443  Aggref *aggref, Oid transfn_oid,
444  Oid aggtranstype, Oid aggserialfn,
445  Oid aggdeserialfn, Datum initValue,
446  bool initValueIsNull, Oid *inputTypes,
447  int numArguments);
448 
449 
450 /*
451  * Select the current grouping set; affects current_set and
452  * curaggcontext.
453  */
454 static void
455 select_current_set(AggState *aggstate, int setno, bool is_hash)
456 {
457  /*
458  * When changing this, also adapt ExecAggPlainTransByVal() and
459  * ExecAggPlainTransByRef().
460  */
461  if (is_hash)
462  aggstate->curaggcontext = aggstate->hashcontext;
463  else
464  aggstate->curaggcontext = aggstate->aggcontexts[setno];
465 
466  aggstate->current_set = setno;
467 }
468 
469 /*
470  * Switch to phase "newphase", which must either be 0 or 1 (to reset) or
471  * current_phase + 1. Juggle the tuplesorts accordingly.
472  *
473  * Phase 0 is for hashing, which we currently handle last in the AGG_MIXED
474  * case, so when entering phase 0, all we need to do is drop open sorts.
475  */
476 static void
477 initialize_phase(AggState *aggstate, int newphase)
478 {
479  Assert(newphase <= 1 || newphase == aggstate->current_phase + 1);
480 
481  /*
482  * Whatever the previous state, we're now done with whatever input
483  * tuplesort was in use.
484  */
485  if (aggstate->sort_in)
486  {
487  tuplesort_end(aggstate->sort_in);
488  aggstate->sort_in = NULL;
489  }
490 
491  if (newphase <= 1)
492  {
493  /*
494  * Discard any existing output tuplesort.
495  */
496  if (aggstate->sort_out)
497  {
498  tuplesort_end(aggstate->sort_out);
499  aggstate->sort_out = NULL;
500  }
501  }
502  else
503  {
504  /*
505  * The old output tuplesort becomes the new input one, and this is the
506  * right time to actually sort it.
507  */
508  aggstate->sort_in = aggstate->sort_out;
509  aggstate->sort_out = NULL;
510  Assert(aggstate->sort_in);
511  tuplesort_performsort(aggstate->sort_in);
512  }
513 
514  /*
515  * If this isn't the last phase, we need to sort appropriately for the
516  * next phase in sequence.
517  */
518  if (newphase > 0 && newphase < aggstate->numphases - 1)
519  {
520  Sort *sortnode = aggstate->phases[newphase + 1].sortnode;
521  PlanState *outerNode = outerPlanState(aggstate);
522  TupleDesc tupDesc = ExecGetResultType(outerNode);
523 
524  aggstate->sort_out = tuplesort_begin_heap(tupDesc,
525  sortnode->numCols,
526  sortnode->sortColIdx,
527  sortnode->sortOperators,
528  sortnode->collations,
529  sortnode->nullsFirst,
530  work_mem,
531  NULL, TUPLESORT_NONE);
532  }
533 
534  aggstate->current_phase = newphase;
535  aggstate->phase = &aggstate->phases[newphase];
536 }
537 
538 /*
539  * Fetch a tuple from either the outer plan (for phase 1) or from the sorter
540  * populated by the previous phase. Copy it to the sorter for the next phase
541  * if any.
542  *
543  * Callers cannot rely on memory for tuple in returned slot remaining valid
544  * past any subsequently fetched tuple.
545  */
546 static TupleTableSlot *
547 fetch_input_tuple(AggState *aggstate)
548 {
549  TupleTableSlot *slot;
550 
551  if (aggstate->sort_in)
552  {
553  /* make sure we check for interrupts in either path through here */
554  CHECK_FOR_INTERRUPTS();
555  if (!tuplesort_gettupleslot(aggstate->sort_in, true, false,
556  aggstate->sort_slot, NULL))
557  return NULL;
558  slot = aggstate->sort_slot;
559  }
560  else
561  slot = ExecProcNode(outerPlanState(aggstate));
562 
563  if (!TupIsNull(slot) && aggstate->sort_out)
564  tuplesort_puttupleslot(aggstate->sort_out, slot);
565 
566  return slot;
567 }
568 
569 /*
570  * (Re)Initialize an individual aggregate.
571  *
572  * This function handles only one grouping set, already set in
573  * aggstate->current_set.
574  *
575  * When called, CurrentMemoryContext should be the per-query context.
576  */
577 static void
578 initialize_aggregate(AggState *aggstate, AggStatePerTrans pertrans,
579  AggStatePerGroup pergroupstate)
580 {
581  /*
582  * Start a fresh sort operation for each DISTINCT/ORDER BY aggregate.
583  */
584  if (pertrans->aggsortrequired)
585  {
586  /*
587  * In case of rescan, maybe there could be an uncompleted sort
588  * operation? Clean it up if so.
589  */
590  if (pertrans->sortstates[aggstate->current_set])
591  tuplesort_end(pertrans->sortstates[aggstate->current_set]);
592 
593 
594  /*
595  * We use a plain Datum sorter when there's a single input column;
596  * otherwise sort the full tuple. (See comments for
597  * process_ordered_aggregate_single.)
598  */
599  if (pertrans->numInputs == 1)
600  {
601  Form_pg_attribute attr = TupleDescAttr(pertrans->sortdesc, 0);
602 
603  pertrans->sortstates[aggstate->current_set] =
604  tuplesort_begin_datum(attr->atttypid,
605  pertrans->sortOperators[0],
606  pertrans->sortCollations[0],
607  pertrans->sortNullsFirst[0],
608  work_mem, NULL, TUPLESORT_NONE);
609  }
610  else
611  pertrans->sortstates[aggstate->current_set] =
612  tuplesort_begin_heap(pertrans->sortdesc,
613  pertrans->numSortCols,
614  pertrans->sortColIdx,
615  pertrans->sortOperators,
616  pertrans->sortCollations,
617  pertrans->sortNullsFirst,
618  work_mem, NULL, TUPLESORT_NONE);
619  }
620 
621  /*
622  * (Re)set transValue to the initial value.
623  *
624  * Note that when the initial value is pass-by-ref, we must copy it (into
625  * the aggcontext) since we will pfree the transValue later.
626  */
627  if (pertrans->initValueIsNull)
628  pergroupstate->transValue = pertrans->initValue;
629  else
630  {
631  MemoryContext oldContext;
632 
633  oldContext = MemoryContextSwitchTo(aggstate->curaggcontext->ecxt_per_tuple_memory);
634  pergroupstate->transValue = datumCopy(pertrans->initValue,
635  pertrans->transtypeByVal,
636  pertrans->transtypeLen);
637  MemoryContextSwitchTo(oldContext);
638  }
639  pergroupstate->transValueIsNull = pertrans->initValueIsNull;
640 
641  /*
642  * If the initial value for the transition state doesn't exist in the
643  * pg_aggregate table then we will let the first non-NULL value returned
644  * from the outer procNode become the initial value. (This is useful for
645  * aggregates like max() and min().) The noTransValue flag signals that we
646  * still need to do this.
647  */
648  pergroupstate->noTransValue = pertrans->initValueIsNull;
649 }
650 
651 /*
652  * Initialize all aggregate transition states for a new group of input values.
653  *
654  * If there are multiple grouping sets, we initialize only the first numReset
655  * of them (the grouping sets are ordered so that the most specific one, which
656  * is reset most often, is first). As a convenience, if numReset is 0, we
657  * reinitialize all sets.
658  *
659  * NB: This cannot be used for hash aggregates, as for those the grouping set
660  * number has to be specified from further up.
661  *
662  * When called, CurrentMemoryContext should be the per-query context.
663  */
664 static void
665 initialize_aggregates(AggState *aggstate,
666  AggStatePerGroup *pergroups,
667  int numReset)
668 {
669  int transno;
670  int numGroupingSets = Max(aggstate->phase->numsets, 1);
671  int setno = 0;
672  int numTrans = aggstate->numtrans;
673  AggStatePerTrans transstates = aggstate->pertrans;
674 
675  if (numReset == 0)
676  numReset = numGroupingSets;
677 
678  for (setno = 0; setno < numReset; setno++)
679  {
680  AggStatePerGroup pergroup = pergroups[setno];
681 
682  select_current_set(aggstate, setno, false);
683 
684  for (transno = 0; transno < numTrans; transno++)
685  {
686  AggStatePerTrans pertrans = &transstates[transno];
687  AggStatePerGroup pergroupstate = &pergroup[transno];
688 
689  initialize_aggregate(aggstate, pertrans, pergroupstate);
690  }
691  }
692 }
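
/*
 * Illustrative sketch only: the reset rule used by initialize_aggregates(),
 * reduced to one transition value per grouping set.  Sets are ordered from
 * most specific to least specific, so a group boundary resets a prefix of
 * the array.  toy_reset_sets is a hypothetical helper, and zero stands in
 * for the aggregate's initial value.
 */
```c
#include <assert.h>
#include <stddef.h>

/*
 * Reset the first num_reset transition values.  As in the function above,
 * num_reset == 0 is a convenience meaning "reset all sets".
 */
static void
toy_reset_sets(long *transvalues, size_t nsets, size_t num_reset)
{
	assert(transvalues != NULL || nsets == 0);
	assert(num_reset <= nsets);

	if (num_reset == 0)
		num_reset = nsets;

	for (size_t setno = 0; setno < num_reset; setno++)
		transvalues[setno] = 0;
}
```
/*
 * For a rollup such as (a,b,c), (a,b), (a), a change in column c alone would
 * correspond to num_reset = 1: only the finest-grained state restarts while
 * the coarser ones keep accumulating.
 */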
693 
694 /*
695  * Given new input value(s), advance the transition function of one aggregate
696  * state within one grouping set only (already set in aggstate->current_set)
697  *
698  * The new values (and null flags) have been preloaded into argument positions
699  * 1 and up in pertrans->transfn_fcinfo, so that we needn't copy them again to
700  * pass to the transition function. We also expect that the static fields of
701  * the fcinfo are already initialized; that was done by ExecInitAgg().
702  *
703  * It doesn't matter which memory context this is called in.
704  */
705 static void
706 advance_transition_function(AggState *aggstate,
707  AggStatePerTrans pertrans,
708  AggStatePerGroup pergroupstate)
709 {
710  FunctionCallInfo fcinfo = pertrans->transfn_fcinfo;
711  MemoryContext oldContext;
712  Datum newVal;
713 
714  if (pertrans->transfn.fn_strict)
715  {
716  /*
717  * For a strict transfn, nothing happens when there's a NULL input; we
718  * just keep the prior transValue.
719  */
720  int numTransInputs = pertrans->numTransInputs;
721  int i;
722 
723  for (i = 1; i <= numTransInputs; i++)
724  {
725  if (fcinfo->args[i].isnull)
726  return;
727  }
728  if (pergroupstate->noTransValue)
729  {
730  /*
731  * transValue has not been initialized. This is the first non-NULL
732  * input value. We use it as the initial value for transValue. (We
733  * already checked that the agg's input type is binary-compatible
734  * with its transtype, so straight copy here is OK.)
735  *
736  * We must copy the datum into aggcontext if it is pass-by-ref. We
737  * do not need to pfree the old transValue, since it's NULL.
738  */
740  pergroupstate->transValue = datumCopy(fcinfo->args[1].value,
741  pertrans->transtypeByVal,
742  pertrans->transtypeLen);
743  pergroupstate->transValueIsNull = false;
744  pergroupstate->noTransValue = false;
745  MemoryContextSwitchTo(oldContext);
746  return;
747  }
748  if (pergroupstate->transValueIsNull)
749  {
750  /*
751  * Don't call a strict function with NULL inputs. Note it is
752  * possible to get here despite the above tests, if the transfn is
753  * strict *and* returned a NULL on a prior cycle. If that happens
754  * we will propagate the NULL all the way to the end.
755  */
756  return;
757  }
758  }
759 
760  /* We run the transition functions in per-input-tuple memory context */
761  oldContext = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
762 
763  /* set up aggstate->curpertrans for AggGetAggref() */
764  aggstate->curpertrans = pertrans;
765 
766  /*
767  * OK to call the transition function
768  */
769  fcinfo->args[0].value = pergroupstate->transValue;
770  fcinfo->args[0].isnull = pergroupstate->transValueIsNull;
771  fcinfo->isnull = false; /* just in case transfn doesn't set it */
772 
773  newVal = FunctionCallInvoke(fcinfo);
774 
775  aggstate->curpertrans = NULL;
776 
777  /*
778  * If pass-by-ref datatype, must copy the new value into aggcontext and
779  * free the prior transValue. But if transfn returned a pointer to its
780  * first input, we don't need to do anything.
781  *
782  * It's safe to compare newVal with pergroup->transValue without regard
783  * for either being NULL, because ExecAggCopyTransValue takes care to set
784  * transValue to 0 when NULL. Otherwise we could end up accidentally not
785  * reparenting, when the transValue has the same numerical value as
786  * newValue, despite being NULL. This is a somewhat hot path, making it
787  * undesirable to instead solve this with another branch for the common
788  * case of the transition function returning its (modified) input
789  * argument.
790  */
791  if (!pertrans->transtypeByVal &&
792  DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue))
793  newVal = ExecAggCopyTransValue(aggstate, pertrans,
794  newVal, fcinfo->isnull,
795  pergroupstate->transValue,
796  pergroupstate->transValueIsNull);
797 
798  pergroupstate->transValue = newVal;
799  pergroupstate->transValueIsNull = fcinfo->isnull;
800 
801  MemoryContextSwitchTo(oldContext);
802 }
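/*
 * Illustrative sketch only, not part of nodeAgg.c: a toy model of the strict
 * transition-function rules described above.  ToyGroupState, toy_transfn,
 * toy_max and toy_advance_strict are hypothetical names; values are plain
 * longs rather than Datums, and no memory-context handling is modelled.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-group transition state. */
typedef struct ToyGroupState
{
	long		transValue;
	bool		transValueIsNull;
	bool		noTransValue;	/* true until the first non-NULL input */
} ToyGroupState;

typedef long (*toy_transfn) (long state, long input);

/* Example strict transition function: running maximum. */
long
toy_max(long state, long input)
{
	return (input > state) ? input : state;
}

/* Advance a strict transition function by one (possibly NULL) input. */
void
toy_advance_strict(ToyGroupState *st, toy_transfn fn,
				   long input, bool inputIsNull)
{
	assert(st != NULL && fn != NULL);

	/* NULL input: nothing happens, the prior state is kept. */
	if (inputIsNull)
		return;

	/* First non-NULL input becomes the initial transition value. */
	if (st->noTransValue)
	{
		st->transValue = input;
		st->transValueIsNull = false;
		st->noTransValue = false;
		return;
	}

	/* State went NULL on an earlier call: keep propagating the NULL. */
	if (st->transValueIsNull)
		return;

	st->transValue = fn(st->transValue, input);
}
```

/*
 * Starting from {0, true, true}, feeding NULL, 5, 3, 9 through toy_max
 * leaves transValue at 9; the leading NULL is ignored and 5 seeds the state.
 */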
803 
804 /*
805  * Advance each aggregate transition state for one input tuple. The input
806  * tuple has been stored in tmpcontext->ecxt_outertuple, so that it is
807  * accessible to ExecEvalExpr.
808  *
809  * We have two sets of transition states to handle: one for sorted aggregation
810  * and one for hashed; we do them both here, to avoid multiple evaluation of
811  * the inputs.
812  *
813  * When called, CurrentMemoryContext should be the per-query context.
814  */
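815 static void
816 advance_aggregates(AggState *aggstate)
817 {
818  bool dummynull;
819 
820  ExecEvalExprSwitchContext(aggstate->phase->evaltrans,
821  aggstate->tmpcontext,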
822  &dummynull);
823 }
824 
825 /*
826  * Run the transition function for a DISTINCT or ORDER BY aggregate
827  * with only one input. This is called after we have completed
828  * entering all the input values into the sort object. We complete the
829  * sort, read out the values in sorted order, and run the transition
830  * function on each value (applying DISTINCT if appropriate).
831  *
832  * Note that the strictness of the transition function was checked when
833  * entering the values into the sort, so we don't check it again here;
834  * we just apply standard SQL DISTINCT logic.
835  *
836  * The one-input case is handled separately from the multi-input case
837  * for performance reasons: for single by-value inputs, such as the
838  * common case of count(distinct id), the tuplesort_getdatum code path
839  * is around 300% faster. (The speedup for by-reference types is less
840  * but still noticeable.)
841  *
842  * This function handles only one grouping set (already set in
843  * aggstate->current_set).
844  *
845  * When called, CurrentMemoryContext should be the per-query context.
846  */
847 static void
848 process_ordered_aggregate_single(AggState *aggstate,
849  AggStatePerTrans pertrans,
850  AggStatePerGroup pergroupstate)
851 {
852  Datum oldVal = (Datum) 0;
853  bool oldIsNull = true;
854  bool haveOldVal = false;
855  MemoryContext workcontext = aggstate->tmpcontext->ecxt_per_tuple_memory;
856  MemoryContext oldContext;
857  bool isDistinct = (pertrans->numDistinctCols > 0);
858  Datum newAbbrevVal = (Datum) 0;
859  Datum oldAbbrevVal = (Datum) 0;
860  FunctionCallInfo fcinfo = pertrans->transfn_fcinfo;
861  Datum *newVal;
862  bool *isNull;
863 
864  Assert(pertrans->numDistinctCols < 2);
865 
866  tuplesort_performsort(pertrans->sortstates[aggstate->current_set]);
867 
868  /* Load the column into argument 1 (arg 0 will be transition value) */
869  newVal = &fcinfo->args[1].value;
870  isNull = &fcinfo->args[1].isnull;
871 
872  /*
873  * Note: if input type is pass-by-ref, the datums returned by the sort are
874  * freshly palloc'd in the per-query context, so we must be careful to
875  * pfree them when they are no longer needed.
876  */
877 
878  while (tuplesort_getdatum(pertrans->sortstates[aggstate->current_set],
879  true, false, newVal, isNull, &newAbbrevVal))
880  {
881  /*
882  * Clear and select the working context for evaluation of the equality
883  * function and transition function.
884  */
885  MemoryContextReset(workcontext);
886  oldContext = MemoryContextSwitchTo(workcontext);
887 
888  /*
889  * If DISTINCT mode, and not distinct from prior, skip it.
890  */
891  if (isDistinct &&
892  haveOldVal &&
893  ((oldIsNull && *isNull) ||
894  (!oldIsNull && !*isNull &&
895  oldAbbrevVal == newAbbrevVal &&
896  DatumGetBool(FunctionCall2Coll(&pertrans->equalfnOne,
897  pertrans->aggCollation,
898  oldVal, *newVal)))))
899  {
900  MemoryContextSwitchTo(oldContext);
901  continue;
902  }
903  else
904  {
905  advance_transition_function(aggstate, pertrans, pergroupstate);
906 
907  MemoryContextSwitchTo(oldContext);
908 
909  /*
910  * Forget the old value, if any, and remember the new one for
911  * subsequent equality checks.
912  */
913  if (!pertrans->inputtypeByVal)
914  {
915  if (!oldIsNull)
916  pfree(DatumGetPointer(oldVal));
917  if (!*isNull)
918  oldVal = datumCopy(*newVal, pertrans->inputtypeByVal,
919  pertrans->inputtypeLen);
920  }
921  else
922  oldVal = *newVal;
923  oldAbbrevVal = newAbbrevVal;
924  oldIsNull = *isNull;
925  haveOldVal = true;
926  }
927  }
928 
929  if (!oldIsNull && !pertrans->inputtypeByVal)
930  pfree(DatumGetPointer(oldVal));
931 
932  tuplesort_end(pertrans->sortstates[aggstate->current_set]);
933  pertrans->sortstates[aggstate->current_set] = NULL;
934 }
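/*
 * Illustrative sketch only, not part of nodeAgg.c: the DISTINCT test above
 * reduced to its core.  The input is assumed to be already sorted; a value
 * is skipped when it is "not distinct from" the previously accepted one,
 * with two NULLs counting as equal.  ToyDatum and toy_count_distinct_sorted
 * are hypothetical names, and incrementing a counter stands in for the call
 * to advance_transition_function().
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a (Datum, isnull) pair. */
typedef struct ToyDatum
{
	long		value;
	bool		isnull;
} ToyDatum;

/* Return how many sorted inputs would reach the transition function. */
size_t
toy_count_distinct_sorted(const ToyDatum *vals, size_t n)
{
	size_t		accepted = 0;
	bool		haveOld = false;
	ToyDatum	old = {0, true};

	assert(vals != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		bool		same = haveOld &&
			((old.isnull && vals[i].isnull) ||
			 (!old.isnull && !vals[i].isnull &&
			  old.value == vals[i].value));

		if (same)
			continue;			/* not distinct from prior: skip it */

		accepted++;				/* stand-in for the transition call */
		old = vals[i];			/* remember for the next comparison */
		haveOld = true;
	}
	return accepted;
}
```

/*
 * For the sorted input 1, 1, 2, NULL, NULL the function returns 3: one call
 * each for 1, 2 and the first NULL.
 */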
935 
936 /*
937  * Run the transition function for a DISTINCT or ORDER BY aggregate
938  * with more than one input. This is called after we have completed
939  * entering all the input values into the sort object. We complete the
940  * sort, read out the values in sorted order, and run the transition
941  * function on each value (applying DISTINCT if appropriate).
942  *
943  * This function handles only one grouping set (already set in
944  * aggstate->current_set).
945  *
946  * When called, CurrentMemoryContext should be the per-query context.
947  */
948 static void
949 process_ordered_aggregate_multi(AggState *aggstate,
950  AggStatePerTrans pertrans,
951  AggStatePerGroup pergroupstate)
952 {
953  ExprContext *tmpcontext = aggstate->tmpcontext;
954  FunctionCallInfo fcinfo = pertrans->transfn_fcinfo;
955  TupleTableSlot *slot1 = pertrans->sortslot;
956  TupleTableSlot *slot2 = pertrans->uniqslot;
957  int numTransInputs = pertrans->numTransInputs;
958  int numDistinctCols = pertrans->numDistinctCols;
959  Datum newAbbrevVal = (Datum) 0;
960  Datum oldAbbrevVal = (Datum) 0;
961  bool haveOldValue = false;
962  TupleTableSlot *save = aggstate->tmpcontext->ecxt_outertuple;
963  int i;
964 
965  tuplesort_performsort(pertrans->sortstates[aggstate->current_set]);
966 
967  ExecClearTuple(slot1);
968  if (slot2)
969  ExecClearTuple(slot2);
970 
971  while (tuplesort_gettupleslot(pertrans->sortstates[aggstate->current_set],
972  true, true, slot1, &newAbbrevVal))
973  {
974  CHECK_FOR_INTERRUPTS();
975 
976  tmpcontext->ecxt_outertuple = slot1;
977  tmpcontext->ecxt_innertuple = slot2;
978 
979  if (numDistinctCols == 0 ||
980  !haveOldValue ||
981  newAbbrevVal != oldAbbrevVal ||
982  !ExecQual(pertrans->equalfnMulti, tmpcontext))
983  {
984  /*
985  * Extract the first numTransInputs columns as datums to pass to
986  * the transfn.
987  */
988  slot_getsomeattrs(slot1, numTransInputs);
989 
990  /* Load values into fcinfo */
991  /* Start from 1, since the 0th arg will be the transition value */
992  for (i = 0; i < numTransInputs; i++)
993  {
994  fcinfo->args[i + 1].value = slot1->tts_values[i];
995  fcinfo->args[i + 1].isnull = slot1->tts_isnull[i];
996  }
997 
998  advance_transition_function(aggstate, pertrans, pergroupstate);
999 
1000  if (numDistinctCols > 0)
1001  {
1002  /* swap the slot pointers to retain the current tuple */
1003  TupleTableSlot *tmpslot = slot2;
1004 
1005  slot2 = slot1;
1006  slot1 = tmpslot;
1007  /* avoid ExecQual() calls by reusing abbreviated keys */
1008  oldAbbrevVal = newAbbrevVal;
1009  haveOldValue = true;
1010  }
1011  }
1012 
1013  /* Reset context each time */
1014  ResetExprContext(tmpcontext);
1015 
1016  ExecClearTuple(slot1);
1017  }
1018 
1019  if (slot2)
1020  ExecClearTuple(slot2);
1021 
1022  tuplesort_end(pertrans->sortstates[aggstate->current_set]);
1023  pertrans->sortstates[aggstate->current_set] = NULL;
1024 
1025  /* restore previous slot, potentially in use for grouping sets */
1026  tmpcontext->ecxt_outertuple = save;
1027 }
1028 
1029 /*
1030  * Compute the final value of one aggregate.
1031  *
1032  * This function handles only one grouping set (already set in
1033  * aggstate->current_set).
1034  *
1035  * The finalfn will be run, and the result delivered, in the
1036  * output-tuple context; caller's CurrentMemoryContext does not matter.
1037  * (But note that in some cases, such as when there is no finalfn, the
1038  * result might be a pointer to or into the agg's transition value.)
1039  *
1040  * The finalfn uses the state as set in the transno. This also might be
1041  * being used by another aggregate function, so it's important that we do
1042  * nothing destructive here. Moreover, the aggregate's final value might
1043  * get used in multiple places, so we mustn't return a R/W expanded datum.
1044  */
1045 static void
1046 finalize_aggregate(AggState *aggstate,
1047  AggStatePerAgg peragg,
1048  AggStatePerGroup pergroupstate,
1049  Datum *resultVal, bool *resultIsNull)
1050 {
1051  LOCAL_FCINFO(fcinfo, FUNC_MAX_ARGS);
1052  bool anynull = false;
1053  MemoryContext oldContext;
1054  int i;
1055  ListCell *lc;
1056  AggStatePerTrans pertrans = &aggstate->pertrans[peragg->transno];
1057 
1058  oldContext = MemoryContextSwitchTo(aggstate->ss.ps.ps_ExprContext->ecxt_per_tuple_memory);
1059 
1060  /*
1061  * Evaluate any direct arguments. We do this even if there's no finalfn
1062  * (which is unlikely anyway), so that side-effects happen as expected.
1063  * The direct arguments go into arg positions 1 and up, leaving position 0
1064  * for the transition state value.
1065  */
1066  i = 1;
1067  foreach(lc, peragg->aggdirectargs)
1068  {
1069  ExprState *expr = (ExprState *) lfirst(lc);
1070 
1071  fcinfo->args[i].value = ExecEvalExpr(expr,
1072  aggstate->ss.ps.ps_ExprContext,
1073  &fcinfo->args[i].isnull);
1074  anynull |= fcinfo->args[i].isnull;
1075  i++;
1076  }
1077 
1078  /*
1079  * Apply the agg's finalfn if one is provided, else return transValue.
1080  */
1081  if (OidIsValid(peragg->finalfn_oid))
1082  {
1083  int numFinalArgs = peragg->numFinalArgs;
1084 
1085  /* set up aggstate->curperagg for AggGetAggref() */
1086  aggstate->curperagg = peragg;
1087 
1088  InitFunctionCallInfoData(*fcinfo, &peragg->finalfn,
1089  numFinalArgs,
1090  pertrans->aggCollation,
1091  (void *) aggstate, NULL);
1092 
1093  /* Fill in the transition state value */
1094  fcinfo->args[0].value =
1095  MakeExpandedObjectReadOnly(pergroupstate->transValue,
1096  pergroupstate->transValueIsNull,
1097  pertrans->transtypeLen);
1098  fcinfo->args[0].isnull = pergroupstate->transValueIsNull;
1099  anynull |= pergroupstate->transValueIsNull;
1100 
1101  /* Fill any remaining argument positions with nulls */
1102  for (; i < numFinalArgs; i++)
1103  {
1104  fcinfo->args[i].value = (Datum) 0;
1105  fcinfo->args[i].isnull = true;
1106  anynull = true;
1107  }
1108 
1109  if (fcinfo->flinfo->fn_strict && anynull)
1110  {
1111  /* don't call a strict function with NULL inputs */
1112  *resultVal = (Datum) 0;
1113  *resultIsNull = true;
1114  }
1115  else
1116  {
1117  Datum result;
1118 
1119  result = FunctionCallInvoke(fcinfo);
1120  *resultIsNull = fcinfo->isnull;
1121  *resultVal = MakeExpandedObjectReadOnly(result,
1122  fcinfo->isnull,
1123  peragg->resulttypeLen);
1124  }
1125  aggstate->curperagg = NULL;
1126  }
1127  else
1128  {
1129  *resultVal =
1130  MakeExpandedObjectReadOnly(pergroupstate->transValue,
1131  pergroupstate->transValueIsNull,
1132  pertrans->transtypeLen);
1133  *resultIsNull = pergroupstate->transValueIsNull;
1134  }
1135 
1136  MemoryContextSwitchTo(oldContext);
1137 }
1138 
1139 /*
1140  * Compute the output value of one partial aggregate.
1141  *
1142  * The serialization function will be run, and the result delivered, in the
1143  * output-tuple context; caller's CurrentMemoryContext does not matter.
1144  */
1145 static void
1146 finalize_partialaggregate(AggState *aggstate,
1147  AggStatePerAgg peragg,
1148  AggStatePerGroup pergroupstate,
1149  Datum *resultVal, bool *resultIsNull)
1150 {
1151  AggStatePerTrans pertrans = &aggstate->pertrans[peragg->transno];
1152  MemoryContext oldContext;
1153 
1154  oldContext = MemoryContextSwitchTo(aggstate->ss.ps.ps_ExprContext->ecxt_per_tuple_memory);
1155 
1156  /*
1157  * serialfn_oid will be set if we must serialize the transvalue before
1158  * returning it
1159  */
1160  if (OidIsValid(pertrans->serialfn_oid))
1161  {
1162  /* Don't call a strict serialization function with NULL input. */
1163  if (pertrans->serialfn.fn_strict && pergroupstate->transValueIsNull)
1164  {
1165  *resultVal = (Datum) 0;
1166  *resultIsNull = true;
1167  }
1168  else
1169  {
1170  FunctionCallInfo fcinfo = pertrans->serialfn_fcinfo;
1171  Datum result;
1172 
1173  fcinfo->args[0].value =
1174  MakeExpandedObjectReadOnly(pergroupstate->transValue,
1175  pergroupstate->transValueIsNull,
1176  pertrans->transtypeLen);
1177  fcinfo->args[0].isnull = pergroupstate->transValueIsNull;
1178  fcinfo->isnull = false;
1179 
1180  result = FunctionCallInvoke(fcinfo);
1181  *resultIsNull = fcinfo->isnull;
1182  *resultVal = MakeExpandedObjectReadOnly(result,
1183  fcinfo->isnull,
1184  peragg->resulttypeLen);
1185  }
1186  }
1187  else
1188  {
1189  *resultVal =
1190  MakeExpandedObjectReadOnly(pergroupstate->transValue,
1191  pergroupstate->transValueIsNull,
1192  pertrans->transtypeLen);
1193  *resultIsNull = pergroupstate->transValueIsNull;
1194  }
1195 
1196  MemoryContextSwitchTo(oldContext);
1197 }
1198 
1199 /*
1200  * Extract the attributes that make up the grouping key into the
1201  * hashslot. This is necessary to compute the hash or perform a lookup.
1202  */
1203 static inline void
1204 prepare_hash_slot(AggStatePerHash perhash,
1205  TupleTableSlot *inputslot,
1206  TupleTableSlot *hashslot)
1207 {
1208  int i;
1209 
1210  /* transfer just the needed columns into hashslot */
1211  slot_getsomeattrs(inputslot, perhash->largestGrpColIdx);
1212  ExecClearTuple(hashslot);
1213 
1214  for (i = 0; i < perhash->numhashGrpCols; i++)
1215  {
1216  int varNumber = perhash->hashGrpColIdxInput[i] - 1;
1217 
1218  hashslot->tts_values[i] = inputslot->tts_values[varNumber];
1219  hashslot->tts_isnull[i] = inputslot->tts_isnull[varNumber];
1220  }
1221  ExecStoreVirtualTuple(hashslot);
1222 }
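/*
 * Illustrative sketch only, not part of nodeAgg.c: the column transfer above
 * shown on plain arrays.  The index array holds 1-based input column
 * numbers, as hashGrpColIdxInput does, so each entry is decremented before
 * it is used as an array subscript.  toy_project_columns is a hypothetical
 * name.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Copy the selected input columns (and null flags) into the output arrays. */
void
toy_project_columns(const long *inValues, const bool *inNulls,
					const int *colIdx, int ncols,
					long *outValues, bool *outNulls)
{
	assert(inValues != NULL && inNulls != NULL && colIdx != NULL);
	assert(outValues != NULL && outNulls != NULL);

	for (int i = 0; i < ncols; i++)
	{
		int			varNumber = colIdx[i] - 1;	/* 1-based to 0-based */

		assert(varNumber >= 0);
		outValues[i] = inValues[varNumber];
		outNulls[i] = inNulls[varNumber];
	}
}
```

/*
 * With input values {10, 20, 30, 40}, the second of them NULL, and column
 * numbers {3, 2}, the output is {30, 20} with the second entry flagged NULL.
 */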
1223 
1224 /*
1225  * Prepare to finalize and project based on the specified representative tuple
1226  * slot and grouping set.
1227  *
1228  * In the specified tuple slot, force to null all attributes that should be
1229  * read as null in the context of the current grouping set. Also stash the
1230  * current group bitmap where GroupingExpr can get at it.
1231  *
1232  * This relies on three conditions:
1233  *
1234  * 1) Nothing is ever going to try and extract the whole tuple from this slot,
1235  * only reference it in evaluations, which will only access individual
1236  * attributes.
1237  *
1238  * 2) No system columns are going to need to be nulled. (If a system column is
1239  * referenced in a group clause, it is actually projected in the outer plan
1240  * tlist.)
1241  *
1242  * 3) Within a given phase, we never need to recover the value of an attribute
1243  * once it has been set to null.
1244  *
1245  * Poking into the slot this way is a bit ugly, but the consensus is that the
1246  * alternative was worse.
1247  */
1248 static void
1249 prepare_projection_slot(AggState *aggstate, TupleTableSlot *slot, int currentSet)
1250 {
1251  if (aggstate->phase->grouped_cols)
1252  {
1253  Bitmapset *grouped_cols = aggstate->phase->grouped_cols[currentSet];
1254 
1255  aggstate->grouped_cols = grouped_cols;
1256 
1257  if (TTS_EMPTY(slot))
1258  {
1259  /*
1260  * Force all values to be NULL if working on an empty input tuple
1261  * (i.e. an empty grouping set for which no input rows were
1262  * supplied).
1263  */
1264  ExecStoreAllNullTuple(slot);
1265  }
1266  else if (aggstate->all_grouped_cols)
1267  {
1268  ListCell *lc;
1269 
1270  /* all_grouped_cols is arranged in desc order */
1271  slot_getsomeattrs(slot, linitial_int(aggstate->all_grouped_cols));
1272 
1273  foreach(lc, aggstate->all_grouped_cols)
1274  {
1275  int attnum = lfirst_int(lc);
1276 
1277  if (!bms_is_member(attnum, grouped_cols))
1278  slot->tts_isnull[attnum - 1] = true;
1279  }
1280  }
1281  }
1282 }
1283 
1284 /*
1285  * Compute the final value of all aggregates for one group.
1286  *
1287  * This function handles only one grouping set at a time, which the caller must
1288  * have selected. It's also the caller's responsibility to adjust the supplied
1289  * pergroup parameter to point to the current set's transvalues.
1290  *
1291  * Results are stored in the output econtext aggvalues/aggnulls.
1292  */
1293 static void
1294 finalize_aggregates(AggState *aggstate,
1295  AggStatePerAgg peraggs,
1296  AggStatePerGroup pergroup)
1297 {
1298  ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
1299  Datum *aggvalues = econtext->ecxt_aggvalues;
1300  bool *aggnulls = econtext->ecxt_aggnulls;
1301  int aggno;
1302 
1303  /*
1304  * If there were any DISTINCT and/or ORDER BY aggregates, sort their
1305  * inputs and run the transition functions.
1306  */
1307  for (int transno = 0; transno < aggstate->numtrans; transno++)
1308  {
1309  AggStatePerTrans pertrans = &aggstate->pertrans[transno];
1310  AggStatePerGroup pergroupstate;
1311 
1312  pergroupstate = &pergroup[transno];
1313 
1314  if (pertrans->aggsortrequired)
1315  {
1316  Assert(aggstate->aggstrategy != AGG_HASHED &&
1317  aggstate->aggstrategy != AGG_MIXED);
1318 
1319  if (pertrans->numInputs == 1)
1320  process_ordered_aggregate_single(aggstate,
1321  pertrans,
1322  pergroupstate);
1323  else
1324  process_ordered_aggregate_multi(aggstate,
1325  pertrans,
1326  pergroupstate);
1327  }
1328  else if (pertrans->numDistinctCols > 0 && pertrans->haslast)
1329  {
1330  pertrans->haslast = false;
1331 
1332  if (pertrans->numDistinctCols == 1)
1333  {
1334  if (!pertrans->inputtypeByVal && !pertrans->lastisnull)
1335  pfree(DatumGetPointer(pertrans->lastdatum));
1336 
1337  pertrans->lastisnull = false;
1338  pertrans->lastdatum = (Datum) 0;
1339  }
1340  else
1341  ExecClearTuple(pertrans->uniqslot);
1342  }
1343  }
1344 
1345  /*
1346  * Run the final functions.
1347  */
1348  for (aggno = 0; aggno < aggstate->numaggs; aggno++)
1349  {
1350  AggStatePerAgg peragg = &peraggs[aggno];
1351  int transno = peragg->transno;
1352  AggStatePerGroup pergroupstate;
1353 
1354  pergroupstate = &pergroup[transno];
1355 
1356  if (DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit))
1357  finalize_partialaggregate(aggstate, peragg, pergroupstate,
1358  &aggvalues[aggno], &aggnulls[aggno]);
1359  else
1360  finalize_aggregate(aggstate, peragg, pergroupstate,
1361  &aggvalues[aggno], &aggnulls[aggno]);
1362  }
1363 }
1364 
1365 /*
1366  * Project the result of a group (whose aggs have already been calculated by
1367  * finalize_aggregates). Returns the result slot, or NULL if no row is
1368  * projected (suppressed by qual).
1369  */
1370 static TupleTableSlot *
1371 project_aggregates(AggState *aggstate)
1372 {
1373  ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
1374 
1375  /*
1376  * Check the qual (HAVING clause); if the group does not match, ignore it.
1377  */
1378  if (ExecQual(aggstate->ss.ps.qual, econtext))
1379  {
1380  /*
1381  * Form and return projection tuple using the aggregate results and
1382  * the representative input tuple.
1383  */
1384  return ExecProject(aggstate->ss.ps.ps_ProjInfo);
1385  }
1386  else
1387  InstrCountFiltered1(aggstate, 1);
1388 
1389  return NULL;
1390 }
1391 
1392 /*
1393  * Find input-tuple columns that are needed, dividing them into
1394  * aggregated and unaggregated sets.
1395  */
1396 static void
1397 find_cols(AggState *aggstate, Bitmapset **aggregated, Bitmapset **unaggregated)
1398 {
1399  Agg *agg = (Agg *) aggstate->ss.ps.plan;
1400  FindColsContext context;
1401 
1402  context.is_aggref = false;
1403  context.aggregated = NULL;
1404  context.unaggregated = NULL;
1405 
1406  /* Examine tlist and quals */
1407  (void) find_cols_walker((Node *) agg->plan.targetlist, &context);
1408  (void) find_cols_walker((Node *) agg->plan.qual, &context);
1409 
1410  /* In some cases, grouping columns will not appear in the tlist */
1411  for (int i = 0; i < agg->numCols; i++)
1412  context.unaggregated = bms_add_member(context.unaggregated,
1413  agg->grpColIdx[i]);
1414 
1415  *aggregated = context.aggregated;
1416  *unaggregated = context.unaggregated;
1417 }
1418 
1419 static bool
1420 find_cols_walker(Node *node, FindColsContext *context)
1421 {
1422  if (node == NULL)
1423  return false;
1424  if (IsA(node, Var))
1425  {
1426  Var *var = (Var *) node;
1427 
1428  /* setrefs.c should have set the varno to OUTER_VAR */
1429  Assert(var->varno == OUTER_VAR);
1430  Assert(var->varlevelsup == 0);
1431  if (context->is_aggref)
1432  context->aggregated = bms_add_member(context->aggregated,
1433  var->varattno);
1434  else
1435  context->unaggregated = bms_add_member(context->unaggregated,
1436  var->varattno);
1437  return false;
1438  }
1439  if (IsA(node, Aggref))
1440  {
1441  Assert(!context->is_aggref);
1442  context->is_aggref = true;
1443  expression_tree_walker(node, find_cols_walker, (void *) context);
1444  context->is_aggref = false;
1445  return false;
1446  }
1447  return expression_tree_walker(node, find_cols_walker,
1448  (void *) context);
1449 }
1450 
1451 /*
1452  * (Re-)initialize the hash table(s) to empty.
1453  *
1454  * To implement hashed aggregation, we need a hashtable that stores a
1455  * representative tuple and an array of AggStatePerGroup structs for each
1456  * distinct set of GROUP BY column values. We compute the hash key from the
1457  * GROUP BY columns. The per-group data is allocated in lookup_hash_entry(),
1458  * for each entry.
1459  *
1460  * We have a separate hashtable and associated perhash data structure for each
1461  * grouping set for which we're doing hashing.
1462  *
1463  * The contents of the hash tables always live in the hashcontext's per-tuple
1464  * memory context (there is only one of these for all tables together, since
1465  * they are all reset at the same time).
1466  */
1467 static void
1468 build_hash_tables(AggState *aggstate)
1469 {
1470  int setno;
1471 
1472  for (setno = 0; setno < aggstate->num_hashes; ++setno)
1473  {
1474  AggStatePerHash perhash = &aggstate->perhash[setno];
1475  long nbuckets;
1476  Size memory;
1477 
1478  if (perhash->hashtable != NULL)
1479  {
1480  ResetTupleHashTable(perhash->hashtable);
1481  continue;
1482  }
1483 
1484  Assert(perhash->aggnode->numGroups > 0);
1485 
1486  memory = aggstate->hash_mem_limit / aggstate->num_hashes;
1487 
1488  /* choose reasonable number of buckets per hashtable */
1489  nbuckets = hash_choose_num_buckets(aggstate->hashentrysize,
1490  perhash->aggnode->numGroups,
1491  memory);
1492 
1493  build_hash_table(aggstate, setno, nbuckets);
1494  }
1495 
1496  aggstate->hash_ngroups_current = 0;
1497 }
1498 
1499 /*
1500  * Build a single hashtable for this grouping set.
1501  */
1502 static void
1503 build_hash_table(AggState *aggstate, int setno, long nbuckets)
1504 {
1505  AggStatePerHash perhash = &aggstate->perhash[setno];
1506  MemoryContext metacxt = aggstate->hash_metacxt;
1507  MemoryContext hashcxt = aggstate->hashcontext->ecxt_per_tuple_memory;
1508  MemoryContext tmpcxt = aggstate->tmpcontext->ecxt_per_tuple_memory;
1509  Size additionalsize;
1510 
1511  Assert(aggstate->aggstrategy == AGG_HASHED ||
1512  aggstate->aggstrategy == AGG_MIXED);
1513 
1514  /*
1515  * Used to make sure initial hash table allocation does not exceed
1516  * hash_mem. Note that the estimate does not include space for
1517  * pass-by-reference transition data values, nor for the representative
1518  * tuple of each group.
1519  */
1520  additionalsize = aggstate->numtrans * sizeof(AggStatePerGroupData);
1521 
1522  perhash->hashtable = BuildTupleHashTableExt(&aggstate->ss.ps,
1523  perhash->hashslot->tts_tupleDescriptor,
1524  perhash->numCols,
1525  perhash->hashGrpColIdxHash,
1526  perhash->eqfuncoids,
1527  perhash->hashfunctions,
1528  perhash->aggnode->grpCollations,
1529  nbuckets,
1530  additionalsize,
1531  metacxt,
1532  hashcxt,
1533  tmpcxt,
1534  DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit));
1535 }
1536 
1537 /*
1538  * Compute columns that actually need to be stored in hashtable entries. The
1539  * incoming tuples from the child plan node will contain grouping columns,
1540  * other columns referenced in our targetlist and qual, columns used to
1541  * compute the aggregate functions, and perhaps just junk columns we don't use
1542  * at all. Only columns of the first two types need to be stored in the
1543  * hashtable, and getting rid of the others can make the table entries
1544  * significantly smaller. The hashtable only contains the relevant columns,
1545  * and is packed/unpacked in lookup_hash_entry() / agg_retrieve_hash_table()
1546  * into the format of the normal input descriptor.
1547  *
1548  * Additional columns, in addition to the columns grouped by, come from two
1549  * sources: Firstly functionally dependent columns that we don't need to group
1550  * by themselves, and secondly ctids for row-marks.
1551  *
1552  * To eliminate duplicates, we build a bitmapset of the needed columns, and
1553  * then build an array of the columns included in the hashtable. We might
1554  * still have duplicates if the passed-in grpColIdx has them, which can happen
1555  * in edge cases from semijoins/distinct; these can't always be removed,
1556  * because it's not certain that the duplicate cols will be using the same
1557  * hash function.
1558  *
1559  * Note that the array is preserved over ExecReScanAgg, so we allocate it in
1560  * the per-query context (unlike the hash table itself).
1561  */
1562 static void
1563 find_hash_columns(AggState *aggstate)
1564 {
1565  Bitmapset *base_colnos;
1566  Bitmapset *aggregated_colnos;
1567  TupleDesc scanDesc = aggstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
1568  List *outerTlist = outerPlanState(aggstate)->plan->targetlist;
1569  int numHashes = aggstate->num_hashes;
1570  EState *estate = aggstate->ss.ps.state;
1571  int j;
1572 
1573  /* Find Vars that will be needed in tlist and qual */
1574  find_cols(aggstate, &aggregated_colnos, &base_colnos);
1575  aggstate->colnos_needed = bms_union(base_colnos, aggregated_colnos);
1576  aggstate->max_colno_needed = 0;
1577  aggstate->all_cols_needed = true;
1578 
1579  for (int i = 0; i < scanDesc->natts; i++)
1580  {
1581  int colno = i + 1;
1582 
1583  if (bms_is_member(colno, aggstate->colnos_needed))
1584  aggstate->max_colno_needed = colno;
1585  else
1586  aggstate->all_cols_needed = false;
1587  }
1588 
1589  for (j = 0; j < numHashes; ++j)
1590  {
1591  AggStatePerHash perhash = &aggstate->perhash[j];
1592  Bitmapset *colnos = bms_copy(base_colnos);
1593  AttrNumber *grpColIdx = perhash->aggnode->grpColIdx;
1594  List *hashTlist = NIL;
1595  TupleDesc hashDesc;
1596  int maxCols;
1597  int i;
1598 
1599  perhash->largestGrpColIdx = 0;
1600 
1601  /*
1602  * If we're doing grouping sets, then some Vars might be referenced in
1603  * tlist/qual for the benefit of other grouping sets, but not needed
1604  * when hashing; i.e. prepare_projection_slot will null them out, so
1605  * there'd be no point storing them. Use prepare_projection_slot's
1606  * logic to determine which.
1607  */
1608  if (aggstate->phases[0].grouped_cols)
1609  {
1610  Bitmapset *grouped_cols = aggstate->phases[0].grouped_cols[j];
1611  ListCell *lc;
1612 
1613  foreach(lc, aggstate->all_grouped_cols)
1614  {
1615  int attnum = lfirst_int(lc);
1616 
1617  if (!bms_is_member(attnum, grouped_cols))
1618  colnos = bms_del_member(colnos, attnum);
1619  }
1620  }
1621 
1622  /*
1623  * Compute maximum number of input columns accounting for possible
1624  * duplications in the grpColIdx array, which can happen in some edge
1625  * cases where HashAggregate was generated as part of a semijoin or a
1626  * DISTINCT.
1627  */
1628  maxCols = bms_num_members(colnos) + perhash->numCols;
1629 
1630  perhash->hashGrpColIdxInput =
1631  palloc(maxCols * sizeof(AttrNumber));
1632  perhash->hashGrpColIdxHash =
1633  palloc(perhash->numCols * sizeof(AttrNumber));
1634 
1635  /* Add all the grouping columns to colnos */
1636  for (i = 0; i < perhash->numCols; i++)
1637  colnos = bms_add_member(colnos, grpColIdx[i]);
1638 
1639  /*
1640  * First build mapping for columns directly hashed. These are the
1641  * first, because they'll be accessed when computing hash values and
1642  * comparing tuples for exact matches. We also build simple mapping
1643  * for execGrouping, so it knows where to find the to-be-hashed /
1644  * compared columns in the input.
1645  */
1646  for (i = 0; i < perhash->numCols; i++)
1647  {
1648  perhash->hashGrpColIdxInput[i] = grpColIdx[i];
1649  perhash->hashGrpColIdxHash[i] = i + 1;
1650  perhash->numhashGrpCols++;
1651  /* delete already mapped columns */
1652  colnos = bms_del_member(colnos, grpColIdx[i]);
1653  }
1654 
1655  /* and add the remaining columns */
1656  i = -1;
1657  while ((i = bms_next_member(colnos, i)) >= 0)
1658  {
1659  perhash->hashGrpColIdxInput[perhash->numhashGrpCols] = i;
1660  perhash->numhashGrpCols++;
1661  }
1662 
1663  /* and build a tuple descriptor for the hashtable */
1664  for (i = 0; i < perhash->numhashGrpCols; i++)
1665  {
1666  int varNumber = perhash->hashGrpColIdxInput[i] - 1;
1667 
1668  hashTlist = lappend(hashTlist, list_nth(outerTlist, varNumber));
1669  perhash->largestGrpColIdx =
1670  Max(varNumber + 1, perhash->largestGrpColIdx);
1671  }
1672 
1673  hashDesc = ExecTypeFromTL(hashTlist);
1674 
1675  execTuplesHashPrepare(perhash->numCols,
1676  perhash->aggnode->grpOperators,
1677  &perhash->eqfuncoids,
1678  &perhash->hashfunctions);
1679  perhash->hashslot =
1680  ExecAllocTableSlot(&estate->es_tupleTable, hashDesc,
1681  &TTSOpsMinimalTuple);
1682 
1683  list_free(hashTlist);
1684  bms_free(colnos);
1685  }
1686 
1687  bms_free(base_colnos);
1688 }
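/*
 * Illustrative sketch only, not part of nodeAgg.c: the column ordering built
 * above, with a 32-bit mask standing in for the Bitmapset.  Grouping columns
 * come first, in grpColIdx order (duplicates kept), followed by the other
 * needed columns in ascending order.  toy_build_hash_colmap is a
 * hypothetical name, and column numbers are assumed to lie in 1..31.
 */

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Fill "out" with the input column numbers to store in the hash table and
 * return how many were written.  "out" must have room for numCols plus the
 * number of bits set in "needed".
 */
int
toy_build_hash_colmap(const int *grpColIdx, int numCols,
					  uint32_t needed, int *out)
{
	int			n = 0;

	assert(grpColIdx != NULL && out != NULL);

	/* Directly hashed columns go first. */
	for (int i = 0; i < numCols; i++)
	{
		assert(grpColIdx[i] >= 1 && grpColIdx[i] <= 31);
		out[n++] = grpColIdx[i];
		/* delete already mapped columns */
		needed &= ~(UINT32_C(1) << grpColIdx[i]);
	}

	/* Then the remaining needed columns, lowest number first. */
	for (int col = 1; col <= 31; col++)
	{
		if (needed & (UINT32_C(1) << col))
			out[n++] = col;
	}
	return n;
}
```

/*
 * Grouping on columns {3, 1} with columns 1, 2 and 5 otherwise needed gives
 * the mapping {3, 1, 2, 5}: column 1 is not repeated, since it was already
 * mapped as a grouping column.
 */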
1689 
1690 /*
1691  * Estimate per-hash-table-entry overhead.
1692  */
1693 Size
1694 hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
1695 {
1696  Size tupleChunkSize;
1697  Size pergroupChunkSize;
1698  Size transitionChunkSize;
1699  Size tupleSize = (MAXALIGN(SizeofMinimalTupleHeader) +
1700  tupleWidth);
1701  Size pergroupSize = numTrans * sizeof(AggStatePerGroupData);
1702 
1703  tupleChunkSize = CHUNKHDRSZ + tupleSize;
1704 
1705  if (pergroupSize > 0)
1706  pergroupChunkSize = CHUNKHDRSZ + pergroupSize;
1707  else
1708  pergroupChunkSize = 0;
1709 
1710  if (transitionSpace > 0)
1711  transitionChunkSize = CHUNKHDRSZ + transitionSpace;
1712  else
1713  transitionChunkSize = 0;
1714 
1715  return
1716  sizeof(TupleHashEntryData) +
1717  tupleChunkSize +
1718  pergroupChunkSize +
1719  transitionChunkSize;
1720 }
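/*
 * Illustrative sketch only, not part of nodeAgg.c: the same arithmetic with
 * made-up constants, to show which terms are conditional.  Every TOY_*
 * value below is an assumption chosen for the example, not the real size
 * on any platform; toy_entry_size is a hypothetical name.
 */

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CHUNKHDRSZ		16	/* assumed allocator chunk header size */
#define TOY_TUPLE_HEADER	16	/* assumed MAXALIGN'd minimal tuple header */
#define TOY_ENTRY_SIZE		24	/* assumed size of one hash entry struct */
#define TOY_PERGROUP_SIZE	16	/* assumed size of one per-group struct */

/* Estimate the memory used by one hash table entry. */
size_t
toy_entry_size(int numTrans, size_t tupleWidth, size_t transitionSpace)
{
	size_t		tupleChunk;
	size_t		pergroupSize;
	size_t		pergroupChunk;
	size_t		transitionChunk;

	assert(numTrans >= 0);

	/* The representative tuple is always allocated. */
	tupleChunk = TOY_CHUNKHDRSZ + TOY_TUPLE_HEADER + tupleWidth;

	/* The per-group array costs a chunk only if there are transitions. */
	pergroupSize = (size_t) numTrans * TOY_PERGROUP_SIZE;
	pergroupChunk = (pergroupSize > 0) ? TOY_CHUNKHDRSZ + pergroupSize : 0;

	/* Likewise for out-of-line transition state. */
	transitionChunk = (transitionSpace > 0) ?
		TOY_CHUNKHDRSZ + transitionSpace : 0;

	return TOY_ENTRY_SIZE + tupleChunk + pergroupChunk + transitionChunk;
}
```

/*
 * Under these assumed constants, an 8-byte tuple with no transitions costs
 * 24 + 40 = 64 bytes; two transitions add 16 + 32 = 48 bytes (112 total);
 * 32 bytes of transition space add another 16 + 32 = 48 bytes (160 total).
 */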
1721 
1722 /*
1723  * hashagg_recompile_expressions()
1724  *
1725  * Identifies the right phase, compiles the right expression given the
1726  * arguments, and then sets phase->evalfunc to that expression.
1727  *
1728  * Different versions of the compiled expression are needed depending on
1729  * whether hash aggregation has spilled or not, and whether it's reading from
1730  * the outer plan or a tape. Before spilling to disk, the expression reads
1731  * from the outer plan and does not need to perform a NULL check. After
1732  * HashAgg begins to spill, new groups will not be created in the hash table,
1733  * and the AggStatePerGroup array may be NULL; therefore we need to add a null
1734  * pointer check to the expression. Then, when reading spilled data from a
1735  * tape, we change the outer slot type to be a fixed minimal tuple slot.
1736  *
1737  * It would be wasteful to recompile every time, so cache the compiled
1738  * expressions in the AggStatePerPhase, and reuse when appropriate.
1739  */
1740 static void
1741 hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
1742 {
1743  AggStatePerPhase phase;
1744  int i = minslot ? 1 : 0;
1745  int j = nullcheck ? 1 : 0;
1746 
1747  Assert(aggstate->aggstrategy == AGG_HASHED ||
1748  aggstate->aggstrategy == AGG_MIXED);
1749 
1750  if (aggstate->aggstrategy == AGG_HASHED)
1751  phase = &aggstate->phases[0];
1752  else /* AGG_MIXED */
1753  phase = &aggstate->phases[1];
1754 
1755  if (phase->evaltrans_cache[i][j] == NULL)
1756  {
1757  const TupleTableSlotOps *outerops = aggstate->ss.ps.outerops;
1758  bool outerfixed = aggstate->ss.ps.outeropsfixed;
1759  bool dohash = true;
1760  bool dosort = false;
1761 
1762  /*
1763  * If minslot is true, that means we are processing a spilled batch
1764  * (inside agg_refill_hash_table()), and we must not advance the
1765  * sorted grouping sets.
1766  */
1767  if (aggstate->aggstrategy == AGG_MIXED && !minslot)
1768  dosort = true;
1769 
1770  /* temporarily change the outerops while compiling the expression */
1771  if (minslot)
1772  {
1773  aggstate->ss.ps.outerops = &TTSOpsMinimalTuple;
1774  aggstate->ss.ps.outeropsfixed = true;
1775  }
1776 
1777  phase->evaltrans_cache[i][j] = ExecBuildAggTrans(aggstate, phase,
1778  dosort, dohash,
1779  nullcheck);
1780 
1781  /* change back */
1782  aggstate->ss.ps.outerops = outerops;
1783  aggstate->ss.ps.outeropsfixed = outerfixed;
1784  }
1785 
1786  phase->evaltrans = phase->evaltrans_cache[i][j];
1787 }
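
/*
 * Illustration only, not part of nodeAgg.c: a minimal sketch of the
 * compile-once, reuse-afterwards cache described above, indexed by the two
 * flags. SketchExpr and SketchPhase are hypothetical stand-ins for the
 * compiled expression and AggStatePerPhase; "compiling" here only records
 * the flags and bumps a counter so the caching effect is observable.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a compiled expression. */
typedef struct SketchExpr
{
    bool        minslot;
    bool        nullcheck;
} SketchExpr;

typedef struct SketchPhase
{
    SketchExpr  storage[2][2];  /* backing storage for the variants */
    SketchExpr *cache[2][2];    /* NULL until a variant is first needed */
    SketchExpr *current;        /* variant currently selected */
    int         ncompiles;      /* number of actual compilations */
} SketchPhase;

/* Start with an empty cache. */
void
sketch_phase_init(SketchPhase *phase)
{
    assert(phase != NULL);
    memset(phase, 0, sizeof(*phase));
}

/* Select the variant for (minslot, nullcheck), compiling it at most once. */
void
sketch_recompile(SketchPhase *phase, bool minslot, bool nullcheck)
{
    int         i = minslot ? 1 : 0;
    int         j = nullcheck ? 1 : 0;

    assert(phase != NULL);

    if (phase->cache[i][j] == NULL)
    {
        phase->storage[i][j].minslot = minslot;
        phase->storage[i][j].nullcheck = nullcheck;
        phase->cache[i][j] = &phase->storage[i][j];
        phase->ncompiles++;
    }

    phase->current = phase->cache[i][j];
}
```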
1788 
1789 /*
1790  * Set limits that trigger spilling to avoid exceeding hash_mem. Consider the
1791  * number of partitions we expect to create (if we do spill).
1792  *
1793  * There are two limits: a memory limit, and also an ngroups limit. The
1794  * ngroups limit becomes important when we expect transition values to grow
1795  * substantially larger than the initial value.
1796  */
1797 void
1798 hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
1799  Size *mem_limit, uint64 *ngroups_limit,
1800  int *num_partitions)
1801 {
1802  int npartitions;
1803  Size partition_mem;
1804  Size hash_mem_limit = get_hash_memory_limit();
1805 
1806  /* if not expected to spill, use all of hash_mem */
1807  if (input_groups * hashentrysize <= hash_mem_limit)
1808  {
1809  if (num_partitions != NULL)
1810  *num_partitions = 0;
1811  *mem_limit = hash_mem_limit;
1812  *ngroups_limit = hash_mem_limit / hashentrysize;
1813  return;
1814  }
1815 
1816  /*
1817  * Calculate expected memory requirements for spilling, which is the size
1818  * of the buffers needed for all the tapes that need to be open at once.
1819  * Then, subtract that from the memory available for holding hash tables.
1820  */
1821  npartitions = hash_choose_num_partitions(input_groups,
1822  hashentrysize,
1823  used_bits,
1824  NULL);
1825  if (num_partitions != NULL)
1826  *num_partitions = npartitions;
1827 
1828  partition_mem =
1829  HASHAGG_READ_BUFFER_SIZE +
1830  HASHAGG_WRITE_BUFFER_SIZE * npartitions;
1831 
1832  /*
1833  * Don't set the limit below 3/4 of hash_mem. In that case, we are at the
1834  * minimum number of partitions, so we aren't going to dramatically exceed
1835  * hash_mem anyway.
1836  */
1837  if (hash_mem_limit > 4 * partition_mem)
1838  *mem_limit = hash_mem_limit - partition_mem;
1839  else
1840  *mem_limit = hash_mem_limit * 0.75;
1841 
1842  if (*mem_limit > hashentrysize)
1843  *ngroups_limit = *mem_limit / hashentrysize;
1844  else
1845  *ngroups_limit = 1;
1846 }
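
/*
 * Illustration only, not part of nodeAgg.c: a minimal sketch of the
 * spill-case arithmetic above. It assumes the partition count has already
 * been chosen and is passed in, and it assumes 8192-byte read and write
 * buffers (SKETCH_READ_BUFFER_SIZE and SKETCH_WRITE_BUFFER_SIZE are
 * hypothetical stand-ins for the HASHAGG_* constants). For a 4 MiB limit
 * and 4 partitions, 40960 bytes are reserved for tape buffers.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed buffer sizes; the real constants may differ. */
#define SKETCH_READ_BUFFER_SIZE   8192
#define SKETCH_WRITE_BUFFER_SIZE  8192

/* Compute the memory and group-count limits when a spill is expected. */
void
sketch_spill_limits(size_t hash_mem_limit, int npartitions,
                    double hashentrysize,
                    size_t *mem_limit, uint64_t *ngroups_limit)
{
    size_t      partition_mem;

    assert(npartitions > 0 && hashentrysize > 0);

    /* one read buffer plus one write buffer per partition */
    partition_mem = SKETCH_READ_BUFFER_SIZE +
        SKETCH_WRITE_BUFFER_SIZE * (size_t) npartitions;

    /* never let the buffers push the limit below 3/4 of the total */
    if (hash_mem_limit > 4 * partition_mem)
        *mem_limit = hash_mem_limit - partition_mem;
    else
        *mem_limit = (size_t) (hash_mem_limit * 0.75);

    /* always allow at least one group so that progress is possible */
    if (*mem_limit > hashentrysize)
        *ngroups_limit = (uint64_t) (*mem_limit / hashentrysize);
    else
        *ngroups_limit = 1;
}
```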
1847 
1848 /*
1849  * hash_agg_check_limits
1850  *
1851  * After adding a new group to the hash table, check whether we need to enter
1852  * spill mode. Allocations may happen without adding new groups (for instance,
1853  * if the transition state size grows), so this check is imperfect.
1854  */
1855 static void
1856 hash_agg_check_limits(AggState *aggstate)
1857 {
1858  uint64 ngroups = aggstate->hash_ngroups_current;
1859  Size meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt,
1860  true);
1861  Size hashkey_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory,
1862  true);
1863 
1864  /*
1865  * Don't spill unless there's at least one group in the hash table so we
1866  * can be sure to make progress even in edge cases.
1867  */
1868  if (aggstate->hash_ngroups_current > 0 &&
1869  (meta_mem + hashkey_mem > aggstate->hash_mem_limit ||
1870  ngroups > aggstate->hash_ngroups_limit))
1871  {
1872  hash_agg_enter_spill_mode(aggstate);
1873  }
1874 }
1875 
1876 /*
1877  * Enter "spill mode", meaning that no new groups are added to any of the hash
1878  * tables. Tuples that would create a new group are instead spilled, and
1879  * processed later.
1880  */
1881 static void
1882 hash_agg_enter_spill_mode(AggState *aggstate)
1883 {
1884  aggstate->hash_spill_mode = true;
1885  hashagg_recompile_expressions(aggstate, aggstate->table_filled, true);
1886 
1887  if (!aggstate->hash_ever_spilled)
1888  {
1889  Assert(aggstate->hash_tapeset == NULL);
1890  Assert(aggstate->hash_spills == NULL);
1891 
1892  aggstate->hash_ever_spilled = true;
1893 
1894  aggstate->hash_tapeset = LogicalTapeSetCreate(true, NULL, -1);
1895 
1896  aggstate->hash_spills = palloc(sizeof(HashAggSpill) * aggstate->num_hashes);
1897 
1898  for (int setno = 0; setno < aggstate->num_hashes; setno++)
1899  {
1900  AggStatePerHash perhash = &aggstate->perhash[setno];
1901  HashAggSpill *spill = &aggstate->hash_spills[setno];
1902 
1903  hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
1904  perhash->aggnode->numGroups,
1905  aggstate->hashentrysize);
1906  }
1907  }
1908 }
1909 
1910 /*
1911  * Update metrics after filling the hash table.
1912  *
1913  * If reading from the outer plan, from_tape should be false; if reading from
1914  * another tape, from_tape should be true.
1915  */
1916 static void
1917 hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
1918 {
1919  Size meta_mem;
1920  Size hashkey_mem;
1921  Size buffer_mem;
1922  Size total_mem;
1923 
1924  if (aggstate->aggstrategy != AGG_MIXED &&
1925  aggstate->aggstrategy != AGG_HASHED)
1926  return;
1927 
1928  /* memory for the hash table itself */
1929  meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);
1930 
1931  /* memory for the group keys and transition states */
1932  hashkey_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory, true);
1933 
1934  /* memory for read/write tape buffers, if spilled */
1935  buffer_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;
1936  if (from_tape)
1937  buffer_mem += HASHAGG_READ_BUFFER_SIZE;
1938 
1939  /* update peak mem */
1940  total_mem = meta_mem + hashkey_mem + buffer_mem;
1941  if (total_mem > aggstate->hash_mem_peak)
1942  aggstate->hash_mem_peak = total_mem;
1943 
1944  /* update disk usage */
1945  if (aggstate->hash_tapeset != NULL)
1946  {
1947  uint64 disk_used = LogicalTapeSetBlocks(aggstate->hash_tapeset) * (BLCKSZ / 1024);
1948 
1949  if (aggstate->hash_disk_used < disk_used)
1950  aggstate->hash_disk_used = disk_used;
1951  }
1952 
1953  /* update hashentrysize estimate based on contents */
1954  if (aggstate->hash_ngroups_current > 0)
1955  {
1956  aggstate->hashentrysize =
1957  sizeof(TupleHashEntryData) +
1958  (hashkey_mem / (double) aggstate->hash_ngroups_current);
1959  }
1960 }
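
/*
 * Illustration only, not part of nodeAgg.c: a minimal sketch of the two
 * derived metrics above, namely disk usage in kilobytes and the refreshed
 * per-entry size estimate. SKETCH_BLCKSZ and SKETCH_ENTRY_SIZE are assumed
 * values standing in for BLCKSZ and sizeof(TupleHashEntryData). With them,
 * 10 tape blocks count as 80 kB, and 1000000 bytes of keys and states
 * spread over 10000 groups give an estimate of 124 bytes per entry.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; the real ones depend on the build. */
#define SKETCH_BLCKSZ      8192
#define SKETCH_ENTRY_SIZE  24

/* Convert a tape block count into kilobytes. */
uint64_t
sketch_disk_used_kb(uint64_t nblocks)
{
    return nblocks * (SKETCH_BLCKSZ / 1024);
}

/*
 * Re-estimate the per-entry size from what the table actually holds. With
 * no groups there is nothing to measure, so keep the previous estimate.
 */
double
sketch_entry_size_estimate(size_t hashkey_mem, uint64_t ngroups,
                           double previous)
{
    assert(previous > 0);

    if (ngroups == 0)
        return previous;

    return SKETCH_ENTRY_SIZE + (hashkey_mem / (double) ngroups);
}
```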
1961 
1962 /*
1963  * Choose a reasonable number of buckets for the initial hash table size.
1964  */
1965 static long
1966 hash_choose_num_buckets(double hashentrysize, long ngroups, Size memory)
1967 {
1968  long max_nbuckets;
1969  long nbuckets = ngroups;
1970 
1971  max_nbuckets = memory / hashentrysize;
1972 
1973  /*
1974  * Underestimating is better than overestimating. Too many buckets crowd
1975  * out space for group keys and transition state values.
1976  */
1977  max_nbuckets >>= 1;
1978 
1979  if (nbuckets > max_nbuckets)
1980  nbuckets = max_nbuckets;
1981 
1982  return Max(nbuckets, 1);
1983 }
1984 
1985 /*
1986  * Determine the number of partitions to create when spilling, which will
1987  * always be a power of two. If log2_npartitions is non-NULL, set
1988  * *log2_npartitions to the log2() of the number of partitions.
1989  */
1990 static int
1991 hash_choose_num_partitions(double input_groups, double hashentrysize,
1992  int used_bits, int *log2_npartitions)
1993 {
1994  Size hash_mem_limit = get_hash_memory_limit();
1995  double partition_limit;
1996  double mem_wanted;
1997  double dpartitions;
1998  int npartitions;
1999  int partition_bits;
2000 
2001  /*
2002  * Avoid creating so many partitions that the memory requirements of the
2003  * open partition files are greater than 1/4 of hash_mem.
2004  */
2005  partition_limit =
2006  (hash_mem_limit * 0.25 - HASHAGG_READ_BUFFER_SIZE) /
2007  HASHAGG_WRITE_BUFFER_SIZE;
2008 
2009  mem_wanted = HASHAGG_PARTITION_FACTOR * input_groups * hashentrysize;
2010 
2011  /* make enough partitions so that each one is likely to fit in memory */
2012  dpartitions = 1 + (mem_wanted / hash_mem_limit);
2013 
2014  if (dpartitions > partition_limit)
2015  dpartitions = partition_limit;
2016 
2017  if (dpartitions < HASHAGG_MIN_PARTITIONS)
2018  dpartitions = HASHAGG_MIN_PARTITIONS;
2019  if (dpartitions > HASHAGG_MAX_PARTITIONS)
2020  dpartitions = HASHAGG_MAX_PARTITIONS;
2021 
2022  /* HASHAGG_MAX_PARTITIONS limit makes this safe */
2023  npartitions = (int) dpartitions;
2024 
2025  /* ceil(log2(npartitions)) */
2026  partition_bits = my_log2(npartitions);
2027 
2028  /* make sure that we don't exhaust the hash bits */
2029  if (partition_bits + used_bits >= 32)
2030  partition_bits = 32 - used_bits;
2031 
2032  if (log2_npartitions != NULL)
2033  *log2_npartitions = partition_bits;
2034 
2035  /* number of partitions will be a power of two */
2036  npartitions = 1 << partition_bits;
2037 
2038  return npartitions;
2039 }
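
/*
 * Illustration only, not part of nodeAgg.c: a minimal standalone sketch of
 * the partition-count choice above. The memory limit is passed in rather
 * than read from get_hash_memory_limit(), and every SKETCH_* constant is an
 * assumed value standing in for the corresponding HASHAGG_* constant.
 * sketch_ceil_log2() is a hypothetical replacement for my_log2(). For one
 * million groups of 64 bytes and a 4 MiB limit, the wanted count of 23 is
 * rounded up to 32 partitions (5 bits).
 */

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; the real constants may differ. */
#define SKETCH_PARTITION_FACTOR   1.5
#define SKETCH_MIN_PARTITIONS     4
#define SKETCH_MAX_PARTITIONS     1024
#define SKETCH_READ_BUFFER_SIZE   8192
#define SKETCH_WRITE_BUFFER_SIZE  8192

/* Smallest b such that (1 << b) >= n, for n >= 1. */
static int
sketch_ceil_log2(int n)
{
    int         bits = 0;

    assert(n >= 1);
    while ((1 << bits) < n)
        bits++;
    return bits;
}

/* Choose a power-of-two partition count; optionally report its log2. */
int
sketch_num_partitions(double input_groups, double hashentrysize,
                      size_t hash_mem_limit, int used_bits,
                      int *log2_npartitions)
{
    double      partition_limit;
    double      mem_wanted;
    double      dpartitions;
    int         partition_bits;

    assert(hash_mem_limit > 0 && used_bits >= 0 && used_bits <= 32);

    /* open partition buffers may use at most 1/4 of the memory limit */
    partition_limit = (hash_mem_limit * 0.25 - SKETCH_READ_BUFFER_SIZE) /
        SKETCH_WRITE_BUFFER_SIZE;

    /* enough partitions that each one is likely to fit in memory */
    mem_wanted = SKETCH_PARTITION_FACTOR * input_groups * hashentrysize;
    dpartitions = 1 + (mem_wanted / hash_mem_limit);

    if (dpartitions > partition_limit)
        dpartitions = partition_limit;
    if (dpartitions < SKETCH_MIN_PARTITIONS)
        dpartitions = SKETCH_MIN_PARTITIONS;
    if (dpartitions > SKETCH_MAX_PARTITIONS)
        dpartitions = SKETCH_MAX_PARTITIONS;

    partition_bits = sketch_ceil_log2((int) dpartitions);

    /* do not consume more than the 32 available hash bits */
    if (partition_bits + used_bits >= 32)
        partition_bits = 32 - used_bits;

    if (log2_npartitions != NULL)
        *log2_npartitions = partition_bits;

    return 1 << partition_bits;
}
```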
2040 
2041 /*
2042  * Initialize a freshly-created TupleHashEntry.
2043  */
2044 static void
2045 initialize_hash_entry(AggState *aggstate, TupleHashTable hashtable,
2046  TupleHashEntry entry)
2047 {
2048  AggStatePerGroup pergroup;
2049  int transno;
2050 
2051  aggstate->hash_ngroups_current++;
2052  hash_agg_check_limits(aggstate);
2053 
2054  /* no need to allocate or initialize per-group state */
2055  if (aggstate->numtrans == 0)
2056  return;
2057 
2058  pergroup = (AggStatePerGroup)
2059  MemoryContextAlloc(hashtable->tablecxt,
2060  sizeof(AggStatePerGroupData) * aggstate->numtrans);
2061 
2062  entry->additional = pergroup;
2063 
2064  /*
2065  * Initialize aggregates for the new tuple group; lookup_hash_entries()
2066  * has already selected the relevant grouping set.
2067  */
2068  for (transno = 0; transno < aggstate->numtrans; transno++)
2069  {
2070  AggStatePerTrans pertrans = &aggstate->pertrans[transno];
2071  AggStatePerGroup pergroupstate = &pergroup[transno];
2072 
2073  initialize_aggregate(aggstate, pertrans, pergroupstate);
2074  }
2075 }
2076 
2077 /*
2078  * Look up hash entries for the current tuple in all hashed grouping sets.
2079  *
2080  * Be aware that lookup_hash_entry can reset the tmpcontext.
2081  *
2082  * Some entries may be left NULL if we are in "spill mode". The same tuple
2083  * will belong to different groups for each grouping set, so may match a group
2084  * already in memory for one set and match a group not in memory for another
2085  * set. When in "spill mode", the tuple will be spilled for each grouping set
2086  * where it doesn't match a group in memory.
2087  *
2088  * NB: It's possible to spill the same tuple for several different grouping
2089  * sets. This may seem wasteful, but it's actually a trade-off: if we spill
2090  * the tuple multiple times for multiple grouping sets, it can be partitioned
2091  * for each grouping set, making the refilling of the hash table very
2092  * efficient.
2093  */
2094 static void
2095 lookup_hash_entries(AggState *aggstate)
2096 {
2097  AggStatePerGroup *pergroup = aggstate->hash_pergroup;
2098  TupleTableSlot *outerslot = aggstate->tmpcontext->ecxt_outertuple;
2099  int setno;
2100 
2101  for (setno = 0; setno < aggstate->num_hashes; setno++)
2102  {
2103  AggStatePerHash perhash = &aggstate->perhash[setno];
2104  TupleHashTable hashtable = perhash->hashtable;
2105  TupleTableSlot *hashslot = perhash->hashslot;
2106  TupleHashEntry entry;
2107  uint32 hash;
2108  bool isnew = false;
2109  bool *p_isnew;
2110 
2111  /* if hash table already spilled, don't create new entries */
2112  p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
2113 
2114  select_current_set(aggstate, setno, true);
2115  prepare_hash_slot(perhash,
2116  outerslot,
2117  hashslot);
2118 
2119  entry = LookupTupleHashEntry(hashtable, hashslot,
2120  p_isnew, &hash);
2121 
2122  if (entry != NULL)
2123  {
2124  if (isnew)
2125  initialize_hash_entry(aggstate, hashtable, entry);
2126  pergroup[setno] = entry->additional;
2127  }
2128  else
2129  {
2130  HashAggSpill *spill = &aggstate->hash_spills[setno];
2131  TupleTableSlot *slot = aggstate->tmpcontext->ecxt_outertuple;
2132 
2133  if (spill->partitions == NULL)
2134  hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
2135  perhash->aggnode->numGroups,
2136  aggstate->hashentrysize);
2137 
2138  hashagg_spill_tuple(aggstate, spill, slot, hash);
2139  pergroup[setno] = NULL;
2140  }
2141  }
2142 }
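
/*
 * Illustration only, not part of nodeAgg.c: a much-simplified sketch of the
 * spill-mode lookup rule above, for a single grouping set that counts rows
 * per integer key. All names are hypothetical. Lookups may create a group
 * only while the table is not in spill mode; a failed lookup means the row
 * is spilled (here only counted, where the real code writes it to a tape).
 * The limit check runs after a group is added, as in
 * hash_agg_check_limits().
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_GROUPS 4     /* hypothetical in-memory group limit */

typedef struct SketchGroup
{
    int         key;
    long        count;
} SketchGroup;

typedef struct SketchTable
{
    SketchGroup groups[SKETCH_MAX_GROUPS];
    int         ngroups;
    bool        spill_mode;
    int         nspilled;
} SketchTable;

void
sketch_table_init(SketchTable *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* Find the group for key; create it only if allow_new is true. */
static SketchGroup *
sketch_lookup(SketchTable *t, int key, bool allow_new)
{
    for (int i = 0; i < t->ngroups; i++)
    {
        if (t->groups[i].key == key)
            return &t->groups[i];
    }

    if (!allow_new)
        return NULL;

    assert(t->ngroups < SKETCH_MAX_GROUPS);
    t->groups[t->ngroups].key = key;
    t->groups[t->ngroups].count = 0;
    return &t->groups[t->ngroups++];
}

/* Process one input row. */
void
sketch_consume(SketchTable *t, int key)
{
    SketchGroup *g = sketch_lookup(t, key, !t->spill_mode);

    if (g == NULL)
    {
        t->nspilled++;          /* would be written to a spill partition */
        return;
    }

    g->count++;                 /* advance the aggregate */

    /* enter spill mode once the table is full */
    if (t->ngroups >= SKETCH_MAX_GROUPS)
        t->spill_mode = true;
}
```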
2143 
2144 /*
2145  * ExecAgg -
2146  *
2147  * ExecAgg receives tuples from its outer subplan and aggregates over
2148  * the appropriate attribute for each aggregate function use (Aggref
2149  * node) appearing in the targetlist or qual of the node. The number
2150  * of tuples to aggregate over depends on whether grouped or plain
2151  * aggregation is selected. In grouped aggregation, we produce a result
2152  * row for each group; in plain aggregation there's a single result row
2153  * for the whole query. In either case, the value of each aggregate is
2154  * stored in the expression context to be used when ExecProject evaluates
2155  * the result tuple.
2156  */
2157 static TupleTableSlot *
2158 ExecAgg(PlanState *pstate)
2159 {
2160  AggState *node = castNode(AggState, pstate);
2161  TupleTableSlot *result = NULL;
2162 
2163  CHECK_FOR_INTERRUPTS();
2164 
2165  if (!node->agg_done)
2166  {
2167  /* Dispatch based on strategy */
2168  switch (node->phase->aggstrategy)
2169  {
2170  case AGG_HASHED:
2171  if (!node->table_filled)
2172  agg_fill_hash_table(node);
2173  /* FALLTHROUGH */
2174  case AGG_MIXED:
2175  result = agg_retrieve_hash_table(node);
2176  break;
2177  case AGG_PLAIN:
2178  case AGG_SORTED:
2179  result = agg_retrieve_direct(node);
2180  break;
2181  }
2182 
2183  if (!TupIsNull(result))
2184  return result;
2185  }
2186 
2187  return NULL;
2188 }
2189 
2190 /*
2191  * ExecAgg for non-hashed case
2192  */
2193 static TupleTableSlot *
2194 agg_retrieve_direct(AggState *aggstate)
2195 {
2196  Agg *node = aggstate->phase->aggnode;
2197  ExprContext *econtext;
2198  ExprContext *tmpcontext;
2199  AggStatePerAgg peragg;
2200  AggStatePerGroup *pergroups;
2201  TupleTableSlot *outerslot;
2202  TupleTableSlot *firstSlot;
2203  TupleTableSlot *result;
2204  bool hasGroupingSets = aggstate->phase->numsets > 0;
2205  int numGroupingSets = Max(aggstate->phase->numsets, 1);
2206  int currentSet;
2207  int nextSetSize;
2208  int numReset;
2209  int i;
2210 
2211  /*
2212  * get state info from node
2213  *
2214  * econtext is the per-output-tuple expression context
2215  *
2216  * tmpcontext is the per-input-tuple expression context
2217  */
2218  econtext = aggstate->ss.ps.ps_ExprContext;
2219  tmpcontext = aggstate->tmpcontext;
2220 
2221  peragg = aggstate->peragg;
2222  pergroups = aggstate->pergroups;
2223  firstSlot = aggstate->ss.ss_ScanTupleSlot;
2224 
2225  /*
2226  * We loop retrieving groups until we find one matching
2227  * aggstate->ss.ps.qual
2228  *
2229  * For grouping sets, we have the invariant that aggstate->projected_set
2230  * is either -1 (initial call) or the index (starting from 0) in
2231  * gset_lengths for the group we just completed (either by projecting a
2232  * row or by discarding it in the qual).
2233  */
2234  while (!aggstate->agg_done)
2235  {
2236  /*
2237  * Clear the per-output-tuple context for each group, as well as
2238  * aggcontext (which contains any pass-by-ref transvalues of the old
2239  * group). Some aggregate functions store working state in child
2240  * contexts; those now get reset automatically without us needing to
2241  * do anything special.
2242  *
2243  * We use ReScanExprContext not just ResetExprContext because we want
2244  * any registered shutdown callbacks to be called. That allows
2245  * aggregate functions to ensure they've cleaned up any non-memory
2246  * resources.
2247  */
2248  ReScanExprContext(econtext);
2249 
2250  /*
2251  * Determine how many grouping sets need to be reset at this boundary.
2252  */
2253  if (aggstate->projected_set >= 0 &&
2254  aggstate->projected_set < numGroupingSets)
2255  numReset = aggstate->projected_set + 1;
2256  else
2257  numReset = numGroupingSets;
2258 
2259  /*
2260  * numReset can change on a phase boundary, but that's OK; we want to
2261  * reset the contexts used in _this_ phase, and later, after possibly
2262  * changing phase, initialize the right number of aggregates for the
2263  * _new_ phase.
2264  */
2265 
2266  for (i = 0; i < numReset; i++)
2267  {
2268  ReScanExprContext(aggstate->aggcontexts[i]);
2269  }
2270 
2271  /*
2272  * Check if input is complete and there are no more groups to project
2273  * in this phase; move to next phase or mark as done.
2274  */
2275  if (aggstate->input_done == true &&
2276  aggstate->projected_set >= (numGroupingSets - 1))
2277  {
2278  if (aggstate->current_phase < aggstate->numphases - 1)
2279  {
2280  initialize_phase(aggstate, aggstate->current_phase + 1);
2281  aggstate->input_done = false;
2282  aggstate->projected_set = -1;
2283  numGroupingSets = Max(aggstate->phase->numsets, 1);
2284  node = aggstate->phase->aggnode;
2285  numReset = numGroupingSets;
2286  }
2287  else if (aggstate->aggstrategy == AGG_MIXED)
2288  {
2289  /*
2290  * Mixed mode; we've output all the grouped stuff and have
2291  * full hashtables, so switch to outputting those.
2292  */
2293  initialize_phase(aggstate, 0);
2294  aggstate->table_filled = true;
2295  ResetTupleHashIterator(aggstate->perhash[0].hashtable,
2296  &aggstate->perhash[0].hashiter);
2297  select_current_set(aggstate, 0, true);
2298  return agg_retrieve_hash_table(aggstate);
2299  }
2300  else
2301  {
2302  aggstate->agg_done = true;
2303  break;
2304  }
2305  }
2306 
2307  /*
2308  * Get the number of columns in the next grouping set after the last
2309  * projected one (if any). This is the number of columns to compare to
2310  * see if we reached the boundary of that set too.
2311  */
2312  if (aggstate->projected_set >= 0 &&
2313  aggstate->projected_set < (numGroupingSets - 1))
2314  nextSetSize = aggstate->phase->gset_lengths[aggstate->projected_set + 1];
2315  else
2316  nextSetSize = 0;
2317 
2318  /*----------
2319  * If a subgroup for the current grouping set is present, project it.
2320  *
2321  * We have a new group if:
2322  * - we're out of input but haven't projected all grouping sets
2323  * (checked above)
2324  * OR
2325  * - we already projected a row that wasn't from the last grouping
2326  * set
2327  * AND
2328  * - the next grouping set has at least one grouping column (since
2329  * empty grouping sets project only once input is exhausted)
2330  * AND
2331  * - the previous and pending rows differ on the grouping columns
2332  * of the next grouping set
2333  *----------
2334  */
2335  tmpcontext->ecxt_innertuple = econtext->ecxt_outertuple;
2336  if (aggstate->input_done ||
2337  (node->aggstrategy != AGG_PLAIN &&
2338  aggstate->projected_set != -1 &&
2339  aggstate->projected_set < (numGroupingSets - 1) &&
2340  nextSetSize > 0 &&
2341  !ExecQualAndReset(aggstate->phase->eqfunctions[nextSetSize - 1],
2342  tmpcontext)))
2343  {
2344  aggstate->projected_set += 1;
2345 
2346  Assert(aggstate->projected_set < numGroupingSets);
2347  Assert(nextSetSize > 0 || aggstate->input_done);
2348  }
2349  else
2350  {
2351  /*
2352  * We no longer care what group we just projected; the next
2353  * projection will always be the first (or only) grouping set
2354  * (unless the input proves to be empty).
2355  */
2356  aggstate->projected_set = 0;
2357 
2358  /*
2359  * If we don't already have the first tuple of the new group,
2360  * fetch it from the outer plan.
2361  */
2362  if (aggstate->grp_firstTuple == NULL)
2363  {
2364  outerslot = fetch_input_tuple(aggstate);
2365  if (!TupIsNull(outerslot))
2366  {
2367  /*
2368  * Make a copy of the first input tuple; we will use this
2369  * for comparisons (in group mode) and for projection.
2370  */
2371  aggstate->grp_firstTuple = ExecCopySlotHeapTuple(outerslot);
2372  }
2373  else
2374  {
2375  /* outer plan produced no tuples at all */
2376  if (hasGroupingSets)
2377  {
2378  /*
2379  * If there was no input at all, we need to project
2380  * rows only if there are grouping sets of size 0.
2381  * Note that this implies that there can't be any
2382  * references to ungrouped Vars, which would otherwise
2383  * cause issues with the empty output slot.
2384  *
2385  * XXX: This is no longer true, we currently deal with
2386  * this in finalize_aggregates().
2387  */
2388  aggstate->input_done = true;
2389 
2390  while (aggstate->phase->gset_lengths[aggstate->projected_set] > 0)
2391  {
2392  aggstate->projected_set += 1;
2393  if (aggstate->projected_set >= numGroupingSets)
2394  {
2395  /*
2396  * We can't set agg_done here because we might
2397  * have more phases to do, even though the
2398  * input is empty. So we need to restart the
2399  * whole outer loop.
2400  */
2401  break;
2402  }
2403  }
2404 
2405  if (aggstate->projected_set >= numGroupingSets)
2406  continue;
2407  }
2408  else
2409  {
2410  aggstate->agg_done = true;
2411  /* If we are grouping, we should produce no tuples too */
2412  if (node->aggstrategy != AGG_PLAIN)
2413  return NULL;
2414  }
2415  }
2416  }
2417 
2418  /*
2419  * Initialize working state for a new input tuple group.
2420  */
2421  initialize_aggregates(aggstate, pergroups, numReset);
2422 
2423  if (aggstate->grp_firstTuple != NULL)
2424  {
2425  /*
2426  * Store the copied first input tuple in the tuple table slot
2427  * reserved for it. The tuple will be deleted when it is
2428  * cleared from the slot.
2429  */
2430  ExecForceStoreHeapTuple(aggstate->grp_firstTuple,
2431  firstSlot, true);
2432  aggstate->grp_firstTuple = NULL; /* don't keep two pointers */
2433 
2434  /* set up for first advance_aggregates call */
2435  tmpcontext->ecxt_outertuple = firstSlot;
2436 
2437  /*
2438  * Process each outer-plan tuple, and then fetch the next one,
2439  * until we exhaust the outer plan or cross a group boundary.
2440  */
2441  for (;;)
2442  {
2443  /*
2444  * During phase 1 only of a mixed agg, we need to update
2445  * hashtables as well in advance_aggregates.
2446  */
2447  if (aggstate->aggstrategy == AGG_MIXED &&
2448  aggstate->current_phase == 1)
2449  {
2450  lookup_hash_entries(aggstate);
2451  }
2452 
2453  /* Advance the aggregates (or combine functions) */
2454  advance_aggregates(aggstate);
2455 
2456  /* Reset per-input-tuple context after each tuple */
2457  ResetExprContext(tmpcontext);
2458 
2459  outerslot = fetch_input_tuple(aggstate);
2460  if (TupIsNull(outerslot))
2461  {
2462  /* no more outer-plan tuples available */
2463 
2464  /* if we built hash tables, finalize any spills */
2465  if (aggstate->aggstrategy == AGG_MIXED &&
2466  aggstate->current_phase == 1)
2467  hashagg_finish_initial_spills(aggstate);
2468 
2469  if (hasGroupingSets)
2470  {
2471  aggstate->input_done = true;
2472  break;
2473  }
2474  else
2475  {
2476  aggstate->agg_done = true;
2477  break;
2478  }
2479  }
2480  /* set up for next advance_aggregates call */
2481  tmpcontext->ecxt_outertuple = outerslot;
2482 
2483  /*
2484  * If we are grouping, check whether we've crossed a group
2485  * boundary.
2486  */
2487  if (node->aggstrategy != AGG_PLAIN && node->numCols > 0)
2488  {
2489  tmpcontext->ecxt_innertuple = firstSlot;
2490  if (!ExecQual(aggstate->phase->eqfunctions[node->numCols - 1],
2491  tmpcontext))
2492  {
2493  aggstate->grp_firstTuple = ExecCopySlotHeapTuple(outerslot);
2494  break;
2495  }
2496  }
2497  }
2498  }
2499 
2500  /*
2501  * Use the representative input tuple for any references to
2502  * non-aggregated input columns in aggregate direct args, the node
2503  * qual, and the tlist. (If we are not grouping, and there are no
2504  * input rows at all, we will come here with an empty firstSlot
2505  * ... but if not grouping, there can't be any references to
2506  * non-aggregated input columns, so no problem.)
2507  */
2508  econtext->ecxt_outertuple = firstSlot;
2509  }
2510 
2511  Assert(aggstate->projected_set >= 0);
2512 
2513  currentSet = aggstate->projected_set;
2514 
2515  prepare_projection_slot(aggstate, econtext->ecxt_outertuple, currentSet);
2516 
2517  select_current_set(aggstate, currentSet, false);
2518 
2519  finalize_aggregates(aggstate,
2520  peragg,
2521  pergroups[currentSet]);
2522 
2523  /*
2524  * If there's no row to project right now, we must continue rather
2525  * than returning a null since there might be more groups.
2526  */
2527  result = project_aggregates(aggstate);
2528  if (result)
2529  return result;
2530  }
2531 
2532  /* No more groups */
2533  return NULL;
2534 }
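
/*
 * Illustration only, not part of nodeAgg.c: a much-simplified sketch of the
 * group-boundary loop above, for a single grouping column and a SUM over
 * input that is already sorted by key. All names are hypothetical. The
 * "first" field plays the role of grp_firstTuple: the row that ended the
 * previous group is saved and becomes the first row of the next one.
 * Grouping sets, phases, and quals are deliberately left out.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchRow
{
    int         key;
    int         value;
} SketchRow;

typedef struct SketchScan
{
    const SketchRow *rows;      /* input, sorted by key */
    size_t      nrows;
    size_t      pos;            /* next row to fetch */
    bool        have_first;     /* is "first" valid? */
    SketchRow   first;          /* saved first row of the next group */
    bool        done;
} SketchScan;

void
sketch_scan_init(SketchScan *s, const SketchRow *rows, size_t nrows)
{
    assert(s != NULL);
    s->rows = rows;
    s->nrows = nrows;
    s->pos = 0;
    s->have_first = false;
    s->first.key = 0;
    s->first.value = 0;
    s->done = false;
}

/* Return true and fill *key and *sum for the next group; false when done. */
bool
sketch_next_group(SketchScan *s, int *key, long *sum)
{
    SketchRow   cur;

    assert(s != NULL && key != NULL && sum != NULL);

    if (s->done)
        return false;

    /* fetch the first row of the group unless we already saved it */
    if (!s->have_first)
    {
        if (s->pos >= s->nrows)
        {
            s->done = true;     /* no input at all: produce no group */
            return false;
        }
        s->first = s->rows[s->pos++];
    }
    cur = s->first;
    s->have_first = false;

    *key = cur.key;
    *sum = 0;                   /* initialize the transition value */

    for (;;)
    {
        *sum += cur.value;      /* advance the aggregate */

        if (s->pos >= s->nrows)
        {
            s->done = true;     /* input exhausted: this is the last group */
            break;
        }

        cur = s->rows[s->pos++];
        if (cur.key != *key)
        {
            /* crossed a group boundary: save the row for the next call */
            s->first = cur;
            s->have_first = true;
            break;
        }
    }

    return true;
}
```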
2535 
2536 /*
2537  * ExecAgg for hashed case: read input and build hash table
2538  */
2539 static void
2540 agg_fill_hash_table(AggState *aggstate)
2541 {
2542  TupleTableSlot *outerslot;
2543  ExprContext *tmpcontext = aggstate->tmpcontext;
2544 
2545  /*
2546  * Process each outer-plan tuple, and then fetch the next one, until we
2547  * exhaust the outer plan.
2548  */
2549  for (;;)
2550  {
2551  outerslot = fetch_input_tuple(aggstate);
2552  if (TupIsNull(outerslot))
2553  break;
2554 
2555  /* set up for lookup_hash_entries and advance_aggregates */
2556  tmpcontext->ecxt_outertuple = outerslot;
2557 
2558  /* Find or build hashtable entries */
2559  lookup_hash_entries(aggstate);
2560 
2561  /* Advance the aggregates (or combine functions) */
2562  advance_aggregates(aggstate);
2563 
2564  /*
2565  * Reset per-input-tuple context after each tuple, but note that the
2566  * hash lookups do this too
2567  */
2568  ResetExprContext(aggstate->tmpcontext);
2569  }
2570 
2571  /* finalize spills, if any */
2572  hashagg_finish_initial_spills(aggstate);
2573 
2574  aggstate->table_filled = true;
2575  /* Initialize to walk the first hash table */
2576  select_current_set(aggstate, 0, true);
2577  ResetTupleHashIterator(aggstate->perhash[0].hashtable,
2578  &aggstate->perhash[0].hashiter);
2579 }
2580 
2581 /*
2582  * If any data was spilled during hash aggregation, reset the hash table and
2583  * reprocess one batch of spilled data. After reprocessing a batch, the hash
2584  * table will again contain data, ready to be consumed by
2585  * agg_retrieve_hash_table_in_memory().
2586  *
2587  * Should only be called after all in-memory hash table entries have been
2588  * finalized and emitted.
2589  *
2590  * Return false when input is exhausted and there's no more work to be done;
2591  * otherwise return true.
2592  */
2593 static bool
2594 agg_refill_hash_table(AggState *aggstate)
2595 {
2596  HashAggBatch *batch;
2597  AggStatePerHash perhash;
2598  HashAggSpill spill;
2599  LogicalTapeSet *tapeset = aggstate->hash_tapeset;
2600  bool spill_initialized = false;
2601 
2602  if (aggstate->hash_batches == NIL)
2603  return false;
2604 
2605  /* hash_batches is a stack, with the top item at the end of the list */
2606  batch = llast(aggstate->hash_batches);
2607  aggstate->hash_batches = list_delete_last(aggstate->hash_batches);
2608 
2609  hash_agg_set_limits(aggstate->hashentrysize, batch->input_card,
2610  batch->used_bits, &aggstate->hash_mem_limit,
2611  &aggstate->hash_ngroups_limit, NULL);
2612 
2613  /*
2614  * Each batch only processes one grouping set; set the rest to NULL so
2615  * that advance_aggregates() knows to ignore them. We don't touch
2616  * pergroups for sorted grouping sets here, because they will be needed if
2617  * we rescan later. The expressions for sorted grouping sets will not be
2618  * evaluated after we recompile anyway.
2619  */
2620  MemSet(aggstate->hash_pergroup, 0,
2621  sizeof(AggStatePerGroup) * aggstate->num_hashes);
2622 
2623  /* free memory and reset hash tables */
2624  ReScanExprContext(aggstate->hashcontext);
2625  for (int setno = 0; setno < aggstate->num_hashes; setno++)
2626  ResetTupleHashTable(aggstate->perhash[setno].hashtable);
2627 
2628  aggstate->hash_ngroups_current = 0;
2629 
2630  /*
2631  * In AGG_MIXED mode, hash aggregation happens in phase 1 and the output
2632  * happens in phase 0. So, we switch to phase 1 when processing a batch,
2633  * and back to phase 0 after the batch is done.
2634  */
2635  Assert(aggstate->current_phase == 0);
2636  if (aggstate->phase->aggstrategy == AGG_MIXED)
2637  {
2638  aggstate->current_phase = 1;
2639  aggstate->phase = &aggstate->phases[aggstate->current_phase];
2640  }
2641 
2642  select_current_set(aggstate, batch->setno, true);
2643 
2644  perhash = &aggstate->perhash[aggstate->current_set];
2645 
2646  /*
2647  * Spilled tuples are always read back as MinimalTuples, which may be
2648  * different from the outer plan, so recompile the aggregate expressions.
2649  *
2650  * We still need the NULL check, because we are only processing one
2651  * grouping set at a time and the rest will be NULL.
2652  */
2653  hashagg_recompile_expressions(aggstate, true, true);
2654 
2655  for (;;)
2656  {
2657  TupleTableSlot *spillslot = aggstate->hash_spill_rslot;
2658  TupleTableSlot *hashslot = perhash->hashslot;
2659  TupleHashEntry entry;
2660  MinimalTuple tuple;
2661  uint32 hash;
2662  bool isnew = false;
2663  bool *p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
2664 
2665  CHECK_FOR_INTERRUPTS();
2666 
2667  tuple = hashagg_batch_read(batch, &hash);
2668  if (tuple == NULL)
2669  break;
2670 
2671  ExecStoreMinimalTuple(tuple, spillslot, true);
2672  aggstate->tmpcontext->ecxt_outertuple = spillslot;
2673 
2674  prepare_hash_slot(perhash,
2675  aggstate->tmpcontext->ecxt_outertuple,
2676  hashslot);
2677  entry = LookupTupleHashEntryHash(perhash->hashtable, hashslot,
2678  p_isnew, hash);
2679 
2680  if (entry != NULL)
2681  {
2682  if (isnew)
2683  initialize_hash_entry(aggstate, perhash->hashtable, entry);
2684  aggstate->hash_pergroup[batch->setno] = entry->additional;
2685  advance_aggregates(aggstate);
2686  }
2687  else
2688  {
2689  if (!spill_initialized)
2690  {
2691  /*
2692  * Avoid initializing the spill until we actually need it so
2693  * that we don't assign tapes that will never be used.
2694  */
2695  spill_initialized = true;
2696  hashagg_spill_init(&spill, tapeset, batch->used_bits,
2697  batch->input_card, aggstate->hashentrysize);
2698  }
2699  /* no memory for a new group, spill */
2700  hashagg_spill_tuple(aggstate, &spill, spillslot, hash);
2701 
2702  aggstate->hash_pergroup[batch->setno] = NULL;
2703  }
2704 
2705  /*
2706  * Reset per-input-tuple context after each tuple, but note that the
2707  * hash lookups do this too
2708  */
2709  ResetExprContext(aggstate->tmpcontext);
2710  }
2711 
2712  LogicalTapeClose(batch->input_tape);
2713 
2714  /* change back to phase 0 */
2715  aggstate->current_phase = 0;
2716  aggstate->phase = &aggstate->phases[aggstate->current_phase];
2717 
2718  if (spill_initialized)
2719  {
2720  hashagg_spill_finish(aggstate, &spill, batch->setno);
2721  hash_agg_update_metrics(aggstate, true, spill.npartitions);
2722  }
2723  else
2724  hash_agg_update_metrics(aggstate, true, 0);
2725 
2726  aggstate->hash_spill_mode = false;
2727 
2728  /* prepare to walk the hash table of the batch's grouping set */
2729  select_current_set(aggstate, batch->setno, true);
2730  ResetTupleHashIterator(aggstate->perhash[batch->setno].hashtable,
2731  &aggstate->perhash[batch->setno].hashiter);
2732 
2733  pfree(batch);
2734 
2735  return true;
2736 }
2737 
2738 /*
2739  * ExecAgg for hashed case: retrieving groups from hash table
2740  *
2741  * After exhausting in-memory tuples, also try refilling the hash table using
2742  * previously-spilled tuples. Only returns NULL after all in-memory and
2743  * spilled tuples are exhausted.
2744  */
2745 static TupleTableSlot *
2746 agg_retrieve_hash_table(AggState *aggstate)
2747 {
2748  TupleTableSlot *result = NULL;
2749 
2750  while (result == NULL)
2751  {
2752  result = agg_retrieve_hash_table_in_memory(aggstate);
2753  if (result == NULL)
2754  {
2755  if (!agg_refill_hash_table(aggstate))
2756  {
2757  aggstate->agg_done = true;
2758  break;
2759  }
2760  }
2761  }
2762 
2763  return result;
2764 }
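/*
 * Illustrative sketch only, not part of nodeAgg.c: the drain-then-refill
 * loop above, modeled with a hypothetical DemoState in place of AggState.
 * All demo_* names are assumptions made for this example. It shows why the
 * retrieval is a loop: a refilled batch may yield no groups at all, in which
 * case the next batch has to be tried before giving up.
 */

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for AggState (assumption, not the real struct). */
typedef struct DemoState
{
    int  in_memory;       /* groups left in the current in-memory table */
    int  batch_sizes[4];  /* groups each spilled batch will produce */
    int  nbatches;        /* number of valid entries in batch_sizes */
    int  next_batch;      /* index of the next batch to refill from */
    bool done;            /* set once everything is exhausted */
} DemoState;

/* Return 1 if a group was retrieved from memory, 0 if the table is empty. */
static int
demo_retrieve_in_memory(DemoState *st)
{
    if (st->in_memory == 0)
        return 0;
    st->in_memory--;
    return 1;
}

/* Load the next spilled batch; return false if no batches remain. */
static bool
demo_refill(DemoState *st)
{
    assert(st->nbatches <= 4);
    if (st->next_batch >= st->nbatches)
        return false;
    st->in_memory = st->batch_sizes[st->next_batch++];
    return true;
}

/* Same control flow as the retrieval loop above. */
static int
demo_retrieve(DemoState *st)
{
    int result = 0;

    while (result == 0)
    {
        result = demo_retrieve_in_memory(st);
        if (result == 0 && !demo_refill(st))
        {
            st->done = true;
            break;
        }
    }
    return result;
}

/* Drain everything and count how many groups came out. */
static int
demo_count_groups(DemoState *st)
{
    int n = 0;

    while (demo_retrieve(st))
        n++;
    return n;
}
```

/*
 * In this sketch an empty batch (size 0) is skipped transparently by the
 * loop, and done is only set after the last batch has been consumed.
 */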
2765 
2766 /*
2767  * Retrieve the groups from the in-memory hash tables without considering any
2768  * spilled tuples.
2769  */
2770 static TupleTableSlot *
2771 agg_retrieve_hash_table_in_memory(AggState *aggstate)
2772 {
2773  ExprContext *econtext;
2774  AggStatePerAgg peragg;
2775  AggStatePerGroup pergroup;
2776  TupleHashEntryData *entry;
2777  TupleTableSlot *firstSlot;
2778  TupleTableSlot *result;
2779  AggStatePerHash perhash;
2780 
2781  /*
2782  * get state info from node.
2783  *
2784  * econtext is the per-output-tuple expression context.
2785  */
2786  econtext = aggstate->ss.ps.ps_ExprContext;
2787  peragg = aggstate->peragg;
2788  firstSlot = aggstate->ss.ss_ScanTupleSlot;
2789 
2790  /*
2791  * Note that perhash (and therefore anything accessed through it) can
2792  * change inside the loop, as we change between grouping sets.
2793  */
2794  perhash = &aggstate->perhash[aggstate->current_set];
2795 
2796  /*
2797  * We loop retrieving groups until we find one satisfying
2798  * aggstate->ss.ps.qual
2799  */
2800  for (;;)
2801  {
2802  TupleTableSlot *hashslot = perhash->hashslot;
2803  int i;
2804 
2805  CHECK_FOR_INTERRUPTS();
2806 
2807  /*
2808  * Find the next entry in the hash table
2809  */
2810  entry = ScanTupleHashTable(perhash->hashtable, &perhash->hashiter);
2811  if (entry == NULL)
2812  {
2813  int nextset = aggstate->current_set + 1;
2814 
2815  if (nextset < aggstate->num_hashes)
2816  {
2817  /*
2818  * Switch to next grouping set, reinitialize, and restart the
2819  * loop.
2820  */
2821  select_current_set(aggstate, nextset, true);
2822 
2823  perhash = &aggstate->perhash[aggstate->current_set];
2824 
2825  ResetTupleHashIterator(perhash->hashtable, &perhash->hashiter);
2826 
2827  continue;
2828  }
2829  else
2830  {
2831  return NULL;
2832  }
2833  }
2834 
2835  /*
2836  * Clear the per-output-tuple context for each group
2837  *
2838  * We intentionally don't use ReScanExprContext here; if any aggs have
2839  * registered shutdown callbacks, they mustn't be called yet, since we
2840  * might not be done with that agg.
2841  */
2842  ResetExprContext(econtext);
2843 
2844  /*
2845  * Transform representative tuple back into one with the right
2846  * columns.
2847  */
2848  ExecStoreMinimalTuple(entry->firstTuple, hashslot, false);
2849  slot_getallattrs(hashslot);
2850 
2851  ExecClearTuple(firstSlot);
2852  memset(firstSlot->tts_isnull, true,
2853  firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
2854 
2855  for (i = 0; i < perhash->numhashGrpCols; i++)
2856  {
2857  int varNumber = perhash->hashGrpColIdxInput[i] - 1;
2858 
2859  firstSlot->tts_values[varNumber] = hashslot->tts_values[i];
2860  firstSlot->tts_isnull[varNumber] = hashslot->tts_isnull[i];
2861  }
2862  ExecStoreVirtualTuple(firstSlot);
2863 
2864  pergroup = (AggStatePerGroup) entry->additional;
2865 
2866  /*
2867  * Use the representative input tuple for any references to
2868  * non-aggregated input columns in the qual and tlist.
2869  */
2870  econtext->ecxt_outertuple = firstSlot;
2871 
2872  prepare_projection_slot(aggstate,
2873  econtext->ecxt_outertuple,
2874  aggstate->current_set);
2875 
2876  finalize_aggregates(aggstate, peragg, pergroup);
2877 
2878  result = project_aggregates(aggstate);
2879  if (result)
2880  return result;
2881  }
2882 
2883  /* No more groups */
2884  return NULL;
2885 }
2886 
2887 /*
2888  * hashagg_spill_init
2889  *
2890  * Called after we determined that spilling is necessary. Chooses the number
2891  * of partitions to create, and initializes them.
2892  */
2893 static void
2894 hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
2895  double input_groups, double hashentrysize)
2896 {
2897  int npartitions;
2898  int partition_bits;
2899 
2900  npartitions = hash_choose_num_partitions(input_groups, hashentrysize,
2901  used_bits, &partition_bits);
2902 
2903  spill->partitions = palloc0(sizeof(LogicalTape *) * npartitions);
2904  spill->ntuples = palloc0(sizeof(int64) * npartitions);
2905  spill->hll_card = palloc0(sizeof(hyperLogLogState) * npartitions);
2906 
2907  for (int i = 0; i < npartitions; i++)
2908  spill->partitions[i] = LogicalTapeCreate(tapeset);
2909 
2910  spill->shift = 32 - used_bits - partition_bits;
2911  spill->mask = (npartitions - 1) << spill->shift;
2912  spill->npartitions = npartitions;
2913 
2914  for (int i = 0; i < npartitions; i++)
2915  initHyperLogLog(&spill->hll_card[i], HASHAGG_HLL_BIT_WIDTH);
2916 }
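/*
 * Illustrative sketch only, not part of nodeAgg.c: how the shift and mask
 * set up above select a partition from a 32-bit hash. DemoSpill and the
 * demo_* functions are hypothetical names. The sketch assumes that the
 * number of partitions is the power of two 1 << partition_bits, and that
 * partition_bits >= 1 and used_bits + partition_bits <= 32.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced version of HashAggSpill (assumption). */
typedef struct DemoSpill
{
    int      shift;        /* right shift that isolates the partition bits */
    uint32_t mask;         /* mask selecting the partition bits in the hash */
    int      npartitions;  /* assumed to be a power of two */
} DemoSpill;

/* Mirror the shift/mask arithmetic of the function above. */
static void
demo_spill_init(DemoSpill *s, int used_bits, int partition_bits)
{
    int npartitions;

    assert(partition_bits >= 1);
    assert(used_bits >= 0 && used_bits + partition_bits <= 32);

    npartitions = 1 << partition_bits;
    s->shift = 32 - used_bits - partition_bits;
    s->mask = (uint32_t) (npartitions - 1) << s->shift;
    s->npartitions = npartitions;
}

/* Pick the partition for a hash value: mask first, then shift down. */
static int
demo_partition(const DemoSpill *s, uint32_t hash)
{
    return (int) ((hash & s->mask) >> s->shift);
}
```

/*
 * With used_bits = 0 and partition_bits = 2 the two highest hash bits pick
 * the partition; with used_bits = 2 the next bits below those are used.
 */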
2917 
2918 /*
2919  * hashagg_spill_tuple
2920  *
2921  * No room for new groups in the hash table. Save for later in the appropriate
2922  * partition.
2923  */
2924 static Size
2925 hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
2926  TupleTableSlot *inputslot, uint32 hash)
2927 {
2928  TupleTableSlot *spillslot;
2929  int partition;
2930  MinimalTuple tuple;
2931  LogicalTape *tape;
2932  int total_written = 0;
2933  bool shouldFree;
2934 
2935  Assert(spill->partitions != NULL);
2936 
2937  /* spill only attributes that we actually need */
2938  if (!aggstate->all_cols_needed)
2939  {
2940  spillslot = aggstate->hash_spill_wslot;
2941  slot_getsomeattrs(inputslot, aggstate->max_colno_needed);
2942  ExecClearTuple(spillslot);
2943  for (int i = 0; i < spillslot->tts_tupleDescriptor->natts; i++)
2944  {
2945  if (bms_is_member(i + 1, aggstate->colnos_needed))
2946  {
2947  spillslot->tts_values[i] = inputslot->tts_values[i];
2948  spillslot->tts_isnull[i] = inputslot->tts_isnull[i];
2949  }
2950  else
2951  spillslot->tts_isnull[i] = true;
2952  }
2953  ExecStoreVirtualTuple(spillslot);
2954  }
2955  else
2956  spillslot = inputslot;
2957 
2958  tuple = ExecFetchSlotMinimalTuple(spillslot, &shouldFree);
2959 
2960  partition = (hash & spill->mask) >> spill->shift;
2961  spill->ntuples[partition]++;
2962 
2963  /*
2964  * All hash values destined for a given partition have some bits in
2965  * common, which causes bad HLL cardinality estimates. Hash the hash to
2966  * get a more uniform distribution.
2967  */
2968  addHyperLogLog(&spill->hll_card[partition], hash_bytes_uint32(hash));
2969 
2970  tape = spill->partitions[partition];
2971 
2972  LogicalTapeWrite(tape, &hash, sizeof(uint32));
2973  total_written += sizeof(uint32);
2974 
2975  LogicalTapeWrite(tape, tuple, tuple->t_len);
2976  total_written += tuple->t_len;
2977 
2978  if (shouldFree)
2979  pfree(tuple);
2980 
2981  return total_written;
2982 }
2983 
2984 /*
2985  * hashagg_batch_new
2986  *
2987  * Construct a HashAggBatch item, which represents one iteration of HashAgg to
2988  * be done.
2989  */
2990 static HashAggBatch *
2991 hashagg_batch_new(LogicalTape *input_tape, int setno,
2992  int64 input_tuples, double input_card, int used_bits)
2993 {
2994  HashAggBatch *batch = palloc0(sizeof(HashAggBatch));
2995 
2996  batch->setno = setno;
2997  batch->used_bits = used_bits;
2998  batch->input_tape = input_tape;
2999  batch->input_tuples = input_tuples;
3000  batch->input_card = input_card;
3001 
3002  return batch;
3003 }
3004 
3005 /*
3006  * hashagg_batch_read
3007  * read the next tuple from a batch's tape. Return NULL if no more.
3008  */
3009 static MinimalTuple
3010 hashagg_batch_read(HashAggBatch *batch, uint32 *hashp)
3011 {
3012  LogicalTape *tape = batch->input_tape;
3013  MinimalTuple tuple;
3014  uint32 t_len;
3015  size_t nread;
3016  uint32 hash;
3017 
3018  nread = LogicalTapeRead(tape, &hash, sizeof(uint32));
3019  if (nread == 0)
3020  return NULL;
3021  if (nread != sizeof(uint32))
3022  ereport(ERROR,
3023  (errcode_for_file_access(),
3024  errmsg_internal("unexpected EOF for tape %p: requested %zu bytes, read %zu bytes",
3025  tape, sizeof(uint32), nread)));
3026  if (hashp != NULL)
3027  *hashp = hash;
3028 
3029  nread = LogicalTapeRead(tape, &t_len, sizeof(t_len));
3030  if (nread != sizeof(uint32))
3031  ereport(ERROR,
3032  (errcode_for_file_access(),
3033  errmsg_internal("unexpected EOF for tape %p: requested %zu bytes, read %zu bytes",
3034  tape, sizeof(uint32), nread)));
3035 
3036  tuple = (MinimalTuple) palloc(t_len);
3037  tuple->t_len = t_len;
3038 
3039  nread = LogicalTapeRead(tape,
3040  (char *) tuple + sizeof(uint32),
3041  t_len - sizeof(uint32));
3042  if (nread != t_len - sizeof(uint32))
3043  ereport(ERROR,
3044  (errcode_for_file_access(),
3045  errmsg_internal("unexpected EOF for tape %p: requested %zu bytes, read %zu bytes",
3046  tape, t_len - sizeof(uint32), nread)));
3047 
3048  return tuple;
3049 }
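/*
 * Illustrative sketch only, not part of nodeAgg.c: the spill record layout
 * used by the write and read routines above, modeled on an in-memory buffer.
 * Each record is a uint32 hash followed by the tuple bytes, and the tuple's
 * first uint32 is its own total length. DemoTape and the demo_* functions
 * are hypothetical; they stand in for the logical tape API and return NULL
 * on any short read instead of reporting an error.
 */

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory "tape" (assumption, not LogicalTape). */
typedef struct DemoTape
{
    unsigned char buf[256];
    size_t        len;   /* bytes written so far */
    size_t        pos;   /* current read position */
} DemoTape;

static size_t
demo_tape_write(DemoTape *t, const void *p, size_t n)
{
    assert(t->len + n <= sizeof(t->buf));
    memcpy(t->buf + t->len, p, n);
    t->len += n;
    return n;
}

/* Read up to n bytes; returns how many were actually available. */
static size_t
demo_tape_read(DemoTape *t, void *p, size_t n)
{
    size_t avail = t->len - t->pos;

    if (n > avail)
        n = avail;
    memcpy(p, t->buf + t->pos, n);
    t->pos += n;
    return n;
}

/* Write one record: the hash, then the tuple (length word included). */
static size_t
demo_spill_record(DemoTape *t, uint32_t hash, const void *tuple, uint32_t t_len)
{
    size_t written = 0;

    written += demo_tape_write(t, &hash, sizeof(uint32_t));
    written += demo_tape_write(t, tuple, t_len);
    return written;
}

/* Read one record back; caller frees the result. NULL means no more data. */
static void *
demo_read_record(DemoTape *t, uint32_t *hashp)
{
    uint32_t       hash;
    uint32_t       t_len;
    size_t         rest;
    unsigned char *tuple;

    if (demo_tape_read(t, &hash, sizeof(hash)) != sizeof(hash))
        return NULL;
    if (demo_tape_read(t, &t_len, sizeof(t_len)) != sizeof(t_len))
        return NULL;
    if (t_len < sizeof(uint32_t))
        return NULL;

    tuple = malloc(t_len);
    if (tuple == NULL)
        return NULL;

    /* The length word was consumed already, so store it back by hand. */
    memcpy(tuple, &t_len, sizeof(t_len));
    rest = t_len - sizeof(uint32_t);
    if (demo_tape_read(t, tuple + sizeof(uint32_t), rest) != rest)
    {
        free(tuple);
        return NULL;
    }
    if (hashp != NULL)
        *hashp = hash;
    return tuple;
}
```

/*
 * Storing the hash next to the tuple means a later batch can reuse it for
 * lookup and re-partitioning without recomputing it from the tuple.
 */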
3050 
3051 /*
3052  * hashagg_finish_initial_spills
3053  *
3054  * After a HashAggBatch has been processed, it may have spilled tuples to
3055  * disk. If so, turn the spilled partitions into new batches that must later
3056  * be executed.
3057  */
3058 static void
3059 hashagg_finish_initial_spills(AggState *aggstate)
3060 {
3061  int setno;
3062  int total_npartitions = 0;
3063 
3064  if (aggstate->hash_spills != NULL)
3065  {
3066  for (setno = 0; setno < aggstate->num_hashes; setno++)
3067  {
3068  HashAggSpill *spill = &aggstate->hash_spills[setno];
3069 
3070  total_npartitions += spill->npartitions;
3071  hashagg_spill_finish(aggstate, spill, setno);
3072  }
3073 
3074  /*
3075  * We're not processing tuples from outer plan any more; only
3076  * processing batches of spilled tuples. The initial spill structures
3077  * are no longer needed.
3078  */
3079  pfree(aggstate->hash_spills);
3080  aggstate->hash_spills = NULL;
3081  }
3082 
3083  hash_agg_update_metrics(aggstate, false, total_npartitions);
3084  aggstate->hash_spill_mode = false;
3085 }
3086 
3087 /*
3088  * hashagg_spill_finish
3089  *
3090  * Transform spill partitions into new batches.
3091  */
3092 static void
3093 hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
3094 {
3095  int i;
3096  int used_bits = 32 - spill->shift;
3097 
3098  if (spill->npartitions == 0)
3099  return; /* didn't spill */
3100 
3101  for (i = 0; i < spill->npartitions; i++)
3102  {
3103  LogicalTape *tape = spill->partitions[i];
3104  HashAggBatch *new_batch;
3105  double cardinality;
3106 
3107  /* if the partition is empty, don't create a new batch of work */
3108  if (spill->ntuples[i] == 0)
3109  continue;
3110 
3111  cardinality = estimateHyperLogLog(&spill->hll_card[i]);
3112  freeHyperLogLog(&spill->hll_card[i]);
3113 
3114  /* rewinding frees the buffer while not in use */
3115  LogicalTapeRewindForRead(tape, HASHAGG_READ_BUFFER_SIZE);
3116 
3117  new_batch = hashagg_batch_new(tape, setno,
3118  spill->ntuples[i], cardinality,
3119  used_bits);
3120  aggstate->hash_batches = lappend(aggstate->hash_batches, new_batch);
3121  aggstate->hash_batches_used++;
3122  }
3123 
3124  pfree(spill->ntuples);
3125  pfree(spill->hll_card);
3126  pfree(spill->partitions);
3127 }
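/*
 * Illustrative sketch only, not part of nodeAgg.c: because each new batch
 * records used_bits = 32 - shift, a batch that spills again partitions on
 * hash bits that no earlier level has consumed. demo_partition_path is a
 * hypothetical helper; it assumes every level uses the same partition_bits
 * (at least 1) and that depth * partition_bits <= 32.
 */

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fill path[0..depth-1] with the partition index chosen at each nested
 * spill level and return the number of hash bits consumed in total.
 */
static int
demo_partition_path(uint32_t hash, int partition_bits, int depth, int *path)
{
    int used_bits = 0;

    assert(partition_bits >= 1);
    assert(depth >= 0 && depth * partition_bits <= 32);

    for (int level = 0; level < depth; level++)
    {
        int      shift = 32 - used_bits - partition_bits;
        uint32_t mask = (((uint32_t) 1 << partition_bits) - 1) << shift;

        path[level] = (int) ((hash & mask) >> shift);

        /* Same bookkeeping as above: the child batch starts here. */
        used_bits = 32 - shift;
    }
    return used_bits;
}
```

/*
 * With 4 bits per level, the hash 0xDEADBEEF is routed by its hex digits
 * D, E, A in turn, so tuples that collided in one partition at a given
 * level are spread out again at the next.
 */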
3128 
3129 /*
3130  * Free resources related to a spilled HashAgg.
3131  */
3132 static void
3133 hashagg_reset_spill_state(AggState *aggstate)
3134 {
3135  /* free spills from initial pass */
3136  if (aggstate->hash_spills != NULL)
3137  {
3138  int setno;
3139 
3140  for (setno = 0; setno < aggstate->num_hashes; setno++)
3141  {
3142  HashAggSpill *spill = &aggstate->hash_spills[setno];
3143 
3144  pfree(spill->ntuples);
3145  pfree(spill->partitions);
3146  }
3147  pfree(aggstate->hash_spills);
3148  aggstate->hash_spills = NULL;
3149  }
3150 
3151  /* free batches */
3152  list_free_deep(aggstate->hash_batches);
3153  aggstate->hash_batches = NIL;
3154 
3155  /* close tape set */
3156  if (aggstate->hash_tapeset != NULL)
3157  {
3158  LogicalTapeSetClose(aggstate->hash_tapeset);
3159  aggstate->hash_tapeset = NULL;
3160  }
3161 }
3162 
3163 
3164 /* -----------------
3165  * ExecInitAgg
3166  *
3167  * Creates the run-time information for the agg node produced by the
3168  * planner and initializes its outer subtree.
3169  *
3170  * -----------------
3171  */
3172 AggState *
3173 ExecInitAgg(Agg *node, EState *estate, int eflags)
3174 {
3175  AggState *aggstate;
3176  AggStatePerAgg peraggs;
3177  AggStatePerTrans pertransstates;
3178  AggStatePerGroup *pergroups;
3179  Plan *outerPlan;
3180  ExprContext *econtext;
3181  TupleDesc scanDesc;
3182  int max_aggno;
3183  int max_transno;
3184  int numaggrefs;
3185  int numaggs;
3186  int numtrans;
3187  int phase;
3188  int phaseidx;
3189  ListCell *l;
3190  Bitmapset *all_grouped_cols = NULL;
3191  int numGroupingSets = 1;
3192  int numPhases;
3193  int numHashes;
3194  int i = 0;
3195  int j = 0;
3196  bool use_hashing = (node->aggstrategy == AGG_HASHED ||
3197  node->aggstrategy == AGG_MIXED);
3198 
3199  /* check for unsupported flags */
3200  Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
3201 
3202  /*
3203  * create state structure
3204  */
3205  aggstate = makeNode(AggState);
3206  aggstate->ss.ps.plan = (Plan *) node;
3207  aggstate->ss.ps.state = estate;
3208  aggstate->ss.ps.ExecProcNode = ExecAgg;
3209 
3210  aggstate->aggs = NIL;
3211  aggstate->numaggs = 0;
3212  aggstate->numtrans = 0;
3213  aggstate->aggstrategy = node->aggstrategy;
3214  aggstate->aggsplit = node->aggsplit;
3215  aggstate->maxsets = 0;
3216  aggstate->projected_set = -1;
3217  aggstate->current_set = 0;
3218  aggstate->peragg = NULL;
3219  aggstate->pertrans = NULL;
3220  aggstate->curperagg = NULL;
3221  aggstate->curpertrans = NULL;
3222  aggstate->input_done = false;
3223  aggstate->agg_done = false;
3224  aggstate->pergroups = NULL;
3225  aggstate->grp_firstTuple = NULL;
3226  aggstate->sort_in = NULL;
3227  aggstate->sort_out = NULL;
3228 
3229  /*
3230  * phases[0] always exists, but is dummy in sorted/plain mode
3231  */
3232  numPhases = (use_hashing ? 1 : 2);
3233  numHashes = (use_hashing ? 1 : 0);
3234 
3235  /*
3236  * Calculate the maximum number of grouping sets in any phase; this
3237  * determines the size of some allocations. Also calculate the number of
3238  * phases, since all hashed/mixed nodes contribute to only a single phase.
3239  */
3240  if (node->groupingSets)
3241  {
3242  numGroupingSets = list_length(node->groupingSets);
3243 
3244  foreach(l, node->chain)
3245  {
3246  Agg *agg = lfirst(l);
3247 
3248  numGroupingSets = Max(numGroupingSets,
3249  list_length(agg->groupingSets));
3250 
3251  /*
3252  * additional AGG_HASHED aggs become part of phase 0, but all
3253  * others add an extra phase.
3254  */
3255  if (agg->aggstrategy != AGG_HASHED)
3256  ++numPhases;
3257  else
3258  ++numHashes;
3259  }
3260  }
3261 
3262  aggstate->maxsets = numGroupingSets;
3263  aggstate->numphases = numPhases;
3264 
3265  aggstate->aggcontexts = (ExprContext **)
3266  palloc0(sizeof(ExprContext *) * numGroupingSets);
3267 
3268  /*
3269  * Create expression contexts. We need three or more, one for
3270  * per-input-tuple processing, one for per-output-tuple processing, one
3271  * for all the hashtables, and one for each grouping set. The per-tuple
3272  * memory context of the per-grouping-set ExprContexts (aggcontexts)
3273  * replaces the standalone memory context formerly used to hold transition
3274  * values. We cheat a little by using ExecAssignExprContext() to build
3275  * all of them.
3276  *
3277  * NOTE: the details of what is stored in aggcontexts and what is stored
3278  * in the regular per-query memory context are driven by a simple
3279  * decision: we want to reset the aggcontext at group boundaries (if not
3280  * hashing) and in ExecReScanAgg to recover no-longer-wanted space.
3281  */
3282  ExecAssignExprContext(estate, &aggstate->ss.ps);
3283  aggstate->tmpcontext = aggstate->ss.ps.ps_ExprContext;
3284 
3285  for (i = 0; i < numGroupingSets; ++i)
3286  {
3287  ExecAssignExprContext(estate, &aggstate->ss.ps);
3288  aggstate->aggcontexts[i] = aggstate->ss.ps.ps_ExprContext;
3289  }
3290 
3291  if (use_hashing)
3292  aggstate->hashcontext = CreateWorkExprContext(estate);
3293 
3294  ExecAssignExprContext(estate, &aggstate->ss.ps);
3295 
3296  /*
3297  * Initialize child nodes.
3298  *
3299  * If we are doing a hashed aggregation then the child plan does not need
3300  * to handle REWIND efficiently; see ExecReScanAgg.
3301  */
3302  if (node->aggstrategy == AGG_HASHED)
3303  eflags &= ~EXEC_FLAG_REWIND;
3304  outerPlan = outerPlan(node);
3305  outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
3306 
3307  /*
3308  * initialize source tuple type.
3309  */
3310  aggstate->ss.ps.outerops =
3311  ExecGetResultSlotOps(outerPlanState(&aggstate->ss),
3312  &aggstate->ss.ps.outeropsfixed);
3313  aggstate->ss.ps.outeropsset = true;
3314 
3315  ExecCreateScanSlotFromOuterPlan(estate, &aggstate->ss,
3316  aggstate->ss.ps.outerops);
3317  scanDesc = aggstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
3318 
3319  /*
3320  * If there are more than two phases (including a potential dummy phase
3321  * 0), input will be resorted using tuplesort. Need a slot for that.
3322  */
3323  if (numPhases > 2)
3324  {
3325  aggstate->sort_slot = ExecInitExtraTupleSlot(estate, scanDesc,
3326  &TTSOpsMinimalTuple);
3327 
3328  /*
3329  * The output of the tuplesort, and the output from the outer child
3330  * might not use the same type of slot. In most cases the child will
3331  * be a Sort, and thus return a TTSOpsMinimalTuple type slot - but the
3332  * input can also be presorted due to an index, in which case it could be
3333  * a different type of slot.
3334  *
3335  * XXX: For efficiency it would be good to instead/additionally
3336  * generate expressions with corresponding settings of outerops* for
3337  * the individual phases - deforming is often a bottleneck for
3338  * aggregations with lots of rows per group. If there are multiple
3339  * sorts, we know that all but the first use TTSOpsMinimalTuple (via
3340  * the nodeAgg.c internal tuplesort).
3341  */
3342  if (aggstate->ss.ps.outeropsfixed &&
3343  aggstate->ss.ps.outerops != &TTSOpsMinimalTuple)
3344  aggstate->ss.ps.outeropsfixed = false;
3345  }
3346 
3347  /*
3348  * Initialize result type, slot and projection.
3349  */
3350  ExecInitResultTupleSlotTL(&aggstate->ss.ps, &TTSOpsVirtual);
3351  ExecAssignProjectionInfo(&aggstate->ss.ps, NULL);
3352 
3353  /*
3354  * initialize child expressions
3355  *
3356  * We expect the parser to have checked that no aggs contain other agg
3357  * calls in their arguments (and just to be sure, we verify it again while
3358  * initializing the plan node). This would make no sense under SQL
3359  * semantics, and it's forbidden by the spec. Because it is true, we
3360  * don't need to worry about evaluating the aggs in any particular order.
3361  *
3362  * Note: execExpr.c finds Aggrefs for us, and adds them to aggstate->aggs.
3363  * Aggrefs in the qual are found here; Aggrefs in the targetlist are found
3364  * during ExecAssignProjectionInfo, above.
3365  */
3366  aggstate->ss.ps.qual =
3367  ExecInitQual(node->plan.qual, (PlanState *) aggstate);
3368 
3369  /*
3370  * We should now have found all Aggrefs in the targetlist and quals.
3371  */
3372  numaggrefs = list_length(aggstate->aggs);
3373  max_aggno = -1;
3374  max_transno = -1;
3375  foreach(l, aggstate->aggs)
3376  {
3377  Aggref *aggref = (Aggref *) lfirst(l);
3378 
3379  max_aggno = Max(max_aggno, aggref->aggno);
3380  max_transno = Max(max_transno, aggref->aggtransno);
3381  }
3382  numaggs = max_aggno + 1;
3383  numtrans = max_transno + 1;
3384 
3385  /*
3386  * For each phase, prepare grouping set data and fmgr lookup data for
3387  * compare functions. Accumulate all_grouped_cols in passing.
3388  */
3389  aggstate->phases = palloc0(numPhases * sizeof(AggStatePerPhaseData));
3390 
3391  aggstate->num_hashes = numHashes;
3392  if (numHashes)
3393  {
3394  aggstate->perhash = palloc0(sizeof(AggStatePerHashData) * numHashes);
3395  aggstate->phases[0].numsets = 0;
3396  aggstate->phases[0].gset_lengths = palloc(numHashes * sizeof(int));
3397  aggstate->phases[0].grouped_cols = palloc(numHashes * sizeof(Bitmapset *));
3398  }
3399 
3400  phase = 0;
3401  for (phaseidx = 0; phaseidx <= list_length(node->chain); ++phaseidx)
3402  {
3403  Agg *aggnode;
3404  Sort *sortnode;
3405 
3406  if (phaseidx > 0)
3407  {
3408  aggnode = list_nth_node(Agg, node->chain, phaseidx - 1);
3409  sortnode = castNode(Sort, outerPlan(aggnode));
3410  }
3411  else
3412  {
3413  aggnode = node;
3414  sortnode = NULL;
3415  }
3416 
3417  Assert(phase <= 1 || sortnode);
3418 
3419  if (aggnode->aggstrategy == AGG_HASHED
3420  || aggnode->aggstrategy == AGG_MIXED)
3421  {
3422  AggStatePerPhase phasedata = &aggstate->phases[0];
3423  AggStatePerHash perhash;
3424  Bitmapset *cols = NULL;
3425 
3426  Assert(phase == 0);
3427  i = phasedata->numsets++;
3428  perhash = &aggstate->perhash[i];
3429 
3430  /* phase 0 always points to the "real" Agg in the hash case */
3431  phasedata->aggnode = node;
3432  phasedata->aggstrategy = node->aggstrategy;
3433 
3434  /* but the actual Agg node representing this hash is saved here */
3435  perhash->aggnode = aggnode;
3436 
3437  phasedata->gset_lengths[i] = perhash->numCols = aggnode->numCols;
3438 
3439  for (j = 0; j < aggnode->numCols; ++j)
3440  cols = bms_add_member(cols, aggnode->grpColIdx[j]);
3441 
3442  phasedata->grouped_cols[i] = cols;
3443 
3444  all_grouped_cols = bms_add_members(all_grouped_cols, cols);
3445  continue;
3446  }
3447  else
3448  {
3449  AggStatePerPhase phasedata = &aggstate->phases[++phase];
3450  int num_sets;
3451 
3452  phasedata->numsets = num_sets = list_length(aggnode->groupingSets);
3453 
3454  if (num_sets)
3455  {
3456  phasedata->gset_lengths = palloc(num_sets * sizeof(int));
3457  phasedata->grouped_cols = palloc(num_sets * sizeof(Bitmapset *));
3458 
3459  i = 0;
3460  foreach(l, aggnode->groupingSets)
3461  {
3462  int current_length = list_length(lfirst(l));
3463  Bitmapset *cols = NULL;
3464 
3465  /* planner forces this to be correct */
3466  for (j = 0; j < current_length; ++j)
3467  cols = bms_add_member(cols, aggnode->grpColIdx[j]);
3468 
3469  phasedata->grouped_cols[i] = cols;
3470  phasedata->gset_lengths[i] = current_length;
3471 
3472  ++i;
3473  }
3474 
3475  all_grouped_cols = bms_add_members(all_grouped_cols,
3476  phasedata->grouped_cols[0]);
3477  }
3478  else
3479  {
3480  Assert(phaseidx == 0);
3481 
3482  phasedata->gset_lengths = NULL;
3483  phasedata->grouped_cols = NULL;
3484  }
3485 
3486  /*
3487  * If we are grouping, precompute fmgr lookup data for inner loop.
3488  */
3489  if (aggnode->aggstrategy == AGG_SORTED)
3490  {
3491  /*
3492  * Build a separate function for each subset of columns that
3493  * need to be compared.
3494  */
3495  phasedata->eqfunctions =
3496  (ExprState **) palloc0(aggnode->numCols * sizeof(ExprState *));
3497 
3498  /* for each grouping set */
3499  for (int k = 0; k < phasedata->numsets; k++)
3500  {
3501  int length = phasedata->gset_lengths[k];
3502 
3503  /* nothing to do for empty grouping set */
3504  if (length == 0)
3505  continue;
3506 
3507  /* if we already had one of this length, it'll do */
3508  if (phasedata->eqfunctions[length - 1] != NULL)
3509  continue;
3510 
3511  phasedata->eqfunctions[length - 1] =
3512  execTuplesMatchPrepare(scanDesc,
3513  length,
3514  aggnode->grpColIdx,
3515  aggnode->grpOperators,
3516  aggnode->grpCollations,
3517  (PlanState *) aggstate);
3518  }
3519 
3520  /* and for all grouped columns, unless already computed */
3521  if (aggnode->numCols > 0 &&
3522  phasedata->eqfunctions[aggnode->numCols - 1] == NULL)
3523  {
3524  phasedata->eqfunctions[aggnode->numCols - 1] =
3525  execTuplesMatchPrepare(scanDesc,
3526  aggnode->numCols,
3527  aggnode->grpColIdx,
3528  aggnode->grpOperators,
3529  aggnode->grpCollations,
3530  (PlanState *) aggstate);
3531  }
3532  }
3533 
3534  phasedata->aggnode = aggnode;
3535  phasedata->aggstrategy = aggnode->aggstrategy;
3536  phasedata->sortnode = sortnode;
3537  }
3538  }
3539 
3540  /*
3541  * Convert all_grouped_cols to a descending-order list.
3542  */
3543  i = -1;
3544  while ((i = bms_next_member(all_grouped_cols, i)) >= 0)
3545  aggstate->all_grouped_cols = lcons_int(i, aggstate->all_grouped_cols);
3546 
3547  /*
3548  * Set up aggregate-result storage in the output expr context, and also
3549  * allocate my private per-agg working storage
3550  */
3551  econtext = aggstate->ss.ps.ps_ExprContext;
3552  econtext->ecxt_aggvalues = (Datum *) palloc0(sizeof(Datum) * numaggs);
3553  econtext->ecxt_aggnulls = (bool *) palloc0(sizeof(bool) * numaggs);
3554 
3555  peraggs = (AggStatePerAgg) palloc0(sizeof(AggStatePerAggData) * numaggs);
3556  pertransstates = (AggStatePerTrans) palloc0(sizeof(AggStatePerTransData) * numtrans);
3557 
3558  aggstate->peragg = peraggs;
3559  aggstate->pertrans = pertransstates;
3560 
3561 
3562  aggstate->all_pergroups =
3563  (AggStatePerGroup *) palloc0(sizeof(AggStatePerGroup)
3564  * (numGroupingSets + numHashes));
3565  pergroups = aggstate->all_pergroups;
3566 
3567  if (node->aggstrategy != AGG_HASHED)
3568  {
3569  for (i = 0; i < numGroupingSets; i++)
3570  {
3571  pergroups[i] = (AggStatePerGroup) palloc0(sizeof(AggStatePerGroupData)
3572  * numaggs);
3573  }
3574 
3575  aggstate->pergroups = pergroups;
3576  pergroups += numGroupingSets;
3577  }
3578 
3579  /*
3580  * Hashing can only appear in the initial phase.
3581  */
3582  if (use_hashing)
3583  {
3584  Plan *outerplan = outerPlan(node);
3585  uint64 totalGroups = 0;
3586 
3587  aggstate->hash_metacxt = AllocSetContextCreate(aggstate->ss.ps.state->es_query_cxt,
3588  "HashAgg meta context",
3589  ALLOCSET_DEFAULT_SIZES);
3590  aggstate->hash_spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
3591  &TTSOpsMinimalTuple);
3592  aggstate->hash_spill_wslot = ExecInitExtraTupleSlot(estate, scanDesc,
3593  &TTSOpsVirtual);
3594 
3595  /* this is an array of pointers, not structures */
3596  aggstate->hash_pergroup = pergroups;
3597 
3598  aggstate->hashentrysize = hash_agg_entry_size(aggstate->numtrans,
3599  outerplan->plan_width,
3600  node->transitionSpace);
3601 
3602  /*
3603  * Consider all of the grouping sets together when setting the limits
3604  * and estimating the number of partitions. This can be inaccurate
3605  * when there is more than one grouping set, but should still be
3606  * reasonable.
3607  */
3608  for (int k = 0; k < aggstate->num_hashes; k++)
3609  totalGroups += aggstate->perhash[k].aggnode->numGroups;
3610 
3611  hash_agg_set_limits(aggstate->hashentrysize, totalGroups, 0,
3612  &aggstate->hash_mem_limit,
3613  &aggstate->hash_ngroups_limit,
3614  &aggstate->hash_planned_partitions);
3615  find_hash_columns(aggstate);
3616 
3617  /* Skip massive memory allocation if we are just doing EXPLAIN */
3618  if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
3619  build_hash_tables(aggstate);
3620 
3621  aggstate->table_filled = false;
3622 
3623  /* Initialize this to 1, meaning nothing spilled, yet */
3624  aggstate->hash_batches_used = 1;
3625  }
3626 
3627  /*
3628  * Initialize current phase-dependent values to initial phase. The initial
3629  * phase is 1 (first sort pass) for all strategies that use sorting (if
3630  * hashing is being done too, then phase 0 is processed last); but if only
3631  * hashing is being done, then phase 0 is all there is.
3632  */
3633  if (node->aggstrategy == AGG_HASHED)
3634  {
3635  aggstate->current_phase = 0;
3636  initialize_phase(aggstate, 0);
3637  select_current_set(aggstate, 0, true);
3638  }
3639  else
3640  {
3641  aggstate->current_phase = 1;
3642  initialize_phase(aggstate, 1);
3643  select_current_set(aggstate, 0, false);
3644  }
3645 
3646  /*
3647  * Perform lookups of aggregate function info, and initialize the
3648  * unchanging fields of the per-agg and per-trans data.
3649  */
3650  foreach(l, aggstate->aggs)
3651  {
3652  Aggref *aggref = lfirst(l);
3653  AggStatePerAgg peragg;
3654  AggStatePerTrans pertrans;
3655  Oid aggTransFnInputTypes[FUNC_MAX_ARGS];
3656  int numAggTransFnArgs;
3657  int numDirectArgs;
3658  HeapTuple aggTuple;
3659  Form_pg_aggregate aggform;
3660  AclResult aclresult;
3661  Oid finalfn_oid;
3662  Oid serialfn_oid,
3663  deserialfn_oid;
3664  Oid aggOwner;
3665  Expr *finalfnexpr;
3666  Oid aggtranstype;
3667 
3668  /* Planner should have assigned aggregate to correct level */
3669  Assert(aggref->agglevelsup == 0);
3670  /* ... and the split mode should match */
3671  Assert(aggref->aggsplit == aggstate->aggsplit);
3672 
3673  peragg = &peraggs[aggref->aggno];
3674 
3675  /* Check if we initialized the state for this aggregate already. */
3676  if (peragg->aggref != NULL)
3677  continue;
3678 
3679  peragg->aggref = aggref;
3680  peragg->transno = aggref->aggtransno;
3681 
3682  /* Fetch the pg_aggregate row */
3683  aggTuple = SearchSysCache1(AGGFNOID,
3684  ObjectIdGetDatum(aggref->aggfnoid));
3685  if (!HeapTupleIsValid(aggTuple))
3686  elog(ERROR, "cache lookup failed for aggregate %u",
3687  aggref->aggfnoid);
3688  aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
3689 
3690  /* Check permission to call aggregate function */
3691  aclresult = object_aclcheck(ProcedureRelationId, aggref->aggfnoid, GetUserId(),
3692  ACL_EXECUTE);
3693  if (aclresult != ACLCHECK_OK)
3694  aclcheck_error(aclresult, OBJECT_AGGREGATE,
3695  get_func_name(aggref->aggfnoid));
3696  InvokeFunctionExecuteHook(aggref->aggfnoid);
3697 
3698  /* planner recorded transition state type in the Aggref itself */
3699  aggtranstype = aggref->aggtranstype;
3700  Assert(OidIsValid(aggtranstype));
3701 
3702  /* Final function only required if we're finalizing the aggregates */
3703  if (DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit))
3704  peragg->finalfn_oid = finalfn_oid = InvalidOid;
3705  else
3706  peragg->finalfn_oid = finalfn_oid = aggform->aggfinalfn;
3707 
3708  serialfn_oid = InvalidOid;
3709  deserialfn_oid = InvalidOid;
3710 
3711  /*
3712  * Check if serialization/deserialization is required. We only do it
3713  * for aggregates that have transtype INTERNAL.
3714  */
3715  if (aggtranstype == INTERNALOID)
3716  {
3717  /*
3718  * The planner should only have generated a serialize agg node if
3719  * every aggregate with an INTERNAL state has a serialization
3720  * function. Verify that.
3721  */
3722  if (DO_AGGSPLIT_SERIALIZE(aggstate->aggsplit))
3723  {
3724  /* serialization only valid when not running finalfn */
3725  Assert(DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit));
3726 
3727  if (!OidIsValid(aggform->aggserialfn))
3728  elog(ERROR, "serialfunc not provided for serialization aggregation");
3729  serialfn_oid = aggform->aggserialfn;
3730  }
3731 
3732  /* Likewise for deserialization functions */
3733  if (DO_AGGSPLIT_DESERIALIZE(aggstate->aggsplit))
3734  {
3735  /* deserialization only valid when combining states */
3736  Assert(DO_AGGSPLIT_COMBINE(aggstate->aggsplit));
3737 
3738  if (!OidIsValid(aggform->aggdeserialfn))
3739  elog(ERROR, "deserialfunc not provided for deserialization aggregation");
3740  deserialfn_oid = aggform->aggdeserialfn;
3741  }
3742  }
3743 
3744  /* Check that aggregate owner has permission to call component fns */
3745  {
3746  HeapTuple procTuple;
3747 
3748  procTuple = SearchSysCache1(PROCOID,
3749  ObjectIdGetDatum(aggref->aggfnoid));
3750  if (!HeapTupleIsValid(procTuple))
3751  elog(ERROR, "cache lookup failed for function %u",
3752  aggref->aggfnoid);
3753  aggOwner = ((Form_pg_proc) GETSTRUCT(procTuple))->proowner;
3754  ReleaseSysCache(procTuple);
3755 
3756  if (OidIsValid(finalfn_oid))
3757  {
3758  aclresult = object_aclcheck(ProcedureRelationId, finalfn_oid, aggOwner,
3759  ACL_EXECUTE);
3760  if (aclresult != ACLCHECK_OK)
3761  aclcheck_error(aclresult, OBJECT_FUNCTION,
3762  get_func_name(finalfn_oid));
3763  InvokeFunctionExecuteHook(finalfn_oid);
3764  }
3765  if (OidIsValid(serialfn_oid))
3766  {
3767  aclresult = object_aclcheck(ProcedureRelationId, serialfn_oid, aggOwner,
3768  ACL_EXECUTE);
3769  if (aclresult != ACLCHECK_OK)
3770  aclcheck_error(aclresult, OBJECT_FUNCTION,
3771  get_func_name(serialfn_oid));
3772  InvokeFunctionExecuteHook(serialfn_oid);
3773  }
3774  if (OidIsValid(deserialfn_oid))
3775  {
3776  aclresult = object_aclcheck(ProcedureRelationId, deserialfn_oid, aggOwner,
3777  ACL_EXECUTE);
3778  if (aclresult != ACLCHECK_OK)
3779  aclcheck_error(aclresult, OBJECT_FUNCTION,
3780  get_func_name(deserialfn_oid));
3781  InvokeFunctionExecuteHook(deserialfn_oid);
3782  }
3783  }
3784 
3785  /*
3786  * Get actual datatypes of the (nominal) aggregate inputs. These
3787  * could be different from the agg's declared input types, when the
3788  * agg accepts ANY or a polymorphic type.
3789  */
3790  numAggTransFnArgs = get_aggregate_argtypes(aggref,
3791  aggTransFnInputTypes);
3792 
3793  /* Count the "direct" arguments, if any */
3794  numDirectArgs = list_length(aggref->aggdirectargs);
3795 
3796  /* Detect how many arguments to pass to the finalfn */
3797  if (aggform->aggfinalextra)
3798  peragg->numFinalArgs = numAggTransFnArgs + 1;
3799  else
3800  peragg->numFinalArgs = numDirectArgs + 1;
3801 
3802  /* Initialize any direct-argument expressions */
3803  peragg->aggdirectargs = ExecInitExprList(aggref->aggdirectargs,
3804  (PlanState *) aggstate);
3805 
3806  /*
3807  * build expression trees using actual argument & result types for the
3808  * finalfn, if it exists and is required.
3809  */
3810  if (OidIsValid(finalfn_oid))
3811  {
3812  build_aggregate_finalfn_expr(aggTransFnInputTypes,
3813  peragg->numFinalArgs,
3814  aggtranstype,
3815  aggref->aggtype,
3816  aggref->inputcollid,
3817  finalfn_oid,
3818  &finalfnexpr);
3819  fmgr_info(finalfn_oid, &peragg->finalfn);
3820  fmgr_info_set_expr((Node *) finalfnexpr, &peragg->finalfn);
3821  }
3822 
3823  /* get info about the output value's datatype */
3824  get_typlenbyval(aggref->aggtype,
3825  &peragg->resulttypeLen,
3826  &peragg->resulttypeByVal);
3827 
3828  /*
3829  * Build working state for invoking the transition function, if we
3830  * haven't done it already.
3831  */
3832  pertrans = &pertransstates[aggref->aggtransno];
3833  if (pertrans->aggref == NULL)
3834  {
3835  Datum textInitVal;
3836  Datum initValue;
3837  bool initValueIsNull;
3838  Oid transfn_oid;
3839 
3840  /*
3841  * If this aggregation is performing state combines, then instead
3842  * of using the transition function, we'll use the combine
3843  * function.
3844  */
3845  if (DO_AGGSPLIT_COMBINE(aggstate->aggsplit))
3846  {
3847  transfn_oid = aggform->aggcombinefn;
3848 
3849  /* If not set then the planner messed up */
3850  if (!OidIsValid(transfn_oid))
3851  elog(ERROR, "combinefn not set for aggregate function");
3852  }
3853  else
3854  transfn_oid = aggform->aggtransfn;
3855 
3856  aclresult = object_aclcheck(ProcedureRelationId, transfn_oid, aggOwner, ACL_EXECUTE);
3857  if (aclresult != ACLCHECK_OK)
3858  aclcheck_error(aclresult, OBJECT_FUNCTION,
3859  get_func_name(transfn_oid));
3860  InvokeFunctionExecuteHook(transfn_oid);
3861 
3862  /*
3863  * initval is potentially null, so don't try to access it as a
3864  * struct field. Must do it the hard way with SysCacheGetAttr.
3865  */
3866  textInitVal = SysCacheGetAttr(AGGFNOID, aggTuple,
3867  Anum_pg_aggregate_agginitval,
3868  &initValueIsNull);
3869  if (initValueIsNull)
3870  initValue = (Datum) 0;
3871  else
3872  initValue = GetAggInitVal(textInitVal, aggtranstype);
3873 
3874  if (DO_AGGSPLIT_COMBINE(aggstate->aggsplit))
3875  {
3876  Oid combineFnInputTypes[] = {aggtranstype,
3877  aggtranstype};
3878 
3879  /*
3880  * When combining there's only one input, the to-be-combined
3881  * transition value. The transition value is not counted
3882  * here.
3883  */
3884  pertrans->numTransInputs = 1;
3885 
3886  /* aggcombinefn always has two arguments of aggtranstype */
3887  build_pertrans_for_aggref(pertrans, aggstate, estate,
3888  aggref, transfn_oid, aggtranstype,
3889  serialfn_oid, deserialfn_oid,
3890  initValue, initValueIsNull,
3891  combineFnInputTypes, 2);
3892 
3893  /*
3894  * Ensure that a combine function to combine INTERNAL states
3895  * is not strict. This should have been checked during CREATE
3896  * AGGREGATE, but the strict property could have been changed
3897  * since then.
3898  */
3899  if (pertrans->transfn.fn_strict && aggtranstype == INTERNALOID)
3900  ereport(ERROR,
3901  (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
3902  errmsg("combine function with transition type %s must not be declared STRICT",
3903  format_type_be(aggtranstype))));
3904  }
3905  else
3906  {
3907  /* Detect how many arguments to pass to the transfn */
3908  if (AGGKIND_IS_ORDERED_SET(aggref->aggkind))
3909  pertrans->numTransInputs = list_length(aggref->args);
3910  else
3911  pertrans->numTransInputs = numAggTransFnArgs;
3912 
3913  build_pertrans_for_aggref(pertrans, aggstate, estate,
3914  aggref, transfn_oid, aggtranstype,
3915  serialfn_oid, deserialfn_oid,
3916  initValue, initValueIsNull,
3917  aggTransFnInputTypes,
3918  numAggTransFnArgs);
3919 
3920  /*
3921  * If the transfn is strict and the initval is NULL, make sure
3922  * input type and transtype are the same (or at least
3923  * binary-compatible), so that it's OK to use the first
3924  * aggregated input value as the initial transValue. This
3925  * should have been checked at agg definition time, but we
3926  * must check again in case the transfn's strictness property
3927  * has been changed.
3928  */
3929  if (pertrans->transfn.fn_strict && pertrans->initValueIsNull)
3930  {
3931  if (numAggTransFnArgs <= numDirectArgs ||
3932  !IsBinaryCoercible(aggTransFnInputTypes[numDirectArgs],
3933  aggtranstype))
3934  ereport(ERROR,
3935  (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
3936  errmsg("aggregate %u needs to have compatible input type and transition type",
3937  aggref->aggfnoid)));
3938  }
3939  }
3940  }
3941  else
3942  pertrans->aggshared = true;
3943  ReleaseSysCache(aggTuple);
3944  }
3945 
3946  /*
3947  * Update aggstate->numaggs to be the number of unique aggregates found.
3948  * Also set numtrans to the number of unique transition states found.
3949  */
3950  aggstate->numaggs = numaggs;
3951  aggstate->numtrans = numtrans;
3952 
3953  /*
3954  * Last, check whether any more aggregates got added onto the node while
3955  * we processed the expressions for the aggregate arguments (including not
3956  * only the regular arguments and FILTER expressions handled immediately
3957  * above, but any direct arguments we might've handled earlier). If so,
3958  * we have nested aggregate functions, which is semantically nonsensical,
3959  * so complain. (This should have been caught by the parser, so we don't
3960  * need to work hard on a helpful error message; but we defend against it
3961  * here anyway, just to be sure.)
3962  */
3963  if (numaggrefs != list_length(aggstate->aggs))
3964  ereport(ERROR,
3965  (errcode(ERRCODE_GROUPING_ERROR),
3966  errmsg("aggregate function calls cannot be nested")));
3967 
3968  /*
3969  * Build expressions doing all the transition work at once. We build a
3970  * different one for each phase, as the number of transition function
3971  * invocations can differ between phases. Note this'll work both for
3972  * transition and combination functions (although there'll only be one
3973  * phase in the latter case).
3974  */
3975  for (phaseidx = 0; phaseidx < aggstate->numphases; phaseidx++)
3976  {
3977  AggStatePerPhase phase = &aggstate->phases[phaseidx];
3978  bool dohash = false;
3979  bool dosort = false;
3980 
3981  /* phase 0 doesn't necessarily exist */
3982  if (!phase->aggnode)
3983  continue;
3984 
3985  if (aggstate->aggstrategy == AGG_MIXED && phaseidx == 1)
3986  {
3987  /*
3988  * Phase one, and only phase one, in a mixed agg performs both
3989  * sort-based and hash-based aggregation.
3990  */
3991  dohash = true;
3992  dosort = true;
3993  }
3994  else if (aggstate->aggstrategy == AGG_MIXED && phaseidx == 0)
3995  {
3996  /*
3997  * No need to compute a transition function for an AGG_MIXED phase
3998  * 0 - the contents of the hashtables will have been computed
3999  * during phase 1.
4000  */
4001  continue;
4002  }
4003  else if (phase->aggstrategy == AGG_PLAIN ||
4004  phase->aggstrategy == AGG_SORTED)
4005  {
4006  dohash = false;
4007  dosort = true;
4008  }
4009  else if (phase->aggstrategy == AGG_HASHED)
4010  {
4011  dohash = true;
4012  dosort = false;
4013  }
4014  else
4015  Assert(false);
4016 
4017  phase->evaltrans = ExecBuildAggTrans(aggstate, phase, dosort, dohash,
4018  false);
4019 
4020  /* cache compiled expression for outer slot without NULL check */
4021  phase->evaltrans_cache[0][0] = phase->evaltrans;
4022  }
4023 
4024  return aggstate;
4025 }
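/*
 * Illustration only (not part of nodeAgg.c): a minimal standalone sketch of
 * how the per-phase loop at the end of ExecInitAgg chooses the dosort/dohash
 * flags it hands to ExecBuildAggTrans. The Sketch* types and
 * sketch_phase_plan() are hypothetical names invented for this example. The
 * real code works on AggState and AggStatePerPhase, and also skips phases
 * whose aggnode is NULL, which this model leaves out.
 */

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the executor's AggStrategy values. */
typedef enum
{
	SK_PLAIN,
	SK_SORTED,
	SK_HASHED,
	SK_MIXED
} SketchStrategy;

typedef struct
{
	bool		build;		/* false: no transition expression for this phase */
	bool		dosort;
	bool		dohash;
} SketchPhasePlan;

/*
 * Mirror the if/else chain in the loop: node_strategy is the strategy of the
 * whole Agg node, phase_strategy is the strategy of the individual phase.
 */
static SketchPhasePlan
sketch_phase_plan(SketchStrategy node_strategy,
				  SketchStrategy phase_strategy,
				  int phaseidx)
{
	SketchPhasePlan plan = {true, false, false};

	if (node_strategy == SK_MIXED && phaseidx == 1)
	{
		/* phase 1 of a mixed agg feeds both sorted groups and hash tables */
		plan.dosort = true;
		plan.dohash = true;
	}
	else if (node_strategy == SK_MIXED && phaseidx == 0)
		plan.build = false;		/* hash tables were filled during phase 1 */
	else if (phase_strategy == SK_PLAIN || phase_strategy == SK_SORTED)
		plan.dosort = true;
	else if (phase_strategy == SK_HASHED)
		plan.dohash = true;
	else
		assert(false);			/* no other combination is expected */

	return plan;
}
```

/*
 * Under these assumptions, only phase 1 of a mixed aggregate gets both flags
 * set; later phases of the same node fall through to the plain/sorted branch.
 */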
4026 
4027 /*
4028  * Build the state needed to calculate a state value for an aggregate.
4029  *
4030  * This initializes all the fields in 'pertrans'. 'aggref' is the aggregate
4031  * to initialize the state for. 'transfn_oid', 'aggtranstype', and the rest
4032  * of the arguments could be calculated from 'aggref', but the caller has
4033  * calculated them already, so might as well pass them.
4034  *
4035  * 'transfn_oid' may be either the Oid of the aggtransfn or the aggcombinefn.
4036  */
4037 static void
4038 build_pertrans_for_aggref(AggStatePerTrans pertrans,
4039  AggState *aggstate, EState *estate,
4040  Aggref *aggref,
4041  Oid transfn_oid, Oid aggtranstype,
4042  Oid aggserialfn, Oid aggdeserialfn,
4043  Datum initValue, bool initValueIsNull,
4044  Oid *inputTypes, int numArguments)
4045 {
4046  int numGroupingSets = Max(aggstate->maxsets, 1);
4047  Expr *transfnexpr;
4048  int numTransArgs;
4049  Expr *serialfnexpr = NULL;
4050  Expr *deserialfnexpr = NULL;
4051  ListCell *lc;
4052  int numInputs;
4053  int numDirectArgs;
4054  List *sortlist;
4055  int numSortCols;
4056  int numDistinctCols;
4057  int i;
4058 
4059  /* Begin filling in the pertrans data */
4060  pertrans->aggref = aggref;
4061  pertrans->aggshared = false;
4062  pertrans->aggCollation = aggref->inputcollid;
4063  pertrans->transfn_oid = transfn_oid;
4064  pertrans->serialfn_oid = aggserialfn;
4065  pertrans->deserialfn_oid = aggdeserialfn;
4066  pertrans->initValue = initValue;
4067  pertrans->initValueIsNull = initValueIsNull;
4068 
4069  /* Count the "direct" arguments, if any */
4070  numDirectArgs = list_length(aggref->aggdirectargs);
4071 
4072  /* Count the number of aggregated input columns */
4073  pertrans->numInputs = numInputs = list_length(aggref->args);
4074 
4075  pertrans->aggtranstype = aggtranstype;
4076 
4077  /* account for the current transition state */
4078  numTransArgs = pertrans->numTransInputs + 1;
4079 
4080  /*
4081  * Set up infrastructure for calling the transfn. Note that invtransfn is
4082  * not needed here.
4083  */
4084  build_aggregate_transfn_expr(inputTypes,
4085  numArguments,
4086  numDirectArgs,
4087  aggref->aggvariadic,
4088  aggtranstype,
4089  aggref->inputcollid,
4090  transfn_oid,
4091  InvalidOid,
4092  &transfnexpr,
4093  NULL);
4094 
4095  fmgr_info(transfn_oid, &pertrans->transfn);
4096  fmgr_info_set_expr((Node *) transfnexpr, &pertrans->transfn);
4097 
4098  pertrans->transfn_fcinfo =
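4099  (FunctionCallInfo) palloc(SizeForFunctionCallInfo(numTransArgs));
4100  InitFunctionCallInfoData(*pertrans->transfn_fcinfo,
4101  &pertrans->transfn,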
4102  numTransArgs,
4103  pertrans->aggCollation,
4104  (void *) aggstate, NULL);
4105 
4106  /* get info about the state value's datatype */
4107  get_typlenbyval(aggtranstype,
4108  &pertrans->transtypeLen,
4109  &pertrans->transtypeByVal);
4110 
4111  if (OidIsValid(aggserialfn))
4112  {
4113  build_aggregate_serialfn_expr(aggserialfn,
4114  &serialfnexpr);
4115  fmgr_info(aggserialfn, &pertrans->serialfn);
4116  fmgr_info_set_expr((Node *) serialfnexpr, &pertrans->serialfn);
4117 
4118  pertrans->serialfn_fcinfo =
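4119  (FunctionCallInfo) palloc(SizeForFunctionCallInfo(1));
4120  InitFunctionCallInfoData(*pertrans->serialfn_fcinfo,
4121  &pertrans->serialfn,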
4122  1,
4123  InvalidOid,
4124  (void *) aggstate, NULL);
4125  }
4126 
4127  if (OidIsValid(aggdeserialfn))
4128  {
4129  build_aggregate_deserialfn_expr(aggdeserialfn,
4130  &deserialfnexpr);
4131  fmgr_info(aggdeserialfn, &pertrans->deserialfn);
4132  fmgr_info_set_expr((Node *) deserialfnexpr, &pertrans->deserialfn);
4133 
4134  pertrans->deserialfn_fcinfo =
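4135  (FunctionCallInfo) palloc(SizeForFunctionCallInfo(2));
4136  InitFunctionCallInfoData(*pertrans->deserialfn_fcinfo,
4137  &pertrans->deserialfn,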
4138  2,
4139  InvalidOid,
4140  (void *) aggstate, NULL);
4141  }
4142 
4143  /*
4144  * If we're doing either DISTINCT or ORDER BY for a plain agg, then we
4145  * have a list of SortGroupClause nodes; fish out the data in them and
4146  * stick them into arrays. We ignore ORDER BY for an ordered-set agg,
4147  * however; the agg's transfn and finalfn are responsible for that.
4148  *
4149  * When the planner has set the aggpresorted flag, the input to the
4150  * aggregate is already correctly sorted. For ORDER BY aggregates we can
4151  * simply treat these as normal aggregates. For presorted DISTINCT
4152  * aggregates an extra step must be added to remove duplicate consecutive
4153  * inputs.
4154  *
4155  * Note that by construction, if there is a DISTINCT clause then the ORDER
4156  * BY clause is a prefix of it (see transformDistinctClause).
4157  */
4158  if (AGGKIND_IS_ORDERED_SET(aggref->aggkind))
4159  {
4160  sortlist = NIL;
4161  numSortCols = numDistinctCols = 0;
4162  pertrans->aggsortrequired = false;
4163  }
4164  else if (aggref->aggpresorted && aggref->aggdistinct == NIL)
4165  {
4166  sortlist = NIL;
4167  numSortCols = numDistinctCols = 0;
4168  pertrans->aggsortrequired = false;
4169  }
4170  else if (aggref->aggdistinct)
4171  {
4172  sortlist = aggref->aggdistinct;
4173  numSortCols = numDistinctCols = list_length(sortlist);
4174  Assert(numSortCols >= list_length(aggref->aggorder));
4175  pertrans->aggsortrequired = !aggref->aggpresorted;
4176  }
4177  else
4178  {
4179  sortlist = aggref->aggorder;
4180  numSortCols = list_length(sortlist);
4181  numDistinctCols = 0;
4182  pertrans->aggsortrequired = (numSortCols > 0);
4183  }
4184 
4185  pertrans->numSortCols = numSortCols;
4186  pertrans->numDistinctCols = numDistinctCols;
4187 
4188  /*
4189  * If we have either sorting or filtering to do, create a tupledesc and
4190  * slot corresponding to the aggregated inputs (including sort
4191  * expressions) of the agg.
4192  */
4193  if (numSortCols > 0 || aggref->aggfilter)
4194  {
4195  pertrans->sortdesc = ExecTypeFromTL(aggref->args);
4196  pertrans->sortslot =
4197  ExecInitExtraTupleSlot(estate, pertrans->sortdesc,
4198  &TTSOpsMinimalTuple);
4199  }
4200 
4201  if (numSortCols > 0)
4202  {
4203  /*
4204  * We don't implement DISTINCT or ORDER BY aggs in the HASHED case
4205  * (yet)
4206  */
4207  Assert(aggstate->aggstrategy != AGG_HASHED && aggstate->aggstrategy != AGG_MIXED);
4208 
4209  /* ORDER BY aggregates are not supported with partial aggregation */
4210  Assert(!DO_AGGSPLIT_COMBINE(aggstate->aggsplit));
4211 
4212  /* If we have only one input, we need its len/byval info. */
4213  if (numInputs == 1)
4214  {
4215  get_typlenbyval(inputTypes[numDirectArgs],
4216  &pertrans->inputtypeLen,
4217  &pertrans->inputtypeByVal);
4218  }
4219  else if (numDistinctCols > 0)
4220  {
4221  /* we will need an extra slot to store prior values */
4222  pertrans->uniqslot =
4223  ExecInitExtraTupleSlot(estate, pertrans->sortdesc,
4224  &TTSOpsMinimalTuple);
4225  }
4226 
4227  /* Extract the sort information for use later */
4228  pertrans->sortColIdx =
4229  (AttrNumber *) palloc(numSortCols * sizeof(AttrNumber));
4230  pertrans->sortOperators =
4231  (Oid *) palloc(numSortCols * sizeof(Oid));
4232  pertrans->sortCollations =
4233  (Oid *) palloc(numSortCols * sizeof(Oid));
4234  pertrans->sortNullsFirst =
4235  (bool *) palloc(numSortCols * sizeof(bool));
4236 
4237  i = 0;
4238  foreach(lc, sortlist)
4239  {
4240  SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
4241  TargetEntry *tle = get_sortgroupclause_tle(sortcl, aggref->args);
4242 
4243  /* the parser should have made sure of this */
4244  Assert(OidIsValid(sortcl->sortop));
4245 
4246  pertrans->sortColIdx[i] = tle->resno;
4247  pertrans->sortOperators[i] = sortcl->sortop;
4248  pertrans->sortCollations[i] = exprCollation((Node *) tle->expr);
4249  pertrans->sortNullsFirst[i] = sortcl->nulls_first;
4250  i++;
4251  }
4252  Assert(i == numSortCols);
4253  }
4254 
4255  if (aggref->aggdistinct)
4256  {
4257  Oid *ops;
4258 
4259  Assert(numArguments > 0);
4260  Assert(list_length(aggref->aggdistinct) == numDistinctCols);
4261 
4262  ops = palloc(numDistinctCols * sizeof(Oid));
4263 
4264  i = 0;
4265  foreach(lc, aggref->aggdistinct)
4266  ops[i++] = ((SortGroupClause *) lfirst(lc))->eqop;
4267 
4268  /* lookup / build the necessary comparators */
4269  if (numDistinctCols == 1)
4270  fmgr_info(get_opcode(ops[0]), &pertrans->equalfnOne);
4271  else
4272  pertrans->equalfnMulti =
4273  execTuplesMatchPrepare(pertrans->sortdesc,
4274  numDistinctCols,
4275  pertrans->sortColIdx,
4276  ops,
4277  pertrans->sortCollations,
4278  &aggstate->ss.ps);
4279  pfree(ops);
4280  }
4281 
4282  pertrans->sortstates = (Tuplesortstate **)
4283  palloc0(sizeof(Tuplesortstate *) * numGroupingSets);
4284 }
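/*
 * Illustration only (not part of nodeAgg.c): a minimal standalone sketch of
 * the four-way decision build_pertrans_for_aggref makes about sorting and
 * duplicate elimination. SketchAggref, SketchSortSetup and
 * sketch_sort_setup() are hypothetical names; list lengths are modeled as
 * plain integers instead of List pointers.
 */

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
	bool		ordered_set;	/* models AGGKIND_IS_ORDERED_SET(aggkind) */
	bool		presorted;		/* models aggref->aggpresorted */
	int			ndistinct;		/* models list_length(aggref->aggdistinct) */
	int			norder;			/* models list_length(aggref->aggorder) */
} SketchAggref;

typedef struct
{
	int			numSortCols;
	int			numDistinctCols;
	bool		sortRequired;	/* models pertrans->aggsortrequired */
} SketchSortSetup;

static SketchSortSetup
sketch_sort_setup(const SketchAggref *a)
{
	SketchSortSetup s = {0, 0, false};

	/* ordered-set aggs handle their own ordering in transfn/finalfn */
	if (a->ordered_set)
		return s;

	/* presorted ORDER BY input: treat like a normal aggregate */
	if (a->presorted && a->ndistinct == 0)
		return s;

	if (a->ndistinct > 0)
	{
		/* ORDER BY is a prefix of DISTINCT, so DISTINCT is at least as long */
		assert(a->ndistinct >= a->norder);
		s.numSortCols = s.numDistinctCols = a->ndistinct;
		/* presorted DISTINCT still needs de-duplication, but no sort */
		s.sortRequired = !a->presorted;
		return s;
	}

	s.numSortCols = a->norder;
	s.sortRequired = (a->norder > 0);
	return s;
}
```

/*
 * Note the asymmetry this makes visible: a presorted DISTINCT aggregate keeps
 * its distinct columns (for the consecutive-duplicate check) while dropping
 * the sort, whereas a presorted ORDER BY aggregate drops both.
 */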
4285 
4286 
4287 static Datum
4288 GetAggInitVal(Datum textInitVal, Oid transtype)
4289 {
4290  Oid typinput,
4291  typioparam;
4292  char *strInitVal;
4293  Datum initVal;
4294 
4295  getTypeInputInfo(transtype, &typinput, &typioparam);
4296  strInitVal = TextDatumGetCString(textInitVal);
4297  initVal = OidInputFunctionCall(typinput, strInitVal,
4298  typioparam, -1);
4299  pfree(strInitVal);
4300  return initVal;
4301 }
4302 
4303 void
4304 ExecEndAgg(AggState *node)
4305 {
4306  PlanState *outerPlan;
4307  int transno;
4308  int numGroupingSets = Max(node->maxsets, 1);
4309  int setno;
4310 
4311  /*
4312  * When ending a parallel worker, copy the statistics gathered by the
4313  * worker back into shared memory so that it can be picked up by the main
4314  * process to report in EXPLAIN ANALYZE.
4315  */
4316  if (node->shared_info && IsParallelWorker())
4317  {
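4318  AggregateInstrumentation *si;
4319 
4320  Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
4321  si = &node->shared_info->sinstrument[ParallelWorkerNumber];
4322  si->hash_batches_used = node->hash_batches_used;
4323  si->hash_disk_used = node->hash_disk_used;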
4324  si->hash_mem_peak = node->hash_mem_peak;
4325  }
4326 
4327  /* Make sure we have closed any open tuplesorts */
4328 
4329  if (node->sort_in)
4330  tuplesort_end(node->sort_in);
4331  if (node->sort_out)
4332  tuplesort_end(node->sort_out);
4333 
4334  hashagg_reset_spill_state(node);
4335 
4336  if (node->hash_metacxt != NULL)
4337  {
4338  MemoryContextDelete(node->hash_metacxt);
4339  node->hash_metacxt = NULL;
4340  }
4341 
4342  for (transno = 0; transno < node->numtrans; transno++)
4343  {
4344  AggStatePerTrans pertrans = &node->pertrans[transno];
4345 
4346  for (setno = 0; setno < numGroupingSets; setno++)
4347  {
4348  if (pertrans->sortstates[setno])
4349  tuplesort_end(pertrans->sortstates[setno]);
4350  }
4351  }
4352 
4353  /* And ensure any agg shutdown callbacks have been called */
4354  for (setno = 0; setno < numGroupingSets; setno++)
4355  ReScanExprContext(node->aggcontexts[setno]);
4356  if (node->hashcontext)
4357  ReScanExprContext(node->hashcontext);
4358 
4359  outerPlan = outerPlanState(node);
4360  ExecEndNode(outerPlan);
4361 }
4362 
4363 void
4364 ExecReScanAgg(AggState *node)
4365 {
4366  ExprContext *econtext = node->ss.ps.ps_ExprContext;
4367  PlanState *outerPlan = outerPlanState(node);
4368  Agg *aggnode = (Agg *) node->ss.ps.plan;
4369  int transno;
4370  int numGroupingSets = Max(node->maxsets, 1);
4371  int setno;
4372 
4373  node->agg_done = false;
4374 
4375  if (node->aggstrategy == AGG_HASHED)
4376  {
4377  /*
4378  * In the hashed case, if we haven't yet built the hash table then we
4379  * can just return; nothing done yet, so nothing to undo. If subnode's
4380  * chgParam is not NULL then it will be re-scanned by ExecProcNode,
4381  * else no reason to re-scan it at all.
4382  */
4383  if (!node->table_filled)
4384  return;
4385 
4386  /*
4387  * If we do have the hash table, and it never spilled, and the subplan
4388  * does not have any parameter changes, and none of our own parameter
4389  * changes affect input expressions of the aggregated functions, then
4390  * we can just rescan the existing hash table; no need to build it
4391  * again.
4392  */
4393  if (outerPlan->chgParam == NULL && !node->hash_ever_spilled &&
4394  !bms_overlap(node->ss.ps.chgParam, aggnode->aggParams))
4395  {
4396  ResetTupleHashIterator(node->perhash[0].hashtable,
4397  &node->perhash[0].hashiter);
4398  select_current_set(node, 0, true);
4399  return;
4400  }
4401  }
4402 
4403  /* Make sure we have closed any open tuplesorts */
4404  for (transno = 0; transno < node->numtrans; transno++)
4405  {
4406  for (setno = 0; setno < numGroupingSets; setno++)
4407  {
4408  AggStatePerTrans pertrans = &node->pertrans[transno];
4409 
4410  if (pertrans->sortstates[setno])
4411  {
4412  tuplesort_end(pertrans->sortstates[setno]);
4413  pertrans->sortstates[setno] = NULL;
4414  }
4415  }
4416  }
4417 
4418  /*
4419  * We don't need to ReScanExprContext the output tuple context here;
4420  * ExecReScan already did it. But we do need to reset our per-grouping-set
4421  * contexts, which may have transvalues stored in them. (We use rescan
4422  * rather than just reset because transfns may have registered callbacks
4423  * that need to be run now.) For the AGG_HASHED case, see below.
4424  */
4425 
4426  for (setno = 0; setno < numGroupingSets; setno++)
4427  {
4428  ReScanExprContext(node->aggcontexts[setno]);
4429  }
4430 
4431  /* Release first tuple of group, if we have made a copy */
4432  if (node->grp_firstTuple != NULL)
4433  {
4434  heap_freetuple(node->grp_firstTuple);
4435  node->grp_firstTuple = NULL;
4436  }
4437  ExecClearTuple(node->ss.ss_ScanTupleSlot);
4438 
4439  /* Forget current agg values */
4440  MemSet(econtext->ecxt_aggvalues, 0, sizeof(Datum) * node->numaggs);
4441  MemSet(econtext->ecxt_aggnulls, 0, sizeof(bool) * node->numaggs);
4442 
4443  /*
4444  * With AGG_HASHED/MIXED, the hash table is allocated in a sub-context of
4445  * the hashcontext. This used to be an issue, but now, resetting a context
4446  * automatically deletes sub-contexts too.
4447  */
4448  if (node->aggstrategy == AGG_HASHED || node->aggstrategy == AGG_MIXED)
4449  {
4450  hashagg_reset_spill_state(node);
4451 
4452  node->hash_ever_spilled = false;
4453  node->hash_spill_mode = false;
4454  node->hash_ngroups_current = 0;
4455 
4456  ReScanExprContext(node->hashcontext);
4457  /* Rebuild an empty hash table */
4458  build_hash_tables(node);
4459  node->table_filled = false;
4460  /* iterator will be reset when the table is filled */
4461 
4462  hashagg_recompile_expressions(node, false, false);
4463  }
4464 
4465  if (node->aggstrategy != AGG_HASHED)
4466  {
4467  /*
4468  * Reset the per-group state (in particular, mark transvalues null)
4469  */
4470  for (setno = 0; setno < numGroupingSets; setno++)
4471  {
4472  MemSet(node->pergroups[setno], 0,
4473  sizeof(AggStatePerGroupData) * node->numaggs);
4474  }
4475 
4476  /* reset to phase 1 */
4477  initialize_phase(node, 1);
4478 
4479  node->input_done = false;
4480  node->projected_set = -1;
4481  }
4482 
4483  if (outerPlan->chgParam == NULL)
4484  ExecReScan(outerPlan);
4485 }
4486 
4487 
4488 /***********************************************************************
4489  * API exposed to aggregate functions
4490  ***********************************************************************/
4491 
4492 
4493 /*
4494  * AggCheckCallContext - test if a SQL function is being called as an aggregate
4495  *
4496  * The transition and/or final functions of an aggregate may want to verify
4497  * that they are being called as aggregates, rather than as plain SQL
4498  * functions. They should use this function to do so. The return value
4499  * is nonzero if being called as an aggregate, or zero if not. (Specific
4500  * nonzero values are AGG_CONTEXT_AGGREGATE or AGG_CONTEXT_WINDOW, but more
4501  * values could conceivably appear in future.)
4502  *
4503  * If aggcontext isn't NULL, the function also stores at *aggcontext the
4504  * identity of the memory context that aggregate transition values are being
4505  * stored in. Note that the same aggregate call site (flinfo) may be called
4506  * interleaved on different transition values in different contexts, so it's
4507  * not kosher to cache aggcontext under fn_extra. It is, however, kosher to
4508  * cache it in the transvalue itself (for internal-type transvalues).
4509  */
4510 int
4511 AggCheckCallContext(FunctionCallInfo fcinfo, MemoryContext *aggcontext)
4512 {
4513  if (fcinfo->context && IsA(fcinfo->context, AggState))
4514  {
4515  if (aggcontext)
4516  {
4517  AggState *aggstate = ((AggState *) fcinfo->context);
4518  ExprContext *cxt = aggstate->curaggcontext;
4519 
4520  *aggcontext = cxt->ecxt_per_tuple_memory;
4521  }
4522  return AGG_CONTEXT_AGGREGATE;
4523  }
4524  if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
4525  {
4526  if (aggcontext)
4527  *aggcontext = ((WindowAggState *) fcinfo->context)->curaggcontext;
4528  return AGG_CONTEXT_WINDOW;
4529  }
4530 
4531  /* this is just to prevent "uninitialized variable" warnings */
4532  if (aggcontext)
4533  *aggcontext = NULL;
4534  return 0;
4535 }
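/*
 * Illustration only (not part of nodeAgg.c): a minimal standalone model of
 * the dispatch AggCheckCallContext performs on the caller-supplied context
 * node. All Sketch* and SK_* names are hypothetical, and the numeric return
 * codes are placeholders rather than the real AGG_CONTEXT_* values. The model
 * also flattens one level: the real aggregate branch goes through an
 * ExprContext to reach the memory context, while here curaggcontext is an
 * opaque pointer.
 */

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
	SK_T_Other,
	SK_T_AggState,
	SK_T_WindowAggState
} SketchNodeTag;

/* Every node starts with its tag, so a node pointer can be inspected safely. */
typedef struct
{
	SketchNodeTag type;
} SketchNode;

typedef struct
{
	SketchNode	node;
	void	   *curaggcontext;
} SketchAggState;

typedef struct
{
	SketchNode	node;
	void	   *curaggcontext;
} SketchWindowAggState;

#define SK_CONTEXT_AGGREGATE 1	/* placeholder value */
#define SK_CONTEXT_WINDOW    2	/* placeholder value */

/*
 * Return nonzero if 'context' identifies an aggregate or window-aggregate
 * caller. If 'aggcontext' is non-NULL it is always written, with NULL when
 * the caller is neither, so callers never read an uninitialized pointer.
 */
static int
sketch_check_call_context(const SketchNode *context, void **aggcontext)
{
	if (context && context->type == SK_T_AggState)
	{
		if (aggcontext)
			*aggcontext = ((const SketchAggState *) context)->curaggcontext;
		return SK_CONTEXT_AGGREGATE;
	}
	if (context && context->type == SK_T_WindowAggState)
	{
		if (aggcontext)
			*aggcontext = ((const SketchWindowAggState *) context)->curaggcontext;
		return SK_CONTEXT_WINDOW;
	}

	if (aggcontext)
		*aggcontext = NULL;
	return 0;
}
```

/*
 * A transition function would typically treat a zero result as "called as a
 * plain function" and raise an error rather than touch aggregate state.
 */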
4536 
4537 /*
4538  * AggGetAggref - allow an aggregate support function to get its Aggref
4539  *
4540  * If the function is being called as an aggregate support function,
4541  * return the Aggref node for the aggregate call. Otherwise, return NULL.
4542  *
4543  * Aggregates sharing the same inputs and transition functions can get
4544  * merged into a single transition calculation. If the transition function
4545  * calls AggGetAggref, it will get one of the Aggrefs for which it is
4546  * executing. It must therefore not pay attention to the Aggref fields that
4547  * relate to the final function, as those are indeterminate. But if a final
4548  * function calls AggGetAggref, it will get a precise result.
4549  *
4550  * Note that if an aggregate is being used as a window function, this will
4551  * return NULL. We could provide a similar function to return the relevant
4552  * WindowFunc node in such cases, but it's not needed yet.
4553  */
4554 Aggref *
4555 AggGetAggref(FunctionCallInfo fcinfo)
4556 {
4557  if (fcinfo->context && IsA(fcinfo->context, AggState))
4558  {
4559  AggState *aggstate = (AggState *) fcinfo->context;
4560  AggStatePerAgg curperagg;
4561  AggStatePerTrans curpertrans;
4562 
4563  /* check curperagg (valid when in a final function) */
4564  curperagg = aggstate->curperagg;
4565 
4566  if (curperagg)
4567  return curperagg->aggref;
4568 
4569  /* check curpertrans (valid when in a transition function) */
4570  curpertrans = aggstate->curpertrans;
4571 
4572  if (curpertrans)
4573  return curpertrans->aggref;
4574  }
4575  return NULL;
4576 }
4577 
4578 /*
4579  * AggGetTempMemoryContext - fetch short-term memory context for aggregates
4580  *
4581  * This is useful in agg final functions; the context returned is one that
4582  * the final function can safely reset as desired. This isn't useful for
4583  * transition functions, since the context returned MAY (we don't promise)
4584  * be the same as the context those are called in.
4585  *
4586  * As above, this is currently not useful for aggs called as window functions.
4587  */
4588 MemoryContext
4589 AggGetTempMemoryContext(FunctionCallInfo fcinfo)
4590 {
4591  if (fcinfo->context && IsA(fcinfo->context, AggState))
4592  {
4593  AggState *aggstate = (AggState *) fcinfo->context;
4594 
4595  return aggstate->tmpcontext->ecxt_per_tuple_memory;
4596  }
4597  return NULL;
4598 }
4599 
4600 /*
4601  * AggStateIsShared - find out whether transition state is shared
4602  *
4603  * If the function is being called as an aggregate support function,
4604  * return true if the aggregate's transition state is shared across
4605  * multiple aggregates, false if it is not.
4606  *
4607  * Returns true if not called as an aggregate support function.
4608  * This is intended as a conservative answer, i.e., "no, you'd better not
4609  * scribble on your input". In particular, will return true if the
4610  * aggregate is being used as a window function, which is a scenario
4611  * in which changing the transition state is a bad idea. We might
4612  * want to refine the behavior for the window case in future.
4613  */
4614 bool
4615 AggStateIsShared(FunctionCallInfo fcinfo)
4616 {
4617  if (fcinfo->context && IsA(fcinfo->context, AggState))
4618  {
4619  AggState *aggstate = (AggState *) fcinfo->context;
4620  AggStatePerAgg curperagg;
4621  AggStatePerTrans curpertrans;
4622 
4623  /* check curperagg (valid when in a final function) */
4624  curperagg = aggstate->curperagg;
4625 
4626  if (curperagg)
4627  return aggstate->pertrans[curperagg->transno].aggshared;
4628 
4629  /* check curpertrans (valid when in a transition function) */
4630  curpertrans = aggstate->curpertrans;
4631 
4632  if (curpertrans)
4633  return curpertrans->aggshared;
4634  }
4635  return true;
4636 }
4637 
4638 /*
4639  * AggRegisterCallback - register a cleanup callback for an aggregate
4640  *
4641  * This is useful for aggs to register shutdown callbacks, which will ensure
4642  * that non-memory resources are freed. The callback will occur just before
4643  * the associated aggcontext (as returned by AggCheckCallContext) is reset,
4644  * either between groups or as a result of rescanning the query. The callback
4645  * will NOT be called on error paths. The typical use-case is for freeing of
4646  * tuplestores or tuplesorts maintained in aggcontext, or pins held by slots
4647  * created by the agg functions. (The callback will not be called until after
4648  * the result of the finalfn is no longer needed, so it's safe for the finalfn
4649  * to return data that will be freed by the callback.)
4650  *
4651  * As above, this is currently not useful for aggs called as window functions.
4652  */
4653 void
4654 AggRegisterCallback(FunctionCallInfo fcinfo,
4655  ExprContextCallbackFunction func,
4656  Datum arg)
4657 {
4658  if (fcinfo->context && IsA(fcinfo->context, AggState))
4659  {
4660  AggState *aggstate = (AggState *) fcinfo->context;
4661  ExprContext *cxt = aggstate->curaggcontext;
4662 
4663  RegisterExprContextCallback(cxt, func, arg);
4664 
4665  return;
4666  }
4667  elog(ERROR, "aggregate function cannot register a callback in this context");
4668 }
4669 
4670 
4671 /* ----------------------------------------------------------------
4672  * Parallel Query Support
4673  * ----------------------------------------------------------------
4674  */
4675 
4676  /* ----------------------------------------------------------------
4677  * ExecAggEstimate
4678  *
4679  * Estimate space required to propagate aggregate statistics.
4680  * ----------------------------------------------------------------
4681  */
4682 void
4683 ExecAggEstimate(AggState *node, ParallelContext *pcxt)
4684 {
4685  Size size;
4686 
4687  /* don't need this if not instrumenting or no workers */
4688  if (!node->ss.ps.instrument || pcxt->nworkers == 0)
4689  return;
4690 
4691  size = mul_size(pcxt->nworkers, sizeof(AggregateInstrumentation));
4692  size = add_size(size, offsetof(SharedAggInfo, sinstrument));
4693  shm_toc_estimate_chunk(&pcxt->estimator, size);
4694  shm_toc_estimate_keys(&pcxt->estimator, 1);
4695 }
4696 
4697 /* ----------------------------------------------------------------
4698  * ExecAggInitializeDSM
4699  *
4700  * Initialize DSM space for aggregate statistics.
4701  * ----------------------------------------------------------------
4702  */
4703 void
4704 ExecAggInitializeDSM(AggState *node, ParallelContext *pcxt)
4705 {
4706  Size size;
4707 
4708  /* don't need this if not instrumenting or no workers */
4709  if (!node->ss.ps.instrument || pcxt->nworkers == 0)
4710  return;
4711 
4712  size = offsetof(SharedAggInfo, sinstrument)
4713  + pcxt->nworkers * sizeof(AggregateInstrumentation);
4714  node->shared_info = shm_toc_allocate(pcxt->toc, size);
4715  /* ensure any unfilled slots will contain zeroes */
4716  memset(node->shared_info, 0, size);
4717  node->shared_info->num_workers = pcxt->nworkers;
4718  shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
4719  node->shared_info);
4720 }
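/*
 * Illustration only (not part of nodeAgg.c): a minimal standalone sketch of
 * the size calculation that ExecAggEstimate and ExecAggInitializeDSM both
 * perform, a fixed header followed by one instrumentation slot per worker.
 * SketchInstr, SketchShared and sketch_shared_size() are hypothetical names,
 * and the field layout is only a stand-in. Where the real mul_size/add_size
 * helpers raise an error on overflow, this sketch reports failure through
 * its return value instead.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one worker's statistics slot. */
typedef struct
{
	uint64_t	mem_peak;
	uint64_t	disk_used;
	int			batches_used;
} SketchInstr;

/* Header plus a flexible array with one slot per worker. */
typedef struct
{
	int			num_workers;
	SketchInstr sinstrument[];
} SketchShared;

/*
 * Compute offsetof(header) + nworkers * sizeof(slot) without overflowing.
 * Returns false, leaving *result untouched, if the inputs are invalid.
 */
static bool
sketch_shared_size(int nworkers, size_t *result)
{
	size_t		header = offsetof(SketchShared, sinstrument);

	if (nworkers < 0)
		return false;
	if ((size_t) nworkers > (SIZE_MAX - header) / sizeof(SketchInstr))
		return false;

	*result = header + (size_t) nworkers * sizeof(SketchInstr);
	return true;
}
```

/*
 * Using offsetof() for the header rather than sizeof() on the whole struct
 * keeps any trailing padding of the struct out of the total.
 */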
4721 
4722 /* ----------------------------------------------------------------
4723  * ExecAggInitializeWorker
4724  *
4725  * Attach worker to DSM space for aggregate statistics.
4726  * ----------------------------------------------------------------
4727  */
4728 void
4729 ExecAggInitializeWorker(AggState *node, ParallelWorkerContext *pwcxt)
4730 {
4731  node->shared_info =
4732  shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
4733 }
4734 
4735 /* ----------------------------------------------------------------
4736  * ExecAggRetrieveInstrumentation
4737  *
4738  * Transfer aggregate statistics from DSM to private memory.
4739  * ----------------------------------------------------------------
4740  */
4741 void
4742 ExecAggRetrieveInstrumentation(AggState *node)
4743 {
4744  Size size;
4745  SharedAggInfo *si;
4746 
4747  if (node->shared_info == NULL)
4748  return;
4749 
4750  size = offsetof(SharedAggInfo, sinstrument)
4751  + node->shared_info->num_workers * sizeof(AggregateInstrumentation);
4752  si = palloc(size);
4753  memcpy(si, node->shared_info, size);
4754  node->shared_info = si;
4755 }
Definition: execGrouping.c:95
TupleHashEntry LookupTupleHashEntryHash(TupleHashTable hashtable, TupleTableSlot *slot, bool *isnew, uint32 hash)
Definition: execGrouping.c:360
TupleHashEntry LookupTupleHashEntry(TupleHashTable hashtable, TupleTableSlot *slot, bool *isnew, uint32 *hash)
Definition: execGrouping.c:305
TupleHashTable BuildTupleHashTableExt(PlanState *parent, TupleDesc inputDesc, int numCols, AttrNumber *keyColIdx, const Oid *eqfuncoids, FmgrInfo *hashfunctions, Oid *collations, long nbuckets, Size additionalsize, MemoryContext metacxt, MemoryContext tablecxt, MemoryContext tempcxt, bool use_variable_hash_iv)
Definition: execGrouping.c:153
void ResetTupleHashTable(TupleHashTable hashtable)
Definition: execGrouping.c:284
ExprState * execTuplesMatchPrepare(TupleDesc desc, int numCols, const AttrNumber *keyColIdx, const Oid *eqOperators, const Oid *collations, PlanState *parent)
Definition: execGrouping.c:58
void ExecEndNode(PlanState *node)
Definition: execProcnode.c:557
PlanState * ExecInitNode(Plan *node, EState *estate, int eflags)
Definition: execProcnode.c:142
const TupleTableSlotOps TTSOpsVirtual
Definition: execTuples.c:84
TupleTableSlot * ExecStoreVirtualTuple(TupleTableSlot *slot)
Definition: execTuples.c:1639
MinimalTuple ExecFetchSlotMinimalTuple(TupleTableSlot *slot, bool *shouldFree)
Definition: execTuples.c:1779
TupleTableSlot * ExecStoreAllNullTuple(TupleTableSlot *slot)
Definition: execTuples.c:1663
TupleTableSlot * ExecStoreMinimalTuple(MinimalTuple mtup, TupleTableSlot *slot, bool shouldFree)
Definition: execTuples.c:1533
TupleTableSlot * ExecInitExtraTupleSlot(EState *estate, TupleDesc tupledesc, const TupleTableSlotOps *tts_ops)
Definition: execTuples.c:1918
void ExecInitResultTupleSlotTL(PlanState *planstate, const TupleTableSlotOps *tts_ops)
Definition: execTuples.c:1886
const TupleTableSlotOps TTSOpsMinimalTuple
Definition: execTuples.c:86
TupleDesc ExecTypeFromTL(List *targetList)
Definition: execTuples.c:2025
TupleTableSlot * ExecAllocTableSlot(List **tupleTable, TupleDesc desc, const TupleTableSlotOps *tts_ops)
Definition: execTuples.c:1258
void ExecForceStoreHeapTuple(HeapTuple tuple, TupleTableSlot *slot, bool shouldFree)
Definition: execTuples.c:1556
TupleDesc ExecGetResultType(PlanState *planstate)
Definition: execUtils.c:493
void ReScanExprContext(ExprContext *econtext)
Definition: execUtils.c:441
ExprContext * CreateWorkExprContext(EState *estate)
Definition: execUtils.c:319
const TupleTableSlotOps * ExecGetResultSlotOps(PlanState *planstate, bool *isfixed)
Definition: execUtils.c:502
void ExecCreateScanSlotFromOuterPlan(EState *estate, ScanState *scanstate, const TupleTableSlotOps *tts_ops)
Definition: execUtils.c:659
void ExecAssignExprContext(EState *estate, PlanState *planstate)
Definition: execUtils.c:483
void ExecAssignProjectionInfo(PlanState *planstate, TupleDesc inputDesc)
Definition: execUtils.c:538
void RegisterExprContextCallback(ExprContext *econtext, ExprContextCallbackFunction function, Datum arg)
Definition: execUtils.c:897
void(* ExprContextCallbackFunction)(Datum arg)
Definition: execnodes.h:217
#define InstrCountFiltered1(node, delta)
Definition: execnodes.h:1220
#define outerPlanState(node)
Definition: execnodes.h:1212
#define ScanTupleHashTable(htable, iter)
Definition: execnodes.h:848
#define ResetTupleHashIterator(htable, iter)
Definition: execnodes.h:846
struct AggStatePerGroupData * AggStatePerGroup
Definition: execnodes.h:2483
struct AggStatePerTransData * AggStatePerTrans
Definition: execnodes.h:2482
struct TupleHashEntryData TupleHashEntryData
struct AggregateInstrumentation AggregateInstrumentation
struct AggStatePerAggData * AggStatePerAgg
Definition: execnodes.h:2481
#define EXEC_FLAG_BACKWARD
Definition: executor.h:68
#define EXEC_FLAG_REWIND
Definition: executor.h:67
static TupleTableSlot * ExecProject(ProjectionInfo *projInfo)
Definition: executor.h:376
#define ResetExprContext(econtext)
Definition: executor.h:544
static bool ExecQual(ExprState *state, ExprContext *econtext)
Definition: executor.h:413
static bool ExecQualAndReset(ExprState *state, ExprContext *econtext)
Definition: executor.h:440
static Datum ExecEvalExpr(ExprState *state, ExprContext *econtext, bool *isNull)
Definition: executor.h:333
static Datum ExecEvalExprSwitchContext(ExprState *state, ExprContext *econtext, bool *isNull)
Definition: executor.h:348
#define EXEC_FLAG_EXPLAIN_ONLY
Definition: executor.h:65
#define EXEC_FLAG_MARK
Definition: executor.h:69
static TupleTableSlot * ExecProcNode(PlanState *node)
Definition: executor.h:269
#define MakeExpandedObjectReadOnly(d, isnull, typlen)
Datum FunctionCall2Coll(FmgrInfo *flinfo, Oid collation, Datum arg1, Datum arg2)
Definition: fmgr.c:1149
void fmgr_info(Oid functionId, FmgrInfo *finfo)
Definition: fmgr.c:127
Datum OidInputFunctionCall(Oid functionId, char *str, Oid typioparam, int32 typmod)
Definition: fmgr.c:1754
#define SizeForFunctionCallInfo(nargs)
Definition: fmgr.h:102
#define InitFunctionCallInfoData(Fcinfo, Flinfo, Nargs, Collation, Context, Resultinfo)
Definition: fmgr.h:150
#define AGG_CONTEXT_WINDOW
Definition: fmgr.h:762
#define LOCAL_FCINFO(name, nargs)
Definition: fmgr.h:110
#define AGG_CONTEXT_AGGREGATE
Definition: fmgr.h:761
struct FunctionCallInfoBaseData * FunctionCallInfo
Definition: fmgr.h:38
#define FunctionCallInvoke(fcinfo)
Definition: fmgr.h:172
#define fmgr_info_set_expr(expr, finfo)
Definition: fmgr.h:135
char * format_type_be(Oid type_oid)
Definition: format_type.c:343
int work_mem
Definition: globals.c:129
uint32 hash_bytes_uint32(uint32 k)
Definition: hashfn.c:610
for(;;)
void heap_freetuple(HeapTuple htup)
Definition: heaptuple.c:1434
MinimalTupleData * MinimalTuple
Definition: htup.h:27
#define HeapTupleIsValid(tuple)
Definition: htup.h:78
#define SizeofMinimalTupleHeader
Definition: htup_details.h:647
#define GETSTRUCT(TUP)
Definition: htup_details.h:653
void initHyperLogLog(hyperLogLogState *cState, uint8 bwidth)
Definition: hyperloglog.c:66
double estimateHyperLogLog(hyperLogLogState *cState)
Definition: hyperloglog.c:186
void addHyperLogLog(hyperLogLogState *cState, uint32 hash)
Definition: hyperloglog.c:167
void freeHyperLogLog(hyperLogLogState *cState)
Definition: hyperloglog.c:151
#define IsParallelWorker()
Definition: parallel.h:60
static int initValue(long lng_val)
Definition: informix.c:683
int j
Definition: isn.c:74
int i
Definition: isn.c:73
if(TABLE==NULL||TABLE_index==NULL)
Definition: isn.c:77
List * lcons_int(int datum, List *list)
Definition: list.c:513
List * lappend(List *list, void *datum)
Definition: list.c:339
void list_free(List *list)
Definition: list.c:1546
void list_free_deep(List *list)
Definition: list.c:1560
List * list_delete_last(List *list)
Definition: list.c:957
LogicalTape * LogicalTapeCreate(LogicalTapeSet *lts)
Definition: logtape.c:680
void LogicalTapeRewindForRead(LogicalTape *lt, size_t buffer_size)
Definition: logtape.c:846
size_t LogicalTapeRead(LogicalTape *lt, void *ptr, size_t size)
Definition: logtape.c:928
int64 LogicalTapeSetBlocks(LogicalTapeSet *lts)
Definition: logtape.c:1181
void LogicalTapeClose(LogicalTape *lt)
Definition: logtape.c:733
void LogicalTapeSetClose(LogicalTapeSet *lts)
Definition: logtape.c:667
void LogicalTapeWrite(LogicalTape *lt, const void *ptr, size_t size)
Definition: logtape.c:761
LogicalTapeSet * LogicalTapeSetCreate(bool preallocate, SharedFileSet *fileset, int worker)
Definition: logtape.c:556
void get_typlenbyval(Oid typid, int16 *typlen, bool *typbyval)
Definition: lsyscache.c:2251
RegProcedure get_opcode(Oid opno)
Definition: lsyscache.c:1285
void getTypeInputInfo(Oid type, Oid *typInput, Oid *typIOParam)
Definition: lsyscache.c:2874
char * get_func_name(Oid funcid)
Definition: lsyscache.c:1608
void MemoryContextReset(MemoryContext context)
Definition: mcxt.c:383
void pfree(void *pointer)
Definition: mcxt.c:1521
void * palloc0(Size size)
Definition: mcxt.c:1347
void * MemoryContextAlloc(MemoryContext context, Size size)
Definition: mcxt.c:1181
Size MemoryContextMemAllocated(MemoryContext context, bool recurse)
Definition: mcxt.c:762
void MemoryContextDelete(MemoryContext context)
Definition: mcxt.c:454
void * palloc(Size size)
Definition: mcxt.c:1317
#define AllocSetContextCreate
Definition: memutils.h:129
#define ALLOCSET_DEFAULT_SIZES
Definition: memutils.h:160
#define CHECK_FOR_INTERRUPTS()
Definition: miscadmin.h:122
Oid GetUserId(void)
Definition: miscinit.c:514
static void hashagg_finish_initial_spills(AggState *aggstate)
Definition: nodeAgg.c:3059
static long hash_choose_num_buckets(double hashentrysize, long ngroups, Size memory)
Definition: nodeAgg.c:1966
static void hash_agg_check_limits(AggState *aggstate)
Definition: nodeAgg.c:1856
static void initialize_hash_entry(AggState *aggstate, TupleHashTable hashtable, TupleHashEntry entry)
Definition: nodeAgg.c:2045
static void find_hash_columns(AggState *aggstate)
Definition: nodeAgg.c:1563
static bool agg_refill_hash_table(AggState *aggstate)
Definition: nodeAgg.c:2594
static void build_hash_table(AggState *aggstate, int setno, long nbuckets)
Definition: nodeAgg.c:1503
void ExecAggEstimate(AggState *node, ParallelContext *pcxt)
Definition: nodeAgg.c:4683
struct FindColsContext FindColsContext
static void hash_agg_enter_spill_mode(AggState *aggstate)
Definition: nodeAgg.c:1882
struct HashAggBatch HashAggBatch
static Datum GetAggInitVal(Datum textInitVal, Oid transtype)
Definition: nodeAgg.c:4288
static void find_cols(AggState *aggstate, Bitmapset **aggregated, Bitmapset **unaggregated)
Definition: nodeAgg.c:1397
void AggRegisterCallback(FunctionCallInfo fcinfo, ExprContextCallbackFunction func, Datum arg)
Definition: nodeAgg.c:4654
#define HASHAGG_HLL_BIT_WIDTH
Definition: nodeAgg.c:314
static void agg_fill_hash_table(AggState *aggstate)
Definition: nodeAgg.c:2540
static void initialize_aggregate(AggState *aggstate, AggStatePerTrans pertrans, AggStatePerGroup pergroupstate)
Definition: nodeAgg.c:578
static TupleTableSlot * fetch_input_tuple(AggState *aggstate)
Definition: nodeAgg.c:547
static void hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
Definition: nodeAgg.c:3093
static bool find_cols_walker(Node *node, FindColsContext *context)
Definition: nodeAgg.c:1420
void ExecAggInitializeWorker(AggState *node, ParallelWorkerContext *pwcxt)
Definition: nodeAgg.c:4729
AggState * ExecInitAgg(Agg *node, EState *estate, int eflags)
Definition: nodeAgg.c:3173
void ExecAggRetrieveInstrumentation(AggState *node)
Definition: nodeAgg.c:4742
static TupleTableSlot * ExecAgg(PlanState *pstate)
Definition: nodeAgg.c:2158
static TupleTableSlot * project_aggregates(AggState *aggstate)
Definition: nodeAgg.c:1371
static MinimalTuple hashagg_batch_read(HashAggBatch *batch, uint32 *hashp)
Definition: nodeAgg.c:3010
struct HashAggSpill HashAggSpill
static void process_ordered_aggregate_multi(AggState *aggstate, AggStatePerTrans pertrans, AggStatePerGroup pergroupstate)
Definition: nodeAgg.c:949
void ExecReScanAgg(AggState *node)
Definition: nodeAgg.c:4364
int AggCheckCallContext(FunctionCallInfo fcinfo, MemoryContext *aggcontext)
Definition: nodeAgg.c:4511
static void advance_transition_function(AggState *aggstate, AggStatePerTrans pertrans, AggStatePerGroup pergroupstate)
Definition: nodeAgg.c:706
static void hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
Definition: nodeAgg.c:1917
static void finalize_aggregates(AggState *aggstate, AggStatePerAgg peraggs, AggStatePerGroup pergroup)
Definition: nodeAgg.c:1294
static void initialize_phase(AggState *aggstate, int newphase)
Definition: nodeAgg.c:477
Size hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
Definition: nodeAgg.c:1694
static void initialize_aggregates(AggState *aggstate, AggStatePerGroup *pergroups, int numReset)
Definition: nodeAgg.c:665
static TupleTableSlot * agg_retrieve_hash_table_in_memory(AggState *aggstate)
Definition: nodeAgg.c:2771
void ExecAggInitializeDSM(AggState *node, ParallelContext *pcxt)
Definition: nodeAgg.c:4704
static void finalize_aggregate(AggState *aggstate, AggStatePerAgg peragg, AggStatePerGroup pergroupstate, Datum *resultVal, bool *resultIsNull)
Definition: nodeAgg.c:1046
#define HASHAGG_MAX_PARTITIONS
Definition: nodeAgg.c:297
static void lookup_hash_entries(AggState *aggstate)
Definition: nodeAgg.c:2095
static TupleTableSlot * agg_retrieve_direct(AggState *aggstate)
Definition: nodeAgg.c:2194
static void hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
Definition: nodeAgg.c:1741
static void prepare_projection_slot(AggState *aggstate, TupleTableSlot *slot, int currentSet)
Definition: nodeAgg.c:1249
bool AggStateIsShared(FunctionCallInfo fcinfo)
Definition: nodeAgg.c:4615
static void build_pertrans_for_aggref(AggStatePerTrans pertrans, AggState *aggstate, EState *estate, Aggref *aggref, Oid transfn_oid, Oid aggtranstype, Oid aggserialfn, Oid aggdeserialfn, Datum initValue, bool initValueIsNull, Oid *inputTypes, int numArguments)
Definition: nodeAgg.c:4038
Aggref * AggGetAggref(FunctionCallInfo fcinfo)
Definition: nodeAgg.c:4555
#define CHUNKHDRSZ
Definition: nodeAgg.c:320
static TupleTableSlot * agg_retrieve_hash_table(AggState *aggstate)
Definition: nodeAgg.c:2746
static void process_ordered_aggregate_single(AggState *aggstate, AggStatePerTrans pertrans, AggStatePerGroup pergroupstate)
Definition: nodeAgg.c:848
static void advance_aggregates(AggState *aggstate)
Definition: nodeAgg.c:816
static void prepare_hash_slot(AggStatePerHash perhash, TupleTableSlot *inputslot, TupleTableSlot *hashslot)
Definition: nodeAgg.c:1204
static void build_hash_tables(AggState *aggstate)
Definition: nodeAgg.c:1468
void ExecEndAgg(AggState *node)
Definition: nodeAgg.c:4304
#define HASHAGG_READ_BUFFER_SIZE
Definition: nodeAgg.c:305
static void hashagg_reset_spill_state(AggState *aggstate)
Definition: nodeAgg.c:3133
static Size hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill, TupleTableSlot *inputslot, uint32 hash)
Definition: nodeAgg.c:2925
static void select_current_set(AggState *aggstate, int setno, bool is_hash)
Definition: nodeAgg.c:455
static void finalize_partialaggregate(AggState *aggstate, AggStatePerAgg peragg, AggStatePerGroup pergroupstate, Datum *resultVal, bool *resultIsNull)
Definition: nodeAgg.c:1146
static void hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits, double input_groups, double hashentrysize)
Definition: nodeAgg.c:2894
#define HASHAGG_MIN_PARTITIONS
Definition: nodeAgg.c:296
void hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits, Size *mem_limit, uint64 *ngroups_limit, int *num_partitions)
Definition: nodeAgg.c:1798
MemoryContext AggGetTempMemoryContext(FunctionCallInfo fcinfo)
Definition: nodeAgg.c:4589
#define HASHAGG_PARTITION_FACTOR
Definition: nodeAgg.c:295
static HashAggBatch * hashagg_batch_new(LogicalTape *input_tape, int setno, int64 input_tuples, double input_card, int used_bits)
Definition: nodeAgg.c:2991
#define HASHAGG_WRITE_BUFFER_SIZE
Definition: nodeAgg.c:306
static int hash_choose_num_partitions(double input_groups, double hashentrysize, int used_bits, int *log2_npartitions)
Definition: nodeAgg.c:1991
struct AggStatePerGroupData AggStatePerGroupData
Oid exprCollation(const Node *expr)
Definition: nodeFuncs.c:816
#define expression_tree_walker(n, w, c)
Definition: nodeFuncs.h:151
size_t get_hash_memory_limit(void)
Definition: nodeHash.c:3595
#define DO_AGGSPLIT_SKIPFINAL(as)
Definition: nodes.h:386
#define IsA(nodeptr, _type_)
Definition: nodes.h:158
#define DO_AGGSPLIT_DESERIALIZE(as)
Definition: nodes.h:388
#define DO_AGGSPLIT_COMBINE(as)
Definition: nodes.h:385
@ AGG_SORTED
Definition: nodes.h:355
@ AGG_HASHED
Definition: nodes.h:356
@ AGG_MIXED
Definition: nodes.h:357
@ AGG_PLAIN
Definition: nodes.h:354
#define DO_AGGSPLIT_SERIALIZE(as)
Definition: nodes.h:387
#define makeNode(_type_)
Definition: nodes.h:155
#define castNode(_type_, nodeptr)
Definition: nodes.h:176
#define InvokeFunctionExecuteHook(objectId)
Definition: objectaccess.h:213
void build_aggregate_finalfn_expr(Oid *agg_input_types, int num_finalfn_inputs, Oid agg_state_type, Oid agg_result_type, Oid agg_input_collation, Oid finalfn_oid, Expr **finalfnexpr)
Definition: parse_agg.c:2136
void build_aggregate_deserialfn_expr(Oid deserialfn_oid, Expr **deserialfnexpr)
Definition: parse_agg.c:2112
void build_aggregate_transfn_expr(Oid *agg_input_types, int agg_num_inputs, int agg_num_direct_inputs, bool agg_variadic, Oid agg_state_type, Oid agg_input_collation, Oid transfn_oid, Oid invtransfn_oid, Expr **transfnexpr, Expr **invtransfnexpr)
Definition: parse_agg.c:2028
int get_aggregate_argtypes(Aggref *aggref, Oid *inputTypes)
Definition: parse_agg.c:1908
void build_aggregate_serialfn_expr(Oid serialfn_oid, Expr **serialfnexpr)
Definition: parse_agg.c:2089
bool IsBinaryCoercible(Oid srctype, Oid targettype)
@ OBJECT_AGGREGATE
Definition: parsenodes.h:2262
@ OBJECT_FUNCTION
Definition: parsenodes.h:2280
#define ACL_EXECUTE
Definition: parsenodes.h:83
FormData_pg_aggregate * Form_pg_aggregate
Definition: pg_aggregate.h:109
int16 attnum
Definition: pg_attribute.h:74
FormData_pg_attribute * Form_pg_attribute
Definition: pg_attribute.h:209
void * arg
#define FUNC_MAX_ARGS
#define lfirst(lc)
Definition: pg_list.h:172
#define llast(l)
Definition: pg_list.h:198
static int list_length(const List *l)
Definition: pg_list.h:152
#define NIL
Definition: pg_list.h:68
#define lfirst_int(lc)
Definition: pg_list.h:173
#define linitial_int(l)
Definition: pg_list.h:179
static void * list_nth(const List *list, int n)
Definition: pg_list.h:299
#define list_nth_node(type, list, n)
Definition: pg_list.h:327
FormData_pg_proc * Form_pg_proc
Definition: pg_proc.h:136
#define outerPlan(node)
Definition: plannodes.h:182
static bool DatumGetBool(Datum X)
Definition: postgres.h:90
uintptr_t Datum
Definition: postgres.h:64
static Datum ObjectIdGetDatum(Oid X)
Definition: postgres.h:252
static Pointer DatumGetPointer(Datum X)
Definition: postgres.h:312
#define InvalidOid
Definition: postgres_ext.h:36
unsigned int Oid
Definition: postgres_ext.h:31
#define OUTER_VAR
Definition: primnodes.h:237
tree context
Definition: radixtree.h:1835
MemoryContextSwitchTo(old_ctx)
static unsigned hash(unsigned *uv, int n)
Definition: rege_dfa.c:715
void shm_toc_insert(shm_toc *toc, uint64 key, void *address)
Definition: shm_toc.c:171
void * shm_toc_allocate(shm_toc *toc, Size nbytes)
Definition: shm_toc.c:88
void * shm_toc_lookup(shm_toc *toc, uint64 key, bool noError)
Definition: shm_toc.c:232
#define shm_toc_estimate_chunk(e, sz)
Definition: shm_toc.h:51
#define shm_toc_estimate_keys(e, cnt)
Definition: shm_toc.h:53
Size add_size(Size s1, Size s2)
Definition: shmem.c:493
Size mul_size(Size s1, Size s2)
Definition: shmem.c:510
static pg_noinline void Size size
Definition: slab.c:607
FmgrInfo finalfn
Definition: nodeAgg.h:207
bool resulttypeByVal
Definition: nodeAgg.h:225
List * aggdirectargs
Definition: nodeAgg.h:218
Aggref * aggref
Definition: nodeAgg.h:195
int16 resulttypeLen
Definition: nodeAgg.h:224
FmgrInfo * hashfunctions
Definition: nodeAgg.h:314
TupleHashTable hashtable
Definition: nodeAgg.h:311
TupleTableSlot * hashslot
Definition: nodeAgg.h:313
TupleHashIterator hashiter
Definition: nodeAgg.h:312
AttrNumber * hashGrpColIdxHash
Definition: nodeAgg.h:320
AttrNumber * hashGrpColIdxInput
Definition: nodeAgg.h:319
Bitmapset ** grouped_cols
Definition: nodeAgg.h:285
ExprState * evaltrans
Definition: nodeAgg.h:291
ExprState * evaltrans_cache[2][2]
Definition: nodeAgg.h:299
ExprState ** eqfunctions
Definition: nodeAgg.h:286
AggStrategy aggstrategy
Definition: nodeAgg.h:282
bool * sortNullsFirst
Definition: nodeAgg.h:108
FmgrInfo serialfn
Definition: nodeAgg.h:89
FmgrInfo equalfnOne
Definition: nodeAgg.h:115
TupleDesc sortdesc
Definition: nodeAgg.h:143
TupleTableSlot * sortslot
Definition: nodeAgg.h:141
FmgrInfo transfn
Definition: nodeAgg.h:86
Aggref * aggref
Definition: nodeAgg.h:44
ExprState * equalfnMulti
Definition: nodeAgg.h:116
Tuplesortstate ** sortstates
Definition: nodeAgg.h:162
TupleTableSlot * uniqslot
Definition: nodeAgg.h:142
FmgrInfo deserialfn
Definition: nodeAgg.h:92
FunctionCallInfo deserialfn_fcinfo
Definition: nodeAgg.h:175
AttrNumber * sortColIdx
Definition: nodeAgg.h:105
FunctionCallInfo serialfn_fcinfo
Definition: nodeAgg.h:173
FunctionCallInfo transfn_fcinfo
Definition: nodeAgg.h:170
MemoryContext hash_metacxt
Definition: execnodes.h:2531
ScanState ss
Definition: execnodes.h:2489
Tuplesortstate * sort_out
Definition: execnodes.h:2522
uint64 hash_disk_used
Definition: execnodes.h:2549
AggStatePerGroup * all_pergroups
Definition: execnodes.h:2558
AggStatePerGroup * hash_pergroup
Definition: execnodes.h:2553
AggStatePerPhase phase
Definition: execnodes.h:2495
List * aggs
Definition: execnodes.h:2490
ExprContext * tmpcontext
Definition: execnodes.h:2502
int max_colno_needed
Definition: execnodes.h:2516
int hash_planned_partitions
Definition: execnodes.h:2543
HeapTuple grp_firstTuple
Definition: execnodes.h:2527
Size hash_mem_limit
Definition: execnodes.h:2541
ExprContext * curaggcontext
Definition: execnodes.h:2504
AggStatePerTrans curpertrans
Definition: execnodes.h:2507
bool table_filled
Definition: execnodes.h:2529
AggStatePerTrans pertrans
Definition: execnodes.h:2499
int current_set
Definition: execnodes.h:2512
struct LogicalTapeSet * hash_tapeset
Definition: execnodes.h:2532
AggStrategy aggstrategy
Definition: execnodes.h:2493
int numtrans
Definition: execnodes.h:2492
ExprContext * hashcontext
Definition: execnodes.h:2500
AggSplit aggsplit
Definition: execnodes.h:2494
int projected_set
Definition: execnodes.h:2510
SharedAggInfo * shared_info
Definition: execnodes.h:2560
uint64 hash_ngroups_limit
Definition: execnodes.h:2542
bool input_done
Definition: execnodes.h:2508
AggStatePerPhase phases
Definition: execnodes.h:2520
List * all_grouped_cols
Definition: execnodes.h:2514
bool hash_spill_mode
Definition: execnodes.h:2539
AggStatePerGroup * pergroups
Definition: execnodes.h:2525
AggStatePerHash perhash
Definition: execnodes.h:2552
Size hash_mem_peak
Definition: execnodes.h:2546
double hashentrysize
Definition: execnodes.h:2545
int numphases
Definition: execnodes.h:2496
uint64 hash_ngroups_current
Definition: execnodes.h:2547
int hash_batches_used
Definition: execnodes.h:2550
Tuplesortstate * sort_in
Definition: execnodes.h:2521
TupleTableSlot * hash_spill_wslot
Definition: execnodes.h:2536
AggStatePerAgg curperagg
Definition: execnodes.h:2505
struct HashAggSpill * hash_spills
Definition: execnodes.h:2533
TupleTableSlot * sort_slot
Definition: execnodes.h:2523
bool hash_ever_spilled
Definition: execnodes.h:2538
int numaggs
Definition: execnodes.h:2491
int num_hashes
Definition: execnodes.h:2530
AggStatePerAgg peragg
Definition: execnodes.h:2498
List * hash_batches
Definition: execnodes.h:2537
TupleTableSlot * hash_spill_rslot
Definition: execnodes.h:2535
int maxsets
Definition: execnodes.h:2519
ExprContext ** aggcontexts
Definition: execnodes.h:2501
Bitmapset * colnos_needed
Definition: execnodes.h:2515
int current_phase
Definition: execnodes.h:2497
bool all_cols_needed
Definition: execnodes.h:2517
bool agg_done
Definition: execnodes.h:2509
Bitmapset * grouped_cols
Definition: execnodes.h:2513
Definition: plannodes.h:997
AggSplit aggsplit
Definition: plannodes.h:1004
List * chain
Definition: plannodes.h:1031
long numGroups
Definition: plannodes.h:1017
List * groupingSets
Definition: plannodes.h:1028
Bitmapset * aggParams
Definition: plannodes.h:1023
Plan plan
Definition: plannodes.h:998
int numCols
Definition: plannodes.h:1007
uint64 transitionSpace
Definition: plannodes.h:1020
AggStrategy aggstrategy
Definition: plannodes.h:1001
Oid aggfnoid
Definition: primnodes.h:444
List * aggdistinct
Definition: primnodes.h:474
List * aggdirectargs
Definition: primnodes.h:465
List * args
Definition: primnodes.h:468
Expr * aggfilter
Definition: primnodes.h:477
List * aggorder
Definition: primnodes.h:471
MemoryContext es_query_cxt
Definition: execnodes.h:667
List * es_tupleTable
Definition: execnodes.h:669
MemoryContext ecxt_per_tuple_memory
Definition: execnodes.h:263
TupleTableSlot * ecxt_innertuple
Definition: execnodes.h:257
Datum * ecxt_aggvalues
Definition: execnodes.h:274
bool * ecxt_aggnulls
Definition: execnodes.h:276
TupleTableSlot * ecxt_outertuple
Definition: execnodes.h:259
Bitmapset * aggregated
Definition: nodeAgg.c:363
Bitmapset * unaggregated
Definition: nodeAgg.c:364
bool is_aggref
Definition: nodeAgg.c:362
bool fn_strict
Definition: fmgr.h:61
fmNodePtr context
Definition: fmgr.h:88
NullableDatum args[FLEXIBLE_ARRAY_MEMBER]
Definition: fmgr.h:95
int used_bits
Definition: nodeAgg.c:353
int64 input_tuples
Definition: nodeAgg.c:355
double input_card
Definition: nodeAgg.c:356
LogicalTape * input_tape
Definition: nodeAgg.c:354
hyperLogLogState * hll_card
Definition: nodeAgg.c:338
int64 * ntuples
Definition: nodeAgg.c:335
LogicalTape ** partitions
Definition: nodeAgg.c:334
int npartitions
Definition: nodeAgg.c:333
uint32 mask
Definition: nodeAgg.c:336
Definition: pg_list.h:54