PostgreSQL Source Code git master
radixtree.h
1/*-------------------------------------------------------------------------
2 *
3 * radixtree.h
4 * Template for adaptive radix tree.
5 *
6 * A template to generate an "adaptive radix tree", specialized for value
7 * types and for local/shared memory.
8 *
9 * The concept originates from the paper "The Adaptive Radix Tree: ARTful
10 * Indexing for Main-Memory Databases" by Viktor Leis, Alfons Kemper,
11 * and Thomas Neumann, 2013.
12 *
13 * Radix trees have some advantages over hash tables:
14 * - The keys are logically ordered, allowing efficient sorted iteration
15 * and range queries
16 * - Operations using keys that are lexicographically close together
17 * will have favorable memory locality
18 * - Memory use grows gradually rather than by doubling
19 * - The key does not need to be stored with the value, since the key
20 * is implicitly contained in the path to the value
21 *
22 * Some disadvantages are:
23 * - Point queries (along with insertion and deletion) are slower than
24 * a linear probing hash table as in simplehash.h
25 * - Memory usage varies by key distribution, so it is difficult to predict
26 *
27 * A classic radix tree consists of nodes, each containing an array of
28 * pointers to child nodes. The size of the array is determined by the
29 * "span" of the tree, which is the number of bits of the key used to
30 * index into the array. For example, with a span of 6, a "chunk"
31 * of 6 bits is extracted from the key at each node traversal, and
32 * the arrays thus have a "fanout" of 2^6 or 64 entries. A large span
33 * allows a shorter tree, but requires larger arrays that may be mostly
34 * wasted space.
35 *
36 * The key idea of the adaptive radix tree is to choose different
37 * data structures based on the number of child nodes. A node will
38 * start out small when it is first populated, and when it is full,
39 * it is replaced by the next larger size. Conversely, when a node
40 * becomes mostly empty, it is replaced by the next smaller node. The
41 * bulk of the code complexity in this module stems from this dynamic
42 * switching. One mitigating factor is using a span of 8, since bytes
43 * are directly addressable.
44 *
45 * The ART paper mentions three ways to implement leaves:
46 *
47 * "- Single-value leaves: The values are stored using an addi-
48 * tional leaf node type which stores one value.
49 * - Multi-value leaves: The values are stored in one of four
50 * different leaf node types, which mirror the structure of
51 * inner nodes, but contain values instead of pointers.
52 * - Combined pointer/value slots: If values fit into point-
53 * ers, no separate node types are necessary. Instead, each
54 * pointer storage location in an inner node can either
55 * store a pointer or a value."
56 *
57 * We use a form of "combined pointer/value slots", as recommended. Values
58 * of size (if fixed at compile time) equal to or smaller than the platform's
59 * pointer type are stored in the child slots of the last level node,
60 * while larger values are the same as "single-value" leaves above. This
61 * offers flexibility and efficiency. Variable-length types are currently
62 * treated as single-value leaves for simplicity, but future work may
63 * allow those to be stored in the child pointer arrays, when they're
64 * small enough.
65 *
66 * There are two other techniques described in the paper that are not
67 * implemented here:
68 * - path compression "...removes all inner nodes that have only a single child."
69 * - lazy path expansion "...inner nodes are only created if they are required
70 * to distinguish at least two leaf nodes."
71 *
72 * We do have a form of "poor man's path compression", however, enabled by
73 * only supporting unsigned integer keys (for now assumed to be 64-bit):
74 * A tree doesn't contain paths where the highest bytes of all keys are
75 * zero. That way, the tree's height adapts to the distribution of keys.
76 *
77 * To handle concurrency, we use a single reader-writer lock for the
78 * radix tree. If concurrent write operations are possible, the tree
79 * must be exclusively locked during write operations such as RT_SET()
80 * and RT_DELETE(), and share locked during read operations such as
81 * RT_FIND() and RT_BEGIN_ITERATE().
82 *
83 * TODO: The current locking mechanism is not optimized for high
84 * concurrency with mixed read-write workloads. In the future it might
85 * be worthwhile to replace it with the Optimistic Lock Coupling or
86 * ROWEX mentioned in the paper "The ART of Practical Synchronization"
87 * by the same authors as the ART paper, 2016.
88 *
89 * To generate a radix tree and associated functions for a use case
90 * several macros have to be #define'ed before this file is included.
91 * Including the file #undef's all those, so a new radix tree can be
92 * generated afterwards.
93 *
94 * The relevant parameters are:
95 * - RT_PREFIX - prefix for all symbol names generated. A prefix of "foo"
96 * will result in radix tree type "foo_radix_tree" and functions like
97 * "foo_create"/"foo_free" and so forth.
98 * - RT_DECLARE - if defined function prototypes and type declarations are
99 * generated
100 * - RT_DEFINE - if defined function definitions are generated
101 * - RT_SCOPE - in which scope (e.g. extern, static inline) do function
102 * declarations reside
103 * - RT_VALUE_TYPE - the type of the value.
104 * - RT_VARLEN_VALUE_SIZE() - for variable length values, an expression
105 * involving a pointer to the value type, to calculate size.
106 * NOTE: implies that the value is in fact variable-length,
107 * so do not set for fixed-length values.
108 * - RT_RUNTIME_EMBEDDABLE_VALUE - for variable length values, allows
109 * storing the value in a child pointer slot, rather than as a single-
110 * value leaf, if small enough. This requires that the value, when
111 * read as a child pointer, can be tagged in the lowest bit.
112 *
113 * Optional parameters:
114 * - RT_SHMEM - if defined, the radix tree is created in the DSA area
115 * so that multiple processes can access it simultaneously.
116 * - RT_DEBUG - if defined add stats tracking and debugging functions
117 *
118 * Interface
119 * ---------
120 *
121 * RT_CREATE - Create a new, empty radix tree
122 * RT_FREE - Free the radix tree
123 * RT_FIND - Lookup the value for a given key
124 * RT_SET - Set a key-value pair
125 * RT_BEGIN_ITERATE - Begin iterating through all key-value pairs
126 * RT_ITERATE_NEXT - Return next key-value pair, if any
127 * RT_END_ITERATE - End iteration
128 * RT_MEMORY_USAGE - Get the memory usage, as measured by space in memory context blocks
129 *
130 * Interface for Shared Memory
131 * ---------
132 *
133 * RT_ATTACH - Attach to the radix tree
134 * RT_DETACH - Detach from the radix tree
135 * RT_LOCK_EXCLUSIVE - Lock the radix tree in exclusive mode
136 * RT_LOCK_SHARE - Lock the radix tree in share mode
137 * RT_UNLOCK - Unlock the radix tree
138 * RT_GET_HANDLE - Return the handle of the radix tree
139 *
140 * Optional Interface
141 * ---------
142 *
143 * RT_DELETE - Delete a key-value pair. Declared/defined if RT_USE_DELETE is defined
144 *
145 *
146 * Copyright (c) 2024, PostgreSQL Global Development Group
147 *
148 * IDENTIFICATION
149 * src/include/lib/radixtree.h
150 *
151 *-------------------------------------------------------------------------
152 */
153
154#include "nodes/bitmapset.h"
155#include "port/pg_bitutils.h"
156#include "port/simd.h"
157#include "utils/dsa.h"
158#include "utils/memutils.h"
159#ifdef RT_SHMEM
160#include "miscadmin.h"
161#include "storage/lwlock.h"
162#endif
163
164/* helpers */
165#define RT_MAKE_PREFIX(a) CppConcat(a,_)
166#define RT_MAKE_NAME(name) RT_MAKE_NAME_(RT_MAKE_PREFIX(RT_PREFIX),name)
167#define RT_MAKE_NAME_(a,b) CppConcat(a,b)
168/*
169 * stringify a macro constant, from https://gcc.gnu.org/onlinedocs/cpp/Stringizing.html
170 */
171#define RT_STR(s) RT_STR_(s)
172#define RT_STR_(s) #s
173
174/* function declarations */
175#define RT_CREATE RT_MAKE_NAME(create)
176#define RT_FREE RT_MAKE_NAME(free)
177#define RT_FIND RT_MAKE_NAME(find)
178#ifdef RT_SHMEM
179#define RT_ATTACH RT_MAKE_NAME(attach)
180#define RT_DETACH RT_MAKE_NAME(detach)
181#define RT_GET_HANDLE RT_MAKE_NAME(get_handle)
182#define RT_LOCK_EXCLUSIVE RT_MAKE_NAME(lock_exclusive)
183#define RT_LOCK_SHARE RT_MAKE_NAME(lock_share)
184#define RT_UNLOCK RT_MAKE_NAME(unlock)
185#endif
186#define RT_SET RT_MAKE_NAME(set)
187#define RT_BEGIN_ITERATE RT_MAKE_NAME(begin_iterate)
188#define RT_ITERATE_NEXT RT_MAKE_NAME(iterate_next)
189#define RT_END_ITERATE RT_MAKE_NAME(end_iterate)
190#ifdef RT_USE_DELETE
191#define RT_DELETE RT_MAKE_NAME(delete)
192#endif
193#define RT_MEMORY_USAGE RT_MAKE_NAME(memory_usage)
194#define RT_DUMP_NODE RT_MAKE_NAME(dump_node)
195#define RT_STATS RT_MAKE_NAME(stats)
196
197/* internal helper functions (no externally visible prototypes) */
198#define RT_CHILDPTR_IS_VALUE RT_MAKE_NAME(childptr_is_value)
199#define RT_VALUE_IS_EMBEDDABLE RT_MAKE_NAME(value_is_embeddable)
200#define RT_GET_SLOT_RECURSIVE RT_MAKE_NAME(get_slot_recursive)
201#define RT_DELETE_RECURSIVE RT_MAKE_NAME(delete_recursive)
202#define RT_ALLOC_NODE RT_MAKE_NAME(alloc_node)
203#define RT_ALLOC_LEAF RT_MAKE_NAME(alloc_leaf)
204#define RT_FREE_NODE RT_MAKE_NAME(free_node)
205#define RT_FREE_LEAF RT_MAKE_NAME(free_leaf)
206#define RT_FREE_RECURSE RT_MAKE_NAME(free_recurse)
207#define RT_EXTEND_UP RT_MAKE_NAME(extend_up)
208#define RT_EXTEND_DOWN RT_MAKE_NAME(extend_down)
209#define RT_COPY_COMMON RT_MAKE_NAME(copy_common)
210#define RT_PTR_SET_LOCAL RT_MAKE_NAME(ptr_set_local)
211#define RT_NODE_16_SEARCH_EQ RT_MAKE_NAME(node_16_search_eq)
212#define RT_NODE_4_GET_INSERTPOS RT_MAKE_NAME(node_4_get_insertpos)
213#define RT_NODE_16_GET_INSERTPOS RT_MAKE_NAME(node_16_get_insertpos)
214#define RT_SHIFT_ARRAYS_FOR_INSERT RT_MAKE_NAME(shift_arrays_for_insert)
215#define RT_SHIFT_ARRAYS_AND_DELETE RT_MAKE_NAME(shift_arrays_and_delete)
216#define RT_COPY_ARRAYS_FOR_INSERT RT_MAKE_NAME(copy_arrays_for_insert)
217#define RT_COPY_ARRAYS_AND_DELETE RT_MAKE_NAME(copy_arrays_and_delete)
218#define RT_NODE_48_IS_CHUNK_USED RT_MAKE_NAME(node_48_is_chunk_used)
219#define RT_NODE_48_GET_CHILD RT_MAKE_NAME(node_48_get_child)
220#define RT_NODE_256_IS_CHUNK_USED RT_MAKE_NAME(node_256_is_chunk_used)
221#define RT_NODE_256_GET_CHILD RT_MAKE_NAME(node_256_get_child)
222#define RT_KEY_GET_SHIFT RT_MAKE_NAME(key_get_shift)
223#define RT_SHIFT_GET_MAX_VAL RT_MAKE_NAME(shift_get_max_val)
224#define RT_NODE_SEARCH RT_MAKE_NAME(node_search)
225#define RT_NODE_DELETE RT_MAKE_NAME(node_delete)
226#define RT_NODE_INSERT RT_MAKE_NAME(node_insert)
227#define RT_ADD_CHILD_4 RT_MAKE_NAME(add_child_4)
228#define RT_ADD_CHILD_16 RT_MAKE_NAME(add_child_16)
229#define RT_ADD_CHILD_48 RT_MAKE_NAME(add_child_48)
230#define RT_ADD_CHILD_256 RT_MAKE_NAME(add_child_256)
231#define RT_GROW_NODE_4 RT_MAKE_NAME(grow_node_4)
232#define RT_GROW_NODE_16 RT_MAKE_NAME(grow_node_16)
233#define RT_GROW_NODE_48 RT_MAKE_NAME(grow_node_48)
234#define RT_REMOVE_CHILD_4 RT_MAKE_NAME(remove_child_4)
235#define RT_REMOVE_CHILD_16 RT_MAKE_NAME(remove_child_16)
236#define RT_REMOVE_CHILD_48 RT_MAKE_NAME(remove_child_48)
237#define RT_REMOVE_CHILD_256 RT_MAKE_NAME(remove_child_256)
238#define RT_SHRINK_NODE_16 RT_MAKE_NAME(shrink_child_16)
239#define RT_SHRINK_NODE_48 RT_MAKE_NAME(shrink_child_48)
240#define RT_SHRINK_NODE_256 RT_MAKE_NAME(shrink_child_256)
241#define RT_NODE_ITERATE_NEXT RT_MAKE_NAME(node_iterate_next)
242#define RT_VERIFY_NODE RT_MAKE_NAME(verify_node)
243
244/* type declarations */
245#define RT_RADIX_TREE RT_MAKE_NAME(radix_tree)
246#define RT_RADIX_TREE_CONTROL RT_MAKE_NAME(radix_tree_control)
247#define RT_ITER RT_MAKE_NAME(iter)
248#ifdef RT_SHMEM
249#define RT_HANDLE RT_MAKE_NAME(handle)
250#endif
251#define RT_NODE RT_MAKE_NAME(node)
252#define RT_CHILD_PTR RT_MAKE_NAME(child_ptr)
253#define RT_NODE_ITER RT_MAKE_NAME(node_iter)
254#define RT_NODE_4 RT_MAKE_NAME(node_4)
255#define RT_NODE_16 RT_MAKE_NAME(node_16)
256#define RT_NODE_48 RT_MAKE_NAME(node_48)
257#define RT_NODE_256 RT_MAKE_NAME(node_256)
258#define RT_SIZE_CLASS RT_MAKE_NAME(size_class)
259#define RT_SIZE_CLASS_ELEM RT_MAKE_NAME(size_class_elem)
260#define RT_SIZE_CLASS_INFO RT_MAKE_NAME(size_class_info)
261#define RT_CLASS_4 RT_MAKE_NAME(class_4)
262#define RT_CLASS_16_LO RT_MAKE_NAME(class_32_min)
263#define RT_CLASS_16_HI RT_MAKE_NAME(class_32_max)
264#define RT_CLASS_48 RT_MAKE_NAME(class_48)
265#define RT_CLASS_256 RT_MAKE_NAME(class_256)
266
267/* generate forward declarations necessary to use the radix tree */
268#ifdef RT_DECLARE
269
270typedef struct RT_RADIX_TREE RT_RADIX_TREE;
271typedef struct RT_ITER RT_ITER;
272
273#ifdef RT_SHMEM
274typedef dsa_pointer RT_HANDLE;
275#endif
276
277#ifdef RT_SHMEM
278RT_SCOPE RT_RADIX_TREE *RT_CREATE(MemoryContext ctx, dsa_area *dsa, int tranche_id);
279RT_SCOPE RT_RADIX_TREE *RT_ATTACH(dsa_area *dsa, dsa_pointer dp);
280RT_SCOPE void RT_DETACH(RT_RADIX_TREE * tree);
281RT_SCOPE RT_HANDLE RT_GET_HANDLE(RT_RADIX_TREE * tree);
282RT_SCOPE void RT_LOCK_EXCLUSIVE(RT_RADIX_TREE * tree);
283RT_SCOPE void RT_LOCK_SHARE(RT_RADIX_TREE * tree);
284RT_SCOPE void RT_UNLOCK(RT_RADIX_TREE * tree);
285#else
286RT_SCOPE RT_RADIX_TREE *RT_CREATE(MemoryContext ctx);
287#endif
288
289RT_SCOPE void RT_FREE(RT_RADIX_TREE * tree);
290RT_SCOPE RT_VALUE_TYPE *RT_FIND(RT_RADIX_TREE * tree, uint64 key);
291RT_SCOPE bool RT_SET(RT_RADIX_TREE * tree, uint64 key, RT_VALUE_TYPE * value_p);
292
293#ifdef RT_USE_DELETE
294RT_SCOPE bool RT_DELETE(RT_RADIX_TREE * tree, uint64 key);
295#endif
296
297RT_SCOPE RT_ITER *RT_BEGIN_ITERATE(RT_RADIX_TREE * tree);
298RT_SCOPE RT_VALUE_TYPE *RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p);
299RT_SCOPE void RT_END_ITERATE(RT_ITER * iter);
300
301RT_SCOPE uint64 RT_MEMORY_USAGE(RT_RADIX_TREE * tree);
302
303#ifdef RT_DEBUG
304RT_SCOPE void RT_STATS(RT_RADIX_TREE * tree);
305#endif
306
307#endif /* RT_DECLARE */
308
309
310/* generate implementation of the radix tree */
311#ifdef RT_DEFINE
312
313/* The number of bits encoded in one tree level */
314#define RT_SPAN BITS_PER_BYTE
315
316/*
317 * The number of possible partial keys, and thus the maximum number of
318 * child pointers, for a node.
319 */
320#define RT_NODE_MAX_SLOTS (1 << RT_SPAN)
321
322/* Mask for extracting a chunk from a key */
323#define RT_CHUNK_MASK ((1 << RT_SPAN) - 1)
324
325/* Maximum shift needed to extract a chunk from a key */
326#define RT_MAX_SHIFT RT_KEY_GET_SHIFT(UINT64_MAX)
327
328/* Maximum level a tree can reach for a key */
329#define RT_MAX_LEVEL ((sizeof(uint64) * BITS_PER_BYTE) / RT_SPAN)
330
331/* Get a chunk from the key */
332#define RT_GET_KEY_CHUNK(key, shift) ((uint8) (((key) >> (shift)) & RT_CHUNK_MASK))
333
334/* For accessing bitmaps */
335#define RT_BM_IDX(x) ((x) / BITS_PER_BITMAPWORD)
336#define RT_BM_BIT(x) ((x) % BITS_PER_BITMAPWORD)
337
338/*
339 * Node kinds
340 *
341 * The different node kinds are what make the tree "adaptive".
342 *
343 * Each node kind is associated with a different datatype and different
344 * search/set/delete/iterate algorithms adapted for its size. The largest
345 * kind, node256 is basically the same as a traditional radix tree,
346 * and would be most wasteful of memory when sparsely populated. The
347 * smaller nodes expend some additional CPU time to enable a smaller
348 * memory footprint.
349 *
350 * NOTE: There are 4 node kinds, and this should never be increased,
351 * for several reasons:
352 * 1. With 5 or more kinds, gcc tends to use a jump table for switch
353 * statements.
354 * 2. The 4 kinds can be represented with 2 bits, so we have the option
355 * in the future to tag the node pointer with the kind, even on
356 * platforms with 32-bit pointers. That would touch fewer cache lines
357 * during traversal and allow faster recovery from branch mispredicts.
358 * 3. We can have multiple size classes per node kind.
359 */
360#define RT_NODE_KIND_4 0x00
361#define RT_NODE_KIND_16 0x01
362#define RT_NODE_KIND_48 0x02
363#define RT_NODE_KIND_256 0x03
364#define RT_NODE_KIND_COUNT 4
365
366/*
367 * Calculate the slab block size so that we can allocate at least 32 chunks
368 * from the block.
369 */
370#define RT_SLAB_BLOCK_SIZE(size) \
371 Max(SLAB_DEFAULT_BLOCK_SIZE, pg_nextpower2_32(size * 32))
372
373/* Common header for all nodes */
374typedef struct RT_NODE
375{
376 /* Node kind, one per search/set algorithm */
377 uint8 kind;
378
379 /*
380 * Max capacity for the current size class. Storing this in the node
381 * enables multiple size classes per node kind. uint8 is sufficient for
382 * all node kinds, because we only use this number to test if the node
383 * needs to grow. Since node256 never needs to grow, we let this overflow
384 * to zero.
385 */
386 uint8 fanout;
387
388 /*
389 * Number of children. uint8 is sufficient for all node kinds, because
390 * nodes shrink when this number gets lower than some threshold. Since
391 * node256 cannot possibly have zero children, we let the counter overflow
392 * and we interpret zero as "256" for this node kind.
393 */
394 uint8 count;
395} RT_NODE;
396
397
398/* pointer returned by allocation */
399#ifdef RT_SHMEM
400#define RT_PTR_ALLOC dsa_pointer
401#define RT_INVALID_PTR_ALLOC InvalidDsaPointer
402#define RT_PTR_ALLOC_IS_VALID(ptr) DsaPointerIsValid(ptr)
403#else
404#define RT_PTR_ALLOC RT_NODE *
405#define RT_INVALID_PTR_ALLOC NULL
406#define RT_PTR_ALLOC_IS_VALID(ptr) PointerIsValid(ptr)
407#endif
408
409/*
410 * A convenience type used when we need to work with a DSA pointer as well
411 * as its local pointer. For local memory, both members are the same, so
412 * we use a union.
413 */
414#ifdef RT_SHMEM
415typedef struct RT_CHILD_PTR
416#else
417typedef union RT_CHILD_PTR
418#endif
419{
420 RT_PTR_ALLOC alloc;
421 RT_NODE *local;
422} RT_CHILD_PTR;
423
424
425/*
426 * Helper macros and functions for value storage.
427 * We either embed values in the child slots of the last level
428 * node or store pointers to values in the child slots,
429 * depending on the value size.
430 */
431
432#ifdef RT_VARLEN_VALUE_SIZE
433#define RT_GET_VALUE_SIZE(v) RT_VARLEN_VALUE_SIZE(v)
434#else
435#define RT_GET_VALUE_SIZE(v) sizeof(RT_VALUE_TYPE)
436#endif
437
438/*
439 * Return true if the value can be stored in the child array
440 * of the lowest-level node, false otherwise.
441 */
442static inline bool
443RT_VALUE_IS_EMBEDDABLE(RT_VALUE_TYPE * value_p)
444{
445#ifdef RT_VARLEN_VALUE_SIZE
446
447#ifdef RT_RUNTIME_EMBEDDABLE_VALUE
448 return RT_GET_VALUE_SIZE(value_p) <= sizeof(RT_PTR_ALLOC);
449#else
450 return false;
451#endif
452
453#else
454 return RT_GET_VALUE_SIZE(value_p) <= sizeof(RT_PTR_ALLOC);
455#endif
456}
457
458/*
459 * Return true if the child pointer contains the value, false
460 * if the child pointer is a leaf pointer.
461 */
462static inline bool
463RT_CHILDPTR_IS_VALUE(RT_PTR_ALLOC child)
464{
465#ifdef RT_VARLEN_VALUE_SIZE
466
467#ifdef RT_RUNTIME_EMBEDDABLE_VALUE
468 /* check for pointer tag */
469#ifdef RT_SHMEM
470 return child & 1;
471#else
472 return ((uintptr_t) child) & 1;
473#endif
474
475#else
476 return false;
477#endif
478
479#else
480 return sizeof(RT_VALUE_TYPE) <= sizeof(RT_PTR_ALLOC);
481#endif
482}
483
484/*
485 * Symbols for maximum possible fanout are declared first as they are
486 * required to declare each node kind. The declarations of the other fanout
487 * values follow, as they need the struct sizes of each node kind.
488 */
489
490/* max possible key chunks without struct padding */
491#define RT_FANOUT_4_MAX (8 - sizeof(RT_NODE))
492
493/* equal to two 128-bit SIMD registers, regardless of availability */
494#define RT_FANOUT_16_MAX 32
495
496/*
497 * This also determines the number of bits necessary for the isset array,
498 * so we need to be mindful of the size of bitmapword. Since bitmapword
499 * can be 64 bits, the only values that make sense here are 64 and 128.
500 * The ART paper uses at most 64 for this node kind, and one advantage
501 * for us is that "isset" is a single bitmapword on most platforms,
502 * rather than an array, allowing the compiler to get rid of loops.
503 */
504#define RT_FANOUT_48_MAX 64
505
506#define RT_FANOUT_256 RT_NODE_MAX_SLOTS
507
508/*
509 * Node structs, one for each "kind"
510 */
511
512/*
513 * node4 and node16 use one array for key chunks and another
514 * array of the same length for children. The keys and children
515 * are stored at corresponding positions, sorted by chunk.
516 */
517
518typedef struct RT_NODE_4
519{
520 RT_NODE base;
521
522 uint8 chunks[RT_FANOUT_4_MAX];
523
524 /* number of children depends on size class */
525 RT_PTR_ALLOC children[FLEXIBLE_ARRAY_MEMBER];
526} RT_NODE_4;
527
528typedef struct RT_NODE_16
529{
530 RT_NODE base;
531
532 uint8 chunks[RT_FANOUT_16_MAX];
533
534 /* number of children depends on size class */
535 RT_PTR_ALLOC children[FLEXIBLE_ARRAY_MEMBER];
536} RT_NODE_16;
537
538/*
539 * node48 uses a 256-element array indexed by key chunks. This array
540 * stores indexes into a second array containing the children.
541 */
542typedef struct RT_NODE_48
543{
544 RT_NODE base;
545
546 /* bitmap to track which slots are in use */
547 bitmapword isset[RT_BM_IDX(RT_FANOUT_48_MAX)];
548
549 /*
550 * Lookup table for indexes into the children[] array. We make this the
551 * last fixed-size member so that it's convenient to memset separately
552 * from the previous members.
553 */
554 uint8 slot_idxs[RT_NODE_MAX_SLOTS];
555
556/* Invalid index */
557#define RT_INVALID_SLOT_IDX 0xFF
558
559 /* number of children depends on size class */
560 RT_PTR_ALLOC children[FLEXIBLE_ARRAY_MEMBER];
561} RT_NODE_48;
562
563/*
564 * node256 is the largest node type. This node has an array of
565 * children directly indexed by chunk. Unlike other node kinds,
566 * its array size is by definition fixed.
567 */
568typedef struct RT_NODE_256
569{
570 RT_NODE base;
571
572 /* bitmap to track which slots are in use */
573 bitmapword isset[RT_BM_IDX(RT_FANOUT_256)];
574
575 /* slots for 256 children */
576 RT_PTR_ALLOC children[RT_FANOUT_256];
577} RT_NODE_256;
578
579#if defined(RT_SHMEM)
580/*
581 * Make sure that all nodes (except for node256) fit neatly into a DSA
582 * size class. We assume RT_FANOUT_4 is in the range where DSA size
583 * classes increment by 8 (as of PG17 up to 64 bytes), so we just hard
584 * code that one.
585 */
586
587#if SIZEOF_DSA_POINTER < 8
588#define RT_FANOUT_16_LO ((96 - offsetof(RT_NODE_16, children)) / sizeof(RT_PTR_ALLOC))
589#define RT_FANOUT_16_HI Min(RT_FANOUT_16_MAX, (160 - offsetof(RT_NODE_16, children)) / sizeof(RT_PTR_ALLOC))
590#define RT_FANOUT_48 Min(RT_FANOUT_48_MAX, (512 - offsetof(RT_NODE_48, children)) / sizeof(RT_PTR_ALLOC))
591#else
592#define RT_FANOUT_16_LO ((160 - offsetof(RT_NODE_16, children)) / sizeof(RT_PTR_ALLOC))
593#define RT_FANOUT_16_HI Min(RT_FANOUT_16_MAX, (320 - offsetof(RT_NODE_16, children)) / sizeof(RT_PTR_ALLOC))
594#define RT_FANOUT_48 Min(RT_FANOUT_48_MAX, (768 - offsetof(RT_NODE_48, children)) / sizeof(RT_PTR_ALLOC))
595#endif /* SIZEOF_DSA_POINTER < 8 */
596
597#else /* ! RT_SHMEM */
598
599/* doesn't really matter, but may as well use the namesake */
600#define RT_FANOUT_16_LO 16
601/* use maximum possible */
602#define RT_FANOUT_16_HI RT_FANOUT_16_MAX
603#define RT_FANOUT_48 RT_FANOUT_48_MAX
604
605#endif /* RT_SHMEM */
606
607/*
608 * To save memory in trees with sparse keys, it would make sense to have two
609 * size classes for the smallest kind (perhaps a high class of 5 and a low class
610 * of 2), but it would be more effective to utilize lazy expansion and
611 * path compression.
612 */
613#define RT_FANOUT_4 4
614
615StaticAssertDecl(RT_FANOUT_4 <= RT_FANOUT_4_MAX, "watch struct padding");
616StaticAssertDecl(RT_FANOUT_16_LO < RT_FANOUT_16_HI, "LO subclass bigger than HI");
617StaticAssertDecl(RT_FANOUT_48 <= RT_FANOUT_48_MAX, "more slots than isset bits");
618
619/*
620 * Node size classes
621 *
622 * Nodes of different kinds necessarily belong to different size classes.
623 * One innovation in our implementation compared to the ART paper is
624 * decoupling the notion of size class from kind.
625 *
626 * The size classes within a given node kind have the same underlying
627 * type, but a variable number of children/values. This is possible
628 * because each type (except node256) contains metadata that works the
629 * same way regardless of how many child slots there are. The nodes
630 * can introspect their allocated capacity at runtime.
631 */
632typedef enum RT_SIZE_CLASS
633{
634 RT_CLASS_4 = 0,
635 RT_CLASS_16_LO,
636 RT_CLASS_16_HI,
637 RT_CLASS_48,
638 RT_CLASS_256,
639} RT_SIZE_CLASS;
640
641/* Information for each size class */
642typedef struct RT_SIZE_CLASS_ELEM
643{
644 const char *name;
645 int fanout;
646 size_t allocsize;
647} RT_SIZE_CLASS_ELEM;
648
649
650static const RT_SIZE_CLASS_ELEM RT_SIZE_CLASS_INFO[] = {
651 [RT_CLASS_4] = {
652 .name = RT_STR(RT_PREFIX) "_radix_tree node4",
653 .fanout = RT_FANOUT_4,
654 .allocsize = sizeof(RT_NODE_4) + RT_FANOUT_4 * sizeof(RT_PTR_ALLOC),
655 },
656 [RT_CLASS_16_LO] = {
657 .name = RT_STR(RT_PREFIX) "_radix_tree node16_lo",
658 .fanout = RT_FANOUT_16_LO,
659 .allocsize = sizeof(RT_NODE_16) + RT_FANOUT_16_LO * sizeof(RT_PTR_ALLOC),
660 },
661 [RT_CLASS_16_HI] = {
662 .name = RT_STR(RT_PREFIX) "_radix_tree node16_hi",
663 .fanout = RT_FANOUT_16_HI,
664 .allocsize = sizeof(RT_NODE_16) + RT_FANOUT_16_HI * sizeof(RT_PTR_ALLOC),
665 },
666 [RT_CLASS_48] = {
667 .name = RT_STR(RT_PREFIX) "_radix_tree node48",
668 .fanout = RT_FANOUT_48,
669 .allocsize = sizeof(RT_NODE_48) + RT_FANOUT_48 * sizeof(RT_PTR_ALLOC),
670 },
671 [RT_CLASS_256] = {
672 .name = RT_STR(RT_PREFIX) "_radix_tree node256",
673 .fanout = RT_FANOUT_256,
674 .allocsize = sizeof(RT_NODE_256),
675 },
676};
677
678#define RT_NUM_SIZE_CLASSES lengthof(RT_SIZE_CLASS_INFO)
679
680#ifdef RT_SHMEM
681/* A magic value used to identify our radix tree */
682#define RT_RADIX_TREE_MAGIC 0x54A48167
683#endif
684
685/* Contains the actual tree, plus ancillary info */
686typedef struct RT_RADIX_TREE_CONTROL
687{
688#ifdef RT_SHMEM
689 RT_HANDLE handle;
690 uint32 magic;
691 LWLock lock;
692#endif
693
694 RT_PTR_ALLOC root;
695 uint64 max_val;
696 int64 num_keys;
697 int start_shift;
698
699 /* statistics */
700#ifdef RT_DEBUG
701 int64 num_nodes[RT_NUM_SIZE_CLASSES];
702 int64 num_leaves;
703#endif
704} RT_RADIX_TREE_CONTROL;
705
706/* Entry point for allocating and accessing the tree */
707struct RT_RADIX_TREE
708{
709 MemoryContext context;
710
711 /* pointing to either local memory or DSA */
712 RT_RADIX_TREE_CONTROL *ctl;
713
714#ifdef RT_SHMEM
715 dsa_area *dsa;
716#else
717 MemoryContextData *node_slabs[RT_NUM_SIZE_CLASSES];
718
719 /* leaf_context is used only for single-value leaves */
720 MemoryContextData *leaf_context;
721#endif
722 MemoryContextData *iter_context;
723};
724
725/*
726 * Iteration support.
727 *
728 * Iterating over the radix tree produces each key/value pair in ascending
729 * order of the key.
730 */
731
732/* state for iterating over a single node */
733typedef struct RT_NODE_ITER
734{
735 RT_CHILD_PTR node;
736
737 /*
738 * The next index of the chunk array in RT_NODE_KIND_4 and RT_NODE_KIND_16
739 * nodes, or the next chunk in RT_NODE_KIND_48 and RT_NODE_KIND_256 nodes.
740 * 0 for the initial value.
741 */
742 int idx;
743} RT_NODE_ITER;
744
745/* state for iterating over the whole radix tree */
746struct RT_ITER
747{
748 RT_RADIX_TREE *tree;
749
750 /*
751 * A stack to track iteration for each level. Level 0 is the lowest (or
752 * leaf) level
753 */
754 RT_NODE_ITER node_iters[RT_MAX_LEVEL];
755 int top_level;
756 int cur_level;
757
758 /* The key constructed during iteration */
759 uint64 key;
760};
761
762
763/* verification (available only in assert-enabled builds) */
764static void RT_VERIFY_NODE(RT_NODE * node);
765
766static inline void
767RT_PTR_SET_LOCAL(RT_RADIX_TREE * tree, RT_CHILD_PTR * node)
768{
769#ifdef RT_SHMEM
770 node->local = dsa_get_address(tree->dsa, node->alloc);
771#endif
772}
773
774/* Convenience functions for node48 and node256 */
775
776/* Return true if there is an entry for "chunk" */
777static inline bool
778RT_NODE_48_IS_CHUNK_USED(RT_NODE_48 * node, uint8 chunk)
779{
780 return node->slot_idxs[chunk] != RT_INVALID_SLOT_IDX;
781}
782
783static inline RT_PTR_ALLOC *
784RT_NODE_48_GET_CHILD(RT_NODE_48 * node, uint8 chunk)
785{
786 return &node->children[node->slot_idxs[chunk]];
787}
788
789/* Return true if there is an entry for "chunk" */
790static inline bool
791RT_NODE_256_IS_CHUNK_USED(RT_NODE_256 * node, uint8 chunk)
792{
793 int idx = RT_BM_IDX(chunk);
794 int bitnum = RT_BM_BIT(chunk);
795
796 return (node->isset[idx] & ((bitmapword) 1 << bitnum)) != 0;
797}
798
799static inline RT_PTR_ALLOC *
800RT_NODE_256_GET_CHILD(RT_NODE_256 * node, uint8 chunk)
801{
802 Assert(RT_NODE_256_IS_CHUNK_USED(node, chunk));
803 return &node->children[chunk];
804}
805
806/*
807 * Return the smallest shift needed to store the given key.
808 */
809static inline int
810RT_KEY_GET_SHIFT(uint64 key)
811{
812 if (key == 0)
813 return 0;
814 else
815 return pg_leftmost_one_pos64(key) / RT_SPAN * RT_SPAN;
816}
817
818/*
819 * Return the max value that can be stored in the tree with the given shift.
820 */
821static uint64
822RT_SHIFT_GET_MAX_VAL(int shift)
823{
824 if (shift == RT_MAX_SHIFT)
825 return UINT64_MAX;
826 else
827 return (UINT64CONST(1) << (shift + RT_SPAN)) - 1;
828}
829
830/*
831 * Allocate a new node with the given node kind and size class.
832 */
833static inline RT_CHILD_PTR
834RT_ALLOC_NODE(RT_RADIX_TREE * tree, const uint8 kind, const RT_SIZE_CLASS size_class)
835{
836 RT_CHILD_PTR allocnode;
837 RT_NODE *node;
838 size_t allocsize;
839
840 allocsize = RT_SIZE_CLASS_INFO[size_class].allocsize;
841
842#ifdef RT_SHMEM
843 allocnode.alloc = dsa_allocate(tree->dsa, allocsize);
844#else
845 allocnode.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->node_slabs[size_class],
846 allocsize);
847#endif
848
849 RT_PTR_SET_LOCAL(tree, &allocnode);
850 node = allocnode.local;
851
852 /* initialize contents */
853
854 switch (kind)
855 {
856 case RT_NODE_KIND_4:
857 memset(node, 0, offsetof(RT_NODE_4, children));
858 break;
859 case RT_NODE_KIND_16:
860 memset(node, 0, offsetof(RT_NODE_16, children));
861 break;
862 case RT_NODE_KIND_48:
863 {
864 RT_NODE_48 *n48 = (RT_NODE_48 *) node;
865
866 memset(n48, 0, offsetof(RT_NODE_48, slot_idxs));
867 memset(n48->slot_idxs, RT_INVALID_SLOT_IDX, sizeof(n48->slot_idxs));
868 break;
869 }
870 case RT_NODE_KIND_256:
871 memset(node, 0, offsetof(RT_NODE_256, children));
872 break;
873 default:
874 pg_unreachable();
875 }
876
877 node->kind = kind;
878
879 /*
880 * For node256, this will actually overflow to zero, but that's okay
881 * because that node doesn't need to introspect this value.
882 */
883 node->fanout = RT_SIZE_CLASS_INFO[size_class].fanout;
884
885#ifdef RT_DEBUG
886 /* update the statistics */
887 tree->ctl->num_nodes[size_class]++;
888#endif
889
890 return allocnode;
891}
892
893/*
894 * Allocate a new leaf.
895 */
896static RT_CHILD_PTR
897RT_ALLOC_LEAF(RT_RADIX_TREE * tree, size_t allocsize)
898{
899 RT_CHILD_PTR leaf;
900
901#ifdef RT_SHMEM
902 leaf.alloc = dsa_allocate(tree->dsa, allocsize);
903 RT_PTR_SET_LOCAL(tree, &leaf);
904#else
905 leaf.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->leaf_context, allocsize);
906#endif
907
908#ifdef RT_DEBUG
909 tree->ctl->num_leaves++;
910#endif
911
912 return leaf;
913}
914
915/*
916 * Copy relevant members of the node header.
917 * This is a separate function in case other fields are added.
918 */
919static inline void
920RT_COPY_COMMON(RT_CHILD_PTR newnode, RT_CHILD_PTR oldnode)
921{
922 (newnode.local)->count = (oldnode.local)->count;
923}
924
925/* Free the given node */
926static void
927RT_FREE_NODE(RT_RADIX_TREE * tree, RT_CHILD_PTR node)
928{
929#ifdef RT_DEBUG
930 int i;
931
932 /* update the statistics */
933
934 for (i = 0; i < RT_NUM_SIZE_CLASSES; i++)
935 {
936 if ((node.local)->fanout == RT_SIZE_CLASS_INFO[i].fanout)
937 break;
938 }
939
940 /*
941 * The fanout of node256 will appear to be zero within the node header
942 * because of overflow, so we need an extra check here.
943 */
944 if (i == RT_NUM_SIZE_CLASSES)
945 i = RT_CLASS_256;
946
947 tree->ctl->num_nodes[i]--;
948 Assert(tree->ctl->num_nodes[i] >= 0);
949#endif
950
951#ifdef RT_SHMEM
952 dsa_free(tree->dsa, node.alloc);
953#else
954 pfree(node.alloc);
955#endif
956}
957
958static inline void
959RT_FREE_LEAF(RT_RADIX_TREE * tree, RT_PTR_ALLOC leaf)
960{
961 Assert(leaf != tree->ctl->root);
962
963#ifdef RT_DEBUG
964 /* update the statistics */
965 tree->ctl->num_leaves--;
966 Assert(tree->ctl->num_leaves >= 0);
967#endif
968
969#ifdef RT_SHMEM
970 dsa_free(tree->dsa, leaf);
971#else
972 pfree(leaf);
973#endif
974}
975
976/***************** SEARCH *****************/
977
978/*
979 * Return the address of the child corresponding to "chunk",
980 * or NULL if there is no such element.
981 */
982static inline RT_PTR_ALLOC *
983RT_NODE_16_SEARCH_EQ(RT_NODE_16 * node, uint8 chunk)
984{
985 int count = node->base.count;
986#ifndef USE_NO_SIMD
987 Vector8 spread_chunk;
988 Vector8 haystack1;
989 Vector8 haystack2;
990 Vector8 cmp1;
991 Vector8 cmp2;
992 uint32 bitfield;
993 RT_PTR_ALLOC *slot_simd = NULL;
994#endif
995
996#if defined(USE_NO_SIMD) || defined(USE_ASSERT_CHECKING)
997 RT_PTR_ALLOC *slot = NULL;
998
999 for (int i = 0; i < count; i++)
1000 {
1001 if (node->chunks[i] == chunk)
1002 {
1003 slot = &node->children[i];
1004 break;
1005 }
1006 }
1007#endif
1008
1009#ifndef USE_NO_SIMD
1010 /* replicate the search key */
1011 spread_chunk = vector8_broadcast(chunk);
1012
1013 /* compare to all 32 keys stored in the node */
1014 vector8_load(&haystack1, &node->chunks[0]);
1015 vector8_load(&haystack2, &node->chunks[sizeof(Vector8)]);
1016 cmp1 = vector8_eq(spread_chunk, haystack1);
1017 cmp2 = vector8_eq(spread_chunk, haystack2);
1018
1019 /* convert comparison to a bitfield */
1020 bitfield = vector8_highbit_mask(cmp1) | (vector8_highbit_mask(cmp2) << sizeof(Vector8));
1021
1022 /* mask off invalid entries */
1023 bitfield &= ((UINT64CONST(1) << count) - 1);
1024
1025 /* convert bitfield to index by counting trailing zeros */
1026 if (bitfield)
1027 slot_simd = &node->children[pg_rightmost_one_pos32(bitfield)];
1028
1029 Assert(slot_simd == slot);
1030 return slot_simd;
1031#else
1032 return slot;
1033#endif
1034}
1035
1036/*
1037 * Search for the child pointer corresponding to "key" in the given node.
1038 *
1039 * Return child if the key is found, otherwise return NULL.
1040 */
1041static inline RT_PTR_ALLOC *
1042RT_NODE_SEARCH(RT_NODE * node, uint8 chunk)
1043{
1044 /* Make sure we already converted to local pointer */
1045 Assert(node != NULL);
1046
1047 switch (node->kind)
1048 {
1049 case RT_NODE_KIND_4:
1050 {
1051 RT_NODE_4 *n4 = (RT_NODE_4 *) node;
1052
1053 for (int i = 0; i < n4->base.count; i++)
1054 {
1055 if (n4->chunks[i] == chunk)
1056 return &n4->children[i];
1057 }
1058 return NULL;
1059 }
1060 case RT_NODE_KIND_16:
1061 return RT_NODE_16_SEARCH_EQ((RT_NODE_16 *) node, chunk);
1062 case RT_NODE_KIND_48:
1063 {
1064 RT_NODE_48 *n48 = (RT_NODE_48 *) node;
1065 int slotpos = n48->slot_idxs[chunk];
1066
1067 if (slotpos == RT_INVALID_SLOT_IDX)
1068 return NULL;
1069
1070 return RT_NODE_48_GET_CHILD(n48, chunk);
1071 }
1072 case RT_NODE_KIND_256:
1073 {
1074 RT_NODE_256 *n256 = (RT_NODE_256 *) node;
1075
1076 if (!RT_NODE_256_IS_CHUNK_USED(n256, chunk))
1077 return NULL;
1078
1079 return RT_NODE_256_GET_CHILD(n256, chunk);
1080 }
1081 default:
1082 pg_unreachable();
1083 }
1084}
1085
1086/*
1087 * Search the given key in the radix tree. Return the pointer to the value if found,
1088 * otherwise return NULL.
1089 *
1090 * Since the function returns a pointer (to support variable-length values),
1091 * the caller is responsible for locking until it's finished with the value.
1092 */
1093RT_SCOPE RT_VALUE_TYPE *
1094RT_FIND(RT_RADIX_TREE * tree, uint64 key)
1095{
1096 RT_CHILD_PTR node;
1097 RT_PTR_ALLOC *slot = NULL;
1098 int shift;
1099
1100#ifdef RT_SHMEM
1101 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1102#endif
1103
1104 if (key > tree->ctl->max_val)
1105 return NULL;
1106
1107 Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
1108 node.alloc = tree->ctl->root;
1109 shift = tree->ctl->start_shift;
1110
1111 /* Descend the tree */
1112 while (shift >= 0)
1113 {
1114 RT_PTR_SET_LOCAL(tree, &node);
1115 slot = RT_NODE_SEARCH(node.local, RT_GET_KEY_CHUNK(key, shift));
1116 if (slot == NULL)
1117 return NULL;
1118
1119 node.alloc = *slot;
1120 shift -= RT_SPAN;
1121 }
1122
1123 if (RT_CHILDPTR_IS_VALUE(*slot))
1124 return (RT_VALUE_TYPE *) slot;
1125 else
1126 {
1127 RT_PTR_SET_LOCAL(tree, &node);
1128 return (RT_VALUE_TYPE *) node.local;
1129 }
1130}
1131
1132/***************** INSERTION *****************/
1133
1134#define RT_NODE_MUST_GROW(node) \
1135 ((node)->count == (node)->fanout)
1136
1137/*
1138 * Return index of the chunk and slot arrays for inserting into the node,
1139 * such that the arrays remain ordered.
1140 */
1141static inline int
1142RT_NODE_4_GET_INSERTPOS(RT_NODE_4 * node, uint8 chunk, int count)
1143{
1144 int idx;
1145
1146 for (idx = 0; idx < count; idx++)
1147 {
1148 if (node->chunks[idx] >= chunk)
1149 break;
1150 }
1151
1152 return idx;
1153}
1154
1155/*
1156 * Return index of the chunk and slot arrays for inserting into the node,
1157 * such that the arrays remain ordered.
1158 */
1159static inline int
1160RT_NODE_16_GET_INSERTPOS(RT_NODE_16 * node, uint8 chunk)
1161{
1162 int count = node->base.count;
1163#if defined(USE_NO_SIMD) || defined(USE_ASSERT_CHECKING)
1164 int index;
1165#endif
1166
1167#ifndef USE_NO_SIMD
1168 Vector8 spread_chunk;
1169 Vector8 haystack1;
1170 Vector8 haystack2;
1171 Vector8 cmp1;
1172 Vector8 cmp2;
1173 Vector8 min1;
1174 Vector8 min2;
1175 uint32 bitfield;
1176 int index_simd;
1177#endif
1178
1179 /*
1180 * First compare the last element. There are two reasons to branch here:
1181 *
1182 * 1) A realistic pattern is inserting ordered keys. In that case,
1183 * non-SIMD platforms must do a linear search to the last chunk to find
1184 * the insert position. This will get slower as the node fills up.
1185 *
1186 * 2) On SIMD platforms, we must branch anyway to make sure we don't bit
1187 * scan an empty bitfield. Doing the branch here eliminates some work that
1188 * we might otherwise throw away.
1189 */
1190 Assert(count > 0);
1191 if (node->chunks[count - 1] < chunk)
1192 return count;
1193
1194#if defined(USE_NO_SIMD) || defined(USE_ASSERT_CHECKING)
1195
1196 for (index = 0; index < count; index++)
1197 {
1198 if (node->chunks[index] > chunk)
1199 break;
1200 }
1201#endif
1202
1203#ifndef USE_NO_SIMD
1204
1205 /*
1206 * This is a bit more complicated than RT_NODE_16_SEARCH_EQ(), because no
1207 * unsigned uint8 comparison instruction exists, at least for SSE2. So we
1208 * need to play some trickery using vector8_min() to effectively get >=.
1209 * There'll never be any equal elements in current uses, but that's what
1210 * we get here...
1211 */
1212 spread_chunk = vector8_broadcast(chunk);
1213 vector8_load(&haystack1, &node->chunks[0]);
1214 vector8_load(&haystack2, &node->chunks[sizeof(Vector8)]);
1215 min1 = vector8_min(spread_chunk, haystack1);
1216 min2 = vector8_min(spread_chunk, haystack2);
1217 cmp1 = vector8_eq(spread_chunk, min1);
1218 cmp2 = vector8_eq(spread_chunk, min2);
1219 bitfield = vector8_highbit_mask(cmp1) | (vector8_highbit_mask(cmp2) << sizeof(Vector8));
1220
1221 Assert((bitfield & ((UINT64CONST(1) << count) - 1)) != 0);
1222 index_simd = pg_rightmost_one_pos32(bitfield);
1223
1224 Assert(index_simd == index);
1225 return index_simd;
1226#else
1227 return index;
1228#endif
1229}
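The comment above notes that SSE2 lacks an unsigned byte compare-greater-or-equal, so the code builds one from `vector8_min()`: `min(a, b) == a` holds exactly when `a <= b`. A standalone scalar sketch of that identity and the resulting insert-position computation (not part of radixtree.h; the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* The min trick: min(a, b) == a is exactly the predicate a <= b, which
 * matters for unsigned bytes where a signed compare would misorder values
 * above 127. */
static int
byte_le(uint8_t a, uint8_t b)
{
	uint8_t		m = (a < b) ? a : b;	/* plays the role of vector8_min() */

	return m == a;						/* vector8_eq(): true iff a <= b */
}

/* Insert position: first index whose stored chunk is >= the new chunk. */
static int
node16_get_insertpos(const uint8_t *chunks, int count, uint8_t chunk)
{
	uint32_t	bitfield = 0;

	for (int i = 0; i < count; i++)
		if (byte_le(chunk, chunks[i]))
			bitfield |= (UINT32_C(1) << i);

	return bitfield ? __builtin_ctz(bitfield) : count;
}
```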
1230
1231/* Shift the elements right at "insertpos" by one */
1232static inline void
1233RT_SHIFT_ARRAYS_FOR_INSERT(uint8 *chunks, RT_PTR_ALLOC * children, int count, int insertpos)
1234{
1235 /*
1236 * This is basically a memmove, but written in a simple loop for speed on
1237 * small inputs.
1238 */
1239 for (int i = count - 1; i >= insertpos; i--)
1240 {
1241 /* workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101481 */
1242#ifdef __GNUC__
1243 __asm__("");
1244#endif
1245 chunks[i + 1] = chunks[i];
1246 children[i + 1] = children[i];
1247 }
1248}
1249
1250/*
1251 * Copy both chunk and slot arrays into the right
1252 * place. The caller is responsible for inserting the new element.
1253 */
1254static inline void
1255RT_COPY_ARRAYS_FOR_INSERT(uint8 *dst_chunks, RT_PTR_ALLOC * dst_children,
1256 uint8 *src_chunks, RT_PTR_ALLOC * src_children,
1257 int count, int insertpos)
1258{
1259 for (int i = 0; i < count; i++)
1260 {
1261 int sourceidx = i;
1262
1263 /* use a branch-free computation to skip the index of the new element */
1264 int destidx = i + (i >= insertpos);
1265
1266 dst_chunks[destidx] = src_chunks[sourceidx];
1267 dst_children[destidx] = src_children[sourceidx];
1268 }
1269}
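The branch-free index computation above writes elements before the insert position to the same index and later elements one slot to the right, leaving the insert position free. A standalone sketch (not part of radixtree.h; the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Branch-free copy-with-gap: destidx = i + (i >= insertpos) skips exactly
 * one destination slot, so dst[insertpos] is left for the new element and
 * the compiler need not emit a per-element branch.
 */
static void
copy_arrays_for_insert(uint8_t *dst, const uint8_t *src, int count, int insertpos)
{
	for (int i = 0; i < count; i++)
		dst[i + (i >= insertpos)] = src[i];
}
```

As in the real code, the caller then stores the new element at `insertpos`.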
1270
1271static inline RT_PTR_ALLOC *
1272RT_ADD_CHILD_256(RT_RADIX_TREE * tree, RT_CHILD_PTR node, uint8 chunk)
1273{
1274 RT_NODE_256 *n256 = (RT_NODE_256 *) node.local;
1275 int idx = RT_BM_IDX(chunk);
1276 int bitnum = RT_BM_BIT(chunk);
1277
1278 /* Mark the slot used for "chunk" */
1279 n256->isset[idx] |= ((bitmapword) 1 << bitnum);
1280
1281 n256->base.count++;
1282 RT_VERIFY_NODE((RT_NODE *) n256);
1283
1284 return RT_NODE_256_GET_CHILD(n256, chunk);
1285}
1286
1287static pg_noinline RT_PTR_ALLOC *
1288RT_GROW_NODE_48(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node,
1289 uint8 chunk)
1290{
1291 RT_NODE_48 *n48 = (RT_NODE_48 *) node.local;
1292 RT_CHILD_PTR newnode;
1293 RT_NODE_256 *new256;
1294 int i = 0;
1295
1296 /* initialize new node */
1297 newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_256, RT_CLASS_256);
1298 new256 = (RT_NODE_256 *) newnode.local;
1299
1300 /* copy over the entries */
1301 RT_COPY_COMMON(newnode, node);
1302 for (int word_num = 0; word_num < RT_BM_IDX(RT_NODE_MAX_SLOTS); word_num++)
1303 {
1304 bitmapword bitmap = 0;
1305
1306 /*
1307 * Bit manipulation is a surprisingly large portion of the overhead in
1308 * the naive implementation. Doing stores word-at-a-time removes a lot
1309 * of that overhead.
1310 */
1311 for (int bit = 0; bit < BITS_PER_BITMAPWORD; bit++)
1312 {
1313 uint8 offset = n48->slot_idxs[i];
1314
1315 if (offset != RT_INVALID_SLOT_IDX)
1316 {
1317 bitmap |= ((bitmapword) 1 << bit);
1318 new256->children[i] = n48->children[offset];
1319 }
1320
1321 i++;
1322 }
1323
1324 new256->isset[word_num] = bitmap;
1325 }
1326
1327 /* free old node and update reference in parent */
1328 *parent_slot = newnode.alloc;
1329 RT_FREE_NODE(tree, node);
1330
1331 return RT_ADD_CHILD_256(tree, newnode, chunk);
1332}
1333
1334static inline RT_PTR_ALLOC *
1335RT_ADD_CHILD_48(RT_RADIX_TREE * tree, RT_CHILD_PTR node, uint8 chunk)
1336{
1337 RT_NODE_48 *n48 = (RT_NODE_48 *) node.local;
1338 int insertpos;
1339 int idx = 0;
1340 bitmapword w,
1341 inverse;
1342
1343 /* get the first word with at least one bit not set */
1344 for (int i = 0; i < RT_BM_IDX(RT_FANOUT_48_MAX); i++)
1345 {
1346 w = n48->isset[i];
1347 if (w < ~((bitmapword) 0))
1348 {
1349 idx = i;
1350 break;
1351 }
1352 }
1353
1354 /* To get the first unset bit in w, get the first set bit in ~w */
1355 inverse = ~w;
1356 insertpos = idx * BITS_PER_BITMAPWORD;
1357 insertpos += bmw_rightmost_one_pos(inverse);
1358 Assert(insertpos < n48->base.fanout);
1359
1360 /* mark the slot used by setting the rightmost zero bit */
1361 n48->isset[idx] |= w + 1;
1362
1363 /* insert new chunk into place */
1364 n48->slot_idxs[chunk] = insertpos;
1365
1366 n48->base.count++;
1367 RT_VERIFY_NODE((RT_NODE *) n48);
1368
1369 return &n48->children[insertpos];
1370}
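The slot allocation above packs two bit tricks: the first free slot in word `w` is its rightmost zero bit, found as the rightmost one bit of `~w`, and `w |= w + 1` sets exactly that bit, because adding 1 carries through the trailing run of ones and stops at the first zero. A standalone sketch (not part of radixtree.h; the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Claim the first free (zero) bit in *w and return its position.
 * Precondition, as in the real code: *w has at least one zero bit,
 * otherwise ~*w == 0 and the bit scan is undefined.
 */
static int
claim_first_free_slot(uint64_t *w)
{
	int			pos = __builtin_ctzll(~*w);	/* bmw_rightmost_one_pos(~w) */

	*w |= *w + 1;				/* set the rightmost zero bit: slot used */
	return pos;
}
```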
1371
1372static pg_noinline RT_PTR_ALLOC *
1373RT_GROW_NODE_16(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node,
1374 uint8 chunk)
1375{
1376 RT_NODE_16 *n16 = (RT_NODE_16 *) node.local;
1377 int insertpos;
1378
1379 if (n16->base.fanout < RT_FANOUT_16_HI)
1380 {
1381 RT_CHILD_PTR newnode;
1382 RT_NODE_16 *new16;
1383
1384 Assert(n16->base.fanout == RT_FANOUT_16_LO);
1385
1386 /* initialize new node */
1387 newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_16, RT_CLASS_16_HI);
1388 new16 = (RT_NODE_16 *) newnode.local;
1389
1390 /* copy over existing entries */
1391 RT_COPY_COMMON(newnode, node);
1392 Assert(n16->base.count == RT_FANOUT_16_LO);
1393 insertpos = RT_NODE_16_GET_INSERTPOS(n16, chunk);
1394 RT_COPY_ARRAYS_FOR_INSERT(new16->chunks, new16->children,
1395 n16->chunks, n16->children,
1396 RT_FANOUT_16_LO, insertpos);
1397
1398 /* insert new chunk into place */
1399 new16->chunks[insertpos] = chunk;
1400
1401 new16->base.count++;
1402 RT_VERIFY_NODE((RT_NODE *) new16);
1403
1404 /* free old node and update references */
1405 RT_FREE_NODE(tree, node);
1406 *parent_slot = newnode.alloc;
1407
1408 return &new16->children[insertpos];
1409 }
1410 else
1411 {
1412 RT_CHILD_PTR newnode;
1413 RT_NODE_48 *new48;
1414 int idx,
1415 bit;
1416
1417 Assert(n16->base.fanout == RT_FANOUT_16_HI);
1418
1419 /* initialize new node */
1420 newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_48, RT_CLASS_48);
1421 new48 = (RT_NODE_48 *) newnode.local;
1422
1423 /* copy over the entries */
1424 RT_COPY_COMMON(newnode, node);
1425 for (int i = 0; i < RT_FANOUT_16_HI; i++)
1426 new48->slot_idxs[n16->chunks[i]] = i;
1427 memcpy(&new48->children[0], &n16->children[0], RT_FANOUT_16_HI * sizeof(new48->children[0]));
1428
1429 /*
1430 * Since we just copied a dense array, we can fill "isset" using a
1431 * single store, provided the length of that array is at most the
1432 * number of bits in a bitmapword.
1433 */
1435 new48->isset[0] = (bitmapword) (((uint64) 1 << RT_FANOUT_16_HI) - 1);
1436
1437 /* put the new child at the end of the copied entries */
1438 insertpos = RT_FANOUT_16_HI;
1439 idx = RT_BM_IDX(insertpos);
1440 bit = RT_BM_BIT(insertpos);
1441
1442 /* mark the slot used */
1443 new48->isset[idx] |= ((bitmapword) 1 << bit);
1444
1445 /* insert new chunk into place */
1446 new48->slot_idxs[chunk] = insertpos;
1447
1448 new48->base.count++;
1449 RT_VERIFY_NODE((RT_NODE *) new48);
1450
1451 /* free old node and update reference in parent */
1452 *parent_slot = newnode.alloc;
1453 RT_FREE_NODE(tree, node);
1454
1455 return &new48->children[insertpos];
1456 }
1457}
1458
1459static inline RT_PTR_ALLOC *
1460RT_ADD_CHILD_16(RT_RADIX_TREE * tree, RT_CHILD_PTR node, uint8 chunk)
1461{
1462 RT_NODE_16 *n16 = (RT_NODE_16 *) node.local;
1463 int insertpos = RT_NODE_16_GET_INSERTPOS(n16, chunk);
1464
1465 /* shift chunks and children */
1466 RT_SHIFT_ARRAYS_FOR_INSERT(n16->chunks, n16->children,
1467 n16->base.count, insertpos);
1468
1469 /* insert new chunk into place */
1470 n16->chunks[insertpos] = chunk;
1471
1472 n16->base.count++;
1473 RT_VERIFY_NODE((RT_NODE *) n16);
1474
1475 return &n16->children[insertpos];
1476}
1477
1478static pg_noinline RT_PTR_ALLOC *
1479RT_GROW_NODE_4(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node,
1480 uint8 chunk)
1481{
1482 RT_NODE_4 *n4 = (RT_NODE_4 *) (node.local);
1483 RT_CHILD_PTR newnode;
1484 RT_NODE_16 *new16;
1485 int insertpos;
1486
1487 /* initialize new node */
1488 newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_16, RT_CLASS_16_LO);
1489 new16 = (RT_NODE_16 *) newnode.local;
1490
1491 /* copy over existing entries */
1492 RT_COPY_COMMON(newnode, node);
1493 Assert(n4->base.count == RT_FANOUT_4);
1494 insertpos = RT_NODE_4_GET_INSERTPOS(n4, chunk, RT_FANOUT_4);
1495 RT_COPY_ARRAYS_FOR_INSERT(new16->chunks, new16->children,
1496 n4->chunks, n4->children,
1497 RT_FANOUT_4, insertpos);
1498
1499 /* insert new chunk into place */
1500 new16->chunks[insertpos] = chunk;
1501
1502 new16->base.count++;
1503 RT_VERIFY_NODE((RT_NODE *) new16);
1504
1505 /* free old node and update reference in parent */
1506 *parent_slot = newnode.alloc;
1507 RT_FREE_NODE(tree, node);
1508
1509 return &new16->children[insertpos];
1510}
1511
1512static inline RT_PTR_ALLOC *
1513RT_ADD_CHILD_4(RT_RADIX_TREE * tree, RT_CHILD_PTR node, uint8 chunk)
1514{
1515 RT_NODE_4 *n4 = (RT_NODE_4 *) (node.local);
1516 int count = n4->base.count;
1517 int insertpos = RT_NODE_4_GET_INSERTPOS(n4, chunk, count);
1518
1519 /* shift chunks and children */
1520 RT_SHIFT_ARRAYS_FOR_INSERT(n4->chunks, n4->children,
1521 count, insertpos);
1522
1523 /* insert new chunk into place */
1524 n4->chunks[insertpos] = chunk;
1525
1526 n4->base.count++;
1527 RT_VERIFY_NODE((RT_NODE *) n4);
1528
1529 return &n4->children[insertpos];
1530}
1531
1532/*
1533 * Reserve slot in "node"'s child array. The caller will populate it
1534 * with the actual child pointer.
1535 *
1536 * "parent_slot" is the address of the parent's child pointer to "node".
1537 * If the node we're inserting into needs to grow, we update the parent's
1538 * child pointer with the pointer to the new larger node.
1539 */
1540static inline RT_PTR_ALLOC *
1541RT_NODE_INSERT(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node,
1542 uint8 chunk)
1543{
1544 RT_NODE *n = node.local;
1545
1546 switch (n->kind)
1547 {
1548 case RT_NODE_KIND_4:
1549 {
1550 if (unlikely(RT_NODE_MUST_GROW(n)))
1551 return RT_GROW_NODE_4(tree, parent_slot, node, chunk);
1552
1553 return RT_ADD_CHILD_4(tree, node, chunk);
1554 }
1555 case RT_NODE_KIND_16:
1556 {
1557 if (unlikely(RT_NODE_MUST_GROW(n)))
1558 return RT_GROW_NODE_16(tree, parent_slot, node, chunk);
1559
1560 return RT_ADD_CHILD_16(tree, node, chunk);
1561 }
1562 case RT_NODE_KIND_48:
1563 {
1564 if (unlikely(RT_NODE_MUST_GROW(n)))
1565 return RT_GROW_NODE_48(tree, parent_slot, node, chunk);
1566
1567 return RT_ADD_CHILD_48(tree, node, chunk);
1568 }
1569 case RT_NODE_KIND_256:
1570 return RT_ADD_CHILD_256(tree, node, chunk);
1571 default:
1572 pg_unreachable();
1573 }
1574}
1575
1576/*
1577 * The radix tree doesn't have sufficient height. Put new node(s) on top,
1578 * and move the old node below it.
1579 */
1580static pg_noinline void
1581RT_EXTEND_UP(RT_RADIX_TREE * tree, uint64 key)
1582{
1583 int target_shift = RT_KEY_GET_SHIFT(key);
1584 int shift = tree->ctl->start_shift;
1585
1586 Assert(shift < target_shift);
1587
1588 /* Grow tree upwards until start shift can accommodate the key */
1589 while (shift < target_shift)
1590 {
1591 RT_CHILD_PTR node;
1592 RT_NODE_4 *n4;
1593
1595 n4 = (RT_NODE_4 *) node.local;
1596 n4->base.count = 1;
1597 n4->chunks[0] = 0;
1598 n4->children[0] = tree->ctl->root;
1599
1600 /* Update the root */
1601 tree->ctl->root = node.alloc;
1602
1603 shift += RT_SPAN;
1604 }
1605
1606 tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(target_shift);
1607 tree->ctl->start_shift = target_shift;
1608}
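RT_EXTEND_UP above relies on two pieces of key arithmetic: extracting one radix digit of the key at a given shift, and finding the lowest shift (a multiple of the span) at which the key's remaining high bits fit in a single chunk. A standalone sketch under the assumption of an 8-bit span (not part of radixtree.h; the real RT_KEY_GET_SHIFT may use a find-last-set instruction rather than a loop):

```c
#include <assert.h>
#include <stdint.h>

#define SPAN		8					/* bits of key consumed per level */
#define CHUNK_MASK	((1 << SPAN) - 1)

/* analogue of RT_GET_KEY_CHUNK: one radix digit of the key */
static uint8_t
get_key_chunk(uint64_t key, int shift)
{
	return (uint8_t) ((key >> shift) & CHUNK_MASK);
}

/* smallest multiple of SPAN such that key >> shift fits in one chunk */
static int
key_get_shift(uint64_t key)
{
	int			shift = 0;

	while ((key >> shift) > CHUNK_MASK)
		shift += SPAN;
	return shift;
}
```

A key of 0x100 needs one extra level above the bottom one, so its start shift is 8; growing the tree adds a node4 per extra SPAN, as in the loop above.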
1609
1610/*
1611 * Insert a chain of nodes until we reach the lowest level,
1612 * and return the address of a slot to be filled further up
1613 * the call stack.
1614 */
1615static RT_PTR_ALLOC *
1616RT_EXTEND_DOWN(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, uint64 key, int shift)
1617{
1618 RT_CHILD_PTR node,
1619 child;
1620 RT_NODE_4 *n4;
1621
1622 /*
1623 * The child pointer of the first node in the chain goes in the
1624 * caller-provided slot.
1625 */
1626 child = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
1627 *parent_slot = child.alloc;
1628
1629 node = child;
1630 shift -= RT_SPAN;
1631
1632 while (shift > 0)
1633 {
1634 child = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
1635
1636 /* We open-code the insertion ourselves, for speed. */
1637 n4 = (RT_NODE_4 *) node.local;
1638 n4->base.count = 1;
1639 n4->chunks[0] = RT_GET_KEY_CHUNK(key, shift);
1640 n4->children[0] = child.alloc;
1641
1642 node = child;
1643 shift -= RT_SPAN;
1644 }
1645 Assert(shift == 0);
1646
1647 /* Reserve slot for the value. */
1648 n4 = (RT_NODE_4 *) node.local;
1649 n4->chunks[0] = RT_GET_KEY_CHUNK(key, 0);
1650 n4->base.count = 1;
1651
1652 return &n4->children[0];
1653}
1654
1655/*
1656 * Workhorse for RT_SET
1657 *
1658 * "parent_slot" is the address of the child pointer we just followed,
1659 * in the parent's array of children, needed if inserting into the
1660 * current node causes it to grow.
1661 */
1662static RT_PTR_ALLOC *
1663RT_GET_SLOT_RECURSIVE(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, uint64 key, int shift, bool *found)
1664{
1665 RT_PTR_ALLOC *slot;
1666 RT_CHILD_PTR node;
1667 uint8 chunk = RT_GET_KEY_CHUNK(key, shift);
1668
1669 node.alloc = *parent_slot;
1670 RT_PTR_SET_LOCAL(tree, &node);
1671 slot = RT_NODE_SEARCH(node.local, chunk);
1672
1673 if (slot == NULL)
1674 {
1675 *found = false;
1676
1677 /* reserve slot for the caller to populate */
1678
1679 slot = RT_NODE_INSERT(tree, parent_slot, node, chunk);
1680
1681 if (shift == 0)
1682 return slot;
1683 else
1684 return RT_EXTEND_DOWN(tree, slot, key, shift);
1685 }
1686 else
1687 {
1688 if (shift == 0)
1689 {
1690 *found = true;
1691 return slot;
1692 }
1693 else
1694 return RT_GET_SLOT_RECURSIVE(tree, slot, key, shift - RT_SPAN, found);
1695 }
1696}
1697
1698/*
1699 * Set key to value that value_p points to. If the entry already exists, we
1700 * update its value and return true. Returns false if entry doesn't yet exist.
1701 *
1702 * Taking a lock in exclusive mode is the caller's responsibility.
1703 */
1704RT_SCOPE bool
1705RT_SET(RT_RADIX_TREE * tree, uint64 key, RT_VALUE_TYPE * value_p)
1706{
1707 bool found;
1708 RT_PTR_ALLOC *slot;
1709 size_t value_sz = RT_GET_VALUE_SIZE(value_p);
1710
1711#ifdef RT_SHMEM
1712 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1713#endif
1714
1715 Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
1716
1717 /* Extend the tree if necessary */
1718 if (unlikely(key > tree->ctl->max_val))
1719 {
1720 if (tree->ctl->num_keys == 0)
1721 {
1722 RT_CHILD_PTR node;
1723 RT_NODE_4 *n4;
1724 int start_shift = RT_KEY_GET_SHIFT(key);
1725
1726 /*
1727 * With an empty root node, we don't extend the tree upwards,
1728 * since that would result in orphan empty nodes. Instead we open
1729 * code inserting into the root node and extend downward from
1730 * there.
1731 */
1732 node.alloc = tree->ctl->root;
1733 RT_PTR_SET_LOCAL(tree, &node);
1734 n4 = (RT_NODE_4 *) node.local;
1735 n4->base.count = 1;
1736 n4->chunks[0] = RT_GET_KEY_CHUNK(key, start_shift);
1737
1738 slot = RT_EXTEND_DOWN(tree, &n4->children[0], key, start_shift);
1739 found = false;
1740 tree->ctl->start_shift = start_shift;
1741 tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(start_shift);
1742 goto have_slot;
1743 }
1744 else
1745 RT_EXTEND_UP(tree, key);
1746 }
1747
1748 slot = RT_GET_SLOT_RECURSIVE(tree, &tree->ctl->root,
1749 key, tree->ctl->start_shift, &found);
1750
1751have_slot:
1752 Assert(slot != NULL);
1753
1754 if (RT_VALUE_IS_EMBEDDABLE(value_p))
1755 {
1756 /* free the existing leaf */
1757 if (found && !RT_CHILDPTR_IS_VALUE(*slot))
1758 RT_FREE_LEAF(tree, *slot);
1759
1760 /* store value directly in child pointer slot */
1761 memcpy(slot, value_p, value_sz);
1762
1763#ifdef RT_RUNTIME_EMBEDDABLE_VALUE
1764 /* tag child pointer */
1765#ifdef RT_SHMEM
1766 *slot |= 1;
1767#else
1768 *((uintptr_t *) slot) |= 1;
1769#endif
1770#endif
1771 }
1772 else
1773 {
1774 RT_CHILD_PTR leaf;
1775
1776 if (found && !RT_CHILDPTR_IS_VALUE(*slot))
1777 {
1778 Assert(RT_PTR_ALLOC_IS_VALID(*slot));
1779 leaf.alloc = *slot;
1780 RT_PTR_SET_LOCAL(tree, &leaf);
1781
1782 if (RT_GET_VALUE_SIZE((RT_VALUE_TYPE *) leaf.local) != value_sz)
1783 {
1784 /*
1785 * different sizes, so first free the existing leaf before
1786 * allocating a new one
1787 */
1788 RT_FREE_LEAF(tree, *slot);
1789 leaf = RT_ALLOC_LEAF(tree, value_sz);
1790 *slot = leaf.alloc;
1791 }
1792 }
1793 else
1794 {
1795 /* allocate new leaf and store it in the child array */
1796 leaf = RT_ALLOC_LEAF(tree, value_sz);
1797 *slot = leaf.alloc;
1798 }
1799
1800 memcpy(leaf.local, value_p, value_sz);
1801 }
1802
1803 /* Update the statistics */
1804 if (!found)
1805 tree->ctl->num_keys++;
1806
1807 return found;
1808}
1809
1810/***************** SETUP / TEARDOWN *****************/
1811
1812/*
1813 * Create the radix tree in the given memory context and return it.
1814 *
1815 * All local memory required for a radix tree is allocated in the given
1816 * memory context and its children. Note that RT_FREE() will delete all
1817 * allocated space within the given memory context, so the dsa_area should
1818 * be created in a different context.
1819 */
1820RT_SCOPE RT_RADIX_TREE *
1821#ifdef RT_SHMEM
1822RT_CREATE(MemoryContext ctx, dsa_area *dsa, int tranche_id)
1823#else
1824RT_CREATE(MemoryContext ctx)
1825#endif
1826{
1827 RT_RADIX_TREE *tree;
1828 RT_CHILD_PTR rootnode;
1830#ifdef RT_SHMEM
1831 dsa_pointer dp;
1832#endif
1833
1834 tree = (RT_RADIX_TREE *) palloc0(sizeof(RT_RADIX_TREE));
1835
1837 tree->context = ctx;
1838
1839 /*
1840 * Separate context for iteration in case the tree context doesn't support
1841 * pfree
1842 */
1843 tree->iter_context = AllocSetContextCreate(ctx,
1844 RT_STR(RT_PREFIX) "_radix_tree iter context",
1845 ALLOCSET_SMALL_SIZES);
1846
1847#ifdef RT_SHMEM
1848 tree->dsa = dsa;
1849 dp = dsa_allocate0(dsa, sizeof(RT_RADIX_TREE_CONTROL));
1850 tree->ctl = (RT_RADIX_TREE_CONTROL *) dsa_get_address(dsa, dp);
1851 tree->ctl->handle = dp;
1852 tree->ctl->magic = RT_RADIX_TREE_MAGIC;
1853 LWLockInitialize(&tree->ctl->lock, tranche_id);
1854#else
1855 tree->ctl = (RT_RADIX_TREE_CONTROL *) palloc0(sizeof(RT_RADIX_TREE_CONTROL));
1856
1857 /* Create a slab context for each size class */
1858 for (int i = 0; i < RT_NUM_SIZE_CLASSES; i++)
1859 {
1860 RT_SIZE_CLASS_ELEM size_class = RT_SIZE_CLASS_INFO[i];
1861 size_t inner_blocksize = RT_SLAB_BLOCK_SIZE(size_class.allocsize);
1862
1863 tree->node_slabs[i] = SlabContextCreate(ctx,
1864 size_class.name,
1865 inner_blocksize,
1866 size_class.allocsize);
1867 }
1868
1869 /* By default we use the passed context for leaves. */
1870 tree->leaf_context = tree->context;
1871
1872#ifndef RT_VARLEN_VALUE_SIZE
1873
1874 /*
1875 * For leaves storing fixed-length values, we use a slab context to avoid
1876 * the possibility of space wastage by power-of-2 rounding up.
1877 */
1878 if (sizeof(RT_VALUE_TYPE) > sizeof(RT_PTR_ALLOC))
1879 tree->leaf_context = SlabContextCreate(ctx,
1880 RT_STR(RT_PREFIX) "_radix_tree leaf context",
1881 RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
1882 sizeof(RT_VALUE_TYPE));
1883#endif /* !RT_VARLEN_VALUE_SIZE */
1884#endif /* RT_SHMEM */
1885
1886 /* add root node now so that RT_SET can assume it exists */
1887 rootnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
1888 tree->ctl->root = rootnode.alloc;
1889 tree->ctl->start_shift = 0;
1890 tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(0);
1891
1893
1894 return tree;
1895}
1896
1897#ifdef RT_SHMEM
1898RT_SCOPE RT_RADIX_TREE *
1899RT_ATTACH(dsa_area *dsa, RT_HANDLE handle)
1900{
1901 RT_RADIX_TREE *tree;
1902 dsa_pointer control;
1903
1904 tree = (RT_RADIX_TREE *) palloc0(sizeof(RT_RADIX_TREE));
1905
1906 /* Find the control object in shared memory */
1907 control = handle;
1908
1909 tree->dsa = dsa;
1910 tree->ctl = (RT_RADIX_TREE_CONTROL *) dsa_get_address(dsa, control);
1911 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1912
1913 return tree;
1914}
1915
1916RT_SCOPE void
1917RT_DETACH(RT_RADIX_TREE * tree)
1918{
1919 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1920 pfree(tree);
1921}
1922
1923RT_SCOPE RT_HANDLE
1924RT_GET_HANDLE(RT_RADIX_TREE * tree)
1925{
1926 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1927 return tree->ctl->handle;
1928}
1929
1930RT_SCOPE void
1931RT_LOCK_EXCLUSIVE(RT_RADIX_TREE * tree)
1932{
1933 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1934 LWLockAcquire(&tree->ctl->lock, LW_EXCLUSIVE);
1935}
1936
1937RT_SCOPE void
1938RT_LOCK_SHARE(RT_RADIX_TREE * tree)
1939{
1940 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1941 LWLockAcquire(&tree->ctl->lock, LW_SHARED);
1942}
1943
1944RT_SCOPE void
1945RT_UNLOCK(RT_RADIX_TREE * tree)
1946{
1947 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1948 LWLockRelease(&tree->ctl->lock);
1949}
1950
1951/*
1952 * Recursively free all nodes allocated in the DSA area.
1953 */
1954static void
1955RT_FREE_RECURSE(RT_RADIX_TREE * tree, RT_PTR_ALLOC ptr, int shift)
1956{
1957 RT_CHILD_PTR node;
1958
1959 check_stack_depth();
1960
1961 node.alloc = ptr;
1962 RT_PTR_SET_LOCAL(tree, &node);
1963
1964 switch (node.local->kind)
1965 {
1966 case RT_NODE_KIND_4:
1967 {
1968 RT_NODE_4 *n4 = (RT_NODE_4 *) node.local;
1969
1970 for (int i = 0; i < n4->base.count; i++)
1971 {
1972 RT_PTR_ALLOC child = n4->children[i];
1973
1974 if (shift > 0)
1975 RT_FREE_RECURSE(tree, child, shift - RT_SPAN);
1976 else if (!RT_CHILDPTR_IS_VALUE(child))
1977 dsa_free(tree->dsa, child);
1978 }
1979
1980 break;
1981 }
1982 case RT_NODE_KIND_16:
1983 {
1984 RT_NODE_16 *n16 = (RT_NODE_16 *) node.local;
1985
1986 for (int i = 0; i < n16->base.count; i++)
1987 {
1988 RT_PTR_ALLOC child = n16->children[i];
1989
1990 if (shift > 0)
1991 RT_FREE_RECURSE(tree, child, shift - RT_SPAN);
1992 else if (!RT_CHILDPTR_IS_VALUE(child))
1993 dsa_free(tree->dsa, child);
1994 }
1995
1996 break;
1997 }
1998 case RT_NODE_KIND_48:
1999 {
2000 RT_NODE_48 *n48 = (RT_NODE_48 *) node.local;
2001
2002 for (int i = 0; i < RT_NODE_MAX_SLOTS; i++)
2003 {
2004 RT_PTR_ALLOC child;
2005
2006 if (!RT_NODE_48_IS_CHUNK_USED(n48, i))
2007 continue;
2008
2009 child = *RT_NODE_48_GET_CHILD(n48, i);
2010
2011 if (shift > 0)
2012 RT_FREE_RECURSE(tree, child, shift - RT_SPAN);
2013 else if (!RT_CHILDPTR_IS_VALUE(child))
2014 dsa_free(tree->dsa, child);
2015 }
2016
2017 break;
2018 }
2019 case RT_NODE_KIND_256:
2020 {
2021 RT_NODE_256 *n256 = (RT_NODE_256 *) node.local;
2022
2023 for (int i = 0; i < RT_NODE_MAX_SLOTS; i++)
2024 {
2025 RT_PTR_ALLOC child;
2026
2027 if (!RT_NODE_256_IS_CHUNK_USED(n256, i))
2028 continue;
2029
2030 child = *RT_NODE_256_GET_CHILD(n256, i);
2031
2032 if (shift > 0)
2033 RT_FREE_RECURSE(tree, child, shift - RT_SPAN);
2034 else if (!RT_CHILDPTR_IS_VALUE(child))
2035 dsa_free(tree->dsa, child);
2036 }
2037
2038 break;
2039 }
2040 }
2041
2042 /* Free the inner node */
2043 dsa_free(tree->dsa, ptr);
2044}
2045#endif
2046
2047/*
2048 * Free the radix tree, including all nodes and leaves.
2049 */
2050RT_SCOPE void
2051RT_FREE(RT_RADIX_TREE * tree)
2052{
2053#ifdef RT_SHMEM
2054 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
2055
2056 /* Free all memory used for radix tree nodes */
2057 Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
2058 RT_FREE_RECURSE(tree, tree->ctl->root, tree->ctl->start_shift);
2059
2060 /*
2061 * Vandalize the control block to help catch programming error where other
2062 * backends access the memory formerly occupied by this radix tree.
2063 */
2064 tree->ctl->magic = 0;
2065 dsa_free(tree->dsa, tree->ctl->handle);
2066#endif
2067
2068 /*
2069 * Free all space allocated within the tree's context and delete all child
2070 * contexts such as those used for nodes.
2071 */
2072 MemoryContextReset(tree->context);
2073}
2074
2075/***************** ITERATION *****************/
2076
2077/*
2078 * Create and return the iterator for the given radix tree.
2079 *
2080 * Taking a lock in shared mode during the iteration is the caller's
2081 * responsibility.
2082 */
2083RT_SCOPE RT_ITER *
2084RT_BEGIN_ITERATE(RT_RADIX_TREE * tree)
2085{
2086 RT_ITER *iter;
2087 RT_CHILD_PTR root;
2088
2089 iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
2090 sizeof(RT_ITER));
2091 iter->tree = tree;
2092
2093 Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
2094 root.alloc = iter->tree->ctl->root;
2095 RT_PTR_SET_LOCAL(tree, &root);
2096
2097 iter->top_level = iter->tree->ctl->start_shift / RT_SPAN;
2098
2099 /* Set the root to start */
2100 iter->cur_level = iter->top_level;
2101 iter->node_iters[iter->cur_level].node = root;
2102 iter->node_iters[iter->cur_level].idx = 0;
2103
2104 return iter;
2105}
2106
2107/*
2108 * Scan the inner node and return the next child pointer if one exists, otherwise
2109 * return NULL.
2110 */
2111static inline RT_PTR_ALLOC *
2112RT_NODE_ITERATE_NEXT(RT_ITER * iter, int level)
2113{
2114 uint8 key_chunk = 0;
2115 RT_NODE_ITER *node_iter;
2116 RT_CHILD_PTR node;
2117 RT_PTR_ALLOC *slot = NULL;
2118
2119#ifdef RT_SHMEM
2120 Assert(iter->tree->ctl->magic == RT_RADIX_TREE_MAGIC);
2121#endif
2122
2123 node_iter = &(iter->node_iters[level]);
2124 node = node_iter->node;
2125
2126 Assert(node.local != NULL);
2127
2128 switch ((node.local)->kind)
2129 {
2130 case RT_NODE_KIND_4:
2131 {
2132 RT_NODE_4 *n4 = (RT_NODE_4 *) (node.local);
2133
2134 if (node_iter->idx >= n4->base.count)
2135 return NULL;
2136
2137 slot = &n4->children[node_iter->idx];
2138 key_chunk = n4->chunks[node_iter->idx];
2139 node_iter->idx++;
2140 break;
2141 }
2142 case RT_NODE_KIND_16:
2143 {
2144 RT_NODE_16 *n16 = (RT_NODE_16 *) (node.local);
2145
2146 if (node_iter->idx >= n16->base.count)
2147 return NULL;
2148
2149 slot = &n16->children[node_iter->idx];
2150 key_chunk = n16->chunks[node_iter->idx];
2151 node_iter->idx++;
2152 break;
2153 }
2154 case RT_NODE_KIND_48:
2155 {
2156 RT_NODE_48 *n48 = (RT_NODE_48 *) (node.local);
2157 int chunk;
2158
2159 for (chunk = node_iter->idx; chunk < RT_NODE_MAX_SLOTS; chunk++)
2160 {
2161 if (RT_NODE_48_IS_CHUNK_USED(n48, chunk))
2162 break;
2163 }
2164
2165 if (chunk >= RT_NODE_MAX_SLOTS)
2166 return NULL;
2167
2168 slot = RT_NODE_48_GET_CHILD(n48, chunk);
2169
2170 key_chunk = chunk;
2171 node_iter->idx = chunk + 1;
2172 break;
2173 }
2174 case RT_NODE_KIND_256:
2175 {
2176 RT_NODE_256 *n256 = (RT_NODE_256 *) (node.local);
2177 int chunk;
2178
2179 for (chunk = node_iter->idx; chunk < RT_NODE_MAX_SLOTS; chunk++)
2180 {
2181 if (RT_NODE_256_IS_CHUNK_USED(n256, chunk))
2182 break;
2183 }
2184
2185 if (chunk >= RT_NODE_MAX_SLOTS)
2186 return NULL;
2187
2188 slot = RT_NODE_256_GET_CHILD(n256, chunk);
2189
2190 key_chunk = chunk;
2191 node_iter->idx = chunk + 1;
2192 break;
2193 }
2194 }
2195
2196 /* Update the key */
2197 iter->key &= ~(((uint64) RT_CHUNK_MASK) << (level * RT_SPAN));
2198 iter->key |= (((uint64) key_chunk) << (level * RT_SPAN));
2199
2200 return slot;
2201}
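The two lines updating `iter->key` above maintain the full key incrementally: at each level, the chunk belonging to that level is cleared out of the running key and replaced with the chunk just visited. A standalone sketch assuming an 8-bit span, one byte per level (not part of radixtree.h; the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Clear level's byte in the running key, then OR in the visited chunk. */
static uint64_t
update_iter_key(uint64_t key, int level, uint8_t key_chunk)
{
	key &= ~(UINT64_C(0xFF) << (level * 8));	/* mask out old chunk */
	key |= ((uint64_t) key_chunk) << (level * 8);	/* install new chunk */
	return key;
}
```

Because only the changed level is touched, the bytes contributed by levels above it survive as the iterator moves up and down the tree.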
2202
2203/*
2204 * Return pointer to value and set key_p as long as there is a key. Otherwise
2205 * return NULL.
2206 */
2207RT_SCOPE RT_VALUE_TYPE *
2208RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
2209{
2210 RT_PTR_ALLOC *slot = NULL;
2211
2212 while (iter->cur_level <= iter->top_level)
2213 {
2214 RT_CHILD_PTR node;
2215
2216 slot = RT_NODE_ITERATE_NEXT(iter, iter->cur_level);
2217
2218 if (iter->cur_level == 0 && slot != NULL)
2219 {
2220 /* Found a value at the leaf node */
2221 *key_p = iter->key;
2222 node.alloc = *slot;
2223
2224 if (RT_CHILDPTR_IS_VALUE(*slot))
2225 return (RT_VALUE_TYPE *) slot;
2226 else
2227 {
2228 RT_PTR_SET_LOCAL(iter->tree, &node);
2229 return (RT_VALUE_TYPE *) node.local;
2230 }
2231 }
2232
2233 if (slot != NULL)
2234 {
2235 /* Found the child slot, move down the tree */
2236 node.alloc = *slot;
2237 RT_PTR_SET_LOCAL(iter->tree, &node);
2238
2239 iter->cur_level--;
2240 iter->node_iters[iter->cur_level].node = node;
2241 iter->node_iters[iter->cur_level].idx = 0;
2242 }
2243 else
2244 {
2245 /* Didn't find a child slot, so move up the tree */
2246 iter->cur_level++;
2247 }
2248 }
2249
2250 /* We've visited all nodes, so the iteration finished */
2251 return NULL;
2252}
2253
2254/*
2255 * Terminate the iteration. The caller is responsible for releasing any locks.
2256 */
2257RT_SCOPE void
2258RT_END_ITERATE(RT_ITER * iter)
2259{
2260 pfree(iter);
2261}
2262
2263/***************** DELETION *****************/
2264
2265#ifdef RT_USE_DELETE
2266
2267/* Delete the element at "deletepos" */
2268static inline void
2269RT_SHIFT_ARRAYS_AND_DELETE(uint8 *chunks, RT_PTR_ALLOC * children, int count, int deletepos)
2270{
2271 /*
2272 * This is basically a memmove, but written in a simple loop for speed on
2273 * small inputs.
2274 */
2275 for (int i = deletepos; i < count - 1; i++)
2276 {
2277 /* workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101481 */
2278#ifdef __GNUC__
2279 __asm__("");
2280#endif
2281 chunks[i] = chunks[i + 1];
2282 children[i] = children[i + 1];
2283 }
2284}
2285
2286/*
2287 * Copy both chunk and slot arrays into the right
2288 * place. The element at "deletepos" is deleted by skipping it.
2289 */
2290static inline void
2291RT_COPY_ARRAYS_AND_DELETE(uint8 *dst_chunks, RT_PTR_ALLOC * dst_children,
2292 uint8 *src_chunks, RT_PTR_ALLOC * src_children,
2293 int count, int deletepos)
2294{
2295 for (int i = 0; i < count - 1; i++)
2296 {
2297 /*
2298 * use a branch-free computation to skip the index of the deleted
2299 * element
2300 */
2301 int sourceidx = i + (i >= deletepos);
2302 int destidx = i;
2303
2304 dst_chunks[destidx] = src_chunks[sourceidx];
2305 dst_children[destidx] = src_children[sourceidx];
2306 }
2307}
2308
2309/*
2310 * Note: While all node-growing functions are called to perform an insertion
2311 * when no more space is available, shrinking is not a hard-and-fast requirement.
2312 * When shrinking nodes, we generally wait until the count is about 3/4 of
2313 * the next lower node's fanout. This prevents ping-ponging between different
2314 * node sizes.
2315 *
2316 * Some shrinking functions delete first and then shrink, either because we
2317 * must or because it's fast and simple that way. Sometimes it's faster to
2318 * delete while shrinking.
2319 */
2320
2321/*
2322 * Move contents of a node256 to a node48. Any deletion should have happened
2323 * in the caller.
2324 */
2325static void pg_noinline
2326RT_SHRINK_NODE_256(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk)
2327{
2328 RT_NODE_256 *n256 = (RT_NODE_256 *) node.local;
2329 RT_CHILD_PTR newnode;
2330 RT_NODE_48 *new48;
2331 int slot_idx = 0;
2332
2333 /* initialize new node */
2334 newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_48, RT_CLASS_48);
2335 new48 = (RT_NODE_48 *) newnode.local;
2336
2337 /* copy over the entries */
2338 RT_COPY_COMMON(newnode, node);
2339 for (int i = 0; i < RT_NODE_MAX_SLOTS; i++)
2340 {
2341 if (RT_NODE_256_IS_CHUNK_USED(n256, i))
2342 {
2343 new48->slot_idxs[i] = slot_idx;
2344 new48->children[slot_idx] = n256->children[i];
2345 slot_idx++;
2346 }
2347 }
2348
2349 /*
2350 * Since we just copied a dense array, we can fill "isset" using a single
2351 * store, provided the length of that array is at most the number of bits
2352 * in a bitmapword.
2353 */
2355 new48->isset[0] = (bitmapword) (((uint64) 1 << n256->base.count) - 1);
2356
2357 /* free old node and update reference in parent */
2358 *parent_slot = newnode.alloc;
2359 RT_FREE_NODE(tree, node);
2360}
2361
static inline void
RT_REMOVE_CHILD_256(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk)
{
    int         shrink_threshold;
    RT_NODE_256 *n256 = (RT_NODE_256 *) node.local;
    int         idx = RT_BM_IDX(chunk);
    int         bitnum = RT_BM_BIT(chunk);

    /* Mark the slot free for "chunk" */
    n256->isset[idx] &= ~((bitmapword) 1 << bitnum);

    n256->base.count--;

    /*
     * A full node256 will have a count of zero because of overflow, so we
     * delete first before checking the shrink threshold.
     */
    Assert(n256->base.count > 0);

    /* This simplifies RT_SHRINK_NODE_256() */
    shrink_threshold = BITS_PER_BITMAPWORD;
    shrink_threshold = Min(RT_FANOUT_48 / 4 * 3, shrink_threshold);

    if (n256->base.count <= shrink_threshold)
        RT_SHRINK_NODE_256(tree, parent_slot, node, chunk);
}

/*
 * Move contents of a node48 to a node16. Any deletion should have happened
 * in the caller.
 */
static void pg_noinline
RT_SHRINK_NODE_48(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk)
{
    RT_NODE_48 *n48 = (RT_NODE_48 *) (node.local);
    RT_CHILD_PTR newnode;
    RT_NODE_16 *new16;
    int         destidx = 0;

    /*
     * Initialize new node. For now we skip the larger node16 size class for
     * simplicity.
     */
    newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_16, RT_CLASS_16_LO);
    new16 = (RT_NODE_16 *) newnode.local;

    /* copy over all existing entries */
    RT_COPY_COMMON(newnode, node);
    for (int chunk = 0; chunk < RT_NODE_MAX_SLOTS; chunk++)
    {
        if (n48->slot_idxs[chunk] != RT_INVALID_SLOT_IDX)
        {
            new16->chunks[destidx] = chunk;
            new16->children[destidx] = n48->children[n48->slot_idxs[chunk]];
            destidx++;
        }
    }

    Assert(destidx < new16->base.fanout);

    RT_VERIFY_NODE((RT_NODE *) new16);

    /* free old node and update reference in parent */
    *parent_slot = newnode.alloc;
    RT_FREE_NODE(tree, node);
}

static inline void
RT_REMOVE_CHILD_48(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk)
{
    RT_NODE_48 *n48 = (RT_NODE_48 *) node.local;
    int         deletepos = n48->slot_idxs[chunk];

    /* For now we skip the larger node16 size class for simplicity */
    int         shrink_threshold = RT_FANOUT_16_LO / 4 * 3;
    int         idx;
    int         bitnum;

    Assert(deletepos != RT_INVALID_SLOT_IDX);

    idx = RT_BM_IDX(deletepos);
    bitnum = RT_BM_BIT(deletepos);
    n48->isset[idx] &= ~((bitmapword) 1 << bitnum);
    n48->slot_idxs[chunk] = RT_INVALID_SLOT_IDX;

    n48->base.count--;

    /*
     * To keep shrinking simple, do it after deleting, which is fast for
     * node48 anyway.
     */
    if (n48->base.count <= shrink_threshold)
        RT_SHRINK_NODE_48(tree, parent_slot, node, chunk);
}

/*
 * Move contents of a node16 to a node4, and delete the one at "deletepos".
 * By deleting as we move, we can avoid memmove operations in the new
 * node.
 */
static void pg_noinline
RT_SHRINK_NODE_16(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 deletepos)
{
    RT_NODE_16 *n16 = (RT_NODE_16 *) (node.local);
    RT_CHILD_PTR newnode;
    RT_NODE_4  *new4;

    /* initialize new node */
    newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
    new4 = (RT_NODE_4 *) newnode.local;

    /* copy over existing entries, except for the one at "deletepos" */
    RT_COPY_COMMON(newnode, node);
    RT_COPY_ARRAYS_AND_DELETE(new4->chunks, new4->children,
                              n16->chunks, n16->children,
                              n16->base.count, deletepos);

    new4->base.count--;
    RT_VERIFY_NODE((RT_NODE *) new4);

    /* free old node and update reference in parent */
    *parent_slot = newnode.alloc;
    RT_FREE_NODE(tree, node);
}

static inline void
RT_REMOVE_CHILD_16(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk, RT_PTR_ALLOC * slot)
{
    RT_NODE_16 *n16 = (RT_NODE_16 *) node.local;
    int         deletepos = slot - n16->children;

    /*
     * When shrinking to node4, 4 is hard-coded. After shrinking, the new node
     * will end up with 3 elements and 3 is the largest count where linear
     * search is faster than SIMD, at least on x86-64.
     */
    if (n16->base.count <= 4)
    {
        RT_SHRINK_NODE_16(tree, parent_slot, node, deletepos);
        return;
    }

    Assert(deletepos >= 0);
    Assert(n16->chunks[deletepos] == chunk);

    RT_SHIFT_ARRAYS_AND_DELETE(n16->chunks, n16->children,
                               n16->base.count, deletepos);
    n16->base.count--;
}

static inline void
RT_REMOVE_CHILD_4(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk, RT_PTR_ALLOC * slot)
{
    RT_NODE_4  *n4 = (RT_NODE_4 *) node.local;

    if (n4->base.count == 1)
    {
        Assert(n4->chunks[0] == chunk);

        /*
         * If we're deleting the last entry from the root child node, don't
         * free it, but mark both the tree and the root child node empty. That
         * way, RT_SET can assume it exists.
         */
        if (parent_slot == &tree->ctl->root)
        {
            n4->base.count = 0;
            tree->ctl->start_shift = 0;
            tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(0);
        }
        else
        {
            /*
             * Deleting the last entry, so just free the entire node.
             * RT_DELETE_RECURSIVE has already freed the value and lower-level
             * children.
             */
            RT_FREE_NODE(tree, node);

            /*
             * Also null out the parent's slot -- this tells the next higher
             * level to delete its child pointer
             */
            *parent_slot = RT_INVALID_PTR_ALLOC;
        }
    }
    else
    {
        int         deletepos = slot - n4->children;

        Assert(deletepos >= 0);
        Assert(n4->chunks[deletepos] == chunk);

        RT_SHIFT_ARRAYS_AND_DELETE(n4->chunks, n4->children,
                                   n4->base.count, deletepos);

        n4->base.count--;
    }
}

/*
 * Delete the child pointer corresponding to "key" in the given node.
 */
static inline void
RT_NODE_DELETE(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk, RT_PTR_ALLOC * slot)
{
    switch ((node.local)->kind)
    {
        case RT_NODE_KIND_4:
            {
                RT_REMOVE_CHILD_4(tree, parent_slot, node, chunk, slot);
                return;
            }
        case RT_NODE_KIND_16:
            {
                RT_REMOVE_CHILD_16(tree, parent_slot, node, chunk, slot);
                return;
            }
        case RT_NODE_KIND_48:
            {
                RT_REMOVE_CHILD_48(tree, parent_slot, node, chunk);
                return;
            }
        case RT_NODE_KIND_256:
            {
                RT_REMOVE_CHILD_256(tree, parent_slot, node, chunk);
                return;
            }
        default:
            pg_unreachable();
    }
}

/* workhorse for RT_DELETE */
static bool
RT_DELETE_RECURSIVE(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, uint64 key, int shift)
{
    RT_PTR_ALLOC *slot;
    RT_CHILD_PTR node;
    uint8       chunk = RT_GET_KEY_CHUNK(key, shift);

    node.alloc = *parent_slot;
    RT_PTR_SET_LOCAL(tree, &node);
    slot = RT_NODE_SEARCH(node.local, chunk);

    if (slot == NULL)
        return false;

    if (shift == 0)
    {
        if (!RT_CHILDPTR_IS_VALUE(*slot))
            RT_FREE_LEAF(tree, *slot);

        RT_NODE_DELETE(tree, parent_slot, node, chunk, slot);
        return true;
    }
    else
    {
        bool        deleted;

        deleted = RT_DELETE_RECURSIVE(tree, slot, key, shift - RT_SPAN);

        /* Child node was freed, so delete its slot now */
        if (*slot == RT_INVALID_PTR_ALLOC)
        {
            Assert(deleted);
            RT_NODE_DELETE(tree, parent_slot, node, chunk, slot);
        }

        return deleted;
    }
}

/*
 * Delete the given key from the radix tree. If the key is found, delete it
 * and return true; otherwise do nothing and return false.
 *
 * Taking a lock in exclusive mode is the caller's responsibility.
 */
RT_SCOPE bool
RT_DELETE(RT_RADIX_TREE * tree, uint64 key)
{
    bool        deleted;

#ifdef RT_SHMEM
    Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
#endif

    if (key > tree->ctl->max_val)
        return false;

    Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
    deleted = RT_DELETE_RECURSIVE(tree, &tree->ctl->root,
                                  key, tree->ctl->start_shift);

    /* Found the key to delete. Update the statistics. */
    if (deleted)
    {
        tree->ctl->num_keys--;
        Assert(tree->ctl->num_keys >= 0);
    }

    return deleted;
}

#endif                          /* RT_USE_DELETE */

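For orientation, here is a minimal sketch of how a caller instantiates this template and reaches `RT_DELETE` (tidstore.c in the PostgreSQL tree uses the same pattern). This is not from the source and is not runnable standalone -- it needs the PostgreSQL build environment; the prefix "inttree" and the example function are hypothetical, and the generated names (`inttree_create`, `inttree_set`, `inttree_delete`, ...) follow from `RT_PREFIX`.

```
#define RT_PREFIX inttree
#define RT_SCOPE static
#define RT_DECLARE
#define RT_DEFINE
#define RT_USE_DELETE
#define RT_VALUE_TYPE uint64
#include "lib/radixtree.h"

static void
example(MemoryContext ctx)
{
    inttree_radix_tree *tree = inttree_create(ctx);
    uint64      val = 42;

    inttree_set(tree, 100, &val);
    inttree_delete(tree, 100);  /* key present: deletes and returns true */
    inttree_delete(tree, 100);  /* key already gone: returns false */
    inttree_free(tree);
}
```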
/***************** UTILITY FUNCTIONS *****************/

/*
 * Return the amount of memory used by the radix tree.
 *
 * Since dsa_get_total_size() does appropriate locking, the caller doesn't
 * need to take a lock.
 */
RT_SCOPE size_t
RT_MEMORY_USAGE(RT_RADIX_TREE * tree)
{
    size_t      total = 0;

#ifdef RT_SHMEM
    Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    total = dsa_get_total_size(tree->dsa);
#else
    total = MemoryContextMemAllocated(tree->context, true);
#endif

    return total;
}

/*
 * Perform some sanity checks on the given node.
 */
static void
RT_VERIFY_NODE(RT_NODE * node)
{
#ifdef USE_ASSERT_CHECKING

    switch (node->kind)
    {
        case RT_NODE_KIND_4:
            {
                RT_NODE_4  *n4 = (RT_NODE_4 *) node;

                /* RT_DUMP_NODE(node); */

                for (int i = 1; i < n4->base.count; i++)
                    Assert(n4->chunks[i - 1] < n4->chunks[i]);

                break;
            }
        case RT_NODE_KIND_16:
            {
                RT_NODE_16 *n16 = (RT_NODE_16 *) node;

                /* RT_DUMP_NODE(node); */

                for (int i = 1; i < n16->base.count; i++)
                    Assert(n16->chunks[i - 1] < n16->chunks[i]);

                break;
            }
        case RT_NODE_KIND_48:
            {
                RT_NODE_48 *n48 = (RT_NODE_48 *) node;
                int         cnt = 0;

                /* RT_DUMP_NODE(node); */

                for (int i = 0; i < RT_NODE_MAX_SLOTS; i++)
                {
                    uint8       slot = n48->slot_idxs[i];
                    int         idx = RT_BM_IDX(slot);
                    int         bitnum = RT_BM_BIT(slot);

                    if (!RT_NODE_48_IS_CHUNK_USED(n48, i))
                        continue;

                    /* Check that the corresponding slot is used */
                    Assert(slot < node->fanout);
                    Assert((n48->isset[idx] & ((bitmapword) 1 << bitnum)) != 0);

                    cnt++;
                }

                Assert(n48->base.count == cnt);

                break;
            }
        case RT_NODE_KIND_256:
            {
                RT_NODE_256 *n256 = (RT_NODE_256 *) node;
                int         cnt = 0;

                /* RT_DUMP_NODE(node); */

                for (int i = 0; i < RT_BM_IDX(RT_NODE_MAX_SLOTS); i++)
                    cnt += bmw_popcount(n256->isset[i]);

                /*
                 * Check that the number of used chunks matches, accounting
                 * for overflow
                 */
                if (cnt == RT_FANOUT_256)
                    Assert(n256->base.count == 0);
                else
                    Assert(n256->base.count == cnt);

                break;
            }
    }
#endif
}

/***************** DEBUG FUNCTIONS *****************/

#ifdef RT_DEBUG

/*
 * Print out tree stats, some of which are only collected in debugging builds.
 */
RT_SCOPE void
RT_STATS(RT_RADIX_TREE * tree)
{
    fprintf(stderr, "max_val = " UINT64_FORMAT "\n", tree->ctl->max_val);
    fprintf(stderr, "num_keys = %lld\n", (long long) tree->ctl->num_keys);

#ifdef RT_SHMEM
    fprintf(stderr, "handle = " DSA_POINTER_FORMAT "\n", tree->ctl->handle);
#endif

    fprintf(stderr, "height = %d", tree->ctl->start_shift / RT_SPAN);

    for (int i = 0; i < RT_NUM_SIZE_CLASSES; i++)
    {
        RT_SIZE_CLASS_ELEM size_class = RT_SIZE_CLASS_INFO[i];

        fprintf(stderr, ", n%d = %lld", size_class.fanout, (long long) tree->ctl->num_nodes[i]);
    }

    fprintf(stderr, ", leaves = %lld", (long long) tree->ctl->num_leaves);

    fprintf(stderr, "\n");
}

/*
 * Print out debugging information about the given node.
 */
static void pg_attribute_unused()
RT_DUMP_NODE(RT_NODE * node)
{
#ifdef RT_SHMEM
#define RT_CHILD_PTR_FORMAT DSA_POINTER_FORMAT
#else
#define RT_CHILD_PTR_FORMAT "%p"
#endif

    fprintf(stderr, "kind %d, fanout %d, count %u\n",
            (node->kind == RT_NODE_KIND_4) ? 4 :
            (node->kind == RT_NODE_KIND_16) ? 16 :
            (node->kind == RT_NODE_KIND_48) ? 48 : 256,
            node->fanout == 0 ? 256 : node->fanout,
            node->count == 0 ? 256 : node->count);

    switch (node->kind)
    {
        case RT_NODE_KIND_4:
            {
                RT_NODE_4  *n4 = (RT_NODE_4 *) node;

                fprintf(stderr, "chunks and slots:\n");
                for (int i = 0; i < n4->base.count; i++)
                {
                    fprintf(stderr, "  [%d] chunk %x slot " RT_CHILD_PTR_FORMAT "\n",
                            i, n4->chunks[i], n4->children[i]);
                }

                break;
            }
        case RT_NODE_KIND_16:
            {
                RT_NODE_16 *n16 = (RT_NODE_16 *) node;

                fprintf(stderr, "chunks and slots:\n");
                for (int i = 0; i < n16->base.count; i++)
                {
                    fprintf(stderr, "  [%d] chunk %x slot " RT_CHILD_PTR_FORMAT "\n",
                            i, n16->chunks[i], n16->children[i]);
                }
                break;
            }
        case RT_NODE_KIND_48:
            {
                RT_NODE_48 *n48 = (RT_NODE_48 *) node;
                char       *sep = "";

                fprintf(stderr, "slot_idxs: \n");
                for (int chunk = 0; chunk < RT_NODE_MAX_SLOTS; chunk++)
                {
                    if (!RT_NODE_48_IS_CHUNK_USED(n48, chunk))
                        continue;

                    fprintf(stderr, "  idx[%d] = %d\n",
                            chunk, n48->slot_idxs[chunk]);
                }

                fprintf(stderr, "isset-bitmap: ");
                for (int i = 0; i < (RT_FANOUT_48_MAX / BITS_PER_BYTE); i++)
                {
                    fprintf(stderr, "%s%x", sep, ((uint8 *) n48->isset)[i]);
                    sep = " ";
                }
                fprintf(stderr, "\n");

                fprintf(stderr, "chunks and slots:\n");
                for (int chunk = 0; chunk < RT_NODE_MAX_SLOTS; chunk++)
                {
                    if (!RT_NODE_48_IS_CHUNK_USED(n48, chunk))
                        continue;

                    fprintf(stderr, "  chunk %x slot " RT_CHILD_PTR_FORMAT "\n",
                            chunk,
                            *RT_NODE_48_GET_CHILD(n48, chunk));
                }
                break;
            }
        case RT_NODE_KIND_256:
            {
                RT_NODE_256 *n256 = (RT_NODE_256 *) node;
                char       *sep = "";

                fprintf(stderr, "isset-bitmap: ");
                for (int i = 0; i < (RT_FANOUT_256 / BITS_PER_BYTE); i++)
                {
                    fprintf(stderr, "%s%x", sep, ((uint8 *) n256->isset)[i]);
                    sep = " ";
                }
                fprintf(stderr, "\n");

                fprintf(stderr, "chunks and slots:\n");
                for (int chunk = 0; chunk < RT_NODE_MAX_SLOTS; chunk++)
                {
                    if (!RT_NODE_256_IS_CHUNK_USED(n256, chunk))
                        continue;

                    fprintf(stderr, "  chunk %x slot " RT_CHILD_PTR_FORMAT "\n",
                            chunk,
                            *RT_NODE_256_GET_CHILD(n256, chunk));
                }
                break;
            }
    }
}
#endif                          /* RT_DEBUG */

#endif                          /* RT_DEFINE */

/* undefine external parameters, so next radix tree can be defined */
#undef RT_PREFIX
#undef RT_SCOPE
#undef RT_DECLARE
#undef RT_DEFINE
#undef RT_VALUE_TYPE
#undef RT_VARLEN_VALUE_SIZE
#undef RT_RUNTIME_EMBEDDABLE_VALUE
#undef RT_SHMEM
#undef RT_USE_DELETE
#undef RT_DEBUG

/* locally declared macros */
#undef RT_MAKE_PREFIX
#undef RT_MAKE_NAME
#undef RT_MAKE_NAME_
#undef RT_STR
#undef RT_STR_
#undef RT_SPAN
#undef RT_NODE_MAX_SLOTS
#undef RT_CHUNK_MASK
#undef RT_MAX_SHIFT
#undef RT_MAX_LEVEL
#undef RT_GET_KEY_CHUNK
#undef RT_BM_IDX
#undef RT_BM_BIT
#undef RT_NODE_MUST_GROW
#undef RT_NODE_KIND_COUNT
#undef RT_NUM_SIZE_CLASSES
#undef RT_INVALID_SLOT_IDX
#undef RT_SLAB_BLOCK_SIZE
#undef RT_RADIX_TREE_MAGIC
#undef RT_CHILD_PTR_FORMAT

/* type declarations */
#undef RT_RADIX_TREE
#undef RT_RADIX_TREE_CONTROL
#undef RT_CHILD_PTR
#undef RT_PTR_ALLOC
#undef RT_INVALID_PTR_ALLOC
#undef RT_HANDLE
#undef RT_ITER
#undef RT_NODE
#undef RT_NODE_ITER
#undef RT_NODE_KIND_4
#undef RT_NODE_KIND_16
#undef RT_NODE_KIND_48
#undef RT_NODE_KIND_256
#undef RT_NODE_4
#undef RT_NODE_16
#undef RT_NODE_48
#undef RT_NODE_256
#undef RT_SIZE_CLASS
#undef RT_SIZE_CLASS_ELEM
#undef RT_SIZE_CLASS_INFO
#undef RT_CLASS_4
#undef RT_CLASS_16_LO
#undef RT_CLASS_16_HI
#undef RT_CLASS_48
#undef RT_CLASS_256
#undef RT_FANOUT_4
#undef RT_FANOUT_4_MAX
#undef RT_FANOUT_16_LO
#undef RT_FANOUT_16_HI
#undef RT_FANOUT_16_MAX
#undef RT_FANOUT_48
#undef RT_FANOUT_48_MAX
#undef RT_FANOUT_256

/* function declarations */
#undef RT_CREATE
#undef RT_FREE
#undef RT_ATTACH
#undef RT_DETACH
#undef RT_LOCK_EXCLUSIVE
#undef RT_LOCK_SHARE
#undef RT_UNLOCK
#undef RT_GET_HANDLE
#undef RT_FIND
#undef RT_SET
#undef RT_BEGIN_ITERATE
#undef RT_ITERATE_NEXT
#undef RT_END_ITERATE
#undef RT_DELETE
#undef RT_MEMORY_USAGE
#undef RT_DUMP_NODE
#undef RT_STATS

/* internal helper functions */
#undef RT_GET_VALUE_SIZE
#undef RT_VALUE_IS_EMBEDDABLE
#undef RT_CHILDPTR_IS_VALUE
#undef RT_GET_SLOT_RECURSIVE
#undef RT_DELETE_RECURSIVE
#undef RT_ALLOC_NODE
#undef RT_ALLOC_LEAF
#undef RT_FREE_NODE
#undef RT_FREE_LEAF
#undef RT_FREE_RECURSE
#undef RT_EXTEND_UP
#undef RT_EXTEND_DOWN
#undef RT_COPY_COMMON
#undef RT_PTR_SET_LOCAL
#undef RT_PTR_ALLOC_IS_VALID
#undef RT_NODE_16_SEARCH_EQ
#undef RT_NODE_4_GET_INSERTPOS
#undef RT_NODE_16_GET_INSERTPOS
#undef RT_SHIFT_ARRAYS_FOR_INSERT
#undef RT_SHIFT_ARRAYS_AND_DELETE
#undef RT_COPY_ARRAYS_FOR_INSERT
#undef RT_COPY_ARRAYS_AND_DELETE
#undef RT_NODE_48_IS_CHUNK_USED
#undef RT_NODE_48_GET_CHILD
#undef RT_NODE_256_IS_CHUNK_USED
#undef RT_NODE_256_GET_CHILD
#undef RT_KEY_GET_SHIFT
#undef RT_SHIFT_GET_MAX_VAL
#undef RT_NODE_SEARCH
#undef RT_ADD_CHILD_4
#undef RT_ADD_CHILD_16
#undef RT_ADD_CHILD_48
#undef RT_ADD_CHILD_256
#undef RT_GROW_NODE_4
#undef RT_GROW_NODE_16
#undef RT_GROW_NODE_48
#undef RT_REMOVE_CHILD_4
#undef RT_REMOVE_CHILD_16
#undef RT_REMOVE_CHILD_48
#undef RT_REMOVE_CHILD_256
#undef RT_SHRINK_NODE_16
#undef RT_SHRINK_NODE_48
#undef RT_SHRINK_NODE_256
#undef RT_NODE_DELETE
#undef RT_NODE_INSERT
#undef RT_NODE_ITERATE_NEXT
#undef RT_VERIFY_NODE