PostgreSQL Source Code git master
radixtree.h
1/*-------------------------------------------------------------------------
2 *
3 * radixtree.h
4 * Template for adaptive radix tree.
5 *
6 * A template to generate an "adaptive radix tree", specialized for value
7 * types and for local/shared memory.
8 *
9 * The concept originates from the paper "The Adaptive Radix Tree: ARTful
10 * Indexing for Main-Memory Databases" by Viktor Leis, Alfons Kemper,
11 * and Thomas Neumann, 2013.
12 *
13 * Radix trees have some advantages over hash tables:
14 * - The keys are logically ordered, allowing efficient sorted iteration
15 * and range queries
16 * - Operations using keys that are lexicographically close together
17 * will have favorable memory locality
18 * - Memory use grows gradually rather than by doubling
19 * - The key does not need to be stored with the value, since the key
20 * is implicitly contained in the path to the value
21 *
22 * Some disadvantages are:
23 * - Point queries (along with insertion and deletion) are slower than
24 * a linear probing hash table as in simplehash.h
25 * - Memory usage varies by key distribution, so is difficult to predict
26 *
27 * A classic radix tree consists of nodes, each containing an array of
28 * pointers to child nodes. The size of the array is determined by the
29 * "span" of the tree, which is the number of bits of the key used to
30 * index into the array. For example, with a span of 6, a "chunk"
31 * of 6 bits is extracted from the key at each node traversal, and
32 * the arrays thus have a "fanout" of 2^6 or 64 entries. A large span
33 * allows a shorter tree, but requires larger arrays that may be mostly
34 * wasted space.
35 *
36 * The key idea of the adaptive radix tree is to choose different
37 * data structures based on the number of child nodes. A node will
38 * start out small when it is first populated, and when it is full,
39 * it is replaced by the next larger size. Conversely, when a node
40 * becomes mostly empty, it is replaced by the next smaller node. The
41 * bulk of the code complexity in this module stems from this dynamic
42 * switching. One mitigating factor is using a span of 8, since bytes
43 * are directly addressable.
44 *
45 * The ART paper mentions three ways to implement leaves:
46 *
47 * "- Single-value leaves: The values are stored using an addi-
48 * tional leaf node type which stores one value.
49 * - Multi-value leaves: The values are stored in one of four
50 * different leaf node types, which mirror the structure of
51 * inner nodes, but contain values instead of pointers.
52 * - Combined pointer/value slots: If values fit into point-
53 * ers, no separate node types are necessary. Instead, each
54 * pointer storage location in an inner node can either
55 * store a pointer or a value."
56 *
57 * We use a form of "combined pointer/value slots", as recommended. Values
58 * whose size (if fixed at compile time) is equal to or smaller than the
59 * platform's pointer type are stored in the child slots of the last-level
60 * node, while larger values are stored as "single-value" leaves above. This
61 * offers flexibility and efficiency. Variable-length types are currently
62 * treated as single-value leaves for simplicity, but future work may
63 * allow those to be stored in the child pointer arrays, when they're
64 * small enough.
65 *
66 * There are two other techniques described in the paper that are not
67 * implemented here:
68 * - path compression "...removes all inner nodes that have only a single child."
69 * - lazy path expansion "...inner nodes are only created if they are required
70 * to distinguish at least two leaf nodes."
71 *
72 * We do have a form of "poor man's path compression", however, enabled by
73 * only supporting unsigned integer keys (for now assumed to be 64-bit):
74 * A tree doesn't contain paths where the highest bytes of all keys are
75 * zero. That way, the tree's height adapts to the distribution of keys.
76 *
77 * To handle concurrency, we use a single reader-writer lock for the
78 * radix tree. If concurrent write operations are possible, the tree
79 * must be exclusively locked during write operations such as RT_SET()
80 * and RT_DELETE(), and share locked during read operations such as
81 * RT_FIND() and RT_BEGIN_ITERATE().
82 *
83 * TODO: The current locking mechanism is not optimized for high
84 * concurrency with mixed read-write workloads. In the future it might
85 * be worthwhile to replace it with the Optimistic Lock Coupling or
86 * ROWEX mentioned in the paper "The ART of Practical Synchronization"
87 * by the same authors as the ART paper, 2016.
88 *
89 * To generate a radix tree and associated functions for a use case,
90 * several macros have to be #define'ed before this file is included.
91 * Including the file #undef's all those, so a new radix tree can be
92 * generated afterwards.
93 *
94 * The relevant parameters are:
95 * - RT_PREFIX - prefix for all symbol names generated. A prefix of "foo"
96 * will result in radix tree type "foo_radix_tree" and functions like
97 * "foo_create"/"foo_free" and so forth.
98 * - RT_DECLARE - if defined function prototypes and type declarations are
99 * generated
100 * - RT_DEFINE - if defined function definitions are generated
101 * - RT_SCOPE - in which scope (e.g. extern, static inline) do function
102 * declarations reside
103 * - RT_VALUE_TYPE - the type of the value.
104 * - RT_VARLEN_VALUE_SIZE() - for variable length values, an expression
105 * involving a pointer to the value type, to calculate size.
106 * NOTE: implies that the value is in fact variable-length,
107 * so do not set for fixed-length values.
108 * - RT_RUNTIME_EMBEDDABLE_VALUE - for variable length values, allows
109 * storing the value in a child pointer slot, rather than as a single-
110 * value leaf, if small enough. This requires that the value, when
111 * read as a child pointer, can be tagged in the lowest bit.
112 *
113 * Optional parameters:
114 * - RT_SHMEM - if defined, the radix tree is created in the DSA area
115 * so that multiple processes can access it simultaneously.
116 * - RT_DEBUG - if defined add stats tracking and debugging functions
117 *
118 * Interface
119 * ---------
120 *
121 * RT_CREATE - Create a new, empty radix tree
122 * RT_FREE - Free the radix tree
123 * RT_FIND - Lookup the value for a given key
124 * RT_SET - Set a key-value pair
125 * RT_BEGIN_ITERATE - Begin iterating through all key-value pairs
126 * RT_ITERATE_NEXT - Return next key-value pair, if any
127 * RT_END_ITERATE - End iteration
128 * RT_MEMORY_USAGE - Get the memory usage, as measured by space in memory context blocks
129 *
130 * Interface for Shared Memory
131 * ---------
132 *
133 * RT_ATTACH - Attach to the radix tree
134 * RT_DETACH - Detach from the radix tree
135 * RT_LOCK_EXCLUSIVE - Lock the radix tree in exclusive mode
136 * RT_LOCK_SHARE - Lock the radix tree in share mode
137 * RT_UNLOCK - Unlock the radix tree
138 * RT_GET_HANDLE - Return the handle of the radix tree
139 *
140 * Optional Interface
141 * ---------
142 *
143 * RT_DELETE - Delete a key-value pair. Declared/defined if RT_USE_DELETE is defined
144 *
145 *
146 * Copyright (c) 2024-2025, PostgreSQL Global Development Group
147 *
148 * IDENTIFICATION
149 * src/include/lib/radixtree.h
150 *
151 *-------------------------------------------------------------------------
152 */
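The parameter list above can be made concrete with a sketch of a hypothetical template instantiation; the prefix "mytree", the value type, and the scope here are illustrative, not taken from any real caller:

```
/* Hypothetical instantiation: a local-memory tree mapping uint64 -> uint64 */
#define RT_PREFIX mytree
#define RT_SCOPE static
#define RT_DECLARE
#define RT_DEFINE
#define RT_VALUE_TYPE uint64
#include "lib/radixtree.h"

/*
 * This generates mytree_radix_tree, mytree_create(), mytree_set(),
 * mytree_find(), and so on. All RT_* parameter macros are #undef'd at
 * the end of the include, so another tree can be generated afterwards.
 */
```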
153
154#include "nodes/bitmapset.h"
155#include "port/pg_bitutils.h"
156#include "port/simd.h"
157#include "utils/dsa.h"
158#include "utils/memutils.h"
159#ifdef RT_SHMEM
160#include "miscadmin.h"
161#include "storage/lwlock.h"
162#endif
163
164/* helpers */
165#define RT_MAKE_PREFIX(a) CppConcat(a,_)
166#define RT_MAKE_NAME(name) RT_MAKE_NAME_(RT_MAKE_PREFIX(RT_PREFIX),name)
167#define RT_MAKE_NAME_(a,b) CppConcat(a,b)
168/*
169 * stringify a macro constant, from https://gcc.gnu.org/onlinedocs/cpp/Stringizing.html
170 */
171#define RT_STR(s) RT_STR_(s)
172#define RT_STR_(s) #s
173
174/* function declarations */
175#define RT_CREATE RT_MAKE_NAME(create)
176#define RT_FREE RT_MAKE_NAME(free)
177#define RT_FIND RT_MAKE_NAME(find)
178#ifdef RT_SHMEM
179#define RT_ATTACH RT_MAKE_NAME(attach)
180#define RT_DETACH RT_MAKE_NAME(detach)
181#define RT_GET_HANDLE RT_MAKE_NAME(get_handle)
182#define RT_LOCK_EXCLUSIVE RT_MAKE_NAME(lock_exclusive)
183#define RT_LOCK_SHARE RT_MAKE_NAME(lock_share)
184#define RT_UNLOCK RT_MAKE_NAME(unlock)
185#endif
186#define RT_SET RT_MAKE_NAME(set)
187#define RT_BEGIN_ITERATE RT_MAKE_NAME(begin_iterate)
188#define RT_ITERATE_NEXT RT_MAKE_NAME(iterate_next)
189#define RT_END_ITERATE RT_MAKE_NAME(end_iterate)
190#ifdef RT_USE_DELETE
191#define RT_DELETE RT_MAKE_NAME(delete)
192#endif
193#define RT_MEMORY_USAGE RT_MAKE_NAME(memory_usage)
194#define RT_DUMP_NODE RT_MAKE_NAME(dump_node)
195#define RT_STATS RT_MAKE_NAME(stats)
196
197/* internal helper functions (no externally visible prototypes) */
198#define RT_CHILDPTR_IS_VALUE RT_MAKE_NAME(childptr_is_value)
199#define RT_VALUE_IS_EMBEDDABLE RT_MAKE_NAME(value_is_embeddable)
200#define RT_GET_SLOT_RECURSIVE RT_MAKE_NAME(get_slot_recursive)
201#define RT_DELETE_RECURSIVE RT_MAKE_NAME(delete_recursive)
202#define RT_ALLOC_NODE RT_MAKE_NAME(alloc_node)
203#define RT_ALLOC_LEAF RT_MAKE_NAME(alloc_leaf)
204#define RT_FREE_NODE RT_MAKE_NAME(free_node)
205#define RT_FREE_LEAF RT_MAKE_NAME(free_leaf)
206#define RT_FREE_RECURSE RT_MAKE_NAME(free_recurse)
207#define RT_EXTEND_UP RT_MAKE_NAME(extend_up)
208#define RT_EXTEND_DOWN RT_MAKE_NAME(extend_down)
209#define RT_COPY_COMMON RT_MAKE_NAME(copy_common)
210#define RT_PTR_SET_LOCAL RT_MAKE_NAME(ptr_set_local)
211#define RT_NODE_16_SEARCH_EQ RT_MAKE_NAME(node_16_search_eq)
212#define RT_NODE_4_GET_INSERTPOS RT_MAKE_NAME(node_4_get_insertpos)
213#define RT_NODE_16_GET_INSERTPOS RT_MAKE_NAME(node_16_get_insertpos)
214#define RT_SHIFT_ARRAYS_FOR_INSERT RT_MAKE_NAME(shift_arrays_for_insert)
215#define RT_SHIFT_ARRAYS_AND_DELETE RT_MAKE_NAME(shift_arrays_and_delete)
216#define RT_COPY_ARRAYS_FOR_INSERT RT_MAKE_NAME(copy_arrays_for_insert)
217#define RT_COPY_ARRAYS_AND_DELETE RT_MAKE_NAME(copy_arrays_and_delete)
218#define RT_NODE_48_IS_CHUNK_USED RT_MAKE_NAME(node_48_is_chunk_used)
219#define RT_NODE_48_GET_CHILD RT_MAKE_NAME(node_48_get_child)
220#define RT_NODE_256_IS_CHUNK_USED RT_MAKE_NAME(node_256_is_chunk_used)
221#define RT_NODE_256_GET_CHILD RT_MAKE_NAME(node_256_get_child)
222#define RT_KEY_GET_SHIFT RT_MAKE_NAME(key_get_shift)
223#define RT_SHIFT_GET_MAX_VAL RT_MAKE_NAME(shift_get_max_val)
224#define RT_NODE_SEARCH RT_MAKE_NAME(node_search)
225#define RT_NODE_DELETE RT_MAKE_NAME(node_delete)
226#define RT_NODE_INSERT RT_MAKE_NAME(node_insert)
227#define RT_ADD_CHILD_4 RT_MAKE_NAME(add_child_4)
228#define RT_ADD_CHILD_16 RT_MAKE_NAME(add_child_16)
229#define RT_ADD_CHILD_48 RT_MAKE_NAME(add_child_48)
230#define RT_ADD_CHILD_256 RT_MAKE_NAME(add_child_256)
231#define RT_GROW_NODE_4 RT_MAKE_NAME(grow_node_4)
232#define RT_GROW_NODE_16 RT_MAKE_NAME(grow_node_16)
233#define RT_GROW_NODE_48 RT_MAKE_NAME(grow_node_48)
234#define RT_REMOVE_CHILD_4 RT_MAKE_NAME(remove_child_4)
235#define RT_REMOVE_CHILD_16 RT_MAKE_NAME(remove_child_16)
236#define RT_REMOVE_CHILD_48 RT_MAKE_NAME(remove_child_48)
237#define RT_REMOVE_CHILD_256 RT_MAKE_NAME(remove_child_256)
238#define RT_SHRINK_NODE_16 RT_MAKE_NAME(shrink_child_16)
239#define RT_SHRINK_NODE_48 RT_MAKE_NAME(shrink_child_48)
240#define RT_SHRINK_NODE_256 RT_MAKE_NAME(shrink_child_256)
241#define RT_NODE_ITERATE_NEXT RT_MAKE_NAME(node_iterate_next)
242#define RT_VERIFY_NODE RT_MAKE_NAME(verify_node)
243
244/* type declarations */
245#define RT_RADIX_TREE RT_MAKE_NAME(radix_tree)
246#define RT_RADIX_TREE_CONTROL RT_MAKE_NAME(radix_tree_control)
247#define RT_ITER RT_MAKE_NAME(iter)
248#ifdef RT_SHMEM
249#define RT_HANDLE RT_MAKE_NAME(handle)
250#endif
251#define RT_NODE RT_MAKE_NAME(node)
252#define RT_CHILD_PTR RT_MAKE_NAME(child_ptr)
253#define RT_NODE_ITER RT_MAKE_NAME(node_iter)
254#define RT_NODE_4 RT_MAKE_NAME(node_4)
255#define RT_NODE_16 RT_MAKE_NAME(node_16)
256#define RT_NODE_48 RT_MAKE_NAME(node_48)
257#define RT_NODE_256 RT_MAKE_NAME(node_256)
258#define RT_SIZE_CLASS RT_MAKE_NAME(size_class)
259#define RT_SIZE_CLASS_ELEM RT_MAKE_NAME(size_class_elem)
260#define RT_SIZE_CLASS_INFO RT_MAKE_NAME(size_class_info)
261#define RT_CLASS_4 RT_MAKE_NAME(class_4)
262#define RT_CLASS_16_LO RT_MAKE_NAME(class_32_min)
263#define RT_CLASS_16_HI RT_MAKE_NAME(class_32_max)
264#define RT_CLASS_48 RT_MAKE_NAME(class_48)
265#define RT_CLASS_256 RT_MAKE_NAME(class_256)
266
267/* generate forward declarations necessary to use the radix tree */
268#ifdef RT_DECLARE
269
271typedef struct RT_ITER RT_ITER;
272
273#ifdef RT_SHMEM
274typedef dsa_pointer RT_HANDLE;
275#endif
276
277#ifdef RT_SHMEM
278RT_SCOPE RT_RADIX_TREE *RT_CREATE(dsa_area *dsa, int tranche_id);
279RT_SCOPE RT_RADIX_TREE *RT_ATTACH(dsa_area *dsa, dsa_pointer dp);
280RT_SCOPE void RT_DETACH(RT_RADIX_TREE * tree);
281RT_SCOPE RT_HANDLE RT_GET_HANDLE(RT_RADIX_TREE * tree);
282RT_SCOPE void RT_LOCK_EXCLUSIVE(RT_RADIX_TREE * tree);
283RT_SCOPE void RT_LOCK_SHARE(RT_RADIX_TREE * tree);
284RT_SCOPE void RT_UNLOCK(RT_RADIX_TREE * tree);
285#else
286RT_SCOPE RT_RADIX_TREE *RT_CREATE(MemoryContext ctx);
287#endif
288RT_SCOPE void RT_FREE(RT_RADIX_TREE * tree);
289
290RT_SCOPE RT_VALUE_TYPE *RT_FIND(RT_RADIX_TREE * tree, uint64 key);
291RT_SCOPE bool RT_SET(RT_RADIX_TREE * tree, uint64 key, RT_VALUE_TYPE * value_p);
292
293#ifdef RT_USE_DELETE
294RT_SCOPE bool RT_DELETE(RT_RADIX_TREE * tree, uint64 key);
295#endif
296
297RT_SCOPE RT_ITER *RT_BEGIN_ITERATE(RT_RADIX_TREE * tree);
298RT_SCOPE RT_VALUE_TYPE *RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p);
299RT_SCOPE void RT_END_ITERATE(RT_ITER * iter);
300
301RT_SCOPE uint64 RT_MEMORY_USAGE(RT_RADIX_TREE * tree);
302
303#ifdef RT_DEBUG
304RT_SCOPE void RT_STATS(RT_RADIX_TREE * tree);
305#endif
306
307#endif /* RT_DECLARE */
308
309
310/* generate implementation of the radix tree */
311#ifdef RT_DEFINE
312
313/* The number of bits encoded in one tree level */
314#define RT_SPAN BITS_PER_BYTE
315
316/*
317 * The number of possible partial keys, and thus the maximum number of
318 * child pointers, for a node.
319 */
320#define RT_NODE_MAX_SLOTS (1 << RT_SPAN)
321
322/* Mask for extracting a chunk from a key */
323#define RT_CHUNK_MASK ((1 << RT_SPAN) - 1)
324
325/* Maximum shift needed to extract a chunk from a key */
326#define RT_MAX_SHIFT RT_KEY_GET_SHIFT(UINT64_MAX)
327
328/* Maximum level a tree can reach for a key */
329#define RT_MAX_LEVEL ((sizeof(uint64) * BITS_PER_BYTE) / RT_SPAN)
330
331/* Get a chunk from the key */
332#define RT_GET_KEY_CHUNK(key, shift) ((uint8) (((key) >> (shift)) & RT_CHUNK_MASK))
333
334/* For accessing bitmaps */
335#define RT_BM_IDX(x) ((x) / BITS_PER_BITMAPWORD)
336#define RT_BM_BIT(x) ((x) % BITS_PER_BITMAPWORD)
337
338/*
339 * Node kinds
340 *
341 * The different node kinds are what make the tree "adaptive".
342 *
343 * Each node kind is associated with a different datatype and different
344 * search/set/delete/iterate algorithms adapted for its size. The largest
345 * kind, node256 is basically the same as a traditional radix tree,
346 * and would be most wasteful of memory when sparsely populated. The
347 * smaller nodes expend some additional CPU time to enable a smaller
348 * memory footprint.
349 *
350 * NOTE: There are 4 node kinds, and this should never be increased,
351 * for several reasons:
352 * 1. With 5 or more kinds, gcc tends to use a jump table for switch
353 * statements.
354 * 2. The 4 kinds can be represented with 2 bits, so we have the option
355 * in the future to tag the node pointer with the kind, even on
356 * platforms with 32-bit pointers. That would touch fewer cache lines
357 * during traversal and allow faster recovery from branch mispredicts.
358 * 3. We can have multiple size classes per node kind.
359 */
360#define RT_NODE_KIND_4 0x00
361#define RT_NODE_KIND_16 0x01
362#define RT_NODE_KIND_48 0x02
363#define RT_NODE_KIND_256 0x03
364#define RT_NODE_KIND_COUNT 4
365
366/*
367 * Calculate the slab block size so that we can allocate at least 32 chunks
368 * from the block.
369 */
370#define RT_SLAB_BLOCK_SIZE(size) \
371 Max(SLAB_DEFAULT_BLOCK_SIZE, pg_nextpower2_32(size * 32))
372
373/* Common header for all nodes */
374typedef struct RT_NODE
375{
376 /* Node kind, one per search/set algorithm */
377 uint8 kind;
378
379 /*
380 * Max capacity for the current size class. Storing this in the node
381 * enables multiple size classes per node kind. uint8 is sufficient for
382 * all node kinds, because we only use this number to test if the node
383 * needs to grow. Since node256 never needs to grow, we let this overflow
384 * to zero.
385 */
386 uint8 fanout;
387
388 /*
389 * Number of children. uint8 is sufficient for all node kinds, because
390 * nodes shrink when this number gets lower than some threshold. Since
391 * node256 cannot possibly have zero children, we let the counter overflow
392 * and we interpret zero as "256" for this node kind.
393 */
394 uint8 count;
395} RT_NODE;
396
397
398/* pointer returned by allocation */
399#ifdef RT_SHMEM
400#define RT_PTR_ALLOC dsa_pointer
401#define RT_INVALID_PTR_ALLOC InvalidDsaPointer
402#define RT_PTR_ALLOC_IS_VALID(ptr) DsaPointerIsValid(ptr)
403#else
404#define RT_PTR_ALLOC RT_NODE *
405#define RT_INVALID_PTR_ALLOC NULL
406#define RT_PTR_ALLOC_IS_VALID(ptr) PointerIsValid(ptr)
407#endif
408
409/*
410 * A convenience type used when we need to work with a DSA pointer as well
411 * as its local pointer. For local memory, both members are the same, so
412 * we use a union.
413 */
414#ifdef RT_SHMEM
415typedef struct RT_CHILD_PTR
416#else
417typedef union RT_CHILD_PTR
418#endif
419{
420 RT_PTR_ALLOC alloc;
421 RT_NODE *local;
422} RT_CHILD_PTR;
423
424
425/*
426 * Helper macros and functions for value storage.
427 * We either embed values in the child slots of the last level
428 * node or store pointers to values to the child slots,
429 * depending on the value size.
430 */
431
432#ifdef RT_VARLEN_VALUE_SIZE
433#define RT_GET_VALUE_SIZE(v) RT_VARLEN_VALUE_SIZE(v)
434#else
435#define RT_GET_VALUE_SIZE(v) sizeof(RT_VALUE_TYPE)
436#endif
437
438/*
439 * Return true if the value can be stored in the child array
440 * of the lowest-level node, false otherwise.
441 */
442static inline bool
443RT_VALUE_IS_EMBEDDABLE(RT_VALUE_TYPE * value_p)
444{
445#ifdef RT_VARLEN_VALUE_SIZE
446
447#ifdef RT_RUNTIME_EMBEDDABLE_VALUE
448 return RT_GET_VALUE_SIZE(value_p) <= sizeof(RT_PTR_ALLOC);
449#else
450 return false;
451#endif
452
453#else
454 return RT_GET_VALUE_SIZE(value_p) <= sizeof(RT_PTR_ALLOC);
455#endif
456}
457
458/*
459 * Return true if the child pointer contains the value, false
460 * if the child pointer is a leaf pointer.
461 */
462static inline bool
463RT_CHILDPTR_IS_VALUE(RT_PTR_ALLOC child)
464{
465#ifdef RT_VARLEN_VALUE_SIZE
466
467#ifdef RT_RUNTIME_EMBEDDABLE_VALUE
468 /* check for pointer tag */
469#ifdef RT_SHMEM
470 return child & 1;
471#else
472 return ((uintptr_t) child) & 1;
473#endif
474
475#else
476 return false;
477#endif
478
479#else
480 return sizeof(RT_VALUE_TYPE) <= sizeof(RT_PTR_ALLOC);
481#endif
482}
483
484/*
485 * Symbols for maximum possible fanout are declared first as they are
486 * required to declare each node kind. Declarations of the other fanout
487 * values follow, as they depend on the struct sizes of each node kind.
488 */
489
490/* max possible key chunks without struct padding */
491#define RT_FANOUT_4_MAX (8 - sizeof(RT_NODE))
492
493/* equal to two 128-bit SIMD registers, regardless of availability */
494#define RT_FANOUT_16_MAX 32
495
496/*
497 * This also determines the number of bits necessary for the isset array,
498 * so we need to be mindful of the size of bitmapword. Since bitmapword
499 * can be 64 bits, the only values that make sense here are 64 and 128.
500 * The ART paper uses at most 64 for this node kind, and one advantage
501 * for us is that "isset" is a single bitmapword on most platforms,
502 * rather than an array, allowing the compiler to get rid of loops.
503 */
504#define RT_FANOUT_48_MAX 64
505
506#define RT_FANOUT_256 RT_NODE_MAX_SLOTS
507
508/*
509 * Node structs, one for each "kind"
510 */
511
512/*
513 * node4 and node16 use one array for key chunks and another
514 * array of the same length for children. The keys and children
515 * are stored at corresponding positions, sorted by chunk.
516 */
517
518typedef struct RT_NODE_4
519{
520 RT_NODE base;
521
522 uint8 chunks[RT_FANOUT_4_MAX];
523
524 /* number of children depends on size class */
525 RT_PTR_ALLOC children[FLEXIBLE_ARRAY_MEMBER];
526} RT_NODE_4;
527
528typedef struct RT_NODE_16
529{
530 RT_NODE base;
531
532 uint8 chunks[RT_FANOUT_16_MAX];
533
534 /* number of children depends on size class */
535 RT_PTR_ALLOC children[FLEXIBLE_ARRAY_MEMBER];
536} RT_NODE_16;
537
538/*
539 * node48 uses a 256-element array indexed by key chunks. This array
540 * stores indexes into a second array containing the children.
541 */
542typedef struct RT_NODE_48
543{
544 RT_NODE base;
545
546 /* bitmap to track which slots are in use */
547 bitmapword isset[RT_BM_IDX(RT_FANOUT_48_MAX)];
548
549 /*
550 * Lookup table for indexes into the children[] array. We make this the
551 * last fixed-size member so that it's convenient to memset separately
552 * from the previous members.
553 */
554 uint8 slot_idxs[RT_NODE_MAX_SLOTS];
555
556/* Invalid index */
557#define RT_INVALID_SLOT_IDX 0xFF
558
559 /* number of children depends on size class */
560 RT_PTR_ALLOC children[FLEXIBLE_ARRAY_MEMBER];
561} RT_NODE_48;
562
563/*
564 * node256 is the largest node type. This node has an array of
565 * children directly indexed by chunk. Unlike other node kinds,
566 * its array size is by definition fixed.
567 */
568typedef struct RT_NODE_256
569{
570 RT_NODE base;
571
572 /* bitmap to track which slots are in use */
573 bitmapword isset[RT_BM_IDX(RT_FANOUT_256)];
574
575 /* slots for 256 children */
576 RT_PTR_ALLOC children[RT_FANOUT_256];
577} RT_NODE_256;
578
579#if defined(RT_SHMEM)
580/*
581 * Make sure that all nodes (except for node256) fit neatly into a DSA
582 * size class. We assume RT_FANOUT_4 is in the range where DSA size
583 * classes increment by 8 (as of PG17 up to 64 bytes), so we just hard
584 * code that one.
585 */
586
587#if SIZEOF_DSA_POINTER < 8
588#define RT_FANOUT_16_LO ((96 - offsetof(RT_NODE_16, children)) / sizeof(RT_PTR_ALLOC))
589#define RT_FANOUT_16_HI Min(RT_FANOUT_16_MAX, (160 - offsetof(RT_NODE_16, children)) / sizeof(RT_PTR_ALLOC))
590#define RT_FANOUT_48 Min(RT_FANOUT_48_MAX, (512 - offsetof(RT_NODE_48, children)) / sizeof(RT_PTR_ALLOC))
591#else
592#define RT_FANOUT_16_LO ((160 - offsetof(RT_NODE_16, children)) / sizeof(RT_PTR_ALLOC))
593#define RT_FANOUT_16_HI Min(RT_FANOUT_16_MAX, (320 - offsetof(RT_NODE_16, children)) / sizeof(RT_PTR_ALLOC))
594#define RT_FANOUT_48 Min(RT_FANOUT_48_MAX, (768 - offsetof(RT_NODE_48, children)) / sizeof(RT_PTR_ALLOC))
595#endif /* SIZEOF_DSA_POINTER < 8 */
596
597#else /* ! RT_SHMEM */
598
599/* doesn't really matter, but may as well use the namesake */
600#define RT_FANOUT_16_LO 16
601/* use maximum possible */
602#define RT_FANOUT_16_HI RT_FANOUT_16_MAX
603#define RT_FANOUT_48 RT_FANOUT_48_MAX
604
605#endif /* RT_SHMEM */
606
607/*
608 * To save memory in trees with sparse keys, it would make sense to have two
609 * size classes for the smallest kind (perhaps a high class of 5 and a low class
610 * of 2), but it would be more effective to utilize lazy expansion and
611 * path compression.
612 */
613#define RT_FANOUT_4 4
614
615StaticAssertDecl(RT_FANOUT_4 <= RT_FANOUT_4_MAX, "watch struct padding");
616StaticAssertDecl(RT_FANOUT_16_LO < RT_FANOUT_16_HI, "LO subclass bigger than HI");
617StaticAssertDecl(RT_FANOUT_48 <= RT_FANOUT_48_MAX, "more slots than isset bits");
618
619/*
620 * Node size classes
621 *
622 * Nodes of different kinds necessarily belong to different size classes.
623 * One innovation in our implementation compared to the ART paper is
624 * decoupling the notion of size class from kind.
625 *
626 * The size classes within a given node kind have the same underlying
627 * type, but a variable number of children/values. This is possible
628 * because each type (except node256) contains metadata that work the
629 * same way regardless of how many child slots there are. The nodes
630 * can introspect their allocated capacity at runtime.
631 */
632typedef enum RT_SIZE_CLASS
633{
634 RT_CLASS_4 = 0,
635 RT_CLASS_16_LO,
636 RT_CLASS_16_HI,
637 RT_CLASS_48,
638 RT_CLASS_256,
639} RT_SIZE_CLASS;
640
641/* Information for each size class */
642typedef struct RT_SIZE_CLASS_ELEM
643{
644 const char *name;
645 int fanout;
646 size_t allocsize;
647} RT_SIZE_CLASS_ELEM;
648
649
650static const RT_SIZE_CLASS_ELEM RT_SIZE_CLASS_INFO[] = {
651 [RT_CLASS_4] = {
652 .name = RT_STR(RT_PREFIX) "_radix_tree node4",
653 .fanout = RT_FANOUT_4,
654 .allocsize = sizeof(RT_NODE_4) + RT_FANOUT_4 * sizeof(RT_PTR_ALLOC),
655 },
656 [RT_CLASS_16_LO] = {
657 .name = RT_STR(RT_PREFIX) "_radix_tree node16_lo",
658 .fanout = RT_FANOUT_16_LO,
659 .allocsize = sizeof(RT_NODE_16) + RT_FANOUT_16_LO * sizeof(RT_PTR_ALLOC),
660 },
661 [RT_CLASS_16_HI] = {
662 .name = RT_STR(RT_PREFIX) "_radix_tree node16_hi",
663 .fanout = RT_FANOUT_16_HI,
664 .allocsize = sizeof(RT_NODE_16) + RT_FANOUT_16_HI * sizeof(RT_PTR_ALLOC),
665 },
666 [RT_CLASS_48] = {
667 .name = RT_STR(RT_PREFIX) "_radix_tree node48",
668 .fanout = RT_FANOUT_48,
669 .allocsize = sizeof(RT_NODE_48) + RT_FANOUT_48 * sizeof(RT_PTR_ALLOC),
670 },
671 [RT_CLASS_256] = {
672 .name = RT_STR(RT_PREFIX) "_radix_tree node256",
673 .fanout = RT_FANOUT_256,
674 .allocsize = sizeof(RT_NODE_256),
675 },
676};
677
678#define RT_NUM_SIZE_CLASSES lengthof(RT_SIZE_CLASS_INFO)
679
680#ifdef RT_SHMEM
681/* A magic value used to identify our radix tree */
682#define RT_RADIX_TREE_MAGIC 0x54A48167
683#endif
684
685/* Contains the actual tree, plus ancillary info */
686typedef struct RT_RADIX_TREE_CONTROL
687{
688#ifdef RT_SHMEM
689 RT_HANDLE handle;
690 uint32 magic;
691 LWLock lock;
692#endif
693
694 RT_PTR_ALLOC root;
695 uint64 max_val;
696 int64 num_keys;
697 int start_shift;
698
699 /* statistics */
700#ifdef RT_DEBUG
701 int64 num_nodes[RT_NUM_SIZE_CLASSES];
702 int64 num_leaves;
703#endif
704} RT_RADIX_TREE_CONTROL;
705
706/* Entry point for allocating and accessing the tree */
707typedef struct RT_RADIX_TREE
708{
709 /* pointing to either local memory or DSA */
710 RT_RADIX_TREE_CONTROL *ctl;
711
712#ifdef RT_SHMEM
713 dsa_area *dsa;
714#else
715 MemoryContextData *node_slabs[RT_NUM_SIZE_CLASSES];
716
717 /* leaf_context is used only for single-value leaves */
718 MemoryContext leaf_context;
719#endif
720};
721
722/*
723 * Iteration support.
724 *
725 * Iterating over the radix tree produces each key/value pair in ascending
726 * order of the key.
727 */
728
729/* state for iterating over a single node */
730typedef struct RT_NODE_ITER
731{
732 RT_CHILD_PTR node;
733
734 /*
735 * The next index of the chunk array in RT_NODE_KIND_4 and RT_NODE_KIND_16
736 * nodes, or the next chunk in RT_NODE_KIND_48 and RT_NODE_KIND_256 nodes.
737 * 0 for the initial value.
738 */
739 int idx;
740} RT_NODE_ITER;
741
742/* state for iterating over the whole radix tree */
743typedef struct RT_ITER
744{
745 RT_RADIX_TREE *tree;
746
747 /*
748 * A stack to track iteration for each level. Level 0 is the lowest (or
749 * leaf) level
750 */
751 RT_NODE_ITER node_iters[RT_MAX_LEVEL];
752 int top_level;
753 int cur_level;
754
755 /* The key constructed during iteration */
756 uint64 key;
757};
758
759
760/* verification (available only in assert-enabled builds) */
761static void RT_VERIFY_NODE(RT_NODE * node);
762
763static inline void
764RT_PTR_SET_LOCAL(RT_RADIX_TREE * tree, RT_CHILD_PTR * node)
765{
766#ifdef RT_SHMEM
767 node->local = dsa_get_address(tree->dsa, node->alloc);
768#endif
769}
770
771/* Convenience functions for node48 and node256 */
772
773/* Return true if there is an entry for "chunk" */
774static inline bool
775RT_NODE_48_IS_CHUNK_USED(RT_NODE_48 * node, uint8 chunk)
776{
777 return node->slot_idxs[chunk] != RT_INVALID_SLOT_IDX;
778}
779
780static inline RT_PTR_ALLOC *
781RT_NODE_48_GET_CHILD(RT_NODE_48 * node, uint8 chunk)
782{
783 return &node->children[node->slot_idxs[chunk]];
784}
785
786/* Return true if there is an entry for "chunk" */
787static inline bool
788RT_NODE_256_IS_CHUNK_USED(RT_NODE_256 * node, uint8 chunk)
789{
790 int idx = RT_BM_IDX(chunk);
791 int bitnum = RT_BM_BIT(chunk);
792
793 return (node->isset[idx] & ((bitmapword) 1 << bitnum)) != 0;
794}
795
796static inline RT_PTR_ALLOC *
797RT_NODE_256_GET_CHILD(RT_NODE_256 * node, uint8 chunk)
798{
799 Assert(RT_NODE_256_IS_CHUNK_USED(node, chunk));
800 return &node->children[chunk];
801}
802
803/*
804 * Return the smallest shift that allows storing the given key.
805 */
806static inline int
807RT_KEY_GET_SHIFT(uint64 key)
808{
809 if (key == 0)
810 return 0;
811 else
812 return pg_leftmost_one_pos64(key) / RT_SPAN * RT_SPAN;
813}
814
815/*
816 * Return the max value that can be stored in the tree with the given shift.
817 */
818static uint64
819RT_SHIFT_GET_MAX_VAL(int shift)
820{
821 if (shift == RT_MAX_SHIFT)
822 return UINT64_MAX;
823 else
824 return (UINT64CONST(1) << (shift + RT_SPAN)) - 1;
825}
826
827/*
828 * Allocate a new node with the given node kind and size class.
829 */
830static inline RT_CHILD_PTR
831RT_ALLOC_NODE(RT_RADIX_TREE * tree, const uint8 kind, const RT_SIZE_CLASS size_class)
832{
833 RT_CHILD_PTR allocnode;
834 RT_NODE *node;
835 size_t allocsize;
836
837 allocsize = RT_SIZE_CLASS_INFO[size_class].allocsize;
838
839#ifdef RT_SHMEM
840 allocnode.alloc = dsa_allocate(tree->dsa, allocsize);
841#else
842 allocnode.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->node_slabs[size_class],
843 allocsize);
844#endif
845
846 RT_PTR_SET_LOCAL(tree, &allocnode);
847 node = allocnode.local;
848
849 /* initialize contents */
850
851 switch (kind)
852 {
853 case RT_NODE_KIND_4:
854 memset(node, 0, offsetof(RT_NODE_4, children));
855 break;
856 case RT_NODE_KIND_16:
857 memset(node, 0, offsetof(RT_NODE_16, children));
858 break;
859 case RT_NODE_KIND_48:
860 {
861 RT_NODE_48 *n48 = (RT_NODE_48 *) node;
862
863 memset(n48, 0, offsetof(RT_NODE_48, slot_idxs));
864 memset(n48->slot_idxs, RT_INVALID_SLOT_IDX, sizeof(n48->slot_idxs));
865 break;
866 }
867 case RT_NODE_KIND_256:
868 memset(node, 0, offsetof(RT_NODE_256, children));
869 break;
870 default:
871 pg_unreachable();
872 }
873
874 node->kind = kind;
875
876 /*
877 * For node256, this will actually overflow to zero, but that's okay
878 * because that node doesn't need to introspect this value.
879 */
880 node->fanout = RT_SIZE_CLASS_INFO[size_class].fanout;
881
882#ifdef RT_DEBUG
883 /* update the statistics */
884 tree->ctl->num_nodes[size_class]++;
885#endif
886
887 return allocnode;
888}
889
890/*
891 * Allocate a new leaf.
892 */
893static RT_CHILD_PTR
894RT_ALLOC_LEAF(RT_RADIX_TREE * tree, size_t allocsize)
895{
896 RT_CHILD_PTR leaf;
897
898#ifdef RT_SHMEM
899 leaf.alloc = dsa_allocate(tree->dsa, allocsize);
900 RT_PTR_SET_LOCAL(tree, &leaf);
901#else
902 leaf.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->leaf_context, allocsize);
903#endif
904
905#ifdef RT_DEBUG
906 tree->ctl->num_leaves++;
907#endif
908
909 return leaf;
910}
911
912/*
913 * Copy relevant members of the node header.
914 * This is a separate function in case other fields are added.
915 */
916static inline void
917RT_COPY_COMMON(RT_CHILD_PTR newnode, RT_CHILD_PTR oldnode)
918{
919 (newnode.local)->count = (oldnode.local)->count;
920}
921
922/* Free the given node */
923static void
924RT_FREE_NODE(RT_RADIX_TREE * tree, RT_CHILD_PTR node)
925{
926#ifdef RT_DEBUG
927 int i;
928
929 /* update the statistics */
930
931 for (i = 0; i < RT_NUM_SIZE_CLASSES; i++)
932 {
933 if ((node.local)->fanout == RT_SIZE_CLASS_INFO[i].fanout)
934 break;
935 }
936
937 /*
938 * The fanout of node256 will appear to be zero within the node header
939 * because of overflow, so we need an extra check here.
940 */
941 if (i == RT_NUM_SIZE_CLASSES)
942 i = RT_CLASS_256;
943
944 tree->ctl->num_nodes[i]--;
945 Assert(tree->ctl->num_nodes[i] >= 0);
946#endif
947
948#ifdef RT_SHMEM
949 dsa_free(tree->dsa, node.alloc);
950#else
951 pfree(node.alloc);
952#endif
953}
954
955static inline void
956RT_FREE_LEAF(RT_RADIX_TREE * tree, RT_PTR_ALLOC leaf)
957{
958 Assert(leaf != tree->ctl->root);
959
960#ifdef RT_DEBUG
961 /* update the statistics */
962 tree->ctl->num_leaves--;
963 Assert(tree->ctl->num_leaves >= 0);
964#endif
965
966#ifdef RT_SHMEM
967 dsa_free(tree->dsa, leaf);
968#else
969 pfree(leaf);
970#endif
971}
972
973/***************** SEARCH *****************/
974
975/*
976 * Return the address of the child corresponding to "chunk",
977 * or NULL if there is no such element.
978 */
979static inline RT_PTR_ALLOC *
980RT_NODE_16_SEARCH_EQ(RT_NODE_16 * node, uint8 chunk)
981{
982 int count = node->base.count;
983#ifndef USE_NO_SIMD
984 Vector8 spread_chunk;
985 Vector8 haystack1;
986 Vector8 haystack2;
987 Vector8 cmp1;
988 Vector8 cmp2;
989 uint32 bitfield;
990 RT_PTR_ALLOC *slot_simd = NULL;
991#endif
992
993#if defined(USE_NO_SIMD) || defined(USE_ASSERT_CHECKING)
994 RT_PTR_ALLOC *slot = NULL;
995
996 for (int i = 0; i < count; i++)
997 {
998 if (node->chunks[i] == chunk)
999 {
1000 slot = &node->children[i];
1001 break;
1002 }
1003 }
1004#endif
1005
1006#ifndef USE_NO_SIMD
1007 /* replicate the search key */
1008 spread_chunk = vector8_broadcast(chunk);
1009
1010 /* compare to all 32 keys stored in the node */
1011 vector8_load(&haystack1, &node->chunks[0]);
1012 vector8_load(&haystack2, &node->chunks[sizeof(Vector8)]);
1013 cmp1 = vector8_eq(spread_chunk, haystack1);
1014 cmp2 = vector8_eq(spread_chunk, haystack2);
1015
1016 /* convert comparison to a bitfield */
1017 bitfield = vector8_highbit_mask(cmp1) | (vector8_highbit_mask(cmp2) << sizeof(Vector8));
1018
1019 /* mask off invalid entries */
1020 bitfield &= ((UINT64CONST(1) << count) - 1);
1021
1022 /* convert bitfield to index by counting trailing zeros */
1023 if (bitfield)
1024 slot_simd = &node->children[pg_rightmost_one_pos32(bitfield)];
1025
1026 Assert(slot_simd == slot);
1027 return slot_simd;
1028#else
1029 return slot;
1030#endif
1031}
1032
1033/*
1034 * Search for the child pointer corresponding to "key" in the given node.
1035 *
1036 * Return child if the key is found, otherwise return NULL.
1037 */
1038static inline RT_PTR_ALLOC *
1039RT_NODE_SEARCH(RT_NODE * node, uint8 chunk)
1040{
1041 /* Make sure we already converted to local pointer */
1042 Assert(node != NULL);
1043
1044 switch (node->kind)
1045 {
1046 case RT_NODE_KIND_4:
1047 {
1048 RT_NODE_4 *n4 = (RT_NODE_4 *) node;
1049
1050 for (int i = 0; i < n4->base.count; i++)
1051 {
1052 if (n4->chunks[i] == chunk)
1053 return &n4->children[i];
1054 }
1055 return NULL;
1056 }
1057 case RT_NODE_KIND_16:
1058 return RT_NODE_16_SEARCH_EQ((RT_NODE_16 *) node, chunk);
1059 case RT_NODE_KIND_48:
1060 {
1061 RT_NODE_48 *n48 = (RT_NODE_48 *) node;
1062 int slotpos = n48->slot_idxs[chunk];
1063
1064 if (slotpos == RT_INVALID_SLOT_IDX)
1065 return NULL;
1066
1067 return RT_NODE_48_GET_CHILD(n48, chunk);
1068 }
1069 case RT_NODE_KIND_256:
1070 {
1071 RT_NODE_256 *n256 = (RT_NODE_256 *) node;
1072
1073 if (!RT_NODE_256_IS_CHUNK_USED(n256, chunk))
1074 return NULL;
1075
1076 return RT_NODE_256_GET_CHILD(n256, chunk);
1077 }
1078 default:
 1079 pg_unreachable();
 1080 }
1081}
1082
1083/*
 1084 * Search for the given key in the radix tree. Return a pointer to the value if found,
1085 * otherwise return NULL.
1086 *
1087 * Since the function returns a pointer (to support variable-length values),
1088 * the caller is responsible for locking until it's finished with the value.
1089 */
1090RT_SCOPE RT_VALUE_TYPE *
1091RT_FIND(RT_RADIX_TREE * tree, uint64 key)
1092{
1093 RT_CHILD_PTR node;
1094 RT_PTR_ALLOC *slot = NULL;
1095 int shift;
1096
1097#ifdef RT_SHMEM
1098 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1099#endif
1100
1101 if (key > tree->ctl->max_val)
1102 return NULL;
1103
1104 Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
1105 node.alloc = tree->ctl->root;
1106 shift = tree->ctl->start_shift;
1107
1108 /* Descend the tree */
1109 while (shift >= 0)
1110 {
1111 RT_PTR_SET_LOCAL(tree, &node);
1112 slot = RT_NODE_SEARCH(node.local, RT_GET_KEY_CHUNK(key, shift));
1113 if (slot == NULL)
1114 return NULL;
1115
1116 node.alloc = *slot;
1117 shift -= RT_SPAN;
1118 }
1119
1120 if (RT_CHILDPTR_IS_VALUE(*slot))
1121 return (RT_VALUE_TYPE *) slot;
1122 else
1123 {
1124 RT_PTR_SET_LOCAL(tree, &node);
1125 return (RT_VALUE_TYPE *) node.local;
1126 }
1127}
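RT_FIND descends by peeling one chunk off the key per level. Assuming a span of 8 bits (the template's `RT_SPAN` and `RT_GET_KEY_CHUNK` generalize this), the arithmetic is just a shift and a mask:

```c
#include <assert.h>
#include <stdint.h>

#define SPAN 8                          /* bits consumed per level */
#define CHUNK_MASK ((1u << SPAN) - 1)   /* low SPAN bits */

/* Array index used at the node whose level is "shift" */
static inline uint8_t get_key_chunk(uint64_t key, int shift)
{
    return (uint8_t) ((key >> shift) & CHUNK_MASK);
}
```

A lookup for key `0x0102030405060708` thus visits child 0x01 at shift 56, 0x02 at shift 48, and so on, subtracting the span each step exactly as the descent loop above does.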
1128
1129/***************** INSERTION *****************/
1130
1131#define RT_NODE_MUST_GROW(node) \
1132 ((node)->count == (node)->fanout)
1133
1134/*
1135 * Return index of the chunk and slot arrays for inserting into the node,
1136 * such that the arrays remain ordered.
1137 */
1138static inline int
1139RT_NODE_4_GET_INSERTPOS(RT_NODE_4 * node, uint8 chunk, int count)
1140{
1141 int idx;
1142
1143 for (idx = 0; idx < count; idx++)
1144 {
1145 if (node->chunks[idx] >= chunk)
1146 break;
1147 }
1148
1149 return idx;
1150}
1151
1152/*
1153 * Return index of the chunk and slot arrays for inserting into the node,
1154 * such that the arrays remain ordered.
1155 */
1156static inline int
1157RT_NODE_16_GET_INSERTPOS(RT_NODE_16 * node, uint8 chunk)
1158{
1159 int count = node->base.count;
1160#if defined(USE_NO_SIMD) || defined(USE_ASSERT_CHECKING)
1161 int index;
1162#endif
1163
1164#ifndef USE_NO_SIMD
1165 Vector8 spread_chunk;
1166 Vector8 haystack1;
1167 Vector8 haystack2;
1168 Vector8 cmp1;
1169 Vector8 cmp2;
1170 Vector8 min1;
1171 Vector8 min2;
1172 uint32 bitfield;
1173 int index_simd;
1174#endif
1175
1176 /*
1177 * First compare the last element. There are two reasons to branch here:
1178 *
1179 * 1) A realistic pattern is inserting ordered keys. In that case,
1180 * non-SIMD platforms must do a linear search to the last chunk to find
1181 * the insert position. This will get slower as the node fills up.
1182 *
1183 * 2) On SIMD platforms, we must branch anyway to make sure we don't bit
1184 * scan an empty bitfield. Doing the branch here eliminates some work that
1185 * we might otherwise throw away.
1186 */
1187 Assert(count > 0);
1188 if (node->chunks[count - 1] < chunk)
1189 return count;
1190
1191#if defined(USE_NO_SIMD) || defined(USE_ASSERT_CHECKING)
1192
1193 for (index = 0; index < count; index++)
1194 {
1195 if (node->chunks[index] > chunk)
1196 break;
1197 }
1198#endif
1199
1200#ifndef USE_NO_SIMD
1201
1202 /*
1203 * This is a bit more complicated than RT_NODE_16_SEARCH_EQ(), because no
1204 * unsigned uint8 comparison instruction exists, at least for SSE2. So we
1205 * need to play some trickery using vector8_min() to effectively get >=.
1206 * There'll never be any equal elements in current uses, but that's what
1207 * we get here...
1208 */
1209 spread_chunk = vector8_broadcast(chunk);
1210 vector8_load(&haystack1, &node->chunks[0]);
1211 vector8_load(&haystack2, &node->chunks[sizeof(Vector8)]);
1212 min1 = vector8_min(spread_chunk, haystack1);
1213 min2 = vector8_min(spread_chunk, haystack2);
1214 cmp1 = vector8_eq(spread_chunk, min1);
1215 cmp2 = vector8_eq(spread_chunk, min2);
1216 bitfield = vector8_highbit_mask(cmp1) | (vector8_highbit_mask(cmp2) << sizeof(Vector8));
1217
1218 Assert((bitfield & ((UINT64CONST(1) << count) - 1)) != 0);
1219 index_simd = pg_rightmost_one_pos32(bitfield);
1220
1221 Assert(index_simd == index);
1222 return index_simd;
1223#else
1224 return index;
1225#endif
1226}
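The `vector8_min()` trick above leans on the identity `min(a, b) == a ⇔ a <= b`, which substitutes for the unsigned byte comparison SSE2 lacks. Checked one lane at a time in scalar form:

```c
#include <assert.h>
#include <stdint.h>

static inline uint8_t u8min(uint8_t a, uint8_t b)
{
    return a < b ? a : b;               /* one lane of vector8_min() */
}

/* First index whose key is >= chunk, tested only via min and equality;
 * returns count when chunk is greater than every key. */
static int insertpos_via_min(const uint8_t *keys, int count, uint8_t chunk)
{
    for (int i = 0; i < count; i++)
    {
        if (u8min(chunk, keys[i]) == chunk)     /* i.e. keys[i] >= chunk */
            return i;
    }
    return count;
}
```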
1227
1228/* Shift the elements right at "insertpos" by one */
1229static inline void
1230RT_SHIFT_ARRAYS_FOR_INSERT(uint8 *chunks, RT_PTR_ALLOC * children, int count, int insertpos)
1231{
1232 /*
1233 * This is basically a memmove, but written in a simple loop for speed on
1234 * small inputs.
1235 */
1236 for (int i = count - 1; i >= insertpos; i--)
1237 {
1238 /* workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101481 */
1239#ifdef __GNUC__
1240 __asm__("");
1241#endif
1242 chunks[i + 1] = chunks[i];
1243 children[i + 1] = children[i];
1244 }
1245}
1246
1247/*
1248 * Copy both chunk and slot arrays into the right
1249 * place. The caller is responsible for inserting the new element.
1250 */
1251static inline void
1252RT_COPY_ARRAYS_FOR_INSERT(uint8 *dst_chunks, RT_PTR_ALLOC * dst_children,
 1253 uint8 *src_chunks, RT_PTR_ALLOC * src_children,
1254 int count, int insertpos)
1255{
1256 for (int i = 0; i < count; i++)
1257 {
1258 int sourceidx = i;
1259
1260 /* use a branch-free computation to skip the index of the new element */
1261 int destidx = i + (i >= insertpos);
1262
1263 dst_chunks[destidx] = src_chunks[sourceidx];
1264 dst_children[destidx] = src_children[sourceidx];
1265 }
1266}
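The `i + (i >= insertpos)` expression above bumps every destination index at or past the reserved slot by one, so the copy needs no branch inside the loop. In isolation:

```c
#include <assert.h>
#include <stdint.h>

/* Copy src[0..count-1] into dst while leaving dst[insertpos] free for
 * the new element, as the function above does for chunks and children. */
static void copy_with_gap(uint8_t *dst, const uint8_t *src, int count, int insertpos)
{
    for (int i = 0; i < count; i++)
        dst[i + (i >= insertpos)] = src[i];
}
```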
1267
1268static inline RT_PTR_ALLOC *
1269RT_ADD_CHILD_256(RT_RADIX_TREE * tree, RT_CHILD_PTR node, uint8 chunk)
1270{
1271 RT_NODE_256 *n256 = (RT_NODE_256 *) node.local;
1272 int idx = RT_BM_IDX(chunk);
1273 int bitnum = RT_BM_BIT(chunk);
1274
1275 /* Mark the slot used for "chunk" */
1276 n256->isset[idx] |= ((bitmapword) 1 << bitnum);
1277
1278 n256->base.count++;
1279 RT_VERIFY_NODE((RT_NODE *) n256);
1280
1281 return RT_NODE_256_GET_CHILD(n256, chunk);
1282}
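The `RT_BM_IDX`/`RT_BM_BIT` pair used above splits a chunk into a word index and a bit position inside the `isset` bitmap. A standalone sketch with fixed 64-bit words (the real `bitmapword` width is platform-dependent):

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_WORD 64
#define BM_IDX(x) ((x) / BITS_PER_WORD) /* which word */
#define BM_BIT(x) ((x) % BITS_PER_WORD) /* which bit within it */

static uint64_t isset[256 / BITS_PER_WORD];     /* one bit per possible chunk */

static void mark_used(int chunk)
{
    isset[BM_IDX(chunk)] |= UINT64_C(1) << BM_BIT(chunk);
}

static int is_used(int chunk)
{
    return (isset[BM_IDX(chunk)] >> BM_BIT(chunk)) & 1;
}
```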
1283
1284static pg_noinline RT_PTR_ALLOC *
1285RT_GROW_NODE_48(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node,
 1286 uint8 chunk)
1287{
1288 RT_NODE_48 *n48 = (RT_NODE_48 *) node.local;
1289 RT_CHILD_PTR newnode;
1290 RT_NODE_256 *new256;
1291 int i = 0;
1292
1293 /* initialize new node */
 1294 newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_256, RT_CLASS_256);
 1295 new256 = (RT_NODE_256 *) newnode.local;
1296
1297 /* copy over the entries */
1298 RT_COPY_COMMON(newnode, node);
1299 for (int word_num = 0; word_num < RT_BM_IDX(RT_NODE_MAX_SLOTS); word_num++)
1300 {
1301 bitmapword bitmap = 0;
1302
1303 /*
1304 * Bit manipulation is a surprisingly large portion of the overhead in
1305 * the naive implementation. Doing stores word-at-a-time removes a lot
1306 * of that overhead.
1307 */
1308 for (int bit = 0; bit < BITS_PER_BITMAPWORD; bit++)
1309 {
1310 uint8 offset = n48->slot_idxs[i];
1311
1312 if (offset != RT_INVALID_SLOT_IDX)
1313 {
1314 bitmap |= ((bitmapword) 1 << bit);
1315 new256->children[i] = n48->children[offset];
1316 }
1317
1318 i++;
1319 }
1320
1321 new256->isset[word_num] = bitmap;
1322 }
1323
1324 /* free old node and update reference in parent */
1325 *parent_slot = newnode.alloc;
1326 RT_FREE_NODE(tree, node);
1327
1328 return RT_ADD_CHILD_256(tree, newnode, chunk);
1329}
1330
1331static inline RT_PTR_ALLOC *
1332RT_ADD_CHILD_48(RT_RADIX_TREE * tree, RT_CHILD_PTR node, uint8 chunk)
1333{
1334 RT_NODE_48 *n48 = (RT_NODE_48 *) node.local;
1335 int insertpos;
1336 int idx = 0;
1337 bitmapword w,
1338 inverse;
1339
1340 /* get the first word with at least one bit not set */
1341 for (int i = 0; i < RT_BM_IDX(RT_FANOUT_48_MAX); i++)
1342 {
1343 w = n48->isset[i];
1344 if (w < ~((bitmapword) 0))
1345 {
1346 idx = i;
1347 break;
1348 }
1349 }
1350
1351 /* To get the first unset bit in w, get the first set bit in ~w */
1352 inverse = ~w;
1353 insertpos = idx * BITS_PER_BITMAPWORD;
1354 insertpos += bmw_rightmost_one_pos(inverse);
1355 Assert(insertpos < n48->base.fanout);
1356
1357 /* mark the slot used by setting the rightmost zero bit */
1358 n48->isset[idx] |= w + 1;
1359
1360 /* insert new chunk into place */
1361 n48->slot_idxs[chunk] = insertpos;
1362
1363 n48->base.count++;
1364 RT_VERIFY_NODE((RT_NODE *) n48);
1365
1366 return &n48->children[insertpos];
1367}
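The slot allocation above finds the first zero bit by bit-scanning `~w`, then sets it with `w + 1`: adding one to a word carries through its trailing one bits and lands exactly on the rightmost zero, so `w | (w + 1)` marks that slot. A sketch of the same scheme:

```c
#include <assert.h>
#include <stdint.h>

/* Return the first free position in a bitmap of nwords 64-bit words,
 * marking it used; -1 when the bitmap is full. */
static int alloc_slot(uint64_t *isset, int nwords)
{
    for (int i = 0; i < nwords; i++)
    {
        uint64_t w = isset[i];

        if (w != ~UINT64_C(0))          /* this word has a free bit */
        {
            int pos = __builtin_ctzll(~w);      /* first zero of w */

            isset[i] |= w + 1;          /* sets exactly that zero bit */
            return i * 64 + pos;
        }
    }
    return -1;
}
```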
1368
1369static pg_noinline RT_PTR_ALLOC *
1370RT_GROW_NODE_16(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node,
 1371 uint8 chunk)
1372{
1373 RT_NODE_16 *n16 = (RT_NODE_16 *) node.local;
1374 int insertpos;
1375
1376 if (n16->base.fanout < RT_FANOUT_16_HI)
1377 {
1378 RT_CHILD_PTR newnode;
1379 RT_NODE_16 *new16;
1380
 1383 /* initialize new node */
 1384 newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_16, RT_CLASS_16_HI);
 1385 new16 = (RT_NODE_16 *) newnode.local;
1385 new16 = (RT_NODE_16 *) newnode.local;
1386
1387 /* copy over existing entries */
1388 RT_COPY_COMMON(newnode, node);
 1389 Assert(n16->base.count == RT_FANOUT_16_LO);
 1390 insertpos = RT_NODE_16_GET_INSERTPOS(n16, chunk);
 1391 RT_COPY_ARRAYS_FOR_INSERT(new16->chunks, new16->children,
 1392 n16->chunks, n16->children,
 1393 RT_FANOUT_16_LO, insertpos);
1394
1395 /* insert new chunk into place */
1396 new16->chunks[insertpos] = chunk;
1397
1398 new16->base.count++;
1399 RT_VERIFY_NODE((RT_NODE *) new16);
1400
1401 /* free old node and update references */
1402 RT_FREE_NODE(tree, node);
1403 *parent_slot = newnode.alloc;
1404
1405 return &new16->children[insertpos];
1406 }
1407 else
1408 {
1409 RT_CHILD_PTR newnode;
1410 RT_NODE_48 *new48;
1411 int idx,
1412 bit;
1413
1415
 1416 /* initialize new node */
 1417 newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_48, RT_CLASS_48);
 1418 new48 = (RT_NODE_48 *) newnode.local;
1419
1420 /* copy over the entries */
1421 RT_COPY_COMMON(newnode, node);
1422 for (int i = 0; i < RT_FANOUT_16_HI; i++)
1423 new48->slot_idxs[n16->chunks[i]] = i;
1424 memcpy(&new48->children[0], &n16->children[0], RT_FANOUT_16_HI * sizeof(new48->children[0]));
1425
1426 /*
1427 * Since we just copied a dense array, we can fill "isset" using a
1428 * single store, provided the length of that array is at most the
1429 * number of bits in a bitmapword.
1430 */
1432 new48->isset[0] = (bitmapword) (((uint64) 1 << RT_FANOUT_16_HI) - 1);
1433
1434 /* put the new child at the end of the copied entries */
1435 insertpos = RT_FANOUT_16_HI;
1436 idx = RT_BM_IDX(insertpos);
1437 bit = RT_BM_BIT(insertpos);
1438
1439 /* mark the slot used */
1440 new48->isset[idx] |= ((bitmapword) 1 << bit);
1441
1442 /* insert new chunk into place */
1443 new48->slot_idxs[chunk] = insertpos;
1444
1445 new48->base.count++;
1446 RT_VERIFY_NODE((RT_NODE *) new48);
1447
1448 /* free old node and update reference in parent */
1449 *parent_slot = newnode.alloc;
1450 RT_FREE_NODE(tree, node);
1451
1452 return &new48->children[insertpos];
1453 }
1454}
1455
1456static inline RT_PTR_ALLOC *
1457RT_ADD_CHILD_16(RT_RADIX_TREE * tree, RT_CHILD_PTR node, uint8 chunk)
1458{
1459 RT_NODE_16 *n16 = (RT_NODE_16 *) node.local;
1460 int insertpos = RT_NODE_16_GET_INSERTPOS(n16, chunk);
1461
1462 /* shift chunks and children */
 1463 RT_SHIFT_ARRAYS_FOR_INSERT(n16->chunks, n16->children,
 1464 n16->base.count, insertpos);
1465
1466 /* insert new chunk into place */
1467 n16->chunks[insertpos] = chunk;
1468
1469 n16->base.count++;
1470 RT_VERIFY_NODE((RT_NODE *) n16);
1471
1472 return &n16->children[insertpos];
1473}
1474
1475static pg_noinline RT_PTR_ALLOC *
1476RT_GROW_NODE_4(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node,
 1477 uint8 chunk)
1478{
1479 RT_NODE_4 *n4 = (RT_NODE_4 *) (node.local);
1480 RT_CHILD_PTR newnode;
1481 RT_NODE_16 *new16;
1482 int insertpos;
1483
1484 /* initialize new node */
 1485 newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_16, RT_CLASS_16_LO);
 1486 new16 = (RT_NODE_16 *) newnode.local;
1487
1488 /* copy over existing entries */
1489 RT_COPY_COMMON(newnode, node);
1490 Assert(n4->base.count == RT_FANOUT_4);
1491 insertpos = RT_NODE_4_GET_INSERTPOS(n4, chunk, RT_FANOUT_4);
 1492 RT_COPY_ARRAYS_FOR_INSERT(new16->chunks, new16->children,
 1493 n4->chunks, n4->children,
1494 RT_FANOUT_4, insertpos);
1495
1496 /* insert new chunk into place */
1497 new16->chunks[insertpos] = chunk;
1498
1499 new16->base.count++;
1500 RT_VERIFY_NODE((RT_NODE *) new16);
1501
1502 /* free old node and update reference in parent */
1503 *parent_slot = newnode.alloc;
1504 RT_FREE_NODE(tree, node);
1505
1506 return &new16->children[insertpos];
1507}
1508
1509static inline RT_PTR_ALLOC *
1510RT_ADD_CHILD_4(RT_RADIX_TREE * tree, RT_CHILD_PTR node, uint8 chunk)
1511{
1512 RT_NODE_4 *n4 = (RT_NODE_4 *) (node.local);
1513 int count = n4->base.count;
1514 int insertpos = RT_NODE_4_GET_INSERTPOS(n4, chunk, count);
1515
1516 /* shift chunks and children */
 1517 RT_SHIFT_ARRAYS_FOR_INSERT(n4->chunks, n4->children,
 1518 count, insertpos);
1519
1520 /* insert new chunk into place */
1521 n4->chunks[insertpos] = chunk;
1522
1523 n4->base.count++;
1524 RT_VERIFY_NODE((RT_NODE *) n4);
1525
1526 return &n4->children[insertpos];
1527}
1528
1529/*
1530 * Reserve slot in "node"'s child array. The caller will populate it
1531 * with the actual child pointer.
1532 *
1533 * "parent_slot" is the address of the parent's child pointer to "node".
1534 * If the node we're inserting into needs to grow, we update the parent's
1535 * child pointer with the pointer to the new larger node.
1536 */
1537static inline RT_PTR_ALLOC *
1538RT_NODE_INSERT(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node,
 1539 uint8 chunk)
1540{
1541 RT_NODE *n = node.local;
1542
1543 switch (n->kind)
1544 {
1545 case RT_NODE_KIND_4:
1546 {
 1547 if (unlikely(RT_NODE_MUST_GROW(n)))
 1548 return RT_GROW_NODE_4(tree, parent_slot, node, chunk);
1549
1550 return RT_ADD_CHILD_4(tree, node, chunk);
1551 }
1552 case RT_NODE_KIND_16:
1553 {
 1554 if (unlikely(RT_NODE_MUST_GROW(n)))
 1555 return RT_GROW_NODE_16(tree, parent_slot, node, chunk);
1556
1557 return RT_ADD_CHILD_16(tree, node, chunk);
1558 }
1559 case RT_NODE_KIND_48:
1560 {
 1561 if (unlikely(RT_NODE_MUST_GROW(n)))
 1562 return RT_GROW_NODE_48(tree, parent_slot, node, chunk);
1563
1564 return RT_ADD_CHILD_48(tree, node, chunk);
1565 }
1566 case RT_NODE_KIND_256:
1567 return RT_ADD_CHILD_256(tree, node, chunk);
1568 default:
 1569 pg_unreachable();
 1570 }
1571}
1572
1573/*
1574 * The radix tree doesn't have sufficient height. Put new node(s) on top,
1575 * and move the old node below it.
1576 */
1577static pg_noinline void
1578RT_EXTEND_UP(RT_RADIX_TREE * tree, uint64 key)
1579{
1580 int target_shift = RT_KEY_GET_SHIFT(key);
1581 int shift = tree->ctl->start_shift;
1582
1583 Assert(shift < target_shift);
1584
1585 /* Grow tree upwards until start shift can accommodate the key */
1586 while (shift < target_shift)
1587 {
1588 RT_CHILD_PTR node;
1589 RT_NODE_4 *n4;
1590
 1591 node = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
 1592 n4 = (RT_NODE_4 *) node.local;
1593 n4->base.count = 1;
1594 n4->chunks[0] = 0;
1595 n4->children[0] = tree->ctl->root;
1596
1597 /* Update the root */
1598 tree->ctl->root = node.alloc;
1599
1600 shift += RT_SPAN;
1601 }
1602
1603 tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(target_shift);
1604 tree->ctl->start_shift = target_shift;
1605}
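RT_KEY_GET_SHIFT, used above to size the tree for a key, amounts to taking the position of the key's highest set bit and rounding it down to a multiple of the span. With an 8-bit span it can be sketched as:

```c
#include <assert.h>
#include <stdint.h>

#define SPAN 8

/* Shift of the topmost node needed to cover "key"; key 0 needs none. */
static int key_get_shift(uint64_t key)
{
    if (key == 0)
        return 0;
    return (63 - __builtin_clzll(key)) / SPAN * SPAN;
}
```

RT_EXTEND_UP then adds root nodes one span at a time until `start_shift` reaches this value.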
1606
1607/*
1608 * Insert a chain of nodes until we reach the lowest level,
1609 * and return the address of a slot to be filled further up
1610 * the call stack.
1611 */
1612static RT_PTR_ALLOC *
1613RT_EXTEND_DOWN(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, uint64 key, int shift)
1614{
1615 RT_CHILD_PTR node,
1616 child;
1617 RT_NODE_4 *n4;
1618
1619 /*
1620 * The child pointer of the first node in the chain goes in the
1621 * caller-provided slot.
1622 */
 1623 child = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
 1624 *parent_slot = child.alloc;
1625
1626 node = child;
1627 shift -= RT_SPAN;
1628
1629 while (shift > 0)
1630 {
 1631 child = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
 1632
1633 /* We open-code the insertion ourselves, for speed. */
1634 n4 = (RT_NODE_4 *) node.local;
1635 n4->base.count = 1;
1636 n4->chunks[0] = RT_GET_KEY_CHUNK(key, shift);
1637 n4->children[0] = child.alloc;
1638
1639 node = child;
1640 shift -= RT_SPAN;
1641 }
1642 Assert(shift == 0);
1643
1644 /* Reserve slot for the value. */
1645 n4 = (RT_NODE_4 *) node.local;
1646 n4->chunks[0] = RT_GET_KEY_CHUNK(key, 0);
1647 n4->base.count = 1;
1648
1649 return &n4->children[0];
1650}
1651
1652/*
1653 * Workhorse for RT_SET
1654 *
1655 * "parent_slot" is the address of the child pointer we just followed,
1656 * in the parent's array of children, needed if inserting into the
1657 * current node causes it to grow.
1658 */
1659static RT_PTR_ALLOC *
1660RT_GET_SLOT_RECURSIVE(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, uint64 key, int shift, bool *found)
1661{
1662 RT_PTR_ALLOC *slot;
1663 RT_CHILD_PTR node;
1664 uint8 chunk = RT_GET_KEY_CHUNK(key, shift);
1665
1666 node.alloc = *parent_slot;
1667 RT_PTR_SET_LOCAL(tree, &node);
1668 slot = RT_NODE_SEARCH(node.local, chunk);
1669
1670 if (slot == NULL)
1671 {
1672 *found = false;
1673
1674 /* reserve slot for the caller to populate */
1675
1676 slot = RT_NODE_INSERT(tree, parent_slot, node, chunk);
1677
1678 if (shift == 0)
1679 return slot;
1680 else
1681 return RT_EXTEND_DOWN(tree, slot, key, shift);
1682 }
1683 else
1684 {
1685 if (shift == 0)
1686 {
1687 *found = true;
1688 return slot;
1689 }
1690 else
1691 return RT_GET_SLOT_RECURSIVE(tree, slot, key, shift - RT_SPAN, found);
1692 }
1693}
1694
1695/*
 1696 * Set key to the value that value_p points to. If the entry already exists, we
 1697 * update its value and return true. Returns false if the entry doesn't yet exist.
1698 *
1699 * Taking a lock in exclusive mode is the caller's responsibility.
1700 */
1701RT_SCOPE bool
1702RT_SET(RT_RADIX_TREE * tree, uint64 key, RT_VALUE_TYPE * value_p)
1703{
1704 bool found;
1705 RT_PTR_ALLOC *slot;
1706 size_t value_sz = RT_GET_VALUE_SIZE(value_p);
1707
1708#ifdef RT_SHMEM
1709 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1710#endif
1711
1712 Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
1713
1714 /* Extend the tree if necessary */
1715 if (unlikely(key > tree->ctl->max_val))
1716 {
1717 if (tree->ctl->num_keys == 0)
1718 {
1719 RT_CHILD_PTR node;
1720 RT_NODE_4 *n4;
 1721 int start_shift = RT_KEY_GET_SHIFT(key);
 1722
1723 /*
1724 * With an empty root node, we don't extend the tree upwards,
1725 * since that would result in orphan empty nodes. Instead we open
1726 * code inserting into the root node and extend downward from
1727 * there.
1728 */
1729 node.alloc = tree->ctl->root;
1730 RT_PTR_SET_LOCAL(tree, &node);
1731 n4 = (RT_NODE_4 *) node.local;
1732 n4->base.count = 1;
 1733 n4->chunks[0] = RT_GET_KEY_CHUNK(key, start_shift);
 1734
1735 slot = RT_EXTEND_DOWN(tree, &n4->children[0], key, start_shift);
1736 found = false;
1737 tree->ctl->start_shift = start_shift;
1738 tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(start_shift);
1739 goto have_slot;
1740 }
1741 else
 1742 RT_EXTEND_UP(tree, key);
 1743 }
1744
1745 slot = RT_GET_SLOT_RECURSIVE(tree, &tree->ctl->root,
1746 key, tree->ctl->start_shift, &found);
1747
1748have_slot:
1749 Assert(slot != NULL);
1750
1751 if (RT_VALUE_IS_EMBEDDABLE(value_p))
1752 {
1753 /* free the existing leaf */
1754 if (found && !RT_CHILDPTR_IS_VALUE(*slot))
1755 RT_FREE_LEAF(tree, *slot);
1756
1757 /* store value directly in child pointer slot */
1758 memcpy(slot, value_p, value_sz);
1759
1760#ifdef RT_RUNTIME_EMBEDDABLE_VALUE
1761 /* tag child pointer */
1762#ifdef RT_SHMEM
1763 *slot |= 1;
1764#else
1765 *((uintptr_t *) slot) |= 1;
1766#endif
1767#endif
1768 }
1769 else
1770 {
1771 RT_CHILD_PTR leaf;
1772
1773 if (found && !RT_CHILDPTR_IS_VALUE(*slot))
1774 {
1776 leaf.alloc = *slot;
1777 RT_PTR_SET_LOCAL(tree, &leaf);
1778
1779 if (RT_GET_VALUE_SIZE((RT_VALUE_TYPE *) leaf.local) != value_sz)
1780 {
1781 /*
1782 * different sizes, so first free the existing leaf before
1783 * allocating a new one
1784 */
1785 RT_FREE_LEAF(tree, *slot);
1786 leaf = RT_ALLOC_LEAF(tree, value_sz);
1787 *slot = leaf.alloc;
1788 }
1789 }
1790 else
1791 {
1792 /* allocate new leaf and store it in the child array */
1793 leaf = RT_ALLOC_LEAF(tree, value_sz);
1794 *slot = leaf.alloc;
1795 }
1796
1797 memcpy(leaf.local, value_p, value_sz);
1798 }
1799
1800 /* Update the statistics */
1801 if (!found)
1802 tree->ctl->num_keys++;
1803
1804 return found;
1805}
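When the value fits, RT_SET stores it directly in the child-pointer slot and tags the lowest bit so readers (`RT_CHILDPTR_IS_VALUE`) can tell an embedded value from a leaf pointer, relying on allocations never being odd-aligned. A simplified variant of the idea, which shifts the payload above the tag bit rather than requiring its low bit to be clear as the template does:

```c
#include <assert.h>
#include <stdint.h>

/* Slots normally hold pointers, which are at least 2-byte aligned, so
 * bit 0 is free to mean "embedded value, not a pointer". */
static inline uintptr_t embed_value(uint32_t v)
{
    return ((uintptr_t) v << 1) | 1;    /* payload above the tag bit */
}

static inline int slot_is_value(uintptr_t slot)
{
    return slot & 1;                    /* like RT_CHILDPTR_IS_VALUE */
}

static inline uint32_t embedded_value(uintptr_t slot)
{
    return (uint32_t) (slot >> 1);
}
```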
1806
1807/***************** SETUP / TEARDOWN *****************/
1808
1809/*
1810 * Create the radix tree root in the caller's memory context and return it.
1811 *
1812 * The tree's nodes and leaves are allocated in "ctx" and its children for
1813 * local memory, or in "dsa" for shared memory.
1814 */
1815RT_SCOPE RT_RADIX_TREE *
1816#ifdef RT_SHMEM
1817RT_CREATE(dsa_area *dsa, int tranche_id)
1818#else
1819RT_CREATE(MemoryContext ctx)
1820#endif
1821{
 1822 RT_RADIX_TREE *tree;
 1823 RT_CHILD_PTR rootnode;
1824#ifdef RT_SHMEM
1825 dsa_pointer dp;
1826#endif
1827
 1828 tree = (RT_RADIX_TREE *) palloc0(sizeof(RT_RADIX_TREE));
 1829
1830#ifdef RT_SHMEM
1831 tree->dsa = dsa;
1832 dp = dsa_allocate0(dsa, sizeof(RT_RADIX_TREE_CONTROL));
1833 tree->ctl = (RT_RADIX_TREE_CONTROL *) dsa_get_address(dsa, dp);
1834 tree->ctl->handle = dp;
1835 tree->ctl->magic = RT_RADIX_TREE_MAGIC;
1836 LWLockInitialize(&tree->ctl->lock, tranche_id);
1837#else
 1838 tree->ctl = (RT_RADIX_TREE_CONTROL *) palloc0(sizeof(RT_RADIX_TREE_CONTROL));
 1839
1840 /* Create a slab context for each size class */
1841 for (int i = 0; i < RT_NUM_SIZE_CLASSES; i++)
1842 {
 1843 RT_SIZE_CLASS_ELEM size_class = RT_SIZE_CLASS_INFO[i];
 1844 size_t inner_blocksize = RT_SLAB_BLOCK_SIZE(size_class.allocsize);
1845
1846 tree->node_slabs[i] = SlabContextCreate(ctx,
1847 size_class.name,
1848 inner_blocksize,
1849 size_class.allocsize);
1850 }
1851
1852 tree->leaf_context = ctx;
1853#endif /* RT_SHMEM */
1854
1855 /* add root node now so that RT_SET can assume it exists */
 1856 rootnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
 1857 tree->ctl->root = rootnode.alloc;
1858 tree->ctl->start_shift = 0;
1859 tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(0);
1860
1861 return tree;
1862}
1863
1864#ifdef RT_SHMEM
1865RT_SCOPE RT_RADIX_TREE *
1866RT_ATTACH(dsa_area *dsa, RT_HANDLE handle)
1867{
 1868 RT_RADIX_TREE *tree;
 1869 dsa_pointer control;
1870
1871 tree = (RT_RADIX_TREE *) palloc0(sizeof(RT_RADIX_TREE));
1872
1873 /* Find the control object in shared memory */
1874 control = handle;
1875
1876 tree->dsa = dsa;
1877 tree->ctl = (RT_RADIX_TREE_CONTROL *) dsa_get_address(dsa, control);
1878 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1879
1880 return tree;
1881}
1882
1883RT_SCOPE void
1884RT_DETACH(RT_RADIX_TREE * tree)
1885{
1886 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1887 pfree(tree);
1888}
1889
1890RT_SCOPE RT_HANDLE
1891RT_GET_HANDLE(RT_RADIX_TREE * tree)
1892{
1893 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1894 return tree->ctl->handle;
1895}
1896
1897RT_SCOPE void
1898RT_LOCK_EXCLUSIVE(RT_RADIX_TREE * tree)
1899{
1900 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1901 LWLockAcquire(&tree->ctl->lock, LW_EXCLUSIVE);
1902}
1903
1904RT_SCOPE void
1905RT_LOCK_SHARE(RT_RADIX_TREE * tree)
1906{
1907 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1908 LWLockAcquire(&tree->ctl->lock, LW_SHARED);
1909}
1910
1911RT_SCOPE void
1912RT_UNLOCK(RT_RADIX_TREE * tree)
1913{
1914 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
1915 LWLockRelease(&tree->ctl->lock);
1916}
1917
1918/*
1919 * Recursively free all nodes allocated in the DSA area.
1920 */
1921static void
1922RT_FREE_RECURSE(RT_RADIX_TREE * tree, RT_PTR_ALLOC ptr, int shift)
1923{
1924 RT_CHILD_PTR node;
1925
 1926 check_stack_depth();
 1927
1928 node.alloc = ptr;
1929 RT_PTR_SET_LOCAL(tree, &node);
1930
1931 switch (node.local->kind)
1932 {
1933 case RT_NODE_KIND_4:
1934 {
1935 RT_NODE_4 *n4 = (RT_NODE_4 *) node.local;
1936
1937 for (int i = 0; i < n4->base.count; i++)
1938 {
1939 RT_PTR_ALLOC child = n4->children[i];
1940
1941 if (shift > 0)
1942 RT_FREE_RECURSE(tree, child, shift - RT_SPAN);
1943 else if (!RT_CHILDPTR_IS_VALUE(child))
1944 dsa_free(tree->dsa, child);
1945 }
1946
1947 break;
1948 }
1949 case RT_NODE_KIND_16:
1950 {
1951 RT_NODE_16 *n16 = (RT_NODE_16 *) node.local;
1952
1953 for (int i = 0; i < n16->base.count; i++)
1954 {
1955 RT_PTR_ALLOC child = n16->children[i];
1956
1957 if (shift > 0)
1958 RT_FREE_RECURSE(tree, child, shift - RT_SPAN);
1959 else if (!RT_CHILDPTR_IS_VALUE(child))
1960 dsa_free(tree->dsa, child);
1961 }
1962
1963 break;
1964 }
1965 case RT_NODE_KIND_48:
1966 {
1967 RT_NODE_48 *n48 = (RT_NODE_48 *) node.local;
1968
1969 for (int i = 0; i < RT_NODE_MAX_SLOTS; i++)
1970 {
1971 RT_PTR_ALLOC child;
1972
1973 if (!RT_NODE_48_IS_CHUNK_USED(n48, i))
1974 continue;
1975
1976 child = *RT_NODE_48_GET_CHILD(n48, i);
1977
1978 if (shift > 0)
1979 RT_FREE_RECURSE(tree, child, shift - RT_SPAN);
1980 else if (!RT_CHILDPTR_IS_VALUE(child))
1981 dsa_free(tree->dsa, child);
1982 }
1983
1984 break;
1985 }
1986 case RT_NODE_KIND_256:
1987 {
1988 RT_NODE_256 *n256 = (RT_NODE_256 *) node.local;
1989
1990 for (int i = 0; i < RT_NODE_MAX_SLOTS; i++)
1991 {
1992 RT_PTR_ALLOC child;
1993
1994 if (!RT_NODE_256_IS_CHUNK_USED(n256, i))
1995 continue;
1996
1997 child = *RT_NODE_256_GET_CHILD(n256, i);
1998
1999 if (shift > 0)
2000 RT_FREE_RECURSE(tree, child, shift - RT_SPAN);
2001 else if (!RT_CHILDPTR_IS_VALUE(child))
2002 dsa_free(tree->dsa, child);
2003 }
2004
2005 break;
2006 }
2007 }
2008
2009 /* Free the inner node */
2010 dsa_free(tree->dsa, ptr);
2011}
2012#endif
2013
2014/*
2015 * Free the radix tree, including all nodes and leaves.
2016 */
2017RT_SCOPE void
2018RT_FREE(RT_RADIX_TREE * tree)
2019{
2020#ifdef RT_SHMEM
2021 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
2022
2023 /* Free all memory used for radix tree nodes */
2024 Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
2025 RT_FREE_RECURSE(tree, tree->ctl->root, tree->ctl->start_shift);
2026
2027 /*
2028 * Vandalize the control block to help catch programming error where other
2029 * backends access the memory formerly occupied by this radix tree.
2030 */
2031 tree->ctl->magic = 0;
2032 dsa_free(tree->dsa, tree->ctl->handle);
2033#else
2034 /*
2035 * Free all space allocated within the leaf context and delete all child
2036 * contexts such as those used for nodes.
2037 */
2038 MemoryContextReset(tree->leaf_context);
2039
2040 pfree(tree->ctl);
2041#endif
2042 pfree(tree);
2043}
2044
2045/***************** ITERATION *****************/
2046
2047/*
2048 * Create and return an iterator for the given radix tree
2049 * in the caller's memory context.
2050 *
2051 * Taking a lock in shared mode during the iteration is the caller's
2052 * responsibility.
2053 */
2054RT_SCOPE RT_ITER *
2055RT_BEGIN_ITERATE(RT_RADIX_TREE * tree)
2056{
2057 RT_ITER *iter;
 2058 RT_CHILD_PTR root;
 2059
2060 iter = (RT_ITER *) palloc0(sizeof(RT_ITER));
2061 iter->tree = tree;
2062
2063 Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
2064 root.alloc = iter->tree->ctl->root;
 2065 RT_PTR_SET_LOCAL(tree, &root);
 2066
2067 iter->top_level = iter->tree->ctl->start_shift / RT_SPAN;
2068
2069 /* Set the root to start */
2070 iter->cur_level = iter->top_level;
2071 iter->node_iters[iter->cur_level].node = root;
2072 iter->node_iters[iter->cur_level].idx = 0;
2073
2074 return iter;
2075}
2076
2077/*
2078 * Scan the inner node and return the next child pointer if one exists, otherwise
2079 * return NULL.
2080 */
2081static inline RT_PTR_ALLOC *
2082RT_NODE_ITERATE_NEXT(RT_ITER * iter, int level)
2083{
2084 uint8 key_chunk = 0;
2085 RT_NODE_ITER *node_iter;
2086 RT_CHILD_PTR node;
2087 RT_PTR_ALLOC *slot = NULL;
2088
2089#ifdef RT_SHMEM
2090 Assert(iter->tree->ctl->magic == RT_RADIX_TREE_MAGIC);
2091#endif
2092
2093 node_iter = &(iter->node_iters[level]);
2094 node = node_iter->node;
2095
2096 Assert(node.local != NULL);
2097
2098 switch ((node.local)->kind)
2099 {
2100 case RT_NODE_KIND_4:
2101 {
2102 RT_NODE_4 *n4 = (RT_NODE_4 *) (node.local);
2103
2104 if (node_iter->idx >= n4->base.count)
2105 return NULL;
2106
2107 slot = &n4->children[node_iter->idx];
2108 key_chunk = n4->chunks[node_iter->idx];
2109 node_iter->idx++;
2110 break;
2111 }
2112 case RT_NODE_KIND_16:
2113 {
2114 RT_NODE_16 *n16 = (RT_NODE_16 *) (node.local);
2115
2116 if (node_iter->idx >= n16->base.count)
2117 return NULL;
2118
2119 slot = &n16->children[node_iter->idx];
2120 key_chunk = n16->chunks[node_iter->idx];
2121 node_iter->idx++;
2122 break;
2123 }
2124 case RT_NODE_KIND_48:
2125 {
2126 RT_NODE_48 *n48 = (RT_NODE_48 *) (node.local);
2127 int chunk;
2128
2129 for (chunk = node_iter->idx; chunk < RT_NODE_MAX_SLOTS; chunk++)
2130 {
2131 if (RT_NODE_48_IS_CHUNK_USED(n48, chunk))
2132 break;
2133 }
2134
2135 if (chunk >= RT_NODE_MAX_SLOTS)
2136 return NULL;
2137
2138 slot = RT_NODE_48_GET_CHILD(n48, chunk);
2139
2140 key_chunk = chunk;
2141 node_iter->idx = chunk + 1;
2142 break;
2143 }
2144 case RT_NODE_KIND_256:
2145 {
2146 RT_NODE_256 *n256 = (RT_NODE_256 *) (node.local);
2147 int chunk;
2148
2149 for (chunk = node_iter->idx; chunk < RT_NODE_MAX_SLOTS; chunk++)
2150 {
2151 if (RT_NODE_256_IS_CHUNK_USED(n256, chunk))
2152 break;
2153 }
2154
2155 if (chunk >= RT_NODE_MAX_SLOTS)
2156 return NULL;
2157
2158 slot = RT_NODE_256_GET_CHILD(n256, chunk);
2159
2160 key_chunk = chunk;
2161 node_iter->idx = chunk + 1;
2162 break;
2163 }
2164 }
2165
2166 /* Update the key */
2167 iter->key &= ~(((uint64) RT_CHUNK_MASK) << (level * RT_SPAN));
2168 iter->key |= (((uint64) key_chunk) << (level * RT_SPAN));
2169
2170 return slot;
2171}
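The two statements that "update the key" above splice one chunk into the iterator's running key: clear that level's bits, then OR in the chunk just visited. With an 8-bit span:

```c
#include <assert.h>
#include <stdint.h>

#define SPAN 8
#define CHUNK_MASK 0xFF

/* Replace the chunk at "level" within the running key */
static uint64_t update_key(uint64_t key, int level, uint8_t chunk)
{
    key &= ~((uint64_t) CHUNK_MASK << (level * SPAN));
    key |= (uint64_t) chunk << (level * SPAN);
    return key;
}
```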
2172
2173/*
 2174 * Return a pointer to the next value and set *key_p as long as there is a key. Otherwise
2175 * return NULL.
2176 */
2177RT_SCOPE RT_VALUE_TYPE *
2178RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
2179{
2180 RT_PTR_ALLOC *slot = NULL;
2181
2182 while (iter->cur_level <= iter->top_level)
2183 {
2184 RT_CHILD_PTR node;
2185
2186 slot = RT_NODE_ITERATE_NEXT(iter, iter->cur_level);
2187
2188 if (iter->cur_level == 0 && slot != NULL)
2189 {
2190 /* Found a value at the leaf node */
2191 *key_p = iter->key;
2192 node.alloc = *slot;
2193
2194 if (RT_CHILDPTR_IS_VALUE(*slot))
2195 return (RT_VALUE_TYPE *) slot;
2196 else
2197 {
2198 RT_PTR_SET_LOCAL(iter->tree, &node);
2199 return (RT_VALUE_TYPE *) node.local;
2200 }
2201 }
2202
2203 if (slot != NULL)
2204 {
2205 /* Found the child slot, move down the tree */
2206 node.alloc = *slot;
2207 RT_PTR_SET_LOCAL(iter->tree, &node);
2208
2209 iter->cur_level--;
2210 iter->node_iters[iter->cur_level].node = node;
2211 iter->node_iters[iter->cur_level].idx = 0;
2212 }
2213 else
2214 {
 2215 /* Didn't find a child slot, so move up the tree */
2216 iter->cur_level++;
2217 }
2218 }
2219
2220 /* We've visited all nodes, so the iteration finished */
2221 return NULL;
2222}
2223
2224/*
2225 * Terminate the iteration. The caller is responsible for releasing any locks.
2226 */
2227RT_SCOPE void
2228RT_END_ITERATE(RT_ITER * iter)
2229{
2230 pfree(iter);
2231}
2232
2233/***************** DELETION *****************/
2234
2235#ifdef RT_USE_DELETE
2236
2237/* Delete the element at "deletepos" */
2238static inline void
2239RT_SHIFT_ARRAYS_AND_DELETE(uint8 *chunks, RT_PTR_ALLOC * children, int count, int deletepos)
2240{
2241 /*
2242 * This is basically a memmove, but written in a simple loop for speed on
2243 * small inputs.
2244 */
2245 for (int i = deletepos; i < count - 1; i++)
2246 {
2247 /* workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101481 */
2248#ifdef __GNUC__
2249 __asm__("");
2250#endif
2251 chunks[i] = chunks[i + 1];
2252 children[i] = children[i + 1];
2253 }
2254}
2255
2256/*
2257 * Copy both chunk and slot arrays into the right
2258 * place. The element at "deletepos" is deleted by skipping it.
2259 */
2260static inline void
2261RT_COPY_ARRAYS_AND_DELETE(uint8 *dst_chunks, RT_PTR_ALLOC * dst_children,
2262 uint8 *src_chunks, RT_PTR_ALLOC * src_children,
2263 int count, int deletepos)
2264{
2265 for (int i = 0; i < count - 1; i++)
2266 {
2267 /*
2268 * use a branch-free computation to skip the index of the deleted
2269 * element
2270 */
2271 int sourceidx = i + (i >= deletepos);
2272 int destidx = i;
2273
2274 dst_chunks[destidx] = src_chunks[sourceidx];
2275 dst_children[destidx] = src_children[sourceidx];
2276 }
2277}
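This copy is the mirror image of RT_COPY_ARRAYS_FOR_INSERT: here the *source* index is the one bumped past `deletepos`, again with no branch in the loop. In isolation:

```c
#include <assert.h>
#include <stdint.h>

/* Copy the count-1 surviving elements, skipping src[deletepos] */
static void copy_skipping(uint8_t *dst, const uint8_t *src, int count, int deletepos)
{
    for (int i = 0; i < count - 1; i++)
        dst[i] = src[i + (i >= deletepos)];
}
```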
2278
2279/*
2280 * Note: While all node-growing functions are called to perform an insertion
2281 * when no more space is available, shrinking is not a hard-and-fast requirement.
2282 * When shrinking nodes, we generally wait until the count is about 3/4 of
2283 * the next lower node's fanout. This prevents ping-ponging between different
2284 * node sizes.
2285 *
2286 * Some shrinking functions delete first and then shrink, either because we
2287 * must or because it's fast and simple that way. Sometimes it's faster to
2288 * delete while shrinking.
2289 */
2290
2291/*
2292 * Move contents of a node256 to a node48. Any deletion should have happened
2293 * in the caller.
2294 */
2295static void pg_noinline
2296RT_SHRINK_NODE_256(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk)
2297{
2298 RT_NODE_256 *n256 = (RT_NODE_256 *) node.local;
2299 RT_CHILD_PTR newnode;
2300 RT_NODE_48 *new48;
2301 int slot_idx = 0;
2302
2303 /* initialize new node */
 2304 newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_48, RT_CLASS_48);
 2305 new48 = (RT_NODE_48 *) newnode.local;
2306
2307 /* copy over the entries */
2308 RT_COPY_COMMON(newnode, node);
2309 for (int i = 0; i < RT_NODE_MAX_SLOTS; i++)
2310 {
2311 if (RT_NODE_256_IS_CHUNK_USED(n256, i))
2312 {
2313 new48->slot_idxs[i] = slot_idx;
2314 new48->children[slot_idx] = n256->children[i];
2315 slot_idx++;
2316 }
2317 }
2318
2319 /*
2320 * Since we just copied a dense array, we can fill "isset" using a single
2321 * store, provided the length of that array is at most the number of bits
2322 * in a bitmapword.
2323 */
2325 new48->isset[0] = (bitmapword) (((uint64) 1 << n256->base.count) - 1);
2326
2327 /* free old node and update reference in parent */
2328 *parent_slot = newnode.alloc;
2329 RT_FREE_NODE(tree, node);
2330}
2331
2332static inline void
2333RT_REMOVE_CHILD_256(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk)
2334{
2335 int shrink_threshold;
2336 RT_NODE_256 *n256 = (RT_NODE_256 *) node.local;
2337 int idx = RT_BM_IDX(chunk);
2338 int bitnum = RT_BM_BIT(chunk);
2339
2340 /* Mark the slot free for "chunk" */
2341 n256->isset[idx] &= ~((bitmapword) 1 << bitnum);
2342
2343 n256->base.count--;
2344
2345 /*
2346 * A full node256 will have a count of zero because of overflow, so we
2347 * delete first before checking the shrink threshold.
2348 */
2349 Assert(n256->base.count > 0);
2350
2351 /* This simplifies RT_SHRINK_NODE_256() */
2352 shrink_threshold = BITS_PER_BITMAPWORD;
2353 shrink_threshold = Min(RT_FANOUT_48 / 4 * 3, shrink_threshold);
2354
2355 if (n256->base.count <= shrink_threshold)
2356 RT_SHRINK_NODE_256(tree, parent_slot, node, chunk);
2357}
2358
2359/*
2360 * Move contents of a node48 to a node16. Any deletion should have happened
2361 * in the caller.
2362 */
2363static void pg_noinline
2364RT_SHRINK_NODE_48(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk)
2365{
2366 RT_NODE_48 *n48 = (RT_NODE_48 *) (node.local);
2367 RT_CHILD_PTR newnode;
2368 RT_NODE_16 *new16;
2369 int destidx = 0;
2370
2371 /*
2372 * Initialize new node. For now we skip the larger node16 size class for
2373 * simplicity.
2374 */
2375 newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_16, RT_CLASS_16_LO);
2376 new16 = (RT_NODE_16 *) newnode.local;
2377
2378 /* copy over all existing entries */
2379 RT_COPY_COMMON(newnode, node);
2380 for (int chunk = 0; chunk < RT_NODE_MAX_SLOTS; chunk++)
2381 {
2382 if (n48->slot_idxs[chunk] != RT_INVALID_SLOT_IDX)
2383 {
2384 new16->chunks[destidx] = chunk;
2385 new16->children[destidx] = n48->children[n48->slot_idxs[chunk]];
2386 destidx++;
2387 }
2388 }
2389
2390 Assert(destidx < new16->base.fanout);
2391
2392 RT_VERIFY_NODE((RT_NODE *) new16);
2393
2394 /* free old node and update reference in parent */
2395 *parent_slot = newnode.alloc;
2396 RT_FREE_NODE(tree, node);
2397}
2398
2399static inline void
2400RT_REMOVE_CHILD_48(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk)
2401{
2402 RT_NODE_48 *n48 = (RT_NODE_48 *) node.local;
2403 int deletepos = n48->slot_idxs[chunk];
2404
2405 /* For now we skip the larger node16 size class for simplicity */
2406 int shrink_threshold = RT_FANOUT_16_LO / 4 * 3;
2407 int idx;
2408 int bitnum;
2409
2410 Assert(deletepos != RT_INVALID_SLOT_IDX);
2411
2412 idx = RT_BM_IDX(deletepos);
2413 bitnum = RT_BM_BIT(deletepos);
2414 n48->isset[idx] &= ~((bitmapword) 1 << bitnum);
2415 n48->slot_idxs[chunk] = RT_INVALID_SLOT_IDX;
2416
2417 n48->base.count--;
2418
2419 /*
2420 * To keep shrinking simple, do it after deleting, which is fast for
2421 * node48 anyway.
2422 */
2423 if (n48->base.count <= shrink_threshold)
2424 RT_SHRINK_NODE_48(tree, parent_slot, node, chunk);
2425}
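The bitmap update above indexes a word with RT_BM_IDX and a bit within it with RT_BM_BIT. The same arithmetic can be sketched standalone (names illustrative; 64-bit words assumed here, while the template's bitmapword width is platform-dependent):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clear bit "pos" in a multi-word bitmap: pos / 64 selects the word
 * (like RT_BM_IDX) and pos % 64 the bit within it (like RT_BM_BIT).
 */
static void
bitmap_clear(uint64_t *words, int pos)
{
	words[pos / 64] &= ~(((uint64_t) 1) << (pos % 64));
}
```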
2426
2427/*
2428 * Move contents of a node16 to a node4, and delete the one at "deletepos".
2429 * By deleting as we move, we can avoid memmove operations in the new
2430 * node.
2431 */
2432static void pg_noinline
2433RT_SHRINK_NODE_16(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 deletepos)
2434{
2435 RT_NODE_16 *n16 = (RT_NODE_16 *) (node.local);
2436 RT_CHILD_PTR newnode;
2437 RT_NODE_4 *new4;
2438
2439 /* initialize new node */
2440 newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
2441 new4 = (RT_NODE_4 *) newnode.local;
2442
2443 /* copy over existing entries, except for the one at "deletepos" */
2444 RT_COPY_COMMON(newnode, node);
2445 RT_COPY_ARRAYS_AND_DELETE(new4->chunks, new4->children,
2446 n16->chunks, n16->children,
2447 n16->base.count, deletepos);
2448
2449 new4->base.count--;
2450 RT_VERIFY_NODE((RT_NODE *) new4);
2451
2452 /* free old node and update reference in parent */
2453 *parent_slot = newnode.alloc;
2454 RT_FREE_NODE(tree, node);
2455}
2456
2457static inline void
2458RT_REMOVE_CHILD_16(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk, RT_PTR_ALLOC * slot)
2459{
2460 RT_NODE_16 *n16 = (RT_NODE_16 *) node.local;
2461 int deletepos = slot - n16->children;
2462
2463 /*
2464 * When shrinking to node4, 4 is hard-coded. After shrinking, the new node
2465 * will end up with 3 elements and 3 is the largest count where linear
2466 * search is faster than SIMD, at least on x86-64.
2467 */
2468 if (n16->base.count <= 4)
2469 {
2470 RT_SHRINK_NODE_16(tree, parent_slot, node, deletepos);
2471 return;
2472 }
2473
2474 Assert(deletepos >= 0);
2475 Assert(n16->chunks[deletepos] == chunk);
2476
2477 RT_SHIFT_ARRAYS_AND_DELETE(n16->chunks, n16->children,
2478 n16->base.count, deletepos);
2479 n16->base.count--;
2480}
2481
2482static inline void
2483RT_REMOVE_CHILD_4(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk, RT_PTR_ALLOC * slot)
2484{
2485 RT_NODE_4 *n4 = (RT_NODE_4 *) node.local;
2486
2487 if (n4->base.count == 1)
2488 {
2489 Assert(n4->chunks[0] == chunk);
2490
2491 /*
2492 * If we're deleting the last entry from the root child node don't
2493 * free it, but mark both the tree and the root child node empty. That
2494 * way, RT_SET can assume it exists.
2495 */
2496 if (parent_slot == &tree->ctl->root)
2497 {
2498 n4->base.count = 0;
2499 tree->ctl->start_shift = 0;
2500 tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(0);
2501 }
2502 else
2503 {
2504 /*
2505 * Deleting last entry, so just free the entire node.
2506 * RT_DELETE_RECURSIVE has already freed the value and lower-level
2507 * children.
2508 */
2509 RT_FREE_NODE(tree, node);
2510
2511 /*
2512 * Also null out the parent's slot -- this tells the next higher
2513 * level to delete its child pointer
2514 */
2515 *parent_slot = RT_INVALID_PTR_ALLOC;
2516 }
2517 }
2518 else
2519 {
2520 int deletepos = slot - n4->children;
2521
2522 Assert(deletepos >= 0);
2523 Assert(n4->chunks[deletepos] == chunk);
2524
2525 RT_SHIFT_ARRAYS_AND_DELETE(n4->chunks, n4->children,
2526 n4->base.count, deletepos);
2527
2528 n4->base.count--;
2529 }
2530}
2531
2532/*
2533 * Delete the child pointer corresponding to "key" in the given node.
2534 */
2535static inline void
2536RT_NODE_DELETE(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk, RT_PTR_ALLOC * slot)
2537{
2538 switch ((node.local)->kind)
2539 {
2540 case RT_NODE_KIND_4:
2541 {
2542 RT_REMOVE_CHILD_4(tree, parent_slot, node, chunk, slot);
2543 return;
2544 }
2545 case RT_NODE_KIND_16:
2546 {
2547 RT_REMOVE_CHILD_16(tree, parent_slot, node, chunk, slot);
2548 return;
2549 }
2550 case RT_NODE_KIND_48:
2551 {
2552 RT_REMOVE_CHILD_48(tree, parent_slot, node, chunk);
2553 return;
2554 }
2555 case RT_NODE_KIND_256:
2556 {
2557 RT_REMOVE_CHILD_256(tree, parent_slot, node, chunk);
2558 return;
2559 }
2560 default:
2561 pg_unreachable();
2562 }
2563}
2564
2565/* workhorse for RT_DELETE */
2566static bool
2567RT_DELETE_RECURSIVE(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, uint64 key, int shift)
2568{
2569 RT_PTR_ALLOC *slot;
2570 RT_CHILD_PTR node;
2571 uint8 chunk = RT_GET_KEY_CHUNK(key, shift);
2572
2573 node.alloc = *parent_slot;
2574 RT_PTR_SET_LOCAL(tree, &node);
2575 slot = RT_NODE_SEARCH(node.local, chunk);
2576
2577 if (slot == NULL)
2578 return false;
2579
2580 if (shift == 0)
2581 {
2582 if (!RT_CHILDPTR_IS_VALUE(*slot))
2583 RT_FREE_LEAF(tree, *slot);
2584
2585 RT_NODE_DELETE(tree, parent_slot, node, chunk, slot);
2586 return true;
2587 }
2588 else
2589 {
2590 bool deleted;
2591
2592 deleted = RT_DELETE_RECURSIVE(tree, slot, key, shift - RT_SPAN);
2593
2594 /* Child node was freed, so delete its slot now */
2595 if (*slot == RT_INVALID_PTR_ALLOC)
2596 {
2597 Assert(deleted);
2598 RT_NODE_DELETE(tree, parent_slot, node, chunk, slot);
2599 }
2600
2601 return deleted;
2602 }
2603}
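At each level of the recursive descent above, RT_GET_KEY_CHUNK extracts the chunk of the key at the current shift. A standalone sketch of that extraction, assuming the template's 8-bit span (RT_NODE_MAX_SLOTS is 256):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract the 8-bit chunk of "key" at the given shift: shift the
 * key right so the wanted byte is lowest, then mask it off.
 */
static uint8_t
get_key_chunk(uint64_t key, int shift)
{
	return (uint8_t) ((key >> shift) & 0xFF);
}
```

Descending one level then corresponds to reducing the shift by the span, until shift reaches 0 at the leaf level.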
2604
2605/*
2606 * Delete the given key from the radix tree. If the key is found delete it
2607 * and return true, otherwise do nothing and return false.
2608 *
2609 * Taking a lock in exclusive mode is the caller's responsibility.
2610 */
2611RT_SCOPE bool
2612RT_DELETE(RT_RADIX_TREE * tree, uint64 key)
2613{
2614 bool deleted;
2615
2616#ifdef RT_SHMEM
2617 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
2618#endif
2619
2620 if (key > tree->ctl->max_val)
2621 return false;
2622
2623 Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
2624 deleted = RT_DELETE_RECURSIVE(tree, &tree->ctl->root,
2625 key, tree->ctl->start_shift);
2626
2627 /* Found the key to delete. Update the statistics */
2628 if (deleted)
2629 {
2630 tree->ctl->num_keys--;
2631 Assert(tree->ctl->num_keys >= 0);
2632 }
2633
2634 return deleted;
2635}
2636
2637#endif /* RT_USE_DELETE */
2638
2639/***************** UTILITY FUNCTIONS *****************/
2640
2641/*
2642 * Return the statistics of the amount of memory used by the radix tree.
2643 *
2644 * Since dsa_get_total_size() does appropriate locking, the caller doesn't
2645 * need to take a lock.
2646 */
2647RT_SCOPE uint64
2648RT_MEMORY_USAGE(RT_RADIX_TREE * tree)
2649{
2650 size_t total = 0;
2651
2652#ifdef RT_SHMEM
2653 Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
2654 total = dsa_get_total_size(tree->dsa);
2655#else
2656 total = MemoryContextMemAllocated(tree->leaf_context, true);
2657#endif
2658
2659 return total;
2660}
2661
2662/*
2663 * Perform some sanity checks on the given node.
2664 */
2665static void
2666RT_VERIFY_NODE(RT_NODE * node)
2667{
2668#ifdef USE_ASSERT_CHECKING
2669
2670 switch (node->kind)
2671 {
2672 case RT_NODE_KIND_4:
2673 {
2674 RT_NODE_4 *n4 = (RT_NODE_4 *) node;
2675
2676 /* RT_DUMP_NODE(node); */
2677
2678 for (int i = 1; i < n4->base.count; i++)
2679 Assert(n4->chunks[i - 1] < n4->chunks[i]);
2680
2681 break;
2682 }
2683 case RT_NODE_KIND_16:
2684 {
2685 RT_NODE_16 *n16 = (RT_NODE_16 *) node;
2686
2687 /* RT_DUMP_NODE(node); */
2688
2689 for (int i = 1; i < n16->base.count; i++)
2690 Assert(n16->chunks[i - 1] < n16->chunks[i]);
2691
2692 break;
2693 }
2694 case RT_NODE_KIND_48:
2695 {
2696 RT_NODE_48 *n48 = (RT_NODE_48 *) node;
2697 int cnt = 0;
2698
2699 /* RT_DUMP_NODE(node); */
2700
2701 for (int i = 0; i < RT_NODE_MAX_SLOTS; i++)
2702 {
2703 uint8 slot = n48->slot_idxs[i];
2704 int idx = RT_BM_IDX(slot);
2705 int bitnum = RT_BM_BIT(slot);
2706
2707 if (!RT_NODE_48_IS_CHUNK_USED(n48, i))
2708 continue;
2709
2710 /* Check if the corresponding slot is used */
2711 Assert(slot < node->fanout);
2712 Assert((n48->isset[idx] & ((bitmapword) 1 << bitnum)) != 0);
2713
2714 cnt++;
2715 }
2716
2717 Assert(n48->base.count == cnt);
2718
2719 break;
2720 }
2721 case RT_NODE_KIND_256:
2722 {
2723 RT_NODE_256 *n256 = (RT_NODE_256 *) node;
2724 int cnt = 0;
2725
2726 /* RT_DUMP_NODE(node); */
2727
2728 for (int i = 0; i < RT_BM_IDX(RT_NODE_MAX_SLOTS); i++)
2729 cnt += bmw_popcount(n256->isset[i]);
2730
2731 /*
2732 * Check if the number of used chunks matches, accounting for
2733 * overflow
2734 */
2735 if (cnt == RT_FANOUT_256)
2736 Assert(n256->base.count == 0);
2737 else
2738 Assert(n256->base.count == cnt);
2739
2740 break;
2741 }
2742 }
2743#endif
2744}
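The node256 check above counts set bits across the bitmap words with bmw_popcount. A standalone sketch of that count (names illustrative; `__builtin_popcountll` is a GCC/Clang builtin assumed available here):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the set bits across an array of 64-bit bitmap words,
 * analogous to summing bmw_popcount over "isset".
 */
static int
bitmap_count(const uint64_t *words, int nwords)
{
	int			cnt = 0;

	for (int i = 0; i < nwords; i++)
		cnt += __builtin_popcountll(words[i]);
	return cnt;
}
```

For a full node256 the true count is 256, which overflows the uint8 count field to 0; the verification code compares against the popcount with that case special-cased.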
2745
2746/***************** DEBUG FUNCTIONS *****************/
2747
2748#ifdef RT_DEBUG
2749
2750/*
2751 * Print out tree stats, some of which are only collected in debugging builds.
2752 */
2753RT_SCOPE void
2754RT_STATS(RT_RADIX_TREE * tree)
2755{
2756 fprintf(stderr, "max_val = " UINT64_FORMAT "\n", tree->ctl->max_val);
2757 fprintf(stderr, "num_keys = %" PRId64 "\n", tree->ctl->num_keys);
2758
2759#ifdef RT_SHMEM
2760 fprintf(stderr, "handle = " DSA_POINTER_FORMAT "\n", tree->ctl->handle);
2761#endif
2762
2763 fprintf(stderr, "height = %d", tree->ctl->start_shift / RT_SPAN);
2764
2765 for (int i = 0; i < RT_NUM_SIZE_CLASSES; i++)
2766 {
2767 RT_SIZE_CLASS_ELEM size_class = RT_SIZE_CLASS_INFO[i];
2768
2769 fprintf(stderr, ", n%d = %" PRId64, size_class.fanout, tree->ctl->num_nodes[i]);
2770 }
2771
2772 fprintf(stderr, ", leaves = %" PRId64, tree->ctl->num_leaves);
2773
2774 fprintf(stderr, "\n");
2775}
2776
2777/*
2778 * Print out debugging information about the given node.
2779 */
2780static void
2781pg_attribute_unused()
2782RT_DUMP_NODE(RT_NODE * node)
2783{
2784#ifdef RT_SHMEM
2785#define RT_CHILD_PTR_FORMAT DSA_POINTER_FORMAT
2786#else
2787#define RT_CHILD_PTR_FORMAT "%p"
2788#endif
2789
2790 fprintf(stderr, "kind %d, fanout %d, count %u\n",
2791 (node->kind == RT_NODE_KIND_4) ? 4 :
2792 (node->kind == RT_NODE_KIND_16) ? 16 :
2793 (node->kind == RT_NODE_KIND_48) ? 48 : 256,
2794 node->fanout == 0 ? 256 : node->fanout,
2795 node->count == 0 ? 256 : node->count);
2796
2797 switch (node->kind)
2798 {
2799 case RT_NODE_KIND_4:
2800 {
2801 RT_NODE_4 *n4 = (RT_NODE_4 *) node;
2802
2803 fprintf(stderr, "chunks and slots:\n");
2804 for (int i = 0; i < n4->base.count; i++)
2805 {
2806 fprintf(stderr, " [%d] chunk %x slot " RT_CHILD_PTR_FORMAT "\n",
2807 i, n4->chunks[i], n4->children[i]);
2808 }
2809
2810 break;
2811 }
2812 case RT_NODE_KIND_16:
2813 {
2814 RT_NODE_16 *n16 = (RT_NODE_16 *) node;
2815
2816 fprintf(stderr, "chunks and slots:\n");
2817 for (int i = 0; i < n16->base.count; i++)
2818 {
2819 fprintf(stderr, " [%d] chunk %x slot " RT_CHILD_PTR_FORMAT "\n",
2820 i, n16->chunks[i], n16->children[i]);
2821 }
2822 break;
2823 }
2824 case RT_NODE_KIND_48:
2825 {
2826 RT_NODE_48 *n48 = (RT_NODE_48 *) node;
2827 char *sep = "";
2828
2829 fprintf(stderr, "slot_idxs: \n");
2830 for (int chunk = 0; chunk < RT_NODE_MAX_SLOTS; chunk++)
2831 {
2832 if (!RT_NODE_48_IS_CHUNK_USED(n48, chunk))
2833 continue;
2834
2835 fprintf(stderr, " idx[%d] = %d\n",
2836 chunk, n48->slot_idxs[chunk]);
2837 }
2838
2839 fprintf(stderr, "isset-bitmap: ");
2840 for (int i = 0; i < (RT_FANOUT_48_MAX / BITS_PER_BYTE); i++)
2841 {
2842 fprintf(stderr, "%s%x", sep, ((uint8 *) n48->isset)[i]);
2843 sep = " ";
2844 }
2845 fprintf(stderr, "\n");
2846
2847 fprintf(stderr, "chunks and slots:\n");
2848 for (int chunk = 0; chunk < RT_NODE_MAX_SLOTS; chunk++)
2849 {
2850 if (!RT_NODE_48_IS_CHUNK_USED(n48, chunk))
2851 continue;
2852
2853 fprintf(stderr, " chunk %x slot " RT_CHILD_PTR_FORMAT "\n",
2854 chunk,
2855 *RT_NODE_48_GET_CHILD(n48, chunk));
2856 }
2857 break;
2858 }
2859 case RT_NODE_KIND_256:
2860 {
2861 RT_NODE_256 *n256 = (RT_NODE_256 *) node;
2862 char *sep = "";
2863
2864 fprintf(stderr, "isset-bitmap: ");
2865 for (int i = 0; i < (RT_FANOUT_256 / BITS_PER_BYTE); i++)
2866 {
2867 fprintf(stderr, "%s%x", sep, ((uint8 *) n256->isset)[i]);
2868 sep = " ";
2869 }
2870 fprintf(stderr, "\n");
2871
2872 fprintf(stderr, "chunks and slots:\n");
2873 for (int chunk = 0; chunk < RT_NODE_MAX_SLOTS; chunk++)
2874 {
2875 if (!RT_NODE_256_IS_CHUNK_USED(n256, chunk))
2876 continue;
2877
2878 fprintf(stderr, " chunk %x slot " RT_CHILD_PTR_FORMAT "\n",
2879 chunk,
2880 *RT_NODE_256_GET_CHILD(n256, chunk));
2881 }
2882 break;
2883 }
2884 }
2885}
2886#endif /* RT_DEBUG */
2887
2888#endif /* RT_DEFINE */
2889
2890
2891/* undefine external parameters, so next radix tree can be defined */
2892#undef RT_PREFIX
2893#undef RT_SCOPE
2894#undef RT_DECLARE
2895#undef RT_DEFINE
2896#undef RT_VALUE_TYPE
2897#undef RT_VARLEN_VALUE_SIZE
2898#undef RT_RUNTIME_EMBEDDABLE_VALUE
2899#undef RT_SHMEM
2900#undef RT_USE_DELETE
2901#undef RT_DEBUG
2902
2903/* locally declared macros */
2904#undef RT_MAKE_PREFIX
2905#undef RT_MAKE_NAME
2906#undef RT_MAKE_NAME_
2907#undef RT_STR
2908#undef RT_STR_
2909#undef RT_SPAN
2910#undef RT_NODE_MAX_SLOTS
2911#undef RT_CHUNK_MASK
2912#undef RT_MAX_SHIFT
2913#undef RT_MAX_LEVEL
2914#undef RT_GET_KEY_CHUNK
2915#undef RT_BM_IDX
2916#undef RT_BM_BIT
2917#undef RT_NODE_MUST_GROW
2918#undef RT_NODE_KIND_COUNT
2919#undef RT_NUM_SIZE_CLASSES
2920#undef RT_INVALID_SLOT_IDX
2921#undef RT_SLAB_BLOCK_SIZE
2922#undef RT_RADIX_TREE_MAGIC
2923#undef RT_CHILD_PTR_FORMAT
2924
2925/* type declarations */
2926#undef RT_RADIX_TREE
2927#undef RT_RADIX_TREE_CONTROL
2928#undef RT_CHILD_PTR
2929#undef RT_PTR_ALLOC
2930#undef RT_INVALID_PTR_ALLOC
2931#undef RT_HANDLE
2932#undef RT_ITER
2933#undef RT_NODE
2934#undef RT_NODE_ITER
2935#undef RT_NODE_KIND_4
2936#undef RT_NODE_KIND_16
2937#undef RT_NODE_KIND_48
2938#undef RT_NODE_KIND_256
2939#undef RT_NODE_4
2940#undef RT_NODE_16
2941#undef RT_NODE_48
2942#undef RT_NODE_256
2943#undef RT_SIZE_CLASS
2944#undef RT_SIZE_CLASS_ELEM
2945#undef RT_SIZE_CLASS_INFO
2946#undef RT_CLASS_4
2947#undef RT_CLASS_16_LO
2948#undef RT_CLASS_16_HI
2949#undef RT_CLASS_48
2950#undef RT_CLASS_256
2951#undef RT_FANOUT_4
2952#undef RT_FANOUT_4_MAX
2953#undef RT_FANOUT_16_LO
2954#undef RT_FANOUT_16_HI
2955#undef RT_FANOUT_16_MAX
2956#undef RT_FANOUT_48
2957#undef RT_FANOUT_48_MAX
2958#undef RT_FANOUT_256
2959
2960/* function declarations */
2961#undef RT_CREATE
2962#undef RT_FREE
2963#undef RT_ATTACH
2964#undef RT_DETACH
2965#undef RT_LOCK_EXCLUSIVE
2966#undef RT_LOCK_SHARE
2967#undef RT_UNLOCK
2968#undef RT_GET_HANDLE
2969#undef RT_FIND
2970#undef RT_SET
2971#undef RT_BEGIN_ITERATE
2972#undef RT_ITERATE_NEXT
2973#undef RT_END_ITERATE
2974#undef RT_DELETE
2975#undef RT_MEMORY_USAGE
2976#undef RT_DUMP_NODE
2977#undef RT_STATS
2978
2979/* internal helper functions */
2980#undef RT_GET_VALUE_SIZE
2981#undef RT_VALUE_IS_EMBEDDABLE
2982#undef RT_CHILDPTR_IS_VALUE
2983#undef RT_GET_SLOT_RECURSIVE
2984#undef RT_DELETE_RECURSIVE
2985#undef RT_ALLOC_NODE
2986#undef RT_ALLOC_LEAF
2987#undef RT_FREE_NODE
2988#undef RT_FREE_LEAF
2989#undef RT_FREE_RECURSE
2990#undef RT_EXTEND_UP
2991#undef RT_EXTEND_DOWN
2992#undef RT_COPY_COMMON
2993#undef RT_PTR_SET_LOCAL
2994#undef RT_PTR_ALLOC_IS_VALID
2995#undef RT_NODE_16_SEARCH_EQ
2996#undef RT_NODE_4_GET_INSERTPOS
2997#undef RT_NODE_16_GET_INSERTPOS
2998#undef RT_SHIFT_ARRAYS_FOR_INSERT
2999#undef RT_SHIFT_ARRAYS_AND_DELETE
3000#undef RT_COPY_ARRAYS_FOR_INSERT
3001#undef RT_COPY_ARRAYS_AND_DELETE
3002#undef RT_NODE_48_IS_CHUNK_USED
3003#undef RT_NODE_48_GET_CHILD
3004#undef RT_NODE_256_IS_CHUNK_USED
3005#undef RT_NODE_256_GET_CHILD
3006#undef RT_KEY_GET_SHIFT
3007#undef RT_SHIFT_GET_MAX_VAL
3008#undef RT_NODE_SEARCH
3009#undef RT_ADD_CHILD_4
3010#undef RT_ADD_CHILD_16
3011#undef RT_ADD_CHILD_48
3012#undef RT_ADD_CHILD_256
3013#undef RT_GROW_NODE_4
3014#undef RT_GROW_NODE_16
3015#undef RT_GROW_NODE_48
3016#undef RT_REMOVE_CHILD_4
3017#undef RT_REMOVE_CHILD_16
3018#undef RT_REMOVE_CHILD_48
3019#undef RT_REMOVE_CHILD_256
3020#undef RT_SHRINK_NODE_16
3021#undef RT_SHRINK_NODE_48
3022#undef RT_SHRINK_NODE_256
3023#undef RT_NODE_DELETE
3024#undef RT_NODE_INSERT
3025#undef RT_NODE_ITERATE_NEXT
3026#undef RT_VERIFY_NODE