{{Short description|Programming technique for resolving duplicate hash values in a hash table data structure}}
In [[computer science]], '''dynamic perfect hashing''' is a programming technique for resolving [[hash collision|collisions]] in a [[hash table]] [[data structure]].<ref name="inventor">Fredman, M. L., Komlós, J., and Szemerédi, E. 1984. Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM 31, 3 (Jun. 1984), 538-544 http://portal.acm.org/citation.cfm?id=1884#</ref><ref name="dietzfelbinger">Dietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994. "Dynamic Perfect Hashing: Upper and Lower Bounds" {{Webarchive|url=https://web.archive.org/web/20160304094014/http://www.arl.wustl.edu/~sailesh/download_files/Limited_Edition/hash/Dynamic%20Perfect%20Hashing-%20Upper%20and%20Lower%20Bounds.pdf |date=2016-03-04 }}. SIAM J. Comput. 23, 4 (Aug. 1994), 738-761. http://portal.acm.org/citation.cfm?id=182370 {{doi|10.1137/S0097539791194094}}</ref><ref>Erik Demaine, Jeff Lind. 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Spring 2003.</ref>
While more memory-intensive than its hash table counterparts,{{citation needed|date=April 2015}} this technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.
==Details==
=== Static case ===
====FKS Scheme====
{{main | static hashing#FKS Hashing }}
The problem of optimal [[static hashing]] was first solved in general by Fredman, Komlós and Szemerédi.<ref>{{cite web|last1=Yap|first1=Chee|title=Universal Construction for the FKS Scheme|url=ftp://cs.nyu.edu/pub/local/yap/cg/hashFKS.ps.gz|website=New York University|publisher=New York University|accessdate=15 February 2015}}{{dead link|date=September 2017 |bot=InternetArchiveBot |fix-attempted=yes }}</ref> In their 1984 paper,<ref name="inventor"/> they detail a two-tiered hash table scheme in which each bucket of the (first-level) hash table corresponds to a separate second-level hash table. Keys are hashed twice—the first hash value maps to a certain bucket in the first-level hash table; the second hash value gives the position of that entry in that bucket's second-level hash table. The second-level table is guaranteed to be collision-free (i.e. [[perfect hashing|perfect]]) upon construction. Consequently, the look-up cost is guaranteed to be O(1) in the worst case.<ref name="dietzfelbinger"/>
In the static case, we are given a set with a total of {{mvar|x}} entries, each one with a unique key, ahead of time.
Fredman, Komlós and Szemerédi pick a first-level hash table with <math>s = 2(x-1)</math> buckets.<ref name="dietzfelbinger"/>
To construct the table, the {{mvar|x}} entries are separated into {{mvar|s}} buckets by the top-level hashing function, where <math>s = 2(x-1)</math>. Then for each bucket with {{mvar|k}} entries, a second-level table is allocated with <math>k^2</math> slots, and its [[hash function]] is selected at random from a [[universal hashing|universal hash function]] set so that it is collision-free (i.e. a [[perfect hash function]]) and stored alongside the hash table. If the randomly selected hash function creates a table with collisions, a new hash function is selected at random until a collision-free table can be guaranteed. Finally, with the collision-free hash function, the {{mvar|k}} entries are hashed into the second-level table.
The quadratic size of the <math>k^2</math> space ensures that randomly creating a table with collisions is infrequent and independent of the size of {{mvar|k}}, providing linear amortized construction time. Although each second-level table requires quadratic space, if the keys inserted into the first-level hash table are [[discrete uniform distribution|uniformly distributed]], the structure as a whole occupies expected <math>O(n)</math> space, since bucket sizes are small with high [[probability]].<ref name="inventor"/>
The first-level hash function is specifically chosen so that, for the specific set of {{mvar|x}} unique key values, the total space {{mvar|T}} used by all the second-level hash tables is expected <math>O(n)</math>, and more specifically <math>T < s + 4 \cdot x</math>.
Fredman, Komlós and Szemerédi showed that given a [[universal hashing]] family of hash functions, at least half of those functions have that property.<ref name="dietzfelbinger"/>
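The static construction above can be sketched in a few lines of code. The following Python sketch is only an illustration of the scheme as described in this section, not an implementation taken from the cited papers: it assumes non-negative integer keys smaller than a fixed prime, draws hash functions from the simple universal family ((''a''·key + ''b'') mod ''p'') mod ''m'', and the names <code>build_fks</code>, <code>fks_contains</code> and <code>random_universal_hash</code> are invented for the example.

<syntaxhighlight lang="python">
import random

P = (1 << 61) - 1   # a prime larger than any key handled here (sketch assumption)

def random_universal_hash(m):
    """Draw h(key) = ((a*key + b) mod P) mod m from a universal family."""
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda key: ((a * key + b) % P) % m

def build_fks(keys):
    """Static two-level (FKS-style) construction for distinct integer keys."""
    x = len(keys)
    s = max(1, 2 * (x - 1))                    # first-level size s = 2(x - 1), clamped for tiny sets
    # Retry the first-level function until the total second-level space is small
    # (at least half of a universal family satisfies T < s + 4x).
    while True:
        h = random_universal_hash(s)
        buckets = [[] for _ in range(s)]
        for k in keys:
            buckets[h(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) < s + 4 * x:
            break
    hashes, tables = [], []
    for b in buckets:
        size = len(b) ** 2                     # k^2 slots for a bucket with k entries
        while True:                            # retry until collision-free (perfect)
            hj = random_universal_hash(size) if size else None
            table = [None] * size
            ok = True
            for k in b:
                pos = hj(k)
                if table[pos] is not None:     # collision: pick a new hash function
                    ok = False
                    break
                table[pos] = k
            if ok:
                break
        hashes.append(hj)
        tables.append(table)
    return h, hashes, tables

def fks_contains(key, h, hashes, tables):
    """O(1) worst-case membership test: two hash evaluations, one comparison."""
    j = h(key)
    return bool(tables[j]) and tables[j][hashes[j](key)] == key
</syntaxhighlight>

For example, after <code>h, hs, ts = build_fks([3, 141, 59, 26, 535])</code>, the call <code>fks_contains(59, h, hs, ts)</code> returns <code>True</code> and <code>fks_contains(60, h, hs, ts)</code> returns <code>False</code>, each using two hash evaluations and one comparison.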
===Dynamic case===
Dietzfelbinger et al. present a dynamic dictionary algorithm that, as a set of {{mvar|n}} items is incrementally added to the dictionary, answers membership queries in <math>O(1)</math> worst-case time, uses <math>O(n)</math> (linear) total storage, and performs insertions and deletions in <math>O(1)</math> expected [[amortized analysis|amortized]] time.
In the dynamic case, when a key is inserted into the hash table, if its entry in its respective subtable is occupied, then a collision is said to occur and the subtable is rebuilt based on its new total entry count and a new randomly selected hash function. Because the load factor of the second-level table is kept low (<math>1/k</math>), rebuilding is infrequent, and the [[amortized analysis|amortized]] expected cost of insertions is <math>O(1)</math>.<ref name="dietzfelbinger"/> Similarly, the amortized expected cost of deletions is <math>O(1)</math>.<ref name="dietzfelbinger"/>
Additionally, the ultimate sizes of the top-level table and of the subtables are unknowable in the dynamic case. One method for maintaining expected <math>O(n)</math> space of the table is to prompt a full reconstruction when a sufficient number of insertions and deletions have occurred. By results due to Dietzfelbinger et al.,<ref name="dietzfelbinger"/> as long as the total number of insertions or deletions exceeds the number of elements at the time of the last construction, the amortized expected cost of insertion and deletion remains <math>O(1)</math> with full rehashing taken into consideration.
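The dynamic behaviour just described can be illustrated with a short sketch: lazy deletion, rebuilding a subtable with a fresh random hash function when an insertion collides, and a full reconstruction once the operation counter passes the threshold ''M''. The Python below is a simplified model of the scheme of Dietzfelbinger et al., not their exact algorithm (for instance, it does not retry the top-level function against the space bound), and the class name <code>DynamicPerfectHashSketch</code>, the chosen constants, and the helper <code>random_universal_hash</code> (repeated from the static sketch above) are assumptions made for the example.

<syntaxhighlight lang="python">
import random

P = (1 << 61) - 1                                  # prime larger than any key (assumption)

def random_universal_hash(m):
    """Same helper as in the static sketch: h(key) = ((a*key + b) mod P) mod m."""
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda key: ((a * key + b) % P) % m

class DynamicPerfectHashSketch:
    """Simplified dynamic two-level table: lazy deletion, subtable rebuild on a
    collision, full rebuild once the update counter exceeds the threshold M."""
    DELETED = object()                             # marker for lazily deleted slots

    def __init__(self, keys=()):
        self._full_rebuild(keys)

    def _full_rebuild(self, keys):
        keys = list(set(keys))                     # drop duplicates; rebuild from scratch
        self.count = 0
        self.M = 2 * max(len(keys), 4)             # threshold: constant multiple of |S|
        self.h = random_universal_hash(2 * self.M) # top-level function (no space-bound retry here)
        buckets = [[] for _ in range(2 * self.M)]
        for k in keys:
            buckets[self.h(k)].append(k)
        self.sub = [self._build_subtable(b) for b in buckets]

    def _build_subtable(self, items):
        size = max(1, 2 * len(items) ** 2)         # quadratic space keeps the load factor low
        while True:                                # retry until the hash is collision-free
            hj = random_universal_hash(size)
            positions = [hj(k) for k in items]
            if len(set(positions)) == len(items):  # injective on the items
                table = [None] * size
                for k, pos in zip(items, positions):
                    table[pos] = k
                return [hj, table]

    def locate(self, key):
        hj, table = self.sub[self.h(key)]
        return table[hj(key)] == key               # None/DELETED slots never equal a key

    def insert(self, key):
        self.count += 1
        if self.count > self.M:                    # too many updates: rebuild everything
            self._full_rebuild(self._live_keys() + [key])
            return
        entry = self.sub[self.h(key)]
        hj, table = entry
        pos = hj(key)
        if table[pos] == key:
            return                                 # already present and live
        if table[pos] is None or table[pos] is self.DELETED:
            table[pos] = key
        else:                                      # collision: rebuild this subtable only
            live = [k for k in table
                    if k is not None and k is not self.DELETED] + [key]
            self.sub[self.h(key)] = self._build_subtable(live)

    def delete(self, key):
        self.count += 1
        hj, table = self.sub[self.h(key)]
        if table[hj(key)] == key:
            table[hj(key)] = self.DELETED          # lazy deletion: flag, do not remove
        if self.count >= self.M:
            self._full_rebuild(self._live_keys())

    def _live_keys(self):
        return [k for _, table in self.sub
                for k in table if k is not None and k is not self.DELETED]
</syntaxhighlight>

Because each subtable keeps roughly quadratic space in its number of live entries, its load factor stays low, so a colliding insertion that forces a subtable rebuild is correspondingly rare; this is the source of the amortized <math>O(1)</math> update cost discussed above.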
The implementation of dynamic perfect hashing by Dietzfelbinger et al. uses these concepts, as well as [[lazy deletion]], and is shown in pseudocode below.
==Pseudocode implementation==
=== Locate ===
 '''function''' Locate(''x'') '''is'''
     ''j'' := h(''x'')
     '''if''' (position h<sub>''j''</sub>(''x'') of subtable ''T<sub>j</sub>'' contains ''x'' (not deleted))
         '''return''' (''x'' is in ''S'')
     '''end if'''
     '''else'''
         '''return''' (''x'' is not in ''S'')
     '''end else'''
 '''end'''
=== Insert ===
During the insertion of a new entry ''x'' at ''j'', the global operations counter, ''count'', is incremented.
If ''x'' exists at ''j'', but is marked as deleted, then the mark is removed.
If position h<sub>''j''</sub>(''x'') of the subtable ''T<sub>j</sub>'' is already occupied by a different, non-deleted entry, then a collision is said to occur and the ''j''<sup>th</sup> bucket's second-level table ''T<sub>j</sub>'' is rebuilt with a different randomly selected hash function ''h<sub>j</sub>''.
 '''function''' Insert(''x'') '''is'''
     ''count'' = ''count'' + 1;
     '''if''' (''count'' > ''M'')
         FullRehash(''x'');
     '''end if'''
     '''else'''
         ''j'' = h(''x'');
         '''if''' (position h<sub>''j''</sub>(''x'') of subtable ''T<sub>j</sub>'' contains ''x'')
             '''if''' (''x'' is marked deleted)
                 remove the delete marker;
             '''end if'''
         '''end if'''
         '''else'''
             ''b<sub>j</sub>'' = ''b<sub>j</sub>'' + 1;
             '''if''' (''b<sub>j</sub>'' <= ''m<sub>j</sub>'')
                 '''if''' position h<sub>''j''</sub>(''x'') of ''T<sub>j</sub>'' is empty
                     store ''x'' in position h<sub>''j''</sub>(''x'') of ''T<sub>j</sub>'';
                 '''end if'''
                 '''else'''
                     Put all unmarked elements of ''T<sub>j</sub>'' in list ''L<sub>j</sub>'';
                     Append ''x'' to list ''L<sub>j</sub>'';
                     ''b<sub>j</sub>'' = length of ''L<sub>j</sub>'';
                     '''repeat'''
                         ''h<sub>j</sub>'' = randomly chosen function in ''H<sub>sj</sub>'';
                     '''until''' ''h<sub>j</sub>'' is injective on the elements of ''L<sub>j</sub>'';
                     '''for''' all ''y'' on list ''L<sub>j</sub>''
                         store ''y'' in position h<sub>''j''</sub>(''y'') of ''T<sub>j</sub>'';
                     '''end for'''
                 '''end else'''
             '''end if'''
             '''else'''
                 ''m<sub>j</sub>'' = 2 * max{1, ''m<sub>j</sub>''};
                 ''s<sub>j</sub>'' = 2 * ''m<sub>j</sub>'' * (''m<sub>j</sub>'' - 1);
                 '''if''' the sum total of all ''s<sub>j</sub>'' ≤ 32 * ''M''<sup>2</sup> / ''s''(''M'') + 4 * ''M''
                     Allocate ''s<sub>j</sub>'' cells for ''T<sub>j</sub>'';
                     Put all unmarked elements of ''T<sub>j</sub>'' in list ''L<sub>j</sub>'';
                     Append ''x'' to list ''L<sub>j</sub>'';
                     ''b<sub>j</sub>'' = length of ''L<sub>j</sub>'';
                     '''repeat'''
                         ''h<sub>j</sub>'' = randomly chosen function in ''H<sub>sj</sub>'';
                     '''until''' ''h<sub>j</sub>'' is injective on the elements of ''L<sub>j</sub>'';
                     '''for''' all ''y'' on list ''L<sub>j</sub>''
                         store ''y'' in position h<sub>''j''</sub>(''y'') of ''T<sub>j</sub>'';
                     '''end for'''
                 '''end if'''
                 '''else'''
                     FullRehash(''x'');
                 '''end else'''
             '''end else'''
         '''end else'''
     '''end else'''
 '''end'''
=== Delete ===
Deletion of ''x'' simply flags ''x'' as deleted without removal and increments ''count''. In the case of both insertions and deletions, if ''count'' reaches a threshold ''M'' the entire table is rebuilt, where ''M'' is some constant multiple of the size of ''S'' at the start of a new ''phase''. Here ''phase'' refers to the time between full rebuilds. Note that the -1 passed to FullRehash in Delete(''x'') is a representation of an element which is not in the set of all possible elements ''U''.
 '''function''' Delete(''x'') '''is'''
     ''count'' = ''count'' + 1;
     ''j'' = h(''x'');
     '''if''' position h<sub>''j''</sub>(''x'') of subtable ''T<sub>j</sub>'' contains ''x''
         mark ''x'' as deleted;
     '''end if'''
     '''else'''
         '''return''' (''x'' is not a member of ''S'');
     '''end else'''
     '''if''' (''count'' >= ''M'')
         FullRehash(-1);
     '''end if'''
 '''end'''
=== Full rebuild ===
A full rebuild of the table of ''S'' first starts by removing all elements marked as deleted and then setting the next threshold value ''M'' to some constant multiple of the size of ''S''. A hash function, which partitions ''S'' into ''s''(''M'') subsets, where the size of subset ''j'' is ''s<sub>j</sub>'', is repeatedly randomly chosen until:
<math>\sum_{0\le j\le s(M)} s_j \le \frac{32M^2}{s(M)} + 4M.</math>
Finally, for each subtable ''T<sub>j</sub>'' a hash function ''h<sub>j</sub>'' is repeatedly randomly chosen from ''H<sub>sj</sub>'' until ''h<sub>j</sub>'' is injective on the elements of ''T<sub>j</sub>''. The expected time for a full rebuild of the table of ''S'' with size ''n'' is O(''n'').<ref name="dietzfelbinger"/>
 '''function''' FullRehash(''x'') '''is'''
     Put all unmarked elements of ''T'' in list ''L'';
     '''if''' (''x'' is in ''U'')
         append ''x'' to ''L'';
     '''end if'''
     ''count'' = length of list ''L'';
     ''M'' = (1 + ''c'') * max{''count'', 4};
     '''repeat'''
         h = randomly chosen function in ''H<sub>s(M)</sub>'';
         '''for''' all ''j'' < ''s''(''M'')
             form a list ''L<sub>j</sub>'' of the elements ''y'' of ''L'' with h(''y'') = ''j'';
             ''b<sub>j</sub>'' = length of ''L<sub>j</sub>'';
             ''m<sub>j</sub>'' = 2 * ''b<sub>j</sub>'';
             ''s<sub>j</sub>'' = 2 * ''m<sub>j</sub>'' * (''m<sub>j</sub>'' - 1);
         '''end for'''
     '''until''' the sum total of all ''s<sub>j</sub>'' ≤ 32 * ''M''<sup>2</sup> / ''s''(''M'') + 4 * ''M''
     '''for''' all ''j'' < ''s''(''M'')
         Allocate space ''s<sub>j</sub>'' for subtable ''T<sub>j</sub>'';
         '''repeat'''
             ''h<sub>j</sub>'' = randomly chosen function in ''H<sub>sj</sub>'';
         '''until''' ''h<sub>j</sub>'' is injective on the elements of list ''L<sub>j</sub>'';
         '''for''' all ''y'' on list ''L<sub>j</sub>''
             store ''y'' in position h<sub>''j''</sub>(''y'') of ''T<sub>j</sub>'';
         '''end for'''
     '''end for'''
 '''end'''
==See also==
*]
==References==
{{reflist}}