Dynamic perfect hashing

This is an old revision of this page, as edited by 35.11.210.102 (talk) at 23:09, 27 October 2014 (→Added 'a' between 'as' and 'seperate'). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure. This technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.

Details

In this method, the entries that hash to the same slot of the table are organized as a separate second-level hash table. If there are k entries in this set S, the second-level table is allocated with k^2 slots, and its hash function is drawn at random from a universal family of hash functions, redrawing until it is collision-free on S (i.e. a perfect hash function). Therefore, the look-up cost is guaranteed to be O(1) in the worst case.

function Locate(x) is
       j = h(x);
       if (position h_j(x) of subtable T_j contains x (not deleted))
          return (x is in S);
       end if
       else 
          return (x is not in S);
       end else
end
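
The two-level layout behind Locate can be sketched in Python as follows. This is a minimal static sketch, not the article's full method: it assumes distinct integer keys smaller than the prime P, uses the family ((a*x + b) mod P) mod size as the universal family, and omits deletion markers and the total-space check. The names make_hash, build_subtable, build and locate are illustrative.

```python
import random

P = 2_147_483_647  # prime; assumption: keys are distinct ints in [0, P)

def make_hash(size, rng):
    """Draw h(x) = ((a*x + b) mod P) mod size from a universal family."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda x: ((a * x + b) % P) % size

def build_subtable(keys, rng):
    """Allocate k^2 slots and redraw h_j until it is collision-free on keys."""
    size = max(1, len(keys) ** 2)
    while True:
        h_j = make_hash(size, rng)
        slots = [None] * size
        for x in keys:
            i = h_j(x)
            if slots[i] is not None:
                break              # collision: try another function
            slots[i] = x
        else:
            return h_j, slots      # loop finished without a collision

def build(keys, rng):
    """First level: n buckets; second level: one perfect subtable per bucket."""
    n = max(1, len(keys))
    h = make_hash(n, rng)
    buckets = [[] for _ in range(n)]
    for x in keys:
        buckets[h(x)].append(x)
    return h, [build_subtable(bucket, rng) for bucket in buckets]

def locate(table, x):
    """Two hash evaluations and one comparison, so O(1) in the worst case."""
    h, subtables = table
    h_j, slots = subtables[h(x)]
    return slots[h_j(x)] == x
```

A subtable with k keys and k^2 slots has fewer than 1/2 expected colliding pairs under a universal family, so each draw succeeds with probability at least roughly 1/2 and the retry loop ends after a constant expected number of attempts.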

Although each second-level table requires space quadratic in the number of entries it holds, if the keys inserted into the first-level hash table are uniformly distributed, the structure as a whole occupies expected O(n) space, since bucket sizes are small with high probability.
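
As a rough check of the linear-space claim, assume (this is an assumption for illustration) that the first-level function is drawn from a universal family and maps the n keys into s = n slots, and let b_j be the number of keys in bucket j. Counting colliding pairs gives:

$\mathbb{E}\Big[\sum_{j} b_{j}^{2}\Big] = n + 2\,\mathbb{E}\Big[\sum_{j}\binom{b_{j}}{2}\Big] \leq n + 2\binom{n}{2}\frac{1}{s} = n + \frac{n(n-1)}{s} < 2n \quad (s = n).$

The first equality uses b_j^2 = b_j + 2·C(b_j, 2), and the inequality uses the fact that any fixed pair of keys collides with probability at most 1/s. The sum of the quadratic second-level sizes is therefore O(n) in expectation, even though a single bucket of size b_j needs on the order of b_j^2 cells.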

During the insertion of a new entry x into bucket j = h(x), the global operations counter, count, is incremented. If x is already stored in subtable T_j but is marked as deleted, the mark is removed. If x is not present and position h_j(x) of T_j is already occupied by another element, a collision is said to occur, and the second-level table T_j of bucket j is rebuilt with a different randomly selected hash function h_j. Because the load factor of the second-level table is kept low (1/k), rebuilding is infrequent, and the expected amortized cost of insertions is O(1).

function Insert(x) is
       count = count + 1;
       if (count > M) 
          FullRehash(x);
       end if
       else
          j = h(x);
          if (Position h_j(x) of subtable T_j contains x)
             if (x is marked deleted) 
                remove the delete marker;
             end if
          end if
          else
             b_j = b_j + 1;
             if (b_j <= m_j) 
                if position h_j(x) of T_j is empty 
                   store x in position h_j(x) of T_j;
                end if
                else
                   Put all unmarked elements of T_j in list L_j;
                   Append x to list L_j;
                   b_j = length of L_j;
                   repeat 
                      h_j = randomly chosen function in H_sj;
                   until h_j is injective on the elements of L_j;
                   for all y on list L_j
                      store y in position h_j(y) of T_j;
                   end for
                end else
             end if
             else
                m_j = 2 * max{1, m_j};
                s_j = 2 * m_j * (m_j - 1);
                if the sum total of all s_j ≤ 32 * M^2 / s(M) + 4 * M 
                   Allocate s_j cells for T_j;
                   Put all unmarked elements of T_j in list L_j;
                   Append x to list L_j;
                   b_j = length of L_j;
                   repeat 
                      h_j = randomly chosen function in H_sj;
                   until h_j is injective on the elements of L_j;
                   for all y on list L_j
                      store y in position h_j(y) of T_j;
                   end for
                end if
                else
                   FullRehash(x);
                end else
             end else
          end else
       end else
end
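
The per-bucket part of Insert can be sketched in Python as below. This is an illustrative sketch under stated assumptions: keys are integers smaller than the prime P, the family ((a*x + c) mod P) mod size stands in for H_sj, and the global counter, the total-space check and FullRehash are left out. The class name Bucket and its helpers are hypothetical; b, m and the size 2 * m * (m - 1) follow the pseudocode's b_j, m_j and s_j.

```python
import random

P = 2_147_483_647  # prime; assumption: keys are ints in [0, P)

class Bucket:
    """One second-level table T_j with its counters b_j and m_j."""

    def __init__(self, rng):
        self.rng = rng
        self.b = 0             # b_j: elements counted since the last rebuild
        self.m = 0             # m_j: capacity threshold before T_j must grow
        self.deleted = set()   # keys that carry a delete marker
        self._reset(1)

    def _reset(self, size):
        """Allocate `size` empty cells and draw a fresh h_j for them."""
        a, c = self.rng.randrange(1, P), self.rng.randrange(P)
        self.slots = [None] * size
        self.h = lambda x: ((a * x + c) % P) % size

    def _rebuild(self, x, size):
        """Collect unmarked elements plus x, then redraw h_j until injective."""
        live = [y for y in self.slots if y is not None and y not in self.deleted]
        live.append(x)
        self.deleted.clear()
        self.b = len(live)
        while True:
            self._reset(size)
            if len({self.h(y) for y in live}) == len(live):
                break
        for y in live:
            self.slots[self.h(y)] = y

    def insert(self, x):
        i = self.h(x)
        if self.slots[i] == x:           # already stored: clear any delete mark
            self.deleted.discard(x)
            return
        self.b += 1
        if self.b <= self.m:
            if self.slots[i] is None:    # free cell: store directly
                self.slots[i] = x
            else:                        # collision: rebuild at the same size
                self._rebuild(x, len(self.slots))
        else:                            # bucket too full: double m_j and grow
            self.m = 2 * max(1, self.m)
            self._rebuild(x, 2 * self.m * (self.m - 1))

    def contains(self, x):
        return self.slots[self.h(x)] == x and x not in self.deleted
```

With no deletions, the subtable grows through sizes 4, 24 and 112 as m_j doubles from 2 to 4 to 8, which keeps the table quadratically larger than the number of elements it holds.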

Deletion of x simply flags x as deleted without removing it, and increments count. For both insertions and deletions, when count reaches the threshold M the entire table is rebuilt, where M is a constant multiple of the size of S at the start of a new phase. Here a phase is the time between two full rebuilds. The amortized cost of a deletion is O(1). The argument -1 passed to FullRehash in Delete(x) stands for a value that is not in the set U of all possible elements, so no new element is added during that rebuild.

function Delete(x) is
       count = count + 1;
       j = h(x);
       if position h_j(x) of subtable T_j contains x
          mark x as deleted;
       end if
       else 
          return (x is not a member of S);
       end else
       if (count >= M)
          FullRehash(-1);
       end if
end
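
The phase bookkeeping shared by Insert and Delete can be sketched separately from the hashing itself. In the Python sketch below, a plain set stands in for the two-level table so that only count, M and the delete markers are modelled. The class name PhaseBookkeeping and the default c = 1.0 are assumptions for illustration, and unlike the pseudocode the threshold is also checked when the deleted key was not found.

```python
class PhaseBookkeeping:
    """Tracks count, M and delete markers; a set stands in for the real table."""

    def __init__(self, c=1.0):
        self.c = c              # constant c in M = (1 + c) * max(count, 4)
        self.stored = set()     # keys physically present, marked or not
        self.deleted = set()    # keys that carry a delete marker
        self.rebuilds = 0
        self._full_rehash()

    def _full_rehash(self):
        """Drop marked keys, then start a new phase with a fresh threshold M."""
        self.stored -= self.deleted
        self.deleted.clear()
        self.count = len(self.stored)
        self.M = (1 + self.c) * max(self.count, 4)
        self.rebuilds += 1

    def insert(self, x):
        self.count += 1
        self.stored.add(x)
        self.deleted.discard(x)      # re-inserting removes the delete marker
        if self.count > self.M:
            self._full_rehash()

    def delete(self, x):
        self.count += 1
        found = x in self.stored and x not in self.deleted
        if found:
            self.deleted.add(x)      # flag only; the key stays in place
        if self.count >= self.M:
            self._full_rehash()
        return found

    def __contains__(self, x):
        return x in self.stored and x not in self.deleted
```

For example, with c = 1.0 the initial threshold is M = 8, so six insertions followed by two deletions end the phase, and the rebuild discards the two marked keys.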

A full rebuild of the table of S begins by removing all elements marked as deleted and then setting the next threshold value M to some constant multiple of the size of S. A hash function that partitions S into s(M) subsets, where subset j is assigned a subtable of size s_j, is then chosen at random repeatedly until:

$\sum _{0\leq j<s(M)}s_{j}\leq {\frac {32M^{2}}{s(M)}}+4M.$

Finally, for each subtable T_j a hash function h_j is repeatedly randomly chosen from H_sj until h_j is injective on the elements of T_j. The expected time for a full rebuild of the table of S with size n is O(n).

function FullRehash(x) is
       Put all unmarked elements of T in list L;
       if (x is in U) 
          append x to L;
       end if
       count = length of list L;
       M = (1 + c) * max{count, 4};
       repeat 
          h = randomly chosen function in H_s(M);
          for all j < s(M) 
             form a list L_j of the elements y of L with h(y) = j;
             b_j = length of L_j; 
             m_j = 2 * b_j; 
             s_j = 2 * m_j * (m_j - 1);
          end for
       until the sum total of all s_j ≤ 32 * M^2 / s(M) + 4 * M
       for all j < s(M) 
          Allocate space s_j for subtable T_j;
          repeat 
             h_j = randomly chosen function in H_sj;
          until h_j is injective on the elements of list L_j;
          for all x on list L_j 
             store x in position h_j(x) of T_j;
          end for
       end for
end
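
A full rebuild along these lines might look as follows in Python. It is a sketch under explicit assumptions: keys are distinct integers smaller than the prime P, the family ((a*x + b) mod P) mod size stands in for both H_s(M) and H_sj, and s(M) is taken to be M, which the text above does not specify. The function names draw and full_rehash and the default c = 1.0 are illustrative.

```python
import random

P = 2_147_483_647  # prime; assumption: keys are distinct ints in [0, P)

def draw(size, rng):
    """Draw h(x) = ((a*x + b) mod P) mod size from a universal family."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % size

def full_rehash(stored, deleted, c=1.0, rng=random):
    """Rebuild both levels from the unmarked keys; returns (M, h, subtables)."""
    live = [x for x in stored if x not in deleted]
    count = len(live)
    M = int((1 + c) * max(count, 4))
    s_M = M                                   # assumption: s(M) = M
    while True:                               # redraw h until the space bound holds
        h = draw(s_M, rng)
        lists = [[] for _ in range(s_M)]
        for x in live:
            lists[h(x)].append(x)
        # m_j = 2 * b_j and s_j = 2 * m_j * (m_j - 1), as in the pseudocode
        sizes = [2 * (2 * len(L)) * (2 * len(L) - 1) for L in lists]
        if sum(sizes) <= 32 * M * M / s_M + 4 * M:
            break
    subtables = []
    for L, s_j in zip(lists, sizes):
        slots = [None] * max(1, s_j)          # keep one cell for empty buckets
        while True:                           # redraw h_j until injective on L_j
            h_j = draw(len(slots), rng)
            if len({h_j(x) for x in L}) == len(L):
                break
        for x in L:
            slots[h_j(x)] = x
        subtables.append((h_j, slots))
    return M, h, subtables
```

Each retry loop succeeds with constant probability under a universal family, which is what makes the expected rebuild time linear in the number of live keys.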

References

Fredman, M. L., Komlós, J., and Szemerédi, E. 1984. Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM 31, 3 (Jun. 1984), 538-544. http://portal.acm.org/citation.cfm?id=1884#
Dietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994. Dynamic Perfect Hashing: Upper and Lower Bounds. SIAM J. Comput. 23, 4 (Aug. 1994), 738-761. http://portal.acm.org/citation.cfm?id=182370#
Demaine, E., and Lind, J. 2003. 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Spring 2003. http://courses.csail.mit.edu/6.897/spring03/scribe_notes/L2/lecture2.pdf
