Draft:WKdm

Revision as of 18:32, 8 January 2024

Virtual memory compression algorithm

The WKdm algorithm is one of the first in the class of WK virtual memory compression techniques, initially developed by Paul R. Wilson, Scott F. Kaplan and colleagues circa 1999. The "dm" in the name WKdm stands for "direct mapped" and refers to the direct-mapped hash used to map uncompressed words in memory to entries in the algorithm's dictionary.

Motivation

The key insight behind the WKdm algorithm is that programs written in most high-level programming languages compile to output whose data sections show very strong regularities in their integers and pointers. First, a large number of integers and pointers are word-aligned within records, where "word" here and for the rest of this article means 32 bits. Second, most integers hold values that are small relative to their maximum range. Third, pointers stored near each other in memory tend to reference addresses that are also close to each other. Finally, certain data patterns, particularly words of all zeroes, occur frequently, and the algorithm exploits this.

These regularities mean that words frequently share many of their high-order bits, either because they are integers too small to need the full word width, or because they are pointers that reference addresses close to those referenced by nearby pointers. Words of all zeroes, which occur frequently, can also be compressed easily.
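The shared high-order bits can be illustrated with a minimal sketch. The sample integers and pointer values below are hypothetical, and the 22/10 bit split is the one WKdm uses for partial matches, as described in the Algorithm section.

```python
LOW_BITS = 10  # WKdm lets the low 10 bits differ in a partial match

def high_bits(word: int) -> int:
    """Return the upper 22 bits of a 32-bit word."""
    return (word & 0xFFFFFFFF) >> LOW_BITS

# Hypothetical values: two small integers and two nearby pointers.
small_a, small_b = 17, 900
ptr_a, ptr_b = 0x7F3A2C40, 0x7F3A2D10

# Both comparisons are true: each pair differs only in its low 10 bits.
print(high_bits(small_a) == high_bits(small_b))
print(high_bits(ptr_a) == high_bits(ptr_b))
```

Any two words that agree in their upper 22 bits can be encoded against one another by storing only the 10 bits that differ.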

Algorithm

Compression

The WKdm algorithm reads one word at a time from an address range, usually one or more pages, and uses a 16-entry direct-mapped dictionary of words to produce compressed output. The output is segregated into four arrays, or "segments", which contain, respectively: "tags" (2-bit values indicating the type of match or non-match), dictionary indices, unmatched words, and the lower 10 bits of partially matched words. The tag, index and partial-match values are initially written into bytes or words in their respective segments, and are "packed" once every word in the address range has been processed.

Each word read is mapped to a dictionary entry using a direct-mapped hash, and the type of match or non-match is then determined. If a full 32-bit word match is found in the dictionary, a 2-bit tag indicating a full-word match is written to the tags segment and the 4-bit index of the match within the dictionary is written to the indices segment. If only the high-order 22 bits match, a different tag is written to the tags segment, the dictionary index of the partial match is written to the indices segment, and the differing 10 low-order bits are recorded in the partial-match segment. If no match is found, the new value is added to the dictionary and emitted to the unmatched-words segment, and another tag signaling this is written to the tags segment. If the word read is all zeroes, only a tag indicating this is written to the tags segment.
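The classification step can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the tag values, the `slot_for` hash, the zero-initialised dictionary and the `Segments` container are all assumptions made for the example, and the dictionary is updated only on a miss, as in the description above. The segments are left unpacked.

```python
from dataclasses import dataclass, field

DICT_SIZE = 16
LOW_BITS = 10
LOW_MASK = (1 << LOW_BITS) - 1

# Tag values are an assumption of this sketch; they only need to be distinct.
TAG_ZERO, TAG_PARTIAL, TAG_MISS, TAG_EXACT = 0, 1, 2, 3

def slot_for(word: int) -> int:
    """Hypothetical direct-mapped hash: index by the bits above the low 10."""
    return (word >> LOW_BITS) % DICT_SIZE

@dataclass
class Segments:
    tags: list = field(default_factory=list)      # one 2-bit tag per word
    indices: list = field(default_factory=list)   # 4-bit dictionary indices
    low_bits: list = field(default_factory=list)  # 10-bit partial-match remainders
    misses: list = field(default_factory=list)    # full unmatched 32-bit words

def compress(words):
    dictionary = [0] * DICT_SIZE  # assumed initial contents
    out = Segments()
    for word in words:
        if word == 0:
            out.tags.append(TAG_ZERO)
            continue
        slot = slot_for(word)
        entry = dictionary[slot]
        if entry == word:
            out.tags.append(TAG_EXACT)
            out.indices.append(slot)
        elif entry >> LOW_BITS == word >> LOW_BITS:
            out.tags.append(TAG_PARTIAL)
            out.indices.append(slot)
            out.low_bits.append(word & LOW_MASK)
        else:
            out.tags.append(TAG_MISS)
            out.misses.append(word)
            dictionary[slot] = word  # only a miss updates the dictionary here
    return out

segs = compress([0, 0x7F3A2C40, 0x7F3A2C40, 0x7F3A2D10])
print(segs.tags, segs.indices, segs.low_bits, segs.misses)
```

Hashing on bits above the low 10 ensures that two words with the same upper 22 bits land in the same dictionary slot, which is what makes a partial match detectable with a single lookup.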

After all the words in the address range have been read, the tags, indices and 10-bit partial-match values, which until then occupy whole bytes or words in their segments, are "packed" within their respective segments using bitwise operations, which further reduces the compressed size. Packing makes the values' bits contiguous, as if each segment were one large bit vector. Additional steps may be taken; the exact details are implementation-specific.
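Packing can be sketched by treating a segment as one large bit vector, as follows. The little-endian bit order and the `pack` and `unpack` helper names are assumptions of this example; real implementations may use a different layout, such as a fixed number of values per machine word.

```python
def pack(values, width):
    """Concatenate fixed-width values into one little-endian bit vector."""
    mask = (1 << width) - 1
    acc = 0
    for i, value in enumerate(values):
        acc |= (value & mask) << (i * width)
    nbytes = (len(values) * width + 7) // 8  # round up to whole bytes
    return acc.to_bytes(nbytes, "little")

def unpack(data, width, count):
    """Recover `count` fixed-width values from a packed bit vector."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]

tags = [0, 2, 3, 1]          # four 2-bit tags fit in one byte
low_bits = [0x110, 0x3FF, 1] # three 10-bit values fit in four bytes
print(len(pack(tags, 2)), len(pack(low_bits, 10)))
```

Stored one value per byte, the four tags would take four bytes; packed, they take one.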

Decompression

Decompression is straightforward. The tags segment is processed one 2-bit tag at a time, and the action taken depends on the value of the tag. If the tag indicates a full-word match, the corresponding dictionary index is read from the indices segment and the dictionary entry at that index is output. If a partial match is indicated, the corresponding entry in the indices segment is used to look up the dictionary word that supplies the high-order 22 bits, and the partial-match segment is read for the low-order 10 bits; together they reconstruct the full 32-bit word, which is written to the uncompressed output. If the tag indicates that there was no match, the corresponding 32-bit word in the unmatched-words segment is read, added to the dictionary and emitted as part of the uncompressed output. If the tag indicates an all-zero word, a 32-bit zero value is written to the output.
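The tag-driven loop can be sketched as follows, operating on unpacked segments. As assumptions of this example, the tag values, the `slot_for` hash and the zero-initialised dictionary are hypothetical; they only need to match whatever the compressor used.

```python
DICT_SIZE = 16
LOW_BITS = 10

# Assumed tag values; they must agree with the compressor's.
TAG_ZERO, TAG_PARTIAL, TAG_MISS, TAG_EXACT = 0, 1, 2, 3

def slot_for(word: int) -> int:
    """Hypothetical direct-mapped hash, identical to the compressor's."""
    return (word >> LOW_BITS) % DICT_SIZE

def decompress(tags, indices, low_bits, misses):
    dictionary = [0] * DICT_SIZE  # assumed initial contents
    index_iter, low_iter, miss_iter = iter(indices), iter(low_bits), iter(misses)
    out = []
    for tag in tags:
        if tag == TAG_ZERO:
            out.append(0)
        elif tag == TAG_EXACT:
            out.append(dictionary[next(index_iter)])
        elif tag == TAG_PARTIAL:
            entry = dictionary[next(index_iter)]
            # Keep the entry's high 22 bits, take the low 10 from the segment.
            out.append(((entry >> LOW_BITS) << LOW_BITS) | next(low_iter))
        else:  # TAG_MISS: the word was stored verbatim
            word = next(miss_iter)
            dictionary[slot_for(word)] = word
            out.append(word)
    return out

words = decompress([0, 2, 3, 1], [11, 11], [0x110], [0x7F3A2C40])
print([hex(w) for w in words])
```

Because the decompressor replays the same dictionary updates in the same order as the compressor, the dictionary itself never needs to be stored in the compressed output.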

Performance

In simulations by the authors and by others, WKdm compression achieved a compression ratio comparable or superior to that of LZ-based dictionary compressors (2:1 to 2.5:1). The WKdm algorithm also has much less overhead than an LZ-class compressor, as its dictionary is only 64 bytes in size, compared with e.g. 64 kilobytes. Furthermore, because of the algorithm's simplicity, compression and decompression are usually much faster than with traditional LZ-based compressors.
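The size figures follow from the field widths given in the Algorithm section, as the following sketch shows. It assumes a 4 KiB page of 1,024 words and ignores any header or per-segment alignment overhead, so the results are bounds for illustration rather than measurements.

```python
WORD_BITS = 32
TAG_BITS, INDEX_BITS, LOW_BITS = 2, 4, 10
DICT_ENTRIES = 16

def dictionary_bytes():
    """Size of the dictionary: 16 entries of one 32-bit word each."""
    return DICT_ENTRIES * WORD_BITS // 8

def packed_bytes(zeros, exact, partial, misses):
    """Estimated packed size in bytes for the given count of each word type."""
    bits = (zeros * TAG_BITS
            + exact * (TAG_BITS + INDEX_BITS)
            + partial * (TAG_BITS + INDEX_BITS + LOW_BITS)
            + misses * (TAG_BITS + WORD_BITS))
    return (bits + 7) // 8

PAGE_WORDS = 1024  # assumed page: 4 KiB of 32-bit words

print(dictionary_bytes())                 # 64
print(packed_bytes(PAGE_WORDS, 0, 0, 0))  # all-zero page: 256
print(packed_bytes(0, 0, 0, PAGE_WORDS))  # every word a miss: 4352
```

An all-zero page shrinks to one sixteenth of its size, while a page in which every word misses grows slightly, because each unmatched word still costs a 2-bit tag on top of its 32 bits.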

References

Wilson, Paul R.; Kaplan, Scott F.; Smaragdakis, Yannis (1999-06-06). The Case for Compressed Caching in Virtual Memory Systems (PDF). USENIX Annual Technical Conference. Monterey, California, USA. pp. 101–116.
Simpson, Matthew (12 Aug 2003). "Analysis of Compression Algorithms for Program Data" (PDF). umd.edu. Retrieved 6 Jan 2024.
