If not already known, establish frequency of occurrence counts / ratios

Build the tree iteratively with the “aggregate the two smallest-weight items” algorithm

Traverse the tree to identify the encoding strings, and store them in a table

Encode via table look-up, padding if needed

Decode via tree navigation, outputting characters when leaves reached, continuing at root. Stop when we run out of bits (even if not at a leaf node)

When it’s time to decode, we must have the tree

Suppose Alice encodes a file and sends it to Bob.

How can Bob decode the file without the tree?

If Bob has the original data, then he can build the tree, but if Bob already has the original data, then there was no need to send him an encoded copy OF the original data

What Bob needs is a method of creating the tree himself, without the added overhead of sending tree-creating data being too great

How to do this?

A couple of ways:

Bob needs either the tree, or to be able to re-build the tree

We could send Bob the frequency counts for each of the 256 possible characters, and he could essentially re-do what we did to build the tree in the first place

If the input file is large (perhaps 1 GB?), then the individual counts would need to be 4-byte integers (we can’t count on being able to use shorts), so we would have to send 256 * 4 bytes (1024) of additional data (that “bloats” our “compressed” file!)

Of course, if the file is THAT large, it’s reasonable to assume (hope?) that it will compress enough such that even with the extra 1 KB of overhead, the total is smaller than the original file.

Bob needs to be able to re-build the tree, and we’d like to be able to send him the information required to do so with fewer than the 1024 bytes the original 256 int counts would require

But Bob doesn’t NEED the original counts!

All Bob needs is the ORDER in which we gathered the nodes originally!

Let’s walk back through our original example, this time with an eye towards remembering HOW we built the tree, in addition to winding up with the tree itself…

In our original example, our “alphabet” consisted of the values: ‘T’, ‘H’, ‘I’, ‘S’, <space>, ‘A’, ‘N’, ‘E’, ‘X’, ‘M’, ‘P’, ‘L’, ‘O’, ‘F’, ‘U’, ‘R’

Here is the “alphabet”, along with the frequency counts (“Sp” for “<Space>”):

Note: for a “real” application, the character numbers, rather than running from 1 to 16 would run from 0 to 255, and the characters would run from ASCII 0 to ASCII 255 – the row number IS the character!

This is huge, and helps us minimize that overhead.

Keep this idea on the back burner for a moment…

Char Count T 2 H 2 I 2 S 2 Sp 7 A 4 N 2 E 4 X 1 M 2 P 1 L 1 O 1 F 3 U 1 R 1

We started by sorting this table by count (in decreasing order), just to make it easy to see where the various items fell with respect to their counts

This is really not necessary. We can do everything in-place without having to sort the list.

Here’s how…

Char Count T 2 H 2 I 2 S 2 Sp 7 A 4 N 2 E 4 X 1 M 2 P 1 L 1 O 1 F 3 U 1 R 1

First, make a pass through the table, and find the lowest count. We said that if the same “lowest count” occurred more than once, that we would go with the first one we came across.

That’s the 1 in row 9

Make another pass through the table, finding the lowest count, as long as it’s not the one in row 9. That’s row 11

Now we’ve identified that the ‘X’ and the ‘P’ need to be gathered up into a subtree, and we do so…

Char Count T 2 H 2 I 2 S 2 Sp 7 A 4 N 2 E 4 X 1 M 2 P 1 L 1 O 1 F 3 U 1 R 1

Since we’ve gathered up the ‘X’ and the ‘P’ into a subtree, we replace the pointer to the ‘X’ node with a pointer to the subtree’s root, and replace the pointer to the ‘P’ node with null. The weight of the ‘XP’ subtree is 2 (since we gathered 1 and 1)

Now slot #9 contains a reference to the subtree with ‘X’ and ‘P’ (we don’t see the structure, but the characters ‘X’ and ‘P’ are contained by the subtree that row 9 now points to.

We gathered up rows 9 and 11 to do this.

Remember “9” and “11”, and make another pass

Char Count T 2 H 2 I 2 S 2 Sp 7 A 4 N 2 E 4 XP 2 M 2 – – L 1 O 1 F 3 U 1 R 1

This time, we identify rows 12 and 13 as containing the lowest-weight items, so we gather them into row 12 (the ‘L’ becomes ‘LO’), and the weight goes from 1 to 2

We do the same thing with rows 15 and 16

We started by remembering 9 and 11.

Now, we’re remembering 9, 11, 12, 13, 15, and 16 (in that order)…

Char Count T 2 H 2 I 2 S 2 Sp 7 A 4 N 2 E 4 XP 2 M 2 – – L 1 O 1 F 3 U 1 R 1

This time, we gather up rows 1 & 2 (the first two we come to that hold the minimal weight of 2), then 3 & 4, and then 7 & 9, and 10 & 12.

Note that gathering rows 7 & 9 and 10 & 12 combine a single character with a subtree.

Each of these has a weight of 2, so when we gather them up, we put the combined weight (4) into the first row of the pair, and set the second row of the pair to null:

Char Count T 2 H 2 I 2 S 2 Sp 7 A 4 N 2 E 4 XP 2 M 2 – – LO 2 – – F 3 UR 2 – –

Now the two lowest-weight items are rows 14 & 15, so we gather them.

Remember, the two lowest-weight items might not have the same weight

Gather rows 14 & 15, placing the result (“FUR”) in row 14, with a weight of 5.

Char Count TH 4 – – IS 4 – – Sp 7 A 4 NXP 4 E 4 – – MLO 4 – – – – – – F 3 UR 2 – –

Now 4 is our lowest-weight item for the next few iterations of the algorithm, and we combine them here now.

Gather rows 1 & 3, and 6 & 7, and 8 & 10, each time making the lower-numbered row the left child of the new parent node, and making the higher- numbered row the right child.

Char Count TH 4 – – IS 4 – – Sp 7 A 4 NXP 4 E 4 – – MLO 4 – – – – – – FUR 5 – – – –

Then we gather rows 5 and 14

Char Count THIS 8 – – – – – – Sp 7 ANXP 8 – – EMLO 8 – – – – – – – – – – FUR 5 – – – –

Now our lowest remaining weights are the two 8s in rows 1 and 6, so combine them:

Char Count THIS 8 – – – – – – SpFUR 12 ANXP 8 – – EMLO 8 – – – – – – – – – – – – – – – –

Now our lowest remaining weights are in rows 5 and 8, so combine them:

Char Count THISANXP 16 – – – – – – SpFUR 12 – – – – EMLO 8 – – – – – – – – – – – – – – – –

We finish the process by combining rows 1 and 5

Char Count THISANXP 16 – – – – – – SpFUREMLO 20 – – – – – – – – – – – – – – – – – – – – – –

Now, row 1 contains a reference to the subtree that (somewhere below its root) contains the entire alphabet.

If you go back through the last several slides, in each iteration, we combined two items to get one.

So, with 16 items in the table, we had to make 15 iterations.

Char Count <everything> 36 – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –

In the first iteration, we combined rows 9 & 11

In the second, we combined 12 & 13

In the 3rd, we combined 15 & 16

In the 4th, we combined 1 & 2

In the 5th, we combined 3 & 4

In the 6th, we combined 7 & 9

In the 7th, we combined 10 & 12

In the 8th, we combined 14 & 15

In the 9th, we combined 1 & 3

In the 10th, we combined 6 & 7

In the 11th, we combined 8 & 10

In the 12th, we combined 5 & 14

In the 13th, we combined 1 & 6

In the 14th, we combined 5 & 8

In the 15th, we combined 1 & 5, and 1 pointed at the resulting root

So, our “combination sequence” was: 9, 11, 12, 13, 15, 16, 1, 2, 3, 4, 7, 9, 10, 12, 14, 15, 1, 3, 6, 7, 8, 10, 5, 14, 1, 6, 5, 8, 1, 5

With 16 rows in the table, we needed 15 iterations, and each gave us a “combination pair”, so we have a sequence of 30 values

Now, back to square 1…

Suppose we had the alphabet, but not the counts

Rather than the counts, if I gave you the sequence:

9, 11, 12, 13, 15, 16, 1, 2, 3, 4, 7, 9, 10, 12, 14, 15, 1, 3, 6, 7, 8, 10, 5, 14, 1, 6, 5, 8, 1, 5

You could process that in pairs, and do exactly the same combining operations, in exactly the same order

You don’t NEED the counts if you have the sequence!

Char T H I S Sp A N E X M P L O F U R

In this example, each number in the sequence was a row number between 1 and 16.

We could just as easily numbered them 0-15

Because we have 16 rows in the table (n), we needed (n-1) iterations, and each produced a pair of row numbers, so we wound up with 2(n-1), or 30 numbers, each 0-15

If our alphabet had been the entire set of 256 values that could occupy a byte, we would have needed 2(n-1), or 510 numbers, each 0-255.

Since 0-255 fits in a byte, and we need 510, we have cut our overhead for the tree-building information down to 510 bytes

So, we can describe the tree-building process in just 510 bytes of data!

We embed that information in the encoded file as the first 510 bytes

To decode a file, the receiver starts by retrieving those 255 pairs, and building the tree for doing the decoding, and then the remainder of the file is a bit stream the receiver uses (in conjunction with the tree) to produce the original byte stream.

Implement a complete Huffman encoding / decoding system

Create a console application (Huff) that has different behavior based on its command-line arguments. It will either:

Read a specified file and produce the 510-byte tree-building info, save that to a file for later use, and stop

Read a specified file, produce the 510-byte tree-building information, and use the resulting tree to encode the file, embedding the tree-building information in the encoded file (the tree-building information is not saved to a separate file)

Read a specified file, and apply a 510-byte tree-building information block from a second specified file, and produce an encoded file

Read a specified encoded file, containing the embedded tree-building information block, build the tree, and decode the file

Read a specified file and produce the 510-byte tree-building info, save that to a file, and stop

Huff –t file1 [file2]

Read a specified file, produce the 510-byte tree-building information, and use the resulting tree to encode the file, embedding the tree-building information in the encoded file (the tree-building information is not saved to a separate file)

Huff –e file1 [file2]

Read a specified file, and apply a 510-byte tree-building information block from a second specified file, and produce an encoded file

Huff –et file1 file2 [file3]

Read a specified encoded file, containing the embedded tree-building information block, build the tree, and decode the file.

Huff –d filename1 filename2

Give the user the appropriate usage forms

Huff –h -or- Huff -? -or- Huff –help

All other forms not matching cases 1 – 5 above result in an error message and the display of the appropriate usage forms

Note: Your program is to check to see if input files exist, and if output files can be created (what if the output file is to a non-existent or read-only drive?). If not, display an error and exit

Command line inputs should be of one of the following forms:

Huff –t file1 [file2] (just build a tree file)

Huff –e file1 [file2] (build tree and encode)

Huff –et file1 file2 [file3] (encode from existing tree)

Huff –d file1 file2 (decode)

Huff –h | -? | –help

All valid runs (except for “help”) should output their elapsed run time in seconds, to three decimal places (nearest millisecond)

From a command window, “fc /b” will compare two files on a byte-by-byte basis

The sample commands: Huff –e C:\Shakespeare.txt C:\Shakespeare.enc Huff –d C:\Shakespeare.enc C:\Shakespeare.dec fc /b C:\Shakespeare.txt C:\Shakespeare.dec should show FC: no differences encountered if your encoder / decoder works correctly from end-to-end

Hide all implementation-specific details (root pointer, any nodes needed, path strings, etc.) in the class (make them private).

You’ll need at least one (non-trivial) recursive (private) method to traverse the tree, building the encoding strings.

Recall that the char data type holds a single byte, which may be interpreted as a character or as an 8-bit integer. Values 128-255 might cause problems (because of the sign bit). Use (((int) c) & 0xff) when you need to use a char arithmetically

Or use unsigned char

http://www.asciitable.com/ or http://www.asciichart.com

This may help with displaying your character frequency counts:

for (int i=0; i<32; i++)

cout << setw(3) << i << setw(7) << CharFreq[i ] << ” || ”

<< setw(3) << i+32 << setw(7) << CharFreq[i+32 ] << ” || ”

<< setw(3) << i+64 << setw(7) << CharFreq[i+64 ] << ” || ”

<< setw(3) << i+96 << setw(7) << CharFreq[i+96 ] << ” || ”

<< setw(3) << i+128 << setw(7) << CharFreq[i+128] << ” || ”

<< setw(3) << i+160 << setw(7) << CharFreq[i+160] << ” || ”

<< setw(3) << i+192 << setw(7) << CharFreq[i+192] << ” || ”

<< setw(3) << i+224 << setw(7) << CharFreq[i+224] << endl;

For this assignment, you will need to be able to manipulate individual bits within a byte

The techniques extend to 32- and 64-bit words

The operations you’ll need will be:

Test a bit (see whether it is “1” or “0”)

Set a bit (make it be “1”) [“turn on”]

Clear a bit (make it be “0”) [“turn off” or “reset”]

To TEST a bit, AND it with “1”.

If the result of the AND operation is 1, the bit was 1; if the result of the AND operation is 0, the bit was 0

Note: This is the BIT-WISE AND (&); not the LOGICAL AND (&&)

Example:

if ((byteValue & 64) == 64)

Recall, however, that in C/C++, non-zero values are taken as true, so we don’t have to check to see if the ANDed value is any particular value; if it’s not zero, then the bit was ON

if (byteValue & 64)

To SET a bit, OR it with “1”.

Note: This is the BIT-WISE OR (|); not the LOGICAL OR (||)

Example:

byteValue = byteValue | 64;

byteValue |= 64;

You can turn on multiple bits at once

byteValue = byteValue | 96; // 96 = 01100000

byteValue |= 96; // ^^

byteValue |= (64 | 32); // |+—- 32 // +—– 64

To CLEAR a bit, AND it with “0”.

Note: This is the BIT-WISE AND (&); not the LOGICAL AND (&&)

Since all 8 bits are used in the CLEAR operation, the ones you’re NOT clearing have to all be “1”

Example:

byteValue = byteValue & (255-64);

byteValue &= (255-64); // 255-64 in binary is 10111111

You can turn off multiple bits at once

byteValue = byteValue & (255 – 64 – 8);

byteValue &= (255 – 64 – 8); // 255-64-8 in binary: 10110111

Clearing them all at once is easy: byteValue = 0;

Setting them all at once is easy: byteValue = 0xff;

An example:

170 & 103 (decimal)

170 (decimal) = 0xAA = 10101010

103 (decimal) = 0x67 = 01100111

Apply bit-wise and ——–

00100010

0010 0010

2 2 (hex)

2*16 + 2 (dec)

34 (dec)

You will need to build some bytes one-bit-at-a-time, from a list of eight 1-or-0 values. There are a couple of ways to go about it:

One is to use the bit position and the left-shift operator.

Suppose we call the bit positions 1 – 8, where “bit 1” is the MSB – Most Significant Bit (the left-most bit, or the bit corresponding to 128 [the sign bit]), and “bit 8” is the LSB – Least Significant Bit, corresponding to the value of 1.

Let’s say the byte we need to build is unsigned char b

Further, suppose we have 8 integer variables (i1 – i8), each of which contains either 0 or 1.

What we need to do is to take the bit that makes up i1 and put it in the MSB, the bit that makes up i2, and put it in the next bit, … the bit that makes up i8, and put it in the LSB

Rather than trying to figure out which bits to turn on and which ones to turn off, we can start by turning them all off at once:

b = 0; // that was easy

Then we use the values in i1 – i8 to set the individual bits in b whenever i1 – i8 is 1

if (i1) b |= (1 << 7); // 128

if (i2) b |= (1 << 6); // 64

if (i3) b |= (1 << 5); // 32

… // …

if (i7) b |= (1 << 1); // 2

if (i8) b |= (1 << 0); // 1

If the values of i happen to be in an array, rather than 8 distinct variables, then you COULD use a loop:

b = 0; // set all 8 bits to zero

for (int j = 0; j<=7; j++) b |= (i[j] << (7-j)); — OR — for (int j = 0; j<=7; j++) if (i[j]) b |= (1 << (7-j));

You may also want to consider pre-computing the 8 powers of two, and let j select one of them, rather than computing them on-the-fly:

unsigned char P2[] = {1, 2, 4, … 128}; // Powers of 2

b = 0; // set all 8 bits to zero

for (int j = 0; j<=7; j++) if (i[j]) b |= P2[j]); // using P2[j] rather than // 1 << (7-j)

Some of the data types you’ll need (perhaps class-level variables?)

Huffman tree nodes (char / count / LCH / RCH)

These will be contained entirely in the Huffman class, so creating a node class is overkill; just create a struct for the nodes.

An array of pointers to nodes (to build the tree)

A string array (for the paths to the leaves for every possible character)

An integer array (for the character frequencies)

You will have to use the command-line arguments to determine what to do, and what files to operate on

The header for main() should be:

int main(int argc, char* argv[])

main() receives the number of command-line arguments in argc, and a string array in argv – each argument is one of the strings.

This is essentially what java does with main(String[] args) (except that in Java, an array is an object, so we can GET its size with .length, which isn’t available in C++, so we have to be given the count (hence, int argc)

For example, if the usage is:

Huff –et file1 file2 [file3]

Then argc will either be 4 or 5 (depending on whether file3 was given or not)

argv[0] will be “Huff”

argv[1] will be “-et”

argv[2] will be the first file name (file1)

argv[3] will be the second one (file2)

If the third file name (file3) is specified, it will be in argv[4]

Some things you must NOT do / use:

Do not use anything in the STL:

All you need are strings, arrays, and chars, and ints (the chars and ints should be unsigned!)

No bitsets, no vectors, nothing that does the work for you.

Do not attempt to build a long string containing bits of the entire encoded file.

This is space-inefficient, and will be terribly slow

For Shakespeare, that would be a 40,000,000-character string!

Do not use pow(2, i) to create powers of two – in terms of efficiency, this is evil, and we can do better!

int[] freqCounts(string fileName)

Produces a 256-element array containing the character frequency counts from the file in fileName

Node* countsToTree(int[] counts, String &builder)

Given the frequency counts in counts, builds (and returns a pointer to the root of) the Huffman tree, and produces a 510-byte string containing the tree-building sequence

string[] buildEncodingStrings(node* tree)

Takes a pointer to the root of the Huffman tree, and traverses the tree to return a 256-element string array, where each is the encoding string for the corresponding character

int[] encode(string inFile, string builder, string outFile)

Uses 510-byte string builder to create a Huffman tree, which is then used to encode inFile, placing the results, along with the builder, into outFile. The “results” is the Huffman bit string, padded (if needed) to fill the byte in which it ends.

void decode(string file1, string file2)

Reads file1, extracts the 510-byte tree builder, builds the tree, and uses it to decode the remainder of file1 into file2.

ACT I

SCENE I. Rousillon. The COUNT’s palace.

Enter BERTRAM, the COUNTESS of Rousillon, HELENA, and LAFEU, all in black

COUNTESS

In delivering my son from me, I bury a second husband.

BERTRAM

And I in going, madam, weep o’er my father’s death

anew: but I must attend his majesty’s command, to

whom I am now in ward, evermore in subjection.

LAFEU

You shall find of the king a husband, madam; you,

sir, a father: he that so generally is at all times

good must of necessity hold his virtue to you; whose

worthiness would stir it up where it wanted rather

than lack it where there is such abundance.

COUNTESS

What hope is there of his majesty’s amendment?

LAFEU

He hath abandoned his physicians, madam; under whose

practises he hath persecuted time with hope, and

finds no other advantage in the process but only the

losing of hope by time.

Any Questions?

## Business Finance : Paraphrasing Discussion...

Assignment 1: Paraphrasing Discussion This is a two-part assignment. First, you will use what you learned in the...

## 3 Page Compare and Contrast Essay...

Request 3 page compare and contrast paper.  I read comments left by other students.  If you are not a good writer, please do not waste my time or...

## MBA 5220 Homework 5...

MBA 5220 Homework 5 Please show all calculation steps for full credits. 1. A magazine publ...

## math statistics...

need help with math statistcs?...

## some...

1. each one 150-300 words2. Reference(URL, Financial Times, the Wall Street journal etc.)3. the format shows in the picture4....

## Essay...

900 words, double spacedIdentify a leader on the Fortune list here: ...

...

## The deliverable length of your homework assignment...

Write a professional letter to your Sheriff’s Department to submit your resume. See doc sharing for an example. Inclu...

The product will be on Fast Food, please add notes in the power point.Choose a product or serviceResearch the Internet to analyze how that product...

## math theory questions...

I need answers within 3h just for 2 of therm... ...

## Education & Technology (English)...

Please review all of the guidelines!!! This should be a 5-7 page research paper/interview!!!!...

## What example is change blindness and what are some...

What example is change blindness and what are some examples of change blindness?2 examples would be great please!...
Calculate Price

When you use PaperHelp, you save one valuable — TIME

You can spend it for more important things than paper writing.

Approx. price
\$65
Order a paper. Study better. Sleep tight. Calculate Price!
Calculate Price
Approx. price
\$65