Data Compression
RunLength Encoding
RLE (RUN LENGTH ENCODING)
Aims to save money and time by reducing amount of data transmitted
Instead of sending <char> <char><char><char><char><char><char> Send ESC 7 <char> When data includes ESC, send ESC ESC
Text usually unsuitable for RLE only contains repeated space chars
more elegant; <char> acts as its own esc
Binary files better often contain repeated chars, especially NUL
RLE is used for encoding faxes
Data Compression
Huffman Coding
HUFFMAN CODING
ASCII encodes all characters with 7 bits
Characters occur with unequal frequencies f e =100 x fq
Use fewer bits to encode most-common
Makes boundaries between characters hard to find
Data Compression
Huffman Coding
HUFFMAN CODING
Consider an alphabet with only 4 letters
00
00
00
01
01
01
00
10
10
11
AAAABBBCCD
1
1
1
01
01
01
1
001
001
000
20 bits → 19 bits better compression with larger alphabets!
Data Compression
Huffman Coding
HUFFMAN CODING
Both ends must agree on the code set
Most efficient to create a code specifically for the data being sent Allows for different letter frequencies in different languages
Fax machines use a modified Huffman scheme. codes for sequences containing 1, 2, 3, .... , 63, 64 black or white dots 128, 192, ... (i.e. multiples of 64) dots
So 67-dot sequence would be sent as codes for 64 then 3
Data Compression
Lempel-Ziv Compression
LEMPEL-ZIV COMPRESSION
ZIP and UNIX Compress utility use (modified) L-Z compression
Codes are fixed-length (usually 12 or 16-bits )
7-bit ASCII for single characters + nine 0-bits Inefficient when sending single characters But after a while, very few single characters get sent
Extra codes for most common character sequences Sender creates extra codes based on the letter frequencies in the message Receiver constructs extra codes while decompressing the original message
Data Compression
Lempel-Ziv Compression
LEMPEL-ZIV COMPRESSION
the Remaining string
R denotes the unsent part of the message
Initially, R is the complete message
A code table relates character sequences to codes
a 1 b 2 . z 26 . th 27 . the 79 . then 158
Initially, code table just contains the alphabet Sender and receiver have the same table
New sequences are added when they are encountered sender and receiver both add the same codes easy for the sender!
L denotes the longest string of characters… starting from the first character of R occurring in the code table
L’ denotes L + the next character in R
Data Compression
Lempel-Ziv Compression
LEMPEL-ZIV COMPRESSION
…they then and there theorised that this was thus
Sender identifies L, sends the code for L to the receiver
Receiver receives code looks it up in the code table adds L to the message string
Sender identifies L’, & makes a new entry in the code table for L’
Data Compression
Lempel-Ziv Compression
LEMPEL-ZIV COMPRESSION
Data Compression
Lempel-Ziv Compression
LEMPEL-ZIV COMPRESSION
Receiver
Sender
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26 _ 27
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26 _ 27
t
h
e
y
_
t
h
e
n
_
a
n
d
_
t
h
e
r
e
_
t
h
e
o
r
i
s
e
d
_
t
h
a
t
_
t
h
i
s
_
w
a
s
_
t
h
u
s
t
h
R L L’
20
28
Data Compression
Lempel-Ziv Compression
LEMPEL-ZIV COMPRESSION
Receiver
Sender
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26 _ 27
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26 _ 27
e
e
h
y
_
t
h
e
n
_
a
n
d
_
t
h
e
r
e
_
t
h
e
o
r
i
s
e
d
_
t
h
a
t
_
t
h
i
s
_
w
a
s
_
t
h
u
s
t
h
R L L’
8
29
Data Compression
Lempel-Ziv Compression
LEMPEL-ZIV COMPRESSION
Receiver
Sender
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26 _ 27
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26 _ 27
y
e
y
_
t
h
e
n
_
a
n
d
_
t
h
e
r
e
_
t
h
e
o
r
i
s
e
d
_
t
h
a
t
_
t
h
i
s
_
w
a
s
_
t
h
u
s
t
h
e
R L L’
5
30
Data Compression
Lempel-Ziv Compression
LEMPEL-ZIV COMPRESSION
Receiver
Уважаемый посетитель!
Чтобы распечатать файл, скачайте его (в формате Word).
Ссылка на скачивание - внизу страницы.