AVR200: Multiply and Divide Routines. Code Size Execution Time. 8 x 8 = 16 Unsigned Multiplication - “mpy8u”

Страницы работы

Фрагмент текста работы

Speed is Comparable with HW mance specifications is given in Table 1.

Multiplicators/Dividers

Example: 8 x 8 Mul in 2.8 ms, 16 x 16

Mul in 8.7 µs (12 MHz)

•  Extremely Compact Code

Table 1.  Performance Figures Summary

Application

Code Size

(Words)

Execution Time

(Cycles)

8 x 8 = 16 bit unsigned (Code Optimized)

9

58

8 x 8 = 16 bit unsigned (Speed Optimized)

34

34

8 x 8 = 16 bit signed (Code Optimized)

10

73

16 x 16 = 32 bit unsigned (Code Optimized)

14

153

16 x 16 = 32 bit unsigned (Speed Optimized)

105

105

16 x 16 = 32 bit signed (Code Optimized)

16

218

8 / 8 = 8 + 8 bit unsigned (Code Optimized)

14

97

8 / 8 = 8 + 8 bit unsigned (Speed Optimized)

66

58

8 / 8 = 8 + 8 bit signed (Code Optimized)

22

103

16 / 16 = 16 + 16 bit unsigned (Code Optimized)

19

243

16 / 16 = 16 + 16 bit unsigned (Speed Optimized)

196

173

16 / 16 = 16 + 16 bit signed (Code Optimized)

39

255


The application note listing consists of two files:

“avr200.asm”: Code size optimized multiplied and divide routines.

“avr200b.asm”: Speed optimized multiply and divide routines.

8 x 8 = 16 Unsigned Multiplication - “mpy8u”


Both program files contain a routine called “mpy8u” which performs unsigned 8-bit multiplication. Both implementations are based on the same algorithm. The code size optimized implementation, however, uses looped code, whereas the speed optimized code is a straightline code implementation. Figure 1 shows the flow chart for the code size optimized version.

Algorithm Description

The algorithm for the Code Size optimized version is as follows:

1.  Clear result High byte.

2.  Load loop counter with 8.

3.  Shift right multiplier

4.  If carry (previous bit 0 of multiplier) set, add multiplicand to result High byte.

5.  Shift right result High byte into result Low byte/multiplier.

6.  Shift right result Low byte/multiplier.

7.  Decrement Loop counter.

Figure 1.  “mpy8u” Flow Chart (Code Size Optimized Implementation)

8.  If loop counter not zero, goto Step 4.

Usage

The usage of “mpy8u” is the same for both versions:

1.  Load register variables “mp8u” and “mc8u” with the multiplier and multiplicand, respectively.

2.  Call “mpy8u” Performance

3.  The 16 -bit result is found in the two register variables “m8uH” (High byte) and “m8uL” (Low byte)

Observe that to minimize register usage, code and execution time, the multiplier and result Low byte share the same register.


Table 2.  “mpy8u” Register Usage (Code Size Optimized Implementation)

Register

Input

Internal

Output

R16

“mc8u” - multiplicand

R17

“mp8u” - multiplier

“m8uL” - result Low byte

R18

“m8uH” - result High byte

R19

“mcnt8u” - loop counter

Table 3.  “mpy8u” Performance Figures (Code Size Optimized Implementation)

Parameter

Value

Code Size (Words)

9 + return

Execution Time (Cycles)

58 + return

Register Usage

•  Low registers

•  High registers

•  Pointers

:None

:4

:None

Interrupts Usage

None

Peripherals Usage

None

Table 4.  “mpy8u” Register Usage (Straight-line Implementation)

Register

Input

Internal

Output

R16

“mc8u” - multiplicand

R17

“mp8u” - multiplier

“m8uL” - result Low byte

R18

“m8uH” - result High byte

Table 5.  “mpy8u” Performance Figures (Straight-Line Implementation)

Parameter

Value

Code Size (Words)

34 + return

Execution Time (Cycles)

34 + return

Register Usage

•  Low registers

•  High registers

•  Pointers

:None

:3

:None

Interrupts Usage

None

Peripherals Usage

None

8 x 8 = 16 Signed Multiplication - “mpy8s”

This subroutine, which is found in “avr200.asm” implements signed 8 x 8 multiplication. Negative numbers are represented as 2’s complement numbers. The application is an implementation of Booth's algorithm. The algorithm provides both small and fast code. However, it has one limitation that the user should bear in mind; If all 16 bits of the result is needed, the algorithm fails when used with the most negative number (-128) as the multiplicand.

Algorithm Description

The algorithm for signed 8 x 8 multiplication is as follows:

1.  Clear result High byte and carry.

2.  Load loop counter with 8.

3.  If carry (previous bit 0 of multiplier) set, add multiplicand to result High byte.

4.  If current bit 0 of multiplier set, subtract multiplicand from result High byte.

5.  Shift right result High byte into result Low byte/multiplier.

6.  Shift right result Low byte/multiplier.

7.  Decrement loop counter.

8.  If loop counter not zero, goto Step 3.


Figure 2.  “mpy8s” Flow Chart

Usage

The usage of “mpy8s” is as follows:                                         3.     The 16 -bit result is found in the two register vari1.  Load register variables “mp8s” and “mc8s” with the ables “m8sH” (High byte) and “m8sL” (Low byte) multiplier and multiplicand, respectively. Observe that to minimize register usage, code and execu2.  Call “mpy8s”           tion time, the multiplier and result Low byte share the same register.

Performance

Table 6.  “mpy8s” Register Usage

Register

Input

Internal

Output

R16

“mc8s” - multiplicand

R17

“mp8s” - multiplier

“m8sL” - result Low byte

R18

“m8sH” - result High byte

R19

“mcnt8s” - loop counter

Table 7.  “mpy8s” Performance Figures

Parameter

Value

Code Size (Words)

10 + return

Execution Time (Cycles)

73 + return

Register Usage

•  Low registers

•  High registers

•  Pointers

:None

:4

:None

Interrupts Usage

None

Peripherals Usage

None


16 x 16 = 32 Unsigned Multiplication - “mpy16u”

Both program files contain a routine called “mpy16u” which performs unsigned 16-bit multiplication. Both implementations are based on the same algorithm. The code size optimized implementation, however, uses looped code, whereas the speed optimized code is a straight-line code implementation. Figure 3 shows the flow chart for the Code Size optimized (looped) version.

Algorithm Description

The algorithm for the Code Size optimized version is as follows:

1.  Clear result high word (Bytes 2 and 3)

2.  Load loop counter with 16.

3.  Shift multiplier right

4.  If carry (previous bit 0 of multiplier Low byte) set, add multiplicand to result High word.

5.  Shift right result High word into result Low word/multiplier.

6.  Shift right Low word/multiplier.

7.  Decrement Loop counter.

8.  8. If loop counter not zero, goto Step 4.

Figure 3.  “mpy16u” Flow Chart (Code Size Optimized Implementation)

Usage

The usage of “mpy16u” is the same for both versions:

1.  Load register variables “mp16uL”/”mp16uH” with multiplier

Похожие материалы

Информация о работе

Тип:
Дополнительные материалы
Размер файла:
90 Kb
Скачали:
0