COS2621/102/3/2018

Tutorial Letter 102/3/2018 Computer Organisation and Architecture

COS2621 Semesters 1 and 2 School of Computing This tutorial letter contains important information about your module.

Dear Student

Study unit 0

It is important for any serious computer scientist/ICT professional to have a deep knowledge of the way in which a computer program is executed. It does not matter whether the program was written in a high-level language like C++, Visual Basic or Java, or whether it was written in assembly language. Eventually, all program code has to be translated into machine code before it can be executed. In this module, we study the underlying organisation and operation of a modern digital computer. Our aim is to explain what goes on at the lower levels inside a computer, and we do this by examining some of the hardware and software components of a computer system.

We would like to instill the following attitude: "Look, I don't know this particular brand of computer, but since I know that almost all computers operate in basically the same way, give me one or two weeks and the set of relevant manuals and I will be quite at home with this computer".

Theoretical and practical components 

The theoretical part of the syllabus for COS2621 is covered in the prescribed book. This tutorial letter (often referred to as the guide, or the study guide) contains all the study material you will need for the practical component of this module.


The prescribed book for this module is:

Stallings, W. 2016. Computer Organization & Architecture: Designing for performance. 10th edition. Prentice Hall.

Note that we are not going to study the chapters in Stallings in the order in which they are presented. We have to introduce the assembly language concepts as soon as possible in order for you to master the practical work. A short summary of the concepts that we study in this module follows:

Study unit 1: Computer Organisation and Computer Architecture, Boolean algebra and number systems (This study unit supplements chapters 1, 9 and 11 as well as appendix B of Stallings.)

Study unit 2: The evolution of computers has been characterized by increasing processor speed, decreasing component size, increasing memory size, and increasing I/O capacity and speed. One factor responsible for the great increase in processor speed is the shrinking size of microprocessor components; this reduces the distance between components and hence increases speed. (This study unit supplements chapter 2 of Stallings.)

Study unit 3: An instruction cycle consists of an instruction fetch, followed by zero or more operand fetches, followed by zero or more operand stores, followed by an interrupt check (if interrupts are enabled). The major computer system components (processor, main memory, I/O modules) need to be interconnected in order to exchange data and control signals. The most popular means of interconnection is the use of a shared system bus consisting of multiple lines. In contemporary systems, there typically is a hierarchy of buses to improve performance. (This study unit supplements chapter 3 of Stallings.)

Study unit 4: Processors make use of instruction pipelining to speed up execution. In essence, pipelining involves breaking up the instruction cycle into a number of separate stages that occur in sequence, such as fetch instruction, decode instruction, determine operand addresses, fetch operands, execute instruction, and write operand result. Instructions move through these stages, as on an assembly line, so that in principle, each stage can be working on a different instruction at the same time. The occurrence of branches and dependencies between instructions complicates the design and use of pipelines. (This study unit supplements chapter 14 of Stallings.)

Study unit 5: Computer Arithmetic (This study unit supplements Chapter 10 of Stallings.)

Study unit 6: Instruction sets: Characteristics and Functions (This study unit supplements chapter 12 and appendix B, as well as appendix O (online) of Stallings.)

Study unit 7: Instruction sets: Addressing modes and formats (This study unit supplements chapter 13 of Stallings.)

Study unit 8: Cache memory (This study unit supplements Chapter 4 of Stallings.)

Study unit 9: Internal memory (This study unit supplements chapter 5 of Stallings).

Study unit 10: External memory (This study unit supplements chapter 6 of Stallings.)

Study unit 11: Input/ Output (This study unit supplements chapter 7 of Stallings.)

Study unit 12: RISC computers (This study unit supplements chapter 15 of Stallings.)

DEBUG: DEBUG, which is available as part of DOS within the Windows environment, creates an environment in which relatively small assembly language programs can be developed, run and debugged with ease. It can also be used to debug larger programs. This is a very powerful debugging tool which you should learn to use effectively.

Important! This study guide supplements the prescribed book. You cannot study this module using only this study guide. All the theoretical study material is contained in Stallings.


Study unit 1 Organisation and architecture

This study unit supplements the preface, chapters 1, 9 and 11, as well as Appendix B of Stallings. The most important components of a computer are introduced.

Learning outcomes

Once you have mastered the study material in this study unit, you will be able to

• identify the different components a computer system consists of
• describe the most important function of each component
• list the main components of the CPU
• describe the difference between computer organisation and computer architecture
• discuss the advantages of studying assembly language programming, computer organisation and architecture
• convert values between the binary, decimal and hexadecimal number systems
• use Boolean algebra to simplify Boolean expressions

Stallings Study chapter 1; chapter 9; chapter 11: section 11.1. Read preface, chapter 11: sections 11.2 to 11.7; Appendix B.

Chapter 1

Introduction

Chapter 1 introduces the concept of the computer as a hierarchical system. A computer can be viewed as a structure of components and its function described in terms of the collective function of its cooperating components. Each component, in turn, can be described in terms of its internal structure and function. The major levels of this hierarchical view are introduced. The remainder of the book is organized, top down, using these levels.

Stallings: Study section 1.2.

1.1 Basic building blocks 

Stallings: Study chapter 9. When working on the low level of computer systems, we frequently use the binary and hexadecimal number systems. It is important that you feel comfortable with both these number systems.

Stallings: Study section 11.1. The concepts discussed in chapter 11 form the basic building blocks of computer components.

1.2 Boolean algebra 

Activity 1-1

Simplify the following expressions using Boolean algebra. Name the identity used in every step.

a) [(CD)′ + A]′ + A + CD + AB
b) xyz + x′y + xyz′
c) ABC' + CB' + AC + AB'

Solution

a) [(CD)′ + A]′ + A + CD + AB
   = ((CD)′)′A′ + A + CD + AB            De Morgan
   = CDA′ + A + CD + AB                  Double negative
   = CD(A′ + 1) + A(1 + B)               Associative & distributive
   = CD + A                              Null (Zero): A′ + 1 = 1

b) xyz + x′y + xyz′
   = xy(z + z′) + x′y                    Distributive
   = xy + x′y                            Complement
   = y(x + x′)                           Distributive
   = y                                   Complement

c) ABC' + CB' + AC + AB'
   = ABC' + CB' + AC(B + 1) + AB'        Null: B + 1 = 1
   = ABC' + (ABC + AC) + AB' + CB'       Distributive & associative
   = AB(C' + C) + AC + AB' + CB'         Associative & distributive
   = AB + AC + AB' + CB'                 Complement
   = A(B + B') + AC + CB'                Associative
   = A + AC + CB'                        Complement
   = A(C + 1) + CB'                      Distributive
   = A + CB'                             Null: C + 1 = 1
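A simplification such as those in Activity 1-1 can be checked by comparing truth tables. The Python sketch below is only an illustration and is not part of the prescribed material; the helper name `equivalent` and the lambda names are assumptions made for this example. It evaluates the original and the simplified expression for every combination of input values.

```python
from itertools import product

def equivalent(lhs, rhs, n_vars):
    """Return True if two Boolean functions agree on every input combination."""
    return all(
        bool(lhs(*bits)) == bool(rhs(*bits))
        for bits in product([False, True], repeat=n_vars)
    )

# a) [(CD)' + A]' + A + CD + AB  versus  CD + A
a_original = lambda A, B, C, D: (
    (not ((not (C and D)) or A)) or A or (C and D) or (A and B)
)
a_simplified = lambda A, B, C, D: (C and D) or A

# b) xyz + x'y + xyz'  versus  y
b_original = lambda x, y, z: (
    (x and y and z) or ((not x) and y) or (x and y and (not z))
)
b_simplified = lambda x, y, z: y

# c) ABC' + CB' + AC + AB'  versus  A + CB'
c_original = lambda A, B, C: (
    (A and B and (not C)) or (C and (not B)) or (A and C) or (A and (not B))
)
c_simplified = lambda A, B, C: A or (C and (not B))

results = {
    "a": equivalent(a_original, a_simplified, 4),
    "b": equivalent(b_original, b_simplified, 3),
    "c": equivalent(c_original, c_simplified, 3),
}
print(results)
```

A truth-table check confirms that the final answer is right, but it does not replace the step-by-step derivation in which each identity is named.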

Stallings: Read sections 11.2 to 11.7. This is not for examination purposes but you will encounter the concepts explained here in other sections of the study material. It is important that you understand the functions of multiplexers and decoders respectively.

1.3 Summary 

The scope of the study material in the text book is discussed in the preface. It is advisable to read this through to get an overview of the study material covered in this module. Remember that we will not study all the chapters in Stallings and will not work through the text book in the order in which the chapters are presented.

1.4 Key questions 

Make sure that you are able to answer the relevant review questions given at the end of each of the chapters covered in this unit!

----oooOooo----

Study unit 2 Computer performance

This study unit supplements chapter 2 of Stallings. The history of the different computer generations is discussed, with the emphasis on the technological advances that led to the emergence of each new computer generation. Some of the factors that have to be taken into account when designing a computer system are discussed, and we look at the

Learning outcomes

Once you have mastered the study material in this study unit, you will be able to

• discuss important issues that have to be resolved when designing computer systems
• describe the evolution of the Intel x86 family of microprocessors
• describe what is meant by an embedded system
• discuss the impact of new approaches involving multicore machines, MICs and GPUs

Stallings

Study sections 2.1 to 2.5. (Study the sections from these chapters as indicated in this study unit. Read the rest of

2.1 Designing for performance

The evolution of the different computer generations is discussed with a short explanation of the technology that gave rise to the emergence of each new generation. We also look at important factors that have to be taken into account in the design of a computer system.

The practical work for COS2621 should be done on an Intel x86 computer. Note that all modern Intel CPUs are x64 but are backwards compatible with x86. Pay particular attention to all the discussions in the prescribed book that deal with the Intel x86 family of microprocessors. It is very important to note that there may be significant differences at a low level between different families of microprocessors. For this reason, we do not restrict our attention to the Intel x86, but also look at the organisation of the ARM processor.

2.2 Computer evolution 

Stallings: read section 2.1. Pay particular attention to the figures. If you understand these diagrams it will be easier to understand the corresponding study material.

Stallings: study the following in section 2.1:

• The general structure of the IAS computer that is based on the von Neumann design
• Moore’s law and the consequences thereof
• The set of characteristics that distinguish a particular family of computers

2.3 Design principles 

Stallings: study section 2.2.

Hardware and software are generally logically equivalent. This is a simple but very important concept. Hardware and software are frequently capable of performing the same function. Compare performing the same logical function in Boolean arithmetic (section 11.1), and using hardware (sections 11.2 and 11.3).

In computer design, we often have to decide which functions to implement in hardware and which in software. Both methods have pros and cons that should be considered. Hardware offers speed but not flexibility, whereas software offers flexibility but less speed. This phenomenon, known as a "trade-off", is often relevant in this module. Remember that it is a design decision. Except for the reasons mentioned here, there is no special reason why it is advisable to implement a specific operation in hardware rather than in software or vice versa, but cost often plays a role in the trade-off between the different approaches.

2.4 Multicore processors, MICs and GPUs

Stallings: study section 2.3.

This section discusses the cutting-edge developments as far as computer architecture is concerned.

2.5 The Intel x86 family and the ARM processor

Stallings: study section 2.4 and read section 2.5. It is not necessary to memorise the differences between all the members of the Intel x86 family. However, take note of the main differences, especially among the Pentium 1, 2, 3 and 4 and the Pentium Multicore CPUs. Read the section on embedded systems and the ARM CPU. Note that the ARM CPU is also used in mobile phones and tablets! You need not memorise the differences between the various members of the ARM family of microprocessors. We study the design principles of RISC and CISC architectures respectively in study unit 12.

2.6 Summary

Stallings: section 2.8. Make sure that you understand all the key terms, and that you know what the acronyms listed there represent.

2.7 Key Questions 

Work through the review questions at the end of the chapter.

----oooOooo----

Study unit 3 Computer functions and interconnections

This study unit supplements chapter 3 of Stallings. The instruction cycle is described. We also look at different ways in which computer components can be connected.

Learning outcomes

Once you have mastered the study material in this study unit, you will be able to

• explain the instruction cycle
• explain what an interrupt is and what happens when an interrupt occurs
• describe the different classes of interrupt
• discuss different ways in which multiple interrupts can be handled
• explain why interconnections between different computer components are necessary
• explain what a computer bus is
• describe different bus structures and multiple-bus hierarchies
• distinguish different types of bus
• discuss bus arbitration and bus width
• explain the difference between synchronous and asynchronous timing
• describe what PCI is

Stallings

Study sections 3.1 - 3.4. Read sections 3.5 and 3.6. You must study the sections indicated in this study unit.

3.1 Introduction

Chapter 3 of Stallings gives a general introduction to the structures that are used to connect different computer components. The instruction cycle is explained as well as the handling of different types of interrupt. We look at ways in which buses are used for interconnection.

3.2 Computer components 

Stallings: study section 3.1. Stallings discusses the components that a computer system consists of and explains why connections between these components are necessary.

3.3 Computer function 

Stallings: study section 3.2. The instruction cycle is explained. We look at interrupts and how and why an interrupt cycle is introduced into the instruction cycle. In this way we can test for an interrupt after the execution of every instruction. Stallings also discusses ways in which multiple interrupts could be handled.

3.4 Interconnections 

Stallings: study section 3.3. The structure of the different interconnections in a computer system is discussed in this section.

3.5 Bus interconnection, point-to-point interconnect and PCI buses

Stallings: study section 3.4. The general structure of a bus, as well as the way in which multiple buses can be used to optimise performance, is discussed. Different types of bus are described and we look at bus arbitration and bus width. The concepts of synchronous/asynchronous timing between communicating components are considered.

Stallings: read sections 3.5 and 3.6. PCI, as well as the newer PCI Express buses are frequently used in microcomputers. It is not necessary to pay attention to detail.

3.6 Summary 

Make sure that you understand the key terms and that you know what the acronyms listed in this section represent.

3.7 Key Questions. 

Work through the review questions at the end of the chapter.

----oooOooo----

Study unit 4 Processor Structure and Function

This study unit supplements chapter 14 of Stallings. The most important components of the CPU are described and processor register organisation is discussed. We look in detail at the register organisation of the Intel x86 and the ARM processors respectively. The

Learning outcomes

Once you have mastered the study material in this study unit, you will be able to

• describe the internal structure of the CPU in general
• describe the register organisation of the Intel x86
• describe what an interrupt is and what happens when an interrupt occurs
• describe different types of interrupt
• describe what an exception is and what happens when an exception occurs
• describe the instruction cycle for indirect addressing
• understand the principles of pipelining

Stallings

Study sections 14.1, 14.2, 14.3 and 14.5. Read sections 14.4 and 14.6.

4.1 Introduction

We look at CPU organisation using the instruction cycle to explain the need for the different components comprising the CPU. The basic principles of instruction pipelining are discussed without going into too much detail. The organisation of the Intel x86 family is studied in detail since we need this knowledge to write assembly language programs for this family of microprocessors. Finally, we look at the organization of the ARM processor.

4.2 Processor organisation 

Stallings, study section 14.1. Once you understand the instruction cycle, the organisation of the CPU will become clear.

Stallings, revise section 1.2.

4.3 Register organisation & the instruction cycle  revisited 

Stallings, study section 14.2. The purpose of a variety of sets of CPU registers is explained. We study the register organisation of the Intel x86 family. We give more detail regarding the register set of the Intel x86 relevant to our practical work in section 4.5 of this study unit.

Stallings, study section 14.3. Indirect addressing on the Intel x86 is discussed in study unit 7 of this study guide.

4.4 Instruction pipelining 

Stallings, read section 14.4.

4.5 The x86 processor 

Stallings, study section 14.5. Stallings does not give much detail regarding the assembly language of the Intel x86 family. As you will be doing your practical work in x86 Assembly Language, we provide you with additional notes regarding the architecture and assembly language of the Intel x86 family. It is important to realise that this particular assembly language is but one of many different assembly languages. Each family of machines has its own assembly language.

To avoid having to buy an expensive second text book for the practical work, we include a description of the most important assembly language instructions in this section as well as in the next two study units. It might take a while before you will be able to see how the different concepts fit together, but working through some examples is always the easiest way to learn a new programming language.

Look at some examples in Appendix C.

4.5.1 The register set of the x86  family 

Registers are special storage components inside the CPU (Central Processing Unit). The registers common to all the Intel x86 processors are 16 bits in size and can be classified as follows:

General-purpose registers:  AX, BX, CX, DX
Segment registers:          CS, DS, SS, ES
Index registers:            SI, DI
Pointers:                   BP, IP, SP
Flags register:             Nine of the bits (flags) in this register are of some importance to us.

Most of the processors in the Intel family have extended general-purpose registers, extended index registers and extended pointer registers that are 32 bits in size. These are written as above with a prefix E. The extended versions of the general-purpose registers are called EAX, EBX, ECX and EDX. Current Intel processors have 64-bit registers.

For your practical work, we are concerned only with the 16-bit registers of the Integer Unit common to all the Intel chips. In this section, we therefore restrict our attention to these.

4.5.1.1 General-purpose registers

The general-purpose registers are used for data movement and for arithmetic. Each of these registers can be addressed as either one 16-bit (2 bytes) register or as two 8-bit (1 byte) registers. The leftmost byte is the high portion and the rightmost byte is the low portion. For example, AH and AL are the high and low portions of the AX register:

                    16-bit AX register
Bit:   15 14 13 12 11 10  9  8 |  7  6  5  4  3  2  1  0
          8-bit AH register    |    8-bit AL register
Bit:    7  6  5  4  3  2  1  0 |  7  6  5  4  3  2  1  0

Note that the bits are numbered from right to left on the Intel machines.
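The relationship between AX and its two halves can be mimicked with shifts and masks. The following Python sketch is purely illustrative; the function names `split_ax` and `join_ax` are hypothetical and do not correspond to any x86 instruction. It shows how bits 15 to 8 form AH and bits 7 to 0 form AL.

```python
def split_ax(ax):
    """Split a 16-bit value into its high byte (AH) and low byte (AL)."""
    ax &= 0xFFFF            # keep only 16 bits, like the AX register
    ah = (ax >> 8) & 0xFF   # bits 15..8
    al = ax & 0xFF          # bits 7..0
    return ah, al

def join_ax(ah, al):
    """Combine AH and AL back into the 16-bit AX value."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)

ah, al = split_ax(0x12AB)
print(f"AX = 12ABh  ->  AH = {ah:02X}h, AL = {al:02X}h")
print(f"AH = 12h, AL = ABh  ->  AX = {join_ax(0x12, 0xAB):04X}h")
```

Changing AL in this model leaves AH untouched and vice versa, which is exactly why the two halves can be used as independent 8-bit registers.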

AX register: The AX register is known as the primary accumulator. Some text books refer to the AX register as the accumulator. The AX register is mainly used for operations involving data movement, input/output and arithmetic. Some instructions such as MUL and DIV assume that AX contains the multiplicand and dividend respectively.

BX register: The BX register is the only general-purpose register that can be used as a pointer to extend addressing. It is therefore referred to as the base register. It is also used for arithmetic. We will have another look at extended addressing when we discuss addressing modes.

CX register: The CX register is known as the count register since it is used, for example, to control the number of times a loop is to be executed, or the number of shifts that should be performed. It can also be used for arithmetic.

DX register: Some input/output operations such as IN and OUT use the DX register. Multiply and divide operations involving 16-bit registers make use of the combination of DX and AX. The DX register is thus known as the data register.

4.5.1.2 Segments and segment registers 

Memory consists of a collection of segments each of which is 64K bytes long. A segment is an area in memory that begins on a paragraph boundary, in other words on an address that is a multiple of 16 decimal (10h). This means that one segment could start at address 00000h, another at address 00010h, another at address 00020h, et cetera. This also means that segments can overlap. Note that the starting address of a segment always ends with a 0 when written as a hexadecimal number. For this reason, the designers decided that it would be unnecessary to store the last digit, which is always 0, in the segment register, so the address of a segment is always stored without the rightmost 0. For example, the address of the segment starting at 18A30h will be stored as 18A3h. Some text books show the rightmost 0 in square brackets, for example 18A3[0]h.

A segment may be located anywhere in memory (as long as it starts on a paragraph boundary) and may be as small as 1 paragraph (10h bytes) and as large as 64K bytes in length. However, it requires only as much space as is required for execution by the program that uses it.

A program written in x86 assembly language is divided into three primary segments, namely:
- The code segment
- The data segment
- The stack segment.

The code segment (pointed to by the CS register) contains the machine instructions of the program. All references to memory locations that contain instructions of a program are relative to the start of a segment specified by the contents of the CS register.

A byte of memory is referenced by the segment that contains it, followed by an offset from the beginning of that segment. The offset is indicated by the IP register (instruction pointer). The notation used is segment:offset. The offset is between 0000h and FFFFh (FFFFh = 64K-1). The IP register always contains the offset (relative to the start of the code segment) of the next instruction to be executed, so CS:IP forms the actual 20-bit address of the next instruction to be executed. The actual 20-bit address is also referred to as the effective address.

Suppose CS contains BCEFh and IP contains 123h. The actual 20-bit address referred to by BCEF:0123 is calculated as follows:

Segment address (CS):     BCEF0   (remember the rightmost 0 in the address)
Offset (IP):               0123
Actual address (CS:IP):   BD013   (add the two addresses; all addresses are in hexadecimal)

Activity 4.1

Calculate the actual 20-bit address referred to in the following examples:

(i)   CS = 020A; IP = 1BCD
(ii)  74D6:0100
(iii) 3DA3:311

  Solution 

Addresses are in hexadecimal. Remember to add the rightmost 0 to the segment address.

(i)   Segment address (CS):     020A0
      Offset (IP):               1BCD
      Actual address (CS:IP):   03C6D

(ii)  Segment address (CS):     74D60
      Offset (IP):               0100
      Actual address (CS:IP):   74E60

(iii) Segment address (CS):     3DA30
      Offset (IP):                311
      Actual address (CS:IP):   3DD41
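The segment:offset calculation can also be written as a small function. The Python sketch below is an illustration only; the name `physical_address` is an assumption made for this example, and the final mask to 20 bits is added on the assumption that the result must fit in a 20-bit address. Appending the rightmost 0 to the segment value is the same as shifting it left by 4 bits (multiplying by 16).

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    # Shifting left by 4 bits appends the rightmost hexadecimal 0.
    return ((segment << 4) + offset) & 0xFFFFF

examples = [
    (0xBCEF, 0x0123),  # worked example in the text
    (0x020A, 0x1BCD),  # Activity 4.1 (i)
    (0x74D6, 0x0100),  # Activity 4.1 (ii)
    (0x3DA3, 0x0311),  # Activity 4.1 (iii)
]
for seg, off in examples:
    print(f"{seg:04X}:{off:04X} -> {physical_address(seg, off):05X}h")
```

The same function applies to data addresses: with DS = 0600h and an offset of 0120h it gives 06120h.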

We do not always refer to addresses in the notation given above. When a memory location in a specific segment is addressed from within that segment, we use only the offset part. Example: If a program is stored in some code segment, only the offset is used to refer to other memory locations within this segment. Thus, JMP 123h will cause a jump to offset 123h within the memory segment where the program is loaded.

The IP register cannot be referenced directly in a program by a programmer. Its value can be changed when using DEBUG during the debugging and testing phase of a program. (Refer to Appendix A.3.2.) However, the value of IP can be changed indirectly by the programmer by using JMP (jump) and conditional branch instructions.

You need not worry about the CS register. During normal execution of a program, instructions are automatically fetched from memory. You do not need to concern yourself with the values of CS and IP or any of the values of the segment registers.

The data segment (pointed to by the DS register) contains the variables, constants and work areas of a program. The actual address where data is fetched from memory when an instruction such as MOV AL,[120h] is executed, is not the physical memory location 120h but the memory location 120h relative to the contents of the DS register. Suppose the DS register contains 0600h. The actual 20-bit location from which the byte of data will be moved can be calculated as follows:

Data segment address:   06000   (remember the rightmost 0)
Address given:           0120
Actual address:         06120   (addresses in hexadecimal)

You need not worry about the DS register either. The operating system sets the DS register to a suitable value when a program is loaded. In actual fact, the operating system decides where in memory both the program and the data should be located. One can set the DS register to insist on using specific locations for data, but we will not use this functionality in COS2621.

The stack segment (pointed to by the SS register) contains the program stack. The stack is an area in memory that is used to save data and addresses that need to be temporarily stored while the program is executing. We demonstrate the use of the SS register in study unit 6 of this study guide.

The extra segment register (ES) is used during the manipulation of sequences of characters by some string operations. The ES register is associated with the DI register which is discussed in a later section. We will not work with the ES register in COS2621.

4.5.1.3 Index registers

SI and DI: Index registers contain the offset of memory positions relative to the start of the segment. SI (Source Index) and DI (Destination Index) are used to indicate source and destination addresses when strings of characters are written to and read from memory. SI usually contains an offset value from the DS register, but it can address any memory position. The source string is pointed to by the SI register. The DI register is the destination for string movement instructions. It usually contains an offset from the ES register but can address any memory position.

The use of the SI and DI registers will be demonstrated in examples in study unit 6 of this study guide.

4.5.1.4 Pointer and stack registers 

The stack is a special area in memory that is used for the temporary storage of addresses and data. The stack is located in the stack segment. You may think of the stack as a number of boxes stacked on top of each other. The last one to be added on top (pushed onto the stack) is also the first one to be removed (popped) from the stack.

The Stack Pointer (SP) register holds the address (in the stack) of the last element to be added to (pushed onto) the stack. In other words, the SP register contains the offset from the beginning of the stack segment to the top of the stack, so SS:SP contains the address of the top of the stack.

The Base Pointer (BP) register contains an offset from the SS register. Normally, the only word in the stack that is accessed is the one on top of the stack. However, the BP register can also keep an offset in the stack segment and can be used in procedure calls, especially when parameters are passed to subroutines. SS:BP contains the address of the current word being processed in the stack. More information about the stack and the BP and SP registers will be given in study unit 6 of this study guide.
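The last-in, first-out behaviour of the stack and the role of SS:SP can be illustrated with a toy model. The Python sketch below is not taken from the prescribed book: the class name `Stack16` and the chosen SS and SP values are hypothetical, and it assumes the usual x86 behaviour in which a push first decreases SP by 2 and then stores a 16-bit word (low byte at the lower address), while a pop reads the word and then increases SP by 2.

```python
class Stack16:
    """Toy model of a stack of 16-bit words addressed by SS:SP."""

    def __init__(self, ss, sp):
        self.ss = ss & 0xFFFF
        self.sp = sp & 0xFFFF
        self.memory = {}  # maps a 20-bit address to one byte

    def _address(self, offset):
        # Segment with the rightmost 0 appended, plus the offset.
        return ((self.ss << 4) + (offset & 0xFFFF)) & 0xFFFFF

    def push(self, word):
        # Assumed x86 behaviour: decrease SP by 2, then store the word.
        self.sp = (self.sp - 2) & 0xFFFF
        self.memory[self._address(self.sp)] = word & 0xFF             # low byte
        self.memory[self._address(self.sp + 1)] = (word >> 8) & 0xFF  # high byte

    def pop(self):
        # Read the word on top of the stack, then increase SP by 2.
        low = self.memory[self._address(self.sp)]
        high = self.memory[self._address(self.sp + 1)]
        self.sp = (self.sp + 2) & 0xFFFF
        return (high << 8) | low

stack = Stack16(ss=0x2000, sp=0x0100)  # hypothetical starting values
stack.push(0x1234)
stack.push(0xABCD)
print(f"SP after two pushes: {stack.sp:04X}h")
print(f"First pop:  {stack.pop():04X}h")   # the last word pushed comes off first
print(f"Second pop: {stack.pop():04X}h")
```

In this model SP always holds the offset of the most recently pushed word, which matches the description of SS:SP as the address of the top of the stack.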

4.5.1.5 The flags register 

Stallings: figure 14.22. The EFLAGS register is used as a flags register. Nine of the 32 bits of this register are common to all x86 processors. The other bits are not important for COS2621 and we will not consider them in this section.

Each time an arithmetic instruction is executed, certain flags will be set either to 1 or 0 to indicate the outcome of the operation. Not all instructions affect the flags and not all flags are affected by instructions that do have an effect on flags. The way in which the individual flags are affected by an instruction is shown in Appendix E where the instructions comprising the x86 instruction set are listed.

The following bits in the flags register represent the individual flags we are interested in:

Bit 0   CF: Carry Flag
   Contents or description:
      1) The carry from the high-order (leftmost) bit following an arithmetic operation
      2) The result of a CoMPare instruction
      3) The bit which has been shifted or rotated out of a register or memory location
   Set to 1:
      1) If an addition produced a carry or if a subtraction produced a borrow
      2) When the first operand compared is smaller than the second (unsigned comparison), so that the implied subtraction needs a borrow
      3) If the bit which has been shifted or rotated out of a register or memory location is equal to 1

Bit 2   PF: Parity Flag
   Contents or description: Mainly used when transmitting data
   Set to 1: If the result of an operation has an even number of 1 bits

Bit 4   AF: Auxiliary Flag
   Contents or description: Similar to CF, except that it indicates the presence or absence of a carry or overflow based on a 4-bit numeric representation in bits 0, 1, 2, 3. It is useful for operations on ‘packed decimal’ (BCD) numbers
   Set to 1: See CF

Bit 6   ZF: Zero Flag
   Contents or description: Indicates whether or not the result of an arithmetic or compare operation is equal to zero
   Set to 1: For a zero result!

Bit 7   SF: Sign Flag
   Contents or description: Contains the resulting sign after an arithmetic operation
   Set to 1: If the result is negative

Bit 8   TF: Trap Flag
   Contents or description: By setting TF to 1, the Intel processors can be forced to operate in single-step mode

Bit 9   IF: Interrupt Flag
   Contents or description: Allows interruption of normal operations from peripherals and other sources

Bit 10  DF: Direction Flag
   Contents or description: If DF is set to 0, the special string manipulation instructions increment the appropriate index registers and proceed forward through memory. If DF is set to 1, the index registers are decremented and progress is backward through memory

Bit 11  OF: Overflow Flag
   Contents or description:
      1) Overflow occurs if an arithmetic operation on signed numbers gives a result that is out of range
      2) Used during shift operations
      3) Used during division
   Set to 1:
      1) If the carry into and the carry out of the highest-order bit differ
      2) If the sign bit of an operand changes during a shift operation, otherwise set to 0
      3) If a division operation produces a quotient that is too big for the register which has to contain the result

The flags that concern us most are SF, ZF, CF and OF. Of these, CF and OF are the more complicated ones.

Signed numbers (twos complement): For signed data, the range of values $x$ that can be stored in $n$ bits is $-2^{n-1} \le x \le 2^{n-1} - 1$.

Thus, the ranges of signed numbers are as follows:

in an 8-bit register: $(-128)_{10} \le x \le (127)_{10}$, and
in a 16-bit register: $(-32\,768)_{10} \le x \le (32\,767)_{10}$.

The following table summarises the conditions under which overflow occurs:

Carry into sign bit | Carry out of sign bit | Overflow
No | No | No
No | Yes | Yes
Yes | No | Yes
Yes | Yes | No
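The rule in the table can be checked with a small model. The following Python sketch is an illustration only, not part of the prescribed material: the function names `add8` and `signed8` are our own, and the model simply assumes an 8-bit register. It adds two 8-bit patterns and derives CF from the carry out of the sign bit and OF from the comparison of the carry into and the carry out of the sign bit.

```python
def add8(a, b):
    """Add two 8-bit patterns and return (result, CF, OF)."""
    total = (a & 0xFF) + (b & 0xFF)
    result = total & 0xFF                        # what stays in the 8-bit register
    carry_out = total >> 8                       # carry out of the sign bit (CF)
    carry_in = ((a & 0x7F) + (b & 0x7F)) >> 7    # carry into the sign bit
    overflow = carry_in ^ carry_out              # OF = 1 when the two carries differ
    return result, carry_out, overflow


def signed8(x):
    """Interpret an 8-bit pattern as a twos complement number."""
    return x - 256 if x & 0x80 else x


result, cf, of = add8(0b01110111, 0b00001101)    # 119 + 13
print(f"{result:08b}", "CF =", cf, "OF =", of, "signed value =", signed8(result))
```

For 119 + 13 the unsigned result (132) is valid, so CF = 0, but the signed interpretation (-124) is out of range, so OF = 1.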

Confusing as it may seem, the contents of a data field mean whatever you intend it to mean. As long as the process writing the data and the one reading it interpret it the same way, there is no problem. Apart from signed and unsigned integers we may also store floating-point numbers or character data in ASCII, EBCDIC, UNICODE or any other character code. Let us look at a few examples to illustrate how the OF and CF are used to indicate overflow for signed as well as unsigned numbers.

Note: The results in columns 2 and 3 are the equivalent of the binary numbers in column 1 interpreted as unsigned binary numbers and signed numbers in twos complement representation, respectively. Look at the second addition for example:

(1000 0100)2 is the twos complement representation of (-124)10 in column 3 which differs from (119 + 13 = 132)10 in column 2.

Examples:

Signed and unsigned numbers in 8-bit registers

Example 1:
      Binary          Unsigned     Signed
      1111 1001          249          -7
    + 0000 0011        +   3        +  3
      1111 1100          252          -4
                       Valid        Valid
    Carry flag: CF = 0
    Overflow flag: Carry into sign bit (0) is equal to the carry out of sign bit (0). Thus OF = 0

Example 2:
      Binary          Unsigned     Signed
      0111 0111          119         119
    + 0000 1101        +  13        + 13
      1000 0100          132        -124
                       Valid       Invalid
    Carry flag: CF = 0
    Overflow flag: Carry into sign bit (1) is not equal to the carry out of sign bit (0). Thus OF = 1

Example 3:
      Binary          Unsigned     Signed
      1111 1100          252          -4
    + 0000 0101        +   5        +  5
  (1) 0000 0001            1           1
                      Invalid       Valid
    Carry flag: CF = 1
    Overflow flag: Carry into sign bit (1) is equal to the carry out of sign bit (1). Thus OF = 0

Example 4:
      Binary          Unsigned     Signed
      1111 0110          246         -10
    + 1000 1001        + 137       - 119
  (1) 0111 1111          127        +127
                      Invalid      Invalid
    Carry flag: CF = 1
    Overflow flag: Carry into sign bit (0) is not equal to the carry out of sign bit (1). Thus OF = 1

Signed numbers in 16-bit registers

Binary number in twos complement representation | Decimal number (equivalent of column 1) | Comments

Example 1:
      0100 1000 0011 1111         18 495
    + 0110 0100 0101 1010       + 25 690
      1010 1100 1001 1001       - 21 351
    Overflow occurs since there is a carry into the sign bit but no carry out. Thus: OF = 1. Invalid.

Example 2:
      1110 1001 1111 1111        - 5 633
    + 1000 1100 1111 0000       - 29 456
  (1) 0111 0110 1110 1111         30 447
    Overflow occurs since there is a carry out of the sign bit but no carry into the sign bit. Thus: OF = 1. Invalid.

Example 3:
      1111 1111 1110 0111           -25
    + 1111 1111 1111 0110           -10
  (1) 1111 1111 1101 1101           -35
    No overflow. The carry into the sign bit (1) is equal to the carry out (1) of the sign bit. Thus: OF = 0. Valid.
    This is the twos complement representation of -35. Remember that we discard the carry out of the sign bit.

4.5.2 Reverse Byte Order in Memory

The x86 machines store bytes in reverse byte order in memory. Consider the number 0015h. Two bytes are required to store this number. The 16-bit number 0015h consists of a high-order (most significant) byte 00h, and a low-order (least significant) byte 15h. Suppose we want to store 0015h in memory from location 101h onwards. In memory the low-order byte, 15h, will be stored in the low-order position 101h, and the high-order byte, 00h, will be stored in the high-order position 102h. Thus 0015h is stored in memory as follows:

Memory position | 101h | 102h
Hexadecimal number | 15h | 00h

The CPU expects numeric data in memory (not registers) to be stored in reverse byte order and processes it accordingly. You must be aware of this peculiarity otherwise you may get confused when you examine the contents of memory locations. The CPU reverses the bytes again if data is loaded into registers from memory locations.

Activity 4.2

Execute the following two instructions and inspect the contents of memory locations 200h and 201h:

    mov ax,1234h
    mov [200h],ax

Contents of memory location 200h: 34h

Contents of memory location 201h: 12h

We will learn more about byte-ordering conventions in the next study unit.
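The reverse byte order can also be observed without DEBUG. The following Python sketch is only an illustration that uses the standard `struct` module; the format code `<H` asks for a 16-bit unsigned value in the same low-byte-first order that the x86 uses, and the variable names are our own.

```python
import struct

word = 0x1234

# "<H" = little-endian 16-bit value: the low-order byte is stored first
stored = struct.pack("<H", word)
print([f"{byte:02X}h" for byte in stored])      # bytes in increasing address order

# Loading the word back reverses the bytes again
restored = struct.unpack("<H", stored)[0]
print(f"{restored:04X}h")
```

The first byte printed corresponds to the lower memory address (200h in Activity 4.2) and the second byte to the next address (201h).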

4.5.3 Interrupts 

An interrupt causes the interruption of normal program execution. In other words, the flow of control is interrupted. Interrupts can be caused by hardware or software. Interrupts are handled either by DOS or by BIOS. The DOS routines are part of the particular operating system that is loaded, whereas most of the BIOS routines are hardwired onto the motherboard.

Software interrupts: A software interrupt is generated by an INT instruction and is a call to a specific interrupt routine (the address of which is found in the interrupt vector table) to request a service from the operating system. Software interrupts are also referred to as system calls.

Hardware interrupts: A hardware interrupt, sometimes called an external interrupt, is generated by a special chip, called the Interrupt Controller. This chip requests the CPU to suspend the current program temporarily and to process the interrupt. Example: pressing a key on the keyboard causes the CPU to suspend the current program temporarily and to execute the BIOS routine that reads the character from the keyboard input port and stores it in a memory buffer. The steps taken when a hardware interrupt occurs are similar to those when a software interrupt occurs. The hardware interrupt is handled by the interrupt handler that associates a specific vector in the interrupt vector table with that particular hardware interrupt. Control is passed to the relevant interrupt service routine in the same way as is done when a software interrupt occurs.

4.5.3.1 The interrupt vector table

Stallings, table 14.3.

A part of memory is reserved for the interrupt vector table. Each of the interrupts available on the Intel chips has a 4-byte (2-word) entry in the interrupt vector table to accommodate an offset and a code segment address. For interrupt type 0, the instruction offset is stored in the word at address 0, and the code segment address is stored in the word at address 2. For interrupt type 1, the offset is stored in the word at address 4, and the code segment address is stored in the word at address 6. In general, for interrupt type n, the instruction offset is stored in the word at address 4 * n, and the code segment address is stored in the word at address (4 * n) + 2.

Interrupt number | Address | Contents of address
n | (4 * n) + 2 | Code segment address
  | 4 * n | Offset address
. | . | .
. | . | .
. | . | .
1 | 6 | Code segment address
  | 4 | Offset address
0 | 2 | Code segment address
  | 0 | Offset address

Each code segment and offset points to its own interrupt handler (also called interrupt service routine). This is a block of code similar to a procedure, which must be executed if that particular interrupt occurs.
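The address calculation for an entry in the interrupt vector table can be sketched as follows. This is a minimal Python illustration of the 4 * n rule described above; the function name `vector_addresses` is hypothetical and is not a DOS or BIOS routine.

```python
def vector_addresses(n):
    """Return (offset_address, segment_address) of the entry for interrupt type n."""
    if not 0 <= n <= 0xFF:
        raise ValueError("interrupt type must be in the range 0 to FFh")
    offset_address = 4 * n            # word that holds the instruction offset
    segment_address = 4 * n + 2       # word that holds the code segment address
    return offset_address, segment_address


for interrupt_type in (0, 1, 0x21):
    offset_address, segment_address = vector_addresses(interrupt_type)
    print(f"INT {interrupt_type:02X}h: offset word at {offset_address:04X}h, "
          f"segment word at {segment_address:04X}h")
```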

4.5.3.2 The INT instruction - software interrupts

The above description of how interrupts function might seem complicated but fortunately, the operating system does most of the work. The programmer needs only to place one or more values in registers and to invoke the relevant interrupt instruction. The instruction used for interrupts by the Intel family is the INT instruction. Since (256)10 (i.e. 0 to FFh) different types of interrupt are allowed, we must specify which one is required.

The format of the INT instruction is as follows: int number (number is an integer from 0 to FFh).

The interrupt instructions which we will use frequently are INT 20h and INT 21h. INT 20h causes the program to terminate and control to be transferred back to DOS. INT 21h allows operations such as reading a character from the keyboard, printing a character on a printer, and displaying a character on the screen. We call the specific action to be performed a function or service. The AH register is used to specify which function is required.

Example (refer to INT 21h in Appendix D):

    mov ah,2        ; Select function 2, display character
    mov dl,48h      ; 48h is ASCII for 'H'
    int 21h         ; Display the character in DL on the screen
    int 20h         ; Return to DOS (terminate program)

More examples of the use of the INT instruction are included in Appendix C. The common software interrupts, i.e. INT 10h, INT 17h and INT 21h, are listed in Appendix D.

4.5.4 The Assembly Process, Nasm and DEBUG 

There are a number of routes we can follow when creating an assembly language program on the Intel x86. Two of the possibilities are shown in Figure 4.1. We can use NASM to assemble the source code. We can then use DEBUG, which is available under DOS, to trace through the program. DEBUG can also be used to assemble small programs but it has limitations.

[Figure 4.1: The assembly process. Boxes in the diagram: ALGORITHM; Source Program; EDITOR; *.ASM file; NASM; *.COM file (executable program); DEBUG input; DEBUG; Test by using DEBUG; Test by running the program.]

4.6 The ARM processor

Stallings, read section 14.6. It should become clear why we need to study the register organisation of the machine we are working with at a low level if we compare the register organisation of the ARM processor to that of the Intel x86.

4.7 Summary 

Stallings, section 14.8. Make sure that you understand the processor organisation of the Intel x86 and that you know the meaning of all the key words listed in this section.

4.8 Key Questions 

Work through the review questions at the end of the chapter.

----oooOooo----


Study unit 5 Computer arithmetic

This study unit supplements chapter 10 of Stallings. The way in which integer and floating-point numbers are represented in a computer is explained. We look at instructions that can be used on the Intel x86 to perform arithmetic and logical, as well as bit-manipulation, operations.

Learning outcomes

Once you have mastered the study material in this study unit, you will be able to

● explain how integers can be represented in a computer
● use assembly language to perform arithmetic and logical operations on integers on the Intel x86
● use assembly language to perform bit manipulation operations on the Intel x86
● explain how floating-point numbers are stored in a computer
● show how a floating-point number is stored in a computer using a given floating-point format
● calculate the value of a floating-point number that is stored in a computer given the relevant floating-point format

Stallings

Study sections 10.1, 10.2, parts of section 10.3, and section 10.4. Read parts of section 10.3.

5.1 Introduction 

Stallings: study section 10.1 The way in which integers and floating-point numbers are stored in a computer is discussed. We look at some instructions that can be used on the Intel x86 to do arithmetic and logical operations.

5.2 Integer representation 

Stallings: study section 10.2. There are a number of different ways in which integers can be represented on a computer. Stallings only discusses sign-magnitude and twos complement representation. Remember that integers can also be referred to as fixed-point numbers.

5.3 Integer arithmetic

Stallings: study the sections on negation, addition and subtraction in section 10.3. Read the rest of section 10.3.

This section of the study guide contains examples that illustrate the use of arithmetic and bit-manipulation instructions. You can use either DEBUG or NASM to assemble these instructions to see what happens.

Remember that DEBUG treats all numbers as hexadecimal whereas NASM treats numbers as decimal unless otherwise indicated.

5.3.1 Integer multiply and divide (IMUL and IDIV)

Activity 5.1

Consider the following arithmetic operations:

(a) a*b (both a and b are 8 bits long.)
(b) a/b (both a and b are 8 bits long and b ≠ 0.)
(c) a*b (both a and b are 16 bits long.)
(d) a/b (both a and b are 16 bits long and b ≠ 0.)

Assume that a = 16h = $(22)_{10}$ and b = 07h = $(7)_{10}$.

Very important: Note that the first operand, i.e. a, must first be loaded into AL/AX in order to execute an IMUL/IDIV on 8/16 bits.

(a) a*b (both a and b are 8 bits long) use IMUL (integer multiplication)

    mov al,16h      ; a to AL for 8-bit multiplication
    mov bl,07h      ; b to BL
    imul bl         ; 16-bit product (9Ah = 154 decimal) in AX

(b) a/b (both a and b are 8 bits long) use IDIV (integer division)

    mov ah,0        ; Clear AH
    mov al,16h      ; Dividend in AX (AH:AL)
    mov dl,07h      ; Divisor in DL
    idiv dl         ; Integer division, 8-bit quotient (3 decimal) in AL,
                    ; 8-bit remainder (=1) in AH

(c) a*b (both a and b are 16 bits long) use IMUL

    mov ax,16h      ; a to AX for 16-bit multiplication
    mov bx,07h      ; b to BX
    imul bx         ; 32-bit product in DX:AX

(d) a/b (both a and b are 16 bits long) use IDIV

    mov dx,0        ; Clear DX
    mov ax,16h      ; Dividend in DX:AX
    mov bx,07h      ; Divisor in BX
    idiv bx         ; 16-bit quotient in AX,
                    ; 16-bit remainder in DX
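The 8-bit cases (a) and (b) can be checked with a small Python model. This is a sketch under our own assumptions: the helper names `to_signed`, `imul8` and `idiv8` are hypothetical, and the model only reproduces the register contents, not the flags.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def imul8(al, operand):
    """Model of IMUL with an 8-bit operand: AX = AL * operand (signed)."""
    product = to_signed(al, 8) * to_signed(operand, 8)
    return product & 0xFFFF                       # 16-bit pattern left in AX


def idiv8(ax, operand):
    """Model of IDIV with an 8-bit operand: returns (AL, AH) = (quotient, remainder)."""
    dividend = to_signed(ax, 16)
    divisor = to_signed(operand, 8)
    quotient = abs(dividend) // abs(divisor)      # the quotient is truncated toward zero
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor     # remainder takes the sign of the dividend
    if not -128 <= quotient <= 127:
        raise OverflowError("quotient does not fit into AL")
    return quotient & 0xFF, remainder & 0xFF


print(f"AX = {imul8(0x16, 0x07):04X}h")           # case (a)
al, ah = idiv8(0x0016, 0x07)                      # case (b)
print(f"AL = {al:02X}h, AH = {ah:02X}h")
```

A divisor of 0 raises Python's own ZeroDivisionError in this model, which loosely corresponds to the divide error that the processor would signal.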

Things to remember:

● DEBUG assumes all values to be in hexadecimal.
● NASM assumes all values to be in decimal unless otherwise indicated.
● With IMUL, MUL, IDIV and DIV, the first operand is assumed to be in AL or AX. Be careful: IMUL AL,BL will not be flagged as a syntax error but will give incorrect results because BL will be ignored: the contents of AL will be multiplied by the contents of the first operand as specified in the instruction, namely AL, i.e. IMUL AL,BL is interpreted as IMUL AL.
● DX:AX is used for 16-bit multiplication operations giving a 32-bit result stored in the combination of the DX and AX registers.
● With 16-bit division operations the dividend is assumed to be in the 32-bit register pair DX:AX. The quotient is stored in AX and the remainder is stored in DX.

5.3.2 Data movement (MOV)

We will write a simple assembly language program to illustrate the movement of data between registers and memory.

Activity 5.2

Evaluate a = a*b + c*d*e 

Use memory location temp as a temporary storage area. Assume that a, b, c, d and e are 8-bit integers and that they are stored in memory positions a, b, c, d and e respectively. Note the following:

(i) The x86 assembly language does not allow direct data movement from one memory position to another. Such data movement must occur via one of the registers.

(ii) The first operand is the destination operand, e.g. MOV AX,BX will result in the contents of BX being stored in the AX register.

(iii) DEBUG and NASM do not allow the specification of a memory position in a multiplication operation. Thus imul [a] is illegal in both DEBUG and NASM. However, it is legal in some commercially available assemblers such as MASM.

AX = a*b:

    mov al,[a]      ; Move the contents of 'a' to AL
    mov bl,[b]      ; Move the contents of 'b' to BL
    imul bl         ; Multiply the contents of AL with the contents of BL
                    ; Product in AX
                    ; Note that AL is the first operand by default
                    ; (for 8-bit multiplication) and must not be specified

AX = c*d:

    mov [temp],ax   ; Store the contents of AX in 'temp'
    mov al,[c]      ; Move the contents of 'c' to AL
    mov bl,[d]      ; Move the contents of 'd' to BL
    imul bl         ; Multiply the contents of AL with the contents of BL
                    ; Product in AX

AX = c*d*e:

    mov bl,[e]      ; Move the contents of 'e' to BL
    imul bl         ; Multiply the contents of AL with the contents of BL
                    ; Product in AX

AX = a*b + c*d*e:

    add al,[temp]   ; Add contents of 'temp' to AL
    mov [a],al      ; Store the result in 'a'

Our program might not give the correct result. Why not? Think about this for a moment.

We assumed that the result of c * d and of c * d * e would fit into AL, in other words into 8 bits.
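The effect of this assumption can be made visible with a small Python model of the program. This is only a sketch: the function name `activity_5_2` is our own, and for simplicity we assume that a, b, c, d and e are small non-negative values. Each line mirrors one step of the assembly code, keeping only the bits that the corresponding register can hold.

```python
def activity_5_2(a, b, c, d, e):
    """Model of the Activity 5.2 program; returns the byte stored in 'a'."""
    ax = (a * b) & 0xFFFF             # imul bl      : AX = a*b
    temp = ax                         # mov [temp],ax
    ax = (c * d) & 0xFFFF             # imul bl      : AX = c*d
    al = ax & 0xFF                    # only AL takes part in the next multiplication
    ax = (al * e) & 0xFFFF            # imul bl      : AX = AL*e
    al = ((ax & 0xFF) + (temp & 0xFF)) & 0xFF   # add al,[temp] : byte addition
    return al                         # mov [a],al


print(activity_5_2(2, 3, 4, 5, 1), "expected", 2 * 3 + 4 * 5 * 1)
print(activity_5_2(2, 3, 20, 10, 2), "expected", 2 * 3 + 20 * 10 * 2)
```

In the first call every intermediate result fits into 8 bits and the answer is correct. In the second call c*d*e is 400, which does not fit into AL, so the stored byte differs from the true value.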

Things to remember:

● MOV AL,[a] moves one byte from memory to register AL because AL is an 8-bit register, therefore it is a byte operation.
● MOV AX,[b] moves two bytes from memory to register AX because AX is a 16-bit register. It is thus a word operation.
● MOV [a],[b] is not allowed. You have to move via a register.

Activity 5.3

MOV [a],'N' is illegal in NASM and DEBUG because the size of the operands cannot be determined, i.e. whether it should move 8 bits or 16 bits. We have to store the ASCII character 'N' in either an 8-bit or a 16-bit register and move the contents of this register to a.

    mov cl,'N'
    mov [a],cl

5.3.3 Bit manipulation

5.3.3.1 Boolean operations

The instructions available in x86 Assembly Language for Boolean operations that can be used for bit manipulation are AND, OR, NOT and XOR.

Activity 5.4

For Intel x86 machines: Remember that bits are numbered from right to left, starting from 0.

Suppose we want to determine whether or not bit 5 of the AL register is set to 1. We set up a so-called mask in another register (BL in the code below) with all the bits equal to 0 except bit 5, which is equal to 1 (BL = 00100000).

    mov bl,20h      ; 20h = 00100000, setting bit 5 to 1.
    and bl,al

When we execute AND BL,AL, the destination register BL will contain all 0s if bit 5 of AL was equal to 0; otherwise bit 5 of BL will be equal to 1.
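The same test can be written in a high-level language. The following Python sketch is an illustration only; the function name `bit_is_set` is our own. It builds the mask by shifting a 1 into the required bit position and then applies AND, exactly as the two instructions above do for bit 5.

```python
def bit_is_set(al, bit):
    """Return True if the given bit (numbered from 0 on the right) of al is 1."""
    mask = 1 << bit          # for bit 5 this is 00100000 = 20h
    return (al & mask) != 0  # the AND result is all 0s when the bit is 0


print(bit_is_set(0b00100000, 5))
print(bit_is_set(0b11011111, 5))
```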

Direct bit access

Direct bit access instructions such as BSF, BSR, BT, BTC, BTR and BTS are available but we will not use them in COS2621.

Shift and rotate instructions

In this section we describe the shift and rotate instructions available on the Intel x86.

The SHL (Shift Logical Left) instruction

The sign bit is treated as a data bit.

Instruction | Contents of AL after execution of instruction | CF | Explanation
(initial value) | 10111011 | |
SHL AL,1 | 01110110 | 1 | Shifts each bit one position to the left, filling with a 0 on the right (low-order bit). High-order bit moves to Carry Flag (CF).
*SHL AL,2 | 11011000 | 1 | Shifts each bit two positions to the left, filling with 0s on the right. High-order bits move through CF, so the last one to move out will be in CF.
MOV CL,3 | | | CL = 3
SHL AL,CL | 11000000 | 0 | Shifts AL to the left 3 times.

* Not allowed in DEBUG. The constant may only be equal to 1.

The SHR (Shift Logical Right) instruction

Instruction | Contents of AL after execution of instruction | CF | Explanation
(initial value) | 10111011 | |
SHR AL,1 | 01011101 | 1 | Shifts each bit one position to the right, filling with 0 on the left (high-order bit). Low-order bit moves to Carry Flag (CF).
*SHR AL,2 | 00010111 | 0 | Shifts each bit two positions to the right, filling with 0s on the left. Second bit out moves to CF.
MOV CL,3 | | | CL = 3
SHR AL,CL | 00000010 | 1 | Shifts AL to the right 3 times.

* Not allowed in DEBUG. The constant may only be equal to 1.

The SAL (Shift Algebraic Left) and SAR (Shift Algebraic Right) instructions

The SAL instruction is identical to the SHL instruction.

SAR is not identical to SHR. The sign bit is not regarded as a data bit. Bits are shifted to the right a specified number of times and the sign bit retains its original value. Low-order bit moves to Carry Flag (CF).

Instruction | Contents of AL after execution of instruction | CF | Explanation
(initial value) | 10111011 | |
SAR AL,1 | 11011101 | 1 | Shifts each bit one position to the right, filling leftmost position (high-order bit) with a copy of the sign bit. Low-order bit (right-most bit) moves to Carry Flag (CF).
*SAR AL,2 | 11110111 | 0 | Shifts each bit two positions to the right, duplicating the sign bit on the left. Second bit that moves out on the right-hand side moves to CF.
MOV AL,07 | 00000111 | | AL = 07
SAR AL,1 | 00000011 | 1 | Shifts to the right. Duplicate sign bit on the left. Sign bit retains its original value (0). Low-order bit (right-most bit) moves to Carry Flag (CF).

* Not allowed in DEBUG. The constant may only be equal to 1.

The ROL (ROtate Left) and ROR (ROtate Right) instructions

Instruction | Contents of AL after execution of instruction | CF | Explanation
(initial value) | 10111011 | |
ROL AL,1 | 01110111 | 1 | Shifts each bit one position to the left, filling right (low-order) bit with a copy of the sign bit. A copy of the sign bit is also copied to the Carry Flag (CF).
ROL AL,1 | 11101110 | 0 | Shifts each bit to the left, duplicating the sign bit on the right. The sign bit is also copied to CF.
ROR AL,1 | 01110111 | 0 | Shifts each bit to the right. The bit on the right (low-order) moves into position on the left (most-significant bit) and into the CF.
ROR AL,1 | 10111011 | 1 |

The RCL (Rotate through Carry Left) and RCR (Rotate through Carry Right) instructions

Instruction | Contents of AL after execution of instruction | CF | Explanation
(initial value) | 10111011 | 0 |
RCL AL,1 | 01110110 | 1 | Shifts (rotates) each bit one position to the left, filling right (low-order bit) with a copy of the CF. The high-order bit moves into the Carry Flag (CF).
RCL AL,1 | 11101101 | 0 |
RCR AL,1 | 01110110 | 1 | Shifts (rotates) each bit to the right. The bit in the CF moves into the position on the left (most-significant) and the bit on the right moves into the CF.
RCR AL,1 | 10111011 | 0 |
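The behaviour of the shift and rotate instructions on an 8-bit register can be reproduced with a few lines of Python. The following is a sketch, not a full emulation: the function names are our own, each function returns the pair (new AL, CF), only CF is modelled, and a shift count of 0 is not treated specially.

```python
def shl(al, count=1):
    """Shift left; the last bit shifted out of bit 7 ends up in CF."""
    cf = 0
    for _ in range(count):
        cf = (al >> 7) & 1
        al = (al << 1) & 0xFF
    return al, cf


def shr(al, count=1):
    """Logical shift right; 0s enter on the left, last bit out goes to CF."""
    cf = 0
    for _ in range(count):
        cf = al & 1
        al >>= 1
    return al, cf


def sar(al, count=1):
    """Arithmetic shift right; the sign bit is duplicated on the left."""
    cf = 0
    for _ in range(count):
        cf = al & 1
        al = (al >> 1) | (al & 0x80)
    return al, cf


def rol(al):
    """Rotate left; bit 7 moves to bit 0 and is copied to CF."""
    cf = (al >> 7) & 1
    return ((al << 1) & 0xFF) | cf, cf


def ror(al):
    """Rotate right; bit 0 moves to bit 7 and is copied to CF."""
    cf = al & 1
    return (al >> 1) | (cf << 7), cf


def rcl(al, cf):
    """Rotate left through carry; old CF enters bit 0, bit 7 becomes the new CF."""
    return ((al << 1) & 0xFF) | cf, (al >> 7) & 1


def rcr(al, cf):
    """Rotate right through carry; old CF enters bit 7, bit 0 becomes the new CF."""
    return (al >> 1) | (cf << 7), al & 1


al, cf = shl(0b10111011)
print(f"SHL AL,1 -> {al:08b}, CF = {cf}")
al, cf = sar(0b10111011)
print(f"SAR AL,1 -> {al:08b}, CF = {cf}")
```

Calling these functions with the initial value 10111011 and feeding each result into the next call reproduces the rows of the tables above.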

Activity 5.5

a Bit manipulation using shifts and masks

Suppose we need to extract bits 3 to 5 of the AL register.

    mov bl,38h      ; Mask for bits 3 to 5: 38h = 0011 1000
    and bl,al       ; Bits 3 to 5 in BL now contain the values of the
                    ; corresponding bits in AL and the rest of the
                    ; bits are set to 0
    mov cl,3        ; Move 3 to CL
    shr bl,cl       ; Shift bits 3 to 5 to the right-hand side of BL

NOTE: NASM will assemble and execute the instruction shr bl,3 while DEBUG will give a syntax error. Beware if you code such an instruction in NASM and then use DEBUG to trace the program. DEBUG interprets the assembled instruction differently, but executes it correctly! Rather code shifts as shown in the example given above.

b Bit manipulation using shifts and the carry  flag:  Suppose we want to determine the value of bit 0 of the AL register and jump to memory position 200h if it is equal to 0.

shr al,1

; Shift the bits in AL one position to the right ;

jnc carry_n

The

carry flag contains shifted out

the

bit

that

is

; Jump to label carry_n if no carry, i.e. ; the carry flag is not set 55

COS2621/102/3/2018

5.4 Floating‐point representation 

Stallings: study section 10.4. Stallings only considers the IEEE and the IBM S390 standards. To give you some additional background, we look at different representations that are used on other machines. These are the older IBM 370 format and the format used on the DEC PDP/11 and VAX machines.

5.4.1 The IEEE standard 

The format for single-precision floating-point numbers as discussed in Stallings can be represented as follows:

s | e (8 bits) | f (23 bits)

To get the value of a number given a 32-bit bit pattern, we use the following formula:

value = $(-1)^{s} \times 1.f \times 2^{e - 127}$

Note the implied 1 to the left of the binary point. Remember that in the IEEE format the number is normalised in such a way that we have one 1 to the left of the binary point. Since this 1 is always present, it is not necessary to store it. This is known as a hidden bit.

Note that this formula cannot be used for numbers that have not been normalised.

Let us look at an example.

Activity 5.6

The IEEE format

Give the representation of -7.6875 in IEEE floating-point format.

$7.6875_{10} = 111.1011_2 = 1.111011_2 \times 2^{2}$   (normalise)
$= 1.111011_2 \times 2^{129 - 127}$   (get the value of e)
$= 1.111011_2 \times 2^{10000001_2 - 127}$   (convert e to binary)

Remember that we have a hidden bit to the left of the binary point and that this is not stored as part of the number. This means that f = 1110110...0.

e = 10000001. The number is negative so s = 1.

s | e | f
1 | 10000001 | 1110110....0
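The result of this activity can be verified on any machine that uses IEEE single precision. The following Python sketch is an illustration only; it uses the standard `struct` module to obtain the 32-bit pattern of -7.6875 and then separates the fields s, e and f. The variable names are our own.

```python
import struct

# Pack -7.6875 as a 32-bit IEEE single and read the same 4 bytes back as an integer
bits = struct.unpack(">I", struct.pack(">f", -7.6875))[0]

s = bits >> 31                  # 1 sign bit
e = (bits >> 23) & 0xFF         # 8-bit biased exponent
f = bits & 0x7FFFFF             # 23-bit fraction (hidden bit not stored)

print(s, f"{e:08b}", f"{f:023b}")

# Apply the formula value = (-1)^s x 1.f x 2^(e - 127)
value = (-1) ** s * (1 + f / 2 ** 23) * 2.0 ** (e - 127)
print(value)
```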

5.4.2 The IBM 360/370 format 

The IBM 360/370 format for a single-precision floating-point number is as follows:

s | e (7 bits) | f (24 bits)

value = $(-1)^{s} \times 0.f \times 16^{e - 64}$

Note that the radix of the exponent is 16 and not 2! The exponent is represented using excess 64.

Let us consider an example.

Activity 5.7

The IBM 360/370 format

Show how you would represent -7.6875 using the IBM 360/370 floating-point format.

$7.6875_{10} = 111.1011_2 = 0.01111011_2 \times 16^{1}$   (normalise)
$= 0.01111011_2 \times 16^{65 - 64}$   (get the value of e)
$= 0.01111011_2 \times 16^{1000001_2 - 64}$   (convert e to binary)

So f = 011110110...0 and e = 1000001. The number is negative so s = 1.

s | e | f
1 | 1000001 | 011110110....0
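The steps of this activity can be sketched in Python. This is a hedged illustration under our own assumptions: the function name `ibm_single` is hypothetical, extra fraction bits are simply truncated, and the range of the exponent is not checked.

```python
def ibm_single(x):
    """Return the 32-bit pattern of x in the IBM 360/370 single-precision format."""
    if x == 0:
        return 0
    s = 1 if x < 0 else 0
    m = abs(x)
    e = 64                          # excess-64 exponent for 16^0
    while m >= 1:                   # normalise: bring the fraction below 1
        m /= 16
        e += 1
    while m < 1 / 16:               # the first hexadecimal digit must be non-zero
        m *= 16
        e -= 1
    f = int(m * 2 ** 24)            # keep 24 fraction bits (truncated)
    return (s << 31) | (e << 24) | f


pattern = ibm_single(-7.6875)
print(f"{pattern:032b}")
print(f"s = {pattern >> 31}, e = {(pattern >> 24) & 0x7F:07b}, f = {pattern & 0xFFFFFF:024b}")
```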

5.4.3 The DEC PDP 11/VAX format 

The format of a single-precision floating-point number in DEC PDP 11/VAX format is as follows:

s | e (8 bits) | f (23 bits)

value = $(-1)^{s} \times 0.1f \times 2^{e - 128}$   ($e \ne 0 \ldots 0$)

This format also uses a radix of 2 for the exponent. Note that, in this case, the 1 is to the right of the binary point in the formula. Again, since we are working with normalised numbers, it is not necessary to store the first 1 after the binary point. So we again have a hidden bit in this representation. Let us look at an example.

Activity 5.8

The DEC PDP 11/VAX format

Show how you would represent 7.6875 using the DEC single-precision floating-point representation.

$7.6875_{10} = 111.1011_2 = 0.1111011_2 \times 2^{3}$   (normalise)
$= 0.1111011_2 \times 2^{131 - 128}$   (get the value of e)
$= 0.1111011_2 \times 2^{10000011_2 - 128}$   (convert e to binary)

So e = 10000011. The number is positive so s = 0. We have a hidden bit to the right of the binary point so the first 1 of the significand is not stored.

f = 1110110...0

s | e | f
0 | 10000011 | 1110110......0
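The three fields can also be computed with Python's standard `math.frexp` function, which returns a fraction between 0.5 and 1 and a power of 2, i.e. exactly the normalised form 0.1f used in this format. This is a sketch under our own assumptions: the function name `dec_single_fields` is hypothetical, the value 0 is not handled, and only the fields are produced, not the way they are laid out in memory.

```python
import math


def dec_single_fields(x):
    """Return (s, e, f) of x for the DEC single-precision format described above."""
    s = 1 if x < 0 else 0
    m, exponent = math.frexp(abs(x))    # abs(x) = m * 2**exponent with 0.5 <= m < 1
    e = exponent + 128                  # excess-128 exponent
    f = int((m - 0.5) * 2 ** 24)        # drop the hidden leading 1, keep 23 bits
    return s, e, f


s, e, f = dec_single_fields(7.6875)
print(s, f"{e:08b}", f"{f:023b}")
```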


5.5 Summary 

Stallings: section 10.7. Make sure that you understand the arithmetic, shift and logical instructions discussed in this study unit and that you can convert floating-point numbers to the format in which they will be stored. You should know the meaning of all the key terms that are listed in this section.

5.6 Key Questions 

Work through the review questions at the end of the chapter.

----oooOooo----


Study unit 6 Instruction sets

This study unit supplements chapter 12 and Appendix B in Stallings. Several very important concepts are discussed. These include operand types, instruction types and types of operation. The use of subroutines and macros is discussed and we also look at ...

Learning outcomes

Once you have mastered the study material in this study unit, you will be able to

● describe how instructions are represented in a computer
● distinguish different types of instruction
● distinguish different types of operand
● discuss the different types of operation that can be performed by a computer program
● implement the operations discussed in this section on an Intel x86 machine
● define and use subroutines in assembly language
● describe and implement different parameter-passing methods
● describe what a macro is and use macros in assembly language

Stallings Study sections 12.1 - 12.4; the first part of section 12.5; Appendix B.1, and also section 13.5. Read the rest of section 12.5; Appendix B.2 to B.5.

6.1 Introduction 

Stallings: study section 12.1. Various important concepts regarding assembly language are discussed in this study unit. It is important to understand these concepts in order to master the practical work that is expected of you in this module.

6.2 Types of operand 

Stallings: study section 12.2. Operands are data on which machine instructions operate. Make sure that you understand the different categories of data. Note that an address is also a data item in its own right.

6.3 Data types 

Stallings: study section 12.3. It is important that you work through sections 4 to 6 of Appendix B in this study guide before continuing with this study unit. You will not understand the examples if you do not know the different data types and how storage space is reserved when using NASM.

Working with floating-point numbers in assembly language is quite complex so we will only use integer data and characters or character strings in the programs we use as examples.


6.4 Types of operation in assembly language 

Stallings: study section 12.4. Stallings: study section 12.5 up to, but not including, the section on x86 SIMD instructions. Read the rest of section 12.5.

Stallings: study Appendix B.1 and chapter 13.5. 

We consider some of the concepts discussed in this chapter of Stallings in more detail:

6.4.1 Jumps and branches

The JMP instruction is used for branching and is called an unconditional jump.

To branch or not to branch! A conditional jump uses the status of the flags to decide whether or not to branch. A conditional branch frequently follows after a compare statement, CMP.

Example:

    cmp al,6        ; Compare AL to 6
    jl less_than    ; Jump to less_than if AL is less than 6

We can also use the status of the carry or overflow flag for conditional branching:

    shl ax,1        ; Shift AX one left
    jc carry_set    ; Jump to carry_set if carry flag is set

Note that a conditional jump instruction is only two bytes long. The destination address is stored in one byte as an offset from the current value of IP, in other words, the difference between the current value of IP and the destination address. This offset must fit into one byte, otherwise NASM will flag an error. This means that we can only branch one "byte" far!

Let us use an example to explain:

Suppose we want to assemble the instruction JL 300h at address 200h. You will see that the offset from address 200h to address 300h is 100h bytes (300h - 200h). If you try to assemble this instruction it will be flagged as an error by NASM because we have only one byte to store the offset from 202h (the current value of IP, which points to the next instruction to be executed) to the destination address. This means that, from address 202h, we cannot conditionally jump further than address 281h forward (281h - 202h = 7Fh = $(127)_{10}$), or address 182h backward (182h - 202h = -80h = $(-128)_{10}$).
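The range check described above can be written down as a short calculation. The following Python sketch is an illustration only; the function name `short_jump_offset` is our own and it assumes the 2-byte conditional jump described in this section.

```python
def short_jump_offset(instruction_address, target_address):
    """Return the one-byte offset of a 2-byte conditional jump, or raise ValueError."""
    next_ip = instruction_address + 2          # IP already points to the next instruction
    offset = target_address - next_ip
    if not -128 <= offset <= 127:
        raise ValueError(f"offset {offset} does not fit into one signed byte")
    return offset & 0xFF                       # twos complement byte stored in the instruction


print(f"{short_jump_offset(0x200, 0x281):02X}h")   # furthest forward jump
print(f"{short_jump_offset(0x200, 0x182):02X}h")   # furthest backward jump

try:
    short_jump_offset(0x200, 0x300)                # JL 300h at address 200h
except ValueError as error:
    print("error:", error)
```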

Activity 6.1

Conditional branching 

Consider the following pseudo code:

    temp = y;
    x = 0;
    top: if temp < z then go to bottom;
    x = x + 1;
    temp = temp - z;
    go to top;
    bottom: ...

We can encode this pseudo code in x86 assembly language as follows:

            org 0x100
            jmp main        ; Jump to main program
    y:      dw 0005         ; y = 5 for testing purposes
    z:      dw 0002         ; z = 2 for testing purposes
    main:   mov ax,[y]      ; Use AX as Temp (=y)
            xor bx,bx       ; BX = 0 (initial value of x = 0)
    top:    cmp ax,[z]      ; Is Temp < z ?
            jl bottom       ; Yes, go to bottom
            inc bx          ; x = x + 1
            sub ax,[z]      ; Temp = Temp - z
            jmp top         ; Jump to top
    bottom: int 20h         ; Terminate program

6.4.2 Loop control 

There are, of course, various ways in which loop control can be handled. In x86 assembly language, the CX register, in combination with the LOOP instruction, can be used for this purpose. It works as follows:

The LOOP instruction 

            mov cx,10       ; Set loop counter to decimal 10
    loop_1: .
            .
            loop loop_1     ; Decrement CX,
                            ; repeat loop if CX ≠ 0

Initially, we set CX to the number of times we want the loop to be repeated. When the LOOP instruction is executed, CX is decremented by 1. If CX ≠ 0, we branch back to LOOP_1. If CX = 0, the instruction following the LOOP instruction is executed.

Activity 6.2

Use of the LOOP instruction

Let us look at another example. Write an assembly language program equivalent to the following short algorithm, which implements the operation x = y * z by repeated addition.

Multiplication by repeated addition:

        x = 0;
        for temp = y down to 1
            x = x + z;

The assembly language program for the above follows.

        org  0x100
        jmp  main
;
; Data declarations
x:      db   00             ; Use x=0, y=5 and z=4 for
y:      db   05             ; testing purposes
z:      db   04
main:   xor  ch,ch          ; Clear CH
        xor  al,al          ; AL = x (=0)
        mov  cl,[y]         ; CL = y
        mov  bl,[z]         ; BL = z
loop_1: add  al,bl          ; AL = AL + z
        loop loop_1         ; Repeat loop if CX ≠ 0
;end
        mov  [x],al         ; x = y * z
        int  20h

The result after execution is AL = (20)10 = 14h, i.e. AX = 0014h provided that AH was 0 when the program started, since the program does not change AH. Note that we assume that the result will fit into one byte.

6.4.3 Input and output 

Refer to the examples in Appendices C and D in this study guide. You will see that we use a set of DOS and BIOS routines for all input and output. When we want to use one of these routines, we have to set up the registers in a predetermined way and issue an interrupt instruction. When the INT instruction is executed, control is transferred to either DOS or BIOS (depending on the interrupt issued) that handles the I/O and returns control back to our program.

Appendix D of this guide contains a number of I/O functions that we will find handy to use.

6.4.4 The use of subroutines or procedures

Subroutines

Procedures are used in Pascal, and methods in C++ and Java, to divide a program into smaller portions. In assembly language, a subroutine can be regarded as the equivalent of a procedure.

There are various reasons for using subroutines in a program. The program may be very large and different parts of it may be written by different programmers. Each programmer may write his/her parts of the program in the form of subroutines. One programmer will be responsible for writing the main program and all the routines will eventually be linked together into one program. Another reason may be that parts of the program are written in assembly language for efficiency and the rest of the program in a high-level language like C++ or Java. Thirdly, a subroutine may be reusable: written once and used by different programs.

One of the reasons for using subroutines in COS2621 is to make the writing and testing of our assembly language programs easier. The program can be divided into small routines that can be tested in isolation. If we know that a routine is working correctly, there will be no need to step through the code again as we test the rest of our program. The program needs a main routine that will call the individual subroutines as required. All the code is combined into one program. In COS2621, subroutines are not assembled and stored separately.

A subroutine call can be represented as follows:

; main program
Prog_start:
        .
        .
        call sub1

; subroutine definition
sub1:   mov  [temp],ax
        .
        .
        mov  ax,[temp]
        sub  ax,bx
        ret

The return address is stored on the stack. When the CALL instruction is executed, the current value of the IP, i.e. the address of the instruction following the call, is automatically pushed onto the stack. The address of the first instruction in the subroutine is stored in the IP and execution carries on from that point. When the RET instruction is encountered in the subroutine, the top of the stack is popped into the IP and the calling program can start executing at the instruction following the call. From this description the importance of returning from a subroutine to the calling program via the RET instruction should be clear. One should never jump out of a subroutine back to the calling program without restoring the state of the stack. Fortunately, this is done automatically for us as part of the RET instruction.
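The mechanism can be observed with a very small program. The following is a minimal sketch; the subroutine name double_ax and the test value are assumptions made for the illustration. If you trace it under DEBUG, SP decreases by 2 when CALL is executed and returns to its previous value when RET is executed.

```nasm
        bits 16
        org  0x100
        jmp  main           ; Jump to main program
;
; Hypothetical subroutine: doubles the value in AX.
;
double_ax:
        add  ax,ax          ; AX = AX * 2
        ret                 ; Pop the return address into IP
;
main:   mov  ax,21          ; Test value
        call double_ax      ; Push offset of next instruction, jump to
                            ; double_ax
        int  20h            ; Execution resumes here, AX = 002Ah (42)
```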

When a program is very large it can be subdivided into separate subroutines that are assembled individually and eventually linked together. This can be done using NASM, but this will not be covered in COS2621.

6.4.5 Parameter Passing 

Parameters can be passed to the subroutine in a number of different ways. We look at the following possibilities in this section:

1. By defining a block in memory, called a parameter block, where the parameters are stored.
2. By storing the parameters in registers before calling the subroutine.
3. By pushing the parameters onto the stack.

In x86 assembly language, parameter passing can be done in any of the ways described above. It is very important for reusability that the comments in the subroutine heading describe very clearly where and how the subroutine expects to find the parameters and where the result, if one is passed back to the caller, will be stored by the subroutine.

Calling by name 

If parameters are passed using a parameter block in memory (calling by name), the address of the block is stored in a specific register before the subroutine is called. Let us look at an example.

;
; Subroutine to accept a string of characters from the keyboard.
; The address of the parameter block must be in DX.
;
Accept_string:
        mov  ah,0Ah         ; INT 21, function 0Ah
        int  21h            ; DOS system call
        ret                 ; Return

The parameter block is defined elsewhere in the program as follows:

; Parameter block
parb:   db   20             ; Maximum of 20 characters
        db   0              ; Reserve space for actual length of string
        resb 20             ; Reserve 20 bytes as input buffer

In the calling program, the address of the parameter block must be stored in DX before the subroutine is called so that DOS will know where to find the parameters and where to store the input string. See Appendix D.3.
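Putting the pieces together, a complete calling program could look like the sketch below. It reuses Accept_string and parb from the text; the layout of the program and the use of times 20 db 0 (instead of resb) to reserve an initialised buffer are assumptions made for the illustration.

```nasm
        bits 16
        org  0x100
        jmp  main           ; Jump to main program
;
; Parameter block for INT 21h, function 0Ah
;
parb:   db   20             ; Maximum of 20 characters
        db   0              ; DOS stores the actual length here
        times 20 db 0       ; Input buffer (20 bytes, initialised to 0)
;
; Subroutine to accept a string of characters from the keyboard.
; The address of the parameter block must be in DX.
;
Accept_string:
        mov  ah,0Ah         ; INT 21, function 0Ah
        int  21h            ; DOS system call
        ret                 ; Return
;
main:   mov  dx,parb        ; DX = address of the parameter block
        call Accept_string  ; Read the string
        int  20h            ; Terminate program
```

After the call, the byte at parb+1 holds the number of characters that were typed and the characters themselves start at parb+2.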

This parameter-passing mechanism is suitable to use when we have one or more parameters that may include data structures such as arrays or strings.

Calling by value (parameters passed in registers) 

If parameters are passed in registers (calling by value), the calling program must simply store the parameters in the relevant registers before calling the subroutine.

This mechanism is fast but can only be used when we have to pass a few values that will fit into registers.

Call by value (parameters passed on stack) 

Parameters can also be passed on the stack although this may involve tricky programming. Remember that the return address is pushed onto the top of the stack when the subroutine is called. This mechanism is suitable for passing a small number of individual values/addresses but only in cases where there is a restriction on the number of available registers. It is not suitable for passing parameters that involve relatively large data structures. Note that it will be slower than passing parameters in registers since we have to manipulate the stack which resides in memory. We use the exercises in the following section to illustrate the different possibilities.

6.4.6 Examples of parameter-passing mechanisms

Consider a subroutine that returns the largest of three numbers that are passed to it as parameters. The following code forms the body of the subroutine, i.e. that part of the subroutine that does the work and will be the same no matter how the parameters are passed.

;
; We assume that the three numbers are in AX, BX and CX
; respectively. The result is returned in AX
;
body:   cmp  ax,bx          ; AX > BX?
        jg   next1          ; Yes, compare next
        mov  ax,bx          ; BX greater
next1:  cmp  ax,cx          ; AX > CX?
        jg   epilog         ; Yes, we are done
        mov  ax,cx          ; CX greater
;
; Now AX contains the largest of the three numbers
;
epilog: ...

Assumptions: Let us assume that the three values are stored in memory in A, B and C respectively. The result must be stored in D. We also assume that the contents of registers AX, BX, CX and DX will be undefined after the subroutine call.

Activity 6.3

The parameters are passed on the stack and the result is returned on the stack.

The three parameters are pushed onto the stack before the call. The result (i.e. the largest number) is returned on the stack.

Start-up sequence

Calling program, prepare for call:

        push word [A]       ;
        push word [B]       ; Push the three values onto the stack
        push word [C]       ;

Prolog

On entering the subroutine:

        pop  dx             ; Store the return address
        pop  ax             ;
        pop  bx             ; Pop the three values into AX, BX, CX
        pop  cx             ; (last pushed first: AX = C, BX = B, CX = A)

Body

Body of subroutine fits in here.

Epilog

Preparing for the return:

        push ax             ; Return largest value on the stack
        push dx             ; Restore the return address
        ret

Clean-up sequence

Calling program, get the result:

        pop  word [D]       ; Pop the result into D
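The four sequences and the body can be combined into one program as in the following sketch. The subroutine name largest and the test values 7, 12 and 9 are assumptions made for the illustration; the rest follows the fragments given above.

```nasm
        bits 16
        org  0x100
        jmp  main           ; Jump to main program
A:      dw   7              ; Test values (hypothetical)
B:      dw   12
C:      dw   9
D:      dw   0              ; Result
;
; Returns the largest of three values passed on the stack.
; AX, BX, CX and DX are changed.
;
largest:
        pop  dx             ; Prolog: store the return address
        pop  ax             ; AX = C
        pop  bx             ; BX = B
        pop  cx             ; CX = A
        cmp  ax,bx          ; Body: AX > BX?
        jg   next1
        mov  ax,bx
next1:  cmp  ax,cx          ; AX > CX?
        jg   epilog
        mov  ax,cx
epilog: push ax             ; Epilog: return result on the stack
        push dx             ; Restore the return address
        ret
;
main:   push word [A]       ; Start-up sequence
        push word [B]
        push word [C]
        call largest
        pop  word [D]       ; Clean-up sequence: D = 000Ch (12)
        int  20h            ; Terminate program
```

Because the return address is saved in DX, the subroutine must not use DX for anything else between the prolog and the epilog.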

Activity 6.4

Using registers to pass the parameters: 

The parameters are passed in AX, BX and CX respectively. The result is returned in AX.

Start-up sequence

Calling program, prepare for the call:

        mov  ax,[A]         ;
        mov  bx,[B]         ; Get values into registers
        mov  cx,[C]         ;

Prolog

On entering the subroutine: Nothing needs to be done in the prolog. The parameters are already in the required registers.

Body

Body of subroutine fits in here.

Epilog

Prepare the result before leaving the subroutine: Nothing further needs to be done in the epilog. The result is already in AX.

        ret

Clean-up sequence

Calling program, get the result:

        mov  [D],ax         ; Store result

Activity 6.5

Using a parameter block to pass the parameters: 

The address of the parameter block (in memory) is passed to the subroutine in the DX register. The result is returned in the parameter block.

Start-up sequence

Calling program, prepare for the call:

        mov  dx,A           ; Address of parameter block
                            ; in memory

Prolog

On entering the subroutine:

        mov  si,dx          ; SI points to parameter block
        mov  ax,[si]        ;
        mov  bx,[si+2]      ; Get the three values
        mov  cx,[si+4]      ;

Body

Body of subroutine fits in here.

Epilog

Subroutine, store the result - prepare for return:

        mov  [si+6],ax      ; Store result in the parameter
                            ; block
        ret

Clean-up sequence

Calling program, get the result: Nothing needs to be done in the clean-up sequence. The result is already stored in memory at the required address.
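For comparison, the parameter-block version can be assembled into a complete program as sketched below. The subroutine name largest and the test values are assumptions; the sketch also assumes that A, B, C and D are declared as four consecutive words, because the subroutine finds them at offsets 0, 2, 4 and 6 from the address in DX.

```nasm
        bits 16
        org  0x100
        jmp  main           ; Jump to main program
;
; Parameter block: four consecutive words
;
A:      dw   7              ; Offset 0 (test value, hypothetical)
B:      dw   12             ; Offset 2
C:      dw   9              ; Offset 4
D:      dw   0              ; Offset 6: result
;
; Returns the largest of the three values in the parameter
; block addressed by DX. AX, BX, CX and SI are changed.
;
largest:
        mov  si,dx          ; Prolog: SI points to parameter block
        mov  ax,[si]
        mov  bx,[si+2]
        mov  cx,[si+4]
        cmp  ax,bx          ; Body: AX > BX?
        jg   next1
        mov  ax,bx
next1:  cmp  ax,cx          ; AX > CX?
        jg   epilog
        mov  ax,cx
epilog: mov  [si+6],ax      ; Epilog: store result in the block
        ret
;
main:   mov  dx,A           ; Start-up sequence
        call largest        ; D = 000Ch (12) on return
        int  20h            ; Terminate program
```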

Near and Far Calls 

The subroutine calls we have looked at so far are all near calls. This means that the main program as well as the subroutine reside within the same segment. The value of the CS register will be the same for both.

If the main program and the subroutine are not within the same segment, or if the subroutine was assembled and stored as a separate unit, the main program has to make a far call to the subroutine. When a far call is made, the value of the IP as well as the current value of the CS register is stored on the stack. On return from the subroutine, the CS register is restored to the code segment address of the caller and the IP register is restored to point to the instruction following the call where the execution of the caller must be resumed. We are only going to use near calls in our programs.

6.4.7 Recursion 

This section does not form part of the study material but is included for interest's sake. You have probably already encountered recursion in COS2611. DEBUG is an excellent tool to use to see exactly what happens when we call a recursive routine. Use the following example to step through the program and study the stack after each call and after each return.

Calculate the sum of the numbers 1 to n 

Activity 6.6

The following program calculates the sum of the numbers 1 to n.

        bits 16
        org  0x100
        jmp  main           ; Jump to main program
;
; This recursive subroutine calculates the sum of
; the numbers 1 to n (CX initially contains n).
; AX must contain 0 on first call. Result is returned in
; AX.
recur:  or   cx,cx          ; Is CX = 0?
        jz   return         ; Yes, return
        add  ax,cx          ; AX holds the sum
        dec  cx             ; CX = CX-1
        call recur          ; Recursive call
return: ret                 ; Return to main            (L2)
;
main:   mov  cx,6           ; Assume n = 6
        mov  ax,0           ; Initialise AX to 0.
        call recur          ; Call adder
        int  20h            ; Terminate program         (L1)

CX = 0 (the base case) signals the termination of the recursive subroutine. The return addresses of the recursive calls are pushed onto the stack by the CALL instruction.

            Return address   CX   AX      AX
            on stack              (dec)   (hex)
1st call:   L1               6    0       0
2nd call:   L2               5    6       6
3rd call:   L2               4    11      B
4th call:   L2               3    15      F
5th call:   L2               2    18      12
6th call:   L2               1    20      14
7th call:   L2               0    21      15

In our example 1 + 2 + 3 + 4 + 5 + 6 = 21(decimal) = 15h.

We suggest that you test the above program by using the T option of DEBUG. Inspect the stack after each call, and after each return instruction has been executed.

Things to remember: 

The return address: In x86 assembly language, the CALL instruction pushes the offset of the next instruction to be executed onto the stack and transfers control to the subroutine. The RET instruction pops the address on top of the stack into the IP and execution resumes from this location.

Parameter-passing methods: The advantages and disadvantages of each of the methods discussed in this study unit have to be considered when we have to decide which will be best for a particular implementation.

6.5 Assembly language 

Stallings: revise section 11.5. The main reason for using assembly language is to improve the efficiency of a program when its execution time is crucial.

There is also method in the madness of our teaching assembly language to our students! It helps one to understand the underlying operations of a computer system and also to understand the way in which a compiler operates.

6.5.1 Assembler directives 

Pseudo-instructions are also called assembler directives. These are instructions to the assembler and do not form part of the program code that will be executed during run time. DB and DW are examples of assembler directives that we have come across in our programs so far. Assembler directives tell the assembler where and how to reserve memory and how it should be initialised. There are several other assembler directives that are important when we are implementing large programs but we will not consider these here.

6.5.2 Macros

It is important to note the difference between a macro and a subroutine. A macro is not the same as a subroutine. We have already looked in detail at subroutines in the previous section. When a subroutine is called, we branch to an area in memory where the subroutine is stored and execute the code. Then we branch back to our main or caller program. This happens at run time.

When a macro is called, the piece of code comprising the macro is duplicated in the program at the point of the call. We call this process macro expansion. This happens during assembly time.

We use the following example to illustrate this concept:

Activity 6.7

The use of macros in assembly language programs. 

A macro gives a name to a piece of code. Whenever we refer to the macro, the code is duplicated in the program during the assembly process.

        bits 16
        org  0x100
        jmp  main
mes1:   db   'Hello World',10,13,'$'
mes2:   db   'What a beautiful day!',10,13,'$'
;
; macro definition
%macro  disp 1
        mov  dx,%1
        mov  ah,09
        int  21h
%endmacro
;
; end of macro definition
main:   disp mes1
        disp mes2
        int  20h

The assembled program code is as follows:

 1                                      bits 16
 2                                      org  0x100
 3 00000000 E92600                      jmp  main
 4                              mes1:
 5 00000003 48656C6C6F20576F72-         db   'Hello World',10,13,'$'
 6 0000000C 6C640A0D24
 7                              mes2:
 8 00000011 576861742061206265-         db   'What a beautiful day!',10,13,'$'
 9 0000001A 6175746966756C2064
10 00000023 6179210A0D24
11
12                              %macro disp 1
13                                      mov  dx,%1
14                                      mov  ah,09
15                                      int  21h
16                              %endmacro
17
18                              main:
19                                      disp mes1
20 00000029 BA[0300]                    mov  dx,%1
21 0000002C B409                        mov  ah,09
22 0000002E CD21                        int  21h
23                                      disp mes2
24 00000030 BA[1100]                    mov  dx,%1
25 00000033 B409                        mov  ah,09
26 00000035 CD21                        int  21h
27 00000037 CD20                        int  20h
28

Note how the code within the macro definition is duplicated in the main program. The macro definition does not exist in the assembled program.

6.6 Stacks and byte‐ordering 

NB: You should be able to manipulate the stack on an x86 machine. See section 4.5.1.4 in the study guide. We will consider the stack again in the next study unit.

6.7 Summary 

Stallings: section 12.7 Make sure that you understand the different categories of data, the use of all the different kinds of instruction, and the use of subroutines, macros and pseudo-instructions. You should also know the meaning of all the key terms that are listed in section 12.7.

6.8 Key Questions 

Work through the review questions at the end of the chapter.

----oooOooo----


Study unit 7 Instruction Sets: Addressing modes and formats

This study unit supplements chapter 13 in Stallings.

Learning outcomes

Once you have mastered the study material in this study unit, you will be able to



identify the addressing mode used in a given instruction



give examples of instructions in which specific x86 addressing modes are used



discuss the advantages and disadvantages of different instruction formats


7.1 Addressing modes

Stallings: study section 13.1. The addressing mode used in an instruction indicates the way in which the operand is accessed. We have two operands in binary operations such as addition, for example, but generally, at least one of the operands is stored in a register. Consequently, we refer to the addressing mode of the operand that is not obtained directly from a register as the addressing mode of the instruction. The examples below illustrate this principle.

7.2 Intel x86 and ARM addressing  modes 

Stallings: study section 13.2 excluding the section on the ARM addressing modes.

Read the rest of section 13.2, i.e. the section on ARM addressing modes.

We examine the addressing modes of the Intel x86 in more detail below.

A number of different addressing modes can be identified in the Intel x86 instruction set. The addressing mode is determined by the operands in the instruction.

The following addressing modes are quite straightforward:

Register addressing: 

        mov  ax,bx          ; Load the value of BX into AX
        add  ax,bx          ; AX = AX + BX

Both operands are in registers. The first operand is always the destination operand.

Immediate addressing:

        mov  ax,1           ; Load the value 1 into AX
        mov  bx,0x010C      ; Load the hexadecimal value 010C
                            ; into BX

One of the operands is a value that is stored as part of the instruction.

Direct addressing (displacement addressing):

        mov  ax,[102h]      ; Load the contents of memory
                            ; location 102h into AX
        mov  [temp],bx      ; Store the contents of BX in the
                            ; memory location with label temp

Note: In contrast to this, mov bx, temp stores the address of the label temp in BX.

One of the operands is the address of the actual operand. This address is the offset (displacement) from the start of the Data Segment.
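The difference between using a label with and without square brackets can be checked with a short program. This is a minimal sketch; the label temp and the value 1234h are assumptions made for the illustration.

```nasm
        bits 16
        org  0x100
        jmp  main           ; Jump to main program
temp:   dw   1234h          ; One word of data (hypothetical value)
;
main:   mov  bx,temp        ; Immediate: BX = address (offset) of temp
        mov  ax,[temp]      ; Direct: AX = contents of temp = 1234h
        mov  cx,[bx]        ; Register indirect: CX = 1234h as well,
                            ; because BX holds the address of temp
        int  20h            ; Terminate program
```

Trace the program under DEBUG and compare BX (an address) with AX and CX (the data stored at that address).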

Stallings: revise and study section 13.3. We use examples to illustrate the following two addressing modes, which make use of indirect addressing and are a bit more complicated. Both of these are examples of register indirect addressing.

Indirect addressing means that the operand given in the instruction contains the address of the actual operand. On the Intel x86 we can only have register indirect addressing using SI, DI, BP and BX. Hence indexed addressing and base-indexed addressing, which we consider in the examples given in sections 7.2.1 and 7.2.2, are both forms of register indirect addressing.

Base addressing: If BX or BP is used for indirect addressing, we call it base addressing.

Activity 7.1

7.2.1 Indexed addressing mode 

Revise section 4.5.1.3 in this study guide. Consider the case where we want to access each individual element in a character string. We can regard the string as an array of characters and use a pointer to access each individual character. Suppose we have the following string that starts at memory position 102h:

String   S  C  I  E  N  C  E     I  S     F  U  N
Offset   0  1  2  3  4  5  6  7  8  9  10 11 12 13

If we set SI (the source index register) to the start of the string, we can use SI as a pointer to access the characters one at a time. The character 'S' is stored at offset 0 from the start of the string and the character 'N' is stored at offset (13)10 from the start of the string.

Example:

Suppose we want to access the character 'F' which is stored at offset (11)10 from the start of the string.

        db   'SCIENCE IS FUN'
        .
        mov  si,102h        ; Move addr of start of string to SI
        add  si,11          ; Add (11)10 to SI. NASM assumes that
                            ; numbers are decimal
        mov  al,[si]        ; Move the contents of the address
                            ; pointed to by SI to AL.
                            ; AL will contain the character 'F'

Let us look at another example to illustrate how we can use indexed addressing to access each element in the string.


Activity 7.2

Using an index to point to an array element 

Read each element of a string (array of characters), which we assume is in upper case, from memory, convert it to lower case and store it in an output string. Assume that the input string, 'SCIENCE IS FUN', is stored in memory. We use indexed addressing as follows:

We use SI to point to the input string, so SI is initialised to the starting address of the input string. We use DI to point to the output string, so DI (Destination Index) is initialised to the starting address of the output string.

Algorithm to solve the problem:

Do while not end_of_string
    Move a character from the input string to AL
    Convert to lower case (This is done by adding 20h to the contents of AL since the difference between an upper and lower case ASCII character is 20h)
    Store character in the output string
    Increment SI
    Increment DI
End do;

Solution 

Note that we do not test whether the character is in fact an upper case character, so the program will give incorrect results if the input string contains characters that are not upper case (which is indeed the case for the spaces in our string).

Note the use of indexed addressing in this example.

         org  0x100
         jmp  main
in_str:  db   'SCIENCE IS FUN'
out_str: times 14 db ' '     ; Reserve 14 spaces for output string
out_len: equ  $-out_str      ; Define length of output string
main:    mov  si,in_str      ; Start address of input string
         mov  di,out_str     ; Start address of output string
         mov  cx,out_len     ; Length of string
loop_1:  mov  al,[si]        ; Get one character
         add  al,20h         ; Convert to lower case
         mov  [di],al        ; Store in output string
         inc  si             ; Adjust pointers
         inc  di
         loop loop_1         ; If CX ≠ 0, get next one
         int  20h            ; Terminate program

Run this program under DEBUG and inspect the contents of memory where the output string is stored. How do we know this address? We have to inspect the NASM output listing which is as follows:

 1                                       org  0x100
 2 00000000 E91C00                       jmp  main
 3 00000003 534349454E43452049- in_str:  db   'SCIENCE IS FUN'
 4 0000000C 532046554E
 5 00000011 20                  out_str: times 14 db ' ' ; Reserve 14 spaces for
                                                          ; output string
 6                              out_len: equ  $-out_str  ; Define length of output
                                                          ; string
 7                              main:
 8 0000001F BE[0300]                     mov  si,in_str  ; Start address of input
                                                          ; str
 9 00000022 BF[1100]                     mov  di,out_str ; Start address of output
10 00000025 B90E00                       mov  cx,out_len ; Length of string
11
12 00000028 8A04                loop_1:  mov  al,[si]    ; Get one character
13 0000002A 0420                         add  al,20h     ; Convert to lower case
14 0000002C 8805                         mov  [di],al    ; Store in output string
15 0000002E 46                           inc  si         ; Adjust pointers
16 0000002F 47                           inc  di
17 00000030 E2F6                         loop loop_1     ; If CX ≠ 0, get next one
18 00000032 CD20                         int  20h        ; Terminate program

The input string starts at offset 103h and the output string starts at offset 111h.
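The solution above converts every character, including the spaces. One possible refinement, sketched here as an illustration rather than as part of the prescribed solution, is to convert a character only if it lies in the range 'A' to 'Z'. The labels next_ch and store and the constant in_len are assumptions made for the example.

```nasm
         bits 16
         org  0x100
         jmp  main
in_str:  db   'SCIENCE IS FUN'
in_len   equ  $-in_str       ; Length of input string (14)
out_str: times in_len db ' ' ; Output string of the same length
;
main:    mov  si,in_str      ; SI points to input string
         mov  di,out_str     ; DI points to output string
         mov  cx,in_len      ; Loop counter
next_ch: mov  al,[si]        ; Get one character
         cmp  al,'A'
         jb   store          ; Below 'A': not upper case, copy as is
         cmp  al,'Z'
         ja   store          ; Above 'Z': not upper case, copy as is
         add  al,20h         ; Upper case: convert to lower case
store:   mov  [di],al        ; Store in output string
         inc  si             ; Adjust pointers
         inc  di
         loop next_ch        ; If CX is not 0, get next one
         int  20h            ; Terminate program
```

With this test in place the output string is 'science is fun', with the two spaces left unchanged.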

7.2.2 Base-indexed addressing mode

Base-indexed addressing can also be used to access array elements. BX and BP can be used as base registers. When we use this type of addressing mode, the base register is normally set to the start of the array and the index register is used as an offset from the start of the array.

Activity 7.3

Access array elements by using base-indexed addressing

        org  0x100
;
; Program to select a classic movie from a list of 9
;
        jmp  main           ; jump to main program
;
; Prompt message
;
message: db  'Please enter a theatre number',0ah,0dh,'$'
;
; List of movies you can select from
;
m_1:    db   'Top Gun','$'
m_2:    db   'Braveheart','$'
m_3:    db   'Casablanca','$'
m_4:    db   'The Rock','$'
m_5:    db   'A few good men','$'
m_6:    db   'Con Air','$'
m_7:    db   'The Lion King','$'
m_8:    db   'Kellys Heroes','$'
m_9:    db   'The dirty dozen','$'
;
m_addr: dw   m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8, m_9
;
; Each of the values in m_addr occupies 2 bytes.
; This means that the first movie address is at offset 0 in
; m_addr,
; the second movie address is at offset 2,
; the third movie address is at offset 4,
; the fourth,…, ninth movie addresses are at offsets
; 6,8,10,12,14,16 respectively.
;
; Display message on the screen
; DX points to start of message
;
display:
        mov  ah,09          ; Display message function
        int  21h            ; DOS system call
        ret
;
; Accept a character from the keyboard
; Character (in ASCII) is returned in AL
; Character is not echoed to the screen
;
input:  mov  ah,07          ; Accept char, no echo function
        int  21h            ; DOS system call
        ret
;
; Display movie name corresponding to the theatre number
; input
; Movie number is in AL.
;
display_m:
        xor  ah,ah          ; AH = 0
        sub  al,31h         ; Offset = (AL - 1)*2
        shl  al,1           ; Equivalent to AL*2
        mov  si,ax          ; SI is used as an index
        mov  dx,[m_addr+si] ; DX points to movie name
        call display
        ret
main:   mov  dx,message
        call display
        call input
        call display_m
        int  20h

Activity 7.4

Two‐dimensional arrays:  Calculate the sum of all the elements in the second column of a 3 x 5 array.

        org  0x100
        jmp  main           ;Jump to main program
; First row of array
first_row:  db 00h,01h,02h,03h,04h
; Second row of array
second_row: db 10h,20h,30h,40h,50h
; Third row of array
third_row:  db 60h,70h,80h,90h,0xa0
;
; Add the elements in the second column
main:   mov  bx,first_row   ; BX points to the first row
        mov  si,1           ; SI points to column 2
                            ; (SI=0 for column 1 and
                            ; SI=1 for column 2)
        mov  cx,3           ; Loop counter - add three
                            ; elements
        mov  ax,0000        ; AX to contain the sum
;
sum:    add  al,[bx+si]     ; Add element to contents of AL
        add  bx,05          ; BX points to next row
        loop sum            ; Repeat the loop until CX=0
        int  20h            ; Terminate program

Run this program under DEBUG. The AX register should contain 0091h on termination.
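In the program above the index SI stays fixed while the base BX moves from row to row. The roles can also be reversed: to add the elements of one row, keep BX at the start of that row and step SI through the columns. The following sketch uses the same array; the choice of the second row and the label next_col are assumptions made for the illustration.

```nasm
        bits 16
        org  0x100
        jmp  main           ; Jump to main program
first_row:  db 00h,01h,02h,03h,04h
second_row: db 10h,20h,30h,40h,50h
third_row:  db 60h,70h,80h,90h,0xa0
;
; Add the five elements in the second row
main:   mov  bx,second_row  ; BX (base) points to the second row
        xor  si,si          ; SI (index) = 0, i.e. column 1
        mov  cx,5           ; Loop counter - add five elements
        xor  ax,ax          ; AX to contain the sum
next_col:
        add  al,[bx+si]     ; Add element to contents of AL
        inc  si             ; SI points to next column
        loop next_col       ; Repeat the loop until CX=0
        int  20h            ; Terminate program, AX = 00F0h
```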

7.2.3 Stack addressing 

Stack Pointer (SP): The SP register contains the offset from the beginning of the stack to the top of the stack. We use the instructions PUSH and POP to push values onto the stack and to pop them off the stack respectively. This addressing mode is called stack addressing.

PUSH: An item is stored on top of the stack and the stack pointer adjusted accordingly. SP is decremented before a new item is placed on the stack. This means that the stack grows from high memory to low memory. We say that the stack grows backward in memory. SP points to the last item that was pushed onto the stack.

POP: An item is removed from the top of the stack and the stack pointer adjusted accordingly. SP is incremented after an item has been removed from the stack. SP points to the new top of stack.

Note that the operations of PUSH and POP are slightly different on some older machines. This may not concern you. If you have an older machine though, this might explain differences between the results obtained by you on your machine and the results given in the next example which was tested on a relatively new Intel machine. If your results are different, the following is the cause: On some of the older machines, when the PUSH instruction is executed, SP is decremented after an item has been pushed on the stack. This means that SP points to the next available position on the stack. With the POP instruction, SP is incremented before the item on top of the stack is popped.


Activity 7.5

The state of the stack: Assume that the initial values of the registers are: AX = 00A5, BX = 0001, CX = 0002, SP = FFFC

The table given below shows the state of the stack and the registers after each instruction listed has been executed. All values are in hexadecimal.

The columns show the contents of AX, BX and CX, the contents of SP (which points to the top of the stack) and the contents of the stack, listed from the top downwards. The stack 'grows' from high to low memory.

Instruction  AX    BX    CX    SP    Stack            Comment
(initial)    00A5  0001  0002  FFFC  ?
PUSH AX      00A5  0001  0002  FFFA  00A5             SP points to current top of stack. Contents of AX do not change. SP is decremented by 2. (The stack grows from high memory to low memory.)
PUSH BX      00A5  0001  0002  FFF8  0001 00A5        Contents of BX do not change. SP is decremented by 2.
PUSH CX      00A5  0001  0002  FFF6  0002 0001 00A5   Contents of CX do not change. SP is decremented by 2.
POP AX       0002  0001  0002  FFF8  0001 00A5        The value on top of the stack (ie 0002) is popped into AX. SP is incremented by 2.

After POP AX has been executed, AX contains the value that was on top of the stack, namely 0002. The stack pointer is now pointing to the value 0001. The popped value (0002) no longer forms part of the stack. Only the values below the stack pointer form part of the stack.
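Because the last value pushed is the first value popped, two pushes followed by two pops in the same register order exchange the contents of the registers. The sketch below illustrates this with the initial values of AX and BX used in the activity; it is an illustration only and not one of the examples in Appendix C.

```nasm
        bits 16
        org  0x100

        mov  ax,00A5h       ; AX = 00A5
        mov  bx,0001h       ; BX = 0001
        push ax             ; Stack: 00A5
        push bx             ; Stack: 0001 00A5 (0001 on top)
        pop  ax             ; AX = 0001 (last value pushed)
        pop  bx             ; BX = 00A5
        int  20h            ; Terminate program, SP is back at its
                            ; initial value
```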

Things to remember:

• The stack grows backward in memory.
• The POP instruction moves the value on top of the stack into the operand. This value does not form part of the stack any longer.

See also Example 6 in Appendix C of this guide.

7.3 Instruction Formats

Stallings: study the first three pages of section 13.3.

Read the rest of section 13.3.

7.4 The Intel x86 and ARM instruction formats

Stallings: read section 13.4

7.5 Summary

Stallings: section 13.7. You should be able to identify addressing modes used in Intel x86 instructions and to give examples of instructions where a specific addressing mode is used. Make sure that you understand the issues involved when choosing appropriate instruction formats and that you know the meaning of the key terms listed in section 13.7.

7.6 Key Questions

Work through the review questions at the end of the chapter.

----oooOooo----

Study unit 8 Cache memory

This study unit supplements chapter 4 of Stallings. Concepts pertaining to cache memory principles, design and organisation are discussed in this study unit.

Learning outcomes

Once you have mastered the study material in this study unit, you will be able to

• explain the characteristics of memory systems
• describe the memory hierarchy
• discuss cache memory principles
• discuss issues relevant to cache design
• describe the cache organisation of the Pentium 4

8.1 Computer memory systems

Stallings: study section 4.1. The characteristics of memory systems are discussed. These include memory capacity, the basic unit of transfer, the method of accessing and performance. The memory hierarchy as well as the principle of locality of reference are also explained.

8.2 Cache memory principles

Stallings: study section 4.2. The structure of cache memory is explained. Stallings also describes how a read from cache memory takes place to illustrate why this is much faster than a read from main memory.

8.3 Elements of cache design

Stallings: carefully read section 4.3 and study the last page. The concepts explained here are often referred to in the media in advertisements of laptops, for example.

8.4 Intel x86 and ARM cache organisations

Stallings: study the section on the cache organisation of the Pentium 4 in section 4.4.

Stallings: read the rest of section 4.4 and read section 4.5. Note the difference between the cache organisations of the Pentium 4 and the ARM processor.

8.5 Summary

Make sure you understand the meaning of the key terms listed in this section.

8.6 Key Questions 

Work through the review questions at the end of the chapter.

---oooOooo---

Study unit 9 Internal memory

This study unit supplements chapter 5 of Stallings. The principles of dynamic and static RAM are discussed as well as the different types of ROM. The principles of SDRAM organisation are also considered.

Learning outcomes

Once you have mastered the study material in this study unit, you will be able to

• describe the operation of a memory cell
• explain the difference between DRAM and SRAM
• discuss the different types of ROM
• explain the concepts of a hard failure and a soft error respectively
• describe SDRAM organisation


Stallings

Study sections 5.1, 5.2 and 5.3.

9.1 Semiconductor main memory

Stallings: study section 5.1. The operation of a memory cell is explained. Stallings also describes the operation of dynamic RAM (DRAM) and static RAM. Finally, the different types of ROM are discussed. Pay special attention to the discussion on flash memory.

9.2 Error correction 

Stallings: study section 5.2. Make sure that you understand the difference between a hard failure and a soft error. The principles on which error-correcting codes function are discussed and the Hamming error-correcting code is described.
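The idea behind the Hamming code can be sketched in a few lines. The code below is a minimal illustration, assuming even parity, check bits in the bit positions that are powers of two (1, 2, 4, 8, ...) and data bits filled into the remaining positions from position 3 upwards. The function names and this bit ordering are assumptions made for the example; the textbook may number the data bits differently.

```python
def hamming_encode(data_bits):
    """Return the codeword (list of 0/1, position 1 first) for data_bits."""
    m = len(data_bits)
    k = 0
    while 2 ** k < m + k + 1:      # smallest k with 2^k >= m + k + 1
        k += 1
    n = m + k
    code = [0] * (n + 1)           # index 0 is unused; positions run 1..n
    remaining = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):        # not a power of two: a data position
            code[pos] = next(remaining)
    for i in range(k):
        p = 1 << i                 # check bit position
        parity = 0
        for pos in range(1, n + 1):
            if pos != p and pos & p:
                parity ^= code[pos]
        code[p] = parity           # even parity over the covered positions
    return code[1:]


def hamming_syndrome(codeword):
    """XOR of the positions holding a 1; zero means no error detected."""
    syndrome = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            syndrome ^= pos
    return syndrome


def hamming_correct(codeword):
    """Correct a single-bit error; returns (corrected codeword, syndrome)."""
    fixed = list(codeword)
    syndrome = hamming_syndrome(fixed)
    if 0 < syndrome <= len(fixed):
        fixed[syndrome - 1] ^= 1   # the syndrome names the faulty position
    return fixed, syndrome


if __name__ == "__main__":
    word = hamming_encode([0, 0, 1, 1, 1, 0, 0, 1])   # 8 data bits, 4 check bits
    damaged = list(word)
    damaged[5] ^= 1                                   # soft error in position 6
    repaired, syndrome = hamming_correct(damaged)
    print(syndrome, repaired == word)
```

For a single-bit error the syndrome equals the position of the flipped bit, which is why the error can be corrected and not merely detected.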

9.3 Advanced DRAM organisation 

Stallings: study section 5.3. Make sure that you understand the difference between SDRAM and traditional DRAM.

9.4 Summary 

Make sure you understand the meaning of all the key terms in this section.

9.5 Key Questions 

Work through the review questions at the end of the chapter.

---oooOooo---


Study unit 10 External memory

This study unit supplements chapter 6 of Stallings. Magnetic disks are discussed as well as optical storage devices.

Learning outcomes

Once you have mastered the study material in this study unit, you will be able to

• discuss the physical characteristics of magnetic disks
• describe how data is organised and accessed on a magnetic disk
• discuss the parameters that play a role in the performance of magnetic disks
• describe different optical memory devices


Stallings

Study sections 6.1, 6.3 and 6.4. Read sections 6.2 and 6.5.

10.1 Magnetic Disk

Stallings: study section 6.1. The way data is stored on and retrieved from magnetic disks is discussed. The physical characteristics of a magnetic disk are described as well as the factors that play a role in the performance of a disk.
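The performance factors can be combined into a rough estimate of the average access time: seek time, plus the average rotational delay (half a revolution), plus the transfer time. The sketch below assumes that the data lies on a single track; the function name and the sample drive parameters are illustrative assumptions.

```python
def average_access_time_ms(seek_ms, rpm, bytes_to_transfer, bytes_per_track):
    """Estimate the average disk access time in milliseconds.

    seek_ms: average seek time
    rpm: rotational speed in revolutions per minute
    bytes_to_transfer: number of bytes to read or write
    bytes_per_track: track capacity (the data is assumed to fit on one track)
    """
    rotation_ms = 60_000.0 / rpm              # time for one revolution
    rotational_delay_ms = rotation_ms / 2.0   # on average, half a revolution
    transfer_ms = rotation_ms * bytes_to_transfer / bytes_per_track
    return seek_ms + rotational_delay_ms + transfer_ms


if __name__ == "__main__":
    # Illustrative drive: 4 ms seek, 15 000 rpm, 500 sectors of 512 bytes per track.
    t = average_access_time_ms(4.0, 15_000, 512, 512 * 500)
    print(f"{t:.3f} ms to read one sector")
```

Note that for a single sector the seek time and rotational delay dominate; the transfer time itself is very small.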

10.2 RAID 

Stallings: read section 6.2. The basic principles of a RAID scheme are discussed. This section also contains detailed discussions of the various levels of RAID.

10.3 Solid state drives 

Stallings: study section 6.3. The principles on which different solid state devices function are described.


10.4 Optical memory 

Stallings: study section 6.4. The principles on which different optical storage devices function are described.

10.5 Magnetic tape 

Stallings: read section 6.5. The principles on which magnetic tape storage devices function are described.

10.6 Summary 

Make sure that you understand the meaning of the key terms listed in this section.

10.7 Key Questions 

Work through the review questions at the end of the chapter.

---oooOooo---

Study unit 11 Input / Output

This study unit supplements chapter 7 of Stallings. Various aspects of Input/Output are discussed including the operation of some external devices, the operation of I/O modules and different I/O methods. Stallings also discusses external I/O interfaces like FireWire and InfiniBand.

Learning outcomes

Once you have mastered the study material in this study unit, you will be able to

• name different device categories
• discuss the functions and structure of I/O modules
• describe the principles of Programmed I/O
• describe the principles of Interrupt-driven I/O
• describe the principles of DMA
• discuss the evolution and characteristics of I/O channels
• describe different types of I/O interface
• explain the principles of point-to-point and multipoint configurations
• discuss the way in which an InfiniBand serial bus functions
• discuss the principles of Thunderbolt architecture


Stallings

Study sections 7.1 - 7.3, the first part of section 7.4, and sections 7.5 - 7.7. Read the last part of section 7.4.

11.1 External devices

Stallings: study section 7.1. The different categories in which external devices can be classified are discussed. We look in some detail at the operation of the keyboard and monitor and also disk drives. Note that ASCII is the US version of IRA referred to in this section.

11.2 I/O modules 

Stallings: study section 7.2. The most important functions of an I/O module, as well as its structure, are discussed.

11.3 Programmed I/O 

Stallings: study section 7.3. The way in which Programmed I/O operates is discussed. Stallings also gives a short description of the I/O commands as well as I/O instructions.

It is important to study this section together with the following two sections to clearly understand the advantages and disadvantages of each method of communication.
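The main disadvantage of Programmed I/O, namely that the processor must repeatedly check the device status, can be seen in a small simulation. This is only a sketch: the SimulatedDevice class, its methods and the number of polls are hypothetical and do not model any real I/O module.

```python
class SimulatedDevice:
    """Hypothetical device that becomes ready after a fixed number of status checks."""

    def __init__(self, data, polls_until_ready):
        self._data = data
        self._remaining = polls_until_ready

    def status_ready(self):
        # Each call models the processor reading the status register.
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def read_data(self):
        # Models the processor reading the data register.
        return self._data


def programmed_io_read(device):
    """Busy-wait until the device is ready, then read one value.

    Returns the value and the number of status checks that found the
    device not ready (processor time that did no useful work).
    """
    wasted_polls = 0
    while not device.status_ready():
        wasted_polls += 1
    return device.read_data(), wasted_polls


if __name__ == "__main__":
    value, wasted = programmed_io_read(SimulatedDevice(0x41, polls_until_ready=5))
    print(hex(value), wasted)
```

With interrupt-driven I/O the processor would issue the command, continue with other work and be interrupted when the device is ready, so the wasted status checks disappear.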

11.4 Interrupt‐driven I/O 

Stallings: study section 7.4 up to (but not including) the section on the Intel Interrupt Controller. Make sure that you understand how interrupts are used to implement this type of I/O communication and that you can discuss the relevant design issues involved when this method is used.

Read the rest of the section.

11.5 Direct Memory Access (DMA)

Stallings: study section 7.5. The principles of Direct Memory Access I/O are discussed. Note the comments regarding Programmed I/O and Interrupt-driven I/O.

11.6 I/O channels and processors

Stallings: study section 7.6. Stallings gives a summary of the evolution of the I/O function in computers. The characteristics of I/O channels are also considered.

11.7 The external interface: Thunderbolt and InfiniBand

Stallings: study section 7.7. Different types of interface are first considered and then Stallings discusses point-to-point and multipoint configurations. The principles of Thunderbolt technology are explained and we look at the configurations involved. You need not go into the protocol details. You can read on the Internet about Thunderbolt version 2 as well. Stallings also discusses InfiniBand architecture. You need not go into the protocol details (InfiniBand operation). You can ignore section 7.8.

11.8 Summary

Stallings: section 7.10. Make sure you understand the meaning of all the key terms in this section.

11.9 Key Questions

Work through the review questions at the end of the chapter.

---oooOooo---

Study unit 12 Reduced Instruction Set Computers

This study unit supplements chapter 15 of Stallings. RISC and CISC design principles, as well as the RISC vs CISC controversy, are discussed. Stallings also considers compiler-based register optimisation.

Learning outcomes

Once you have mastered the study material in this study unit, you will be able to

• explain the advantages of using a large number of registers
• discuss the way in which compilers optimise register usage
• discuss the evolution of CISC machines
• describe the characteristics of RISC architectures
• discuss the RISC vs CISC controversy
• describe the way in which RISC and CISC design principles can be combined

Stallings

Study sections 15.1 - 15.4 and section 15.8. Read sections 15.5 - 15.7.

12.1 Instruction execution characteristics 

Stallings: study the introduction to this chapter as well as section 15.1. The way in which an instruction is executed is once again revised to help us understand what CISC and RISC design principles are all about.

12.2 The use of a large register file 

Stallings: study section 15.2. One of the most important design principles of RISC machines is the use of a large number of registers. The concept of register windows and the use of a large register file versus the use of cache memory are discussed.

12.3 Compiler-based register optimisation

Stallings: study section 15.3. Compiler-based optimisation is discussed and the tradeoff between using this, rather than a large set of registers, is also considered.

12.4 Reduced Instruction Set Architecture (RISC)

Stallings: study section 15.4. The design principles of CISC machines versus RISC machines are discussed. Each approach has its own advantages and disadvantages and it was inevitable that these principles would be combined in some machines.

Stallings: read sections 15.5 - 15.7.

12.5 The RISC vs CISC controversy

Stallings: study section 15.8. The merits of both the RISC and the CISC approaches are discussed.

12.6 Summary

Make sure you understand the meaning of all the key terms in this section.

12.7 Key Questions

Work through the review questions at the end of the chapter.

---oooOooo---

Appendix A The DOS DEBUG environment

Contents

A.1 Introduction
A.2 How to start DEBUG
A.3 DEBUG commands
A.3.1 A (Assemble)
A.3.2 R (Register)
A.3.3 T (Trace)
A.3.4 G (Go)
A.3.5 P (Proceed)
A.3.6 D (Dump or Display)
A.3.7 U ("Unassemble")
A.3.8 E (Enter)
A.3.9 N (Name)
A.3.10 L (Load)
A.3.11 W (Write)
A.3.12 H (Hexadecimal)
I (Input)
Q (Quit)

Note



DEBUG does not allow variable names or labels. We have to work with actual memory locations.

A.1 Introduction

DEBUG, which forms part of DOS within the Windows environment, is very useful for writing and debugging small machine language and assembly language programs. We suggest that you execute all the commands on your machine as you work through this appendix. You will not learn much by just reading this section without using DEBUG.

We suggest that you create a directory (folder) called C2621 to store the programs that you write for this module.

A.2 How to start DEBUG 

Open a DOS-window, go to the C2621 directory by typing cd \c2621, and type debug at the DOS prompt. DEBUG responds with a hyphen (-) called the ‘DEBUG prompt’.

c:\c2621>debug

Press the Enter key after typing the command.

A.3 DEBUG commands

A.3.1 A (Assemble)

The "a" command translates assembly language source statements into machine code. One can use the "a" command to write and test small assembly language programs.


We will now create a short program consisting of only six instructions. We want to store this program from memory address 100h (h signifies a hexadecimal number).

At the DEBUG prompt, type the following:

a 100 and press the Enter key.

Key in the program as listed below and press the Enter key TWICE after you have finished entering the program. We will use this program in the following sections to explain some of the DEBUG commands. Do not get upset if you make a typing error as you can always change a specific instruction. We will explain how to do this shortly. DEBUG will warn you if you key in an illegal instruction and you will be allowed to key in the correct one.

Example: (Your entries are shown in italics and DEBUG's responses in bold - DO NOT TYPE the hyphen (-).) Type in only the instructions shown in italics.

(Ignore the first step (-a 100 ) if you have already entered it.)

 a 100

nnnn:0100 mov ax,0015
nnnn:0103 mov cx,0023
nnnn:0106 sub cx,ax
nnnn:0108 mov [120],al
nnnn:010B mov cl,[120]
nnnn:010F nop
nnnn:0110
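Before tracing the program in DEBUG, it may help to predict what it does. The sketch below mimics the six instructions in Python, assuming the usual 8086 behaviour (16-bit registers, AL and CL being the low bytes of AX and CX, and DEBUG treating all numbers as hexadecimal). The function and variable names are hypothetical; the flags and the segment part of the address are ignored.

```python
def run_sample_program():
    """Mimic the six-instruction sample program; return registers and memory."""
    regs = {"ax": 0x0000, "cx": 0x0000}
    memory = {}                                              # offset -> byte value

    regs["ax"] = 0x0015                                      # mov ax,0015
    regs["cx"] = 0x0023                                      # mov cx,0023
    regs["cx"] = (regs["cx"] - regs["ax"]) & 0xFFFF          # sub cx,ax  -> CX = 000E
    memory[0x0120] = regs["ax"] & 0x00FF                     # mov [120],al (low byte of AX)
    regs["cx"] = (regs["cx"] & 0xFF00) | memory[0x0120]      # mov cl,[120] (only CL changes)
    # nop: no operation, nothing changes
    return regs, memory


if __name__ == "__main__":
    regs, memory = run_sample_program()
    print(f"AX={regs['ax']:04X} CX={regs['cx']:04X} [0120]={memory[0x0120]:02X}")
```

You can compare these predicted values with the register contents that DEBUG displays when you step through the program.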

<

                                            0110 1110   n
0000 1111   SI        0011 1111   ?         0110 1111   o
0001 0000   DLE       0100 0000   @         0111 0000   p
0001 0001   DC1       0100 0001   A         0111 0001   q
0001 0010   DC2       0100 0010   B         0111 0010   r
0001 0011   DC3       0100 0011   C         0111 0011   s
0001 0100   DC4       0100 0100   D         0111 0100   t
0001 0101   NAK       0100 0101   E         0111 0101   u
0001 0110   SYN       0100 0110   F         0111 0110   v
0001 0111   ETB       0100 0111   G         0111 0111   w
0001 1000   CAN       0100 1000   H         0111 1000   x
0001 1001   EM        0100 1001   I         0111 1001   y
0001 1010   SUB       0100 1010   J         0111 1010   z
0001 1011   ESC       0100 1011   K         0111 1011   {
0001 1100   FS        0100 1100   L         0111 1100   |
0001 1101   GS        0100 1101   M         0111 1101   }
0001 1110   RS        0100 1110   N         0111 1110   ~
0001 1111   US        0100 1111   O         0111 1111   Delete
0010 0000   space     0101 0000   P
0010 0001   !         0101 0001   Q
0010 0010   "         0101 0010   R
0010 0011   #         0101 0011   S
0010 0100   $         0101 0100   T
0010 0101   %         0101 0101   U
0010 0110   &         0101 0110   V
0010 0111   '         0101 0111   W
0010 1000   (         0101 1000   X
0010 1001   )         0101 1001   Y
0010 1010   *         0101 1010   Z
0010 1011   +         0101 1011   [
0010 1100   ,         0101 1100   \
0010 1101   -         0101 1101   ]
0010 1110   .         0101 1110   ^
0010 1111   /         0101 1111   _
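The binary codes in the table can be checked or regenerated with a few lines of code. The sketch below is only an illustration; the function name is hypothetical and the grouping into two 4-bit halves simply follows the layout of the table.

```python
def ascii_binary(ch):
    """Return the 8-bit binary code of an ASCII character as 'xxxx xxxx'."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    bits = format(code, "08b")        # zero-padded 8-bit binary string
    return bits[:4] + " " + bits[4:]


if __name__ == "__main__":
    for ch in "AZaz ~":
        print(ascii_binary(ch), repr(ch))
```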

Bibliography

P. Abel. IBM PC Assembly Language and Programming, Fifth Edition. USA: Prentice-Hall, 2001.
S.D. Burd. Systems Architecture, Sixth Edition. Canada: Course Technology, Thompson Learning, 2010.
J.D. Carpinelli. Computer Systems: Organization & Architecture. USA: Addison Wesley Longman, 2000.
R.C. Detmer. Introduction to 80x86 assembly language and computer architecture. Jones & Bartlett Publishers, 2001.
J. Duntemann. Assembly Language Step-by-Step, Second Edition. USA: John Wiley & Sons, 2000.
J.L. Hennessy & D.A. Patterson. Computer Architecture: A Quantitative Approach, 3rd edition. Morgan Kaufmann Publishers, 2003.
V.P. Heuring & H.F. Jordan. Computer Systems Design and Architecture. USA: Addison-Wesley, 1997.
K.R. Irvine. Assembly Language for the IBM PC, Second Edition. USA: Macmillan Publishing Company, 1993.
N.S. Matlof. IBM Microcomputer Architecture and Assembly Language. A look under the hood. USA: Prentice-Hall International, 1992.
W. Stallings. Computer Organization and Architecture. Designing for Performance, 9th edition. USA: Prentice-Hall International, 2013.
A.S. Tanenbaum. Structured Computer Organization, Fourth Edition. USA: Prentice-Hall International, 1999.
M. Thorne. Computer Organization and Assembly Language Programming. For IBM PC's and Compatibles, 2nd Edition. The Benjamin/Cummings Publishing Company, 1991.
