Input / Output
The operating system should also provide an interface between the devices and the rest of the system that is simple and easy to use. Hence the abstractions for I/O devices such as hard disks discussed here.
5.1 PRINCIPLES OF I/O HARDWARE
5.1.1 I/O Devices
I/O devices can be roughly divided into two categories: block devices and character devices. The essential property of a block device is that it is possible to read or write each block independently of all the
other ones. Hard disks, CD-ROMs, and USB sticks are common block devices.
The other type of I/O device is the character device. A character device delivers or accepts a stream of characters, without regard to any block structure. It is not addressable and does not have any seek operation.
5.1.2 Device Controllers
I/O units typically consist of a mechanical component and an electronic component. It is often possible to separate the two portions to provide a more modular and general design. The electronic component is called
the device controller or adapter.
The controller's job is to convert the serial bit stream into a block of bytes and perform any error correction necessary.
5.1.3 Memory-Mapped I/O
Each controller has a few registers that are used for communicating with the CPU. By writing into these registers, the operating system can command the device to deliver data, accept data, switch itself on or off,
or otherwise perform some action. By reading from these registers, the operating system can learn what the device's state is, whether it is prepared to accept a new command, and so on.
The issue thus arises of how the CPU communicates with the control registers and the device data buffers. Two alternatives exist. In the first approach, each control register is assigned an I/O port number, an 8- or 16-bit integer. The set of all the I/O ports forms the I/O port space, which is protected so that ordinary user programs cannot access it (only the operating system can). Using a special I/O instruction such as
IN REG,PORT,
the CPU can read in control register PORT and store the result in CPU register REG. Similarly, using
OUT PORT,REG
the CPU can write the contents of REG to a control register. Most early computers, including nearly all mainframes, such as the IBM 360 and all of its successors, worked this way.
In this scheme, the address spaces for memory and I/O are different, as shown in Fig. 5-2(a). The instructions
IN R0,4
and
MOV R0,4
are completely different in this design.
In the second approach, each control register is assigned a unique memory address to which no memory is assigned. This system is called memory-mapped I/O.
Usually, the assigned addresses are at the top of the address space. A hybrid scheme, with memory-mapped I/O data buffers and separate I/O ports for the control registers is shown
in Fig. 5-2(c).
How do these schemes work?
In all cases, when the CPU wants to read a word, either from memory or from an I/O port, it puts the address it needs on the bus' address lines and then asserts a READ signal on a bus control line. A second signal line is used to tell whether I/O space or memory space is needed. If it is memory space, the memory responds to the request. If it is I/O space, the I/O device responds to the request. If there is only memory space [as in Fig. 5-2(b)],
every memory module and every I/O device compares the address lines to the range of addresses that it services. If the address falls in its range, it responds to the request. Since no address is ever assigned to both memory and an I/O device, there is no ambiguity
and no conflict. The two schemes for addressing the controllers have different strengths and
weaknesses.
First, if special I/O instructions such as IN and OUT are needed, accessing the control registers requires assembly code, since there is no way to execute an IN or OUT instruction in C. Thus with memory-mapped I/O, an I/O device driver can be written entirely in C. Without memory-mapped I/O, some assembly code is needed.
Second, with memory-mapped I/O, no special protection mechanism is needed to keep user processes from performing I/O.
Third, with memory-mapped I/O, every instruction that can reference memory can also reference control registers.
In computer design, practically everything involves tradeoffs, and that is the case here too. Memory-mapped I/O also has its disadvantages. First, most computers nowadays have
some form of caching of memory words. Caching a device control register would be disastrous.
Subsequent references would just take the value from the cache and not even ask the device. To prevent this situation with memory-mapped I/O, the hardware has to be equipped with the ability to selectively disable caching.
Second, if there is only one address space, then all memory modules and all I/O devices must examine all memory references to see which ones to respond to.
However, the trend in modern personal computers is to have a dedicated high-speed memory bus, as shown in Fig. 5-3(b), a property also found in mainframes, incidentally. This
bus is tailored to optimize memory performance, with no compromises for the sake of slow I/O devices. Pentium systems can have multiple buses (memory, PCI, SCSI, USB, ISA), as shown in Fig. 1-12.
5.1.4 Direct Memory Access (DMA)
No matter whether a CPU does or does not have memory-mapped I/O, it needs to address the device controllers to exchange data with them. The CPU can request data from an I/O controller one byte at a time, but doing so wastes the CPU's time, so a different scheme, called DMA (Direct Memory Access), is often used.
More commonly, a single DMA controller is available (e.g., on the parentboard) for regulating transfers to multiple devices, often concurrently.
To explain how DMA works, let us first look at how disk reads occur when DMA is not used.
First the disk controller reads the block (one or more sectors) from the drive serially, bit by bit, until the entire block is in the controller's internal buffer. Next, it computes the checksum to verify that no read errors have occurred.
Then the controller causes an interrupt. When the operating system starts running, it can read the disk block from the controller's buffer a byte or a word at a time by executing a loop, with each iteration reading one
byte or word from a controller device register and storing it in main memory.
When DMA is used, the procedure is different.
First the CPU programs the DMA controller by setting its registers so it knows what to transfer where (step 1 in Fig. 5-4). It also issues a command to the disk controller telling it to read data from the disk into its
internal buffer and verify the checksum. When valid data are in the disk controller's buffer, DMA can begin. The DMA controller initiates the transfer by issuing a read request over the bus to the disk controller (step 2). This read request looks like any other read request, and the disk controller does not know or care whether it came from the CPU or from a DMA controller. Typically, the memory address to write to is on the bus' address lines, so when the disk controller fetches the next word from its internal buffer, it knows where to write it. The write to memory is another standard bus cycle (step 3). When the write is complete, the disk controller sends an acknowledgement signal to the DMA controller, also over the bus (step 4). The DMA controller then increments the memory address to use and decrements the byte count. If the byte count is still greater than 0, steps 2 through 4 are repeated until the count reaches 0. At that time, the DMA controller interrupts the CPU to let it know that the transfer is now complete. When the operating system starts up, it does not have to copy the disk block to memory; it is already there.
After each word is transferred (steps 2 through 4 in Fig. 5-4), the DMA controller decides which device to service next.
Two DMA transfer modes
Many buses can operate in two modes:
word-at-a-time mode and block mode. Some DMA controllers can also operate in either mode.
In the former mode, the operation is as described above: the DMA controller requests the transfer of one word and gets it. If the CPU also wants the bus, it has to wait. This mechanism is called cycle stealing because the device controller sneaks in and steals an occasional bus cycle from the CPU, delaying it slightly. In block mode, the DMA controller tells the device to acquire the bus, issue a series of transfers, then release the bus. This form of operation is called burst mode. It is more efficient than cycle stealing because acquiring the bus takes time and multiple words can be transferred for the price of one bus acquisition. The downside to burst mode is that it can block the CPU and other devices for a substantial period of time if a long burst is being transferred.
Most DMA controllers use physical memory addresses for their transfers. Using physical addresses requires the operating system to convert the virtual address of the intended memory buffer into a physical address and write this physical address into the DMA controller's address register.
You may be wondering why the controller does not just store the bytes in main memory as soon as it gets them from the disk. In other words, why does it need an internal buffer? There are two reasons.
First, by doing internal buffering, the disk controller can verify the checksum before starting a transfer. If the checksum is incorrect, an error is signaled and no transfer is done.
The second reason is that once a disk transfer has started, the bits keep arriving from the disk at a constant rate, whether the controller is ready for them or not. If the controller tried to write data directly
to memory, it would have to go over the system bus for each word transferred. If the bus were busy due to some other device using it (e.g., in burst mode), the controller would have to wait.
When the block is buffered internally, the bus is not needed until the DMA begins, so the design of the controller is much simpler because the DMA transfer to memory is not time critical.
The debate over whether to use DMA at all
Not all computers use DMA. The argument against it is that the main CPU is often far faster than the DMA controller and can do the job much faster (when the limiting factor is not the speed of the I/O device). If
there is no other work for it to do, having the (fast) CPU wait for the (slow) DMA controller to finish is pointless. Also, getting rid of the DMA controller and having the CPU do all the work in software saves money, important on low-end (embedded) computers.
In other words, for a lightly loaded CPU (as in some low-end embedded systems), letting the CPU wait for the I/O to complete is harmless, since it has nothing else to do anyway (no other tasks are waiting). On more capable embedded systems or general-purpose computers, the CPU is likely to be heavily loaded, switching between different tasks and running them concurrently. To make full use of the CPU in that case, DMA is used.
5.1.5 Interrupts Revisited
When an I/O device has finished the work given to it, it causes an interrupt.
If another one is in progress, or another device has made a simultaneous request on a higher-priority interrupt request line on the bus, the device is just ignored for the moment. In this case it continues to
assert an interrupt signal on the bus until it is serviced by the CPU.
To handle the interrupt, the controller puts a number on the address lines specifying which device wants attention and asserts a signal to interrupt the CPU.
Typically traps and interrupts use the same mechanism from this point on, and frequently share the same interrupt vector.
However, switching into kernel mode may require changing MMU contexts and will probably invalidate most or all of the cache and TLB. Reloading all of these, statically or dynamically, will increase the time to
process an interrupt and thus waste CPU time.
Precise and Imprecise Interrupts
An interrupt that leaves the machine in a well-defined state is called a precise interrupt (Walker and Cragon, 1995). Such an interrupt has four properties:
1. The PC (Program Counter) is saved in a known place.
2. All instructions before the one pointed to by the PC have fully executed.
3. No instruction beyond the one pointed to by the PC has been executed.
4. The execution state of the instruction pointed to by the PC is known.
An interrupt that does not meet these requirements is called an imprecise interrupt and makes life most unpleasant for the operating system writer, who now has to figure out what has happened and what still has to happen. Fig. 5-6(b) shows an imprecise interrupt, where different instructions near the program counter are in different stages of completion, with older ones not necessarily more complete than younger ones.
On the other hand, imprecise interrupts make the operating system far more complicated and slower, so it is hard to tell which approach is really better.
5.2 PRINCIPLES OF I/O SOFTWARE
5.2.1 Goals of the I/O Software
A key concept in the design of I/O software is known as device independence. What it means is that it should be possible to write programs that can access any I/O device without
having to specify the device in advance.
Closely related to device independence is the goal of uniform naming.
Another important issue for I/O software is error handling.
Still another key issue is that of synchronous (blocking) versus asynchronous (interrupt-driven) transfers. Most physical I/O is asynchronous: the CPU starts the transfer and goes off to do something else until the
interrupt arrives.
Buffering involves considerable copying and often has a major impact on I/O performance.
5.2.2 Programmed I/O
The simplest form of I/O is to have the CPU do all the work. This method is called programmed I/O.
The operating system then (usually) copies the buffer with the string to an array, say, p, in kernel space, where it is more easily accessed (because the kernel may have to change the memory map to get at user space).
In Fig. 5-7(b), however, we see that the first character has been printed and that the system has marked the "B" as the next character to be printed.
Generally, the printer has a second register, which gives its status. The act of writing to the data register causes the status to become not ready.
At this point the operating system waits for the printer to become ready again. When that happens, it prints the next character, as shown in Fig. 5-7(c). This loop continues until the entire string has been printed. Then
control returns to the user process.
First the data are copied to the kernel.
This behavior is often called polling or busy waiting.
copy_from_user(buffer, p, count);          /* p is the kernel buffer */
for (i = 0; i < count; i++) {              /* loop on every character */
    while (*printer_status_reg != READY)   /* loop until ready */
        ;
    *printer_data_register = p[i];         /* output one character */
}
return_to_user();
Writing a string to the printer using programmed I/O.
Programmed I/O is simple but has the disadvantage of tying up the CPU full time until all the I/O is done. In an embedded system, where the CPU has nothing else to do, busy waiting is reasonable. However, in more complex systems, where the CPU has other work to do, busy waiting is inefficient. A better I/O method is needed.
5.2.3 Interrupt-Driven I/O
The way to allow the CPU to do something else while waiting for the printer to become ready is to use interrupts.
5.2.4 I/O Using DMA
An obvious disadvantage of interrupt-driven I/O is that an interrupt occurs on every character. Interrupts take time, so this scheme wastes a certain amount of CPU time. A solution is to use DMA.
The big win with DMA is reducing the number of interrupts from one per character to one per buffer printed. If there are many characters and interrupts are slow, this can be a major improvement.
5.4 DISKS
Magnetic Disks
Older disks have little electronics and just deliver a simple serial bit stream. On these disks, the controller does most of the work. On other disks, in particular IDE (Integrated Drive Electronics) and SATA (Serial ATA) disks, the disk drive itself contains a microcontroller that does considerable work and allows the real controller to issue a set of higher-level commands. The controller often does track caching, bad block remapping, and much more.
5.4.2 Disk Formatting
A hard disk consists of a stack of aluminum, alloy, or glass platters 5.25 inch or 3.5 inch in diameter (or even smaller on notebook computers). On each platter is deposited a thin magnetizable metal oxide. After manufacturing,
there is no information whatsoever on the disk.
Before the disk can be used, each platter must receive a low-level format done by software. The format consists of a series of concentric tracks, each containing some number of sectors, with short gaps between the sectors. The format of a sector is shown in Fig. 5-25.
The preamble starts with a certain bit pattern that allows the hardware to recognize the start of the sector.
A 16-byte ECC field is not unusual.
The position of sector 0 on each track is offset from the previous track when the low-level format is laid down. This offset, called cylinder skew, is done to improve performance.
As a result of the low-level formatting, disk capacity is reduced, depending on the sizes of the preamble, intersector gap, and ECC, as well as the number of spare sectors reserved.
There is considerable confusion about disk capacity because some manufacturers advertised the unformatted capacity to make their drives look larger than they really are.
On the Pentium and most other computers, sector 0 contains the master boot record.
It also puts a code in the partition table entry telling which file system is used in the partition because many operating systems support multiple incompatible file systems (for historical reasons). At this
point the system can be booted.
5.4.3 Disk Arm Scheduling Algorithms
First, consider how long it takes to read or write a disk block. The time required is determined by three factors:
1. Seek time (the time to move the arm to the proper cylinder).
2. Rotational delay (the time for the proper sector to rotate under the head).
3. Actual data transfer time.
For most disks, the seek time dominates the other two times, so reducing the mean seek time can improve system performance substantially.
However, most elevators use a different algorithm in order to reconcile the mutually conflicting goals of efficiency and fairness. They keep moving in the same direction until there are no more outstanding requests in that direction, then they switch directions. This algorithm is known both in the disk world and the elevator world as the elevator algorithm.
When the highest numbered cylinder with a pending request has been serviced, the arm goes to the lowest-numbered cylinder with a pending request and then continues moving in an upward direction.
If the disk has the property that seek time is much faster than the rotational delay, then a different optimization should be used. Pending requests should be sorted by sector number, and as soon as the next sector is about to pass under the head,
the arm should be zipped over to the right track to read or write it.
With a modern hard disk, the seek and rotational delays so dominate performance that reading one or two sectors at a time is very inefficient. For this reason, many disk controllers always read and cache multiple sectors, even when only one is requested.
The use of the cache is determined dynamically by the controller. In its simplest mode, the cache is divided into two sections, one for reads and one for writes.
It is worth noting that the disk controller's cache is completely independent of the operating system's cache.
The rest of this chapter feels rather long-winded. I will update these notes later if needed; I can't keep putting this off.