BACKGROUND OF THE INVENTION
A conventional virtual-machine monitor (VMM) typically runs on a computer and presents to other software the abstraction of one or more virtual machines. Each virtual machine may function as a self-contained platform, running its own "guest operating system" (i.e., an operating system hosted by the VMM). The guest operating system expects to operate as if it were running on a dedicated computer rather than a virtual machine. That is, the guest operating system expects to control various computer operations and have access to hardware resources during these operations. The hardware resources may include processor-resident resources (e.g., control registers) and resources that reside in memory (e.g., descriptor tables).
In a virtual-machine environment, the VMM should be able to have ultimate control over these resources to provide proper operation of virtual machines and for protection from and between virtual machines. To achieve this, the VMM typically intercepts and arbitrates all accesses made by guest software to the hardware resources. Specifically, when guest software requests an operation that requires access to a protected hardware resource, the control over this operation is transferred to the VMM which then assures the validity of the access, emulates the functionality desired by guest software and transfers control back to the guest software, thereby protecting the hardware resources and virtualizing accesses of guest software to hardware resources. Because the number of hardware resource elements that need to be protected from accesses by guest software is large and such accesses may be frequent, there is a significant performance cost associated with this protection and virtualization.
One example of a hardware resource that is frequently accessed by guest software is a control register. For instance, in the instruction-set architecture (ISA) of the Intel Pentium IV (referred to herein as the IA-32 ISA), there are a number of control registers that are used to configure the processor operating mode, control the memory subsystem configuration and hardware resources, etc. Typically, when guest software attempts to access a bit in a control register, the control is transferred to the VMM which is responsible for maintaining consistency between write and read operations initiated by the guest software with respect to this bit. That is, the VMM controls the value that guest software is allowed to write to each bit of the control register and the value that guest software reads from each bit. Such virtualization of control register accesses creates significant performance overheads.
DESCRIPTION OF EMBODIMENTS
A method and apparatus for controlling accesses of guest software to registers in a virtual-machine architecture are described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention can be practiced without these specific details.
Some portions of the detailed descriptions that follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer system's registers or memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or the like, may refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer-system memories or registers or other such information storage, transmission or display devices.
In the following detailed description of the embodiments, reference is made to the accompanying drawings that show, by way of illustration, specific embodiments in which the invention may be practiced. In the drawings, like numerals describe substantially similar components throughout the several views. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the present invention. Moreover, it is to be understood that the various embodiments of the invention, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described in one embodiment may be included within other embodiments. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims, along with the full scope of equivalents to which such claims are entitled.
FIG. 1 illustrates one embodiment of a virtual-machine environment 100, in which the present invention may operate. In this embodiment, bare platform hardware 116 comprises a computing platform, which may be capable, for example, of executing a standard operating system (OS) or a virtual-machine monitor (VMM), such as a VMM 112. The VMM 112, though typically implemented in software, may emulate and export a bare machine interface to higher level software. Such higher level software may comprise a standard or real-time OS, may be a highly stripped down operating environment with limited operating system functionality, may not include traditional OS facilities, etc. Alternatively, for example, the VMM 112 may be run within, or on top of, another VMM. VMMs and their typical features and functionality are well-known by those skilled in the art and may be implemented, for example, in software, firmware or by a combination of various techniques.
The platform hardware 116 includes a processor 118 and memory 120. Processor 118 can be any type of processor capable of executing software, such as a microprocessor, digital signal processor, microcontroller, or the like. The processor 118 may include microcode or hardcoded logic for performing the execution of method embodiments of the present invention.
The platform hardware 116 can be of a personal computer (PC), mainframe, handheld device, portable computer, set-top box, or any other computing system.
Memory 120 can be a hard disk, a floppy disk, random access memory (RAM), read only memory (ROM), flash memory, or any other type of machine medium readable by processor 118. Memory 120 may store instructions for performing the execution of method embodiments of the present invention.
The VMM 112 presents to other software (i.e., "guest" software) the abstraction of one or more virtual machines (VMs), which may provide the same or different abstractions to the various guests. FIG. 1 shows two VMs, 102 and 114. The guest software running on each VM may include a guest OS such as a guest OS 104 or 106 and various guest software applications 108–110. Each of the guest OSs 104 and 106 expects to control access to physical resources (e.g., processor registers, memory and memory-mapped I/O devices) within the VMs 102 and 114 on which the guest OS 104 or 106 is running and to perform other functions.
The VMM 112 facilitates functionality desired by guest software while retaining ultimate control over privileged hardware resources within the platform hardware 116. Specifically, once guest software attempts to access a privileged resource, the control over the processor is transferred to the VMM 112, which then decides whether to perform a requested operation (e.g., emulate it for the guest software, proxy the operation directly to the platform hardware 116, etc.) or deny access to the resource to facilitate security, reliability or other mechanisms. The act of facilitating the functionality for the guest software may include a wide variety of activities on the part of the VMM 112. The activities of the VMM 112 as well as its characteristics should not limit the scope of the present invention.
In one embodiment, the transfer of control from guest software to VMM is dictated by control bit settings in a virtual machine control structure (VMCS) 122. Settings in the VMCS 122 may prevent guest software from performing operations that may result in its access of certain privileged hardware resources. Different guest software may execute with different control bit settings in different VMCS memory images, though only one such VMCS is shown in FIG. 1. The VMCS 122 resides in memory 120 and is maintained by the processor 118. It should be noted that any other data structure (e.g., an on-chip cache, a file, a lookup table, etc.) may be used to store the VMCS 122 or the fields associated with each designated hardware resource without loss of generality.
When guest software attempts to perform an operation which accesses protected resources, control is transferred to the VMM 112. The VMM 112 has access to all platform hardware 116. When such a transition occurs, the VMM 112 receives control over the operation initiated by guest software. The VMM 112 then may perform this operation or deny access as described above, and may transfer control back to guest software by executing a special instruction. The control of guest software through this mechanism is referred to herein as VMX operation and the transfer of control from the guest software to VMM is referred to herein as a VM exit.
In one embodiment, the execution of certain instructions, certain exceptions and interrupts and certain platform events may cause a VM exit. These potential causes of VM exits are referred to herein as virtualization events. For example, a VM exit may be generated when guest software attempts to perform an operation (e.g., an instruction) that may result in its access of certain privileged hardware resources (e.g., a control register or an IO port).
In an embodiment, when a VM exit occurs, components of the processor state used by guest software are saved, and components of the processor state required by the VMM 112 are loaded. This saving and loading of processor state may, depending on the processor ISA, have the effect of changing the active address space (e.g., in the IA-32 ISA, the active address space is determined by the values in the control registers, which may be saved and restored on VM exit). In one embodiment, the components of the processor state used by guest software are stored in a guest-state area of VMCS 122 and the components of the processor state required by the VMM 112 are stored in a monitor-state area of VMCS 122.
In one embodiment, when a transition from the VMM to guest software occurs, the processor state that was saved at the VM exit is restored and control is returned to the guest OS 104 or 106 or guest applications 108 or 110.
In an embodiment, when a VM exit occurs, control is passed to the VMM 112 at a specific entry point (e.g., an instruction pointer value) delineated in the VMCS 122. In another embodiment, control is passed to the VMM 112 after vectoring through a redirection structure (e.g., the interrupt-descriptor table in the IA-32 ISA). Alternatively, any other mechanism known in the art can be used to transfer control from the guest software to the VMM 112.
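The saving and loading of processor state on a VM exit, and its restoration on the return to guest software, can be pictured with a small sketch. It assumes, purely for illustration, that processor state is a dictionary of named values and that the VMCS has a guest-state area and a monitor-state area. The names VMCS, vm_exit and vm_entry, the register keys and all numeric values are hypothetical and do not describe any real processor interface.

```python
from dataclasses import dataclass, field


@dataclass
class VMCS:
    """Hypothetical model of a VMCS with its two state areas."""
    guest_state: dict = field(default_factory=dict)
    monitor_state: dict = field(default_factory=dict)


def vm_exit(cpu: dict, vmcs: VMCS) -> None:
    """Save the guest's processor state, then load the VMM's state."""
    vmcs.guest_state = dict(cpu)
    cpu.clear()
    cpu.update(vmcs.monitor_state)


def vm_entry(cpu: dict, vmcs: VMCS) -> None:
    """Restore the processor state that was saved at the VM exit."""
    cpu.clear()
    cpu.update(vmcs.guest_state)


# Illustrative values only; "ip" in the monitor state plays the role of
# the VMM entry point held in the VMCS.
cpu = {"ip": 0x0040_1000, "cr3": 0x0000_1000}
vmcs = VMCS(monitor_state={"ip": 0x00F0_0000, "cr3": 0x0009_0000})

vm_exit(cpu, vmcs)    # cpu now holds the monitor state
vm_entry(cpu, vmcs)   # cpu holds the guest state again
```

Because the control register that selects the address space is part of the swapped state in this sketch, the exit also switches address spaces, which mirrors the IA-32 remark above.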
Because the number of hardware resource elements that need to be protected from accesses by guest software is large and such accesses may be frequent, there is a significant performance cost associated with this protection and virtualization. In addition, an operation initiated by guest software may involve access to a privileged resource, which may pose no problem to the security and proper operation of the VMs 102 and 114. For example, in the IA-32 ISA, control register 0 (CR0) includes a task-switch (TS) bit that is used to optimize context switching by avoiding saving and restoring floating-point state until the state is accessed. The update of the TS bit by the guest OS through the Clear Task-Switched Flag (CLTS) instruction is unlikely to pose a problem to system security and proper operation of the VMs 102 and 114. In contrast, the paging enable (PG) bit of CR0 configures the processor operating mode and as such must be controlled exclusively by the VMM 112. In some cases, the VMM 112 may not allow the guest software to disable paging and therefore must control attempts of the guest software to do so.
In one embodiment, a filtering mechanism is provided for reducing the number of VM exits caused by accesses of guest software to such hardware resources as registers (e.g., control registers, general purpose registers, model-specific registers, etc.) or memory-based resources (e.g., paging control fields in memory, etc.). It should be noted that while an exemplary embodiment of the present invention is described below with reference to a register, the teachings of the present invention may be applied to any other hardware resource without loss of generality.
The filtering mechanism functions using one or more fields associated with each designated hardware resource as will be described in greater detail below. In one embodiment, the fields associated with each designated hardware resource are contained in a VMCS 122.
FIG. 2 is a flow diagram of one embodiment of a process 200 for filtering accesses of guest software to a hardware resource such as a register. The process may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, microcode, etc.), software (such as run on a general purpose computer system or a dedicated machine), or a combination of both.
Referring to FIG. 2, process 200 begins with processing logic receiving a command pertaining to one or more portions of a register from guest software (processing block 202). A register portion may be a particular single bit of the register or multiple (contiguous or non-contiguous) bits of the register. The command pertaining to the register portions may be a read command requesting to read data from the register portions or a write command requesting to write data to the register portions. The register may represent a control register (e.g., CR0 or CR4 in the IA-32 ISA), an integer register, or any other register or memory-based resource.
Next, processing logic reads corresponding indicators from a mask field (processing block 204). The mask field includes a set of indicators corresponding to portions of the register. For example, if the register is a 32-bit control register (e.g., CR0 or CR4 in the IA-32 ISA), the mask field may include 32 indicators, with each indicator corresponding to a particular bit of the control register. Alternatively, the mask field may have fewer indicators than the number of bits in the register because the register may have unused bits, some indicators in the mask field may correspond to two or more bits, or for any other reason. Each indicator in the mask field provides information on whether a portion is under guest control (i.e., the guest software is permitted to access the corresponding portion of the register) or under control of the VMM. In an embodiment of the invention, bits in the register that do not have a corresponding mask bit are assumed to be under guest control. In another embodiment of the invention, they are assumed to be under VMM control.
At decision box 206, processing logic determines whether guest software is permitted to access all of the requested register portions based on the corresponding indicators from the mask field. If the determination is positive, processing logic executes the command on the requested register portions (processing block 208). That is, processing logic reads data from, or writes data to, the requested register portions.
Otherwise, if the determination made at decision box 206 is negative, then in one embodiment, processing logic transfers control to the VMM (processing block 210).
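The mask-only filtering of process 200 can be sketched as follows. This is a minimal illustration, not the patented logic itself: it assumes a 32-bit register held in an integer, one mask bit per register bit, and the convention that a set mask bit means "under VMM control". The function name access and the VMExit exception are hypothetical. The TS and PG bit positions are those of CR0 in the IA-32 ISA.

```python
CR0_TS = 1 << 3    # task-switched flag of CR0 in the IA-32 ISA
CR0_PG = 1 << 31   # paging-enable flag of CR0 in the IA-32 ISA


class VMExit(Exception):
    """Hypothetical stand-in for a transfer of control to the VMM."""


def access(reg: int, mask: int, bits: int, src=None):
    """Filter one guest access; return (register value, data read or None).

    `bits` selects the requested register portions. A set bit in `mask`
    is assumed to mean "under VMM control". `src` is None for a read.
    """
    if mask & bits:                              # decision box 206
        raise VMExit("portion under VMM control")  # processing block 210
    if src is None:                              # read (processing block 208)
        return reg, reg & bits
    return (reg & ~bits) | (src & bits), None    # write (processing block 208)


mask = CR0_PG                  # the VMM keeps PG; the guest owns the rest
cr0 = CR0_PG | CR0_TS
cr0, _ = access(cr0, mask, CR0_TS, src=0)   # guest clears TS, no VM exit
```

With this mask, a guest attempt to touch PG raises VMExit, while updates to TS complete without involving the VMM.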
In an alternative embodiment, an extra field is used to further reduce the number of situations in which control is transferred to the VMM. The extra field is referred to herein as a shadow value field. Each portion of the shadow value field corresponds to a particular portion of the register and stores the value that guest software expects to see in this portion of the register. In an embodiment of the invention, the value of the shadow value field is maintained by the VMM and is stored in the VMCS. In one embodiment, only the register portions with the indicators in the mask field that indicate the inability of guest software to access these register portions have corresponding portions in the shadow value field. For example, in the IA-32 ISA, if guest software is not permitted to access bits 1 through 10 in CR0 as reflected by values of indicators in the mask field, the size of the shadow value field will be limited to 10 bits that correspond to bits 1 through 10 of CR0. In another embodiment, each register portion with an indicator (regardless of its value) in the mask field has a corresponding portion in the shadow value field.
FIG. 4 is a flow diagram of one embodiment of a process 400 for providing an additional filtering of guest software accesses to a hardware resource such as a register. In one embodiment, process 400 replaces block 210 in FIG. 2. The process may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, microcode, etc.), software (such as run on a general purpose computer system or a dedicated machine), or a combination of both.
Referring to FIG. 4, process 400 begins at processing block 402 with processing logic determining that one or more register portions being accessed are under control of the VMM based on corresponding one or more indicators in a mask field, as discussed above in conjunction with FIG. 2.
Next, at decision box 403, processing logic determines whether the access is a command to write data to the requested register portions. If the determination is positive, i.e., the access is a write command, processing logic determines whether the data that the guest wishes to write to each of the portions is equal to data stored in corresponding portions of the shadow value field for all portions under VMM control (decision box 404). If this determination is positive for all requested portions that are under VMM control, the guest is allowed to write data to all portions of the actual register resource that are under guest control (as determined by the corresponding bits in the mask field) (processing block 405) and then process 400 ends. If the determination is negative for any requested portions that are under VMM control, processing logic transfers control to the VMM (processing block 406). The VMM then updates the corresponding portion of the shadow value field and actual register resource as necessary according to its implementation requirements and transfers control back to guest software.
Alternatively, if the command initiated by guest software is a command to read data from the requested register portions, control is not transferred to the VMM. Specifically, processing logic accesses the corresponding portions of the shadow value field for all requested portions that are under VMM control (processing block 412) and returns data stored in these portions of the shadow value field combined with values from the actual register resource for portions of the resource that are under guest control to guest software (processing block 414).
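The additional filtering of process 400 might be sketched as shown below. The sketch assumes one mask bit and one shadow bit per register bit, with a set mask bit meaning "under VMM control". The names shadow_filtered_access and VMExit are hypothetical, and what the VMM does after the exit is not modeled.

```python
class VMExit(Exception):
    """Hypothetical stand-in for a transfer of control to the VMM."""


def shadow_filtered_access(reg, mask, shadow, bits, src=None):
    """Return (register value, data read or None) for one guest access.

    `bits` selects the requested portions; `src` is None for a read.
    """
    vmm_bits = bits & mask       # requested portions under VMM control
    guest_bits = bits & ~mask    # requested portions under guest control

    if src is None:
        # Reads never exit: shadow bits for VMM-controlled portions are
        # combined with real register bits for guest-controlled portions
        # (processing blocks 412 and 414).
        return reg, (shadow & vmm_bits) | (reg & guest_bits)

    # Decision box 404: the write must agree with the shadow value on
    # every VMM-controlled portion, otherwise control goes to the VMM.
    if (src & vmm_bits) != (shadow & vmm_bits):
        raise VMExit("write differs from shadow value")  # block 406

    # Processing block 405: only guest-controlled portions are written.
    return (reg & ~guest_bits) | (src & guest_bits), None


# Bits 3..2 are VMM-controlled; the guest expects to see 0b10 there.
reg, mask, shadow = 0b0101, 0b1100, 0b1000
_, seen = shadow_filtered_access(reg, mask, shadow, 0b1111)          # 0b1001
reg, _ = shadow_filtered_access(reg, mask, shadow, 0b1111, 0b1010)   # 0b0110
```

In the example the guest reads 0b1001 even though the real register holds 0b0101, and a write that repeats the expected value in the VMM-controlled bits completes without a VM exit.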
One embodiment in which the transfer of control to the VMM is supported via VMX operation discussed in greater detail above with reference to FIG. 1 will now be described in more detail.
In one embodiment, the VMM maintains a set of control bits to configure which virtualization events will cause a VM exit. This set of control bits is referred to herein as a redirection map. In one embodiment, the redirection map is contained in the VMCS 122 of FIG. 1. Once an occurrence of a virtualization event is detected, the redirection map is consulted to find an unconditional exit bit associated with this virtualization event. The bit indicates whether this virtualization event will unconditionally result in a VM exit. For example, the redirection map may include two bits for each control register, with one bit controlling VM exits on guest requests to read data from the control register and the other bit controlling VM exits on guest requests to write data to the control register.
In addition, in one embodiment, for each designated resource (e.g., CR0 or CR4 in the IA-32 ISA), the redirection map includes a bit indicating whether a mask field will be used for this resource and a bit indicating whether a shadow value field will be used for this resource.
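One way to picture the redirection map is as a small record per register holding the four bits just described. The layout below is an assumption made for illustration: the class RedirectionEntry, its field names, the dictionary keyed by register name and the particular settings are all hypothetical and do not reflect an actual VMCS encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedirectionEntry:
    """Hypothetical per-register control bits of a redirection map."""
    exit_on_read: bool = False    # unconditional VM exit on guest reads
    exit_on_write: bool = False   # unconditional VM exit on guest writes
    use_mask: bool = False        # a mask field is used for this register
    use_shadow: bool = False      # a shadow value field is used as well


# Example configuration, chosen only to show the different combinations.
redirection_map = {
    "CR0": RedirectionEntry(use_mask=True, use_shadow=True),
    "CR2": RedirectionEntry(exit_on_write=True),
    "CR4": RedirectionEntry(use_mask=True),
}


def unconditional_exit(register: str, is_write: bool) -> bool:
    """Look up the unconditional exit bit for a read or write event."""
    entry = redirection_map[register]
    return entry.exit_on_write if is_write else entry.exit_on_read
```

Keeping the read and write bits separate lets a VMM, for instance, trap every write to a register while letting reads proceed.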
FIG. 5 is a flow diagram of one embodiment of a process 500 for controlling access to a hardware resource such as a register during VMX operation using a redirection map. The process may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, microcode, etc.), software (such as run on a general purpose computer system or a dedicated machine), or a combination of both.
Referring to FIG. 5, process 500 begins at processing block 502 with processing logic identifying an occurrence of a virtualization event caused by a request of guest software to access a portion of a hardware resource such as a control register. This request is either a command to read data from one or more portions of a particular register or a command to write data to one or more portions of a particular register.
At processing block 503, processing logic consults the redirection map to determine if the unconditional exit bit associated with this virtualization event is set (decision box 506). If this bit is set, processing logic triggers a VM exit (processing block 522). For example, in the IA-32 ISA the redirection map may include bits to unconditionally cause VM exits on writes to CR2, reads from CR0, writes to CR4, etc.
Alternatively, if the unconditional exit bit is not set, processing logic further determines whether a mask field is to be used for the register (decision box 508). This determination is made using a designated bit in the redirection map. For example, in the IA-32 ISA there may be bits in the redirection map indicating if a mask is used for CR0, if a mask is used for CR4, etc. If the mask field is not to be used for this register, processing logic executes the requested read or write command on the requested register portions (processing block 514). Otherwise, processing logic reads mask field bits corresponding to the requested register portions (processing block 510). These bits are referred to as the requested mask field bits. The requested mask field bits are examined to determine if one or more of them are set (indicating that one or more of the corresponding register portions are under VMM control) (decision box 512).
If none of the requested mask field bits are set, i.e., guest software is allowed to access all of the requested register portions, processing logic executes the requested read or write command on the register portions (processing block 514). Otherwise, if any of the requested mask field bits are set, processing logic determines whether a shadow value field will be used for the register based on a designated bit in the redirection map (decision box 516). For example, in the IA-32 ISA there may be bits in the redirection map to indicate if a shadow value is used for CR0 accesses, for CR4 accesses, etc. If the shadow value field is not to be used for the register, processing logic triggers a VM exit (processing block 522).
If the shadow value field is to be used for the register and the request initiated by guest software is a read command (decision box 517), processing logic reads the bits of the shadow value field that correspond to those register portions that are set in the requested mask field and hence are under VMM control (processing block 518). These bits from the shadow value field are combined with the bits from the actual register that correspond to bits in the requested mask field that are not set and hence are under guest control. These combined values are then returned to guest software. Values of bits in the protected resource which are not represented in the mask and/or shadow value field may be read from the register.
If the shadow value field is to be used for the register but the request initiated by guest software is a write command, processing logic compares the value requested to be written to register bits under VMM control with the value of corresponding bits in the shadow value field (decision box 520). If these two values are the same, the requested register portions that are under guest control are written (processing block 519). That is, the bits under guest control are written; those under VMM control remain unchanged. In one embodiment, bits in the register which are not represented in the mask and/or shadow value field may be written if they are assumed to be under guest control. In another embodiment, data is not written to the unrepresented bits because they are assumed to be under VMM control. Otherwise, if the two values compared at decision box 520 are different, processing logic triggers a VM exit (processing block 522).
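The decision sequence of process 500 can be summarized in a single function. The sketch below is an illustration under stated assumptions: full-width mask and shadow value fields with one bit per register bit, a set mask bit meaning "under VMM control", and every register bit represented in both fields. RegisterControls, handle_access and VMExit are hypothetical names, and the comments give the corresponding boxes of FIG. 5.

```python
from dataclasses import dataclass


class VMExit(Exception):
    """Hypothetical stand-in for a VM exit (processing block 522)."""


@dataclass
class RegisterControls:
    """Hypothetical per-register redirection bits plus the two fields."""
    exit_on_read: bool = False
    exit_on_write: bool = False
    use_mask: bool = False
    use_shadow: bool = False
    mask: int = 0
    shadow: int = 0


def handle_access(reg, ctl, bits, src=None):
    """Return (register value, data read or None); `src` is None for reads."""
    is_write = src is not None

    # Decision box 506: unconditional exit bit for this event.
    must_exit = ctl.exit_on_write if is_write else ctl.exit_on_read
    if must_exit:
        raise VMExit("unconditional exit")

    # Decision boxes 508 and 512: is any requested bit VMM-controlled?
    vmm_bits = (bits & ctl.mask) if ctl.use_mask else 0
    if vmm_bits == 0:
        # Processing block 514: execute the command directly.
        if is_write:
            return (reg & ~bits) | (src & bits), None
        return reg, reg & bits

    # Decision box 516: without a shadow value field, exit.
    if not ctl.use_shadow:
        raise VMExit("VMM-controlled bits and no shadow value field")

    guest_bits = bits & ~ctl.mask
    if not is_write:
        # Processing block 518: shadow bits combined with register bits.
        return reg, (ctl.shadow & vmm_bits) | (reg & guest_bits)

    # Decision box 520: compare the write with the shadow value.
    if (src & vmm_bits) != (ctl.shadow & vmm_bits):
        raise VMExit("write differs from shadow value")

    # Processing block 519: write only the guest-controlled bits.
    return (reg & ~guest_bits) | (src & guest_bits), None
```

For example, with `RegisterControls(use_mask=True, use_shadow=True, mask=0x80000000, shadow=0x80000000)` a guest read of all 32 bits sees bit 31 set regardless of the real register value, and a write that clears bit 31 raises VMExit.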
In one embodiment, a set of criteria is predefined by the VMM for filtering VM exits. The criteria are based on combinations of values stored in a mask field and a shadow value field and a value that guest software wishes to write to the register. FIG. 3 is a flow diagram of one embodiment of a process 300 for filtering VM exits using a set of criteria. The process may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, microcode, etc.), software (such as run on a general purpose computer system or a dedicated machine), or a combination of both.
Referring to FIG. 3, process 300 begins with processing logic determining whether the access of guest software is a request to write data to a register (decision box 302). If the determination is negative, i.e., the access is a request to read data from the register, the resulting value of the read request is determined using the following expression:
DEST=(MF AND SVF) OR (NOT MF AND CRVAL),
where AND, NOT and OR are bitwise Boolean operators, MF is a value of the mask field, SVF is a value of the shadow value field, and CRVAL is the current value of the actual protected register. A bit in the mask field is set if a corresponding bit in the register is controlled by the VMM. Otherwise, if a bit in the register is controlled by guest software, then a corresponding bit in the mask field is equal to zero. A bit in the shadow field has the value that guest software expects to see in the corresponding bit of the register and may be different from the current value of the corresponding bit of the actual register.
According to the above expression, if the requested bit is controlled by guest software, the data is read from the register, and if the requested bit is controlled by the VMM, the data is read from the shadow value field.
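The read expression translates directly into bitwise integer operations. In the sketch below the function name filtered_read and the width parameter are assumptions; the final masking to the register width is only needed because Python integers are unbounded, so NOT MF would otherwise carry an infinite run of high one bits.

```python
def filtered_read(mf: int, svf: int, crval: int, width: int = 32) -> int:
    """DEST = (MF AND SVF) OR (NOT MF AND CRVAL), truncated to `width` bits."""
    full = (1 << width) - 1
    return ((mf & svf) | (~mf & crval)) & full


# Upper nibble is VMM-controlled and comes from the shadow value (0xA0);
# lower nibble is guest-controlled and comes from the register (0x0C).
dest = filtered_read(mf=0xF0, svf=0xA0, crval=0x5C)   # 0xAC
```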
If processing logic determines at decision box 302 that the access of guest software is a write request, processing logic combines the values of the mask field and shadow value field (processing block 304) as follows:
INT1=MF AND SVF.
In addition, processing logic combines the value of the mask field with the value that guest software wishes to write to the register (processing block 306) using the following expression:
INT2=MF AND SRC,
where SRC is the value that the guest wishes to write to the register.
Further, processing logic compares the two combinations at decision box 308. If the two combinations are equal, i.e., all bits in the register are either controlled by guest software, or controlled by the VMM and the value of the corresponding bit in shadow value field is equal to the value that guest software wishes to write to the register, then processing logic executes the following expression at processing block 312:
CR=(MF AND CRVAL) OR (NOT MF AND SRC).
According to this expression, if a bit in the register is controlled by guest software, the bit will be updated with the value that guest software wishes to write. Otherwise, the value of the bit in the register will remain the same and will not be updated.
Alternatively, if the two combinations are not equal, i.e., at least one bit in the register is controlled by the VMM and the value of the corresponding bit in shadow value field is not equal to the value that guest software wishes to write to the register, then processing logic triggers a VM exit at processing block 310.
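The write path of process 300 can be sketched in the same way, using the INT1, INT2 and CR expressions given above. The names filtered_write and VMExit and the width parameter are assumptions introduced for the example. The result is truncated to the register width because Python integers are unbounded.

```python
class VMExit(Exception):
    """Hypothetical stand-in for a VM exit (processing block 310)."""


def filtered_write(mf: int, svf: int, crval: int, src: int,
                   width: int = 32) -> int:
    """Return the new register value, or raise VMExit if the write is filtered."""
    full = (1 << width) - 1
    int1 = mf & svf            # processing block 304
    int2 = mf & src            # processing block 306
    if int1 != int2:           # decision box 308
        raise VMExit("write conflicts with shadow value")
    # Processing block 312: CR = (MF AND CRVAL) OR (NOT MF AND SRC)
    return ((mf & crval) | (~mf & src)) & full


# The upper nibble of SRC (0xA) matches the shadow value, so the write
# completes: the upper nibble keeps the register value (0x5) and the
# lower nibble takes the guest's value (0x3).
cr = filtered_write(mf=0xF0, svf=0xA0, crval=0x5C, src=0xA3)   # 0x53
```

Changing src to 0xB3 makes INT2 equal 0xB0 while INT1 stays 0xA0, so the same call raises VMExit instead.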
Note that the description of the process 300 is simplified by using the entire register (e.g., 32 bits for the CR0 register in the IA-32 ISA) and mask and shadow value fields that are 32 bits wide. A person of ordinary skill in the art will understand that embodiments of the present invention can apply to read and write operations that access only a limited subset of the register bits or that access bits in multiple registers. Additionally, those skilled in the art will see application of the invention to situations where there is not a bit-for-bit correspondence between the various elements involved (e.g. if bits in the mask apply to multiple bits in the protected resource).
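As one hypothetical illustration of a mask without bit-for-bit correspondence, each indicator could cover a fixed-size group of adjacent register bits and be expanded to a full-width mask before the expressions above are applied. The grouping scheme, the function name expand_mask and its parameters are assumptions made for this sketch; the text does not prescribe any particular mapping.

```python
def expand_mask(coarse_mask: int, bits_per_indicator: int,
                width: int = 32) -> int:
    """Expand a coarse mask so each indicator covers a group of register bits.

    Indicator i covers register bits
    [i * bits_per_indicator, (i + 1) * bits_per_indicator).
    """
    group = (1 << bits_per_indicator) - 1
    full_mask = 0
    for i in range(width // bits_per_indicator):
        if (coarse_mask >> i) & 1:
            full_mask |= group << (i * bits_per_indicator)
    return full_mask


# Four indicators, one per byte of a 32-bit register: indicators 0 and 3
# place the lowest and highest bytes under VMM control.
mf = expand_mask(0b1001, bits_per_indicator=8)   # 0xFF0000FF
```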
SRC=http://www.freepatentsonline.com/7127548.html
PatentTips - Control register access virtualization performance improvement