1. With shadow page tables, invlpg (which invalidates a TLB entry) and writes to cr3 trap into the VMM. If EPT is enabled, cr3 accesses and invlpg do not need to cause a VM exit. This can be seen in the function setup_vmcs_config in vmx.c:
    if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
        /* CR3 accesses and invlpg don't need to cause VM Exits
           when EPT enabled */
        _cpu_based_exec_control &= ~(CPU_BASED_CR3_LOAD_EXITING |
                                     CPU_BASED_CR3_STORE_EXITING |
                                     CPU_BASED_INVLPG_EXITING);
        rdmsr(MSR_IA32_VMX_EPT_VPID_CAP,
              vmx_capability.ept, vmx_capability.vpid);
    }
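The bit masking above can be tried outside the kernel. The following is a minimal user-space sketch, not KVM code: adjust_exec_control is a hypothetical helper, and the bit values are assumed to match the VM-execution control layout in the Intel SDM (check vmx.h in your kernel tree before relying on them).

```c
#include <stdint.h>

/* Assumed bit values, following the Intel SDM control-field layout. */
#define CPU_BASED_INVLPG_EXITING     0x00000200u  /* primary, bit 9  */
#define CPU_BASED_CR3_LOAD_EXITING   0x00008000u  /* primary, bit 15 */
#define CPU_BASED_CR3_STORE_EXITING  0x00010000u  /* primary, bit 16 */
#define SECONDARY_EXEC_ENABLE_EPT    0x00000002u  /* secondary, bit 1 */

/*
 * Hypothetical helper: given the primary and secondary execution
 * controls, return the primary controls with the CR3 and invlpg
 * exits cleared when EPT is enabled, and unchanged otherwise.
 */
static uint32_t adjust_exec_control(uint32_t exec_control,
                                    uint32_t exec_control_2nd)
{
    if (exec_control_2nd & SECONDARY_EXEC_ENABLE_EPT)
        exec_control &= ~(CPU_BASED_CR3_LOAD_EXITING |
                          CPU_BASED_CR3_STORE_EXITING |
                          CPU_BASED_INVLPG_EXITING);
    return exec_control;
}
```

Calling adjust_exec_control with all three exiting bits set and EPT enabled returns a value with those bits cleared; without the EPT bit the controls come back untouched, which is the shadow paging case.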
2. kvm_mmu_reload is called from vcpu_enter_guest every time the guest is entered from the VMM. If root_hpa is INVALID_PAGE, it calls kvm_mmu_load, which allocates new roots and sets cr3 to the new root_hpa.
    int kvm_mmu_load(struct kvm_vcpu *vcpu)
    {
        int r;

        r = mmu_topup_memory_caches(vcpu);
        if (r)
            goto out;
        spin_lock(&vcpu->kvm->mmu_lock);
        kvm_mmu_free_some_pages(vcpu);
        r = mmu_alloc_roots(vcpu);
        mmu_sync_roots(vcpu);
        spin_unlock(&vcpu->kvm->mmu_lock);
        if (r)
            goto out;
        /* set_cr3() should ensure TLB has been flushed */
        kvm_x86_ops->set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
    out:
        return r;
    }
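The lazy reload pattern described in this step can be modelled in a few lines of user-space C. This is only a toy sketch: struct toy_mmu, toy_alloc_root, toy_mmu_load and toy_mmu_reload are made-up names for illustration, and the fake allocator stands in for mmu_alloc_roots. Only the INVALID_PAGE check mirrors the behaviour described above.

```c
#include <stdint.h>

#define INVALID_PAGE (~(uint64_t)0)

/* Toy stand-in for the per-vcpu MMU state; not the kernel's struct. */
struct toy_mmu {
    uint64_t root_hpa;  /* host-physical address of the shadow root */
    uint64_t hw_cr3;    /* value that would be written to GUEST_CR3 */
    int      loads;     /* how many times the slow path ran */
};

/* Hypothetical allocator: hands out fake page-aligned addresses. */
static uint64_t toy_alloc_root(void)
{
    static uint64_t next = 0x1000;
    uint64_t hpa = next;

    next += 0x1000;
    return hpa;
}

/* Slow path: allocate a root and point the hardware cr3 at it. */
static int toy_mmu_load(struct toy_mmu *m)
{
    m->root_hpa = toy_alloc_root();
    m->hw_cr3 = m->root_hpa;
    m->loads++;
    return 0;
}

/* Fast path taken on every guest entry: nothing to do if a root exists. */
static int toy_mmu_reload(struct toy_mmu *m)
{
    if (m->root_hpa != INVALID_PAGE)
        return 0;
    return toy_mmu_load(m);
}
```

Starting from root_hpa == INVALID_PAGE, the first toy_mmu_reload call runs the slow path once; later calls return immediately until something invalidates the root again.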
3. kvm_x86_ops->set_cr3 points to vmx_set_cr3, which uses
    vmcs_writel(GUEST_CR3, guest_cr3);
to set the GUEST_CR3 field of the VMCS to root_hpa. The guest then runs on the shadow page table provided by the underlying VMM.
4. There is another function, kvm_set_cr3, which is called every time the guest performs a context switch, that is, whenever the guest tries to write its cr3 register. KVM saves this value in vcpu->arch.cr3. Note that this is not the value that is actually loaded into the hardware cr3 while the guest runs; it is the guest's own page table root, which KVM uses to translate guest virtual addresses to guest physical addresses. After saving this cr3, the VMM calls new_cr3, which frees the roots of the shadow page table, because the shadow page table must also be switched when the guest switches context.
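The split between the cr3 the guest asked for and the root the hardware actually uses can be sketched as follows. This is a toy model under stated assumptions: struct toy_cr3_state, toy_set_cr3 and toy_read_cr3 are hypothetical names, and setting root_hpa to INVALID_PAGE stands in for new_cr3 freeing the shadow roots.

```c
#include <stdint.h>

#define INVALID_PAGE (~(uint64_t)0)

/* Toy state; the field names are illustrative, not the kernel's. */
struct toy_cr3_state {
    uint64_t guest_cr3;  /* what the guest believes cr3 holds */
    uint64_t root_hpa;   /* shadow root actually used, or INVALID_PAGE */
};

/* Models a trapped guest write to cr3 (a guest context switch). */
static void toy_set_cr3(struct toy_cr3_state *s, uint64_t cr3)
{
    s->guest_cr3 = cr3;          /* remembered for guest page walks */
    s->root_hpa = INVALID_PAGE;  /* old shadow root no longer applies */
}

/* Models a trapped guest read of cr3: the guest sees its own value. */
static uint64_t toy_read_cr3(const struct toy_cr3_state *s)
{
    return s->guest_cr3;
}
```

After toy_set_cr3 the shadow root is invalid, so in this model the next guest entry would have to build or look up a shadow root for the new guest page table before running.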
5. When a page fault happens, it falls into one of two categories. The first kind occurs because the page is not mapped in the guest's own page table. In this case, KVM injects a page fault into the guest and lets the guest handle it. The function walk_addr walks the guest page table rather than the shadow page table. This is done in FNAME(page_fault) in paging_tmpl.h:
    /*
     * Look up the guest pte for the faulting address.
     */
    r = FNAME(walk_addr)(&walker, vcpu, addr, write_fault, user_fault, fetch_fault);
    /*
     * The page is not mapped by the guest. Let the guest handle it.
     */
    if (!r) {
        pgprintk("%s: guest page fault\n", __func__);
        inject_page_fault(vcpu, addr, walker.error_code);
        vcpu->arch.last_pt_write_count = 0; /* reset fork detector */
        return 0;
    }
The second kind occurs because the page is mapped by the guest but not yet mapped in the shadow page table. In this case, KVM fetches a new shadow page and inserts it into the shadow page table.
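The two fault categories can be illustrated with a toy classifier. This sketch assumes a flat array of present bits in place of real multi-level page tables, and every name in it (toy_tables, toy_page_fault, the enum values) is hypothetical. It shows only the decision order: consult the guest table first, then the shadow table.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 8

enum fault_action {
    INJECT_TO_GUEST,  /* guest has no mapping: guest handles the fault */
    FIX_SHADOW,       /* guest maps it, shadow does not: VMM fills it in */
    ALREADY_MAPPED    /* both map it: nothing to fix in this toy model */
};

/* Flat "page tables": one present bit per virtual page number. */
struct toy_tables {
    bool guest_present[TOY_PAGES];
    bool shadow_present[TOY_PAGES];
};

static enum fault_action toy_page_fault(struct toy_tables *t, size_t vpn)
{
    /* Walk the guest table first, as walk_addr does. */
    if (vpn >= TOY_PAGES || !t->guest_present[vpn])
        return INJECT_TO_GUEST;

    /* Guest mapping exists; populate the shadow entry if it is missing. */
    if (!t->shadow_present[vpn]) {
        t->shadow_present[vpn] = true;
        return FIX_SHADOW;
    }
    return ALREADY_MAPPED;
}
```

With only guest_present[3] set, a fault on page 1 is handed to the guest, while a fault on page 3 is fixed by filling in the shadow entry without the guest noticing.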
I am new to the virtualization field, but I know something about normal Linux page tables and x86 hardware.
Can you please explain how page tables are managed in each of these cases:
1. Shadow page tables
2. With HW support for virtualization and no EPT support
3. With HW support for virtualization and EPT support