UART at the specified I/O port or MMIO address,
switching to the matching ttyS device later. The
options are the same as for ttyS, above.
+ hvc<n> Use the hypervisor console device <n>. This is for
+ both Xen and PowerPC hypervisors.
If the device connected to the port is not a TTY but a braille
device, prepend "brl," before the device type, for instance
is selected automatically. Check
Documentation/kdump/kdump.txt for further details.
+ crashkernel_low=size[KMG]
+ [KNL, x86] parts under 4G.
+
crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory
in the running system. The syntax of range is
earlyprintk= [X86,SH,BLACKFIN]
earlyprintk=vga
+ earlyprintk=xen
earlyprintk=serial[,ttySn[,baudrate]]
earlyprintk=ttySn[,baudrate]
earlyprintk=dbgp[debugController#]
The VGA output is eventually overwritten by the real
console.
+ The xen output can only be used by Xen PV guests.
+
ekgdboc= [X86,KGDB] Allow early kernel console debugging
ekgdboc=kbd
Claim all unknown PCI IDE storage controllers.
idle= [X86]
- Format: idle=poll, idle=mwait, idle=halt, idle=nomwait
+ Format: idle=poll, idle=halt, idle=nomwait
Poll forces a polling idle loop that can slightly
improve the performance of waking up a idle CPU, but
will use a lot of power and make the system run hot.
Not recommended.
- idle=mwait: On systems which support MONITOR/MWAIT but
- the kernel chose to not use it because it doesn't save
- as much power as a normal idle loop, use the
- MONITOR/MWAIT idle loop anyways. Performance should be
- the same as idle=poll.
idle=halt: Halt is forced to be used for CPU idle.
In such case C2/C3 won't be used again.
idle=nomwait: Disable mwait for CPU C-states
0 disables intel_idle and fall back on acpi_idle.
1 to 6 specify maximum depth of C-state.
+ intel_pstate= [X86]
+ disable
+ Do not enable intel_pstate as the default
+ scaling driver for the supported processors
+
intremap= [X86-64, Intel-IOMMU]
on enable Interrupt Remapping (default)
off disable Interrupt Remapping
that the amount of memory usable for all allocations
is not too small.
+ movablemem_map=acpi
+ [KNL,X86,IA-64,PPC] This parameter is similar to
+ memmap except it specifies the memory map of
+ ZONE_MOVABLE.
+ This option inform the kernel to use Hot Pluggable bit
+ in flags from SRAT from ACPI BIOS to determine which
+ memory devices could be hotplugged. The corresponding
+ memory ranges will be set as ZONE_MOVABLE.
+ NOTE: Whatever node the kernel resides in will always
+ be un-hotpluggable.
+
+ movablemem_map=nn[KMG]@ss[KMG]
+ [KNL,X86,IA-64,PPC] This parameter is similar to
+ memmap except it specifies the memory map of
+ ZONE_MOVABLE.
+ If user specifies memory ranges, the info in SRAT will
+ be ingored. And it works like the following:
+ - If more ranges are all within one node, then from
+ lowest ss to the end of the node will be ZONE_MOVABLE.
+ - If a range is within a node, then from ss to the end
+ of the node will be ZONE_MOVABLE.
+ - If a range covers two or more nodes, then from ss to
+ the end of the 1st node will be ZONE_MOVABLE, and all
+ the rest nodes will only have ZONE_MOVABLE.
+ If memmap is specified at the same time, the
+ movablemem_map will be limited within the memmap
+ areas. If kernelcore or movablecore is also specified,
+ movablemem_map will have higher priority to be
+ satisfied. So the administrator should be careful that
+ the amount of movablemem_map areas are not too large.
+ Otherwise kernel won't have enough memory to start.
+ NOTE: We don't stop users specifying the node the
+ kernel resides in as hotpluggable so that this
+ option can be used as a workaround of firmware
+ bugs.
+
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>
wfi(ARM) instruction doesn't work correctly and not to
use it. This is also useful when using JTAG debugger.
- no-hlt [BUGS=X86-32] Tells the kernel that the hlt
- instruction doesn't work correctly and not to
- use it.
-
no_file_caps Tells the kernel not to honor file capabilities. The
only way then for a file to be executed with privilege
is to be setuid root or executed by root.
This sorting is done to get a device
order compatible with older (<= 2.4) kernels.
nobfsort Don't sort PCI devices into breadth-first order.
+ pcie_bus_tune_off Disable PCIe MPS (Max Payload Size)
+ tuning and use the BIOS-configured MPS defaults.
+ pcie_bus_safe Set every device's MPS to the largest value
+ supported by all devices below the root complex.
+ pcie_bus_perf Set device MPS to the largest allowable MPS
+ based on its parent bus. Also set MRRS (Max
+ Read Request Size) to the largest supported
+ value (no larger than the MPS that the device
+ or bus can support) for best performance.
+ pcie_bus_peer2peer Set every device's MPS to 128B, which
+ every device is guaranteed to support. This
+ configuration allows peer-to-peer DMA between
+ any pair of devices, possibly at the cost of
+ reduced performance. This also guarantees
+ that hot-added devices will work.
cbiosize=nn[KMG] The fixed amount of bus space which is
reserved for the CardBus bridge's IO window.
The default value is 256 bytes.
the default.
off: Turn ECRC off
on: Turn ECRC on.
+ hpiosize=nn[KMG] The fixed amount of bus space which is
+ reserved for hotplug bridge's IO window.
+ Default size is 256 bytes.
+ hpmemsize=nn[KMG] The fixed amount of bus space which is
+ reserved for hotplug bridge's memory window.
+ Default size is 2 megabytes.
realloc= Enable/disable reallocating PCI bridge resources
if allocations done by BIOS are too small to
accommodate resources required by all child
.code64
.globl startup_64
startup_64:
-
/*
- * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
+ * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
* and someone has loaded an identity mapped page table
* for us. These identity mapped page tables map all of the
* kernel pages and possibly all of memory.
*
- * %esi holds a physical pointer to real_mode_data.
+ * %rsi holds a physical pointer to real_mode_data.
*
* We come here either directly from a 64bit bootloader, or from
* arch/x86_64/boot/compressed/head.S.
* tables and then reload them.
*/
- /* Compute the delta between the address I am compiled to run at and the
+ /*
+ * Compute the delta between the address I am compiled to run at and the
* address I am actually running at.
*/
leaq _text(%rip), %rbp
testl %eax, %eax
jnz bad_address
- /* Is the address too large? */
- leaq _text(%rip), %rdx
- movq $PGDIR_SIZE, %rax
- cmpq %rax, %rdx
- jae bad_address
-
- /* Fixup the physical addresses in the page table
+ /*
+ * Is the address too large?
*/
- addq %rbp, init_level4_pgt + 0(%rip)
- addq %rbp, init_level4_pgt + (L4_PAGE_OFFSET*8)(%rip)
- addq %rbp, init_level4_pgt + (L4_START_KERNEL*8)(%rip)
+ leaq _text(%rip), %rax
+ shrq $MAX_PHYSMEM_BITS, %rax
+ jnz bad_address
- addq %rbp, level3_ident_pgt + 0(%rip)
+ /*
+ * Fixup the physical addresses in the page table
+ */
+ addq %rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
addq %rbp, level3_kernel_pgt + (510*8)(%rip)
addq %rbp, level3_kernel_pgt + (511*8)(%rip)
addq %rbp, level2_fixmap_pgt + (506*8)(%rip)
- /* Add an Identity mapping if I am above 1G */
+ /*
+ * Set up the identity mapping for the switchover. These
+ * entries should *NOT* have the global bit set! This also
+ * creates a bunch of nonsense entries but that is fine --
+ * it avoids problems around wraparound.
+ */
leaq _text(%rip), %rdi
- andq $PMD_PAGE_MASK, %rdi
+ leaq early_level4_pgt(%rip), %rbx
movq %rdi, %rax
- shrq $PUD_SHIFT, %rax
- andq $(PTRS_PER_PUD - 1), %rax
- jz ident_complete
+ shrq $PGDIR_SHIFT, %rax
- leaq (level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
- leaq level3_ident_pgt(%rip), %rbx
- movq %rdx, 0(%rbx, %rax, 8)
+ leaq (4096 + _KERNPG_TABLE)(%rbx), %rdx
+ movq %rdx, 0(%rbx,%rax,8)
+ movq %rdx, 8(%rbx,%rax,8)
+ addq $4096, %rdx
movq %rdi, %rax
- shrq $PMD_SHIFT, %rax
- andq $(PTRS_PER_PMD - 1), %rax
- leaq __PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
- leaq level2_spare_pgt(%rip), %rbx
- movq %rdx, 0(%rbx, %rax, 8)
-ident_complete:
+ shrq $PUD_SHIFT, %rax
+ andl $(PTRS_PER_PUD-1), %eax
+ movq %rdx, (4096+0)(%rbx,%rax,8)
+ movq %rdx, (4096+8)(%rbx,%rax,8)
+
+ addq $8192, %rbx
+ movq %rdi, %rax
+ shrq $PMD_SHIFT, %rdi
+ addq $(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+ leaq (_end - 1)(%rip), %rcx
+ shrq $PMD_SHIFT, %rcx
+ subq %rdi, %rcx
+ incl %ecx
+
+1:
+ andq $(PTRS_PER_PMD - 1), %rdi
+ movq %rax, (%rbx,%rdi,8)
+ incq %rdi
+ addq $PMD_SIZE, %rax
+ decl %ecx
+ jnz 1b
/*
* Fixup the kernel text+data virtual addresses. Note that
* cleanup_highmap() fixes this up along with the mappings
* beyond _end.
*/
-
leaq level2_kernel_pgt(%rip), %rdi
leaq 4096(%rdi), %r8
/* See if it is a valid page table entry */
/* Fixup phys_base */
addq %rbp, phys_base(%rip)
- /* Due to ENTRY(), sometimes the empty space gets filled with
- * zeros. Better take a jmp than relying on empty space being
- * filled with 0x90 (nop)
- */
- jmp secondary_startup_64
+ movq $(early_level4_pgt - __START_KERNEL_map), %rax
+ jmp 1f
ENTRY(secondary_startup_64)
/*
- * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
+ * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
* and someone has loaded a mapped page table.
*
- * %esi holds a physical pointer to real_mode_data.
+ * %rsi holds a physical pointer to real_mode_data.
*
* We come here either from startup_64 (using physical addresses)
* or from trampoline.S (using virtual addresses).
* after the boot processor executes this code.
*/
+ movq $(init_level4_pgt - __START_KERNEL_map), %rax
+1:
+
/* Enable PAE mode and PGE */
- movl $(X86_CR4_PAE | X86_CR4_PGE), %eax
- movq %rax, %cr4
+ movl $(X86_CR4_PAE | X86_CR4_PGE), %ecx
+ movq %rcx, %cr4
/* Setup early boot stage 4 level pagetables. */
- movq $(init_level4_pgt - __START_KERNEL_map), %rax
addq phys_base(%rip), %rax
movq %rax, %cr3
movq %rax, %cr0
/* Setup a boot time stack */
- movq stack_start(%rip),%rsp
+ movq stack_start(%rip), %rsp
/* zero EFLAGS after setting rsp */
pushq $0
movl initial_gs+4(%rip),%edx
wrmsr
- /* esi is pointer to real mode structure with interesting info.
+ /* rsi is pointer to real mode structure with interesting info.
pass it to C */
- movl %esi, %edi
+ movq %rsi, %rdi
/* Finally jump to run C code and to be on real kernel address
* Since we are running on identity-mapped space we have to jump
* to the full 64bit address, this is only possible as indirect
* jump. In addition we need to ensure %cs is set so we make this
* a far return.
+ *
+ * Note: do not change to far jump indirect with 64bit offset.
+ *
+ * AMD does not support far jump indirect with 64bit offset.
+ * AMD64 Architecture Programmer's Manual, Volume 3: states only
+ * JMP FAR mem16:16 FF /5 Far jump indirect,
+ * with the target specified by a far pointer in memory.
+ * JMP FAR mem16:32 FF /5 Far jump indirect,
+ * with the target specified by a far pointer in memory.
+ *
+ * Intel64 does support 64bit offset.
+ * Software Developer Manual Vol 2: states:
+ * FF /5 JMP m16:16 Jump far, absolute indirect,
+ * address given in m16:16
+ * FF /5 JMP m16:32 Jump far, absolute indirect,
+ * address given in m16:32.
+ * REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
+ * address given in m16:64.
*/
movq initial_code(%rip),%rax
pushq $0 # fake return address to stop unwinder
/* SMP bootup changes these two */
__REFDATA
- .align 8
- ENTRY(initial_code)
+ .balign 8
+ GLOBAL(initial_code)
.quad x86_64_start_kernel
- ENTRY(initial_gs)
+ GLOBAL(initial_gs)
.quad INIT_PER_CPU_VAR(irq_stack_union)
- ENTRY(stack_start)
+ GLOBAL(stack_start)
.quad init_thread_union+THREAD_SIZE-8
.word 0
__FINITDATA
bad_address:
jmp bad_address
- .section ".init.text","ax"
+ __INIT
.globl early_idt_handlers
early_idt_handlers:
# 104(%rsp) %rflags
i = i + 1
.endr
+/* This is global to keep gas from relaxing the jumps */
ENTRY(early_idt_handler)
cld
pushq %r11 # 0(%rsp)
cmpl $__KERNEL_CS,96(%rsp)
- jne 10f
+ jne 11f
+
+ cmpl $14,72(%rsp) # Page fault?
+ jnz 10f
+ GET_CR2_INTO(%rdi) # can clobber any volatile register if pv
+ call early_make_pgtable
+ andl %eax,%eax
+ jz 20f # All good
+10:
leaq 88(%rsp),%rdi # Pointer to %rip
call early_fixup_exception
andl %eax,%eax
jnz 20f # Found an exception entry
-10:
+11:
#ifdef CONFIG_EARLY_PRINTK
GET_CR2_INTO(%r9) # can clobber any volatile register if pv
movl 80(%rsp),%r8d # error code
1: hlt
jmp 1b
-20: # Exception table entry found
+20: # Exception table entry found or page table generated
popq %r11
popq %r10
popq %r9
addq $16,%rsp # drop vector number and error code
decl early_recursion_flag(%rip)
INTERRUPT_RETURN
+ENDPROC(early_idt_handler)
+
+ __INITDATA
.balign 4
early_recursion_flag:
early_idt_ripmsg:
.asciz "RIP %s\n"
#endif /* CONFIG_EARLY_PRINTK */
- .previous
#define NEXT_PAGE(name) \
.balign PAGE_SIZE; \
-ENTRY(name)
+GLOBAL(name)
/* Automate the creation of 1 to 1 mapping pmd entries */
#define PMDS(START, PERM, COUNT) \
i = i + 1 ; \
.endr
+ __INITDATA
+NEXT_PAGE(early_level4_pgt)
+ .fill 511,8,0
+ .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+
+NEXT_PAGE(early_dynamic_pgts)
+ .fill 512*EARLY_DYNAMIC_PAGE_TABLES,8,0
+
.data
- /*
- * This default setting generates an ident mapping at address 0x100000
- * and a mapping for the kernel that precisely maps virtual address
- * 0xffffffff80000000 to physical address 0x000000. (always using
- * 2Mbyte large pages provided by PAE mode)
- */
+
+#ifndef CONFIG_XEN
NEXT_PAGE(init_level4_pgt)
- .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
- .org init_level4_pgt + L4_PAGE_OFFSET*8, 0
- .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
- .org init_level4_pgt + L4_START_KERNEL*8, 0
+ .fill 512,8,0
+#else
+NEXT_PAGE(init_level4_pgt)
+ .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+ .org init_level4_pgt + L4_PAGE_OFFSET*8, 0
+ .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+ .org init_level4_pgt + L4_START_KERNEL*8, 0
/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
- .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
NEXT_PAGE(level3_ident_pgt)
.quad level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
- .fill 511,8,0
+ .fill 511, 8, 0
+NEXT_PAGE(level2_ident_pgt)
+ /* Since I easily can, map the first 1G.
+ * Don't set NX because code runs from these pages.
+ */
+ PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
+#endif
NEXT_PAGE(level3_kernel_pgt)
.fill L3_START_KERNEL,8,0
.quad level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
.quad level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
-NEXT_PAGE(level2_fixmap_pgt)
- .fill 506,8,0
- .quad level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
- /* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
- .fill 5,8,0
-
-NEXT_PAGE(level1_fixmap_pgt)
- .fill 512,8,0
-
-NEXT_PAGE(level2_ident_pgt)
- /* Since I easily can, map the first 1G.
- * Don't set NX because code runs from these pages.
- */
- PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
-
NEXT_PAGE(level2_kernel_pgt)
/*
* 512 MB kernel mapping. We spend a full page on this pagetable
PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
KERNEL_IMAGE_SIZE/PMD_SIZE)
-NEXT_PAGE(level2_spare_pgt)
- .fill 512, 8, 0
+NEXT_PAGE(level2_fixmap_pgt)
+ .fill 506,8,0
+ .quad level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+ /* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
+ .fill 5,8,0
+
+NEXT_PAGE(level1_fixmap_pgt)
+ .fill 512,8,0
#undef PMDS
-#undef NEXT_PAGE
.data
.align 16
.skip IDT_ENTRIES * 16
__PAGE_ALIGNED_BSS
- .align PAGE_SIZE
-ENTRY(empty_zero_page)
+NEXT_PAGE(empty_zero_page)
.skip PAGE_SIZE