riscv: vmlinux.lds.S|vmlinux-xip.lds.S: remove `.fixup` section These are no longer necessary now that we have a more standard extable mechanism. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
riscv: extable: add `type` and `data` fields This is a riscv port of commit d6e2cc564775 ("arm64: extable: add `type` and `data` fields"). Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
riscv: Move EXCEPTION_TABLE to RO_DATA segment _ex_table section is read-only, so move it to RO_DATA. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
riscv: add VMAP_STACK overflow detection This patch adds stack overflow detection to riscv, usable when CONFIG_VMAP_STACK=y. Overflow is detected in kernel exception entry(kernel/entry.S), if the kernel stack is overflow and been detected, the overflow handler is invoked on a per-cpu overflow stack. This approach preserves GPRs and the original exception information. The overflow detect is performed before any attempt is made to access the stack and the principle of stack overflow detection: kernel stacks are aligned to double their size, enabling overflow to be detected with a single bit test. For example, a 16K stack is aligned to 32K, ensuring that bit 14 of the SP must be zero. On an overflow (or underflow), this bit is flipped. Thus, overflow (of less than the size of the stack) can be detected by testing whether this bit is set. This gives us a useful error message on stack overflow, as can be trigger with the LKDTM overflow test: [ 388.053267] lkdtm: Performing direct entry EXHAUST_STACK [ 388.053663] lkdtm: Calling function with 1024 frame size to depth 32 ... [ 388.054016] lkdtm: loop 32/32 ... [ 388.054186] lkdtm: loop 31/32 ... [ 388.054491] lkdtm: loop 30/32 ... [ 388.054672] lkdtm: loop 29/32 ... [ 388.054859] lkdtm: loop 28/32 ... [ 388.055010] lkdtm: loop 27/32 ... [ 388.055163] lkdtm: loop 26/32 ... [ 388.055309] lkdtm: loop 25/32 ... [ 388.055481] lkdtm: loop 24/32 ... [ 388.055653] lkdtm: loop 23/32 ... [ 388.055837] lkdtm: loop 22/32 ... [ 388.056015] lkdtm: loop 21/32 ... [ 388.056188] lkdtm: loop 20/32 ... [ 388.058145] Insufficient stack space to handle exception! [ 388.058153] Task stack: [0xffffffd014260000..0xffffffd014264000] [ 388.058160] Overflow stack: [0xffffffe1f8d2c220..0xffffffe1f8d2d220] [ 388.058168] CPU: 0 PID: 89 Comm: bash Not tainted 5.12.0-rc8-dirty #90 [ 388.058175] Hardware name: riscv-virtio,qemu (DT) [ 388.058187] epc : number+0x32/0x2c0 [ 388.058247] ra : vsnprintf+0x2ae/0x3f0 [ 388.058255] epc : ffffffe0002d38f6 ra : ffffffe0002d814e sp : ffffffd01425ffc0 [ 388.058263] gp : ffffffe0012e4010 tp : ffffffe08014da00 t0 : ffffffd0142606e8 [ 388.058271] t1 : 0000000000000000 t2 : 0000000000000000 s0 : ffffffd014260070 [ 388.058303] s1 : ffffffd014260158 a0 : ffffffd01426015e a1 : ffffffd014260158 [ 388.058311] a2 : 0000000000000013 a3 : ffff0a01ffffff10 a4 : ffffffe000c398e0 [ 388.058319] a5 : 511b02ec65f3e300 a6 : 0000000000a1749a a7 : 0000000000000000 [ 388.058327] s2 : ffffffff000000ff s3 : 00000000ffff0a01 s4 : ffffffe0012e50a8 [ 388.058335] s5 : 0000000000ffff0a s6 : ffffffe0012e50a8 s7 : ffffffe000da1cc0 [ 388.058343] s8 : ffffffffffffffff s9 : ffffffd0142602b0 s10: ffffffd0142602a8 [ 388.058351] s11: ffffffd01426015e t3 : 00000000000f0000 t4 : ffffffffffffffff [ 388.058359] t5 : 000000000000002f t6 : ffffffd014260158 [ 388.058366] status: 0000000000000100 badaddr: ffffffd01425fff8 cause: 000000000000000f [ 388.058374] Kernel panic - not syncing: Kernel stack overflow [ 388.058381] CPU: 0 PID: 89 Comm: bash Not tainted 5.12.0-rc8-dirty #90 [ 388.058387] Hardware name: riscv-virtio,qemu (DT) [ 388.058393] Call Trace: [ 388.058400] [<ffffffe000004944>] walk_stackframe+0x0/0xce [ 388.058406] [<ffffffe0006f0b28>] dump_backtrace+0x38/0x46 [ 388.058412] [<ffffffe0006f0b46>] show_stack+0x10/0x18 [ 388.058418] [<ffffffe0006f3690>] dump_stack+0x74/0x8e [ 388.058424] [<ffffffe0006f0d52>] panic+0xfc/0x2b2 [ 388.058430] [<ffffffe0006f0acc>] print_trace_address+0x0/0x24 [ 388.058436] [<ffffffe0002d814e>] vsnprintf+0x2ae/0x3f0 [ 388.058956] SMP: stopping secondary CPUs Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
RISC-V: enable XIP Introduce XIP (eXecute In Place) support for RISC-V platforms. It allows code to be executed directly from non-volatile storage directly addressable by the CPU, such as QSPI NOR flash which can be found on many RISC-V platforms. This makes way for significant optimization of RAM footprint. The XIP kernel is not compressed since it has to run directly from flash, so it will occupy more space on the non-volatile storage. The physical flash address used to link the kernel object files and for storing it has to be known at compile time and is represented by a Kconfig option. XIP on RISC-V will for the time being only work on MMU-enabled kernels. Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com> [Alex: Rebase on top of "Move kernel mapping outside the linear mapping" ] Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> [Palmer: disable XIP for allyesconfig] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
riscv: Move kernel mapping outside of linear mapping This is a preparatory patch for relocatable kernel and sv48 support. The kernel used to be linked at PAGE_OFFSET address therefore we could use the linear mapping for the kernel mapping. But the relocated kernel base address will be different from PAGE_OFFSET and since in the linear mapping, two different virtual addresses cannot point to the same physical address, the kernel mapping needs to lie outside the linear mapping so that we don't have to copy it at the same physical offset. The kernel mapping is moved to the last 2GB of the address space, BPF is now always after the kernel and modules use the 2GB memory range right before the kernel, so BPF and modules regions do not overlap. KASLR implementation will simply have to move the kernel in the last 2GB range and just take care of leaving enough space for BPF. In addition, by moving the kernel to the end of the address space, both sv39 and sv48 kernels will be exactly the same without needing to be relocated at runtime. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> [Palmer: Squash the STRICT_RWX fix, and a !MMU fix] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
riscv: Introduce alternative mechanism to apply errata solution Introduce the "alternative" mechanism from ARM64 and x86 to apply the CPU vendors' errata solution at runtime. The main purpose of this patch is to provide a framework. Therefore, the implementation is quite basic for now so that some scenarios could not use this schemei, such as patching code to a module, relocating the patching code and heterogeneous CPU topology. Users could use the macro ALTERNATIVE to apply an errata to the existing code flow. In the macro ALTERNATIVE, users need to specify the manufacturer information(vendorid, archid, and impid) for this errata. Therefore, kernel will know this errata is suitable for which CPU core. During the booting procedure, kernel will select the errata required by the CPU core and then patch it. It means that the kernel only applies the errata to the specified CPU core. In this case, the vendor's errata does not affect each other at runtime. The above patching procedure only occurs during the booting phase, so we only take the overhead of the "alternative" mechanism once. This "alternative" mechanism is enabled by default to ensure that all required errata will be applied. However, users can disable this feature by the Kconfig "CONFIG_RISCV_ERRATA_ALTERNATIVE". Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
RISC-V: Move dynamic relocation section under __init Dynamic relocation section are only required during boot. Those sections can be freed after init. Thus, it can be moved to __init section. Signed-off-by: Atish Patra <atish.patra@wdc.com> Tested-by: Greentime Hu <greentime.hu@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
RISC-V: Protect all kernel sections including init early Currently, .init.text & .init.data are intermixed which makes it impossible apply different permissions to them. .init.data shouldn't need exec permissions while .init.text shouldn't have write permission. Moreover, the strict permission are only enforced /init starts. This leaves the kernel vulnerable from possible buggy built-in modules. Keep .init.text & .data in separate sections so that different permissions are applied to each section. Apply permissions to individual sections as early as possible. This improves the kernel protection under CONFIG_STRICT_KERNEL_RWX. We also need to restore the permissions for the entire _init section after it is freed so that those pages can be used for other purpose. Signed-off-by: Atish Patra <atish.patra@wdc.com> Tested-by: Greentime Hu <greentime.hu@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
RISC-V: Align the .init.text section In order to improve kernel text protection, we need separate .init.text/ .init.data/.text in separate sections. However, RISC-V linker relaxation code is not aware of any alignment between sections. As a result, it may relax any RISCV_CALL relocations between sections to JAL without realizing that an inter section alignment may move the address farther. That may lead to a relocation truncated fit error. However, linker relaxation code is aware of the individual section alignments. The detailed discussion on this issue can be found here. https://github.com/riscv/riscv-gnu-toolchain/issues/738 Keep the .init.text section aligned so that linker relaxation will take that as a hint while relaxing inter section calls. Here are the code size changes for each section because of this change. section change in size (in bytes) .head.text +4 .text +40 .init.text +6530 .exit.text +84 The only significant increase in size happened for .init.text because all intra relocations also use 2MB alignment. Suggested-by: Jim Wilson <jimw@sifive.com> Signed-off-by: Atish Patra <atish.patra@wdc.com> Tested-by: Greentime Hu <greentime.hu@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Merge tag 'riscv-for-linus-5.10-mw0' of git://git./linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: "A handful of cleanups and new features: - A handful of cleanups for our page fault handling - Improvements to how we fill out cacheinfo - Support for EFI-based systems" * tag 'riscv-for-linus-5.10-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (22 commits) RISC-V: Add page table dump support for uefi RISC-V: Add EFI runtime services RISC-V: Add EFI stub support. RISC-V: Add PE/COFF header for EFI stub RISC-V: Implement late mapping page table allocation functions RISC-V: Add early ioremap support RISC-V: Move DT mapping outof fixmap RISC-V: Fix duplicate included thread_info.h riscv/mm/fault: Set FAULT_FLAG_INSTRUCTION flag in do_page_fault() riscv/mm/fault: Fix inline placement in vmalloc_fault() declaration riscv: Add cache information in AUX vector riscv: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO riscv: Set more data to cacheinfo riscv/mm/fault: Move access error check to function riscv/mm/fault: Move FAULT_FLAG_WRITE handling in do_page_fault() riscv/mm/fault: Simplify mm_fault_error() riscv/mm/fault: Move fault error handling to mm_fault_error() riscv/mm/fault: Simplify fault error handling riscv/mm/fault: Move vmalloc fault handling to vmalloc_fault() riscv/mm/fault: Move bad area handling to bad_area() ...
Merge tag 'core-build-2020-10-12' of git://git./linux/kernel/git/tip/tip Pull orphan section checking from Ingo Molnar: "Orphan link sections were a long-standing source of obscure bugs, because the heuristics that various linkers & compilers use to handle them (include these bits into the output image vs discarding them silently) are both highly idiosyncratic and also version dependent. Instead of this historically problematic mess, this tree by Kees Cook (et al) adds build time asserts and build time warnings if there's any orphan section in the kernel or if a section is not sized as expected. And because we relied on so many silent assumptions in this area, fix a metric ton of dependencies and some outright bugs related to this, before we can finally enable the checks on the x86, ARM and ARM64 platforms" * tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/boot/compressed: Warn on orphan section placement x86/build: Warn on orphan section placement arm/boot: Warn on orphan section placement arm/build: Warn on orphan section placement arm64/build: Warn on orphan section placement x86/boot/compressed: Add missing debugging sections to output x86/boot/compressed: Remove, discard, or assert for unwanted sections x86/boot/compressed: Reorganize zero-size section asserts x86/build: Add asserts for unwanted sections x86/build: Enforce an empty .got.plt section x86/asm: Avoid generating unused kprobe sections arm/boot: Handle all sections explicitly arm/build: Assert for unwanted sections arm/build: Add missing sections arm/build: Explicitly keep .ARM.attributes sections arm/build: Refactor linker script headers arm64/build: Assert for unwanted sections arm64/build: Add missing DWARF sections arm64/build: Use common DISCARDS in linker script arm64/build: Remove .eh_frame* sections due to unwind tables ...
riscv: Fixup bootup failure with HARDENED_USERCOPY 6184358da000 ("riscv: Fixup static_obj() fail") attempted to elide a lockdep failure by rearranging our kernel image to place all initdata within [_stext, _end], thus triggering lockdep to treat these as static objects. These objects are released and eventually reallocated, causing check_kernel_text_object() to trigger a BUG(). This backs out the change to make [_stext, _end] all-encompassing, instead just moving initdata. This results in initdata being outside of [__init_begin, __init_end], which means initdata can't be freed. Link: https://lore.kernel.org/linux-riscv/1593266228-61125-1-git-send-email-guoren@kernel.org/T/#t Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Reported-by: Aurelien Jarno <aurelien@aurel32.net> Tested-by: Aurelien Jarno <aurelien@aurel32.net> [Palmer: Clean up commit text] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
RISC-V: Add PE/COFF header for EFI stub Linux kernel Image can appear as an EFI application With appropriate PE/COFF header fields in the beginning of the Image header. An EFI application loader can directly load a Linux kernel Image and an EFI stub residing in kernel can boot Linux kernel directly. Add the necessary PE/COFF header. Signed-off-by: Atish Patra <atish.patra@wdc.com> Link: https://lore.kernel.org/r/20200421033336.9663-3-atish.patra@wdc.com [ardb: - use C prefix for c.li to ensure the expected opcode is emitted - align all image sections according to PE/COFF section alignment ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
vmlinux.lds.h: Split ELF_DETAILS from STABS_DEBUG The .comment section doesn't belong in STABS_DEBUG. Split it out into a new macro named ELF_DETAILS. This will gain other non-debug sections that need to be accounted for when linking with --orphan-handling=warn. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-arch@vger.kernel.org Link: https://lore.kernel.org/r/20200821194310.3089815-5-keescook@chromium.org
riscv: Fixup static_obj() fail When enable LOCKDEP, static_obj() will cause error. Because some __initdata static variables is before _stext: static int static_obj(const void *obj) { unsigned long start = (unsigned long) &_stext, end = (unsigned long) &_end, addr = (unsigned long) obj; /* * static variable? */ if ((addr >= start) && (addr < end)) return 1; [ 0.067192] INFO: trying to register non-static key. [ 0.067325] the code is fine but needs lockdep annotation. [ 0.067449] turning off the locking correctness validator. [ 0.067718] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc7-dirty #44 [ 0.067945] Call Trace: [ 0.068369] [<ffffffe00020323c>] walk_stackframe+0x0/0xa4 [ 0.068506] [<ffffffe000203422>] show_stack+0x2a/0x34 [ 0.068631] [<ffffffe000521e4e>] dump_stack+0x94/0xca [ 0.068757] [<ffffffe000255a4e>] register_lock_class+0x5b8/0x5bc [ 0.068969] [<ffffffe000255abe>] __lock_acquire+0x6c/0x1d5c [ 0.069101] [<ffffffe0002550fe>] lock_acquire+0xae/0x312 [ 0.069228] [<ffffffe000989a8e>] _raw_spin_lock_irqsave+0x40/0x5a [ 0.069357] [<ffffffe000247c64>] complete+0x1e/0x50 [ 0.069479] [<ffffffe000984c38>] rest_init+0x1b0/0x28a [ 0.069660] [<ffffffe0000016a2>] 0xffffffe0000016a2 [ 0.069779] [<ffffffe000001b84>] 0xffffffe000001b84 [ 0.069953] [<ffffffe000001092>] 0xffffffe000001092 static __initdata DECLARE_COMPLETION(kthreadd_done); noinline void __ref rest_init(void) { ... complete(&kthreadd_done); Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
riscv: Allow device trees to be built into the kernel Some systems don't provide a useful device tree to the kernel on boot. Chasing around bootloaders for these systems is a headache, so instead le't's just keep a device tree table in the kernel, keyed by the SOC's unique identifier, that contains the relevant DTB. This is only implemented for M mode right now. While we could implement this via the SBI calls that allow access to these identifiers, we don't have any systems that need this right now. Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
riscv: Add SOC early init support Add a mechanism for early SoC initialization for platforms that need additional hardware initialization not possible through the regular device tree and drivers mechanism. With this, a SoC specific initialization function can be called very early, before DTB parsing is done by parse_dtb() in Linux RISC-V kernel setup code. This can be very useful for early hardware initialization for No-MMU kernels booted directly in M-mode because it is quite likely that no other booting stage exist prior to the No-MMU kernel. Example use of a SoC early initialization is as follows: static void vendor_abc_early_init(const void *fdt) { /* * some early init code here that can use simple matches * against the flat device tree file. */ } SOC_EARLY_INIT_DECLARE("vendor,abc", abc_early_init); This early initialization function is executed only if the flat device tree for the board has a 'compatible = "vendor,abc"' entry; Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
RISC-V: Move relocate and few other functions out of __init The secondary hart booting and relocation code are under .init section. As a result, it will be freed once kernel booting is done. However, ordered booting protocol and CPU hotplug always requires these functions to be present to bringup harts after initial kernel boot. Move the required functions to a different section and make sure that they are in memory within first 2MB offset as trampoline page directory only maps first 2MB. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
riscv: add alignment for text, rodata and data sections The kernel mapping will tried to optimize its mapping by using bigger size. In rv64, it tries to use PMD_SIZE, and tryies to use PGDIR_SIZE in rv32. To ensure that the start address of these sections could fit the mapping entry size, make them align to the biggest alignment. Define a macro SECTION_ALIGN because the HPAGE_SIZE or PMD_SIZE, etc., are invisible in linker script. This patch is prepared for STRICT_KERNEL_RWX support. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>