Closes issue #5 : Added comments to virt.lds to explain what each section does.

2024-11-24 02:16:19 +04:00 · 2019-10-06 11:06:51 -04:00 · 2019-10-06 11:06:51 -04:00 · 0b2cbfcb35
commit 0b2cbfcb35
parent 33d27d9efa
2 changed files with 394 additions and 2 deletions
--- a/risc_v/ch0/virt.lds
+++ b/risc_v/ch0/virt.lds
@ -1,12 +1,76 @@
+/*
+ virt.lds
+ Linker script for outputting to RISC-V QEMU "virt" machine.
+ Stephen Marz
+ 6 October 2019
+*/
+
+/*
+  riscv is the name of the architecture that the linker understands
+  for any RISC-V target (64-bit or 32-bit).
+
+  We will further refine this by using -mabi=lp64 and -march=rv64gc
+*/
 OUTPUT_ARCH( "riscv" )

+/*
+We're setting our entry point to a symbol
+called _start which is inside of boot.S. This
+essentially stores the address of _start as the
+"entry point", or where CPU instructions should start
+executing.
+
+In the rest of this script, we are going to place _start
+right at the beginning of 0x8000_0000 because this is where
+the virtual machine and many RISC-V boards will start executing.
+*/
 ENTRY( _start )

+/*
+The MEMORY section will explain that we have "ram" that contains
+a section that is 'w' (writeable), 'x' (executable), and 'a' (allocatable).
+We use '!' to invert 'r' (read-only) and 'i' (initialized). We don't want
+our memory to be read-only, and we're stating that it is NOT initialized
+at the beginning.
+
+The ORIGIN is the memory address 0x8000_0000. If we look at the virt
+spec or the specification for the RISC-V HiFive Unleashed, this is the
+starting memory address for our code.
+
+Side note: There might be other boot ROMs at different addresses, but
+their job is to get to this point.
+
+Finally LENGTH = 128M tells the linker that we have 128 megabyte of RAM.
+The linker will double check this to make sure everything can fit.
+
+The HiFive Unleashed has a lot more RAM than this, but for the virtual 
+machine, I went with 128M since I think that's enough RAM for now.
+
+We can provide other pieces of memory, such as QSPI, or ROM, but we're
+telling the linker script here that we have one pool of RAM.
+*/
 MEMORY
 {
  ram   (wxa!ri) : ORIGIN = 0x80000000, LENGTH = 128M
 }

+/*
+PHDRS is short for "program headers", which we specify three here:
+text - CPU instructions (executable sections)
+data - Global, initialized variables
+bss  - Global, uninitialized variables (all will be set to 0 by boot.S)
+
+The command PT_LOAD tells the linker that these sections will be loaded
+from the file into memory.
+
+We can actually stuff all of these into a single program header, but by
+splitting it up into three, we can actually use the other PT_* commands
+such as PT_DYNAMIC, PT_INTERP, PT_NULL to tell the linker where to find
+additional information.
+
+However, for our purposes, every section will be loaded from the program
+headers.
+*/
 PHDRS
 {
  text PT_LOAD;
@ -14,23 +78,128 @@ PHDRS
  bss PT_LOAD;
 }

+/*
+We are now going to organize the memory based on which
+section it is in. In assembly, we can change the section
+with the ".section" directive. However, in C++ and Rust,
+CPU instructions go into text, global constants go into
+rodata, global initialized variables go into data, and
+global uninitialized variables go into bss.
+*/
 SECTIONS
 {
+  /*
+    The first part of our RAM layout will be the text section.
+	Since our CPU instructions are here, and our memory starts at
+	0x8000_0000, we need our entry point to line up here.
+  */
  .text : {
+	  /* 
+	    PROVIDE allows me to access a symbol called _text_start so
+		I know where the text section starts in the operating system.
+		This should not move, but it is here for convenience.
+		The period '.' tells the linker to set _text_start to the
+		CURRENT location ('.' = current memory location). This current
+		memory location moves as we add things.
+	  */
+
    PROVIDE(_text_start = .);
+	/*
+	  We are going to layout all text sections here, starting with 
+	  .text.init. The asterisk in front of the parentheses means to match
+	  the .text.init section of ANY object file. Otherwise, we can specify
+	  which object file should contain the .text.init section, for example,
+	  boot.o(.text.init) would specifically put the .text.init section of
+	  our bootloader here.
+
+	  Because we might want to change the name of our files, we'll leave it
+	  with a *.
+
+	  Inside the parentheses is the name of the section. I created my own
+	  called .text.init to make 100% sure that the _start is put right at the
+	  beginning. The linker will lay this out in the order it receives it:
+
+	  .text.init first
+	  all .text sections next
+	  any .text.* sections last
+
+	  .text.* means to match anything after .text. If we didn't already specify
+	  .text.init, this would've matched here. The assembler and linker can place
+	  things in "special" text sections, so we match any we might come across here.
+	*/
    *(.text.init) *(.text .text.*)
+
+	/*
+	  Again, with PROVIDE, we're providing a readable symbol called _text_end, which is
+	  set to the memory address AFTER .text.init, .text, and .text.*'s have been added.
+	*/
    PROVIDE(_text_end = .);
+	/*
+	  The portion after the right brace is in an odd format. However, this is telling the
+	  linker what memory portion to put it in. We labeled our RAM, ram, with the constraints
+	  that it is writeable, allocatable, and executable. The linker will make sure with this
+	  that we can do all of those things.
+
+	  >ram - This just tells the linker script to put this entire section (.text) into the
+	         ram region of memory. To my knowledge, the '>' does not mean "greater than". Instead,
+			 it is a symbol to let the linker know we want to put this in ram.
+
+	  AT>ram - This sets the LMA (load memory address) region to the same thing. LMA is the final
+	           translation of a VMA (virtual memory address). With this linker script, we're loading
+			   everything into its physical location. We'll let the kernel copy and sort out the 
+			   virtual memory. That's why >ram and AT>ram are continually the same thing.
+
+	  :text  - This tells the linker script to put this into the :text program header. We've only
+	           defined three: text, data, and bss. In this case, we're telling the linker script
+			   to go into the text section.
+	*/
  } >ram AT>ram :text
+   /*
+     The global pointer allows the linker to position global variables and constants into
+	 independent positions relative to the gp (global pointer) register. The globals start
+	 after the text sections and are only relevant to the rodata, data, and bss sections.
+   */
   PROVIDE(_global_pointer = .);
+   /*
+     Most compilers create a rodata (read only data) section for global constants. However,
+	 we're going to place ours in the text section. We can actually put this in :data, but
+	 since the .text section is read-only, we can place it there.
+
+	 NOTE: This doesn't actually do anything, yet. The actual "protection" cannot be done
+	 at link time. Instead, when we program the memory management unit (MMU), we will be
+	 able to choose which bits (R=read, W=write, X=execute) we want each memory segment
+	 to be able to do.
+   */
  .rodata : {
    PROVIDE(_rodata_start = .);
    *(.rodata .rodata.*)
    PROVIDE(_rodata_end = .);
+	/*
+	   Again, we're placing the rodata section in the memory segment "ram" and we're putting
+	   it in the :text program header. We don't have one for rodata anyway.
+	*/
  } >ram AT>ram :text

  .data : {
+	/*
+	   . = ALIGN(4096) tells the linker to align the current memory location (which is
+	   0x8000_0000 + text section + rodata section) to 4096 bytes. This is because our paging
+	   system's resolution is 4,096 bytes or 4 KiB.
+	*/
    . = ALIGN(4096);
    PROVIDE(_data_start = .);
+	/*
+	   sdata and data are essentially the same thing. However, compilers usually use the
+	   sdata sections for shorter, quicker loading sections. So, usually critical data
+	   is loaded there. However, we're loading all of this in one fell swoop.
+	   So, we're looking to put all of the following sections under the umbrella .data:
+	   .sdata
+	   .sdata.[anything]
+	   .data
+	   .data.[anything]
+
+	   ...in that order.
+	*/
    *(.sdata .sdata.*) *(.data .data.*)
    PROVIDE(_data_end = .);
  } >ram AT>ram :data
@ -41,9 +210,36 @@ SECTIONS
    PROVIDE(_bss_end = .);
  } >ram AT>ram :bss

+  /*
+     The following will be helpful when we allocate the kernel stack (_stack) and
+	 determine where the heap begnis and ends (_heap_start and _heap_start + _heap_size)/
+	 When we do memory allocation, we can use these symbols.
+
+	 We use the symbols instead of hard-coding an address because this is a floating target.
+	 As we add code, the heap moves farther down the memory and gets shorter.
+
+	 _memory_start will be set to 0x8000_0000 here. We use ORIGIN(ram) so that it will take
+	 whatever we set the origin of ram to. Otherwise, we'd have to change it more than once
+	 if we ever stray away from 0x8000_0000 as our entry point.
+  */
  PROVIDE(_memory_start = ORIGIN(ram));
+  /*
+     Our kernel stack starts at the end of the bss segment (_bss_end). However, we're allocating
+	 0x80000 bytes (524 KiB) to our kernel stack. This should be PLENTY of space. The reason
+	 we add the memory is because the stack grows from higher memory to lower memory (bottom to top).
+	 Therefore we set the stack at the very bottom of its allocated slot.
+	 When we go to allocate from the stack, we'll subtract the number of bytes we need.
+  */
  PROVIDE(_stack = _bss_end + 0x80000);
  PROVIDE(_memory_end = ORIGIN(ram) + LENGTH(ram));
+
+  /* 
+     Finally, our heap starts right after the kernel stack. This heap will be used mainly
+	 to dole out memory for user-space applications. However, in some circumstances, it will
+	 be used for kernel memory as well.
+
+	 We don't align here because we let the kernel determine how it wants to do this.
+  */
  PROVIDE(_heap_start = _stack);
  PROVIDE(_heap_size = _memory_end - _stack);
 }
--- a/risc_v/ch2/src/lds/virt.lds
+++ b/risc_v/ch2/src/lds/virt.lds
@ -1,12 +1,76 @@
+/*
+ virt.lds
+ Linker script for outputting to RISC-V QEMU "virt" machine.
+ Stephen Marz
+ 6 October 2019
+*/
+
+/*
+  riscv is the name of the architecture that the linker understands
+  for any RISC-V target (64-bit or 32-bit).
+
+  We will further refine this by using -mabi=lp64 and -march=rv64gc
+*/
 OUTPUT_ARCH( "riscv" )

+/*
+We're setting our entry point to a symbol
+called _start which is inside of boot.S. This
+essentially stores the address of _start as the
+"entry point", or where CPU instructions should start
+executing.
+
+In the rest of this script, we are going to place _start
+right at the beginning of 0x8000_0000 because this is where
+the virtual machine and many RISC-V boards will start executing.
+*/
 ENTRY( _start )

+/*
+The MEMORY section will explain that we have "ram" that contains
+a section that is 'w' (writeable), 'x' (executable), and 'a' (allocatable).
+We use '!' to invert 'r' (read-only) and 'i' (initialized). We don't want
+our memory to be read-only, and we're stating that it is NOT initialized
+at the beginning.
+
+The ORIGIN is the memory address 0x8000_0000. If we look at the virt
+spec or the specification for the RISC-V HiFive Unleashed, this is the
+starting memory address for our code.
+
+Side note: There might be other boot ROMs at different addresses, but
+their job is to get to this point.
+
+Finally LENGTH = 128M tells the linker that we have 128 megabyte of RAM.
+The linker will double check this to make sure everything can fit.
+
+The HiFive Unleashed has a lot more RAM than this, but for the virtual 
+machine, I went with 128M since I think that's enough RAM for now.
+
+We can provide other pieces of memory, such as QSPI, or ROM, but we're
+telling the linker script here that we have one pool of RAM.
+*/
 MEMORY
 {
  ram   (wxa!ri) : ORIGIN = 0x80000000, LENGTH = 128M
 }

+/*
+PHDRS is short for "program headers", which we specify three here:
+text - CPU instructions (executable sections)
+data - Global, initialized variables
+bss  - Global, uninitialized variables (all will be set to 0 by boot.S)
+
+The command PT_LOAD tells the linker that these sections will be loaded
+from the file into memory.
+
+We can actually stuff all of these into a single program header, but by
+splitting it up into three, we can actually use the other PT_* commands
+such as PT_DYNAMIC, PT_INTERP, PT_NULL to tell the linker where to find
+additional information.
+
+However, for our purposes, every section will be loaded from the program
+headers.
+*/
 PHDRS
 {
  text PT_LOAD;
@ -14,23 +78,128 @@ PHDRS
  bss PT_LOAD;
 }

+/*
+We are now going to organize the memory based on which
+section it is in. In assembly, we can change the section
+with the ".section" directive. However, in C++ and Rust,
+CPU instructions go into text, global constants go into
+rodata, global initialized variables go into data, and
+global uninitialized variables go into bss.
+*/
 SECTIONS
 {
+  /*
+    The first part of our RAM layout will be the text section.
+	Since our CPU instructions are here, and our memory starts at
+	0x8000_0000, we need our entry point to line up here.
+  */
  .text : {
+	  /* 
+	    PROVIDE allows me to access a symbol called _text_start so
+		I know where the text section starts in the operating system.
+		This should not move, but it is here for convenience.
+		The period '.' tells the linker to set _text_start to the
+		CURRENT location ('.' = current memory location). This current
+		memory location moves as we add things.
+	  */
+
    PROVIDE(_text_start = .);
+	/*
+	  We are going to layout all text sections here, starting with 
+	  .text.init. The asterisk in front of the parentheses means to match
+	  the .text.init section of ANY object file. Otherwise, we can specify
+	  which object file should contain the .text.init section, for example,
+	  boot.o(.text.init) would specifically put the .text.init section of
+	  our bootloader here.
+
+	  Because we might want to change the name of our files, we'll leave it
+	  with a *.
+
+	  Inside the parentheses is the name of the section. I created my own
+	  called .text.init to make 100% sure that the _start is put right at the
+	  beginning. The linker will lay this out in the order it receives it:
+
+	  .text.init first
+	  all .text sections next
+	  any .text.* sections last
+
+	  .text.* means to match anything after .text. If we didn't already specify
+	  .text.init, this would've matched here. The assembler and linker can place
+	  things in "special" text sections, so we match any we might come across here.
+	*/
    *(.text.init) *(.text .text.*)
+
+	/*
+	  Again, with PROVIDE, we're providing a readable symbol called _text_end, which is
+	  set to the memory address AFTER .text.init, .text, and .text.*'s have been added.
+	*/
    PROVIDE(_text_end = .);
+	/*
+	  The portion after the right brace is in an odd format. However, this is telling the
+	  linker what memory portion to put it in. We labeled our RAM, ram, with the constraints
+	  that it is writeable, allocatable, and executable. The linker will make sure with this
+	  that we can do all of those things.
+
+	  >ram - This just tells the linker script to put this entire section (.text) into the
+	         ram region of memory. To my knowledge, the '>' does not mean "greater than". Instead,
+			 it is a symbol to let the linker know we want to put this in ram.
+
+	  AT>ram - This sets the LMA (load memory address) region to the same thing. LMA is the final
+	           translation of a VMA (virtual memory address). With this linker script, we're loading
+			   everything into its physical location. We'll let the kernel copy and sort out the 
+			   virtual memory. That's why >ram and AT>ram are continually the same thing.
+
+	  :text  - This tells the linker script to put this into the :text program header. We've only
+	           defined three: text, data, and bss. In this case, we're telling the linker script
+			   to go into the text section.
+	*/
  } >ram AT>ram :text
+   /*
+     The global pointer allows the linker to position global variables and constants into
+	 independent positions relative to the gp (global pointer) register. The globals start
+	 after the text sections and are only relevant to the rodata, data, and bss sections.
+   */
   PROVIDE(_global_pointer = .);
+   /*
+     Most compilers create a rodata (read only data) section for global constants. However,
+	 we're going to place ours in the text section. We can actually put this in :data, but
+	 since the .text section is read-only, we can place it there.
+
+	 NOTE: This doesn't actually do anything, yet. The actual "protection" cannot be done
+	 at link time. Instead, when we program the memory management unit (MMU), we will be
+	 able to choose which bits (R=read, W=write, X=execute) we want each memory segment
+	 to be able to do.
+   */
  .rodata : {
    PROVIDE(_rodata_start = .);
    *(.rodata .rodata.*)
    PROVIDE(_rodata_end = .);
+	/*
+	   Again, we're placing the rodata section in the memory segment "ram" and we're putting
+	   it in the :text program header. We don't have one for rodata anyway.
+	*/
  } >ram AT>ram :text

  .data : {
+	/*
+	   . = ALIGN(4096) tells the linker to align the current memory location (which is
+	   0x8000_0000 + text section + rodata section) to 4096 bytes. This is because our paging
+	   system's resolution is 4,096 bytes or 4 KiB.
+	*/
    . = ALIGN(4096);
    PROVIDE(_data_start = .);
+	/*
+	   sdata and data are essentially the same thing. However, compilers usually use the
+	   sdata sections for shorter, quicker loading sections. So, usually critical data
+	   is loaded there. However, we're loading all of this in one fell swoop.
+	   So, we're looking to put all of the following sections under the umbrella .data:
+	   .sdata
+	   .sdata.[anything]
+	   .data
+	   .data.[anything]
+
+	   ...in that order.
+	*/
    *(.sdata .sdata.*) *(.data .data.*)
    PROVIDE(_data_end = .);
  } >ram AT>ram :data
@ -41,9 +210,36 @@ SECTIONS
    PROVIDE(_bss_end = .);
  } >ram AT>ram :bss

+  /*
+     The following will be helpful when we allocate the kernel stack (_stack) and
+	 determine where the heap begnis and ends (_heap_start and _heap_start + _heap_size)/
+	 When we do memory allocation, we can use these symbols.
+
+	 We use the symbols instead of hard-coding an address because this is a floating target.
+	 As we add code, the heap moves farther down the memory and gets shorter.
+
+	 _memory_start will be set to 0x8000_0000 here. We use ORIGIN(ram) so that it will take
+	 whatever we set the origin of ram to. Otherwise, we'd have to change it more than once
+	 if we ever stray away from 0x8000_0000 as our entry point.
+  */
  PROVIDE(_memory_start = ORIGIN(ram));
+  /*
+     Our kernel stack starts at the end of the bss segment (_bss_end). However, we're allocating
+	 0x80000 bytes (524 KiB) to our kernel stack. This should be PLENTY of space. The reason
+	 we add the memory is because the stack grows from higher memory to lower memory (bottom to top).
+	 Therefore we set the stack at the very bottom of its allocated slot.
+	 When we go to allocate from the stack, we'll subtract the number of bytes we need.
+  */
  PROVIDE(_stack = _bss_end + 0x80000);
  PROVIDE(_memory_end = ORIGIN(ram) + LENGTH(ram));
+
+  /* 
+     Finally, our heap starts right after the kernel stack. This heap will be used mainly
+	 to dole out memory for user-space applications. However, in some circumstances, it will
+	 be used for kernel memory as well.
+
+	 We don't align here because we let the kernel determine how it wants to do this.
+  */
  PROVIDE(_heap_start = _stack);
  PROVIDE(_heap_size = _memory_end - _stack);
 }