Homework 3
This homework will introduce you to compiling binaries for a different architecture, which is called cross compiling. It is based on Homework 2, with the exception of targeting an ARM virtual machine instead of x86_64.
Overview
- Documentation (Human Language)
- Preparation
- Assignments
- Result To Be Submitted (tracked by Git)
- Bonus Assignments
Documentation (Human Language)
A non-technical assignment is to write documentation and answer questions. You can use either English or German while working on technical assignments. We suggest English, both for practice and to avoid awkward mixtures of English technical terms with German ;-)
hw3/QnA.md Question Boxes
Throughout the document you will find several question boxes.
These questions are meant to help you think through what you did and how you can solve the current part of the assignment.
Please keep a protocol for answering these questions in your project repository at $REPO_DIR/hw3/QnA.md.
The file must contain the questions with brief answers written in your own words.
hw3/README.md (optional)
You can use this document to make the following notes:
- difficulties throughout the homework
- design decisions that are necessary to explain or you think are important to emphasize
Preparation
- Complete Homework 2
- Have SSH/X2Go access to your group's syslab container
Skills You Will Acquire
During this assignment you'll gain experience in the following activities:
- Strengthen the skills learned in the previous homeworks
- Configuring the Linux Kernel for the qemu-system-aarch64 (arm64) virtual machine architecture
- Cross Compiling the Linux Kernel, the sysinfo application, busybox, dropbear
- Using qemu to emulate the aarch64 architecture as a virtual machine and userspace emulator
Bonus only:
- Learn more about the dynamic linking with the GCC toolchain
- Learn how to deploy an init system instead of a script based init procedure
[warning] Bonus Assignment Information
The first bonus assignment contains significant changes for this homework, as it switches from static to dynamic linking of libraries which need to be copied to the InitRamDisk. It is suggested that you decide upfront if you want to do the first bonus assignment, as it affects every component of this homework except for the Linux kernel.
Pre-Requisites
Before you proceed, please research the following topics. There is no need to get into great detail at this point; simply get an overview of what they are and how they are related.
- The ARM aarch64 architecture
- qemu(-system)-aarch64 Emulator
  - https://qemu.weilnetz.de/doc/qemu-doc.html#ARM-System-emulator
    Unfortunately the official documentation site doesn't cover aarch64 yet, so please use the -arm target as an example. For this homework it is important to understand that the machine (-M / -machine) and CPU (-cpu) settings let you emulate different hardware models within a given architecture. This is significant for your kernel configuration, which must support the hardware that is emulated. Use help as the machine and CPU type to get a list of supported models.
  - https://qemu.weilnetz.de/doc/qemu-doc.html#QEMU-User-space-emulator
  - http://wiki.qemu-project.org/Documentation/Networking#User_Networking_.28SLIRP.29
- Cross Compilation: http://wiki.osdev.org/GCC_Cross-Compiler#Introduction
  (You don't need to download a cross-compiler on the containers)
- Cross Kernel Configuration and Compilation
  The Gentoo wiki has a nice article on how to configure a kernel for the Raspberry Pi 3 board, which is an aarch64 platform.
Bonus only:
- Linux dynamic linker
Inspecting the toolchains
Please take a closer look at the main components of the toolchain that humans might interact with directly: gcc and ld.
[success] Questions
- What is the difference between the commands gcc and aarch64-linux-gnu-gcc?
- What is the difference between the commands ld and aarch64-linux-gnu-ld?
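One way to start the inspection is to ask each tool what it was built for. The sketch below assumes GNU tools: `-dumpmachine` is a GCC option that prints the target triplet, and `-V` is a GNU ld option that prints the version and the supported emulations. The helper names `describe_compiler` and `describe_linker` are made up for this example.

```shell
#!/bin/sh
# Print the target triplet a compiler driver was configured for.
describe_compiler() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s targets %s\n' "$1" "$("$1" -dumpmachine)"
    else
        printf '%s: not found in PATH\n' "$1"
    fi
}

# Print the linker version and the emulations (output formats) it supports.
describe_linker() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" -V
    else
        printf '%s: not found in PATH\n' "$1"
    fi
}

# Example calls:
#   describe_compiler gcc
#   describe_compiler aarch64-linux-gnu-gcc
#   describe_linker ld
#   describe_linker aarch64-linux-gnu-ld
```

Comparing the output of the native and the prefixed tool side by side should give you material for the answers in your own words.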
Assignments
The following table lists the assignments and their interdependencies.
Assignment | Dependencies |
---|---|
Linux Kernel | - |
Busybox | - |
Dropbear | - |
InitRamDisk | Busybox, Dropbear |
qemu for aarch64 (network enabled) | Kernel, InitRamDisk |
The goals for this homework are the same as for homework 2 with the following exceptions:
- The sysinfo receives a slight modification
- The compilation and emulation target platform is aarch64
Bonus only:
- All binaries are produced with dynamic linking, which requires you to install the libraries on the target system
Linux Kernel
Even though the configuration from homework 2 is for the x86_64 architecture, it can be re-used in this assignment. It will be transformed into an aarch64 configuration (arm64 in kernel build system slang) when menuconfig is called with the environment variables described below.
Configure Linux for a different architecture
To change the architecture of the kernel menuconfig, you need to supply the ARCH and CROSS_COMPILE environment variables accordingly.
The link to the Gentoo wiki in the Pre-Requisites demonstrates the procedure to set these variables.
[info] Use the linked wiki as an information source only, not as a tutorial!
- The target platform for this assignment is not the Raspberry Pi 3, because it cannot be emulated by Qemu. Instead, you will choose a machine and CPU type provided by Qemu
- You will use a different toolchain prefix and obviously won't install Gentoo in the VM ;-)
- Take note of the unintuitive naming scheme (aarch64/arm64/armv8)
For starters, place the homework 2 kernel config in the kernel source directory and run menuconfig. Alternatively you can also use the load config option in the menuconfig.
Don't change the configuration manually, just exit and confirm the save.
[success] Questions Please look at the diff between the newly saved kernel config and the one from homework 2.
- Which configure option(s) reflect(s) the architecture change?
- Which ARM platform has been activated by default?
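The configuration step can be sketched as a small wrapper around make. This is a minimal sketch under assumptions: `KERNEL_SRC` and its default path are hypothetical, the toolchain prefix `aarch64-linux-gnu-` is derived from the compiler name used in this homework, and `kernel_make` is a name made up for this example.

```shell
#!/bin/sh
# Assumption: KERNEL_SRC points at the unpacked kernel source tree.
KERNEL_SRC=${KERNEL_SRC:-./linux}

# Run a kernel make target with the cross-compilation variables set.
# With DRY_RUN set, the command line is only printed.
kernel_make() {
    set -- make -C "$KERNEL_SRC" ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Possible sequence (the homework 2 config path is an assumption):
#   cp ../hw2/kernel/config "$KERNEL_SRC/.config"
#   kernel_make menuconfig        # exit and confirm the save
#   diff -u ../hw2/kernel/config "$KERNEL_SRC/.config"
```

Passing the variables on the make command line has the same effect as exporting them into the environment, and it keeps them from leaking into unrelated builds.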
Enable The Correct Serial Device For Console output
qemu-system-aarch64 emulates a different hardware architecture and also has different peripheral components than the emulator for x86_64. These peripherals are not automatically enabled by changing the config's architecture.
One of the important devices that your kernel needs to support is the PrimeCell PL011 UART controller, which you will use for serial console output.
Enable the Generic PCI Host controller
On aarch64, the option CONFIG_PCI_HOST_GENERIC is required for qemu's virtio-net to work.
[success] Questions
- What is the name of the serial port that you need to pass to console=... in order to get console output on the PL011 device?
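After saving the configuration, it can help to verify that the required options really ended up in the file. The following sketch greps a config file for built-in (`=y`) options. `check_kernel_config` is a name made up for this example; `CONFIG_SERIAL_AMBA_PL011` and `CONFIG_SERIAL_AMBA_PL011_CONSOLE` are the kernel's symbols for the PL011 driver and its console support.

```shell
#!/bin/sh
# Report whether each given option is built in (=y) in a kernel config file.
# Returns non-zero if at least one option is missing.
check_kernel_config() {
    config_file=$1
    shift
    missing=0
    for opt in "$@"; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            printf 'ok       %s\n' "$opt"
        else
            printf 'MISSING  %s\n' "$opt"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (the config path follows the directory structure of this homework):
#   check_kernel_config hw3/kernel/config \
#       CONFIG_SERIAL_AMBA_PL011 \
#       CONFIG_SERIAL_AMBA_PL011_CONSOLE \
#       CONFIG_PCI_HOST_GENERIC
```

A check like this could also run at the start of your build script, so that a broken config fails early instead of producing a kernel without console output.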
Busybox
For this assignment you don't need any new busybox applets.
Dropbear SSH
The options.h file for dropbear can remain the same for this assignment.
Configuration
In addition to the configuration from homework 2, the following option needs to be passed to ./configure:
--host=aarch64-linux-gnu
[success] Questions
- What is the meaning of the host option for this configure script?
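A build step for dropbear might then look like the sketch below. It is only an illustration: `build_dropbear` and `run` are helper names made up here, the `PROGRAMS` list is an assumption, and any further configure options from homework 2 are simply forwarded as arguments.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Configure and build dropbear as a static multi-call binary for aarch64.
# Extra arguments are forwarded to ./configure unchanged.
build_dropbear() {
    run ./configure --host=aarch64-linux-gnu "$@" &&
    run make PROGRAMS="dropbear dbclient dropbearkey scp" MULTI=1 STATIC=1
}

# Example call from within the dropbear source directory:
#   build_dropbear
```

Setting `DRY_RUN=1` prints the two commands without executing them, which is handy for checking the argument list before a long build.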
Sysinfo Application Extended
Please extend your sysinfo application from homework 1 to print out the following information:
---- gethostname Information ----
Hostname: grp0
---- sysinfo Information ----
Uptime: 2 seconds
Process count: 20
Total RAM: 46292992 Byte
Free RAM: 39907328 Byte
Page size: 1 Byte
---- utsname Information ----
system name: Linux
node name: grp0
release: 4.11.0
version: #1 SMP Sat May 6 11:04:10 UTC 2017
machine: aarch64
The information from utsname is retrieved by the uname() library function, which is used by the uname command line tool you already learned about.
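Cross-compiling the program and trying it out on the development host could be sketched as follows. The helper names and the file paths in the example calls are assumptions; `qemu-aarch64` is qemu's userspace emulator, which can run a statically linked aarch64 binary directly.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Cross-compile a single C file into a statically linked aarch64 binary.
# $1: source file, $2: output file
cross_compile_static() {
    src=$1
    out=$2
    run aarch64-linux-gnu-gcc -static -Wall -o "$out" "$src"
}

# Execute an aarch64 binary on the host via the userspace emulator.
run_emulated() {
    run qemu-aarch64 "$@"
}

# Example calls (paths are assumptions):
#   cross_compile_static hw3/sysinfo/src/sysinfo.c hw3/artifacts/sysinfo
#   run_emulated hw3/artifacts/sysinfo
```

Note that under the userspace emulator the program talks to the host kernel, so values such as the uptime or the hostname will describe your container rather than the VM.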
InitRamDisk
The initial ramdisk for this assignment will include the same components as in Homework 2, this time compiled for the aarch64 architecture.
Dropbear dlopen()'s some libraries (Reminder)
At runtime, the following libraries will be needed once you try to login to the SSH server:
- ld-linux-aarch64.so.1
- libc.so.6
- libnss_files.so.2
If these libraries are not available at runtime, the login to the SSH server cannot work.
As in homework 2, you can get the absolute path for each library within the development environment using the command aarch64-linux-gnu-gcc -print-file-name=$library_name.
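The lookup and the copy can be combined in a small helper. This is a sketch under assumptions: `copy_runtime_libs` and the `CROSS_CC` variable are names made up for this example, and the destination directory in the example call follows the initrd layout of this homework.

```shell
#!/bin/sh
# Compiler used to resolve library paths (can be overridden).
CROSS_CC=${CROSS_CC:-aarch64-linux-gnu-gcc}

# Copy the named libraries from the cross toolchain into a directory.
# $1: destination directory, remaining arguments: library file names
copy_runtime_libs() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for lib in "$@"; do
        path=$("$CROSS_CC" -print-file-name="$lib")
        # gcc echoes the bare name back when it cannot find the file.
        if [ "$path" = "$lib" ] || [ ! -f "$path" ]; then
            printf 'not found: %s\n' "$lib" >&2
            return 1
        fi
        # -L follows symlinks, so the real file is stored under the requested name.
        cp -L "$path" "$dest/$lib" || return 1
    done
}

# Example call:
#   copy_runtime_libs hw3/initrd/lib ld-linux-aarch64.so.1 libc.so.6 libnss_files.so.2
```

Dereferencing symlinks matters here because the toolchain usually stores versioned files and links the short names to them; copying only the link would leave a dangling entry in the initrd.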
The init file
The init file in the initrd can be identical to the one in homework 2.
Run Linux in a Virtual Machine (with qemu)
The qemu command line arguments differ from homework 2 in the following ways:
- You invoke the aarch64 system emulator now
- You need to provide a machine type and a CPU model. It's recommended to use virt as the machine type.
[success] Questions
- Please explain your choice for the machine and CPU types.
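The resulting invocation could look roughly like the sketch below. The CPU model and the serial console name are deliberately left as parameters, since choosing them is part of the questions. Everything else is an assumption for illustration: the function name `run_qemu`, the artifact paths, the host port 2222 for the SSH forward, and the use of `-nographic` to connect the serial port to the terminal.

```shell
#!/bin/sh
# Boot the cross-compiled kernel in the aarch64 system emulator.
# $1: CPU model (list candidates with: qemu-system-aarch64 -M virt -cpu help)
# $2: serial device name for the kernel's console= parameter
run_qemu() {
    cpu=$1
    console=$2
    set -- qemu-system-aarch64 \
        -M virt -cpu "$cpu" -nographic \
        -kernel hw3/artifacts/Image.gz \
        -initrd hw3/artifacts/initrd.cpio \
        -append "console=$console" \
        -netdev user,id=net0,hostfwd=tcp::2222-:22 \
        -device virtio-net-pci,netdev=net0
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"      # only print the command line
    else
        "$@"
    fi
}

# Example call (placeholders, not real values):
#   run_qemu <cpu-model> <serial-device>
```

`qemu-system-aarch64 -M help` lists the available machine types in the same way. The `hostfwd` rule makes the VM's SSH port reachable on the host's port 2222 through user networking.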
Result To Be Submitted (tracked by Git)
This section tells you exactly which files are part of the submission for this homework. Results for bonus assignments are not covered within this section.
Test-Suite and Continuous Integration
Merge and run the Continuous Integration test suite.
Directory Structure
This is an example of how your submitted (tracked by git) files could be structured:
.
├── .travis.yml
├── ci
│   └── ...
└── hw3
    ├── busybox
    │   └── config
    ├── dropbear
    │   └── options.h
    ├── hw3.sh
    ├── initrd
    │   ├── bin
    │   │   ├── busybox -> ../../artifacts/busybox
    │   │   ├── dropbearmulti -> ../../artifacts/dropbearmulti
    │   │   └── sysinfo -> ../../artifacts/sysinfo
    │   ├── etc
    │   │   ├── group
    │   │   └── passwd
    │   ├── init
    │   └── lib
    ├── kernel
    │   └── config
    └── sysinfo
        └── src
            ├── Makefile
            └── sysinfo.c
Documentation files
(not shown in the above tree) Please include the documentation files that are explained in the beginning.
Build Instructions (hw3/hw3.sh)
A shell script that reproduces the final result of your homework. This shell script will be used to verify your results, and does not need to include commands that run the interactive menuconfig. However, you may implement such functionality for working conveniently within your homework repository.
Arguments and Script behavior
Arguments | Function |
---|---|
(called without any arguments) | Build all artifacts starting with just the files that are checked in to git |
qemu | Run qemu-system-aarch64, booting your system with the initrd and network |
clean | Remove all files not tracked by git |
ssh_cmd cmd [args...] | Establish a connection to the VM's SSH server via authorized_key authentication and run the specified command with all arguments inside the VM. An example would be ssh_cmd "echo Hello, World", as found in the CI scripts. |
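The argument handling from the table can be sketched as a dispatcher function. This is a skeleton under assumptions, not a reference solution: the function names, the key file location `hw3/ssh_key`, and the forwarded host port 2222 are hypothetical, and `build_all` and `run_qemu` are placeholders for your own build and emulator commands.

```shell
#!/bin/sh
SSH_PORT=${SSH_PORT:-2222}        # assumption: host port forwarded to the VM's port 22
SSH_KEY=${SSH_KEY:-hw3/ssh_key}   # hypothetical location of the private key

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

build_all() { echo "placeholder: build kernel, busybox, dropbear, sysinfo, initrd"; }
run_qemu()  { echo "placeholder: start qemu-system-aarch64"; }
clean()     { run git clean -fdx; }   # removes every untracked and ignored file
ssh_cmd()   { run ssh -i "$SSH_KEY" -p "$SSH_PORT" root@localhost "$@"; }

hw3_main() {
    case ${1:-} in
        '')      build_all ;;
        qemu)    run_qemu ;;
        clean)   clean ;;
        ssh_cmd) shift; ssh_cmd "$@" ;;
        *)       echo "usage: hw3.sh [qemu|clean|ssh_cmd cmd [args...]]" >&2
                 return 2 ;;
    esac
}

# In the real script the last line would be:
#   hw3_main "$@"
```

Be careful with `git clean -fdx` while developing: it also deletes new files that you have not added to git yet.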
SSH Key For Authorization
(not shown in the above tree)
The SSH key file that can be used to authenticate as root in your virtual machine.
Build Artifacts (Binary Files)
The following files must be present in hw3/artifacts/ after the build is complete, but must not be tracked by git!
File | Target Architecture | Purpose |
---|---|---|
Image.gz | aarch64 (arm64) | Cross-compiled Kernel binary used for qemu |
sysinfo | aarch64 | Cross-compiled, statically linked binary of your little C program |
dropbearmulti | aarch64 | Cross-compiled, statically linked multi-call binary for Dropbear |
initrd.cpio | aarch64 | Initial RamDisk file in the form of a cpio archive |
Other Files and Git
[warning] Please do not add binary files to your git repository. Only add files that represent configuration, source code and build commands.
Bonus Assignments
These optional assignments allow you to dig in a little deeper! They don't depend on each other so you can cherry-pick the ones you are interested in.
Bonus 1: Use dynamic linking for all binaries
Instead of statically linking the libraries into each binary, and thereby storing redundant binary code on the target system, the libraries can be linked dynamically and stored as shared libraries. In this bonus assignment you will learn a lot about the internals of the ELF binary format and how dynamic linking works on Linux.
Sysinfo Application Extended
Please omit the -static argument for the compiler to turn off static linking.
Busybox
The difference here is that you have to disable the setting that causes a static build.
Please consult the busybox Makefile to see how cross compilation works with the busybox buildsystem.
Dropbear
Please compile dropbear as a dynamically linked multi-call binary this time by omitting STATIC=1 from the make call.
InitRamDisk
The initial ramdisk for this assignment will include all dynamic libraries required by the contained binaries. The most difficult part is to find these libraries, because they are scattered all over the filesystem.
[success] Questions By now you should have dynamically linked and cross-compiled binaries for busybox, sysinfo, and dropbearmulti in your artifacts directory.
- Why does running ldd against these binaries not tell you that they are dynamically linked?
- How else can you verify this, and in addition find out the architecture they are built for?
Locating and copying the required shared objects
If you use the file program on the binaries, you can see the path of their dynamic interpreter, which is the program that locates the required libraries at runtime.
The dynamic interpreter itself can also be used to display information about dynamically linked binaries.
This method allows you to find the absolute paths for libraries you need to install in your initrd.
The method's rationale is quite simple:
- Get the dynamic linker path used by the binary
- Use the dynamic linker to print information about the binary
Example: Inspecting the bash executable
$ file $(type -P bash)
/bin/bash: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=4be0cc32aba02ec4e0f010047be5ae9dee756960, stripped
$ ldd $(which bash)
linux-vdso.so.1 (0x00007ffc01133000)
libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f4f8a559000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4f8a355000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4f89fb6000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4f8a783000)
The first command displays the interpreter, which is the dynamic linker for x86_64 binaries in this case. The second command uses said dynamic linker to resolve and list all dependencies. The interpreter and all dynamically linked libraries need to be available at runtime on the target system, in order for the executable to work. If you just copy the executable and try to run it there will be an error.
Based on this method you can retrieve the list of shared object files for the binaries in your initrd.
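As a cross-check that does not need to execute anything, the ELF headers can also be read statically. This is a different technique from the one described above: the sketch uses `readelf` from binutils, which can parse binaries of foreign architectures. The helper names and the `READELF` variable are made up for this example.

```shell
#!/bin/sh
# readelf binary to use (can be overridden, e.g. with a prefixed variant).
READELF=${READELF:-readelf}

# List the shared libraries a binary names directly (DT_NEEDED entries).
needed_libs() {
    "$READELF" -d "$1" | sed -n 's/.*(NEEDED).*\[\(.*\)\].*/\1/p'
}

# Print the dynamic interpreter a binary requests (PT_INTERP segment).
interpreter_of() {
    "$READELF" -l "$1" | sed -n 's/.*Requesting program interpreter: \(.*\)\].*/\1/p'
}

# Example calls (the path is an assumption):
#   needed_libs hw3/artifacts/busybox
#   interpreter_of hw3/artifacts/busybox
```

Unlike the dynamic linker, this only shows direct dependencies and bare library names. Dependencies of the libraries themselves and the absolute paths still have to be resolved separately.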
[success] Questions
- What error message do you receive when you try to run an executable and forgot to copy the executable's interpreter into the InitRamDisk?
- What is the interpreter for your cross compiled binary files?
- Why can you not run the dynamic linker for those binaries directly in your container?
- How can qemu's userspace emulator help you out?
Dropbear
At runtime, the following libraries will be needed once you try to login to the SSH server:
- libnss_files.so.2
Compared to the static compilation scenario, this list is shorter: libraries that are properly linked become visible dependencies and are handled by the mechanism that copies the required shared libraries.
You will not use the environment variable LD_LIBRARY_PATH in this case, but rely on the rpath which is stored in the executable file itself. The dynamic linker will search all paths in rpath automatically when a dynamically linked binary is invoked.
Bonus 2: Use busybox's init as PID1 in your system
The regular assignments instruct you to use a simple shell script as the program that controls the init procedure. On production systems this is handled by a process supervisor, that is configurable rather than scriptable. Activate busybox's init system and integrate it into your initrd. This includes setting up an inittab. Please configure this so that it automatically spawns a (login) shell on the serial port.
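A starting point for the inittab could be generated like this. It is a minimal sketch under assumptions: `write_inittab` is a name made up for this example, `/etc/init.d/rcS` is a hypothetical script that would take over the setup steps from your previous init script, and the serial device name is passed in as a parameter. In busybox's inittab format, a leading `-` before the program name requests a login shell.

```shell
#!/bin/sh
# Write a minimal busybox inittab into an initrd tree.
# $1: initrd root directory, $2: serial device name without the /dev/ prefix
write_inittab() {
    root=$1
    tty=$2
    mkdir -p "$root/etc" || return 1
    cat > "$root/etc/inittab" <<EOF
# <id>::<action>:<process>  (busybox ignores the runlevel field)
::sysinit:/etc/init.d/rcS
${tty}::respawn:-/bin/sh
EOF
}

# Example call (placeholder for the device name):
#   write_inittab hw3/initrd <serial-device>
```

For busybox to act as PID 1, the initrd's init entry would presumably have to resolve to the busybox binary with the applet name init, for example through a symlink.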