Pretty shell prompts
I’ve previously talked about doing some simple software packaging for other people’s projects, where the maintainer doesn’t currently generate any packages I can use across debian-based systems.
As a big fan of powerline-shell, I wanted to see if I could distribute this in a usable form to the systems that I frequently log into. There aren’t many, but they do vary in power and architecture (some x86_64, some ARMv7, some ARMv6 etc). The idea behind powerline-shell is that you include its output as part of your shell’s prompt output, and it gives you a useful (and pretty!) prompt line. It’s trivial, but I find it useful and pleasing.
The problem here is that it effectively executes every single time the shell decides to give you a prompt, which is mostly after every command. Anyone who’s played an online FPS game, or worked with sound recording, can tell you that for things the brain expects to happen immediately, it’s pretty good at noticing that the thing isn’t actually “immediate” after delays of tens of milliseconds, if not less. So when you’re working at a shell, you expect a prompt to arrive essentially immediately after the last command finishes running. Type ls and the output along with the prompt for the next command are expected to appear at the same time. If you need to run an entire process to figure out how that prompt should look, it had better be fast.
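For context, this is roughly how a prompt generator gets hooked into the shell. It’s a minimal sketch, assuming bash (which runs whatever is in PROMPT_COMMAND before printing each prompt) and assuming the generator accepts the previous exit status as its first argument. The PROMPT_BIN variable and the _update_ps1 function name are illustrative choices of mine.

```shell
# Hypothetical variable: which prompt generator to run.
PROMPT_BIN="${PROMPT_BIN:-powerline-shell}"

# Rebuild PS1 by running the generator, passing it the exit status of the
# last command so the prompt can reflect failures.
_update_ps1() {
    PS1="$("$PROMPT_BIN" $?)"
}

# bash evaluates PROMPT_COMMAND before printing each prompt, so the
# generator process is started once per prompt.
PROMPT_COMMAND="_update_ps1"
```

Every prompt therefore costs one full process start-up plus whatever work the generator does, which is why its run time matters so much.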
So let’s see how powerline-shell performs. On my laptop (recent MBP, python 3.7.7):
[mac]$ time powerline-shell
real 0m0.096s
user 0m0.066s
sys 0m0.032s
On a small cloud VPS running debian:
[cloud-host]$ time powerline-shell
real 0m0.081s
user 0m0.058s
sys 0m0.022s
Not too bad. 80-90ms is probably ok-enough. Let’s see how it runs on a Raspberry Pi 2 (arm7, hf) with an NFS root.
[rpi 2]$ time powerline-shell
real 0m1.019s
user 0m0.760s
sys 0m0.248s
Oh. How about an rPi zero (arm6, hf)?
[rpi-zero]$ time powerline-shell
real 0m1.401s
user 0m1.136s
sys 0m0.203s
Hmm. Waiting a second for your shell to be responsive after running a command is going to send you quietly insane.
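A single run of time can be noisy (disk caches, NFS, other load), so a steadier figure comes from timing several runs and dividing. The helper below is a minimal sketch of that idea; run_n is a name I’ve made up for illustration.

```shell
# run_n N CMD [ARGS...]: run CMD N times, discarding its output and
# ignoring its exit status.
run_n() {
    n="$1"
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || true
        i=$((i + 1))
    done
}

# Usage (bash): time 20 runs, then divide the "real" figure by 20.
#   time run_n 20 powerline-shell
```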
Golang to the rescue!?
Computers are pretty quick these days, but using python to write a program that has to execute very quickly doesn’t seem like a good idea. Thankfully, some other people on the internet have had a similar thought: someone’s got a decent version available in golang. Let’s see how that does.
[mac]$ time powerline-go
real 0m0.030s
user 0m0.013s
sys 0m0.021s

[cloud-host]$ time powerline-go
real 0m0.014s
user 0m0.011s
sys 0m0.003s

[rpi 2]$ time powerline-go
real 0m0.541s
user 0m0.061s
sys 0m0.126s

[rpi-zero]$ time powerline-go
real 0m0.264s
user 0m0.125s
sys 0m0.128s
Pretty good! This is getting towards usable, although a 250ms wait on the rpi-zero would start to get annoying. I had a look at the golang code and did some profiling, but it didn’t seem there were any obvious places in which latency could be improved.
How about Rust?
Then I discovered that someone had done a port in rust. Intrigued, I wanted to find out how this performed. I’ve been dabbling in rust recently and seem to be going through the same learning curve as most other people. Initial excitement at how good the language and stdlib feel (along with the excellent tooling) followed by complete despair at my inability to write even the most simple program without the compiler complaining about literally everything. Without sounding too Stockholm-y, this isn’t rust’s fault - it’s just enforcing some pretty simple rules about memory management. It’s actually my fault for not understanding what I actually want to do.
Anyhow, the other reason for wanting to play with it is that it compiles straight to native code with only a minimal runtime. Unlike python, it doesn’t need an interpreter, and unlike golang, it doesn’t bundle a garbage collector and scheduler into every binary, which potentially lends itself well towards higher performance and lower latency applications. People are calling it the “new C” (but without all the annoying unsafe foot-gun bits that C also comes with).
So in theory, a version of powerline-shell in rust should perform pretty well. But first we have to build it.
Compile for everything!
One thing that a lot of “newer” languages and environments have in common is a recognition that sometimes the type of computer you’re developing / building your application on is not the same as what it will eventually run on. Hence they make cross-compiling very straightforward (at least in theory).
In golang, cross-compilation is very easy: just set the GOOS and GOARCH environment variables before building and the output you’ll end up with is a binary that will execute on that platform. This is simple because the Go toolchain is self-contained: the compiler can generate native code for every supported target, and the runtime is linked statically into the binary, so no external cross toolchain is needed (at least for pure Go code that doesn’t use cgo).
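As a rough illustration for the machines in this post, here is a sketch that maps a Linux machine type (as printed by uname -m) to the matching Go settings. GOOS, GOARCH and GOARM are real Go environment variables; the go_env_for helper is a name I’ve invented, and the table only covers the architectures mentioned here.

```shell
# go_env_for MACHINE: print the Go cross-compilation settings for a Linux
# machine type as reported by `uname -m`.
go_env_for() {
    case "$1" in
        x86_64)  echo "GOOS=linux GOARCH=amd64" ;;
        aarch64) echo "GOOS=linux GOARCH=arm64" ;;
        armv7l)  echo "GOOS=linux GOARCH=arm GOARM=7" ;;
        armv6l)  echo "GOOS=linux GOARCH=arm GOARM=6" ;;
        *)       echo "unknown machine type: $1" >&2; return 1 ;;
    esac
}

# Usage, run from the project's source directory:
#   env $(go_env_for armv6l) go build
```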
In rust, it’s a little more complex. The compiler has to output the actual machine code that will execute on the target architecture, in a format that can be run by the target OS. The process is described pretty well here. The cargo utility is what takes care of building and managing the dependencies of any rust project. In theory, cross-compiling is as simple as specifying the target that you want to build for when executing cargo build. So let’s try and build for arm-unknown-linux-musleabihf (I’m using musl because it’s a little lighter than glibc so should run faster - I should test this though).
$ cargo build --target=arm-unknown-linux-musleabihf
...
the `arm-unknown-linux-musleabihf` target may not be installed

Ok, so we install with rustup:

$ rustup target add arm-unknown-linux-musleabihf
$ cargo build --target=arm-unknown-linux-musleabihf
...
error occurred: Failed to find tool. Is `arm-linux-musleabihf-gcc` installed?
Ah, it’s looking for a compiler that can compile some of the dependencies for the requested target. The problem I have is that I’m building this on Ubuntu Bionic (18.04), and this doesn’t have any package that provides the musl version of armhf gcc. However, it does offer gcc-8-arm-linux-gnueabihf (the glibc version), and musl-tools supplies a script called musl-gcc which sets some arguments pointing to the musl libs and then executes the actual CC. In cargo, we can tell it to use a specific compiler by overriding TARGET_CC. We also need to set the environment variable REALGCC to be the actual cross-compiler we want to use, as this is then invoked by musl-gcc. The default (I think) is just to invoke the gcc for whatever the current arch is, and we don’t want that (we want ARM!).
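The fallback behaviour described above is the usual shell “use this variable if set, otherwise a default” pattern. The snippet below is only an illustrative stand-in for that pattern, not the contents of the real musl-gcc script; pick_cc is a made-up name.

```shell
# Illustrative stand-in for a compiler wrapper, NOT the real musl-gcc script:
# print the compiler named by REALGCC if it is set and non-empty, otherwise
# fall back to plain gcc.
pick_cc() {
    echo "${REALGCC:-gcc}"
}

# Usage:
#   REALGCC=arm-linux-gnueabihf-gcc-8
#   pick_cc
```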
$ REALGCC=arm-linux-gnueabihf-gcc-8 TARGET_CC=musl-gcc cargo build --target=arm-unknown-linux-musleabihf
...
error: linking with `cc` failed: exit code: 1
/root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/arm-unknown-linux-musleabihf/lib/crt1.o: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
Oh, the linker has failed. So the compiler worked! Progress!
More magic variables are needed. Cargo needs to be told what linker to use for each target if it’s not the default linker. So we set CARGO_TARGET_ARM_UNKNOWN_LINUX_MUSLEABIHF_LINKER to be the cross-suite’s arm linker:
$ CARGO_TARGET_ARM_UNKNOWN_LINUX_MUSLEABIHF_LINKER=arm-linux-gnueabihf-ld REALGCC=arm-linux-gnueabihf-gcc-8 TARGET_CC=musl-gcc cargo build --target=arm-unknown-linux-musleabihf
...
Finished dev [unoptimized + debuginfo] target(s) in 1.19s
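The variable name follows a pattern: CARGO_TARGET_, then the target triple in upper case with dashes and dots turned into underscores, then _LINKER. A small sketch that derives it for any triple (cargo_linker_var is a helper name I’ve made up):

```shell
# cargo_linker_var TRIPLE: print the name of the environment variable cargo
# reads to pick the linker for TRIPLE (upper-cased, with '-' and '.'
# replaced by '_').
cargo_linker_var() {
    printf 'CARGO_TARGET_%s_LINKER\n' \
        "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__')"
}

# Usage:
#   cargo_linker_var arm-unknown-linux-gnueabihf
```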
Let’s build a release version (cargo build [...] --release) and see how it performs.
[mac]$ time powerline-rust
real 0m0.009s
user 0m0.004s
sys 0m0.004s

[cloud-host]$ time powerline-rust
real 0m0.001s
user 0m0.001s
sys 0m0.000s

[rpi 2]$ time powerline-rust
real 0m0.009s
user 0m0.007s
sys 0m0.000s

[rpi-zero]$ time powerline-rust
Illegal instruction
Hmm. Something’s odd about the rpi-zero build. This is an ARMv6 chip that definitely has hard-float, so an arm eabihf build should work fine. If I build the same output on OSX (using https://github.com/osx-cross/homebrew-arm), I get a binary that works. So what’s happening?
Maybe it’s just musl being odd. Let’s try GNU instead.
[build]$ CARGO_TARGET_ARM_UNKNOWN_LINUX_GNUEABIHF_LINKER=arm-linux-gnueabihf-gcc-8 TARGET_CC=arm-linux-gnueabihf-gcc-8 cargo build --target=arm-unknown-linux-gnueabihf

[rpi-zero]$ ./powerline-rust-arm-gnueabihf
Illegal Instruction
Nope. Something else independent of the libc implementation is causing a binary to be made that doesn’t work on ARMv6.
Stupid compilers and their opinions
Let’s have a look at the binary to find out what it is.
$ readelf -A powerline-rust-arm-gnueabihf
Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "7-A"
  Tag_CPU_arch: v7
  Tag_CPU_arch_profile: Application
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-2
  Tag_FP_arch: VFPv3-D16
  Tag_ABI_PCS_GOT_use: GOT-indirect
  Tag_ABI_PCS_wchar_t: 4
  Tag_ABI_FP_rounding: Needed
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_enum_size: int
  Tag_ABI_VFP_args: VFP registers
  Tag_CPU_unaligned_access: v6
  Tag_ABI_FP_16bit_format: IEEE 754
Aha! GCC appears to have built an ARMv7 binary. No wonder it doesn’t work! I had made an assumption here: because rustup supports targets like armv5te-unknown-linux-gnueabi (for ARMv5), arm-unknown-linux-gnueabihf (v6) and armv7-unknown-linux-gnueabihf (v7), I had assumed that the arm- prefix generically meant “v6” as a sort-of default. I had also assumed that when cargo builds for a given target, it instructs the compiler to use the correct arch version (if the compiler happens to support multiple). It turns out this isn’t true.
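Since this check is worth repeating after every toolchain change, here is a minimal sketch that pulls just the architecture out of the readelf -A output. The elf_cpu_arch name is mine, and it assumes the usual one-attribute-per-line layout that readelf prints.

```shell
# elf_cpu_arch: read `readelf -A` output on stdin and print the value of the
# Tag_CPU_arch attribute (for example "v6" or "v7"). The colon in the pattern
# keeps Tag_CPU_arch_profile from matching.
elf_cpu_arch() {
    sed -n 's/^[[:space:]]*Tag_CPU_arch:[[:space:]]*//p'
}

# Usage:
#   readelf -A ./powerline-rust-arm-gnueabihf | elf_cpu_arch
```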
For no reason at all, what happens if we try soft-float?
[build]$ CARGO_TARGET_ARM_UNKNOWN_LINUX_GNUEABI_LINKER=arm-linux-gnueabi-gcc-8 TARGET_CC=arm-linux-gnueabi-gcc-8 cargo build --target=arm-unknown-linux-gnueabi

[rpi-zero]$ ./powerline-rust-arm-gnueabi
(successful output)
Hmm, this works. Let’s check the ELF attributes:
$ readelf -A powerline-rust-arm-gnueabi
Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "ARM v6"
  Tag_CPU_arch: v6
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-1
  Tag_ABI_PCS_GOT_use: GOT-indirect
  Tag_ABI_PCS_wchar_t: 4
  Tag_ABI_FP_rounding: Needed
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_enum_size: int
  Tag_CPU_unaligned_access: v6
  Tag_ABI_FP_16bit_format: IEEE 754
How strange. Digging around, it looks like the GCC that’s shipped with Debian / Ubuntu has some interesting defaults, which means that building for ARMv6 is somewhat tricky: https://stackoverflow.com/questions/35132319/build-for-armv6-with-gnueabihf/51201725#51201725. I’d probably go as far as saying that it looks like building for anything less than ARMv7 on the GCC toolchain that Debian supplies isn’t supported. Why the hard-float toolchain chucks out an ARMv7 binary, but the soft-float an ARMv6 one is beyond me. Computers!
So, the GCC toolchain is the problem here. If the one that ships with Ubuntu doesn’t work, then I need to get one that does. I could compile my own, but this is a bit of a faff. Usefully, the Raspberry Pi lot provide their own gcc binaries that are built for the correct target arch. It’s quite an old version of gcc (4.9.3) but it should work. Unfortunately, it doesn’t work with musl-tools, so we might have to compile against a glibc target.
[build]$ git clone --depth=1 https://github.com/raspberrypi/tools.git rpi-tools
[build]$ export MAGIC_RPI_GCC=rpi-tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64/bin/arm-linux-gnueabihf-gcc
[build]$ CARGO_TARGET_ARM_UNKNOWN_LINUX_GNUEABIHF_LINKER=$MAGIC_RPI_GCC TARGET_CC=$MAGIC_RPI_GCC cargo build --target=arm-unknown-linux-gnueabihf --release
...
[build]$ readelf -A target/arm-unknown-linux-gnueabihf/release/powerline
Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "6"
  Tag_CPU_arch: v6
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-1
  Tag_FP_arch: VFPv2
  Tag_ABI_PCS_GOT_use: GOT-indirect
  Tag_ABI_PCS_wchar_t: 4
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_enum_size: int
  Tag_ABI_HardFP_use: Deprecated
  Tag_ABI_VFP_args: VFP registers
  Tag_CPU_unaligned_access: v6
  Tag_ABI_FP_16bit_format: IEEE 754
  Tag_DIV_use: Not allowed
Great! Looks like we have a binary that should work - it’s ARMv6 and VFPv2, which is what the rPi needs as a baseline. How does it perform?
[rpi 2]$ time powerline-rust
real 0m0.030s
user 0m0.002s
sys 0m0.021s

[rpi-zero]$ time powerline-rust
real 0m0.047s
user 0m0.010s
sys 0m0.032s
At last! That’s very acceptable performance on the pi zero, and even though there’s a significant degradation in performance on the rPi 2 compared with the gcc-8 toolchain build earlier, it’s still perfectly reasonable.
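To save retyping the linker variable on every build, the same setting can live in a cargo configuration file under a [target.<triple>] table. The helper below is a sketch that writes such a file; write_linker_config is a name I’ve invented, the linker path in the usage line is a placeholder, and it overwrites any existing .cargo/config.toml in the given directory. TARGET_CC is read by the build scripts rather than from this table, so I’d still expect to set it in the environment.

```shell
# write_linker_config DIR TRIPLE LINKER: create DIR/.cargo/config.toml with a
# linker entry for TRIPLE. Overwrites any existing file at that path.
write_linker_config() {
    mkdir -p "$1/.cargo" &&
    printf '[target.%s]\nlinker = "%s"\n' "$2" "$3" > "$1/.cargo/config.toml"
}

# Usage, from the project root (placeholder linker path):
#   write_linker_config . arm-unknown-linux-gnueabihf /path/to/arm-linux-gnueabihf-gcc
```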
At some point, I’ll try doing a custom build of gcc to see if that makes a difference, as well as trying to get musl to work.