I went on a little low-level Nix adventure yesterday and early this morning because of this excellent blog post. In it, Farid builds up the simplest possible Nix derivation—making a file that has the contents “hello world”. Here’s what I like about it:
One thing that I did not understand after reading, though, was where the hashes come from. Farid’s post does the same thing that I occasionally do with the C++ or Rust compiler, where we intentionally cause an error to get information that the compiler already knows. That’s not very satisfying for someone (me) who is starting basically from zero. So I went down a bit of a rabbit hole trying to figure out how to manually generate these hashes.
But first, let’s get some terms out of the way. Please forgive (but help me correct) mistakes along the way, because this is my first time messing around with Nix.
As far as I can tell, a derivation is a very precise recipe for building a
file. It’s kind of like a Make recipe or a shell script, except all of its
inputs and outputs refer to well-known long paths in /nix/store.
Here’s an example from Farid’s blog that we will be trying to replicate in this post that has one of these path names:
{
  "name": "simple",
  "system": "x86_64-linux",
  "builder": "/bin/sh",
  "outputs": {
    "out": {
      "path": "/nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple"
    }
  },
  "inputSrcs": [],
  "inputDrvs": {},
  "env": {
    "out": "/nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple"
  },
  "args": [
    "-c",
    "echo 'hello world' > $out"
  ]
}
In it, we have a name field (your pick) and a system field. I am guessing there
are a couple of well-known system/platform names, and I happen to be running
x86_64 Linux. If you follow along, I think your hashes will match mine only if
you use the same system.
We also have this thing called a builder, which is the single command that gets
run. In this case, we also pass it args (at the bottom) and environment
variables from env. Don’t think too hard about inputSrcs and inputDrvs because
they don’t come into play in this post.
Last, we have outputs, which, like env, also has this magical path name. It’s
the output file (or directory, I think) that your Nix derivation is required to
create. The name out in each of outputs and env is arbitrary, and I think the
two don’t have to match. It’s just convention.
This derivation roughly corresponds to the following Make recipe (remember that
Make variables also start with $, so you have to escape your shell variables
like $${var}):
/nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple:
	export out=/nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple; \
	echo 'hello world' > $${out}
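To make the builder/args/env relationship concrete, here is a rough Python sketch of what "running" this derivation amounts to. It is not how Nix does it: it ignores sandboxing and everything else Nix adds, and the `run_builder` helper and the temporary `out_path` are my own inventions so that nothing gets written to /nix/store.

```python
import os
import subprocess
import tempfile


def run_builder(deriv, out_path):
    """Run the derivation's builder with its args, exposing out_path as $out.

    Hypothetical helper: real Nix builds in a sandbox and writes to /nix/store.
    """
    env = dict(deriv["env"])
    env["out"] = out_path  # override so we write somewhere harmless
    subprocess.run([deriv["builder"], *deriv["args"]], env=env, check=True)


deriv = {
    "builder": "/bin/sh",
    "args": ["-c", "echo 'hello world' > $out"],
    "env": {"out": "/nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple"},
}

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "simple")
    run_builder(deriv, out_path)
    with open(out_path) as f:
        built = f.read()

print(repr(built))
```

The only things the builder sees are its arguments and the environment variables, which is why the output path has to be known before the build starts.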
But where does that huge path come from?
I didn’t want to install Nix, so I did all of my explorations inside Docker. If you’re not going to use Nix long-term, I recommend you do the same, since it otherwise makes some pretty invasive system changes.
Here’s all you need to get started:
FROM nixos/nix AS builder
Now you can build a Docker image:
$ docker build . -t notnix
...
$
and then start /bin/sh, the only shell in an easy-to-remember path:
$ docker run -i -t notnix /bin/sh
sh-5.2#
...
Any time I use a Nix command in the rest of the article, it’s either in a
Docker RUN command or in /bin/sh inside a running Nix container.
Now let’s find some hashes.
John Ott pointed out to me that Nix docs on store paths had a partial answer. He said that the path comes from hashing the ATerm representation of the derivation with the outPath set to an empty string.
…what?
So after some digging and re-reading Farid’s post, apparently ATerm is an old(er) configuration language that looks kind of like building OCaml variants. And I guess it makes sense that we shouldn’t need the path to calculate the path (otherwise we’d be in circular trouble). So I had Nix create the ATerm form of my JSON derivation without any paths:
{
  "name": "simple",
  "system": "x86_64-linux",
  "builder": "/bin/sh",
  "outputs": {
    "out": {}
  },
  "inputSrcs": [],
  "inputDrvs": {},
  "env": {},
  "args": [
    "-c",
    "echo 'hello world' > $out"
  ]
}
by running:
$ nix --extra-experimental-features nix-command derivation add < simple.json
/nix/store/1p6dixyqvjddfq5fmys3i55nl90ckjam-simple.drv
$
Running that command outputs the path of the ATerm form file, which we can check out:
$ cat /nix/store/1p6dixyqvjddfq5fmys3i55nl90ckjam-simple.drv
Derive([("out","","","")],[],[],"x86_64-linux","/bin/sh",["-c","echo 'hello world' > $out"],[("out","")])$
You can see they are making sure to remove all whitespace, even the trailing
newline (hence the $ at the end, which is my shell prompt).
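To get a feel for the format, here is a minimal sketch that serializes a derivation dict into the same ATerm string. It only covers this tiny case: `to_aterm`, `aterm_string`, and `aterm_list` are names I made up, input derivations are not handled, and the string escaping is my assumption about the format. I also put an empty `out` entry in `env` by hand because the ATerm above shows one; I have not checked whether Nix adds it automatically.

```python
def aterm_string(s):
    """Quote a string, escaping backslashes, quotes, and common whitespace."""
    escaped = (
        s.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def aterm_list(items):
    return "[" + ",".join(items) + "]"


def to_aterm(deriv):
    """Serialize a derivation dict with no input derivations (assumption)."""
    if deriv["inputDrvs"]:
        raise NotImplementedError("this sketch does not handle inputDrvs")
    # Each output is (name, path, hashAlgo, hash); the last two stay empty here.
    outputs = aterm_list(
        "(" + ",".join(aterm_string(x) for x in (name, out.get("path", ""), "", "")) + ")"
        for name, out in sorted(deriv["outputs"].items())
    )
    input_srcs = aterm_list(aterm_string(s) for s in sorted(deriv["inputSrcs"]))
    args = aterm_list(aterm_string(a) for a in deriv["args"])
    env = aterm_list(
        "(" + aterm_string(k) + "," + aterm_string(v) + ")"
        for k, v in sorted(deriv["env"].items())
    )
    system = aterm_string(deriv["system"])
    builder = aterm_string(deriv["builder"])
    return f"Derive({outputs},[],{input_srcs},{system},{builder},{args},{env})"


deriv = {
    "name": "simple",
    "system": "x86_64-linux",
    "builder": "/bin/sh",
    "outputs": {"out": {"path": ""}},
    "inputSrcs": [],
    "inputDrvs": {},
    "env": {"out": ""},  # assumption: blank entry, mirroring the ATerm above
    "args": ["-c", "echo 'hello world' > $out"],
}

aterm = to_aterm(deriv)
print(aterm)
```

Note that there is no trailing newline in the result, which matters because every byte feeds into the hash.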
Okay, so we have an ATerm form of the derivation and it has no output path. I guess we hash it? I got a little lost at this point until Jamey Sharp chimed in with the even more detailed store path specification.
This clarified, after many reads, that we have to do the following steps. At some point I switched to using Python because it got a little text manipulation heavy:
1. Use the ATerm form as the inner-fingerprint, because we’re not doing anything with text or source types or NARs or something
2. Hash the inner-fingerprint with SHA-256 and then base16-encode it. That’s called the inner-digest
Okay, not so bad:
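import hashlib
import base64

# inner_fingerprint is the path to the ATerm (.drv) file from above.
# hashlib.file_digest needs Python 3.11 or newer.
with open(inner_fingerprint, "rb") as f:
    inner_fingerprint_hash = hashlib.file_digest(f, "sha256").digest()
inner_digest = (
    base64.b16encode(inner_fingerprint_hash).decode("utf-8").lower()
)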
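If you would rather hash a string in memory than a file, a sketch like the following should be equivalent. The `inner_digest_of` helper is hypothetical, and the ATerm literal is just the one printed earlier. Lowercased base16 of a SHA-256 digest is the same thing `hexdigest()` returns, so either spelling works.

```python
import base64
import hashlib


def inner_digest_of(inner_fingerprint_bytes):
    """SHA-256 the inner-fingerprint and return it as lowercase base16."""
    raw = hashlib.sha256(inner_fingerprint_bytes).digest()
    return base64.b16encode(raw).decode("utf-8").lower()


# The ATerm with blank output paths, exactly as printed earlier (no newline).
aterm = (
    b'Derive([("out","","","")],[],[],"x86_64-linux","/bin/sh",'
    b'["-c","echo \'hello world\' > $out"],[("out","")])'
)

inner_digest = inner_digest_of(aterm)
print(inner_digest)
```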
Then, once we have that, we do some more stuff to it:
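1. Combine the inner-digest with some other fields like the derivation’s name and call that the fingerprint
2. Hash the fingerprint with SHA-256, take the first 20 bytes, and base32-encode them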
Alright, not so bad, Python can do all of this in the standard library:
name = deriv["name"]  # "simple"
# the "out" is the name we picked earlier and is arbitrary
fingerprint = f"output:out:sha256:{inner_digest}:/nix/store:{name}"
fingerprint_hash = hashlib.sha256(fingerprint.encode("utf-8")).digest()
fingerprint_digest = base64.b32encode(fingerprint_hash[:20])
However. The docs are misleading about two things and that sent me on a merry chase.
First of all, Nix does not use normal base32. They use a different character set. Also, they base32 in reverse. I didn’t figure either of these things out until tombl chimed in on Twitter.
Second of all, the store-path docs are outright lying when they say “the first 160 bits [20 bytes] of a SHA-256 hash”. Instead, what they should say is “do this weird XOR thing on the hash, folding it back onto itself kinda.”
I only got that second bit by digging through the Nix C++ codebase. So instead, what we really want is this:
STORE_DIR = "/nix/store"
output = "out"  # the output name we picked earlier


def to_nix_base32(bytes_data):
    # Reverse the bytes, encode with standard base32, then swap alphabets.
    b32_alphabet = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
    b32_nix = b"0123456789abcdfghijklmnpqrsvwxyz"
    trans = bytes.maketrans(b32_alphabet, b32_nix)
    return base64.b32encode(bytes_data[::-1]).translate(trans).decode("utf-8")


def compress_hash(h, newlen):
    # Fold the hash onto itself: XOR byte i into position i % newlen.
    result = bytearray(b"\0" * newlen)
    for i in range(len(h)):
        result[i % newlen] ^= h[i]
    return bytes(result)


fingerprint = (
    f"output:{output}:sha256:{inner_digest}:{STORE_DIR}:{name}"
)
fingerprint_hash = hashlib.sha256(fingerprint.encode("utf-8")).digest()
fingerprint_digest = to_nix_base32(compress_hash(fingerprint_hash, 20))
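To see how the XOR folding differs from plain truncation, here is a small standalone demo. The input string is arbitrary and not a real fingerprint. With a 32-byte hash folded to 20 bytes, the last 12 bytes of the hash get XORed onto the first 12, and bytes 12 through 19 pass through untouched.

```python
import hashlib


def compress_hash(h, newlen):
    """Fold h onto itself by XORing byte i into position i % newlen."""
    result = bytearray(newlen)
    for i, byte in enumerate(h):
        result[i % newlen] ^= byte
    return bytes(result)


# Arbitrary example input, only here to get 32 bytes of hash to play with.
h = hashlib.sha256(b"example input, not a real fingerprint").digest()

folded = compress_hash(h, 20)
truncated = h[:20]

print(folded.hex())
print(truncated.hex())
```

So "the first 160 bits" and the folded result only agree on the trailing 8 bytes, which is enough to produce a completely different store path.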
I’m using bytes.maketrans because Python makes it easy enough to convert
between normal-base32 and nix-base32, but you should also take a look at a
standalone implementation of nix-base32. For example, here is a link to
Tvix’s.
This magic number that pops out of the correct fingerprint hashing method is
the same as the one from Farid’s post! 5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb
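For comparison with the bytes.maketrans shortcut, here is my attempt at a standalone nix-base32 encoder in Python. It is a sketch based on my reading of how the encoding works (treat the bytes as a little-endian number and emit 5-bit digits from the most significant end), not a copy of Tvix's or Nix's code, and `nix_base32_encode` is a name I picked.

```python
import base64

NIX_BASE32_CHARS = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32_encode(data):
    """Encode bytes as nix-base32 without going through standard base32."""
    if not data:
        return ""
    length = (len(data) * 8 - 1) // 5 + 1  # number of 5-bit digits
    out = []
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)  # byte index and bit offset of digit n
        c = data[i] >> j
        if i + 1 < len(data):
            c |= data[i + 1] << (8 - j)  # pull in bits from the next byte
        out.append(NIX_BASE32_CHARS[c & 0x1F])
    return "".join(out)


def to_nix_base32(bytes_data):
    """The reverse-and-translate shortcut, repeated here for comparison."""
    trans = bytes.maketrans(
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567", NIX_BASE32_CHARS.encode("ascii")
    )
    return base64.b32encode(bytes_data[::-1]).translate(trans).decode("utf-8")


sample = bytes(range(20))  # any 20-byte input will do
print(nix_base32_encode(sample))
print(to_nix_base32(sample))
```

As far as I can tell, the two agree whenever the input length is a multiple of 5 bytes (such as our 20-byte compressed hash), because then standard base32 needs no padding.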
Now we can add that back into the JSON as an output path and an environment variable to finally get the same JSON blob as before:
{
  "name": "simple",
  "system": "x86_64-linux",
  "builder": "/bin/sh",
  "outputs": {
    "out": {
      "path": "/nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple"
    }
  },
  "inputSrcs": [],
  "inputDrvs": {},
  "env": {
    "out": "/nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple"
  },
  "args": [
    "-c",
    "echo 'hello world' > $out"
  ]
}
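Filling the path back in is mechanical enough to script. Here is a small sketch; `with_output_path` is a hypothetical helper of mine, and it assumes the output name and the env variable name are the same, which is convention rather than a rule.

```python
import copy
import json

STORE_DIR = "/nix/store"


def with_output_path(deriv, output, digest):
    """Return a copy of deriv with the store path set for one output.

    Assumes the env variable has the same name as the output (convention).
    """
    filled = copy.deepcopy(deriv)
    path = f"{STORE_DIR}/{digest}-{deriv['name']}"
    filled["outputs"][output]["path"] = path
    filled["env"][output] = path
    return filled


blank = {
    "name": "simple",
    "system": "x86_64-linux",
    "builder": "/bin/sh",
    "outputs": {"out": {}},
    "inputSrcs": [],
    "inputDrvs": {},
    "env": {},
    "args": ["-c", "echo 'hello world' > $out"],
}

filled = with_output_path(blank, "out", "5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb")
print(json.dumps(filled, indent=2))
```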
And, just as Farid promised, Nix accepts our derivation JSON and gives us a new ATerm:
$ nix --extra-experimental-features nix-command derivation add < simple.json
/nix/store/vh5zww1mqbcshfcblrw3y92v7kkzamfx-simple.drv
$
It’s the same derivation path that Farid has in his post, too.
But having a derivation in hand doesn’t mean anything other than we have—finally—written a correct recipe to build a thing. Let’s run it and see the output!
$ nix-store --realize /nix/store/vh5zww1mqbcshfcblrw3y92v7kkzamfx-simple.drv
...
/nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple
$ cat /nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple
hello world
$
I’m calling that success. We built a derivation by hand without any guess-and-check!
Check out my Python code if you like.