Gábor Nyéki

Hiding a message in my PyTorch weights

I show how to embed information into floating-point numbers such as those that constitute the parameters of a neural network. Detecting a hidden message may be possible, too. This works with PyTorch, but also with any framework that reads and writes tensors.

Introduction

I’ve built a small tool called Steganotorchy that hides an arbitrary message inside a neural network’s weights and biases. In this post, I’ll explain how it works and how we could detect a message that is hidden by it. But first, let’s talk about steganography.

Steganography is the practice of hiding a message in a medium where an ordinary observer neither expects nor notices it. The reason that this works is that the presence of the message does not change the medium in any obvious way. Since the message is generally not encrypted, only hidden, the guiding principle of this approach is that security by obscurity is good enough.

The classical application of steganography is to digital images. Our eyes don’t perceive small differences between colors. Consequently, one can embed a message in an image by manipulating the least significant bits of each pixel’s RGB color codes. In a way, this is a digital equivalent of “invisible ink” that can only be seen under UV light.


The top pixel is less green. You can't tell, can you?
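
As a toy illustration, and not something Steganotorchy does, here is how a single message bit could be written into the lowest bit of one pixel’s green channel with NumPy:

import numpy as np

# A single RGB pixel, stored as 8-bit color channels.
pixel = np.array([200, 150, 90], dtype=np.uint8)

# Overwrite the lowest bit of the green channel with one message bit.
message_bit = 1
pixel[1] = (pixel[1] & 0xFE) | message_bit

print(pixel)  # The color difference is imperceptible.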

We can go further in exploiting the limitations of human perception and hide messages in sound files and videos, too. But redundancies in how binary data is used, for example by operating systems, have also been exploited for steganography. This includes embedding information in TCP packets and x86 machine code.

However, there are simpler methods that we can use if our goal is just to obfuscate some information. One that has seen more nefarious uses is the humble xor cipher which can turn a comprehensible message into what looks like binary gibberish. As Wikipedia notes, this method is popular in malware, and indeed, the place where I encountered it over two decades ago was in a modified version of the Eggdrop IRC bot called VoiD. Eggdrop configuration files are stored in plaintext but avid users of VoiD were not always eager to reveal their operations to system administrators.

Instead, they inserted the configuration into a fixed-length char array, moving it from a standalone file into a statically allocated buffer inside the executable. The contents of a buffer like that can easily be discovered by running strings on the executable. To prevent this, VoiD obfuscated the buffer with an xor cipher. Here is a demonstration of the idea that shows how the output of hexdump changes for an executable after applying a cipher to a char buffer:

  00003010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  00003020  54 68 69 73 20 6d 65 73  73 61 67 65 20 69 73 20  |This message is |
  00003030  68 69 64 64 65 6e 2e 20  42 75 74 20 6e 6f 74 20  |hidden. But not |
  00003040  76 65 72 79 20 77 65 6c  6c 21 00 00 00 00 00 00  |very well!......|
  00003050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  00003070  47 43 43 3a 20 28 55 62  75 6e 74 75 20 31 31 2e  |GCC: (Ubuntu 11.|

                           ↓       xor cipher       ↓

  00003010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  00003020  9b a7 a6 bc ef a2 aa bc  bc ae a8 aa ef a6 bc ef  |................|
  00003030  a7 a6 ab ab aa a1 e1 ef  8d ba bb ef a1 a0 bb ef  |................|
  00003040  b9 aa bd b6 ef b8 aa a3  a3 ee cf cf cf cf cf cf  |................|
  00003050  cf cf cf cf cf cf cf cf  cf cf cf cf cf cf cf cf  |................|
  00003070  47 43 43 3a 20 28 55 62  75 6e 74 75 20 31 31 2e  |GCC: (Ubuntu 11.|
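
The obfuscation itself is nothing more than xor-ing every byte of the buffer with a key. Here is a minimal sketch in Python, with an arbitrary single-byte key rather than the one used above:

KEY = 0x42  # An arbitrary example key, not the one from the hexdump.

def xor_cipher(data: bytes, key: int = KEY) -> bytes:
    # Applying the cipher twice with the same key recovers the original.
    return bytes(byte ^ key for byte in data)

obfuscated = xor_cipher(b"This message is hidden. But not very well!")
print(obfuscated)              # Looks like binary gibberish.
print(xor_cipher(obfuscated))  # The original message again.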

This approach was far from perfect1 but it got the job done. The upside of it was that you could use any text editor to modify the configuration by applying the cipher to the executable with the correct key. The downside was that if you felt that you needed to do this, there was a chance that you were committing a crime.

Yet steganography is a much broader concept and it isn’t limited to bit manipulation. For example, one could argue that port knocking is a form of steganography, too, but this is somewhat tangential here.2 One thing that is common among these methods is that the embedded message is either invisible or, upon first inspection, looks like noise. And there is something else that looks like noise: the lowest bits of floating-point numbers. So let’s talk about those.

Hiding a message in floating-point numbers

The weights and biases of a neural network are represented by floating-point numbers. Floats come in different precisions.3 A single-precision float encodes 32 bits of information and takes up 4 bytes in memory. A double-precision float encodes 64 bits and takes up 8 bytes.

In machine learning, smaller floats that encode 16 bits or less are more popular because of their reduced memory footprint. But if a model is not meaningfully compromised by giving up precision, then it also shouldn’t be compromised if we store a hidden message in the lowest bits of its parameters.

Binary representation

Floating-point numbers are encoded in scientific notation, but in base 2 instead of base 10. A number \(x\) is characterized as a float by the formula

\[ x = (-1)^{\rm sign} \times {\rm mantissa} \times 2^{\rm exponent}. \]

It is the mantissa that drives a float’s precision, and in normalized form, it always lies between 0.1 and 1.0 in binary, that is, between one half and one. If we had arbitrary precision, we could write any number as a floating-point number, no matter how small or large:

\[ \begin{aligned} 42 &= (-1)^0 \times 0.65625 \times 2^6, \\ 42,\!000,\!000 &= (-1)^0 \times 0.6258487701416016 \times 2^{26}, \;\text{and} \\ 4,\!200,\!000,\!042 &= (-1)^0 \times 0.9778887131251395 \times 2^{32}. \end{aligned} \]
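
These decompositions can be reproduced with Python’s math.frexp, which returns the mantissa and the exponent in exactly this normalized form:

import math

math.frexp(42)             # Returns (0.65625, 6).
math.frexp(42_000_000)     # Returns (0.6258487701416016, 26).
math.frexp(4_200_000_042)  # Returns (0.9778887131251395, 32).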

But our precision is limited. For the sake of illustration, let’s consider 32-bit floats. 42 and 42,000,000 can be encoded without loss of precision. However, the mantissa of a 32-bit float is not long enough to encode 4,200,000,042. The 42 at the end gets chopped off, which we can easily confirm, for example, with NumPy:

import numpy as np

np.float32(42)             # Returns 42.0.
np.float32(42_000_000)     # Returns 42000000.0.
np.float32(4_200_000_042)  # Returns 4200000000.0.

The mantissa is where Steganotorchy embeds hidden information. How exactly is the mantissa stored, then? The IEEE 754 standard specifies the layout of single-precision floats: 1 sign bit, 8 exponent bits, and 23 mantissa bits. The binary representation of 1.0 is:

  sign   exponent   mantissa
   0     01111111   00000000000000000000000

The mantissa is stored in the lowest 23 bits. If we change the lowest bit of the mantissa from 0 to 1, then the number changes ever so slightly, from 1.0 to 1.0000001.
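
We can verify this with NumPy by viewing the float’s bytes as an unsigned integer. This is just a quick check, not Steganotorchy’s code:

import numpy as np

x = np.array([1.0], dtype=np.float32)
bits = x.view(np.uint32)  # The same 4 bytes, reinterpreted as an integer.

print(f"{bits[0]:032b}")  # Prints 00111111100000000000000000000000.
bits[0] |= 1              # Set the lowest bit of the mantissa.
print(x[0])               # Prints 1.0000001.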

Suppose we want to hide the letter a in a sequence of floats. The ASCII encoding of a is 0x61, or 01100001 in binary. This means that we can hide this letter inside eight 32-bit floats by changing only the lowest bit:


The highlighted bits were modified to embed the letter a.

We only need four floats if we change the lowest two bits:


The highlighted bits were modified to embed the letter a.

Or just one float if we use the lowest eight bits:


The highlighted bits were modified to embed the letter a.

Therefore if we use the lowest eight bits, then we can hide a 1 KiB message inside the weights and biases of any neural network that has at least 1,024 parameters. Now, unless every message that we might want to embed is exactly 1 KiB long, it is a good idea to also embed the length of the message. Budgeting for an extra 8 bytes to store the length as a 64-bit integer, we need 1,032 parameters.
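
Here is a rough sketch of this byte-into-floats embedding in NumPy. It is not Steganotorchy’s implementation, and the order in which the bits of the byte are spread across floats is an arbitrary choice of mine (most significant bit first):

import numpy as np

def embed_byte(params, byte, bits_per_float=1):
    # Overwrite the lowest bits of each float32 with chunks of the byte.
    # Assumes that bits_per_float is 1, 2, 4, or 8 so the byte splits evenly.
    bits = params.view(np.uint32)
    mask = (1 << bits_per_float) - 1
    for i in range(8 // bits_per_float):
        chunk = (byte >> (8 - bits_per_float * (i + 1))) & mask
        bits[i] = (int(bits[i]) & ~mask) | chunk

params = np.ones(8, dtype=np.float32)
embed_byte(params, ord("a"))         # 0x61, or 01100001 in binary.
print(params)                        # Every parameter is still approximately 1.0.
print(params.view(np.uint32) & 1)    # Prints [0 1 1 0 0 0 0 1].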

Embedding information in floats

Steganotorchy embeds a message in two sections: a header and the content. The header consists of a 32-bit integer that encodes the length of the content. The length provides necessary information for extracting the content. Since the content is almost certainly smaller than the information capacity of the model, without the length we wouldn’t know how many bytes to read out of the model parameters.

Take an imaginary neural network where every model parameter is 1.0 before we embed information in it. This figure shows what the header and the content look like after we embed the letter a:


The highlighted bits were modified by Steganotorchy. 8 bits are embedded in each 32-bit float. The content is the letter a, taking up 1 byte. The header is the length of the content, 1, as a 32-bit integer.

It is easier to implement extraction if both the header and the content can be assumed to start in a new float. This assumption holds in the figure above because we embedded 8 bits in each float, and 8 is a power of two. However, if the number of bits per float is not a power of two, then we need to leave padding bits between the header and the content, as illustrated here:


The highlighted bits were modified by Steganotorchy. The header and the content are the same as in the previous figure, but now only 7 rather than 8 bits are embedded in each 32-bit float. This necessitates adding three padding bits after the header ends, so that the content starts in a new float.
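
The padding arithmetic is easy to make concrete. This is my own sketch of the layout rule, not code from Steganotorchy:

import math

def header_layout(bits_per_float, header_bits=32):
    # How many floats the header occupies, and how many padding bits are
    # left before the content can start in a new float.
    floats = math.ceil(header_bits / bits_per_float)
    padding = floats * bits_per_float - header_bits
    return floats, padding

header_layout(8)  # Returns (4, 0): no padding needed.
header_layout(7)  # Returns (5, 3): three padding bits, as in the figure above.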

Usage on the command line

Steganotorchy operates on safetensors and can embed up to 8 bits per model parameter. It supports two main commands, embed and extract:

$ steganotorchy -b 8 -m model.safetensors embed the_tale_of_peter_rabbit.txt rabbit.safetensors
Embedded 5182 bytes into "rabbit.safetensors".
$ steganotorchy -b 8 -m rabbit.safetensors extract rabbit.out
Extracted 5182 bytes into "rabbit.out".

The output of extract matches the input of embed, as we expect:

$ head -7 rabbit.out
Once upon a time there were four little Rabbits, and their names
were--

          Flopsy,
       Mopsy,
   Cotton-tail,
and Peter.
$ sha1sum the_tale_of_peter_rabbit.txt rabbit.out 
f8969d071bc3d67bfc95fc12ea55912619fea3a5  the_tale_of_peter_rabbit.txt
f8969d071bc3d67bfc95fc12ea55912619fea3a5  rabbit.out

A safetensors file can also be inspected to see its capacity and any purported message hidden in it. In this case, we have embedded 8 bits per model parameter, so running inspect with -b 8 correctly shows the length and the beginning of the message:

$ steganotorchy -b 8 -m rabbit.safetensors inspect
Model file "rabbit.safetensors" (8 bits/byte):

  Capacity:         45000 bits
  Message length:   5182 bytes
  Message content:  "Once upon a time the..."

If the model doesn’t have enough capacity, then embed fails with an error:

$ steganotorchy -b 7 -m model.safetensors embed the_tale_of_peter_rabbit.txt rabbit.safetensors
Error: Message needs at least 5928 parameters to be embedded but tensors have only 5625.
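
That number can be reproduced with a back-of-the-envelope calculation, assuming the padded-header layout described earlier:

import math

def params_needed(message_bytes, bits_per_float, header_bits=32):
    # Floats used by the padded header plus floats used by the content.
    header_floats = math.ceil(header_bits / bits_per_float)
    content_floats = math.ceil(message_bytes * 8 / bits_per_float)
    return header_floats + content_floats

params_needed(5182, 7)  # Returns 5928, more than the 5625 available parameters.
params_needed(5182, 8)  # Returns 5186, which fits.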

But because the model only stores the message length and the content, not the number of bits per model parameter, we need to know the right setting upfront. If we try to read with the wrong setting, we just get nonsense:

$ steganotorchy -b 7 -m rabbit.safetensors inspect
Model file "rabbit.safetensors" (7 bits/byte):

  Capacity:         39375 bits
  Message length:   41961 bytes
  Message content:  "Ý\u{8f}*\u{e}¼7îA\u{85}\u{7}M;r\u{a0}é£/,¨;..."

Detecting a hidden message

We’ve seen that when a message is embedded, its length is embedded, too. Let’s change hats now and suppose that we don’t know whether there is a message hiding inside the model. Our goal is to detect the message if it’s there. The embedded length can be a valuable clue.

Bigger is not better

Steganotorchy’s choice of encoding the message length in a 32-bit integer imposes a size cap of 4 GiB on the message. On the face of it, a 64-bit integer seems like a better choice. However, that would be wasteful because messages larger than 4 GiB are unlikely.

Length size   Shortest message   Longest message
8 bits        0 B                256 B
16 bits       0 B                64 KiB
32 bits       0 B                4 GiB
64 bits       0 B                16,777,216 TiB

Although 64 bits would allow for messages as large as 16 exbibytes (or over 16 million TiB), no realistic neural network has enough parameters to hide such a message. Even a 4 GiB message would require around 4.3 billion parameters if we can hide 8 bits in each parameter, and it would require around 34.4 billion parameters if we can only hide 1 bit in each. In the world of large language models, such models do exist but they are somewhat unwieldy to lug around.

Yet a 64-bit length would be more than just wasteful: it would also be a clear giveaway of the hidden message. If the message is shorter than 65,536 TiB, then the length would lead with at least 8 zero bits.


The highlighted bits are leading zeros in the 64-bit length. 389,548,077 represents a message length of around 372 MiB.

The probability of 8 consecutive zeros occurring at random is low enough to raise eyebrows:

\[ \Pr(L_{64} = \cdots = L_{57} = 0) = \dfrac{1}{2^8} \approx 0.39\%. \]

If the message that we embed takes up less than 4 GiB, then the top 32 bits of the length would all be zero, and the probability of 32 consecutive zeros is much lower:

\[ \Pr(L_{64} = \cdots = L_{33} = 0) = \dfrac{1}{2^{32}} < 10^{-9}. \]

So 32-bit lengths are better. But even with 32-bit lengths, we can still end up with a lot of leading zero bits. For example, if the message is below 2 MiB, then the integer that stores the length has at least 11 leading zeros because

\[32 - \log_2(2 \times 1024 \times 1024) = 11.\]

The probability of this occurring at random is low, too:

\[ \Pr(L_{32} = \cdots = L_{22} = 0) = \dfrac{1}{2^{11}} \approx 0.049\%. \]

Heuristics

All of this suggests two heuristics for deciding whether a model contains an embedded message. Let’s assume that there is a message embedded in the model. Let’s read the supposed message length from the lowest bits and call it \(\ell.\)

  1. If \(\ell\) is greater than what can possibly fit in the model given the number of model parameters, then our assumption must be wrong. There is no message of length \(\ell\) hidden in the model.
  2. If the model’s capacity is greater than \(\ell\), then we can calculate an upper bound on the probability that a number as small as \(\ell\) or smaller would occur in the lowest bits at random: \[ \Pr(L \leq \ell) \leq \Pr(L_{32} = L_{31} = \cdots = L_{1 + \lceil \log_2(1 + \ell) \rceil} = 0) = \left( \frac{1}{2} \right)^{32 - \lceil \log_2(1 + \ell) \rceil} . \] The upper bound holds with equality if and only if \(\ell + 1\) is a power of 2. We saw earlier that if \(\ell\) is just below 2 MiB, then the upper bound is about 0.049%. If \(\ell\) is a little larger, say, 30 MiB, then the upper bound is still about 0.781%. For shorter messages like these, we would conclude that it is likely that there really is a message hidden in the model. (Both checks are sketched below.)
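
Here is a small sketch of both checks in Python, assuming the same padded-header layout as before and a uniformly random 32-bit length:

import math

def assess(ell, capacity_bits, bits_per_float, length_bits=32):
    # Heuristic 1: does a message of length ell even fit in the model?
    header_floats = math.ceil(length_bits / bits_per_float)
    needed_bits = header_floats * bits_per_float + ell * 8
    if needed_bits > capacity_bits:
        return "too long: there is no message of this length"
    # Heuristic 2: upper bound on the probability that random low bits
    # would spell out a length as small as ell.
    bound = 0.5 ** (length_bits - math.ceil(math.log2(1 + ell)))
    return f"upper bound on Pr(L <= ell): {bound:.3g}"

assess(5182, 45000, 8)   # The Peter Rabbit example: bound of roughly 1.9e-06.
assess(41961, 39375, 7)  # The misread length from the wrong -b setting: too long.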

For shorter messages or for smaller models, these heuristics might just work. It is with message lengths of 128 MiB or more that the upper bound goes above 5%. Such messages require over 134 million model parameters to embed if each parameter can hide 8 bits, and they require just over 1.07 billion model parameters if each can hide only 1 bit.

So how could we detect a hidden message if \(\ell\) doesn’t give it away? That is a question for another day.


  1. Can you guess what key I used for the xor cipher by looking at the output of hexdump?↩︎

  2. Popular implementations of port knocking also rely on security through obscurity, so the scheme could likewise be considered a steganographic method. Port knocking is commonly used to block access to an SSH server unless the client sends packets to the right sequence of ports before connecting.


    The server only allows the client to connect if it first sends packets to the correct sequence of ports.

    The port sequence constitutes a secret handshake. Usually, the handshake that the client uses is fixed, so they can repeat it any number of times and the server will always let them connect. This renders the method vulnerable to eavesdropping. The handshake, because it’s fixed, can be captured and replayed by a knowing adversary who observes the network traffic between the client and the server. But it may not stand out to anyone else.↩︎

  3. When working with floats, lower-level languages like Rust, C, and C++ require that you explicitly choose the precision to use, but in higher-level languages like Python and R, double precision is the default. JavaScript goes further and even represents integers as double-precision floats.↩︎

Updated on November 20, 2024.