Gábor Nyéki

Hiding a message in my PyTorch weights

I show how to embed information into floating-point numbers such as those that constitute the parameters of a neural network. Detecting a hidden message may be possible, too. This works with PyTorch, but also with any framework that reads and writes tensors.

Introduction

I’ve built a small tool called Steganotorchy that hides an arbitrary message inside a neural network’s weights and biases. In this post, I’ll explain how it works and how we could detect a message that is hidden by it. But first, let’s talk about steganography.

Steganography is the practice of hiding a message in a medium where an ordinary observer neither expects nor notices it. The reason that this works is that the presence of the message does not change the medium in any obvious way. Since the message is generally not encrypted, only hidden, the guiding principle of this approach is that security by obscurity is good enough.

The classical application of steganography is to digital images. Our eyes don’t perceive small differences between colors. Consequently, one can embed a message in an image by manipulating the least significant bits of each pixel’s RGB color codes. In a way, this is a digital equivalent of “invisible ink” that can only be seen under UV light.


The top pixel is less green. You can't tell, can you?
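
As a toy illustration, and not something Steganotorchy does, here is how a single message bit could be written into the lowest bit of one pixel’s green channel with NumPy:

import numpy as np

# A single RGB pixel, stored as 8-bit color channels.
pixel = np.array([200, 150, 90], dtype=np.uint8)

# Overwrite the lowest bit of the green channel with one message bit.
message_bit = 1
pixel[1] = (pixel[1] & 0xFE) | message_bit

print(pixel)  # The color difference is imperceptible.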

We can go further in exploiting the limitations of human perception and hide messages in sound files and videos, too. But redundancies in how binary data is used, for example by operating systems, have also been exploited for steganography. This includes embedding information in TCP packets and x86 machine code.

However, there are simpler methods that we can use if our goal is just to obfuscate some information. One that has seen more nefarious uses is the humble xor cipher which can turn a comprehensible message into what looks like binary gibberish. As Wikipedia notes, this method is popular in malware, and indeed, the place where I encountered it over two decades ago was in a modified version of the Eggdrop IRC bot called VoiD. Eggdrop configuration files are stored in plaintext but avid users of VoiD were not always eager to reveal their operations to system administrators.

Instead, they inserted the configuration into a fixed-length char array, moving it from a standalone file into a statically allocated buffer inside the executable. The contents of a buffer like that can easily be discovered by running strings on the executable. To prevent this, VoiD obfuscated the buffer with an xor cipher. Here is a demonstration of the idea that shows how the output of hexdump changes for an executable after applying a cipher to a char buffer:

  00003010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  00003020  54 68 69 73 20 6d 65 73  73 61 67 65 20 69 73 20  |This message is |
  00003030  68 69 64 64 65 6e 2e 20  42 75 74 20 6e 6f 74 20  |hidden. But not |
  00003040  76 65 72 79 20 77 65 6c  6c 21 00 00 00 00 00 00  |very well!......|
  00003050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  00003070  47 43 43 3a 20 28 55 62  75 6e 74 75 20 31 31 2e  |GCC: (Ubuntu 11.|

                           ↓       xor cipher       ↓

  00003010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  00003020  9b a7 a6 bc ef a2 aa bc  bc ae a8 aa ef a6 bc ef  |................|
  00003030  a7 a6 ab ab aa a1 e1 ef  8d ba bb ef a1 a0 bb ef  |................|
  00003040  b9 aa bd b6 ef b8 aa a3  a3 ee cf cf cf cf cf cf  |................|
  00003050  cf cf cf cf cf cf cf cf  cf cf cf cf cf cf cf cf  |................|
  00003070  47 43 43 3a 20 28 55 62  75 6e 74 75 20 31 31 2e  |GCC: (Ubuntu 11.|
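
The obfuscation itself is nothing more than xor-ing every byte of the buffer with a key. Here is a minimal sketch in Python, with an arbitrary single-byte key rather than the one used above:

KEY = 0x42  # An arbitrary example key, not the one from the hexdump.

def xor_cipher(data: bytes, key: int = KEY) -> bytes:
    # Applying the cipher twice with the same key recovers the original.
    return bytes(byte ^ key for byte in data)

obfuscated = xor_cipher(b"This message is hidden. But not very well!")
print(obfuscated)              # Looks like binary gibberish.
print(xor_cipher(obfuscated))  # The original message again.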

This approach was far from perfect1 but it got the job done. The upside of it was that you could use any text editor to modify the configuration by applying the cipher to the executable with the correct key. The downside was that if you felt that you needed to do this, there was a chance that you were committing a crime.

Yet steganography is a much broader concept and it isn’t limited to bit manipulation. For example, one could argue that port knocking is a form of steganography, too, but this is somewhat tangential here.2 One thing that is common among these methods is that the embedded message is either invisible or, upon first inspection, looks like noise. And there is something else that looks like noise: the lowest bits of floating-point numbers. So let’s talk about those.

Hiding a message in floating-point numbers

The weights and biases of a neural network are represented by floating-point numbers. Floats come in different precisions.3 A single-precision float encodes 32 bits of information and takes up 4 bytes in memory. A double-precision float encodes 64 bits and takes up 8 bytes.

In machine learning, smaller floats that encode 16 bits or less are more popular because of their reduced memory footprint. But if a model is not meaningfully compromised by giving up precision, then it also shouldn’t be compromised if we store a hidden message in the lowest bits of its parameters.

Binary representation

Floating-point numbers are encoded in scientific notation, but in base 2 instead of base 10. A number \(x\) is characterized as a float by the formula

\[ x = (-1)^{\rm sign} \times {\rm mantissa} \times 2^{\rm exponent}. \]

It is the mantissa that drives a float’s precision, and in normalized form, it always lies between 0.1 and 1.0 in binary, that is, between one half and one. If we had arbitrary precision, we could write any number as a floating-point number, no matter how small or large:

\[ \begin{aligned} 42 &= (-1)^0 \times 0.65625 \times 2^6, \\ 42,\!000,\!000 &= (-1)^0 \times 0.6258487701416016 \times 2^{26}, \;\text{and} \\ 4,\!200,\!000,\!042 &= (-1)^0 \times 0.9778887131251395 \times 2^{32}. \end{aligned} \]
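
These decompositions can be reproduced with Python’s math.frexp, which returns the mantissa and the exponent in exactly this normalized form:

import math

math.frexp(42)             # Returns (0.65625, 6).
math.frexp(42_000_000)     # Returns (0.6258487701416016, 26).
math.frexp(4_200_000_042)  # Returns (0.9778887131251395, 32).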

But our precision is limited. For the sake of illustration, let’s consider 32-bit floats. 42 and 42,000,000 can be encoded without loss of precision. However, the mantissa of a 32-bit float is not long enough to encode 4,200,000,042. The 42 at the end gets chopped off, which we can easily confirm, for example, with NumPy:

import numpy as np

np.float32(42)             # Returns 42.0.
np.float32(42_000_000)     # Returns 42000000.0.
np.float32(4_200_000_042)  # Returns 4200000000.0.

The mantissa is where Steganotorchy embeds hidden information. How exactly is the mantissa stored, then? The IEEE 754 standard specifies the layout of single-precision floats: 1 sign bit, 8 exponent bits, and 23 mantissa bits. The binary representation of 1.0 is:

  sign   exponent   mantissa
   0     01111111   00000000000000000000000

The mantissa is stored in the lowest 23 bits. If we change the lowest bit of the mantissa from 0 to 1, then the number changes ever so slightly, from 1.0 to 1.0000001.
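
We can verify this with NumPy by viewing the float’s bytes as an unsigned integer. This is just a quick check, not Steganotorchy’s code:

import numpy as np

x = np.array([1.0], dtype=np.float32)
bits = x.view(np.uint32)  # The same 4 bytes, reinterpreted as an integer.

print(f"{bits[0]:032b}")  # Prints 00111111100000000000000000000000.
bits[0] |= 1              # Set the lowest bit of the mantissa.
print(x[0])               # Prints 1.0000001.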

Suppose we want to hide the letter a in a sequence of floats. The ASCII encoding of a is 0x61, or 01100001 in binary. This means that we can hide this letter inside eight 32-bit floats by changing only the lowest bit:


The highlighted bits were modified to embed the letter a.

We only need four floats if we change the lowest two bits:


The highlighted bits were modified to embed the letter a.

Or just one float if we use the lowest eight bits:


The highlighted bits were modified to embed the letter a.

Therefore if we use the lowest eight bits, then we can hide a 1 KiB message inside the weights and biases of any neural network that has at least 1,024 parameters. Now, unless every message that we might want to embed is exactly 1 KiB long, it is a good idea to also embed the length of the message. Budgeting for an extra 8 bytes to store the length as a 64-bit integer, we need 1,032 parameters.
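
Here is a rough sketch of this byte-into-floats embedding in NumPy. It is not Steganotorchy’s implementation, and the order in which the bits of the byte are spread across floats is an arbitrary choice of mine (most significant bit first):

import numpy as np

def embed_byte(params, byte, bits_per_float=1):
    # Overwrite the lowest bits of each float32 with chunks of the byte.
    # Assumes that bits_per_float is 1, 2, 4, or 8 so the byte splits evenly.
    bits = params.view(np.uint32)
    mask = (1 << bits_per_float) - 1
    for i in range(8 // bits_per_float):
        chunk = (byte >> (8 - bits_per_float * (i + 1))) & mask
        bits[i] = (int(bits[i]) & ~mask) | chunk

params = np.ones(8, dtype=np.float32)
embed_byte(params, ord("a"))         # 0x61, or 01100001 in binary.
print(params)                        # Every parameter is still approximately 1.0.
print(params.view(np.uint32) & 1)    # Prints [0 1 1 0 0 0 0 1].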

Embedding information in floats

Steganotorchy embeds a message in two sections: a header and the content. The header consists of a 32-bit integer that encodes the length of the content. The length provides necessary information for extracting the content. Since the content is almost certainly smaller than the information capacity of the model, without the length we wouldn’t know how many bytes to read out of the model parameters.

Take an imaginary neural network where every model parameter is 1.0 before we embed information in it. This figure shows what the header and the content look like after we embed the letter a:


The highlighted bits were modified by Steganotorchy. 8 bits are embedded in each 32-bit float. The content is the letter a, taking up 1 byte. The header is the length of the content, 1, as a 32-bit integer.

It is easier to implement extraction if both the header and the content can be assumed to start in a new float. This assumption holds in the figure above because we embedded 8 bits in each float, and 8 is a power of two. However, if the number of bits per float is not a power of two, then we need to leave padding bits between the header and the content, as illustrated here:


The highlighted bits were modified by Steganotorchy. The header and the content are the same as in the previous figure, but now only 7 rather than 8 bits are embedded in each 32-bit float. This necessitates adding three padding bits after the header ends, so that the content starts in a new float.
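
The padding arithmetic is easy to make concrete. This is my own sketch of the layout rule, not code from Steganotorchy:

import math

def header_layout(bits_per_float, header_bits=32):
    # How many floats the header occupies, and how many padding bits are
    # left before the content can start in a new float.
    floats = math.ceil(header_bits / bits_per_float)
    padding = floats * bits_per_float - header_bits
    return floats, padding

header_layout(8)  # Returns (4, 0): no padding needed.
header_layout(7)  # Returns (5, 3): three padding bits, as in the figure above.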

Usage on the command line

Steganotorchy operates on safetensors and can embed up to 8 bits per model parameter. It supports two main commands, embed and extract:

$ steganotorchy -b 8 -m model.safetensors embed the_tale_of_peter_rabbit.txt rabbit.safetensors
Embedded 5182 bytes into "rabbit.safetensors".
$ steganotorchy -b 8 -m rabbit.safetensors extract rabbit.out
Extracted 5182 bytes into "rabbit.out".

The output of extract matches the input of embed, as we expect:

$ head -7 rabbit.out
Once upon a time there were four little Rabbits, and their names
were--

          Flopsy,
       Mopsy,
   Cotton-tail,
and Peter.
$ sha1sum the_tale_of_peter_rabbit.txt rabbit.out 
f8969d071bc3d67bfc95fc12ea55912619fea3a5  the_tale_of_peter_rabbit.txt
f8969d071bc3d67bfc95fc12ea55912619fea3a5  rabbit.out

A safetensors file can also be inspected to see its capacity and any purported message hidden in it. In this case, we have embedded 8 bits per model parameter, so running inspect with -b 8 correctly shows the length and the beginning of the message:

$ steganotorchy -b 8 -m rabbit.safetensors inspect
Model file "rabbit.safetensors" (8 bits/byte):

  Capacity:         45000 bits
  Message length:   5182 bytes
  Message content:  "Once upon a time the..."

If the model doesn’t have enough capacity, then embed fails with an error:

$ steganotorchy -b 7 -m model.safetensors embed the_tale_of_peter_rabbit.txt rabbit.safetensors
Error: Message needs at least 5928 parameters to be embedded but tensors have only 5625.
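
That number can be reproduced with a back-of-the-envelope calculation, assuming the padded-header layout described earlier:

import math

def params_needed(message_bytes, bits_per_float, header_bits=32):
    # Floats used by the padded header plus floats used by the content.
    header_floats = math.ceil(header_bits / bits_per_float)
    content_floats = math.ceil(message_bytes * 8 / bits_per_float)
    return header_floats + content_floats

params_needed(5182, 7)  # Returns 5928, more than the 5625 available parameters.
params_needed(5182, 8)  # Returns 5186, which fits.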

But because the model only stores the message length and the content, not the number of bits per model parameter, we need to know the right setting upfront. If we try to read with the wrong setting, we just get nonsense:

$ steganotorchy -b 7 -m rabbit.safetensors inspect
Model file "rabbit.safetensors" (7 bits/byte):

  Capacity:         39375 bits
  Message length:   41961 bytes
  Message content:  "Ý\u{8f}*\u{e}¼7îA\u{85}\u{7}M;r\u{a0}é£/,¨;..."

Detecting a hidden message

We’ve seen that when a message is embedded, its length is embedded, too. Let’s change hats now and suppose that we don’t know whether there is a message hiding inside the model. Our goal is to detect the message if it’s there. The embedded length can be a valuable clue.

Bigger is not better

Steganotorchy’s choice of encoding the message length in a 32-bit integer imposes a size cap of 4 GiB on the message. On the face of it, a 64-bit integer seems like a better choice. However, that would be wasteful because messages larger than 4 GiB are unlikely.

Length size   Shortest message   Longest message
8 bits        0 B                256 B
16 bits       0 B                64 KiB
32 bits       0 B                4 GiB
64 bits       0 B                16,777,216 TiB

Although 64 bits would allow for messages as large as 16 exbibytes (or over 16 million TiB), no realistic neural network has enough parameters to hide such a message. Even a 4 GiB message would require around 4.3 billion parameters if we can hide 8 bits in each parameter, and it would require around 34.4 billion parameters if we can only hide 1 bit in each. In the world of large language models, such models do exist but they are somewhat unwieldy to lug around.

Yet a 64-bit length would be more than just wasteful: it would also be a clear giveaway of the hidden message. If the message is shorter than 65,536 TiB, then the length would lead with at least 8 zero bits.


The highlighted bits are leading zeros in the 64-bit length. 389,548,077 represents a message length of around 372 MiB.

The probability of 8 consecutive zeros occurring at random is low enough to raise eyebrows:

\[ \Pr(L_{64} = \cdots = L_{57} = 0) = \dfrac{1}{2^8} \approx 0.39\%. \]

If the message that we embed takes up less than 4 GiB, then the top 32 bits of the length would all be zero, and the probability of 32 consecutive zeros is much lower:

\[ \Pr(L_{64} = \cdots = L_{33} = 0) = \dfrac{1}{2^{32}} < 10^{-9}. \]

So 32-bit lengths are better. But even with 32-bit lengths, we can still end up with a lot of leading zero bits. For example, if the message is below 2 MiB, then the integer that stores the length has at least 11 leading zeros because

\[32 - \log_2(2 \times 1024 \times 1024) = 11.\]

The probability of this occurring at random is low, too:

\[ \Pr(L_{32} = \cdots = L_{22} = 0) = \dfrac{1}{2^{11}} \approx 0.049\%. \]

Heuristics

All of this suggests two heuristics for deciding whether a model contains an embedded message. Let’s assume that there is a message embedded in the model. Let’s read the supposed message length from the lowest bits and call it \(\ell.\)

  1. If \(\ell\) is greater than what can possibly fit in the model given the number of model parameters, then our assumption must be wrong. There is no message of length \(\ell\) hidden in the model.
  2. If the model’s capacity is greater than \(\ell\), then we can calculate an upper bound on the probability that a number as small as \(\ell\) or smaller would occur in the lowest bits at random: \[ \Pr(L \leq \ell) \leq \Pr(L_{32} = L_{31} = \cdots = L_{1 + \lceil \log_2(1 + \ell) \rceil} = 0) = \left( \frac{1}{2} \right)^{32 - \lceil \log_2(1 + \ell) \rceil} . \] The upper bound holds with equality if and only if \(\ell + 1\) is a power of 2. We saw earlier that if \(\ell\) is just below 2 MiB, then the upper bound is about 0.049%. If \(\ell\) is a little larger, say, 30 MiB, then the upper bound is still about 0.781%. For shorter messages like these, we would conclude that it is likely that there really is a message hidden in the model. (Both checks are sketched below.)
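
Here is a small sketch of both checks in Python, assuming the same padded-header layout as before and a uniformly random 32-bit length:

import math

def assess(ell, capacity_bits, bits_per_float, length_bits=32):
    # Heuristic 1: does a message of length ell even fit in the model?
    header_floats = math.ceil(length_bits / bits_per_float)
    needed_bits = header_floats * bits_per_float + ell * 8
    if needed_bits > capacity_bits:
        return "too long: there is no message of this length"
    # Heuristic 2: upper bound on the probability that random low bits
    # would spell out a length as small as ell.
    bound = 0.5 ** (length_bits - math.ceil(math.log2(1 + ell)))
    return f"upper bound on Pr(L <= ell): {bound:.3g}"

assess(5182, 45000, 8)   # The Peter Rabbit example: bound of roughly 1.9e-06.
assess(41961, 39375, 7)  # The misread length from the wrong -b setting: too long.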

For shorter messages or for smaller models, these heuristics might just work. It is with message lengths of 128 MiB or more that the upper bound goes above 5%. Such messages require over 134 million model parameters to embed if each parameter can hide 8 bits, and they require just over 1.07 billion model parameters if each can hide only 1 bit.

So how could we detect a hidden message if \(\ell\) doesn’t give it away? That is a question for another day.


  1. Can you guess what key I used for the xor cipher by looking at the output of hexdump?↩︎

  2. Popular implementations of port knocking also rely on security through obscurity, so the scheme could likewise be considered a steganographic method. Port knocking is commonly used to block access to an SSH server unless the client sends packets to the right sequence of ports before connecting.


    The server only allows the client to connect if it first sends packets to the correct sequence of ports.

    The port sequence constitutes a secret handshake. Usually, the handshake that the client uses is fixed, so they can repeat it any number of times and the server will always let them connect. This renders the method vulnerable to eavesdropping. The handshake, because it’s fixed, can be captured and replayed by a knowing adversary who observes the network traffic between the client and the server. But it may not stand out to anyone else.↩︎

  3. When working with floats, lower-level languages like Rust, C, and C++ require that you explicitly choose the precision to use, but in higher-level languages like Python and R, double precision is the default. JavaScript goes further and even represents integers as double-precision floats.↩︎

Updated on November 20, 2024.