To perform this, of course, we need a lot of training data, and here, the AI reads 40 gigabytes of internet text, which is 40 gigs of non-binary plaintext data, which is a stupendously large amount of text.
In cryptography, plaintext is information a sender wishes to transmit to a receiver. Cleartext is often used as a synonym.
Plaintext has reference to the operation of cryptographic algorithms, usually encryption algorithms, and is the input upon which they operate. Cleartext, by contrast, refers to data that is transmitted or stored unencrypted (that is, 'in the clear').