Hack a redacted PDF document

8/17/2023

It depends a lot on what kind of redaction you're trying to achieve.

You can transform the PDF into uncompressed form using pdftk. Most metadata will then be immediately visible (and removable, provided you repair the file with pdftk afterwards). The same goes for "non-immediately-PDF" code (you can see that with tools such as PDFid).

You can, for example, easily alter the trailer, where fields such as /ID are to be found:

/Info 104 0 R

(In the case of ID, just replace it with a sequence of random hex of the same length. The pdftk compress command is advised afterwards to save disk space.)

"Semi-inadvertent" metadata such as the ID (above) can be easily removed either directly or by re-saving the PDF, removing unused objects and previous revisions of extant objects, which could contain sensitive information either intentionally or unintentionally:

"The contents of a PDF file can be updated incrementally without rewriting the entire file. When updating a PDF file incrementally, changes shall be appended to the end of the file, leaving its original contents intact."

Some filters may contain garbage without this compromising PDF legibility, for example the DCTDecode filter used to store a JPEG entity within the document. The filter supplies its own data size field internally, so what happens is that the "outer" PDF reader gathers the DCTDecode-encoded object and passes it to the filter, which gives back a raster image. The original object might contain extra bytes that the filter will ignore and discard, and which may be meaningless, or may contain pieces of random memory from the encoding computer (and possibly sensitive information), or may contain the same intentionally.

Actually, being a JPEG, a DCTDecode object could contain EXIF information, and this might be sensitive (e.g. GPS positioning and the like).

For example, if I take an image containing a copyright

$ exiftool istockphoto_2425717-getting-a-call.jpg | grep Copyright
Profile Copyright : Copyright (c) 1998 Hewlett-Packard Company

and convert it to PDF

$ convert istockphoto_2425717-getting-a-call.jpg test.pdf

the original string is still there:

$ strings test.pdf | grep Copyright
Copyright (c) 1998 Hewlett-Packard Company

To remove it, I would need to extract the JPEG(s), remove the EXIF and other data tags from them, and then re-embed them. That means using iText, or a state machine identifying the streams (image objects recognizable by dictionary entries such as ColorSpace 8 0), reading the stream object as a JPEG, manipulating it, and writing it back. Now the stream length needs to be corrected, and the index offsets will be changed, yielding a broken PDF. pdftk can reconstruct the index from this broken PDF, generating a clean one.

Yet other kinds of metadata, designed for stealth, may not be so easy to detect, much less remove. PDF has an internal index, so the actual ordering of most object entities inside the file does not matter. Find a metric - lexicographic could do - to define a "natural" order of those objects. If you have N objects, you have N! possible shuffles of those objects, and can then encode log2(N!) bits of information in the way they are ordered. Remove with: a tool capable of reordering the objects in the PDF file.
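The /ID replacement mentioned above can be sketched in a few lines of Python. This is only a minimal illustration, not a complete tool: it assumes the PDF has already been uncompressed with pdftk so that the trailer is plain text, and that /ID holds two hex strings in angle brackets. The function names and the sample trailer values are made up for the example. Because each replacement has exactly the same length as the original, the byte offsets in the index are not disturbed.

```python
import re
import secrets

# Matches a trailer entry of the form: /ID [<hex> <hex>]
ID_PATTERN = re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]+>\s*<[0-9A-Fa-f]+>\s*\]")
HEX_STRING = re.compile(rb"<([0-9A-Fa-f]+)>")


def _random_hex(length: int) -> bytes:
    """Return `length` random uppercase hex digits as bytes."""
    # token_hex(k) yields 2*k digits, so round up and trim.
    return secrets.token_hex((length + 1) // 2)[:length].upper().encode("ascii")


def randomize_trailer_id(pdf_bytes: bytes) -> bytes:
    """Replace every hex string inside /ID [...] with random hex of equal length."""

    def _swap_id(match: re.Match) -> bytes:
        # Only the hex digits change; brackets and spacing are kept as-is.
        return HEX_STRING.sub(
            lambda h: b"<" + _random_hex(len(h.group(1))) + b">",
            match.group(0),
        )

    return ID_PATTERN.sub(_swap_id, pdf_bytes)


if __name__ == "__main__":
    # Hypothetical trailer fragment; the ID values are invented for this example.
    sample = (
        b"trailer\n<< /Size 105 /Info 104 0 R "
        b"/ID [<0123456789ABCDEF> <FEDCBA9876543210>] >>\n"
    )
    print(randomize_trailer_id(sample).decode("ascii"))
```

On a real file you would read the uncompressed PDF in binary mode, pass the bytes through randomize_trailer_id, write them back, and then let pdftk recompress the result.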

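To make the object-ordering channel described above more concrete, here is a small Python sketch. It assumes the objects are represented only by unique sortable identifiers (plain integers here, standing in for object numbers), and all function names are hypothetical. It does not touch a real PDF; it only shows how much can be hidden in an ordering, how a number maps to a shuffle, and why sorting back into the "natural" order wipes the message.

```python
import math


def capacity_bits(n: int) -> float:
    """Bits that can be hidden in the order of n distinct objects: log2(n!)."""
    # lgamma(n + 1) is ln(n!), which avoids building a huge factorial.
    return math.lgamma(n + 1) / math.log(2)


def encode_order(items, value: int):
    """Arrange `items` so that their order encodes `value` (0 <= value < n!)."""
    pool = sorted(items)  # the "natural" order is the reference point
    if not 0 <= value < math.factorial(len(pool)):
        raise ValueError("value does not fit in this many objects")
    ordered = []
    for remaining in range(len(pool), 0, -1):
        index, value = divmod(value, math.factorial(remaining - 1))
        ordered.append(pool.pop(index))
    return ordered


def decode_order(ordered) -> int:
    """Recover the integer hidden in the order of `ordered`."""
    pool = sorted(ordered)
    value = 0
    for item in ordered:
        index = pool.index(item)
        value += index * math.factorial(len(pool) - 1)
        pool.pop(index)
    return value


def normalize_order(ordered):
    """Put the objects back in natural order, destroying any hidden value."""
    return sorted(ordered)


if __name__ == "__main__":
    object_numbers = [1, 2, 3, 4]  # stand-ins for PDF object numbers
    shuffled = encode_order(object_numbers, 10)
    print(shuffled, decode_order(shuffled))
    print(decode_order(normalize_order(shuffled)))
    print(round(capacity_bits(len(object_numbers)), 3))
```

The natural order always decodes to 0, which is why rewriting the file with its objects sorted by a fixed metric is enough to remove whatever the ordering carried.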