Palai

Programmer | Open-source enthusiast | Likes making friends

Unable to Watermark a Report PDF - Internship Notes

Background: the crawler was failing to add watermarks to the reports it crawled.

It turned out that only one particular report was affected. The watermark could not be added because that PDF was missing its trailer.

PDF format:

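A PDF file has four basic parts: the header (a version line such as %PDF-1.4), the body (the indirect objects that make up the pages and their content), the cross-reference table (the byte offset of every object), and the trailer.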
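To make the four parts concrete, here is a minimal sketch in Java (JDK only) that writes a one-page PDF by hand. The class name MinimalPdf and its build() method are hypothetical, chosen only for illustration; the page is blank, and the file uses the classic plain-text cross-reference table rather than a cross-reference stream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that writes the four parts of a tiny one-page PDF. */
final class MinimalPdf {
    private MinimalPdf() {}

    static byte[] build() {
        // Body content: catalog (object 1), page tree (object 2), one blank page (object 3).
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] /Resources << >> >>"
        };
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // 1. Header: the version line.
        write(out, "%PDF-1.4\n");

        // 2. Body: record each object's byte offset for the cross-reference table.
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < objects.length; i++) {
            offsets.add(out.size());
            write(out, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
        }

        // 3. Cross-reference table: free entry 0, then one 20-byte entry per object.
        int xrefOffset = out.size();
        write(out, "xref\n0 " + (objects.length + 1) + "\n");
        write(out, "0000000000 65535 f \n");
        for (int off : offsets) {
            write(out, String.format("%010d 00000 n \n", off));
        }

        // 4. Trailer: /Root names the catalog; startxref gives the xref table's offset.
        write(out, "trailer\n<< /Size " + (objects.length + 1) + " /Root 1 0 R >>\n");
        write(out, "startxref\n" + xrefOffset + "\n%%EOF\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }
}
```

Writing the returned bytes to a .pdf file should give a single blank A4-sized page. Cutting everything after the last endobj produces the kind of trailer-less file described above.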

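Let's focus on the trailer. A PDF reader parses the file starting from the end: the trailer tells it where the cross-reference table begins (through the startxref offset just before the final %%EOF marker) and where certain special objects are, such as the document catalog (/Root). Without a trailer, a strict parser has no reliable entry point into the file.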
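The sketch below shows how a reader might find its entry point from the end of the file: read only the tail, locate the last startxref keyword before %%EOF, and parse the offset after it. TrailerLocator is a hypothetical helper, and the 1024-byte window is an assumption based on the convention that %%EOF sits near the end of the file. It checks startxref rather than the literal trailer keyword, because PDF 1.5+ files that use cross-reference streams keep the trailer dictionary inside that stream instead.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic: locate the cross-reference offset from the end of a PDF. */
final class TrailerLocator {
    private TrailerLocator() {}

    // Assumed search window: %%EOF is conventionally expected near the end of the file.
    static final int TAIL_WINDOW = 1024;

    /** Returns the offset after the last "startxref" before "%%EOF", or -1 if absent. */
    static long findStartXref(byte[] tail) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices equal byte indices.
        String s = new String(tail, StandardCharsets.ISO_8859_1);
        int eof = s.lastIndexOf("%%EOF");
        if (eof < 0) {
            return -1; // no end-of-file marker: the file is probably truncated
        }
        int kw = s.lastIndexOf("startxref", eof);
        if (kw < 0) {
            return -1; // %%EOF present but no pointer to the cross-reference data
        }
        String number = s.substring(kw + "startxref".length(), eof).trim();
        try {
            return Long.parseLong(number);
        } catch (NumberFormatException e) {
            return -1; // pointer present but not a valid offset
        }
    }

    /** Reads only the tail of the file, the way a reader starts parsing from the end. */
    static long findStartXref(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long len = f.length();
            int n = (int) Math.min(len, TAIL_WINDOW);
            byte[] tail = new byte[n];
            f.seek(len - n);
            f.readFully(tail);
            return findStartXref(tail);
        }
    }
}
```

Because lastIndexOf is used, a file with incremental updates (several startxref/%%EOF pairs) resolves to the newest one. A result of -1 would flag a report like the one above before the watermarking step fails on it.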

Browsers and WPS can still open this particular file, but the conversion tool we currently use is a free version that cannot parse it, so the watermark cannot be added.
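One likely reason browsers can still open such a file is that tolerant readers do not depend on the trailer alone: when the cross-reference data is missing or broken, they can rebuild it by scanning the whole file for object headers (PDF.js, for example, has a fallback of this kind). The sketch below illustrates that idea with a hypothetical XrefRebuilder class: it records the byte offset of every "N G obj" header and finds the object declaring /Type /Catalog, which is what the trailer's /Root entry would have pointed to. It assumes each object header starts on its own line and is a simplification, not a full repair routine.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical recovery pass: rebuild object offsets when the xref/trailer is missing. */
final class XrefRebuilder {
    private XrefRebuilder() {}

    // Matches an indirect-object header such as "12 0 obj" at the start of a line.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?m)^(\\d+)\\s+(\\d+)\\s+obj\\b");
    private static final Pattern CATALOG = Pattern.compile("/Type\\s*/Catalog\\b");

    /** Maps object number to byte offset; a later definition overwrites an earlier one. */
    static Map<Integer, Integer> scanObjects(byte[] pdf) {
        // ISO-8859-1 keeps string indices equal to byte offsets.
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<Integer, Integer> offsets = new TreeMap<>();
        Matcher m = OBJ_HEADER.matcher(s);
        while (m.find()) {
            offsets.put(Integer.parseInt(m.group(1)), m.start());
        }
        return offsets;
    }

    /** Returns the number of the first object whose body declares /Type /Catalog, or -1. */
    static int findCatalog(byte[] pdf) {
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = OBJ_HEADER.matcher(s);
        while (m.find()) {
            int end = s.indexOf("endobj", m.end());
            String body = s.substring(m.end(), end < 0 ? s.length() : end);
            if (CATALOG.matcher(body).find()) {
                return Integer.parseInt(m.group(1));
            }
        }
        return -1;
    }
}
```

With the offsets and the catalog number recovered, a new cross-reference table and trailer could in principle be written out, which is roughly what a tolerant reader reconstructs in memory.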

Possible solutions:
1. Look for a different backend PDF watermarking tool.
2. Try a frontend PDF watermarking tool.
3. Generate the PDF from HTML.

The backend watermarking tool we currently use is the open-source version of iText, which is no longer well maintained, so switching to a newer PDF watermarking library would likely solve the problem.
After some research, I found another library, Spire.PDF. Note that it is not open source: it is a commercial product that also offers a free edition.

Spire.PDF is a professional PDF component that can create, write, edit, manipulate, and read PDF files without depending on external tools such as Adobe Acrobat. It supports .NET, Java, WPF, and Silverlight.

However, I ran into a minor issue with it: newer versions of Spire.PDF add a built-in watermark of their own to the output.

My workaround was to switch to an older version, in which that built-in watermark appears only on the first page. Building on this, I add an extra page to the front of each PDF so that the built-in watermark lands on it, and then delete that page.
