python合并pdf并添加书签

初步熟悉

安装

pip install  pypdf2

合并并添加书签

#!/usr/bin/env python3.5
# -*- coding: utf-8 -*-
# @Time    : 2019/12/1 下午3:42
# @Author  : yon
# @Email   : [email protected]
# @File    : tt.py

import os , os.path
from PyPDF2 import PdfFileReader, PdfFileWriter
import time
# import glob
#
# def getFileName(filepath):
#     filelist = glob.glob(filepath + "*.pdf")
#     # print(filelist)
#     # return filelist
#     list_a = []
#     for i in filelist:
#         print(i[22:-1])
#         list_a.append(i[22:])
#     for ii in  sorted(list_a):
#         print(ii)
#
#
#
# getFileName("/home/yon/Desktop/pdf/")
pdffile = "/home/yon/Desktop/pdf/688:TheOutCrowd.pdf"
pdffile1 = "/home/yon/Desktop/pdf/652:ICECapades.pdf"
pdfreader = PdfFileReader(pdffile)
pdfreader1 = PdfFileReader(pdffile1)
numbber = pdfreader.getNumPages()
print(numbber)
bookmark = PdfFileWriter()
bookmark.appendPagesFromReader(pdfreader)
bookmark.appendPagesFromReader(pdfreader1)
bookmark.addBookmark("00test", 0)
bookmark.addBookmark("11test", 1)
bookmark.addBookmark("22test", 2)
bookmark.addBookmark("44test", 4)
bookmark.addBookmark("77test", 7)
bookmark.addBookmark("652:ICECapades", numbber)
bookmark.write(open("/home/yon/Desktop/1tttt.pdf", 'wb'))

注意:
在指定的页面添加书签时候要注意,PyPDF2中pdf的页面是从0开始的,即在序列0添加书签时会跳到首页,在n处添加书签时会跳转至n+1页。

PyPDF2使用说明

官网:
https://pythonhosted.org/PyPDF2/PdfFileWriter.html

The PdfFileReader Class

class PyPDF2.PdfFileReader(stream, strict=True, warndest=None, overwriteWarnings=True)
Initializes a PdfFileReader object. This operation can take some time, as the PDF stream’s cross-reference tables are read into memory.
参数说明:

Parameters:
        stream – A File object or an object that supports the standard read and seek methods similar to a File object. Could also be a string representing a path to a PDF file.
        流,一个文件对象或是类似文件对象的支持标准的读和搜索方法的一个对象,也可以是一个代表了pdf文件路径的字符串
        strict (bool) – Determines whether user should be warned of all problems and also causes some correctable problems to be fatal. Defaults to True.
        严格(bool类型)确定是否应警告用户所有问题,并会导致一些可纠正的问题致命
        warndest – Destination for logging warnings (defaults to sys.stderr).
        警告目的    警告日志的记录目的,默认为sys.stderr
        overwriteWarnings (bool) – Determines whether to override Python’s warnings.py module with a custom implementation (defaults to True).

一些方法:

##decrypt(password)
When using an encrypted / secured PDF file with the PDF Standard encryption handler, this function will allow the file to be decrypted. It checks the given password against the document’s user password and owner password, and then stores the resulting decryption key if either password is correct.

It does not matter which password was matched. Both passwords provide the correct decryption key that will allow the document to be used with this library.

Parameters: password (str) – The password to match.
Returns:    0 if the password failed, 1 if the password matched the user password, and 2 if the password matched the owner password.
Return type:    int
Raises NotImplementedError:
    if document uses an unsupported encryption method.
##documentInfo
Read-only property that accesses the getDocumentInfo() function.

##getDestinationPageNumber(destination)
Retrieve page number of a given Destination object

Parameters: destination (Destination) – The destination to get page number. Should be an instance of Destination
Returns:    the page number or -1 if page not found
Return type:    int

##getDocumentInfo()
Retrieves the PDF file’s document information dictionary, if it exists. Note that some PDF files use metadata streams instead of docinfo dictionaries, and these metadata streams will not be accessed by this function.

Returns:    the document information of this PDF file
Return type:    DocumentInformation or None if none exists.

##getFields(tree=None, retval=None, fileobj=None)
Extracts field data if this PDF contains interactive form fields. The tree and retval parameters are for recursive use.

Parameters: fileobj – A file object (usually a text file) to write a report to on all interactive form fields found.
Returns:    A dictionary where each key is a field name, and each value is a Field object. By default, the mapping name is used for keys.
Return type:    dict, or None if form data could not be located.

##getFormTextFields()
Retrieves form fields from the document with textual data (inputs, dropdowns)

##getNamedDestinations(tree=None, retval=None)
Retrieves the named destinations present in the document.

Returns:    a dictionary which maps names to Destinations.
Return type:    dict

##getNumPages()
Calculates the number of pages in this PDF file.

Returns:    number of pages
Return type:    int
Raises PdfReadError:
    if file is encrypted and restrictions prevent this action.

##getOutlines(node=None, outlines=None)
Retrieves the document outline present in the document.

Returns:    a nested list of Destinations.

##getPage(pageNumber)
Retrieves a page by number from this PDF file.

Parameters: pageNumber (int) – The page number to retrieve (pages begin at zero)
Returns:    a PageObject instance.
Return type:    PageObject

##getPageLayout()
Get the page layout. See setPageLayout() for a description of valid layouts.

Returns:    Page layout currently being used.
Return type:    str, None if not specified

##getPageMode()
Get the page mode. See setPageMode() for a description of valid modes.

Returns:    Page mode currently being used.
Return type:    str, None if not specified

##getPageNumber(page)
Retrieve page number of a given PageObject

Parameters: page (PageObject) – The page to get page number. Should be an instance of PageObject
Returns:    the page number or -1 if page not found
Return type:    int

##getXmpMetadata()
Retrieves XMP (Extensible Metadata Platform) data from the PDF document root.

Returns:    a XmpInformation instance that can be used to access XMP metadata from the document.
Return type:    XmpInformation or None if no metadata was found on the document root.

##isEncrypted
Read-only boolean property showing whether this PDF file is encrypted. Note that this property, if true, will remain true even after the decrypt() method is called.

##namedDestinations
Read-only property that accesses the getNamedDestinations() function.

##numPages
Read-only property that accesses the getNumPages() function.

##outlines
Read-only property that accesses the
getOutlines() function.

##pageLayout
Read-only property accessing the getPageLayout() method.

##pageMode
Read-only property accessing the getPageMode() method.

##pages
Read-only property that emulates a list based upon the getNumPages() and getPage() methods.

##xmpMetadata?
Read-only property that accesses the getXmpMetadata() function

The PdfFileMerger Class

class PyPDF2.PdfFileMerger(strict=True)
参数说明

Initializes a PdfFileMerger object. PdfFileMerger merges multiple PDFs into a single PDF. It can concatenate, slice, insert, or any combination of the above.

See the functions merge() (or append()) and write() for usage information.

Parameters: strict (bool) – Determines whether user should be warned of all problems and also causes some correctable problems to be fatal. Defaults to True.

方法和属性

##addBookmark(title, pagenum, parent=None)
Add a bookmark to this PDF file.

Parameters:
title (str) – Title to use for this bookmark.
pagenum (int) – Page number this bookmark will point to.
parent – A reference to a parent bookmark to create nested bookmarks.

##addMetadata(infos)
Add custom metadata to the output.

Parameters: infos (dict) – a Python dictionary where each key is a field and each value is your new metadata. Example: {u'/Title': u'My title'}

##addNamedDestination(title, pagenum)
Add a destination to the output.

Parameters:
title (str) – Title to use
pagenum (int) – Page number this destination points at.

##append(fileobj, bookmark=None, pages=None, import_bookmarks=True)
Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.

Parameters:
fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.
bookmark (str) – Optionally, you may specify a bookmark to be applied at the beginning of the included file by supplying the text of the bookmark.
pages – can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document.
import_bookmarks (bool) – You may prevent the source document’s bookmarks from being imported by specifying this as False.

##close()
Shuts all file descriptors (input and output) and clears all memory usage.

##merge(position, fileobj, bookmark=None, pages=None, import_bookmarks=True)
Merges the pages from the given file into the output file at the specified page number.

Parameters:
position (int) – The page number to insert this file. File will be inserted after the given number.
fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.
bookmark (str) – Optionally, you may specify a bookmark to be applied at the beginning of the included file by supplying the text of the bookmark.
pages – can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document.
import_bookmarks (bool) – You may prevent the source document’s bookmarks from being imported by specifying this as False.

##setPageLayout(layout)
Set the page layout

Parameters: layout (str) – The page layout to be used

Valid layouts are:
/NoLayout   Layout explicitly not specified
/SinglePage Show one page at a time
/OneColumn  Show one column at a time
/TwoColumnLeft  Show pages in two columns, odd-numbered pages on the left
/TwoColumnRight
    Show pages in two columns, odd-numbered pages on the right
/TwoPageLeft    Show two pages at a time, odd-numbered pages on the left
/TwoPageRight   Show two pages at a time, odd-numbered pages on the right

##setPageMode(mode)
Set the page mode.

Parameters: mode (str) – The page mode to use.
Valid modes are:
/UseNone    Do not show outlines or thumbnails panels
/UseOutlines    Show outlines (aka bookmarks) panel
/UseThumbs  Show page thumbnails panel
/FullScreen Fullscreen view
/UseOC  Show Optional Content Group (OCG) panel
/UseAttachments
    Show attachments panel

##write(fileobj)
Writes all data that has been merged to the given output file.

Parameters: fileobj – Output file. Can be a filename or any kind of file-like object.

The PdfFileWriter Class

class PyPDF2.PdfFileWriter

This class supports writing PDF files out, given pages produced by another class (typically PdfFileReader)

类的属性和方法


##addAttachment(fname, fdata)
    Embed a file inside the PDF.

    Parameters:
    fname (str) – The filename to display.
    fdata (str) – The data in the file.
    Reference: https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf Section 7.11.3

##addBlankPage(width=None, height=None)
    Appends a blank page to this PDF file and returns it. If no page size is specified, use the size of the last page.

    Parameters:
    width (float) – The width of the new page expressed in default user space units.
    height (float) – The height of the new page expressed in default user space units.
    Returns:
    the newly appended page

    Return type:
    PageObject

    Raises PageSizeNotDefinedError:

    if width and height are not defined and previous page does not exist.

##addBookmark(title, pagenum, parent=None, color=None, bold=False, italic=False, fit='/Fit', *args)
    Add a bookmark to this PDF file.

    Parameters:
    title (str) – Title to use for this bookmark.
    pagenum (int) – Page number this bookmark will point to.
    parent – A reference to a parent bookmark to create nested bookmarks.
    color (tuple) – Color of the bookmark as a red, green, blue tuple from 0.0 to 1.0
    bold (bool) – Bookmark is bold
    italic (bool) – Bookmark is italic
    fit (str) – The fit of the destination page. See addLink() for details.

##addJS(javascript)
    Add Javascript which will launch upon opening this PDF.

    Parameters: javascript (str) – Your Javascript.
    >>> output.addJS("this.print({bUI:true,bSilent:false,bShrinkToFit:true});")
    # Example: This will launch the print window when the PDF is opened.

##addLink(pagenum, pagedest, rect, border=None, fit='/Fit', *args)
    Add an internal link from a rectangular area to the specified page.

    Parameters:
    pagenum (int) – index of the page on which to place the link.
    pagedest (int) – index of the page to which the link should go.
    rect – RectangleObject or array of four integers specifying the clickable rectangular area [xLL, yLL, xUR, yUR], or string in the form "[ xLL yLL xUR yUR ]".
    border – if provided, an array describing border-drawing properties. See the PDF spec for details. No border will be drawn if this argument is omitted.
    fit (str) – Page fit or ‘zoom’ option (see below). Additional arguments may need to be supplied. Passing None will be read as a null value for that coordinate.

    Valid zoom arguments (see Table 8.2 of the PDF 1.7 reference for details):
    /Fit    No additional arguments
    /XYZ    [left] [top] [zoomFactor]
    /FitH   [top]
    /FitV   [left]
    /FitR   [left] [bottom] [right] [top]
    /FitB   No additional arguments
    /FitBH  [top]
    /FitBV  [left]

##addMetadata(infos)
    Add custom metadata to the output.

    Parameters: infos (dict) – a Python dictionary where each key is a field and each value is your new metadata.

##addPage(page)
    Adds a page to this PDF file. The page is usually acquired from a PdfFileReader instance.

    Parameters: page (PageObject) – The page to add to the document. Should be an instance of PageObject

##appendPagesFromReader(reader, after_page_append=None)
    Copy pages from reader to writer. Includes an optional callback parameter which is invoked after pages are appended to the writer.

    Parameters: reader – a PdfFileReader object from which to copy page annotations to this writer object. The writer’s annots
    will then be updated :callback after_page_append (function): Callback function that is invoked after

    each page is appended to the writer. Callback signature:

    param writer_pageref (PDF page reference):
        Reference to the page appended to the writer.

##cloneDocumentFromReader(reader, after_page_append=None)
    Create a copy (clone) of a document from a PDF file reader

    Parameters:
    reader – PDF file reader instance from which the clone should be created.

    Callback after_page_append (function):

    Callback function that is invoked after each page is appended to the writer. Signature includes a reference to the appended page (delegates to appendPagesFromReader). Callback signature:

    param writer_pageref (PDF page reference):
        Reference to the page just appended to the document.

##cloneReaderDocumentRoot(reader)
    Copy the reader document root to the writer.

    Parameters: reader – PdfFileReader from the document root should be copied.
    :callback after_page_append

##encrypt(user_pwd, owner_pwd=None, use_128bit=True)
    Encrypt this PDF file with the PDF Standard encryption handler.

    Parameters:
    user_pwd (str) – The “user password”, which allows for opening and reading the PDF file with the restrictions provided.
    owner_pwd (str) – The “owner password”, which allows for opening the PDF files without any restrictions. By default, the owner password is the same as the user password.
    use_128bit (bool) – flag as to whether to use 128bit encryption. When false, 40bit encryption will be used. By default, this flag is on.

##getNumPages()
    Returns:    the number of pages.
    Return type:    int

##getPage(pageNumber)
    Retrieves a page by number from this PDF file.

    Parameters: pageNumber (int) – The page number to retrieve (pages begin at zero)
    Returns:    the page at the index given by pageNumber
    Return type:    PageObject

##getPageLayout()
    Get the page layout. See setPageLayout() for a description of valid layouts.

    Returns:    Page layout currently being used.
    Return type:    str, None if not specified
##getPageMode()

    Get the page mode. See setPageMode() for a description of valid modes.

    Returns:    Page mode currently being used.
    Return type:    str, None if not specified

##insertBlankPage(width=None, height=None, index=0)
    Inserts a blank page to this PDF file and returns it. If no page size is specified, use the size of the last page.

    Parameters:
    width (float) – The width of the new page expressed in default user space units.
    height (float) – The height of the new page expressed in default user space units.
    index (int) – Position to add the page.
    Returns:
    the newly appended page

    Return type:
    PageObject

    Raises PageSizeNotDefinedError:

    if width and height are not defined and previous page does not exist.

##insertPage(page, index=0)
    Insert a page in this PDF file. The page is usually acquired from a PdfFileReader instance.

Parameters:
    page (PageObject) – The page to add to the document. This argument should be an instance of PageObject.
    index (int) – Position at which the page will be inserted.

##pageLayout
    Read and write property accessing the getPageLayout() and setPageLayout() methods.

##pageMode
    Read and write property accessing the getPageMode() and setPageMode() methods.

##removeImages(ignoreByteStringObject=False)
    Removes images from this output.

    Parameters: ignoreByteStringObject (bool) – optional parameter to ignore ByteString Objects.

##removeLinks()
    Removes links and annotations from this output.

##removeText(ignoreByteStringObject=False)
    Removes images from this output.

    Parameters: ignoreByteStringObject (bool) – optional parameter to ignore ByteString Objects.

##setPageLayout(layout)
    Set the page layout

    Parameters: layout (str) – The page layout to be used
    Valid layouts are:
    /NoLayout   Layout explicitly not specified
    /SinglePage Show one page at a time
    /OneColumn  Show one column at a time
    /TwoColumnLeft  Show pages in two columns, odd-numbered pages on the left
    /TwoColumnRight
        Show pages in two columns, odd-numbered pages on the right
    /TwoPageLeft    Show two pages at a time, odd-numbered pages on the left
    /TwoPageRight   Show two pages at a time, odd-numbered pages on the right

##setPageMode(mode)
    Set the page mode.

    Parameters: mode (str) – The page mode to use.
    Valid modes are:
    /UseNone    Do not show outlines or thumbnails panels
    /UseOutlines    Show outlines (aka bookmarks) panel
    /UseThumbs  Show page thumbnails panel
    /FullScreen Fullscreen view
    /UseOC  Show Optional Content Group (OCG) panel
    /UseAttachments
        Show attachments panel

##updatePageFormFieldValues(page, fields)
    Update the form field values for a given page from a fields dictionary. Copy field texts and values from fields to page.

    Parameters:
    page – Page reference from PDF writer where the annotations and field data will be updated.
    fields – a Python dictionary of field names (/T) and text values (/V)

##write(stream)
    Writes the collection of pages added to this object out as a PDF file.

    Parameters: stream – An object to write the file to. The object must support the write method and the tell method, similar to a file object.

原文地址:https://www.cnblogs.com/g2thend/p/12003798.html

时间: 2024-07-31 20:29:11

python合并pdf并添加书签的相关文章

使用Python批量合并PDF文件(带书签功能)

网上找了几个合并pdf的软件,发现不是很好用,一般都没有添加书签的功能. 在网上查找了python合并pdf的脚本,发现也没有添加书签的功能.于是自己动手编写了一个小工具,代码如下: 1 #!/usr/bin/env python3 2 # -*- coding: utf-8 -*- 3 ''' 4 #文件名:pdfmerge.py 5 本脚本用来合并pdf文件,输出的pdf文件按输入的pdf文件名生成书签 6 使用示例如下: 7 python pdfmerge.py -p "D:\pdf-fi

如何给pdf添加书签

如何给pdf添加书签.书签可以方便我们在没有阅读完文件再次阅读文件时快速找到上一次阅读的位置,那么我们该如何给pdf文件添加书签呢?如果屏幕前的你还不知道方法的话,那就和小编一起开始今天的学习吧.操作软件:迅捷PDF编辑器 工具下载地址:https://www.xunjiepdf.com/editor 1.想要给pdf添加书签,我们需要准备好合适的工具,通过百度浏览器下载迅捷PDF编辑. 2.首先我们需要使用能够给pdf添加书签的工具将pdf文件打开,所以我们点击文件按钮,在目录中选择打开按钮打

pdf添加书签的快速方法

随着时代的发展,无论是在工作还是在生活中,我们阅读与处理的很多文件已经从纸质文件转变为电子文件了.那么,当我们接触到一份pdf文档时,如果需要给PDF添加书签,我们应该如何操作呢?感兴趣的小伙伴们继续往下阅读吧.操作软件:迅捷PDF编辑器软件地址:https://www.xunjiepdf.com/editor1.首先我们需要找到一款可以给PDF文档添加书签的软件,我们将软件安装到自己的电脑中.2.当我们用工具将pdf文件打开后,我们需要先查看pdf文档本身有没有书签存在,需要怎么操作呢?只需要

pdf添加书签操作介绍

无论是在工作还是生活中,pdf文件都是一种非常常用的文件格式,当我们阅读pdf文件的时,因为一些其他的事情需要暂时关闭文档,在下一次阅读的时候如何能够快速找到上一次阅读的位置呢?这时候,我们就需要掌握pdf添加书签的操作方法了,下面小编就使用迅捷PDF编辑器为大家操作一下PDF添加书签的方法. 操作软件:迅捷PDF编辑器 软件地址:https://www.xunjiepdf.com/editor 1.如果我们想要给pdf文件添加书签,那么就需要准备一款专业的工具.通过百度搜索pdf编辑器,选择一

怎么给PDF添加书签,PDF添加书签的方法

怎么给PDF添加书签?PDF添加书签的方法你知道吗?现在我们在阅读PDF文件的时候,阅读到某个位置的时候就想要给PDF文件添加个书签,想要给PDF添加书签还是比较简单的,推荐大家使用迅捷PDF编辑器来进行操作,下面小编就为大家分享一下迅捷PDF编辑器给PDF添加书签的操作方法.操作方法如下: 1.首先我们用pdf编辑器将pdf文件给打开,然后点击界面上方的视图按钮,找到其中的书签选项,点击书签按钮打开书签框. 2.在界面左侧此时会弹出书签框,我们可以在书签框中看到所有的pdf书签,点击这些书签,

合并PDF(Merge PDF)

插件介绍: 想必各位小伙伴们都会遇到这样的情况,在网上下载资料的时候,本来是完整的,下载完成后却被分成了几个部分,如果把这几个部分合成一个PDF文件,那么不管是阅读还是管理都会更方便,今天就给大家介绍这个合成PDF的插件.Smallpdf的PDF合并工具运行您免费在线合并PDF文件. 拖放或上传多个PDF文件,当文件正上传中,您可预备工作.一旦您满意该效果,即可生成PDF,然后该文件将自动下载至您的硬件. 使用说明: 将合并PDF(Merge PDF)添加至chrome,并在扩展器中启动它. 功

PDF编辑器让大家都能合并PDF文档 从此不求人

有很多网友对如何将两个PDF文档合并成一个PDF伤透了脑筋,不是在贴吧里求助,就在在问答知乎上提问,今天小编就要化身为解决问题小能手,帮助大家解决这个问题,让大家都能合并PDF文档,从此不求人! 首先,我们需要下载一个非常好用的PDF阅读编辑器,那就是PDF Expert for Mac.(不知道在哪里下载的,小编这里有免费的下载包哦:http://wm.makeding.com/iclk/?zoneid=17779) 下载好之后就将这个PDF阅读编辑器安装好,然后打开如果有PDF Expert

合并pdf

合并pdf    pom.xml添加Jar依赖        <!-- pdf处理jar依赖 start -->        <dependency>            <groupId>com.itextpdf</groupId>            <artifactId>itextpdf</artifactId>            <version>5.5.12</version>      

C# / VB.NET合并PDF指定页

在前面的文章中,我们已经知道如何合并.拆分多个PDF文件,在这篇文章中的合并.拆分PDF文档主要是以方便文档管理的目的来操作文档,在文档查阅.管理及存储上很方便实用.但是我们如果想要合并多个文档中的部分文档页的内容,该如何来做呢?可以参考接下来将要介绍的合并方法. PS: 本篇文章是对Free Spire.PDF 的合并功能的进一步介绍,即如何合并多个PDF文档中的指定页(指定单页.指定多页)为一个新文档,更多关于Free Spire.PDF对PDF文档的操作功能可参阅这里的博客. 使用工具:F