Adding Bookmarks to PDF Documents with pdfmark

Applies To: PDF Manipulation

Have you ever needed to add those fancy navigational elements to a PDF document? Some people (incorrectly) call these the table of contents; most interfaces call them Bookmarks, and internally they’re collectively known as the outline. I’m talking about these things:

[Figure: Bookmarks]

But what if you’re generating PDFs programmatically? For instance, you might be converting a bunch of image files to a PDF using something like ImageMagick (or even GraphicsMagick). Or maybe your fancy software doesn’t add them and you need a way to get them added. Fortunately, you can add them through the somewhat obscure pdfmark interface! WOWEE!

There are several paid tools out there to do this, and even some free ones that do an okay job. For instance, jpdfbookmarks is free and mostly works (the bookmarks are there, but other PDF processors will likely flag the PDF as invalid and “repair” it by removing them). However, these tools often add an extra interface or a simplified format that may not support everything you can do with bookmarks (let alone the rest of pdfmark). There is actually a ton you can do (bookmarks alone can do far more than navigate to another page), and the syntax isn’t that complicated.

Anatomy of a PDF Bookmark

Generally, a basic bookmark simply jumps to a specific page. Here’s a very basic example of a bookmark that will open page 2 of a PDF document (pages start at 1) with a child bookmark that will open page 3:

[ /Title (Some Bookmark)
  /Page 2
  /Count 1
  /View [/XYZ null null 0]
  /OUT pdfmark

    [ /Title (Sub Bookmark)
      /Page 3
      /View [/Fit]
      /OUT pdfmark

The whitespace (tabs and spaces) makes no difference, but I find it easier to indent child bookmarks. Here’s what’s happening:

  • [ represents the start of a new pdfmark “command” (you can have multiple commands in a single file)
  • /Title defines the text of the bookmark (everything in the parentheses)
  • /Count indicates the number of direct child bookmarks
    • When the number is positive the bookmark is expanded (its children showing by default)
    • When the number is negative the bookmark is collapsed (its children hidden by default)
    • If there are no children, leave it out
    • Child bookmarks should follow their parent. When the children are done, the next bookmark belongs to the parent level
  • /Page defines which page number (starting with 1) to navigate to (this is not required since bookmarks can do other actions as well – see below)
  • /View defines the zoom level the destination page will have (see below for details)
  • /OUT pdfmark defines the “command” as a bookmark (/OUT) and is the end of the “command”
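Since the whole point is generating PDFs programmatically, here’s a minimal Python sketch that writes /OUT pdfmarks from a nested outline, following the rules above: each child comes right after its parent, and /Count is the number of direct children, negated when the entry should start collapsed. The Bookmark class and the outline_pdfmarks and ps_string functions are hypothetical names made up for this example, not part of any tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bookmark:
    """One outline entry (hypothetical helper type)."""
    title: str
    page: int                       # pages start at 1
    view: str = "[/XYZ null null 0]"
    expanded: bool = True           # True -> positive /Count, False -> negative
    children: List["Bookmark"] = field(default_factory=list)


def ps_string(text: str) -> str:
    """Wrap text in PostScript parentheses, escaping backslashes and parentheses
    so a title containing them can't break the string."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def outline_pdfmarks(marks: List[Bookmark], depth: int = 0) -> List[str]:
    """Emit /OUT pdfmark lines, each parent immediately followed by its children."""
    lines: List[str] = []
    pad = "  " * depth  # indentation is cosmetic only
    for bm in marks:
        lines.append(f"{pad}[ /Title {ps_string(bm.title)}")
        lines.append(f"{pad}  /Page {bm.page}")
        if bm.children:
            # /Count counts direct children only; the sign picks expanded/collapsed
            count = len(bm.children)
            lines.append(f"{pad}  /Count {count if bm.expanded else -count}")
        lines.append(f"{pad}  /View {bm.view}")
        lines.append(f"{pad}  /OUT pdfmark")
        # Children must follow their parent directly
        lines.extend(outline_pdfmarks(bm.children, depth + 1))
    return lines
```

For example, printing "\n".join(outline_pdfmarks([Bookmark("Some Bookmark", 2, children=[Bookmark("Sub Bookmark", 3, view="[/Fit]")])])) should give you the same two pdfmarks as the example above, ready to save into a text file.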

View Magnification

The zoom level of the destination page is set using the /View option. There are a ton of options here (see page 11 of the Cooking up Enhanced PDF with pdfmark Recipes eBook by Lynn Mead for more examples). But here are the most common values:

  • [/Fit] – Fits the page to the window (the whole page is visible)
  • [/FitH top] – Fits the width of the page to the window, replace top with a number
    • The top value is the distance from the page origin to the top of the window (offset) e.g. [/FitH 32]
  • [/FitH -32768] – Fits the width of the page to the window (top value is automatic)
  • [/FitV left] – Fits the height of the page to the window, replace left with a number
    • The left value is the distance from the page origin to the left edge of the window (offset) e.g. [/FitV -17]
  • [/XYZ left top zoom] – Gives a specific origin offset and zoom level, replace left, top, and zoom with either a number or the word null e.g. [/XYZ 3 5 10] or [/XYZ null null 0]
    • The left value is the distance from the page origin to the left edge of the window (offset)
    • The top value is the distance from the page origin to the top of the window (offset)
    • The zoom level is the magnification factor (1 is 100%, 2 is 200%); 0 or null keeps the current zoom

To keep whatever zoom level the user is already using, just use [/XYZ null null 0].
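If you’re generating these views programmatically, the only tricky part is the null keyword. Here’s a small Python sketch, assuming you represent “keep the current value” as None; the function names are hypothetical:

```python
from typing import Optional


def ps_num(value: Optional[float]) -> str:
    """PostScript token for a view parameter: None becomes the literal null."""
    if value is None:
        return "null"
    # Print whole numbers without a trailing .0 (32.0 -> 32)
    return str(int(value)) if float(value).is_integer() else str(value)


def view_xyz(left: Optional[float] = None, top: Optional[float] = None,
             zoom: Optional[float] = None) -> str:
    """[/XYZ left top zoom]; any argument left as None is written as null."""
    return f"[/XYZ {ps_num(left)} {ps_num(top)} {ps_num(zoom)}]"


def view_fith(top: float = -32768) -> str:
    """[/FitH top]; the default -32768 matches the 'automatic top' form above."""
    return f"[/FitH {ps_num(top)}]"


def view_fitv(left: float) -> str:
    """[/FitV left]"""
    return f"[/FitV {ps_num(left)}]"
```

So view_xyz(zoom=0) produces the “keep the user’s zoom” value [/XYZ null null 0], and view_fitv(-17) produces [/FitV -17].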

Other Options

There are additional parameters for bookmarks that can be added as necessary:

  • /Color [R G B] – Defines the color of the bookmark. Replace R, G, and B with numbers e.g. [.2 1 .76]
    • R is the amount of red, from 0 (none) to 1 (full); use decimals for values in between
    • G is the amount of green, from 0 (none) to 1 (full); use decimals for values in between
    • B is the amount of blue, from 0 (none) to 1 (full); use decimals for values in between
  • /F format – Defines the format of the bookmark text. Replace format with one of the following numbers:
    • 0 – normal
    • 1 – italic
    • 2 – bold
    • 3 – bold italic
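Most color pickers hand you 0-255 values, so a conversion step helps. Here’s a hypothetical Python sketch for both options; the /F arithmetic simply follows from the list above (italic adds 1, bold adds 2):

```python
def color_array(r: int, g: int, b: int) -> str:
    """Convert 0-255 RGB values into the 0-1 /Color array used by pdfmark."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    # Three significant digits is plenty for an on-screen bookmark color
    parts = (f"{channel / 255:.3g}" for channel in (r, g, b))
    return "[" + " ".join(parts) + "]"


def format_flag(italic: bool = False, bold: bool = False) -> int:
    """/F value: 0 normal, 1 italic, 2 bold, 3 bold italic."""
    return (1 if italic else 0) + (2 if bold else 0)
```

For example, color_array(51, 255, 194) gives [0.2 1 0.761], which you would write as /Color [0.2 1 0.761], and format_flag(italic=True, bold=True) gives 3.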

Advanced Actions

A bookmark can do more than just open a page within the document. You can use the following parameters to change that behavior:

  • /File (filename) – Opens the PDF document specified in the filename value
    • To open a non-PDF file use the /File parameter along with /Action /Launch
  • /URI (address) – Opens the web page specified in the address value
  • /Action various – Performs an advanced action
    • There are several of these and they can get rather complicated but here’s a brief list:
      • /Action /Launch – Opens non-PDF files, requires a /File parameter
      • /Action << /Subtype /Name /N /menuitem >> – Executes the menu item specified (replace menuitem with the name of the menu item e.g. /Action << /Subtype /Name /N /Print >>)
      • /Action /Article – Follow an Article Thread (requires the /Dest parameter and optionally a /File parameter)
      • /Action << /Subtype /ImportData /F (filename.fdf) >> – Import Form Data from the specified file
      • /Action << /Subtype /ResetForm >> – Reset the Form

There are even additional actions for playing movies and sounds, executing JavaScript, and submitting form data.

See page 12 of the Cooking up Enhanced PDF with pdfmark Recipes eBook by Lynn Mead for more examples and details.
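One gotcha with /File: in PostScript strings a backslash starts an escape sequence, so a Windows path such as C:\docs\notes.txt gets mangled (the \n turns into a newline) unless each backslash is doubled. Here’s a hypothetical Python sketch that builds two of the action bookmarks above with that escaping applied; the function names are made up for illustration:

```python
def ps_string(text: str) -> str:
    """PostScript literal string: double backslashes (Windows paths!) and
    escape parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def launch_bookmark(title: str, path: str) -> str:
    """Bookmark that opens a non-PDF file (/Action /Launch plus /File)."""
    return (f"[ /Title {ps_string(title)}\n"
            f"  /Action /Launch\n"
            f"  /File {ps_string(path)}\n"
            f"  /OUT pdfmark")


def menu_bookmark(title: str, menu_item: str) -> str:
    """Bookmark that executes a named menu item, e.g. Print."""
    return (f"[ /Title {ps_string(title)}\n"
            f"  /Action << /Subtype /Name /N /{menu_item} >>\n"
            f"  /OUT pdfmark")
```

With this, launch_bookmark("Notes", r"C:\docs\notes.txt") writes the path as (C:\\docs\\notes.txt), which PostScript reads back as the original path.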

Applying the Bookmarks to a PDF Document

As a reminder, pdfmark is just put in a text file and then applied using GhostScript with the following command:

gswin64c -o [outputfilename] -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress [originalPDFfilename] [pdfmarkfilename]

For additional details about how to get this all set up quickly, see my previous post Applying pdfmark to PDF Documents Using GhostScript.

If you’re adding bookmarks, you should probably ensure they’re visible when the document is opened by also setting the View Options.

Setting PDF View Options with pdfmark

Applies To: PDF Manipulation

Using the pdfmark syntax, you can add a lot of features to existing PDF documents. In a previous post I showed you how to apply pdfmark to PDF documents using GhostScript from the command line. In this post, I’ll show you how to set the View Options using this same technique.

View Options control the default display for a PDF document when it is opened. You are able to set the zoom levels, the starting page, and determine what features of the interface are already visible.

The following pdfmark will apply View Options to a PDF document:

[ /PageMode /UseOutlines
  /Page 1
  /View [/Fit]
  /DOCVIEW pdfmark

Here’s a breakdown of what’s happening:

  • [ represents the start of a new pdfmark “command” (you can have multiple commands in a single file)
  • /PageMode defines the page mode display (see table below for options)
  • /Page defines which page number (starting with 1) to start on
  • /View defines the zoom level the document will start with (see below for details)
  • /DOCVIEW pdfmark defines the “command” as View Options (/DOCVIEW) and is the end of the “command”

Page Modes

The /PageMode option determines the starting state for the PDF document. None of the options disable any features (so a user can still turn them on in the interface).

  • /UseNone – Document displays without bookmarks or thumbnails visible
  • /UseOutlines – Document displays with the bookmarks visible
  • /UseThumbs – Document displays with the thumbnails visible
  • /FullScreen – Document displays in full screen mode
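If you generate this pdfmark from a script, a little validation catches typos in the page mode before GhostScript ever sees them. Here’s a hypothetical Python helper; it only accepts the four modes in this list, and since the PDF Reference defines a few more (such as /UseAttachments), widen the set if you need them:

```python
PAGE_MODES = {"UseNone", "UseOutlines", "UseThumbs", "FullScreen"}


def docview_pdfmark(page_mode: str = "UseOutlines", page: int = 1,
                    view: str = "[/Fit]") -> str:
    """Build a /DOCVIEW pdfmark, rejecting page modes not in PAGE_MODES."""
    if page_mode not in PAGE_MODES:
        raise ValueError(f"unknown /PageMode: {page_mode}")
    if page < 1:
        raise ValueError("pages are numbered from 1")
    return "\n".join([
        f"[ /PageMode /{page_mode}",
        f"  /Page {page}",
        f"  /View {view}",
        "  /DOCVIEW pdfmark",
    ])
```

Called with no arguments, it produces the same pdfmark shown at the top of this post.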

View Magnification

The starting zoom level for a PDF document is set using the /View option. There are a ton of options here (see page 11 of the Cooking up Enhanced PDF with pdfmark Recipes eBook by Lynn Mead for more examples). But here are the most common values:

  • [/Fit] – Fits the page to the window (the whole page is visible)
  • [/FitH top] – Fits the width of the page to the window, replace top with a number
    • The top value is the distance from the page origin to the top of the window (offset) e.g. [/FitH 32]
  • [/FitH -32768] – Fits the width of the page to the window (top value is automatic)
  • [/FitV left] – Fits the height of the page to the window, replace left with a number
    • The left value is the distance from the page origin to the left edge of the window (offset) e.g. [/FitV -17]
  • [/XYZ left top zoom] – Gives a specific origin offset and zoom level, replace left, top, and zoom with either a number or the word null e.g. [/XYZ 3 5 10] or [/XYZ null null 0]
    • The left value is the distance from the page origin to the left edge of the window (offset)
    • The top value is the distance from the page origin to the top of the window (offset)
    • The zoom level is the magnification factor (1 is 100%, 2 is 200%); 0 or null keeps the current zoom

Applying the View Options to a PDF Document

As a reminder, pdfmark is just put in a text file and then applied using GhostScript with the following command:

gswin64c -o [outputfilename] -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress [originalPDFfilename] [pdfmarkfilename]

For additional details about how to get this all set up quickly, see my previous post Applying pdfmark to PDF Documents Using GhostScript.

Applying pdfmark To PDF Documents Using GhostScript

Applies To: PDF Manipulation

The Adobe Portable Document Format (PDF) has a ton of features but often they seem locked behind paywalls such as Acrobat Pro or 3rd party software/utilities. Fortunately, Adobe created a syntax to tap into many of these features called pdfmark. Pdfmark lets you do things like add bookmarks, annotations, document properties, links, attachments, and more! In this post I’ll introduce you to basic pdfmark syntax and show you how to apply it via the command line.

I’ve often been tasked with doing this sort of thing to 1,000s of documents. There are PDF libraries, open source and otherwise, that you can add to some custom code, but they often have poor documentation, spotty support, and varying costs.

Fortunately, I’m going to show you how to get started writing your own pdfmark files and applying them from the command line with the free GhostScript tool! From there it’s relatively simple to batch process 1,000s of documents or integrate it into your own tool.

This post will serve as a basic setup guide for a short series on different things you can do with pdfmark.

Getting GhostScript

GhostScript is a free, open source library with a command line interface that makes it really easy to apply your pdfmark markup to any PDF with a simple command, and it can be automated to process thousands of files. You may even have it installed already since it’s used by a lot of software (like PDF printers). If not, it’s a pretty simple install.

Head over to the downloads page and pick either the 64-bit or 32-bit version depending on your machine. Unless you want tech support or want to redistribute GhostScript commercially, you can get the free one. As of this post, the latest version was 9.21.

After you’ve installed GhostScript, you’ll want to add it to your PATH variable so that you can easily call it from any folder on the command line. The easiest way to do this on Windows 10 is to type Environment Variables in the Cortana prompt, open the Environment Variables dialog, double-click the Path variable, click New, and paste the path to GhostScript’s bin directory. Your actual path may be different depending on where you installed it:

[Figure: GhostScript Path]
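If you plan to script any of this, it helps to confirm the executable is actually reachable on your PATH first. Here’s a minimal Python sketch using the standard library’s shutil.which; the candidate names are an assumption (gswin64c and gswin32c are the Windows console executables, and gs is the usual name on Linux and macOS):

```python
import shutil
from typing import Optional

# Checked in order: 64-bit Windows, 32-bit Windows, then the usual Unix name
CANDIDATES = ("gswin64c", "gswin32c", "gs")


def find_ghostscript() -> Optional[str]:
    """Return the full path of the first GhostScript executable on PATH, or None."""
    for name in CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(find_ghostscript() or "GhostScript was not found on PATH")
```

If this prints the “not found” message after you’ve edited Path, open a new command prompt; existing windows keep the old environment.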

Basic pdfmark

GhostScript will apply pdfmark syntax to a PDF document by referencing a text file. So let’s create a pdfmark text file!

Just open up notepad or whatever editor you prefer and type the following:

[ /PageMode /UseOutlines
  /Page 1
  /View [/Fit]
% I'm a comment!
  /DOCVIEW pdfmark

This is an example of applying View Options to a PDF document and is one of the simplest things you can do.

Structure

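Every pdfmark “command” starts with a left square bracket and ends with the command type (preceded by a forward slash) followed by the word pdfmark. Frustratingly, there is no closing square bracket (WHY?!). It turns out the [ is PostScript’s mark operator rather than the start of an array: it drops a marker onto the stack, and pdfmark consumes everything back to that marker.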

You can have multiple commands in a single file.

Whitespace

In a pdfmark document, spaces and tabs don’t matter (except in strings, which are enclosed in parentheses and not shown above). This means that you could write it all on one line or do what I did: separate it across multiple lines and indent it in a way that makes it easier to read.

Comments

You can add comments by using a % sign. The comment runs until the end of the line and won’t be interpreted at all. For a multi-line comment, each line needs its own %.
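To make the structure, string, and comment rules concrete, here’s a rough Python sketch that lists the command type of each pdfmark in a file while ignoring % comments and anything inside (strings). It’s a hypothetical helper that handles only the subset of PostScript syntax used in these posts, not a full PostScript tokenizer, but it works as a quick sanity check before you hand a generated file to GhostScript:

```python
from typing import List

DELIMITERS = set("[]<>{}/()%")


def pdfmark_commands(source: str) -> List[str]:
    """Return the command type (e.g. 'DOCVIEW', 'OUT') of every pdfmark in source."""
    tokens: List[str] = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch == "%":  # comment: skip to end of line
            while i < n and source[i] not in "\r\n":
                i += 1
        elif ch == "(":  # string: balanced parens may nest, backslash escapes
            depth = 0
            while i < n:
                c = source[i]
                if c == "\\":
                    i += 2
                    continue
                if c == "(":
                    depth += 1
                elif c == ")":
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                i += 1
            tokens.append("<string>")
        elif ch == "/":  # name: slash plus following regular characters
            j = i + 1
            while j < n and not source[j].isspace() and source[j] not in DELIMITERS:
                j += 1
            tokens.append(source[i:j])
            i = j
        elif ch in DELIMITERS:  # [, ], <, > and friends
            tokens.append(ch)
            i += 1
        else:  # numbers and operators such as pdfmark
            j = i
            while j < n and not source[j].isspace() and source[j] not in DELIMITERS:
                j += 1
            tokens.append(source[i:j])
            i = j
    # A command is the name right before each pdfmark operator
    return [tokens[k - 1].lstrip("/") for k, tok in enumerate(tokens)
            if tok == "pdfmark" and k > 0 and tokens[k - 1].startswith("/")]
```

Run against the example above, it should report a single DOCVIEW command, and a commented-out pdfmark or the word pdfmark inside a (title) won’t be counted.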

Applying pdfmark to a PDF Document

Here’s the basic syntax for applying a pdfmark text file to a PDF document:

gswin64c -o [outputfilename] -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress [originalPDFfilename] [pdfmarkfilename]
Note – GhostScript can do a ton of things and there are lots of additional options you can mix in to do some really powerful stuff, but the above is all you need to apply pdfmark.

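GhostScript can’t safely write over the PDF it is reading, so if you want to replace the original document, write the output to a new file and then rename it over the original. Also, note that the parameters are CASE SENSITIVE.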

Here’s an example of applying pdfmark to the MyPDF.pdf document using the pdfmark.txt file and saving the result as MySuperPDF.pdf:

gswin64c -o MySuperPDF.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress MyPDF.pdf pdfmark.txt
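To batch process a folder of documents (and get the effect of overwriting each one), here’s a minimal Python sketch that runs the same command through subprocess. It writes to a temporary file next to the original and only replaces the original if GhostScript succeeds. The function names are hypothetical, and it assumes the executable name you pass in (e.g. gswin64c) is on your PATH:

```python
import os
import subprocess
import tempfile
from pathlib import Path
from typing import List


def gs_command(gs_exe: str, source_pdf: str, pdfmark_file: str, output_pdf: str) -> List[str]:
    """The command above as an argument list (no shell quoting needed)."""
    return [gs_exe, "-o", output_pdf, "-sDEVICE=pdfwrite", "-dPDFSETTINGS=/prepress",
            source_pdf, pdfmark_file]


def apply_in_place(gs_exe: str, source_pdf: str, pdfmark_file: str) -> None:
    """Write to a temp file in the same folder, then swap it over the original."""
    folder = os.path.dirname(os.path.abspath(source_pdf))
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf", dir=folder)
    os.close(fd)
    try:
        # check=True raises CalledProcessError if GhostScript reports a failure
        subprocess.run(gs_command(gs_exe, source_pdf, pdfmark_file, tmp_path), check=True)
        os.replace(tmp_path, source_pdf)  # same folder, so this is a simple rename
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # never leave half-written files behind
        raise


def apply_to_folder(gs_exe: str, folder: str, pdfmark_file: str) -> None:
    """Apply one pdfmark file to every PDF directly inside folder."""
    # sorted() builds the full list first, so our temp files are never picked up
    for pdf in sorted(Path(folder).glob("*.pdf")):
        apply_in_place(gs_exe, str(pdf), pdfmark_file)
```

Usage would look like apply_to_folder("gswin64c", r"C:\PDFs", "pdfmark.txt") (the folder is just an example). Because -o implies -dBATCH and -dNOPAUSE, GhostScript exits after each file instead of waiting at its interactive prompt, which is what lets the loop keep going.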

Great! Now we can apply pdfmark to PDF files using the command line! But what can we do with it? The next few posts will provide several examples of what you can do. In the meantime, check out the pdfmark Reference, the PDF Reference, and/or the really helpful Cooking up Enhanced PDF with pdfmark Recipes eBook by Lynn Mead.

Updating an XML File in the 14 Hive Using a Custom Timer Job

Applies To: SharePoint 2010, .NET Framework (C#, VB.NET)

As mentioned in a previous post, I’ve recently put together a solution for automatically configuring your SharePoint servers to use the Adobe PDF icon for PDF files. You can download the solution as well as the source for free from CodePlex here: WireBear PDFdocIcon. I’m going to show some of the code as it currently exists below, but be sure to check out the CodePlex site to ensure you have the latest version.

I’ve also provided the bulk of the code and some explanation for installing/uninstalling a custom job from a SharePoint solution in my last post: Implementing a Custom SharePoint Timer Job. In this post we’ll explore what’s actually happening in the execution of the timer job.

The goal is to update the DOCICON.xml file in the 14\TEMPLATE\XML folder within the SharePoint 2010 Hive to include or remove a mapping entry for a specific file extension. Here is the entire DocIconJob class:

The Code:

Imports Microsoft.SharePoint.Administration
Imports System.IO
Imports Microsoft.SharePoint.Utilities
Imports System.Xml

Public Class DocIconJob
    Inherits SPServiceJobDefinition

#Region "Properties"

    Private _dociconPath As String
    Public ReadOnly Property DocIconPath() As String
        Get
            If String.IsNullOrEmpty(_dociconPath) Then _dociconPath = SPUtility.GetGenericSetupPath("TEMPLATE\XML\DOCICON.XML")
            Return _dociconPath
        End Get
    End Property

    Private Const InstallingKey As String = "DocIconJob_InstallingKey"
    Private Property _installing() As Boolean
        Get
            If Properties.ContainsKey(InstallingKey) Then
                Return Convert.ToBoolean(Properties(InstallingKey))
            Else
                Return True
            End If
        End Get
        Set(ByVal value As Boolean)
            If Properties.ContainsKey(InstallingKey) Then
                Properties(InstallingKey) = value.ToString
            Else
                Properties.Add(InstallingKey, value.ToString)
            End If
        End Set
    End Property

    Private Const FileExtensionKey As String = "DocIconJob_FileExtensionKey"
    Private Property _fileExtension() As String
        Get
            If Properties.ContainsKey(FileExtensionKey) Then
                Return Convert.ToString(Properties(FileExtensionKey))
            Else
                Return String.Empty
            End If
        End Get
        Set(ByVal value As String)
            If Properties.ContainsKey(FileExtensionKey) Then
                Properties(FileExtensionKey) = value
            Else
                Properties.Add(FileExtensionKey, value)
            End If
        End Set
    End Property

    Private Const ImageFilenameKey As String = "DocIconJob_ImageFilenameKey"
    Private Property _imageFilename() As String
        Get
            If Properties.ContainsKey(ImageFilenameKey) Then
                Return Convert.ToString(Properties(ImageFilenameKey))
            Else
                Return String.Empty
            End If
        End Get
        Set(ByVal value As String)
            If Properties.ContainsKey(ImageFilenameKey) Then
                Properties(ImageFilenameKey) = value
            Else
                Properties.Add(ImageFilenameKey, value)
            End If
        End Set
    End Property

#End Region

    Public Sub New()
        MyBase.New()
    End Sub

    Public Sub New(JobName As String, service As SPService, Installing As Boolean, FileExtension As String, ImageFilename As String)
        MyBase.New(JobName, service)
        _installing = Installing
        _fileExtension = FileExtension
        _imageFilename = ImageFilename
    End Sub

    Public Overrides Sub Execute(jobState As Microsoft.SharePoint.Administration.SPJobState)
        UpdateDocIcon()
    End Sub

    Private Sub UpdateDocIcon()
        Dim x As New XmlDocument
        x.Load(DocIconPath)

        Dim mapNode As XmlNode = x.SelectSingleNode(String.Format("DocIcons/ByExtension/Mapping[@Key='{0}']", _fileExtension))

        If _installing Then
            'Create DocIcon entry
            If mapNode Is Nothing Then
                'Create Attributes
                Dim keyAttribute As XmlAttribute = x.CreateAttribute("Key")
                keyAttribute.Value = _fileExtension
                Dim valueAttribute As XmlAttribute = x.CreateAttribute("Value")
                valueAttribute.Value = _imageFilename

                'Create Node
                mapNode = x.CreateElement("Mapping")
                mapNode.Attributes.Append(keyAttribute)
                mapNode.Attributes.Append(valueAttribute)

                Dim byExtensionNode = x.SelectSingleNode("DocIcons/ByExtension")
                Dim NodeAdded As Boolean = False
                If byExtensionNode IsNot Nothing Then
                    'Add in alphabetic order (SelectNodes skips comment nodes, which have no Key attribute)
                    For Each mapping As XmlNode In byExtensionNode.SelectNodes("Mapping")
                        If mapping.Attributes("Key").Value.CompareTo(_fileExtension) > 0 Then
                            byExtensionNode.InsertBefore(mapNode, mapping)
                            NodeAdded = True
                            Exit For
                        End If
                    Next

                    If Not NodeAdded Then byExtensionNode.AppendChild(mapNode)
                    x.Save(DocIconPath)
                End If
            End If
        Else
            'Remove DocIcon entry
            If mapNode IsNot Nothing Then
                Dim byExtensionNode = x.SelectSingleNode("DocIcons/ByExtension")
                If byExtensionNode IsNot Nothing Then
                    byExtensionNode.RemoveChild(mapNode)
                    x.Save(DocIconPath)
                End If
            End If
        End If
    End Sub

End Class

What’s Going On:

Lines 9-73 are just the declarations and logic needed to persist some properties. Again, more information can be found in my last post, but basically I am using the SPJobDefinition’s Properties Hashtable to store my own properties as specified in the constructor. The exception is the DocIconPath property, which just wraps up the logic for getting the full path to DOCICON.XML in the 14 Hive’s TEMPLATE\XML directory using the SPUtility class.

The Execute method beginning in line 86 is what is called when the Timer Job actually runs. I override this method to ensure my custom code gets called instead. My custom code really begins in the UpdateDocIcon method starting at line 90.

In lines 91-94, I load the DOCICON.xml file into an XmlDocument object and attempt to find the mapping node for the appropriate file extension (in this case, pdf).

If this job is installing (running on solution activation), I just check whether the node was found. If so, all done! If not, it’s time to add it. I create the node and set up its attributes in lines 100-108 using standard objects from the System.Xml namespace.

For this to work, the mapping node needs to be added as a child of the ByExtension element, so we find that in line 110. By default the mapping nodes are listed in alphabetical order by extension. Since I’m anal, I use a method in lines 114-120 presented by Steve Goodyear to insert the mapping node in its proper position. If no existing key sorts after it, I add it to the end in line 122, and then I save the file in line 123.

If this job is uninstalling (Running on Solution Deactivation) and the mapping exists, we delete it and save the file in lines 128-134.
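If you want to try the insert-in-order logic without a SharePoint farm, here’s a rough Python translation using xml.etree.ElementTree. It’s a sketch for experimenting with a copy of the XML, not part of the WireBear PDFdocIcon solution; the function name and the sample values are hypothetical. One difference: Python compares strings ordinally while .NET’s CompareTo is culture-sensitive, which should only matter for keys with punctuation or mixed case.

```python
import xml.etree.ElementTree as ET


def update_doc_icon(xml_text: str, extension: str, image: str, installing: bool) -> str:
    """Add (installing=True) or remove the Mapping for extension, like UpdateDocIcon."""
    root = ET.fromstring(xml_text)  # the root element is <DocIcons>
    by_ext = root.find("ByExtension")
    if by_ext is None:
        return xml_text  # nothing to do, as in the VB.NET version
    existing = by_ext.find(f"Mapping[@Key='{extension}']")
    if installing:
        if existing is not None:
            return xml_text  # already mapped: all done!
        new = ET.Element("Mapping", {"Key": extension, "Value": image})
        # Insert before the first Mapping whose Key sorts after ours, else append
        for mapping in by_ext.findall("Mapping"):
            if mapping.get("Key", "") > extension:
                by_ext.insert(list(by_ext).index(mapping), new)
                break
        else:
            by_ext.append(new)
    elif existing is not None:
        by_ext.remove(existing)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree drops comments and the XML declaration when it re-serializes, so don’t write its output over a real DOCICON.XML without checking it first.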

Isn’t that Super Exciting?!?! Hopefully this example helps the concepts from my previous post make some sense. If not, then sadness will fill my soul and flowers will no longer bloom or something.

Automatically Setting Up PDF Icon Mapping in SharePoint 2010

Applies To: SharePoint 2010

Nearly everyone who has ever used SharePoint has had to set up the PDF icon mapping so that PDF documents will have the familiar Adobe logo rather than the blank, unknown icon SharePoint uses by default. This is relatively simple and there are guides for doing this all over the internet (Microsoft’s can be found here).

Here is a very brief summary of the steps that must be performed manually on every server:

  1. Copy the PDF icon picture from Adobe and put it in your 14 Hive (TEMPLATE\IMAGES)
  2. Edit the DOCICON.xml file in your 14 Hive (TEMPLATE\XML) to add a Mapping element for pdf documents pointing to your new icon
  3. Reset IIS

These aren’t super complicated steps but there are some pretty big problems (or at least irritations) with using this approach:

  • Manual changes can often be error-prone, especially for those not familiar with XML
  • The change must be performed on every server
  • The change must be performed whenever a new server is added to the farm
  • The change will have to be redone in the event of disaster recovery

So, like many before me, I thought, surely this can all be automated! So I looked and I found some solutions for SharePoint 2007 and several solutions that only worked for Standalone Servers or for only one server in the farm. These were of help, but still no good for my needs. So, I wrote my own.

You can find it over on CodePlex as WireBear PDFdocIcon. There’s some stuff about its license over there (free for personal and commercial use, etc.) and the basic installation instructions. It’s super easy to set up since it’s just a standard SharePoint Solution that you globally deploy.

The full source code is available on CodePlex, but I’ll be going in depth about how it works over the next few posts. But to summarize, here’s what happens:

  • The Adobe PDF icon file is copied to the 14\TEMPLATE\IMAGES folder using standard resource deployment
  • On Activation and Deactivation a one-time Service Timer Job is run
  • On Activation, the Timer job searches for a mapping for PDF documents within the 14\TEMPLATE\XML\DOCICON.xml file. If not found, it adds one (in alphabetic order) and points it to the icon file
  • An IIS Reset is performed to get the changes activated
  • When Deactivating, the Timer job removes the mapping for PDF documents
So why use this thing?
  • The changes will be reapplied in the event of Disaster Recovery
  • The changes will be applied to new servers as they are added to your farm
  • You don’t have to personally edit the 14 Hive on every server in your farm
  • It makes a special place in your heart of hearts that keeps the beast at bay

In making this, I came across several blog entries that were especially helpful; here are most of them (Thanks!):

I’ve found this to be a helpful approach and I hope you do too.

PDF Search Results Direct Link (Eliminating DispForm.aspx Results) Without an iFilter

Applies To: SharePoint

We are utilizing our search functionality much more in SharePoint and one of the more annoying things we found was how PDF files are treated by default. In the search results, the link goes to the DispForm.aspx for the item rather than directly to the item.

The obvious fix is to install an iFilter. Unfortunately, this isn’t always an option. For us, the performance and crawl delay issues didn’t make up for the benefit of having these documents indexed. Fortunately, I came across this answer by daver306 on SharePoint SE that didn’t get a lot of attention but worked perfectly for me.

I wanted to write it up with some added detail and share my experiences. Not only does this let you link directly to your PDFs in search results without any XSL (and let KnowledgeLake queries open PDFs directly within the KnowledgeLake Viewer), it’s also pretty simple to do.

1. Add PDF as a File Type

Within Central Admin, go to your Search Administration (Manage Service Applications > Search Service). From there click on the File Types link under Crawling on the left:

If pdf is not listed, click the New File Type button and type pdf (no period needed) in the File extension box and click OK:

2. Restart the Search Service

This is a very important step. I originally tried to skip it to spare myself some hassle and ended up having to repeat the crawl below. You will need to go to each server running the SharePoint Server Search service and stop it. You can do this through the command line or the Services panel under Administrative Tools:

Once off on all boxes, just go back through and start it again.

3. Reset Your Index

Back on the Search Administration page within Central Administration you will want to click on the Index Reset link under Crawling on the menu on the left:

Press the Reset Now button. Remember that this should be done at a time when your environment is not under heavy use or when search won’t be needed since search results will not be available until after a full crawl completes.

4. Perform a Full Crawl

If you have a pretty standard search setup, then you probably only have one content source. If not, then you already know how to start the full crawls for each of them. If you’ve just got the one, then from the Search Administration page within Central Administration click on the Content Sources link under Crawling on the menu on the left. Hover over your content source and choose Start Full Crawl in the dropdown menu:

After the crawl completes (this could take hours depending on the size of your farm), things should be working as expected. No more DispForm.aspx links in your search results!