Read Password Protected PDF through Apache PDFBox - Bug Reaper

Friday 20 April 2018

Read Password Protected PDF through Apache PDFBox

Apache PDFBox can parse PDF files and extract their text.
Add the Apache PDFBox dependency to pom.xml:
        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <version>2.0.9</version>
        </dependency>



If you get an error, for example a missing Bouncy Castle class when working with encrypted PDFs, add the following dependencies as well:

        <dependency>
            <groupId>org.bouncycastle</groupId>
            <artifactId>bcpkix-jdk15on</artifactId>
            <version>1.54</version>
        </dependency>
        <dependency>
            <groupId>org.bouncycastle</groupId>
            <artifactId>bcprov-jdk15on</artifactId>
            <version>1.54</version>
        </dependency>

        <dependency>
            <groupId>org.bouncycastle</groupId>
            <artifactId>bcmail-jdk15on</artifactId>
            <version>1.54</version>
        </dependency>

Run the sample program below to read an unprotected PDF:


 package com.neeraj.test.neeraj;

 import java.io.File;
 import java.io.IOException;

 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.apache.pdfbox.text.PDFTextStripper;

 public class PDFParser {
   public static void main(String[] args) throws IOException {
     // try-with-resources closes the document when the block ends
     try (PDDocument document = PDDocument.load(new File("C:\\Users\\bakhtani\\Downloads\\ticket_3057779845.pdf"))) {
       // This example only handles unprotected PDFs
       if (!document.isEncrypted()) {
         PDFTextStripper tStripper = new PDFTextStripper();
         String pdfFileInText = tStripper.getText(document);
         // Split the extracted text into lines (handles \n and \r\n)
         String[] lines = pdfFileInText.split("\\r?\\n");
         for (String line : lines) {
           System.out.println(line);
           // Flag any line that contains the search term
           if (line.contains("Neeraj")) {
             System.out.println("Neeraj is there");
           }
         }
       }
     }
   }
 }
Output


E-Ticket
Paytm Booking ID : 4134491696
Booked on: 14 Nov 2017 09:35 PM
Customer Care
7053111905
Paytm Flight Support
7053111905
24X7 Care
paytm.com/care
*Always carry ticket and your ID proof while travelling
One97 Communications Limited, B 121, Sector 5, Noida - 201301
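
The search step in the program above works on the plain string returned by getText, so it can be tried out without a PDF at all. The following is a minimal sketch of that step on its own; the class name LineSearch and the method findLines are made up for this illustration, and the sample string is a stand-in built from the first lines of the output above.

```java
import java.util.ArrayList;
import java.util.List;

public class LineSearch {

    // Returns every line of the extracted text that contains the keyword.
    // (Hypothetical helper, not part of PDFBox.)
    public static List<String> findLines(String extractedText, String keyword) {
        List<String> matches = new ArrayList<>();
        // Same split as in the program above: handles both \n and \r\n
        for (String line : extractedText.split("\\r?\\n")) {
            if (line.contains(keyword)) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by PDFTextStripper.getText(document)
        String sample = "E-Ticket\r\nPaytm Booking ID : 4134491696\nBooked on: 14 Nov 2017 09:35 PM\n";
        for (String line : findLines(sample, "Booking")) {
            System.out.println(line); // prints "Paytm Booking ID : 4134491696"
        }
    }
}
```

In the real program, the string passed to findLines would be the value of pdfFileInText.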


How to Read a Password-Encrypted PDF (Protected PDF)

Pass the password as the second argument to PDDocument.load:



 package com.neeraj.test.neeraj;

 import java.io.File;
 import java.io.IOException;

 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException;
 import org.apache.pdfbox.text.PDFTextStripper;

 public class PDFParser {
   public static void main(String[] args) {
     // The second argument to load() is the password used to decrypt the PDF
     try (PDDocument document = PDDocument.load(new File("C:\\Users\\bakhtani\\Downloads\\EAadhaar_285091794512_07122017141237_873152.pdf"), "password")) {
       // Only takes effect if the document is saved later: the saved copy is written without encryption
       document.setAllSecurityToBeRemoved(true);
       PDFTextStripper reader = new PDFTextStripper();
       String pageText = reader.getText(document);
       System.out.println(pageText);
     } catch (InvalidPasswordException e) {
       // Thrown by load() when the supplied password is wrong; must be caught before IOException
       System.err.println("Wrong password for pdf document - " + e);
     } catch (IOException e) {
       System.err.println("Exception while trying to read pdf document - " + e);
     }
   }
 }
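
The program above hard-codes the password as a string literal. As a rough sketch only, and not something from the original program, the password could instead be taken from the first command-line argument or from an environment variable. The class PdfPassword, the method resolve and the variable name PDF_PASSWORD are all hypothetical names chosen for this illustration.

```java
import java.util.Map;

public class PdfPassword {

    // Hypothetical environment variable name, chosen for this sketch only
    static final String ENV_NAME = "PDF_PASSWORD";

    // Prefers the first command-line argument, then the environment variable.
    // Returns an empty string when neither is set.
    public static String resolve(String[] args, Map<String, String> env) {
        if (args.length > 0 && !args[0].isEmpty()) {
            return args[0];
        }
        String fromEnv = env.get(ENV_NAME);
        return fromEnv != null ? fromEnv : "";
    }

    public static void main(String[] args) {
        String password = resolve(args, System.getenv());
        // The result would replace the "password" literal in
        // PDDocument.load(new File(path), password)
        System.out.println(password.isEmpty() ? "No password supplied" : "Password supplied");
    }
}
```

The environment map is passed in as a parameter rather than read inside resolve, so the lookup order can be checked without changing the real environment.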
