Convert Pdf to Word in Java Example

Required Jars

  1. itextpdf-5.4.4
  2. xmlbeans-xpath-2.3.0
  3. xmlbeans-2.6.0
  4. poi-3.9
  5. dom4j-1.6.1
  6. poi-ooxml-schemas-3.9
  7. poi-ooxml-3.9
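
Apache POI does not support mixing jars from different POI releases, and a mismatch typically surfaces at runtime as a NoSuchMethodError. The following is a minimal sketch, using only the JDK, of how a list of jar names could be checked for mixed POI versions. The class name PoiJarVersionCheck, the method poiVersions and the jar list in main are illustrative assumptions, not part of POI or iText.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PoiJarVersionCheck {

    // Matches names such as "poi-3.9", "poi-ooxml-3.7" or "poi-ooxml-schemas-3.7.jar"
    // and captures the numeric version part.
    private static final Pattern POI_JAR =
            Pattern.compile("^poi(?:-[a-z]+)*-(\\d+(?:\\.\\d+)*)(?:\\.jar)?$");

    /** Returns the distinct versions found among the POI jars in the given names. */
    public static Set<String> poiVersions(List<String> jarNames) {
        Set<String> versions = new TreeSet<>();
        for (String name : jarNames) {
            Matcher m = POI_JAR.matcher(name.trim());
            if (m.matches()) {
                versions.add(m.group(1));
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        // Hypothetical jar list with deliberately mixed POI versions.
        List<String> jars = Arrays.asList(
                "itextpdf-5.4.4", "xmlbeans-2.6.0", "poi-3.9",
                "poi-ooxml-schemas-3.7", "poi-ooxml-3.7");
        Set<String> versions = poiVersions(jars);
        if (versions.size() > 1) {
            System.out.println("Mixed POI versions: " + versions);
        } else {
            System.out.println("POI versions are consistent: " + versions);
        }
    }
}
```

Non-POI jars such as itextpdf and xmlbeans are ignored by the pattern, so only the poi, poi-ooxml and poi-ooxml-schemas entries are compared.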

Java Program to Convert PDF to Word

package com.ngdeveloper;

import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

public class ConvertPdf2Word {

    public static void main(String[] args) throws IOException {
        System.out.println("Document conversion started");
        String pdf = "D:\\javadomain.pdf";
        String docx = "D:\\javadomain.docx";

        XWPFDocument doc = new XWPFDocument();
        PdfReader reader = new PdfReader(pdf);
        try {
            PdfReaderContentParser parser = new PdfReaderContentParser(reader);
            int pageCount = reader.getNumberOfPages();
            // iText page numbers start at 1.
            for (int i = 1; i <= pageCount; i++) {
                // Extract the plain text of page i. Images, tables and styling are not carried over.
                TextExtractionStrategy strategy = parser.processContent(i,
                        new SimpleTextExtractionStrategy());
                String text = strategy.getResultantText();

                // Write each PDF page as one paragraph containing a single run.
                XWPFParagraph p = doc.createParagraph();
                XWPFRun run = p.createRun();
                run.setText(text);
                // Break between pages only, so the document does not end with an empty page.
                if (i < pageCount) {
                    run.addBreak(BreakType.PAGE);
                }
            }
        } finally {
            // Release the PDF file even if extraction fails.
            reader.close();
        }

        // try-with-resources closes the output stream even if write() throws.
        try (FileOutputStream out = new FileOutputStream(docx)) {
            doc.write(out);
        }
        System.out.println("Document converted successfully");
    }
}
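
The program above writes the whole text of a page with a single setText call. Newline characters inside a run's text may not show up as line breaks in Word, so one possible refinement is to split the page text into lines first and write them one at a time. The following is a minimal sketch of that splitting step using only the JDK; the class name PageTextLines and the method splitLines are illustrative assumptions, and the POI calls appear only in comments.

```java
import java.util.ArrayList;
import java.util.List;

public class PageTextLines {

    /** Splits extracted page text on \r\n, \r or \n and drops trailing blank lines. */
    public static List<String> splitLines(String pageText) {
        List<String> lines = new ArrayList<>();
        if (pageText == null || pageText.isEmpty()) {
            return lines;
        }
        // The -1 limit keeps empty lines in the middle of the text.
        for (String line : pageText.split("\\r\\n|\\r|\\n", -1)) {
            lines.add(line);
        }
        // Remove blank lines at the end so the page does not finish with empty rows.
        while (!lines.isEmpty() && lines.get(lines.size() - 1).trim().isEmpty()) {
            lines.remove(lines.size() - 1);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = splitLines("First line\nSecond line\r\nThird line\n");
        for (int i = 0; i < lines.size(); i++) {
            // In the converter, this is where run.setText(lines.get(i)) could go,
            // followed by run.addBreak() for every line except the last (assumed usage).
            System.out.println((i + 1) + ": " + lines.get(i));
        }
    }
}
```

This only restores line breaks. Tables, images and fonts from the PDF are still not reproduced, because the extraction strategy returns plain text.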

Input: [pdf file]

Output: [word file]

12 comments

  • dhanush

    Can you please send me the code to convert from Word to PDF using iText?

    • Naveen

      The source code is provided in the post itself. Are you facing any issue? If so, please post the errors here so we can look into them.

  • Dhanush

    Can you please send me the code to convert from doc to pdf…

    • Naveen

      The source code is provided in the post itself. Are you facing any issue? If so, please post the errors here so we can look into them.

  • poonam

    I am unable to keep the exact formatting when converting a PDF to doc or docx if the PDF contains tables.
    The structure gets distorted. Can you please help?

  • Docx4j is the only open source API that is efficient at converting docx to PDF without compromising the format and styling, but the catch is that it does not handle spaces and tabs in documents, which leaves the problem unsolved. I have been doing a lot of research in this area and have not been able to find a single Java API that converts doc or docx to PDF without compromising the format and styling.

  • I'm not a developer; I always use this free online PDF to Word converter (http://www.online-code.net/pdf-to-word.html) to convert PDF to Word online.

  • Bavaraj

    I am using your code but getting the following error. Can you please help me?
    Exception in thread "main" java.lang.NoSuchMethodError: org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR.getPictList()Ljava/util/List;
    at org.apache.poi.xwpf.usermodel.XWPFRun.<init>(XWPFRun.java:75)
    at org.apache.poi.xwpf.usermodel.XWPFParagraph.createRun(XWPFParagraph.java:266)
    at com.tcs.ConvertPdf2Word.main(ConvertPdf2Word.java:27)

  • kaminee patil

    Please send code for converting an Arabic-language PDF to Word.

  • Nice post, useful for me.

  • devpro

    This is not valid code; it will only do a simple word copy, and if your PDF has an image or a table it will not work.
