Convert Pdf to Word in Java Example

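This example reads the text of each page of a PDF with iText and writes it to a .docx file with Apache POI. Only plain text is carried over; tables, images and styling are not preserved.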

Required Jars:
1. itextpdf-5.4.4
2. xmlbeans-xpath-2.3.0
3. xmlbeans-2.6.0
4. poi-3.9
5. dom4j-1.6.1
6. poi-ooxml-schemas-3.9
7. poi-ooxml-3.9 (all three POI jars must come from the same release)
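
Mixing Apache POI jars from different releases is a common cause of NoSuchMethodError at runtime. The following is a minimal sketch, not part of the original program, for checking which jar a class was actually loaded from. The class name JarLocator and the method locate are hypothetical; the POI classes are looked up by name, so the sketch compiles even without POI on the classpath.

```java
import java.security.CodeSource;

// Hypothetical helper: reports which jar (or directory) a class was loaded from.
class JarLocator {

	// Returns the code source location of the named class, or a short explanation.
	static String locate(String className) {
		try {
			// 'false' = do not run static initializers, we only need the Class object
			Class<?> c = Class.forName(className, false, JarLocator.class.getClassLoader());
			CodeSource src = c.getProtectionDomain().getCodeSource();
			// Classes that ship with the JDK usually have no code source
			return src == null ? "(JDK or bootstrap class)" : src.getLocation().toString();
		} catch (ClassNotFoundException | LinkageError e) {
			return "(not on classpath)";
		}
	}

	public static void main(String[] args) {
		// One class from each of poi, poi-ooxml and poi-ooxml-schemas
		String[] names = {
			"org.apache.poi.POIDocument",
			"org.apache.poi.xwpf.usermodel.XWPFRun",
			"org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR"
		};
		for (String name : names) {
			System.out.println(name + " -> " + locate(name));
		}
	}
}
```

If the three printed locations point at jars with different version numbers, or at an unexpected copy elsewhere on the classpath, aligning them is a reasonable first step.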

Java Program:

package in.javadomain;

import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

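public class ConvertPdf2Word {

	public static void main(String[] args) throws IOException {
		System.out.println("Document conversion started");
		XWPFDocument doc = new XWPFDocument();
		String pdf = "D:\\javadomain.pdf";
		PdfReader reader = new PdfReader(pdf);
		try {
			PdfReaderContentParser parser = new PdfReaderContentParser(reader);
			int pageCount = reader.getNumberOfPages();
			for (int i = 1; i <= pageCount; i++) {
				// Extract the plain text of page i; layout, tables and images are lost
				TextExtractionStrategy strategy = parser.processContent(i,
						new SimpleTextExtractionStrategy());
				String text = strategy.getResultantText();
				XWPFParagraph p = doc.createParagraph();
				// setText() does not turn '\n' into a line break, so write one run per line
				for (String line : text.split("\n")) {
					XWPFRun run = p.createRun();
					run.setText(line);
					run.addBreak();
				}
				// Start a new Word page after every PDF page except the last
				if (i < pageCount) {
					p.createRun().addBreak(BreakType.PAGE);
				}
			}
		} finally {
			// Release the PDF even if extraction fails
			reader.close();
		}
		FileOutputStream out = new FileOutputStream("D:\\javadomain.docx");
		try {
			doc.write(out);
		} finally {
			out.close();
		}
		System.out.println("Document converted successfully");
	}
}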
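
The program above hardcodes both file paths. As a small sketch under that observation, the output path could be derived from the input path so that the file name is given only once. DocxPath and toDocxPath are hypothetical names, not part of iText or POI.

```java
// Hypothetical helper: derives the output .docx path from the input .pdf path.
class DocxPath {

	static String toDocxPath(String pdfPath) {
		// Case-insensitive check of the last four characters, so "report.PDF" also matches
		if (pdfPath.regionMatches(true, pdfPath.length() - 4, ".pdf", 0, 4)) {
			return pdfPath.substring(0, pdfPath.length() - 4) + ".docx";
		}
		// No .pdf extension: just append .docx
		return pdfPath + ".docx";
	}

	public static void main(String[] args) {
		System.out.println(toDocxPath("D:\\javadomain.pdf"));
	}
}
```

The result can then be passed to the FileOutputStream constructor in place of the literal "D:\\javadomain.docx".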

Input: [pdf file]

Output: [word file]

11 thoughts on “Convert Pdf to Word in Java Example”

  1. I am unable to keep the exact format when converting a pdf to doc or docx if the pdf contains tables.
    The structure gets distorted. Can you please help?

  2. Docx4j is the only open source API that converts docx to pdf efficiently without compromising format and styling. The catch is that it does not handle spaces and tabs in documents, which leaves the problem unsolved. I have done a lot of research in this area and have not found a single Java API that converts doc or docx to pdf without compromising format and styling.

  3. I am using your code but getting the following error. Can you please help me?
    Exception in thread "main" java.lang.NoSuchMethodError: org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR.getPictList()Ljava/util/List;
    at org.apache.poi.xwpf.usermodel.XWPFRun.<init>(XWPFRun.java:75)
    at org.apache.poi.xwpf.usermodel.XWPFParagraph.createRun(XWPFParagraph.java:266)
    at com.tcs.ConvertPdf2Word.main(ConvertPdf2Word.java:27)