Read Word and PDF files from PHP
Most large web applications have to deal with file manipulation to some degree. Sometimes we need to create configuration files, generate PDF reports, or read an XML feed and parse its data. Since most servers that host PHP are Linux-based, the real question is: how can we read documents created on Windows, such as Word files (.doc or .docx) or PDF files, in order to retrieve data from them? Fortunately, there are a couple of solutions, and here we will present the simplest ones.
For reading Word files we will use antiword. Note that antiword only reads the older binary .doc format, not .docx. Installing antiword on Linux is really straightforward, because it is available in most Linux distribution repositories. On Ubuntu, for example, we can install it with:
apt-get install antiword
After installation, usage is very simple: we load the whole file content into one variable and can then work with it.
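// apt-get installs antiword to /usr/bin; escapeshellarg() guards against spaces and shell metacharacters.
$fileContent = shell_exec('/usr/bin/antiword ' . escapeshellarg($filename));
// shell_exec() returns one string (or null on failure), so split it into lines before looping.
foreach (explode("\n", (string) $fileContent) as $line)
{
    print $line . "\r\n";
}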
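Because antiword cannot read .docx files, here is a minimal sketch of one alternative that needs no external binary at all: a .docx file is a ZIP archive whose main text lives in word/document.xml. The sketch assumes the PHP zip extension (ZipArchive) is installed; readDocx() and docxXmlToText() are hypothetical helper names, and the tag stripping is deliberately crude (it reads only the main body, not headers or footnotes, and ignores table layout).

```php
<?php
// Minimal sketch: read .docx text without antiword.
// Assumption: the PHP zip extension (ZipArchive) is available.

/**
 * Convert the XML of word/document.xml to plain text.
 * Paragraph ends (</w:p>) and line breaks (<w:br/>) become newlines,
 * tabs (<w:tab/>) become "\t", and all remaining tags are stripped.
 */
function docxXmlToText(string $xml): string
{
    $xml = str_replace(['</w:p>', '<w:tab/>', '<w:br/>'], ["\n", "\t", "\n"], $xml);
    $text = strip_tags($xml);
    // Undo XML escaping such as &amp; and &lt;.
    return html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

/** Read the plain text of the main body of a .docx file. */
function readDocx(string $filename): string
{
    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        throw new RuntimeException("Cannot open $filename as a ZIP archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("$filename has no word/document.xml");
    }
    return docxXmlToText($xml);
}
```

For example, echo readDocx('report.docx'); would print the document's paragraphs one per line. Keeping the XML-to-text step in its own function makes it easy to test or to swap for a proper XML parser later.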
The antiword example above prints the whole file content line by line. PDF files can be read in much the same way as Word documents; for that purpose we will use the Xpdf package. Installation is a bit more complicated, but you can download prebuilt binaries for Linux or macOS. Once Xpdf is set up, usage is much the same as with antiword:
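// The trailing "-" makes pdftotext write the extracted text to standard output instead of a file.
$fileContent = shell_exec('/usr/local/bin/pdftotext ' . escapeshellarg($filename) . ' -');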
After that we can work with the content however we like. If you know an easier way to read Word or PDF files, please leave a comment.
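Since the two converters differ only in their command line, a small helper can pick one from the file extension and fail loudly instead of silently returning nothing. This is a minimal sketch, not the author's code: buildExtractCommand(), extractText() and splitPdfPages() are hypothetical names, the binary paths are assumptions you should adjust, and the page splitting relies on pdftotext's default behavior of separating pages with form-feed characters.

```php
<?php
// Minimal sketch: choose a converter by file extension.
// Assumed install locations; adjust them to your system.
const ANTIWORD  = '/usr/bin/antiword';
const PDFTOTEXT = '/usr/local/bin/pdftotext';

/** Build the shell command for a file, or return null for unsupported types. */
function buildExtractCommand(string $filename): ?string
{
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    switch ($ext) {
        case 'doc':
            return ANTIWORD . ' ' . escapeshellarg($filename);
        case 'pdf':
            // The trailing "-" sends pdftotext's output to standard output.
            return PDFTOTEXT . ' ' . escapeshellarg($filename) . ' -';
        default:
            return null;
    }
}

/** Run the matching converter and return the extracted text. */
function extractText(string $filename): string
{
    $command = buildExtractCommand($filename);
    if ($command === null) {
        throw new InvalidArgumentException("Unsupported file type: $filename");
    }
    $output = shell_exec($command);
    // shell_exec() returns null (or false) when no output could be read.
    if (!is_string($output)) {
        throw new RuntimeException("Command failed: $command");
    }
    return $output;
}

/** Split pdftotext output into pages on form-feed ("\f") characters. */
function splitPdfPages(string $text): array
{
    $pages = explode("\f", $text);
    // The output normally ends with a form feed, leaving an empty last element.
    if (end($pages) === '') {
        array_pop($pages);
    }
    return $pages;
}
```

For example, splitPdfPages(extractText('report.pdf')) would return one array element per page, while an unsupported extension raises an exception rather than producing an empty string.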
Hi, I am Vladimir Popov, web developer and CEO of Webmarket, an Internet company located in Belgrade. I am passionate about web technologies and about creating new features on the Internet to make it a better place to live in.
AntiWord seems to have stopped development 7 years ago and doesn’t handle Word 2007 or 2010 documents.
You may find PHPDocX (http://www.phpdocx.com) interesting; it lets you manipulate Word documents (.docx) with PHP.
Best regards,
Eduardo