Saturday, August 03, 2013

Converting some PDF files to XHTML with Tika

There must be a better way, but so far this works:

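# Convert each PDF to XHTML (Tika's default output), named after the source file.
# The glob replaces the backticked find, and the quotes keep paths with spaces intact.
for f in ~/Dropbox/.../Slides/Week?/*.pdf; do
  java -jar tika-app-1.4.jar "$f" > "$(basename "$f" .pdf).xhtml"
done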


This writes all the generated files to the current directory, which in my case is ~/Downloads/tika-1.4/tika-app/target. You can then search for a word from the PDFs using plain old grep.
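
For the grep step, something like the following sketch should do. The function name search_slides is made up for illustration, and it assumes the .xhtml files sit together in one directory, as they do after the loop above.

```shell
# search_slides WORD [DIR]
# Hypothetical helper: list the .xhtml files in DIR (default: current
# directory) that contain WORD, ignoring case.
search_slides() {
  word=$1
  dir=${2:-.}
  # -i: case-insensitive match; -l: print only the names of matching files
  grep -il -- "$word" "$dir"/*.xhtml
}
```

Calling `search_slides gradient` from the output directory would list every converted slide deck that mentions "gradient"; drop the -l flag to see the matching lines instead of just the file names.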