Any .pdf experts on the forum?

awemawson:
I have spent some time scanning the rather tatty manuals for my Fanuc Wire Eroder into searchable PDF files, and have got the bulk of the work done, but two aspects have me stumped.

Firstly, the first two pages of one manual are particularly grubby, and were originally on a yellow background. I'd like to lift the text from them and drop it onto a clean page. You can search individual words on the page, so I'm sure it must be possible to grab the text somehow and drop the background, but how? (NB certain lines on these two pages have 'links' to other pages in the document that won't work, as you only have the two pages and not the rest of the document, which is 2 x 36 MB!)
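
For the first question, here is a sketch of the 'clean page' half only, in plain Python with no PDF library, to show how little a text-only PDF page needs. It assumes the text has already been copied or exported out of the FineReader OCR layer as plain lines (that extraction step isn't shown). It uses the built-in Helvetica font and sets each line flush left on a white A4 page, so the original layout and the 'links' are not carried over. clean_page_pdf is a hypothetical name, not part of any tool mentioned in this thread.

```python
# Sketch only: write already-extracted OCR text onto a clean white A4 page.
# Assumptions: the text was exported from the OCR layer beforehand as plain
# lines; built-in Helvetica is used; layout, images and links are not kept.
# clean_page_pdf is a hypothetical helper, not part of any tool in this thread.


def _escape(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def clean_page_pdf(lines, font_size=11, margin=56):
    """Return the bytes of a one-page A4 PDF showing `lines` top to bottom."""
    width, height = 595, 842  # A4 in points, rounded
    leading = font_size * 1.3  # baseline-to-baseline spacing
    ops = ["BT", f"/F1 {font_size} Tf", f"{leading:.1f} TL",
           f"{margin} {height - margin} Td"]
    for line in lines:
        ops.append(f"({_escape(line)}) '")  # ' = move to next line, show text
    ops.append("ET")
    # Plain ASCII assumed; any other character is replaced by '?'.
    stream = "\n".join(ops).encode("ascii", "replace")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>").encode(),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []  # byte offset of each object, needed for the xref table
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

To use it, write the returned bytes to a file opened in binary mode. There are no page breaks, so at the default sizes only about 50 lines fit before text runs off the bottom of the page.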

Secondly, the scanning process uses an OCR program to make the pdf searchable, which is excellent, BUT it also decides to rotate some pages into 'landscape' to get the text the right way up. I want them all in 'portrait'. The pages as scanned are rather varied in size, but roughly A4. I can use a utility to rotate the page, but if I then use another utility to make all the pages A4 sized, it inverts these pages. This seems to happen whatever order I do the conversion in.
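
I don't know how PDFill's tools apply rotation and resizing internally. One plausible explanation for the inversion is that the two steps stack. A page's /Rotate value (clockwise, in multiples of 90 degrees, per the PDF spec) gets 90 added by the rotate utility. The resize utility then looks only at the still-landscape MediaBox and adds its own 90, giving 180 and an upside-down page. Below is a minimal Python sketch of that arithmetic. It assumes each page is described by its MediaBox width/height in points plus /Rotate, and it doesn't read or write real PDF files. The function names are hypothetical.

```python
# Sketch only: models how a page's MediaBox and /Rotate combine.
# Assumption: each page is given as MediaBox width/height in points plus
# its /Rotate value (clockwise, multiple of 90, as in the PDF spec).
# Function names are hypothetical; no real PDF file is read or written.

A4_W, A4_H = 595.28, 841.89  # A4 in points: 210 x 297 mm at 72 pt per inch


def displayed_size(width, height, rotate):
    """Width and height as a viewer shows the page once /Rotate is applied."""
    if rotate % 90:
        raise ValueError("/Rotate must be a multiple of 90")
    if rotate % 180 == 90:  # 90 or 270 degrees: width and height swap
        return height, width
    return width, height


def rotate_to_portrait(width, height, rotate):
    """Return a /Rotate value that displays the page portrait.

    Decides from the *displayed* size, so running it twice is harmless.
    A step that just adds 90 to whatever is there turns 90 into 180,
    i.e. an upside-down page. Always turns clockwise; it does not check
    which way the text actually runs.
    """
    w, h = displayed_size(width, height, rotate)
    if w > h:  # currently shows landscape
        return (rotate + 90) % 360
    return rotate % 360


def fit_to_a4(width, height, rotate):
    """Uniform scale factor that fits the displayed page inside A4 portrait."""
    w, h = displayed_size(width, height, rotate)
    return min(A4_W / w, A4_H / h)
```

For example, rotate_to_portrait(842, 595, 0) gives 90. Calling it again on the result, rotate_to_portrait(842, 595, 90), still gives 90 because the page already displays portrait, whereas a step that blindly adds 90 would give 180. Whether 90 or 270 is right for a given page depends on which way the text runs, which this sketch doesn't check.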

It's frustrating as it's VERY close to what I want, but not quite there  :bang:

The total job is about 500 pages of scanned and OCR'd A4, which I've done, and I've gone through cleaning up grubby finger marks and edge effects. It's just these two aspects baulking me at the moment, if anyone can help.

The scanner is a Plustek Opticbook 3800.
The scanning software is 'Book Pavilion' and the OCR software it uses in the background is 'Finereader Sprint 9', both of which are bundled with the scanner.

The PDF utilities I've been using are PDFill Editor ($20) and its free PDF tools:

http://www.pdfill.com/

Incidentally, I chose this scanner because it can scan up to 2 mm from an edge, so a book can lie on it with half hanging down the front and still scan into the fold, without leaving a blank strip that would lose text in some books.

http://plustek.com/usa/products/opticbook-series/opticbook-3800/

Any help would be appreciated.

lordedmond:
Andrew
I am no expert with PDF, but the SIL uses it a lot, scanning with a ScanSnap scanner into the Acrobat program (not cheap), and it's Acrobat that should do what you want. He uses it for his business, which is regulated by the FSA, so I cannot go into details, but he does run a new Porsche.

Note: I do not mean the free Reader but the full version that generates proper PDFs, including setting passwords and copy limitations.

Stuart

Give it a try here
https://www.acrobat.com/en_us/free-trial-download.html

Edited for link

woodguy:
If I understand you correctly, you wish to extract the text and place it on a new page which would then have a background of your choosing.  If you scan the document to jpg or pdf, you can use a number of methods to extract the editable text, then place it on a new page.

See: http://www.wikihow.com/Convert-Images-and-PDF-Files-to-Editable-Text

Brass_Machine:
Hi Andrew,

I have an expert I can pose these questions to. Will let you know.

Eric

vtsteam:
Maybe Inkscape, Andrew.

https://inkscape.org/en/

(Re-reading your post, I'm not quite sure what you want to do exactly, other than lift text off the yellow pages; I didn't quite get the other part. Inkscape will import PDF docs and convert them to editable SVG, among many other things, and it will also work with text. It will also write PDFs.)