Unfortunately, PDFs are not static like that. The timestamp alone is enough to perturb the hash, but there are other factors. Apache FOP might render source objects to PDF pages with different PDF primitive structures across releases. We use Ghostscript to compress the final PDF, and that can introduce differences in float rounding, formatting, and object ordering across releases.
Ultimately, what matters is that the content looks the same to the user's eyeballs when all is said and done.
Ghostscript could crunch that for you. Once you're in the directory containing the input PDFs, the specific command is:
gswin64c -sDEVICE=txtwrite -o output.txt input.pdf
There's also the pdftotext tool from Xpdf, and a pip module called pdfminer.
Whichever flavor you prefer!
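Building on the txtwrite command above, here is a minimal Python sketch that compares two PDFs by their extracted text rather than their bytes, so timestamps and object ordering drop out. It assumes the Ghostscript executable is called `gs` (pass `gswin64c` on Windows), and the function names are hypothetical. It only compares text: a changed image or layout that leaves the text alone would not be caught, so this is a weaker check than "looks the same to the user's eyeballs".

```python
import hashlib
import re
import subprocess

# Assumption: Ghostscript is on PATH as "gs" (usually gswin64c on Windows).
GS = "gs"


def txtwrite_cmd(pdf: str, gs: str = GS) -> list[str]:
    """Ghostscript command that writes the PDF's text to stdout ("-o -")."""
    return [gs, "-q", "-dSAFER", "-sDEVICE=txtwrite", "-o", "-", pdf]


def normalize(text: str) -> str:
    """Collapse whitespace runs so spacing jitter between releases is ignored."""
    return re.sub(r"\s+", " ", text).strip()


def text_digest(text: str) -> str:
    """SHA-256 of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def extract_text(pdf: str, gs: str = GS) -> str:
    result = subprocess.run(txtwrite_cmd(pdf, gs), capture_output=True,
                            check=True)
    return result.stdout.decode("utf-8", errors="replace")


def same_visible_text(a: str, b: str, gs: str = GS) -> bool:
    """True if both PDFs yield the same normalized text."""
    return text_digest(extract_text(a, gs)) == text_digest(extract_text(b, gs))
```

Usage would be something like `same_visible_text("old/report.pdf", "new/report.pdf", gs="gswin64c")`.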
I would strongly advise against rasterising a vector image before converting it to PS/PDF, especially when you are concerned about quality. These are vector formats (actually, they are programming languages, but that is a story for another time). Converting from SVG to PS/PDF has been done before: https://www.ghostscript.com/download/gpdldnld.html
Aye, it took me a couple of hours. Eventually I found a combination of ImageMagick (or maybe Ghostscript) options that seemed to produce Lulu-compatible PDFs, but, much to my dismay, I didn't write it down; when I tried to use Lulu again, I couldn't get it to accept the PDF at all.
Local printers, as mentioned above, are probably more reliable, and that's what I'll try next.
I would use ghostscript to do something like this.
https://stackoverflow.com/questions/30284327/reverse-white-and-black-colors-in-a-pdf
You can get GS here: https://www.ghostscript.com/download/gsdnld.html
If you're looking to do something like we do at work (where the SQL server/application tells the PDF printer where to put the files), then you're going to need something more than CutePDF. It's not going to come cheap at all, and it's going to require development to integrate.
If you're just looking for PDF printing in general, Microsoft PDF Printer is the only reasonable option.
Also, I might add that Ghostscript is clear about how licensing needs to be handled, and if it's not clear to you, have legal or a lawyer review it. https://www.ghostscript.com/license.html
I'm not an expert on this... my days of trying to manipulate PDF contents are far far gone. But, maybe look into https://www.ghostscript.com/? While it's mostly for PS, it used to have functionality for PDFs too.
You have been provided with RegEx-centric approaches, which is OK; nonetheless, I feel this is where you have to strike a balance between what you want and what works better. IMO the balance is achieved with a mixture of RegEx and the scripting language itself:
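numbers := "", p := 1
; "s" lets "." cross newlines, "U" makes every quantifier lazy
while p := RegExMatch(Clipboard, "sU)(5\d+-00[1-3]).+> (needs|new)", match, p)
{
    ; keep only the numbers flagged "needs"
    if (match2 = "needs")
        numbers .= match1 "`n"
    ; resume the search right after the current match
    p += StrLen(match)
}
MsgBox % RTrim(numbers, "`n")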
Having said that, why use the Clipboard? That means you need to open the file, select, copy, and then trigger the script. I'd use something like Ghostscript to convert the PDF into text, or pdftotext (part of XpdfReader) if you find Ghostscript too cumbersome.
Of course, those are just my 2c; also, the code above is untested. Good luck!
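For comparison, here is a rough Python sketch of the same regex-plus-script approach, fed from pdftotext instead of the Clipboard. It assumes Xpdf's or Poppler's `pdftotext` is on PATH (a `-` output file sends the text to stdout), and the function names are made up for illustration. AHK's `s` option maps to `re.DOTALL`, and its `U` (ungreedy) option is emulated with the lazy `.+?`.

```python
import re
import subprocess

# Same pattern as the AutoHotkey version above.
PATTERN = re.compile(r"(5\d+-00[1-3]).+?> (needs|new)", re.DOTALL)


def numbers_that_need(text: str) -> list[str]:
    """Return every number whose entry is followed by '> needs'.

    finditer resumes after each match, like 'p += StrLen(match)' in AHK.
    """
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(2) == "needs"]


def pdf_to_text(pdf: str) -> str:
    """Assumes pdftotext is on PATH; '-' writes the text to stdout."""
    result = subprocess.run(["pdftotext", pdf, "-"], capture_output=True,
                            check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

Something like `print("\n".join(numbers_that_need(pdf_to_text("report.pdf"))))` would then replace the MsgBox.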
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/$quality -dNOPAUSE -dQUIET -dBATCH -sOutputFile=$outputfile $inputfile
;)
where $quality
is one of the values listed here.
(And sorry to all non-Linux users, who probably can't do anything with this.)
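A hedged Python sketch of the same command, with `$quality` validated against the `-dPDFSETTINGS` presets Ghostscript documents (`screen`, `ebook`, `printer`, `prepress`, `default`). Passing the arguments as a list also sidesteps the shell quoting problem the unquoted `$outputfile` would hit with spaces in the name. The function names and the `gs` executable name are assumptions.

```python
import subprocess

# Presets accepted by -dPDFSETTINGS.
PRESETS = {"screen", "ebook", "printer", "prepress", "default"}


def compress_cmd(src: str, dst: str, quality: str = "ebook",
                 gs: str = "gs") -> list[str]:
    """Build the pdfwrite command from the post, with a checked preset."""
    if quality not in PRESETS:
        raise ValueError(f"unknown preset {quality!r}; expected one of {sorted(PRESETS)}")
    return [gs, "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
            f"-dPDFSETTINGS=/{quality}", "-dNOPAUSE", "-dQUIET", "-dBATCH",
            f"-sOutputFile={dst}", src]


def compress(src: str, dst: str, quality: str = "ebook", gs: str = "gs") -> None:
    # No shell involved, so spaces in paths need no quoting.
    subprocess.run(compress_cmd(src, dst, quality, gs), check=True)
```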
doc-view requires Ghostscript to be installed; can you check whether you have it installed?
You can also give pdf-tools a try. I use it on a Mac, set up with pdf-tools-install
and Nix.
I've had good results with the PDFTools API as well as GhostScript.
GhostScript is especially good for the whole text-extraction thing.
Here's a quick-and-dirty example: https://stackoverflow.com/questions/6187250/pdf-text-extraction
And, of course, you can find more at https://www.ghostscript.com/
This worked (though on Windows you also need Ghostscript https://www.ghostscript.com/releases/gsdnld.html)
The only part that didn't work was the %3d part; if that part is removed, it just exports as ...1, ...2, ...3, etc. Not a huge deal, and some quick research suggests the process would have to be converted to a batch file with some code to make it work, something I personally am not willing to do lol
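The thread doesn't show the full command, so this assumes it is a Ghostscript raster export with a `%03d` page counter in the output name. Inside a .bat file, cmd.exe treats `%` specially and it has to be written as `%%03d`. One way around that is to skip cmd.exe entirely. This minimal Python sketch passes the arguments as a list, so Ghostscript, not the shell, expands the counter. The names and defaults are hypothetical.

```python
import subprocess


def raster_cmd(pdf: str, prefix: str = "page", dpi: int = 150,
               gs: str = "gswin64c") -> list[str]:
    # "%03d" reaches Ghostscript untouched because no shell parses this list;
    # Ghostscript replaces it with the zero-padded page number.
    return [gs, "-dSAFER", "-dBATCH", "-dNOPAUSE", "-sDEVICE=png16m",
            f"-r{dpi}", f"-sOutputFile={prefix}-%03d.png", pdf]


def rasterize(pdf: str, prefix: str = "page", dpi: int = 150,
              gs: str = "gswin64c") -> None:
    subprocess.run(raster_cmd(pdf, prefix, dpi, gs), check=True)
```

With the defaults, the pages come out as page-001.png, page-002.png, and so on.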
I use ghostscript for that:
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile="small.pdf" big.pdf
There may very well be a better set of flags for this, but it's what I've been using for a while, and it's worked well for me.
You can reduce the size of your pdf by using ghostscript; example:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=reduced_file.pdf file_to_reduce.pdf
There are a lot of options, especially concerning the output format.
The man page is very complete.
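pdfwrite rebuilds the whole file, and in my experience an already-optimized PDF can come out larger rather than smaller. A small hedge against that is to keep whichever file is smaller. This Python sketch does so using the command above. The function names are illustrative, and `gs` is assumed to be on PATH.

```python
import os
import shutil
import subprocess
import tempfile


def reduce_cmd(src: str, dst: str, gs: str = "gs") -> list[str]:
    return [gs, "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite",
            f"-sOutputFile={dst}", src]


def pick_smaller(original: str, candidate: str) -> str:
    """Return the candidate only if it is strictly smaller than the original."""
    if os.path.getsize(candidate) < os.path.getsize(original):
        return candidate
    return original


def reduce_pdf(src: str, dst: str, gs: str = "gs") -> str:
    with tempfile.TemporaryDirectory() as tmp:
        trial = os.path.join(tmp, "reduced.pdf")
        subprocess.run(reduce_cmd(src, trial, gs), check=True)
        shutil.copyfile(pick_smaller(src, trial), dst)
    return dst
```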
And windows as well as macOS. Ghostscript is available for pretty much any platform and OS.
Extract pages 2-10 from a PDF:
gs -dBATCH -dNOPAUSE -dSAFER -sDEVICE=pdfwrite -dFirstPage=2 -dLastPage=10 -sOutputFile=output.pdf input.pdf
Combine two PDFs into one:
gs -dBATCH -dNOPAUSE -dSAFER -sDEVICE=pdfwrite -sOutputFile=combined.pdf first.pdf second.pdf
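For scripting the two recipes above, here is a small Python sketch that builds the same argument lists. It adds basic checks on the page range and the input count. The helper names are made up, and `gs` is assumed to be on PATH.

```python
import subprocess

BASE = ["-dBATCH", "-dNOPAUSE", "-dSAFER", "-sDEVICE=pdfwrite"]


def extract_pages_cmd(src: str, dst: str, first: int, last: int,
                      gs: str = "gs") -> list[str]:
    """Pages first..last (1-based, inclusive) of src into dst."""
    if not 1 <= first <= last:
        raise ValueError("need 1 <= first <= last")
    return [gs, *BASE, f"-dFirstPage={first}", f"-dLastPage={last}",
            f"-sOutputFile={dst}", src]


def combine_cmd(dst: str, *sources: str, gs: str = "gs") -> list[str]:
    """Concatenate the source PDFs, in order, into dst."""
    if not sources:
        raise ValueError("need at least one input PDF")
    return [gs, *BASE, f"-sOutputFile={dst}", *sources]


def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)
```

For example, `run(combine_cmd("combined.pdf", "first.pdf", "second.pdf"))`.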
Info here (apparently OS X can render PS 3 fonts? I’ll have to experiment). I guess one of the reasons Type 3 was abandoned is that it doesn’t support “hinting”? And GhostScript looks promising as a font renderer.
A variable and a file are different things. Ghostscript expects the name of a file to write to, it doesn't know what a PowerShell variable is.
However, according to How to Use Ghostscript:
"Ghostscript also accepts the special filename '-' which indicates the output should be written to standard output (the command shell)."
So something like
gswin64c.exe -dQUIET -o - -sDEVICE=eps2write "C:\Path\inputfile.pdf"
may work.
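The same idea expressed in Python, as a minimal sketch: write the device output to stdout with `-o -` and capture it into a variable instead of a file. The `gswin64c.exe` name comes from the command above. The function names are hypothetical. Because EPS describes a single page, this assumes a one-page input.

```python
import subprocess


def eps_cmd(pdf: str, gs: str = "gswin64c.exe") -> list[str]:
    # "-o -" sends the EPS to stdout; -dQUIET keeps banner text out of it.
    return [gs, "-dQUIET", "-o", "-", "-sDEVICE=eps2write", pdf]


def pdf_to_eps_bytes(pdf: str, gs: str = "gswin64c.exe") -> bytes:
    """Return the EPS data in memory rather than writing a file."""
    result = subprocess.run(eps_cmd(pdf, gs), capture_output=True, check=True)
    return result.stdout
```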
I have Acrobat Pro with Photoshop, so I used that.
https://helpx.adobe.com/acrobat/using/pdf-layers.html#edit_layer_properties_acrobat_pro
I'm sure you could do the same thing on the command line with Ghostscript, but I don't personally know how.
This is a bit of a tricky solution, but it will allow you to batch crop all the pages without having to do it manually:
I would first scan in all the pages, keeping your originals aligned to a fixed position (for example, top left corner), while allowing the printer to add the unwanted white space to the remainder of each page. I assume these will all be saved into a single PDF file.
Next, convert your file from PDF to TIF. Install Ghostscript (link below) and use the following command to do the conversion (change your paths and filenames as needed). This command will convert at exactly 300dpi, but you can change that depending on your needs:
https://www.ghostscript.com/download/gsdnld.html
c:\progra~2\gs\gs9.27\bin\gswin32c -dNOPAUSE -dBATCH -dSAFER -r300 -sDEVICE=tiff24nc -sOutputFile=outputfile.tif -c save pop -f inputfile.pdf
Next, install Irfanview (link below).
Open Irfanview and go to File > Batch Conversion. Click "Use Advanced Options", then click the Advanced button. Select CROP, then input the crop area: Starting at the top left corner of the page, the X value refers to the horizontal axis, with 1 being the far left position. The Y value refers to the vertical axis, with 1 being the topmost position. So, if our desired page size is a 2 inch square at the top left corner of a 300dpi page, our X-Pos=1 with Width=600 and Y-Pos=1 with Height=600. Leave all the other options alone and click OK.
Back in the Batch Conversion window, just specify the TIF file created in the first step, then specify an output file with a new filename. Click Start Batch when you're ready, and Irfanview will crop all pages of your TIF file in a single step.
Check the new file and adjust the X-Pos and Y-Pos values above as needed (use a higher X-Pos number to shift the crop area to the right, or a higher Y-Pos number to shift the crop area further down).
Good luck!
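If you have several scanned PDFs, the Ghostscript step can be looped. This Python sketch rebuilds the TIFF command from the post, keeping `-c save pop -f` and the gs9.27 path exactly as given (adjust both for your install), and converts every PDF in a folder to a multi-page TIFF next to it. The function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Path copied from the post; change it to match your Ghostscript install.
DEFAULT_GS = r"c:\progra~2\gs\gs9.27\bin\gswin32c"


def tiff_cmd(pdf: str, tif: str, dpi: int = 300,
             gs: str = DEFAULT_GS) -> list[str]:
    """24-bit uncompressed TIFF (tiff24nc), all pages in one file."""
    return [gs, "-dNOPAUSE", "-dBATCH", "-dSAFER", f"-r{dpi}",
            "-sDEVICE=tiff24nc", f"-sOutputFile={tif}",
            "-c", "save", "pop", "-f", pdf]


def convert_folder(folder: str, dpi: int = 300, gs: str = DEFAULT_GS) -> None:
    """Write name.tif next to every name.pdf in folder."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        tif = pdf.with_suffix(".tif")
        subprocess.run(tiff_cmd(str(pdf), str(tif), dpi, gs), check=True)
```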
First, convert (rasterize) your PDF file from PDF to TIF. Install Ghostscript (link below) and use the following command line entry to do the conversion (change your paths and filenames as needed). The following command will rasterize at 203dpi for your Zebra printer, but you can change that as needed:
https://www.ghostscript.com/download/gsdnld.html
c:\progra~2\gs\gs9.27\bin\gswin32c -dNOPAUSE -dBATCH -dSAFER -r203 -sDEVICE=tiff24nc -sOutputFile=outputfile.tif -c save pop -f inputfile.pdf
Next, install Irfanview (link below).
Open Irfanview and go to File > Batch Conversion. Click "Use Advanced Options", then click the Advanced button. Select CROP, then input the crop area. Based on your sample image at 203dpi, I would guesstimate X-Pos=206 with Width=812 and Y-Pos=330 with Height=1218. Leave all the other options alone and click OK. Back in the Batch Conversion window, just specify the TIF file created in the first step, then specify an output file with a new filename. Click Start Batch when you're ready, and Irfanview will crop all 100 pages of the TIF file in a single step!
Check the new file and adjust the X-Pos and Y-Pos values above as needed (use a higher X-Pos number to shift the crop area to the right, or a higher Y-Pos number to shift the crop area further down).
Good luck!
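The crop values are just inches multiplied by the dpi. At 203 dpi, the guesstimated Width=812 and Height=1218 work out to exactly 4 x 6 inches, a common label size. The same conversion gives 600 px for a 2-inch square at 300 dpi. Here is a tiny Python helper for that arithmetic. The names are illustrative. The offsets it returns count from 0 at the top-left corner, so add 1 if your tool counts from 1, as described above.

```python
def inches_to_px(inches: float, dpi: int) -> int:
    return round(inches * dpi)


def px_to_inches(px: int, dpi: int) -> float:
    return px / dpi


def crop_box(x_in: float, y_in: float, width_in: float, height_in: float,
             dpi: int) -> tuple[int, int, int, int]:
    """(x, y, width, height) in pixels, measured from the top-left corner."""
    return tuple(inches_to_px(v, dpi) for v in (x_in, y_in, width_in, height_in))
```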
After googling and reading up on this topic, I wonder if you might manage it with
ghostxps
https://www.ghostscript.com/download/gxpsdnld.html
Also, this software seems to claim support for XPS printers.
https://www.papercut.com/kb/Main/SupportedPrinters
Sorry I can't be of more help. I just like to google; I used to have a job managing printers years ago and had never heard of XPS before. There's a surprising lack of Linux/CUPS info about it.
I can find info on converting XPS to PDF or other formats, but you need to go the other way and print to XPS to send the file to the printer, correct?
Only reason: I have more experience with it, and things are faster to do in it.
It still has many annoying bugs when importing PDF and AI files, exactly the same behavior as the one I was using before (X6, year 2013). Probably because the engine underneath is exactly the same, the ancient ghostscript 8.64 (https://www.ghostscript.com/release_history.html - 12 years old!!!!! AGES old!!!!!)
I upgraded to 2019 and did not choose the upgrade protection because one of my killer apps (finecut) gets updates very slowly: they have only now published the update that lets it work with 2018, so with 2019 you can't install it yet. (Anyone know a workaround?)
2019 will be my last one. If after 4 years I need a newer version, it would still be cheaper to buy a 700-euro license than to pay 350+100 euro in 2019 plus 300 euro in 2020-21-22. Plus there's the undeniable advantage that, since it is a fully standalone version, I can put the 2019 version on an older PC while having the 2022 version on the new one.
My first programming job (long ago) was at a print shop. We used to use PHP to generate Postscript for all kinds of automated workflows. It's a bit antiquated, but it's very powerful. We were doing things like press sheet layouts and generating huge raster(?) files for the litho machines.
There are most likely decent tools for doing PostScript-y things in Go. Some quick searching found this library, which looks promising. There are also ancient, super mature tools like Ghostscript that can make quick work of most PostScript.
SVG might be easier nowadays, depending on the task. I've never looked into it.
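To give a feel for the "program writes PostScript, Ghostscript renders it" workflow, here is a toy Python stand-in for the PHP approach described above. It emits a one-page PostScript program that strokes a grid of rectangles (a crude press-sheet layout). The function name and layout are invented for illustration. Units are PostScript points (1/72 inch), and `rectstroke` is a standard Level 2 operator.

```python
def grid_page(cols: int, rows: int, cell_w: float, cell_h: float,
              margin: float = 36) -> str:
    """Return a one-page PostScript program that strokes a cols x rows grid."""
    lines = ["%!PS", "0.5 setlinewidth"]
    for r in range(rows):
        for c in range(cols):
            x = margin + c * cell_w
            y = margin + r * cell_h
            # x y width height rectstroke
            lines.append(f"{x:g} {y:g} {cell_w:g} {cell_h:g} rectstroke")
    lines.append("showpage")
    return "\n".join(lines) + "\n"
```

Writing the string to grid.ps and running `ps2pdf grid.ps grid.pdf` (or viewing it with Ghostscript) should show the grid.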
xclip -o | enscript -p- - | ps2pdf - "Clipboard of '$(xclip -o | head -n 1 | cut -c 1-10)'.pdf"
That should suffice. You can tidy it up or format it by passing Ghostscript options to ps2pdf.
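For the same xclip | enscript | ps2pdf pipeline from a script, here is a hedged Python sketch. It builds the file name from only the first clipboard line and replaces characters such as `/` that would break the name. It assumes xclip, enscript and ps2pdf are installed, and the function names are hypothetical.

```python
import re
import subprocess


def pdf_name(clip: str, length: int = 10) -> str:
    """Build "Clipboard of '<first chars>'.pdf" from the first clipboard line."""
    first = clip.splitlines()[0] if clip else ""
    snippet = re.sub(r"[^\w .-]", "_", first[:length])
    return f"Clipboard of '{snippet}'.pdf"


def clipboard_to_pdf() -> str:
    # Same tools as the one-liner: xclip -o | enscript -p- - | ps2pdf - NAME
    clip = subprocess.run(["xclip", "-o"], capture_output=True,
                          check=True).stdout
    out = pdf_name(clip.decode("utf-8", errors="replace"))
    ps = subprocess.run(["enscript", "-p-", "-"], input=clip,
                        capture_output=True, check=True).stdout
    subprocess.run(["ps2pdf", "-", out], input=ps, check=True)
    return out
```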
Ghostscript can extract text from a PDF, but it likely won't retain the structure you see in the PDF.
Image processing is futile once it has become a PDF. Hiring somebody to copy it for you might be a more practical approach.
>Could it be that Solus uses different paths?
Something to look into. I'm not using Solus at this time; in 2018 I'll make my switch to Solus. I don't think I'll have a problem installing my Samsung Xpress M2070FM laser printer, but for now I can't guide you on $PATH if this is the case for you.
Do you have Ghostscript installed? It helps me view .pdf files all the time.
Ghostscript is licensed under the AGPL (which places source-disclosure obligations on network deployment), and if you can't meet those terms when using a library based on Ghostscript, you need to purchase an (expensive) commercial license instead.
I'd recommend an alternative solution based on the Poppler tools. You can deploy them with your ASP.NET app and run them directly with System.Diagnostics.Process, or use an existing Poppler .NET wrapper that will do that for you.
So, you want to thumbnail all the pages, one image per page?
>There is no upper limit to the number of pages in the pdf.
And there's your problem with using ImageMagick for round 1: it starts off by loading everything into memory, at full resolution.
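As far as I know, Ghostscript rasterizes and writes one page at a time. Memory use should therefore stay roughly flat however many pages there are, unlike the load-everything behavior described above. This minimal Python sketch writes one low-resolution PNG per page. At 36 dpi a US Letter page comes out at 306 x 396 px. The output pattern, dpi and names are assumptions to adjust.

```python
import subprocess


def thumbnail_cmd(pdf: str, out_pattern: str = "thumb-%04d.png",
                  dpi: int = 36, gs: str = "gs") -> list[str]:
    # %04d is replaced by Ghostscript with the page number, one file per page.
    return [gs, "-dSAFER", "-dBATCH", "-dNOPAUSE", "-sDEVICE=png16m",
            f"-r{dpi}", "-dTextAlphaBits=4", "-dGraphicsAlphaBits=4",
            f"-sOutputFile={out_pattern}", pdf]


def make_thumbnails(pdf: str, out_pattern: str = "thumb-%04d.png",
                    dpi: int = 36, gs: str = "gs") -> None:
    subprocess.run(thumbnail_cmd(pdf, out_pattern, dpi, gs), check=True)
```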
Assuming you want to do this, I would suggest:
First, turn into a ghost. Dying might be required.
Afterwards, write something using your own hands. I'll call this "postscript" ("post" because it is after your death, "script" because it is your handwriting).
Then, just use the Ghostscript tool to convert to PDF.