Stitching Together Images from a PDF Generated by Microsoft

So I wanted to extract an image from a PDF. „Right-click -> Save As”, and I thought I was done. Unfortunately, I was wrong: I only got a slice of the image, not the whole thing.

After some (non-LLM based) (re-)search, I learned that PDFs with a „Producer: Microsoft: Print To PDF” attribute tend to contain this „feature”: an image is stored as a stack of horizontal slices instead of one picture. So how do you work around that?
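If you want to check up front whether a PDF is affected, the producer can be read with pdfinfo, which comes with poppler just like pdfimages. A minimal sketch: the helper name is_ms_print_to_pdf is made up for this post, and I am assuming the producer string always contains „Microsoft”, which may not hold for every version.

```shell
# Hypothetical helper: reads `pdfinfo` output on stdin and succeeds
# if the Producer line mentions Microsoft (assumption: the wording may vary).
is_ms_print_to_pdf() {
    grep -qiE '^Producer:.*Microsoft'
}

# Usage (assumes pdfinfo from poppler is installed):
#   pdfinfo damaged_by_microsoft.pdf | is_ms_print_to_pdf && echo "expect sliced images"
```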

The first step is to get a list of all the images in the PDF. This is easily done with pdfimages (a reasonably recent version, the one based on poppler):

$ pdfimages -list damaged_by_microsoft.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     440   198  rgb     3   8  jpeg   no         4  0   600   600 8666B 3.3%
   1     1 image     440   198  rgb     3   8  jpeg   no         5  0   600   600 8382B 3.2%
   1     2 image     440    90  rgb     3   8  jpeg   no         6  0   600   600 6621B 5.6%
…

Extracting those images is easily done with pdfimages as well. I prefer my images as .png, so I added the appropriate conversion flag to the command.

pdfimages -png damaged_by_microsoft.pdf dbm_image

This results in a bunch of files named „dbm_image-000.png”, „dbm_image-001.png” and so on in the current directory. The cumbersome part starts here: you have to identify which fragment is the first and which is the last slice of a given image. In my case, I wanted one image spanning fragments 006 to 026 and another one spanning 027 to 054.
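A rough way to get a first guess at the boundaries is to look for changes in the width column of the listing. This is only a heuristic sketch: it assumes that all slices of one picture share the same width and that neighbouring pictures differ in width, which need not be true, so you still have to eyeball the result. The function name list_runs is made up.

```shell
# Hypothetical helper: reads `pdfimages -list` output on stdin and prints
# runs of consecutive fragments that share the same width.
# Assumption: column 2 is the image number, column 3 the type, column 4 the width.
list_runs() {
    awk '
        $3 != "image" { next }      # skips the header, the dashes and non-image rows
        !have { start = $2; width = $4; have = 1 }
        $4 != width {               # width changed: close the previous run
            printf "%03d-%03d width=%s\n", start, last, width
            start = $2; width = $4
        }
        { last = $2 }
        END { if (have) printf "%03d-%03d width=%s\n", start, last, width }
    '
}

# Usage:
#   pdfimages -list damaged_by_microsoft.pdf | list_runs
```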

Having noted down the index of the first and last fragment of each image we want to export, we can now stitch them together using ImageMagick:

magick convert dbm_image-0{06..26}.png -append image01.png
magick convert dbm_image-0{27..54}.png -append image02.png
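The brace expansion above is a bash feature and only works with literal numbers, not with variables. If you need to do this more than once, a small wrapper might look like the following sketch. The function names fragment_names and stitch are made up, and I am assuming ImageMagick 7, where plain magick takes the place of magick convert. Pass the indices without leading zeros.

```shell
# Hypothetical helper: print the fragment file names for indices first..last.
# Arguments: prefix first last (indices without leading zeros, e.g. 6 and 26).
fragment_names() {
    prefix=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        printf '%s-%03d.png\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Hypothetical wrapper: stack the fragments top to bottom into one file.
# Arguments: prefix first last output
stitch() {
    # The generated names contain no whitespace as long as the prefix has none,
    # so relying on word splitting of the command substitution is fine here.
    magick $(fragment_names "$1" "$2" "$3") -append "$4"
}

# Usage:
#   stitch dbm_image 6 26 image01.png
#   stitch dbm_image 27 54 image02.png
```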

Et voilà! I just had to spend a couple of minutes figuring this out instead of just doing a „Save As”, thanks to Microsoft’s genius in PDF export.

Based on the Stack Overflow thread „PDF: extracted images are sliced / tiled”.

Posted: Sunday, 2025-07-20, 11:35

Disclaimer

„Leyrers Online Pamphlet“ is my personal website, that of Martin Leyrer. The posts published here reflect my ideas, my interests, my humor and occasionally my life as well.
The postings on this site are my own and do not represent the positions, strategies or opinions of any former, current or future employer of mine.