There is an extension called SingleFile that saves entire webpages as a .html file. I then pin that to my IPFS node.
Project: https://github.com/gildas-lormeau/SingleFile
Chrome: https://chrome.google.com/extensions/detail/mpiodijhokgodhhofbcjdecpffjipkle
Firefox: https://addons.mozilla.org/firefox/addon/single-file
It seems like an okay format. Maybe your objection only has to do with some software for that format?
HTML is more annoying when saving a page involves having a file and a folder with a bunch of files. Some browsers can't even properly save some pages. https://github.com/gildas-lormeau/SingleFile is a good way to save web pages into one file, but why do I need a browser extension? Plus, it would be nice if saved pages could be compressed like PDF.
Protip! There's a browser extension called SingleFile that will download a webpage and save it to an HTML file. It's much better for important stuff like this that you want to look at again in the future.
Maybe SingleFile is what you're looking for. It saves a webpage as a single HTML file which preserves the images and layout. It's available as a browser addon or as a CLI program.
I've been using SingleFile to capture web pages 'as is' for archival purposes. I then index these using recoll for full-text search. It works pretty well! My hope is that the archives will be useful not only for their content, but to see what the web looked like over time.
Yes, it does. For example, I downloaded a BBC webpage and it took more than 5 minutes to show up in Downloads. Anyway, if you feel the addon isn't working as it should, or that it should offer a better way of doing something, you can contact the dev of the addon (https://addons.mozilla.org/en-CA/firefox/addon/single-file/) through the support email under "More information". He told me this is just a Chromium browser issue. I contacted him a long time ago and he seems to be a good guy. Do rate and promote his addon, because a lot of users don't know about it. You can also use it on desktop if you feel like it. Also, do share Kiwi Browser with others and tell them they can have addons on mobile. I love it when people ditch Chrome and use Kiwi. Chrome is nowhere near what users want; Kiwi knows us.
These days, there are even tools to just rip the entire page: https://github.com/gildas-lormeau/SingleFile
It's trivial.
Basically:
It can even work with pages that use JS to render.
You could try installing SingleFile and saving it with that. For video content you could try right-clicking and saving the video files manually if there aren't too many.
What archive format did you use? Did you use a backend server?
WebScrapBook focuses more on web page annotation/editing, fulltext search, and sidebar organization, with the need of a backend server. It also supports more archive formats.
The single-HTML web page archive format, supported by both WebScrapBook and SingleFile, is more convenient to use but has more limitations (e.g. in-depth capture and downloading linked files) and is generally larger in size and has worse performance. You probably need to first determine whether it's what you want. See related description 1 and description 2 for details.
BTW, I don't think SingleFileZ really surpasses MAFF or HTZ. It actually requires a browser extension or a special browser configuration (which opens a security hole) in most cases and is likely not available on mobiles, which is hardly different from the counterparts. It also has larger size due to the self-extracting code.
If you want single-HTML anyway, a key difference is that SingleFile focuses more on size compression while WebScrapBook focuses more on fidelity. Although WebScrapBook can be tweaked for smaller size with a sacrifice of some minor information, SingleFile compresses the HTML and CSS code more aggressively, which, unfortunately, is also more likely to break the web page.
(Disclaimer: I am the author of WebScrapBook)
I may be able to answer my own question. I saved each of the different fonts in HTML using SingleFile. I then looked for font-family. This time it saved the text of the paragraphs and had a font-family listed in line 100. However, none was listed for publisher's default.
Scholar font-family:Seravnek,Thonburi,Helvetica Neue,sans-serif
paperback font-family:Athelas,Baskerville,Bookman Old Style,Palatino Linotype,Cochin,serif
OpenDyslexic font-family:'OpenDyslexic'
custom old style font-family:Athelas,Baskerville,Bookman Old Style,Palatino Linotype,Cochin,serif
sans serif font-family:Helvetica Neue,sans-serif
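Searching a saved page for font-family by hand gets tedious with several files, so here is a minimal sketch of doing it with a script. The function name and the sample input are made up for illustration, and the regular expression is a simplification: it stops at a double quote, so font names written in double quotes would be missed.

```python
import re

# Simplified pattern: captures everything after "font-family:" up to the
# next ';', '}', '<' or double quote. This is an assumption that suits
# compact CSS, not a full CSS parser.
FONT_FAMILY_RE = re.compile(r'font-family\s*:\s*([^;}"<]+)', re.IGNORECASE)


def find_font_families(html_text):
    """Return (line_number, value) pairs for each font-family declaration."""
    results = []
    for line_no, line in enumerate(html_text.splitlines(), start=1):
        for match in FONT_FAMILY_RE.finditer(line):
            results.append((line_no, match.group(1).strip()))
    return results


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the saved .html file instead.
    sample = (
        "<style>\n"
        "body{font-family:Athelas,Baskerville,serif}\n"
        ".alt{font-family:'OpenDyslexic';color:red}\n"
        "</style>"
    )
    for line_no, value in find_font_families(sample):
        print(line_no, value)
```

Reporting the line number makes it easy to compare against what a text editor shows, as with the "line 100" observation above.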
I desperately want something better than Raindrop and this certainly looks interesting and more flexible than Pile which I’m also keeping an eye on.
How flexible are the tags? Do they support multiple words (an absolute requirement for me and the reason Pinboard has never worked) and hierarchical tags (nice to have)?
Does it do any kind of visual bookmarking (like Pile or Pinterest?). For a lot of links I just want more-or-less standard bookmarking but I really want something visual for certain projects. Raindrop also does a decent job of this with options for what to use where. I just really, really would love native and stable. Automation would be huge too.
It’s a bummer you’ve ruled out highlighting, I wouldn’t use a read later service that doesn’t sync to Readwise (or Hypothes.is). I don’t need offline, just highlighting integration of some kind. Thankfully, Matter already has reading apps covered though, at least once they get around to a Mac or web version, Readwise’s reader is also on the horizon.
Last up, webarchive is deprecated or the next thing to it last I checked, I’ve moved on to SingleFile and it’s so much better than webarchives ever were IMO. PDFs are a cluster, I don’t want to touch them for archiving web content.
This looks great and I’ll be eagerly watching it, I’d probably buy it even if it doesn’t do a lot of what I want.
I like SingleFile, but it doesn't retain styles, etc.
I'd personally use SingleFileZ, as it seems to be the most complete "archive".
WebScrapBook just seems bloaty to me, but haven't tried it and have no need for my personal notes in the archives.
Duuude, I just tried https://github.com/gildas-lormeau/SingleFile/ from that Wiki with all the sites I have in ArchiveBox that don't have images, it's perfect. All images there, everything. Implement that in ArchiveBox!
The images are in fact embedded into the file, not loaded externally. e.g. https://www.toptal.com/developers/hastebin/ulipekiqep.txt
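If you want to check this for your own saved pages, a rough sketch along these lines should work. It assumes embedded images show up as `data:` URIs in `img` `src` attributes; the class and function names are invented for this example, and it ignores images referenced from CSS or `srcset`.

```python
from html.parser import HTMLParser


class ImageSourceCounter(HTMLParser):
    """Counts <img> tags whose src is a data: URI versus anything else."""

    def __init__(self):
        super().__init__()
        self.embedded = 0
        self.external = 0

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if src.startswith("data:"):
            self.embedded += 1
        elif src:
            self.external += 1


def count_image_sources(html_text):
    """Return (embedded_count, external_count) for the given HTML text."""
    counter = ImageSourceCounter()
    counter.feed(html_text)
    counter.close()
    return counter.embedded, counter.external


if __name__ == "__main__":
    # Hypothetical sample; replace with the contents of a saved page.
    sample = (
        '<img src="data:image/png;base64,AAAA">'
        '<img src="https://example.com/a.png">'
    )
    embedded, external = count_image_sources(sample)
    print("embedded:", embedded, "external:", external)
```

A page where the external count is zero no longer depends on the original server for its `img` elements.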
Well, I was on r/datacurator and there is an addon called SingleFile, which you install in a web browser, and it will save a webpage with all formatting and everything else as an HTML file that he can open in any web browser.
I'd use SingleFile (Chrome or Firefox extension) to grab the page with style and everything into a single HTML file. From there it is probably easy to upload to your Wiki to be used from there.
You can also choose a different hosting for this "web clipping" and simply link them from your Wiki. I wrote htmls-to-datasette to process the files into searchable documents.
See https://github.com/gildas-lormeau/SingleFile and https://github.com/pjamar/htmls-to-datasette
Thanks for your support, I managed to run scripts with SingleFile.
I installed SingleFile manually as described here. (I was using Docker previously)
single-file --browser-executable-path=/usr/bin/google-chrome-stable --browser-script=/path/to/scripts.js 'https://www.google.com' google.html
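For anyone scripting this over many URLs, here is a small sketch that builds the same command from Python. It only uses the two options shown in the command above; the function names are my own, and actually running `save_page` assumes the `single-file` CLI is installed and on your PATH.

```python
import subprocess


def build_single_file_command(url, output_path, browser_path=None, script_path=None):
    """Build the argument list for the single-file CLI (hypothetical helper)."""
    command = ["single-file"]
    if browser_path:
        command.append(f"--browser-executable-path={browser_path}")
    if script_path:
        command.append(f"--browser-script={script_path}")
    command += [url, output_path]
    return command


def save_page(url, output_path, browser_path=None, script_path=None):
    """Run the CLI; raises CalledProcessError if it exits with an error."""
    command = build_single_file_command(url, output_path, browser_path, script_path)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Prints the command instead of running it, so this works without the CLI.
    print(build_single_file_command(
        "https://www.google.com",
        "google.html",
        browser_path="/usr/bin/google-chrome-stable",
        script_path="/path/to/scripts.js",
    ))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with URLs that contain `&` or `?`.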
You might like grab-site. I've found it to be very powerful, though with a bit of a learning curve and with only WARC-format output. Its ignore behaviour is *so* much better than wget: it *skips ignored pages entirely* rather than downloading them and *then* deleting.
I've heard that SingleFile--which is primarily used for individual pages--can be used as a whole-website scraper through its command-line interface, but I haven't tried that out myself.
There's a Firefox and Chrome extension named SingleFile that saves a web page into a single HTML. Look at it from links below.
Pragmatic sorting: for example, anything older than 1 month (or less, it's up to you) => trash
Archiving: wallabag/pocket, or save as PDF to your hard drive, or use the screenshot tool (in Firefox: right-click > Take Screenshot > Save full page), or save as a single complete HTML file with the SingleFile browser extension
Setting links aside: your browser bookmarks (synced to the [cloud] or not), or an online or self-hosted bookmarking service (such as shaarli).
I don't mean to hijack the post, but I just wanted to make it easier for everyone to archive what they see, so whenever someone tries to call you out saying something is bs, you can prove yourself.
SingleFile is a webpage grabber (Firefox version). But there is a GitHub too for pretty much every browser version PC/Mobile.
It's pretty easy to use. And you can customize what it downloads. It might take a while to save an entire site depending on your computer specs/internet speeds. But it's thorough, and won't miss anything. I'm too poor to buy shares, but I'll contribute what I can to do my part. :)
Hold the line everybody!
It's not quite what you're asking, but as a workaround you could use SingleFile to save the page and then read the local copy instead. All JavaScript is stripped by default.
If you're not familiar with wget and want an exact copy of the page with images, links etc, I highly recommend the SingleFile browser plugin. https://github.com/gildas-lormeau/SingleFile
Try SingleFile. It installs into your web browser as a button. SingleFile will not download a whole website, but it lets you download the web page you are currently viewing into a single HTML file that you can save locally. And those files can be opened by any web browser without SingleFile installed.
Thank you. The hour is formatted according to your preferences on your operating system. The code of SingleFile is indeed open-source (AGPL license). To install the extension manually, go to about:debugging#/runtime/this-firefox, click on the button "Load temporary Add-on..." and select the "manifest.json" file where you unzipped "SingleFile-master.zip".
There is a (small) chapter about performance issues in the FAQ, see https://github.com/gildas-lormeau/SingleFile/blob/master/faq.md#singlefile-is-slow-on-my-computertabletphone-can-it-run-faster.