>I'm looking for a self-hosted alternative to the Wayback Machine, where you can have the webpages saved along with all the attached elements like pictures, videos, and other stuff like that.
>But the main things I would like: updating of already-downloaded webpages, and the ability for links in saved webpages to point to other saved webpages, just like on the Internet Archive.
You want a WARC file. It's the only standardized web archiving format, and there are several programs to "play" the file. It's the same format the Internet Archive uses, and their software is open source, by the way. ;)
Oh, I would definitely recommend using grab-site (to download the site) and then ReplayWeb.page (the application, not the website!) to access it. It's almost as if you have a working internet connection, but it works completely offline.
A user script run with cron:
>#!/bin/bash
>
>docker exec grab-site sh -c "grab-site https://google.com --no-offsite-links --no-video --no-sitemaps"
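To run that on a schedule, a crontab entry along these lines should do it (the path /usr/local/bin/grab-google.sh is just a placeholder for wherever you save the script above):

```shell
# crontab -e, then add one line, e.g. weekly on Sunday at 03:00:
0 3 * * 0  /usr/local/bin/grab-google.sh >> /var/log/grab-site.log 2>&1
```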
If the command line doesn't scare you too much, you can use grab-site and tune the ignore regex to skip all URLs that don't match the right product-page syntax.
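For example, if (hypothetically) only URLs like /product/&lt;digits&gt; are the pages you want, you can sanity-check your pattern with grep against some sample links before handing it to grab-site's ignore settings:

```shell
# Test a URL pattern against sample links before using it as the basis
# for grab-site ignores. The /product/<id> site layout here is made up.
keep='^https://shop\.example\.com/product/[0-9]+'

printf '%s\n' \
  'https://shop.example.com/product/123' \
  'https://shop.example.com/cart' \
  'https://shop.example.com/product/999?ref=home' \
  | grep -E "$keep"
# prints only the two /product/ URLs; use grep -vE to preview what gets ignored
```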
If you wouldn't mind, it would be awesome if you could share the archives afterwards!
First, try downloading the links using old.reddit.com rather than reddit.com if possible. reddit.com is a lot more complicated.
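One quick way to convert a whole list of links is a sed pass (links.txt is a hypothetical file of URLs, one per line):

```shell
# Rewrite reddit.com / www.reddit.com links to old.reddit.com,
# which serves much simpler HTML that archives more cleanly.
# Lines already pointing at old.reddit.com pass through unchanged.
sed -E 's#^https?://(www\.)?reddit\.com#https://old.reddit.com#' links.txt
```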
Try using a tool such as grab-site. https://github.com/archiveteam/grab-site
You can feed it a list of URLs from a text file (newline-separated) with grab-site -i <file>
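A minimal sketch of that workflow (the URLs are placeholders):

```shell
# Write a newline-separated URL list...
cat > urls.txt <<'EOF'
https://example.com/page1
https://example.com/page2
EOF

wc -l < urls.txt    # should report 2

# ...then feed it to grab-site (run this where grab-site is installed):
# grab-site -i urls.txt
```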
It will record in WARC format which is better for preservation. To play back the WARC, use a tool such as https://replayweb.page
https://github.com/ArchiveTeam/grab-site is a very easy way; you can even put the URLs in a file and pass it to grab-site (with the -i option). It doesn't run JavaScript, unfortunately, so if the website needs JS it won't make a complete backup (it'll back up the JavaScript itself, but not the resources the JavaScript fetches). It saves into WARC, which can be ingested into the Wayback Machine or loaded at replayweb.page
I've never used this script to download a subreddit, but it worked for other types of websites, and there is even a section about downloading subreddits in the documentation.