Step 1: Grab a list of 100 authority sites in your niche.
Step 2: Download Xenu Link Sleuth: http://home.snafu.de/tilman/xenulink.html
Step 3: Scrape the 100 authority sites for outgoing link errors.
Step 4: Check all outbound errors to find targets.
Step 5: Contact webmaster/steal links.
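Step 3 (collecting the outbound links to check) can be sketched with Python's standard library. This is a minimal illustration, not a full crawler; the URLs are invented. In practice you'd then request each outbound URL and keep the ones returning 404:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class OutboundLinkParser(HTMLParser):
    """Collect <a href> targets that point away from the page's own domain."""
    def __init__(self, base_url):
        super().__init__()
        self.base = base_url
        self.host = urlparse(base_url).netloc
        self.outbound = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base, href)  # resolve relative links
        if urlparse(absolute).netloc not in ("", self.host):
            self.outbound.append(absolute)

parser = OutboundLinkParser("http://example-authority.com/resources")
parser.feed('<a href="/about">About</a>'
            '<a href="http://dead-site.example/tools">Tools</a>')
print(parser.outbound)  # only the external link survives
```

Xenu does this (and the status checking) for you; a script like this is just useful if you want to post-process the results your own way.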
I'm not gonna harp on here, but here's a small step in the right direction (toward reconstruction) if you have the DBs for your sites.
There is a key in the wp_options table that lists the active plugins. It's called active_plugins. Sample output (run through an unserializer):
Array
(
    [0] => query-monitor/query-monitor.php
    [1] => debug-bar-console/debug-bar-console.php
    [2] => debug-bar/debug-bar.php
    [3] => enhanced-text-widget/enhanced-text-widget.php
    [4] => hookie-woocommerce/hookie-woocommerce.php
    [5] => log-deprecated-notices/log-deprecated-notices.php
    [6] => piklist/piklist.php
    [7] => timber-library/timber.php
    [8] => woocommerce-menu-bar-cart/wp-menu-cart.php
    [9] => woocommerce/woocommerce.php
    [10] => wordpress-seo/wp-seo.php
    [11] => wp-better-emails/wpbe.php
)
Should give you something to work with. Keep in mind that WP will alter this array as soon as it detects plugin(s) missing, so examine your DB dump files before you re-connect each DB to a site.
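If you don't have a PHP unserializer handy, you can pull the plugin paths out of the raw dump with a quick regex. A sketch, with a sample serialized string modeled on a real wp_options row (your dump will differ, and the string-length prefixes are what PHP's serialize() emits):

```python
import re

# A PHP-serialized active_plugins value, as it might appear in a dump.
serialized = (
    'a:3:{i:0;s:31:"query-monitor/query-monitor.php";'
    'i:1;s:27:"woocommerce/woocommerce.php";'
    'i:2;s:24:"wordpress-seo/wp-seo.php";}'
)

# Every array element is stored as s:<length>:"<value>";
plugins = re.findall(r's:\d+:"([^"]+)"', serialized)
print(plugins)
```

Good enough for eyeballing which plugins you need to reinstall; use a real unserializer if you need to write the value back.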
In terms of media uploads, the database has references to all the uploaded media files, of type attachment, in the posts table. Their expected locations (URLs) are in the guid field. You can use this information to reinstate original files to their expected names/locations on disk, and then use a plugin like Regenerate Thumbnails to recreate all the intermediate sizes.
If the media library is full of junk and you want to restore only the required files, then get the site running, minus media, and then run a link checker across it, and use the list of 404s to identify only the files you need to upload. I like Xenu's Link Sleuth; there are others.
I use Xenu's Link Sleuth - it crawls through your complete website, checks for broken links, generates sitemaps, etc.
It's a desktop download, though, so performance will depend on your hardware.
It's free to use.
It depends on the site, and how well they're protecting their files. I would encourage the use of a scripting language like Python, so that once you figure out how things are structured you can write a program to (fully or partially) automate it. See my comment here and try to apply it to the site in question.
I would also attempt to use something like Xenu's Link Sleuth to collect the site's pages (so that they may be fed into a Python script), but I don't know how to make Xenu "log in" so that it's allowed to see the pages. #19 of the FAQ makes it sound like you can use IE to log in, then let Xenu read IE's cookies. I don't know if it matters what version of IE you use.
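Once Xenu has collected the pages, feeding its exported report into a Python script is straightforward. A sketch, assuming a tab-separated export with the page URL in the first column (check your own export's layout before relying on this; the sample rows are invented):

```python
def urls_from_report(lines, domain):
    """Keep only URLs on the target domain from a tab-separated report."""
    urls = []
    for line in lines:
        url = line.split("\t")[0].strip()
        if url.startswith("http") and domain in url:
            urls.append(url)
    return urls

sample = [
    "Address\tStatus",
    "http://target-site.example/page1\t200",
    "http://cdn.other.example/style.css\t200",
]
print(urls_from_report(sample, "target-site.example"))
```

The logged-in part is still the hard bit; this only helps once the crawler can see the pages.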
Yep - make sure you set up the right redirects ahead of time, and do as much testing as physically possible.
You will want to look in to using Xenu (or similar) to make sure that the links are all working on an internal development copy of the site before going live.
As long as you do your research beforehand, get the right redirects in place, and ensure that your WordPress setup is optimised before going live, you shouldn't have a problem.
You might see fluctuations for the next couple of months, but spend the time doing all your research and extensive testing now to save yourself loads of issues in 4 weeks time.
Good luck!
As a developer on PC:
Run the site through Inspyder InSite (paid, but as a professional developer, worth the price) to do both a spell check on all pages (and ALT/META) and 404 checks.
At the very least run it through Xenu's Link Sleuth (free) for 404s.
If you have been developing on a development server first, make sure that robots.txt is changed so it no longer blocks everything the way it did on the dev server.
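That robots.txt check is easy to automate with the standard library. A minimal sketch, parsing a blocked-everything dev file offline; point it at your real file (or use `set_url()` against the live site) instead:

```python
from urllib.robotparser import RobotFileParser

# A typical dev-server robots.txt that blocks all crawlers.
dev_robots = """User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(dev_robots.splitlines())
print(rp.can_fetch("*", "http://example.com/"))  # False means crawlers are blocked
```

If this prints False on your live site after launch, you forgot to swap the file.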
Xenu's Link Sleuth may be helpful on the frontend as far as identifying your current site taxonomy as well as generating an orphan file report.
http://home.snafu.de/tilman/xenulink.html
Some helpful documentation:
http://www.integralworld.net/xenu/
It's a good starting point, although it will probably just get you the tip of the iceberg.
Crawl your site with a basic spider application such as Xenu
Make a matrix of 'old page' and 'new page' in Excel etc.
Take the rules and implement them in .htaccess / whatever rewrite module you're using, as 301s. I have done this for large major sites - it works.
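For the .htaccess step, the rules end up looking something like this (the paths here are hypothetical examples, not a recommended mapping; syntax is Apache's mod_alias and mod_rewrite):

```apache
# One 301 per row of the old-page/new-page matrix:
Redirect 301 /old-page.html /new-page/
Redirect 301 /products/widgets.php /shop/widgets/

# Or, where a whole section moved, a pattern with mod_rewrite:
RewriteEngine On
RewriteRule ^blog/(.+)$ /news/$1 [R=301,L]
```

Re-crawl with Xenu after the rules are live to confirm nothing 404s.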
May I suggest using Xenu. It is a link checker; however, if you provide it with FTP details (done after the initial link check), it will search through FTP to find any files that exist on the server but are not linked to by any other files.
This however doesn't mean that it is 100% foolproof, as something may link to a file externally, or the file may be a server-side include or something like that. But this is the simplest way I have found to deal with this problem when you only have FTP access.
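The same orphan check can be done by hand as a set difference: list what's on the server (e.g. via ftplib's `nlst()`) and subtract what the crawler actually saw. Both lists below are made up; and per the caveat above, treat the result as candidates to review, not files to delete blindly:

```python
# Paths present on the server (in practice, from an FTP listing).
on_server = {"/index.html", "/about.html", "/old/backup.zip"}

# Paths the link checker found by following links.
crawled = {"/index.html", "/about.html"}

orphans = sorted(on_server - crawled)  # on disk but never linked to
print(orphans)
```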
A valid concern. Apart from just checking the sites yourself every now and then, you could use Xenu's Link Sleuth or W3C's link checker to verify that your outside links are all working correctly. However, that just tells you if the hosts are still up and won't inform you if they've changed the content. In the end I think it's a worthwhile risk to take. I know I'm far too lazy to manually type in the links to visit the site, but going to the actual site would be a huge boost in my analysis of your skills. I guess if you really wanted to you could self-host some of the sites, thus keeping the content the same, but that's taking it a step too far.
A good program for this is Xenu. Yes, it has a tad old-school look and feel, but it's an old-school tool that still does a great job. It will crawl an entire site, and then when done, sort by the "Address" column to find all the ones that start with http:// and update them. (Right-click on the address and choose "Properties" and you will see all URLs on your site that use it.)
Here are 3 examples of link-checking programs, some open source, some closed source: Link Checker, Linkcheck, Xenu's Link Sleuth. Maybe one of them is to your liking and needs.
If you have less than 500 urls, you can use Screaming Frog SEO Crawler for free, although I'm not sure how much data the free account gives you. I've always used the paid version.
An alternative is Xenu which is free, but I have no experience with it.
http://home.snafu.de/tilman/xenulink.html
Crawl your website, look at the Redirects tab and see which pages link to these 30x redirects. Then work your way through them and update the internal links to point to the final urls.
If you're linking to a page which redirects more than 5 times, GoogleBot will give up following the redirects.
By this I mean...
Page A to Page B to Page C to Page D to Page E to Page F
Google will get as far as Page E through the redirect chain and give up following them. So if Page F is only linked to through this redirect chain, Google may never find it.
Bit more to it than this, but keeping things simple
The best trifecta for any small business:
Crawling your site: Xenu http://home.snafu.de/tilman/xenulink.html (all the on-site information is there)
Backlink analysis, follow-up and competition: LinksSpy https://www.linksspy.com (all the off-site information is there)
Reporting and data analytics: Tableau http://www.tableau.com (all the visual aspects are here)
If you use any of them and have problems, post it here and I will happily answer your questions.
I honestly don't think you need to hire a freelancer (your inbox should be exploding right now); you just need the right tools:
Crawling your site: Xenu http://home.snafu.de/tilman/xenulink.html (all the on-site information is there)
Backlink analysis, follow-up and competition: LinksSpy https://www.linksspy.com (all the off-site information is there)
Reporting and data analytics: Tableau http://www.tableau.com (all the visual aspects are here)
If you use any of them and have problems, post it here and I will happily answer your questions.
I am 80% sure Xenu works with passwords, and I just checked - it may work on Linux http://home.snafu.de/tilman/xenulink.html ("I have been told that it runs faultlessly under Fedora 13, Red Hat 8, Ubuntu, Kubuntu 14.04 and OS X via wine or WineBottler, and under Crossover on a Mac").
I opened this up in a tab and forgot about it; then, after cooking a meal, I came back to my computer and was closing tabs to start redditing fresh. Then I came across this link, and I browsed it for 15 minutes before realizing that it was linked from here. This is very interesting and weird.
Edit: after some digging around I realised that http://kouncool.com/wp-content/uploads/2014/10/ has LOTS of files in it.
Edit 2: more digging and I found this Imgur, which led me to find this: 'Xenu's Link Sleuth (TM) checks Web sites for broken links'.
I just did a search in Numbers for "macro" and nothing came up. Did a quick Google search, and someone in a forum confirmed that there is no macro language for Numbers.
Of course if that's all you use it for there are apps like this and others that will take care of that task. We used to use this app at work which apparently works with Crossover on the Mac.
Screaming Frog falls in the "sort of" free category. If you have more than 500 pages I'd recommend either paying for the license ($100/year--totally worth it) or trying Xenu's Link Sleuth (free, but not quite as good: http://home.snafu.de/tilman/xenulink.html).