The wider your vdev, the longer resilvers will take and (for the same reason) the lower your IOPS will be. Really wide rust stripes can end up requiring weeks to scrub or resilver, with significant performance issues during scrubbing or resilvering due to the low IOPS involved.
> I have 14 disks 900G each
Why? That's an incredible amount of power consumption and initial expense, just to end up with a vdev that would get stomped into the dirt by a pair of 10T mirror vdevs.
edit: new HGST He10 * 4 = $1320; HP 900GB 2.5" disk * 4 = $1540
This can be done in NixOS.
We don't have a graphical installer, but installation instructions are here: https://nixos.org/manual/nixos/stable/index.html#sec-installation
NixOS with ZFS root instructions can be found here: https://nixos.wiki/wiki/NixOS_on_ZFS#How_to_install_NixOS_on_a_ZFS_root_filesystem
However, NixOS is very different from other distros. I would not recommend it for people who want to "get up and running". Once you become familiar with NixOS it's an extremely liberating and empowering distro; it's just not an easy journey to the promised land.
I assume your current capacity is roughly 12 * 8 (give or take) so I'll round up to 13 * 8 ~ 104TB.
Even moving that volume of data across the net is going to take you some time. I'm not very confident you'll be done in a month.
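For a rough sense of scale: 104TB is about 832 × 10^12 bits, so even at a sustained 1Gbps that's 832 × 10^12 / 10^9 ≈ 832,000 seconds - roughly 9.6 days of flat-out transfer in each direction. Any real-world throughput below that pushes you toward the month mark fast.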
Currently, once all the drives in your VDEV have been replaced with larger-capacity drives, ZFS can/will expand to make use of the larger size. From what I understand it's an all-or-nothing option, so you're looking at a serious outlay of $$$ to replace all 14 drives with the new capacity (I think prices are better now, with Chia sort of dying a slow death).
If you're going to be replacing the drives anyway, I'd say do that instead of the whole migrate-to-cloud-and-back option. Plus you could probably build secondary storage out of the replaced disks as well. Win-win overall, assuming you're OK with the cash for it.
If you're really out of options and want to go cloud, take a look at Hetzner's auction boxes (https://www.hetzner.com/sb) - there are a few 10x10TB and 15x10TB machines that you could use for a month or two, but they're definitely in the €250-300 range per month. The machines are good, but depending on where you are, network throughput will likely be the bottleneck when you try to push that much data consistently to fill them up in short order. There are also a few 4x16TB boxes that come and go (rough price point of €65), but you'll need several of them, assuming you want some RAID level on them to keep your data safe as you migrate back and forth.
Hope this helps and good luck!
The ashift is actually defined per vdev not per zpool. ZFS is designed to query the disks to find the sector size when creating/adding a vdev, but disks lie and you possibly might use a mixture of disks even in the same vdev (not usually recommended but workable).
You should just create your vdevs with ashift=12 unless you know the vdev will only ever use native 512b disks. Though I think that's pretty unlikely, and it's somewhat shortsighted to limit yourself like this.
zpool create -o ashift=12 tank mirror sda sdb
zpool add -o ashift=12 tank mirror sdc sdd
This goes for OpenZFS, so Illumos, BSD, Linux, and OS X. It could differ on Solaris, as they don't use OpenZFS; I don't think they support the -o ashift=12 parameter.
The upside is you will have great performance with 512b and 4k drives. The downside is you will have more wasted space overhead depending on your exact vdev configuration.
See the following for some more info about the ashift=12 overhead:
https://web.archive.org/web/20140403012030/http://www.opendevs.org/ritk/zfs-4k-aligned-space-overhead.html
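If you want to verify what ashift an existing vdev actually ended up with, you can read it out of the cached pool config; a minimal sketch, assuming a pool named tank:
zdb -C tank | grep ashift
Each vdev reports its own value, which is handy since (as above) ashift is per-vdev, not per-pool.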
Do you need to back up ZFS datasets including metadata, etc., or just the filesystem data?
For the former I think the best you're going to do is a zfs send to a gzip archive and then copying those over, but I don't think you can really do an incremental.
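The archive route looks something like this (dataset and snapshot names hypothetical):
zfs send tank/data@backup | gzip > /mnt/external/tank-data.zfs.gz
zcat /mnt/external/tank-data.zfs.gz | zfs receive tank/restored   # restore is the reverse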
For just the filesystem data, I think good old-fashioned rsync or rclone (seriously, check out https://rclone.org/ - it's awesome) will suit you best.
I usually go the ddrescue route right away. I can see the argument for copying the data you know you need first, but going the ddrescue route you only need to read the data once.
Yes, the PC firmware can usually only load things from a FAT32 partition (Macs can also access HFS+), and so the loaders and other tools live in the EFI subdirectory of the ESP partition.
You'll find these files in subdirectories in /boot/efi/EFI.
You can also put the kernel and initrd there too, and then the boot directory can live on the ZFS filesystem. E.g.: /boot/efi/EFI/debian/vmlinuz and /boot/efi/EFI/debian/initrd.img. In this case it's easier to use the rEFInd boot manager, but you can boot the kernel directly.
You'll have to put a 200MB FAT32 EFI partition at the beginning of the flash drive with a GPT partition table, then put the rEFInd bootloader on it, blessing it with macOS; then you'll need to configure it to boot the FreeBSD boot partition (which mustn't be ZFS, because there are no drivers for that).
All that is necessary because BIOS emulation mode on macs doesn't work with USB/Firewire drives...
edit: oh, it seems FreeBSD has no EFI support, so you'll also need to set up a grubx64.efi bootloader to boot it. Yeah, I think an internal SATA drive would be the more viable solution by now...
Read on DistroWatch:
>The other point was the announcement that Debian had been seeking legal advice regarding whether the project can include support for playing DVDs (via libdvdcss) and the ZFS advanced file system. The Software Freedom Law Center has given the go-ahead for Debian to distribute both ZFS and libdvdcss packages and we should soon see these features appear in Debian's repositories.
If Debian (!) can do this, I expect every other distribution to do this as well.
"no other testing shows any sort of hardware failure" - did you try to check other PSU, mobo? I had pleasure of tracking mysterious SATA errors, changed cables, kernels... but ultimately it was PSU all along. One of my disks has 10k errors in smartmonctl, acquired in few minutes. And not a single one for 2 years since PSU was replaced.
The above system had been crashing every week or so, and that was due to me cheaping out on a CPU - I installed some APU that apparently was not properly supported (or I bought 2 faulty CPUs). After switching to a more "mainstream" Ryzen processor, there were no crashes. Is it your CPU? Unlikely, as you have a decent, non-niche CPU.
Could it be some power-saving quirk? On one machine, for an old GT210 graphics card, I had to turn off power saving with the pcie_aspm=off kernel parameter (https://www.kernel.org/doc/html/v4.15/admin-guide/kernel-parameters.html). Maybe you'll find a parameter there that helps you?
Also: another mobo?
Just 4 ideas; I can't say what it actually is, as I run very basic servers, with cheap, sometimes used, hardware, for my personal use - never needed more than 4 HDDs :)
The RAM comment usually makes everyone go nuts. My understanding is that ZFS will just chew up as much free ram for its ARC as is floating around in the system. Dedup uses (and needs) heaps of ram, but I don't think you'll need it in your situation.
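As an aside, if the ARC's appetite bothers you, it can be capped with a module parameter; a minimal sketch, assuming Linux/ZoL and an 8GiB cap (value in bytes):
options zfs zfs_arc_max=8589934592
Drop that in /etc/modprobe.d/zfs.conf; on a running system you can also echo a new value into /sys/module/zfs/parameters/zfs_arc_max.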
Not sure on the number of drives question. I have 4 in raidz1 and they run fine.
LZ4 is pretty quick, but your video files won't compress anyway, so it's probably pointless turning it on. My understanding is that the compression algorithm tries to compress each block, but if it hits data that won't compress (like video) it gives up and writes it uncompressed to save time/CPU cycles.
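If you do want to try it anyway, it's one property per dataset, and you can check afterwards what it's buying you (dataset names hypothetical):
zfs set compression=lz4 tank/media
zfs get compressratio tank/media
A compressratio of 1.00x on the video dataset would confirm it's not worth it there.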
Good read at the bottom of this page RE ZFS myths.
> https://github.com/axboe/fio/blob/master/examples/ssd-test.fio
Oh, wow, I didn't know about the runtime argument. That's pretty neat!
I'm not sure how reliable just running that test at various bs values is for uncovering the underlying structure of your SSD, though. I think you probably still need to destroy and recreate the pool with the various ashift values to be sure.
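If someone wants to script that, here's a rough sketch - destructive, since it wipes the test pool on every pass, and the device/pool/job names are hypothetical:
for a in 9 12 13; do
  zpool destroy -f testpool 2>/dev/null
  zpool create -o ashift=$a testpool /dev/sdX
  fio --name=ashift$a --directory=/testpool --rw=randwrite --bs=4k --size=1G --runtime=60 --time_based
done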
I wrote some python code that turned "zfs get" into a treemap (https://developers.google.com/chart/interactive/docs/gallery/treemap)
https://i.imgur.com/ISsPxd7.png
It may take me a bit to clean it up.
I ended up with this one: https://www.amazon.com/LSI-Controller-LSI00301-9207-8i-Internal/dp/B008J49G9A/ref=sr_1_44?dchild=1&keywords=mini+sas+card&qid=1589837993&sr=8-44
I got 2 of these cables to go with it: https://www.amazon.com/Cable-Matters-Internal-SFF-8087-Breakout/dp/B018YHS8BS/ref=sr_1_2?dchild=1&keywords=9207%2Bto%2Bsata&qid=1589838412&sr=8-2&th=1
If you're willing to do the bisect yourself, it can be whatever triggers the bug. I, myself, would be afraid to do this on production, because there are a lot of commits (and presumably fixed bugs) between 2.0.0 and the merge-base with 0.8.6. You'd also have to bump back to (at least) kernel 5.4 if you did that, since the 5.10 compatibility patches don't apply cleanly (the OpenZFS refactor to include BSD makes the patches really hairy).
If that doesn't sound like fun (ugh), then it would have to be enough for someone to start from the ground-up (i.e., blank pool) which you'd also be willing to send to someone else (i.e., post online). I personally suspect it's not so much the existing content, but the workload at the snapshot time (but that's a total shot in the dark). One time, I got this in the dataset that just had /var/log.
Also, just checking: did you get the ereport.fs.zfs.authentication in zpool events? If you're getting that on the encrypted dataset, we should move this discussion onto that bug report so it's easier for people researching this bug to get up to speed. In that case, you might try doing the following (the next time the bug happens): do not delete the snapshot, reboot the computer, and scrub twice. In my case, the error went away (and this signaled to me that it was probably some kind of in-memory corruption that didn't hit the disk).
>not using swap
This comment right here indicates that you're not quite up to speed on what either zil/slog or l2arc actually do.
This article showcases a very good ZFS setup for a high-performance DB.
"This is being written to try to explain why Linux does not have a binary kernel interface, nor does it have a stable kernel interface."
https://www.kernel.org/doc/html/v4.15/process/stable-api-nonsense.html
It doesn't have a stable kernel interface according to this "Greg Kroah-Hartman" guy. He's obviously a Diva programmer with a bigger ego than skill, but he seems to be able to publish on kernel.org, so he might have something to say over there.
And even if his arguments for not having a stable interface are nonsense, the mere statement that Linux doesn't intend to provide stability proves that it doesn't have stability.
I've had mixed results using bonnie/bonnie++. Give IOZone a try?
One question: how much RAM do you have on this host? You want to make sure you are testing the disk IO, not how fast the RAM cache works. An old rule of thumb is to force your IO test to use twice the amount of RAM you have installed. Bonnie is supposed to figure this out "automatically," but I haven't found anything in the documentation that explains what this actually means. Your -r settings imply that you have about 96G of RAM on the host, using the 2x guideline.
If you are running the test on a large-memory system, the tests can take a very long time. One trick, if you run Linux, is to reboot and force the kernel to not use all of the RAM in the system. Add this to your kernel boot parameters to force the system to use 1GB of RAM total:
mem=1024M
(see https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html for details)
This may help speed up the tests...
As to your actual tests, I don't have anything comparable I test on, so I can't help you there. :-/
There is more to usable capacity than you are factoring in that can make a difference of a couple TB in different configurations.
The biggest gripe with using SSDs with ZFS was a lack of support for the TRIM call. This was added in FreeBSD 10; not sure if it has propagated outside of it yet.
Might not be 100% what you are after, but Webmin with the ZFS plug-in might get close.
Installing that plugin can be finicky though.
Y'all might consider this:
http://ecryptfs.org/about.html
It encrypts each file on disk, and you do a sort of "loopback" mount to present the unencrypted version to the system.
The nice thing about it is you can back up the underlying filesystem (with ZFS, zfs send and everything) and the data will remain encrypted in backups too, which won't work with the LUKS method mentioned below.
ecryptfs is not associated with ZFS; it works with any filesystem. That's what we use where I work for an encrypted file share.
Yes, that's it.
Buffered data to be written will be assigned to a txg during the open stage; the txg contents get finalized during the quiescing stage, and then writing happens during the syncing stage.
Not online, but there is this [Advanced RAID Calculator](https://play.google.com/store/apps/details?id=com.sshlroot.advancedraidcalculator) in the Play store. It says it can estimate storage usage based on RAIDZ levels as well.
> Another way to say it is that a snapshots “USED” size only accounts for unique blocks contained in that snapshot. That’s why when you delete a snapshot the only blocks freed are the ones that were uniquely held by that snapshot.
Aah, that makes sense. I'm currently reading through the Snapshots chapter in this book: FreeBSD Mastery: ZFS. The examples of snapshot space consumption only went 2 or 3 snapshots deep, so there weren't many cases where (like in my case) the blocks are probably shared between several dozen snapshots.
Thanks!
"Build a real system."
A "real" system with all the drives I have (see comment elsewhere here) would cost me *way* more than I could afford for a server rack and parts just to hold the drives. Just trying to figure out what will work with what I have right now.
As noted in the other comment, the enclosures are not run through a USB hub, but are connected directly to the computer via 2x PCI expansion cards.
I have 8x SATA drives in a mini-ITX chassis. The HBA was either overheating, or the 4-port splitters weren't liking the bend angle needed to fit the drive mounts. Either way, I had one of these on hand and said why not give it a shot. I removed a 1TB NVMe I had in the second slot and installed this: m.2 to 6x SATA
I used 4 ports available on the motherboard, and 4 of these ports (the other two are reserved for future use).
Besides booting slower and generating heat, I don't know what my old LSI HBA was actually doing for me.
I can't seem to find an affordable card that is PCIe x1; my board only has 1x PCIe x16 and 2x PCIe x1.
Would this card be decent? Would you mind checking the listing and seeing if any of the 6-slot or 8-slot ones are good for this?
Thank you so much :)
> people say USB HDDs won't work well with ZFS

I use ZFS regularly with a bunch of these HDDs (up to 4 at a time) and the result is both fast and reliable: https://www.amazon.com/Seagate-Backup-External-Drive-Portable/dp/B07MY44VNM Beware they are SMR and so not as speedy when writing, but I manage to eke ~40MB/s out of each of them during sustained sequential writes (sequential read speeds are more like 100MB/s).
> But the RAM is only 4GB

Worse, it has no ECC. And believe me, if you value your data, ECC is a must.
Except for PCIe, this one ticks all your boxes and does support ECC: https://www.asrockind.com/en-gb/4X4%20BOX-V1000M
> Can I install ZFS using just the onboard sata ports
Absolutely.
> Likely install 10 gig ethernet nic
Then you'll probably want a proper HBA, because motherboard SATA tends to bottleneck around 600MiB/sec no matter how fast the underlying drives are.
I use and recommend the LSI 9300-8i (example: https://www.amazon.com/LSI-Broadcom-9300-8i-PCI-Express-Profile/dp/B00DSURZYS ), which alleviates the motherboard-SATA bottleneck nicely; I don't know their top throughput but I've seen them pull >2GiB/sec for long periods.
ALSO the Amazon listing literally only mentions SMR twice, and doesn't explain the difference at all, and doesn't give a warning about the crappy firmware they ship with these either
https://www.amazon.com/gp/product/B07PGWXQCM/ref=ppx_yo_dt_b_asin_title_o05_s00?ie=UTF8&th=1
I’m using this ECC memory with my Ryzen 3600X: Kingston Server Premier 32 GB 3200MHz DDR4 ECC CL22 DIMM 2Rx8 Server Memory https://www.amazon.com/dp/B09N9V1ZS4/ref=cm_sw_r_cp_api_i_WN1WMDW91GX2VGDYEZG7
They also have 16GB DIMMs, which work pretty well. Make sure you get a UPS as well.
Hi, we've been meaning to give more frequent updates. Once our migration is completed, we will step up our update + software development game.
We use a few of these to move drives:
The case is wrapped in a few moving blankets during transport.
Yeah, definitely a gamble. I'm currently running only one NVMe on this adapter because my mobo from 2011 (X79 chipset) doesn't support bifurcation:
Xiwai 4X NVME M.2 AHCI to PCI-E... https://www.amazon.ca/dp/B094FH5C5F?ref=ppx_pop_mob_ap_share
So if that half works, who knows.
This did it! After re-importing by-id, I replaced the SATA card with this one, which has a different chipset though not LSI: https://www.amazon.com/gp/product/B099ZCXJLQ/
I booted up with no errors. The pool resilvered once at boot and no longer shows any errors from zpool status - I did a clear/scrub as well, rebooted again, and still no errors so far.
Thanks for the help!
The quick and nasty way, before you automate it all, is to use something like tmux (https://github.com/tmux/tmux), which is available on most distros. It will allow you to start a terminal session and run it in the background, so it won't matter if you get disconnected - you can always reconnect and re-attach to the session.
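The whole workflow is about three commands; a minimal sketch (the session name is arbitrary):
tmux new -s transfer        # start a named session and kick off your long job inside it
tmux detach                 # or just Ctrl-b d; the job keeps running
tmux attach -t transfer     # re-attach later, even from a brand new SSH connection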
Yes, if you set it up that way: https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-integrity.html
But OP seems to care more about performance than data safety, there is a tradeoff.
Good question - I don't have a Proxmox system to try. But if my memory of Debian packaging is correct...
That zfs-dkms package requires linux-headers etc., which are virtual packages. Normally these are satisfied by the headers for your kernel; for example, something like linux-headers-4.19.0-3 would be marked as providing linux-headers in order to satisfy the dependency. If pve-headers also marks itself as providing linux-headers, and there are no linux-headers packages available on Proxmox, then I think yes, it should work.
edit - https://www.debian.org/doc/debian-policy/ch-relationships.html#virtual-packages-provides
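If you'd rather check than trust my memory of Debian packaging, apt can show the Provides fields directly; a hedged sketch (what's listed depends on your Proxmox version):
apt-cache show pve-headers | grep -i '^Provides'
apt-cache showpkg linux-headers
The first shows what pve-headers claims to provide; the second lists which real packages can satisfy the linux-headers virtual package.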
The biggest gripe (as far as I can tell) is that they're only using RAIDZ2 with a 15-wide vdev, with huge drives. With smaller drives this is less of an issue because a resilver would go faster, but as it stands now a resilver would take days (or potentially weeks), increasing the likelihood that more failures could happen at the same time.
As an example: you've got a nice chunky 15x16TB in a Z2 setup. That's roughly 200TB of data. A drive dies - no big deal, because you can lose another and still not have any data loss, right? Except you've got 14 other drives of almost certainly similar age and vintage that are going to get thrashed while the resilver happens.
If our theoretical setup takes 100 hours to resilver and there is "only" a 5% chance per hour that a drive will fail, then there's a 99.41%* chance that a second drive *will* fail. Guess what: you now have zero redundancy in an already compromised system.
Long story short, if you're only going Z1/Z2 then you want fewer drives per vdev, with more resultant vdevs. LTT just built a 4x15 setup but probably should have gone with a 6x10 Z2 (768TB total) or 10x6 Z1 setup (800TB). Yes, they lose 30-100TB of space in a 60-drive array, but that's a drop in the bucket given they've reduced the chances of data loss.
* As best as I can calculate anyway, I cheated a bit and used Omni's probability calculator
** Haven't watched their video yet, I guessed at them using 16TB drives but knowing Linus he's gotten his hands on 20TB+ which just makes their configuration even more risky.
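For the curious, the arithmetic behind that 99.41% figure (assuming an independent, constant 5%-per-hour failure chance across the full 100-hour resilver): P = 1 - (1 - 0.05)^100 = 1 - 0.95^100 ≈ 1 - 0.0059 ≈ 99.41%. The 5%/hour input is deliberately pessimistic, but the shape is the point - the longer the resilver, the more the risk compounds.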
This is timely… the author pans ZFS as buggy and recommends LUKS as an alternative. Meanwhile, a critical vulnerability has been discovered in the latest LUKS that ALLOWS AN ATTACKER TO DECRYPT YOUR DATA.
https://gitlab.com/cryptsetup/cryptsetup/-/commit/0113ac2d889c5322659ad0596d4cfc6da53e356c
/luks you had one job…
Hm, did you install everything? I am using Debian so it is a bit different.
According to Ubuntu's documentation, for 16.10 you need:
sudo apt install zfs
You could also try:
I suspect it's going to be a while.
Hilariously, you can have a more recent ZFS under Trusty than you can under Xenial, since the PPA stopped supporting versions later than Canonical's import of ZFS into their repos. https://launchpad.net/~zfs-native/+archive/ubuntu/daily
Not OP but I've been using an AsRock J3710 in the InWin IW-MS04 ITX Case with 4 drives with no issue. Debian is rock solid and intel is well supported so basically anything that's semi-recent and has enough ports should work just fine (so long as you remember that realtek is trash).
An alternative to a serial console is to use Linux’s kdump feature:
This will let you load the memory dump into a debugger and poke around at exactly what's happening in all parts of the kernel.
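A rough outline of getting kdump going, as a hedged sketch (package names are Debian/Ubuntu-flavored and details vary by distro): reserve memory for the capture kernel by adding crashkernel=256M to the kernel boot parameters, then
apt install kdump-tools crash
After the next crash the vmcore should land under /var/crash/, where the crash utility can open it against a debug-symbol vmlinux to inspect kernel state.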
Ah yes; you're referring to the way ZFS used to run on Linux under FUSE instead of as a kernel-level implementation. That hasn't been true in a long time: currently OpenZFS (the implementation we generically refer to as ZFS) is written as a native Linux filesystem. FUSE is still a decent way to run a filesystem in userspace, but it's definitely not ideal for production use, as it's an extra layer that must be managed.
You'll also hear people refer to Oracle ZFS but that's really not used much outside of large commercial implementations that require support.
I didn't think about data scrubs, I was only thinking about your typical reading/writing files. You make a good point: running scrubs on the GPU could help tremendously.
This StackOverflow post says it's not possible to run OpenCL within the kernel and that it would be difficult to emulate the OpenCL toolchain within the kernel.
But I would expect that ZFS is not the only project that wants to run code on either the CPU or GPU, so the kernel devs may come up with something like OpenCL on their own.
Depending on how you want to recover data, it might be simpler to use a specialised data backup and storage company like Backblaze. They have enormous sets of disks in multiple RAID clusters storing your files.
I use Duplicati for key directories. Each night it does an incremental backup to Backblaze.com. I pay less than $3/month - although that is not for all my data, just directories of important files. I do not back up my MythTV programme recordings.
Try following these instructions to reset the quick access toolbar in Windows:
https://winaero.com/blog/reset-quick-access-toolbar-windows-10/amp/
It sounds insane but this actually improved access times for my samba shares with large numbers of files per directory.
I run striped 3-way mirrors and I was perplexed as to why I was getting terrible performance over SMB from Windows. I don’t know why this works - I only know that it worked for me and had no impact otherwise on my Windows system.
This comes up often...
Yes, you can use ZFS on a hardware array. I do this often for a variety of reasons on enterprise-grade hardware. Most often, it's a case of needing ZFS-style volume management of a data volume on a Linux system that may have more traditional filesystems for system partitions.
Please see my post at: http://serverfault.com/a/545261/13325
For the hardware being described, I'd use a hardware RAID grouping with a single ZFS LUN versus individual hardware RAID0 LUNs presented to ZFS as "raw disks".
The main reason is that a hot-swap event under that setup will fail and require a new PERC virtual disk to be created in order to recover. There are device naming and enumeration implications as well.
Thanks everyone. I have an older Dell PowerEdge server with extra drive bays so I'm not looking to spend money on another server if I don't need to.
With some fresh eyes this morning I've run across mhddfs which looks like it should be exactly what I need: http://serverfault.com/questions/191299/can-we-mount-multiple-disks-as-one-directory
Thanks!
I am sorry if I came off as rude as well. I totally hear what you are saying, which I both partially agree with and disagree with to a certain extent. That said, I read threads like this and lose all interest in running on ZoL.
Well, you can pay *me* for support :)
But no, there's no official RHEL support for ZoL. I use CentOS and RHEL with ZFS in production environments with HP hardware, but it depends on why you feel you need vendor support here.
But when the shit hits the fan, nobody is going to come in and fix your organization's applications or environment-specific issues. That falls on you and your organization, no?
If you want to run ZFS on RHEL, do it: just make sure you test and self-support properly.
The amount of space depends on your vdev configuration. Making a 24-disk wide vdev is not a wise decision for several reasons. And really wide vdevs like that are going to waste a lot of space.
I have a 12-disk zpool that should theoretically be 28TiB, and with ashift=12 it ends up being 27.4TiB, so I am only losing 600GiB. 1/64th is lost to metadata no matter what ashift you choose. So I am losing very, very little space from ashift=12, because my 6-disk vdevs are the optimal width for RAIDZ2.
See more info here:
https://web.archive.org/web/20140403012030/http://www.opendevs.org/ritk/zfs-4k-aligned-space-overhead.html
>Yet, he wants to start using the NAS when the first drive arrives. I have no idea if that is possible. Is it?
You can build a NAS on a single drive; however, you won't be able to grow/extend the ZFS pool layout that easily. Extending ZFS is possible by replacing drives with bigger-capacity ones (one by one, resilvering the pool each time) or by creating a new RAIDZ2 group and striping across all groups (akin to RAID 60 with two groups). https://www.truenas.com/community/threads/extending-a-raidz2-volume.10956/
Consider using mdadm. It supports various array layouts and allows you to create a RAID 0, convert it to RAID 1, and later grow it to RAID 5 without any data loss. https://dev.to/csgeek/converting-raid-1-to-raid-5-on-linux-file-systems-k73.
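The conversion in that article boils down to a handful of commands; a hedged sketch (device names hypothetical - and back up first, reshaping arrays in place is nervy stuff):
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1   # start as a mirror
mdadm --grow /dev/md0 --level=5                                          # reshape into a 2-disk RAID 5
mdadm --add /dev/md0 /dev/sdd1                                           # add a third disk
mdadm --grow /dev/md0 --raid-devices=3                                   # grow onto it
The filesystem on /dev/md0 still has to be resized afterwards to use the new space.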
Moreover, build a backup environment right away once you finish the NAS configuration.
I'm using Seagate Exos x16 16TB drives. Thought I had read they were CMR drives, but now I'm not sure.
This is the second drive in this slot. The first one was throwing zpool errors and showing faulted even though SMART showed no problems. Cable issue maybe?
ALEZ is the Arch Linux bootable "installer" w/ zfs support baked in. The Arch Linux install environment is mildly similar to system rescue cd in that it is just a CLI w/ a bunch of tools.
I'd only do this if you needed to isolate your pool from everything. Looks like OP figured out what was keeping it open though, so no need.
Could also boot into single-user mode, however that is done on your system.
NixOS is absolutely amazing and I never want an imperatively configured system anymore. Since you've heard of it, and especially since you have a friend who can help you get over the initial learning curve, I'd definitely recommend you try it out.
I have no idea what your friend means by it catching fire. They might have been describing the Arch-syndrome because NixOS can be hacked with infinite detail.
Though even if you did manage to break it, you can simply select a previous generation at boot and you're back to a working system (unless you manage to break your bootloader, that's just as severe as on most other distros).
Many of the core NixOS maintainers use ZFS on their systems (one even rolls back his root dataset to an empty snapshot on boot), so ZFS support is very good. When the kernel broke SIMD support for ZFS encryption and hashing last year, NixOS was the only distro that patched support back in, for example (afaik).
For booting ZFS with native encryption I'd recommend using a separate FAT32 boot partition for /boot, because then you don't have to worry about ZFS support in GRUB and could even use an entirely different bootloader (all bootloaders support FAT32).
There's a gotcha about non-root pools for boot.zfs.requestEncryptionCredentials, so if you want those imported you either have to put the keyfiles onto your root pool somewhere or use LUKS devices as key files.
Are you sure this isn’t due to bad sectors or something like that? Or perhaps a host protected area on one of the drives? For example, this support question about duplicating disks ended with WD saying to RMA the drive.
https://www.hdsentinel.com/forum/viewtopic.php?f=32&t=11575
I had a similar issue (like the one you narrowed down with u/ipaqmaster), based on the change in drive presence due to booting from USB.
In your case because you used /dev/sdX manually, and in my case because my hardware treated the USB as a locally plugged HDD, the subiquity installer with the zfs-experimental option had the same problem, precisely:
... writing the wrong device/partition into the EFI boot variables.
Although you can (in your case) circumvent the problem by using either /dev/disk/by-id/xyz, if your target is easily identifiable, or /dev/disk/by-uuid/xyz, if you have many drives of the exact same model, you can also try:
Making the live system boot while not being listed at a higher priority than the target disk.
For example by using the Rufus boot creator with the option to set the drive at another device-list position (0x87), which is helpful for BIOS boot, or by using the Ventoy boot creator, which uses GRUB2 to map the live image as removable and at a lower priority than all internal disks/drives. If you use Ventoy - or do the identical mapping manually with GRUB2 - you can even use /dev/sdX addressing, which is helpful if you use scripts that are hardcoded to /dev/sdX addressing.
Take this as a note for future experiments, as this has no relevance if you wish to find a solution allowing you to correct the mistake in your given install.
Anyhow, good luck!
I remember a rule of thumb of using a power-of-two number of data drives. So 2x (4+2) or 1x (8+3) look good.
I'd use the first one. Two VDEVs -- double the write IOPS
> In a RAIDZ-2 configuration, a single IO coming into the VDEV needs to be broken up and written across all the data disks. It then has to have the parity calculated and written to disk before the IO could complete. If all the disks have the same latency, all the operations to the disks will complete at the same time, thereby completing the IO to the VDEV at the speed of one disk.
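For concreteness, the 2x (4+2) layout is a single create command (pool and device names hypothetical):
zpool create tank raidz2 sda sdb sdc sdd sde sdf raidz2 sdg sdh sdi sdj sdk sdl
Each raidz2 keyword starts a new vdev, and writes stripe across both - hence the doubled write IOPS.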
Could you please explain what you mean by over/underused? Like showing each vdev's alloc/free space?
There is a table with a list of volumes https://grafana.com/api/dashboards/15008/images/11058/image
If I add a column with capacity, would it be close to what you want?
Deduplication would have a performance impact on your system, as every block that gets written has its hash checked against the master table to determine if it's a duplicate.
Have you tried jdupes to find and remove the duplicates? https://github.com/jbruchon/jdupes
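Basic usage is something like this (path hypothetical):
jdupes -r /tank/data        # list duplicate sets recursively
jdupes -r -d /tank/data     # same, but interactively delete
Run the read-only form first and eyeball the output before letting it delete anything.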
I keep different datasets for different things; aside from making it easy to see what's using all your space, I have compression and block size settings set differently. For example for my backups dataset (where client machines back up to) I set max compression because it's automated and I don't care about the slowness. Most others are LZ4 compression, except datasets storing video because video doesn't compress at all. Some applications (for example a torrent temporary directory) benefit from a smaller record size: http://open-zfs.org/wiki/Performance_tuning#Bit_Torrent
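Spelled out, that per-dataset scheme is just a few property sets (dataset names hypothetical, with gzip-9 standing in for "max compression"):
zfs set compression=gzip-9 tank/backups      # automated, slow is fine
zfs set compression=lz4 tank/general
zfs set compression=off tank/video           # video won't compress anyway
zfs set recordsize=16K tank/torrent-scratch  # smaller records for torrent writes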
I rarely move things manually between datasets. There is some scripting that moves downloads into their proper place (check out Filebot) that runs on a VM that has the ZFS shares NFS-mounted. "mv" across datasets via NFS works fine.
Take a look at https://www.jottacloud.com/ - for a private account you get unlimited data for €7.5/month. And you only need 1 month, right?
I don't know your upload speed, and that will determine how long you're going to need the account. FWIW I was able to flatline my fiber's uplink (560Mbps) during my upload :-O
Cool. Thanks.
The main issue I had/have with Borg is that you need either a plan somewhere like BorgBase (or your own server with ssh/borg), _or_ you need to locally create a repository and then sync it up to $cloud with e.g. rclone.
I guess restic might be another option?
> There might just not actually be enough power on that rail for both drives.

It only has one SATA power connector so I used a good quality power splitter cable to get power to both
Check into this, 5-drive external rack:
https://www.amazon.com/gp/product/B001LF40KE/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1
Hook up a standard PC power supply to it and run open-case, or drill a hole for the SATA cables. I use it with SAS breakout cables.
We have had exceptionally good luck with these:
The drives are presented to the OS exactly as they would be connected directly, in my experience. I once created a mirror zpool with two drives in one of these, then moved them into the machine and everything kept working fine.
And of course, being dual-slot you only need half as many as you have drives.
I wasn't using the molex, but extenders on the SATA power. These things, to be exact: https://www.amazon.com/Cable-Matters-Pack-Power-Splitter/dp/B012BPLW08/ref=sr_1_8?dchild=1&keywords=sata+extender&qid=1634570938&sr=8-8.
Anyway, after moving those around, the issue moved to other drives, so I think I've isolated it as a power issue. After replacing the affected extension I was still having issues, so I think a SATA plug on the PSU went bad. Just ordered a new PSU with enough plugs to handle the drives. Hopefully that will resolve this.
For cooling, one possibility if you have the space is to set up a 12V brushless blower fan like one of these. I'm not recommending any particular model, just showing what they can do - they come in various sizes. The connector they use will likely need some kind of motherboard fan-header adapter.
It's a cheapo SPCC M.2 drive that I got from amazon late last year. https://smile.amazon.com/gp/product/B07L6FJS7V.
Wouldn't I know that it's not the hardware by running the fio tests after removing the SLOG device and seeing worse results?
If you ever want to buy some straight up, Monoprice is most people's go-to recommendation. These are also good, and half the price: https://amazon.com/Benfei-Straight-Locking-Compatible-Driver/dp/B07JFQ2H9R/ or if you want a more widely-known brand (Benfei is STARTING to get well-known for consistently good cheap stuff, but y'know) there's CableMatters: https://amazon.com/Cable-Matters-3-Pack-Degree-Right/dp/B00KCS91GY/r
No, I think the resilver numbers might be misreporting because these are just 5700 RPM enterprise drives.
https://www.amazon.com/HGST-MegaScale-HMS5C4040BLE640-Coolspin-Enterprise/dp/B073MKXH9R
I've 100% given up on USB storage for ZFS. I honestly wish the Thunder3 Quad X had taken off because the test unit I purchased has been rock solid reliable for years.
L2ARC, definitely not. SLOG, maybe—depends on whether you'll have a lot of sync writes. SLOG has no effect on async writes, only on sync.
You may not want to use your old 256GiB SSD for that, either. At least, not without testing. SLOG is a funny beast and demands extremely low latency, not extremely high throughput. It can also burn through write endurance on SSDs faster than you'd expect, if you have a workload with tons of sync writes in the first place.
If you will have a ton of sync writes and a SLOG will be a good investment, you're better off with a $75 32GiB Optane than a random old SATA SSD. It ticks both the "write endurance" and "incredibly low latency" boxes vastly better than even good NAND SSDs can, let alone rando small ones.
If you don't have or want to spend the $75 for the small Optane... I can't give you a better answer than "maybe" on the SLOG using your old 256GB SSDs. You'd need to do some fio tests to see how well they actually do or don't improve your sync writes, as well as figuring out whether you really do or don't have sync writes in your workload in the first place.
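A starting point for those tests, as a hedged sketch (paths and sizes hypothetical; the fsync-per-write is what forces the sync path a SLOG accelerates):
fio --name=slogtest --directory=/tank/fiotest --rw=randwrite --bs=4k --size=1G --fsync=1 --runtime=60 --time_based
Run it with and without the SSD attached as a log device and compare.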
I have a 2018 mac mini with this enclosure
Running a raidz1 pool that is mounted 24/7 on Big Sur. I've had this enclosure for the last two years. Went through some other less than stellar enclosures.
No crashes, no issues. Would recommend.
Running memory tests as we speak. I'm starting to think it may be the SMR drives too. I bought 3 of these today and plan to shuck them and set them up in a mirror configuration with one as a backup. People are saying that they are CMR drives.
For bifurcation, your BIOS will either support it or it won't. You'll need to check in the settings for the specific PCIe slot to see if it's a feature that's offered. Otherwise there's this card that lets you run 2x NVMe without bifurcation, because it includes a storage controller on the card.
The connectx-3 are great cards. They're capable of being used in fewer lanes than x8, you'll just have less bandwidth. For example, a PCIe 3.0 x16 slot supports a theoretical max of 128Gb/s, so if you put one of those in a x1 slot you'd potentially get 8Gb/s (without taking into account protocol overhead). It's not ideal but it's better than nothing in some cases.
I had an old Intel motherboard that needed a faster link and it had a x16 slot running at x8 physical, so I put an extra connectx-4 card in (x16 connector on this model) and the card worked fine but ran at a bit less than half the rated speed.
yep, in that case the only choice is fiber (either using IB or ethernet), or running multiple cat6 cables in aggregate.
I know what you mean about investing in all the gear. I have a 25U rack that's nearly at capacity, a cluster of R420 systems that act as compute nodes for a H/A virtualization system, which are linked to a R730xd via 100GbE that provides centralized storage which is full of SSDs and Optane drives for ZFS. Plus a pair of OPNsense routers in failover mode, and a couple of general purpose nodes that run the infrastructure automation tools and repositories. I also have my workstation racked there, and run the monitor connections and usb over to my desk in another location.
Having the workstation in the rack lets me keep everything centralized with short cable runs, which are DAC. If I were 20m away like your setup, I'd look into an HDMI+USB repeater setup that transits the connections over cat6; those can reach pretty far... which would imply having your desktop computer with the server instead of at your desk.
Here's one of the repeaters I was thinking about buying for an example:
No, the SAS back panel will also have the single SFF-8087 port - it will look the same as on the Dell H200.
You just need a regular cable like this:
https://www.amazon.com/Cable-Matters-Internal-Mini-Mini-SAS/dp/B011W2F626/
Is it one of the MicroServers with the slimline laptop DVD drive, or the full-size 5.25" one?
If it's the latter - and you've got an appropriate HBA installed - you can get up to eight 2.5" 7mm SSDs in its place: https://www.amazon.com/dp/B00TL4US8K
If not, you can swap the DVD drive at least for one extra, using those adapters people used to use for two drives in laptops: https://www.amazon.com/dp/B01MRI8YFN
The Tyan S7012 is a good build. I make a few ZFS file servers out of them every year, but a few notes:
- They originally came with series 5500 Intel support only. To get 5600 support, you will have to upgrade the firmware. So I recommend you buy a pair of L5520 CPUs when you get the board; they are super cheap quad-core CPUs, and a pair sells for around $5 now.
- The southbridge gets hot. Some boards come with high-profile heat sinks, but most don't. If your build is not in a high-airflow case, consider placing a 40mm fan on the heat sink. This will help with system stability.
- It comes with only 5 PCIe x8 open-ended slots (some are only x4). That's nice, since you can place larger cards into the slots (all PCIe slots should be open-ended), but be careful in slot one; the rear components may stop you from placing a large card there.
>passmark of 26,104
No wonder you need water cooling; a 150W CPU needs that. Overall, nice. Far more than I would spend, but I'm a bit on the cheap side.
I purchased https://www.amazon.com/Asus-Hyper-M-2-x16-Card/dp/B0753JTJTG for the heck of it. If it does not suit my needs, I figured I can return or resell it and get most of my money back. My main issue is going to be installing it. Both my PCIe 3.0 x16 slots are full of video cards, and my only other option is the x4 slot that my NVMe card is currently sitting in. I'm guessing I may end up upgrading my board soon, or I will most likely pull out my sound card and install a PCI one instead. I have a few LGA 2011 boards, but I just spent a week building my current setup and I don't want to rip it apart. -_-
Can you link me to a good example? Preferably one suited for a homelab, ie not ridicu-enterprise-priced to the max? This is something I'd like to play with.
edit: is something like this a good example? How is the initial configuration done - BIOS-style interface accessed at POST, or is a proprietary application needed in the OS itself to configure it, or...?
Thank you.
I don't have all of the heartbeat mechanisms that RSF-1 uses. Red Hat describes the general cluster requirements and best practices here.
In this setup, I am using two network rings to provide the heartbeat. One is comprised of the management IPs on each server. The other is a private /30 between the two machines; usually directly connected if I have a spare NIC. Otherwise, both live on a 802.3ad bonded interface on each server. Zetavault used to recommend a USB Bridge cable to serve as another heartbeat mechanism to save Ethernet ports.
As for the VIP, NFS daemon, NFS exportfs, NFS notify, zpool dance: I eliminate all of those by using the ZFS sharenfs property instead of the native Linux exportfs mechanism.
This way, the only cluster resources are VIP and zpool. The filesystem and exportfs properties move with the pool.
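In practice that's a single dataset property; a hedged sketch (dataset and export options hypothetical):
zfs set sharenfs='rw=@10.0.0.0/24' tank/exports
Because the property travels with the pool, a zpool import on the standby node brings the exports up automatically - no exportfs choreography required.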
You mean like this?
Though, if either the HDD or SSD portion fails, you'd have to replace the whole thing.
Only works in Windows though, but the potential is there.
> I can be the stereotypical shut-in nerd only leaving my house to restock the beer and pizza but as soon as you tie a paycheck to my desk I can't do it.
Don't worry. That's actually quite normal and well described in the motivation literature. Daniel H. Pink's Drive is a good read about this.
Great write-up, mercenary_sysadmin. Thanks for all the information. Now I get to reading. Yeah, I wish I hadn't bought that Fractal Design R4; it has only 8 bays and dedicated space for 2 SSDs.
RE the Seagate Archive drives: I just feel a need to wade into my data, sort out redundancies, and separate entertainment from personal files. Not that I will gain that much space. The best way for me to do that is to have a drive large enough to put a lot of data in one place. Aside from being slow and unsuitable for NAS use, have you heard any other bad things?
RE Motherboards: You mentioned 8core, Avoton. Do you like this one? http://www.amazon.com/ASRock-Motherboard-C2750D4I-COLOR-BOX/dp/B00HIDQG6E/ref=sr_1_1?s=pc&ie=UTF8&qid=1448844310&sr=1-1&keywords=C2750D4I
I hear you about the ECC RAM, but when I look at motherboards they often don't state whether they support ECC (maybe they do but I miss it). Can you tell me an easy way to know if a mobo supports ECC RAM?
Thanks again. Lot to study and I'll be back.
If I start with 3 or 4 of these WD Red Pro 6TB drives, am I limiting my choices if I (later) add more drives incrementally? (This is where my lack of knowledge of FreeNAS and ZFS gets in the way. Difficult to plan when one does not know how the system actually operates.)