|
Hi,
I want to investigate whether it is possible to build your own usable HPC storage using ZFS and a network filesystem such as NFS.

Just a thought experiment:

A machine with two 6-core Xeons (3.46 GHz, 12 MB cache) and 192 GB of RAM (or more). In addition, the machine will use 3-6 SSDs for the ZIL and 3-6 SSDs for cache, preferably mirrored where applicable.

Connected to this machine we will have about 410 3 TB drives, giving approximately 1 PB of usable storage in an 8+2 raidz configuration.

Connected to this will be an HPC cluster of ~800 nodes that will access the storage in parallel. Is this even possible, or do we need to distribute the metadata load over many servers? If that is the case, is there any software for FreeBSD that could accomplish this distribution (pNFS doesn't seem to be anywhere close to usable in FreeBSD), or do I need to call NetApp or Panasas right away? It would be really nice if I could build my own storage solution.

Other possible solutions to this problem are extremely welcome.

Best Regards
Peter Ankerstål
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "[hidden email]"
|
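As a quick sanity check on the "410 drives gives about 1 PB" figure, here is a minimal sketch of the arithmetic. It assumes "8+2 raidz" means 10-disk raidz2 vdevs (8 data + 2 parity), uses decimal terabytes, and ignores ZFS metadata and reservation overhead; the function name is made up for illustration.

```python
# Rough capacity check for the proposed pool (a sketch, not a sizing tool).
DRIVES = 410        # drive count from the post
DRIVE_TB = 3        # drive size from the post, decimal TB
VDEV_WIDTH = 10     # assumption: "8+2" = 10-disk raidz2 vdevs
DATA_PER_VDEV = 8   # data disks per vdev


def usable_tb(drives=DRIVES, drive_tb=DRIVE_TB,
              width=VDEV_WIDTH, data=DATA_PER_VDEV):
    """Return (vdev_count, leftover_drives, usable_tb) before ZFS overhead."""
    vdevs, leftover = divmod(drives, width)
    return vdevs, leftover, vdevs * data * drive_tb


vdevs, leftover, tb = usable_tb()
print(f"{vdevs} vdevs, {leftover} drives left over, ~{tb} TB before overhead")
```

Under these assumptions the 410 drives split evenly into vdevs with none left over for hot spares, so a real build would probably want a few extra drives.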
|
On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
> I want to investigate if it is possible to create your own usable
> HPC storage using zfs and some
> network filesystem like nfs.
>
> Just a thought experiment..
> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more)
> I addition the machine will use 3-6 SSD drives for ZIL and 3-6 SSD
> deives for cache.
> Preferrably in mirror where applicable.
>
> Connected to this machine we will have about 410 3TB drives to give approx
> 1PB of usable storage in a 8+2 raidz configuration.
>
> Connected to this will be a ~800 nodes big HPC cluster that will
> access the storage in parallell
> is this even possible or do we need to distribute the meta data load
> over many servers? If that is the case,
> does it exist any software for FreeBSD that could accomplish this
> distribution (pNFS dosent seem to be
> anywhere close to usable in FreeBSD) or do I need to call NetApp or
> Panasas right away? It would be
> really nice if I could build my own storage solution.
>
> Other possible solutions to this problem is extremley welcome.

For starters I'd love to know:

- What single motherboard supports up to 192GB of RAM
- How you plan on getting roughly 410 hard disks (or 422 assuming an additional 12 SSDs) hooked up to a single machine

If you are considering investing the time and especially money (the cost here is almost unfathomable, IMO) into this, I strongly recommend you consider an actual hardware filer (e.g. NetApp). Your performance and reliability will be much greater, plus you will get overall better support from NetApp in the case something goes wrong. In the case you run into problems with FreeBSD (and I can assure you in this kind of setup you will) with this kind of extensive setup, you will be at the mercy of developers' time/schedules with absolutely no guarantee that your problem will be solved. You definitely want a support contract. Thus, go NetApp.

--
| Jeremy Chadwick [hidden email] |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
|
|
On Mon, 6 Feb 2012 08:22:06 -0800, Jeremy Chadwick <[hidden email]> wrote:
> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
> > Other possible solutions to this problem is extremley welcome.
>
> For starters I'd love to know:

Recently, someone had a similar proposal on the CentOS list. Except he really wants to build it, it's 2 PB, and it's supposed to be housed in his garage. Someone explained to him that with the 28k BTU, his garage would get rather warm...

There was a presentation at the last LISA about "Your First Peta-Byte". It's an interesting talk, and entertaining too. It's on YouTube.

From building our own filers with Solaris and COTS shelves, I can say that the fun quickly wears off...
|
|
In reply to this post by Jeremy Chadwick
On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick <[hidden email]> wrote:
> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>> I want to investigate if it is possible to create your own usable
>> HPC storage using zfs and some
>> network filesystem like nfs.
>>
>> Just a thought experiment..
>> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more)
>> I addition the machine will use 3-6 SSD drives for ZIL and 3-6 SSD
>> deives for cache.
>> Preferrably in mirror where applicable.
>>
>> Connected to this machine we will have about 410 3TB drives to give approx
>> 1PB of usable storage in a 8+2 raidz configuration.
>>
>> Connected to this will be a ~800 nodes big HPC cluster that will
>> access the storage in parallell
>> is this even possible or do we need to distribute the meta data load
>> over many servers? If that is the case,
>> does it exist any software for FreeBSD that could accomplish this
>> distribution (pNFS dosent seem to be
>> anywhere close to usable in FreeBSD) or do I need to call NetApp or
>> Panasas right away? It would be
>> really nice if I could build my own storage solution.
>>
>> Other possible solutions to this problem is extremley welcome.
>
> For starters I'd love to know:
>
> - What single motherboard supports up to 192GB of RAM

SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules (16 RAM slots). It's an AMD board, but there should be variants that support Intel CPUs. It's not uncommon to support 256 GB of RAM these days, although 128 GB boards are much more common.

> - How you plan on getting roughly 410 hard disks (or 422 assuming
> an additional 12 SSDs) hooked up to a single machine

In a "head node" + "JBOD" setup? Where the head node has a mobo that supports multiple PCIe x8 and PCIe x16 slots, and is stuffed full of 16-24 port multi-lane SAS/SATA controllers with external ports that are cabled up to external JBOD boxes. The SSDs would be connected to the mobo SAS/SATA ports.

Each JBOD box contains nothing but power, SAS/SATA backplane, and hard drives. Possibly using SAS expanders.

We're considering doing the same for our SAN/NAS setup for centralising storage for our VM hosts, although not quite to the same scale as the OP. :)

> If you are considering investing the time and especially money (the cost
> here is almost unfathomable, IMO) into this, I strongly recommend you
> consider an actual hardware filer (e.g. NetApp). Your performance and
> reliability will be much greater, plus you will get overall better
> support from NetApp in the case something goes wrong. In the case you
> run into problems with FreeBSD (and I can assure you in this kind of
> setup you will) with this kind of extensive setup, you will be at the
> mercy of developers' time/schedules with absolutely no guarantee that
> your problem will be solved. You definitely want a support contract.
> Thus, go NetApp.

For an HPC setup like the OP wants, where performance and uptime are critical, I agree. You don't want to be skimping on the hardware and software.

However, if you have the money for a NetApp setup like this ($500,000+ US I'm guessing), then you also have the money to hire a FreeBSD developer(s) to work on the parts of the system that are critical to this (NFS, ZFS, CAM, drivers, scheduler, GEOM, etc). Then, you could go with a white-box, custom build and have the support in-house.

--
Freddie Cash
[hidden email]
|
|
In reply to this post by Jeremy Chadwick
Hi,
On Feb 6, 2012, at 17:22, Jeremy Chadwick wrote:
> - What single motherboard supports up to 192GB of RAM

Get an HP DL580/585 - they support 2TB/1TB RAM.

> - How you plan on getting roughly 410 hard disks (or 422 assuming
> an additional 12 SSDs) hooked up to a single machine

Use LSI SAS92XX 4 (x4) port external controllers, and SuperMicro SC847E26-RJBOD1 disk shelves.
Each disk shelf needs 2 ports on the LSI controller, which means you get 90 disks per LSI card.
The DL580/585s have 11 PCIe slots, so you'd end up with 990 disks per server using this setup.

> If you are considering investing the time and especially money (the cost
> here is almost unfathomable, IMO) into this, I strongly recommend you
> consider an actual hardware filer (e.g. NetApp). Your performance and
> reliability will be much greater, plus you will get overall better
> support from NetApp in the case something goes wrong. In the case you
> run into problems with FreeBSD (and I can assure you in this kind of
> setup you will) with this kind of extensive setup, you will be at the
> mercy of developers' time/schedules with absolutely no guarantee that
> your problem will be solved. You definitely want a support contract.
> Thus, go NetApp.

We have NetApps at our university for home storage, but I would struggle to recommend them for HPC storage.

A dedicated HPC filesystem such as Lustre or FhGFS (http://www.fhgfs.com/cms/) will almost certainly give you better performance, as they're purpose-made.

We use FhGFS in a rather small setup (44 TB usable space and ~200 HPC nodes), but they do have installations with 700TB+.
The setup consists of 2 metadata nodes and 4 storage nodes, all Supermicro servers with 24 WD VelociRaptor 600 GB 10K RPM disks.
This setup gives us 4.8GB/sec write and 4.3GB/sec read speeds, all for a lot less than a comparable NetApp solution (we paid around €30.000).
It now has support for mirroring on a per-folder level for resilience.

Currently it only runs on Linux, but I'm considering a FreeBSD port to get ZFS for volume management, and now that OFED is in FreeBSD 9, InfiniBand is possible.

I'd highly recommend a parallel filesystem; unfortunately not many, if any, are available on FreeBSD at this time.

Regards,
Michael
|
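The controller and shelf arithmetic in this message can be written out as a short sketch. The 45 disks per shelf is an assumption inferred from the "90 disks per LSI card" figure (two shelves per 4-port card); the function names are invented for illustration.

```python
import math

PORTS_PER_CARD = 4    # "4 (x4) port external controllers", from the post
PORTS_PER_SHELF = 2   # "each disk shelf needs 2 ports", from the post
DISKS_PER_SHELF = 45  # assumption: implied by 90 disks per card
PCIE_SLOTS = 11       # DL580/585 slot count, from the post


def disks_per_card():
    """Disks reachable through one controller card."""
    shelves = PORTS_PER_CARD // PORTS_PER_SHELF
    return shelves * DISKS_PER_SHELF


def disks_per_server(slots=PCIE_SLOTS):
    """Upper bound if every PCIe slot holds a controller."""
    return slots * disks_per_card()


def cards_needed(devices):
    """Controller cards required to attach `devices` drives."""
    return math.ceil(devices / disks_per_card())


print(disks_per_card(), disks_per_server(), cards_needed(422))
```

For the 422 devices mentioned in the thread this comes out well under the 11 available slots, which leaves room for network adapters.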
|
In reply to this post by Freddie Cash-8
On 02/06/2012 05:41 PM, Freddie Cash wrote:

Hi all,

> On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick
> <[hidden email]> wrote:
>> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>>> I want to investigate if it is possible to create your own usable
>>> HPC storage using zfs and some network filesystem like nfs.

Especially HPC sounds interesting to me, but for HPC you typically need fast read/write access for all nodes in the cluster. That's why Lustre uses several storage servers for concurrent access over a fast link (typically InfiniBand).

Another thing to think about is CPU: you probably need weeks for a rebuild of a single disk in a petabyte filesystem. I haven't tried this with ZFS yet, but I'm really interested if anyone already did this.

The whole setup sounds a little bit like the system shown by Aberdeen:
http://www.aberdeeninc.com/abcatg/petabyte-storage.htm

Schematics at Tom's Hardware:
http://www.tomshardware.de/fotoreportage/137-Aberdeen-petarack-petabyte-sas.html

The problem with Aberdeen is that they don't use ZIL/L2ARC.

>>> Just a thought experiment..
>>> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more)
>>> I addition the machine will use 3-6 SSD drives for ZIL and 3-6 SSD
>>> deives for cache.
>>> Preferrably in mirror where applicable.
>>>
>>> Connected to this machine we will have about 410 3TB drives to give approx
>>> 1PB of usable storage in a 8+2 raidz configuration.

I don't know what the situation is for the rest of the world, but 3TB drives are currently still hard to buy in Europe/Germany.

>>> Connected to this will be a ~800 nodes big HPC cluster that will
>>> access the storage in parallell

What is your typical load pattern?

>>> is this even possible or do we need to distribute the meta data load
>>> over many servers?

It is a good idea to have

>>> If that is the case,
>>> does it exist any software for FreeBSD that could accomplish this
>>> distribution (pNFS dosent seem to be
>>> anywhere close to usable in FreeBSD) or do I need to call NetApp or
>>> Panasas right away?

Not that I know of.

> SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules (16 RAM
> slots). It's an AMD board, but there should be variants that support
> Intel CPUs. It's not uncommon to support 256 GB of RAM these days,
> although 128 GB boards are much more common.

Currently Intel CPUs have 3 memory channels. With 2 sockets, 3 channels per socket and 2 DIMMs per channel you get 12 DIMMs; with cheap 16GB modules that is 192GB. 32GB modules are also available today ;-)

>> - How you plan on getting roughly 410 hard disks (or 422 assuming
>> an additional 12 SSDs) hooked up to a single machine
>
> In a "head node" + "JBOD" setup? Where the head node has a mobo that
> supports multiple PCIe x8 and PCIe x16 slots, and is stuffed full of
> 16-24 port multi-lane SAS/SATA controllers with external ports that
> are cabled up to external JBOD boxes. The SSDs would be connected to
> the mobo SAS/SATA ports.
>
> Each JBOD box contains nothing but power, SAS/SATA backplane, and
> harddrives. Possibly using SAS expanders.

If you use Supermicro I would use X8DTH-iF, some LSI HBA (9200-8e, 2x Multilane external) and some JBOD chassis (like Supermicro 847E16-RJBOD1).

Regards,
Michael!
|
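The 192GB figure follows directly from the DIMM slot count described above. A minimal sketch of that multiplication, using only the numbers given in the message (the function name is made up for illustration):

```python
def max_ram_gb(sockets=2, channels_per_socket=3,
               dimms_per_channel=2, dimm_gb=16):
    """Total RAM when every DIMM slot is populated with equal modules."""
    return sockets * channels_per_socket * dimms_per_channel * dimm_gb


print(max_ram_gb())            # with 16 GB modules
print(max_ram_gb(dimm_gb=32))  # with 32 GB modules
```

With 32GB modules the same board layout doubles the total, which is what the closing remark about 32GB modules is hinting at.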
|
In reply to this post by Freddie Cash-8
On 2/6/12 8:41 AM, Freddie Cash wrote:
> On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick > <[hidden email]> wrote: >> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerst?l wrote: >>> I want to investigate if it is possible to create your own usable >>> HPC storage using zfs and some >>> network filesystem like nfs. >>> >>> Just a thought experiment.. >>> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more) >>> I addition the machine will use 3-6 SSD drives for ZIL and 3-6 SSD >>> deives for cache. >>> Preferrably in mirror where applicable. >>> >>> Connected to this machine we will have about 410 3TB drives to give approx >>> 1PB of usable storage in a 8+2 raidz configuration. >>> >>> Connected to this will be a ~800 nodes big HPC cluster that will >>> access the storage in parallell >>> is this even possible or do we need to distribute the meta data load >>> over many servers? If that is the case, >>> does it exist any software for FreeBSD that could accomplish this >>> distribution (pNFS dosent seem to be >>> anywhere close to usable in FreeBSD) or do I need to call NetApp or >>> Panasas right away? It would be >>> really nice if I could build my own storage solution. >>> >>> Other possible solutions to this problem is extremley welcome. >> For starters I'd love to know: >> >> - What single motherboard supports up to 192GB of RAM > SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules (16 RAM > slots). It's an AMD board, but there should be variants that support > Intel CPUs. It's not uncommon to support 256 GB of RAM these days, > although 128 GB boards are much more common. > common wisdom for ZFS is 1GB of RAM per TB of storage.. so 256GB might not be enough. people who have actually tried ZFS more than me may want to comment _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
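To put the "1GB of RAM per TB of storage" rule of thumb next to the proposed hardware, here is a small sketch. The rule is taken as quoted in the message above, not as a hard ZFS requirement, and the 984 TB figure assumes 41 raidz2 vdevs of 8 data disks at 3 TB each.

```python
def rule_of_thumb_ram_gb(pool_tb, gb_per_tb=1.0):
    """RAM suggested by the quoted 1 GB per TB rule of thumb."""
    return pool_tb * gb_per_tb


def ram_shortfall_gb(pool_tb, installed_gb, gb_per_tb=1.0):
    """How far the installed RAM falls short of the rule (0 if it is met)."""
    return max(0.0, rule_of_thumb_ram_gb(pool_tb, gb_per_tb) - installed_gb)


POOL_TB = 984  # assumption: 41 vdevs x 8 data disks x 3 TB
for installed in (192, 256):
    print(installed, "GB installed, short by",
          ram_shortfall_gb(POOL_TB, installed), "GB")
```

By this measure both the 192GB and 256GB configurations discussed in the thread fall well short, which is the concern being raised.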
|
In reply to this post by Michael Fuckner
On 2/6/12 9:24 AM, Michael Fuckner wrote:
> On 02/06/2012 05:41 PM, Freddie Cash wrote:
> Hi all,
>
>> On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick
>> <[hidden email]> wrote:
>>> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>>>> I want to investigate if it is possible to create your own usable
>>>> HPC storage using zfs and some network filesystem like nfs.
>
> If you use Supermicro I would use X8DTH-iF, some LSI HBA (9200-8e, 2x
> Multilane external) and some JBOD-Chassis (like SUpermicro
> 847E16-RJBOD1)

No one seems to have mentioned the obvious route: a cluster of machines, using the new iSCSI code to make some of them subservient to the others.
|
|
In reply to this post by Michael Fuckner
On Feb 6, 2012, at 7:24 PM, Michael Fuckner wrote:
> Another thing to think about is CPU: you probably need weeks for a rebuild of a single disk in a Petabyte Filesystem- I haven't tried this with ZFS yet, but I'm really interested if anyone already did this.

This is where ZFS will shine. Depending on how you stripe disks, you can get anything from a super fast resilver (if you go for a stripe of mirrors), to fast (if you go for raidz with a small number of disks), to reasonable (if you go for raidz with a large number of disks). If you need high TPS you will want to go with mirrors anyway.

The thing is doable with commodity hardware, but I wonder how one ever backs up such a setup?

Daniel
|
|
On Mon, Feb 6, 2012 at 9:34 AM, Daniel Kalchev <[hidden email]> wrote:
> On Feb 6, 2012, at 7:24 PM, Michael Fuckner wrote:
>> Another thing to think about is CPU: you probably need weeks for a rebuild of a single disk in a Petabyte Filesystem- I haven't tried this with ZFS yet, but I'm really interested if anyone already did this.
>
> This is where ZFS will shine. Depending on how you stripe disks, you can either get super fast resilver (if you go for stripe of mirrors), to fast (if you go for small number of disks raidz) to reasonable (if you of for large number of disks raidz). If you need high TPS you will want to go with mirrors anyway.
>
> The thing is doable with commodity hardware, but I wonder how one ever backups such setup?

With a second box configured similarly. :)

Although, trying to find "downtime" to do the backups ...

--
Freddie Cash
[hidden email]
|
|
In reply to this post by Michael Fuckner
On Mon, 6 Feb 2012, Michael Fuckner wrote:
> Another thing to think about is CPU: you probably need weeks for a rebuild of
> a single disk in a Petabyte Filesystem- I haven't tried this with ZFS yet,
> but I'm really interested if anyone already did this.

Why would a disk rebuild take longer for a petabyte filesystem rather than a tens of gigabytes filesystem? The time to rebuild the disk primarily depends on the RAID type used for the zfs vdev (mirrors, raidz1, raidz2, raidz3), how many disks there are in the vdev, the degree of fragmentation, the amount of data stored on that disk, and the disk seek times.

In a huge system, it makes sense to be more conservative about the zfs vdev design, and use more vdevs with fewer disks per vdev. Using anything less than raidz2 would be an error.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
|
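The "more vdevs with fewer disks per vdev" advice can be illustrated by generating the argument list for a `zpool create` with many small raidz2 vdevs. This is only a sketch: the pool name `tank` and the `da0`, `da1`, ... device names are placeholders, the vdev width is a parameter to experiment with, and the helper does not run anything.

```python
def zpool_create_args(pool, disks, width=10, level="raidz2"):
    """Build a `zpool create` argument list with one vdev per `width` disks."""
    if width < 1 or len(disks) % width:
        raise ValueError("disk count must be a multiple of the vdev width")
    args = ["zpool", "create", pool]
    for start in range(0, len(disks), width):
        args.append(level)                        # start a new vdev
        args.extend(disks[start:start + width])   # its member disks
    return args


# Hypothetical device names; 20 disks -> two 10-disk raidz2 vdevs.
disks = [f"da{i}" for i in range(20)]
print(" ".join(zpool_create_args("tank", disks)))
```

Lowering `width` (for example to 6) yields more, narrower vdevs from the same disks, trading usable capacity for shorter resilvers and more IOPS.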
|
In reply to this post by Freddie Cash-8
-- Peter Ankerstål [hidden email] http://www.pean.org/ On 6 feb 2012, at 17:41, Freddie Cash wrote: > On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick > <[hidden email]> wrote: >> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerst?l wrote: >>> I want to investigate if it is possible to create your own usable >>> HPC storage using zfs and some >>> network filesystem like nfs. >>> >>> Just a thought experiment.. >>> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more) >>> I addition the machine will use 3-6 SSD drives for ZIL and 3-6 SSD >>> deives for cache. >>> Preferrably in mirror where applicable. >>> >>> Connected to this machine we will have about 410 3TB drives to give approx >>> 1PB of usable storage in a 8+2 raidz configuration. >>> >>> Connected to this will be a ~800 nodes big HPC cluster that will >>> access the storage in parallell >>> is this even possible or do we need to distribute the meta data load >>> over many servers? If that is the case, >>> does it exist any software for FreeBSD that could accomplish this >>> distribution (pNFS dosent seem to be >>> anywhere close to usable in FreeBSD) or do I need to call NetApp or >>> Panasas right away? It would be >>> really nice if I could build my own storage solution. >>> >>> Other possible solutions to this problem is extremley welcome. >> >> For starters I'd love to know: >> >> - What single motherboard supports up to 192GB of RAM > > SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules (16 RAM > slots). It's an AMD board, but there should be variants that support > Intel CPUs. It's not uncommon to support 256 GB of RAM these days, > although 128 GB boards are much more common. money RAM the better. > >> - How you plan on getting roughly 410 hard disks (or 422 assuming >> an additional 12 SSDs) hooked up to a single machine > > In a "head node" + "JBOD" setup? 
Where the head node has a mobo that > supports multiple PCIe x8 and PCIe x16 slots, and is stuffed full of > 16-24 port multi-lane SAS/SATA controllers with external ports that > are cabled up to external JBOD boxes. The SSDs would be connected to > the mobo SAS/SATA ports. > > Each JBOD box contains nothing but power, SAS/SATA backplane, and > harddrives. Possibly using SAS expanders. > > We're considering doing the same for our SAN/NAS setup for > centralising storage for our VM hosts, although not quite to the same > scale as the OP. :) Yep, NetApp has disk-shelves that can be configured JBOD that fits 60 drives into 4U. :D > >> If you are considering investing the time and especially money (the cost >> here is almost unfathomable, IMO) into this, I strongly recommend you >> consider an actual hardware filer (e.g. NetApp). Your performance and >> reliability will be much greater, plus you will get overall better >> support from NetApp in the case something goes wrong. In the case you >> run into problems with FreeBSD (and I can assure you in this kind of >> setup you will) with this kind of extensive setup, you will be at the >> mercy of developers' time/schedules with absolutely no guarantee that >> your problem will be solved. You definitely want a support contract. >> Thus, go NetApp. > > For an HPC setup like the OP wants, where performance and uptime are > critical, I agree. You don't want to be skimping on the hardware and > software. > NetApp they can install the system and we don't need to put in the extra hours (probably a lot) the get the thing running. But being a huge fan of BSD I wanted to at least look up the possibility to build our own system. > However, if you have the money for a NetApp setup like this ($ > 500,000+ US I'm guessing), then you also have the money to hire a > FreeBSD developer(s) to work on the parts of the system that are > critical to this (NFS, ZFS, CAM, drivers, scheduler, GEOM, etc). 
> Then, you could go with a white-box, custom build and have the support > in-house. > > -- > Freddie Cash > [hidden email] > _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
In reply to this post by Peter Ankerstål-2
On 06.02.2012 16:52, Peter Ankerstål wrote:
> Hi,
>
> I want to investigate if it is possible to create your own usable HPC
> storage using zfs and some
> network filesystem like nfs.
>
> Just a thought experiment..
> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more)
> I addition the machine will use 3-6 SSD drives for ZIL

FYI: With ESXi's NFS client (**the most evil of anything doing synchronous writes), striping my ZIL made no difference. Either way, it used 100% load on the SSDs and went the same pathetic speed. (gstripe worked to some extent.)

Using "-o sync" in the Linux NFS client does not use the ZIL at max load*. If you can suggest another way to test (without ESXi), I will try it on 3 SSDs (4 if the one that I RMAed comes back in time) and let you know the results.

* With a pool that was created in 8.2-STABLE and upgraded to v28, it did use max load. Now it doesn't. But is recreating that situation the right way to test? It is not what you will be using.

** For more info about ESXi's terrible NFS performance, see this graph http://doub.home.xs4all.nl/bench/sync.png and find the mail I got it from in this mailing list [hidden email] with the title "Re: ZFS sync / ZIL clarification".

> and 3-6 SSD deives for cache.
> Preferrably in mirror where applicable.
>
> Connected to this machine we will have about 410 3TB drives to give
> approx
> 1PB of usable storage in a 8+2 raidz configuration.
>
> Connected to this will be a ~800 nodes big HPC cluster that will
> access the storage in parallell
> is this even possible or do we need to distribute the meta data load
> over many servers? If that is the case,
> does it exist any software for FreeBSD that could accomplish this
> distribution (pNFS dosent seem to be
> anywhere close to usable in FreeBSD) or do I need to call NetApp or
> Panasas right away? It would be
> really nice if I could build my own storage solution.
>
> Other possible solutions to this problem is extremley welcome.
>
> Best Regards
> Peter Ankerstål
|
|
In reply to this post by Daniel Kalchev
On 02/06/2012 06:34 PM, Daniel Kalchev wrote:
> The thing is doable with commodity hardware, but I wonder how one ever backups such setup?

zfs send to a second machine?
|
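A "zfs send to a second machine" backup usually amounts to taking a recursive snapshot and piping a replication stream over ssh. The sketch below only builds the command strings so they can be reviewed before anything runs; the dataset, snapshot and host names are hypothetical, and the exact flags should be checked against the zfs man page on the systems involved.

```python
import shlex


def replication_commands(dataset, snapshot, remote_host, remote_dataset,
                         previous=None):
    """Return the snapshot command and the send/receive pipeline as strings."""
    snap = f"{dataset}@{snapshot}"
    send = ["zfs", "send", "-R"]                 # replication stream
    if previous:
        send += ["-i", f"{dataset}@{previous}"]  # incremental from last snapshot
    send.append(snap)
    recv = ["ssh", remote_host, "zfs", "receive", "-F", remote_dataset]
    return [
        shlex.join(["zfs", "snapshot", "-r", snap]),
        shlex.join(send) + " | " + shlex.join(recv),
    ]


# Hypothetical names throughout.
for cmd in replication_commands("tank/hpc", "nightly-2", "backup1",
                                "backup/hpc", previous="nightly-1"):
    print(cmd)
```

After the first full transfer, passing `previous` keeps each later run incremental, which matters at this pool size because a full stream of a petabyte would take a very long time.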
|
> zfs send to a second machine?
Is there not likely to be a degradation in performance while doing the backup?

...which ties into something that I think is missing from this thread: a statement of what this massive pool of storage is going to be used for. "HPC" is a pretty broad term, and isn't quite a specification in my book.

I have faith that the relevant BSD technologies can be cobbled together to get OP to the performance level he needs, but -- and perhaps I have failed basic reading comprehension here -- I don't think we have a clear enough picture of what that level actually /is/ to offer advice on what is almost certainly going to be a multi-million dollar installation!

KC
|
|
In reply to this post by Michael Aronsen-2
On 02/06/2012 05:49 PM, Michael Aronsen wrote:
>> - How you plan on getting roughly 410 hard disks (or 422 assuming
>> an additional 12 SSDs) hooked up to a single machine
>
> Use LSI SAS92XX 4 (x4) port external controllers, and SuperMicro SC847E26-RJBOD1 disk shelves.
> Each disk shelf needs 2 ports on the LSI controller, which means you get 90 disks per LSI card.
> The DL580/585's have 11 PCIe slots, so you'd end up with 990 disks per server using this setup.

The backplanes/expanders in the SC847E16-RJBOD1 (E16 = SATA/single expander version, E26 = SAS/dual expander version) can be daisy chained if you want to lower the number of controllers/ports. I have no E26 to test on, but I guess it may support daisy chaining too.

The LSI 9205-8e controller claims to support 1024 devices connected to its two 4x 6Gb/s ports. Does anyone happen to know how many daisy chain jumps are supported by SAS? Would it be possible to daisy chain 10 shelves with 450 drives onto 1 of those controllers? That would require 10 jumps on two chains or 20 jumps on one chain...

(Yes, 6000*8/450 = 107 Mbit per drive, but ignore that part. ;)

--
Erik
|
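The per-drive bandwidth remark in parentheses is simple division, sketched below. It uses the raw line rate of two 4-lane 6 Gbit/s ports as in the message and ignores encoding and protocol overhead, so real throughput per drive would be lower; the function name is made up for illustration.

```python
def mbit_per_drive(drives, ports=2, lanes_per_port=4, lane_mbit=6000):
    """Raw controller line rate shared evenly across all attached drives."""
    return ports * lanes_per_port * lane_mbit / drives


print(round(mbit_per_drive(450)))      # all 10 shelves on one controller
print(round(mbit_per_drive(90)))       # two shelves per controller instead
```

Roughly 107 Mbit/s is only about 13 MB/s per drive, far below what a single 3TB disk can stream, which is why the daisy-chained layout would be bottlenecked at the controller.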
|
In reply to this post by Peter Ankerstål-2
On Mon, 06 Feb 2012 16:52:11 +0100, Peter Ankerstål <[hidden email]> wrote:
> Hi,
>
> I want to investigate if it is possible to create your own usable HPC
> storage using zfs and some
> network filesystem like nfs.
>
> Just a thought experiment..
> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more)
> I addition the machine will use 3-6 SSD drives for ZIL and 3-6 SSD
> deives for cache.
> Preferrably in mirror where applicable.
>
> Connected to this machine we will have about 410 3TB drives to give
> approx
> 1PB of usable storage in a 8+2 raidz configuration.
>
> Connected to this will be a ~800 nodes big HPC cluster that will access
> the storage in parallell
> is this even possible or do we need to distribute the meta data load
> over many servers? If that is the case,
> does it exist any software for FreeBSD that could accomplish this
> distribution (pNFS dosent seem to be
> anywhere close to usable in FreeBSD) or do I need to call NetApp or
> Panasas right away? It would be
> really nice if I could build my own storage solution.
>
> Other possible solutions to this problem is extremley welcome.
>
> Best Regards
> Peter Ankerstål

You might make a call to Backblaze.
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

Ronald. |
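As a rough check of the "approx 1PB" figure in the quoted message, here is a minimal sketch. It assumes "8+2" means 10-disk vdevs with 8 data and 2 parity disks, decimal terabytes as printed on the drives, and no allowance for hot spares or ZFS metadata overhead. All names are illustrative.

```python
DRIVES = 410          # data disks from the quoted message
DRIVE_TB = 3          # decimal TB per drive
DATA, PARITY = 8, 2   # assumed layout of each "8+2" vdev

vdevs = DRIVES // (DATA + PARITY)         # complete 10-disk vdevs
data_drives = vdevs * DATA                # disks holding data rather than parity
usable_tb = data_drives * DRIVE_TB        # decimal TB before filesystem overhead
usable_tib = usable_tb * 10**12 / 2**40   # same capacity in binary TiB

print(f"{vdevs} vdevs, {usable_tb} TB (about {usable_tib:.0f} TiB) before overhead")
```

Under these assumptions the layout comes out just under 1 PB in decimal units, and noticeably lower once expressed in TiB.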
|
In reply to this post by Michael Aronsen-2
Hi Michael,

What is the impact on read and write latency of using a distributed system?

Regards
Charles

On 06/02/2012 17:49, Michael Aronsen wrote:
> Hi,
>
> On Feb 6, 2012, at 17:22 , Jeremy Chadwick wrote:
>> - What single motherboard supports up to 192GB of RAM
> Get an HP DL580/585 - they support 2TB/1TB RAM.
>
>> - How you plan on getting roughly 410 hard disks (or 422 assuming
>> an additional 12 SSDs) hooked up to a single machine
> Use LSI SAS92XX 4 (x4) port external controllers, and SuperMicro SC847E26-RJBOD1 disk shelves.
> Each disk shelf needs 2 ports on the LSI controller, which means you get 90 disks per LSI card.
> The DL580/585's have 11 PCIe slots, so you'd end up with 990 disks per server using this setup.
>
>> If you are considering investing the time and especially money (the cost
>> here is almost unfathomable, IMO) into this, I strongly recommend you
>> consider an actual hardware filer (e.g. NetApp). Your performance and
>> reliability will be much greater, plus you will get overall better
>> support from NetApp in the case something goes wrong. In the case you
>> run into problems with FreeBSD (and I can assure you in this kind of
>> setup you will) with this kind of extensive setup, you will be at the
>> mercy of developers' time/schedules with absolutely no guarantee that
>> your problem will be solved. You definitely want a support contract.
>> Thus, go NetApp.
> We have NetApp's at our University for home storage, but I would struggle to recommend them for HPC storage.
>
> A dedicated HPC filesystem such as Lustre or FhGFS (http://www.fhgfs.com/cms/) will almost certainly give you better performance as they're purpose made.
>
> We use FhGFS in a rather small setup (44 TB usable space and ~200 HPC nodes), but they do have installations with 700TB+.
> The setup consists of 2 metadata nodes and 4 storage nodes, all supermicro servers with 24 WD Velociraptor 600 GB 10K RPM disks.
> This setup gives us 4.8GB/sec write and 4.3GB/sec read speeds, all for a lot less than a comparable NetApp solution (we paid around €30.000).
> It now has support for mirroring on a per folder level for resilience.
>
> Currently it only runs on Linux but i'm considering a FreeBSD port to get ZFS for volume management and now that OFED is in FreeBSD 9, Infinifband is possible.
>
> I'd highly recommend a parallel filesystem, unfortunately not many, if any, are available on FreeBSD at this time.
>
> Regards,
> Michael |
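The controller and shelf counts quoted above can be checked with a few lines of arithmetic. This is a sketch under the quoted figures only: 4 external ports per card, 2 ports per shelf, and 11 PCIe slots. The 45 drives per shelf is inferred from "90 disks per LSI card", and the 422-device target is the disk-plus-SSD count from the quoted question. Constant names are illustrative.

```python
import math

PORTS_PER_CARD = 4      # external x4 ports per LSI controller (quoted)
PORTS_PER_SHELF = 2     # ports each disk shelf needs (quoted)
DRIVES_PER_SHELF = 45   # inferred from "90 disks per LSI card"
PCIE_SLOTS = 11         # slots in the DL580/585 (quoted)
TARGET_DEVICES = 422    # 410 hard disks plus 12 SSDs (quoted)

shelves_per_card = PORTS_PER_CARD // PORTS_PER_SHELF
drives_per_card = shelves_per_card * DRIVES_PER_SHELF
max_drives = drives_per_card * PCIE_SLOTS
cards_needed = math.ceil(TARGET_DEVICES / drives_per_card)

print(f"{drives_per_card} drives per card, up to {max_drives} per server")
print(f"{cards_needed} cards needed for {TARGET_DEVICES} devices")
```

Without daisy chaining, the proposed 422 devices would occupy fewer than half of the available slots, leaving room for network adapters.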
