HPC and zfs.

Peter Ankerstål-2
Hi,

I want to investigate whether it is possible to build your own usable HPC
storage using ZFS and some
network filesystem like NFS.

Just a thought experiment:
A machine with two 6-core Xeons (3.46 GHz, 12 MB cache) and 192 GB of RAM (or more).
In addition, the machine will use 3-6 SSDs for the ZIL and 3-6 SSDs
for cache,
preferably mirrored where applicable.

Connected to this machine we will have about 410 3 TB drives to give approximately
1 PB of usable storage in an 8+2 raidz configuration.
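
As a rough check of that capacity figure, here is a minimal sketch of the arithmetic. It assumes every vdev is exactly 8 data + 2 parity disks, uses decimal terabytes, and ignores ZFS metadata overhead, spares, and free-space headroom; the variable names are only illustrative.

```python
# Back-of-the-envelope capacity check for the proposed pool.
# Assumptions: 8 data + 2 parity disks per vdev, decimal TB,
# no allowance for ZFS overhead, hot spares, or free-space headroom.
DRIVES = 410
DRIVE_TB = 3
DATA_DISKS = 8
PARITY_DISKS = 2

vdev_width = DATA_DISKS + PARITY_DISKS      # 10 disks per vdev
vdevs = DRIVES // vdev_width                # whole vdevs that fit
usable_tb = vdevs * DATA_DISKS * DRIVE_TB   # parity disks hold no user data
usable_tib = usable_tb * 10**12 / 2**40     # same figure in binary TiB

print(f"{vdevs} vdevs, {usable_tb} TB usable (~{usable_tib:.0f} TiB)")
```

Under these assumptions the result lands just under 1 PB, so the "approximately 1 PB" estimate holds only before any overhead is subtracted.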

Connected to this will be an HPC cluster of ~800 nodes that will access
the storage in parallel.
Is this even possible, or do we need to distribute the metadata load
over many servers? If that is the case,
is there any software for FreeBSD that could accomplish this
distribution (pNFS doesn't seem to be
anywhere close to usable in FreeBSD), or do I need to call NetApp or
Panasas right away? It would be
really nice if I could build my own storage solution.

Other possible solutions to this problem are extremely welcome.

Best Regards
Peter Ankerstål

_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "[hidden email]"

Re: HPC and zfs.

Jeremy Chadwick
On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:

> I want to investigate if it is possible to create your own usable
> HPC storage using zfs and some
> network filesystem like nfs.
>
> Just a thought experiment..
> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more)
> I addition the machine will use 3-6 SSD drives for ZIL and 3-6 SSD
> deives for  cache.
> Preferrably in  mirror where applicable.
>
> Connected to this machine we will have about 410 3TB drives to give approx
> 1PB of usable storage in a 8+2 raidz configuration.
>
> Connected to this will be a ~800 nodes big HPC cluster that will
> access the storage in parallell
> is this even possible or do we need to distribute the meta data load
> over many servers? If that is the case,
> does it exist any software for FreeBSD that could  accomplish this
> distribution (pNFS  dosent seem to be
> anywhere close to usable in FreeBSD) or do I need to call NetApp or
> Panasas right away? It would be
> really nice if I could build my own storage solution.
>
> Other possible solutions to this problem is extremley welcome.

For starters I'd love to know:

- What single motherboard supports up to 192GB of RAM
- How you plan on getting roughly 410 hard disks (or 422 assuming
  an additional 12 SSDs) hooked up to a single machine

If you are considering investing the time and especially the money (the cost
here is almost unfathomable, IMO) in this, I strongly recommend you
consider an actual hardware filer (e.g. NetApp).  Your performance and
reliability will be much greater, and you will get better overall
support from NetApp if something goes wrong.  If you
run into problems with FreeBSD (and I can assure you that in a setup
this extensive you will), you will be at the
mercy of developers' time/schedules with absolutely no guarantee that
your problem will be solved.  You definitely want a support contract.
Thus, go NetApp.

--
| Jeremy Chadwick                                 [hidden email] |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Re: HPC and zfs.

Rainer Duffner
On Mon, 6 Feb 2012 08:22:06 -0800,
Jeremy Chadwick <[hidden email]> wrote:

> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:

> > Other possible solutions to this problem is extremley welcome.
>
> For starters I'd love to know:


Recently, someone had a similar proposal on the CentOS list.
Except he really wants to build it, it's 2 PB, and it's supposed to be
housed in his garage.
Someone explained to him that with the 28k BTU, his garage would get
rather warm....

There was a presentation at the last LISA called "Your First
Peta-Byte".

It's an interesting talk, and entertaining too.
It's on YouTube.

From building our own filers with Solaris and COTS shelves, I can say
that the fun quickly wears off...







Re: HPC and zfs.

Freddie Cash-8
In reply to this post by Jeremy Chadwick
On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick
<[hidden email]> wrote:

> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>> I want to investigate if it is possible to create your own usable
>> HPC storage using zfs and some
>> network filesystem like nfs.
>>
>> Just a thought experiment..
>> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more)
>> I addition the machine will use 3-6 SSD drives for ZIL and 3-6 SSD
>> deives for  cache.
>> Preferrably in  mirror where applicable.
>>
>> Connected to this machine we will have about 410 3TB drives to give approx
>> 1PB of usable storage in a 8+2 raidz configuration.
>>
>> Connected to this will be a ~800 nodes big HPC cluster that will
>> access the storage in parallell
>> is this even possible or do we need to distribute the meta data load
>> over many servers? If that is the case,
>> does it exist any software for FreeBSD that could  accomplish this
>> distribution (pNFS  dosent seem to be
>> anywhere close to usable in FreeBSD) or do I need to call NetApp or
>> Panasas right away? It would be
>> really nice if I could build my own storage solution.
>>
>> Other possible solutions to this problem is extremley welcome.
>
> For starters I'd love to know:
>
> - What single motherboard supports up to 192GB of RAM

SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules (16 RAM
slots).  It's an AMD board, but there should be variants that support
Intel CPUs.  It's not uncommon to support 256 GB of RAM these days,
although 128 GB boards are much more common.

> - How you plan on getting roughly 410 hard disks (or 422 assuming
>  an additional 12 SSDs) hooked up to a single machine

In a "head node" + "JBOD" setup?  Where the head node has a mobo that
supports multiple PCIe x8 and PCIe x16 slots, and is stuffed full of
16-24 port multi-lane SAS/SATA controllers with external ports that
are cabled up to external JBOD boxes.  The SSDs would be connected to
the mobo SAS/SATA ports.

Each JBOD box contains nothing but power, a SAS/SATA backplane, and
hard drives, possibly using SAS expanders.

We're considering doing the same for our SAN/NAS setup for
centralising storage for our VM hosts, although not quite to the same
scale as the OP.  :)

> If you are considering investing the time and especially money (the cost
> here is almost unfathomable, IMO) into this, I strongly recommend you
> consider an actual hardware filer (e.g. NetApp).  Your performance and
> reliability will be much greater, plus you will get overall better
> support from NetApp in the case something goes wrong.  In the case you
> run into problems with FreeBSD (and I can assure you in this kind of
> setup you will) with this kind of extensive setup, you will be at the
> mercy of developers' time/schedules with absolutely no guarantee that
> your problem will be solved.  You definitely want a support contract.
> Thus, go NetApp.

For an HPC setup like the OP wants, where performance and uptime are
critical, I agree. You don't want to be skimping on the hardware and
software.

However, if you have the money for a NetApp setup like this
($500,000+ US, I'm guessing), then you also have the money to hire one
or more FreeBSD developers to work on the parts of the system that are
critical to this (NFS, ZFS, CAM, drivers, scheduler, GEOM, etc).
Then you could go with a white-box, custom build and have the support
in-house.

--
Freddie Cash
[hidden email]

Re: HPC and zfs.

Michael Aronsen-2
In reply to this post by Jeremy Chadwick
Hi,

On Feb 6, 2012, at 17:22 , Jeremy Chadwick wrote:
> - What single motherboard supports up to 192GB of RAM

Get an HP DL580/585 - they support 2TB/1TB RAM.

> - How you plan on getting roughly 410 hard disks (or 422 assuming
>  an additional 12 SSDs) hooked up to a single machine

Use LSI SAS92XX 4 (x4) port external controllers, and SuperMicro SC847E26-RJBOD1 disk shelves.
Each disk shelf needs 2 ports on the LSI controller, which means you get 90 disks per LSI card.
The DL580/585's have 11 PCIe slots, so you'd end up with 990 disks per server using this setup.
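
The disk counts above can be reproduced with a few lines of arithmetic. This is only a sketch: the 45 disks per shelf is an assumption inferred from the "90 disks per LSI card" figure (two shelves per 4-port card), and the 422-disk target is the number quoted earlier in the thread.

```python
# Reproduce the controller/shelf arithmetic from the message above.
PORTS_PER_CARD = 4      # external x4 ports per LSI controller
PORTS_PER_SHELF = 2     # ports each disk shelf consumes
DISKS_PER_SHELF = 45    # assumption: implied by 90 disks per card
PCIE_SLOTS = 11         # PCIe slots in the DL580/585, per the message
TARGET_DISKS = 422      # 410 hard disks + 12 SSDs, from the thread

shelves_per_card = PORTS_PER_CARD // PORTS_PER_SHELF
disks_per_card = shelves_per_card * DISKS_PER_SHELF
max_disks = disks_per_card * PCIE_SLOTS

# Ceiling division: cards and shelves needed to attach the target count.
cards_needed = -(-TARGET_DISKS // disks_per_card)
shelves_needed = -(-TARGET_DISKS // DISKS_PER_SHELF)

print(f"{disks_per_card} disks/card, {max_disks} disks/server max")
print(f"{TARGET_DISKS} disks need {cards_needed} cards and {shelves_needed} shelves")
```

On these numbers the proposed pool would occupy fewer than half of the available slots, leaving room for network adapters.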

>
> If you are considering investing the time and especially money (the cost
> here is almost unfathomable, IMO) into this, I strongly recommend you
> consider an actual hardware filer (e.g. NetApp).  Your performance and
> reliability will be much greater, plus you will get overall better
> support from NetApp in the case something goes wrong.  In the case you
> run into problems with FreeBSD (and I can assure you in this kind of
> setup you will) with this kind of extensive setup, you will be at the
> mercy of developers' time/schedules with absolutely no guarantee that
> your problem will be solved.  You definitely want a support contract.
> Thus, go NetApp.

We have NetApps at our university for home storage, but I would struggle to recommend them for HPC storage.

A dedicated HPC filesystem such as Lustre or FhGFS (http://www.fhgfs.com/cms/) will almost certainly give you better performance as they're purpose made.

We use FhGFS in a rather small setup (44 TB usable space and ~200 HPC nodes), but they do have installations with 700TB+.
The setup consists of 2 metadata nodes and 4 storage nodes, all supermicro servers with 24 WD Velociraptor 600 GB 10K RPM disks.
This setup gives us 4.8GB/sec write and 4.3GB/sec read speeds, all for a lot less than a comparable NetApp solution (we paid around €30.000).
It now has support for mirroring on a per folder level for resilience.
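
To put those figures in perspective, a small sketch of the per-node bandwidth share and the cost per usable terabyte. It assumes all ~200 nodes read or write at the same time and share the aggregate rate evenly, which is a worst-case simplification rather than a measured load pattern.

```python
# Per-node share and cost per TB for the FhGFS setup described above.
# Assumption: all nodes are active at once and share bandwidth evenly.
NODES = 200
WRITE_GB_S = 4.8
READ_GB_S = 4.3
USABLE_TB = 44
COST_EUR = 30_000

write_mb_s_per_node = WRITE_GB_S * 1000 / NODES
read_mb_s_per_node = READ_GB_S * 1000 / NODES
eur_per_tb = COST_EUR / USABLE_TB

print(f"write: {write_mb_s_per_node:.1f} MB/s per node")
print(f"read:  {read_mb_s_per_node:.1f} MB/s per node")
print(f"cost:  {eur_per_tb:.0f} EUR per usable TB")
```

The same calculation with ~800 nodes shows how quickly the per-node share shrinks unless the number of storage servers grows with the cluster.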

Currently it only runs on Linux, but I'm considering a FreeBSD port to get ZFS for volume management, and now that OFED is in FreeBSD 9, InfiniBand is possible.

I'd highly recommend a parallel filesystem. Unfortunately, not many, if any, are available on FreeBSD at this time.

Regards,
Michael

Re: HPC and zfs.

Michael Fuckner
In reply to this post by Freddie Cash-8
On 02/06/2012 05:41 PM, Freddie Cash wrote:
Hi all,

> On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick
> <[hidden email]>  wrote:
>> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>>> I want to investigate if it is possible to create your own usable
>>> HPC storage using zfs and some network filesystem like nfs.
HPC in particular sounds interesting to me, but for HPC you typically need
fast read/write access for all nodes in the cluster. That's why Lustre uses
several storage servers for concurrent access over a fast link (typically
InfiniBand).

Another thing to think about is CPU: you probably need weeks for a
rebuild of a single disk in a petabyte filesystem. I haven't tried this
with ZFS yet, but I'm really interested to hear if anyone already has.

The whole setup sounds a little bit like the system shown by aberdeen:
http://www.aberdeeninc.com/abcatg/petabyte-storage.htm

schematics at tomshardware:
http://www.tomshardware.de/fotoreportage/137-Aberdeen-petarack-petabyte-sas.html

The problem with Aberdeen is that they don't use a ZIL or L2ARC.



>>> Just a thought experiment..
>>> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more)
>>> I addition the machine will use 3-6 SSD drives for ZIL and 3-6 SSD
>>> deives for  cache.
>>> Preferrably in  mirror where applicable.
>>>
>>> Connected to this machine we will have about 410 3TB drives to give approx
>>> 1PB of usable storage in a 8+2 raidz configuration.
I don't know what the situation is for the rest of the world, but 3 TB
drives are currently still hard to buy in Europe/Germany.

>>> Connected to this will be a ~800 nodes big HPC cluster that will
>>> access the storage in parallell
what is your typical load pattern?

>>> is this even possible or do we need to distribute the meta data load
>>> over many servers?
It is a good idea to have

>>> If that is the case,
>>> does it exist any software for FreeBSD that could  accomplish this
>>> distribution (pNFS  dosent seem to be
>>> anywhere close to usable in FreeBSD) or do I need to call NetApp or
>>> Panasas right away?
not that I know of

> SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules (16 RAM
> slots).  It's an AMD board, but there should be variants that support
> Intel CPUs.  It's not uncommon to support 256 GB of RAM these days,
> although 128 GB boards are much more common.
Currently Intel CPUs have 3 memory channels.

With 2 sockets, 3 channels per socket and 2 DIMMs per channel you get
12 DIMMs; with cheap 16 GB modules that is 192 GB. 32 GB modules are
also available today ;-)
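
The DIMM arithmetic above, written out as a short sketch. The helper name is made up for illustration; the socket, channel, and DIMM counts are the ones given in the message.

```python
# Total RAM from the socket/channel/DIMM layout described above.
def total_ram_gb(sockets: int, channels: int, dimms_per_channel: int, dimm_gb: int) -> int:
    """Return installed RAM in GB for a fully populated board."""
    return sockets * channels * dimms_per_channel * dimm_gb

dimm_slots = 2 * 3 * 2                     # sockets * channels * DIMMs per channel
ram_16gb_modules = total_ram_gb(2, 3, 2, 16)
ram_32gb_modules = total_ram_gb(2, 3, 2, 32)

print(f"{dimm_slots} DIMMs: {ram_16gb_modules} GB with 16 GB modules, "
      f"{ram_32gb_modules} GB with 32 GB modules")
```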



>> - How you plan on getting roughly 410 hard disks (or 422 assuming
>>   an additional 12 SSDs) hooked up to a single machine
>
> In a "head node" + "JBOD" setup?  Where the head node has a mobo that
> supports multiple PCIe x8 and PCIe x16 slots, and is stuffed full of
> 16-24 port multi-lane SAS/SATA controllers with external ports that
> are cabled up to external JBOD boxes.  The SSDs would be connected to
> the mobo SAS/SATA ports.
>
> Each JBOD box contains nothing but power, SAS/SATA backplane, and
> harddrives.  Possibly using SAS expanders.
If you use Supermicro, I would use an X8DTH-iF, some LSI HBAs (9200-8e, 2x
multilane external) and some JBOD chassis (like the Supermicro 847E16-RJBOD1).

Regards,
  Michael!

Re: HPC and zfs.

Julian Elischer-5
In reply to this post by Freddie Cash-8
On 2/6/12 8:41 AM, Freddie Cash wrote:

> On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick
> <[hidden email]>  wrote:
>> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>>> I want to investigate if it is possible to create your own usable
>>> HPC storage using zfs and some
>>> network filesystem like nfs.
>>>
>>> Just a thought experiment..
>>> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more)
>>> I addition the machine will use 3-6 SSD drives for ZIL and 3-6 SSD
>>> deives for  cache.
>>> Preferrably in  mirror where applicable.
>>>
>>> Connected to this machine we will have about 410 3TB drives to give approx
>>> 1PB of usable storage in a 8+2 raidz configuration.
>>>
>>> Connected to this will be a ~800 nodes big HPC cluster that will
>>> access the storage in parallell
>>> is this even possible or do we need to distribute the meta data load
>>> over many servers? If that is the case,
>>> does it exist any software for FreeBSD that could  accomplish this
>>> distribution (pNFS  dosent seem to be
>>> anywhere close to usable in FreeBSD) or do I need to call NetApp or
>>> Panasas right away? It would be
>>> really nice if I could build my own storage solution.
>>>
>>> Other possible solutions to this problem is extremley welcome.
>> For starters I'd love to know:
>>
>> - What single motherboard supports up to 192GB of RAM
> SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules (16 RAM
> slots).  It's an AMD board, but there should be variants that support
> Intel CPUs.  It's not uncommon to support 256 GB of RAM these days,
> although 128 GB boards are much more common.
>

Common wisdom for ZFS is 1 GB of RAM per TB of storage,
so 256 GB might not be enough.
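
Applied to the proposed pool, that rule of thumb works out as follows. This is only a sketch of the rule as stated here (1 GB per TB), not a sizing guarantee; real requirements depend on workload and ARC tuning, and the pool size is rounded to the 1 PB quoted in the thread.

```python
# Apply the "1 GB of RAM per TB of storage" rule of thumb quoted above.
GB_RAM_PER_TB = 1        # the rule of thumb, taken at face value
POOL_TB = 1000           # approx. 1 PB usable, as in the original post
BOARD_MAX_GB = 256       # the board capacity mentioned in the thread

ram_needed_gb = POOL_TB * GB_RAM_PER_TB
shortfall_gb = max(0, ram_needed_gb - BOARD_MAX_GB)

print(f"rule of thumb: {ram_needed_gb} GB RAM, "
      f"shortfall vs. {BOARD_MAX_GB} GB board: {shortfall_gb} GB")
```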

People who have actually tried ZFS more than I have may want to comment
more on this.


Re: HPC and zfs.

Julian Elischer-5
In reply to this post by Michael Fuckner
On 2/6/12 9:24 AM, Michael Fuckner wrote:

> On 02/06/2012 05:41 PM, Freddie Cash wrote:
> Hi all,
>
>> On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick
>> <[hidden email]>  wrote:
>>> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>>>> I want to investigate if it is possible to create your own usable
>>>> HPC storage using zfs and some network filesystem like nfs.
> If you use Supermicro I would use X8DTH-iF, some LSI HBA (9200-8e, 2x
> Multilane external) and some JBOD-Chassis (like SUpermicro
> 847E16-RJBOD1)
>

No one seems to have mentioned the obvious route:

a cluster of machines, using the new iSCSI code to make some of them
subservient to the others.

Re: HPC and zfs.

Daniel Kalchev
In reply to this post by Michael Fuckner

On Feb 6, 2012, at 7:24 PM, Michael Fuckner wrote:

> Another thing to think about is CPU: you probably need weeks for a rebuild of a single disk in a Petabyte Filesystem- I haven't tried this with ZFS yet, but I'm really interested if anyone already did this.

This is where ZFS will shine. Depending on how you stripe disks, you can get anything from a super fast resilver (if you go for a stripe of mirrors), to a fast one (if you go for raidz with a small number of disks), to a reasonable one (if you go for raidz with a large number of disks). If you need high TPS you will want to go with mirrors anyway.

The thing is doable with commodity hardware, but I wonder how one ever backs up such a setup?

Daniel

Re: HPC and zfs.

Freddie Cash-8
On Mon, Feb 6, 2012 at 9:34 AM, Daniel Kalchev <[hidden email]> wrote:
> On Feb 6, 2012, at 7:24 PM, Michael Fuckner wrote:
>> Another thing to think about is CPU: you probably need weeks for a rebuild of a single disk in a Petabyte Filesystem- I haven't tried this with ZFS yet, but I'm really interested if anyone already did this.
>
> This is where ZFS will shine. Depending on how you stripe disks, you can either get super fast resilver (if you go for stripe of mirrors), to fast (if you go for small number of disks raidz) to reasonable (if you of for large number of disks raidz). If you need high TPS you will want to go with mirrors anyway.
>
> The thing is doable with commodity hardware, but I wonder how one ever backups such setup?

With a second box configured similarly.  :)  Although, trying to find
"downtime" to do the backups ...

--
Freddie Cash
[hidden email]

Re: HPC and zfs.

Bob Friesenhahn
In reply to this post by Michael Fuckner
On Mon, 6 Feb 2012, Michael Fuckner wrote:
>
> Another thing to think about is CPU: you probably need weeks for a rebuild of
> a single disk in a Petabyte Filesystem- I haven't tried this with ZFS yet,
> but I'm really interested if anyone already did this.

Why would a disk rebuild take longer for a petabyte filesystem rather
than a tens of gigabytes filesystem?

The time to rebuild the disk primarily depends on the RAID type used
for the zfs vdev (mirrors, raidz1, raidz2, raidz3), how many disks
there are in the vdev, the degree of fragmentation, the amount of data
stored on that disk, and the disk seek times.

In a huge system, it makes sense to be more conservative about the zfs
vdev design, and use more vdevs with fewer disks per vdev.  Using
anything less than raidz2 would be an error.
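
To illustrate why pool size does not appear in that list, here is a very rough lower-bound estimate of resilver time. It assumes the resilver is limited only by how fast the replacement disk can be written and deliberately ignores the other factors named above (vdev width, fragmentation, seek times), all of which make real resilvers slower. The function name and the 100 MB/s rate are illustrative assumptions, not measurements.

```python
# Optimistic lower bound on resilver time: data to rewrite / write rate.
# Assumption: the replacement disk's sustained write rate is the only
# limit; fragmentation, seeks, and vdev width are ignored on purpose.
def resilver_hours(data_on_disk_tb: float, write_mb_s: float) -> float:
    """Hours needed to rewrite data_on_disk_tb at write_mb_s (decimal units)."""
    seconds = data_on_disk_tb * 10**12 / (write_mb_s * 10**6)
    return seconds / 3600

# A half-full 3 TB disk takes the same time whether the pool holds
# tens of gigabytes elsewhere or a petabyte: only this disk's data counts.
half_full = resilver_hours(1.5, 100)
full = resilver_hours(3.0, 100)

print(f"half-full 3 TB disk: {half_full:.1f} h, full disk: {full:.1f} h")
```

The estimate takes no pool-size argument at all, which is the point being made: the cost of a rebuild follows the data on the failed disk and the vdev layout, not the total capacity.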

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Re: HPC and zfs.

Peter Ankerstål-2
In reply to this post by Freddie Cash-8

--
Peter Ankerstål
[hidden email]
http://www.pean.org/

On 6 feb 2012, at 17:41, Freddie Cash wrote:

> On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick
> <[hidden email]> wrote:
>> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>>> I want to investigate if it is possible to create your own usable
>>> HPC storage using zfs and some
>>> network filesystem like nfs.
>>>
>>> Just a thought experiment..
>>> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more)
>>> I addition the machine will use 3-6 SSD drives for ZIL and 3-6 SSD
>>> deives for  cache.
>>> Preferrably in  mirror where applicable.
>>>
>>> Connected to this machine we will have about 410 3TB drives to give approx
>>> 1PB of usable storage in a 8+2 raidz configuration.
>>>
>>> Connected to this will be a ~800 nodes big HPC cluster that will
>>> access the storage in parallell
>>> is this even possible or do we need to distribute the meta data load
>>> over many servers? If that is the case,
>>> does it exist any software for FreeBSD that could  accomplish this
>>> distribution (pNFS  dosent seem to be
>>> anywhere close to usable in FreeBSD) or do I need to call NetApp or
>>> Panasas right away? It would be
>>> really nice if I could build my own storage solution.
>>>
>>> Other possible solutions to this problem is extremley welcome.
>>
>> For starters I'd love to know:
>>
>> - What single motherboard supports up to 192GB of RAM
>
> SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules (16 RAM
> slots).  It's an AMD board, but there should be variants that support
> Intel CPUs.  It's not uncommon to support 256 GB of RAM these days,
> although 128 GB boards are much more common.
Yeah, the one I was looking at was the SuperMicro X8DTU-F, but yeah, the more
RAM the better.

>
>> - How you plan on getting roughly 410 hard disks (or 422 assuming
>>  an additional 12 SSDs) hooked up to a single machine
>
> In a "head node" + "JBOD" setup?  Where the head node has a mobo that
> supports multiple PCIe x8 and PCIe x16 slots, and is stuffed full of
> 16-24 port multi-lane SAS/SATA controllers with external ports that
> are cabled up to external JBOD boxes.  The SSDs would be connected to
> the mobo SAS/SATA ports.
>
> Each JBOD box contains nothing but power, SAS/SATA backplane, and
> harddrives.  Possibly using SAS expanders.
>
> We're considering doing the same for our SAN/NAS setup for
> centralising storage for our VM hosts, although not quite to the same
> scale as the OP.  :)

Yep, NetApp has disk shelves that can be configured as JBOD and fit 60 drives
into 4U. :D

>
>> If you are considering investing the time and especially money (the cost
>> here is almost unfathomable, IMO) into this, I strongly recommend you
>> consider an actual hardware filer (e.g. NetApp).  Your performance and
>> reliability will be much greater, plus you will get overall better
>> support from NetApp in the case something goes wrong.  In the case you
>> run into problems with FreeBSD (and I can assure you in this kind of
>> setup you will) with this kind of extensive setup, you will be at the
>> mercy of developers' time/schedules with absolutely no guarantee that
>> your problem will be solved.  You definitely want a support contract.
>> Thus, go NetApp.
>
> For an HPC setup like the OP wants, where performance and uptime are
> critical, I agree. You don't want to be skimping on the hardware and
> software.
>
A big consideration for us is also the installation. If we go with something like
NetApp, they can install the system and we don't need to put in the extra hours
(probably a lot) to get the thing running. But being a huge fan of BSD, I wanted
to at least look into the possibility of building our own system.

> However, if you have the money for a NetApp setup like this ($
> 500,000+ US I'm guessing), then you also have the money to hire a
> FreeBSD developer(s) to work on the parts of the system that are
> critical to this (NFS, ZFS, CAM, drivers, scheduler, GEOM, etc).
> Then, you could go with a white-box, custom build and have the support
> in-house.
>
> --
> Freddie Cash
> [hidden email]
>

Re: HPC and zfs.

Peter Ankerstål-2
In reply to this post by Michael Aronsen-2


On 6 feb 2012, at 17:49, Michael Aronsen wrote:

> Hi,
>
> On Feb 6, 2012, at 17:22 , Jeremy Chadwick wrote:
>> - What single motherboard supports up to 192GB of RAM
>
> Get an HP DL580/585 - they support 2TB/1TB RAM.
>
>> - How you plan on getting roughly 410 hard disks (or 422 assuming
>>  an additional 12 SSDs) hooked up to a single machine
>
> Use LSI SAS92XX 4 (x4) port external controllers, and SuperMicro SC847E26-RJBOD1 disk shelves.
> Each disk shelf needs 2 ports on the LSI controller, which means you get 90 disks per LSI card.
> The DL580/585's have 11 PCIe slots, so you'd end up with 990 disks per server using this setup.
>
>>
>
> We have NetApp's at our University for home storage, but I would struggle to recommend them for HPC storage.
>
> A dedicated HPC filesystem such as Lustre or FhGFS (http://www.fhgfs.com/cms/) will almost certainly give you better performance as they're purpose made.
>
> We use FhGFS in a rather small setup (44 TB usable space and ~200 HPC nodes), but they do have installations with 700TB+.
> The setup consists of 2 metadata nodes and 4 storage nodes, all supermicro servers with 24 WD Velociraptor 600 GB 10K RPM disks.
> This setup gives us 4.8GB/sec write and 4.3GB/sec read speeds, all for a lot less than a comparable NetApp solution (we paid around €30.000).
> It now has support for mirroring on a per folder level for resilience.
>
> Currently it only runs on Linux but i'm considering a FreeBSD port to get ZFS for volume management and now that OFED is in FreeBSD 9, Infinifband is possible.
>
> I'd highly recommend a parallel filesystem, unfortunately not many, if any, are available on FreeBSD at this time.
>
Thanks for the input. We recently had a visit from NetApp and Whamcloud, actually, and they were pitching a NetApp+Whamcloud (Lustre) installation.

Re: HPC and zfs.

Peter Ankerstål-2
In reply to this post by Julian Elischer-5

--
Peter Ankerstål
[hidden email]
http://www.pean.org/

On 6 feb 2012, at 18:31, Julian Elischer wrote:

> On 2/6/12 9:24 AM, Michael Fuckner wrote:
>> On 02/06/2012 05:41 PM, Freddie Cash wrote:
>> Hi all,
>>
>>> On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick
>>> <[hidden email]>  wrote:
>>>> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>>>>> I want to investigate if it is possible to create your own usable
>>>>> HPC storage using zfs and some network filesystem like nfs.
>> If you use Supermicro I would use X8DTH-iF, some LSI HBA (9200-8e, 2x Multilane external) and some JBOD-Chassis (like SUpermicro 847E16-RJBOD1)
>>
>
> no-one seems to have mentioned the obvious route..
>
> a cluster of machines, using the new iSCSI code to make some of them subservient to the others.
>
I have thought about iSCSI, but will this actually scale metadata performance?

Re: HPC and zfs.

Peter Maloney
In reply to this post by Peter Ankerstål-2
On 06.02.2012 16:52, Peter Ankerstål wrote:
> Hi,
>
> I want to investigate if it is possible to create your own usable HPC
> storage using zfs and some
> network filesystem like nfs.
>
> Just a thought experiment..
> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more)
> I addition the machine will use 3-6 SSD drives for ZIL
FYI: With ESXi's NFS client (**the most evil of anything doing
synchronous writes), striping my ZIL made no difference. Either way, it
used 100% load on the SSDs and went the same pathetic speed. (gstripe
worked to some extent)

Using "-o sync" in the Linux NFS client does not use the ZIL at max
load*. If you can suggest another way to test (without ESXi), I will try
it on 3 SSDs (4 if the one that I RMAed comes back in time) and let you
know the results.

* With a pool that was created in 8.2-STABLE and upgraded to v28, it did
use max load. Now it doesn't. But is recreating that situation the right
way to test? It is not what you will be using.

** For more info about ESXi's terrible NFS performance, see this graph
http://doub.home.xs4all.nl/bench/sync.png and find the mail I got it
from in this mailing list [hidden email] with the title "Re:
ZFS sync / ZIL clarification".

> and 3-6 SSD deives for  cache.
> Preferrably in  mirror where applicable.
>
> Connected to this machine we will have about 410 3TB drives to give
> approx
> 1PB of usable storage in a 8+2 raidz configuration.
>
> Connected to this will be a ~800 nodes big HPC cluster that will
> access the storage in parallell
> is this even possible or do we need to distribute the meta data load
> over many servers? If that is the case,
> does it exist any software for FreeBSD that could  accomplish this
> distribution (pNFS  dosent seem to be
> anywhere close to usable in FreeBSD) or do I need to call NetApp or
> Panasas right away? It would be
> really nice if I could build my own storage solution.
>
> Other possible solutions to this problem is extremley welcome.
>
> Best Regards
> Peter Ankerstål
>

Re: HPC and zfs.

Michael Fuckner
In reply to this post by Daniel Kalchev
On 02/06/2012 06:34 PM, Daniel Kalchev wrote:
> The thing is doable with commodity hardware, but I wonder how one ever backs up such a setup?
zfs send to a second machine?
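
As a rough sketch of what that could look like: the helper below only
builds the command lines for one incremental replication step and does not
run them. The dataset, snapshot, and host names are hypothetical, and the
helper name is made up for illustration; the zfs subcommands used are
snapshot, send -i, and receive -F.

```python
import shlex


def replication_commands(dataset, prev_snap, new_snap, backup_host, backup_dataset):
    """Return the shell commands for one incremental replication step.

    Nothing is executed here; the caller decides how to run the commands.
    """
    snapshot = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # -i sends only the changes between the previous and the new snapshot.
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # -F rolls the backup dataset back to its latest snapshot before receiving.
    receive = ["ssh", backup_host, "zfs", "receive", "-F", backup_dataset]
    return [shlex.join(snapshot), f"{shlex.join(send)} | {shlex.join(receive)}"]


if __name__ == "__main__":
    # All names below are placeholders, not taken from any real setup.
    for cmd in replication_commands("tank/hpc", "monday", "tuesday",
                                    "backuphost", "backup/hpc"):
        print(cmd)
```

The first full copy would need a plain send without -i; after that only
the blocks changed between two snapshots go over the wire.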


RE: HPC and zfs.

Kamil Choudhury
> zfs send to a second machine?

Is there not likely to be a degradation in performance while doing the backup?

...which ties into something that I think is missing from this thread: a statement of what this massive
pool of storage is going to be used for. "HPC" is a pretty broad term, and isn't quite a specification in
my book.

I have faith that the relevant BSD technologies can be cobbled together to get OP to the performance
level he needs, but -- and perhaps I have failed basic reading comprehension here -- I don't think we
have a clear enough picture of what that level actually /is/ to offer advice on what is almost certainly
going to be a multi-million dollar installation!

KC

Re: HPC and zfs.

Erik Stian Tefre
In reply to this post by Michael Aronsen-2
On 02/06/2012 05:49 PM, Michael Aronsen wrote:
>> - How you plan on getting roughly 410 hard disks (or 422 assuming
>>  an additional 12 SSDs) hooked up to a single machine
>
> Use LSI SAS92XX 4 (x4) port external controllers, and SuperMicro SC847E26-RJBOD1 disk shelves.
> Each disk shelf needs 2 ports on the LSI controller, which means you get 90 disks per LSI card.
> The DL580/585's have 11 PCIe slots, so you'd end up with 990 disks per server using this setup.

The backplanes/expanders in the SC847E16-RJBOD1 (E16 = SATA/single
expander version, E26 = SAS/dual expander version) can be daisy chained
if you want to lower the number of controllers/ports. I have no E26 to
test on but I guess it may support daisy chaining too.

The LSI 9205-8e controller claims to support 1024 devices being
connected to its two 4x6Gb ports. Does anyone happen to know how many
daisy chain jumps are supported by SAS? Would it be possible to daisy
chain 10 shelves with 450 drives onto 1 of those controllers? That would
require 10 jumps on two chains or 20 jumps on one chain...
(Yes, 6000*8/450 = 107 Mbit/s per drive, but ignore that part. ;)
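
For anyone who wants to play with the numbers, here is the same
arithmetic as a small sketch. The 45 drives per shelf follows from 450
drives in 10 shelves; the second figure assumes 8b/10b line encoding on
6Gb SAS links, which is an added assumption and not a number from this
thread. The function and variable names are made up for illustration.

```python
def per_drive_mbit(lanes, lane_mbit, drives):
    """Bandwidth share per drive if all drives stream at the same time."""
    return lanes * lane_mbit / drives


SHELVES = 10
DRIVES_PER_SHELF = 45            # 450 drives spread over 10 shelves
LANES = 2 * 4                    # two x4 ports on one controller
LANE_MBIT = 6000                 # raw line rate per lane
LANE_MBIT_8B10B = 6000 * 8 // 10 # payload rate, assuming 8b/10b encoding

drives = SHELVES * DRIVES_PER_SHELF
raw_share = per_drive_mbit(LANES, LANE_MBIT, drives)
payload_share = per_drive_mbit(LANES, LANE_MBIT_8B10B, drives)

if __name__ == "__main__":
    print(f"{drives} drives, {raw_share:.1f} Mbit/s raw per drive")
    print(f"{payload_share:.1f} Mbit/s per drive after encoding overhead")
```

Either way, a single controller would be a severe bottleneck for that
many spindles, so this only makes sense as a connectivity question.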

--
Erik

Re: HPC and zfs.

Ronald Klop-2
In reply to this post by Peter Ankerstål-2
On Mon, 06 Feb 2012 16:52:11 +0100, Peter Ankerstål <[hidden email]> wrote:

> Hi,
>
> I want to investigate if it is possible to create your own usable HPC  
> storage using zfs and some
> network filesystem like nfs.
>
> Just a thought experiment..
> A machine with 2 6 core XEON, 3.46Ghz 12MB and 192GB of ram (or more)
> In addition the machine will use 3-6 SSD drives for ZIL and 3-6 SSD
> drives for cache.
> Preferably in mirror where applicable.
>
> Connected to this machine we will have about 410 3TB drives to give  
> approx
> 1PB of usable storage in a 8+2 raidz configuration.
>
> Connected to this will be a ~800 nodes big HPC cluster that will access
> the storage in parallel
> is this even possible or do we need to distribute the metadata load
> over many servers? If that is the case,
> is there any software for FreeBSD that could accomplish this
> distribution (pNFS doesn't seem to be
> anywhere close to usable in FreeBSD) or do I need to call NetApp or
> Panasas right away? It would be
> really nice if I could build my own storage solution.
>
> Other possible solutions to this problem are extremely welcome.
>
> Best Regards
> Peter Ankerstål

You might make a call to Backblaze.

http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

Ronald.

Re: HPC and zfs.

Charles Orbello
In reply to this post by Michael Aronsen-2
Hi Michael

What is the impact on read and write latency of using a
distributed system?

Regards
Charles

On 06/02/2012 17:49, Michael Aronsen wrote:

> Hi,
>
> On Feb 6, 2012, at 17:22 , Jeremy Chadwick wrote:
>> - What single motherboard supports up to 192GB of RAM
> Get an HP DL580/585 - they support 2TB/1TB RAM.
>
>> - How you plan on getting roughly 410 hard disks (or 422 assuming
>>   an additional 12 SSDs) hooked up to a single machine
> Use LSI SAS92XX 4 (x4) port external controllers, and SuperMicro SC847E26-RJBOD1 disk shelves.
> Each disk shelf needs 2 ports on the LSI controller, which means you get 90 disks per LSI card.
> The DL580/585's have 11 PCIe slots, so you'd end up with 990 disks per server using this setup.
>
>> If you are considering investing the time and especially money (the cost
>> here is almost unfathomable, IMO) into this, I strongly recommend you
>> consider an actual hardware filer (e.g. NetApp).  Your performance and
>> reliability will be much greater, plus you will get overall better
>> support from NetApp in the case something goes wrong.  In the case you
>> run into problems with FreeBSD (and I can assure you in this kind of
>> setup you will) with this kind of extensive setup, you will be at the
>> mercy of developers' time/schedules with absolutely no guarantee that
>> your problem will be solved.  You definitely want a support contract.
>> Thus, go NetApp.
> We have NetApp's at our University for home storage, but I would struggle to recommend them for HPC storage.
>
> A dedicated HPC filesystem such as Lustre or FhGFS (http://www.fhgfs.com/cms/) will almost certainly give you better performance as they're purpose made.
>
> We use FhGFS in a rather small setup (44 TB usable space and ~200 HPC nodes), but they do have installations with 700TB+.
> The setup consists of 2 metadata nodes and 4 storage nodes, all supermicro servers with 24 WD Velociraptor 600 GB 10K RPM disks.
> This setup gives us 4.8GB/sec write and 4.3GB/sec read speeds, all for a lot less than a comparable NetApp solution (we paid around €30.000).
> It now has support for mirroring on a per folder level for resilience.
>
> Currently it only runs on Linux but I'm considering a FreeBSD port to get ZFS for volume management and now that OFED is in FreeBSD 9, InfiniBand is possible.
>
> I'd highly recommend a parallel filesystem, unfortunately not many, if any, are available on FreeBSD at this time.
>
> Regards,
> Michael