UFS Crash and directories now missing

Alejandro Imass-6
Hi folks,

We had a server crash that required a hard reboot. The system is on one
disk and a second disk is mounted at /usr/jails. Everything runs in jails
on a pristine base system, and the base system is working perfectly.

The second volume, the one with the jails, mounted, but every jail
directory disappeared except one. df still shows the space as used,
so I'm guessing it's a logical error in the directory structure or
something. I unmounted the drive and ran fsck, which reported no
problems. df shows the space in use, so where is the data??

This is FreeBSD 8.2 updated, patched etc. The volume was UFS + Journal

Any help is GREATLY appreciated!

Thanks!

--
Alejandro Imass
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[hidden email]"

Re: UFS Crash and directories now missing

Alejandro Imass-6
On Fri, Apr 27, 2012 at 7:52 PM, Alejandro Imass <[hidden email]> wrote:

> Hi folks,
>
> We had a server crash and required a hard reboot. The system is on one
> disk and another disc mounts /usr/jails and everything runs in jails,
> pristine base system, and the base system is working perfectly.
>
> The second volume, the one with the jails mounted but every jail
> directory disappeared except one. df still shows the data being used
> so I'm guessing it's a logical error in the directory structure or
> something. I unmounted the drive and ran fsck and reported no
> problems. df shows the data being use so where is the data??
>

OK, so here is an update, maybe someone has some clue here....

All the jails wound up in the /usr/local/etc/apache22 of the only
surviving jail which is the http proxy to all the other jails.

Right before the server crashed I noticed MySQL at 100% on several CPUs
and the server was on its knees, so I'm wondering.... was this an
attack? Is it possible that Apache or MySQL moved the files??

I mean the jails are there, I'm even backing them up right now.... but
how did these directories move here?????

Does anybody have ANY logical explanation???

Thanks,

--
Alejandro Imass

> This is FreeBSD 8.2 updated, patched etc. The volume was UFS + Journal
>
> Any help is GREATLY appreciated!
>
> Thanks!
>
> --
> Alejandro Imass

Re: UFS Crash and directories now missing

Erich Dollansky-5
Hi,

On Saturday 28 April 2012 09:33:47 Alejandro Imass wrote:

> On Fri, Apr 27, 2012 at 7:52 PM, Alejandro Imass <[hidden email]> wrote:
> >
> > We had a server crash and required a hard reboot. The system is on one
> > disk and another disc mounts /usr/jails and everything runs in jails,
> > pristine base system, and the base system is working perfectly.
> >
> > The second volume, the one with the jails mounted but every jail
> > directory disappeared except one. df still shows the data being used
> > so I'm guessing it's a logical error in the directory structure or
> > something. I unmounted the drive and ran fsck and reported no
> > problems. df shows the data being use so where is the data??
> >

What does du say?
>
> OK, so here is an update, maybe someone has some clue here....
>
> All the jails wound up in the /usr/local/etc/apache22 of the only
> surviving jail which is the http proxy to all the other jails.

Are you saying that all the data you were looking for has been moved to this directory?
>
> Right before the server crashed I noticed MySQL at 100% o several CPUs
> and the server was on it's knees, so I'm wondering.... was this an
> attack? is it possible that Apache or MySQL moved the files??
>
> I mean the jails are there, I'm even backing them up right now.... but
> how did these directories move here?????
>
> Anybody has ANY logical explanation???

Journaling is new to me. Could this be the cause?

Erich
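
The df-versus-du question above can be checked with a short script.
What follows is a minimal sketch and not something posted in the thread:
the helper names du_bytes and df_used_bytes are made up for illustration,
and it assumes a Unix-like system where st_blocks counts 512-byte units.
df reports what the filesystem has allocated, du reports what can be
reached by walking the directory tree, so a large gap between the two
hints at data that is allocated but not where you are looking.

```python
import os
import shutil


def du_bytes(root):
    """Sum the allocated bytes of everything reachable under root.

    Roughly what `du -x` reports: hard-linked inodes are counted once
    and other mounted filesystems are not descended into.
    """
    root_dev = os.lstat(root).st_dev
    seen = set()
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Stay on one filesystem by pruning directories on other devices.
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev]
        paths = [dirpath] + [os.path.join(dirpath, n)
                             for n in dirnames + filenames]
        for path in paths:
            st = os.lstat(path)
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # already counted (hard link or revisited directory)
            seen.add(key)
            total += st.st_blocks * 512
    return total


def df_used_bytes(path):
    """Bytes in use on the filesystem holding path, as df would report."""
    return shutil.disk_usage(path).used
```

Calling du_bytes("/usr/jails") and df_used_bytes("/usr/jails") (the mount
point from the original report) and comparing the two numbers is the
whole test. In this thread the two would presumably have agreed, because
the directories had been moved within the same filesystem.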

Re: UFS Crash and directories now missing

Alejandro Imass-6
On Fri, Apr 27, 2012 at 11:00 PM, Erich Dollansky
<[hidden email]> wrote:

> Hi,
>
> On Saturday 28 April 2012 09:33:47 Alejandro Imass wrote:
>> On Fri, Apr 27, 2012 at 7:52 PM, Alejandro Imass <[hidden email]> wrote:
>> >
>> > We had a server crash and required a hard reboot. The system is on one
>> > disk and another disc mounts /usr/jails and everything runs in jails,
>> > pristine base system, and the base system is working perfectly.
>> >
>> > The second volume, the one with the jails mounted but every jail
>> > directory disappeared except one. df still shows the data being used
>> > so I'm guessing it's a logical error in the directory structure or
>> > something. I unmounted the drive and ran fsck and reported no
>> > problems. df shows the data being use so where is the data??
>> >
>
> what is du saying?
>>
>> OK, so here is an update, maybe someone has some clue here....
>>
>> All the jails wound up in the /usr/local/etc/apache22 of the only
>> surviving jail which is the http proxy to all the other jails.
>
> You want to say that all the data you were looking for have been moved to this directory?
>>

EXACTLY THAT. In fact the data is intact and I have already backed up
everything to another disk.

>> Right before the server crashed I noticed MySQL at 100% o several CPUs
>> and the server was on it's knees, so I'm wondering.... was this an
>> attack? is it possible that Apache or MySQL moved the files??
>>
>> I mean the jails are there, I'm even backing them up right now.... but
>> how did these directories move here?????
>>
>> Anybody has ANY logical explanation???
>
> Journaling is new to me. Could this be the cause?
>

Maybe so, I have no idea.

Maybe it's because EzJail mounts volumes with each jail, or some other
wild explanation. I honestly have never seen this before. I am just
glad that UFS was nice enough to keep my data somewhere at least, and
after my bad experiences with ZFS I can now say with a lot more
certainty that UFS rocks. I mean, something got screwed up but the data
was not lost.

Hope someone can shed some light here..

--
Alejandro
> Erich

Re: UFS Crash and directories now missing

Wojciech Puchar-5
In reply to this post by Alejandro Imass-6
> something. I unmounted the drive and ran fsck and reported no
> problems. df shows the data being use so where is the data??

Your data is there, since df shows the usage and fsck sees no errors. Most
probably the root directory of that volume got corrupted and the
subdirectories were found and put in lost+found.
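
A quick way to test this theory is to look inside lost+found at the top
of the affected filesystem, which is where fsck reconnects orphaned files
and directories. A minimal sketch, assuming the volume is mounted at
/usr/jails as in the original report; the helper name is made up for
illustration, and the directory may not exist at all if fsck never had
anything to reconnect.

```python
import os


def list_lost_and_found(mount_point):
    """Return the sorted entry names in <mount_point>/lost+found.

    Returns an empty list when the directory does not exist. Permission
    errors are left to propagate so they are not mistaken for "empty".
    """
    path = os.path.join(mount_point, "lost+found")
    try:
        return sorted(os.listdir(path))
    except (FileNotFoundError, NotADirectoryError):
        return []
```

An empty result from list_lost_and_found("/usr/jails") would argue
against this explanation, which matches what the original poster later
reported: the directories turned up elsewhere on the volume.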

Re: UFS Crash and directories now missing

Wojciech Puchar-5
In reply to this post by Alejandro Imass-6
>
> All the jails wound up in the /usr/local/etc/apache22 of the only
> surviving jail which is the http proxy to all the other jails.
>
> Right before the server crashed I noticed MySQL at 100% o several CPUs
> and the server was on it's knees, so I'm wondering.... was this an
> attack? is it possible that Apache or MySQL moved the files??
>
> I mean the jails are there, I'm even backing them up right now.... but
> how did these directories move here?????
>
> Anybody has ANY logical explanation???
>
99% - someone moved them.
1% - hardware problem, possibly memory. Otherwise there is no way for a
directory to be "accidentally" moved.

Re: UFS Crash and directories now missing

Alejandro Imass-6
On Sat, Apr 28, 2012 at 1:43 AM, Wojciech Puchar
<[hidden email]> wrote:

>>
>> All the jails wound up in the /usr/local/etc/apache22 of the only
>> surviving jail which is the http proxy to all the other jails.
>>
>> Right before the server crashed I noticed MySQL at 100% o several CPUs
>> and the server was on it's knees, so I'm wondering.... was this an
>> attack? is it possible that Apache or MySQL moved the files??
>>
>> I mean the jails are there, I'm even backing them up right now.... but
>> how did these directories move here?????
>>
>> Anybody has ANY logical explanation???
>>
> 99% - someone did moved them.
> 1% - hardware problem possibly memory. without this there is no way for
> directory to be "accidentally" moved

I somewhat agree, but it wasn't a person. I am the only administrator,
the only one with root access. The jails were effectively moved to the
/usr/local/etc/apache22 of the single jail that survived at the top level.
I'm thinking of some interaction between mount, EzJail, the journal and the
way MySQL created a great deal of head contention, so something must have
gotten corrupted at the directory level like you state. The
strange part is that there is no _data_ corruption as such: I was able to
physically archive the jails, move them to the correct directory and
archive them all with ezjail-admin to a different disk. I was
thinking of formatting the jails drive, but after all this disk
activity with no errors, and with everything booting up correctly, I am not
so sure now that it's needed.

Re: UFS Crash and directories now missing

Wojciech Puchar-5
> I somewhat agree, but it wasn't a person. I am the only administrator,
> the only one with root access. The jails were effectively moved to the
> /usr/local/etc/apache22 of the single that survived at the top level.
> I'm thinking something between mount, EzJail, the journal and the way
> MySQL created a great deal of head contention, so something must have
> gotten corrupted at the directory level like you state, but the
> strange part is no _data_ corruption as such, because I was able to
> physically archive the jails, move them to the correct directory and

No matter what you do, FreeBSD DOES NOT randomly move directories. If you
are sure you didn't move them yourself then it must be a hardware
problem, but that is still unlikely.

Re: UFS Crash and directories now missing

Alejandro Imass-6
On Sat, Apr 28, 2012 at 3:22 AM, Wojciech Puchar
<[hidden email]> wrote:

>> I somewhat agree, but it wasn't a person. I am the only administrator,
>> the only one with root access. The jails were effectively moved to the
>> /usr/local/etc/apache22 of the single that survived at the top level.
>> I'm thinking something between mount, EzJail, the journal and the way
>> MySQL created a great deal of head contention, so something must have
>> gotten corrupted at the directory level like you state, but the
>> strange part is no _data_ corruption as such, because I was able to
>> physically archive the jails, move them to the correct directory and
>
>
> no matter what you do FreeBSD DOES NOT ramdomly move directories. if you are
> sure you didn't move it yourself then it must be machine hardware problem
> but still unlikely.

After a little more research, ___it is NOT unlikely at all___ that
under high distress and a hard reboot, UFS could have somehow corrupted
the directory structure, whilst maintaining the data intact. From what
I've learned so far, UFS is actually divided into 2 layers: one that
controls the directory structure and metadata, and a lower layer
containing the data, so the directories being screwed up while the data
stays intact is actually quite possible.
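
The layering described here can be seen from user space: a directory is
a table of (name, inode number) pairs, while a file's data and attributes
hang off the inode. The sketch below is an illustration only, using a
temporary directory and made-up file names, and it assumes a filesystem
that supports hard links. It shows the layering and nothing more; it
says nothing about what a crash can or cannot do to those structures.

```python
import os
import tempfile


def demo_names_vs_inodes():
    """Show that two directory entries can name one inode, and that
    removing one name leaves the data reachable through the other."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first-name")
        second = os.path.join(tmp, "second-name")
        with open(first, "w") as f:
            f.write("payload")

        os.link(first, second)  # add a second directory entry for the inode
        same_inode = os.stat(first).st_ino == os.stat(second).st_ino
        link_count = os.stat(first).st_nlink

        os.unlink(first)  # remove one directory entry; the inode stays
        with open(second) as f:
            survived = f.read()
        return same_inode, link_count, survived
```

The function returns whether both names shared an inode, the link count
while both names existed, and the content read back through the
remaining name.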

What I'm trying to do is figure out how it happened, and try to
prevent it from happening again, so instead of dismissing it as an
impossibility, I think we all should spend a little time figuring out
how these things can happen and determine how they can be prevented or
reduced.

"Should you find your neighbor's beard catch fire, it's wise to soak one's own"

--
Alejandro

Re: UFS Crash and directories now missing

Robert Bonomi

 Alejandro Imass <[hidden email]> wrote:

> On Sat, Apr 28, 2012 at 3:22 AM, Wojciech Puchar
> <[hidden email]> wrote:
> >> I somewhat agree, but it wasn't a person. I am the only administrator,
> >> the only one with root access. The jails were effectively moved to the
> >> /usr/local/etc/apache22 of the single that survived at the top level.
> >> I'm thinking something between mount, EzJail, the journal and the way
> >> MySQL created a great deal of head contention, so something must have
> >> gotten corrupted at the directory level like you state, but the
> >> strange part is no _data_ corruption as such, because I was able to
> >> physically archive the jails, move them to the correct directory and
> >
> >
> > no matter what you do FreeBSD DOES NOT ramdomly move directories. if you are
> > sure you didn't move it yourself then it must be machine hardware problem
> > but still unlikely.
>
> After a little more research, ___it it NOT unlikely at all___ that
> under high distress and a hard boot, UFS could have somehow corrupted
> the directory structure, whilst maintaining the data intact.

This is technically accurate, *BUT* the specifics of the so-called
"corruption" in the case under discussion make it *EXTREMELY* unlikely
that this is what happened.

99.99+++% of all UFS filesystem "corruption" issues are the result of a
system crash _between_ the time cached 'meta-data' is updated in memory
and that data is flushed to disk (a deferred write).

The second most common (and vanishingly rare) failure mode is a power
failure _as_ a sector of disk is being written -- resulting in 'garbage
data' being written to disk.

The next possibility is 'cosmic rays'.  If running on 'cheap' hardware (i.e.,
without 'ECC' memory), this can cause a *SINGLE-BIT* error in data being
output.

The fact that the 'corrupted' filesystem passed fsck -without- any reported
errors shows that everything in the filesystem meta-data was consistent.

Given *that*, there are precisely *TWO* ways that the 'results' that have
been reported could have happened.

  1) "Something" did a mv(1), i.e. a rename(2), of the various jail
     directories 'from' their original location to the 'apache' directory.
     This involves simply *copying* the directory entry from the jail's
     'parent directory' to the apache directory, and then marking the entry
     in the original parent as 'unused'.  Nothing other than the directory
     where the jail 'used to live', and the directory 'where it was found'
     are touched.  This occurred _through_ the system's rename call, so all
     the normal 'housekeeping' was done properly.

  2) it was -not- done through rename(2) -- but that requires a whole
     *series* of "corruptions" of the filesystem, _ALL_ of which had to
     occur in 'exactly' the right way.  They are:
       a) The -size- (filesystem metadata) of the original parent directory
          had to be changed to reflect the smaller size.
       b) the 'indirect block' info for the original parent directory had to
          be changed to reflect the absence of the block(s) that are no
          longer part of that file.
       c) the _size_ of the Apache directory had to be increased to reflect
          the additional block(s) that are now part of that directory.
       d) the 'indirect block' info for the apache directory had to be
          changed to reflect the presence of the new block(s) that are now
          part of that file.

     This requires multiple -hundreds- of bits 'in error', in a minimum of
     FOUR separate disk locations. A -single- failure simply *CANNOT* cause
     all of this.
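
Alternative 1 can be illustrated from user space. The sketch below is a
demonstration under stated assumptions, not a reconstruction of what
happened on the server: it uses os.rename (Python's wrapper around the
rename system call) on temporary directories whose names are made up,
and it checks that a moved directory keeps its inode number and contents
while only its name changes parents.

```python
import os
import tempfile


def demo_rename_keeps_inode():
    """Move a directory between two parents and report what changed."""
    with tempfile.TemporaryDirectory() as tmp:
        old_parent = os.path.join(tmp, "jails")     # hypothetical names
        new_parent = os.path.join(tmp, "apache22")
        os.mkdir(old_parent)
        os.mkdir(new_parent)

        jail = os.path.join(old_parent, "www")
        os.mkdir(jail)
        with open(os.path.join(jail, "data.txt"), "w") as f:
            f.write("intact")
        inode_before = os.stat(jail).st_ino

        moved = os.path.join(new_parent, "www")
        os.rename(jail, moved)  # same filesystem: no file data is copied

        inode_after = os.stat(moved).st_ino
        with open(os.path.join(moved, "data.txt")) as f:
            content = f.read()
        return (inode_before == inode_after,
                os.listdir(old_parent),
                os.listdir(new_parent),
                content)
```

The result shows the same inode before and after, an empty old parent,
the entry present in the new parent, and unchanged file content. For a
directory, the rename also updates the moved directory's own ".." entry
so that it points at the new parent.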

The probability of a random single-bit error in a gigabyte of RAM is on the
order of one such occurrence in six months.  The odds of having multiple
*simultaneous* errors is the probability of a single-bit error raised to
the power of the number of bits in error, e.g. the probability of a
simultaneous 10-bit random error is roughly 1 in 30 million years.  The odds
of it being a -specific- ten bits out of that gigabyte are preposterously
small.  The odds of the required specific _multiple-hundreds_ of bits in
error occurring are (conservatively) 1 in
  (30 million years)**50 * ((2**30)!) / ((2**9)!)

The first factor, alone, is over 7.1E373 years.
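
The size of that first factor is easy to check with exact integer
arithmetic. The sketch below takes the figures in this message at face
value (30 million years, raised to the 50th power) and only reproduces
the arithmetic; it does not validate the underlying error model. The
function names are made up for illustration.

```python
def first_factor(years=30_000_000, exponent=50):
    """Return years**exponent as an exact integer.

    Integers are used on purpose: the result is far beyond the range of
    a double-precision float.
    """
    return years ** exponent


def scientific(n, digits=3):
    """Format a positive integer as d.ddE<exp>, truncating (not rounding)
    the mantissa, without converting through float."""
    s = str(n)
    mantissa = s[0] + "." + s[1:digits]
    return f"{mantissa}E{len(s) - 1}"
```

scientific(first_factor()) gives "7.17E373", which agrees with the
"over 7.1E373" quoted above.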

I think it is safe to conclude that the probabilities -greatly- favor
alternative #1.


Re: UFS Crash and directories now missing

Alejandro Imass-6
On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi
<[hidden email]> wrote:

>
>  Alejandro Imass <[hidden email]> wrote:
>> On Sat, Apr 28, 2012 at 3:22 AM, Wojciech Puchar
>> <[hidden email]> wrote:
>> >> I somewhat agree, but it wasn't a person. I am the only administrator,
>> >> the only one with root access. The jails were effectively moved to the
>> >> /usr/local/etc/apache22 of the single that survived at the top level.
>> >> I'm thinking something between mount, EzJail, the journal and the way
>> >> MySQL created a great deal of head contention, so something must have
>> >> gotten corrupted at the directory level like you state, but the
>> >> strange part is no _data_ corruption as such, because I was able to
>> >> physically archive the jails, move them to the correct directory and
>> >
>> >
>> > no matter what you do FreeBSD DOES NOT ramdomly move directories. if you are
>> > sure you didn't move it yourself then it must be machine hardware problem
>> > but still unlikely.
>>
>> After a little more research, ___it it NOT unlikely at all___ that
>> under high distress and a hard boot, UFS could have somehow corrupted
>> the directory structure, whilst maintaining the data intact.
>
> This is techically accurate, *BUT* the specifics of the quote "corruption"
> unquote in the case under discussion make it *EXTREMELY* unlikely that this
> is what happened.
>
> 99.99+++% of all UFS filesystem "corruption' issues are the result of a
> system crash _between_ the time cached 'meta-data' is updated in memory
> and that data is flushed to disk (a deferred write).
>
> The second most common (and vanishingly rare) failure mode is a powerfail
> _as_ a sector of disk is being written -- resulting in 'garbage data'
> being written to disk.
>
> The next possibility is 'cosmic rays'.  If running on 'cheap' hardware (i.e.,
> without 'ECC' memory), this can cause a *SINGLE-BIT* error in data being
> output.
>
> The fact that the 'corrupted' filesystem passed fsck -without- any reported
> errors shows that everything in the filesystem meta-data was consistent
>
> Given *that*, there are precisely *TWO* ways that the 'results' that have
> been reported could have happened.
>
>  1) "Something" did a mv(2) of the various jail directories 'from' their
>     original location to the 'apache' diretory.  This involves simply
>     *copying* the diretory entry from the jail's 'parent directory' to
>     the apache directory, and then marking the entry in the original
>     parent as 'unused'.  Nothing other than the  directory whre the jail
>     'used to live', and the directory 'where it was found' are touched.
>     This occured _through_ the system 'mv' function, so all the normal
>     'housekeeping' was done properly.
>
>  2) it was -not- done though mv(2) -- but that requires that a whole
>     *series* of "corruptions" of the filesystem, _ALL_ of which had to
>     occur in 'exactly' the right way.  They are:

[...]

> I think it is safe to conclude that the probabilities -greatly- favor
> alternative #1.
>

OK. So after your comments and further research I concur with you on
the mv, but if it wasn't a human, then this might be exposing a serious
security flaw in the jail system or the way EzJail implements it. The
whole point of using jails is to prevent things like this from
happening. Given that the only jail that survived was the front-end
Apache Web server/reverse proxy, it is also safe to suspect that the
apache (or other) process running on it was able to perform a mv of
the rest of the jails to its own /usr/local/etc/apache22 directory.

Is there no possibility that, after the system crash, the journal
recovery process and/or fsck could have moved these directories?

Thanks,

--
Alejandro

Re: UFS Crash and directories now missing

Alejandro Imass-6
On Sat, Apr 28, 2012 at 12:36 PM, Alejandro Imass <[hidden email]> wrote:

> On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi
> <[hidden email]> wrote:
>>
>>  Alejandro Imass <[hidden email]> wrote:
>>> On Sat, Apr 28, 2012 at 3:22 AM, Wojciech Puchar
>>> <[hidden email]> wrote:
>>> >> I somewhat agree, but it wasn't a person. I am the only administrator,
>>> >> the only one with root access. The jails were effectively moved to the
>>> >> /usr/local/etc/apache22 of the single that survived at the top level.
>>> >> I'm thinking something between mount, EzJail, the journal and the way
>>> >> MySQL created a great deal of head contention, so something must have
>>> >> gotten corrupted at the directory level like you state, but the
>>> >> strange part is no _data_ corruption as such, because I was able to
>>> >> physically archive the jails, move them to the correct directory and
>>> >
>>> >
>>> > no matter what you do FreeBSD DOES NOT ramdomly move directories. if you are
>>> > sure you didn't move it yourself then it must be machine hardware problem
>>> > but still unlikely.
>>>
>>> After a little more research, ___it it NOT unlikely at all___ that
>>> under high distress and a hard boot, UFS could have somehow corrupted
>>> the directory structure, whilst maintaining the data intact.
>>
>> This is techically accurate, *BUT* the specifics of the quote "corruption"
>> unquote in the case under discussion make it *EXTREMELY* unlikely that this
>> is what happened.
>>
>> 99.99+++% of all UFS filesystem "corruption' issues are the result of a
>> system crash _between_ the time cached 'meta-data' is updated in memory
>> and that data is flushed to disk (a deferred write).
>>
>> The second most common (and vanishingly rare) failure mode is a powerfail
>> _as_ a sector of disk is being written -- resulting in 'garbage data'
>> being written to disk.
>>
>> The next possibility is 'cosmic rays'.  If running on 'cheap' hardware (i.e.,
>> without 'ECC' memory), this can cause a *SINGLE-BIT* error in data being
>> output.
>>
>> The fact that the 'corrupted' filesystem passed fsck -without- any reported
>> errors shows that everything in the filesystem meta-data was consistent
>>
>> Given *that*, there are precisely *TWO* ways that the 'results' that have
>> been reported could have happened.
>>
>>  1) "Something" did a mv(2) of the various jail directories 'from' their
>>     original location to the 'apache' diretory.  This involves simply
>>     *copying* the diretory entry from the jail's 'parent directory' to
>>     the apache directory, and then marking the entry in the original
>>     parent as 'unused'.  Nothing other than the  directory whre the jail
>>     'used to live', and the directory 'where it was found' are touched.
>>     This occured _through_ the system 'mv' function, so all the normal
>>     'housekeeping' was done properly.
>>
>>  2) it was -not- done though mv(2) -- but that requires that a whole
>>     *series* of "corruptions" of the filesystem, _ALL_ of which had to
>>     occur in 'exactly' the right way.  They are:
>
> [...]
>
>> I think it is safe to conclude that the probabilities -greatly- favor
>> alternative #1.
>>
>
> OK. So after your comments and further research I concur with you on
> the mv but if it wasn't a human, then this might be exposing a serious
> security flaw in the jail system or the way EzJail implements it. The
> whole point of using jails is to protect things like this from
> happening. Given that the only jail that survived was the front-end
> Apache Web server/reverse proxy, then it is also safe to suspect the
> apache (or other) process running on it was able to perform a mv of
> the rest of the jails to it's own /usr/local/etc/apache22 directory.
>
> Is there no possibility is that after the system crash, the journal
> recocery process and/or fsck could have moved this directories ?
>

Also note that even the EzJail basejail was moved, so it could be
a security hole in the way nullfs is used or in nullfs itself. But the
curious thing is that the basejail is supposed to be mounted read-only,
so how did that get moved to the http-proxy jail??

That is why I suspect it could have been something in the boot process,
like the journal recovery, fsck or something else with that kind of
privilege, at a point when the EzJail filesystems were unmounted.

--
Alejandro

Re: UFS Crash and directories now missing

Robert Bonomi
In reply to this post by Alejandro Imass-6

Alejandro Imass <[hidden email]> wrote:

> On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi
> <[hidden email]> wrote:
> >  Alejandro Imass <[hidden email]> wrote:
> >> After a little more research, ___it it NOT unlikely at all___ that
> >> under high distress and a hard boot, UFS could have somehow corrupted
> >> the directory structure, whilst maintaining the data intact.
> >
> > This is techically accurate, *BUT* the specifics of the quote "corruption"
> > unquote in the case under discussion make it *EXTREMELY* unlikely that this
> > is what happened.
> >
> > 99.99+++% of all UFS filesystem "corruption' issues are the result of a
> > system crash _between_ the time cached 'meta-data' is updated in memory
> > and that data is flushed to disk (a deferred write).
> >
> > The second most common (and vanishingly rare) failure mode is a powerfail
> > _as_ a sector of disk is being written -- resulting in 'garbage data'
> > being written to disk.
> >
> > The next possibility is 'cosmic rays'.  If running on 'cheap' hardware
> > (i.e., without 'ECC' memory), this can cause a *SINGLE-BIT* error in
> > data being output.
> >
> > The fact that the 'corrupted' filesystem passed fsck -without- any reported
> > errors shows that everything in the filesystem meta-data was consistent
> >
> [...]
>
> > I think it is safe to conclude that the probabilities -greatly- favor
> > alternative #1.
> >
>
> OK. So after your comments and further research I concur with you on
> the mv but if it wasn't a human, then this might be exposing a serious
> security flaw in the jail system or the way EzJail implements it.

BOGON ALERT!!!  

Jails only prevent stuff -inside- the jail from affecting stuff outside the
jail.   They do *NOT* prevent stuff 'outside the jail' from affecting stuff
INSIDE the jail.


"For any fool-proof system, there exists a *sufficiently*determined* fool
 capable of breaking it."

>                                                                   The
> whole point of using jails is to protect things like this from
> happening.

FALSE TO FACT.

>            Given that the only jail that survived was the front-end
> Apache Web server/reverse proxy, then it is also safe to suspect the
> apache (or other) process running on it was able to perform a mv of
> the rest of the jails to it's own /usr/local/etc/apache22 directory.

FALSE TO FACT.

> Is there no possibility is that after the system crash, the journal
> recocery process and/or fsck could have moved this directories ?

"Anything is 'possible'" -- c.f. 'nasal monkeys'.

HOWEVER, if, for example, you would bother to examine the source code for
fsck you would discover that it doesn't do -anything- 'significant' without
ASKING FIRST.   You reported it didn't find any problems -- not even any
of the 'petty' ones it will correct w/o asking -if- the '-p' option is
specified.

"Journal recovery" _could_, 'theoretically' have done it -- *IF* "something
else" did the 'mv' just before the crash, and that operation was journaled,
but not yet committed to disk at the time of the crash.  However, on a
standard UFS filesystem, filesystem metadata updates are written
'synchronously', which should eliminate _that_ wild, unfounded, speculation.


 "You sir, don't know what you don't know, and much of what you "think"
 you know is incorrect."






Re: UFS Crash and directories now missing

Alejandro Imass
On Sat, Apr 28, 2012 at 1:31 PM, Robert Bonomi <[hidden email]> wrote:

>
> Alejandro Imass <[hidden email]> wrote:
>> On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi
>> <[hidden email]> wrote:
>> >  Alejandro Imass <[hidden email]> wrote:
>> >> After a little more research, ___it it NOT unlikely at all___ that
>> >> under high distress and a hard boot, UFS could have somehow corrupted
>> >> the directory structure, whilst maintaining the data intact.
>> >
>> > This is techically accurate, *BUT* the specifics of the quote "corruption"
>> > unquote in the case under discussion make it *EXTREMELY* unlikely that this
>> > is what happened.
>> >
>> > 99.99+++% of all UFS filesystem "corruption' issues are the result of a
>> > system crash _between_ the time cached 'meta-data' is updated in memory
>> > and that data is flushed to disk (a deferred write).
>> >
>> > The second most common (and vanishingly rare) failure mode is a powerfail
>> > _as_ a sector of disk is being written -- resulting in 'garbage data'
>> > being written to disk.
>> >
>> > The next possibility is 'cosmic rays'.  If running on 'cheap' hardware
>> > (i.e., without 'ECC' memory), this can cause a *SINGLE-BIT* error in
>> > data being output.
>> >
>> > The fact that the 'corrupted' filesystem passed fsck -without- any reported
>> > errors shows that everything in the filesystem meta-data was consistent
>> >
>> [...]
>>
>> > I think it is safe to conclude that the probabilities -greatly- favor
>> > alternative #1.
>> >
>>
>> OK. So after your comments and further research I concur with you on
>> the mv but if it wasn't a human, then this might be exposing a serious
>> security flaw in the jail system or the way EzJail implements it.
>
> BOGON ALERT!!!
>

I admit my ignorance on how the filesystem works but I don't think
your condescending remarks add a lot of value. The issue here is this
actually happened and there is a flaw somewhere other than "the stupid
administrator did it".

Re: UFS Crash and directories now missing

Polytropon
On Sat, 28 Apr 2012 13:52:02 -0400, Alejandro Imass wrote:

> On Sat, Apr 28, 2012 at 1:31 PM, Robert Bonomi <[hidden email]> wrote:
> >
> > Alejandro Imass <[hidden email]> wrote:
> >> On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi
> >> <[hidden email]> wrote:
> >> >  Alejandro Imass <[hidden email]> wrote:
> >> >> After a little more research, ___it it NOT unlikely at all___ that
> >> >> under high distress and a hard boot, UFS could have somehow corrupted
> >> >> the directory structure, whilst maintaining the data intact.
> >> >
> >> > This is techically accurate, *BUT* the specifics of the quote "corruption"
> >> > unquote in the case under discussion make it *EXTREMELY* unlikely that this
> >> > is what happened.
> >> >
> >> > 99.99+++% of all UFS filesystem "corruption' issues are the result of a
> >> > system crash _between_ the time cached 'meta-data' is updated in memory
> >> > and that data is flushed to disk (a deferred write).
> >> >
> >> > The second most common (and vanishingly rare) failure mode is a powerfail
> >> > _as_ a sector of disk is being written -- resulting in 'garbage data'
> >> > being written to disk.
> >> >
> >> > The next possibility is 'cosmic rays'.  If running on 'cheap' hardware
> >> > (i.e., without 'ECC' memory), this can cause a *SINGLE-BIT* error in
> >> > data being output.
> >> >
> >> > The fact that the 'corrupted' filesystem passed fsck -without- any reported
> >> > errors shows that everything in the filesystem meta-data was consistent
> >> >
> >> [...]
> >>
> >> > I think it is safe to conclude that the probabilities -greatly- favor
> >> > alternative #1.
> >> >
> >>
> >> OK. So after your comments and further research I concur with you on
> >> the mv but if it wasn't a human, then this might be exposing a serious
> >> security flaw in the jail system or the way EzJail implements it.
> >
> > BOGON ALERT!!!
> >
>
> I admit my ignorance on how the filesystem works but I don't think
> your condescending remarks add a lot of value. The issue here is this
> actually happened and there is a flaw somewhere other than "the stupid
> administrator did it".

If you search the archives of this list, you'll find my _first_
post to that list: I've had a similar problem, df shows data
must be there after crash (panic -> reboot -> fsck trouble), but
files aren't there (even _not_ in lost+found). It's quite possible
that in _exceptional_ moments this can happen. The fsck program
is intended to repair the most typical file system faults, but
nothing "complicated" will be done without interaction: Altering
data on disk will _always_ involve the responsible (!) admin to
check if it is really intended "to do so".
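
One way to put numbers on the "df shows the data but the files aren't
there" symptom is to compare what the filesystem reports as used with
what can actually be reached by walking the tree from the mount point
(not just from the directory where the data is expected). The following
is only a minimal sketch in Python, not something from this thread: the
function names are made up for illustration, and /usr/jails in the usage
note is just the mount point mentioned earlier.

```python
import os
import shutil


def visible_bytes(root):
    """Add up allocated bytes for every entry reachable under root (like du -x)."""
    root_dev = os.lstat(root).st_dev
    seen = set()  # (device, inode) pairs, so hard links are counted once
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        keep = []
        for name in dirnames:
            st = os.lstat(os.path.join(dirpath, name))
            if st.st_dev == root_dev:
                keep.append(name)
        dirnames[:] = keep  # do not descend into other mounted filesystems
        for name in keep + filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if st.st_dev != root_dev or key in seen:
                continue
            seen.add(key)
            total += st.st_blocks * 512  # st_blocks counts 512-byte units
    return total


def used_versus_visible(mount_point):
    """Return (bytes the filesystem reports as used, bytes reachable by walking)."""
    used = shutil.disk_usage(mount_point).used
    return used, visible_bytes(mount_point)
```

Calling used_versus_visible("/usr/jails") as root gives the two figures
side by side. A small gap is normal (filesystem metadata, reserved
space); if the two roughly agree, the data is still reachable somewhere
under the mount point and the question becomes where it went.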

There can be many reasons. I've never found out what was the
reason for the trouble I've had. Some years ago, I found a "make"
failing because "/uss/src/blah... something not found", and
a quick memtest revealed the secret: defective RAM module that
caused a "bit error", and the difference between "r" and "s"
is just one bit. Replaced the module - everything worked.
Mean soldering rays from outer space. :-)
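
The "r" versus "s" claim is easy to check. A small sketch in Python
(the function names are invented for this illustration):

```python
def differing_bits(a, b):
    """Return the number of bit positions in which two characters differ."""
    return bin(ord(a) ^ ord(b)).count("1")


def single_bit_neighbours(ch):
    """List the printable ASCII characters exactly one flipped bit away from ch."""
    out = []
    for bit in range(7):  # ASCII uses the low 7 bits
        flipped = chr(ord(ch) ^ (1 << bit))
        if flipped.isprintable():
            out.append(flipped)
    return out


print(differing_bits("r", "s"))    # 1
print(single_bit_neighbours("r"))  # ['s', 'p', 'v', 'z', 'b', 'R', '2']
```

"r" is 0x72 and "s" is 0x73, so flipping the lowest bit is enough to
turn /usr into /uss.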

You'll find many useful forensic tools in the ports collection
that might help locate "lost" data (quotes intended as long as
the data is still on the disk). The more complex your setting
is (e. g. striped disks, or ZFS), this can be nearly impossible.
"Plain old UFS" can sometimes be your saviour (but BACKUP should
be your real friend).





--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: UFS Crash and directories now missing

Alejandro Imass
On Sat, Apr 28, 2012 at 2:01 PM, Polytropon <[hidden email]> wrote:

> On Sat, 28 Apr 2012 13:52:02 -0400, Alejandro Imass wrote:
>> On Sat, Apr 28, 2012 at 1:31 PM, Robert Bonomi <[hidden email]> wrote:
>> >
>> > Alejandro Imass <[hidden email]> wrote:
>> >> On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi
>> >> <[hidden email]> wrote:
>> >> >  Alejandro Imass <[hidden email]> wrote:
>> >> >> After a little more research, ___it it NOT unlikely at all___ that
>> >> >> under high distress and a hard boot, UFS could have somehow corrupted
>> >> >> the directory structure, whilst maintaining the data intact.
>> >> >
>> >> > This is techically accurate, *BUT* the specifics of the quote "corruption"
>> >> > unquote in the case under discussion make it *EXTREMELY* unlikely that this
>> >> > is what happened.
>> >> >
>> >> > 99.99+++% of all UFS filesystem "corruption' issues are the result of a
>> >> > system crash _between_ the time cached 'meta-data' is updated in memory
>> >> > and that data is flushed to disk (a deferred write).
>> >> >
>> >> > The second most common (and vanishingly rare) failure mode is a powerfail
>> >> > _as_ a sector of disk is being written -- resulting in 'garbage data'
>> >> > being written to disk.
>> >> >
>> >> > The next possibility is 'cosmic rays'.  If running on 'cheap' hardware
>> >> > (i.e., without 'ECC' memory), this can cause a *SINGLE-BIT* error in
>> >> > data being output.
>> >> >
>> >> > The fact that the 'corrupted' filesystem passed fsck -without- any reported
>> >> > errors shows that everything in the filesystem meta-data was consistent
>> >> >
>> >> [...]
>> >>
>> >> > I think it is safe to conclude that the probabilities -greatly- favor
>> >> > alternative #1.
>> >> >
>> >>
>> >> OK. So after your comments and further research I concur with you on
>> >> the mv but if it wasn't a human, then this might be exposing a serious
>> >> security flaw in the jail system or the way EzJail implements it.
>> >
>> > BOGON ALERT!!!
>> >
>>
>> I admit my ignorance on how the filesystem works but I don't think
>> your condescending remarks add a lot of value. The issue here is this
>> actually happened and there is a flaw somewhere other than "the stupid
>> administrator did it".
>
> If you search the archives of this list, you'll find my _first_
> post to that list: I've had a similar problem, df shows data
> must be there after crash (panic -> reboot -> fsck trouble), but
> files aren't there (even _not_ in lost+found). It's quite possible
> that in _exceptional_ moments this can happen. The fsck program
> is intended to repair the most typical file system faults, but
> nothing "complicated" will be done without interaction: Altering
> data on disk will _always_ involve the responsible (!) admin to
> check if it is really intended "to do so".
>
> There can be many reasons. I've never found out what was the
[...]

> that might help locate "lost" data (quotes intended as long as
> the data is still on the disk). The more complex your setting
> is (e. g. striped disks, or ZFS), this can be nearly impossible.
> "Plain old UFS" can sometimes be your saviour (but BACKUP should
> be your real friend).
>

Thanks for your reply.

I can't figure out how there was no data loss and yet the directories
moved just like that. We have nightly backups; that is one of the
features we love about EzJail and its archive feature. The base
system sits on another disk entirely and it's pristine, we don't
install anything except the basic system on the system disk and the
other disk is exclusively divided in jails, so the possibility of an
outside process doing the mv is unlikely.

Everything points to something or someone having executed a mv, but how
was this done? And is there an underlying problem that could make it
happen again? Contrary to other comments here, and despite my admitted
ignorance, I believe there are actually 3 possibilities:

1) something inside a jail was able to move the other jails into itself
2) something outside the jails moved the jails
3) the directories were moved at reboot by journal recovery, fsck or
something else

That is what worries me, is that it wasn't just some random bit or
cosmic ray, but the potential of happening again. I am not so sure
that it is *impossible* that a jail could affect other jails with
EzJail.
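
One piece of evidence that might help tell these three cases apart: a
rename marks the modification and status-change times of both parent
directories for update, so the timestamps on the directory the jails
left and the directory they landed in can be compared with the time of
the crash. The sketch below is Python and purely illustrative; the
function name is made up, and the two paths in the usage note are
placeholders. It is only meaningful before anything else is created,
deleted or moved in those directories, since that resets the same
timestamps.

```python
import os
import time


def directory_change_times(paths):
    """Return {path: (mtime, ctime)} as local-time strings for each directory."""
    fmt = "%Y-%m-%d %H:%M:%S"
    result = {}
    for path in paths:
        st = os.stat(path)
        result[path] = (
            time.strftime(fmt, time.localtime(st.st_mtime)),
            time.strftime(fmt, time.localtime(st.st_ctime)),
        )
    return result
```

For example, directory_change_times(["/usr/jails", "/path/to/apache22"])
run from the host would show when each directory's contents last
changed. Times that fall before the crash would point towards 1) or 2);
times at or after the reboot would point towards 3).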

--
Alejandro

Re: UFS Crash and directories now missing

Jerome Herman
In reply to this post by Alejandro Imass
On 28/04/2012 19:52, Alejandro Imass wrote:

> On Sat, Apr 28, 2012 at 1:31 PM, Robert Bonomi<[hidden email]>  wrote:
>> Alejandro Imass<[hidden email]>  wrote:
>>> On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi
>>> <[hidden email]>  wrote:
>>>>   Alejandro Imass<[hidden email]>  wrote:
>>>>> After a little more research, ___it it NOT unlikely at all___ that
>>>>> under high distress and a hard boot, UFS could have somehow corrupted
>>>>> the directory structure, whilst maintaining the data intact.
>>>> This is techically accurate, *BUT* the specifics of the quote "corruption"
>>>> unquote in the case under discussion make it *EXTREMELY* unlikely that this
>>>> is what happened.
>>>>
>>>> 99.99+++% of all UFS filesystem "corruption' issues are the result of a
>>>> system crash _between_ the time cached 'meta-data' is updated in memory
>>>> and that data is flushed to disk (a deferred write).
>>>>
>>>> The second most common (and vanishingly rare) failure mode is a powerfail
>>>> _as_ a sector of disk is being written -- resulting in 'garbage data'
>>>> being written to disk.
>>>>
>>>> The next possibility is 'cosmic rays'.  If running on 'cheap' hardware
>>>> (i.e., without 'ECC' memory), this can cause a *SINGLE-BIT* error in
>>>> data being output.
>>>>
>>>> The fact that the 'corrupted' filesystem passed fsck -without- any reported
>>>> errors shows that everything in the filesystem meta-data was consistent
>>>>
>>> [...]
>>>
>>>> I think it is safe to conclude that the probabilities -greatly- favor
>>>> alternative #1.
>>>>
>>> OK. So after your comments and further research I concur with you on
>>> the mv but if it wasn't a human, then this might be exposing a serious
>>> security flaw in the jail system or the way EzJail implements it.
>> BOGON ALERT!!!
>>
> I admit my ignorance on how the filesystem works but I don't think
> your condescending remarks add a lot of value. The issue here is this
> actually happened and there is a flaw somewhere other than "the stupid
> administrator did it".
Ok,

Not wanting to take any side in what could end up in personal attacks
and nasty things being said about any poster's progenitors, but:

- Jails are very widely used; in fact they are probably one of the most
used functionalities of FreeBSD, far beyond ZFS, MAC or any of the other
nice thingies FreeBSD has.
- Jails are very often misused. Though not overly complex, creating a
proper jail and upgrading it can sometimes be a bit tricky.
- Though not perfect or entirely devoid of bugs, FreeBSD 8.2 is probably
the best thing there is out there when it comes to system stability. It
might be lacking in some little nooks and crannies when it comes to
perfect compliance with obscure standards, and it might not behave as
expected in a very few situations, but these are extremely rare. FreeBSD
8.2 is very widely used and this is one of the first times I have heard
of such a problem in jails. Nothing even remotely rings a bell.

Take all this information into account and put yourself in our shoes.
When reading your problem description, most of us will be inclined to
think that you did something wrong.

My personal guess would be that you probably abused "ln" a bit too
much when creating the jails (total shot in the dark here, but it could
explain what happened).  I don't see how journaling could impact your
jails in any way, unless your jails were all extremely new when the
crash happened, or the I/O load was such that FreeBSD could never sync
and commit the journal between the time you created your jails and the
time the system crashed.
Extremely unlikely.

So my question is: were all the jails created properly? Did you cpdup
each and every one of them, or were you lazy at some point? Are all the
jails properly declared in rc.conf? My guess would be that the first
jail was created in the right way, but that the others were created
using cp and ln, resulting in unexpected behaviour in the end. If I am
right, then the "surviving" jail would be either the first or the last
you created.
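
To check the "ln" theory, one could list every symbolic link and every
multiply-linked file under the jail root. This is a minimal sketch in
Python under my own assumptions, not a known ezjail tool; find_links is
a made-up name, and some symlinks may well be there by design.

```python
import os
import stat


def find_links(root):
    """Yield (path, kind, detail) for symlinks and multiply-linked files under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: look at the link itself, not its target
            if stat.S_ISLNK(st.st_mode):
                yield path, "symlink", os.readlink(path)
            elif stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                yield path, "hardlink", str(st.st_nlink)
```

Something like "for entry in find_links('/usr/jails'): print(entry)"
would show whether any jail directory is really a link into another
jail. For symlinks the third field is the link target; for hard links it
is the link count.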

Jerome Herman


Re: UFS Crash and directories now missing

Edward Martinez
In reply to this post by Alejandro Imass
On 04/28/2012 11:16 AM, Alejandro Imass wrote:
> That is what worries me, is that it wasn't just some random bit or
> cosmic ray, but the potential of happening again. I am not so sure
> that it is*impossible*  that a jail could affect other jails with
> EzJail.

   Sorry I'm late to the party. How about contacting the EZjail author
and explaining what has happened :-)
   It may be a bug?

   http://erdgeist.org/arts/software/ezjail/#Author

Re: UFS Crash and directories now missing

Erich Dollansky-5
In reply to this post by Alejandro Imass-6
Hi,

On Saturday 28 April 2012 20:15:25 Alejandro Imass wrote:

> On Sat, Apr 28, 2012 at 3:22 AM, Wojciech Puchar
> <[hidden email]> wrote:
> >> I somewhat agree, but it wasn't a person. I am the only administrator,
> >> the only one with root access. The jails were effectively moved to the
> >> /usr/local/etc/apache22 of the single that survived at the top level.
> >> I'm thinking something between mount, EzJail, the journal and the way
> >> MySQL created a great deal of head contention, so something must have
> >> gotten corrupted at the directory level like you state, but the
> >> strange part is no _data_ corruption as such, because I was able to
> >> physically archive the jails, move them to the correct directory and
> >
> >
> > no matter what you do FreeBSD DOES NOT ramdomly move directories. if you are
> > sure you didn't move it yourself then it must be machine hardware problem
> > but still unlikely.
>
> After a little more research, ___it it NOT unlikely at all___ that
> under high distress and a hard boot, UFS could have somehow corrupted
> the directory structure, whilst maintaining the data intact. From what
> I've learned so far, UFS is actually divided into 2 layers: one that
> controls the directory structure and metadata and a lower layer
> containing the data, so the directories being screwed up and the data
> intact it is actually quite possible.
>
> What I'm trying to do is figure out is how it happened, and try
> prevent it from happening again, so instead of dismissing it as
> impossibility, I think we all should spend a little time figuring out
> how these things can happen and determine how it can be prevented or
> reduced.

Somebody mentioned the links. Did you use links in the jails to access
the data? If so, and the directories of the jails got screwed, the links
are gone but the original data is still there. The damaged directory
might have been fixed during the first reboot after the crash without
you ever noticing the fix.

Erich

Re: UFS Crash and directories now missing

Perry Hutchison
In reply to this post by Alejandro Imass
Alejandro Imass <[hidden email]> wrote:

> 3) the directories were moved at reboot by journal recovery,
> fsck or something else

I think it's *extremely* unlikely that fsck was involved, because
it just doesn't do things like that.  It might move an orphaned
directory (or file) to lost+found, but nowhere else.  That's in
addition to the fact that, as someone already mentioned, it asks
before doing anything.  I don't know enough about the details of
journal recovery to comment on it as a suspect.

> That is what worries me, is that it wasn't just some random bit
> or cosmic ray, but the potential of happening again ...

Any chance that your base system -- rather than one of the jails --
has somehow been cracked; maybe even that the cracker precipitated
the crash?  It might be wise to restore the whole system from backup,
the base from a moderately old one since it doesn't change anyway,
rather than trying to recover.