|
Hi folks,
We had a server crash that required a hard reboot. The system is on one disk, and another disk mounts /usr/jails. Everything runs in jails on a pristine base system, and the base system is working perfectly.

On the second volume, the one with the jails, every jail directory disappeared except one. df still shows the data being used, so I'm guessing it's a logical error in the directory structure or something. I unmounted the drive and ran fsck, which reported no problems. df shows the data being used, so where is the data?

This is FreeBSD 8.2, updated and patched. The volume was UFS + Journal.

Any help is GREATLY appreciated!

Thanks!

--
Alejandro Imass
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[hidden email]" |
|
On Fri, Apr 27, 2012 at 7:52 PM, Alejandro Imass <[hidden email]> wrote:
> Hi folks,
>
> We had a server crash and required a hard reboot. The system is on one
> disk and another disk mounts /usr/jails and everything runs in jails,
> pristine base system, and the base system is working perfectly.
>
> The second volume, the one with the jails mounted but every jail
> directory disappeared except one. df still shows the data being used
> so I'm guessing it's a logical error in the directory structure or
> something. I unmounted the drive and ran fsck and reported no
> problems. df shows the data being used so where is the data??

OK, so here is an update; maybe someone has a clue here.

All the jails wound up in the /usr/local/etc/apache22 directory of the only surviving jail, which is the HTTP proxy to all the other jails.

Right before the server crashed I noticed MySQL at 100% on several CPUs and the server was on its knees, so I'm wondering: was this an attack? Is it possible that Apache or MySQL moved the files?

I mean, the jails are there, and I'm even backing them up right now, but how did these directories move here?

Does anybody have ANY logical explanation?

Thanks,

--
Alejandro Imass

> This is FreeBSD 8.2 updated, patched etc. The volume was UFS + Journal
>
> Any help is GREATLY appreciated!
>
> Thanks!
>
> --
> Alejandro Imass |
|
Hi,
On Saturday 28 April 2012 09:33:47 Alejandro Imass wrote:
> On Fri, Apr 27, 2012 at 7:52 PM, Alejandro Imass <[hidden email]> wrote:
> >
> > We had a server crash and required a hard reboot. The system is on one
> > disk and another disk mounts /usr/jails and everything runs in jails,
> > pristine base system, and the base system is working perfectly.
> >
> > The second volume, the one with the jails mounted but every jail
> > directory disappeared except one. df still shows the data being used
> > so I'm guessing it's a logical error in the directory structure or
> > something. I unmounted the drive and ran fsck and reported no
> > problems. df shows the data being used so where is the data??

What does du say?

> OK, so here is an update, maybe someone has some clue here....
>
> All the jails wound up in the /usr/local/etc/apache22 of the only
> surviving jail which is the http proxy to all the other jails.

Do you mean that all the data you were looking for has been moved to this directory?

> Right before the server crashed I noticed MySQL at 100% on several CPUs
> and the server was on its knees, so I'm wondering.... was this an
> attack? is it possible that Apache or MySQL moved the files??
>
> I mean the jails are there, I'm even backing them up right now.... but
> how did these directories move here?????
>
> Anybody has ANY logical explanation???

Journaling is new to me. Could this be the cause?

Erich |
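The question about du points at a common check: compare the space the filesystem reports as allocated (what df shows) with the space reachable by walking the directory tree (what du shows). The following is a minimal Python sketch of that comparison, assuming a POSIX system. The function names are illustrative and are not part of any FreeBSD tool.

```python
import os


def fs_used_bytes(mount_point):
    """Bytes the filesystem reports as allocated (roughly what df shows)."""
    st = os.statvfs(mount_point)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def tree_used_bytes(root):
    """Bytes allocated to everything reachable under root (roughly du -x)."""
    root_st = os.lstat(root)
    seen = {(root_st.st_dev, root_st.st_ino)}
    total = root_st.st_blocks * 512  # st_blocks counts 512-byte units

    def count(path):
        """Add one entry's blocks; return True if it is on root's filesystem."""
        nonlocal total
        try:
            st = os.lstat(path)
        except OSError:
            return False  # entry vanished or is unreadable
        if st.st_dev != root_st.st_dev:
            return False  # another filesystem mounted below root
        key = (st.st_dev, st.st_ino)
        if key not in seen:  # count hard-linked files only once
            seen.add(key)
            total += st.st_blocks * 512
        return True

    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories on other filesystems so os.walk skips them.
        dirnames[:] = [d for d in dirnames if count(os.path.join(dirpath, d))]
        for name in filenames:
            count(os.path.join(dirpath, name))
    return total
```

Calling both functions on the mount point (for example /usr/jails) and comparing the results gives a rough answer: if the two numbers are close, the data is still reachable somewhere in the tree, as turned out to be the case in this thread. A large gap would suggest blocks that are allocated but not reachable from any directory.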
|
On Fri, Apr 27, 2012 at 11:00 PM, Erich Dollansky
<[hidden email]> wrote:
> Hi,
>
> On Saturday 28 April 2012 09:33:47 Alejandro Imass wrote:
>> On Fri, Apr 27, 2012 at 7:52 PM, Alejandro Imass <[hidden email]> wrote:
>> >
>> > We had a server crash and required a hard reboot. The system is on one
>> > disk and another disk mounts /usr/jails and everything runs in jails,
>> > pristine base system, and the base system is working perfectly.
>> >
>> > The second volume, the one with the jails mounted but every jail
>> > directory disappeared except one. df still shows the data being used
>> > so I'm guessing it's a logical error in the directory structure or
>> > something. I unmounted the drive and ran fsck and reported no
>> > problems. df shows the data being used so where is the data??
>
> what is du saying?
>>
>> OK, so here is an update, maybe someone has some clue here....
>>
>> All the jails wound up in the /usr/local/etc/apache22 of the only
>> surviving jail which is the http proxy to all the other jails.
>
> You want to say that all the data you were looking for have been moved to this directory?

EXACTLY THAT. In fact the data is intact and I have already backed up everything to another disk.

>> Right before the server crashed I noticed MySQL at 100% on several CPUs
>> and the server was on its knees, so I'm wondering.... was this an
>> attack? is it possible that Apache or MySQL moved the files??
>>
>> I mean the jails are there, I'm even backing them up right now.... but
>> how did these directories move here?????
>>
>> Anybody has ANY logical explanation???
>
> Journaling is new to me. Could this be the cause?

Maybe so, I have no idea. Maybe it's because EzJail mounts volumes with each jail, or some other wild explanation. I honestly have never seen this before. I am just glad that UFS was nice enough to keep my data somewhere at least, and after my bad experiences with ZFS I can now say with a lot more certainty that UFS rocks. I mean, something got screwed up but the data was not lost.

Hope someone can shed some light here.

--
Alejandro

> Erich |
|
In reply to this post by Alejandro Imass-6
> something. I unmounted the drive and ran fsck and reported no
> problems. df shows the data being used so where is the data??

Your data is there, since df shows the usage and fsck sees no errors.

Most probably the root directory of that volume got corrupted, and the subdirectories were found and put in lost+found. |
|
In reply to this post by Alejandro Imass-6
> All the jails wound up in the /usr/local/etc/apache22 of the only
> surviving jail which is the http proxy to all the other jails.
>
> Right before the server crashed I noticed MySQL at 100% on several CPUs
> and the server was on its knees, so I'm wondering.... was this an
> attack? is it possible that Apache or MySQL moved the files??
>
> I mean the jails are there, I'm even backing them up right now.... but
> how did these directories move here?????
>
> Anybody has ANY logical explanation???

99% - someone moved them.
1% - a hardware problem, possibly memory. Otherwise there is no way for a directory to be "accidentally" moved. |
|
On Sat, Apr 28, 2012 at 1:43 AM, Wojciech Puchar
<[hidden email]> wrote:
>> All the jails wound up in the /usr/local/etc/apache22 of the only
>> surviving jail which is the http proxy to all the other jails.
>>
>> Right before the server crashed I noticed MySQL at 100% on several CPUs
>> and the server was on its knees, so I'm wondering.... was this an
>> attack? is it possible that Apache or MySQL moved the files??
>>
>> I mean the jails are there, I'm even backing them up right now.... but
>> how did these directories move here?????
>>
>> Anybody has ANY logical explanation???
>
> 99% - someone moved them.
> 1% - hardware problem possibly memory. without this there is no way for
> directory to be "accidentally" moved

I somewhat agree, but it wasn't a person. I am the only administrator, the only one with root access. The jails were effectively moved to the /usr/local/etc/apache22 of the single jail that survived at the top level. I'm thinking of something between mount, EzJail, the journal and the way MySQL created a great deal of head contention, so something must have gotten corrupted at the directory level like you state. The strange part is that there is no _data_ corruption as such, because I was able to physically archive the jails, move them to the correct directory and archive them all with ezjail-admin to a different disk.

I was thinking of formatting the jails drive, but after all this disk activity with no errors, and with everything booting up correctly, I am not so sure now that it's needed. |
|
> I somewhat agree, but it wasn't a person. I am the only administrator,
> the only one with root access. The jails were effectively moved to the
> /usr/local/etc/apache22 of the single that survived at the top level.
> I'm thinking something between mount, EzJail, the journal and the way
> MySQL created a great deal of head contention, so something must have
> gotten corrupted at the directory level like you state, but the
> strange part is no _data_ corruption as such, because I was able to
> physically archive the jails, move them to the correct directory and

No matter what you do, FreeBSD DOES NOT randomly move directories. If you are sure you didn't move them yourself, then it must be a hardware problem, but that is still unlikely. |
|
On Sat, Apr 28, 2012 at 3:22 AM, Wojciech Puchar
<[hidden email]> wrote:
>> I somewhat agree, but it wasn't a person. I am the only administrator,
>> the only one with root access.
>> [...]
>
> no matter what you do FreeBSD DOES NOT randomly move directories. if you are
> sure you didn't move it yourself then it must be machine hardware problem
> but still unlikely.

After a little more research, ___it is NOT unlikely at all___ that under high distress and a hard boot, UFS could have somehow corrupted the directory structure while keeping the data intact. From what I've learned so far, UFS is actually divided into two layers: one that controls the directory structure and metadata, and a lower layer containing the data. So the directories being screwed up while the data stays intact is actually quite possible.

What I'm trying to do is figure out how it happened and try to prevent it from happening again. So instead of dismissing it as an impossibility, I think we should all spend a little time figuring out how these things can happen and how they can be prevented or reduced.

"Should you find your neighbor's beard catch fire, it's wise to soak one's own"

--
Alejandro |
|
Alejandro Imass <[hidden email]> wrote:
> On Sat, Apr 28, 2012 at 3:22 AM, Wojciech Puchar
> <[hidden email]> wrote:
> > [...]
> > no matter what you do FreeBSD DOES NOT randomly move directories. if you are
> > sure you didn't move it yourself then it must be machine hardware problem
> > but still unlikely.
>
> After a little more research, ___it is NOT unlikely at all___ that
> under high distress and a hard boot, UFS could have somehow corrupted
> the directory structure, whilst maintaining the data intact.

This is technically accurate, *BUT* the specifics of the quote "corruption" unquote in the case under discussion make it *EXTREMELY* unlikely that this is what happened.

99.99+++% of all UFS filesystem "corruption" issues are the result of a system crash _between_ the time cached 'meta-data' is updated in memory and the time that data is flushed to disk (a deferred write).

The second most common (and vanishingly rare) failure mode is a power failure _as_ a sector of disk is being written, resulting in 'garbage data' being written to disk.

The next possibility is 'cosmic rays'. If running on 'cheap' hardware (i.e., without 'ECC' memory), this can cause a *SINGLE-BIT* error in data being output.

The fact that the 'corrupted' filesystem passed fsck -without- any reported errors shows that everything in the filesystem meta-data was consistent.

Given *that*, there are precisely *TWO* ways that the 'results' that have been reported could have happened.

  1) "Something" did a mv(1) (that is, a rename(2) call) of the various jail directories 'from' their original location to the 'apache' directory. This involves simply *copying* the directory entry from the jail's 'parent directory' to the apache directory, and then marking the entry in the original parent as 'unused'. Nothing other than the directory where the jail 'used to live' and the directory 'where it was found' is touched. This occurred _through_ the system 'mv' function, so all the normal 'housekeeping' was done properly.

  2) It was -not- done through mv -- but that requires a whole *series* of "corruptions" of the filesystem, _ALL_ of which had to occur in 'exactly' the right way. They are:
     1) The -size- (filesystem metadata) of the original parent directory had to be changed to reflect the smaller size.
     2) The 'indirect block' info for the original parent directory had to be changed to reflect the absence of the block(s) that are no longer part of that file.
     3) The _size_ of the Apache directory had to be increased to reflect the additional block(s) that are now part of that directory.
     4) The 'indirect block' info for the apache directory had to be changed to reflect the presence of the new block(s) that are now part of that file.

     This requires multiple -hundreds- of bits 'in error', in a minimum of FOUR separate disk locations. A -single- failure simply *CANNOT* cause all of this.

The probability of a random single-bit error in a gigabyte of RAM is on the order of one such occurrence in six months. The odds of having multiple *simultaneous* errors is the probability of a single-bit error raised to the power of the number of bits in error. E.g., the probability of a simultaneous 10-bit random error is roughly 1 in 30 million years.

The odds of it being a -specific- ten bits out of that gigabyte are preposterously small. The odds of the required specific _multiple-hundreds_ of bits in error occurring are (conservatively)

    1 in (30 million years)**50 * ((2**30)!) / ((2**9)!)

The first factor, alone, is over 7.1E373 years.

I think it is safe to conclude that the probabilities -greatly- favor alternative #1. |
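The size of the first factor quoted above can be checked with exact integer arithmetic. The following Python sketch only reproduces the arithmetic of the expression as written in the message; it does not validate the underlying error-rate assumptions. The constant and function names are illustrative.

```python
from math import lgamma, log

YEARS_PER_EVENT = 30_000_000  # "1 in 30 million years", taken from the message
EXPONENT = 50                 # the exponent used in the message


def scientific(n, digits=4):
    """Format a positive integer in truncated scientific notation."""
    s = str(n)
    mantissa = s[1:digits] or "0"
    return f"{s[0]}.{mantissa}E{len(s) - 1}"


def log10_factorial(n):
    """log10(n!) via log-gamma, because n! itself is far too large to print."""
    return lgamma(n + 1) / log(10)


# Python integers are arbitrary precision, so this power is exact.
first_factor = YEARS_PER_EVENT ** EXPONENT

# The factorial ratio is only reported as an order of magnitude.
ratio_log10 = log10_factorial(2**30) - log10_factorial(2**9)

print("first factor            :", scientific(first_factor))
print("log10 of factorial ratio: %.3e" % ratio_log10)
```

The first printed value should agree with the "over 7.1E373" figure in the message; the factorial ratio only pushes the total further out.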
|
On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi
<[hidden email]> wrote:
> [...]
> Given *that*, there are precisely *TWO* ways that the 'results' that have
> been reported could have happened.
>
> 1) "Something" did a mv(2) of the various jail directories 'from' their
> original location to the 'apache' directory.
> [...]
> I think it is safe to conclude that the probabilities -greatly- favor
> alternative #1.

OK. So after your comments and further research I concur with you on the mv, but if it wasn't a human, then this might be exposing a serious security flaw in the jail system or in the way EzJail implements it. The whole point of using jails is to protect against things like this happening. Given that the only jail that survived was the front-end Apache web server/reverse proxy, it is also safe to suspect that the apache (or other) process running in it was able to perform a mv of the rest of the jails to its own /usr/local/etc/apache22 directory.

Is there no possibility that after the system crash, the journal recovery process and/or fsck could have moved these directories?

Thanks,

--
Alejandro |
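The "alternative #1" case discussed in this exchange, a move within a single filesystem, can be observed directly: a rename changes only the directory entries, while the moved directory and its contents keep their inode numbers and data blocks. The following is a minimal Python sketch on a POSIX system. The directory names are hypothetical and only loosely modelled on the thread.

```python
import os
import tempfile


def rename_keeps_inodes():
    """Move a directory with os.rename and report what changed."""
    with tempfile.TemporaryDirectory() as top:
        # Hypothetical layout: a jail directory and a target directory.
        src = os.path.join(top, "jails", "somejail")
        dst_parent = os.path.join(top, "proxy", "apache22")
        os.makedirs(src)
        os.makedirs(dst_parent)
        with open(os.path.join(src, "data.txt"), "w") as fh:
            fh.write("payload")

        dir_inode = os.stat(src).st_ino
        file_inode = os.stat(os.path.join(src, "data.txt")).st_ino

        dst = os.path.join(dst_parent, "somejail")
        os.rename(src, dst)  # wraps rename(2): only directory entries change

        return {
            "old_path_gone": not os.path.exists(src),
            "dir_inode_same": os.stat(dst).st_ino == dir_inode,
            "file_inode_same":
                os.stat(os.path.join(dst, "data.txt")).st_ino == file_inode,
        }


print(rename_keeps_inodes())
```

This is why a moved tree looks fully intact afterwards and why fsck has nothing to complain about: from the filesystem's point of view the result of a rename is a perfectly consistent state.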
|
On Sat, Apr 28, 2012 at 12:36 PM, Alejandro Imass <[hidden email]> wrote:
> On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi
> <[hidden email]> wrote:
>> [...]
>> I think it is safe to conclude that the probabilities -greatly- favor
>> alternative #1.
>
> OK. So after your comments and further research I concur with you on
> the mv but if it wasn't a human, then this might be exposing a serious
> security flaw in the jail system or the way EzJail implements it. The
> whole point of using jails is to protect things like this from
> happening. Given that the only jail that survived was the front-end
> Apache Web server/reverse proxy, then it is also safe to suspect the
> apache (or other) process running on it was able to perform a mv of
> the rest of the jails to its own /usr/local/etc/apache22 directory.
>
> Is there no possibility is that after the system crash, the journal
> recovery process and/or fsck could have moved this directories ?

Also note that even the EzJail basejail was moved, so it could be a security hole in the way nullfs is used or in nullfs itself. The curious thing is that the basejail is supposed to be mounted read-only, so how did that get moved to the http-proxy jail? That is why I suspect it could have been something in the boot process, such as the journal recovery, fsck or something else with that kind of privilege, running at a time when the EzJail filesystems were unmounted.

--
Alejandro |
|
In reply to this post by Alejandro Imass-6
Alejandro Imass <[hidden email]> wrote:
> On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi
> <[hidden email]> wrote:
> > [...]
> > I think it is safe to conclude that the probabilities -greatly- favor
> > alternative #1.
>
> OK. So after your comments and further research I concur with you on
> the mv but if it wasn't a human, then this might be exposing a serious
> security flaw in the jail system or the way EzJail implements it.

BOGON ALERT!!!

Jails only prevent stuff -inside- the jail from affecting stuff outside the jail. They do *NOT* prevent stuff 'outside the jail' from affecting stuff INSIDE the jail.

"For any fool-proof system, there exists a *sufficiently*determined* fool capable of breaking it."

> The
> whole point of using jails is to protect things like this from
> happening.

FALSE TO FACT.

> Given that the only jail that survived was the front-end
> Apache Web server/reverse proxy, then it is also safe to suspect the
> apache (or other) process running on it was able to perform a mv of
> the rest of the jails to its own /usr/local/etc/apache22 directory.

FALSE TO FACT.

> Is there no possibility is that after the system crash, the journal
> recovery process and/or fsck could have moved this directories ?

"Anything is 'possible'" -- c.f. 'nasal monkeys'.

HOWEVER, if, for example, you would bother to examine the source code for fsck, you would discover that it doesn't do -anything- 'significant' without ASKING FIRST. You reported it didn't find any problems -- not even any of the 'petty' ones it will correct w/o asking -if- the '-p' option is specified.

"Journal recovery" _could_, 'theoretically', have done it -- *IF* "something else" did the 'mv' just before the crash, and that operation was journaled but not yet committed to disk at the time of the crash. However, on a standard UFS filesystem, filesystem metadata updates are written 'synchronously', which should eliminate _that_ wild, unfounded speculation.

"You sir, don't know what you don't know, and much of what you "think" you know is incorrect." |
|
On Sat, Apr 28, 2012 at 1:31 PM, Robert Bonomi <[hidden email]> wrote:
> [...]
>> OK. So after your comments and further research I concur with you on
>> the mv but if it wasn't a human, then this might be exposing a serious
>> security flaw in the jail system or the way EzJail implements it.
>
> BOGON ALERT!!!

I admit my ignorance of how the filesystem works, but I don't think your condescending remarks add a lot of value. The issue here is that this actually happened, and there is a flaw somewhere other than "the stupid administrator did it". |
|
On Sat, 28 Apr 2012 13:52:02 -0400, Alejandro Imass wrote:
> On Sat, Apr 28, 2012 at 1:31 PM, Robert Bonomi <[hidden email]> wrote:
> >
> > Alejandro Imass <[hidden email]> wrote:
> >> On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi
> >> <[hidden email]> wrote:
> >> > Alejandro Imass <[hidden email]> wrote:
> >> >> After a little more research, ___it is NOT unlikely at all___ that
> >> >> under high distress and a hard boot, UFS could have somehow corrupted
> >> >> the directory structure, whilst maintaining the data intact.
> >> >
> >> > This is technically accurate, *BUT* the specifics of the quote "corruption"
> >> > unquote in the case under discussion make it *EXTREMELY* unlikely that this
> >> > is what happened.
> >> >
> >> > 99.99+++% of all UFS filesystem "corruption" issues are the result of a
> >> > system crash _between_ the time cached 'meta-data' is updated in memory
> >> > and that data is flushed to disk (a deferred write).
> >> >
> >> > The second most common (and vanishingly rare) failure mode is a powerfail
> >> > _as_ a sector of disk is being written -- resulting in 'garbage data'
> >> > being written to disk.
> >> >
> >> > The next possibility is 'cosmic rays'. If running on 'cheap' hardware
> >> > (i.e., without 'ECC' memory), this can cause a *SINGLE-BIT* error in
> >> > data being output.
> >> >
> >> > The fact that the 'corrupted' filesystem passed fsck -without- any reported
> >> > errors shows that everything in the filesystem meta-data was consistent
> >> >
> >> [...]
> >>
> >> > I think it is safe to conclude that the probabilities -greatly- favor
> >> > alternative #1.
> >> >
> >>
> >> OK. So after your comments and further research I concur with you on
> >> the mv but if it wasn't a human, then this might be exposing a serious
> >> security flaw in the jail system or the way EzJail implements it.
> >
> > BOGON ALERT!!!
> >
>
> I admit my ignorance on how the filesystem works but I don't think
> your condescending remarks add a lot of value. The issue here is this
> actually happened and there is a flaw somewhere other than "the stupid
> administrator did it".

If you search the archives of this list, you'll find my _first_ post
to it: I had a similar problem. After a crash (panic -> reboot -> fsck
trouble), df showed that the data must still be there, but the files
weren't (_not_ even in lost+found). It's quite possible that this can
happen in _exceptional_ moments. The fsck program is intended to repair
the most typical file system faults, but it will do nothing
"complicated" without interaction: altering data on disk will _always_
involve the responsible (!) admin, who has to confirm that the change
is really intended.

There can be many reasons. I never found out the reason for the trouble
I had. Some years ago, a "make" failed on me because "/uss/src/blah...
something not found", and a quick memtest revealed the secret: a
defective RAM module had caused a "bit error", and the difference
between "r" and "s" is just one bit. I replaced the module and
everything worked. Mean soldering rays from outer space. :-)

You'll find many useful forensic tools in the ports collection that
might help locate "lost" data (quotes intended as long as the data is
still on the disk). The more complex your setup is (e.g. striped disks,
or ZFS), the closer this gets to impossible. "Plain old UFS" can
sometimes be your saviour (but BACKUP should be your real friend).

--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
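The single-bit difference between "r" and "s" in the RAM anecdote is easy to check. A minimal Python sketch (the helper name `differing_bits` is made up for this illustration):

```python
def differing_bits(a: str, b: str) -> int:
    """Return the number of bit positions in which two single characters differ."""
    # XOR leaves a 1 in every position where the two code points disagree.
    return bin(ord(a) ^ ord(b)).count("1")


if __name__ == "__main__":
    # 'r' is 0x72 and 's' is 0x73: only the lowest bit differs.
    print(f"r = {ord('r'):08b}")
    print(f"s = {ord('s'):08b}")
    print("bits that differ:", differing_bits("r", "s"))
```

So one flipped bit in memory is enough to turn "/usr/src" into "/uss/src".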
|
On Sat, Apr 28, 2012 at 2:01 PM, Polytropon <[hidden email]> wrote:
> On Sat, 28 Apr 2012 13:52:02 -0400, Alejandro Imass wrote:
>> On Sat, Apr 28, 2012 at 1:31 PM, Robert Bonomi <[hidden email]> wrote:
>> > [...]
>> > BOGON ALERT!!!
>> >
>>
>> I admit my ignorance on how the filesystem works but I don't think
>> your condescending remarks add a lot of value. The issue here is this
>> actually happened and there is a flaw somewhere other than "the stupid
>> administrator did it".
>
> If you search the archives of this list, you'll find my _first_
> post to that list: I've had a similar problem, df shows data
> must be there after crash (panic -> reboot -> fsck trouble), but
> files aren't there (even _not_ in lost+found). It's quite possible
> that in _exceptional_ moments this can happen. The fsck program
> is intended to repair the most typical file system faults, but
> nothing "complicated" will be done without interaction: Altering
> data on disk will _always_ involve the responsible (!) admin to
> check if it is really intended "to do so".
>
> There can be many reasons. I've never found out what was the
> [...]
> that might help locate "lost" data (quotes intended as long as
> the data is still on the disk). The more complex your setting
> is (e. g. striped disks, or ZFS), this can be nearly impossible.
> "Plain old UFS" can sometimes be your saviour (but BACKUP should
> be your real friend).
>

Thanks for your reply. I can't figure out how there was no data loss
and yet the directories moved just like that. We have nightly backups;
that's one of the things we love about EzJail and its archive feature.
The base system sits on another disk entirely and it's pristine. We
don't install anything except the basic system on the system disk, and
the other disk is used exclusively for the jails, so it is unlikely
that an outside process did the mv.

Everything points to something or someone having executed a mv, but
how was this done? And is there a potential problem that could happen
again?
And contrary to other comments here, and my admitted ignorance, I
believe there are actually 3 possibilities:

1) something inside a jail was able to move the other jails into itself
2) something outside the jails moved the jails
3) the directories were moved at reboot by journal recovery, fsck or
   something else

That is what worries me: not that it was just some random bit or
cosmic ray, but the potential of it happening again. I am not so sure
that it is *impossible* that a jail could affect other jails with
EzJail.

--
Alejandro
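One way to narrow down the three possibilities is to look at timestamps. A rename (which is what mv does within one filesystem) updates the modification time of the directories it takes an entry out of and puts it into, so the mtime of the directory where the jails turned up hints at when the last entry was added or removed there. Whether the moved directories' own ctime changes depends on the filesystem, and all of this is only meaningful if nothing has touched those directories since. A minimal Python sketch; the path is supplied by the user and `describe_times` is a made-up helper, nothing here is specific to EzJail:

```python
import os
import sys
import time


def describe_times(parent: str) -> list:
    """Return (name, ctime) for each directory directly under `parent`, oldest first."""
    rows = []
    with os.scandir(parent) as entries:
        for entry in entries:
            # Look at the entry itself, not at whatever a symlink points to.
            if entry.is_dir(follow_symlinks=False):
                rows.append((entry.name, entry.stat(follow_symlinks=False).st_ctime))
    return sorted(rows, key=lambda row: row[1])


def fmt(timestamp: float) -> str:
    """Format a Unix timestamp in local time."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))


if __name__ == "__main__":
    # Pass the directory where the jails were found; defaults to the current one.
    parent = sys.argv[1] if len(sys.argv) > 1 else "."
    print("parent mtime:", fmt(os.stat(parent).st_mtime))
    for name, ctime in describe_times(parent):
        print(fmt(ctime), name)
```

Comparing these times with the time of the crash and of the reboot would show whether the move happened before the crash (possibilities 1 and 2) or around the reboot (possibility 3).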
|
In reply to this post by Alejandro Imass
On 28/04/2012 19:52, Alejandro Imass wrote:
> On Sat, Apr 28, 2012 at 1:31 PM, Robert Bonomi <[hidden email]> wrote:
>> Alejandro Imass <[hidden email]> wrote:
>>> [...]
>>> OK. So after your comments and further research I concur with you on
>>> the mv but if it wasn't a human, then this might be exposing a serious
>>> security flaw in the jail system or the way EzJail implements it.
>> BOGON ALERT!!!
>>
> I admit my ignorance on how the filesystem works but I don't think
> your condescending remarks add a lot of value. The issue here is this
> actually happened and there is a flaw somewhere other than "the stupid
> administrator did it".
Not wanting to take any side in what could end up in personal attacks
and nasty things being said about any poster's progenitors, but:

- Jails are very widely used; in fact they are probably one of the most
  used features of FreeBSD, far beyond ZFS, MAC or any of the other
  nice thingies FreeBSD has.
- Jails are very often misused. Though not overly complex, creating a
  proper jail and upgrading it can sometimes be a bit tricky.
- Though not perfect or entirely devoid of bugs, FreeBSD 8.2 is probably
  the best thing out there when it comes to system stability. It might
  be lacking in some little nooks and crannies when it comes to perfect
  compliance with obscure standards, and it might not behave as expected
  in a very few situations, but these are extremely rare.

FreeBSD 8.2 is very widely used and this is one of the first times I
have heard of such a problem in jails. Nothing even remotely rings a
bell.

Take all this information into account and put yourself in our shoes.
When reading your problem description, most of us will be inclined to
think that you did something wrong. My personal guess would be that you
probably abused "ln" a bit too much when creating the jails (total shot
in the dark here, but it could explain what happened). I don't see how
journaling could impact your jails in any way, unless your jails were
all extremely new when the crash happened, or the I/O was such that
FreeBSD could never sync and commit the journal between the time you
created your jails and the time the system crashed. Extremely unlikely.

So my question is: were all the jails created properly? Did you cpdup
each and every one of them or were you lazy at some point? Are all the
jails properly declared in rc.conf? My guess would be that the first
jail was created in the right way, but that the others were created
using cp and ln, resulting in unexpected behaviour in the end. If I am
right then the "surviving" jail would be either the first or the last
you created.
Jerome Herman
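The "ln" guess above can be tested by listing the symbolic links in a jail directory that resolve to somewhere outside it. A minimal Python sketch, run from the host; `links_leaving` is a made-up helper and the jail path is whatever the administrator passes in. It only covers symbolic links, since hard links cannot be found this way. Note that an absolute link target is resolved against the host's root here, not the jail's, so links that are perfectly normal from inside the jail will also be reported:

```python
import os
import sys


def links_leaving(root: str) -> list:
    """Return (link_path, resolved_target) for symlinks under `root` that resolve outside it."""
    root_real = os.path.realpath(root)
    found = []
    # followlinks=False: symlinked directories are listed but not descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                target = os.path.realpath(path)
                # The target is inside the tree only if root is its path prefix.
                if os.path.commonpath([root_real, target]) != root_real:
                    found.append((path, target))
    return found


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for link, target in links_leaving(sys.argv[1]):
            print(f"{link} -> {target}")
```

If the jails that vanished were reachable only through links like these, losing the links would make them disappear from view while df still counts their data.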
|
In reply to this post by Alejandro Imass
On 04/28/2012 11:16 AM, Alejandro Imass wrote:
> That is what worries me, is that it wasn't just some random bit or
> cosmic ray, but the potential of happening again. I am not so sure
> that it is *impossible* that a jail could affect other jails with
> EzJail.

Sorry I'm late to the party. How about contacting EzJail's author and
explaining what has happened? :-) It may be a bug.

http://erdgeist.org/arts/software/ezjail/#Author
|
In reply to this post by Alejandro Imass-6
Hi,
On Saturday 28 April 2012 20:15:25 Alejandro Imass wrote:
> On Sat, Apr 28, 2012 at 3:22 AM, Wojciech Puchar
> <[hidden email]> wrote:
> >> I somewhat agree, but it wasn't a person. I am the only administrator,
> >> the only one with root access. The jails were effectively moved to the
> >> /usr/local/etc/apache22 of the single one that survived at the top level.
> >> I'm thinking something between mount, EzJail, the journal and the way
> >> MySQL created a great deal of head contention, so something must have
> >> gotten corrupted at the directory level like you state, but the
> >> strange part is no _data_ corruption as such, because I was able to
> >> physically archive the jails, move them to the correct directory and
> >
> >
> > no matter what you do FreeBSD DOES NOT randomly move directories. if you are
> > sure you didn't move it yourself then it must be machine hardware problem
> > but still unlikely.
>
> After a little more research, ___it is NOT unlikely at all___ that
> under high distress and a hard boot, UFS could have somehow corrupted
> the directory structure, whilst maintaining the data intact. From what
> I've learned so far, UFS is actually divided into 2 layers: one that
> controls the directory structure and metadata and a lower layer
> containing the data, so the directories being screwed up and the data
> intact is actually quite possible.
>
> What I'm trying to do is figure out how it happened, and try to
> prevent it from happening again, so instead of dismissing it as an
> impossibility, I think we all should spend a little time figuring out
> how these things can happen and determine how it can be prevented or
> reduced.

Somebody mentioned links. Did you use links in the jails to access the
data? If so, and the directories of the jails got damaged, the links
are gone but the original data is still there. The damaged directory
might have been fixed during the first reboot after the crash without
you ever noticing the fix.

Erich
|
In reply to this post by Alejandro Imass
Alejandro Imass <[hidden email]> wrote:
> 3) the directories were moved at reboot by journal recovery,
> fsck or something else

I think it's *extremely* unlikely that fsck was involved, because it
just doesn't do things like that. It might move an orphaned directory
(or file) to lost+found, but nowhere else. That's in addition to the
fact that, as someone already mentioned, it asks before doing anything.

I don't know enough about the details of journal recovery to comment
on it as a suspect.

> That is what worries me, is that it wasn't just some random bit
> or cosmic ray, but the potential of happening again ...

Any chance that your base system -- rather than one of the jails --
has somehow been cracked; maybe even that the cracker precipitated the
crash? It might be wise to restore the whole system from backup, the
base from a moderately old one since it doesn't change anyway, rather
than trying to recover.
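Whether the base system has been altered can be checked by comparing file hashes against a copy known to be good, such as a restored backup. A minimal Python sketch; the two directory arguments and the helper names `manifest` and `compare` are assumptions for illustration. On a system that may be compromised the result is only trustworthy if the script runs from trusted media, because a cracked system can lie about its own files:

```python
import hashlib
import os
import sys


def manifest(root: str) -> dict:
    """Map each regular file under `root` (by relative path) to its SHA-256 hex digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents are hashed.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            result[os.path.relpath(path, root)] = digest.hexdigest()
    return result


def compare(known_good: dict, current: dict) -> tuple:
    """Return sorted lists of added, removed and changed relative paths."""
    added = sorted(set(current) - set(known_good))
    removed = sorted(set(known_good) - set(current))
    changed = sorted(
        path for path in set(known_good) & set(current)
        if known_good[path] != current[path]
    )
    return added, removed, changed


if __name__ == "__main__":
    # Usage: script.py <known-good-tree> <tree-to-check>
    if len(sys.argv) == 3:
        added, removed, changed = compare(manifest(sys.argv[1]), manifest(sys.argv[2]))
        for label, paths in (("added", added), ("removed", removed), ("changed", changed)):
            for path in paths:
                print(f"{label}: {path}")
```

Since the base system described in this thread is kept pristine, any "added" or "changed" entry outside the usual log and state files would deserve a closer look.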
