summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2010-11-22pipe: fix failure to return error code on ->confirm()Nicolas Kaiser
commit e5953cbdff26f7cbae7eff30cd9b18c4e19b7594 upstream. The arguments were transposed, we want to assign the error code to 'ret', which is being returned. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22cifs: fix broken oplock handlingSuresh Jayaraman
commit aa91c7e4ab9b0842b7d7a7cbf8cca18b20df89b5 upstream. cifs_new_fileinfo() does not use the 'oplock' value from the callers. Instead, it sets it to REQ_OPLOCK which seems wrong. We should be using the oplock value obtained from the Server to set the inode's clientCanCacheAll or clientCanCacheRead flags. Fix this by passing oplock from the callers to cifs_new_fileinfo(). This change dates back to commit a6ce4932 (2.6.30-rc3). So, all the affected versions will need this fix. Please Cc stable once reviewed and accepted. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28mm: Move vma_stack_continue into mm.hStefan Bader
commit 39aa3cb3e8250db9188a6f1e3fb62ffa1a717678 upstream. So it can be used by all that need to check for that. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28execve: make responsive to SIGKILL with large argumentsRoland McGrath
commit 9aea5a65aa7a1af9a4236dfaeb0088f1624f9919 upstream. An execve with a very large total of argument/environment strings can take a really long time in the execve system call. It runs uninterruptibly to count and copy all the strings. This change makes it abort the exec quickly if sent a SIGKILL. Note that this is the conservative change, to interrupt only for SIGKILL, by using fatal_signal_pending(). It would be perfectly correct semantics to let any signal interrupt the string-copying in execve, i.e. use signal_pending() instead of fatal_signal_pending(). We'll save that change for later, since it could have user-visible consequences, such as having a timer set too quickly make it so that an execve can never complete, though it always happened to work before. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-28execve: improve interactivity with large argumentsRoland McGrath
commit 7993bc1f4663c0db67bb8f0d98e6678145b387cd upstream. This adds a preemption point during the copying of the argument and environment strings for execve, in copy_strings(). There is already a preemption point in the count() loop, so this doesn't add any new points in the abstract sense. When the total argument+environment strings are very large, the time spent copying them can be much more than a normal user time slice. So this change improves the interactivity of the rest of the system when one process is doing an execve with very large arguments. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28setup_arg_pages: diagnose excessive argument sizeRoland McGrath
commit 1b528181b2ffa14721fb28ad1bd539fe1732c583 upstream. The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not check the size of the argument/environment area on the stack. When it is unworkably large, shift_arg_pages() hits its BUG_ON. This is exploitable with a very large RLIMIT_STACK limit, to create a crash pretty easily. Check that the initial stack is not too large to make it possible to map in any executable. We're not checking that the actual executable (or intepreter, for binfmt_elf) will fit. So those mappings might clobber part of the initial stack mapping. But that is just userland lossage that userland made happen, not a kernel problem. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28xfs: properly account for reclaimed inodesJohannes Weiner
commit 081003fff467ea0e727f66d5d435b4f473a789b3 upstream. When marking an inode reclaimable, a per-AG counter is increased, the inode is tagged reclaimable in its per-AG tree, and, when this is the first reclaimable inode in the AG, the AG entry in the per-mount tree is also tagged. When an inode is finally reclaimed, however, it is only deleted from the per-AG tree. Neither the counter is decreased, nor is the parent tree's AG entry untagged properly. Since the tags in the per-mount tree are not cleared, the inode shrinker iterates over all AGs that have had reclaimable inodes at one point in time. The counters on the other hand signal an increasing amount of slab objects to reclaim. Since "70e60ce xfs: convert inode shrinker to per-filesystem context" this is not a real issue anymore because the shrinker bails out after one iteration. But the problem was observable on a machine running v2.6.34, where the reclaimable work increased and each process going into direct reclaim eventually got stuck on the xfs inode shrinking path, trying to scan several million objects. Fix this by properly unwinding the reclaimable-state tracking of an inode when it is reclaimed. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28ocfs2: Don't walk off the end of fast symlinks.Joel Becker
commit 1fc8a117865b54590acd773a55fbac9221b018f0 upstream. ocfs2 fast symlinks are NUL terminated strings stored inline in the inode data area. However, disk corruption or a local attacker could, in theory, remove that NUL. Because we're using strlen() (my fault, introduced in a731d1 when removing vfs_follow_link()), we could walk off the end of that string. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28reiserfs: fix unwanted reiserfs lock recursionFrederic Weisbecker
commit 9d8117e72bf453dd9d85e0cd322ce4a0f8bccbc0 upstream. Prevent from recursively locking the reiserfs lock in reiserfs_unpack() because we may call journal_begin() that requires the lock to be taken only once, otherwise it won't be able to release the lock while taking other mutexes, ending up in inverted dependencies between the journal mutex and the reiserfs lock for example. This fixes: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35.4.4a #3 ------------------------------------------------------- lilo/1620 is trying to acquire lock: (&journal->j_mutex){+.+...}, at: [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs] but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c10562b7>] lock_acquire+0x67/0x80 [<c12facad>] __mutex_lock_common+0x4d/0x410 [<c12fb0c8>] mutex_lock_nested+0x18/0x20 [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs] [<d0325c06>] do_journal_begin_r+0x86/0x340 [reiserfs] [<d0325f77>] journal_begin+0x77/0x140 [reiserfs] [<d0315be4>] reiserfs_remount+0x224/0x530 [reiserfs] [<c10b6a20>] do_remount_sb+0x60/0x110 [<c10cee25>] do_mount+0x625/0x790 [<c10cf014>] sys_mount+0x84/0xb0 [<c12fca3d>] syscall_call+0x7/0xb -> #0 (&journal->j_mutex){+.+...}: [<c10560f6>] __lock_acquire+0x1026/0x1180 [<c10562b7>] lock_acquire+0x67/0x80 [<c12facad>] __mutex_lock_common+0x4d/0x410 [<c12fb0c8>] mutex_lock_nested+0x18/0x20 [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs] [<d0325f77>] journal_begin+0x77/0x140 [reiserfs] [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs] [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs] [<c10db9db>] __block_prepare_write+0x1bb/0x3a0 [<c10dbbe6>] block_prepare_write+0x26/0x40 [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs] [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs] [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3188>] vfs_ioctl+0x28/0xa0 [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3eb3>] sys_ioctl+0x63/0x70 [<c12fca3d>] syscall_call+0x7/0xb other info that might help us debug this: 2 locks held by lilo/1620: #0: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d032945a>] reiserfs_unpack+0x6a/0x120 [reiserfs] #1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs] stack backtrace: Pid: 1620, comm: lilo Not tainted 2.6.35.4.4a #3 Call Trace: [<c10560f6>] __lock_acquire+0x1026/0x1180 [<c10562b7>] lock_acquire+0x67/0x80 [<c12facad>] __mutex_lock_common+0x4d/0x410 [<c12fb0c8>] mutex_lock_nested+0x18/0x20 [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs] [<d0325f77>] journal_begin+0x77/0x140 [reiserfs] [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs] [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs] [<c10db9db>] __block_prepare_write+0x1bb/0x3a0 [<c10dbbe6>] block_prepare_write+0x26/0x40 [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs] [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs] [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3188>] vfs_ioctl+0x28/0xa0 [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3eb3>] sys_ioctl+0x63/0x70 [<c12fca3d>] syscall_call+0x7/0xb Reported-by: Jarek Poplawski <jarkao2@gmail.com> Tested-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28reiserfs: fix dependency inversion between inode and reiserfs mutexesFrederic Weisbecker
commit 3f259d092c7a2fdf217823e8f1838530adb0cdb0 upstream. The reiserfs mutex already depends on the inode mutex, so we can't lock the inode mutex in reiserfs_unpack() without using the safe locking API, because reiserfs_unpack() is always called with the reiserfs mutex locked. This fixes: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35c #13 ------------------------------------------------------- lilo/1606 is trying to acquire lock: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs] but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c1056347>] lock_acquire+0x67/0x80 [<c12f083d>] __mutex_lock_common+0x4d/0x410 [<c12f0c58>] mutex_lock_nested+0x18/0x20 [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs] [<d0329e9a>] reiserfs_lookup_privroot+0x2a/0x90 [reiserfs] [<d0316b81>] reiserfs_fill_super+0x941/0xe60 [reiserfs] [<c10b7d17>] get_sb_bdev+0x117/0x170 [<d0313e21>] get_super_block+0x21/0x30 [reiserfs] [<c10b74ba>] vfs_kern_mount+0x6a/0x1b0 [<c10b7659>] do_kern_mount+0x39/0xe0 [<c10cebe0>] do_mount+0x340/0x790 [<c10cf0b4>] sys_mount+0x84/0xb0 [<c12f25cd>] syscall_call+0x7/0xb -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}: [<c1056186>] __lock_acquire+0x1026/0x1180 [<c1056347>] lock_acquire+0x67/0x80 [<c12f083d>] __mutex_lock_common+0x4d/0x410 [<c12f0c58>] mutex_lock_nested+0x18/0x20 [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs] [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3228>] vfs_ioctl+0x28/0xa0 [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3f53>] sys_ioctl+0x63/0x70 [<c12f25cd>] syscall_call+0x7/0xb other info that might help us debug this: 1 lock held by lilo/1606: #0: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs] stack backtrace: Pid: 1606, comm: lilo Not tainted 2.6.35c #13 Call Trace: [<c1056186>] __lock_acquire+0x1026/0x1180 [<c1056347>] lock_acquire+0x67/0x80 [<c12f083d>] __mutex_lock_common+0x4d/0x410 [<c12f0c58>] mutex_lock_nested+0x18/0x20 [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs] [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3228>] vfs_ioctl+0x28/0xa0 [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3f53>] sys_ioctl+0x63/0x70 [<c12f25cd>] syscall_call+0x7/0xb Reported-by: Jarek Poplawski <jarkao2@gmail.com> Tested-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26xfs: prevent reading uninitialized stack memoryDan Rosenberg
commit a122eb2fdfd78b58c6dd992d6f4b1aaef667eef9 upstream. The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12 bytes of uninitialized stack memory, because the fsxattr struct declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero) the 12-byte fsx_pad member before copying it back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com> Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26inotify: send IN_UNMOUNT eventsEric Paris
commit 611da04f7a31b2208e838be55a42c7a1310ae321 upstream. Since the .31 or so notify rewrite inotify has not sent events about inodes which are unmounted. This patch restores those events. Signed-off-by: Eric Paris <eparis@redhat.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26GFS2: gfs2_logd should be using interruptible waitsSteven Whitehouse
commit 5f4874903df3562b9d5649fc1cf7b8c6bb238e42 upstream. Looks like this crept in, in a recent update. Reported-by: Krzysztof Urbaniak <urban@bash.org.pl> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26aio: check for multiplication overflow in do_io_submitJeff Moyer
commit 75e1c70fc31490ef8a373ea2a4bea2524099b478 upstream. Tavis Ormandy pointed out that do_io_submit does not do proper bounds checking on the passed-in iocb array:        if (unlikely(nr < 0))                return -EINVAL;        if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(iocbpp)))))                return -EFAULT;                      ^^^^^^^^^^^^^^^^^^ The attached patch checks for overflow, and if it is detected, the number of iocbs submitted is scaled down to a number that will fit in the long.  This is an ok thing to do, as sys_io_submit is documented as returning the number of iocbs submitted, so callers should handle a return value of less than the 'nr' argument passed in. Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26aio: do not return ERESTARTSYS as a result of AIOJan Kara
commit a0c42bac79731276c9b2f28d54f9e658fcf843a2 upstream. OCFS2 can return ERESTARTSYS from its write function when the process is signalled while waiting for a cluster lock (and the filesystem is mounted with intr mount option). Generally, it seems reasonable to allow filesystems to return this error code from its IO functions. As we must not leak ERESTARTSYS (and similar error codes) to userspace as a result of an AIO operation, we have to properly convert it to EINTR inside AIO code (restarting the syscall isn't really an option because other AIO could have been already submitted by the same io_submit syscall). Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26/proc/vmcore: fix seekingArnd Bergmann
commit c227e69028473c7c7994a9b0a2cc0034f3f7e0fe upstream. Commit 73296bc611 ("procfs: Use generic_file_llseek in /proc/vmcore") broke seeking on /proc/vmcore. This changes it back to use default_llseek in order to restore the original behaviour. The problem with generic_file_llseek is that it only allows seeks up to inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual file systems. We should merge generic_file_llseek and default_llseek some day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore is the safer solution. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: CAI Qian <caiqian@redhat.com> Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26Prevent freeing uninitialized pointer in compat_do_readv_writevDan Rosenberg
commit 767b68e96993e29e3480d7ecdd9c4b84667c5762 upstream. In 32-bit compatibility mode, the error handling for compat_do_readv_writev() may free an uninitialized pointer, potentially leading to all sorts of ugly memory corruption. This is reliably triggerable by unprivileged users by invoking the readv()/writev() syscalls with an invalid iovec pointer. The below patch fixes this to emulate the non-compat version. Introduced by commit b83733639a49 ("compat: factor out compat_rw_copy_check_uvector from compat_do_readv_writev") Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26char: Mark /dev/zero and /dev/kmem as not capable of writebackJan Kara
commit 371d217ee1ff8b418b8f73fb2a34990f951ec2d4 upstream. These devices don't do any writeback but their device inodes still can get dirty so mark bdi appropriately so that bdi code does the right thing and files inodes to lists of bdi carrying the device inodes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20NFS: Fix a typo in nfs_sockaddr_match_ipaddr6Trond Myklebust
commit b20d37ca9561711c6a3c4b859c2855f49565e061 upstream. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20cifs: fix potential double put of TCP session referenceJeff Layton
commit 460cf3411b858ad509d5255e0dfaf862a83c0299 upstream. cifs_get_smb_ses must be called on a server pointer on which it holds an active reference. It first does a search for an existing SMB session. If it finds one, it'll put the server reference and then try to ensure that the negprot is done, etc. If it encounters an error at that point then it'll return an error. There's a potential problem here though. When cifs_get_smb_ses returns an error, the caller will also put the TCP server reference leading to a double-put. Fix this by having cifs_get_smb_ses only put the server reference if it found an existing session that it could use and isn't returning an error. Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20binfmt_misc: fix binfmt_misc priorityJan Sembera
commit ee3aebdd8f5f8eac41c25c80ceee3d728f920f3b upstream. Commit 74641f584da ("alpha: binfmt_aout fix") (May 2009) introduced a regression - binfmt_misc is now consulted after binfmt_elf, which will unfortunately break ia32el. ia32 ELF binaries on ia64 used to be matched using binfmt_misc and executed using wrapper. As 32bit binaries are now matched by binfmt_elf before bindmt_misc kicks in, the wrapper is ignored. The fix increases precedence of binfmt_misc to the original state. Signed-off-by: Jan Sembera <jsembera@suse.cz> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20minix: fix regression in minix_mkdir()Jorge Boncompte [DTI2]
commit eee743fd7eac9f2ea69ad06d093dfb5a12538fe5 upstream. Commit 9eed1fb721c ("minix: replace inode uid,gid,mode init with helper") broke directory creation on minix filesystems. Fix it by passing the needed mode flag to inode init helper. Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20statfs() gives ESTALE errorMenyhart Zoltan
commit fbf3fdd2443965d9ba6fb4b5fecd1f6e0847218f upstream. Hi, An NFS client executes a statfs("file", &buff) call. "file" exists / existed, the client has read / written it, but it has already closed it. user_path(pathname, &path) looks up "file" successfully in the directory-cache and restarts the aging timer of the directory-entry. Even if "file" has already been removed from the server, because the lookupcache=positive option I use, keeps the entries valid for a while. nfs_statfs() returns ESTALE if "file" has already been removed from the server. If the user application repeats the statfs("file", &buff) call, we are stuck: "file" remains young forever in the directory-cache. Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20O_DIRECT: fix the splitting up of contiguous I/OJeff Moyer
commit 7a801ac6f5067539ceb5fad0fe90ec49fc156e47 upstream. commit c2c6ca4 (direct-io: do not merge logically non-contiguous requests) introduced a bug whereby all O_DIRECT I/Os were submitted a page at a time to the block layer. The problem is that the code expected dio->block_in_file to correspond to the current page in the dio. In fact, it corresponds to the previous page submitted via submit_page_section. This was purely an oversight, as the dio->cur_page_fs_offset field was introduced for just this purpose. This patch simply uses the correct variable when calculating whether there is a mismatch between contiguous logical blocks and contiguous physical blocks (as described in the comments). I also switched the if conditional following this check to an else if, to ensure that we never call dio_bio_submit twice for the same dio (in theory, this should not happen, anyway). I've tested this by running blktrace and verifying that a 64KB I/O was submitted as a single I/O. I also ran the patched kernel through xfstests' aio tests using xfs, ext4 (with 1k and 4k block sizes) and btrfs and verified that there were no regressions as compared to an unpatched kernel. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Josef Bacik <jbacik@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20sysfs: checking for NULL instead of ERR_PTRDan Carpenter
commit 57f9bdac2510cd7fda58e4a111d250861eb1ebeb upstream. d_path() returns an ERR_PTR and it doesn't return NULL. Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20ocfs2: Fix incorrect checksum validation errorSunil Mushran
commit f5ce5a08a40f2086435858ddc80cb40394b082eb upstream. For local mounts, ocfs2_read_locked_inode() calls ocfs2_read_blocks_sync() to read the inode off the disk. The latter first checks to see if that block is cached in the journal, and, if so, returns that block. That is ok. But ocfs2_read_locked_inode() goes wrong when it tries to validate the checksum of such blocks. Blocks that are cached in the journal may not have had their checksum computed as yet. We should not validate the checksums of such blocks. Fixes ossbz#1282 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1282 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Singed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20fuse: flush background queue on connection closeMiklos Szeredi
commit 595afaf9e6ee1b48e13ec4b8bcc8c7dee888161a upstream. David Bartly reported that fuse can hang in fuse_get_req_nofail() when the connection to the filesystem server is no longer active. If bg_queue is not empty then flush_bg_queue() called from request_end() can put more requests on to the pending queue. If this happens while ending requests on the processing queue then those background requests will be queued to the pending list and never ended. Another problem is that fuse_dev_release() didn't wake up processes sleeping on blocked_waitq. Solve this by: a) flushing the background queue before calling end_requests() on the pending and processing queues b) setting blocked = 0 and waking up processes waiting on blocked_waitq() Thanks to David for an excellent bug report. Reported-by: David Bartley <andareed@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20xfs: move aio completion after unwritten extent conversionChristoph Hellwig
commit fb511f2150174b18b28ad54708c1adda0df39b17 upstream. If we write into an unwritten extent using AIO we need to complete the AIO request after the extent conversion has finished. Without that a read could race to see see the extent still unwritten and return zeros. For synchronous I/O we already take care of that by flushing the xfsconvertd workqueue (which might be a bit of overkill). To do that add iocb and result fields to struct xfs_ioend, so that we can call aio_complete from xfs_end_io after the extent conversion has happened. Note that we need a new result field as io_error is used for positive errno values, while the AIO code can return negative error values and positive transfer sizes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20ext4: move aio completion after unwritten extent conversionJiaying Zhang
commit 5b3ff237bef43b9e7fb7d1eb858e29b73fd664f9 upstream. This patch is to be applied upon Christoph's "direct-io: move aio_complete into ->end_io" patch. It adds iocb and result fields to struct ext4_io_end_t, so that we can call aio_complete from ext4_end_io_nolock() after the extent conversion has finished. I have verified with Christoph's aio-dio test that used to fail after a few runs on an original kernel but now succeeds on the patched kernel. See http://thread.gmane.org/gmane.comp.file-systems.ext4/19659 for details. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20direct-io: move aio_complete into ->end_ioChristoph Hellwig
commit 40e2e97316af6e62affab7a392e792494b8d9dde upstream. Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been commited. That means the aio_complete calls needs to be moved into the ->end_io callback so that the filesystem can control when to call it exactly. This makes a bit of a mess out of dio_complete and the ->end_io callback prototype even more complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Alex Elder <aelder@sgi.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20xfs: ensure we mark all inodes in a freed cluster XFS_ISTALEDave Chinner
commit 5b3eed756cd37255cad1181bd86bfd0977e97953 upstream. Under heavy load parallel metadata loads (e.g. dbench), we can fail to mark all the inodes in a cluster being freed as XFS_ISTALE as we skip inodes we cannot get the XFS_ILOCK_EXCL or the flush lock on. When this happens and the inode cluster buffer has already been marked stale and freed, inode reclaim can try to write the inode out as it is dirty and not marked stale. This can result in writing th metadata to an freed extent, or in the case it has already been overwritten trigger a magic number check failure and return an EUCLEAN error such as: Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117 Fix this by ensuring that we hoover up all in memory inodes in the cluster and mark them XFS_ISTALE when freeing the cluster. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20xfs: fix untrusted inode number lookupDave Chinner
commit 4536f2ad8b330453d7ebec0746c4374eadd649b1 upstream. Commit 7124fe0a5b619d65b739477b3b55a20bf805b06d ("xfs: validate untrusted inode numbers during lookup") changes the inode lookup code to do btree lookups for untrusted inode numbers. This change made an invalid assumption about the alignment of inodes and hence incorrectly calculated the first inode in the cluster. As a result, some inode numbers were being incorrectly considered invalid when they were actually valid. The issue was not picked up by the xfstests suite because it always runs fsr and dump (the two utilities that utilise the bulkstat interface) on cache hot inodes and hence the lookup code in the cold cache path was not sufficiently exercised to uncover this intermittent problem. Fix the issue by relaxing the btree lookup criteria and then checking if the record returned contains the inode number we are lookup for. If it we get an incorrect record, then the inode number is invalid. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-26Fix init ordering of /dev/console vs callers of modprobeDavid Howells
commit 31d1d48e199e99077fb30f6fb9a793be7bec756f upstream. Make /dev/console get initialised before any initialisation routine that invokes modprobe because if modprobe fails, it's going to want to open /dev/console, presumably to write an error message to. The problem with that is that if the /dev/console driver is not yet initialised, the chardev handler will call request_module() to invoke modprobe, which will fail, because we never compile /dev/console as a module. This will lead to a modprobe loop, showing the following in the kernel log: request_module: runaway loop modprobe char-major-5-1 request_module: runaway loop modprobe char-major-5-1 request_module: runaway loop modprobe char-major-5-1 request_module: runaway loop modprobe char-major-5-1 request_module: runaway loop modprobe char-major-5-1 This can happen, for example, when the built in md5 module can't find the built in cryptomgr module (because the latter fails to initialise). The md5 module comes before the call to tty_init(), presumably because 'crypto' comes before 'drivers' alphabetically. Fix this by calling tty_init() from chrdev_init(). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-26NFS: Fix an Oops in the NFSv4 atomic open codeTrond Myklebust
commit 0a377cff9428af2da2b293d11e07bc4dbf064ee5 upstream. Adam Lackorzynski reports: with 2.6.35.2 I'm getting this reproducible Oops: [ 110.825396] BUG: unable to handle kernel NULL pointer dereference at (null) [ 110.828638] IP: [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4 [ 110.828638] PGD be89f067 PUD bf18f067 PMD 0 [ 110.828638] Oops: 0000 [#1] SMP [ 110.828638] last sysfs file: /sys/class/net/lo/operstate [ 110.828638] CPU 2 [ 110.828638] Modules linked in: rtc_cmos rtc_core rtc_lib amd64_edac_mod i2c_amd756 edac_core i2c_core dm_mirror dm_region_hash dm_log dm_snapshot sg sr_mod usb_storage ohci_hcd mptspi tg3 mptscsih mptbase usbcore nls_base [last unloaded: scsi_wait_scan] [ 110.828638] [ 110.828638] Pid: 11264, comm: setchecksum Not tainted 2.6.35.2 #1 [ 110.828638] RIP: 0010:[<ffffffff811247b7>] [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4 [ 110.828638] RSP: 0000:ffff88003bf5b878 EFLAGS: 00010296 [ 110.828638] RAX: ffff8800bddb48a8 RBX: ffff88003bf5bb18 RCX: 0000000000000000 [ 110.828638] RDX: ffff8800be258800 RSI: 0000000000000000 RDI: ffff88003bf5b9f8 [ 110.828638] RBP: 0000000000000000 R08: ffff8800bddb48a8 R09: 0000000000000004 [ 110.828638] R10: 0000000000000003 R11: ffff8800be779000 R12: ffff8800be258800 [ 110.828638] R13: ffff88003bf5b9f8 R14: ffff88003bf5bb20 R15: ffff8800be258800 [ 110.828638] FS: 0000000000000000(0000) GS:ffff880041e00000(0063) knlGS:00000000556bd6b0 [ 110.828638] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b [ 110.828638] CR2: 0000000000000000 CR3: 00000000be8ef000 CR4: 00000000000006e0 [ 110.828638] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 110.828638] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 110.828638] Process setchecksum (pid: 11264, threadinfo ffff88003bf5a000, task ffff88003f232210) [ 110.828638] Stack: [ 110.828638] 0000000000000000 ffff8800bfbcf920 0000000000000000 0000000000000ffe [ 110.828638] <0> 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 110.828638] <0> 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 110.828638] Call Trace: [ 110.828638] [<ffffffff81124c1f>] ? nfs4_xdr_enc_setattr+0x90/0xb4 [ 110.828638] [<ffffffff81371161>] ? call_transmit+0x1c3/0x24a [ 110.828638] [<ffffffff813774d9>] ? __rpc_execute+0x78/0x22a [ 110.828638] [<ffffffff81371a91>] ? rpc_run_task+0x21/0x2b [ 110.828638] [<ffffffff81371b7e>] ? rpc_call_sync+0x3d/0x5d [ 110.828638] [<ffffffff8111e284>] ? _nfs4_do_setattr+0x11b/0x147 [ 110.828638] [<ffffffff81109466>] ? nfs_init_locked+0x0/0x32 [ 110.828638] [<ffffffff810ac521>] ? ifind+0x4e/0x90 [ 110.828638] [<ffffffff8111e2fb>] ? nfs4_do_setattr+0x4b/0x6e [ 110.828638] [<ffffffff8111e634>] ? nfs4_do_open+0x291/0x3a6 [ 110.828638] [<ffffffff8111ed81>] ? nfs4_open_revalidate+0x63/0x14a [ 110.828638] [<ffffffff811056c4>] ? nfs_open_revalidate+0xd7/0x161 [ 110.828638] [<ffffffff810a2de4>] ? do_lookup+0x1a4/0x201 [ 110.828638] [<ffffffff810a4733>] ? link_path_walk+0x6a/0x9d5 [ 110.828638] [<ffffffff810a42b6>] ? do_last+0x17b/0x58e [ 110.828638] [<ffffffff810a5fbe>] ? do_filp_open+0x1bd/0x56e [ 110.828638] [<ffffffff811cd5e0>] ? _atomic_dec_and_lock+0x30/0x48 [ 110.828638] [<ffffffff810a9b1b>] ? dput+0x37/0x152 [ 110.828638] [<ffffffff810ae063>] ? alloc_fd+0x69/0x10a [ 110.828638] [<ffffffff81099f39>] ? do_sys_open+0x56/0x100 [ 110.828638] [<ffffffff81027a22>] ? ia32_sysret+0x0/0x5 [ 110.828638] Code: 83 f1 01 e8 f5 ca ff ff 48 83 c4 50 5b 5d 41 5c c3 41 57 41 56 41 55 49 89 fd 41 54 49 89 d4 55 48 89 f5 53 48 81 ec 18 01 00 00 <8b> 06 89 c2 83 e2 08 83 fa 01 19 db 83 e3 f8 83 c3 18 a8 01 8d [ 110.828638] RIP [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4 [ 110.828638] RSP <ffff88003bf5b878> [ 110.828638] CR2: 0000000000000000 [ 112.840396] ---[ end trace 95282e83fd77358f ]--- We need to ensure that the O_EXCL flag is turned off if the user doesn't set O_CREAT. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-26nfs: Add "lookupcache" to displayed mount optionsPatrick J. LoPresti
commit 9b00c64318cc337846a7a08a5678f5f19aeff188 upstream. Running "cat /proc/mounts" fails to display the "lookupcache" option. This oversight cost me a bunch of wasted time recently. The following simple patch fixes it. Signed-off-by: Patrick LoPresti <lopresti@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-26Fix the nested PR lock calling issue in ACLJiaju Zhang
commit 845b6cf34150100deb5f58c8a37a372b111f2918 upstream. Hi, Thanks a lot for all the review and comments so far;) I'd like to send the improved (V4) version of this patch. This patch fixes a deadlock in OCFS2 ACL. We found this bug in OCFS2 and Samba integration using scenario, the symptom is several smbd processes will be hung under heavy workload. Finally we found out it is the nested PR lock calling that leads to this deadlock: node1 node2 gr PR | V PR(EX)---> BAST:OCFS2_LOCK_BLOCKED | V rq PR | V wait=1 After requesting the 2nd PR lock, the process "smbd" went into D state. It can only be woken up when the 1st PR lock's RO holder equals zero. There should be an ocfs2_inode_unlock in the calling path later on, which can decrement the RO holder. But since it has been in uninterruptible sleep, the unlock function has no chance to be called. The related stack trace is: smbd D ffff8800013d0600 0 9522 5608 0x00000000 ffff88002ca7fb18 0000000000000282 ffff88002f964500 ffff88002ca7fa98 ffff8800013d0600 ffff88002ca7fae0 ffff88002f964340 ffff88002f964340 ffff88002ca7ffd8 ffff88002ca7ffd8 ffff88002f964340 ffff88002f964340 Call Trace: [<ffffffff80350425>] schedule_timeout+0x175/0x210 [<ffffffff8034f580>] wait_for_common+0xf0/0x210 [<ffffffffa03e12b9>] __ocfs2_cluster_lock+0x3b9/0xa90 [ocfs2] [<ffffffffa03e7665>] ocfs2_inode_lock_full_nested+0x255/0xdb0 [ocfs2] [<ffffffffa0446019>] ocfs2_get_acl+0x69/0x120 [ocfs2] [<ffffffffa0446368>] ocfs2_check_acl+0x28/0x80 [ocfs2] [<ffffffff800e3507>] acl_permission_check+0x57/0xb0 [<ffffffff800e357d>] generic_permission+0x1d/0xc0 [<ffffffffa03eecea>] ocfs2_permission+0x10a/0x1d0 [ocfs2] [<ffffffff800e3f65>] inode_permission+0x45/0x100 [<ffffffff800d86b3>] sys_chdir+0x53/0x90 [<ffffffff80007458>] system_call_fastpath+0x16/0x1b [<00007f34a4ef6927>] 0x7f34a4ef6927 For details, please see: https://bugzilla.novell.com/show_bug.cgi?id=614332 and http://oss.oracle.com/bugzilla/show_bug.cgi?id=1278 Signed-off-by: Jiaju Zhang <jjzhang@suse.de> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-26nilfs2: fix list corruption after ifile creation failureRyusuke Konishi
commit af4e36318edb848fcc0a8d5f75000ca00cdc7595 upstream. If nilfs_attach_checkpoint() gets a memory allocation failure during creation of ifile, it will return without removing nilfs_sb_info struct from ns_supers list. When a concurrently mounted snapshot is unmounted or another new snapshot is mounted after that, this causes kernel oops as below: > BUG: unable to handle kernel NULL pointer dereference at (null) > IP: [<f83662ff>] nilfs_find_sbinfo+0x74/0xa4 [nilfs2] > *pde = 00000000 > Oops: 0000 [#1] SMP <snip> > Call Trace: > [<f835dc29>] ? nilfs_get_sb+0x165/0x532 [nilfs2] > [<c1173c87>] ? ida_get_new_above+0x16d/0x187 > [<c109a7f8>] ? alloc_vfsmnt+0x7e/0x10a > [<c1070790>] ? kstrdup+0x2c/0x40 > [<c1089041>] ? vfs_kern_mount+0x96/0x14e > [<c108913d>] ? do_kern_mount+0x32/0xbd > [<c109b331>] ? do_mount+0x642/0x6a1 > [<c101a415>] ? do_page_fault+0x0/0x2d1 > [<c1099c00>] ? copy_mount_options+0x80/0xe2 > [<c10705d8>] ? strndup_user+0x48/0x67 > [<c109b3f1>] ? sys_mount+0x61/0x90 > [<c10027cc>] ? sysenter_do_call+0x12/0x22 This fixes the problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-26ocfs2/dlm: remove potential deadlock -V3Wengang Wang
commit b11f1f1ab73fd358b1b734a9427744802202ba68 upstream. When we need to take both dlm_domain_lock and dlm->spinlock, we should take them in order of: dlm_domain_lock then dlm->spinlock. There is pathes disobey this order. That is calling dlm_lockres_put() with dlm->spinlock held in dlm_run_purge_list. dlm_lockres_put() calls dlm_put() at the ref and dlm_put() locks on dlm_domain_lock. Fix: Don't grab/put the dlm when the initialising/releasing lockres. That grab is not required because we don't call dlm_unregister_domain() based on refcount. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-26ocfs2/dlm: avoid incorrect bit set in refmap on recovery masterWengang Wang
commit a524812b7eaa7783d7811198921100f079034e61 upstream. In the following situation, there remains an incorrect bit in refmap on the recovery master. Finally the recovery master will fail at purging the lockres due to the incorrect bit in refmap. 1) node A has no interest on lockres A any longer, so it is purging it. 2) the owner of lockres A is node B, so node A is sending de-ref message to node B. 3) at this time, node B crashed. node C becomes the recovery master. it recovers lockres A(because the master is the dead node B). 4) node A migrated lockres A to node C with a refbit there. 5) node A failed to send de-ref message to node B because it crashed. The failure is ignored. no other action is done for lockres A any more. For mormal, re-send the deref message to it to recovery master can fix it. Well, ignoring the failure of deref to the original master and not recovering the lockres to recovery master has the same effect. And the later is simpler. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-26ocfs2: Count more refcount records in file system fragmentation.Tao Ma
commit 8a2e70c40ff58f82dde67770e6623ca45f0cb0c8 upstream. The refcount record calculation in ocfs2_calc_refcount_meta_credits is too optimistic that we can always allocate contiguous clusters and handle an already existed refcount rec as a whole. Actually because of file system fragmentation, we may have the chance to split a refcount record into 3 parts during the transaction. So consider the worst case in record calculation. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-26ocfs2 fix o2dlm dlm run purgelist (rev 3)Srinivas Eeda
commit 7beaf243787f85a2ef9213ccf13ab4a243283fde upstream. This patch fixes two problems in dlm_run_purgelist 1. If a lockres is found to be in use, dlm_run_purgelist keeps trying to purge the same lockres instead of trying the next lockres. 2. When a lockres is found unused, dlm_run_purgelist releases lockres spinlock before setting DLM_LOCK_RES_DROPPING_REF and calls dlm_purge_lockres. spinlock is reacquired but in this window lockres can get reused. This leads to BUG. This patch modifies dlm_run_purgelist to skip lockres if it's in use and purge next lockres. It also sets DLM_LOCK_RES_DROPPING_REF before releasing the lockres spinlock protecting it from getting reused. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Acked-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-26ocfs2/dlm: fix a dead lockWengang Wang
commit 6d98c3ccb52f692f1a60339dde7c700686a5568b upstream. When we have to take both dlm->master_lock and lockres->spinlock, take them in order lockres->spinlock and then dlm->master_lock. The patch fixes a violation of the rule. We can simply move taking dlm->master_lock to where we have dropped res->spinlock since when we access res->state and free mle memory we don't need master_lock's protection. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-26ocfs2: do not overwrite error codes in ocfs2_init_aclTiger Yang
commit 6eda3dd33f8a0ce58ee56a11351758643a698db4 upstream. Setting the acl while creating a new inode depends on the error codes of posix_acl_create_masq. This patch fix a issue of overwriting the error codes of it. Reported-by: Pawel Zawora <pzawora@gmail.com> Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-20mm: fix up some user-visible effects of the stack guard pageLinus Torvalds
commit d7824370e26325c881b665350ce64fb0a4fde24a upstream. This commit makes the stack guard page somewhat less visible to user space. It does this by: - not showing the guard page in /proc/<pid>/maps It looks like lvm-tools will actually read /proc/self/maps to figure out where all its mappings are, and effectively do a specialized "mlockall()" in user space. By not showing the guard page as part of the mapping (by just adding PAGE_SIZE to the start for grows-up pages), lvm-tools ends up not being aware of it. - by also teaching the _real_ mlock() functionality not to try to lock the guard page. That would just expand the mapping down to create a new guard page, so there really is no point in trying to lock it in place. It would perhaps be nice to show the guard page specially in /proc/<pid>/maps (or at least mark grow-down segments some way), but let's not open ourselves up to more breakage by user space from programs that depends on the exact deails of the 'maps' file. Special thanks to Henrique de Moraes Holschuh for diving into lvm-tools source code to see what was going on with the whole new warning. Reported-and-tested-by: François Valenduc <francois.valenduc@tvcablenet.be Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13ext4: fix freeze deadlock under IOEric Sandeen
commit 437f88cc031ffe7f37f3e705367f4fe1f4be8b0f upstream. Commit 6b0310fbf087ad6 caused a regression resulting in deadlocks when freezing a filesystem which had active IO; the vfs_check_frozen level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing through. Duh. Changing the test to FREEZE_TRANS should let the normal freeze syncing get through the fs, but still block any transactions from starting once the fs is completely frozen. I tested this by running fsstress in the background while periodically snapshotting the fs and running fsck on the result. I ran into occasional deadlocks, but different ones. I think this is a fine fix for the problem at hand, and the other deadlocky things will need more investigation. Reported-by: Phillip Susi <psusi@cfl.rr.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13jfs: don't allow os2 xattr namespace overlap with othersDave Kleikamp
commit aca0fa34bdaba39bfddddba8ca70dba4782e8fe6 upstream. It's currently possible to bypass xattr namespace access rules by prefixing valid xattr names with "os2.", since the os2 namespace stores extended attributes in a legacy format with no prefix. This patch adds checking to deny access to any valid namespace prefix following "os2.". Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Reported-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13signalfd: fill in ssi_int for posix timers and message queuesNathan Lynch
commit a2a20c412c86e0bb46a9ab0dd31bcfe6d201b913 upstream. If signalfd is used to consume a signal generated by a POSIX interval timer or POSIX message queue, the ssi_int field does not reflect the data (sigevent->sigev_value) supplied to timer_create(2) or mq_notify(3). (The ssi_ptr field, however, is filled in.) This behavior differs from signalfd's treatment of sigqueue-generated signals -- see the default case in signalfd_copyinfo. It also gives results that differ from the case when a signal is handled conventionally via a sigaction-registered handler. So, set signalfd_siginfo->ssi_int in the remaining cases (__SI_TIMER, __SI_MESGQ) where ssi_ptr is set. akpm: a non-back-compatible change. Merge into -stable to minimise the number of kernels which are in the field and which miss this feature. Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13fs/ecryptfs/file.c: introduce missing freeJulia Lawall
commit ceeab92971e8af05c1e81a4ff2c271124b55bb9b upstream. The comments in the code indicate that file_info should be released if the function fails. This releasing is done at the label out_free, not out. The semantic match that finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r exists@ local idexpression x; statement S; expression E; identifier f,f1,l; position p1,p2; expression *ptr != NULL; @@ x@p1 = kmem_cache_zalloc(...); ... if (x == NULL) S <... when != x when != if (...) { <+...x...+> } ( x->f1 = E | (x->f1 == NULL || ...) | f(...,x->f1,...) ) ...> ( return <+...x...+>; | return@p2 ...; ) @script:python@ p1 << r.p1; p2 << r.p2; @@ print "* file: %s kmem_cache_zalloc %s" % (p1[0].file,p1[0].line) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13ecryptfs: release reference to lower mount if interpose failsLino Sanfilippo
commit 31f73bee3e170b7cabb35db9e2f4bf7919b9d036 upstream. In ecryptfs_lookup_and_interpose_lower() the lower mount is not decremented if allocation of a dentry info struct failed. As a result the lower filesystem cant be unmounted any more (since it is considered busy). This patch corrects the reference counting. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13eCryptfs: Handle ioctl calls with unlocked and compat functionsTyler Hicks
commit c43f7b8fb03be8bcc579bfc4e6ab70eac887ab55 upstream. Lower filesystems that only implemented unlocked_ioctl weren't being passed ioctl calls because eCryptfs only checked for lower_file->f_op->ioctl and returned -ENOTTY if it was NULL. eCryptfs shouldn't implement ioctl(), since it doesn't require the BKL. This patch introduces ecryptfs_unlocked_ioctl() and ecryptfs_compat_ioctl(), which passes the calls on to the lower file system. https://bugs.launchpad.net/ecryptfs/+bug/469664 Reported-by: James Dupin <james.dupin@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>