summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2012-10-13udf: fix retun value on error path in udf_load_logicalvolNikola Pajkovsky
commit 68766a2edcd5cd744262a70a2f67a320ac944760 upstream. In case we detect a problem and bail out, we fail to set "ret" to a nonzero value, and udf_load_logicalvol will mistakenly report success. Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13Convert properly UTF-8 to UTF-16Frediano Ziglio
commit fd3ba42c76d3d4b776120c2b24c1791e7bb3deb1 upstream. wchar_t is currently 16bit so converting a utf8 encoded characters not in plane 0 (>= 0x10000) to wchar_t (that is calling char2uni) lead to a -EINVAL return. This patch detect utf8 in cifs_strtoUTF16 and add special code calling utf8s_to_utf16s. Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13cifs: reinstate the forcegid optionJeff Layton
commit 72bd481f860f0125c810bb43d878ce5f9c060c58 upstream. Apparently this was lost when we converted to the standard option parser in 8830d7e07a5e38bc47650a7554b7c1cfd49902bf Reported-by: Gregory Lee Bartholomew <gregory.lee.bartholomew@gmail.com> Cc: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13JFFS2: don't fail on bitflips in OOBBrian Norris
commit 74d83beaa229aac7d126ac1ed9414658ff1a89d2 upstream. JFFS2 was designed without thought for OOB bitflips, it seems, but they can occur and will be reported to JFFS2 via mtd_read_oob()[1]. We don't want to fail on these transactions, since the data was corrected. [1] Few drivers report bitflips for OOB-only transactions. With such drivers, this patch should have no effect. Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13kpageflags: fix wrong KPF_THP on non-huge compound pagesNaoya Horiguchi
commit 7a71932d5676b7410ab64d149bad8bde6b0d8632 upstream. KPF_THP can be set on non-huge compound pages (like slab pages or pages allocated by drivers with __GFP_COMP) because PageTransCompound only checks PG_head and PG_tail. Obviously this is a bug and breaks user space applications which look for thp via /proc/kpageflags. This patch rules out setting KPF_THP wrongly by additionally checking PageLRU on the head pages. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: fix fdatasync() for files with only i_size changesJan Kara
commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream. Code tracking when transaction needs to be committed on fdatasync(2) forgets to handle a situation when only inode's i_size is changed. Thus in such situations fdatasync(2) doesn't force transaction with new i_size to disk and that can result in wrong i_size after a crash. Fix the issue by updating inode's i_datasync_tid whenever its size is updated. Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: always set i_op in ext4_mknod()Bernd Schubert
commit 6a08f447facb4f9e29fcc30fb68060bb5a0d21c2 upstream. ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR to mask those methods. And ext4_iget also always sets it, so there is an inconsistency. Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: online defrag is not supported for journaled filesDmitry Monakhov
commit f066055a3449f0e5b0ae4f3ceab4445bead47638 upstream. Proper block swap for inodes with full journaling enabled is truly non obvious task. In order to be on a safe side let's explicitly disable it for now. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: move_extent code cleanupDmitry Monakhov
commit 03bd8b9b896c8e940f282f540e6b4de90d666b7c upstream. - Remove usless checks, because it is too late to check that inode != NULL at the moment it was referenced several times. - Double lock routines looks very ugly and locking ordering relays on order of i_ino, but other kernel code rely on order of pointers. Let's make them simple and clean. - check that inodes belongs to the same SB as soon as possible. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: fix crash when accessing /proc/mounts concurrentlyHerton Ronaldo Krzesinski
commit 50df9fd55e4271e89a7adf3b1172083dd0ca199d upstream. The crash was caused by a variable being erronously declared static in token2str(). In addition to /proc/mounts, the problem can also be easily replicated by accessing /proc/fs/ext4/<partition>/options in parallel: $ cat /proc/fs/ext4/<partition>/options > options.txt ... and then running the following command in two different terminals: $ while diff /proc/fs/ext4/<partition>/options options.txt; do true; done This is also the cause of the following a crash while running xfstests #234, as reported in the following bug reports: https://bugs.launchpad.net/bugs/1053019 https://bugzilla.kernel.org/show_bug.cgi?id=47731 Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Brad Figg <brad.figg@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: fix potential deadlock in ext4_nonda_switch()Theodore Ts'o
commit 00d4e7362ed01987183e9528295de3213031309c upstream. In ext4_nonda_switch(), if the file system is getting full we used to call writeback_inodes_sb_if_idle(). The problem is that we can be holding i_mutex already, and this causes a potential deadlock when writeback_inodes_sb_if_idle() when it tries to take s_umount. (See lockdep output below). As it turns out we don't need need to hold s_umount; the fact that we are in the middle of the write(2) system call will keep the superblock pinned. Unfortunately writeback_inodes_sb() checks to make sure s_umount is taken, and the VFS uses a different mechanism for making sure the file system doesn't get unmounted out from under us. The simplest way of dealing with this is to just simply grab s_umount using a trylock, and skip kicking the writeback flusher thread in the very unlikely case that we can't take a read lock on s_umount without blocking. Also, we now check the cirteria for kicking the writeback thread before we decide to whether to fall back to non-delayed writeback, so if there are any outstanding delayed allocation writes, we try to get them resolved as soon as possible. [ INFO: possible circular locking dependency detected ] 3.6.0-rc1-00042-gce894ca #367 Not tainted ------------------------------------------------------- dd/8298 is trying to acquire lock: (&type->s_umount_key#18){++++..}, at: [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46 but task is already holding lock: (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3 which lock already depends on the new lock. 2 locks held by dd/8298: #0: (sb_writers#2){.+.+.+}, at: [<c01ddcc5>] generic_file_aio_write+0x56/0xd3 #1: (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3 stack backtrace: Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367 Call Trace: [<c015b79c>] ? console_unlock+0x345/0x372 [<c06d62a1>] print_circular_bug+0x190/0x19d [<c019906c>] __lock_acquire+0x86d/0xb6c [<c01999db>] ? mark_held_locks+0x5c/0x7b [<c0199724>] lock_acquire+0x66/0xb9 [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46 [<c06db935>] down_read+0x28/0x58 [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46 [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46 [<c026f3b2>] ext4_nonda_switch+0xe1/0xf4 [<c0271ece>] ext4_da_write_begin+0x27/0x193 [<c01dcdb0>] generic_file_buffered_write+0xc8/0x1bb [<c01ddc47>] __generic_file_aio_write+0x1dd/0x205 [<c01ddce7>] generic_file_aio_write+0x78/0xd3 [<c026d336>] ext4_file_write+0x480/0x4a6 [<c0198c1d>] ? __lock_acquire+0x41e/0xb6c [<c0180944>] ? sched_clock_cpu+0x11a/0x13e [<c01967e9>] ? trace_hardirqs_off+0xb/0xd [<c018099f>] ? local_clock+0x37/0x4e [<c0209f2c>] do_sync_write+0x67/0x9d [<c0209ec5>] ? wait_on_retry_sync_kiocb+0x44/0x44 [<c020a7b9>] vfs_write+0x7b/0xe6 [<c020a9a6>] sys_write+0x3b/0x64 [<c06dd4bd>] syscall_call+0x7/0xb Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: avoid duplicate writes of the backup bg descriptor blocksYongqiang Yang
commit 2ebd1704ded88a8ae29b5f3998b13959c715c4be upstream. The resize code was needlessly writing the backup block group descriptor blocks multiple times (once per block group) during an online resize. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: don't copy non-existent gdt blocks when resizingYongqiang Yang
commit 6df935ad2fced9033ab52078825fcaf6365f34b7 upstream. The resize code was copying blocks at the beginning of each block group in order to copy the superblock and block group descriptor table (gdt) blocks. This was, unfortunately, being done even for block groups that did not have super blocks or gdt blocks. This is a complete waste of perfectly good I/O bandwidth, to skip writing those blocks for sparse bg's. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: ignore last group w/o enough space when resizing instead of BUG'ingYongqiang Yang
commit 03c1c29053f678234dbd51bf3d65f3b7529021de upstream. If the last group does not have enough space for group tables, ignore it instead of calling BUG_ON(). Reported-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13jbd2: don't write superblock when if its emptyEric Sandeen
commit eeecef0af5ea4efd763c9554cf2bd80fc4a0efd3 upstream. This sequence: # truncate --size=1g fsfile # mkfs.ext4 -F fsfile # mount -o loop,ro fsfile /mnt # umount /mnt # dmesg | tail results in an IO error when unmounting the RO filesystem: [ 318.020828] Buffer I/O error on device loop1, logical block 196608 [ 318.027024] lost page write due to I/O error on loop1 [ 318.032088] JBD2: Error -5 detected when updating journal superblock for loop1-8. This was a regression introduced by commit 24bcc89c7e7c: "jbd2: split updating of journal superblock and marking journal empty". Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-07coredump: prevent double-free on an error path in core dumperDenys Vlasenko
commit f34f9d186df35e5c39163444c43b4fc6255e39c5 upstream. In !CORE_DUMP_USE_REGSET case, if elf_note_info_init fails to allocate memory for info->fields, it frees already allocated stuff and returns error to its caller, fill_note_info. Which in turn returns error to its caller, elf_core_dump. Which jumps to cleanup label and calls free_note_info, which will happily try to free all info->fields again. BOOM. This is the fix. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Venu Byravarasu <vbyravarasu@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-07vfs: dcache: fix deadlock in tree traversalMiklos Szeredi
commit 8110e16d42d587997bcaee0c864179e6d93603fe upstream. IBM reported a deadlock in select_parent(). This was found to be caused by taking rename_lock when already locked when restarting the tree traversal. There are two cases when the traversal needs to be restarted: 1) concurrent d_move(); this can only happen when not already locked, since taking rename_lock protects against concurrent d_move(). 2) racing with final d_put() on child just at the moment of ascending to parent; rename_lock doesn't protect against this rare race, so it can happen when already locked. Because of case 2, we need to be able to handle restarting the traversal when rename_lock is already held. This patch fixes all three callers of try_to_ascend(). IBM reported that the deadlock is gone with this patch. [ I rewrote the patch to be smaller and just do the "goto again" if the lock was already held, but credit goes to Miklos for the real work. - Linus ] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02LockD: pass service to per-net up and down functionsStanislav Kinsbursky
commit 4db77695bf5738bdafa83d1b58b64cbecc6f55e7 upstream. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02NFSd: set nfsd_serv to NULL after service destructionStanislav Kinsbursky
commit 57c8b13e3cd0f94944c9691ce7f58e5fcef8a12d upstream. In nfsd_destroy(): if (destroy) svc_shutdown_net(nfsd_serv, net); svc_destroy(nfsd_server); svc_shutdown_net(nfsd_serv, net) calls nfsd_last_thread(), which sets nfsd_serv to NULL, causing a NULL dereference on the following line. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02NFSd: introduce nfsd_destroy() helperStanislav Kinsbursky
commit 19f7e2ca44dfc3c1b3f499fc46801f98d403500f upstream. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02NFS: return error from decode_getfh in decode openWeston Andros Adamson
commit 01913b49cf1dc6409a07dd2a4cc6af2e77f3c410 upstream. If decode_getfh failed, nfs4_xdr_dec_open would return 0 since the last decode_* call must have succeeded. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02NFS: Fix a problem with the legacy binary mount codeTrond Myklebust
commit 872ece86ea5c367aa92f44689c2d01a1c767aeb3 upstream. Apparently, am-utils is still using the legacy binary mountdata interface, and is having trouble parsing /proc/mounts due to the 'port=' field being incorrectly set. The following patch should fix up the regression. Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02NFS: Fix the initialisation of the readdir 'cookieverf' arrayTrond Myklebust
commit c3f52af3e03013db5237e339c817beaae5ec9e3a upstream. When the NFS_COOKIEVERF helper macro was converted into a static inline function in commit 99fadcd764 (nfs: convert NFS_*(inode) helpers to static inline), we broke the initialisation of the readdir cookies, since that depended on doing a memset with an argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *). At this point, NFS_COOKIEVERF seems to be more of an obfuscation than a helper, so the best thing would be to just get rid of it. Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881 Reported-by: Andi Kleen <andi@firstfloor.org> Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02cifs: fix return value in cifsConvertToUTF16Jeff Layton
commit c73f693989d7a7d99ec66a7065295a0c93d0b127 upstream. This function returns the wrong value, which causes the callers to get the length of the resulting pathname wrong when it contains non-ASCII characters. This seems to fix https://bugzilla.samba.org/show_bug.cgi?id=6767 Reported-by: Baldvin Kovacs <baldvin.kovacs@gmail.com> Reported-and-Tested-by: Nicolas Lefebvre <nico.lefebvre@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02fs/proc: fix potential unregister_sysctl_table hangFrancesco Ruggeri
commit 6bf6104573482570f7103d3e5ddf9574db43a363 upstream. The unregister_sysctl_table() function hangs if all references to its ctl_table_header structure are not dropped. This can happen sometimes because of a leak in proc_sys_lookup(): proc_sys_lookup() gets a reference to the table via lookup_entry(), but it does not release it when a subsequent call to sysctl_follow_link() fails. This patch fixes this leak by making sure the reference is always dropped on return. See also commit 076c3eed2c31 ("sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry") which reorganized this code in 3.4. Tested in Linux 3.4.4. Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02eCryptfs: Copy up attributes of the lower target inode after renameTyler Hicks
commit 8335eafc2859e1a26282bef7c3d19f3d68868b8a upstream. After calling into the lower filesystem to do a rename, the lower target inode's attributes were not copied up to the eCryptfs target inode. This resulted in the eCryptfs target inode staying around, rather than being evicted, because i_nlink was not updated for the eCryptfs inode. This also meant that eCryptfs didn't do the final iput() on the lower target inode so it stayed around, as well. This would result in a failure to free up space occupied by the target file in the rename() operation. Both target inodes would eventually be evicted when the eCryptfs filesystem was unmounted. This patch calls fsstack_copy_attr_all() after the lower filesystem does its ->rename() so that important inode attributes, such as i_nlink, are updated at the eCryptfs layer. ecryptfs_evict_inode() is now called and eCryptfs can drop its final reference on the lower inode. http://launchpad.net/bugs/561129 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()Miklos Szeredi
commit b161dfa6937ae46d50adce8a7c6b12233e96e7bd upstream. IBM reported a soft lockup after applying the fix for the rename_lock deadlock. Commit c83ce989cb5f ("VFS: Fix the nfs sillyrename regression in kernel 2.6.38") was found to be the culprit. The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the dentry was killed. This flag can be set on non-killed dentries too, which results in infinite retries when trying to traverse the dentry tree. This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is only set in d_kill() and makes try_to_ascend() test only this flag. IBM reported successful test results with this patch. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02vfs: make O_PATH file descriptors usable for 'fstat()'Linus Torvalds
commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2 upstream. We already use them for openat() and friends, but fstat() also wants to be able to use O_PATH file descriptors. This should make it more directly comparable to the O_SEARCH of Solaris. Note that you could already do the same thing with "fstatat()" and an empty path, but just doing "fstat()" directly is simpler and faster, so there is no reason not to just allow it directly. See also commit 332a2e1244bd, which did the same thing for fchdir, for the same reasons. Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14fuse: fix retrieve lengthMiklos Szeredi
commit c9e67d483776d8d2a5f3f70491161b205930ffe1 upstream. In some cases fuse_retrieve() would return a short byte count if offset was non-zero. The data returned was correct, though. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14ext3: Fix fdatasync() for files with only i_size changesJan Kara
commit 156bddd8e505b295540f3ca0e27dda68cb0d49aa upstream. Code tracking when transaction needs to be committed on fdatasync(2) forgets to handle a situation when only inode's i_size is changed. Thus in such situations fdatasync(2) doesn't force transaction with new i_size to disk and that can result in wrong i_size after a crash. Fix the issue by updating inode's i_datasync_tid whenever its size is updated. Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14udf: Fix data corruption for files in ICBJan Kara
commit 9c2fc0de1a6e638fe58c354a463f544f42a90a09 upstream. When a file is stored in ICB (inode), we overwrite part of the file, and the page containing file's data is not in page cache, we end up corrupting file's data by overwriting them with zeros. The problem is we use simple_write_begin() which simply zeroes parts of the page which are not written to. The problem has been introduced by be021ee4 (udf: convert to new aops). Fix the problem by providing a ->write_begin function which makes the page properly uptodate. Reported-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14block: replace __getblk_slow misfix by grow_dev_page fixHugh Dickins
commit 676ce6d5ca3098339c028d44fe0427d1566a4d2d upstream. Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow") is not good: a successful call to grow_buffers() cannot guarantee that the page won't be reclaimed before the immediate next call to __find_get_block(), which is why there was always a loop there. Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595: inode #19278: block 664: comm cc1: unable to read itable block" on console, which pointed to this commit. I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs sometimes fails from a missing header file, under memory pressure on ppc G5. I've never seen this on x86, and I've never seen it on 3.5-rc7 itself, despite that commit being in there: bisection pointed to an irrelevant pinctrl merge, but hard to tell when failure takes between 18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2). (I've since found such __ext4_get_inode_loc errors in /var/log/messages from previous weeks: why the message never appeared on console until yesterday morning is a mystery for another day.) Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus a checkpatch nitfix). Simplify the interface between grow_buffers() and grow_dev_page(), and avoid the infinite loop beyond end of device by instead checking init_page_buffers()'s end_block there (I presume that's more efficient than a repeated call to blkdev_max_block()), returning -ENXIO to __getblk_slow() in that case. And remove akpm's ten-year-old "__getblk() cannot fail ... weird" comment, but that is worrying: are all users of __getblk() really now prepared for a NULL bh beyond end of device, or will some oops?? Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14NFS: Alias the nfs module to nfs4bjschuma@gmail.com
commit 425e776d93a7a5070b77d4f458a5bab0f924652c upstream. This allows distros to remove the line from their modprobe configuration. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14UBIFS: fix complaints about too small debug buffer sizeArtem Bityutskiy
commit 65b455b123c7e2b835a0b7148f9bae584f95000e upstream. When debugging is enabled, we use a temporary on-stack buffer for formatting the key strings like "(11368871, direntry, 0xcd0750)". The buffer size is 32 bytes and sometimes it is not enough to fit the key string - e.g., when inode numbers are high. This is not fatal, but the key strings are incomplete and UBIFS complains like this: UBIFS assert failed in dbg_snprintf_key at 137 (pid 1) This is a regression caused by "515315a UBIFS: fix key printing". Fix the issue by increasing the buffer to 48 bytes. Reported-by: Michael Hench <michaelhench@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Tested-by: Michael Hench <michaelhench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14NFS: return -ENOKEY when the upcall fails to map the nameBryan Schumaker
commit 12dfd080556124088ed61a292184947711b46cbe upstream. This allows the normal error-paths to handle the error, rather than making a special call to complete_request_key() just for this instance. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Tested-by: William Dauchy <wdauchy@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14NFS: Clear key construction data if the idmap upcall failsBryan Schumaker
commit c5066945b7ea346a11424dbeb7830b7d7d00c206 upstream. idmap_pipe_downcall already clears this field if the upcall succeeds, but if it fails (rpc.idmapd isn't running) the field will still be set on the next call triggering a BUG_ON(). This patch tries to handle all possible ways that the upcall could fail and clear the idmap key data for each one. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Tested-by: William Dauchy <wdauchy@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_doneTrond Myklebust
commit 47fbf7976e0b7d9dcdd799e2a1baba19064d9631 upstream. Ever since commit 0a57cdac3f (NFSv4.1 send layoutreturn to fence disconnected data server) we've been sending layoutreturn calls while there is potentially still outstanding I/O to the data servers. The reason we do this is to avoid races between replayed writes to the MDS and the original writes to the DS. When this happens, the BUG_ON() in nfs4_layoutreturn_done can be triggered because it assumes that we would never call layoutreturn without knowing that all I/O to the DS is finished. The fix is to remove the BUG_ON() now that the assumptions behind the test are obsolete. Reported-by: Boaz Harrosh <bharrosh@panasas.com> Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14pnfs: defer release of pages in layoutgetIdan Kedar
commit 8554116e17eef055d9dd58a94b3427cb2ad1c317 upstream. we have encountered a bug whereby reading a lot of files (copying fedora's /bin) from a pNFS mount and hitting Ctrl+C in the middle caused a general protection fault in xdr_shrink_bufhead. this function is called when decoding the response from LAYOUTGET. the decoding is done by a worker thread, and the caller of LAYOUTGET waits for the worker thread to complete. hitting Ctrl+C caused the synchronous wait to end and the next thing the caller does is to free the pages, so when the worker thread calls xdr_shrink_bufhead, the pages are gone. therefore, the cleanup of these pages has been moved to nfs4_layoutget_release. Signed-off-by: Idan Kedar <idank@tonian.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14NFSv3: Ensure that do_proc_get_root() reports errors correctlyTrond Myklebust
commit 086600430493e04b802bee6e5b3ce0458e4eb77f upstream. If the rpc call to NFS3PROC_FSINFO fails, then we need to report that error so that the mount fails. Otherwise we can end up with a superblock with completely unusable values for block sizes, maxfilesize, etc. Reported-by: Yuanming Chen <hikvision_linux@163.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14vfs: canonicalize create mode in build_open_flags()Miklos Szeredi
commit e68726ff72cf7ba5e7d789857fcd9a75ca573f03 upstream. Userspace can pass weird create mode in open(2) that we canonicalize to "(mode & S_IALLUGO) | S_IFREG" in vfs_create(). The problem is that we use the uncanonicalized mode before calling vfs_create() with unforseen consequences. So do the canonicalization early in build_open_flags(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14vfs: missed source of ->f_pos racesAl Viro
commit 0e665d5d1125f9f4ccff56a75e814f10f88861a2 upstream. compat_sys_{read,write}v() need the same "pass a copy of file->f_pos" thing as sys_{read,write}{,v}(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26ext4: fix kernel BUG on large-scale rm -rf commandsTheodore Ts'o
commit 89a4e48f8479f8145eca9698f39fe188c982212f upstream. Commit 968dee7722: "ext4: fix hole punch failure when depth is greater than 0" introduced a regression in v3.5.1/v3.6-rc1 which caused kernel crashes when users ran run "rm -rf" on large directory hierarchy on ext4 filesystems on RAID devices: BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 Process rm (pid: 18229, threadinfo ffff8801276bc000, task ffff880123631710) Call Trace: [<ffffffff81236483>] ? __ext4_handle_dirty_metadata+0x83/0x110 [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0 [<ffffffff8120a8cf>] ? ext4_mark_inode_dirty+0x7f/0x1f0 [<ffffffff81207e05>] ext4_truncate+0xf5/0x100 [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490 [<ffffffff811a1312>] evict+0xa2/0x1a0 [<ffffffff811a1513>] iput+0x103/0x1f0 [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0 [<ffffffff8118cc3a>] ? sys_newfstatat+0x2a/0x40 [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50 [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00 RIP [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0 This could be reproduced as follows: The problem in commit 968dee7722 was that caused the variable 'i' to be left uninitialized if the truncate required more space than was available in the journal. This resulted in the function ext4_ext_truncate_extend_restart() returning -EAGAIN, which caused ext4_ext_remove_space() to restart the truncate operation after starting a new jbd2 handle. Reported-by: Maciej Żenczykowski <maze@google.com> Reported-by: Marti Raudsepp <marti@juffo.org> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26ext4: fix long mount times on very big file systemsTheodore Ts'o
commit 0548bbb85337e532ca2ed697c3e9b227ff2ed4b4 upstream. Commit 8aeb00ff85a: "ext4: fix overhead calculation used by ext4_statfs()" introduced a O(n**2) calculation which makes very large file systems take forever to mount. Fix this with an optimization for non-bigalloc file systems. (For bigalloc file systems the overhead needs to be set in the the superblock.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26ext4: avoid kmemcheck complaint from reading uninitialized memoryTheodore Ts'o
commit 7e731bc9a12339f344cddf82166b82633d99dd86 upstream. Commit 03179fe923 introduced a kmemcheck complaint in ext4_da_get_block_prep() because we save and restore ei->i_da_metadata_calc_last_lblock even though it is left uninitialized in the case where i_da_metadata_calc_len is zero. This doesn't hurt anything, but silencing the kmemcheck complaint makes it easier for people to find real bugs. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631 (which is marked as a regression). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26ext4: make sure the journal sb is written in ext4_clear_journal_err()Theodore Ts'o
commit d796c52ef0b71a988364f6109aeb63d79c5b116b upstream. After we transfer set the EXT4_ERROR_FS bit in the file system superblock, it's not enough to call jbd2_journal_clear_err() to clear the error indication from journal superblock --- we need to call jbd2_journal_update_sb_errno() as well. Otherwise, when the root file system is mounted read-only, the journal is replayed, and the error indicator is transferred to the superblock --- but the s_errno field in the jbd2 superblock is left set (since although we cleared it in memory, we never flushed it out to disk). This can end up confusing e2fsck. We should make e2fsck more robust in this case, but the kernel shouldn't be leaving things in this confused state, either. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26fuse: verify all ioctl retry iov elementsZach Brown
commit fb6ccff667712c46b4501b920ea73a326e49626a upstream. Commit 7572777eef78ebdee1ecb7c258c0ef94d35bad16 attempted to verify that the total iovec from the client doesn't overflow iov_length() but it only checked the first element. The iovec could still overflow by starting with a small element. The obvious fix is to check all the elements. The overflow case doesn't look dangerous to the kernel as the copy is limited by the length after the overflow. This fix restores the intention of returning an error instead of successfully copying less than the iovec represented. I found this by code inspection. I built it but don't have a test case. I'm cc:ing stable because the initial commit did as well. Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-15ore: Fix out-of-bounds access in _ios_obj()Boaz Harrosh
commit 9e62bb4458ad2cf28bd701aa5fab380b846db326 upstream. _ios_obj() is accessed by group_index not device_table index. The oc->comps array is only a group_full of devices at a time it is not like ore_comp_dev() which is indexed by a global device_table index. This did not BUG until now because exofs only uses a single COMP for all devices. But with other FSs like PanFS this is not true. This bug was only in the write_path, all other users were using it correctly [This is a bug since 3.2 Kernel] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-15nilfs2: fix deadlock issue between chcp and thaw ioctlsRyusuke Konishi
commit 572d8b3945a31bee7c40d21556803e4807fd9141 upstream. An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command: chcp D ffff88013870f3d0 0 1325 1324 0x00000004 ... Call Trace: nilfs_transaction_begin+0x11c/0x1a0 [nilfs2] wake_up_bit+0x20/0x20 copy_from_user+0x18/0x30 [nilfs2] nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2] nilfs_ioctl+0x252/0x61a [nilfs2] do_page_fault+0x311/0x34c get_unmapped_area+0x132/0x14e do_vfs_ioctl+0x44b/0x490 __set_task_blocked+0x5a/0x61 vm_mmap_pgoff+0x76/0x87 __set_current_blocked+0x30/0x4a sys_ioctl+0x4b/0x6f system_call_fastpath+0x16/0x1b thaw D ffff88013870d890 0 1352 1351 0x00000004 ... Call Trace: rwsem_down_failed_common+0xdb/0x10f call_rwsem_down_write_failed+0x13/0x20 down_write+0x25/0x27 thaw_super+0x13/0x9e do_vfs_ioctl+0x1f5/0x490 vm_mmap_pgoff+0x76/0x87 sys_ioctl+0x4b/0x6f filp_close+0x64/0x6c system_call_fastpath+0x16/0x1b where the thaw ioctl deadlocked at thaw_super() when called while chcp was waiting at nilfs_transaction_begin() called from nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible. This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in read mode and then waits for unfreezing in nilfs_transaction_begin(), whereas thaw_super() locks sb->s_umount in write mode. The locking of sb->s_umount here was intended to make snapshot mounts and the downgrade of snapshots to checkpoints exclusive. This fixes the deadlock issue by replacing the sb->s_umount usage in nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09ext4: undo ext4_calc_metadata_amount if we fail to claim spaceTheodore Ts'o
commit 03179fe92318e7934c180d96f12eff2cb36ef7b6 upstream. The function ext4_calc_metadata_amount() has side effects, although it's not obvious from its function name. So if we fail to claim space, regardless of whether we retry to claim the space again, or return an error, we need to undo these side effects. Otherwise we can end up incorrectly calculating the number of metadata blocks needed for the operation, which was responsible for an xfstests failure for test #271 when using an ext2 file system with delalloc enabled. Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09ext4: don't let i_reserved_meta_blocks go negativeBrian Foster
commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream. If we hit a condition where we have allocated metadata blocks that were not appropriately reserved, we risk underflow of ei->i_reserved_meta_blocks. In turn, this can throw sbi->s_dirtyclusters_counter significantly out of whack and undermine the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this occurs and set i_allocated_meta_blocks to avoid this problem. This condition is reproduced by xfstests 270 against ext2 with delalloc enabled: Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28 Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost 270 ultimately fails with an inconsistent filesystem and requires an fsck to repair. The cause of the error is an underflow in ext4_da_update_reserve_space() due to an unreserved meta block allocation. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>