summaryrefslogtreecommitdiff
path: root/fs/overlayfs
AgeCommit message (Collapse)Author
2018-09-15ovl: proper cleanup of workdirMiklos Szeredi
commit eea2fb4851e9dcbab6b991aaf47e2e024f1f55a0 upstream. When mounting overlayfs it needs a clean "work" directory under the supplied workdir. Previously the mount code removed this directory if it already existed and created a new one. If the removal failed (e.g. directory was not empty) then it fell back to a read-only mount not using the workdir. While this has never been reported, it is possible to get a non-empty "work" dir from a previous mount of overlayfs in case of crash in the middle of an operation using the work directory. In this case the left over state should be discarded and the overlay filesystem will be consistent, guaranteed by the atomicity of operations on moving to/from the workdir to the upper layer. This patch implements cleaning out any files left in workdir. It is implemented using real recursion for simplicity, but the depth is limited to 2, because the worst case is that of a directory containing whiteouts under "work". Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-15ovl: override creds with the ones from the superblock mounterAntonio Murdaca
commit 3fe6e52f062643676eb4518d68cee3bc1272091b upstream. In user namespace the whiteout creation fails with -EPERM because the current process isn't capable(CAP_SYS_ADMIN) when setting xattr. A simple reproducer: $ mkdir upper lower work merged lower/dir $ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged $ unshare -m -p -f -U -r bash Now as root in the user namespace: \# touch merged/dir/{1,2,3} # this will force a copy up of lower/dir \# rm -fR merged/* This ends up failing with -EPERM after the files in dir has been correctly deleted: unlinkat(4, "2", 0) = 0 unlinkat(4, "1", 0) = 0 unlinkat(4, "3", 0) = 0 close(4) = 0 unlinkat(AT_FDCWD, "merged/dir", AT_REMOVEDIR) = -1 EPERM (Operation not permitted) Interestingly, if you don't place files in merged/dir you can remove it, meaning if upper/dir does not exist, creating the char device file works properly in that same location. This patch uses ovl_sb_creator_cred() to get the cred struct from the superblock mounter and override the old cred with these new ones so that the whiteout creation is possible because overlay is wrong in assuming that the creds it will get with prepare_creds will be in the initial user namespace. The old cap_raise game is removed in favor of just overriding the old cred struct. This patch also drops from ovl_copy_up_one() the following two lines: override_cred->fsuid = stat->uid; override_cred->fsgid = stat->gid; This is because the correct uid and gid are taken directly with the stat struct and correctly set with ovl_set_attr(). Signed-off-by: Antonio Murdaca <runcom@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-15ovl: rename is_merge to is_lowestMiklos Szeredi
commit 56656e960b555cb98bc414382566dcb59aae99a2 upstream. The 'is_merge' is an historical naming from when only a single lower layer could exist. With the introduction of multiple lower layers the meaning of this flag was changed to mean only the "lowest layer" (while all lower layers were being merged). So now 'is_merge' is inaccurate and hence renaming to 'is_lowest' Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-28ovl: warn instead of error if d_type is not supportedVivek Goyal
commit e7c0b5991dd1be7b6f6dc2b54a15a0f47b64b007 upstream. overlay needs underlying fs to support d_type. Recently I put in a patch in to detect this condition and started failing mount if underlying fs did not support d_type. But this breaks existing configurations over kernel upgrade. Those who are running docker (partially broken configuration) with xfs not supporting d_type, are surprised that after kernel upgrade docker does not run anymore. https://github.com/docker/docker/issues/22937#issuecomment-229881315 So instead of erroring out, detect broken configuration and warn about it. This should allow existing docker setups to continue working after kernel upgrade. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 45aebeaf4f67 ("ovl: Ensure upper filesystem supports d_type") Cc: <stable@vger.kernel.org> 4.6 Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-28ovl: Do d_type check only if work dir creation was successfulVivek Goyal
commit 21765194cecf2e4514ad75244df459f188140a0f upstream. d_type check requires successful creation of workdir as iterates through work dir and expects work dir to be present in it. If that's not the case, this check will always return d_type not supported even if underlying filesystem might be supporting it. So don't do this check if work dir creation failed in previous step. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-28ovl: Ensure upper filesystem supports d_typeVivek Goyal
commit 45aebeaf4f67468f76bedf62923a576a519a9b68 upstream. In some instances xfs has been created with ftype=0 and there if a file on lower fs is removed, overlay leaves a whiteout in upper fs but that whiteout does not get filtered out and is visible to overlayfs users. And reason it does not get filtered out because upper filesystem does not report file type of whiteout as DT_CHR during iterate_dir(). So it seems to be a requirement that upper filesystem support d_type for overlayfs to work properly. Do this check during mount and fail if d_type is not supported. Suggested-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13ovl: filter trusted xattr for non-adminMiklos Szeredi
[ Upstream commit a082c6f680da298cf075886ff032f32ccb7c5e1a ] Filesystems filter out extended attributes in the "trusted." domain for unprivlieged callers. Overlay calls underlying filesystem's method with elevated privs, so need to do the filtering in overlayfs too. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16ovl: fix failure to fsync lower dirAmir Goldstein
commit d796e77f1dd541fe34481af2eee6454688d13982 upstream. As a writable mount, it is not expected for overlayfs to return EINVAL/EROFS for fsync, even if dir/file is not changed. This commit fixes the case of fsync of directory, which is easier to address, because overlayfs already implements fsync file operation for directories. The problem reported by Raphael is that new PostgreSQL 10.0 with a database in overlayfs where lower layer in squashfs fails to start. The failure is due to fsync error, when PostgreSQL does fsync on all existing db directories on startup and a specific directory exists lower layer with no changes. Reported-by: Raphael Hertzog <raphael@ouaza.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Tested-by: Raphaël Hertzog <hertzog@debian.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-10ovl: fsync after copy-upMiklos Szeredi
commit 641089c1549d8d3df0b047b5de7e9a111362cdce upstream. Make sure the copied up file hits the disk before renaming to the final destination. If this is not done then the copy-up may corrupt the data in the file in case of a crash. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28ovl: copy_up_xattr(): use strnlenMiklos Szeredi
commit 8b326c61de08f5ca4bc454a168f19e7e43c4cc2a upstream. Be defensive about what underlying fs provides us in the returned xattr list buffer. strlen() may overrun the buffer, so use strnlen() and WARN if the contents are not properly null terminated. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28ovl: Fix info leak in ovl_lookup_temp()Richard Weinberger
commit 6a45b3628ce4dcf7498b39c87d475bab6e2a9b24 upstream. The function uses the memory address of a struct dentry as unique id. While the address-based directory entry is only visible to root it is IMHO still worth fixing since the temporary name does not have to be a kernel address. It can be any unique number. Replace it by an atomic integer which is allowed to wrap around. Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: e9be9d5e76e3 ("overlay filesystem") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15ovl: fix workdir creationMiklos Szeredi
commit e1ff3dd1ae52cef5b5373c8cc4ad949c2c25a71c upstream. Workdir creation fails in latest kernel. Fix by allowing EOPNOTSUPP as a valid return value from vfs_removexattr(XATTR_NAME_POSIX_ACL_*). Upper filesystem may not support ACL and still be perfectly able to support overlayfs. Reported-by: Martin Ziegler <ziegler@uni-freiburg.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: c11b9fdd6a61 ("ovl: remove posix_acl_default from workdir") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15ovl: listxattr: use strnlen()Miklos Szeredi
commit 7cb35119d067191ce9ebc380a599db0b03cbd9d9 upstream. Be defensive about what underlying fs provides us in the returned xattr list buffer. If it's not properly null terminated, bail out with a warning insead of BUG. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15ovl: remove posix_acl_default from workdirMiklos Szeredi
commit c11b9fdd6a612f376a5e886505f1c54c16d8c380 upstream. Clear out posix acl xattrs on workdir and also reset the mode after creation so that an inherited sgid bit is cleared. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15ovl: don't copy up opaquenessMiklos Szeredi
commit 0956254a2d5b9e2141385514553aeef694dfe3b5 upstream. When a copy up of a directory occurs which has the opaque xattr set, the xattr remains in the upper directory. The immediate behavior with overlayfs is that the upper directory is not treated as opaque, however after a remount the opaque flag is used and upper directory is treated as opaque. This causes files created in the lower layer to be hidden when using multiple lower directories. Fix by not copying up the opaque flag. To reproduce: ----8<---------8<---------8<---------8<---------8<---------8<---- mkdir -p l/d/s u v w mnt mount -t overlay overlay -olowerdir=l,upperdir=u,workdir=w mnt rm -rf mnt/d/ mkdir -p mnt/d/n umount mnt mount -t overlay overlay -olowerdir=u:l,upperdir=v,workdir=w mnt touch mnt/d/foo umount mnt mount -t overlay overlay -olowerdir=u:l,upperdir=v,workdir=w mnt ls mnt/d ----8<---------8<---------8<---------8<---------8<---------8<---- output should be: "foo n" Reported-by: Derek McGowan <dmcg@drizz.net> Link: https://bugzilla.kernel.org/show_bug.cgi?id=151291 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20ovl: disallow overlayfs as upperdirMiklos Szeredi
commit 76bc8e2843b66f8205026365966b49ec6da39ae7 upstream. This does not work and does not make sense. So instead of fixing it (probably not hard) just disallow. Reported-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10ovl: handle ATTR_KILL*Miklos Szeredi
commit b99c2d913810e56682a538c9f2394d76fca808f8 upstream. Before 4bacc9c9234c ("overlayfs: Make f_path...") file->f_path pointed to the underlying file, hence suid/sgid removal on write worked fine. After that patch file->f_path pointed to the overlay file, and the file mode bits weren't copied to overlay_inode->i_mode. So the suid/sgid removal simply stopped working. The fix is to copy the mode bits, but then ovl_setattr() needs to clear ATTR_MODE to avoid the BUG() in notify_change(). So do this first, then in the next patch copy the mode. Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: Eric Schultz <eric@startuperic.com> Cc: Eric Hameleers <alien@slackware.com> [backported by Eric Hameleers as seen in https://bugzilla.kernel.org/show_bug.cgi?id=150711] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27ovl: verify upper dentry before unlink and renameMiklos Szeredi
commit 11f3710417d026ea2f4fcf362d866342c5274185 upstream. Unlink and rename in overlayfs checked the upper dentry for staleness by verifying upper->d_parent against upperdir. However the dentry can go stale also by being unhashed, for example. Expand the verification to actually look up the name again (under parent lock) and check if it matches the upper dentry. This matches what the VFS does before passing the dentry to filesytem's unlink/rename methods, which excludes any inconsistency caused by overlayfs. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27ovl: verify upper dentry in ovl_remove_and_whiteout()Maxim Patlasov
commit cfc9fde0b07c3b44b570057c5f93dda59dca1c94 upstream. The upper dentry may become stale before we call ovl_lock_rename_workdir. For example, someone could (mistakenly or maliciously) manually unlink(2) it directly from upperdir. To ensure it is not stale, let's lookup it after ovl_lock_rename_workdir and and check if it matches the upper dentry. Essentially, it is the same problem and similar solution as in commit 11f3710417d0 ("ovl: verify upper dentry before unlink and rename"). Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27ovl: Copy up underlying inode's ->i_mode to overlay inodeVivek Goyal
commit 07a2daab49c549a37b5b744cbebb6e3f445f12bc upstream. Right now when a new overlay inode is created, we initialize overlay inode's ->i_mode from underlying inode ->i_mode but we retain only file type bits (S_IFMT) and discard permission bits. This patch changes it and retains permission bits too. This should allow overlay to do permission checks on overlay inode itself in task context. [SzM] It also fixes clearing suid/sgid bits on write. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-20fs: add file_dentry()Miklos Szeredi
commit d101a125954eae1d397adda94ca6319485a50493 upstream. This series fixes bugs in nfs and ext4 due to 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay"). Regular files opened on overlayfs will result in the file being opened on the underlying filesystem, while f_path points to the overlayfs mount/dentry. This confuses filesystems which get the dentry from struct file and assume it's theirs. Add a new helper, file_dentry() [*], to get the filesystem's own dentry from the file. This checks file->f_path.dentry->d_flags against DCACHE_OP_REAL, and returns file->f_path.dentry if DCACHE_OP_REAL is not set (this is the common, non-overlayfs case). In the uncommon case it will call into overlayfs's ->d_real() to get the underlying dentry, matching file_inode(file). The reason we need to check against the inode is that if the file is copied up while being open, d_real() would return the upper dentry, while the open file comes from the lower dentry. [*] If possible, it's better simply to use file_inode() instead. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-16ovl: fix getcwd() failure after unsuccessful rmdirRui Wang
commit ce9113bbcbf45a57c082d6603b9a9f342be3ef74 upstream. ovl_remove_upper() should do d_drop() only after it successfully removes the dir, otherwise a subsequent getcwd() system call will fail, breaking userspace programs. This is to fix: https://bugzilla.kernel.org/show_bug.cgi?id=110491 Signed-off-by: Rui Wang <rui.y.wang@intel.com> Reviewed-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-16ovl: copy new uid/gid into overlayfs runtime inodeKonstantin Khlebnikov
commit b81de061fa59f17d2730aabb1b84419ef3913810 upstream. Overlayfs must update uid/gid after chown, otherwise functions like inode_owner_or_capable() will check user against stale uid. Catched by xfstests generic/087, it chowns file and calls utimes. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-16ovl: fix working on distributed fs as lower layerKonstantin Khlebnikov
commit b5891cfab08fe3144a616e8e734df7749fb3b7d0 upstream. This adds missing .d_select_inode into alternative dentry_operations. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Fixes: 7c03b5d45b8e ("ovl: allow distributed fs as lower layer") Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Reviewed-by: Nikolay Borisov <kernel@kyup.com> Tested-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-16ovl: ignore lower entries when checking purity of non-directory entriesKonstantin Khlebnikov
commit 45d11738969633ec07ca35d75d486bf2d8918df6 upstream. After rename file dentry still holds reference to lower dentry from previous location. This doesn't matter for data access because data comes from upper dentry. But this stale lower dentry taints dentry at new location and turns it into non-pure upper. Such file leaves visible whiteout entry after remove in directory which shouldn't have whiteouts at all. Overlayfs already tracks pureness of file location in oe->opaque. This patch just uses that for detecting actual path type. Comment from Vivek Goyal's patch: Here are the details of the problem. Do following. $ mkdir upper lower work merged upper/dir/ $ touch lower/test $ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir= work merged $ mv merged/test merged/dir/ $ rm merged/dir/test $ ls -l merged/dir/ /usr/bin/ls: cannot access merged/dir/test: No such file or directory total 0 c????????? ? ? ? ? ? test Basic problem seems to be that once a file has been unlinked, a whiteout has been left behind which was not needed and hence it becomes visible. Whiteout is visible because parent dir is of not type MERGE, hence od->is_real is set during ovl_dir_open(). And that means ovl_iterate() passes on iterate handling directly to underlying fs. Underlying fs does not know/filter whiteouts so it becomes visible to user. Why did we leave a whiteout to begin with when we should not have. ovl_do_remove() checks for OVL_TYPE_PURE_UPPER() and does not leave whiteout if file is pure upper. In this case file is not found to be pure upper hence whiteout is left. So why file was not PURE_UPPER in this case? I think because dentry is still carrying some leftover state which was valid before rename. For example, od->numlower was set to 1 as it was a lower file. After rename, this state is not valid anymore as there is no such file in lower. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Viktor Stanchev <me@viktorstanchev.com> Suggested-by: Vivek Goyal <vgoyal@redhat.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=109611 Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25ovl: setattr: check permissions before copy-upMiklos Szeredi
commit cf9a6784f7c1b5ee2b9159a1246e327c331c5697 upstream. Without this copy-up of a file can be forced, even without actually being allowed to do anything on the file. [Arnd Bergmann] include <linux/pagemap.h> for PAGE_CACHE_SIZE (used by MAX_LFS_FILESIZE definition). Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25ovl: root: copy attrMiklos Szeredi
commit ed06e069775ad9236087594a1c1667367e983fb5 upstream. We copy i_uid and i_gid of underlying inode into overlayfs inode. Except for the root inode. Fix this omission. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25ovl: check dentry positiveness in ovl_cleanup_whiteouts()Konstantin Khlebnikov
commit 84889d49335627bc770b32787c1ef9ebad1da232 upstream. This patch fixes kernel crash at removing directory which contains whiteouts from lower layers. Cache of directory content passed as "list" contains entries from all layers, including whiteouts from lower layers. So, lookup in upper dir (moved into work at this stage) will return negative entry. Plus this cache is filled long before and we can race with external removal. Example: mkdir -p lower0/dir lower1/dir upper work overlay touch lower0/dir/a lower0/dir/b mknod lower1/dir/a c 0 0 mount -t overlay none overlay -o lowerdir=lower1:lower0,upperdir=upper,workdir=work rm -fr overlay/dir Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25ovl: use a minimal buffer in ovl_copy_xattrVito Caputo
commit e4ad29fa0d224d05e08b2858e65f112fd8edd4fe upstream. Rather than always allocating the high-order XATTR_SIZE_MAX buffer which is costly and prone to failure, only allocate what is needed and realloc if necessary. Fixes https://github.com/coreos/bugs/issues/489 Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25ovl: allow zero size xattrMiklos Szeredi
commit 97daf8b97ad6f913a34c82515be64dc9ac08d63e upstream. When ovl_copy_xattr() encountered a zero size xattr no more xattrs were copied and the function returned success. This is clearly not the desired behavior. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-06ovl: get rid of the dead code left from broken (and disabled) optimizationsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06ovl: fix permission checking for setattrMiklos Szeredi
[Al Viro] The bug is in being too enthusiastic about optimizing ->setattr() away - instead of "copy verbatim with metadata" + "chmod/chown/utimes" (with the former being always safe and the latter failing in case of insufficient permissions) it tries to combine these two. Note that copyup itself will have to do ->setattr() anyway; _that_ is where the elevated capabilities are right. Having these two ->setattr() (one to set verbatim copy of metadata, another to do what overlayfs ->setattr() had been asked to do in the first place) combined is where it breaks. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-10-31Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs bug fixes from Miklos Szeredi: "This contains fixes for bugs that appeared in earlier kernels (all are marked for -stable)" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: free lower_mnt array in ovl_put_super ovl: free stack of paths in ovl_fill_super ovl: fix open in stacked overlay ovl: fix dentry reference leak ovl: use O_LARGEFILE in ovl_copy_up()
2015-10-12ovl: free lower_mnt array in ovl_put_superKonstantin Khlebnikov
This fixes memory leak after umount. Kmemleak report: unreferenced object 0xffff8800ba791010 (size 8): comm "mount", pid 2394, jiffies 4294996294 (age 53.920s) hex dump (first 8 bytes): 20 1c 13 02 00 88 ff ff ....... backtrace: [<ffffffff811f8cd4>] create_object+0x124/0x2c0 [<ffffffff817a059b>] kmemleak_alloc+0x7b/0xc0 [<ffffffff811dffe6>] __kmalloc+0x106/0x340 [<ffffffffa0152bfc>] ovl_fill_super+0x55c/0x9b0 [overlay] [<ffffffff81200ac4>] mount_nodev+0x54/0xa0 [<ffffffffa0152118>] ovl_mount+0x18/0x20 [overlay] [<ffffffff81201ab3>] mount_fs+0x43/0x170 [<ffffffff81220d34>] vfs_kern_mount+0x74/0x170 [<ffffffff812233ad>] do_mount+0x22d/0xdf0 [<ffffffff812242cb>] SyS_mount+0x7b/0xc0 [<ffffffff817b6bee>] entry_SYSCALL_64_fastpath+0x12/0x76 [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: dd662667e6d3 ("ovl: add mutli-layer infrastructure") Cc: <stable@vger.kernel.org> # v4.0+
2015-10-12ovl: free stack of paths in ovl_fill_superKonstantin Khlebnikov
This fixes small memory leak after mount. Kmemleak report: unreferenced object 0xffff88003683fe00 (size 16): comm "mount", pid 2029, jiffies 4294909563 (age 33.380s) hex dump (first 16 bytes): 20 27 1f bb 00 88 ff ff 40 4b 0f 36 02 88 ff ff '......@K.6.... backtrace: [<ffffffff811f8cd4>] create_object+0x124/0x2c0 [<ffffffff817a059b>] kmemleak_alloc+0x7b/0xc0 [<ffffffff811dffe6>] __kmalloc+0x106/0x340 [<ffffffffa01b7a29>] ovl_fill_super+0x389/0x9a0 [overlay] [<ffffffff81200ac4>] mount_nodev+0x54/0xa0 [<ffffffffa01b7118>] ovl_mount+0x18/0x20 [overlay] [<ffffffff81201ab3>] mount_fs+0x43/0x170 [<ffffffff81220d34>] vfs_kern_mount+0x74/0x170 [<ffffffff812233ad>] do_mount+0x22d/0xdf0 [<ffffffff812242cb>] SyS_mount+0x7b/0xc0 [<ffffffff817b6bee>] entry_SYSCALL_64_fastpath+0x12/0x76 [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: a78d9f0d5d5c ("ovl: support multiple lower layers") Cc: <stable@vger.kernel.org> # v4.0+
2015-10-12ovl: fix open in stacked overlayMiklos Szeredi
If two overlayfs filesystems are stacked on top of each other, then we need recursion in ovl_d_select_inode(). I guess d_backing_inode() is supposed to do that. But currently it doesn't and that functionality is open coded in vfs_open(). This is now copied into ovl_d_select_inode() to fix this regression. Reported-by: Alban Crequy <alban.crequy@gmail.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay...") Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> # v4.2+
2015-10-12ovl: fix dentry reference leakDavid Howells
In ovl_copy_up_locked(), newdentry is leaked if the function exits through out_cleanup as this just to out after calling ovl_cleanup() - which doesn't actually release the ref on newdentry. The out_cleanup segment should instead exit through out2 as certainly newdentry leaks - and possibly upper does also, though this isn't caught given the catch of newdentry. Without this fix, something like the following is seen: BUG: Dentry ffff880023e9eb20{i=f861,n=#ffff880023e82d90} still in use (1) [unmount of tmpfs tmpfs] BUG: Dentry ffff880023ece640{i=0,n=bigfile} still in use (1) [unmount of tmpfs tmpfs] when unmounting the upper layer after an error occurred in copyup. An error can be induced by creating a big file in a lower layer with something like: dd if=/dev/zero of=/lower/a/bigfile bs=65536 count=1 seek=$((0xf000)) to create a large file (4.1G). Overlay an upper layer that is too small (on tmpfs might do) and then induce a copy up by opening it writably. Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> # v3.18+
2015-10-12ovl: use O_LARGEFILE in ovl_copy_up()David Howells
Open the lower file with O_LARGEFILE in ovl_copy_up(). Pass O_LARGEFILE unconditionally in ovl_copy_up_data() as it's purely for catching 32-bit userspace dealing with a file large enough that it'll be mishandled if the application isn't aware that there might be an integer overflow. Inside the kernel, there shouldn't be any problems. Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> # v3.18+
2015-09-04fs: create and use seq_show_option for escapingKees Cook
Many file systems that implement the show_options hook fail to correctly escape their output which could lead to unescaped characters (e.g. new lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This could lead to confusion, spoofed entries (resulting in things like systemd issuing false d-bus "mount" notifications), and who knows what else. This looks like it would only be the root user stepping on themselves, but it's possible weird things could happen in containers or in other situations with delegated mount privileges. Here's an example using overlay with setuid fusermount trusting the contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of "sudo" is something more sneaky: $ BASE="ovl" $ MNT="$BASE/mnt" $ LOW="$BASE/lower" $ UP="$BASE/upper" $ WORK="$BASE/work/ 0 0 none /proc fuse.pwn user_id=1000" $ mkdir -p "$LOW" "$UP" "$WORK" $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt $ cat /proc/mounts none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0 none /proc fuse.pwn user_id=1000 0 0 $ fusermount -u /proc $ cat /proc/mounts cat: /proc/mounts: No such file or directory This fixes the problem by adding new seq_show_option and seq_show_option_n helpers, and updating the vulnerable show_option handlers to use them as needed. Some, like SELinux, need to be open coded due to unusual existing escape mechanisms. [akpm@linux-foundation.org: add lost chunk, per Kees] [keescook@chromium.org: seq_show_option should be using const parameters] Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Jan Kara <jack@suse.com> Acked-by: Paul Moore <paul@paul-moore.com> Cc: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-12fix a braino in ovl_d_select_inode()Al Viro
when opening a directory we want the overlayfs inode, not one from the topmost layer. Reported-By: Andrey Jr. Melnikov <temnota.am@gmail.com> Tested-By: Andrey Jr. Melnikov <temnota.am@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...
2015-07-02Merge branch 'overlayfs-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This relaxes the requirements on the lower layer filesystem: now ones that implement .d_revalidate, such as NFS, can be used. Upper layer filesystems still has the "no .d_revalidate" requirement. Also a bad interaction with jffs2 locking has been fixed" * 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: lookup whiteouts outside iterate_dir() ovl: allow distributed fs as lower layer ovl: don't traverse automount points
2015-06-22Merge branch 'for-linus-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "In this pile: pathname resolution rewrite. - recursion in link_path_walk() is gone. - nesting limits on symlinks are gone (the only limit remaining is that the total amount of symlinks is no more than 40, no matter how nested). - "fast" (inline) symlinks are handled without leaving rcuwalk mode. - stack footprint (independent of the nesting) is below kilobyte now, about on par with what it used to be with one level of nested symlinks and ~2.8 times lower than it used to be in the worst case. - struct nameidata is entirely private to fs/namei.c now (not even opaque pointers are being passed around). - ->follow_link() and ->put_link() calling conventions had been changed; all in-tree filesystems converted, out-of-tree should be able to follow reasonably easily. For out-of-tree conversions, see Documentation/filesystems/porting for details (and in-tree filesystems for examples of conversion). That has sat in -next since mid-May, seems to survive all testing without regressions and merges clean with v4.1" * 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (131 commits) turn user_{path_at,path,lpath,path_dir}() into static inlines namei: move saved_nd pointer into struct nameidata inline user_path_create() inline user_path_parent() namei: trim do_last() arguments namei: stash dfd and name into nameidata namei: fold path_cleanup() into terminate_walk() namei: saner calling conventions for filename_parentat() namei: saner calling conventions for filename_create() namei: shift nameidata down into filename_parentat() namei: make filename_lookup() reject ERR_PTR() passed as name namei: shift nameidata inside filename_lookup() namei: move putname() call into filename_lookup() namei: pass the struct path to store the result down into path_lookupat() namei: uninline set_root{,_rcu}() namei: be careful with mountpoint crossings in follow_dotdot_rcu() Documentation: remove outdated information from automount-support.txt get rid of assorted nameidata-related debris lustre: kill unused helper lustre: kill unused macro (LOOKUP_CONTINUE) ...
2015-06-22ovl: lookup whiteouts outside iterate_dir()Miklos Szeredi
If jffs2 can deadlock on overlayfs readdir because it takes the same lock on ->iterate() as in ->lookup(). Fix by moving whiteout checking outside iterate_dir(). Optimized by collecting potential whiteouts (DT_CHR) in a temporary list and if non-empty iterating throug these and checking for a 0/0 chardev. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Fixes: 49c21e1cacd7 ("ovl: check whiteout while reading directory") Reported-by: Roman Yeryomin <leroi.lists@gmail.com>
2015-06-22ovl: allow distributed fs as lower layerMiklos Szeredi
Allow filesystems with .d_revalidate as lower layer(s), but not as upper layer. For local filesystems the rule was that modifications on the layers directly while being part of the overlay results in undefined behavior. This can easily be extended to distributed filesystems: we assume the tree used as lower layer is static, which means ->d_revalidate() should always return "1". If that is not the case, return -ESTALE, don't try to work around the modification. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-06-22ovl: don't traverse automount pointsMiklos Szeredi
NFS and other distributed filesystems may place automount points in the tree. Previoulsy overlayfs refused to mount such filesystems types (based on the existence of the .d_automount callback), even if the actual export didn't have any automount points. It cannot be determined in advance whether the filesystem has automount points or not. The solution is to allow fs with .d_automount but refuse to traverse any automount points encountered. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-06-19overlayfs: Make f_path always point to the overlay and f_inode to the underlayDavid Howells
Make file->f_path always point to the overlay dentry so that the path in /proc/pid/fd is correct and to ensure that label-based LSMs have access to the overlay as well as the underlay (path-based LSMs probably don't need it). Using my union testsuite to set things up, before the patch I see: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:38 5 -> /a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 13381 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 13381 Links: 1 ... After the patch: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:22 5 -> /mnt/a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 40346 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 40346 Links: 1 ... Note the change in where /proc/$$/fd/5 points to in the ls command. It was pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107 (which is correct). The inode accessed, however, is the lower layer. The union layer is on device 25h/37d and the upper layer on 24h/36d. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-19overlay: Call ovl_drop_write() earlier in ovl_dentry_open()David Howells
Call ovl_drop_write() earlier in ovl_dentry_open() before we call vfs_open() as we've done the copy up for which we needed the freeze-write lock by that point. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-19ovl: mount read-only if workdir can't be createdMiklos Szeredi
OpenWRT folks reported that overlayfs fails to mount if upper fs is full, because workdir can't be created. Wordir creation can fail for various other reasons too. There's no reason that the mount itself should fail, overlayfs can work fine without a workdir, as long as the overlay isn't modified. So mount it read-only and don't allow remounting read-write. Add a couple of WARN_ON()s for the impossible case of workdir being used despite being read-only. Reported-by: Bastian Bittorf <bittorf@bluebottle.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <stable@vger.kernel.org> # v3.18+
2015-05-14ovl: don't remove non-empty opaque directoryMiklos Szeredi
When removing an opaque directory we can't just call rmdir() to check for emptiness, because the directory will need to be replaced with a whiteout. The replacement is done with RENAME_EXCHANGE, which doesn't check emptiness. Solution is just to check emptiness by reading the directory. In the future we could add a new rename flag to check for emptiness even for RENAME_EXCHANGE to optimize this case. Reported-by: Vincent Batts <vbatts@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Jordi Pujol Palomer <jordipujolp@gmail.com> Fixes: 263b4a0fee43 ("ovl: dont replace opaque dir") Cc: <stable@vger.kernel.org> # v4.0+