author Linus Torvalds <torvalds@linux-foundation.org> 2022-05-26 12:32:41 -0700
committer Linus Torvalds <torvalds@linux-foundation.org> 2022-05-26 12:32:41 -0700
commit 98931dd95fd489fcbfa97da563505a6f071d7c77
tree 44683fc4a92efa614acdca2742a7ff19d26da1e3
parent df202b452fe6c6d6f1351bad485e2367ef1e644e
parent f403f22f8ccb12860b2b62fec3173c6ccd45938b
Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
 "Almost all of MM here. A few things are still getting finished off,
  reviewed, etc.

   - Yang Shi has improved the behaviour of khugepaged collapsing of
     readonly file-backed transparent hugepages.

   - Johannes Weiner has arranged for zswap memory use to be tracked and
     managed on a per-cgroup basis.

   - Muchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
     runtime enablement of the recent huge page vmemmap optimization
     feature.

   - Baolin Wang contributes a series to fix some issues around hugetlb
     pagetable invalidation.

   - Zhenwei Pi has fixed some interactions between hwpoisoned pages and
     virtualization.

   - Tong Tiangen has enabled the use of the presently x86-only
     page_table_check debugging feature on arm64 and riscv.

   - David Vernet has done some fixup work on the memcg selftests.

   - Peter Xu has taught userfaultfd to handle write protection faults
     against shmem- and hugetlbfs-backed files.

   - More DAMON development from SeongJae Park - adding online tuning of
     the feature and support for monitoring of fixed virtual address
     ranges. Also easier discovery of which monitoring operations are
     available.

   - Nadav Amit has done some optimization of TLB flushing during
     mprotect().

   - Neil Brown continues to labor away at improving our swap-over-NFS
     support.

   - David Hildenbrand has some fixes to anon page COWing versus
     get_user_pages().

   - Peng Liu fixed some errors in the core hugetlb code.

   - Joao Martins has reduced the amount of memory consumed by
     device-dax's compound devmaps.

   - Some cleanups of the arch-specific pagemap code from Anshuman
     Khandual.

   - Muchun Song has found and fixed some errors in the TLB flushing of
     transparent hugepages.

   - Roman Gushchin has done more work on the memcg selftests.

  ... and, of course, many smaller fixes and cleanups. Notably, the
  customary million cleanup serieses from Miaohe Lin"

* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
  mm: kfence: use PAGE_ALIGNED helper
  selftests: vm: add the "settings" file with timeout variable
  selftests: vm: add "test_hmm.sh" to TEST_FILES
  selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
  selftests: vm: add migration to the .gitignore
  selftests/vm/pkeys: fix typo in comment
  ksm: fix typo in comment
  selftests: vm: add process_mrelease tests
  Revert "mm/vmscan: never demote for memcg reclaim"
  mm/kfence: print disabling or re-enabling message
  include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
  include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
  mm: fix a potential infinite loop in start_isolate_page_range()
  MAINTAINERS: add Muchun as co-maintainer for HugeTLB
  zram: fix Kconfig dependency warning
  mm/shmem: fix shmem folio swapoff hang
  cgroup: fix an error handling path in alloc_pagecache_max_30M()
  mm: damon: use HPAGE_PMD_SIZE
  tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
  nodemask.h: fix compilation error with GCC12
  ...
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- Documentation/filesystems/locking.rst 18
-rw-r--r-- Documentation/filesystems/proc.rst 154
-rw-r--r-- Documentation/filesystems/vfs.rst 17
3 files changed, 122 insertions, 67 deletions
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 515bc48ab58b..d1bf77ef3bc1 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -258,8 +258,9 @@ prototypes::
int (*launder_folio)(struct folio *);
bool (*is_partially_uptodate)(struct folio *, size_t from, size_t count);
int (*error_remove_page)(struct address_space *, struct page *);
- int (*swap_activate)(struct file *);
+ int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span);
int (*swap_deactivate)(struct file *);
+ int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
locking rules:
All except dirty_folio and free_folio may block
@@ -287,6 +288,7 @@ is_partially_uptodate: yes
error_remove_page: yes
swap_activate: no
swap_deactivate: no
+swap_rw: yes, unlocks
====================== ======================== ========= ===============
->write_begin(), ->write_end() and ->read_folio() may be called from
@@ -386,15 +388,19 @@ cleaned, or an error value if not. Note that in order to prevent the folio
getting mapped back in and redirtied, it needs to be kept locked
across the entire operation.
-->swap_activate will be called with a non-zero argument on
-files backing (non block device backed) swapfiles. A return value
-of zero indicates success, in which case this file can be used for
-backing swapspace. The swapspace operations will be proxied to the
-address space operations.
+->swap_activate() will be called to prepare the given file for swap. It
+should perform any validation and preparation necessary to ensure that
+writes can be performed with minimal memory allocation. It should call
+add_swap_extent(), or the helper iomap_swapfile_activate(), and return
+the number of extents added. If IO should be submitted through
+->swap_rw(), it should set SWP_FS_OPS, otherwise IO will be submitted
+directly to the block device ``sis->bdev``.
->swap_deactivate() will be called in the sys_swapoff()
path after ->swap_activate() returned success.
+->swap_rw will be called for swap IO if SWP_FS_OPS was set by ->swap_activate().
+
file_lock_operations
====================
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 6a0dd99786f9..1bc91fb8c321 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -942,56 +942,73 @@ can be substantial. In many cases there are other means to find out
additional memory using subsystem specific interfaces, for instance
/proc/net/sockstat for TCP memory allocations.
-The following is from a 16GB PIII, which has highmem enabled.
-You may not have all of these fields.
+Example output. You may not have all of these fields.
::
> cat /proc/meminfo
- MemTotal: 16344972 kB
- MemFree: 13634064 kB
- MemAvailable: 14836172 kB
- Buffers: 3656 kB
- Cached: 1195708 kB
- SwapCached: 0 kB
- Active: 891636 kB
- Inactive: 1077224 kB
- HighTotal: 15597528 kB
- HighFree: 13629632 kB
- LowTotal: 747444 kB
- LowFree: 4432 kB
- SwapTotal: 0 kB
- SwapFree: 0 kB
- Dirty: 968 kB
- Writeback: 0 kB
- AnonPages: 861800 kB
- Mapped: 280372 kB
- Shmem: 644 kB
- KReclaimable: 168048 kB
- Slab: 284364 kB
- SReclaimable: 159856 kB
- SUnreclaim: 124508 kB
- PageTables: 24448 kB
- NFS_Unstable: 0 kB
- Bounce: 0 kB
- WritebackTmp: 0 kB
- CommitLimit: 7669796 kB
- Committed_AS: 100056 kB
- VmallocTotal: 112216 kB
- VmallocUsed: 428 kB
- VmallocChunk: 111088 kB
- Percpu: 62080 kB
- HardwareCorrupted: 0 kB
- AnonHugePages: 49152 kB
- ShmemHugePages: 0 kB
- ShmemPmdMapped: 0 kB
+ MemTotal: 32858820 kB
+ MemFree: 21001236 kB
+ MemAvailable: 27214312 kB
+ Buffers: 581092 kB
+ Cached: 5587612 kB
+ SwapCached: 0 kB
+ Active: 3237152 kB
+ Inactive: 7586256 kB
+ Active(anon): 94064 kB
+ Inactive(anon): 4570616 kB
+ Active(file): 3143088 kB
+ Inactive(file): 3015640 kB
+ Unevictable: 0 kB
+ Mlocked: 0 kB
+ SwapTotal: 0 kB
+ SwapFree: 0 kB
+ Zswap: 1904 kB
+ Zswapped: 7792 kB
+ Dirty: 12 kB
+ Writeback: 0 kB
+ AnonPages: 4654780 kB
+ Mapped: 266244 kB
+ Shmem: 9976 kB
+ KReclaimable: 517708 kB
+ Slab: 660044 kB
+ SReclaimable: 517708 kB
+ SUnreclaim: 142336 kB
+ KernelStack: 11168 kB
+ PageTables: 20540 kB
+ NFS_Unstable: 0 kB
+ Bounce: 0 kB
+ WritebackTmp: 0 kB
+ CommitLimit: 16429408 kB
+ Committed_AS: 7715148 kB
+ VmallocTotal: 34359738367 kB
+ VmallocUsed: 40444 kB
+ VmallocChunk: 0 kB
+ Percpu: 29312 kB
+ HardwareCorrupted: 0 kB
+ AnonHugePages: 4149248 kB
+ ShmemHugePages: 0 kB
+ ShmemPmdMapped: 0 kB
+ FileHugePages: 0 kB
+ FilePmdMapped: 0 kB
+ CmaTotal: 0 kB
+ CmaFree: 0 kB
+ HugePages_Total: 0
+ HugePages_Free: 0
+ HugePages_Rsvd: 0
+ HugePages_Surp: 0
+ Hugepagesize: 2048 kB
+ Hugetlb: 0 kB
+ DirectMap4k: 401152 kB
+ DirectMap2M: 10008576 kB
+ DirectMap1G: 24117248 kB
MemTotal
Total usable RAM (i.e. physical RAM minus a few reserved
bits and the kernel binary code)
MemFree
- The sum of LowFree+HighFree
+ Total free RAM. On highmem systems, the sum of LowFree+HighFree
MemAvailable
An estimate of how much memory is available for starting new
applications, without swapping. Calculated from MemFree,
@@ -1005,8 +1022,9 @@ Buffers
Relatively temporary storage for raw disk blocks
shouldn't get tremendously large (20MB or so)
Cached
- in-memory cache for files read from the disk (the
- pagecache). Doesn't include SwapCached
+ In-memory cache for files read from the disk (the
+ pagecache) as well as tmpfs & shmem.
+ Doesn't include SwapCached.
SwapCached
Memory that once was swapped out, is swapped back in but
still also is in the swapfile (if memory is needed it
@@ -1018,6 +1036,11 @@ Active
Inactive
Memory which has been less recently used. It is more
eligible to be reclaimed for other purposes
+Unevictable
+ Memory allocated for userspace which cannot be reclaimed, such
+ as mlocked pages, ramfs backing pages, secret memfd pages etc.
+Mlocked
+ Memory locked with mlock().
HighTotal, HighFree
Highmem is all memory above ~860MB of physical memory.
Highmem areas are for use by userspace programs, or
@@ -1034,26 +1057,20 @@ SwapTotal
SwapFree
Memory which has been evicted from RAM, and is temporarily
on the disk
+Zswap
+ Memory consumed by the zswap backend (compressed size)
+Zswapped
+ Amount of anonymous memory stored in zswap (original size)
Dirty
Memory which is waiting to get written back to the disk
Writeback
Memory which is actively being written back to the disk
AnonPages
Non-file backed pages mapped into userspace page tables
-HardwareCorrupted
- The amount of RAM/memory in KB, the kernel identifies as
- corrupted.
-AnonHugePages
- Non-file backed huge pages mapped into userspace page tables
Mapped
files which have been mmaped, such as libraries
Shmem
Total memory used by shared memory (shmem) and tmpfs
-ShmemHugePages
- Memory used by shared memory (shmem) and tmpfs allocated
- with huge pages
-ShmemPmdMapped
- Shared memory mapped into userspace with huge pages
KReclaimable
Kernel allocations that the kernel will attempt to reclaim
under memory pressure. Includes SReclaimable (below), and other
@@ -1064,9 +1081,10 @@ SReclaimable
Part of Slab, that might be reclaimed, such as caches
SUnreclaim
Part of Slab, that cannot be reclaimed on memory pressure
+KernelStack
+ Memory consumed by the kernel stacks of all tasks
PageTables
- amount of memory dedicated to the lowest level of page
- tables.
+ Memory consumed by userspace page tables
NFS_Unstable
Always zero. Previous counted pages which had been written to
the server, but has not been committed to stable storage.
@@ -1098,7 +1116,7 @@ Committed_AS
has been allocated by processes, even if it has not been
"used" by them as of yet. A process which malloc()'s 1G
of memory, but only touches 300M of it will show up as
- using 1G. This 1G is memory which has been "committed" to
+ using 1G. This 1G is memory which has been "committed" to
by the VM and can be used at any time by the allocating
application. With strict overcommit enabled on the system
(mode 2 in 'vm.overcommit_memory'), allocations which would
@@ -1107,7 +1125,7 @@ Committed_AS
not fail due to lack of memory once that memory has been
successfully allocated.
VmallocTotal
- total size of vmalloc memory area
+ total size of vmalloc virtual address space
VmallocUsed
amount of vmalloc area which is used
VmallocChunk
@@ -1115,6 +1133,30 @@ VmallocChunk
Percpu
Memory allocated to the percpu allocator used to back percpu
allocations. This stat excludes the cost of metadata.
+HardwareCorrupted
+ The amount of RAM/memory in KB, the kernel identifies as
+ corrupted.
+AnonHugePages
+ Non-file backed huge pages mapped into userspace page tables
+ShmemHugePages
+ Memory used by shared memory (shmem) and tmpfs allocated
+ with huge pages
+ShmemPmdMapped
+ Shared memory mapped into userspace with huge pages
+FileHugePages
+ Memory used for filesystem data (page cache) allocated
+ with huge pages
+FilePmdMapped
+ Page cache mapped into userspace with huge pages
+CmaTotal
+ Memory reserved for the Contiguous Memory Allocator (CMA)
+CmaFree
+ Free remaining memory in the CMA reserves
+HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, Hugetlb
+ See Documentation/admin-guide/mm/hugetlbpage.rst.
+DirectMap4k, DirectMap2M, DirectMap1G
+ Breakdown of page table sizes used in the kernel's
+ identity mapping of RAM
vmallocinfo
~~~~~~~~~~~
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 12a011d2cbc6..08069ecd49a6 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -749,8 +749,9 @@ cache in your filesystem. The following members are defined:
size_t count);
void (*is_dirty_writeback)(struct folio *, bool *, bool *);
int (*error_remove_page) (struct mapping *mapping, struct page *page);
- int (*swap_activate)(struct file *);
+ int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span);
int (*swap_deactivate)(struct file *);
+ int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
};
``writepage``
@@ -948,15 +949,21 @@ cache in your filesystem. The following members are defined:
unless you have them locked or reference counts increased.
``swap_activate``
- Called when swapon is used on a file to allocate space if
- necessary and pin the block lookup information in memory. A
- return value of zero indicates success, in which case this file
- can be used to back swapspace.
+
+ Called to prepare the given file for swap. It should perform
+ any validation and preparation necessary to ensure that writes
+ can be performed with minimal memory allocation. It should call
+ add_swap_extent(), or the helper iomap_swapfile_activate(), and
+ return the number of extents added. If IO should be submitted
+ through ->swap_rw(), it should set SWP_FS_OPS, otherwise IO will
+ be submitted directly to the block device ``sis->bdev``.
``swap_deactivate``
Called during swapoff on files where swap_activate was
successful.
+``swap_rw``
+ Called to read or write swap pages when SWP_FS_OPS is set.
The File Object
===============
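
The proc.rst hunk above documents two new /proc/meminfo fields: Zswap (compressed size held by the zswap backend) and Zswapped (original size of the anonymous memory stored there). As a rough illustration of how a userspace tool might consume them, the sketch below looks up a field in meminfo-style text and derives a compression ratio. It is not part of this commit; the helper names ``meminfo_lookup`` and ``zswap_ratio`` are hypothetical, and the code assumes the ``Key: value kB`` line layout shown in the example output.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "key:" at the start of a line in
 * /proc/meminfo-style text and store its numeric value (in kB).
 * Returns 0 on success, -1 if the field is absent or malformed.
 */
static int meminfo_lookup(const char *text, const char *key,
			  unsigned long long *value)
{
	size_t keylen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Tolerate indentation, as in the documentation example. */
		while (*line == ' ' || *line == '\t')
			line++;

		/* Require the ':' so that "Zswap" does not match "Zswapped". */
		if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
			if (sscanf(line + keylen + 1, "%llu", value) == 1)
				return 0;
			return -1;
		}

		/* Advance to the character after the next newline, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Hypothetical helper: ratio of original to compressed size,
 * i.e. Zswapped / Zswap. Returns a negative value when either
 * field is missing or zswap currently holds nothing.
 */
static double zswap_ratio(const char *text)
{
	unsigned long long compressed, original;

	if (meminfo_lookup(text, "Zswap", &compressed) != 0 ||
	    meminfo_lookup(text, "Zswapped", &original) != 0 ||
	    compressed == 0)
		return -1.0;

	return (double)original / (double)compressed;
}
```

To use it, read /proc/meminfo into a NUL-terminated buffer (for example with fopen() and fread()) and pass the buffer to ``zswap_ratio()``. Because not every kernel exposes every field, callers should treat a negative return as "not available" rather than as an error.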