diff --git a/README.md b/README.md index b82172863698..0b1ea4594302 100644 --- a/README.md +++ b/README.md @@ -83,7 +83,7 @@ If you wanna use JuiceFS in Hadoop, check [Hadoop Java SDK](https://juicefs.com/ - [Redis Best Practices](https://juicefs.com/docs/community/redis_best_practices) - [How to Setup Object Storage](https://juicefs.com/docs/community/how_to_setup_object_storage) -- [Cache Management](https://juicefs.com/docs/community/cache_management) +- [Cache](https://juicefs.com/docs/community/cache) - [Fault Diagnosis and Analysis](https://juicefs.com/docs/community/fault_diagnosis_and_analysis) - [FUSE Mount Options](https://juicefs.com/docs/community/fuse_mount_options) - [Using JuiceFS on Windows](https://juicefs.com/docs/community/installation#windows) diff --git a/README_CN.md b/README_CN.md index 22123efafb2c..89d13e20c29c 100644 --- a/README_CN.md +++ b/README_CN.md @@ -84,7 +84,7 @@ JuiceFS 使用 [Hadoop Java SDK](https://juicefs.com/docs/zh/community/hadoop_ja - [Redis 最佳实践](https://juicefs.com/docs/zh/community/redis_best_practices) - [如何设置对象存储](https://juicefs.com/docs/zh/community/how_to_setup_object_storage) -- [缓存管理](https://juicefs.com/docs/zh/community/cache_management) +- [缓存](https://juicefs.com/docs/zh/community/cache) - [故障诊断和分析](https://juicefs.com/docs/zh/community/fault_diagnosis_and_analysis) - [FUSE 挂载选项](https://juicefs.com/docs/zh/community/fuse_mount_options) - [在 Windows 中使用 JuiceFS](https://juicefs.com/docs/zh/community/installation#windows-系统) diff --git a/docs/en/administration/fault_diagnosis_and_analysis.md b/docs/en/administration/fault_diagnosis_and_analysis.md index bee4cd900aa2..f4fadd3d7e82 100644 --- a/docs/en/administration/fault_diagnosis_and_analysis.md +++ b/docs/en/administration/fault_diagnosis_and_analysis.md @@ -211,7 +211,7 @@ Metrics description: - `cpu`: CPU usage of the process. - `mem`: Physical memory used by the process. -- `buf`: Current [buffer size](../guide/cache_management.md#buffer-size), if this value is constantly close to (or even exceeds) the configured [`--buffer-size`](../reference/command_reference.md#mount), you should increase buffer size or decrease application workload. +- `buf`: Current [buffer size](../guide/cache.md#buffer-size); if this value is constantly close to (or even exceeds) the configured [`--buffer-size`](../reference/command_reference.md#mount), you should increase the buffer size or decrease the application workload. - `cache`: Internal metric, ignore this. #### `fuse` diff --git a/docs/en/administration/metadata_dump_load.md b/docs/en/administration/metadata_dump_load.md index 07667121aa52..7cf67e9ece92 100644 --- a/docs/en/administration/metadata_dump_load.md +++ b/docs/en/administration/metadata_dump_load.md @@ -115,7 +115,7 @@ juicefs config --secret-key xxxxx mysql://user:password@(192.168.1.6:3306)/juice ### Encrypted file system {#encrypted-file-system} -For [encrypted file system](../security/encrypt.md), all data is encrypted before uploading to the object storage, including automatic metadata backups. This is different from the `dump` command, which only output metadata in plain text. +For an [encrypted file system](../security/encryption.md), all data is encrypted before uploading to the object storage, including automatic metadata backups. This is different from the `dump` command, which only outputs metadata in plain text.
For an encrypted file system, it is necessary to additionally set the `JFS_RSA_PASSPHRASE` environment variable and specify the RSA private key and encryption algorithm when restoring the automatically backed-up metadata: diff --git a/docs/en/administration/troubleshooting.md b/docs/en/administration/troubleshooting.md index 38eed88e0d97..a422743574e3 100644 --- a/docs/en/administration/troubleshooting.md +++ b/docs/en/administration/troubleshooting.md @@ -100,7 +100,7 @@ The first issue with slow connection is upload / download timeouts (demonstrated * Reduce buffer size, e.g. [`--buffer-size=64`](../reference/command_reference.md#mount) or even lower. In a large bandwidth condition, increasing buffer size improves parallel performance. But in a low speed environment, this only makes `flush` operations slow and prone to timeouts. * Default timeout for GET / PUT requests are 60 seconds, increasing `--get-timeout` and `--put-timeout` may help with read / write timeouts. -In addition, the ["Client Write Cache"](../guide/cache_management.md#writeback) feature needs to be used with caution in low bandwidth environment. Let's briefly go over the JuiceFS Client background job design: every JuiceFS Client runs background jobs by default, one of which is data compaction, and if the client has poor internet speed, it'll drag down performance for the whole system. A worse case is when client write cache is also enabled, compaction results are uploaded too slowly, forcing other clients into a read hang when accessing the affected files: +In addition, the ["Client Write Cache"](../guide/cache.md#writeback) feature needs to be used with caution in low-bandwidth environments. Let's briefly go over the JuiceFS Client background job design: every JuiceFS Client runs background jobs by default, one of which is data compaction; if the client has poor internet speed, it'll drag down performance for the whole system. Worse still, when client write cache is also enabled, compaction results are uploaded too slowly, forcing other clients into a read hang when accessing the affected files: ```text # While compaction results are slowly being uploaded in low speed clients, read from other clients will hang and eventually fail @@ -115,7 +115,7 @@ To avoid this type of issue, we recommend disabling background jobs on low-bandw In JuiceFS, a typical read amplification manifests as object storage traffic being much larger than JuiceFS Client read speed. For example, JuiceFS Client is reading at 200MiB/s, while S3 traffic grows up to 2GiB/s. -JuiceFS is equipped with the [prefetch mechanism](../guide/cache_management.md#client-read-cache): when reading a block at arbitrary position, the whole block is asynchronously scheduled for download. This is a read optimization enabled by default, but in some cases, this brings read amplification. Once we know this, we can start the diagnose. +JuiceFS is equipped with the [prefetch mechanism](../guide/cache.md#client-read-cache): when reading a block at an arbitrary position, the whole block is asynchronously scheduled for download. This is a read optimization enabled by default, but in some cases it brings read amplification. Once we know this, we can start the diagnosis. We'll collect JuiceFS access log (see [Access log](./fault_diagnosis_and_analysis.md#access-log)) to determine the file system access patterns of our application, and adjust JuiceFS configuration accordingly.
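Putting the low-bandwidth advice above together, a mount command for such a node might look like the following sketch (the metadata URL and mount point are placeholders, the values are illustrative, and `--no-bgjob` is assumed here to be the flag that disables the background jobs discussed above):

```shell
# A sketch, not a drop-in command. A smaller buffer keeps flush fast on a slow
# link, longer object timeouts tolerate the latency, and --no-bgjob keeps this
# node from running compaction work that it would upload too slowly.
juicefs mount redis://localhost /mnt/jfs \
    --buffer-size=64 \
    --get-timeout=120 \
    --put-timeout=120 \
    --no-bgjob
```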
Below is a diagnose process in an actual production environment: @@ -157,7 +157,7 @@ Studying the access log, it's easy to conclude that our application performs fre If JuiceFS Client takes up too much memory, you may choose to optimize memory usage using below methods, but note that memory optimization is not free, and each setting adjustment will bring corresponding overhead, please do sufficient testing and verification before adjustment. -* Read/Write buffer size (`--buffer-size`) directly correlate to JuiceFS Client memory usage, using a lower `--buffer-size` will effectively decrease memory usage, but please note that the reduction may also affect the read and write performance. Read more at [Read/Write Buffer](../guide/cache_management.md#buffer-size). +* Read/Write buffer size (`--buffer-size`) directly correlates to JuiceFS Client memory usage; using a lower `--buffer-size` will effectively decrease memory usage, but please note that the reduction may also affect read and write performance. Read more at [Read/Write Buffer](../guide/cache.md#buffer-size). * JuiceFS mount client is an Go program, which means you can decrease `GOGC` (default to 100, in percentage) to adopt a more active garbage collection. This inevitably increase CPU usage and may even directly hinder performance. Read more at [Go Runtime](https://pkg.go.dev/runtime#hdr-Environment_Variables). * If you use self-hosted Ceph RADOS as the data storage of JuiceFS, consider replacing glibc with [TCMalloc](https://google.github.io/tcmalloc), the latter comes with more efficient memory management and may decrease off-heap memory footprint in this scenario. diff --git a/docs/en/development/internals.md b/docs/en/development/internals.md index 5140eab1a7b0..7f2d1871bf14 100644 --- a/docs/en/development/internals.md +++ b/docs/en/development/internals.md @@ -907,7 +907,7 @@ You can configure the compression algorithm (supporting `lz4` and `zstd`) with t #### Data encryption -The RSA private key can be configured to enable [static data encryption](../security/encrypt.md) when formatting a file system with the `--encrypt-rsa-key ` parameter, which allows all data blocks of this file system to be encrypted before uploading to the object storage. The object name is still the same as default, while its content becomes a header plus the result of the data encryption algorithm. The header contains a random seed and the symmetric key used for decryption, and the symmetric key itself is encrypted with the RSA private key. Therefore, it is not allowed to modify the RSA private key in the [file system formatting Information](#setting), otherwise reading existing data will fail. +The RSA private key can be configured to enable [static data encryption](../security/encryption.md) when formatting a file system with the `--encrypt-rsa-key ` parameter, which allows all data blocks of this file system to be encrypted before uploading to the object storage. The object name is still the same as the default, while its content becomes a header plus the result of the data encryption algorithm. The header contains a random seed and the symmetric key used for decryption, and the symmetric key itself is encrypted with the RSA private key. Therefore, it is not allowed to modify the RSA private key in the [file system formatting information](#setting), otherwise reading existing data will fail. :::note If both compression and encryption are enabled, the original data will be compressed and then encrypted before uploading to the object storage.
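To make the formatting step above concrete, a minimal sketch of enabling static data encryption might look as follows (the key path, passphrase, metadata URL, and volume name are placeholders):

```shell
# Generate a passphrase-protected RSA private key, then format an encrypted volume.
openssl genrsa -out my-priv-key.pem -aes256 2048
export JFS_RSA_PASSPHRASE=xxxxx   # the passphrase chosen for the key above
juicefs format --encrypt-rsa-key my-priv-key.pem redis://localhost/1 myjfs
```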
diff --git a/docs/en/faq.md b/docs/en/faq.md index 04aa1d6d26e1..4a167a0083f9 100644 --- a/docs/en/faq.md +++ b/docs/en/faq.md @@ -92,7 +92,7 @@ Read [JuiceFS Internals](development/internals.md) and [Data Processing Flow](in You could mount JuiceFS with [`--writeback` option](reference/command_reference.md#mount), which will write the small files into local disks first, then upload them to object storage in background, this could speedup coping many small files into JuiceFS. -See ["Write Cache in Client"](guide/cache_management.md#writeback) for more information. +See ["Write Cache in Client"](guide/cache.md#writeback) for more information. ### Does JuiceFS support distributed cache? @@ -104,7 +104,7 @@ See ["Write Cache in Client"](guide/cache_management.md#writeback) for more info Yes, JuiceFS could be mounted using `juicefs` without root. The default directory for caching is `$HOME/.juicefs/cache` (macOS) or `/var/jfsCache` (Linux), you should change that to a directory which you have write permission. -See ["Read Cache in Client"](guide/cache_management.md#client-read-cache) for more information. +See ["Read Cache in Client"](guide/cache.md#client-read-cache) for more information. ## Access Related Questions diff --git a/docs/en/getting-started/for_distributed.md b/docs/en/getting-started/for_distributed.md index 8e584b0bd2ca..1e26a17b7cb1 100644 --- a/docs/en/getting-started/for_distributed.md +++ b/docs/en/getting-started/for_distributed.md @@ -113,7 +113,7 @@ JuiceFS guarantees a "close-to-open" consistency, which means that when two or m #### Increase cache size to improve performance -Since object storage is a network-based storage service, it will inevitably encounter access latency. To solve this problem, JuiceFS provides and enables caching mechanism by default, i.e. allocating a part of local storage as a buffer layer between data and object storage, and caching data asynchronously to local storage when reading files. Please refer to ["Cache"](../guide/cache_management.md) for more details. +Since object storage is a network-based storage service, it will inevitably encounter access latency. To solve this problem, JuiceFS provides and enables a caching mechanism by default, i.e. allocating a part of local storage as a buffer layer between data and object storage, and caching data asynchronously to local storage when reading files. Please refer to ["Cache"](../guide/cache.md) for more details. JuiceFS will set 100GiB cache in `$HOME/.juicefs/cache` or `/var/jfsCache` directory by default. Setting a larger cache space on a faster SSD can effectively improve read and write performance of JuiceFS even more . diff --git a/docs/en/guide/cache_management.md b/docs/en/guide/cache.md similarity index 99% rename from docs/en/guide/cache_management.md rename to docs/en/guide/cache.md index c0b5dfa77ea6..d3cd55b62910 100644 --- a/docs/en/guide/cache_management.md +++ b/docs/en/guide/cache.md @@ -1,7 +1,6 @@ --- title: Cache sidebar_position: 3 -slug: /cache_management --- For a file system driven by a combination of object storage and database, cache is an important medium for interacting efficiently between the local client and the remote service. Read and write data can be loaded into the cache in advance or asynchronously, and then the client uploads to or prefetches from the remote service in the background. The use of caching technology can significantly reduce the latency of storage operations and increase data throughput compared to interacting with remote services directly.
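As a minimal sketch of the `--writeback` workflow from the FAQ above (the metadata URL, mount point, and source directory are placeholders):

```shell
# Small files land in the local write cache first and are uploaded in the background.
juicefs mount redis://localhost /mnt/jfs --writeback
cp -r ./many-small-files /mnt/jfs/
```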
diff --git a/docs/en/introduction/README.md b/docs/en/introduction/README.md index 2d6aeab59e52..30508ab94c71 100644 --- a/docs/en/introduction/README.md +++ b/docs/en/introduction/README.md @@ -20,7 +20,7 @@ JuiceFS provides rich APIs for various forms of data management, analysis, archi - **Distributed**: Each file system can be mounted on thousands of servers at the same time with high-performance concurrent reads and writes and shared data. - **Strong Consistency**: Any changes committed to files are immediately visible on all servers. - **Outstanding Performance**: JuiceFS achieves millisecond-level latency and nearly unlimited throughput depending on the object storage scale (see [performance test results](../benchmark/benchmark.md)). -- **Data Security**: JuiceFS supports encryption in transit and encryption at rest (view [Details](../security/encrypt.md)). +- **Data Security**: JuiceFS supports encryption in transit and encryption at rest (view [Details](../security/encryption.md)). - **File Lock**: JuiceFS supports BSD lock (flock) and POSIX lock (fcntl). - **Data Compression**: JuiceFS supports the [LZ4](https://lz4.github.io/lz4) and [Zstandard](https://facebook.github.io/zstd) compression algorithms to save storage space. diff --git a/docs/en/introduction/architecture.md b/docs/en/introduction/architecture.md index 2121e65d5549..c18e8f922c22 100644 --- a/docs/en/introduction/architecture.md +++ b/docs/en/introduction/architecture.md @@ -38,7 +38,7 @@ Chunks exist to optimize lookup and positioning, while the actual file writing i For example, if a file is generated through a continuous sequential write, each chunk contains only one slice. The figure above illustrates this scenario: a 160 MB file is sequentially written, resulting in three chunks, each containing only one slice. -File writing generates slices, and invoking `flush` persists these slices. `flush` can be explicitly called by the user, and even if not invoked, the JuiceFS client automatically performs `flush` at the appropriate time to prevent buffer overflow (refer to [buffer-size](../guide/cache_management.md#buffer-size)). When persisting to the object storage, slices are further split into individual *blocks* (default maximum size of 4 MB) to enable multi-threaded concurrent writes, thereby enhancing write performance. The previously mentioned chunks and slices are logical data structures, while blocks represent the final physical storage form and serve as the smallest storage unit for the object storage and disk cache. +File writing generates slices, and invoking `flush` persists these slices. `flush` can be explicitly called by the user, and even if not invoked, the JuiceFS client automatically performs `flush` at the appropriate time to prevent buffer overflow (refer to [buffer-size](../guide/cache.md#buffer-size)). When persisting to the object storage, slices are further split into individual *blocks* (default maximum size of 4 MB) to enable multi-threaded concurrent writes, thereby enhancing write performance. The previously mentioned chunks and slices are logical data structures, while blocks represent the final physical storage form and serve as the smallest storage unit for the object storage and disk cache. 
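The chunk/slice/block layout described above can be inspected on a live mount; a sketch, assuming a file at a placeholder path (output fields vary by client version):

```shell
# Print which chunks, slices, and objects back this file.
juicefs info /mnt/jfs/160MB.bin
```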
![Split slices to blocks](../images/slice-to-block.svg) @@ -68,5 +68,5 @@ However, it is not difficult to imagine that looking up the "most recently writt Additional technical aspects of JuiceFS storage design: * Irrespective of the file size, JuiceFS avoids storage merging to prevent read amplification and ensure optimal performance. -* JuiceFS provides strong consistency guarantees while allowing tuning options with caching mechanisms tailored to specific use cases. For example, by configuring more aggressive metadata caching, a certain level of consistency can be traded for enhanced performance. For more details, see [Metadata cache](../guide/cache_management.md#metadata-cache). +* JuiceFS provides strong consistency guarantees while allowing tuning options with caching mechanisms tailored to specific use cases. For example, by configuring more aggressive metadata caching, a certain level of consistency can be traded for enhanced performance. For more details, see [Metadata cache](../guide/cache.md#metadata-cache). * JuiceFS supports the ["Trash"](../security/trash.md) functionality and enables it by default. After a file is deleted, it is retained for a certain period before being permanently cleared. This helps you avoid data loss caused by accidental deletion. diff --git a/docs/en/introduction/comparison/juicefs_vs_s3fs.md b/docs/en/introduction/comparison/juicefs_vs_s3fs.md index cceb44023bb7..db233315e755 100644 --- a/docs/en/introduction/comparison/juicefs_vs_s3fs.md +++ b/docs/en/introduction/comparison/juicefs_vs_s3fs.md @@ -34,7 +34,7 @@ S3FS does not limit the cache capacity by default, which may cause the cache to JuiceFS uses a completely different caching approach than S3FS. First, JuiceFS guarantees data consistency. Secondly, JuiceFS defines a default disk cache usage limit of 100GiB, which can be freely adjusted by users as needed, and by default ensures that no more space is used when disk free space falls below 10%. When the cache usage limit reaches the upper limit, JuiceFS will automatically do cleanup using an LRU-like algorithm to ensure that cache is always available for subsequent read and write operations. -For more on JuiceFS caching, see [documentation](../../guide/cache_management.md). +For more on JuiceFS caching, see [documentation](../../guide/cache.md). ## Features diff --git a/docs/en/introduction/comparison/juicefs_vs_seaweedfs.md b/docs/en/introduction/comparison/juicefs_vs_seaweedfs.md index 78d24fe2fa54..be90f5163593 100644 --- a/docs/en/introduction/comparison/juicefs_vs_seaweedfs.md +++ b/docs/en/introduction/comparison/juicefs_vs_seaweedfs.md @@ -137,7 +137,7 @@ SeaweedFS determines whether to compress data based on factors such as the file Both support encryption, including encryption during transmission and at rest: - SeaweedFS supports encryption both in transit and at rest. When data encryption is enabled, all data written to the volume server is encrypted using random keys. The corresponding key information is managed by the filer that maintains the file metadata. For details, see the [Wiki](https://github.com/seaweedfs/seaweedfs/wiki/Filer-Data-Encryption). -- For details about JuiceFS' encryption feature, see [Data Encryption](../../security/encrypt.md). +- For details about JuiceFS' encryption feature, see [Data Encryption](../../security/encryption.md). ## Client protocol comparison @@ -177,7 +177,7 @@ Both support a CSI Driver. 
For details, see: SeaweedFS client is equipped with [basic cache capabilities](https://github.com/seaweedfs/seaweedfs/wiki/FUSE-Mount), but its documentation weren't located at the time of writing, you can search for `cache` in the [source code](https://github.com/seaweedfs/seaweedfs/blob/master/weed/command/mount.go). -JuiceFS' client supports [metadata and data caching](../../guide/cache_management.md), allowing users to optimize based on their application's needs. +JuiceFS' client supports [metadata and data caching](../../guide/cache.md), allowing users to optimize based on their application's needs. ### Object storage gateway diff --git a/docs/en/introduction/io_processing.md b/docs/en/introduction/io_processing.md index 09f90ccb648a..708d2bda22ae 100644 --- a/docs/en/introduction/io_processing.md +++ b/docs/en/introduction/io_processing.md @@ -28,7 +28,7 @@ Generally, when JuiceFS writes a small file, the file is uploaded to the object - The size of data written to the object storage during PUT operations is 128 KiB, calculated by `object.put / object.put_c`. - The number of metadata transactions is approximately twice the number of PUT operations, since each file requires one create and one write. -When JuiceFS uploads objects smaller than the block size, it simultaneously writes them into the [local cache](../guide/cache_management.md) to improve future performance. As shown in the third stage of the figure above, the write bandwidth of the `blockcache` is the same as that of the object storage. Since small files are cached, reading these files is extremely fast, as demonstrated in the fourth stage. +When JuiceFS uploads objects smaller than the block size, it simultaneously writes them into the [local cache](../guide/cache.md) to improve future performance. As shown in the third stage of the figure above, the write bandwidth of the `blockcache` is the same as that of the object storage. Since small files are cached, reading these files is extremely fast, as demonstrated in the fourth stage. Write operations are immediately committed to the client buffer, resulting in very low write latency (typically just a few microseconds). The actual upload to the object storage is automatically triggered internally when certain conditions are met, such as when the size or number of slices exceeds their limit, or data stays in the buffer for too long. Explicit calls, such as closing a file or invoking `fsync`, can also trigger uploading. @@ -48,7 +48,7 @@ Client write cache is also referred to as "Writeback mode" throughout the docs. For scenarios that does not deem consistency and data security as top priorities, enabling client write cache is also an option to further improve performance. When client write cache is enabled, flush operations return immediately after writing data to the local cache directory. Then, local data is uploaded asynchronously to the object storage. In other words, the local cache directory is a cache layer for the object storage. -Learn more in [Client Write Cache](../guide/cache_management.md#writeback). +Learn more in [Client Write Cache](../guide/cache.md#writeback). 
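A sketch of the client write cache mode described above, assuming short-lived temporary files (the metadata URL and mount point are placeholders; the one-minute delay is illustrative):

```shell
# flush returns once data is in the local cache dir; uploads run asynchronously,
# and files deleted within the delay are never uploaded at all.
juicefs mount redis://localhost /mnt/jfs --writeback --upload-delay=1m
```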
## Data reading process {#workflow-of-read} diff --git a/docs/en/reference/command_reference.md b/docs/en/reference/command_reference.md index 811f61de157d..23a49bdd3271 100644 --- a/docs/en/reference/command_reference.md +++ b/docs/en/reference/command_reference.md @@ -635,13 +635,13 @@ juicefs mount redis://localhost /mnt/jfs --backup-meta 0 #### Metadata cache related options {#mount-metadata-cache-options} -For metadata cache description and usage, refer to [Kernel metadata cache](../guide/cache_management.md#kernel-metadata-cache) and [Client memory metadata cache](../guide/cache_management.md#client-memory-metadata-cache). +For metadata cache description and usage, refer to [Kernel metadata cache](../guide/cache.md#kernel-metadata-cache) and [Client memory metadata cache](../guide/cache.md#client-memory-metadata-cache). |Items|Description| |-|-| -|`--attr-cache=1`|attributes cache timeout in seconds (default: 1), read [Kernel metadata cache](../guide/cache_management.md#kernel-metadata-cache)| -|`--entry-cache=1`|file entry cache timeout in seconds (default: 1), read [Kernel metadata cache](../guide/cache_management.md#kernel-metadata-cache)| -|`--dir-entry-cache=1`|dir entry cache timeout in seconds (default: 1), read [Kernel metadata cache](../guide/cache_management.md#kernel-metadata-cache)| +|`--attr-cache=1`|attributes cache timeout in seconds (default: 1), read [Kernel metadata cache](../guide/cache.md#kernel-metadata-cache)| +|`--entry-cache=1`|file entry cache timeout in seconds (default: 1), read [Kernel metadata cache](../guide/cache.md#kernel-metadata-cache)| +|`--dir-entry-cache=1`|dir entry cache timeout in seconds (default: 1), read [Kernel metadata cache](../guide/cache.md#kernel-metadata-cache)| |`--open-cache=0`|open file cache timeout in seconds (0 means disable this feature) (default: 0)| |`--open-cache-limit value` 1.1 |max number of open files to cache (soft limit, 0 means unlimited) (default: 10000)| @@ -655,7 +655,7 @@ For metadata cache description and usage, refer to [Kernel metadata cache](../gu |`--get-timeout=60`|the max number of seconds to download an object (default: 60)| |`--put-timeout=60`|the max number of seconds to upload an object (default: 60)| |`--io-retries=10`|number of retries after network failure (default: 10)| -|`--max-uploads=20`|Upload concurrency, defaults to 20. This is already a reasonably high value for 4M writes, with such write pattern, increasing upload concurrency usually demands higher `--buffer-size`, learn more at [Read/Write Buffer](../guide/cache_management.md#buffer-size). But for random writes around 100K, 20 might not be enough and can cause congestion at high load, consider using a larger upload concurrency, or try to consolidate small writes in the application end. | +|`--max-uploads=20`|Upload concurrency, defaults to 20. This is already a reasonably high value for 4M writes; with such a write pattern, increasing upload concurrency usually demands a higher `--buffer-size`, learn more at [Read/Write Buffer](../guide/cache.md#buffer-size). But for random writes around 100K, 20 might not be enough and can cause congestion at high load; consider using a larger upload concurrency, or try to consolidate small writes on the application end.
| |`--max-deletes=10`|number of threads to delete objects (default: 10)| |`--upload-limit=0`|bandwidth limit for upload in Mbps (default: 0)| |`--download-limit=0`|bandwidth limit for download in Mbps (default: 0)| @@ -664,15 +664,15 @@ For metadata cache description and usage, refer to [Kernel metadata cache](../gu |Items|Description| |-|-| -|`--buffer-size=300`|total read/write buffering in MiB (default: 300), see [Read/Write buffer](../guide/cache_management.md#buffer-size)| -|`--prefetch=1`|prefetch N blocks in parallel (default: 1), see [Client read data cache](../guide/cache_management.md#client-read-cache)| -|`--writeback`|upload objects in background (default: false), see [Client write data cache](../guide/cache_management.md#writeback)| -|`--upload-delay=0`|When `--writeback` is enabled, you can use this option to add a delay to object storage upload, default to 0, meaning that upload will begin immediately after write. Different units are supported, including `s` (second), `m` (minute), `h` (hour). If files are deleted during this delay, upload will be skipped entirely, when using JuiceFS for temporary storage, use this option to reduce resource usage. Refer to [Client write data cache](../guide/cache_management.md#writeback).| -|`--cache-dir=value`|directory paths of local cache, use `:` (Linux, macOS) or `;` (Windows) to separate multiple paths (default: `$HOME/.juicefs/cache` or `/var/jfsCache`), see [Client read data cache](../guide/cache_management.md#client-read-cache)| +|`--buffer-size=300`|total read/write buffering in MiB (default: 300), see [Read/Write buffer](../guide/cache.md#buffer-size)| +|`--prefetch=1`|prefetch N blocks in parallel (default: 1), see [Client read data cache](../guide/cache.md#client-read-cache)| +|`--writeback`|upload objects in background (default: false), see [Client write data cache](../guide/cache.md#writeback)| +|`--upload-delay=0`|When `--writeback` is enabled, you can use this option to add a delay to object storage upload; defaults to 0, meaning that upload will begin immediately after write. Different units are supported, including `s` (second), `m` (minute), `h` (hour). If files are deleted during this delay, upload will be skipped entirely; when using JuiceFS for temporary storage, use this option to reduce resource usage.
Refer to [Client write data cache](../guide/cache.md#writeback).| +|`--cache-dir=value`|directory paths of local cache, use `:` (Linux, macOS) or `;` (Windows) to separate multiple paths (default: `$HOME/.juicefs/cache` or `/var/jfsCache`), see [Client read data cache](../guide/cache.md#client-read-cache)| |`--cache-mode value` 1.1 |file permissions for cached blocks (default: "0600")| -|`--cache-size=102400`|size of cached object for read in MiB (default: 102400), see [Client read data cache](../guide/cache_management.md#client-read-cache)| -|`--free-space-ratio=0.1`|min free space ratio (default: 0.1), if [Client write data cache](../guide/cache_management.md#writeback) is enabled, this option also controls write cache size, see [Client read data cache](../guide/cache_management.md#client-read-cache)| -|`--cache-partial-only`|cache random/small read only (default: false), see [Client read data cache](../guide/cache_management.md#client-read-cache)| +|`--cache-size=102400`|size of cached object for read in MiB (default: 102400), see [Client read data cache](../guide/cache.md#client-read-cache)| +|`--free-space-ratio=0.1`|min free space ratio (default: 0.1), if [Client write data cache](../guide/cache.md#writeback) is enabled, this option also controls write cache size, see [Client read data cache](../guide/cache.md#client-read-cache)| +|`--cache-partial-only`|cache random/small read only (default: false), see [Client read data cache](../guide/cache.md#client-read-cache)| |`--verify-cache-checksum value` 1.1 |Checksum level for cache data. After enabled, checksum will be calculated on divided parts of the cache blocks and stored on disks, which are used for verification during reads. The following strategies are supported:
| |`--cache-eviction value` 1.1 |cache eviction policy (none or 2-random) (default: "2-random")| |`--cache-scan-interval value` 1.1 |interval (in seconds) to scan cache-dir to rebuild in-memory index (default: "3600")| diff --git a/docs/en/security/encrypt.md b/docs/en/security/encryption.md similarity index 100% rename from docs/en/security/encrypt.md rename to docs/en/security/encryption.md diff --git a/docs/zh_cn/administration/fault_diagnosis_and_analysis.md b/docs/zh_cn/administration/fault_diagnosis_and_analysis.md index fe1966eed684..27f39fda5e46 100644 --- a/docs/zh_cn/administration/fault_diagnosis_and_analysis.md +++ b/docs/zh_cn/administration/fault_diagnosis_and_analysis.md @@ -209,7 +209,7 @@ juicefs profile /tmp/juicefs.accesslog --uid 12345 - `cpu`:进程的 CPU 使用率。 - `mem`:进程的物理内存使用量。 -- `buf`:进程已使用的[读写缓冲区](../guide/cache_management.md#buffer-size)大小,如果该数值逼近甚至超过客户端所设置的 [`--buffer-size`](../reference/command_reference.md#mount),说明读写缓冲区空间不足,需要视情况扩大,或者降低应用读写负载。 +- `buf`:进程已使用的[读写缓冲区](../guide/cache.md#buffer-size)大小,如果该数值逼近甚至超过客户端所设置的 [`--buffer-size`](../reference/command_reference.md#mount),说明读写缓冲区空间不足,需要视情况扩大,或者降低应用读写负载。 - `cache`:内部指标,无需关注。 #### `fuse` diff --git a/docs/zh_cn/administration/metadata_dump_load.md b/docs/zh_cn/administration/metadata_dump_load.md index c59657dceb00..4bf53755e615 100644 --- a/docs/zh_cn/administration/metadata_dump_load.md +++ b/docs/zh_cn/administration/metadata_dump_load.md @@ -115,7 +115,7 @@ juicefs config --secret-key xxxxx mysql://user:password@(192.168.1.6:3306)/juice ### 加密文件系统 {#encrypted-file-system} -对于[加密的文件系统](../security/encrypt.md),所有文件都会在本地加密后才上传到后端对象存储,包括元数据自动备份文件,也会加密后才上传至对象存储。这与 `dump` 命令不同,`dump` 导出的元数据永远是明文的。 +对于[加密的文件系统](../security/encryption.md),所有文件都会在本地加密后才上传到后端对象存储,包括元数据自动备份文件,也会加密后才上传至对象存储。这与 `dump` 命令不同,`dump` 导出的元数据永远是明文的。 对于加密文件系统,在恢复自动备份的元数据时需要额外设置 `JFS_RSA_PASSPHRASE` 环境变量,以及指定 RSA 私钥和加密算法: diff --git a/docs/zh_cn/administration/troubleshooting.md b/docs/zh_cn/administration/troubleshooting.md index 75ad571cdb92..79379cc7fcfc 100644 --- a/docs/zh_cn/administration/troubleshooting.md +++ b/docs/zh_cn/administration/troubleshooting.md @@ -100,7 +100,7 @@ $ ls -l /usr/bin/fusermount * 降低读写缓冲区大小,比如 [`--buffer-size=64`](../reference/command_reference.md#mount) 或者更小。当带宽充裕时,增大读写缓冲区能提升并发性能。但在低带宽场景下使用过大的读写缓冲区,`flush` 的上传时间会很长,因此容易超时。 * 默认 GET/PUT 请求超时时间为 60 秒,因此增大 `--get-timeout` 以及 `--put-timeout`,可以改善读写超时的情况。 -此外,低带宽环境下需要慎用[「客户端写缓存」](../guide/cache_management.md#writeback)特性。先简单介绍一下 JuiceFS 的后台任务设计:每个 JuiceFS 客户端默认都启用后台任务,后台任务中会执行碎片合并(compaction)、异步删除等工作,而如果节点网络状况太差,则会降低系统整体性能。更糟的是如果该节点还启用了客户端写缓存,则容易出现碎片合并后上传缓慢,导致其他节点无法读取该文件的危险情况: +此外,低带宽环境下需要慎用[「客户端写缓存」](../guide/cache.md#writeback)特性。先简单介绍一下 JuiceFS 的后台任务设计:每个 JuiceFS 客户端默认都启用后台任务,后台任务中会执行碎片合并(compaction)、异步删除等工作,而如果节点网络状况太差,则会降低系统整体性能。更糟的是如果该节点还启用了客户端写缓存,则容易出现碎片合并后上传缓慢,导致其他节点无法读取该文件的危险情况: ```text # 由于 writeback,碎片合并后的结果迟迟上传不成功,导致其他节点读取文件报错 @@ -115,7 +115,7 @@ $ ls -l /usr/bin/fusermount 在 JuiceFS 中,一个典型的读放大现象是:对象存储的下行流量,远大于实际读文件的速度。比方说 JuiceFS 客户端的读吞吐为 200MiB/s,但是在 S3 观察到了 2GiB/s 的下行流量。 -JuiceFS 中内置了[预读](../guide/cache_management.md#client-read-cache)(prefetch)机制:随机读 block 的某一段,会触发整个 block 下载,这个默认开启的读优化策略,在某些场景下会带来读放大。了解这个设计以后,我们就可以开始排查了。 +JuiceFS 中内置了[预读](../guide/cache.md#client-read-cache)(prefetch)机制:随机读 block 的某一段,会触发整个 block 下载,这个默认开启的读优化策略,在某些场景下会带来读放大。了解这个设计以后,我们就可以开始排查了。 结合先前问题排查方法一章中介绍的[访问日志](./fault_diagnosis_and_analysis.md#access-log)知识,我们可以采集一些访问日志来分析程序的读模式,然后针对性地调整配置。下面是一个实际生产环境案例的排查过程: @@ -157,7 +157,7 @@ grep "read (148153116," access.log 如果 JuiceFS 
客户端内存占用过高,考虑按照以下方向进行排查调优,但也请注意,内存优化势必不是免费的,每一项设置调整都将带来相应的开销,请在调整前做好充分的测试与验证。 -* 读写缓冲区(也就是 `--buffer-size`)的大小,直接与 JuiceFS 客户端内存占用相关,因此可以通过降低读写缓冲区大小来减少内存占用,但请注意降低以后可能同时也会对读写性能造成影响。更多详见[「读写缓冲区」](../guide/cache_management.md#buffer-size)。 +* 读写缓冲区(也就是 `--buffer-size`)的大小,直接与 JuiceFS 客户端内存占用相关,因此可以通过降低读写缓冲区大小来减少内存占用,但请注意降低以后可能同时也会对读写性能造成影响。更多详见[「读写缓冲区」](../guide/cache.md#buffer-size)。 * JuiceFS 挂载客户端是一个 Go 程序,因此也可以通过降低 `GOGC`(默认 100)来令 Go 在运行时执行更为激进的垃圾回收(将带来更多 CPU 消耗,甚至直接影响性能)。详见[「Go Runtime」](https://pkg.go.dev/runtime#hdr-Environment_Variables)。 * 如果你使用自建的 Ceph RADOS 作为 JuiceFS 的数据存储,可以考虑将 glibc 替换为 [TCMalloc](https://google.github.io/tcmalloc),后者有着更高效的内存管理实现,能在该场景下有效降低堆外内存占用。 diff --git a/docs/zh_cn/development/internals.md b/docs/zh_cn/development/internals.md index f67bfb046eed..42c1e448f566 100644 --- a/docs/zh_cn/development/internals.md +++ b/docs/zh_cn/development/internals.md @@ -910,7 +910,7 @@ objects: #### 数据加密 -在文件系统格式化时可以通过 `--encrypt-rsa-key ` 参数配置 RSA 私钥以开启[静态数据加密](../security/encrypt.md)功能,使得此文件系统的所有数据 Block 会经过加密后再上传到对象存储。此时对象名称仍与默认配置相同,内容为一段 header 加上数据经加密算法后的结果。这段 header 里记录了用来解密的对称密钥以及随机种子,而对称密钥本身又经过 RSA 私钥加密。因此,文件[文统格式化信息](#setting)中的 RSA 私钥目前不允许修改,否则会导致读取已有数据失败。 +在文件系统格式化时可以通过 `--encrypt-rsa-key ` 参数配置 RSA 私钥以开启[静态数据加密](../security/encryption.md)功能,使得此文件系统的所有数据 Block 会经过加密后再上传到对象存储。此时对象名称仍与默认配置相同,内容为一段 header 加上数据经加密算法后的结果。这段 header 里记录了用来解密的对称密钥以及随机种子,而对称密钥本身又经过 RSA 私钥加密。因此,[文件系统格式化信息](#setting)中的 RSA 私钥目前不允许修改,否则会导致读取已有数据失败。 :::note 备注 若同时开启压缩和加密,原始数据会先压缩再加密后上传到对象存储。 diff --git a/docs/zh_cn/faq.md b/docs/zh_cn/faq.md index bdf0d6a72de3..54e518d3ea6c 100644 --- a/docs/zh_cn/faq.md +++ b/docs/zh_cn/faq.md @@ -92,7 +92,7 @@ JuiceFS 不将原始文件存入对象存储,而是将其按照某个大小( 请在挂载时加上 [`--writeback` 选项](reference/command_reference.md#mount),它会先把数据写入本机的缓存,然后再异步上传到对象存储,会比直接上传到对象存储快很多倍。 -请查看[「客户端写缓存」](guide/cache_management.md#writeback)了解更多信息。 +请查看[「客户端写缓存」](guide/cache.md#writeback)了解更多信息。 ### JuiceFS 支持分布式缓存吗?
diff --git a/docs/zh_cn/getting-started/for_distributed.md b/docs/zh_cn/getting-started/for_distributed.md index ca7135949dd5..c34dbf8bf1d7 100644 --- a/docs/zh_cn/getting-started/for_distributed.md +++ b/docs/zh_cn/getting-started/for_distributed.md @@ -113,7 +113,7 @@ juicefs mount redis://tom:mypassword@myjfs-sh-abc.redis.rds.aliyuncs.com:6379/1 #### 调大缓存提升性能 -由于「对象存储」是基于网络的存储服务,不可避免会产生访问延时。为了解决这个问题,JuiceFS 提供并默认启用了缓存机制,即划拨一部分本地存储作为数据与对象存储之间的一个缓冲层,读取文件时会异步地将数据缓存到本地存储,详情请查阅[「缓存」](../guide/cache_management.md)。 +由于「对象存储」是基于网络的存储服务,不可避免会产生访问延时。为了解决这个问题,JuiceFS 提供并默认启用了缓存机制,即划拨一部分本地存储作为数据与对象存储之间的一个缓冲层,读取文件时会异步地将数据缓存到本地存储,详情请查阅[「缓存」](../guide/cache.md)。 缓存机制让 JuiceFS 可以高效处理海量数据的读写任务,默认情况下,JuiceFS 会在 `$HOME/.juicefs/cache` 或 `/var/jfsCache` 目录设置 100GiB 的缓存。在速度更快的 SSD 上设置更大的缓存空间可以有效提升 JuiceFS 的读写性能。 diff --git a/docs/zh_cn/guide/cache_management.md b/docs/zh_cn/guide/cache.md similarity index 98% rename from docs/zh_cn/guide/cache_management.md rename to docs/zh_cn/guide/cache.md index 6527555487b0..2969f398c0f5 100644 --- a/docs/zh_cn/guide/cache_management.md +++ b/docs/zh_cn/guide/cache.md @@ -1,7 +1,6 @@ --- title: 缓存 sidebar_position: 3 -slug: /cache_management --- 对于一个由对象存储和数据库组合驱动的文件系统,缓存是本地客户端与远端服务之间高效交互的重要纽带。读写的数据可以提前或者异步载入缓存,再由客户端在后台与远端服务交互执行异步上传或预取数据。相比直接与远端服务交互,采用缓存技术可以大大降低存储操作的延时并提高数据吞吐量。 @@ -67,7 +66,7 @@ JuiceFS 客户端在 `open` 操作即打开一个文件时,其文件属性会 * 发起修改的挂载点,自身的内核元数据缓存能够主动失效。但对于多个挂载点访问、修改同一文件的情况,只有发起修改的客户端能享受到内核元数据缓存主动失效,其他客户端就只能等待缓存自然过期。 * 调用 `write` 成功后,挂载点自身立刻就能看到文件长度的变化(比如用 `ls -al` 查看文件大小,可能会注意到文件不断变大)——但这并不意味着修改已经成功提交,在 `flush` 成功前,是不会将这些改动同步到对象存储的,其他挂载点也看不到文件的变动。调用 `fsync, fdatasync, close` 都能触发 `flush`,让修改得以持久化、对其他客户端可见。 -* 作为上一点的极端情况,如果调用 `write` 写入,并在当前挂载点观察到文件长度不断增长,但最后的 `flush` 因为某种原因失败了,比方说到达了文件系统配额上限,文件会长度会立刻发生回退,比如从 10M 变为 0。这是一个容易引人误会的情况——并不是 JuiceFS 清空了你的数据,而是写入自始至终就没有成功,只是由于发起修改的挂载点能够提前预览文件长度的变化,让人误以为写入已经成功提交。 +* 作为上一点的极端情况,如果调用 `write` 写入,并在当前挂载点观察到文件长度不断增长,但最后的 `flush` 因为某种原因失败了,比方说到达了文件系统配额上限,文件长度会立刻发生回退,比如从 10M 变为 0。这是一个容易引人误会的情况——并不是 JuiceFS 清空了你的数据,而是写入自始至终就没有成功,只是由于发起修改的挂载点能够提前预览文件长度的变化,让人误以为写入已经成功提交。 * 发起修改的挂载点,能够监听对应的文件变动(比如使用 [`fswatch`](https://emcrisostomo.github.io/fswatch/) 或者 [`Watchdog`](https://python-watchdog.readthedocs.io/en/stable))。但范畴也仅限于该挂载点发起修改的文件,也就是说 A 修改的文件,无法在 B 挂载点进行监听。 * 目前而言,由于 FUSE 尚不支持 inotify API,所以如果你希望监听 JuiceFS 特定目录下的文件变化,请使用轮询的方式(比如 [`PollingObserver`](https://python-watchdog.readthedocs.io/en/stable/_modules/watchdog/observers/polling.html#PollingObserver))。 diff --git a/docs/zh_cn/introduction/README.md b/docs/zh_cn/introduction/README.md index 34a3110f478a..84474d7decee 100644 --- a/docs/zh_cn/introduction/README.md +++ b/docs/zh_cn/introduction/README.md @@ -24,7 +24,7 @@ JuiceFS 提供了丰富的 API,适用于各种形式数据的管理、分析 5. **分布式设计**:同一文件系统可在上千台服务器同时挂载,高性能并发读写,共享数据; 6. **强一致性**:确认的文件修改会在所有服务器上立即可见,保证强一致性; 7. **强悍性能**:毫秒级延迟,近乎无限的吞吐量(取决于对象存储规模),查看[性能测试结果](../benchmark/benchmark.md); -8. **数据安全**:支持传输中加密(encryption in transit)和静态加密(encryption at rest),[查看详情](../security/encrypt.md); +8. **数据安全**:支持传输中加密(encryption in transit)和静态加密(encryption at rest),[查看详情](../security/encryption.md); 9. **文件锁**:支持 BSD 锁(flock)和 POSIX 锁(fcntl); 10. 
**数据压缩**:支持 [LZ4](https://lz4.github.io/lz4) 和 [Zstandard](https://facebook.github.io/zstd) 压缩算法,节省存储空间。 diff --git a/docs/zh_cn/introduction/architecture.md b/docs/zh_cn/introduction/architecture.md index ff82f9ad5e5a..c608252c174c 100644 --- a/docs/zh_cn/introduction/architecture.md +++ b/docs/zh_cn/introduction/architecture.md @@ -38,7 +38,7 @@ Chunk 的存在是为了优化查找定位,实际的文件写入则在「Slice 举例说明,如果一个文件是由一次连贯的顺序写生成,那么每个 Chunk 中只将会仅包含一个 Slice。上方的示意图就属于这种情况:顺序写入一个 160M 文件,最终会产生 3 个 Chunk,而每个 Chunk 仅包含一个 Slice。 -文件写入会产生 Slice,而调用 `flush` 则会将这些 Slice 持久化。`flush` 可以被用户显式调用,就算不调用,JuiceFS 客户端也会自动在恰当的时机进行 `flush`,防止[缓冲区](../guide/cache_management.md#buffer-size)被写满。持久化到对象存储时,为了能够尽快写入,会对 Slice 进行进一步拆分成一个个「Block」(默认最大 4M),多线程并发写入以提升写性能。上边介绍的 Chunk、Slice,其实都是逻辑数据结构,Block 则是最终的物理存储形式,是对象存储和磁盘缓存的最小存储单元。 +文件写入会产生 Slice,而调用 `flush` 则会将这些 Slice 持久化。`flush` 可以被用户显式调用,就算不调用,JuiceFS 客户端也会自动在恰当的时机进行 `flush`,防止[缓冲区](../guide/cache.md#buffer-size)被写满。持久化到对象存储时,为了能够尽快写入,会对 Slice 进行进一步拆分成一个个「Block」(默认最大 4M),多线程并发写入以提升写性能。上边介绍的 Chunk、Slice,其实都是逻辑数据结构,Block 则是最终的物理存储形式,是对象存储和磁盘缓存的最小存储单元。 ![slice-to-block](../images/slice-to-block.svg) @@ -63,5 +63,5 @@ Chunk 的存在是为了优化查找定位,实际的文件写入则在「Slice 最后,JuiceFS 的存储设计,还有着以下值得一提的技术特点: * 对于任意大小的文件,JuiceFS 都不进行合并存储,这也是为了性能考虑,避免读放大。 -* 提供强一致性保证,但也可以根据场景需要与缓存功能一起调优,比如通过设置出更激进的元数据缓存,牺牲一部分一致性,换取更好的性能。详见[「元数据缓存」](../guide/cache_management.md#metadata-cache)。 +* 提供强一致性保证,但也可以根据场景需要与缓存功能一起调优,比如通过设置出更激进的元数据缓存,牺牲一部分一致性,换取更好的性能。详见[「元数据缓存」](../guide/cache.md#metadata-cache)。 * 支持并默认开启[「回收站」](../security/trash.md)功能,删除文件后保留一段时间才彻底清理,最大程度避免误删文件导致事故。 diff --git a/docs/zh_cn/introduction/comparison/juicefs_vs_alluxio.md b/docs/zh_cn/introduction/comparison/juicefs_vs_alluxio.md index ec05c1e9cec1..c35bf17514e0 100644 --- a/docs/zh_cn/introduction/comparison/juicefs_vs_alluxio.md +++ b/docs/zh_cn/introduction/comparison/juicefs_vs_alluxio.md @@ -50,7 +50,7 @@ JuiceFS 是一个分布式文件系统,实现了自己的存储格式,文件 Alluxio 和 JuiceFS 都支持多级缓存,设计上各有特色,但都能够支持用硬盘、SSD、内存来灵活配置大容量或者高性能缓存,详见: * [Alluxio 缓存](https://docs.alluxio.io/os/user/stable/cn/core-services/Caching.html) -* [JuiceFS 缓存](../../guide/cache_management.md) +* [JuiceFS 缓存](../../guide/cache.md) * JuiceFS 企业版在社区版的基础上,支持更为强大的[分布式缓存](/docs/zh/cloud/guide/distributed-cache) ### 一致性 {#consistency} @@ -82,7 +82,7 @@ Alluxio 本质上并不是一个存储系统,虽然你也可以通过 Alluxio Alluxio 仅在[企业版](https://docs.alluxio.io/ee/user/stable/en/security/Security.html#encryption)支持数据加密。 -JuiceFS 支持[传输中加密以及静态加密](../../security/encrypt.md)。 +JuiceFS 支持[传输中加密以及静态加密](../../security/encryption.md)。 ## 客户端协议对比 {#client-protocol-comparison} diff --git a/docs/zh_cn/introduction/comparison/juicefs_vs_s3fs.md b/docs/zh_cn/introduction/comparison/juicefs_vs_s3fs.md index 96806db291d8..3647fa21e002 100644 --- a/docs/zh_cn/introduction/comparison/juicefs_vs_s3fs.md +++ b/docs/zh_cn/introduction/comparison/juicefs_vs_s3fs.md @@ -34,7 +34,7 @@ S3FS 默认不限制缓存空间上限,对于较大的 Buket 可能导致缓 在缓存方面,JuiceFS 与 S3FS 完全不同,首先,JuiceFS 是保证数据一致性的。其次,JuiceFS 默认定义了 100GiB 的磁盘缓存使用上限,用户可以根据需要自由调整该值,而且默认会确保磁盘剩余空间低于 10% 时不再使用更多空间。当缓存用量达到上限,JuiceFS 会采用类似 LRU 的算法自动进行清理,确保后续的读写操作始终有缓存可用。 -有关 JuiceFS 缓存的更多内容请参考[文档](../../guide/cache_management.md)。 +有关 JuiceFS 缓存的更多内容请参考[文档](../../guide/cache.md)。 ## 功能特性 diff --git a/docs/zh_cn/introduction/comparison/juicefs_vs_seaweedfs.md b/docs/zh_cn/introduction/comparison/juicefs_vs_seaweedfs.md index 024de78cc5b2..59e1df60b8d7 100644 --- a/docs/zh_cn/introduction/comparison/juicefs_vs_seaweedfs.md +++ b/docs/zh_cn/introduction/comparison/juicefs_vs_seaweedfs.md @@ -131,7 +131,7 @@ JuiceFS 支持使用 LZ4 或者 Zstandard 
来为所有写入的数据进行压 二者均支持加密,包括传输中加密及静态加密: * SeaweedFS 支持传输中加密与静态加密。在开启了数据加密后,所有写入 Volume Server 的数据都会使用随机的密钥进行加密,而这些对应的随机密钥信息则由维护文件元数据的 Filer 进行管理,详见 [Wiki](https://github.com/seaweedfs/seaweedfs/wiki/Filer-Data-Encryption)。 -* JuiceFS 的加密功能详见[文档](../../security/encrypt.md)。 +* JuiceFS 的加密功能详见[文档](../../security/encryption.md)。 ## 客户端协议对比 @@ -169,7 +169,7 @@ JuiceFS [完整兼容 HDFS API](../../deployment/hadoop_java_sdk.md)。包括 Ha SeaweedFS 客户端[具备简单客户端缓存能力](https://github.com/seaweedfs/seaweedfs/wiki/FUSE-Mount),由于在写作期间未能找到具体文档,可以直接在其[源码](https://github.com/seaweedfs/seaweedfs/blob/master/weed/command/mount.go)中搜索 `cache` 相关字样。 -JuiceFS 客户端支持[元数据以及数据缓存](../../guide/cache_management.md),提供更丰富的定制空间,允许用户根据自己的应用场景进行调优。 +JuiceFS 客户端支持[元数据以及数据缓存](../../guide/cache.md),提供更丰富的定制空间,允许用户根据自己的应用场景进行调优。 ### 对象存储网关 diff --git a/docs/zh_cn/introduction/io_processing.md b/docs/zh_cn/introduction/io_processing.md index f28ed233db79..199db16c1f17 100644 --- a/docs/zh_cn/introduction/io_processing.md +++ b/docs/zh_cn/introduction/io_processing.md @@ -28,7 +28,7 @@ JuiceFS 对大文件会做多级拆分([JuiceFS 如何存储文件](../introdu - 对象存储 PUT 的大小就是 128 KiB - 元数据事务数大致是 PUT 计数的两倍,对应每个文件的一次 Create 和一次 Write -对于这种不足一个 Block Size 的对象,JuiceFS 在上传的同时还会尝试写入到本地[缓存](../guide/cache_management.md),来提升后续可能的读请求速度。因此从图中第 3 阶段也可以看到,创建小文件时,本地缓存(blockcache)与对象存储有着同等的写入带宽,而在读取时(第 4 阶段)大部分均在缓存命中,这使得小文件的读取速度看起来特别快。 +对于这种不足一个 Block Size 的对象,JuiceFS 在上传的同时还会尝试写入到本地[缓存](../guide/cache.md),来提升后续可能的读请求速度。因此从图中第 3 阶段也可以看到,创建小文件时,本地缓存(blockcache)与对象存储有着同等的写入带宽,而在读取时(第 4 阶段)大部分均在缓存命中,这使得小文件的读取速度看起来特别快。 由于写请求写入客户端内存缓冲区即可返回,因此通常来说 JuiceFS 的 Write 时延非常低(几十微秒级别),真正上传到对象存储的动作由内部自动触发,比如单个 Slice 过大,Slice 数量过多,或者仅仅是在缓冲区停留时间过长等,或应用主动触发,比如关闭文件、调用 `fsync` 等。 @@ -48,7 +48,7 @@ JuiceFS 支持随机写,包括通过 mmap 等进行的随机写。 如果对数据一致性和可靠性没有极致要求,可以在挂载时添加 `--writeback` 以进一步提写性能。客户端缓存开启后,Slice flush 仅需写到本地缓存目录即可返回,数据由后台线程异步上传到对象存储。换个角度理解,此时本地目录就是对象存储的缓存层。 -更详细的介绍请见[「客户端写缓存」](../guide/cache_management.md#writeback)。 +更详细的介绍请见[「客户端写缓存」](../guide/cache.md#writeback)。 ## 读取流程 {#workflow-of-read} diff --git a/docs/zh_cn/reference/command_reference.md b/docs/zh_cn/reference/command_reference.md index d3782a3fb1ad..888bd85c68f9 100644 --- a/docs/zh_cn/reference/command_reference.md +++ b/docs/zh_cn/reference/command_reference.md @@ -168,7 +168,7 @@ juicefs format sqlite3://myjfs.db myjfs --trash-days=0 |-|-| |`--block-size=4096`|块大小,单位为 KiB,默认 4096。4M 是一个较好的默认值,不少对象存储(比如 S3)都将 4M 设为内部的块大小,因此将 JuiceFS block size 设为相同大小,往往也能获得更好的性能。| |`--compress=none`|压缩算法,支持 `lz4`、`zstd`、`none`(默认),启用压缩将不可避免地对性能产生一定影响。| -|`--encrypt-rsa-key=value`|RSA 私钥的路径,查看[数据加密](../security/encrypt.md)以了解更多。| +|`--encrypt-rsa-key=value`|RSA 私钥的路径,查看[数据加密](../security/encryption.md)以了解更多。| |`--encrypt-algo=aes256gcm-rsa`|加密算法 (aes256gcm-rsa, chacha20-rsa) (默认:"aes256gcm-rsa")| |`--hash-prefix`|给每个对象添加 hash 前缀,默认为 false。| |`--shards=0`|如果对象存储服务在桶级别设置了限速(或者你使用自建的对象存储服务,单个桶的性能有限),可以将数据块根据名字哈希分散存入 N 个桶中。该值默认为 0,也就是所有数据存入单个桶。当 N 大于 0 时,`bucket` 需要包含 `%d` 占位符,例如 `--bucket=juicefs-%d`。`--shards` 设置无法动态修改,需要提前规划好用量。| @@ -635,7 +635,7 @@ juicefs mount redis://localhost /mnt/jfs --backup-meta 0 #### 元数据缓存参数 {#mount-metadata-cache-options} -元数据缓存的介绍和使用,详见[「内核元数据缓存」](../guide/cache_management.md#kernel-metadata-cache)及[「客户端内存元数据缓存」](../guide/cache_management.md#client-memory-metadata-cache)。 +元数据缓存的介绍和使用,详见[「内核元数据缓存」](../guide/cache.md#kernel-metadata-cache)及[「客户端内存元数据缓存」](../guide/cache.md#client-memory-metadata-cache)。 |项 | 说明| |-|-| @@ -655,7 +655,7 @@ juicefs mount redis://localhost /mnt/jfs --backup-meta 0 |`--get-timeout=60`|下载一个对象的超时时间;单位为秒 
(默认:60)| |`--put-timeout=60`|上传一个对象的超时时间;单位为秒 (默认:60)| |`--io-retries=10`|网络异常时的重试次数 (默认:10)| -|`--max-uploads=20`|上传并发度,默认为 20。对于粒度为 4M 的写入模式,20 并发已经是很高的默认值,在这样的写入模式下,提高写并发往往需要伴随增大 `--buffer-size`, 详见「[读写缓冲区](../guide/cache_management.md#buffer-size)」。但面对百 K 级别的小随机写,并发量大的时候很容易产生阻塞等待,造成写入速度恶化。如果无法改善应用写模式,对其进行合并,那么需要考虑采用更高的写并发,避免排队等待。| +|`--max-uploads=20`|上传并发度,默认为 20。对于粒度为 4M 的写入模式,20 并发已经是很高的默认值,在这样的写入模式下,提高写并发往往需要伴随增大 `--buffer-size`, 详见「[读写缓冲区](../guide/cache.md#buffer-size)」。但面对百 K 级别的小随机写,并发量大的时候很容易产生阻塞等待,造成写入速度恶化。如果无法改善应用写模式,对其进行合并,那么需要考虑采用更高的写并发,避免排队等待。| |`--max-deletes=10`|删除对象的连接数 (默认:10)| |`--upload-limit=0`|上传带宽限制,单位为 Mbps (默认:0)| |`--download-limit=0`|下载带宽限制,单位为 Mbps (默认:0)| @@ -664,15 +664,15 @@ juicefs mount redis://localhost /mnt/jfs --backup-meta 0 |项 | 说明| |-|-| -|`--buffer-size=300`|读写缓冲区的总大小;单位为 MiB (默认:300)。阅读[「读写缓冲区」](../guide/cache_management.md#buffer-size)了解更多。| -|`--prefetch=1`|并发预读 N 个块 (默认:1)。阅读[「客户端读缓存」](../guide/cache_management.md#client-read-cache)了解更多。| -|`--writeback`|后台异步上传对象,默认为 false。阅读[「客户端写缓存」](../guide/cache_management.md#writeback)了解更多。| -|`--upload-delay=0`|启用 `--writeback` 后,可以使用该选项控制数据延迟上传到对象存储,默认为 0 秒,相当于写入后立刻上传。该选项也支持 `s`(秒)、`m`(分)、`h`(时)这些单位。如果在等待的时间内数据被应用删除,则无需再上传到对象存储。如果数据只是临时落盘,可以考虑用该选项节约资源。阅读[「客户端写缓存」](../guide/cache_management.md#writeback)了解更多。| -|`--cache-dir=value`|本地缓存目录路径;使用 `:`(Linux、macOS)或 `;`(Windows)隔离多个路径 (默认:`$HOME/.juicefs/cache` 或 `/var/jfsCache`)。阅读[「客户端读缓存」](../guide/cache_management.md#client-read-cache)了解更多。| +|`--buffer-size=300`|读写缓冲区的总大小;单位为 MiB (默认:300)。阅读[「读写缓冲区」](../guide/cache.md#buffer-size)了解更多。| +|`--prefetch=1`|并发预读 N 个块 (默认:1)。阅读[「客户端读缓存」](../guide/cache.md#client-read-cache)了解更多。| +|`--writeback`|后台异步上传对象,默认为 false。阅读[「客户端写缓存」](../guide/cache.md#writeback)了解更多。| +|`--upload-delay=0`|启用 `--writeback` 后,可以使用该选项控制数据延迟上传到对象存储,默认为 0 秒,相当于写入后立刻上传。该选项也支持 `s`(秒)、`m`(分)、`h`(时)这些单位。如果在等待的时间内数据被应用删除,则无需再上传到对象存储。如果数据只是临时落盘,可以考虑用该选项节约资源。阅读[「客户端写缓存」](../guide/cache.md#writeback)了解更多。| +|`--cache-dir=value`|本地缓存目录路径;使用 `:`(Linux、macOS)或 `;`(Windows)隔离多个路径 (默认:`$HOME/.juicefs/cache` 或 `/var/jfsCache`)。阅读[「客户端读缓存」](../guide/cache.md#client-read-cache)了解更多。| |`--cache-mode value` 1.1|缓存块的文件权限 (默认:"0600")| -|`--cache-size=102400`|缓存对象的总大小;单位为 MiB (默认:102400)。阅读[「客户端读缓存」](../guide/cache_management.md#client-read-cache)了解更多。| -|`--free-space-ratio=0.1`|最小剩余空间比例,默认为 0.1。如果启用了[「客户端写缓存」](../guide/cache_management.md#writeback),则该参数还控制着写缓存占用空间。阅读[「客户端读缓存」](../guide/cache_management.md#client-read-cache)了解更多。| -|`--cache-partial-only`|仅缓存随机小块读,默认为 false。阅读[「客户端读缓存」](../guide/cache_management.md#client-read-cache)了解更多。| +|`--cache-size=102400`|缓存对象的总大小;单位为 MiB (默认:102400)。阅读[「客户端读缓存」](../guide/cache.md#client-read-cache)了解更多。| +|`--free-space-ratio=0.1`|最小剩余空间比例,默认为 0.1。如果启用了[「客户端写缓存」](../guide/cache.md#writeback),则该参数还控制着写缓存占用空间。阅读[「客户端读缓存」](../guide/cache.md#client-read-cache)了解更多。| +|`--cache-partial-only`|仅缓存随机小块读,默认为 false。阅读[「客户端读缓存」](../guide/cache.md#client-read-cache)了解更多。| |`--verify-cache-checksum=full` 1.1|缓存数据一致性检查级别,启用 Checksum 校验后,生成缓存文件时会对数据切分做 Checksum 并记录于文件末尾,供读缓存时进行校验。支持以下级别:
  • `none`:禁用一致性检查,如果本地数据被篡改,将会读到错误数据;
  • `full`(默认):读完整数据块时才校验,适合顺序读场景;
  • `shrink`:对读范围内的切片数据进行校验,校验范围不包含读边界所在的切片(可以理解为开区间),适合随机读场景;
  • `extend`:对读范围内的切片数据进行校验,校验范围同时包含读边界所在的切片(可以理解为闭区间),因此将带来一定程度的读放大,适合对正确性有极致要求的随机读场景。
| |`--cache-eviction value` 1.1|缓存逐出策略 (none 或 2-random) (默认值:"2-random")| |`--cache-scan-interval value` 1.1|扫描缓存目录重建内存索引的间隔 (以秒为单位) (默认:"3600")| diff --git a/docs/zh_cn/security/encrypt.md b/docs/zh_cn/security/encryption.md similarity index 100% rename from docs/zh_cn/security/encrypt.md rename to docs/zh_cn/security/encryption.md
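Drawing the cache flags from the tables above together, a read-heavy mount backed by a dedicated SSD might look like this sketch (paths and sizes are placeholders, chosen only for illustration):

```shell
# Larger read cache on a fast disk, while keeping 20% of the disk free.
juicefs mount redis://localhost /mnt/jfs \
    --cache-dir=/ssd/jfsCache \
    --cache-size=204800 \
    --free-space-ratio=0.2
```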