mmcrfs command
Creates a GPFS file system.
Synopsis
mmcrfs Device {"DiskDesc[;DiskDesc...]" | -F StanzaFile}
[-A {yes | no | automount}] [-B BlockSize] [-D {posix | nfs4}]
[-E {yes | no}] [-i InodeSize] [-j {cluster | scatter}]
[-k {posix | nfs4 | all}] [-K {no | whenpossible | always}]
[-L LogFileSize] [-m DefaultMetadataReplicas]
[-M MaxMetadataReplicas] [-n NumNodes] [-Q {yes | no}]
[-r DefaultDataReplicas] [-R MaxDataReplicas]
[-S {yes | no | relatime}] [-T Mountpoint] [-t DriveLetter]
[-v {yes | no}] [-z {yes | no}] [--filesetdf | --nofilesetdf]
[--inode-limit MaxNumInodes[:NumInodesToPreallocate]]
[--log-replicas LogReplicas] [--metadata-block-size MetadataBlockSize]
[--perfileset-quota | --noperfileset-quota]
[--mount-priority Priority] [--version VersionString]
[--write-cache-threshold HAWCThreshold]
Availability
Available on all IBM Spectrum Scale editions.
Description
Use the mmcrfs command to create a GPFS file system. The first two parameters must be Device and either the disk descriptor list or -F StanzaFile, and they must be in that order. The block size and replication factors chosen affect file system performance. A maximum of 256 file systems can be mounted in a GPFS cluster at one time, including remote file systems.
When deciding on the maximum number of files (number of inodes) in a file system, consider that for file systems that will be doing parallel file creates, if the total number of free inodes is not greater than 5% of the total number of inodes, there is the potential for slowdown in file system access. The total number of inodes can be increased using the mmchfs command.
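The 5% guideline above can be turned into a simple planning check. The following Python sketch is illustrative only; the function name and its arguments are assumptions, not part of GPFS, and the inode counts would have to come from your own monitoring.

```python
def inode_headroom_ok(total_inodes, used_inodes):
    """Return True when free inodes are strictly greater than 5% of the total.

    Hypothetical helper: it only encodes the guideline from the text and does
    not query a real file system.
    """
    if total_inodes <= 0 or not 0 <= used_inodes <= total_inodes:
        raise ValueError("expected total_inodes > 0 and 0 <= used_inodes <= total_inodes")
    free_inodes = total_inodes - used_inodes
    # Integer arithmetic avoids floating-point rounding at the 5% boundary.
    return free_inodes * 100 > 5 * total_inodes
```

If the check returns False for a file system that performs parallel file creates, raising the inode limit with the mmchfs command is the remedy the text describes.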
When deciding on a block size for a file system, consider these points:
- Supported block sizes are 64 KiB, 128 KiB, 256 KiB, 512 KiB, 1 MiB, 2 MiB, 4 MiB, 8 MiB, and 16 MiB.
- The GPFS block size determines:
- The minimum disk space allocation unit. The minimum amount of space that file data can occupy is a sub-block. A sub-block is 1/32 of the block size.
- The maximum size of a read or write request that GPFS sends to the underlying disk driver.
- From a performance perspective, it is recommended that you set the GPFS block size to match the application buffer size, the RAID stripe size, or a multiple of the RAID stripe size. If the GPFS block size does not match the RAID stripe size, performance may be severely degraded, especially for write operations. If IBM Spectrum Scale RAID is in use, the block size must equal the vdisk track size. For more information about IBM Spectrum Scale RAID, see IBM Spectrum Scale RAID: Administration.
- In file systems with a high degree of variance in the size of files within the file system, using a small block size would have a large impact on performance when accessing large files. In this kind of system it is suggested that you use a block size of 256 KB (8 KB sub-block). Even if only 1% of the files are large, the amount of space taken by the large files usually dominates the amount of space used on disk, and the waste in the sub-block used for small files is usually insignificant. For further performance information, see the GPFS white papers in the Techdocs Library.
- The effect of block size on file system performance largely depends on the application I/O pattern.
- A larger block size is often beneficial for large sequential read and write workloads.
- A smaller block size is likely to offer better performance for small file, small random read and write, and metadata-intensive workloads.
- The efficiency of many algorithms that rely on caching file data in a GPFS page pool depends more on the number of blocks cached than on the absolute amount of data. For a page pool of a given size, a larger file system block size means fewer blocks cached. Therefore, when you create file systems with a block size larger than the default of 256 KB, it is recommended that you increase the page pool size in proportion to the block size.
- The file system block size must not exceed the value of the GPFS maximum file system block size. The default maximum block size is 1 MiB. If a larger block size is desired, use the mmchconfig command to increase the maxblocksize configuration parameter before starting GPFS.
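The block size rules above (supported sizes, the 1/32 sub-block, and the maxblocksize ceiling) can be checked before running mmcrfs. This is a minimal Python sketch under stated assumptions: the helper names are hypothetical, and the K and M suffixes are taken to mean KiB and MiB.

```python
KIB = 1024
MIB = 1024 * KIB

# Supported block sizes as listed in the text.
SUPPORTED_BLOCK_SIZES = [64 * KIB, 128 * KIB, 256 * KIB, 512 * KIB,
                         1 * MIB, 2 * MIB, 4 * MIB, 8 * MIB, 16 * MIB]
SUBBLOCKS_PER_BLOCK = 32  # a sub-block is 1/32 of the block size


def parse_size(text):
    """Convert a value such as '512K' or '4M' to bytes (assumes binary units)."""
    text = text.strip().upper()
    multipliers = {"K": KIB, "M": MIB}
    if len(text) < 2 or text[-1] not in multipliers:
        raise ValueError("expected a number followed by K or M: %r" % text)
    return int(text[:-1]) * multipliers[text[-1]]


def subblock_size(block_size):
    """Return the minimum allocation unit for a given block size."""
    return block_size // SUBBLOCKS_PER_BLOCK


def check_block_size(block_size, maxblocksize=1 * MIB):
    """Raise ValueError if the block size is unsupported or exceeds maxblocksize."""
    if block_size not in SUPPORTED_BLOCK_SIZES:
        raise ValueError("unsupported block size: %d bytes" % block_size)
    if block_size > maxblocksize:
        raise ValueError("block size exceeds maxblocksize; raise it with mmchconfig first")
    return block_size
```

For example, `subblock_size(parse_size("256K"))` gives 8192 bytes, which matches the 8 KB sub-block quoted above for a 256 KB block size.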
Results
Upon successful completion of the mmcrfs command, these tasks are completed on all GPFS nodes:
- Mount point directory is created.
- File system is formatted.
Prior to GPFS 3.5, the disk information was specified in the form of disk descriptors defined as follows (with the second, third, and sixth fields reserved):
DiskName:::DiskUsage:FailureGroup::StoragePool:
For backward compatibility, the mmcrfs command will still accept the traditional disk descriptors, but their use is discouraged.
Parameters
- Device
- The device name of the file system to be created.
File system names need not be fully-qualified. fs0 is as acceptable as /dev/fs0. However, file system names must be unique within a GPFS cluster. Do not specify an existing entry in /dev.
This must be the first parameter.
- "DiskDesc[;DiskDesc...]"
- A descriptor for each disk to be included. Each descriptor is separated by a semicolon (;). The entire list must be enclosed in quotation marks (' or "). The use of disk descriptors is discouraged.
- -F StanzaFile
- Specifies a file containing the NSD stanzas and pool stanzas for the disks to be added to the file system.
NSD stanzas have this format:
%nsd:
  nsd=NsdName
  usage={dataOnly | metadataOnly | dataAndMetadata | descOnly}
  failureGroup=FailureGroup
  pool=StoragePool
  servers=ServerList
  device=DiskName
where:
- nsd=NsdName
- The name of an NSD previously created by the mmcrnsd command. For a list of available disks, issue the mmlsnsd -F command. This clause is mandatory for the mmcrfs command.
- usage={dataOnly | metadataOnly | dataAndMetadata | descOnly}
- Specifies the type of data to be stored on the disk:
- dataAndMetadata
- Indicates that the disk contains both data and metadata. This is the default for disks in the system pool.
- dataOnly
- Indicates that the disk contains data and does not contain metadata. This is the default for disks in storage pools other than the system pool.
- metadataOnly
- Indicates that the disk contains metadata and does not contain data.
- descOnly
- Indicates that the disk contains no data and no file metadata. Such a disk is used solely to keep a copy of the file system descriptor, and can be used as a third failure group in certain disaster-recovery configurations. For more information, see the IBM Spectrum Scale: Administration Guide and search for "Synchronous mirroring utilizing GPFS replication".
- failureGroup=FailureGroup
- Identifies the failure group to which the disk belongs. A failure group identifier can be a simple integer or a topology vector that consists of up to three comma-separated integers. The default is -1, which indicates that the disk has no point of failure in common with any other disk.
GPFS uses this information during data and metadata placement to ensure that no two replicas of the same block can become unavailable due to a single failure. All disks that are attached to the same NSD server or adapter must be placed in the same failure group.
If the file system is configured with data replication, all storage pools must have two failure groups to maintain proper protection of the data. Similarly, if metadata replication is in effect, the system storage pool must have two failure groups.
Disks that belong to storage pools in which write affinity is enabled can use topology vectors to identify failure domains in a shared-nothing cluster. Disks that belong to traditional storage pools must use simple integers to specify the failure group.
- pool=StoragePool
- Specifies the storage pool to which the disk is to be assigned. If this name is not provided, the default is system.
Only the system storage pool can contain metadataOnly, dataAndMetadata, or descOnly disks. Disks in other storage pools must be dataOnly.
- servers=ServerList
- A comma-separated list of NSD server nodes. This clause is ignored by the mmcrfs command.
- device=DiskName
- The block device name of the underlying disk device. This clause is ignored by the mmcrfs command.
Pool stanzas have this format:
%pool:
  pool=StoragePoolName
  blockSize=BlockSize
  usage={dataOnly | metadataOnly | dataAndMetadata}
  layoutMap={scatter | cluster}
  allowWriteAffinity={yes | no}
  writeAffinityDepth={0 | 1 | 2}
  blockGroupFactor=BlockGroupFactor
where:
- pool=StoragePoolName
- Is the name of a storage pool.
- blockSize=BlockSize
- Specifies the block size of the disks in the storage pool.
- usage={dataOnly | metadataOnly | dataAndMetadata}
- Specifies the type of data to be stored in the storage pool:
- dataAndMetadata
- Indicates that the disks in the storage pool contain both data and metadata. This is the default for disks in the system pool.
- dataOnly
- Indicates that the disks contain data and do not contain metadata. This is the default for disks in storage pools other than the system pool.
- metadataOnly
- Indicates that the disks contain metadata and do not contain data.
- layoutMap={scatter | cluster}
- Specifies the block allocation map type. When allocating blocks for a given file, GPFS first uses a round-robin algorithm to spread the data across all disks in the storage pool. After a disk is selected, the location of the data block on the disk is determined by the block allocation map type. If cluster is specified, GPFS attempts to allocate blocks in clusters. Blocks that belong to a particular file are kept adjacent to each other within each cluster. If scatter is specified, the location of the block is chosen randomly.
The cluster allocation method may provide better disk performance for some disk subsystems in relatively small installations. The benefits of clustered block allocation diminish when the number of nodes in the cluster or the number of disks in a file system increases, or when the file system's free space becomes fragmented. The cluster allocation method is the default for GPFS clusters with eight or fewer nodes and for file systems with eight or fewer disks.
The scatter allocation method provides more consistent file system performance by averaging out performance variations due to block location (for many disk subsystems, the location of the data relative to the disk edge has a substantial effect on performance). This allocation method is appropriate in most cases and is the default for GPFS clusters with more than eight nodes or file systems with more than eight disks.
The block allocation map type cannot be changed after the storage pool has been created.
- allowWriteAffinity={yes | no}
- Indicates whether the File Placement Optimizer (FPO) feature is to be enabled for the storage pool. For more information on FPO, see File Placement Optimizer.
- writeAffinityDepth={0 | 1 | 2}
- Specifies the allocation policy to be used by the node writing the data.
A write affinity depth of 0 indicates that each replica is to be striped across the disks in a cyclical fashion with the restriction that no two disks are in the same failure group. By default, the unit of striping is a block; however, if the block group factor is specified in order to exploit chunks, the unit of striping is a chunk.
A write affinity depth of 1 indicates that the first copy is written to the writer node. The second copy is written to a different rack. The third copy is written to the same rack as the second copy, but on a different half (which can be composed of several nodes).
A write affinity depth of 2 indicates that the first copy is written to the writer node. The second copy is written to the same rack as the first copy, but on a different half (which can be composed of several nodes). The target node is determined by a hash value on the fileset ID of the file, or it is chosen randomly if the file does not belong to any fileset. The third copy is striped across the disks in a cyclical fashion with the restriction that no two disks are in the same failure group. The following conditions must be met while using a write affinity depth of 2 to get evenly allocated space in all disks:
- The configuration in disk number, disk size, and node number for each rack must be similar.
- The number of nodes must be the same in the bottom half and the top half of each rack.
This behavior can be altered on an individual file basis by using the --write-affinity-failure-group option of the mmchattr command.
This parameter is ignored if write affinity is disabled for the storage pool.
- blockGroupFactor=BlockGroupFactor
- Specifies how many file system blocks are laid out sequentially on disk to behave like a single large block. This option works only if write affinity is enabled for the data pool (allowWriteAffinity=yes). This applies only to a new data block layout; it does not migrate previously existing data blocks.
See the section about File Placement Optimizer in the IBM Spectrum Scale: Administration Guide.
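A stanza file for -F can be generated rather than written by hand. The Python sketch below renders NSD stanzas with only the clauses that mmcrfs reads (servers and device are ignored by this command) and applies the pool rule stated above. The function names are hypothetical, the disk names are borrowed from the example output later in this topic, and the failure group numbers are illustrative.

```python
SYSTEM_POOL = "system"
NSD_USAGES = ("dataOnly", "metadataOnly", "dataAndMetadata", "descOnly")


def nsd_stanza(nsd, usage=None, failure_group=None, pool=SYSTEM_POOL):
    """Render one %nsd stanza as text (hypothetical helper, not a GPFS API)."""
    if usage is None:
        # Defaults described in the text: dataAndMetadata in the system pool,
        # dataOnly in every other storage pool.
        usage = "dataAndMetadata" if pool == SYSTEM_POOL else "dataOnly"
    if usage not in NSD_USAGES:
        raise ValueError("unknown usage: %s" % usage)
    if pool != SYSTEM_POOL and usage != "dataOnly":
        raise ValueError("disks outside the system pool must be dataOnly")
    lines = ["%nsd:", "  nsd=%s" % nsd, "  usage=%s" % usage]
    if failure_group is not None:
        lines.append("  failureGroup=%s" % failure_group)
    lines.append("  pool=%s" % pool)
    return "\n".join(lines)


def stanza_file_text(stanzas):
    """Join stanzas into the text of a stanza file, one blank line apart."""
    return "\n\n".join(stanzas) + "\n"


# Three disks in two failure groups, which is what replication of 2 needs.
text = stanza_file_text([
    nsd_stanza("hd2n97", failure_group=1),
    nsd_stanza("hd3n97", failure_group=2),
    nsd_stanza("hd4n97", failure_group=2),
])
```

Writing `text` to a file and passing that file with -F is left to the caller; the sketch deliberately does not invoke any GPFS command.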
- -A {yes | no | automount}
- Indicates when the file system is to be mounted:
- yes
- When the GPFS daemon starts. This is the default.
- no
- Manual mount.
- automount
- On non-Windows nodes, when the file system is first accessed. On Windows nodes, when the GPFS daemon starts.
- -B BlockSize
- Specifies the size of data blocks. Must be 64 KiB, 128 KiB, 256 KiB (the default), 512 KiB, 1 MiB, 2 MiB, 4 MiB, 8 MiB, or 16 MiB. Specify this value with the character K or M, for example 512K.
- -D {nfs4 | posix}
- Specifies whether a deny-write open lock will block writes, which is expected and required by NFS V4. File systems supporting NFS V4 must have -D nfs4 set. The option -D posix allows NFS writes even in the presence of a deny-write open lock. If you intend to export the file system using NFS V4 or Samba, you must use -D nfs4. For NFS V3 (or if the file system is not NFS exported at all) use -D posix. The default is -D nfs4.
- -E {yes | no}
- Specifies whether to report exact mtime values (-E yes), or to periodically update the mtime value for a file system (-E no). If it is more desirable to display exact modification times for a file system, specify or use the default -E yes.
- -i InodeSize
- Specifies the byte size of inodes. Supported inode sizes are 512, 1024, and 4096 bytes. The default is 4096.
- -j {cluster | scatter}
- Specifies the default block allocation map type to be used if layoutMap is not specified for a given storage pool.
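The defaults described under layoutMap can be read as a simple rule: cluster when both the node count and the disk count are eight or fewer, scatter otherwise. The Python sketch below encodes that reading; it is an interpretation of the text, not the actual GPFS selection logic, and the function name is hypothetical.

```python
def default_layout_map(num_nodes, num_disks):
    """Guess the default block allocation map type from the documented rule."""
    if num_nodes < 1 or num_disks < 1:
        raise ValueError("node and disk counts must be positive")
    # scatter is the default as soon as either count exceeds eight.
    if num_nodes <= 8 and num_disks <= 8:
        return "cluster"
    return "scatter"
```

Because the map type cannot be changed after the storage pool is created, specifying -j or layoutMap explicitly avoids depending on this default.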
- -k {posix | nfs4 | all}
- Specifies the type of authorization supported by the file system:
- posix
- Traditional GPFS ACLs only (NFS V4 and Windows ACLs are not allowed). Authorization controls are unchanged from earlier releases.
- nfs4
- Support for NFS V4 and Windows ACLs only. Users are not allowed to assign traditional GPFS ACLs to any file system objects (directories and individual files).
- all
- Any supported ACL type is permitted. This includes traditional GPFS (posix) and NFS V4 and Windows ACLs (nfs4).
The administrator is allowing a mixture of ACL types. For example, fileA may have a posix ACL, while fileB in the same file system may have an NFS V4 ACL, implying different access characteristics for each file depending on the ACL type that is currently assigned. The default is -k all.
Avoid specifying nfs4 or all unless files will be exported to NFS V4 or Samba clients, or the file system will be mounted on Windows. NFS V4 and Windows ACLs affect file attributes (mode) and have access and authorization characteristics that are different from traditional GPFS ACLs.
- -K {no | whenpossible | always}
- Specifies whether strict replication is to be enforced:
- no
- Indicates that strict replication is not enforced. GPFS will try to create the needed number of replicas, but will still return EOK as long as it can allocate at least one replica.
- whenpossible
- Indicates that strict replication is enforced provided the disk configuration allows it. If the number of failure groups is insufficient, strict replication will not be enforced. This is the default value.
- always
- Indicates that strict replication is enforced.
For more information, see the topic "Strict replication" in the IBM Spectrum Scale: Problem Determination Guide.
- -L LogFileSize
- Specifies the size of the internal log files. The LogFileSize specified must be a multiple of the metadata block size. The default size is 4 MB or the metadata block size, whichever is larger. The minimum size is 256 KB and the maximum size is 1024 MB. Specify this value with the K or M character, for example: 8M.
In most cases, allowing the log file size to default works well. An increased log file size is useful for file systems that have a large amount of metadata activity, such as creating and deleting many small files or performing extensive block allocation and deallocation of large files.
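The constraints on -L can be checked ahead of time. This Python sketch assumes that the KB and MB figures in the text are binary units (KiB and MiB); the helper names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

MIN_LOG_FILE_SIZE = 256 * KIB
MAX_LOG_FILE_SIZE = 1024 * MIB


def default_log_file_size(metadata_block_size):
    """Default per the text: 4 MB or the metadata block size, whichever is larger."""
    return max(4 * MIB, metadata_block_size)


def check_log_file_size(log_file_size, metadata_block_size):
    """Raise ValueError if the size breaks a documented constraint."""
    if not MIN_LOG_FILE_SIZE <= log_file_size <= MAX_LOG_FILE_SIZE:
        raise ValueError("log file size must be between 256 KB and 1024 MB")
    if log_file_size % metadata_block_size != 0:
        raise ValueError("log file size must be a multiple of the metadata block size")
    return log_file_size
```

For a file system with a 256 KiB metadata block size, `check_log_file_size(8 * MIB, 256 * KIB)` passes, which corresponds to the 8M value used as an example above.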
- -m DefaultMetadataReplicas
- Specifies the default number of copies of inodes, directories, and indirect blocks for a file. Valid values are 1, 2, and (for GPFS V3.5.0.7 and later) 3. This value cannot be greater than the value of MaxMetadataReplicas. The default is 1.
- -M MaxMetadataReplicas
- Specifies the default maximum number of copies of inodes, directories, and indirect blocks for a file. Valid values are 1, 2, and (for GPFS V3.5.0.7 and later) 3. This value cannot be less than the value of DefaultMetadataReplicas. The default is 2.
- -n NumNodes
- The estimated number of nodes that will mount the file system in the local cluster and all remote clusters. This is used as a best guess for the initial size of some file system data structures. The default is 32. This value can be changed after the file system has been created.
When you create a GPFS file system, you might want to overestimate the number of nodes that will mount the file system. GPFS uses this information for creating data structures that are essential for achieving maximum parallelism in file system operations. For more information, see GPFS architecture. If you are sure there will never be more than 64 nodes, allow the default value to be applied. If you are planning to add nodes to your system, you should specify a number larger than the default.
- -Q {yes | no}
- Activates quotas automatically when the file system is mounted. The default is -Q no. Issue the mmdefedquota command to establish default quota values. Issue the mmedquota command to establish explicit quota values.
To activate GPFS quota management after the file system has been created:
- Mount the file system.
- To establish default quotas:
- Issue the mmdefedquota command to establish default quota values.
- Issue the mmdefquotaon command to activate default quotas.
- To activate explicit quotas:
- Issue the mmedquota command to activate quota values.
- Issue the mmquotaon command to activate quota enforcement.
- -r DefaultDataReplicas
- Specifies the default number of copies of each data block for a file. Valid values are 1, 2, and (for GPFS V3.5.0.7 and later) 3. This value cannot be greater than the value of MaxDataReplicas. The default is 1.
- -R MaxDataReplicas
- Specifies the default maximum number of copies of data blocks for a file. Valid values are 1, 2, and (for GPFS V3.5.0.7 and later) 3. This value cannot be less than the value of DefaultDataReplicas. The default is 2.
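The relationships among -m, -M, -r, and -R can be validated before the command is assembled. The following Python sketch uses the documented defaults and value range; the function name is hypothetical, and it returns an argument list without running anything.

```python
VALID_REPLICA_COUNTS = (1, 2, 3)


def replication_args(default_metadata=1, max_metadata=2, default_data=1, max_data=2):
    """Validate replica settings and return them as mmcrfs-style arguments."""
    settings = {
        "-m": default_metadata,
        "-M": max_metadata,
        "-r": default_data,
        "-R": max_data,
    }
    for flag, value in settings.items():
        if value not in VALID_REPLICA_COUNTS:
            raise ValueError("%s must be 1, 2, or 3, got %r" % (flag, value))
    if default_metadata > max_metadata:
        raise ValueError("-m cannot be greater than -M")
    if default_data > max_data:
        raise ValueError("-r cannot be greater than -R")
    args = []
    for flag in ("-m", "-M", "-r", "-R"):
        args.extend([flag, str(settings[flag])])
    return args
```

Requesting three default replicas therefore also requires raising the corresponding maximum from its default of 2.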
- -S {yes | no | relatime}
- Suppresses the periodic updating of the value of atime as reported by the gpfs_stat(), gpfs_fstat(), stat(), and fstat() calls. The default value is -S no. Specifying -S yes for a new file system results in reporting the time the file system was created.
If relatime is specified, the file access time is updated only if the existing access time is older than the value of the atimeDeferredSeconds configuration attribute or the existing file modification time is greater than the existing access time.
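The relatime rule is a two-part condition, which the Python sketch below spells out. Timestamps are plain seconds, the function name is hypothetical, and the atimeDeferredSeconds value must be supplied by the caller because this sketch does not read the cluster configuration.

```python
def relatime_should_update(atime, mtime, now, atime_deferred_seconds):
    """Return True when the documented relatime rule would update atime.

    The access time is updated if it is older than atimeDeferredSeconds,
    or if the file was modified after it was last accessed.
    """
    atime_is_stale = (now - atime) > atime_deferred_seconds
    modified_since_access = mtime > atime
    return atime_is_stale or modified_since_access
```

A file that is read repeatedly within the deferral window, without being modified, therefore keeps its existing access time.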
- -T MountPoint
- Specifies the mount point directory of the GPFS file system. If it is not specified, the mount point will be set to DefaultMountDir/Device. The default value for DefaultMountDir is /gpfs, but it can be changed with the mmchconfig command.
- -t DriveLetter
- Specifies the drive letter to use when the file system is mounted on Windows.
- -v {yes | no}
- Verifies that specified disks do not belong to an existing file system. The default is -v yes. Specify -v no only when you want to reuse disks that are no longer needed for an existing file system. If the command is interrupted for any reason, use -v no on the next invocation of the command.
Important
Using -v no on a disk that already belongs to a file system will corrupt that file system. This will not be noticed until the next time that file system is mounted.
- -z {yes | no}
- Enable or disable DMAPI on the file system. Turning this option on will require an external data management application such as IBM Spectrum Protect hierarchical storage management (HSM) before the file system can be mounted. The default is -z no. For more information on DMAPI for GPFS, see GPFS-specific DMAPI events.
- --filesetdf
- Specifies that when quotas are enforced for a fileset, the numbers reported by the df command are based on the quotas for the fileset (rather than the entire file system). This option affects the df command behavior only on Linux nodes.
- --nofilesetdf
- Specifies that when quotas are enforced for a fileset, the numbers reported by the df command are based on the quotas for the entire file system (rather than individual filesets). This is the default.
- --inode-limit MaxNumInodes[:NumInodesToPreallocate]
- Specifies the maximum number of files in the file system.
For file systems on which you intend to create files in parallel, if the total number of free inodes is not greater than 5% of the total number of inodes, file system access might slow down. Take this into consideration when creating your file system.
The parameter NumInodesToPreallocate specifies the number of inodes that the system will immediately preallocate. If you do not specify a value for NumInodesToPreallocate, GPFS will dynamically allocate inodes as needed.
You can specify the MaxNumInodes and NumInodesToPreallocate values with a suffix, for example 100K or 2M. Note that in order to optimize file system operations, the number of inodes that are actually created may be greater than the specified value.
- --log-replicas LogReplicas
- Specifies the number of recovery log replicas. Valid values are 1, 2, 3, or DEFAULT. If not specified, or if DEFAULT is specified, the number of log replicas is the same as the number of metadata replicas currently in effect for the file system.
This option is only applicable if the recovery log is stored in the system.log storage pool.
- --metadata-block-size MetadataBlockSize
- Specifies the block size for the system storage pool, provided its usage is set to metadataOnly. Valid values are the same as those listed for -B BlockSize.
- --perfileset-quota
- Sets the scope of user and group quota limit checks to the individual fileset level (rather than the entire file system).
- --noperfileset-quota
- Sets the scope of user and group quota limit checks to the entire file system (rather than per individual fileset). This is the default.
- --mount-priority Priority
- Controls the order in which the individual file systems are mounted at daemon startup or when one of the all keywords is specified on the mmmount command.
File systems with higher Priority numbers are mounted after file systems with lower numbers. File systems that do not have mount priorities are mounted last. A value of zero indicates no priority. This is the default.
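The ordering rule for --mount-priority can be illustrated with a short Python sketch. The file system names are hypothetical, and the order among file systems that share the same priority is not stated in the text, so the sketch simply keeps their input order.

```python
def mount_order(priorities):
    """Order file system names by the documented mount-priority rule.

    priorities maps a file system name to its Priority value; 0 means
    no priority, and such file systems are mounted last.
    """
    for name, priority in priorities.items():
        if priority < 0:
            raise ValueError("priority for %s must not be negative" % name)
    # Lower nonzero numbers mount first; sorted() is stable for ties.
    prioritized = sorted(
        (name for name, priority in priorities.items() if priority > 0),
        key=lambda name: priorities[name],
    )
    unprioritized = [name for name, priority in priorities.items() if priority == 0]
    return prioritized + unprioritized
```

With `{"fsA": 0, "fsB": 2, "fsC": 1}`, the sketch orders fsC first, then fsB, then fsA.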
- --version VersionString
- Enable only the file system features that are compatible with the specified release. The lowest allowed VersionString is 3.1.0.0.
The default value is the current product version, which enables all currently available features but prevents nodes that are running earlier GPFS releases from accessing the file system. Windows nodes can mount only file systems that are created with GPFS 3.2.1.5 or later.
- --profile ProfileName
- Specifies a predefined profile of attributes to be applied. System-defined profiles are located in /usr/lpp/mmfs/profiles/. All the file system attributes listed under a file system stanza will be changed as a result of this command. The following system-defined profile names are accepted:
- gpfsProtocolDefaults
- gpfsProtocolRandomIO
The file system attributes will be applied at file system creation. If there is a current profile in place on the system (use mmlsconfig profile to check), then the file system will be created with those attributes and values listed in the profile's file system stanza. The default is to use whatever attributes and values are associated with the current profile setting.
Furthermore, any and all file system attributes from an installed profile file can be bypassed with '--profile=userDefinedProfile', where userDefinedProfile is a profile file that has been installed by the user in /var/mmfs/etc/.
User-defined profiles consist of the following stanzas:
%cluster:
[CommaSeparatedNodesOrNodeClasses:]ClusterConfigurationAttribute=Value
...
%filesystem:
FilesystemConfigurationAttribute=Value
...
A sample file can be found in /usr/lpp/mmfs/samples/sample.profile. See the mmchconfig command for a detailed description of the different configuration parameters.
User-defined profiles should be used only by experienced administrators. When in doubt, use the mmchconfig command instead.
- --write-cache-threshold HAWCThreshold
- Specifies the maximum length (in bytes) of write requests that will be initially buffered in the highly-available write cache before being written back to primary storage. Only synchronous write requests are guaranteed to be buffered in this fashion.
A value of 0 disables this feature. 64K is the maximum supported value. Specify in multiples of 4K.
This feature can be enabled or disabled at any time (the file system does not need to be unmounted). For more information about this feature, see Highly-available write cache (HAWC).
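The limits on --write-cache-threshold are easy to check in advance. This Python sketch assumes that 4K and 64K mean 4096 and 65536 bytes; the function names are hypothetical.

```python
KIB = 1024
HAWC_STEP = 4 * KIB   # values must be multiples of 4K
HAWC_MAX = 64 * KIB   # maximum supported value


def check_write_cache_threshold(threshold_bytes):
    """Raise ValueError if the threshold breaks a documented limit."""
    if not 0 <= threshold_bytes <= HAWC_MAX:
        raise ValueError("threshold must be between 0 and 64K")
    if threshold_bytes % HAWC_STEP != 0:
        raise ValueError("threshold must be a multiple of 4K")
    return threshold_bytes


def hawc_enabled(threshold_bytes):
    """A value of 0 disables the highly-available write cache."""
    return check_write_cache_threshold(threshold_bytes) != 0
```

Write requests longer than the threshold are not covered by the buffering guarantee described above, so the value is best chosen from the size of the synchronous writes the workload actually issues.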
Exit status
- 0
- Successful completion.
- nonzero
- A failure has occurred.
Security
You must have root authority to run the mmcrfs command.
The node on which the command is issued must be able to execute remote shell commands on any other node in the cluster without the use of a password and without producing any extraneous messages. For more information, see Requirements for administering a GPFS file system.
Examples
This example shows how to create a file system named gpfs1 using three disks, each with a block size of 512 KB, allowing metadata and data replication to be 2, turning quotas on, and creating /gpfs1 as the mount point. The NSD stanzas describing the three disks are assumed to have been placed in the file /tmp/freedisks. To complete this task, issue the command:
mmcrfs gpfs1 -F /tmp/freedisks -B 512K -m 2 -r 2 -Q yes -T /gpfs1
The system displays output similar to:
The following disks of gpfs1 will be formatted on node c21f1n13:
hd2n97: size 1951449088 KB
hd3n97: size 1951449088 KB
hd4n97: size 1951449088 KB
Formatting file system ...
Disks up to size 16 TB can be added to storage pool 'system'.
Creating Inode File
Creating Allocation Maps
Creating Log Files
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool 'system'
19 % complete on Tue Feb 28 18:03:20 2012
42 % complete on Tue Feb 28 18:03:25 2012
62 % complete on Tue Feb 28 18:03:30 2012
79 % complete on Tue Feb 28 18:03:35 2012
96 % complete on Tue Feb 28 18:03:40 2012
100 % complete on Tue Feb 28 18:03:41 2012
Completed creation of file system /dev/gpfs1.
mmcrfs: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
See also
- mmchfs command
- mmdelfs command
- mmdf command
- mmedquota command
- mmfsck command
- mmlsfs command
- mmlspool command
Location
/usr/lpp/mmfs/bin