Both xfs_copy and xfsdump can be used to back up XFS filesystems, so the question is: are they the same? What's the difference?
If you cannot mount an XFS filesystem and get an I/O error or 'mount: Structure needs cleaning', then you will have to repair the XFS filesystem after the underlying storage problem is fixed.
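A minimal repair sequence might look like the sketch below; the device name /dev/sdb1 is a placeholder, substitute your own:

```shell
# xfs_repair refuses to run on a mounted filesystem, so unmount first
umount /dev/sdb1

# Dry run: report what would be fixed without writing anything
xfs_repair -n /dev/sdb1

# Actually repair
xfs_repair /dev/sdb1

# Last resort if the log itself is corrupt: -L zeroes the log
# (recent metadata changes may be lost)
# xfs_repair -L /dev/sdb1
```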
blkid useful examples
Similar to lsblk, blkid is a command-line utility to locate and print block device attributes. It can also be used to search for a block device by its attributes. Here are some blkid command examples on Linux.
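For instance (the UUID and label below are made-up examples):

```shell
# Print attributes (UUID, LABEL, TYPE) of all block devices
blkid

# Query a single device
blkid /dev/sda1

# Find the device that carries a given UUID (hypothetical value)
blkid -U 2d4f10e6-be57-4e1d-92ef-424355bd4b39

# Find the device with a given filesystem label
blkid -L data

# Print only one attribute's value, handy in scripts
blkid -s UUID -o value /dev/sda1
```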
As you can tell from the commands' names:
xfs_metadump is a utility to copy XFS filesystem metadata to a file
xfs_mdrestore is a utility to restore XFS metadata from a file to an XFS filesystem
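A typical round trip, sketched with placeholder paths, is to dump the metadata of an unmounted filesystem, restore it into an image file, and inspect the image offline:

```shell
# Dump metadata of an (unmounted) XFS filesystem to a file;
# -g shows progress, -o disables obfuscation of file names
xfs_metadump -g -o /dev/sdb1 /tmp/sdb1.metadump

# Restore the metadata into a sparse image file for offline inspection
xfs_mdrestore /tmp/sdb1.metadump /tmp/sdb1.img

# The image can then be examined without touching the real device
xfs_repair -n -f /tmp/sdb1.img
```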
Good News from NASA
Today NASA announced a code-speedup contest called the High Performance Fast Computing Challenge (HPFCC). The competition will reward qualified contenders who can manipulate the agency’s FUN3D design software so it runs ten to 10,000 times faster on the Pleiades supercomputer without any decrease in accuracy.
Keeping data on disk is pretty expensive, especially for long-term archival data. Tape is the best place for such data, and LTO technology is one solution that meets current big data requirements. In addition, LTO-7 is actually faster than disk at streaming. Thus, tape is not just ideal for archiving; it can also be used as a tertiary storage solution.
Here are key facts about LTO generation 7 tape technology
Mdadm is a utility to manage software RAID arrays on Linux. Here is an example showing how to fix an array that is in the inactive state. For more info about mdadm, see Mdadm, a tool for software array on linux
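The usual recovery pattern, sketched here with /dev/md0 and three hypothetical member partitions, is to stop the inactive array and re-assemble it:

```shell
# Check array state; an inactive array shows up as "inactive" here
cat /proc/mdstat

# Stop the inactive array, then re-assemble it from its members
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 /dev/sd[bcd]1

# If assembly still fails, --force can override a small event-count
# mismatch between members (use with care, there is a data risk)
# mdadm --assemble --force /dev/md0 /dev/sd[bcd]1
```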
Quite often, in a shell environment, you want to run one program in multiple processes, either to fully utilize your CPU resources or for other purposes. For example, you want to run checksums or other checks against a large number of files, using the same program or script. How do you run one program in parallel to speed up the whole processing time?
In the beginning, I did it like this:
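A sketch of the naive approach (background jobs plus wait), followed by the more controlled xargs -P version; md5sum and the file names are just illustrative:

```shell
# A scratch directory with a few files to checksum (illustrative)
demo=$(mktemp -d)
cd "$demo"
for i in 1 2 3 4; do echo "data $i" > "file$i.dat"; done

# Naive version: one background job per file, then wait for all of them.
# This forks as many processes as there are files -- fine for 4 files,
# painful for 40,000.
for f in *.dat; do
    md5sum "$f" &
done
wait

# More controlled: xargs -P caps concurrency at 4 md5sum processes,
# and the NUL-delimited pipe is safe for odd file names
printf '%s\0' *.dat | xargs -0 -n 1 -P 4 md5sum
```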
In most circumstances, parity and media scans are background processes that users and storage admins don't need to worry about. However, things can go wrong in unexpected ways. Here is my story.
Like conventional filesystems, Hadoop HDFS also offers a filesystem consistency and integrity check. Fittingly, the command is also called fsck, and it can be used to identify corrupt files on Hadoop HDFS.
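A few representative invocations (the /user/data path is a placeholder):

```shell
# Check the whole namespace and print overall health
hdfs fsck /

# List only the files with corrupt or missing blocks
hdfs fsck / -list-corruptfileblocks

# Show per-file block and replica-location details for one path
hdfs fsck /user/data -files -blocks -locations
```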
Hadoop HDFS block allocation tries to spread new blocks evenly among all the datanodes. In a large-scale cluster, each node has a different capacity, and quite often you need to decommission old nodes and add new ones for more capacity. How does Hadoop balance the space usage across all datanodes?
And how do you protect a new node from being overloaded and becoming a bottleneck, since all the new blocks would otherwise be allocated on and read from that datanode?
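The standard tools for this are the HDFS balancer and its bandwidth cap; a minimal sketch:

```shell
# Move blocks around until no datanode deviates more than 10%
# from the average cluster utilization (10 is the default threshold)
hdfs balancer -threshold 10

# Cap the bandwidth each datanode may spend on balancing
# (bytes per second; here ~10 MB/s) so clients aren't starved
hdfs dfsadmin -setBalancerBandwidth 10485760
```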
blktrace is a block layer IO tracing mechanism which provides detailed information about request queue operations up to user space.
What is Apache Hive? The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query it using a SQL-like language called HiveQL. Read more at the Hive official site.
lsblk lists information about all or the specified block devices. The lsblk command reads the sysfs filesystem to gather information.
The command prints all block devices (except RAM disks) in a tree-like format by default. Use lsblk --help to get a list of all available columns.
If you want to check block device attributes, use blkid command.
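A few common invocations:

```shell
# Default tree view of all block devices
lsblk

# Pick specific columns, including filesystem type and mountpoint
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT

# SCSI devices only, with transport, vendor and model info
lsblk -S
```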
Once you have Hadoop set up, either as a single node or a cluster, the first thing you want to try is creating files and directories on the Hadoop Distributed File System (HDFS); you can of course find the full HDFS command reference as well.
Below are examples of the most commonly used HDFS commands for file and directory management, tested with Hadoop 2.7.3 on SL7 (RHEL7/CentOS7).
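A quick taste, using a hypothetical /user/demo directory:

```shell
# Create a directory tree and list it
hdfs dfs -mkdir -p /user/demo/input
hdfs dfs -ls /user/demo

# Copy a local file into HDFS and read it back
hdfs dfs -put /etc/hosts /user/demo/input/
hdfs dfs -cat /user/demo/input/hosts

# Check space usage in human-readable form, then clean up
hdfs dfs -du -h /user/demo
hdfs dfs -rm -r /user/demo
```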
I have another article discussing conventional RAID. Compared to conventional RAID, some modern cluster filesystems like GPFS™, Ceph, Gluster, etc., use declustered arrays.
Their RAID implements a sophisticated data and spare-space disk layout scheme that allows arbitrarily sized disk arrays while also reducing the overhead to clients when recovering from disk failures. To accomplish this, cluster RAID uniformly spreads, or declusters, user data, redundancy information, and spare space across all the disks of a declustered array.
This article is based on IBM's GPFS declustered-array knowledge document; I made it more general.
The hdfs dfsadmin command supports a number of HDFS administration operations. (The command used to be bin/hadoop dfsadmin, now deprecated.) In this article, we'll go through dfsadmin commands with examples.
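Some of the subcommands covered, in sketch form:

```shell
# Cluster-wide capacity, usage, and per-datanode status
hdfs dfsadmin -report

# Enter, query, and leave safe mode (read-only namespace)
hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode get
hdfs dfsadmin -safemode leave

# Save the current namespace image to disk (requires safe mode)
hdfs dfsadmin -saveNamespace
```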
DDP stands for Dynamic Disk Pooling, also known as Distributed RAID, or D-RAID: a hardware declustered-array implementation. Vendors like IBM, DELL, NetApp, EMC, etc., all support this type of array on some of their storage products.
Dynamic Disk Pooling (DDP) dynamically distributes data, spare capacity, and protection information across a pool of disk drives. DDP improves on the rebuild times and performance of traditional RAID arrays.
iostat is one of the most used performance tools for troubleshooting on Linux; it reports CPU statistics and input/output statistics for devices, partitions, and NFS.
Here are some useful command examples:
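For example:

```shell
# CPU and device utilization, refreshed every 2 seconds, 5 reports
iostat 2 5

# Extended per-device statistics (await, svctm, %util) in kilobytes
iostat -xk 2 5

# A single device only, with a timestamp on each report
iostat -xk -t sda 2 5
```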
Here are some scsi_id examples on RHEL6, SL6, and CentOS6. There are some updates in RHEL7; see scsi_id examples on RHEL7, SL7, CentOS7
scsi_id is a tool to retrieve and generate a unique SCSI identifier. It is widely used by many system admin tools, primarily by udev.
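A sketch of typical queries; note the binary's path differs between releases (/sbin/scsi_id on RHEL6, /usr/lib/udev/scsi_id on RHEL7), and /dev/sda is a placeholder:

```shell
# Query the unique identifier (WWID) of a whitelisted disk
/sbin/scsi_id --whitelisted --device=/dev/sda

# Use the page 0x83 (device identification) VPD page explicitly
/sbin/scsi_id --page=0x83 --whitelisted --device=/dev/sda

# Export all queried values in KEY=value form, as udev rules consume them
/sbin/scsi_id --whitelisted --export --device=/dev/sda
```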
Using the systemctl Command
The most important command for managing services on a RHEL 7 (systemd) system is the systemctl command. Here are some examples of the systemctl command (using the nfs-server service as an example) and a few other commands that you may find useful:
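The basics, using nfs-server as the example unit:

```shell
# Start, check, and stop the nfs-server service
systemctl start nfs-server
systemctl status nfs-server
systemctl stop nfs-server

# Enable start-at-boot, and verify the setting
systemctl enable nfs-server
systemctl is-enabled nfs-server

# Reload systemd's unit files after editing them
systemctl daemon-reload
```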
Hard Disk (Hard Drive) Performance – transfer rates, latency and seek times
The performance of a hard disk is very important to the overall speed of the system – a slow hard disk has the potential to hinder a fast processor like no other system component – and the effective speed of a hard disk is determined by a number of factors.
Using journalctl on RHEL7/CentOS7/SL7
Like systemctl, journalctl is a systemd utility. It's used for querying and displaying messages from the systemd journal, and is the standard way to read messages from one or more binary journal files.
In the following examples, I will show you how it can be used with some of its parameters. Each parameter can be used on its own or combined with others to further narrow the search scope. For a full listing of journalctl options, see the journalctl man page.
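A few of the combinations covered (the unit name and dates are placeholders):

```shell
# Everything since the last boot
journalctl -b

# Follow new messages as they arrive, like tail -f
journalctl -f

# Only messages from one unit, within a time window
journalctl -u nfs-server --since "2017-01-01 00:00:00" --until today

# Kernel messages at priority err or worse
journalctl -k -p err
```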
This is an LVM command reference for quick lookup. If you want more detailed info about each command, see Red Hat Logical Volume Administration
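The core lifecycle, sketched with hypothetical names (/dev/sdb1, vg_data, lv_data):

```shell
# Create a physical volume, a volume group, and a 10G logical volume
pvcreate /dev/sdb1
vgcreate vg_data /dev/sdb1
lvcreate -n lv_data -L 10G vg_data

# Inspect each layer
pvs; vgs; lvs

# Grow the LV by 5G, then (for ext4) resize the filesystem to match
lvextend -L +5G /dev/vg_data/lv_data
resize2fs /dev/vg_data/lv_data
```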
cstore-fdw is an open-source Postgres extension that uses the Optimized Row Columnar (ORC) format for its data layout, and protobuf-c for serializing and deserializing table metadata. It is dedicated to data analysis workloads.
Postgres works with it through the foreign data wrapper APIs, so it's transparent to the end user.
You can find other articles on the introduction of cstore-fdw and on how to set up and configure it; here I'll just give you a glance at the technical layer, what it does and how it works in detail, from hands-on experience.
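For orientation, this is roughly how a columnar table is declared through the FDW API (database name, table, and columns are made up for illustration):

```shell
psql -d testdb <<'SQL'
-- Load the extension and define a server backed by it
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- A columnar foreign table; data lands in cstore-fdw's own file format
CREATE FOREIGN TABLE events (
    ts      timestamp,
    user_id int,
    payload text
) SERVER cstore_server
  OPTIONS (compression 'pglz');
SQL
```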
I/O alignment is one of the key points for storage performance. As part of enhancements to the SCSI and ATA standards, storage devices can now indicate their preferred (and in some cases, required) I/O alignment and I/O size.
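The kernel exposes what a device reports through sysfs; a quick way to check (sda/sda1 are placeholders, values are in bytes):

```shell
# Sector sizes and the device's preferred I/O granularity
cat /sys/block/sda/queue/physical_block_size
cat /sys/block/sda/queue/logical_block_size
cat /sys/block/sda/queue/minimum_io_size
cat /sys/block/sda/queue/optimal_io_size

# Alignment offset of a partition relative to the physical sectors
blockdev --getalignoff /dev/sda1
```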
Here are some SQL examples to get Postgres database, table, and index sizes, all tested on Postgres 9.1, 9.2, and 9.3.
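A sketch of the kind of queries covered, run through psql against a placeholder database:

```shell
psql -d postgres <<'SQL'
-- Database sizes, largest first
SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database ORDER BY pg_database_size(datname) DESC;

-- Table sizes in the current database, including indexes and toast
SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS total
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Index sizes
SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_catalog.pg_statio_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
SQL
```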
In another article, I introduced hugepages and transparent hugepages; here I'm going to show you how to enable, use, monitor, and disable hugepages.
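The moving parts, in outline (the page count 128 is an arbitrary example; the writes need root):

```shell
# How many huge pages are configured and in use right now
grep -i huge /proc/meminfo

# Reserve 128 huge pages (2 MB each on x86_64) at runtime
sysctl -w vm.nr_hugepages=128

# Make the setting persistent across reboots
echo 'vm.nr_hugepages = 128' >> /etc/sysctl.conf

# Disable transparent hugepages (RHEL7-style interface)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```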
I upgraded Postgres to 9.4 recently and wanted to check whether my handy SQL queries still work on pg9.4. Here is the one to get Postgres database, table, and index sizes; in fact these queries have worked since Postgres 9.1, 9.2, and 9.3.