List files recursively in a large filesystem


In most cases, the find command does this perfectly well. On large filesystems, however, especially cluster filesystems such as dCache, GPFS and Lustre, you may need another way to list the files in your namespace. There are two reasons: one is the security concerns around find, the other is the Birthday problem.

Here is a simple bash script I've been using for years.


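#!/bin/bash
# find.sh: print every file and directory under <dir>, recursively.

cnt_file=0

foreachd() {
  local file
  for file in "$1"/*
  do
        if [ -f "$file" ]
        then
                cnt_file=$((cnt_file + 1))
                printf '%s\n' "$file"
        elif [ -d "$file" ]
        then
                printf '%s\n' "$file"
                # Do not descend through symbolic links; they can create loops
                [ -L "$file" ] || foreachd "$file"
        fi
  done
}

if [ $# -lt 1 ]
then
    echo "USAGE: find.sh <dir>"
    exit 1
fi

if [ "$(whoami)" = "root" ] ; then
    foreachd "$1"
else
    echo "It must be run by root"
    exit 1
fi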


This short script prints all files under a directory recursively. With a little tweaking, you can make it print only the subdirectories.
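For instance, dropping the `-f` branch leaves a walker that prints only directories. This is a minimal sketch: the function name `foreachdir` and the script name `finddir.sh` are hypothetical, and the root check from the script above is left out to keep it short. Like the original, it relies on the shell glob, so hidden entries (names starting with a dot) are not visited.

```bash
#!/bin/bash
# Sketch: print only the directories under $1, recursively.
# foreachdir is a hypothetical name. Symbolic links to directories are
# printed but not followed, since following them could loop forever.

foreachdir() {
  local entry
  for entry in "$1"/*
  do
    if [ -d "$entry" ]
    then
      printf '%s\n' "$entry"
      [ -L "$entry" ] || foreachdir "$entry"
    fi
  done
}

if [ $# -lt 1 ]
then
  echo "USAGE: finddir.sh <dir>"
else
  foreachdir "$1"
fi
```

Run it as `./finddir.sh /some/dir`; an empty directory simply leaves the glob unexpanded, and the `-d` test filters that literal pattern out.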

Here is a practical example of such a tweak.

In large cluster filesystems it is also quite common for lots of empty directories to pile up as time goes by. The script below finds them.

The method is to find empty directories that have not been modified in the last 31 days.
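The core test behind this method can be sketched on its own as a small bash function. It is a sketch under assumptions: `is_stale_empty_dir` is a hypothetical name, and it assumes GNU coreutils (`ls -A`, `stat -c %Y`). It checks the modification time (mtime), which is also what `ls --full-time` shows by default, and it counts hidden entries as content.

```bash
#!/bin/bash
# Sketch: succeed (exit status 0) if $1 is a real directory with no entries,
# hidden ones included, whose mtime is more than max_age seconds in the past.
# Assumes GNU coreutils; is_stale_empty_dir is a hypothetical helper name.

is_stale_empty_dir() {
  local dir=$1
  local max_age=${2:-2678400}   # default: 31 days * 86400 seconds
  local now mtime
  # Must be a directory, and not a symbolic link to one
  [ -d "$dir" ] && [ ! -L "$dir" ] || return 1
  # ls -A lists dotfiles too, but not "." and ".."
  [ -z "$(ls -A -- "$dir")" ] || return 1
  now=$(date +%s)
  mtime=$(stat -c %Y -- "$dir")   # mtime in seconds since the epoch
  [ $((now - mtime)) -gt "$max_age" ]
}
```

Usage: `is_stale_empty_dir /some/dir && echo "/some/dir is stale"`. Reading the epoch time from `stat` avoids parsing the date column of `ls` output and converting it back with `date -d`.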

#!/bin/bash
###############################################################################
# RETURNS:      0  - OK
#               1  - some error occurred
#
# USAGE:        ./find_namespace_emptydir.sh <dir>
# WHERE:
# NOTES: at 100Hz, the find section takes ~6-10% CPU; at 50Hz, verification takes ~20-30% CPU. It varies between hosts and platforms.
# CHANGES:
#
###############################################################################

home_dir="/home/trteam/namespace_emptydir"
emptydir="$home_dir/emptydir.lst"
emptydir_verified="$home_dir/emptydir_verified.lst"
rm -f "$emptydir"
rm -f "$emptydir_verified"
rate=10                               # directories checked per second
sleep_microsec=$((1000000 / rate))    # pause between two checks
cnt=0
cnt_empty=0
break_cnt=100                         # print progress every 100 directories
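foreachd(){
  local file num_files
  for file in "$1"/*
  do
        # Skip symbolic links so the walk cannot loop
        if [ -d "$file" ] && [ ! -L "$file" ]
        then
                echo -n "."
                usleep $sleep_microsec      # throttle to $rate checks per second
                cnt=$((cnt + 1))
                break_cnt=$((break_cnt - 1))
                if [ $break_cnt -eq 0 ] ; then
                  echo "$cnt directories have been searched, $cnt_empty are empty"
                  break_cnt=100
                fi
                # No cd, so relative paths keep working; ls -A also counts hidden entries
                num_files=$(ls -A "$file" | wc -l)
                if [ $num_files -eq 0 ]; then
                  echo "$file" >>"$emptydir"
                  cnt_empty=$((cnt_empty + 1))
                else
                  foreachd "$file"
                fi
        fi
  done
}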
if [ $# -lt 1 ]
  then
    echo "USAGE:  find_namespace_emptydir.sh <dir>"
    exit 1
  else
    foreachd "$1"
    echo "$cnt directories have been searched, $cnt_empty are empty"
fi

## Verification: #1. check that it is still empty, #2. check that it was last modified more than one month (31 days) ago
##
if [ -f "$emptydir" ]
 then
    rate=10
    sleep_microsec=$((1000000 / rate))
    break_cnt=100
    cnt=0
    emptyfor31days_cnt=0
    total_cnt=$(wc -l <"$emptydir")
    unix_time_now=$(date +%s)
    # Read one path per line, so names containing spaces survive
    while IFS= read -r file
    do
        echo -n "."
        usleep $sleep_microsec
        if [ -d "$file" ] && [ -z "$(ls -A "$file")" ] ; then
           # mtime in seconds since the epoch (GNU stat)
           unix_time=$(stat -c %Y "$file")
           time_diff=$((unix_time_now - unix_time))
           if [ $time_diff -gt 2678400 ] ; then     # 31 days * 86400 s
              echo "$file" >>"$emptydir_verified"
              emptyfor31days_cnt=$((emptyfor31days_cnt + 1))
           fi
        fi
        break_cnt=$((break_cnt - 1))
        cnt=$((cnt + 1))
        if [ $break_cnt -eq 0 ] ; then
          echo "$cnt of $total_cnt files have been verified, $emptyfor31days_cnt have been empty for more than 31 days"
          break_cnt=100
        fi
    done <"$emptydir"
    echo "$cnt of $total_cnt files have been verified, $emptyfor31days_cnt have been empty for more than 31 days"
fi

Once the empty directories have been found, the script below deletes them.
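Before deleting anything, one pitfall in the emptiness test is worth checking: plain `ls` skips hidden entries, so a directory that holds only dotfiles looks empty to `ls | wc -l`. The sketch below demonstrates this in a scratch directory (the names `looks_empty` and `.hidden` are made up for the demo) and shows why `rmdir`, which only removes empty directories, is a safer removal command than `rm -rf`.

```bash
#!/bin/bash
# Sketch: a directory holding only a dotfile looks empty to "ls | wc -l"
# but not to "ls -A | wc -l"; rmdir refuses to remove it, rm -rf would not.

tmp=$(mktemp -d)
mkdir "$tmp/looks_empty"
touch "$tmp/looks_empty/.hidden"

plain_count=$(ls "$tmp/looks_empty" | wc -l)    # hidden entries skipped
all_count=$(ls -A "$tmp/looks_empty" | wc -l)   # hidden entries counted
echo "ls sees $plain_count entries, ls -A sees $all_count"

# rmdir only removes empty directories, so this fails and .hidden survives
rmdir "$tmp/looks_empty" 2>/dev/null
rmdir_status=$?
echo "rmdir exit status: $rmdir_status"

rm -rf "$tmp"   # clean up the scratch directory
```

On a GNU system, plain `ls` reports 0 entries and `ls -A` reports 1, and `rmdir` exits with a non-zero status.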

#!/bin/bash
###############################################################################
# FILE:         delete_namespace_emptydir.sh
# PURPOSE:
#
# RETURNS:      0  - OK
#               1  - some error occurred
#
# USAGE:        ./delete_namespace_emptydir.sh
# WHERE:
# NOTES: runs at 5Hz to minimize the performance impact on the namespace; uncomment the line with rm only when you are sure what you are doing.
# CHANGES:
#
###############################################################################

home_dir="/home/trteam/namespace_emptydir"
emptydir_verified=$home_dir"/emptydir_verified.lst"
emptydir_deleted=$home_dir"/emptydir_deleted.lst"

if [ "$(whoami)" != "root" ] ; then
  echo "It must be run by root"
  exit 1
fi

rm -f "$emptydir_deleted"
## Deletion: #1. check that it is still empty, #2. check that it was last modified more than one month (31 days) ago, then delete
cd "$home_dir" || exit 1
##
if [ -f "$emptydir_verified" ]
 then
    rate=5
    sleep_microsec=$((1000000 / rate))
    break_cnt=100
    cnt=0
    total_cnt=$(wc -l <"$emptydir_verified")
    unix_time_now=$(date +%s)
    while IFS= read -r file
    do
        echo -n "."
        usleep $sleep_microsec
        if [ -d "$file" ] && [ -z "$(ls -A "$file")" ] ; then
           unix_time=$(stat -c %Y "$file")
           time_diff=$((unix_time_now - unix_time))
           if [ $time_diff -gt 2678400 ] ; then
              # Dry run until the rmdir line is uncommented; unlike rm -rf,
              # rmdir refuses to delete a directory that is not empty.
              #rmdir -- "$file" &&
              { echo "$file" >>"$emptydir_deleted"; cnt=$((cnt + 1)); }
           fi
        fi
        break_cnt=$((break_cnt - 1))
        if [ $break_cnt -eq 0 ] ; then
          echo "$cnt of $total_cnt files have been deleted"
          break_cnt=100
        fi
    done <"$emptydir_verified"
    echo "$cnt of $total_cnt files have been deleted"
fi



The whole process can be done with a single find command. However, because of the security concerns I mentioned in find command useful examples, I use the scripts above instead.
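For comparison, here is a sketch of what that single find command might look like, assuming GNU find (`-empty`, `-mmin` and `-delete` are GNU extensions). The wrapper name `list_stale_empty_dirs` is hypothetical. `-mmin +44640` selects entries last modified more than 44640 minutes (31 × 24 × 60) ago, and `-mindepth 1` leaves the starting directory itself out, as the scripts do.

```bash
#!/bin/bash
# Sketch: find-based equivalent of the listing step, assuming GNU find.
# list_stale_empty_dirs is a hypothetical wrapper name.

list_stale_empty_dirs() {
  # -empty treats a directory holding only dotfiles as non-empty
  find "$1" -mindepth 1 -type d -empty -mmin +44640 -print
}

# To delete instead of list, GNU find offers -delete (it implies -depth):
#   find "$1" -mindepth 1 -type d -empty -mmin +44640 -delete
```

Note that this walks the namespace at full speed; unlike the scripts above, it has no built-in throttling.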