Using Berkeley DB in Perl
Perl can work with many kinds of DBM databases. In this article I'll only cover two of them, GDBM and Berkeley DB (B-tree); I may write a fuller comparison later.
Here is the description of DB_File from perldoc.perl.org:

DB_File - Perl5 access to Berkeley DB version 1.x

Berkeley DB is a C library which provides a consistent interface to a number of database formats. DB_File provides an interface to all three of the database types currently supported by Berkeley DB.

The file types are:

    DB_HASH

    This database type allows arbitrary key/value pairs to be stored in data files. This is equivalent to the functionality provided by other hashing packages like DBM, NDBM, ODBM, GDBM, and SDBM. Remember though, the files created using DB_HASH are not compatible with any of the other packages mentioned.

    A default hashing algorithm, which will be adequate for most applications, is built into Berkeley DB. If you do need to use your own hashing algorithm it is possible to write your own in Perl and have DB_File use it instead.

    DB_BTREE

    The btree format allows arbitrary key/value pairs to be stored in a sorted, balanced binary tree.

    As with the DB_HASH format, it is possible to provide a user defined Perl routine to perform the comparison of keys. By default, though, the keys are stored in lexical order.

    DB_RECNO

    DB_RECNO allows both fixed-length and variable-length flat text files to be manipulated using the same key/value pair interface as in DB_HASH and DB_BTREE. In this case the key will consist of a record (line) number.
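The default lexical key order of DB_BTREE, and the user-defined comparison routine mentioned above, can be seen in a short sketch. It assumes DB_File is installed and writes two throwaway files into a File::Temp directory; the file names and the numeric comparison are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);     # throwaway directory for the example files

# Default $DB_BTREE: keys are kept in lexical (byte-wise) order.
my %lexical;
tie(%lexical, 'DB_File', "$dir/lexical.db", O_RDWR | O_CREAT, 0666, $DB_BTREE)
    or die "cannot tie lexical.db: $!";
$lexical{$_} = "v$_" for (10, 9, 100);
my @lexical_order = keys %lexical;   # iteration follows the B-tree order
untie %lexical;

# User-defined order: a fresh DB_File::BTREEINFO whose 'compare' callback
# compares keys numerically. The same callback should be supplied every time
# this file is reopened, so the stored order stays consistent.
my $numeric_info = DB_File::BTREEINFO->new();
$numeric_info->{'compare'} = sub { my ($k1, $k2) = @_; return $k1 <=> $k2 };

my %numeric;
tie(%numeric, 'DB_File', "$dir/numeric.db", O_RDWR | O_CREAT, 0666, $numeric_info)
    or die "cannot tie numeric.db: $!";
$numeric{$_} = "v$_" for (10, 9, 100);
my @numeric_order = keys %numeric;
untie %numeric;

print "lexical: @lexical_order\n";   # lexical: 10 100 9
print "numeric: @numeric_order\n";   # numeric: 9 10 100
```

With the default order "100" sorts before "9" because the keys are compared as strings; the numeric callback fixes that at the cost of calling Perl code on every comparison.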

Here is my example. With a small mapping file the choice of database makes little difference, but with a large number of records it matters a lot: in my case, with more than 10M records, Berkeley DB was much faster than GDBM.

Declare db

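use DB_File;                         # tie() interface to Berkeley DB 1.x
use Fcntl;                           # O_RDWR, O_RDONLY, O_CREAT flags
use Digest::MD5 qw(md5_base64);      # builds the database keys from paths
use constant { _TRUE => 1, _FALSE => 0 };   # error flags returned by the db_* subs
our $db_type = "BerkeleyDB";         # "BerkeleyDB" or "GDBM"
our $dbfile  = "mydbfilename";
our %idof;                           # hash tied to $dbfile by db_open()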

Initialize db file

sub db_init
{
  # Remove any existing database file so the next db_open starts empty.
  unlink $dbfile;
}

Open the db file

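sub db_open
{
  my $mode = $_[0];                  # "rdwr" or "rdonly"
  my $err  = _FALSE;
  if ( $db_type eq "GDBM" ) {
    # Tie GDBM_File explicitly: dbmopen() uses whichever DBM module
    # AnyDBM_File finds first, which need not be GDBM.
    require GDBM_File;
    if ( $mode eq "rdwr" ) {
      tie(%idof, "GDBM_File", $dbfile, GDBM_File::GDBM_WRCREAT(), 0666) or $err = _TRUE;
    }
    elsif ( $mode eq "rdonly" ) {
      tie(%idof, "GDBM_File", $dbfile, GDBM_File::GDBM_READER(), 0666) or $err = _TRUE;
    }
    else {
      $err = _TRUE;
    }
  }
  elsif ( $db_type eq "BerkeleyDB" ) {
    if ( $mode eq "rdwr" ) {
      tie(%idof, "DB_File", $dbfile, O_RDWR|O_CREAT, 0666, $DB_BTREE) or $err = _TRUE;
    }
    elsif ( $mode eq "rdonly" ) {
      tie(%idof, "DB_File", $dbfile, O_RDONLY, 0666, $DB_BTREE) or $err = _TRUE;
    }
    else {
      $err = _TRUE;
    }
  }
  else {
    $err = _TRUE;                    # unknown $db_type
  }
  return($err);                      # _TRUE on failure
}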
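Stripped of the $db_type switch and the _TRUE/_FALSE bookkeeping, the Berkeley DB branch of db_open comes down to two tie() calls that differ only in their flags. The sketch below uses a hypothetical temporary file: it writes a key in read/write mode, reopens the file read-only to read it back, and shows that a read-only open of a missing file fails, which is the case db_open reports as an error.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $dbfile = "$dir/mapping.db";      # hypothetical file name

# "rdwr": O_RDWR|O_CREAT opens the B-tree file, creating it if needed.
my %idof;
tie(%idof, 'DB_File', $dbfile, O_RDWR | O_CREAT, 0666, $DB_BTREE)
    or die "cannot open $dbfile read/write: $!";
$idof{'key-a'} = 'id-a';
untie %idof;                         # flush and close before reopening

# "rdonly": O_RDONLY opens an existing file for reading only.
tie(%idof, 'DB_File', $dbfile, O_RDONLY, 0666, $DB_BTREE)
    or die "cannot open $dbfile read-only: $!";
my $found = $idof{'key-a'};
untie %idof;

# A read-only open of a file that does not exist makes tie() return false,
# which is the situation db_open reports by returning _TRUE.
my %missing;
my $opened = tie(%missing, 'DB_File', "$dir/no-such.db", O_RDONLY, 0666, $DB_BTREE) ? 1 : 0;
untie %missing if $opened;

print "found: $found\n";             # found: id-a
print "missing file opened: ", ($opened ? "yes" : "no"), "\n";
```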

Close db file

sub db_close
{
  my $err = _FALSE;
  if ( $db_type eq "GDBM" or $db_type eq "BerkeleyDB" ) {
    untie %idof;                     # the hash db_open tied; untie also ends a dbmopen() binding
  }
  else {
    $err = _TRUE;
  }
  return($err);
}

Examples:

The example below stores the mapping in the DB file and reports how long each phase takes. A path can be a URL or a regular file path. Instead of the path itself, the key is md5_base64($path), the 22-character base64 MD5 digest of the path (the variable is called $path_crc, but it holds an MD5 hash, not a CRC). This saves space and keeps the keys short for the lookups later. The 'reload' parameter controls whether the whole DB is reloaded or the new mappings are only added incrementally.
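The effect of keying on md5_base64 instead of the path can be checked on its own. In this small sketch (both paths are made up) every key comes out 22 characters long however long the path is, and hashing the same path again gives the same key, which is what the later lookup relies on.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

# Hypothetical input; a URL and a long file path are keyed the same way.
my @paths = (
    'http://www.example.com/a/rather/long/url/pointing/to/some/resource.html',
    '/data/aaaa/project/run/2010/output/file_000123.dat',
);

for my $path (@paths) {
    my $key = md5_base64($path);     # 16-byte digest, base64 without '=' padding
    printf "%3d chars -> %d chars: %s\n", length($path), length($key), $key;
    # Deterministic: hashing the same path again yields the same key.
    die "unstable key for $path" unless md5_base64($path) eq $key;
}
```

Two different paths can in principle share a digest, and the same path can appear twice in the input; the loader's "duplicated_crc" marker is how an already-present key gets flagged.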

Store mapping

sub transfer_path_id_maping_into_db
{
  my ($input, $reload) = @_;         # $reload eq "yes": full reload, else incremental
  my $total_id = 0;
  my $j        = 0;
  my ($id, $path, $path_crc, %map);
  my $s_time = time();
  my $mode   = "rdwr";
  my ($err)  = db_open($mode);
  return if $err;                    # database could not be opened
  print "reading input file at $s_time ...\n";
  open( my $fh, '<', $input ) or do {
    print "cannot open $input: $!\n";
    db_close();
    return;
  };
  while ( defined( my $inputtext = <$fh> ) ) {
    # Skip lines that do not match the mapping format (placeholder pattern).
    if ( $inputtext !~ /unexpected mapping string/ ) {
      print "not expected input ...\n" if $debug;
      next;
    }
    ($path, $id) = split(/\|/, $inputtext);
    next unless defined $id;
    s/^\s+|\s+$//g for $path, $id;   # remove leading and trailing spaces
    #$debug and printf("path:%s id:%s\n", $path, $id);
    # Keep only paths containing one of these (placeholder) substrings
    # and ids of at least 24 characters.
    next unless $path =~ /aaaa|bbbb|cccc|dddd|eeee|ffff|gggg/;
    next unless length($id) >= 24;
    $path_crc = md5_base64($path);
    # Mark keys that are already in the database.
    $map{$path_crc} = exists $idof{$path_crc} ? "duplicated_crc" : $id;
    $total_id++;
    if ( ++$j == 1000 ) {
      printf("%d rows have been read\n", $total_id) if $debug;
      $j = 0;
    }
  }
  close($fh);
  my $e_time = time();
  printf("time elapsed %d secs for md5 compute %d mapping\n", $e_time - $s_time, $total_id);
  # Write in sorted key order, which suits the B-tree.
  foreach $path_crc (sort keys %map) {
    if ( $reload eq "yes" or ! exists $idof{$path_crc} ) {
      $idof{$path_crc} = $map{$path_crc};   # the id read for this key
    }
    else {
      $idof{$path_crc} = "duplicated_crc";
    }
  }
  db_close();
  $e_time = time();
  printf("time elapsed %d secs for loading %d mapping\n", $e_time - $s_time, $total_id);
}
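The same load can be written more compactly. The sketch below is a simplified, hypothetical load_mapping(), not the routine above: it reads "path | id" lines from any filehandle, drops the placeholder path filter and the 24-character id check, collects the keys in memory, and then writes them to the B-tree in sorted order. As in the incremental mode above, a key that is already in the file is set to "duplicated_crc" rather than to the new id.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl;
use Digest::MD5 qw(md5_base64);
use File::Temp qw(tempdir);

# Hypothetical helper: reads from $fh, writes to $dbfile, returns the number
# of distinct keys seen. Keys already in the DB (or repeated in the input)
# get the value "duplicated_crc".
sub load_mapping {
    my ($fh, $dbfile) = @_;
    my %batch;
    while (my $line = <$fh>) {
        chomp $line;
        my ($path, $id) = map { s/^\s+|\s+$//gr } split /\|/, $line, 2;
        next unless defined $id && length $path && length $id;   # skip malformed lines
        my $key = md5_base64($path);
        $batch{$key} = exists $batch{$key} ? 'duplicated_crc' : $id;
    }

    my %db;
    tie(%db, 'DB_File', $dbfile, O_RDWR | O_CREAT, 0666, $DB_BTREE)
        or die "cannot open $dbfile: $!";
    # Sorted keys: consecutive inserts go to the same or neighbouring leaf pages.
    for my $key (sort keys %batch) {
        $db{$key} = exists $db{$key} ? 'duplicated_crc' : $batch{$key};
    }
    untie %db;
    return scalar keys %batch;
}

# Usage with an in-memory input file (hypothetical paths and ids).
my $dir   = tempdir(CLEANUP => 1);
my $input = "/data/aaaa/file1 | ID-0001\n/data/bbbb/file2 | ID-0002\nnot a mapping line\n";
open(my $in, '<', \$input) or die "cannot open in-memory input: $!";
my $count = load_mapping($in, "$dir/mapping.db");
close($in);
print "loaded $count keys\n";        # loaded 2 keys
```

Collecting everything in %batch first trades memory for write locality; with tens of millions of lines that hash can get large.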

Once the mapping is stored in Berkeley DB, a lookup only needs the MD5 key of the path:

Searching the mapping

  $mode   = "rdonly";                # lookups only read the database
  $s_time = time();
  if ( ! -f $lockfile ) {            # $lockfile is set elsewhere; no lookup while it exists
    ($err) = db_open($mode);
    if ( ! $err ) {
      $path_crc = md5_base64($path);
      if ( exists($idof{$path_crc}) ) {
        $id = $idof{$path_crc};
        if ( length($id) >= 24 and $id ne "duplicated_crc" ) {
          printf("path:'%s' id:'%s'\n", $path, $id) if $debug;
        }
      }
      db_close();                    # only close what was actually opened
    }
  }
  $e_time   = time();
  $time_esp = $e_time - $s_time;
  printf("time elapsed:%d\n", $time_esp) if $debug;
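Wrapped in a function, the lookup is easier to reuse. This sketch uses a hypothetical lookup_id() that takes the file name instead of relying on the global %idof and db_open(); it opens the B-tree read-only, hashes the path the same way the loader does, and treats a "duplicated_crc" value as no answer. The lock-file check and the 24-character id test are left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl;
use Digest::MD5 qw(md5_base64);
use File::Temp qw(tempdir);

# Hypothetical helper: look up $path in the B-tree file $dbfile.
# Returns undef when the key is absent or marked "duplicated_crc".
sub lookup_id {
    my ($dbfile, $path) = @_;
    my %db;
    tie(%db, 'DB_File', $dbfile, O_RDONLY, 0666, $DB_BTREE)
        or die "cannot open $dbfile read-only: $!";
    my $id = $db{ md5_base64($path) };
    untie %db;
    return (defined $id && $id ne 'duplicated_crc') ? $id : undef;
}

# Build a tiny database to query (hypothetical paths and ids).
my $dir    = tempdir(CLEANUP => 1);
my $dbfile = "$dir/mapping.db";
my %setup;
tie(%setup, 'DB_File', $dbfile, O_RDWR | O_CREAT, 0666, $DB_BTREE)
    or die "cannot create $dbfile: $!";
$setup{ md5_base64('/data/aaaa/file1') } = 'ID-0001';
$setup{ md5_base64('/data/aaaa/dup') }   = 'duplicated_crc';
untie %setup;

for my $path ('/data/aaaa/file1', '/data/aaaa/dup', '/data/aaaa/missing') {
    my $id = lookup_id($dbfile, $path);
    printf "%s -> %s\n", $path, defined $id ? $id : 'not found';
}
```

Opening and closing the file on every call keeps the sketch simple; for many lookups in a row it is cheaper to keep the tie open and query the hash directly.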