ocehb wrote 2011-03-15 01:58 pm

Comparing the speed of a regular expression and a hash lookup

The hash lookup is done inside the pattern with the (??{ code }) construct; see perldoc perlre.



# for i in {1..10}; do print "Size: $[ 2**$i ]";
  perl -MBenchmark=timethese -e '$N = int ($ARGV[0]/2);
    @a = map { int rand(1024000) } 0..$ARGV[0];
    @hash{@a} = (1)x($#a+1);
    $re = join "|",keys %hash;
    timethese ($ARGV[1],
      { A => sub { return $a[$N] =~ m#^(\w+)(??{ exists $hash{$1} ? "" : "N/A" })$# },
        B => sub { return $a[$N] =~ m#^$re$# } } )' $[ 2**$i ] 1000000; done
Size: 2
Benchmark: timing 1000000 iterations of A, B...
         A:  3 wallclock secs ( 4.53 usr +  0.02 sys =  4.55 CPU) @ 219780.22/s (n=1000000)
         B:  0 wallclock secs ( 0.63 usr +  0.00 sys =  0.63 CPU) @ 1587301.59/s (n=1000000)
Size: 4
Benchmark: timing 1000000 iterations of A, B...
         A:  5 wallclock secs ( 4.54 usr +  0.00 sys =  4.54 CPU) @ 220264.32/s (n=1000000)
         B:  2 wallclock secs ( 1.10 usr +  0.00 sys =  1.10 CPU) @ 909090.91/s (n=1000000)
Size: 8
Benchmark: timing 1000000 iterations of A, B...
         A:  5 wallclock secs ( 4.49 usr +  0.02 sys =  4.51 CPU) @ 221729.49/s (n=1000000)
         B:  2 wallclock secs ( 1.16 usr +  0.02 sys =  1.18 CPU) @ 847457.63/s (n=1000000)
Size: 16
Benchmark: timing 1000000 iterations of A, B...
         A:  5 wallclock secs ( 4.86 usr +  0.01 sys =  4.87 CPU) @ 205338.81/s (n=1000000)
         B:  2 wallclock secs ( 1.24 usr +  0.00 sys =  1.24 CPU) @ 806451.61/s (n=1000000)
Size: 32
Benchmark: timing 1000000 iterations of A, B...
         A:  5 wallclock secs ( 4.53 usr +  0.01 sys =  4.54 CPU) @ 220264.32/s (n=1000000)
         B:  2 wallclock secs ( 1.44 usr +  0.00 sys =  1.44 CPU) @ 694444.44/s (n=1000000)
Size: 64
Benchmark: timing 1000000 iterations of A, B...
         A:  5 wallclock secs ( 4.53 usr + -0.01 sys =  4.52 CPU) @ 221238.94/s (n=1000000)
         B:  1 wallclock secs ( 1.76 usr +  0.01 sys =  1.77 CPU) @ 564971.75/s (n=1000000)
Size: 128
Benchmark: timing 1000000 iterations of A, B...
         A:  5 wallclock secs ( 4.45 usr +  0.03 sys =  4.48 CPU) @ 223214.29/s (n=1000000)
         B:  3 wallclock secs ( 2.45 usr +  0.00 sys =  2.45 CPU) @ 408163.27/s (n=1000000)
Size: 256
Benchmark: timing 1000000 iterations of A, B...
         A:  5 wallclock secs ( 4.54 usr +  0.01 sys =  4.55 CPU) @ 219780.22/s (n=1000000)
         B:  5 wallclock secs ( 3.88 usr +  0.02 sys =  3.90 CPU) @ 256410.26/s (n=1000000)
Size: 512
Benchmark: timing 1000000 iterations of A, B...
         A:  5 wallclock secs ( 4.50 usr +  0.02 sys =  4.52 CPU) @ 221238.94/s (n=1000000)
         B:  7 wallclock secs ( 6.64 usr +  0.01 sys =  6.65 CPU) @ 150375.94/s (n=1000000)
Size: 1024
Benchmark: timing 1000000 iterations of A, B...
         A:  5 wallclock secs ( 4.56 usr + -0.01 sys =  4.55 CPU) @ 219780.22/s (n=1000000)
         B: 13 wallclock secs (12.27 usr +  0.04 sys = 12.31 CPU) @ 81234.77/s (n=1000000)




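Summary: on large arrays a hash lookup is the better choice. The hash-based match (A) stays at roughly 220,000/s at every size, while the alternation regex (B) falls from about 1.6 million/s at 2 elements to about 81,000/s at 1024; the crossover lies between 256 and 512 elements.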

Update: with the /o modifier the regular expression beats the hash, but only up to a certain size (somewhere between 16384 and 32768 elements in the run below).

Size: 16384
Benchmark: timing 1000000 iterations of A, B...
         A:  5 wallclock secs ( 4.58 usr + -0.01 sys =  4.57 CPU) @ 218818.38/s (n=1000000)
         B:  0 wallclock secs ( 0.90 usr +  0.02 sys =  0.92 CPU) @ 1086956.52/s (n=1000000)
Size: 32768
Benchmark: timing 1000000 iterations of A, B...
         A:  5 wallclock secs ( 4.55 usr +  0.00 sys =  4.55 CPU) @ 219780.22/s (n=1000000)
         B: 239 wallclock secs (237.26 usr +  0.37 sys = 237.63 CPU) @ 4208.22/s (n=1000000)
