Author: zard1989 (St. Kevin)
Board: Perl
Title: Re: [Question] Comparing strings in a file
Time: Sat Jun 5 18:58:57 2010
※ Quoting cp3cp3 (侵掠如火、不动如山):
: Suppose I have a file with data in this format
: AB EF CCA,XDE,PPC ACE,DDE
: AC DG ACE ACE,DDE,CCA
: DC AS CCA,XDE,PPC,FDS,JKL CCA,XDE,PPC,FDS
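: The item in column 1 corresponds to column 3
: The item in column 2 corresponds to column 4
: The number of items in columns 3 and 4 is not fixed, but there is always at least one
: I want to use the information in columns 3 and 4 to compute the sizes of the intersection and union for each record.
: What would be a good way to write this?
: Thanks!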
Below is how I would write it; I'm not sure whether I misunderstood the question @_@
======================Code=========================
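#!/usr/bin/env perl
use strict;
use warnings;

# Map each name (columns 1 and 2) to its item list (columns 3 and 4).
my $path = "testfile.txt";
open my $filehandle, '<', $path or die "Cannot open $path: $!";
my %records;
while (my $line = <$filehandle>) {
    my @cols = split ' ', $line;
    next unless @cols >= 4;    # skip blank or malformed lines
    $records{$cols[0]} = [split /,/, $cols[2]];
    $records{$cols[1]} = [split /,/, $cols[3]];
}
close $filehandle;

generate_data(%records);

# Print the union and intersection sizes for every pair of names.
sub generate_data {
    my %records = @_;
    my @names = sort keys %records;
    print "Record A\tRecord B\tUnion\tIntersection\n";
    for my $first (0 .. $#names - 1) {
        for my $second ($first + 1 .. $#names) {
            my $firstname  = $names[$first];
            my $secondname = $names[$second];
            # Count in how many of the two lists each item appears
            # (assumes no item is repeated within a single list).
            my %count;
            $count{$_}++ for @{ $records{$firstname} }, @{ $records{$secondname} };
            printf "%8s\t%8s\t%5d\t%12d\n",
                $firstname, $secondname,
                scalar(keys %count),                          # union
                scalar(grep { $count{$_} > 1 } keys %count);  # intersection
        }
    }
}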
==================Sample Output=========================
Record A  Record B  Union  Intersection
AB        AC            4             0
AB        AS            4             3
AB        DC            5             3
AB        DG            5             1
AB        EF            5             0
AC        AS            5             0
AC        DC            6             0
AC        DG            3             1
AC        EF            2             1
AS        DC            5             4
AS        DG            6             1
AS        EF            6             0
DC        DG            7             1
DC        EF            7             0
DG        EF            3             2
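The code above compares every pair of names across the whole file. If the question was instead about comparing column 3 with column 4 within each line, a minimal sketch could look like the following. The helper name `union_and_intersection` is made up for illustration, and the three rows are copied from the question rather than read from a file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): given two array refs,
# return (\@union, \@intersection), ignoring duplicates inside either list.
sub union_and_intersection {
    my ($left, $right) = @_;
    my (%in_left, %seen, %done, @union, @intersection);
    $in_left{$_} = 1 for @$left;
    # Union: every distinct item, in order of first appearance.
    for my $item (@$left, @$right) {
        push @union, $item unless $seen{$item}++;
    }
    # Intersection: distinct items of the right list that also occur on the left.
    for my $item (@$right) {
        push @intersection, $item if $in_left{$item} && !$done{$item}++;
    }
    return (\@union, \@intersection);
}

# Sample rows copied from the question.
my @lines = (
    "AB EF CCA,XDE,PPC ACE,DDE",
    "AC DG ACE ACE,DDE,CCA",
    "DC AS CCA,XDE,PPC,FDS,JKL CCA,XDE,PPC,FDS",
);

for my $line (@lines) {
    my ($first, $second, $third, $fourth) = split ' ', $line;
    my ($union, $inter) = union_and_intersection(
        [split /,/, $third], [split /,/, $fourth]);
    printf "%s %s union=%d intersection=%d\n",
        $first, $second, scalar(@$union), scalar(@$inter);
}
```

For the three sample rows this gives union/intersection sizes of 5/0, 3/1 and 5/4, which agree with the AB-EF, AC-DG and AS-DC rows of the pairwise table.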
--
※ Posted from: 批踢踢实业坊 (ptt.cc)
◆ From: 140.112.214.6
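1F: Push cp3cp3: Thanks 06/07 01:16
2F: → cp3cp3: What if I want to print which items are in the union and which are in the intersection? 06/07 01:19
3F: → zard1989: Just remove the scalar in front of keys %count 06/07 10:37
4F: Push cp3cp3: With scalar removed no strings come out; everything turns into the number 0 06/07 12:30
5F: → zard1989: See the next post 06/07 17:13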
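
On the zeros reported in 4F: with `scalar` removed, `keys %count` hands item names such as "CCA" to the `%5d` and `%12d` slots of `printf`, and a non-numeric string formatted with `%d` comes out as 0. One way to print the members themselves is to join each list and use `%s`. This is only a sketch and not necessarily what the follow-up post does; the helper name `union_and_intersection_members` is made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the union and intersection of two array refs
# as two sorted array refs.
sub union_and_intersection_members {
    my ($left, $right) = @_;
    my %count;
    # De-duplicate each list first so a repeat inside one list
    # is not mistaken for a shared item.
    for my $list ($left, $right) {
        my %uniq = map { $_ => 1 } @$list;
        $count{$_}++ for keys %uniq;
    }
    my @union        = sort keys %count;
    my @intersection = grep { $count{$_} > 1 } @union;
    return (\@union, \@intersection);
}

# Item lists for AB and DG, taken from the sample data in the question.
my ($union, $inter) = union_and_intersection_members(
    [qw(CCA XDE PPC)], [qw(ACE DDE CCA)]);

# %s with join, instead of %d, so the names are printed rather than counts.
printf "%s\t%s\t%s\t%s\n", 'AB', 'DG',
    join(',', @$union),
    (@$inter ? join(',', @$inter) : '-');
```

Here the union column reads ACE,CCA,DDE,PPC,XDE and the intersection column reads CCA, matching the counts 5 and 1 in the AB-DG row of the sample output.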