For now, let's just compute the average search times.
BM25
>>> 330.33 / 4512
0.07321143617021277
edit
>>> 7822.05/ 882
8.868537414965987
jw
>>> 4115.67 / 1237
3.327138237671787
jw
>>> 5930.88 / 235
25.237787234042553
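The four divisions above can be collected in one place so the methods are easier to compare. This is only a sketch using the totals and counts quoted above. It assumes each divisor is the number of searches that were run, and "jw-2" is a placeholder name for the second block labelled jw, since the notes do not say how it differs from the first.

```python
# Total search time [sec] and number of searches, copied from the notes above.
# "jw-2" is a placeholder name for the second block labelled jw.
totals = {
    "BM25": (330.33, 4512),
    "edit": (7822.05, 882),
    "jw": (4115.67, 1237),
    "jw-2": (5930.88, 235),
}


def mean_seconds(total, count):
    """Average search time per search, in seconds."""
    return total / count


averages = {name: mean_seconds(t, n) for name, (t, n) in totals.items()}

for name, avg in averages.items():
    print(f"{name:5s} {avg:8.3f} sec/search")
```

Seen side by side, BM25 stays well under 0.1 seconds per search, while the other three take seconds to tens of seconds.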
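This problem has to be dealt with!
If the number of results is lowered, do the methods run at comparable speed?
Let's run a script that retrieves only 10 results.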
BM25
cat time.csv | sed 's/elapsed_time://g' | sed 's/\[sec\]//g' | awk '{a+=$1} END {print a }'
29.8979
edit
cat time.csv | sed 's/elapsed_time://g' | sed 's/\[sec\]//g' | awk '{a+=$1} END {print a }'
7547.05
cat time.csv | sed 's/elapsed_time://g' | sed 's/\[sec\]//g' | awk '{a+=$1} END {print a }'
18.1789
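The same sum can be taken in python3 instead of the sed/awk pipeline. A minimal sketch, assuming each line of time.csv looks like `elapsed_time:0.0123[sec]` (inferred from the sed expressions above, not checked against the real file); the function name is made up for illustration.

```python
import re

# Assumed line format: "elapsed_time:<seconds>[sec]"
PATTERN = re.compile(
    r"elapsed_time:\s*([0-9]+(?:\.[0-9]*)?(?:[eE][-+]?[0-9]+)?)\s*\[sec\]"
)


def total_elapsed(lines):
    """Sum the seconds over all lines that match the assumed format."""
    total = 0.0
    for line in lines:
        match = PATTERN.search(line)
        if match:
            total += float(match.group(1))
    return total


# Small made-up sample, not real measurements.
sample = ["elapsed_time:0.5[sec]", "elapsed_time:1.25[sec]", "not a timing line"]
print(total_elapsed(sample))
```

For the real data, pass an open file: `with open("time.csv") as f: print(total_elapsed(f))`. Lines that do not match are skipped rather than counted as 0, which gives the same sum as the awk version.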
How many results?
# edit
for i in ./search_result/* ; do wc $i ;done | awk '{a+=$1} END {print a}'
10268
# BM25
# (run in ../HD/false_negative_50_jar/false_negative)
for i in ./search_result/* ; do wc $i ;done | awk '{a+=$1} END {print a}'
49599
Overall, how many results come back per search?
for i in ./search_result/* ; do wc $i ;done | awk '{if ($1 > 2) a+=$1} END {print a}'
2012
For edit, very few searches return two or more results.
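The wc/awk counting can also be done in one pass in python3, which makes it easy to see how many files pass the filter as well as how many lines they hold. A sketch only: the helper names are made up, and the threshold of 2 mirrors the `$1 > 2` filter used above.

```python
from pathlib import Path


def line_counts(directory):
    """Map each file name in `directory` to its number of lines."""
    counts = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open(errors="replace") as f:
                counts[path.name] = sum(1 for _ in f)
    return counts


def summarize(counts, threshold=2):
    """Return (total lines, files above threshold, lines in those files)."""
    above = {name: n for name, n in counts.items() if n > threshold}
    return sum(counts.values()), len(above), sum(above.values())


# Only runs where the result directory from the notes exists.
if Path("./search_result").is_dir():
    print(summarize(line_counts("./search_result")))
```

One difference from wc: wc counts newline characters, so a last line without a trailing newline is counted here but not by wc.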
For the searches that returned only one result, is that result the same as the birthmark used for the search?
cat onceResult.csv| while read file ; do cat $file | awk -v FS=',' '{for( i = 4; i <= NF ; i++) printf("%s,", $i)} END {printf("-")}'| (IFS='-' read a b ; diff <(echo $a) <(echo $b | sed 's/-//g') > /dev/null && echo OK);done > onceResultDiff.csv
0 matches
-rw-r--r-- 1 mituba staff 0 6 16 03:32 onceResultDiff.csv
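Theory: the similarity computed on encode_data and the similarity computed on data are out of sync. Really? No, that turned out not to be the case.
# do it in python3
Everything was OK.
The ones with two or more results were all OK too.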
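The python3 script itself is not in these notes, so the following is only a guess at what such a check could look like. It assumes that the first line of each result file is the search birthmark, that the remaining lines are results, and that the birthmark is every comma-separated field from the 4th on (the same fields the awk loop `i = 4; i <= NF` picks out). The function names are made up.

```python
def birthmark(row):
    """Birthmark part of a row: every comma-separated field from the 4th on."""
    return row.rstrip("\n").split(",")[3:]


def results_match_query(lines):
    """True if every result row has the same birthmark as the query row.

    Assumes the first non-empty line is the query and the rest are results.
    """
    rows = [line for line in lines if line.strip()]
    if len(rows) < 2:
        return False  # nothing to compare against
    query = birthmark(rows[0])
    return all(birthmark(row) == query for row in rows[1:])


# Made-up rows in the same column layout as the result files.
example = [
    "pkg.Query,jar:file:/some/path.jar!/pkg/Query.class,kgram,25 183,183 177",
    "pkg.Hit,1.0,kgram,25 183,183 177",
]
print(results_match_query(example))
```

For a real file: `with open(path) as f: ok = results_match_query(f)`. Comparing field lists instead of diffing concatenated strings avoids depending on a separator character such as `-`.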
For now, let's pull one out and look at it.
org.jvnet.hyperjaxb3.ejb.schemas.customizations.Version,jar:file:/Volumes/HD/false_negative_50_jar/birthmark_server/data/jar/hyperjaxb3-ejb-schemas-customizations-0.5.6.jar!/org/jvnet/hyperjaxb3/ejb/schemas/customizations/Version.class,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.ElementCollection,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.Embeddable,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.EmbeddedId,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.Entity,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.Id,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.ManyToMany,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.ManyToOne,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.MappedSuperclass,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.OneToMany,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.OneToOne,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
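All of them are classes in the org.jvnet.hyperjaxb3.ejb.schemas.customizations package. Every similarity is 1.0.
I tried searching with a birthmark that is not in the search engine, and the search returned 0 results.
Instead of :, let's use data:*
Search time?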
cat xmlunit_time.csv | sed 's/elapsed_time://g' | sed 's/\[sec\]//g' | awk '{num+=$1} END {print num}'
56.1769
10 results come back.
for i in ./search_result/* ; do wc $i ;done | awk '{if ($1 > 10) print $4}' | wc -l
44
for i in ./search_result/* ; do wc $i ;done | awk '{if ($1 < 9) print $4}' | wc -l
0
Faster than :, so there is a chance this works.