For now, let's just compute the average search time.

BM25

>>> 330.33 / 4512
0.07321143617021277

edit

>>> 7822.05/ 882
8.868537414965987

jw

>>> 4115.67 / 1237
3.327138237671787

jw

>>> 5930.88 / 235
25.237787234042553
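The four divisions above can be done in one pass. This is a minimal sketch, assuming the totals are seconds and the divisors are the number of queries in each run (neither is stated in the log). The label `jw (2nd)` is made up here only to tell the two jw runs apart.

```python
# (label, total elapsed time, number of queries), copied from the REPL lines above.
# Assumption: totals are in seconds and the divisor is a query count.
runs = [
    ("BM25", 330.33, 4512),
    ("edit", 7822.05, 882),
    ("jw", 4115.67, 1237),
    ("jw (2nd)", 5930.88, 235),  # hypothetical label for the second jw run
]

def mean_seconds(total, count):
    """Average time per query."""
    return total / count

# Use the first run (BM25) as the baseline for the slowdown factor.
baseline = mean_seconds(runs[0][1], runs[0][2])

for name, total, count in runs:
    avg = mean_seconds(total, count)
    print(f"{name:10s} {avg:8.3f} s/query  ({avg / baseline:7.1f}x BM25)")
```

Printing the ratio to BM25 next to each average makes the gap between the methods easier to see than the raw numbers alone.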

Do something about this problem!

If we lower the number of results, do they run at comparable speed?

Let's run a script that only retrieves 10 results.

BM25

cat time.csv | sed 's/elapsed_time://g' | sed 's/\[sec\]//g' | awk '{a+=$1} END {print a }'
29.8979

edit

cat time.csv | sed 's/elapsed_time://g' | sed 's/\[sec\]//g' | awk '{a+=$1} END {print a }'
7547.05


cat time.csv | sed 's/elapsed_time://g' | sed 's/\[sec\]//g' | awk '{a+=$1} END {print a }'
18.1789
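The sed/awk pipeline can also be written in Python, which additionally gives the number of timing lines for free. This is a rough sketch, assuming each line of time.csv looks like `elapsed_time:0.0123[sec]` (inferred from the sed expressions, not confirmed). The sample lines below are made up.

```python
import re

# Assumed line format, inferred from the sed expressions: "elapsed_time:<number>[sec]"
TIMING = re.compile(r"elapsed_time:\s*([0-9.eE+-]+)\s*\[sec\]")

def sum_elapsed(lines):
    """Return (total seconds, number of timing lines) for an iterable of lines."""
    total = 0.0
    count = 0
    for line in lines:
        match = TIMING.search(line)
        if match:  # lines without a timing entry are skipped
            total += float(match.group(1))
            count += 1
    return total, count

# Made-up sample lines, not taken from the real time.csv.
sample = ["elapsed_time:0.5[sec]", "elapsed_time:1.25[sec]", "unrelated line"]
total, count = sum_elapsed(sample)
print(total, count, total / count)
```

For the real file, something like `sum_elapsed(open("time.csv"))` should work under the same format assumption.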

How many results?

# edit
for i in ./search_result/* ; do wc $i ;done | awk '{a+=$1} END {print a}'
10268

# BM25
for i in ./search_result/* ; do wc $i ;done | awk '{a+=$1} END {print a}'
49599

Overall, at least how many results does each query return?

for i in ./search_result/* ; do wc $i ;done | awk '{if ($1 > 2) a+=$1} END {print a}'
2012
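The `wc` loops can be sketched in Python as below, assuming one result file per query under a directory such as ./search_result. The helper names and the demo files are made up for illustration. Note that this counts lines, while `wc` counts newline characters, so a file whose last line has no trailing newline is counted one higher here.

```python
import tempfile
from pathlib import Path

def line_counts(directory):
    """Return {file name: number of lines} for every regular file in directory."""
    counts = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open(errors="replace") as f:
                counts[path.name] = sum(1 for _ in f)
    return counts

def total_lines_over(counts, threshold):
    """Sum the line counts of files with strictly more than `threshold` lines."""
    return sum(n for n in counts.values() if n > threshold)

# Demo on a throwaway directory with made-up files (not the real search_result data).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "q1.csv").write_text("query\nhit\n")            # 2 lines
    Path(tmp, "q2.csv").write_text("query\nhit\nhit\nhit\n")  # 4 lines
    demo_counts = line_counts(tmp)

print(sum(demo_counts.values()))         # all lines, like the plain awk sum
print(total_lines_over(demo_counts, 2))  # only files with more than 2 lines
```

Keeping the per-file counts in a dictionary also makes it easy to list which queries fall below or above a given threshold.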

For edit, very few queries return two or more results.

For the queries that returned only one result, is that result the same as the query birthmark?

 cat onceResult.csv| while read file ; do cat $file | awk -v FS=',' '{for( i = 4; i <= NF ; i++) printf("%s,", $i)} END {printf("-")}'| (IFS='-' read a b ; diff <(echo $a) <(echo $b | sed 's/-//g') > /dev/null && echo OK);done > onceResultDiff.csv
0 matches
-rw-r--r--      1 mituba  staff       0  6 16 03:32 onceResultDiff.csv

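Theory: the similarity for encode_data and the similarity for data are out of sync. Really? No, that was not it.

# Did it in python3
All of them were OK.

The ones with two or more results were all OK too.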
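The python3 check itself is not recorded here. A minimal sketch of what such a check might look like, assuming each result file has the query on its first line and one hit per following line, and that the birthmark elements are everything after the third comma (the same fields the awk command prints from `$4` on). The function names and sample lines are hypothetical.

```python
def birthmark_elements(line):
    """Return everything after the third comma, i.e. the birthmark elements."""
    parts = line.rstrip("\n").split(",", 3)
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 comma-separated fields: {line!r}")
    return parts[3]

def results_match_query(lines):
    """True if every hit line carries exactly the same elements as the query line.

    Assumption: lines[0] is the query and lines[1:] are the hits.
    A file with no hits is reported as True.
    """
    query = birthmark_elements(lines[0])
    return all(birthmark_elements(hit) == query for hit in lines[1:])

# Made-up, shortened lines in the same shape as the result files.
query = "pkg.A,jar:file:/x.jar!/pkg/A.class,kgram,25 183,183 177"
same = "pkg.B,1.0,kgram,25 183,183 177"
different = "pkg.C,0.5,kgram,25 183,183 4"

print(results_match_query([query, same]))
print(results_match_query([query, same, different]))
```

Splitting with a maximum of three splits keeps the commas inside the element list intact, so the comparison is a plain string equality.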

For now, let's pull out one of them and take a look.

org.jvnet.hyperjaxb3.ejb.schemas.customizations.Version,jar:file:/Volumes/HD/false_negative_50_jar/birthmark_server/data/jar/hyperjaxb3-ejb-schemas-customizations-0.5.6.jar!/org/jvnet/hyperjaxb3/ejb/schemas/customizations/Version.class,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.ElementCollection,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.Embeddable,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.EmbeddedId,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.Entity,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.Id,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.ManyToMany,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.ManyToOne,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.MappedSuperclass,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.OneToMany,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25
org.jvnet.hyperjaxb3.ejb.schemas.customizations.OneToOne,1.0,kgram,25 183,183 177,177 25,25 180,180 199,199 4,4 172,172 25,180 182,182 172,25 25,25 181,181 177,25 193,193 154,154 3,3 172,25 166,166 4,183 154,25 192,192 58,58 25,180 198,198 25,25 182,182 167,167 3,3 54,54 25,25 18,18 21,21 184,184 25,184 21,21 21,21 185,185 154,172 4,172 178,178 58,25 1,1 1,1 25,183 54,185 54,54 21,21 172,182 182,182 176,176 178,176 25,25 199,199 25,167 25,25 58,183 87,87 25,193 153,153 25,25 21,184 182,1 181,181 25,25 176,176 187,89 183,183 176,182 177,183 25

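All of them are classes in the org.jvnet.hyperjaxb3.ejb.schemas.customizations package. Every similarity is 1.0.

I tried searching with a birthmark that is not in the search engine, but got 0 results.

Let's use data:* instead of :

Search time?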

cat xmlunit_time.csv | sed 's/elapsed_time://g' | sed 's/\[sec\]//g' | awk '{num+=$1} END {print num}'
56.1769

10 results are returned.

for i in ./search_result/* ; do wc $i ;done | awk '{if ($1 > 10) print $4}' | wc -l
      44
for i in ./search_result/* ; do wc $i ;done | awk '{if ($1 < 9) print $4}' | wc -l
       0

It is faster than :, so there may be a chance.