Term Similarity Rank
text_file 0.707 32473
HTML_file 0.674 97415
HTML 0.661 6994
UTF-8 0.650 57904
parse 0.638 10537
HTML_code 0.631 77284
XML_file 0.627 43784
Word_document 0.627 92243
XML_document 0.623 95807
HTML_page 0.623 90337
XML 0.622 9868
Ascii 0.614 50769
XHTML 0.611 66531
PDFs 0.609 82936
text 0.608 502
Unicode 0.599 42490
URLs 0.598 60393
JSON 0.595 32713
PDF_file 0.594 21298
html 0.588 24819
output_format 0.588 92955
file_format 0.587 22357
stylesheet 0.586 58283
filename 0.585 16718
file_type 0.583 44528