International Journal of Advances in Computer Science and Its Applications
Author(s) : JAGADISH DHARANIKOTA, SUNEETA AGARWAL
Growing data sizes and limited network bandwidth make it necessary to compress data files. Compression saves storage and allows data to be transferred faster over the network. Pattern matching on compressed files is a requirement of information retrieval applications. Files compressed with (s,c)-dense code can be searched directly, without first decompressing them, which significantly reduces search time. In this paper we propose an approach for phrase matching in compressed files that modifies standard string matching algorithms such as the Horspool and Sunday algorithms. Such phrase matching can be used by search engines to retrieve the documents relevant to a given query. Pattern matching on (s,c)-dense code compressed files has several advantages, along with better compression ratios than other standard compression algorithms. Searching text in the compressed file is up to 8 times faster than searching the uncompressed file. We propose a new technique for phrase searching in (s,c)-dense code files that applies frequency-based codeword matching using suitably modified standard algorithms. We show that the proposed technique is faster than straightforward techniques.
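As an illustration of the general idea, and not the authors' exact algorithm, the following minimal Python sketch encodes words with a byte-oriented (s,c)-dense code and then searches for a phrase directly in the compressed bytes using an unmodified Horspool scan plus a codeword-boundary check. The split s = 200, c = 56 and the names encode_rank, build_codebook and horspool_phrase_search are assumptions made for this example. The sketch relies on the defining property of the code: every codeword is a run of continuer bytes (values in [s, s + c)) terminated by one stopper byte (a value in [0, s)), so a byte-level match is a true phrase match only if the byte just before it is a stopper or the match starts at offset 0.

```python
from collections import Counter

S, C = 200, 56  # assumed split for this sketch; s + c = 256 for byte codes


def encode_rank(rank, s=S, c=C):
    """Return the (s,c)-dense codeword for a 0-based frequency rank."""
    out = [rank % s]               # final byte is a stopper in [0, s)
    x = rank // s
    while x > 0:
        x -= 1
        out.append(s + x % c)      # continuer byte in [s, s + c)
        x //= c
    return bytes(reversed(out))    # continuers first, stopper last


def build_codebook(words):
    """Assign shorter codewords to more frequent words (ties broken by word)."""
    counts = Counter(words)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: encode_rank(i) for i, w in enumerate(ranked)}


def horspool_phrase_search(data, pattern, s=S):
    """Yield byte offsets where pattern occurs on a codeword boundary."""
    m, n = len(pattern), len(data)
    if m == 0 or m > n:
        return
    # Horspool bad-character table: distance from the last occurrence of
    # each byte (excluding the final pattern byte) to the pattern's end.
    shift = {b: m - 1 - i for i, b in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        window_matches = data[pos:pos + m] == pattern
        # Reject matches that begin in the middle of another codeword.
        aligned = pos == 0 or data[pos - 1] < s
        if window_matches and aligned:
            yield pos
        pos += shift.get(data[pos + m - 1], m)


words = "to be or not to be that is the question".split()
codebook = build_codebook(words)
compressed = b"".join(codebook[w] for w in words)
phrase = codebook["to"] + codebook["be"]
hits = list(horspool_phrase_search(compressed, phrase))
print(hits)  # [0, 4]
```

The reported offsets are byte positions in the compressed stream. With this tiny vocabulary every codeword is a single byte, so they coincide with word positions. The boundary check only matters once the vocabulary exceeds s words and multi-byte codewords appear.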