文件读写,二分查找
学习当中遇到的问题,问了好多同学都不能给个完整的解决,有的同学问是什么类型文件,当是文本文件吧,要是二进制的应该做不出来吧。求高手帮忙。。
题:设有一个已经排序的英文词典文件,每一个词条的格式为:词语/词性/例句
例如:book/n./This is a book.
试编写一个程序,根据二分查找的原理,在文件中查找指定单词的词条,并输出词条信息。
[解决办法]
假如不用到hash,并每个词条一行,即book/n./This is a book.是独立的一行
算法如下:
1 根据回车换行(\r\n)统计有多少行。
2 设统计结果有n行,对行数用二分法,即第一次比较输入的词条和n/2行的词条,。。。。
思路不复杂
[解决办法]
2种方法:
1. 把词语/词性/例句从文件读入内存,再用二分法查找。
或者
2. 仅把词语读入内存,并记录每一行的起始位置,用二分法在内存中查找单词,然后从文件读入整个词条。
如果词条不多的话,用以方法1,如果词条太多,无法全部读入内存可以用方法2.
[解决办法]
/**
* @file dict.c
* @brief
*/
struct dent {
char *word;
long pos;
};
struct dict {
unsigned size;
unsigned len;
struct dent *ents;
};
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int dict_add(struct dict *dict, const char *word, long pos)
{
unsigned len;
struct dent *ents;
if (dict->len >= dict->size) {
dict->size *= 2;
dict->ents = (struct dent *)realloc(dict->ents,
dict->size * sizeof(struct dent));
}
len = dict->len++;
ents = dict->ents;
ents[len].word = strdup(word);
ents[len].pos = pos;
}
int dict_init(struct dict *dict)
{
dict->size = 512;
dict->len = 0;
dict->ents = (struct dent *)malloc(dict->size * sizeof(struct dent));
}
int dent_cmp(const void *v1, const void *v2)
{
const struct dent *e1, *e2;
e1 = (const struct dent *)v1;
e2 = (const struct dent *)v2;
return strcmp(e1->word, e2->word);
}
struct dent *dict_find(const struct dict *dict, struct dent *dent)
{
struct dent *ptr;
ptr = (struct dent *)bsearch(dent, dict->ents, dict->len,
sizeof(struct dent), &dent_cmp);
if (NULL == ptr)
return NULL;
dent->pos = ptr->pos;
return dent;
}
int main(int argc, char *argv[])
{
struct dict dict;
char *prop;
char *stmt;
char *word;
FILE *fp;
long pos;
char line[1024];
struct dent dent;
int i, n;
dict_init(&dict);
pos = 0;
fp = fopen("dict.txt", "r");
while (fgets(line, sizeof(line), fp)) {
word = strtok(line, "/");
dict_add(&dict, word, pos);
pos = ftell(fp);
}
fclose(fp);
fp = fopen("dict.txt", "r");
dent.word = strdup("table");
if (NULL != dict_find(&dict, &dent)) {
fseek(fp, dent.pos, SEEK_SET);
fgets(line, sizeof(line), fp);
printf("%s", line);
} else {
fprintf(stderr, "%s: not found\n", dent.word);
}
free(dent.word);
dent.word = strdup("chair");
if (NULL != dict_find(&dict, &dent)) {
fseek(fp, dent.pos, SEEK_SET);
fgets(line, sizeof(line), fp);
printf("%s", line);
} else {
fprintf(stderr, "%s: not found\n", dent.word);
}
free(dent.word);
fclose(fp);
return 0;
}
plus a terminating NUL character. The usual skip of leading white
space is suppressed. The string is to be made up of characters in
(or not in) a particular set; the set is defined by the characters
between the open bracket [ character and a close bracket ] charac-
ter. The set excludes those characters if the first character
after the open bracket is a circumflex ^. To include a close
bracket in the set, make it the first character after the open
bracket or the circumflex; any other position will end the set.
The hyphen character - is also special; when placed between two
other characters, it adds all intervening characters to the set.
To include a hyphen, make it the last character before the final
close bracket. For instance, `[^]0-9-]' means the set ``everything
except close bracket, zero through nine, and hyphen''. The string
ends with the appearance of a character not in the (or, with a cir-
cumflex, in) set or when the field width runs out.