资料读写，二分查找

2013-01-06

文件读写，二分查找学习当中遇到的问题，问了好多同学都不能给个完整的解决，有的同学问是什么类型文件，当是

文件读写，二分查找
学习当中遇到的问题，问了好多同学都不能给个完整的解决，有的同学问是什么类型文件，当是文本文件吧，要是二进制的应该做不出来吧。求高手帮忙。。
题：设有一个已经排序的英文词典文件，每一个词条的格式为：词语/词性/例句
例如：book/n./This is a book.
试编写一个程序，根据二分查找的原理，在文件中查找指定单词的词条，并输出词条信息。
[解决办法]
假如不用到hash，并每个词条一行，即book/n./This is a book.是独立的一行

算法如下：
1 根据回车换行(\r\n)统计有多少行。
2 设统计结果有n行，对行数用二分法，即第一次比较输入的词条和n/2行的词条，。。。。

思路不复杂
[解决办法]
2种方法：
1. 把词语/词性/例句从文件读入内存，再用二分法查找。
或者
2. 仅把词语读入内存，并记录每一行的起始位置，用二分法在内存中查找单词，然后从文件读入整个词条。

如果词条不多的话，用以方法1，如果词条太多，无法全部读入内存可以用方法2.
[解决办法]

引用:

飞哥，首先谢谢你的思路。
但我有两个疑问：１．你要统计换行符，是要遍历整个文件吗？
这样的话，直接一行一行读入判断是不是想要的词条好像就可以了。
２．对行号用二分法，那是不是有办法直接从文件中取得第Ｎ行的词条？

有道理，不过题目要求用二分法。哈哈！！

在制作词典文件时，都会把这个行数统计出来以作后用的，不然对词条排序就是多次一举。实际中不需要查找的时候再去统计。

[解决办法]
下面是例子程序，楼主可以参考！
fb82:/home/mymtom/src/csdn$ cat dict.txt
book/n./This is a book.
chair/n./This is a chair.
goat/n./This is a goat.
table/n./This is a table.
zoo/n./This is a zoo.
fb82:/home/mymtom/src/csdn$ expand -4 dict.c


/**
 * @file    dict.c
 * @brief   
 */

struct dent {
    char *word;
    long pos;
};

struct dict {
    unsigned size;
    unsigned len;
    struct dent *ents;
};

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int dict_add(struct dict *dict, const char *word, long pos)
{
    unsigned len;
    struct dent *ents;
    if (dict->len >= dict->size) {
        dict->size *= 2;
        dict->ents = (struct dent *)realloc(dict->ents,
                dict->size * sizeof(struct dent));
    }
    len = dict->len++;
    ents = dict->ents;
    ents[len].word = strdup(word);
    ents[len].pos = pos;
}

int dict_init(struct dict *dict)
{
    dict->size = 512;
    dict->len = 0;
    dict->ents = (struct dent *)malloc(dict->size * sizeof(struct dent));
}

int dent_cmp(const void *v1, const void *v2)
{
    const struct dent *e1, *e2;
    e1 = (const struct dent *)v1;
    e2 = (const struct dent *)v2;
    return strcmp(e1->word, e2->word);
}

struct dent *dict_find(const struct dict *dict, struct dent *dent) 
 
{
    struct dent *ptr;
    ptr = (struct dent *)bsearch(dent, dict->ents, dict->len,
            sizeof(struct dent), &dent_cmp);
    if (NULL == ptr)
        return NULL;
    dent->pos = ptr->pos;
    return dent;
}

int main(int argc, char *argv[])
{
    struct dict dict;
    char *prop;
    char *stmt;
    char *word;
    FILE *fp;
    long pos;
    char line[1024];
    struct dent dent;
    int i, n;

    dict_init(&dict);
    pos = 0;
    fp = fopen("dict.txt", "r");
    while (fgets(line, sizeof(line), fp)) {
        word = strtok(line, "/");
        dict_add(&dict, word, pos);
        pos = ftell(fp);
    }
    fclose(fp);

    
    fp = fopen("dict.txt", "r");

    dent.word = strdup("table");
    if (NULL != dict_find(&dict, &dent)) {
        fseek(fp, dent.pos, SEEK_SET);
        fgets(line, sizeof(line), fp);
        printf("%s", line);
    } else {
        fprintf(stderr, "%s: not found\n", dent.word);
    }
    free(dent.word);

    dent.word = strdup("chair");
    if (NULL != dict_find(&dict, &dent)) {
        fseek(fp, dent.pos, SEEK_SET);
        fgets(line, sizeof(line), fp);
        printf("%s", line);
    } else {
        fprintf(stderr, "%s: not found\n", dent.word);
    }
    free(dent.word);

    fclose(fp);

    return 0;
}

fb82:/home/mymtom/src/csdn$ ./dict
table/n./This is a table.
chair/n./This is a chair.
[解决办法]
fscanf

     [     Matches a nonempty sequence of characters from the specified set of
           accepted characters; the next pointer must be a pointer to char,
           and there must be enough room for all the characters in the string,

           plus a terminating NUL character.  The usual skip of leading white
           space is suppressed.  The string is to be made up of characters in
           (or not in) a particular set; the set is defined by the characters
           between the open bracket [ character and a close bracket ] charac-
           ter.  The set excludes those characters if the first character
           after the open bracket is a circumflex ^.  To include a close
           bracket in the set, make it the first character after the open
           bracket or the circumflex; any other position will end the set.
           The hyphen character - is also special; when placed between two
           other characters, it adds all intervening characters to the set.
           To include a hyphen, make it the last character before the final
           close bracket.  For instance, `[^]0-9-]' means the set ``everything
           except close bracket, zero through nine, and hyphen''.  The string
           ends with the appearance of a character not in the (or, with a cir-
           cumflex, in) set or when the field width runs out.

热点排行

C语言

资料读写，二分查找