Finding and handling duplicate words in a txt file: solutions

2012-02-20

Finding and handling duplicate words in a txt file
There is a text file with the following contents:
======== separator ========

你好，很好
23 26003 测试你好
24 26666 视频网络的性能等等
25 26003 测试你好
26 10023 这个测试句子
很好：是的
27 55210 没有了

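======== separator ========

The goal is to find the duplicate entries in the file, for example:

23 26003 测试你好
25 26003 测试你好

and append "(重复)" (meaning "duplicate") to the repeated one, i.e.:
23 26003 测试你好
25 26003 测试你好(重复)

Note: these two lines are identical except for the leading sequence number.

The final file should look like this:
======== separator ========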

你好，很好
23 26003 测试你好
24 26666 视频网络的性能等等
25 26003 测试你好(重复)
26 10023 这个测试句子
很好：是的
27 55210 没有了

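======== separator ========

Please write code that implements the behavior described above. I have thought about this for a long time without finding a solution, so any help would be appreciated.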

[Solution]

C/C++ code

#include <cstdlib>
#include <fstream>
#include <string>
#include <set>
using namespace std;

// Copies pSrcFile to pDestFile, appending "(重复)" to every line whose text
// after the leading sequence number has already appeared on an earlier line.
bool checkFile(const char* pSrcFile, const char* pDestFile)
{
    ifstream inf(pSrcFile);
    ofstream outf(pDestFile);
    set<string> setLine;  // text after the sequence number, for lines seen so far
    if (!inf || !outf)
        return false;

    string strLine;
    while (getline(inf, strLine))
    {
        bool bDuplicate = false;
        string::size_type nStart = strLine.find(' ');  // first space
        if (nStart != string::npos)
        {
            // first non-space character after the sequence number
            string::size_type nEnd = strLine.find_first_not_of(' ', nStart);
            if (nEnd != string::npos)
            {
                // insert() reports through .second whether the key was new
                bDuplicate = !setLine.insert(strLine.substr(nEnd)).second;
            }
        }
        if (bDuplicate)
            strLine += "(重复)";
        outf << strLine << '\n';
    }
    outf.flush();
    return true;
}

int main(int argc, char *argv[])
{
    checkFile("c:\\test1.txt", "c:\\test2.txt");
    system("pause");  // Windows only: keeps the console window open
    return 0;
}
[Solution]
C/C++ code

#include <iostream>
#include <string>
#include <fstream>
#include <vector>
using namespace std;

int main()
{
    vector<string> ids;    // first token of each line (the sequence number)
    vector<string> rests;  // remainder of each line, including the leading space
    string filename;
    cout << "Input file's name:";
    cin >> filename;
    ifstream file(filename.c_str());
    if (!file.is_open())
    {
        cout << "fail to open this file!" << endl;
        return 1;
    }
    string str;
    while (file >> str)      // operator>> skips blank lines
    {
        ids.push_back(str);
        getline(file, str);  // rest of the current line
        rests.push_back(str);
    }
    // Compare only the remainders, so sequence numbers never count as duplicates.
    for (size_t i = 0; i < rests.size(); ++i)
    {
        // Lines without any text after the first token are never flagged.
        if (rests[i].find_first_not_of(' ') == string::npos)
            continue;
        for (size_t j = i + 1; j < rests.size(); ++j)
        {
            if (rests[i] == rests[j])
                rests[j] += "(重复)";
        }
    }
    for (size_t i = 0; i < ids.size(); ++i)
        cout << ids[i] << rests[i] << endl;
    return 0;
}
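[Solution]
This text format is not complicated, so you can parse it yourself.

Read a line;
extract the string that follows the sequence number;
compare it with the strings already stored in a vector (look it up with find, for example std::find); if it is a duplicate, append "(重复)" to the line, otherwise add it to the vector with push_back();
continue with the next line ...

For how to extract substrings and use find and related methods, see: www.cppreference.com/cppstring/index.html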
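The steps above can be sketched roughly as follows. This is only one possible reading of the suggestion, not the poster's own code: the names textAfterNumber and markDuplicates are hypothetical, the lines are assumed to be already loaded into a vector, and the "sequence number" is assumed to be everything before the first space.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns the text after the leading sequence number,
// or an empty string when the line contains no space-separated prefix.
std::string textAfterNumber(const std::string& line)
{
    std::string::size_type space = line.find(' ');
    if (space == std::string::npos)
        return "";
    std::string::size_type start = line.find_first_not_of(' ', space);
    if (start == std::string::npos)
        return "";
    return line.substr(start);
}

// Hypothetical function: returns a copy of the lines in which every line
// whose text after the number was already seen earlier gets "(重复)" appended.
std::vector<std::string> markDuplicates(const std::vector<std::string>& lines)
{
    std::vector<std::string> seen;    // texts encountered so far
    std::vector<std::string> result;
    for (std::size_t i = 0; i < lines.size(); ++i)
    {
        std::string key = textAfterNumber(lines[i]);
        std::string out = lines[i];
        if (!key.empty())
        {
            // Linear search with std::find, as the answer suggests.
            if (std::find(seen.begin(), seen.end(), key) != seen.end())
                out += "(重复)";
            else
                seen.push_back(key);
        }
        result.push_back(out);
    }
    return result;
}
```

To use it on a real file, the lines could be read with std::getline from an ifstream into a vector, passed to markDuplicates, and the result written to an ofstream. The linear search makes the whole pass quadratic in the number of lines, which is fine for small files; for large ones a std::set, as in the first answer, is probably the better choice.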
