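How to read Chinese text from a UTF-8 XML document
My XML is encoded as UTF-8 but contains Chinese characters. How should I handle this?
Parsing it directly does not work. If I first replace the UTF-8 in <?xml version="1.0" encoding="UTF-8"?> with GBK, I still get org.xml.sax.SAXParseException: An invalid XML character.
XML input: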
<?xml version="1.0" encoding="UTF-8"?><Files Domain="odpsfile" Path="2008"><File FileTempPath="6c08a588-c245-11dc-958a-d1128874cdde.doc" Index="1" Name="卫生防疫.doc" Title="新建"/></Files>
String input = glwj.replace("UTF-8", "GBK");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = dbf.newDocumentBuilder();
Document doc = documentBuilder.parse(new StringBufferInputStream(input));
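One likely cause, offered as a guess rather than a confirmed diagnosis: java.io.StringBufferInputStream is deprecated and keeps only the low eight bits of each char, so the Chinese characters are damaged before the parser ever sees them, whatever the encoding declaration says. A minimal sketch of an alternative, assuming the XML is already held in a Java String, is to give the parser a character stream through org.xml.sax.InputSource; with a character stream the parser does not need the declared encoding, so the declaration can stay as UTF-8. The class name XmlChineseSketch and the helpers parseXmlString and firstFileName are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XmlChineseSketch {

    // Hypothetical helper: parses XML held in a String by handing the parser
    // a character stream, so no char-to-byte conversion can damage the text.
    static Document parseXmlString(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = dbf.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped only to keep the sketch short; real code may prefer
            // to declare and handle these exceptions explicitly.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Hypothetical helper: returns the Name attribute of the first <File>
    // element, matching the structure of the sample input in the question.
    static String firstFileName(String xml) {
        Document doc = parseXmlString(xml);
        Element file = (Element) doc.getElementsByTagName("File").item(0);
        return file.getAttribute("Name");
    }
}
```

With the sample input above, XmlChineseSketch.firstFileName(glwj) would be called on the unmodified string, without the replace("UTF-8", "GBK") step. If the data arrives as raw UTF-8 bytes rather than a String, wrapping those bytes in a java.io.ByteArrayInputStream and passing that to parse should work as well, again with the declaration left unchanged.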