Truncating a Java string by byte count, where one Chinese character counts as 2 bytes
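When a cut might fall inside a Chinese character, the result must not contain garbage, which means a Chinese character must never be cut in half. For the string "我ABC汉字d", taking 5 bytes should give "我ABC", and taking 8 bytes should give "我ABC汉" rather than "我ABC汉?", where "?" stands for half a Chinese character. In other words, the cut rounds back to the previous character boundary.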
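To see where the cuts fall, it helps to list the byte offset at which each character ends. The sketch below assumes GBK (1 byte per ASCII character, 2 bytes per Chinese character, matching the title); the class name `ByteWidthDemo` and the method `endOffsets` are hypothetical names chosen for illustration. The string is written with Unicode escapes that spell "我ABC汉字d".

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class ByteWidthDemo {
    // Assumption: GBK, where ASCII is 1 byte and a Chinese character is 2 bytes
    static final Charset GBK = Charset.forName("GBK");

    // Returns, for each character of s, the byte offset at which it ends
    public static int[] endOffsets(String s) {
        int[] ends = new int[s.length()];
        int pos = 0;
        for (int i = 0; i < s.length(); i++) {
            pos += String.valueOf(s.charAt(i)).getBytes(GBK).length;
            ends[i] = pos;
        }
        return ends;
    }

    public static void main(String[] args) {
        // Prints [2, 3, 4, 5, 7, 9, 10]
        System.out.println(Arrays.toString(endOffsets("\u6211ABC\u6C49\u5B57d")));
    }
}
```

A cut at 5 bytes lands exactly on a boundary, while a cut at 8 bytes falls between offsets 7 and 9, so it has to round back to 7.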
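import java.nio.charset.Charset;

public class ByteSubstring {
    // GBK: ASCII takes 1 byte, a Chinese character takes 2 (in UTF-8 it would take 3)
    private static final Charset GBK = Charset.forName("GBK");

    // Returns the characters that lie entirely within the byte range [start, end),
    // so a Chinese character is never cut in half
    public static String subStr_1(String str, int start, int end) {
        if (str == null) return null;
        StringBuilder sb = new StringBuilder();
        int pos = 0; // byte offset of the current character
        for (int i = 0; i < str.length() && pos < end; i++) {
            char c = str.charAt(i);
            int len = String.valueOf(c).getBytes(GBK).length;
            if (pos >= start && pos + len <= end) sb.append(c);
            pos += len;
        }
        return sb.toString();
    }

    // Naive version: cuts the raw bytes, so it may split a Chinese character
    public static String getByteStr(String str, int start, int end) {
        byte[] b = str.getBytes(GBK);
        return new String(b, start, end - start, GBK);
    }
}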
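As an alternative that is not tied to one encoding, the same rounding-back behaviour can be sketched with `CharsetEncoder`: encode into a buffer of exactly the allowed size and keep only the characters that were consumed. This is a minimal sketch, not a definitive implementation. It relies on the JDK's encoders for GBK and UTF-8 consuming only whole characters when the output buffer fills up. The class name `ByteTruncate` and the method `truncate` are hypothetical, and the string is written with Unicode escapes that spell "我ABC汉字d".

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ByteTruncate {
    // Hypothetical helper: longest prefix of str whose encoding fits in maxBytes
    public static String truncate(String str, int maxBytes, Charset charset) {
        if (str == null) return null;
        if (maxBytes <= 0) return "";
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(str);
        ByteBuffer out = ByteBuffer.allocate(maxBytes);
        // Encoding stops once the next character no longer fits in the buffer
        encoder.encode(in, out, true);
        // in.position() is the number of chars that were fully encoded
        return str.substring(0, in.position());
    }

    public static void main(String[] args) {
        String s = "\u6211ABC\u6C49\u5B57d";
        // GBK: 2 + 3 + 2 = 7 bytes fit, the next character would need 9
        System.out.println(truncate(s, 8, Charset.forName("GBK")));
        // UTF-8: 3 + 3 = 6 bytes fit, the next character would need 9
        System.out.println(truncate(s, 8, StandardCharsets.UTF_8));
    }
}
```

Passing the charset as a parameter makes the assumption about bytes per character explicit: with GBK the 8-byte cut keeps five characters, while with UTF-8 it keeps only four.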