
XPCOM String Operations (2)

2012-12-27 

If the string is ASCII, will it be compared to, assigned to, or otherwise interact with non-ASCII strings?

When assigning or comparing an 8-bit ASCII value (in)to a 16-bit UCS2 string, an "inflation" needs to happen at runtime. If your strings are small enough (say, less than 64 bytes), it may make sense to store them in a 16-bit Unicode class as well, to avoid the extra conversion. The tradeoff is that an ASCII string takes up twice as much space in a 16-bit Unicode string as it would in an 8-bit string.

Is the string usually ASCII, but needs to support Unicode?

If your string is most often ASCII but needs to be able to store Unicode characters, then UTF-8 may be the right encoding. ASCII characters will still be stored in 8-bit storage, but other Unicode characters will take up 2 to 4 bytes. However, if the string ever needs to be compared or assigned to a 16-bit string, a runtime conversion will be necessary.

Are you storing large strings of non-ASCII data?

Up until this point, UTF-8 might seem like the ideal encoding. The drawback is that for most non-European characters in the BMP (such as Chinese, Indian and Japanese), UTF-8 takes 50% more space than UTF-16: 3 bytes per character instead of 2. For characters in plane 1 and above, both UTF-8 and UTF-16 take 4 bytes.

Do you need to manipulate the contents of a Unicode string?

One problem with encoding Unicode characters in UTF-8 or other 8-bit storage formats is that a single Unicode character can span multiple bytes in a string. In most encodings, the number of bytes varies from character to character, so when you need to iterate over each character you must take the encoding into account. This is vastly simplified when iterating 16-bit strings, because each 16-bit code unit (PRUnichar) corresponds to a Unicode character as long as all characters are in the BMP, which is often the case.

However, keep in mind that a single Unicode character in plane 1 and beyond is represented by two 16-bit code units in a 16-bit string, so the number of PRUnichars is not always equal to the number of Unicode characters. For the same reason, a position or index expressed in 16-bit code units is not always the same as the position or index expressed in Unicode characters.
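The size and indexing rules above can be checked with a few lines of standard C++. This is a minimal sketch, not XPCOM code: the names Utf8Bytes, Utf16Bytes, CountCodePoints, and kSample are hypothetical, and char16_t stands in for PRUnichar.

```cpp
#include <cstddef>

// Bytes needed to store one Unicode code point in UTF-8.
constexpr std::size_t Utf8Bytes(char32_t cp) {
  return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Bytes needed to store one Unicode code point in UTF-16:
// one 16-bit unit inside the BMP, two units (a surrogate pair) above it.
constexpr std::size_t Utf16Bytes(char32_t cp) {
  return cp < 0x10000 ? 2 : 4;
}

// Counts Unicode characters in a UTF-16 buffer of `length` code units.
// A high surrogate followed by a low surrogate counts as one character.
constexpr std::size_t CountCodePoints(const char16_t* s, std::size_t length) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < length; ++i) {
    const bool isHigh = s[i] >= 0xD800 && s[i] <= 0xDBFF;
    const bool nextIsLow =
        i + 1 < length && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
    if (isHigh && nextIsLow) {
      ++i;  // skip the low half of the pair
    }
    ++count;
  }
  return count;
}

// Hypothetical sample: 'a' followed by U+1F600, a plane 1 character.
// It occupies three 16-bit code units but holds only two characters.
constexpr char16_t kSample[] = u"a\U0001F600";
```

With these helpers, a BMP character such as U+4E2D needs 3 bytes in UTF-8 and 2 in UTF-16, while U+1F600 needs 4 bytes in both, and CountCodePoints(kSample, 3) reports fewer characters than code units.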


To assist with ASCII, UTF-8, and UTF-16 conversions, there are some helper methods and classes. Some of these classes look like functions, because they are most often used as temporary objects on the stack.

For example, imagine a UTF-8 string whose first Unicode character is represented by a 3-byte UTF-8 sequence. If that string is run through an ASCII-to-UTF-16 conversion, the "inflated" UTF-16 string will contain three PRUnichars instead of the single PRUnichar that represents the first character. These PRUnichars have nothing to do with the first Unicode character in the UTF-8 string.

    NS_ConvertASCIItoUTF16(nsACString) - an nsAutoString which holds a temporary buffer containing the inflated value of the string.
    CopyASCIItoUTF16(nsACString, nsAString) - does an in-place conversion from one string into a Unicode string object.
    AppendASCIItoUTF16(nsACString, nsAString) - appends an ASCII string to a Unicode string.
    ToNewUnicode(nsACString) - creates a new PRUnichar* string which contains the inflated value.
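To make the 3-byte example concrete, the sketch below mimics a byte-by-byte inflation in standard C++ and compares it with a real UTF-8 decode. It does not call the XPCOM helpers listed above: InflateByte, DecodeThreeByteUtf8, and kUtf8 are hypothetical names, char16_t stands in for PRUnichar, and the decoder assumes a well-formed 3-byte sequence.

```cpp
// Widens one 8-bit value to one 16-bit code unit, which is all an
// ASCII-only inflation does. Correct only for values below 0x80.
constexpr char16_t InflateByte(char byte) {
  return static_cast<char16_t>(static_cast<unsigned char>(byte));
}

// Decodes one well-formed 3-byte UTF-8 sequence into the single BMP
// code unit it represents (no validation, illustration only).
constexpr char16_t DecodeThreeByteUtf8(const char* s) {
  return static_cast<char16_t>(
      ((static_cast<unsigned char>(s[0]) & 0x0F) << 12) |
      ((static_cast<unsigned char>(s[1]) & 0x3F) << 6) |
      (static_cast<unsigned char>(s[2]) & 0x3F));
}

// U+4E2D encoded as UTF-8: three bytes for one character.
constexpr char kUtf8[] = "\xE4\xB8\xAD";
```

Inflating kUtf8 byte by byte yields the three code units 0x00E4, 0x00B8, and 0x00AD, none of which is the character that was stored, whereas DecodeThreeByteUtf8(kUtf8) yields the single code unit 0x4E2D.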

    Here are some examples of proper NS_LITERAL_[C]STRING usage.

    // call Init(const PRUnichar*)
    Init(NS_LITERAL_STRING("start value").get());

    // call Init(const nsAString&)
    Init(NS_LITERAL_STRING("start value"));

    // call Init(const nsACString&)
    Init(NS_LITERAL_CSTRING("start value"));

    There are a few details which can be useful in tracking down issues with these macros:

    NS_LITERAL_STRING does compile-time conversion to UTF-16 on some platforms (e.g. Windows, Linux, and Mac) but does runtime conversion on other platforms. By using NS_LITERAL_STRING your code is guaranteed to use the best possible conversion for the platform in question.

    Because some platforms do runtime conversion, the use of literal string concatenation inside an NS_LITERAL_STRING/NS_NAMED_LITERAL_STRING macro will compile on these platforms, but not on platforms which support compile-time conversion.

    For example:

    // call Init(nsAString&)
    Init(NS_LITERAL_STRING("start "
                           "value")); // only compiles on some platforms

    The reason for this is that on some platforms, the L"..." syntax is used, but it is only applied to the first string in the concatenation ("start "). When the compiler attempts to concatenate this with the non-Unicode string "value" it gets confused.

    Also, using preprocessor macros as the string literal is unsupported:

    #define some_string "See Mozilla Run"
    ...
    Init(NS_LITERAL_STRING( some_string )); // only compiles on some platforms/with some compilers.
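One plausible reason a macro argument fails here is token pasting: if the literal macro builds L"..." with the ## operator, the argument is pasted before it is expanded. The sketch below reproduces that in plain C++. WIDEN_PASTE, WIDEN, SOME_STRING, and kTitle are hypothetical names, and it is an assumption that NS_LITERAL_STRING behaves like WIDEN_PASTE on the affected platforms. The two-level WIDEN macro is a general preprocessor idiom, not something the text says Mozilla supports.

```cpp
// Pastes the L prefix directly onto the argument token. The argument is
// NOT macro-expanded first, because it is an operand of ##.
#define WIDEN_PASTE(s) L##s

// Extra level of indirection: the argument is fully expanded here,
// and only then handed to WIDEN_PASTE.
#define WIDEN(s) WIDEN_PASTE(s)

#define SOME_STRING "See Mozilla Run"

// WIDEN_PASTE(SOME_STRING) would expand to the unknown identifier
// LSOME_STRING and fail to compile. WIDEN(SOME_STRING) expands to
// L"See Mozilla Run".
constexpr const wchar_t* kTitle = WIDEN(SOME_STRING);
```

Because the original macros are documented as not supporting this usage, the safer choice in XPCOM code is still to pass the string literal directly.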

    In SetUtf16String(), the value of the string can be used through a variety of methods including iterators, PromiseFlatString, and assignment to other strings.

    In GetValue(), the first parameter, aKey, is treated as a raw sequence of 8-bit values. Any non-ASCII characters in aKey will be preserved when crossing XPConnect boundaries. The implementation of GetValue() will assign a UTF-8 encoded 8-bit string into aResult. If this method is called across XPConnect boundaries, such as from a script, then the result will be decoded from UTF-8 into UTF-16 and used as a Unicode value.

    Use nsDependentString when you have a raw character pointer that you need to convert to an nsAString-compatible string.
    Use Substring() to extract fragments of existing strings.
    Use iterators to parse and extract string fragments.
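The idea behind wrapping a raw pointer and extracting fragments without copying can be sketched in standard C++. This is an analogy only, not the XPCOM implementation: DependentView, Wrap, Fragment, and kBuffer are hypothetical names, and char16_t stands in for PRUnichar.

```cpp
#include <cstddef>

// Hypothetical non-owning view: a pointer plus a length, never a copy.
// The wrapped buffer must outlive every view that refers to it.
struct DependentView {
  const char16_t* data;
  std::size_t length;
};

// Wraps a null-terminated buffer; the length is found by scanning
// for the terminator.
constexpr DependentView Wrap(const char16_t* buffer) {
  std::size_t n = 0;
  while (buffer[n] != 0) {
    ++n;
  }
  return DependentView{buffer, n};
}

// Returns a fragment that still points into the original buffer.
// `start` and `count` are clamped to the bounds of the source view.
constexpr DependentView Fragment(DependentView s, std::size_t start,
                                 std::size_t count) {
  const std::size_t begin = start < s.length ? start : s.length;
  const std::size_t available = s.length - begin;
  return DependentView{s.data + begin, count < available ? count : available};
}

// Hypothetical raw buffer to wrap.
constexpr char16_t kBuffer[] = u"start value";
```

Here Fragment(Wrap(kBuffer), 6, 5) describes the word "value" by pointing at kBuffer + 6; no characters are copied, which is why the lifetime of the underlying buffer matters.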

    nsAString& for "out" parameters.

    Retrieving "out" string/wstrings
        Classes: nsXPIDLString, nsXPIDLCString
        Notes: Use getter_Copies(). Similar to nsString / nsCString.

    Wrapping character buffers
        Classes: nsDependentString, nsDependentCString
        Notes: Wrap const char* / const PRUnichar* buffers.

    Literal strings
        Classes: NS_LITERAL_STRING, NS_LITERAL_CSTRING
        Notes: Similar to nsDependent[C]String, but pre-calculates length at build time.
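Pre-calculating a literal's length at build time can be illustrated in standard C++ by letting the compiler deduce the array size of the literal. This is a sketch of the general technique, with the hypothetical names LiteralLength and kStartValueLength; how the NS_LITERAL_[C]STRING macros compute the length internally is not shown in the text.

```cpp
#include <cstddef>

// The array bound N of a string literal is known to the compiler and
// includes the terminating null, so the length is N - 1 with no
// runtime scan of the characters.
template <std::size_t N>
constexpr std::size_t LiteralLength(const char16_t (&)[N]) {
  return N - 1;
}

// Evaluated entirely at compile time.
constexpr std::size_t kStartValueLength = LiteralLength(u"start value");
```

A wrapper around an arbitrary character pointer has no such information and has to walk the buffer to find the terminator, which is the difference the last table row points at.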

    Appendix B - nsAString Reference

    Read-only methods.

      Length()
      IsEmpty()
      IsVoid() - XPConnect will convert void nsAStrings to JavaScript null.
      BeginReading(iterator)
      EndReading(iterator)
      Equals(string[, comparator])
      First()
      Last()
      CountChar()
      Left(outstring, length)
      Mid(outstring, position, length)
      Right(outstring, length)
      FindChar(character)

      Methods that modify the string.

        Assign(string)
        Append(string)
        Insert(string)
        Cut(start, length)
        Replace(start, length, string)
        Truncate(length)
        SetIsVoid(true) - Make it null. XPConnect will convert void nsAStrings to JavaScript null.
        BeginWriting(iterator)
        EndWriting(iterator)
        SetCapacity()
