Decoding partial UTF-8 into NSString

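When fetching a UTF-8 encoded file over the network with the NSURLConnection class, the delegate's connection:didReceiveData: message will quite likely be sent with an NSData that cuts the UTF-8 file at an arbitrary point, because UTF-8 is a multi-byte encoding scheme and a single character can be split across two separate NSData objects.

In other words, if I join all the data I get from connection:didReceiveData:, I will have a valid UTF-8 file, but each separate chunk is not necessarily valid UTF-8 on its own.

I do not want to hold the whole downloaded file in memory.

What I want is: given an NSData, decode whatever can be decoded into an NSString. If the last few bytes of the NSData are an incomplete multi-byte sequence, tell me, so I can save them for the next NSData.

One obvious solution is to repeatedly try decoding with initWithData:encoding:, dropping the last byte each time, until it succeeds. Unfortunately, that can be very wasteful.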

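If you want to make sure you do not stop in the middle of a UTF-8 multi-byte sequence, you need to look at the end of the byte array and check the top 2 bits of the last byte.

If the top bit is 0, it is one of the ASCII-style single-byte UTF-8 codes and you are done.
If the top bit is 1 and the second-highest bit is 0, it is a continuation byte of a multi-byte sequence and may be the last byte of that sequence, so you need to buffer that byte for later and then look at the byte before it.
If the top bit is 1 and the second-highest bit is also 1, it is the start of a multi-byte sequence, and you determine how many bytes the sequence has by looking for the first 0 bit.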
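The three cases above can be sketched as a small byte classifier. This is a minimal illustration in plain C (which also compiles as Objective-C); the names utf8_byte_kind, utf8_classify and utf8_expected_continuations are hypothetical helpers introduced here, not part of any Apple API.

```c
/* Hypothetical helper names, for illustration only. */
typedef enum {
    UTF8_BYTE_ASCII,        /* 0xxxxxxx: complete single-byte character */
    UTF8_BYTE_CONTINUATION, /* 10xxxxxx: middle or end of a multi-byte sequence */
    UTF8_BYTE_LEAD          /* 11xxxxxx: start of a multi-byte sequence */
} utf8_byte_kind;

/* Classifies a byte by its top two bits. */
static utf8_byte_kind utf8_classify(unsigned char b)
{
    if ((b & 0x80) == 0x00) return UTF8_BYTE_ASCII;
    if ((b & 0xC0) == 0x80) return UTF8_BYTE_CONTINUATION;
    return UTF8_BYTE_LEAD;
}

/*
 * Number of continuation bytes announced by a lead byte, found by
 * locating the first 0 bit. Returns -1 if the byte is not a lead byte
 * of a 2-, 3- or 4-byte sequence.
 */
static int utf8_expected_continuations(unsigned char lead)
{
    if ((lead & 0xE0) == 0xC0) return 1;   /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 2;   /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 3;   /* 11110xxx */
    return -1;
}
```

Note the parentheses around each mask: in C, == binds tighter than &, so b & 0x80 == 0 would be parsed as b & (0x80 == 0).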

See the multi-byte table in the Wikipedia entry: http://en.wikipedia.org/wiki/UTF-8

    // Assumes that receivedData contains both the leftovers and the new data.
    const unsigned char *data = [receivedData bytes];
    NSUInteger byteCount = [receivedData length];
    if (byteCount < 1)
        return nil; // or @""

    unsigned char lastByte = data[byteCount - 1];
    if ((lastByte & 0x80) == 0) {
        // The buffer ends with a single-byte character, so decode all of it.
        NSString *newString = [[NSString alloc] initWithBytes:data
                                                       length:byteCount
                                                     encoding:NSUTF8StringEncoding];
        // verify success
        // remove bytes from mutable receivedData, or set overflow to empty
        return newString;
    }

    // Now eat all of the continuation bytes (10xxxxxx).
    NSUInteger backCount = 0;
    while ((lastByte & 0xc0) == 0x80) {
        backCount++;
        byteCount--;
        if (byteCount < 1) {
            // Error: we exhausted the buffer without finding the initial byte.
            // This is probably an illegal sequence, as the initial byte
            // should always be in receivedData.
            return nil;
        }
        lastByte = data[byteCount - 1];
    }

    // At this point lastByte is the initial byte of the final sequence and
    // byteCount includes it. Compute the length of the sequence from lastByte
    // to determine whether we have exactly the right number of bytes.
    NSUInteger requiredBytes = 0;
    if ((lastByte & 0xe0) == 0xc0) {        // 110xxxxx, 2-byte sequence
        requiredBytes = 1;
    } else if ((lastByte & 0xf0) == 0xe0) { // 1110xxxx, 3-byte sequence
        requiredBytes = 2;
    } else if ((lastByte & 0xf8) == 0xf0) { // 11110xxx, 4-byte sequence
        requiredBytes = 3;
    } else if ((lastByte & 0xfc) == 0xf8) { // 111110xx, 5-byte sequence (obsolete since RFC 3629)
        requiredBytes = 4;
    } else if ((lastByte & 0xfe) == 0xfc) { // 1111110x, 6-byte sequence (obsolete since RFC 3629)
        requiredBytes = 5;
    } else {
        // Shouldn't happen: illegal UTF-8 sequence.
        return nil;
    }

    // Now we know how many continuation bytes we need and how many (backCount)
    // we have, so either use them or take the initial byte away.
    if (requiredBytes == backCount) {
        // We have the right number of bytes.
        byteCount += backCount;
    } else {
        // We don't have the right number of bytes, so remove the initial byte.
        byteCount -= 1;
    }

    NSString *newString = [[NSString alloc] initWithBytes:data
                                                   length:byteCount
                                                 encoding:NSUTF8StringEncoding];
    // verify success
    // remove byteCount bytes from mutable receivedData, or set overflow to the
    // bytes between byteCount and [receivedData length]
    return newString;

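UTF-8 is a very simple encoding to parse. It was designed so that incomplete sequences are easy to detect and so that, if you land in the middle of a sequence, you can find its beginning.

Search backward from the end for a byte that is <= 0x7f or >= 0xc0. If it is <= 0x7f, it is complete by itself. If it is between 0xc0 and 0xdf, inclusive, it needs one following byte to be complete. If it is between 0xe0 and 0xef, it needs two following bytes. If it is >= 0xf0, it needs three following bytes.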
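The backward search described above can be sketched as one C function that returns how many bytes at the front of a chunk are safe to decode. This is a minimal sketch: the name utf8_complete_prefix_length is hypothetical, and the function assumes the stream is otherwise well-formed UTF-8. It only detects truncation and leaves validation to the decoder.

```c
#include <stddef.h>

/*
 * Returns the number of leading bytes of buf that end on a complete
 * UTF-8 sequence. Bytes from the returned index up to len belong to an
 * incomplete trailing sequence and should be carried over to the next chunk.
 */
size_t utf8_complete_prefix_length(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t trailing = 0;   /* continuation bytes seen at the end */

    /* Walk backward over continuation bytes (0x80..0xbf); at most 3 are legal. */
    while (i > 0 && trailing < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        trailing++;
    }
    if (i == 0)
        return len;        /* no lead byte in the chunk: let the decoder report it */

    unsigned char lead = buf[i - 1];
    size_t needed;
    if (lead <= 0x7F)      needed = 0;
    else if (lead >= 0xF0) needed = 3;
    else if (lead >= 0xE0) needed = 2;
    else if (lead >= 0xC0) needed = 1;
    else                   return len;  /* a fourth continuation byte: malformed, not truncated */

    /* Truncated only if the lead byte announces more bytes than are present. */
    return (trailing < needed) ? i - 1 : len;
}
```

In connection:didReceiveData:, one would append the new data to the saved leftovers, pass the first n bytes (where n is the returned length) to initWithBytes:length:encoding:, and keep the remaining bytes, at most three, for the next call.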

I had a similar problem: UTF-8 that was only partially decoded.

Before

    NSString *adsTopic = [components[2] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    // Bug: length counts UTF-16 code units, not UTF-8 bytes, so the buffer
    // is too small as soon as the string contains non-ASCII characters.
    adsInfo->adsTopic = malloc(sizeof(char) * adsTopic.length + 1);
    strncpy(adsInfo->adsTopic, [adsTopic UTF8String], adsTopic.length + 1);

After [solved]

    NSString *adsTopic = [components[2] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    // Size the buffer by the number of UTF-8 bytes, plus one for the terminating NUL.
    NSUInteger byteCount = [adsTopic lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
    NSLog(@"number of UTF-8 bytes in the string topic == %lu", (unsigned long)byteCount);
    adsInfo->adsTopic = malloc(byteCount + 1);
    strncpy(adsInfo->adsTopic, [adsTopic UTF8String], byteCount + 1);
    NSString *text = [NSString stringWithCString:adsInfo->adsTopic encoding:NSUTF8StringEncoding];
    NSLog(@"=== %@", text);
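The reason the second version works is that malloc and strncpy count bytes, while NSString's length counts UTF-16 code units. The difference can be illustrated in plain C. This is a sketch with hypothetical helper names (utf8_code_point_count, utf8_copy); it counts code points, which equals NSString's length only for characters in the Basic Multilingual Plane.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Counts code points in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (10xxxxxx). */
size_t utf8_code_point_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Duplicates a UTF-8 C string, sizing the buffer in bytes.
 * Returns NULL if allocation fails; the caller frees the result. */
char *utf8_copy(const char *s)
{
    size_t byte_count = strlen(s);        /* bytes, not characters */
    char *copy = malloc(byte_count + 1);  /* +1 for the terminating NUL */
    if (copy != NULL)
        memcpy(copy, s, byte_count + 1);
    return copy;
}
```

For the string "héllo" encoded as UTF-8, strlen reports 6 bytes while the code point count is 5, so a buffer sized from the character count would cut the copy short.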
