Converting a Vision VNTextObservation to a String

I'm looking through Apple's Vision API documentation and I see a couple of classes that relate to text detection in UIImages:

1) class VNDetectTextRectanglesRequest

2) class VNTextObservation

It looks like they can detect characters, but I don't see a means to do anything with the characters. Once you've got characters detected, how would you go about turning them into something that NSLinguisticTagger can interpret?
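To be clear about the goal: once the detected characters end up in a plain String, passing it to NSLinguisticTagger would look roughly like this (the recognizedText value here is just a placeholder for whatever the OCR step produces):

    import Foundation

    // Hypothetical output of the OCR step, assumed for illustration.
    let recognizedText = "Hello Vision"

    // Tag each word with its lexical class.
    let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
    tagger.string = recognizedText

    let range = NSRange(location: 0, length: recognizedText.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation]

    tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange, _ in
        let token = (recognizedText as NSString).substring(with: tokenRange)
        print("\(token): \(tag?.rawValue ?? "unknown")")
    }

What I'm missing is the step that produces that String in the first place.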

Here's a post with a brief overview of Vision.

Thanks for reading.

Adding my own progress on this, in case anyone has a better solution:

I have successfully drawn the region box and character boxes on screen. Apple's Vision API is actually very performant. You have to convert each frame of the video to an image and feed it to the recognizer. It is much more accurate than feeding the pixel buffer directly from the camera.

    if #available(iOS 11.0, *) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        var requestOptions: [VNImageOption: Any] = [:]

        if let camData = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
            requestOptions = [.cameraIntrinsics: camData]
        }

        // .right corresponds to the EXIF orientation value 6 used for portrait capture
        let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right, options: requestOptions)

        let request = VNDetectTextRectanglesRequest(completionHandler: { (request, _) in
            guard let observations = request.results else { print("no result"); return }
            let result = observations.map { $0 as? VNTextObservation }

            DispatchQueue.main.async {
                self.previewLayer.sublayers?.removeSubrange(1...)
                for region in result {
                    guard let rg = region else { continue }
                    self.drawRegionBox(box: rg)
                    if let boxes = region?.characterBoxes {
                        for characterBox in boxes {
                            self.drawTextBox(box: characterBox)
                        }
                    }
                }
            }
        })

        request.reportCharacterBoxes = true
        try? imageRequestHandler.perform([request])
    }
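The snippet above calls drawRegionBox(box:) and drawTextBox(box:) helpers that are not shown. A rough sketch of what they might look like, assuming the same previewLayer property and converting Vision's normalized, bottom-left-origin coordinates into layer coordinates (the border widths and colors are just illustrative):

    func drawRegionBox(box: VNTextObservation) {
        // Vision reports normalized coordinates (0...1) with the origin at the bottom-left,
        // so flip the y axis and scale up to the preview layer's size.
        let layerSize = previewLayer.bounds.size
        let rect = CGRect(x: box.boundingBox.minX * layerSize.width,
                          y: (1 - box.boundingBox.maxY) * layerSize.height,
                          width: box.boundingBox.width * layerSize.width,
                          height: box.boundingBox.height * layerSize.height)

        let outline = CALayer()
        outline.frame = rect
        outline.borderWidth = 2.0
        outline.borderColor = UIColor.red.cgColor
        previewLayer.addSublayer(outline)
    }

    func drawTextBox(box: VNRectangleObservation) {
        // Same conversion for the individual character boxes.
        let layerSize = previewLayer.bounds.size
        let rect = CGRect(x: box.boundingBox.minX * layerSize.width,
                          y: (1 - box.boundingBox.maxY) * layerSize.height,
                          width: box.boundingBox.width * layerSize.width,
                          height: box.boundingBox.height * layerSize.height)

        let outline = CALayer()
        outline.frame = rect
        outline.borderWidth = 1.0
        outline.borderColor = UIColor.green.cgColor
        previewLayer.addSublayer(outline)
    }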

Now I'm trying to recognize the actual text. Apple doesn't provide any built-in OCR model. I'm planning to use CoreML for that, so I'm trying to convert a Tesseract-trained data model to CoreML.

You can find Tesseract models here: https://github.com/tesseract-ocr/tessdata and I think the next step is to write a coremltools converter that supports that type of input and outputs a .coreML file.

Or, you can link against TesseractiOS directly and try to feed it the region boxes and character boxes you get from the Vision API.

SwiftOCR

I just got SwiftOCR working with small sets of text.

https://github.com/garnele007/SwiftOCR

It uses

https://github.com/Swift-AI/Swift-AI

which uses a NeuralNet-MNIST model for text recognition.

TODO: VNTextObservation > SwiftOCR

I'll post an example of it using VNTextObservation once I have the one connected to the other.
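In the meantime, here is a rough sketch of what that hookup could look like, assuming a UIImage as the source and SwiftOCR's recognize(_:) completion API; the coordinate conversion is an assumption, not tested code:

    import SwiftOCR
    import Vision
    import UIKit

    let swiftOCR = SwiftOCR()

    // Crop each detected text region out of the source image and hand it to SwiftOCR.
    func recognize(observations: [VNTextObservation], in image: UIImage) {
        guard let cgImage = image.cgImage else { return }
        let width = CGFloat(cgImage.width)
        let height = CGFloat(cgImage.height)

        for region in observations {
            // Vision's boundingBox is normalized with a bottom-left origin;
            // CGImage cropping expects pixels with a top-left origin.
            let box = region.boundingBox
            let cropRect = CGRect(x: box.minX * width,
                                  y: (1 - box.maxY) * height,
                                  width: box.width * width,
                                  height: box.height * height)

            guard let croppedCGImage = cgImage.cropping(to: cropRect) else { continue }

            swiftOCR.recognize(UIImage(cgImage: croppedCGImage)) { recognizedString in
                print(recognizedString)
            }
        }
    }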

OpenCV + Tesseract OCR

I tried to use OpenCV + Tesseract but got compile errors, then found SwiftOCR.

See also: Google Vision iOS

Note that Google Vision text recognition currently ships in the Android SDK, but there is also an iOS CocoaPod, so keep an eye on it; text recognition should eventually be added on iOS.

https://developers.google.com/vision/text-overview

// Correction: I just tried it, but only the Android version of the SDK supports text detection.


If you subscribe to releases at https://libraries.io/cocoapods/GoogleMobileVision and click SUBSCRIBE TO RELEASES, you can see when TextDetection gets added to the iOS part of the CocoaPod.

Thanks to a GitHub user, you can test an example here: https://gist.github.com/Koze/e59fa3098388265e578dee6b3ce89dd8

    - (void)detectWithImageURL:(NSURL *)URL
    {
        VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithURL:URL options:@{}];
        VNDetectTextRectanglesRequest *request = [[VNDetectTextRectanglesRequest alloc] initWithCompletionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
            if (error) {
                NSLog(@"%@", error);
            } else {
                for (VNTextObservation *textObservation in request.results) {
                    // NSLog(@"%@", textObservation);
                    // NSLog(@"%@", textObservation.characterBoxes);
                    NSLog(@"%@", NSStringFromCGRect(textObservation.boundingBox));
                    for (VNRectangleObservation *rectangleObservation in textObservation.characterBoxes) {
                        NSLog(@" |-%@", NSStringFromCGRect(rectangleObservation.boundingBox));
                    }
                }
            }
        }];
        request.reportCharacterBoxes = YES;
        NSError *error;
        [handler performRequests:@[request] error:&error];
        if (error) {
            NSLog(@"%@", error);
        }
    }

The thing is, the result is just an array of bounding boxes for each detected character. From what I gathered from the Vision session, I think you are supposed to use CoreML to recognize the actual characters.

Recommended WWDC 2017 talk: Vision Framework: Building on Core ML (I haven't finished watching it yet); have a look at 25:50 for a similar example called MNISTVision.

Here's another nifty app demonstrating the use of Keras (TensorFlow) to train an MNIST model for handwriting recognition with CoreML: GitHub

Here's how to do it...

    //
    //  ViewController.swift
    //

    import UIKit
    import Vision
    import CoreML

    class ViewController: UIViewController {

        //HOLDS OUR INPUT
        var inputImage: CIImage?

        //RESULT FROM OVERALL RECOGNITION
        var recognizedWords: [String] = [String]()

        //RESULT FROM RECOGNITION
        var recognizedRegion: String = String()

        //OCR-REQUEST
        lazy var ocrRequest: VNCoreMLRequest = {
            do {
                //THIS MODEL IS TRAINED BY ME FOR FONT "Inconsolata" (Numbers 0...9 and UpperCase Characters A..Z)
                let model = try VNCoreMLModel(for: OCR().model)
                return VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
            } catch {
                fatalError("cannot load model")
            }
        }()

        //OCR-HANDLER
        func handleClassification(request: VNRequest, error: Error?) {
            guard let observations = request.results as? [VNClassificationObservation]
                else { fatalError("unexpected result") }
            guard let best = observations.first
                else { fatalError("cant get best result") }

            self.recognizedRegion = self.recognizedRegion.appending(best.identifier)
        }

        //TEXT-DETECTION-REQUEST
        lazy var textDetectionRequest: VNDetectTextRectanglesRequest = {
            return VNDetectTextRectanglesRequest(completionHandler: self.handleDetection)
        }()

        //TEXT-DETECTION-HANDLER
        func handleDetection(request: VNRequest, error: Error?) {
            guard let observations = request.results as? [VNTextObservation]
                else { fatalError("unexpected result") }

            // EMPTY THE RESULTS
            self.recognizedWords = [String]()

            //NEEDED BECAUSE OF DIFFERENT SCALES
            let transform = CGAffineTransform.identity.scaledBy(x: (self.inputImage?.extent.size.width)!, y: (self.inputImage?.extent.size.height)!)

            //A REGION IS LIKE A "WORD"
            for region: VNTextObservation in observations {
                guard let boxesIn = region.characterBoxes else { continue }

                //EMPTY THE RESULT FOR REGION
                self.recognizedRegion = ""

                //A "BOX" IS THE POSITION IN THE ORIGINAL IMAGE (SCALED FROM 0...1.0)
                for box in boxesIn {
                    //SCALE THE BOUNDING BOX TO PIXELS
                    let realBoundingBox = box.boundingBox.applying(transform)

                    //TO BE SURE
                    guard (inputImage?.extent.contains(realBoundingBox))!
                        else { print("invalid detected rectangle"); return }

                    //SCALE THE POINTS TO PIXELS
                    let topleft = box.topLeft.applying(transform)
                    let topright = box.topRight.applying(transform)
                    let bottomleft = box.bottomLeft.applying(transform)
                    let bottomright = box.bottomRight.applying(transform)

                    //LET'S CROP AND RECTIFY
                    let charImage = inputImage?
                        .cropped(to: realBoundingBox)
                        .applyingFilter("CIPerspectiveCorrection", parameters: [
                            "inputTopLeft": CIVector(cgPoint: topleft),
                            "inputTopRight": CIVector(cgPoint: topright),
                            "inputBottomLeft": CIVector(cgPoint: bottomleft),
                            "inputBottomRight": CIVector(cgPoint: bottomright)
                        ])

                    //PREPARE THE HANDLER
                    let handler = VNImageRequestHandler(ciImage: charImage!, options: [:])

                    //SOME OPTIONS (TO PLAY WITH..)
                    self.ocrRequest.imageCropAndScaleOption = VNImageCropAndScaleOption.scaleFill

                    //FEED THE CHAR-IMAGE TO OUR OCR-REQUEST - NO NEED TO SCALE IT - VISION WILL DO IT FOR US!!
                    do {
                        try handler.perform([self.ocrRequest])
                    } catch { print("Error") }
                }

                //APPEND RECOGNIZED CHARS FOR THAT REGION
                self.recognizedWords.append(recognizedRegion)
            }

            //THATS WHAT WE WANT - PRINT WORDS TO CONSOLE
            DispatchQueue.main.async {
                self.PrintWords(words: self.recognizedWords)
            }
        }

        func PrintWords(words: [String]) {
            // VOILA'
            print(recognizedWords)
        }

        func doOCR(ciImage: CIImage) {
            //PREPARE THE HANDLER
            let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])

            //WE NEED A BOX FOR EACH DETECTED CHARACTER
            self.textDetectionRequest.reportCharacterBoxes = true
            self.textDetectionRequest.preferBackgroundProcessing = false

            //FEED IT TO THE QUEUE FOR TEXT-DETECTION
            DispatchQueue.global(qos: .userInteractive).async {
                do {
                    try handler.perform([self.textDetectionRequest])
                } catch {
                    print("Error")
                }
            }
        }

        override func viewDidLoad() {
            super.viewDidLoad()
            // Do any additional setup after loading the view, typically from a nib.

            //LETS LOAD AN IMAGE FROM RESOURCE
            let loadedImage: UIImage = UIImage(named: "Sample1.png")! //TRY Sample2, Sample3 too

            //WE NEED A CIIMAGE - NOT NEEDED TO SCALE
            inputImage = CIImage(image: loadedImage)!

            //LET'S DO IT
            self.doOCR(ciImage: inputImage!)
        }

        override func didReceiveMemoryWarning() {
            super.didReceiveMemoryWarning()
            // Dispose of any resources that can be recreated.
        }
    }

You'll find the complete project here; the trained model is included!

I'm using Google's Tesseract OCR engine to convert the images into actual strings. You'll have to add it to your Xcode project using CocoaPods. Although Tesseract will perform OCR even if you simply feed it the whole image containing text, the way to make it perform better and faster is to use the detected text rectangles to feed it only the pieces of the image that actually contain text, which is where Apple's Vision framework comes in handy.

Here's a link to the engine: Tesseract OCR

And here's a link to the current stage of my project with text detection + OCR already implemented: Out Loud - Camera to Speech

Hope these can be of some use. Good luck!
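A minimal sketch of that combination, assuming the TesseractOCRiOS pod (G8Tesseract) and a single VNTextObservation; the cropping math and the "eng" language choice are assumptions for illustration, not code lifted from the project above:

    import TesseractOCR
    import Vision
    import UIKit

    // Run Tesseract only on the part of the image that Vision says contains text.
    func recognizeText(in image: UIImage, region: VNTextObservation) -> String? {
        guard let cgImage = image.cgImage else { return nil }
        let width = CGFloat(cgImage.width)
        let height = CGFloat(cgImage.height)

        // Convert Vision's normalized, bottom-left-origin box into pixel coordinates.
        let box = region.boundingBox
        let cropRect = CGRect(x: box.minX * width,
                              y: (1 - box.maxY) * height,
                              width: box.width * width,
                              height: box.height * height)

        guard let cropped = cgImage.cropping(to: cropRect),
              let tesseract = G8Tesseract(language: "eng") else { return nil }

        tesseract.image = UIImage(cgImage: cropped)
        tesseract.recognize()
        return tesseract.recognizedText
    }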