Deep dive into natural scene text recognition and our 50+ language auto-detection pipeline.
Extracting text from "in-the-wild" imagery (street signs, billboards, graffiti) is fundamentally harder than reading flat document scans. The pipeline has to cope with curved surfaces and perspective distortion, drastic lighting changes, and cluttered, noisy backgrounds.
When calling the API, set the `scene_text: true` flag. This engages a heavyweight ResNet backbone specialized in correcting curved text and removing shadows before the image is passed to the recognizer.
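```javascript
// Enable the scene-text preprocessing path for a photo of a neon sign
await solveocr.extract("tokyo_neon.jpg", {
  scene_text: true
});
```

If you don't know the language of the sign (for example, when traveling in a foreign country), omit the `languages` array. The vision model will then perform a global classification pass to detect the language.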
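Because auto-detection depends on the `languages` key being absent, it can help to build the options object in one place and attach that key only when you actually know the language. The sketch below assumes that `languages` takes an array of language-code strings such as `"ja"`; the exact code format the API expects is not documented here, and `buildSceneTextOptions` is a hypothetical helper, not part of solveocr.

```javascript
// Hypothetical helper: builds the options object passed to solveocr.extract().
// Assumption: `languages` is an array of language-code strings (e.g. "ja").
function buildSceneTextOptions(languages) {
  const options = { scene_text: true };
  // Attach `languages` only when the caller supplies at least one code.
  // The docs describe *omitting* the array to trigger auto-detection; whether
  // an empty array behaves the same is not stated, so the key is dropped instead.
  if (Array.isArray(languages) && languages.length > 0) {
    options.languages = [...languages]; // copy so later caller mutations don't leak in
  }
  return options;
}

// Known language: pass it explicitly.
const knownOpts = buildSceneTextOptions(["ja"]);
// Unknown language: leave it out so the global classification pass runs.
const autoOpts = buildSceneTextOptions();

console.log(JSON.stringify(knownOpts)); // {"scene_text":true,"languages":["ja"]}
console.log(JSON.stringify(autoOpts));  // {"scene_text":true}
```

In practice you would then call something like `await solveocr.extract("tokyo_neon.jpg", buildSceneTextOptions())` when touring with no idea what script the signs use.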