If you spend any time inside Taobao, KakaoTalk, a Japanese gacha game, or a raw manga reader on your iPhone, you’ve already discovered iOS’s quiet limitation. Apple has built Live Translation into Messages, FaceTime, and Phone calls. Visual Intelligence reads text from your camera and from screenshots. Apple Translate handles selectable text and whole web pages. None of those features help the moment you open any of the apps above.
The text is right there on your screen. You can see it. But iOS won’t translate it without first making you screenshot it, switch apps, run it through Translate or Google Lens, and switch back. Users abroad describe doing this hundreds of times a day. The screenshot folder fills with translations they never go back to delete.
That gap isn’t a bug. It’s a design choice — and the workaround isn’t going to come from Cupertino.
What Apple actually ships in iOS 26
To be fair, Apple’s translation tools cover more ground than they used to.
- Live Translation in Messages, FaceTime, and Phone calls is real-time, on-device, and surprisingly accurate. Probably the best translation feature Apple has ever shipped.
- Visual Intelligence in iOS 26 works on both the camera and screenshots: point at a sign, or take a screenshot of a webpage, and you get a translation overlay on the image.
- Apple Translate handles any text you can highlight, plus whole-page translation in Safari with one tap.
What ties all three together is the same thing that limits them: every one requires a deliberate action. Open the camera. Capture a screenshot. Highlight text. Tap Translate. iOS doesn’t have a system-level “translate whatever is currently on the screen” mode.
Android does. Long-press the home button on a Pixel, and Google’s Circle to Search lets you select any text on screen for translation in place — no screenshot, no app switch. Samsung phones have a similar capability through both Live Translate and Circle to Search. Xiaomi, Huawei, and Oppo have built full-screen translation directly into their system layer.
Why iPhone is structurally different
Third-party apps on iOS have very limited ways to draw on top of other apps. The Picture-in-Picture API is the main one — designed for video calls and media playback, but flexible enough that developers can render arbitrary content as if it were video and display it in a floating PiP window. Apple’s App Store guidelines suggest PiP should be used for video, but a small ecosystem of utility apps has emerged that uses PiP for things like translators, calculators, and dictionaries. Apple approves them case by case.
That’s why your FaceTime call can hover over Safari and a third-party PiP-based translator can too, while Apple’s own Translate app cannot do the same thing natively.
It’s a roundabout way to solve what should be a system-level feature. But on iPhone in 2026, it’s the only path that exists.
What an iPhone screen translator actually needs to do
If you’ve ever used Circle to Search on Android, you already know the workflow you want. Tap once, see translation. Keep using the app. The text appears in a small floating window you can drag around the screen.
For that to work on iPhone, an app needs to:
- Float persistently over any third-party app — no screenshots, no app switching.
- OCR non-selectable text — speech bubbles in manga, button labels in Chinese super-apps, dialogue boxes in Japanese games, all rendered as part of the image.
- Handle CJK languages well — Chinese and Japanese are the hardest cases.
- Stay out of the way — small footprint, draggable, dismissable.
Apple’s built-in tools fail the first and last of these criteria. Visual Intelligence and Apple Translate each require you to stop, capture or select, then resume. Fine for one translation a day; terrible for two hundred.
The app I actually use
PiP Screen Translate uses iOS Picture-in-Picture to float a small translation window directly on top of whatever app I’m in. The OCR runs in the background; the translation shows up in the floating window in real time. I can drag it around to peek at different parts of the underlying app.
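As a rough illustration only (this is not the app's actual code), the loop described above can be sketched in a few lines of Python. The `ocr` and `translate` callables are hypothetical stand-ins for whatever OCR engine and translation backend a real app would plug in, and the frame-hash check and translation cache are assumptions about how one might avoid redoing work when the screen hasn't changed. Screen capture and the floating window itself are not modeled.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class OverlayTranslator:
    """Hypothetical capture -> OCR -> translate loop for a floating overlay."""

    ocr: Callable[[bytes], List[str]]   # stand-in: frame bytes -> recognized strings
    translate: Callable[[str], str]     # stand-in: source string -> translated string
    _last_digest: Optional[str] = None
    _cache: Dict[str, str] = field(default_factory=dict)

    def process_frame(self, frame: bytes) -> Optional[List[str]]:
        """Return translated lines for the overlay, or None if the frame is unchanged."""
        digest = hashlib.sha256(frame).hexdigest()
        if digest == self._last_digest:
            return None  # identical frame: keep showing the current overlay
        self._last_digest = digest

        lines = []
        for text in self.ocr(frame):
            # Translate each distinct string once; menus and buttons repeat a lot.
            if text not in self._cache:
                self._cache[text] = self.translate(text)
            lines.append(self._cache[text])
        return lines


if __name__ == "__main__":
    # Fake "screens" and a fake glossary, purely for demonstration.
    screens = {b"frame-1": ["开始", "设置"], b"frame-2": ["设置", "退出"]}
    glossary = {"开始": "Start", "设置": "Settings", "退出": "Quit"}
    translator = OverlayTranslator(ocr=screens.__getitem__, translate=glossary.__getitem__)
    for frame in (b"frame-1", b"frame-1", b"frame-2"):
        print(translator.process_frame(frame))
```

Skipping unchanged frames and caching repeated strings is what would let a loop like this run continuously in the background. That continuous operation is the difference from the stop, capture, resume cycle of the built-in tools.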

What I use it for, roughly in order:
- Reading raw Korean webtoons that are 40 chapters ahead of the fan translation.
- Browsing Taobao and 1688 for parts and components.
- Playing the JP version of mobile games — Umamusume, FGO, Project Sekai — that don’t have global releases.
- Navigating Chinese super-apps when I travel — Meituan, Dianping, Xiaohongshu.
- Translating signs and on-screen text in anime that streaming platforms never bother subtitling.
It’s not magic. Machine translation flattens Japanese honorifics, loses Chinese chengyu, collapses Korean honorific levels. For dialogue that matters, human translators are still better. But for the 80% case — “what does this menu button do,” “what is this dish called,” “what is this character roughly saying” — it works.
Free tier covers casual usage. Ad-free unlimited use is around $3/month with discounted annual plans.
What I wish Apple would do
The fix is obvious. Apple already has the components: Visual Intelligence’s OCR, Apple Translate’s language coverage, system overlay frameworks. What’s missing is the combination: an always-available translation overlay that reads whatever is currently on screen, inside any app, without requiring a screenshot first. Visual Intelligence could become that, with one more iteration. Live Translation could expand from Messages, FaceTime, and Phone calls to “any app’s text.”
Until Apple ships it, the workaround is already in the App Store.




