ScreenAI: A visual language model for UI and visually ... - Google …
http://research.google/blog/screenai-a-visual-language-model-for-ui-and-visually-situated-language-understanding/
Mar 19, 2024 · ScreenAI’s architecture is based on PaLI and is composed of a multimodal encoder block and an autoregressive decoder. The PaLI encoder uses a vision transformer (ViT) that creates image embeddings, and a multimodal encoder that takes the concatenation of the image and text embeddings as input. This flexible architecture …
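To make the data flow in that description concrete, here is a toy sketch in pure Python. It is not ScreenAI's or PaLI's implementation. It assumes a standard encoder-decoder transformer in which the decoder cross-attends to the encoder outputs, and it uses greedy decoding. All sizes, the random weights, the BOS/EOS token ids and the function names (`patchify`, `vit_embed`, `multimodal_encode`, `decode`) are hypothetical. The "ViT" here is a single linear patch projection followed by one attention pass, and attention has no learned Q/K/V projections, heads or layer norm. The sketch only shows the pipeline: image patches become embeddings, those are concatenated with text embeddings and encoded jointly, and an autoregressive decoder emits tokens one at a time.

```python
import math
import random

D = 8          # hypothetical embedding width
VOCAB = 16     # hypothetical vocabulary size
EOS, BOS = 0, 1  # hypothetical end/start token ids
PATCH = 2      # hypothetical patch side length

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    # W is rows x cols, x has length cols; returns a vector of length rows
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys_values):
    """Single-head scaled dot-product attention with no learned projections."""
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[i] for w, kv in zip(weights, keys_values)) for i in range(D)])
    return out

def patchify(image, p):
    """Split an H x W grayscale image (list of rows) into flattened p x p patches."""
    h, w = len(image), len(image[0])
    return [[image[r + i][c + j] for i in range(p) for j in range(p)]
            for r in range(0, h, p) for c in range(0, w, p)]

W_patch = rand_matrix(D, PATCH * PATCH)  # patch -> embedding projection
text_table = rand_matrix(VOCAB, D)       # token id -> embedding
W_out = rand_matrix(VOCAB, D)            # decoder state -> vocabulary logits

def vit_embed(image):
    # Toy stand-in for the ViT: linear patch embedding plus one self-attention pass.
    tokens = [matvec(W_patch, patch) for patch in patchify(image, PATCH)]
    return attention(tokens, tokens)

def multimodal_encode(image, text_ids):
    # Concatenate image and text embeddings, then let every token attend to all others.
    joint = vit_embed(image) + [text_table[t] for t in text_ids]
    return attention(joint, joint)

def decode(memory, max_len=5):
    # Greedy autoregressive decoding: each step sees only BOS plus earlier outputs.
    generated = []
    for _ in range(max_len):
        prefix = [text_table[t] for t in [BOS] + generated]
        h = attention([prefix[-1]], prefix)[0]  # newest token attends to the prefix
        ctx = attention([h], memory)[0]         # cross-attention to encoder outputs
        logits = matvec(W_out, [a + b for a, b in zip(h, ctx)])
        nxt = max(range(VOCAB), key=logits.__getitem__)
        generated.append(nxt)
        if nxt == EOS:
            break
    return generated

image = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]  # dummy 4x4 "screenshot"
memory = multimodal_encode(image, [3, 5, 7])
print(len(memory))  # 4 image patches + 3 text tokens = 7
print(decode(memory))
```

Because the image and text tokens are concatenated before encoding, the encoder output length is the number of patches plus the number of text tokens. The decoder's output is arbitrary here because the weights are random. In a trained model the same loop would produce, for example, an answer or a screen annotation.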