Beyond the Cloud: Pioneering Local AI on Mobile Devices with Apple, Nvidia, and Samsung

Jun 20, 2024 • 13 min read

On-device AI refers to local ML models that operate directly on a device, eliminating the need to transmit data to the cloud.

This approach allows AI features to function offline, ensuring greater privacy for user data such as photos, voice recordings, text inputs, and activity patterns. These data are analyzed by ML algorithms that run on the device's own chips and memory, with only the relevant outputs sent to the cloud when necessary.

Edge computing, a related concept, enables AI applications to run directly on field devices, processing data closer to its source. This method reduces latency and bandwidth issues associated with cloud processing, as data does not need to traverse a network to be analyzed.

Companies like Qualcomm are leading the charge in bringing advanced AI processing to smartphones. For instance, the Snapdragon® 8 Gen 3 Mobile Platform supports generative AI models with up to 10 billion parameters on-device without compromising battery life.

The broader momentum behind AI is reflected in a McKinsey study, which found that AI adoption has surged to 72 percent among organizations. As adoption grows, so does the importance of mobile AI processing as a driver of technological advancement.

AI on Mobile Devices

AI in mobile devices has significantly enhanced their usability, security, and overall utility.

Key advantages of AI implementation in mobile devices include more powerful app authentication, which has bolstered security by improving the accuracy and effectiveness of authentication processes. Automated reply functions powered by AI have made local AI chatbots and virtual assistants more sophisticated, enabling mobile apps to provide rapid, automated responses to user inquiries, thus enhancing user engagement.

AI also enables highly personalized user experiences by leveraging data analytics and machine learning to tailor app interactions to individual preferences. Additionally, AI can automate repetitive tasks within mobile apps, saving users time and effort. Furthermore, ML algorithms can analyze user behavior patterns, providing valuable insights to app developers.

The potential extends beyond these functionalities, as data scientists and AI engineers strive to replicate human capabilities through advanced machine learning models. For instance, panoptic segmentation, researched by companies like Facebook, Uber, and Tesla, is crucial for autonomous driving, providing detailed information about objects and surfaces in the environment.

Apple has implemented on-device panoptic segmentation in its iPhone cameras using transformers, achieving pixel-wise comprehension of scenes and subjects, such as distinguishing between sky, people, and various elements of a scene.

Similarly, Samsung’s Galaxy phones feature AI-driven Live Translate, offering real-time translations during voice calls without sending data to the cloud, ensuring privacy and efficiency.

Machine learning also enhances mobile-device keyboards, as demonstrated by Gboard, which has reduced typing latency and the need for manual corrections through improved algorithms.

AI often operates behind the scenes on mobile devices, subtly improving various aspects of user experience without overtly revealing its presence, thereby making our interactions with technology smoother and more intuitive.

Technological Innovations: Apple, Nvidia, Samsung

Three companies in particular are spearheading this trend and pushing local AI the furthest: Apple, Nvidia, and Samsung.

Apple Intelligence & M4 Chip

Apple recently announced the M4 chip, which powers the all-new iPad Pro. Built using second-generation 3-nanometer technology, the M4 is a system on a chip (SoC) that builds on the power efficiency of Apple silicon. It features Apple's fastest Neural Engine to date, capable of up to 38 trillion operations per second, which Apple says is faster than the neural processing unit of any AI PC available at the time of the announcement.

Coupled with faster memory bandwidth, next-generation machine learning (ML) accelerators in the CPU, and a high-performance GPU, the M4 makes the new iPad Pro an exceptionally powerful device for artificial intelligence tasks, including local AI image generation.

The M4's Neural Engine is a dedicated IP block within the chip, designed specifically to accelerate AI workloads. At 38 trillion operations per second, it is 60 times faster than the first Neural Engine found in the A11 Bionic.
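The "60 times faster" figure can be sanity-checked with simple arithmetic. The sketch below assumes a peak of 600 billion operations per second for the A11 Bionic's Neural Engine, which is Apple's published figure for that chip and is not stated in this article; the 38 trillion figure is the one quoted above.

```python
# Peak Neural Engine throughput, in operations per second.
M4_OPS_PER_SECOND = 38e12    # 38 trillion, as quoted in this article
A11_OPS_PER_SECOND = 600e9   # assumption: Apple's published A11 Bionic figure


def speedup(new_ops: float, old_ops: float) -> float:
    """Return how many times higher new_ops is than old_ops."""
    return new_ops / old_ops


ratio = speedup(M4_OPS_PER_SECOND, A11_OPS_PER_SECOND)
print(f"M4 vs A11 Neural Engine: roughly {ratio:.0f}x")
```

Under that assumption the ratio comes out slightly above 60, which is consistent with the rounded figure. Peak operations per second is only one measure, so real-world speedups on a given model may differ.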

Together with advanced ML accelerators in the CPU, a high-performance GPU, and high-bandwidth unified memory, the Neural Engine makes the M4 a very capable chip for AI. With AI features in iPadOS such as Live Captions for real-time audio captions and Visual Look Up, which identifies objects in video and photos, the new iPad Pro lets users perform AI-driven tasks seamlessly and on-device, including local image generation.

Apple has also recently unveiled iOS 18, which promises to make the iPhone more personal, capable, and intelligent than ever before.

iOS 18 marks the debut of Apple Intelligence, a sophisticated personal intelligence system integrated deeply across iPhone, iPad, and Mac. This system leverages generative models and personal context to provide intuitive and relevant assistance, from rewriting text to creating custom photo memories.

Privacy remains a cornerstone, with Apple Intelligence designed to keep user data secure and private. Other notable features include improved mail categorization, a new Passwords app, and updates to Safari, making it easier to find and enjoy web content.

Nvidia’s Role

Among AI PCs, Nvidia's GeForce RTX™ GPUs stand out as some of the most capable on the market. Whether for class, work, or entertainment, RTX hardware supports a wide range of AI-powered applications. For instance, the ChatRTX application provides tailored responses from local files, allowing users to search personal notes, files, and photos using text or voice commands through a custom, private local AI chatbot.

Moreover, the Nvidia Broadcast app improves video conferencing calls, voice chats, and live streams with AI-powered features such as noise removal, background replacement, and more. Another impressive feature is RTX Video, which uses AI super resolution and video HDR to upscale internet video to crystal-clear 4K HDR quality.

ChatRTX, a demo app, exemplifies Nvidia's AI capabilities by allowing users to personalize a GPT large language model (LLM) connected to their own content, such as documents, notes, images, or other data. Using retrieval-augmented generation (RAG), TensorRT-LLM, and RTX acceleration, it lets users query a custom chatbot and get contextually relevant answers quickly and securely, all while running locally on their Windows RTX PC or workstation.
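To make the idea of retrieval-augmented generation more concrete, here is a minimal sketch of the retrieval half in plain Python. It is not how ChatRTX is implemented: the word-count cosine similarity stands in for the learned embeddings a real system would use, and the function names, sample notes, and prompt wording are all hypothetical. The assembled prompt is what would then be passed to a locally running LLM.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Combine the retrieved context and the question into one prompt."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical personal notes stored on the user's machine.
notes = [
    "Dentist appointment is on Friday at 3 pm.",
    "The wifi password for the office is stored in the team vault.",
    "Grocery list: eggs, oat milk, coffee beans.",
]
question = "When is my dentist appointment?"
prompt = build_prompt(question, notes)
print(prompt)
```

Because both the retrieval step and the model run on the same machine, neither the notes nor the question need to leave the device, which is the privacy benefit the article describes.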

The AI processors in every GeForce RTX GPU deliver strong performance across the most demanding games, applications, and workflows, illustrating Nvidia's leadership in local AI.

Samsung AI Capabilities

In addition to the previously mentioned features, Samsung has further expanded its AI capabilities in 2024.

Galaxy AI can now respond to text prompts with helpful answers, much like Google Gemini, OpenAI's GPT-4, Meta AI, and other generative AI models. These language skills make phones and computers more user-friendly and unlock functionality that was previously out of reach, including local AI search.

Galaxy AI can compose messages, conduct quick image searches, translate foreign languages, and more. One of the latest enhancements to Samsung's AI models is the inclusion of multimodal input, allowing information to be shared and interpreted in various forms. For example, users can ask a multimodal AI to create a table from an uploaded image. Galaxy AI can "see" photos, comprehend type-written commands, and use speech prompts to generate appropriate responses.

Additionally, Samsung has built AI into features such as Live Translate, which provides real-time translations during voice calls and protects privacy by processing everything on the device without sending data to the cloud. The feature works natively on the Galaxy S24, so the person on the other end can use any smartphone or even a landline telephone, with no special technology required on their side.

Machine learning also enhances mobile-device keyboards. For instance, Google's Gboard has used machine learning to improve the mobile typing experience significantly. Its latest algorithm enhancements have cut decoding latency by 50% and reduced the fraction of words users must manually correct by more than 10%, showcasing the impact of AI on everyday tasks.

Business Use Cases

LLaMA Model Use Case

In 2024, industry giants IBM and Microsoft identified small language models as a pivotal trend in artificial intelligence. These models suit mobile apps because they require less memory and processing power, making them ideal for smartphones. They enable offline functionality such as human-like local AI chatbots, language translation, text generation, and summarization. Additionally, they reduce costs by minimizing cloud reliance and improve the user experience with faster on-device processing.
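A rough back-of-the-envelope calculation shows why model size and numeric precision matter so much on a phone. The sketch below estimates the memory needed for model weights alone, using the 10 billion parameter figure mentioned earlier in this article; the helper name and the chosen bit widths are illustrative assumptions, and the estimate ignores activations, caches, and runtime overhead.

```python
def weight_memory_gb(num_parameters: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in gigabytes (10**9 bytes)."""
    bytes_total = num_parameters * bits_per_weight / 8
    return bytes_total / 1e9


# The same 10-billion-parameter model stored at different precisions.
for bits in (16, 8, 4):
    size = weight_memory_gb(10e9, bits)
    print(f"10B parameters at {bits}-bit weights: {size:.1f} GB")
```

Halving the number of bits per weight halves the footprint, which is why smaller models and lower-precision weights are the usual route to fitting a language model into a smartphone's memory.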

At Netguru, we developed a proof-of-concept (POC) application integrating the LLaMA model with Apple's Transformers architecture, allowing this advanced machine learning model to be deployed on iPhone devices. This achievement followed extensive feasibility research, prototyping, and rigorous performance testing on iPhone hardware.

The application offers users a unique experience by allowing them to input text and receive relevant information. It supports various uses like summarizing documents, assisting with writing, and generating creative content. One notable example is generating comprehensive content from minimal text inputs. For instance, providing a brief description of a favorite restaurant could result in an in-depth review covering ambiance, food quality, and an overall experience summary. This functionality has the potential to change how users access and interact with information by streamlining the process, which in turn can help businesses succeed with local AI.

Given our society's decreasing attention span, the ability to summarize lengthy documents is extremely useful. The application's ability to generate text quickly while keeping the interaction simple is particularly beneficial for users who need quick summaries or creative content on the go.

Another striking feature of the application is its text prediction capability. You've probably experienced your smartphone suggesting the next word while typing. Beyond this simple task, users can now input a sentence, prompting the application to generate further information or complete thoughts. This assists in the writing process, stimulates creative ideas, and facilitates any task requiring more text based on initial input. While the machine itself isn't inherently creative, it can stimulate human creativity by presenting new perspectives on a subject.
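The simple next-word suggestion mentioned above can be illustrated with a toy bigram model, which counts which word most often follows another. This is only a sketch of the basic idea and is not how the LLaMA-based application works; a language model predicts from much longer context using a neural network. The corpus and function names here are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the corpus."""
    words = corpus.lower().split()
    following: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def suggest(model: dict[str, Counter], word: str, k: int = 3) -> list[str]:
    """Return up to k of the most frequent next words after `word`."""
    counts = model.get(word.lower(), Counter())
    return [candidate for candidate, _ in counts.most_common(k)]


# Hypothetical text the keyboard has seen the user type before.
corpus = "the food was great the service was great the food was fresh"
model = train_bigrams(corpus)
print(suggest(model, "was"))
```

A bigram model only ever looks one word back, which is why its suggestions feel limited. Generating whole sentences or completing thoughts, as described above, requires a model that takes the full preceding text into account.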

Mobile Diffusion Use Case

Mobile Diffusion is a project whose goal was to develop a mobile application that uses the Stable Diffusion model to generate images from text prompts. With advancements in Apple's ANE (Apple Neural Engine) architecture, the model can now run locally on iPhones, which allows private, local image generation and spares users the costs associated with expensive servers.

A practical use case for Mobile Diffusion is a professional photographer testing different effects and ideas for their images. Using the mobile application, they can quickly generate various images based on prompts without any costs.

Mobile Diffusion can generate images for diverse purposes, such as creating backgrounds for websites or apps, producing art for print and digital media, or generating graphics for marketing materials. With the capability to generate images locally on iPhones, users can seamlessly and easily create visuals for various needs without incurring additional costs.

Facing the Challenges

AI is still an emerging technology, so it’s important to outline the challenges:

Incorrectness: AI can produce errors or hallucinations, leading to software defects. Accuracy must be verified through rigorous testing and validation.
Usability: Effective use of AI requires new skills in crafting prompts and validating results. User expertise significantly impacts usability.
Trust: Understanding limitations is crucial to avoid overreliance.
Context dependency and human oversight: AI effectiveness varies by context. Human oversight remains critical to ensure informed, ethical decisions.
Cost: The evolving costs of AI, including governance and security measures, must be carefully assessed.
Constant evolution: Organizations must stay updated with the rapid advancements in AI technology.
Intellectual property violations: AI’s training data may include copyrighted content, risking legal issues.

The Future of On-Device AI

The evolution of on-device AI, driven by privacy and security concerns, marks a significant shift from cloud-based processing to local, device-based computation. This transition is enabled by advancements in chip technology, such as Apple's M4 and Nvidia's RTX GPUs, which provide unprecedented AI capabilities and efficiency.

Upcoming innovations in chip technology promise even more powerful AI performance and energy efficiency. These advancements will likely accelerate the integration of sophisticated AI features in mobile devices, enhancing user experiences and enabling new applications in areas like personalized services, real-time processing, and enhanced security.

Businesses will benefit from reduced costs, increased data privacy, and improved operational efficiencies. Staying abreast of these technological developments is crucial for organizations aiming to leverage AI's full potential while addressing associated challenges such as accuracy, usability, and security.

This forward-looking approach helps businesses harness the power of AI while maintaining a competitive edge in a rapidly evolving landscape, with applications such as local AI voice generation, local AI art generation, and local text-to-speech.