Agile Loop is leveraging Large Action Models to build a new type of operating system: AL OS1. AL OS1 aims to change how people interact with software by letting them work through simple natural-language prompts. With AL OS1, a wide range of desktop and Web applications such as Jira, Zoho, Microsoft PowerPoint, and Slack can be operated together to automate monotonous workflows and boost productivity. What sets AL OS1 apart is a model that combines an easy-to-use interface with accuracy on intricate tasks.
With AL OS1, you can effortlessly create a PowerPoint presentation and send it to a colleague via Outlook. Just give the system a prompt and it handles the rest: from opening PowerPoint to composing and sending the email, every step is carried out on your behalf, making the whole process seamless and efficient.
This is possible mainly because of Large Action Models (LAMs). The action model is the final piece of the puzzle: the component that lets Artificial Intelligence be used to its full capability.
As most companies seek to combine Large Language Models (LLMs) with other existing models, the focus has shifted to Large Multi-Modal Models (LMMs), which in turn allowed us to think outside the box and develop Large Action Models. Unlike text-only LLMs, which focus on generating text, multi-modal models can handle different types of information, such as text, images, sound, and video. This helps them understand a task better, much as people do when they see and interact with their environment.
For example, GPT-4 is a multi-modal model that lets users prompt through images and voice as well as text, so the system can understand prompts in more than one form rather than being bound to text like GPT-3.5. Large Action Models, on the other hand, let the system learn any new interface through self-learning. This makes it possible to carry out complex tasks from simple language prompts, with the system taking control of your keyboard and mouse cursor.
What can a LAM do that hasn't been done before?
An AI agent built on a LAM understands your computer's interface the way a person does and performs tasks intelligently, reinventing human-software interaction. Today's AI agents can at most retrieve the information you need and compile it into a document. Agile Loop aims to change computer-user interaction end to end.
AI agents should meet the highest standard when performing complex tasks: they should understand and execute complex tasks as humans do, but with the advantage of machine speed.
How is AL OS1 trained?
AL OS1 can carry out complex tasks on your behalf because it understands the thought process humans go through when operating a system. Suppose you want to open an application on your desktop, such as Google Photos, and view photos from September 2021. The AI agent first searches for the app using the on-screen search bar, opens it, and then enters "September 2021" in the Google Photos search bar. This is exactly what you would have done to complete the task yourself.
AL OS1 is trained on action spaces, not just on language and vision models, which by themselves can only memorize text or the UI elements shown on the screen. As a result, AL OS1 can navigate any new interface, not only the ones it was trained on. Its self-learning capabilities allow it to understand any element on the screen, predict the next action, and carry it out. Unlike generative AI chatbots, a LAM can perform tasks independently, removing the need for custom integrations such as Application Programming Interfaces (APIs). This is what distinguishes Large Action Models from Large Language Models, which cannot mirror human-like actions on a computer interface.
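The observe-predict-act loop described above can be illustrated with a minimal Python sketch of the Google Photos example. Everything in it is hypothetical: `ToyDesktop` is a simulated screen, the two-action space (`click`, `type`) is an assumption, and `predict_next_action` is a hand-written rule set standing in for the learned model, which the article does not describe. It is a sketch of the loop, not Agile Loop's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Action:
    kind: str       # "click" or "type": a hypothetical two-action space
    target: str     # name of the on-screen element the action applies to
    text: str = ""  # text to enter, used only by "type" actions


@dataclass
class ToyDesktop:
    """Hypothetical simulated screen; not part of AL OS1."""
    open_app: Optional[str] = None
    fields: dict = field(default_factory=lambda: {"os_search_bar": ""})
    log: list = field(default_factory=list)

    def observe(self) -> dict:
        # Visible elements mapped to their current text.
        screen = dict(self.fields)
        if self.open_app is None and self.fields["os_search_bar"] == "Google Photos":
            screen["result:Google Photos"] = ""  # the search result appears
        return screen

    def execute(self, action: Action) -> None:
        self.log.append(action)
        if action.kind == "type" and action.target in self.fields:
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target.startswith("result:"):
            self.open_app = action.target.split(":", 1)[1]
            self.fields = {"app_search_bar": ""}  # the opened app's search bar


def predict_next_action(app: str, query: str, screen: dict) -> Optional[Action]:
    """Hand-written rules standing in for the learned model (assumption)."""
    if "app_search_bar" in screen:
        if screen["app_search_bar"] != query:
            return Action("type", "app_search_bar", query)
        return None  # goal reached: the query is in the app's search bar
    if f"result:{app}" in screen:
        return Action("click", f"result:{app}")
    return Action("type", "os_search_bar", app)


def run_agent(env: ToyDesktop, app: str, query: str, max_steps: int = 10) -> bool:
    """Observe the screen, predict the next action, execute it; repeat."""
    for _ in range(max_steps):
        action = predict_next_action(app, query, env.observe())
        if action is None:
            return True
        env.execute(action)
    return False


desktop = ToyDesktop()
print(run_agent(desktop, "Google Photos", "September 2021"))  # True
for step in desktop.log:
    print(step)
```

Each iteration reads the visible elements, asks the policy for the next action, and executes it, reproducing the three human steps (search, open, search again). In a LAM, the rule set would be replaced by a model meant to generalize to interfaces it has not seen.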
AL OS1 was initially trained through demonstrations, but Agile Loop then shifted to a different approach: building a tool that lets AL OS1 learn new software and actions on its own. This made it possible to collect data with little to no human effort and to apply a reinforcement learning methodology. As a result, the system can generalize to unseen software through its action space and training, producing accurate results even in unfamiliar scenarios.
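To make the idea of self-collected data plus reinforcement learning more concrete, here is a minimal sketch using tabular Q-learning on a toy interface graph. The article does not say which reinforcement learning algorithm AL OS1 uses; Q-learning is swapped in only because it is the simplest to show, and the `UI` graph, reward values, and hyperparameters are illustrative assumptions. The agent explores on its own (epsilon-greedy), logs every transition without human labels, and learns which sequence of actions reaches the goal.

```python
import random
from collections import defaultdict

# Hypothetical interface graph: state -> {action: next_state}.
# Distractor actions are listed first so the agent must learn to avoid them.
UI = {
    "desktop": {"open_settings": "settings", "type_os_search": "search_typed"},
    "settings": {"close_settings": "desktop"},
    "search_typed": {"clear_search": "desktop", "click_result": "app_open"},
    "app_open": {"close_app": "desktop", "type_app_search": "query_typed"},
    "query_typed": {},  # goal: "September 2021" entered in the app's search bar
}
GOAL = "query_typed"


def explore_and_learn(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy exploration with tabular Q-learning updates."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated return
    transitions = []        # self-collected experience; no human labels
    for _ in range(episodes):
        state = "desktop"
        for _ in range(20):  # cap the episode length
            actions = list(UI[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            nxt = UI[state][action]
            reward = 1.0 if nxt == GOAL else -0.01  # small cost per step
            transitions.append((state, action, reward, nxt))
            best_next = max((q[(nxt, a)] for a in UI[nxt]), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if state == GOAL:
                break
    return q, transitions


def greedy_path(q, start="desktop", max_steps=10):
    """Follow the highest-valued action from each state."""
    state, path = start, []
    while state != GOAL and len(path) < max_steps:
        action = max(UI[state], key=lambda a: q[(state, a)])
        path.append(action)
        state = UI[state][action]
    return path


q, transitions = explore_and_learn()
print(greedy_path(q))  # ['type_os_search', 'click_result', 'type_app_search']
```

Because the toy environment is deterministic, the learned policy settles on the same three-step path as the Google Photos example. A system meant to generalize to unseen software would presumably need a learned function approximator rather than a lookup table keyed on named states.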
The Future of Large Action Models
Agile Loop is optimistic about the capabilities of LAMs and asserts with confidence that AL OS1 possesses human-like cognitive abilities. It autonomously conducts research, operates various computer applications through their interfaces, compiles and presents information, and ultimately performs tasks on your behalf.
As Agile Loop works toward making Large Action Models a reality, LAMs can usher in a new era of productivity, simplicity, and transparency, enhancing our proficiency in the tasks we find most captivating and freeing us from those we don't.