In just 13 days, OpenAI helped build a large model for an autonomous humanoid robot that can listen, speak, and make decisions on its own. Eric Jang, a senior robotics expert, predicted not long ago: “ChatGPT appeared overnight. I think the same will happen with intelligent robotics.”
He may be right.
Late at night on March 13th, a video of an autonomous humanoid robot began to circulate on X.
OpenAI, which had never publicly shown work in robotics before, demonstrated its robot intelligence capabilities for the first time on a humanoid robot built by a company it has invested in.
Figure, a robotics company backed by OpenAI, uploaded the video. In it, Figure’s humanoid robot converses fluently with a human, understands the person’s intent, follows natural-language instructions to grasp and place objects, and explains why it does what it does.
Behind it is an intelligent brain supplied by OpenAI.
Over the past year of progress in embodied intelligence, you may have seen similar demonstrations of robots making autonomous decisions and picking up objects. But in this video, the Figure humanoid’s conversational fluency, apparent intelligence, and smoothness of movement at close to human operating speed are absolutely top-notch.
Figure also specifically emphasized that the entire video was shot in one continuous take, without any speed-up or editing, and that the robot behaves completely autonomously, with no remote control. It reads as a subtle jab at the Stanford cooking robot that went viral a while ago, which showed off impressive mechanical capabilities but not much intelligence.
What is more striking than the robot’s intelligent performance is that this is only the result of a small initial effort by OpenAI: from the time OpenAI announced its cooperation with Figure to push the frontier of humanoid robots to the release of this video, only thirteen days passed.
The intelligence behind this Figure humanoid robot comes from an end-to-end vision-language model, currently a cutting-edge direction in embodied intelligence. Last year Geek Park reported on Google’s progress in a similar direction: the end-to-end robot control model Google developed was hailed by some in the industry as the GPT-3 moment for large robot models.
At that time, Google’s robot model could only perform some grasping based on dialogue; it could not converse with humans, nor could it explain why it did what it did. And Google itself, starting with Everyday Robots, had more than five years of experience in robotics research behind it.
Figure itself was only founded in 2022, and just 13 days passed between OpenAI announcing its involvement and the two jointly unveiling a robot capable of autonomous dialogue and decision-making.
The development of robot intelligence is obviously accelerating.
1. Driven by an end-to-end large model, the humanoid robot approaches human speed
Figure founder Brett Adcock and AI team lead Corey Lynch explained on X how the robot interaction in this video works.
This breakthrough was made jointly by OpenAI and Figure. OpenAI provides visual reasoning and language understanding, while Figure’s neural network provides fast, low-level, dexterous robot movements.
All of the robot’s actions come from learned, internalized abilities rather than from remote operation.
The researchers feed images from the robot’s camera, along with text transcribed from speech captured by the onboard microphone, into a multimodal model (VLM) trained by OpenAI that understands both images and text. The model processes the entire conversation history, derives a verbal response, and speaks it back to the human via text-to-speech.
The same model is also responsible for deciding which learned closed-loop behaviors to run on the robot to complete a given command, loading specific neural network weights onto the GPU and executing the policy.
This is what makes it “end-to-end” robot control: from the language input onward, the model takes over all processing and directly outputs language and behavior, instead of producing intermediate results that separate programs must then process.
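Based only on the description above, the interaction can be pictured as a single perceive → reason → act loop in which one multimodal model produces both the spoken reply and the choice of learned behavior. The sketch below is a minimal, hypothetical Python illustration; the names (Robot, Turn, vlm_step, and so on) are stand-ins invented for this example, not Figure’s or OpenAI’s actual interfaces.

```python
# Hypothetical sketch of the described control flow; all names are stand-ins.
from dataclasses import dataclass


@dataclass
class Turn:
    role: str                    # "human" or "robot"
    text: str
    image: bytes | None = None   # camera frame attached to the turn


@dataclass
class VLMOutput:
    reply_text: str              # what the robot should say
    behavior_id: str             # which learned closed-loop behavior to run


class Robot:
    """Stand-in for the hardware: camera, microphone, speaker, policies."""

    def capture_image(self) -> bytes: ...
    def transcribe_speech(self) -> str: ...
    def speak(self, text: str) -> None: ...
    def load_policy_weights(self, behavior_id: str) -> None: ...
    def run_policy(self) -> None: ...


def vlm_step(history: list[Turn]) -> VLMOutput:
    """Placeholder for the OpenAI-trained multimodal model: it sees the whole
    conversation history (text + images) and returns both a verbal reply and
    the behavior to execute."""
    raise NotImplementedError


def interaction_loop(robot: Robot) -> None:
    history: list[Turn] = []
    while True:
        # 1. Perceive: transcribed human speech plus the latest camera frame.
        utterance = robot.transcribe_speech()
        frame = robot.capture_image()
        history.append(Turn(role="human", text=utterance, image=frame))

        # 2. Reason: one model produces both language and a behavior choice.
        out = vlm_step(history)
        history.append(Turn(role="robot", text=out.reply_text))

        # 3. Act: speak the reply and run the selected learned behavior.
        robot.speak(out.reply_text)                  # text-to-speech
        robot.load_policy_weights(out.behavior_id)   # weights onto the GPU
        robot.run_policy()                           # closed-loop execution
```

Keeping the full history in the loop is also what would let the model resolve references like “them” and “there” in a follow-up request, as discussed below.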
Figure’s onboard camera captures images at 10 Hz, and the neural network outputs 24-degree-of-freedom actions at 200 Hz.
Figure’s founder noted that this means the robot’s speed has improved significantly and is starting to approach human speed.
The multimodal capabilities of OpenAI’s model are the key to the robot’s ability to interact with the world. The video shows many such moments, for example:
Describe its surroundings.
Use common-sense reasoning when making decisions. For example, “dishes on the table, such as plates and cups, are likely to go into the drying rack next.”
Convert ambiguous high-level requests such as “I’m hungry” into context-appropriate actions, such as “pass the person an apple.”
Describe in plain English why it does a specific action. For example, “This is the only edible item I can offer you from the table.”
The model’s capability also gives the robot short-term memory. For example, when asked in the video, “Can you put them there?”, what does “them” refer to, and where is “there”? Answering correctly requires reflecting on the conversation history.
The specific movements of both hands can be understood in two steps:
First, an internet-pretrained model performs common-sense reasoning over images and text to derive a high-level plan. As shown in the video, Figure’s humanoid robot quickly formed two plans: 1) place the cup on the dish rack, and 2) place the plate on the dish rack.
Second, the 24-DOF actions (wrist poses and finger joint angles) generated by the model at 200 Hz serve as high-rate “setpoints” that an even faster whole-body controller tracks. The whole-body controller ensures safe, stable dynamics, such as maintaining balance.
All behaviors are driven by neural-network visuomotor Transformer policies that map pixels directly to actions.
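To make the two steps concrete, here is a minimal, hypothetical sketch of that hierarchy under the rates quoted above (images at 10 Hz, 24-DOF setpoints at 200 Hz). The class names and loop structure are invented for illustration and are not Figure’s actual code.

```python
# Hypothetical sketch: a visuomotor policy turns pixels into 24-DOF setpoints
# at 200 Hz, and a faster whole-body controller tracks them while keeping the
# robot balanced. Only the 10 Hz / 200 Hz / 24-DOF figures come from the text.
import time

IMAGE_HZ = 10      # camera rate quoted by Figure
ACTION_HZ = 200    # action / setpoint rate quoted by Figure
NUM_DOF = 24       # wrist poses + finger joint angles


class VisuomotorPolicy:
    """Stand-in for the Transformer policy that maps pixels to actions."""

    def act(self, image, behavior_id: str) -> list[float]:
        # Returns NUM_DOF setpoints (wrist poses and finger joint angles).
        raise NotImplementedError


class WholeBodyController:
    """Stand-in for the higher-rate controller that ensures safe, stable
    dynamics (e.g. maintaining balance) while tracking the setpoints."""

    def track(self, setpoints: list[float]) -> None:
        raise NotImplementedError


def control_loop(camera, policy: VisuomotorPolicy,
                 controller: WholeBodyController, behavior_id: str) -> None:
    steps_per_image = ACTION_HZ // IMAGE_HZ   # 20 control steps per frame
    image = camera.read()
    step = 0
    while True:
        if step % steps_per_image == 0:
            image = camera.read()                  # refresh observation at 10 Hz
        setpoints = policy.act(image, behavior_id)  # 24-DOF action
        controller.track(setpoints)                 # balance + tracking
        step += 1
        time.sleep(1.0 / ACTION_HZ)                 # crude pacing at 200 Hz
```

The split mirrors the division of labor described earlier: the slow, pretrained model plans in language, while the fast learned policy and whole-body controller handle dexterity and stability.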
2. From ChatGPT to Sora to robots, OpenAI wants to own the “intelligence” layer of autonomous humanoid robots
OpenAI quietly shut down its robotics team in the summer of 2021, announcing an indefinite halt to its exploration of robotics because it lacked the data needed to train robots to move and reason with artificial intelligence, which had hampered research and development.
But obviously, OpenAI has not given up its focus on this field.
In March 2023, exactly a year ago, Geek Park reported that OpenAI had invested in 1X Technologies, a robot maker from Norway. Its vice president is Eric Jang, quoted at the beginning of this article, who believes embodied intelligence will arrive suddenly.
Coincidentally, the technical direction of 1X Technologies is also end-to-end neural network control of robots.
In early March this year, OpenAI, together with other investors, participated in Figure’s Series B financing, which valued the two-year-old company at US$2.6 billion.
It was after this round of financing that OpenAI announced its cooperation with Figure.
Brett Adcock, the founder of Figure, is a serial entrepreneur with a knack for assembling teams. He has founded at least seven companies over his career; one went public at a valuation of US$2.7 billion, and another was acquired for US$110 million.
After founding the company, he recruited research scientist Jerry Pratt as chief technology officer and former Boston Dynamics/Apple engineer Michael Rose as director of robot control. Corey Lynch, the AI team lead who shared this demo, was previously an AI researcher at Google DeepMind.
Figure says it has recruited hard-core engineering talent across motors, firmware, thermals, electronics, middleware and operating systems, battery systems, actuators and sensors, and mechanical and structural design.
The company is indeed moving fast, and it had results to show even before the OpenAI collaboration. In January 2024, Figure 01, the company’s first humanoid robot, learned to make coffee. Figure said this was driven by an end-to-end neural network, that the robot learned to correct its own mistakes, and that training took 10 hours.
In February, the company showed Figure 01’s latest progress: in that video, the robot had learned to move boxes and deliver them to a conveyor belt, though at only 16.7% of human speed.
Commercialization has also taken its first step: Figure announced a commercial agreement with BMW Manufacturing to integrate AI and robotics into vehicle production, deploying robots at BMW’s plant in Spartanburg, South Carolina.
In the tweet announcing today’s video, Figure said its goal is to train a world model that can eventually drive humanoid robots sold at the billion-unit scale.
However, although the cooperation between OpenAI and Figure is progressing smoothly, OpenAI does not appear to be betting on a single robotics company.
On March 13, Bloomberg reported that Physical Intelligence, a newly founded robotics AI company formed by researchers from Google and professors from the University of California, Berkeley and Stanford University, had received financing from OpenAI.
Unsurprisingly, the company is also working on artificial intelligence that could power general-purpose robotic systems in the future.
Placing multiple bets across the robotics field, and producing a leading large robot model in a 13-day collaboration, OpenAI’s intentions in robotics are attracting attention.
The future of intelligent humanoid robots depends on more than just Musk.