Meta AI’s model for bringing children’s drawings to life

“I vividly remember listening as a child to Harold and the Purple Crayon, the story of a boy whose magic crayon drawings came to life. It was something I wished had existed when I was a kid, and now, all these years later, I could help make it real,” said Jesse Smith, a postdoctoral researcher working with Meta AI’s Creativity Pillar. He was part of the team that built the first AI-powered animation tool capable of automatically animating children’s drawings of human figures. The whole process takes only a few minutes.

Why do we need this?

Children’s drawings are often unique, inventive, and above all abstract, and different adults may recognize the people and things in them in different ways. A human, such as a parent or teacher, finds it easy to recognize what these drawings depict, but for an AI this abstraction poses a huge challenge. Even a state-of-the-art artificial intelligence system trained to spot objects in photorealistic images can be confused by the whimsical, unconventional appearance of children’s drawings.

Smith mentioned that what surprised him was how difficult it is to get a model to predict a good character segmentation for animation. A plausible reason is that many figures are drawn hollow: part or all of the body is outlined with a stroke, but the interior is left unmarked. The inside and outside of the figure share the same color and texture, so neither can be relied on to infer which pixels belong to the character.

To this end, Meta has introduced an AI system that automatically animates children’s hand-drawn figures and brings them to life. A user simply uploads a drawing to the prototype system, and the extracted figure can then be animated to dance, jump, and more. These animations can be downloaded and shared with loved ones.

https://www.facebook.com/watch/?v=196558752689269

Object detection for children’s drawings

First, the researchers devised a way to distinguish the human figure from the background and from other figures in the image. To extract human-like characters from the drawings, Meta AI researchers used Mask R-CNN, a convolutional object-detection and instance-segmentation model, as implemented in Detectron2. The model, built on a ResNet-50 + FPN backbone, was fine-tuned to predict a single class, “human figure”, using about 1,000 drawings.
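
To make the setup concrete, here is a minimal sketch of how such a fine-tune might look in Detectron2: start from a COCO-pretrained Mask R-CNN with a ResNet-50 + FPN backbone and shrink the ROI head to a single class. The dataset name and solver values are illustrative assumptions, not Meta’s actual training configuration.

```python
# Sketch: fine-tuning Mask R-CNN (ResNet-50 + FPN) in Detectron2 to predict
# a single "human figure" class. Dataset name and hyperparameters are
# illustrative assumptions.
import os
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
# COCO-pretrained Mask R-CNN config and weights from the model zoo.
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")

cfg.DATASETS.TRAIN = ("childrens_drawings_train",)  # hypothetical registered dataset
cfg.DATASETS.TEST = ()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # single class: "human figure"
cfg.SOLVER.IMS_PER_BATCH = 2         # assumed batch size
cfg.SOLVER.BASE_LR = 0.00025         # assumed learning rate
cfg.SOLVER.MAX_ITER = 3000           # assumed iteration budget

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```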

After identifying and extracting the human figures, the next step is masking: separating each figure from the rest of the scene with a mask that closely mirrors its outline. That mask is then used to create a mesh, which is ultimately deformed to produce the animation.

The researchers developed a classical image-processing approach: crop the image to the predicted bounding box of each detected character, apply adaptive thresholding and morphological closing, flood fill from the edges of the box, and take as the mask the largest region left untouched by the flood fill.
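
A rough sketch of that pipeline in OpenCV might look as follows; the threshold block size, the kernel size, and the use of connected components to pick the largest untouched region are assumptions for illustration, not Meta’s exact parameters.

```python
# Sketch: classical masking as described above. Crop to the predicted box,
# adaptive-threshold, close small stroke gaps, flood fill the background from
# the border, and keep the largest region the fill never reached.
import cv2
import numpy as np

def extract_figure_mask(image, box):
    x, y, w, h = box                      # predicted bounding box
    crop = image[y:y + h, x:x + w]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)

    # Adaptive thresholding is robust to paper color and uneven lighting.
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY_INV, blockSize=15, C=8)  # assumed parameters

    # Morphological closing bridges small gaps in the drawn strokes.
    kernel = np.ones((5, 5), np.uint8)
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

    # Flood fill the exterior background from every border pixel.
    flood = closed.copy()
    ffmask = np.zeros((closed.shape[0] + 2, closed.shape[1] + 2), np.uint8)
    for px in range(closed.shape[1]):
        for py in (0, closed.shape[0] - 1):
            if flood[py, px] == 0:
                cv2.floodFill(flood, ffmask, (px, py), 255)
    for py in range(closed.shape[0]):
        for px in (0, closed.shape[1] - 1):
            if flood[py, px] == 0:
                cv2.floodFill(flood, ffmask, (px, py), 255)

    # Pixels the fill never reached lie inside the figure; the mask is the
    # strokes plus that interior, reduced to its largest connected component.
    interior = cv2.bitwise_or(closed, cv2.bitwise_not(flood))
    n, labels, stats, _ = cv2.connectedComponentsWithStats(interior)
    if n <= 1:
        return interior
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    return (labels == largest).astype(np.uint8) * 255
```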

To identify the key points of the human figures, the researchers used AlphaPose, a model trained for human pose detection. Using the pose detector fine-tuned on the initial dataset, the researchers then built an in-house tool that allows parents to upload and animate their children’s drawings.
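
AlphaPose can emit COCO-style 17-keypoint predictions; below is a hypothetical sketch of turning one such prediction into a simple bone hierarchy for the character. The joint names, midpoint construction, and parenting are chosen here for illustration and are not the tool’s actual rig.

```python
# Sketch: from COCO-style keypoints to a toy skeleton (joints + bones).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def build_skeleton(keypoints):
    """keypoints: list of (x, y) pixel coordinates in the order above."""
    kp = dict(zip(COCO_KEYPOINTS, keypoints))
    mid = lambda a, b: ((kp[a][0] + kp[b][0]) / 2, (kp[a][1] + kp[b][1]) / 2)
    joints = {
        "hip": mid("left_hip", "right_hip"),            # rig root
        "neck": mid("left_shoulder", "right_shoulder"),
        **{name: kp[name] for name in COCO_KEYPOINTS[5:]},
    }
    # (child, parent) pairs define the bone hierarchy.
    bones = [
        ("neck", "hip"),
        ("left_shoulder", "neck"), ("left_elbow", "left_shoulder"),
        ("left_wrist", "left_elbow"),
        ("right_shoulder", "neck"), ("right_elbow", "right_shoulder"),
        ("right_wrist", "right_elbow"),
        ("left_hip", "hip"), ("left_knee", "left_hip"),
        ("left_ankle", "left_knee"),
        ("right_hip", "hip"), ("right_knee", "right_hip"),
        ("right_ankle", "right_knee"),
    ]
    return joints, bones
```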

For the final animation task, the researchers used the extracted mask to generate a mesh and textured it with the original drawing; they then rigged the character with a skeleton built from the predicted joint locations. They took advantage of the “twisted perspective” children use when drawing: the researchers determine whether a motion is more recognizable from the front or from the side, considering the lower and upper body separately. The motion is then projected onto a single 2D plane and used to drive the character. The results were validated through perceptual user studies conducted on Mechanical Turk.
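
As a toy illustration of the projection idea, the sketch below retargets one frame of 3D motion-capture joints onto a 2D plane, choosing a frontal or side view per body half with a simple motion-spread heuristic. The body grouping and the heuristic are assumptions for illustration, not the paper’s actual view-selection method.

```python
# Sketch: projecting 3D joint positions onto a 2D drawing plane, with the
# frontal (x-y) or sagittal (z-y) view chosen per body half.
import numpy as np

FRONTAL = np.array([[1.0, 0.0, 0.0],    # world x -> plane x
                    [0.0, 1.0, 0.0]])   # world y -> plane y
SAGITTAL = np.array([[0.0, 0.0, 1.0],   # world z -> plane x
                     [0.0, 1.0, 0.0]])  # world y -> plane y

def project_pose(joints_3d, upper_idx, lower_idx):
    """joints_3d: (J, 3) world-space joint positions for one frame.
    upper_idx / lower_idx: index lists for upper- and lower-body joints."""
    pose_2d = np.zeros((joints_3d.shape[0], 2))
    for idx in (upper_idx, lower_idx):
        part = joints_3d[idx]
        # Heuristic stand-in for "which view makes the motion most
        # recognizable": pick the plane the joints spread out in more.
        spread_front = np.ptp(part[:, 0])  # extent along world x
        spread_side = np.ptp(part[:, 2])   # extent along world z
        plane = FRONTAL if spread_front >= spread_side else SAGITTAL
        pose_2d[idx] = part @ plane.T
    return pose_2d
```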

Looking ahead

The researchers believe this work could be extended to apply motions better suited to particular subcategories of drawings, and that finer-grained analysis could increase the appeal of the animations. They also anticipate that an AI system could one day create detailed animations from more complex drawings.