Behind the Paper

Bringing Ancient Art to Life: Animating Shanshui Painting with AI and Perlin Noise

When we think of traditional Chinese Shanshui (mountain-water) paintings, we imagine serene, timeless scenes of misty mountains and flowing rivers—a harmony between nature and artistic expression that has been refined over centuries. But what if these tranquil landscapes could come to life?

In our recent paper, “Generative AI Shanshui Animation Enhancement using Perlin Noise and Diffusion Models,” we set out to explore exactly that: using modern generative AI to animate classical Shanshui art without losing its soul.

The Challenge: Preserving Art in the Age of AI

Generative AI has made incredible strides in image and video synthesis, but traditional art forms like Shanshui painting remain a tough nut to crack. The main hurdles are:

  • Limited training data: There aren’t enough high-quality, digitized Shanshui paintings to train a model from scratch.

  • Aesthetic complexity: Shanshui isn’t just about shapes—it’s about composition, brushstroke style, mood, and cultural nuance.

Simply fine-tuning a diffusion model on a few Shanshui images wasn’t enough. We needed a way to guide the AI to understand the structure and spirit of the art, not just mimic it.

Our Approach: A Hybrid Creative Pipeline

We built a modular system that combines several AI techniques into a coherent creative workflow:

1. Generating the Skeleton with Perlin Noise

Instead of starting from noise or random latent vectors, we used Perlin Noise—a classic computer graphics algorithm—to generate the foundational “skeleton” of the landscape. Perlin Noise gives us natural-looking, continuous variations that mimic the organic flow of ink and brushwork. Mountains, ridges, and water paths emerge in a way that already feels artistic, not algorithmic.
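
To make this step concrete, here is a minimal sketch of how such a skeleton could be produced with the open-source `noise` package. The resolution, octave settings, and ridge threshold are illustrative assumptions, not the exact parameters from the paper.

```python
# A minimal sketch: a Perlin-noise heightmap turned into a sketch-like
# "skeleton" image. All parameters here are illustrative assumptions.
import numpy as np
from noise import pnoise2  # pip install noise
from PIL import Image

def perlin_skeleton(width=512, height=512, scale=128.0,
                    octaves=5, persistence=0.5, lacunarity=2.0, seed=7):
    field = np.zeros((height, width), dtype=np.float32)
    for y in range(height):
        for x in range(width):
            # Stacked octaves give the continuous, organic variation
            # that reads as ridgelines rather than random speckle.
            field[y, x] = pnoise2(x / scale, y / scale,
                                  octaves=octaves,
                                  persistence=persistence,
                                  lacunarity=lacunarity,
                                  base=seed)
    # Keep only values near the zero-crossings of the noise field,
    # which trace thin, flowing contour lines (a crude brush skeleton).
    ridges = (np.abs(field) < 0.02).astype(np.uint8) * 255
    return Image.fromarray(255 - ridges)  # dark lines on white "paper"

perlin_skeleton().save("skeleton.png")
```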

2. Guiding Diffusion with ControlNet and GPT-4

We then used Stable Diffusion paired with ControlNet to “fill in” the skeleton with style, color, and detail. ControlNet ensured the generated structure stayed true to the original sketch, while GPT-4 helped generate rich, descriptive prompts that captured the essence of Shanshui—terms like “misty mountains,” “flowing river,” “distant pine trees,” and “soft ink wash.”

This prompt engineering step was crucial. It allowed us to steer the diffusion model toward artistic authenticity without needing thousands of training examples.
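
As a rough illustration of this stage, the sketch below wires a scribble-conditioned ControlNet into a Stable Diffusion pipeline using the Hugging Face `diffusers` library. The specific checkpoints and the GPT-4-style prompt are assumptions for illustration, not necessarily the exact models or prompts used in the paper.

```python
# A hedged sketch of the stylization step using Hugging Face diffusers.
# The checkpoints and prompt below are illustrative assumptions.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

skeleton = Image.open("skeleton.png").convert("RGB")  # from the Perlin step
# A GPT-4-style descriptive prompt capturing Shanshui vocabulary.
prompt = ("traditional Chinese shanshui ink painting, misty mountains, "
          "flowing river, distant pine trees, soft ink wash, muted tones")

image = pipe(prompt, image=skeleton, num_inference_steps=30,
             guidance_scale=7.5).images[0]
image.save("shanshui_frame.png")
```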

3. From Image to Animation with AnimateDiff

Here’s where the magic happens: turning a static painting into a living animation. We developed an Image-to-Video (I2V) Encoder that prepares the generated landscape for AnimateDiff, a diffusion-based video generation model. By introducing controlled noise and temporal dynamics, we created smooth, coherent motion—clouds drifting, water flowing, leaves rustling—all while preserving the painting’s style.
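
Our I2V Encoder itself isn't reproduced here, but the sketch below shows the surrounding AnimateDiff machinery as exposed by `diffusers`. The motion-adapter checkpoint, frame count, and scheduler settings are assumptions chosen for illustration.

```python
# A minimal AnimateDiff sketch via diffusers; the paper's I2V Encoder is
# not shown, so this uses the standard motion-module path with assumed
# checkpoints and settings.
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    motion_adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")
# AnimateDiff is typically paired with a linear-beta DDIM scheduler.
pipe.scheduler = DDIMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler",
    beta_schedule="linear", clip_sample=False,
)

result = pipe(
    prompt="shanshui ink painting, drifting clouds, flowing water",
    num_frames=16,           # a short, loopable clip
    num_inference_steps=25,
    guidance_scale=7.5,
)
export_to_gif(result.frames[0], "shanshui.gif")
```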

4. Refining with Textual Inversion and LoRA

To further enhance quality, we used Textual Inversion to teach the model what not to generate (e.g., “blurry,” “oversaturated”), and experimented with LoRA fine-tuning to adapt the model more closely to Shanshui aesthetics. Interestingly, we found that a well-designed Perlin Noise backbone often outperformed LoRA in maintaining structural integrity and stylistic purity.
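
For completeness, here is a rough sketch of how both refinements plug into the ControlNet pipeline from the earlier snippet. The embedding file, trigger token, and LoRA path are hypothetical placeholders, since the trained weights are not part of this post.

```python
# Hedged sketch, continuing from the ControlNet pipeline `pipe` above.
# File paths and the trigger token are hypothetical placeholders.

# Negative embedding: the learned token goes in the *negative* prompt,
# steering generation away from undesired traits.
pipe.load_textual_inversion("embeddings/shanshui-negative.pt",
                            token="<shanshui-neg>")
negative_prompt = "<shanshui-neg>, blurry, oversaturated"

# Optional LoRA adaptation toward Shanshui aesthetics.
pipe.load_lora_weights("loras/shanshui-style")

image = pipe(prompt, image=skeleton,
             negative_prompt=negative_prompt,
             num_inference_steps=30).images[0]
```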