image-blaster — Turn a Single Photo into a 3D Environment + Audio + Mesh in One Pipeline

Take a single photo, run a Claude-orchestrated workflow, and walk away with an explorable 3D environment, ambient audio, and a downloadable object mesh, typically in under five minutes (actual time varies with API latency and queueing). That is the pitch behind image-blaster, an open-source Claude Code skill pipeline that chains World Labs Marble, nano-banana, GPT Image 2, Hunyuan-3D, and ElevenLabs SFX into a single workflow, with the last two served through FAL. It is a genuine creative experiment worth knowing about, with some real API costs and rough edges to factor in.

Source: Weibo post by AI creator 陆玉金, May 2026 (Weibo URL not independently verified; no direct permalink available). Primary verifiable source: GitHub repo github.com/neilsonnn/image-blaster

The pipeline at a glance

                        ┌────────────────────┐
                        │   Input Image      │
                        │  (single photo)    │
                        └────────┬───────────┘
                                 │
                    ┌────────────▼────────────┐
                    │  nano-banana            │
                    │  (cleanup + reference)  │
                    └────────────┬────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                  │
   ┌──────────▼──────┐  ┌────────▼──────┐  ┌───────▼────────┐
   │  World Labs     │  │  Hunyuan-3D   │  │  ElevenLabs    │
   │  Marble         │  │  (via FAL)    │  │  SFX           │
   │  (3D world      │  │  (mesh model) │  │  (ambient +    │
   │   exploration)  │  │               │  │   object audio)│
   └──────────┬──────┘  └────────┬──────┘  └───────┬────────┘
              │                  │                  │
              └──────────────────▼──────────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │   Complete Scene:       │
                    │   3D env + mesh + audio │
                    └─────────────────────────┘

              (gpt-image-2 invoked optionally
               when image editing is needed
               before feeding into pipeline)
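
The shape of the diagram above (one preprocessing step, then three independent branches) can be sketched in a few lines of Python. This is a minimal illustration, not the repo's actual code: the functions `cleanup`, `make_world`, `make_mesh`, and `make_audio` are hypothetical stand-ins that return strings instead of calling any provider, and running the branches concurrently is an assumption based on the diagram rather than something the source confirms.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Scene:
    world: str  # placeholder for the Marble world reference
    mesh: str   # placeholder for the Hunyuan-3D mesh file
    audio: str  # placeholder for the ElevenLabs SFX clip


# Hypothetical stand-ins: each would wrap a real provider call.
def cleanup(image: str) -> str:
    return f"clean({image})"


def make_world(clean: str) -> str:
    return f"world[{clean}]"


def make_mesh(clean: str) -> str:
    return f"mesh[{clean}]"


def make_audio(clean: str) -> str:
    return f"audio[{clean}]"


def run_pipeline(image: str) -> Scene:
    clean = cleanup(image)  # preprocessing runs first
    # Each branch depends only on the cleaned image, so the three
    # calls can be submitted together and collected afterwards.
    with ThreadPoolExecutor(max_workers=3) as pool:
        world = pool.submit(make_world, clean)
        mesh = pool.submit(make_mesh, clean)
        audio = pool.submit(make_audio, clean)
        return Scene(world.result(), mesh.result(), audio.result())
```

Because the real branches are slow network calls, a thread pool is a reasonable fit here: total wall time is bounded by the slowest branch rather than the sum of all three.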

The five model components

Model	Role in pipeline	Notes
World Labs Marble	Generates an interactive 3D Gaussian-splat environment from the input image	Co-founded by Fei-Fei Li (李飞飞); World Labs API key required; check current credit pricing before running at scale
nano-banana	Image cleanup — source cleanup, clean plates, and object reference image prep	Runs as the first preprocessing step so downstream models get clean input
gpt-image-2	Alternate image-edit provider — named in the README as an alternative to nano-banana when the edit skill is instructed to prefer it	Optional; OpenAI API key required if used (not listed in the default `.env.example`)
Hunyuan-3D (via FAL)	Generates downloadable 3D mesh models of objects extracted from the image	Runs through FAL’s inference API; FAL key required; produces per-object geometry, not a full-scene mesh
ElevenLabs SFX	Generates ambient environmental audio and per-object sound effects	Runs through FAL’s inference API (no separate ElevenLabs key required); outputs audio that matches the visual scene content

What you can generate

image-blaster turns a flat photograph into a multi-modal creative artifact:

Interactive 3D environment — walk around the scene using the World Labs Marble viewer (browser-based; verify current viewer requirements at worldlabs.ai)
Downloadable object mesh — per-object 3D geometry (Hunyuan-3D output via FAL) you can import into Blender, Unity, or any 3D tool; note this is object-level mesh, not a full reconstructed scene mesh
Ambient audio layer — ElevenLabs SFX generates soundscapes based on the scene content; specific audio examples (e.g. particular nature or crowd sounds) are illustrative and will vary by model output
Edited reference images — GPT Image 2 can pre-process or stylize the source before it enters the pipeline, giving you creative control over the starting point

The repo claims under five minutes in typical runs; actual time varies with API latency, queueing, and model choices. The default .env.example requires only two API keys: WORLD_LABS_API_KEY and FAL_KEY. ElevenLabs SFX and Hunyuan-3D both run through FAL’s inference API, so no separate ElevenLabs key is needed for the default workflow. GPT Image 2 is an alternate image-edit path — if you use it, an OpenAI key is also required (not documented in the default .env.example). Check current pricing at World Labs and FAL before running at any scale.

How LearnAI Team Could Use This

For Q’s AI education work at Monmouth University, image-blaster is genuinely instructive at multiple levels:

In CS-310 (Object-Oriented Design): The pipeline is a strong OO design teaching example. Each model is a loosely coupled component with a single responsibility: nano-banana cleans input, World Labs generates the world, Hunyuan-3D produces geometry, ElevenLabs handles audio. The integration layer that wires them together mirrors what students build in design pattern exercises. Running the tool and then asking “how would you architect this yourself?” gives students a concrete, working instance of the Facade and Chain-of-Responsibility patterns.
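
As a classroom starting point, the component roles described above could be modeled roughly like this. It is a sketch of the Facade idea only (Chain-of-Responsibility is left as an exercise), and every name in it (`Stage`, `StubStage`, `SceneFacade`) is hypothetical rather than taken from the image-blaster repo.

```python
from typing import Protocol


class Stage(Protocol):
    """Anything with a name and a run() method counts as a stage."""
    name: str

    def run(self, image: str) -> str: ...


class StubStage:
    """Hypothetical stand-in for one provider wrapper."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, image: str) -> str:
        return f"{self.name}({image})"


class SceneFacade:
    """Single entry point that hides the individual stages (Facade)."""

    def __init__(self, cleaner: Stage, generators: list[Stage]) -> None:
        self.cleaner = cleaner
        self.generators = generators

    def build(self, image: str) -> dict[str, str]:
        clean = self.cleaner.run(image)
        # Every generator receives the same cleaned image.
        return {g.name: g.run(clean) for g in self.generators}


facade = SceneFacade(
    StubStage("clean"),
    [StubStage("world"), StubStage("mesh"), StubStage("audio")],
)
print(facade.build("photo.jpg"))
```

Swapping one model for another then means passing a different object that satisfies `Stage`, which is the property students can be asked to preserve in their own designs.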

Live classroom demos: A single dramatic demo (drop a photo of campus, wait about five minutes, walk around a 3D version with audio) creates the kind of moment that makes students actually curious about AI systems rather than just intimidated. It demonstrates multi-modal AI without requiring a lecture on transformers.

Explaining multi-model pipelines: Most students first encounter AI as a single model doing a single thing. image-blaster breaks that mental model immediately: five different models covering image editing, 3D worlds, object meshes, and audio, all cooperating. It’s an honest illustration of how production AI systems actually work.

Hackathon seed: For AI hackathons or creative projects, image-blaster provides an instant demo scaffold. Students can clone it, swap out one model, and have something novel within hours rather than days.

Real-World Use Cases

Use Case	Who Benefits	How
Game asset prototyping	Indie developers	Convert concept art into explorable 3D environments before committing to full modeling
Virtual museum exhibits	Educators, archivists	Turn historical photographs into explorable 3D environments with ambient audio, plus object-level meshes from Hunyuan-3D (not a full scene reconstruction)
Architecture visualization	Designers	From a single render or mood board image, create a quick spatial impression for clients
Film pre-visualization	Directors, producers	Convert storyboard panels into navigable 3D previews during pre-production
AI education demos	Teachers, workshop facilitators	Show students what a real multi-model pipeline looks like without writing a line of code
Social media content	Creators	Generate novel 3D + audio content from a single hero image for short-form video

Important things to know

Early-stage software. image-blaster is an open-source experiment, not a production SaaS. Expect rough edges: undocumented flags, occasional model API changes that break the pipeline, and limited error handling when upstream services are slow or down. Pin dependency versions if you use this in any stable workflow.

API costs are real. The pipeline bills through two providers: World Labs (for Marble world generation) and FAL (for Hunyuan-3D mesh and ElevenLabs SFX, both routed through FAL’s inference API). If you use the GPT Image 2 editing path, OpenAI also charges for that step. The original source describes World Labs as “zero marginal cost,” but current World Labs documentation indicates API credits are limited, so treat that claim with caution and check current credit pricing before running at scale. FAL charges per generation for each model call (e.g. Hunyuan-3D is priced per mesh generation). Set spending caps before experimenting freely.
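
A rough budget check can make the spending-cap advice concrete. The sketch below assumes every step is billed per call in USD, which may not match how World Labs credits actually work, and the prices in the usage lines are placeholders chosen for easy arithmetic, not real rates. `Prices`, `run_cost`, and `runs_within_cap` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Per-call prices in USD; fill in from the providers' pricing pages."""
    world: float
    mesh: float
    sfx: float
    image_edit: float = 0.0  # only relevant on the optional GPT Image 2 path


def run_cost(p: Prices, meshes: int = 1, sfx_clips: int = 1, edits: int = 0) -> float:
    """Cost of one pipeline run: one world plus per-object meshes and clips."""
    return p.world + meshes * p.mesh + sfx_clips * p.sfx + edits * p.image_edit


def runs_within_cap(cap: float, cost_per_run: float) -> int:
    """How many whole runs fit under a spending cap."""
    if cost_per_run <= 0:
        raise ValueError("cost_per_run must be positive")
    return int(cap // cost_per_run)


# Placeholder numbers only, NOT actual provider pricing.
placeholder = Prices(world=1.0, mesh=0.5, sfx=0.25)
per_run = run_cost(placeholder, meshes=2, sfx_clips=2)
print(per_run, runs_within_cap(10.0, per_run))
```

Note that mesh and audio costs scale with the number of objects extracted from the image, so a cluttered photo can cost noticeably more than a single-subject one.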

World Labs provenance matters — but is not a stability guarantee. Fei-Fei Li (李飞飞) co-founded World Labs, which gives the Marble world model genuine research credibility. That said, “co-founded by a notable researcher” is not the same as “production-ready infrastructure.” API behavior, pricing, and availability can change on a research-stage platform. Treat it as a powerful preview, not a stable dependency.

Only two keys are required for the default workflow. The .env.example lists only WORLD_LABS_API_KEY and FAL_KEY. ElevenLabs SFX and Hunyuan-3D both run through FAL, so no separate ElevenLabs key is needed. GPT Image 2 is an alternate image-edit provider — add an OpenAI key only if you instruct the pipeline to use it (not documented in the default .env.example).
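
A small preflight check along these lines can catch a missing key before any paid call is made. The two required names come from the `.env.example` described above; `OPENAI_API_KEY` is an assumed variable name, since the source does not say which one the GPT Image 2 path reads, and `missing_keys` is a hypothetical helper.

```python
import os

REQUIRED_KEYS = ("WORLD_LABS_API_KEY", "FAL_KEY")
# Assumed name: the source does not document the variable for the OpenAI path.
OPTIONAL_OPENAI_KEY = "OPENAI_API_KEY"


def missing_keys(env, use_gpt_image=False):
    """Return the names of keys that are absent or empty in env."""
    wanted = list(REQUIRED_KEYS)
    if use_gpt_image:
        wanted.append(OPTIONAL_OPENAI_KEY)
    return [name for name in wanted if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    print("missing:", missing if missing else "none")
```

Passing the environment in as an argument, rather than reading `os.environ` inside the function, keeps the check easy to test with a plain dictionary.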

Output quality varies with input. Like all image-to-3D pipelines, quality degrades significantly with complex scenes, unusual perspectives, or heavily occluded objects. Hunyuan-3D produces object-level geometry, not a full reconstructed scene — do not expect a complete navigable mesh of the whole image. Clean, well-lit, single-subject photos produce the best results.

Privacy and licensing. If you use campus photos, student work, or any images that include people’s faces for demos, check your institution’s acceptable-use and image-rights policies before running them through third-party cloud APIs. Each provider’s data retention terms apply.

Third-party API fragility. The pipeline depends on two primary external providers (World Labs and FAL), with FAL itself routing calls to Hunyuan-3D and ElevenLabs SFX. OpenAI is an optional third. If any provider has an outage, changes its API contract, or deprecates a model, the relevant pipeline step breaks. Pin API client versions and build in graceful fallback logic if you use this in any recurring workflow.
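
One way to approach the fallback logic mentioned above is a small wrapper that retries a step with exponential backoff and then degrades to a fallback instead of crashing the whole run. This is a generic sketch, not something the repo provides: `call_with_fallback` is a hypothetical helper, and the callables passed to it would be your own wrappers around provider calls.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Optional[Callable[[], T]] = None,
    retries: int = 2,
    delay: float = 1.0,
) -> Optional[T]:
    """Try primary up to retries + 1 times, then fallback; None if all fail."""
    for attempt in range(retries + 1):
        try:
            return primary()
        except Exception as exc:  # broad on purpose: provider errors vary
            print(f"attempt {attempt + 1} failed: {exc}")
            if attempt < retries:
                time.sleep(delay * 2 ** attempt)  # exponential backoff
    if fallback is not None:
        try:
            return fallback()
        except Exception as exc:
            print(f"fallback failed: {exc}")
    return None
```

A caller might wrap the mesh step this way and treat a `None` result as "skip the mesh, keep the world and audio", so that one provider outage does not discard the output of the others.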