
NVIDIA H100 ComfyUI setup


preface

Yesterday I wrote about my AI slop addiction. For now I’ve at least stopped the long nights of image and video prompting. This is a post I was working on while still at it. I thought I’d publish it anyway, since it contains some useful stuff for reference.

NVIDIA H100 ComfyUI setup

I couldn’t resist and wanted to know what it’s like to use an NVIDIA H100 to generate AI image and video slop. You can get one for ~$2/hr (yes, per hour) on Hyperstack, for example.

I set up a VM with an 80GB VRAM H100 and an additional 500GB SSD volume to persist the setup via the Hyperstack UI. I also created an entry for an SSH key to be able to access the VM. My SSH terminal mojo is quite rusty, so I used Claude to help me out a bit.

Hyperstack has some docs on how to access your VM via SSH; in our case this looks like:

Terminal window
ssh ubuntu@<vm's_public_ip> -i ~/.ssh/<filename>
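
If you don’t want to type the key path every time, you could also add a host entry to your SSH config. A minimal sketch (the h100 alias is just a made-up name; the IP and key path are the same placeholders as above):

Terminal window
# Append a host entry to ~/.ssh/config (alias, IP and key path are placeholders)
cat >> ~/.ssh/config <<'EOF'
Host h100
    HostName <vm's_public_ip>
    User ubuntu
    IdentityFile ~/.ssh/<filename>
EOF

With that in place, ssh h100 is enough, and the rsync and tunnel commands further down get shorter too.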

To be able to use the additional volume, I needed to format and mount it. Note that vdc is the device name of the volume, which I found by running lsblk.

Terminal window
# Format
sudo mkfs -t ext4 /dev/vdc
# Create a mount point
sudo mkdir /mnt/comfy-ui-data
# Mount it
sudo mount /dev/vdc /mnt/comfy-ui-data
# Change ownership to the ubuntu user
sudo chown -R ubuntu:ubuntu /mnt/comfy-ui-data
# Set permissions: read/write/execute for the owner, read/execute for everyone else
sudo chmod -R 755 /mnt/comfy-ui-data
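
One caveat: mounted this way, the volume won’t come back after a reboot. If you want that, an /etc/fstab entry should do it. A sketch (the <uuid> placeholder comes from blkid; nofail keeps the VM booting even if the volume is missing):

Terminal window
# Look up the volume's UUID
sudo blkid /dev/vdc
# Add an fstab entry so the volume is mounted on boot (replace <uuid> with the value from blkid)
echo 'UUID=<uuid> /mnt/comfy-ui-data ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
# Check the entry works without rebooting
sudo mount -a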

Next we’ll set up ComfyUI. I mostly followed the instructions on how to install it on Linux.

Terminal window
# clone repository
cd /mnt/comfy-ui-data
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# install PyTorch with CUDA support for NVIDIA GPUs
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121
# install dependencies
pip install -r requirements.txt
# go to ComfyUI/custom_nodes and install ComfyUI-Manager
cd custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager comfyui-manager
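
Note that this installs everything into the system Python, which is what I did on this throwaway VM; a virtual environment would be the cleaner option. Either way, it’s worth checking that PyTorch actually sees the GPU before going further. A quick sanity check (nothing ComfyUI-specific):

Terminal window
# Confirm the driver sees the H100
nvidia-smi
# Confirm PyTorch was built with CUDA support and can see the device
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"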

The next section is about downloading models from Hugging Face into their respective directories. These are reference commands in no particular order; the comment above each command indicates which directory the file belongs in.

Terminal window
# Place in ComfyUI/models/diffusion_models
curl -L "https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/resolve/main/split_files/diffusion_models/hunyuan_video_t2v_720p_bf16.safetensors?download=true" --output hunyuan_video_t2v_720p_bf16.safetensors
curl -L "https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_FastVideo_720_fp8_e4m3fn.safetensors?download=true" --output hunyuan_video_FastVideo_720_fp8_e4m3fn.safetensors
# Place in ComfyUI/models/text_encoders
curl -L "https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/resolve/main/split_files/text_encoders/clip_l.safetensors?download=true" --output clip_l.safetensors
curl -L "https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/resolve/main/split_files/text_encoders/llava_llama3_fp8_scaled.safetensors?download=true" --output llava_llama3_fp8_scaled.safetensors
curl -L "https://huggingface.co/zer0int/LongCLIP-SAE-ViT-L-14/resolve/main/Long-ViT-L-14-GmP-SAE-TE-only.safetensors" --output Long-ViT-L-14-GmP-SAE-TE-only.safetensors
# Place in ComfyUI/models/vae
curl -L "https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/resolve/main/split_files/vae/hunyuan_video_vae_bf16.safetensors?download=true" --output hunyuan_video_vae_bf16.safetensors
# Place in ComfyUI/models/clip_vision
curl -L "https://huggingface.co/openai/clip-vit-large-patch14/resolve/main/model.safetensors?download=true" --output clip-vit-large-patch14_OPENAI.safetensors
# Place in ComfyUI/models/loras
curl -L "https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hyvideo_FastVideo_LoRA-fp8.safetensors" --output hyvideo_FastVideo_LoRA-fp8.safetensors
# Place in ComfyUI/models/unet
curl -L "https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/resolve/main/split_files/diffusion_models/hunyuan_video_t2v_720p_bf16.safetensors" --output hunyuan_video_t2v_720p_bf16.safetensors

Downloading models from Civitai requires an API key; see the docs here: https://education.civitai.com/civitais-guide-to-downloading-via-api/
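
As a rough example (the endpoint format is taken from that guide; the model version ID, API key and output filename are placeholders), a token-authenticated download looks something like this:

Terminal window
# <model-version-id>, <your-api-key> and the output filename are placeholders
curl -L -o <output-filename>.safetensors "https://civitai.com/api/download/models/<model-version-id>?token=<your-api-key>"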

Once all models are downloaded (this can take a while), it’s time to start up ComfyUI.

Terminal window
cd /mnt/comfy-ui-data/ComfyUI
python3 main.py
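
One practical note: started like this, ComfyUI dies together with the SSH session. Running it inside tmux (or under nohup) keeps it alive between connections. A minimal tmux sketch:

Terminal window
# Start ComfyUI in a detached tmux session named comfyui
tmux new-session -d -s comfyui 'cd /mnt/comfy-ui-data/ComfyUI && python3 main.py'
# Re-attach later to check the logs, detach again with Ctrl-b d
tmux attach -t comfyui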

ComfyUI runs a web server. To access the UI, create an SSH tunnel to its exposed port from your local machine like this:

Terminal window
ssh -L 8188:localhost:8188 ubuntu@<ip-address> -i ~/.ssh/<filename>

The -L flag creates the tunnel: the first 8188 is the local port on your machine, and localhost:8188 is the destination host and port as seen from the remote VM. If you want to keep the tunnel running in the background (so you can use the same terminal for other commands), add the -f and -N flags: -f sends ssh to the background after authentication, and -N tells it not to run a remote command.
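
For reference, the backgrounded version of the tunnel would look like this (stop it later with something like pkill -f "8188:localhost:8188"):

Terminal window
# -f backgrounds ssh after authentication, -N skips running a remote command
ssh -f -N -L 8188:localhost:8188 ubuntu@<ip-address> -i ~/.ssh/<filename>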

You can now access ComfyUI at http://localhost:8188 in your local machine’s browser! Have fun creating some slop!

To download slop from the VM to my local machine I used rsync:

Terminal window
# Create local output directory if needed
mkdir -p ~/ComfyUI-output
# Run rsync to download just new data from ComfyUI's output directory
rsync -avzP -e "ssh -i ~/.ssh/<filename>" ubuntu@<ip-address>:/mnt/comfy-ui-data/ComfyUI/output/ ~/ComfyUI-output/
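
Since rsync only transfers new or changed files, you can also just re-run it in a loop while ComfyUI is generating. A trivial sketch, assuming a 60-second interval is frequent enough:

Terminal window
# Pull new outputs every 60 seconds, stop with Ctrl-C
while true; do
  rsync -avzP -e "ssh -i ~/.ssh/<filename>" ubuntu@<ip-address>:/mnt/comfy-ui-data/ComfyUI/output/ ~/ComfyUI-output/
  sleep 60
done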

I already had some LoRAs on my local machine that I wanted to upload to the VM. This is a bit slower because it’s limited by your internet connection’s upload speed, but LoRAs are usually much smaller than checkpoint models.

Terminal window
# Create a local LoRA directory if needed and move the LoRAs you want to upload in there
mkdir -p ~/ComfyUI-loras
# Run rsync the other way around this time to sync new LoRAs to the VM
rsync -avz -e "ssh -i ~/.ssh/<filename>" ~/ComfyUI-loras/ ubuntu@<ip-address>:/mnt/comfy-ui-data/ComfyUI/models/loras/

I played around with creating txt2vid with Hunyuan; here are the docs to get a basic workflow running in ComfyUI: https://blog.comfy.org/p/hunyuanvideo-native-support-in-comfyui

Hunyuan can also be combined with LoRAs; here’s a workflow doing that: https://civitai.com/models/1081086?modelVersionId=1244929 (watch out, CivitAI contains a lot of NSFW stuff)

An interesting finding was that Hunyuan creates perfect looping videos if you set the length to 201 frames!

Hunyuan prompting is quite a rabbit hole. You might get a perfect cinematic video with narrow depth of field for something as simple as the typical “a cat walking on grass” prompt, but as soon as you try something more elaborate it starts to spit out just ugly shit.

For comparison, the img2vid workflow I played with, which used A1111 with Stable Diffusion to create a starting image and then fed that into Kling Video Pro v1.6, got me consistent results I never came close to with Hunyuan’s txt2vid. There are some early-stage img2vid workflows for Hunyuan (https://github.com/AeroScripts/leapfusion-hunyuan-image2video), but IMHO they’re not in the same league as Kling Video. Tencent says official img2vid support is in the works and will come soon.

epilog

That’s it for now. Pro tip: Don’t forget to hibernate your H100 VM once you’re done for the day, otherwise it gets expensive quickly!

As mentioned earlier, I stopped this for now because I ended up spending way too much time just coming up with stupid slop. There are still some worthwhile learnings here: I finally got going with an H100 in the cloud, and it’s an amazing piece of hardware that’s surprisingly simple to use these days. It was also interesting to see how ComfyUI works. I might get back to it for some Super 8 video restoration.

I’ll leave you with a Super 8 still frame of me eating pasta in the late 1970s. It was captured using a KODAK Reels Super 8 film digitizer. By default it delivers pretty shitty quality, but there’s a community-contributed firmware out there that improves things a bit. I then used Final Cut Pro with the Neat Video plugin to remove dust and scratches, followed by another pass through Topaz Video AI for some upscaling and frame rate conversion.

A young walterra eating pasta, Super 8 still frame.
