[DEPRECATED] Moved to ROCm/rocm-systems repo
Online CUDA Occupancy Calculator
(Spring 2017) Assignment 2: GPU Executor
Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line
GPU Drano: static analysis for GPU programs.
Prototype of a SPIR-V assembler and disassembler, providing a composable Java interface for generating SPIR-V code at runtime.
Open source skill library for AI coding agents to write, optimize, and debug high performance compute kernels across CUDA, Triton, and quantized workloads.
Real-time NVIDIA GPU command capture, decoding, and visualization
A self-hosted low-level functional-style programming language 🌀
Noeris: autonomous kernel fusion discovery and Triton autotuning for LLM kernels, with deeper fusion of Gemma layers (A100/H100 wins).
High-performance GPU-accelerated C# scripting for Rhino Grasshopper, powered by ILGPU
The official implementation for paper "AdaExplore: Failure-Driven Adaptation and Diversity-Preserving Search for Efficient Kernel Generation"
Medical AI diagnostics system implementing real compiled Mojo GPU kernels with MAX Graph integration
🍭 Sweet GPU compute kernels in CUDA, wrapped via CuPy
Runtime correctness checker for custom CUDA kernels. Attach a single decorator to periodically verify outputs against a reference implementation, with outlier-biased sampling and zero training graph impact.
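The pattern this checker describes can be sketched in plain Python. This is an illustrative sketch only, not the repo's actual API: the decorator name `verified`, its parameters, and the uniform sampling (the repo advertises outlier-biased sampling, which is omitted here) are all assumptions. On a sampled fraction of calls, a trusted reference implementation is re-run and compared against the fast kernel's output.

```python
import math
import random

def verified(reference, sample_rate=0.1, tol=1e-6):
    """Hypothetical decorator (illustrative, not the repo's API): on a
    sampled fraction of calls, re-run a trusted reference implementation
    and compare its output against the fast kernel's."""
    def wrap(fast_fn):
        def inner(xs):
            out = fast_fn(xs)
            if random.random() < sample_rate:  # sampled verification
                ref = reference(xs)
                if any(abs(a - b) > tol for a, b in zip(out, ref)):
                    raise RuntimeError(
                        f"{fast_fn.__name__} diverged from reference")
            return out
        return inner
    return wrap

def softmax_ref(xs):
    """Numerically stable softmax used as the trusted reference."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

@verified(softmax_ref, sample_rate=1.0)  # check every call, for the demo
def softmax_fast(xs):
    # Stand-in for a custom GPU kernel; in practice this would launch CUDA.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]
```

With a low `sample_rate`, most calls skip the reference entirely, which is how a scheme like this keeps overhead off the training hot path: the check only reads the already-computed output and never alters it.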
Production-grade Triton kernel fusing residual add + RMSNorm + packed QKV projection into a single GPU launch for decoder-only transformer inference (Llama-3, Mistral, Qwen2). +2.4% tok/s, -1.5 GB VRAM on A10G.
A 16-step CUDA optimization of FlashAttention-2 for the Ampere architecture, reaching 99.2% of the official implementation's performance on an A100.