I develop novel deep learning solutions. Strong foundation in generative models (LLMs, diffusion, flow-based models, GANs, VAEs). Proven ability to implement cutting-edge research and to design and deploy high-performance models using Torch, JAX, CUDA, and Rust.
Experience
Symbolica AI | Founding Research Engineer | Apr 2023 – Present
Developed efficient deep generative models for reinforcement learning-based fine-tuning of LLMs, including methods for synthetic data generation.
Owned the full model lifecycle, including data pipelines, distributed training, and model architecture.
Contributed to team recruitment and engineering best practices.
AI Teach U | Lead Research Engineer | Nov 2018 – Dec 2023
Designed and built deep learning models for computer game animation, including a video-to-3D facial animation system and audio synthesis using GANs, optical flow, pose estimation, and flow-based models.
Developed infrastructure for automated data ingestion and model retraining.
Managed full ML pipeline from data processing to deployment.
ClearML (formerly Allegro.AI) | Research Scientist | Nov 2016 – Dec 2019
Developed novel solutions for diverse computer vision research projects, including CNN architecture optimization, self-improving object detection, semantic segmentation, and a video search engine.
Built end-to-end training and evaluation pipelines for these projects.
Projects
Efficient Theorem Proving via Structured Generative Models (Symbolica)
Developed a structured generative model, a theorem-identification algorithm, and synthetic data generation methods to enhance RL-based proof search in a distributed environment.
Achieved a 2.5x reduction in generation steps and 50% faster search.
Addressed GPU underutilization and the combinatorial complexity of theorem identification.
Llama Inference Engine in Pure Rust (Personal Project)
Built a high-performance Llama inference engine from scratch in pure Rust (GitHub repo).
Optimized matrix-vector multiplication, parallelized computation, and implemented quantization (see the sketch below).
Demonstrates expertise in transformer models, Rust programming, and performance optimization.
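A minimal illustrative sketch of the kind of block-quantized, thread-parallel matrix-vector product described above; it is not the engine's actual code, and the block size, scale layout, and function names are assumptions for the example.

```rust
// Sketch: 8-bit block-quantized matvec, rows split across std threads.
// Each row stores i8 weights plus one f32 scale per block of BLOCK values.

const BLOCK: usize = 32;

struct QuantRow {
    scales: Vec<f32>, // one scale per block of BLOCK weights
    quants: Vec<i8>,  // quantized weights, length = number of columns
}

fn quantize_row(row: &[f32]) -> QuantRow {
    let mut scales = Vec::new();
    let mut quants = Vec::with_capacity(row.len());
    for block in row.chunks(BLOCK) {
        let max = block.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
        let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
        scales.push(scale);
        quants.extend(block.iter().map(|&x| (x / scale).round() as i8));
    }
    QuantRow { scales, quants }
}

// y[r] = dot(W[r], x); contiguous chunks of rows are handled by separate threads.
fn qmatvec(rows: &[QuantRow], x: &[f32], y: &mut [f32]) {
    if rows.is_empty() {
        return;
    }
    let n_threads = std::thread::available_parallelism().map_or(1, |n| n.get());
    let chunk = (rows.len() + n_threads - 1) / n_threads;
    std::thread::scope(|s| {
        for (row_chunk, y_chunk) in rows.chunks(chunk).zip(y.chunks_mut(chunk)) {
            s.spawn(move || {
                for (row, out) in row_chunk.iter().zip(y_chunk.iter_mut()) {
                    let mut acc = 0.0f32;
                    for (b, block) in row.quants.chunks(BLOCK).enumerate() {
                        // Accumulate in f32, then apply the per-block scale once.
                        let mut block_acc = 0.0f32;
                        for (&q, &xv) in block.iter().zip(&x[b * BLOCK..]) {
                            block_acc += q as f32 * xv;
                        }
                        acc += row.scales[b] * block_acc;
                    }
                    *out = acc;
                }
            });
        }
    });
}

fn main() {
    let cols = 64;
    let w: Vec<Vec<f32>> = (0..8)
        .map(|r| (0..cols).map(|c| (r * cols + c) as f32 * 0.01).collect())
        .collect();
    let rows: Vec<QuantRow> = w.iter().map(|r| quantize_row(r)).collect();
    let x = vec![1.0f32; cols];
    let mut y = vec![0.0f32; w.len()];
    qmatvec(&rows, &x, &mut y);
    println!("y[0..4] = {:?}", &y[..4]);
}
```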
OpenAI Triton Compiler Contributions and Rust-CUDA Transpiler Development (Symbolica/Personal)
Developed high-performance GPU algorithms using CUDA and Rust, including a CUDA-based compression algorithm that reduced memory requirements by 98%.
Contributed to OpenAI's Triton compiler and developed a Rust-CUDA/Triton transpiler at Symbolica.
Enabled straightforward integration of GPU kernels inside Rust code.
Optimized CNN Architectures for Object Detection on Edge Devices (ClearML)
Designed and implemented optimized CNN architectures for embedded systems, achieving a 150% inference speed-up through dynamic network sizing.
Developed a technique that adjusts the network size based on the predicted complexity of each frame (resulting in a US patent); a simplified sketch follows.
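A simplified, purely illustrative sketch of per-frame dispatch between a small and a large detector driven by a cheap complexity estimate; it does not reproduce the patented method, and the frame-difference heuristic, trait names, and threshold are assumptions for the example.

```rust
// Sketch: dynamic network sizing. A cheap per-frame complexity score decides
// whether the small or the large detector runs on a given frame.

trait Detector {
    fn detect(&self, frame: &[u8]) -> Vec<(f32, f32, f32, f32)>; // bounding boxes
}

// Stand-ins for the actual networks; real models would wrap an inference backend.
struct SmallNet;
struct LargeNet;

impl Detector for SmallNet {
    fn detect(&self, _frame: &[u8]) -> Vec<(f32, f32, f32, f32)> { Vec::new() }
}
impl Detector for LargeNet {
    fn detect(&self, _frame: &[u8]) -> Vec<(f32, f32, f32, f32)> { Vec::new() }
}

/// Cheap proxy for scene complexity: mean absolute difference to the
/// previous frame (high motion or clutter favors the larger network).
fn complexity(frame: &[u8], prev: &[u8]) -> f32 {
    let diff: u64 = frame
        .iter()
        .zip(prev)
        .map(|(&a, &b)| (a as i16 - b as i16).unsigned_abs() as u64)
        .sum();
    diff as f32 / frame.len() as f32
}

fn detect_dynamic(
    frame: &[u8],
    prev: &[u8],
    small: &SmallNet,
    large: &LargeNet,
    threshold: f32, // in practice, tuned on a validation set
) -> Vec<(f32, f32, f32, f32)> {
    if complexity(frame, prev) < threshold {
        small.detect(frame) // easy frame: the cheap network is enough
    } else {
        large.detect(frame) // hard frame: pay for the bigger network
    }
}

fn main() {
    let prev = vec![0u8; 640 * 480];
    let frame = vec![10u8; 640 * 480];
    let boxes = detect_dynamic(&frame, &prev, &SmallNet, &LargeNet, 8.0);
    println!("detections: {}", boxes.len());
}
```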
Education
MSc in Operations Research (Magna Cum Laude) | Tel Aviv University | 2018
Publication: "Toward a Dataset-Agnostic Word Segmentation Method" | IEEE ICIP