I develop novel deep learning solutions. Strong foundation in generative models (LLMs, diffusion, flow-based models, GANs, VAEs). Proven ability to implement cutting-edge research and to design and deploy high-performance models using Torch, JAX, CUDA, and Rust.
Experience
Symbolica AI | Founding Research Engineer | Apr 2023 – Present
Developed efficient deep generative models for reinforcement learning-based fine-tuning of LLMs, including methods for synthetic data generation.
Owned the full model lifecycle, including data pipelines, distributed training, and model architecture.
Contributed to team recruitment and engineering best practices.
AI Teach U | Lead Research Engineer | Nov 2018 – Dec 2023
Designed and built deep learning models for computer game animation, including a video-to-3D facial animation system and audio synthesis using GANs, optical flow, pose estimation, and flow-based models.
Developed infrastructure for automated data ingestion and model retraining.
Managed full ML pipeline from data processing to deployment.
ClearML (formerly Allegro.AI) | Research Scientist | Nov 2016 – Dec 2019
Developed novel solutions for diverse computer vision research projects, including CNN architecture optimization, self-improving object detection, semantic segmentation, and a video search engine.
Built end-to-end training and evaluation pipelines for these projects.
Projects
Efficient Theorem Proving via Structured Generative Models (Symbolica)
Developed a structured generative model, a theorem-identification algorithm, and synthetic data generation methods to enhance RL-based proof search in a distributed environment.
Achieved a 2.5x reduction in generation steps and 50% faster search.
Addressed GPU underutilization and the combinatorial complexity of theorem identification.
Llama Inference Engine in Pure Rust (Personal Project)
Built a high-performance Llama inference engine from scratch in pure Rust (GitHub repo).
Optimized matrix-vector multiplication, parallelized computation, and implemented quantization (see the sketch below).
Demonstrates expertise in transformer models, Rust programming, and performance optimization.
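A minimal illustrative sketch of the kind of block-quantized, thread-parallel matrix-vector product described above; it is not the engine's actual code, and the block size, scale layout, and function names are assumptions for the example.

```rust
// Sketch: 8-bit block-quantized matvec, rows split across std threads.
// Each row stores i8 weights plus one f32 scale per block of BLOCK values.

const BLOCK: usize = 32;

struct QuantRow {
    scales: Vec<f32>, // one scale per block of BLOCK weights
    quants: Vec<i8>,  // quantized weights, length = number of columns
}

fn quantize_row(row: &[f32]) -> QuantRow {
    let mut scales = Vec::new();
    let mut quants = Vec::with_capacity(row.len());
    for block in row.chunks(BLOCK) {
        let max = block.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
        let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
        scales.push(scale);
        quants.extend(block.iter().map(|&x| (x / scale).round() as i8));
    }
    QuantRow { scales, quants }
}

// y[r] = dot(W[r], x); contiguous chunks of rows are handled by separate threads.
fn qmatvec(rows: &[QuantRow], x: &[f32], y: &mut [f32]) {
    if rows.is_empty() {
        return;
    }
    let n_threads = std::thread::available_parallelism().map_or(1, |n| n.get());
    let chunk = (rows.len() + n_threads - 1) / n_threads;
    std::thread::scope(|s| {
        for (row_chunk, y_chunk) in rows.chunks(chunk).zip(y.chunks_mut(chunk)) {
            s.spawn(move || {
                for (row, out) in row_chunk.iter().zip(y_chunk.iter_mut()) {
                    let mut acc = 0.0f32;
                    for (b, block) in row.quants.chunks(BLOCK).enumerate() {
                        // Accumulate in f32, then apply the per-block scale once.
                        let mut block_acc = 0.0f32;
                        for (&q, &xv) in block.iter().zip(&x[b * BLOCK..]) {
                            block_acc += q as f32 * xv;
                        }
                        acc += row.scales[b] * block_acc;
                    }
                    *out = acc;
                }
            });
        }
    });
}

fn main() {
    let cols = 64;
    let w: Vec<Vec<f32>> = (0..8)
        .map(|r| (0..cols).map(|c| (r * cols + c) as f32 * 0.01).collect())
        .collect();
    let rows: Vec<QuantRow> = w.iter().map(|r| quantize_row(r)).collect();
    let x = vec![1.0f32; cols];
    let mut y = vec![0.0f32; w.len()];
    qmatvec(&rows, &x, &mut y);
    println!("y[0..4] = {:?}", &y[..4]);
}
```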
OpenAI Triton Compiler Contributions and Rust-CUDA Transpiler Development (Symbolica/Personal)
Developed high-performance GPU algorithms using CUDA and Rust, including a CUDA-based compression algorithm that reduced memory requirements by 98%.
Contributed to OpenAI's Triton compiler and developed a Rust-CUDA/Triton transpiler at Symbolica.
Enabled straightforward integration of GPU kernels inside Rust code.
Optimized CNN Architectures for Object Detection on Edge Devices (ClearML)
Designed and implemented optimized CNN architectures for embedded systems, achieving a 150% inference speed-up through dynamic network sizing.
Developed a technique that adjusts the network size based on the predicted complexity of each frame (resulting in a US patent); a simplified sketch follows.
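A simplified, purely illustrative sketch of per-frame dispatch between a small and a large detector driven by a cheap complexity estimate; it does not reproduce the patented method, and the frame-difference heuristic, trait names, and threshold are assumptions for the example.

```rust
// Sketch: dynamic network sizing. A cheap per-frame complexity score decides
// whether the small or the large detector runs on a given frame.

trait Detector {
    fn detect(&self, frame: &[u8]) -> Vec<(f32, f32, f32, f32)>; // bounding boxes
}

// Stand-ins for the actual networks; real models would wrap an inference backend.
struct SmallNet;
struct LargeNet;

impl Detector for SmallNet {
    fn detect(&self, _frame: &[u8]) -> Vec<(f32, f32, f32, f32)> { Vec::new() }
}
impl Detector for LargeNet {
    fn detect(&self, _frame: &[u8]) -> Vec<(f32, f32, f32, f32)> { Vec::new() }
}

/// Cheap proxy for scene complexity: mean absolute difference to the
/// previous frame (high motion or clutter favors the larger network).
fn complexity(frame: &[u8], prev: &[u8]) -> f32 {
    let diff: u64 = frame
        .iter()
        .zip(prev)
        .map(|(&a, &b)| (a as i16 - b as i16).unsigned_abs() as u64)
        .sum();
    diff as f32 / frame.len() as f32
}

fn detect_dynamic(
    frame: &[u8],
    prev: &[u8],
    small: &SmallNet,
    large: &LargeNet,
    threshold: f32, // in practice, tuned on a validation set
) -> Vec<(f32, f32, f32, f32)> {
    if complexity(frame, prev) < threshold {
        small.detect(frame) // easy frame: the cheap network is enough
    } else {
        large.detect(frame) // hard frame: pay for the bigger network
    }
}

fn main() {
    let prev = vec![0u8; 640 * 480];
    let frame = vec![10u8; 640 * 480];
    let boxes = detect_dynamic(&frame, &prev, &SmallNet, &LargeNet, 8.0);
    println!("detections: {}", boxes.len());
}
```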
Education
MSc in Operations Research (Magna Cum Laude) | Tel Aviv University | 2018
Publication: "Toward a Dataset-Agnostic Word Segmentation Method" | IEEE ICIP