This paper demonstrates that CNNs can be trained from scratch at true 4-bit precision
on commodity CPUs, with no specialized hardware and no post-training tricks. It reports
92.34% accuracy on CIFAR-10 (just 0.16% below the full-precision baseline), 70.94% on CIFAR-100,
and 83.16% on a consumer Android device in only 6 epochs, with 8x memory compression over FP32.
The key contribution is a novel quantization technique: tanh-based soft weight clipping.
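To illustrate the idea, here is a minimal sketch of what tanh-based soft clipping combined with 4-bit quantization could look like. The exact formulation is not given in this summary, so the formula below (tanh to bound weights smoothly, then symmetric uniform rounding to 15 levels) is an assumption, not the paper's actual method:

```python
import numpy as np

def quantize_4bit_soft_clip(weights):
    """Sketch of tanh-based soft weight clipping + 4-bit quantization.

    Assumed form (may differ from the paper): tanh smoothly squashes
    weights into (-1, 1), keeping gradients nonzero everywhere, unlike
    a hard clip. The result is then rounded to 15 symmetric levels
    (-7/7 .. 7/7), representable in a signed 4-bit integer.
    """
    w = np.tanh(weights)           # soft clip into (-1, 1)
    return np.round(w * 7) / 7     # symmetric 4-bit grid

w = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(quantize_4bit_soft_clip(w))
```

A smooth clip like this is often preferred in quantization-aware training because the straight-through gradient stays informative for large-magnitude weights instead of vanishing at a hard boundary.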
Read the paper on arXiv