// Machine Learning

SOLAR 10.7B: How Depth Upscaling Makes a 10B Model Beat 30B Models

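Upstage's SOLAR 10.7B uses depth upscaling - duplicating a 32-layer Llama 2-architecture base model, dropping eight layers from each copy, and continuing pretraining on the resulting 48-layer stack - to create a model that outperforms 30B-class models on the HuggingFace leaderboard while remaining practical to serve.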

Apr 14, 2026

7 min read

Mahmudul Haque Qudrati

CEO & ML Engineer

// Machine Learning

Kosmos-2: Grounded Image Understanding That Links Text to Image Regions

Microsoft's Kosmos-2 emits bounding box location tokens inline with its text output, linking noun phrases and referring expressions in its response to specific regions of the image.

Apr 12, 2026

7 min read

Mahmudul Haque Qudrati

CEO & ML Engineer

// Machine Learning

Phi-3 Vision: Microsoft's 4.2B Multimodal Model for Edge Devices

Phi-3 Vision packs chart understanding, document analysis, and image reasoning into 4.2 billion parameters - small enough to run on a mobile device with CoreML or ONNX, while scoring 40.4% on MMMU.

Apr 11, 2026

7 min read

Mahmudul Haque Qudrati

CEO & ML Engineer

// Machine Learning

SDXL-Turbo: Real-Time Image Generation in 1-4 Steps

Stability AI's Adversarial Diffusion Distillation compresses SDXL into a model that generates 512px images in a single step, taking roughly 200ms on an A100 - fast enough for real-time interactive generation.

Apr 8, 2026

7 min read

Mahmudul Haque Qudrati

CEO & ML Engineer

// Machine Learning

RoBERTa: BERT Done Right - When and How to Use It for Classification

RoBERTa improves on BERT through better pre-training - dynamic masking, no next-sentence prediction, larger batches, and more data - and consistently outscores BERT on GLUE classification tasks.

Apr 8, 2026

7 min read

Mahmudul Haque Qudrati

CEO & ML Engineer

// Machine Learning

NVIDIA Nemotron-4 340B and Llama-3.1-Nemotron-70B: Enterprise LLMs From NVIDIA

NVIDIA entered the foundation model market with two distinct plays: Nemotron-4 340B for synthetic data generation pipelines, and Llama-3.1-Nemotron-70B-Instruct, which scores 85.0 on Arena Hard, for enterprise inference.

Apr 5, 2026

5 min read

Mahmudul Haque Qudrati

CEO & ML Engineer

// Machine Learning

CogVLM2: Open-Source Video and Image Understanding With Long Context

Zhipu AI's CogVLM2 builds on the Visual Expert Module introduced in CogVLM, which gives visual tokens their own weight matrices, enabling richer image and video understanding than shared-weight alternatives.

Apr 4, 2026

7 min read

Mahmudul Haque Qudrati

CEO & ML Engineer

// Machine Learning

Moondream2: A 1.9B VLM That Runs on a Raspberry Pi

Moondream2 is a 1.9B parameter vision-language model that fits in 1.2GB RAM when quantized, enabling image captioning, visual Q&A, and object detection on embedded hardware and edge devices.

Apr 2, 2026

7 min read

Mahmudul Haque Qudrati

CEO & ML Engineer

// Machine Learning

Phi-3 Mini: Running a 3.8B Parameter LLM On Your Phone

Phi-3 Mini at 3.8B parameters outperforms Mixtral 8x7B on several benchmarks and runs in browsers via WebGPU or on Android/iOS via ONNX. Here's how.

Mar 31, 2026

7 min read

Mahmudul Haque Qudrati

CEO & ML Engineer

// Machine Learning

InternVL 2: The Open-Source VLM Approaching GPT-4V on Benchmarks

Shanghai AI Lab's InternVL2-26B scores 61.2% on MMMU - within 2 points of GPT-4V - using a 6B vision encoder and dynamic high-resolution image tiling.

Mar 30, 2026

7 min read

Mahmudul Haque Qudrati

CEO & ML Engineer

// Machine Learning

DistilBERT in Production: Fast NLP Classification Without the GPU Bill

DistilBERT retains 97% of BERT's performance while being 40% smaller and 60% faster at inference, making it the practical default for production text classification that needs low latency on CPU.

Mar 29, 2026

7 min read

Mahmudul Haque Qudrati

CEO & ML Engineer

// Machine Learning

Idefics2: HuggingFace's Open Multimodal Model Built on Mistral and SigLIP

Idefics2 is an 8B open multimodal model that handles interleaved image-text sequences, arbitrary image resolutions, and fine-tuning for document and chart understanding.

Mar 25, 2026

7 min read

Mahmudul Haque Qudrati

CEO & ML Engineer

