site:www.marktechpost.com

DeepSeek AI Releases DeepEP: An Open-Source EP Communication Library for MoE Model Training and Inference

Large language models that use the Mixture-of-Experts (MoE) architecture have enabled significant increases in model capacity without a corresponding rise in computation. However, this approach also ...

marktechpost4d

This AI Paper from Menlo Research Introduces AlphaMaze: A Two-Stage Training Framework for Enhancing Spatial Reasoning in Large Language Models

Artificial intelligence continues to advance in natural language processing but still faces challenges in spatial reasoning tasks. Visual-spatial reasoning is fundamental for robotics, autonomous ...

marktechpost5d

What are AI Agents? Demystifying Autonomous Software with a Human Touch

In today’s digital landscape, technology continues to advance at a steady pace. One development that has steadily gained attention is the concept of the AI agent—software designed to perform tasks ...

marktechpost6d

Stanford Researchers Introduce OctoTools: A Training-Free Open-Source Agentic AI Framework Designed to Tackle Complex Reasoning Across Diverse Domains

Large language models (LLMs) are limited by complex reasoning tasks that require multiple steps, domain-specific knowledge, or external tool integration. To address these challenges, researchers have ...

marktechpost6d

TokenSkip: Optimizing Chain-of-Thought Reasoning in LLMs Through Controllable Token Compression

Large Language Models (LLMs) face significant challenges in complex reasoning tasks, despite the breakthrough advances achieved through Chain-of-Thought (CoT) prompting. The primary challenge lies in ...

marktechpost5d

Building a Legal AI Chatbot: A Step-by-Step Guide Using bigscience/T0pp LLM, Open-Source NLP Models, Streamlit, PyTorch, and Hugging Face Transformers

In this tutorial, we will build an efficient Legal AI CHatbot using open-source tools. It provides a step-by-step guide to creating a chatbot using bigscience/T0pp LLM, Hugging Face Transformers, and ...

marktechpost6d

Moonshot AI and UCLA Researchers Release Moonlight: A 3B/16B-Parameter Mixture-of-Expert (MoE) Model Trained with 5.7T Tokens Using Muon Optimizer

Training large language models (LLMs) has become central to advancing artificial intelligence, yet it is not without its challenges. As model sizes and datasets continue to grow, traditional ...

marktechpost6d

Fine-Tuning NVIDIA NV-Embed-v1 on Amazon Polarity Dataset Using LoRA and PEFT: A Memory-Efficient Approach with Transformers and Hugging Face

In this tutorial, we explore how to fine-tune NVIDIA’s NV-Embed-v1 model on the Amazon Polarity dataset using LoRA (Low-Rank Adaptation) with PEFT (Parameter-Efficient Fine-Tuning) from Hugging Face.

marktechpost7d

Meet Baichuan-M1: A New Series of Large Language Models Trained on 20T Tokens with a Dedicated Focus on Enhancing Medical Capabilities

While LLMs have shown remarkable advancements in general-purpose applications, their development for specialized fields like medicine remains limited. The complexity of medical knowledge and the ...

marktechpost5d

This AI Paper from Weco AI Introduces AIDE: A Tree-Search-Based AI Agent for Automating Machine Learning Engineering

The development of high-performing machine learning models remains a time-consuming and resource-intensive process. Engineers and researchers spend significant time fine-tuning models, optimizing ...

marktechpost7d

Meta AI Releases ‘NATURAL REASONING’: A Multi-Domain Dataset with 2.8 Million Questions To Enhance LLMs’ Reasoning Capabilities

Large language models (LLMs) have shown remarkable advancements in reasoning capabilities in solving complex tasks. While models like OpenAI’s o1 and DeepSeek’s R1 have significantly improved ...

marktechpost12d

LG AI Research Releases NEXUS: An Advanced System Integrating Agent AI System and Data Compliance Standards to Address Legal Concerns in AI Datasets

After the advent of LLMs, AI Research has focused solely on the development of powerful models day by day. These cutting-edge new models improve users’ experience across various reasoning, content ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results