Microsoft Research Blog

LLM profiling guides KV cache optimization

May 8, 2024 | Liyuan Liu and Jianfeng Gao

LLMs rely on memory-intensive mechanisms like the key-value (KV) cache to store and quickly retrieve data. FastGen optimizes KV cache usage, reducing LLM memory demands by up to 50% while maintaining performance.

Recent Posts

LoftQ: Reimagining LLM fine-tuning with smarter initialization

May 7, 2024

LoftQ boosts LLM efficiency by streamlining the fine-tuning process, reducing computational demands while preserving high performance. Innovations like this can help make AI technology more energy-efficient.
Research Focus: Week of April 29, 2024

May 2, 2024

In this edition: Can LLMs transform natural language into formal method postconditions; Semantically aligned question + code generation for automated insight generation; Explaining CLIP performance disparities on blind/low vision data; plus recent news.
Microsoft at ASPLOS 2024: Advancing hardware and software for high-scale, secure, and efficient modern applications

April 29, 2024 | Rodrigo Fonseca and Madan Musuvathi

From AI and deep learning to innovations in infrastructure, researchers from Microsoft are bridging the gap between architecture, programming languages, and operating systems to advance the state of the art at ASPLOS 2024.
SIGMA: An open-source mixed-reality system for research on physical task assistance

April 29, 2024 | Dan Bohus and Sean Andrist

Microsoft recently developed and released the Situated Interactive Guidance, Monitoring, and Assistance (SIGMA) system, an open-source research platform, to enable research and innovation at the intersection of mixed reality and AI.
SAMMO: A general-purpose framework for prompt optimization

April 18, 2024 | Tobias Schnabel and Jennifer Neville

SAMMO optimizes prompts for LLMs by leveraging their structure to guide optimization. This minimizes the time and effort needed to find performant prompts on a variety of tasks.
Research Focus: Week of April 15, 2024

April 17, 2024

In this issue: New research on appropriate reliance on generative AI; Power management opportunities for LLMs in the cloud; LLMLingua-2 improves task-agnostic prompt compression; Enhancing COMET to embrace under-resourced African languages.
Microsoft at NSDI 2024: Discoveries and implementations in networked systems

April 16, 2024 | Ranveer Chandra

Topics range from 5G, space, datacenters, and wide-area networking to applications in artificial intelligence, security, video conferencing, and gaming. Learn more about the discoveries and advances we're making with networked systems.
Research Focus: Week of April 1, 2024

April 3, 2024

In this issue: New research helps COMET embrace African languages; FeatUp improves deep features, a computer vision research cornerstone; LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error; Benchmarking LLMs across languages and more.
Learning from interaction with Microsoft Copilot (web)

March 27, 2024

Microsoft researchers are taking a comprehensive and dynamic approach to help Copilot (web) continuously learn from interaction and feedback, improving the AI system and making it increasingly useful for consumers.
Research Focus: Week of March 18, 2024

March 20, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft. Large language models (LLMs) have shown impressive capabilities, yet they still struggle with math reasoning. In a recent…
Intelligent monitoring: Towards AI-assisted monitoring for cloud services

March 19, 2024

Integrating AI into cloud service monitoring improves incident detection accuracy, reduces unnecessary alerts, and enhances overall system reliability. This helps organizations better align with business goals and increase customer satisfaction.