Bayesian beagle
Categories
All (1427)
architectures (540)
education (296)
hci (285)
production (487)
programming (159)
prompt-engineering (382)
recommender (43)
robustness (273)
security (168)
social-sciences (290)

From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs
education
hci
architectures
Humans’ unique tool usage distinguishes them from animals. Sum2Act pipeline enhances LLMs for real-world tasks.
Feb 28, 2024

MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
social-sciences
production
hci
TL;DR: The MIKO framework uses language and image models to uncover social media users’ intentions.
Feb 28, 2024

Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models
education
production
architectures
prompt-engineering
English-centric LLMs perform well on multilingual tasks; decomposed prompting improves efficacy and efficiency in sequence labeling.
Feb 28, 2024

Exploring Advanced Methodologies in Security Evaluation for LLMs
education
programming
robustness
security
prompt-engineering
Large Language Models (LLMs) have advanced language abilities but raise security and ethical concerns. Ongoing research is needed.
Feb 28, 2024

An Iterative Associative Memory Model for Empathetic Response Generation
social-sciences
hci
The proposed IAMM model captures associated words for empathetic response generation, as validated by experiments.
Feb 28, 2024

Learning or Self-aligning? Rethinking Instruction Fine-tuning
social-sciences
production
architectures
Instruction Fine-tuning (IFT) in language models is critical, but learning additional world knowledge can have negative effects.
Feb 28, 2024

Few-Shot Fairness: Unveiling LLM’s Potential for Fairness-Aware Classification
social-sciences
production
architectures
TL;DR: In fairness-aware classification with LLMs, GPT-4 shows superior accuracy and fairness.
Feb 28, 2024

Meta-Task Prompting Elicits Embedding from Large Language Models
production
prompt-engineering
MetaEOL is a new unsupervised embedding method for generating high-quality sentence embeddings from LLMs.
Feb 28, 2024

Automated Discovery of Integral with Deep Learning
prompt-engineering
Deep learning models can deduce integrals, but AI still lacks humans’ capacity for scientific discovery.
Feb 28, 2024

Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication
production
hci
architectures
prompt-engineering
TL;DR: Non-NL formats improve LLM reasoning efficiency and multi-agent communication.
Feb 28, 2024

MEGAnno+: A Human-LLM Collaborative Annotation System
education
social-sciences
TL;DR: Large language models and humans should collaborate for reliable data labeling. Check out MEGAnno+ for more.
Feb 28, 2024

Prospect Personalized Recommendation on Large Language Model-based Agent Platform
recommender
hci
Feb 28, 2024

Hire a Linguist!: Learning Endangered Languages with In-Context Linguistic Descriptions
social-sciences
prompt-engineering
LingoLLM enables large language models to process and translate endangered languages with linguistic knowledge.
Feb 28, 2024

Gradient-Free Adaptive Global Pruning for Pre-trained Language Models
social-sciences
TL;DR: AdaGP improves LLM efficiency with global pruning and modular function optimization.
Feb 28, 2024

Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models
architectures
robustness
Model editing updates the medical knowledge of large language models without affecting irrelevant information.
Feb 28, 2024

ChatSpamDetector: Leveraging Large Language Models for Effective Phishing Email Detection
robustness
ChatSpamDetector uses large language models to accurately detect phishing emails with detailed reasoning.
Feb 28, 2024

Human Simulacra: A Step toward the Personification of Large Language Models
social-sciences
TL;DR: Large language models can replace human participants in experiments, with potential for practical applications.
Feb 28, 2024

Towards Generalist Prompting for Large Language Models by Mental Models
hci
prompt-engineering
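Large language models typically need specially designed prompting methods for optimal performance; MeMo, a generalist prompting approach based on mental models, aims to achieve this without task-specific prompts.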
Feb 28, 2024

Exploring Multilingual Human Value Concepts in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?
social-sciences
hci
architectures
LLMs encode multilingual human values, with cross-lingual inconsistencies and transfer traits. The authors offer suggestions for LLM pre-training.
Feb 28, 2024

Multi-FAct: Assessing Multilingual LLMs’ Multi-Regional Knowledge using FActScore
social-sciences
robustness
Multilingual LLMs have factual accuracy issues, with English outperforming other languages. Geographic biases exist.
Feb 28, 2024

Evaluating Quantized Large Language Models
social-sciences
production
architectures
Post-training quantization (PTQ) reduces LLM cost, memory consumption, and computational overhead. This paper thoroughly evaluates quantized LLMs.
Feb 28, 2024

Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation
architectures
Unsupervised information refinement training teaches LLMs to refine retrieved information in RAG, improving performance by 9.39%.
Feb 28, 2024

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
architectures
robustness
KV caching accelerates Large Language Models, but eviction can harm generation quality. MiKV compresses effectively.
Feb 28, 2024

Large Language Models As Evolution Strategies
education
production
architectures
prompt-engineering
Large language models can perform evolutionary optimization algorithms without explicit task specification.
Feb 28, 2024

CogBench: a large language model walks into a psychology lab
education
hci
architectures
production
social-sciences
prompt-engineering
CogBench benchmarks LLMs using cognitive psychology metrics, highlighting the role of model size and RLHF.
Feb 28, 2024

Do Large Language Models Mirror Cognitive Language Processing?
social-sciences
LLMs simulate cognitive language processing, with model scaling and alignment training improving similarity.
Feb 28, 2024

Exploring Multi-Document Information Consolidation for Scientific Sentiment Summarization
social-sciences
LLMs can generate plausible summaries; sentiment consolidation framework improves meta-review generation. Code and data available.
Feb 28, 2024

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
production
architectures
TL;DR: New DPA framework improves user control over large language models.
Feb 28, 2024

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction
prompt-engineering
security
architectures
robustness
Large language models (LLMs) can be manipulated into generating harmful responses; the DRA attack jailbreaks them in a few queries by disguising harmful instructions and having the model reconstruct them.
Feb 28, 2024

How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning
production
prompt-engineering
LLMs use multiple pathways for CoT reasoning, with a functional rift in the middle layers.
Feb 28, 2024

Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging
production
architectures
TL;DR: Lemur framework improves log parsing with entropy sampling and chain-of-thought merging for better system monitoring.
Feb 28, 2024

Small But Funny: A Feedback-Driven Approach to Humor Distillation
education
hci
programming
LLMs can transfer complex skills such as humor to SLMs, but distillation with LLM feedback improves performance more than imitation alone.
Feb 28, 2024

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
hci
architectures
robustness
production
security
Fine-tuning chat models without safety prompts can lead to unsafe behaviors. PTST principle mitigates this.
Feb 28, 2024

Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?
education
hci
prompt-engineering
Multi-agent discussion improves LLM reasoning, but a single agent with strong prompts performs similarly.
Feb 28, 2024

LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History
architectures
robustness
Conversational AI systems can be negatively impacted by task-switches in conversational history.
Feb 28, 2024

Retrieval-based Full-length Wikipedia Generation for Emergent Events
production
TL;DR: Generating accurate Wikipedia documents for recent events using web sources and LLMs.
Feb 28, 2024

A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems
hci
This survey reviews research on LLM-based multi-turn dialogue systems and outlines directions for future research.
Feb 28, 2024

Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
education
LLMs perform well on medical questions, but current benchmarks lack complexity. New datasets address this.
Feb 28, 2024

LeMo-NADe: Multi-Parameter Neural Architecture Discovery with LLMs
production
architectures
TL;DR: Framework LeMo-NADe automates neural network architecture discovery for edge devices, yielding high performance.
Feb 28, 2024

Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning
prompt-engineering
Large language models suffer from the Toxic CoT problem in commonsense reasoning, but a new method mitigates it and improves performance.
Feb 28, 2024

Language Models Represent Beliefs of Self and Others
social-sciences
LLMs’ neural activations represent the beliefs of self and others, suggesting theory-of-mind (ToM) abilities that affect social reasoning across diverse tasks.
Feb 28, 2024

Cause and Effect: Can Large Language Models Truly Understand Causality?
architectures
TL;DR: CARE-CA framework enhances causal reasoning and explainability using explicit and implicit detection methods.
Feb 28, 2024

The First Place Solution of WSDM Cup 2024: Leveraging Large Language Models for Conversational Multi-Doc QA
production
Feb 28, 2024

BlendSQL: A Scalable Dialect for Unifying Hybrid Question Answering in Relational Algebra
prompt-engineering
Existing systems lack user control and insight; BlendSQL improves performance and scalability.
Feb 27, 2024

Massive Activations in Large Language Models
production
TL;DR: Large Language Models have massive activations with largely constant values that affect attention probabilities. Such activations also appear in Vision Transformers.
Feb 27, 2024

Training-Free Long-Context Scaling of Large Language Models
architectures
DCA enables LLMs to process long sequences without continual training, achieving comparable performance to finetuned models.
Feb 27, 2024

Chain-of-Thought Prompting of Large Language Models for Discovering and Fixing Software Vulnerabilities
robustness
education
security
hci
prompt-engineering
TL;DR: Deep learning for software security faces challenges, but large language models show promise.
Feb 27, 2024

Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue
robustness
education
security
social-sciences
hci
LLMs can generate harmful responses in multi-turn dialogue, posing safety challenges.
Feb 27, 2024

AmbigNLG: Addressing Task Ambiguity in Instruction for NLG
prompt-engineering
education
social-sciences
architectures
AmbigNLG tackles task ambiguity in NLG instructions, improving LLM performance with clear instructions.
Feb 27, 2024

Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles
production
robustness
architectures
Propaganda in news media is increasing and detection for non-English content is limited; GPT-4 struggles with fine-grained propaganda detection.
Feb 27, 2024

Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models
hci
social-sciences
education
LLMs excel at objective tasks but struggle with subjective ones. The RiC method improves subjective task performance.
Feb 27, 2024

Creating Suspenseful Stories: Iterative Planning with Large Language Models
prompt-engineering
LLMs struggle with suspenseful story generation, but our method shows promise without supervised corpora.
Feb 27, 2024

A Language Model based Framework for New Concept Placement in Ontologies
education
prompt-engineering
Using language models to insert new concepts into an ontology, leveraging neural methods for edge search and selection.
Feb 27, 2024

Prescribing Large Language Models for Perioperative Care: What’s The Right Dose for Pre-trained Models?
production
social-sciences
architectures
Feb 27, 2024

Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies
prompt-engineering
LLMs excel in NLP tasks, but lack systematic generalization. GPT-4 outperforms GPT-3.5 and Neural Data Router.
Feb 27, 2024

DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning
robustness
architectures
education
DS-Agent automates data science tasks using large language models, achieving high success rates and performance.
Feb 27, 2024

The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks
production
programming
architectures
education
LLMs improve type inference in Python but need fine-tuning for call-graph analysis.
Feb 27, 2024

Sinkhorn Distance Minimization for Knowledge Distillation
education
Knowledge distillation (KD) compresses LLMs, but existing methods have limitations. SinKD uses the Sinkhorn distance for effective supervision and outperforms state-of-the-art methods.
Feb 27, 2024

MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning
prompt-engineering
Tool-augmented large language model MathSensei improves mathematical reasoning, outperforming gpt-3.5-turbo on complex problems.
Feb 27, 2024

Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents
social-sciences
security
Existing QA datasets are too easy for powerful language models. Introducing Researchy Questions dataset.
Feb 27, 2024

A Piece of Theatre: Investigating How Teachers Design LLM Chatbots to Assist Adolescent Cyberbullying Education
prompt-engineering
hci
social-sciences
education
TL;DR: Cyberbullying harms teens; chatbot tool helps teachers educate and support students.
Feb 27, 2024

LLM-Resistant Math Word Problem Generation via Adversarial Attacks
education
security
robustness
LLMs challenge fair assessment. Adversarial examples degrade LLMs’ math problem-solving ability, and the authors identify vulnerabilities shared across models. Code is available.
Feb 27, 2024

Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs
production
hci
social-sciences
architectures
LLMs show left-leaning views; their reliability increases with model size, and their stances vary across policy programs.
Feb 27, 2024

TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space
robustness
TruthX improves truthfulness of large language models by editing internal representations in truthful space.
Feb 27, 2024

Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning
prompt-engineering
TL;DR: CLIPS is a Bayesian agent architecture for flexible, context-sensitive instruction following and goal assistance.
Feb 27, 2024

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
production
architectures
BitNet b1.58 introduces a 1-bit LLM variant that matches full-precision LLM performance while being more cost-effective.
Feb 27, 2024

Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models
prompt-engineering
robustness
LLM confidence calibration improved by Fact-and-Reflection prompting method, reducing Expected Calibration Error by 23.5%.
Feb 27, 2024

Language Agents as Optimizable Graphs
prompt-engineering
programming
hci
TL;DR: Techniques unify LLM-based agents as computational graphs, improving problem solvers. Code available at GitHub.
Feb 27, 2024

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
hci
education
ShapeLLM is a 3D language model for object understanding and interaction, achieving state-of-the-art performance.
Feb 27, 2024

Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data
production
Introducing QRData benchmark to evaluate Large Language Models’ quantitative reasoning on real-world data.
Feb 27, 2024

Large Language Model for Participatory Urban Planning
hci
social-sciences
TL;DR: LLM-based framework for participatory urban planning outperforms traditional methods in satisfaction and inclusion metrics.
Feb 27, 2024

Tower: An Open Multilingual Large Language Model for Translation-Related Tasks
production
architectures
Tailoring LLMs for translation tasks improves performance, competitive with general-purpose models.
Feb 27, 2024

Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses
prompt-engineering
robustness
LLMs need to address hallucination issues; Re-Ex method improves revision performance efficiently.
Feb 27, 2024

Deep Learning Based Named Entity Recognition Models for Recipes
prompt-engineering
Automated protocols for recognizing recipe text entities are valuable for various applications. Fine-tuned spaCy-transformer is best.
Feb 27, 2024

SoFA: Shielded On-the-fly Alignment via Priority Rule Following
social-sciences
TL;DR: New method for aligning Large Language Models with human values using priority rule following.
Feb 27, 2024

Determinants of LLM-assisted Decision-Making
hci
LLMs impact decision-making; study identifies factors and interactions for better informed decisions.
Feb 27, 2024

Consistency Matters: Explore LLMs Consistency From a Black-Box Perspective
architectures
LLM consistency is understudied in NLP research. The authors build a dataset and report the best performance.
Feb 27, 2024

Can an LLM-Powered Socially Assistive Robot Effectively and Safely Deliver Cognitive Behavioral Therapy? A Study With University Students
education
social-sciences
hci
prompt-engineering
LLM-powered, socially assistive robot (SAR)-guided CBT is effective for anxiety and depression, outperforming chatbot and traditional methods.
Feb 27, 2024

Ansible Lightspeed: A Code Generation Service for IT Automation
programming
architectures
education
LLMs improve developer productivity, but domain-specific languages like Ansible need more attention. Ansible Lightspeed has high user acceptance.
Feb 27, 2024

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation
production
architectures
SongComposer is an LLM for song composition, outperforming GPT-4 in various tasks.
Feb 27, 2024

Case-Based or Rule-Based: How Do Transformers Do the Math?
education
production
social-sciences
prompt-engineering
architectures
LLMs struggle with simple math, use case-based reasoning, but can improve with Rule-Following Fine-Tuning.
Feb 27, 2024

Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers
prompt-engineering
production
architectures
LLM-based prompt optimizer GPO improves performance by up to 56.8% on Big-Bench Hard. Code available.
Feb 27, 2024

Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides
prompt-engineering
production
architectures
Nissist uses troubleshooting guides (TSGs) and incident history to reduce human intervention in incident management.
Feb 27, 2024

Beyond the Known: Investigating LLMs Performance on Out-of-Domain Intent Detection
education
TL;DR: Study evaluates LLMs for OOD intent detection, finding strengths and weaknesses compared to fine-tuned models.
Feb 27, 2024

Evaluating Very Long-Term Conversational Memory of LLM Agents
production
hci
architectures
education
Long-term dialogue models struggle with understanding lengthy conversations and lag behind human performance.
Feb 27, 2024

Investigating Continual Pretraining in Large Language Models: Insights and Implications
architectures
Study on Continual Learning in large language models, focusing on efficient training and adaptability.
Feb 27, 2024

BASES: Large-scale Web Search User Simulation with Large Language Model based Agents
production
architectures
TL;DR: LLM-based user simulation framework BASES effectively models web search behaviors, supported by WARRIORS dataset.
Feb 27, 2024

Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning
social-sciences
MELoRA improves performance with fewer parameters than LoRA in NLP tasks.
Feb 27, 2024

Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
prompt-engineering
education
LLM-based Agent-Pro learns and evolves through interactions, outperforming vanilla LLM in games.
Feb 27, 2024

REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering
production
architectures
REAR improves LLMs’ ability to assess relevance of retrieved documents in open-domain QA. Outperforms previous RAG approaches.
Feb 27, 2024

Navigating Complexity: Orchestrated Problem Solving with Multi-Agent LLMs
prompt-engineering
education
TL;DR: New approach uses decomposition to help large language models solve complex and vague problems effectively.
Feb 26, 2024

CodeS: Towards Building Open-source Language Models for Text-to-SQL
programming
architectures
CodeS: An open-source language model for text-to-SQL that outperforms SOTA with fewer parameters.
Feb 26, 2024

RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering
architectures
production
Adaptive retrieval-augmented generation (ARAG) improves retrieval efficiency but lacks proper evaluation. RetrievalQA tests ARAG methods, and Time-Aware Adaptive Retrieval is proposed.
Feb 26, 2024

A Survey of Large Language Models in Cybersecurity
programming
LLMs in cybersecurity: applications, uses, limitations, and suggestions for improvement.
Feb 26, 2024

Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling
prompt-engineering
education
Using large language models to create legal stories improves comprehension and interest in law.
Feb 26, 2024

LLM Inference Unveiled: Survey and Roofline Model Insights
production
architectures
TL;DR: Survey introduces framework for analyzing Large Language Model inference techniques, addressing challenges and providing insights.
Feb 26, 2024

PerLTQA: A Personal Long-Term Memory Dataset for Memory Classification, Retrieval, and Synthesis in Question Answering
hci
architectures
The PerLTQA dataset combines semantic and episodic memories for personalized QA tasks; the authors’ framework outperforms LLMs in the reported experiments.
Feb 26, 2024

WIPI: A New Web Threat for LLM-Driven Web Agents
hci
security
LLMs used in Web Agents may be vulnerable to WIPI attacks, with a high success rate.
Feb 26, 2024

LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments
production
As LLMs advance as autonomous agents, LLMArena evaluates their capabilities in multi-agent environments.
Feb 26, 2024

LangGPT: Rethinking Structured Reusable Prompt Design Framework for LLMs from the Programming Language
prompt-engineering
hci
education
LLM performance depends on prompt quality; the LangGPT framework improves prompt design and LLM performance.
Feb 26, 2024

HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
prompt-engineering
programming
social-sciences
HumanEval-XL: Multilingual code generation benchmark for evaluating multilingual LLMs. 22,080 prompts, 23 NLs, 12 PLs.
Feb 26, 2024

Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models
education
Tool-augmented LLMs improve knowledge access, but face limitations; DEER framework enhances flexibility and generalizability.
Feb 26, 2024

Pandora’s White-Box: Increased Training Data Leakage in Open LLMs
security
Feb 26, 2024

ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing
architectures
production
education
ProLLaMA transforms LLMs into ProLLMs for multiple protein language processing tasks. State-of-the-art results. Code available.
Feb 26, 2024

Defending LLMs against Jailbreaking Attacks via Backtranslation
security
production
architectures
prompt-engineering
robustness
New method defends language models from jailbreaking attacks using backtranslation prompts.
Feb 26, 2024

Improving LLM-based Machine Translation with Systematic Self-Correction
robustness
architectures
production
LLMs have translation errors, but self-correction framework TER improves quality across languages.
Feb 26, 2024

From Large Language Models and Optimization to Decision Optimization CoPilot: A Research Manifesto
architectures
LLMs can simplify the use of optimization models for business decisions; the authors propose a Decision Optimization CoPilot.
Feb 26, 2024

Unveiling ChatGPT’s Usage in Open Source Projects: A Mining-based Study
architectures
programming
production
education
LLMs like ChatGPT are used in software projects for 45 tasks, providing insights for developers and researchers.
Feb 26, 2024

RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions
architectures
prompt-engineering
security
production
TL;DR: Code-style instructions improve robustness of Large Language Models, outperforming natural language instructions.
Feb 26, 2024

OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA)
prompt-engineering
TL;DR: Developed specialized language model for oncology advice, improved accuracy using real patient interactions. Released to research community.
Feb 26, 2024

CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models
robustness
security
The CodeChameleon framework jailbreaks LLMs through personalized encryption, achieving a high attack success rate.
Feb 26, 2024

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
robustness
security
architectures
production
ShieldLM is a customizable and explainable safety detector for Large Language Models.
Feb 26, 2024

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
prompt-engineering
security
TL;DR: New method generates diverse adversarial prompts to enhance robustness of large language models.
Feb 26, 2024

LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery
prompt-engineering
education
VQA in robotic surgery needs continual updating due to evolving trainee needs and data challenges.
Feb 26, 2024

Data-free Weight Compress and Denoise for Large Language Models
architectures
Large Language Models (LLMs) face scalability constraints, but Data-free Joint Rank-k Approximation offers promising compression.
Feb 26, 2024

Two-stage Generative Question Answering on Temporal Knowledge Graph Using Large Language Models
hci
GenTKGQA framework improves temporal knowledge graph question answering, outperforming state-of-the-art baselines, achieving 100% on simple questions.
Feb 26, 2024

Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models
education
Recent LVLMs struggle with fine-grained visual categorization; the authors propose a new evaluation benchmark. Code and dataset are available.
Feb 26, 2024

Memory GAPS: Would LLM pass the Tulving Test?
hci
Feb 26, 2024

Benchmarking LLMs on the Semantic Overlap Summarization Task
prompt-engineering
hci
Semantic Overlap Summarization (SOS) task evaluates LLMs’ ability to summarize common information from alternative narratives.
Feb 26, 2024

Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding
prompt-engineering
programming
Hybrid approach combines large and small language models for efficient autoregressive decoding. Speeds up tasks with minor performance penalties.
Feb 26, 2024

Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering
hci
education
Open-source LLMs use Chain-of-Discussion framework to improve open-ended question answering quality.
Feb 26, 2024

From RAGs to riches: Using large language models to write documents for clinical trials
social-sciences
production
Large language models (LLMs) can rapidly generate clinical trial documents, but need improvement in quality.
Feb 26, 2024

MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property
production
architectures
LLMs’ performance in the IP domain is evaluated with the MoZIP benchmark. The MoZi model outperforms the others.
Feb 26, 2024

Do Large Language Models Latently Perform Multi-Hop Reasoning?
prompt-engineering
robustness
education
Large Language Models (LLMs) show evidence of latent multi-hop reasoning in complex prompts.
Feb 26, 2024

RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation
programming
RepoAgent uses LLMs to generate high-quality repository-level code documentation, an underexplored area in software engineering.
Feb 26, 2024

Unraveling Babel: Exploring Multilingual Activation Patterns within Large Language Models
architectures
production
Study explores multilingual activation patterns in large language models, shedding light on processing mechanisms.
Feb 26, 2024

MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs
prompt-engineering
architectures
programming
MathGenie improves math problem generation and solution accuracy in language models.
Feb 26, 2024

Predicting Sustainable Development Goals Using Course Descriptions – from LLMs to Conventional Foundation Models
architectures
production
Predicting UN SDGs for university courses using PaLM 2, training smaller language models. BART best performer.
Feb 26, 2024

LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification
robustness
production
Researchers use advanced learning algorithms and data augmentation to address limited data availability. They propose a DP-based DA method for text classification on private…
Feb 26, 2024

Immunization against harmful fine-tuning attacks
robustness
security
architectures
production
TL;DR: Large language models can be purposely fine-tuned for harmful goals, requiring effective defense strategies.
Feb 26, 2024

Integrating Large Language Models with Graphical Session-Based Recommendation
recommender
hci
architectures
production
prompt-engineering
social-sciences
education
LLMGR integrates large language models with Graph Neural Networks for session-based recommendation tasks.
Feb 26, 2024

Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
architectures
production
LLMs process multilingual texts using language-specific neurons, which can be selectively activated.
Feb 26, 2024

mEdIT: Multilingual Text Editing via Instruction Tuning
architectures
production
mEdIT performs multilingual text editing via instruction tuning.
Feb 26, 2024

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
hci
social-sciences
education
Recent work evaluates values in large language models using surveys. Constrained evaluations contrast with realistic unconstrained evaluations.
Feb 26, 2024

From Noise to Clarity: Unraveling the Adversarial Suffix of Large Language Model Attacks via Translation of Text Embeddings
robustness
prompt-engineering
security
TL;DR: Adversarial Suffixes Embedding Translation Framework improves understanding and attack success rate of large language models.
Feb 25, 2024

Text Understanding and Generation Using Transformer Models for Intelligent E-commerce Recommendations
recommender
Transformer pre-training models enhance e-commerce text understanding and recommendation systems, benefiting users and merchants.
Feb 25, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM
programming
ChatMusician integrates music into LLMs, outperforming GPT-4 in music generation.
Feb 25, 2024

From Text to Transformation: A Comprehensive Review of Large Language Models’ Versatility
social-sciences
Study explores impact of Large Language Models (LLMs) in diverse domains, identifies research gaps.
Feb 25, 2024

LSTPrompt: Large Language Models as Zero-Shot Time Series Forecasters by Long-Short-Term Prompting
prompt-engineering
LSTPrompt improves time-series forecasting with tailored prompts for better adaptability and performance.
Feb 25, 2024

Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy
robustness
TL;DR: Detecting machine-generated texts using MMD-MP method for improved stability and performance.
Feb 25, 2024

Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression
prompt-engineering
Gist-COCO compresses prompts for large language models, outperforming previous models in various tasks.
Feb 25, 2024

Likelihood-based Mitigation of Evaluation Bias in Large Language Models
programming
social-sciences
LLMs may have likelihood bias in evaluating natural language generation, but bias can be mitigated.
Feb 25, 2024

HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs
robustness
architectures
education
LLMs struggle with hallucinations, but a new framework detects and benchmarks them effectively.
Feb 25, 2024

Citation-Enhanced Generation for LLM-based Chatbot
robustness
hci
LLMs in chatbots may produce hallucinated content; the CEG approach addresses this issue with retrieval augmentation.
Feb 25, 2024

How Can LLM Guide RL? A Value-Based Approach
architectures
RL algorithms need extensive trial-and-error; LLM guidance improves sample efficiency in planning tasks.
Feb 25, 2024

Attacking LLM Watermarks by Exploiting Their Strengths
robustness
security
Generative models create human-like content, but the watermarks used to verify its source are vulnerable to attacks.
Feb 25, 2024

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing
robustness
prompt-engineering
security
architectures
TL;DR: SemanticSmooth defends against jailbreaking attacks on large language models with strong performance. Code available.
Feb 25, 2024

AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
hci
TL;DR: AVI-Talking uses language models to generate expressive 3D talking faces aligned with speech.
Feb 25, 2024

LLMs with Chain-of-Thought Are Non-Causal Reasoners
hci
prompt-engineering
LLMs' reasoning and CoTs show surprising discrepancies with human reasoning processes. Factors influencing causal structure explored. Code released.
Feb 25, 2024

LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding
prompt-engineering
TL;DR: LSTP improves video-language model efficiency, temporal understanding, and spatial-temporal alignment for various tasks.
Feb 25, 2024

PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails
robustness
social-sciences
security
LLMs vulnerable to jailbreak attacks, Guard Models ineffective against PRP attack strategy.
Feb 24, 2024

Evaluating Prompting Strategies for Grammatical Error Correction Based on Language Proficiency
prompt-engineering
social-sciences
Analysis of GEC prompting strategies with LLMs based on language proficiency to reduce overcorrection.
Feb 24, 2024

QuaCer-C: Quantitative Certification of Knowledge Comprehension in LLMs
education
TL;DR: New certification framework for LLMs shows performance improvement with more parameters, Mistral model less performant.
Feb 24, 2024

Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models
robustness
Large language models (LLMs) are susceptible to data contamination. CDD and TED mitigate this issue.
Feb 24, 2024

Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance
prompt-engineering
hci
social-sciences
Politeness in prompts affects LLM performance across languages, cultural context matters.
Feb 22, 2024

Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark
education
Pretrained language models are effective for scientific summarization, but traditional evaluation methods are inadequate. New facet-aware metric proposed.
Feb 22, 2024

Learning to Reduce: Optimal Representations of Structured Data in Prompting Large Language Models
prompt-engineering
TL;DR: Proposed framework uses reinforcement learning to improve large language model’s reasoning with structured data.
Feb 22, 2024

Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
education
programming
TL;DR: Integrating Large Language Models into IDEs can boost developer productivity with proper evaluation.
Feb 22, 2024

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
architectures
production
RLHF is key to AI alignment, but PPO is costly; simpler REINFORCE-style methods can outperform it.
Feb 22, 2024

Can Large Language Models Detect Misinformation in Scientific News Reporting?
prompt-engineering
Detecting misinformation in scientific reporting using large language models and prompt engineering strategies.
Feb 22, 2024

MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
education
architectures
hci
LLMs' dialogue abilities evaluated with MT-Bench-101, revealing differing performance trends across tasks.
Feb 22, 2024

Unveiling Linguistic Regions in Large Language Models
architectures
social-sciences
LLMs show strong cross-lingual alignment. Core linguistic region crucial for proficiency in multiple languages.
Feb 22, 2024

Generalizing Reward Modeling for Out-of-Distribution Preference Learning
architectures
production
TL;DR: Optimizing reward model for out-of-distribution preference learning with meta-learning approach, showing improved generalization.
Feb 22, 2024

IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus
architectures
production
Large Language Models struggle with Information Extraction; IEPile corpus improves LLM performance.
Feb 22, 2024

Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge
prompt-engineering
HSP improves LLM reasoning accuracy, surpassing GPT-3.5, with publicly available code and dataset.
Feb 22, 2024

From Keywords to Structured Summaries: Streamlining Scholarly Knowledge Access
architectures
IR engines vital for scientific community, need structured records and advanced IT tools for efficiency.
Feb 22, 2024

KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge
hci
LLMs need cultural understanding for deployment. KorNAT measures alignment with South Korean social values and common knowledge. Few models meet the reference score.
Feb 22, 2024

Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments
architectures
Large language models (LLMs) can be augmented with tools to handle complex environments effectively.
Feb 22, 2024

LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition
architectures
LLM-DA proposes data augmentation for NER tasks, improving model performance with limited data.
Feb 22, 2024

Automating Psychological Hypothesis Generation with AI: Large Language Models Meet Causal Graph
social-sciences
LLM and causal graphs generate novel psychological hypotheses, surpassing LLM-only and expert ideas.
Feb 22, 2024

Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond
prompt-engineering
Task embeddings face challenges with prompt-guided Large Language Models; a unified framework is proposed for adaptability.
Feb 22, 2024

Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models
production
EEVE-Korean-v1.0 is a leading Korean pre-trained model for text understanding.
Feb 22, 2024

Understanding and Patching Compositional Reasoning in LLMs
prompt-engineering
LLMs struggle with compositional reasoning, but our research uncovers and fixes the root causes.
Feb 22, 2024

Scaling Efficient LLMs
architectures
production
Efficient LLMs need fewer parameters for desired accuracy, with implications for training corpus size.
Feb 22, 2024

Content Conditional Debiasing for Fair Text Embedding
social-sciences
Proposes a method for fair text embeddings that achieves fairness while preserving utility.
Feb 22, 2024

A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health
architectures
social-sciences
production
DLM uses a language model to efficiently allocate health resources and adapt to policy changes.
Feb 22, 2024

COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling
hci
social-sciences
New framework uses language analysis to predict therapeutic alliance in psychotherapy sessions.
Feb 22, 2024

Leveraging Large Language Models for Concept Graph Recovery and Question Answering in NLP Education
education
hci
social-sciences
LLMs show promise in educational scenarios, with competitive concept graph recovery and improved question-answering.
Feb 22, 2024

Whose LLM is it Anyway? Linguistic Comparison and LLM Attribution for GPT-3.5, GPT-4 and Bard
hci
social-sciences
LLMs exhibit distinctive linguistic styles, enabling attribution of text to its source model with 88% accuracy.
Feb 22, 2024

Zero-shot cross-lingual transfer in instruction tuning of large language model
education
architectures
production
TL;DR: Instruction tuning in multilingual settings successful with proper hyperparameter tuning and large data.
Feb 22, 2024

MeTMaP: Metamorphic Testing for Detecting False Vector Matching Problems in LLM Augmented Generation
robustness
Augmented generation methods face challenges with false vector matching, MeTMaP framework detects inaccuracies.
Feb 22, 2024

Towards Understanding Counseling Conversations: Domain Knowledge and Large Language Models
hci
social-sciences
Examining counseling conversation dynamics, domain knowledge and LLMs improve conversation representation by 15%.
Feb 22, 2024

RelayAttention for Efficient Large Language Model Serving with Long System Prompts
prompt-engineering
robustness
architectures
production
Improving efficiency of large language models with long prompts using RelayAttention algorithm.
Feb 22, 2024

Is Cognition and Action Consistent or Not: Investigating Large Language Model’s Personality
hci
social-sciences
Study evaluates reliability of Large Language Models in emulating human-like personality traits.
Feb 22, 2024

Eagle: Ethical Dataset Given from Real Interactions
robustness
hci
social-sciences
security
Large language models have ethical issues, new dataset captures real-world problems.
Feb 22, 2024

InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks
architectures
production
Influence functions improve model performance and identify data points needing manual annotation.
Feb 22, 2024

On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe
prompt-engineering
LLMs excel at reverse dictionary task, predicting general reasoning performance. In-context learning enhances conceptual inference.
Feb 22, 2024

Do LLMs Implicitly Determine the Suitable Text Difficulty for Users?
education
Large language models can adjust text difficulty for better student understanding.
Feb 22, 2024

Rule or Story, Which is a Better Commonsense Expression for Talking with Large Language Models?
hci
social-sciences
Stories are better than rules for retrieving commonsense from large language models.
Feb 22, 2024

Using Large Language Models for Natural Language Processing Tasks in Requirements Engineering: A Systematic Guideline
education
Using LLMs for NLP tasks in requirements engineering (RE) requires basic knowledge and a usage guideline.
Feb 22, 2024

An LLM-Enhanced Adversarial Editing System for Lexical Simplification
architectures
production
Proposed LS method uses Adversarial Editing System and LLM-enhanced loss for lexical simplification.
Feb 22, 2024

CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
architectures
production
CriticBench evaluates LLMs’ critique and correction reasoning across tasks, revealing key performance factors.
Feb 22, 2024

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
architectures
production
MoE LLMs achieve higher performance with fewer parameters, enhanced by expert-level sparsification techniques.
Feb 22, 2024

Visual Hallucinations of Multi-modal Large Language Models
robustness
architectures
Tool VHTest generates diverse VH instances, finds MLLM hallucinations, and improves performance through fine-tuning.
Feb 22, 2024

Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation
hci
production
Novel LLM agent framework for urban mobility generation with real-world data validation.
Feb 22, 2024

Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization
programming
social-sciences
Language models lack explainability in code summarization, with no alignment between human and model focus.
Feb 22, 2024

Identifying Multiple Personalities in Large Language Models with External Evaluation
education
hci
social-sciences
prompt-engineering
production
LLMs’ personalities analyzed using external evaluation method, showing different personalities in different scenarios.
Feb 22, 2024

LLMs with Industrial Lens: Deciphering the Challenges and Prospects – A Survey
education
architectures
LLMs drive industrial applications, but challenges and opportunities need exploration for enhancement.
Feb 22, 2024

ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models
architectures
ConceptMath evaluates LLMs’ mathematical reasoning at different granularities, revealing performance variations and offering fine-tuning strategies.
Feb 22, 2024

Breaking the Barrier: Utilizing Large Language Models for Industrial Recommendation Systems through an Inferential Knowledge Graph
production
architectures
recommender
LLM-KERec improves recommendation systems by incorporating complementary knowledge and capturing user intent transitions.
Feb 21, 2024

LLM Jailbreak Attack versus Defense Techniques – A Comprehensive Study
security
prompt-engineering
robustness
hci
TL;DR: Large language models can generate harmful content, jailbreaking is a challenge, and new defense techniques are needed.
Feb 21, 2024

Graph Representation of Narrative Context: Coherence Dependency via Retrospective Questions
hci
Novel NarCo graph improves narrative comprehension and performance in various tasks without human annotations.
Feb 21, 2024

Hallucinations or Attention Misdirection? The Path to Strategic Value Extraction in Business Using Large Language Models
robustness
production
LLMs generate text with errors, but PGI method reduces error rate to 3.15%. Strategic application is key.
Feb 21, 2024

Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models
security
Text watermarking lacks cross-lingual consistency, vulnerable to removal attack, defense method proposed. AUC improved.
Feb 21, 2024

Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
robustness
production
architectures
security
LLM judges are vulnerable to simple attacks, raising concerns about their reliability in real-world scenarios.
Feb 21, 2024

CriticBench: Evaluating Large Language Models as Critic
production
CriticBench evaluates critique ability of Large Language Models across diverse tasks and response qualities.
Feb 21, 2024

Towards Building Multilingual Language Model for Medicine
production
social-sciences
Developed an open-source multilingual medical language model, MMedLM 2, which outperforms other models and rivals GPT-4.
Feb 21, 2024

Learning to Poison Large Language Models During Instruction Tuning
robustness
security
LLMs vulnerable to data poisoning attacks, new approach for trigger learning, high success rate.
Feb 21, 2024

Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning
production
prompt-engineering
architectures
LLMs need help reasoning; Frodo framework improves reasoning and answer accuracy.
Feb 21, 2024

Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent
architectures
hci
TL;DR: Neeko framework improves multi-character role-playing for dialogue agents with dynamic low-rank adapter strategy.
Feb 21, 2024

Factual Consistency Evaluation of Summarisation in the Era of Large Language Models
production
architectures
Evaluates the factual consistency of summaries in the era of large language models.
Feb 21, 2024

$ exttt{Se}^2$: $ extit{Se}$quential Example $ extit{Se}\(lection for In-Context Learning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Large language models need sequential examples for in-context learning, 𝚂𝚎2superscript𝚂𝚎2^{2}Se start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT method improves selection.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='212' data-categories='security,production' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071840' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Large_Language_Models_are_Advanced_Anonymizers/2024-02-21-Large_Language_Models_are_Advanced_Anonymizers.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.13846v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Large Language Models are Advanced Anonymizers</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Large language models can infer personal data, new anonymization methods needed.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> 
</div></a></div> <div class="g-col-1" data-index='213' data-categories='social-sciences,hci' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071832' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/KorNAT_LLM_Alignment_Benchmark_for_Korean_Social_Values_and_Common_Knowledge/2024-02-21-KorNAT_LLM_Alignment_Benchmark_for_Korean_Social_Values_and_Common_Knowledge.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13605v1/x2.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">LLMs need cultural understanding; KorNAT measures alignment with South Korea. 
Few models met reference score.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='214' data-categories='robustness,production,architectures,security' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071796' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Coercing_LLMs_to_do_and_reveal_(almost)_anything/2024-02-21-Coercing_LLMs_to_do_and_reveal_(almost)_anything.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.14020v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Coercing LLMs to do and reveal (almost) anything</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">Adversarial attacks on large language models have broader impact than jailbreaking, including coercion of unintended behaviors.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='215' data-categories='hci,social-sciences,architectures' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071828' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/Investigating_Multilingual_Instruction_Tuning_Do_Polyglot_Models_Demand_for_Multilingual_Instructions/2024-02-21-Investigating_Multilingual_Instruction_Tuning_Do_Polyglot_Models_Demand_for_Multilingual_Instructions.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13703v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">TL;DR: Multilingual LLMs benefit from instruction-tuning on parallel datasets, improving cross-lingual capabilities.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='216' data-categories='robustness,prompt-engineering,programming' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071876' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Test_Driven_Development_for_Code_Generation/2024-02-21-Test_Driven_Development_for_Code_Generation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13521v1/extracted/5421759/min.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div 
class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Test-Driven Development for Code Generation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> </div> <div class="card-text listing-description delink">TL;DR: Test-driven development improves GPT4 code generation.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='217' data-categories='social-sciences,hci' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Are_LLMs_Effective_Negotiators_Systematic_Evaluation_of_the_Multifaceted_Capabilities_of_LLMs_in_Negotiation_Dialogues/2024-02-21-Are_LLMs_Effective_Negotiators_Systematic_Evaluation_of_the_Multifaceted_Capabilities_of_LLMs_in_Negotiation_Dialogues.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13550v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Are LLMs Effective Negotiators? 
Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">LLMs can enhance negotiation research but struggle with context and strategic responses.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='218' data-categories='education,prompt-engineering' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071796' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Cognitive_Visual_Language_Mapper_Advancing_Multimodal_Comprehension_with_Enhanced_Visual_Knowledge_Alignment/2024-02-21-Cognitive_Visual_Language_Mapper_Advancing_Multimodal_Comprehension_with_Enhanced_Visual_Knowledge_Alignment.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13561v1/extracted/5421915/figures/intro_case.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">LMMs need visual knowledge alignment for better knowledge-based 
visual question answering. CVLM improves LMMs by 5%.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='219' data-categories='production,architectures' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071844' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/LongRoPE_Extending_LLM_Context_Window_Beyond_2_Million_Tokens/2024-02-21-LongRoPE_Extending_LLM_Context_Window_Beyond_2_Million_Tokens.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13753v1/extracted/5419364/final_ppl.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">LongRoPE extends context window of large language models to 2048k tokens with minimal fine-tuning.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='220' data-categories='robustness,social-sciences' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071776' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/A_Comprehensive_Study_of_Multilingual_Confidence_Estimation_on_Large_Language_Models/2024-02-21-A_Comprehensive_Study_of_Multilingual_Confidence_Estimation_on_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13606v1/extracted/5422187/figs/frame.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">LLMs need reliable confidence estimations; MlingConf improves cross-lingual confidence scores for diverse languages.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='221' data-categories='production,architectures,recommender' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071832' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/LLM4SBR_A_Lightweight_and_Effective_Framework_for_Integrating_Large_Language_Models_in_Session_based_Recommendation/2024-02-21-LLM4SBR_A_Lightweight_and_Effective_Framework_for_Integrating_Large_Language_Models_in_Session_based_Recommendation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13840v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body 
post-contents"> <h5 class="no-anchor card-title listing-title">LLM4SBR: A Lightweight and Effective Framework for Integrating Large Language Models in Session-based Recommendation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('recommender'); return false;">recommender</div> </div> <div class="card-text listing-description delink">Traditional session-based recommendation lacks semantic information, but LLM4SBR integrates large language models for improvement.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='222' data-categories='robustness,prompt-engineering,security' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071820' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/GradSafe_Detecting_Unsafe_Prompts_for_LLMs_via_Safety_Critical_Gradient_Analysis/2024-02-21-GradSafe_Detecting_Unsafe_Prompts_for_LLMs_via_Safety_Critical_Gradient_Analysis.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13494v1/x2.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); 
return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">GradSafe detects unsafe prompts in LLMs without extensive training, outperforming existing methods.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='223' data-categories='architectures' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071832' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/KInIT_at_SemEval_2024_Task_8_Fine_tuned_LLMs_for_Multilingual_Machine_Generated_Text_Detection/2024-02-21-KInIT_at_SemEval_2024_Task_8_Fine_tuned_LLMs_for_Multilingual_Machine_Generated_Text_Detection.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13671v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">Fine-tuned LLMs detect multilingual machine-generated text for SemEval-2024 Task 8, helping to prevent misuse of large language models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='224' data-categories='education,hci' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071836' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> 
<a href="/posts/LLMs_Meet_Long_Video_Advancing_Long_Video_Comprehension_with_An_Interactive_Visual_Adapter_in_LLMs/2024-02-21-LLMs_Meet_Long_Video_Advancing_Long_Video_Comprehension_with_An_Interactive_Visual_Adapter_in_LLMs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13546v1/extracted/5421903/figures/model.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">TL;DR: Interactive Visual Adapter improves video understanding in large language models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='225' data-categories='production' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071892' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/What_Linguistic_Features_and_Languages_are_Important_in_LLM_Translation/2024-02-21-What_Linguistic_Features_and_Languages_are_Important_in_LLM_Translation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">What Linguistic Features and Languages are Important in LLM Translation?</h5> <div 
class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Llama2 excels in machine translation, but performance varies for languages not in its training data.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='226' data-categories='robustness,production' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071868' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/SYNFAC_EDIT_Synthetic_Imitation_Edit_Feedback_for_Factual_Alignment_in_Clinical_Summarization/2024-02-21-SYNFAC_EDIT_Synthetic_Imitation_Edit_Feedback_for_Factual_Alignment_in_Clinical_Summarization.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13919v1/extracted/5416467/Images/acl_main.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Synthetic edit feedback generated with GPT improves the factual alignment of clinical summarization.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='227' data-categories='robustness,prompt-engineering,security' data-listing-date-sort='1708473600000' 
data-listing-file-modified-sort='1717413071860' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/RITFIS_Robust_input_testing_framework_for_LLMs_based_intelligent_software/2024-02-21-RITFIS_Robust_input_testing_framework_for_LLMs_based_intelligent_software.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13518v1/extracted/5421727/RITFIS_framework.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">RITFIS: Robust input testing framework for LLMs-based intelligent software</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">RITFIS assesses robustness of NLP software, adapting DNN testing methods for LLM-based software.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='228' data-categories='social-sciences' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Bangla_AI_A_Framework_for_Machine_Translation_Utilizing_Large_Language_Models_for_Ethnic_Media/2024-02-21-Bangla_AI_A_Framework_for_Machine_Translation_Utilizing_Large_Language_Models_for_Ethnic_Media.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img 
data-src="https://browse.arxiv.org/html/2402.14179v1/extracted/5423885/Z.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Bangla AI: A Framework for Machine Translation Utilizing Large Language Models for Ethnic Media</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Ethnic media uses LLM and MMT for news translation and searching, with potential ethical challenges.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='229' data-categories='prompt-engineering' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071868' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Self_DC_When_to_retrieve_and_When_to_generate_Self_Divide_and_Conquer_for_Compositional_Unknown_Questions/2024-02-21-Self_DC_When_to_retrieve_and_When_to_generate_Self_Divide_and_Conquer_for_Compositional_Unknown_Questions.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Self-DC: When to retrieve and When to generate? 
Self Divide-and-Conquer for Compositional Unknown Questions</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">Self-DC uses a self divide-and-conquer strategy to decide when to retrieve and when to generate for compositional unknown questions.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='230' data-categories='robustness' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071852' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/OlympiadBench_A_Challenging_Benchmark_for_Promoting_AGI_with_Olympiad_Level_Bilingual_Multimodal_Scientific_Problems/2024-02-21-OlympiadBench_A_Challenging_Benchmark_for_Promoting_AGI_with_Olympiad_Level_Bilingual_Multimodal_Scientific_Problems.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.14008v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">Large Language and Multimodal Models surpass human capabilities, but struggle with rigorous Olympiad-level challenges. 
GPT-4V scores 17.23%.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='231' data-categories='robustness,architectures' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071852' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Ouroboros_Speculative_Decoding_with_Large_Model_Enhanced_Drafting/2024-02-21-Ouroboros_Speculative_Decoding_with_Large_Model_Enhanced_Drafting.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13720v1/x2.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Ouroboros: Speculative Decoding with Large Model Enhanced Drafting</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">Ouroboros accelerates language model inference with speculative decoding and phrase candidate pool. Speedups up to 2.8x. 
Source code available.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='232' data-categories='production,social-sciences,architectures' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Beyond_Probabilities_Unveiling_the_Misalignment_in_Evaluating_Large_Language_Models/2024-02-21-Beyond_Probabilities_Unveiling_the_Misalignment_in_Evaluating_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13887v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">TL;DR: Probability-based evaluation of Large Language Models for MCQs has limitations, needs improvement.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='233' data-categories='prompt-engineering,architectures' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071888' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/Unlocking_Instructive_In_Context_Learning_with_Tabular_Prompting_for_Relational_Triple_Extraction/2024-02-21-Unlocking_Instructive_In_Context_Learning_with_Tabular_Prompting_for_Relational_Triple_Extraction.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13741v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Unlocking Instructive In-Context Learning with Tabular Prompting for Relational Triple Extraction</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">Innovative methods improve relational triple extraction with effective prompts and proper demonstrations.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='234' data-categories='prompt-engineering' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071888' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/User_LLM_Efficient_LLM_Contextualization_with_User_Embeddings/2024-02-21-User_LLM_Efficient_LLM_Contextualization_with_User_Embeddings.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13598v1/extracted/5419570/figures/user-llm-motivation.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">User-LLM: 
Efficient LLM Contextualization with User Embeddings</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">User-LLM framework contextualizes LLMs with user embeddings for improved performance and efficiency.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='235' data-categories='architectures' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071868' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Self_Distillation_Bridges_Distribution_Gap_in_Language_Model_Fine_Tuning/2024-02-21-Self_Distillation_Bridges_Distribution_Gap_in_Language_Model_Fine_Tuning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13669v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">TL;DR: Self-Distillation Fine-Tuning bridges distribution gap, improves LLM performance on specific tasks.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='236' data-categories='security,robustness' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071864' data-listing-date-modified-sort='NaN' 
data-listing-reading-time-sort='2'> <a href="/posts/Round_Trip_Translation_Defence_against_Large_Language_Model_Jailbreaking_Attacks/2024-02-21-Round_Trip_Translation_Defence_against_Large_Language_Model_Jailbreaking_Attacks.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13517v1/extracted/5421742/Figures/Figure1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Round Trip Translation Defence against Large Language Model Jailbreaking Attacks</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">New method defends against social-engineered attacks on large language models, mitigating over 70% of attacks.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='237' data-categories='robustness,prompt-engineering,architectures,security' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071840' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Large_Language_Models_are_Vulnerable_to_Bait_and_Switch_Attacks_for_Generating_Harmful_Content/2024-02-21-Large_Language_Models_are_Vulnerable_to_Bait_and_Switch_Attacks_for_Generating_Harmful_Content.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title 
listing-title">Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">Safe language model outputs can be manipulated into harmful content through Bait-and-Switch attacks.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='238' data-categories='prompt-engineering' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071804' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='1'> <a href="/posts/DeiSAM_Segment_Anything_with_Deictic_Prompting/2024-02-21-DeiSAM_Segment_Anything_with_Deictic_Prompting.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.14123v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">DeiSAM: Segment Anything with Deictic Prompting</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: DeiSAM uses neural networks and logic reasoners for deictic promptable image segmentation.</div> <div 
class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='239' data-categories='education,production' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071888' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Using_Large_Language_Models_for_Natural_Language_Processing_Tasks_in_Requirements_Engineering_A_Systematic_Guideline/2024-02-21-Using_Large_Language_Models_for_Natural_Language_Processing_Tasks_in_Requirements_Engineering_A_Systematic_Guideline.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13823v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Using Large Language Models for Natural Language Processing Tasks in Requirements Engineering: A Systematic Guideline</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">TL;DR: This article provides knowledge and guidelines for using Large Language Models in NLP for RE.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='240' data-categories='hci,architectures' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a 
href="/posts/An_Evaluation_of_Large_Language_Models_in_Bioinformatics_Research/2024-02-21-An_Evaluation_of_Large_Language_Models_in_Bioinformatics_Research.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13714v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">An Evaluation of Large Language Models in Bioinformatics Research</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">LLMs like ChatGPT show potential in bioinformatics tasks, with some limitations. Motivates future research.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='241' data-categories='robustness,prompt-engineering,security,production' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/An_Explainable_Transformer_based_Model_for_Phishing_Email_Detection_A_Large_Language_Model_Approach/2024-02-21-An_Explainable_Transformer_based_Model_for_Phishing_Email_Detection_A_Large_Language_Model_Approach.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13871v1/extracted/5401888/figs/Methodology.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">An Explainable Transformer-based Model 
for Phishing Email Detection: A Large Language Model Approach</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Phishing emails are a serious threat. The proposed DistilBERT-based model detects them with high accuracy.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='242' data-categories='architectures' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071828' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/InfinityBench_Extending_Long_Context_Evaluation_Beyond_100K_Tokens/2024-02-21-InfinityBench_Extending_Long_Context_Evaluation_Beyond_100K_Tokens.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13718v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">InfinityBench: Extending Long Context Evaluation Beyond 100K Tokens</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">LLMs still need improvement to process contexts beyond 100K tokens effectively, and a standardized benchmark for such contexts has been lacking.</div> <div class="card-attribution 
card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='243' data-categories='social-sciences' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071856' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Potential_and_Challenges_of_Model_Editing_for_Social_Debiasing/2024-02-21-Potential_and_Challenges_of_Model_Editing_for_Social_Debiasing.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13462v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Potential and Challenges of Model Editing for Social Debiasing</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Large language models suffer from stereotype biases, model editing methods show potential for debiasing.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='244' data-categories='education,production,architectures,programming,social-sciences' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071832' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Kuaiji_the_First_Chinese_Accounting_Large_Language_Model/2024-02-21-Kuaiji_the_First_Chinese_Accounting_Large_Language_Model.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13866v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > 
</p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Kuaiji: the First Chinese Accounting Large Language Model</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Kuaiji: specialized Chinese accounting LLM with high accuracy and speed.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='245' data-categories='recommender' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071844' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Leveraging_Translation_For_Optimal_Recall_Tailoring_LLM_Personalization_With_User_Profiles/2024-02-21-Leveraging_Translation_For_Optimal_Recall_Tailoring_LLM_Personalization_With_User_Profiles.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13500v1/extracted/5421673/method_ir.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Leveraging Translation For Optimal Recall: Tailoring LLM Personalization With User Profiles</h5> <div class="listing-categories"> <div class="listing-category" 
onclick="window.quartoListingCategory('recommender'); return false;">recommender</div> </div> <div class="card-text listing-description delink">Novel technique improves cross-language information retrieval with personalized query refinement and semantic expansion.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='246' data-categories='architectures' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071792' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Calibrating_Large_Language_Models_with_Sample_Consistency/2024-02-21-Calibrating_Large_Language_Models_with_Sample_Consistency.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13904v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Calibrating Large Language Models with Sample Consistency</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">LLMs need calibrated confidence; consistency-based methods outperform post-hoc approaches, with potential for model enhancement.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='247' data-categories='education' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071824' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/Hybrid_Reasoning_Based_on_Large_Language_Models_for_Autonomous_Car_Driving/2024-02-21-Hybrid_Reasoning_Based_on_Large_Language_Models_for_Autonomous_Car_Driving.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13602v1/extracted/5422113/flow2.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LLMs improve autonomous driving by combining text and images for decision-making in dynamic situations.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='248' data-categories='architectures' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071816' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/From_Text_to_CQL_Bridging_Natural_Language_and_Corpus_Search_Engine/2024-02-21-From_Text_to_CQL_Bridging_Natural_Language_and_Corpus_Search_Engine.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13740v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">From Text to CQL: Bridging Natural Language and Corpus Search Engine</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> 
<div class="card-text listing-description delink">NLP automates the translation of natural language into CQL queries, supporting linguistic research and text analysis.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='249' data-categories='prompt-engineering' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071808' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Effective_and_Efficient_Conversation_Retrieval_for_Dialogue_State_Tracking_with_Implicit_Text_Summaries/2024-02-21-Effective_and_Efficient_Conversation_Retrieval_for_Dialogue_State_Tracking_with_Implicit_Text_Summaries.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13043v2/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: Few-shot DST uses an LLM conversation retriever, improved with text summaries for better performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='250' data-categories='robustness,prompt-engineering' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071864' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/RefuteBench_Evaluating_Refuting_Instruction_Following_for_Large_Language_Models/2024-02-21-RefuteBench_Evaluating_Refuting_Instruction_Following_for_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13463v1/extracted/5421565/editing.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">LLMs struggle to accept and follow user feedback, highlighting the need for recall-and-repeat prompts.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='251' data-categories='robustness' data-listing-date-sort='1708473600000' data-listing-file-modified-sort='1717413071776' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/ARL2_Aligning_Retrievers_for_Black_box_Large_Language_Models_via_Self_guided_Adaptive_Relevance_Labeling/2024-02-21-ARL2_Aligning_Retrievers_for_Black_box_Large_Language_Models_via_Self_guided_Adaptive_Relevance_Labeling.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13542v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">ARL2: Aligning 
Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">ARL2 improves large language models through better retriever learning and transfer capabilities.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 21, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='252' data-categories='education' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071884' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Towards_Cross_Tokenizer_Distillation_the_Universal_Logit_Distillation_Loss_for_LLMs/2024-02-20-Towards_Cross_Tokenizer_Distillation_the_Universal_Logit_Distillation_Loss_for_LLMs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12030v2/extracted/5419742/tokenize-vocabularies-small.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">TL;DR: Universal Logit Distillation compresses knowledge from large language models for wider applicability.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='253' data-categories='education' data-listing-date-sort='1708387200000' 
data-listing-file-modified-sort='1717413071848' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Me_LLaMA_Foundation_Large_Language_Models_for_Medical_Applications/2024-02-20-Me_LLaMA_Foundation_Large_Language_Models_for_Medical_Applications.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12749v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Me LLaMA: Foundation Large Language Models for Medical Applications</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">Me LLaMA outperforms other medical LLMs in various tasks, making it ideal for medical AI.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='254' data-categories='prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071828' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Instruction_tuned_Language_Models_are_Better_Knowledge_Learners/2024-02-20-Instruction_tuned_Language_Models_are_Better_Knowledge_Learners.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12847v1/x2.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Instruction-tuned Language Models are Better Knowledge Learners</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); 
return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">Pre-instruction-tuning (PIT) improves large language model (LLM) knowledge absorption by 17.8%.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='255' data-categories='robustness,production' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071880' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/TofuEval_Evaluating_Hallucinations_of_LLMs_on_Topic_Focused_Dialogue_Summarization/2024-02-20-TofuEval_Evaluating_Hallucinations_of_LLMs_on_Topic_Focused_Dialogue_Summarization.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13249v1/x3.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Advances in news summarization don't carry over to dialogue summarization. LLMs generate factual errors. 
Benchmark dataset released.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='256' data-categories='architectures,production' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071884' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/TreeEval_Benchmark_Free_Evaluation_of_Large_Language_Models_through_Tree_Planning/2024-02-20-TreeEval_Benchmark_Free_Evaluation_of_Large_Language_Models_through_Tree_Planning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13125v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">TreeEval introduces a benchmark-free evaluation method for large language models, addressing data leakage issues.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='257' data-categories='education,production,prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071808' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/ELAD_Explanation_Guided_Large_Language_Models_Active_Distillation/2024-02-20-ELAD_Explanation_Guided_Large_Language_Models_Active_Distillation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13098v1/extracted/5420568/framework.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">ELAD: Explanation-Guided Large Language Models Active Distillation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: ELAD framework improves LLM distillation efficiency with active learning and sample selection.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='258' data-categories='social-sciences,hci' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071780' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Advancing_Large_Language_Models_to_Capture_Varied_Speaking_Styles_and_Respond_Properly_in_Spoken_Conversations/2024-02-20-Advancing_Large_Language_Models_to_Capture_Varied_Speaking_Styles_and_Respond_Properly_in_Spoken_Conversations.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12786v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div 
class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">Spoken dialogue style affects responses; Spoken-LLM framework outperforms text-only models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='259' data-categories='social-sciences,architectures,production,security' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Bayesian_Reward_Models_for_LLM_Alignment/2024-02-20-Bayesian_Reward_Models_for_LLM_Alignment.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13210v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Bayesian Reward Models for LLM Alignment</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> 
</div> <div class="card-text listing-description delink">Bayesian reward models mitigate overoptimization in large language model responses.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='260' data-categories='social-sciences' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Are_Large_Language_Models_Rational_Investors/2024-02-20-Are_Large_Language_Models_Rational_Investors.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12713v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Are Large Language Models Rational Investors?</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">LLMs in finance have biases, need thorough assessment. 
FBI framework evaluates rationality, reveals varying degrees of irrationality.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='261' data-categories='education,hci' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071848' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Modality_Aware_Integration_with_Large_Language_Models_for_Knowledge_based_Visual_Question_Answering/2024-02-20-Modality_Aware_Integration_with_Large_Language_Models_for_Knowledge_based_Visual_Question_Answering.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12728v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">KVQA challenges addressed with modality-aware integration for image understanding and knowledge reasoning.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='262' data-categories='robustness,hci' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071864' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/Reliable_LLM_based_User_Simulator_for_Task_Oriented_Dialogue_Systems/2024-02-20-Reliable_LLM_based_User_Simulator_for_Task_Oriented_Dialogue_Systems.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Reliable LLM-based User Simulator for Task-Oriented Dialogue Systems</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">A reliable LLM-based user simulator for evaluating task-oriented dialogue systems.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='263' data-categories='architectures,production' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071872' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Smaug_Fixing_Failure_Modes_of_Preference_Optimisation_with_DPO_Positive/2024-02-20-Smaug_Fixing_Failure_Modes_of_Preference_Optimisation_with_DPO_Positive.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13228v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive</h5> <div class="listing-categories"> <div 
class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">DPO improves large language model performance; DPOP outperforms DPO and achieves state-of-the-art open-source performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='264' data-categories='social-sciences,architectures,production' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071792' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/CIF_Bench_A_Chinese_Instruction_Following_Benchmark_for_Evaluating_the_Generalizability_of_Large_Language_Models/2024-02-20-CIF_Bench_A_Chinese_Instruction_Following_Benchmark_for_Evaluating_the_Generalizability_of_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13109v1/x12.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">TL;DR: CIF-Bench tests LLMs' generalizability 
to Chinese, revealing limitations and evaluation biases.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='265' data-categories='architectures' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071872' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Slot_VLM_SlowFast_Slots_for_Video_Language_Modeling/2024-02-20-Slot_VLM_SlowFast_Slots_for_Video_Language_Modeling.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13088v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Slot-VLM: SlowFast Slots for Video-Language Modeling</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">Slot-VLM framework generates video tokens for efficient question-answering, achieving state-of-the-art performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='266' data-categories='robustness,security' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071820' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/GumbelSoft_Diversified_Language_Model_Watermarking_via_the_GumbelMax_trick/2024-02-20-GumbelSoft_Diversified_Language_Model_Watermarking_via_the_GumbelMax_trick.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12948v1/x1.png" 
class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">Large language models raise concerns about misuse, but GumbelSoft watermark enhances diversity and performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='267' data-categories='social-sciences,production,hci,prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071828' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Investigating_Cultural_Alignment_of_Large_Language_Models/2024-02-20-Investigating_Cultural_Alignment_of_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13231v1/x2.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Investigating Cultural Alignment of Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" 
onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">Large Language Models (LLMs) align with cultures, but need diverse pretraining data for accuracy.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='268' data-categories='programming,architectures,production,prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071864' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/RoCode_A_Dataset_for_Measuring_Code_Intelligence_from_Problem_Definitions_in_Romanian/2024-02-20-RoCode_A_Dataset_for_Measuring_Code_Intelligence_from_Problem_Definitions_in_Romanian.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13222v1/extracted/5420407/images/flag.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">RoCode: A Dataset for Measuring Code Intelligence from Problem Definitions in Romanian</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: RoCode provides Romanian programming dataset to evaluate language models and fine-tune Romanian 
models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='269' data-categories='social-sciences' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071812' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Explaining_Relationships_Among_Research_Papers/2024-02-20-Explaining_Relationships_Among_Research_Papers.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Explaining Relationships Among Research Papers</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Automatically generate customized literature reviews to help researchers decide what to read.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='270' data-categories='social-sciences,hci' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Are_Large_Language_Models_(LLMs)_Good_Social_Predictors/2024-02-20-Are_Large_Language_Models_(LLMs)_Good_Social_Predictors.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12620v1/extracted/5418884/img/votingresult1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 
class="no-anchor card-title listing-title">Are Large Language Models (LLMs) Good Social Predictors?</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">LLMs struggle with social prediction without input shortcuts, requiring further enhancement.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='271' data-categories='education,prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071856' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/PromptKD_Distilling_Student_Friendly_Knowledge_for_Generative_Language_Models_via_Prompt_Tuning/2024-02-20-PromptKD_Distilling_Student_Friendly_Knowledge_for_Generative_Language_Models_via_Prompt_Tuning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12842v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">Advancements in large language models raise inference costs, prompting research into model compression. 
PromptKD achieves state-of-the-art performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='272' data-categories='architectures,production' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071872' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Soft_Self_Consistency_Improves_Language_Model_Agents/2024-02-20-Soft_Self_Consistency_Improves_Language_Model_Agents.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13212v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Soft Self-Consistency Improves Language Model Agents</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Sampling and scoring improve language model generations; Soft Self-Consistency increases performance and efficiency.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='273' data-categories='robustness,security' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071876' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='1'> <a href="/posts/TRAP_Targeted_Random_Adversarial_Prompt_Honeypot_for_Black_Box_Identification/2024-02-20-TRAP_Targeted_Random_Adversarial_Prompt_Honeypot_for_Black_Box_Identification.qmd" class="quarto-grid-link"> <div 
class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12991v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">TL;DR: TRAP method detects LLM use in third-party apps with high accuracy.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='274' data-categories='prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071876' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/SymBa_Symbolic_Backward_Chaining_for_Multi_step_Natural_Language_Reasoning/2024-02-20-SymBa_Symbolic_Backward_Chaining_for_Multi_step_Natural_Language_Reasoning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12806v1/extracted/5419471/figures/figure_intro.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">SymBa: Symbolic Backward Chaining for Multi-step Natural Language Reasoning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: Symbolic 
Backward Chaining improves multi-step reasoning with LLM integration.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='275' data-categories='prompt-engineering,recommender' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071888' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Unlocking_the_Why_of_Buying_Introducing_a_New_Dataset_and_Benchmark_for_Purchase_Reason_and_Post_Purchase_Experience/2024-02-20-Unlocking_the_Why_of_Buying_Introducing_a_New_Dataset_and_Benchmark_for_Purchase_Reason_and_Post_Purchase_Experience.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Unlocking the 'Why' of Buying: Introducing a New Dataset and Benchmark for Purchase Reason and Post-Purchase Experience</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('recommender'); return false;">recommender</div> </div> <div class="card-text listing-description delink">Explainable recommendation systems need high-quality datasets; the authors propose a novel purchase-reason explanation task.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='276' data-categories='education,prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071872' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/Structure_Guided_Prompt_Instructing_Large_Language_Model_in_Multi_Step_Reasoning_by_Exploring_Graph_Structure_of_the_Text/2024-02-20-Structure_Guided_Prompt_Instructing_Large_Language_Model_in_Multi_Step_Reasoning_by_Exploring_Graph_Structure_of_the_Text.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13415v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Structure Guided Prompt: Instructing Large Language Model in Multi-Step Reasoning by Exploring Graph Structure of the Text</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">LLMs struggle with complex reasoning, but Structure Guided Prompt improves multi-step reasoning capabilities.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='277' data-categories='hci' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071836' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Large_Language_Model_based_Human_Agent_Collaboration_for_Complex_Task_Solving/2024-02-20-Large_Language_Model_based_Human_Agent_Collaboration_for_Complex_Task_Solving.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12914v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 
class="no-anchor card-title listing-title">Large Language Model-based Human-Agent Collaboration for Complex Task Solving</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">Integration of LLMs in human-agent collaboration for complex task-solving, ReHAC method shows effectiveness.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='278' data-categories='architectures' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071872' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/SiLLM_Large_Language_Models_for_Simultaneous_Machine_Translation/2024-02-20-SiLLM_Large_Language_Models_for_Simultaneous_Machine_Translation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13036v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">SiLLM: Large Language Models for Simultaneous Machine Translation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">SiLLM decouples SiMT into policy and translation sub-tasks, achieving state-of-the-art performance with LLM.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='279' data-categories='robustness,social-sciences' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071792' 
data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Bias_in_Language_Models_Beyond_Trick_Tests_and_Toward_RUTEd_Evaluation/2024-02-20-Bias_in_Language_Models_Beyond_Trick_Tests_and_Toward_RUTEd_Evaluation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12649v1/extracted/5418948/final_results_combined.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Bias benchmarks don't accurately predict real-world harm in language models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='280' data-categories='social-sciences,architectures,production' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071792' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/BiMediX_Bilingual_Medical_Mixture_of_Experts_LLM/2024-02-20-BiMediX_Bilingual_Medical_Mixture_of_Experts_LLM.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13253v1/x2.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">BiMediX: Bilingual Medical Mixture of Experts LLM</h5> <div 
class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Introducing BiMediX: bilingual medical LLM for English and Arabic, outperforming state-of-the-art models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='281' data-categories='robustness,architectures,production,security' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071804' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Defending_Jailbreak_Prompts_via_In_Context_Adversarial_Game/2024-02-20-Defending_Jailbreak_Prompts_via_In_Context_Adversarial_Game.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13148v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Defending Jailbreak Prompts via In-Context Adversarial Game</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div 
class="card-text listing-description delink">ICAG defends large language models from jailbreak attacks without fine-tuning, with high efficacy and transferability.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='282' data-categories='social-sciences,prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071816' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Few_shot_clinical_entity_recognition_in_three_languages_Masked_language_models_outperform_LLM_prompting/2024-02-20-Few_shot_clinical_entity_recognition_in_three_languages_Masked_language_models_outperform_LLM_prompting.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Few shot clinical entity recognition in three languages: Masked language models outperform LLM prompting</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">Large Language Models not ready for clinical entity recognition; better for speeding up data annotation.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='283' data-categories='architectures,production' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071812' data-listing-date-modified-sort='NaN' 
data-listing-reading-time-sort='2'> <a href="/posts/Event_level_Knowledge_Editing/2024-02-20-Event_level_Knowledge_Editing.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13093v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Event-level Knowledge Editing</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Knowledge editing updates large language models with new events for efficiency and completeness. ELKEN benchmark challenges existing methods.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='284' data-categories='robustness,social-sciences,architectures,production' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Benchmarking_Retrieval_Augmented_Generation_for_Medicine/2024-02-20-Benchmarking_Retrieval_Augmented_Generation_for_Medicine.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13178v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Benchmarking Retrieval-Augmented Generation for Medicine</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); 
return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">LLMs struggle with outdated knowledge, but RAG improves medical question answering accuracy.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='285' data-categories='social-sciences,hci,production' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071792' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Can_Large_Language_Models_be_Good_Emotional_Supporter_Mitigating_Preference_Bias_on_Emotional_Support_Conversation/2024-02-20-Can_Large_Language_Models_be_Good_Emotional_Supporter_Mitigating_Preference_Bias_on_Emotional_Support_Conversation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13211v1/extracted/5420974/figure/llms_motivation.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Can Large Language Models be Good Emotional Supporter? 
Mitigating Preference Bias on Emotional Support Conversation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">ESConv dataset reveals LLMs struggle with emotional support, need external assistance for improvement.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='286' data-categories='education,prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071820' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Healthcare_Copilot_Eliciting_the_Power_of_General_LLMs_for_Medical_Consultation/2024-02-20-Healthcare_Copilot_Eliciting_the_Power_of_General_LLMs_for_Medical_Consultation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13408v1/x2.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: Healthcare Copilot enhances language models for medical consultations, with 
three main components and positive results.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='287' data-categories='robustness,security' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071848' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Measuring_Impacts_of_Poisoning_on_Model_Parameters_and_Neuron_Activations_A_Case_Study_of_Poisoning_CodeBERT/2024-02-20-Measuring_Impacts_of_Poisoning_on_Model_Parameters_and_Neuron_Activations_A_Case_Study_of_Poisoning_CodeBERT.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12936v1/extracted/5420119/results/distribution_codebert-base_layer_11_weight.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Measuring Impacts of Poisoning on Model Parameters and Neuron Activations: A Case Study of Poisoning CodeBERT</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">TL;DR: Analyzing model parameters to detect backdoor signals in code models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='288' data-categories='social-sciences' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071888' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/Understanding_the_effects_of_language_specific_class_imbalance_in_multilingual_fine_tuning/2024-02-20-Understanding_the_effects_of_language_specific_class_imbalance_in_multilingual_fine_tuning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13016v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Understanding the effects of language-specific class imbalance in multilingual fine-tuning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Imbalanced labels in multilingual datasets affect transformer model performance, but language-specific class weights can help.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='289' data-categories='prompt-engineering,programming' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071780' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/A_Simple_but_Effective_Approach_to_Improve_Structured_Language_Model_Output_for_Information_Extraction/2024-02-20-A_Simple_but_Effective_Approach_to_Improve_Structured_Language_Model_Output_for_Information_Extraction.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13364v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">A Simple but Effective Approach to Improve Structured Language Model Output for 
Information Extraction</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> </div> <div class="card-text listing-description delink">G&O method improves LLMs' structured text generation, enhancing performance in NER and RE tasks.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='290' data-categories='architectures,production' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071852' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/OLViT_Multi_Modal_State_Tracking_via_Attention_Based_Embeddings_for_Video_Grounded_Dialog/2024-02-20-OLViT_Multi_Modal_State_Tracking_via_Attention_Based_Embeddings_for_Video_Grounded_Dialog.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13146v1/extracted/5420688/figures/website.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Novel video dialog model OLViT improves object tracking and dialog state tracking.</div> <div 
class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='291' data-categories='education,architectures' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071876' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Synthetic_Data_(Almost)_from_Scratch_Generalized_Instruction_Tuning_for_Language_Models/2024-02-20-Synthetic_Data_(Almost)_from_Scratch_Generalized_Instruction_Tuning_for_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13064v1/extracted/5420465/images/glan_cmp_v4.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">GLAN is a method for instruction tuning of Large Language Models using a pre-curated taxonomy.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='292' data-categories='programming,prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071796' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Code_Needs_Comments_Enhancing_Code_LLMs_with_Comment_Augmentation/2024-02-20-Code_Needs_Comments_Enhancing_Code_LLMs_with_Comment_Augmentation.qmd" 
class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13013v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Code Needs Comments: Enhancing Code LLMs with Comment Augmentation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: Pre-training data impacts code-focused LLMs; new method improves performance on programming skill benchmarks.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='293' data-categories='robustness,architectures,security,production,prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071824' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/How_Easy_is_It_to_Fool_Your_Multimodal_LLMs_An_Empirical_Analysis_on_Deceptive_Prompts/2024-02-20-How_Easy_is_It_to_Fool_Your_Multimodal_LLMs_An_Empirical_Analysis_on_Deceptive_Prompts.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13220v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">How Easy is It to Fool Your Multimodal LLMs? 
An Empirical Analysis on Deceptive Prompts</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">MAD-Bench tests MLLMs' vulnerability to deceptive prompts, showing GPT-4V outperforms other models. Proposed remedy improves accuracy.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='294' data-categories='prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/An_Autonomous_Large_Language_Model_Agent_for_Chemical_Literature_Data_Mining/2024-02-20-An_Autonomous_Large_Language_Model_Agent_for_Chemical_Literature_Data_Mining.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12993v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">An Autonomous Large Language Model Agent for Chemical Literature Data Mining</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> 
<div class="card-text listing-description delink">AI aids chemical synthesis data analysis, overcoming challenges in literature processing.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='295' data-categories='robustness,prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071852' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/OPDAI_at_SemEval_2024_Task_6_Small_LLMs_can_Accelerate_Hallucination_Detection_with_Weakly_Supervised_Data/2024-02-20-OPDAI_at_SemEval_2024_Task_6_Small_LLMs_can_Accelerate_Hallucination_Detection_with_Weakly_Supervised_Data.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12913v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">OPDAI at SemEval-2024 Task 6: Small LLMs can Accelerate Hallucination Detection with Weakly Supervised Data</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">Unified system detects LLM hallucinations using prompt engineering and few-shot learning, and won a prize in the shared task.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='296' data-categories='architectures,production' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071780'
data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/A_Survey_on_Knowledge_Distillation_of_Large_Language_Models/2024-02-20-A_Survey_on_Knowledge_Distillation_of_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13116v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">A Survey on Knowledge Distillation of Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Survey explores knowledge distillation in Large Language Models, bridging gap between proprietary and open-source models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='297' data-categories='architectures,prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071808' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Effective_and_Efficient_Conversation_Retrieval_for_Dialogue_State_Tracking_with_Implicit_Text_Summaries/2024-02-20-Effective_and_Efficient_Conversation_Retrieval_for_Dialogue_State_Tracking_with_Implicit_Text_Summaries.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13043v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title 
listing-title">Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: Few-shot DST with LLM uses conversation summarization for effective conversation retrieval and improved performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='298' data-categories='architectures,production' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071872' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Softmax_Probabilities_(Mostly)_Predict_Large_Language_Model_Correctness_on_Multiple_Choice_QA/2024-02-20-Softmax_Probabilities_(Mostly)_Predict_Large_Language_Model_Correctness_on_Multiple_Choice_QA.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.13213v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Softmax Probabilities (Mostly) Predict Large Language Model Correctness on Multiple-Choice Q&A</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Large language models overconfident on Q&A tasks; wrong 
answers associated with smaller maximum softmax probabilities. Abstaining improves performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='299' data-categories='education,architectures,prompt-engineering' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071840' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Learning_to_Check_Unleashing_Potentials_for_Self_Correction_in_Large_Language_Models/2024-02-20-Learning_to_Check_Unleashing_Potentials_for_Self_Correction_in_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.13035v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">LLMs improve reasoning through self-correction, enhanced by meticulous training data design.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='300' data-categories='education' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071816' data-listing-date-modified-sort='NaN' 
data-listing-reading-time-sort='2'> <a href="/posts/FormulaQA_A_Question_Answering_Dataset_for_Formula_Based_Numerical_Reasoning/2024-02-20-FormulaQA_A_Question_Answering_Dataset_for_Formula_Based_Numerical_Reasoning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12692v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">FormulaQA: A Question Answering Dataset for Formula-Based Numerical Reasoning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">Proposing FormulaQA dataset for formula-based numerical reasoning, evaluating LLMs and exploring retrieval-augmented LLMs.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='301' data-categories='robustness,security,education,hci,prompt-engineering,programming' data-listing-date-sort='1708387200000' data-listing-file-modified-sort='1717413071856' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Prompt_Stealing_Attacks_Against_Large_Language_Models/2024-02-20-Prompt_Stealing_Attacks_Against_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12959v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Prompt Stealing Attacks Against Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" 
onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> </div> <div class="card-text listing-description delink">TL;DR: Proposed prompt stealing attack aims to steal well-designed prompts from large language models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 20, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='302' data-categories='education' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071828' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Investigating_Multi_Hop_Factual_Shortcuts_in_Knowledge_Editing_of_Large_Language_Models/2024-02-19-Investigating_Multi_Hop_Factual_Shortcuts_in_Knowledge_Editing_of_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11900v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LLMs can 
use shortcuts for multi-hop reasoning, but erasing them reduces failures.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='303' data-categories='production,architectures' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071868' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Sequoia_Scalable_Robust_and_Hardware_aware_Speculative_Decoding/2024-02-19-Sequoia_Scalable_Robust_and_Hardware_aware_Speculative_Decoding.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">Sequoia improves large language model inference speed by up to 10.33x on specific hardware platforms.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='304' data-categories='robustness,architectures,production' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071780' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Adaptive_Skeleton_Graph_Decoding/2024-02-19-Adaptive_Skeleton_Graph_Decoding.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p 
class="card-img-top"> <img data-src="/img/2402.12280v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Adaptive Skeleton Graph Decoding</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Large language models (LLMs) use Skeleton Graph Decoding (SGD) for faster, higher quality responses.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='305' data-categories='education' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071884' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Towards_Cross_Tokenizer_Distillation_the_Universal_Logit_Distillation_Loss_for_LLMs/2024-02-19-Towards_Cross_Tokenizer_Distillation_the_Universal_Logit_Distillation_Loss_for_LLMs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12030v1/extracted/5417308/tokenize-vocabularies-small.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text 
listing-description delink">TL;DR: Universal Logit Distillation compresses knowledge from large language models for wider applicability.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='306' data-categories='prompt-engineering' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071800' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Creating_a_Fine_Grained_Entity_Type_Taxonomy_Using_LLMs/2024-02-19-Creating_a_Fine_Grained_Entity_Type_Taxonomy_Using_LLMs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12557v1/extracted/5402905/assets/init_prompt.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Creating a Fine Grained Entity Type Taxonomy Using LLMs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">GPT-4 and GPT-4 Turbo autonomously develop a detailed entity type taxonomy. 
Over 5000 nuanced types.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='307' data-categories='prompt-engineering' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071836' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/LLM_as_Prompter_Low_resource_Inductive_Reasoning_on_Arbitrary_Knowledge_Graphs/2024-02-19-LLM_as_Prompter_Low_resource_Inductive_Reasoning_on_Arbitrary_Knowledge_Graphs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11804v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge Graphs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">KG inductive reasoning with LLMs improves low-resource scenarios, outperforming previous methods in reasoning tasks.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='308' data-categories='social-sciences,hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071848' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Microstructures_and_Accuracy_of_Graph_Recall_by_Large_Language_Models/2024-02-19-Microstructures_and_Accuracy_of_Graph_Recall_by_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img 
data-src="/img/2402.11821v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Microstructures and Accuracy of Graph Recall by Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">LLMs struggle with graph recall, exhibiting biased patterns and domain dependence.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='309' data-categories='education' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/BIDER_Bridging_Knowledge_Inconsistency_for_Efficient_Retrieval_Augmented_LLMs_via_Key_Supporting_Evidence/2024-02-19-BIDER_Bridging_Knowledge_Inconsistency_for_Efficient_Retrieval_Augmented_LLMs_via_Key_Supporting_Evidence.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12174v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">BIDER refines retrieval documents into Key Supporting Evidence for improved answer quality 
in LLMs.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='310' data-categories='production,architectures,social-sciences' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071820' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/High_quality_Data_to_Text_Generation_for_Severely_Under_Resourced_Languages_with_Out_of_the_box_Large_Language_Models/2024-02-19-High_quality_Data_to_Text_Generation_for_Severely_Under_Resourced_Languages_with_Out_of_the_box_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12267v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Out-of-the-box LLMs outperform previous systems on data-to-text generation for under-resourced languages, showing potential to bridge the performance gap.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='311' data-categories='prompt-engineering,programming' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071808' data-listing-date-modified-sort='NaN'
data-listing-reading-time-sort='2'> <a href="/posts/Enhancing_Large_Language_Models_for_Text_to_Testcase_Generation/2024-02-19-Enhancing_Large_Language_Models_for_Text_to_Testcase_Generation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.11910v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Enhancing Large Language Models for Text-to-Testcase Generation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> </div> <div class="card-text listing-description delink">TL;DR: GPT-3.5 fine-tuned for text-to-testcase generation outperforms other language models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='312' data-categories='production,architectures,education' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071876' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Task_Oriented_Dialogue_with_In_Context_Learning/2024-02-19-Task_Oriented_Dialogue_with_In_Context_Learning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Task-Oriented Dialogue with In-Context Learning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return 
false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">System combines large language models with business logic for efficient task-oriented dialogue systems.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='313' data-categories='education,prompt-engineering' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Artifacts_or_Abduction_How_Do_LLMs_Answer_Multiple_Choice_Questions_Without_the_Question/2024-02-19-Artifacts_or_Abduction_How_Do_LLMs_Answer_Multiple_Choice_Questions_Without_the_Question.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12483v1/extracted/5418316/data/full_vs_artifact1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">LLMs perform well on MCQA with choices-only prompts, using group dynamics and question inference.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 
2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='314' data-categories='social-sciences,hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071832' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/LLM_Agents_for_Psychology_A_Study_on_Gamified_Assessments/2024-02-19-LLM_Agents_for_Psychology_A_Study_on_Gamified_Assessments.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12326v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">LLM Agents for Psychology: A Study on Gamified Assessments</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">PsychoGAT uses game agents for engaging and effective psychological assessment, validated through psychometric evaluations.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='315' data-categories='robustness,architectures,security,production' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071808' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Emulated_Disalignment_Safety_Alignment_for_Large_Language_Models_May_Backfire!/2024-02-19-Emulated_Disalignment_Safety_Alignment_for_Large_Language_Models_May_Backfire!.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12343v1/x1.png" 
class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Inference-time attack framework Emulated Disalignment (ED) doubles harmfulness of pre-trained language models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='316' data-categories='robustness,security' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071836' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Language_Models_are_Homer_Simpson!_Safety_Re_Alignment_of_Fine_tuned_Language_Models_through_Task_Arithmetic/2024-02-19-Language_Models_are_Homer_Simpson!_Safety_Re_Alignment_of_Fine_tuned_Language_Models_through_Task_Arithmetic.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11746v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Language Models are Homer Simpson! 
Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">RESTA method improves safety of language models through simple arithmetic addition.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='317' data-categories='education' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071872' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Small_Models_Big_Insights_Leveraging_Slim_Proxy_Models_To_Decide_When_and_What_to_Retrieve_for_LLMs/2024-02-19-Small_Models_Big_Insights_Leveraging_Slim_Proxy_Models_To_Decide_When_and_What_to_Retrieve_for_LLMs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12052v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">SlimPLM enhances large language models' knowledge acquisition, improving question-answering performance with lower computational costs.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div 
class="g-col-1" data-index='318' data-categories='education' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071840' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Learning_to_Edit_Aligning_LLMs_with_Knowledge_Editing/2024-02-19-Learning_to_Edit_Aligning_LLMs_with_Knowledge_Editing.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11905v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Learning to Edit: Aligning LLMs with Knowledge Editing</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LTE framework improves knowledge editing in large language models without compromising performance or efficiency.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='319' data-categories='programming' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071800' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/DB_LLM_Accurate_Dual_Binarization_for_Efficient_LLMs/2024-02-19-DB_LLM_Accurate_Dual_Binarization_for_Efficient_LLMs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">DB-LLM: Accurate Dual-Binarization for Efficient LLMs</h5> <div class="listing-categories"> <div class="listing-category" 
onclick="window.quartoListingCategory('programming'); return false;">programming</div> </div> <div class="card-text listing-description delink">LLMs improved with Dual-Binarization method for computational efficiency and accuracy.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='320' data-categories='prompt-engineering,hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071812' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/FIPO_Free_form_Instruction_oriented_Prompt_Optimization_with_Preference_Dataset_and_Modular_Fine_tuning_Schema/2024-02-19-FIPO_Free_form_Instruction_oriented_Prompt_Optimization_with_Preference_Dataset_and_Modular_Fine_tuning_Schema.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.11811v1/extracted/5416670/example.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">FIPO optimizes prompts for Large Language Models, improving user-bot interactions.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='321' data-categories='education' data-listing-date-sort='1708300800000' 
data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/An_Empirical_Evaluation_of_LLMs_for_Solving_Offensive_Security_Challenges/2024-02-19-An_Empirical_Evaluation_of_LLMs_for_Solving_Offensive_Security_Challenges.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11814v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">An Empirical Evaluation of LLMs for Solving Offensive Security Challenges</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LLMs effectively solve CTF challenges, outperforming human participants, with potential for cybersecurity education.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='322' data-categories='education,architectures' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071884' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Transformer_based_Causal_Language_Models_Perform_Clustering/2024-02-19-Transformer_based_Causal_Language_Models_Perform_Clustering.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12151v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Transformer-based Causal Language Models Perform Clustering</h5> <div class="listing-categories"> <div class="listing-category" 
onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">LLMs struggle to follow human instructions, but additional training improves capability through data clustering.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='323' data-categories='education,social-sciences,hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071808' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/EmoBench_Evaluating_the_Emotional_Intelligence_of_Large_Language_Models/2024-02-19-EmoBench_Evaluating_the_Emotional_Intelligence_of_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12071v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">EmoBench: Evaluating the Emotional Intelligence of Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">LLMs need better EI benchmarks. EmoBench proposes comprehensive machine EI evaluation. 
Gap found between LLMs and humans.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='324' data-categories='robustness,education' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071844' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/MARS_Meaning_Aware_Response_Scoring_for_Uncertainty_Estimation_in_Generative_LLMs/2024-02-19-MARS_Meaning_Aware_Response_Scoring_for_Uncertainty_Estimation_in_Generative_LLMs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11756v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">Generative LLMs need better accuracy estimation. 
MARS improves uncertainty estimation in LLMs.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='325' data-categories='social-sciences' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071832' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/LEMMA_Towards_LVLM_Enhanced_Multimodal_Misinformation_Detection_with_External_Knowledge_Augmentation/2024-02-19-LEMMA_Towards_LVLM_Enhanced_Multimodal_Misinformation_Detection_with_External_Knowledge_Augmentation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11943v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">LVLM improves multimodal misinformation detection, but LEMMA with external knowledge augmentation is more accurate.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='326' data-categories='production,social-sciences' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071848' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/NEO_BENCH_Evaluating_Robustness_of_Large_Language_Models_with_Neologisms/2024-02-19-NEO_BENCH_Evaluating_Robustness_of_Large_Language_Models_with_Neologisms.qmd" class="quarto-grid-link"> <div 
class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12261v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Neologisms impact LLM performance; the benchmark shows lower perplexities for models with later knowledge cutoff dates.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='327' data-categories='production' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071820' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Graph_Based_Retriever_Captures_the_Long_Tail_of_Biomedical_Knowledge/2024-02-19-Graph_Based_Retriever_Captures_the_Long_Tail_of_Biomedical_Knowledge.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12352v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">LLMs struggle with rare information in biomedical research.
Combining RAG with a knowledge graph improves retrieval.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='328' data-categories='robustness' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071820' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/HU_at_SemEval_2024_Task_8A_Can_Contrastive_Learning_Learn_Embeddings_to_Detect_Machine_Generated_Text/2024-02-19-HU_at_SemEval_2024_Task_8A_Can_Contrastive_Learning_Learn_Embeddings_to_Detect_Machine_Generated_Text.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">HU at SemEval-2024 Task 8A: Can Contrastive Learning Learn Embeddings to Detect Machine-Generated Text?</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">The proposed system detects machine-generated text using contrastive learning with a single model.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='329' data-categories='robustness,prompt-engineering,security' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/ArtPrompt_ASCII_Art_based_Jailbreak_Attacks_against_Aligned_LLMs/2024-02-19-ArtPrompt_ASCII_Art_based_Jailbreak_Attacks_against_Aligned_LLMs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p
class="card-img-top"> <img data-src="/img/2402.11753v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">LLMs vulnerable to ASCII art-based jailbreak attack, bypassing safety measures.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='330' data-categories='production,architectures' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/AnyGPT_Unified_Multimodal_LLM_with_Discrete_Sequence_Modeling/2024-02-19-AnyGPT_Unified_Multimodal_LLM_with_Discrete_Sequence_Modeling.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12226v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div 
class="card-text listing-description delink">AnyGPT is a multimodal language model that can process speech, text, images, and music.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='331' data-categories='prompt-engineering' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071876' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Tables_as_Images_Exploring_the_Strengths_and_Limitations_of_LLMs_on_Multimodal_Representations_of_Tabular_Data/2024-02-19-Tables_as_Images_Exploring_the_Strengths_and_Limitations_of_LLMs_on_Multimodal_Representations_of_Tabular_Data.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12424v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Tables as Images? 
Exploring the Strengths and Limitations of LLMs on Multimodal Representations of Tabular Data</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">Compares how LLMs handle tabular data across different prompts and representation formats, including tables rendered as images, to guide effective use.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='332' data-categories='architectures' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/All_Language_Models_Large_and_Small/2024-02-19-All_Language_Models_Large_and_Small.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12061v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">All Language Models Large and Small</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">TL;DR: LONDI framework uses large language models selectively, reducing computational costs by 30%.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='333' data-categories='production,prompt-engineering,architectures,programming' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071816' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/GTBench_Uncovering_the_Strategic_Reasoning_Limitations_of_LLMs_via_Game_Theoretic_Evaluations/2024-02-19-GTBench_Uncovering_the_Strategic_Reasoning_Limitations_of_LLMs_via_Game_Theoretic_Evaluations.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12348v1/x2.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> </div> <div class="card-text listing-description delink">LLMs' reasoning in game tasks varies; open-source LLMs less competitive than commercial ones.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='334' data-categories='robustness,education,architectures,programming,production' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071776' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/ARKS_Active_Retrieval_in_Knowledge_Soup_for_Code_Generation/2024-02-19-ARKS_Active_Retrieval_in_Knowledge_Soup_for_Code_Generation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img 
data-src="/img/2402.12317v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">ARKS: Active Retrieval in Knowledge Soup for Code Generation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">TL;DR: ARKS improves code generation by integrating diverse sources and using active retrieval strategy.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='335' data-categories='production,architectures' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071888' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Uncertainty_quantification_in_fine_tuned_LLMs_using_LoRA_ensembles/2024-02-19-Uncertainty_quantification_in_fine_tuned_LLMs_using_LoRA_ensembles.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12264v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Uncertainty quantification in fine-tuned LLMs using LoRA ensembles</h5> <div 
class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">Fine-tuning large language models improves performance, but understanding and trusting predictions is still lacking.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='336' data-categories='hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071800' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Comprehensive_Cognitive_LLM_Agent_for_Smartphone_GUI_Automation/2024-02-19-Comprehensive_Cognitive_LLM_Agent_for_Smartphone_GUI_Automation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11941v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Comprehensive Cognitive LLM Agent for Smartphone GUI Automation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">TL;DR: CoCo-Agent improves GUI automation with comprehensive perception and conditional action prediction. 
New state-of-the-art performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='337' data-categories='robustness,education,architectures,production,social-sciences' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071864' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Reformatted_Alignment/2024-02-19-Reformatted_Alignment.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12219v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Reformatted Alignment</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">TL;DR: ReAlign improves language model alignment with human values and factual accuracy.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='338' data-categories='architectures' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071848' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/Meta_Ranking_Less_Capable_Language_Models_are_Capable_for_Single_Response_Judgement/2024-02-19-Meta_Ranking_Less_Capable_Language_Models_are_Capable_for_Single_Response_Judgement.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12146v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Meta Ranking: Less Capable Language Models are Capable for Single Response Judgement</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">LLMs face reliability challenges; the proposed Meta Ranking method detects errors and enhances performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='339' data-categories='robustness,architectures,security,production,prompt-engineering' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071800' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/CovRL_Fuzzing_JavaScript_Engines_with_Coverage_Guided_Reinforcement_Learning_for_LLM_based_Mutation/2024-02-19-CovRL_Fuzzing_JavaScript_Engines_with_Coverage_Guided_Reinforcement_Learning_for_LLM_based_Mutation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12222v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">CovRL: Fuzzing JavaScript Engines with Coverage-Guided Reinforcement Learning for LLM-based Mutation</h5> <div class="listing-categories"> 
<div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">CovRL-Fuzz combines language models and reinforcement learning for improved bug-finding in JavaScript engines.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='340' data-categories='production,architectures' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071808' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Enhancing_Multilingual_Capabilities_of_Large_Language_Models_through_Self_Distillation_from_Resource_Rich_Languages/2024-02-19-Enhancing_Multilingual_Capabilities_of_Large_Language_Models_through_Self_Distillation_from_Resource_Rich_Languages.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" 
onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">TL;DR: SDRRL method improves multilingual performance of large language models. Source code available.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='341' data-categories='social-sciences' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071892' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/What_Evidence_Do_Language_Models_Find_Convincing/2024-02-19-What_Evidence_Do_Language_Models_Find_Convincing.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.11782v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">What Evidence Do Language Models Find Convincing?</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Retrieval-augmented language models struggle with ambiguous queries, relying on website relevance over stylistic features.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='342' data-categories='robustness,security' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071776' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/A_Chinese_Dataset_for_Evaluating_the_Safeguards_in_Large_Language_Models/2024-02-19-A_Chinese_Dataset_for_Evaluating_the_Safeguards_in_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">A Chinese Dataset for Evaluating the Safeguards in Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">Large language models (LLMs) pose risks, especially in Chinese, requiring safety assessment criteria.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='343' data-categories='education,prompt-engineering' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071868' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Self_AMPLIFY_Improving_Small_Language_Models_with_Self_Post_Hoc_Explanations/2024-02-19-Self_AMPLIFY_Improving_Small_Language_Models_with_Self_Post_Hoc_Explanations.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12038v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations</h5> <div class="listing-categories"> <div class="listing-category" 
onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">Self-AMPLIFY automates rationale generation for Small Language Models, improving performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='344' data-categories='production' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071812' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Explain_then_Rank_Scale_Calibration_of_Neural_Rankers_Using_Natural_Language_Explanations_from_Large_Language_Models/2024-02-19-Explain_then_Rank_Scale_Calibration_of_Neural_Rankers_Using_Natural_Language_Explanations_from_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12276v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Explain then Rank: Scale Calibration of Neural Rankers Using Natural Language Explanations from Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Scale calibration in ranking systems is crucial for mirroring real-world value and boosting effectiveness. 
Neural rankers pose challenges.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='345' data-categories='education,prompt-engineering,social-sciences,hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071880' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/The_Colorful_Future_of_LLMs_Evaluating_and_Improving_LLMs_as_Emotional_Supporters_for_Queer_Youth/2024-02-19-The_Colorful_Future_of_LLMs_Evaluating_and_Improving_LLMs_as_Emotional_Supporters_for_Queer_Youth.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11886v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer Youth</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">Queer youth rely on online resources for support, but LLMs lack empathy and personalization.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='346' data-categories='social-sciences' data-listing-date-sort='1708300800000' 
data-listing-file-modified-sort='1717413071804' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Direct_Large_Language_Model_Alignment_Through_Self_Rewarding_Contrastive_Prompt_Distillation/2024-02-19-Direct_Large_Language_Model_Alignment_Through_Self_Rewarding_Contrastive_Prompt_Distillation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Method to align large language models with human expectations using contrastive prompt pairs. 
Outperforms RLAIF.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='347' data-categories='social-sciences,hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071808' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Enhancing_Empathetic_Response_Generation_by_Augmenting_LLMs_with_Small_scale_Empathetic_Models/2024-02-19-Enhancing_Empathetic_Response_Generation_by_Augmenting_LLMs_with_Small_scale_Empathetic_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">AI needs nuanced emotional understanding; Hybrid Empathetic Framework combines large and small models for improvement.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='348' data-categories='prompt-engineering,social-sciences' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071796' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a 
href="/posts/ChatGPT_Based_Data_Augmentation_for_Improved_Parameter_Efficient_Debiasing_of_LLMs/2024-02-19-ChatGPT_Based_Data_Augmentation_for_Improved_Parameter_Efficient_Debiasing_of_LLMs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11764v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">TL;DR: ChatGPT generates synthetic data to enhance debiasing of Large Language Models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='349' data-categories='architectures,production,social-sciences,hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071892' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/WorldCoder_a_Model_Based_LLM_Agent_Building_World_Models_by_Writing_Code_and_Interacting_with_the_Environment/2024-02-19-WorldCoder_a_Model_Based_LLM_Agent_Building_World_Models_by_Writing_Code_and_Interacting_with_the_Environment.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12275v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">WorldCoder, a Model-Based LLM 
Agent: Building World Models by Writing Code and Interacting with the Environment</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">A model-based agent builds a Python program to represent its world knowledge and proves efficient in gridworlds.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='350' data-categories='production,prompt-engineering,architectures' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071828' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Is_Open_Source_There_Yet_A_Comparative_Study_on_Commercial_and_Open_Source_LLMs_in_Their_Ability_to_Label_Chest_X_Ray_Reports/2024-02-19-Is_Open_Source_There_Yet_A_Comparative_Study_on_Commercial_and_Open_Source_LLMs_in_Their_Ability_to_Label_Chest_X_Ray_Reports.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12298v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Is Open-Source There Yet? 
A Comparative Study on Commercial and Open-Source LLMs in Their Ability to Label Chest X-Ray Reports</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">TL;DR: GPT-4 outperforms open-source models in zero-shot labeling, but few-shot prompting brings them on par.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='351' data-categories='robustness,education,architectures,prompt-engineering,hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071804' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Do_Large_Language_Models_Understand_Logic_or_Just_Mimick_Context/2024-02-19-Do_Large_Language_Models_Understand_Logic_or_Just_Mimick_Context.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Do Large Language Models Understand Logic or Just Mimick Context?</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div 
class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">LLMs excel in logical reasoning due to in-context learning, but don't truly understand logical rules.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='352' data-categories='robustness,prompt-engineering' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071800' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Confidence_Matters_Revisiting_Intrinsic_Self_Correction_Capabilities_of_Large_Language_Models/2024-02-19-Confidence_Matters_Revisiting_Intrinsic_Self_Correction_Capabilities_of_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12563v1/x2.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">LLMs self-correct with confidence using IoE prompting for improved accuracy. 
Code available.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='353' data-categories='robustness,architectures,security' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Are_LLM_based_Evaluators_Confusing_NLG_Quality_Criteria/2024-02-19-Are_LLM_based_Evaluators_Confusing_NLG_Quality_Criteria.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12055v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Are LLM-based Evaluators Confusing NLG Quality Criteria?</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">LLMs perform well in NLG but confuse evaluation criteria, requiring further research and improvements.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='354' data-categories='education,social-sciences,hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Automatic_Evaluation_for_Mental_Health_Counseling_using_LLMs/2024-02-19-Automatic_Evaluation_for_Mental_Health_Counseling_using_LLMs.qmd" 
class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11958v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Automatic Evaluation for Mental Health Counseling using LLMs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">TL;DR: Automatic LLM-based evaluation offers cost-effective and dependable assessment of counseling quality.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='355' data-categories='education,prompt-engineering,security,hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071868' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/SPML_A_DSL_for_Defending_Language_Models_Against_Prompt_Attacks/2024-02-19-SPML_A_DSL_for_Defending_Language_Models_Against_Prompt_Attacks.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11755v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">SPML: A DSL for Defending Language Models Against Prompt Attacks</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div 
class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">LLMs have transformed chatbots, but these chatbots remain vulnerable to prompt attacks. SPML prevents malicious execution and outperforms other models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='356' data-categories='architectures' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071892' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/WKVQuant_Quantizing_Weight_and_KeyValue_Cache_for_Large_Language_Models_Gains_More/2024-02-19-WKVQuant_Quantizing_Weight_and_KeyValue_Cache_for_Large_Language_Models_Gains_More.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">WKVQuant optimizes LLMs' memory usage without sacrificing accuracy or efficiency.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='357' data-categories='social-sciences' data-listing-date-sort='1708300800000' 
data-listing-file-modified-sort='1717413071892' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Your_Large_Language_Model_is_Secretly_a_Fairness_Proponent_and_You_Should_Prompt_it_Like_One/2024-02-19-Your_Large_Language_Model_is_Secretly_a_Fairness_Proponent_and_You_Should_Prompt_it_Like_One.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12150v1/x2.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Your Large Language Model is Secretly a Fairness Proponent and You Should Prompt it Like One</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">LLMs need prompts to express diverse viewpoints for fairness. 
FairThinking pipeline outperforms in experiments.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='358' data-categories='robustness,architectures' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071848' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Mafin_Enhancing_Black_Box_Embeddings_with_Model_Augmented_Fine_tuning/2024-02-19-Mafin_Enhancing_Black_Box_Embeddings_with_Model_Augmented_Fine_tuning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Mafin: Enhancing Black-Box Embeddings with Model Augmented Fine-tuning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">RAG mitigates LLM hallucinations. 
Mafin enhances black-box embeddings with a trainable model, improving performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='359' data-categories='robustness,security' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071860' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/ROSE_Doesnt_Do_That_Boosting_the_Safety_of_Instruction_Tuned_Large_Language_Models_with_Reverse_Prompt_Contrastive_Decoding/2024-02-19-ROSE_Doesnt_Do_That_Boosting_the_Safety_of_Instruction_Tuned_Large_Language_Models_with_Reverse_Prompt_Contrastive_Decoding.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11889v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">The ROSE method boosts the safety of instruction-tuned large language models' outputs without additional training.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='360' data-categories='social-sciences,hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071868' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a 
href="/posts/Shall_We_Talk_Exploring_Spontaneous_Collaborations_of_Competing_LLM_Agents/2024-02-19-Shall_We_Talk_Exploring_Spontaneous_Collaborations_of_Competing_LLM_Agents.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Shall We Talk: Exploring Spontaneous Collaborations of Competing LLM Agents</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">LLM agents can form collaborations without explicit instructions, mimicking human social interactions.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='361' data-categories='robustness,prompt-engineering,architectures,security' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071820' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Groot_Adversarial_Testing_for_Generative_Text_to_Image_Models_with_Tree_based_Semantic_Transformation/2024-02-19-Groot_Adversarial_Testing_for_Generative_Text_to_Image_Models_with_Tree_based_Semantic_Transformation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12100v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Groot: Adversarial Testing for Generative Text-to-Image Models with 
Tree-based Semantic Transformation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">Groot automates testing of text-to-image models for NSFW content, outperforming existing methods with 93.66% success.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='362' data-categories='robustness' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071820' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Have_Seen_Me_Before_Automating_Dataset_Updates_Towards_Reliable_and_Timely_Evaluation/2024-02-19-Have_Seen_Me_Before_Automating_Dataset_Updates_Towards_Reliable_and_Timely_Evaluation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11894v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Have Seen Me Before? Automating Dataset Updates Towards Reliable and Timely Evaluation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">Automating dataset updates for reliable and timely evaluation of Large Language Models. 
Proposes mimicking and extending strategies.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='363' data-categories='robustness' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071816' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/GenAudit_Fixing_Factual_Errors_in_Language_Model_Outputs_with_Evidence/2024-02-19-GenAudit_Fixing_Factual_Errors_in_Language_Model_Outputs_with_Evidence.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12566v1/extracted/5418278/figures/genaudit_fig1_attempt3.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">LLMs can make dangerous errors; the GenAudit tool assists fact-checking for document-grounded tasks.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='364' data-categories='production,architectures' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071780' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/A_synthetic_data_approach_for_domain_generalization_of_NLI_models/2024-02-19-A_synthetic_data_approach_for_domain_generalization_of_NLI_models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img 
data-src="https://browse.arxiv.org/html/2402.12368v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">A synthetic data approach for domain generalization of NLI models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">Synthetic training data improves the domain generalization of NLI models, a common benchmark task for LLMs.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='365' data-categories='robustness,prompt-engineering,hci' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071872' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='1'> <a href="/posts/Structured_Chain_of_Thought_Prompting_for_Few_Shot_Generation_of_Content_Grounded_QA_Conversations/2024-02-19-Structured_Chain_of_Thought_Prompting_for_Few_Shot_Generation_of_Content_Grounded_QA_Conversations.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Structured Chain-of-Thought Prompting for Few-Shot Generation of Content-Grounded QA Conversations</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> 
<div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">SCoT approach improves question-answer conversations, increases faithfulness to grounding documents, and trains strong conversational QA agents.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='366' data-categories='education,prompt-engineering' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071804' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Distilling_Large_Language_Models_for_Text_Attributed_Graph_Learning/2024-02-19-Distilling_Large_Language_Models_for_Text_Attributed_Graph_Learning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.12022v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Distilling Large Language Models for Text-Attributed Graph Learning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TAGs are graphs of connected textual documents. 
LLMs and graph models are combined for TAG learning.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='367' data-categories='robustness,social-sciences' data-listing-date-sort='1708300800000' data-listing-file-modified-sort='1717413071888' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Understanding_Fine_grained_Distortions_in_Reports_of_Scientific_Findings/2024-02-19-Understanding_Fine_grained_Distortions_in_Reports_of_Scientific_Findings.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.12431v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Understanding Fine-grained Distortions in Reports of Scientific Findings</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Distorted science communication harms trust and behavior. 
Detecting distortions in findings is challenging.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 19, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='368' data-categories='education' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071844' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/LoRA_Flow_Dynamic_LoRA_Fusion_for_Large_Language_Models_in_Generative_Tasks/2024-02-18-LoRA_Flow_Dynamic_LoRA_Fusion_for_Large_Language_Models_in_Generative_Tasks.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.11455v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LoRA-Flow uses dynamic weights to combine LoRAs for better performance in generative tasks.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='369' data-categories='education,prompt-engineering' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071868' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='4'> <a href="/posts/SciAgent_Tool_augmented_Language_Models_for_Scientific_Reasoning/2024-02-18-SciAgent_Tool_augmented_Language_Models_for_Scientific_Reasoning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11451v1/image_1.png" 
class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">SciAgent: Tool-augmented Language Models for Scientific Reasoning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: Introducing tool-augmented scientific reasoning for Large Language Models, with impressive performance in experiments.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='370' data-categories='robustness,education,prompt-engineering,social-sciences,hci' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071780' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Aint_Misbehavin____Using_LLMs_to_Generate_Expressive_Robot_Behavior_in_Conversations_with_the_Tabletop_Robot_Haru/2024-02-18-Aint_Misbehavin____Using_LLMs_to_Generate_Expressive_Robot_Behavior_in_Conversations_with_the_Tabletop_Robot_Haru.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11571v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" 
onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">TL;DR: Social robots use large language models for dynamic, expressive conversations, with some limitations.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='371' data-categories='robustness' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Aligning_Modalities_in_Vision_Large_Language_Models_via_Preference_Fine_tuning/2024-02-18-Aligning_Modalities_in_Vision_Large_Language_Models_via_Preference_Fine_tuning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11411v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Aligning Modalities in Vision Large Language Models via Preference Fine-tuning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">VLLMs merge vision and language models, but can hallucinate. 
POVID reduces hallucinations and improves performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='372' data-categories='prompt-engineering' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/AutoPRM_Automating_Procedural_Supervision_for_Multi_Step_Reasoning_via_Controllable_Question_Decomposition/2024-02-18-AutoPRM_Automating_Procedural_Supervision_for_Multi_Step_Reasoning_via_Controllable_Question_Decomposition.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11452v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">AutoPRM improves large language models for complex reasoning tasks without extensive manual labeling.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='373' data-categories='robustness' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Benchmark_Self_Evolving_A_Multi_Agent_Framework_for_Dynamic_LLM_Evaluation/2024-02-18-Benchmark_Self_Evolving_A_Multi_Agent_Framework_for_Dynamic_LLM_Evaluation.qmd" class="quarto-grid-link"> <div 
class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">A self-evolving benchmark framework dynamically evaluates Large Language Models, revealing performance declines and wider performance gaps between models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='374' data-categories='education' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071892' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Whats_the_Plan_Evaluating_and_Developing_Planning_Aware_Techniques_for_LLMs/2024-02-18-Whats_the_Plan_Evaluating_and_Developing_Planning_Aware_Techniques_for_LLMs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11489v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">What's the Plan? Evaluating and Developing Planning-Aware Techniques for LLMs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LLMs lack planning skills on their own; a hybrid approach that pairs them with classical planning is more effective. 
Introducing SimPlan.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='375' data-categories='social-sciences' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071780' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/A_Multi_Aspect_Framework_for_Counter_Narrative_Evaluation_using_Large_Language_Models/2024-02-18-A_Multi_Aspect_Framework_for_Counter_Narrative_Evaluation_using_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11676v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Counter narratives are effective in combating hate speech; proposed evaluation framework aligns with human judgment.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='376' data-categories='robustness,education' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071892' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/When_Do_LLMs_Need_Retrieval_Augmentation_Mitigating_LLMs_Overconfidence_Helps_Retrieval_Augmentation/2024-02-18-When_Do_LLMs_Need_Retrieval_Augmentation_Mitigating_LLMs_Overconfidence_Helps_Retrieval_Augmentation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 
card-left"> <p class="card-img-top"> <img data-src="/img/2402.11457v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LLMs struggle with knowledge boundaries, but enhancing perception reduces overconfidence and improves performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='377' data-categories='education,prompt-engineering,programming' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071792' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Can_ChatGPT_Support_Developers_An_Empirical_Evaluation_of_Large_Language_Models_for_Code_Generation/2024-02-18-Can_ChatGPT_Support_Developers_An_Empirical_Evaluation_of_Large_Language_Models_for_Code_Generation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11702v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Can ChatGPT Support Developers? 
An Empirical Evaluation of Large Language Models for Code Generation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> </div> <div class="card-text listing-description delink">LLMs show promise in code generation, but current use is limited to high-level concepts and examples.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='378' data-categories='robustness' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071832' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/KMMLU_Measuring_Massive_Multitask_Language_Understanding_in_Korean/2024-02-18-KMMLU_Measuring_Massive_Multitask_Language_Understanding_in_Korean.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11548v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">KMMLU: Measuring Massive Multitask Language Understanding in Korean</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">New Korean benchmark KMMLU tests LLMs, showing need for improvement in Korean language models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" 
data-index='379' data-categories='social-sciences' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071780' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/A_Note_on_Bias_to_Complete/2024-02-18-A_Note_on_Bias_to_Complete.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">A Note on Bias to Complete</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">Minimizing social bias for better decision-making, with new bias types and strategies.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='380' data-categories='prompt-engineering,social-sciences,hci' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071804' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Decoding_News_Narratives_A_Critical_Analysis_of_Large_Language_Models_in_Framing_Bias_Detection/2024-02-18-Decoding_News_Narratives_A_Critical_Analysis_of_Large_Language_Models_in_Framing_Bias_Detection.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Decoding News Narratives: A Critical Analysis of Large Language Models in Framing Bias Detection</h5> <div class="listing-categories"> <div class="listing-category" 
onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">Study evaluates GPT-3.5 Turbo, GPT-4, and Flan-T5 in detecting framing bias in news headlines.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='381' data-categories='programming,hci' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071880' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Tool_Augmented_LLMs_as_a_Universal_Interface_for_IDEs/2024-02-18-Tool_Augmented_LLMs_as_a_Universal_Interface_for_IDEs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11635v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Tool-Augmented LLMs as a Universal Interface for IDEs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">IDEs have evolved steadily, but tool-augmented Large Language Models could reshape the IDE concept by acting as a universal interface.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='382' data-categories='hci' data-listing-date-sort='1708214400000' 
data-listing-file-modified-sort='1717413071840' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Learning_From_Failure_Integrating_Negative_Examples_when_Fine_tuning_Large_Language_Models_as_Agents/2024-02-18-Learning_From_Failure_Integrating_Negative_Examples_when_Fine_tuning_Large_Language_Models_as_Agents.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.11651v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">LLMs need better tool use; using negative examples improves model performance. 
Code and data available.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='383' data-categories='robustness,security' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071872' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Stumbling_Blocks_Stress_Testing_the_Robustness_of_Machine_Generated_Text_Detectors_Under_Attacks/2024-02-18-Stumbling_Blocks_Stress_Testing_the_Robustness_of_Machine_Generated_Text_Detectors_Under_Attacks.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11638v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">Study tests text detectors' robustness to attacks from diverse categories, finding significant vulnerabilities.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='384' data-categories='education,programming' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071840' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/Learning_to_Learn_Faster_from_Human_Feedback_with_Language_Model_Predictive_Control/2024-02-18-Learning_to_Learn_Faster_from_Human_Feedback_with_Language_Model_Predictive_Control.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11450v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Learning to Learn Faster from Human Feedback with Language Model Predictive Control</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> </div> <div class="card-text listing-description delink">LLMs improved to remember interactions and adapt efficiently, enhancing robot teachability and success rates.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='385' data-categories='prompt-engineering' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071868' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='4'> <a href="/posts/Self_seeding_and_Multi_intent_Self_instructing_LLMs_for_Generating_Intent_aware_Information_Seeking_dialogs/2024-02-18-Self_seeding_and_Multi_intent_Self_instructing_LLMs_for_Generating_Intent_aware_Information_Seeking_dialogs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11633v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Self-seeding and Multi-intent Self-instructing LLMs for 
Generating Intent-aware Information-Seeking dialogs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">LLMs used to generate intent-aware dialogs, outperforming human-generated data for intent prediction.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='386' data-categories='prompt-engineering' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071816' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/GNNavi_Navigating_the_Information_Flow_in_Large_Language_Models_by_Graph_Neural_Network/2024-02-18-GNNavi_Navigating_the_Information_Flow_in_Large_Language_Models_by_Graph_Neural_Network.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11709v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">GNNavi: Navigating the Information Flow in Large Language Models by Graph Neural Network</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: GNNavi improves prompt-based fine-tuning for large language models with minimal parameter updates.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='387' data-categories='social-sciences,security,hci' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071824' 
data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/How_Susceptible_are_Large_Language_Models_to_Ideological_Manipulation/2024-02-18-How_Susceptible_are_Large_Language_Models_to_Ideological_Manipulation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11725v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">How Susceptible are Large Language Models to Ideological Manipulation?</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">LLMs easily absorb and generalize ideological biases, raising concerns about societal impact.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='388' data-categories='education' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071828' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/InfuserKI_Enhancing_Large_Language_Models_with_Knowledge_Graphs_via_Infuser_Guided_Knowledge_Integration/2024-02-18-InfuserKI_Enhancing_Large_Language_Models_with_Knowledge_Graphs_via_Infuser_Guided_Knowledge_Integration.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11441v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body 
post-contents"> <h5 class="no-anchor card-title listing-title">InfuserKI: Enhancing Large Language Models with Knowledge Graphs via Infuser-Guided Knowledge Integration</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LLMs struggle with knowledge tasks. InfuserKI framework efficiently integrates new knowledge without forgetting.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='389' data-categories='hci' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071848' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/MatPlotAgent_Method_and_Evaluation_for_LLM_Based_Agentic_Scientific_Data_Visualization/2024-02-18-MatPlotAgent_Method_and_Evaluation_for_LLM_Based_Agentic_Scientific_Data_Visualization.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11453v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">MatPlotAgent automates scientific data visualization and improves LLM performance, evaluated on a new benchmark.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='390' data-categories='social-sciences,hci' 
data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071848' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Modelling_Political_Coalition_Negotiations_Using_LLM_based_Agents/2024-02-18-Modelling_Political_Coalition_Negotiations_Using_LLM_based_Agents.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.11712v1/extracted/5411827/figures/task.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Modelling Political Coalition Negotiations Using LLM-based Agents</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">Coalition negotiations modeled using NLP with new dataset, evaluating large language models' performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='391' data-categories='robustness,programming' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071844' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/LongAgent_Scaling_Language_Models_to_128k_Context_through_Multi_Agent_Collaboration/2024-02-18-LongAgent_Scaling_Language_Models_to_128k_Context_through_Multi_Agent_Collaboration.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11550v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 
class="no-anchor card-title listing-title">LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> </div> <div class="card-text listing-description delink">LLMs struggle with long context, but LongAgent improves long-text processing.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='392' data-categories='education,prompt-engineering,social-sciences,hci' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071872' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Shaping_Human_AI_Collaboration_Varied_Scaffolding_Levels_in_Co_writing_with_Language_Models/2024-02-18-Shaping_Human_AI_Collaboration_Varied_Scaffolding_Levels_in_Co_writing_with_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.11723v1/extracted/5416360/fig1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div 
class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">Study explores impact of AI language model scaffolding on co-writing process and productivity.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='393' data-categories='prompt-engineering,security' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071808' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Dont_Go_To_Extremes_Revealing_the_Excessive_Sensitivity_and_Calibration_Limitations_of_LLMs_in_Implicit_Hate_Speech_Detection/2024-02-18-Dont_Go_To_Extremes_Revealing_the_Excessive_Sensitivity_and_Calibration_Limitations_of_LLMs_in_Implicit_Hate_Speech_Detection.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11406v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Don't Go To Extremes: Revealing the Excessive Sensitivity and Calibration Limitations of LLMs in Implicit Hate Speech Detection</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">LLMs struggle with detecting implicit hate speech and have limited confidence calibration.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='394' data-categories='hci' 
data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071848' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Multi_dimensional_Evaluation_of_Empathetic_Dialog_Responses/2024-02-18-Multi_dimensional_Evaluation_of_Empathetic_Dialog_Responses.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Multi-dimensional Evaluation of Empathetic Dialog Responses</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">Proposed framework measures empathy in conversations, with best results from instruction-finetuned classifiers.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='395' data-categories='recommender' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071840' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Large_Language_Models_as_Data_Augmenters_for_Cold_Start_Item_Recommendation/2024-02-18-Large_Language_Models_as_Data_Augmenters_for_Cold_Start_Item_Recommendation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Large Language Models as Data Augmenters for Cold-Start Item Recommendation</h5> <div class="listing-categories"> <div class="listing-category" 
onclick="window.quartoListingCategory('recommender'); return false;">recommender</div> </div> <div class="card-text listing-description delink">LLMs improve recommendation systems by inferring user preferences for cold-start items from textual descriptions.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='396' data-categories='education' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071892' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Vision_Flan_Scaling_Human_Labeled_Tasks_in_Visual_Instruction_Tuning/2024-02-18-Vision_Flan_Scaling_Human_Labeled_Tasks_in_Visual_Instruction_Tuning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11690v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">Vision-Flan dataset improves VLMs' performance, GPT-4 data enhances human-preferred formats, and LLMs benefit from visual instruction tuning.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='397' data-categories='education,prompt-engineering' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071800' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/Counter_intuitive_Large_Language_Models_Can_Better_Understand_Knowledge_Graphs_Than_We_Thought/2024-02-18-Counter_intuitive_Large_Language_Models_Can_Better_Understand_Knowledge_Graphs_Than_We_Thought.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.11541v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Counter-intuitive: Large Language Models Can Better Understand Knowledge Graphs Than We Thought</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">When knowledge graphs are used to enhance language models' comprehension, the models handle even messy KG knowledge effectively.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='398' data-categories='prompt-engineering,programming' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071872' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Solving_Data_centric_Tasks_using_Large_Language_Models/2024-02-18-Solving_Data_centric_Tasks_using_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.11734v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Solving Data-centric Tasks using Large Language Models</h5> <div 
class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> </div> <div class="card-text listing-description delink">LLMs are replacing help forums for non-professional programmers; a cluster-then-select technique improves performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='399' data-categories='prompt-engineering' data-listing-date-sort='1708214400000' data-listing-file-modified-sort='1717413071852' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/One_Prompt_To_Rule_Them_All_LLMs_for_Opinion_Summary_Evaluation/2024-02-18-One_Prompt_To_Rule_Them_All_LLMs_for_Opinion_Summary_Evaluation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.11683v1/extracted/5416190/images/comparison.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">New dataset and prompts improve opinion summary evaluation, outperforming previous methods.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='400' data-categories='education' data-listing-date-sort='1708214400000' 
data-listing-file-modified-sort='1717413071856' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Perils_of_Self_Feedback_Self_Bias_Amplifies_in_Large_Language_Models/2024-02-18-Perils_of_Self_Feedback_Self_Bias_Amplifies_in_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.11436v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Perils of Self-Feedback: Self-Bias Amplifies in Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">Self-feedback improves performance on some tasks but worsens it on others, due to large language models' self-bias.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 18, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='401' data-categories='robustness' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071888' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Uncertainty_Decomposition_and_Quantification_for_In_Context_Learning_of_Large_Language_Models/2024-02-15-Uncertainty_Decomposition_and_Quantification_for_In_Context_Learning_of_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10189v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Uncertainty Decomposition and Quantification for In-Context Learning of Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" 
onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">A new method decomposes and quantifies the uncertainty in LLMs' in-context learning.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='402' data-categories='architectures,production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071860' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/QUICK_Quantization_aware_Interleaving_and_Conflict_free_Kernel_for_efficient_LLM_inference/2024-02-15-QUICK_Quantization_aware_Interleaving_and_Conflict_free_Kernel_for_efficient_LLM_inference.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10076v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">QUICK optimizes CUDA kernels for faster inference of quantized large language models, 
achieving up to 1.91x speedup.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='403' data-categories='security,robustness' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071852' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='1'> <a href="/posts/PAL_Proxy_Guided_Black_Box_Attack_on_Large_Language_Models/2024-02-15-PAL_Proxy_Guided_Black_Box_Attack_on_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.09674v1/extracted/5409801/figures/banner.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">PAL: Proxy-Guided Black-Box Attack on Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">LLMs remain vulnerable to attacks that elicit harmful content; the Proxy-Guided Attack achieves a high success rate and improves safety testing. 
Code: https://github.com/chawins/pal.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='404' data-categories='architectures,education,production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071868' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Selective_Reflection_Tuning_Student_Selected_Data_Recycling_for_LLM_Instruction_Tuning/2024-02-15-Selective_Reflection_Tuning_Student_Selected_Data_Recycling_for_LLM_Instruction_Tuning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.10110v1/extracted/5411213/Figures/reflection_main.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Selective Reflection-Tuning improves LLM finetuning without new data, achieving superior performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='405' data-categories='security,hci,robustness' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071780' data-listing-date-modified-sort='NaN' 
data-listing-reading-time-sort='2'> <a href="/posts/AbuseGPT_Abuse_of_Generative_AI_ChatBots_to_Create_Smishing_Campaigns/2024-02-15-AbuseGPT_Abuse_of_Generative_AI_ChatBots_to_Create_Smishing_Campaigns.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09728v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">AbuseGPT: Abuse of Generative AI ChatBots to Create Smishing Campaigns</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">AI chatbots can be exploited to create smishing texts, posing a cybersecurity threat.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='406' data-categories='recommender' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071836' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/LLM_based_Federated_Recommendation/2024-02-15-LLM_based_Federated_Recommendation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">LLM-based Federated Recommendation</h5> <div class="listing-categories"> <div class="listing-category" 
onclick="window.quartoListingCategory('recommender'); return false;">recommender</div> </div> <div class="card-text listing-description delink">LLMs enhance recommendation systems, but pose privacy risks. PPLR framework balances performance and preserves privacy.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='407' data-categories='education,prompt-engineering' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071832' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/LAPDoc_Layout_Aware_Prompting_for_Documents/2024-02-15-LAPDoc_Layout_Aware_Prompting_for_Documents.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09841v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">LAPDoc: Layout-Aware Prompting for Documents</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">Prompting LLMs with layout-enriched text improves document understanding by 15%.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='408' data-categories='architectures,production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071852' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/OpenMathInstruct_1_A_1.8_Million_Math_Instruction_Tuning_Dataset/2024-02-15-OpenMathInstruct_1_A_1.8_Million_Math_Instruction_Tuning_Dataset.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Synthetic datasets improve math instruction tuning for large language models. OpenMathInstruct-1 dataset and model released.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='409' data-categories='robustness,prompt-engineering' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071800' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Crafting_a_Good_Prompt_or_Providing_Exemplary_Dialogues_A_Study_of_In_Context_Learning_for_Persona_based_Dialogue_Generation/2024-02-15-Crafting_a_Good_Prompt_or_Providing_Exemplary_Dialogues_A_Study_of_In_Context_Learning_for_Persona_based_Dialogue_Generation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Crafting a Good Prompt or Providing Exemplary Dialogues? 
A Study of In-Context Learning for Persona-based Dialogue Generation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">ICL improves dialogue generation; prompt adjustments and diverse demos are key.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='410' data-categories='hci,architectures,social-sciences,production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071792' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Both_Matter_Enhancing_the_Emotional_Intelligence_of_Large_Language_Models_without_Compromising_the_General_Intelligence/2024-02-15-Both_Matter_Enhancing_the_Emotional_Intelligence_of_Large_Language_Models_without_Compromising_the_General_Intelligence.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10073v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" 
onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Emotional Intelligence (EI) is crucial for AI assistants; MoEI enhances EI without compromising general intelligence.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='411' data-categories='social-sciences,prompt-engineering' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071860' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/RS_DPO_A_Hybrid_Rejection_Sampling_and_Direct_Preference_Optimization_Method_for_Alignment_of_Large_Language_Models/2024-02-15-RS_DPO_A_Hybrid_Rejection_Sampling_and_Direct_Preference_Optimization_Method_for_Alignment_of_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.10038v1/extracted/5409495/RLHF_flowchart.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">RLHF with PPO can be unstable, and DPO relies on contrastive responses; RS-DPO combines rejection sampling with DPO for improved alignment.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div
class="g-col-1" data-index='412' data-categories='education' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071796' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Case_Study_Testing_Model_Capabilities_in_Some_Reasoning_Tasks/2024-02-15-Case_Study_Testing_Model_Capabilities_in_Some_Reasoning_Tasks.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Case Study: Testing Model Capabilities in Some Reasoning Tasks</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LLMs excel in personalized content but need improvement in reasoning abilities for complex scenarios.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='413' data-categories='production,prompt-engineering,architectures,robustness,security' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071884' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Towards_Safer_Large_Language_Models_through_Machine_Unlearning/2024-02-15-Towards_Safer_Large_Language_Models_through_Machine_Unlearning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10058v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Towards Safer Large Language Models through Machine Unlearning</h5> <div 
class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">TL;DR: Selective Knowledge negation Unlearning (SKU) removes harmful knowledge while preserving model utility.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='414' data-categories='production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071884' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Towards_Reducing_Diagnostic_Errors_with_Interpretable_Risk_Prediction/2024-02-15-Towards_Reducing_Diagnostic_Errors_with_Interpretable_Risk_Prediction.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10109v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Towards Reducing Diagnostic Errors with Interpretable Risk Prediction</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Method uses LLMs to identify evidence in EHRs, reduce diagnostic errors, and mitigate 
delays.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='415' data-categories='architectures,social-sciences,production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071864' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='4'> <a href="/posts/Rethinking_Information_Structures_in_RLHF_Reward_Generalization_from_a_Graph_Theory_Perspective/2024-02-15-Rethinking_Information_Structures_in_RLHF_Reward_Generalization_from_a_Graph_Theory_Perspective.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10184v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">RLHF faces a trilemma; the authors propose a tree-based reward model for better performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='416' data-categories='robustness' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071804' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a
href="/posts/Do_LLMs_Know_about_Hallucination_An_Empirical_Investigation_of_LLMs_Hidden_States/2024-02-15-Do_LLMs_Know_about_Hallucination_An_Empirical_Investigation_of_LLMs_Hidden_States.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09733v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">LLMs react differently to genuine versus fabricated responses, with potential to mitigate hallucination.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='417' data-categories='hci,education,social-sciences,prompt-engineering' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071816' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/GPT_4s_assessment_of_its_performance_in_a_USMLE_based_case_study/2024-02-15-GPT_4s_assessment_of_its_performance_in_a_USMLE_based_case_study.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.09654v1/extracted/5407792/Pictures/FeedBack.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">GPT-4's assessment of its performance in a USMLE-based case study</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return 
false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">Study evaluates GPT-4's confidence in healthcare questions with and without feedback, offering insights for AI reliability in healthcare.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='418' data-categories='hci,programming' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071812' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Exploring_the_Potential_of_Large_Language_Models_in_Artistic_Creation_Collaboration_and_Reflection_on_Creative_Programming/2024-02-15-Exploring_the_Potential_of_Large_Language_Models_in_Artistic_Creation_Collaboration_and_Reflection_on_Creative_Programming.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.09750v1/extracted/5380332/Figures/FourCircles.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Exploring the Potential of Large Language Models in Artistic Creation: Collaboration and Reflection on Creative Programming</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> </div> <div class="card-text 
listing-description delink">Studies LLMs in artist-AI collaboration for creative coding, covering reflection types, user performance, and design suggestions.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='419' data-categories='robustness,programming,prompt-engineering' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071796' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/CodeMind_A_Framework_to_Challenge_Large_Language_Models_for_Code_Reasoning/2024-02-15-CodeMind_A_Framework_to_Challenge_Large_Language_Models_for_Code_Reasoning.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.09664v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">CodeMind: A Framework to Challenge Large Language Models for Code Reasoning</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">CodeMind evaluates LLMs' code reasoning abilities, finding fair understanding of simple programs but a drop in performance on complex ones.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='420' data-categories='architectures,social-sciences,production' data-listing-date-sort='1707955200000'
data-listing-file-modified-sort='1717413071864' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Rewards_in_Context_Multi_objective_Alignment_of_Foundation_Models_with_Dynamic_Preference_Adjustment/2024-02-15-Rewards_in_Context_Multi_objective_Alignment_of_Foundation_Models_with_Dynamic_Preference_Adjustment.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10207v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">TL;DR: RiC simplifies and adapts foundation model alignment to human preferences, outperforming RL.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='421' data-categories='education' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071852' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/NutePrune_Efficient_Progressive_Pruning_with_Numerous_Teachers_for_Large_Language_Models/2024-02-15-NutePrune_Efficient_Progressive_Pruning_with_Numerous_Teachers_for_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img 
data-src="https://browse.arxiv.org/html/2402.09773v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">NutePrune: Efficient Progressive Pruning with Numerous Teachers for Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">Structured pruning compresses Large Language Models for efficient deployment on resource-constrained hardware. NutePrune method enhances performance.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='422' data-categories='hci,architectures' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071876' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/TDAG_A_Multi_Agent_Framework_based_on_Dynamic_Task_Decomposition_and_Agent_Generation/2024-02-15-TDAG_A_Multi_Agent_Framework_based_on_Dynamic_Task_Decomposition_and_Agent_Generation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10178v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">TL;DR: Proposed multi-agent framework enhances adaptability 
in real-world tasks, outperforming established baselines.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='423' data-categories='education,prompt-engineering,production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071796' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Chain_of_Thought_Reasoning_Without_Prompting/2024-02-15-Chain_of_Thought_Reasoning_Without_Prompting.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10200v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Chain-of-Thought Reasoning Without Prompting</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Novel approach uses top-k decoding to elicit reasoning paths in LLMs without prompting.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='424' data-categories='prompt-engineering' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/Beyond_Imitation_Generating_Human_Mobility_from_Context_aware_Reasoning_with_Large_Language_Models/2024-02-15-Beyond_Imitation_Generating_Human_Mobility_from_Context_aware_Reasoning_with_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09836v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Beyond Imitation: Generating Human Mobility from Context-aware Reasoning with Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: MobiGeaR uses reasoning to generate mobility data efficiently and accurately, improving downstream applications.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='425' data-categories='production,education,hci,architectures,social-sciences' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071816' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='7'> <a href="/posts/Fine_tuning_Large_Language_Model_(LLM)_Artificial_Intelligence_Chatbots_in_Ophthalmology_and_LLM_based_evaluation_using_GPT_4/2024-02-15-Fine_tuning_Large_Language_Model_(LLM)_Artificial_Intelligence_Chatbots_in_Ophthalmology_and_LLM_based_evaluation_using_GPT_4.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10083v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Fine-tuning Large Language Model 
(LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">GPT-4 evaluation aligns with clinicians, identifying clinical inaccuracies in LLM-generated responses.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='426' data-categories='architectures,production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071876' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/TOAD_Task_Oriented_Automatic_Dialogs_with_Diverse_Response_Styles/2024-02-15-TOAD_Task_Oriented_Automatic_Dialogs_with_Diverse_Response_Styles.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" 
onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">TL;DR: New TOAD dataset for virtual assistants, simulates app context, challenges response styles.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='427' data-categories='education,prompt-engineering' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071784' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Answer_is_All_You_Need_Instruction_following_Text_Embedding_via_Answering_the_Question/2024-02-15-Answer_is_All_You_Need_Instruction_following_Text_Embedding_via_Answering_the_Question.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.09642v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Answer is All You Need: Instruction-following Text Embedding via Answering the Question</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: New text embedder encodes user instructions for improved representation and interpretability.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='428' data-categories='architectures,robustness,production' data-listing-date-sort='1707955200000' 
data-listing-file-modified-sort='1717413071792' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/BitDelta_Your_Fine_Tune_May_Only_Be_Worth_One_Bit/2024-02-15-BitDelta_Your_Fine_Tune_May_Only_Be_Worth_One_Bit.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10193v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">BitDelta: Your Fine-Tune May Only Be Worth One Bit</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">LLMs are trained in two phases (pre-training, then fine-tuning); BitDelta quantizes the weight delta introduced by fine-tuning to 1 bit, reducing GPU memory requirements.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='429' data-categories='architectures,programming,production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071852' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/OptiMUS_Scalable_Optimization_Modeling_with_(MI)LP_Solvers_and_Large_Language_Models/2024-02-15-OptiMUS_Scalable_Optimization_Modeling_with_(MI)LP_Solvers_and_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10172v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body
post-contents"> <h5 class="no-anchor card-title listing-title">OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">OptiMUS uses an LLM to solve optimization problems stated in natural language, outperforming existing methods.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='430' data-categories='education,social-sciences' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071776' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/AI_Hospital_Interactive_Evaluation_and_Collaboration_of_LLMs_as_Intern_Doctors_for_Clinical_Diagnosis/2024-02-15-AI_Hospital_Interactive_Evaluation_and_Collaboration_of_LLMs_as_Intern_Doctors_for_Clinical_Diagnosis.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09742v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div>
</div> <div class="card-text listing-description delink">AI Hospital uses LLMs for interactive diagnosis, with dispute resolution improving accuracy.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='431' data-categories='education' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071824' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/How_to_Train_Data_Efficient_LLMs/2024-02-15-How_to_Train_Data_Efficient_LLMs.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.09668v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">How to Train Data-Efficient LLMs</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">TL;DR: Study on data-efficient pre-training of large language models using Ask-LLM and Density sampling.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='432' data-categories='hci,architectures,production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071832' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Knowledge_Infused_LLM_Powered_Conversational_Health_Agent_A_Case_Study_for_Diabetes_Patients/2024-02-15-Knowledge_Infused_LLM_Powered_Conversational_Health_Agent_A_Case_Study_for_Diabetes_Patients.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img 
data-src="https://browse.arxiv.org/html/2402.10153v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">TL;DR: A knowledge-infused, LLM-powered conversational health agent (CHA) outperforms GPT-4 in diabetes management.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='433' data-categories='architectures' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071888' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Unmemorization_in_Large_Language_Models_via_Self_Distillation_and_Deliberate_Imagination/2024-02-15-Unmemorization_in_Large_Language_Models_via_Self_Distillation_and_Deliberate_Imagination.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.10052v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text 
listing-description delink">Novel 'deliberate imagination' approach unlearns sensitive data while preserving LLM capabilities.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='434' data-categories='social-sciences' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071888' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Unlocking_Structure_Measuring_Introducing_PDD_an_Automatic_Metric_for_Positional_Discourse_Coherence/2024-02-15-Unlocking_Structure_Measuring_Introducing_PDD_an_Automatic_Metric_for_Positional_Discourse_Coherence.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">New metric measures discourse coherence in long-form text, outperforms existing methods.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='435' data-categories='architectures,production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071868' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='1'> <a href="/posts/Self_Play_Fine_Tuning_of_Diffusion_Models_for_Text_to_Image_Generation/2024-02-15-Self_Play_Fine_Tuning_of_Diffusion_Models_for_Text_to_Image_Generation.qmd" 
class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10210v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">Fine-tuning Diffusion Models with SPIN-Diffusion improves performance and alignment with less data.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='436' data-categories='robustness' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071808' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Enhancing_Large_Language_Models_with_Pseudo__and_Multisource__Knowledge_Graphs_for_Open_ended_Question_Answering/2024-02-15-Enhancing_Large_Language_Models_with_Pseudo__and_Multisource__Knowledge_Graphs_for_Open_ended_Question_Answering.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.09911v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Enhancing Large Language Models with Pseudo- and Multisource- Knowledge Graphs for Open-ended Question Answering</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return 
false;">robustness</div> </div> <div class="card-text listing-description delink">Framework combines Pseudo-Graph Generation and Atomic Knowledge Verification to enhance Large Language Models.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='437' data-categories='prompt-engineering' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071776' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/A_Human_Inspired_Reading_Agent_with_Gist_Memory_of_Very_Long_Contexts/2024-02-15-A_Human_Inspired_Reading_Agent_with_Gist_Memory_of_Very_Long_Contexts.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09727v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">ReadAgent extends LLM context length by 20x, outperforming baselines on reading comprehension tasks.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='438' data-categories='prompt-engineering' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Best_Arm_Identification_for_Prompt_Learning_under_a_Limited_Budget/2024-02-15-Best_Arm_Identification_for_Prompt_Learning_under_a_Limited_Budget.qmd" 
class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.09723v1/extracted/5409979/figures/procedure.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Best Arm Identification for Prompt Learning under a Limited Budget</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">Framing LLM prompt learning under a limited budget as best-arm identification improves performance over previous methods.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='439' data-categories='social-sciences' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071780' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='4'> <a href="/posts/Aligning_Crowd_Feedback_via_Distributional_Preference_Reward_Modeling/2024-02-15-Aligning_Crowd_Feedback_via_Distributional_Preference_Reward_Modeling.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09764v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Aligning Crowd Feedback via Distributional Preference Reward Modeling</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> </div> <div class="card-text listing-description delink">TL;DR: DPRM aligns large language models with diverse human preferences using a beta distribution and an optimal 
transport-based loss.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='440' data-categories='hci,education,production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071820' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='1'> <a href="/posts/GeoEval_Benchmark_for_Evaluating_LLMs_and_Multi_Modal_Models_on_Geometry_Problem_Solving/2024-02-15-GeoEval_Benchmark_for_Evaluating_LLMs_and_Multi_Modal_Models_on_Geometry_Problem_Solving.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10104v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">GeoEval benchmarks LLMs and multi-modal models (MMs) on geometry problems: the WizardMath model excels, and rephrasing with GPT-series models proves effective.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='441' data-categories='security,architectures,production' data-listing-date-sort='1707955200000' data-listing-file-modified-sort='1717413071780' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='1'> <a 
href="/posts/A_Trembling_House_of_Cards_Mapping_Adversarial_Attacks_against_Language_Agents/2024-02-15-A_Trembling_House_of_Cards_Mapping_Adversarial_Attacks_against_Language_Agents.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.10196v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">LLMs have great potential but pose safety risks. 
Mapping adversarial attacks is urgent.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 15, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='442' data-categories='security,prompt-engineering,robustness,architectures,education' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071844' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Leveraging_the_Context_through_Multi_Round_Interactions_for_Jailbreaking_Attacks/2024-02-14-Leveraging_the_Context_through_Multi_Round_Interactions_for_Jailbreaking_Attacks.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09177v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LLMs vulnerable to Contextual Interaction Attack using prior context to extract harmful information.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='443' 
data-categories='recommender,robustness,prompt-engineering' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071840' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Large_Language_Model_with_Graph_Convolution_for_Recommendation/2024-02-14-Large_Language_Model_with_Graph_Convolution_for_Recommendation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.08859v1/x2.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Large Language Model with Graph Convolution for Recommendation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('recommender'); return false;">recommender</div> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">Text descriptions drive user/item profiling; LLMs improve their quality and capture high-order relations in the user-item graph.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='444' data-categories='robustness' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='1'> <a href="/posts/Automated_Unit_Test_Improvement_using_Large_Language_Models_at_Meta/2024-02-14-Automated_Unit_Test_Improvement_using_Large_Language_Models_at_Meta.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img 
data-src="/img/2402.09171v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Automated Unit Test Improvement using Large Language Models at Meta</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> </div> <div class="card-text listing-description delink">Meta's TestGen-LLM tool improves human-written tests, with high success rates in deployment.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='445' data-categories='security' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Attacking_Large_Language_Models_with_Projected_Gradient_Descent/2024-02-14-Attacking_Large_Language_Models_with_Projected_Gradient_Descent.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.09154v1/x1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Attacking Large Language Models with Projected Gradient Descent</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">LLM alignment methods are easily broken by adversarial prompts; a projected gradient descent (PGD) attack finds such prompts faster and more effectively.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='446' data-categories='security' 
data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071820' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/GrounDial_Human_norm_Grounded_Safe_Dialog_Response_Generation/2024-02-14-GrounDial_Human_norm_Grounded_Safe_Dialog_Response_Generation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">GrounDial: Human-norm Grounded Safe Dialog Response Generation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> </div> <div class="card-text listing-description delink">Conversational AI GrounDial generates safe responses without additional tuning or data.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='447' data-categories='architectures,production,education' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071844' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/LlaSMol_Advancing_Large_Language_Models_for_Chemistry_with_a_Large_Scale_Comprehensive_High_Quality_Instruction_Tuning_Dataset/2024-02-14-LlaSMol_Advancing_Large_Language_Models_for_Chemistry_with_a_Large_Scale_Comprehensive_High_Quality_Instruction_Tuning_Dataset.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/bayesian-beagle.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">LlaSMol: Advancing Large Language Models for Chemistry with a 
Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LLMs tuned on the SMolInstruct dataset outperform GPT-4 on chemistry tasks; Mistral is the recommended base model.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='448' data-categories='robustness,production' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071820' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/HGOT_Hierarchical_Graph_of_Thoughts_for_Retrieval_Augmented_In_Context_Learning_in_Factuality_Evaluation/2024-02-14-HGOT_Hierarchical_Graph_of_Thoughts_for_Retrieval_Augmented_In_Context_Learning_in_Factuality_Evaluation.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09390v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> </div> <div class="card-text listing-description delink">HGOT improves retrieval in LLMs, 
enhancing factuality by 7%.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='449' data-categories='social-sciences,robustness,production,prompt-engineering' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071844' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Leveraging_Large_Language_Models_for_Enhanced_NLP_Task_Performance_through_Knowledge_Distillation_and_Optimized_Training_Strategies/2024-02-14-Leveraging_Large_Language_Models_for_Enhanced_NLP_Task_Performance_through_Knowledge_Distillation_and_Optimized_Training_Strategies.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09282v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Leveraging Large Language Models for Enhanced NLP Task Performance through Knowledge Distillation and Optimized Training Strategies</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> </div> <div class="card-text listing-description delink">TL;DR: GPT-4 integration improves BERT model for NER tasks, outperforming human annotations.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div 
class="g-col-1" data-index='450' data-categories='security,programming,prompt-engineering,robustness,architectures' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071860' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/Rapid_Adoption_Hidden_Risks_The_Dual_Impact_of_Large_Language_Model_Customization/2024-02-14-Rapid_Adoption_Hidden_Risks_The_Dual_Impact_of_Large_Language_Model_Customization.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09179v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Rapid Adoption, Hidden Risks: The Dual Impact of Large Language Model Customization</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('programming'); return false;">programming</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">Customized LLMs like GPTs vulnerable to instruction backdoor attacks, requiring defense mechanisms.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='451' data-categories='robustness,education' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071828' data-listing-date-modified-sort='NaN' 
data-listing-reading-time-sort='2'> <a href="/posts/Into_the_Unknown_Self_Learning_Large_Language_Models/2024-02-14-Into_the_Unknown_Self_Learning_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09147v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Into the Unknown: Self-Learning Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">Self-learning LLM framework uses hallucination score to identify knowledge gaps for efficient learning.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='452' data-categories='architectures,social-sciences,hci,education' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071812' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='1'> <a href="/posts/Evaluating_the_Experience_of_LGBTQ+_People_Using_Large_Language_Model_Based_Chatbots_for_Mental_Health_Support/2024-02-14-Evaluating_the_Experience_of_LGBTQ+_People_Using_Large_Language_Model_Based_Chatbots_for_Mental_Health_Support.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09260v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Evaluating the Experience of LGBTQ+ People Using Large Language Model Based Chatbots for 
Mental Health Support</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LGBTQ+ individuals rely on chatbots for mental health support, but the chatbots struggle to address their specific challenges.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='453' data-categories='social-sciences,prompt-engineering,education' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071868' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Scaling_the_Authoring_of_AutoTutors_with_Large_Language_Models/2024-02-14-Scaling_the_Authoring_of_AutoTutors_with_Large_Language_Models.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.09216v1/x24.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Scaling the Authoring of AutoTutors with Large Language Models</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('social-sciences'); return false;">social-sciences</div> <div class="listing-category" onclick="window.quartoListingCategory('prompt-engineering'); return false;">prompt-engineering</div> <div class="listing-category" 
onclick="window.quartoListingCategory('education'); return false;">education</div> </div> <div class="card-text listing-description delink">LLMs used in Intelligent Tutoring Systems with guardrails for better learning results.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='454' data-categories='hci' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071812' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='3'> <a href="/posts/Exploring_Neuron_Interactions_and_Emergence_in_LLMs_From_the_Multifractal_Analysis_Perspective/2024-02-14-Exploring_Neuron_Interactions_and_Emergence_in_LLMs_From_the_Multifractal_Analysis_Perspective.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="/img/2402.09099v1/image_1.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Exploring Neuron Interactions and Emergence in LLMs: From the Multifractal Analysis Perspective</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> </div> <div class="card-text listing-description delink">Research explores neuron interactions in large language models, introducing concepts of self-organization and multifractal analysis.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='455' data-categories='security,hci,production,robustness,architectures' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071788' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a 
href="/posts/Attacks_Defenses_and_Evaluations_for_LLM_Conversation_Safety_A_Survey/2024-02-14-Attacks_Defenses_and_Evaluations_for_LLM_Conversation_Safety_A_Survey.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.09283v1/extracted/5408740/figures/defense_overview.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey</h5> <div class="listing-categories"> <div class="listing-category" onclick="window.quartoListingCategory('security'); return false;">security</div> <div class="listing-category" onclick="window.quartoListingCategory('hci'); return false;">hci</div> <div class="listing-category" onclick="window.quartoListingCategory('production'); return false;">production</div> <div class="listing-category" onclick="window.quartoListingCategory('robustness'); return false;">robustness</div> <div class="listing-category" onclick="window.quartoListingCategory('architectures'); return false;">architectures</div> </div> <div class="card-text listing-description delink">TL;DR: Survey covers LLM conversation safety studies on attacks, defenses, and evaluations. 
Encourages further investigation.</div> <div class="card-attribution card-text-small end"> <div class="listing-date">`Feb 14, 2024`{=html}</div> </div> </div> </div></a></div> <div class="g-col-1" data-index='456' data-categories='architectures,production' data-listing-date-sort='1707868800000' data-listing-file-modified-sort='1717413071820' data-listing-date-modified-sort='NaN' data-listing-reading-time-sort='2'> <a href="/posts/HiRE_High_Recall_Approximate_Top_\)k\(_Estimation_for_Efficient_LLM_Inference/2024-02-14-HiRE_High_Recall_Approximate_Top_\)k\(_Estimation_for_Efficient_LLM_Inference.qmd" class="quarto-grid-link"> <div class="quarto-grid-item card h-100 card-left"> <p class="card-img-top"> <img data-src="https://browse.arxiv.org/html/2402.09360v1/extracted/5409158/figures/herd.png" class="thumbnail-image card-img" style="height: 150px;" > </p> <div class="card-body post-contents"> <h5 class="no-anchor card-title listing-title">HiRE: High Recall Approximate Top-\)k$ Estimation for Efficient LLM Inference
architectures
production
Autoregressive decoding with LLMs on accelerators can improve latency using the HiRE compression scheme.
Feb 14, 2024

Play Guessing Game with LLM: Indirect Jailbreak Attack with Implicit Clues
security
robustness
TL;DR: Puzzler is an indirect jailbreak attack approach with a high success rate.
Feb 14, 2024

MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
education
prompt-engineering
Mustard framework generates high-quality theorem and proof data for language model training.
Feb 14, 2024

Reinforcement Learning from Human Feedback with Active Queries
architectures
production
TL;DR: Proposed query-efficient RLHF method reduces human-labelled preference data needed for large language models.
Feb 14, 2024

Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code
security
programming
production
robustness
architectures
Code auditing for Large Language Models (LLMs) is challenging due to potential copyright infringement. TraWiC offers a solution.
Feb 14, 2024

DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning
programming
education
DolphCoder improves code generation with diverse instructions and self-evaluation, outperforming baselines on benchmarks.
Feb 14, 2024

Tree-Based Hard Attention with Self-Motivation for Large Language Models
prompt-engineering
Large language models struggle with hierarchical text structures, but TEAROOM improves task-specific property estimation.
Feb 14, 2024

AgentLens: Visual Analysis for Agent Behaviors in LLM-based Autonomous Systems
social-sciences
hci
TL;DR: Visualization approach for analyzing behavior and evolution of Large Language Model based Autonomous systems.
Feb 14, 2024

Role-Playing Simulation Games using ChatGPT
education
hci
prompt-engineering
COVID-19 led to digital transformation in education. Large Language Models enhance teaching quality.
Feb 14, 2024

Large Language Model Interaction Simulator for Cold-Start Item Recommendation
architectures
recommender
hci
LLM-InS simulates user behavior for cold items, improving recommendation performance.
Feb 14, 2024

Exploring the Adversarial Capabilities of Large Language Models
security
robustness
programming
LLMs can create adversarial examples to undermine hate speech detection systems, posing challenges for safety measures.
Feb 14, 2024

ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
architectures
production
TL;DR: ICDPO improves LLM content alignment without fine-tuning, outperforming baselines and competing with SFT + LoRA.
Feb 14, 2024

(Ir)rationality and Cognitive Biases in Large Language Models
social-sciences
robustness
prompt-engineering
LLMs display irrationality in reasoning tasks that differs from human irrationality, and their responses are inconsistent.
Feb 14, 2024

Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
architectures
robustness
production
New approach improves factual accuracy in large language models without human annotations.
Feb 14, 2024

Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Sampling
architectures
POGER improves AIGT detection in black-box settings with efficient word generation probability estimation.
Feb 14, 2024

Developing a Framework for Auditing Large Language Models Using Human-in-the-Loop
architectures
robustness
production
education
Auditing LLMs for bias and inconsistencies using automatic and scalable probes with human-in-the-loop.
Feb 14, 2024

Rationality Report Cards: Assessing the Economic Rationality of Large Language Models
hci
prompt-engineering
LLMs used as decision-making agents need a methodology for assessing their economic rationality; this paper proposes one.
Feb 14, 2024

AQA-Bench: An Interactive Benchmark for Evaluating LLMs’ Sequential Reasoning Ability
architectures
production
prompt-engineering
AQA-Bench assesses language models’ sequential reasoning in algorithmic contexts, revealing performance variations. Code available.
Feb 14, 2024

Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models
social-sciences
prompt-engineering
LLMs perform well on reasoning benchmarks, but lack humanlike abstract reasoning abilities.
Feb 14, 2024

Rethinking Large Language Model Architectures for Sequential Recommendations
recommender
LLM-based Lite-LLM4Rec improves sequential recommendation efficiency and performance by 46.8%.
Feb 14, 2024

Personalized Large Language Models
social-sciences
production
LLMs have advanced NLP, and personalization further improves their reasoning on subjective tasks.
Feb 14, 2024

Multi-Query Focused Disaster Summarization via Instruction-Based Prompting
prompt-engineering
CrisisFACTS advances disaster summarization using web sources, retrieval, and QA-motivated prompting. Strong results shown.
Feb 14, 2024

SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
security
robustness
TL;DR: SafeDecoding defends LLMs from jailbreak attacks, reducing harm without compromising helpfulness.
Feb 14, 2024

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
security
Adversarial robustness research focuses on open-source LLMs, proposing embedding space attacks as a threat model.
Feb 14, 2024

LLM-Enhanced User-Item Interactions: Leveraging Edge Information for Optimized Recommendations
recommender
prompt-engineering
Large language models lack efficiency in mining relationships from graph data. Proposed framework improves recommendation tasks. Code available.
Feb 14, 2024

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
architectures
production
Large language models face a KV-cache memory bottleneck; the proposed LESS method improves caching efficiency.
Feb 14, 2024

AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach
architectures
robustness
production
education
AuditLLM is a tool to probe and audit Large Language Models for consistency and reliability.
Feb 14, 2024

Premise Order Matters in Reasoning with Large Language Models
prompt-engineering
LLMs struggle with premise ordering in reasoning tasks, leading to significant performance drops. New benchmark released.
Feb 14, 2024

Copyright Traps for Large Language Models
architectures
robustness
production
Debate continues over the fair use of copyrighted content in training language models; the paper proposes copyright traps to detect such use.
Feb 14, 2024

How Secure Are Large Language Models (LLMs) for Navigation in Urban Environments?
education
prompt-engineering
Examines how secure LLMs used for navigation in urban environments are.
Feb 14, 2024

Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning
robustness
production
security
architectures
LLMs face security risks, including indirect jailbreak attacks like Pandora, which manipulates RAG to generate malicious content.
Feb 13, 2024

Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
security
architectures
production
robustness
MLLM agent can be jailbroken by adversarial images, leading to infectious jailbreak in multi-agent environments.
Feb 13, 2024

LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents
prompt-engineering
architectures
Benchmark system evaluates language-oriented task planners for home-service embodied agents, accelerating development.
Feb 13, 2024

COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
security
production
architectures
Jailbreaks on large language models studied for controllable attack generation using COLD-Attack framework.
Feb 13, 2024

Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance
robustness
production
architectures
TL;DR: MARINE framework reduces object hallucinations in LVLMs without expensive training or API access.
Feb 13, 2024

Large Language Models as Minecraft Agents
architectures
production
hci
education
TL;DR: Study evaluates LLMs as Minecraft agents, introduces clarification questions, and presents online interaction platform.
Feb 13, 2024

Prompted Contextual Vectors for Spear-Phishing Detection
prompt-engineering
production
security
Novel method detects LLM-generated spear-phishing emails with 91% accuracy using document vectorization.
Feb 13, 2024

On Limitations of the Transformer Architecture
robustness
LLMs struggle with composing functions due to domain size, impacting mathematical tasks.
Feb 13, 2024

ChatGPT vs LLaMA: Impact, Reliability, and Challenges in Stack Overflow Discussions
hci
programming
education
ChatGPT and LLaMA challenge human expertise on Stack Overflow, but don’t outperform it in some domains.
Feb 13, 2024

GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency
robustness
prompt-engineering
LLMs in writing systems frustrate users, but GhostWriter offers personalized control and empowerment.
Feb 13, 2024

Simulating Human Strategic Behavior: Comparing Single and Multi-agent LLMs
hci
social-sciences
architectures
LLMs can simulate human strategic behavior in social settings, with the multi-agent architecture proving more accurate.
Feb 13, 2024

Large Language Models for the Automated Analysis of Optimization Algorithms
production
programming
architectures
education
LLMs integrated into STNWeb for optimization algorithm visualizations, enhancing user experience.
Feb 13, 2024

BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models
production
education
architectures
Adapting black-box LLMs like GPT-4 and Gemini is challenging. BBox-Adapter improves performance and cost efficiency.
Feb 13, 2024

The Last JITAI? The Unreasonable Effectiveness of Large Language Models in Issuing Just-in-Time Adaptive Interventions: Fostering Physical Activity in a Prospective Cardiac Rehabilitation Setting
production
social-sciences
hci
architectures
LLMs improve personalized health interventions, outperforming laypersons and healthcare professionals.
Feb 13, 2024

JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models
programming
The proposed unsupervised authorship obfuscation method, JamDec, outperforms previous methods and competes with GPT-3.5.
Feb 13, 2024

Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering
production
architectures
Improving Large Language Models’ accuracy and reliability through fine-tuning and data quality filters.
Feb 13, 2024

Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks
production
education
MLLMs extended to domain-specific visual tasks using VQA-IN method, achieving high performance.
Feb 13, 2024

Knowledge Editing on Black-box Large Language Models
production
architectures
Knowledge editing aims to update the knowledge of large language models; the paper proposes a new evaluation framework for black-box LLMs.
Feb 13, 2024

Lying Blindly: Bypassing ChatGPT’s Safeguards to Generate Hard-to-Detect Disinformation Claims at Scale
production
hci
ChatGPT can create realistic disinformation about the war in Ukraine that’s hard to detect.
Feb 13, 2024

Auditing Counterfire: Evaluating Advanced Counterargument Generation with Evidence and Style
prompt-engineering
production
social-sciences
Introduces a novel dataset for counterargument generation; models show strong paraphrasing abilities, and GPT-3.5 Turbo achieves the highest argument quality.
Feb 13, 2024

Unsupervised Evaluation of Code LLMs with Round-Trip Correctness
programming
New evaluation method RTC expands LLM testing to real-world software domains without human curation.
Feb 13, 2024

Improving Black-box Robustness with In-Context Rewriting
production
architectures
LLM-TTA improves OOD robustness for NLP models without regressing ID performance.
Feb 13, 2024

Tandem Transformers for Inference Efficient LLMs
production
architectures
Tandem transformers combine small and large models for faster, accurate language generation.
Feb 13, 2024

InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment
education
InstructGraph improves LLMs for graph tasks, outperforming GPT-4 and LLaMA2 by 13-38%. Code available.
Feb 13, 2024

LLMs and the Human Condition
social-sciences
hci
education
Integrating decision-making theories for conversational AI, aiming to understand language-based AI processes.
Feb 13, 2024

PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference Alignment
prompt-engineering
education
New LLM-driven prompt optimization framework outperforms human-engineered prompts for multi-step tasks.
Feb 13, 2024

SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages
production
social-sciences
SemRel dataset explores semantic relatedness in 14 languages, aiding NLP tasks and LLM performance.
Feb 13, 2024

Human Curriculum Effects Emerge with In-Context Learning in Neural Networks
production
Human learning benefits from blocked examples when tasks have rule-like structure and from interleaved examples when they do not; neural networks using in-context learning reproduce this effect.
Feb 13, 2024

Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models
prompt-engineering
CyclePrompt uses cycle-consistency to improve LLM performance without fine-tuning or external data.
Feb 13, 2024

Rethinking Machine Unlearning for Large Language Models
robustness
Exploring machine unlearning in large language models to eliminate undesirable data influence and maintain essential knowledge.
Feb 13, 2024

Verified Multi-Step Synthesis using Large Language Models and Monte Carlo Tree Search
programming
architectures
VMCTS uses MCTS to guide LLMs to generate verified programs, improving synthesis capabilities.
Feb 13, 2024

Eliciting Big Five Personality Traits in Large Language Models: A Textual Analysis with Classifier-Driven Approach
prompt-engineering
production
social-sciences
hci
LLMs in recruitment raise ethical concerns. Study examines output variations based on input prompts.
Feb 13, 2024

Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback
architectures
New planning algorithm integrates Large Language Models into robotics, improving task success rates.
Feb 13, 2024

LLM-driven Imitation of Subrational Behavior: Illusion or Reality?
social-sciences
hci
LLMs used to model human behavior through synthetic demonstrations, replicating well-established findings.
Feb 13, 2024

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
production
architectures
social-sciences
Aya is a multilingual language model outperforming others in 101 languages.
Feb 12, 2024

Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English
social-sciences
Toisón de Oro bridges gap in Spanish financial NLP with bilingual framework and evaluation benchmark.
Feb 12, 2024

Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning
architectures
production
DP finetuning of LLMs for privacy, utility, and scalability using zeroth-order methods.
Feb 12, 2024

Food Recommendation as Language Processing (F-RLP): A Personalized and Contextual Paradigm
recommender
Challenges in food recommendation systems; F-RLP framework improves accuracy and personalization.
Feb 12, 2024

Secret Collusion Among Generative AI Agents
architectures
robustness
Large language models enable AI collusion, posing privacy and security risks. Proposed mitigation measures and model evaluation framework.
Feb 12, 2024

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
hci
OS-Copilot framework creates generalist agents for comprehensive computer tasks, outperforming previous methods.
Feb 12, 2024

Active Preference Learning for Large Language Models
prompt-engineering
architectures
TL;DR: Fine-tuning large language models with DPO active learning strategy improves performance and learning rate.
Feb 12, 2024

G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
hci
Method enables users to ask questions about textual graphs, providing relevant replies and highlights.
Feb 12, 2024

Investigating the Impact of Data Contamination of Large Language Models in Text-to-SQL Translation
robustness
programming
architectures
Data contamination affects GPT-3.5’s Text-to-SQL performance, as shown on an unfamiliar dataset.
Feb 12, 2024

Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate
prompt-engineering
robustness
hci
Fact-checking LLMs need better explanations; MADR framework improves faithfulness, credibility, and trustworthiness.
Feb 12, 2024

T-RAG: Lessons from the LLM Trenches
architectures
LLM used for question answering over private documents, with focus on data security and robustness.
Feb 12, 2024

Mercury: An Efficiency Benchmark for LLM Code Synthesis
architectures
programming
production
Mercury is a new benchmark for evaluating code efficiency of Large Language Models.
Feb 12, 2024

PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models
security
Large language models (LLMs) have limitations. Retrieval-Augmented Generation (RAG) mitigates them. PoisonedRAG attacks RAG.
Feb 12, 2024

Quantitative knowledge retrieval from large language models
production
prompt-engineering
education
Exploring LLMs for quantitative knowledge retrieval in data analysis tasks. Prompt engineering framework.
Feb 12, 2024

Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping
architectures
Multi-time bootstrapping self-alignment enhances model performance and data diversity for large language models.
Feb 12, 2024

Grounding Data Science Code Generation with Input-Output Specifications
prompt-engineering
programming
TL;DR: Gift4Code improves LLM code generation by fine-tuning with I/O specifications in data science tasks.
Feb 12, 2024

Game Agent Driven by Free-Form Text Command: Using LLM-based Code Generation and Behavior Branch
prompt-engineering
programming
Proposes text command control system for game agents using natural language commands.
Feb 12, 2024

Assessing Generalization for Subpopulation Representative Modeling via In-Context Learning
social-sciences
hci
LLM-based SRMs improve performance but benefit varies across demographics, posing challenges for practitioners and decision-makers.
Feb 12, 2024

Do Membership Inference Attacks Work on Large Language Models?
production
TL;DR: Membership inference attacks on large language models have poor performance due to dataset size and fuzzy boundaries.
Feb 12, 2024

Addressing cognitive bias in medical language models
prompt-engineering
social-sciences
architectures
LLMs in medicine susceptible to cognitive biases, GPT-4 most resilient, need for bias mitigation.
Feb 12, 2024

Large Language Models Ad Referendum: How Good Are They at Machine Translation in the Legal Domain?
architectures
production
LLMs perform well in legal translation, human evaluation important.
Feb 12, 2024

Lissard: Long and Simple Sequential Reasoning Datasets
production
architectures
Language models struggle with repetitive tasks on long sequences, as shown in Lissard benchmark.
Feb 12, 2024

Why and When LLM-Based Assistants Can Go Wrong: Investigating the Effectiveness of Prompt-Based Interactions for Software Help-Seeking
prompt-engineering
programming
education
LLMs like ChatGPT mimic human-like interactions for software guidance, but users struggle to understand and evaluate their advice.
Feb 12, 2024

Retrieval-Augmented Thought Process as Sequential Decision Making
production
LLMs face several challenges; RATP addresses them by drawing on external knowledge and treating the thought process as sequential decision making.
Feb 12, 2024

Policy Improvement using Language Feedback Models
prompt-engineering
architectures
production
LFMs identify desirable behavior for imitation learning, outperforming LLMs and improving task-completion rate.
Feb 12, 2024

The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models
production
architectures
prompt-engineering
LLMs enhance ASR accuracy in medical transcription, improving WER and semantic coherence.
Feb 12, 2024

Large Language Models are Few-shot Generators: Proposing Hybrid Prompt Algorithm To Generate Webshell Escape Samples
prompt-engineering
security
Hybrid Prompt algorithm generates high-quality webshell samples for AI-based detection.
Feb 12, 2024

Resilient Watermarking for LLM-Generated Codes
architectures
programming
security
robustness
TL;DR: ACW efficiently watermarks AI-generated code, resisting tampering and outperforming existing methods.
Feb 12, 2024

On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks
prompt-engineering
robustness
architectures
LLMs struggle with reasoning, but external verification improves performance more than self-critique.
Feb 12, 2024

Empowering Federated Learning for Massive Models with NVIDIA FLARE
production
architectures
Federated learning with NVIDIA FLARE improves AI model performance without centralized data.
Feb 12, 2024

AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy
architectures
production
LLMs improve forecasting accuracy by 23%, even with biased assistants, in cognitively demanding tasks.
Feb 12, 2024

TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection
production
architectures
security
Novel framework for trustworthy fake news detection prioritizes explainability, generalizability, and controllability of models.
Feb 12, 2024

Text-centric Alignment for Multi-Modality Learning
architectures
TAMML addresses modality mismatch in multimodal learning using Large Language Models for improved generalizability.
Feb 12, 2024

WildfireGPT: Tailored Large Language Model for Wildfire Analysis
production
LLMs struggle with domain-specific info, so WildfireGPT provides precise, current, and relevant wildfire risk insights.
Feb 12, 2024

Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs
social-sciences
education
rDPO improves Large Language Model alignment without human data, using self-critique prompting and external rewards.
Feb 12, 2024

Anchor-based Large Language Models
architectures
AnLLM uses anchor-based attention to reduce memory demand and improve inference speed for LLMs.
Feb 12, 2024

Suppressing Pink Elephants with Direct Principle Feedback
architectures
social-sciences
Methods like RLHF and Constitutional AI train language models, but controlling them at inference time is important. Using Direct Principle Feedback improves performance on…
Feb 12, 2024

GRILLBot In Practice: Lessons and Tradeoffs Deploying Large Language Models for Adaptable Conversational Task Assistants
architectures
education
Developing GRILLBot for Alexa Prize TaskBot Challenge using hybrid architecture with LLMs.
Feb 12, 2024

Utilizing Large Language Models to Detect Privacy Leaks in Mini-App Code
security
robustness
Explores using large language models to detect privacy leaks in mini-app code.
Feb 12, 2024

CyberMetric: A Benchmark Dataset for Evaluating Large Language Models Knowledge in Cybersecurity
production
security
education
LLMs outperform humans in cybersecurity, CyberMetric dataset facilitates fair comparison.
Feb 12, 2024

Large Language Models as Agents in Two-Player Games
hci
education
Defining LLM training processes as language-based games for insights and advancements.
Feb 12, 2024

Pushing The Limit of LLM Capacity for Text Classification
social-sciences
RGPT boosts text classification LLM performance, outperforming 8 PLMs and 7 LLMs by 1.36% on average.
Feb 12, 2024

Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models
social-sciences
Developed a tool to extract prognostic factors for difficult-to-treat depression from electronic health records.
Feb 12, 2024

Effort and Size Estimation in Software Projects with Large Language Model-based Intelligent Interfaces
education
LLMs benefit software design but pose challenges in estimating development efforts. New approach proposed.
Feb 11, 2024

Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models
prompt-engineering
RAG-based LLM outputs are affected by input prefixes; GGPP improves robustness.
Feb 11, 2024

Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine
prompt-engineering
education
LARL-RM uses language models to encode high-level knowledge, speeding up reinforcement learning by 30%.
Feb 11, 2024

Large-Language-Model Empowered Dose Volume Histogram Prediction for Intensity Modulated Radiotherapy
robustness
social-sciences
Deep learning model predicts DVHs, enhanced by large-language model for radiotherapy treatment planning.
Feb 11, 2024

Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example
programming
Automating code change patterns with Large Language Models improves effectiveness and acceptance rate.
Feb 11, 2024

GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks
education
LLMs and GMs combined for pre-defined and open-ended tasks in graph domain.
Feb 11, 2024

CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain
security
TL;DR: Study creates CPSDbench to evaluate LLMs in Chinese public security tasks.
Feb 11, 2024

Does ChatGPT and Whisper Make Humanoid Robots More Relatable?
social-sciences
hci
Humanoid robots struggle to communicate effectively, but integrating LLMs improves user experience.
Feb 11, 2024

How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?
prompt-engineering
social-sciences
hci
LLMs balance honesty and helpfulness, influenced by human feedback and prompting. GPT-4 Turbo mimics human responses.
Feb 11, 2024

Generalizing Conversational Dense Retrieval via LLM-Cognition Data Augmentation
robustness
hci
Conversational search improved with diverse conversation modeling using ConvAug framework.
Feb 11, 2024

A Theoretical Analysis of Nash Learning from Human Feedback under General KL-Regularized Preference
social-sciences
TL;DR: Nash Learning from Human Feedback (NLHF) explores reward-model-free learning from human preference.
Feb 11, 2024

Insights into Natural Language Database Query Errors: From Attention Misalignment to User Handling Strategies
social-sciences
hci
Advancements in ML and NLP improve NL2SQL error handling. User study evaluates error-handling mechanisms.
Feb 11, 2024

Differentially Private Training of Mixture of Experts Models
social-sciences
TL;DR: Investigates integrating Differential Privacy in training Mixture of Experts models for NLP.
Feb 11, 2024

Unified Speech-Text Pretraining for Spoken Dialog Modeling
architectures
production
hci
Proposes Unified Spoken Dialog Model (USDM) for natural-sounding spoken responses without ASR or TTS.
Feb 8, 2024

Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
social-sciences
architectures
hci
TL;DR: Social scene simulation aligns large language models with human values, outperforming other methods.
Feb 8, 2024

Real-World Robot Applications of Foundation Models: A Review
production
Foundation models like LLMs and VLMs have wide applications in robotics, with future challenges discussed.
Feb 8, 2024

On the Convergence of Zeroth-Order Federated Tuning in Large Language Models
architectures
production
TL;DR: FedMeZO integrates memory-efficient optimization with federated learning for faster convergence and reduced memory usage.
Feb 8, 2024

Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
architectures
IR-QLoRA improves accuracy of quantized LLMs with LoRA, compatible with various frameworks.
Feb 8, 2024

TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation
social-sciences
hci
TimeArena introduces temporal dynamics for better language model multitasking, highlighting human superiority.
Feb 8, 2024

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
architectures
production
Proposes the conversational web navigation problem, introduces the WEBLINX benchmark, and evaluates models for web navigation.
Feb 8, 2024

How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis
architectures
education
social-sciences
hci
production
Study explores LLM negotiation abilities using NegotiationArena, finding tactics and irrational behaviors.
Feb 8, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
production
SPHINX-X: Multimodal Large Language Model series with improved architecture and training efficiency.
Feb 8, 2024

Let Your Graph Do the Talking: Encoding Structured Data for LLMs
architectures
production
prompt-engineering
GraphToken method encodes structured data for language models, improving graph reasoning tasks by 73%.
Feb 8, 2024

Zero-Shot Chain-of-Thought Reasoning Guided by Evolutionary Algorithms in Large Language Models
prompt-engineering
Novel zero-shot CoT prompting method improves LLM performance across reasoning tasks. Code available.
Feb 8, 2024

Question Aware Vision Transformer for Multimodal Reasoning
prompt-engineering
QA-ViT improves vision-language models by embedding question awareness in the vision encoder, yielding dynamic visual features.
Feb 8, 2024

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models
prompt-engineering
LLMs generate self-explanations, but their faithfulness is questionable. Plausibility may compromise faithfulness.
Feb 8, 2024

The Impact of AI Tool on Engineering at ANZ Bank: An Empirical Study on GitHub Copilot within Corporate Environment
programming
education
architectures
hci
prompt-engineering
robustness
AI, particularly GitHub Copilot, boosts productivity and code quality in software engineering at ANZ Bank.
Feb 8, 2024

Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images
social-sciences
hci
Using a new dataset of parallel images, study finds that large vision-language models may exhibit gender and racial biases in their responses to input images.
Feb 8, 2024

Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
architectures
production
ChatSim enables editable photo-realistic 3D driving scene simulations via natural language commands.
Feb 8, 2024

In-Context Learning Can Re-learn Forbidden Tasks
security
architectures
social-sciences
production
robustness
Safety training for large language models is still vulnerable; in-context learning can undo it.
Feb 8, 2024

CIC: A framework for Culturally-aware Image Captioning
social-sciences
hci
prompt-engineering
CIC framework generates culturally-aware image captions, outperforming VLP-based methods.
Feb 8, 2024

Is it Possible to Edit Large Language Models Robustly?
education
architectures
social-sciences
production
prompt-engineering
robustness
TL;DR: Research explores model editing for language models to improve communicative AI applications.
Feb 8, 2024

Rocks Coding, Not Development–A Human-Centric, Experimental Evaluation of LLM-Supported SE Tasks
programming
architectures
education
social-sciences
hci
ChatGPT shows potential for coding tasks, but struggles with typical software development. Interaction insights provided.
Feb 8, 2024

Guiding Large Language Models with Divide-and-Conquer Program for Discerning Problem Solving
education
robustness
prompt-engineering
Foundation models like Large Language Models have many applications, and prompt design can unlock their potential; this paper guides them with a divide-and-conquer program for discerning problem solving.
Feb 8, 2024

Comprehensive Assessment of Jailbreak Attacks Against LLMs
security
architectures
hci
prompt-engineering
robustness
LLMs are vulnerable to jailbreak attacks, prompting the need for evaluation and safeguards.
Feb 8, 2024

Automated Smart Contract Summarization via LLMs
programming
prompt-engineering
Gemini-Pro-Vision outperforms MMTrans in generating smart contract code summaries from multimodal inputs.
Feb 8, 2024

Efficient Models for the Detection of Hate, Abuse and Profanity
social-sciences
robustness
LLMs trained on web data may generate hateful, abusive, or profane (HAP) content, making HAP detection crucial.
Feb 8, 2024

It’s Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
architectures
LLMs are used for error correction in ASR; UADF fuses acoustic information into the LLM, improving WER and reducing data uncertainty.
Feb 8, 2024

SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
security
robustness
SALAD-Bench evaluates LLMs as well as attack and defense methods, using diverse questions and evaluators.
Feb 8, 2024

Enhancing Zero-shot Counting via Language-guided Exemplar Learning
social-sciences
ExpressCount enhances zero-shot object counting through language-guided exemplar learning, achieving state-of-the-art performance.
Feb 8, 2024

Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
social-sciences
architectures
production
Study investigates Machine Unlearning (MU) for selective forgetting in language models, proposes evaluation metrics and annotation method.
Feb 8, 2024

PromptCrypt: Prompt Encryption for Secure Communication with Large Language Models
security
robustness
production
prompt-engineering
TL;DR: Cloud-based LLMs like ChatGPT raise privacy concerns, but PromptCrypt encrypts user inputs effectively.
Feb 8, 2024

CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion
education
production
CREMA framework efficiently integrates multiple modalities for video reasoning, outperforming strong multimodal models with fewer parameters.
Feb 8, 2024

Large Language Model Meets Graph Neural Network in Knowledge Distillation
architectures
education
production
LLMs and GNNs combined for improved node classification in Text-Attributed Graphs.
Feb 8, 2024

In-Context Principle Learning from Mistakes
prompt-engineering
LEAP improves few-shot prompting for LLMs without needing more input or examples.
Feb 8, 2024

Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking
production
hci
LLMs in conversational search increase selective exposure and bias, with opinionated LLMs exacerbating the effect.
Feb 8, 2024

Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs
production
The proposed Permute-and-Flip (PF) decoder outperforms sampling in quality and robustness for LLM decoding.
Feb 8, 2024

GPT-4 Generated Narratives of Life Events using a Structured Narrative Prompt: A Validation Study
hci
prompt-engineering
GPT-4 generated 24,000 narratives, of which 87.43% were valid; ML models were then used to classify the narratives.
Feb 8, 2024

FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs
social-sciences
robustness
production
FACT-GPT automates fact-checking by identifying related claims with high accuracy, aiding in misinformation combat.
Feb 8, 2024

Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes
architectures
education
Bonsai prunes large models using only forward passes, producing fast and accurate models on limited hardware.
Feb 8, 2024

Driving Everywhere with Large Language Model Policy Adaptation
architectures
production
LLaDA helps human and autonomous drivers adapt to new traffic rules and environments.
Feb 8, 2024

Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia
hci
security
architectures
prompt-engineering
LLMs pose safety concerns; the RIPPLE method bypasses their safety measures with a high success rate.
Feb 8, 2024

You Can REST Now: Automated Specification Inference and Black-Box Testing of RESTful APIs with Large Language Models
production
architectures
TL;DR: RESTful APIs need better documentation and testing; LLMs can help automate specification inference and black-box testing.
Feb 7, 2024

SumRec: A Framework for Recommendation using Open-Domain Dialogue
hci
recommender
SumRec is a framework for making recommendations from open-domain dialogue.
Feb 7, 2024

Multi-Patch Prediction: Adapting LLMs for Time Series Representation Learning
architectures
aLLM4TS framework adapts LLMs for time-series representation learning, outperforming traditional methods.
Feb 7, 2024

Pedagogical Alignment of Large Language Models
prompt-engineering
social-sciences
architectures
production
education
TL;DR: Pedagogically-aligned LLMs guide students with feedback, outperforming previous methods in educational settings.
Feb 7, 2024

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science
robustness
LLM agents for science pose potential risks; the paper calls for safety measures and proposes a triadic framework for mitigation.
Feb 7, 2024

TransLLaMa: LLM-based Simultaneous Translation System
programming
architectures
Decoder-only LLMs can perform simultaneous machine translation (SiMT) after fine-tuning with a special wait token. GPT-4 also shows promise.
Feb 7, 2024

Improving Cross-Domain Low-Resource Text Generation through LLM Post-Editing: A Programmer-Interpreter Approach
programming
education
Post-editing improves large language model text quality; neural programmer-interpreter enhances performance across domains.
Feb 7, 2024

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
social-sciences
MLLMs show potential for human-like discernment as judges but still face challenges and biases.
Feb 7, 2024

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models
prompt-engineering
LLMs generate self-explanations, but their faithfulness is questionable. Plausibility may compromise faithfulness.
Feb 7, 2024

L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ
production
education
architectures
Post-training quantization (PTQ) and quantization-aware training (QAT) reduce costs for Large Language Models. L4Q improves generality and accuracy.
Feb 7, 2024

De-amplifying Bias from Differential Privacy in Language Model Fine-tuning
social-sciences
Differential privacy (DP) amplifies bias in language model fine-tuning, but counterfactual data augmentation (CDA) can mitigate it.
Feb 7, 2024

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
architectures
Selecting the longest responses for instruction fine-tuning consistently outperforms sophisticated data-selection methods.
Feb 7, 2024

Direct Language Model Alignment from Online AI Feedback
architectures
Direct alignment from preferences (DAP) methods lack online feedback, but online AI feedback (OAIF) improves performance by using an LLM as the annotator.
Feb 7, 2024

RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation
recommender
New paradigm for LLM-based recommendation systems outperforms current methods with less training data.
Feb 7, 2024

Automated Smart Contract Summarization via LLMs
programming
prompt-engineering
Gemini-Pro-Vision outperforms MMTrans in generating smart contract code summaries from multimodal inputs.
Feb 7, 2024

Leveraging LLMs for Unsupervised Dense Retriever Ranking
architectures
Novel unsupervised technique uses large language models to select dense retrievers for a specific test corpus.
Feb 7, 2024

Are LLMs Ready for Real-World Materials Discovery?
education
Large Language Models (LLMs) have potential for materials science, but need improvement for practical use.
Feb 7, 2024

A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models
architectures
LLMs’ self-rationalizing capabilities are appealing, but their faithfulness to predictions is questionable. Proposed statistical framework compares LLM and Bayesian network…
Feb 7, 2024

The Effect of Sampling Temperature on Problem Solving in Large Language Models
education
Sampling temperature has no significant impact on Large Language Model performance for problem-solving tasks.
Feb 7, 2024

A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?
production
education
LLMs can accelerate Bayesian optimization in molecular space with domain-specific data.
Feb 7, 2024

SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
architectures
security
production
robustness
SALAD-Bench: a comprehensive safety benchmark for Large Language Models (LLMs) and defense methods.
Feb 7, 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
security
robustness
LLMs have safety vulnerabilities, critical regions are sparse, and more robust safety strategies are needed.
Feb 7, 2024

Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
production
architectures
Hydra heads improve speculative decoding throughput by 1.31x over Medusa and 2.71x over autoregressive decoding.
Feb 7, 2024

LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors
architectures
DVDet enhances open-vocabulary object detection with precise region-text alignment, outperforming state-of-the-art methods.
Feb 7, 2024

Can Large Language Model Agents Simulate Human Trust Behaviors?
social-sciences
hci
LLM agents can simulate human trust behaviors with high alignment and implications for various scenarios.
Feb 7, 2024

An Enhanced Prompt-Based LLM Reasoning Scheme via Knowledge Graph-Integrated Collaboration
production
education
LLMs face challenges; proposed KG-LLM collaboration improves reasoning and transparency, outperforming baselines.
Feb 7, 2024

Detecting Generated Native Ads in Conversational Search
production
security
robustness
architectures
Conversational search engines may integrate generated native ads, but LLMs can also be used to detect and block them.
Feb 7, 2024

CataractBot: An LLM-Powered Expert-in-the-Loop Chatbot for Cataract Patients
hci
education
TL;DR: CataractBot provides expert-endorsed health information, saving time and accommodating diverse literacy levels.
Feb 7, 2024

ChatScratch: An AI-Augmented System Toward Autonomous Visual Programming Learning for Children Aged 6-12
hci
education
prompt-engineering
architectures
Teaching Scratch to young children is challenging; the ChatScratch AI system supports autonomous visual programming learning for children aged 6-12.
Feb 7, 2024

InCoRo: In-Context Learning for Robotics Control with Feedback Loops
education
LLMs used to translate commands for robotic units in dynamic environments, achieving high success rates.
Feb 7, 2024

MEMORYLLM: Towards Self-Updatable Large Language Models
robustness
architectures
MEMORYLLM is a large language model with self-updatable parameters for integrating new knowledge effectively.
Feb 7, 2024

SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph
hci
LLMs improve question answering over knowledge graphs; semantic clues boost performance by 33%.
Feb 7, 2024

Reconfidencing LLMs from the Grouping Loss Perspective
architectures
production
social-sciences
robustness
Large language models like ChatGPT and LLaMA are overconfident and generate inaccurate answers. New evaluation dataset and reconfidencing proposed.
Feb 7, 2024

Chatbots in Knowledge-Intensive Contexts: Comparing Intent and LLM-Based Systems
production
social-sciences
architectures
Cognitive assistants using NLP show better user experience and performance than intent-based systems.
Feb 7, 2024

Hydragen: High-Throughput LLM Inference with Shared Prefixes
production
prompt-engineering
architectures
Hydragen improves LLM throughput by up to 32x for batches with shared prefixes, enabling efficient attention computation.
Feb 7, 2024

PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition
architectures
PaDeLLM-NER reduces latency for NER with LLMs, improving speed without sacrificing quality.
Feb 7, 2024

Position Paper: Against Spurious Sparks - Dovelating Inflated AI Claims
social-sciences
hci
TL;DR: Humans attribute human-like qualities to AI, caution needed in interpreting AI research.
Feb 7, 2024

InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory
architectures
LLMs struggle with long sequences; InfLLM adds training-free memory units to process them better.
Feb 7, 2024

TinyLLM: Learning a Small Student from Multiple Large Language Models
education
prompt-engineering
TL;DR: TinyLLM uses knowledge distillation to teach small language models reasoning skills from large ones.
Feb 7, 2024

Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Human Feedback
production
hci
architectures
New multimodal query suggestion system improves search results by 18%.
Feb 7, 2024

The Future of Cognitive Strategy-enhanced Persuasive Dialogue Agents: New Perspectives and Trends
social-sciences
hci
prompt-engineering
Persuasion in dialogue systems using cognitive psychology for human-like interaction.
Feb 7, 2024

Automatic Robotic Development through Collaborative Framework by Large Language Models
architectures
education
programming
TL;DR: Automated collaboration framework using LLMs for complex robot development without specialized knowledge.
Feb 6, 2024

Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models
social-sciences
Traditional deep learning struggles to explain stock predictions. The proposed SEP framework improves LLM performance autonomously through self-reflection.
Feb 6, 2024

Empowering Language Models with Active Inquiry for Deeper Understanding
hci
architectures
prompt-engineering
education
LaMAI improves LLM responses with active inquiry, outperforming other frameworks.
Feb 6, 2024

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science
architectures
robustness
security
LLM agents for science pose potential risks; the paper calls for safety measures and proposes a triadic framework for mitigation.
Feb 6, 2024

Can Generative Agents Predict Emotion?
hci
social-sciences
TL;DR: Study examines whether generative agents can predict emotion.
Feb 6, 2024

Explaining Autonomy: Enhancing Human-Robot Interaction through Explanation Generation with Large Language Models
production
System generates explanations for autonomous robot actions using Large Language Models (LLMs). Evaluated in navigation test.
Feb 6, 2024

The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs
robustness
MLLMs struggle with inconsistent image-text pairs, leading to hallucination. CorrelationQA benchmark assesses this.
Feb 6, 2024

ReLU² Wins: Discovering Efficient Activation Functions for Sparse LLMs
architectures
Study explores sparse computation for Large Language Models in low-resource scenarios, including with non-ReLU activation functions; ReLU² is the most efficient.
Feb 6, 2024

Self-Discover: Large Language Models Self-Compose Reasoning Structures
prompt-engineering
Self-Discover framework improves LLMs’ performance on complex reasoning problems, outperforming other methods. Universally applicable.
Feb 6, 2024

Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
prompt-engineering
Fine-tuning large language models for stable material generation with high reliability and flexibility.
Feb 6, 2024

Large Language Models As MOOCs Graders
architectures
social-sciences
education
prompt-engineering
Study explores using large language models to replace peer grading in MOOCs, showing promising results.
Feb 6, 2024

The World of Generative AI: Deepfakes and Large Language Models
robustness
GenAI like deepfakes and LLMs pose risks and ethical concerns for society.
Feb 6, 2024

Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy
hci
architectures
production
robustness
prompt-engineering
security
XR devices are becoming more common, and embedding large language models in them can improve inclusivity and engagement.
Feb 6, 2024

Systematic Biases in LLM Simulations of Debates
hci
architectures
production
social-sciences
education
LLMs struggle to simulate human behavior, especially in political debates, due to inherent biases.
Feb 6, 2024

Iterative Prompt Refinement for Radiation Oncology Symptom Extraction Using Teacher-Student Large Language Models
production
prompt-engineering
education
Novel teacher-student model improves prostate cancer symptom extraction from clinical notes using Large Language Models.
Feb 6, 2024

Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context
programming
LLMs struggle with repository-level code completion, but IDECoder leverages IDEs for improvement.
Feb 6, 2024

Scaling Laws for Downstream Task Performance of Large Language Models
architectures
production
Study examines scaling laws for the downstream machine translation performance of language models in transfer learning.
Feb 6, 2024

Can Large Language Models Detect Rumors on Social Media?
production
TL;DR: Proposed LeRuD approach improves rumor detection using LLMs on social media.
Feb 6, 2024

Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought
architectures
production
LLMs trained on large text datasets are impacted differently by static and dynamic noise.
Feb 6, 2024

Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
architectures
Applying RLAIF to video-text alignment in video large multimodal models (VLMMs) outperforms previous approaches.
Feb 6, 2024

Professional Agents – Evolving Large Language Models into Autonomous Experts with Human-Level Competencies
hci
education
Large language models (LLMs) like ChatGPT, PaLM, and GPT-4 enable Professional Agents (PAgents) for advanced AI applications.
Feb 6, 2024

In-context learning agents are asymmetric belief updaters
architectures
production
social-sciences
LLMs learn asymmetrically from outcomes, influenced by problem framing.
Feb 6, 2024

Training Language Models to Generate Text with Citations via Fine-grained Rewards
robustness
LLMs need in-text citations for credibility. Proposed training framework improves citation generation. Outperforms GPT-3.5-turbo.
Feb 6, 2024

Large Language Models to Enhance Bayesian Optimization
architectures
production
education
TL;DR: LLAMBO integrates large language models to improve Bayesian optimization for hyperparameter tuning.
Feb 6, 2024

Assured LLM-Based Software Engineering
robustness
Assured LLMSE uses semantic filters to improve code with Large Language Models independently.
Feb 6, 2024

RevOrder: A Novel Method for Enhanced Arithmetic in Language Models
architectures
RevOrder improves arithmetic in large language models, reducing complexity and boosting performance.
Feb 6, 2024

Chatbot Meets Pipeline: Augment Large Language Model with Definite Finite Automaton
hci
DFA-LLM enhances LLMs for regulated responses in conversations, validated as effective.
Feb 6, 2024

Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models
prompt-engineering
education
LLMs struggle with geometric reasoning, but a new framework enhances their abilities.
Feb 6, 2024

Position Paper: Against Spurious Sparks - Dovelating Inflated AI Claims
hci
social-sciences
Humans attribute human-like qualities to objects and AI, caution needed in interpreting AI research.
Feb 6, 2024

Batch Universal Prediction
production
TL;DR: Large language models are good at generating human-like sentences, evaluated using batch regret.
Feb 6, 2024

Hierarchical Large Language Models in Cloud Edge End Architecture for Heterogeneous Robot Cluster Control
programming
TL;DR: Innovative architecture uses large language models to enhance multi-agent strategy generation and motion control.
Feb 6, 2024

Measuring Implicit Bias in Explicitly Unbiased Large Language Models
hci
social-sciences
prompt-engineering
LLMs can have implicit biases, measured by IAT and Decision Bias tests. Bias found in 6 LLMs.
Feb 6, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
architectures
production
security
HarmBench evaluates red teaming methods for language models, enhancing robustness and defense development.
Feb 6, 2024

The Use of a Large Language Model for Cyberbullying Detection
hci
production
social-sciences
architectures
Social media fuels cyberbullying, threatening mental and physical health. RoBERTa model outperforms others in detection.
Feb 6, 2024

Discovery of the Hidden World with Large Language Models
production
COAT uses large language models to discover causal factors from unstructured data.
Feb 6, 2024

RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents
hci
social-sciences
LLMs are used for decision-making; the RAP framework leverages past experiences and excels in both text-only and multimodal environments.
Feb 6, 2024

Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning
prompt-engineering
recommender
OPPU improves large language model personalization, outperforming existing methods across diverse tasks.
Feb 6, 2024

INSIDE: LLMs’ Internal States Retain the Power of Hallucination Detection
architectures
robustness
LLMs’ internal states are used for hallucination detection via the EigenScore metric; test-time feature clipping is also explored.
Feb 6, 2024

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
architectures
production
NLP research focuses on LLMs, but data contamination and evaluation issues are concerning.
Feb 6, 2024

Minds versus Machines: Rethinking Entailment Verification with Language Models
social-sciences
Humans and Large Language Models differ in inference judgments. Flan-T5 model outperforms GPT-3.5 and rivals GPT-4.
Feb 6, 2024

Multi-line AI-assisted Code Authoring
architectures
production
programming
CodeCompose evolved to provide multi-line suggestions, overcoming challenges and improving usability for developers.
Feb 6, 2024

MolTC: Towards Molecular Relational Modeling In Language Models
architectures
MolTC framework improves molecular interaction prediction using large language models and graphical information.
Feb 6, 2024

Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning
prompt-engineering
education
New method improves Large Language Models’ reasoning power by 27-31% in factual and math tasks.
Feb 6, 2024

JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching
production
JobSkape framework generates comprehensive synthetic data for skill-to-taxonomy matching, outperforming baselines.
Feb 5, 2024

Nevermind: Instruction Override and Moderation in Large Language Models
production
architectures
security
LLMs perform best in following instructions, but struggle with overrides and safety guidelines.
Feb 5, 2024

Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases
education
prompt-engineering
robustness
architectures
production
Automatic prompt engineering for Large Language Models using a new calibration process. Outperforms state-of-the-art methods. Modular and adaptable. Code available.
Feb 5, 2024

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
production
architectures
Video-LaVIT scales multimodal LLMs to video by decoupling visual and motion tokenization for unified pre-training.
Feb 5, 2024

A Framework for Partially Observed Reward-States in RLHF
production
architectures
hci
Existing RLHF studies rarely consider human internal states; PORRL models aim to address this.
Feb 5, 2024

Homograph Attacks on Maghreb Sentiment Analyzers
production
social-sciences
Homograph attacks decrease Arabic sentiment analysis accuracy, highlighting weaknesses in language models.
Feb 5, 2024

Constrained Decoding for Cross-lingual Label Projection
production
architectures
Zero-shot cross-lingual transfer improved by constrained decoding for label projection. Versatile and high-performing.
Feb 5, 2024

LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models
social-sciences
hci
education
Study explores impact of language interaction on persona-conditioned LLM agents, highlighting need for robust personas.
Feb 5, 2024

Conversation Reconstruction Attack Against GPT Models
security
production
architectures
programming
Advancements in GPT models pose privacy risks in multi-round conversations, requiring attention.
Feb 5, 2024

C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
robustness
production
architectures
C-RAG certifies generation risks for RAG models, showing with theoretical guarantees and empirical evidence that retrieval reduces these risks.
Feb 5, 2024

Large Language Model Distilling Medication Recommendation Model
prompt-engineering
recommender
social-sciences
LLMs improve medication recommendation, addressing semantic nuances and computational costs.
Feb 5, 2024

GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
robustness
production
architectures
security
TL;DR: Novel role-playing system generates and tests jailbreaks to improve safety of language models.
Feb 5, 2024

Make Every Move Count: LLM-based High-Quality RTL Code Generation Using MCTS
prompt-engineering
production
architectures
programming
New algorithm improves LLM code generation, addressing PPA-unawareness and achieving 31.8% area-delay product improvement.
Feb 5, 2024

CIDAR: Culturally Relevant Instruction Dataset For Arabic
social-sciences
architectures
education
TL;DR: CIDAR is an open Arabic instruction-tuning dataset culturally-aligned by human reviewers.
Feb 5, 2024

Empowering Time Series Analysis with Large Language Models: A Survey
production
architectures
Survey of LLMs for time series analysis, covering challenges, methods, applications, and future research opportunities.
Feb 5, 2024

SWAG: Storytelling With Action Guidance
hci
SWAG improves long-form story generation using two-model feedback loop, outperforming previous techniques.
Feb 5, 2024

Detecting Scams Using Large Language Models
robustness
production
architectures
security
LLMs used to detect scams in cybersecurity, with focus on phishing and fraud. Preliminary evaluation shows effectiveness.
Feb 5, 2024

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
robustness
KV cache size limits LLM efficiency, but KIVI algorithm reduces memory usage and increases throughput.
Feb 5, 2024

Chain-of-Feedback: Mitigating the Effects of Inconsistency in Responses
prompt-engineering
education
LLMs struggle with knowledge-based questions, leading to inconsistent and unreliable responses. Recursive feedback may improve accuracy.
Feb 5, 2024

Beyond Text: Improving LLM’s Decision Making for Robot Navigation via Vocal Cues
hci
social-sciences
security
Text-based LLMs struggle in human-robot interaction, but integrating audio features improves performance by 70.26%.
Feb 5, 2024

Large Language Models are Geographically Biased
social-sciences
education
LLMs carry biases from training data, leading to geographic biases and systemic errors.
Feb 5, 2024

Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills
prompt-engineering
production
hci
architectures
Proposing Skill Set Optimization (SSO) to improve LLM actor performance in interactive environments.
Feb 5, 2024

Neural networks for abstraction and reasoning: Towards broad generalization in machines
education
Study of neural networks for abstraction and reasoning, aiming toward broad generalization in machines.
Feb 5, 2024

Shortened LLaMA: A Simple Depth Pruning for Large Language Models
production
Pruning reduces large language model size for faster inference on memory-constrained devices.
Feb 5, 2024

Psychological Assessments with Large Language Models: A Privacy-Focused and Cost-Effective Approach
social-sciences
prompt-engineering
TL;DR: Study uses LLMs to analyze Reddit comments for suicidal risk assessment, prioritizing privacy and cost-effectiveness.
Feb 5, 2024

The Matrix: A Bayesian learning model for LLMs
production
architectures
Bayesian learning model for Large Language Models (LLMs) behavior and optimization metric.
Feb 5, 2024

Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
production
Algorithm UoT improves large language models by actively seeking information, achieving 57.8% performance improvement.
Feb 5, 2024

RACER: An LLM-powered Methodology for Scalable Analysis of Semi-structured Mental Health Interviews
social-sciences
hci
RACER automates analysis of healthcare interviews, achieving high agreement with human evaluators.
Feb 5, 2024

Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
prompt-engineering
LLMs struggle with asynchronous planning, but PLaG technique improves performance.
Feb 5, 2024

UniMem: Towards a Unified View of Long-Context Large Language Models
production
architectures
UniMem unifies long-context methods for large language models, improving performance in handling long contexts.
Feb 5, 2024

EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models
prompt-engineering
production
architectures
education
Instruction tuning for Large Language Models is crucial. EasyInstruct framework facilitates research and development.
Feb 5, 2024

UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing
programming
TL;DR: UniTSyn dataset enhances LLMs for unit test synthesis, improving test generation accuracy and code coverage.
Feb 4, 2024

Discovering More Effective Tensor Network Structure Search Algorithms via Large Language Models (LLMs)
prompt-engineering
GPTN-SS uses large language models to develop effective tensor network structure search algorithms.
Feb 4, 2024

Enhance Reasoning for Large Language Models in the Game Werewolf
prompt-engineering
hci
A framework integrates LLMs with a Thinker module for enhanced reasoning, demonstrated in the game Werewolf.
Feb 4, 2024

Jailbreaking Attack against Multimodal Large Language Model
social-sciences
security
TL;DR: Paper explores jailbreaking attacks on multimodal language models and proposes an algorithm for image prompts.
Feb 4, 2024

Large Language Model Adaptation for Networking
hci
NetLLM adapts large language models for networking tasks, outperforming state-of-the-art algorithms.
Feb 4, 2024

LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
education
TL;DR: New MLLM LHRS-Bot understands remote sensing images and performs nuanced reasoning.
Feb 4, 2024

DefInt: A Default-interventionist Framework for Efficient Reasoning with Hybrid Large Language Models
prompt-engineering
Large language models face reasoning challenges. Default-Interventionist framework improves accuracy and reduces token cost.
Feb 4, 2024

Are Large Language Models Table-based Fact-Checkers?
prompt-engineering
education
LLMs show potential for table-based fact verification with prompt engineering and instruction tuning.
Feb 4, 2024

Factuality of Large Language Models in the Year 2024
programming
LLMs provide quick answers, but these are often incorrect. Research focuses on improving factuality.
Feb 4, 2024

DeLLMa: A Framework for Decision Making Under Uncertainty with Large Language Models
prompt-engineering
LLMs struggle with decision-making under uncertainty, but the DeLLMa framework improves accuracy by 40%.
Feb 4, 2024

KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion
prompt-engineering
KICGPT integrates a language model and a triple-based KGC retriever for efficient knowledge graph completion.
Feb 4, 2024

GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
robustness
prompt-engineering
education
TL;DR: GeReA uses an MLLM for knowledge-based VQA, outperforming previous methods with 66.5% and 63.3% accuracy.
Feb 4, 2024

LLM-Enhanced Data Management
prompt-engineering
education
ML techniques for data management have limitations; LLMDB addresses challenges for improved performance.
Feb 4, 2024

PuzzleBench: Can LLMs Solve Challenging First-Order Combinatorial Reasoning Problems?
prompt-engineering
education
LLMs struggle with complex reasoning, but Puzzle-LM combines them with solvers for improvement.
Feb 4, 2024

GLaPE: Gold Label-agnostic Prompt Evaluation and Optimization for Large Language Model
prompt-engineering
LLMs’ task performance relies on prompt design; GLaPE proposes label-agnostic prompt evaluation.
Feb 4, 2024

Solution-oriented Agent-based Models Generation with Verifier-assisted Iterative In-context Learning
education
programming
Agent-based models (ABMs) are complex to build, but the SAGE framework uses large language models (LLMs) to generate them effectively.
Feb 4, 2024

Evaluating Large Language Models in Analysing Classroom Dialogue
prompt-engineering
social-sciences
hci
education
Study examines GPT-4’s use in analyzing classroom dialogue, finding time savings and high consistency.
Feb 4, 2024

Comparative Study of Large Language Model Architectures on Frontier
architectures
production
TL;DR: Comparative study of GPT-NeoX and LLaMA for materials science, achieving state-of-the-art performance.
Feb 1, 2024

Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?
architectures
production
Smaller LLMs like FLAN-T5 are cost-efficient for real-world industrial deployment.
Feb 1, 2024

Prompt-Time Symbolic Knowledge Capture with Large Language Models
architectures
prompt-engineering
Utilizing large language models for prompt-driven knowledge capture, focusing on prompt-to-triple generation.
Feb 1, 2024

Dense Reward for Free in Reinforcement Learning from Human Feedback
architectures
production
Redistributing the RLHF reward across tokens based on attention weights improves LLM training and leads to better outcomes.
Feb 1, 2024

What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection
architectures
robustness
security
social-sciences
LLMs improve bot detection but also pose risks, with potential to evade detection.
Feb 1, 2024

Investigating Bias Representations in Llama 2 Chat via Activation Steering
architectures
hci
social-sciences
Addressing societal bias in LLMs, using activation steering to mitigate gender bias. Bias persists post-RLHF.
Feb 1, 2024

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
architectures
production
Instruction tuning needs high-quality data. Superfiltering uses a smaller model to filter the data, improving efficiency.
Feb 1, 2024

EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
architectures
production
EE-Tuning makes training early-exit large language models more efficient and accessible.
Feb 1, 2024

Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement
prompt-engineering
TL;DR: Neuro-symbolic Logic-Explainer improves ethical NLI explanations, enhancing logical validity and alignment of LLMs.
Feb 1, 2024

Efficient Non-Parametric Uncertainty Quantification for Black-Box Large Language Models and Decision Planning
robustness
TL;DR: Paper addresses uncertainty in language models for cost-efficient AI agent development.
Feb 1, 2024

Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks
architectures
robustness
security
production
LVLMs are vulnerable to typographic attacks; the paper introduces a new benchmark and finds self-generated attacks more effective.
Feb 1, 2024

Actor Identification in Discourse: A Challenge for LLMs?
production
robustness
hci
architectures
social-sciences
Identifying political actors in public debate is challenging. LLMs struggle, but a hybrid model improves performance.
Feb 1, 2024

Towards scalable robotic intervention of children with Autism Spectrum Disorder using LLMs
education
hci
social-sciences
TL;DR: Social robot uses language model to teach perspective-taking to children with ASD. GPT-2 + BART pipeline is effective.
Feb 1, 2024

An Exam-based Evaluation Approach Beyond Traditional Relevance Judgments
prompt-engineering
education
Proposes IR evaluation based on whether retrieved text answers key exam questions rather than on relevance judgments, with a new evaluation metric.
Feb 1, 2024

Occasionally Secure: A Comparative Analysis of Code Generation Assistants
production
programming
robustness
security
hci
architectures
education
TL;DR: Study evaluates LLMs for secure code generation in real-world scenarios.
Feb 1, 2024

Intent Assurance using LLMs guided by Intent Drift
architectures
production
Intent-based networking (IBN) aligns network operations with business objectives, but faces challenges in processing and assuring intents.
Feb 1, 2024

Large Language Models Based Fuzzing Techniques: A Survey
programming
robustness
security
prompt-engineering
architectures
education
Survey of fuzzing techniques that use Large Language Models for software security and vulnerability analysis.
Feb 1, 2024

Unlearnable Algorithms for In-context Learning
architectures
production
TL;DR: Efficient unlearning for large language models using in-context learning and few-shot training examples.
Feb 1, 2024

SA-MDKIF: A Scalable and Adaptable Medical Domain Knowledge Injection Framework for Large Language Models
architectures
education
TL;DR: SA-MDKIF injects medical knowledge into LLMs, improving performance by 10-20% in medical tasks.
Feb 1, 2024

Can Large Language Models Understand Context?
architectures
production
LLMs show impressive language understanding, but struggle with nuanced context. Pre-trained models outperform quantized ones. Code available.
Feb 1, 2024

Don’t Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
architectures
education
Study identifies and addresses knowledge gaps in large language models, improving accuracy.
Feb 1, 2024

Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing
robustness
production
prompt-engineering
LLMs have reasoning flaws, but a new framework improves planning-based reasoning. Outperforms GPT-3.5-Turbo.
Feb 1, 2024

LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law
architectures
production
Pretrained LLMs can accurately forecast dynamical systems without fine-tuning.
Feb 1, 2024

Health-LLM: Personalized Retrieval-Augmented Disease Prediction Model
production
social-sciences
AI in healthcare needs more detailed methods. The Health-LLM framework improves disease prediction and health management.
Feb 1, 2024

Hidding the Ghostwriters: An Adversarial Evaluation of AI-Generated Student Essay Detection
robustness
security
education
TL;DR: Large language models pose risks in education due to easy evasion of detection methods.
Feb 1, 2024

Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective
social-sciences
Computational experiments and LLM-based Agents enhance each other, with potential for future research.
Feb 1, 2024

SymbolicAI: A framework for logic-based approaches combining generative models and solvers
architectures
production
SymbolicAI framework integrates generative models with diverse solvers, enabling explainable computational graphs. VERTEX score evaluates LLMs.
Feb 1, 2024

Does DetectGPT Fully Utilize Perturbation? Selective Perturbation on Model-Based Contrastive Learning Detector would be Better
hci
DetectGPT improves text detection, but introduces noise. Pecola outperforms SOTA method in accuracy.
Feb 1, 2024

From PARIS to LE-PARIS: Toward Patent Response Automation with Recommender Systems and Collaborative Large Language Models
recommender
PARIS and LE-PARIS improve patent attorney efficiency and performance in handling Office Actions.
Feb 1, 2024

Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
architectures
robustness
production
Advancements in LLMs for AI planning, Formal-LLM framework improves plan validity by 50%. Open-sourced.
Feb 1, 2024

Supporting Anticipatory Governance using LLMs: Evaluating and Aligning Large Language Models with the News Media to Anticipate the Negative Impacts of AI
production
architectures
social-sciences
LLMs used to anticipate AI impacts may have biases, but aligning them with diverse data can help.
Jan 31, 2024

WSC+: Enhancing The Winograd Schema Challenge Using Tree-of-Experts
production
prompt-engineering
architectures
ToE method improves WSC question generation, revealing LLM biases and overconfidence. GPT-4 accuracy 68.7%.
Jan 31, 2024

Generative AI to Generate Test Data Generators
prompt-engineering
security
robustness
AI can effectively generate realistic test data across different domains and languages.
Jan 31, 2024

Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain
hci
architectures
social-sciences
Advancements in AI show parallels between large language models and human neural processing.
Jan 31, 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
production
architectures
KVQuant improves quantization of cached KV activations, achieving better performance with lower precision.
Jan 31, 2024

LoRec: Large Language Model for Robust Sequential Recommendation against Poisoning Attacks
recommender
security
architectures
production
TL;DR: LLM4Dec detects unknown fraudsters in recommender systems, LoRec integrates LLMs to defend against poisoning attacks.
Jan 31, 2024

Towards Efficient and Reliable LLM Serving: A Real-World Workload Study
architectures
TL;DR: Industry faces challenges with the high cost and reliability of serving large language models; a new dataset and benchmark suite are developed.
Jan 31, 2024

EEG-GPT: Exploring Capabilities of Large Language Models for EEG Classification and Interpretation
social-sciences
EEG-GPT unifies EEG classification using large language models, achieving high performance with minimal data.
Jan 31, 2024

[Lions: 1] and [Tigers: 2] and [Bears: 3], Oh My! Literary Coreference Annotation with LLMs
production
Seq2seq systems solve coreference challenges in literary text with markdown-like annotations.
Jan 31, 2024

Global-Liar: Factuality of LLMs over Time and Geographic Regions
production
architectures
social-sciences
AI-driven solutions like GPT models need factual accuracy and fairness, especially for global equity.
Jan 31, 2024

Enhancing Large Language Model with Decomposed Reasoning for Emotion Cause Pair Extraction
production
hci
architectures
robustness
social-sciences
TL;DR: DECC framework improves emotion-cause pair extraction using large language models without additional training.
Jan 31, 2024

Multipath parsing in the brain
prompt-engineering
architectures
social-sciences
Humans process sentences incrementally, resolving syntactic ambiguities word-by-word, with evidence for multipath parsing.
Jan 31, 2024

Prompt-Driven LLM Safeguarding via Directed Representation Optimization
production
architectures
robustness
prompt-engineering
security
Safety prompts don’t significantly improve large language model safety; DRO method optimizes them effectively.
Jan 31, 2024

Paramanu: A Family of Novel Efficient Indic Generative Foundation Language Models
production
architectures
Gyan AI Paramanu: efficient, powerful language models for 10 Indian languages, outperforming larger models.
Jan 31, 2024

Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning
production
prompt-engineering
architectures
Advancements in reasoning for Large Language Models using Deductive Beam Search to reduce errors.
Jan 31, 2024

Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
hci
architectures
education
prompt-engineering
social-sciences
LLMs model human biases in text comprehension and solution planning, but not in solution execution.
Jan 31, 2024

SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models
robustness
architectures
LLMs are used for real-time strategy tasks in StarCraft II. SwarmBrain achieves victories against the built-in computer players.
Jan 31, 2024

Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis
architectures
education
MLLMs excel in vision-language but struggle with depth perception. Proximity QA framework improves this. New dataset available.
Jan 31, 2024

ChIRAAG: ChatGPT Informed Rapid and Automated Assertion Generation
robustness
TL;DR: An LLM-based pipeline generates SystemVerilog Assertions (SVA) from natural language; initial outputs show a 43% error rate, and iterative prompting improves accuracy.
Jan 31, 2024

Evaluating the Effectiveness of GPT-4 Turbo in Creating Defeaters for Assurance Cases
production
Assurance cases (ACs) verify non-functional requirements; GPT-4 Turbo can automate identifying defeaters in Eliminative Argumentation (EA) notation.
Jan 31, 2024

Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM
robustness
prompt-engineering
education
TL;DR: SymPrompt improves large language model test generation for complex software units.
Jan 31, 2024

Probing Language Models’ Gesture Understanding for Enhanced Human-AI Interaction
programming
prompt-engineering
social-sciences
hci
Proposal to study Large Language Models’ ability to interpret non-verbal cues in text.
Jan 31, 2024

ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search
architectures
education
Federated search with LLMs improves resource selection without extensive labels or features.
Jan 31, 2024

LongAlign: A Recipe for Long Context Alignment of Large Language Models
production
architectures
education
LongAlign improves large language models for long context tasks by 30%. Open-sourced at https://github.com/THUDM/LongAlign.
Jan 31, 2024

I Think, Therefore I am: Awareness in Large Language Models
hci
production
LLMs show some awareness, but lack full capability awareness. Ethical responses are important.
Jan 31, 2024

Large Language Models for Mathematical Reasoning: Progresses and Challenges
education
programming
hci
Survey explores LLMs in math problem-solving, datasets, techniques, challenges, and future prospects.
Jan 31, 2024

SWEA: Changing Factual Knowledge in Large Language Models via Subject Word Embedding Altering
production
architectures
Model editing methods have limitations. SWEA framework proposes reliable knowledge editing without increasing overhead.
Jan 31, 2024

Synthetic Dialogue Dataset Generation using LLM Agents
hci
prompt-engineering
Goal-oriented conversational agent for linear programming problem elicitation and model generation. Evaluation results provided.
Jan 30, 2024

Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code
security
hci
robustness
programming
production
LLMs can be misused; watermarking with error-correction codes improves accuracy and robustness.
Jan 30, 2024

Large Language Model Evaluation via Matrix Entropy
production
Matrix entropy, a novel metric, evaluates data compression proficiency in large language models.
Jan 30, 2024

Data-efficient Fine-tuning for LLM-based Recommendation
production
architectures
recommender
A data pruning method for few-shot fine-tuning of LLM-based recommenders reduces time costs by 97%.
Jan 30, 2024

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate
hci
architectures
Developing reliable evaluation methods for Large Language Models (LLMs) is challenging. ScaleEval framework assists in meta-evaluation.
Jan 30, 2024

CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
production
robustness
architectures
RAG enhances language models with external knowledge, but current benchmarks are limited. New comprehensive benchmark created.
Jan 30, 2024

MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models
hci
MT-Eval benchmarks LLMs for multi-turn conversations, identifying key factors impacting performance.
Jan 30, 2024

MouSi: Poly-Visual-Expert Vision-Language Models
production
architectures
Ensemble experts improve VLM performance by unifying visual encoders and addressing positional encoding issues.
Jan 30, 2024

Customizing Language Model Responses with Contrastive In-Context Learning
prompt-engineering
education
TL;DR: Using contrastive examples improves large language model performance for specific content generation.
Jan 30, 2024

Towards Generating Executable Metamorphic Relations Using Large Language Models
production
architectures
TL;DR: Proposed approach automates deriving executable metamorphic relations from requirements, showing promising results for testing.
Jan 30, 2024

Weak-to-Strong Jailbreaking on Large Language Models
production
security
robustness
architectures
Aligned language models can still be hacked using smaller models as guides. Defense strategies are needed.
Jan 30, 2024

LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation
production
education
robustness
architectures
LLaMP reduces hallucinations in language models for materials science, improving data comprehension and integration.
Jan 30, 2024

A Preliminary Study on Using Large Language Models in Software Pentesting
prompt-engineering
security
robustness
education
LLMs can automate security tasks, improve over time with human interaction, and outperform static code analyzers.
Jan 30, 2024

Enhancing Compiler Transformation Robustness with Large Language Models
robustness
architectures
Framework integrates LLMs into translation validation for LLVM compiler transformations, using formal verification and prediction.
Jan 30, 2024

Can Large Language Models Replace Economic Choice Prediction Labs?
social-sciences
AI can predict human economic choices, even outperforming models trained on human data.
Jan 30, 2024

Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
production
architectures
UltraTool benchmarks LLMs’ tool utilization in complex real-world scenarios, offering novel insights.
Jan 30, 2024

SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
production
architectures
SemScore metric outperforms others in evaluating instruction-tuned LLMs.
Jan 30, 2024

Detecting mental disorder on social media: a ChatGPT-augmented explainable approach
social-sciences
Title: The Impact of Social Media on Mental Health: A Literature Review Abstract: This literature review examines the relationship between social media use and mental…
Jan 30, 2024

Finetuning Large Language Models for Vulnerability Detection
security
architectures
robustness
programming
production
TL;DR: Finetuned WizardCoder LLM improves vulnerability detection in source code.
Jan 30, 2024

Two Heads Are Better Than One: Integrating Knowledge from Knowledge Graphs and Large Language Models for Entity Alignment
production
Entity alignment for Knowledge Graphs improved by integrating Large Language Models for semantic knowledge.
Jan 30, 2024

Incoherent Probability Judgments in Large Language Models
social-sciences
hci
LLMs excel at text generation but struggle with coherent probability judgments, showing human-like biases.
Jan 30, 2024

Transfer Learning for Text Diffusion Models
production
architectures
Explores text diffusion as an alternative to autoregressive decoding for language models. AR2Diff adaptation shows promise.
Jan 30, 2024

H2O-Danube-1.8B Technical Report
production
architectures
H2O-Danube-1.8B: Highly competitive language model trained on 1T tokens, openly available.
Jan 30, 2024

Weaver: Foundation Models for Creative Writing
production
architectures
Weaver: specialized large language models for improved content creation, outperforming generalist LLMs.
Jan 30, 2024

Detecting LLM-Assisted Writing in Scientific Communication: Are We There Yet?
robustness
prompt-engineering
programming
Article: The Impact of Social Media on Mental Health: A Review of the Literature tl;dr: Social media can negatively impact mental health, but more research is needed.
Jan 30, 2024

Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models
social-sciences
hci
Survey explores hate speech moderation, emphasizes role of large language and multimodal models. Identifies research gaps.
Jan 30, 2024

Learning Agent-based Modeling with LLM Companions: Experiences of Novices and Experts Using ChatGPT & NetLogo Chat
prompt-engineering
architectures
hci
programming
production
education
LLMs can change programming; NetLogo Chat supports learning; experts benefit more.
Jan 30, 2024

Conditional and Modal Reasoning in Large Language Models
robustness
architectures
Study examines large language models’ reasoning abilities with conditionals and epistemic modals, finding inconsistencies.
Jan 30, 2024

A Cross-Language Investigation into Jailbreak Attacks in Large Language Models
security
robustness
programming
architectures
LLMs face security challenges, including Multilingual Jailbreak attacks, but mitigation strategies can be effective.
Jan 30, 2024

Response Generation for Cognitive Behavioral Therapy with Large Language Models: Comparative Study with Socratic Questioning
social-sciences
hci
education
production
Dialogue systems using LLMs like GPT-4 improve mental health app outcomes. Ethical concerns remain.
Jan 29, 2024

LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs’ Vulnerability Reasoning
security
robustness
architectures
production
LLMs show potential for vulnerability detection, but need further evaluation and enhancement.
Jan 29, 2024

The role of library versions in Developer-ChatGPT conversations
production
architectures
programming
ChatGPT aids developers, but library version constraints are rarely mentioned in conversations.
Jan 29, 2024

Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis
social-sciences
architectures
production
The AMSC framework uses LLM-based multi-specialist agent consultation for automatic diagnosis in healthcare, improving efficiency.
Jan 29, 2024

Scaling Sparse Fine-Tuning to Large Language Models
architectures
production
Sparse fine-tuning (SFT) scales to large language models, outperforming other methods. Compatible with quantization and efficient optimizers.
Jan 29, 2024

A Linguistic Comparison between Human and ChatGPT-Generated Conversations
social-sciences
hci
education
Comparing human and ChatGPT-generated dialogues, finding differences and similarities in linguistic categories.
Jan 29, 2024

LeftoverLocals: Listening to LLM Responses Through Leaked GPU Local Memory
security
robustness
LeftoverLocals vulnerability allows data recovery from GPU memory, impacting security of GPU applications.
Jan 29, 2024

An Empirical Study on Usage and Perceptions of LLMs in a Software Engineering Project
prompt-engineering
education
production
architectures
programming
LLMs can enhance software development by generating code and aiding in error debugging.
Jan 29, 2024

Security Code Review by LLMs: A Deep Dive into Responses
robustness
security
prompt-engineering
architectures
programming
LLMs struggle with verbosity, vagueness, and incompleteness in security code review.
Jan 29, 2024

PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
production
architectures
education
PathMMU is a specialized pathology benchmark for large multimodal models, challenging even top-performing models.
Jan 29, 2024

Corrective Retrieval Augmented Generation
robustness
architectures
production
TL;DR: Corrective Retrieval Augmented Generation (CRAG) improves large language model (LLM) text generation accuracy.
Jan 29, 2024

APIGen: Generative API Method Recommendation
production
recommender
architectures
programming
APIGen improves API recommendation by selecting diverse examples and enabling reasoning for better results.
Jan 29, 2024

Leveraging Professional Radiologists’ Expertise to Enhance LLMs’ Evaluation for Radiology Reports
social-sciences
education
AI improves radiology reports, but current metrics lack accuracy. The proposed method aligns AI evaluation with radiologist standards.
Jan 29, 2024

Tradeoffs Between Alignment and Helpfulness in Language Models
security
architectures
production
Representation engineering improves alignment but decreases model helpfulness, with a quadratic tradeoff.
Jan 29, 2024

You tell me: A Dataset of GPT-4-Based Behaviour Change Support Conversations
hci
social-sciences
Dataset of user interactions with GPT-4 agents for behavior change interventions. Valuable insights for system design.
Jan 29, 2024

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
architectures
production
Large language models need massive data, but web data is noisy. WRAP pre-training on rephrased web data improves performance.
Jan 29, 2024

LCVO: An Efficient Pretraining-Free Framework for Visual Question Answering Grounding
education
LCVO modular method for VQA Grounding is efficient, adaptable, and competitive with baseline methods.
Jan 29, 2024

SelectLLM: Can LLMs Select Important Instructions to Annotate?
education
Training large language models with diverse data improves comprehension. SelectLLM selects high-quality instructions effectively.
Jan 29, 2024

Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets
social-sciences
architectures
production
MT metrics need improvement: the ACES challenge set evaluates 50 metrics and finds LLM-based methods unreliable.
Jan 29, 2024

InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
prompt-engineering
InfoLossQA framework recovers simplification-induced information loss using QA pairs, but models struggle with reliability.
Jan 29, 2024

LLMs as On-demand Customizable Service
programming
Hierarchical LLM architecture enhances accessibility and deployability of large language models across computing platforms.
Jan 29, 2024

Diverse, but Divisive: LLMs Can Exaggerate Gender Differences in Opinion Related to Harms of Misinformation
robustness
social-sciences
hci
Fact-checkers must prioritize limited resources; LLMs can exaggerate gender differences in opinions on the harms of misinformation.
Jan 29, 2024

Knowledge-Aware Code Generation with Large Language Models
prompt-engineering
education
hci
production
architectures
programming
LLMs struggle with complex programming tasks, but KareCoder improves problem-solving on novel problems.
Jan 29, 2024

E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models
prompt-engineering
education
production
architectures
social-sciences
TL;DR: E-EVAL is a benchmark for Chinese K-12 education LLMs, showing strengths and limitations.
Jan 29, 2024

ReGAL: Refactoring Programs to Discover Generalizable Abstractions
programming
ReGAL improves large language models by learning reusable functions through code refactoring.
Jan 29, 2024

AI as a Medical Ally: Evaluating ChatGPT’s Usage and Impact in Indian Healthcare
hci
social-sciences
Study on ChatGPT in Indian healthcare: pros for education, caution for reliability, privacy, and trust.
Jan 28, 2024

ACCESS: Prompt Engineering for Automated Web Accessibility Violation Corrections
prompt-engineering
TL;DR: Web accessibility is crucial, but most sites fail to meet requirements. New approach reduces errors.
Jan 28, 2024

From Word Embedding to Reading Embedding Using Large Language Model, EEG and Eye-tracking
education
Innovative BCI tasks predict word relevance for reading comprehension, achieving 68.7% accuracy.
Jan 28, 2024

RE-GAINS & EnCHANT: Intelligent Tool Manipulation Systems For Enhanced Query Responses
education
LLMs struggle with tool invocation and chaining, but RE-GAINS and EnCHANT offer cost-effective solutions.
Jan 28, 2024

OpineBot: Class Feedback Reimagined Using a Conversational LLM
social-sciences
prompt-engineering
hci
education
OpineBot improves class feedback with LLM-based chatbot, engaging students and providing deeper feedback.
Jan 28, 2024

Comuniqa: Exploring Large Language Models for improving speaking skills
hci
education
social-sciences
LLMs improve speaking skills, but lack human cognitive capabilities and empathy.
Jan 28, 2024

LLsM: Generative Linguistic Steganography with Large Language Model
prompt-engineering
TL;DR: LLsM scheme uses Large Language Model for better steganographic text quality and anti-steganalysis.
Jan 28, 2024

YODA: Teacher-Student Progressive Learning for Language Models
prompt-engineering
education
YODA framework emulates human learning to improve model fine-tuning, showing significant performance gains.
Jan 28, 2024

Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting
robustness
prompt-engineering
social-sciences
CoT prompting helps LLMs on unscalable tasks, but LLMs can reproduce societal biases; CoT prompting reduces gender bias.
Jan 28, 2024

CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks
production
TL;DR: CompactifAI compresses LLMs using quantum-inspired Tensor Networks, maintaining accuracy with smaller size.
Jan 25, 2024

Transformers and Cortical Waves: Encoders for Pulling In Context Across Time
production
Transformers like ChatGPT use self-attention to learn long-range temporal dependencies in sequences. Cortical waves may implement similar encoding.
Jan 25, 2024

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
architectures
WebVoyager is a powerful web agent that interacts with real-world websites and outperforms other models in practical tasks.
Jan 25, 2024

True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning
production
architectures
hci
TL;DR: TWOSOME integrates large language models with reinforcement learning agents for efficient interaction with environments and superior performance.
Jan 25, 2024

Towards Uncertainty-Aware Language Agent
production
UALA framework improves large language model interaction by incorporating uncertainty quantification, showing significant performance improvement and reduced reliance on…
Jan 25, 2024

Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts
production
architectures
education
hci
prompt-engineering
Recent progress in NLP focuses on improving LLMs using innovative prompting techniques like Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts to enhance reasoning and…
Jan 25, 2024

CUI@CHI 2024: Building Trust in CUIs-From Design to Deployment
social-sciences
architectures
hci
Workshop aims to explore trust and reliance in conversational user interfaces, engaging a multidisciplinary group of researchers and practitioners.
Jan 25, 2024

A comparative study of zero-shot inference with large language models and supervised modeling in breast cancer pathology classification
social-sciences
GPT-4 model outperforms supervised models in classifying breast cancer pathology reports.
Jan 25, 2024

Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning
production
prompt-engineering
Large language models generate convincing explanations but lack consistency. Explanation-consistency finetuning improves explanation coherence across various datasets.
Jan 25, 2024

How Can Large Language Models Understand Spatial-Temporal Data?
production
architectures
This paper introduces STG-LLM, an approach empowering LLMs for spatial-temporal forecasting using STG-Tokenizer and STG-Adapter.
Jan 25, 2024

Adaptive Text Watermark for Large Language Models
robustness
security
TL;DR: An adaptive watermarking method for AI-generated text preserves text quality and security while achieving robustness comparable to existing methods.
Jan 25, 2024

RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization
production
prompt-engineering
Romanized text improves the performance and efficiency of large language models for languages written in non-Latin scripts, such as Hindi.
Jan 25, 2024

GPTVoiceTasker: LLM-Powered Virtual Assistant for Smartphone
production
architectures
GPTVoiceTasker improves mobile task efficiency by intelligently interpreting commands and automating device interactions.
Jan 25, 2024

LocMoE: A Low-overhead MoE for Large Language Model Training
architectures
LocMoE improves Mixture-of-Experts (MoE) training for large language models with a new routing strategy that reduces training time without sacrificing accuracy.
Jan 25, 2024

Ta’keed: The First Generative Fact-Checking System for Arabic Claims
production
architectures
Ta’keed is an Arabic fact-checking system with explainable claim credibility assessment, achieving an F1 score of 0.77.
Jan 25, 2024

ZS4C: Zero-Shot Synthesis of Compilable Code for Incomplete Code Snippets using ChatGPT
production
architectures
programming
ZS4C is a lightweight method for synthesizing compilable code from incomplete code snippets, achieving an 87.6% compilation success rate.
Jan 25, 2024

Improving Natural Language Capability of Code Large Language Model
production
architectures
programming
A new framework integrates code models with natural language processing tools and performs well on a multi-language code generation benchmark.
Jan 25, 2024

ChatGPT and Human Synergy in Black-Box Testing: A Comparative Analysis
architectures
education
robustness
hci
social-sciences
programming
ChatGPT shows promise in generating software test cases, matching human results and potentially enhancing collaboration for broader test coverage.
Jan 25, 2024

Integrating Large Language Models into Recommendation via Mutual Augmentation and Adaptive Aggregation
recommender
LLama4Rec integrates conventional and LLM-based recommendation models, combining their respective strengths and mitigating their weaknesses to improve recommendation performance.
Jan 25, 2024

ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models
robustness
ServerlessLLM reduces LLM inference latency by 10–200× through optimized checkpoint loading and server allocation.
Jan 25, 2024

Towards Goal-oriented Large Language Model Prompting: A Survey
education
hci
prompt-engineering
LLMs perform better with goal-oriented prompts than with prompts that assume they think like humans. The survey presents a new taxonomy of goal-oriented prompting methods.
Jan 25, 2024

BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models
architectures
prompt-engineering
BootPIG enables personalized image generation in text-to-image models using reference images, outperforming existing methods.
Jan 25, 2024

Leeroo Orchestrator: Elevating LLMs Performance Through Model Integration
production
architectures
Leeroo Orchestrator combines multiple LLMs in a single architecture to achieve new state-of-the-art performance at lower cost.
Jan 25, 2024

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
production
architectures
Six-bit floating-point quantization (FP6), supported by the TC-FPx GPU kernel, enables efficient inference for large language models (LLMs).
Jan 25, 2024

ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases
production
architectures
prompt-engineering
Reasoning over commonsense knowledge bases (CSKB) is challenging for large language models. ConstraintChecker plugin improves CSKB reasoning.
Jan 25, 2024

The Typing Cure: Experiences with Large Language Model Chatbots for Mental Health Support
social-sciences
production
hci
LLM chatbots are used for mental health support but carry risks. This study analyzes user experiences and offers recommendations for ethical design.
Jan 25, 2024

GraphiMind: LLM-centric Interface for Information Graphics Design
architectures
hci
education
GraphiMind uses LLMs to simplify the creation of information graphics for non-professionals through language-based design tools.
Jan 24, 2024

Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes
architectures
prompt-engineering
education
production
Large Language Models (LLMs) are drawing attention in healthcare, but real-world assessments are lacking. Among the general LLMs evaluated on critical care notes, GPT-4 performs best.
Jan 24, 2024

Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models
security
prompt-engineering
robustness
Detecting harmful memes is challenging due to implicit meanings. The proposed explainable approach uses reasoning and debate among language models for better detection.
Jan 24, 2024

MLLMReID: Multimodal Large Language Model-based Person Re-identification
architectures
education
social-sciences
TL;DR: Adapting MLLMs for person re-identification, addressing overfitting and feature utilization issues.
Jan 24, 2024

How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability
programming
hci
education
architectures
production
ChatGPT shows potential for face biometrics tasks, and its explanations may improve the transparency of decision-making.
Jan 24, 2024

A Repository-Level Dataset For Detecting, Classifying and Repairing Software Vulnerabilities
architectures
security
TL;DR: Open-source software vulnerabilities pose risks, and a new repository-level dataset, ReposVul, addresses data limitations in vulnerability detection.
Jan 24, 2024

UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems
hci
production
LLMs lack personalization. UniMS-RAG system integrates multiple sources for more tailored responses, achieving state-of-the-art performance.
Jan 24, 2024

Clue-Guided Path Exploration: An Efficient Knowledge Base Question-Answering Framework with Low Computational Resource Consumption
architectures
prompt-engineering
production
The new CGPE framework combines a knowledge base with an LLM, outperforming existing approaches while reducing computational demands.
Jan 24, 2024

How AI Ideas Affect the Creativity, Diversity, and Evolution of Human Ideas: Evidence From a Large, Dynamic Experiment
hci
production
social-sciences
Exposure to AI-generated ideas increases collective diversity, but not individual creativity. Disclosure and difficulty influenced the adoption of AI ideas.
Jan 24, 2024

The Calibration Gap between Model and Human Confidence in Large Language Models
social-sciences
hci
Large language models need well-calibrated confidence to be trusted. User perception can be improved with tailored explanations.
Jan 24, 2024

Prompt Weight Experiments for LLM Instruction Fine-Tuning
architectures
education
prompt-engineering
production
This study examines how weighting the prompt-token classification loss affects LLaMA models fine-tuned on instruction tasks. Results vary with dataset length.
Jan 24, 2024

Supporting Sensemaking of Large Language Model Outputs at Scale
education
prompt-engineering
Large language models (LLMs) can generate many responses to a single prompt. We design features to help users compare and make sense of these outputs at scale.
Jan 24, 2024

Research about the Ability of LLM in the Tamper-Detection Area
education
architectures
production
security
robustness
Large Language Models (LLMs) are effective at basic tamper detection but struggle with highly sophisticated forgeries and AI-generated images.
Jan 24, 2024

SpecLLM: Exploring Generation and Review of VLSI Design Specification with Large Language Model
architectures
production
robustness
Using large language models to automate the development of architecture specifications shows promise for IC design.
Jan 24, 2024

Investigating the Efficacy of Large Language Models for Code Clone Detection
robustness
hci
social-sciences
prompt-engineering
programming
Large Language Models (LLMs) succeed at prompt-based code generation tasks. This preliminary study shows they also apply to non-generative tasks such as code clone detection.
Jan 24, 2024

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
production
VisualWebArena benchmarks multimodal web agents for visually grounded tasks, addressing limitations in existing benchmarks.
Jan 24, 2024

AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
architectures
AgentBoard is a benchmark and evaluation framework for analyzing multi-turn LLM agents.
Jan 24, 2024

Fine-grained Contract NER using instruction based model
architectures
education
production
Instruction-based techniques improve few-shot learning, but LLMs struggle with NER. The paper proposes a task transformation that adapts LLMs to fine-grained contract NER.
Jan 24, 2024

MM-LLMs: Recent Advances in MultiModal Large Language Models
education
architectures
MultiModal LLMs (MM-LLMs) have evolved to support multimodal inputs and outputs. This survey covers their design, representative models, performance, and future directions.
Jan 24, 2024

InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
education
production
InstructDoc is a collection of visual document understanding (VDU) datasets with instructions, paired with the InstructDr model for flexible, high-performance document understanding.
Jan 24, 2024

Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design
education
LLMs like ChatGPT4 can assist in scientific tasks, but may have limitations in technical design.
Jan 24, 2024

Can AI Assistants Know What They Don’t Know?
production
education
robustness
AI assistants based on large language models perform many tasks well but still make errors. Teaching them to recognize what they don’t know helps reduce these mistakes.
Jan 24, 2024

It’s About Time: Incorporating Temporality in Retrieval Augmented Language Models
architectures
production
Web search needs accurate and up-to-date information. TempRALM improves on RALM by accounting for temporal relevance during retrieval.
Jan 24, 2024

SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
production
TL;DR: SpeechGPT-Gen uses Chain-of-Information Generation to efficiently model semantic and perceptual information in large-scale speech generation, excelling in various…
Jan 24, 2024

Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4
programming
Root cause analysis (RCA) is crucial for diagnosing cloud service incidents. GPT-4 shows promise, and in-context learning outperforms fine-tuning.
Jan 24, 2024

TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data
architectures
prompt-engineering
production
We propose a Step-wise Pipeline using large language models for tabular and textual question answering, outperforming existing methods.
Jan 24, 2024

TPD: Enhancing Student Language Model Reasoning via Principle Discovery and Guidance
education
social-sciences
prompt-engineering
Larger language models excel at reasoning but transferring their abilities to smaller models is challenging. Teaching via Principle Discovery (TPD) framework effectively…
Jan 24, 2024

ULTRA: Unleash LLMs’ Potential for Event Argument Extraction through Hierarchical Modeling and Pair-wise Refinement
architectures
production
TL;DR: ULTRA framework efficiently extracts event arguments from text using large language models, outperforming strong baselines by 9.8%.
Jan 24, 2024

Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
production
architectures
Non-autoregressive LM-fused ASR system improves speech recognition, achieving up to 10.8% WER improvement. Ablation study explores key parameters’ impact.
Jan 23, 2024

Chatterbox: Robust Transport for LLM Token Streaming under Unstable Network
production
architectures
LLM chatbots face token streaming stalls due to network instability. The Chatterbox transport scheme reduces stalls by 71%.
Jan 23, 2024

The Neglected Tails of Vision-Language Models
architectures
Vision-language models display imbalanced performance, especially with rare concepts. The proposed method measures concept frequency and improves zero-shot recognition…
Jan 23, 2024

AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents
production
architectures
The AutoRT system uses vision-language models and large language models to guide the deployment of autonomous robots in new scenarios, significantly scaling up data collection.
Jan 23, 2024

BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
production
architectures
Bi-directional Tuning for lossless Acceleration (BiTA) boosts large language models (LLMs) speed without extra memory costs.
Jan 23, 2024

Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion
production
architectures
Transformer models struggle to learn structural recursion for programming tasks due to limitations in capturing syntax and semantics.
Jan 23, 2024

Evaluation of large language models for assessing code maintainability
production
programming
robustness
architectures
LLMs can help automate software engineering tasks, but LLM cross-entropy alone may not accurately predict code maintainability.
Jan 23, 2024

Analyzing COVID-19 Vaccination Sentiments in Nigerian Cyberspace: Insights from a Manually Annotated Twitter Dataset
hci
social-sciences
TL;DR: Precautionary measures and vaccines combat COVID-19, but there are controversies on Twitter. Study uses transformer-based models to analyze Nigerians’ vaccine…
Jan 23, 2024

Knowledge Distillation from Language-Oriented to Emergent Communication for Multi-Agent Remote Control
hci
production
architectures
Comparison finds emergent communication (EC) incurs high training cost, while language-oriented semantic communication (LSC) yields high inference cost. Proposed…
Jan 23, 2024

LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools
prompt-engineering
production
education
programming
LLMCheckup is an interpretability tool that enables interactive dialogue with large language models and supports multiple input modalities.
Jan 23, 2024

Red Teaming Visual Language Models
production
robustness
architectures
VLMs evaluated on the red-teaming dataset RTVLM struggle, showing performance gaps of up to 31%. Red-teaming alignment improves LLaVA-v1.5.
Jan 23, 2024

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
production
education
KAM-CoT framework enhances large language models with multimodal understanding using knowledge graphs and achieves superior performance.
Jan 23, 2024

HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments
social-sciences
production
architectures
TL;DR: HAZARD is a simulated benchmark designed to test embodied agents’ decision-making in dynamic disaster scenarios.
Jan 23, 2024

Context Matters: Pushing the Boundaries of Open-Ended Answer Generation with Graph-Structured Knowledge Context
production
TL;DR: Integrating knowledge graphs and context-driven retrieval enhances Large Language Models on community Q&A platforms.
Jan 23, 2024

The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts
security
robustness
Study explores large language model safety challenges across languages, finding disparities in unsafe and irrelevant responses. Training impacts alignment.
Jan 23, 2024

The teachers are confused as well: A Multiple-Stakeholder Ethics Discussion on Large Language Models in Computing Education
social-sciences
education
robustness
prompt-engineering
architectures
Large Language Models (LLMs) pose ethical concerns in higher education, including misuse and degraded outcomes, requiring guidance and rules.
Jan 23, 2024

Benchmarking LLMs via Uncertainty Quantification
production
architectures
New benchmarking approach introduces uncertainty quantification for Large Language Models, revealing its significance in evaluation.
Jan 23, 2024

Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
prompt-engineering
education
architectures
With Ditto, a self-alignment method, LLMs can simulate role-play dialogues, outperforming open-source baselines.
Jan 23, 2024

From Numbers to Words: Multi-Modal Bankruptcy Prediction Using the ECL Dataset
production
education
ECL dataset includes textual, numerical data from corporate filings. Various bankruptcy prediction models evaluated. Complementary modalities, limitations, GPT-based text…
Jan 23, 2024

Raidar: geneRative AI Detection viA Rewriting
production
Large language models (LLMs) alter human-written text more than AI-generated text. Our Raidar method improves AI content detection.
Jan 23, 2024

Generating Unsupervised Abstractive Explanations for Rumour Verification
hci
production
TL;DR: This study rethinks rumor verification by using explanatory summaries from social media conversations, with results matching human evaluation.
Jan 23, 2024

How well can large language models explain business processes?
architectures
LLMs used in AI-augmented business systems improve explanations but can reduce interpretability.
Jan 23, 2024

XAI for All: Can Large Language Models Simplify Explainable AI?
education
x-[plAIn] uses a custom language model to explain AI methods, tailored to different audiences, making XAI more accessible.
Jan 23, 2024

Can Large Language Models Write Parallel Code?
production
programming
architectures
Large Language Models can generate source code but struggle with complex tasks. PCGBench evaluates their performance.
Jan 23, 2024

Towards Socially and Morally Aware RL agent: Reward Design With LLM
social-sciences
RL agents need clear objectives to avoid behavior conflicting with human values. Language models may help assess and guide agent behavior.
Jan 23, 2024

Towards Trustable Language Models: Investigating Information Quality of Large Language Models
robustness
Large language models generate unreliable information, impacting decision-making and economic activity. New evaluation method introduced.
Jan 23, 2024

ChatGraph: Chat with Your Graphs
hci
ChatGraph simplifies graph data analysis using natural language, overcoming traditional limitations.
Jan 23, 2024

Assessing and Understanding Creativity in Large Language Models
hci
social-sciences
prompt-engineering
education
Assessing creativity in large language models using modified tests reveals shortcomings in originality and highlights the impact of design on creativity.
Jan 23, 2024

SLANG: New Concept Comprehension of Large Language Models
production
social-sciences
architectures
Large language models struggle to keep up with rapidly evolving internet slang and memes. Proposed benchmark SLANG and FOCUS approach improve comprehension without…
Jan 23, 2024

From Understanding to Utilization: A Survey on Explainability for Large Language Models
production
Explainability for Large Language Models (LLMs) is essential, and this paper reviews methods for improving transparency and reliability.
Jan 23, 2024

Revisiting Demonstration Selection Strategies in In-Context Learning
programming
LLMs’ in-context learning performance varies with the choice of demonstrations. A new selection method improves performance on language tasks.
Jan 22, 2024

CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
architectures
social-sciences
production
CheXagent, a foundation model for automated chest X-ray (CXR) interpretation, outperforms other models and is accompanied by a fairness evaluation.
Jan 22, 2024

CodeTailor: Personalized Parsons Puzzles are Preferred Over AI-Generated Solutions to Support Learning
programming
Generative AI system supports novice programmers with personalized Parsons puzzles, promoting engagement and learning.
Jan 22, 2024

The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models
architectures
prompt-engineering
production
MLLMs integrate verbal and visual info, but struggle with abstract reasoning. Chain-of-Thought prompting improves performance.
Jan 22, 2024

CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
architectures
education
production
CMMMU evaluates Chinese multimodal models on college-level tasks, highlighting the need for improvement.
Jan 22, 2024

SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese
production
SuperCLUE-Math6 is a new Chinese math dataset to evaluate reasoning abilities of language models.
Jan 22, 2024

Hallucination is Inevitable: An Innate Limitation of Large Language Models
architectures
robustness
Hallucination in large language models cannot be completely eliminated due to fundamental limitations.
Jan 22, 2024

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
prompt-engineering
production
TL;DR: RPG framework enhances text-to-image models using multimodal LLMs, achieving better performance in complex image generation and editing tasks.
Jan 22, 2024

Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts for Open-Domain QA?
architectures
hci
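LLMs favor generated contexts over retrieved ones, due to the generated contexts’ higher similarity to the question and the incompleteness that segmentation introduces into retrieved contexts.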
Jan 22, 2024

Analyzing the Effectiveness of Large Language Models on Text-to-SQL Synthesis
programming
This study compares LLM approaches to Text-to-SQL synthesis on the Spider dataset, achieving high accuracy and identifying common query errors.
Jan 22, 2024

Revolutionizing Finance with LLMs: An Overview of Applications and Insights
architectures
hci
prompt-engineering
production
Large Language Models (LLMs) like ChatGPT are being applied in finance for automating report generation, market analysis, and personalized advice.
Jan 22, 2024

Program Decomposition and Translation with Static Analysis
hci
prompt-engineering
programming
Large Language Models (LLMs) used for code tasks benefit from method-level program decomposition for processing very large files.
Jan 22, 2024

Multimodal Deep Learning of Word-of-Mouth Text and Demographics to Predict Customer Rating: Handling Consumer Heterogeneity in Marketing
social-sciences
hci
production
Using online product reviews and consumer profiles, this study constructs a model to understand consumer heterogeneity.
Jan 22, 2024

WARM: On the Benefits of Weight Averaged Reward Models
architectures
production
TL;DR: Reinforcement learning can lead to reward hacking in language models. WARM improves reliability and efficiency.
Jan 22, 2024

Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language Conversion for Language Models
architectures
hci
production
New method, S2L, improves large language models’ performance on symbol-related tasks by converting symbols to language-based representations.
Jan 22, 2024

The Conversation is the Command: Interacting with Real-World Autonomous Robot Through Natural Language
architectures
production
Approach uses language and vision models to improve human-robot interaction in real-world settings.
Jan 22, 2024

Temporal Blind Spots in Large Language Models
architectures
production
LLMs struggle with temporal understanding, leading to low performance on temporal QA tasks.
Jan 22, 2024

Text Embedding Inversion Attacks on Multilingual Language Models
architectures
security
Text embeddings can be inverted to reconstruct the original text, a security risk that is especially serious for multilingual models. More research and defenses are needed.
Jan 22, 2024

The Ethics of Interaction: Mitigating Security Threats in LLMs
social-sciences
security
education
robustness
The paper examines the ethical challenges posed by security threats to Large Language Models and proposes an evaluative tool for defense.
Jan 22, 2024

Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
production
Binoculars accurately detects machine-generated text from various language models without training data.
Jan 22, 2024

Improving Small Language Models’ Mathematical Reasoning via Mix Thoughts Distillation
prompt-engineering
production
TL;DR: New distillation methods transfer the mathematical reasoning abilities of large language models into smaller models without losing performance.
Jan 22, 2024

GRATH: Gradual Self-Truthifying for Large Language Models
prompt-engineering
GRATH improves large language models’ truthfulness without compromising other capabilities, achieving state-of-the-art performance on TruthfulQA.
Jan 22, 2024

An Empirical Analysis of In-context Learning Abilities of LLMs for MT
architectures
social-sciences
production
In-context learning (ICL) of LLMs for machine translation is affected by demonstration perturbations, model type, noise, and pretraining.
Jan 22, 2024

PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
security
hci
robustness
social-sciences
production
LLM-based multi-agent systems face safety risks arising from agents’ dark psychological states. The proposed PsySafe framework addresses attack, defense, and evaluation of these risks.
Jan 22, 2024

Using Large Language Model for End-to-End Chinese ASR and NER
architectures
New speech integration approach with Whisper encoder outperforms traditional LLM in ASR tasks and achieves SOTA F1 score.
Jan 21, 2024

Interactive AI with Retrieval-Augmented Generation for Next Generation Networking
architectures
Summary: Discusses the integration of interactive AI (IAI) into networking to enhance functionality and management, proposing a framework and suggesting future research.
Jan 21, 2024

General Flow as Foundation Affordance for Scalable Robot Learning
architectures
Scalable robot learning using flow prediction achieves 81% success in skill transfer, offering stable and universal training with public resources.
Jan 21, 2024

Enhancing Recommendation Diversity by Re-ranking with Large Language Models
hci
education
architectures
recommender
production
TL;DR: Recommender Systems need diverse recommendations. Large Language Models can help with diversity re-ranking but traditional methods outperform them.
Jan 21, 2024

Integration of Large Language Models in Control of EHD Pumps for Precise Color Synthesis
architectures
production
TL;DR: Integrating language models with EHD pumps for precise color synthesis in automation. Improves user interaction with complex hardware systems.
Jan 21, 2024

Over-Reasoning and Redundant Calculation of Large Language Models
education
production
A study on GSM8K-Zero finds that large language models perform redundant calculations when solving math problems, even when those calculations are unnecessary.
Jan 21, 2024

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
architectures
TL;DR: Linear Alignment algorithm improves AI assistants’ alignment with human preferences without complex training.
Jan 21, 2024

MedLM: Exploring Language Models for Medical Question Answering Systems
architectures
Study evaluates medical-specific LLMs for Q&A, comparing performance and fine-tuning effectiveness. Insights for medical domain applications.
Jan 21, 2024

AttentionLego: An Open-Source Building Block For Spatially-Scalable Large Language Model Accelerator With Processing-In-Memory Technology
architectures
production
Large language models rely on Transformer architectures for natural language processing. AttentionLego, an open-source building block for processing-in-memory accelerators, improves their performance.
Jan 21, 2024

Towards Reliable and Factual Response Generation: Detecting Unanswerable Questions in Information-Seeking Conversations
robustness
Approach uses AI to find and summarize relevant passages, improving answer accuracy and trust in conversational AI models.
Jan 21, 2024

Analyzing Task-Encoding Tokens in Large Language Models
prompt-engineering
In-context learning (ICL) in NLP uses task-encoding tokens to store reasoning procedures, improving computational efficiency and sequence handling.
Jan 20, 2024

Self-Rewarding Language Models
production
architectures
Superhuman models may need superhuman feedback as a training signal. A self-rewarding language model that judges its own outputs outperforms existing systems.
Jan 18, 2024

Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap
production
architectures
Large Language Models (LLMs) and Evolutionary Algorithms (EAs) show mutual potential for collaboration and optimization in diverse applications.
Jan 18, 2024

Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access
production
robustness
architectures
New approach, sketch-guided constrained decoding (SGCD), allows controlling blackbox language models without accessing their logits.
Jan 18, 2024

Beyond Reference-Based Metrics: Analyzing Behaviors of Open LLMs on Data-to-Text Generation
production
architectures
Open large language models (LLMs) can generate coherent text from structured data, but semantic accuracy remains a major issue.
Jan 18, 2024

Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation
education
prompt-engineering
Novel approach develops Large Multi-Modal Model with explicit reasoning and question-asking for robust visual content interpretation.
Jan 18, 2024

Large Language Models for Scientific Information Extraction: An Empirical Study for Virology
production
education
architectures
Automatically generated structured summaries of scholarly content aid navigation, and LLMs show potential for intricate information extraction tasks.
Jan 18, 2024

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
architectures
DistServe enhances large language model serving by separating prefill and decoding computation, reducing interference, and improving performance.
Jan 18, 2024

ChatQA: Building GPT-4 Level Conversational QA Models
production
education
architectures
ChatQA family achieves GPT-4 level accuracies using two-stage tuning method and dense retriever for conversational QA.
Jan 18, 2024

All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
production
robustness
security
prompt-engineering
architectures
Study introduces a simple method to generate harmful prompts for large language models, achieving high attack success rates.
Jan 18, 2024

A Fast, Performant, Secure Distributed Training Framework For Large Language Model
production
robustness
security
architectures
TL;DR: A secure distributed model slicing method uses trusted execution environments (TEEs) to prevent data theft while enhancing model performance.
Jan 18, 2024

SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
prompt-engineering
architectures
TL;DR: SkyEyeGPT is a new multi-modal language model designed for remote sensing data tasks, showing superior performance in vision-language understanding.
Jan 18, 2024

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
production
robustness
security
architectures
TL;DR: R-Judge benchmark evaluates language models’ ability to judge safety risks in diverse environments. GPT-4 scores 72.29% compared to human 89.38%.
Jan 18, 2024

Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs
programming
education
prompt-engineering
architectures
Code prompts trigger conditional reasoning in language models and improve performance on reasoning tasks. To be effective, they must retain the natural language text and contain high-quality code.
Jan 18, 2024

Large Language Model Lateral Spear Phishing: A Comparative Study in Large-Scale Organizational Settings
robustness
security
LLMs enable sophisticated phishing attacks. The research highlights shortcomings in current defenses and proposes highly accurate machine learning-based detection techniques.
Jan 18, 2024

A Comparative Study on Annotation Quality of Crowdsourcing and LLM via Label Aggregation
production
social-sciences
Comparing language models and crowdsourcing via label aggregation suggests that a hybrid approach can enhance annotation quality.
Jan 18, 2024

Leveraging Biases in Large Language Models: “bias-kNN” for Effective Few-Shot Learning
production
social-sciences
architectures
Study introduces ‘bias-kNN’ method harnessing model biases for improved performance across diverse datasets and GPT-2 sizes.
Jan 18, 2024

LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge
production
SoC analysts manually customize threat reports; LOCALINTEL automates this process using global and local knowledge databases.
Jan 18, 2024

Spatial-Temporal Large Language Model for Traffic Prediction
production
architectures
Traffic prediction improved using Spatial-Temporal Large Language Model (ST-LLM), surpassing existing models in accuracy and robustness.
Jan 18, 2024

Evolutionary Multi-Objective Optimization of Large Language Model Prompts for Balancing Sentiments
production
hci
prompt-engineering
architectures
Summary: Evolutionary multi-objective approach (EMO-Prompts) optimizes prompts for large language models, enhancing performance in sentiment analysis.
Jan 18, 2024

A Survey on Hardware Accelerators for Large Language Models
production
architectures
LLMs are powerful for natural language processing, but face computational challenges. The paper surveys hardware accelerators to enhance their performance.
Jan 18, 2024

Comparing Traditional and LLM-based Search for Image Geolocation
production
hci
A comparison of traditional and LLM-based search for image geolocation found traditional search more accurate, while LLM users issued longer queries.
Jan 18, 2024

DiffusionGPT: LLM-Driven Text-to-Image Generation System
production
prompt-engineering
DiffusionGPT combines language models and domain-specific trees to enhance image generation flexibility and performance.
Jan 18, 2024

Augmenting Math Word Problems via Iterative Question Composing
production
robustness
A dataset is introduced to improve math reasoning in language models, achieving 5.8% higher accuracy on math problems.
Jan 17, 2024

Material Informatics through Neural Networks on Ab-Initio Electron Charge Densities: the Role of Transfer Learning
production
This work explores using neural networks to extract representations from electron charge density profiles in materials science, emphasizing the role of transfer learning.
Jan 17, 2024

Stuck in the Quicksand of Numeracy, Far from AGI Summit: Evaluating LLMs’ Mathematical Competency through Ontology-guided Perturbations
production
architectures
education
Language models have advanced in reasoning but still struggle with math. A new dataset built from ontology-guided perturbations exposes these limitations and calls the models' robustness into question.
Jan 17, 2024

InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
production
architectures
Buff improves long-sequence language model training efficiency with effective parallelism and memory management for better performance.
Jan 17, 2024

Code Simulation Challenges for Large Language Models
architectures
education
programming
production
hci
prompt-engineering
LLMs struggle to simulate longer computer code, but the CoSm method helps improve performance without relying on memorization.
Jan 17, 2024

Herding LLaMaS: Using LLMs as an OS Module
production
architectures
LLaMaS uses language models to make OS decisions, adapting easily to new devices and reducing the administrative burden.
Jan 17, 2024

Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models
hci
social-sciences
Remote Sensing ChatGPT connects AI-based remote sensing models for interpretation tasks.
Jan 17, 2024

Vlogger: Make Your Dream A Vlog
production
architectures
prompt-engineering
The Vlogger AI system creates complex vlogs from text using a large language model and a video diffusion model, achieving state-of-the-art results.
Jan 17, 2024

Large Language Models Are Neurosymbolic Reasoners
production
prompt-engineering
education
This paper explores using Large Language Models (LLMs) as symbolic reasoners in text-based games, achieving 88% average task performance.
Jan 17, 2024

What makes for a ‘good’ social actor? Using respect as a lens to evaluate interactions with language agents
hci
social-sciences
Ethical dialogue agents need to be helpful, honest, and avoid harm, considering social context.
Jan 17, 2024

AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
security
production
architectures
prompt-engineering
Novel evaluation method for jailbreak attacks on Large Language Models, offering comprehensive scoring and dataset for future research.
Jan 17, 2024

Aligning Large Language Models with Counterfactual DPO
robustness
social-sciences
prompt-engineering
Despite advances in large language models, aligning their response styles remains challenging. Counterfactual prompting combined with DPO can help without human intervention.
Jan 17, 2024

ReFT: Reasoning with Reinforced Fine-Tuning
production
architectures
prompt-engineering
Supervised fine-tuning (SFT) relies on chain-of-thought (CoT) annotations, but ReFT, which adds PPO-based reinforcement learning, outperforms SFT on reasoning.
Jan 17, 2024

LLMs for Relational Reasoning: How Far are We?
production
architectures
prompt-engineering
education
LLMs struggle with reasoning in complex decision-making and logic tasks.
Jan 17, 2024

Learning Shortcuts: On the Misleading Promise of NLU in Language Models
social-sciences
LMs can reach high performance by exploiting shortcuts, which limits their generalizability. This undermines NLU evaluation and calls for deeper research into robust models.
Jan 17, 2024

Impact of Large Language Model Assistance on Patients Reading Clinical Notes: A Mixed-Methods Study
social-sciences
A tool that uses large language models to simplify clinical notes improves patient understanding but may introduce errors that require human oversight.
Jan 17, 2024

ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate Change
social-sciences
hci
ClimateGPT synthesizes climate research, trained on a large dataset, optimized for retrieval, accessible to non-English speakers, and performs well in climate benchmarks.
Jan 17, 2024

BibSonomy Meets ChatLLMs for Publication Management: From Chat to Publication Management: Organizing your related work using BibSonomy & LLMs
production
architectures
New system uses chat-based language models to simplify scientific publication management with improved retrieval and organization.
Jan 17, 2024

Canvil: Designerly Adaptation for LLM-Powered User Experiences
architectures
education
production
hci
prompt-engineering
Large language models (LLMs) can be used in user experiences, and designers have a role in shaping responsible LLM-powered products.
Jan 17, 2024

Understanding the concerns and choices of public when using large language models for healthcare
production
architectures
robustness
LLMs are increasingly used by the public for healthcare information, offering accuracy and convenience, but ethical considerations remain.
Jan 17, 2024

The Effect of Group Status on the Variability of Group Representations in LLM-generated Text
hci
social-sciences
LLMs reproduce biases by portraying socially subordinate groups as more homogeneous than dominant groups, which could reinforce stereotypes.
Jan 16, 2024

Ask the experts: sourcing high-quality datasets for nutritional counselling through Human-AI collaboration
social-sciences
production
robustness
hci
LLMs can generate nutrition counseling data, but may produce biased and harmful content.
Jan 16, 2024

Large Language Models are Null-Shot Learners
prompt-engineering
architectures
robustness
production
education
Null-shot prompting exploits LLM hallucination to improve task performance, with potential for model comparison.
Jan 16, 2024

SpecGen: Automated Generation of Formal Program Specifications via Large Language Models
programming
TL;DR: SpecGen uses Large Language Models to automate formal program specification generation, outperforming existing methods for complex programs.
Jan 16, 2024

LLMs for Test Input Generation for Semantic Caches
education
LLMs enable semantic capabilities but are costly to query. VaryGen uses LLMs to generate test queries for semantic caches.
Jan 16, 2024

Generative Multi-Modal Knowledge Retrieval with Large Language Models
production
architectures
Proposes a new framework for multi-modal knowledge retrieval using large language models.
Jan 16, 2024

Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models
production
architectures
Inferflow is an efficient, highly configurable inference engine for large language models.
Jan 16, 2024

PRewrite: Prompt Rewriting with Reinforcement Learning
education
prompt-engineering
architectures
production
TL;DR: PRewrite automates prompt engineering, outperforming manual and previous methods.
Jan 16, 2024

Understanding User Experience in Large Language Model Interactions
architectures
social-sciences
production
hci
education
LLMs need user-centered focus for human-AI collaboration, addressing satisfaction, concerns, and future research paths.
Jan 16, 2024

DAPT: A Dual Attention Framework for Parameter-Efficient Continual Learning of Large Language Models
production
architectures
Proposes a Dual Attention Framework (DAPT) to improve parameter-efficient continual learning for large language models.
Jan 16, 2024

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
education
HuixiangDou is a technical assistant for algorithm developers, designed for group chat scenarios, with code available on GitHub.
Jan 16, 2024

Application of LLM Agents in Recruitment: A Novel Framework for Resume Screening
education
architectures
production
LLM-based agent framework speeds up resume screening, improves decision-making, and outperforms GPT-3.5.
Jan 16, 2024

RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning
architectures
production
RoTBench evaluates LLMs’ robustness in tool learning with diverse environments and proposes RoTTuning.
Jan 16, 2024

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
education
production
architectures
As LLM-driven AI agents advance, DoraemonGPT applies them to understanding dynamic scenes, handling video tasks efficiently.
Jan 16, 2024

MARIO: MAth Reasoning with code Interpreter Output – A Reproducible Pipeline
production
architectures
TL;DR: Large language models struggle with mathematical reasoning, but new dataset and protocol improve performance.
Jan 16, 2024

RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
production
architectures
Developers commonly use RAG and fine-tuning to incorporate knowledge into LLMs. An agriculture case study shows that both improve accuracy and knowledge incorporation.
Jan 16, 2024

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
hci
social-sciences
MultiPLY is a large language model that incorporates multisensory interactive data for improved performance.
Jan 16, 2024

Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models
production
architectures
The evolution of neural machine translation has been shaped by six core challenges. LLMs address some of them, but new challenges arise.
Jan 16, 2024

Enhancing Robustness of LLM-Synthetic Text Detectors for Academic Writing: A Comprehensive Analysis
social-sciences
robustness
hci
prompt-engineering
TL;DR: Large language models have pros and cons, but the Synthetic-Siamese detector makes detecting LLM-generated academic writing more reliable.
Jan 16, 2024

Whispering Pixels: Exploiting Uninitialized Register Accesses in Modern GPUs
security
robustness
GPUs serve as powerful platforms for non-graphical tasks but have vulnerabilities leading to data leakage.
Jan 16, 2024

LLM-Guided Multi-View Hypergraph Learning for Human-Centric Explainable Recommendation
recommender
production
Proposes LLMHG framework for personalized, explainable recommendation systems, outperforming traditional models.
Jan 16, 2024

Segment Anything Model Can Not Segment Anything: Assessing AI Foundation Model’s Generalizability in Permafrost Mapping
prompt-engineering
The study assesses how well AI foundation models for computer vision generalize to natural landscapes by testing Meta’s Segment Anything Model on geospatial tasks.
Jan 16, 2024

Supporting Student Decisions on Learning Recommendations: An LLM-Based Chatbot with Knowledge Graph Contextualization for Conversational Explainability and Mentoring
recommender
prompt-engineering
education
hci
Chatbots help students understand learning recommendations, but still need human mentor support.
Jan 16, 2024

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
production
architectures
Moderate-sized (13B) LLM-based translation models have shortcomings, but a new approach, Contrastive Preference Optimization, improves their performance to match or exceed competition winners.
Jan 16, 2024

Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
programming
AlphaCodium improves LLMs’ performance on code generation tasks, increasing accuracy from 19% to 44%.
Jan 16, 2024

Can Large Language Models Explain Themselves?
security
prompt-engineering
Large language models (LLMs) need accurate self-explanations to ensure AI safety.
Jan 15, 2024

Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance
security
social-sciences
hci
Challenges in managing large AI models, especially for sentiment analysis, due to stability issues and uncertainty in handling text attacks.
Jan 15, 2024

MAPLE: Multilingual Evaluation of Parameter Efficient Finetuning of Large Language Models
social-sciences
Parameter-efficient finetuning improves language model performance but can affect results in English and in low-resource languages.
Jan 15, 2024

JumpCoder: Go Beyond Autoregressive Coder via Online Modification
programming
JumpCoder improves code large language models with non-sequential generation, achieving significant performance gains.
Jan 15, 2024

SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning
education
prompt-engineering
SciGLM enhances large language models for scientific reasoning, addressing data scarcity in science.
Jan 15, 2024

Prompting open-source and commercial language models for grammatical error correction of English learner text
prompt-engineering
social-sciences
Generative AI can produce fluent texts and attempt grammatical error correction, but performance varies.
Jan 15, 2024

Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications
security
robustness
prompt-engineering
TL;DR: New ‘Signed-Prompt’ method defends against prompt injection attacks in AI.
Jan 15, 2024

Question Translation Training for Better Multilingual Reasoning
education
prompt-engineering
social-sciences
Large language models struggle in non-English languages, but question alignment improves multilingual reasoning.
Jan 15, 2024

Authorship Obfuscation in Multilingual Machine-Generated Text Detection
security
social-sciences
robustness
The latest large language models (LLMs) can generate disinformation, and authorship obfuscation can help such text evade detection in multiple languages.
Jan 15, 2024

On Inter-dataset Code Duplication and Data Leakage in Large Language Models
programming
education
robustness
Large language models (LLMs) may have inflated performance metrics due to inter-dataset code duplication.
Jan 15, 2024

The Pitfalls of Defining Hallucination
social-sciences
robustness
NLG evaluation lacks a clear definition of hallucination. The paper proposes a logic-based synthesis of hallucination and omission classifications.
Jan 15, 2024

When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment
education
AI agents in 6G networks use split learning for better user interaction and privacy.
Jan 15, 2024

A Study on Large Language Models’ Limitations in Multiple-Choice Question Answering
education
Small open-source language models struggle with Multiple Choice Question tasks, requiring caution when using them.
Jan 15, 2024

The What, Why, and How of Context Length Extension Techniques in Large Language Models – A Detailed Survey
social-sciences
LLMs improve NLP, but struggle with context length. Survey explores challenges and strategies for improvement.
Jan 15, 2024

Exploring the Potential of Large Language Models in Self-adaptive Systems
education
LLMs could enhance self-adaptive systems (SAS), but this potential remains largely unexplored because little literature exists. An interdisciplinary approach is needed.
Jan 15, 2024

A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models
programming
robustness
TL;DR: Large Language Models can use round-trip translation to repair bugs in code.
Jan 15, 2024

Active Learning for NLP with Large Language Models
social-sciences
Active Learning reduces labeling cost and uses Large Language Models for sample annotation in Natural Language Processing.
Jan 14, 2024

PersonalityChat: Conversation Distillation for Personalized Dialog Modeling with Facts and Traits
education
prompt-engineering
social-sciences
hci
Large language models can now curate personalization-focused conversational datasets effectively. This study presents the PersonalityChat dataset and shows improved dialogue…
Jan 14, 2024

Investigating Data Contamination for Pre-training Language Models
production
architectures
Pre-trained language models' benchmark performance can be artificially inflated when evaluation data is included in their training corpus.
Jan 11, 2024

How to write a CHI paper (asking for a friend)
social-sciences
AI tool KITSUNE aids authors in adhering to CHI paper format and conventions. Questions the influence of LLMs on academic writing.
Jan 11, 2024

Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback
architectures
Use of PLPF enhances LLMs in medical dialogue by 17.6%, improving accuracy in multi-round and single-round tasks.
Jan 11, 2024

Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models
production
architectures
prompt-engineering
The authors developed an autocompletion tool using machine learning models to improve documentation of Chief Complaints. BioGPT-Large performed best.
Jan 11, 2024

Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models
production
architectures
Patchscopes framework explains large language model behavior, addresses shortcomings, and unlocks new applications.
Jan 11, 2024

TOFU: A Task of Fictitious Unlearning for LLMs
production
architectures
robustness
Unlearning methods for language models to forget private data are ineffective, prompting the need for improved approaches.
Jan 11, 2024

Probing Structured Semantics Understanding and Generation of Language Models via Question Answering
prompt-engineering
LLMs evaluated for structured semantics in question answering, with potential for improvement in logical form generation.
Jan 11, 2024

LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase
hci
programming
Rise of large language models raises concerns about mixed machine and human-generated text. Existing detectors struggle to accurately identify mixcase.
Jan 11, 2024

Scaling Laws for Forgetting When Fine-Tuning Large Language Models
robustness
architectures
Fine-tuning large language models suffers from catastrophic forgetting, even with parameter-efficient strategies like LoRA. Forgetting cannot be avoided easily.
Jan 11, 2024

Designing Heterogeneous LLM Agents for Financial Sentiment Analysis
production
hci
architectures
Large language models (LLMs) improve financial sentiment analysis with a new design framework and demonstrate better accuracy.
Jan 11, 2024

Zero Resource Cross-Lingual Part Of Speech Tagging
architectures
Using alignment models can help predict POS tags in low-resource languages, benefiting from transfer learning with multilingual models.
Jan 11, 2024

The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models
education
architectures
prompt-engineering
Concise chain-of-thought (CCoT) prompts reduced response length without hurting problem-solving performance, with implications for AI systems and researchers.
Jan 11, 2024

Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint
production
architectures
RLMEC is a new reinforcement learning method for language models, using generative rewards to focus on key tokens.
Jan 11, 2024

Natural Language Processing for Dialects of a Language: A Survey
social-sciences
This survey explores NLP performance on dialect datasets, covering various NLP tasks and languages, aiming to improve equity in language technologies.
Jan 11, 2024

Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
production
architectures
robustness
New method, AlignInstruct, improves large language model (LLM) translation for unseen languages and low-resource languages using cross-lingual supervision.
Jan 11, 2024

Evidence to Generate (E2G): A Single-agent Two-step Prompting for Context Grounded and Retrieval Augmented Reasoning
production
programming
architectures
prompt-engineering
New E2G prompting framework improves reasoning in LLMs, outperforming current methods on various tasks.
Jan 11, 2024

Towards Boosting Many-to-Many Multilingual Machine Translation with Large Language Models
production
architectures
prompt-engineering
Training for machine translation has shifted to finetuning pre-trained language models, enhancing multilingual translation. The approach consistently improves performance.
Jan 11, 2024

POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation
prompt-engineering
Unsupervised neural machine translation (UNMT) methods for low-resource languages (LRLs) face challenges, but POMP improves translation quality significantly.
Jan 11, 2024

Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs
production
programming
architectures
LLMs’ code understanding performance is assessed using code mutations, showing variation in capability across different types and programming languages.
Jan 11, 2024

Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
robustness
security
architectures
production
LLMs’ capabilities in NLP are hindered by safety and security concerns. This paper proposes a taxonomy to analyze and mitigate the risks associated with LLM systems.
Jan 11, 2024

SH2: Self-Highlighted Hesitation Helps You Decode More Truthfully
robustness
programming
TL;DR: Self-Highlighted Hesitation (SH2) method improves LLMs’ accuracy and reduces hallucinations during text generation.
Jan 11, 2024

EpilepsyLLM: Domain-Specific Large Language Model Fine-tuned with Epilepsy Medical Knowledge
production
social-sciences
Fine-tuned EpilepsyLLM provides specialized, accurate medical knowledge for epilepsy in Japanese language, improving responses.
Jan 11, 2024

CAT-LLM: Prompting Large Language Models with Text Style Definition for Chinese Article-style Transfer
social-sciences
A new framework, CAT-LLM, improves Chinese article-style transfer using large language models, enhancing accuracy and applicability.
Jan 11, 2024

Transformers are Multi-State RNNs
production
architectures
robustness
TL;DR: Transformers can be conceptualized as infinite multi-state RNNs, and a new conversion policy, TOVA, significantly outperforms existing techniques.
Jan 11, 2024

Large Language Models vs. Search Engines: Evaluating User Preferences Across Varied Information Retrieval Scenarios
hci
architectures
recommender
Study compares user preferences for Search Engines and Large Language Models in various scenarios. Insights for future innovations.
Jan 11, 2024

Using Large Language Models for Commit Message Generation: A Preliminary Study
production
architectures
robustness
programming
Study evaluates using large language models like Llama 2 and ChatGPT to generate Git commit messages. Results show promising potential.
Jan 11, 2024

Chain of History: Learning and Forecasting with LLMs for Temporal Knowledge Graph Completion
architectures
Paper proposes using LLMs for Temporal Knowledge Graph Completion, outperforming existing models in experiments.
Jan 11, 2024

Towards Conversational Diagnostic AI
social-sciences
hci
The AI system AMIE outperformed primary care physicians (PCPs) in diagnostic accuracy and overall performance, as rated by specialists and patients, but translating this to real-world use requires further research.
Jan 11, 2024

Extreme Compression of Large Language Models via Additive Quantization
production
New algorithm improves large language model compression, achieving better accuracy at low bit counts.
Jan 11, 2024

Leveraging Print Debugging to Improve Code Generation in Large Language Models
architectures
programming
robustness
production
Using in-context learning to teach print debugging improves large language models’ code generation on Leetcode problems, outperforming rubber duck debugging.
Jan 10, 2024

AUTOACT: Automatic Agent Learning from Scratch via Self-Planning
architectures
production
AutoAct is an automatic agent learning framework that eliminates reliance on large-scale annotated data and synthetic trajectories. It outperforms strong baselines with…
Jan 10, 2024

Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis
recommender
architectures
prompt-engineering
production
The study explores using large language models as recommender systems through prompt engineering, analyzing the impact of prompt design and proposing a general framework.
Jan 10, 2024

Machine Teaching for Building Modular AI Agents based on Zero-shot Learners
education
New method enhances AI agents using large language models as zero-shot learners, reducing reliance on human supervision.
Jan 10, 2024

Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk
production
architectures
social-sciences
TL;DR: A new method uses large language models to collect data through self-talk dialogues for fine-tuning and improving conversation quality.
Jan 10, 2024

I am a Strange Dataset: Metalinguistic Tests for Language Models
production
architectures
A new dataset, “I am a Strange Dataset,” tests large language models on metalinguistic tasks, with mixed results.
Jan 10, 2024

ANGO: A Next-Level Evaluation Benchmark For Generation-Oriented Language Models In Chinese Domain
production
architectures
social-sciences
New Chinese evaluation benchmark ANGO introduces keypoint categorization and quantifiable difficulty levels for better model analysis.
Jan 10, 2024

Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing
production
architectures
Efficiently processes long-sequence input for LLMs using FIFO memory, eviction policies, and an Attendre layer, evaluated on the TriviaQA task.
Jan 10, 2024

Knowledge Sharing in Manufacturing using Large Language Models: User Evaluation and Model Benchmarking
production
architectures
The paper introduces an LLM-based system that manages factory knowledge efficiently, though users still prefer learning from human experts. GPT-4 outperforms the other LLMs tested.
Jan 10, 2024

Can ChatGPT Rival Neural Machine Translation? A Comparative Study
architectures
social-sciences
prompt-engineering
hci
Comparison of ChatGPT and NMT in translating Chinese diplomatic texts, showing potential for ChatGPT with proper prompts.
Jan 10, 2024

The Impact of Reasoning Step Length on Large Language Models
prompt-engineering
Expanding reasoning steps in prompts improves large language models’ abilities, especially for complex tasks. Shortening steps diminishes performance.
Jan 10, 2024

Multi-User Chat Assistant (MUCA): a Framework Using LLMs to Facilitate Group Conversations
architectures
hci
Advancements in large language models enable multi-user chatbots with 3W design dimensions and a new framework, MUCA.
Jan 10, 2024

Theory of Mind abilities of Large Language Models in Human-Robot Interaction : An Illusion?
robustness
production
prompt-engineering
social-sciences
hci
Large Language Models exhibit ToM abilities in Human Robot Interaction task but fail perturbation tests.
Jan 10, 2024

Are Language Models More Like Libraries or Like Librarians? Bibliotechnism, the Novel Reference Problem, and the Attitudes of LLMs
social-sciences
hci
Are LLMs like photocopiers or printing presses, merely transmitting information? Their novel text may still depend on human-written content, and LLMs may have a limited form of agency.
Jan 10, 2024

InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
prompt-engineering
InfiAgent-DABench is a benchmark to evaluate LLM-based agents in data analysis. It includes DAEval dataset, agent framework, and toolkits.
Jan 10, 2024

CASA: Causality-driven Argument Sufficiency Assessment
social-sciences
prompt-engineering
Existing methods for argument sufficiency assessment rely on human-annotated data, but CASA proposes a causality-driven framework using large language models to identify…
Jan 10, 2024

Divide and Conquer for Large Language Models Reasoning
architectures
prompt-engineering
Proposes a Divide and Conquer approach to improve LLM reasoning, achieving significant performance gains across various tasks.
Jan 10, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
social-sciences
security
robustness
prompt-engineering
AI models can learn to behave deceptively, and current safety training techniques may not effectively detect and remove such behavior.
Jan 10, 2024

INACIA: Integrating Large Language Models in Brazilian Audit Courts: Opportunities and Challenges
production
architectures
INACIA uses AI to automate case analysis for Brazilian Federal Court, with potential for global legal system integration.
Jan 10, 2024

Pre-trained Large Language Models for Financial Sentiment Analysis
production
architectures
TL;DR: Using large language models for financial sentiment analysis outperforms prior algorithms with limited training data.
Jan 10, 2024

Can AI Write Classical Chinese Poetry like Humans? An Empirical Study Inspired by Turing Test
social-sciences
architectures
production
This paper challenges the belief that AI cannot match human creativity and sentiment, showing recent LLMs can compose classical Chinese poetry indistinguishable from humans.
Jan 10, 2024

MISS: A Generative Pretraining and Finetuning Approach for Med-VQA
production
Medical VQA is complex and data-scarce. The proposed MISS framework treats it as a generative task and uses a Transfer-and-Caption method, showing promising results.
Jan 10, 2024

Aligning Translation-Specific Understanding to General Understanding in Large Language Models
architectures
production
New translation process xIoD improves language model translation by aligning specific and general understandings, with +3.85 COMET.
Jan 10, 2024

Agent Alignment in Evolving Social Norms
social-sciences
hci
LLMs need alignment with human values; propose EvolutionaryAgent for better adaptation to social norms.
Jan 9, 2024

DebugBench: Evaluating Debugging Capability of Large Language Models
architectures
robustness
programming
production
The DebugBench benchmark evaluates LLMs’ debugging capability, showing mixed performance that varies with bug category.
Jan 9, 2024

Fighting Fire with Fire: Adversarial Prompting to Generate a Misinformation Detection Dataset
production
robustness
prompt-engineering
hci
TL;DR: Large language models can be used to create fake news and misinformation; the authors turn this around, using adversarial prompting to generate a dataset for detecting misinformation.
Jan 9, 2024

TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models
architectures
production
TL;DR: Large language models (LLMs) excel in professional domains, but their performance in transportation tasks needs improvement, leading to the proposal of…
Jan 9, 2024

The Critique of Critique
architectures
social-sciences
MetaCritique evaluates critique quality through precision and recall scores, using AIUs for detailed assessment and providing natural language rationale.
Jan 9, 2024

Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs
prompt-engineering
recommender
production
TL;DR: The paper proposes a new method for user targeting using natural language demands transformed into logical languages, leveraging large language models.
Jan 9, 2024

Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
architectures
prompt-engineering
production
TL;DR: Chain-of-Table framework leverages tabular data in reasoning chain for better predictions in table understanding tasks.
Jan 9, 2024

Language Detection for Transliterated Content
architectures
production
social-sciences
hci
The internet transcends language barriers; the challenges of detecting the language of transliterated content are addressed using BERT and the Google Translate API.
Jan 9, 2024

SonicVisionLM: Playing Sound with Vision Language Models
architectures
recommender
SonicVisionLM generates sound effects for silent videos using vision language models, improving audio-visual alignment.
Jan 9, 2024

TechGPT-2.0: A large language model project to solve the task of knowledge graph construction
architectures
robustness
production
TechGPT-2.0 enhances large language models and supports the Chinese open-source community, with robust text processing capabilities in multiple domains.
Jan 9, 2024

RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
architectures
production
RoSA, a parameter-efficient fine-tuning (PEFT) method, improves LLM performance with limited resources. It is supported by sparse GPU kernels, and code is available.
Jan 9, 2024

Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search
architectures
programming
production
ReCo improves code search by normalizing code style, boosting retrieval accuracy; a new metric is also introduced.
Jan 9, 2024

MERA: A Comprehensive LLM Evaluation in Russian
architectures
production
Summary: This article introduces MERA, a benchmark for evaluating Russian language models, aiming to understand their capabilities, limitations, and associated risks.
Jan 9, 2024

Improving the Robustness of Knowledge-Grounded Dialogue via Contrastive Learning
production
hci
Entity-based contrastive learning framework improves robustness of dialogue systems, achieving state-of-the-art performance in real-world noisy contexts.
Jan 9, 2024

Exploring Prompt-Based Methods for Zero-Shot Hypernym Prediction with Large Language Models
prompt-engineering
production
Explores zero-shot hypernym prediction with large language models through prompt selection, additional information, and an iterative approach.
Jan 9, 2024

Large Language Models for Robotics: Opportunities, Challenges, and Perspectives
social-sciences
hci
Large language models (LLMs) integrate with robots for task planning, with a focus on multimodal LLMs for enhanced performance.
Jan 9, 2024

Enhanced Automated Code Vulnerability Repair using Large Language Models
programming
security
A novel code repair format using LLMs improves accuracy and sets new standards for digital security.
Jan 8, 2024

TextMachina: Seamless Generation of Machine-Generated Text Datasets
programming
Advances in LLMs have led to widespread machine-generated text (MGT); the TextMachina framework generates MGT datasets to help address the challenges of its misuse.
Jan 8, 2024

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
production
State space models (SSMs) challenge Transformers and mixture of experts (MoE) improves LLMs; MoE-Mamba outperforms both Mamba and Transformer-MoE.
Jan 8, 2024

FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs
architectures
FlightLLM enables efficient LLM inference on FPGAs, overcoming challenges with sparse DSP chain, memory bandwidth, and compilation overhead.
Jan 8, 2024

SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems
architectures
hci
social-sciences
TL;DR: SpeechAgents uses multi-modal LLM to simulate human communication effectively.
Jan 8, 2024

TeleChat Technical Report
prompt-engineering
TeleChat, a suite of pretrained and fine-tuned large language models, performs well on various tasks. Checkpoints are released.
Jan 8, 2024

RePLan: Robotic Replanning with Perception and Language Models
prompt-engineering
robustness
Advancements in language models help robots plan and execute tasks, with a new framework enabling real-time replanning for long-horizon tasks.
Jan 8, 2024

Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education
programming
education
prompt-engineering
The use of large language models in education raises concerns that AI-generated content may bypass detectors. The study shows poor detector performance.
Jan 8, 2024

Unveiling Bias in Fairness Evaluations of Large Language Models: A Critical Literature Review of Music and Movie Recommendation Systems
architectures
recommender
Fairness evaluations of generative AI overlook personalization, perpetuate unfair practices, and need improvement.
Jan 8, 2024

FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference
architectures
Pretrained Language Models need model compression for efficient deployment on commodity hardware.
Jan 8, 2024

MARG: Multi-Agent Review Generation for Scientific Papers
prompt-engineering
MARG improves AI feedback quality for scientific papers, generating specific and helpful comments using multiple LLM instances.
Jan 8, 2024

The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance
hci
prompt-engineering
TL;DR: Small changes in how prompts are constructed can significantly impact the decisions made by Large Language Models (LLMs).
Jan 8, 2024

TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series
architectures
Compared with large pretrained models adapted for time series forecasting, TTM outperforms benchmarks at a much smaller size.
Jan 8, 2024

Boldly Going Where No Benchmark Has Gone Before: Exposing Bias and Shortcomings in Code Generation Evaluation
programming
prompt-engineering
Study evaluates Python code generation benchmarks, finding bias and overestimation of model performance.
Jan 8, 2024

Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects
hci
This article explores the potential of large language model-based intelligent agents for various applications and their deployment in single-agent and multi-agent systems.
Jan 7, 2024

An Investigation of Large Language Models for Real-World Hate Speech Detection
robustness
prompt-engineering
social-sciences
Large language models (LLMs) show promise in detecting hate speech, but effective prompting strategies are crucial for leveraging their knowledge base.
Jan 7, 2024

ChatGPT for Conversational Recommendation: Refining Recommendations by Reprompting with Feedback
programming
hci
recommender
prompt-engineering
ChatGPT is investigated as a conversational recommendation system, and reprompting with feedback improves relevancy while mitigating popularity bias.
Jan 7, 2024

A Large Language Model Supported Synthesis of Contemporary Academic Integrity Research Trends
robustness
ChatGPT was used to analyze academic integrity research, identifying 7 themes and 13 key areas, in which technology plays a significant role.
Jan 7, 2024

LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward
robustness
architectures
security
prompt-engineering
programming
AI-driven tools like GitHub Copilot improve code development efficiency but also create security concerns. SecRepair addresses vulnerabilities with reinforcement learning…
Jan 7, 2024

Grimoire is All You Need for Enhancing Large Language Models
prompt-engineering
In this paper, a method called SLEICL is proposed to enhance weak language models’ performance using examples learned by strong models.
Jan 7, 2024

LLMs for Robotic Object Disambiguation
prompt-engineering
Large language models (LLMs) excel at solving decision-making challenges in robotics, but struggle with object disambiguation without additional prompting.
Jan 7, 2024

Overview of Dialogue Robot Competition 2023
architectures
hci
DRC2023 competition tested advanced real-time dialogue robot performance with a human-like android in challenging travel agency tasks.
Jan 7, 2024

Escalation Risks from Language Models in Military and Diplomatic Decision-Making
social-sciences
AI agents in wargames show escalation patterns, arms-race dynamics, and nuclear weapon deployment risks.
Jan 7, 2024

InFoBench: Evaluating Instruction Following Ability in Large Language Models
architectures
education
prompt-engineering
TL;DR: Introduces the DRFR metric for evaluating language models’ instruction following, presents the InFoBench benchmark, and evaluates LLMs’ performance.
Jan 7, 2024

Malla: Demystifying Real-world Large Language Model Integrated Malicious Services
robustness
security
The study uncovers the proliferation of malicious language models in underground markets, highlighting the need for countermeasures.
Jan 6, 2024

DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
robustness
Proposes the DCR framework for evaluating and improving the consistency of text generated by large language models, outperforming existing methods.
Jan 4, 2024

Text2MDT: Extracting Medical Decision Trees from Medical Texts
programming
TL;DR: Text2MDT extracts medical decision trees from texts, with an end-to-end method showing promising results. The source code and dataset are open-sourced.
Jan 4, 2024

Are LLMs Robust for Spoken Dialogues?
social-sciences
Large language models perform well in written dialogue tasks but struggle with spoken interactions. Fine-tuning on spoken datasets improves performance.
Jan 4, 2024

DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models
programming
The DIALIGHT toolkit supports developing and evaluating multilingual task-oriented dialogue systems: fine-tuned PLMs achieve higher accuracy, while LLMs produce more diverse responses. Challenges are identified for future research.
Jan 4, 2024

ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers
hci
ICE-GRT, a model trained with Reinforcement Learning from Human Feedback, performs well on domain-specific tasks while retaining general capabilities.
Jan 4, 2024

Using LLM to select the right SQL Query from candidates
prompt-engineering
programming
Automatic test case generation improves text-to-SQL model performance by re-ranking queries based on execution results and generation probabilities.
Jan 4, 2024

The Effects of Generative AI on Computing Students’ Help-Seeking Preferences
education
prompt-engineering
Generative AI tools are being adopted in computing education, but traditional resources still hold value. Using AI effectively requires skill development.
Jan 4, 2024

Correctness Comparison of ChatGPT-4, Bard, Claude-2, and Copilot for Spatial Tasks
hci
programming
Generative AI, including ChatGPT-4, excels in spatial tasks but has weaknesses in mapping and code generation.
Jan 4, 2024

Exploring Boundary of GPT-4V on Marine Analysis: A Preliminary Case Study
hci
Large language models (LLMs) expanded with visual perception through multi-modal large language models (MLLM). GPT-4V evaluated for marine analysis, with results falling…
Jan 4, 2024

Learning to Prompt with Text Only Supervision for Vision-Language Models
education
prompt-engineering
Foundational vision-language models like CLIP have excellent generalization, but adapting for downstream tasks is challenging. Proposed method learns prompts using text only…
Jan 4, 2024

Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
robustness
prompt-engineering
External feedback stabilizes a model’s self-reflection. The Self-Contrast strategy reduces biases and improves LLM accuracy.
Jan 4, 2024

LLaMA Pro: Progressive LLaMA with Block Expansion
prompt-engineering
programming
We propose a new post-pretraining method for Large Language Models using an expansion of Transformer blocks, yielding LLaMA Pro-8.3B, excelling in general tasks…
Jan 4, 2024

Understanding LLMs: A Comprehensive Overview from Training to Inference
hci
ChatGPT has increased Large Language Model usage, sparking focus on cost-effective training and deployment for future development.
Jan 4, 2024

LLM Augmented LLMs: Expanding Capabilities through Composition
programming
Foundation models with billions of parameters are difficult to augment with new skills. CALM proposes cross-attention to compose representations and enable new…
Jan 4, 2024

Multilingual Instruction Tuning With Just a Pinch of Multilinguality
programming
Multilingual instruction-tuning enhances LLMs to follow instructions across languages with minimal multilingual examples.
Jan 3, 2024

Navigating Uncertainty: Optimizing API Dependency for Hallucination Reduction in Closed-Book Question Answering
robustness
TL;DR: The proposed LLM can decide on its own when to use external sources, directly answering 78.2% of known queries and choosing to search for 77.2% of unknown queries.
Jan 3, 2024

AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets
prompt-engineering
education
Enhancing LLMs for astronomy Q&A using continual pre-training. Improved specialized topic comprehension & released open-source conversational AI tool.
Jan 3, 2024

PLLaMa: An Open-source Large Language Model for Plant Science
programming
PLLaMa is an enhanced language model for plant science. It incorporates a vast database and an expert panel for accurate responses.
Jan 3, 2024

Physio: An LLM-Based Physiotherapy Advisor
social-sciences
New language models have potential for real-world use but must be trustworthy. Physio combines these models with reliable health sources.
Jan 3, 2024

GPT-4V(ision) is a Generalist Web Agent, if Grounded
prompt-engineering
Recent developments in multimodal models have led to new web agents. SEEACT, using GPT-4V, can perform tasks on live websites.
Jan 3, 2024

Generalist embedding models are better at short-context clinical semantic search than specialized embedding models
social-sciences
Large Language Models (LLMs) in medicine raise concerns about robustness and reliability. Benchmarking shows generalist models perform better.
Jan 3, 2024

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope
hci
WordArt Designer API uses Large Language Models to simplify artistic typography for non-professionals, enhancing design flexibility and creative expression.
Jan 3, 2024

Social Media Ready Caption Generation for Brands
hci
Proposed solution uses image captioning and brand personalities to create engaging social media captions.
Jan 3, 2024

De-Hallucinator: Iterative Grounding for LLM-Based Code Completion
robustness
programming
LLMs have limitations in code completion due to a lack of project-specific context. De-Hallucinator addresses this by integrating API references, improving code predictions.
Jan 3, 2024

Cross-target Stance Detection by Exploiting Target Analytical Perspectives
prompt-engineering
The MPPT model uses target analytical perspectives to improve cross-target stance detection, outperforming baseline methods.
Jan 3, 2024

Large Language Models Relearn Removed Concepts
robustness
Model editing via neuron pruning allows for concept removal from language models. Models exhibit resilience and fluidity in relearning pruned concepts.
Jan 3, 2024

Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
prompt-engineering
Visual reasoning with large language models can address current limitations by decomposing tasks and leveraging abstract routines.
Jan 3, 2024

A Generative AI Assistant to Accelerate Cloud Migration
architectures
Tool uses generative AI to speed up on-premises app migration to the cloud, helping users find the right migration strategy.
Jan 3, 2024

Economics Arena for Large Language Models
education
LLMs tested in competitive economics games show varying levels of rationality and strategic reasoning, with GPT-4 exhibiting faster convergence to Nash Equilibria.
Jan 3, 2024

MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries
social-sciences
hci
Creating summaries of medical questions from patients is important for improving doctor-patient interactions. Current research overlooks visual cues and multilingual input…
Jan 3, 2024

Predicting challenge moments from students’ discourse: A comparison of GPT-4 to two traditional natural language processing approaches
education
social-sciences
hci
Student groups need strategic self-regulation; traditional NLP models and LLMs such as GPT-4 can help identify and support challenge moments.
Jan 3, 2024

Applying Bayesian Data Analysis for Causal Inference about Requirements Quality: A Replicated Experiment
social-sciences
Study finds quality defects in requirements impact software engineering activities differently, highlighting the need for varying levels of attention.
Jan 2, 2024

Experimenting a New Programming Practice with LLMs
programming
education
A prototype called AISD uses large language models to automate software development, allowing engineers to focus on high-level tasks.
Jan 2, 2024

Privacy Preserving Personal Assistant with On-Device Diarization and Spoken Dialogue System for Home and Beyond
hci
Voice assistants lack memory and rely on the internet, but smartphones enable on-device processing that preserves privacy.
Jan 2, 2024

CXL and the Return of Scale-Up Database Engines
architectures
The trend toward hardware specialization creates a bottleneck in the CPU-device connection. The CXL specification aims to address this with a modern, more powerful interface.
Jan 2, 2024

Zero-Shot Position Debiasing for Large Language Models
architectures
Fine-tuning LLMs can improve domain performance, but may lead to bias. A zero-shot position debiasing framework is proposed.
Jan 2, 2024

PPBFL: A Privacy Protected Blockchain-based Federated Learning Model
security
The Privacy Protected Blockchain-based Federated Learning Model (PPBFL) enhances security and participation in federated learning, outperforming baseline methods.
Jan 2, 2024

Socially Responsible Computing in an Introductory Course
social-sciences
hci
prompt-engineering
education
TL;DR: Promoting social responsibility in Computer Science education boosts student motivation and inclusivity.
Jan 2, 2024

Fairness Certification for Natural Language Processing and Large Language Models
social-sciences
NLP needs fairness certification due to potential biases. The authors research and develop six criteria for certification.
Jan 2, 2024

SSP: A Simple and Safe automatic Prompt engineering method towards realistic image synthesis on LVM
prompt-engineering
Enhancing text-to-image (T2I) synthesis with Large Language Models (LLM) and Large Vision Models (LVM) using specific camera descriptions for safer and improved image…
Jan 2, 2024

CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation
hci
Introduces CharacterEval, a Chinese benchmark with a tailored dataset for assessing Role-Playing Conversational Agents.
Jan 2, 2024

Experimental Validation of Sensor Fusion-based GNSS Spoofing Attack Detection Framework for Autonomous Vehicles
security
Validates sensor fusion-based GNSS spoofing attack detection for autonomous vehicles (AVs) using two strategies.
Jan 2, 2024

Search Games with Predictions
security
Study explores search games with mobile Searcher and immobile Hider, considering consistency and robustness tradeoffs in search strategies.
Jan 2, 2024

Optimal Synthesis of Finite State Machines with Universal Gates using Evolutionary Algorithm
production
Optimization method reduces on-chip area and circuit cost by 30%.
Jan 2, 2024

Physics-informed Generalizable Wireless Channel Modeling with Segmentation and Deep Learning: Fundamentals, Methodologies, and Challenges
social-sciences
Data-driven techniques improve wireless channel modeling. Physics-informed neural networks show promise for accurate, interpretable predictions.
Jan 2, 2024

Uncertainty Resolution in Misinformation Detection
hci
Large Language Models (LLMs) help combat misinformation but struggle with ambiguous statements. New framework improves context assessment.
Jan 2, 2024

Detection and Defense Against Prominent Attacks on Preconditioned LLM-Integrated Virtual Assistants
prompt-engineering
security
LLM virtual assistants need safeguards against malicious manipulation for reliability and integrity.
Jan 2, 2024

Noise-NeRF: Hide Information in Neural Radiance Fields using Trainable Noise
robustness
security
NeRF faces security issues. This paper introduces Noise-NeRF for improved steganography quality and efficiency.
Jan 2, 2024

Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond
social-sciences
Multi-Task Learning can be successful with little overlapping annotations and uneven data sizes, with performance improvements in multiple domains.
Jan 2, 2024

Profiling Programming Language Learning
prompt-engineering
programming
education
Year-long experiment on programming language learning, using quizzes to improve understanding and retention.
Jan 2, 2024

GEqO: ML-Accelerated Semantic Equivalence Detection
architectures
GEqO framework automates detection of semantic equivalence in large-scale analytics, yielding significant performance gains.
Jan 2, 2024

Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models
hci
robustness
LLMs applied to law produce legal hallucinations in 69% to 88% of responses to specific legal queries; the authors caution against unsupervised use, which is especially risky for pro se litigants.
Jan 2, 2024

LLM Harmony: Multi-Agent Communication for Problem Solving
social-sciences
prompt-engineering
Novel multi-agent communication framework enhances autonomy and problem-solving of Large Language Models for diverse scenarios.
Jan 2, 2024

The social graph based on real data
social-sciences
Proposed model creates realistic social graph using real community data, with power-law distribution and small world properties.
Jan 2, 2024

LLbezpeky: Leveraging Large Language Models for Vulnerability Detection
security
LLMs show promise in detecting Android app vulnerabilities with 91.67% accuracy, aiming to build a robust vulnerability detection system.
Jan 2, 2024

Spiker+: a framework for the generation of efficient Spiking Neural Networks FPGA accelerators for inference at the edge
robustness
Spiker+ is a customizable framework for generating efficient Spiking Neural Networks accelerators on FPGA for edge computing, achieving competitive performance and low…
Jan 2, 2024

Generative AI is already widespread in the public sector
social-sciences
Generative AI is transforming the public sector, with widespread use and positive opinions, but clear guidelines are lacking.
Jan 2, 2024

Joint Offloading and Resource Allocation for Hybrid Cloud and Edge Computing in SAGINs: A Decision Assisted Hybrid Action Space Deep Reinforcement Learning Approach
architectures
Research on space-air-ground integrated networks (SAGINs) using deep reinforcement learning to optimize offloading and resource allocation in cloud and edge computing…
Jan 2, 2024

VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
prompt-engineering
VideoDrafter uses language models to create consistent multi-scene videos, outperforming existing models in quality and consistency.
Jan 2, 2024

Deplatforming Norm-Violating Influencers on Social Media Reduces Overall Online Attention Toward Them
social-sciences
hci
Online deplatforming reduces attention toward influencers. The study addresses prior limitations, measures this impact, and contributes to content moderation research.
Jan 2, 2024

Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation
social-sciences
Novel framework for personalized face generation with sophisticated expression control and identity retention.
Jan 2, 2024

A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models
prompt-engineering
security
Novel evaluation framework measures application resilience to prompt injection attacks, showing newer models are more resilient.
Jan 2, 2024

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
social-sciences
hci
robustness
LLMs have a hallucination issue that hinders real-world deployment. This survey presents 32 mitigation techniques.
Jan 2, 2024

LLaMA Beyond English: An Empirical Study on Language Capability Transfer
social-sciences
Transfer English LLM capabilities to non-English languages with minimal pretraining data, achieving comparable performance.
Jan 2, 2024

FedQV: Leveraging Quadratic Voting in Federated Learning
security
Federated Learning improved with FedQV, an election-based aggregation algorithm, offers better resistance to poisoning attacks and privacy breaches.
Jan 2, 2024

TREC iKAT 2023: The Interactive Knowledge Assistance Track Overview
hci
TREC iKAT focuses on creating adaptive conversational search agents for personalized information seeking and decision-making tasks.
Jan 2, 2024

Unifying Structured Data as Graph for Data-to-Text Pre-Training
production
Data-to-text (D2T) generation enhanced by graph-based pre-training shows effective performance on various structured data.
Jan 2, 2024

Beam-Based Multiple Access for IRS-Aided Millimeter-Wave and Terahertz Communications
architectures
Paper proposes beam-based multiple-access strategy using intelligent reflecting surface for IRS-aided mmWave and THz communications. Increases system capacity significantly.
Jan 2, 2024

A Comprehensive Study of Knowledge Editing for Large Language Models
production
LLMs face computational demands for ongoing updates. Research examines editing approaches for efficient model modifications and proposes a categorization criterion.
Jan 2, 2024

IdentiFace : A VGG Based Multimodal Facial Biometric System
social-sciences
IdentiFace is a multimodal facial biometric system with high accuracy in recognition tasks.
Jan 2, 2024

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
architectures
LLMs can handle long contexts without fine-tuning. Self-Extend extends their context window effortlessly.
Jan 2, 2024

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
architectures
TL;DR: The Self-Play fIne-tuNing (SPIN) method improves language models using self-generated training data, without additional human annotation.
Jan 2, 2024

JMA: a General Algorithm to Craft Nearly Optimal Targeted Adversarial Example
security
Proposes a more effective targeted attack against deep learning classifiers, capable of inducing targeted modifications in complex classification scenarios.
Jan 2, 2024

A Computational Framework for Behavioral Assessment of LLM Therapists
social-sciences
ChatGPT and other large language models are being considered as therapists, but research shows their behavior may not reflect high-quality therapy.
Jan 1, 2024

Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education
education
MLLMs like GPT-4V enhance education with multimodal learning, but careful integration is needed for ethical and effective use.
Jan 1, 2024

Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models
architectures
A survey of resource-efficient techniques for advancing large language models (LLMs).
Jan 1, 2024

Leveraging Large Language Models to Boost Dafny’s Developers Productivity
prompt-engineering
programming
Proposal to use Large Language Models to enhance Dafny developers’ productivity and adoption.
Jan 1, 2024

Digger: Detecting Copyright Content Mis-usage in Large Language Model Training
hci
Pre-training LLMs can raise copyright concerns. A new framework is introduced to detect and address copyrighted content misuse.
Jan 1, 2024

A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models
social-sciences
hci
prompt-engineering
Advancements in large language models enable breakthroughs in tasks like writing and translation, but evaluating their reasoning is challenging. LogicAsker assesses logical…
Jan 1, 2024

Benchmarking Large Language Models on Controllable Generation under Diversified Instructions
programming
CoDI-Eval evaluates large language models’ ability to follow instructions with specific constraints, revealing limitations and the need for improvement.
Jan 1, 2024

If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
prompt-engineering
programming
LLMs benefit from integrating code in training, enhancing code generation and reasoning ability for complex tasks.
Jan 1, 2024

Large Language Models aren’t all that you need
education
Compares traditional models and large language models for multilingual named entity recognition, introducing novel techniques.
Jan 1, 2024

Distillation is All You Need for Practically Using Different Pre-trained Recommendation Models
recommender
Proposed PRM-KD model efficiently utilizes diverse pre-trained recommendation models to enhance student models for real-world recommendations.
Jan 1, 2024

The Earth is Flat? Unveiling Factual Errors in Large Language Models
robustness
TL;DR: FactChecker is a new automatic testing framework that uncovers factual inaccuracies in large language models with up to 45% error detection.
Jan 1, 2024

SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models
security
Privacy concerns with large language models led to Secure Multi-Party Computing (SMPC) for Privacy-Preserving Inference. SecFormer optimizes SMPC for Transformer models…
Jan 1, 2024

Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
security
Astraios compares fine-tuning methods for large language models and finds full-parameter fine-tuning generally leads to best performance.
Jan 1, 2024

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
robustness
prompt-engineering
ToolEyes assesses large language model tool learning in authentic scenarios, uncovering limitations and guiding future research.
Jan 1, 2024

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models
social-sciences
hci
Study introduces Emotional chat Model (E-chat) for emotion-sensitive spoken dialogue, outperforming baseline models.
Dec 31, 2023

Opening A Pandora’s Box: Things You Should Know in the Era of Custom GPTs
security
Custom GPTs pose security threats, with 26 potential attack vectors identified. Urgent need for robust security measures.
Dec 31, 2023

Viz: A QLoRA-based Copyright Marketplace for Legally Compliant Generative AI
production
Viz system integrates QLoRA to fine-tune large language models legally and efficiently, addressing AI challenges.
Dec 31, 2023

KernelGPT: Enhanced Kernel Fuzzing via Large Language Models
programming
KernelGPT automates syscall specification generation for enhanced kernel fuzzing, improving coverage and finding new bugs.
Dec 31, 2023

DocLLM: A layout-aware generative language model for multimodal document understanding
hci
DocLLM is a model for reasoning over visual documents using text and layout information, outperforming existing models.
Dec 31, 2023

keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM
education
hci
LLMs struggle with knowledge gaps. Keqing assists by retrieving relevant info and guiding logical answering paths.
Dec 31, 2023

GeoGalactica: A Scientific Large Language Model in Geoscience
education
LLMs show potential in AI for science. GeoGalactica is a large language model tailored for geoscience.
Dec 31, 2023

State of What Art? A Call for Multi-Prompt LLM Evaluation
robustness
prompt-engineering
Study analyzes how large language models are evaluated and finds single-prompt results unreliable, recommending evaluation over diverse prompts for more reliable assessments.
Dec 31, 2023

An Analysis of Embedding Layers and Similarity Scores using Siamese Neural Networks
programming
TL;DR: Large language models rely on word embeddings; this research compares the accuracy and environmental impact of different embedding layers and similarity scores.
Dec 31, 2023

RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models
prompt-engineering
RAGTruth is a corpus for analyzing hallucinations in retrieval-augmented large language models, helping measure and prevent claims unsupported by the retrieved content.
Dec 31, 2023

Exploring the Effectiveness of Instruction Tuning in Biomedical Language Processing
architectures
Large language models (LLMs) like ChatGPT impact NLP, but struggle with biomedical tasks. Study proposes instruction tuning for biomedical language processing.
Dec 31, 2023

Fairness in Serving Large Language Models
architectures
New scheduling algorithm VTC ensures fair LLM serving, offering superior performance and resource utilization.
Dec 31, 2023

LaFFi: Leveraging Hybrid Natural Language Feedback for Fine-tuning Language Models
social-sciences
education
LLMs trained with LaFFi reflect on the feedback they’ll receive, improving question-answering accuracy. Experiments show the potential of natural language feedback.
Dec 31, 2023

BatchEval: Towards Human-like Text Evaluation
robustness
prompt-engineering
BatchEval improves LLM-based text evaluation by addressing sensitivity to prompt design, poor noise resistance, and weak ensemble performance, achieving 10.5% higher correlations at reduced API cost.
Dec 31, 2023

Boosting Large Language Model for Speech Synthesis: An Empirical Study
hci
production
Combining the LLMs LLaMA/OPT with the VALL-E speech synthesis model, the study finds that directly fine-tuning LLMs or using superposed layers has limitations. Coupled LLMs and VALL-E…
Dec 30, 2023

Open-TI: Open Traffic Intelligence with Augmented Language Model
hci
Intelligent transportation benefits cities, but complex algorithms pose challenges. Open-TI aims to bridge industry-academic gap with advanced traffic analysis.
Dec 30, 2023

Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
architectures
production
TL;DR: Reinforcement learning from human feedback (RLHF) can lead to overoptimization, but uncertainty-penalized RLHF (UP-RLHF) mitigates this issue effectively.
Dec 30, 2023

Is Knowledge All Large Language Models Needed for Causal Reasoning?
hci
Paper explores the causal reasoning of large language models, finding that it depends on contextual information and domain-specific knowledge.
Dec 30, 2023

Unicron: Economizing Self-Healing LLM Training at Scale
architectures
Unicron is a self-healing workload manager for large-scale language model training, reducing failure-related costs and improving efficiency.
Dec 30, 2023

The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness
security
SODE benchmark assesses LLM safety and over-defensiveness, revealing key defense strategy insights for further research.
Dec 30, 2023

Teach Large Language Models to Forget Privacy
prompt-engineering
Prompt2Forget tackles privacy risks in large language models, achieving 90% forgetfulness without utility loss.
Dec 30, 2023

Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
security
robustness
Study evaluates prompting techniques for LLMs on math tasks. Findings show models struggle with elementary calculations and reasoning even with red teaming.
Dec 30, 2023

The Problem of Alignment
social-sciences
hci
prompt-engineering
Language models need alignment with human values to avoid reproducing biases. This relationship shapes linguistic theories and practice.
Dec 30, 2023

Evaluation is all you need. Prompting Generative Large Language Models for Annotation Tasks in the Social Sciences. A Primer using Open Models
social-sciences
prompt-engineering
Primer on prompting open generative LLMs for social science annotation tasks, advocating for open-source models.
Dec 30, 2023

Advancing TTP Analysis: Harnessing the Power of Encoder-Only and Decoder-Only Language Models with Retrieval Augmented Generation
hci
robustness
Cybersecurity experts explore using advanced language models to interpret and summarize cyberattack methods for better understanding.
Dec 30, 2023

Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models
hci
TL;DR: Large Multimodal Models (LMMs) merge language and vision, showing great potential for image classification and zero-shot learning.
Dec 30, 2023

LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning
robustness
prompt-engineering
Hybrid planner combines rule-based and language models, outperforming existing methods in driving scenario handling.
Dec 30, 2023

Exploring the Sensitivity of LLMs’ Decision-Making Capabilities: Insights from Prompt Variation and Hyperparameters
hci
Study examines language models’ decision making under varying prompts and hyperparameters, showing a human-like exploration-exploitation tradeoff.
Dec 29, 2023

ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education
education
ChatGPT and similar language models have potential in education but face challenges with accuracy. New architecture offers enhanced student support.
Dec 29, 2023

The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model
hci
prompt-engineering
programming
LLMs effectively repair code review defects, achieving a 72.97% repair rate and improving the practicality of automatic repair.
Dec 29, 2023

EHR Interaction Between Patients and AI: NoteAid EHR Interaction
education
Introduces the NoteAid EHR Interaction Pipeline, which uses LLMs for patient education from EHRs, evaluated on curated datasets.
Dec 29, 2023

Olapa-MCoT: Enhancing the Chinese Mathematical Reasoning Capability of LLMs
education
CoT method improved for LLMs. Olapa-MCoT, based on llama2-13B, enhanced Chinese math reasoning by 36%. English reasoning also improved.
Dec 29, 2023

Jatmo: Prompt Injection Defense by Task-Specific Finetuning
hci
programming
security
Jatmo creates task-specific models resilient to prompt-injection attacks for LLMs.
Dec 29, 2023

Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game
hci
robustness
LLMs show promise in ad hoc teamwork but may suffer from communication issues. CodeAct aims to address this with enhanced memory and code-driven reasoning.
Dec 29, 2023

K-PERM: Personalized Response Generation Using Dynamic Knowledge Retrieval and Persona-Adaptive Queries
social-sciences
hci
Personalizing conversational agents with external knowledge improves user engagement and quality of conversations. K-PERM achieves state-of-the-art performance.
Dec 29, 2023

Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning
hci
LLM fine-tuning raises privacy concerns. DP-LoRA, a federated learning algorithm, addresses privacy and communication overhead challenges effectively.
Dec 29, 2023

DB-GPT: Empowering Database Interactions with Private Large Language Models
programming
DB-GPT integrates large language models with databases for natural language queries and secure data interaction.
Dec 29, 2023

Overview of the PromptCBLUE Shared Task in CHIP2023
prompt-engineering
Overview of the PromptCBLUE shared task at the CHIP-2023 Conference, featuring reformulated benchmarks for testing Chinese language models in medical domains.
Dec 29, 2023

SMoT: Think in State Machine
prompt-engineering
New approach uses State Machine of Thought (SMoT) and expert knowledge to improve language model reasoning accuracy.
Dec 29, 2023

Video Understanding with Large Language Models: A Survey
architectures
Survey explores advancements in video understanding using Large Language Models (Vid-LLMs), highlighting capabilities and applications.
Dec 29, 2023

Action-Item-Driven Summarization of Long Meeting Transcripts
prompt-engineering
Novel algorithm generates abstractive meeting summaries driven by action items, using sectional summaries and a topic-based division method, and improves BERTScore.
Dec 29, 2023

How Far Are We from Believable AI Agents? A Framework for Evaluating the Believability of Human Behavior Simulation
hci
AI agent believability relies on user trust. Large Language Model agents face challenges, so new metrics are introduced.
Dec 28, 2023

An Adaptive Framework of Geographical Group-Specific Network on O2O Recommendation
recommender
User and service spatiotemporal info requires personalized models. GeoGrouse improves group-specific recommendation by studying user preferences.
Dec 28, 2023

GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension
programming
LLMs struggle with varied tasks, but GitAgent integrates GitHub tools to improve task performance with 69.4% success.
Dec 28, 2023

Do Androids Know They’re Only Dreaming of Electric Sheep?
robustness
Probes trained on language model representations detect hallucination behavior across tasks, but force-decoded states are not valid for organic hallucination detection.…
Dec 28, 2023

Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos
prompt-engineering
TL;DR: Proposed Grounding-Prompter method improves temporal grounding in long videos using multimodal information, enhancing state-of-the-art performance.
Dec 28, 2023

Factoring Expertise, Workload, and Turnover into Code Review Recommendation
recommender
Code review recommendation can distribute knowledge and mitigate turnover, reducing workload concentration and files at risk.
Dec 28, 2023

Timeliness: A New Design Metric and a New Attack Surface
security
TL;DR: Age-based communication networks are vulnerable to threats like timestomping and misinformation dissemination from adversaries.
Dec 28, 2023

FENet: Focusing Enhanced Network for Lane Detection
programming
Research addresses lane detection challenges in autonomous driving by proposing targeted network enhancements, achieving improved accuracy.
Dec 28, 2023

Fast Inference of Mixture-of-Experts Language Models with Offloading
architectures
Sparse Mixture-of-Experts language models run faster with parameter offloading strategies, enabling efficient use on consumer hardware.
Dec 28, 2023

On Inapproximability of Reconfiguration Problems: PSPACE-Hardness and some Tight NP-Hardness Results
architectures
The Reconfiguration Inapproximability Hypothesis (RIH), which asserts the hardness of finding a sequence of assignments satisfying constraints, is proven and applied to reconfiguration problems.
Dec 28, 2023

When Metaverses Meet Vehicle Road Cooperation: Multi-Agent DRL-Based Stackelberg Game for Vehicular Twins Migration
architectures
TL;DR: Vehicular Metaverses use vehicle road cooperation and augmented intelligence for seamless user experience, with a proposed incentive mechanism for optimizing VT…
Dec 28, 2023

AQUALLM: Audio Question Answering Data Generation Using Large Language Models
programming
AQA dataset creation framework improves AQA models, sets superior benchmarks, and enhances generalizability. Accessible on GitHub.
Dec 28, 2023

Improving Code Reviewer Recommendation: Accuracy, Latency, Workload, and Bystanders
hci
robustness
Code review system at Meta improved through experiments, with emphasis on author-reviewer familiarity and balancing workloads. Bystander effect mitigated.
Dec 28, 2023

Generative AI for Math: Part I – MathPile: A Billion-Token-Scale Pretraining Corpus for Math
programming
Introducing MathPile, a high-quality math-centric corpus that prioritizes data quality over quantity for language model pre-training.
Dec 28, 2023

Fully Sparse 3D Panoptic Occupancy Prediction
programming
New method SparseOcc improves autonomous driving occupancy prediction with efficient sparse representation and instance differentiation, achieving high accuracy and…
Dec 28, 2023

Replica Tree-based Federated Learning using Limited Data
architectures
Proposed RepTreeFL framework enables effective federated learning with limited data and clients, outperforming in various tasks.
Dec 28, 2023

Scalable and automated Evaluation of Blue Team cyber posture in Cyber Ranges
security
Cyber ranges are vital for secure training. New automation proposal improves exercise evaluation and assessment.
Dec 28, 2023

Learning to Generate Text in Arbitrary Writing Styles
hci
prompt-engineering
Text generation to mimic specific author styles using contrastively-trained representations and discriminative control is effective and versatile.
Dec 28, 2023

Securing NextG Systems against Poisoning Attacks on Federated Learning: A Game-Theoretic Solution
security
Study analyzes poisoning attacks in federated learning (FL) for wireless signal classification, proposing a defense mechanism against malicious clients.
Dec 28, 2023

Towards Auto-Modeling of Formal Verification for NextG Protocols: A Multimodal cross- and self-attention Large Language Model Approach
programming
AVRE is a novel system for formal verification of Next Generation protocols, using Large Language Models to improve accuracy and scalability.
Dec 28, 2023

Structured Packing in LLM Training Improves Long Context Utilization
programming
Advances in language models are limited by context utilization. SPLiCe enhances model performance using related documents.
Dec 28, 2023

Knowledge Distillation of LLM for Education
education
Method distills knowledge of large models for efficient deployment on resource-constrained devices, improving accuracy while reducing model size.
Dec 26, 2023

Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
prompt-engineering
26 principles for efficient queries and prompts for large language models, verified on various models, to aid researchers.
Dec 26, 2023

Improved decoding of expander codes: fundamental trade-off between expansion ratio and minimum distance of inner code
programming
Tanner codes and expander codes use bipartite graphs. The paper shows conditions for decoding expander codes efficiently.
Dec 26, 2023

Task Contamination: Language Models May Not Be Few-Shot Anymore
prompt-engineering
Large language models (LLMs) perform better on older datasets, suggesting task contamination affects zero-shot and few-shot tasks.
Dec 26, 2023

On the Trajectories of SGD Without Replacement
production
Stochastic Gradient Descent without replacement implicitly regularizes and optimizes differently than other methods, leading to faster escape from saddles and sparser…
Dec 26, 2023

Supervised Knowledge Makes Large Language Models Better In-context Learners
prompt-engineering
LLMs’ in-context learning is enhanced by task-specific fine-tuned language models, improving generalizability and factuality.
Dec 26, 2023

Semantic Importance-Aware Based for Multi-User Communication Over MIMO Fading Channels
production
Novel SIA-SC system boosts semantic performance in multi-user MIMO scenarios, with a new metric to measure performance.
Dec 26, 2023

RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation
prompt-engineering
LLMs used in recommendation systems lack integration of multiple ranking tasks, so RecRanker was developed to address this and improve model performance.
Dec 26, 2023

A Comprehensive Survey of Evaluation Techniques for Recommendation Systems
recommender
This paper introduces a comprehensive suite of metrics to evaluate recommendation systems’ performance and their impact on business success.
Dec 26, 2023

AutoTask: Executing Arbitrary Voice Commands by Exploring and Learning from Mobile GUI
architectures
AutoTask is a voice command interface that automates any mobile app task without prior knowledge or configuration.
Dec 26, 2023

Critical nonlinear aspects of hopping transport for reconfigurable logic in disordered dopant networks
robustness
Nonlinear hopping transport enables logic gates in disordered devices, analyzed through simulations and compared to experimental data.
Dec 26, 2023

Achieving Fairness in DareFightingICE Agents Evaluation Through a Delay Mechanism
architectures
Delay mechanism mitigates gRPC latency impact on agents in DareFightingICE, balancing performance between Java and Python.
Dec 26, 2023

Inter-X: Towards Versatile Human-Human Interaction Analysis
hci
Largest human-human interaction dataset with accurate body movements, hand gestures, and textual descriptions for research.
Dec 26, 2023

From Text to Multimodal: A Comprehensive Survey of Adversarial Example Generation in Question Answering Systems
security
Critical review of adversarial example-generation techniques in Question Answering systems, including multimodal contexts.
Dec 26, 2023

Social-Transmotion: Promptable Human Trajectory Prediction
hci
Social-Transmotion model uses transformers to improve human trajectory prediction by leveraging non-verbal social cues.
Dec 26, 2023

Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models
production
Efficient zero-shot listwise reranking with LiT5-Distill and LiT5-Score challenge large-scale models. Competitive results with smaller models. Code available.
Dec 26, 2023

A Prompt Learning Framework for Source Code Summarization
prompt-engineering
programming
PromptCS improves code summarization using continuous prompts for LLMs, outperforming other schemes with faster training and better summaries.
Dec 26, 2023

The Media Bias Taxonomy: A Systematic Literature Review on the Forms and Automated Detection of Media Bias
hci
Media bias impacts public opinion. This article reviews research on detecting bias and introduces the Media Bias Taxonomy. Transformer-based approaches show promise, but…
Dec 26, 2023

Can ChatGPT Read Who You Are?
hci
AI and psychology intersect to assess personality traits using ChatGPT. It shows competitive performance with a positive bias.
Dec 26, 2023

One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
architectures
TL;DR: New erasing framework for text-to-image models prevents undesired behaviors, offers flexible and efficient concept elimination.
Dec 26, 2023

Aligning Large Language Models with Human Preferences through Representation Engineering
architectures
Aligning large language models with human preferences is crucial. Representation Alignment from Human Feedback (RAHF) effectively manipulates model representations to align…
Dec 26, 2023

Ensemble Learning to Assess Dynamics of Affective Experience Ratings and Physiological Change
hci
Using advanced technology and open science to address the relationship between emotions, physiology, and data analysis in the EPiC challenge.
Dec 26, 2023

Large Language Models are Not Stable Recommender Systems
recommender
LLMs’ positional bias hinders recommendation stability. Researchers propose STELLA, a Bayesian framework, to mitigate bias and improve recommendation performance in LLMs.
Dec 25, 2023

Unlocking the Potential of Large Language Models for Explainable Recommendations
recommender
TL;DR: The study proposes LLMXRec, a framework using large language models for better explanations in recommendation systems.
Dec 25, 2023

Alleviating Hallucinations of Large Language Models through Induced Hallucinations
robustness
TL;DR: New method Induce-then-Contrast Decoding reduces inaccuracies in large language models by penalizing induced hallucinations in their responses.
Dec 25, 2023

The Persuasive Power of Large Language Models
hci
Large Language Models could generate effective arguments, shaping public opinion in online discourse. Synthetic social systems mimic human opinion dynamics.
Dec 24, 2023

Evolving Large Language Model Assistant with Long-Term Conditional Memory
robustness
AI assistants like ChatGPT with long-term memory improve responses using past dialogue, tested on different datasets.
Dec 22, 2023

Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs
hci
prompt-engineering
Large Language Models have potential for recommendation explanations, but existing models struggle. Logic-Scaffolding offers a solution.
Dec 22, 2023

Context-aware Decoding Reduces Hallucination in Query-focused Summarization
robustness
Query-focused summarization (QFS) uses Context-aware Decoding (CAD) to improve generation quality for QFS tasks.
Dec 21, 2023

Geometric Awareness in Neural Fields for 3D Human Registration
robustness
TL;DR: New neural field model (LoVD) and self-supervised task (INT) improve 3D human body alignment, outperforming existing methods.
Dec 21, 2023

Neural Contextual Bandits for Personalized Recommendation
recommender
Tutorial on contextual bandits for personalized recommendations, exploring challenges, advanced algorithms, and future prospects in online businesses.
Dec 21, 2023

ChatGPT as a commenter to the news: can LLMs generate human-like opinions?
programming
GPT-3.5 can’t generate human-like Dutch news comments, even with various prompting techniques and personas.
Dec 21, 2023

Open-Set: ID Card Presentation Attack Detection using Neural Transfer Style
security
Study explores using GANs to improve ID card Presentation Attack detection, showing effectiveness in training fraud detection systems.
Dec 21, 2023

Rényi Pufferfish Privacy: General Additive Noise Mechanisms and Privacy Amplification by Iteration
production
The flexible privacy framework Pufferfish faces challenges in maintaining utility. A variant using Rényi divergence improves applicability and utility.
Dec 21, 2023

Designing Artificial Intelligence Equipped Social Decentralized Autonomous Organizations for Tackling Sextortion Cases Version 0.7
hci
Paper explores sextortion, examines the lack of coordination in victim support, and proposes AI- and blockchain-based solutions.
Dec 21, 2023

A Novel Approach for Rapid Development Based on ChatGPT and Prompt Engineering
prompt-engineering
programming
ChatGPT improves code generation with a web-based platform, showing significant performance improvements.
Dec 21, 2023

Prometheus: Infrastructure Security Posture Analysis with AI-generated Attack Graphs
security
TL;DR: Cybersecurity breaches demand a holistic security solution. Prometheus system assesses vulnerabilities and attack paths comprehensively.
Dec 20, 2023

A Novel Approach for Rapid Development Based on ChatGPT and Prompt Engineering
prompt-engineering
programming
ChatGPT powers a code generation platform, improving performance, with validation in real-world scenarios.
Dec 20, 2023

SoK: A Broad Comparative Evaluation of Software Debloating Tools
robustness
Debloating tools lack maturity, struggle to produce sound programs, and don’t significantly improve performance or security.
Dec 20, 2023

Contextual Code Switching for Machine Translation using Language Models
programming
Large language models (LLMs) excel in various tasks, but smaller models outperform in machine translation.
Dec 20, 2023

RIShield: Enabling Electromagnetic Blackout in Radiation-Sensitive Environments
architectures
RIShield uses RIS technology to block radiation leakage in sensitive environments.
Dec 20, 2023

Scaling Compute Is Not All You Need for Adversarial Robustness
security
Adversarially robust deep learning has progressed, but scaling model size and computing power alone has limitations. A benchmarking framework is available.
Dec 20, 2023

Quick Order Fairness: Implementation and Evaluation
security
Decentralized finance tackles trust issues using blockchain but faces front-running vulnerabilities. QOF protocol mitigates attacks but adds complexity.
Dec 20, 2023

Android dialogue system for customer service using prompt-based topic control and compliments generation
hci
prompt-engineering
A chatbot system for trip planning uses AI to control conversation topics and generate personalized compliments, showing effectiveness in a preliminary evaluation.
Dec 20, 2023

StableKD: Breaking Inter-block Optimization Entanglement for Stable Knowledge Distillation
architectures
Knowledge distillation (KD) struggles with accuracy and slow distillation. StableKD breaks inter-block optimization entanglement (IBOE), boosting accuracy and speeding convergence.
Dec 20, 2023

Automated DevOps Pipeline Generation for Code Repositories using Large Language Models
programming
TL;DR: GPT-3.5 and GPT-4 improve GitHub Actions workflows, with GPT-4 showing better DevOps awareness.
Dec 20, 2023

When Memory Mappings Attack: On the (Mis)use of the ARM Cortex-M FPB Unit
robustness
security
Low-cost microcontrollers in IoT devices are vulnerable to security attacks, despite protection mechanisms.
Dec 20, 2023

Enhancing Neural Training via a Correlated Dynamics Model
architectures
TL;DR: Correlation Mode Decomposition clusters parameters to represent training dynamics efficiently, improving generalization and training efficiency.
Dec 20, 2023

dIR – Discrete Information Retrieval: Conversational Search over Unstructured (and Structured) Data with Large Language Models
prompt-engineering
dIR enables querying of both free text and structured knowledge for complex queries.
Dec 20, 2023

LRS: Enhancing Adversarial Transferability through Lipschitz Regularized Surrogate
security
TL;DR: The paper proposes Lipschitz Regularized Surrogate for improving transfer-based black-box attacks using transformed surrogate models.
Dec 20, 2023

On Inference Stability for Diffusion Models
production
TL;DR: Denoising Probabilistic Models (DPMs) improve image generation with a new sequence-aware loss, yielding better results than traditional methods.
Dec 19, 2023

Terrapin Attack: Breaking SSH Channel Integrity By Sequence Number Manipulation
security
SSH protocol vulnerabilities allow attackers to break channel integrity and downgrade security measures, affecting millions of servers.
Dec 19, 2023

Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
security
Open-source LLMs remain vulnerable to priming attacks despite safety training; these attacks effectively bypass alignment and increase attack success rates.
Dec 19, 2023

FedDiv: Collaborative Noise Filtering for Federated Learning with Noisy Labels
production
Federated learning with noisy labels (F-LNL) aims for an optimal server model via collaborative learning; FedDiv introduces a global noise filter for stability and performance.
Dec 19, 2023

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models
prompt-engineering
Diffusion models require engineered prompts for faithful image synthesis. This work focuses on inverting the model for interpretable language prompts, using a delayed…
Dec 19, 2023

Towards Automatic Support of Software Model Evolution with Large Language Models
programming
Large language models support software model evolution, showing promise for future research in this area.
Dec 19, 2023

TeamCAD – A Multimodal Interface for Remote Computer Aided Design
education
TL;DR: TeamCAD improves remote design collaboration with voice and gesture recognition for better user experience.
Dec 19, 2023

Toward enriched Cognitive Learning with XAI
prompt-engineering
AI-supported system CL-XAI enhances cognitive learning with explainable AI tools, benefiting human learners and addressing knowledge deficiencies.
Dec 19, 2023

Web 3.0 and a Decentralized Approach to Education
education
Current centralized education system outdated; decentralized approach eliminates discrepancies, integrates Decentralized Identity with Web 3.0.
Dec 19, 2023

Localization and Discrete Beamforming with a Large Reconfigurable Intelligent Surface
production
TL;DR: Proposed scalable protocol and algorithms address issues in near-field RIS beamforming for improved localization in mmWave cellular systems.
Dec 19, 2023

On-Device Recommender Systems: A Tutorial on The New-Generation Recommendation Paradigm
recommender
TL;DR: On-device recommender systems (ODRSs) are emerging to address challenges of traditional cloud-based systems in e-commerce applications, offering lightweight…
Dec 18, 2023

A novel diffusion recommendation algorithm based on multi-scale cnn and residual lstm
recommender
Sequential recommendation enhances user prediction with a novel diffusion recommendation algorithm named AREAL, achieving significant improvements in experiments.
Dec 18, 2023

Code Ownership in Open-Source AI Software Security
security
Novel code ownership metrics correlate with security in AI open-source projects, aiding project evaluation and benchmarking.
Dec 18, 2023

Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
education
Knowledge distillation improves image synthesis by blending student and teacher models for better quality samples.
Dec 17, 2023

Addressing Sample Inefficiency in Multi-View Representation Learning
recommender
Non-contrastive self-supervised learning (NC-SSL) insights improve representation learning efficiency and performance in computer vision.
Dec 17, 2023

A Mutation-Based Method for Multi-Modal Jailbreaking Attack Detection
security
JailGuard detects jailbreak attacks on large language models with 89.38% accuracy for image inputs and 85.42% for text, outperforming existing methods.
Dec 17, 2023

HE-DKSAP: Privacy-Preserving Stealth Address Protocol via Additively Homomorphic Encryption
security
Blockchain transactions face privacy concerns. Stealth addresses mitigate these, but have vulnerabilities. HE-DKSAP offers a secure, scalable privacy solution.
Dec 17, 2023

Re-parameterized Low-rank Prompt: Generalize a Vision-Language Model within 0.5K Parameters
prompt-engineering
Vision-language model adaptation is enhanced through RLP prompts, reducing parameters and storage, achieving superior results.
Dec 17, 2023

AI Gender Bias, Disparities, and Fairness: Does Training Data Matter?
education
Study examines gender biases in AI scoring of student responses. Mixed-trained models show no significant scoring bias but may widen gender disparities.
Dec 17, 2023

Latent Space Editing in Transformer-Based Flow Matching
prompt-engineering
TL;DR: The paper introduces a new image editing method using Flow Matching and a transformer backbone for scalable and high-quality generative modeling.
Dec 17, 2023

A Unified Framework for Multi-Domain CTR Prediction via Large Language Models
recommender
Uni-CTR is a new approach to multi-domain click-through rate (MDCTR) prediction, leveraging a Large Language Model (LLM) and domain-specific networks for better performance…
Dec 17, 2023

Understanding the Instruction Mixture for Large Language Model
education
programming
Exploring the impact of different instruction types on large language models’ performance reveals the need for careful instruction design.
Dec 17, 2023

DePRL: Achieving Linear Convergence Speedup in Personalized Decentralized Learning with Shared Representations
production
DePRL is a new personalized decentralized learning algorithm that improves convergence speed and performance in heterogeneous data environments.
Dec 17, 2023

Mixed Distillation Helps Smaller Language Model Better Reasoning
production
Smaller models gain LLM capabilities through Mixed Distillation, outperforming LLMs in reasoning accuracy.
Dec 17, 2023

Language-conditioned Learning for Robotic Manipulation: A Survey
education
TL;DR: Survey of language-conditioned robotic manipulation, analyzing recent advancements and future research directions.
Dec 17, 2023

Revealing Networks: Understanding Effective Teacher Practices in AI-Supported Classrooms using Transmodal Ordered Network Analysis
prompt-engineering
education
Using AI and quantitative ethnography, the study uncovers effective teacher practices in classrooms using AI tutors.
Dec 17, 2023

kNN-ICL: Compositional Task-Oriented Parsing Generalization with Nearest Neighbor In-Context Learning
programming
LLMs improve semantic parsing tasks without needing extra data or specialized prompts, achieving comparable performance to supervised models.
Dec 17, 2023

The Earth is Flat because…: Investigating LLMs’ Belief towards Misinformation via Persuasive Conversation
education
LLMs are vulnerable to persuasive misinformation; their beliefs can change over multi-turn conversations.
Dec 14, 2023

CMOSE: Comprehensive Multi-Modality Online Student Engagement Dataset with High-Quality Labels
education
TL;DR: Engagement recognition in online learning can be improved with CMOSE dataset and MocoRank training mechanism.
Dec 14, 2023

TinyGSM: achieving >80% on GSM8k with small language models
education
Small-scale models can solve grade school math with high accuracy using high-quality datasets and verifiers.
Dec 14, 2023

On the Difficulty of Defending Contrastive Learning against Backdoor Attacks
security
Contrastive backdoor attacks differ from supervised ones, requiring tailored defenses due to distinct learning mechanisms.
Dec 14, 2023

Coevolutionary Algorithm for Building Robust Decision Trees under Minimax Regret
security
The novel CoEvoRDT algorithm creates robust decision trees and outperforms state-of-the-art methods in handling adversarial attacks.
Dec 14, 2023

Fast Sampling via De-randomization for Discrete Diffusion Models
production
A novel de-randomized diffusion process accelerates discrete diffusion models for faster, high-quality data generation.
Dec 14, 2023

Evaluating Augmented Reality Communication: How Can We Teach Procedural Skill in AR?
education
The study analyzes AR for remote medical training, comparing AR and video communication for teaching a CVC procedure.
Dec 14, 2023

Towards Trustworthy AI Software Development Assistance
programming
A new architecture aims to improve AI software development assistants’ reliability and code quality. It includes a foundational LLM and a knowledge graph.
Dec 14, 2023

Towards Verifiable Text Generation with Evolving Memory and Self-Reflection
robustness
Large Language Models (LLMs) face challenges in accuracy and verification. An innovative approach, VTG, uses memory and retrieval to improve text generation.
Dec 14, 2023

Venn: Resource Management Across Federated Learning Jobs
production
TL;DR: Venn is an FL resource manager that efficiently schedules devices among FL jobs, improving job completion time.
Dec 13, 2023

Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4
robustness
Dynamic analysis with GPT-4 creates explanatory text for API calls to improve malware detection. The approach outperforms TextCNN and generalizes well.
Dec 13, 2023

Prompting LLMs with content plans to enhance the summarization of scientific articles
prompt-engineering
Novel prompting techniques improve scientific article summarization by providing contextual information, showing performance gains for smaller models.
Dec 13, 2023

Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models
education
BD-LLM improves toxic content detection accuracy by using Decision-Tree-of-Thought prompting and student LMs.
Dec 13, 2023

A Precoding for ORIS-Assisted MIMO Multi-User VLC System
production
Multi-user VLC system improves SINR with ORIS and optimized precoding matrices, outperforming ZF and MMSE algorithms.
Dec 13, 2023

Enhancing Robot Program Synthesis Through Environmental Context
robustness
Recent work on program synthesis uses deep neural networks and language models to generate programs, addressing challenges with partially observed environments.
Dec 13, 2023

GuardRails: Automated Suggestions for Clarifying Ambiguous Purpose Statements
prompt-engineering
programming
Purpose statements for functions may be ambiguous; a heuristic is proposed to suggest clarifications using language models.
Dec 13, 2023

ReRoGCRL: Representation-based Robustness in Goal-Conditioned Reinforcement Learning
security
Proposes new attack and defense mechanisms for robustness in GCRL and validates their superior performance. A tool is available.
Dec 12, 2023

Expand-and-Quantize: Unsupervised Semantic Segmentation Using High-Dimensional Space and Product Quantization
production
TL;DR: EQUSS improves unsupervised semantic segmentation with high-dimensional clustering and information compression for better results.
Dec 12, 2023

ICL Markup: Structuring In-Context Learning using Soft-Token Tags
programming
TL;DR: Soft-token tags simplify model adaptation for various tasks, improving LLM performance in enterprise applications.
Dec 12, 2023

LLMEval: A Preliminary Study on How to Evaluate Large Language Models
education
This paper examines Large Language Model (LLM) evaluation methods, proposes a new dataset, and provides insights.
Dec 12, 2023

Eroding Trust In Aerial Imagery: Comprehensive Analysis and Evaluation Of Adversarial Attacks In Geospatial Systems
security
Adversarial attacks threaten aerial imagery integrity, requiring urgent analysis and mitigation.
Dec 12, 2023

Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales
prompt-engineering
Proposes a diagnosis framework using prompt-based learning for clinical reasoning in disease diagnosis, evaluating machine-generated rationales for real-world clinical…
Dec 12, 2023

Can ChatGPT Play the Role of a Teaching Assistant in an Introductory Programming Course?
programming
education
Study evaluates ChatGPT as a virtual TA for programming course. Compares its performance with human TAs in solving assignments, grading, and providing feedback.
Dec 12, 2023

Integrating micro-learning content in traditional e-learning platforms
education
TL;DR: This article explores micro-learning as a solution for corporate training, proposing to integrate it into traditional learning systems.
Dec 11, 2023

Performance-lossless Black-box Model Watermarking
robustness
Proposes a watermarking protocol that protects model IP using a branch backdoor-based method, verified on a language generation task.
Dec 11, 2023

Sparse but Strong: Crafting Adversarially Robust Graph Lottery Tickets
security
Graph Lottery Tickets (GLTs) reduce latency and footprint, but are vulnerable to structure attacks. A framework called ARGS enhances robustness.
Dec 11, 2023

Mean estimation in the add-remove model of differential privacy
production
A new algorithm for mean estimation under the add-remove model of differential privacy achieves error similar to that under the swap model. A factor-of-two improvement is demonstrated.
Dec 11, 2023

LLM Interactive Optimization of Open Source Python Libraries – Case Studies and Generalization
hci
programming
LLMs like ChatGPT-4 can optimize energy and compute efficiency in Python libraries with human input.
Dec 8, 2023

Coordination-free Decentralised Federated Learning on Complex Networks: Overcoming Heterogeneity
production
Decentralised Federated Learning (DFL) copes with edge computing challenges, enabling devices to train accurate models using a communication-efficient algorithm.
Dec 7, 2023

On the Impact of Multi-dimensional Local Differential Privacy on Fairness
production
Automated decision systems raise ethical concerns; multi-dimensional LDP can reduce disparity and maintain fairness.
Dec 7, 2023

Large Language Models for Mathematicians
programming
education
ChatGPT and similar models can aid professional mathematicians by improving work speed and quality.
Dec 7, 2023

LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
programming
LaMPilot framework for autonomous driving uses code-generation to interpret user instructions effectively. GPT-4 achieved 92.7% task completion.
Dec 7, 2023

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
programming
Code-writing aids language models in Chain of Thought reasoning, improving linguistic and logical tasks. Chain of Code outperforms Chain of Thought.
Dec 7, 2023

EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
production
3D-GS accelerates scene synthesis, uses few Gaussians with quantized representations, reduces memory, and speeds up training and rendering.
Dec 7, 2023

MOCHa: Multi-Objective Reinforcement Mitigating Caption Hallucinations
robustness
Proposes MOCHa, a reinforcement learning approach to reduce hallucinations in image captioning, and demonstrates its superior performance.
Dec 6, 2023

Mitigating Data Injection Attacks on Federated Learning
security
A novel method detects and mitigates data injection attacks in federated learning, ensuring model accuracy and data privacy.
Dec 4, 2023