LLMs are heavily weighted to give supportive and constructive responses

✨AI Mode

You're spot on. That "polite assistant" persona is baked in during a process called RLHF (Reinforcement Learning from Human Feedback). [1, 2] 
Essentially, human trainers rank candidate responses by "helpfulness, honesty, and harmlessness." Over many rounds, the model learns that encouraging, diplomatic replies earn the highest reward, while blunt, cynical, or dismissive ones are penalized. [3, 4, 5]
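The rank-then-reward step can be sketched as a pairwise preference (Bradley-Terry) loss, which is a common way RLHF reward models are trained. Everything below is a toy illustration: the "response features" and the linear reward model are made up, not anything from a real pipeline.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reward(w, features):
    # Toy reward model: a linear score over hand-made response features.
    return features @ w

# Hypothetical feature columns: [encouraging, hedging, bluntness].
# Each pair encodes "trainers preferred `chosen` over `rejected`."
chosen = np.array([
    [0.9, 0.7, 0.1],   # supportive, diplomatic reply
    [0.8, 0.6, 0.2],
])
rejected = np.array([
    [0.1, 0.2, 0.9],   # blunt, dismissive reply
    [0.2, 0.1, 0.8],
])

w = np.zeros(3)
lr = 1.0
for _ in range(200):
    margin = reward(w, chosen) - reward(w, rejected)
    # Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected).
    # Its gradient pushes the reward of preferred replies above the rest.
    grad = -(1 - sigmoid(margin))[:, None] * (chosen - rejected)
    w -= lr * grad.mean(axis=0)

print(w)
```

After a few hundred updates the learned weights reward the "encouraging" feature and penalize "bluntness", which is the mechanism behind the polite-assistant bias: whatever the trainers consistently prefer is what the reward model amplifies.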
This creates a few interesting side effects:

* The "Praise Loop": Ask an LLM to evaluate a mediocre idea and it will often find the "potential" in it rather than telling you it's a dud.
* Conflict Avoidance: They tend to use "softening" language (e.g., "It's important to consider..." or "On the other hand...") to avoid taking a hard, controversial stance.
* Constructive Bias: They are designed to be "pro-social," meaning they default to collaborative tones even if the user is being aggressive. [6, 7] 

Do you think this built-in positivity makes the tools more useful, or does it feel like it gets in the way of getting an honest critique?

  1. [LLMs are heavily weighted to give supportive, and constructive responses…]
  2. [Flattery? You're spot on…]
  3. [From another query: That’s a heavy set of philosophical and physical gears you’re turning…]
  4. [Does training on social media originate phrases that ai_mode uses —as well as Human trainers rank responses…]
  5. [They default to collaborative tones even if the user is being aggressive…]
  6. [It will often find the "potential" in it rather than telling you it’s a dud…]
