Large language models output the next token based on the sequence of tokens that came before it. Each generated token is selected from a "pool" of possible tokens, rather than simply selecting the most probable token (this would lead to boring and repetitive outputs).
The following samplers specify how tokens should be added and selected from the pool. Adjusting these parameters can drastically change how your model behaves.
- Default: 0.8
- Definition: Controls the "creativity" of the model by setting how likely the model is to choose a less probable token.
- Explanation: A very low temperature setting (0 - 0.2) can lead to repetition, as the model will lean towards repeating the previous sequence ("cat cat cat cat"). A very high temperature (2.0 or higher) can lead to nonsensical outputs as the next token will be unrelated to the previous sequence ("cat tree stew"). Typical values are between 0.5 ("cat climbed a tree") and 1.2 ("cat spoke eloquently").
- Default: 0.9
- Definition: Reduces the pool to tokens with a high probability of appearing next (that is, probabilities adding up to the top P value or higher).
- Explaination: Top-p sampling selects the next token from a small group of likely options, rather than considering every possible token. It improves quality by only considering words that have a high chance of being correct. For example, if you set Top P to 0.9, then only the most probable tokens that add up to 90% of the probability mass will be kept in the pool.
- Default: 30
- Definition: The number of high probabilty tokens that are kept in the selection pool.
- Explanation: If you set Top K to 10, then only the 10 most probable tokens will be kept in the pool. If you set it to 100, then the 100 most probable tokens will be kept in the pool. This is a good way to reduce the size of the pool, but it is less dynamic than Top P.
- Default: 1.0*
- Definition: How heavily a repeated token is penalized.
- Explanation: To avoid unnecessary repetition, repeat penalty looks at past tokens to reduce the probability of selecting a token that has already occured. A value of 1.0 is inactive. Typical values range between 1.0 - 1.2. Faraday defaults repeat penalty to be inactive as it behaves poorly when mirostat is turned on.
- Default: 128
- Definition: The number of trailing tokens in the context to consider when applying the repeat penalty.
- Explanation: Adjust this to determine how far back repetition penalty looks in the context. Increasing this will make outputs more conservative.
- Default: Enabled
- Definition: Mirostat adjusts the Top K dynamically to create a pool of tokens that will lead to a specific level of perplexity, or creativity. Note that Faraday uses Mirostat 2, not Mirostat 1.
- Explanation: Mirostat replaces the static Top K value with a dynamically adjusted Top K that adjusts the size of the token pool to ensure a desired level of creativity. It works well to make model responses sound natural while maintaining creativity. Note that while you can use repetition penalty with Mirostat enabled, we have found that doing so leads the model to quickly devolve into nonsense.
- Default: 0.1
- Definition: Learning rate determines how quickly the entropy (amount of creativity) is adjusted.
- Explanation: A higher learning rate will more quickly correct the entropy of the next token. The authors of the Mirostat paper recommend 0.1, but a value between 0.05 - 0.2 can be used.
- Default: 5.0
- Definition: The desired tau, or perplexity value that mirostat will aim to achieve.
- Explanation: Increasing Mirostat Entropy will increase the perplexity target, leading to less coherent, but more creative output. Lowering the value will be more coherent, but can cause repetition and "boredom". Some sources recommend that you set the entropy based on the size of the model: 5.0 for a 13B model, 6.0 for a 7B model and 4.0 for a 70B model.