Why does the softmax function use the exponential function?