The choice of activation function is most important for the output layer as it will define the format that predictions will take...
Multiclass Classification (> 2 classes): Softmax activation function, or "softmax", and one output neuron per class value, assuming a one-hot encoded output pattern.
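
To make the pairing of softmax with one-hot targets concrete, here is a minimal pure-Python sketch. It assumes a hypothetical 3-class problem, and the `scores` values are made up for illustration; the helper names `softmax` and `one_hot` are my own and are not taken from any particular library.

```python
import math

def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    # Subtract the max score first so exp() cannot overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# Hypothetical raw scores from three output neurons (one per class).
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# The predicted class is the index of the largest probability.
predicted_class = probs.index(max(probs))

# The matching training target for class 0, one-hot encoded.
target = one_hot(0, num_classes=3)
```

Each entry of `probs` lines up with one position in `target`, which is why the output layer needs one neuron per class when the labels are one-hot encoded.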