1. What is the main building block of a Transformer model?
2. Who introduced the Transformer architecture?
3. The original Transformer paper is titled:
4. Transformers replace RNNs primarily because:
5. What is “self-attention” in Transformers?
6. Multi-head attention allows the model to:
7. Positional encoding is needed in Transformers because:
8. The encoder in a Transformer outputs:
9. The decoder in a Transformer is primarily used for:
10. BERT is a Transformer model trained using:
11. GPT models are primarily:
12. In multi-head attention, queries, keys, and values are:
13. The Transformer model uses which activation function in feed-forward layers?
14. Layer normalization in Transformers is applied:
15. Transformer models are highly parallelizable because:
16. Vision Transformers (ViT) treat images as:
17. In Transformers, residual connections are used to:
18. The attention score in self-attention is computed using: (see the worked sketch after this list)
19. The softmax function in attention ensures:
20. Transformers have replaced RNNs in NLP mainly because:
21. Encoder-decoder Transformers are typically used for:
22. BERT uses which pretraining objective?
23. GPT models are trained using:
24. Cross-attention in the decoder allows:
25. In Transformers, layer normalization is applied:
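For reference on questions 5, 12, 18, and 19, below is a minimal NumPy sketch of scaled dot-product self-attention as defined in "Attention Is All You Need": Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. The token count, embedding size, and random projection matrices here are illustrative assumptions, not part of any answer key.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity of each query with every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: each row is non-negative and sums to 1
    return weights @ V                                 # weighted sum of value vectors

# Toy example (assumed sizes): 3 tokens, model dimension 4
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                            # token embeddings
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)                                       # (3, 4): one output vector per token
```

In multi-head attention, this computation is repeated in parallel with separate learned projections per head, and the head outputs are concatenated and projected back to the model dimension.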