CustomTransformer
This module defines a custom T5 Encoder model that replaces ReLU activation functions with GELU.
The CustomT5EncoderModel class extends the Transformer model from the sentence_transformers library and modifies the activation functions in the feed-forward networks of the transformer blocks to use GELU instead of ReLU. This modification can help improve model performance since GELU has been shown to work well in transformer architectures.
CustomT5EncoderModel
CustomT5EncoderModel(model_name_or_path: str, model_args: Optional[Dict] = None, max_seq_length: int = 256, do_lower_case: bool = False, dropout_rate: float = 0.2)
Bases: Transformer
Custom T5 Encoder model that replaces ReLU activation functions with GELU.
This class extends the Transformer model from the sentence_transformers library and modifies the activation functions in the feed-forward networks of the transformer blocks to use GELU instead of ReLU. GELU (Gaussian Error Linear Unit) is a smoother activation function that often performs better than ReLU in transformer architectures.
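For reference, GELU weights each input by the standard normal CDF, rather than hard-thresholding at zero as ReLU does:

$$
\mathrm{GELU}(x) = x \cdot \Phi(x) \approx 0.5\,x\left(1 + \tanh\!\left[\sqrt{2/\pi}\,\bigl(x + 0.044715\,x^{3}\bigr)\right]\right)
$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution, compared with $\mathrm{ReLU}(x) = \max(0, x)$.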
ATTRIBUTE | TYPE | DESCRIPTION
---|---|---
`model_name_or_path` | `str` | Name or path of the pre-trained T5 model to load.
`model_args` | `Optional[Dict]` | Additional arguments to pass to the T5 model constructor.
`max_seq_length` | `int` | Maximum sequence length for input text. Longer sequences will be truncated.
`do_lower_case` | `bool` | Whether to convert input text to lowercase before tokenization.
`dropout_rate` | `float` | Dropout rate to apply to the feed-forward networks.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`model_name_or_path` | `str` | Name or path of the pre-trained T5 model to load.
`model_args` | `Optional[Dict]` | Additional arguments to pass to the T5 model constructor. Defaults to an empty dict if None.
`max_seq_length` | `int` | Maximum sequence length for input text. Default is 256.
`do_lower_case` | `bool` | Whether to convert input text to lowercase. Default is False.
`dropout_rate` | `float` | Dropout rate to apply to the feed-forward networks. Default is 0.2.
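A minimal usage sketch follows. The import path assumes `src/` is on the Python path, the `t5-small` checkpoint is only an illustrative value, and the pooling-layer composition is the usual sentence_transformers pattern rather than anything required by this class:

```python
from sentence_transformers import SentenceTransformer, models
from custom_transformer import CustomT5EncoderModel  # assumes src/ is importable

# Build the encoder with GELU activations and a custom dropout rate
word_embedding_model = CustomT5EncoderModel(
    "t5-small",          # illustrative checkpoint; any T5 model should work
    max_seq_length=256,
    do_lower_case=False,
    dropout_rate=0.2,
)

# Combine with a pooling layer, as is typical for sentence_transformers modules
pooling = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling])

embeddings = model.encode(["A sample sentence to embed."])
print(embeddings.shape)
```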
Source code in src/custom_transformer.py
modify_activation
Replace ReLU activation with GELU in all transformer blocks of the T5 encoder.
This method iterates through all transformer blocks in the encoder and replaces the ReLU activation in each feed-forward network with GELU activation.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`model` | | The underlying T5 transformer model whose activations will be modified.
`dropout_rate` | `float` | Dropout rate to apply to the feed-forward networks.
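The method body is not reproduced here. As a rough sketch only, the swap could look like the following, assuming the Hugging Face T5 encoder layout where each block's last sub-layer is the feed-forward layer exposing a `DenseReluDense` module with `act` and `dropout` attributes; the actual implementation in src/custom_transformer.py may differ:

```python
import torch.nn as nn

def modify_activation(model, dropout_rate: float = 0.2) -> None:
    """Illustrative sketch: swap ReLU for GELU in every encoder feed-forward block."""
    for block in model.encoder.block:
        ff = block.layer[-1].DenseReluDense    # feed-forward sub-layer of the block
        ff.act = nn.GELU()                     # replace ReLU activation with GELU
        ff.dropout = nn.Dropout(dropout_rate)  # apply the requested dropout rate
```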