CustomTransformer

This module defines a custom T5 Encoder model that replaces ReLU activation functions with GELU.

The CustomT5EncoderModel class extends the Transformer model from the sentence_transformers library and modifies the feed-forward networks of the transformer blocks to use GELU instead of ReLU. This change can improve model performance, as GELU has been shown to work well in transformer architectures.

CustomT5EncoderModel

CustomT5EncoderModel(model_name_or_path: str, model_args: Optional[Dict] = None, max_seq_length: int = 256, do_lower_case: bool = False, dropout_rate: float = 0.2)

Bases: Transformer

Custom T5 Encoder model that replaces ReLU activation functions with GELU.

This class extends the Transformer model from the sentence_transformers library and modifies the activation functions in the feed-forward networks of the transformer blocks to use GELU instead of ReLU. GELU (Gaussian Error Linear Unit) is a smoother activation function that often performs better than ReLU in transformer architectures.
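
For intuition, here is a minimal PyTorch sketch (illustration only, not part of this module) contrasting the two activations on the same inputs:

import torch
import torch.nn as nn

x = torch.linspace(-3, 3, 7)
print(nn.ReLU()(x))  # negative inputs are clipped hard to zero
print(nn.GELU()(x))  # smooth curve; small negative values survive near zero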

ATTRIBUTES

model_name_or_path (str): Name or path of the pre-trained T5 model to load.
model_args (Optional[Dict]): Additional arguments to pass to the T5 model constructor.
max_seq_length (int): Maximum sequence length for input text. Longer sequences will be truncated.
do_lower_case (bool): Whether to convert input text to lowercase before tokenization.
dropout_rate (float): Dropout rate to apply to the feed-forward networks.

PARAMETERS

model_name_or_path (str): Name or path of the pre-trained T5 model to load.
model_args (Optional[Dict], default None): Additional arguments to pass to the T5 model constructor. Defaults to an empty dict if None.
max_seq_length (int, default 256): Maximum sequence length for input text.
do_lower_case (bool, default False): Whether to convert input text to lowercase.
dropout_rate (float, default 0.2): Dropout rate to apply to the feed-forward networks.

Source code in src/custom_transformer.py
def __init__(
    self,
    model_name_or_path: str,
    model_args: Optional[Dict] = None,
    max_seq_length: int = 256,
    do_lower_case: bool = False,
    dropout_rate: float = 0.2,
):
    """
    Initialize the CustomT5EncoderModel.

    Args:
        model_name_or_path (str): Name or path of the pre-trained T5 model to load.
        model_args (Optional[Dict]): Additional arguments to pass to the T5 model constructor.
            Defaults to an empty dict if None.
        max_seq_length (int): Maximum sequence length for input text. Default is 256.
        do_lower_case (bool): Whether to convert input text to lowercase. Default is False.
        dropout_rate (float): Dropout rate to apply to the feed-forward networks. Default is 0.2.
    """
    super().__init__(
        model_name_or_path=model_name_or_path,
        model_args=model_args if model_args is not None else {},
        max_seq_length=max_seq_length,
        do_lower_case=do_lower_case,
    )
    # Only apply the activation swap to checkpoints outside the "toobi/anime" namespace
    if not model_name_or_path.startswith("toobi/anime"):
        self.modify_activation(self.auto_model, dropout_rate)
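
Putting the pieces together, a minimal usage sketch follows. It assumes the class is importable from src/custom_transformer (the source path shown above); "t5-small" is an illustrative checkpoint, and any T5 encoder checkpoint should work the same way.

from sentence_transformers import SentenceTransformer, models

from src.custom_transformer import CustomT5EncoderModel

# Build the custom encoder and a mean-pooling head on top of it
transformer = CustomT5EncoderModel("t5-small", max_seq_length=256, dropout_rate=0.2)
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[transformer, pooling])

embeddings = model.encode(["An example sentence to embed."])
print(embeddings.shape)  # (1, hidden_size)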

modify_activation

modify_activation(model, dropout_rate)

Replace ReLU activation with GELU in all transformer blocks of the T5 encoder.

This method iterates through all transformer blocks in the encoder and replaces the ReLU activation in each feed-forward network with GELU activation.
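
For reference, the indexing used here relies on the layout of a Hugging Face T5 encoder block, which this sketch (assuming the standard transformers library) makes visible:

from transformers import T5EncoderModel

t5 = T5EncoderModel.from_pretrained("t5-small")
block = t5.encoder.block[0]
print(type(block.layer[0]).__name__)  # T5LayerSelfAttention
print(type(block.layer[1]).__name__)  # T5LayerFF, which wraps DenseReluDense and its activation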

PARAMETERS

model: The underlying T5 transformer model whose activations will be modified.
dropout_rate (float): Dropout rate to apply to the feed-forward networks.

Source code in src/custom_transformer.py
def modify_activation(self, model, dropout_rate):
    """
    Replace ReLU activation with GELU in all transformer blocks of the T5 encoder.

    This method iterates through all transformer blocks in the encoder and replaces
    the ReLU activation in each feed-forward network with GELU activation.

    Args:
        model: The underlying T5 transformer model whose activations will be modified.
        dropout_rate (float): Dropout rate to apply to the feed-forward networks.
    """
    if hasattr(model, 'encoder') and hasattr(model.encoder, 'block'):
        model.encoder.dropout = nn.Dropout(p=dropout_rate, inplace=False)
        for block in model.encoder.block:
            # Access the feed-forward network within each block
            ff = block.layer[1].DenseReluDense
            # Replace ReLU with GELU
            ff.act = nn.GELU()
            # Replace the dropout layers with new nn.Dropout instances at the given rate
            ff.dropout = nn.Dropout(p=dropout_rate, inplace=False)
            block.layer[0].dropout = nn.Dropout(p=dropout_rate, inplace=False)
            block.layer[1].dropout = nn.Dropout(p=dropout_rate, inplace=False)
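
To confirm the swap took effect, a quick check along these lines can be run after construction (reusing the transformer object from the usage sketch above):

import torch.nn as nn

t5 = transformer.auto_model  # the underlying T5 encoder
for block in t5.encoder.block:
    ff = block.layer[1].DenseReluDense
    assert isinstance(ff.act, nn.GELU)
    assert ff.dropout.p == 0.2
print("All feed-forward blocks now use GELU.")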