CustomTransformer

This module defines a custom T5 Encoder model that replaces ReLU activation functions with GELU.

The CustomT5EncoderModel class extends the Transformer model from the sentence_transformers library and modifies the feed-forward networks of the transformer blocks to use GELU instead of ReLU. This change can improve model performance, as GELU has been shown to work well in transformer architectures.

CustomT5EncoderModel

CustomT5EncoderModel(model_name_or_path: str, model_args: Optional[Dict] = None, max_seq_length: int = 256, do_lower_case: bool = False, dropout_rate: float = 0.2)

Bases: Transformer

Custom T5 Encoder model that replaces ReLU activation functions with GELU.

This class extends the Transformer model from the sentence_transformers library and modifies the activation functions in the feed-forward networks of the transformer blocks to use GELU instead of ReLU. GELU (Gaussian Error Linear Unit) is a smoother activation function that often performs better than ReLU in transformer architectures.
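
For intuition, here is a minimal PyTorch sketch (illustration only, not part of this module) contrasting the two activations on the same inputs:

import torch
import torch.nn as nn

x = torch.linspace(-3, 3, 7)
print(nn.ReLU()(x))  # negative inputs are clipped hard to zero
print(nn.GELU()(x))  # smooth curve; small negative values survive near zero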

ATTRIBUTES

model_name_or_path (str): Name or path of the pre-trained T5 model to load.
model_args (Optional[Dict]): Additional arguments to pass to the T5 model constructor.
max_seq_length (int): Maximum sequence length for input text. Longer sequences will be truncated.
do_lower_case (bool): Whether to convert input text to lowercase before tokenization.
dropout_rate (float): Dropout rate to apply to the feed-forward networks.

PARAMETERS

model_name_or_path (str): Name or path of the pre-trained T5 model to load.
model_args (Optional[Dict], default None): Additional arguments to pass to the T5 model constructor. Defaults to an empty dict if None.
max_seq_length (int, default 256): Maximum sequence length for input text.
do_lower_case (bool, default False): Whether to convert input text to lowercase.
dropout_rate (float, default 0.2): Dropout rate to apply to the feed-forward networks.

Source code in src/custom_transformer.py
def __init__(
    self,
    model_name_or_path: str,
    model_args: Optional[Dict] = None,
    max_seq_length: int = 256,
    do_lower_case: bool = False,
    dropout_rate: float = 0.2,
):
    """
    Initialize the CustomT5EncoderModel.

    Args:
        model_name_or_path (str): Name or path of the pre-trained T5 model to load.
        model_args (Optional[Dict]): Additional arguments to pass to the T5 model constructor.
            Defaults to an empty dict if None.
        max_seq_length (int): Maximum sequence length for input text. Default is 256.
        do_lower_case (bool): Whether to convert input text to lowercase. Default is False.
        dropout_rate (float): Dropout rate to apply to the feed-forward networks. Default is 0.2.
    """
    super().__init__(
        model_name_or_path=model_name_or_path,
        model_args=model_args if model_args is not None else {},
        max_seq_length=max_seq_length,
        do_lower_case=do_lower_case,
    )
    # Only apply the activation swap to checkpoints outside the "toobi/anime" namespace
    if not model_name_or_path.startswith("toobi/anime"):
        self.modify_activation(self.auto_model, dropout_rate)
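
Putting the pieces together, a minimal usage sketch follows. It assumes the class is importable from src/custom_transformer (the source path shown above); "t5-small" is an illustrative checkpoint, and any T5 encoder checkpoint should work the same way.

from sentence_transformers import SentenceTransformer, models

from src.custom_transformer import CustomT5EncoderModel

# Build the custom encoder and a mean-pooling head on top of it
transformer = CustomT5EncoderModel("t5-small", max_seq_length=256, dropout_rate=0.2)
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[transformer, pooling])

embeddings = model.encode(["An example sentence to embed."])
print(embeddings.shape)  # (1, hidden_size)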

modify_activation

modify_activation(model, dropout_rate)

Replace ReLU activation with GELU in all transformer blocks of the T5 encoder.

This method iterates through all transformer blocks in the encoder and replaces the ReLU activation in each feed-forward network with GELU activation.
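
For reference, the indexing used here relies on the layout of a Hugging Face T5 encoder block, which this sketch (assuming the standard transformers library) makes visible:

from transformers import T5EncoderModel

t5 = T5EncoderModel.from_pretrained("t5-small")
block = t5.encoder.block[0]
print(type(block.layer[0]).__name__)  # T5LayerSelfAttention
print(type(block.layer[1]).__name__)  # T5LayerFF, which wraps DenseReluDense and its activation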

PARAMETERS

model: The underlying T5 transformer model whose activations will be modified.
dropout_rate (float): Dropout rate to apply to the feed-forward networks.

Source code in src/custom_transformer.py
def modify_activation(self, model, dropout_rate):
    """
    Replace ReLU activation with GELU in all transformer blocks of the T5 encoder.

    This method iterates through all transformer blocks in the encoder and replaces
    the ReLU activation in each feed-forward network with GELU activation.

    Args:
        model: The underlying T5 transformer model whose activations will be modified.
        dropout_rate (float): Dropout rate to apply to the feed-forward networks.
    """
    if hasattr(model, 'encoder') and hasattr(model.encoder, 'block'):
        model.encoder.dropout = nn.Dropout(p=dropout_rate, inplace=False)
        for block in model.encoder.block:
            # Access the feed-forward network within each block
            ff = block.layer[1].DenseReluDense
            # Replace ReLU with GELU
            ff.act = nn.GELU()
            # Replace the dropout layers with new nn.Dropout instances at the given rate
            ff.dropout = nn.Dropout(p=dropout_rate, inplace=False)
            block.layer[0].dropout = nn.Dropout(p=dropout_rate, inplace=False)
            block.layer[1].dropout = nn.Dropout(p=dropout_rate, inplace=False)
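
To confirm the swap took effect, a quick check along these lines can be run after construction (reusing the transformer object from the usage sketch above):

import torch.nn as nn

t5 = transformer.auto_model  # the underlying T5 encoder
for block in t5.encoder.block:
    ff = block.layer[1].DenseReluDense
    assert isinstance(ff.act, nn.GELU)
    assert ff.dropout.p == 0.2
print("All feed-forward blocks now use GELU.")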