
Utility Functions

This page documents the utility components of AniSearch Model.

Overview

The utils package provides various utility functions and constants used throughout the application:

  • Display utilities for formatting and printing results
  • Logging configuration
  • Error handling utilities
  • Constants and configuration

Display Utilities

Functions for formatting and displaying search results:

Display Utilities

Formatting and display functions for the anime/manga search application.

This module provides utilities for displaying model information, formatting search results, and presenting information to users in a consistent and readable format. It's designed to be lightweight and not import any heavy ML dependencies, allowing it to be used for model listing without loading TensorFlow or PyTorch.

Features

  • Score formatting that adapts to different model types
  • Model listing capabilities for both pre-trained and fine-tuned models
  • User-friendly console display functions
  • Error handling for display operations

Usage Context

These utilities are primarily used in:

  1. CLI output formatting for search results
  2. Model listing for the '--list-models' CLI argument
  3. Interactive search result display

This module should NOT import any ML frameworks like TensorFlow or PyTorch, as it's used for lightweight model listing without loading heavy dependencies.
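
One way to sanity-check that the listing path really stays lightweight is to inspect `sys.modules` after the display utilities have run. A minimal sketch (the helper name and the choice of `tensorflow`/`torch` as the frameworks to check are assumptions, not part of the application):

```python
import sys

def heavy_frameworks_loaded() -> set:
    """Return which heavy ML frameworks, if any, are imported in this process."""
    heavy = {"tensorflow", "torch"}
    return heavy & set(sys.modules)

# In a lightweight model-listing path, this should stay empty.
loaded = heavy_frameworks_loaded()
```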

MODEL_SAVE_PATH module-attribute

MODEL_SAVE_PATH = 'model/fine-tuned/'

display_available_models

display_available_models(fine_tuned_models: Optional[Dict[str, str]] = None) -> None

Display a formatted list of available models for searching or training.

This function prints a comprehensive list of available models to the console, organized by category. It displays both pre-trained models from the constants and fine-tuned models if provided. The output includes usage examples and a guide to help users select appropriate models for their needs.

The function is decorated with error handling to gracefully handle any exceptions that might occur during the display process.

PARAMETER DESCRIPTION
fine_tuned_models

Optional dictionary mapping model names to their paths. If provided, these fine-tuned models will be displayed in a separate section. If None, only pre-trained models will be shown.

TYPE: Optional[Dict[str, str]] DEFAULT: None

RETURNS DESCRIPTION
None

This function prints information to the console but doesn't return any values.

TYPE: None

Example Output
Available Pre-trained Cross-Encoder Models:
======================================

MS_MARCO_MODELS:
  ms-marco-MiniLM-L6-v2: cross-encoder/ms-marco-MiniLM-L6-v2
  ms-marco-TinyBERT-L6: cross-encoder/ms-marco-TinyBERT-L6
  ...

Available Fine-tuned Models:
==========================
  anime-search-v1: model/fine-tuned/anime-search-v1
  ...

Usage example:
  python src/main.py search --type anime --query "Your query" \
  --model "cross-encoder/ms-marco-MiniLM-L-6-v2"

To use a fine-tuned model:
  python src/main.py search --type anime --query "Your query" \
  --model "model/fine-tuned/your-model-name"

Model selection guide:
- TinyBERT models: Smallest and fastest, good for low-resource environments
- MiniLM models: Good balance of performance and efficiency
...
Notes
  • The function accesses ALTERNATIVE_MODELS from constants.py
  • The output includes usage examples tailored to the available models
  • The model selection guide helps users choose appropriate models
  • Error handling is provided by the @handle_exceptions decorator
Source code in src/utils/display.py
@handle_exceptions(cli_mode=True, reraise=False)
def display_available_models(
    fine_tuned_models: Optional[Dict[str, str]] = None,
) -> None:
    """
    Display a formatted list of available models for searching or training.

    This function prints a comprehensive list of available models to the console,
    organized by category. It displays both pre-trained models from the constants
    and fine-tuned models if provided. The output includes usage examples and a
    guide to help users select appropriate models for their needs.

    The function is decorated with error handling to gracefully handle any exceptions
    that might occur during the display process.

    Args:
        fine_tuned_models: Optional dictionary mapping model names to their paths.
            If provided, these fine-tuned models will be displayed in a separate section.
            If None, only pre-trained models will be shown.

    Returns:
        None: This function prints information to the console but doesn't return any values.

    Example Output:
        ```
        Available Pre-trained Cross-Encoder Models:
        ======================================

        MS_MARCO_MODELS:
          ms-marco-MiniLM-L6-v2: cross-encoder/ms-marco-MiniLM-L6-v2
          ms-marco-TinyBERT-L6: cross-encoder/ms-marco-TinyBERT-L6
          ...

        Available Fine-tuned Models:
        ==========================
          anime-search-v1: model/fine-tuned/anime-search-v1
          ...

        Usage example:
          python src/main.py search --type anime --query "Your query" \
          --model "cross-encoder/ms-marco-MiniLM-L-6-v2"

        To use a fine-tuned model:
          python src/main.py search --type anime --query "Your query" \
          --model "model/fine-tuned/your-model-name"

        Model selection guide:
        - TinyBERT models: Smallest and fastest, good for low-resource environments
        - MiniLM models: Good balance of performance and efficiency
        ...
        ```

    Notes:
        - The function accesses ALTERNATIVE_MODELS from constants.py
        - The output includes usage examples tailored to the available models
        - The model selection guide helps users choose appropriate models
        - Error handling is provided by the @handle_exceptions decorator
    """
    models = ALTERNATIVE_MODELS

    print("\nAvailable Pre-trained Cross-Encoder Models:")
    print("======================================")

    for category, model_dict in models.items():
        print(f"\n{category.upper()}:")
        for name, path in model_dict.items():
            print(f"  {name}: {path}")

    # Display fine-tuned models if provided
    if fine_tuned_models:
        print("\nAvailable Fine-tuned Models:")
        print("==========================")
        for name, path in fine_tuned_models.items():
            print(f"  {name}: {path}")

    print("\nUsage example:")
    print(
        '  python src/main.py search --type anime --query "Your query" '
        '--model "cross-encoder/ms-marco-MiniLM-L-6-v2"'
    )
    if fine_tuned_models:
        print("\nTo use a fine-tuned model:")
        print(
            '  python src/main.py search --type anime --query "Your query" '
            '--model "model/fine-tuned/your-model-name"'
        )
    print("\nModel selection guide:")
    print("- TinyBERT models: Smallest and fastest, good for low-resource environments")
    print("- MiniLM models: Good balance of performance and efficiency")
    print("- ELECTRA models: Higher accuracy but more computationally intensive")
    print("- MS Marco models: Optimized for information retrieval")
    print("- Fine-tuned models: Domain-specific models trained on anime/manga data")
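
The traversal in the function above can be exercised without the real constants module. The sketch below uses a trimmed stand-in for ALTERNATIVE_MODELS (the real dictionary in constants.py is larger) and collects lines instead of printing them, so the listing structure is easy to inspect; the helper name `render_model_listing` is hypothetical:

```python
# Trimmed stand-in for ALTERNATIVE_MODELS from constants.py.
ALTERNATIVE_MODELS = {
    "ms_marco_models": {
        "ms-marco-MiniLM-L6-v2": "cross-encoder/ms-marco-MiniLM-L6-v2",
    }
}

def render_model_listing(models, fine_tuned_models=None):
    # Same category/name traversal as display_available_models,
    # collected into a list of lines instead of printed.
    lines = ["Available Pre-trained Cross-Encoder Models:"]
    for category, model_dict in models.items():
        lines.append(f"{category.upper()}:")
        for name, path in model_dict.items():
            lines.append(f"  {name}: {path}")
    if fine_tuned_models:
        lines.append("Available Fine-tuned Models:")
        for name, path in fine_tuned_models.items():
            lines.append(f"  {name}: {path}")
    return lines

listing = render_model_listing(
    ALTERNATIVE_MODELS,
    {"anime-search-v1": "model/fine-tuned/anime-search-v1"},
)
```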

format_score

format_score(score: float, normalize_scores: bool, model_name: str) -> str

Format a model's relevance score for user-friendly display.

This function formats search result scores differently based on the model type and normalization settings. It handles two main cases:

  1. MS Marco models or other models with normalized scores (0-1 range): these are displayed as percentages for intuitive interpretation
  2. Other models with unnormalized scores: these are displayed as raw float values with fixed precision

PARAMETER DESCRIPTION
score

The raw relevance score from the model, typically a float between 0 and 1 for normalized models, or any float range for others.

TYPE: float

normalize_scores

Boolean flag indicating whether the scores are normalized to the 0-1 range. This affects the display format.

TYPE: bool

model_name

Name of the model that produced the score, used to detect specific model types that require special formatting.

TYPE: str

RETURNS DESCRIPTION
str

A formatted string representation of the score, either as a percentage (e.g., "95.2% relevance") or as a raw score (e.g., "score: 0.952").

TYPE: str

Examples:

# Format a score from an MS Marco model
formatted = format_score(0.952, True, "cross-encoder/ms-marco-MiniLM-L-6-v2")
# Result: "95.2% relevance"

# Format a score from a non-normalized model
formatted = format_score(4.73, False, "cross-encoder/stsb-roberta-base")
# Result: "score: 4.730"
Notes
  • MS Marco models are automatically detected by checking if "ms-marco" appears in the model name (case-insensitive)
  • Percentage format shows one decimal place (e.g., 95.2%)
  • Raw score format shows three decimal places (e.g., 4.730)
Source code in src/utils/display.py
def format_score(score: float, normalize_scores: bool, model_name: str) -> str:
    """
    Format a model's relevance score for user-friendly display.

    This function formats search result scores differently based on the model type
    and normalization settings. It handles two main cases:

    1. MS Marco models or other models with normalized scores (0-1 range)
       These are displayed as percentages for intuitive interpretation
    2. Other models with unnormalized scores
       These are displayed as raw float values with fixed precision

    Args:
        score: The raw relevance score from the model, typically a float between
            0 and 1 for normalized models, or any float range for others.
        normalize_scores: Boolean flag indicating whether the scores are normalized
            to the 0-1 range. This affects the display format.
        model_name: Name of the model that produced the score, used to detect
            specific model types that require special formatting.

    Returns:
        str: A formatted string representation of the score, either as a percentage
            (e.g., "95.2% relevance") or as a raw score (e.g., "score: 0.952").

    Examples:
        ```python
        # Format a score from an MS Marco model
        formatted = format_score(0.952, True, "cross-encoder/ms-marco-MiniLM-L-6-v2")
        # Result: "95.2% relevance"

        # Format a score from a non-normalized model
        formatted = format_score(4.73, False, "cross-encoder/stsb-roberta-base")
        # Result: "score: 4.730"
        ```

    Notes:
        - MS Marco models are automatically detected by checking if "ms-marco" appears
          in the model name (case-insensitive)
        - Percentage format shows one decimal place (e.g., 95.2%)
        - Raw score format shows three decimal places (e.g., 4.730)
    """
    if normalize_scores or "ms-marco" in model_name.lower():
        # For MS Marco models or normalized scores, display as percentage
        return f"{score:.1%} relevance"
    # For other models, just show the raw score
    return f"score: {score:.3f}"
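
Since the function has no external dependencies, its formatting rules can be checked standalone. The body below is reproduced verbatim from the source above; note that an "ms-marco" model name triggers percentage formatting even when normalize_scores is False:

```python
# Reproduced from src/utils/display.py (see source above).
def format_score(score: float, normalize_scores: bool, model_name: str) -> str:
    if normalize_scores or "ms-marco" in model_name.lower():
        # For MS Marco models or normalized scores, display as percentage
        return f"{score:.1%} relevance"
    # For other models, just show the raw score
    return f"score: {score:.3f}"

print(format_score(0.952, True, "cross-encoder/ms-marco-MiniLM-L-6-v2"))   # 95.2% relevance
print(format_score(4.73, False, "cross-encoder/stsb-roberta-base"))        # score: 4.730
print(format_score(0.8, False, "cross-encoder/ms-marco-TinyBERT-L6"))      # 80.0% relevance
```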

list_fine_tuned_models

list_fine_tuned_models() -> Dict[str, str]

List all available fine-tuned models in the model directory.

This function scans the fine-tuned model directory for valid model folders, identifying them by the presence of a config.json file. It returns a mapping of model names to their full paths, which can be used to load the models.

This is a lightweight implementation that doesn't import heavy ML frameworks like TensorFlow or PyTorch, making it suitable for quick model listing without the overhead of loading these dependencies.

RETURNS DESCRIPTION
Dict[str, str]

Dict[str, str]: A dictionary where:

  • Keys are the model directory names (model identifiers)
  • Values are the full paths to the model directories

Returns an empty dictionary if no models are found or if the model directory doesn't exist.

Examples:

# Get a dictionary of available fine-tuned models
models = list_fine_tuned_models()

# Print the available models
if models:
    print("Available fine-tuned models:")
    for name, path in models.items():
        print(f"- {name}: {path}")
else:
    print("No fine-tuned models found")
Notes
  • Models are identified by the presence of a config.json file
  • The default search location is the MODEL_SAVE_PATH constant ("model/fine-tuned/")
  • This function does not validate that the models are functional or compatible
  • Use this function before attempting to load a fine-tuned model to check availability
Source code in src/utils/display.py
def list_fine_tuned_models() -> Dict[str, str]:
    """
    List all available fine-tuned models in the model directory.

    This function scans the fine-tuned model directory for valid model folders,
    identifying them by the presence of a config.json file. It returns a mapping
    of model names to their full paths, which can be used to load the models.

    This is a lightweight implementation that doesn't import heavy ML frameworks
    like TensorFlow or PyTorch, making it suitable for quick model listing without
    the overhead of loading these dependencies.

    Returns:
        Dict[str, str]: A dictionary where:
            - Keys are the model directory names (model identifiers)
            - Values are the full paths to the model directories

            Returns an empty dictionary if no models are found or if the
            model directory doesn't exist.

    Examples:
        ```python
        # Get a dictionary of available fine-tuned models
        models = list_fine_tuned_models()

        # Print the available models
        if models:
            print("Available fine-tuned models:")
            for name, path in models.items():
                print(f"- {name}: {path}")
        else:
            print("No fine-tuned models found")
        ```

    Notes:
        - Models are identified by the presence of a config.json file
        - The default search location is the MODEL_SAVE_PATH constant ("model/fine-tuned/")
        - This function does not validate that the models are functional or compatible
        - Use this function before attempting to load a fine-tuned model to check availability
    """
    if not os.path.exists(MODEL_SAVE_PATH):
        return {}

    fine_tuned_models = {}
    for model_name in os.listdir(MODEL_SAVE_PATH):
        model_path = os.path.join(MODEL_SAVE_PATH, model_name)
        config_path = os.path.join(model_path, "config.json")

        if os.path.isdir(model_path) and os.path.exists(config_path):
            fine_tuned_models[model_name] = model_path

    return fine_tuned_models
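
The directory scan can be exercised on a scratch directory. The sketch below reproduces the function body with the search path made a parameter (an adaptation for testability; the real function reads the module-level MODEL_SAVE_PATH constant):

```python
import os
import tempfile

def list_fine_tuned_models(save_path: str) -> dict:
    # Same logic as the source above, parameterized on the search path.
    if not os.path.exists(save_path):
        return {}
    fine_tuned_models = {}
    for model_name in os.listdir(save_path):
        model_path = os.path.join(save_path, model_name)
        config_path = os.path.join(model_path, "config.json")
        if os.path.isdir(model_path) and os.path.exists(config_path):
            fine_tuned_models[model_name] = model_path
    return fine_tuned_models

with tempfile.TemporaryDirectory() as root:
    # A valid model directory contains config.json; one without it is skipped.
    os.makedirs(os.path.join(root, "anime-search-v1"))
    open(os.path.join(root, "anime-search-v1", "config.json"), "w").close()
    os.makedirs(os.path.join(root, "incomplete-model"))
    models = list_fine_tuned_models(root)

print(sorted(models))  # ['anime-search-v1']
```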

Logging Configuration

Setup and configuration of application logging:

Logging Configuration

Centralized logging setup for the anime/manga search application.

This module provides a standardized logging configuration to ensure consistent log formatting, appropriate log levels, and proper output handling across all application components. It implements a simple, reusable logging setup that can be called during application initialization.

Features

  • Consistent timestamp and log level formatting
  • Console output through StreamHandler
  • Error handling for logging setup via handle_exceptions decorator
  • INFO level logging by default for appropriate verbosity

Usage Context

The logging configuration is typically initialized at application startup:

  1. Called in the main entry point before any other operations
  2. Used by error handling utilities to log exceptions
  3. Available for all modules to use for consistent logging

Having a centralized logging configuration ensures that all components produce logs in a consistent format, making debugging and monitoring easier.

setup_logging

setup_logging() -> None

Configure logging for the application with standardized formatting.

This function initializes the Python logging system with consistent formatting, appropriate log levels, and console output. It sets up:

  • INFO level logging for moderate verbosity
  • Timestamp, level, and message formatting
  • Console output through a StreamHandler

The function is decorated with handle_exceptions to ensure that any issues during logging configuration are properly captured and reported.

RETURNS DESCRIPTION
None

This function configures the global logging system but doesn't return any value.

TYPE: None

Example
# Initialize logging at application startup
from src.utils.logging_config import setup_logging

def main():
    # Set up logging first
    setup_logging()

    # Now all subsequent log calls will use this configuration
    logging.info("Application starting")
    # ...
Notes
  • This function should be called once at application startup
  • The log format includes timestamp, log level, and message
  • The default level (INFO) can be overridden through environment variables if the standard logging configuration mechanisms are used
Source code in src/utils/logging_config.py
@handle_exceptions(log_exceptions=True, include_exc_info=True)
def setup_logging() -> None:
    """
    Configure logging for the application with standardized formatting.

    This function initializes the Python logging system with consistent formatting,
    appropriate log levels, and console output. It sets up:

    - INFO level logging for moderate verbosity
    - Timestamp, level, and message formatting
    - Console output through a StreamHandler

    The function is decorated with handle_exceptions to ensure that any issues
    during logging configuration are properly captured and reported.

    Returns:
        None: This function configures the global logging system but doesn't
            return any value.

    Example:
        ```python
        # Initialize logging at application startup
        from src.utils.logging_config import setup_logging

        def main():
            # Set up logging first
            setup_logging()

            # Now all subsequent log calls will use this configuration
            logging.info("Application starting")
            # ...
        ```

    Notes:
        - This function should be called once at application startup
        - The log format includes timestamp, log level, and message
        - The default level (INFO) can be overridden through environment variables
          if the standard logging configuration mechanisms are used
    """
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s - %(levelname)s - %(message)s",
        handlers=[logging.StreamHandler()],
    )
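
To get the environment-variable override mentioned in the notes, one hedged variant resolves a level name before calling basicConfig. LOG_LEVEL is a hypothetical variable name chosen for this sketch, not something the application defines:

```python
import logging
import os

def resolve_level(level_name: str) -> int:
    # Map a level name like "DEBUG" to its logging constant; fall back to INFO
    # for unknown names.
    level = getattr(logging, level_name.upper(), logging.INFO)
    return level if isinstance(level, int) else logging.INFO

def setup_logging_with_env() -> None:
    # LOG_LEVEL is a hypothetical environment variable for this sketch.
    logging.basicConfig(
        level=resolve_level(os.environ.get("LOG_LEVEL", "INFO")),
        format="%(asctime)s - %(levelname)s - %(message)s",
        handlers=[logging.StreamHandler()],
    )

setup_logging_with_env()
```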

Error Handling

Utilities for centralized error handling:

Error Handling Utilities

Standardized error handling functionality for the anime/manga search application.

This module provides reusable decorators and utilities to maintain consistent error handling across the application. It implements a centralized approach to catching, logging, and reporting exceptions, allowing for more maintainable and user-friendly error handling.

Features

  • Decorator-based exception handling for consistent behavior
  • Configurable error logging with severity control
  • Customizable user-friendly error messages for CLI applications
  • Type-safe implementation with proper generic typing

Usage Context

These utilities are primarily used for:

  1. Wrapping IO-heavy functions that may encounter file or network issues
  2. Handling user input validation in CLI commands
  3. Gracefully managing expected exceptions in model loading and inference
  4. Providing informative error messages in both CLI and logging contexts

By centralizing error handling logic, the application maintains consistent behavior across different components and provides better debugging information when issues occur.

COMMON_EXCEPTIONS module-attribute

COMMON_EXCEPTIONS: Dict[Type[Exception], str] = {
    ValueError: 'Invalid value or parameter',
    KeyError: 'Missing data in results',
    ImportError: 'Failed to import required module',
    RuntimeError: 'Runtime error during operation',
    MemoryError: 'Insufficient memory to process this operation',
    FileNotFoundError: 'Required model or data file not found',
    PermissionError: 'Permission denied when accessing files',
    TimeoutError: 'Operation timed out',
    ConnectionError: 'Network connection error',
}

F module-attribute

F = TypeVar('F', bound=Callable[..., Any])

logger module-attribute

logger = getLogger(__name__)

handle_exceptions

handle_exceptions(cli_mode: bool = False, exceptions: Optional[List[Type[Exception]]] = None, log_exceptions: bool = True, include_exc_info: bool = False, reraise: bool = True) -> Callable[[F], F]

Decorator for handling exceptions in a standardized way across the application.

This decorator wraps functions to provide consistent exception handling, including:

  1. Catching specified exceptions or all common exceptions by default
  2. Logging exceptions with configurable verbosity
  3. Presenting user-friendly error messages in CLI mode
  4. Optionally re-raising exceptions after handling

The decorator can be configured for different contexts (CLI vs. background processing) and adjusted for different levels of verbosity and strictness.

PARAMETER DESCRIPTION
cli_mode

Whether to print user-friendly messages to the console. When True, error messages are formatted for end users and printed to stdout. When False, errors are only logged (if log_exceptions is True). Default is False.

TYPE: bool DEFAULT: False

exceptions

List of exception types to catch explicitly. If None (default), uses all exceptions defined in COMMON_EXCEPTIONS. Specify a subset of exceptions to handle only specific error types.

TYPE: Optional[List[Type[Exception]]] DEFAULT: None

log_exceptions

Whether to log the exceptions to the application logger. When True, exceptions are logged using the module's logger. When False, exceptions are not logged (useful when handled elsewhere). Default is True.

TYPE: bool DEFAULT: True

include_exc_info

Whether to include exception traceback in logs. When True, full exception traceback is included in log messages. When False, only the exception message is logged. Default is False.

TYPE: bool DEFAULT: False

reraise

Whether to re-raise the exception after handling. When True, the exception is re-raised after logging/printing. When False, the function returns None instead of re-raising. Default is True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
Callable[[F], F]

Callable[[F], F]: A decorator function that wraps the target function with the specified exception handling behavior.

Examples:

Basic usage with default settings (logs exceptions and reraises):

@handle_exceptions()
def load_data(filepath):
    with open(filepath) as f:
        return json.load(f)

CLI-friendly error handling without reraising:

@handle_exceptions(cli_mode=True, reraise=False)
def process_user_input(user_input):
    # Process user input, potentially raising exceptions
    return validated_input

Handling only specific exceptions:

@handle_exceptions(
    exceptions=[FileNotFoundError, PermissionError],
    cli_mode=True
)
def read_config_file(filepath):
    with open(filepath) as f:
        return f.read()

Notes
  • When an exception is caught but not reraised (reraise=False), the function returns None. Callers should handle this case appropriately.
  • In CLI mode, caught exceptions generate user-friendly error messages based on the COMMON_EXCEPTIONS mapping.
  • Unexpected exceptions (not in the exceptions list) are always logged with full traceback information, regardless of the include_exc_info setting.
Source code in src/utils/error_handling.py
def handle_exceptions(
    cli_mode: bool = False,
    exceptions: Optional[List[Type[Exception]]] = None,
    log_exceptions: bool = True,
    include_exc_info: bool = False,
    reraise: bool = True,
) -> Callable[[F], F]:
    """
    Decorator for handling exceptions in a standardized way across the application.

    This decorator wraps functions to provide consistent exception handling, including:

    1. Catching specified exceptions or all common exceptions by default
    2. Logging exceptions with configurable verbosity
    3. Presenting user-friendly error messages in CLI mode
    4. Optionally re-raising exceptions after handling

    The decorator can be configured for different contexts (CLI vs. background processing)
    and adjusted for different levels of verbosity and strictness.

    Args:
        cli_mode: Whether to print user-friendly messages to the console.
            When True, error messages are formatted for end users and printed to stdout.
            When False, errors are only logged (if log_exceptions is True).
            Default is False.

        exceptions: List of exception types to catch explicitly.
            If None (default), uses all exceptions defined in COMMON_EXCEPTIONS.
            Specify a subset of exceptions to handle only specific error types.

        log_exceptions: Whether to log the exceptions to the application logger.
            When True, exceptions are logged using the module's logger.
            When False, exceptions are not logged (useful when handled elsewhere).
            Default is True.

        include_exc_info: Whether to include exception traceback in logs.
            When True, full exception traceback is included in log messages.
            When False, only the exception message is logged.
            Default is False.

        reraise: Whether to re-raise the exception after handling.
            When True, the exception is re-raised after logging/printing.
            When False, the function returns None instead of re-raising.
            Default is True.

    Returns:
        Callable[[F], F]: A decorator function that wraps the target function
            with the specified exception handling behavior.

    Examples:
        Basic usage with default settings (logs exceptions and reraises):
        ```python
        @handle_exceptions()
        def load_data(filepath):
            with open(filepath) as f:
                return json.load(f)
        ```

        CLI-friendly error handling without reraising:
        ```python
        @handle_exceptions(cli_mode=True, reraise=False)
        def process_user_input(user_input):
            # Process user input, potentially raising exceptions
            return validated_input
        ```

        Handling only specific exceptions:
        ```python
        @handle_exceptions(
            exceptions=[FileNotFoundError, PermissionError],
            cli_mode=True
        )
        def read_config_file(filepath):
            with open(filepath) as f:
                return f.read()
        ```

    Notes:
        - When an exception is caught but not reraised (reraise=False), the function
          returns None. Callers should handle this case appropriately.
        - In CLI mode, caught exceptions generate user-friendly error messages based
          on the COMMON_EXCEPTIONS mapping.
        - Unexpected exceptions (not in the exceptions list) are always logged with
          full traceback information, regardless of the include_exc_info setting.
    """
    if exceptions is None:
        exceptions = list(COMMON_EXCEPTIONS.keys())

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return func(*args, **kwargs)
            except Exception as e:  # Catch all exceptions first to determine type
                exception_type = type(e)

                # Handle specific exception types
                if exception_type in exceptions:
                    # Get the exception message, defaulting to the exception type name
                    message = COMMON_EXCEPTIONS.get(
                        exception_type, str(exception_type.__name__)
                    )
                    error_detail = f"{message}: {str(e)}"

                    # Handle based on mode (CLI or logging)
                    if cli_mode:
                        print(f"Error: {error_detail}")

                    if log_exceptions:
                        context = f"Error in {func.__name__}"
                        logger.error(
                            "%s: %s", context, error_detail, exc_info=include_exc_info
                        )
                # Handle unexpected exceptions
                else:
                    if cli_mode:
                        print(f"Unexpected error: {str(e)}")
                        print("Please report this issue to the developers")

                    if log_exceptions:
                        logger.error(
                            "Unexpected error in %s: %s",
                            func.__name__,
                            str(e),
                            exc_info=True,
                        )

                if reraise:
                    raise
                return None

        return cast(F, wrapper)

    return decorator
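
The reraise=False contract (the wrapped function returns None instead of propagating) can be demonstrated with a condensed, self-contained reproduction of the decorator. The logging and unexpected-exception branches are trimmed to the essentials, and COMMON_EXCEPTIONS is reduced to two entries; `parse_positive` is a hypothetical example function:

```python
import functools
import logging
from typing import Any, Callable, Dict, List, Optional, Type, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])
logger = logging.getLogger(__name__)

# Trimmed subset of the real COMMON_EXCEPTIONS mapping.
COMMON_EXCEPTIONS: Dict[Type[Exception], str] = {
    ValueError: "Invalid value or parameter",
    FileNotFoundError: "Required model or data file not found",
}

def handle_exceptions(
    cli_mode: bool = False,
    exceptions: Optional[List[Type[Exception]]] = None,
    log_exceptions: bool = True,
    reraise: bool = True,
) -> Callable[[F], F]:
    # Condensed version of the decorator above: same catch/print/log/reraise
    # flow, without the separate unexpected-exception branch.
    if exceptions is None:
        exceptions = list(COMMON_EXCEPTIONS.keys())

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return func(*args, **kwargs)
            except Exception as e:
                message = COMMON_EXCEPTIONS.get(type(e), type(e).__name__)
                if cli_mode:
                    print(f"Error: {message}: {e}")
                if log_exceptions:
                    logger.error("Error in %s: %s: %s", func.__name__, message, e)
                if reraise:
                    raise
                return None
        return cast(F, wrapper)
    return decorator

@handle_exceptions(cli_mode=True, reraise=False, log_exceptions=False)
def parse_positive(value: str) -> int:
    number = int(value)  # may raise ValueError on non-numeric input
    if number <= 0:
        raise ValueError("must be positive")
    return number

print(parse_positive("7"))     # 7
print(parse_positive("oops"))  # prints the friendly error, then None
```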

Constants

Configuration constants and default values:

Application Constants

Centralized configuration constants for the anime/manga search application.

This module defines all the global constants used throughout the application, including default paths, model configurations, and search parameters. Centralizing these values makes it easier to maintain consistent settings across the application and simplifies configuration changes.

Constants Categories

  • Dataset Paths: Locations of the merged anime and manga datasets
  • Model Configuration: Default model name, batch size, and result count
  • Alternative Models: Dictionary of other models that can be used

Usage Context

These constants are imported and used throughout the application:

  1. Model initialization uses the model name and dataset paths
  2. Search operations use the default number of results and batch size
  3. Model listing commands use the alternative models dictionary

Using constants instead of hard-coded values improves maintainability and ensures consistency across the application.

ALTERNATIVE_MODELS module-attribute

ALTERNATIVE_MODELS: Dict[str, Dict[str, str]] = {
    'ms_marco_models': {
        'ms-marco-MiniLM-L2-v2': 'cross-encoder/ms-marco-MiniLM-L2-v2',
        'ms-marco-MiniLM-L4-v2': 'cross-encoder/ms-marco-MiniLM-L4-v2',
        'ms-marco-MiniLM-L6-v2': 'cross-encoder/ms-marco-MiniLM-L6-v2',
        'ms-marco-MiniLM-L12-v2': 'cross-encoder/ms-marco-MiniLM-L12-v2',
        'ms-marco-TinyBERT-L2': 'cross-encoder/ms-marco-TinyBERT-L2',
        'ms-marco-TinyBERT-L2-v2': 'cross-encoder/ms-marco-TinyBERT-L2-v2',
        'ms-marco-TinyBERT-L4': 'cross-encoder/ms-marco-TinyBERT-L4',
        'ms-marco-TinyBERT-L6': 'cross-encoder/ms-marco-TinyBERT-L6',
        'ms-marco-electra-base': 'cross-encoder/ms-marco-electra-base',
    }
}

Dictionary of alternative cross-encoder models available for use.

Organized by category (currently only 'ms_marco_models'), this dictionary maps friendly model names to their full HuggingFace model identifiers. These models can be selected via the --model command-line argument.

Model performance characteristics:

  • TinyBERT models: Smallest and fastest, good for low-resource environments
  • MiniLM models: Good balance of performance and efficiency
  • ELECTRA models: Higher accuracy but more computationally intensive

ANIME_DATASET_PATH module-attribute

ANIME_DATASET_PATH: str = 'model/merged_anime_dataset.csv'

Path to the merged anime dataset CSV file.

DEFAULT_BATCH_SIZE module-attribute

DEFAULT_BATCH_SIZE: int = 256

Default batch size for model inference and data processing operations.

Larger batch sizes generally provide better performance up to hardware limits. This value may need adjustment based on available memory and processor capabilities.

MANGA_DATASET_PATH module-attribute

MANGA_DATASET_PATH: str = 'model/merged_manga_dataset.csv'

Path to the merged manga dataset CSV file.

MODEL_NAME module-attribute

MODEL_NAME: str = 'cross-encoder/ms-marco-MiniLM-L-6-v2'

Default cross-encoder model used for search operations.

This model offers a good balance between performance and accuracy for text ranking tasks. It's a MiniLM model with 6 layers, which makes it relatively lightweight while still providing good search results.

NUM_RESULTS module-attribute

NUM_RESULTS: int = 5

Default number of search results to return.

This controls how many top matches are returned when performing a search operation. Can be overridden via command-line arguments.