Utility Functions¶
This page documents the utility components of AniSearch Model.
Overview¶
The utils package provides various utility functions and constants used throughout the application:
- Display utilities for formatting and printing results
- Logging configuration
- Error handling utilities
- Constants and configuration
Display Utilities¶
Functions for formatting and displaying search results:
Formatting and display functions for the anime/manga search application.
This module provides utilities for displaying model information, formatting search results, and presenting information to users in a consistent and readable format. It's designed to be lightweight and not import any heavy ML dependencies, allowing it to be used for model listing without loading TensorFlow or PyTorch.
Features¶
- Score formatting that adapts to different model types
- Model listing capabilities for both pre-trained and fine-tuned models
- User-friendly console display functions
- Error handling for display operations
Usage Context¶
These utilities are primarily used in:
- CLI output formatting for search results
- Model listing for the '--list-models' CLI argument
- Interactive search result display
This module should NOT import any ML frameworks like TensorFlow or PyTorch,
as it's used for lightweight model listing without loading heavy dependencies.
display_available_models ¶
Display a formatted list of available models for searching or training.
This function prints a comprehensive list of available models to the console, organized by category. It displays both pre-trained models from the constants and fine-tuned models if provided. The output includes usage examples and a guide to help users select appropriate models for their needs.
The function is decorated with error handling to gracefully handle any exceptions that might occur during the display process.
| PARAMETER | DESCRIPTION |
| --- | --- |
| `fine_tuned_models` | Optional dictionary mapping model names to their paths. If provided, these fine-tuned models will be displayed in a separate section. If `None`, only pre-trained models will be shown. TYPE: `Optional[Dict[str, str]]` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `None` | This function prints information to the console but doesn't return any values. |
Example Output
```
Available Pre-trained Cross-Encoder Models:
==========================================
MS_MARCO_MODELS:
  ms-marco-MiniLM-L6-v2: cross-encoder/ms-marco-MiniLM-L6-v2
  ms-marco-TinyBERT-L6: cross-encoder/ms-marco-TinyBERT-L6
  ...

Available Fine-tuned Models:
============================
  anime-search-v1: model/fine-tuned/anime-search-v1
  ...

Usage example:
python src/main.py search --type anime --query "Your query" --model "cross-encoder/ms-marco-MiniLM-L-6-v2"

To use a fine-tuned model:
python src/main.py search --type anime --query "Your query" --model "model/fine-tuned/your-model-name"

Model selection guide:
- TinyBERT models: Smallest and fastest, good for low-resource environments
- MiniLM models: Good balance of performance and efficiency
...
```
Notes
- The function accesses ALTERNATIVE_MODELS from constants.py
- The output includes usage examples tailored to the available models
- The model selection guide helps users choose appropriate models
- Error handling is provided by the @handle_exceptions decorator
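The behavior described above can be sketched roughly as follows. This is a minimal reconstruction from the documented output, not the actual source; the inlined `ALTERNATIVE_MODELS` subset stands in for the full mapping in `constants.py`:

```python
# Abbreviated stand-in for the full mapping defined in src/utils/constants.py
ALTERNATIVE_MODELS = {
    "ms_marco_models": {
        "ms-marco-MiniLM-L6-v2": "cross-encoder/ms-marco-MiniLM-L6-v2",
        "ms-marco-TinyBERT-L6": "cross-encoder/ms-marco-TinyBERT-L6",
    },
}


def display_available_models(fine_tuned_models=None):
    """Print pre-trained model categories, plus fine-tuned models if given."""
    print("Available Pre-trained Cross-Encoder Models:")
    print("=" * 43)
    for category, models in ALTERNATIVE_MODELS.items():
        print(f"{category.upper()}:")
        for name, identifier in models.items():
            print(f"  {name}: {identifier}")
    if fine_tuned_models:
        print("\nAvailable Fine-tuned Models:")
        print("=" * 28)
        for name, path in fine_tuned_models.items():
            print(f"  {name}: {path}")
```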
Source code in src/utils/display.py
format_score ¶
Format a model's relevance score for user-friendly display.
This function formats search result scores differently based on the model type and normalization settings. It handles two main cases:
- MS Marco models, or other models with scores normalized to the 0-1 range: these are displayed as percentages for intuitive interpretation
- Other models with unnormalized scores: these are displayed as raw float values with fixed precision
| PARAMETER | DESCRIPTION |
| --- | --- |
| `score` | The raw relevance score from the model, typically a float between 0 and 1 for normalized models, or any float range for others. TYPE: `float` |
| `normalize_scores` | Boolean flag indicating whether the scores are normalized to the 0-1 range. This affects the display format. TYPE: `bool` |
| `model_name` | Name of the model that produced the score, used to detect specific model types that require special formatting. TYPE: `str` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `str` | A formatted string representation of the score, either as a percentage (e.g., "95.2% relevance") or as a raw score (e.g., "score: 0.952"). |
Examples:
```python
# Format a score from an MS Marco model
formatted = format_score(0.952, True, "cross-encoder/ms-marco-MiniLM-L-6-v2")
# Result: "95.2% relevance"

# Format a score from a non-normalized model
formatted = format_score(4.73, False, "cross-encoder/stsb-roberta-base")
# Result: "score: 4.730"
```
Notes
- MS Marco models are automatically detected by checking if "ms-marco" appears in the model name (case-insensitive)
- Percentage format shows one decimal place (e.g., 95.2%)
- Raw score format shows three decimal places (e.g., 4.730)
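Putting the notes above together, the formatting logic might look like the sketch below. It is reconstructed from the documented behavior, not copied from `src/utils/display.py`; in particular, treating MS Marco detection as an alternative to the `normalize_scores` flag is an assumption:

```python
def format_score(score: float, normalize_scores: bool, model_name: str) -> str:
    """Format a relevance score for display, following the rules above."""
    # MS Marco models are detected by case-insensitive substring match
    if normalize_scores or "ms-marco" in model_name.lower():
        # Normalized scores are shown as a percentage with one decimal place
        return f"{score * 100:.1f}% relevance"
    # Unnormalized scores are shown raw with three decimal places
    return f"score: {score:.3f}"
```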
Source code in src/utils/display.py
list_fine_tuned_models ¶
List all available fine-tuned models in the model directory.
This function scans the fine-tuned model directory for valid model folders, identifying them by the presence of a config.json file. It returns a mapping of model names to their full paths, which can be used to load the models.
This is a lightweight implementation that doesn't import heavy ML frameworks like TensorFlow or PyTorch, making it suitable for quick model listing without the overhead of loading these dependencies.
| RETURNS | DESCRIPTION |
| --- | --- |
| `Dict[str, str]` | A dictionary where keys are the model directory names (model identifiers) and values are the full paths to the model directories. Returns an empty dictionary if no models are found or if the model directory doesn't exist. |
Examples:
```python
# Get a dictionary of available fine-tuned models
models = list_fine_tuned_models()

# Print the available models
if models:
    print("Available fine-tuned models:")
    for name, path in models.items():
        print(f"- {name}: {path}")
else:
    print("No fine-tuned models found")
```
Notes
- Models are identified by the presence of a config.json file
- The default search location is the MODEL_SAVE_PATH constant ("model/fine-tuned/")
- This function does not validate that the models are functional or compatible
- Use this function before attempting to load a fine-tuned model to check availability
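Given the notes above, a minimal sketch of the scan might look like this. It is an assumption-based reconstruction: the real function takes no arguments and reads `MODEL_SAVE_PATH` from `constants.py`, while this sketch accepts the base path as a parameter for testability:

```python
from pathlib import Path
from typing import Dict

# Assumed default; the docs name this constant MODEL_SAVE_PATH in constants.py
MODEL_SAVE_PATH = "model/fine-tuned/"


def list_fine_tuned_models(base_path: str = MODEL_SAVE_PATH) -> Dict[str, str]:
    """Map model directory names to paths; a config.json marks a valid model."""
    base = Path(base_path)
    if not base.is_dir():
        # Missing model directory yields an empty mapping, not an error
        return {}
    return {
        entry.name: str(entry)
        for entry in base.iterdir()
        if entry.is_dir() and (entry / "config.json").is_file()
    }
```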
Source code in src/utils/display.py
Logging Configuration¶
Setup and configuration of application logging:
Centralized logging setup for the anime/manga search application.
This module provides a standardized logging configuration to ensure consistent log formatting, appropriate log levels, and proper output handling across all application components. It implements a simple, reusable logging setup that can be called during application initialization.
Features¶
- Consistent timestamp and log level formatting
- Console output through StreamHandler
- Error handling for logging setup via handle_exceptions decorator
- INFO level logging by default for appropriate verbosity
Usage Context¶
The logging configuration is typically initialized at application startup:
- Called in the main entry point before any other operations
- Used by error handling utilities to log exceptions
- Available for all modules to use for consistent logging
Having a centralized logging configuration ensures that all components produce logs in a consistent format, making debugging and monitoring easier.
setup_logging ¶
Configure logging for the application with standardized formatting.
This function initializes the Python logging system with consistent formatting, appropriate log levels, and console output. It sets up:
- INFO level logging for moderate verbosity
- Timestamp, level, and message formatting
- Console output through a StreamHandler
The function is decorated with handle_exceptions to ensure that any issues during logging configuration are properly captured and reported.
| RETURNS | DESCRIPTION |
| --- | --- |
| `None` | This function configures the global logging system but doesn't return any value. |
Notes
- This function should be called once at application startup
- The log format includes timestamp, log level, and message
- The default level (INFO) can be overridden through environment variables if the standard logging configuration mechanisms are used
Source code in src/utils/logging_config.py
Error Handling¶
Utilities for centralized error handling:
Error Handling Utilities¶
Standardized error handling functionality for the anime/manga search application.
This module provides reusable decorators and utilities to maintain consistent error handling across the application. It implements a centralized approach to catching, logging, and reporting exceptions, allowing for more maintainable and user-friendly error handling.
Features¶
- Decorator-based exception handling for consistent behavior
- Configurable error logging with severity control
- Customizable user-friendly error messages for CLI applications
- Type-safe implementation with proper generic typing
Usage Context¶
These utilities are primarily used for:
- Wrapping IO-heavy functions that may encounter file or network issues
- Handling user input validation in CLI commands
- Gracefully managing expected exceptions in model loading and inference
- Providing informative error messages in both CLI and logging contexts
By centralizing error handling logic, the application maintains consistent behavior across different components and provides better debugging information when issues occur.
COMMON_EXCEPTIONS module-attribute ¶

```python
COMMON_EXCEPTIONS: Dict[Type[Exception], str] = {
    ValueError: "Invalid value or parameter",
    KeyError: "Missing data in results",
    ImportError: "Failed to import required module",
    RuntimeError: "Runtime error during operation",
    MemoryError: "Insufficient memory to process this operation",
    FileNotFoundError: "Required model or data file not found",
    PermissionError: "Permission denied when accessing files",
    TimeoutError: "Operation timed out",
    ConnectionError: "Network connection error",
}
```
handle_exceptions ¶
```python
handle_exceptions(
    cli_mode: bool = False,
    exceptions: Optional[List[Type[Exception]]] = None,
    log_exceptions: bool = True,
    include_exc_info: bool = False,
    reraise: bool = True,
) -> Callable[[F], F]
```
Decorator for handling exceptions in a standardized way across the application.
This decorator wraps functions to provide consistent exception handling, including:
- Catching specified exceptions or all common exceptions by default
- Logging exceptions with configurable verbosity
- Presenting user-friendly error messages in CLI mode
- Optionally re-raising exceptions after handling
The decorator can be configured for different contexts (CLI vs. background processing) and adjusted for different levels of verbosity and strictness.
| PARAMETER | DESCRIPTION |
| --- | --- |
| `cli_mode` | Whether to print user-friendly messages to the console. When `True`, error messages are formatted for end users and printed to stdout. When `False`, errors are only logged (if `log_exceptions` is `True`). Default is `False`. TYPE: `bool` |
| `exceptions` | List of exception types to catch explicitly. If `None` (default), uses all exceptions defined in COMMON_EXCEPTIONS. Specify a subset of exceptions to handle only specific error types. TYPE: `Optional[List[Type[Exception]]]` |
| `log_exceptions` | Whether to log the exceptions to the application logger. When `True`, exceptions are logged using the module's logger. When `False`, exceptions are not logged (useful when handled elsewhere). Default is `True`. TYPE: `bool` |
| `include_exc_info` | Whether to include exception traceback in logs. When `True`, full exception traceback is included in log messages. When `False`, only the exception message is logged. Default is `False`. TYPE: `bool` |
| `reraise` | Whether to re-raise the exception after handling. When `True`, the exception is re-raised after logging/printing. When `False`, the function returns `None` instead of re-raising. Default is `True`. TYPE: `bool` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `Callable[[F], F]` | A decorator function that wraps the target function with the specified exception handling behavior. |
Examples:
CLI-friendly error handling without reraising:

```python
@handle_exceptions(cli_mode=True, reraise=False)
def process_user_input(user_input):
    # Process user input, potentially raising exceptions
    return validated_input
```

Handling only specific exceptions:

```python
@handle_exceptions(
    exceptions=[FileNotFoundError, PermissionError],
    cli_mode=True,
)
def read_config_file(filepath):
    with open(filepath) as f:
        return f.read()
```
Notes
- When an exception is caught but not reraised (reraise=False), the function returns None. Callers should handle this case appropriately.
- In CLI mode, caught exceptions generate user-friendly error messages based on the COMMON_EXCEPTIONS mapping.
- Unexpected exceptions (not in the exceptions list) are always logged with full traceback information, regardless of the include_exc_info setting.
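From the documented parameters and notes, the decorator's core could look roughly like the sketch below. It is a simplified reconstruction, not the project's implementation, and `COMMON_EXCEPTIONS` is abbreviated to two entries:

```python
import functools
import logging
from typing import Callable, List, Optional, Type, TypeVar

logger = logging.getLogger(__name__)
F = TypeVar("F", bound=Callable)

# Abbreviated version of the COMMON_EXCEPTIONS mapping documented above
COMMON_EXCEPTIONS = {
    ValueError: "Invalid value or parameter",
    FileNotFoundError: "Required model or data file not found",
}


def handle_exceptions(
    cli_mode: bool = False,
    exceptions: Optional[List[Type[Exception]]] = None,
    log_exceptions: bool = True,
    include_exc_info: bool = False,
    reraise: bool = True,
) -> Callable[[F], F]:
    # Default to catching everything listed in COMMON_EXCEPTIONS
    caught = tuple(exceptions) if exceptions else tuple(COMMON_EXCEPTIONS)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except caught as exc:
                message = COMMON_EXCEPTIONS.get(type(exc), "Unexpected error")
                if log_exceptions:
                    logger.error("%s: %s", message, exc, exc_info=include_exc_info)
                if cli_mode:
                    # User-friendly message for CLI contexts
                    print(f"Error: {message} ({exc})")
                if reraise:
                    raise
                return None  # Swallowed exception: caller must handle None

        return wrapper

    return decorator
```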
Source code in src/utils/error_handling.py
Constants¶
Configuration constants and default values:
Application Constants¶
Centralized configuration constants for the anime/manga search application.
This module defines all the global constants used throughout the application, including default paths, model configurations, and search parameters. Centralizing these values makes it easier to maintain consistent settings across the application and simplifies configuration changes.
Constants Categories¶
- Dataset Paths: Locations of the merged anime and manga datasets
- Model Configuration: Default model name, batch size, and result count
- Alternative Models: Dictionary of other models that can be used
Usage Context¶
These constants are imported and used throughout the application:
- Model initialization uses the model name and dataset paths
- Search operations use the default number of results and batch size
- Model listing commands use the alternative models dictionary
Using constants instead of hard-coded values improves maintainability and ensures consistency across the application.
ALTERNATIVE_MODELS module-attribute ¶

```python
ALTERNATIVE_MODELS: Dict[str, Dict[str, str]] = {
    "ms_marco_models": {
        "ms-marco-MiniLM-L2-v2": "cross-encoder/ms-marco-MiniLM-L2-v2",
        "ms-marco-MiniLM-L4-v2": "cross-encoder/ms-marco-MiniLM-L4-v2",
        "ms-marco-MiniLM-L6-v2": "cross-encoder/ms-marco-MiniLM-L6-v2",
        "ms-marco-MiniLM-L12-v2": "cross-encoder/ms-marco-MiniLM-L12-v2",
        "ms-marco-TinyBERT-L2": "cross-encoder/ms-marco-TinyBERT-L2",
        "ms-marco-TinyBERT-L2-v2": "cross-encoder/ms-marco-TinyBERT-L2-v2",
        "ms-marco-TinyBERT-L4": "cross-encoder/ms-marco-TinyBERT-L4",
        "ms-marco-TinyBERT-L6": "cross-encoder/ms-marco-TinyBERT-L6",
        "ms-marco-electra-base": "cross-encoder/ms-marco-electra-base",
    }
}
```
Dictionary of alternative cross-encoder models available for use.
Organized by category (currently only 'ms_marco_models'), this dictionary maps friendly model names to their full HuggingFace model identifiers. These models can be selected via the --model command-line argument.
Model performance characteristics:

- TinyBERT models: Smallest and fastest, good for low-resource environments
- MiniLM models: Good balance of performance and efficiency
- ELECTRA models: Higher accuracy but more computationally intensive
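Since the dictionary nests models under category keys, code that needs a flat name-to-identifier mapping (as the model listing CLI does) has to collapse one level. The helper name `flatten_models` below is hypothetical, and the inlined dictionary is a two-entry subset of the real constant:

```python
from typing import Dict

# Two-entry subset of the ALTERNATIVE_MODELS constant documented above
ALTERNATIVE_MODELS: Dict[str, Dict[str, str]] = {
    "ms_marco_models": {
        "ms-marco-MiniLM-L6-v2": "cross-encoder/ms-marco-MiniLM-L6-v2",
        "ms-marco-TinyBERT-L6": "cross-encoder/ms-marco-TinyBERT-L6",
    },
}


def flatten_models(models: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Collapse category nesting into one name-to-identifier mapping."""
    return {
        name: identifier
        for category in models.values()
        for name, identifier in category.items()
    }
```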
ANIME_DATASET_PATH module-attribute ¶
Path to the merged anime dataset CSV file.
DEFAULT_BATCH_SIZE module-attribute ¶
Default batch size for model inference and data processing operations.
Larger batch sizes generally provide better performance up to hardware limits. This value may need adjustment based on available memory and processor capabilities.
MANGA_DATASET_PATH module-attribute ¶
Path to the merged manga dataset CSV file.
MODEL_NAME module-attribute ¶
Default cross-encoder model used for search operations.
This model offers a good balance between performance and accuracy for text ranking tasks. It's a MiniLM model with 6 layers, which makes it relatively lightweight while still providing good search results.
NUM_RESULTS module-attribute ¶
Default number of search results to return.
This controls how many top matches are returned when performing a search operation. Can be overridden via command-line arguments.