Utility Functions¶
This page documents the utility components of AniSearch Model.
Overview¶
The utils package provides various utility functions and constants used throughout the application:
- Display utilities for formatting and printing results
- Logging configuration
- Error handling utilities
- Constants and configuration
Display Utilities¶
Functions for formatting and displaying search results:
Formatting and display functions for the anime/manga search application.
This module provides utilities for displaying model information, formatting search results, and presenting information to users in a consistent and readable format. It's designed to be lightweight and not import any heavy ML dependencies, allowing it to be used for model listing without loading TensorFlow or PyTorch.
Features¶
- Score formatting that adapts to different model types
- Model listing capabilities for both pre-trained and fine-tuned models
- User-friendly console display functions
- Error handling for display operations
Usage Context¶
These utilities are primarily used in:
- CLI output formatting for search results
- Model listing for the '--list-models' CLI argument
- Interactive search result display
This module should NOT import any ML frameworks like TensorFlow or PyTorch,
as it's used for lightweight model listing without loading heavy dependencies.
display_available_models ¶
Display a formatted list of available models for searching or training.
This function prints a comprehensive list of available models to the console, organized by category. It displays both pre-trained models from the constants and fine-tuned models if provided. The output includes usage examples and a guide to help users select appropriate models for their needs.
The function is decorated with error handling to gracefully handle any exceptions that might occur during the display process.
| PARAMETER | DESCRIPTION |
| --- | --- |
| `fine_tuned_models` | Optional dictionary mapping model names to their paths. If provided, these fine-tuned models will be displayed in a separate section. If `None`, only pre-trained models will be shown. TYPE: `Optional[Dict[str, str]]` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `None` | This function prints information to the console but doesn't return any values. |
Example Output
```
Available Pre-trained Cross-Encoder Models:
==========================================
MS_MARCO_MODELS:
  ms-marco-MiniLM-L6-v2: cross-encoder/ms-marco-MiniLM-L6-v2
  ms-marco-TinyBERT-L6: cross-encoder/ms-marco-TinyBERT-L6
  ...

Available Fine-tuned Models:
============================
  anime-search-v1: model/fine-tuned/anime-search-v1
  ...

Usage example:
python src/main.py search --type anime --query "Your query" --model "cross-encoder/ms-marco-MiniLM-L-6-v2"

To use a fine-tuned model:
python src/main.py search --type anime --query "Your query" --model "model/fine-tuned/your-model-name"

Model selection guide:
- TinyBERT models: Smallest and fastest, good for low-resource environments
- MiniLM models: Good balance of performance and efficiency
...
```
Notes
- The function accesses ALTERNATIVE_MODELS from constants.py
- The output includes usage examples tailored to the available models
- The model selection guide helps users choose appropriate models
- Error handling is provided by the @handle_exceptions decorator
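The behavior described above can be sketched roughly as follows. This is a minimal reconstruction from the documented output, not the actual source; the inlined `ALTERNATIVE_MODELS` subset stands in for the full mapping in `constants.py`:

```python
# Abbreviated stand-in for the full mapping defined in src/utils/constants.py
ALTERNATIVE_MODELS = {
    "ms_marco_models": {
        "ms-marco-MiniLM-L6-v2": "cross-encoder/ms-marco-MiniLM-L6-v2",
        "ms-marco-TinyBERT-L6": "cross-encoder/ms-marco-TinyBERT-L6",
    },
}


def display_available_models(fine_tuned_models=None):
    """Print pre-trained model categories, plus fine-tuned models if given."""
    print("Available Pre-trained Cross-Encoder Models:")
    print("=" * 43)
    for category, models in ALTERNATIVE_MODELS.items():
        print(f"{category.upper()}:")
        for name, identifier in models.items():
            print(f"  {name}: {identifier}")
    if fine_tuned_models:
        print("\nAvailable Fine-tuned Models:")
        print("=" * 28)
        for name, path in fine_tuned_models.items():
            print(f"  {name}: {path}")
```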
Source code in src/utils/display.py
format_score ¶
Format a model's relevance score for user-friendly display.
This function formats search result scores differently based on the model type and normalization settings. It handles two main cases:
- MS Marco models, or other models with scores normalized to the 0-1 range: these are displayed as percentages for intuitive interpretation
- Other models with unnormalized scores: these are displayed as raw float values with fixed precision
| PARAMETER | DESCRIPTION |
| --- | --- |
| `score` | The raw relevance score from the model, typically a float between 0 and 1 for normalized models, or any float range for others. TYPE: `float` |
| `normalize_scores` | Boolean flag indicating whether the scores are normalized to the 0-1 range. This affects the display format. TYPE: `bool` |
| `model_name` | Name of the model that produced the score, used to detect specific model types that require special formatting. TYPE: `str` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `str` | A formatted string representation of the score, either as a percentage (e.g., "95.2% relevance") or as a raw score (e.g., "score: 0.952"). |
Examples:
```python
# Format a score from an MS Marco model
formatted = format_score(0.952, True, "cross-encoder/ms-marco-MiniLM-L-6-v2")
# Result: "95.2% relevance"

# Format a score from a non-normalized model
formatted = format_score(4.73, False, "cross-encoder/stsb-roberta-base")
# Result: "score: 4.730"
```
Notes
- MS Marco models are automatically detected by checking if "ms-marco" appears in the model name (case-insensitive)
- Percentage format shows one decimal place (e.g., 95.2%)
- Raw score format shows three decimal places (e.g., 4.730)
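Putting the notes above together, the formatting logic might look like the sketch below. It is reconstructed from the documented behavior, not copied from `src/utils/display.py`; in particular, treating MS Marco detection as an alternative to the `normalize_scores` flag is an assumption:

```python
def format_score(score: float, normalize_scores: bool, model_name: str) -> str:
    """Format a relevance score for display, following the rules above."""
    # MS Marco models are detected by case-insensitive substring match
    if normalize_scores or "ms-marco" in model_name.lower():
        # Normalized scores are shown as a percentage with one decimal place
        return f"{score * 100:.1f}% relevance"
    # Unnormalized scores are shown raw with three decimal places
    return f"score: {score:.3f}"
```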
Source code in src/utils/display.py
list_fine_tuned_models ¶
List all available fine-tuned models in the model directory.
This function scans the fine-tuned model directory for valid model folders, identifying them by the presence of a config.json file. It returns a mapping of model names to their full paths, which can be used to load the models.
This is a lightweight implementation that doesn't import heavy ML frameworks like TensorFlow or PyTorch, making it suitable for quick model listing without the overhead of loading these dependencies.
| RETURNS | DESCRIPTION |
| --- | --- |
| `Dict[str, str]` | A dictionary where keys are the model directory names (model identifiers) and values are the full paths to the model directories. Returns an empty dictionary if no models are found or if the model directory doesn't exist. |
Examples:
```python
# Get a dictionary of available fine-tuned models
models = list_fine_tuned_models()

# Print the available models
if models:
    print("Available fine-tuned models:")
    for name, path in models.items():
        print(f"- {name}: {path}")
else:
    print("No fine-tuned models found")
```
Notes
- Models are identified by the presence of a config.json file
- The default search location is the MODEL_SAVE_PATH constant ("model/fine-tuned/")
- This function does not validate that the models are functional or compatible
- Use this function before attempting to load a fine-tuned model to check availability
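Given the notes above, a minimal sketch of the scan might look like this. It is an assumption-based reconstruction: the real function takes no arguments and reads `MODEL_SAVE_PATH` from `constants.py`, while this sketch accepts the base path as a parameter for testability:

```python
from pathlib import Path
from typing import Dict

# Assumed default; the docs name this constant MODEL_SAVE_PATH in constants.py
MODEL_SAVE_PATH = "model/fine-tuned/"


def list_fine_tuned_models(base_path: str = MODEL_SAVE_PATH) -> Dict[str, str]:
    """Map model directory names to paths; a config.json marks a valid model."""
    base = Path(base_path)
    if not base.is_dir():
        # Missing model directory yields an empty mapping, not an error
        return {}
    return {
        entry.name: str(entry)
        for entry in base.iterdir()
        if entry.is_dir() and (entry / "config.json").is_file()
    }
```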
Source code in src/utils/display.py
Logging Configuration¶
Setup and configuration of application logging:
Centralized logging setup for the anime/manga search application.
This module provides a standardized logging configuration to ensure consistent log formatting, appropriate log levels, and proper output handling across all application components. It implements a simple, reusable logging setup that can be called during application initialization.
Features¶
- Consistent timestamp and log level formatting
- Console output through StreamHandler
- Error handling for logging setup via handle_exceptions decorator
- INFO level logging by default for appropriate verbosity
Usage Context¶
The logging configuration is typically initialized at application startup:
- Called in the main entry point before any other operations
- Used by error handling utilities to log exceptions
- Available for all modules to use for consistent logging
Having a centralized logging configuration ensures that all components produce logs in a consistent format, making debugging and monitoring easier.
setup_logging ¶
Configure logging for the application with standardized formatting.
This function initializes the Python logging system with consistent formatting, appropriate log levels, and console output. It sets up:
- INFO level logging for moderate verbosity
- Timestamp, level, and message formatting
- Console output through a StreamHandler
The function is decorated with handle_exceptions to ensure that any issues during logging configuration are properly captured and reported.
| RETURNS | DESCRIPTION |
| --- | --- |
| `None` | This function configures the global logging system but doesn't return any value. |
Notes
- This function should be called once at application startup
- The log format includes timestamp, log level, and message
- The default level (INFO) can be overridden through environment variables if the standard logging configuration mechanisms are used
Source code in src/utils/logging_config.py
Error Handling¶
Utilities for centralized error handling:
Error Handling Utilities¶
Standardized error handling functionality for the anime/manga search application.
This module provides reusable decorators and utilities to maintain consistent error handling across the application. It implements a centralized approach to catching, logging, and reporting exceptions, allowing for more maintainable and user-friendly error handling.
Features¶
- Decorator-based exception handling for consistent behavior
- Configurable error logging with severity control
- Customizable user-friendly error messages for CLI applications
- Type-safe implementation with proper generic typing
Usage Context¶
These utilities are primarily used for:
- Wrapping IO-heavy functions that may encounter file or network issues
- Handling user input validation in CLI commands
- Gracefully managing expected exceptions in model loading and inference
- Providing informative error messages in both CLI and logging contexts
By centralizing error handling logic, the application maintains consistent behavior across different components and provides better debugging information when issues occur.
COMMON_EXCEPTIONS module-attribute ¶

```python
COMMON_EXCEPTIONS: Dict[Type[Exception], str] = {
    ValueError: "Invalid value or parameter",
    KeyError: "Missing data in results",
    ImportError: "Failed to import required module",
    RuntimeError: "Runtime error during operation",
    MemoryError: "Insufficient memory to process this operation",
    FileNotFoundError: "Required model or data file not found",
    PermissionError: "Permission denied when accessing files",
    TimeoutError: "Operation timed out",
    ConnectionError: "Network connection error",
}
```
handle_exceptions ¶
```python
handle_exceptions(
    cli_mode: bool = False,
    exceptions: Optional[List[Type[Exception]]] = None,
    log_exceptions: bool = True,
    include_exc_info: bool = False,
    reraise: bool = True,
) -> Callable[[F], F]
```
Decorator for handling exceptions in a standardized way across the application.
This decorator wraps functions to provide consistent exception handling, including:
- Catching specified exceptions or all common exceptions by default
- Logging exceptions with configurable verbosity
- Presenting user-friendly error messages in CLI mode
- Optionally re-raising exceptions after handling
The decorator can be configured for different contexts (CLI vs. background processing) and adjusted for different levels of verbosity and strictness.
| PARAMETER | DESCRIPTION |
| --- | --- |
| `cli_mode` | Whether to print user-friendly messages to the console. When `True`, error messages are formatted for end users and printed to stdout. When `False`, errors are only logged (if `log_exceptions` is `True`). Default is `False`. TYPE: `bool` |
| `exceptions` | List of exception types to catch explicitly. If `None` (default), uses all exceptions defined in COMMON_EXCEPTIONS. Specify a subset of exceptions to handle only specific error types. TYPE: `Optional[List[Type[Exception]]]` |
| `log_exceptions` | Whether to log the exceptions to the application logger. When `True`, exceptions are logged using the module's logger. When `False`, exceptions are not logged (useful when handled elsewhere). Default is `True`. TYPE: `bool` |
| `include_exc_info` | Whether to include exception traceback in logs. When `True`, full exception traceback is included in log messages. When `False`, only the exception message is logged. Default is `False`. TYPE: `bool` |
| `reraise` | Whether to re-raise the exception after handling. When `True`, the exception is re-raised after logging/printing. When `False`, the function returns `None` instead of re-raising. Default is `True`. TYPE: `bool` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `Callable[[F], F]` | A decorator function that wraps the target function with the specified exception handling behavior. |
Examples:
CLI-friendly error handling without reraising:

```python
@handle_exceptions(cli_mode=True, reraise=False)
def process_user_input(user_input):
    # Process user input, potentially raising exceptions
    return validated_input
```

Handling only specific exceptions:

```python
@handle_exceptions(
    exceptions=[FileNotFoundError, PermissionError],
    cli_mode=True,
)
def read_config_file(filepath):
    with open(filepath) as f:
        return f.read()
```
Notes
- When an exception is caught but not reraised (reraise=False), the function returns None. Callers should handle this case appropriately.
- In CLI mode, caught exceptions generate user-friendly error messages based on the COMMON_EXCEPTIONS mapping.
- Unexpected exceptions (not in the exceptions list) are always logged with full traceback information, regardless of the include_exc_info setting.
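From the documented parameters and notes, the decorator's core could look roughly like the sketch below. It is a simplified reconstruction, not the project's implementation, and `COMMON_EXCEPTIONS` is abbreviated to two entries:

```python
import functools
import logging
from typing import Callable, List, Optional, Type, TypeVar

logger = logging.getLogger(__name__)
F = TypeVar("F", bound=Callable)

# Abbreviated version of the COMMON_EXCEPTIONS mapping documented above
COMMON_EXCEPTIONS = {
    ValueError: "Invalid value or parameter",
    FileNotFoundError: "Required model or data file not found",
}


def handle_exceptions(
    cli_mode: bool = False,
    exceptions: Optional[List[Type[Exception]]] = None,
    log_exceptions: bool = True,
    include_exc_info: bool = False,
    reraise: bool = True,
) -> Callable[[F], F]:
    # Default to catching everything listed in COMMON_EXCEPTIONS
    caught = tuple(exceptions) if exceptions else tuple(COMMON_EXCEPTIONS)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except caught as exc:
                message = COMMON_EXCEPTIONS.get(type(exc), "Unexpected error")
                if log_exceptions:
                    logger.error("%s: %s", message, exc, exc_info=include_exc_info)
                if cli_mode:
                    # User-friendly message for CLI contexts
                    print(f"Error: {message} ({exc})")
                if reraise:
                    raise
                return None  # Swallowed exception: caller must handle None

        return wrapper

    return decorator
```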
Source code in src/utils/error_handling.py
Constants¶
Configuration constants and default values:
Application Constants¶
Centralized configuration constants for the anime/manga search application.
This module defines all the global constants used throughout the application, including default paths, model configurations, and search parameters. Centralizing these values makes it easier to maintain consistent settings across the application and simplifies configuration changes.
Constants Categories¶
- Dataset Paths: Locations of the merged anime and manga datasets
- Model Configuration: Default model name, batch size, and result count
- Alternative Models: Dictionary of other models that can be used
Usage Context¶
These constants are imported and used throughout the application:
- Model initialization uses the model name and dataset paths
- Search operations use the default number of results and batch size
- Model listing commands use the alternative models dictionary
Using constants instead of hard-coded values improves maintainability and ensures consistency across the application.
ALTERNATIVE_MODELS module-attribute ¶

```python
ALTERNATIVE_MODELS: Dict[str, Dict[str, str]] = {
    "ms_marco_models": {
        "ms-marco-MiniLM-L2-v2": "cross-encoder/ms-marco-MiniLM-L2-v2",
        "ms-marco-MiniLM-L4-v2": "cross-encoder/ms-marco-MiniLM-L4-v2",
        "ms-marco-MiniLM-L6-v2": "cross-encoder/ms-marco-MiniLM-L6-v2",
        "ms-marco-MiniLM-L12-v2": "cross-encoder/ms-marco-MiniLM-L12-v2",
        "ms-marco-TinyBERT-L2": "cross-encoder/ms-marco-TinyBERT-L2",
        "ms-marco-TinyBERT-L2-v2": "cross-encoder/ms-marco-TinyBERT-L2-v2",
        "ms-marco-TinyBERT-L4": "cross-encoder/ms-marco-TinyBERT-L4",
        "ms-marco-TinyBERT-L6": "cross-encoder/ms-marco-TinyBERT-L6",
        "ms-marco-electra-base": "cross-encoder/ms-marco-electra-base",
    }
}
```
Dictionary of alternative cross-encoder models available for use.
Organized by category (currently only 'ms_marco_models'), this dictionary maps friendly model names to their full HuggingFace model identifiers. These models can be selected via the --model command-line argument.
Model performance characteristics:

- TinyBERT models: Smallest and fastest, good for low-resource environments
- MiniLM models: Good balance of performance and efficiency
- ELECTRA models: Higher accuracy but more computationally intensive
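Since the dictionary nests models under category keys, code that needs a flat name-to-identifier mapping (as the model listing CLI does) has to collapse one level. The helper name `flatten_models` below is hypothetical, and the inlined dictionary is a two-entry subset of the real constant:

```python
from typing import Dict

# Two-entry subset of the ALTERNATIVE_MODELS constant documented above
ALTERNATIVE_MODELS: Dict[str, Dict[str, str]] = {
    "ms_marco_models": {
        "ms-marco-MiniLM-L6-v2": "cross-encoder/ms-marco-MiniLM-L6-v2",
        "ms-marco-TinyBERT-L6": "cross-encoder/ms-marco-TinyBERT-L6",
    },
}


def flatten_models(models: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Collapse category nesting into one name-to-identifier mapping."""
    return {
        name: identifier
        for category in models.values()
        for name, identifier in category.items()
    }
```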
ANIME_DATASET_PATH module-attribute ¶
Path to the merged anime dataset CSV file.
DEFAULT_BATCH_SIZE module-attribute ¶
Default batch size for model inference and data processing operations.
Larger batch sizes generally provide better performance up to hardware limits. This value may need adjustment based on available memory and processor capabilities.
MANGA_DATASET_PATH module-attribute ¶
Path to the merged manga dataset CSV file.
MODEL_NAME module-attribute ¶
Default cross-encoder model used for search operations.
This model offers a good balance between performance and accuracy for text ranking tasks. It's a MiniLM model with 6 layers, which makes it relatively lightweight while still providing good search results.
NUM_RESULTS module-attribute ¶
Default number of search results to return.
This controls how many top matches are returned when performing a search operation. Can be overridden via command-line arguments.