# API
This module implements a Flask application that provides API endpoints for finding similar anime or manga descriptions.
The application uses Sentence Transformers and custom models to encode descriptions and calculate cosine similarities. It supports multiple synopsis columns from different datasets and returns paginated results of the most similar items.
## Key Features
- Supports multiple pre-trained and custom Sentence Transformer models
- Handles both anime and manga similarity searches
- Implements rate limiting and CORS
- Provides memory management for GPU resources
- Includes comprehensive logging
- Returns paginated results with similarity scores
The API endpoints are:

- `POST /anisearchmodel/anime`: find similar anime based on a description
- `POST /anisearchmodel/manga`: find similar manga based on a description
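A request to either endpoint can be sketched as follows. The host, port, and example description are assumptions; the field names follow the expected JSON payload documented below:

```python
import json

# Hypothetical payload for POST /anisearchmodel/anime; the model name is one
# of the entries from allowed_models.
payload = {
    "model": "sentence-transformers/all-mpnet-base-v2",
    "description": "A young pirate sets sail to find a legendary treasure.",
    "page": 1,
    "resultsPerPage": 10,
}

body = json.dumps(payload)

# Equivalent curl invocation (assuming a local deployment on port 5000):
#   curl -X POST http://localhost:5000/anisearchmodel/anime \
#        -H "Content-Type: application/json" -d "$body"
print(body)
```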
### allowed_models `module-attribute`

```python
allowed_models = [
    'sentence-transformers/all-distilroberta-v1',
    'sentence-transformers/all-MiniLM-L6-v1',
    'sentence-transformers/all-MiniLM-L12-v1',
    'sentence-transformers/all-MiniLM-L6-v2',
    'sentence-transformers/all-MiniLM-L12-v2',
    'sentence-transformers/all-mpnet-base-v1',
    'sentence-transformers/all-mpnet-base-v2',
    'sentence-transformers/all-roberta-large-v1',
    'sentence-transformers/gtr-t5-base',
    'sentence-transformers/gtr-t5-large',
    'sentence-transformers/gtr-t5-xl',
    'sentence-transformers/multi-qa-distilbert-dot-v1',
    'sentence-transformers/multi-qa-mpnet-base-cos-v1',
    'sentence-transformers/multi-qa-mpnet-base-dot-v1',
    'sentence-transformers/paraphrase-distilroberta-base-v2',
    'sentence-transformers/paraphrase-mpnet-base-v2',
    'sentence-transformers/sentence-t5-base',
    'sentence-transformers/sentence-t5-large',
    'sentence-transformers/sentence-t5-xl',
    'sentence-transformers/sentence-t5-xxl',
    'toobi/anime',
    'sentence-transformers/fine_tuned_sbert_anime_model',
    'fine_tuned_sbert_anime_model',
    'fine_tuned_sbert_model_anime',
]
```
### anime_synopsis_columns `module-attribute`

```python
anime_synopsis_columns = [
    'synopsis',
    'Synopsis anime_dataset_2023',
    'Synopsis animes dataset',
    'Synopsis anime_270 Dataset',
    'Synopsis Anime-2022 Dataset',
    'Synopsis anime4500 Dataset',
    'Synopsis wykonos Dataset',
    'Synopsis Anime_data Dataset',
    'Synopsis anime2 Dataset',
    'Synopsis mal_anime Dataset',
]
```
### device `module-attribute`

### file_formatter `module-attribute`

### file_handler `module-attribute`

```python
file_handler = ConcurrentRotatingFileHandler(
    './logs/api.log',
    maxBytes=10 * 1024 * 1024,
    backupCount=10,
    encoding='utf-8',
)
```

### limiter `module-attribute`

```python
limiter = Limiter(get_remote_address, app=app, default_limits=['1 per second'])
```

### manga_synopsis_columns `module-attribute`

### stream_formatter `module-attribute`
### calculate_cosine_similarities

```python
calculate_cosine_similarities(
    model: SentenceTransformer | CustomT5EncoderModel,
    model_name: str,
    new_embedding: ndarray,
    col: str,
    dataset_type: str,
) -> ndarray
```

Calculates cosine similarities between a new embedding and existing embeddings.

This function:

- Loads pre-computed embeddings for the specified column
- Verifies that the embedding dimensions match
- Computes cosine similarity scores using the GPU if available
| PARAMETER | DESCRIPTION |
|---|---|
| `model` | The transformer model used for encoding. **TYPE:** `SentenceTransformer \| CustomT5EncoderModel` |
| `model_name` | Name of the model. **TYPE:** `str` |
| `new_embedding` | Embedding vector of the input description. **TYPE:** `ndarray` |
| `col` | Name of the synopsis column. **TYPE:** `str` |
| `dataset_type` | Type of dataset (`'anime'` or `'manga'`). **TYPE:** `str` |

| RETURNS | DESCRIPTION |
|---|---|
| `ndarray` | Array of cosine similarity scores between the new embedding and all existing embeddings |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If embedding dimensions don't match |
Source code in src/api.py
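The core computation can be sketched with NumPy. This is a simplified stand-in for the actual implementation, which loads pre-computed embeddings per column and may run on GPU:

```python
import numpy as np

def cosine_similarities(new_embedding: np.ndarray, embeddings: np.ndarray) -> np.ndarray:
    """Cosine similarity between one vector and each row of `embeddings`."""
    if new_embedding.shape[0] != embeddings.shape[1]:
        # Mirrors the dimension check described above.
        raise ValueError("Embedding dimensions do not match")
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(new_embedding)
    return embeddings @ new_embedding / norms

stored = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy stored embeddings
query = np.array([1.0, 0.0])
scores = cosine_similarities(query, stored)
print(scores.round(4))  # identical vector scores 1.0
```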
### clear_memory

Frees up system memory and GPU cache.

This function performs two cleanup operations:

- Empties the GPU cache if CUDA is being used
- Runs Python's garbage collector to free memory
Source code in src/api.py
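A minimal sketch of the two cleanup steps, with the GPU step guarded so the code also runs where PyTorch is not installed:

```python
import gc

def clear_memory() -> None:
    """Sketch of the cleanup described above; the torch import is optional here."""
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached GPU memory blocks
    except ImportError:
        pass  # no GPU stack installed; skip the CUDA step
    gc.collect()  # reclaim unreferenced Python objects

clear_memory()
```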
### find_top_similarities

```python
find_top_similarities(
    cosine_similarities_dict: Dict[str, ndarray],
    num_similarities: int = 10,
) -> List[Tuple[int, str]]
```

Finds the top N most similar descriptions across all synopsis columns.

This function:

- Processes similarity scores from all columns
- Sorts them in descending order
- Returns indices and column names for the top matches
| PARAMETER | DESCRIPTION |
|---|---|
| `cosine_similarities_dict` | Dictionary mapping column names to arrays of similarity scores. **TYPE:** `Dict[str, ndarray]` |
| `num_similarities` | Number of top similarities to return (default: 10). **TYPE:** `int` |

| RETURNS | DESCRIPTION |
|---|---|
| `List[Tuple[int, str]]` | List of `(index, column_name)` tuples for the top similar descriptions, sorted by similarity score in descending order |
Source code in src/api.py
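The selection logic can be sketched as follows (a simplified version; the actual implementation lives in src/api.py):

```python
import numpy as np
from typing import Dict, List, Tuple

def find_top_similarities(
    cosine_similarities_dict: Dict[str, np.ndarray], num_similarities: int = 10
) -> List[Tuple[int, str]]:
    # Flatten (index, column, score) triples from every column...
    scored = [
        (idx, col, float(score))
        for col, scores in cosine_similarities_dict.items()
        for idx, score in enumerate(scores)
    ]
    # ...sort by score descending and keep the top N (index, column) pairs.
    scored.sort(key=lambda t: t[2], reverse=True)
    return [(idx, col) for idx, col, _ in scored[:num_similarities]]

sims = {"synopsis": np.array([0.2, 0.9]), "Synopsis animes dataset": np.array([0.5])}
top = find_top_similarities(sims, num_similarities=2)
print(top)  # [(1, 'synopsis'), (0, 'Synopsis animes dataset')]
```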
### get_anime_similarities

API endpoint for finding similar anime based on a description.

This endpoint:

- Validates the request payload
- Processes the description using the specified model
- Returns paginated results of similar anime

Expected JSON payload:

```
{
    "model": str,          # Name of the model to use
    "description": str,    # Input description to find similarities for
    "page": int,           # Optional: Page number (default: 1)
    "resultsPerPage": int  # Optional: Results per page (default: 10)
}
```
| RETURNS | DESCRIPTION |
|---|---|
| `Response` | JSON response containing the paginated list of similar anime |

| RAISES | DESCRIPTION |
|---|---|
| `400` | If request validation fails |
| `500` | If an internal processing error occurs |
Source code in src/api.py
### get_manga_similarities

API endpoint for finding similar manga based on a description.

This endpoint:

- Validates the request payload
- Processes the description using the specified model
- Returns paginated results of similar manga

Expected JSON payload:

```
{
    "model": str,          # Name of the model to use
    "description": str,    # Input description to find similarities for
    "page": int,           # Optional: Page number (default: 1)
    "resultsPerPage": int  # Optional: Results per page (default: 10)
}
```
| RETURNS | DESCRIPTION |
|---|---|
| `Response` | JSON response containing the paginated list of similar manga |

| RAISES | DESCRIPTION |
|---|---|
| `400` | If request validation fails |
| `500` | If an internal processing error occurs |
Source code in src/api.py
### get_similarities

```python
get_similarities(
    model_name: str,
    description: str,
    dataset_type: str,
    page: int = 1,
    results_per_page: int = 10,
) -> List[Dict[str, Any]]
```

Finds the most similar descriptions in the specified dataset.

This function:

- Loads and validates the appropriate model
- Encodes the input description
- Calculates similarities with all stored descriptions
- Returns paginated results with metadata
| PARAMETER | DESCRIPTION |
|---|---|
| `model_name` | Name of the model to use. **TYPE:** `str` |
| `description` | Input description to find similarities for. **TYPE:** `str` |
| `dataset_type` | Type of dataset (`'anime'` or `'manga'`). **TYPE:** `str` |
| `page` | Page number for pagination (default: 1). **TYPE:** `int` |
| `results_per_page` | Number of results per page (default: 10). **TYPE:** `int` |

| RETURNS | DESCRIPTION |
|---|---|
| `List[Dict[str, Any]]` | List of dictionaries containing similar items with metadata and similarity scores |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If the model name is invalid or model loading fails |
Source code in src/api.py
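The final pagination step can be sketched as follows. The parameter names mirror the signature above; the encoding and similarity steps are stubbed out, and `paginate` is a hypothetical helper, not a function from the source:

```python
from typing import Any, Dict, List

def paginate(
    results: List[Dict[str, Any]], page: int = 1, results_per_page: int = 10
) -> List[Dict[str, Any]]:
    # Results are assumed to be sorted by similarity score already.
    start = (page - 1) * results_per_page
    return results[start:start + results_per_page]

# Toy ranked results standing in for the real metadata dictionaries.
ranked = [{"rank": i, "similarity": 1.0 - i / 25} for i in range(25)]
page_two = paginate(ranked, page=2, results_per_page=10)
print(len(page_two), page_two[0]["rank"])  # 10 items, starting at rank 10
```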
### load_embeddings

Loads pre-computed embeddings for a specific model and dataset column.
| PARAMETER | DESCRIPTION |
|---|---|
| `model_name` | Name of the model used to generate the embeddings |
| `col` | Name of the synopsis column |
| `dataset_type` | Type of dataset (`'anime'` or `'manga'`) |

| RETURNS | DESCRIPTION |
|---|---|
| `ndarray` | NumPy array containing the pre-computed embeddings |

| RAISES | DESCRIPTION |
|---|---|
| `FileNotFoundError` | If the embeddings file doesn't exist |
Source code in src/api.py
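A minimal sketch of loading cached embeddings from disk. The path scheme here is an assumption for illustration, not the project's actual layout:

```python
import os
import tempfile
import numpy as np

def load_embeddings(model_name: str, col: str, dataset_type: str, base_dir: str) -> np.ndarray:
    # Hypothetical path scheme: <base_dir>/<dataset_type>/<model>/<col>.npy
    path = os.path.join(base_dir, dataset_type, model_name.replace("/", "_"), f"{col}.npy")
    if not os.path.exists(path):
        raise FileNotFoundError(f"Embeddings file not found: {path}")
    return np.load(path)

# Round-trip demonstration with a temporary directory.
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "anime", "toobi_anime")
    os.makedirs(target)
    np.save(os.path.join(target, "synopsis.npy"), np.ones((3, 4)))
    emb = load_embeddings("toobi/anime", "synopsis", "anime", base)
    print(emb.shape)  # (3, 4)
```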
### periodic_memory_clear

Runs a background thread that periodically cleans up memory.
The thread monitors the time since the last API request. If no requests have been made for over 300 seconds (5 minutes), it triggers memory cleanup to free resources.
The function runs indefinitely until the application is shut down.
Source code in src/api.py
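The idle check at the heart of that loop can be sketched as follows. The 300-second threshold comes from the description above; the loop, locking, and cleanup call are simplified out:

```python
import time
from typing import Optional

IDLE_THRESHOLD_SECONDS = 300  # 5 minutes, per the description above

def should_clear_memory(last_request_time: float, now: Optional[float] = None) -> bool:
    """Return True when no request has arrived within the idle threshold."""
    if now is None:
        now = time.time()
    return now - last_request_time > IDLE_THRESHOLD_SECONDS

# In the real background thread this predicate would be polled, roughly:
#   while True:
#       if should_clear_memory(last_request_time):
#           clear_memory()
#       time.sleep(poll_interval)
print(should_clear_memory(0.0, now=301.0))  # idle for over 5 minutes
```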
### update_last_request_time

Updates the last request time to the current time in a thread-safe manner.
This function is used to track when the last API request was made, which helps with memory management and cleanup of unused resources.
Source code in src/api.py
### validate_input

Validates the input data for API requests.

This function checks that:

- Both the model name and description are provided
- The description length is within acceptable limits
- The specified model is in the list of allowed models
| PARAMETER | DESCRIPTION |
|---|---|
| `data` | Dictionary containing the request data with 'model' and 'description' keys |

| RAISES | DESCRIPTION |
|---|---|
| `HTTPException` | If any validation check fails, with an appropriate error message and status code |
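The checks can be sketched as follows. The 1000-character limit, the truncated model list, and the `ValidationError` class are assumptions standing in for the source's own limit and its `HTTPException`:

```python
from typing import Any, Dict

MAX_DESCRIPTION_LENGTH = 1000  # assumed limit; the source defines its own

# Truncated stand-in for the module's allowed_models list.
allowed_models = ["toobi/anime", "sentence-transformers/all-mpnet-base-v2"]

class ValidationError(Exception):
    """Stand-in for the HTTPException raised by the real endpoint."""
    def __init__(self, message: str, status_code: int = 400):
        super().__init__(message)
        self.status_code = status_code

def validate_input(data: Dict[str, Any]) -> None:
    if not data.get("model") or not data.get("description"):
        raise ValidationError("Both 'model' and 'description' are required")
    if len(data["description"]) > MAX_DESCRIPTION_LENGTH:
        raise ValidationError("Description is too long")
    if data["model"] not in allowed_models:
        raise ValidationError("Invalid model")

# A well-formed request passes silently.
validate_input({"model": "toobi/anime", "description": "A quiet slice-of-life story."})
```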