API Server Implementation

This page provides a detailed technical reference for the AniSearch API Server implementation.

Module Overview

The src.api module implements a FastAPI server that exposes the AniSearch Model functionality through RESTful HTTP endpoints. It's designed to be scalable, configurable, and production-ready.

API Reference

AniSearch API Server

A FastAPI server that exposes the AniSearch functionality through HTTP endpoints.

This module provides a REST API for searching anime and manga datasets using cross-encoder models for semantic similarity. It allows clients to:

  1. Search for anime matching a description
  2. Search for manga matching a description
  3. List available models
  4. Get health check status

Features

  • RESTful API: Clean, standards-compliant API design
  • Interactive Documentation: Automatic OpenAPI/Swagger UI at /docs
  • CORS Support: Configurable cross-origin resource sharing
  • Multi-worker Architecture: Handles concurrent requests efficiently
  • Model Caching: Avoids reloading models for each request
  • Route Restrictions: Configurable endpoint enabling/disabling for production
  • Custom Performance Settings: Configurable worker count and connection limits

API Endpoints

Endpoint       Method  Description
/              GET     Health check and CUDA availability
/models        GET     List available models and fine-tuned models
/search/anime  POST    Search for anime matching a description
/search/manga  POST    Search for manga matching a description

Server Usage

# Basic usage
python -m src.api

# With custom settings
python -m src.api --host=127.0.0.1 --port=9000 --workers=4

# Production mode with restricted routes
python -m src.api --enable-routes=search --cors-origins="https://yourdomain.com"
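
Once the server is up, any HTTP client can talk to it. A minimal client sketch using the third-party requests library (an assumption; any HTTP client works) against the default host and port:

import requests

# POST a search request; the body follows the SearchRequest model below
resp = requests.post(
    "http://localhost:8000/search/anime",
    json={"query": "A story about robots and AI", "num_results": 5},
)
resp.raise_for_status()
for item in resp.json()["results"]:
    print(f'{item["title"]}: {item["score"]:.2f}')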

GPU Acceleration

For optimal performance, especially with larger models, using a GPU is recommended. To enable GPU support, install PyTorch with CUDA:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

You can then specify device=cuda in your API requests to utilize GPU acceleration.
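
Before requesting device=cuda, you can confirm that PyTorch actually sees a GPU:

import torch

print(torch.cuda.is_available())  # True if a usable CUDA device is present
print(torch.cuda.device_count())  # Number of visible CUDA devices

If this prints False, requests that ask for CUDA will fall back to CPU with a warning (see get_or_create_model below).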

all_routes_enabled module-attribute

all_routes_enabled = 'all' in enabled_routes

app module-attribute

app = FastAPI(title='AniSearch API', description='API for searching anime and manga using semantic similarity', version='1.0.0')

args module-attribute

args = parse_args()

enabled_routes module-attribute

enabled_routes = [route.lower() for route in args.enable_routes.split(',')]

headers module-attribute

headers = [header.strip() for header in cors_headers.split(',')] if cors_headers != '*' else ['*']

logger module-attribute

logger = getLogger(__name__)

methods module-attribute

methods = [method.strip() for method in cors_methods.split(',')] if cors_methods != '*' else ['*']

model_cache module-attribute

model_cache: Dict[str, BaseSearchModel] = {}
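
As the get_or_create_model source below shows, each cache entry is keyed by the full model configuration, so every distinct combination gets its own model instance:

key = f"{dataset_type}_{model_name}_{selected_device}_{include_light_novels}"
# e.g. "anime_cross-encoder/ms-marco-MiniLM-L-6-v2_cpu_False"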

num_workers module-attribute

num_workers = max(1, int(cpu_count() / 2))

origins module-attribute

origins = [origin.strip() for origin in cors_origins.split(',')] if cors_origins != '*' else ['*']

parser module-attribute

parser = ArgumentParser(description='AniSearch API Server')

restricted_app module-attribute

restricted_app = FastAPI(title=app.title, description=app.description, version=app.version)

safe_path module-attribute

safe_path = temp_module_path.replace('\\', '/')

temp_module_path module-attribute

temp_module_path = join(temp_dir, 'temp_api.py')

HealthResponse

Bases: BaseModel

Response model for the health check endpoint.

This model defines the structure of the response returned by the health check endpoint. It includes the overall API status, the status of each model type, and information about CUDA availability.

ATTRIBUTE DESCRIPTION
status

Overall status of the API ('healthy' or 'degraded')

TYPE: str

models_loaded

Dictionary of model types and their loading status

TYPE: Dict[str, bool]

cuda_available

Whether CUDA is available on the system

TYPE: bool

Example
health = HealthResponse(
    status="healthy",
    models_loaded={"anime": True, "manga": True},
    cuda_available=True
)

cuda_available class-attribute instance-attribute

cuda_available: bool = Field(..., description='Whether CUDA is available on this system')

models_loaded class-attribute instance-attribute

models_loaded: Dict[str, bool] = Field(..., description='Status of the search models')

status class-attribute instance-attribute

status: str = Field(..., description='Health status of the API')

ModelsResponse

Bases: BaseModel

Response model for the models endpoint.

This model defines the structure of the response returned by the models endpoint. It includes information about available pre-trained models and any fine-tuned models.

ATTRIBUTE DESCRIPTION
models

Dictionary of model categories and available models

TYPE: Dict[str, Dict[str, str]]

fine_tuned

Dictionary of fine-tuned model names and their paths

TYPE: Dict[str, str]

Example
models = ModelsResponse(
    models={
        "Semantic Search": {
            "cross-encoder/ms-marco-MiniLM-L-6-v2": "Recommended for general search"
        }
    },
    fine_tuned={
        "anime-v1": "model/fine-tuned/anime-v1"
    }
)

fine_tuned class-attribute instance-attribute

fine_tuned: Dict[str, str] = Field(..., description='Available fine-tuned models')

models class-attribute instance-attribute

models: Dict[str, Dict[str, str]] = Field(..., description='Available models by category')

SearchRequest

Bases: BaseModel

Request model for anime and manga search endpoints.

This model defines the required and optional parameters for search requests. It includes validation rules to ensure the parameters are within acceptable ranges.

ATTRIBUTE DESCRIPTION
query

The search query text describing the anime/manga to find

TYPE: str

num_results

Number of results to return (default: 5)

TYPE: int

batch_size

Batch size used by the model when processing the search (default: 32)

TYPE: int

Example
search_request = SearchRequest(
    query="A story about a young wizard learning magic",
    num_results=10,
    batch_size=64
)

batch_size class-attribute instance-attribute

batch_size: int = Field(32, description='Batch size for processing', ge=8, le=512)

num_results class-attribute instance-attribute

num_results: int = Field(5, description='Number of results to return', ge=1, le=100)

query class-attribute instance-attribute

query: str = Field(..., description='The search query text', min_length=1)

SearchResponse

Bases: BaseModel

Response model for anime and manga search endpoints.

This model defines the structure of the response returned by the search endpoints. It includes the search results, execution time, and device used for computation.

ATTRIBUTE DESCRIPTION
results

List of search results sorted by relevance

TYPE: List[SearchResult]

execution_time_ms

Total execution time of the search in milliseconds

TYPE: float

device_used

The device used for computation (e.g., 'cpu', 'cuda')

TYPE: str

Example
response = SearchResponse(
    results=[
        SearchResult(id=1535, title="Death Note", score=0.92, synopsis="..."),
        SearchResult(id=5114, title="Fullmetal Alchemist", score=0.85, synopsis="...")
    ],
    execution_time_ms=156.32,
    device_used="cuda"
)

device_used class-attribute instance-attribute

device_used: str = Field(..., description='Device used for computation (CPU/CUDA)')

execution_time_ms class-attribute instance-attribute

execution_time_ms: float = Field(..., description='Execution time in milliseconds')

results class-attribute instance-attribute

results: List[SearchResult] = Field(..., description='Search results')

SearchResult

Bases: BaseModel

Individual search result item returned by the search endpoints.

This model represents a single anime or manga entry matched by the search. It includes the basic information needed to display the result to the user.

ATTRIBUTE DESCRIPTION
id

Unique identifier for the entry (anime_id or manga_id)

TYPE: Union[int, str]

title

Title of the anime/manga

TYPE: str

score

Relevance score between 0.0 and 1.0 (higher is more relevant)

TYPE: float

synopsis

Partial synopsis text (may be truncated for display)

TYPE: str

Example
result = SearchResult(
    id=1535,
    title="Death Note",
    score=0.92,
    synopsis="Light Yagami is a genius high schooler who discovers..."
)

id class-attribute instance-attribute

id: Union[int, str] = Field(..., description='Unique identifier for the entry')

score class-attribute instance-attribute

score: float = Field(..., description='Relevance score', ge=0.0, le=1.0)

synopsis class-attribute instance-attribute

synopsis: str = Field(..., description='Synopsis text (may be truncated)')

title class-attribute instance-attribute

title: str = Field(..., description='Title of the anime/manga')

get_available_models async

get_available_models() -> ModelsResponse

Get a list of available pre-trained and fine-tuned models.

This endpoint returns information about models that can be used with the search endpoints. It includes:

  1. Pre-trained models categorized by type (e.g., Semantic Search, Question Answering)
  2. Fine-tuned models specifically trained for anime/manga search

RETURNS DESCRIPTION
ModelsResponse

Available models and their descriptions

  • models: Dictionary of model categories and available pre-trained models
  • fine_tuned: Dictionary of fine-tuned model names and their paths

TYPE: ModelsResponse

Example
curl -X GET "http://localhost:8000/models"
Note

Fine-tuned models are located in the model/fine-tuned directory. The API will only list models that have a valid configuration file.

Source code in src/api.py
@app.get("/models", response_model=ModelsResponse)
async def get_available_models() -> ModelsResponse:
    """
    Get a list of available pre-trained and fine-tuned models.

    This endpoint returns information about models that can be used with the
    search endpoints. It includes:

    1. Pre-trained models categorized by type (e.g., Semantic Search, Question Answering)
    2. Fine-tuned models specifically trained for anime/manga search

    Returns:
        ModelsResponse: Available models and their descriptions

        - **models**: Dictionary of model categories and available pre-trained models
        - **fine_tuned**: Dictionary of fine-tuned model names and their paths

    Example:
        ```bash
        curl -X GET "http://localhost:8000/models"
        ```

    Note:
        Fine-tuned models are located in the `model/fine-tuned` directory.
        The API will only list models that have a valid configuration file.
    """
    # Convert Mapping to Dict to satisfy the type checker
    models = dict(BaseSearchModel.list_available_models())
    fine_tuned = BaseSearchModel.list_fine_tuned_models()

    return ModelsResponse(models=models, fine_tuned=fine_tuned)

get_or_create_model

get_or_create_model(dataset_type: str, model_name: str, device: Optional[str] = None, include_light_novels: bool = False) -> BaseSearchModel

Get a cached model or create a new one if not already cached.

This function manages the model cache to avoid reloading models for each request. It handles device selection, CUDA availability checking, and model initialization.

PARAMETER DESCRIPTION
dataset_type

The type of dataset to use ('anime' or 'manga')

TYPE: str

model_name

The name or path of the model to use

TYPE: str

device

Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). If None, automatically selects the best available device.

TYPE: Optional[str] DEFAULT: None

include_light_novels

Whether to include light novels in manga search results. Only relevant for the manga dataset_type.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
BaseSearchModel

An initialized search model ready for queries

TYPE: BaseSearchModel

RAISES DESCRIPTION
ValueError

If the model or dataset cannot be loaded

RuntimeError

If there are issues initializing the model

Note

If CUDA is requested but not available, it will automatically fall back to CPU with a warning.

Source code in src/api.py
def get_or_create_model(
    dataset_type: str,
    model_name: str,
    device: Optional[str] = None,
    include_light_novels: bool = False,
) -> BaseSearchModel:
    """
    Get a cached model or create a new one if not already cached.

    This function manages the model cache to avoid reloading models for each request.
    It handles device selection, CUDA availability checking, and model initialization.

    Args:
        dataset_type: The type of dataset to use ('anime' or 'manga')
        model_name: The name or path of the model to use
        device: Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.)
            If None, automatically selects the best available device
        include_light_novels: Whether to include light novels in manga search results
            Only relevant for manga dataset_type

    Returns:
        BaseSearchModel: An initialized search model ready for queries

    Raises:
        ValueError: If the model or dataset cannot be loaded
        RuntimeError: If there are issues initializing the model

    Note:
        If CUDA is requested but not available, it will automatically
        fall back to CPU with a warning.
    """
    # Check if CUDA is available when 'cuda' is requested
    import torch  # pylint: disable=import-outside-toplevel

    cuda_requested = device is not None and "cuda" in device
    cuda_available = torch.cuda.is_available()

    # Force CPU if CUDA is requested but not available
    if cuda_requested and not cuda_available:
        logger.warning("CUDA was requested but is not available. Falling back to CPU.")
        selected_device = "cpu"
    else:
        # Use the specified device or auto-detect
        selected_device = get_device(device)

    # Create a unique key for this configuration
    key = f"{dataset_type}_{model_name}_{selected_device}_{include_light_novels}"

    if key not in model_cache:
        logger.info("Creating new model: %s on device: %s", key, selected_device)
        model = get_search_model(
            dataset_type=dataset_type,
            model_name=model_name,
            device=selected_device,
            include_light_novels=include_light_novels,
        )

        # The model's device is already set in its constructor
        model_cache[key] = model

    return model_cache[key]
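
A usage sketch: with identical arguments, the second call is a cache hit and returns the same instance instead of reloading the model.

model = get_or_create_model("anime", "cross-encoder/ms-marco-MiniLM-L-6-v2")
same = get_or_create_model("anime", "cross-encoder/ms-marco-MiniLM-L-6-v2")
assert model is same  # identical configuration -> served from model_cache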

health_check async

health_check() -> HealthResponse

Check if the API is running and ready to handle requests.

This endpoint verifies that the API server is operational and provides information about the status of different components:

  1. Whether the API server itself is running
  2. Whether each model type (anime, manga) can be loaded
  3. Whether CUDA is available for GPU acceleration

RETURNS DESCRIPTION
HealthResponse

The health status of the API

  • status: "healthy" if critical components are working, "degraded" otherwise
  • models_loaded: Dictionary indicating which models loaded successfully
  • cuda_available: Boolean indicating if CUDA is available for GPU acceleration

TYPE: HealthResponse

Example
curl -X GET "http://localhost:8000/"
Note

This endpoint intentionally uses CPU for model loading checks to avoid GPU memory issues during health checking.

Source code in src/api.py
@app.get("/", response_model=HealthResponse)
async def health_check() -> HealthResponse:
    """
    Check if the API is running and ready to handle requests.

    This endpoint verifies that the API server is operational and provides
    information about the status of different components:

    1. Whether the API server itself is running
    2. Whether each model type (anime, manga) can be loaded
    3. Whether CUDA is available for GPU acceleration

    Returns:
        HealthResponse: The health status of the API

        - **status**: "healthy" if critical components are working, "degraded" otherwise
        - **models_loaded**: Dictionary indicating which models loaded successfully
        - **cuda_available**: Boolean indicating if CUDA is available for GPU acceleration

    Example:
        ```bash
        curl -X GET "http://localhost:8000/"
        ```

    Note:
        This endpoint intentionally uses CPU for model loading checks to avoid
        GPU memory issues during health checking.
    """
    # Check CUDA availability
    import torch  # pylint: disable=import-outside-toplevel

    cuda_available = torch.cuda.is_available()

    # Check if models can be loaded
    models_loaded = {
        "anime": False,
        "manga": False,
    }

    try:
        # Try on CPU to avoid GPU memory issues during health check
        get_or_create_model(
            "anime", "cross-encoder/ms-marco-MiniLM-L-6-v2", device="cpu"
        )
        models_loaded["anime"] = True
    except (ImportError, ValueError, RuntimeError, FileNotFoundError) as e:
        logger.error("Error loading anime model: %s", str(e))

    try:
        # Try on CPU to avoid GPU memory issues during health check
        get_or_create_model(
            "manga", "cross-encoder/ms-marco-MiniLM-L-6-v2", device="cpu"
        )
        models_loaded["manga"] = True
    except (ImportError, ValueError, RuntimeError, FileNotFoundError) as e:
        logger.error("Error loading manga model: %s", str(e))

    return HealthResponse(
        status="healthy" if any(models_loaded.values()) else "degraded",
        models_loaded=models_loaded,
        cuda_available=cuda_available,
    )
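
A quick probe from Python (illustrative; requests library assumed, field values depend on your environment):

import requests

health = requests.get("http://localhost:8000/").json()
print(health["status"])          # "healthy" or "degraded"
print(health["models_loaded"])   # e.g. {"anime": True, "manga": True}
print(health["cuda_available"])  # True only if a CUDA device is visible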

restricted_get_models async

restricted_get_models()

List available models endpoint for the restricted API mode.

This endpoint returns information about models that can be used with the search endpoints in restricted mode.

RETURNS DESCRIPTION
ModelsResponse

Available models and their descriptions

Source code in src/api.py
@restricted_app.get("/models", response_model=ModelsResponse)
async def restricted_get_models():
    """
    List available models endpoint for the restricted API mode.

    This endpoint returns information about models that can be used with
    the search endpoints in restricted mode.

    Returns:
        ModelsResponse: Available models and their descriptions
    """
    return await get_available_models()

restricted_health_check async

restricted_health_check()

Health check endpoint for the restricted API mode.

This endpoint verifies that the API server is operational in restricted mode and provides information about the status of different components.

RETURNS DESCRIPTION
HealthResponse

The health status of the API

Source code in src/api.py
@restricted_app.get("/", response_model=HealthResponse)
async def restricted_health_check():
    """
    Health check endpoint for the restricted API mode.

    This endpoint verifies that the API server is operational in restricted mode
    and provides information about the status of different components.

    Returns:
        HealthResponse: The health status of the API
    """
    return await health_check()

restricted_search_anime async

restricted_search_anime(*fn_args, **kwargs)

Search for anime endpoint for the restricted API mode.

This endpoint performs semantic search against the anime dataset using the specified model in restricted mode.

Parameters are the same as the regular search_anime endpoint.

RETURNS DESCRIPTION
SearchResponse

The search results with relevant anime matches

Source code in src/api.py
@restricted_app.post("/search/anime", response_model=SearchResponse)
async def restricted_search_anime(*fn_args, **kwargs):
    """
    Search for anime endpoint for the restricted API mode.

    This endpoint performs semantic search against the anime dataset
    using the specified model in restricted mode.

    Parameters are the same as the regular search_anime endpoint.

    Returns:
        SearchResponse: The search results with relevant anime matches
    """
    return await search_anime(*fn_args, **kwargs)

restricted_search_manga async

restricted_search_manga(*fn_args, **kwargs)

Search for manga endpoint for the restricted API mode.

This endpoint performs semantic search against the manga dataset using the specified model in restricted mode.

Parameters are the same as the regular search_manga endpoint.

RETURNS DESCRIPTION
SearchResponse

The search results with relevant manga matches

Source code in src/api.py
@restricted_app.post("/search/manga", response_model=SearchResponse)
async def restricted_search_manga(*fn_args, **kwargs):
    """
    Search for manga endpoint for the restricted API mode.

    This endpoint performs semantic search against the manga dataset
    using the specified model in restricted mode.

    Parameters are the same as the regular search_manga endpoint.

    Returns:
        SearchResponse: The search results with relevant manga matches
    """
    return await search_manga(*fn_args, **kwargs)

search_anime async

search_anime(request: SearchRequest, model_name: str = Query('cross-encoder/ms-marco-MiniLM-L-6-v2', description='Model name or path'), device: Optional[str] = Query(None, description="Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). If not specified, uses the best available device.")) -> SearchResponse

Search for anime matching the provided description.

This endpoint performs semantic search against the anime dataset using the specified model, returning the most relevant matches sorted by score.

Parameters
  • request: The search request body containing:

    • query: The search query text describing the anime
    • num_results: Number of results to return (default: 5, max: 100)
    • batch_size: Batch size for processing (default: 32)
  • model_name: The model to use for search (query parameter)

    • Can be a pre-trained model name or path to a fine-tuned model
    • Default: "cross-encoder/ms-marco-MiniLM-L-6-v2"
  • device: The device to run the model on (query parameter)

    • Options: 'cpu', 'cuda', 'cuda:0', etc.
    • If not specified, uses the best available device
Returns
  • results: List of anime matching the query, sorted by relevance
  • execution_time_ms: Time taken to execute the search in milliseconds
  • device_used: The device used for computation (e.g., 'cpu', 'cuda')
Example
curl -X POST "http://localhost:8000/search/anime?device=cuda" \
  -H "Content-Type: application/json" \
  -d '{"query": "A story about robots and AI"}'
Notes
  • For optimal performance on large queries, use GPU acceleration with device=cuda
  • Model caching is used to avoid reloading models between requests
  • Results include truncated synopses; full content is available in the dataset
Source code in src/api.py
@app.post("/search/anime", response_model=SearchResponse)
async def search_anime(
    request: SearchRequest,
    model_name: str = Query(
        "cross-encoder/ms-marco-MiniLM-L-6-v2", description="Model name or path"
    ),
    device: Optional[str] = Query(
        None,
        description="Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). "
        "If not specified, uses the best available device.",
    ),
) -> SearchResponse:
    """
    Search for anime matching the provided description.

    This endpoint performs semantic search against the anime dataset using
    the specified model, returning the most relevant matches sorted by score.

    ## Parameters

    - **request**: The search request body containing:
        - **query**: The search query text describing the anime
        - **num_results**: Number of results to return (default: 5, max: 100)
        - **batch_size**: Batch size for processing (default: 32)

    - **model_name**: The model to use for search (query parameter)
        - Can be a pre-trained model name or path to a fine-tuned model
        - Default: "cross-encoder/ms-marco-MiniLM-L-6-v2"

    - **device**: The device to run the model on (query parameter)
        - Options: 'cpu', 'cuda', 'cuda:0', etc.
        - If not specified, uses the best available device

    ## Returns

    - **results**: List of anime matching the query, sorted by relevance
    - **execution_time_ms**: Time taken to execute the search in milliseconds
    - **device_used**: The device used for computation (e.g., 'cpu', 'cuda')

    ## Example

    ```bash
    curl -X POST "http://localhost:8000/search/anime?device=cuda" \\
      -H "Content-Type: application/json" \\
      -d '{"query": "A story about robots and AI"}'
    ```

    ## Notes

    - For optimal performance on large queries, use GPU acceleration with `device=cuda`
    - Model caching is used to avoid reloading models between requests
    - Results include truncated synopses; full content is available in the dataset
    """
    import time  # pylint: disable=import-outside-toplevel

    try:
        # Get the search model
        start_time = time.time()
        search_model = get_or_create_model("anime", model_name, device=device)

        # Perform the search
        results = search_model.search(
            query=request.query,
            num_results=request.num_results,
            batch_size=request.batch_size,
        )

        # Convert to response format
        execution_time_ms = (time.time() - start_time) * 1000
        return SearchResponse(
            results=[SearchResult(**result) for result in results],
            execution_time_ms=execution_time_ms,
            device_used=search_model.device,
        )
    except (ImportError, ValueError, RuntimeError, FileNotFoundError) as e:
        logger.error("Error in anime search: %s", str(e), exc_info=True)
        raise HTTPException(
            status_code=500, detail=f"Error performing search: {str(e)}"
        ) from e
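
Because failures surface as HTTP 500 with a detail message, a client can branch on the status code. A sketch (requests library assumed):

import requests

resp = requests.post(
    "http://localhost:8000/search/anime",
    params={"device": "cuda"},
    json={"query": "A story about robots and AI"},
)
if resp.ok:
    data = resp.json()
    print(f'Ran on {data["device_used"]} in {data["execution_time_ms"]:.1f} ms')
else:
    # Errors arrive as {"detail": "Error performing search: ..."}
    print("Search failed:", resp.json().get("detail"))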

search_manga async

search_manga(request: SearchRequest, model_name: str = Query('cross-encoder/ms-marco-MiniLM-L-6-v2', description='Model name or path'), include_light_novels: bool = Query(False, description='Whether to include light novels in search results'), device: Optional[str] = Query(None, description="Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). If not specified, uses the best available device.")) -> SearchResponse

Search for manga matching the provided description.

This endpoint performs semantic search against the manga dataset using the specified model, returning the most relevant matches sorted by score.

Parameters
  • request: The search request body containing:

    • query: The search query text describing the manga
    • num_results: Number of results to return (default: 5, max: 100)
    • batch_size: Batch size for processing (default: 32)
  • model_name: The model to use for search (query parameter)

    • Can be a pre-trained model name or path to a fine-tuned model
    • Default: "cross-encoder/ms-marco-MiniLM-L-6-v2"
  • include_light_novels: Whether to include light novels in results (query parameter)

    • Default: false
  • device: The device to run the model on (query parameter)

    • Options: 'cpu', 'cuda', 'cuda:0', etc.
    • If not specified, uses the best available device
Returns
  • results: List of manga matching the query, sorted by relevance
  • execution_time_ms: Time taken to execute the search in milliseconds
  • device_used: The device used for computation (e.g., 'cpu', 'cuda')
Example
curl -X POST "http://localhost:8000/search/manga?include_light_novels=true&device=cuda" \
  -H "Content-Type: application/json" \
  -d '{"query": "A fantasy adventure in a magical world", "num_results": 10}'
Notes
  • Use include_light_novels=true to include light novels in search results
  • For optimal performance on large queries, use GPU acceleration with device=cuda
  • Model caching is used to avoid reloading models between requests
  • Results include truncated synopses; full content is available in the dataset
Source code in src/api.py
@app.post("/search/manga", response_model=SearchResponse)
async def search_manga(
    request: SearchRequest,
    model_name: str = Query(
        "cross-encoder/ms-marco-MiniLM-L-6-v2", description="Model name or path"
    ),
    include_light_novels: bool = Query(
        False, description="Whether to include light novels in search results"
    ),
    device: Optional[str] = Query(
        None,
        description="Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). "
        "If not specified, uses the best available device.",
    ),
) -> SearchResponse:
    """
    Search for manga matching the provided description.

    This endpoint performs semantic search against the manga dataset using
    the specified model, returning the most relevant matches sorted by score.

    ## Parameters

    - **request**: The search request body containing:
        - **query**: The search query text describing the manga
        - **num_results**: Number of results to return (default: 5, max: 100)
        - **batch_size**: Batch size for processing (default: 32)

    - **model_name**: The model to use for search (query parameter)
        - Can be a pre-trained model name or path to a fine-tuned model
        - Default: "cross-encoder/ms-marco-MiniLM-L-6-v2"

    - **include_light_novels**: Whether to include light novels in results (query parameter)
        - Default: false

    - **device**: The device to run the model on (query parameter)
        - Options: 'cpu', 'cuda', 'cuda:0', etc.
        - If not specified, uses the best available device

    ## Returns

    - **results**: List of manga matching the query, sorted by relevance
    - **execution_time_ms**: Time taken to execute the search in milliseconds
    - **device_used**: The device used for computation (e.g., 'cpu', 'cuda')

    ## Example

    ```bash
    curl -X POST "http://localhost:8000/search/manga?include_light_novels=true&device=cuda" \\
      -H "Content-Type: application/json" \\
      -d '{"query": "A fantasy adventure in a magical world", "num_results": 10}'
    ```

    ## Notes

    - Use `include_light_novels=true` to include light novels in search results
    - For optimal performance on large queries, use GPU acceleration with `device=cuda`
    - Model caching is used to avoid reloading models between requests
    - Results include truncated synopses; full content is available in the dataset
    """
    import time  # pylint: disable=import-outside-toplevel

    try:
        # Get the search model
        start_time = time.time()
        search_model = get_or_create_model(
            "manga",
            model_name,
            device=device,
            include_light_novels=include_light_novels,
        )

        # Perform the search
        results = search_model.search(
            query=request.query,
            num_results=request.num_results,
            batch_size=request.batch_size,
        )

        # Convert to response format
        execution_time_ms = (time.time() - start_time) * 1000
        return SearchResponse(
            results=[SearchResult(**result) for result in results],
            execution_time_ms=execution_time_ms,
            device_used=search_model.device,
        )
    except (ImportError, ValueError, RuntimeError, FileNotFoundError) as e:
        logger.error("Error in manga search: %s", str(e), exc_info=True)
        raise HTTPException(
            status_code=500, detail=f"Error performing search: {str(e)}"
        ) from e