API Server Implementation

This page provides a detailed technical reference for the AniSearch API Server implementation.

Module Overview

The src.api module implements a FastAPI server that exposes the AniSearch Model functionality through RESTful HTTP endpoints. It's designed to be scalable, configurable, and production-ready.

API Reference

AniSearch API Server

A FastAPI server that exposes the AniSearch functionality through HTTP endpoints.

This module provides a REST API for searching anime and manga datasets using cross-encoder models for semantic similarity. It allows clients to:

  1. Search for anime matching a description
  2. Search for manga matching a description
  3. List available models
  4. Get health check status

Features

  • RESTful API: Clean, standards-compliant API design
  • Interactive Documentation: Automatic OpenAPI/Swagger UI at /docs
  • CORS Support: Configurable cross-origin resource sharing
  • Multi-worker Architecture: Handles concurrent requests efficiently
  • Model Caching: Avoids reloading models for each request
  • Route Restrictions: Configurable endpoint enabling/disabling for production
  • Custom Performance Settings: Configurable worker count and connection limits

API Endpoints

Endpoint       Method  Description
/              GET     Health check and CUDA availability
/models        GET     List available models and fine-tuned models
/search/anime  POST    Search for anime matching a description
/search/manga  POST    Search for manga matching a description

Server Usage

# Basic usage
python -m src.api

# With custom settings
python -m src.api --host=127.0.0.1 --port=9000 --workers=4

# Production mode with restricted routes
python -m src.api --enable-routes=search --cors-origins="https://yourdomain.com"
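
Once the server is up, any HTTP client can talk to it. A minimal client sketch using the third-party requests library (an assumption; any HTTP client works) against the default host and port:

import requests

# POST a search request; the body follows the SearchRequest model below
resp = requests.post(
    "http://localhost:8000/search/anime",
    json={"query": "A story about robots and AI", "num_results": 5},
)
resp.raise_for_status()
for item in resp.json()["results"]:
    print(f'{item["title"]}: {item["score"]:.2f}')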

GPU Acceleration

For optimal performance, especially with larger models, using a GPU is recommended. To enable GPU support, install PyTorch with CUDA:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

You can then specify device=cuda in your API requests to utilize GPU acceleration.
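
Before requesting device=cuda, you can confirm that PyTorch actually sees a GPU:

import torch

print(torch.cuda.is_available())  # True if a usable CUDA device is present
print(torch.cuda.device_count())  # Number of visible CUDA devices

If this prints False, requests that ask for CUDA will fall back to CPU with a warning (see get_or_create_model below).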

all_routes_enabled module-attribute

all_routes_enabled = 'all' in enabled_routes

app module-attribute

app = FastAPI(title='AniSearch API', description='API for searching anime and manga using semantic similarity', version='1.0.0')

args module-attribute

args = parse_args()

enabled_routes module-attribute

enabled_routes = [route.lower() for route in args.enable_routes.split(',')]

headers module-attribute

headers = [header.strip() for header in cors_headers.split(',')] if cors_headers != '*' else ['*']

logger module-attribute

logger = getLogger(__name__)

methods module-attribute

methods = [method.strip() for method in cors_methods.split(',')] if cors_methods != '*' else ['*']

model_cache module-attribute

model_cache: Dict[str, BaseSearchModel] = {}
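
As the get_or_create_model source below shows, each cache entry is keyed by the full model configuration, so every distinct combination gets its own model instance:

key = f"{dataset_type}_{model_name}_{selected_device}_{include_light_novels}"
# e.g. "anime_cross-encoder/ms-marco-MiniLM-L-6-v2_cpu_False"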

num_workers module-attribute

num_workers = max(1, int(cpu_count() / 2))

origins module-attribute

origins = [origin.strip() for origin in cors_origins.split(',')] if cors_origins != '*' else ['*']

parser module-attribute

parser = ArgumentParser(description='AniSearch API Server')

restricted_app module-attribute

restricted_app = FastAPI(title=app.title, description=app.description, version=app.version)

safe_path module-attribute

safe_path = temp_module_path.replace('\\', '/')

temp_module_path module-attribute

temp_module_path = join(temp_dir, 'temp_api.py')

HealthResponse

Bases: BaseModel

Response model for the health check endpoint.

This model defines the structure of the response returned by the health check endpoint. It includes the overall API status, the status of each model type, and information about CUDA availability.

ATTRIBUTE DESCRIPTION
status

Overall status of the API ('healthy' or 'degraded')

TYPE: str

models_loaded

Dictionary of model types and their loading status

TYPE: Dict[str, bool]

cuda_available

Whether CUDA is available on the system

TYPE: bool

Example
health = HealthResponse(
    status="healthy",
    models_loaded={"anime": True, "manga": True},
    cuda_available=True
)

cuda_available class-attribute instance-attribute

cuda_available: bool = Field(..., description='Whether CUDA is available on this system')

models_loaded class-attribute instance-attribute

models_loaded: Dict[str, bool] = Field(..., description='Status of the search models')

status class-attribute instance-attribute

status: str = Field(..., description='Health status of the API')

ModelsResponse

Bases: BaseModel

Response model for the models endpoint.

This model defines the structure of the response returned by the models endpoint. It includes information about available pre-trained models and any fine-tuned models.

ATTRIBUTE DESCRIPTION
models

Dictionary of model categories and available models

TYPE: Dict[str, Dict[str, str]]

fine_tuned

Dictionary of fine-tuned model names and their paths

TYPE: Dict[str, str]

Example
models = ModelsResponse(
    models={
        "Semantic Search": {
            "cross-encoder/ms-marco-MiniLM-L-6-v2": "Recommended for general search"
        }
    },
    fine_tuned={
        "anime-v1": "model/fine-tuned/anime-v1"
    }
)

fine_tuned class-attribute instance-attribute

fine_tuned: Dict[str, str] = Field(..., description='Available fine-tuned models')

models class-attribute instance-attribute

models: Dict[str, Dict[str, str]] = Field(..., description='Available models by category')

SearchRequest

Bases: BaseModel

Request model for anime and manga search endpoints.

This model defines the required and optional parameters for search requests. It includes validation rules to ensure the parameters are within acceptable ranges.

ATTRIBUTE DESCRIPTION
query

The search query text describing the anime/manga to find

TYPE: str

num_results

Number of results to return (default: 5)

TYPE: int

batch_size

Batch size used by the model when processing the search (default: 32)

TYPE: int

Example
search_request = SearchRequest(
    query="A story about a young wizard learning magic",
    num_results=10,
    batch_size=64
)

batch_size class-attribute instance-attribute

batch_size: int = Field(32, description='Batch size for processing', ge=8, le=512)

num_results class-attribute instance-attribute

num_results: int = Field(5, description='Number of results to return', ge=1, le=100)

query class-attribute instance-attribute

query: str = Field(..., description='The search query text', min_length=1)

SearchResponse

Bases: BaseModel

Response model for anime and manga search endpoints.

This model defines the structure of the response returned by the search endpoints. It includes the search results, execution time, and device used for computation.

ATTRIBUTE DESCRIPTION
results

List of search results sorted by relevance

TYPE: List[SearchResult]

execution_time_ms

Total execution time of the search in milliseconds

TYPE: float

device_used

The device used for computation (e.g., 'cpu', 'cuda')

TYPE: str

Example
response = SearchResponse(
    results=[
        SearchResult(id=1535, title="Death Note", score=0.92, synopsis="..."),
        SearchResult(id=5114, title="Fullmetal Alchemist", score=0.85, synopsis="...")
    ],
    execution_time_ms=156.32,
    device_used="cuda"
)

device_used class-attribute instance-attribute

device_used: str = Field(..., description='Device used for computation (CPU/CUDA)')

execution_time_ms class-attribute instance-attribute

execution_time_ms: float = Field(..., description='Execution time in milliseconds')

results class-attribute instance-attribute

results: List[SearchResult] = Field(..., description='Search results')

SearchResult

Bases: BaseModel

Individual search result item returned by the search endpoints.

This model represents a single anime or manga entry matched by the search. It includes the basic information needed to display the result to the user.

ATTRIBUTE DESCRIPTION
id

Unique identifier for the entry (anime_id or manga_id)

TYPE: Union[int, str]

title

Title of the anime/manga

TYPE: str

score

Relevance score between 0.0 and 1.0 (higher is more relevant)

TYPE: float

synopsis

Partial synopsis text (may be truncated for display)

TYPE: str

Example
result = SearchResult(
    id=1535,
    title="Death Note",
    score=0.92,
    synopsis="Light Yagami is a genius high schooler who discovers..."
)

id class-attribute instance-attribute

id: Union[int, str] = Field(..., description='Unique identifier for the entry')

score class-attribute instance-attribute

score: float = Field(..., description='Relevance score', ge=0.0, le=1.0)

synopsis class-attribute instance-attribute

synopsis: str = Field(..., description='Synopsis text (may be truncated)')

title class-attribute instance-attribute

title: str = Field(..., description='Title of the anime/manga')

get_available_models async

get_available_models() -> ModelsResponse

Get a list of available pre-trained and fine-tuned models.

This endpoint returns information about models that can be used with the search endpoints. It includes:

  1. Pre-trained models categorized by type (e.g., Semantic Search, Question Answering)
  2. Fine-tuned models specifically trained for anime/manga search

RETURNS DESCRIPTION
ModelsResponse

Available models and their descriptions

  • models: Dictionary of model categories and available pre-trained models
  • fine_tuned: Dictionary of fine-tuned model names and their paths

TYPE: ModelsResponse

Example
curl -X GET "http://localhost:8000/models"
Note

Fine-tuned models are located in the model/fine-tuned directory. The API will only list models that have a valid configuration file.

Source code in src/api.py
@app.get("/models", response_model=ModelsResponse)
async def get_available_models() -> ModelsResponse:
    """
    Get a list of available pre-trained and fine-tuned models.

    This endpoint returns information about models that can be used with the
    search endpoints. It includes:

    1. Pre-trained models categorized by type (e.g., Semantic Search, Question Answering)
    2. Fine-tuned models specifically trained for anime/manga search

    Returns:
        ModelsResponse: Available models and their descriptions

        - **models**: Dictionary of model categories and available pre-trained models
        - **fine_tuned**: Dictionary of fine-tuned model names and their paths

    Example:
        ```bash
        curl -X GET "http://localhost:8000/models"
        ```

    Note:
        Fine-tuned models are located in the `model/fine-tuned` directory.
        The API will only list models that have a valid configuration file.
    """
    # Convert Mapping to Dict to satisfy the type checker
    models = dict(BaseSearchModel.list_available_models())
    fine_tuned = BaseSearchModel.list_fine_tuned_models()

    return ModelsResponse(models=models, fine_tuned=fine_tuned)

get_or_create_model

get_or_create_model(dataset_type: str, model_name: str, device: Optional[str] = None, include_light_novels: bool = False) -> BaseSearchModel

Get a cached model or create a new one if not already cached.

This function manages the model cache to avoid reloading models for each request. It handles device selection, CUDA availability checking, and model initialization.

PARAMETER DESCRIPTION
dataset_type

The type of dataset to use ('anime' or 'manga')

TYPE: str

model_name

The name or path of the model to use

TYPE: str

device

Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). If None, automatically selects the best available device.

TYPE: Optional[str] DEFAULT: None

include_light_novels

Whether to include light novels in manga search results. Only relevant for the manga dataset_type.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
BaseSearchModel

An initialized search model ready for queries

TYPE: BaseSearchModel

RAISES DESCRIPTION
ValueError

If the model or dataset cannot be loaded

RuntimeError

If there are issues initializing the model

Note

If CUDA is requested but not available, it will automatically fall back to CPU with a warning.

Source code in src/api.py
def get_or_create_model(
    dataset_type: str,
    model_name: str,
    device: Optional[str] = None,
    include_light_novels: bool = False,
) -> BaseSearchModel:
    """
    Get a cached model or create a new one if not already cached.

    This function manages the model cache to avoid reloading models for each request.
    It handles device selection, CUDA availability checking, and model initialization.

    Args:
        dataset_type: The type of dataset to use ('anime' or 'manga')
        model_name: The name or path of the model to use
        device: Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.)
            If None, automatically selects the best available device
        include_light_novels: Whether to include light novels in manga search results
            Only relevant for manga dataset_type

    Returns:
        BaseSearchModel: An initialized search model ready for queries

    Raises:
        ValueError: If the model or dataset cannot be loaded
        RuntimeError: If there are issues initializing the model

    Note:
        If CUDA is requested but not available, it will automatically
        fall back to CPU with a warning.
    """
    # Check if CUDA is available when 'cuda' is requested
    import torch  # pylint: disable=import-outside-toplevel

    cuda_requested = device is not None and "cuda" in device
    cuda_available = torch.cuda.is_available()

    # Force CPU if CUDA is requested but not available
    if cuda_requested and not cuda_available:
        logger.warning("CUDA was requested but is not available. Falling back to CPU.")
        selected_device = "cpu"
    else:
        # Use the specified device or auto-detect
        selected_device = get_device(device)

    # Create a unique key for this configuration
    key = f"{dataset_type}_{model_name}_{selected_device}_{include_light_novels}"

    if key not in model_cache:
        logger.info("Creating new model: %s on device: %s", key, selected_device)
        model = get_search_model(
            dataset_type=dataset_type,
            model_name=model_name,
            device=selected_device,
            include_light_novels=include_light_novels,
        )

        # The model's device is already set in its constructor
        model_cache[key] = model

    return model_cache[key]
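
A usage sketch: with identical arguments, the second call is a cache hit and returns the same instance instead of reloading the model.

model = get_or_create_model("anime", "cross-encoder/ms-marco-MiniLM-L-6-v2")
same = get_or_create_model("anime", "cross-encoder/ms-marco-MiniLM-L-6-v2")
assert model is same  # identical configuration -> served from model_cache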

health_check async

health_check() -> HealthResponse

Check if the API is running and ready to handle requests.

This endpoint verifies that the API server is operational and provides information about the status of different components:

  1. Whether the API server itself is running
  2. Whether each model type (anime, manga) can be loaded
  3. Whether CUDA is available for GPU acceleration

RETURNS DESCRIPTION
HealthResponse

The health status of the API

  • status: "healthy" if critical components are working, "degraded" otherwise
  • models_loaded: Dictionary indicating which models loaded successfully
  • cuda_available: Boolean indicating if CUDA is available for GPU acceleration

TYPE: HealthResponse

Example
curl -X GET "http://localhost:8000/"
Note

This endpoint intentionally uses CPU for model loading checks to avoid GPU memory issues during health checking.

Source code in src/api.py
@app.get("/", response_model=HealthResponse)
async def health_check() -> HealthResponse:
    """
    Check if the API is running and ready to handle requests.

    This endpoint verifies that the API server is operational and provides
    information about the status of different components:

    1. Whether the API server itself is running
    2. Whether each model type (anime, manga) can be loaded
    3. Whether CUDA is available for GPU acceleration

    Returns:
        HealthResponse: The health status of the API

        - **status**: "healthy" if critical components are working, "degraded" otherwise
        - **models_loaded**: Dictionary indicating which models loaded successfully
        - **cuda_available**: Boolean indicating if CUDA is available for GPU acceleration

    Example:
        ```bash
        curl -X GET "http://localhost:8000/"
        ```

    Note:
        This endpoint intentionally uses CPU for model loading checks to avoid
        GPU memory issues during health checking.
    """
    # Check CUDA availability
    import torch  # pylint: disable=import-outside-toplevel

    cuda_available = torch.cuda.is_available()

    # Check if models can be loaded
    models_loaded = {
        "anime": False,
        "manga": False,
    }

    try:
        # Try on CPU to avoid GPU memory issues during health check
        get_or_create_model(
            "anime", "cross-encoder/ms-marco-MiniLM-L-6-v2", device="cpu"
        )
        models_loaded["anime"] = True
    except (ImportError, ValueError, RuntimeError, FileNotFoundError) as e:
        logger.error("Error loading anime model: %s", str(e))

    try:
        # Try on CPU to avoid GPU memory issues during health check
        get_or_create_model(
            "manga", "cross-encoder/ms-marco-MiniLM-L-6-v2", device="cpu"
        )
        models_loaded["manga"] = True
    except (ImportError, ValueError, RuntimeError, FileNotFoundError) as e:
        logger.error("Error loading manga model: %s", str(e))

    return HealthResponse(
        status="healthy" if any(models_loaded.values()) else "degraded",
        models_loaded=models_loaded,
        cuda_available=cuda_available,
    )
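
A quick probe from Python (illustrative; requests library assumed, field values depend on your environment):

import requests

health = requests.get("http://localhost:8000/").json()
print(health["status"])          # "healthy" or "degraded"
print(health["models_loaded"])   # e.g. {"anime": True, "manga": True}
print(health["cuda_available"])  # True only if a CUDA device is visible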

restricted_get_models async

restricted_get_models()

List available models endpoint for the restricted API mode.

This endpoint returns information about models that can be used with the search endpoints in restricted mode.

RETURNS DESCRIPTION
ModelsResponse

Available models and their descriptions

Source code in src/api.py
@restricted_app.get("/models", response_model=ModelsResponse)
async def restricted_get_models():
    """
    List available models endpoint for the restricted API mode.

    This endpoint returns information about models that can be used with
    the search endpoints in restricted mode.

    Returns:
        ModelsResponse: Available models and their descriptions
    """
    return await get_available_models()

restricted_health_check async

restricted_health_check()

Health check endpoint for the restricted API mode.

This endpoint verifies that the API server is operational in restricted mode and provides information about the status of different components.

RETURNS DESCRIPTION
HealthResponse

The health status of the API

Source code in src/api.py
@restricted_app.get("/", response_model=HealthResponse)
async def restricted_health_check():
    """
    Health check endpoint for the restricted API mode.

    This endpoint verifies that the API server is operational in restricted mode
    and provides information about the status of different components.

    Returns:
        HealthResponse: The health status of the API
    """
    return await health_check()

restricted_search_anime async

restricted_search_anime(*fn_args, **kwargs)

Search for anime endpoint for the restricted API mode.

This endpoint performs semantic search against the anime dataset using the specified model in restricted mode.

Parameters are the same as the regular search_anime endpoint.

RETURNS DESCRIPTION
SearchResponse

The search results with relevant anime matches

Source code in src/api.py
@restricted_app.post("/search/anime", response_model=SearchResponse)
async def restricted_search_anime(*fn_args, **kwargs):
    """
    Search for anime endpoint for the restricted API mode.

    This endpoint performs semantic search against the anime dataset
    using the specified model in restricted mode.

    Parameters are the same as the regular search_anime endpoint.

    Returns:
        SearchResponse: The search results with relevant anime matches
    """
    return await search_anime(*fn_args, **kwargs)

restricted_search_manga async

restricted_search_manga(*fn_args, **kwargs)

Search for manga endpoint for the restricted API mode.

This endpoint performs semantic search against the manga dataset using the specified model in restricted mode.

Parameters are the same as the regular search_manga endpoint.

RETURNS DESCRIPTION
SearchResponse

The search results with relevant manga matches

Source code in src/api.py
@restricted_app.post("/search/manga", response_model=SearchResponse)
async def restricted_search_manga(*fn_args, **kwargs):
    """
    Search for manga endpoint for the restricted API mode.

    This endpoint performs semantic search against the manga dataset
    using the specified model in restricted mode.

    Parameters are the same as the regular search_manga endpoint.

    Returns:
        SearchResponse: The search results with relevant manga matches
    """
    return await search_manga(*fn_args, **kwargs)

search_anime async

search_anime(request: SearchRequest, model_name: str = Query('cross-encoder/ms-marco-MiniLM-L-6-v2', description='Model name or path'), device: Optional[str] = Query(None, description="Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). If not specified, uses the best available device.")) -> SearchResponse

Search for anime matching the provided description.

This endpoint performs semantic search against the anime dataset using the specified model, returning the most relevant matches sorted by score.

Parameters
  • request: The search request body containing:

    • query: The search query text describing the anime
    • num_results: Number of results to return (default: 5, max: 100)
    • batch_size: Batch size for processing (default: 32)
  • model_name: The model to use for search (query parameter)

    • Can be a pre-trained model name or path to a fine-tuned model
    • Default: "cross-encoder/ms-marco-MiniLM-L-6-v2"
  • device: The device to run the model on (query parameter)

    • Options: 'cpu', 'cuda', 'cuda:0', etc.
    • If not specified, uses the best available device
Returns
  • results: List of anime matching the query, sorted by relevance
  • execution_time_ms: Time taken to execute the search in milliseconds
  • device_used: The device used for computation (e.g., 'cpu', 'cuda')
Example
curl -X POST "http://localhost:8000/search/anime?device=cuda" \
  -H "Content-Type: application/json" \
  -d '{"query": "A story about robots and AI"}'
Notes
  • For optimal performance on large queries, use GPU acceleration with device=cuda
  • Model caching is used to avoid reloading models between requests
  • Results include truncated synopses; full content is available in the dataset
Source code in src/api.py
@app.post("/search/anime", response_model=SearchResponse)
async def search_anime(
    request: SearchRequest,
    model_name: str = Query(
        "cross-encoder/ms-marco-MiniLM-L-6-v2", description="Model name or path"
    ),
    device: Optional[str] = Query(
        None,
        description="Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). "
        "If not specified, uses the best available device.",
    ),
) -> SearchResponse:
    """
    Search for anime matching the provided description.

    This endpoint performs semantic search against the anime dataset using
    the specified model, returning the most relevant matches sorted by score.

    ## Parameters

    - **request**: The search request body containing:
        - **query**: The search query text describing the anime
        - **num_results**: Number of results to return (default: 5, max: 100)
        - **batch_size**: Batch size for processing (default: 32)

    - **model_name**: The model to use for search (query parameter)
        - Can be a pre-trained model name or path to a fine-tuned model
        - Default: "cross-encoder/ms-marco-MiniLM-L-6-v2"

    - **device**: The device to run the model on (query parameter)
        - Options: 'cpu', 'cuda', 'cuda:0', etc.
        - If not specified, uses the best available device

    ## Returns

    - **results**: List of anime matching the query, sorted by relevance
    - **execution_time_ms**: Time taken to execute the search in milliseconds
    - **device_used**: The device used for computation (e.g., 'cpu', 'cuda')

    ## Example

    ```bash
    curl -X POST "http://localhost:8000/search/anime?device=cuda" \\
      -H "Content-Type: application/json" \\
      -d '{"query": "A story about robots and AI"}'
    ```

    ## Notes

    - For optimal performance on large queries, use GPU acceleration with `device=cuda`
    - Model caching is used to avoid reloading models between requests
    - Results include truncated synopses; full content is available in the dataset
    """
    import time  # pylint: disable=import-outside-toplevel

    try:
        # Get the search model
        start_time = time.time()
        search_model = get_or_create_model("anime", model_name, device=device)

        # Perform the search
        results = search_model.search(
            query=request.query,
            num_results=request.num_results,
            batch_size=request.batch_size,
        )

        # Convert to response format
        execution_time_ms = (time.time() - start_time) * 1000
        return SearchResponse(
            results=[SearchResult(**result) for result in results],
            execution_time_ms=execution_time_ms,
            device_used=search_model.device,
        )
    except (ImportError, ValueError, RuntimeError, FileNotFoundError) as e:
        logger.error("Error in anime search: %s", str(e), exc_info=True)
        raise HTTPException(
            status_code=500, detail=f"Error performing search: {str(e)}"
        ) from e
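
Because failures surface as HTTP 500 with a detail message, a client can branch on the status code. A sketch (requests library assumed):

import requests

resp = requests.post(
    "http://localhost:8000/search/anime",
    params={"device": "cuda"},
    json={"query": "A story about robots and AI"},
)
if resp.ok:
    data = resp.json()
    print(f'Ran on {data["device_used"]} in {data["execution_time_ms"]:.1f} ms')
else:
    # Errors arrive as {"detail": "Error performing search: ..."}
    print("Search failed:", resp.json().get("detail"))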

search_manga async

search_manga(request: SearchRequest, model_name: str = Query('cross-encoder/ms-marco-MiniLM-L-6-v2', description='Model name or path'), include_light_novels: bool = Query(False, description='Whether to include light novels in search results'), device: Optional[str] = Query(None, description="Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). If not specified, uses the best available device.")) -> SearchResponse

Search for manga matching the provided description.

This endpoint performs semantic search against the manga dataset using the specified model, returning the most relevant matches sorted by score.

Parameters
  • request: The search request body containing:

    • query: The search query text describing the manga
    • num_results: Number of results to return (default: 5, max: 100)
    • batch_size: Batch size for processing (default: 32)
  • model_name: The model to use for search (query parameter)

    • Can be a pre-trained model name or path to a fine-tuned model
    • Default: "cross-encoder/ms-marco-MiniLM-L-6-v2"
  • include_light_novels: Whether to include light novels in results (query parameter)

    • Default: false
  • device: The device to run the model on (query parameter)

    • Options: 'cpu', 'cuda', 'cuda:0', etc.
    • If not specified, uses the best available device
Returns
  • results: List of manga matching the query, sorted by relevance
  • execution_time_ms: Time taken to execute the search in milliseconds
  • device_used: The device used for computation (e.g., 'cpu', 'cuda')
Example
curl -X POST "http://localhost:8000/search/manga?include_light_novels=true&device=cuda" \
  -H "Content-Type: application/json" \
  -d '{"query": "A fantasy adventure in a magical world", "num_results": 10}'
Notes
  • Use include_light_novels=true to include light novels in search results
  • For optimal performance on large queries, use GPU acceleration with device=cuda
  • Model caching is used to avoid reloading models between requests
  • Results include truncated synopses; full content is available in the dataset
Source code in src/api.py
@app.post("/search/manga", response_model=SearchResponse)
async def search_manga(
    request: SearchRequest,
    model_name: str = Query(
        "cross-encoder/ms-marco-MiniLM-L-6-v2", description="Model name or path"
    ),
    include_light_novels: bool = Query(
        False, description="Whether to include light novels in search results"
    ),
    device: Optional[str] = Query(
        None,
        description="Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). "
        "If not specified, uses the best available device.",
    ),
) -> SearchResponse:
    """
    Search for manga matching the provided description.

    This endpoint performs semantic search against the manga dataset using
    the specified model, returning the most relevant matches sorted by score.

    ## Parameters

    - **request**: The search request body containing:
        - **query**: The search query text describing the manga
        - **num_results**: Number of results to return (default: 5, max: 100)
        - **batch_size**: Batch size for processing (default: 32)

    - **model_name**: The model to use for search (query parameter)
        - Can be a pre-trained model name or path to a fine-tuned model
        - Default: "cross-encoder/ms-marco-MiniLM-L-6-v2"

    - **include_light_novels**: Whether to include light novels in results (query parameter)
        - Default: false

    - **device**: The device to run the model on (query parameter)
        - Options: 'cpu', 'cuda', 'cuda:0', etc.
        - If not specified, uses the best available device

    ## Returns

    - **results**: List of manga matching the query, sorted by relevance
    - **execution_time_ms**: Time taken to execute the search in milliseconds
    - **device_used**: The device used for computation (e.g., 'cpu', 'cuda')

    ## Example

    ```bash
    curl -X POST "http://localhost:8000/search/manga?include_light_novels=true&device=cuda" \\
      -H "Content-Type: application/json" \\
      -d '{"query": "A fantasy adventure in a magical world", "num_results": 10}'
    ```

    ## Notes

    - Use `include_light_novels=true` to include light novels in search results
    - For optimal performance on large queries, use GPU acceleration with `device=cuda`
    - Model caching is used to avoid reloading models between requests
    - Results include truncated synopses; full content is available in the dataset
    """
    import time  # pylint: disable=import-outside-toplevel

    try:
        # Get the search model
        start_time = time.time()
        search_model = get_or_create_model(
            "manga",
            model_name,
            device=device,
            include_light_novels=include_light_novels,
        )

        # Perform the search
        results = search_model.search(
            query=request.query,
            num_results=request.num_results,
            batch_size=request.batch_size,
        )

        # Convert to response format
        execution_time_ms = (time.time() - start_time) * 1000
        return SearchResponse(
            results=[SearchResult(**result) for result in results],
            execution_time_ms=execution_time_ms,
            device_used=search_model.device,
        )
    except (ImportError, ValueError, RuntimeError, FileNotFoundError) as e:
        logger.error("Error in manga search: %s", str(e), exc_info=True)
        raise HTTPException(
            status_code=500, detail=f"Error performing search: {str(e)}"
        ) from e