Caching strategies for AI: IMemoryCache, Redis...

This entry is part 9 of 25 in the series Introducción a Microsoft Semantic Kernel

Introduction

Caching is essential for controlling cost and latency in AI applications: every LLM call costs both time and money. This tutorial covers practical caching strategies for AI services.

Why Cache?

Cost reduction: fewer API calls means lower spend
Better performance: cached responses return almost instantly
Resilience: prompts that are already cached can still be answered while the AI service is down
Lower latency: microseconds versus seconds
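
To make the cost argument concrete, here is a rough sketch of how the hit rate translates into spend. It assumes a flat price per API call and treats cache hits as free. The class name `CacheSavings`, the request volume, and the price are made-up illustration values, not figures from any provider.

```csharp
using System;

public static class CacheSavings
{
    // Estimated spend when only cache misses reach the paid API.
    // Assumes a flat price per call and that a cache hit costs nothing.
    public static decimal EstimateCost(long requests, decimal hitRate, decimal costPerCall)
    {
        if (hitRate < 0m || hitRate > 1m)
            throw new ArgumentOutOfRangeException(nameof(hitRate), "Expected a value between 0 and 1.");

        var missedRequests = requests * (1m - hitRate);
        return missedRequests * costPerCall;
    }
}

// Usage (hypothetical numbers): 100,000 requests at 0.002 per call
// CacheSavings.EstimateCost(100_000, 0.00m, 0.002m) equals 200m (no cache)
// CacheSavings.EstimateCost(100_000, 0.80m, 0.002m) equals 40m  (80% hit rate)
```

Under these assumptions the saving is exactly proportional to the hit rate, which is why the hit rate is the first metric worth tracking.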

Types of Cache

1. In-Memory Cache (IMemoryCache)

using System.Security.Cryptography;
using System.Text;
using Microsoft.Extensions.Caching.Memory;

public class MemoryCachedAIService
{
    private readonly IMemoryCache _cache;
    private readonly IAIService _aiService;
    
    public MemoryCachedAIService(IMemoryCache cache, IAIService aiService)
    {
        _cache = cache;
        _aiService = aiService;
    }
    
    public async Task<string> GetResponseAsync(string prompt)
    {
        var cacheKey = GenerateCacheKey(prompt);
        
        // Cache hit: return the stored response without calling the AI service
        if (_cache.TryGetValue<string>(cacheKey, out var cachedResponse) && cachedResponse is not null)
        {
            return cachedResponse;
        }
        
        // Cache miss: call the AI service
        var response = await _aiService.GenerateAsync(prompt);
        
        // Expire after 6 hours without access, and no later than 24 hours after creation
        var cacheOptions = new MemoryCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(24),
            SlidingExpiration = TimeSpan.FromHours(6),
            Priority = CacheItemPriority.Normal
        };
        
        _cache.Set(cacheKey, response, cacheOptions);
        
        return response;
    }
    
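    private static string GenerateCacheKey(string prompt)
    {
        // Hash the prompt so keys have a fixed length and never contain raw user text.
        // Only the first 16 hex characters (64 bits) are kept, which shortens the key
        // at the cost of a higher collision risk than the full hash.
        var hashBytes = SHA256.HashData(Encoding.UTF8.GetBytes(prompt));
        return $"ai_{Convert.ToHexString(hashBytes)[..16]}";
    }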
}

2. Distributed Cache (Redis)

using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Logging;

public class DistributedCachedAIService
{
    private readonly IDistributedCache _cache;
    private readonly IAIService _aiService;
    private readonly ILogger<DistributedCachedAIService> _logger;
    
    public DistributedCachedAIService(
        IDistributedCache cache,
        IAIService aiService,
        ILogger<DistributedCachedAIService> logger)
    {
        _cache = cache;
        _aiService = aiService;
        _logger = logger;
    }
    
    public async Task<AIResponse> GetResponseAsync(
        string prompt,
        CancellationToken cancellationToken = default)
    {
        var cacheKey = GenerateCacheKey(prompt);
        
        // Try the cache first
        var cachedBytes = await _cache.GetAsync(cacheKey, cancellationToken);
        
        if (cachedBytes != null)
        {
            _logger.LogInformation("Cache HIT: {CacheKey}", cacheKey);
            var cachedResponse = JsonSerializer.Deserialize<AIResponse>(cachedBytes);
            if (cachedResponse != null)
            {
                cachedResponse.FromCache = true;
                return cachedResponse;
            }
        }
        
        _logger.LogInformation("Cache MISS: {CacheKey}", cacheKey);
        
        // Cache miss: generate the response
        var response = await _aiService.GenerateAsync(prompt, cancellationToken);
        
        // Store the serialized response in the cache
        var responseBytes = JsonSerializer.SerializeToUtf8Bytes(response);
        
        var cacheOptions = new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromDays(7),
            SlidingExpiration = TimeSpan.FromDays(1)
        };
        
        await _cache.SetAsync(cacheKey, responseBytes, cacheOptions, cancellationToken);
        
        response.FromCache = false;
        return response;
    }
    
    private string GenerateCacheKey(string prompt)
    {
        using var sha256 = SHA256.Create();
        var hashBytes = sha256.ComputeHash(Encoding.UTF8.GetBytes(prompt));
        return $"ai_{BitConverter.ToString(hashBytes).Replace("-", "")}";
    }
}

public class AIResponse
{
    public required string Content { get; set; }
    public DateTime GeneratedAt { get; set; } = DateTime.UtcNow;
    public bool FromCache { get; set; }
    public Dictionary<string, string>? Metadata { get; set; }
}
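
The class above depends only on `IDistributedCache`, so Redis enters the picture at registration time. The following is a minimal sketch of that wiring, assuming the `Microsoft.Extensions.Caching.StackExchangeRedis` package is referenced. The extension method name `AddAiCaching`, the `redisConnectionString` parameter, and the `"ai-cache:"` key prefix are hypothetical choices for illustration.

```csharp
using Microsoft.Extensions.DependencyInjection;

public static class CachingRegistration
{
    // Hypothetical helper that registers both cache levels used in this article.
    public static IServiceCollection AddAiCaching(
        this IServiceCollection services,
        string redisConnectionString)
    {
        // In-process cache behind IMemoryCache
        services.AddMemoryCache();

        // Redis-backed implementation of IDistributedCache
        services.AddStackExchangeRedisCache(options =>
        {
            options.Configuration = redisConnectionString; // e.g. "localhost:6379" in development
            options.InstanceName = "ai-cache:";            // assumed prefix added to every key
        });

        return services;
    }
}
```

Because the services only see the `IDistributedCache` abstraction, swapping Redis for another distributed store would change this registration and nothing else.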

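3. Embedding Cache

using System.Security.Cryptography;
using System.Text;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.SemanticKernel.Embeddings;

public class EmbeddingCacheService
{
    private readonly IDistributedCache _cache;
    private readonly ITextEmbeddingGenerationService _embeddingService;
    
    public EmbeddingCacheService(
        IDistributedCache cache,
        ITextEmbeddingGenerationService embeddingService)
    {
        _cache = cache;
        _embeddingService = embeddingService;
    }
    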
    public async Task<ReadOnlyMemory<float>> GetEmbeddingAsync(
        string text,
        CancellationToken cancellationToken = default)
    {
        var cacheKey = $"emb_{ComputeHash(text)}";
        
        var cachedBytes = await _cache.GetAsync(cacheKey, cancellationToken);
        
        if (cachedBytes != null)
        {
            // Deserialize the embedding from its raw float bytes
            var floatArray = new float[cachedBytes.Length / sizeof(float)];
            Buffer.BlockCopy(cachedBytes, 0, floatArray, 0, cachedBytes.Length);
            return new ReadOnlyMemory<float>(floatArray);
        }
        
        // Cache miss: generate the embedding
        var embeddings = await _embeddingService.GenerateEmbeddingsAsync(
            new[] { text },
            kernel: null,
            cancellationToken);
        
        var embedding = embeddings.First();
        
        // Serialize to raw bytes and cache
        var floats = embedding.ToArray();
        var bytes = new byte[floats.Length * sizeof(float)];
        Buffer.BlockCopy(floats, 0, bytes, 0, bytes.Length);
        
        await _cache.SetAsync(
            cacheKey,
            bytes,
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = TimeSpan.FromDays(30)
            },
            cancellationToken);
        
        return embedding;
    }
    
    private string ComputeHash(string text)
    {
        using var sha256 = SHA256.Create();
        var hashBytes = sha256.ComputeHash(Encoding.UTF8.GetBytes(text));
        return BitConverter.ToString(hashBytes).Replace("-", "");
    }
}

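Smart Caching

Versioned Cache

public class VersionedCacheService
{
    private readonly IDistributedCache _cache;
    private int _version;
    
    public VersionedCacheService(IDistributedCache cache, int version = 1)
    {
        _cache = cache;
        _version = version;
    }
    
    private string GenerateCacheKey(string key)
    {
        // The current version is a prefix of every key
        return $"v{Volatile.Read(ref _version)}_{key}";
    }
    
    public Task InvalidateVersionAsync()
    {
        // Bumping the version makes every key written under the previous version
        // unreachable; those entries are not deleted and age out through their TTL.
        // With several app instances, the version would have to live in shared storage.
        Interlocked.Increment(ref _version);
        return Task.CompletedTask;
    }
}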

Cache with Dynamic TTL

public class DynamicTTLCacheService
{
    private readonly IDistributedCache _cache;
    
    public async Task<string> GetWithDynamicTTLAsync(
        string prompt,
        Func<string, TimeSpan> ttlCalculator)
    {
        var cacheKey = GenerateCacheKey(prompt);
        var cached = await _cache.GetStringAsync(cacheKey);
        
        if (cached != null)
        {
            return cached;
        }
        
        var response = await GenerateResponseAsync(prompt);
        
        // TTL derived from characteristics of the response
        var ttl = ttlCalculator(response);
        
        await _cache.SetStringAsync(
            cacheKey,
            response,
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = ttl
            });
        
        return response;
    }
}

// Usage
var response = await cacheService.GetWithDynamicTTLAsync(
    prompt,
    generated => generated.Length > 1000
        ? TimeSpan.FromDays(7)    // Long responses: long TTL
        : TimeSpan.FromHours(1)); // Short responses: short TTL

Semantic Similarity Cache

public class SemanticCacheService
{
    private readonly IDistributedCache _cache;
    private readonly ITextEmbeddingGenerationService _embeddingService;
    private readonly double _similarityThreshold = 0.95;
    
    public async Task<(bool Found, string? Response)> TryGetSimilarAsync(
        string prompt,
        CancellationToken cancellationToken = default)
    {
        // Generate the embedding for the prompt
        var embeddings = await _embeddingService.GenerateEmbeddingsAsync(
            new[] { prompt },
            kernel: null,
            cancellationToken);
        
        var promptEmbedding = embeddings.First();
        
        // Look for similar prompts in the cache (simplified linear scan)
        // In production, use a vector database
        var cachedPrompts = await GetCachedPromptsAsync();
        
        foreach (var cachedPrompt in cachedPrompts)
        {
            var similarity = CalculateCosineSimilarity(
                promptEmbedding,
                cachedPrompt.Embedding);
            
            if (similarity >= _similarityThreshold)
            {
                var response = await _cache.GetStringAsync(cachedPrompt.Key);
                if (response != null)
                {
                    return (true, response);
                }
            }
        }
        
        return (false, null);
    }
    
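    private static double CalculateCosineSimilarity(
        ReadOnlyMemory<float> v1,
        ReadOnlyMemory<float> v2)
    {
        // Cosine similarity: dot(v1, v2) / (|v1| * |v2|).
        // Returns 0 for mismatched lengths or zero-magnitude vectors.
        var a = v1.Span;
        var b = v2.Span;
        if (a.Length != b.Length) return 0.0;
        
        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }
        
        return normA == 0 || normB == 0 ? 0.0 : dot / Math.Sqrt(normA * normB);
    }
    
    private Task<List<CachedPrompt>> GetCachedPromptsAsync()
    {
        // Placeholder: load the cached prompts and their embeddings
        return Task.FromResult(new List<CachedPrompt>());
    }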
}

public class CachedPrompt
{
    public required string Key { get; init; }
    public required ReadOnlyMemory<float> Embedding { get; init; }
}

Invalidation Strategies

Time-Based Invalidation

public class TimeBasedInvalidation
{
    private readonly IMemoryCache _cache;
    
    public void SetWithExpiration<T>(string key, T value, TimeSpan expiration)
    {
        _cache.Set(key, value, new MemoryCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = expiration
        });
    }
}

Event-Based Invalidation

public class EventBasedInvalidation
{
    private readonly IMemoryCache _cache;
    
    public void InvalidateOnEvent(string pattern)
    {
        // Invalidate every key matching the pattern.
        // Note: IMemoryCache has no built-in pattern matching,
        // so the application has to keep its own registry of keys.
    }
    
    public void InvalidateRelated(string entityId)
    {
        // Invalidate the cache entries related to an entity
        var relatedKeys = new[]
        {
            $"entity_{entityId}",
            $"list_with_{entityId}",
            $"summary_of_{entityId}"
        };
        
        foreach (var key in relatedKeys)
        {
            _cache.Remove(key);
        }
    }
}
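
The key registry mentioned in the comments above could look roughly like the following. This is a minimal sketch, not a production design: the wrapper name `TrackedMemoryCache` and its methods are hypothetical, and it assumes that all writes go through the wrapper so the registry stays in sync.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using Microsoft.Extensions.Caching.Memory;

public class TrackedMemoryCache
{
    private readonly IMemoryCache _cache;

    // Side registry of written keys (hypothetical helper; the byte value is unused)
    private readonly ConcurrentDictionary<string, byte> _keys = new();

    public TrackedMemoryCache(IMemoryCache cache) => _cache = cache;

    public void Set<T>(string key, T value, TimeSpan ttl)
    {
        _cache.Set(key, value, ttl);
        _keys[key] = 0;
    }

    public bool TryGet<T>(string key, out T? value) => _cache.TryGetValue<T>(key, out value);

    // Removes every tracked entry whose key starts with the prefix
    // and returns how many keys were dropped from the registry.
    public int InvalidateByPrefix(string prefix)
    {
        var removed = 0;
        var matches = _keys.Keys
            .Where(k => k.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();

        foreach (var key in matches)
        {
            _cache.Remove(key);
            if (_keys.TryRemove(key, out _)) removed++;
        }

        return removed;
    }
}
```

One limitation of this sketch: keys of entries that expire on their own stay in the registry until a matching invalidation runs, so a long-running service would also need some cleanup of stale keys.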

Size-Based Invalidation

public class SizeLimitedCache
{
    private readonly IMemoryCache _cache;
    private readonly long _maxSizeInBytes;
    
    public SizeLimitedCache(IMemoryCache cache, long maxSizeInBytes)
    {
        _cache = cache;
        _maxSizeInBytes = maxSizeInBytes;
    }
    
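    public void Set<T>(string key, T value, long estimatedSize)
    {
        // Size is only enforced when the underlying MemoryCache was created with
        // MemoryCacheOptions.SizeLimit; an entry that would exceed the limit is not stored.
        _cache.Set(key, value, new MemoryCacheEntryOptions
        {
            Size = estimatedSize,
            Priority = CacheItemPriority.Normal
        });
    }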
}

Multi-Level Cache

public class MultiLevelCacheService
{
    private readonly IMemoryCache _l1Cache;      // Level 1: local memory
    private readonly IDistributedCache _l2Cache; // Level 2: Redis
    private readonly IAIService _aiService;      // Level 3: AI service
    
    public async Task<string> GetResponseAsync(
        string prompt,
        CancellationToken cancellationToken = default)
    {
        var cacheKey = GenerateCacheKey(prompt);
        
        // Level 1: local memory
        if (_l1Cache.TryGetValue<string>(cacheKey, out var l1Response) && l1Response is not null)
        {
            return l1Response;
        }
        
        // Level 2: distributed cache
        var l2Response = await _l2Cache.GetStringAsync(cacheKey, cancellationToken);
        if (l2Response != null)
        {
            // Populate L1 with a short TTL so local copies do not outlive L2 by much
            _l1Cache.Set(cacheKey, l2Response, TimeSpan.FromMinutes(5));
            return l2Response;
        }
        
        // Level 3: generate from the AI service
        var response = await _aiService.GenerateAsync(prompt, cancellationToken);
        
        // Populate both levels
        _l1Cache.Set(cacheKey, response, TimeSpan.FromMinutes(5));
        await _l2Cache.SetStringAsync(
            cacheKey,
            response,
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(24)
            },
            cancellationToken);
        
        return response;
    }
    
    private string GenerateCacheKey(string prompt)
    {
        using var sha256 = SHA256.Create();
        var hashBytes = sha256.ComputeHash(Encoding.UTF8.GetBytes(prompt));
        return BitConverter.ToString(hashBytes).Replace("-", "");
    }
}

Cache Monitoring

public class CacheMonitor
{
    private long _hits;
    private long _misses;
    private readonly ILogger<CacheMonitor> _logger;
    
    public CacheMonitor(ILogger<CacheMonitor> logger)
    {
        _logger = logger;
    }
    
    public void RecordHit()
    {
        Interlocked.Increment(ref _hits);
    }
    
    public void RecordMiss()
    {
        Interlocked.Increment(ref _misses);
    }
    
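    public CacheStatistics GetStatistics()
    {
        // Read each counter once so the derived values are consistent with each other
        var hits = Interlocked.Read(ref _hits);
        var misses = Interlocked.Read(ref _misses);
        var totalRequests = hits + misses;
        var hitRate = totalRequests > 0 ? (double)hits / totalRequests : 0;
        
        return new CacheStatistics
        {
            Hits = hits,
            Misses = misses,
            TotalRequests = totalRequests,
            HitRate = hitRate
        };
    }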
    
    public void LogStatistics()
    {
        var stats = GetStatistics();
        _logger.LogInformation(
            "Cache Stats: {Hits} hits, {Misses} misses, {HitRate:P2} hit rate",
            stats.Hits,
            stats.Misses,
            stats.HitRate);
    }
}

public class CacheStatistics
{
    public long Hits { get; init; }
    public long Misses { get; init; }
    public long TotalRequests { get; init; }
    public double HitRate { get; init; }
}

Best Practices

1. Choose an Appropriate TTL

// ✅ Base the TTL on how volatile the data is
var staticDataTTL = TimeSpan.FromDays(7);
var dynamicDataTTL = TimeSpan.FromMinutes(5);
var realtimeDataTTL = TimeSpan.FromSeconds(30);

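2. Mind the Cache Size

// ✅ Cap the cache size to avoid memory pressure
services.AddMemoryCache(options =>
{
    // Units are arbitrary and defined by the application. Once SizeLimit is set,
    // every entry must specify a Size or the cache throws when the entry is added.
    options.SizeLimit = 1024;
});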

3. Selective Caching

// ✅ Only cache expensive operations
public async Task<string> GetDataAsync(string id, bool useCache = true)
{
    if (!useCache || IsRealtimeRequired(id))
    {
        return await FetchFromSourceAsync(id);
    }
    
    return await GetFromCacheAsync(id);
}

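4. Cache Warming

public class CacheWarmer : IHostedService
{
    // Assumed dependency: any cached service exposing GetResponseAsync(prompt, token)
    private readonly DistributedCachedAIService _cachedService;
    
    public CacheWarmer(DistributedCachedAIService cachedService)
    {
        _cachedService = cachedService;
    }
    
    public async Task StartAsync(CancellationToken cancellationToken)
    {
        // Preload frequent queries at startup
        await WarmFrequentQueriesAsync(cancellationToken);
    }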
    
    private async Task WarmFrequentQueriesAsync(CancellationToken cancellationToken)
    {
        var frequentQueries = await GetFrequentQueriesAsync();
        
        foreach (var query in frequentQueries)
        {
            await _cachedService.GetResponseAsync(query, cancellationToken);
        }
    }
    
    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}

Conclusion

Caching is essential for efficient AI applications. Use multiple cache levels, monitor hit rates, and tune TTLs to the usage patterns you observe. A well-designed caching strategy can cut costs by up to 80% and noticeably improve performance; the actual saving depends on the hit rate your workload achieves.

Keywords: caching strategies, distributed cache, Redis, memory cache, AI optimization, performance, cost reduction

By David Cantón Nadales
