AI Engineer
Introduction
- What is an AI Engineer
- AI Engineer vs ML Engineer
- Impact on product development
- AI vs AGI
- Using AI to improve UX
How LLMs work
- Large Language Models (LLMs)
- How LLMs work
- Neural networks
- Transformers
- Tokenization & token counting
- Next-token prediction
- Inference
- Understanding model capabilities
Models & selection
- Pre-trained models
- Closed vs open-source models
- Choosing the right model
- Model selection
- Understanding model capabilities
- Smaller models
- Model variants (SKUs)
- Model families
  - OpenAI GPT
  - Anthropic Claude
  - Google Gemini
  - Meta Llama
  - Mistral
  - Gemma
  - DeepSeek
  - Cohere
  - Perplexity
Using model APIs
- OpenAI API
- Claude Messages API
- Google Gemini API
- Hugging Face Inference SDK
- Input format
- Output format
- Structured outputs
- System prompts
- Token counting
Prompt engineering
- Prompt engineering
- System prompts
- Few-shot prompting
- Chain-of-thought (CoT)
- Context engineering
- Prompt optimization
- Prompt compression
- Constraining inputs & outputs
AI safety & ethics
- AI safety and ethics
- Bias and fairness
- Content moderation APIs
- Conducting adversarial testing
- Safety evaluation
- Data classification
- Anomaly detection
- Adding end-user IDs in prompts
- Know your customers / use cases
Open-source & local models
- Open-source models
- Ollama
- LM Studio
- Hugging Face
  - Hub
  - Models
  - Tasks
  - Inference SDK
- Connect to a local server
- Connect to a remote server
- Quantization
- Llama / Gemma locally
Embeddings
- Embeddings
- Vector embeddings
- Embedding models
- Indexing embeddings
- Semantic search
- Gemini embedding
- Cohere embed
- Jina
Vector databases
- Vector databases
- Chroma
- FAISS
- LanceDB
- Pinecone
- Qdrant
- Weaviate
RAG (Retrieval-Augmented Generation)
- RAG / retrieval-augmented generation
- Chunking
- Indexing embeddings
- Retrieval
- Ranking
- Re-ranking
- Generation
- Data layer
- Semantic search
- RAG chatbots
- RAG over multimodal documents
AI agents
- What are AI agents
- Agent use cases
- Function calling
- Tools
- Manual implementation (the loop)
- ReAct
- External memory
- Context compaction
- Context isolation
- Frameworks & tools
  - LangChain
  - LlamaIndex
  - Haystack
- Development tools
- Agent SDKs / coding agents
  - Claude Agent SDK
  - Google ADK
  - Claude Code
  - OpenAI Codex
  - Cursor
MCP (Model Context Protocol)
- What is MCP
- Building an MCP client
- Building an MCP server
- MCP client
- MCP server
Multimodal AI
- Multimodal
- Image understanding
- Image generation
- DALL-E API
- Vision-language models
- Audio processing
- Speech recognition
- Whisper
- Video understanding
- Multimodal RAG
- LangChain for multimodal apps
- LlamaIndex for multimodal apps
Classic NLP tasks
- NLP tasks
- Sentiment analysis
- Summarization
- Data classification
- Anomaly detection
- Web search integration
Evaluation, optimization & monitoring
- Model evaluation
- Safety evaluation
- Model optimization
- Quantization
- Monitoring
- Monitoring LLM apps
- Inference cost/latency
Fine-tuning & training
- Fine-tuning
- Training custom models
- Transformers
- Neural networks
Data & integrations
- SQL databases
- Web search integration
- Data layer