Context and Conversations
When you're working with an LLM, conversations don't flow the same way they do with your team members. Think of it this way: every time you send a message, the AI isn't actually "remembering" your previous exchanges. It's more like handing someone a complete transcript of your entire conversation and asking them to respond based on everything they just read.
The text the model can see at once is called its context window (its maximum size is the context length). Here's what happens behind the scenes: with each new prompt you send, the system packages up your entire conversation history (your initial question, the AI's response, your follow-up, its next response, and so on) and sends this complete thread to the model along with your latest message. It's like having to hand over the entire audit file every time you want to discuss a specific finding with your senior.
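To make that bundling step concrete, here's a minimal Python sketch. The `send_to_model` function is a hypothetical stand-in for whatever chat API your tool actually calls; the point is that the full message list, not just the newest question, goes out on every turn.

```python
conversation = [
    {"role": "user", "content": "How should I test controls over revenue recognition?"},
]

def send_to_model(messages):
    # Hypothetical stand-in for a real chat API call. Here it just reports
    # how much history the model would see on this turn.
    return f"(model reply based on {len(messages)} messages of context)"

# Turn 1: the model sees a single message.
reply = send_to_model(conversation)
conversation.append({"role": "assistant", "content": reply})

# Turn 2: the ENTIRE history, plus the new question, goes back to the model.
conversation.append({"role": "user", "content": "How does that impact my substantive testing?"})
reply = send_to_model(conversation)
conversation.append({"role": "assistant", "content": reply})
```

Nothing is stored on the model's side between calls; the growing `conversation` list is the only memory there is.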
This bundled conversation history serves as the context that helps the LLM understand what you're really asking about. By seeing the full thread with each interaction, the model can:
- Stay consistent: It can see what it told you about control testing procedures earlier and build on that foundation, rather than giving you contradictory guidance.
- Reference previous work: When you ask "What about that revenue recognition issue we discussed?", the context allows the model to locate that specific discussion within your conversation history.
- Handle follow-up questions: Questions like "How does that impact my substantive testing approach?" make sense only because the model can see what "that" refers to from the preceding exchange.
The transformer architecture that powers modern LLMs is particularly good at handling these long conversation threads. Its attention mechanisms work like having multiple experienced auditors simultaneously reviewing different parts of your workpapers: the model can weigh the importance of points from anywhere in your conversation when crafting its response.
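If you're curious what "weighing importance" looks like mechanically, here is a toy NumPy sketch of scaled dot-product attention, the core operation inside a transformer. The dimensions and random inputs are invented for illustration; real models run this at far larger scale across many heads and layers.

```python
import numpy as np

def attention(Q, K, V):
    # Each position scores every other position, then takes a weighted average.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # relevance of every token to every query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                                       # blend of the whole thread, weighted by relevance

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                # 5 "tokens" in the thread, 8-dimensional embeddings
Q = K = V = rng.normal(size=(seq_len, d_model))
print(attention(Q, K, V).shape)        # (5, 8): every token draws on all 5 positions at once
```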
But here's the catch: there's a limit to how much conversation history the model can handle at once, just like there's a practical limit to how many workpapers you can effectively review simultaneously. If your conversation grows long enough to exceed this context window, most tools quietly drop the earliest parts of your discussion, which fall out of the model's awareness.
When this happens, the LLM simply loses access to those early details and may seem to "forget" important information from the start of the conversation. This can lead to responses that feel disconnected from your original objectives or that ask you to re-explain things you've already covered, much like when a new team member joins a project midway through and needs to be brought up to speed.
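Here's a rough sketch of the sliding-window trimming many chat tools apply when a thread outgrows the limit. The token budget and the four-characters-per-token estimate are illustrative assumptions, not real model figures; production tools use the model's actual tokenizer and limits.

```python
MAX_TOKENS = 200  # deliberately tiny budget for demonstration

def estimate_tokens(message):
    # Crude approximation; real tokenizers count differently.
    return len(message["content"]) // 4 + 1

def trim_to_fit(conversation, budget=MAX_TOKENS):
    trimmed = list(conversation)
    # Drop the OLDEST messages first until the thread fits the window.
    while trimmed and sum(estimate_tokens(m) for m in trimmed) > budget:
        trimmed.pop(0)
    return trimmed

history = [
    {"role": "user", "content": "x" * 600},       # early discussion
    {"role": "assistant", "content": "y" * 600},
    {"role": "user", "content": "What about that revenue recognition issue?"},
]
print(len(trim_to_fit(history)))  # 2: the earliest message has been dropped
```

Whatever is popped here is exactly what the model "forgets": it never sees those messages again.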