
From Black Box to Blueprint

by Delarno


A remarkably common situation in large, established enterprises is systems that nobody wants to touch but everyone depends on. They run payroll, handle logistics, reconcile inventory, or process customer orders. They’ve been in place for decades, evolving slowly, built on stacks no one teaches anymore and maintained by a shrinking pool of experts. It’s hard to find a person (or a team) who can confidently say they know the system well enough to provide the functional specs. The result is a very long analysis cycle, and many programs end up delayed or stopped midway by analysis paralysis.

These systems often live inside frozen environments: outdated databases,
legacy operating systems, brittle VMs. Documentation is either missing or
hopelessly out of sync with reality. The people who wrote the code have long
since moved on. Yet the business logic they embody is still critical to
daily operations of thousands of users. The result is what we call a black
box: a system whose outputs we can observe, but whose inner workings remain
opaque. For CXOs and technology leaders, these black boxes create a
modernization deadlock:

  • Too risky to replace without fully understanding them
  • Too costly to maintain on life support
  • Too critical to ignore

This is where AI-assisted reverse engineering becomes not just a technical curiosity, but a strategic enabler. By reconstructing the functional intent of a system, even when its source code is missing, we can turn fear and opacity into clarity. And with clarity comes the confidence to modernize.

The System We Encountered

The system itself was vast in both scale and complexity. Its databases
across multiple platforms contained more than 650 tables and 1,200 stored
procedures, reflecting decades of evolving business rules. Functionality
extended across 24 business domains and was presented through nearly 350
user screens. Behind the scenes, the application tier consisted of 45
compiled DLLs, each with thousands of functions and virtually no surviving
documentation. This intricate mesh of data, logic, and user workflows,
tightly integrated with multiple enterprise systems and databases, made
the application extremely challenging to modernize.

Our task was to carry out an experiment to see whether we could use AI to create a functional specification of the existing system with sufficient detail to drive the implementation of a replacement system. We completed the experiment for an end-to-end thin slice covering both reverse and forward engineering. Our confidence is high because we did multiple levels of cross-checking and verification: we walked through the reverse-engineered functional spec with system administrators and users to confirm the intended functionality, and we verified that the spec we generated was sufficient for forward engineering as well.

The client issued an RFP for this work, which we estimated would take six months for a team peaking at 20 people. Sadly for us, they decided to work with one of their existing preferred partners, so we won’t be able to see how our experiment scales to the full system in practice. We do, however, think we learned enough from the exercise to be worth sharing with our professional colleagues.

Key Challenges

  1. Missing Source Code: legacy understanding is already complex when you have the source code and an SME (in some form) to put everything together. When the source code is missing and there are no experts, the challenge is even greater. What’s left are compiled binaries, and not the recent kind that are easy to decompile thanks to rich metadata (like .NET assemblies or JARs); these are older native binaries, the kind you might find on an old Windows XP machine under C:\Windows\system32. Even when the database is accessible, it does not tell the whole story: stored procedures and triggers encode decades of accumulated business rules, and the schema reflects compromises made in contexts that are now unknown.
  2. Outdated Infrastructure: the OS and database reached end of life long ago, well past their last LTS releases. The application has been kept frozen inside a VM, which not only puts business continuity at risk but also significantly increases security vulnerabilities, non-compliance, and liability.
  3. Institutional Knowledge Lost: while thousands of end users work with the system continuously, hardly any business knowledge is available beyond occasional support activities. The live system is the best source of knowledge, and the only reliable view of functionality is what users see on screen. But the UI captures only the “last mile” of execution; behind each screen lies a tangled web of logic deeply integrated with multiple other core systems. This is a common challenge, and this system was no exception, with a history of multiple failed modernization attempts.

Our Goal

The objective is to create a rich, comprehensive functional specification of the legacy system, with high confidence, without needing its original code. This specification then serves as the blueprint for building a modern replacement application from a clean slate.

  • Understand the overall picture of the system boundary and the integration patterns
  • Build detailed understanding of each functional area
  • Identify the common and exceptional scenarios

To make sense of a black-box system, we needed a structured way to pull
together fragments from different sources. Our principle was simple: don’t
try to recover the code — reconstruct the functional intent.

Our Multi-Lens Approach

It was a 3-tier architecture: Web tier (ASP), App tier (DLL), and Persistence (SQL). This architecture pattern gave us a jump start even without a source repo. We extracted the ASP files, database schema, and stored procedures from the production system; for the App tier we only had the native binaries. With all this information available, we planned to create a semi-structured description of application behaviour in natural language, so that business users could validate their understanding and expectations, and then use the validated functional spec for accelerated forward engineering. For the semi-structured description, our approach had broadly two parts:

  1. Using AI to connect the dots across different data sources
  2. AI-assisted binary archaeology to uncover the hidden functionality in the native DLL files

Connecting the dots across different data sources

UI Layer Reconstruction

By browsing the existing live application and reviewing screenshots, we identified the UI elements. Using the ASP and JS content, we then added the dynamic behaviour associated with each UI element. This gave us a UI spec for each screen.

What we looked for: validation rules, navigation paths, hidden fields. One of the key challenges we faced from the early stages was hallucination, so at every step we attached a detailed lineage to ensure we could cross-check and confirm. For every key piece of information we recorded the lineage (which artifact it came from) along with its context. Here the LLM really sped up summarizing large numbers of screen definitions and consolidating logic from the ASP and JS sources with the already identified UI layouts and field descriptions, work that would otherwise have taken weeks to create and consolidate.
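
As an illustration of what such a lineage-carrying record could look like, here is a minimal sketch in Python (not our actual tooling; the screen, field, and file names are made up):

```python
from dataclasses import dataclass, field

@dataclass
class UiElementSpec:
    """One reconstructed UI element plus the evidence trail behind it."""
    screen: str       # e.g. "OrderEntry.asp" (hypothetical)
    element: str      # e.g. "txtQuantity" (hypothetical)
    behaviour: str    # natural-language description of validation/navigation
    lineage: list = field(default_factory=list)

    def add_evidence(self, source: str, detail: str) -> None:
        # Every inferred behaviour points back to a concrete artifact,
        # so hallucinated claims can be caught during review.
        self.lineage.append({"source": source, "detail": detail})


spec = UiElementSpec(
    screen="OrderEntry.asp",
    element="txtQuantity",
    behaviour="Positive integer required; submit is blocked when empty.",
)
spec.add_evidence("OrderEntry.asp", "input rendered with a required flag")
spec.add_evidence("validate.js", "checkQuantity() rejects values <= 0")
print(spec.lineage)
```

The exact shape matters less than the rule it enforces: no behaviour goes into the spec without at least one named source artifact behind it.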

Discovery with Change Data Capture (CDC)

We planned to use Change Data Capture (CDC) to trace how UI actions mapped
to database activity, retrieving change logs from MCP servers to track the
workflows. Environment constraints meant CDC could only be enabled partially,
limiting the breadth of captured data.

Other potential sources—such as front-end/back-end network traffic,
filesystem changes, additional persistence layers, or even debugging
breakpoints—remain viable options for finer-grained discovery. Even with
partial CDC, the insights proved valuable in linking UI behavior to underlying
data changes and enriching the system blueprint.
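
To illustrate how even partial CDC data can be tied back to a UI action, here is a minimal sketch assuming SQL Server CDC enabled on a hypothetical dbo.Orders table (capture instance dbo_Orders) and a read-only connection; none of these names come from the actual system.

```python
import pyodbc

# Minimal sketch: after performing a single UI action, pull everything CDC
# captured for the (hypothetical) dbo_Orders capture instance, so the row
# changes can be attached to the screen/action being documented.
conn = pyodbc.connect("DSN=legacy_readonly")  # hypothetical read-only DSN
cur = conn.cursor()

cur.execute("""
    DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
    DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
    SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, 'all');
""")

columns = [col[0] for col in cur.description]
for row in cur.fetchall():
    change = dict(zip(columns, row))
    # __$operation: 1 = delete, 2 = insert, 3 = update (before), 4 = update (after)
    print(change["__$operation"], change)
```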

Server Logic Inference

We then added more context by supplying the type libraries extracted from the native binaries, along with the stored procedures and schema extracted from the database. At this point, with information about layout, presentation logic, and DB changes, the server logic could be inferred: which stored procedures are likely called, and which tables are involved, for most methods and interfaces defined in the native binaries. This process led to an inferred server-logic spec. The LLM helped by proposing likely relationships between App-tier code and procedures/tables, which we then validated against observed data flows.
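
One simple heuristic that can sit underneath the LLM’s proposals is name similarity between App-tier methods (from the extracted typelibs) and stored procedures (from the database catalog). The sketch below uses made-up names; the real inputs would come from the typelib and schema extracts, and every high-scoring pair is only a hypothesis until confirmed against observed data flows.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical inputs; real lists come from the typelib and schema extracts.
typelib_methods = ["Order_SaveDetails", "Order_GetStatus", "Inventory_Reconcile"]
stored_procs = ["usp_SaveOrderDetail", "usp_GetOrderStatus", "usp_ReconcileStock"]

def similarity(a: str, b: str) -> float:
    """Crude lexical similarity between a method name and a procedure name."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

candidates = sorted(
    ((m, p, similarity(m, p)) for m, p in product(typelib_methods, stored_procs)),
    key=lambda t: t[2],
    reverse=True,
)

for method, proc, score in candidates[:5]:
    # High-scoring pairs become hypotheses to validate against CDC traces.
    print("%-22s -> %-22s score=%.2f" % (method, proc, score))
```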

AI-Assisted Binary Archaeology

The most opaque layer was the compiled binaries (DLLs, executables). Here,
we treated binaries as artifacts to be decoded rather than rebuilt. What we
looked for: call trees, recurring assembly patterns, candidate entry points.
AI assisted in bulk summarizing disassembled code into human-readable
hypotheses, flagging probable function roles — always validated by human
experts.

The impact of poor deployment practices was evident: the production machine had multiple versions of the same file, with file names (often confusing ones) used to distinguish the versions. Timestamps provided some clues, and the Windows registry helped us locate which binaries were actually in use. There were also proxies for each binary that passed calls through to the actual binary, allowing the App tier to run on a different machine than the Web tier. The fact that the proxy binaries had the same names as the target binaries added to the confusion.

We didn’t have to read the raw binary code of the DLLs. Tools like Ghidra disassemble a binary into a large set of ASM functions, and some even offer to decompile the ASM into C code, but we found those conversions are not always accurate; in our case the decompilation to C missed a crucial lead.

Each DLL had thousands of assembly functions, so we settled on an approach where we identify the functions relevant to a functional area and decode what that subtree of functions does.

Prior Attempts

Before we arrived at this approach, we tried:

  • Brute force: we added all the assembly functions into a workspace and used the LLM agent to turn them into human-readable pseudocode. We faced multiple challenges with this, and ran out of the 1-million-token context window as the LLM eventually tried to load every function because of dependencies (references it encountered, e.g. function calls, and other functions referencing the current one).
  • Batching: we split the set of functions into multiple batches, a file each with hundreds of functions, and used the LLM to analyze each batch in isolation. We faced a lot of hallucination issues, plus file-size issues while streaming to the model. A few functions were converted meaningfully, but many others made no sense at all; they all sounded like similar capabilities, and on cross-checking we realised we were seeing the hallucination effect.
  • One function at a time: the next attempt was to convert the functions one at a time, so the LLM always had a fresh, narrow context window to limit hallucination. We hit multiple practical challenges (API usage limits, rate limits), we couldn’t verify whether the LLM’s translation of the business logic was right or wrong, and we couldn’t connect the dots between the functions. As an interesting note, in this approach we even found some C++ standard-library functions such as std::vector::insert, and a lot of functions turned out to be unwind handlers used purely to call destructors when an exception happens (stack unwinding), plus catch-block functions. Clearly we needed to focus on the business logic and ignore the compiled library functions mixed into the binary.

After these attempts we decided to change our approach to slice the DLL based
on functional area/workflow rather than consider the complete assembly code.
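
As raw material for that slicing, the first thing needed is an inventory of every function and its call edges. A minimal sketch of what that export could look like using Ghidra’s Python (Jython) scripting is shown below; the output path is made up, and this is illustrative rather than our exact tooling.

```python
# Ghidra Python (Jython) script sketch: dump every function's entry point and
# its direct callers/callees to CSV so a functional slice can be chosen later.
# `currentProgram` and `monitor` are provided by the Ghidra script environment.
import csv

OUTPUT = "/tmp/function_inventory.csv"  # hypothetical path

fm = currentProgram.getFunctionManager()
with open(OUTPUT, "wb") as handle:       # Jython 2.x csv expects a byte stream
    writer = csv.writer(handle)
    writer.writerow(["name", "entry_point", "callers", "callees"])
    for func in fm.getFunctions(True):   # True = iterate in address order
        callers = [f.getName() for f in func.getCallingFunctions(monitor)]
        callees = [f.getName() for f in func.getCalledFunctions(monitor)]
        writer.writerow([
            func.getName(),
            str(func.getEntryPoint()),
            ";".join(callers),
            ";".join(callees),
        ])
print("wrote inventory for %d functions" % fm.getFunctionCount())
```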

Finding the relevant function

The first challenge in the functional-area/workflow approach is to find a link or entry point among the thousands of functions.

One of the available options was to look carefully at the constants and strings in the DLL. We used historical context: in the late 1990s and early 2000s, the common patterns for inserting data into the DB were “select for insert”, “insert/update handled by a stored procedure”, or going through ADO (Microsoft’s ActiveX Data Objects data-access library). Interestingly, we found all of these patterns in different parts of the system.

The functionality we were studying inserts or updates the DB at the end of the process, but we couldn’t find any insert or update queries in the strings, and no stored procedure performing the operation either. It turned out that this particular functionality issued a SELECT through SQL and then performed the update via ADO.

We got our break from a table name mentioned in the strings/constants, which led us to the function using that SQL statement. An initial look at that function didn’t reveal much; it could have been in the same functional area but part of a different workflow.
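
Scanning for that kind of lead needs nothing more exotic than a strings-style pass over the DLL. A minimal sketch, with a made-up DLL name and table name:

```python
import re

# Pull printable ASCII runs out of the DLL and keep the ones mentioning a
# table of interest. The DLL path and table name below are made up.
DLL_PATH = "OrderProcessing.dll"
TABLE_NAME = b"ORDER_HEADER"

with open(DLL_PATH, "rb") as handle:
    data = handle.read()

# Runs of 6+ printable ASCII characters, similar to the Unix `strings` default.
candidates = re.findall(rb"[\x20-\x7e]{6,}", data)

for raw in candidates:
    if TABLE_NAME in raw.upper():
        # Embedded SQL text such as "SELECT ... FROM ORDER_HEADER ..." shows up
        # here, giving an anchor for locating the function that uses it.
        print(raw.decode("ascii", errors="replace"))
```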

Building the relevant subtree

The ASM code, and our disassembly tool, gave us function call-reference data. Using it, and assuming the statement execution is one of the leaf functions, we walked up the tree, navigating to the parent that called each function in order to understand its context. At each step we converted the ASM into pseudocode to build context.
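
In Ghidra’s scripting environment, that walk-up can be sketched roughly as below: start from the function referencing the SQL string, collect its transitive callers, and decompile only that subtree into pseudocode for the LLM to work on. The seed function name is hypothetical, and this is an illustration rather than our exact script.

```python
# Ghidra Python (Jython) sketch: starting from the function that references the
# SQL string, walk up through its callers and decompile just that subtree.
from ghidra.app.decompiler import DecompInterface

SEED_NAME = "FUN_10023a40"  # hypothetical: the function using the SQL statement
seed = getGlobalFunctions(SEED_NAME)[0]   # FlatProgramAPI lookup by name

# Breadth-first walk up the call tree from the seed (leaf) function.
subtree, frontier = [seed], [seed]
while frontier:
    current = frontier.pop(0)
    for caller in current.getCallingFunctions(monitor):
        if caller not in subtree:
            subtree.append(caller)
            frontier.append(caller)

decompiler = DecompInterface()
decompiler.openProgram(currentProgram)
for func in subtree:
    result = decompiler.decompileFunction(func, 60, monitor)
    if result.decompileCompleted():
        # Pseudocode for this function, to be fed to the LLM with its
        # parent/child context attached.
        print(result.getDecompiledFunction().getC())
```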

Earlier, when we converted ASM to pseudocode with the brute-force approach, we couldn’t cross-verify whether it was true. This time we were better prepared, because we knew what to anticipate before a SQL execution and could use the context we had gathered in earlier steps.

We mapped out the relevant functions using this call-tree navigation, sometimes having to back out of wrong paths. We also learned about context poisoning the hard way: we inadvertently passed what we were looking for into the LLM, and from that moment the LLM started colouring its output toward what we wanted to find, leading us down wrong paths and eroding trust. We had to recreate a clean room for the AI to work in during this stage.

We ended up with a high-level outline of what the different functions were and what they could be doing. For a given workflow, we narrowed down from 4,000+ functions to around 40 that we needed to deal with.

Multi-Pass Enrichment

AI accelerated the assembly archaeology layer by layer, pass by pass: we applied multi-pass enrichment. In each pass we navigated either from the leaf nodes to the top of the tree or in reverse, and at each step we enriched a function’s context using either its parent’s context or its children’s context. This helped us turn the technical pseudocode conversion into a functional specification. We followed simple techniques, like asking the LLM to give methods meaningful names based on the known context. After multiple passes we built out the entire functional context.
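
To make the idea concrete, here is a minimal sketch of such an enrichment loop, with a hypothetical summarize_with_llm() standing in for the actual model call and a toy call tree; it simply alternates leaf-to-root and root-to-leaf passes, folding neighbour context into each function’s description.

```python
# Sketch of multi-pass enrichment over the sliced call tree. `call_tree` maps
# each function name to its callees and `pseudocode` holds the decompiled text.
def summarize_with_llm(name, pseudocode, neighbour_context):
    # Placeholder: in practice this prompts the LLM with the pseudocode plus
    # the already-enriched context of callers or callees, and asks for a
    # functional description and a meaningful method name.
    return "functional summary of %s given %s" % (name, sorted(neighbour_context))

def enrich(call_tree, pseudocode, passes=3):
    parents = {}
    for parent, children in call_tree.items():
        for child in children:
            parents.setdefault(child, []).append(parent)

    context = {name: "" for name in pseudocode}
    for i in range(passes):
        bottom_up = (i % 2 == 0)  # alternate leaf-to-root and root-to-leaf
        for name in pseudocode:
            neighbours = call_tree.get(name, []) if bottom_up else parents.get(name, [])
            neighbour_context = {n: context[n] for n in neighbours if n in context}
            context[name] = summarize_with_llm(name, pseudocode[name], neighbour_context)
    return context

# Hypothetical toy slice: a submit handler calling validation and SQL execution.
tree = {"Submit": ["Validate", "ExecuteSql"], "Validate": [], "ExecuteSql": []}
code = {name: "...pseudocode..." for name in tree}
print(enrich(tree, code)["Submit"])
```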

Validating the entry point

The last and critical challenge was to confirm the entry function. As is typical in C++, virtual functions made it harder to link entry functions to the class definition. While the functionality looked complete starting from the root node, we were not sure whether any additional operation was happening in a parent function or a wrapper. Life would have been easier with a debugger enabled: a simple breakpoint and a review of the call stack would have confirmed it.

Instead, we confirmed the entry point by triangulating across techniques such as:

  1. Call stack analysis
  2. Validating argument signatures and the return signature on the stack
  3. Cross-checking with UI layer calls (e.g., associating method signature
    with the “submit” call from Web tier, checking parameter types and usage, and
    validating against that context)

Building the Spec from Fragments to Functionality

By integrating the reconstructed elements from the previous stages (UI layer reconstruction, discovery with CDC, server logic inference, and binary analysis of the App tier), we recreated a complete functional summary of the system with high confidence. This comprehensive specification forms a traceable and reliable foundation for business review and for modernization/forward-engineering efforts.

From our work, a set of repeatable practices emerged. These aren’t
step-by-step recipes — every system is different — but guiding patterns that
shape how to approach the unknown.

  1. Start Where Visibility is Highest: Begin with what you can see and trust:
    screens, data schemas, logs. These give a foundation of observable behavior
    before diving into opaque binaries. This avoids analysis paralysis by anchoring
    early progress in artifacts users already understand.
  2. Enrich in Passes: Don’t overload AI or humans with the whole system at
    once. Break artifacts into manageable chunks, extract partial insights, and
    progressively build context. This reduces hallucination risk, limits assumptions, and scales better across large legacy estates.
  3. Triangulate Everything: Never rely on a single artifact. Confirm every
    hypothesis across at least two independent sources — e.g., a screen flow matched
    against a stored procedure, then validated in a binary call tree. This builds confidence in conclusions and exposes hidden contradictions.
  4. Preserve Lineage: Track where every piece of inferred knowledge comes
    from — UI screen, schema field, binary function. This “audit trail” prevents
    false assumptions from propagating unnoticed. When questions arise later, you
    can trace back to original evidence.
  5. Keep Humans in the Loop: AI can accelerate analysis, but it cannot
    replace domain understanding. Always pair AI hypotheses with expert validation,
    especially for business-critical rules. This helps avoid embedding AI errors directly into future modernization designs.

Conclusion and Key Takeaways

Black-box reverse engineering, especially when supercharged with AI, offers significant advantages for legacy system modernization:

  • Accelerated Understanding: AI speeds up legacy system understanding from
    months to weeks, transforming complex tasks like converting assembly code into
    pseudocode and classifying functions into business or utility categories.
  • Reduced Fear of Undocumented Systems: organizations no longer need to
    fear undocumented legacy systems.
  • Reliable First Step for Modernization: reverse engineering becomes a
    reliable and responsible first step toward modernization.

This approach unlocks Clear Functional Specifications even without source code, Better-Informed Decisions for modernization and cloud migration, and Insight-Driven Forward Engineering that moves away from guesswork.

The future holds much faster legacy modernization due to the
impact of AI tools, drastically reducing steep costs and risky long-term
commitments. Modernization is expected to happen in “leaps and bounds”. In the
next 2-3 years we could expect more systems to be retired than in the last 20 years. We recommend starting small: even a sandboxed reverse-engineering effort can uncover surprising insights.



