A Coding Implementation to Build an AI Agent with Live Python Execution and Automated Validation


In this tutorial, we will discover how to harness the power of an advanced AI Agent, augmented with both Python execution and result-validation capabilities, to tackle complex computational tasks. By integrating LangChain's ReAct agent framework with Anthropic's Claude API, we build an end-to-end solution that generates Python code, executes it live, captures its outputs, maintains execution state, and automatically verifies results against expected properties or test cases. This seamless "write → run → validate" loop empowers you to develop robust analyses, algorithms, and simple ML pipelines with confidence in every step.

!pip install langchain langchain-anthropic langchain-core anthropic

We install the core LangChain framework along with the Anthropic integration and its utilities, ensuring you have both the agent orchestration tools (langchain, langchain-core) and the Claude-specific bindings (langchain-anthropic, anthropic) available in your environment.

import os
from langchain.agents import create_react_agent, AgentExecutor
from langchain.tools import Tool
from langchain_core.prompts import PromptTemplate
from langchain_anthropic import ChatAnthropic
import sys
import io
import re
import json
from typing import Dict, Any, List

We bring together everything needed to build our ReAct-style agent: OS access for environment variables, LangChain's agent constructors (create_react_agent, AgentExecutor) and the Tool class for defining custom actions, the PromptTemplate for crafting the chain-of-thought prompt, and Anthropic's ChatAnthropic client for connecting to Claude. Standard Python modules (sys, io, re, json) handle I/O capture, regular expressions, and serialization, while typing provides type hints for clearer, more maintainable code.

class PythonREPLTool:
    def __init__(self):
        self.globals_dict = {
            '__builtins__': __builtins__,
            'json': json,
            're': re
        }
        self.locals_dict = {}
        self.execution_history = []

    def run(self, code: str) -> str:
        try:
            old_stdout = sys.stdout
            old_stderr = sys.stderr
            sys.stdout = captured_output = io.StringIO()
            sys.stderr = captured_error = io.StringIO()

            execution_result = None

            try:
                # Try to evaluate the code as an expression first
                result = eval(code, self.globals_dict, self.locals_dict)
                execution_result = result
                if result is not None:
                    print(result)
            except SyntaxError:
                # Fall back to executing statements
                exec(code, self.globals_dict, self.locals_dict)

            output = captured_output.getvalue()
            error_output = captured_error.getvalue()

            sys.stdout = old_stdout
            sys.stderr = old_stderr

            self.execution_history.append({
                'code': code,
                'output': output,
                'result': execution_result,
                'error': error_output
            })

            response = f"**Code Executed:**\n```python\n{code}\n```\n\n"
            if error_output:
                response += f"**Errors/Warnings:**\n{error_output}\n\n"
            response += f"**Output:**\n{output if output.strip() else 'No console output'}"

            if execution_result is not None and not output.strip():
                response += f"\n**Return Value:** {execution_result}"

            return response

        except Exception as e:
            sys.stdout = old_stdout
            sys.stderr = old_stderr

            error_info = f"**Code Executed:**\n```python\n{code}\n```\n\n**Runtime Error:**\n{str(e)}\n**Error Type:** {type(e).__name__}"

            self.execution_history.append({
                'code': code,
                'output': '',
                'result': None,
                'error': str(e)
            })

            return error_info

    def get_execution_history(self) -> List[Dict[str, Any]]:
        return self.execution_history

    def clear_history(self):
        self.execution_history = []

This PythonREPLTool encapsulates a stateful, in-process Python REPL: it executes arbitrary code (evaluating expressions or running statements), redirects stdout/stderr to capture outputs and errors, and maintains a history of every execution. Returning a formatted summary, including the executed code, any console output or errors, and return values, provides clear, reproducible feedback for every snippet run within our agent.
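As a quick sketch (not part of the original notebook), the REPL keeps state across calls, so a variable defined in one run() call is visible in the next:

# Illustrative only: statefulness of PythonREPLTool across calls.
repl = PythonREPLTool()
print(repl.run("x = [n**2 for n in range(5)]"))   # statement: falls back to exec(), no console output
print(repl.run("sum(x)"))                         # expression: evaluated via eval(), output shows 30
print(len(repl.get_execution_history()))          # two entries recorded in the history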

class ResultValidator:
    def __init__(self, python_repl: PythonREPLTool):
        self.python_repl = python_repl

    def validate_mathematical_result(self, description: str, expected_properties: Dict[str, Any]) -> str:
        """Validate mathematical computations"""
        validation_code = f"""
# Validation for: {description}
validation_results = {{}}


# Get the last execution results
history = {self.python_repl.execution_history}
if history:
    last_execution = history[-1]
    print(f"Last execution output: {{last_execution['output']}}")

    # Extract numbers from the output
    import re
    numbers = re.findall(r'\\d+(?:\\.\\d+)?', last_execution['output'])
    if numbers:
        numbers = [float(n) for n in numbers]
        validation_results['extracted_numbers'] = numbers

        # Validate expected properties
        for prop, expected_value in {expected_properties}.items():
            if prop == 'count':
                actual_count = len(numbers)
                validation_results[f'count_check'] = actual_count == expected_value
                print(f"Count validation: Expected {{expected_value}}, Got {{actual_count}}")
            elif prop == 'max_value':
                if numbers:
                    max_val = max(numbers)
                    validation_results[f'max_check'] = max_val <= expected_value
                    print(f"Max value validation: {{max_val}} <= {{expected_value}} = {{max_val <= expected_value}}")
            elif prop == 'min_value':
                if numbers:
                    min_val = min(numbers)
                    validation_results[f'min_check'] = min_val >= expected_value
                    print(f"Min value validation: {{min_val}} >= {{expected_value}} = {{min_val >= expected_value}}")
            elif prop == 'sum_range':
                if numbers:
                    total = sum(numbers)
                    min_sum, max_sum = expected_value
                    validation_results[f'sum_check'] = min_sum <= total <= max_sum
                    print(f"Sum validation: {{min_sum}} <= {{total}} <= {{max_sum}} = {{min_sum <= total <= max_sum}}")


print("\\nValidation Summary:")
for key, value in validation_results.items():
    print(f"{{key}}: {{value}}")


validation_results
"""
        return self.python_repl.run(validation_code)
   
    def validate_data_analysis(self, description: str, expected_structure: Dict[str, Any]) -> str:
        """Validate data analysis results"""
        validation_code = f"""
# Data Analysis Validation for: {description}
validation_results = {{}}


# Check if required variables exist in the global scope
required_vars = {list(expected_structure.keys())}
existing_vars = []


for var_name in required_vars:
    if var_name in globals():
        existing_vars.append(var_name)
        var_value = globals()[var_name]
        validation_results[f'{{var_name}}_exists'] = True
        validation_results[f'{{var_name}}_type'] = type(var_value).__name__

        # Type-specific validations
        if isinstance(var_value, (list, tuple)):
            validation_results[f'{{var_name}}_length'] = len(var_value)
        elif isinstance(var_value, dict):
            validation_results[f'{{var_name}}_keys'] = list(var_value.keys())
        elif isinstance(var_value, (int, float)):
            validation_results[f'{{var_name}}_value'] = var_value

        print(f"✓ Variable '{{var_name}}' found: {{type(var_value).__name__}} = {{var_value}}")
    else:
        validation_results[f'{{var_name}}_exists'] = False
        print(f"✗ Variable '{{var_name}}' not found")


print(f"\\nFound {{len(existing_vars)}}/{{len(required_vars)}} required variables")


# Additional structure validation
for var_name, expected_type in {expected_structure}.items():
    if var_name in globals():
        actual_type = type(globals()[var_name]).__name__
        validation_results[f'{{var_name}}_type_match'] = actual_type == expected_type
        print(f"Type check '{{var_name}}': Expected {{expected_type}}, Got {{actual_type}}")


validation_results
"""
        return self.python_repl.run(validation_code)
   
    def validate_algorithm_correctness(self, description: str, test_cases: List[Dict[str, Any]]) -> str:
        """Validate algorithm implementations with test cases"""
        validation_code = f"""
# Algorithm Validation for: {description}
validation_results = {{}}
test_results = []


test_cases = {test_cases}


for i, test_case in enumerate(test_cases):
    test_name = test_case.get('name', f'Test {{i+1}}')
    input_val = test_case.get('input')
    expected = test_case.get('expected')
    function_name = test_case.get('function')

    print(f"\\nRunning {{test_name}}:")
    print(f"Input: {{input_val}}")
    print(f"Expected: {{expected}}")

    try:
        if function_name and function_name in globals():
            func = globals()[function_name]
            if callable(func):
                if isinstance(input_val, (list, tuple)):
                    result = func(*input_val)
                else:
                    result = func(input_val)

                passed = result == expected
                test_results.append({{
                    'test_name': test_name,
                    'input': input_val,
                    'expected': expected,
                    'actual': result,
                    'passed': passed
                }})

                status = "✓ PASS" if passed else "✗ FAIL"
                print(f"Actual: {{result}}")
                print(f"Status: {{status}}")
            else:
                print(f"✗ ERROR: '{{function_name}}' is not callable")
        else:
            print(f"✗ ERROR: Function '{{function_name}}' not found")

    except Exception as e:
        print(f"✗ ERROR: {{str(e)}}")
        test_results.append({{
            'test_name': test_name,
            'error': str(e),
            'passed': False
        }})


# Summary
passed_tests = sum(1 for test in test_results if test.get('passed', False))
total_tests = len(test_results)
validation_results['tests_passed'] = passed_tests
validation_results['total_tests'] = total_tests
validation_results['success_rate'] = passed_tests / total_tests if total_tests > 0 else 0


print(f"\\n=== VALIDATION SUMMARY ===")
print(f"Tests passed: {{passed_tests}}/{{total_tests}}")
print(f"Success rate: {{validation_results['success_rate']:.1%}}")


test_results
"""
        return self.python_repl.run(validation_code)

This ResultValidator class builds on the PythonREPLTool to automatically generate and run bespoke validation routines, checking numerical properties, verifying data structures, or running algorithm test cases against the agent's execution history. Emitting Python snippets that extract outputs, compare them to expected criteria, and summarize pass/fail outcomes closes the loop on "execute → validate" within our agent's workflow.

python_repl = PythonREPLTool()
validator = ResultValidator(python_repl)

Here, we instantiate our interactive Python REPL tool (python_repl) and then create a ResultValidator tied to that same REPL instance. This wiring ensures that any code you execute is immediately available for automated validation steps, closing the loop on execution and correctness checking.
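For instance, pairing the two looks like this (an illustrative sketch, not part of the original listing): first run a computation through the REPL, then ask the validator to check numeric properties of its printed output.

# Illustrative only: run a snippet, then validate numeric properties of its output.
python_repl.run("primes = [2, 3, 5, 7, 11]\nprint(primes)")
report = validator.validate_mathematical_result(
    "primes below 12",
    {"count": 5, "max_value": 11, "min_value": 2},
)
print(report)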

python_tool = Tool(
    name="python_repl",
    description="Execute Python code and return both the code and its output. Maintains state between executions.",
    func=python_repl.run
)


validation_tool = Tool(
    name="result_validator",
    description="Validate the results of previous computations with specific test cases and expected properties.",
    func=lambda query: validator.validate_mathematical_result(query, {})
)

Here, we wrap our REPL and validation methods into LangChain Tool objects, assigning them clear names and descriptions. The agent can invoke python_repl to run code and result_validator to check the last execution against your specified criteria automatically.
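Each Tool can also be exercised directly, which is a handy sanity check before handing it to the agent (illustrative only; the query string is an arbitrary example):

# Illustrative only: call the wrapped tools directly before wiring them into the agent.
print(python_tool.run("total = sum(range(1, 11))\nprint(total)"))   # runs code through the shared REPL
print(validation_tool.run("sum of 1 to 10"))                        # validates the last execution's output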

prompt_template = """You're Claude, a sophisticated AI assistant with Python execution and outcome validation capabilities.


You possibly can execute Python code to resolve advanced issues after which validate your outcomes to make sure accuracy.


Obtainable instruments:
{instruments}


Use this format:
Query: the enter query you will need to reply
Thought: analyze what must be accomplished
Motion: {tool_names}
Motion Enter: [your input]
Statement: [result]
... (repeat Thought/Motion/Motion Enter/Statement as wanted)
Thought: I ought to validate my outcomes
Motion: [validation if needed]
Motion Enter: [validation parameters]
Statement: [validation results]
Thought: I now have the whole reply
Remaining Reply: [comprehensive answer with validation confirmation]


Query: {enter}
{agent_scratchpad}"""


prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["input", "agent_scratchpad"],
    partial_variables={
        "tools": "python_repl - Execute Python code\nresult_validator - Validate computation results",
        "tool_names": "python_repl, result_validator"
    }
)

The prompt template above frames Claude as a dual-capability assistant that first reasons ("Thought"), selects from the python_repl and result_validator tools to run code and check outputs, and then iterates until it has a validated solution. By defining a clear chain-of-thought structure with placeholders for tool names and their usage, it guides the agent to: (1) break down the problem, (2) call python_repl to execute the necessary code, (3) call result_validator to confirm correctness, and finally (4) deliver a self-checked "Final Answer." This scaffolding ensures a disciplined "write → run → validate" workflow.
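To inspect exactly what the model will receive, you can render the template yourself (a quick check, not part of the original notebook; the question below is a placeholder):

# Illustrative only: render the ReAct prompt with placeholder values and inspect the final text.
rendered = prompt.format(
    input="What is the sum of the squares of the first 10 positive integers?",  # example question
    agent_scratchpad=""                                                         # empty scratchpad on the first turn
)
print(rendered)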

class AdvancedClaudeCodeAgent:
    def __init__(self, anthropic_api_key=None):
        if anthropic_api_key:
            os.environ["ANTHROPIC_API_KEY"] = anthropic_api_key

        self.llm = ChatAnthropic(
            model="claude-3-opus-20240229",
            temperature=0,
            max_tokens=4000
        )

        self.agent = create_react_agent(
            llm=self.llm,
            tools=[python_tool, validation_tool],
            prompt=prompt
        )

        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=[python_tool, validation_tool],
            verbose=True,
            handle_parsing_errors=True,
            max_iterations=8,
            return_intermediate_steps=True
        )

        self.python_repl = python_repl
        self.validator = validator

    def run(self, query: str) -> str:
        try:
            result = self.agent_executor.invoke({"input": query})
            return result["output"]
        except Exception as e:
            return f"Error: {str(e)}"

    def validate_last_result(self, description: str, validation_params: Dict[str, Any]) -> str:
        """Manually validate the last computation result"""
        if 'test_cases' in validation_params:
            return self.validator.validate_algorithm_correctness(description, validation_params['test_cases'])
        elif 'expected_structure' in validation_params:
            return self.validator.validate_data_analysis(description, validation_params['expected_structure'])
        else:
            return self.validator.validate_mathematical_result(description, validation_params)

    def get_execution_summary(self) -> Dict[str, Any]:
        """Get a summary of all executions"""
        history = self.python_repl.get_execution_history()
        return {
            'total_executions': len(history),
            'successful_executions': len([h for h in history if not h['error']]),
            'failed_executions': len([h for h in history if h['error']]),
            'execution_details': history
        }

This AdvancedClaudeCodeAgent class wraps everything into a single, easy-to-use interface: it configures the Anthropic Claude client (using your API key), instantiates a ReAct-style agent with our python_repl and result_validator tools and the custom prompt, and sets up an executor that drives iterative "think → code → validate" loops. Its run() method lets you submit natural-language queries and returns Claude's final, self-checked answer; validate_last_result() exposes manual hooks for additional checks; and get_execution_summary() provides a concise report on every code snippet you've executed (how many succeeded, how many failed, and their details).
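A minimal end-to-end sketch (assuming your ANTHROPIC_API_KEY is set and the Claude model named above is available to your account) looks like this; the query is an arbitrary example:

# Illustrative only: run one query end to end and inspect the execution summary.
agent = AdvancedClaudeCodeAgent(anthropic_api_key=os.environ.get("ANTHROPIC_API_KEY"))

answer = agent.run("Compute the sum of the squares of the first 10 positive integers and verify the result.")
print(answer)

summary = agent.get_execution_summary()
print(f"{summary['successful_executions']}/{summary['total_executions']} code runs succeeded")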

if __name__ == "__main__":
    API_KEY = "Use Your Own Key Here"

    agent = AdvancedClaudeCodeAgent(anthropic_api_key=API_KEY)

    print("🚀 Advanced Claude Code Agent with Validation")
    print("=" * 60)

    print("\n🔢 Example 1: Prime Number Analysis with Twin Prime Detection")
    print("-" * 60)
    query1 = """
    Find all prime numbers between 1 and 200, then:
    1. Calculate their sum
    2. Find all twin prime pairs (primes that differ by 2)
    3. Calculate the average gap between consecutive primes
    4. Identify the largest prime gap in this range
    After computation, validate that we found the correct number of primes and that all identified numbers are actually prime.
    """
    result1 = agent.run(query1)
    print(result1)

    print("\n" + "=" * 80 + "\n")

    print("📊 Example 2: Advanced Sales Data Analysis with Statistical Validation")
    print("-" * 60)
    query2 = """
    Create a comprehensive sales analysis:
    1. Generate sales data for 12 products across 24 months with realistic seasonal patterns
    2. Calculate monthly growth rates, yearly totals, and trend analysis
    3. Identify the top 3 performing products and the worst 3 performing products
    4. Perform correlation analysis between different products
    5. Create summary statistics (mean, median, standard deviation, percentiles)
    After analysis, validate the data structure, ensure all calculations are mathematically correct, and verify the statistical measures.
    """
    result2 = agent.run(query2)
    print(result2)

    print("\n" + "=" * 80 + "\n")

    print("⚙️ Example 3: Advanced Algorithm Implementation with Test Suite")
    print("-" * 60)
    query3 = """
    Implement and validate a comprehensive sorting and searching system:
    1. Implement quicksort, mergesort, and binary search algorithms
    2. Create test data with various edge cases (empty lists, single elements, duplicates, sorted/reverse sorted)
    3. Benchmark the performance of the different sorting algorithms
    4. Implement a function to find the kth largest element using different approaches
    5. Test all implementations with comprehensive test cases including edge cases
    After implementation, validate each algorithm with multiple test cases to ensure correctness.
    """
    result3 = agent.run(query3)
    print(result3)

    print("\n" + "=" * 80 + "\n")

    print("🤖 Example 4: Machine Learning Model with Cross-Validation")
    print("-" * 60)
    query4 = """
    Build a complete machine learning pipeline:
    1. Generate a synthetic dataset with features and a target variable (classification problem)
    2. Implement data preprocessing (normalization, feature scaling)
    3. Implement a simple linear classifier from scratch (gradient descent)
    4. Split data into train/validation/test sets
    5. Train the model and evaluate performance (accuracy, precision, recall)
    6. Implement k-fold cross-validation
    7. Compare results with different hyperparameters
    Validate the entire pipeline by ensuring mathematical correctness of gradient descent, proper data splitting, and realistic performance metrics.
    """
    result4 = agent.run(query4)
    print(result4)

    print("\n" + "=" * 80 + "\n")

    print("📋 Execution Summary")
    print("-" * 60)
    summary = agent.get_execution_summary()
    print(f"Total code executions: {summary['total_executions']}")
    print(f"Successful executions: {summary['successful_executions']}")
    print(f"Failed executions: {summary['failed_executions']}")

    if summary['failed_executions'] > 0:
        print("\nFailed executions details:")
        for i, execution in enumerate(summary['execution_details']):
            if execution['error']:
                print(f"  {i+1}. Error: {execution['error']}")

    print(f"\nSuccess rate: {(summary['successful_executions']/summary['total_executions']*100):.1f}%")

Finally, we instantiate the AdvancedClaudeCodeAgent with your Anthropic API key, run four illustrative example queries (covering prime-number analysis, sales data analytics, algorithm implementations, and a simple ML pipeline), and print each validated result. The script then gathers and displays a concise execution summary, with total runs, successes, failures, and error details, demonstrating the agent's live "write → run → validate" workflow.

In conclusion, we have developed a versatile AdvancedClaudeCodeAgent capable of seamlessly blending generative reasoning with precise computational control. At its core, this agent doesn't just draft Python snippets; it runs them on the spot and checks their correctness against your specified criteria, closing the feedback loop automatically. Whether you're performing prime-number analyses, statistical data evaluations, algorithm benchmarking, or end-to-end ML workflows, this pattern ensures reliability and reproducibility.


Check out the Notebook on GitHub. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 95k+ ML SubReddit and subscribe to our Newsletter.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.
