Building a BioCypher-Powered AI Agent for Biomedical Knowledge Graph Generation and Querying


In this tutorial, we implement the BioCypher AI Agent, a powerful tool for building, querying, and analyzing biomedical knowledge graphs using the BioCypher framework. By combining BioCypher, a high-performance, schema-based interface for biological data integration, with the flexibility of NetworkX, the tutorial lets users simulate complex biological relationships such as gene-disease associations, drug-target interactions, and pathway involvements. The agent also includes capabilities for generating synthetic biomedical data, visualizing knowledge graphs, and running intelligent queries such as centrality analysis and neighbor detection.

!pip install biocypher pandas numpy networkx matplotlib seaborn


import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import json
import random
from typing import Dict, List, Tuple, Any

We start by installing the essential Python libraries required for our biomedical graph analysis, including BioCypher, Pandas, NumPy, NetworkX, Matplotlib, and Seaborn. These packages let us handle data, create and manipulate knowledge graphs, and visualize relationships effectively. Once they are installed, we import all necessary modules to set up our development environment.

try:
   from biocypher import BioCypher
   from biocypher._config import config
   BIOCYPHER_AVAILABLE = True
except ImportError:
   print("BioCypher not available, using NetworkX-only implementation")
   BIOCYPHER_AVAILABLE = False

We attempt to import the BioCypher framework, which provides a schema-based interface for managing biomedical knowledge graphs. If the import succeeds, we enable BioCypher features; otherwise, we gracefully fall back to a NetworkX-only mode, ensuring that the rest of the analysis can still proceed without interruption.
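The same availability check can be made without a try/except by asking the import system whether a package can be located. The following is a minimal sketch and not part of the original agent: the helper name `is_available` and the `BACKEND` variable are hypothetical, and only the standard library's `importlib.util.find_spec` is used.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be located on the import path."""
    # find_spec returns None when a top-level module cannot be found
    return importlib.util.find_spec(module_name) is not None


# Choose the agent mode up front, mirroring the BIOCYPHER_AVAILABLE flag
BACKEND = "biocypher" if is_available("biocypher") else "networkx"
print(f"Selected backend: {BACKEND}")
```

Locating a package does not guarantee that importing it will succeed, so the try/except form used in the tutorial remains the safer default; this check is mainly useful for reporting which mode will be used before any heavy imports run.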

class BiomedicalAIAgent:
   """Advanced AI Agent for biomedical knowledge graph analysis using BioCypher"""

   def __init__(self):
       if BIOCYPHER_AVAILABLE:
           try:
               self.bc = BioCypher()
               self.use_biocypher = True
           except Exception as e:
               print(f"BioCypher initialization failed: {e}")
               self.use_biocypher = False
       else:
           self.use_biocypher = False
          
       self.graph = nx.Graph()
       self.entities = {}
       self.relationships = []
       self.knowledge_base = self._initialize_knowledge_base()
      
   def _initialize_knowledge_base(self) -> Dict[str, List[str]]:
       """Initialize sample biomedical knowledge base"""
       return {
           "genes": ["BRCA1", "TP53", "EGFR", "KRAS", "MYC", "PIK3CA", "PTEN"],
           "illnesses": ["breast_cancer", "lung_cancer", "diabetes", "alzheimer", "heart_disease"],
           "medication": ["aspirin", "metformin", "doxorubicin", "paclitaxel", "imatinib"],
           "pathways": ["apoptosis", "cell_cycle", "DNA_repair", "metabolism", "inflammation"],
           "proteins": ["p53", "EGFR", "insulin", "hemoglobin", "collagen"]
       }
  
   def generate_synthetic_data(self, n_entities: int = 50) -> None:
       """Generate synthetic biomedical data for demonstration"""
       print("🧬 Generating synthetic biomedical data...")

       # Create one entity record per item in the knowledge base
       for entity_type, items in self.knowledge_base.items():
           for item in items:
               entity_id = f"{entity_type}_{item}"
               self.entities[entity_id] = {
                   "id": entity_id,
                   "type": entity_type,
                   "name": item,
                   "properties": self._generate_properties(entity_type)
               }

       # Draw n_entities random pairs; self-pairs are skipped, so fewer relationships may result
       entity_ids = list(self.entities.keys())
       for _ in range(n_entities):
           source = random.choice(entity_ids)
           target = random.choice(entity_ids)
           if source != target:
               rel_type = self._determine_relationship_type(
                   self.entities[source]["type"],
                   self.entities[target]["type"]
               )
               self.relationships.append({
                   "source": source,
                   "target": target,
                   "type": rel_type,
                   "confidence": random.uniform(0.5, 1.0)
               })

We define the BiomedicalAIAgent class as the core engine for analyzing biomedical knowledge graphs using BioCypher. In the constructor, we check whether BioCypher is available and initialize it if possible; otherwise, we default to a NetworkX-only approach. We also set up our base structures, including an empty graph, a dictionary of entities, a list of relationships, and a predefined biomedical knowledge base. We then use generate_synthetic_data() to populate the agent with biological entities, such as genes, diseases, drugs, pathways, and proteins, and simulate their interactions through randomly sampled relationships whose types are chosen from the entity types involved.
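Because the relationships are sampled at random, two runs of the agent will generally produce different graphs. One way to make the sampling repeatable is to draw from a seeded generator. The sketch below is an illustration under assumptions rather than the tutorial's own code: the function `sample_relationships` and its `seed` parameter are hypothetical, and it reproduces only the pair-sampling step of generate_synthetic_data().

```python
import random
from typing import Dict, List


def sample_relationships(entity_ids: List[str], n_attempts: int, seed: int) -> List[Dict]:
    """Draw random source/target pairs, skipping self-pairs."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    relationships = []
    for _ in range(n_attempts):
        source = rng.choice(entity_ids)
        target = rng.choice(entity_ids)
        if source != target:  # self-pairs are dropped, so fewer than n_attempts may be kept
            relationships.append({
                "source": source,
                "target": target,
                "confidence": rng.uniform(0.5, 1.0),
            })
    return relationships


# Entity IDs follow the "<entity_type>_<name>" pattern used by the agent
ids = ["genes_TP53", "genes_BRCA1", "pathways_apoptosis", "proteins_p53"]
first = sample_relationships(ids, n_attempts=10, seed=42)
second = sample_relationships(ids, n_attempts=10, seed=42)
print("Same seed, same sample:", first == second)
print("Relationships kept:", len(first))
```

Using a dedicated `random.Random` instance instead of calling `random.seed()` keeps the reproducibility local to the data generator and avoids changing the behavior of other code that relies on the global generator.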

   def _generate_properties(self, entity_type: str) -> Dict[str, Any]:
       """Generate realistic properties for different entity types"""
       base_props = {"created_at": "2024-01-01", "source": "synthetic"}

       if entity_type == "genes":
           base_props.update({
               "chromosome": f"chr{random.randint(1, 22)}",
               "expression_level": random.uniform(0.1, 10.0),
               "mutation_frequency": random.uniform(0.01, 0.3)
           })
       elif entity_type == "illnesses":
           base_props.update({
               "prevalence": random.uniform(0.001, 0.1),
               "severity": random.choice(["mild", "moderate", "severe"]),
               "age_of_onset": random.randint(20, 80)
           })
       elif entity_type == "medication":
           base_props.update({
               "dosage": f"{random.randint(10, 500)}mg",
               "efficacy": random.uniform(0.3, 0.95),
               "side_effects": random.randint(1, 10)
           })

       return base_props


   def _determine_relationship_type(self, source_type: str, target_type: str) -> str:
       """Determine biologically meaningful relationship types"""
       relationships_map = {
           ("genes", "illnesses"): "associated_with",
           ("genes", "medication"): "targeted_by",
           ("genes", "pathways"): "participates_in",
           ("medication", "illnesses"): "treats",
           ("proteins", "pathways"): "involved_in",
           ("illnesses", "pathways"): "disrupts"
       }
      
       return relationships_map.get((source_type, target_type),
                                  relationships_map.get((target_type, source_type), "related_to"))


   def build_knowledge_graph(self) -> None:
       """Build knowledge graph using BioCypher or NetworkX"""
       print("🔗 Building knowledge graph...")

       if self.use_biocypher:
           try:
               # add_node/add_edge are called as written in this tutorial; if the installed
               # BioCypher version does not provide them, the handler below falls back to NetworkX.
               for entity_id, entity_data in self.entities.items():
                   self.bc.add_node(
                       node_id=entity_id,
                       node_label=entity_data["type"],
                       node_properties=entity_data["properties"]
                   )

               for rel in self.relationships:
                   self.bc.add_edge(
                       source_id=rel["source"],
                       target_id=rel["target"],
                       edge_label=rel["type"],
                       edge_properties={"confidence": rel["confidence"]}
                   )
               print("✅ BioCypher graph built successfully")
           except Exception as e:
               print(f"BioCypher build failed, using NetworkX only: {e}")
               self.use_biocypher = False

       # The NetworkX graph is always built, since all queries run against it
       for entity_id, entity_data in self.entities.items():
           self.graph.add_node(entity_id, **entity_data)

       for rel in self.relationships:
           self.graph.add_edge(rel["source"], rel["target"],
                             type=rel["type"], confidence=rel["confidence"])

       print(f"✅ NetworkX graph built with {len(self.graph.nodes())} nodes and {len(self.graph.edges())} edges")


   def intelligent_query(self, query_type: str, entity: str = None) -> Dict[str, Any]:
       """Intelligent querying system with multiple analysis types"""
       print(f"🤖 Processing intelligent query: {query_type}")

       if query_type == "drug_targets":
           return self._find_drug_targets()
       elif query_type == "disease_genes":
           return self._find_disease_associated_genes()
       elif query_type == "pathway_analysis":
           return self._analyze_pathways()
       elif query_type == "centrality_analysis":
           return self._analyze_network_centrality()
       elif query_type == "entity_neighbors" and entity:
           return self._find_entity_neighbors(entity)
       else:
           return {"error": "Unknown query type"}


   def _find_drug_targets(self) -> Dict[str, List[str]]:
       """Find potential drug targets"""
       drug_targets = {}
       for rel in self.relationships:
           # Only relationships stored with the gene as the source are counted
           if (rel["type"] == "targeted_by" and
               self.entities[rel["source"]]["type"] == "genes"):
               drug = self.entities[rel["target"]]["name"]
               target = self.entities[rel["source"]]["name"]
               if drug not in drug_targets:
                   drug_targets[drug] = []
               drug_targets[drug].append(target)
       return drug_targets


   def _find_disease_associated_genes(self) -> Dict[str, List[str]]:
       """Find genes associated with diseases"""
       disease_genes = {}
       for rel in self.relationships:
           if (rel["type"] == "associated_with" and
               self.entities[rel["target"]]["type"] == "illnesses"):
               disease = self.entities[rel["target"]]["name"]
               gene = self.entities[rel["source"]]["name"]
               if disease not in disease_genes:
                   disease_genes[disease] = []
               disease_genes[disease].append(gene)
       return disease_genes


   def _analyze_pathways(self) -> Dict[str, int]:
       """Analyze pathway connectivity"""
       pathway_connections = {}
       for rel in self.relationships:
           if rel["type"] in ["participates_in", "involved_in"]:
               if self.entities[rel["target"]]["type"] == "pathways":
                   pathway = self.entities[rel["target"]]["name"]
                   pathway_connections[pathway] = pathway_connections.get(pathway, 0) + 1
       return dict(sorted(pathway_connections.items(), key=lambda x: x[1], reverse=True))


   def _analyze_network_centrality(self) -> Dict[str, Dict[str, float]]:
       """Analyze network centrality measures"""
       if len(self.graph.nodes()) == 0:
           return {}

       centrality_measures = {
           "degree": nx.degree_centrality(self.graph),
           "betweenness": nx.betweenness_centrality(self.graph),
           "closeness": nx.closeness_centrality(self.graph)
       }

       # Keep the five highest-scoring nodes for each measure
       top_nodes = {}
       for measure, values in centrality_measures.items():
           top_nodes[measure] = dict(sorted(values.items(), key=lambda x: x[1], reverse=True)[:5])
      
       return top_nodes


   def _find_entity_neighbors(self, entity_name: str) -> Dict[str, List[str]]:
       """Find neighbors of a specific entity"""
       neighbors = {"direct": [], "indirect": []}
       entity_id = None

       # Resolve the entity name (case-insensitive) to its internal ID; the first match wins
       for eid, edata in self.entities.items():
           if edata["name"].lower() == entity_name.lower():
               entity_id = eid
               break

       if not entity_id or entity_id not in self.graph:
           return {"error": f"Entity '{entity_name}' not found"}

       direct_ids = list(self.graph.neighbors(entity_id))
       for neighbor in direct_ids:
           neighbors["direct"].append(self.entities[neighbor]["name"])

       # Two-hop neighbors: reached through a direct neighbor, excluding the entity and its direct neighbors
       for direct_neighbor in direct_ids:
           for indirect_neighbor in self.graph.neighbors(direct_neighbor):
               if (indirect_neighbor != entity_id and
                   indirect_neighbor not in direct_ids):
                   neighbor_name = self.entities[indirect_neighbor]["name"]
                   if neighbor_name not in neighbors["indirect"]:
                       neighbors["indirect"].append(neighbor_name)

       return neighbors


   def visualize_network(self, max_nodes: int = 30) -> None:
       """Visualize the knowledge graph"""
       print("📊 Creating network visualization...")

       # Limit the plot to the first max_nodes nodes to keep it readable
       nodes_to_show = list(self.graph.nodes())[:max_nodes]
       subgraph = self.graph.subgraph(nodes_to_show)

       plt.figure(figsize=(12, 8))
       pos = nx.spring_layout(subgraph, k=2, iterations=50)

       # Color each node by its entity type
       node_colors = []
       color_map = {"genes": "red", "illnesses": "blue", "medication": "green",
                   "pathways": "orange", "proteins": "purple"}

       for node in subgraph.nodes():
           entity_type = self.entities[node]["type"]
           node_colors.append(color_map.get(entity_type, "grey"))

       nx.draw(subgraph, pos, node_color=node_colors, node_size=300,
               with_labels=False, alpha=0.7, edge_color="grey", width=0.5)

       plt.title("Biomedical Knowledge Graph Network")
       plt.axis('off')
       plt.tight_layout()
       plt.show()

We design a set of methods within the BiomedicalAIAgent class to simulate real-world biomedical scenarios. We generate realistic properties for each entity type, define biologically meaningful relationship types, and build a structured knowledge graph using either BioCypher or NetworkX. To gain insights, we include functions for analyzing drug targets, disease-gene associations, pathway connectivity, and network centrality, along with a visual graph explorer that helps us intuitively understand the interactions between biomedical entities.
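To make two of these queries concrete, the sketch below recomputes degree centrality and direct versus indirect (two-hop) neighbors on a tiny hand-written graph using only the standard library. It is an illustration under assumptions: the adjacency data and the function names are hypothetical, and degree centrality is taken as degree divided by n - 1, the normalization NetworkX documents for nx.degree_centrality.

```python
from typing import Dict, List, Set

# Toy undirected graph as an adjacency mapping (hypothetical data for illustration)
adjacency: Dict[str, Set[str]] = {
    "TP53": {"apoptosis", "breast_cancer"},
    "apoptosis": {"TP53", "p53"},
    "breast_cancer": {"TP53", "doxorubicin"},
    "p53": {"apoptosis"},
    "doxorubicin": {"breast_cancer"},
}


def degree_centrality(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Degree of each node divided by (n - 1); requires at least two nodes."""
    n = len(adj)
    if n < 2:
        raise ValueError("degree centrality needs at least two nodes in this sketch")
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def two_hop_neighbors(adj: Dict[str, Set[str]], node: str) -> Dict[str, List[str]]:
    """Split the neighborhood of a node into direct and indirect (two-hop) neighbors."""
    direct = adj[node]
    indirect: Set[str] = set()
    for neighbor in direct:
        indirect |= adj[neighbor]
    # Exclude the node itself and anything already reachable in one hop
    indirect -= direct | {node}
    return {"direct": sorted(direct), "indirect": sorted(indirect)}


centrality = degree_centrality(adjacency)
print(centrality["TP53"])   # 2 neighbors out of 4 possible -> 0.5
print(two_hop_neighbors(adjacency, "TP53"))
```

Here TP53 touches two of the four other nodes, so its degree centrality is 0.5, while p53 and doxorubicin appear only as indirect neighbors because they are reachable solely through apoptosis and breast_cancer.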

   def run_analysis_pipeline(self) -> None:
       """Run complete analysis pipeline"""
       print("🚀 Starting BioCypher AI Agent Analysis Pipeline\n")

       self.generate_synthetic_data()
       self.build_knowledge_graph()

       print("📈 Graph Statistics:")
       print(f"   Entities: {len(self.entities)}")
       print(f"   Relationships: {len(self.relationships)}")
       print(f"   Graph Nodes: {len(self.graph.nodes())}")
       print(f"   Graph Edges: {len(self.graph.edges())}\n")

       analyses = [
           ("drug_targets", "Drug Target Analysis"),
           ("disease_genes", "Disease-Gene Associations"),
           ("pathway_analysis", "Pathway Connectivity Analysis"),
           ("centrality_analysis", "Network Centrality Analysis")
       ]

       # Run each query in turn and print a short summary of its results
       for query_type, title in analyses:
           print(f"🔍 {title}:")
           results = self.intelligent_query(query_type)
           self._display_results(results)
           print()

       self.visualize_network()

       print("✅ Analysis complete! AI Agent successfully analyzed biomedical data.")
      
   def _display_results(self, results: Dict[str, Any], max_items: int = 5) -> None:
       """Display analysis results in a formatted way"""
       if isinstance(results, dict) and "error" not in results:
           for key, value in list(results.items())[:max_items]:
               if isinstance(value, list):
                   print(f"   {key}: {', '.join(value[:3])}{'...' if len(value) > 3 else ''}")
               elif isinstance(value, dict):
                   print(f"   {key}: {dict(list(value.items())[:3])}")
               else:
                   print(f"   {key}: {value}")
       else:
           print(f"   {results}")


   def export_to_formats(self) -> None:
       """Export knowledge graph to various formats"""
       if self.use_biocypher:
           try:
               # Placeholder: no BioCypher export call is made here, only status messages
               print("📤 Exporting BioCypher graph...")
               print("✅ BioCypher export completed")
           except Exception as e:
               print(f"BioCypher export failed: {e}")

       print("📤 Exporting NetworkX graph to formats...")

       graph_data = {
           "nodes": [{"id": n, **self.graph.nodes[n]} for n in self.graph.nodes()],
           "edges": [{"source": u, "target": v, **self.graph.edges[u, v]}
                    for u, v in self.graph.edges()]
       }

       try:
           with open("biomedical_graph.json", "w") as f:
               json.dump(graph_data, f, indent=2, default=str)

           # GraphML cannot store dict values, so serialize nested properties on a copy
           graphml_graph = self.graph.copy()
           for _, attrs in graphml_graph.nodes(data=True):
               for key, value in list(attrs.items()):
                   if isinstance(value, dict):
                       attrs[key] = json.dumps(value)
           nx.write_graphml(graphml_graph, "biomedical_graph.graphml")
           print("✅ Graph exported to JSON and GraphML formats")
       except Exception as e:
           print(f"Export failed: {e}")



We wrap up the AI agent workflow with a streamlined run_analysis_pipeline() function that ties everything together, from synthetic data generation and graph construction to intelligent query execution and final visualization. This automated pipeline lets us observe biomedical relationships, analyze central entities, and understand how different biological concepts are interconnected. Finally, export_to_formats() saves the resulting graph in both JSON and GraphML formats for further use, making our analysis both shareable and reproducible.
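The two export formats treat nested data differently: JSON can hold each entity's nested properties dictionary directly, while NetworkX's GraphML writer accepts only simple value types, so nested values have to be serialized first. The following sketch shows one way to do that and to restore the data afterwards. It is an assumption-labeled illustration: the helper `flatten_for_graphml` and the sample node are hypothetical and use only the standard library.

```python
import json
from typing import Any, Dict


def flatten_for_graphml(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of attrs in which dict and list values become JSON strings."""
    flat = {}
    for key, value in attrs.items():
        if isinstance(value, (dict, list)):
            # sort_keys makes the serialized form deterministic
            flat[key] = json.dumps(value, sort_keys=True)
        else:
            flat[key] = value
    return flat


# Hypothetical entity record shaped like the ones the agent stores
node = {
    "id": "genes_TP53",
    "type": "genes",
    "name": "TP53",
    "properties": {"chromosome": "chr17", "source": "synthetic"},
}

flat_node = flatten_for_graphml(node)
print(flat_node["properties"])

# The nested structure can be recovered after reading the file back
restored = json.loads(flat_node["properties"])
print(restored == node["properties"])
```

Serializing to a JSON string keeps the export lossless, at the cost of the nested fields no longer being individually queryable in tools that read the GraphML file. Promoting each property to its own top-level attribute is an alternative when that matters.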

if __name__ == "__main__":
   agent = BiomedicalAIAgent()
   agent.run_analysis_pipeline()

We conclude the tutorial by instantiating our BiomedicalAIAgent and running the full analysis pipeline. This entry point lets us execute all steps, including data generation, graph building, intelligent querying, visualization, and reporting, in a single command, making it easy to explore biomedical knowledge using BioCypher.

In conclusion, through this advanced tutorial, we gain practical experience working with BioCypher to create scalable biomedical knowledge graphs and perform insightful biological analyses. The dual-mode support ensures that even when BioCypher is unavailable, the system gracefully falls back to NetworkX for full functionality. The ability to generate synthetic datasets, execute intelligent graph queries, visualize relationships, and export in multiple formats showcases the flexibility and analytical power of the BioCypher-based agent. Overall, this tutorial shows how BioCypher can serve as a critical infrastructure layer for biomedical AI systems, making complex biological data both usable and insightful for downstream applications.




Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
