<p dir="ltr">In the contemporary technology landscape, Artificial Intelligence (AI), and in particular computer vision techniques within Machine Learning (ML) and Deep Learning (DL), plays an important role in agriculture and forestry. Applications such as automated irrigation systems, agricultural drones for field analysis, and crop monitoring systems provide substantial benefits in addressing various challenges. One critical issue is the spread of invasive plant species such as the Eastern Red Cedar (ERC) tree. Managing and controlling the growth and spread of this species is a pressing real-world concern: the scale and cost involved often render traditional approaches impractical. Moreover, many vision models, including the Segment Anything Model (SAM), often produce fragmented masks that fail to align with expert-defined boundaries in complex natural environments, so alternative perspectives and approaches are required to address this limitation. One promising direction is to build multimodal solutions that combine visual and textual information to enhance segmentation performance and contextual understanding.</p><p dir="ltr">To overcome these limitations, this research proposes CedarSAM, a model fine-tuned on ERC datasets collected via Unmanned Aerial Vehicle (UAV) imagery. The image data is enriched with spatial and contextual metadata and is further extended with high-resolution segmentation masks and aligned textual descriptions. The model fine-tunes SAM's mask decoder while keeping the image encoder and prompt encoder frozen, enabling efficient domain adaptation under limited data conditions.
The modeling pipeline begins with Convolutional Neural Network (CNN)-based classification, progresses through Faster R-CNN-based detection (both using ResNet-50 backbones), and culminates in SAM, which employs a Vision Transformer (ViT)-based architecture for advanced segmentation.</p><p dir="ltr">Despite the limited number of training samples, CedarSAM achieves notable improvements across key image segmentation metrics, including Intersection over Union (IoU), Dice score, precision, recall, and inference speed, demonstrating its robustness under data-scarce conditions. Beyond segmentation performance, CedarSAM enhances interpretability and field applicability through a rule-based metadata extraction pipeline that parses spatial information and structured descriptions from image-level annotations. This approach enables context-aware ecological recommendations, such as targeted removal for small trees, mechanical removal for mature specimens, and systematic strategies for clustered tree formations. The proposed methodology demonstrates both high segmentation accuracy and practical usability through structured post-processing, offering an accessible interface for non-expert users and field practitioners. This research lays the groundwork for real-time decision support systems in ecological management and provides structured image-text outputs that can serve as foundational data for future Vision-Language Model (VLM) development.</p>
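The decoder-only fine-tuning strategy described above can be sketched as follows. This is a minimal illustration of the general pattern (freeze the pretrained encoders, optimize only the mask decoder), not the paper's actual training code: the three `nn.Linear` modules are hypothetical placeholders standing in for SAM's ViT image encoder, prompt encoder, and mask decoder.

```python
import torch
import torch.nn as nn

# Placeholder modules: in the real pipeline these would be SAM's
# pretrained image encoder, prompt encoder, and mask decoder.
model = nn.ModuleDict({
    "image_encoder": nn.Linear(16, 8),   # stand-in for the ViT backbone
    "prompt_encoder": nn.Linear(4, 8),   # stand-in for the prompt encoder
    "mask_decoder": nn.Linear(8, 1),     # the only component we train
})

# Freeze the image and prompt encoders so gradients never update them.
for name in ("image_encoder", "prompt_encoder"):
    for p in model[name].parameters():
        p.requires_grad = False

# Build the optimizer over the mask decoder's parameters only.
trainable = [p for p in model["mask_decoder"].parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(f"trainable tensors: {len(trainable)}, frozen params: {frozen}")
```

Freezing via `requires_grad = False` (rather than detaching activations) keeps the forward pass unchanged while restricting updates to the small decoder, which is what makes adaptation feasible under limited data.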
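The evaluation metrics reported above are standard overlap measures for binary masks; a minimal NumPy implementation (with a toy 2×3 mask pair, not data from the paper) looks like this:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over Union: |A ∩ B| / |A ∪ B| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice score: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2 * inter / total if total else 1.0

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(iou(pred, gt), dice(pred, gt))  # IoU = 0.5, Dice ≈ 0.667
```

Dice weights the intersection twice, so it is always at least as large as IoU on the same masks; both are reported because IoU penalizes boundary errors more harshly.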
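The rule-based recommendation step can be illustrated with a short sketch. The function name, attribute names, and thresholds below are hypothetical placeholders, not values from the paper; the point is only the pattern of mapping extracted metadata to the three recommendation classes named above.

```python
def recommend(crown_diameter_m: float, cluster_size: int) -> str:
    """Map extracted tree attributes to a management recommendation.
    Thresholds are illustrative assumptions, not the paper's values."""
    if cluster_size > 1:
        # Grouped trees call for a systematic, area-level strategy.
        return "systematic strategy for clustered formation"
    if crown_diameter_m < 2.0:
        # Small isolated trees can be removed with targeted effort.
        return "targeted removal (small tree)"
    # Large isolated specimens require mechanical removal.
    return "mechanical removal (mature specimen)"

print(recommend(1.2, 1))  # targeted removal (small tree)
print(recommend(4.5, 1))  # mechanical removal (mature specimen)
print(recommend(3.0, 5))  # systematic strategy for clustered formation
```

Because the rules operate on structured metadata rather than raw pixels, the same post-processing stage yields human-readable justifications, which is what makes the output accessible to non-expert field practitioners.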