<p dir="ltr">In this work, we examine and compare the performance of several
transformer-based LLMs in zero-shot and few-shot settings, where they generate task
plans and propose actionable steps. We also investigate how human oversight affects
the performance of these models, and whether it can refine these plans in real time,
improving efficiency and resource management. Our work primarily explores how a
human-in-the-loop (HITL) approach contributes to the optimization of task and path planning,
measured by the reduction in the number of planning rounds per step and in the number of
LLM calls required to converge to a plan that performs accurately.
We also compare smaller, more cost-effective LLMs such as Llama 3.1 with HITL
against larger models such as GPT-4 without HITL in the context of
robotic task planning, to determine whether HITL can bridge the performance gap between them.
A smaller model, while more efficient and faster in real-time applications,
tends to generate less detailed and comprehensive plans and therefore requires more frequent human intervention. In contrast, larger models produce more robust initial plans, but at the cost of
increased computational resources and potential delays. Our findings suggest that a HITL framework with properly structured feedback may not only improve the adaptability
and precision of LLM-driven robotic systems but also enhance their efficiency by optimizing
the task planning process, thereby allowing smaller, cost-effective models to reach the performance levels of large enterprise models in robotic task planning.</p>