0. Goal

<aside> ✨

Compare the execution results of a PyTorch model vs. its ExecuTorch counterpart

</aside>

1. File Structure

├── model
│   ├── __init__.py
│   ├── export_resnet18.py # exports the model and produces the .pte file
│   └── model_outputs # directory for the exported .pte files
│       └── resnet18.pte # .pte file for ResNet18
└── tests
    ├── __init__.py
    └── test_resnet18_equivalence.py # pytest: compares the original model's and the ExecuTorch model's outputs

2. Export ResNet18

a. Export ResNet18 Model

# 1. Export ResNet18 model
import torch
from torchvision.models import ResNet18_Weights, resnet18

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
example_inputs = (torch.randn(1, 3, 224, 224),)
exported_program = torch.export.export(model, example_inputs)
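
As an optional sanity check (a minimal sketch, not part of the original script): the ExportedProgram can be re-run as a module and should reproduce the eager model's output on the same input.

# Optional: verify the exported program matches the eager model
torch.testing.assert_close(
    exported_program.module()(*example_inputs),
    model(*example_inputs),
)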

b. Optimize for Target HW

# 2. Optimize for target hardware (backend = XNNPACK)
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[XnnpackPartitioner()],
).to_executorch()
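
To see what was actually lowered, the resulting graph can be inspected (an optional sketch, assuming ExecutorchProgramManager.exported_program() and the default method name "forward"); subgraphs taken over by XNNPACK show up as executorch_call_delegate nodes.

# Optional: print the lowered graph; XNNPACK-delegated subgraphs
# appear as executorch_call_delegate nodes (assumes method "forward")
program.exported_program("forward").graph_module.print_readable()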

c. Save for Deployment

# 3. Save for deployment (path follows the layout in section 1)
model_path = "model/model_outputs/resnet18.pte"
with open(model_path, "wb") as f:
    f.write(program.buffer)

3. Test Models

a. Common Values

# 0. Common values
import time
import torch

num_inference = 30  # run the same number of inference iterations for both models
example_input = torch.randn(1, 3, 224, 224)  # use the same input for both models

b. Load & Validate PyTorch Model

# 1-1. Load ResNet18
from torchvision.models import ResNet18_Weights, resnet18

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

# 1-2. Validate ResNet18
torch_latencies = []  # per-run latencies, used for mean/max latency
for _ in range(num_inference):  # repeat inference num_inference times
    start = time.perf_counter()  # seconds
    with torch.no_grad():  # inference mode
        torch_output = model(example_input)
    torch_latencies.append((time.perf_counter() - start) * 1000)  # milliseconds

torch_output_np = torch_output.detach().cpu().numpy()  # PyTorch inference result
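
The loop above only collects raw latencies; a minimal sketch (the original reduction code is not shown) of the mean/max figures the comment refers to:

# Summarize PyTorch latency in milliseconds
torch_mean_ms = sum(torch_latencies) / len(torch_latencies)
torch_max_ms = max(torch_latencies)
print(f"PyTorch latency: mean={torch_mean_ms:.2f} ms / max={torch_max_ms:.2f} ms")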

c. Load & Validate ExecuTorch Model
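
The body of this step is not shown above; the following is a minimal sketch of what it could look like, assuming the executorch.runtime Python API (Runtime.get(), load_program(), load_method(), execute()) and illustrative comparison tolerances, since XNNPACK-lowered kernels are not guaranteed to be bit-exact against eager PyTorch.

# 2-1. Load the ExecuTorch model (.pte)
from executorch.runtime import Runtime

runtime = Runtime.get()
et_program = runtime.load_program("model/model_outputs/resnet18.pte")
et_method = et_program.load_method("forward")

# 2-2. Validate the ExecuTorch model: same input, same number of runs
et_latencies = []
for _ in range(num_inference):
    start = time.perf_counter()  # seconds
    et_output = et_method.execute([example_input])[0]
    et_latencies.append((time.perf_counter() - start) * 1000)  # milliseconds

et_output_np = et_output.detach().cpu().numpy()  # ExecuTorch inference result

# 2-3. Compare outputs (tolerances are illustrative, not from the original)
torch.testing.assert_close(et_output, torch_output, rtol=1e-4, atol=1e-4)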