# TD Control

#### 1. Sarsa <a href="#id-1-sarsa" id="id-1-sarsa"></a>

The TD(0) algorithm is as follows.

![](https://3375536638-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LQlJLECo8fSZIbzAshb%2F-LQlO4IvpK1w0Y9Unen3%2F-LQlTm0G8OkI-GqdjPtN%2Fimage.png?alt=media\&token=b68c8dd8-afe9-42c0-82ce-a3787c15c0f3)
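As a rough illustration of a single TD(0) backup, here is a minimal Python sketch. It assumes the standard tabular form V(s) ← V(s) + α(r + γV(s') − V(s)); the function name `td0_update`, the dictionary-based table `V`, and the parameter values are hypothetical choices for this example, not taken from the figure.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """Apply one TD(0) backup to the state-value table V (a dict), in place."""
    # TD target: immediate reward plus the discounted value of the next state.
    # For a terminal transition there is no next state to bootstrap from.
    target = r if done else r + gamma * V[s_next]
    # Move V(s) a fraction alpha of the way toward the target.
    V[s] += alpha * (target - V[s])
    return V[s]


# Hypothetical two-state table, only for illustration.
V = {"A": 0.0, "B": 1.0}
td0_update(V, "A", r=0.5, s_next="B", alpha=0.1, gamma=0.9)
```

Note that the update only needs the observed transition (s, r, s'), not a model of the environment.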

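However, as mentioned earlier, model-free control requires an action-value function. Replacing the value function in the TD(0) update above with an action-value function therefore gives Sarsa. The name Sarsa comes from the backup diagram below: as the update rule shows, each update looks from the current state-action pair through the reward to the next state and the next action, that is, the sequence S, A, R, S', A'. If you understood TD(0), there is nothing particularly difficult here.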

![](https://3375536638-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LQlJLECo8fSZIbzAshb%2F-LQlO4IvpK1w0Y9Unen3%2F-LQlTowbt42Fmdiu4AUR%2Fimage.png?alt=media\&token=54c09a02-6c2f-4c9f-b080-84291c864606)
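The update described above can be sketched in Python as follows. This is a minimal example assuming the standard tabular Sarsa form Q(s, a) ← Q(s, a) + α(r + γQ(s', a') − Q(s, a)); the function name `sarsa_update`, the `(state, action)` keyed table, and the state and action labels are hypothetical.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, done=False):
    """Apply one Sarsa backup to the action-value table Q, in place."""
    # The target uses the Q value of the action actually chosen in s_next,
    # which is what makes the update use the full (S, A, R, S', A') tuple.
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical table: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
Q[("s1", "right")] = 2.0
sarsa_update(Q, "s0", "right", r=1.0, s_next="s1", a_next="right",
             alpha=0.5, gamma=0.9)
```

Compared with a TD(0) update on state values, the only structural change is that the table is indexed by state-action pairs and the bootstrap term depends on the next action.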

Sarsa is therefore TD(0) with the state-value function replaced by an action-value function, combined with $$\epsilon$$-greedy policy improvement.

![](https://3375536638-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LQlJLECo8fSZIbzAshb%2F-LQlO4IvpK1w0Y9Unen3%2F-LQlTt2wc6s-iwucsN4g%2Fimage.png?alt=media\&token=fcb25d5f-79ed-4dfe-bceb-01f83f3b84e6)
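One way to write $$\epsilon$$-greedy action selection over a Q table is sketched below. The function name `epsilon_greedy`, the dictionary layout, and the example values are assumptions made for illustration.

```python
import random


def epsilon_greedy(Q, state, actions, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.choice(actions)
    # Exploit: choose the action with the highest current Q value.
    # Unseen (state, action) pairs are treated as 0.0 in this sketch.
    return max(actions, key=lambda a: Q.get((state, a), 0.0))


# Hypothetical Q values for a single state.
Q = {("s", "left"): 0.2, ("s", "right"): 0.7}
action = epsilon_greedy(Q, "s", ["left", "right"], epsilon=0.1)
```

Because the greedy choice is recomputed from the current Q values on every call, the policy improves implicitly whenever Q is updated.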

The Sarsa algorithm is shown below. As an on-policy TD control algorithm, it updates the current Q value at every time-step using the immediate reward and the Q value of the next action. The policy is not defined separately; acting $$\epsilon$$-greedily with respect to these Q values is itself the policy.

![](https://3375536638-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LQlJLECo8fSZIbzAshb%2F-LQlO4IvpK1w0Y9Unen3%2F-LQlTuGM6l_X2eZmScqH%2Fimage.png?alt=media\&token=619d8242-e064-49ec-abcf-59ee96565831)
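To make the loop concrete, here is a small self-contained sketch of tabular Sarsa. The corridor environment, the hyperparameters, and all function names are hypothetical and chosen only to keep the example short; this is not the environment used in the source material.

```python
import random
from collections import defaultdict

# Hypothetical 1-D corridor: states 0..4, start at 0, state 4 is terminal.
# Reaching state 4 gives reward 1; every other step gives reward 0.
GOAL = 4
ACTIONS = (-1, +1)  # move left / move right


def step(state, action):
    """Deterministic transition; moving left from state 0 stays at 0."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(Q, state, epsilon, rng):
    """Epsilon-greedy over Q, breaking ties between equal values at random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])


def sarsa(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q values start at 0.0
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = step(s, a)
            # On-policy: the next action comes from the same epsilon-greedy
            # policy that is being evaluated and improved.
            a_next = epsilon_greedy(Q, s_next, epsilon, rng)
            target = r if done else r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s_next, a_next
    return Q


Q = sarsa()
# Greedy action per non-terminal state after training.
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(greedy)
```

There is no separate policy object in this sketch: the behavior is fully determined by `Q` and `epsilon`, and the action used in the update target is the same action that is executed on the next step.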

