[Paper] A Composable Specification Language for Reinforcement Learning Tasks

성진팍 2020. 9. 5. 19:18

I haven't run the code yet. Work in progress.

The paper proposes a language for specifying control tasks with multiple objectives.

reaching q

reaching p

avoid region O

positive fule : reward function

state space ->

reward shaping??

A reward that gives partial credit for getting closer to q while not touching p.
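To make the partial-credit idea concrete, here is a minimal sketch of one way such a shaped reward could look. It is not the paper's definition: the function names, the tolerance `tol`, the box-shaped region, and the use of `min` to combine the "reach" and "avoid" scores are all assumptions made for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def reach_score(state, goal, tol=0.1):
    """Grows as the state approaches the goal; positive once within `tol` (assumed tolerance)."""
    return tol - dist(state, goal)

def avoid_score(state, box):
    """Positive outside the axis-aligned box (xmin, ymin, xmax, ymax), negative inside it."""
    xmin, ymin, xmax, ymax = box
    x, y = state
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    outside = math.hypot(dx, dy)
    if outside > 0.0:
        return outside  # distance from the state to the box
    # Inside the box: negative depth to the nearest edge.
    return -min(x - xmin, xmax - x, y - ymin, ymax - y)

def shaped_reward(state, goal, obstacle):
    """Partial credit: the worse of 'progress toward goal' and 'clearance from obstacle'."""
    return min(reach_score(state, goal), avoid_score(state, obstacle))

# Hypothetical layout: goal q and a box-shaped region O to stay out of.
q = (5.0, 0.0)
O = (2.0, -1.0, 3.0, 1.0)
far_reward = shaped_reward((0.0, 0.0), q, O)
near_reward = shaped_reward((4.0, 0.0), q, O)
```

With this layout, `near_reward` is larger than `far_reward`, so the agent gets credit for progress even before it actually reaches q.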

They proposed a language for specifying control tasks.

predicates: statements about the state
disjunctions: logical "or"
primitives: basic building blocks
The user specifies objectives and constraints over states -> then, sequentially, the primitives ...

The user has in mind the sequence of actions needed to perform the given task.

For example: fly to q, take a photo, and then return to p, while avoiding building O and without running out of battery.
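The drone example can be sketched as a tiny composable specification that is checked against a trajectory. This is only an illustration of the idea of composing predicates sequentially or with "or"; the class names (`Achieve`, `Ensuring`, `Seq`, `Or`), the helper functions, and the coordinates are hypothetical and are not the API of spectrl_tool. The photo and battery parts are left out to keep the state a plain 2-D position; the battery would be one more constraint predicate.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

State = Tuple[float, float]
Predicate = Callable[[State], bool]

def near(target: State, tol: float = 0.5) -> Predicate:
    """Predicate: the state is within `tol` of `target`."""
    return lambda s: math.hypot(s[0] - target[0], s[1] - target[1]) <= tol

def outside(box) -> Predicate:
    """Predicate: the state is outside the box (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return lambda s: not (xmin <= s[0] <= xmax and ymin <= s[1] <= ymax)

@dataclass
class Achieve:          # objective: eventually make `goal` true
    goal: Predicate

@dataclass
class Ensuring:         # constraint: `constraint` holds at every step while doing `spec`
    spec: object
    constraint: Predicate

@dataclass
class Seq:              # do `first`, then `second`
    first: object
    second: object

@dataclass
class Or:               # do either `left` or `right`
    left: object
    right: object

def end_points(spec, traj: List[State], start: int = 0) -> Set[int]:
    """Indices at which `spec` can be completed when started at index `start`."""
    if isinstance(spec, Achieve):
        return {t for t in range(start, len(traj)) if spec.goal(traj[t])}
    if isinstance(spec, Ensuring):
        return {t for t in end_points(spec.spec, traj, start)
                if all(spec.constraint(traj[i]) for i in range(start, t + 1))}
    if isinstance(spec, Seq):
        return {t2 for t1 in end_points(spec.first, traj, start)
                for t2 in end_points(spec.second, traj, t1)}
    if isinstance(spec, Or):
        return end_points(spec.left, traj, start) | end_points(spec.right, traj, start)
    raise TypeError(f"unknown specification: {spec!r}")

def satisfied(spec, traj: List[State]) -> bool:
    """True if the trajectory completes the specification."""
    return bool(end_points(spec, traj))

# Hypothetical layout: home p, target q, building O between them.
p, q, O = (0.0, 0.0), (4.0, 0.0), (1.5, -1.0, 2.5, 1.0)
drone_task = Ensuring(Seq(Achieve(near(q)), Achieve(near(p))), outside(O))

around = [(0, 0), (1, 2), (3, 2), (4, 0), (3, 2), (1, 2), (0, 0)]   # flies around O
through = [(0, 0), (2, 0), (4, 0), (2, 0), (0, 0)]                  # cuts through O
```

Here `satisfied(drone_task, around)` holds and `satisfied(drone_task, through)` does not. Note that the specification only mentions states (q, p, O), never the low-level actions that move the drone.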

So they keep specifying sequences of actions.

-> But do we even have a variety of action sequences in our case? I'm not sure we can define them.

The user can specify the task without providing low-level action sequences.

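From the paper: we propose a compiler for our language, which takes the task specification provided by the user and generates a control policy that performs the task.

The task provided by the user.

The user provides the task structure, and RL fills in the low-level details?

Generalizing across environments.

Hmm, but ours is a search problem, so this may not apply....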

github.com/keyshor/spectrl_tool

The code is available at the link above.

TLTL ()

Compared against CCE (cross entropy method), the proposed approach performed better.
