on 02-Sep-2019 (Mon)


Flashcard 4121444355340

Tags
Question

1) Modulation of transport supply (off-peak services oversized)

2) Increased journey times on some previously direct connections

3) Worsening of overall punctuality levels

4) Saturation at major nodes

status measured difficulty not learned 37% [default] 0

Flashcard 4361898560780

Tags
#daemon #man
Question
How do you declare a daemon?

#include <unistd.h>

int daemon(int nochdir, int noclose);

status measured difficulty not learned 37% [default] 0
daemon(3) - Linux manual page
NAME: daemon - run in the background
SYNOPSIS: #include <unistd.h>
int daemon(int nochdir, int noclose);
Feature Test Macro Requirements for glibc (see feature_test_macros(7)): daemon(): since glibc 2.21: _DEFAULT_SOURCE; in glibc 2.19 and 2.20: _DEFAULT_SOURCE || (_XOPEN_SOURCE && …

Flashcard 4361901706508

Question
How do you create a child process?

#include <sys/types.h>
#include <unistd.h>

pid_t fork(void);

status measured difficulty not learned 37% [default] 0
fork(2) - Linux manual page
NAME: fork - create a child process
SYNOPSIS: #include <sys/types.h>
#include <unistd.h>
pid_t fork(void);
DESCRIPTION: fork() creates a new process by duplicating the calling process. The new process is referred to as the child process. The calling process is referred to as the parent process.

Flashcard 4361905376524

Question
How do you get process times?

#include <sys/times.h>

clock_t times(struct tms *buf);

status measured difficulty not learned 37% [default] 0
times(2) - Linux manual page
NAME: times - get process times
SYNOPSIS: #include <sys/times.h>
clock_t times(struct tms *buf);
DESCRIPTION: times() stores the current process times in the struct tms that buf points to. The struct tms is as defined in <sys/times.h>.

Flashcard 4361909046540

Question
How do you get or set an interval timer?

#include <sys/time.h>

int getitimer(int which, struct itimerval *curr_value);
int setitimer(int which, const struct itimerval *new_value, struct itimerval *old_value);

status measured difficulty not learned 37% [default] 0
getitimer(2) - Linux manual page
NAME: getitimer, setitimer - get or set value of an interval timer
SYNOPSIS: #include <sys/time.h>
int getitimer(int which, struct itimerval *curr_value);
int setitimer(int which, const struct itimerval *new_value, struct itimerval *old_value);
DESCRIPTION: These system calls provide access to interval timers, that is, timers that initially expire at some point in the future, and (optionally) at regular intervals after that.

Flashcard 4361912716556

Question
How do you create a timer that notifies via a file descriptor?

#include <sys/timerfd.h>

int timerfd_create(int clockid, int flags);
int timerfd_settime(int fd, int flags, const struct itimerspec *new_value, struct itimerspec *old_value);
int timerfd_gettime(int fd, struct itimerspec *curr_value);

status measured difficulty not learned 37% [default] 0
timerfd_create(2) - Linux manual page
NAME: timerfd_create, timerfd_settime, timerfd_gettime - timers that notify via file descriptors
SYNOPSIS: #include <sys/timerfd.h>
int timerfd_create(int clockid, int flags);
int timerfd_settime(int fd, int flags, const struct itimerspec *new_value, struct itimerspec *old_value);
int timerfd_gettime(int fd, struct itimerspec *curr_value);
DESCRIPTION: These system calls create and operate on a timer that delivers timer expiration notifications via a file descriptor.

Flashcard 4361916386572

Question
How do you create a POSIX per-process timer?

#include <signal.h>
#include <time.h>

int timer_create(clockid_t clockid, struct sigevent *sevp, timer_t *timerid);

Link with -lrt. Feature Test Macro Requirements for glibc (see feature_test_macros(7)): timer_create(): _POSIX_C_SOURCE >= 199309L

status measured difficulty not learned 37% [default] 0
timer_create(2) - Linux manual page
NAME: timer_create - create a POSIX per-process timer
SYNOPSIS: #include <signal.h>
#include <time.h>
int timer_create(clockid_t clockid, struct sigevent *sevp, timer_t *timerid);
Link with -lrt. Feature Test Macro Requirements for glibc (see feature_test_macros(7)): timer_create(): _POSIX_C_SOURCE >= 199309L
DESCRIPTION: timer_create() creates a new per-process interval timer. The ID of the new timer is returned in the buffer pointed to by timerid, which must be a non-null pointer.

Flashcard 4361920056588

Question
How do you get the time of day?

#include <sys/time.h>

int gettimeofday(struct timeval *tv, struct timezone *tz);
int settimeofday(const struct timeval *tv, const struct timezone *tz);

status measured difficulty not learned 37% [default] 0
gettimeofday(2) - Linux manual page
NAME: gettimeofday, settimeofday - get / set time
SYNOPSIS: #include <sys/time.h>
int gettimeofday(struct timeval *tv, struct timezone *tz);
int settimeofday(const struct timeval *tv, const struct timezone *tz);
Feature Test Macro Requirements for glibc (see feature_test_macros(7)): settimeofday(): since glibc 2.19: _DEFAULT_SOURCE; glibc 2.19 and earlier: _BSD_SOURCE

Flashcard 4361923726604

Question
How do you set an alarm?

#include <unistd.h>

unsigned int alarm(unsigned int seconds);

status measured difficulty not learned 37% [default] 0
alarm(2) - Linux manual page
NAME: alarm - set an alarm clock for delivery of a signal
SYNOPSIS: #include <unistd.h>
unsigned int alarm(unsigned int seconds);
DESCRIPTION: alarm() arranges for a SIGALRM signal to be delivered to the calling process in seconds seconds. If seconds is zero, any pending alarm is canceled.

Annotation 4362064497932

 The investigation relied on data in the form of video, recovered debris, and medical findings, each supplemented with modeling and analyses when needed.


Annotation 4362122431756

 #reinforcement-learning We consider a broader class of Bellman equations that are non-linear in the rewards and future values: $$v(s)=\mathbb{E}\left[f\left(R_{t+1}, v\left(S_{t+1}\right)\right) | S_{t}=s, A_{t} \sim \pi\left(S_{t}\right)\right]$$ .


Flashcard 4362124791052

Tags
#reinforcement-learning
Question
What is the form of a non-linear Bellman equation?
$$v(s)=\mathbb{E}\left[f\left(R_{t+1}, v\left(S_{t+1}\right)\right) | S_{t}=s, A_{t} \sim \pi\left(S_{t}\right)\right]$$

status measured difficulty not learned 37% [default] 0


Annotation 4362130558220

#reinforcement-learning Humans and animals seem to exhibit a different type of weighting of the future than would emerge from the standard linear Bellman equation, which leads to exponential discounting when unrolled multiple steps because of the repeated multiplication with γ. One consequence is that the preference ordering of two different rewards occurring at different times can reverse, depending on how far in the future the first reward is. For instance, humans may prefer a single sparse reward of +1 (e.g., $1) now over a reward of +2 (e.g., $2) received a week later, but may also prefer a reward of +2 received after 20 weeks over a reward of +1 after 19 weeks.


Flashcard 4362132131084

Tags
#reinforcement-learning
Question
Give an example of human preference ordering reversal that contradicts the use of exponential discounting in a reward function.
For instance, humans may prefer a single sparse reward of +1 (e.g., $1) now over a reward of +2 (e.g., $2) received a week later, but may also prefer a reward of +2 received after 20 weeks over a reward of +1 after 19 weeks.

status measured difficulty not learned 37% [default] 0


Flashcard 4362134490380

Tags
#reinforcement-learning
Question
What type of non-linear form of discounting has been proposed to better fit human and animal preference ordering?
Hyperbolic discounting has been proposed as a well-fitting mathematical model, where a reward in t steps is discounted as $$R_t/(1+kt)$$, or some variation of this equation.

status measured difficulty not learned 37% [default] 0

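The reversal described in the cards above can be checked numerically under hyperbolic discounting. Assuming an illustrative rate of k = 1.5 per week (the value of k is not from the card):

```latex
D(t) = \frac{1}{1+kt}, \qquad k = 1.5 \text{ (per week)}

\text{now vs.\ one week:}\quad \frac{1}{1+0} = 1 \;>\; \frac{2}{1+1.5} = 0.8 \quad\Rightarrow\; \text{prefer } +1 \text{ now}

\text{19 vs.\ 20 weeks:}\quad \frac{1}{1+1.5\cdot 19} \approx 0.034 \;<\; \frac{2}{1+1.5\cdot 20} \approx 0.065 \quad\Rightarrow\; \text{prefer } +2 \text{ later}
```

Under exponential discounting both comparisons reduce to 1 vs. 2γ (dividing γ^19 vs. 2γ^20 through by γ^19), so the preference ordering can never reverse.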

Flashcard 4362336865548

Tags
#reinforcement-learning
Question
What 3 things do universal successor features approximators (USFAs) combine the advantages of?
"Our proposed universal successor features approximators (USFAs) combine the advantages of all of these, namely the scalability of UVFAs, the instant inference of SFs, and the strong generalisation of GPI."

status measured difficulty not learned 37% [default] 0


Annotation 4362345778444

 #reinforcement-learning As mentioned in the introduction, in this paper we are interested in the multitask RL scenario, where the agent has to solve multiple tasks. Each task is defined by a reward function R w ; thus, instead of a single MDP M , our environment is a set of MDPs that share the same structure except for the reward function. Following Barreto et al. (2017), we assume that the expected one-step reward associated with transition $$s \stackrel{a}{\rightarrow} s^{\prime}$$ is given by $$\mathrm{E}\left[R_{\mathbf{w}}\left(s, a, s^{\prime}\right)\right]=r_{\mathbf{w}}\left(s, a, s^{\prime}\right)=\phi\left(s, a, s^{\prime}\right)^{\top} \mathbf{w}$$, where $$\phi\left(s, a, s^{\prime}\right) \in \mathbb{R}^{d}$$ are features of (s, a, s') and $$\mathbf{w} \in \mathbb{R}^{d}$$ are weights.


Flashcard 4362351283468

Tags
#reinforcement-learning
Question
What are the successor features of a state-action pair (s, a) under policy π?
The SFs $$\boldsymbol{\psi} \in \mathbb{R}^{d}$$ of a state-action pair (s, a) under policy $$\pi$$ are given by $$\psi^{\pi}(s, a) \equiv \mathrm{E}^{\pi}\left[\sum_{i=t}^{\infty} \gamma^{i-t} \boldsymbol{\phi}_{i+1} | S_{t}=s, A_{t}=a\right]$$, where the $$\phi_{i+1} \in \mathbb{R}^{d}$$ are features of $$(S_i, A_i, S_{i+1})$$

status measured difficulty not learned 37% [default] 0


Annotation 4362357050636

#reinforcement-learning SFs allow one to immediately compute the value of a policy π on any task w: it is easy to show that, when (1) holds, $$Q_{\mathbf{w}}^{\pi}(s, a)=\boldsymbol{\psi}^{\pi}(s, a)^{\top} \mathbf{w}$$. It is also easy to see that SFs satisfy a Bellman equation in which $$\boldsymbol{\phi}$$ play the role of rewards, so $$\boldsymbol{\psi}$$ can be learned using any RL method (Szepesvári, 2010).

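The identity stated above follows in a few lines from the definitions of $$r_{\mathbf{w}}$$ and $$\boldsymbol{\psi}^{\pi}$$ given in the earlier annotations; a derivation sketch:

```latex
\begin{aligned}
Q^{\pi}_{\mathbf{w}}(s,a)
&= \mathrm{E}^{\pi}\!\left[\sum_{i=t}^{\infty} \gamma^{\,i-t}\, r_{\mathbf{w}}(S_i, A_i, S_{i+1}) \,\middle|\, S_t = s,\, A_t = a\right] \\
&= \mathrm{E}^{\pi}\!\left[\sum_{i=t}^{\infty} \gamma^{\,i-t}\, \boldsymbol{\phi}_{i+1}^{\top}\mathbf{w} \,\middle|\, S_t = s,\, A_t = a\right]
   \qquad \text{(since } r_{\mathbf{w}} = \boldsymbol{\phi}^{\top}\mathbf{w}\text{)} \\
&= \mathrm{E}^{\pi}\!\left[\sum_{i=t}^{\infty} \gamma^{\,i-t}\, \boldsymbol{\phi}_{i+1} \,\middle|\, S_t = s,\, A_t = a\right]^{\top}\!\mathbf{w}
   \;=\; \boldsymbol{\psi}^{\pi}(s,a)^{\top}\mathbf{w}.
\end{aligned}
```

The third step uses linearity of expectation, since $$\mathbf{w}$$ does not depend on the trajectory.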

Flashcard 4362363604236

Tags
#reinforcement-learning
Question
How does a universal value function approximator (UVFA) generalise across the space of tasks in an environment? What assumptions could one make when implementing one?
Generalisation is achieved by modelling the shape of the universal optimal value function Q*(s, a, w). By using a neural network, for example, to represent the approximation Q̃(s, a, w), one is implicitly assuming that Q*(s, a, w) is smooth in the space of tasks; roughly speaking, this means that small perturbations to w will result in small changes in Q*(s, a, w).

status measured difficulty not learned 37% [default] 0


Annotation 4362368322828

 #reinforcement-learning As one can see, the types of generalisation provided by UVFAs and SF&GPI are in some sense complementary. It is then natural to ask if we can simultaneously have the two types of generalisation. In this paper we propose a model that provides exactly that. The main insight is actually simple: since SFs are multi-dimensional value functions, we can extend them in the same way as universal value functions extend regular value functions. In the next section we elaborate on how exactly to do so.


Flashcard 4362369895692

Tags
#reinforcement-learning
Question
What is the main insight forming USFAs from UVFAs and SFs?
"The main insight is actually simple: since SFs are multi-dimensional value functions, we can extend them in the same way as universal value functions extend regular value functions."

status measured difficulty not learned 37% [default] 0


Annotation 4362372779276

 This chapter’s focus on the organizational context for selection program
