Edited, memorised or added to reading queue

on 04-Nov-2022 (Fri)

Do you want BuboFlash to help you learning these things? Click here to log in or create user.

#data-science #infrastructure
we must avoid introducing incidental complexity, or complexity that is not necessitated by the problem itself but is an unwanted artifact of a chosen approach. Incidental complexity is a huge problem for real-world data science because we have to deal with such a high level of inherent complexity that distinguishing between real problems and imaginary problems becomes hard
statusnot read reprioritisations
last reprioritisation on suggested re-reading day
started reading on finished reading on

pdf

cannot see any pdfs




#data-science #infrastructure
An effective infrastructure helps to expose and manage inherent complexity, which is the natural state of the world we live in, while making a conscious effort to avoid introducing incidental complexity. Doing this well is hard and requires constant judgment.
statusnot read reprioritisations
last reprioritisation on suggested re-reading day
started reading on finished reading on

pdf

cannot see any pdfs




#data-science #infrastructure
Previously, many challenges in architecting systems for high-performance computing revolved around resource management: how to guard and ration access to limited compute and storage resources, and, correspondingly, how to make resource usage as efficient as possible. The cloud allows us to change our mindset. All the clouds provide a data layer, like Amazon S3, which provides a virtually unlimited amount of storage with close to a per- fect level of durability and high availability. Similarly, they provide nearly infinite, elas- tically scaling compute resources like Amazon Elastic Compute Cloud (Amazon EC2) and the abstractions built on top of it. We can architect our systems with the assump- tion that we have an abundant amount of compute resources and storage available and focus on cost-effectiveness and productivity instead.
statusnot read reprioritisations
last reprioritisation on suggested re-reading day
started reading on finished reading on

pdf

cannot see any pdfs




#data-science #infrastructure
A typical bottleneck is caused by the fact that humans can’t deliver software (or hardware, if operating outside the cloud) fast enough. Even if they were capable of hacking code fast enough, they may be busy maintaining existing systems, which is another critically human activity. This observation helps us to realize that although “infrastructure” sounds very technical, we are not building infrastructure for the machines. We are building infrastructure to make humans more productive. This realization has fundamental ramifications to how we should think about and design infrastructure for data scientists— for fellow human beings, instead of for machines. For instance, if we assume that human-time is more expensive than computer-time, which is certainly true for most data scientists, it makes sense to use a highly expressive, productivity-boosting language like Python instead of a low-level language like C++, even if it makes workloads more inefficient to process. We will dig deeper into this question in chapter 5
statusnot read reprioritisations
last reprioritisation on suggested re-reading day
started reading on finished reading on

pdf

cannot see any pdfs




#bayes #bayesian #kruschke #programming #r
There are a variety of situations in which it might seem at first that no parameterized model would apply, such as figuring out the probability that a person has some rare disease if a diagnostic test for the disease is positive. But Bayesian analysis does apply even here, although the parameters refer to discrete states instead of continuous distributions. In the case of disease diagnosis, the parameter is the underlying health status of the individual, and the parameter can have one of two values, either “has disease” or “does
statusnot read reprioritisations
last reprioritisation on suggested re-reading day
started reading on finished reading on

pdf

cannot see any pdfs