on 27-Jan-2020 (Mon)


Flashcard 1602824572172

Tags
#broker #estate #real
Question
Under old English law in the Middle Ages, by what means was real property transferred?
Answer

status measured difficulty not learned 37% [default] 0

Flashcard 4881260612876

Tags
#python
Question
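NumPy array shape manipulation functions
Answer
Flatten, Resize, Stack, Reshape, Split
Flatten: array.ravel() #returns a flattened 1-D array
Reshape: array.reshape(3,4) #3 rows, 4 columns; returns a new array
Resize: array.resize(2,6) #resizes the array in place to 2 rows, 6 columns (returns None)
Split: np.hsplit(array,2) #splits the array column-wise into 2 equal parts
Stack: np.hstack((array1,array2)) #joins the arrays side by side (column-wise)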
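A minimal runnable sketch of the five operations on one small array; the values come from `np.arange(12)` and are purely illustrative. It also shows the practical difference between `reshape` (returns a new array) and `ndarray.resize` (changes the array in place).

```python
import numpy as np

array = np.arange(12)               # [0, 1, ..., 11], shape (12,)

grid = array.reshape(3, 4)          # 3 rows, 4 columns; returns a new array
flat = grid.ravel()                 # back to 1-D, shape (12,)

resized = grid.copy()               # resize works in place, so work on an independent copy
resized.resize(2, 6)                # now 2 rows, 6 columns; the call itself returns None

left, right = np.hsplit(grid, 2)    # column-wise split into two (3, 2) arrays
stacked = np.hstack((left, right))  # glue them back side by side, shape (3, 4)

print(grid.shape, flat.shape, resized.shape, left.shape, stacked.shape)
# (3, 4) (12,) (2, 6) (3, 2) (3, 4)
```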

status measured difficulty not learned 37% [default] 0

Flashcard 4881262447884

Tags
#python
Question
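NumPy linear algebra functions: transpose, inverse, trace.
Answer

array=np.array([[1,2,3,4],[5,6,7,8]])

array.transpose() #swaps rows and columns: shape (2,4) -> (4,2); same as array.T
square=np.array([[1,2],[3,4]])
np.linalg.inv(square) #matrix inverse A⁻¹ with A @ A⁻¹ = I; the input must be square and non-singular
np.trace(array) #sum of the main diagonal (top-left to bottom-right)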
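A small sketch checking each function; the 2x2 `square` matrix is an illustrative assumption, chosen because `np.linalg.inv` raises `LinAlgError` for the non-square 2x4 array.

```python
import numpy as np

array = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(array.T.shape)                 # (4, 2): rows and columns swapped

square = np.array([[1.0, 2.0], [3.0, 4.0]])
inverse = np.linalg.inv(square)      # [[-2. , 1. ], [ 1.5, -0.5]]
print(np.allclose(square @ inverse, np.eye(2)))  # True: A @ A^-1 = I

print(np.trace(array))               # 1 + 6 = 7 (trace is also defined for non-square arrays)

try:
    np.linalg.inv(array)             # non-square input is rejected
except np.linalg.LinAlgError as err:
    print("inv() failed:", err)
```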

status measured difficulty not learned 37% [default] 0

Flashcard 4883816254732

Tags
#python
Question

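Create a pandas Series.

From which types of data?

Answer

import pandas as pd

someSeries = pd.Series([1,2,3]) #from a list or an ndarray
someSeries = pd.Series(5. , index=['a','b','c']) #scalar repeated for every index label
someSeries = pd.Series([1,2,3] , index=['a','b','c']) #labelled, like a dictionary
someSeries = pd.Series({'a':1,'b':2,'c':3}) #from a dict: the keys become the index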
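A runnable sketch of the constructors above; all values are made up for illustration.

```python
import numpy as np
import pandas as pd

from_list = pd.Series([10, 20, 30])                   # default index 0, 1, 2
from_array = pd.Series(np.array([1.5, 2.5]))          # from an ndarray
from_scalar = pd.Series(5.0, index=["a", "b", "c"])   # scalar repeated for every label
labelled = pd.Series([1, 2, 3], index=["a", "b", "c"])
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})       # dict keys become the index

print(labelled["b"])            # 2: label lookup, like a dictionary
print(from_scalar.tolist())     # [5.0, 5.0, 5.0]
```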

status measured difficulty not learned 37% [default] 0

Flashcard 4884405292300

Tags
#python
Question

What is a pandas DataFrame?

(number of dimensions, same/different data types)

Answer

DataFrame is a

• two-dimensional
• labeled data structure with columns of
• potentially different types.

status measured difficulty not learned 37% [default] 0

Flashcard 4884407127308

Tags
#python
Question

Syntax for Creating DataFrames from

1. Lists
2. Dictionary
3. Series
4. ndarray
Answer

Syntax for Creating DataFrames
pd.DataFrame

1. Lists (a dict of lists, one list per column)
pd.DataFrame({'columnName1':['val1','val2'], 'columnName2':[1,2]})
2. Dictionary (a dict of dicts: outer keys are columns, inner keys are row labels)
pd.DataFrame({'columnName1':{'key1':'value1'}, 'columnName2':{'key2':'value2'}})
3. Series
series1 = pd.Series([values],index=[indexes])
series2 = pd.Series(....)
#they should share the same kind of index, eg by year; rows are aligned on the index labels.
newDF = pd.DataFrame({'colName1':series1,'colName2':series2})
4. ndarray
Create an ndarray with years. np_arr = np.array([2001,2020,2019])
Create a dict with the ndarray. data = {'year':np_arr}
Pass this dict to a new DataFrame. df = pd.DataFrame(data)
#it will have a default sequential index (0, 1, 2), a column named year, and the values from the ndarray
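A runnable sketch of the four routes; the column names and numbers (sales, cost, years) are illustrative assumptions. The list-of-rows variant with `columns=` is shown as an alternative way to start from plain lists.

```python
import numpy as np
import pandas as pd

# 1. Dict of lists: one list per column
df_lists = pd.DataFrame({"columnName1": ["val1", "val2"], "columnName2": [1, 2]})
# Alternative: a plain list of rows plus explicit column names
df_rows = pd.DataFrame([["val1", 1], ["val2", 2]], columns=["columnName1", "columnName2"])

# 2. Dict of dicts: outer keys are columns, inner keys are row labels
df_dicts = pd.DataFrame({"sales": {2019: 100, 2020: 120}, "cost": {2019: 80, 2020: 90}})

# 3. Dict of Series: rows are aligned on the shared index (years here)
series1 = pd.Series([100, 120], index=[2019, 2020])
series2 = pd.Series([80, 90], index=[2019, 2020])
df_series = pd.DataFrame({"sales": series1, "cost": series2})

# 4. ndarray wrapped in a dict; the index defaults to 0..n-1
np_arr = np.array([2001, 2020, 2019])
df_array = pd.DataFrame({"year": np_arr})

print(df_array)
```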

status measured difficulty not learned 37% [default] 0

Flashcard 4884410535180

Tags
#DataScience #python
Question
Handle Missing Values with Functions. (2 ways)
Answer
1. Dropping the NaN (null) values. Use .dropna()
2. Filling the NaN values with something else. Use .fillna()
Fill with zeros: .fillna(0)
#can also fill with the column means: .fillna(df.mean())
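A minimal sketch of both approaches on a tiny invented DataFrame with two missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

dropped = df.dropna()            # keeps only rows without any NaN (row 0 here)
zeros = df.fillna(0)             # every NaN replaced by 0
means = df.fillna(df.mean())     # each NaN replaced by its column mean (a: 2.0, b: 4.5)
```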

status measured difficulty not learned 37% [default] 0

Flashcard 4884412370188

Tags
#DataScience #python
Question
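Custom functions can be applied to a DataFrame.
Name their use and syntax.
Answer

E.g., for creating new features, or standardizing.
Element-wise custom functions can be applied with the applymap method (renamed DataFrame.map in pandas 2.1, where applymap is deprecated).

df.applymap(functionName)
#functions that work on whole columns or rows use df.apply(functionName)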
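A sketch contrasting element-wise and column-wise application; `square` is a hypothetical example function. The version check is an assumption-free way to run on both older pandas (`applymap`) and pandas 2.1+ (`map`).

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

def square(value):
    """Hypothetical element-wise function used for illustration."""
    return value ** 2

# pandas >= 2.1 renamed DataFrame.applymap to DataFrame.map; fall back on older versions
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
squared = elementwise(square)                                 # called once per cell

column_ranges = df.apply(lambda col: col.max() - col.min())  # called once per column
```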

status measured difficulty not learned 37% [default] 0

Flashcard 4884414205196

Tags
#DataScience #python
Question

Dataframe statistical functions

Answer
.max()
.min()
.mean()
.std()
etc.
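A quick sketch on an invented score column; `describe()` is included because it reports several of these statistics at once.

```python
import pandas as pd

df = pd.DataFrame({"score": [70, 80, 90, 100]})

print(df["score"].max(), df["score"].min(), df["score"].mean())   # 100 70 85.0
print(df["score"].std())   # sample standard deviation (ddof=1), about 12.91
print(df.describe())       # count, mean, std, min, quartiles and max in one table
```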

status measured difficulty not learned 37% [default] 0

Flashcard 4884416040204

Tags
#DataScience #python
Question
Data Operation Using Groupby - syntax.
Answer

grouped = df.groupby(field)

extract = grouped.get_group(wantedValue)
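A minimal sketch with an invented `team`/`points` table: `get_group` pulls out one group, and an aggregation runs once per group.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "points": [10, 7, 4, 9],
})

grouped = df.groupby("team")
red_rows = grouped.get_group("red")   # the rows whose team == "red"
totals = grouped["points"].sum()      # one total per team: blue 16, red 14
```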

status measured difficulty not learned 37% [default] 0

Flashcard 4884417875212

Tags
#DataScience #python
Question
DataFrame data operation: sorting
Answer
df.sort_values('columnName') #or df.sort_values(by='columnName', ascending=False); df.sort_index() sorts by the index
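A short sketch on invented data showing ascending, descending, and index-based sorting.

```python
import pandas as pd

df = pd.DataFrame({"name": ["b", "c", "a"], "age": [30, 25, 35]})

by_age = df.sort_values("age")                          # ascending by default
by_age_desc = df.sort_values(by="age", ascending=False)
by_index = by_age.sort_index()                          # back to the original row order
```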

status measured difficulty not learned 37% [default] 0

Flashcard 4884419710220

Tags
#DataScience #python #statistics
Question
Data standardization applied to data (define a function)
Answer
def standardize(test):
    return (test-test.mean())/test.std()
#e.g. standardize(df['Test1'])
def standardizeResult(dataFrame):
    return dataFrame.apply(standardize)
standardizeResult(df)
#returns a DataFrame of standardized values (z-scores); for roughly normal data most values fall within ±3 standard deviations
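A self-contained check of the card's `standardize` function (repeated so the block runs on its own); the `Test1`/`Test2` scores are invented. After standardization each column has mean 0 and sample standard deviation 1.

```python
import pandas as pd

def standardize(test):
    """Z-score one column: subtract its mean, divide by its sample standard deviation."""
    return (test - test.mean()) / test.std()

df = pd.DataFrame({"Test1": [50, 60, 70], "Test2": [1, 2, 6]})
z = df.apply(standardize)        # applied column by column

print(z["Test1"].tolist())       # [-1.0, 0.0, 1.0]
```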

status measured difficulty not learned 37% [default] 0

Flashcard 4884421545228

Tags
#python
Question

pandas syntax for

1. indexing by label
2. indexing by position
Answer

pandas syntax for

1. indexing by label: df.loc[...]
2. indexing by position: df.iloc[...]
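A sketch on an invented fruit-price table. Note the slicing difference: label slices with `loc` include the end label, position slices with `iloc` exclude the end position.

```python
import pandas as pd

df = pd.DataFrame({"price": [3, 5, 8]}, index=["apple", "pear", "plum"])

by_label = df.loc["pear", "price"]     # 5: row label, column label
by_position = df.iloc[1, 0]            # 5: row position 1, column position 0
label_slice = df.loc["apple":"pear"]   # 2 rows: the end label is included
position_slice = df.iloc[0:1]          # 1 row: the end position is excluded
```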

status measured difficulty not learned 37% [default] 0

Flashcard 4884423380236

Tags
#python
Question
While viewing a DataFrame, the head() method will _____.
Answer
return the first n rows of the DataFrame. The default is n=5 if nothing is passed, so df.head() returns the first five rows.
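A tiny sketch on a 10-row invented DataFrame; `tail()` is the mirror method for the last rows.

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})

first_five = df.head()    # default n=5
first_two = df.head(2)
last_three = df.tail(3)
```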

status measured difficulty not learned 37% [default] 0

Flashcard 4884425215244

Tags
#DataScience #machineLearning
Question

Machine Learning Terminology:

1. Columns
2. Rows
3. Outcome
Answer

Machine Learning Terminology:

1. Columns: Features, attributes, inputs
2. Rows: Observations, samples, records
3. Outcome: Response, target, label

status measured difficulty not learned 37% [default] 0

Flashcard 4884427836684

Tags
#DataScience #machineLearning
Question
Machine Learning Approach/Steps
Answer
1. Understand the problem/dataset. Also deal with the outliers and null values.
2. Extract the features from the dataset. Check correlations, meaningful fields.
3. Identify the problem type. Continuous/Categorical?
4. Choose the right model. Linear regression, logistic regression, clustering?
5. Train and test the model. Check accuracy, errors.
6. Strive for accuracy. Play with factors or relook at features if required.

status measured difficulty not learned 37% [default] 0

Flashcard 4884430195980

Tags
#DataScience #machineLearning
Question
What is supervised learning?
Answer
1. The dataset used to train a model should have observations, features, and responses.
The model is trained to predict the “right” response for a given set of data points.
2. Supervised learning models are used to predict an outcome.
3. The goal of this model is to “generalize” a dataset so that the “general rule” can be applied to new data as well.

status measured difficulty not learned 37% [default] 0

Flashcard 4884432030988

Tags
#DataScience #machineLearning
Question
What is unsupervised learning?
Answer
1. In unsupervised learning, the response or the outcome of the data is unknown.
2. Unsupervised learning models are used to identify and visualize patterns in data by grouping similar types of data.
3. The goal of this model is to “represent” data in a way that meaningful information can be extracted.

status measured difficulty not learned 37% [default] 0

Flashcard 4884433865996

Tags
#DataScience #machineLearning
Question
Identify the Problem Type and Learning Model for supervised/unsupervised learning.
Answer

Supervised
Continuous: Linear regression
Categorical: Classification, logistic regression

Unsupervised
Continuous: Dimensionality reduction
Categorical: Clustering

status measured difficulty not learned 37% [default] 0

Flashcard 4884435701004

Tags
#DataScience #machineLearning #python
Question
Scikit-Learn Considerations
Answer
• Create separate objects for feature and response.
• Ensure that features and response have only numeric values.
• Features and response should be array-like, typically a NumPy ndarray (pandas objects are also accepted).
• Since features and response are arrays, they have shapes: features are (n_samples, n_features) and the response is (n_samples,).
• By convention, the features are stored in X and the response in y.
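A sketch of preparing X and y from a DataFrame; the housing columns and values are hypothetical. X is 2-D (one row per sample, one column per feature) and y is 1-D.

```python
import pandas as pd

df = pd.DataFrame({
    "size_m2": [50, 70, 90, 110],   # hypothetical feature
    "rooms": [1, 2, 3, 4],          # hypothetical feature
    "price": [150, 200, 260, 310],  # hypothetical response
})

X = df[["size_m2", "rooms"]].to_numpy()   # features: shape (n_samples, n_features)
y = df["price"].to_numpy()                # response: shape (n_samples,)
print(X.shape, y.shape)                   # (4, 2) (4,)
```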

status measured difficulty not learned 37% [default] 0

Flashcard 4884437536012

Tags
#DataScience #machineLearning #python
Question
The estimator instance in Scikit-learn is a _____.
Answer
The estimator instance or object is a model.

status measured difficulty not learned 37% [default] 0

Flashcard 4884439371020

Tags
#DataScience #mathematics-basic
Question
simple linear equation
Answer

y = mx + c

𝑦 = β0 + β1𝑥 + u
(u is the error term; its estimates from a fitted line are the residuals)

status measured difficulty not learned 37% [default] 0

Flashcard 4884501761292

Tags
#DataScience #mathematics
Question
Errors in linear regression
Answer

SSR ~
Regression sum of squares (also called the explained sum of squares, ESS)
the sum of the squared differences between the predicted value and the mean of the dependent variable: Σ(ŷᵢ − ȳ)²
Think of it as a measure that describes how well our line fits the data.

SSE ~
Error sum of squares (also called the residual sum of squares, RSS)
the sum of the squared differences between the observed value and the predicted value: Σ(yᵢ − ŷᵢ)²

SST ~
Sum of squares total.
the sum of the squared differences between the observed dependent variable and its mean: Σ(yᵢ − ȳ)²
SST = SSR + SSE (for a least-squares line with an intercept)

RSS ~
or residual sum of squares, the same quantity as SSE. Residual as in: remaining or unexplained.
Note: textbooks use these abbreviations inconsistently (SSR sometimes means the sum of squared residuals), so check which definition is in use.
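A numerical check of SST = SSR + SSE, assuming a least-squares line fitted with `np.polyfit`; the five data points are invented. R² = SSR / SST follows directly.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares line (highest degree first)
y_hat = intercept + slope * x
y_bar = y.mean()

sst = np.sum((y - y_bar) ** 2)       # total sum of squares
ssr = np.sum((y_hat - y_bar) ** 2)   # regression (explained) sum of squares
sse = np.sum((y - y_hat) ** 2)       # error (residual) sum of squares
r_squared = ssr / sst

print(np.isclose(sst, ssr + sse))    # True
```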

status measured difficulty not learned 37% [default] 0

Flashcard 4884507790604

Tags
#DataScience #machineLearning #python
Question
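scikit-learn linear model: syntax
Answer
sklearn.linear_model.LinearRegression(fit_intercept=True, copy_X=True, n_jobs=None, positive=False)
#the normalize parameter was deprecated in scikit-learn 1.0 and removed in 1.2; scale features beforehand (e.g. with StandardScaler) instead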
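A minimal fit/predict sketch on invented data that lies exactly on y = 3x + 2, so the learned coefficient and intercept can be checked.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # one feature, 2-D shape (4, 1)
y = 3.0 * X.ravel() + 2.0                    # exact line y = 3x + 2

model = LinearRegression(fit_intercept=True)
model.fit(X, y)
print(model.coef_, model.intercept_)         # approximately [3.] and 2.0
pred = model.predict(np.array([[5.0]]))      # approximately [17.]
```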

status measured difficulty not learned 37% [default] 0

Flashcard 4884510412044

Tags
#DataScience #machineLearning
Question
Unsupervised Learning Models: Clustering. It is used to:
Answer
It is used:
• To extract the structure of the data
• To identify groups in the data

status measured difficulty not learned 37% [default] 0

Flashcard 4884512771340

Tags
#DataScience #machineLearning
Question
K-means clustering: how does it work?
Answer
K-means starts from k initial centroids (e.g. chosen at random) and then alternates two steps: it assigns every data point to its nearest centroid, then moves each centroid to the mean of the points assigned to it. It repeats this iteratively until the assignments stop changing, i.e. the model has converged.
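A from-scratch sketch of the assign/update loop (Lloyd's algorithm) in NumPy, written for illustration only: it uses a single random initialization and no k-means++ seeding, both simplifications compared with `sklearn.cluster.KMeans`. The six points form two obvious, invented clusters.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal K-means sketch: one random start, Euclidean distance."""
    rng = np.random.default_rng(seed)
    # 1. pick k distinct data points as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. assign every point to its nearest centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # 3. move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(points, k=2)
```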

status measured difficulty not learned 37% [default] 0

Flashcard 4884514868492

Tags
#DataScience #machineLearning
Question
Unsupervised Learning Models: Dimensionality Reduction. What is it?
Answer

It reduces a high-dimensional dataset into a dataset with fewer dimensions.

This makes it easier and faster for the algorithm to analyze the data.

status measured difficulty not learned 37% [default] 0

Flashcard 4884516965644

Tags
#DataScience #machineLearning
Question
techniques used for dimensionality reduction:
Answer

Drop data columns with missing values

Drop data columns with low variance

Drop data columns with high correlations

Apply statistical functions - PCA
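A pandas sketch of the first three techniques, assuming an all-numeric DataFrame; `reduce_columns` and its three thresholds are hypothetical choices for illustration, not standard values. For each highly correlated pair, only the later column is dropped.

```python
import numpy as np
import pandas as pd

def reduce_columns(df, max_missing=0.5, min_variance=1e-3, max_corr=0.95):
    """Hypothetical helper; the thresholds are illustrative assumptions."""
    # 1. drop columns whose share of missing values is too high
    df = df.loc[:, df.isna().mean() <= max_missing]
    # 2. drop (near-)constant columns
    df = df.loc[:, df.var() >= min_variance]
    # 3. drop the second column of every highly correlated pair (upper triangle only)
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > max_corr).any()]
    return df.drop(columns=to_drop)

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "a_copy": [2.0, 4.0, 6.0, 8.0],         # perfectly correlated with "a"
    "constant": [5.0, 5.0, 5.0, 5.0],       # zero variance
    "sparse": [np.nan, np.nan, np.nan, 1.0],  # 75% missing
    "b": [4.0, 1.0, 3.0, 2.0],
})
reduced = reduce_columns(df)
print(list(reduced.columns))   # ['a', 'b']
```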

status measured difficulty not learned 37% [default] 0

Flashcard 4884518800652

Tags
#DataScience #machineLearning
Question
Unsupervised Learning Models: Principal Component Analysis (PCA)
Answer
It is a linear dimensionality reduction method which uses singular value decomposition of the data and
keeps only the most significant singular vectors to project the data to a lower-dimensional space.
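A sketch with `sklearn.decomposition.PCA` on synthetic data: three features, where the third is nearly a copy of the first, so two components keep almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
noise = rng.normal(size=(100, 1))
# third column is almost a copy of the first -> little extra information
X = np.hstack([base, noise, base + 0.01 * rng.normal(size=(100, 1))])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)             # centred data projected onto 2 components
print(X_reduced.shape)                       # (100, 2)
print(pca.explained_variance_ratio_.sum())   # close to 1: very little is lost
```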

status measured difficulty not learned 37% [default] 0

Article 4884520635660

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner and Jennifer B. Nuzzo. Published online: 25 May 2011.
Sections: Diseases Jump the Species Barrier; More Interconnected and Urbanized; Hospitals Can Amplify Disease; Hospital Infection Control Measures; International Scientific Collaboration; Disease Doesn't Stop at the Border; Preparing to Respond Can Save Lives; Superspreading and Respiratory Transmission; What Remains To Be Done

Annotation 4884522470668

 In recent years, most emerging infectious disease events have been the result of mutations in wildlife pathogens that have allowed infection of human hosts.4 In the past, such events contributed to some of history's great pandemics, including influenza, plague, smallpox, and HIV.
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
ier The risk of infectious diseases jumping the species barrier remains a clear and present danger. People have been catching diseases from animals (zoonoses) as long as there have been people. In recent years, most emerging infectious disease events have been the result of mutations in wildlife pathogens that have allowed infection of human hosts.4 In the past, such events contributed to some of history's great pandemics, including influenza, plague, smallpox, and HIV. SARS was caused by a coronavirus that was endemic among fruit bats in China; it adapted to a human host after establishing itself in the captive animals in the wild animal markets of Gu

Annotation 4884524043532

 SARS was caused by a coronavirus that was endemic among fruit bats in China; it adapted to a human host after establishing itself in the captive animals in the wild animal markets of Guangdong Province.
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
ons in wildlife pathogens that have allowed infection of human hosts.4 In the past, such events contributed to some of history's great pandemics, including influenza, plague, smallpox, and HIV. SARS was caused by a coronavirus that was endemic among fruit bats in China; it adapted to a human host after establishing itself in the captive animals in the wild animal markets of Guangdong Province. As humans encroach ever more deeply into previously wild areas, the incidence of zoonotic infections will likely increase. In recent years we have seen zoonotic outbreaks of ebola, Marb

Annotation 4884525878540

 Modern urban environments have conditions, such as high population density, poor sanitation, and many poor, malnourished people, that may accelerate the spread of emerging infections. For instance, the large outbreak of SARS at the Amoy Gardens apartment complex in Hong Kong (329 patients) is at least partially related to its enormous size and density—19,000 residents in 0.04 km².6
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
ict the emergence of a novel human coronavirus years before SARS appeared.5 More Interconnected and Urbanized The risk of pandemics grows as the world becomes more interconnected and urbanized. Modern urban environments have conditions, such as high population density, poor sanitation, and many poor, malnourished people, that may accelerate the spread of emerging infections. For instance, the large outbreak of SARS at the Amoy Gardens apartment complex in Hong Kong (329 patients) is at least partially related to its enormous size and density—19,000 residents in 0.04 km².6 Because of their great population density, the burgeoning megacities around the world may contribute to the spread of novel contagious diseases.7,8 The introduction of a highly contagio

Annotation 4884527975692

 It is reasonable to estimate that the case fatality rate for SARS was cut in half by sophisticated modern health care. (If all patients who required intensive care would have died without it, then the case fatality rate would have been approximately 20% rather than the actual 10%.)
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
pandemic, modern hospitals and the sophisticated care they provided were double-edged swords. It is certainly true that many victims of SARS were saved in intensive care units around the world. It is reasonable to estimate that the case fatality rate for SARS was cut in half by sophisticated modern health care. (If all patients who required intensive care would have died without it, then the case fatality rate would have been approximately 20% rather than the actual 10%.) On the other hand, it is likely that SARS would not have become a major epidemic had it not been for the many superspreading events that occurred in hospitals.11 These superspreading ev

Annotation 4884529548556

 it is likely that SARS would not have become a major epidemic had it not been for the many superspreading events that occurred in hospitals.11
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
alth care. (If all patients who required intensive care would have died without it, then the case fatality rate would have been approximately 20% rather than the actual 10%.) On the other hand, it is likely that SARS would not have become a major epidemic had it not been for the many superspreading events that occurred in hospitals.11 These superspreading events were very often related to certain medical procedures—such as endotracheal intubation, airway suctioning, and noninvasive ventilation—that turn respiratory d

Annotation 4884531383564

 Most SARS infections probably occurred in hospitals, and nearly all cases of SARS can be traced back to one or more nosocomial superspreading events starting with relatively small hospital outbreaks in rural Guangdong, then large nosocomial outbreaks in Guangzhou, Hong Kong, Hanoi, Beijing, Singapore, and Toronto.2
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
g events were very often related to certain medical procedures—such as endotracheal intubation, airway suctioning, and noninvasive ventilation—that turn respiratory droplets into aerosols.12,13 Most SARS infections probably occurred in hospitals, and nearly all cases of SARS can be traced back to one or more nosocomial superspreading events starting with relatively small hospital outbreaks in rural Guangdong, then large nosocomial outbreaks in Guangzhou, Hong Kong, Hanoi, Beijing, Singapore, and Toronto.2 That hospitals can function as disease amplifiers is not entirely new: Outbreaks of influenza occur in healthcare facilities every year, and many hospital-related outbreaks of TB have b

Annotation 4884533480716

 SARS was brought under control within a matter of months largely due to the fact that the disease was most transmissible when the patients were most sick—that is, when they were in a hospital.2 There was relatively little community transmission of SARS compared to other respiratory infections like influenza.18 For this reason, controlling the transmission in hospitals was key in controlling the outbreak.19
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
red with other infectious diseases as well, including smallpox14,15 and ebola.16,17 Hospital Infection Control Measures Hospital infection control measures work to stop the spread of pandemics. SARS was brought under control within a matter of months largely due to the fact that the disease was most transmissible when the patients were most sick—that is, when they were in a hospital.2 There was relatively little community transmission of SARS compared to other respiratory infections like influenza.18 For this reason, controlling the transmission in hospitals was key in controlling the outbreak.19 This also explains the large percentage of healthcare workers who became infected and the large percentage of victims who acquired their infections in hospitals. For the most part (with

Flashcard 4884535053580

Tags
#DataScience #machineLearning
Question
__ is mainly used to chain multiple estimators (e.g. preprocessing transformers and a final model) into a single estimator
Answer
Pipeline (sklearn.pipeline.Pipeline)
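A sketch of a two-step pipeline (scaling, then linear regression) on invented data where y = 2 * (first feature) + 1, so the fit is exact. Calling `fit` or `predict` on the pipeline runs every step in order.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 100.0], [2.0, 180.0], [3.0, 330.0], [4.0, 390.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

pipe = Pipeline([
    ("scale", StandardScaler()),     # transformer: learns mean/std in fit, then transforms
    ("model", LinearRegression()),   # final estimator
])
pipe.fit(X, y)          # scales X, then fits the model on the scaled data
pred = pipe.predict(X)  # applies the same scaling before predicting
```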

status measured difficulty not learned 37% [default] 0

Annotation 4884535577868

 Transmission [of SARS] in hospitals was brought under control by the use of standard infection control practices, such as isolation of sick patients and wearing of masks, gowns, and gloves by hospital staff.20,21
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
rkers who became infected and the large percentage of victims who acquired their infections in hospitals. For the most part (with the very important exception of aerosol-generating procedures), transmission in hospitals was brought under control by the use of standard infection control practices, such as isolation of sick patients and wearing of masks, gowns, and gloves by hospital staff.20,21 For those high-risk aerosol-generating procedures, more stringent measures, such as the use of negative pressure isolation and high-efficiency respirators, were effective in reducing tr

Flashcard 4884539247884

Tags
#DataScience #machineLearning #python
Question
Model Persistence
Answer

Save a model for future use, so there is no need to retrain it every time you need it.

It is possible to save a model with Python's pickle module.
joblib (a separate package installed alongside scikit-learn) is a commonly used replacement for pickle that is more efficient for objects carrying large NumPy arrays.
You can use the joblib.dump and joblib.load functions.
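A round-trip sketch with both `joblib` and `pickle`; the file name `model.joblib` is a hypothetical choice, and a temporary directory is used so nothing is left on disk.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0])                     # y = 2x + 1
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")      # hypothetical file name
    joblib.dump(model, path)                      # save to disk
    restored = joblib.load(path)                  # load later, no retraining needed

restored_pickle = pickle.loads(pickle.dumps(model))   # the plain-pickle equivalent, in memory
print(restored.predict(np.array([[3.0]])))            # approximately [7.]
```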

status measured difficulty not learned 37% [default] 0

Annotation 4884539772172

 Quarantine is the sequestration from the general public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers.
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
to contain diseases at national borders provide limited value at great cost. Public health authorities in many locations imposed various forms of quarantine in attempts to quash the outbreaks. Quarantine is the sequestration from the general public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers. It is different from isolation, which is the sequestration of individuals known to have the infection. Isolation was clearly effective in SARS and, in fact, was the key to its control.

Annotation 4884541345036

 Isolation is the sequestration of individuals known to have the infection.
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
l public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers. It is different from isolation, which is the sequestration of individuals known to have the infection. Isolation was clearly effective in SARS and, in fact, was the key to its control. Quarantine, although widely employed, was not so clearly effective. In many cases, a large number of people subject to quarantine orders refused to comply. In fact, in some cases the i

Flashcard 4884543704332

Question
[...] is the sequestration from the general public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers.
Answer
Quarantine

status measured difficulty not learned 37% [default] 0


Flashcard 4884545277196

Question
Quarantine is [...].
Answer
the sequestration from the general public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers

status measured difficulty not learned 37% [default] 0


Flashcard 4884547374348

Tags
#DataScience #machineLearning #python
Question
Model Evaluation: Metric Functions. Syntax for classification, clustering and regression.
Answer

Classification
metrics.accuracy_score
metrics.average_precision_score

Clustering
metrics.adjusted_rand_score

Regression
metrics.mean_absolute_error
metrics.mean_squared_error
metrics.median_absolute_error
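A sketch calling one metric from each group on small invented label/prediction lists; the expected values in the comments follow directly from the arithmetic (e.g. absolute errors 0.5, 0.5, 0, 1 give MAE 0.5).

```python
from sklearn import metrics

# Classification: fraction of labels predicted exactly right
acc = metrics.accuracy_score([0, 1, 1, 0], [0, 1, 0, 0])        # 0.75

# Clustering: only the grouping matters, not the label names
ari = metrics.adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])    # 1.0

# Regression
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
mae = metrics.mean_absolute_error(y_true, y_pred)       # 0.5
mse = metrics.mean_squared_error(y_true, y_pred)        # 0.375
medae = metrics.median_absolute_error(y_true, y_pred)   # 0.5
```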

status measured difficulty not learned 37% [default] 0

Flashcard 4884548685068

Question
[...] is the sequestration of individuals known to have the infection.
Answer
Isolation

status measured difficulty not learned 37% [default] 0


Flashcard 4884549733644

Question
Isolation is [...].
Answer
the sequestration of individuals known to have the infection

status measured difficulty not learned 37% [default] 0


Annotation 4884551830796

 Various types of travel screening were employed by a number of countries. Despite screening of millions of travelers, only a very few individuals with SARS were discovered. This was especially true of thermal screening. More than 35 million international travelers entering China, Canada, and Singapore had their temperatures measured, but no cases of SARS were found.18
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
d to comply. In fact, in some cases the imposition of quarantine orders produced a paradoxical result—people escaped. This may have contributed to the spread of SARS to remote parts of China.24 Various types of travel screening were employed by a number of countries. Despite screening of millions of travelers, only a very few individuals with SARS were discovered. This was especially true of thermal screening. More than 35 million international travelers entering China, Canada, and Singapore had their temperatures measured, but no cases of SARS were found.18 Although the public health benefits of such measures are not clear, the resources required to implement them have been shown to be significant. Canada spent nearly $8 million (Canadian

Annotation 4884553403660

 In Canada, where 45% of SARS cases occurred among healthcare workers, a government-sponsored review found that, had the respiratory precautions and isolation policies that were eventually employed in the hospitals been in place at the beginning of the outbreak, many fewer people would have been infected.27
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
on the fly. After losing patients and staff to nosocomial transmission of SARS, affected hospitals were forced to figure out which infection control measures would halt the spread of infection. In Canada, where 45% of SARS cases occurred among healthcare workers, a government-sponsored review found that, had the respiratory precautions and isolation policies that were eventually employed in the hospitals been in place at the beginning of the outbreak, many fewer people would have been infected.27 In both hospitals and health departments, advance planning, creation of information and communication systems, education and training, and stockpiling of supplies are necessary to enabl

Annotation 4884556287244

 While most people with SARS did not infect anyone else, the majority of SARS infections can be traced to a relatively small number of superspreading events in which 1 individual infected many other people.2
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
RS was estimated to be 2 to 4, this number only represents the average number of secondary cases caused by an infected person.28 The reality is that a huge range of transmission rates occurred. While most people with SARS did not infect anyone else, the majority of SARS infections can be traced to a relatively small number of superspreading events in which 1 individual infected many other people.2 Superspreading is not a new phenomenon, having been described with tuberculosis, measles, and smallpox.29 It may well occur more frequently than is recognized in other contagious diseas

Annotation 4884557860108

 3 [SARS] superspreading events stand out as particularly puzzling: those that occurred at the Metropole Hotel and Amoy Gardens in Hong Kong, and the event in the emergency department (ED) of Scarborough Grace Hospital in Toronto.
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
disease, diabetes, advanced age, and perhaps corticosteroid therapy. In addition to these host factors, high-risk aerosol-generating procedures were clearly associated with superspreading. But 3 superspreading events stand out as particularly puzzling: those that occurred at the Metropole Hotel and Amoy Gardens in Hong Kong, and the event in the emergency department (ED) of Scarborough Grace Hospital in Toronto. The event at the Metropole Hotel was the proximate cause of the SARS pandemic, as nearly all cases outside of China can be traced back to it.31 One infected individual stayed 1 night on

Annotation 4884560219404

 The [SARS superspreading] event at the Metropole Hotel was the proximate cause of the SARS pandemic, as nearly all cases outside of China can be traced back to it.31 One infected individual stayed 1 night on the ninth floor of the hotel and infected 16 other people on the same floor who then traveled across the globe before becoming ill. Although several possible explanations have been proposed, there is still no entirely satisfactory explanation for this event.
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
out as particularly puzzling: those that occurred at the Metropole Hotel and Amoy Gardens in Hong Kong, and the event in the emergency department (ED) of Scarborough Grace Hospital in Toronto. The event at the Metropole Hotel was the proximate cause of the SARS pandemic, as nearly all cases outside of China can be traced back to it.31 One infected individual stayed 1 night on the ninth floor of the hotel and infected 16 other people on the same floor who then traveled across the globe before becoming ill. Although several possible explanations have been proposed, there is still no entirely satisfactory explanation for this event. The ED of Scarborough Grace Hospital was the site of a chain of SARS transmission that led to most of the cases in Toronto. In particular, one event there stands out as unusual. The wif

Flashcard 4884561792268

Tags
#DataScience #machineLearning
Question
With a ___ SSE (error or residual sum of squares), the prediction will be less accurate and the model will not be the best fit for the data.
Answer
higher

status measured difficulty not learned 37% [default] 0

Annotation 4884563102988

 The ED of Scarborough Grace Hospital was the site of a chain of SARS transmission that led to most of the cases in Toronto. In particular, one event there stands out as unusual. The wife of one of the SARS patients sat in the ED waiting room while her husband was being treated. Unbeknownst to the ED staff, she also had SARS, but her symptoms were mild. Despite having mild symptoms, she apparently infected a number of other people in the waiting room and possibly a number of the staff as well. This event contrasts with most other incidents of SARS transmission, which required close contact and occurred only when the patient had severe symptoms. Like the event in the Metropole Hotel, it would be useful to understand what factors contributed to this anomalous event.
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
same floor who then traveled across the globe before becoming ill. Although several possible explanations have been proposed, there is still no entirely satisfactory explanation for this event. The ED of Scarborough Grace Hospital was the site of a chain of SARS transmission that led to most of the cases in Toronto. In particular, one event there stands out as unusual. The wife of one of the SARS patients sat in the ED waiting room while her husband was being treated. Unbeknownst to the ED staff, she also had SARS, but her symptoms were mild. Despite having mild symptoms, she apparently infected a number of other people in the waiting room and possibly a number of the staff as well. This event contrasts with most other incidents of SARS transmission, which required close contact and occurred only when the patient had severe symptoms. Like the event in the Metropole Hotel, it would be useful to understand what factors contributed to this anomalous event. The Amoy Gardens event is even more disconcerting. Amoy Gardens is a high-rise apartment complex with 19,000 residents. One infected individual staying there infected 329 others. The be

Annotation 4884565462284

 The Amoy Gardens event is even more disconcerting. Amoy Gardens is a high-rise apartment complex with 19,000 residents. One infected individual staying there infected 329 others. The best explanation is that a malfunctioning plumbing system allowed the creation of a virus-laden aerosol plume that was blown outdoors, wafted hundreds of yards downwind, and infected people in other buildings through open windows.32 If this hypothesis is true, it undermines many assumptions about the transmission of infectious diseases.
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
d close contact and occurred only when the patient had severe symptoms. Like the event in the Metropole Hotel, it would be useful to understand what factors contributed to this anomalous event. The Amoy Gardens event is even more disconcerting. Amoy Gardens is a high-rise apartment complex with 19,000 residents. One infected individual staying there infected 329 others. The best explanation is that a malfunctioning plumbing system allowed the creation of a virus-laden aerosol plume that was blown outdoors, wafted hundreds of yards downwind, and infected people in other buildings through open windows.32 If this hypothesis is true, it undermines many assumptions about the transmission of infectious diseases. This leads to the consideration of how SARS was transmitted—that is, whether it involved droplets or aerosols.
Much has been written on this topic, and there are strong opinions on both Flashcard 4884567821580 Tags #DataScience #machineLearning Question What are the requirements of the K-means algorithm? Answer • Number of clusters should be specified • More than one iteration should meet requisite criteria • Centroids should minimize inertia status measured difficulty not learned 37% [default] 0 Annotation 4884568345868  In most cases SARS transmission was blocked by simple droplet precautions, but in others it seems clear that aerosol transmission was the only logical explanation; this was especially true in certain hospital outbreaks (eg, Ward 8a of the Prince of Wales Hospital in Hong Kong33). Probably both droplets and aerosols were produced as patients coughed, and various host and environmental factors determined which mechanism was predominant at a particular time and place. status not read Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner h sides. The distinction is important, because the answer determines the appropriate infection control measures to be used. In fact, there is convincing evidence for both forms of transmission. <span>In most cases SARS transmission was blocked by simple droplet precautions, but in others it seems clear that aerosol transmission was the only logical explanation; this was especially true in certain hospital outbreaks (eg, Ward 8a of the Prince of Wales Hospital in Hong Kong33). Probably both droplets and aerosols were produced as patients coughed, and various host and environmental factors determined which mechanism was predominant at a particular time and place. Better understanding of this phenomenon is important, because if this is also true for other diseases, then the role of aerosol transmission in respiratory infections more generally mus Flashcard 4884569918732 Tags #machine-learning #management #software-engineering #unfinished Question The goal of [...] 
at Google is to bring significant, practical benefits to our users, and to do so rapidly, within a few years at most. Answer research status measured difficulty not learned 37% [default] 0 Parent (intermediate) annotation Open it The goal of research at Google is to bring significant, practical benefits to our users, and to do so rapidly, within a few years at most. Original toplevel document (pdf) cannot see any pdfs Flashcard 4884570967308 Tags #machine-learning #management #software-engineering #unfinished Question The goal of research at Google is to bring significant, [...] benefits to our users, and to do so rapidly, within a few years at most. Answer practical status measured difficulty not learned 37% [default] 0 Parent (intermediate) annotation Open it The goal of research at Google is to bring significant, practical benefits to our users, and to do so rapidly, within a few years at most. Original toplevel document (pdf) cannot see any pdfs Flashcard 4884572015884 Tags #machine-learning #management #software-engineering #unfinished Question The goal of research at Google is to bring significant, practical benefits to our users, and to do so [...] Answer rapidly, within a few years at most. status measured difficulty not learned 37% [default] 0 Parent (intermediate) annotation Open it The goal of research at Google is to bring significant, practical benefits to our users, and to do so rapidly, within a few years at most. Original toplevel document (pdf) cannot see any pdfs Annotation 4884577520908  The standard (unit) softmax function $${\displaystyle \sigma :\mathbb {R} ^{K}\to \mathbb {R} ^{K}}$$is defined by the formula $${\displaystyle \sigma (\mathbf {z} )_{i}={\frac {e^{z_{i}}}{\sum _{j=1}^{K}e^{z_{j}}}}{\text{ for }}i=1,\dotsc ,K{\text{ and }}\mathbf {z} =(z_{1},\dotsc ,z_{K})\in \mathbb {R} ^{K}}$$ status not read Softmax function - Wikipedia ts will correspond to larger probabilities. 
Softmax is often used in neural networks , to map the non-normalized output of a network to a probability distribution over predicted output classes. <span>The standard (unit) softmax function σ : R K → R K {\displaystyle \sigma :\mathbb {R} ^{K}\to \mathbb {R} ^{K}} is defined by the formula σ ( z ) i = e z i ∑ j = 1 K e z j for i = 1 , … , K and z = ( z 1 , … , z K ) ∈ R K {\displaystyle \sigma (\mathbf {z} )_{i}={\frac {e^{z_{i}}}{\sum _{j=1}^{K}e^{z_{j}}}}{\text{ for }}i=1,\dotsc ,K{\text{ and }}\mathbf {z} =(z_{1},\dotsc ,z_{K})\in \mathbb {R} ^{K}} In words: we apply the standard exponential function to each element z i {\displaystyle z_{i}} of the input vector z {\displaystyle \mathbf {z} } and normalize these values by dividing Annotation 4884579618060  #bert #knowledge-base-construction #nlp #unfinished For a sentence s with two target entities e 1 and e 2 , to make the BERT module capture the location information of the two entities, at both the begin- ning and end of the first entity, we insert a spe- cial token ‘$’, and at both the beginning and end of the second entity, we insert a special token ‘#’. We also add ‘[CLS]’ to the beginning of each sen- tence.
status not read

pdf

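The entity-marking step in the annotation above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the sentence is already split into word tokens, and each entity is given as a hypothetical `(start, end)` token span with `end` exclusive and no overlap between the two entities. A real BERT pipeline would also apply WordPiece tokenization, which this sketch skips.

```python
def add_entity_markers(tokens, e1_span, e2_span):
    """Insert '$' around entity e1 and '#' around entity e2, then prepend '[CLS]'.

    tokens:  list of word tokens for sentence s
    e1_span: (start, end) token indices of e1, end exclusive (hypothetical format)
    e2_span: (start, end) token indices of e2, must not overlap e1
    """
    (s1, t1), (s2, t2) = e1_span, e2_span
    marked = ["[CLS]"]
    for i, tok in enumerate(tokens):
        if i == s1:
            marked.append("$")  # beginning of the first entity
        if i == s2:
            marked.append("#")  # beginning of the second entity
        marked.append(tok)
        if i == t1 - 1:
            marked.append("$")  # end of the first entity
        if i == t2 - 1:
            marked.append("#")  # end of the second entity
    return marked


tokens = "the kitchen is the last renovated part of the house".split()
marked = add_entity_markers(tokens, (1, 2), (9, 10))
# ['[CLS]', 'the', '$', 'kitchen', '$', 'is', 'the', 'last',
#  'renovated', 'part', 'of', 'the', '#', 'house', '#']
```

Keeping the markers as ordinary tokens means the encoder needs no architectural change to see where the entities begin and end.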
Annotation 4884581190924

 In the first few years following the SARS pandemic, “respiratory etiquette” became the “new normal” in hospitals—anyone with a cough had a surgical mask placed on them at the ED door, and aerosol-generating procedures were done only in closed rooms with staff wearing PPE
status not read

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
health. Important lessons were also learned in the area of emergency management at the municipal, provincial/state, and national levels34 and in the realm of international treaties.35 Hospitals <span>In the first few years following the SARS pandemic, “respiratory etiquette” became the “new normal” in hospitals—anyone with a cough had a surgical mask placed on them at the ED door, and aerosol-generating procedures were done only in closed rooms with staff wearing PPE—and it was said that things would never be the same again. But now when we walk the halls of hospitals, this “new normal” for infection control is hard to detect. If SARS were transmitt

Annotation 4884582763788

 #bert #knowledge-base-construction #nlp #unfinished In BERT, the input representation of each token is the sum of its token, segment and position embeddings.
status not read

pdf

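The sum of token, segment and position embeddings can be sketched with three NumPy lookup tables. The table sizes, the random initialization, and the function name are illustrative assumptions: in a trained model the three tables are learned, and this sketch omits any normalization or dropout a real implementation may apply after the sum.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE, MAX_LEN, NUM_SEGMENTS, HIDDEN = 100, 16, 2, 8  # toy sizes (assumption)

# One embedding table per input signal; random here, learned in a real model.
token_table = rng.normal(size=(VOCAB_SIZE, HIDDEN))
segment_table = rng.normal(size=(NUM_SEGMENTS, HIDDEN))
position_table = rng.normal(size=(MAX_LEN, HIDDEN))


def bert_input_embeddings(token_ids, segment_ids):
    """Return one HIDDEN-sized vector per token: token + segment + position embedding."""
    token_ids = np.asarray(token_ids)
    segment_ids = np.asarray(segment_ids)
    positions = np.arange(len(token_ids))  # 0, 1, 2, ... for each token
    return token_table[token_ids] + segment_table[segment_ids] + position_table[positions]


# Toy ids: the first three tokens belong to sentence A (segment 0), the rest to B (segment 1).
x = bert_input_embeddings([1, 5, 7, 2, 9], [0, 0, 0, 1, 1])
```

Because the three vectors are simply added, each token's input carries its identity, its sentence, and its position in one vector of the same width.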
Annotation 4884584336652

 #bert #knowledge-base-construction #nlp #unfinished ‘[CLS]’ is appended to the beginning of each sequence as the first token of the sequence. The final hidden state from the Transformer output corresponding to the first token is used as the sentence representation for classification tasks. In case there are two sentences in a task, ‘[SEP]’ is used to separate the two sentences.
status not read

pdf

Annotation 4884585909516

 #bert #knowledge-base-construction #nlp #unfinished BERT pre-trains the model parameters by using a pre-training objective: the masked language model (MLM), which randomly masks some of the tokens from the input and sets the optimization objective to predict the original vocabulary id of the masked word according to its context.
status not read

pdf

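A minimal sketch of the masking step in plain Python. It works on token strings rather than vocabulary ids and always substitutes `[MASK]`. The 15% default rate follows the BERT paper, but the paper's extra rule of sometimes keeping the original token or substituting a random one is deliberately left out here.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Randomly replace tokens with [MASK].

    Returns (masked_tokens, labels): labels[i] is the original token where a mask
    was applied and None elsewhere, so a loss would only be computed on masked slots.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)  # target the model must predict from context
        else:
            masked.append(tok)
            labels.append(None)  # not a prediction target
    return masked, labels


sentence = "my dog is hairy and likes to play".split()
masked, labels = mask_tokens(sentence, seed=1)
```

Since the masked word is hidden but its neighbours on both sides are not, the model has to use left and right context together to recover it.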
Annotation 4884587482380

 #bert #knowledge-base-construction #nlp #unfinished Unlike left-to-right language model pre-training, the MLM objective lets a state output use both the left and the right context, which allows a pre-training system to apply a deep bidirectional Transformer.
status not read

pdf

Annotation 4884589055244

 #bert #knowledge-base-construction #nlp #unfinished Besides the masked language model, BERT also trains a “next sentence prediction” task that jointly pre-trains text-pair representations.
status not read

pdf

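The data side of next sentence prediction can be sketched as pairing each sentence with either its true successor or some other sentence. The 50/50 split follows the BERT paper; drawing the negative from the same sentence list (rather than from a different document) and the `pack_pair` helper, which uses the `[CLS]`/`[SEP]` layout from the annotation on ‘[CLS]’ above, are simplifying assumptions of this sketch.

```python
import random


def make_nsp_examples(sentences, seed=None):
    """Return (sentence_a, sentence_b, is_next) triples built from consecutive sentences."""
    rng = random.Random(seed)
    examples = []
    for i in range(len(sentences) - 1):
        a = sentences[i]
        if rng.random() < 0.5:
            examples.append((a, sentences[i + 1], True))  # actual next sentence
        else:
            # any sentence except the true successor serves as a negative
            negatives = [s for j, s in enumerate(sentences) if j != i + 1]
            examples.append((a, rng.choice(negatives), False))
    return examples


def pack_pair(a, b):
    """Join a pair into one input sequence, marking the boundary with [SEP]."""
    return ["[CLS]"] + a.split() + ["[SEP]"] + b.split() + ["[SEP]"]


doc = ["the man went to the store", "he bought a gallon of milk",
       "penguins are flightless birds", "they live in the southern hemisphere"]
pairs = make_nsp_examples(doc, seed=0)
```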
Flashcard 4884591414540

Tags
#DataScience #machineLearning
Question
Natural Language Processing (NLP)
Answer
Natural language processing is an automated way to understand and analyze natural human languages
and extract information from such data by applying machine learning algorithms.

status measured difficulty not learned 37% [default] 0

Annotation 4884592725260

 #nlp #reading-group #transformer #unfinished with multi-headed attention we have not only one, but multiple sets of Query/Key/Value weight matrices (the Transformer uses eight attention heads, so we end up with eight sets for each encoder/decoder). Each of these sets is randomly initialized. Then, after training, each set is used to project the input embeddings (or vectors from lower encoders/decoders) into a different representation subspace.
status not read

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
e animal didn’t cross the street because it was too tired”, we would want to know which word “it” refers to. It gives the attention layer multiple “representation subspaces”. As we’ll see next, with multi-headed attention we have not only one, but multiple sets of Query/Key/Value weight matrices (the Transformer uses eight attention heads, so we end up with eight sets for each encoder/decoder). Each of these sets is randomly initialized. Then, after training, each set is used to project the input embeddings (or vectors from lower encoders/decoders) into a different representation subspace. With multi-headed attention, we maintain separate Q/K/V weight matrices for each head resulting in different Q/K/V matrices. As we did before, we multiply X by the WQ/WK/WV matrices to

Flashcard 4884594298124

Tags
#has-images #nlp #reading-group #transformer #unfinished
Question
How does multi-headed attention turn the eight head outputs into the single matrix the feed-forward layer expects?
Answer

The feed-forward layer is not expecting eight matrices – it’s expecting a single matrix (a vector for each word). So we need a way to condense these eight down into a single matrix.

How do we do that? We concatenate the matrices, then multiply them by an additional weight matrix WO.

status measured difficulty not learned 37% [default] 0
Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
same self-attention calculation we outlined above, just eight different times with different weight matrices, we end up with eight different Z matrices. This leaves us with a bit of a challenge. The feed-forward layer is not expecting eight matrices – it’s expecting a single matrix (a vector for each word). So we need a way to condense these eight down into a single matrix. How do we do that? We concat the matrices then multiply them by an additional weights matrix WO. That’s pretty much all there is to multi-headed self-attention. It’s quite a handful of matrices, I realize. Let me try to put them all in one visual so we can look at them in one place

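The per-head projections and the final concatenate-then-multiply-by-WO step can be put together in a short NumPy sketch. It uses the scaled dot-product (dividing scores by the square root of the key dimension) from the Transformer paper. The toy sizes and random weights are assumptions, and batching, masking, and bias terms are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(X, WQ, WK, WV, WO):
    """X: (seq_len, d_model); WQ, WK, WV: (num_heads, d_model, d_k);
    WO: (num_heads * d_k, d_model). Returns (seq_len, d_model)."""
    heads = []
    for h in range(WQ.shape[0]):
        Q, K, V = X @ WQ[h], X @ WK[h], X @ WV[h]  # this head's own projections
        scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot-product scores
        heads.append(softmax(scores) @ V)  # Z_h: (seq_len, d_k)
    Z = np.concatenate(heads, axis=-1)  # concat: (seq_len, num_heads * d_k)
    return Z @ WO  # single matrix for the feed-forward layer


rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 5, 16, 8  # eight heads, as in the Transformer
d_k = d_model // num_heads
X = rng.normal(size=(seq_len, d_model))
WQ, WK, WV = (rng.normal(size=(num_heads, d_model, d_k)) for _ in range(3))
WO = rng.normal(size=(num_heads * d_k, d_model))
out = multi_head_self_attention(X, WQ, WK, WV, WO)
```

Choosing `d_k = d_model // num_heads` keeps the concatenated width equal to `d_model`, so WO here is square and the output has the same shape as the input.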
Annotation 4884595346700

 #nlp #reading-group #transformer #unfinished As we encode the word "it", one attention head is focusing most on "the animal", while another is focusing on "tired" -- in a sense, the model's representation of the word "it" bakes in some of the representation of both "animal" and "tired". If we add all the attention heads to the picture, however, things can be harder to interpret:
status not read

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
Now that we have touched upon attention heads, let’s revisit our example from before to see where the different attention heads are focusing as we encode the word “it” in our example sentence: As we encode the word "it", one attention head is focusing most on "the animal", while another is focusing on "tired" -- in a sense, the model's representation of the word "it" bakes in some of the representation of both "animal" and "tired". If we add all the attention heads to the picture, however, things can be harder to interpret: Representing The Order of The Sequence Using Positional Encoding One thing that’s missing from the model as we have described it so far is a way to account for the order of the words in

Flashcard 4884597443852

Tags
#DataScience #machineLearning
Question

NLP Terminology

1. Tokenization
2. Stemming
3. Tf-idf
4. Semantic analytics
5. Disambiguation
6. Topic models
7. Word boundaries
Answer
1. Tokenization
Splits text into words, phrases, and idioms (tokens)
2. Stemming
Reduces a word to its root form (stem), e.g. "running" to "run"
3. Tf-idf
Weights a term by its frequency in a document (term frequency) and by how rare it is across documents (inverse document frequency)
4. Semantic analytics
Compares words, phrases, and idioms in a set of documents to extract meaning
5. Disambiguation
Determines meaning and sense of words (context vs. intent)
6. Topic models
Discovers topics in a collection of documents
7. Word boundaries
Determines where one word ends and the next begins

status measured difficulty not learned 37% [default] 0

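Of the terms above, tf-idf is the most formula-like. A minimal plain-Python sketch, using raw term frequency (count divided by document length) and idf = log(N / df), where N is the number of documents and df the number of documents containing the term. This is one common textbook variant; libraries such as scikit-learn's `TfidfVectorizer` use smoothed and normalized variants by default, so their numbers differ. Documents are assumed to be non-empty and already tokenized.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # document frequency: in how many documents each term appears at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return weights


docs = [d.split() for d in ["the cat sat", "the dog sat", "the cat ran"]]
w = tf_idf(docs)
# "the" occurs in every document, so its idf, and hence its weight, is 0
```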
Annotation 4884597968140

 #has-images #nlp #reading-group #transformer #unfinished the transformer adds a vector to each input embedding. These vectors follow a specific pattern that the model learns, which helps it determine the position of each word, or the distance between different words in the sequence. The intuition here is that adding these values to the embeddings provides meaningful distances between the embedding vectors once they’re projected into Q/K/V vectors and during dot-product attention.
status not read

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
Sequence Using Positional Encoding One thing that’s missing from the model as we have described it so far is a way to account for the order of the words in the input sequence. To address this, the transformer adds a vector to each input embedding. These vectors follow a specific pattern that the model learns, which helps it determine the position of each word, or the distance between different words in the sequence. The intuition here is that adding these values to the embeddings provides meaningful distances between the embedding vectors once they’re projected into Q/K/V vectors and during dot-product attention. To give the model a sense of the order of the words, we add positional encoding vectors -- the values of which follow a specific pattern. If we assumed the embedding has a dimensionalit

Annotation 4884599541004

 #has-images #nlp #reading-group #transformer #unfinished To give the model a sense of the order of the words, we add positional encoding vectors -- the values of which follow a specific pattern. If we assumed the embedding has a dimensionality of 4, the actual positional encodings would look like this:
status not read

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
uition here is that adding these values to the embeddings provides meaningful distances between the embedding vectors once they’re projected into Q/K/V vectors and during dot-product attention. To give the model a sense of the order of the words, we add positional encoding vectors -- the values of which follow a specific pattern. If we assumed the embedding has a dimensionality of 4, the actual positional encodings would look like this: A real example of positional encoding with a toy embedding size of 4 What might this pattern look like? In the following figure, each row corresponds to a positional encoding of a vect

Annotation 4884602424588

 #nlp #reading-group #transformer #unfinished The formula for positional encoding is described in the paper (section 3.5). You can see the code for generating positional encodings in get_timing_signal_1d(). This is not the only possible method for positional encoding. It, however, gives the advantage of being able to scale to unseen lengths of sequences (e.g. if our trained model is asked to translate a sentence longer than any of those in our training set).
status not read

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
generated by one function (which uses sine), and the right half is generated by another function (which uses cosine). They're then concatenated to form each of the positional encoding vectors. The formula for positional encoding is described in the paper (section 3.5). You can see the code for generating positional encodings in get_timing_signal_1d(). This is not the only possible method for positional encoding. It, however, gives the advantage of being able to scale to unseen lengths of sequences (e.g. if our trained model is asked to translate a sentence longer than any of those in our training set). The Residuals One detail in the architecture of the encoder that we need to mention before moving on, is that each sub-layer (self-attention, ffnn) in each encoder has a residual connec

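The paper's sinusoidal formula, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), can be sketched in NumPy. This sketch interleaves sine and cosine across even and odd dimensions, as the paper's formula does, whereas the illustrated version in the context above (and `get_timing_signal_1d()`) places the sine values in one half of the vector and the cosine values in the other; treat it as an illustration of the formula, not a port of that function. An even `d_model` is assumed.

```python
import numpy as np


def sinusoidal_positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) array; row pos is the vector added to the
    embedding at position pos. Assumes d_model is even."""
    positions = np.arange(max_len)[:, None]  # (max_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]  # even dimension indices 2i
    angles = positions / np.power(10000.0, two_i / d_model)  # (max_len, d_model / 2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions: cosine
    return pe


# Toy embedding size of 4, matching the example in the annotation.
pe = sinusoidal_positional_encoding(max_len=10, d_model=4)
```

Because the encoding is a fixed function of the position rather than a lookup table, it can be evaluated for positions beyond any sequence length seen in training, which is the advantage the annotation mentions.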
Annotation 4884603997452

 #has-images #nlp #reading-group #transformer #unfinished One detail in the architecture of the encoder that we need to mention before moving on, is that each sub-layer (self-attention, ffnn) in each encoder has a residual connection around it, and is followed by a layer-normalization step.
status not read

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
ives the advantage of being able to scale to unseen lengths of sequences (e.g. if our trained model is asked to translate a sentence longer than any of those in our training set). The Residuals One detail in the architecture of the encoder that we need to mention before moving on, is that each sub-layer (self-attention, ffnn) in each encoder has a residual connection around it, and is followed by a layer-normalization step. If we’re to visualize the vectors and the layer-norm operation associated with self attention, it would look like this: This goes for the sub-layers of the decoder as well. If we’re to

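The residual connection followed by layer normalization amounts to LayerNorm(x + Sublayer(x)). A NumPy sketch, in which `toy_ffnn` is a hypothetical stand-in for the self-attention or feed-forward sublayer, and the layer-norm gain `gamma` and bias `beta` (learned in a real model) are fixed to ones and zeros for illustration:

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-6):
    """Normalize each row (one vector per word) to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def residual_block(x, sublayer, gamma, beta):
    """Add the sublayer's output back to its input, then layer-normalize."""
    return layer_norm(x + sublayer(x), gamma, beta)


rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
W = rng.normal(size=(d_model, d_model))


def toy_ffnn(h):
    """Hypothetical stand-in sublayer: one ReLU layer without bias."""
    return np.maximum(0.0, h @ W)


y = residual_block(x, toy_ffnn, gamma=np.ones(d_model), beta=np.zeros(d_model))
```

The residual path lets each sublayer learn a correction to its input rather than a full replacement, and the normalization keeps every word vector on a comparable scale before the next sublayer.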
Annotation 4884607143180

 #nlp #reading-group #transformer #unfinished After finishing the encoding phase, we begin the decoding phase. Each step in the decoding phase outputs an element from the output sequence (the English translation sentence in this case).
status not read

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
into a set of attention vectors K and V. These are to be used by each decoder in its “encoder-decoder attention” layer which helps the decoder focus on appropriate places in the input sequence: After finishing the encoding phase, we begin the decoding phase. Each step in the decoding phase outputs an element from the output sequence (the English translation sentence in this case). The following steps repeat the process until a special symbol is reached indicating the transformer decoder has completed its output. The output of each step is fed to the bottom decode

Annotation 4884608716044

 #nlp #reading-group #transformer #unfinished The output of each step is fed to the bottom decoder in the next time step, and the decoders bubble up their decoding results just like the encoders did. And just like we did with the encoder inputs, we embed and add positional encoding to those decoder inputs to indicate the position of each word.
status not read

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
t sequence (the English translation sentence in this case). The following steps repeat the process until a special symbol is reached indicating the transformer decoder has completed its output. The output of each step is fed to the bottom decoder in the next time step, and the decoders bubble up their decoding results just like the encoders did. And just like we did with the encoder inputs, we embed and add positional encoding to those decoder inputs to indicate the position of each word. The self attention layers in the decoder operate in a slightly different way than the one in the encoder: In the decoder, the self-attention layer is only allowed to attend to earlier p

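The step-by-step loop described in these two annotations can be sketched in plain Python. `decoder_step` is a hypothetical callable standing in for the whole decoder stack plus the final Linear and Softmax layers. Picking the single most likely token at each step (greedy decoding) is an assumption here; beam search is a common alternative.

```python
def greedy_decode(decoder_step, encoder_output, bos_id, eos_id, max_len=50):
    """Generate output token ids one step at a time.

    decoder_step(encoder_output, prefix) -> id of the next token (hypothetical
    interface). Each new token is appended to the prefix and fed back to the
    bottom decoder at the next step; decoding stops at eos_id or after max_len steps.
    """
    prefix = [bos_id]
    for _ in range(max_len):
        next_id = decoder_step(encoder_output, prefix)
        prefix.append(next_id)
        if next_id == eos_id:  # special end-of-sequence symbol
            break
    return prefix[1:]  # drop the start symbol


# Toy stand-in decoder that emits a fixed script of ids, one per step.
script = [17, 42, 5, 2]  # 2 plays the role of the end-of-sequence id


def scripted_step(encoder_output, prefix):
    return script[len(prefix) - 1]
```

The encoder output is passed on every step because each decoder's encoder-decoder attention layer reads its keys and values from it.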
Flashcard 4884610288908

Tags
#DataScience #machineLearning #python
Question
NLP with scikit-learn: ___ is used to convert text data into numerical feature vectors of a fixed size.
Answer
Bag of words

status measured difficulty not learned 37% [default] 0

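A bag-of-words representation counts how often each vocabulary term occurs in a document, so every document maps to a vector of the same length (the vocabulary size). A plain-Python sketch with lowercase whitespace tokenization (an assumption); in scikit-learn the corresponding tool is `CountVectorizer`, whose default tokenization differs from this sketch.

```python
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of every distinct lowercase token; its length fixes the vector size."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def bag_of_words(doc, vocabulary):
    """Count vector for doc; tokens outside the vocabulary are ignored."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocabulary]


corpus = ["The cat sat", "The cat saw the dog"]
vocab = build_vocabulary(corpus)  # ['cat', 'dog', 'sat', 'saw', 'the']
vectors = [bag_of_words(d, vocab) for d in corpus]  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Word order is discarded: only the counts survive, which is what makes the vectors fixed-size.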
Annotation 4884610813196

 #nlp #reading-group #transformer #unfinished The self attention layers in the decoder operate in a slightly different way than the one in the encoder: In the decoder, the self-attention layer is only allowed to attend to earlier positions in the output sequence. This is done by masking future positions (setting them to -inf) before the softmax step in the self-attention calculation. The “Encoder-Decoder Attention” layer works just like multiheaded self-attention, except it creates its Queries matrix from the layer below it, and takes the Keys and Values matrix from the output of the encoder stack.
status not read

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
ir decoding results just like the encoders did. And just like we did with the encoder inputs, we embed and add positional encoding to those decoder inputs to indicate the position of each word. The self attention layers in the decoder operate in a slightly different way than the one in the encoder: In the decoder, the self-attention layer is only allowed to attend to earlier positions in the output sequence. This is done by masking future positions (setting them to -inf) before the softmax step in the self-attention calculation. The “Encoder-Decoder Attention” layer works just like multiheaded self-attention, except it creates its Queries matrix from the layer below it, and takes the Keys and Values matrix from the output of the encoder stack. The Final Linear and Softmax Layer The decoder stack outputs a vector of floats. How do we turn that into a word? That’s the job of the final Linear layer which is followed by a Softmax

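The decoder's masking step can be sketched in NumPy: scores for future positions are set to -inf before the softmax, so each position's attention weights over later positions come out as exactly zero. The function name and the toy all-zero scores are illustrative.

```python
import numpy as np


def masked_attention_weights(scores):
    """scores: (seq_len, seq_len) raw Q.K^T values for one head.
    Entries above the diagonal (future positions) are set to -inf before the softmax."""
    seq_len = scores.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # True where j > i
    masked = np.where(future, -np.inf, scores)
    masked = masked - masked.max(axis=-1, keepdims=True)  # row max is finite (diagonal kept)
    weights = np.exp(masked)  # exp(-inf) == 0, so future positions get zero weight
    return weights / weights.sum(axis=-1, keepdims=True)


# With equal scores, row i spreads its weight evenly over positions 0..i.
weights = masked_attention_weights(np.zeros((3, 3)))
```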
Annotation 4884618153228

 Pasteur's quadrant is a classification of scientific research projects that seek fundamental understanding of scientific problems, while also having immediate use for society. Louis Pasteur's research is thought to exemplify this type of method, which bridges the gap between "basic" and "applied" research.[1] The term was introduced by Donald E. Stokes in his book, Pasteur's Quadrant.[2]
status not read

Pasteur's quadrant - Wikipedia
Pasteur's quadrant is a classification of scientific research projects that seek fundamental understanding of scientific problems, while also having immediate use for society. Louis Pasteur's research is thought to exemplify this type of method, which bridges the gap between "basic" and "applied" research.[1] The term was introduced by Donald E. Stokes in his book, Pasteur's Quadrant.[2] Applied and basic research: As shown in the following table, scientific research can be classified by whether it advances human knowledge by seeking a fundamental understanding

Annotation 4884619726092

#has-images
Applied and basic research

Quest for fundamental understanding? | Considerations of use? No | Considerations of use? Yes
Yes | Pure basic research | Use-inspired basic research
No | (none) | Pure applied research

The result is three distinct classes of research:

1. Pure basic research, exemplified by the work of Niels Bohr, an early-20th-century atomic physicist.
2. Pure applied research, exemplified by the work of Thomas Edison, inventor.
3. Use-inspired basic research, described here as "Pasteur's Quadrant".
status not read

Flashcard 4884626017548

Tags
#machine-learning #management #software-engineering #unfinished
Question
we note that in the terminology of Pasteur’s Quadrant, 11 we do [...] (CS) research.
Answer
“use-inspired basic” and “pure applied”

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

we note that in the terminology of Pasteur’s Quadrant, 11 we do “use-inspired basic” and “pure applied” (CS) research.

Original toplevel document (pdf)

Flashcard 4884627590412

Question

In this sentence, what does "promote" mean?

All these artifacts have to be managed, versioned and promoted through different stages until they’re deployed to production.

Answer
The act of copying file content from a less controlled location into a more controlled location.

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Not only do we have to manage the software code artifacts but also the data sets, the machine learning models, and the parameters and hyperparameters used by such models. All these artifacts have to be managed, versioned and promoted through different stages until they’re deployed to production.

Original toplevel document

Sato,Wider,Windheuser_2019_Continuous-delivery_thoughtworks
icient collaboration and alignment. However, this integration also brings new challenges when compared to traditional software development. These include: A higher number of changing artifacts. Not only do we have to manage the software code artifacts but also the data sets, the machine learning models, and the parameters and hyperparameters used by such models. All these artifacts have to be managed, versioned and promoted through different stages until they’re deployed to production. It’s harder to achieve versioning, quality control, reliability, repeatability and auditability in that process. Size and portability: Training data and machine learning models usually co

Annotation 4884629949708

 #knowledge-base-construction #machine-learning #nlp #unfinished At the core of Alexandria is a probabilistic program that defines a process of generating text from a knowledge base consisting of a large set of typed entities.
status not read

pdf

Annotation 4884632309004

 #knowledge-base-construction #machine-learning #nlp #unfinished By applying probabilistic inference to this program, we can reason in the inverse direction: going from text back to facts.
status not read

pdf

Annotation 4884633881868

 #knowledge-base-construction #machine-learning #nlp #unfinished The use of a probabilistic program also provides an elegant way to handle the uncertainty inherent in natural text.
status not read

pdf

Annotation 4884635454732

 #knowledge-base-construction #machine-learning #nlp #unfinished An important advantage of using a generative model is that Alexandria does not require labelled data, which means it can be applied to new domains with little or no manual effort. The model is also inherently task-neutral – by varying which variables in the model are observed and which are inferred, the same model can be used for: learning a schema (relation discovery), entity discovery, entity linking, fact retrieval and other tasks, such as finding sources that support a particular fact.
status not read

pdf

Annotation 4884637027596

 #knowledge-base-construction #machine-learning #nlp #unfinished In this paper we demonstrate schema learning, fact retrieval, entity discovery and entity linking. We will evaluate the former two tasks, while the latter two are performed as part of these main tasks.
status not read

pdf

Annotation 4884640435468

 #knowledge-base-construction #machine-learning #nlp #unfinished An attractive aspect of our approach is that the entire system is defined by one coherent probabilistic model. This removes the need to create and train many separate components such as tokenizers, named entity recognizers, part-of-speech taggers, fact extractors, linkers and so on; a disadvantage of having such multiple components is that they are likely to encode different underlying assumptions, reducing the accuracy of the combined system. Furthermore, the use of a single probabilistic program allows uncertainty to be propagated consistently throughout the system – from the raw web text right through to the extracted facts (and back).
status not read

pdf

Flashcard 4885281901836

Question
congenital
Answer
Present at birth.

status measured difficulty not learned 37% [default] 0

pdf

Flashcard 4885284785420

Question
RNA polymerase in eukaryotes
Answer
RNA polymerase I makes rRNA, the most common (rampant) type; present only in the nucleolus.
RNA polymerase II makes mRNA (massive), microRNA (miRNA), and small nuclear RNA (snRNA).
RNA polymerase III makes 5S rRNA, tRNA (tiny).
No proofreading function, but can initiate chains. RNA polymerase II opens DNA at promoter site.
I, II, and III are numbered in the same order that their products are used in protein synthesis: rRNA, mRNA, then tRNA.
α-amanitin, found in Amanita phalloides (death cap mushrooms), inhibits RNA polymerase II. Causes severe hepatotoxicity if ingested.
Actinomycin D, also called dactinomycin, inhibits RNA polymerase in both prokaryotes and eukaryotes.

status measured difficulty not learned 37% [default] 0

Annotation 4885287144716

 #machine-learning #management #software-engineering #unfinished Because of the time frame and effort involved, Google’s approach to research is iterative and usually involves writing production, or near-production, code from day one.
status not read

Annotation 4885288717580

 #machine-learning #management #software-engineering #unfinished Typically, a single team iteratively explores fundamental research ideas, develops and maintains the software, and helps operate the resulting Google services, all driven by real-world experience and concrete data.
status not read

Annotation 4885290290444

 #machine-learning #management #software-engineering #unfinished This approach also helps ensure the research efforts produce results that benefit Google’s users, by allowing research ideas and implementations to be honed on empirical data and real-world constraints, and by utilizing even failed efforts to gather valuable data and statistics for further attempts.
status not read

Annotation 4885291863308

 #machine-learning #management #software-engineering #unfinished Google’s mission “To organize the world’s information and make it universally accessible and useful,”
status not read

Annotation 4885293436172

 #machine-learning #management #software-engineering #unfinished Even a small team has at its disposal the power of many internal services, allowing the team to quickly create complex and powerful products and services. Design, testing, production, and maintenance processes are simplified.
status not read

Annotation 4885295009036

 #machine-learning #management #software-engineering #unfinished Google has been able to hire a talented team across the entire engineering operation. This gives us the opportunity to innovate everywhere, and for people to move between projects, whether they be primarily research or primarily engineering.
status not read

Flashcard 4885296581900

Tags
#DataScience #machineLearning
Question
NLP: choice of model for supervised and unsupervised learning.
Answer

Supervised
Models predict the outcome of new observations and datasets, and classify documents based on the features and response of a given dataset.
Examples: Naïve Bayes, SVM, linear regression, k-nearest neighbors (k-NN)

Unsupervised
Models identify patterns in the data and extract its structure.
They are also used to group documents using clustering algorithms.
Example: K-means
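The unsupervised case can be illustrated with a minimal k-means sketch in plain Python. It assumes one-dimensional numeric data, a fixed number of iterations, and initialization from randomly chosen data points; the function name kmeans_1d and the sample points are hypothetical. Libraries such as scikit-learn provide a full implementation.

```python
import random


def kmeans_1d(points, k, iterations=20, seed=0):
    """Minimal k-means on 1-D data; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(p - centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = sum(members) / len(members)
    return centroids, labels


# Usage: two well-separated groups, no labels given to the algorithm.
points = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
centroids, labels = kmeans_1d(points, 2)
print(centroids, labels)
```

The algorithm only sees the points, never a target label; the grouping emerges from the structure of the data, which is what distinguishes it from the supervised models listed above.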

status measured difficulty not learned 37% [default] 0

Annotation 4885298416908

 #machine-learning #management #software-engineering #unfinished We recognize that the wide dissemination of fundamental results often benefits us by garnering valuable feedback, educating future hires, providing collaborations, and seeding additional work.
status not read

Flashcard 4885299989772

Tags
#DataScience #machineLearning
Question

NLP: the most basic technique for text classification.

Advantages:

Uses:

Answer

Naïve Bayes Classifier

Advantages:
• It is efficient as it uses limited CPU and memory.
• It is fast as the model training takes less time.

Uses:
• Naïve Bayes is used for sentiment analysis, email spam detection, categorization of documents, and language detection.
• Multinomial Naïve Bayes is used when multiple occurrences of the words matter.
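The multinomial variant can be sketched from scratch in plain Python. This is a minimal illustration, assuming whitespace tokenization, Laplace (add-one) smoothing and a tiny made-up spam/ham corpus; the function names are hypothetical, and scikit-learn's MultinomialNB is the usual library choice.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Collect per-class word counts; alpha is the additive (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> Counter of word occurrences
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()
        word_counts[label].update(words)  # repeated words add up (multinomial model)
        vocab.update(words)
    return class_counts, word_counts, vocab, alpha


def predict(model, doc):
    """Return the class with the highest log posterior for doc."""
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior P(class)
        for word in doc.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Smoothed log-likelihood log P(word | class).
            score += math.log((word_counts[label][word] + alpha)
                              / (total_words + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = ["win money now", "win a free prize now", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial_nb(docs, labels)
print(predict(model, "free money"), predict(model, "meeting tomorrow"))
```

Training is a single pass of counting and prediction is a sum of logarithms, which is why the model is cheap in CPU, memory and training time.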

status measured difficulty not learned 37% [default] 0

Annotation 4885300514060

 #machine-learning #management #software-engineering #unfinished Even if we cannot fully factorize work, we have sometimes undertaken longer-term efforts. For example, we have started multiyear, large systems efforts (including Google Translate, Chrome, Google Health) that have important research components. These projects were characterized by the need for complex systems and research (such as Web-scale identification of parallel corpora for Translate [12] and various complex security features in Chrome [9] and Health). At the same time, we have recently shown that even in longer-term, publicly launched efforts, we are unafraid to refocus our work (for example, Google Health), if it seems we are not achieving success.
status not read

Annotation 4885302873356

 #machine-learning #management #software-engineering #unfinished this approach benefits from the mainly evolutionary nature of CS research, where great results are usually the composition of many discrete steps.
status not read

Annotation 4885304446220

 #machine-learning #management #software-engineering #unfinished we have structured the Google environment as one where new ideas can be rapidly verified by small teams through large-scale experiments on real data, rather than just debated.
status not read

Annotation 4885306019084

 #machine-learning #statistics #unfinished First, studies often apply cross-validation on a subset of data subsampled from the original dataset. Performing this kind of preprocessing, in a machine learning context, without any kind of argumentation, raises doubts as it drastically increases the variance of the obtained results and avoids the problem of imbalanced data, which does not reflect reality in terms of potential applications
status not read

Annotation 4885307591948

 #machine-learning #statistics #unfinished Finally, there are many studies applying oversampling before partitioning the data into two mutually exclusive sets in order to make the distribution of classes more uniform.
status not read

Flashcard 4885309689100

Tags
#DataScience #machineLearning
Question
Document classifiers can have many parameters, and a __ approach helps to search for the best parameters for model training and for predicting the outcome accurately.
Answer
Grid Search
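Grid search simply tries every combination of candidate parameter values and keeps the best-scoring one. A minimal sketch, assuming a caller-supplied scoring function (in practice usually a cross-validated accuracy); grid_search and toy_score are hypothetical names, and scikit-learn's GridSearchCV combines this exhaustive loop with cross-validation.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Evaluate every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)  # e.g. mean accuracy over cross-validation folds
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(alpha, use_idf):
    """Hypothetical stand-in for a model's validation score."""
    return -(alpha - 0.5) ** 2 + (0.1 if use_idf else 0.0)


best, score = grid_search(toy_score, {"alpha": [0.1, 0.5, 1.0], "use_idf": [True, False]})
print(best, score)
```

The number of evaluations is the product of the list lengths (here 3 × 2 = 6), so the grid grows quickly as parameters are added.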

status measured difficulty not learned 37% [default] 0

Flashcard 4885311524108

Tags
#DataScience #machineLearning
Question
What is the tf-idf value in a document?
Answer

The tf-idf value reflects how important a word is to a document in a collection (corpus).

It increases in proportion to the number of times the word appears in the document.

It is offset by the number of documents in the corpus that contain the word.
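A minimal tf-idf sketch in plain Python, assuming whitespace tokenization, raw term counts for tf, and idf = log(N / df), where N is the number of documents and df the number of documents containing the word. The function name and corpus are hypothetical; libraries such as scikit-learn's TfidfVectorizer use smoothed and normalized variants, so their numbers differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: tf-idf} dict per document (raw tf, idf = log(N / df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents each word appears in.
    df = Counter(word for words in tokenized for word in set(words))
    scores = []
    for words in tokenized:
        tf = Counter(words)  # raw count of each word in this document
        scores.append({w: count * math.log(n_docs / df[w]) for w, count in tf.items()})
    return scores


docs = ["the cat sat", "the dog sat", "the cat ate the fish"]
for row in tf_idf(docs):
    print(row)
```

A word that appears in every document (here "the") gets idf = log(1) = 0 and therefore a tf-idf of 0, however often it occurs; a word found in only one document ("fish") gets the largest idf, log 3.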

status measured difficulty not learned 37% [default] 0

Annotation 4885312048396

 #machine-learning #statistics #unfinished they might be rather optimistic due to the fact that the evaluation happened in a leave-one-out scheme.
status not read

Annotation 4885314407692

 #machine-learning #statistics #unfinished While this subsampling strategy again avoids the problem of imbalanced data, which is reflected in the original dataset, it does show an improvement in AUC and thus indicates that adding the MEMD-based feature to the dataset could be beneficial for the predictive performance. Moreover, due to the many repetitions of the experiment, the sample mean better reflects the real mean.
status not read

Flashcard 4885316504844

Tags
#DataScience #python
Question
Python’s data visualization library
Answer
matplotlib

status measured difficulty not learned 37% [default] 0

Flashcard 4885318339852

Tags
#DataScience
Question
Create a plot in four simple steps.
Answer
Step 01: Import the required libraries
Step 02: Define or import the required dataset
Step 03: Set the plot parameters
Step 04: Display the created plot
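The four steps map onto a short matplotlib script. This sketch uses made-up data and assumes a headless environment, hence the Agg backend and rendering to an in-memory PNG instead of calling plt.show(); it also uses the alpha keyword and plt.axis([0, 6, 0, 6]) from the later cards.

```python
import io

# Step 01: import the required libraries
import matplotlib
matplotlib.use("Agg")  # headless backend: lets the sketch run without a display
import matplotlib.pyplot as plt

# Step 02: define the dataset (illustrative values)
x = [1, 2, 3, 4, 5]
y = [1, 4, 2, 5, 3]

# Step 03: set the plot parameters
plt.plot(x, y, color="tab:blue", alpha=0.5, label="sample")  # alpha: 0 transparent, 1 opaque
plt.axis([0, 6, 0, 6])  # [xmin, xmax, ymin, ymax]
plt.xlabel("x")
plt.ylabel("y")
plt.title("A simple line plot")
plt.legend()

# Step 04: display the plot; in a notebook or GUI session use plt.show(),
# here the figure is rendered to an in-memory PNG instead
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
```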

status measured difficulty not learned 37% [default] 0

Flashcard 4885320174860

Tags
#DataScience #python
Question
matplotlib, subplot syntax
Answer

plt.subplot(m, n, p)

It divides the current figure into an m-by-n grid and creates an axes (a subplot) at the position specified by p; positions are numbered from 1, starting at the top left and moving along each row.
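A sketch of a 2-by-2 grid built with plt.subplot(2, 2, p) for p = 1..4. It assumes the Agg backend so it runs without a display, plots arbitrary data, and also calls plt.subplots_adjust() to set the spacing between subplots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 4))
for p in range(1, 5):            # positions 1..4, filled row by row
    ax = plt.subplot(2, 2, p)    # m=2 rows, n=2 columns, position p
    ax.plot([0, 1], [0, p])      # arbitrary data
    ax.set_title(f"subplot(2, 2, {p})")

# Height and width spacing between subplots, as fractions of the average axes size.
plt.subplots_adjust(hspace=0.5, wspace=0.3)
```

Position 2 lands to the right of position 1, and position 3 starts the second row.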

status measured difficulty not learned 37% [default] 0

Flashcard 4885322009868

Tags
#DataScience #python
Question
matplotlib: which method is used to adjust the spacing between subplots?
Answer
plt.subplots_adjust()

status measured difficulty not learned 37% [default] 0

Flashcard 4885323844876

Tags
#DataScience #python
Question
What is Seaborn?
Answer
Seaborn is a Python visualization library based on matplotlib.
It provides a high-level interface to draw attractive statistical graphics.

status measured difficulty not learned 37% [default] 0

Flashcard 4885325679884

Tags
#DataScience #python
Question
To import matplotlib and display the plot on Jupyter notebook use:
Answer

import matplotlib.pyplot as plt

%matplotlib inline

status measured difficulty not learned 37% [default] 0

Flashcard 4885327514892

Tags
#DataScience #python
Question
Which keyword is used to set the transparency of the plot line? (in matplotlib)
Answer
Alpha

status measured difficulty not learned 37% [default] 0

Flashcard 4885329349900

Tags
#DataScience #python
Question
Which matplotlib statement limits both the x and y axes to the interval [0, 6]?
Answer
plt.axis([0, 6, 0, 6]) statement limits both x and y axes to the interval [0, 6].

status measured difficulty not learned 37% [default] 0

Flashcard 4885331184908

Tags
#DataScience #machineLearning
Question
What is Web Scraping
Answer
Web scraping is a computer software technique of extracting information from websites in an automated fashion.

status measured difficulty not learned 37% [default] 0

Flashcard 4885333019916

Tags
#DataScience #machineLearning
Question
Web Scraping Process
Answer
Step 1: A web request is sent to the targeted website to collect the required data.
Step 2: The information is retrieved from the targeted website, typically in HTML or XML format.
Step 3: The retrieved information is passed to a suitable parser, chosen according to the data format.
Parsing is a technique to read data and extract information from the available document.
Step 4: The parsed data is stored in the desired format.
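The four steps can be sketched with the Python standard library alone. This is a minimal illustration, not a production scraper: fetch() covers steps 1-2 but is not called here, so the example runs offline on a literal HTML string; a small html.parser subclass handles step 3 by collecting link text and href values; and to_csv() stores the result as CSV for step 4. The function and class names and the sample HTML are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url):
    """Steps 1-2: send a web request and return the page's HTML as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class LinkParser(HTMLParser):
    """Step 3: collect (text, href) for every <a> tag in the document."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished (text, href) pairs
        self._current = None  # [text parts, href] while inside an <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = [[], dict(attrs).get("href")]

    def handle_data(self, data):
        if self._current is not None:
            self._current[0].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            parts, href = self._current
            self.links.append(("".join(parts).strip(), href))
            self._current = None


def to_csv(rows):
    """Step 4: store the parsed data in the desired format (CSV text here)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["text", "href"])
    writer.writerows(rows)
    return buffer.getvalue()


# Offline usage: a literal string stands in for the HTML that fetch() would return.
html = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>'
parser = LinkParser()
parser.feed(html)
print(to_csv(parser.links))
```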

status measured difficulty not learned 37% [default] 0

Flashcard 4885334854924

Tags
#DataScience #machineLearning
Question
Web Scraping Considerations (legal), what to look for
Answer
Legal Constraints
Notice
Trademark Material
Patented Information
Copyright

status measured difficulty not learned 37% [default] 0

Flashcard 4885336689932

Tags
#DataScience #machineLearning
Question
Web scraping: tree structure
Answer
html > div > ul > li > div (with a class attribute)

status measured difficulty not learned 37% [default] 0

Flashcard 4885338524940

Tags
#DataScience #machineLearning #python
Question
web scraping. The ___ method looks through a tag’s descendants and retrieves all descendants that match your filters.
Answer
find_all()

status measured difficulty not learned 37% [default] 0

Flashcard 4885340359948

Tags
#DataScience #machineLearning #python
Question
web scraping. To find one result, use
Answer

find()

It returns only the first match (or None if there is no match).

status measured difficulty not learned 37% [default] 0

Flashcard 4885342194956

Tags
#DataScience #machineLearning #python
Question
web scraping. The method get_text() is used to _________.
Answer
extract all the human-readable text inside a tag (or the whole document) as a single string, with the markup removed.
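A short BeautifulSoup sketch of find_all(), find() and get_text() from the cards above. It assumes the bs4 package is installed and uses Python's built-in "html.parser"; the HTML snippet is made up.

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <div id="menu">
    <ul>
      <li class="item">Home</li>
      <li class="item">About <b>us</b></li>
    </ul>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed

items = soup.find_all("li", class_="item")  # all matching descendants, as a list
first = soup.find("li")                     # first match only
missing = soup.find("table")                # no <table> in the document, so None

print([li.get_text() for li in items])      # text with the <b> markup stripped
print(first.get_text())
print(missing)
```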

status measured difficulty not learned 37% [default] 0

Flashcard 4885344029964

Tags
#DataScience #machineLearning #python
Question
web scraping.
Navigating Down:
Navigating Up:
Navigating Sideways:
Navigating Back and Forth:
Answer

web scraping.
Navigating Down:
• .contents and .children
• .descendants
• .string
• .strings and .stripped_strings

Navigating Up:
• .parent and .parents

Navigating Sideways:
• .next_sibling and .previous_sibling

Navigating Back and Forth:
• .next_element and .previous_element
• .next_elements and .previous_elements
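The navigation attributes can be tried on a tiny document. This sketch assumes bs4 is installed; the HTML deliberately has no whitespace between tags, so .contents and the sibling attributes return the <li> tags directly rather than newline strings.

```python
from bs4 import BeautifulSoup

# No whitespace between tags, so siblings are the <li> tags themselves.
html = "<ul><li>one</li><li>two</li><li>three</li></ul>"
soup = BeautifulSoup(html, "html.parser")
ul = soup.ul

# Navigating down
second = ul.contents[1]                     # .contents: direct children as a list
texts = [li.string for li in ul.children]   # .children: iterator over direct children
n_descendants = len(list(ul.descendants))   # 3 <li> tags + 3 strings inside them
words = list(soup.stripped_strings)         # every string, whitespace stripped

# Navigating up
parent_name = second.parent.name            # the enclosing tag's name

# Navigating sideways
before = second.previous_sibling.string
after = second.next_sibling.string

# Navigating back and forth (parse order, not tree order)
following = second.next_element             # the string inside the second <li>

print(texts, n_descendants, words, parent_name, before, after, following)
```

Note the difference between the last two groups: .next_sibling skips to the next <li>, while .next_element steps to whatever was parsed next, which here is the text inside the same tag.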

status measured difficulty not learned 37% [default] 0

Annotation 4886699052300

 Some public-service exam questions have asked about item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. A complete set of financial statements comprises: (a) a balance sheet as at the end of the period; (b) an income statement for the period; (ba) a statement of comprehensive income for the period; (c) a statement of changes in equity for the period; (d) a statement of cash flows for the period; (da) a statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado (Statement of Value Added), if required by law or by a regulatory body, or if presented voluntarily; (e) notes, comprising significant accounting policies and other explanatory information; (ea) comparative information in respect of the preceding period, as specified in items 38 and 38A;
status not read

Flashcard 4886700625164

Question
Some public-service exam questions have asked about item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. A complete set of financial statements comprises: (a) a [...] as at the end of the period; (b) an income statement for the period; (ba) a statement of comprehensive income for the period; (c) a statement of changes in equity for the period; (d) a statement of cash flows for the period; (da) a statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado (Statement of Value Added), if required by law or by a regulatory body, or if presented voluntarily; (e) notes, comprising significant accounting policies and other explanatory information; (ea) comparative information in respect of the preceding period, as specified in items 38 and 38A;
Answer
balance sheet (balanço patrimonial)

status measured difficulty not learned 37% [default] 0

Flashcard 4886702198028

Question
Some public-service exam questions have asked about item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. A complete set of financial statements comprises: (a) a balance sheet as at the end of the period; (b) an income statement [...]; (ba) a statement of comprehensive income for the period; (c) a statement of changes in equity for the period; (d) a statement of cash flows for the period; (da) a statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado (Statement of Value Added), if required by law or by a regulatory body, or if presented voluntarily; (e) notes, comprising significant accounting policies and other explanatory information; (ea) comparative information in respect of the preceding period, as specified in items 38 and 38A;
Answer
for the period (do período)

status measured difficulty not learned 37% [default] 0

Flashcard 4886703770892

Question
Some public-service exam questions have asked about item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. A complete set of financial statements comprises: (a) a balance sheet as at the end of the period; (b) an income statement for the period; (ba) a statement of [...] income for the period; (c) a statement of changes in equity for the period; (d) a statement of cash flows for the period; (da) a statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado (Statement of Value Added), if required by law or by a regulatory body, or if presented voluntarily; (e) notes, comprising significant accounting policies and other explanatory information; (ea) comparative information in respect of the preceding period, as specified in items 38 and 38A;
Answer
comprehensive (abrangente)

status measured difficulty not learned 37% [default] 0

Flashcard 4886705343756

Question
Some public-service exam questions have asked about item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. A complete set of financial statements comprises: (a) a balance sheet as at the end of the period; (b) an income statement for the period; (ba) a statement of comprehensive income for the period; (c) a statement of [...] in equity for the period; (d) a statement of cash flows for the period; (da) a statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado (Statement of Value Added), if required by law or by a regulatory body, or if presented voluntarily; (e) notes, comprising significant accounting policies and other explanatory information; (ea) comparative information in respect of the preceding period, as specified in items 38 and 38A;
Answer
changes (mutações)

status measured difficulty not learned 37% [default] 0

Annotation 4886731296012

 assumes independence between the posterior distribution of the parameters associated with segments of data between successive changepoints
status not read

Annotation 4886767734028

 generalisation of that suggested by Liu and Lawrence (1999)
status not read

Annotation 4886769306892

 We consider two classes of prior for the changepoint process. One, that of Green (1995), involves a prior on the number of changepoints, and then a conditional prior on their position. The other is based on modelling the changepoint process by a point process (Pievatolo and Green, 1998), and is a special case of a product-partition model (Hartigan, 1990).
status not read

Annotation 4886771666188

 we assume that, conditional on the realisation of the changepoint process, the joint posterior distribution of the parameters is independent across the segments of the time series
status not read

Annotation 4886773239052

 assume a conjugate prior for the parameters associated with each segment
status not read

Annotation 4886775598348

 For a data set consisting of observations at discrete times 1, ..., n, the recursions are based on calculating the probability of the data from time t to time n, given a changepoint at time t, in terms of the equivalent probabilities at times t + 1, ..., n.
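A minimal sketch of a backward recursion of this kind, under stated assumptions: Q(t) is the probability of the data y_t, ..., y_n given that a segment starts at time t; seg_lik(t, s) is a user-supplied marginal likelihood of y_t, ..., y_s as a single segment (closed-form under conjugate priors); and the distance between changepoints is assumed geometric with parameter p, one simple point-process prior. The function and argument names are hypothetical, and this is an illustration rather than the paper's exact algorithm.

```python
def backward_recursion(n, seg_lik, p):
    """Return [Q(1), ..., Q(n)], computed from t = n down to t = 1.

    Assumed gap distribution between changepoints: geometric,
    g(k) = p (1 - p)^(k - 1), with cdf G(k) = 1 - (1 - p)^k.
    """
    def g(k):
        return p * (1 - p) ** (k - 1)

    def G(k):
        return 1 - (1 - p) ** k

    Q = [0.0] * (n + 2)  # 1-based storage; Q[t] for t = 1..n
    for t in range(n, 0, -1):
        # Next changepoint right after s (t <= s < n): segment y_t..y_s,
        # then a new segment starting at s + 1, whose probability is Q[s + 1].
        total = sum(seg_lik(t, s) * Q[s + 1] * g(s + 1 - t) for s in range(t, n))
        # No further changepoint: y_t..y_n form the final segment.
        total += seg_lik(t, n) * (1 - G(n - t))
        Q[t] = total
    return Q[1:n + 1]


# Sanity check: if the data carry no information (seg_lik = 1),
# the prior terms sum to one and every Q(t) equals 1.
print(backward_recursion(5, lambda t, s: 1.0, p=0.3))
```

Each Q(t) reuses the already computed Q(t + 1), ..., Q(n), so the whole pass costs O(n²) segment-likelihood evaluations.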
status not read

Annotation 4886777171212

 The assumption of conjugate priors can potentially be relaxed, but with an increase in the computational cost. Essentially, low-dimensional integrals that can be calculated analytically under conjugate priors would need to be calculated numerically (for example see Section 4.2).
status not read

Annotation 4886779530508

 Relaxation of the independence assumption is more difficult, but our algorithm can still be used as a useful tool for analysing such data.
status not read

Flashcard 4886852406540

Tags
#has-images

status measured difficulty not learned 37% [default] 0
