Edited, memorised or added to reading list

on 27-Jan-2020 (Mon)


Flashcard 1602824572172

Tags
#broker #estate #real
Question
In the Middle Ages, under old English law, by what means was real property transferred?
Answer
FIGURE 2.2: The Bundle of Legal Rights







Flashcard 4881260612876

Tags
#python
Question
NumPy shape manipulation functions
Answer
Flatten, Reshape, Resize, Split, Stack
Flatten: array.ravel()
Reshape: array.reshape(3,4) #3 rows, 4 columns
Resize: array.resize(2,6) #resizes in place to 2 rows, 6 columns
Split: np.hsplit(array,2) #splits the array column-wise into 2 equal parts
Stack: np.hstack((array1,array2)) #joins the arrays side by side
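
A minimal sketch of these calls on a made-up 12-element array (the variable names are illustrative, not from the card). It assumes NumPy is installed.

```python
import numpy as np

array = np.arange(12)                 # twelve values: 0 .. 11

reshaped = array.reshape(3, 4)        # 3 rows, 4 columns; `array` keeps shape (12,)
flat = reshaped.ravel()               # flattened back to one dimension
left, right = np.hsplit(reshaped, 2)  # two 3x2 halves, split column-wise
stacked = np.hstack((left, right))    # the halves joined side by side again

resized = array.copy()
resized.resize(2, 6)                  # in place: `resized` is now 2x6, returns None
```

Note that reshape returns a new array object, while resize changes the array it is called on.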







Flashcard 4881262447884

Tags
#python
Question
Linear Algebra functions. transpose, inverse, trace.
Answer

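import numpy as np
array = np.array([[1, 2], [3, 4]]) #square and non-singular, as np.linalg.inv requires

array.transpose() #rows become columns
np.linalg.inv(array) #matrix inverse (array @ inverse = identity), not element-wise 1/value
np.trace(array) #sum of the main diagonal, top left to bottom right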
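
As a worked check, assuming a small 2x2 matrix chosen for illustration, the three functions can be verified by hand:

```python
import numpy as np

matrix = np.array([[1.0, 2.0], [3.0, 4.0]])  # square, determinant -2, so invertible

transposed = matrix.transpose()    # [[1, 3], [2, 4]]
inverse = np.linalg.inv(matrix)    # matrix inverse, not element-wise 1/value
diagonal_sum = np.trace(matrix)    # 1.0 + 4.0

# A matrix times its inverse gives the identity, up to floating-point rounding.
is_identity = np.allclose(matrix @ inverse, np.eye(2))
```

A non-square array such as one of shape (2, 4) has no inverse, and np.linalg.inv raises LinAlgError for it.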








Flashcard 4883816254732

Tags
#python
Question

Create a pandas Series.

From which types of data?

Answer

import pandas as pd

someSeries = pd.Series(data) #data can be a list, a NumPy ndarray, a dict or a scalar
someSeries = pd.Series(5., index=['a','b','c']) #the scalar is repeated for every label
someSeries = pd.Series([1,2,3], index=['a','b','c']) #like a dictionary
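
A short sketch of each input type, with made-up values; it assumes pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd

from_list = pd.Series([10, 20, 30])                     # default index 0, 1, 2
from_array = pd.Series(np.array([1.5, 2.5]))            # from a NumPy ndarray
from_scalar = pd.Series(5.0, index=['a', 'b', 'c'])     # scalar repeated per label
labelled = pd.Series([1, 2, 3], index=['a', 'b', 'c'])  # values looked up by label
from_dict = pd.Series({'a': 1, 'b': 2})                 # dict keys become the index

value_b = labelled['b']   # dictionary-style lookup
```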








Flashcard 4884405292300

Tags
#python
Question

What is a pandas DataFrame?

(number of dimensions, same/different data types)

Answer

DataFrame is a

  • two-dimensional
  • labeled data structure with columns of
  • potentially different types.







Flashcard 4884407127308

Tags
#python
Question

Syntax for Creating DataFrames from

  1. Lists
  2. Dictionary
  3. Series
  4. ndarray
Answer

Syntax for Creating DataFrames
pd.DataFrame

  1. Lists
    pd.DataFrame({'columnName1':['val1','val2'], 'columnName2':[1,2]})
  2. Dictionary
    pd.DataFrame({'columnName1':{key1:value1}, 'columnName2':{key1:value2}})
    #the inner keys become the row index
  3. Series
    series1 = pd.Series([values],index=[indexes])
    series2 = pd.Series(....)
    #they have the same kind of indexes, eg by year.
    newDF = pd.DataFrame({'colName1':series1,'colName2':series2})
  4. ndarray
    Create an ndarray with years. np_arr = np.array([2001,2020,2019])
    Create a dict with the ndarray. data = {'year':np_arr}
    Pass this dict to a new DataFrame. df = pd.DataFrame(data)
    #it will have a sequential index and a column named year holding the values of the array
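
The four routes can be sketched as follows. All column names and values are invented for illustration; pandas and NumPy are assumed to be installed.

```python
import numpy as np
import pandas as pd

# 1. Dict of lists: each key is a column name.
from_lists = pd.DataFrame({'name': ['Ann', 'Bob'], 'score': [1, 2]})

# 2. Dict of dicts: outer keys are columns, inner keys are row labels.
from_dicts = pd.DataFrame({'x': {'r1': 1, 'r2': 2}, 'y': {'r1': 3, 'r2': 4}})

# 3. Series that share an index (here: years) are aligned on it.
sales = pd.Series([100, 120], index=[2019, 2020])
costs = pd.Series([80, 90], index=[2019, 2020])
from_series = pd.DataFrame({'sales': sales, 'costs': costs})

# 4. ndarray wrapped in a dict.
np_arr = np.array([2001, 2020, 2019])
from_array = pd.DataFrame({'year': np_arr})
```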







Flashcard 4884410535180

Tags
#DataScience #python
Question
Handle Missing Values with Functions. (2 ways)
Answer
  1. Dropping the NaN (null) values. Use .dropna()
  2. Filling the NaN values with something else.
    Fill with zeros: .fillna(0)
    #can also fill with the mean of the data, e.g. .fillna(df.mean())
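
A minimal sketch of both approaches on a made-up frame with two missing values (column names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'test1': [1.0, np.nan, 3.0], 'test2': [4.0, 5.0, np.nan]})

dropped = df.dropna()               # keeps only rows without NaN (here: row 0)
zero_filled = df.fillna(0)          # every NaN becomes 0
mean_filled = df.fillna(df.mean())  # every NaN becomes its column's mean
```

Both methods return a new DataFrame and leave df itself unchanged.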







Flashcard 4884412370188

Tags
#DataScience #python
Question
Custom functions can be applied to the dataframe.
Name its use and syntax
Answer

E.g. for creating new features, or standardizing.
Custom functions can be applied element-wise with the applymap method (renamed DataFrame.map in pandas 2.1, where applymap is deprecated).

df.applymap(functionName)
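
A small sketch of an element-wise custom function. The helper to_percent and the data are hypothetical; the hasattr check is one possible way to cope with the rename of applymap to map in pandas 2.1.

```python
import pandas as pd

def to_percent(value):
    """Hypothetical helper: turn a fraction into a percentage."""
    return value * 100

df = pd.DataFrame({'a': [0.5, 0.75], 'b': [0.25, 1.0]})

# Use DataFrame.map where it exists (pandas 2.1+), else fall back to applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap
percent = elementwise(to_percent)   # the function is called once per cell
```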








Flashcard 4884414205196

Tags
#DataScience #python
Question

Dataframe statistical functions

Answer
.max()
.min()
.mean()
.std()
etc.
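
For illustration, on a made-up frame these methods work column by column; describe() is an additional pandas method, not listed on the card, that collects several statistics at once.

```python
import pandas as pd

df = pd.DataFrame({'test1': [1, 2, 3], 'test2': [10, 20, 60]})

highest = df.max()       # per column: test1 3, test2 60
lowest = df.min()        # per column: test1 1, test2 10
average = df.mean()      # per column: test1 2.0, test2 30.0
spread = df.std()        # sample standard deviation (ddof=1 by default)
summary = df.describe()  # count, mean, std, min, quartiles and max in one table
```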







Flashcard 4884416040204

Tags
#DataScience #python
Question
Data Operation Using Groupby - syntax.
Answer

grouped = df.groupby(field)

extract = grouped.get_group(wantedValue)
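
A minimal sketch with invented city and sales data. The aggregation on the last line is an extra step that often follows a groupby; it is not part of the card.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Oslo', 'Rome', 'Oslo', 'Rome'],
    'sales': [10, 20, 30, 40],
})

grouped = df.groupby('city')
oslo_rows = grouped.get_group('Oslo')   # only the rows where city == 'Oslo'
totals = grouped['sales'].sum()         # one aggregated value per group
```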








Flashcard 4884417875212

Tags
#DataScience #python
Question
DataFrame data operation – sorting
Answer
df.sort_values('columnName')
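
A short sketch on made-up data. The pandas method for sorting by column values is sort_values, which returns a sorted copy.

```python
import pandas as pd

df = pd.DataFrame({'name': ['c', 'a', 'b'], 'score': [2, 3, 1]})

by_name = df.sort_values('name')                          # ascending by default
by_score_desc = df.sort_values('score', ascending=False)  # largest score first
```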







Flashcard 4884419710220

Tags
#DataScience #python #statistics
Question
Data standardization applied to data. (Define a function.)
Answer
def standardize(test):
    return (test - test.mean()) / test.std()
#test: standardize(df['Test1'])
def standardizeResult(dataFrame):
    return dataFrame.apply(standardize)
standardizeResult(df)
#get a dataframe with standardized figures, e.g. most within ±3 sd (standard deviations)
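
A worked example with made-up scores shows what the standardized figures look like. Each column ends up with mean 0 and standard deviation 1.

```python
import pandas as pd

def standardize(column):
    # z-score: distance from the column mean, in units of standard deviation
    return (column - column.mean()) / column.std()

df = pd.DataFrame({'Test1': [60.0, 70.0, 80.0], 'Test2': [1.0, 2.0, 3.0]})
standardized = df.apply(standardize)   # the function is applied column by column
```

Here Test1 has mean 70 and standard deviation 10, so its values become -1, 0 and 1.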







Flashcard 4884421545228

Tags
#python
Question

pandas syntax for

  1. indexing by label
  2. indexing by position
Answer

pandas syntax for

  1. indexing by label: df.loc[rowLabel, columnLabel]
  2. indexing by position: df.iloc[rowPosition, columnPosition]
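
A minimal sketch contrasting the two indexers on a made-up frame with string row labels:

```python
import pandas as pd

df = pd.DataFrame({'score': [90, 80, 70]}, index=['ann', 'bob', 'cat'])

by_label = df.loc['bob', 'score']   # row label 'bob', column label 'score'
by_position = df.iloc[1, 0]         # second row, first column
label_slice = df.loc['ann':'bob']   # label slices include the end label
position_slice = df.iloc[0:1]       # position slices exclude the end position
```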







Flashcard 4884423380236

Tags
#python
Question
While viewing a dataframe, head() method will _____.
Answer
If nothing is passed to head(), n defaults to 5, so it returns the first five rows of the DataFrame.







Flashcard 4884425215244

Tags
#DataScience #machineLearning
Question

Machine Learning Terminology:

  1. Columns
  2. Rows
  3. Outcome
Answer

Machine Learning Terminology:

  1. Columns: Features, attributes, inputs
  2. Rows: Observations, samples, records
  3. Outcome: Response, target, label







Flashcard 4884427836684

Tags
#DataScience #machineLearning
Question
Machine Learning Approach/Steps
Answer
  1. Understand the problem/dataset. Also deal with the outliers and null values?
  2. Extract the features from the dataset. Check correlations, meaningful fields.
  3. Identify the problem type. Continuous/Categorical?
  4. Choose the right model. Linear regression, logistic regression, clustering?
  5. Train and test the model. Check accuracy, errors.
  6. Strive for accuracy. Play with factors or relook at features if required.







Flashcard 4884430195980

Tags
#DataScience #machineLearning
Question
What is Supervised Learning
Answer
  1. The dataset used to train a model should have observations, features, and responses.
    The model is trained to predict the “right” response for a given set of data points.
  2. Supervised learning models are used to predict an outcome.
  3. The goal of this model is to “generalize” a dataset so that the “general rule” can be applied to new data as well.







Flashcard 4884432030988

Tags
#DataScience #machineLearning
Question
What is unsupervised learning
Answer
  1. In unsupervised learning, the response or the outcome of the data is unknown.
  2. Unsupervised learning models are used to identify and visualize patterns in data by grouping similar types of data.
  3. The goal of this model is to “represent” data in a way that meaningful information can be extracted.







Flashcard 4884433865996

Tags
#DataScience #machineLearning
Question
Identify the Problem Type and Learning Model for supervised/unsupervised learning.
Answer

Supervised
Continuous: Linear regression
Categorical: Classification, logistic regression

Unsupervised
Continuous: Dimensionality reduction
Categorical: Clustering








Flashcard 4884435701004

Tags
#DataScience #machineLearning #python
Question
Scikit-Learn Considerations
Answer
  • Create separate objects for feature and response.
  • Ensure that features and response have only numeric values.
  • Features and response should be in the form of a NumPy ndarray.
  • Since features and response would be in the form of arrays, they would have shapes and sizes.
  • By convention, features are named X and the response is named y.







Flashcard 4884437536012

Tags
#DataScience #machineLearning #python
Question
The estimator instance in Scikit-learn is a _____.
Answer
The estimator instance or object is a model.







Flashcard 4884439371020

Tags
#DataScience #mathematics-basic
Question
simple linear equation
Answer

y = mx + c


y = β0 + β1x + u
(u is the error term; the residuals of a fitted line estimate it)








Flashcard 4884501761292

Tags
#DataScience #mathematics
Question
Errors in linear regression
Answer

SSR ~
Regression sum of squares, also called the explained sum of squares (ESS).
Measured between the regression line and the mean of the response variable:
the sum of the squared differences between the predicted value and the mean of the dependent variable.
Think of it as a measure that describes how well our line fits the data.

SSE ~
Error sum of squares.
Measured between the observed value and the regression line:
the sum of the squared differences between the observed value and the predicted value.

SST ~
Sum of squares total.
The sum of the squared differences between the observed dependent variable and its mean.
SST = SSR + SSE

RSS ~
Residual sum of squares, another name for SSE. Residual as in: remaining or unexplained.
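
The identity SST = SSR + SSE can be checked numerically. The sketch below uses made-up data and plain Python. It assumes the line is fitted by ordinary least squares with an intercept, which is the setting in which the decomposition holds.

```python
# Toy data, invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Ordinary least-squares estimates of the slope (b1) and intercept (b0).
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
s_xx = sum((x - x_mean) ** 2 for x in xs)
b1 = s_xy / s_xx
b0 = y_mean - b1 * x_mean
predicted = [b0 + b1 * x for x in xs]

sst = sum((y - y_mean) ** 2 for y in ys)                # observed vs mean
ssr = sum((p - y_mean) ** 2 for p in predicted)         # predicted vs mean
sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # observed vs predicted
r_squared = ssr / sst                                   # share of variation explained
```

For this data the fitted line is y = 2.2 + 0.6x, with SST = 6, SSR = 3.6 and SSE = 2.4.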








Flashcard 4884507790604

Tags
#DataScience #machineLearning #python
Question
scikit-learn linear model: syntax
Answer
sklearn.linear_model.LinearRegression(fit_intercept=True, copy_X=True, n_jobs=None)
#older releases also took normalize=False; that parameter was removed in scikit-learn 1.2
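
A minimal usage sketch, assuming scikit-learn and NumPy are installed. The data are made up and lie exactly on y = 2x + 1, so the fitted slope and intercept can be checked by eye.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])   # features: 2-D, one row per observation
y = np.array([1.0, 3.0, 5.0, 7.0])           # response: 1-D

model = LinearRegression(fit_intercept=True)  # the estimator instance
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
prediction = model.predict(np.array([[4.0]]))[0]
```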







Flashcard 4884510412044

Tags
#DataScience #machineLearning
Question
Unsupervised Learning Models: Clustering. It is used to:
Answer
It is used:
• To extract the structure of the data
• To identify groups in the data







Flashcard 4884512771340

Tags
#DataScience #machineLearning
Question
K-means clustering. How are the clusters created?
Answer
K-means starts from randomly chosen centroids. It then alternates between assigning each data point to
its nearest centroid and recomputing each centroid as the mean of the data points assigned to it. It
continues this process iteratively until the centroids stop moving, i.e. the model has converged.
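
The iteration can be sketched in a few lines of NumPy. This is an illustrative implementation under simple assumptions (Euclidean distance, initial centroids drawn from the data, a fixed seed); the function name k_means and the toy points are not from the card, and library implementations such as scikit-learn's KMeans add refinements.

```python
import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Basic k-means loop; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids, labels

points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = k_means(points, k=2)
```

On these two well-separated groups the first three points end up in one cluster and the last three in the other.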







Flashcard 4884514868492

Tags
#DataScience #machineLearning
Question
Unsupervised Learning Models: Dimensionality Reduction. What is?
Answer

It reduces a high-dimensional dataset into a dataset with fewer dimensions.

This makes it easier and faster for the algorithm to analyze the data.








Flashcard 4884516965644

Tags
#DataScience #machineLearning
Question
techniques used for dimensionality reduction:
Answer

Drop data columns with missing values

Drop data columns with low variance

Drop data columns with high correlations

Apply statistical functions - PCA








Flashcard 4884518800652

Tags
#DataScience #machineLearning
Question
Unsupervised Learning Models: Principal Component Analysis (PCA)
Answer
It is a linear dimensionality reduction method which uses singular value decomposition of the data and
keeps only the most significant singular vectors to project the data to a lower-dimensional space.
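
A rough sketch of that idea using NumPy's SVD directly. The function name pca_project and the data are invented for illustration; scikit-learn's PCA class offers the same projection with more options.

```python
import numpy as np

def pca_project(data, n_components):
    """Project the rows of `data` onto the top principal components via SVD."""
    centered = data - data.mean(axis=0)   # PCA works on mean-centred data
    # Rows of vt are the right singular vectors, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]        # keep the most significant directions
    explained = singular_values[:n_components] ** 2 / np.sum(singular_values ** 2)
    return centered @ components.T, explained

# Made-up 2-D points lying almost on the line y = x.
data = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]])
projected, explained = pca_project(data, n_components=1)
```

Because the points lie close to a single line, one component retains more than 99% of the variance.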







Article 4884520635660

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner and Jennifer B. Nuzzo. Published online: 25 May 2011. Sections: Diseases Jump the Species Barrier; More Interconnected and Urbanized; Hospitals Can Amplify Disease; Hospital Infection Control Measures; International Scientific Collaboration; Disease Doesn't Stop at the Border; Preparing to Respond Can Save Lives; Superspreading and Respiratory Transmission; What Remains To Be Done.



In recent years, most emerging infectious disease events have been the result of mutations in wildlife pathogens that have allowed infection of human hosts.[4] In the past, such events contributed to some of history's great pandemics, including influenza, plague, smallpox, and HIV.




SARS was caused by a coronavirus that was endemic among fruit bats in China; it adapted to a human host after establishing itself in the captive animals in the wild animal markets of Guangdong Province.




Modern urban environments have conditions, such as high population density, poor sanitation, and many poor, malnourished people, that may accelerate the spread of emerging infections. For instance, the large outbreak of SARS at the Amoy Gardens apartment complex in Hong Kong (329 patients) is at least partially related to its enormous size and density: 19,000 residents in 0.04 km².[6]




It is reasonable to estimate that the case fatality rate for SARS was cut in half by sophisticated modern health care. (If all patients who required intensive care would have died without it, then the case fatality rate would have been approximately 20% rather than the actual 10%.)




it is likely that SARS would not have become a major epidemic had it not been for the many superspreading events that occurred in hospitals.[11]




Most SARS infections probably occurred in hospitals, and nearly all cases of SARS can be traced back to one or more nosocomial superspreading events starting with relatively small hospital outbreaks in rural Guangdong, then large nosocomial outbreaks in Guangzhou, Hong Kong, Hanoi, Beijing, Singapore, and Toronto.[2]




SARS was brought under control within a matter of months largely due to the fact that the disease was most transmissible when the patients were most sick, that is, when they were in a hospital.[2] There was relatively little community transmission of SARS compared to other respiratory infections like influenza.[18] For this reason, controlling the transmission in hospitals was key in controlling the outbreak.[19]




Flashcard 4884535053580

Tags
#DataScience #machineLearning
Question
__ is mainly used to combine multiple models or estimators
Answer
Pipeline
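
A minimal sketch of a two-step scikit-learn Pipeline, assuming scikit-learn is installed. The step names 'scale' and 'regress' and the data are made up; the data lie exactly on y = 3x - 2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 4.0, 7.0, 10.0])

pipeline = Pipeline([
    ('scale', StandardScaler()),       # step 1: a transformer
    ('regress', LinearRegression()),   # step 2: the final estimator
])
pipeline.fit(X, y)                     # fits the scaler, then the regression
prediction = pipeline.predict(np.array([[5.0]]))[0]
```

The fitted pipeline behaves like a single estimator: predict applies the scaling and then the regression in order.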







Transmission [of SARS] in hospitals was brought under control by the use of standard infection control practices, such as isolation of sick patients and wearing of masks, gowns, and gloves by hospital staff.[20,21]




Flashcard 4884539247884

Tags
#DataScience #machineLearning #python
Question
Model Persistence
Answer

Save a model for future use, so there is no need to retrain it every time you need it.

It is possible to save a model with Python's pickle module.
joblib is a separate library that can be used in place of pickle; it is more efficient for objects that carry large NumPy arrays, as fitted scikit-learn models often do.
You can use the joblib.dump and joblib.load functions.
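
A small round-trip sketch, assuming joblib and scikit-learn are installed. The file name model.joblib and the training data are made up, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0])            # made-up data on y = 2x + 1
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'model.joblib')   # hypothetical file name
    joblib.dump(model, path)                      # save the fitted model
    restored = joblib.load(path)                  # load it back, no retraining

same_predictions = np.allclose(model.predict(X), restored.predict(X))
```

Like pickle, joblib can execute arbitrary code while loading, so only load files from sources you trust.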








Quarantine is the sequestration from the general public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers.




Isolation is the sequestration of individuals known to have the infection.




Flashcard 4884543704332

Question
[...] is the sequestration from the general public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers.
Answer
Quarantine








Flashcard 4884545277196

Question
Quarantine is [...].
Answer
the sequestration from the general public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers


Parent (intermediate) annotation

Quarantine is the sequestration from the general public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers.

Original toplevel document

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
to contain diseases at national borders provide limited value at great cost. Public health authorities in many locations imposed various forms of quarantine in attempts to quash the outbreaks. <span>Quarantine is the sequestration from the general public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers. It is different from isolation, which is the sequestration of individuals known to have the infection. Isolation was clearly effective in SARS and, in fact, was the key to its control.







Flashcard 4884547374348

Tags
#DataScience #machineLearning #python
Question
Model Evaluation: Metric Functions. Syntax for classification, clustering and regression.
Answer

Classification
metrics.accuracy_score
metrics.average_precision_score

Clustering
metrics.adjusted_rand_score

Regression
metrics.mean_absolute_error
metrics.mean_squared_error
metrics.median_absolute_error
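
To make the classification and regression metrics on this card concrete, here is a minimal plain-Python sketch of what four of them compute. The function names mirror the scikit-learn ones for readability, but these are hand-written stand-ins rather than the library implementations, and the sample labels and values are invented for illustration.

```python
from statistics import median

def accuracy_score(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of (true - predicted)^2; penalises large errors more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def median_absolute_error(y_true, y_pred):
    """Median of |true - predicted|; less sensitive to outliers."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))

# Invented classification labels: 3 of 4 predictions are correct.
labels_true = [0, 1, 1, 0]
labels_pred = [0, 1, 0, 0]
acc = accuracy_score(labels_true, labels_pred)

# Invented regression targets; absolute errors are 1, 0, 2, 1.
values_true = [3.0, 5.0, 2.0, 7.0]
values_pred = [2.0, 5.0, 4.0, 8.0]
mae = mean_absolute_error(values_true, values_pred)
mse = mean_squared_error(values_true, values_pred)
medae = median_absolute_error(values_true, values_pred)
```

With scikit-learn installed, the same quantities would normally come from the functions of the same names in `sklearn.metrics`.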








Flashcard 4884548685068

Question
[...] is the sequestration of individuals known to have the infection.
Answer
Isolation


Parent (intermediate) annotation

Isolation is the sequestration of individuals known to have the infection.

Original toplevel document

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
l public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers. It is different from <span>isolation, which is the sequestration of individuals known to have the infection. Isolation was clearly effective in SARS and, in fact, was the key to its control. Quarantine, although widely employed, was not so clearly effective. In many cases, a large number of people subject to quarantine orders refused to comply. In fact, in some cases the i







Flashcard 4884549733644

Question
Isolation is [...].
Answer
the sequestration of individuals known to have the infection


Parent (intermediate) annotation

Isolation is the sequestration of individuals known to have the infection.

Original toplevel document

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
l public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers. It is different from <span>isolation, which is the sequestration of individuals known to have the infection. Isolation was clearly effective in SARS and, in fact, was the key to its control. Quarantine, although widely employed, was not so clearly effective. In many cases, a large number of people subject to quarantine orders refused to comply. In fact, in some cases the i







Various types of travel screening were employed by a number of countries. Despite screening of millions of travelers, only a very few individuals with SARS were discovered. This was especially true of thermal screening. More than 35 million international travelers entering China, Canada, and Singapore had their temperatures measured, but no cases of SARS were found.18

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
d to comply. In fact, in some cases the imposition of quarantine orders produced a paradoxical result—people escaped. This may have contributed to the spread of SARS to remote parts of China.24 <span>Various types of travel screening were employed by a number of countries. Despite screening of millions of travelers, only a very few individuals with SARS were discovered. This was especially true of thermal screening. More than 35 million international travelers entering China, Canada, and Singapore had their temperatures measured, but no cases of SARS were found.18 Although the public health benefits of such measures are not clear, the resources required to implement them have been shown to be significant. Canada spent nearly $8 million (Canadian




In Canada, where 45% of SARS cases occurred among healthcare workers, a government-sponsored review found that, had the respiratory precautions and isolation policies that were eventually employed in the hospitals been in place at the beginning of the outbreak, many fewer people would have been infected.27

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
on the fly. After losing patients and staff to nosocomial transmission of SARS, affected hospitals were forced to figure out which infection control measures would halt the spread of infection. <span>In Canada, where 45% of SARS cases occurred among healthcare workers, a government-sponsored review found that, had the respiratory precautions and isolation policies that were eventually employed in the hospitals been in place at the beginning of the outbreak, many fewer people would have been infected.27 In both hospitals and health departments, advance planning, creation of information and communication systems, education and training, and stockpiling of supplies are necessary to enabl




While most people with SARS did not infect anyone else, the majority of SARS infections can be traced to a relatively small number of superspreading events in which 1 individual infected many other people.2

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
RS was estimated to be 2 to 4, this number only represents the average number of secondary cases caused by an infected person.28 The reality is that a huge range of transmission rates occurred. <span>While most people with SARS did not infect anyone else, the majority of SARS infections can be traced to a relatively small number of superspreading events in which 1 individual infected many other people.2 Superspreading is not a new phenomenon, having been described with tuberculosis, measles, and smallpox.29 It may well occur more frequently than is recognized in other contagious diseas




3 [SARS] superspreading events stand out as particularly puzzling: those that occurred at the Metropole Hotel and Amoy Gardens in Hong Kong, and the event in the emergency department (ED) of Scarborough Grace Hospital in Toronto.

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
disease, diabetes, advanced age, and perhaps corticosteroid therapy. In addition to these host factors, high-risk aerosol-generating procedures were clearly associated with superspreading. But <span>3 superspreading events stand out as particularly puzzling: those that occurred at the Metropole Hotel and Amoy Gardens in Hong Kong, and the event in the emergency department (ED) of Scarborough Grace Hospital in Toronto. The event at the Metropole Hotel was the proximate cause of the SARS pandemic, as nearly all cases outside of China can be traced back to it.31 One infected individual stayed 1 night on




The [SARS superspreading] event at the Metropole Hotel was the proximate cause of the SARS pandemic, as nearly all cases outside of China can be traced back to it.31 One infected individual stayed 1 night on the ninth floor of the hotel and infected 16 other people on the same floor who then traveled across the globe before becoming ill. Although several possible explanations have been proposed, there is still no entirely satisfactory explanation for this event.

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
out as particularly puzzling: those that occurred at the Metropole Hotel and Amoy Gardens in Hong Kong, and the event in the emergency department (ED) of Scarborough Grace Hospital in Toronto. <span>The event at the Metropole Hotel was the proximate cause of the SARS pandemic, as nearly all cases outside of China can be traced back to it.31 One infected individual stayed 1 night on the ninth floor of the hotel and infected 16 other people on the same floor who then traveled across the globe before becoming ill. Although several possible explanations have been proposed, there is still no entirely satisfactory explanation for this event. The ED of Scarborough Grace Hospital was the site of a chain of SARS transmission that led to most of the cases in Toronto. In particular, one event there stands out as unusual. The wif




Flashcard 4884561792268

Tags
#DataScience #machineLearning
Question
With ___ SSR or SSE, the prediction will be less accurate and the model will not be the best fit for the attributes.
Answer
higher







The ED of Scarborough Grace Hospital was the site of a chain of SARS transmission that led to most of the cases in Toronto. In particular, one event there stands out as unusual. The wife of one of the SARS patients sat in the ED waiting room while her husband was being treated. Unbeknownst to the ED staff, she also had SARS, but her symptoms were mild. Despite having mild symptoms, she apparently infected a number of other people in the waiting room and possibly a number of the staff as well. This event contrasts with most other incidents of SARS transmission, which required close contact and occurred only when the patient had severe symptoms. Like the event in the Metropole Hotel, it would be useful to understand what factors contributed to this anomalous event.

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
same floor who then traveled across the globe before becoming ill. Although several possible explanations have been proposed, there is still no entirely satisfactory explanation for this event. <span>The ED of Scarborough Grace Hospital was the site of a chain of SARS transmission that led to most of the cases in Toronto. In particular, one event there stands out as unusual. The wife of one of the SARS patients sat in the ED waiting room while her husband was being treated. Unbeknownst to the ED staff, she also had SARS, but her symptoms were mild. Despite having mild symptoms, she apparently infected a number of other people in the waiting room and possibly a number of the staff as well. This event contrasts with most other incidents of SARS transmission, which required close contact and occurred only when the patient had severe symptoms. Like the event in the Metropole Hotel, it would be useful to understand what factors contributed to this anomalous event. The Amoy Gardens event is even more disconcerting. Amoy Gardens is a high-rise apartment complex with 19,000 residents. One infected individual staying there infected 329 others. The be




The Amoy Gardens event is even more disconcerting. Amoy Gardens is a high-rise apartment complex with 19,000 residents. One infected individual staying there infected 329 others. The best explanation is that a malfunctioning plumbing system allowed the creation of a virus-laden aerosol plume that was blown outdoors, wafted hundreds of yards downwind, and infected people in other buildings through open windows.32 If this hypothesis is true, it undermines many assumptions about the transmission of infectious diseases.

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
d close contact and occurred only when the patient had severe symptoms. Like the event in the Metropole Hotel, it would be useful to understand what factors contributed to this anomalous event. <span>The Amoy Gardens event is even more disconcerting. Amoy Gardens is a high-rise apartment complex with 19,000 residents. One infected individual staying there infected 329 others. The best explanation is that a malfunctioning plumbing system allowed the creation of a virus-laden aerosol plume that was blown outdoors, wafted hundreds of yards downwind, and infected people in other buildings through open windows.32 If this hypothesis is true, it undermines many assumptions about the transmission of infectious diseases. This leads to the consideration of how SARS was transmitted—that is, whether it involved droplets or aerosols. Much has been written on this topic, and there are strong opinions on both




Flashcard 4884567821580

Tags
#DataScience #machineLearning
Question
What are the requirements of the K-means algorithm?
Answer
  • Number of clusters should be specified
  • More than one iteration should meet requisite criteria
  • Centroids should minimize inertia
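
The three requirements above can be sketched as a small loop. This is a minimal illustration, assuming one-dimensional points, hand-picked starting centroids, and a stopping rule of "centroids no longer move or `n_iter` is reached"; the helper name `kmeans_1d` and the data are hypothetical. Inertia is the sum of squared distances from each point to its nearest centroid.

```python
def kmeans_1d(points, centroids, n_iter=10):
    """Lloyd-style k-means on 1-D data; the number of clusters is len(centroids)."""
    centroids = list(centroids)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    # Inertia: sum of squared distances to the closest centroid.
    inertia = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, inertia

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, inertia = kmeans_1d(points, centroids=[0.0, 5.0])
```

On this toy data the loop settles on centroids 2.0 and 11.0 after a few iterations, which is why more than one iteration is generally needed before the stopping criterion is met.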







In most cases SARS transmission was blocked by simple droplet precautions, but in others it seems clear that aerosol transmission was the only logical explanation; this was especially true in certain hospital outbreaks (eg, Ward 8a of the Prince of Wales Hospital in Hong Kong33). Probably both droplets and aerosols were produced as patients coughed, and various host and environmental factors determined which mechanism was predominant at a particular time and place.

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
h sides. The distinction is important, because the answer determines the appropriate infection control measures to be used. In fact, there is convincing evidence for both forms of transmission. <span>In most cases SARS transmission was blocked by simple droplet precautions, but in others it seems clear that aerosol transmission was the only logical explanation; this was especially true in certain hospital outbreaks (eg, Ward 8a of the Prince of Wales Hospital in Hong Kong33). Probably both droplets and aerosols were produced as patients coughed, and various host and environmental factors determined which mechanism was predominant at a particular time and place. Better understanding of this phenomenon is important, because if this is also true for other diseases, then the role of aerosol transmission in respiratory infections more generally mus




Flashcard 4884569918732

Tags
#machine-learning #management #software-engineering #unfinished
Question
The goal of [...] at Google is to bring significant, practical benefits to our users, and to do so rapidly, within a few years at most.
Answer
research


Parent (intermediate) annotation

The goal of research at Google is to bring significant, practical benefits to our users, and to do so rapidly, within a few years at most.

Original toplevel document (pdf)








Flashcard 4884570967308

Tags
#machine-learning #management #software-engineering #unfinished
Question
The goal of research at Google is to bring significant, [...] benefits to our users, and to do so rapidly, within a few years at most.
Answer
practical


Parent (intermediate) annotation

The goal of research at Google is to bring significant, practical benefits to our users, and to do so rapidly, within a few years at most.

Original toplevel document (pdf)








Flashcard 4884572015884

Tags
#machine-learning #management #software-engineering #unfinished
Question
The goal of research at Google is to bring significant, practical benefits to our users, and to do so [...]
Answer
rapidly, within a few years at most.


Parent (intermediate) annotation

The goal of research at Google is to bring significant, practical benefits to our users, and to do so rapidly, within a few years at most.

Original toplevel document (pdf)








The standard (unit) softmax function \({\displaystyle \sigma :\mathbb {R} ^{K}\to \mathbb {R} ^{K}}\) is defined by the formula

\({\displaystyle \sigma (\mathbf {z} )_{i}={\frac {e^{z_{i}}}{\sum _{j=1}^{K}e^{z_{j}}}}{\text{ for }}i=1,\dotsc ,K{\text{ and }}\mathbf {z} =(z_{1},\dotsc ,z_{K})\in \mathbb {R} ^{K}}\)
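
The formula translates almost directly into code. The sketch below is one possible plain-Python version; subtracting `max(z)` before exponentiating is an added numerical-stability step that is not part of the definition above and does not change the result.

```python
import math

def softmax(z):
    """Map a list of K real numbers to K probabilities that sum to 1."""
    m = max(z)  # stability shift (an addition, not in the formula itself)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Larger inputs receive larger probabilities.
probs = softmax([1.0, 2.0, 3.0])
```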

Softmax function - Wikipedia
ts will correspond to larger probabilities. Softmax is often used in neural networks, to map the non-normalized output of a network to a probability distribution over predicted output classes. <span>The standard (unit) softmax function \(\sigma :\mathbb {R} ^{K}\to \mathbb {R} ^{K}\) is defined by the formula \(\sigma (\mathbf {z} )_{i}={\frac {e^{z_{i}}}{\sum _{j=1}^{K}e^{z_{j}}}}{\text{ for }}i=1,\dotsc ,K{\text{ and }}\mathbf {z} =(z_{1},\dotsc ,z_{K})\in \mathbb {R} ^{K}\) In words: we apply the standard exponential function to each element \(z_{i}\) of the input vector \(\mathbf {z}\) and normalize these values by dividing




#bert #knowledge-base-construction #nlp #unfinished
For a sentence s with two target entities e1 and e2, to make the BERT module capture the location information of the two entities, at both the beginning and end of the first entity, we insert a special token ‘$’, and at both the beginning and end of the second entity, we insert a special token ‘#’. We also add ‘[CLS]’ to the beginning of each sentence.
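
The marker insertion described here can be sketched in a few lines. This is an illustration only: the helper name `mark_entities`, the whitespace tokenization, the inclusive `(start, end)` span convention, and the example sentence are all assumptions, not taken from the paper.

```python
def mark_entities(tokens, e1_span, e2_span):
    """Wrap entity 1 in '$' and entity 2 in '#', then prepend '[CLS]'.

    Spans are (start, end) token indices, both inclusive, and must not overlap.
    """
    start1, end1 = e1_span
    start2, end2 = e2_span
    out = ["[CLS]"]
    for i, tok in enumerate(tokens):
        if i == start1:
            out.append("$")  # opening marker of the first entity
        if i == start2:
            out.append("#")  # opening marker of the second entity
        out.append(tok)
        if i == end1:
            out.append("$")  # closing marker of the first entity
        if i == end2:
            out.append("#")  # closing marker of the second entity
    return out

# Hypothetical sentence with entities "kitchen" (index 1) and "house" (index 6).
tokens = "The kitchen is part of the house".split()
marked = mark_entities(tokens, e1_span=(1, 1), e2_span=(6, 6))
```

A real pipeline would apply BERT's own subword tokenizer after inserting the markers; plain `split()` is used here only to keep the sketch self-contained.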

pdf





In the first few years following the SARS pandemic, “respiratory etiquette” became the “new normal” in hospitals—anyone with a cough had a surgical mask placed on them at the ED door, and aerosol-generating procedures were done only in closed rooms with staff wearing PPE

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner
health. Important lessons were also learned in the area of emergency management at the municipal, provincial/state, and national levels34 and in the realm of international treaties.35 Hospitals <span>In the first few years following the SARS pandemic, “respiratory etiquette” became the “new normal” in hospitals—anyone with a cough had a surgical mask placed on them at the ED door, and aerosol-generating procedures were done only in closed rooms with staff wearing PPE—and it was said that things would never be the same again. But now when we walk the halls of hospitals, this “new normal” for infection control is hard to detect. If SARS were transmitt




#bert #knowledge-base-construction #nlp #unfinished
In BERT, the input representation of each token is the sum of its token, segment and position embeddings.
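
As a small illustration of "sum of its token, segment and position embeddings", the sketch below adds three vectors element-wise for a single token. The 4-dimensional vectors are made-up numbers and the function name is hypothetical; real BERT embeddings are learned and much wider.

```python
def bert_input_representation(token_emb, segment_emb, position_emb):
    """Element-wise sum of the three embedding vectors for one token."""
    return [t + s + p for t, s, p in zip(token_emb, segment_emb, position_emb)]

# Hypothetical 4-dimensional embeddings for a single token.
token_emb = [0.5, -1.0, 2.0, 0.0]        # looked up by vocabulary id
segment_emb = [0.25, 0.25, 0.25, 0.25]   # same for every token of a segment
position_emb = [0.0, 1.0, 0.0, -1.0]     # looked up by position in the sequence
representation = bert_input_representation(token_emb, segment_emb, position_emb)
```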

pdf





#bert #knowledge-base-construction #nlp #unfinished
‘[CLS]’ is appended to the beginning of each sequence as the first token of the sequence. The final hidden state from the Transformer output corresponding to the first token is used as the sentence representation for classification tasks. In case there are two sentences in a task, ‘[SEP]’ is used to separate the two sentences.

pdf





#bert #knowledge-base-construction #nlp #unfinished
BERT pre-trains the model parameters by using a pre-training objective: the masked language model (MLM), which randomly masks some of the tokens from the input and sets the optimization objective to predict the original vocabulary id of the masked word according to its context.
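
The masking step can be sketched as follows. This is a simplified illustration: the helper name `mask_tokens`, the seed, and the masking probability used in the example are assumptions, and every selected token is replaced with `[MASK]`, whereas the full BERT recipe sometimes keeps the token or swaps in a random one.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace tokens with '[MASK]' and record what was hidden.

    Returns the masked sequence and a dict {position: original token};
    the dict holds the prediction targets for the MLM objective.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets[i] = tok  # the model is trained to recover this token
        else:
            masked.append(tok)
    return masked, targets

tokens = "the animal did not cross the street".split()
# A high probability is used only so that this short example masks something.
masked, targets = mask_tokens(tokens, mask_prob=0.5, seed=0)
```

Because the model sees the unmasked tokens on both sides of each `[MASK]`, the loss at those positions depends on left and right context together.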

pdf





#bert #knowledge-base-construction #nlp #unfinished
Unlike left-to-right language model pre-training, the MLM objective can help a state output to utilize both the left and the right context, which allows a pre-training system to apply a deep bidirectional Transformer.

pdf





#bert #knowledge-base-construction #nlp #unfinished
Besides the masked language model, BERT also trains a “next sentence prediction” task that jointly pre-trains text-pair representations.

pdf





Flashcard 4884591414540

Tags
#DataScience #machineLearning
Question
Natural Language Processing (NLP)
Answer
Natural language processing is an automated way to understand and analyze natural human languages
and extract information from such data by applying machine algorithms.







#nlp #reading-group #transformer #unfinished
  1. with multi-headed attention we have not only one, but multiple sets of Query/Key/Value weight matrices (the Transformer uses eight attention heads, so we end up with eight sets for each encoder/decoder). Each of these sets is randomly initialized. Then, after training, each set is used to project the input embeddings (or vectors from lower encoders/decoders) into a different representation subspace.


Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
e animal didn’t cross the street because it was too tired”, we would want to know which word “it” refers to. It gives the attention layer multiple “representation subspaces”. As we’ll see next, <span>with multi-headed attention we have not only one, but multiple sets of Query/Key/Value weight matrices (the Transformer uses eight attention heads, so we end up with eight sets for each encoder/decoder). Each of these sets is randomly initialized. Then, after training, each set is used to project the input embeddings (or vectors from lower encoders/decoders) into a different representation subspace. With multi-headed attention, we maintain separate Q/K/V weight matrices for each head resulting in different Q/K/V matrices. As we did before, we multiply X by the WQ/WK/WV matrices to




Flashcard 4884594298124

Tags
#has-images #nlp #reading-group #transformer #unfinished
Question
[default - edit me]
Answer

The feed-forward layer is not expecting eight matrices – it’s expecting a single matrix (a vector for each word). So we need a way to condense these eight down into a single matrix.

How do we do that? We concatenate the matrices, then multiply them by an additional weights matrix WO.
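
A minimal sketch of this condensing step, using plain Python lists as matrices. Only two heads with two features each are used to keep the numbers readable (the Transformer described here uses eight), and the values of the Z matrices and of WO are invented; in a real model WO is learned.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def combine_heads(head_outputs, w_o):
    """Concatenate per-head Z matrices along the feature axis, then apply W_O."""
    n_rows = len(head_outputs[0])  # one row per word
    concatenated = [
        [value for head in head_outputs for value in head[r]]
        for r in range(n_rows)
    ]
    return matmul(concatenated, w_o)

# Two hypothetical heads, two words, head size 2.
z0 = [[1.0, 0.0], [0.0, 1.0]]
z1 = [[2.0, 2.0], [3.0, 3.0]]
# W_O maps the concatenated width (4) back to the model width (here 2).
w_o = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
z = combine_heads([z0, z1], w_o)
```

The result has one row per word again, which is the single matrix the feed-forward layer expects.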


Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
same self-attention calculation we outlined above, just eight different times with different weight matrices, we end up with eight different Z matrices. This leaves us with a bit of a challenge. <span>The feed-forward layer is not expecting eight matrices – it’s expecting a single matrix (a vector for each word). So we need a way to condense these eight down into a single matrix. How do we do that? We concatenate the matrices, then multiply them by an additional weights matrix WO. That’s pretty much all there is to multi-headed self-attention. It’s quite a handful of matrices, I realize. Let me try to put them all in one visual so we can look at them in one place







#nlp #reading-group #transformer #unfinished

As we encode the word "it", one attention head is focusing most on "the animal", while another is focusing on "tired" -- in a sense, the model's representation of the word "it" bakes in some of the representation of both "animal" and "tired".

If we add all the attention heads to the picture, however, things can be harder to interpret:


Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
Now that we have touched upon attention heads, let’s revisit our example from before to see where the different attention heads are focusing as we encode the word “it” in our example sentence: <span>As we encode the word "it", one attention head is focusing most on "the animal", while another is focusing on "tired" -- in a sense, the model's representation of the word "it" bakes in some of the representation of both "animal" and "tired". If we add all the attention heads to the picture, however, things can be harder to interpret: Representing The Order of The Sequence Using Positional Encoding One thing that’s missing from the model as we have described it so far is a way to account for the order of the words in




Flashcard 4884597443852

Tags
#DataScience #machineLearning
Question

NLP Terminology

  1. Tokenization
  2. Stemming
  3. Tf-idf
  4. Semantic analytics
  5. Disambiguation
  6. Topic models
  7. Word boundaries
Answer
  1. Tokenization
    Splits text into words, phrases, and idioms
  2. Stemming
    Maps to the valid root word
  3. Tf-idf
    Represents term frequency and inverse document frequency
  4. Semantic analytics
    Compares words, phrases, and idioms in a set of documents to extract meaning
  5. Disambiguation
    Determines meaning and sense of words (context vs. intent)
  6. Topic models
    Discover topics in a collection of documents
  7. Word boundaries
    Determines where one word ends and the other begins
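
Two of the terms above, tokenization and tf-idf, can be illustrated together. The sketch below assumes a simple lowercase alphanumeric tokenizer and one common unsmoothed tf-idf variant (term frequency divided by document length, times the natural log of N over document frequency); libraries often use smoothed variants, so their numbers will differ. The documents are invented.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf(documents):
    """Return one {term: score} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = {}
    for tokens in tokenized:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    scores = []
    for tokens in tokenized:
        doc_scores = {}
        for term in set(tokens):
            tf = tokens.count(term) / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            doc_scores[term] = tf * idf
        scores.append(doc_scores)
    return scores

docs = ["the cat sat", "the dog sat", "the cat ran"]
scores = tf_idf(docs)
```

A term that appears in every document, such as "the" here, gets a score of zero, while a term unique to one document, such as "dog", scores highest in that document.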







#has-images #nlp #reading-group #transformer #unfinished

the transformer adds a vector to each input embedding. These vectors follow a specific pattern that the model learns, which helps it determine the position of each word, or the distance between different words in the sequence. The intuition here is that adding these values to the embeddings provides meaningful distances between the embedding vectors once they’re projected into Q/K/V vectors and during dot-product attention.


Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
Sequence Using Positional Encoding One thing that’s missing from the model as we have described it so far is a way to account for the order of the words in the input sequence. To address this, <span>the transformer adds a vector to each input embedding. These vectors follow a specific pattern that the model learns, which helps it determine the position of each word, or the distance between different words in the sequence. The intuition here is that adding these values to the embeddings provides meaningful distances between the embedding vectors once they’re projected into Q/K/V vectors and during dot-product attention. To give the model a sense of the order of the words, we add positional encoding vectors -- the values of which follow a specific pattern. If we assumed the embedding has a dimensionalit




#has-images #nlp #reading-group #transformer #unfinished

To give the model a sense of the order of the words, we add positional encoding vectors -- the values of which follow a specific pattern.

If we assumed the embedding has a dimensionality of 4, the actual positional encodings would look like this:


Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
uition here is that adding these values to the embeddings provides meaningful distances between the embedding vectors once they’re projected into Q/K/V vectors and during dot-product attention. <span>To give the model a sense of the order of the words, we add positional encoding vectors -- the values of which follow a specific pattern. If we assumed the embedding has a dimensionality of 4, the actual positional encodings would look like this: A real example of positional encoding with a toy embedding size of 4 What might this pattern look like? In the following figure, each row corresponds to a positional encoding of a vect




#nlp #reading-group #transformer #unfinished
The formula for positional encoding is described in the paper (section 3.5). You can see the code for generating positional encodings in get_timing_signal_1d(). This is not the only possible method for positional encoding. It, however, gives the advantage of being able to scale to unseen lengths of sequences (e.g. if our trained model is asked to translate a sentence longer than any of those in our training set).
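A minimal sketch of a sinusoidal positional encoding, assuming an even embedding size, with sine values in the first half of each vector and cosine values in the second half. The name `positional_encoding`, its parameters, and the exact spacing of the wavelengths are illustrative assumptions; this is not the code of `get_timing_signal_1d()`.

```python
import math

def positional_encoding(num_positions, d_model, min_timescale=1.0, max_timescale=1.0e4):
    """Return one vector of length d_model per position (d_model assumed even)."""
    num_timescales = d_model // 2
    # Wavelengths form a geometric progression between the two timescales.
    log_increment = math.log(max_timescale / min_timescale) / max(num_timescales - 1, 1)
    inv_timescales = [min_timescale * math.exp(-i * log_increment)
                      for i in range(num_timescales)]
    table = []
    for pos in range(num_positions):
        scaled = [pos * inv for inv in inv_timescales]
        # Sine half followed by cosine half.
        table.append([math.sin(s) for s in scaled] + [math.cos(s) for s in scaled])
    return table

pe = positional_encoding(num_positions=3, d_model=4)
for pos, vec in enumerate(pe):
    print(pos, [round(v, 4) for v in vec])
```

Because each value is computed from the position rather than looked up in a learned table, the same function can be called for positions longer than any sequence seen in training.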

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io




#has-images #nlp #reading-group #transformer #unfinished

One detail in the architecture of the encoder that we need to mention before moving on, is that each sub-layer (self-attention, ffnn) in each encoder has a residual connection around it, and is followed by a layer-normalization step.
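The residual connection followed by layer normalization can be sketched as below. This is a simplified illustration on plain Python lists: the learned gain and bias of a real layer-norm are omitted, and `toy_sublayer` is a hypothetical stand-in for self-attention or the feed-forward network.

```python
import math

def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def add_and_norm(x, sublayer):
    """Residual connection around the sub-layer, then layer normalization."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])

def toy_sublayer(x):
    # Hypothetical stand-in for self-attention or the feed-forward network.
    return [2.0 * v for v in x]

out = add_and_norm([1.0, 2.0, 3.0, 4.0], toy_sublayer)
print([round(v, 4) for v in out])
```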


Alammar-2018-The_Illustrated_Transformer-jalammar,github,io




#nlp #reading-group #transformer #unfinished
After finishing the encoding phase, we begin the decoding phase. Each step in the decoding phase outputs an element from the output sequence (the English translation sentence in this case).

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io




#nlp #reading-group #transformer #unfinished

The output of each step is fed to the bottom decoder in the next time step, and the decoders bubble up their decoding results just like the encoders did. And just like we did with the encoder inputs, we embed and add positional encoding to those decoder inputs to indicate the position of each word.


Alammar-2018-The_Illustrated_Transformer-jalammar,github,io




Flashcard 4884610288908

Tags
#DataScience #machineLearning #python
Question
NLP with scikit-learn: ___ is used to convert text data into fixed-size numerical feature vectors.
Answer
Bag of words
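A minimal bag-of-words sketch using only the standard library, assuming whitespace tokenization and lower-casing. In scikit-learn this job is usually done by `CountVectorizer`; the helper `bag_of_words` below is a hypothetical illustration, not that class.

```python
from collections import Counter

def bag_of_words(documents):
    """Map each document to a fixed-size vector of word counts."""
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary fixes the vector length and the meaning of each position.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

docs = ["the cat sat", "the cat saw the dog"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
print(vectors)
```

Every document gets a vector of the same length (the vocabulary size), regardless of how long the document is.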







#nlp #reading-group #transformer #unfinished

The self attention layers in the decoder operate in a slightly different way than the one in the encoder:

In the decoder, the self-attention layer is only allowed to attend to earlier positions in the output sequence. This is done by masking future positions (setting them to -inf) before the softmax step in the self-attention calculation.

The “Encoder-Decoder Attention” layer works just like multiheaded self-attention, except it creates its Queries matrix from the layer below it, and takes the Keys and Values matrix from the output of the encoder stack.
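The masking step in decoder self-attention can be sketched as follows, assuming a square matrix of raw attention scores in which row i holds the scores of output position i against every position. The function names are hypothetical; each position is allowed to attend to itself and to earlier positions.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(scores):
    """Set scores of future positions to -inf, then apply softmax row by row."""
    n = len(scores)
    weights = []
    for i in range(n):
        masked = [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        weights.append(softmax(masked))
    return weights

scores = [
    [0.5, 1.0, 2.0],
    [1.0, 1.0, 1.0],
    [0.0, 2.0, 1.0],
]
weights = causal_attention_weights(scores)
for row in weights:
    print([round(w, 3) for w in row])
```

Setting a score to -inf before the softmax gives that position a weight of exactly zero, so no information flows from later output positions.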


Alammar-2018-The_Illustrated_Transformer-jalammar,github,io




Pasteur's quadrant is a classification of scientific research projects that seek fundamental understanding of scientific problems, while also having immediate use for society. Louis Pasteur's research is thought to exemplify this type of method, which bridges the gap between "basic" and "applied" research.[1] The term was introduced by Donald E. Stokes in his book, Pasteur's Quadrant.

Pasteur's quadrant - Wikipedia




#has-images
Applied and basic research

                                  Considerations of use?
                                  No                    Yes
Quest for fundamental     Yes     Pure basic research   Use-inspired basic research
understanding?            No      -                     Pure applied research

The result is three distinct classes of research:

  1. Pure basic research, exemplified by the work of Niels Bohr, early 20th century atomic physicist.
  2. Pure applied research, exemplified by the work of Thomas Edison, inventor.
  3. Use-inspired basic research, described here as "Pasteur's Quadrant".




Flashcard 4884626017548

Tags
#machine-learning #management #software-engineering #unfinished
Question
we note that in the terminology of Pasteur’s Quadrant [11], we do [...] (CS) research.
Answer
“use-inspired basic” and “pure applied”


Parent (intermediate) annotation

we note that in the terminology of Pasteur’s Quadrant [11], we do “use-inspired basic” and “pure applied” (CS) research.








Flashcard 4884627590412

Question

In this sentence, what does "promote" mean?

All these artifacts have to be managed, versioned and promoted through different stages until they’re deployed to production.

Answer
The act of copying file content from a less controlled location into a more controlled location.


Parent (intermediate) annotation

Not only do we have to manage the software code artifacts but also the data sets, the machine learning models, and the parameters and hyperparameters used by such models. All these artifacts have to be managed, versioned and promoted through different stages until they’re deployed to production.

Original toplevel document

Sato,Wider,Windheuser_2019_Continuous-delivery_thoughtworks







#knowledge-base-construction #machine-learning #nlp #unfinished
At the core of Alexandria is a probabilistic program that defines a process of generating text from a knowledge base consisting of a large set of typed entities.




#knowledge-base-construction #machine-learning #nlp #unfinished
By applying probabilistic inference to this program, we can reason in the inverse direction: going from text back to facts.




#knowledge-base-construction #machine-learning #nlp #unfinished
The use of a probabilistic program also provides an elegant way to handle the uncertainty inherent in natural text.




#knowledge-base-construction #machine-learning #nlp #unfinished
An important advantage of using a generative model is that Alexandria does not require labelled data, which means it can be applied to new domains with little or no manual effort. The model is also inherently task-neutral – by varying which variables in the model are observed and which are inferred, the same model can be used for: learning a schema (relation discovery), entity discovery, entity linking, fact retrieval and other tasks, such as finding sources that support a particular fact.




#knowledge-base-construction #machine-learning #nlp #unfinished
In this paper we demonstrate schema learning, fact retrieval, entity discovery and entity linking. We will evaluate the former two tasks, while the latter two are performed as part of these main tasks.




#knowledge-base-construction #machine-learning #nlp #unfinished
An attractive aspect of our approach is that the entire system is defined by one coherent probabilistic model. This removes the need to create and train many separate components such as tokenizers, named entity recognizers, part-of-speech taggers, fact extractors, linkers and so on; a disadvantage of having such multiple components is that they are likely to encode different underlying assumptions, reducing the accuracy of the combined system. Furthermore, the use of a single probabilistic program allows uncertainty to be propagated consistently throughout the system – from the raw web text right through to the extracted facts (and back).




Flashcard 4885281901836

Question
congenital
Answer
[default - edit me]








Flashcard 4885282950412

Question
RNA Polymerase in Eukaryotes
Answer
[default - edit me]








Flashcard 4885284785420

Question
RNA polymerase in eukaryotes
Answer
RNA polymerase I makes rRNA, the most common (rampant) type; it is present only in the nucleolus. RNA polymerase II makes mRNA (massive), microRNA (miRNA), and small nuclear RNA (snRNA). RNA polymerase III makes 5S rRNA and tRNA (tiny). The polymerases have no proofreading function, but can initiate chains. RNA polymerase II opens DNA at the promoter site. I, II, and III are numbered in the same order that their products are used in protein synthesis: rRNA, mRNA, then tRNA. α-Amanitin, found in Amanita phalloides (death cap mushrooms), inhibits RNA polymerase II and causes severe hepatotoxicity if ingested. Actinomycin D, also called dactinomycin, inhibits RNA polymerase in both prokaryotes and eukaryotes.








#machine-learning #management #software-engineering #unfinished
Because of the time frame and effort involved, Google’s approach to research is iterative and usually involves writing production, or near-production, code from day one.




#machine-learning #management #software-engineering #unfinished
Typically, a single team iteratively explores fundamental research ideas, develops and maintains the software, and helps operate the resulting Google services, all driven by real-world experience and concrete data.




#machine-learning #management #software-engineering #unfinished
This approach also helps ensure the research efforts produce results that benefit Google’s users, by allowing research ideas and implementations to be honed on empirical data and real-world constraints, and by utilizing even failed efforts to gather valuable data and statistics for further attempts.




#machine-learning #management #software-engineering #unfinished
Google’s mission “To organize the world’s information and make it universally accessible and useful,”




#machine-learning #management #software-engineering #unfinished
Even a small team has at its disposal the power of many internal services, allowing the team to quickly create complex and powerful products and services. Design, testing, production, and maintenance processes are simplified.




#machine-learning #management #software-engineering #unfinished
Google has been able to hire a talented team across the entire engineering operation. This gives us the opportunity to innovate everywhere, and for people to move between projects, whether they be primarily research or primarily engineering.




Flashcard 4885296581900

Tags
#DataScience #machineLearning
Question
NLP. choice of model for supervised and unsupervised.
Answer

Supervised
Models predict the outcome of new observations and datasets, and classify
documents based on the features and response of a given dataset.
E.g.: Naïve Bayes, SVM, linear regression, k-nearest neighbors (k-NN)

Unsupervised
Models identify patterns in the data and extract its structure.
They are also used to group documents using clustering algorithms.
Example: K-means








#machine-learning #management #software-engineering #unfinished
We recognize that the wide dissemination of fundamental results often benefits us by garnering valuable feedback, educating future hires, providing collaborations, and seeding additional work.




Flashcard 4885299989772

Tags
#DataScience #machineLearning
Question

NLP. most basic technique for classification of text.

Advantages:

Uses:

Answer

Naïve Bayes Classifier

Advantages:
• It is efficient as it uses limited CPU and memory.
• It is fast as the model training takes less time.

Uses:
• Naïve Bayes is used for sentiment analysis, email spam detection, categorization of documents, and language detection.
• Multinomial Naïve Bayes is used when multiple occurrences of the words matter.
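A small multinomial Naïve Bayes sketch written from scratch, assuming whitespace tokenization and add-one (Laplace) smoothing. The function names and the four training sentences are made up for illustration; a library implementation would normally be used in practice.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1  # repeated words count repeatedly
            vocabulary.add(tok)
    return class_doc_counts, word_counts, vocabulary

def predict(model, text):
    """Return the class with the highest log-probability for the text."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_doc_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("win money now", "spam"),
    ("win a free prize now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train_multinomial_nb(training)
print(predict(model, "free money"))
print(predict(model, "monday meeting"))
```

Training is a single counting pass over the data, which is why the method is cheap in both CPU and memory.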








#machine-learning #management #software-engineering #unfinished
Even if we cannot fully factorize work, we have sometimes undertaken longer-term efforts. For example, we have started multiyear, large systems efforts (including Google Translate, Chrome, Google Health) that have important research components. These projects were characterized by the need for complex systems and research (such as Web-scale identification of parallel corpora for Translate [12] and various complex security features in Chrome [9] and Health). At the same time, we have recently shown that even in longer-term, publicly launched efforts, we are unafraid to refocus our work (for example, Google Health), if it seems we are not achieving success.




#machine-learning #management #software-engineering #unfinished
this approach benefits from the mainly evolutionary nature of CS research, where great results are usually the composition of many discrete steps.




#machine-learning #management #software-engineering #unfinished
we have structured the Google environment as one where new ideas can be rapidly verified by small teams through large-scale experiments on real data, rather than just debated.




#machine-learning #statistics #unfinished
First, studies often apply cross-validation on a subset of data subsampled from the original dataset. Performing this kind of preprocessing, in a machine learning context, without any kind of argumentation, raises doubts as it drastically increases the variance of the obtained results and avoids the problem of imbalanced data, which does not reflect reality in terms of potential applications




#machine-learning #statistics #unfinished
Finally, there are many studies applying oversampling before partitioning the data into two mutually exclusive sets in order to make the distribution of classes more uniform
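The problem with oversampling before partitioning can be illustrated with a small sketch, assuming random duplication of minority rows as the oversampling method. The toy dataset and helper names are hypothetical. Copies of the same minority row can land on both sides of the split, so the test set is no longer independent of the training set.

```python
import random

def oversample_minority(rows, rng):
    """Duplicate random minority rows until both classes have equal size."""
    majority = [r for r in rows if r["label"] == 0]
    minority = [r for r in rows if r["label"] == 1]
    if not minority:
        return list(rows)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

def split(rows, rng, test_fraction=0.3):
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def leaked_ids(train, test):
    """Ids of rows that appear in both the training and the test set."""
    return {r["id"] for r in train} & {r["id"] for r in test}

rng = random.Random(0)
data = [{"id": i, "label": 1 if i < 5 else 0} for i in range(50)]

# Questionable order: oversample first, then split.
train_bad, test_bad = split(oversample_minority(data, rng), rng)

# Safer order: split first, then oversample only the training part.
train_raw, test_ok = split(data, rng)
train_ok = oversample_minority(train_raw, rng)

print("shared ids, oversample then split:", sorted(leaked_ids(train_bad, test_bad)))
print("shared ids, split then oversample:", sorted(leaked_ids(train_ok, test_ok)))
```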




Flashcard 4885309689100

Tags
#DataScience #machineLearning
Question
Document classifiers can have many parameters and a __ approach helps to search the best parameters
for model training and predicting the outcome accurately.
Answer
Grid Search
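A minimal grid-search sketch using `itertools.product`, assuming the caller supplies a scoring function. Here `toy_score` is a made-up stand-in for a cross-validated score, and the parameter names `alpha` and `ngram_max` are hypothetical. scikit-learn offers `GridSearchCV` for the same purpose.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination in the grid and keep the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Hypothetical stand-in for a cross-validated score of a text classifier.
    return -(params["alpha"] - 0.1) ** 2 - (params["ngram_max"] - 2) ** 2

grid = {"alpha": [0.01, 0.1, 1.0], "ngram_max": [1, 2, 3]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The cost grows with the product of the list lengths (here 3 x 3 = 9 evaluations), so grids are usually kept coarse.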







Flashcard 4885311524108

Tags
#DataScience #machineLearning
Question
What is the tf-idf value in a document?
Answer

The tf-idf value reflects how important a word is to a document.

It is directly proportional to the number of times the word appears in the document.

It is offset by the frequency of the word in the corpus.
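A plain tf-idf sketch, assuming whitespace tokenization, term frequency as count divided by document length, and the unsmoothed inverse document frequency log(N / df). Libraries often use smoothed variants, so their numbers will differ; the helper name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: tf-idf score} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        scores.append({
            tok: (count / len(toks)) * math.log(n_docs / doc_freq[tok])
            for tok, count in counts.items()
        })
    return scores

docs = ["the cat sat on the mat", "the dog sat", "the bird flew"]
scores = tf_idf(docs)
for tok, value in sorted(scores[0].items()):
    print(tok, round(value, 4))
```

A word such as "the", which occurs in every document, gets a score of zero even though it is the most frequent word in the first document.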








#machine-learning #statistics #unfinished
they might be rather optimistic due to the fact that the evaluation happened in a leave-one-out scheme.




#machine-learning #statistics #unfinished
While this subsampling strategy again avoids the problem of imbalanced data, which is reflected in the original dataset, it does show an improvement in AUC and thus indicates that adding the MEMD-based feature to the dataset could be beneficial for the predictive performance. Moreover, due to the many repetitions of the experiment, the sample mean better reflects the real mean.




Flashcard 4885316504844

Tags
#DataScience #python
Question
Python’s data visualization library
Answer
matplotlib







Flashcard 4885318339852

Tags
#DataScience
Question
create a plot using four simple steps.
Answer
Step 01: Import the required libraries
Step 02: Define or import the required dataset
Step 03: Set the plot parameters
Step 04: Display the created plot







Flashcard 4885320174860

Tags
#DataScience #python
Question
matplotlib, subplot syntax
Answer

subplot(m,n,p).

It divides the current window into an m-by-n grid and creates an axis for a subplot in the position specified by p.








Flashcard 4885322009868

Tags
#DataScience #python
Question
matplotlib. method used to adjust the distances between the subplots?
Answer
plt.subplots_adjust()







Flashcard 4885323844876

Tags
#DataScience #python
Question
What is Seaborn?
Answer
Seaborn is a Python visualization library based on matplotlib.
It provides a high-level interface to draw attractive statistical graphics.







Flashcard 4885325679884

Tags
#DataScience #python
Question
To import matplotlib and display plots inline in a Jupyter notebook, use:
Answer

import matplotlib.pyplot as plt

%matplotlib inline








Flashcard 4885327514892

Tags
#DataScience #python
Question
Which keyword argument sets the transparency of the plot line in matplotlib?
Answer
alpha







Flashcard 4885329349900

Tags
#DataScience #python
Question
Which matplotlib statement limits both the x and y axes to the interval [0, 6]?
Answer
The plt.axis([0, 6, 0, 6]) statement limits both the x and y axes to the interval [0, 6].
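The matplotlib cards above (subplot grid, spacing, transparency, axis limits) can be combined in one short sketch. It assumes matplotlib is installed; the `Agg` backend is selected only so the example runs without a display, and the data values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5, 6]
y = [v * v / 6 for v in x]

fig = plt.figure()

ax1 = plt.subplot(1, 2, 1)       # 1-by-2 grid, first position
ax1.plot(x, y, alpha=0.4)        # alpha sets the line transparency
plt.axis([0, 6, 0, 6])           # [xmin, xmax, ymin, ymax] for the current axes

ax2 = plt.subplot(1, 2, 2)       # 1-by-2 grid, second position
ax2.plot(x, x, alpha=1.0)
plt.axis([0, 6, 0, 6])

plt.subplots_adjust(wspace=0.5)  # horizontal space between the two subplots
```

In an interactive session, `plt.show()` (or `%matplotlib inline` in a notebook) displays the figure.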







Flashcard 4885331184908

Tags
#DataScience #machineLearning
Question
What is Web Scraping
Answer
Web scraping is a computer software technique of extracting information from websites in an automated fashion.







Flashcard 4885333019916

Tags
#DataScience #machineLearning
Question
Web Scraping Process
Answer
Step 1: A web request is sent to the targeted website to collect the required data.
Step 2: The information is retrieved from the targeted website in HTML or XML format.
Step 3: The retrieved information is passed to one of several parsers, chosen according to the data format.
Parsing is a technique to read data and extract information from the available document.
Step 4: The parsed data is stored in the desired format.
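The steps above can be sketched with the standard library alone. To keep the example runnable offline, steps 1 and 2 are replaced by a hard-coded HTML string (in real use the page would be fetched over HTTP). The class `TitleParser` and the `title` CSS class are hypothetical.

```python
import json
from html.parser import HTMLParser

# Steps 1-2 (stand-in): pretend this HTML came back from a web request.
html_doc = """
<html><body><div><ul>
  <li><div class="title">First item</div></li>
  <li><div class="title">Second item</div></li>
</ul></div></body></html>
"""

class TitleParser(HTMLParser):
    """Step 3: collect the text of every <div class="title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())

parser = TitleParser()
parser.feed(html_doc)

# Step 4: store the parsed data in the desired format (JSON here).
stored = json.dumps(parser.titles)
print(stored)
```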







Flashcard 4885334854924

Tags
#DataScience #machineLearning
Question
Web Scraping Considerations (legal), what to look for
Answer
Legal Constraints
Notice
Trademark Material
Patented Information
Copyright







Flashcard 4885336689932

Tags
#DataScience #machineLearning
Question
Web scraping: tree structure
Answer
html > div > ul > li > div class







Flashcard 4885338524940

Tags
#DataScience #machineLearning #python
Question
web scraping. The ___ function searches a tag’s descendants and retrieves all of them that match your filters.
Answer
find_all()







Flashcard 4885340359948

Tags
#DataScience #machineLearning #python
Question
web scraping. To find one result, use
Answer

find().

Returns only the first match value








Flashcard 4885342194956

Tags
#DataScience #machineLearning #python
Question
web scraping. The method get_text() is used to _________.
Answer
extract the human-readable text inside a document or tag.







Flashcard 4885344029964

Tags
#DataScience #machineLearning #python
Question
Web scraping: which attributes navigate the parse tree?
Navigating down:
Navigating up:
Navigating sideways:
Navigating back and forth:
Answer

Navigating down:
• .contents and .children
• .descendants
• .string
• .strings and .stripped_strings


Navigating up:
• .parent and .parents


Navigating sideways:
• .next_sibling and .previous_sibling


Navigating back and forth:
• .next_element and .previous_element
• .next_elements and .previous_elements
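
The navigation attributes above can be sketched on a tiny document, assuming Beautiful Soup (`bs4`) is installed. The markup is made up, and it is written without whitespace between tags so that siblings are tags rather than whitespace-only strings.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup for illustration.
html_doc = "<div id='box'><p>One</p><p>Two <b>bold</b></p></div>"
soup = BeautifulSoup(html_doc, "html.parser")
box = soup.div
first_p, second_p = box.find_all("p")

# Navigating down
child_names = [c.name for c in box.children]      # direct children only
descendant_tags = [d.name for d in box.descendants if isinstance(d, Tag)]
only_string = first_p.string        # set when a tag has exactly one string child
no_single_string = second_p.string  # None: this tag has more than one child
texts = list(box.stripped_strings)

# Navigating up
parent_name = first_p.parent.name
ancestor_names = [a.name for a in soup.b.parents]

# Navigating sideways
after_first = first_p.next_sibling
before_second = second_p.previous_sibling

# Navigating back and forth: the next thing parsed, which may be a string
next_parsed = first_p.next_element
```

In real pages the whitespace between tags is parsed as text, so `.next_sibling` often returns a whitespace string rather than the next tag.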








Some civil-service exam questions have tested item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. A complete set of financial statements comprises: (a) balance sheet as at the end of the period; (b) income statement for the period; (ba) statement of comprehensive income for the period; (c) statement of changes in equity for the period; (d) statement of cash flows for the period; (da) statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado, if required by law or by a regulatory body, or if presented voluntarily; (e) notes, comprising significant accounting policies and other explanatory information; (ea) comparative information in respect of the preceding period, as specified in items 38 and 38A;




Flashcard 4886700625164

Question
Some civil-service exam questions have tested item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. A complete set of financial statements comprises: (a) [...] as at the end of the period; (b) income statement for the period; (ba) statement of comprehensive income for the period; (c) statement of changes in equity for the period; (d) statement of cash flows for the period; (da) statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado, if required by law or by a regulatory body, or if presented voluntarily; (e) notes, comprising significant accounting policies and other explanatory information; (ea) comparative information in respect of the preceding period, as specified in items 38 and 38A;
Answer
balance sheet








Flashcard 4886702198028

Question
Some civil-service exam questions have tested item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. A complete set of financial statements comprises: (a) balance sheet as at the end of the period; (b) income statement [...]; (ba) statement of comprehensive income for the period; (c) statement of changes in equity for the period; (d) statement of cash flows for the period; (da) statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado, if required by law or by a regulatory body, or if presented voluntarily; (e) notes, comprising significant accounting policies and other explanatory information; (ea) comparative information in respect of the preceding period, as specified in items 38 and 38A;
Answer
for the period








Flashcard 4886703770892

Question
Some civil-service exam questions have tested item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. A complete set of financial statements comprises: (a) balance sheet as at the end of the period; (b) income statement for the period; (ba) statement of [...] income for the period; (c) statement of changes in equity for the period; (d) statement of cash flows for the period; (da) statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado, if required by law or by a regulatory body, or if presented voluntarily; (e) notes, comprising significant accounting policies and other explanatory information; (ea) comparative information in respect of the preceding period, as specified in items 38 and 38A;
Answer
comprehensive








Flashcard 4886705343756

Question
Some civil-service exam questions have tested item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. A complete set of financial statements comprises: (a) balance sheet as at the end of the period; (b) income statement for the period; (ba) statement of comprehensive income for the period; (c) statement of [...] in equity for the period; (d) statement of cash flows for the period; (da) statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado, if required by law or by a regulatory body, or if presented voluntarily; (e) notes, comprising significant accounting policies and other explanatory information; (ea) comparative information in respect of the preceding period, as specified in items 38 and 38A;
Answer
changes








assumes independence between the posterior distribution of the parameters associated with segments of data between successive changepoints




generalisation of that suggested by Liu and Lawrence (1999)




We consider two classes of prior for the changepoint process. One, that of Green (1995), involves a prior on the number of changepoints, and then a conditional prior on their position. The other is based on modelling the changepoint process by a point process (Pievatolo and Green, 1998), and is a special case of a product-partition model (Hartigan, 1990).




we assume that, conditional on the realisation of the changepoint process, the joint posterior distribution of the parameters is independent across the segments of the time series




assume a conjugate prior for the parameters associated with each segment
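
A conjugate prior lets the segment parameter be integrated out in closed form. The sketch below illustrates this under an assumed model that is not taken from the excerpt: Gaussian observations with known variance `sigma2` and a N(0, `tau2`) prior on the segment mean. The function name and default values are hypothetical.

```python
import math

def segment_log_marginal(ys, sigma2=1.0, tau2=4.0):
    """Log marginal likelihood of one segment with its mean integrated out.

    Assumed model (illustrative only): y_i ~ N(mu, sigma2) independently,
    with conjugate prior mu ~ N(0, tau2).
    """
    m = len(ys)
    s1 = sum(ys)                 # sum of observations
    s2 = sum(y * y for y in ys)  # sum of squared observations
    # Quadratic form of y under covariance sigma2 * I + tau2 * (ones matrix).
    quad = (s2 - tau2 * s1 * s1 / (sigma2 + m * tau2)) / sigma2
    return (-0.5 * m * math.log(2 * math.pi * sigma2)
            - 0.5 * math.log(1 + m * tau2 / sigma2)
            - 0.5 * quad)
```

For a single observation this reduces to the log density of N(0, sigma2 + tau2), which is a quick sanity check on the formula.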




For a data set consisting of observations at discrete times, 1,...,n, the recursions are based on calculating the probability of the data from time t to time n, given a changepoint at time t, in terms of the equivalent probabilities at times t + 1,...,n.
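
A rough sketch of a backward recursion of this kind, under assumptions that are not in the excerpt: a geometric prior on segment lengths with parameter `p`, a Gaussian segment model with known variance and a conjugate normal prior on the mean, and 0-based indexing in which `Q[t]` is the probability of `y[t:]` given that a new segment starts at index `t`. All function and parameter names are hypothetical.

```python
import math

def seg_lik(ys, sigma2=1.0, tau2=4.0):
    """Marginal likelihood of one segment (assumed normal model, mean integrated out)."""
    m, s1 = len(ys), sum(ys)
    s2 = sum(v * v for v in ys)
    quad = (s2 - tau2 * s1 * s1 / (sigma2 + m * tau2)) / sigma2
    log_lik = (-0.5 * m * math.log(2 * math.pi * sigma2)
               - 0.5 * math.log(1 + m * tau2 / sigma2)
               - 0.5 * quad)
    return math.exp(log_lik)

def backward_recursion(y, p=0.2):
    """Return Q, where Q[t] = Pr(y[t:] | a segment starts at index t)."""
    n = len(y)
    Q = [0.0] * n
    for t in range(n - 1, -1, -1):
        # Case 1: no further changepoint, so the segment runs from t to the end.
        total = seg_lik(y[t:]) * (1 - p) ** (n - 1 - t)
        # Case 2: the segment ends at s and a new segment starts at s + 1.
        for s in range(t, n - 1):
            length = s - t + 1
            g = p * (1 - p) ** (length - 1)  # geometric segment-length prior
            total += seg_lik(y[t:s + 1]) * g * Q[s + 1]
        Q[t] = total
    return Q
```

`Q[0]` is then the marginal likelihood of the whole series, summed over every possible changepoint configuration. The sketch works with raw probabilities for readability; for longer series a log-space version and cumulative sums for the segment statistics would be needed.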




The assumption of conjugate priors can potentially be relaxed, but with an increase in the computational cost. Essentially, low-dimensional integrals that can be calculated analytically under conjugate priors would need to be calculated numerically (for example see Section 4.2).




Relaxation of the independence assumption is more difficult, but our algorithm can still be used as a useful tool for analysing such data.




Flashcard 4886852406540

Tags
#has-images


