# on 27-Jan-2020 (Mon)

#### Flashcard 1602824572172

Tags
#broker #estate #real
Question
In the Middle Ages, under old English law, by what means was real property transferred?

status measured difficulty not learned 37% [default] 0

#### Flashcard 4881260612876

Tags
#python
Question
NumPy shape manipulation functions
Flatten, Resize, Stack, Reshape, Split
Flatten: array.ravel()
Reshape: array.reshape(3, 4) # 3 rows, 4 columns; the original array is unchanged
Resize: array.resize(2, 6) # resizes in place to 2 rows, 6 columns
Split: np.hsplit(array, 2) # splits the array column-wise into 2 equal parts
Stack: np.hstack((array1, array2)) # joins the arrays side by side
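
A minimal runnable sketch of the five operations, assuming a 12-element array so that the 3x4 and 2x6 shapes both fit. The variable names are illustrative only.

```python
import numpy as np

array = np.arange(12)                 # 1-D array holding 0..11

reshaped = array.reshape(3, 4)        # 3 rows, 4 columns; `array` itself is unchanged
flat = reshaped.ravel()               # flattened back to 1-D

resized = array.copy()
resized.resize(2, 6)                  # in place: `resized` now has 2 rows, 6 columns

left, right = np.hsplit(reshaped, 2)  # two (3, 2) halves, split column-wise
stacked = np.hstack((left, right))    # joined side by side again -> (3, 4)
```

Note that `hstack` takes a single tuple of arrays; a stray integer inside the tuple would be treated as another array to stack.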

status measured difficulty not learned 37% [default] 0

#### Flashcard 4881262447884

Tags
#python
Question
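Linear algebra functions: transpose, inverse, trace.

array = np.array([[1, 2], [3, 4]]) # the inverse needs a square, non-singular matrix

array.transpose()
np.linalg.inv(array) # matrix inverse (array @ inverse gives the identity), not element-wise 1/value
np.trace(array) # sum of the main diagonal (top-left to bottom-right)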
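
A small sketch of the three calls on a square matrix. `np.linalg.inv` raises an error for non-square input, so the 2x2 matrix below is an assumed example rather than the card's data.

```python
import numpy as np

matrix = np.array([[4.0, 7.0],
                   [2.0, 6.0]])

transposed = matrix.transpose()    # rows become columns
inverse = np.linalg.inv(matrix)    # matrix inverse, not element-wise 1/value
trace = np.trace(matrix)           # 4 + 6 = 10.0

# A matrix multiplied by its inverse gives the identity (up to rounding).
identity_check = np.allclose(matrix @ inverse, np.eye(2))
```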

status measured difficulty not learned 37% [default] 0

#### Flashcard 4883816254732

Tags
#python
Question

Create a pandas Series.

From which types of data?

import pandas as pd

someSeries = pd.Series(data) # data can be a list or a NumPy ndarray
someSeries = pd.Series(5., index=['a', 'b', 'c']) # a scalar is repeated for every label
someSeries = pd.Series([1, 2, 3], index=['a', 'b', 'c']) # like a dictionary
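
The constructors above can be tried as follows. The values are arbitrary, and the dictionary form is added here only to illustrate the "like a dictionary" remark.

```python
import numpy as np
import pandas as pd

from_list = pd.Series([1, 2, 3])                      # default index 0, 1, 2
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))     # built from an ndarray
from_scalar = pd.Series(5.0, index=['a', 'b', 'c'])   # scalar repeated per label
labelled = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
from_dict = pd.Series({'a': 1, 'b': 2, 'c': 3})       # keys become the index
```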

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884405292300

Tags
#python
Question

What is a pandas DataFrame?

(number of dimensions, same/different data types)

DataFrame is a

• two-dimensional
• labeled data structure with columns of
• potentially different types.

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884407127308

Tags
#python
Question

Syntax for creating DataFrames from

1. Lists
2. Dictionary
3. Series
4. ndarray

Syntax for creating DataFrames
pd.DataFrame

1. Lists
pd.DataFrame({'columnName1': ['val1', 'val2'], 'columnName2': [1, 2]})
2. Dictionary
pd.DataFrame({'columnName1': {key1: value1}, 'columnName2': {key2: val2}}) # inner keys become the row index
3. Series
series1 = pd.Series([values], index=[indexes])
series2 = pd.Series(....)
# they have the same kind of index, e.g. by year
newDF = pd.DataFrame({'colName1': series1, 'colName2': series2})
4. ndarray
Create an ndarray with years: np_arr = np.array([2001, 2020, 2019])
Create a dict with the ndarray: data = {'year': np_arr}
Pass this dict to a new DataFrame: df = pd.DataFrame(data)
# the result has a sequential index (0, 1, 2) and one column named year holding the array's values
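
A runnable sketch of the four construction routes. All column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# 1. Dictionary of lists: keys become column names.
df_lists = pd.DataFrame({'name': ['Ann', 'Bob'], 'score': [1, 2]})

# 2. Dictionary of dictionaries: inner keys become the row index.
df_dicts = pd.DataFrame({'score': {'Ann': 1, 'Bob': 2},
                         'age': {'Ann': 30, 'Bob': 25}})

# 3. Series that share the same index (here: years).
s1 = pd.Series([10, 20], index=[2019, 2020])
s2 = pd.Series([0.1, 0.2], index=[2019, 2020])
df_series = pd.DataFrame({'sales': s1, 'growth': s2})

# 4. NumPy ndarray wrapped in a dictionary.
years = np.array([2001, 2020, 2019])
df_array = pd.DataFrame({'year': years})
```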

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884410535180

Tags
#DataScience #python
Question
Handle Missing Values with Functions. (2 ways)
1. Dropping the NaN (null) values. Use .dropna()
2. Filling the NaN values with something else.
fill with zeros. .fillna(0)
#can also fill with mean of the data.
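
A short sketch of both approaches on an assumed two-column frame, including the fill-with-mean variant mentioned in the last comment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, np.nan]})

dropped = df.dropna()                # keeps only rows that contain no NaN
zero_filled = df.fillna(0)           # every NaN becomes 0
mean_filled = df.fillna(df.mean())   # every NaN becomes the mean of its own column
```

Dropping loses whole rows, so filling is usually preferred when missing values are scattered across many rows.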

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884412370188

Tags
#DataScience #python
Question
Custom functions can be applied to the DataFrame.
Name their use and syntax.

E.g., for creating new features or for standardizing.
Custom functions can be applied element-wise with the applymap method.

df.applymap(functionName) # pandas 2.1 and later call this df.map(functionName)
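
A minimal sketch of element-wise application. The helper `to_fraction` is hypothetical, and the `hasattr` check is only there because pandas 2.1 renamed `applymap` to `DataFrame.map`.

```python
import pandas as pd

df = pd.DataFrame({'test1': [50, 60], 'test2': [70, 80]})

def to_fraction(value):
    """Hypothetical helper: rescale a 0-100 score to the 0-1 range."""
    return value / 100

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap
scaled = elementwise(to_fraction)
```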

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884414205196

Tags
#DataScience #python
Question

Dataframe statistical functions

.max()
.min()
.mean()
.std()
etc.

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884416040204

Tags
#DataScience #python
Question
Data Operation Using Groupby - syntax.

grouped = df.groupby(field)

extract = grouped.get_group(wantedValue)
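
For illustration, the same two calls on a small assumed frame, plus one aggregation to show what a grouped object is typically used for.

```python
import pandas as pd

df = pd.DataFrame({'team': ['red', 'blue', 'red'], 'points': [3, 5, 4]})

grouped = df.groupby('team')
red_rows = grouped.get_group('red')   # only the rows where team == 'red'
totals = grouped['points'].sum()      # one aggregated value per group
```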

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884417875212

Tags
#DataScience #python
Question
DataFrame data operation – sorting
df.sort_values('columnName')
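
A brief sketch, assuming a toy frame. pandas sorts rows by column with `DataFrame.sort_values`, which returns a new frame and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({'name': ['b', 'c', 'a'], 'score': [2, 3, 1]})

ascending = df.sort_values('score')                     # lowest score first
descending = df.sort_values('score', ascending=False)   # highest score first
```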

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884419710220

Tags
#DataScience #python #statistics
Question
Data standardization applied to data (define a function).
def standardize(test):
    return (test - test.mean()) / test.std()
# test: standardize(df['Test1'])
def standardizeResult(dataFrame):
    return dataFrame.apply(standardize)
standardizeResult(df)
# returns a DataFrame of standardized values (z-scores), e.g. most fall within ±3 standard deviations

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884421545228

Tags
#python
Question

python syntax for

1. indexing by label
2. indexing by position

python syntax for

1. indexing by label: loc
2. indexing by position: iloc
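
A small sketch contrasting the two indexers on an assumed frame with string labels.

```python
import pandas as pd

df = pd.DataFrame({'score': [10, 20, 30]}, index=['a', 'b', 'c'])

by_label = df.loc['b', 'score']   # row labelled 'b', column 'score'
by_position = df.iloc[1, 0]       # second row, first column
label_slice = df.loc['a':'b']     # label slices include the end label
position_slice = df.iloc[0:2]     # position slices exclude the end position
```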

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884423380236

Tags
#python
Question
While viewing a dataframe, head() method will _____.
The default value is 5 if nothing is passed in head method. So it will return the first five rows of the DataFrame.

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884425215244

Tags
#DataScience #machineLearning
Question

Machine Learning Terminology:

1. Columns
2. Rows
3. Outcome

Machine Learning Terminology:

1. Columns: Features, attributes, inputs
2. Rows: Observations, samples, records
3. Outcome: Response, target, label

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884427836684

Tags
#DataScience #machineLearning
Question
Machine Learning Approach/Steps
1. Understand the problem/dataset. Also deal with the outliers and null values?
2. Extract the features from the dataset. Check correlations, meaningful fields.
3. Identify the problem type. Continuous/Categorical?
4. Choose the right model. Linear regression, logistic regression, clustering?
5. Train and test the model. Check accuracy, errors.
6. Strive for accuracy. Play with factors or relook at features if required.

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884430195980

Tags
#DataScience #machineLearning
Question
What is Supervised Learning
1. The dataset used to train a model should have observations, features, and responses.
The model is trained to predict the “right” response for a given set of data points.
2. Supervised learning models are used to predict an outcome.
3. The goal of this model is to “generalize” a dataset so that the “general rule” can be applied to new data as well.

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884432030988

Tags
#DataScience #machineLearning
Question
What is unsupervised learning
1. In unsupervised learning, the response or the outcome of the data is unknown.
2. Unsupervised learning models are used to identify and visualize patterns in data by grouping similar types of data.
3. The goal of this model is to “represent” data in a way that meaningful information can be extracted.

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884433865996

Tags
#DataScience #machineLearning
Question
Identify the Problem Type and Learning Model for supervised/unsupervised learning.

Supervised
Continuous: Linear regression
Categorical: Classification, logistic regression

Unsupervised
Continuous: Dimensionality reduction
Categorical: Clustering

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884435701004

Tags
#DataScience #machineLearning #python
Question
Scikit-Learn Considerations
• Create separate objects for feature and response.
• Ensure that features and response have only numeric values.
• Features and response should be in the form of a NumPy ndarray.
• Since features and response would be in the form of arrays, they would have shapes and sizes.
• By convention, features are named X and the response is named y.

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884437536012

Tags
#DataScience #machineLearning #python
Question
The estimator instance in Scikit-learn is a _____.
The estimator instance or object is a model.

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884439371020

Tags
#DataScience #mathematics-basic
Question
simple linear equation

y = mx + c

𝑦 = β0 + β1𝑥 + u
(u is the residuals)

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884501761292

Tags
#DataScience #mathematics
Question
Errors in linear regression

SSR ~
Regression sum of squares, also called the explained sum of squares (ESS).
Measured between the regression line and the mean of the response variable:
the sum of the squared differences between the predicted value and the mean of the dependent variable.
Think of it as a measure that describes how well our line fits the data.

SSE ~
Error sum of squares, also called the residual sum of squares. Residual as in: remaining or unexplained.
Measured between the observed value and the regression line:
the sum of the squared differences between the observed value and the predicted value.

SST ~
Sum of squares total.
The sum of the squared differences between the observed dependent variable and its mean.
SST = SSR + SSE
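
A numeric check of the decomposition, assuming SSR means the regression (explained) sum of squares and SSE the error (residual) sum of squares. The five data points are invented, and the line is fitted by ordinary least squares with `np.polyfit`.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

slope, intercept = np.polyfit(x, y, 1)   # least-squares line: slope 0.6, intercept 2.2
y_hat = slope * x + intercept            # predicted values on the regression line

sst = np.sum((y - y.mean()) ** 2)        # observed vs. mean      -> 6.0
ssr = np.sum((y_hat - y.mean()) ** 2)    # predicted vs. mean     -> 3.6
sse = np.sum((y - y_hat) ** 2)           # observed vs. predicted -> 2.4
```

The identity SST = SSR + SSE holds for a least-squares fit that includes an intercept; it is not guaranteed for an arbitrary line.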

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884507790604

Tags
#DataScience #machineLearning #python
Question
scikit-learn linear model: syntax
sklearn.linear_model.LinearRegression(fit_intercept=True, copy_X=True, n_jobs=None) # releases before 1.2 also accepted normalize=False
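
A minimal usage sketch on assumed toy data that follows y = 2x + 1. Only `fit_intercept` is passed, since the `normalize` parameter was removed in scikit-learn 1.2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # features: 2-D, one column
y = np.array([3.0, 5.0, 7.0, 9.0])           # response: y = 2x + 1

model = LinearRegression(fit_intercept=True)
model.fit(X, y)
prediction = model.predict(np.array([[5.0]]))   # close to 11.0
```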

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884510412044

Tags
#DataScience #machineLearning
Question
Unsupervised Learning Models: Clustering. It is used to:
It is used:
• To extract the structure of the data
• To identify groups in the data

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884512771340

Tags
#DataScience #machineLearning
Question
K-means Clustering. How it is created
K-means starts from randomly chosen centroids, then alternates between assigning each data point to its
nearest centroid and recomputing each centroid as the mean of the points in its cluster. It continues this
process iteratively until the centroids stop changing.
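
The iteration can be sketched in plain NumPy. This is a simplified illustration (random initial centroids drawn from the data, a fixed iteration cap), not scikit-learn's `KMeans`; the function name, parameters and sample points are all assumptions.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Simplified K-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid becomes the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break                      # centroids stopped moving
        centroids = new_centroids
    return centroids, labels

points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(points, k=2)
```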

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884514868492

Tags
#DataScience #machineLearning
Question
Unsupervised Learning Models: Dimensionality Reduction. What is?

It reduces a high-dimensional dataset into a dataset with fewer dimensions.

This makes it easier and faster for the algorithm to analyze the data.

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884516965644

Tags
#DataScience #machineLearning
Question
techniques used for dimensionality reduction:

Drop data columns with missing values

Drop data columns with low variance

Drop data columns with high correlations

Apply statistical functions - PCA

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884518800652

Tags
#DataScience #machineLearning
Question
Unsupervised Learning Models: Principal Component Analysis (PCA)
It is a linear dimensionality reduction method which uses singular value decomposition of the data and
keeps only the most significant singular vectors to project the data to a lower-dimensional space.
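
A rough NumPy sketch of PCA through singular value decomposition. The function name and the sample data (points lying on the line y = 2x) are assumptions made for illustration.

```python
import numpy as np

def pca_svd(data, n_components):
    """Project data onto its top principal components via SVD."""
    centered = data - data.mean(axis=0)       # PCA works on mean-centred data
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]            # most significant right-singular vectors
    projected = centered @ components.T       # coordinates in the reduced space
    return projected, components, singular_values

data = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
projected, components, singular_values = pca_svd(data, n_components=1)
```

Because the sample points lie on a single line, the second singular value is numerically zero and one component keeps all of the variance.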

status measured difficulty not learned 37% [default] 0

Article 4884520635660

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner and Jennifer B. Nuzzo. Published online: 25 May 2011. Sections: Diseases Jump the Species Barrier; More Interconnected and Urbanized; Hospitals Can Amplify Disease; Hospital Infection Control Measures; International Scientific Collaboration; Disease Doesn't Stop at the Border; Preparing to Respond Can Save Lives; Superspreading and Respiratory Transmission; What Remains To Be Done.

#### Annotation 4884522470668

 In recent years, most emerging infectious disease events have been the result of mutations in wildlife pathogens that have allowed infection of human hosts.4 In the past, such events contributed to some of history's great pandemics, including influenza, plague, smallpox, and HIV.

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Annotation 4884524043532

 SARS was caused by a coronavirus that was endemic among fruit bats in China; it adapted to a human host after establishing itself in the captive animals in the wild animal markets of Guangdong Province.

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Annotation 4884525878540

 Modern urban environments have conditions, such as high population density, poor sanitation, and many poor, malnourished people, that may accelerate the spread of emerging infections. For instance, the large outbreak of SARS at the Amoy Gardens apartment complex in Hong Kong (329 patients) is at least partially related to its enormous size and density—19,000 residents in 0.04 km².6

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Annotation 4884527975692

 It is reasonable to estimate that the case fatality rate for SARS was cut in half by sophisticated modern health care. (If all patients who required intensive care would have died without it, then the case fatality rate would have been approximately 20% rather than the actual 10%.)

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Annotation 4884529548556

 it is likely that SARS would not have become a major epidemic had it not been for the many superspreading events that occurred in hospitals.11

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Annotation 4884531383564

 Most SARS infections probably occurred in hospitals, and nearly all cases of SARS can be traced back to one or more nosocomial superspreading events starting with relatively small hospital outbreaks in rural Guangdong, then large nosocomial outbreaks in Guangzhou, Hong Kong, Hanoi, Beijing, Singapore, and Toronto.2

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Annotation 4884533480716

 SARS was brought under control within a matter of months largely due to the fact that the disease was most transmissible when the patients were most sick—that is, when they were in a hospital.2 There was relatively little community transmission of SARS compared to other respiratory infections like influenza.18 For this reason, controlling the transmission in hospitals was key in controlling the outbreak.19

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Flashcard 4884535053580

Tags
#DataScience #machineLearning
Question
__ is mainly used to combine multiple models or estimators
Pipeline
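
A hedged sketch of a scikit-learn `Pipeline` that chains a preprocessing step with a final estimator. The step names `'scale'` and `'regress'` and the toy data are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])           # y = 2x + 1

pipeline = Pipeline([
    ('scale', StandardScaler()),             # transformer: fitted, then applied
    ('regress', LinearRegression()),         # final estimator
])
pipeline.fit(X, y)                           # fits every step in order
prediction = pipeline.predict(np.array([[5.0]]))   # close to 11.0
```

Calling `fit` or `predict` on the pipeline runs the steps in order, so the same scaling is applied to training and new data.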

status measured difficulty not learned 37% [default] 0

#### Annotation 4884535577868

 Transmission [of SARS] in hospitals was brought under control by the use of standard infection control practices, such as isolation of sick patients and wearing of masks, gowns, and gloves by hospital staff.20,21

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Flashcard 4884539247884

Tags
#DataScience #machineLearning #python
Question
Model Persistence

Save the model for future use, so there is no need to retrain it every time you need it.

A model can be saved with Python's pickle module.
joblib is an alternative to pickle that is often used with scikit-learn models.
You can use the joblib.dump and joblib.load functions.
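
A minimal round trip with `joblib.dump` and `joblib.load`, assuming a small linear model and a temporary directory. The file name `model.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model.joblib')   # hypothetical file name
    joblib.dump(model, path)                       # serialise the fitted model to disk
    restored = joblib.load(path)                   # read it back without retraining

same_predictions = np.allclose(model.predict(X), restored.predict(X))
```

As with pickle, only load files from sources you trust, because loading can execute arbitrary code.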

status measured difficulty not learned 37% [default] 0

#### Annotation 4884539772172

 Quarantine is the sequestration from the general public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers.

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Annotation 4884541345036

 Isolation is the sequestration of individuals known to have the infection.

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Flashcard 4884543704332

Question
[...] is the sequestration from the general public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers.
Quarantine

status measured difficulty not learned 37% [default] 0

#### Original toplevel document

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Flashcard 4884545277196

Question
Quarantine is [...].
the sequestration from the general public of individuals who have potentially been exposed to an infectious disease in an attempt to prevent them from spreading the disease if they turn out to be carriers

status measured difficulty not learned 37% [default] 0

#### Original toplevel document

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Flashcard 4884547374348

Tags
#DataScience #machineLearning #python
Question
Model Evaluation: Metric Functions. Syntax for classification, clustering and regression.

Classification
metrics.accuracy_score
metrics.average_precision_score

Clustering

Regression
metrics.mean_absolute_error
metrics.mean_squared_error
metrics.median_absolute_error
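
A short sketch calling the listed metric functions on invented labels and predictions. The card leaves the Clustering entry empty; `metrics.adjusted_rand_score` is shown here as one real scikit-learn clustering metric, added as an illustration rather than taken from the card.

```python
from sklearn import metrics

# Classification: share of correct predictions.
y_true_cls = [0, 1, 1, 0]
y_pred_cls = [0, 1, 0, 0]
accuracy = metrics.accuracy_score(y_true_cls, y_pred_cls)         # 3 of 4 -> 0.75

# Clustering (illustrative addition): agreement between two labelings,
# ignoring how the clusters are numbered.
rand = metrics.adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])    # 1.0

# Regression: absolute errors are 1, 0 and 2.
y_true_reg = [3.0, 5.0, 7.0]
y_pred_reg = [2.0, 5.0, 9.0]
mae = metrics.mean_absolute_error(y_true_reg, y_pred_reg)         # (1 + 0 + 2) / 3 = 1.0
mse = metrics.mean_squared_error(y_true_reg, y_pred_reg)          # (1 + 0 + 4) / 3
medae = metrics.median_absolute_error(y_true_reg, y_pred_reg)     # median of 1, 0, 2 = 1.0
```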

status measured difficulty not learned 37% [default] 0

#### Flashcard 4884548685068

Question
[...] is the sequestration of individuals known to have the infection.
Isolation

status measured difficulty not learned 37% [default] 0

#### Original toplevel document

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Flashcard 4884549733644

Question
Isolation is [...].
the sequestration of individuals known to have the infection

status measured difficulty not learned 37% [default] 0

#### Original toplevel document

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Annotation 4884551830796

 Various types of travel screening were employed by a number of countries. Despite screening of millions of travelers, only a very few individuals with SARS were discovered. This was especially true of thermal screening. More than 35 million international travelers entering China, Canada, and Singapore had their temperatures measured, but no cases of SARS were found.18

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Annotation 4884581190924

 In the first few years following the SARS pandemic, “respiratory etiquette” became the “new normal” in hospitals—anyone with a cough had a surgical mask placed on them at the ED door, and aerosol-generating procedures were done only in closed rooms with staff wearing PPE

Toner and Nuzzo (2011): Acting on the Lessons of SARS: What Remains To Be Done? Eric S. Toner

#### Annotation 4884582763788

 #bert #knowledge-base-construction #nlp #unfinished In BERT, the input representation of each token is the sum of its token, segment and position embeddings.
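
As a toy illustration of that sum, the sketch below uses random lookup tables in NumPy. The table sizes and token ids are assumptions chosen for readability, not BERT's real configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_segments, max_len, hidden = 100, 2, 16, 8   # toy sizes (assumed)

token_table = rng.normal(size=(vocab_size, hidden))       # one row per vocabulary id
segment_table = rng.normal(size=(n_segments, hidden))     # sentence A / sentence B
position_table = rng.normal(size=(max_len, hidden))       # one row per position

token_ids = np.array([1, 17, 42, 2, 55, 2])    # hypothetical ids for a two-sentence input
segment_ids = np.array([0, 0, 0, 0, 1, 1])     # which sentence each token belongs to
positions = np.arange(len(token_ids))

# Input representation: element-wise sum of the three embeddings per token.
inputs = token_table[token_ids] + segment_table[segment_ids] + position_table[positions]
```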

#### Annotation 4884584336652

 #bert #knowledge-base-construction #nlp #unfinished ‘[CLS]’ is added at the beginning of each sequence as its first token. The final hidden state from the Transformer output corresponding to the first token is used as the sentence representation for classification tasks. In case there are two sentences in a task, ‘[SEP]’ is used to separate the two sentences.

#### Annotation 4884585909516

 #bert #knowledge-base-construction #nlp #unfinished BERT pre-trains the model parameters by using a pre-training objective: the masked language model (MLM), which randomly masks some of the tokens from the input and sets the optimization objective to predict the original vocabulary id of the masked word according to its context.
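
A simplified sketch of how masked-language-model training pairs could be built. It replaces every selected token with a mask id and is not BERT's exact masking recipe; `MASK_ID`, `IGNORE`, and the masking probability are assumed values.

```python
import numpy as np

MASK_ID = 0    # hypothetical vocabulary id of the [MASK] token
IGNORE = -1    # hypothetical label for positions the model is not asked to predict

def mask_tokens(token_ids, mask_prob, rng):
    """Return (inputs, labels) for a simplified masked-language-model objective."""
    token_ids = np.asarray(token_ids)
    is_masked = rng.random(token_ids.shape) < mask_prob
    inputs = np.where(is_masked, MASK_ID, token_ids)   # masked positions hide the token
    labels = np.where(is_masked, token_ids, IGNORE)    # target: the original vocabulary id
    return inputs, labels

rng = np.random.default_rng(0)
original = np.array([5, 6, 7, 8, 9, 10])
inputs, labels = mask_tokens(original, mask_prob=0.5, rng=rng)
```

The model sees `inputs` and is trained to recover `labels` at the masked positions, so it has to use context on both sides of each gap.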

#### Annotation 4884587482380

 #bert #knowledge-base-construction #nlp #unfinished Unlike left-to-right language model pre-training, the MLM objective can help a state output to utilize both the left and the right context, which allows a pre-training system to apply a deep bidirectional Transformer.

#### Annotation 4884589055244

 #bert #knowledge-base-construction #nlp #unfinished Besides the masked language model, BERT also trains a “next sentence prediction” task that jointly pre-trains text-pair representations.

#### Flashcard 4884591414540

Tags
#DataScience #machineLearning
Question
Natural Language Processing (NLP)
Natural language processing is an automated way to understand and analyze natural human languages
and extract information from such data by applying machine algorithms.

status measured difficulty not learned 37% [default] 0

#### Annotation 4884592725260

 #nlp #reading-group #transformer #unfinished with multi-headed attention we have not only one, but multiple sets of Query/Key/Value weight matrices (the Transformer uses eight attention heads, so we end up with eight sets for each encoder/decoder). Each of these sets is randomly initialized. Then, after training, each set is used to project the input embeddings (or vectors from lower encoders/decoders) into a different representation subspace.

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io

#### Flashcard 4884594298124

Tags
Question
The feed-forward layer is not expecting eight matrices – it’s expecting a single matrix (a vector for each word). So we need a way to condense these eight down into a single matrix.

How do we do that? We concatenate the matrices, then multiply them by an additional weights matrix WO.

status measured difficulty not learned 37% [default] 0
Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
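
The concatenate-then-project step can be sketched in NumPy as follows. This is an illustrative sketch, not the article's code: the function name, the toy sizes, and the scaled dot-product details are assumptions, and the weights are random.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o):
    """X: (seq, d_model); W_q/W_k/W_v: (heads, d_model, d_head); W_o: (heads * d_head, d_model)."""
    d_head = W_q.shape[-1]
    Q = np.einsum("sd,hde->hse", X, W_q)
    K = np.einsum("sd,hde->hse", X, W_k)
    V = np.einsum("sd,hde->hse", X, W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    Z = softmax(scores) @ V                              # one Z matrix per head
    Z_concat = np.concatenate(list(Z), axis=-1)          # (seq, heads * d_head)
    return Z_concat @ W_o                                # single matrix: one vector per word

rng = np.random.default_rng(0)
heads, seq, d_model, d_head = 8, 3, 16, 2                # assumed toy sizes
X = rng.normal(size=(seq, d_model))
W_q, W_k, W_v = (rng.normal(size=(heads, d_model, d_head)) for _ in range(3))
W_o = rng.normal(size=(heads * d_head, d_model))
out = multi_head_self_attention(X, W_q, W_k, W_v, W_o)
print(out.shape)
```

The output has one row per word, which is the single matrix the feed-forward layer expects.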

#### Annotation 4884595346700

 #nlp #reading-group #transformer #unfinished As we encode the word "it", one attention head is focusing most on "the animal", while another is focusing on "tired" -- in a sense, the model's representation of the word "it" bakes in some of the representation of both "animal" and "tired". If we add all the attention heads to the picture, however, things can be harder to interpret:

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io

#### Flashcard 4884597443852

Tags
#DataScience #machineLearning
Question

NLP Terminology

1. Tokenization
2. Stemming
3. Tf-idf
4. Semantic analytics
5. Disambiguation
6. Topic models
7. Word boundaries
1. Tokenization
Splits words, phrases, and idioms
2. Stemming
Maps to the valid root word
3. Tf-idf
Represents term frequency and inverse document frequency
4. Semantic analytics
Compares words, phrases, and idioms in a set of documents to extract meaning
5. Disambiguation
Determines meaning and sense of words (context vs. intent)
6. Topic models
Discover topics in a collection of documents
7. Word boundaries
Determines where one word ends and the other begins

status measured difficulty not learned 37% [default] 0

#### Annotation 4884597968140

 #has-images #nlp #reading-group #transformer #unfinished the transformer adds a vector to each input embedding. These vectors follow a specific pattern that the model learns, which helps it determine the position of each word, or the distance between different words in the sequence. The intuition here is that adding these values to the embeddings provides meaningful distances between the embedding vectors once they’re projected into Q/K/V vectors and during dot-product attention.

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io

#### Annotation 4884599541004

 #has-images #nlp #reading-group #transformer #unfinished To give the model a sense of the order of the words, we add positional encoding vectors -- the values of which follow a specific pattern. If we assumed the embedding has a dimensionality of 4, the actual positional encodings would look like this:

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io

#### Annotation 4884602424588

 #nlp #reading-group #transformer #unfinished The formula for positional encoding is described in the paper (section 3.5). You can see the code for generating positional encodings in get_timing_signal_1d(). This is not the only possible method for positional encoding. It, however, gives the advantage of being able to scale to unseen lengths of sequences (e.g. if our trained model is asked to translate a sentence longer than any of those in our training set).

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
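
For reference, the sinusoidal formula from section 3.5 of the paper is PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The sketch below implements that formula directly; it is not the `get_timing_signal_1d()` code, the function name is hypothetical, and it assumes an even `d_model`. Other implementations may arrange the sine and cosine columns differently.

```python
import numpy as np

def sinusoidal_positional_encoding(n_positions, d_model):
    """Return an (n_positions, d_model) array; d_model is assumed to be even."""
    positions = np.arange(n_positions)[:, None]    # pos, as a column
    even_dims = np.arange(0, d_model, 2)[None, :]  # the values 2i
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions use cosine
    return pe

pe = sinusoidal_positional_encoding(n_positions=5, d_model=4)
print(pe.round(3))
```

Because the values depend only on the position index, the same function can be evaluated for positions longer than any sequence seen in training.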

#### Annotation 4884603997452

 #has-images #nlp #reading-group #transformer #unfinished One detail in the architecture of the encoder that we need to mention before moving on, is that each sub-layer (self-attention, ffnn) in each encoder has a residual connection around it, and is followed by a layer-normalization step.

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
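
A minimal sketch of "residual connection followed by layer normalization", assuming a simplified layer norm without the learned gain and bias that real implementations usually include. The helper names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each vector (last axis) to zero mean and unit variance.
    # Assumption: no learned scale/shift parameters in this sketch.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def sublayer_with_residual(x, sublayer):
    # Residual connection around the sub-layer, then layer normalization.
    return layer_norm(x + sublayer(x))

x = np.array([[1.0, 2.0, 3.0, 4.0]])
# Stand-in for self-attention or the feed-forward network.
out = sublayer_with_residual(x, lambda v: 0.5 * v)
print(out)
```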

#### Annotation 4884607143180

 #nlp #reading-group #transformer #unfinished After finishing the encoding phase, we begin the decoding phase. Each step in the decoding phase outputs an element from the output sequence (the English translation sentence in this case).

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io

#### Annotation 4884608716044

 #nlp #reading-group #transformer #unfinished The output of each step is fed to the bottom decoder in the next time step, and the decoders bubble up their decoding results just like the encoders did. And just like we did with the encoder inputs, we embed and add positional encoding to those decoder inputs to indicate the position of each word.

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
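
The step-by-step feeding of outputs back into the decoder can be sketched as a simple loop. Everything here is a toy stand-in: `next_token` is a hypothetical callable replacing the whole decoder stack, and the start and end symbols are assumed names.

```python
def greedy_decode(next_token, start="<s>", end="</s>", max_len=10):
    """Generate tokens one step at a time until the end symbol appears."""
    output = [start]
    while len(output) < max_len:
        token = next_token(output)  # the decoder sees everything produced so far
        output.append(token)
        if token == end:            # special symbol: decoding is complete
            break
    return output[1:]

# Hypothetical stand-in for the decoder: a lookup keyed on the last token.
toy_table = {"<s>": "I", "I": "am", "am": "a", "a": "student", "student": "</s>"}
result = greedy_decode(lambda prefix: toy_table[prefix[-1]])
print(result)
```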

#### Flashcard 4884610288908

Tags
#DataScience #machineLearning #python
Question
NLP with scikit-learn: ___ is used to convert text data into fixed-size numerical feature vectors.
Bag of words
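
To make the idea concrete, here is a small pure-Python sketch of a bag-of-words encoding. It is a simplified illustration with whitespace tokenization, not scikit-learn's implementation (there, `CountVectorizer` plays this role); the function name is hypothetical.

```python
from collections import Counter

def bag_of_words(documents):
    """Return a sorted vocabulary and one count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Every vector has the same length: one slot per vocabulary word.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

vocab, vectors = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)
print(vectors)
```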

status measured difficulty not learned 37% [default] 0

#### Annotation 4884610813196

 #nlp #reading-group #transformer #unfinished The self attention layers in the decoder operate in a slightly different way than the one in the encoder: In the decoder, the self-attention layer is only allowed to attend to earlier positions in the output sequence. This is done by masking future positions (setting them to -inf) before the softmax step in the self-attention calculation. The “Encoder-Decoder Attention” layer works just like multiheaded self-attention, except it creates its Queries matrix from the layer below it, and takes the Keys and Values matrix from the output of the encoder stack.

Alammar-2018-The_Illustrated_Transformer-jalammar,github,io
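
The masking step can be sketched in NumPy as below: scores for future positions are set to -inf before the softmax, so they receive zero attention weight. The function name is hypothetical, and the input is a single toy score matrix rather than real attention scores.

```python
import numpy as np

def causal_attention_weights(scores):
    """scores: (seq, seq) raw attention scores; row i may only attend to positions <= i."""
    seq_len = scores.shape[-1]
    # True above the diagonal, i.e. at future positions.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    masked = np.where(future, -np.inf, scores)
    # Softmax over each row; exp(-inf) evaluates to 0.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)

weights = causal_attention_weights(np.zeros((3, 3)))
print(weights)
```

With all-zero scores, each row spreads its weight evenly over the current and earlier positions only.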

#### Annotation 4884618153228

 Pasteur's quadrant is a classification of scientific research projects that seek fundamental understanding of scientific problems, while also having immediate use for society. Louis Pasteur's research is thought to exemplify this type of method, which bridges the gap between "basic" and "applied" research.[1] The term was introduced by Donald E. Stokes in his book, Pasteur's Quadrant.

#### Annotation 4884619726092

#has-images
Applied and Basic research

| Quest for fundamental understanding? | Considerations of use? No | Considerations of use? Yes |
|---|---|---|
| Yes | Pure basic research | Use-inspired basic research |
| No | | Pure applied research |

The result is three distinct classes of research:

1. Pure basic research, exemplified by the work of Niels Bohr, early 20th century atomic physicist.
2. Pure applied research, exemplified by the work of Thomas Edison, inventor.
3. Use-inspired basic research, described here as "Pasteur's Quadrant".

#### Flashcard 4884626017548

Tags
#machine-learning #management #software-engineering #unfinished
Question
we note that in the terminology of Pasteur’s Quadrant, [11] we do [...] (CS) research.
“use-inspired basic” and “pure applied”

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

we note that in the terminology of Pasteur’s Quadrant, [11] we do “use-inspired basic” and “pure applied” (CS) research.

#### Flashcard 4884627590412

Question

In this sentence, what does "promote" mean?

All these artifacts have to be managed, versioned and promoted through different stages until they’re deployed to production.

The act of copying file content from a less controlled location into a more controlled location.

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Not only do we have to manage the software code artifacts but also the data sets, the machine learning models, and the parameters and hyperparameters used by such models. All these artifacts have to be managed, versioned and promoted through different stages until they’re deployed to production.

#### Original toplevel document

Sato,Wider,Windheuser_2019_Continuous-delivery_thoughtworks

#### Annotation 4884629949708

 #knowledge-base-construction #machine-learning #nlp #unfinished At the core of Alexandria is a probabilistic program that defines a process of generating text from a knowledge base consisting of a large set of typed entities.

#### Annotation 4884632309004

 #knowledge-base-construction #machine-learning #nlp #unfinished By applying probabilistic inference to this program, we can reason in the inverse direction: going from text back to facts.

#### Annotation 4884633881868

 #knowledge-base-construction #machine-learning #nlp #unfinished The use of a probabilistic program also provides an elegant way to handle the uncertainty inherent in natural text.

#### Annotation 4884635454732

 #knowledge-base-construction #machine-learning #nlp #unfinished An important advantage of using a generative model is that Alexandria does not require labelled data, which means it can be applied to new domains with little or no manual effort. The model is also inherently task-neutral – by varying which variables in the model are observed and which are inferred, the same model can be used for: learning a schema (relation discovery), entity discovery, entity linking, fact retrieval and other tasks, such as finding sources that support a particular fact.

#### Annotation 4884637027596

 #knowledge-base-construction #machine-learning #nlp #unfinished In this paper we demonstrate schema learning, fact retrieval, entity discovery and entity linking. We will evaluate the former two tasks, while the latter two are performed as part of these main tasks.

#### Annotation 4884640435468

 #knowledge-base-construction #machine-learning #nlp #unfinished An attractive aspect of our approach is that the entire system is defined by one coherent probabilistic model. This removes the need to create and train many separate components such as tokenizers, named entity recognizers, part-of-speech taggers, fact extractors, linkers and so on; a disadvantage of having such multiple components is that they are likely to encode different underlying assumptions, reducing the accuracy of the combined system. Furthermore, the use of a single probabilistic program allows uncertainty to be propagated consistently throughout the system – from the raw web text right through to the extracted facts (and back).

#### Flashcard 4885281901836

Question
congenital

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885282950412

Question
RNA Polymerase in Eukaryotes

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885284785420

Question
RNA polymerase in eukaryotes
• RNA polymerase I makes rRNA, the most common (rampant) type; present only in the nucleolus.
• RNA polymerase II makes mRNA (massive), microRNA (miRNA), and small nuclear RNA (snRNA).
• RNA polymerase III makes 5S rRNA and tRNA (tiny).
• No proofreading function, but can initiate chains. RNA polymerase II opens DNA at the promoter site.
• I, II, and III are numbered in the same order that their products are used in protein synthesis: rRNA, mRNA, then tRNA.
• α-amanitin, found in Amanita phalloides (death cap mushrooms), inhibits RNA polymerase II. Causes severe hepatotoxicity if ingested.
• Actinomycin D, also called dactinomycin, inhibits RNA polymerase in both prokaryotes and eukaryotes.

status measured difficulty not learned 37% [default] 0

#### Annotation 4885287144716

 #machine-learning #management #software-engineering #unfinished Because of the time frame and effort involved, Google’s approach to research is iterative and usually involves writing production, or near-production, code from day one.

#### Annotation 4885288717580

 #machine-learning #management #software-engineering #unfinished Typically, a single team iteratively explores fundamental research ideas, develops and maintains the software, and helps operate the resulting Google services, all driven by real-world experience and concrete data.

#### Annotation 4885290290444

 #machine-learning #management #software-engineering #unfinished This approach also helps ensure the research efforts produce results that benefit Google’s users, by allowing research ideas and implementations to be honed on empirical data and real-world constraints, and by utilizing even failed efforts to gather valuable data and statistics for further attempts.

#### Annotation 4885291863308

 #machine-learning #management #software-engineering #unfinished Google’s mission “To organize the world’s information and make it universally accessible and useful,”

#### Annotation 4885293436172

 #machine-learning #management #software-engineering #unfinished Even a small team has at its disposal the power of many internal services, allowing the team to quickly create complex and powerful products and services. Design, testing, production, and maintenance processes are simplified.

#### Annotation 4885295009036

 #machine-learning #management #software-engineering #unfinished Google has been able to hire a talented team across the entire engineering operation. This gives us the opportunity to innovate everywhere, and for people to move between projects, whether they be primarily research or primarily engineering.

#### Flashcard 4885296581900

Tags
#DataScience #machineLearning
Question
NLP. choice of model for supervised and unsupervised.

Supervised
Models predict the outcome of new observations and datasets, and classify
documents based on the features and response of a given dataset.
Examples: Naïve Bayes, SVM, linear regression, k-nearest neighbors (k-NN)

Unsupervised
Models identify patterns in the data and extract its structure.
They are also used to group documents using clustering algorithms.
Example: K-means

status measured difficulty not learned 37% [default] 0

#### Annotation 4885298416908

 #machine-learning #management #software-engineering #unfinished We recognize that the wide dissemination of fundamental results often benefits us by garnering valuable feedback, educating future hires, providing collaborations, and seeding additional work.

#### Flashcard 4885299989772

Tags
#DataScience #machineLearning
Question

NLP. most basic technique for classification of text.

Uses:

Naïve Bayes Classifier

• It is efficient as it uses limited CPU and memory.
• It is fast as the model training takes less time.

Uses:
• Naïve Bayes is used for sentiment analysis, email spam detection, categorization of documents, and language detection.
• Multinomial Naïve Bayes is used when multiple occurrences of the words matter.
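
As a rough sketch of how a multinomial Naïve Bayes text classifier works, the pure-Python example below counts words per class and scores a new document with log-probabilities. The function names, the toy training data, and the add-one smoothing parameter `alpha` are assumptions for illustration; scikit-learn provides a production implementation as `MultinomialNB`.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab, alpha

def predict(model, doc):
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in doc.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Repeated words add their log-likelihood once per occurrence.
                score += math.log(
                    (word_counts[label][word] + alpha)
                    / (total_words + alpha * len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_multinomial_nb(
    ["win money now", "cheap money offer", "meeting at noon", "lunch meeting tomorrow"],
    ["spam", "spam", "ham", "ham"],
)
print(predict(model, "money offer now"), predict(model, "lunch meeting"))
```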

status measured difficulty not learned 37% [default] 0

#### Annotation 4885300514060

 #machine-learning #management #software-engineering #unfinished Even if we cannot fully factorize work, we have sometimes undertaken longer-term efforts. For example, we have started multiyear, large systems efforts (including Google Translate, Chrome, Google Health) that have important research components. These projects were characterized by the need for complex systems and research (such as Web-scale identification of parallel corpora for Translate [12] and various complex security features in Chrome [9] and Health). At the same time, we have recently shown that even in longer-term, publicly launched efforts, we are unafraid to refocus our work (for example, Google Health), if it seems we are not achieving success.

#### Annotation 4885302873356

 #machine-learning #management #software-engineering #unfinished this approach benefits from the mainly evolutionary nature of CS research, where great results are usually the composition of many discrete steps.

#### Annotation 4885304446220

 #machine-learning #management #software-engineering #unfinished we have structured the Google environment as one where new ideas can be rapidly verified by small teams through large-scale experiments on real data, rather than just debated.

#### Annotation 4885306019084

 #machine-learning #statistics #unfinished First, studies often apply cross-validation on a subset of data subsampled from the original dataset. Performing this kind of preprocessing, in a machine learning context, without any kind of argumentation, raises doubts as it drastically increases the variance of the obtained results and avoids the problem of imbalanced data, which does not reflect reality in terms of potential applications

#### Annotation 4885307591948

 #machine-learning #statistics #unfinished Finally, there are many studies applying oversampling before partitioning the data into two mutually exclusive sets in order to make the distribution of classes more uniform
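
A likely concern with oversampling before partitioning is that copies of the same minority sample can end up on both sides of the split. The sketch below shows the alternative order, splitting first and oversampling only the training partition. It is an illustrative assumption, not the procedure of any cited study; the function name, the random oversampling strategy, and the toy data are hypothetical.

```python
import numpy as np

def split_then_oversample(X, y, test_fraction=0.25, seed=0):
    """Hold out a test set first, then balance classes inside the training set only."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))
    n_test = int(len(y) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Randomly duplicate training samples until every class matches the largest one.
    classes, counts = np.unique(y[train_idx], return_counts=True)
    target = counts.max()
    resampled = [train_idx]
    for cls, count in zip(classes, counts):
        cls_idx = train_idx[y[train_idx] == cls]
        resampled.append(rng.choice(cls_idx, size=target - count, replace=True))
    train_idx = np.concatenate(resampled)
    return train_idx, test_idx

X = np.arange(24).reshape(24, 1)
y = np.array([0] * 18 + [1] * 6)
train_idx, test_idx = split_then_oversample(X, y)
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(np.unique(y_train, return_counts=True))
```

The test partition keeps the original class imbalance, so the evaluation still reflects the distribution seen in practice.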

#### Flashcard 4885309689100

Tags
#DataScience #machineLearning
Question
Document classifiers can have many parameters, and a __ approach helps search for the best parameters
for training the model and predicting the outcome accurately.
Grid Search
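
The idea behind grid search can be sketched in a few lines of pure Python: try every combination of candidate parameter values and keep the best-scoring one. The parameter names `alpha` and `ngram_max` and the toy scoring function are hypothetical; in practice the score would come from cross-validating a real model, and scikit-learn offers this as `GridSearchCV`.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every parameter combination and return the best."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Hypothetical stand-in for a cross-validated accuracy; peaks at alpha=0.1, ngram_max=2.
    return -((params["alpha"] - 0.1) ** 2) - (params["ngram_max"] - 2) ** 2

best_params, best_score = grid_search(
    {"alpha": [0.01, 0.1, 1.0], "ngram_max": [1, 2, 3]}, toy_score
)
print(best_params)
```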

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885311524108

Tags
#DataScience #machineLearning
Question
What is the tf-idf value in a document?

The tf-idf value reflects how important a word is to a document.

It is directly proportional to the number of times the word appears in the document.

It is offset by the frequency of the word in the corpus.
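
A small pure-Python sketch of the textbook formula, tf-idf = (term frequency in the document) × log(number of documents / number of documents containing the term). This is one common variant chosen for illustration; libraries such as scikit-learn use smoothed versions, so their numbers differ. The function name and toy corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: tf-idf score} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each word occurs at least once.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

scores = tf_idf(["the cat sat", "the dog barked", "the cat purred"])
print(scores[0])
```

A word that appears in every document ("the") scores zero, while a word unique to one document ("sat") scores highest there.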

status measured difficulty not learned 37% [default] 0

#### Annotation 4885312048396

 #machine-learning #statistics #unfinished they might be rather optimistic due to the fact that the evaluation happened in a leave-one-out scheme.

#### Annotation 4885314407692

 #machine-learning #statistics #unfinished While this subsampling strategy again avoids the problem of imbalanced data, which is reflected in the original dataset, it does show an improvement in AUC and thus indicates that adding the MEMD-based feature to the dataset could be beneficial for the predictive performance. Moreover, due to the many repetitions of the experiment, the sample mean better reflects the real mean.

#### Flashcard 4885316504844

Tags
#DataScience #python
Question
Python’s data visualization library
matplotlib

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885318339852

Tags
#DataScience
Question
create a plot using four simple steps.
Step 01: Import the required libraries
Step 02: Define or import the required dataset
Step 03: Set the plot parameters
Step 04: Display the created plot
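
The four steps might look like this with matplotlib. The data values and the title are made up for the example, and the `Agg` backend with `savefig` is an assumption so that the sketch runs without a display; in a notebook you would typically call `plt.show()` instead.

```python
import io

# Step 01: import the required libraries
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

# Step 02: define the dataset (toy values)
x = [0, 1, 2, 3, 4, 5]
y = [0, 1, 2, 3, 4, 5]

# Step 03: set the plot parameters
fig = plt.figure()
(line,) = plt.plot(x, y, alpha=0.5)  # alpha controls line transparency
plt.axis([0, 6, 0, 6])               # [xmin, xmax, ymin, ymax]
plt.title("Toy line plot")

# Step 04: display the plot (here: render it to an in-memory PNG)
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```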

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885320174860

Tags
#DataScience #python
Question
matplotlib, subplot syntax

subplot(m,n,p).

It divides the current window into an m-by-n grid and creates an axis for a subplot in the position specified by p.
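
A short sketch of the `subplot(m, n, p)` numbering, assuming a 2-by-2 grid: `p` starts at 1 in the upper-left cell and increases row by row. The `Agg` backend and the spacing values passed to `plt.subplots_adjust` are assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

fig = plt.figure()
axes = []
for p in range(1, 5):          # p = 1..4 fills the grid row by row
    ax = plt.subplot(2, 2, p)  # m=2 rows, n=2 columns, position p
    ax.set_title(f"p = {p}")
    axes.append(ax)

# Widen the gaps between subplots so the titles do not overlap.
plt.subplots_adjust(wspace=0.4, hspace=0.6)
```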

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885322009868

Tags
#DataScience #python
Question
matplotlib: which method is used to adjust the distances between the subplots?
plt.subplots_adjust()

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885323844876

Tags
#DataScience #python
Question
What is Seaborn?
Seaborn is a Python visualization library based on matplotlib.
It provides a high-level interface to draw attractive statistical graphics.

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885325679884

Tags
#DataScience #python
Question
To import matplotlib and display the plot on Jupyter notebook use:

import matplotlib.pyplot as plt

%matplotlib inline

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885327514892

Tags
#DataScience #python
Question
Which keyword is used to set the transparency of the plot line (in matplotlib)?
alpha

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885329349900

Tags
#DataScience #python
Question
Which matplotlib statement limits both x and y axes to the interval [0, 6]?
plt.axis([0, 6, 0, 6]) statement limits both x and y axes to the interval [0, 6].

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885331184908

Tags
#DataScience #machineLearning
Question
What is Web Scraping
Web scraping is a computer software technique of extracting information from websites in an automated fashion.

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885333019916

Tags
#DataScience #machineLearning
Question
Web Scraping Process
Step 1: A web request is sent to the targeted website to collect the required data.
Step 2: The information is retrieved from the targeted website in HTML or XML format from web.
Step 3: The retrieved information is passed to one of several parsers, depending on the data format.
Parsing is a technique to read data and extract information from the available document.
Step 4: The parsed data is stored in the desired format.
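
The parse-and-store steps can be sketched with the Python standard library alone. The HTML string below is a hard-coded stand-in for the response of steps 1 and 2 (a real request could be made with, for example, `urllib.request.urlopen`), and the class name and the choice of JSON as the storage format are assumptions.

```python
import json
from html.parser import HTMLParser

class ListItemParser(HTMLParser):
    """Collect the text of every <li> element."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data

# Steps 1-2 (stand-in): HTML that would normally come back from the web request.
html = "<html><body><ul><li>alpha</li><li>beta</li></ul></body></html>"

# Step 3: parse the retrieved document.
parser = ListItemParser()
parser.feed(html)

# Step 4: store the parsed data in the desired format (JSON here).
stored = json.dumps(parser.items)
print(stored)
```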

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885334854924

Tags
#DataScience #machineLearning
Question
Web Scraping Considerations (legal), what to look for
Legal Constraints
Notice
Patented Information

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885336689932

Tags
#DataScience #machineLearning
Question
web scraping tree structure
html > div > ul > li > div class

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885338524940

Tags
#DataScience #machineLearning #python
Question
web scraping. The ___ function searches a tag’s descendants and retrieves all that match your filters.
find_all()

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885340359948

Tags
#DataScience #machineLearning #python
Question
web scraping. To find one result, use

find().

Returns only the first match value

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885342194956

Tags
#DataScience #machineLearning #python
Question
web scraping. The method get_text() is used to _________.
extract the human-readable text from a document or tag.

status measured difficulty not learned 37% [default] 0

#### Flashcard 4885344029964

Tags
#DataScience #machineLearning #python
Question
web scraping.
navigate down:
Navigating Up:
Navigating Sideways:
Navigating Back and Forth:

web scraping.
navigate down:
• .contents and .children
• .descendants
• .string
• .strings and .stripped_strings

Navigating Up:
.parents and .parent

Navigating Sideways:
.next_sibling and
.previous_sibling.

Navigating Back and Forth:
.next_element and .previous_element
.next_elements and .previous_elements
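
The search and navigation calls from these cards can be tried on a tiny document. This sketch assumes the Beautiful Soup package (`bs4`) is installed; the HTML string and the class name `item` are made up, and the markup is written without whitespace between tags so that sibling navigation lands on tags rather than on text nodes.

```python
from bs4 import BeautifulSoup

html = (
    "<html><body><div><ul>"
    "<li class='item'>one</li><li class='item'>two</li>"
    "</ul></div></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

all_items = soup.find_all("li", class_="item")  # every matching descendant
first_item = soup.find("li")                    # only the first match

item_text = first_item.get_text()               # text inside the tag
parent_name = first_item.parent.name            # navigating up
sibling_text = first_item.next_sibling.get_text()          # navigating sideways
child_names = [child.name for child in soup.ul.children]   # navigating down

print(len(all_items), item_text, parent_name, sibling_text, child_names)
```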

status measured difficulty not learned 37% [default] 0

#### Annotation 4886699052300

 Some civil-service exam questions have required item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. The complete set of financial statements includes: (a) balance sheet as at the end of the period; (b) income statement for the period; (ba) statement of comprehensive income for the period; (c) statement of changes in equity for the period; (d) statement of cash flows for the period; (da) statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado (Statement of Value Added), if required by law or by a regulatory body, or even if presented voluntarily; (e) explanatory notes, comprising the significant accounting policies and other explanatory information; (ea) comparative information for the preceding period, as specified in items 38 and 38A;

#### Flashcard 4886700625164

Question
Some civil-service exam questions have required item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. The complete set of financial statements includes: (a) [...] as at the end of the period; (b) income statement for the period; (ba) statement of comprehensive income for the period; (c) statement of changes in equity for the period; (d) statement of cash flows for the period; (da) statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado (Statement of Value Added), if required by law or by a regulatory body, or even if presented voluntarily; (e) explanatory notes, comprising the significant accounting policies and other explanatory information; (ea) comparative information for the preceding period, as specified in items 38 and 38A;
balance sheet

status measured difficulty not learned 37% [default] 0

#### Flashcard 4886702198028

Question
Some public-examination questions have tested item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. The complete set of financial statements comprises: (a) balance sheet at the end of the period; (b) income statement [...]; (ba) statement of comprehensive income for the period; (c) statement of changes in equity for the period; (d) statement of cash flows for the period; (da) statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado, if required by law or by a regulatory body, or if presented voluntarily; (e) explanatory notes, comprising the significant accounting policies and other explanatory information; (ea) comparative information for the preceding period, as specified in items 38 and 38A;
for the period

status measured difficulty not learned 37% [default] 0

#### Flashcard 4886703770892

Question
Some public-examination questions have tested item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. The complete set of financial statements comprises: (a) balance sheet at the end of the period; (b) income statement for the period; (ba) statement of [...] income for the period; (c) statement of changes in equity for the period; (d) statement of cash flows for the period; (da) statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado, if required by law or by a regulatory body, or if presented voluntarily; (e) explanatory notes, comprising the significant accounting policies and other explanatory information; (ea) comparative information for the preceding period, as specified in items 38 and 38A;
comprehensive

status measured difficulty not learned 37% [default] 0

#### Flashcard 4886705343756

Question
Some public-examination questions have tested item 10 of NBC TG 26 (CPC 26), which deals with the “complete set of financial statements”. The item is transcribed below: Complete set of financial statements 10. The complete set of financial statements comprises: (a) balance sheet at the end of the period; (b) income statement for the period; (ba) statement of comprehensive income for the period; (c) statement of [...] in equity for the period; (d) statement of cash flows for the period; (da) statement of value added for the period, in accordance with NBC TG 09 – Demonstração do Valor Adicionado, if required by law or by a regulatory body, or if presented voluntarily; (e) explanatory notes, comprising the significant accounting policies and other explanatory information; (ea) comparative information for the preceding period, as specified in items 38 and 38A;
changes

status measured difficulty not learned 37% [default] 0

#### Annotation 4886731296012

 assumes independence between the posterior distribution of the parameters associated with segments of data between successive changepoints

#### Annotation 4886767734028

 generalisation of that suggested by Liu and Lawrence (1999)

#### Annotation 4886769306892

We consider two classes of prior for the changepoint process. One, that of Green (1995), involves a prior on the number of changepoints, and then a conditional prior on their position. The other is based on modelling the changepoint process by a point process (Pievatolo and Green, 1998), and is a special case of a product-partition model (Hartigan, 1990).

#### Annotation 4886771666188

 we assume that, conditional on the realisation of the changepoint process, the joint posterior distribution of the parameters is independent across the segments of the time series
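
One consequence of this conditional independence is that, once the changepoints are fixed, the marginal likelihood of the whole series factorises into a product of per-segment terms. The sketch below illustrates that factorisation under assumptions that are not in the annotation: Bernoulli observations, a Beta(a, b) prior on each segment's success probability, and hypothetical helper names (`log_segment_marginal`, `log_marginal_given_changepoints`).

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_segment_marginal(segment, a=1.0, b=1.0):
    """Log marginal likelihood of one 0/1 segment under a Beta(a, b) prior (assumed model)."""
    ones = sum(segment)
    zeros = len(segment) - ones
    return log_beta(a + ones, b + zeros) - log_beta(a, b)


def log_marginal_given_changepoints(y, changepoints, a=1.0, b=1.0):
    """Sum of per-segment log marginals.

    `changepoints` holds the sorted 0-based indices at which a new segment
    starts (each strictly between 0 and len(y)).
    """
    bounds = [0] + list(changepoints) + [len(y)]
    return sum(
        log_segment_marginal(y[lo:hi], a, b)
        for lo, hi in zip(bounds, bounds[1:])
    )
```

Because the segments are independent given the changepoints, the log marginals simply add; no cross-segment terms are needed.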

#### Annotation 4886773239052

 assume a conjugate prior for the parameters associated with each segment

#### Annotation 4886775598348

For a data set consisting of observations at discrete times, 1,...,n, the recursions are based on calculating the probability of the data from time t to time n, given a changepoint at time t, in terms of the equivalent probabilities at times t + 1,...,n.
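
A backward recursion of this kind can be sketched as follows. This is only an illustration of the idea, not the annotated paper's exact formulation, and it rests on several assumptions: Bernoulli data with a Beta(a, b) prior per segment, segment lengths drawn from a geometric distribution with parameter `p`, 0-based indexing in which `log_q[t]` is the log probability of `y[t:]` given that a new segment starts at `t`, and hypothetical function names throughout.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def backward_log_q(y, p=0.1, a=1.0, b=1.0):
    """Return log_q, where log_q[t] = log Pr(y[t:] | a segment starts at t).

    Assumed model: 0/1 data, Beta(a, b) prior per segment, and geometric
    segment lengths with pmf g(k) = p * (1 - p) ** (k - 1), k >= 1.
    """
    n = len(y)
    cum = [0]
    for v in y:
        cum.append(cum[-1] + v)  # prefix sums give the count of ones per segment

    log_q = [0.0] * n
    for t in range(n - 1, -1, -1):  # work backwards from the last time point
        terms = []
        for s in range(t, n):  # s is the last index of the segment starting at t
            length = s - t + 1
            ones = cum[s + 1] - cum[t]
            log_p = log_beta(a + ones, b + length - ones) - log_beta(a, b)
            log_survive = (length - 1) * math.log1p(-p)
            if s < n - 1:
                # Segment ends exactly at s; the rest is covered by log_q[s + 1].
                terms.append(log_p + math.log(p) + log_survive + log_q[s + 1])
            else:
                # Segment runs to the end of the data: its length is at least `length`.
                terms.append(log_p + log_survive)
        log_q[t] = logsumexp(terms)
    return log_q
```

`backward_log_q(y)[0]` is then the log marginal likelihood of the whole series with the changepoints summed out, at a cost that is quadratic in `len(y)` for this naive version.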

#### Annotation 4886777171212

 The assumption of conjugate priors can potentially be relaxed, but with an increase in the computational cost. Essentially, low-dimensional integrals that can be calculated analytically under conjugate priors would need to be calculated numerically (for example see Section 4.2).
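
To make the trade-off concrete, the sketch below computes the same one-dimensional segment marginal likelihood twice: analytically, using conjugacy, and numerically, with a midpoint rule. The model (Bernoulli data with a Beta(a, b) prior), the grid size and the function names are assumptions chosen for illustration; the annotation does not specify them.

```python
import math


def analytic_marginal(ones, zeros, a=1.0, b=1.0):
    """Closed-form marginal likelihood B(a + ones, b + zeros) / B(a, b)."""
    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

    return math.exp(log_beta(a + ones, b + zeros) - log_beta(a, b))


def numerical_marginal(ones, zeros, a=1.0, b=1.0, grid=2000):
    """Midpoint-rule integral of likelihood times Beta(a, b) prior density over (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        theta = (i + 0.5) * h  # midpoints stay strictly inside (0, 1)
        log_f = (
            (a + ones - 1) * math.log(theta)
            + (b + zeros - 1) * math.log(1.0 - theta)
            - log_norm
        )
        total += math.exp(log_f)
    return total * h
```

The analytic version costs a handful of `lgamma` calls per segment, whereas the numerical version costs `grid` density evaluations per segment, which is the kind of extra computation the passage refers to.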

#### Annotation 4886779530508

 Relaxation of the independence assumption is more difficult, but our algorithm can still be used as a useful tool for analysing such data.

#### Flashcard 4886852406540

Tags
#has-images

status measured difficulty not learned 37% [default] 0
