Brief summaries of talks
54.
23.1.1997
V.L.Surin
Scientific Center of Haematology, Russian Acad. Med. Sci.
Non-traditional variants of PCR: Application to
oncology and phylogenetic studies
PCR with random primers is applied to the analysis of the
molecular mechanisms of chromosomal translocation t(9;22), which causes
a number of haematological malignancies. Biprimer RAPD-PCR systems
are used to establish phylogenetic relationships between different
animal species.
55.
6.2.1997
G.K. Frank, V.Ju. Makeev
Institute of Molecular Biology
G and T nucleotide contents show species-invariant
negative correlation for all three codon positions
The nucleotide contents of the three codon positions
show a number of statistical pairwise correlations, some of which
are universal for all analyzed genomes. Among the most prominent
of these correlations are negative correlations between G and T
contents found in genes of all species analyzed. The pair A/C,
which is complementary to G/T, shows a similar negative correlation
in genes of most species. In the genes of several species, including
all mammalian genes studied, positive correlations between A and
T contents and between G and C contents are found. Since these regularities
are observed in all three codon positions, they are connected with the
amino acid composition of proteins. Such correlations may originate from
features of the mutation process and/or from reading-frame control
during translation. The well-known preference for G in the first
codon position and its deficiency in the second are accompanied by
an opposite bias in T content. In the third codon position there is
no general nucleotide preference, but its composition is often biased
in line with the GC content of the gene. G and T contents in this
case are always shifted in opposite directions. Several ideas
are put forward to explain this preference. [Frank & Makeev, 1997].
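A minimal sketch of how such a correlation might be measured, assuming that G and T contents are computed per gene at a chosen codon position and then correlated across the genes of one genome with a Pearson coefficient. The function names are hypothetical and the statistic is our assumption; the abstract does not state which correlation measure the authors used.

```python
import math

def position_content(cds, position, nucleotide):
    """Fraction of codons in a coding sequence with `nucleotide` at `position` (0, 1 or 2)."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    hits = sum(1 for codon in codons if codon[position] == nucleotide)
    return hits / len(codons)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient (assumes neither list is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def gt_correlation(genes, position):
    """Correlation, across genes, between G content and T content at one codon position."""
    g = [position_content(cds, position, "G") for cds in genes]
    t = [position_content(cds, position, "T") for cds in genes]
    return pearson(g, t)
```

On real data `genes` would be the coding sequences of one genome; a negative value for each of positions 0, 1 and 2 corresponds to the species-invariant pattern described above.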
56.
15.5.1997
Nikita Vassetsky
Institute of Molecular Biology
Reconstruction of sequences of ex-functional genes
Eukaryotic genomes often carry sequences that once
coded for genes but were later disrupted by mutations. The only reliable
way to reconstruct the functional sequence, compiling it from numerous
copies, is applicable only to repeated genome elements. We propose two
approaches to approximate the functional gene sequence for non-repeated
defective (but not heavily damaged) genes. The first requires the
sequence of a closely related functional gene. Based on their
alignment, the effect of frameshift mutations in the defective
gene is eliminated so that a putative amino acid sequence can be
deduced. The other approach is based on the structural distinctions
between the coding and non-coding nucleotide sequences (commonly
used to estimate the coding potential of a sequence). The tested
sequence is optimized to be "coding-like", which eliminates
the effect of the frameshift mutations. Of course, these methods
produce no more than a rough approximation to the real functional
sequence. Still, this may suffice to gain valuable information (e.g.,
the produced amino acid sequence may prove similar to a functional
protein, while their original nucleotide sequences are dissimilar).
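For repeated elements, the "compile from numerous copies" route mentioned above can be sketched as a majority-rule consensus, assuming the copies have already been aligned to a common length with "-" as the gap symbol. This illustrates only that baseline, not either of the two approaches proposed in the talk; the function name is hypothetical.

```python
from collections import Counter

def consensus(aligned_copies, gap="-"):
    """Majority-rule consensus of aligned copies; columns where the gap wins are dropped."""
    length = len(aligned_copies[0])
    if any(len(copy) != length for copy in aligned_copies):
        raise ValueError("copies must be aligned to the same length")
    result = []
    for column in zip(*aligned_copies):
        # On ties, Counter.most_common keeps first-encountered order, so the earlier copy wins.
        base, _ = Counter(column).most_common(1)[0]
        if base != gap:
            result.append(base)
    return "".join(result)
```

Because independent mutations in different copies rarely hit the same column, the majority base tends to recover the ancestral one; a single non-repeated gene offers no such redundancy, which is why the talk needs other approaches.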
58.
5.6.1997
Igor Rogozin
CNR - Istituto Tecnologie Biomediche Avanzate; Institute
of Cytology and Genetics, Novosibirsk
Analysis of donor splice sites in different eukaryotic
organisms
We present here a new approach to functional site
analysis. It is based on four main assumptions: each variant of
nucleotide composition makes a different contribution to the overall
binding free energy of the interaction between the functional site and another
molecule; non-functional site-like regions (pseudosites) are absent
or rare in the genome; the sample of sites may contain errors; and
nucleotides at different site positions are considered mutually
dependent. In this algorithm, the site set is divided into subsets,
each described by a certain consensus. Donor splice sites of
human protein-coding genes were analyzed. Comparison with other
methods of donor splice site prediction showed that the consensus
sequences AG/GU(A,G), G/GUnAG, /GU(A,G)AG, /GU(A,G)nGU, G/GUA predict
sites more accurately than a weight matrix or the consensus
(A,C)AG/GU(A,G)AGU with mismatches. The probability
of the type I error E1 for the obtained consensus set was about
0.05, and the probability of the type II error E2 was 0.15.
The analysis demonstrated that the accuracy of functional site prediction
could be improved by taking into account correlations between
site positions. The accuracy of prediction using human consensus
sequences was tested on sequences from different organisms. Some
differences in consensus sequences were revealed for the plant Arabidopsis sp.,
the invertebrate Caenorhabditis sp. and the fungus Aspergillus sp.
For the yeast Saccharomyces sp., only one conserved
consensus, /GUA(U,A,C)G(U,A,C), was revealed (E1=0.03, E2=0.03). Yeast
can be suggested as a particularly interesting model for analysis of the molecular
mechanisms of splicing.
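A minimal sketch of how a set of degenerate consensuses such as the human set above could be applied and scored. It assumes that "/" marks the exon/intron boundary, that "(A,G)" lists allowed bases, and that "n" is any base. It also reads E1 as the fraction of true sites missed and E2 as the fraction of pseudosites accepted, which is our interpretation of the two error types. The site data are hypothetical, and this is not the subset-partitioning algorithm of the talk, only the matching and error counting.

```python
import re

# Consensus set quoted in the abstract for human donor sites (RNA alphabet).
HUMAN_SET = ["AG/GU(A,G)", "G/GUnAG", "/GU(A,G)AG", "/GU(A,G)nGU", "G/GUA"]

def parse_part(pattern):
    """Split a consensus half such as 'GU(A,G)nGU' into per-position sets of allowed bases."""
    allowed = []
    for token in re.findall(r"\([A-Z,]+\)|[A-Za-z]", pattern):
        if token.startswith("("):
            allowed.append(set(token[1:-1].split(",")))
        elif token == "n":
            allowed.append(set("ACGU"))
        else:
            allowed.append({token})
    return allowed

def matches(consensus, exon, intron):
    """True if the exon end and the intron start around a boundary fit the consensus."""
    left, right = consensus.split("/")
    left_sets, right_sets = parse_part(left), parse_part(right)
    if len(left_sets) > len(exon) or len(right_sets) > len(intron):
        return False
    exon_tail = exon[len(exon) - len(left_sets):]
    return (all(base in s for base, s in zip(exon_tail, left_sets))
            and all(base in s for base, s in zip(intron, right_sets)))

def error_rates(consensus_set, true_sites, pseudosites):
    """Return (E1, E2): share of true sites missed, share of pseudosites accepted."""
    def predicted(site):
        return any(matches(c, *site) for c in consensus_set)
    e1 = sum(not predicted(s) for s in true_sites) / len(true_sites)
    e2 = sum(predicted(s) for s in pseudosites) / len(pseudosites)
    return e1, e2
```

Each site is passed as an `(exon_end, intron_start)` pair of strings; a site counts as predicted if any consensus of the set matches it, which mirrors the idea of covering the sample with several subsets.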
61.
11.9.1997
Sh. Sunyaev
Institute of Molecular Biology
Towards a statistical model of a protein family: application
to remote homologue hunting
Profile search is one of the most popular methods
of sequence analysis and homologue hunting. Given a multiple alignment
of a protein family, it derives a family profile, i.e., a matrix
of position-dependent scores. This allows one to detect remote homologues
invisible to pairwise alignment routines. However, currently used
methods are purely empirical and do not provide any statistical
model of protein family formation. We present a new semi-empirical
statistical method of profile construction that outperforms existing profile
methods as well as Hidden Markov Models. Further improvement of the
statistical model and possible development of an appropriate alignment
strategy will also be discussed. [Sunyaev et al., 1999 (in Russian); Sunyaev et
al., 1999a; Sunyaev et al., 2000c].
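To make "a matrix of position-dependent scores" concrete, here is a sketch of a conventional log-odds profile built from a gapped multiple alignment. It assumes a uniform background over the 20 amino acids and a flat pseudocount. It is a textbook baseline, not the semi-empirical method presented in the talk; function names are hypothetical.

```python
import math
from collections import Counter

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Log-odds (bits) score for each amino acid at each alignment column."""
    background = 1.0 / len(AMINO)  # assumption: uniform background frequencies
    profile = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment if seq[i] != "-"]
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO)
        profile.append({
            a: math.log2((counts[a] + pseudocount) / total / background)
            for a in AMINO
        })
    return profile

def score(profile, seq):
    """Ungapped score of a sequence placed against the whole profile."""
    return sum(column[a] for column, a in zip(profile, seq))
```

The pseudocount keeps unseen residues from getting a score of minus infinity; how to choose such corrections on statistical rather than empirical grounds is exactly the question the talk addresses.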
62.
25.9.1997
Igor Berezovsky
Institute of Molecular Biology
Hierarchy of amino acid sequence regions according
to their role in the energetics of protein spatial structure
We represent an amino acid sequence by the energy
curve of the interactions between parts of the spatial structure.
Each point of the sequence is assigned the van der Waals interaction
energy between the two regions of the globule that it separates.
We determine positions corresponding to the minimal interaction
between parts of the globule. The zero energy level means independence
of the adjacent regions. Residues corresponding to minima on the
curve are boundaries of structurally independent parts of the globule.
After several iterations with different values of the minimal energy,
we divide the sequence into a hierarchy of structural segments.
We analyze several families of proteins, determine
and compare boundaries of domains and modules, describe the differences
in the contribution of these structural units to maintaining the spatial
structure, and compare the structural division with the results of sequence
alignment.
[Berezovsky et al., 1997; Berezovsky et al., 1999;
Berezovsky et al., 2000a; Berezovsky et al., 2000b].
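A sketch of the energy curve described above, assuming a precomputed symmetric matrix `energy[i][j]` of residue-residue van der Waals energies (a hypothetical input; computing it from coordinates is out of scope). A cut before residue k gets the total energy between residues before k and residues from k on. Boundaries are taken where the absolute energy is a strict local minimum below a threshold, so lowering or raising the threshold corresponds to the different "minimal energy" levels of the iterations.

```python
def interaction_curve(energy):
    """curve[k - 1] = total energy between residues [0, k) and [k, n), for k = 1..n-1."""
    n = len(energy)
    return [
        sum(energy[i][j] for i in range(k) for j in range(k, n))
        for k in range(1, n)
    ]

def boundaries(curve, threshold):
    """Cut positions where |energy| is a strict local minimum and does not exceed threshold."""
    cuts = []
    for idx, value in enumerate(curve):
        mag = abs(value)
        left = abs(curve[idx - 1]) if idx > 0 else float("inf")
        right = abs(curve[idx + 1]) if idx + 1 < len(curve) else float("inf")
        if mag < left and mag < right and mag <= threshold:
            cuts.append(idx + 1)  # the cut lies just before residue idx + 1
    return cuts
```

A toy globule with two tightly packed three-residue blocks joined by one weak contact gives a curve of `[-2, -2, -0.1, -2, -2]` and a single boundary before residue 3.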
63.
2.10.1997
Eugene V. Koonin
National Center for Biotechnology Information, NLM,
NIH
A genomic perspective on protein families
To extract maximum information from the rapidly
accumulating genome sequences, all conserved genes need to be classified
according to their homologous relationships. By comparing proteins
encoded in seven complete genomes (five bacterial, one archaeal and
one eukaryotic) from five major phylogenetic lineages, and by elucidating
consistent patterns of sequence similarities, we delineated 720
Clusters of Orthologous Groups (COGs). Each COG consists of individual,
orthologous proteins or orthologous sets of paralogs from at least
three lineages. Orthologs typically have the same function, allowing
transfer of functional information from one member to an entire
COG. This automatically yields a number of functional predictions
for poorly characterized genomes. The COGs comprise a framework
for functional and evolutionary genome analysis. The COG system
was used to analyze three additional genomes: the giant
symbiotic plasmid of Rhizobium sp., the pathogenic bacterium Helicobacter
pylori, and the nematode Caenorhabditis elegans (~60% of the genome).
This analysis included semi-automatic functional annotation of
the conserved portion of each gene set and identification of common
and rare phylogenetic patterns, which differ significantly between bacteria
and eukaryotes. A systematic survey of conserved families missing
in H. pylori suggests major revisions of the central metabolic pathways
in this bacterium.
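One simple way to formalize "consistent patterns of sequence similarities" across at least three lineages is to look for triangles of mutual best hits between proteins of three different genomes, then merge triangles that share a side. The sketch below assumes a precomputed set of reciprocal best-hit pairs and treats each genome as its own lineage. It is a simplified illustration, not the published COG procedure; the protein and genome labels are hypothetical.

```python
from itertools import combinations

def hit_triangles(best_hits, genome_of):
    """Triples of proteins from three different genomes whose three pairs are all best hits.

    best_hits: set of frozenset({p, q}) pairs; genome_of: protein -> genome label.
    Brute force over all triples, so only suitable for small examples.
    """
    proteins = sorted({p for pair in best_hits for p in pair})
    triangles = []
    for a, b, c in combinations(proteins, 3):
        if len({genome_of[a], genome_of[b], genome_of[c]}) < 3:
            continue
        edges = (frozenset((a, b)), frozenset((b, c)), frozenset((a, c)))
        if all(edge in best_hits for edge in edges):
            triangles.append(frozenset((a, b, c)))
    return triangles

def merge_triangles(triangles):
    """Greedily merge triangles into clusters sharing at least two proteins (a common side)."""
    clusters = []
    for tri in triangles:
        merged = set(tri)
        remaining = []
        for cluster in clusters:
            if len(cluster & tri) >= 2:
                merged |= cluster
            else:
                remaining.append(cluster)
        remaining.append(merged)
        clusters = remaining
    return clusters
```

A pair of proteins that are best hits only to each other never forms a triangle, which reflects the requirement that each cluster span at least three lineages.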
64.
8.1.1998
Boris Galitsky, I.M.Gelfand, Alexander Kister
Rutgers University
Classification of immunoglobulins
Immunoglobulin sequences (human heavy chains) were analyzed
in terms of patterns (keywords) of short amino acid fragments. Representing
the sequences as combinations of 17 keywords for each fragment
revealed that six main combinations describe the majority of sequences
(60% exactly, 40% approximately).
An important feature of the new classification principle
is that knowing a few keywords, or even the residues at
several key positions, allows one to determine the class affiliation
of an immunoglobulin and thus to predict the residue, or residue type,
at almost any position of the sequence.
The prediction algorithm, designed for the molecular
language, has the features of a semantic processor for
natural language. The rule-based classification principle, developed
on immunoglobulin sequences, is applicable to a wide variety of
protein families.
[Galitsky et al., 1998; Galitsky et al., 1999].
65.
22.1.1998
V.A.Shepelev
Institute of Molecular Genetics
On the distribution of dinucleotides in nucleic
acid sequences
The distribution of dinucleotides in nucleic acid
sequences can be described by a set of dinucleotide frequencies
as well as by relative frequencies (odds ratios). It is well known
that, in general, the odds ratio deviates from unity. This leads to
the concept of a genome signature, which implies that the set of odds ratios
is to some extent specific to particular genomes and taxa. The
so-called empirical distribution function yields a more detailed description
of the dinucleotide distribution. Theoretical distributions are
derived for reference (zero-order model). An alternative
approach deals with the distribution of waiting times for different
dinucleotides. Examples of distributions are given for large mammalian and
human viruses. Special features of the distributions for higher
eukaryotes are also shown.
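A sketch of the two descriptions mentioned above: the odds ratio of a dinucleotide XY, taken here as f(XY) / (f(X) f(Y)) over overlapping dinucleotides of a single strand, and the waiting times between successive occurrences of a given dinucleotide. Genome-signature work often symmetrizes over both strands; this sketch deliberately does not, and the function names are hypothetical.

```python
from collections import Counter

def odds_ratios(seq):
    """rho(XY) = f(XY) / (f(X) * f(Y)) for every dinucleotide present in seq (one strand)."""
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = len(seq), len(seq) - 1
    ratios = {}
    for xy, count in di.items():
        fx = mono[xy[0]] / n_mono
        fy = mono[xy[1]] / n_mono
        ratios[xy] = (count / n_di) / (fx * fy)
    return ratios

def waiting_times(seq, dinucleotide):
    """Distances between start positions of successive occurrences of a dinucleotide."""
    starts = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]
    return [b - a for a, b in zip(starts, starts[1:])]
```

In `"ACACAC"`, for example, AC is over-represented (odds ratio 2.4) while AA is absent (odds ratio 0), illustrating how far real sequences can deviate from unity.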
66.
5.2.1998
V.Ju.Makeev
Institute of Molecular Biology
Probabilistic approach to segmentation of DNA sequences
The problem of DNA segmentation is considered. Each
DNA sequence is represented as a set of statistically independent
blocks; within a block, each nucleotide appears with its own Bernoulli probability.
The total probability of the whole sequence is calculated and
maximized to obtain an optimal segmentation. To
estimate the probabilities of nucleotides within the alleged
blocks, a Dirichlet probabilistic approach is used. [Ramensky et al.,
1999].
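A minimal sketch of the idea, assuming that the unknown nucleotide probabilities of each block are integrated out under a symmetric Dirichlet prior (alpha = 1 by default) and that the best segmentation is found by dynamic programming over all block boundaries. The prior parameter and function names are our assumptions, not details taken from the talk.

```python
import math

ALPHABET = "ACGT"

def block_log_prob(counts, alpha=1.0):
    """Log marginal probability of a block's letters under a symmetric Dirichlet(alpha) prior."""
    n, k = sum(counts), len(counts)
    result = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for c in counts:
        result += math.lgamma(alpha + c) - math.lgamma(alpha)
    return result

def segment(seq, alpha=1.0):
    """Blocks (start, end) maximizing the total probability of seq; O(n^2) dynamic programming."""
    n = len(seq)
    prefix = [[0] * len(ALPHABET)]  # prefix[i][a] = count of letter a in seq[:i]
    for ch in seq:
        row = prefix[-1][:]
        row[ALPHABET.index(ch)] += 1
        prefix.append(row)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end):
            counts = [prefix[end][a] - prefix[start][a] for a in range(len(ALPHABET))]
            candidate = best[start] + block_log_prob(counts, alpha)
            if candidate > best[end]:
                best[end], back[end] = candidate, start
    blocks, end = [], n
    while end > 0:
        blocks.append((back[end], end))
        end = back[end]
    return blocks[::-1]
```

Integrating over the block probabilities (instead of plugging in maximum-likelihood estimates) is what keeps the maximization from splitting the sequence into single-letter blocks: merging homogeneous stretches always scores higher.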
67.
12.2.1998
Pavel Khil
Institute of Bioorganic Chemistry
Phylogenetic analysis of long terminal repeats
of the HERV-K family endogenous retroviruses
Sequences of 45 long terminal repeats (LTRs) of
the human endogenous retrovirus HERV-K family, which we had previously
mapped precisely on human chromosome 19, were determined, and a nearest
neighbour dendrogram was constructed. No correlation was observed
between the degree of identity of LTR pairs and their relative
positions on the chromosome. Thus, distantly located LTRs, even
those on different chromosome arms, could be highly
similar to each other, whereas closely located LTRs could
differ significantly. We conclude that the LTRs transposed randomly
across the chromosome in the course of evolution. The alignment
of the LTR sequences allowed us to assign most of the LTRs to two
major subfamilies. The LTRs belonging to the first subfamily (LTR-I)
are characterised by higher intrasubfamily sequence divergence than
those of the second subfamily (LTR-II). The two subfamilies are
easily distinguished by characteristic deletions/insertions
in the LTR sequences. The higher divergence of the first subfamily
members suggests that their propagation started at earlier stages
of evolution, probably soon after the insertion of their ancestor into
the primate genome. In turn, each subfamily includes several
distinct branches with various degrees of intragroup divergence
and with characteristic diagnostic features, suggesting that the
members of the branches represent amplified copies of particular
master genes that appeared at different periods of evolution.
The sequences of the LTRs show a characteristic distribution
of conserved and variable regions, indicating that the LTRs might
have some sequence-dependent functions in the primate genome.
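The "no correlation between identity and position" observation can be sketched as follows, assuming each LTR is given as an aligned sequence plus a chromosomal coordinate in base pairs (hypothetical inputs). Identity ignores gapped columns. Because pairwise values are not independent, a real analysis would need a permutation (Mantel-type) test; this sketch only computes the coefficient.

```python
import math
from itertools import combinations

def identity(a, b):
    """Percent identity of two aligned sequences, ignoring columns with a gap in either."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def identity_vs_distance(ltrs):
    """ltrs: name -> (aligned sequence, position in bp). Correlation over all LTR pairs."""
    ids, dists = [], []
    for (_, (s1, p1)), (_, (s2, p2)) in combinations(ltrs.items(), 2):
        ids.append(identity(s1, s2))
        dists.append(abs(p1 - p2))
    return pearson(ids, dists)
```

If LTRs spread by local duplication, identity would fall with distance (a clearly negative coefficient); a value near zero is what random transposition across the chromosome predicts.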
70.
4.6.1998
M.G.Sadovsky
Institute of Biophysics, Russian Acad. Sci., Siberian
Branch, Krasnoyarsk
Genetic texts, vocabularies and information
Genetic sequences are considered as texts (genetic
texts, GT). Each GT has a frequency vocabulary, that
is, the list of all subwords of a fixed length with their counts. The
fundamental problems are the reconstitution of a longer vocabulary (in
particular, the text itself) from a given vocabulary, and the comparison
of two GTs given their vocabularies. In the first case, we demonstrate
the existence of a critical length beyond which the GT can be reconstituted
unambiguously and study the behaviour of this length for various
genes and their fragments. In the second case, we consider the problem
of reconstituting an ensemble of vocabularies. We introduce the
quality of reconstitution of the vocabulary of length q+1 (and, in general,
q+s) from the vocabulary of length q. This value characterizes
the informational capacity of a GT.
We classify GTs by their statistical characteristics.
By definition, close GTs have vocabularies that are similar in some sense.
Using automated classification, we demonstrate that functionally
similar sequences are close in this sense. We demonstrate a
connection between structure (vocabulary) and taxonomy. GTs
having close vocabularies are non-randomly distributed among
families. Interestingly, a point in the classification
space corresponds exactly to a family: there is no correlation with higher taxonomic
levels.
We develop a method of sequence comparison that does not use
alignment. Sequence vocabularies are compared via an intermediate
object, namely the hybrid vocabulary, which is the statistical ancestor
of all sequences in a group. The statistical ancestor is the vocabulary
that can be obtained from any vocabulary under comparison by adding a
minimal amount of information. The set of words in the hybrid vocabulary
is the union of the word sets of the compared vocabularies, whereas
the word frequencies are the averages of the frequencies in these
vocabularies. This yields the minimum total relative entropy of the
compared vocabularies with respect to the hybrid vocabulary. We present
the results of comparing sequences from EMBL.
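A sketch of the hybrid vocabulary as described: the union of the word sets with averaged frequencies, plus the relative entropy (Kullback-Leibler divergence, natural log) of each vocabulary with respect to it. Using the KL divergence is our reading of "relative entropy". With that reading, the averaged vocabulary is indeed the one that minimizes the summed divergence. Function names are hypothetical.

```python
import math
from collections import Counter

def vocabulary(text, q):
    """Frequency vocabulary: relative frequencies of all overlapping words of length q."""
    words = Counter(text[i:i + q] for i in range(len(text) - q + 1))
    total = sum(words.values())
    return {w: c / total for w, c in words.items()}

def hybrid(vocabularies):
    """Union of word sets; each frequency is averaged over all vocabularies (absent = 0)."""
    words = set().union(*vocabularies)
    k = len(vocabularies)
    return {w: sum(v.get(w, 0.0) for v in vocabularies) / k for w in words}

def relative_entropy(p, q):
    """Kullback-Leibler divergence of vocabulary p from vocabulary q (natural log)."""
    return sum(f * math.log(f / q[w]) for w, f in p.items() if f > 0)

def total_divergence(vocabularies):
    """Sum of divergences of the compared vocabularies from their hybrid vocabulary."""
    h = hybrid(vocabularies)
    return sum(relative_entropy(v, h) for v in vocabularies)
```

Because every word of each compared vocabulary appears in the hybrid with a positive frequency, the divergences are always finite, which is one practical reason to compare through the hybrid rather than directly pairwise.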
71.
2.7.1998
Boris Galitsky
Rutgers University
Natural language understanding and formal scenarios.
Some applications.
The talk will focus on the logical aspects of natural
language understanding. Logic programming and the peculiarities
of the metaprogramming technique are addressed as a basis for representing
natural language (NL) semantics.
An approach to NL understanding in an extensible
problem domain has been implemented, allowing real-time introduction of
new facts and definitions of new concepts.
The syntactic analysis systems of Apresian/Boguslavsky
(IPPI) and START of Katz (MIT) will be presented with respect to their compatibility
with the semantic processor, which is based on advanced reasoning involving
time, space, action, knowledge and belief. A semantic subsystem for
filtering speech recognition results illustrates some mechanisms
of reasoning under inconsistent conditions.
72.
6.7.1998
Yury Wolf
National Center for Biotechnology Information, Bethesda
Distribution of protein folds in the three superkingdoms
of life
A protein fold recognition procedure was developed
on the basis of iterative database search using the PSI-BLAST program.
In completely sequenced genomes, folds could be automatically
identified for 20-30% of the proteins, with 5-6% more detectable
by additional analysis of conserved motifs. The distribution of
the most common folds is very similar in bacteria and archaea but
distinct in eukaryotes, and parasitic bacteria differ from
free-living ones. In all superkingdoms, the P-loop NTPases are
the most abundant fold. In bacteria and archaea, the next most common
folds are TIM-barrels, ferredoxin-like proteins and methyltransferases,
whereas in eukaryotes, the second to fourth places belong to protein
kinases, β-propellers and TIM-barrels. Several statistical aspects
of fold distribution are discussed.
76.
15.10.1998
I.A. Zakharov
Institute of General Genetics
Polymorphism and sex ratios in populations:
field studies and mathematical models
The optimal sex structure in polygamous species
is one in which the population contains, at maximal frequency,
females producing the largest number of viable and fertile
daughters among their offspring. Males should then be
fewer than females, since reducing their proportion increases the food and
other resources available to females. The proportion of males,
however, cannot be arbitrarily small. Its limit is set, first,
by how many females a male can find and fertilize during the breeding
season and, second, by the proportion of males at which the
harmful effects of inbreeding begin to appear, when males are appreciably
likely to fertilize daughters related to them.
In some insects, the number of males in the
population is minimized through females that produce all-female progeny.
The total progeny of such females is 0.5 of the progeny
of normal females, but daughters from all-female broods
show higher survival, since they are provided with additional
food resources.
In this case the population consists of m males,
n1 normal females and n2 "all-female" females. The ratio m:n1:n2
is stably reproduced across generations.
84.
13.5.1999
S.A.Spirin
Belozersky Institute of Physico-Chemical Biology
PSI-BLAST and its Russian analog
I will discuss two approaches used to estimate the
quality of a local alignment. In the first approach, the quality
is defined as the sum of substitution weights. This approach is
used in the most popular algorithms for local alignment and databank
screening, such as the Smith-Waterman algorithm, BLAST and FastA. In
the other approach, the quality is defined as the so-called power
of a local alignment. Although the second approach is much less
popular, it has some advantages.
PSI-BLAST is a relatively new tool for databank
screening. Its main idea is to screen with a so-called profile
built from the results of preliminary screening(s). It is based on
ordinary BLAST and thus uses the sum as the quality measure.
Recently V.K. Nikolaev wrote a program based on the
ideas of PSI-BLAST but using power as the quality measure. The first tests
of the program are promising. I will explain the algorithm of this
program.
[Nikolaev et al., 1997].
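The first approach, quality as a sum of substitution weights, can be illustrated with a Smith-Waterman local alignment score. The sketch assumes a toy scoring scheme (match +2, mismatch -1, linear gap -2) in place of a real substitution matrix. It does not implement the "power" measure, whose definition the abstract does not give.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score, computed as a sum of substitution weights and gap penalties."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `"TTACGTTT"` and `"GGACGTGG"` score 8 (the shared `ACGT`), because extending the alignment into the mismatching flanks only lowers the sum.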
85.
22.6.1999
Gregory Kucherov
INRIA-Lorraine
On maximal repetitions in sequences
I will talk about maximal repetitions, called "tandem
repeats" in the biological literature. In the first part, I'll present
some theoretical background. In particular, I'll mention important
data structures (suffix tree, DAWG (Directed Acyclic Word Graph))
and describe the main ideas behind our algorithm, which finds all so-called
maximal repetitions in a sequence in time linear in the length of
the sequence. In the second part, I'll talk about our implementation
of this algorithm and computer experiments on DNA sequences.
The first part is joint work with Roman Kolpakov
from Moscow University. The second part partly describes recent
work by Mathieu Giraud (a student at ENS Lyon).
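To pin down what is being searched for, here is a deliberately naive sketch, assuming the usual definition of a maximal repetition: a substring of length at least twice its smallest period p that cannot be extended left or right with the same period. For each candidate p it scans for maximal stretches where `s[k] == s[k + p]` and keeps those whose smallest period is exactly p (computed with the KMP failure function). This runs in roughly quadratic time and is not the linear-time suffix-tree-based algorithm of the talk.

```python
def smallest_period(w):
    """Smallest period of w, as len(w) minus its longest proper border (KMP failure function)."""
    pi = [0] * len(w)
    for i in range(1, len(w)):
        k = pi[i - 1]
        while k and w[i] != w[k]:
            k = pi[k - 1]
        if w[i] == w[k]:
            k += 1
        pi[i] = k
    return len(w) - pi[-1]

def maximal_repetitions(s):
    """All (start, end, period) with s[start:end] a maximal repetition; naive scan."""
    n = len(s)
    runs = set()
    for p in range(1, n // 2 + 1):
        k = 0
        while k < n - p:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            while k < n - p and s[k] == s[k + p]:
                k += 1
            end = k + p  # s[start:end] has period p and cannot be extended with it
            if end - start >= 2 * p and smallest_period(s[start:end]) == p:
                runs.add((start, end, p))
    return sorted(runs)
```

The period check prevents `"aaaa"` from also being reported with period 2; the linear-time algorithm avoids the per-period scans altogether, which is what makes it practical on long DNA sequences.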
86.
23.9.1999
D.A. Filimonov
Institute of Biomedical Chemistry (IBMC), Russian Acad. Med. Sci.
Computer-aided estimation of properties of chemical compounds
using incomplete empirical information: mathematical
foundations of biological activity prediction
Estimation of the physicochemical properties and biological
activity of chemical compounds (CCs) is necessary for solving many
problems in biology, medicine and ecology, since the empirical data
for each CC cover only a small part of the enormous diversity of properties.
The aim of our work is to develop methods for estimating the properties of CCs
on the basis of available empirical data on the structures
and properties of CCs and of computer technologies for extracting knowledge
from the available information.
Prediction of biological activity spectra
in the PASS system (http://www.ibmh.msk.su/PASS/default.htm)
rests on the traditional SAR/QSAR/QSPR/Molecular Modelling hypothesis:
Activity = Function (Molecular structure)
In PASS, molecular structure is described by descriptors of the
molecular basis of atomic neighbourhoods (MoBAO), and more than 500
activities are represented in a training set of more than 30,000
compounds qualitatively, as "presence/absence" of the effect.
In this formulation, the problem of predicting biological activity
reduces to constructing a decision algorithm, which is
the subject of the talk.
The optimal algorithm for predicting biological
activity was selected from various classes and variants of estimates
of the likelihood that a predicted CC exhibits an activity, using
leave-one-out and leave-two-out cross-validation and random
division of the training set into two independent subsets, on the basis
of the criteria we developed: maximum error of prediction (MEP)
and invariant accuracy of prediction (IAP). The main results of these
studies will be presented and discussed.
PASS presents prediction results as two estimates
for each predicted activity; by construction, these are estimates of the
probabilities of type I and type II errors, but they can also be
interpreted as the probabilities of belonging to the classes of active
and inactive CCs. This gives the user clear options for solving a variety
of practical problems.
87.
28.10.1999
Eugene Koonin
National Center for Biotechnology Information (NLM,
NIH; Bethesda, USA)
Horizontal gene transfer: evidence and role in
the evolution of prokaryotes
Orthologous gene families that are conserved in
diverse bacterial, archaeal and eukaryotic genomes typically show
patchy phylogenetic distribution, which suggests that horizontal
gene transfer and lineage-specific gene loss played a major role
in evolution. Distinguishing between these two types of events with
confidence is not easy. However, combined analysis of patterns of
phylogenetic distribution and tree topologies suggests parsimonious
scenarios that favor horizontal transfer, differential gene loss
or a combination thereof for individual gene families. Horizontal
gene transfer appears to involve all functional categories of prokaryotic
genes, with the possible exception of some of the core components
of translation and transcription, but seems particularly prominent
among genes that encode DNA repair and signal transduction system
components. Frequently, horizontal transfer seems to be accompanied
by the elimination of the original gene responsible for the respective
function. Such events can be classified into two categories: i)
non-orthologous gene displacement, i.e., replacement of a gene by an
unrelated or distantly related gene coding for a functionally similar
protein, and ii) xenologous gene displacement, i.e., replacement of a
gene by an ortholog from a phylogenetically distant lineage. I
will attempt to present a rough quantitative evaluation of the amount
of relatively recent horizontal gene transfer in evolution. The
conclusion will be that between very distant lineages, such as,
for example, archaea and bacteria, it is significant but not overwhelming.
By contrast, within tight taxonomic groups, such as the Euryarchaeota,
gene exchange seems to be rampant. Apparent horizontal gene transfer
and lineage-specific gene loss will be exemplified by a systematic
analysis of the evolution of aminoacyl-tRNA synthetases, which includes
a variety of evolutionary scenarios. In spite of the prominence
of horizontal gene transfer and differential gene loss, a clear
phylogenetic signal can still be extracted from comparisons of entire
protein sets from completely sequenced genomes. Phylogenetic trees
produced by using parameters of the distribution of similarity scores
between likely orthologs to calculate evolutionary distances between
genomes will be discussed.