EDAM Ontology

Bioinformatics operations, types of data, topics, and data formats

  1. Introduction
  2. Concepts
  3. Relations
  4. Rules
  5. Sources
  6. OBO-format [Term] structure
  7. Guidelines for developers
  8. Guidelines for annotators
  9. Existing implementations and annotations with EDAM

Introduction

Motivation

Bioinformaticians handle an increasingly large and diverse set of tools and data. Meanwhile, researchers demand ever more powerful and convenient means to organise, find, compare, select, reuse and connect the available resources. These tasks often rely on consistent, machine-understandable descriptions of the underlying components, but these have been generally lacking in documentation and metadata developed ad hoc. There is therefore an urgent need for an ontology that unifies semantically the bioinformatics concepts in common use and, for the annotator, provides a comprehensive controlled vocabulary that is broadly applicable.

What is EDAM?

EDAM (EMBRACE Data and Methods) is an ontology of common bioinformatics operations, topics, types of data including identifiers, and formats. EDAM comprises common concepts (shared within the bioinformatics community) that apply to semantic annotation of resources such as:

Scope

As a rule (with a few exceptions), EDAM only includes concepts strictly within the domain of bioinformatics. General computer science or biological concepts are (typically) not included.

EDAM includes 4 main sub-ontologies of concepts organised into simple hierarchies:

Noteworthy within the Data sub-ontology is:

These provide different semantic 'axes' for annotation. For example, annotation of a Web service might include:

Architecture

EDAM has 4 components:

Concepts - These are well established and familiar bioinformatics concepts. Concepts have a name (term), a definition and one or more simple relations to other concepts defined in EDAM. Each concept has one or more intrinsic properties (reflected in the definition and relations).

Hierarchy - Every concept (excluding top-level concepts) is related to (typically) one other concept within the same sub-ontology by an is_a (generalisation) relation. These relations define the sub-ontology hierarchies. All "child" concepts must share the intrinsic property of their "parent", in addition to having their own intrinsic properties.

Relations - Concepts are related by defined relation types. These relation types also apply between entities outside EDAM (for example, artifacts semantically annotated with EDAM or another ontology).

Rules - There are simple rules dictating how different types of concepts are related within EDAM. They define which relations may be specified for which concepts. They reflect well established or self-evident principles.

The EDAM architecture (below) is intentionally very simple. Bold text within a box indicates a top-level concept (sub-ontology), text next to lines indicates a type of relation between two concepts that is maintained within EDAM:

EDAM sub-ontologies

Download and Status

Locations for download in OBO format:

http://edamontology.org/ontology?format=obo (Always points to the last released version)

http://sourceforge.net/projects/edamontology/files/ (All versions)

Version 1.0, the first stable version of EDAM, has been released. It uses the concepts, relations and rules below and should adhere to the Guidelines for developers. Contributions and suggestions are welcome.

EDAM is being actively developed:

For further information see the EDAM Wiki:

https://sourceforge.net/apps/mediawiki/edamontology/index.php?title=Main_Page

See the EDAM presentation at the BioOntologies SIG, ISMB 2011:

http://bio-ontologies.knowledgeblog.org/224

Viewing

EDAM is best viewed in the OBO Ontology Editor (OBOEdit) Version 2:

http://oboedit.org

To load EDAM, select "File ... Load Ontologies". The most convenient view is to have the "Ontology Tree Editor" (from "Editors" menu) and the "Text Editor" (also from "Editors" menu) side by side. "Concepts" and "Relation types" will appear in the "Ontology Tree Editor".

The view is cleaner if you only show the is_a relations. To do this, select the small f from the "Ontology Tree Editor" and then select "Show a single relation..." and then "is_a". Alternatively, you can select any combination of EDAM-specific relation types to be displayed.

EDAM is available in the following Web-based ontology browsers:

Licence

EDAM is made available for everyone to use, with the following constraints on its use or redistribution (including online accessibility):

The intellectual content of EDAM or of any of its parts cannot be included in other projects and artifacts unless agreed with the authors of EDAM.

Contacts

All enquiries to Jon Ison (jison@ebi.ac.uk) cc'ing Matus Kalas (matus.kalas@bccs.uib.no)

Thanks to Matus Kalas, Peter Rice, Inge Jonassen, James Malone, Steve Pettifer, Hamish McWilliam, Alan Bleasby, Mahmut Uludag and others for valuable discussions and contributions.

Mailing lists

Feel free to subscribe to the mailing lists:

Once subscribed, you can mail the user and developer lists:

edamontology-announce is for announcements (very minimal traffic!).

edamontology-developers is for technical discussions between EDAM developers / contributors.

edamontology-users is for general discussions and announcements.


Concepts

Operation

"A function or process performed by a tool; what is done, but not (typically) how or in what context."

e.g. "Sequence alignment", "Pairwise sequence alignment", "Sequence database search".

"Operation" concepts provide mostly fine-grained concepts for annotation of tool functions.

The top-level concepts are:

The top-level operations are necessarily coarse-grained (abstract) providing a navigable top-level. They serve as placeholders for other, more specific concepts lower down in the tree.

Data

"A type of data in common use in bioinformatics."

e.g. "Sequence alignment", "Comparison matrix", "Phylogenetic tree" etc.

Data concepts:

The top-level concepts are:

Their meaning is:

Concepts within "Core data" are:

Topic

"A general bioinformatics subject or category, such as a field of study, data, processing, analysis or technology."

e.g. "Sequence analysis", "Alignment", "Sequencing", "Microarrays".

"Topic" concepts provide coarse-grained categories for annotation of diverse bioinformatics resources. They do not cover biology or computer science exhaustively.

The top-level concepts are:

Format

"A specific layout for encoding a specific type of data in a computer file or memory."

e.g. "FASTA format", "PDB format", "mmCIF format" etc.

"Format" concepts:

The top-level concepts are:

All concepts are nested under "Binary format", "Textual format" and "XML", with the exception of pure "HTML" or "RDF" (and "BioPAX"). The "Format (typed)" branch arranges formats by type of data and provides an additional axis over (the same set of) concepts under "Binary format", "Textual format" and "XML".

Identifier

"A label that identifies (typically uniquely) something such as data, a resource or a biological entity."

e.g. "UniProt accession", "EC number", "Gene symbol" etc.

"Identifier" concepts:

The top-level concepts are:

As for "Format", the "Identifier (typed)" branch provides an additional axis over (the same set of) concepts under "Accession" and "Name".


Relations

is_a

This is an OBO core relation. It defines a concept as a specialisation of another concept, relating a concept to a single parent (out of possibly multiple generalised, parent concepts). A is a specialisation of B, and B a generalisation of A. The is_a relation is transitive: if conceptA is_a conceptB, and conceptB is_a conceptC, then conceptA is_a conceptC.

e.g. "operation:Pairwise sequence alignment" is_a "operation:Sequence alignment"
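The transitivity of is_a can be illustrated with a minimal Python sketch. The dictionary below is a toy graph, not the real ontology: the first edge is the example above, and the "Alignment" parent is taken from the "Sequence alignment" stanza shown under OBO-format [Term] structure. The names IS_A and ancestors are hypothetical.

```python
# Toy is_a graph (child -> list of parents); NOT the full EDAM hierarchy.
IS_A = {
    "Pairwise sequence alignment": ["Sequence alignment"],
    "Sequence alignment": ["Alignment"],
    "Alignment": [],
}

def ancestors(concept, is_a=IS_A):
    """Return every concept reachable from `concept` by following is_a."""
    seen = set()
    stack = list(is_a.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            # Transitivity: the parents of a parent are ancestors too.
            stack.extend(is_a.get(parent, []))
    return seen

print(sorted(ancestors("Pairwise sequence alignment")))
```

Because a concept may have more than one parent, the traversal keeps a set of visited concepts rather than assuming a single chain.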

has_input

Defines an "Operation" concept as reading (inputting) a "Data" concept.

e.g. "operation:Sequence alignment" has_input "data:Sequence"

has_output

Defines an "Operation" concept as writing (outputting) a "Data" concept.

e.g. "operation:Sequence alignment" has_output "data:Sequence alignment"

has_topic

Defines a "Data" or "Operation" concept as being within the scope of a "Topic" concept.

e.g. "operation:PolyA signal identification" has_topic "topic:Nucleic acid sequence analysis"

is_identifier_of

Defines that an "Identifier" concept identifies a "Data" concept.

e.g. "identifier:Sequence accession number" is_identifier_of "data:Sequence"

is_format_of

Defines that a "Format" concept is the format of a "Data" concept.

e.g. "format:Sequence format" is_format_of "data:Sequence record"
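The relation definitions above imply which kind of concept may appear on each side of a relation. As a rough sketch (the table and function names are hypothetical, and the "prefix:Name" labels follow the examples in this section), such a check could look like this in Python:

```python
# Allowed (subject sub-ontology, object sub-ontology) pairs per relation,
# transcribed from the relation definitions in this section.
ALLOWED = {
    "has_input": {("operation", "data")},
    "has_output": {("operation", "data")},
    "has_topic": {("data", "topic"), ("operation", "topic")},
    "is_identifier_of": {("identifier", "data")},
    "is_format_of": {("format", "data")},
}

def namespace_of(concept):
    """Extract the sub-ontology prefix from a 'prefix:Name' label."""
    return concept.split(":", 1)[0].lower()

def is_valid(subject, relation, obj):
    """Check one statement; is_a is assumed to stay within one sub-ontology."""
    s, o = namespace_of(subject), namespace_of(obj)
    if relation == "is_a":
        return s == o
    return (s, o) in ALLOWED.get(relation, set())

print(is_valid("operation:Sequence alignment", "has_input", "data:Sequence"))
```

This only checks the concept types on each side of a relation; it says nothing about whether a particular statement is biologically sensible.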


Rules

Rules define how concepts are related.

Rules by concept type

"Topic"

"Operation"

"Data"

"Format"

"Identifier"

Rules by relation type

is_a

has_input

has_output

has_topic

is_identifier_of

is_format_of


Sources

Various resources were analysed while constructing EDAM and were used as sources listing common bioinformatics concepts in scope.

Web services and applications

Domain ontologies, taxonomies, data models

For database-related concepts

  1. dbxref.txt (databases cross-referenced in UniProtKB/Swiss-Prot)
  2. List of databases collated by the ELIXIR project
  3. Lists of databases from the Web

Other resources

OBO-format [Term] structure

An OBO concept statement consists of:

For example:

[Term]
id: EDAM_data:0970
name: Bibliographic reference
namespace: data
subset: data
def: "Bibliographic data that uniquely identifies a scientific article, book or other published material." [http://edamontology.org]
comment: A bibliographic reference might include information such as authors, title, journal name, date and (possibly) a link to the abstract or full-text of the article if available.
synonym: "Reference" EXACT [http://edamontology.org]
synonym: "Citation" EXACT [http://edamontology.org]
xref: Moby:GCP_SimpleCitation
xref: Moby:Publication
is_a: EDAM_data:2857 ! Article metadata
is_a: EDAM_data:2093 ! Data reference

[Term]
id: EDAM_operation:0292
name: Sequence alignment
namespace: operation
def: "Align (identify equivalent sites within) molecular sequences." [http://edamontology.org]
synonym: "Sequence alignment generation" EXACT [http://edamontology.org]
is_a: EDAM_operation:2463 ! Sequence alignment processing
is_a: EDAM_operation:2451 ! Sequence comparison
is_a: EDAM_operation:2928 ! Alignment
relationship: has_input EDAM_data:2044 {min:cardinality=2} ! Sequence
relationship: has_output EDAM_data:0863 {min:cardinality=1} ! Sequence alignment
relationship: has_topic EDAM_topic:0182 ! Sequence alignment
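For readers who want to process such stanzas programmatically, the following is a minimal Python sketch of a [Term] reader. It is not a full OBO parser: the function name parse_term is hypothetical, and it assumes one "tag: value" pair per line and that " ! " only ever introduces a trailing comment.

```python
def parse_term(stanza):
    """Parse one [Term] stanza into a dict mapping tag -> list of raw values."""
    term = {}
    for line in stanza.strip().splitlines():
        line = line.strip()
        if not line or line == "[Term]":
            continue
        tag, _, value = line.partition(": ")
        # Drop the trailing "! human-readable name" comment, if any.
        value = value.split(" ! ", 1)[0].strip()
        # Tags such as is_a and synonym may repeat, so collect into lists.
        term.setdefault(tag, []).append(value)
    return term

# Abbreviated copy of the first stanza above.
STANZA = """
[Term]
id: EDAM_data:0970
name: Bibliographic reference
namespace: data
synonym: "Reference" EXACT [http://edamontology.org]
synonym: "Citation" EXACT [http://edamontology.org]
is_a: EDAM_data:2857 ! Article metadata
is_a: EDAM_data:2093 ! Data reference
"""

term = parse_term(STANZA)
print(term["name"], term["is_a"])
```

For real work, an established OBO library would be a safer choice than this sketch, which ignores escaping and multi-stanza files.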

Unique identifier (ID)

This 4-digit number uniquely identifies an EDAM concept. IDs persist between EDAM versions.

Name

There are rules for naming concepts.

Namespace

This is one of:

The namespace and subsets define a concept's sub-ontologies.

Definition

There are rules for defining concepts.

Comment

The comment is optional and (typically) clarifies the definition.

Synonyms

These include related phrases, alternative spellings and true synonyms.

Cross references

Relations

Several types of relations are defined. They are:

  1. Defined between pairs of concepts
  2. Directional
  3. Transitive (propagated from child to parent concepts), e.g. if A is_a B is_a C we can infer A is_a C.

has_input and has_output relations include a statement on the cardinality of the data that is read or written.
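As an illustration only, a relationship value with its cardinality modifier could be split into parts as sketched below in Python. The modifier key "min:cardinality" is copied from the example stanza in this section rather than from the OBO specification, and the names REL and parse_relationship are hypothetical.

```python
import re

# Matches e.g. "has_input EDAM_data:2044 {min:cardinality=2} ! Sequence".
REL = re.compile(
    r"^(?P<rel>\w+)\s+(?P<target>\S+)"      # relation type and target ID
    r"(?:\s+\{(?P<mods>[^}]*)\})?"          # optional {key=value,...} block
    r"(?:\s+!\s*(?P<label>.*))?$"           # optional "! label" comment
)

def parse_relationship(value):
    """Return (relation, target, modifiers dict, label or None)."""
    m = REL.match(value.strip())
    if m is None:
        raise ValueError(f"unrecognised relationship: {value!r}")
    mods = {}
    for item in filter(None, (m.group("mods") or "").split(",")):
        key, _, val = item.partition("=")
        mods[key.strip()] = val.strip()
    return m.group("rel"), m.group("target"), mods, m.group("label")

print(parse_relationship("has_input EDAM_data:2044 {min:cardinality=2} ! Sequence"))
```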

Obsolete concepts

Obsolete concepts use these fields:

EDAM-specific fields

These include:


Guidelines for developers

Contributions are welcome and should adhere to the guidelines below. Please email Jon Ison (jison@ebi.ac.uk) for help.

Adding concepts

Concepts may be added if:

  1. They are within scope (with some exceptions).
  2. They are well established and in common use
  3. They are general, e.g. re-used in multiple contexts, and not overly specialised.

Concepts should:

  1. Have at least one unique intrinsic property as reflected in the concept definition and relations. Concepts must be distinct from each other! Phrases describing essentially the same thing are handled using synonym: (see OBO-format [Term] structure).
  2. Correspond to common concepts, not anybody's concrete artifacts. Concepts typically have a single is_a (generalisation) relation or (in some cases) more than one.
  3. Be correctly assigned. Concepts should not use is_a when what is really meant is is_part_of (a common mistake). Concepts for conceptual parts should be located appropriately - not necessarily with the same parent! Note EDAM does not currently use has_part / is_part_of relations.
  4. Be correctly constituted. EDAM should not erroneously place into one generalisation-specialisation hierarchy concepts of a fundamentally different nature (e.g. physical entity, tool operation and database) which in EDAM are kept separate.
  5. Have all appropriate relations specified. (From those that are explicitly maintained in EDAM)
  6. Be well named and defined (see below).

Terms

Terms (concept names and their synonyms) should:

  1. Be unique within a namespace (the same name may be reused in different namespaces, although this is not recommended)
  2. Reflect their definition: the meaning of a concept should be reasonably obvious from its name
  3. Follow the patterns in use, particularly for naming parent / child concepts
  4. Be short and simple

Definitions

Concept definitions should:

  1. Describe at least one unique intrinsic property
  2. Be clear and simple, avoiding jargon and obscure acronyms
  3. Be informative
  4. Be unambiguous, avoiding words (e.g. "can", "may", "should") that introduce modality or ambiguity
  5. Be short (the comment: field is used for extended comments)

Principles

Principles that guide developments include:

Limitations

EDAM is/does not:


Guidelines for annotators

Annotators may email Jon Ison (jison@ebi.ac.uk) and Matus Kalas (matus.kalas@bccs.uib.no) for help.

General guidelines

Which EDAM sub-ontology to use?

  1. "Topic" for coarse-grained annotation of diverse entities
  2. "Operation" for fine-grained annotation of tool functions
  3. "Data" for annotation of data in semantic terms
  4. "Format" for annotation of the syntax or format of data
  5. "Identifier" for annotation of identifiers (names and accessions) of data or other entities (see Annotation of data identifiers)

Use of other ontologies

The expectation is for EDAM to be used alongside other ontologies for annotation where possible and desirable. For example, an operation that predicts specific features of a molecular sequence could be annotated with concepts from SO (Sequence Ontology) for the features. Look at the seealso: fields (see OBO-format [Term] structure) in the OBO file for clues as to what ontologies to use.

Picking concepts

If you have many annotations to do, it will help to familiarise yourself with EDAM first using a browser (see Viewing).

  1. Identify the correct sub-ontology ("Operation", "Data" etc.) of concepts considering what is being annotated (see above)
  2. Search EDAM using keywords to find candidate concepts. Multiple searches using synonyms, alternative spellings and so on are preferable.
  3. Pick the most specific concept(s) available, bearing in mind some concepts are necessarily overlapping or general.
  4. Only pick a correct concept. If it doesn't exist, request that it be added to EDAM.
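Steps 1 and 2 can be sketched as a keyword search restricted to one sub-ontology that also looks at synonyms. The Python below is a minimal illustration over three concepts quoted elsewhere in this document; the TERMS list and the search function are hypothetical and stand in for a loaded OBO file.

```python
# A tiny stand-in for the ontology: IDs, names and synonyms quoted in this document.
TERMS = [
    {"id": "EDAM_data:0970", "name": "Bibliographic reference",
     "synonyms": ["Reference", "Citation"]},
    {"id": "EDAM_operation:0292", "name": "Sequence alignment",
     "synonyms": ["Sequence alignment generation"]},
    {"id": "EDAM_data:2044", "name": "Sequence", "synonyms": []},
]

def search(keyword, namespace, terms=TERMS):
    """Case-insensitive substring match on names and synonyms in one sub-ontology."""
    needle = keyword.lower()
    hits = []
    for term in terms:
        # Step 1: restrict to the chosen sub-ontology via the ID prefix.
        if not term["id"].startswith(f"EDAM_{namespace}:"):
            continue
        # Step 2: match the keyword against the name and every synonym.
        labels = [term["name"], *term["synonyms"]]
        if any(needle in label.lower() for label in labels):
            hits.append(term["id"])
    return hits

print(search("citation", "data"))
```

Picking the most specific correct concept (steps 3 and 4) remains a manual judgement; the search only produces candidates.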

Example: Picking concepts for sequence data

Raw sequences and records

When annotating sequences, the following concept (or its children) may be used:

[Term]
id: EDAM_data:2044
name: Sequence
namespace: data
def: "One or more molecular sequences, possibly with associated annotation." [http://edamontology.org]
comment: This concept is a placeholder of concepts for primary sequence data including raw sequences and sequence records.  It should not normally be used for derivatives such as sequence alignments, motifs or profiles.
is_a: EDAM_data:2925 ! Sequence data
relationship: has_topic EDAM_topic:0080 ! Sequence

Children of "Sequence" include:

Data:Sequence data:Sequence:Nucleic acid sequence
Data:Sequence data:Sequence:Protein sequence
Data:Sequence data:Sequence:Raw sequence
Data:Sequence data:Sequence:Sequence record

They are defined as follows:

[Term]
id: EDAM_data:2977
name: Nucleic acid sequence
namespace: data
def: "One or more nucleic acid sequences, possibly with associated annotation." [http://edamontology.org]
is_a: EDAM_data:2044 ! Sequence
is_a: EDAM_data:2525 ! Nucleic acid data

[Term]
id: EDAM_data:2976
name: Protein sequence
namespace: data
def: "One or more protein sequences, possibly with associated annotation." [http://edamontology.org]
is_a: EDAM_data:2044 ! Sequence
is_a: EDAM_data:2524 ! Protein data

[Term]
id: EDAM_data:0848
name: Raw sequence
namespace: data
def: "A raw molecular sequence (string of characters) which might include ambiguity, unknown positions and non-sequence characters." [http://edamontology.org]
comment: Non-sequence characters may be used for example for gaps and translation stop.
is_a: EDAM_data:2044 ! Sequence

[Term]
id: EDAM_data:0849
name: Sequence record
namespace: data
def: "A molecular sequence and associated metadata." [http://edamontology.org]
is_a: EDAM_data:2044 ! Sequence

Depending on how a tool is configured, one of the above concepts (or its children) should be used. Note that "Nucleic acid sequence" and "Protein sequence" provide additional axes over the same set of concepts under "Raw sequence" and "Sequence record", e.g.:

Data:Sequence data:Sequence:Nucleic acid sequence:Raw sequence (nucleic acid)
Data:Sequence data:Sequence:Nucleic acid sequence:Sequence record (nucleic acid)
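The idea of additional axes can be sketched in Python by listing every path from the root to a concept that has more than one parent. The PARENTS table is a hand-built assumption: it supposes that "Raw sequence (nucleic acid)" sits under both "Nucleic acid sequence" and "Raw sequence", as the note above implies, and it omits other parents such as "Nucleic acid data".

```python
# Hypothetical fragment of the "Data" hierarchy (child -> list of parents).
PARENTS = {
    "Raw sequence (nucleic acid)": ["Nucleic acid sequence", "Raw sequence"],
    "Nucleic acid sequence": ["Sequence"],
    "Raw sequence": ["Sequence"],
    "Sequence": ["Sequence data"],
    "Sequence data": ["Data"],
    "Data": [],
}

def paths_to_root(concept, parents=PARENTS):
    """Return every root-to-concept path as a colon-delimited string."""
    if not parents[concept]:
        return [concept]
    return [
        f"{path}:{concept}"
        for parent in parents[concept]
        for path in paths_to_root(parent, parents)
    ]

for path in paths_to_root("Raw sequence (nucleic acid)"):
    print(path)
```

Each printed path corresponds to one axis along which the same concept can be reached when browsing.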

If the format (character encoding) of a raw sequence must be annotated, then use the following concept (or its children) from the "Format" sub-ontology:

[Term]
id: EDAM_format:2571
name: Raw sequence format
namespace: format
def: "Format of a raw molecular sequence (i.e. the alphabet used)." [http://edamontology.org]
comment: See OBO file for URL of format specification. {url=}
is_a: EDAM_format:2350 ! Format (typed)
relationship: is_format_of EDAM_data:0848 ! Raw sequence

Children of "Raw sequence format" include:

Format:Format (typed):Raw sequence format:nucleotide
Format:Format (typed):Raw sequence format:protein
Format:Format (typed):Raw sequence format:pure
etc.

Sequence annotation

When annotating sequence annotation, such as sequence features, reports or metadata, the following concept (or children) may be used:

[Term]
id: EDAM_data:2955
name: Sequence report
namespace: data
def: "An informative report derived from molecular sequence analysis, including annotation on positional features (such as a feature table) or non-positional properties, and reports of general information (metadata)." [http://edamontology.org]
synonym: "Sequence-derived report" EXACT [http://edamontology.org]
is_a: EDAM_data:2018 ! Metadata and annotation
is_a: EDAM_data:2925 ! Sequence data

Children of "Sequence report" include:

Data:Sequence data:Sequence report:Feature record
Data:Sequence data:Sequence report:Sequence property
Data:Sequence data:Sequence report:Sequence image or plot

They are defined as follows:

[Term]
id: EDAM_data:1255
name: Feature record
namespace: data
def: "Annotation of positional features of molecular sequence(s), i.e. that can be mapped to position(s) in the sequence." [http://edamontology.org]
is_a: EDAM_data:2955 ! Sequence report

[Term]
id: EDAM_data:1254
name: Sequence property
namespace: data
def: "An informative report about non-positional sequence features, typically a report on general molecular sequence properties derived from sequence analysis." [http://edamontology.org]
synonym: "Sequence properties report" EXACT [http://edamontology.org]
is_a: EDAM_data:2955 ! Sequence report

[Term]
id: EDAM_data:2969
name: Sequence image or plot
namespace: data
def: "Image of a molecular sequence, possibly with sequence features or properties shown." [http://edamontology.org]
is_a: EDAM_data:2968 ! Image or plot
is_a: EDAM_data:2955 ! Sequence report

"Feature record" (or children) should be used for positional features and "Sequence property" (or children) for non-positional properties. "Sequence image or plot" provides an alternative axis over the same set of concepts nested under "Feature record" and "Sequence property".

Children of "Feature record" include:

Data:Sequence data:Sequence report:Feature record:Feature table
Data:Sequence data:Sequence report:Feature record:General sequence features
Data:Sequence data:Sequence report:Feature record:Nucleic acid features
Data:Sequence data:Sequence report:Feature record:Protein features

Data corresponding to a standard feature table may be annotated with "Feature table" (or children). The other concepts (and children) may be used for everything else (for example, sequence analysis tools that report in non-standard report formats).

Sequence sets

Sets of sequences which correspond to database entry records or lists of raw sequences should be annotated as above, using one (or children) of:

Data:Sequence data:Sequence:Sequence record
Data:Sequence data:Sequence:Raw sequence

Sets of sequences which do not correspond to database entry records or lists of raw sequences should use "Sequence set" (or children):

[Term]
id: EDAM_data:0850
name: Sequence set
namespace: data
def: "A collection of multiple molecular sequences and associated metadata that do not (typically) correspond to molecular sequence database records or entries." [http://edamontology.org]
comment: This concept may be used for arbitrary sequence sets and associated data arising from processing.
is_a: EDAM_data:2955 ! Sequence report

Children of "Sequence set" include:

Data:Sequence data:Sequence report:Sequence set:Sequence cluster
Data:Sequence data:Sequence report:Sequence set:Sequence set (bootstrapped)
Data:Sequence data:Sequence report:Sequence set:Sequence set (nucleic acid)
Data:Sequence data:Sequence report:Sequence set:Sequence set (protein)

For example:

[Term]
id: EDAM_data:1238
name: Proteolytic digest
namespace: data
def: "A protein sequence cleaved into peptide fragments (by enzymatic or chemical cleavage) with fragment masses." [http://edamontology.org]
is_a: EDAM_data:1233 ! Sequence set (protein)

Annotation of data identifiers

EDAM models identifiers of data (such as sequence accession numbers) separately (in the "identifier" branch) from data proper (in the "data" branch).

When annotating identifiers of data which are the inputs or outputs of tools, use the following concept (or its children) from the "Data" sub-ontology:

[Term]
id: EDAM_data:2860
name: Identifier
def: "A short numerical or textual label that identifies a thing." [http://edamontology.org]
is_a: EDAM_data:2926 ! Identifier data

Children of "Identifier" include:

[Term]
id: EDAM_data:2861
name: ID
def: "A short string value or number that is an identifier of a thing, typically an object (entry) from a database." [http://edamontology.org]
comment: Identifiers typically are enumerated strings (a string with one of a limited set of values) or are strings that are conformant to a regular expression. An ID is not necessarily stable or persistent over time, for example between different versions of a database.  See OBO file for regular expression.
is_a: EDAM_data:2860 ! Identifier

[Term]
id: EDAM_data:2862
name: Accession
def: "A persistent (stable) and unique identifier, typically identifying an object (entry) from a database." [http://edamontology.org]
is_a: EDAM_data:2861 ! ID

[Term]
id: EDAM_data:2863
name: Name
def: "A name of a thing, which need not necessarily uniquely identify it." [http://edamontology.org]
is_a: EDAM_data:2860 ! Identifier

For most intents and purposes one of the above annotations will suffice. If, however, the precise nature of the identifier is required (as might be the case, for example, when annotating a catalogue of data identifiers), then use concepts under the "Identifier" sub-ontology. In the case of sequences:

[Term]
id: EDAM_data:1063
name: Sequence identifier
namespace: identifier
def: "An identifier of molecular sequence(s) or entries from a molecular sequence database." [http://edamontology.org]
is_a: EDAM_data:0976 ! Identifier (typed)
relationship: is_identifier_of EDAM_data:2044 ! Sequence

Children of "Sequence identifier" include:

Identifier:Identifier (typed):Sequence identifier:Sequence accession
Identifier:Identifier (typed):Sequence identifier:Sequence name

Drilling down to as fine detail as required, e.g.:

Identifier:Identifier (typed):Sequence identifier:Sequence accession:Sequence accession (protein)
Identifier:Identifier (typed):Sequence identifier:Sequence accession:Sequence accession (protein):UniProt ID
Identifier:Identifier (typed):Sequence identifier:Sequence accession:Sequence accession (protein):UniProt ID:UniProt accession
Identifier:Identifier (typed):Sequence identifier:Sequence accession:Sequence accession (protein):UniProt ID:UniProt accession:TREMBL accession

Annotation of Web services

Model of a Web service

A Web service is considered as an arbitrary (but usually related) set of one or more operations, reducing the problem of Web service interoperation to one of compatibility between operations.

Operation

Input

Output

XML elements

Levels of annotation

Annotation of a WSDL file or associated XSD schema is possible at several levels. Assuming SAWSDL annotation (http://www.w3.org/TR/sawsdl/), the XML elements that may be annotated by EDAM concepts are:

  1. Web service (as a whole) (<wsdl:portType>)
  2. Operation (<wsdl:operation> inside <wsdl:portType>)
  3. Input parameters and their sub-parts (<xs:element>, <xs:complexType>, <xs:simpleType>, <xs:attribute>)
  4. Output parameters and their sub-parts (<xs:element>, <xs:complexType>, <xs:simpleType>, <xs:attribute>)

NB. The input and output parameters should be annotated inside the XML Schema that defines them. For services that do not follow the highly recommended document/literal wrapped SOAP-binding style, the <wsdl:part> inside <wsdl:message> can be annotated (the same applies to faults, but meanings of faults are not modelled by EDAM).

The following annotations might be useful but are not directly recommended by SAWSDL:

  1. Enumerated values of input/output parameters (<xs:enumeration>)

For details of incorporating the SAWSDL annotations into WSDLs and XSDs, see EDAM URIs and SAWSDL annotation.

EDAM URIs and SAWSDL annotation

SAWSDL mandates the use of sawsdl:modelReference attributes for annotation. The format of EDAM URIs used inside this attribute includes the ontology name (http://edamontology.org), main sub-ontology, and the unique identifier (ID) of the particular concept:

 
<xs:element name="elementName" sawsdl:modelReference="http://edamontology.org/subontology_id">

Where ...

The value of the sawsdl:modelReference attribute is a URI pointing to the concept definition. In the case of EDAM, the URI includes the concept's sub-ontology:

So for these 3 concepts:

[Term]
id: EDAM_topic:0182
name: Sequence alignment
namespace: topic
...

[Term]
id: EDAM_operation:0292
name: Sequence alignment construction
namespace: operation
...

[Term]
id: EDAM_data:0863
name: Sequence alignment
namespace: data
...

We'd have

http://edamontology.org/topic_0182
http://edamontology.org/operation_0292
http://edamontology.org/data_0863
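The mapping from OBO IDs to URIs shown above is mechanical, so it can be sketched as a small Python helper. The function name edam_uri is hypothetical, and the rule is inferred only from the three examples given here.

```python
BASE = "http://edamontology.org"

def edam_uri(obo_id):
    """Map 'EDAM_<subontology>:<id>' to '<BASE>/<subontology>_<id>'."""
    prefix, sep, number = obo_id.partition(":")
    if not sep or not prefix.startswith("EDAM_") or not number.isdigit():
        raise ValueError(f"not an EDAM OBO id: {obo_id!r}")
    # Strip the "EDAM_" prefix and join sub-ontology and number with "_".
    return f"{BASE}/{prefix[len('EDAM_'):]}_{number}"

for obo_id in ("EDAM_topic:0182", "EDAM_operation:0292", "EDAM_data:0863"):
    print(edam_uri(obo_id))
```

The number is kept as a string so that leading zeros, as in 0182, are preserved.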

Which can be used in SAWSDL annotation, e.g.

<wsdl:portType name="myService" sawsdl:modelReference="http://edamontology.org/topic_0182">
<sawsdl:attrExtensions sawsdl:modelReference="http://edamontology.org/operation_0292">
<xs:element name="outfile" sawsdl:modelReference="http://edamontology.org/data_0863">

If more than one annotation of an element is required, these can be given in the sawsdl:modelReference attribute delimited by space characters:

<wsdl:portType name="myService" sawsdl:modelReference="http://edamontology.org/topic_0182 http://edamontology.org/operation_0292">

NB. Such multiple annotations need not be in the same namespace, and need not refer to the same ontology at all.
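A consumer of such annotations has to split the attribute value on whitespace. The following Python sketch does this with the standard library; the wrapper document DOC and the helper model_references are illustrative assumptions, while the two namespace URIs are the standard WSDL 1.1 and SAWSDL namespaces.

```python
import xml.etree.ElementTree as ET

WSDL = "http://schemas.xmlsoap.org/wsdl/"
SAWSDL = "http://www.w3.org/ns/sawsdl"

# Minimal stand-in for a WSDL file carrying the portType from the example above.
DOC = f"""
<wsdl:definitions xmlns:wsdl="{WSDL}" xmlns:sawsdl="{SAWSDL}">
  <wsdl:portType name="myService"
      sawsdl:modelReference="http://edamontology.org/topic_0182 http://edamontology.org/operation_0292"/>
</wsdl:definitions>
"""

def model_references(element):
    """Return the list of URIs in an element's sawsdl:modelReference (may be empty)."""
    return element.get(f"{{{SAWSDL}}}modelReference", "").split()

root = ET.fromstring(DOC)
port_type = root.find(f"{{{WSDL}}}portType")
print(model_references(port_type))
```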

SAWSDL guidelines for annotating operations

One peculiarity of the SAWSDL specification is that annotations on a <wsdl:operation> element inside <wsdl:portType> should be handled using a <sawsdl:attrExtensions> element. This is not a requirement for other elements.

Importantly, the <sawsdl:attrExtensions> element inside the <wsdl:operation> must come before the <wsdl:input>, <wsdl:output> and <wsdl:fault> elements (so typically after the <wsdl:documentation> element).

For example:

<wsdl:portType name="Clustalw2PortType" sawsdl:modelReference="http://edamontology.org/topic_0186 http://edamontology.org/operation_0496">
    <wsdl:operation name="submitClustalw2">
        <wsdl:documentation>Submit a sequence and get a jobID</wsdl:documentation>
        <sawsdl:attrExtensions sawsdl:modelReference="http://edamontology.org/operation_0496"/>
        <wsdl:input message="submitClustalw2Msg"/>
        <wsdl:output message="submitClustalw2ResponseMsg"/>
    </wsdl:operation>
</wsdl:portType>

Some WSDL/XSD validators and SOAP libraries do not check the order of these elements, but others enforce it strictly.
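Since not every toolchain checks the ordering, a quick test of one's own WSDL may be worthwhile. The Python sketch below is one possible check, not an official validator; the function name and the two sample operations are hypothetical, and the namespace URIs are the standard WSDL 1.1 and SAWSDL ones.

```python
import xml.etree.ElementTree as ET

WSDL = "http://schemas.xmlsoap.org/wsdl/"
SAWSDL = "http://www.w3.org/ns/sawsdl"
# Elements that must NOT precede sawsdl:attrExtensions inside an operation.
LATE = {f"{{{WSDL}}}{name}" for name in ("input", "output", "fault")}

def attr_extensions_in_order(operation):
    """True if no sawsdl:attrExtensions follows a wsdl:input/output/fault child."""
    seen_late = False
    for child in operation:
        if child.tag in LATE:
            seen_late = True
        elif child.tag == f"{{{SAWSDL}}}attrExtensions" and seen_late:
            return False
    return True

HEAD = f'<wsdl:operation xmlns:wsdl="{WSDL}" xmlns:sawsdl="{SAWSDL}" name="submitClustalw2">'
EXT = '<sawsdl:attrExtensions sawsdl:modelReference="http://edamontology.org/operation_0496"/>'
INPUT = '<wsdl:input message="submitClustalw2Msg"/>'
TAIL = "</wsdl:operation>"

good = ET.fromstring(HEAD + EXT + INPUT + TAIL)  # annotation first
bad = ET.fromstring(HEAD + INPUT + EXT + TAIL)   # annotation after wsdl:input
print(attr_extensions_in_order(good), attr_extensions_in_order(bad))
```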

Existing implementations and annotations with EDAM

EMBOSS

EMBOSS applications have been annotated using EDAM and these annotations appear in corresponding Web services.

Annotated WSDL files (and associated XSD data schema) are available from:

You will see a list of service end-points with WSDL URLs. For example:

To see the data schema associated with a WSDL, you must replace "?wsdl" with "?xsd=1", "?xsd=2" or "?xsd=3". For example:

BioXSD

The BioXSD XML schema (XSD) defines exchange formats of everyday bioinformatics data types. BioXSD aims to serve as the common, canonical data model for bioinformatics Web services. It includes commonly used types including sequences, sequence annotations, alignments and references to resources:

BioXSD has been annotated with EDAM concepts.

DRCAT biological resource catalogue

A catalogue of data resources (DRCAT) is being compiled as part of the EMBOSS project. Each entry in DRCAT gives metadata on a data resource available on the Web. The metadata includes "Query" lines that describe the type(s) of data available, the data format, the data identifier (used to query) and a URL from which data can be retrieved. The "Query" lines and the resources themselves are annotated with EDAM concepts.

A typical entry is shown below:

(NB. The format of the EDAM IDs in this entry has not yet been upgraded to version 1.0. This will be done as soon as possible.)

ID      PDB
Acc     DB-0070
Name    The RCSB Protein Data Bank
Desc    A repository for 3D biological macromolecular structure data.
URL     http://www.rcsb.org/pdb/
Cat     3D structure databases
EDAMres EDAM:0000693 | Tertiary structure
EDAMdat EDAM:0000883 | Tertiary structure
EDAMdat EDAM:0002085 | Structure annotation
EDAMfmt EDAM:0001476 | pdb
EDAMfmt EDAM:0001478 | pdbml
EDAMfmt EDAM:0001477 | mmCIF
EDAMfmt EDAM:0002331 | HTML 
EDAMid  EDAM:0001127 | PDB ID
Xref    SP_explicit | None
Xref    SP_FT | None
Xref    EMBL_explicit | None
Query   EDAM:0002085 | EDAM:0002331 | EDAM:0001127 | http://www.pdb.org/pdb/explore/explore.do?structureId=%s
Query   EDAM:0000693 | EDAM:0001476 | EDAM:0001127 | http://www.pdb.org/pdb/files/%s.pdb
Query   EDAM:0000693 | EDAM:0001477 | EDAM:0001127 | http://www.pdb.org/pdb/files/%s.cif
Query   EDAM:0000693 | EDAM:0001478 | EDAM:0001127 | http://www.pdb.org/pdb/files/%s.xml
Example EDAM:0001127 | 1rbp
Email   deposit@deposit.rcsb.org
CCmisc  EMBL DR line example "1OSN", /dbxref="PDB:12GS"
Status  Referenced
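To show how a "Query" line might be used, here is a minimal Python sketch that reads a few lines of the entry above and fills the URL template with the example identifier. The parser is an assumption based only on the layout shown (a tag, whitespace, then fields separated by "|"); parse_drcat is a hypothetical name, not part of EMBOSS.

```python
def parse_drcat(entry):
    """Group DRCAT lines by tag; split each value on '|' into stripped fields."""
    record = {}
    for line in entry.strip().splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(" ")
        fields = [field.strip() for field in value.split("|")]
        # Tags such as Query and EDAMfmt repeat, so collect a list per tag.
        record.setdefault(tag, []).append(fields)
    return record

# Abbreviated copy of the PDB entry shown above.
ENTRY = """
ID      PDB
EDAMfmt EDAM:0001476 | pdb
EDAMid  EDAM:0001127 | PDB ID
Query   EDAM:0000693 | EDAM:0001476 | EDAM:0001127 | http://www.pdb.org/pdb/files/%s.pdb
Example EDAM:0001127 | 1rbp
"""

record = parse_drcat(ENTRY)
# A Query line holds: data type, format, identifier type, URL template.
data, fmt, ident, url = record["Query"][0]
example_id = record["Example"][0][1]
print(url % example_id)
```

Free-text lines containing "|" would be split as well, so a production parser would need to treat tags such as Desc separately.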

DRCAT development will proceed in harmony with bioDBCore, which proposes a community-defined, uniform, generic description of the core attributes of biological databases:

bioDBCore is under the auspices of the International Society for Biocuration:

All enquiries to Jon Ison (jison@ebi.ac.uk)

Bio-jETI

Bio-jETI allows automatic composition of functional units into software systems according to higher-level specifications using EDAM:

iHOP Web service

The iHOP Web service is annotated with EDAM concepts, either directly or via its use of BioXSD:

CBU Web services

The Web services provided by the Computational Biology Unit (CBU) of the University of Bergen and its affiliated Uni Computing are annotated with EDAM concepts:

eSysbio

eSysbio is a workbench for sharing and analysing bioinformatics data using public or private Web services and R scripts. eSysbio uses EDAM to annotate and denote the type and format of data items submitted to the system.