Text and data mining (TDM) refers to the automated analysis of large amounts of text or other data in order to answer a specific question. Computer-aided methods are used to examine a data set for patterns, trends, or correlations, for example.
- Acquisition of the data
First, the data to be used must be obtained. Works that are not protected by copyright can be used for this purpose, as can copyrighted works, provided that the applicable legal provisions are observed.
- Preparation of the data
Preparing the data includes, among other things, making it machine-readable, structuring and normalizing it, and compiling it into a corpus.
- Analysis of the data
The corpus is analyzed using automated procedures to answer the research question.
- Publication and archiving of the corpus
The corpus is kept available, archived as appropriate, and/or published in accordance with good scientific practice.
- Publication of the analysis results
The results of the analysis are prepared and made public, for example as part of a publication.
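The preparation and analysis steps above can be illustrated with a minimal Python sketch. It is only one possible approach, not a prescribed workflow: the sample texts, the function names (`normalize`, `build_corpus`, `term_frequencies`), and the choice of simple word counts as the analysis method are all assumptions made for illustration. A real project would substitute lawfully acquired data and a method suited to its research question.

```python
import re
import unicodedata
from collections import Counter


def normalize(text):
    """Normalize Unicode, lowercase the text, and split it into word tokens."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.findall(r"\w+", text)


def build_corpus(documents):
    """Compile a corpus: map each document ID to its normalized tokens."""
    return {doc_id: normalize(text) for doc_id, text in documents.items()}


def term_frequencies(corpus):
    """Count how often each token occurs across the whole corpus."""
    counts = Counter()
    for tokens in corpus.values():
        counts.update(tokens)
    return counts


# Hypothetical sample texts standing in for lawfully acquired data.
documents = {
    "doc1": "Text mining finds patterns in text.",
    "doc2": "Data mining finds trends in data.",
}

corpus = build_corpus(documents)
frequencies = term_frequencies(corpus)
print(frequencies.most_common(3))
```

Keeping the corpus as a separate, explicit data structure makes it easier to archive or share it later, independently of the analysis code.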
- Is it commercial or non-commercial research?
- Questions of copyright, legal access (concerns the acquisition of the data)
- Questions of making the data accessible (concerns the transfer or sharing of data)
- Questions about preservation of data (concerns archiving of the corpus)
- Questions about references (concerns the publication of the results of the analysis)
- If applicable, questions of data protection, personality rights, etc.
- Mass downloading of data, such as PDF files or information from databases, is often prohibited and can result in access to the resource being blocked for the whole of TU Dortmund.
We are happy to advise and support you regarding the various aspects of TDM.