Development¶
This guide provides useful information for those who like to contribute to the project or just want into integrate it into their own.
Useful links¶
Write a custom processing step¶
A key part of OLDP is our data processing pipeline. All the data stored on the platform can be used as starting point for further processing, e.g. information extraction.
We illustrate how this can be done with the example of topic extraction. In our example we want to assign one or more topics to a law book. Hereby, a topic means a tag, category or list that helps users finding relevant information.
We start with a new Django app in oldp/apps/topics
and add it to the settings.py
.
Next, we define simple models.
Topic
represents a topic with a descriptive title
.
TopicContent
is an abstract class that we use as mixin to add the topics
field to the LawBook
model.
# oldp/apps/topics/models.py
class Topic(db.models):
title = models.CharField(
max_length=200,
help_text='Verbose title of topic',
unique=True,
)
class TopicContent(db.models):
topics = models.ManyToManyField(
Topic,
help_text='Topics that are covered by this content',
blank=True,
)
class Meta:
abstract = True
Extent the target model (LawBook
in our case) with TopicContent:
# oldp/apps/laws/models.py
class LawBook(TopicContent): # BEFORE: class LawBook(db.models):
# ...
# multiple inherits are possible:
# class Law(TopicContent, SearchContent, SomeOtherContent, ...):
Make and apply migrations:
./manage.py makemigrations
./manage.py migrate
For the actual processing step we need to implement as class called ProcessingStep
that inherits from
BaseProcessingStep
or the corresponding content class. Here, we inherit from LawBookProcessingStep
.
The cornerstone of each processing step is the process
method which takes as input a LawBook
, does the processing
and then returns the processed LawBook
.
In our example, we added an __init__
that pre-loads all available topics. Then, the actual process
only assigns
five random topics to the law book:
# oldp/apps/topics/processing/processing_steps/assign_topics_to_law_book.py
class ProcessingStep(LawBookProcessingStep):
description = 'Assign topics'
def __init__(self):
super().__init__()
self.topics = Topic.objects.all()
def process(self, law_book: LawBook) -> LawBook:
# Remove existing topics
law_book.topics.clean()
# Select 5 random topics
for t in random.sample(self.topics, 5):
law_book.topics.add(t)
law_book.save()
return law_book
Add processing step to settings:
# oldp/settings.py
# ...
# Processing pipeline
PROCESSING_STEPS = {
# ...
'LawBook': [
'oldp.apps.topics.processing.processing_steps.assign_topics_to_law_book', # Add this line
]
}
# ...
If the processing step is added to the settings and the corresponding admin class inherits from ProcessingStepActionsAdmin
,
the processing step gets automatically available as admin action.