Real-Time Water Quality Monitoring and Information Dissemination for South African Cities
A conversational model that informs users with limited access to computational resources about water quality and real-time accessibility for a given location. We used neural-embedding-based natural language understanding, integrated with a chatbot interface that accepts user queries and decides on an action based on entity recognition from the query and online information from standard databases and governmental and non-governmental resources. We present results for several South African use cases and demonstrate utility for information search and dissemination at a local level.
Project status: Under Development
Overview / Usage
Developing countries face non-trivial constraints in adequately monitoring local environments and disseminating relevant information about them. With the advent of smartphones and cheap data plans, it is pertinent to alleviate such information asymmetry by leveraging the wealth of online content generated across sources and estimating the locally relevant pieces using machine learning tools that rely on natural language processing and similar methods. The idea was to use such automated tools to extract topics of situational relevance for different stakeholders under the constraints of inadequate computational bandwidth, last-mile information distribution, and a lack of physical infrastructure or capital resources. Such topical exploration of relevant information through a machine-learning-driven interactive chatbot has myriad applications in our action areas, such as locating potable water sources along with real-time information about impediments to accessibility owing to unanticipated events like road closures or unrest. The same pipeline can also serve other use cases such as cyclone warnings and disease outbreaks.
We attempt to leverage progress in deep learning for language, speech (6), and image context understanding to explore solutions for modelling water quality information and similar environmental aspects. We consider the problem of accessibility of water resources, accounting for real-time impediments driven by unforeseen situations. In this work, we implemented the information relay system as a combination of a Natural Language Understanding (NLU) unit and a dialogue module realised as a chatbot, which serves as both the user input and pipeline output interface. The chatbot recommends a policy/action based on attentive information extracted from the user: keywords or key phrases (such as "water quality") and location, along with relevant metadata. Such requests are matched with relevant sources obtained by online mining of the concerned databases and websites to provide query-relevant knowledge. We leverage recent work on entity recognition to estimate information relevance and combine it with attention mechanisms as described in Fig. 2. The implementation in the South African context operates at a community level, and the aim has been to improve access to information for local, granular decision making. The language processing modules have helped parse relevant governmental, non-governmental, and media resources to address information gaps in such decision making.
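The query-to-response flow described above can be sketched as follows. This is a minimal illustration only: the keyword/gazetteer matching stands in for the trained NLU model, and the cached entries, intents, and locations are hypothetical sample data, not the project's actual databases.

```python
# Minimal sketch of the query -> entity extraction -> source lookup flow.
# Entity extraction here is simple substring matching standing in for the
# trained NLU model; all sample data below is illustrative.

LOCAL_CACHE = {
    ("water quality", "soweto"): "Water quality in Soweto: latest advisory retrieved from cached sources.",
    ("water access", "soweto"): "Nearest potable source listed in cached records; no impediments flagged.",
}

KNOWN_INTENTS = ["water quality", "water access"]
KNOWN_LOCATIONS = ["soweto", "khayelitsha"]

def handle_query(query: str) -> str:
    """Extract an intent keyword and a location, then match against cached sources."""
    q = query.lower()
    intent = next((k for k in KNOWN_INTENTS if k in q), None)
    location = next((loc for loc in KNOWN_LOCATIONS if loc in q), None)
    if intent is None or location is None:
        return "Sorry, please mention both a topic (e.g. water quality) and a location."
    return LOCAL_CACHE.get(
        (intent, location),
        f"No records found for {intent} in {location.title()}.",
    )

print(handle_query("What is the water quality in Soweto?"))
```

In the full system, the dictionary lookup is replaced by the mined online sources and the substring matching by the embedding-based NLU described in the Methodology below.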
Co-authors: Laing Lourens, Luqmaan Hassim, Faheem Sima, Avashlin Moodley, Pulkit Sharma
Methodology / Approach
Our proposed architecture consists of an NLP component and an interface in the form of a chatbot (detailed in the figure). The user input query is processed with a Natural Language Understanding (NLU) model based on the RASA toolkit for processing sequences in English (to be extended to other South African languages), which classifies and parses semantically and contextually relevant information. The NLU model was trained with the TensorFlow embedding pipeline. The policy for this pipeline has multiple steps, including: using dense layers to establish embeddings for entities and actions, including their histories; using Neural Turing Machines (NTM) to calculate attention probabilities over system memory; and using an LSTM to determine similarity between the dialogue embedding and embedded system actions. This allows us to perform entity recognition and intent modelling after the language identification step. A recurrent attention mechanism on top of this identifies salient keywords to aid the global search over publicly available information on water quality in real time.
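To illustrate the attention step, the toy sketch below computes softmax attention weights over token embeddings against a query-state vector and keeps the highest-weighted tokens as salient keywords. The embeddings here are random stand-ins, not the trained TensorFlow embedding layers, and the token list is a made-up example query.

```python
# Toy illustration of attention-based keyword salience. Random vectors
# stand in for trained embeddings; the dialogue-state vector is assumed.
import numpy as np

rng = np.random.default_rng(0)
tokens = ["is", "the", "water", "quality", "safe", "in", "soweto"]
embed = {t: rng.normal(size=8) for t in tokens}       # stand-in word embeddings
query_vec = embed["water"] + embed["quality"]          # stand-in dialogue-state vector

def attention_weights(tokens, query_vec):
    """Softmax over dot-product scores between each token and the query state."""
    scores = np.array([embed[t] @ query_vec for t in tokens])
    exp = np.exp(scores - scores.max())                # stabilised softmax
    return exp / exp.sum()

w = attention_weights(tokens, query_vec)
order = np.argsort(w)[::-1]                            # most salient first
salient = [tokens[i] for i in order[:2]]
print(salient)
```

The highest-weighted tokens would then seed the downstream search over the public sources described next; the real system uses the recurrent attention mechanism rather than this single-step dot-product variant.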
Real-time availability of water and its accessibility is typically a function of a complex interplay between predictable data and unpredictable stochastic impediments, which cannot be estimated from static data alone. It is therefore important that our machine learning method can incorporate real-time changes in the information state at the input stage. We identify key variables for last-mile access to potable water resources not only in terms of environmental variables, but also social and situational aspects such as epidemic outbreaks and civil unrest. We term this dictionary the 'Situational Variables'; it informs the Action step of the chatbot response once the policy has been determined to be something other than Greeting or Farewell (and based on the conversation state, which the bot tracks). This allows such stochastic events to be tracked from user input and from publicly scraped information from sources such as the Department of Water Affairs (Govt. of RSA), WESSA (Wildlife and Environmental Society of South Africa), and Cyanolakes (an online monitoring and mapping service for water and health authorities). Although APIs are not yet available for all of these data sources, they are included in the proposed architecture for optimal system functionality; subsets of these sources were instead stored locally for testing purposes. These sources are parsed to inform the Action of the chatbot's response to the original user query.
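A hedged sketch of how the 'Situational Variables' dictionary could inform the Action step is shown below. The flag names and values are illustrative assumptions; in the deployed system they would be populated from the scraped sources named above rather than hard-coded.

```python
# Illustrative 'Situational Variables' dictionary and Action selection.
# Flag names and values are hypothetical; the real system populates them
# from scraped governmental / non-governmental sources.

SITUATIONAL_VARIABLES = {
    "road_closure": False,   # e.g. parsed from municipal notices
    "civil_unrest": True,    # e.g. flagged from media parsing
    "outbreak": False,       # e.g. health-authority advisories
}

def choose_action(intent: str, situation: dict) -> str:
    """Pick the chatbot Action once the policy rules out Greeting/Farewell."""
    if intent in ("greet", "goodbye"):
        return intent  # handled directly by the dialogue policy
    impediments = [name for name, active in situation.items() if active]
    if impediments:
        # Warn the user about active stochastic impediments to access.
        return "warn_user:" + ",".join(impediments)
    return "report_water_status"

print(choose_action("water_access", SITUATIONAL_VARIABLES))
```

The returned action label would then be rendered into a chatbot response using the conversation state tracked for the bot.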
Technologies Used
TensorFlow
Keras
RASA toolkit