mirror of
https://github.com/ushahidi/SMSSync.git
synced 2026-04-25 15:55:57 +03:00
[GH-ISSUE #22] regular expressions for keyword filtering #17
Labels
No labels
Bug report
Code improvement
Concern
Feature request
Feature request
Good first issue to work on
In progress
Needs info
Question
Ready
Translation
User Experience
User Experience
Website
pull-request
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
starred/SMSSync#17
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @swoertz on GitHub (May 29, 2012).
Original GitHub issue: https://github.com/ushahidi/SMSSync/issues/22
it would be nice if you could filter sms messages by regular expressions
@eyedol commented on GitHub (Aug 26, 2012):
Don't have the bandwidth to implement this feature. I'm leaving it open. Any community member who is willing to implement this feature should let me know. I'll humbly merge any pull request that comes my way on that front.
@olliebennett commented on GitHub (Dec 22, 2012):
I am playing with implementing this at the moment. My thoughts are:
I propose an alternative, whereby entering
regex(YOUR_REGEX_HERE)in the keywords field would cause the filtering to be processed via regular expressions rather than comma-separated keywords.What are your thoughts on this? I will submit a pull request once I have this implemented and tested, so you can decide whether it's acceptable for inclusion! :)
Notes:
@olliebennett commented on GitHub (Dec 22, 2012):
What I've got so far. All changes are within
ProcessSMS.java, and there's nothing complicated.Introduced dependency on java.utils.regex:
Changed name + function call:
Replaced...
... with ...
Then, changed the function itself by replacing...
... with ...
Feel free to take this, as I don't have more time at the moment - it should work but it's not yet properly tested.
@swoertz commented on GitHub (Dec 22, 2012):
I would suggest to make a checkbox (regular expression - true, false) instead of implementing it as a hidden feature. Persons who are not familiar either not check the box or look up what it means. All the others can simple use the feature. Further I would also suggest to make a checkbox for case-sensitivity.
@olliebennett commented on GitHub (Dec 22, 2012):
Okay - thanks for sharing your thoughts. I agree this would be a nice addition, but my concerns remain about making the UI too cluttered, and adding complexity.
Have a look at a mock-up of these new options:
I think you'll agree, it's getting a bit complicated, and there's no longer a nice way to give instructions on comma-separated keywords (currently, the keywords field says: "Enter keyword Eg. Fire, Crime, Help", which is clear and consise).
I'm thinking of a more useful position for the extra options. I think, at least, this feature would require a new full-page screen, that could be used when creating or editing sync URLs.
In my code above, I'm just being lazy, to be honest, and implementing the feature with the fewest changes possible! ;)
I do think your case-sensitivity option is important too. I had considered working this into my "secret" regex activation code - something like
regex(YOUR_REGEX)params(CASE_INSENSITIVE)- but had ignored the idea as this just makes it very confusing. With the UI changes, it'd be a great idea. :)Let's see what @eyedol etc. think?
@swoertz commented on GitHub (Dec 22, 2012):
An other possibility would be, making two input fields and a checkbox for each to (de)activate (grey out the textfields) the filter mechanisms.
Of course this makes the UI more complicated but in my oppinion the benefits of the additional functionality outweighs the increased complexitiy. It would also be possible to omit the checkboxes and check if the input fields are empty or not...
I would further process the message if one of the mechanisms matches (LOGICAL OR).
@eyedol commented on GitHub (Dec 22, 2012):
Perhaps we could use a label saying you can enter regex or CSV to filter for keywords instead of the checkboxes. I agree with @olliebennett the ui can be a bit messy when we add the checkboxes
@swoertz commented on GitHub (Dec 22, 2012):
I agree with you, but we have to distinguish between keywords and regexs. The 3 ways to do that we mentioned so far are:
@olliebennett commented on GitHub (Dec 22, 2012):
Of those three options, I think @swoertz's ASCII layout above is best.
I think the lack of case sensitivity toggling is a fair UX compromise, but there are two options for treating the input:
(k|K)(e|E)(w|W)(o|O)(r|R)(d|D)if required?!Which do you prefer? My use case for RegEx, would be to case-insensitively match messages that start with a keyword, for instance
^keyword.*$.@eyedol, is there a fool-proof way of detecting whether the list is a RegEx, or a comma-separated list of keywords?
@swoertz commented on GitHub (Dec 22, 2012):
In general there is no way, I can think of. Because every keyword is a valid regex and vice-versa if I take the regex literally I can use it as keyword.
If we allow only [a-zA-Z]* for example for keywords we could detect the special characters like "^$*" and so on, but I think this is confusing.
@eyedol commented on GitHub (Dec 23, 2012):
@olliebennett no, there is no fool-proof way of detecting whether the list is a RegEx or CSV. I think we should just make the keyword input filed process RegEx. So as a user when I enter,
heat,sun,cold,yamas keywords, SMSSync should be able to match them and when I enter^keyword.*$it should also be able to match that as well.@olliebennett commented on GitHub (Dec 23, 2012):
Okay. I guess we could try use a RegEx of our own to detect whether the filter text was a valid CSV (and if not, treat as RegEx).
This (realistically) means enforcing some limits on the acceptable keywords used in the CSV list. I propose limiting these keywords to be alpha-numeric, with spaces permitted both surrounding the comma (
test, keyword) and also within the keyword itself (test keyword,and another).Are we happy to place these limits on the acceptable keywords? And what should the limits be? I think those I've listed are sufficient and reasonable.
See http://refiddle.com/by/olliebennett/csv-or-regex for a first attempt that seems to work†
It would be nice to simply treat everything as a regex, but to use @eyedol's example of
heat,sun,cold,yam, we'd need to convert this to(heat|sun|cold|yam)(or something like that) in order to make it have the same effect as the keywords would suggest.If this is okay with you guys, the change should be trivial, and I'll update the code (and relevant documentation) and submit a pull request.
† assuming RegEx in Java behaves in the same way as RegEx in JavaScript...
@swoertz commented on GitHub (Dec 23, 2012):
This is a nice solution, but would not work for e.g. hashtags as keywords (example:
#foo, #bar). Of course if one wants to filter by these tags(#foo)|(#bar)would work. There are other examples beside hash-tags as well. How would you document this, so that the user can recoginze the difference between keywords and reg. expressions while using the app? Many applications which are able to work with regex and "normal" strings use a checkbox (e.g. search-function in text-editors).@olliebennett commented on GitHub (Dec 23, 2012):
This is a tricky one!
I agree - there's a problem of letting the user know that what they think are normal keywords are in fact going to be treated as a (probably useless or even invalid) RegEx.
Perhaps we could add a confirmation box that pops up when saving the Sync URL settings, which would only appear if a special character has been entered, saying†:
The documentation would need to state that:
There would also need to be a couple of common RegEx examples, such as "starts with X", or similar.
It's my opinion that in this case, creating extra options in the UI adds more confusion than it does value.
Another option - as an alternative to using
regex(abc)- is to require the use of^and$at the start and end, respectively (although that still has problems).There's always the (inefficient) workaround of handling the RegEx matching on the server, and only taking any action if a match is found. If it doesn't match, simply return
success: "true"and then both systems can move on. This of course removes the need for RegEx support altogether (but it's not a nice solution!).† at this stage, we could also check the validity of the RegEx.
@swoertz commented on GitHub (Dec 23, 2012):
There are so many combinations which are not alphanumeric, which would lead to the message "You have entered special characters...".
The question is what is more confusing, the checkbox or the info-message when one enters "special" characters. If one is not familiar with regex's the default value of the checkbox (FALSE) is just fine. Further I think most of the smssync users are developers anyway (or is that a wrong assumption?).
How should a user who can't handle regex's react on the info message?
@eyedol commented on GitHub (Feb 11, 2013):
@olliebennett what's the status of this? If you send me a pull request, I'm happy to merge after a code review
@olliebennett commented on GitHub (Feb 11, 2013):
I'm actually using my own modified version with RegEx in place of keywords as standard. Adding the actual RegEx capability was trivial, however adding a configuration option proved more complicated (creating database info, UI elements, loading/saving of existing configuration etc.) so the version I use has no changes to the UI. :( Sorry
Seem to have lost the code, but can provide the apk if anyone needs - just email.
I'll have another go at implementing it sometime, perhaps.
@mandric commented on GitHub (Feb 12, 2013):
Maybe provide a branch if it's not too much hassle?
@olliebennett commented on GitHub (Feb 24, 2013):
I've slightly changed the way this works for me, by first checking whether the filter text matches a keyword, and if not, checking if it matches when treated as a RegEx. I know there could be issues with this approach, but can't think of any that would interfere with the typical, simple use cases.
I'm not comfortable enough with this code to throw a pull request yet, but I'll do some more tests, and consider bulking it out a bit, if needed.
Check out my feature-regex branch and see what you think. If anyone's got any input, please reply here.
@eyedol commented on GitHub (Feb 25, 2013):
We could document on SMSSync website as, for simple filtering, use CSV as the keywords and for complex filtering, consider entering a valid RegEx in the filter by input field.
@olliebennett commented on GitHub (Feb 25, 2013):
Agreed. The only possible problem that I foresee is if someone sends a message containing the RegEx itself. This would match as a 'keyword'.
Do you think we should change the 'enter keywords eg fire, help' placeholder text to something mentioning regex too? Or just document it online?
@eyedol commented on GitHub (Feb 25, 2013):
Yeah, we could make the placeholder text mention RegEx too