reger
6f8c3ccea4
improve url hash computation for file path with mixed java & windows
...
file.separator to compute equal hashes (by normalizing path for computation)
+ expand test case to check mixed java / windows file url notation
like e.g. file:///c:/test/file.html vs. file:///c:\test/file.html
- relates partially to http://mantis.tokeek.de/view.php?id=692
9 years ago
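A minimal sketch of the normalization idea in the commit above, assuming that replacing backslashes with forward slashes before hashing is enough to make both notations compare equal. The class name, the method names and the use of MD5 are illustrative assumptions, not the project's actual hash routine.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of YaCy: shows path normalization before hashing.
final class FileUrlHashSketch {

    /** Replace Windows backslashes with forward slashes so both notations look the same. */
    static String normalizePath(String fileUrl) {
        return fileUrl.replace('\\', '/');
    }

    /** Hex digest of the normalized URL; MD5 is only a stand-in for the real hash. */
    static String hash(String fileUrl) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(normalizePath(fileUrl).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // mask to get an unsigned value
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be available on every Java platform, so this is not expected
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, file:///c:/test/file.html and file:///c:\test/file.html produce the same digest because the hash only ever sees the normalized form.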
reger
efcb6a1e74
fix supported mime XML -> xml for rssParser (mime normalized to lower case for comparison)
...
+ add mime text/xml as in use for rss in the wild
9 years ago
luccioman
b3b75b0498
Accessibility : add a customizable alternative text to YaCy logo
...
Applied W3C recommendations :
https://www.w3.org/TR/html51/semantics-embedded-content.html#a-link-or-button-containing-nothing-but-an-image
and
https://www.w3.org/TR/html51/semantics-embedded-content.html#logos-insignia-flags-or-emblems
9 years ago
luccioman
f2bc1b268d
Updated URL fragment validation rules according to current standards
...
See RFC 3986 (https://tools.ietf.org/html/rfc3986 ) or URL living
standard (https://url.spec.whatwg.org/ )
9 years ago
luccioman
b1b8e69da8
Fixed NullPointerException cases
9 years ago
luccioman
3ee4f56c39
Improved ErrorCache behavior when switching networks
...
Even after network switch, ErrorCache was still holding a reference to
the previous Solr cores, thus becoming useless until next YaCy restart.
Initial error cache filling with recent errors from the index was also
missing after the switch.
9 years ago
luccioman
7d5ba2afa4
Added some JavaDoc and moved crawlStacker close at the right place.
9 years ago
luccioman
8edbcd8ad4
Log eventual Solr instances close errors.
...
We do not want to block on this kind of error, but this should not
silently fail as it may have later consequences.
9 years ago
reger
330768c8a2
fix for solr write.lock after mode change http://mantis.tokeek.de/view.php?id=686
...
The embedded core holds a lock on the index and must be closed. Earlier commit
comment states that core should be closed with solr instance instead of on close
of connector.
Adjusted the InstanceMirror.close() to take care of closing the embedded
instance to release the lock.
In 2 routines of fulltext this was already explicitly implemented (disconnectLocalSolr).
Now this disconnect is part of the InstanceMirror.close().
9 years ago
reger
585d2a6441
test case: for NewsPool to check the id modifier (for unique id)
...
and observe the distribution order .. hands on.
+ add test/DATA to gitignore
9 years ago
luccioman
de5c873e38
Removed unused JavaScript file docs.min.js
...
This file is used by Bootstrap documentation website
(http://getbootstrap.com/ ) but is not part of the Bootstrap distribution
and does not need to be included in a Bootstrap-based application.
9 years ago
Michael Peter Christen
df51e4ef07
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen
e063aaf97f
enable fuzzy search, solr style (append a ~ to get a fuzziness on the
...
word)
9 years ago
reger
ff6589fc0f
test case: simulating multi word query for local rwi index
...
Purpose of the test case is to be able to (controlled) analyse the rwi ranking for
multi word searches (with focus on posintext and word-distance ranking)
9 years ago
reger
e990297d2e
avoid NPE on hello message with missing "yourip" key
...
http://mantis.tokeek.de/view.php?id=684
9 years ago
reger
e51ab8c7aa
hack to generate a unique message-id for messages created in the same second
...
by optionally adding a 1 second offset counter to the current time (which is
used as the unique id part)
9 years ago
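A small sketch of the offset idea in the commit above, under the assumption that the id is the creation time in seconds and that a collision is resolved by moving the id one second forward. The class and method names are hypothetical and not taken from the YaCy sources.

```java
// Hypothetical illustration: second-resolution ids that never repeat.
final class MessageIdSketch {

    // last id handed out, in seconds since the epoch; -1 means "none yet"
    private long lastSecond = -1L;

    /** Returns an id in seconds that is strictly greater than the previously returned one. */
    synchronized long nextId(long nowMillis) {
        long second = nowMillis / 1000L;
        if (second <= lastSecond) {
            // created in the same second as an earlier message: apply a 1 second offset
            second = lastSecond + 1L;
        }
        lastSecond = second;
        return second;
    }
}
```

The trade-off of this approach is that ids created in a burst run slightly ahead of the wall clock until real time catches up.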
Michael Peter Christen
b82300358a
removed version number check because it does not work any more if
...
version numbers are expressed in a different way than we expect. That
could prevent YaCy from running on systems which are appropriate but
whose version string we simply do not understand.
9 years ago
Michael Peter Christen
2107674999
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen
0d28f563f4
fix for java version "9-ea"
9 years ago
reger
3b694b3935
add some javadoc to rwi wordreference distance, position
...
to remember facts for http://mantis.tokeek.de/view.php?id=683
Init missing word position to 0 like in other non text body words
9 years ago
reger
a4465c97d6
as requested, disable/remove old swf parser
...
http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5861#p33098
9 years ago
reger
7f63fc50f3
prepare a IndexSegment test case for RWI index testing
...
+ prevent NPE in Segment.clear() on missing embedded solr instance.
9 years ago
reger
96467c5467
remove not needed counter in Tokenizer (completing last changes)
...
including a small change, word posintext counting.
We remember/store 1st posintext. Previously following words got a handle (posintext)
excluding found. Now it just counts and assigns true posintext as handle (posintext)
9 years ago
luccioman
d66b0f7b7b
Fixed French messages encoding in YaCy tray.
...
Also added the missing French translations.
9 years ago
reger
7efb66ee10
adjust the WordReference.join wordsintext calc to take the max (instead of sum)
...
The reference is for the same url (add same for title and phrases).
+ del redundant join() procedure
9 years ago
luccioman
0a9ff14d96
Fixed NullPointerException case and added Javadoc
9 years ago
luccioman
06d4f93d03
Merged master into postprocessing branch
9 years ago
Michael Peter Christen
b73d2db914
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen
25a3c7a6d0
catch exception and write end of object
9 years ago
reger
272cdd496a
reactivate sentence counter in WordTokenizer for phrasepos ranking,
...
by counting punctuation (delivered as 1 char word) again.
9 years ago
Michael Peter Christen
5e165a8150
removed unused imports
9 years ago
Michael Peter Christen
c716648c78
enhanced json encoding of strings
9 years ago
Michael Peter Christen
6139bd85a8
fix for broken facet names
9 years ago
Michael Peter Christen
5060f9fee9
fix for too long snippets
9 years ago
Michael Peter Christen
8681cee3f3
fix for bad comma
9 years ago
Michael Peter Christen
db6d8fc197
fix for bad json
9 years ago
Michael Peter Christen
8f4a341735
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen
9934f546bb
added default fl to solr query, removed large texts retrieval and
...
changed snippet to description tag if no other description is available
9 years ago
reger
120bf7e6e2
implemented RWI WordReference to return the word position value (was always left empty)
...
This is needed and enables existing word position ranking for RWI.
The upcoming concurrency issues in word position min/max calculation were eliminated
by iterator.hasNext check before next() access.
9 years ago
reger
e310ec5f70
fix posInText ranking calculation to score 0 on no position info
...
+ fix Word posInText calc in Tokenizer to start with 1
+ test case
9 years ago
luccioman
74f9927ddc
Merge remote-tracking branch 'origin/master' into dist_macOS
9 years ago
reger
51c077f493
adjust the getTopics() and getTopicNavigator() to current usage
...
- move the maxcount limit restriction completely to getTopicNavigator (as it is not used in getTopics)
- let search servlet use getTopics by default (w/o RWI connected check, as of now, Topics are available w/o any additional index interaction)
9 years ago
reger
39dd244693
fix ConcurrentScoreMap.set() calculation of totalCount()
...
+ test case
9 years ago
reger
ebf818ad95
log an error on aborted news publish (due to duplicate news.id)
...
+ change printed err msg to log entry in PeerAction.processPeerArrival
9 years ago
reger
cc2d9dd3f1
reactivate the use of included-in-topwords boost in postRanking
...
+ changed the postRanking to add one score only if word appears more than once.
+ getTopics() unused code block rem'd (save performance)-> routine needs rework !
9 years ago
luccioman
39ea28adfd
Merged master to dist_macOS branch.
9 years ago
luccioman
8255e91c99
Fixed serverClassLoader.findClass method
...
htroot is supposed to be a subfolder of appPath and not of dataPath,
as assumed in other places where htroot is loaded. This issue was not
visible when dataPath and appPath are equal.
9 years ago
reger
6801673a07
apply postranking media search boost only on media queries
9 years ago
luccioman
1dc4306058
Fixed indentation for better readability.
9 years ago
luccioman
8c49a755da
Postprocessing refactoring
...
Added Javadocs to refactored methods.
Added log warnings instead of silently failing some errors.
Only fill collection1hosts when required ( shallComputeCR true).
9 years ago
luccioman
42f45760ed
Refactored postprocessing
...
For easier understanding and performance profiling.
9 years ago
reger
4386e84b55
correct NewsPool retention calculation
...
(was still clearing everything after one day)
9 years ago
reger
5e72d37f0a
TransNews_p: add ad-hoc translation of target file on positive vote (addition to local translation)
...
+ errmsg on language=default
9 years ago
reger
9462a32244
Added news service for easy, community driven UI translation support.
...
New or modified translation (via /Translator_p.html) can be shared/distributed
via the YaCy internal news service. Remote peers can see and vote on the
translation via the new http://localhost:8090/TransNews_p.html servlet.
A positive vote will add the received translation to the local translation
list and post a voting message to the news service.
(at this time, no processing of received votes is implemented)
+ fixed the msg service retention time check (NewsPool.automaticProcessP)
9 years ago
reger
f8d6543a23
Rename class CreateTranslationMaster to TranslationManager and add
...
additional routines and the capability to handle translation maps internally
(to reduce complexity of handling translation maps for calling servlets)
9 years ago
reger
19b4509d54
speed-up reading of xliff language file, by using xmlparser (stax) instead of jaxb
...
making xliff-core-1.2-1.1.jar obsolete
9 years ago
Michael Peter Christen
e1fac86f53
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen
a9316ceff6
force browser-caching of favicons from search results
9 years ago
Orbiter
503312ca43
Merge pull request #61 from luccioman/heroku_experiments
...
Deploy YaCy on Heroku
9 years ago
reger
33bf35d90f
missing file for prev commit "Introduction of additional language setting browser"
9 years ago
reger
16e8ed3f01
Introduce additional language setting "browser/Browser Language" for UI internationalization.
...
If language is set to "browser" the client/user browser language is used to choose from
available translation.
simply: one users browser speaks English -> YaCy responds in English, other users browser speaks French -> YaCy responds in French.
! To make a translation/language available you have to activate the language once !
(or manually use the utility class TranslateAll)
In ConfigBasic.html available translations are marked green on setting language=Browser
The client language is determined by http header Accept-Language (checked in DefaultServlet)
9 years ago
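The language selection described in the commit above can be sketched with the JDK's own language matching, assuming the available translations are known as a list of language codes. The class name, the method name and the fallback value are illustrative assumptions; the actual check in DefaultServlet may work differently.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of choosing a UI language from an Accept-Language header.
final class BrowserLanguageSketch {

    /**
     * @param acceptLanguage raw Accept-Language header value, may be null
     * @param available      language codes with an activated translation, e.g. "de", "fr"
     * @param fallback       language used when nothing matches
     */
    static String chooseLanguage(String acceptLanguage, List<String> available, String fallback) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return fallback;
        }
        List<Locale.LanguageRange> ranges;
        try {
            // parses "fr-FR,fr;q=0.9,en;q=0.8" and sorts the ranges by descending weight
            ranges = Locale.LanguageRange.parse(acceptLanguage);
        } catch (IllegalArgumentException e) {
            return fallback; // malformed header
        }
        // RFC 4647 lookup: "fr-FR" falls back to "fr" if only "fr" is available
        String match = Locale.lookupTag(ranges, available);
        return match != null ? match : fallback;
    }
}
```

For example, a browser sending "fr-FR,fr;q=0.9,en;q=0.8" would be served the "fr" translation when "de", "en" and "fr" are available.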
reger
3b47a07dd1
change unused servletProperties entry CONNECTION_PROP_CLIENT_REQUEST_HEADER to
...
use directly HttpServletRequest. This is used to get the http protocol version
in HTTPDProxyHandler.fulfillRequestFromWeb() for error response to client.
- adjust YaCyProxyServlet and UrlProxyServlet accordingly
- use more http_version constants in headerframework and httpdeamon
- equalize servlets (3) use of HeaderFramework.CONNECTION_PROP_HOST to HeaderFramework.HOST
9 years ago
reger
036c1dc6ef
fix CookieTest_p formatting (output of <br> as text),
...
change to dataoutput only by servlet, leave formatting to html.
+ removed link to obsolete env/grafics gif
9 years ago
Michael Peter Christen
bf6709d196
fixed missing browser activation in linux
9 years ago
Michael Peter Christen
d8504418b6
enhanced browser-caching of static content
9 years ago
Michael Peter Christen
079112358c
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen
efeb592661
don't do solr optimization, this creates high IO load. We should leave
...
this task to solr to do that on its own instead of forcing it.
9 years ago
luccioman
46b8836548
Copy image resources contained in donation iframe.
...
Handle eventual images loading errors.
9 years ago
reger
4c7a77662a
eliminate dependency on file-extension in storeDocument but use supported mime-type
...
to also support handling of urls w/o corresponding file-extension.
For this refactor use of document.getParserObject() to always return a Parser (for clean logic)
and define/move the scraperObject as local var of AbstractParser.
Adjust related calls to getParserObject (where actually a scraperObject is wanted).
Additionally skip appending url token to parsed text for dht metadata entries
(by default returned as result by rwi index).
9 years ago
reger
ebde21079a
refactor xlsParser to include Excel file attribute (like author) in parser result doc.
...
Similar to ppt and doc parser, completing a TODO in xlsParser.
9 years ago
luccioman
744c9a2615
Opensearch desc : handle https protocol url with default port (443)
...
This completes modifications made for mantis 669
(http://mantis.tokeek.de/view.php?id=669 )
9 years ago
luccioman
b9c28893ee
Merged master to 'heroku' branch.
9 years ago
Michael Peter Christen
103a8348b3
fix for NPE and small performance enhancement
9 years ago
reger
2910fe35c1
add missing scheduler calc of next exec_date (call of calculateAPIScheduler)
...
- after last_exec_date is altered, next_exec_date should be recalculated
- makes the recalculation of next_exec in advance (without api call surely made) in Switchboard.schedulerJob() obsolete
Slightly modify next_exec calc. on missed event to now+schedule_time (from fixed 10min)
9 years ago
reger
70d47ae38a
keep scheduler selection by repeat entry from 07311020d4
...
to allow exec schedule on actual exec event.
Iterate on exec date (of advantage after interruption/shutdown) to schedule
older or missed events first.
9 years ago
reger
7c3f932e5d
revert due to conflict with double count recording by scheduler / servlet by the commit under normal operation (no shutdown)
9 years ago
reger
07311020d4
postpone apicall exec date init until actual call
...
fix for http://mantis.tokeek.de/view.php?id=677
The difference is on scheduling a large number of rss feeds and loading
is not finished before shutdown of YaCy. The change makes sure not already
loaded RSS will be loaded by the scheduler on next startup.
9 years ago
reger
5e335b32da
fix Blacklist.contains() matching path pattern to string
...
similar to 5e9e871192
+ add proof testcase
9 years ago
reger
5e9e871192
fix Blacklist.remove by using pattern.toString to find pattern to remove,
...
parameter String path did never equal Pattern.
+ delete unused removeAll, as it does not persist changes after restart
9 years ago
reger
1843ea7e69
on Blacklist.add pattern to source file also update internal entry maps
...
as in Blacklist.add(blacklistType) to make entry effective w/o restart
fix for http://mantis.tokeek.de/view.php?id=676
9 years ago
reger
bf6ce33da3
Correct use of _htDocsPath config in YaCyDefaultServlet to use servlet config variable
...
+ add some javadoc and remove a not useful static declaration
9 years ago
luccioman
480027ec98
Merge remote-tracking branch 'origin/master' into heroku_experiments
9 years ago
reger
fcad2d0744
add uses of config constant INDEX_RECEIVE_ALLOW
9 years ago
reger
226f81cfcf
declare poison pill url MultiProtocolURL() as protected to make sure not
...
used from outside.
After double checking use of poison url revert path init from commit
f8632ad292
9 years ago
reger
f8632ad292
prevent string index out of bounds MultiProtocolURL.getPaths
...
as path may be an empty string
+ init path to "" also in init for poison url (to guarantee success for
all existing uses of path w/o check for null)
9 years ago
reger
35a7d57260
update lucenematchversion to current (5.2.0 -> 5.5.0)
...
there should be no need for reindex by the update
9 years ago
reger
9b07bbf955
deprecate newurl(), not used and already replaced
...
instead of making it handle all the supported protocols
9 years ago
luccioman
47d486298f
Merged changes from master.
9 years ago
reger
774b3906a9
fix GenericFormatter.parse ("time","timeoffset")
...
change: UTC offset internally expected in minutes
9 years ago
reger
27163af0e1
improve detection of referenced links by taking http and https link protocol
...
into account
+ correct query start detection of commit f89d4eb51d
9 years ago
reger
f89d4eb51d
fix MultiProtocolURL init (assign of host) for urls with '/' in query part
...
+ add to test case
9 years ago
reger
87fcfc6d78
Adjusted hash computation and toNormalform for file:// protocol to deliver
...
the same hash for the same file on Windows filesystem path with forward- and backslash in path.
Background see http://mantis.tokeek.de/view.php?id=671
+Test case
9 years ago
luccioman
d6bf90803f
Merged from main master branch.
9 years ago
luccioman
9b9c112263
Handle more properly local port configuration by system property
...
And prefixed property with "net.yacy" to avoid ambiguity.
9 years ago
reger
3811184abd
fix GSA servlet clientIP retrieval
9 years ago
reger
7ab41d4ff1
use directory's original lastmodified date in file- & smbloader in response
9 years ago
reger
708bcbb042
one more replacement to use cached hosthash vs. calculated
9 years ago
luccioman
b57a06d88e
Let Heroku decide which http port to use
9 years ago
reger
22db449f2a
to prevent crawler to concurrently access and alter same crawl queue
...
after restart, put hosthash in queue's filename (which is used as primary
key for crawl queue). Hint: initial hosthash from url and recalculated hosthash
from just hostname:port are not the same.
fixes http://mantis.tokeek.de/view.php?id=668 (partially)
9 years ago
luccioman
893a40995a
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Orbiter
50c5ddf1a1
Merge pull request #56 from luccioman/LibreJS
...
LibreJS compliance : YaCy JavaScript license information
9 years ago
Michael Peter Christen
7466d390b2
small refactoring + do not accept too old peers during bootstrap
9 years ago
luccioman
6e96c7341a
Merge remote-tracking branch 'origin/master'
...
Conflicts:
htroot/Load_MediawikiWiki.java
htroot/Load_PHPBB3.java
htroot/ViewImage.java
9 years ago
reger
8d58a48029
remove wrong log line in CrawlSwitchboard
...
+ don't allow CrawlSwitchboard to exit application
making network param unused
9 years ago
reger
5aaa057c65
ignore empty input lines in FileUtils.getListArray() to poka-yoke blacklist read.
...
equalizes behavior with getListString()
improves: case where blacklist file contained an undesired empty line, not
fixed by blacklist-cleaner.
9 years ago
reger
41c36ffd75
exclude rejected results from result count
...
(by using the resultcontainer.size instead of input docList.size)
skip waiting for write-search-result-to-local-index
(by removing the Thread.join - which will bring a small performance increase)
9 years ago
reger
d4da4805a8
internal wiki code, require header line to start with markup
...
(to allow something like "one=two" as text)
+ incl. test case
9 years ago
reger
e952e355a2
have Translator servlet adhoc apply added translation by translating a single file
...
+ fix NPE in Translator, coming from translation read by TranslatorXliff
which allows null content for not translated keys
9 years ago
reger
b119ff65be
clean out not used Switchboard variables
...
counter indexedPages, const xstackCrawlSlots
9 years ago
reger
223071337b
Translator to take care of word boundaries to identify text portion to
...
be translated. To avoid key="TEST" sourcetext="this is a myTESTcase for it"
translation of partial terms/words.
Add check of word boundary before and after sourcetext (incl. take care
of current practice for key to be delimited by > <)
+ add test case
9 years ago
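The word boundary check from the commit above can be sketched as follows, assuming a boundary is the start or end of the text or any character that is neither a letter nor a digit (which also covers keys delimited by > and <). The class and method names are hypothetical; the real Translator code may decide boundaries differently.

```java
// Hypothetical illustration: replace a key only where it stands as a whole word.
final class WordBoundaryTranslateSketch {

    /** True if the position is outside the text or holds a non letter/digit character. */
    static boolean isBoundary(String text, int index) {
        if (index < 0 || index >= text.length()) {
            return true;
        }
        return !Character.isLetterOrDigit(text.charAt(index));
    }

    /** Replaces every whole-word occurrence of key in text with translation. */
    static String translate(String text, String key, String translation) {
        if (key.isEmpty()) {
            return text;
        }
        StringBuilder out = new StringBuilder();
        int from = 0;
        int hit;
        while ((hit = text.indexOf(key, from)) >= 0) {
            int end = hit + key.length();
            if (isBoundary(text, hit - 1) && isBoundary(text, end)) {
                out.append(text, from, hit).append(translation);
                from = end;
            } else {
                // partial term such as "myTESTcase": keep the text and retry one char later
                out.append(text, from, hit + 1);
                from = hit + 1;
            }
        }
        out.append(text, from, text.length());
        return out.toString();
    }
}
```

With key "TEST", the text "this is a myTESTcase for it" is left untouched, while ">TEST<" has the key replaced.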
luccioman
009657791e
Merge remote-tracking branch 'origin/master' into LibreJS
9 years ago
luccioman
a73c9327a5
JavaScript License fixes for LibreJS compatibility
9 years ago
reger
0c40401d28
fix MessageBoard test for null data
9 years ago
reger
5b22c63030
Adjust TranslatorXliff to load default 1st and merge downloaded or modified local translation.
...
process 1. load default from locales/*.*
2. load and merge(overwrite) from DATA/LOCALE/*.* (can be partial translation as it is merged)
- include all entries from DATA/LOCAL to be edited in Translator servlet
and save just modifications (instead of full list) to DATA/LOCALE
This shall make it easy to share modifications.
9 years ago
reger
a2e0f00456
optimize Translator
...
- translateFilesRecursive: load translation once (reduce io), return true on complete success
- remove resulting unused translateFiles() variant
- translate: use StringBuilder parameter (skip toString conversion)
- remove not needed static declaration
- upd some javadoc
9 years ago
reger
a6ba1faa80
introduce a translation edit servlet Translator_p.html for YaCy's UI text translation
...
This is the 1st rudimentary approach to support the translation utilities.
It allows currently to edit untranslated text and save it in a local translation file
in the DATA/LOCALE directory.
+ refactor Translator (less static's) to leverage on class overrides and support garbage collection for this 1 time routine
+ adjust TranslatorXliff to check for local translations in DATA/LOCALE,
this includes storing manually downloaded translation files in DATA as well
(to keep default untouched)
+ on 1st call of Translator_p a master translation file is generated, checking
the supported languages for missing translation text (later this masterfile is planned to be part of the distribution, to harmonize translation key text between the languages)
Outlook: the local modifications (possibly as translation fragments instead of complete file) to be shared with maintainer using xliff features.
9 years ago
reger
b3c9041f79
remove publicIPv4HostNames and publicIPv6HostNames, which are redundant with localHostNames (and unused)
...
to free unused resources
9 years ago
reger
bd8f7c11f5
Use transparent addToCrawler in AutoSearch instead of addToIndex
...
This would likely also be of advantage for RSS import/schedule as
following bug-reports suggest
http://mantis.tokeek.de/view.php?id=569
http://mantis.tokeek.de/view.php?id=655
9 years ago
reger
f23d8ab47b
fix 2 more servlet RuntimeException in intranet mode thrown due to seed.getIP()
...
returning null in intranet mode (in servlets: ConfigSearchBox, Load_PHPBB3)
+remove unused (const ∅) seed.IPTYPE
9 years ago
reger
bb0076c3dd
fix: assure close inputstream in TranslatorXliff after reading xlf file
...
by using try-with-resources block
9 years ago
reger
6384b7d82e
fix NPE in Load_MediawikiWiki servlet in intranet mode
...
- in intranet mode getip returns null causing a NPE
- adjust starturl (which was set to http://localip/repository ) which is never the start url for the Mediawiki
+ correct javadoc for seed.getIP()
9 years ago
Michael Peter Christen
596b5dfa59
add the JRE version in the seed. Purpose: identify if it is possible to
...
migrate to new JRE version
9 years ago
reger
4cc38e979d
add InputStream close after reading input file (Vocabulary_p servlet)
9 years ago
reger
6bf9c55584
adjust Solr select servlet to latest bugfix for boostquery (bq param)
...
to split query into multiple parameter on line separator in input query.
e.g. split "crawldepth_i:0^10.0 \n crawldepth_i:1^5.0"
but allow "url_file_ext_s:jpg OR url_file_ext_s:png" to be unsplit
9 years ago
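A minimal sketch of the splitting rule described in the commit above, assuming that only line separators divide the input into separate boost query parameters and that blank lines are dropped. The class and method names are hypothetical and do not reflect the servlet's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: one bq parameter per input line.
final class BoostQuerySplitSketch {

    /** Splits the raw input on line separators; spaces inside a line are left alone. */
    static String[] split(String bq) {
        List<String> parts = new ArrayList<>();
        if (bq != null) {
            for (String line : bq.split("[\\r\\n]+")) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    parts.add(trimmed); // each non-empty line becomes its own parameter
                }
            }
        }
        return parts.toArray(new String[0]);
    }
}
```

Because only line breaks are treated as separators, an expression such as "url_file_ext_s:jpg OR url_file_ext_s:png" stays a single boost query.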
Burkhard
9a18e2297b
Merge pull request #51 from JeremyRand/multiple-boost-query
...
Fix multiple boost queries
9 years ago
reger
f0d7b93372
make use of and activate autodetect charset in Vocabulary input from file
...
+ revert mistake of empty cn.lng
9 years ago
JeremyRand
433217b33e
Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)
9 years ago
JeremyRand
58824dfa6c
Refactor escaping in config file read/write code. Now it uses Apache Commons StringUtils instead of RegEx.
9 years ago
reger
9e94989237
upd to PDFBox 2.0.1
9 years ago
reger
d0a571bed2
del cytag trail for own index.html (save resource not used by default)
9 years ago
reger
de46879637
fix SeedDB.get(byte[]) hash string compare (for returning own seed shortcut)
9 years ago
reger
24b0fa2a38
extend snapshot Html2Image.pdf2image to use PDFBox image export capability
...
if no external tool installed (and for Win)
Resulting jpg are not always perfect (if graphic included) but imho sufficient.
9 years ago
reger
eb2a00b1d8
fix NPE on missing crawldepth_i
9 years ago
reger
efb9f1a8b7
save resource for unused blacklistFiles map
9 years ago
reger
5f113be760
cleanup connectPeer & yacyVersion.latestRelease usage
...
obsolete since
527b3decde
9 years ago
reger
7097dcbdbd
cleanup hack for partial Solr update on multivalued datefields
...
has been fixed in Solr http://issues.apache.org/jira/browse/SOLR-8050
9 years ago
reger
f10ea3c155
clean-out unused SwitchboardConstants
9 years ago
reger
ef24593347
delete obsolete SEARCHRESULT busythread constants
...
not used since 29.05.2013 18:27:27
0c1a018bbd
9 years ago
reger
125b5e26a5
apply bugfix for ChartPlotter from Pullreq 42
...
https://github.com/yacy/yacy_search_server/pull/42
thanks to otteresk (https://github.com/otteresk )
9 years ago
reger
06ce9ae711
prevent "unchecked conversion" compiler message
...
+ include "translate" property in xlf "trans-unit" export
9 years ago
reger
b4a576dbdf
exclude unused protocol param "duetime"
...
(receiver interprets param "time" only)
9 years ago
reger
3bd6ae8d8b
keep addon/Notepad++ keyword marker on lng export
...
(length of remarks divider line)
+ harmonize status_p.inc lng text
9 years ago
reger
16837d60c7
fix version in locale version file
...
(it's compared to full version)
9 years ago
reger
0fb01e429e
fix migration, account for ssl port in config (for auto-disable https)
9 years ago
reger
7be1c7a05a
fix logger name
9 years ago
reger
1d940e5a94
upd commons-compress 1.11
9 years ago
reger
7789c32c82
delete crawl queue on init exception
...
(happens occasionally on path name violation and will never get resolved)
9 years ago
reger
f781b9dd47
revert call condition f. migration.installSkins
...
(a bug introduced in fb8ae14b21,
see comment on that commit)
9 years ago
reger
3adb670f44
remove never used Domains.myHostNames set
9 years ago
reger
6ecc180299
fix rwi doubledom return best (highest) ranking
9 years ago
reger
2343e3f1cd
keep and update existing xlf translation master instead of create new
...
in utility CreateTranslationMasters
+ small fixes in lng's
9 years ago
reger
a1935f485f
Added utility class CreateTranslationMasters to create a language independent
...
translation master as source to harmonize individual translation files
Included a main to create masters in YaCy and xliff format for testing
+ restrict TranslatorXliff to use only entries with State=translated
P.S. used https://open-language-tools.java.net/editor/about-xliff-editor.html to
experiment with xlf output (haven't a Pootle avail.)
9 years ago
reger
acaf51b296
keep ConfigLanguage_p as 1st entry in exported translation file
...
+ rem untranslated text & some typo fixes in several translations
(considering to create a translation master file to harmonize entries)
9 years ago
reger
61c5b6b403
fix empty drop down list in ConfigLanguage after wrong/empty download
...
+ add xliff translated attribute
+ append Japanese lng name
9 years ago
reger
4eddabee42
translate Network History screen -> de
...
+ remove leftover debug line
9 years ago
reger
90c79014ae
remove unused translator routine which also doesn't handle rel path input
...
+ correct some language file match issues
9 years ago
reger
902e79e261
Introduce a TranslatorXliff which can read/write xliff from/to internal translation map.
...
This eases up suggested initiatives from http://mantis.tokeek.de/view.php?id=649
Allows longer term also to store translation maps for the htroot files
in standardized/reusable xliff format ( http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html ).
+ added test case creating and comparing xliff file with internal custom prop file.
(currently the introduced class is not used in core code)
9 years ago
reger
d9adc2c255
load handler for Transparent Proxy on startup only if feature is activated
...
to save the resources and keep handler chain small if the feature is not used.
+add a warning message on settingsack_p page to restart on first activation
9 years ago
reger
ec24a0c85a
add test case for optimized toTokens()
9 years ago
reger
cada24f918
adjust utility ListNonTranslatedFiles for path compare on windows
...
(backslash replace)
9 years ago
reger
fb8ae14b21
make migration version safe
9 years ago
reger
258cd41577
reduce logging (EmbeddedSolrConnector.query)
...
mainly to reduce the frequent metadata checks like
> EmbeddedSolrConnector.query QUERY: q={!cache=false raw f=id}xXxXxX&rows=1&start=0&fl=id,load_date_dt
(p.s. direct servlet queries logged via AccessTracker.addToDump)
9 years ago
reger
6783ef5540
move example code SearchClient out of yacycore package
...
to example directory
9 years ago
Michael Peter Christen
b89465d952
0N - basic dump upload servlet infrastructure, to share index dumps
...
within an experimental new sharing model
9 years ago
Michael Peter Christen
f12a900f3e
harmonization of http post of files for one and several files - this had
...
been done differently - and wrong for several files. also: base64-encoding
for gzipped push files because our data structures currently only
support ASCII POST pushes.
9 years ago
Michael Peter Christen
849ab671a9
0n: modified the p2p bootstrapping process - rules had been too tight and
...
did not support the re-start of a network with just one principal peer.
9 years ago
reger
764f5100f0
fix delete of temp file after odt & ooxml parser
...
Close zipfile after parsing
9 years ago
reger
379e9b330d
use supplied url port to get robots.txt in crawlers hostqueue
9 years ago
reger
58a959403d
fix mixed logfactory in UrlProxyServlet,
...
Class doesn't use functions of declared ancestor, change to extend on httpservlet
9 years ago
Michael Peter Christen
2494a820c7
0N - added recording of dump exports if given time frame is not negative
9 years ago
Michael Peter Christen
ef2cc4f690
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen
a6bf0b1649
0N - added option to generate index export files for a specific number
...
of minutes in the past and reverted latest change. The export file dump
will now contain four data elements: f - first date of index entry write
date, l - last date of index write date, n - now-date of index dump
time, c - count of numbers inside the dump. '0N' denotes a series of
changes which will lead to the opportunity to exchange index data dumps
in a way that is needed to integrate ZeroNet index data. This will be
based on index dump sharing; that causes this commit.
9 years ago
reger
6d56beaed8
fix assertion exception in toString of MultiProtocolURL
...
toString of AnchorURL and MultiProtocolURL are identical code
(no need to override or to protect call to parent)
as reported in https://github.com/yacy/yacy_search_server/issues/43
9 years ago
reger
42a7bdb2af
fix SolrSelectServlet authentication to default to true
9 years ago
reger
dbb28bb4f3
del unused statistic parameter (from status servlet)
9 years ago
reger
06d0e2aeb9
result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method.
...
- Above brought up that parser start url parameter, declared as AnchorURL uses only methods of parent object DigestURL (changed parameter declaration accordingly).
9 years ago
reger
caf9e98f09
put metadata dc_publisher in corresponding schema field
9 years ago
reger
38e2b054d4
remove servlet classloader internal cache map (to save the resources, cache hits marginal)
...
- DefaultServlet includes already a class cache "templateMethodCache" which is emptied
on low mem status
- avoid a classloader cache that gets no hits but over time holds all (used) servlet classes
9 years ago
luc
3f338777f7
Also check and index eventual icon url information from metadata.
9 years ago
luc
9f712146df
Display icons in ViewFile "links" mode.
9 years ago
luc
26f1ead57c
Created ViewFavicon class specialized in favicon viewing.
...
Main image processing is now in ImageViewer, used by both ViewImage and
ViewFavicon.
Fixed URIMetadataNode.getFavicon to use non-standard icons with no size
as fallback.
9 years ago
reger
6f0b073bf3
override detected language (statistic langdetect) only with TLD determined
...
language if langdetect probability is not high.
+ additionally truncate zh-cn / zh-tw returned by langdetect to 2 char ISO639-1 zh
used by YaCy
9 years ago
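A minimal sketch of the rule described in the commit above: keep the statistically detected language unless its probability is low, and shorten region-qualified codes such as zh-cn to two characters. The class name, method names and the 0.9 threshold are hypothetical and not taken from the YaCy sources.

```java
import java.util.Locale;

// Hypothetical illustration only; names and threshold are assumptions.
final class LanguageChoiceSketch {
    // Assumed cut-off for "high" detection probability.
    static final double HIGH_PROBABILITY = 0.9;

    // Shorten codes like "zh-cn" / "zh-tw" to the 2 char form "zh".
    static String normalize(String detected) {
        if (detected == null) return null;
        String lower = detected.toLowerCase(Locale.ROOT);
        return lower.length() > 2 ? lower.substring(0, 2) : lower;
    }

    // Prefer the detected language; fall back to the TLD-derived one
    // only when detection is missing or not confident enough.
    static String choose(String detected, double probability, String tldLanguage) {
        String lang = normalize(detected);
        if (lang == null) return tldLanguage;
        if (probability < HIGH_PROBABILITY && tldLanguage != null) return tldLanguage;
        return lang;
    }
}
```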
reger
b65e2b527d
include use of condenser's content text for language detection.
...
Language identification may show poor performance on documents with short or no
title but clear lang indication in text content. Using content text too
improves lang detection.
+ remove double caching of text in Identificator
9 years ago
luc
07222b3e1a
Added favicon url transmission in RWI chunks.
9 years ago
luc
480772c070
Fixed json search results from commit "Improved URLLicence reliability"
9 years ago
reger
937fbb0b9f
correct isHidden() for smb from last commit
9 years ago
reger
535d4bf75f
respect hidden attribute for file and smb directory listing
...
(hidden directories are not listed, affects crawling of local file system)
9 years ago
luc
3cc5619d93
Improved HTML icons indexing and rendering in search results.
...
See http://mantis.tokeek.de/view.php?id=629
9 years ago
luc
edef6cd0dc
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
c28142095a
add findClass() to servlet class loader (used in YaCyDefaultServlet)
...
In the 2 cases where servlet calls servlet the jvm classloader chain is
invoked and servlet class loaded by jvm loader (successful while requiring
htroot in system classpath). This patch uses the standard override design
for loaders to handle these cases (making it no longer crucial to have htroot
in system classpath, as this classLoader is mainly used for servlets and
looks in this case for the class in the configured path).
+ As the default classloader is parallel capable we should register this one too.
9 years ago
luc
f7b854465b
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
a6617ad887
expand initRemoteCrawler() to terminate worker threads if called to deactivate
...
remote crawl.
On startup we save the resources for the remote crawler if it is disabled. Once started,
the threads kept running idle after remote crawl was disabled. Now the threads are terminated
to save the resources also when disabling during runtime.
+ remove empty class Channels
9 years ago
reger
2048b7e057
support scraping start-/enddate from html tag with property "datetime"
...
This may be used in html5 <time> tag (which we don't explicitly support yet for date in content scraping).
9 years ago
reger
900d4584ba
complete resource cleanup of lists in contentscraper's close()
9 years ago
luc
aa60ad1dbc
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
1f18653de0
pass parsed swf content through htmlscraper
...
Swf may contain a subset of html tags which should not appear as text.
Especially <font> tag may totally screw up metadata servlet if not filtered out.
9 years ago
reger
18ecf57792
add support of compressed swf to swfParser
...
from JavaSWF2 (source compatible to WebCat).
Moved swf file signature check to parser
Changed use of synced vector to list swf InStream
9 years ago
sixcooler
5cb7ba0dc4
fix for connections not getting closed to get favicon.ico during search
9 years ago
luc
ef83e34b8a
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
ed3e16e092
apply remote result count config value to Bookmark Autosearch
...
+ prepare to make the widely unused Bookmark feature optional
9 years ago
Ryszard Goń
a98c395023
Add the Autocrawl thread
9 years ago
Ryszard Goń
1728cd30c6
Create autocrawl profiles
9 years ago
luc
41767a01c2
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
ff27824964
fix swfParser reading file signature
...
before passing to library (current version expects data w/o signature)
9 years ago
luc
7aa1a29e33
Return more accurate HTTP status 400 with detail message when some error
...
occurs on ViewImage :
- missing required parameters
- url licence invalid
9 years ago
luc
bd9dc2f32b
Corrected NullPointerException cases occurring in YJsonResponseWriter
...
when no description is available.
9 years ago
luc
0076f9f97d
Updated documented sample url
9 years ago
luc
cfdbc2b487
Improved URLLicence reliability for use by concurrent non-authorized
...
users.
Removed URLLicence generation when unnecessary (authorized users)
9 years ago
reger
c91e712178
further refactor using standard java / (one) utf-8 charset variable
...
extending initiative of commit 9a25751850
9 years ago
luc
571bc55937
Refactoring : use StandardCharsets constants instead of hard-coded
...
charset names.
9 years ago
reger
1af0e9ef74
remove workaround for Solr bug regarding multivalued date fields
...
fixed in 5.4.0
http://issues.apache.org/jira/browse/SOLR-8050
9 years ago
sixcooler
5a35f9383a
bump to solr/lucene 5.4.0
9 years ago
reger
a58d34a4e8
check error URL cache before adding errorDoc to index
...
- del obsolete related switchboardconstant
9 years ago
reger
e9539b1086
reintroduce special handling of file upload multipart/form-data from HTTPDemon.parseMultipart
...
- add filename to parameter fieldname
- add filecontent to special parameter fieldname$file
(some servlets use this $file parameter)
fix for http://mantis.tokeek.de/view.php?id=542
9 years ago
reger
cd26717ba2
fix low memory status hint (dht-in disabled)
...
http://mantis.tokeek.de/view.php?id=619
9 years ago
reger
a5faf73afa
remove obsolete yacy.init entries interaction.*
...
(related to removed triplestore)
9 years ago
sixcooler
dce1cb65c4
Merge remote-tracking branch 'choose_remote_name/master'
9 years ago
reger
46ac0867ff
fix poison mediawikiimporter output queue also after ExecutionException
...
in worker thread.
The writer of the importer needs a poison to close the file. On exception (e.g. OOM)
add a poison marker in outer most try/catch to assure output queue will terminate
in this condition too (and closes+renames the surrogate/in/xxx.prt file)
9 years ago
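The poison handling described in the commit above can be sketched as follows: the producing side adds the poison marker in a finally block, so the writer thread terminates even when the worker fails. This is a minimal illustration with hypothetical names, not the MediaWiki importer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration of a poison marker on an output queue.
final class PoisonPillSketch {
    // Dedicated marker object, compared by identity.
    static final String POISON = new String("POISON");

    static List<String> run(boolean failInWorker) {
        BlockingQueue<String> out = new LinkedBlockingQueue<>();
        List<String> written = new ArrayList<>();
        // Writer: consumes records until it sees the poison marker.
        Thread writer = new Thread(() -> {
            try {
                for (String s = out.take(); s != POISON; s = out.take()) written.add(s);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        try {
            out.add("record-1");
            if (failInWorker) throw new IllegalStateException("simulated worker failure");
            out.add("record-2");
        } catch (IllegalStateException e) {
            // a real importer would log the failure here
        } finally {
            out.add(POISON); // always sent, so the writer can finish and close its file
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written;
    }
}
```

Placing the marker in the outermost finally block means the writer is released on the normal path and on the failure path alike.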
reger
a7591d3ed0
fix mediawikiimporter number format exception on coordinate parsing
...
handle incomplete metadata like "NS=43/50//N".
For other {expr ... } type entries a try/catch was added
9 years ago
reger
9da1712a31
increase http header EXPIRES for css and images in DefaultServlet
...
to increase browser cache hits for not changing content
9 years ago
reger
6d54eb3d36
skip loading document on crawl start for YMark bookmarks
...
by adding a constructor giving the already loaded document as parameter.
9 years ago
reger
80e2c82249
fix NPE on empty blog importfile parameter
9 years ago
reger
e84d94f8ca
fix mime table for ms office / open office documents
...
(causing wrong parser detect in intranet mode)
9 years ago
reger
45b9bd8403
adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters,
...
and feeding hyperlinks to webgraph processing.
9 years ago
reger
d5fd031449
fix reading of ippattern config array in URLProxy
9 years ago
reger
b7e8358645
make use of header.getContentType where possible (mime is normalized afterwards)
...
otherwise use header.mime() differentiated in prev. commit.
9 years ago
reger
7a8c077838
fix HeaderFramework.mime() to strip charset parameter.
...
Differentiate mime() and getContentType() which gives the raw header field.
This improves parser detection if charsets are included in http content-type field.
9 years ago
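To illustrate the distinction made in the commit above, here is a small sketch of deriving a bare mime type from a raw Content-Type value by dropping parameters such as charset. The class and method are hypothetical, and the lower-casing is an assumption based on the normalization mentioned in neighbouring commits; this is not the HeaderFramework code.

```java
import java.util.Locale;

// Hypothetical illustration; not the HeaderFramework implementation.
final class MimeSketch {
    // "text/html; charset=UTF-8" -> "text/html"
    static String mime(String contentType, String defaultMime) {
        if (contentType == null) return defaultMime;
        int p = contentType.indexOf(';');
        // Keep only the part before the first parameter separator.
        String m = (p >= 0 ? contentType.substring(0, p) : contentType).trim();
        return m.isEmpty() ? defaultMime : m.toLowerCase(Locale.ROOT);
    }
}
```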
reger
b4b6910d60
fix (todo): correct doc.id of remote search result if no match with newly
...
calculated doc hash.
Testing showed that in some cases delivered url doesn't match the local
calculated hash. In this case replace doc.id (and host_id_s) with calculation
from url.
9 years ago
reger
dec3e6ad96
fix: adjust urlstub for mailto links
...
(skip protocol)
9 years ago
reger
cb83e65f89
drop returning document language "en" if unknown (fix todo)
...
which also harmonizes handling of query.modifier for rwi and solr results
(so the result must match a given language filter)
9 years ago
reger
0c5548a7ff
fix (todo) remove redundant holding of email link nameproperty in parser document
9 years ago
reger
71c416f383
show mailto links in ViewFile.html linklist
9 years ago
reger
6b7c10cef8
fix dc:date in mediawikiimporter/document.writexml to use lastmodified
9 years ago
reger
14803d58cd
let html scraper accept html5 <link rel="icon"> for favicon links
9 years ago
luc
b4cdacee76
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc
ba0a293f5c
Corrected another case of
...
"org.apache.lucene.store.AlreadyClosedException" occurring when
SearchEvent.cleanup() was called while committing local solr index.
9 years ago
reger
4d2b934487
prevent mailto links getting into parser result document's in/outbound link collection
...
by checking mailto scheme early.
- fix upper case mailto protocol assignment
- add test case for getProtocol
9 years ago
luc
8c4ab9c76b
Added an option to limit the size of remote solr documents put to
...
local index. See mantis #626 .
9 years ago
luc
a2c08402af
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc
70595d05d0
Modified MemoryControl.main() test to properly end for better results
...
displaying.
9 years ago
sixcooler
1be67d9ab6
CachedSolrConnector was replaced by ConcurrentUpdateSolrConnector years
...
ago - time to let it go
Commented out unused table of cache-objects
9 years ago
reger
28b8bc290a
fix use of NETWORK_SEARCHVERIFY for rwi verification
...
was not used to set the searchevent parameter (done in SearchEventCache.getEvent)
- remove unused corresponding QueryParams.filterfailurls param.
9 years ago
reger
020630efd8
remove unused network scanner parameter from queryparameter
...
Search event is not using networkscanner
(removed filterscannerfail param always init to false)
9 years ago
luc
ad5586f8f6
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc
8ebefa4233
Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was
...
failing. Looks like it was broken since Commit
b43811d38c
9 years ago
luc
7736ee5a42
Updated MediawikiImporter main() : display usage in console and stop
...
properly without calling System.exit
9 years ago
reger
cdb8f3b10d
make current ranking score value avail. to search interface / api
...
Update the result score result field with the result queue ranking value to reflect
the actual calculated/used score,
for rwi & solr stack results.
(calc. etc. is unchanged, it's just that result entry carries the latest val
as api retrieves the number from it)
9 years ago
luc
27d11f8671
Fixed isSolrDump function : PushBackInputStream was not unread when
...
returning false (for example with a WikiMedia dump).
9 years ago
Michael Peter Christen
135a123a77
less logging in new language detection
9 years ago
Michael Peter Christen
ef8cd80593
fix for npe
9 years ago
reger
0347bfa71f
Apply collection query constraint/modifier to rwi result stack.
...
Collection is not available in pure rwi entries (but in local solr metadata)
But if user wishes to filter by query constraint also rwi shall adhere to this
(even if only rwi entries with parsed or solr received metadata may fit)
9 years ago
luc
2a67d2ba6f
Corrected error management for unsupported image formats, parsing
...
errors, and unavailable resources : avoid logging too many Exceptions as
these errors easily occur when searching images.
9 years ago
Michael Peter Christen
d6e9834040
Merge branch 'master' of
...
https://github.com/Scarfmonster/yacy_search_server
# Conflicts:
# .classpath
# build.xml
9 years ago
Michael Peter Christen
d82d311995
Merge branch 'master' of https://github.com/luccioman/yacy_search_server
...
# Conflicts:
# .classpath
9 years ago
reger
b5371ea8c1
read/init crawl queue in a thread
...
to speed-up YaCy start on large existing crawler queues
9 years ago
reger
1160b13172
remove unused md5 from ViewFile servlet params
9 years ago
reger
e163ea88f6
fix vsdParser (Visio) parser return statement
...
(final block un-necessary throw)
9 years ago
reger
b2c8bc0ae6
remove md5_s from default index fields
...
it is not assigned a value / not used
Due to above also excluded from transfer protocol.
9 years ago
luc
e40ae0943b
- No max dimensions specified : render raw image data when source and
...
target image format are the same.
- Corrected scaling condition.
9 years ago
reger
90686a75a2
fix flux factor (additional crawl delay by access count) calculation
9 years ago
luc
4af27289e5
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
297fdb60d3
throw exception if crawler hostqueue can't create hostpath directory.
...
In rare cases hostname may not be a valid filesystem directory name,
which can't be created (e.g. containing '*' char). Prevent the crawl queue from
looping on this invalid entry by throwing a MalformedURLException.
9 years ago
luc
755efac17d
Use same max file size when loading all resource bytes or opening stream
...
content
9 years ago
luc
bc6c79fc12
Corrected scaling function for non RGB images.
9 years ago
luc
1565559df8
Refactoring : extracted write InputStream method.
9 years ago
luc
f0478bb14d
BMP and ICO image formats support : integrated /haraldk/TwelveMonkeys
...
imageio-bmp-3.2 library.
- better BMP format flavours support
- handle PNG encoded icons
- handle transparency
Added some javadoc url references to .classpath
9 years ago
luc
07437986e7
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
97cc03ef6a
start using a template for urlproxy header
...
It is included as iframe /proxmsg/urlproxyheader.html
to allow full servlet functionality and flexibility to display some
index/meta data in future.
9 years ago
luc
f01d49c37a
Process large or local file images dealing directly with content
...
InputStream.
9 years ago
luc
3c4c77099d
If available, check content length before downloading. Check also
...
content length is not over Integer.MAX_VALUE.
9 years ago
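A sketch of the kind of pre-download check the commit above describes, assuming a negative value stands for an unknown content length. The class, method and the maxBytes parameter are hypothetical.

```java
// Hypothetical illustration of a content length check before loading.
final class ContentLengthSketch {
    static boolean acceptable(long contentLength, long maxBytes) {
        if (contentLength < 0) return true;                  // length not available: nothing to check yet
        if (contentLength > Integer.MAX_VALUE) return false; // would not fit into a single byte[]
        return contentLength <= maxBytes;                    // respect the configured size limit
    }
}
```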
luc
5bbb2e1730
Ensure resource is closed when reading a full file InputStream
9 years ago
luc
6291a57300
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
0d3c5b223e
have psParser cleanup temp file
9 years ago
reger
7d0d19cb8e
avoid File.deleteOnExit() on temp files
...
The JVM registers each file in a list, regardless of whether it was already deleted, and never
cleans up the list during runtime.
This accumulates to a considerable amount of mem during large crawls and/or
long uptime.
To tackle this, all temp files are now created in a subdir of java.io.tmpdir
and the jvm tmpdir property is set to this subdir, which is deleted by
code on shutdown.
Additionally let pdfParser use this tmp subdir too.
9 years ago
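The approach from the commit above, temp files in a dedicated subdirectory that is removed by code on shutdown instead of File.deleteOnExit(), might look roughly like this. All names are hypothetical; as a simplification the sketch passes the directory explicitly rather than changing the java.io.tmpdir property.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical illustration; not the YaCy implementation.
final class TempDirSketch {
    // Create (or reuse) a subdirectory below java.io.tmpdir.
    static Path createTempSubdir(String name) {
        try {
            return Files.createDirectories(Paths.get(System.getProperty("java.io.tmpdir"), name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Temp file inside the subdirectory, without registering it via deleteOnExit().
    static File newTempFile(Path dir, String prefix, String suffix) {
        try {
            return File.createTempFile(prefix, suffix, dir.toFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Remove the subdirectory and everything in it (children before parents).
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) return;
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A shutdown routine would call deleteRecursively once, so no per-file bookkeeping accumulates in the JVM during long uptimes.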
luc
bfe51001e3
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
02e4489a23
set tmpfile.deleteOnExit by default,
...
to make sure files are removed on shutdown.
9 years ago
reger
2985baaa01
Exclude repetitive protocol part in tokenized url
...
used as description if none is avail. from parser.
9 years ago
reger
ca3d26a401
harmonize wordsintitle & CollectionSchema.title_words_val calculation,
...
remove obsolete partial init of wordreference from urimetadata
9 years ago
reger
52a9040ae6
Sort out double keywords (dc_subject) early in parsed documents
...
- by direct using Set vs. List
- remove not needed String[] getter
9 years ago
luc
49331dc523
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
47d70732f6
improve locale translator
...
- skip empty line
- robustness of file section detection (space independent)
9 years ago
sixcooler
646afe9183
do not store subfield *_coordinate + make all num-fields being docvalues
9 years ago
sixcooler
194df613de
not using 'location' as defaultfacetfield - since we removed it being
...
default.
9 years ago
sixcooler
d3b9349b6f
simplification / speedup of GenerationMemoryStrategy
9 years ago
sixcooler
4a905ec134
fix to not let the AccessTracker-Log grow too much, but have enough data
...
to monitor.
(+gitignore-correction)
9 years ago
reger
20e18d79f8
harmonize document title for archive parsers
9 years ago
luc
f11b5e8309
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
112ae013f4
update bzip and gzip parser process,
...
to return one document for the file with combined parser results of the
contained file and register it with the supplied url and mime of the archive.
9 years ago
reger
e76a90837b
update zip and tar parser process,
...
to return one document for the file with combined parser results of the
contained files.
9 years ago
luc
4e673ffc9a
Ensure closing of InputStream even when an exception occurs.
9 years ago
luc
10696b53f7
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
8532565c7d
optimize order of parsers to try
...
- start with a parser matching the remote supplied mime
9 years ago
reger
681889ae64
use current tar library for untar files
...
- remove old source copy
9 years ago
reger
5d71fc70e3
fix tarParser early exit on looping content
...
- adjust check of data available according to doc
- return null on no recognized content (to not exit TextParser next parser try)
- use commons.compress directly
9 years ago
luc
bcc2e7cb5b
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
2fcf6f104c
fix bzipParser recognition
...
- Bzip2Inputstream checks magic byte itself to identify bz2 (leave it in input)
- try to supply fitting mime for parsing bz2 content
9 years ago
luc
745e97a575
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
a60b1fb6c2
differentiate api call getLocalPort() from getConfigInt()
9 years ago
reger
11f3666660
increase use of predefined CATCHALL_QUERY string
9 years ago
reger
a58ee49307
Optimize internal imagequery focus on using content_type to select images
...
(in favor of url file extension)
9 years ago