luc
07222b3e1a
Added favicon url transmission in RWI chunks.
9 years ago
luc
480772c070
Fixed json search results from commit "Improved URLLicence reliability"
9 years ago
reger
937fbb0b9f
correct isHidden() for smb from last commit
9 years ago
reger
535d4bf75f
respect hidden attribute for file and smb directory listing
...
(hidden directories are not listed, affects crawling of the local file system)
9 years ago
luc
3cc5619d93
Improved HTML icons indexing and rendering in search results.
...
See http://mantis.tokeek.de/view.php?id=629
9 years ago
luc
edef6cd0dc
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
c28142095a
add findClass() to servlet class loader (used in YaCyDefaultServlet)
...
In the 2 cases where a servlet calls a servlet, the JVM classloader chain is
invoked and the servlet class is loaded by the JVM loader (which succeeds only while
htroot is in the system classpath). This patch uses the standard override design
for loaders to handle these cases (making it no longer crucial to have htroot
in the system classpath, as this classloader is mainly used for servlets and
looks in this case for the class in the configured path).
+ As the default classloader is parallel capable, we should register this one too.
9 years ago
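The override design mentioned in this commit can be sketched as follows. This is a minimal illustration, not YaCy's actual loader: the class name PathClassLoader and the root directory field are assumptions. It overrides findClass() so that the parent chain is asked first and the configured path only afterwards, and it registers the loader as parallel capable in a static initializer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical loader for illustration; names are not taken from YaCy.
final class PathClassLoader extends ClassLoader {
    static {
        // Allows concurrent loading of different class names by this loader type.
        ClassLoader.registerAsParallelCapable();
    }

    private final Path root; // assumed: directory holding compiled servlet classes

    PathClassLoader(Path root, ClassLoader parent) {
        super(parent);
        this.root = root;
    }

    // Called by loadClass() only after the parent loaders failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        Path file = root.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] bytes = Files.readAllBytes(file);
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

Because only findClass() is overridden, the standard parent-first delegation of loadClass() stays untouched.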
luc
f7b854465b
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
a6617ad887
expand initRemoteCrawler() to terminate worker threads if called to deactivate
...
remote crawl.
On startup we save the resources for the remote crawler if it is disabled. Once started,
the threads kept running idle after remote crawl was disabled. Now the threads are terminated
to save the resources also when disabling during runtime.
+ remove empty class Channels
9 years ago
reger
2048b7e057
support scraping start-/enddate from html tag with property "datetime"
...
This may be used in the html5 <time> tag (which we don't explicitly support yet for dates in content scraping).
9 years ago
reger
900d4584ba
complete resource cleanup of lists in contentscraper's close()
9 years ago
luc
aa60ad1dbc
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
1f18653de0
pass parsed swf content through htmlscraper
...
Swf may contain a subset of html tags which should not appear as text.
Especially <font> tag may totally screw up metadata servlet if not filtered out.
9 years ago
reger
18ecf57792
add support of compressed swf to swfParser
...
from JavaSWF2 (source compatible to WebCat).
Moved swf file signature check to parser
Changed use of synced vector to list swf InStream
9 years ago
sixcooler
5cb7ba0dc4
fix for connections not getting closed to get favicon.ico during search
9 years ago
luc
ef83e34b8a
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
ed3e16e092
apply remote result count config value to Bookmark Autosearch
...
+ prepare to make the widely unused Bookmark feature optional
9 years ago
Ryszard Goń
a98c395023
Add the Autocrawl thread
9 years ago
Ryszard Goń
1728cd30c6
Create autocrawl profiles
9 years ago
luc
41767a01c2
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
ff27824964
fix swfParser reading file signature
...
before passing to library (current version expects data w/o signature)
9 years ago
luc
7aa1a29e33
Return more accurate HTTP status 400 with detail message when some error
...
occurs on ViewImage :
- missing required parameters
- url licence invalid
9 years ago
luc
bd9dc2f32b
Corrected NullPointerException cases occurring in YJsonResponseWriter
...
when no description is available.
9 years ago
luc
0076f9f97d
Updated documented sample url
9 years ago
luc
cfdbc2b487
Improved URLLicence reliability for use by concurrent non-authorized
...
users.
Removed URLLicence generation when unnecessary (authorized users)
9 years ago
reger
c91e712178
further refactor using standard java / (one) utf-8 charset variable
...
extending initiative of commit 9a25751850
9 years ago
luc
571bc55937
Refactoring : use StandardCharsets constants instead of hard-coded
...
charset names.
9 years ago
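The refactoring described in this commit follows a common Java pattern, sketched here under the assumption of a small helper class (CharsetSketch is a hypothetical name, not a YaCy class). Passing a StandardCharsets constant instead of a charset name string removes the checked UnsupportedEncodingException and the risk of a misspelled name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class CharsetSketch {
    static byte[] encode(String s) {
        // Before: s.getBytes("UTF-8"), which declares UnsupportedEncodingException.
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] b) {
        // Before: new String(b, "UTF-8")
        return new String(b, StandardCharsets.UTF_8);
    }
}
```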
reger
1af0e9ef74
remove workaround for Solr bug regarding multivalued date fields
...
fixed in 5.4.0
http://issues.apache.org/jira/browse/SOLR-8050
9 years ago
sixcooler
5a35f9383a
bump to solr/lucene 5.4.0
9 years ago
reger
a58d34a4e8
check error URL cache before adding errorDoc to index
...
- del obsolete related switchboardconstant
9 years ago
reger
e9539b1086
reintroduce special handling of file upload multipart/form-data from HTTPDemon.parseMultipart
...
- add filename to parameter fieldname
- add filecontent to special parameter fieldname$file
(some servlets use this $file parameter)
fix for http://mantis.tokeek.de/view.php?id=542
9 years ago
reger
cd26717ba2
fix low memory status hint (dht-in disabled)
...
http://mantis.tokeek.de/view.php?id=619
9 years ago
reger
a5faf73afa
remove obsolete yacy.init entries interaction.*
...
(related to removed triplestore)
9 years ago
sixcooler
dce1cb65c4
Merge remote-tracking branch 'choose_remote_name/master'
9 years ago
reger
46ac0867ff
fix poison mediawikiimporter output queue also after ExecutionException
...
in worker thread.
The importer's writer needs a poison marker to close the file. On exception (e.g. OOM),
add a poison marker in the outermost try/catch to ensure the output queue terminates
in this condition too (and closes and renames the surrogate/in/xxx.prt file)
9 years ago
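The poison marker idea from this commit can be illustrated with a generic producer/writer sketch. It is not the mediawikiimporter code: the class name PoisonPillSketch, the run() method and the failAt parameter are assumptions made for the example. The point it shows is that the marker is enqueued in a finally block, so the writer terminates even when the producing side fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; not taken from YaCy's importer.
final class PoisonPillSketch {
    // Unique sentinel, compared by identity so that no real record can match it.
    private static final String POISON = new String("POISON");

    // Feeds records to a writer thread; failAt >= 0 simulates a failure at that index.
    static List<String> run(List<String> records, int failAt) throws InterruptedException {
        BlockingQueue<String> out = new LinkedBlockingQueue<>();
        List<String> written = new ArrayList<>();
        Thread writer = new Thread(() -> {
            try {
                // Drain until the sentinel arrives; a real writer would close its file afterwards.
                for (String s = out.take(); s != POISON; s = out.take()) {
                    written.add(s);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        try {
            for (int i = 0; i < records.size(); i++) {
                if (i == failAt) throw new IllegalStateException("simulated worker failure");
                out.put(records.get(i));
            }
        } catch (IllegalStateException e) {
            // Swallowed for the sketch; real code would log the failure.
        } finally {
            // Always poison the queue, also after an exception, so the writer can stop.
            out.put(POISON);
        }
        writer.join();
        return written;
    }
}
```

Without the finally block, a failure in the producing loop would leave the writer blocked on take() forever.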
reger
a7591d3ed0
fix mediawikiimporter number format exception on coordinate parsing
...
handle incomplete metadata like "NS=43/50//N".
For other {expr ... } type entries a try/catch was added
9 years ago
reger
9da1712a31
increase http header EXPIRES for css and images in DefaultServlet
...
to increase browser cache hits for unchanging content
9 years ago
reger
6d54eb3d36
skip loading document on crawl start for YMark bookmarks
...
by adding a constructor giving the already loaded document as parameter.
9 years ago
reger
80e2c82249
fix NPE on empty blog importfile parameter
9 years ago
reger
e84d94f8ca
fix mime table for ms office / open office documents
...
(causing wrong parser detection in intranet mode)
9 years ago
reger
45b9bd8403
adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters,
...
and feeding hyperlinks to webgraph processing.
9 years ago
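One way to make protocol detection robust against "://" appearing later in a mailto link is to take the scheme from the text before the first colon instead of searching for "://". The sketch below illustrates that idea only; it is not the actual MultiProtocolURL logic, and the class and method names are assumptions.

```java
// Hypothetical sketch; not the MultiProtocolURL implementation.
final class SchemeSketch {
    // Returns the lower-cased scheme, or null if the string has no valid scheme prefix.
    static String scheme(String url) {
        int colon = url.indexOf(':');
        if (colon <= 0) return null;
        String s = url.substring(0, colon);
        // RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean other = (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.';
            if (!(alpha || (i > 0 && other))) return null;
        }
        return s.toLowerCase(java.util.Locale.ROOT);
    }
}
```

With this approach a link such as mailto:a@b.org?body=http://example.org/ is classified by its leading mailto rather than by the embedded http URL.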
reger
d5fd031449
fix reading of ippattern config array in URLProxy
9 years ago
reger
b7e8358645
make use of header.getContentType where possible (mime is normalized afterwards)
...
otherwise use header.mime() differentiated in prev. commit.
9 years ago
reger
7a8c077838
fix HeaderFramework.mime() to strip charset parameter.
...
Differentiate mime() and getContentType() which gives the raw header field.
This improves parser detection if charsets are included in http content-type field.
9 years ago
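The distinction made in this commit (raw Content-Type field versus normalized mime type) can be sketched as below. This is an assumption-laden illustration, not HeaderFramework itself: MimeSketch and its fallback parameter are hypothetical. It strips everything from the first semicolon, so a charset parameter no longer disturbs parser lookup by mime type.

```java
import java.util.Locale;

// Hypothetical sketch; not the HeaderFramework implementation.
final class MimeSketch {
    // Returns the bare media type of a raw Content-Type value, or fallback if none is present.
    static String mime(String contentType, String fallback) {
        if (contentType == null) return fallback;
        int semi = contentType.indexOf(';');
        String m = (semi < 0 ? contentType : contentType.substring(0, semi)).trim();
        return m.isEmpty() ? fallback : m.toLowerCase(Locale.ROOT);
    }
}
```

For example, a raw value of "text/html; charset=UTF-8" yields "text/html", which is the form a mime-to-parser table would typically be keyed on.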
reger
b4b6910d60
fix (todo): correct doc.id of remote search result if it does not match the newly
...
calculated doc hash.
Testing showed that in some cases the delivered url doesn't match the locally
calculated hash. In this case replace doc.id (and host_id_s) with the value calculated
from the url.
9 years ago
reger
dec3e6ad96
fix: adjust urlstub for mailto links
...
(skip protocol)
9 years ago
reger
cb83e65f89
drop returning document language "en" if unknown (fix todo)
...
which also harmonizes handling of query.modifier for rwi and solr results
(the result must match a given language filter)
9 years ago
reger
0c5548a7ff
fix (todo) remove redundant holding of email link nameproperty in parser document
9 years ago
reger
71c416f383
show mailto links in ViewFile.html linklist
9 years ago
reger
6b7c10cef8
fix dc:date in mediawikiimporter/document.writexml to use lastmodified
9 years ago